Illegitimate Recombination between Duplicated Genes Generated from Recursive Polyploidizations Accelerated the Divergence of the Genus Arachis

The peanut (Arachis hypogaea L.) is the leading oil and food crop in the legume family. Extensive duplicate gene pairs with high sequence similarity, generated from recursive polyploidizations, could result from gene conversion caused by illegitimate DNA recombination. Here, through synteny-based comparisons of two diploid and three tetraploid peanut genomes, we identified the duplicated genes generated from legume common tetraploidy (LCT) and peanut recent allo-tetraploidy (PRT) within genomes. In each peanut genome (or subgenome), we inferred that 6.8–13.1% of LCT-related and 11.3–16.5% of PRT-related duplicates were affected by gene conversion; the LCT-related duplicates were mostly affected by partial gene conversion, whereas the PRT-related duplicates were mostly affected by whole gene conversion. Notably, we observed that conversion between duplicates, a long-lasting consequence of polyploidization, accelerated the divergence of different Arachis genomes. Moreover, we found that the converted duplicates are unevenly distributed across the chromosomes and are more often near the ends of the chromosomes in each genome. We also confirmed that well-preserved homoeologous chromosome regions may facilitate the conversion of duplicates. In addition, we found that certain biological functions, such as catalytic activity, contain a higher number of preferentially converted genes. We identified specific domains that are involved in converted genes, implying that conversions are associated with important traits of peanut growth and development.


Introduction
The peanut (Arachis hypogaea L.), known as the "longevity fruit", is the leading oil and food crop in the legume family and is rich in seed oil (~46-58%) and protein (~22-32%) [1,2]. Peanut products are rich in fat and protein, which are essential for eradicating malnutrition and ensuring food security, and this directly reflects the value-added effect of comprehensive processing and utilization [2][3][4]. The worldwide area dedicated to peanut cultivation covers about 23 million hectares and the vast majority of peanuts (>95%) are grown in Asia and Africa, with an annual production of nearly 42 million tons [5].
The peanut originated from South America and belongs to the genus Arachis [6], which contains 81 species and can be divided into nine sections according to the morphological characteristics, geographical distribution and hybrid affinity [2,7]. In the section Arachis, there are mostly wild diploid species (2n = 2x = 20), with only two tetraploids (AABB, 2n = 4x = 40), namely, the wild (Arachis monticola) and cultivated (A. hypogaea) species [8].
Genetic recombination plays an important role in DNA repair and crossovers between homologous chromosomes (or DNA segments) are a major driving force of biological evolution [9,36]. The recombination between homologous chromosomes is often called homologous recombination, while the recombination between homoeologous chromosomes (generated from polyploidization) is considered an abnormal recombination, which is called "illegitimate recombination" [37]. DNA genetic information can be reciprocally or symmetrically exchanged between homologous sequences during the meiotic and mitotic recombination of plants [38]. Gene conversion results from nonreciprocal recombination, which involves the unidirectional transfer of one gene (or DNA segment) locus to its paralogous counterparts [39]. Gene conversion between duplicated genes (or homoeologous chromosomes) generated from whole-genome duplication (WGD) has been discovered in yeast [40] and mammalian [41] genomes and also identified in plant genomes of Oryza sativa [20,21], Sorghum bicolor [37], Triticum aestivum [42], Gossypium [43], Brassica campestris and Brassica oleracea [44]. In addition, gene conversion between duplicates or homoeologous chromosomes is frequent and long-lasting and has been demonstrated in genomes of the genus Oryza and the homoeologous chromosomes 11 and 12 of rice produced from the common tetraploidization event of grasses [21,36,37,45]. Although the preliminary inference of gene conversion between subgenomes produced from PRT in A. hypogaea has been made [2], a comprehensive analysis of gene conversion for Arachis is lacking.
Mainly due to the biological and economic significance of Arachis, the genomes of five peanut species with different ploidies, including A. duranensis [3], A. ipaensis [6,46], A. monticola [5], A. hypogaea (Shitouqi) [2] and A. hypogaea (Tifrunner) [28], have been deciphered so far. Here, by performing a comparison analysis of these genomes, we aim to identify paralogous and orthologous gene sets associated with polyploidizations and species divergence, to assess the scale and patterns of conversion and to explore the factors that influence the occurrence of conversion, as well as its impact on genomic and functional evolution.

Detection of Duplicated Genes
To identify the duplicated genes produced by LCT and PRT and the orthologous genes related to the speciation of the considered Arachis genomes, BLASTP software [47] was first employed to search for potential homologous gene pairs, with the strict parameters of e-value < 1 × 10⁻⁵ and Score > 100. Then, the homologous gene information and locations on chromosomes were input into ColinearScan [14] to infer the colinear gene pairs and test the significance of the colinearity of chromosomal regions (blocks). The key parameter, the maximum gap, was set to 50 intervening genes, and large gene families with 50 or more members were removed from the blocks. Lastly, we performed genomic homologous structure analyses through homologous dotplots to help determine the paralogous and orthologous genes. This genome colinearity analysis approach was adopted in many previous angiosperm genomic comparisons [4,48,49].
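The homology search cutoffs described above can be sketched as a simple filter over BLASTP results. This is only an illustration under stated assumptions: it assumes the hits are stored in BLAST tabular format (`-outfmt 6`, with the e-value in column 11 and the bit score in column 12) and that "Score" refers to the bit score; the function name `filter_blast_hits` is hypothetical and is not part of the original pipeline.

```python
import csv
import io


def filter_blast_hits(handle, max_evalue=1e-5, min_score=100.0):
    """Yield (query, subject, evalue, score) for hits passing both cutoffs.

    Assumption: 12-column BLAST tabular output (-outfmt 6), with the
    e-value in column 11 and the bit score in column 12.
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, truncated, or comment lines.
        if len(row) < 12 or row[0].startswith("#"):
            continue
        query, subject = row[0], row[1]
        evalue, score = float(row[10]), float(row[11])
        if query == subject:
            continue  # a gene matching itself is not a homologous pair
        if evalue < max_evalue and score > min_score:
            yield query, subject, evalue, score


if __name__ == "__main__":
    # Two made-up hits: only the first passes both cutoffs.
    demo = (
        "g1\tg2\t90.0\t200\t20\t0\t1\t200\t1\t200\t1e-50\t350\n"
        "g1\tg3\t40.0\t80\t48\t0\t1\t80\t1\t80\t1e-3\t45\n"
    )
    for hit in filter_blast_hits(io.StringIO(demo)):
        print(hit)
```

The surviving pairs, together with gene positions, would then be the input to the colinearity step.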

Construction of Homologous Gene Quartets
To assess the conversion between duplicate genes generated from LCT and PRT, we defined homologous gene quartets according to the gene colinearity information. If both genomes of any two Arachis species, A and B, retained one pair of duplicates produced by LCT, a homologous gene quartet was formed by paralogous genes A1 and A2 from species A and their respective orthologous genes B1 and B2 from species B (Figure 1A). If there is no gene conversion between duplicated genes after species divergence, the orthologous genes should be more similar to each other than any pair of paralogous genes. However, if the duplicate genes are affected by conversion, we may find that the gene tree of the quartet exhibits a different structure compared to the expected topology (Figure 1B). To infer the conversion between duplicated genes located in different subgenomes of tetraploid peanut genomes, we constructed another type of quartet formed by one duplicated gene pair, Ama and Amb in A. monticola, and their respective orthologous genes Ad in A. duranensis and Ai in A. ipaensis. Meanwhile, a similar approach was used to construct the quartets for the cultivated tetraploid A. hypogaea and its diploid ancestors A. duranensis and A. ipaensis (Figure 1C,D).
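The quartet logic can be illustrated with a short sketch that compares pairwise distances within a quartet (A1, A2, B1, B2). It is a simplified reading of the expected-topology argument, not the procedure used in the study: the function name `classify_quartet` and the rule that a paralogous pair is flagged when its distance is smaller than both orthologous distances are assumptions made for illustration.

```python
def classify_quartet(d_a1_b1, d_a2_b2, d_a1_a2, d_b1_b2):
    """Flag candidate conversion in a quartet from four pairwise distances.

    d_a1_b1, d_a2_b2: distances between the two orthologous pairs.
    d_a1_a2, d_b1_b2: distances between the paralogues within species A and B.

    Assumed rule (illustrative): without conversion, orthologues are closer
    than paralogues; paralogues that are closer than both orthologous pairs
    contradict the expected topology and are flagged.
    """
    closest_orthologues = min(d_a1_b1, d_a2_b2)
    return {
        "A": d_a1_a2 < closest_orthologues,
        "B": d_b1_b2 < closest_orthologues,
    }


if __name__ == "__main__":
    # Expected topology: paralogues are far apart, orthologues are close.
    print(classify_quartet(0.05, 0.06, 0.60, 0.55))
    # Paralogues in species A are unexpectedly similar.
    print(classify_quartet(0.05, 0.06, 0.01, 0.55))
```

Any distance measure could be supplied here, for example Ks or one minus the amino acid identity.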

Calculation of Ks and Ka
The synonymous nucleotide substitution rate (Ks) and nonsynonymous nucleotide substitution rate (Ka) between homologous gene pairs were estimated by using the Nei-Gojobori [50] approach, by implementing the program codeml in PAML [51]. ClustalW with default parameters was employed to align multiple gene CDS [52]. Because multiple nucleotide substitutions frequently occur at the same site in a sequence, we used the Jukes-Cantor (JC) model to correct the Ks and Ka values, denoted by Ps and Pa [37,53].
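For reference, the standard Jukes-Cantor correction converts an observed proportion of differing sites p into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The sketch below implements that textbook formula; how exactly it was applied to the synonymous and nonsynonymous estimates in this study is not specified here, so the function name `jukes_cantor` and its use on a raw proportion are assumptions.

```python
import math


def jukes_cantor(p):
    """Return the Jukes-Cantor distance for an observed proportion p of
    differing sites, d = -(3/4) * ln(1 - 4p/3).

    The correction is undefined for p >= 0.75 (saturation).
    """
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    for p in (0.05, 0.10, 0.30, 0.50):
        print(p, round(jukes_cantor(p), 4))
```

The corrected distance always exceeds the observed proportion for p > 0, and the gap widens as sequences approach saturation, which is why uncorrected values underestimate divergence between old duplicates.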

Gene Conversion Inference
ClustalW [54] was used to conduct multiple sequence alignment of amino acid sequences from each quartet. If the quartets had gaps in pair-wise alignment sequences accounting for >50% of the alignment length, or the amino acid identity of compared homologous genes was less than 40%, the quartet was removed. Those highly divergent quartets were removed to avoid the false inference of gene conversion resulting from problematic alignments.
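The filtering rule for highly divergent quartets can be sketched as follows. The sketch assumes that the four aligned amino acid sequences have equal length with `-` as the gap character, that identity is computed over ungapped columns, and that the thresholds are applied to every pair in the quartet; these details and the function names are illustrative assumptions rather than the published implementation.

```python
from itertools import combinations


def pair_stats(a, b):
    """Return (gap_fraction, identity) for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    # Columns gapped in both sequences carry no pairwise information.
    cols = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not cols:
        return 1.0, 0.0
    gapped = sum(1 for x, y in cols if x == "-" or y == "-")
    aligned = len(cols) - gapped
    matches = sum(1 for x, y in cols if x == y)
    identity = matches / aligned if aligned else 0.0
    return gapped / len(cols), identity


def keep_quartet(aligned_seqs, max_gap=0.5, min_identity=0.4):
    """Return False if any pair has >50% gaps or <40% identity."""
    for a, b in combinations(aligned_seqs, 2):
        gap_fraction, identity = pair_stats(a, b)
        if gap_fraction > max_gap or identity < min_identity:
            return False
    return True


if __name__ == "__main__":
    print(keep_quartet(["MKTAYIAK", "MKTAYIAK", "MKTAYLAK", "MKSAYIAK"]))
    print(keep_quartet(["MKTAYIAK", "M-------", "MKTAYLAK", "MKSAYIAK"]))
```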
Whole-gene conversion (WCV) inference: Since the divergence of orthologous gene pairs in a quartet occurred later than that of their respective paralogous genes, the expected similarity of orthologous gene pairs was higher than that of the paralogues in this quartet. However, the paralogous gene pairs may be affected by conversion and become more similar than their respective orthologous gene pairs. Here, we used two methods to infer potential whole-gene conversion events. In the first, the similarity of homologous gene pairs was measured by Ks (defined as WCV-I), and a bootstrap test was performed on the gene tree for each quartet to check the confidence level of the conversion events [20,37]. In the second, we used the ratios of amino acid locus identity of the sequences in each quartet to measure the similarity and examined the topological tree changes to infer potential whole-gene conversion events (defined as WCV-II). Compared with WCV-I, WCV-II has more stringent standards for inferring conversion, because the divergence of these Arachis genomes occurred more recently. The similarity between orthologous sequences in different Arachis species is often very high due to their relatively close genetic relationship, as seen in previous studies of conversion in hexaploid wheat and the genus Oryza [21,42].
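The idea of a bootstrap check on the quartet topology can be sketched by resampling alignment columns and asking how often a gene is closer to its paralogue than to its orthologue. This sketch substitutes a plain p-distance for Ks and a distance comparison for a full gene tree, so it illustrates the principle only; the function names and the default of 1000 replicates are assumptions.

```python
import random


def p_distance(a, b, cols):
    """Proportion of differing sites over the given (resampled) columns,
    ignoring columns where either sequence has a gap."""
    pairs = [(a[i], b[i]) for i in cols if a[i] != "-" and b[i] != "-"]
    if not pairs:
        return float("nan")
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_support(a1, a2, b1, n_boot=1000, seed=0):
    """Fraction of bootstrap replicates in which A1 is closer to its
    paralogue A2 than to its orthologue B1."""
    if not (len(a1) == len(a2) == len(b1)):
        raise ValueError("aligned sequences must have equal length")
    rng = random.Random(seed)
    n = len(a1)
    hits = 0
    for _ in range(n_boot):
        # Resample alignment columns with replacement.
        cols = [rng.randrange(n) for _ in range(n)]
        if p_distance(a1, a2, cols) < p_distance(a1, b1, cols):
            hits += 1
    return hits / n_boot


if __name__ == "__main__":
    a1 = "ATGGCCAAGT" * 5
    a2 = a1                      # paralogue identical to A1
    b1 = "CTGGCCCCGT" * 5        # orthologue differing at 30% of sites
    print(bootstrap_support(a1, a2, b1))
```

A high support value for the paralogue-first grouping would count as evidence for conversion; the acceptance threshold is a design choice and is not fixed by this sketch.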
Partial-gene conversion (PCV) inference: To detect possible gene conversion that affected only portions of a gene, we employed a dynamic programming algorithm combined with phylogenetic analysis to search for DNA segments > 10 nucleotides in length that were affected by conversion between paralogues, as was done in previous studies to infer partial-gene conversion in Oryza subspecies [21,22].
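One simple way to picture a dynamic programming search for converted segments is a maximal-scoring-segment scan. The sketch below is not the algorithm used in the cited studies: it assumes a hypothetical scoring scheme in which a column scores +1 when the paralogues agree but the orthologue differs, -2 when the gene agrees with its orthologue but not its paralogue, and 0 otherwise, and then applies Kadane's algorithm to find the best contiguous segment, keeping it only if it is longer than 10 nucleotides.

```python
def column_scores(a1, a2, b1, reward=1, penalty=-2):
    """Score each alignment column (hypothetical scheme, see lead-in)."""
    scores = []
    for x, y, z in zip(a1, a2, b1):
        if "-" in (x, y, z):
            scores.append(0)
        elif x == y and x != z:
            scores.append(reward)    # supports paralogue similarity
        elif x == z and x != y:
            scores.append(penalty)   # supports the expected topology
        else:
            scores.append(0)
    return scores


def best_segment(scores):
    """Kadane's algorithm: (start, end_exclusive, total) of the
    maximum-sum contiguous segment; (0, 0, 0) if no positive segment."""
    best = (0, 0, 0)
    cur_sum, cur_start = 0, 0
    for i, s in enumerate(scores):
        if cur_sum <= 0:
            cur_sum, cur_start = s, i   # start a new candidate segment
        else:
            cur_sum += s
        if cur_sum > best[2]:
            best = (cur_start, i + 1, cur_sum)
    return best


def find_partial_conversion(a1, a2, b1, min_len=11):
    """Return the best segment if it spans more than 10 nucleotides."""
    start, end, total = best_segment(column_scores(a1, a2, b1))
    return (start, end, total) if end - start >= min_len else None


if __name__ == "__main__":
    a1 = "A" * 15 + "G" * 12 + "A" * 15
    a2 = "C" * 15 + "G" * 12 + "C" * 15   # matches A1 only in the core
    b1 = "A" * 15 + "T" * 12 + "A" * 15   # matches A1 only in the flanks
    print(find_partial_conversion(a1, a2, b1))
```

In practice a candidate segment would still need the phylogenetic support described above before being called a conversion.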

Statistical Analysis of the Correlation between Conversion and Physical Location
To check whether gene conversion is affected by the physical location of duplicated genes on chromosomes, we calculated the distance of duplicated genes relative to the chromosomal termini. Firstly, the duplicated genes on each chromosome arm were divided into 1 Mb bins running from the chromosome termini to the centromere and the number of duplicated genes in each bin was counted. Then, we divided the number of converted genes by the number of all duplicated genes to calculate the conversion rate in each bin. The fold increase in conversion rates was equal to the mean of the first selected bin divided by the mean of all other bins. Lastly, one million rounds of a permutation test were carried out by randomly swapping the bin sums of the conversion rates and calculating the fold increase for each permutation, as previously reported [21,55].
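The binning and permutation procedure can be sketched as follows, assuming the per-bin conversion rates have already been computed and ordered from the chromosome terminus towards the centromere. The function names, the one-sided test, and the small default number of permutations are illustrative assumptions; the study itself reports one million rounds.

```python
import math
import random


def fold_increase(rates, n_terminal=1):
    """Mean rate of the first n_terminal bins divided by the mean of the rest."""
    head, tail = rates[:n_terminal], rates[n_terminal:]
    if not head or not tail:
        raise ValueError("need at least one terminal and one other bin")
    mean_tail = math.fsum(tail) / len(tail)
    if mean_tail == 0:
        return float("inf")
    return (math.fsum(head) / len(head)) / mean_tail


def permutation_p_value(rates, n_terminal=1, n_perm=10000, seed=0):
    """One-sided permutation test: how often does a random reordering of
    the bins give a fold increase at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = fold_increase(rates, n_terminal)
    shuffled = list(rates)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if fold_increase(shuffled, n_terminal) >= observed:
            extreme += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Made-up conversion rates for ten 1 Mb bins, terminus first.
    rates = [0.30, 0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 0.12, 0.09, 0.11]
    print(permutation_p_value(rates))
```

`math.fsum` is used so that the fold increase does not depend on the order in which the shuffled bins are summed.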

Gene Ontology Analysis
InterProScan v5.0 [56] with default parameters was employed to identify the GO terms for each gene in the Arachis genomes, providing a functional overview of the duplicated genes. The online visualization tool WEGO (http://wego.genomics.org.cn/, accessed on 1 May 2021) [57] was used to compare and display the GO annotation results of the considered gene sets, so that the functional distribution and changing trend of converted and nonconverted genes could be clearly shown. The Pearson chi-squared test was used to test the significance of the difference between the numbers of converted and nonconverted genes in the same biological function.
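A Pearson chi-squared test on a 2 × 2 table can be written without external libraries, as in the sketch below. The layout of the table (converted versus nonconverted genes, inside versus outside a given GO term) is an assumption about how the counts were arranged, and no continuity correction is applied; the function name `chi2_2x2` is hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared test (1 degree of freedom, no continuity
    correction) for the table [[a, b], [c, d]].

    Assumed layout: a = converted genes with the GO term, b = converted
    genes without it, c = nonconverted genes with the term, d = nonconverted
    genes without it. Returns (chi2, p_value).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column total must be positive")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Made-up counts for illustration only.
    print(chi2_2x2(20, 80, 10, 90))
```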

Genomic Homology
Through the intra-genomic colinearity analysis, we inferred the gene colinearity within Arachis genomes. First, we identified the duplicated genes generated by recursive polyploidizations in each diploid and tetraploid peanut genome and found that A. duranensis had more highly preserved intragenomic homology than the other genomes (Supplementary Table S1). We identified 599 homologous blocks with four or more colinear genes, containing 5016 colinear gene pairs in A. duranensis. Using the same parameters, we found only 431 homologous blocks in A. monticola A, containing 2785 colinear gene pairs, which may be due to the many chromosomal rearrangements that occurred after it split from other species. Furthermore, we distinguished the blocks which were generated by LCT according to the median Ks of anchored gene pairs located in each block [26]. For example, within the dotplot of A. ipaensis, the homologous block between chromosomes 4 and 5 had a median Ks value of 0.86, indicating that this block was generated by LCT (Figure 2B and Supplementary Figures S1-S8). In this way, we obtained the duplicated gene sets that were generated from LCT in peanut genomes of different ploidy (Supplementary Table S2 and Figure 2D). Ultimately, we found that the maximum number of duplicated genes was 2460 in A. duranensis and the minimum was 877 in A. monticola A. To identify the orthologous genes between genomes, we performed inter-genomic comparisons of the considered genomes from five Arachis species (Supplementary Table S1). We found that there were 2170-2991 blocks preserved between two diploid peanut genomes, or between the subgenomes in tetraploid peanut genomes, involving 15,894-37,293 colinear gene pairs. Clearly, there is better genomic colinearity between genomes than within genomes.
Furthermore, we identified the orthologous gene pairs between any two peanut genomes, according to the median Ks of anchored gene pairs located in blocks related to the divergence of genomes (or subgenomes) (Figure 2C and Supplementary Figures S9-S12). Finally, we inferred that there were 5297-19,264 orthologous gene pairs between genomes or subgenomes, which were generated from the divergence of genomes (Supplementary Table S2).

Homologous Gene Quartets
Based on the above intra/inter-genomic homologous gene colinearity information, we constructed the quartets between the considered genomes. We identified 2200 quartets between A. duranensis and A. ipaensis and used these to infer the conversion events of duplicated genes generated from LCT that occurred after the divergence of these two diploid peanuts (Figure 1A). Then, we identified only 137 quartets between A. monticola A and A. monticola B, which we used to infer the conversions of duplicated genes generated from LCT that occurred after the formation of wild tetraploid peanut (Figure 1A). Additionally, we constructed 1315 quartets between A. hypogaea A (Shitouqi) and A. hypogaea B (Shitouqi) and 1954 quartets between A. hypogaea A (Tifrunner) and A. hypogaea B (Tifrunner), which are both related to the LCT events, and used them to infer the conversions that occurred after the formation of the cultivated tetraploid peanut (Figure 1A). Fewer quartets were identified between the two subgenomes of wild tetraploid peanut, possibly due to the extensive specific rearrangements that occurred in its genome. Additionally, to infer the conversions between the duplicated genes generated from the PRT in tetraploid peanuts (Figure 1C), we identified the quartets between the subgenomes in A. monticola, A. hypogaea (Shitouqi) and A. hypogaea (Tifrunner) and two wild diploid peanut genomes, for which there were 2866, 10,784 and 13,306 quartets, respectively.

Gene Conversion between LCT-Related Duplicated Genes
By removing highly divergent quartets, we obtained the reliable quartets for further inferring the gene conversion (Table 1). After filtering, we retained 1871 quartets between A. duranensis and A. ipaensis and the quartets among the subgenomes in the three tetraploid peanut genomes of A. monticola, Shitouqi and Tifrunner, amounting to 99, 1314 and 1935, respectively (Table 1). Using these quartets, we inferred the whole-gene (WCV-I and WCV-II) and partial-gene conversion (PCV) events between duplicates through the comparison of the gene tree topology changes, based on the similarity of the synonymous nucleotide substitution rate and the amino acid identity rate (see the Methods for details).
Note (Table 1): WCV-I, inferred from the similarity of homologous gene pairs measured by Ks; WCV-II, inferred from the ratios of amino acid locus identity of the sequences in each quartet and examination of the topological tree changes; PCV, inferred using a dynamic programming algorithm combined with phylogenetic analysis.
In A. duranensis, we found that 220 (11.8%) of the paralogues were converted after this species' divergence from A. ipaensis, among which 4 (0.2%) of the paralogues were affected by WCV and 216 (11.6%) of the paralogues were affected by PCV (Table 1 and Figure 3A). In A. ipaensis, we found that 242 (13.0%) of the paralogues were converted after this species' divergence from A. duranensis, among which only 2 (0.1%) of the paralogues were affected by WCV and 241 (12.9%) of the paralogues were affected by PCV (Table 1 and Figure 3A). Considering that the conversions could be affected by the PRT, we inferred the conversion between the duplicated genes related to LCT in the subgenomes of A. monticola (Table 1 and Figure 3B). We found that 13 (13.1%) of the paralogues were converted in A. monticola A and 14 (13.1%) of the paralogues were converted in A. monticola B; these rates are similar to, but slightly higher than, the conversion rates in the two diploid peanuts. In addition, considering that the conversions could be affected by PRT and artificial domestication, we further independently inferred the conversion of the duplicated genes related to LCT in the subgenomes of Shitouqi and Tifrunner. We found that slightly fewer duplicated genes were affected by gene conversion in Shitouqi, with 126 (9.6%) of the paralogues having been converted in subgenome A and 115 (8.8%) of the paralogues having been converted in subgenome B (Table 1 and Figure 3C). Similar to Shitouqi, we found that fewer duplicated genes were affected by conversion in Tifrunner, with 132 (6.8%) of the paralogues having been converted in subgenome A and 139 (7.1%) of the paralogues having been converted in subgenome B (Table 1 and Figure 3D). These conversion rates for paralogues in peanut genomes of different ploidy suggest that habitats and genetic bases both have a certain influence on the occurrence of conversion.

Gene Conversion between PRT-Related Duplicated Genes
To infer the conversions between PRT-related duplicated genes, we also removed the highly divergent quartets. After filtering, we obtained the quartets between the subgenomes of A. monticola, Shitouqi and Tifrunner, and two diploid peanut genomes, amounting to 2625, 9972 and 12,657, respectively (Supplementary Table S3). Then, using these quartets, we explored the gene conversion between subgenomes of tetraploid peanut. In A. monticola, we inferred that 433 (16.5%) of duplicated gene pairs related to PRT were affected by gene conversion. Of these, the conversion patterns of 263 (10.0%) of the paralogues were inferred to be WCV-I and 354 (13.5%) of the paralogues were inferred to be WCV-II, whereas far fewer paralogues were affected by PCV, with only 5 (0.2%) pairs. Meanwhile, we found that 181 (54%) of the converted genes in A. monticola located in subgenome A were used as donors; the other donors were located in subgenome B.
Similarly, we also inferred the conversion between duplicated genes related to PRT in two cultivated tetraploid peanuts. In Shitouqi, we detected 1122 (11.3%) of the PRT-related duplicated genes affected by gene conversion. Of these, 607 (6.1%) of the duplicates were inferred to be WCV-I and 1006 (10.1%) of the duplicates were inferred to be WCV-II, with only 9 (0.1%) of the duplicates having been affected by PCV (Figure 4A and Supplementary Table S3). In Tifrunner, we detected 1706 (13.5%) of the PRT-related duplicated genes affected by gene conversion. Of these, 1115 (8.8%) of the duplicates were inferred to be WCV-I and 1495 (11.8%) of the duplicates were inferred to be WCV-II, with only 29 (0.2%) of the duplicates having been affected by PCV (Figure 4B and Supplementary Table S3). Comparing the results of the above inferences, we found that the conversion between duplicates located in different subgenomes of wild tetraploid peanut occurs with a higher frequency than in cultivated tetraploid peanut, while the dominant source of donors also differs between the two tetraploid peanuts. Furthermore, we compared the distribution of donors in the subgenomes of Shitouqi and Tifrunner and found that 112 (41%) of the converted genes located in subgenome A and 158 (59%) of the converted genes located in subgenome B were inferred to be donors. Interestingly, 84.7% (222) of the genes were taken as donors in the two genomes. This suggests that the donor genes in the converted duplicated genes are often taken as donors in different genomes. For example, the two members of an orthologous gene pair, Sha03g5270 of Shitouqi and Tha03g4128 of Tifrunner, were both found to be donors in the two tetraploid peanut genomes (Figure 4C,D).

Conversion and Evolution
Conversion homogenizes paralogous gene sequences, which makes those paralogues affected by conversion appear younger than expected based on sequence divergence with one another [20,21,37,58]. Here, we also found that the average Pn = 0.181 and Ps = 0.534 of converted paralogues were significantly smaller than the average Pn = 0.199 and Ps = 0.559 of nonconverted paralogues in A. duranensis (p-value = 6.32 × 10⁻³, p-value = 6.13 × 10⁻⁴, t-test) (Supplementary Table S4). However, this comparison could not determine whether converted genes evolved slowly based on the paralogues themselves, since the pairwise distances between paralogues were distorted by conversion. Therefore, we further compared the Pn and Ps of converted and nonconverted orthologues between the considered genomes and found the distance of orthologues affected by conversion to be significantly larger than that of those orthologues not affected by conversion (Table 2 and Supplementary Table S5). For example, the average Pn = 0.055 and Ps = 0.109 of converted orthologues between A. duranensis and A. ipaensis were significantly larger than the average Pn = 0.023 and Ps = 0.064 of orthologues not affected by conversion (p-value = 9.48 × 10⁻¹⁴, p-value = 7.48 × 10⁻⁷, t-test). These results suggest that the converted genes have evolved faster than the nonaffected ones, also indicating that conversion contributes to the divergence of genus Arachis. To determine whether the gene conversion was affected by evolutionary selection pressure, we employed the Pn/Ps ratios of paralogues and orthologues to reflect the selection pressure during their evolution (Supplementary Tables S4 and S5). The average Pn/Ps ratio of converted paralogues in A. duranensis was 0.34, similar to the average Pn/Ps ratio of nonconverted paralogues, 0.36. This comparison seems to suggest that conversion does not result in obvious changes in the selection pressure on the paralogues in A. duranensis.
Similarly, to check the correlation between conversion and evolutionary rates, we further used the Pn/Ps ratios of orthologues to find the actual selection pressure difference. We found that the average Pn/Ps ratio of converted orthologues between A. duranensis and A. ipaensis was 0.505, significantly larger than the average Pn/Ps ratio of nonconverted orthologues, which was 0.359 (p-value = 3.68 × 10⁻⁹) (Table 2). This difference in Pn/Ps ratio also exists in comparisons between other genomes (Table 2). These results suggest that conversion reduces the negative selection pressure on genes, making them prone to the "free" mode of evolution.
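As a quick arithmetic check, the ratios quoted above follow from the average Pn and Ps values reported for orthologues between A. duranensis and A. ipaensis. The snippet below only recomputes those ratios from the reported averages; taking the ratio of the averages (rather than the average of per-gene ratios) is an assumption made for this illustration, and the helper name `selection_ratio` is hypothetical.

```python
def selection_ratio(pn, ps):
    """Return Pn/Ps; values below 1 are usually read as purifying selection."""
    if ps <= 0:
        raise ValueError("Ps must be positive")
    return pn / ps


# Average values reported in the text for orthologues between
# A. duranensis and A. ipaensis.
converted = selection_ratio(0.055, 0.109)
nonconverted = selection_ratio(0.023, 0.064)

if __name__ == "__main__":
    print(round(converted, 3), round(nonconverted, 3))
```

Both ratios remain well below 1, so the difference points to relaxed purifying selection on converted genes rather than to positive selection.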

Conversion and Physical Position
By calculating the rate of gene conversion occurring on different chromosomes, we found no significant difference in the rate of gene conversion between different chromosomes (Supplementary Tables S6 and S7). For example, the average gene conversion rate of the 10 chromosomes in the A. duranensis genome was 11.8%, while the conversion rate of each chromosome was distributed in the narrow range of 9.7–14.1%, with no significant difference (p-value = 0.998). Furthermore, we found that the distribution of converted genes was unbalanced in the different regions of each chromosome (Supplementary Tables S8 and S9), as the converted genes tended to be located near the ends of the chromosomes (Figure 3). For instance, approximately 30% of all converted genes generated from the PRT event were located within 5% of the end of the chromosomes. However, we did not find a high rate of gene conversion near the ends of the chromosomes (Supplementary Tables S10 and S11). The average gene conversion rate of the A. duranensis genome was 11.8%, similar to the conversion rate within the 5 Mb region near the chromosomal telomeres, which was 13.1%.

Chromosome Rearrangements and Conversion
Chromosome rearrangement events possibly disrupt genomic collinearity and the degree of chromosome rearrangement can be reflected by the number of blocks in the genome. To explore potential associations between rearrangements and conversion, we investigated the relationship between the conversion rate and the numbers of blocks related to the LCT and PRT events on each chromosome from eight Arachis genomes, respectively (Supplementary Tables S12 and S13). After a thorough comparison, we found no meaningful correlation; even where there was a hint of correlation, the tendency was inconsistent across genomes. For example, we found that the gene conversion rate was weakly negatively correlated with the number of blocks in A. duranensis (R² = 0.0681), whereas a weak positive correlation was exhibited in A. ipaensis (R² = 0.0052). Furthermore, when investigating the relationship between the length (number of colinear gene pairs) of the colinear region and the gene conversion rate, we found that longer regions showed a higher conversion rate than shorter regions (Figure 5). For instance, in the homologous chromosomal regions with more than 50 gene pairs, the gene conversion rate in A. duranensis was 13.1%, larger than the conversion rate of 4.3% in regions with fewer than 10 gene pairs (Supplementary Tables S14 and S15). Although no correlation between rearrangement and conversion was found, we still revealed that well-preserved ancestral homology can facilitate gene conversion.
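The correlation between per-chromosome conversion rate and block number can be computed as in the sketch below. Note that R² alone does not indicate direction, so the sign of the Pearson coefficient r is what distinguishes a weak negative from a weak positive relationship. The function names and the example numbers are illustrative and are not taken from the supplementary tables.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = math.fsum(xs) / len(xs)
    mean_y = math.fsum(ys) / len(ys)
    sxy = math.fsum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = math.fsum((x - mean_x) ** 2 for x in xs)
    syy = math.fsum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def r_squared(xs, ys):
    """Coefficient of determination of a simple linear fit (r squared)."""
    return pearson_r(xs, ys) ** 2


if __name__ == "__main__":
    # Made-up block counts and conversion rates for five chromosomes.
    blocks = [52, 61, 48, 70, 55]
    rates = [0.12, 0.11, 0.13, 0.10, 0.12]
    print(pearson_r(blocks, rates), r_squared(blocks, rates))
```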

Gene Function Analysis
The probability of a gene being converted may be associated with its function; thus, we performed a gene ontology analysis to identify the GO terms for duplicated genes in the studied peanut genomes. Firstly, we identified the GO terms of 235, 207, 240 and 262 LCT-related converted genes in A. hypogaea A (Shitouqi), A. hypogaea B (Shitouqi), A. hypogaea A (Tifrunner) and A. hypogaea B (Tifrunner), respectively. Comparing the proportion of converted and duplicated genes for each function, we found that some genes with specific functions were more likely to be converted, whereas there were some functional genes that were biased toward escape from conversion (Figure 6). We found that the genes involved in those functions associated with large numbers of genes were biased towards gene conversion (Supplementary Table S16). For example, regarding the catalytic activity-related genes in A. hypogaea A (Tifrunner), the converted genes accounted for 23.1% of all converted genes, a significantly higher level than that of the duplicated genes related to this function, which only accounted for 15.1% of all duplicated genes in the whole genome (p-value < 0.001). This implies that catalytic activity-related genes tended to be affected by conversion. In contrast, some genes associated with functions (regulation of metabolic process) encoded by few genes might have avoided conversion (Figure 6). Furthermore, we checked the domains involved in the converted genes in all the studied peanut genomes. Duplicated genes that had experienced gene conversion in diploid peanut were enriched in the helix-loop-helix DNA-binding domain (p-value = 1.23 × 10⁻⁶), WD (p-value = 6.65 × 10⁻⁵) and the ring finger domain (p-value = 4.13 × 10⁻³) (Figure 7A and Supplementary Table S17).
After the formation of tetraploid peanuts, the domains involved in the converted genes of Shitouqi were enriched in the triose-phosphate transporter family (p-value = 2.20 × 10⁻¹⁶), the helix-loop-helix DNA-binding domain (p-value = 4.55 × 10⁻⁴) and protein phosphatase 2C (p-value = 3.09 × 10⁻⁴) (Figure 7B and Supplementary Table S18). The domains involved in the converted genes of Tifrunner were enriched in the helix-loop-helix DNA-binding domain (p-value = 3.93 × 10⁻⁴), the RNA recognition motif (p-value = 7.62 × 10⁻²) and short chain dehydrogenase (p-value = 8.74 × 10⁻⁶) (Figure 7C and Supplementary Table S19). Among all the peanut genomes of different ploidy, the converted genes involved domains which were enriched in the helix-loop-helix DNA-binding domain (p-value = 2.20 × 10⁻¹⁶), the ring finger domain (p-value = 2.61 × 10⁻⁶) and the RNA recognition motif (p-value = 6.01 × 10⁻³) (Figure 7D and Supplementary Table S20). These results suggest that the identified converted genes with specific domains may be associated with important traits of peanut growth and development.

Long-Lasting Extensive Conversions Affected the Evolution of Duplicated Genes in Peanut Genomes
Duplicated genes generated from recursive ancient polyploidizations, which played an important role during the diversification of green plants, have been reported in many previous studies [10][11][12][59][60][61][62][63][64]. Here, we inferred the conversions between duplicated genes produced by LCT and PRT events and offered new insights into the evolutionary process of duplicated genes in peanut genomes. First, although the duplicated genes were produced a long time ago, they still interact with each other at a high frequency under the action of illegitimate genetic recombination, as demonstrated by the conversion between LCT-produced duplicates identified here and by the findings of previous studies on sorghum and rice [37]. Second, conversion affects the DNA sequence to varying degrees, either at the level of the entire gene or at that of only a few nucleotide sites. Third, conversion is an ongoing, long-lasting process that has affected the evolution of duplicated genes, as revealed here by the conversion events that occurred between the duplicates produced by the recent duplication event (PRT) of tetraploid peanut genomes.

Conversion Contributes to the Divergence of Genus Arachis Genomes
Conversions cause the sequences of duplicated gene pairs to become more similar than expected, and it therefore seems that conversion causes the duplicates to be more conserved. However, we found that conversion accelerates the evolutionary rate of duplicate genes and contributes to the divergence of genus Arachis genomes. The main reason for this apparent contradiction is that conversion distorts the genetic distance between duplicated gene pairs; this has also been demonstrated in previous studies [21,65,66]. Here, we emphasize that a closer understanding of the effect of conversion on the rate of nucleotide evolution should be obtained by comparing orthologous gene pairs between genomes. In addition, gene conversion, as an accelerating force of nucleotide variation, may lead to the transfer of a new mutation from one gene of a duplicated pair to the other copy, accelerating the divergence of peanut genomes.

Donor Genes Are Preferred as Donors
A gene conversion event involves copying one gene sequence from a donor locus to an acceptor locus [67]. As a consequence of the conversion, the "acceptor" sequence is replaced, wholly or partly, by a sequence that is copied from the "donor", whereas the sequence of the donor remains unaltered. This gene conversion pattern has also long been identified in mammalian cells, with the human hemoglobin genes HBG1 and HBG2 being the first characterized examples [41]. Comparative analyses of the characteristics of donor and acceptor genes in conversion events are helpful for elucidating the mechanism of conversion. We found that independent conversion events that have survived (so far) in different peanut genomes often used the same genes as donors. It seems improbable to attribute this to selection, as a gene from the ancestor of a wild diploid peanut species was inherited by two different varieties of tetraploid peanut and mostly consistently acted as a donor. This indicates that a gene acting as a donor in a conversion event in one species is usually also preferred as a donor in another species if it is affected by conversion there. A more plausible explanation is that one gene copy has a "privileged" nature over the other. This could be genetic or epigenetic. If one gene or its neighboring region possesses mutations or epigenetic changes, the other gene might be more likely to act as a donor, helping to reinstate intactness.

Conversion and Genomic Rearrangements
Duplicated genes located near the ends of chromosomes tend to undergo conversion, as has been reported in rice, sorghum, genomes of the genus Oryza and hexaploid wheat [20,21,37,42,45]. However, in peanut genomes, we did not find that duplicated genes near the ends of chromosomes were preferentially converted. If gene conversion is based on interactions between similar DNA sequences, this finding seems unexpected for the following reasons. First, most duplicates are distributed in regions near the ends of chromosomes or far from the centromere [26,68], where the DNA sequences should retain higher similarity, providing suitable conditions for conversion. In contrast, the abundance of repetitive elements near the centromere often increases the frequency of DNA rearrangement and nucleotide variation, which ultimately reduces the sequence similarity between homoeologous chromosomes related to the WGDs. The enrichment of repetitive elements in pericentromeric regions has been demonstrated in many angiosperm genomes, such as rice, sorghum, cotton, soybean and peanut [2,4,6,25,27,43,69,70]. On careful examination, we found that some duplicates located near the terminal regions of chromosomes still showed a preference for conversion; these involved chromosomes 2, 4, 5 and 6 in the tetraploid peanut (Tifrunner) genome, which have maintained a well-preserved ancestral genomic structure (Figure 4B and Supplementary Table S9). We also found that the length of the blocks (measured in colinear gene pairs) may be positively correlated with the conversion rate; that is, well-preserved homoeologous regions showed a higher conversion rate. This suggests that the lack of preferential conversion of duplicates near the chromosomal terminal regions in the peanut genome may be caused by extensive genomic rearrangements after the LCT and PRT events [2,26].
Genomic rearrangements can change the structure of ancestral chromosomes, and the gene collinearity between homologous chromosomes produced by WGDs is often destroyed [71][72][73]. There were more genome rearrangements in peanut genomes than in rice and sorghum genomes relative to the ancestral genomes of their respective families (Legume and Poaceae) [2,26,72]. As a result of polyploidizations, the terminal regions of the ancestral chromosomes may no longer be near the telomeres in the peanut genome. In the future, we can further explore whether the regions that maintain good genomic collinearity and are preferentially affected by conversion correspond to the regions near the ends of ancestral chromosomes.
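The positional analysis discussed in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used to produce the supplementary tables: the function name, the input layout (chromosome, gene midpoint in bp, converted flag), the number of intervals and the toy values are all hypothetical. Each duplicated gene is assigned to an interval according to its distance from the nearest chromosome end, and the conversion rate is computed per interval.

```python
def conversion_rate_by_interval(genes, chrom_lengths, n_bins=5):
    """Return a list of (n_duplicates, n_converted, rate) per interval.

    Interval 0 is closest to a chromosome end; the last interval is the
    chromosome midpoint. `rate` is None for intervals without genes.
    genes: iterable of (chromosome, midpoint_bp, is_converted) tuples.
    chrom_lengths: dict mapping chromosome name to its length in bp.
    """
    totals = [0] * n_bins
    converted = [0] * n_bins
    for chrom, pos, is_converted in genes:
        length = chrom_lengths[chrom]
        dist_to_end = min(pos, length - pos)
        # Integer arithmetic avoids floating-point edge effects at bin borders.
        idx = min((2 * dist_to_end * n_bins) // length, n_bins - 1)
        totals[idx] += 1
        converted[idx] += int(is_converted)
    return [
        (t, c, (c / t) if t else None)
        for t, c in zip(totals, converted)
    ]


# Hypothetical toy input, invented for illustration only.
toy_lengths = {"chr1": 100}
toy_genes = [
    ("chr1", 5, True),    # near the left end
    ("chr1", 95, False),  # near the right end
    ("chr1", 20, True),   # intermediate
    ("chr1", 50, False),  # chromosome midpoint
]
rates = conversion_rate_by_interval(toy_genes, toy_lengths, n_bins=5)
```

Applied to real gene coordinates, a table of this kind would show whether conversion rates decline with distance from the chromosome ends. The same binning could in principle be repeated on reconstructed ancestral chromosome coordinates to examine the question raised above.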

Conversion and Function
Gene conversion causes duplicated gene pairs to be very similar or even identical in sequence, and the presence of duplicate copies may neutralize meaningful mutations and provide opportunities for functional innovation [74]. The evolution of functional genes that are members of large families may often be accompanied by strong purifying selection, as proposed by previous studies [75][76][77][78][79][80][81]. We confirmed that the functions associated with multigene families may be biased toward the occurrence of gene conversion. These results are consistent with previous studies proposing that most multigene families coevolved with related homologous genes through gene conversion [82]. In addition, gene conversion is emerging as a driver of innovation among meiotic drive genes and has likely contributed to their birth and expansion [83,84]. This may be especially true when important components of drive systems consist of segments of DNA that can be copied multiple times within a genome. Here, we found that the effect of conversion on functional genes was inconsistent between the diploid ancestors and tetraploid peanut, and even between different tetraploid peanut cultivars for certain functions; this may be because geographical distribution and artificial domestication caused the two varieties to evolve in different directions.

Conclusions
Duplicated genes in Arachis genomes generated from recursive polyploidizations experienced long-lasting effects from gene conversion. By performing comparative genomics and phylogenetic analyses, we identified the scale and patterns of conversion between duplicates produced by LCT and PRT events during the diversification of the genus Arachis. Gene conversion maintained the similarity of duplicate sequences, provided opportunities for further gene conversion and accelerated the evolutionary rate of Arachis genomes. Chromosome rearrangements after polyploidization are associated with gene conversion events, while well-preserved homoeologous chromosome regions may facilitate the conversion of duplicated genes. The genes involved in functions associated with multigene families may be preferentially converted. We identified specific domains that are involved in converted genes, implying that conversions are associated with important traits of peanut growth and development. The present effort will contribute to understanding the evolution of duplicated genes affected by gene conversion in Arachis genomes.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/genes12121944/s1, Figure S1: The homologous colinearity Ks dotplot within A. duranensis genome, Figure S2: The homologous colinearity Ks dotplot within A. ipaensis genome, Figure S3: The homologous colinearity Ks dotplot within A. monticola A genome, Figure S4: The homologous colinearity Ks dotplot within A. monticola B genome, Figure S5: The homologous colinearity Ks dotplot within A. hypogaea A (Shitouqi) genome, Figure S6: The homologous colinearity Ks dotplot within A. hypogaea B (Shitouqi) genome, Figure S7: The homologous colinearity Ks dotplot within A. hypogaea A (Tifrunner) genome, Figure S8: The homologous colinearity Ks dotplot within A. hypogaea B (Tifrunner) genome, Figure S9: The homologous colinearity Ks dotplot between A. ipaensis and A. duranensis genomes, Figure S10: The homologous colinearity Ks dotplot between A. monticola A and A. monticola B genomes, Figure S11: The homologous colinearity Ks dotplot between A. hypogaea A (Shitouqi) and A. hypogaea B (Shitouqi) genomes, Figure S12: The homologous colinearity Ks dotplot between A. hypogaea A (Tifrunner) and A. hypogaea B (Tifrunner) genomes, Table S1: Number of homologous blocks and gene pairs within a genome or between genomes, Table S2: Number of paralogous and orthologous gene pairs within genome or between studied genomes, Table S3: Gene conversion between PRT-related duplicated genes in A. monticola and A. hypogaea genomes, Table S4: Nucleotide substitution rates of duplicate genes from quartets in studied peanut genomes, Table S5: Nucleotide substitution rates of PRT-related duplicated genes from quartets between peanut genomes, Table S6: The conversion rate of LCT-related duplicated genes and physical location of genes on chromosomes, Table S7: The conversion rate of PRT-related duplicated genes and physical location of genes on chromosomes, Table S8: The converted duplicates of LCT-related in each interval from the terminal, Table S9: The converted duplicates of PRT-related in each interval from the terminal, Table S10: Relationship between gene physical location and gene conversion of LCT-related, Table S11: Relationship between gene physical location and gene conversion of PRT-related, Table S12: Relationship between the block number and gene conversion of LCT-related, Table S13: Relationship between the block number and gene conversion of PRT-related, Table S14: Relationship between the block length (colinear gene pairs) and gene conversion of LCT-related, Table S15: Relationship between the block length (colinear gene pairs) and gene conversion of PRT-related, Table S16: Statistics of the converted and duplicated genes from top four functions from the Shitouqi and Tifrunner genomes, Table S17: The enrichment of domains involved in converted genes from two diploid peanut genomes, Table S18: The enrichment of domains involved in converted genes from Shitouqi genome, Table S19: The enrichment of domains involved in converted genes from Tifrunner genome, Table S20: The enrichment of domains involved in converted genes from the genomes of five Arachis species.