Genome-Wide Identification and Development of LTR Retrotransposon-Based Molecular Markers for the Melilotus Genus

Melilotus is an important genus of legumes with industrial and medicinal value, partly due to the production of coumarin. To explore the genetic diversity and population structure of Melilotus, 40 accessions were analyzed using long terminal repeat (LTR) retrotransposon-based markers. A total of 585,894,349 bp of LTR retrotransposon sequences, accounting for 55.28% of the Melilotus genome, were identified using bioinformatics tools. A total of 181,040 LTR retrotransposons were identified and classified as Gypsy, Copia, or another type. A total of 350 pairs of primers were designed for assessing polymorphisms in 15 Melilotus albus accessions. Overall, 47 polymorphic primer pairs were screened for their availability and transferability in 18 Melilotus species. All the primer pairs were transferable, and 292 alleles were detected at 47 LTR retrotransposon loci. The average polymorphism information content (PIC) value was 0.66, which indicated that these markers were highly informative. Based on unweighted pair group method with arithmetic mean (UPGMA) dendrogram cluster analysis, the 18 Melilotus species were classified into three clusters. This study provides important data for future breeding programs and for implementing genetic improvements in the Melilotus genus.


Introduction
Transposons, including retrotransposons or DNA transposons, are mobile genetic elements that are common in the genomes of eukaryotes [1]. A single, open reading frame (ORF) in which Gag and Pol are fused is common to both Ty1-copia and Ty3-gypsy elements, although many retroelements have an extra ORF with an unknown function [2]. Long terminal repeat (LTR) retrotransposons that possess two long terminal repeats are the most abundant group of transposons in plants [3], and they are particularly abundant in species with large genomes [4]. Moreover, LTR retrotransposons play key roles in plant phenotype variations and in the evolution of genome structure and function [5][6][7][8]. Mascagni et al. studied the relationship between changes in LTR retrotransposon abundance and the evolution of a genus and confirmed that LTR retrotransposons have continued to evolve during speciation [9]. Barghini et al. investigated LTR retrotransposon dynamics in the evolution of the olive (Olea europaea) genome and found that retrotransposon activity has impacted the olive genome structure in more ancient times than in other angiosperms [10]. Copia and Gypsy are the two main subgroups of LTR retrotransposons, and the major structural difference between the Copia and Gypsy groups is based on the order of reverse transcriptase (RT) and integrase [11].
Given the abundant, ubiquitous, and transcriptionally active retrotransposons in plant genomes, many molecular marker systems have been developed to exploit insertional

Identification and Analysis of LTR Retrotransposons in the Melilotus Genome
The number of identified LTR-RTs was 181,040 in the Melilotus genome (Table S1), with 101,240 belonging to the Ty3-gypsy group, 77,935 belonging to the Ty1-copia subgroup, and 1865 belonging to the other subgroup. In all, 168,428 LTR-RTs were successfully mapped to eight chromosomes of Melilotus. The maximum LTR-RTs was found on chromosome 2 (∼1.38%), and the minimum was observed on chromosome 4 (∼1.18%) ( Figure S1). Their total length was 585,894,349 bp, which accounted for 55.28% of the Melilotus genome. In total, 350 primer pairs were designed on the basis of RBIPs, IRAPs, ISBPs, and REMAPs (Table 1).

Amplification with LTR Primers in Melilotus
To amplify the genomic DNA of four accessions (Acc3, Acc5, Acc6, and Acc7) that were randomly selected for primer identification, 350 LTR-RT primer pairs were initially used. The total number of primer pairs that generated amplification products was 320, and 79 primer pairs showed polymorphism in four accessions. The polymorphic primers were chosen for further screening using 15 M. albus accessions, and each accession included four individual plants ( Figure S2). Among the primer pairs, 47 produced bands and revealed polymorphisms, including 34 pairs of RBIP primers and 13 pairs of IRAP primers ( Figure 1, Table S2). Additionally, to confirm the authenticity and accuracy of the PCR amplification bands obtained, the electrophoretic bands produced by primer pair Ma_LTR_302, which amplified variant alleles ranging from 411 to 426 bp in the seven Melilotus species, were sequenced. The primer pair Ma_LTR_302 belongs to the RBIP type, with the forward primer located outside of the LTR in the surrounding genome sequence of the LTR-RTs and the reverse primer located in the LTR regions of the LTR-RTs ( Figure 2). We found that the flanking sequence inserted by LTR-RT is the main cause of polymorphisms, which is in accordance with previous studies [12].   Type of  LTR-RTs  585,894,349  Ty1/Copia  77,935  RBIP  232  Ty3/gypsy  101,240  IRAP  105  Unknown  1865  ISBP  10  REMAP  3 RBIP: retrotransposon-based insertion polymorphisms, IRAP: inter-retrotransposon amplified polymorphisms, ISBP: insertion-site-based polymorphisms, REMAP: retrotransposon-microsatellite amplified polymorphisms.

Amplification with LTR Primers in Melilotus
To amplify the genomic DNA of four accessions (Acc3, Acc5, Acc6, and Acc7) that were randomly selected for primer identification, 350 LTR-RT primer pairs were initially used. The total number of primer pairs that generated amplification products was 320, and 79 primer pairs showed polymorphism in four accessions. The polymorphic primers were chosen for further screening using 15 M. albus accessions, and each accession included four individual plants ( Figure S2). Among the primer pairs, 47 produced bands and revealed polymorphisms, including 34 pairs of RBIP primers and 13 pairs of IRAP primers ( Figure 1, Table S2). Additionally, to confirm the authenticity and accuracy of the PCR amplification bands obtained, the electrophoretic bands produced by primer pair Ma_LTR_302, which amplified variant alleles ranging from 411 to 426 bp in the seven Melilotus species, were sequenced. The primer pair Ma_LTR_302 belongs to the RBIP type, with the forward primer located outside of the LTR in the surrounding genome sequence of the LTR-RTs and the reverse primer located in the LTR regions of the LTR-RTs ( Figure 2). We found that the flanking sequence inserted by LTR-RT is the main cause of polymorphisms, which is in accordance with previous studies [12].

Transferability of the Newly Developed LTR Retrotransposon-Based Markers
In the first step, 79 pairs of primers were selected and further screened in 15 M. albus accessions. In total, 47of the 79 LTR primer pairs showed polymorphisms among 15 M. albus accessions. A total of 182 alleles were obtained, and 3 to 7 alleles were observed per locus, with an average of 3.96. Primer 229 had the largest number of polymorphic bands (7) and highest polymorphism information content (PIC) value. The average expected heterozygosity (He) was 0.61, ranging from 0.42 to 0.8. The PIC values were between 0.38 and 0.77, with a mean of 0.54 (Table 2).

Transferability of the Newly Developed LTR Retrotransposon-Based Markers
In the first step, 79 pairs of primers were selected and further screened in 15 M. albus accessions. In total, 47of the 79 LTR primer pairs showed polymorphisms among 15 M. albus accessions. A total of 182 alleles were obtained, and 3 to 7 alleles were observed per locus, with an average of 3.96. Primer 229 had the largest number of polymorphic bands (7) and highest polymorphism information content (PIC) value. The average expected heterozygosity (He) was 0.61, ranging from 0.42 to 0.8. The PIC values were between 0.38 and 0.77, with a mean of 0.54 (Table 2).
In this study, 47 newly developed LTR-RT markers generated polymorphic bands in 40 accessions of 18 Melilotus species. A total of 292 alleles were obtained at the 47 transferable LTR-RT-based markers in 18 species, and 3 to 15 alleles were observed per locus, with an average of 6.21 (Table S2)    In this study, 47 newly developed LTR-RT markers generated polymorphic bands in 40 accessions of 18 Melilotus species. A total of 292 alleles were obtained at the 47 transferable LTR-RT-based markers in 18 species, and 3 to 15 alleles were observed per locus, with an average of 6.21 (Table S2) Table S3).

Outlier Detection
We successfully tested a total of 47 polymorphic LTR-RT markers in 18 Melilotus populations. We used the BayeScan 2.

Cluster and Population Structure Analysis
The unweighted pair group method with arithmetic mean (UPGMA) dendrogram showed that the 18 Melilotus species were divided into three clusters ( In the dendrogram constructed, bootstrap values ranged from 36% to 100% between clusters, and the average bootstrap value observed in this study was 75%. Cophenetic correlation analysis was carried out to confirm the grouping pattern of the Melilotus species ( Figure S3). The correlation test results showed that the correlation coefficient r was equal to 0.817 ( Figure S3B). Furthermore, 15 individual plants of M. albus were classified into four main clusters ( Figure S4). The cluster analysis showed that single plants from an accession were clustered together, and the genetic similarity coefficients of 15 germplasms ranged between 0.75 and 0.95, thus revealing their close genetic relationships.
In accordance with the observed optimal goodness of fit (K = 2), the Bayesian clustering model, which was carried out with STRUCTURE software on all individuals and run for K = 1-11, divided the 40 evaluated accessions belonging to 18 Melilotus species into two groups ( Figure 5 Genetic variation within and among the Melilotus were determined by analysis of molecular variance (AMOVA). All species were analyzed to be a single group by AMOVA (Table 4). According to the AMOVA results, there were highly significant differences (p < 0.001) in genetic differentiation among species and within species. Of the total genetic variance, 45.19% was due to differences among species, and 54.81% was due to differences within species. Therefore, the results showed significant genetic differences among the 18 Melilotus species of the one group.   Table 5.  Table 5.

Discussion
Molecular markers based on LTR retrotransposons show broad applications in genetic mapping, genetic diversity assessment [28,29], phylogenetic evolution analysis [30], and variety identification [31]. Additionally, compared with traditional phenotypic markers, molecular markers are more efficient, accurate, and reliable for differentiating varieties and closely related species [32]. Nevertheless, relatively few molecular markers have been found in numerous non-model plants, including Melilotus, which considerably limits the genetic research on these species. In previous studies, the genetic diversity of Melilotus has been investigated using different molecular markers, such as simple sequence repeats (SSRs) [26,33] and expressed sequence tags-simple sequence repeats (EST-SSRs) [27,34]. In plant research, retrotransposons play a major role in genome evolution [35], and the presence of a very large number of error-prone retrovirus replications can lead to accumulation of genetic variations [36]. Ramakrishnan et al. deployed retrotransposon-based markers to reveal the genetic diversity and population structure of Asian bamboo [37]. However, LTR-RT-based molecular markers, which are remarkable tools in detecting genetic diversity, have not previously been used in Melilotus.
In this study, 181,040 LTR retrotransposons were used to develop 350 LTR primer pairs for PCR amplification. In total, 47 of the 79 LTR primer pairs showed polymorphisms among 15 M. albus accessions, which indicated that this type of molecular marker can distinguish 15 M. albus accessions well (Figures S2 and Figure S3). The high levels of polymorphism observed may be due to the M. albus materials selected for screening the primers. In addition, the polymorphic primer pairs were screened to validate the availability and transferability of LTR-RTs in a panel of 40 Melilotus accessions, which yielded 292 clear strong bands. The average number of alleles per primer pair found in this study was higher than the number of EST-SSRs [27,38], while 265 (90.75%) of 292 bands were polymorphic and showed the potential for use in genotyping. Moreover, all of the 47 primer pairs could amplify products successfully in most species and showed stable transferability, exhibiting a higher transferability rate than that obtained in chokecherry (Prunus virginiana L.) [39] and that from Melilotus EST-SSR primers [27]. Molecular markers are feasible for evaluating genetic diversity in plant species [40]. The genetic diversity revealed at LTR-RT loci was supported by high values of He and PIC. In this study, the mean He and PIC were 0.70 and 0.66, respectively. Primer 283 showed a stronger ability to discriminate genotypes due to its high PIC value (0.84). The primer 93 marker showed a lower PIC value (0.35), suggesting that this primer had less discriminatory ability in the present study. These results clearly suggest that more diverse LTR-retrotransposon marker loci can be identified and effectively applied to breeding programs to obtain the plant types desired for commercial cultivation. The I value can be used to evaluate the level of genetic diversity in a population, where the greater the I is, the greater the genetic diversity [41]. Yan et al. [27] obtained a lower average I value (0.0670) than this study (0.0750) using EST-SSR markers, indicating that the Melilotus accessions evaluated in this study were more diverse. In addition, the EST-SSR marker analysis did not consider mononucleotide repeats because of the difficulty of distinguishing single nucleotide repeats from polyadenylation products and single nucleotide stretch errors produced by sequencing [27].
In previous molecular phylogenetic analyses [25], 18 Melilotus species were classified into clade I and clade II (Table 5). An SSR maker analysis showed that all Melilotus species were clustered into A and B [26], which were further divided into A1 and B1, respectively (Table 5). An EST-SSR marker analysis showed that the 18 Melilotus species were grouped into I, II, and III [27]. According to the UPGMA cluster analysis involving the LTR-RT markers conducted in this study, the 18 species were grouped into three clusters. Among the 11 species in cluster I, 10, 10 and 8 of the species were consistent with clade I, I and A, respectively. Except for M. segetalis, the GIII of the phylogenetic trees was consistent with II. Cluster II contained the remaining species, which was consistent with III. As the primers increased, the results became more similar. Melilotus germplasms from different countries clustered together, indicating that kinship has a greater impact on the genetic structure than the location provenance used in the present study.
The detection of natural selection signatures within a genome can reveal which genes are under the influence of natural selection. It is possible to identify loci with an atypical variation pattern (outlier loci) by comparing the genetic diversity of loci across the genome, which is likely to be affected by selection. Outlier loci can better explain the adaptive genetic variation that is not accounted for by neutral loci [42]. Although a large number of loci were revealed in this study, less than 5% were identified as outliers (Table S3). Additionally, the results obtained with the LTR-RT markers were similar to those with the microsatellite (SSR) method, which revealed different sizes of DNA fragments by electrophoresis. Although LTR retrotransposons are highly heterogeneous in plants, homoplasy is also obviously important in evaluating phylogenetic relationships among species. However, microsatellites follow a stepwise mutation process, considering allele size difference, and it may be more suitable to use microsatellite data analysis [43].

Plant Materials, Genomic DNA Isolation, and PCR Primer Design
A total of 15 accessions of Melilotus albus were utilized to screen polymorphic LTR-RT-based markers and evaluate genetic diversity (Table S4). In total, 40 accessions of 18 Melilotus species were obtained to assess the newly developed LTR-RT-based markers for their transferability ( Table 6). The seeds of the accessions utilized in this study were obtained from the National Plant Germplasm System (NPGS, Beltsville, MD, USA). Total genomic DNA was extracted using the sodium dodecyl sulfate (SDS) method [44]. The quality of the isolated DNA was assessed in 1% agarose gels and by a NanoDrop spectrophotometer (ND-1000, Thermo Scientific™, Waltham, MA, USA). The DNA concentrations were normalized to 20 ng/µL for polymerase chain reaction (PCR).

Identification of LTRs
Our research group completed the whole-genome sequencing of diploid M. albus, with a genome size of approximately 1.04 Gb (BioProject ID: PRJNA674670). The M. albus LTR sequences (LTRs) were identified using the 'RepeatMasker' tool on 20 December 2018 (http://www.repeatmasker.org/cgi-bin/WEBRepeatMasker). The 'cross_match' search engine was applied, and 'Rice' was specified as the DNA source. Other software parameters were run according to the default options. In addition, the LTR primers were designed using DNAMAN Version 6.0 (Lynnon Corporation, San Ramon, CA, USA) (Table S5).

Primer Selection and PCR Conditions
Four accessions of M. albus were randomly selected and used to screen 350 LTR-RT markers by PCR amplification. Amplification was conducted in a 10 µL reaction solution that included 1.0 µL of genomic DNA (20 ng/µL), 4.95 µL of 2× reaction mix (500 µM dNTP, 20 mM Tris-HCl, 100 mM KCl, 3 mM MgCl 2 ), 2.0 µL of double-distilled water, 1.0 µL of each LTR primer (4 µM each), and 0.05 µL of 2.5 U/µL Golden DNA Polymerase. The PCR program consisted of a pre-denaturation of 3 min at 94 • C, followed by 35 cycles of 94 • C for 30 s, 50-60 • C for 30 s, and 72 • C for 30 s, and finally, an extension cycle of 7 min at 72 • C. The PCR products were separated in 6.0% nondenaturing polyacrylamide gels (400 V, 1.5 h) for visualization under UV Imager Gel Doc XR+ system lights (Bio-Rad, Hengshui, Hebei, China).

Sequencing of PCR Amplification Products
PCR was conducted in a 25.2 µL volume consisting of 10 µL of 2× reaction mix, 0.2 µL of Golden DNA Polymerase, 8.0 µL of ddH 2 O, 3.0 µL of genomic DNA (20 ng/µL), and 2.0 µL of each primer. The PCR products were sent to a commercial company (Shanghai Sangon Biological Engineering Technology, Shanghai, China) for sequencing (ABI 3730 DNA sequencer). Sequence information was used for multiple sequence alignment using DNAMAN.

Data Analysis
The number of amplified bands was recorded to construct a "0, 1" binary matrix. The polymorphism information content (PIC) and expected heterozygosity (He) were calculated as reported previously [45]. BayeScan.V.2.1 software [46] can be used to detect genetic markers under selection using differences in allele frequencies between populations. The default parameters given in the program were used. The POPGENE 32 software was used to calculate the percentage of polymorphic loci (PPL), number of polymorphic loci (NPL), number of observed alleles (N A ), number of efficient alleles (Ne), Nei's (1973) diversity index (h), and Shannon's information index (I) [47]. Analysis of molecular variance (AMOVA) was carried out using the method of Arlequin suite version 3.5 [48]. The UPGMA dendrogram was produced using Free Tree V.9.1.50 software [49]. To assess the reliability, the field repetition count we used was 100. The phylogram was visualized by TreeViewX V.5.0 software [50]. In addition, the Cophenetic correlation test was carried out by using a matrix comparison plot in NTSYSpc.V.2.1. To subdivide the individuals into different subgroups, a Bayesian clustering analysis was carried out in STRUCTURE 2.3 software [51]. Because of the estimated 'log probability of data' [LnP(D)] of STRUCTURE overestimating the number of subgroups [52], we used the ad hoc measure K [53] to estimate the number of groups. Values of K to explore were chosen according to [53] and were calculated in Excel tables. It was run for K = 1-11 with the admixture model, and a total of 20 independent runs were set for each K value and for each run.

Conclusions
In conclusion, 350 pairs of LTR retrotransposon primers for identifying the molecular markers were designed for M. albus. Overall, 47 primer pairs showed high polymorphism among 15 M. albus accessions, and all of the polymorphic primer pairs showed transferability among 40 accessions of 18 Melilotus species. In addition, the origin of the accessions did not have an influence on the genetic structures in the 18 Melilotus species. Our results suggest that the markers investigated here will be useful for studying the genetic diversity, population structure, and germplasm of Melilotus species.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/plants10050890/s1, Table S1: The information of being identified LTR-RTs in the Melilotus genome. Table S2: Primer sequence, allele size range and polymorphism information for 47 LTR retrotransposon loci among 18 Melilotus species. Table S3: Detection of outlier loci using BayeScan. Table S4: List of M. albus accessions used for LTR-RT molecular marker validation. Table S5: Information of LTR primers designed using DNAMAN software. Figure S1: Distribution of the ratio of LTR retrotransposons on eight chromosomes of Melilotus. Figure S2: The result of PAGE (polyacrylamide gel electrophoresis) of primer 25 in 15 M. albus accessions. Figure S3: Matrix comparison plot for cophenetic correlation test. Figure S4: Dendrogram of 15 M. albus by the UPGMA cluster analysis based on LTR-RT analysis.