Evaluating the Diversity of Ecotypes of Red Clover (Trifolium pratense L.) from Northwestern Spain by Phenotypic Traits and Microsatellites

For more than 50 years, the CIAM-AGACAL (Agricultural Research Centre of Mabegondo, Xunta de Galicia, A Coruña, Spain) has been conserving the plant genetic resources of ecotypes and natural populations of grassland species from northwestern Spain. The CIAM-AGACAL germplasm bank holds 57 populations of red clover (Trifolium pratense L.), one of the most widely cultivated forage legumes in the world. The goal of the present study was to evaluate the diversity among cultivars and natural red clover populations at the morphological and molecular levels. Twelve polymorphic SSR loci revealed 241 microsatellite alleles, with an average of 20.08 alleles per locus. Two main groups were detected by the Structure software, one including local populations and the other clustering cultivars and related populations. Intra-specific variability was found among cultivars and natural populations. A moderate genetic differentiation of Spanish red clover cultivars (Fst = 0.08) was observed between the two main clusters. Finally, a certain relationship between phenotypic and genotypic variation was detected.


Introduction
Red clover (Trifolium pratense L.) is native to southeastern Europe and grows spontaneously across almost all of the Iberian Peninsula. It has been cultivated as a forage species, first in northern Europe and later worldwide, and is one of the most widely used mowing legumes in temperate climates [1]. Galicia (northwestern Spain) is a region where the agricultural and livestock industry is an important pillar of the economy and of the production and processing of local products. In Galicia, red clover is mainly mown and can be fed green or preserved as silage or hay. It can also be grazed, although trampling by cattle can damage the crown and reduce its persistence. Hence, there is interest in characterizing the natural red clover populations preserved in the CIAM germplasm bank for their possible use in developing commercial varieties with a native genetic base that is better adapted to Galician edaphoclimatic conditions.
The germplasm bank of prairie grasses of the CIAM-AGACAL (Agricultural Research Centre of Mabegondo, Xunta de Galicia) preserves a unique reference collection of prairie species cultivated mainly in the northwest of the Iberian Peninsula; a summary of this collection, multiplication and characterization work can be found in López et al. [2]. In recent years the collection has been expanded through several surveys that added red clover accessions; today a total of 57 natural populations of red clover are preserved, half of which come from Galicia, but most have not been evaluated. One ecotype from this germplasm bank, a local cultivar, is already registered as "Maragato" (Registration number 191500112) but has not yet been commercialized. Local ecotypes can be used to obtain cultivars that are better adapted to specific climatic conditions and more resilient to a changing environment.
To date, great efforts have been made in the collection, multiplication and characterization of germplasm collections, with a view to supplying seed for the development of new varieties from ecotypes adapted to local soil and climatic conditions. There is currently a need to promote the use of this germplasm to obtain varieties for low-input agriculture that reduce production costs and meet market demand, while improving the possibilities for crop diversification (greening) to encourage good environmental practices in crop production and the maintenance of areas that help mitigate climate change and benefit the environment (https://ec.europa.eu/info/food-farming-fisheries/key-policies/common-agricultural-policy/income-support/greening_en, accessed on 16 April 2019).
The aim of this study was to evaluate the phenotypic and molecular diversity of red clover ecotypes from the CIAM-AGACAL, which can play an important role in the development of commercial varieties with an autochthonous genetic base, in the improvement of sustainable agricultural systems in line with the challenges of the Common Agricultural Policy (CAP), and in the mitigation of climate change.

Phenotypic Traits
A field trial was established at the Centro de Investigacións Agrarias de Mabegondo (CIAM, Xunta de Galicia) in Mabegondo, A Coruña, northwestern Spain. Ground preparation consisted of the usual ploughing and rotary tilling and the installation of weed-control mesh; no compost, fertilizer or lime was added. In July 2018, the natural populations (13) and commercial varieties (4) were sown in plug trays in a greenhouse. In September the plants were transplanted to the field in plots of 40 × 40 cm in a randomized complete block design with four replicates (Figure 1). For characterization, the recommendations of the International Board for Plant Genetic Resources [3] and the International Union for the Protection of New Varieties of Plants [4] were followed. For each population and commercial cultivar, the following descriptors were evaluated: FLO: number of days from 1 January 2019 until three heads per plant were flowering in the plot; CRE: growth at flowering, on a visual scale from 1 (little) to 9 (much), scored after the flowering date was recorded; CRF: growth in the year of sowing, on a visual scale from 1 (little) to 9 (much), scored at the end of winter in the year of sowing; HAB: growth habit in early spring before flowering, on a visual scale from 1 (prostrate) to 9 (erect); ENF: tolerance to pests and diseases, on a visual scale from 1 (sensitive) to 9 (resistant). Additionally, the altitude (ALTIT) of the site of origin of each sample was recorded (Table 1).
For statistical analyses, a fixed-effects ANOVA was performed for each variable according to the model $X_{mjk} = \mu + C_m + R_j + (CR)_{mj} + \varepsilon_{mjk}$, where $X_{mjk}$ is the observation of cultivar $m$ ($m$ = 1 to 17) in replicate $j$ ($j$ = 1, 2, 3, 4) and sample $k$ ($k$ = 1 to 30); $\mu$ is the mean of all observations; and $C_m$, $R_j$, $(CR)_{mj}$ and $\varepsilon_{mjk}$ are the effects of cultivar $m$, of replicate $j$, of the cultivar × replicate interaction, and the error associated with sample $k$ in observation $mjk$, respectively.
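As an illustration of how such a model partitions the variation, the following minimal sketch computes the sums of squares and F ratios of a two-way fixed-effects ANOVA with interaction. It assumes a fully balanced design (the same number of samples in every cultivar × replicate cell); the function name `two_way_anova` and the toy scores are hypothetical and are not taken from the study, which does not state which software was used for this analysis.

```python
from itertools import product


def two_way_anova(data):
    """Balanced two-way fixed-effects ANOVA with interaction.

    data[m][j] is the list of observations for cultivar m in replicate j.
    Assumes every cell holds the same number of samples (balanced design).
    Returns (sums of squares, degrees of freedom, F ratios).
    """
    a = len(data)        # number of cultivars
    b = len(data[0])     # number of replicates (blocks)
    n = len(data[0][0])  # samples per cultivar x replicate cell

    cell_mean = [[sum(cell) / n for cell in row] for row in data]
    cult_mean = [sum(row) / b for row in cell_mean]
    rep_mean = [sum(cell_mean[m][j] for m in range(a)) / a for j in range(b)]
    grand = sum(cult_mean) / a

    cells = list(product(range(a), range(b)))
    ss = {
        "C": b * n * sum((cm - grand) ** 2 for cm in cult_mean),
        "R": a * n * sum((rm - grand) ** 2 for rm in rep_mean),
        "CR": n * sum(
            (cell_mean[m][j] - cult_mean[m] - rep_mean[j] + grand) ** 2
            for m, j in cells
        ),
        "error": sum(
            (x - cell_mean[m][j]) ** 2 for m, j in cells for x in data[m][j]
        ),
    }
    df = {"C": a - 1, "R": b - 1, "CR": (a - 1) * (b - 1), "error": a * b * (n - 1)}
    ms = {k: ss[k] / df[k] for k in ss}
    # Fixed-effects model: every effect is tested against the error mean square.
    f = {k: ms[k] / ms["error"] for k in ("C", "R", "CR")}
    return ss, df, f


# Made-up scores: 2 cultivars x 2 replicates x 2 samples per cell.
toy = [[[1, 3], [2, 4]], [[5, 7], [6, 8]]]
ss, df, f = two_way_anova(toy)
```

If the trial were fully balanced with the dimensions given in the text (17 cultivars, 4 replicates, 30 samples), the same function would yield 16, 3, 48 and 1972 degrees of freedom for cultivar, replicate, interaction and error, respectively.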
The amplification conditions were 94 °C for 5 min, followed by 35 cycles of 95 °C for 30 s, annealing at a temperature specific to each multiplex set for 90 s, and 72 °C for 1 min, with a final extension at 60 °C for 30 min.
Amplification products were diluted with water, and 2 µL of the diluted amplification product was added to 0.12 µL of 600LIZ size standard (Applied Biosystems, Foster City, CA, USA) and 9.88 µL of formamide. The allele sizes were detected using Peak Scanner™ software (Applied Biosystems).
A Bayesian analysis was performed with the Structure software [9,10] using the admixture model with unlinked loci and correlated allele frequencies, as described in Pereira-Lorenzo et al. [11] and Porras-Hurtado et al. [12], who recommend a minimum of 20 iterations (30 in this study) to estimate the ancestry membership proportions of a population. We tested K = 1 to 15 unknown reconstructed panmictic populations (RPPs) of genotypes with the options usepopinfo = 0 and popflag = 0, which treat the sampled genotypes as being of unidentified origin. Genotypes were assigned probabilistically to an RPP when their qI (probability of membership) exceeded 80%, while a lower probability indicated an admixed genotype. The second-order change of the likelihood function divided by the SD of the likelihood (∆K) was also estimated with Structure Harvester [14] to find the K value best supported by the data [13]. The inbreeding coefficient (Fis) [15] was calculated with the program GenoDive [16].
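The ∆K criterion described above can be sketched as follows. This is a simplified reading of the statistic, in which the absolute second-order difference of the mean log-likelihood across replicate runs is divided by the standard deviation of the log-likelihood at that K; it is not the Structure Harvester code itself. The function name `delta_k` and the log-likelihood values are made up for illustration.

```python
from statistics import mean, stdev


def delta_k(lnp):
    """Return {K: delta K} from replicate log-likelihoods.

    lnp maps each tested K to the list of Ln P(D) values of its replicate runs.
    Delta K is only defined where both K - 1 and K + 1 were also tested.
    """
    avg = {k: mean(values) for k, values in lnp.items()}
    result = {}
    for k in sorted(lnp):
        if k - 1 in avg and k + 1 in avg:
            second_order = abs(avg[k + 1] - 2 * avg[k] + avg[k - 1])
            result[k] = second_order / stdev(lnp[k])
    return result


# Made-up log-likelihoods for K = 1..4, three replicate runs each.
toy_lnp = {
    1: [-1000, -1002, -998],
    2: [-900, -901, -899],
    3: [-890, -895, -885],
    4: [-885, -895, -875],
}
dk = delta_k(toy_lnp)
best_k = max(dk, key=dk.get)
```

Because the statistic needs the neighbouring values K - 1 and K + 1, it cannot be evaluated at the smallest or largest K tested, which is one reason for testing a range wider than the expected number of groups.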
Similarity relationships among the samples were studied using multivariate analysis techniques. For each ecotype (20 samples) and commercial cv. (10 samples), each allele was coded as a variable taking the value 1 or 0 for the presence or absence of the allele, respectively. Principal components (PCs) were estimated from the variance-covariance matrix of the allele frequencies [17–19] using SPSS V.22.
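The coding and the principal component step can be illustrated with a small self-contained sketch. It is not the SPSS procedure used in the study: the first component is approximated here by power iteration on the sample variance-covariance matrix, and the loci `L1` and `L2`, the allele sizes and the function names are all hypothetical.

```python
def presence_absence(genotypes):
    """Encode diploid genotypes as 0/1 variables, one per (locus, allele)."""
    columns = sorted(
        {(loc, a) for g in genotypes for loc, pair in g.items() for a in pair}
    )
    rows = [[1 if a in g.get(loc, ()) else 0 for loc, a in columns] for g in genotypes]
    return columns, rows


def covariance(rows):
    """Return the sample variance-covariance matrix and the centred data."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centred = [[r[j] - means[j] for j in range(p)] for r in rows]
    cov = [
        [sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(p)]
        for i in range(p)
    ]
    return cov, centred


def first_pc(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    p = len(cov)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary non-zero start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


# Made-up genotypes: allele sizes in bp at two hypothetical loci.
toy = [
    {"L1": (150, 150), "L2": (200, 200)},
    {"L1": (150, 152), "L2": (200, 200)},
    {"L1": (160, 160), "L2": (204, 204)},
    {"L1": (160, 160), "L2": (204, 206)},
]
columns, rows = presence_absence(toy)
cov, centred = covariance(rows)
pc1 = first_pc(cov)
# Score of each genotype on the first principal component.
scores = [sum(c * x for c, x in zip(row, pc1)) for row in centred]
```

In this toy example the first two genotypes receive scores of one sign and the last two of the opposite sign, which is the kind of separation along the first axis that the multivariate analysis is meant to reveal. The sign of an eigenvector is arbitrary, so only the relative positions of the scores are meaningful.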

Genetic and Geographic Structure
A Bayesian analysis using the Structure software [9] was conducted with 12 SSRs to determine the genetic structure among 300 unique genotypes. Two loci harbouring null alleles were not included in this analysis. The ∆K criterion, estimated with Structure Harvester [14], reached its maximum at K = 2 (Figure S1), with 231 of the 300 genotypes (77%) showing qI > 80%. This corresponded to a strong differentiation into two main groups of genotypes (RPPs, reconstructed panmictic populations): one with 91 genotypes (RPP1, 30.33% of the total number of genotypes) including only natural populations (2762, 1806, 1803, 1808, 1809 and 1811), and a second one with 140 genotypes (RPP2, 46.67% of the total number of genotypes) including commercial cvs. and natural populations (Table S3).
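The assignment rule (qI above 80% for membership, otherwise admixed) can be expressed as a short sketch. The membership proportions below are invented for illustration, and the names `assign` and `share` are hypothetical helpers rather than part of the Structure software.

```python
def assign(q_rows, threshold=0.80):
    """Label each genotype with its RPP, or 'admixed' below the threshold.

    q_rows holds one list of membership proportions (qI) per genotype.
    """
    labels = []
    for q in q_rows:
        best = max(range(len(q)), key=lambda i: q[i])
        labels.append(f"RPP{best + 1}" if q[best] > threshold else "admixed")
    return labels


def share(count, total):
    """Percentage of the total, rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Made-up membership proportions for three genotypes at K = 2.
toy_q = [[0.95, 0.05], [0.10, 0.90], [0.60, 0.40]]
labels = assign(toy_q)

# Group sizes reported in the text, out of 300 genotypes.
shares = {name: share(n, 300) for name, n in {"RPP1": 91, "RPP2": 140}.items()}
```

Applying `share` to the group sizes reported in the text gives each group's percentage of the 300 genotypes; the genotypes not counted in either group are those labelled as admixed.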
The numbers of alleles were 152 and 199 for RPP1 and RPP2, respectively, of which 98 and 134 were rare (allele frequency p < 0.05).
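Counting rare alleles can be sketched as below, assuming that "rare" refers to an allele frequency below 0.05 within a group. The genotypes and the locus name `L1` are made up, and the helper names are hypothetical.

```python
from collections import Counter


def allele_frequencies(genotypes, locus):
    """Relative frequency of each allele at one locus in diploid genotypes."""
    counts = Counter(allele for g in genotypes for allele in g[locus])
    total = sum(counts.values())
    return {allele: c / total for allele, c in counts.items()}


def rare_alleles(freqs, cutoff=0.05):
    """Alleles whose frequency is strictly below the cutoff."""
    return sorted(a for a, f in freqs.items() if f < cutoff)


# Made-up group of 20 diploid genotypes (40 allele copies) at one locus.
toy = (
    [{"L1": (150, 150)}] * 15
    + [{"L1": (150, 152)}] * 4
    + [{"L1": (152, 160)}]
)
freqs = allele_frequencies(toy, "L1")
rare = rare_alleles(freqs)
```

In this toy group, allele 160 occurs once in 40 copies (frequency 0.025) and is the only one classed as rare.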

Factorial Component Analysis (FCA)
FCA showed results congruent with the Bayesian method, with the two RPPs differentiated along the first axis: RPP1 on the negative side of PC1 and RPP2 on the positive side, with the admixed accessions in between (Figure 2). Fst between RPP1 and RPP2 was 0.086 (p < 0.001).
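To make the meaning of Fst concrete, the sketch below computes a simple heterozygosity-based version, Fst = (Ht - Hs) / Ht, for two groups at one locus, weighting both groups equally. The text does not state which estimator produced the reported value, so this is only an assumed, simplified form; the allele frequencies and function names are hypothetical.

```python
def expected_het(freqs):
    """Expected heterozygosity, 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def fst_two_groups(freqs_a, freqs_b):
    """Heterozygosity-based Fst for two equally weighted groups at one locus.

    Undefined (division by zero) if the pooled locus is monomorphic.
    """
    alleles = set(freqs_a) | set(freqs_b)
    pooled = {a: (freqs_a.get(a, 0.0) + freqs_b.get(a, 0.0)) / 2 for a in alleles}
    h_t = expected_het(pooled)                                 # total
    h_s = (expected_het(freqs_a) + expected_het(freqs_b)) / 2  # within groups
    return (h_t - h_s) / h_t


# Made-up allele frequencies at one locus in two groups.
group_a = {150: 0.7, 160: 0.3}
group_b = {150: 0.3, 160: 0.7}
fst = fst_two_groups(group_a, group_b)
```

With these toy frequencies the within-group heterozygosity is 0.42 and the pooled heterozygosity is 0.50, so Fst is 0.16; two groups with identical frequencies would give 0.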

Genetic and Geographical Origin
When the different populations were represented in the FCA, introgressants into the natural populations and the different levels of purity of the samples were clearly observed (Figure 3). The populations most differentiated (RPP1) from the commercial cvs. (RPP2.1) were from Asturias and Galicia, at altitudes below 1000 masl (Figure 4), whereas natural populations from RPP2.2 were found at altitudes between 400 and 1500 masl.

Genetic and Phenotypic Variation
When the genotypic groups (K = 2 and K = 3) were overlaid on the phenotypic traits (Table 1), FLO and ENF showed significant variation (p < 0.05) for K = 2, and CRF (p < 0.01), HAB (p < 0.01) and ENF (p < 0.05) for K = 3. RPP1, with the exception of population 1803, showed lower FLO than RPP2. RPP1 and RPP2.2 had lower CRF and HAB and higher ENF than RPP2.1. Finally, RPP2.2 showed significantly higher (p < 0.05) ALTIT than RPP1 and RPP2.1. When the three main significant sources of phenotypic variation were represented for K = 3, two main groups could be differentiated: those related to commercial cultivars (RPP2.1), with higher CRF and HAB and lower ENF, and the local cultivars with the opposite phenotypic characteristics (RPP1 and RPP2.2) (Figure 5).

Discussion
The SSRs used in this study revealed higher genetic variation than reported in other studies on red clover, with twice the average number of alleles per locus found in red clover in Ukraine [20] and in red clover populations from the NPGS-USDA core collection [21]. Genetic differentiation between RPP1 and RPP2 (Fst = 0.08) was higher than that found in Ukraine (Fst = 0.07) between the two main populations detected by the Structure software [20].
The Bayesian method, FCA and Fst values indicated that some natural populations, those clustering in RPP2, derive from commercial cultivars, whereas those in RPP1 have an independent origin. Genetic differentiation between the two main clusters (Fst = 0.086) was of the same order as that found in some other crops, such as Brassica rapa subsp. rapa L. (0.100) [22] and wheat (0.132) [23], and lower than that of lupine (0.179) [24].
Genetic differentiation between clusters was reflected in some phenotypic traits, such as FLO, CRF, HAB and ENF, as has been found in other crops such as pear [25] and cowpea [26].
RPP2 showed higher diversity, most likely due to the different origins of the commercial cultivars, with 19.5% more alleles than RPP1 and more than the NPGS-USDA core collection [21].
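The allele counts reported in this study can be cross-checked with simple arithmetic. The text does not define the base of the 19.5% figure; one reading that reproduces it, offered here only as an assumption, expresses the difference between the two groups relative to the 241 alleles found overall rather than relative to the RPP1 count.

```python
# Allele counts reported in the text.
total_alleles = 241
loci = 12
alleles_rpp1 = 152
alleles_rpp2 = 199

# Average number of alleles per locus.
mean_per_locus = total_alleles / loci

# Difference between groups, under two possible denominators (assumption:
# the source does not say which one underlies the reported percentage).
difference = alleles_rpp2 - alleles_rpp1
extra_vs_total = 100 * difference / total_alleles
extra_vs_rpp1 = 100 * difference / alleles_rpp1

print(round(mean_per_locus, 2), round(extra_vs_total, 1), round(extra_vs_rpp1, 1))
```

The two denominators give clearly different percentages, so the base should be stated whenever the figure is quoted.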
Our results differentiated the populations related to commercial cultivars from others probably derived from the wild. This differs from the situation in Ukraine [20], where no geographical differentiation was detected because of the genetic relationships between the genotypes and the original populations used in the breeding programmes, which promoted free cross-pollination.

Conclusions
We found a group of natural populations not related to commercial cultivars that can provide local cultivars with specific agronomic traits for production and plant breeding.
Genetic and phenotypic variation were in general related, with the most differentiated group having lower CRF and HAB and higher ENF, which can make selection for agronomic traits difficult. However, population 1803 showed phenotypic traits like those of the commercial group but with a different genetic background. SSRs indicated that most of the natural populations evaluated need recurrent selection to reduce the number of admixed accessions, which indicate natural hybridisation with close relatives. Moreover, SSRs can be used as a fast tool to remove introgressants and homogenize the natural populations found in northern Spain.