In-Depth Characterisation of Common Bean Diversity Discloses Its Breeding Potential for Sustainable Agriculture

: Legumes’ cultivation contributes services to agro-ecosystems and society, in line with the principles of sustainability. Among pulses, the common bean is one of the most important sources of plant proteins and other important nutrients for humans. Extensive phenotypic and genetic characterisations of unexplored bean germplasm are still needed to unlock its breeding potential. To the purpose, a panel of 192 diverse genotypes, mainly developed starting from European landrace accessions, was characterised for relevant morpho-phenological traits; a partially replicated experimental design was used. For each quantitative trait, Best Linear Unbiased Predictors and broad-sense heritability were estimated. The screened panel revealed a high level of diversity for most of the measured traits, especially for days to ﬂowering and hundred-seed weight. The same material was also characterised by means of double-digest Restriction-site Associated DNA; a high number of SNP markers were successfully produced. The genotyping allowed understanding the ﬁne genetic structure of the panel. Genetic information was also used to study morpho-phenological traits considering di ﬀ erent genetic groups existing within the panel. At the same time, genotypes characterised by favourable traits were identiﬁed. The availability of such collection with its extensive characterisation, make this material an excellent resource for common bean improvement.


Introduction
The increased use of chemical fertilizers, pesticides, and herbicides, and an overall simplification of agricultural systems in the past decades have significantly reduced the biological diversity of our agro-ecosystems (the level of above and below-ground diversity in terms of number of macro-and micro-organisms) [1,2], impairing beneficial effects that biodiversity has on crop productivity [3], health [4], and maintenance of agro-ecosystem services for future generations [5]. Overall, agro-ecosystems resilience and long-term sustainability of agriculture have been negatively affected, leading to an increased pollution of the environment [6].
Due to their Biological Nitrogen Fixation (BNF), control of weeds [7] and services given to other components of agro-ecosystems (e.g., feeding pollinators as such mitigating their decline [8,9] or suppressive catch and green manure crop [10]) legumes-grown in rotation, mixture, or association with other crops-can certainly contribute to a sustainable improvement of agro-ecosystems [11]. A wise exploitation of legume BNF ability can reduce: (i) use of nitrogen chemical fertilisers, (ii) emissions of greenhouse gases (GHG)-occurring directly as consequence of farming activities or indirectly by production and transport of fertilisers (e.g., GHG emissions in the source category "agriculture" accounted for 10% of total EU GHG emissions in 2011 [12]) and (iii) nitrogen leaching and consequent water pollution. As such, legumes can effectively play a key role in the diversification and sustainable intensification of contemporary agriculture under the current climate change scenario. In this regard, it should also be also noted that the nitrogen cycle has been quoted among the planet's safe boundaries [13,14].
The use of legumes in agriculture greatly declined in the past and especially in Europe which presently needs to import food and feed plant proteins with negative repercussions on its trade balance [15] that in 2013 was estimated as high as 70% of needs [16]. In addition, although a regular consumption of plant proteins has been always recommended by physicians, the habit of using legumes in the diet has recently dropped down as well as the knowledge on how to use legumes in food preparations [17]. However, starting from 2013, recent European agricultural policies favoured significant productive growth of feed and food legumes in Europe. The latest European report "on the development of plant proteins in the European Union" claims that the increasing cultivation of field pea and fava bean drove this growth; as such food pulses have tripled cultivation area and overall production in Europe [18]. On the other hand, the same document also reports that, despite this positive trend, imports of other pulses such as common bean have slightly increased, putting Europe in a competitive disadvantage towards other countries [18].
Presently, the world is facing an increased demand in legume products for feeding animals and humans due to the: (i) continuous demographic growth, (ii) scarce soil availability and fertility decrease (that, in turn, hampers the possibility to sustain this demographic growth), (iii) increased awareness that restoring more diverse agricultural systems is a safe means to contrast crop losses due to climate change and to contribute long-term sustainability of agriculture [12], (iv) shift of people dietary habits towards use of vegetable-instead of animal-derived proteins for ethical, health, environmental, cultural, aesthetic or economic reasons. This demand is especially relevant and high in Europe where plant protein production does not meet internal market demand [16].
Among legumes, pulses are the most important source of plant protein and other nutrients for humans [19]. Common bean (Phaseolus vulgaris L.)-a diploid (2n = 2x = 22), annual, predominantly self-pollinating species [20]-is one of the most important pulses worldwide. The cultivated forms of this species were domesticated from wild relatives in Mesoamerica and Andes mountains, giving rise to two distinct genepools [21,22]. Both genepools were then introduced in other continents where different environmental pressures, biotic and abiotic stresses, farmer preferences as well as between-genepool occasional crosses and/or initial genetic bottleneck resulted in a complex genetic structure of the bean germplasm [23][24][25][26][27].
Because of its great variability in terms of plant physiology and architecture, seed characteristics (e.g., size, shape and colour), relative duration of the reproductive cycle and many other qualitative and quantitative traits [28][29][30][31][32][33][34], nowadays, the common bean cultivation spans a wide range of cropping systems and environments [35], over an area of about 18 million hectares with a total production of 12 million tons per year [36,37]. Overall, it is mainly cultivated for its dry grain; however, the production of fresh grain (shell beans) and pod (snap beans) is also important, especially the latter in Europe [38]. In many developing regions of the world, common bean still is a major source of proteins and other nutrients such as biologically important minerals, thiamine and folic acid [39].
In the past, scarce research and development efforts limited the improvement of varieties and farming practices able to enhance use and production of common bean in comparison to other major crops such as maize and soybean [40]. Nowadays, renewed research efforts are carried out paying special attention to obtain regular production by increasing the ability of this species to tolerate and/or escape major factors limiting its yield: water deficiency, heat stresses as well as other biotic and abiotic Sustainability 2019, 11, 5443 3 of 20 stresses [40]; in addition, productivity under low-water availability, reduced application of chemicals and enhanced BNF would also contribute to attain a more sustainable bean production. In this context, research and linked breeding efforts need to better explore the great within-species diversity of common bean, also taking advantage of the new molecular and breeding technique advances [40].
Germplasm collection and characterisation are pivotal for breeding and many other research activities. A direct and simple way to have access to germplasm is asking Institutions formally devoted to store and provide it to users. The world largest common bean collections are presently held at the International Center for Tropical Agriculture (CIAT, Colombia), the United States Department of Agriculture (USDA), the Brazilian Agriculture Corporation (EMBRAPA), the National Institute of Forestry, Agriculture and Livestock Research (INIFAP, Mexico), the Leibniz Institute of Plant Genetics and Crop Plant Research (IPK, Germany) and are mostly composed of landraces. These collections, although largely explored and characterised for several traits [26,29,[41][42][43][44][45], still contain large untapped diversity.
Recent developments of high-throughput sequencing methods, coupled with a significant reduction of genotyping costs, opened new possibilities for molecular characterisation of common bean. Techniques such as double-digest Restriction-site Associated DNA (ddRAD-seq) [46] generate high number of DNA-based markers useful to study and dissect genetic architecture, within-species genetic diversity, localise and identify genes controlling relevant traits. Knowledge generated through the application of such technologies will potentially drive quick advancement of common bean breeding. In applying deep genotyping to a collection of germplasm, it is highly recommended to use homozygous genotypes (pure lines); this is particularly relevant when molecular and morpho-phenological data are jointly analysed. If the collection to be assessed is made of landraces, some preliminary work to obtain lines is often needed even if working with a predominantly autogamous species such as common bean. In fact, landraces are composed of different genotypes and, possibly, not of all homozygous genotypes, due the occurrence of occasional cross-pollinations. To develop panels of homozygous diversity suitable to deep genotyping, Single Seed Descent (SSD) can be usefully applied [47,48].
The main purpose of our work is to make available useful genetic and morpho-phenological information of a common bean panel that includes a relevant portion of the European diversity of this species; such information would possibly facilitate future development of valuable materials for sustainable agriculture. To the purpose we: (i) produced a useful panel of highly homozygous common bean genotypes; (ii) dissected its genetic structure through next-generation sequencing genotyping and (iii) performed an extensive phenotypic characterisation, evaluating meaningful quantitative and qualitative traits, accounting for the genetic architecture and diversity of the panel.

Plant Material
The common bean diversity panel developed by the Department of Agricultural, Food and Environmental Science of the University of Perugia (DSA3), described by Raggi and colleagues [49], was characterised. Briefly, it encompasses 192 common bean homozygous genotypes obtained from landraces (179) and cultivars (13) through five successive generations of SSD under isolated conditions. Plant isolation was applied to avoid cross-pollination events and to reduce risks of viral, bacterial and fungal diseases. Plants were grown in pots (40 cm diameter) in a net covered nursery supplied with an automatic drip-irrigation system.
Accessions from which the homozygous genotypes were developed can be grouped into three main geographical areas according to their origin: 153 from Europe, 22 from South America and 17 from Central America. Italy is the most represented country in the panel with 53 accessions; the remaining 139 accessions are from other 33 different countries ( Figure 1). Further details on the studied genotypes are reported in Table S1. Geographical heatmap representing distribution and number of accessions included in this study (i.e. 192 common bean accessions). Pie charts report the percentage of the two main genetic groups: magenta (Mesoamerican genepool) and yellow (Andean genepool) as from results of STRUCTURE analysis. Adapted from Raggi and colleagues [49].

Genotyping
DNA of the 192 lines constituting the diversity panel was successfully extracted from young leaf tissues and genotyped using double digest Restriction-site Associated DNA (ddRAD) on an Illumina Hi-Seq 2500 platform as described by Raggi and colleagues [49]; the analysis allowed to characterise a total of 106,072 polymorphic loci. A whole-genome linkage disequilibrium-pruned (r 2 < 0.3) subset of SNP markers was then produced using Plink v. 1.09 [49,50]. This subset of 2,518 linkage disequilibrium-pruned SNPs was here used to perform genetic structure and cryptic relatedness analyses.

Genetic Structure and Cryptic Relatedness
A Bayesian clustering approach based on molecular makers, implemented in the software STRUCTURE v. 2.3.4 [51], was used to determine the number of genetic groups (K) in the diversity panel. As one of the main assumptions of the software is that markers should be in Linkage Equilibrium (LE), the reduced subset of SNP markers was used. STRUCTURE was initially run assuming an admixture model for different clusters ranging from 1 to 16; for each tested cluster ten runs based on a 30,000 burn-in period and a Markov Chain Monte Carlo (MCMC) of 30,000 iterations after burn-in were performed. Effective number of clusters (K) in the diversity panel was then inferred testing the change of the log-likelihood between K values (ΔK) [52] using Structure Harvester [53]. Subsequently, for the most significant clustering values, Q-matrixes were generated after single runs performed using a 100,000 burn-in period and 300,000 MCMC iterations and results plotted with the software DISTRUCT [54]. A threshold of q ≥ 0.8 was used to assign genotypes to different STRUCTURE clusters [55,56]. The association between genotype cluster assignment and the available phaseolin data were then tested by analysis of contingency tables with the likelihood ratio chi- Figure 1. Geographical heatmap representing distribution and number of accessions included in this study (i.e., 192 common bean accessions). Pie charts report the percentage of the two main genetic groups: magenta (Mesoamerican genepool) and yellow (Andean genepool) as from results of STRUCTURE analysis. Adapted from Raggi and colleagues [49].

Genotyping
DNA of the 192 lines constituting the diversity panel was successfully extracted from young leaf tissues and genotyped using double digest Restriction-site Associated DNA (ddRAD) on an Illumina Hi-Seq 2500 platform as described by Raggi and colleagues [49]; the analysis allowed to characterise a total of 106,072 polymorphic loci. A whole-genome linkage disequilibrium-pruned (r 2 < 0.3) subset of SNP markers was then produced using Plink v. 1.09 [49,50]. This subset of 2518 linkage disequilibrium-pruned SNPs was here used to perform genetic structure and cryptic relatedness analyses.

Genetic Structure and Cryptic Relatedness
A Bayesian clustering approach based on molecular makers, implemented in the software STRUCTURE v. 2.3.4 [51], was used to determine the number of genetic groups (K) in the diversity panel. As one of the main assumptions of the software is that markers should be in Linkage Equilibrium (LE), the reduced subset of SNP markers was used. STRUCTURE was initially run assuming an admixture model for different clusters ranging from 1 to 16; for each tested cluster ten runs based on a 30,000 burn-in period and a Markov Chain Monte Carlo (MCMC) of 30,000 iterations after burn-in were performed. Effective number of clusters (K) in the diversity panel was then inferred testing the change of the log-likelihood between K values (∆K) [52] using Structure Harvester [53]. Subsequently, for the most significant clustering values, Q-matrixes were generated after single runs performed using a 100,000 burn-in period and 300,000 MCMC iterations and results plotted with the software DISTRUCT [54]. A threshold of q ≥ 0.8 was used to assign genotypes to different STRUCTURE clusters [55,56]. The association between genotype cluster assignment and the available phaseolin data were then tested by analysis of contingency tables with the likelihood ratio chi-squared (χ 2 ) test. The same SNP dataset was also used to display a dendrogram representing the cryptic relatedness of genotypes within the diversity panel based on a kinship matrix calculated in TASSEL v. 5.2 [57]; the graphical representation of the latter was developed using the R package "ggplot2". . At the two sites, the sowing was carried out in mid-May 2017; the experimental plots were covered by anti-insect net and water was supplied with localised irrigation throughout the entire duration of the trials. Both experiments were arranged using similar partially replicated randomised design with five entries replicated five times and two entries replicated six times, producing a total of 222 plots out of 192 entries (i.e., genotypes).
Data on morpho-phenological traits were collected according to the descriptors of the International Board for Plant Genetic Resources (IBPGR) [58] and the Protocol to test Distinctness, Uniformity, and Stability (DUS) of P. vulgaris of the European Community Plant Variety Office (CPVO) [59]; other descriptors were retrieved from literature. In total 9 quantitative and 15 qualitative (i.e., categorical) traits were recorded. The full list of the phenotypic traits and relative references are in Table 1.

Phenotypic Data Analyses
The experimental design allowed for bi-dimensional spatial analysis carried out for each experiment using the GenStat procedure as described by several authors [61][62][63][64]. For each entry and experimental site, Best Linear Unbiased Predictors (BLUPs) of the recorded quantitative traits were calculated using the best suitable spatial model according to the experimental field setup through Restricted Maximum Likelihood (REML) method and by selecting the best model, using the Akaike Information Criterion (AIC) [65]. The variance components were used to estimate broad-sense heritability (He 2 B ) of each trait, along with its standard error, on a plot basis as follows: where σ 2 p = σ 2 e + σ 2 g (phenotypic variance), σ 2 g = genotypic variance and σ 2 e = error variance. Descriptive statistics and Pearson's correlation coefficients among BLUPs for all quantitative traits were calculated using functions implemented in "agricolae" [66] and Past3 [67] and then visualised with the R package "ggplot2" [68]. Samples were grouped according to the most significant genetic structure, using a threshold of q ≥ 0.8 [55]. Phenotypic data analysis was then carried out according to sample assignation to the different genetic structure groups. Differences among genetic groups were then tested for statistical significance using the univariate t-test (p ≤ 0.01) and graphically visualised as data dispersion. A principal component analysis (PCA) of quantitative trait BLUPs was also carried out after data normalisation. PCA results were graphically summarised in a biplot [69] in which genotypes belonging to different identified genetic groups were highlighted with different colours.
Scores of the 15 qualitative traits were visually inspected. When samples showed different values between scores gathered at the two experimental sites, a "Not Assigned" (NA) score was considered for analysis. The described procedure has been applied to all samples and all categorical (i.e qualitative) traits. Chi-squared (χ 2 ) test and analyses of contingency tables were then performed grouping the samples according to the most significant genetic structure (q ≥ 0.8).

Seed-protein Content Evaluation
The bean diversity panel was also screened for an estimation of seed-protein content. To the purpose, 20 g of seed samples (harvested at CREA-CI) were ground using an ultra-centrifugal Retsch ZM200 mill (Retsch GmbH, Haan, Germany) equipped with a bottom sieve with 0.75 mm trapezoid holes. Total nitrogen analyses were carried out using a LECO TrueSpec Carbon, Hydrogen and Nitrogen (CHN) analyser (LECO, St. Joseph, MI, USA). Instrument calibration was obtained using LECO Barley Calibration sample for CHNs (C = 45.11 ± 0.35, H = 1.74 ± 0.05, N = 6.39 ± 0.05) with five levels of standard analyte: 0.050, 0.100, 0.150, 0.200 and 0.250 gr. Each common bean sample has been weighed on Tin Foil cup (0.0800 g ± 0.0050 g) and then analysed in triplicate. Samples were combusted for 3' and Helium was used as carrier gas. A conversion factor of 6.25 was used for estimation of grams of seed protein over 100 grams of dry seed weight [70].

Identification of Accessions Carrying Relevant Traits for Sustainable Agriculture
Days to flowering, hundred seed weight, leaf width, leaf length and seed-protein content were further analysed due to their high value for common bean improvement in light of sustainable agriculture [55,71,72]. For each genetic group identified in the panel (using genetic structure analysis) the 95th percentile of BLUP data was calculated and genotypes characterised by higher values (i.e., value ≥ 95th percentile) were identified. For days to flowering, data were analysed according to flowering precocity, meaning that the analysis focused on genotypes characterised by minimum values; the same principle was adopted for leaf length and leaf width; only accessions characterised by minimum values (5th percentile) of both traits at the same time were considered.

Genetic Structure and Cryptic Relatedness
According to ∆K analysis, the diversity panel is composed by two main subgroups (K = 2) ( Figure 2). Within the panel, 87 genotypes were assigned to the first group (45%), 94 to the second one (49%) while 11 were classified as admixed (6%), being characterised by q values ≤ 0.8. The assignation of genotypes to the two identified genetic groups was highly consistent with the information inferred through their phaseolin alleles: genotypes assigned to K1 are mostly Mesoamerican (phaseolin type S) while those to K2 mainly Andean (phaseolin type C and T) (χ 2 = 121.96, df = 2, Cramer's V = 0.84, p ≤ 0.001) (Figure 3a).
Other two levels of substructure of the panel were also investigated: K = 3 and K = 9 ( Figure 2). When K = 3 was considered, separation of the two genepools highlighted in K = 2 analysis is still evident; however, at K = 3, the Andean samples divided into two subgroups, revealing the existence of substructure within the Andean genotypes (Figure 2). At this value of K, the consistency between group assignment and phaseolin information is even more striking, in fact, genotypes characterised by C and T phaseolin are separated accordingly (χ 2 = 168.73, df = 4, Cramer's V = 0.74, p ≤ 0.001) (Figure 3b).
Considering K = 9, a complex fine genetic structure was observed in both Mesoamerican and Andean groups that was also corroborated by results of the cryptic relatedness, summarised as dendrogram in Figure 2. Moreover, the latter analysis showed that no redundancy affected the diversity panel as all the 192 genotypes are characterised by diverse allele combinations (Figure 2).   Results are shown for K = 2, K = 3 and K = 9. According to the Evanno test, K2 provides the best representation of the population structure. At K = 2, pink represents the Mesoamerican genepool, while yellow the Andean. On the bottom, the dendrogram represents the population cryptic relatedness (kinship).

Quantitative Traits
The spatial analysis-performed on the datasets produced at the two experimental sites-was more efficient than the complete randomised design (Crd) for two variables only: days to flowering (DTF) at CREA-CI and inflorescence length (IL) in both DSA3 and CREA-CI; for these cases, efficiency of the spatial models enhanced between 10 and 34%. Not surprisingly Crd generally was the best model for BLUPs calculation since the trials were performed in experimental fields where spatial

Quantitative Traits
The spatial analysis-performed on the datasets produced at the two experimental sites-was more efficient than the complete randomised design (Crd) for two variables only: days to flowering (DTF) at CREA-CI and inflorescence length (IL) in both DSA3 and CREA-CI; for these cases, efficiency of the spatial models enhanced between 10 and 34%. Not surprisingly Crd generally was the best model for BLUPs calculation since the trials were performed in experimental fields where spatial variation is generally low. In the common bean diversity panel extensive phenotypic variation exists for all the measured traits (Table 2). The highest coefficients of variation (CV) were observed for DTF, IL, flower buds per inflorescence (FBI) and hundred seed weight (HSW), meaning that the explored phenotypic diversity for these traits was higher when compared to the others. The estimates of broad-sense heritability (He 2 B ) were relatively high for all quantitative traits and generally consistent between experimental sites (Table 2). DTF was the trait characterised by very high values of He 2 B in both experiments carried out at DSA3 and CREA-CI experimental fields ( Table 2).
Results of simple linear regression analyses between BLUPs calculated at the two experimental sites showed good data consistency ( Figure S1) that was particularly high for of DTF, FBI, IL, and HSW. BLUPs dispersion analysis further confirmed this evidence ( Figure S2). In addition, the analyses showed that positive, relatively high and significant correlations exist between traits describing same plant organs: inflorescence (IL vs. flower buds per inflorescence (FBI), ρ = 0.63), leaf (leaf length (LL) vs. leaf width (LW), ρ = 0.63) and pod (pod width 1 (PW1) vs. pod width 2 (PW2), ρ = 0.42; pod width 1 (PW1) vs. pod length (PL) ρ = 0.36) ( Table 3). On the other hand, lack of correlation among most of other traits suggested a broad rearrangement of the phenotypic diversity within the panel.
When genotypes were separated according to their genetic constitution at K = 2 (K1 mainly included Mesoamerican, while K2 Andean genotypes) traits such as HSW, LL, PW1 showed significant differences (p < 0.001, t-test), with Mesoamerican genotypes characterised by smaller leaves, narrower pods and lighter seeds than Andeans ( Figure 4 and Table S2). On the other hand, the two groups did not show significant differences for the other quantitative traits (Table S2) as also shown by their data dispersion (Figure 4).  The PCA also gave graphical representation of the morpho-phenological diversity of the panel. PC1, accounting 24.82% of total variation, mainly separated samples according to their diverse LL, LW, FBI, IL, and HSW while PC2 (18.32%) was mainly related to different pod cross-section sizes (PW1 and PW2) and days to flowering (DTF). PL contributed to sample separation on both axes (Figure The PCA also gave graphical representation of the morpho-phenological diversity of the panel. PC1, accounting 24.82% of total variation, mainly separated samples according to their diverse LL, LW, FBI, IL, and HSW while PC2 (18.32%) was mainly related to different pod cross-section sizes (PW1 and PW2) and days to flowering (DTF). PL contributed to sample separation on both axes ( Figure 5). It is noteworthy that the PCA highlighted a wide morpho-phenological variation of Andean genotypes (i.e., evenly distributed over the biplot) when compared to the Mesoamericans, mainly clustered on the left side. According to the projection of the original variables on the biplot, Mesoamerican genotypes showed small leaves (LL and LW), shorter inflorescences (IL and FBI) and lighter seeds (HSW) than Andeans. PCA did not evidence clear clustering of the eleven admixed genotypes ( Figure 5).

Qualitative Traits
Most of the qualitative traits generally showed consistent scores between experimental sites; low discrepancy between data recorded at the two experiments was only recorded for Leaf Colour Anthocyanin (LCA, NA = 11.5%) and Pod Beak Orientation (PBO, NA = 13.5%). A clearly different distribution of trait scores within the two genetic groups was observed for: i) Colour Of Flowers (COF), predominant presence of white within Mesoamerican samples (K1; blue in Figure 6); ii) Base Of Standard (BOS), mainly present in the Mesoamerican genetic group (K1; orange); iii) Pod Beak Position (PBP), predominantly marginal in K1 (blue) and iv) Pod Beak Orientation (PBO), predominantly "downward oriented" in the Mesoamerica group (grey) while the "straight orientation" is the most abundant condition within the Andean group (K2) (orange) ( Figure 6). As expected, these differences were confirmed by contingency table analysis results (p ≤ 0.001). Pod Wall Fibre (PWF) also showed a different distribution between the two groups (p ≤ 0.05) where "strongly contracting" pod wall fibre was more frequent among the Andean (K2) genotypes than the Mesoamericans.
Regarding the screened qualitative seed-traits, the Mesoamerican group (K1) showed a high number of black-seeded genotypes (Seed Coat Darker Colour (SCDC) and Seed Coat Lighter Colour (SCLC) indicated as dark blue in Figure 6); the same group was also characterised by absence of Seed Coat Pattern (SCP, blue).
In the PCA of qualitative traits, the first two axes explained 29.67 % of the total variation ( Figure  S3). PC1 splits the Mesoamerican genotypes into three main groups: two on the left (A and B) and the other on the right side of the biplot (C) while the remaining ones are scattered in the middle together with most of the Andean genotypes. Genotypes separation on this axis is mainly due to differences of BOS, LCA, COF, SCDC, SCLC, and PC ( Figure S3). On the other hand, Andean

Qualitative Traits
Most of the qualitative traits generally showed consistent scores between experimental sites; low discrepancy between data recorded at the two experiments was only recorded for Leaf Colour Anthocyanin (LCA, NA = 11.5%) and Pod Beak Orientation (PBO, NA = 13.5%). A clearly different distribution of trait scores within the two genetic groups was observed for: (i) Colour Of Flowers (COF), predominant presence of white within Mesoamerican samples (K1; blue in Figure 6); (ii) Base Of Standard (BOS), mainly present in the Mesoamerican genetic group (K1; orange); (iii) Pod Beak Position (PBP), predominantly marginal in K1 (blue) and (iv) Pod Beak Orientation (PBO), predominantly "downward oriented" in the Mesoamerica group (grey) while the "straight orientation" is the most abundant condition within the Andean group (K2) (orange) ( Figure 6). As expected, these differences were confirmed by contingency table analysis results (p ≤ 0.001). Pod Wall Fibre (PWF) also showed a different distribution between the two groups (p ≤ 0.05) where "strongly contracting" pod wall fibre was more frequent among the Andean (K2) genotypes than the Mesoamericans.
Regarding the screened qualitative seed-traits, the Mesoamerican group (K1) showed a high number of black-seeded genotypes (Seed Coat Darker Colour (SCDC) and Seed Coat Lighter Colour (SCLC) indicated as dark blue in Figure 6); the same group was also characterised by absence of Seed Coat Pattern (SCP, blue).
Considering seed traits on the whole panel, most of genotypes showed absence of coat patterns (SCP, 76.0%), cuboid shape (SS, 43.8%) and medium brilliance (BS, 45.3%). These traits also showed significantly different distributions between Mesoamerican and Andean genotypes (p ≤ 0.05). Sustainability 2019, 11, x FOR PEER REVIEW 12 of 21  Figure S3) were mainly assigned to a genetic group highlighted by STRUCTURE analysis at K = 9 (Figure 2, K = 9, magenta cluster). These genotypes (groups A and B in the PCA) are characterised by absence of seed coat pattern (87%), striped "base of standard" (83%), purple flower (78%) and black seed (70%).

Seed-Protein Content Evaluation
Seed-protein content was successfully assessed on 170 genotypes. Recorded values ranged from a minimum of 22.08 to a maximum of 35.40 gr/100 gr dry-seed weight, showing that the diversity panel is highly variable for this trait. When genotypes were grouped according to STRUCTURE analysis results (K = 2), seed-protein content average values were significantly different (p ≤ 0.05) with the Mesoamerican group being characterised by a higher value when compared with the Andean one (29.20 vs 26.60) (Figure 7a). Considering the Seed-protein content and HSW at the same time, the pure lines derived from a Portuguese (CIAT accession number G10248A) and an Italian landrace (IPK PHA1916) displayed favourable combination of the two traits (Figure 7b) within the Mesoamerican cluster. On the other hand, considering the Andean group, the best combination of the two traits was observed in two lines developed from two Italian landraces: 4959 and 6124 (UNIPG genebank, reference accession number) (Figure 7b).

Identification of Accessions Carrying Relevant Traits for Sustainable Agriculture
Results of the 95th percentile of BLUP data analysis allowed the identification of accessions carrying relevant values of DTF, HSW, and seed-protein content. Within the Mesoamerican group, genotypes developed from 4651 (UNIPG), G20109 (CIAT) and PI-309885 (USDA) showed best values of DTF, HSW and seed-protein content, respectively. When considering the Andean group, G10077 (CIAT), 4455 (UNIPG) and 6124 (UNIPG) revealed best DTF, HSW and seed-protein content, respectively; interestingly, within the same genetic group, a line developed from a cultivar (UNIPG 7583), showed high values of both HSW and seed-protein content at the same time. Genotypes, carrying most relevant values of the three above-mentioned traits are listed in table 4. In the PCA of qualitative traits, the first two axes explained 29.67 % of the total variation ( Figure S3). PC1 splits the Mesoamerican genotypes into three main groups: two on the left (A and B) and the other on the right side of the biplot (C) while the remaining ones are scattered in the middle together with most of the Andean genotypes. Genotypes separation on this axis is mainly due to differences of BOS, LCA, COF, SCDC, SCLC, and PC ( Figure S3). On the other hand, Andean genotypes mainly grouped in the centre of the biplot meaning that lower diversity exists for the above-mentioned traits within this group. To a certain extent, Andean samples were separated by PC2. The same analysis showed no clustering of the eleven admixed genotypes ( Figure S3).
Interestingly, most of Mesoamerican genotypes (67%) characterised by negative values of the first principal component (PC1) (cluster A and B in Figure S3) were mainly assigned to a genetic group highlighted by STRUCTURE analysis at K = 9 (Figure 2, K = 9, magenta cluster). These genotypes (groups A and B in the PCA) are characterised by absence of seed coat pattern (87%), striped "base of standard" (83%), purple flower (78%) and black seed (70%).

Seed-Protein Content Evaluation
Seed-protein content was successfully assessed on 170 genotypes. Recorded values ranged from a minimum of 22.08 to a maximum of 35.40 gr/100 gr dry-seed weight, showing that the diversity panel is highly variable for this trait. When genotypes were grouped according to STRUCTURE analysis results (K = 2), seed-protein content average values were significantly different (p ≤ 0.05) with the Mesoamerican group being characterised by a higher value when compared with the Andean one (29.20 vs. 26.60) (Figure 7a). Considering the Seed-protein content and HSW at the same time, the pure lines derived from a Portuguese (CIAT accession number G10248A) and an Italian landrace (IPK PHA1916) displayed favourable combination of the two traits (Figure 7b) within the Mesoamerican cluster. On the other hand, considering the Andean group, the best combination of the two traits was observed in two lines developed from two Italian landraces: 4959 and 6124 (UNIPG genebank, reference accession number) (Figure 7b).

Discussion
The SSD strategy allowed to produce a panel of diverse common bean pure lines mainly derived from European landraces. Results of the morpho-phenological and genetic characterisation would possibly make this panel a useful tool for future research activities on the common bean and, at the

Identification of Accessions Carrying Relevant Traits for Sustainable Agriculture
Results of the 95th percentile of BLUP data analysis allowed the identification of accessions carrying relevant values of DTF, HSW, and seed-protein content. Within the Mesoamerican group, genotypes developed from 4651 (UNIPG), G20109 (CIAT) and PI-309885 (USDA) showed best values of DTF, HSW and seed-protein content, respectively. When considering the Andean group, G10077 (CIAT), 4455 (UNIPG) and 6124 (UNIPG) revealed best DTF, HSW and seed-protein content, respectively; interestingly, within the same genetic group, a line developed from a cultivar (UNIPG 7583), showed high values of both HSW and seed-protein content at the same time. Genotypes, carrying most relevant values of the three above-mentioned traits are listed in Table 4.

Discussion
The SSD strategy allowed to produce a panel of diverse common bean pure lines mainly derived from European landraces. Results of the morpho-phenological and genetic characterisation would possibly make this panel a useful tool for future research activities on the common bean and, at the same time, an attractive resource for different breeding programmes. In this regard, seed samples of the genotypes composing the panel are kept under long-term storage conditions in the genebank held by DSA3 (FAO code: ITA-363).
Results of next-generation genotyping, based on thousands of SNP markers in linkage equilibrium, allowed to precisely assess the level of diversity of the panel. In particular, cryptic relatedness analysis showed that no redundancy exists within this collection, meaning that the criteria used for initial germplasm selection allowed inclusion of highly diverse materials. This corroborates the fact that, when building a collection, landraces-rather than elite germplasm-are the materials of choice if the objective is to maximise the level of genetic diversity. It is also noteworthy that landraces still hold useful traits for sustainable agriculture that might have disappeared in modern cultivars [73,74]. A renewed interest on such traits is arising due to the need of breeding new varieties for sustainable agriculture [74][75][76][77]. The availability of such collection and the relative genotyping data open new possibilities for the identification of genetic determinants (QTL and/or candidate genes) involved in the control of the above-mentioned traits such as stress tolerances, precocity and enhanced nutritional characteristics. Indeed, this diversity panel has been recently used to identify chromosomic regions carrying candidate genes involved in flowering time control [49].
The dissection of the genetic architecture of this panel confirmed a balanced contribution of the two main common bean genepools. Results of genetic structure analysis, based on Evanno's test, indicated K = 2 as the most likely population structure; this evidence was corroborated by the available information on phaseolin alleles that allowed an initial separation of the panel into Mesoamerican and Andean genotypes. Similarly, when data were analysed as K = 3, it was possible to identify two subpopulations within the Andean group in accordance with their phaseolin alleles C and T. Studying an Italian collection of common bean landraces, Raggi and colleagues [32] reported a similar genetic structure where samples characterised by a certain phaseolin allele (C, T or S) were consistently assigned to a specific genetic group out of the three identified.
However, the use of a linkage disequilibrium-pruned SNP subset also allowed to display a finer structure of the panel; when K = 9 was analysed, Mesoamerican and Andean genepools divided into four and five subgroups, respectively. Other studies based on molecular markers reported similar within genepool substructures. In particular, Blair and colleagues [78] reported a separation of wild Mesoamerican genotypes into four different groups according to their geographical origin; in the same study, no evident subdivision of the Andean wild material was observed. When studying the structure of a panel of Andean elite and non-elite genotypes, Cichy and colleagues observed a substructure that splits Andeans into two subgroups [45]. In our study, the genetic architecture of the lines constituting the panel appears more complex. In fact, genotype attribution to the relative genetic group was obtained considering q values ≥ 0.8 [55,56]; using this threshold, admixed genotypes -possibly products of early crosses between Mesoamerican and Andean individuals -might affect data analysis, making the identification of genepool-related trait combination rather difficult. When studying genetic structure using such a high number of molecular markers higher assignation thresholds might be considered.
Outcomes of our genetic structure analyses can certainly be useful to produce recombinant experimental populations for plant breeding. For example, attribution of genotypes to different genetic groups can be advantageously exploited to develop Recombinant Inbred Lines (RILs) and/or Multi-parent Advanced Inter-Cross (MAGIC) populations for fine QTL mapping. In all cases, the previous knowledge of genetic and phenotypic diversity of a high number of genetically "stabilised" lines is fundamental to address different experimental needs.
The same information can also be relevant when breeding aims at the constitution of heterogeneous populations characterised by a high level of within-population genetic diversity. In self-pollinating species, such as common bean, two strategies can be used to achieve the purpose. The first consists of creating mixtures of existing uniform cultivars (i.e., pure lines). This method has been successfully applied to cereals producing heterogeneous materials characterised by high adaptation capacity [79], yield stability [80] and ability to contain fungal and viral disease diffusion [81][82][83][84]. The second method is to create segregating Composite-Cross Populations (CCPs), in which, carefully selected genetically diverse "founder" genotypes are crossed with each other-rather than physically mixed [85]-and bulked progenies are then let to evolve for a number of successive generations applying or not an active selection [86]. Recently, this method has generated interest for its application to cereals (i.e., wheat and barley) in sustainable farming systems [80,87]. In both cases, previous knowledge of genetic diversity of a wide panel of candidate components (in the case of mixtures) or founders (CCPs) is crucial for a wise and more conscious exploitation of their diversity.
When compared to broader common bean's germplasm collection morpho-phenological descriptions, results of this phenotypic characterisation showed that a relatively high level of phenotypic diversity was included for both qualitative and quantitative traits [28,30,31]. Regarding quantitative traits, the partially replicated experimental design allowed to successfully estimate broad-sense heritability and at the same time allowed for significant cost reduction of phenotyping. Application of such experimental designs, where only a subset of samples is replicated, can potentially increase the number accessions to be screened in future research by optimising invested efforts in terms of labour and money. Reducing costs of germplasm characterisation is a key factor to boost the exploitation of the so-called "untapped diversity", kept in different collections held by genebanks all over the world.
In our study, morpho-phenological trait analyses were carried out accounting for genotype assignation to the two identified clusters (structure analysis assignation) corresponding to the Mesoamerican and Andean genepools. Genotypes assigned to the Andean group showed larger leaves, heavier seeds and longer inflorescences when compared to the Mesoamericans. Such evidences are in line with those obtained by Singh and colleagues [60], that analysed morpho-phenological traits of common bean landraces considering the two genepools (Phaseolin allele-based assignation). In the same work, authors also found a group of accessions in which phaseolin classification did not match morphological trait-based assignation. Interestingly, in our work, the phaseolin information did not always match genetic attribution to a certain genetic cluster (K = 2) even when these attributions were complete (q = 1.0). This evidence further confirms the advantage of using next-generation genotyping to precisely define genetic groups for morpho-phenological data analysis rather than a limited number of markers or morphological characteristics. Indeed, morpho-phenological data did not allow to produce a clear subdivision between or within genepools. However, results of the qualitative trait PCA-mainly based on the rearrangement of different plant organ pigmentation-allowed to split the Mesoamerican genotypes into three subgroups: two of the identified groups (left-side of the biplot), correspond to a genetic cluster identified considering K = 9. This group of genotypes shows qualitative characteristics that are in line with the description of the within Mesoamerican genepool race named "Mesoamerica" [21]. This evidence further confirms the value of the fine genetic structure analysis of our work suggesting its useful application to study the complex within-genepool genetic architecture of the common bean.
Since a high and cost-effective production of plant proteins is relevant for increasing sustainability of our food system, we also focused our attention on the identification of promising genotypes carrying favourable combination of high seed weight and high seed-protein content. Even if results of regression analysis showed slightly negative correlation, some accessions-developed from G10248A, PHA1916, 4959 and 6124-showed favourable combinations of these two traits. For example, such accessions can represent useful resources for breeding programmes that aim at maximising seed-protein content for markets requiring large-seeded varieties; this approach can be used to meet consumers' preferences and, at the same time, to enhance nutritional quality. Moreover, the performed morpho-phenological characterisation allowed to identify genotypes carrying relevant traits to develop new varieties able to better cope with harsh environmental conditions that are expected as result of climate change. A total of ten early flowering genotypes were identified within the two genepools. Flowering time is a key determinant for dry matter production and seed yield in common bean as well as in other major crops [88,89]; high temperature at flowering can dramatically reduce seed set rates in bean. It has been reported that night temperatures above 18 • C can reduce pollen viability in this crop [40,90]. Indeed, early flowering materials can contribute selecting novel varieties able to avoid yield losses caused by heat stress. This approach is of particular interest for determinate growth habit types that-being mechanically picked in a single harvest-can de facto avoid heat stress during flowering. On the other hand, for indeterminate growth habit materials, early flowering can contribute increasing productive cycle duration and, consequently, number of harvests with a beneficial effect on the overall production. This aspect is very relevant under marginal conditions, where sowing can occur late in the season (e.g., due to low minimum temperatures at high altitudes in temperate climates) exposing plants to narrow productive windows. Early flowering can also be particularly useful in light of reducing water consumption by shortening exposure of plants to water-demanding growing periods. Same principles can also apply to reduce crop exposure to other biotic and abiotic stresses on the field; significant reduction of time from sowing to flowering is particularly relevant in Europe, where bean production occurs in summer (sowing is generally carried out in April-May). Finally, a limited leaf area, that has been observed within the screened panel, can be exploited to achieve better water-use efficiency during mid-season drought; in other pulses, such as cowpea, a positive correlation between leaf area and drought stress has been reported [91].
Molecular and morpho-phenological information presented in this work will help foster common bean breeding for the development of new cultivars carrying favourable traits to attain more sustainable bean production.
Finally, with this work, we want to promote a broad use of grain legumes within diverse agricultural systems and socio-economic contexts in order to provide society multiple beneficial advantages, in line with the principles of sustainability.
Supplementary Materials: The following are available online at http://www.mdpi.com/2071-1050/11/19/5443/s1, Figure S1: Linear regressions between the data collected on the same genotypes grown in experimental sites; Figure S2: Boxplots of the 9 quantitative traits in pairwise comparisons between the two experimental sites; Figure S3: Principal component analysis (PCA) of 15 qualitative traits; Table S1. List of the 192 common bean lines developed by the Department of Agricultural, Food and Environmental Sciences (DSA3) of the University of Perugia; Table S2: Quantitative traits. Descriptive statistics by genetic groups.
Author Contributions: L.C. and L.R. equally contributed to this paper as first authors. V.N., L.R. and A.C. conceived and designed the experiments. L.C. and A.C. performed morpho-phenological data collection. L.C., L.R., S.C. and A.C. performed phenotypic data analyses. L.R. and L.C. performed genotypic data analyses. A.C. and V.N. funded phenotyping costs. V.N. funded genotyping costs. All authors wrote the manuscript.