Multidimensional Analysis of Diversity in DH Lines and Hybrids of Winter Oilseed Rape (Brassica napus L.)

Limited genetic variability is a major problem in rapeseed breeding: strict selection, a limited geographical range, and a short period of domestication have reduced both genetic and phenotypic diversity. Assessing specific populations for the greatest genetic diversity across many traits simultaneously requires the evaluation of multi-feature characteristics. The aim of this study was to estimate the variability of DH (doubled haploid) lines and two generations of winter oilseed rape hybrids. In addition, the relationship between the Mahalanobis distances of parental forms and the effect of heterosis in hybrids was investigated. The results of MANOVA showed that the genotypes and years as well as genotype × year interaction were significantly different (p < 0.001) for all eight observed traits. The first (V1) and the second (V2) canonical variate explained 38.57% and 27.55% of the total variation in 2015, and 50.19% and 31.84% in 2016, respectively. Canonical variate analysis showed that flowering time and number of branches per plant had a very large influence on the differentiation of genotypes. Graphs demonstrated that the tested DH lines and hybrids are characterized by a similar, wide range of variability. When a diverse population of DH lines was crossed with only one CMS/Rfo line, there was no significant reduction of variation in hybrid generations or between these generations. The phenotypic distances determined by Mahalanobis analysis covered a similar range in both years, slightly wider in 2016: from 1.324 to 22.356 in 2015 and from 1.105 to 27.792 in 2016. No significant correlation was observed between the hybrid heterosis effect and the Mahalanobis distance of the parental lines.


Introduction
Genetic and phenotypic diversity has always been the basis of plant breeding and development of improved cultivars with wider adaptability and a broad genetic base. The use of differentiated genotypes allows new varieties to be obtained with the desired combinations of traits. Limited genetic variability is a major problem in rapeseed (Brassica napus L.) breeding, whose long-term and one-way selection (double low varieties) in combination with its limited geographical range and a short period of domestication has led to a reduction in both genetic and phenotypic diversity [1,2]. The genetic pool of rapeseed breeding materials has been narrowed further by the emphasis on high oil content and seed quality characters. In addition, rapeseed does not have naturally occurring wild forms that could be natural sources of variability.
Good knowledge of the range of diversity that affects a given trait makes it possible to understand the pool of genes within the available population, and is necessary to improve diversity through selection. Such selection of plants based on genetic divergence has been successful in many crops. Analysis of diversity helps the breeder to choose parental forms for specific breeding objectives. Using diverse parents for hybridization increases the chance of achieving a wide range of variability in the segregating population and a superior heterosis effect in F1 hybrids. This relationship between the distance of parental forms and heterosis has been confirmed by many studies [3][4][5][6][7][8]. On the other hand, there are also studies that report the absence of such a relationship [9][10][11].
Genetic diversity, which is the combined sum of genetic variations found in a species or population, is also required to perform genetic mapping and search for markers associated with traits of interest. In order to improve breeding efficiency through marker-assisted selection, it is necessary to identify gene loci, which leads to an understanding of the genetic basis of traits [12]. This can be done by defining alleles in the breeding population before and after selection or comparing them to a population with contrasting traits [13,14]. Hence, populations with high variability and diversity are the basis for genetic mapping.
Assessing specific populations for the greatest genetic diversity for many traits simultaneously requires the evaluation of multi-feature characteristics. Multivariate statistical methods can be applied for this purpose [15]. The most commonly used methods, such as principal component analysis [16,17] or canonical variate analysis, reduce the dimensionality of multi-feature object comparisons [18,19]. Another useful method for measuring genetic divergence is the D² statistic proposed by Mahalanobis [20]. Using these methods to study a given population makes it possible to assess the diversity in terms of many features at once, which is advisable when there are not many significant correlations between these features [21].
The aim of this study was to estimate the variability of doubled haploid (DH) lines and two generations of winter oilseed rape hybrids. In addition, the relationship between the Mahalanobis distances of parental forms and the effect of heterosis in hybrids was investigated. The phenotypic variation of yield-related traits in these populations was described by Dobrzycka et al. [22], and the genetic parameters of DH lines by Bocianowski et al. [23,24]. The effect of heterosis on these genotypes was also studied [25]. The plant material was prepared to perform genetic mapping of yield-related traits and is currently undergoing molecular analysis with microsatellite markers.

Plant Material and Experimental Design
The plant material used in the study includes 182 genotypes of winter oilseed rape: 60 DH lines, two generations of hybrids (60 lines in each generation), one CMS ogura line, and one Rfo restorer line. DH lines were derived from F1 hybrids between two lines: 324/2, with high oleic acid content (77.9%), and 622/3, with high oil content (51.9%) and high seed yield. Single cross hybrids (CMS×DH) were created by crossing these 60 DH lines with the CMS ogura line, and three-way cross hybrids (CMS/DH×Rfo) were created by crossing the obtained single hybrids with the restorer (Rfo) line.
Field experiments were conducted in Borowo (52°70 N, 16°46 E), Plant Breeding Strzelce Ltd., Co., IHAR-PIB Group (Strzelce, Poland), and included two growing seasons: 2014/15 and 2015/16. Experiments were carried out on podzolic soils (sandy soil) of quality class IIIa. The topsoil was slightly acidic and its pH value was 6.0. Borowo had an average daily temperature of 9.8 °C in the first growing season (August 2014-July 2015) and 10.3 °C in the second growing season (August 2015-July 2016), compared to an average of 8.4 °C in previous years. The total precipitation was 399.2 mm in the first growing season and 627.7 mm in the second, while the pre-2014 average for this period was 533.7 mm. The crops that were cultivated previously on the soil were spring barley (2014/15) and winter triticale (2015/16). The experiments used a randomized block design with three replications and two randomly distributed standards (Monolit and Arsenal). The whole experiment consisted of 594 plots (for 182 objects + standards). Each plot contained four rows, 2 m long with a sowing density of 25 seeds/m. The distance between rows was 30 cm. The field management followed standard agricultural practice. Accordingly, fertilization was applied in the fall: K 90 kg/ha, P 60 kg/ha, N 18 kg/ha, and S 18 kg/ha. In the spring, N 26 kg/ha and S 13 kg/ha were applied. Appropriate plant protection products were also applied in the fall and spring.

Traits Evaluated
Traits evaluated in the field trials were flowering time, duration of flowering, plant height, number of branches per plant, number of siliques per plant, silique length, number of seeds per silique, and 1000 seed weight. Flowering time (days) was the number of days from the beginning of the year to the beginning of flowering. Duration of flowering (days) was the number of days from the beginning to the end of flowering. Plant height (cm) was measured on three randomly selected plants from each plot after the end of flowering time. The number of branches per plant and siliques per plant were counted on three well-developed, randomly selected plants from each plot at the green siliques stage. Silique length (mm) and number of seeds per silique were estimated on 20 siliques from each plot. Siliques were collected at the mature seeds stage from the main branch and then dried. Thousand seed weight (g) was estimated from the average of three measurements from the mixed seeds of plants in a plot.

Statistical Analysis
The results collected from field experiments were subjected to statistical analysis. Firstly, the normality of the distribution of the studied traits was tested using Shapiro-Wilk's normality test [26]. Multivariate normality and homogeneity of variance-covariance matrices were tested by Box's M test. A multivariate analysis of variance (MANOVA) was performed using the MANOVA procedure in GenStat 18, based on the model Y = XT + E, where Y is the (n × p)-dimensional matrix of observations, n is the total number of observations, p is the number of traits, X is the (n × k)-dimensional design matrix, k is the number of genotypes, T is the (k × p)-dimensional matrix of unknown effects, and E is the (n × p)-dimensional matrix of residuals. Canonical variate analysis (CVA) was applied, for each year independently, for the multi-trait assessment of similarity of the investigated genotypes in a lower number of dimensions with the least possible loss of information [27]. The Mahalanobis distance [20] was used as a measure of multi-trait ("polytrait") similarity of genotypes, the significance of which was verified by means of the critical value Dα, called "the least significant distance" [28]. The relationship between thousand seed weight and selected traits was analyzed with multivariate regression analysis. Observations from 2015 and 2016 were analyzed separately. To measure how well the model fitted the data, the coefficients of determination (R²) were calculated.
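The Mahalanobis distance between genotype mean vectors can be illustrated with a minimal NumPy sketch. This is not the GenStat procedure used in the study; the function name `pairwise_mahalanobis`, the toy means, and the identity covariance matrix are assumptions chosen only so that the result is easy to check by hand.

```python
import numpy as np

def pairwise_mahalanobis(means, pooled_cov):
    """Mahalanobis distances between all pairs of rows in `means`.

    means      : (k, p) array of genotype mean vectors (k genotypes, p traits)
    pooled_cov : (p, p) within-genotype covariance matrix (assumed known here)
    """
    inv_cov = np.linalg.inv(pooled_cov)
    # Differences between every pair of mean vectors, shape (k, k, p).
    diff = means[:, None, :] - means[None, :, :]
    # Quadratic form d' S^-1 d for each pair gives the squared distance.
    d2 = np.einsum("ijp,pq,ijq->ij", diff, inv_cov, diff)
    return np.sqrt(np.maximum(d2, 0.0))

# Hypothetical toy data: three genotypes, two traits, unit covariance.
means = np.array([[0.0, 0.0], [3.0, 4.0], [1.0, 0.0]])
cov = np.eye(2)
D = pairwise_mahalanobis(means, cov)
print(D)
```

With an identity covariance matrix the Mahalanobis distance reduces to the Euclidean distance, so the distance between the first two toy genotypes is 5. With real data the covariance matrix rescales and decorrelates the traits before distances are taken.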

The Canonical Variate Analysis (CVA)
In our study, all quantitative traits had a normal distribution as well as multivariate normality. The analysis of variance (ANOVA) indicated that the main effects of year, genotype, and year × genotype interaction were significant for all the traits of study (results shown in Wolko et al. [25], Table 1). The results of MANOVA (for genotypes F = 71.22, for years F = 104.17, and for genotype-by-year interaction F = 217.73) showed that the genotypes and years as well as genotype × year interaction were significantly (p < 0.001) different for all eight observed traits.
The results of the CVA for the genotypes are shown in Table 1. The first two canonical variates together explained 66.12% (in 2015) and 82.03% (in 2016) of the total variation between the genotypes (Table 1, Figures 1 and 2). Figures 1 and 2 show the distribution of the genotypes in the system of the first two canonical variates in 2015 and 2016, respectively. In the diagrams, the coordinates of a given genotype are values of the first and second canonical variate, respectively. A significant linear relationship with the first canonical variate in 2015 was found for flowering time, no. of branches per plant (positive dependencies), and duration of flowering and thousand seed weight (negative dependencies) (Table 1). The second canonical variate in this year was significantly positively correlated with flowering time, plant height and no. of seeds per silique but negatively correlated with duration of flowering and thousand seed weight. In 2016, there was a significant positive linear relationship with the first canonical variate for flowering time, no. of branches per plant and no. of siliques per plant, but there was a negative correlation for duration of flowering, plant height, no. of seeds per silique and thousand seed weight (Table 1). The second canonical variate was significantly positively correlated with flowering time, plant height, no. of branches per plant, no. of siliques per plant, silique length and no. of seeds per silique but negatively correlated with duration of flowering and thousand seed weight.
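How the share of variation attributed to each canonical variate arises can be sketched as follows. This is a textbook formulation of CVA (eigen-decomposition of the between-group sums of squares relative to the within-group sums of squares), not a reproduction of the GenStat analysis; the function name `canonical_variates` and the simulated data are assumptions for illustration.

```python
import numpy as np

def canonical_variates(X, groups):
    """Return (percent of variation per canonical variate, observation scores).

    X      : (n, p) array of trait observations
    groups : length-n array of group (genotype) labels
    """
    X = np.asarray(X, dtype=float)
    groups = np.asarray(groups)
    grand_mean = X.mean(axis=0)
    p = X.shape[1]
    B = np.zeros((p, p))  # between-group sums of squares and products
    W = np.zeros((p, p))  # within-group sums of squares and products
    for g in np.unique(groups):
        Xg = X[groups == g]
        mean_g = Xg.mean(axis=0)
        d = (mean_g - grand_mean)[:, None]
        B += len(Xg) * (d @ d.T)
        R = Xg - mean_g
        W += R.T @ R
    # Solve B v = lambda W v through the Cholesky factor W = L L'.
    L = np.linalg.cholesky(W)
    L_inv = np.linalg.inv(L)
    eigvals, eigvecs = np.linalg.eigh(L_inv @ B @ L_inv.T)
    order = np.argsort(eigvals)[::-1]
    eigvals = np.clip(eigvals[order], 0.0, None)
    coeffs = L_inv.T @ eigvecs[:, order]
    percent = 100.0 * eigvals / eigvals.sum()
    scores = (X - grand_mean) @ coeffs
    return percent, scores

# Simulated example: 4 hypothetical genotypes, 3 traits, 5 replicates each.
rng = np.random.default_rng(0)
groups = np.repeat(np.arange(4), 5)
group_shift = rng.normal(scale=2.0, size=(4, 3))
X = group_shift[groups] + rng.normal(size=(20, 3))
percent, scores = canonical_variates(X, groups)
print(np.round(percent, 2))
```

The cumulative share of the first two variates is simply the sum of their percentages, which is how 38.57% and 27.55% combine to the 66.12% reported for 2015.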

Multivariate Regression Analysis
In 2015, plant height, no. of siliques per plant, silique length and no. of seeds per silique significantly affected thousand seed weight; of these, only silique length had a positive influence. In 2016, as in 2015, silique length was the only trait with a positive effect on thousand seed weight, whereas no. of seeds per silique had a negative effect. The percentage of phenotypic variation explained was 18.38% in 2015 and 30.56% in 2016 (Table 2).
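The regression of thousand seed weight on other traits, and the coefficient of determination used to report the percentage of variation explained, can be sketched as below. The sketch assumes an ordinary least-squares fit with an intercept; the function name `fit_ols`, the predictors, and the noise-free toy response are hypothetical and are not the study data.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares fit of y on the columns of X with an intercept.

    Returns the coefficient vector (intercept first) and R squared.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    fitted = A @ beta
    ss_res = np.sum((y - fitted) ** 2)       # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return beta, 1.0 - ss_res / ss_tot

# Hypothetical predictors standing in for silique length and seeds per silique.
rng = np.random.default_rng(1)
X = rng.normal(size=(30, 2))
# Noise-free toy response: positive effect of column 0, negative of column 1.
y = 2.0 + 0.5 * X[:, 0] - 0.3 * X[:, 1]
beta, r2 = fit_ols(X, y)
print(beta, r2)
```

Because the toy response contains no noise, the fit recovers the coefficients exactly and R² equals 1. Field data with unexplained variation give lower values, such as the 18.38% and 30.56% reported above.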

Mahalanobis Distances
The range of Mahalanobis distances between studied genotypes was larger in 2016 (Figure 3). The greatest phenotypic variation in all observed traits (measured by Mahalanobis distances) for single genotypes in 2015 was found for DH997-CMS/1010×Rfo (the Mahalanobis distance between them amounted to 22.356). The greatest similarity in 2015 was found for CMS/922×Rfo and CMS/935×Rfo (1.324). In the second year of study, Mahalanobis distances ranged from 1.105 (between CMS×993-Monolit) to 27.792 (between DH945-CMS×984). Mahalanobis distances between pairs of parental genotypes were significantly correlated with the heterosis effect of CMS×DH hybrids for thousand seed weight in 2016. Statistically significant negative correlations between the heterosis effect of CMS/DH×Rfo hybrids and Mahalanobis distances between their parental forms were observed for plant height (in the first year of study), silique length (in both years) and number of seeds per silique (in the second year of study) (Table 3).
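The comparison of parental distance with hybrid heterosis can be illustrated with a short sketch. The heterosis formula used in the study is given in [25] and is not restated here, so the mid-parent definition below is an assumption, as are the function names and all numeric values.

```python
import numpy as np

def mid_parent_heterosis(hybrid, parent_a, parent_b):
    """Hybrid value minus the mean of its two parents (assumed definition)."""
    hybrid = np.asarray(hybrid, dtype=float)
    parent_a = np.asarray(parent_a, dtype=float)
    parent_b = np.asarray(parent_b, dtype=float)
    return hybrid - (parent_a + parent_b) / 2.0

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length vectors."""
    return float(np.corrcoef(x, y)[0, 1])

# Hypothetical values for four hybrids and their parental pairs.
distances = np.array([2.0, 4.0, 6.0, 8.0])   # parental Mahalanobis distances
hybrid = np.array([5.0, 5.5, 6.5, 7.0])      # trait value of each hybrid
parent_a = np.array([4.0, 4.0, 4.0, 4.0])
parent_b = np.array([5.0, 5.0, 5.0, 5.0])

heterosis = mid_parent_heterosis(hybrid, parent_a, parent_b)
r = pearson_r(distances, heterosis)
print(heterosis, round(r, 3))
```

In this toy case heterosis rises with parental distance, so the correlation is strongly positive. A negative coefficient, as reported above for several traits, would mean that more distant parents tended to give smaller heterosis effects.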

Discussion
The genetic complexity of quantitative features, such as seed yield and yield-related traits, and the genotype × environment interaction make it difficult to obtain stable and high-yielding lines and varieties. The phenotypic divergence varies depending on the environment and usually reflects the interaction between genes and the environment. Analysis of variance showed that year, genotype, and the interaction between them had a highly significant (p ≤ 0.01) impact on all the traits observed in this study. Shi et al. [29] obtained the same results in their research on 15 yield-correlated traits, while Teklewold and Becker [30] observed strong genotype effects (p ≤ 0.01) for 14 phenotypic traits. Individual traits differ in significance and have different shares of the total multivariate variation. The most important traits in the multivariate variation of genotypes were identified in the study by means of CVA [31]. The results of MANOVA showed that the genotypes and years as well as genotype × year interaction were significantly different (p < 0.001) for all eight traits observed in our study.
Knowledge of the genetic and phenotypic diversity among the available breeding materials is crucial for the improvement of characteristics through selection. Multivariate statistical methods allow for an accurate assessment of genotypes, while considering many important agronomic traits simultaneously. These types of analyses are popular in many species of plants for the characterization of germplasm collections and for the selection of the most suitable parents for breeding. Using multivariate statistical analysis, the tested DH lines and their single cross and three-way cross hybrids were characterized in this study in terms of several traits. The first canonical variate (V1) and the second canonical variate (V2) explained 38.57% and 27.55% of the total variation in 2015, and 50.19% and 31.84% in 2016, respectively. The same analysis was performed on two populations of DH lines of oilseed rape by Szała et al. [32], and resulted in 77.81% for V1 and 13.57% for V2. The first two canonical variates in our study together explained 66.12% (in 2015) and 82.03% (in 2016) of the total variation, which means that in 2016 there was a smaller loss of information. Both canonical variates in the first and second year of observation were positively correlated with the flowering time and number of branches per plant, and negatively correlated with the duration of flowering and thousand seed weight. These results allowed us to interpret the distribution of objects in the space of the first two canonical variates such that genotypes characterized by later flowering and more branches per plant were placed in the upper right part of the system, and genotypes with a long duration of flowering and high thousand seed weight were placed in the lower left part of the coordinate system. Different results were obtained by Szała et al. [32], who found positive correlations between the thousand seed weight and the first canonical variate and a negative correlation between the number of branches per plant and the first and second canonical variate. Jahan et al. [33] performed PC analysis and observed, similarly to our results, a positive correlation of the number of branches per plant with both variates (Z1 and Z2). In contrast, they noticed a positive correlation between thousand seed weight and Z1, while in our research this feature was negatively correlated with both canonical variates. Based on canonical variate analysis, it can be concluded that in our research, flowering time and number of branches per plant were the traits that had a very large influence on the differentiation of genotypes.
The genotypes studied in this research were presented in the coordinate system of the first two canonical variates. We can clearly state that all three observed groups of objects show a high degree of dispersion in the system. There was no clustering of genotypes in terms of their relatedness. Many researchers distinguish groups based on CV analysis, but fewer subjects are usually involved. For example, Parvin et al. [34] studied 40 rapeseed genotypes, which were grouped into five clusters. They noticed that genotypes of similar origin were not placed in the same groups. A similar conclusion was reached by Kumar and Singh [35], who evaluated twenty-five genotypes of Brassica napus from different places in the country, which were eventually grouped into six clusters. Additionally, no association between geographical distribution and genetic divergence was noticed by Parveen et al. [36], who studied 15 genotypes of rapeseed grouped in four clusters. In our study of 184 genotypes, the analysis of such clusters would be unreadable. However, based on the graphs, we can conclude that the tested DH lines and hybrids are characterized by a similar, wide range of variability.
Regression analysis is used to explain how the value of a dependent variable is modified when any independent variable changes. In most studies on Brassica napus, linear regression is performed for the seed yield per plot or plant [37,38]. The structure of our field experiments did not allow for effective measurement of the seed yield; therefore, the linear regression was performed for the thousand seed weight, which is one of the important traits affecting the yield in oilseed rape. Both Elliott et al. [39] and Sharafi et al. [40] performed regression analysis for the seed yield, and found a significant positive effect of the thousand seed weight, which means that its increase results in an increase in the yield. Additionally, Elliott et al.'s [39] results proved that seedlings from large seeds are more tolerant to abiotic and biotic stress and have increased vigor and size. In our study, regression analysis showed that the thousand seed weight in both years was positively influenced by the silique length, and negatively influenced by the number of seeds per silique.
The Mahalanobis distances between genotypes give an overview of phenotypic similarities and allow us to interpret the existing divergence in the context of all tested traits. Despite the slight phenotypic variability of rapeseed, this analysis can provide good information on the range of available diversity. Although our plant material consists of DH lines and hybrids related to them, the phenotypic distances determined by the Mahalanobis analysis were varied. The range of observed values was similar in both years, but slightly wider in 2016: from 1.324 to 22.356 in 2015, and from 1.105 to 27.792 in 2016. Similar, although slightly higher, values for the Mahalanobis distances were observed by Kumar and Singh [35] in parental lines and hybrids of Brassica rapa, from 2.642 to 30.102. However, Parvin et al. [34] and Mili et al. [41] noticed much lower values in Brassica napus genotypes, ranging from 0.304 to 8.145 and from 0.378 to 12.433, respectively. It is interesting that in our research, despite the significant influence of the environment, the greatest variability was observed between the same groups of objects in both years. When a diverse population of DH lines was crossed with only one CMS/Rfo line, there was no significant reduction of variation in hybrid generations or between these generations.
Maintaining high variability among breeding materials is important to produce heterotic hybrids. The use of the heterosis effect, possible by crossing genetically distant parental forms, has contributed to the largest increase in rapeseed yield in recent decades. Unfortunately, strong selection during the breeding process limits the genetic pool, which makes obtaining a high heterosis effect more difficult [42]. According to a definition of heterosis, it is generally believed that greater distance leads to a higher heterosis effect, although there are many studies that do not show such a relationship. A significant number of hybrids that we have examined showed a positive heterosis effect, especially in terms of plant height, silique length and number of seeds per silique [25]. However, in the present study, in general, no significant correlation was observed between the hybrid heterosis effect and the Mahalanobis distance of the parental lines. Similarly, such a relationship was not observed by Bocianowski et al. [21] in their research on six yield-related traits in oilseed rape hybrids and their parental lines. Popławska et al. [43] also did not observe such a relationship, explaining that the phenotypic variability of the studied lines was too narrow, which did not allow for predicting the heterosis effect based on the Mahalanobis distance. A similar conclusion was made by Bocianowski et al. [44], who stated that the lack of correlation between heterosis and Mahalanobis distance may be the result of high phenotypic and small genotypic diversity of parental lines, which resulted in a small number of significant heterosis effects.

Conclusions
In conclusion, it should be emphasized that phenotypic variation is not synonymous with genetic variation, because it is subject to environmental modifications and represents only a small fragment of the information contained in the genome [2]. When it comes to the selection of materials for hybrid breeding, it is therefore best to rely on genetic rather than phenotypic variation. However, breeding of rapeseed focuses not only on improving its yield, but also on obtaining varieties with altered quality features such as a changed fatty acid composition, increased protein content, bright seed color, and resistance to biotic and abiotic stresses. For this reason, it is important to maintain as much variability as possible in the oilseed rape breeding materials. Diverse populations with high phenotypic variability in traits of interest are also necessary for genetic mapping and the search for molecular markers associated with these traits. For this purpose, the studies described in this article were conducted by initially assessing the phenotypic traits that affect the yield and their heterosis. The next step was to examine and estimate the variability of the studied groups of genotypes, which showed significant differentiation in terms of the assessed traits. These results suggest that the studied populations are suitable for ongoing genetic mapping because only a wide range of diversity in the analyzed trait makes it possible to find the genes responsible for this diversity.
Author Contributions: A.Ł. and J.W. planned the research project, carried out the experiments and collected phenotypic data. J.B. and A.C. performed statistical analyses. All authors wrote and revised the manuscript. All authors have read and agreed to the published version of the manuscript.