Effect of Chromosomal Localization of NGS-Based Markers on Their Applicability for Analyzing Genetic Variation and Population Structure of Hexaploid Triticale

This study aimed to determine whether using DNA-based markers assigned to individual chromosomes would detect the genetic structures of 446 winter triticale forms originating from two breeding companies more effectively than using the entire pool of markers. After filtering for quality control parameters, 6380 codominant single nucleotide polymorphisms (SNPs) markers and 17,490 dominant diversity array technology (silicoDArT) markers were considered for analysis. The mean polymorphic information content (PIC) values varied depending on the chromosomes and ranged from 0.30 (2R) to 0.43 (7A) for the SNPs and from 0.28 (2A) to 0.35 (6R) for the silicoDArTs. The highest correlation of genetic distance (GD) matrices based on SNP markers was observed among the 5B–5R (0.642), 5B–7B (0.626), and 5A–5R (0.605) chromosomes. When silicoDArTs were used for the analysis, the strongest correlations were found between 5B–5R (0.732) and 2B–5B (0.718). A Bayesian analysis showed that SNPs (total marker pool) allowed for the identification of a more complex structure (K = 4, ΔK = 2460.2) than the analysis based on silicoDArTs (K = 2, ΔK = 128). Triticale lines formed into groups, ranging from two (most of the chromosomes) to four (7A) groups depending on the analyzed chromosome when SNP markers were used for analysis. Linkage disequilibrium (LD) varied among individual chromosomes, ranging from 0.031 for 1A to 0.228 for 7R.

The classification system based on how genotypes are obtained divides triticale into the categories of primary and secondary.Primary triticale genotypes are hybrids obtained by crossing rye with wheat, whereas secondary genotypes are obtained via crosses among primary triticale by crossing primary triticale lines with secondary triticale lines or primary triticale's with wheat or rye [10].Currently, secondary triticale forms (a cross between two types of primary (first-cross) triticale's), formed as a result of the combination of the chromosomes from genomes A and B, which increased genetic variation, fulfil breeders' needs [11][12][13].Due to its origin, triticale is expected to show narrow variation and a weak structure within species.Thus, studying genetic variation is a prerequisite for future perspectives and the successes of breeding programs.
Using factorial analysis, over one thousand hexaploid winter triticale genotypes from 24 countries were subjected to morphological and physiological assessments of their yield components.The results revealed that the first three principal components may account for up to 80% of the variance [14].Several researchers have conducted similar experiments [11,15].However, given the limited number of traits that can be researched, the majority of which are quantitative, and their reliance on environmental factors, such a study is only tangentially significant [16].The introduction of molecular markers has enabled the refinement of this kind of study.To determine if 80 different European hexaploid triticale lines had a genetic structure, Kuleung et al. [17] used 57 simple sequence repeat (SSR) markers.There was a genetic variance ranging from 7% to 86%.According to an agglomeration analysis, the materials formed five groups, which generally corresponded with a pedigree.This clustering may result from the fact that breeders also use the varieties of other breeders as parents in their breeding programs, which reduces the genetic differentiation between their breeding programs.Researchers looked at 232 types of triticale from Polish breeding companies and genotyped them with diversity array technology (DArT) markers.They found that the genetic structure of the material was weak, with 86% of the variation being found within groups [18].Similar outcomes were found for 161 lines from the global collection [19].For DArT markers, a principal coordinates analysis (PCoA) displayed just two data groupings.The first two coordinates explained 23.4% of the variation.Additionally, 144 triticale forms were evaluated using DArT markers, revealing a genetic identity of 60% [20].Surprisingly, an analysis of 54 Brazilian genotypes from two breeding organizations employing SSR markers revealed as many as seven groups [21].
In the case of triticale, clustering usually differentiates between the winter and spring varieties.The spring forms show slightly more variance than the winter forms [19,20].Other crops, such as wheat [22][23][24], barley [25], and rapeseed, have also shown this distinction between their winter and spring varieties [26].The population's geographic origin is another element that typically has the most significant impact on population clustering [27,28].It must be emphasized that grouping triticale materials into separate groups may be difficult, regardless of the marker system used for analysis.For instance, Losert et al. [29] examined 121 registered triticale varieties using genotyping by sequencing (GBS) markers.No genetic structure related to the known origin was discovered.Tams et al. [30] revealed the same phenomenon for 128 genotypes of winter European triticale using SSRs.The PCoA's first two coordinates accounted for 22.4% of the variation in materials from other nations.Studies utilizing other molecular techniques do not always support the assumption that triticale has a slight genetic variation and a tiny structure [18].Yet, most triticale population assessments indicate either a few groups or none [18,29,31].It depends on how many genetic variants are in the gene pool, how much genome-wide linkage disequilibrium (LD) there is, and how closely related the population is to itself.These factors affect the mapping resolution, marker density, statistical methods, mapping power, and how well plant breeding works [32].
According to recent research, triticale agronomic traits should be examined separately for each chromosome rather than for all simultaneously [33].Rye provided evidence of selection resulting from years of breeding at the chromosomal level [34].The different breeding preferences among plant breeding companies in the same country support this evidence.When variation is examined at the level of a species' chromosomes, it can be used to analyze the genetic structure of the species.Also, the genetic structure found when counting the total number of a specific type of genetic marker might be different from the genetic structure found by counting the markers attached to specific chromosomes.Moreover, the genetic structure may vary depending on the DNA marker type (dominant vs. codominant).
The goal of our study is to determine whether DNA markers attached to specific triticale chromosomes would work better than using all markers together to determine the genetic structures of triticale types originating from two breeding companies.

Results
The initial data set consisted of 24,353 SNPs and 64,334 silicoDArTs.After filtering for quality control parameters, 6380 SNPs and 17,490 silicoDArT markers were considered for analysis.The number of mapped SNPs per chromosome ranged from 93 (4B) to 416 (6R), whereas the number of silicoDArTs varied from 180 (4B) to 946 (4R) (Figure S1).In general, the R genome showed the highest marker number.As many as 1740 SNPs (27%) and 6192 silicoDArTs (35%) were not assigned to chromosomes but were still used for the analysis based on the total pool of markers.
A marker's polymorphic information content (PIC) delivers relevant information on its usefulness for studies using the given material set.The higher the value, the more informative the marker [35], with a value of about 0.5 considered to be the most informative.In our study, the PIC values for both marker types ranged from 0.1 to 0.5, with the mean values for the total marker pools being equal to 0.39 and 0.30 for SNPs and silicoDArTs, respectively.The proportions of markers with a PIC value above 0.41 (highly informative) reached 64.9% (SNP) and 20.5% (silicoDArTs).The mean PIC values for markers with a chromosome location varied from 0.30 for 2R to 0.43 for 7A (SNPs) and 0.28 for 2A to 0.35 for 6R (silicoDArTs).The SNP markers from each chromosome had similar PIC values, with most markers being above 0.41 (48% for 7B and 77% for 2A).Except for the 1A and 1R chromosomes, which had a lot of markers with PIC values below 0.40 (59% for 1A and 68% for 1R), none of the other chromosomes did.However, the differences between triticale genomes were significant (F = 6.534, p < 0.05).Tukey's HSD test showed that the R genome formed a separate group.The PIC distribution of silicoDArTs indicated the highest proportion of markers (ca. 41% (4A and 7B) to 74% (3R)), with values between 0.25 and 0.40, except for 2A and 4B, where the highest proportion of markers (ca. 38%) was noticed in the range from 0.1 to 0.25, referring to markers with a medium informativeness.The difference between triticale genomes was significant (F = 4.74, p < 0.05).Tukey's HSD test showed that the A and R genomes were in separate groups.Violin plots showing the distribution of PIC values by chromosome are given in Figure 1.
Expected heterozygosity (H e ) is a common statistic for assessing the amount of genetic variability within a population.The H e values for the total SNP markers pool ranged from 0.12 (4 SNPs) to 0.50 (912 SNPs), with an average of 0.41 (Figure 1).The average H e value noticed for the markers assigned to each chromosome was the highest for 2A (0.46) and the lowest for 2R (0.31).The differences between triticale genomes were significant (F = 5.902, p < 0.05).Tukey's HSD test showed that the R genome formed a separate group.When silicoDArTs were applied for analysis, the H e for individual markers varied from 0.05 (363 markers) to 0.50 (180 markers), with an average of 0.22.The parameter depended on chromosomes and was the highest for 6R (0.23) and the lowest for 2A (0.17).According to AMOVA, the difference between the triticale genomes was insignificant (F = 2.683, p = 0.09).However, the average H e for the A, B, and R genomes within the given marker tape ranged from 0.393 (A) to 0.396 (B) and from 0.308 (A) to 0.329 (R) in the case of SNPs and silicoDArTs, respectively.
Based on the total pools of SNP and silicoDArT markers, the UPGMA clustering of plant materials using Nei's genetic distance (Figure S2A,C) showed that the two main groups differed slightly depending on the marker system used.The first cluster constructed using SNP markers included 119 out of the 254 genotypes from Strzelce Plant Breeding and 164 among the 192 lines from DANKO Plant Breeding.The second cluster splintered into two subclusters.The first subcluster included 70 triticale lines from Strzelce and four lines from DANKO, whereas the second subcluster comprised 65 and 24 lines from Strzelce and DANKO, respectively.Based on silicoDArTs, cluster one included 58 lines (46 from Strzelce and 12 from DANKO).Cluster two consisted of the remaining 388 lines, further divided into two branches: 113 lines from Strzelce and 34 lines from DANKO formed one branch, and 95 (from Strzelce) and 146 (from DANKO) lines formed another.The highest Nei's genetic distance (GD) calculated using the total pool of SNP markers was between the L202 line and L80 or L135 (>11,000), and the lowest (<1000) was between B4 and B80, B261 and B262, L5 and L35, L27 and L28, L31 and L129, and L36 and L125.Similarly, the L202, B4, and B27 lines had the highest GDs (>10,000), with 28, 28, and 36 lines among the investigated pool.When silicoDArTs were utilized, the highest GD values (>8500) were observed between B39 vs. B120 and B143 and B115 vs. B146 and B192.Lines B120 and B143 showed the highest GD values (>8000), with 16 and 17 lines, respectively.The lowest values (<300) occurred between L31 and L129, L36 and L125, L4 and L200, L6 and L171, L27 and L28, L14 and L201, and B261 and B262.
The first two principal axes in the PCoA utilizing the whole pool of SNPs explained only 5.4% and 5.0% of the total genetic variation, respectively.The total pool of silicoDArTs resulted in comparable values (6.6% and 4.1% for the first and second principal coordinates, respectively).However, separate accession groups could not be seen independently of the marker system applied for analysis (Figure S1B,D).Still, in the case of silicoDArTs, materials from DANKO showed less variation than those from Strzelce.However, a comparable dispersion was seen for all analyzed forms when applying the SNP markers.
Mantel tests used to show the correlation level between two distance matrices, reflecting how the lines differed from each other at the genetic level, revealed a significant correlation between two genetic distance matrices based on the total pool of SNPs and silicoDArTs (r = 0.505; p = 0.01).The correlation between distance matrices representing the same chromosome but evaluated on SNPs and silicoDArTs indicated a significant correlation that ranged from 0.074 to 0.387 for 4R and 3R, respectively.Correlation coefficients were sometimes insignificant (1A, 3A, 4A, 1B, and 5R) (Table 1).Mantel tests showed that the GD matrices evaluated on the markers assigned to triticale chromosomes were positively correlated.The correlation was relatively high between 5B and 5R (0.642), 5B and 7B (0.626), and 5A and 5R (0.605) when SNP markers were considered (Table 2).The 4B-and 6B-based GD matrices exhibited the lowest correlation values (0.175-0.321) of all the matrices, including those derived from a whole set of markers.When silicoDArTs were used for analysis, the highest correlation was observed between 5B and 5R (0.732), 2B and 5B (0.718), 1B and 1R (0.711), 1B and 2B (0.709), 1B and 5B (0.700), 5A and 5B (0.704), and 5A and 5R (0.700).The lowest values noticed for 3R-based matrices were correlated with all remaining matrices, excluding those for 3B.A Bayesian analysis of the genetic structure based on SNP markers revealed that the highest ∆K value was 2460.2 (K = 4), indicating the presence of four groups (Figure 2A,C).The respective groups encompassed 65, 75, 31, and 275 accession numbers and formed Pop1, Pop2, Pop3, and Pop4, respectively (Figure 2A,E).The origin of the accession was not a determinant of the genetic structure.The lines originating from DANKO and Strzelce were present in each group, but in different proportions.The percentages of lines from DANKO in individual populations were 37.0% in Pop1, 76.0% in Pop2, 16.1% in Pop3, and 38.5% in Pop4.Different results were obtained when the total pool of silicoDArT markers was applied.A sharp peak at K = 2 was identified, but ∆K reached only 128.0, indicating a weak population structure (Figure 2D).Among the investigated lines, 363 clustered in Pop1 and 83 clustered in Pop2 (Figure 2B,F).The clustering did not follow known genotype origins.However, only two genotypes from DANKO were included in Pop2.Codominant SNP markers, considered to be more informative, more strongly illustrated the population structure in our study compared to dominant silicoDArTs, and were used to analyze population structures based on markers assigned to individual chromosomes.The results preferentially indicated the presence of two groupings which formed Pop1 and Pop2 (2A, 3A, 6A, 7A, 1B, 4B, 5B, 6B, 7B, 1R, 4R, 5R, 6R, and 7R) (Table 3).However, for chromosomes 1A, 5A, 2B, 3B, 2R, and 3R, the highest ΔK value was found to be the greatest for K = 3, and the highest K = 4 was evaluated for the 4A chromosome (Table 3).The triticale genotypes assigned to the particular groupings differed depending on chromosome, even if they had the same K value.
Coord Codominant SNP markers, considered to be more informative, more strongly illustrated the population structure in our study compared to dominant silicoDArTs, and were used to analyze population structures based on markers assigned to individual chromosomes.The results preferentially indicated the presence of two groupings which formed Pop1 and Pop2 (2A, 3A, 6A, 7A, 1B, 4B, 5B, 6B, 7B, 1R, 4R, 5R, 6R, and 7R) (Table 3).However, for chromosomes 1A, 5A, 2B, 3B, 2R, and 3R, the highest ∆K value was found to be the greatest for K = 3, and the highest K = 4 was evaluated for the 4A chromosome (Table 3).The triticale genotypes assigned to the particular groupings differed depending on chromosome, even if they had the same K value.
An Analysis of Molecular Variance (AMOVA) revealed that 14% of the total variance (Phi PT = 0.139, p < 0.001) was found among four subpopulations estimated from STRUCTURE when the entire pool of SNPs was used (Table 3).Different results were found depending on the chromosome.The highest values between population variance accounted for 41% (Phi PT = 0.413, p < 0.001) for the 7R chromosome, whereas low values (i.e., Phi PT = 1%, p < 0.001) were evaluated for 4R and 7B, respectively.The Phi PT between particular populations with K > 2 (chromosomes 1A, 4A, 5A, 2B, 3B, 2R, and 3R) was highly significant (p < 0.001) and varied from 0.015 (between Pop2 and Pop3 for chromosome 3R) to 0.351 (between Pop2 and Pop3 for chromosome 2R) (Table 4).High values of Phi PT (greater than 0.20) obtained for 2A, 7A, 1B, 2B, 4B, 6B, 1R, 2R, 5R, 6R, and 7R were indicative of a high genetic differentiation between grouping accessions.An analysis of linkage disequilibrium (LD) for the whole genome showed that 36% of the SNP marker pairs were associated with a p-value below 0.001, 78% of which recorded an r 2 value of less than 0.1 (Table 5).The mean value of r 2 was 0.122.Pairs of loci in significant LD on the chromosome level varied from 11.6% (1A) to 55.2% (7A), and r 2 values of <0.1 ranged from 55.7% (7A) to 94.1% (1A).The highest (r 2 > 0.7) or complete (r 2 = 1.0)LD was represented in only 0.57% of the significant pairwise comparisons for chromosome 1A, but reached approximately 10% for 7A, 2A. and 2R and exceeded 14% for chromosome 7R.The distribution of r 2 values is shown in Figure 3.
highly significant (p < 0.001) and varied from 0.015 (between Pop2 and Pop3 for chromosome 3R) to 0.351 (between Pop2 and Pop3 for chromosome 2R) (Table 4).High values of PhiPT (greater than 0.20) obtained for 2A, 7A, 1B, 2B, 4B, 6B, 1R, 2R, 5R, 6R, and 7R were indicative of a high genetic differentiation between grouping accessions.An analysis of linkage disequilibrium (LD) for the whole genome showed that 36% of the SNP marker pairs were associated with a p-value below 0.001, 78% of which recorded an r 2 value of less than 0.1 (Table 5).The mean value of r 2 was 0.122.Pairs of loci in significant LD on the chromosome level varied from 11.6% (1A) to 55.2% (7A), and r 2 values of <0.1 ranged from 55.7% (7A) to 94.1% (1A).The highest (r 2 > 0.7) or complete (r 2 = 1.0)LD was represented in only 0.57% of the significant pairwise comparisons for chromosome 1A, but reached approximately 10% for 7A, 2A. and 2R and exceeded 14% for chromosome 7R.The distribution of r 2 values is shown in Figure 3.The mean LD varied among individual chromosomes and ranged from 0.031 for 1A to 0.228 for 7R (Table 5).The average LD decay for the whole genome was 8.86 Mb when The mean LD varied among individual chromosomes and ranged from 0.031 for 1A to 0.228 for 7R (Table 5).The average LD decay for the whole genome was 8.86 Mb when r 2 = 0.1 (Figure 3).The LD decay for a predicted r 2 of 0.1 varied from 1.18 Mb (1A) to 28.36 Mb (7R) based on the chromosomes.The mean distance between mapped SNP markers ranged from 2.53 Mb for chromosome 6R to 7.32 Mb for chromosome 7B and was 4.24 Mb for the whole genome (Table 5).

Discussion
The decision on the marker type suitable for a given task could be more straightforward.Dominant or codominant markers are available on one side, but rarely both.On the other hand, the choice is nearly always taken a priori.Thus, it is vital to have reasonable criteria to facilitate this decision.The DArT PL company offers a DArTseq platform generating codominant (SNP) and dominant (silicoDArT) markers based on a single data set.In this study, we highlighted the efficiency of both marker types for genetic diversity analyses in the entire triticale collection.In addition, a population structure and linkage disequilibrium (LD) study was also conducted using a whole pool of SNPs and SNPs assigned to individual chromosomes.
The PIC values evaluated in the study based on triticale lines showed that SNPs seemed more informative than silicoDArTs.PIC values equal to 0.39 for SNP markers are congruent with data presented for the other plant species, such as rye [36], Pisum fulvum [37], and maize [38].The results indicated that the marker system and marker types were informative enough for our purpose, which was also the case at the level of genomes, with some preferences for the marker type used.The results were more uniform for silicoDArTs, with a mean PIC equal to 0.31 for the A and B genomes and 0.33 for R the genome.Comparing the genomes based on the PICs evaluated on the SNPs showed that the R genome (mean PIC = 0.35) separated from the wheat genome (mean PIC = 0.40 for the A and B genomes).Within individual genomes, the PIC varied slightly between chromosomes and ranged from 0.28 (2A) to 0.33 (7A), from 0.29 (4B) to 0.33 (6B), and from 0.31 (2R) to 0.35 (6R) for dominant silicoDArT markers.The PICs within individual genomes for codominant SNP markers reached maximum values of 0.43, 0.41, and 0.40 for 7A, 5B, and 6R, respectively.The mean value of the expected heterozygosity calculated by SNPs was twofold higher (H e = 0.41) than that for silicoDArTs (H e = 0.22), most probably due to the codominant nature of SNP markers.Similar relationships were observed, regardless of whether they were calculated using whole marker pools or markers assigned to individual chromosomes.However, for SNP markers, the mean H e values were lower for the R genome than for the A and B genomes.The H e index value for the triticale collection was much higher than that for breeding lines of common wheat [39] or barley [40], and it was about the same as that for a group of wheat genotypes with a lot of genetic diversity [41].The differences we saw between the PIC and H e in this study were due to marker types that showed their dominant and codominant natures.On the other hand, differences between genomes are likely related to how materials are bred by their companies.Thus, the PIC and H e indices could be used to decide which marker system better fits the requirements of given studies.To better discriminate between lines, preferences for higher PIC values and lower H e values should be given.Furthermore, the presented data indicate that each chromosome, at least in some cases, could be treated individually for a better resolution.
UPGMA analyses showed that data clustering utilizing either silicoDArTs or SNPs did not group plant materials based on their breeding origin.Materials from the two companies were evenly distributed between two main clusters utilizing SNPs and silico-DArTs.A grouping like this may mean that companies either share breeding materials or that the species' genetic pool is limited.An alternative explanation is that companies' breeding preferences are similar, or breeding pressure needs to be stronger or longer to form separate genetic pools representative of such companies.Moreover, it cannot be excluded that this pressure is directed preferentially towards specific chromosomes, as only a minor fraction of agronomic traits is considered in breeding programs.If so, whole marker pools may not identify such differences.While a limited genetic pool in triticale is apparent [18] and the exchange of breeding materials cannot be excluded, we tend to think that, at least to some extent, breeding companies exhibit distinct pressure towards their materials.Still, time is needed to evaluate the differences between pools.Thus, these differences are subtle or averaged by the whole marker pool representing all chromosomes simultaneously, indicating that putative distinction could be uncovered based on markers assigned to selected chromosomes.Nevertheless, the outcome from the PCoA analysis based on the whole pools of markers was that the materials from DANKO Plant Breeding Ltd., located in Laski, Poland, were more uniform than those from Strzelce Plant Breeding Ltd., located in Borowo, Poland, as indicated by clustering using silicoDArTs.Furthermore, both total pools of SNPs and silicoDArTs were inconclusive regarding the differentiation of plant materials due to their origin, indicating that the marker choice may have led to somewhat distinct and inconsistent results.This conclusion is congruent with the results presented by other studies [34,42,43] and is supported by the correlation between genetic distance matrices evaluated based on SNPs and silicoDArTs.This result encouraged us to test whether using markers associated with individual chromosomes would be more conclusive for plant material differentiation.We chose to compare genetic distance matrices evaluated based on SNP and silicoDArT markers assigned to triticale chromosomes for that purpose.
Interestingly, the correlations varied even more than in the total marker pools.Moreover, different results were obtained depending on the marker system used.Within each type of marker, the correlation values between chromosomes ranged from low but still significant to fairly high.This may suggest that the most correlated chromosomes were under the same breeding selection, exhibited synteny, or carried some agronomically important traits important for breeding programs.The correlation between 5A and 5R partly supports the notion concerning directed breeding pressure or synteny regarding agronomic traits.Both chromosomes code QTLs conferring height with the dwarfing gene Ddw1, which is localized on 5R in rye [44], and the dominant height-reducing Rht12 locus, located on 5A in wheat [45]-both QTLs control plant height in central European winter triticale [46].In addition, many other vital QTLs responsible for powdery mildew resistance, seedling freezing tolerance, and yielding capacity have been localized in triticale on 5R or 5A [47][48][49][50].
Moreover, the homology between 5RS and 5AS and between 5RS and 5BS (the most correlated GDs) has been proven by many researchers [51][52][53] and is congruent with our Mantel test results.Another example is the 1B and 2B pair.Here, the two chromosomes harbor numerous R-genes in wheat and triticale, including functional stripe rust [54,55] and powdery mildew R-genes [56,57].A high correlation coefficient between chromosomes from the same homologous group, e.g., 5A and 5B, confirms the presence of essential genes responsible for a plant's development, adaptation, or resistance.
Regarding the presented data, it is reasonable to assume that different marker types (codominant SNPs or dominant silicoDArTs), combined with Bayesian statistics, should detect the distinct genetic structures of the analyzed triticale population.However, only the SNP markers (total marker pool) supported the strong stratification of triticale materials (∆K = 2460.2),with moderate differentiation between Pops (Phi PT ranging from 0.150 to 0.237).Thus, we focused on codominant SNP markers in later structure and LD analyses.The Bayesian-model-based method based on markers assigned to individual chromosomes resulted in identifying distinct population structures.The differentiation was more apparent compared to the total marker pools, respectively.Since there was no information on breeding preferences, physical traits, etc., it was hard to determine what the genetic structures found meant.
LD is an essential parameter of genetic diversity in natural populations and germplasm collections, and it can be employed to evaluate population stability and crop improvement programs [58].The extent and distribution of LD in the genome determine the required number of SNP markers and the mapping resolution for the success of forward genetic studies, such as genome-wide association analyses [59].However, in cases of triticale, when SNPs' positions are based on the genomes of single genotypes of rye and wheat, we cannot exclude spurious results in LD analysis [60].It has been reported that 10-20% of SNPs in plant species may need to correct their physical positions due to the available reference assembly [60,61].
In our study, the number of SNP marker pairs with significant LD (p < 0.01) varied from 11.6% to 55.15%, depending on the chromosome analyzed.The mean LD was the lowest for the 1A (r 2 = 0.031) and the highest for the 7R (r 2 = 0.228) chromosomes.This result was expected and is consistent with similar analyses in bread wheat [62], durum wheat [63], and rye [64].However, much higher LD values exceeding 0.300 and reaching ~0.420 for the 3R and 4R chromosomes were reported by Vendelbo et al. [64] for non-restorer germplasm and cytoplasmic male-sterile lines of rye, who explained that the higher LD in the investigated population was linked to the low frequency of detectable recombination events caused by the depletion of rare alleles and suggesting the occurrence of population bottlenecks or intense selection.The same explanation was suggested by Roncallo et al. [63] for the rising number of SNPs in high (r 2 > 0.07) or complete LD (r 2 = 1.0) for the 2A and 7A chromosomes in a durum wheat collection, compatible with our results observed for the identical chromosomes, as well as for 2R and 7R.In our study, the maximum LD decay distance (r 2 > 0.1) was 28.4 Mb, with an average LD decay distance of about 8.9 Mb for the triticale genome.This overall LD decay distance is difficult to compare with other studies due to the different methods used in calculations [65], as well as the use of recombination units (cMs) instead of base pairs (bps).Moreover, self-fertilizing plants usually show less decay of LD because, in a homozygous genetic background, recombination events are ineffective in causing LD decay [59].However, the slightly open-pollinating nature of triticale, which can be up to 10% [66], may influence this fact.
The presented results demonstrate that analyses of any genetic pool should be supported by selecting the most adequate marker system, which may vary depending on the genome or individual chromosome levels.The more informative the system is, the better the available outcomes.Moreover, depending on specific requirements, it might be vital to base a study on more heterozygotic or homozygous materials.When it is crucial to identify the most exhibited differences based on genetic distances, the more uniform the analyzed materials are, the better.Such data are significant when, i.e., choosing crossing lines when the maximum heterozygosity is expected in the F1 progeny to compensate for the adverse effects of some recessive alleles present in the homozygous materials.As any breeding materials can be subjected to varying breeding programs, the results could be addressed exclusively to the given species under the assumption that the representative genetic pool is analyzed.Furthermore, similar studies conducted at the level of individual chromosomes are time-consuming and hardly available, which makes conclusions concerning other cereals difficult.

Plant Material
The plant material consisted of 446 homogeneous breeding lines of winter hexaploid triticale (X Triticosecale Wittm.ex A. Camus) obtained from the Polish plant breeding companies DANKO Plant Breeding Ltd., located in Laski (51 genotypes).The breeding lines for examination were selected by the breeders to represent the genetic variability of triticale within each company's breeding materials to the maximum possible extent.

DNA Extraction and Genotyping
The frozen leaf samples collected from triticale at the heading stage of development were homogenized in liquid nitrogen using a mortar and pestle.The homogenate was transferred to 2 mL Eppendorf tubes, and then the total DNA was isolated using a commercial reagent kit, the DNeasy Plant Mini Kit (Qiagen, Hilden, Germany).The isolated DNA samples were qualitatively and quantitatively evaluated using a NanoDrop 2000 spectrophotometer (Thermo Fisher Scientific Inc., Waltham, MA, USA).DNA integrity was tested electrophoretically in 1.5% agarose gel, and 100 ng/mL samples were prepared.These samples underwent sequencing (Diversity Arrays Technology Pty Ltd., Canberra, Australia; http://www.diversityarrays.com/(accessed on 1 June 2024)) at a 2.5 mln depth of reads using the HiSeq 2000 platform (Illumina Inc., San Diego, CA, USA).Two marker types were generated by DArTseq analysis: SNPs (codominant) and silicoDArTs (domi-nant).The codominant markers were scored as 0, 1, and 2, where 0 and 1 were homozygotes.The dominant markers were coded as 0 and 1.The chromosomal assignment and position of the markers were evaluated based on wheat (Wheat Chinese Spring IWGSC RefSeq v1.0) and rye (Secale cereale Lo7 RefSeq v3.0; https://galaxy-web.ipk-gatersleben.de/(accessed on 1 June 2024)) sequences in reference genomes that were homologous to the marker sequence.Diversity Arrays Technology Pty Ltd. supported the data.

Quality Control of Markers and Genetic Diversity
Marker quality was based on the following parameters: reproducibility > 95%, call rate > 95%, minor allele frequency (MAF > 0.05), and one ratio > 0.1 [67].The scoring of reproducibility involved the proportion of technical replicate assay pairs for which the marker score exhibited consistency.The call rate determined the proportion of genotypes per marker with non-missing data.Finally, the one ratio represented the proportion of samples for which the genotype score equaled '1'.The quality parameters and polymorphic information content (PIC) of each marker were computed automatically by the DArT software (version 3.5.2).
The expected heterozygosity (H e ) of the total SNP and silicoDArT markers that met the filter criteria was calculated using GenAlEx v. 6.5 [68].The expected heterozygosity (H e ), also known as the gene diversity of the locus [69], is a measure of the genetic diversity in a population and describes the expected proportion of heterozygous genotypes if the population is in Hardy-Weinberg equilibrium.The H e index is calculated according to Nei [70].

Statistics
A one-way analysis of variance (ANOVA) was used to determine if there were significant differences in the mean values of H e and PIC for the triticale A, B, and R genomes.The AMOVA was evaluated in Statistica v. 13.3.0 [71].Violin plots visualized the distributions of PIC values among chromosomes and were constructed in PAST version 4.17 software [72].An unweighted pair group method with arithmetic mean (UPGMA) and 1000 bootstrap replications using the PAST software [72] based on Nei's genetic distance was conducted.Then, the genetic relationships among triticale lines, including their origins (DANKO Plant Breeding Ltd. and Strzelce Plant Breeding Ltd.), were assessed.

Population Structure
The population genetic structure was evaluated utilizing a Bayesian algorithm implemented in Structure software version 2.3.4 [74].With correlated allele frequencies, the admixture model with 50,000 lengths of the burning period and 100,000 MCMC replicates was applied.Ten independent runs were performed for each simulated value of K, ranging from 1 to 7. The best K was identified based on the approach of Evanno et al. [75].The structure results were visualized via the online server (http://clumpak.tau.ac.il/, (accessed on 1 June 2024)) in the CLUMPAK software (version 1.1.2)[76].The "main pipeline-admixture" and "best K" algorithms were used.The analysis was conducted based on the total silico-DArT and SNP marker pools.In the case of SNP markers, the analysis was conducted on markers assigned to particular chromosomes.
An analysis of molecular variance (AMOVA) was performed based on the total pool of SNPs and SNPs assigned to individual chromosomes.Furthermore, the pairwise Phi PT population values were calculated to estimate the genetic differentiation among recognized populations (Pops).Finally, probability values estimated by 1000 permutations were used to determine whether the partitioning of variance components was significant.The genetic indices and AMOVA were evaluated in GeneAlEx v. 6.5 [68].

Linkage Disequilibrium
Linkage disequilibrium (LD) was calculated in the TASSEL 5.0 software [77] with a significance level of p < 0.001 and a sliding window of 50 SNPs.The chromosome and genome-specific mean values were estimated.LD (r 2 ) was plotted against the physical distance (bp) between markers to determine the LD decay distance (bp).A trend line describing the LD decay was evaluated by a nonlinear regression model based on Hill and Weir [78] and implemented in Remington et al. [79] in R (http://www.rproject.org;(accessed on 1 June 2024)).The critical threshold of the r 2 value, below which the LD could be considered as being caused by physical linkage, was set to 0.1.

Conclusions
This study concludes that both codominant (SNP) and dominant (silicoDArT) markers efficiently analyze triticale genetic diversity.However, SNP markers were found to be more informative than silicoDArTs.It is important to note that the choice of marker type can lead to distinct and inconsistent results.Additionally, the study showed correlations between the genetic distance matrices of specific chromosomes, indicating shared breeding selection or the presence of important agronomic traits.By using SNP markers in combination with Bayesian statistics, this study achieved a strong classification of triticale materials based on markers assigned to individual chromosomes, and such population structures differed depending on whether the analyses were conducted on individual chromosome-based markers or their total pool.An LD analysis based on SNPs revealed variations in the LD across different chromosomes, but the LD values were relatively low, allowing us to work on the whole marker pool and markers assigned to chromosomes.

Figure 1 .
Figure 1.Average values of expected heterozygosity (H e ) (A) and distribution of the polymorphic information content (PIC) values of SNP (B) and silicoDArT (C) markers calculated for the total marker pools and particular chromosomes of triticale.The horizontal line inside the box shows the median.The shapes of violin plots show the distribution of PIC values by chromosome.

Figure 2 .
Figure 2. Principal coordinate analysis (PCoA) based on Nei's genetic distance inferred from the (A) SNP and (B) silicoDArT data of a 446 triticale accessions.Each of the color of dots indicate individuals assigned to different subpopulations (Pop).Delta K for differing numbers of subpopulations (k) based on: (C) SNP and (D) silicoDArT markers.Structure plot with K = 4 and K = 2 clusters of triticale accessions based on: (E) SNP and (F) silicoDArT markers.Each accession's genome is represented by a single row, which is partitioned into colored segments in proportion to the estimated membership in the subpopulations.

Figure 2 .
Figure 2. Principal coordinate analysis (PCoA) based on Nei's genetic distance inferred from the (A) SNP and (B) silicoDArT data of a 446 triticale accessions.Each of the color of dots indicate individuals assigned to different subpopulations (Pop).Delta K for differing numbers of subpopulations (k) based on: (C) SNP and (D) silicoDArT markers.Structure plot with K = 4 and K = 2 clusters of triticale accessions based on: (E) SNP and (F) silicoDArT markers.Each accession's genome is represented by a single row, which is partitioned into colored segments in proportion to the estimated membership in the subpopulations.

Figure 3 .
Figure 3. Linkage disequilibrium (LD) decay plot for the whole genome of triticale lines and frequency of r 2 values.(A) Scatter plot of LD values of intra-chromosomal pairwise loci against physical distance (Mb).LD decay was fitted with the nonlinear regression model curve (red line).The green line indicates the intersection between critical r 2 and the map distance.The critical threshold of r 2 value below which the LD could be considered caused by physical linkage was set to 0.1 (blue line).(B) LD (r 2 ) values frequency by chromosome and whole genome (WG).

Figure 3 .
Figure 3. Linkage disequilibrium (LD) decay plot for the whole genome of triticale lines and frequency of r 2 values.(A) Scatter plot of LD values of intra-chromosomal pairwise loci against physical distance (Mb).LD decay was fitted with the nonlinear regression model curve (red line).The green line indicates the intersection between critical r 2 and the map distance.The critical threshold of r 2 value below which the LD could be considered caused by physical linkage was set to 0.1 (blue line).(B) LD (r 2 ) values frequency by chromosome and whole genome (WG).

Table 1 .
Mantel test evaluated between genetic distance matrices of individual chromosomes calculated based on SNPs and silicoDArTs.

Table 2 .
The Mantel's test correlation coefficients (r) evaluated between genetic distance matrices calculated based on SNPs (above diagonal) and silicoDArTs (below diagonal) for individual chromosomes and the whole pool of markers of the given molecular type.

Table 3 .
Structure results for total SNP markers and, according to the assignment, the particular triticale chromosomes, together with the analysis of molecular variance (AMOVA) of the genetic variation between accessions forming groups evaluated by Bayesian analysis.

Table 4 .
Pairwise Phi PT genetic distances between populations with K > 2 for selected triticale chromosomes.

Table 4 .
Pairwise PhiPT genetic distances between populations with K > 2 for selected triticale chromosomes.

Table 5 .
Distribution and linkage disequilibrium of SNP markers on whole genome (WG) and particular chromosomes of triticale.