The Use of DArTseq Technology to Identify Markers Related to the Heterosis Effects in Selected Traits in Maize

Spectacular scientific advances in the area of molecular biology and the development of modern biotechnological tools have had a significant impact on the development of maize heterosis breeding. One technology based on next-generation sequencing is DArTseq. The plant material used for the research consisted of 13 hybrids resulting from the crossing of inbred maize lines. A two-year field experiment was established at two Polish breeding stations: Smolice and Łagiewniki. Nine quantitative traits were observed: cob length, cob diameter, core length, core diameter, number of rows of grain, number of grains in a row, mass of grain from the cob, weight of one thousand grains, and yield. The isolated DNA was subjected to DArTseq genotyping. Association mapping was performed using a method based on the mixed linear model. A total of 81602 molecular markers (28571 SNPs and 53031 SilicoDArTs) were obtained as a result of next-generation sequencing. Out of 81602, 15409 (13850 SNPs and 1559 SilicoDArTs) were selected for association analysis. The 105 molecular markers (8 SNPs and 97 SilicoDArTs) were associated with the heterosis effect of at least one trait in at least one environment. A total of 186 effects were observed. The number of statistically significant relationships between the molecular marker and heterosis effect varied from 8 (for cob length) and 9 (for yield) to 42 (for the number of rows of grain). Of particular note were three markers (2490222, 2548691 and 7058267), which were significant in 17, 8 and 6 cases, respectively. Two of them (2490222 and 7058267) were associated with the heterosis effects of yield in three of the four environments.


Introduction
The continuous development of molecular biology and statistical tools to analyze the large amount of data obtained for different mapping populations is changing the approach to selecting plant materials for breeding [1]. There is an increasing emphasis on the selection of multiple functional traits using, for this purpose, unrelated but genetically aligned plant materials. For maize, yield and its structural traits are the most important [2][3][4]. Increasing the range of maize cultivation is linked to breeding progress, which includes exploiting heterosis (the vigor of the first generation of hybrids) and creating hybrids with lower climatic requirements [5][6][7]. Access to increasingly modern cultivation technologies and breeding methods is also very important. The demand for new varieties of maize is constantly growing, making it the subject of intensive breeding and genetic research. The identification of molecular markers linked to genes determining not only the heterosis effect or grain yield but also other functional traits is now a priority in maize breeding programs [1,8].
To identify markers and their coupled genes, association mapping [9,10] and genomic selection [11,12] are currently the most commonly used. In association mapping, we can distinguish between candidate gene association and genome-wide association study (GWAS). In candidate gene association, correlations between DNA polymorphisms in a specific gene and a trait are checked. In the absence of detailed biochemical knowledge related to the sought-after trait, a GWAS analysis is justified. This approach searches for trait-marker associations across the genome and assumes that there are markers within the genome conditioning the expression of the trait that show coupling imbalances [13]. It was on maize that the first association mapping was performed [14] using simple RAPD and RFLP techniques. With the development of new biotechnological methods and statistical programs, the number of species studied has increased.
The breakthrough came in 2006 with Illumina's launch of the Solexa (Genome Analyzer) Next Generation Sequencer. The sequencing technology used in this device is still being developed and Illumina now has several sequencing platforms, including two for 'bulk' analysis (NovaSeq and NextSeq), and the smallest is for sequencing small numbers of samples with shorter genomes (MiniSeq and iSeq). The platforms differ in the amount of data that they can obtain (from 1.8 GB for MiniSeq to 6000 GB on the NovaSeq platform) and the number of reads that they can generate in a limited time [15]. Initially, the problem was the large amount of sequencing data preventing bioinformatics analyses on desktop computers; nowadays, analyses are performed on high-throughput servers with large RAM capacity, in either a Linux or an iOS environment [16]. Many bioinformatics tools run algorithms that allow whole genomes and transcriptomes to be assembled from short sequences [17]. As the results show, bioinformatics analyses currently pose the greatest difficulty in adapting NGS for diagnostics, as the result of improper folding is false positives and incorrectly folded genomes, making it difficult to correctly interpret the results obtained [18].
One technology based on next-generation sequencing is DArTseq. This technology was used in the present study to identify single nucleotide polymorphism (SNP) and Silico DArT polymorphisms, for which associations with the heterosis effect were sought. DArTseq technology (in opposition to GBS) provides a large pool of both SNP and SilicoDArT markers (which are also characterized by dominance) [19]. In this method, the complexity of the genome is reduced via restriction enzymes and sequencing short reads. DArTseq technology replaces the hybridization step, and sequencing is conducted using the Illumina system [20].
Molecular analyses in maize are focused on the identification of new markers (SNP and SilicoDArT) and quantitative trait loci (QTLs) regions, as well as breeders' interest in DNA analyses, which additionally includes the search for new methods to select the parental components for heterosis crosses [21]. The idea here is to find a relationship between the yield of an F 1 hybrid and the heterogeneity of loci markers for its parental forms. It is well known that breeding success is determined by access to starting materials with significant genetic diversity because well-identified and heterotically partitioned starting material results in lower costs for the entire hybrid breeding process [22].
The purpose of this study was to identify molecular markers (SNPs and SilicoDArTs) linked to the heterosis effects for yield and yield traits in maize (Zea mays L.).

Plant Material
The plant material for the study consisted of 13 Figure 1.   Polyphoska in the quantity of 350 kg ha -1 and 160 kg ha -1 of urea were applied. The nutrient content was as follows: Before winter ploughing: N-21 kg N ha -1 in the form of polyphoska, P-70 kg P 2 O 5 ha −1 in the form of polyphoska, and K-105 kg K 2 O ha -1 in the form of polyphoska. Before sowing: N-73.6 kg N ha -1 in the form of urea-was applied. The chemical control of weeds was performed with a field sprayer. Within each repetition, one cob each was selected from ten plants and biometric measurements were made on them. These biometric measurements were carried out in the first half of November each year. Nine quantitative traits were observed: cob length (LC), cob diameter (DC), core length (LCO), core diameter (DCO), the number of rows of grain (NRG), the number of grains in a row (NGR), mass of grain from the cob (MGC), weight of 1000 grains (WTGs) and yield.

Genotyping as Well as SNP and SilicoDArT Data Processing
The seeds of each hybrid were divided. One part was allocated to set up a field experiment and ten pieces were allocated for molecular analyses. Seeds of the F 1 generation hybrids (10 from each hybrid) were sown onto sterile Petri plates to obtain young seedlings. Approximately 1 cm 2 of leaf tissue was taken from each of the 10 plants and a collection sample was prepared (separately for each hybrid). The plant material prepared in this way was isolated. Isolation was carried out using Maxwell equipment from Promega, for automatic DNA isolation. The Maxwell ® RSC Tissue DNA Kit AS1610 was used for this purpose. After isolation, the concentration and purity of the DNA samples were tested and equal dilutions were prepared to 100 ng µL -1 . DNA concentration and purity were checked using a DeNovix spectrophotometer from TK Biotech. Care was taken to only select samples for sequencing where the DNA purity at 260/280 absorbance was between 1.8 and 2.0 and at 230/260 absorbance was not less than 1.8. DNA prepared in this way was placed in a 96-well plate and sent for sequencing to Diversity Arrays Technology.
For next-generation sequencing, F 1 hybrids and inbred lines (highly homozygous), which were the parental forms of these hybrids, were sent to a 96-well plate. However, for the above publication, only the sequencing results of the F 1 hybrids were used from the database of the results obtained. The results for the lines were used for a different inference. The inbred lines were obviously characterized by low yield because they had previously undergone inbreeding to obtain plant materials that were highly homozygous. The F 1 hybrids were characterized by a high yield.
The genotypic data used for association mapping were obtained from polymorphisms identified in DArT and candidate gene sequences. The details of DArT sequences were described by Tomkowiak et al. [23]. Genotyping using DArTseq technology was performed by Diversity Arrays Technology Pty Ltd. (Kilian A., Canberra, Australia). DNA sample digestion/ligation reactions were processed according to Kilian and Graner [20], but by replacing a single PstI-compatible adaptor with two adaptors corresponding to PstIand NspI-compatible sequences and moving the assay on the sequencing platform as described by Sansaloni et al. [24]. The PstI-compatible adapter was designed to include an Illumina flow cell attachment sequence, a sequencing primer sequence and a "staggered" varying length barcode region, similar to the sequence reported by Elshire et al. [25]. The reverse adapter contained a flow cell attachment region and an NspI-compatible overhang sequence. Only "mixed fragments" (PstI-NspI) were amplified via PCR using the following reaction conditions: denaturation for 1 min at 94 • C; followed by 30 cycles of 94 • C for 20 s, 57 • C for 30 s and 72 • C for 45 s; and the final elongation at 72 • C for 7 min. After PCR, equimolar amounts of amplification products from each sample of the 96-well microtiter plate were bulked and applied in c-Bot (Illumina) bridge PCR, followed by sequencing on Illumina Hiseq2500. The sequencing (single read) was run for 78 cycles. Sequences generated from each lane were processed using proprietary DArT analytical pipelines. In the primary pipeline, the fastq files were first processed to filter away poor-quality sequences, applying more stringent selection criteria to the barcode region compared to the rest of the sequence. In this way, the assignments of the sequences to specific samples carried in the "barcode split" step were very reliable. Approximately 2,500,000 (±7%) sequences per barcode/sample were used in the marker calling. Finally, identical sequences were collapsed into "fastqcall files". These files were used in the secondary pipeline for DArT PL's proprietary SNP and SilicoDArT (presence/absence of restriction fragments in representation) calling algorithms (DArTsoft14).

Statistical Analysis
Heterosis effects for hybrids were calculated relative to the average of both parents, for each trait ( Figure 2) [23]. The effects of heterosis were tested at the 0.05 level. DArT sequences meeting three criteria were selected for association analysis: the missing observation fractions < 10%, minor allele frequency (MAF) > 0.25 and one SNP and SilicoDArT within a given sequence (69 nt). Association mapping of heterosis effects was performed using a method based on The effects of heterosis were tested at the 0.05 level. DArT sequences meeting three criteria were selected for association analysis: the missing observation fractions < 10%, minor allele frequency (MAF) > 0.25 and one SNP and SilicoDArT within a given sequence (69 nt). Association mapping of heterosis effects was performed using a method based on a mixed linear model [26,27]. The GenStat v. 22 [28] statistical software package was used for all of the analyses. The significance of associations between SilicoDArT and SNP markers and heterosis effects was tested at the 0.001 level (with Benjamini-Hochberg correction, for multiple comparisons) [29]. All analyses were conducted for four environments (two localities, two years) independently.

Results
Next-generation sequencing yielded 81,602 molecular markers (28,571 SNPs and 53,031 SilicoDArTs). Of the 81,602, 15,409 (13,850 SNPs and 1559 SilicoDArTs) were selected for association analysis using the criteria given above. Due to statistically significant interactions, association analyses of genotypes-years, genotypes-locations, years-locations and genotypes-years-locations were performed independently for four environments (combinations of years-locations). The 105 molecular markers (8 SNPs and 97 SilicoDArTs) were associated with the heterosis effect of at least one trait in at least one environment. A total of 186 effects were observed ( Figure 3, Tables 1-9).

Results
Next-generation sequencing yielded 81,602 molecular markers (28,571 SNPs and 53,031 SilicoDArTs). Of the 81,602, 15,409 (13,850 SNPs and 1559 SilicoDArTs) were selected for association analysis using the criteria given above. Due to statistically significant interactions, association analyses of genotypes-years, genotypes-locations, years-locations and genotypes-years-locations were performed independently for four environments (combinations of years-locations). The 105 molecular markers (8 SNPs and 97 Sili-coDArTs) were associated with the heterosis effect of at least one trait in at least one environment. A total of 186 effects were observed ( Figure 3, Tables 1-9).  The heterosis effect of the cob length was determined by five markers: four Sili-coDArTs and one SNP ( Table 1) Table 1). The other four markers determined the heterosis effect only in single environments. The largest number (four) of associations was observed in Łagiewniki in 2014. The heterosis effect of cob diameter was determined by 18 SilicoDArT markers ( Table 2). The effects of these markers varied from  (Table 2). The heterosis effect of core length was determined by eleven markers: eight Silico-DArTs and three SNPs ( Table 3) No marker was significant for the heterosis effect in more than two environments. Three markers (Silico-DArT markers 2490222 and 2548691, and SNP marker 4776193) were associated with the heterosis effect of core length in two environments ( Table 3). The largest number (eight) of associations was observed in Łagiewniki in 2014. The heterosis effect of core diameter was determined by 27 markers: 24 SilicoDArTs and 3 SNPs (Table 4)   The 34 SilicoDArT markers and one SNP marker were associated with the heterosis effect of the number of rows of grain ( Table 5) (Table 5).  The heterosis effect of the number of grains in a row was determined by 15 SilicoDArT markers ( Table 6). The effects of individual markers ranged from −6.98  (Table 6). The heterosis effect of mass of grain from the cob was determined by seven SilicoDArT markers ( Table 7)  The heterosis effect of weight of one thousand grains was determined by ten Silico-DArT markers ( Table 8) Table 8). The largest number (six) of associations was observed in Łagiewniki in 2014. The heterosis effect of yield was determined by five markers: four SilicoDArTs and one SNP ( Table 9) Table 9). The other three markers determined the heterosis effect of yield only in single environments. The largest number (three) of associations was observed in Łagiewniki in 2013. Of particular note are three markers (2490222, 2548691 and 7058267) which were significant in 17, 8 and 6 cases, respectively. Two of them (2490222 and 7058267) were associated with the heterosis effects of yield in three of the four environments (Table 9).

Discussion
It is estimated that by 2050 the planet will be inhabited by approximately 10 billion people and therefore there is increasing pressure to increase and balance food production. Breeders, with the support of scientific entities, are developing ever newer tools that guarantee ever greater accuracy in selection [30,31]. Current selection methods have been enriched by statistical models and advances in molecular biology that allow for both the identification of markers for individual traits that are the result of single genes and those that are determined by multiple QTLs to different degrees, explaining the phenotypic variation in a trait [32,33]. Recent selection methods allow work to be carried out using both Mendelian and population-based models. Each of the aforementioned models has a defined field of action that defines its use within specific experimental needs. Increasingly, molecular markers obtained by next-generation sequencing and association mapping are being used for selection [34].
Association mapping is one tool for identifying novel markers and associated genes. Association mapping assumes that multiple plant materials (usually unrelated or distantly related with a high degree of alignment within lineages) are taken for study [35]. Obviously, the size of the study population depends on the species and trait. In order to achieve the desired effect, the size of the mapping population can vary from about 200 to 800 individuals [35]. Increasing the population size is usually not economically justified [36]. The use of diverse plant materials means that recombinations occurring at earlier stages of derivation of materials are frequent, and the variation contained between them reflects the variation in the gene pool [35]. Thus, the trait markers identified should be useful for broader plant material. The method assumes that all analyzed plant materials are screened for a specific trait and are profiled with the relevant DNA markers [37]. In order to identify trait markers, it is necessary to have a large number of markers densely and evenly distributed across the genome. The density of such coverage is dependent on the value of the coupling imbalance, which will depend on the species and the trait [38]. As a result, markers obtained by next-generation sequencing methods such as GBS [25] or DArTseq [39] and relatively high computing power [38] are required for such studies. In this study, association mapping was used to identify molecular markers related with the heterosis effect. The results of our study indicate that the 105 molecular markers (8 SNPs and 97 SilicoDArTs) were associated with the heterosis effect of at least one trait in at least one environment. A total of 186 effects were observed. The number of statistically significant relationships between the molecular marker and heterosis effect varied from eight (for cob length) and nine (for yield) to forty-two (for the number of rows of grain).
Heterosis, as a genetic phenomenon, determines the beneficial consequence of crossbreeding, which is the vigor of the first generation of hybrids [40]. Interest in hybrid vigor and its practical use started in 1880, when Beal found and described the existence of hybrid vigor after crossing two population varieties. Very often, the economic importance of the heterosis phenomenon can be demonstrated without statistical analyses, as its symptoms are sometimes striking [41]. Although the phenomenon of heterosis has measurable economic benefits, unfortunately it cannot be perpetuated. One exception is vegetatively propagated plants, e.g., bananas, where production has been dominated by two vegetatively propagated hybrids [42].
Very often, but not always, the phenomenon of heterosis concerns quantitative traits, e.g., plant yield. A major breeding goal is the heterosis of grain yield, although we know little about how heterosis effects affect yield components [43]. Many authors argue that grain yield is a trait characterized by low heritability and that the lack of correlation between yield levels of parental lines and their F1 generation hybrids is a result of gene dominance and overdominance [44,45]. In some experimental combinations, despite the significance of additive effects, the percentage of non-additive gene effects to the genetic yield variance of F 1 generation hybrids often exceeded 80%. It appeared that environmental interaction effects influenced the expression of gene action [46]. In the past decade, many researchers in their studies [47][48][49] have applied molecular biology methods to detect and locate loci determining grain yield and yield structure traits in maize. However, intensive research is still underway to search for loci that are coupled to yield and its components [50]. According to various studies, QTL regions associated with genes that determine grain yield and its components are distributed throughout the genome [4]. Our results indicate that five molecular markers (four SilicoDArTs and one SNP) were associated with the heterosis effect of yield. Two (SilicoDArTs: 2490222 and 7058267) of them were associated with the heterosis effect of yield in three of the four environments.
In maize research, molecular analyses intensively focus on identifying QTL regions and new markers. Breeders with DNA analyses are also interested in developing methods to select parental components for heterosis crosses [21] and to predict the effect of heterosis [51]. An idea here is to find an association between the heterogeneity of loci markers for their parental forms and the yield of an F 1 hybrid [52]. It is well known that starting materials with high genetic diversity make for breeding success, as well-identified and heterotically divided starting material results in lower costs for the entire hybrid breeding process [22]. We can divide plant materials into heterotic groups according to the following criteria: genetic origin (pedigree), results of crossing in diallel systems, geographical origin and molecular markers. Unfortunately, the first three subdivision criteria presented are flawed. Therefore, in recent years, attempts have been made to select parental components for heterosis crosses on the basis of their genetic diversity, determined by molecular markers [53]. In the research presented here, association analysis was used to detect relationships between the effect of heterosis and the genotype of hybrids. Similar studies were conducted on winter oilseed rape [54].
Rapidly developing new genotyping methods based on hybridization markers or next generation sequencing are finding increasing application in basic research. The reproducibility of the results of DArT and DArTseq technologies and the availability of a very large number of SNP markers, as well as their declining costs, mean that these modern methods are finding increasing applications in identifying markers for quantitative traits and genome-wide selection in economically important plants. The use of these methods reduces work time significantly. DArTseq technology also proves itself as a powerful diagnostic tool for studying genotypic diversity [55]. DArT and DArTseq technology have been successfully applied in Chinese common wheat (Triticulum aestivum L.) to study the population structure and genetic diversity for 111 breeding lines and cultivars from northern China. The results obtained provided valuable information into China's wheat breeding program for selecting parental forms [56]. The DArT method has been widely used in parentage analysis, such as in oats (Avena sp.), where groups corresponding to spring and winter forms were distinguished for 134 varieties [57]. A low degree of differentiation was shown in 232 forms of Bengal pea (Cajanus cajan) based on 696 DArT markers, of which only 64 were polymorphic; wild forms were the most diverse [58]. Tomkowiak et al. [53], using next-generation sequencing and association mapping, identified 15,409 silicoDArT markers and SNPs significantly associated with yield and yield structure traits in maize. From these markers, 18 SilicoDArT markers determined the quantitative trait, in all localities considered. A physical map was constructed based on these markers. Six of the eighteen identified markers (numbered 1818, 14506, 2317, 3233, 11657 and 12812) were located within the genes on chromosomes 8, 9, 7, 3, 5 and 1, respectively. Two markers (no. 15097-SilicoDArT-and no. 5871-SNP) linked to fusarium resistance in maize have been localized within genes [59] on chromosomes 2 and 3, respectively. In their study, Tomkowiak et al. [60], using DArTseq technology and association mapping, identified six markers associated with vigor and germination in maize. The existing literature indicates that four of these genes (phosphoinositide phosphatase sac7 isoform ×1 gene, grx_c8glutaredoxin subgroup iii gene, sucrose synthase 4 isoform ×2 gene and putative SET domain-containing protein family isoform ×1 gene) determine the level of seed germination and seed vigor in maize.

Conclusions
The integration of molecular genetics with traditional plant selection methods guarantees an objective assessment of phenotype, taking into account environmental factors, and offers the possibility of eliminating undesirable genotypes as early as the seedling stage, thus reducing breeding time. The development of molecular biology, including in vitro culture breeding methods, marker technologies (e.g., next-generation sequencing), the construction of genetic maps of crop plants of different species or the use of genetic mapping, quantitative trait mapping and, more recently, association mapping in combination with appropriately selected mapping populations and automated plant phenotyping methods, open up new possibilities for breeding. The association mapping used in the present study is a good and useful tool for identifying molecular markers that determine the effect of heterosis yield and yield structure traits in maize. The main objective of our work was to propose a method for detecting markers coupled to the effects of heterosis of quantitative traits. The results obtained show the versatility of the given markers as tools for heterosis diagnosis. The advantage of the proposed approach is that it is independent of the sign of the heterosis effect.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.