Modern maize breeding aims to create new cultivars with improved traits [1
]. Currently, breeding is based largely on heterosis (F1 hybrid vigor), which occurs when two inbred lines with the highest combining ability are crossed. The resulting high-yielding hybrids have traits that exceed those of the parental lines. Although the causes of heterosis have not been clearly determined, several hypotheses explain the phenomenon, e.g., the dominance and overdominance hypotheses. Heterosis is believed to be associated with the genetic distance between parental forms, as determined by DNA polymorphism; adequate selection of parental components is therefore a key element in the breeding process [1
]. Intensive research is currently being conducted on the possible use of molecular markers in the selection of parental lines for heterosis crosses [2
]. Markers based on single nucleotide polymorphisms (SNPs) are increasingly used for this purpose [5
Modern methods for identifying SNPs use next-generation sequencing (NGS) technology. NGS, developed in the 21st century, provides much higher performance and throughput than the previously used Sanger sequencing technique [6
]. This technology provides inexpensive genome-wide sequence reads for applications such as chromatin immunoprecipitation, mutation mapping, polymorphism detection and detection of non-coding RNA sequences [7
]. Sequencing methods such as restriction site-associated DNA sequencing (RADseq) [8
], multiplexed shotgun genotyping (MSG) [9
] and bulked segregant RNA-Seq (BSR-Seq) [10
] enable the identification of a significant number of markers and more accurate examination of many loci in a small number of samples.
Another genotyping-by-sequencing method, already applied to many hundreds of organisms, is DArTseq™. DArTseq™ combines DArT complexity reduction methods with next-generation sequencing platforms [11
]. Therefore, DArTseq™ represents a new implementation of sequencing of complexity reduced representations [16
] and more recent applications of this concept on the next generation sequencing platforms [8
]. The DArTseq procedure [15
] is used, among others, to identify single nucleotide markers (SNPs) and provides a large pool of so-called silicoDArTs that have a dominant character because the variability is determined by a single point mutation, without a variant from the second homologous chromosome. The presence or absence of a mutation (silicoDArT marker) is often treated as a single feature, which is assigned a value of 1 or 0, respectively. In this method, the genome complexity is reduced by restriction enzyme digestion and sequencing of short fragments [12
]. DArTseq technology replaces the hybridization stage with sequencing on the Illumina system [18
]. As with DArT methods based on array hybridization, the DArTseq™ technology is optimized for each organism by applying the most appropriate complexity reduction method (both the size of the representation and the fraction of the genome selected for assays). DArTseq™ was optimized for maize a few years ago and was used to characterize the complete maize germplasm collection of the International Maize and Wheat Improvement Center (CIMMYT).
Association mapping, also called linkage disequilibrium mapping, involves searching for genotype–phenotype correlations in unrelated individuals using dedicated statistical methods [19
]. The association mapping approach provides possibilities to generate good quality markers for marker-assisted selection (MAS). Functional markers tightly linked with the trait reflect gene polymorphisms, which directly cause phenotypic variation. Association mapping provides opportunities to find such markers in a broad spectrum of genetic resources. Its potential results from the likelihood of higher mapping resolution, due to the use of a larger number of recombination events in the germplasm’s developmental history [22
]. Thus, association mapping has become a promising approach compared to traditional mapping. There are two main types of association mapping: genome-wide association mapping (GWAM) and candidate gene association mapping (CGAM). The GWAM approach surveys genetic variation across the whole genome to find association signals for various complex traits, whereas CGAM correlates DNA polymorphisms in selected candidate genes with the trait of interest [20
]. There are many examples of successful application of association analysis in cereals, mainly in maize.
Recently, GWAM has evolved as a powerful tool to dissect the genetic architecture of complex traits in crop species [22
]. Advances in NGS allow identification of thousands of genetic marker loci, which in turn enables their statistical association with traits of interest based on linkage disequilibrium [24
]. Skim-based genotyping by sequencing (skimGBS) uses low-coverage (1–10×) whole genome sequencing for high resolution genotyping. Genomic reads from parental individuals are mapped to the reference genome and SNPs are predicted. Reads from the progeny are then mapped to the same reference and comparison with the parental SNP file enables calling of SNPs in the progeny of one or other parental genotypes [25
]. Associated genetic markers can be causal for the trait of interest or in linkage disequilibrium with a causal locus [20
]. To date, GWAM approaches using whole genome sequencing have allowed researchers to dissect genetic regulation of complex traits, such as oil biosynthesis, carotenoid concentration and yield in well studied crops, including maize and rice [26
The aim of this study was to identify SNP and SilicoDArT markers associated with yield traits and morphological features in maize (Zea mays L.). The results are intended to facilitate the selection of parental components for heterosis crosses.
Maize, one of our most important crop species, has been the target of genetic investigation and experimentation for more than 100 years. Crossing two inbred lines tends to result in “better” offspring, a process known as heterosis. Attempts to map genetic loci that control traits important for farming have been made, but few have been successful [30
The DArTseq technology is a modification of the DArT method. It replaces the hybridization step on microarrays with next-generation sequencing on the Illumina system [18
]. Several times more polymorphic markers—both dominant silicoDArT and codominant SNPs—are obtained as a result of the analysis.
The first association mapping (AM) was described in wheat [31
], where markers associated with resistance to cereal rust, yellow rust, powdery mildew and also grain yield were identified. One hundred and seventy winter wheat (Triticum aestivum
L.) lines were analyzed. A number of markers for the traits studied were selected based on AM and positioned on the appropriate chromosomes using a genetic map containing 1644 markers, of which 813 were DArT markers. DArT markers have also proved useful in association mapping in wheat [32
]. The latter authors identified markers highly associated with important agronomic traits useful in breeding programs. DArT markers have been successfully used to analyze the genetic diversity and structure of Chinese common wheat (Triticum aestivum
L.) populations. A total of 111 cultivars and breeding lines from northern China were examined. The results provided information for further selection of parental forms and establishing heterozygous test materials for the needs of the Chinese wheat breeding program [33
In the present study, similarity dendrograms were constructed between the inbred lines based on all significant SNPs and SilicoDArT markers. Inbred lines on both dendrograms clustered according to the kernel structure (flint, dent) and origin. There were no statistically significant differences in the clustering of the analyzed lines between genetic similarity results based on SNP and SilicoDArT markers.
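The similarity coefficient underlying such dendrograms can be illustrated with a minimal Python sketch. It assumes the commonly cited form of the Nei and Li coefficient, S = 2N_ab / (N_a + N_b), applied to 0/1 marker scores. The function name, the toy profiles and the handling of missing scores (skipping them pairwise) are illustrative assumptions, not the procedure implemented in Genstat.

```python
from typing import Optional, Sequence

Score = Optional[int]  # 1 = marker present, 0 = absent, None = missing


def nei_li_similarity(a: Sequence[Score], b: Sequence[Score]) -> float:
    """Nei-Li similarity S = 2*N_ab / (N_a + N_b) for two 0/1 profiles.

    Markers with a missing score in either line are skipped
    (an assumption made for this sketch).
    """
    shared = present_a = present_b = 0
    for x, y in zip(a, b):
        if x is None or y is None:
            continue
        present_a += x
        present_b += y
        shared += 1 if (x == 1 and y == 1) else 0
    if present_a + present_b == 0:
        return 0.0  # no marker present in either line
    return 2.0 * shared / (present_a + present_b)


# Hypothetical marker profiles for two inbred lines.
line_a = [1, 1, 0, 1, 0]
line_b = [1, 0, 0, 1, 1]
similarity = nei_li_similarity(line_a, line_b)  # 2*2 / (3 + 3)
```

A distance suitable for clustering can then be taken as 1 − S.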
Association mapping identified 969 markers significantly associated (at FDR < 0.05 in GWAM) with the analyzed morphological features and yield structure traits. Among the selected markers, 623 were SilicoDArTs and 346 were SNPs. The fewest markers (6) were associated with the NPLB trait, and the most (150) with the anthocyanin coloration of glumes of cob. Three markers were associated with five or six traits: SilicoDArT 4591115 (anthocyanin coloration of anthers, length of main axis above the highest lateral branch, cob length, number of grains per cob, weight of fresh grains per cob and weight of fresh grains per cob at 15% moisture), SilicoDArT 7059939 (anthocyanin coloration of glumes of cob, time of anthesis at 50% of flowering plants, time of silk emergence at 50% of flowering plants, anthocyanin coloration of anthers and cob diameter) and SilicoDArT 5587991 (anthocyanin coloration of glumes of cob, time of anthesis, anthocyanin coloration of anthers, curvature of lateral branches and number of rows of grain). The sequences of SilicoDArTs 4591115, 7059939 and 5587991 were used for physical mapping in the Zea mays genome. Marker 4591115 was localized on chromosome 5 in a non-coding region. A putative MYB DNA-binding domain protein gene, a vegetative cell wall protein gp1-like gene and the uncharacterized protein LOC100191236 were located in its closest neighborhood (10–60 kb). It is difficult to conclude whether the localization of this marker in the genome has a direct bearing on its detected correlations with the traits.
One fragment (41 bp) of SilicoDArT 5587991 showed homology to chromosome 1, and the second fragment of this marker (28 bp) aligned to chromosome 3 within an intron of the 50S ribosomal protein L31 gene. The two fragments (27 and 22 bp) of marker 7059939 aligned to separate regions of chromosome 2 (120 Mb apart). The first fragment was located directly within the serine carboxypeptidase-like gene and the second in the protein LOC103648845 gene. Both markers showed homology to coding sequences, which might affect the correlated traits. It is difficult to explain the division of the marker sequences into two fragments. It could result from incidental ligation during library preparation or, alternatively, from real rearrangements in the genomes of the tested lines.
The theoretical basis for the relationship between genetic distance and heterosis was presented by Bernardo [34
]. He found that molecular markers could be useful for predicting heterosis if dominance effects are strong, allele frequencies are negatively correlated between the parents, heritability is high and the markers are associated with quantitative trait loci (QTL).
In this study, similarity dendrograms were created for the inbred lines based on molecular markers. The lines analyzed clustered according to origin. Most studies indicate that the less related the parental components are, the higher the heterosis effect that can be expected in the F1 generation hybrids. Thus, when information about the origin of parental lines is missing or incomplete, molecular SNP and SilicoDArT markers may be useful in predicting hybrid formulas in heterosis breeding.
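The hierarchical grouping behind such dendrograms (UPGMA, as named in the Materials and Methods) can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the input is a symmetric distance matrix (for example 1 − S), the line names and distances are hypothetical, and each merge is reported at the average inter-cluster distance rather than at half that value, as some dendrogram conventions do.

```python
from typing import Dict, List, Sequence, Tuple

Merge = Tuple[Tuple[str, ...], Tuple[str, ...], float]


def upgma(names: Sequence[str], dist: Sequence[Sequence[float]]) -> List[Merge]:
    """Average-linkage (UPGMA) clustering; returns the merges in order."""
    clusters: Dict[int, Tuple[str, ...]] = {i: (n,) for i, n in enumerate(names)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id).
    d: Dict[Tuple[int, int], float] = {
        (i, j): dist[i][j]
        for i in range(len(names))
        for j in range(i + 1, len(names))
    }
    merges: List[Merge] = []
    next_id = len(names)
    while len(clusters) > 1:
        (i, j), dij = min(d.items(), key=lambda kv: kv[1])  # closest pair
        ni, nj = len(clusters[i]), len(clusters[j])
        merges.append((clusters[i], clusters[j], dij))
        new_d = {}
        for k in clusters:
            if k in (i, j):
                continue
            dik = d[(min(i, k), max(i, k))]
            djk = d[(min(j, k), max(j, k))]
            # Size-weighted mean equals the average over all member pairs.
            new_d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        d = {key: v for key, v in d.items() if i not in key and j not in key}
        d.update(new_d)
        merged = clusters.pop(i) + clusters.pop(j)
        clusters[next_id] = merged
        next_id += 1
    return merges


# Hypothetical distances (1 - S) among four lines.
names = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.6, 0.7],
    [0.6, 0.6, 0.0, 0.3],
    [0.7, 0.7, 0.3, 0.0],
]
merges = upgma(names, dist)
```

In this toy example, A joins B first, then C joins D, and the two pairs are merged last at the average of their four between-pair distances.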
Nineteen tropical maize biparental populations evaluated in multienvironment trials were used in Zhang et al.’s [35
] study to assess the prediction accuracy of different quantitative traits using low-density (~200 markers) and genotyping-by-sequencing (GBS) single-nucleotide polymorphisms (SNPs), respectively. An extension of the genomic best linear unbiased predictor that incorporates genotype × environment (GE) interaction was used to predict genotypic values; cross-validation methods were applied to quantify prediction accuracy. Their results showed that low-density SNPs were largely sufficient to obtain a good prediction in biparental maize populations for simple traits with moderate-to-high heritability, but GBS outperformed low-density SNPs for complex traits. GE interaction in maize is usually strong for complex quantitative traits, and maize hybrids are always tested in multiple environments. Most of the current genomic prediction studies have only applied a single-environment model and have not considered predictive models with correlated environmental structures [36
]. Cook et al. [37
] conducted joint-linkage quantitative trait locus (QTL) mapping and GWAM for kernel starch, protein and oil in the maize nested association mapping population, composed of 25 recombinant inbred line families derived from diverse inbred lines. Joint-linkage mapping revealed that the genetic architecture of kernel composition traits is controlled by 21–26 QTLs. Numerous GWAM associations were detected, including several oil and starch associations in acyl-CoA:diacylglycerol acyltransferase 1–2, a gene that regulates oil composition and quantity. Results from nested association mapping were verified in a 282 inbred association panel using both GWAM and candidate gene association approaches. They identified many beneficial alleles that will be useful for improving kernel starch, protein and oil content. Benke et al. [38
] investigated the effect of two different Fe regimes on the formation of morphological and physiological traits; they identified polymorphisms significantly associated with these traits and analyzed the correlations between the traits using an association mapping population. Fine mapping of QTL confidence intervals in the intercrossed B73 × Mo17 population identified a total of 13 SNPs under the Fe-limited regime and 2 SNPs under normal supplementation that were significantly (FDR = 0.05) associated with cytochrome P450 94A1, invertase beta-fructofuranosidase insoluble isozyme 6 and a low-temperature-induced 65 kDa protein. Association analysis of the entire genome under restricted and normal Fe treatments yielded a total of 18 and 17 significant SNPs, respectively. Dell’Acqua et al. [39
] generated, for the first time, a balanced multi-parental population in maize, which serves as a tool for QTL mapping owing to its high diversity and dense recombination events. The authors generated 1636 MAGIC maize recombinant inbred lines originating from eight genetically different founder lines. The analysis of 529 MAGIC maize lines demonstrated that the population is a balanced, uniformly differentiated mosaic of the eight founders, with mapping power and resolution enhanced by the high frequencies of minor alleles and the rapid decay of linkage disequilibrium. That study provided evidence of how MAGIC maize can be used to find strong candidate genes by incorporating genome sequencing and transcriptomic information. The authors described three flowering time QTLs and three grain yield QTLs and indicated potential candidate genes. Power simulations demonstrated that subsets of MAGIC maize achieve high power and high resolution in QTL mapping. According to Xiao et al. [40
], a growing number of readily available GWAM results allow us to narrow down association analyses to single well-annotated candidate genes and to elucidate the structure of the genome and its constitution in relation to the studied traits. First attempts to calculate the distribution pattern of associated loci at the whole-genome level have demonstrated that intragenic regions and regions in close proximity to genes (as opposed to intergenic regions) were primarily responsible for the variability of maize traits, particularly the 5′UTR (untranslated region) [10
]. In addition, non-synonymous SNPs along with variants with high copy numbers show the highest rate of functional mutations, while intergenic regions contain significantly fewer functional SNPs [41
]. The above systematic studies indicate that gene regulation at the level of expression should have an important function in phenotypic diversity. The expression pattern of immature maize kernels has been broadly studied within the frame of this hypothesis [42
] and highly similar conclusions have been drawn as in earlier studies regarding quantitative traits; namely that non-synonymous SNPs are the crucial factors in expression regulation, and they have the highest number of SNP-QTL associations [42
4. Materials and Methods
4.1. Plant Materials
The plant material comprised sixty-two inbred lines from the maize collections of two Polish breeding companies: Plant Breeding Smolice IHAR Group (Poland) and Plant Breeding Małopolska (Poland). The analyzed lines included both flint and dent grain types. Lines with a flint-type grain belonged to three different origin groups: F2 (a group related to the F2 line bred at INRA in France from the Lacaune population), EP1 (a group related to the EP1 line, bred in Spain from a population derived from the Pyrenees) and German Flint (a line group bred from the local German population). Lines with a dent-type grain belonged to different origin groups from the United States: Iowa Stiff Stalk Synthetic (BSSS), Iowa Dent (ID) and Lancaster. Inbred lines of complex origin bred from different starting populations and lines of unknown origin were also analyzed (Table 3
The field experiment with inbred maize lines was established in 2015 on 10 m² plots in a randomized complete block design with three replicates at two Polish breeding stations: Plant Breeding Smolice IHAR Group (51°42′20.813″ N, 17°9′57.405″ E) and Plant Breeding Małopolska (50°58′12.75″ N, 16°56′5.892″ E). The analysis of morphological features was conducted from May to October 2015 and included 13 traits: type of grain (TG), anthocyanin coloration of glumes of cob (ACGC), time of anthesis at 50% of flowering plants (TA), time of silk emergence at 50% of flowering plants (TSE), anthocyanin coloration of silks (ACSi), anthocyanin coloration of anthers (ACA), anthocyanin coloration at the base of the glume (ACBG), angle between main axis and lateral branches (ANGLE), curvature of lateral branches (CLB), length of main axis above the highest lateral branch (LMA), number of primary lateral branches (NPLB), anthocyanin coloration of sheath (ACSh) and anthocyanin coloration of internodes (ACI). Biometric measurements were carried out in the first half of November 2015 and included nine traits: plant length (PL), height ratio of insertion of peduncle of the upper ear to plant length (HIP), cob diameter (DC), cob length (LC), number of rows of grain (NRG), number of grains per cob (NGC), weight of fresh grains per cob (WFGC), dry matter content at harvest time (DM) and weight of fresh grains per cob at 15% moisture (WFG15). Yield structure traits were measured on 20 randomly selected cobs from three replicates of each inbred line.
Climatic conditions: In 2015, the average rainfall in Smolice was 39.45 mm, which was 5.82 mm lower than the long-term average. The highest rainfall was recorded in July (55 mm) and the lowest in March (15 mm). The average air temperature that year in Smolice was 11.54 °C, which was 1.8 °C higher than the long-term average. The warmest month in 2015 was August (21 °C), while the lowest temperature was recorded in December (1.1 °C). In 2015, rainfall and temperature levels were unfavorable during the initial development of maize. Despite the early sowing date, the maize remained in the 2–3 leaf stage for a long time, and purple discoloration was visible on the leaves due to difficulty in taking up phosphorus from the soil. Rainfall was abundant in May, which had a positive effect on the further development of maize.
4.3. Genotyping and SilicoDArT and SNP Data Processing
Genotype data for association mapping were derived from polymorphisms identified in DArT and candidate gene sequences.
Sixty-two lines were genotyped. Total genomic DNA was extracted from the young leaves of the analyzed forms using the GenElute Plant Mini Kit (Sigma-Aldrich, Poznań, Poland). DNA purity and concentration were determined spectrophotometrically (Thermo Scientific, Waltham, MA, USA). The concentration of all DNA samples was adjusted to 100 ng µL−1. The DArTseq analysis was performed at Diversity Arrays Technology Pty Ltd. (Australia).
DNA samples were processed in digestion/ligation reactions according to Kilian et al. [13
], but with the single PstI-compatible adapter replaced by two adapters corresponding to PstI- and NspI-compatible sequences, and with the assay moved to the sequencing platform as described by Sansaloni et al. [15
]. The PstI-compatible adapter was designed to include an Illumina flowcell attachment sequence, a sequencing primer sequence and a “staggered”, varying-length barcode region, similar to the sequence reported by Elshire et al. [17
]. The reverse adapter contained a flowcell attachment region and an NspI-compatible overhang sequence.
Only “mixed fragments” (PstI–NspI) were amplified in PCR using the following reaction conditions: initial denaturation for 1 min at 94 °C, followed by 30 cycles of 94 °C for 20 s, 58 °C for 30 s and 72 °C for 45 s, and a final elongation at 72 °C for 7 min. After PCR, equimolar amounts of amplification products from each sample of the 96-well microtiter plate were bulked and applied to c-Bot (Illumina) bridge PCR, followed by sequencing on an Illumina HiSeq 2500. The sequencing (single read) was run for 77 cycles.
Sequences generated from each lane were processed using proprietary DArT analytical pipelines. In the primary pipeline, the fastq files were first processed to filter out poor-quality sequences, applying more stringent selection criteria to the barcode region than to the rest of the sequence. In this way, the assignment of sequences to specific samples in the “barcode split” step was very reliable. Approximately 2,500,000 (±7%) sequences per barcode/sample were used in marker calling. Finally, identical sequences were collapsed into “fastqcall files”. These files were used in the secondary pipeline for DArT PL’s proprietary SNP and SilicoDArT (presence/absence of restriction fragments in representation) calling algorithms (DArTsoft14). For the association analysis, only DArT sequences meeting the following criteria were selected: one SilicoDArT and SNP within a given sequence (69 nt), minor allele frequency (MAF) >0.25 and a missing observation fraction <10%. SilicoDArT and SNP sequences were mapped using the Blast service at https://www.gabipd.org/
with default parameters. The sequences (69 bp) of three SilicoDArT markers (4591115, 7059939 and 5587991) were used to search the RefSeq genome database of Zea mays
(tax ID: 4577) with Nucleotide BLAST search (NCBI, https://blast.ncbi.nlm.nih.gov/Blast.cgi
). The silicoDArT markers are dominant because they represent the presence versus absence of a restriction enzyme fragment in the genomic representations of a subset of lines in the analysis. These markers were extracted with the DArTsoft14 software; markers present in a representation were scored as 1 and those absent as 0.
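The MAF and missing-data criteria described above can be illustrated with a short Python sketch for 0/1-scored markers. It is not the DArTsoft14 pipeline: the function names and toy data are hypothetical, and for dominant markers in inbred lines the frequency of the rarer score is used here as a proxy for the minor allele frequency, which is an assumption of this sketch. The criterion of one SilicoDArT and SNP per sequence is not modeled.

```python
from typing import Dict, List, Optional, Sequence

Score = Optional[int]  # 1 = present, 0 = absent, None = missing observation


def missing_fraction(scores: Sequence[Score]) -> float:
    """Fraction of lines with no score for this marker."""
    if not scores:
        return 1.0
    return sum(s is None for s in scores) / len(scores)


def minor_frequency(scores: Sequence[Score]) -> float:
    """Frequency of the rarer score among called lines (proxy for MAF)."""
    called = [s for s in scores if s is not None]
    if not called:
        return 0.0
    freq_present = sum(called) / len(called)
    return min(freq_present, 1.0 - freq_present)


def passes_filters(scores: Sequence[Score],
                   min_maf: float = 0.25,
                   max_missing: float = 0.10) -> bool:
    """Apply the thresholds stated in the text: MAF > 0.25, missing < 10%."""
    return (minor_frequency(scores) > min_maf
            and missing_fraction(scores) < max_missing)


# Hypothetical scores for three markers across ten lines.
markers: Dict[str, List[Score]] = {
    "marker_a": [1, 1, 1, 0, 0, 0, 1, 0, 1, 0],        # balanced, complete
    "marker_b": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],        # rare absence
    "marker_c": [1, 0, None, None, 1, 0, 1, 0, 1, 0],  # 20% missing
}
kept = [name for name, scores in markers.items() if passes_filters(scores)]
```

Only the first toy marker satisfies both thresholds; the second fails on frequency and the third on missing data.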
4.4. Statistical Analysis and Association Mapping
A one-way analysis of variance (ANOVA) was performed to verify the hypothesis of the lack of the effect of lines on the variability of observed traits. Sample sizes for lines that were used in calculations were equal to ten for each of the four replications. The coefficients of genetic similarity (S) of the investigated lines were calculated using the Nei and Li [29
] formula. Lines were grouped hierarchically using the unweighted pair group method of arithmetic means (UPGMA) based on calculated coefficients. The relationships among lines were presented in the form of a dendrogram. Association mapping was performed using a method based on the mixed linear model with the population structure estimated by the eigenanalysis (principal component analysis applied to all markers) and modeled by random effects [43
]. All analyses were conducted in Genstat 18.2. Significance of associations between traits and SilicoDArT and SNP markers was assessed on the basis of p-values corrected for multiple testing by the Benjamini–Hochberg method.
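The Benjamini–Hochberg correction can be sketched as follows. This is a generic step-up implementation in plain Python with hypothetical p-values, given for illustration only; it is not the Genstat routine used in the study.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical raw p-values for four marker-trait tests.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
significant = [i for i, q in enumerate(adj) if q < 0.05]
```

With these toy values, only the first test remains significant at FDR < 0.05, although three raw p-values are below 0.05.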
New genotyping methods based on hybridization markers or NGS are increasingly applied in basic research. The availability of large numbers of SNP markers, the reproducibility of DArT technology and the decreasing costs of both allow these methods to be used in applied research on economically important plants, for example to identify trait markers or even to carry out selection at the whole-genome level, when time is more important than the initial financial expenditure. The results of this research clearly demonstrate the advantages of DArTseq technology, which identified 49,911 polymorphisms (33,452 SilicoDArTs and 16,459 SNPs). Three of these markers deserve particular attention because they were associated with five or six traits: SilicoDArT 4591115, SilicoDArT 7059939 and SilicoDArT 5587991. These markers will be analyzed and tested in subsequent years of research. The results also show that SilicoDArT and SNP markers can be used in this species to group lines by origin, including lines with incomplete origin data. They can therefore be used to select parental components for heterosis hybrids. However, one should be aware of the advantages and disadvantages of the DArT and SNP markers discussed. DArT markers may sometimes be more convenient than SNP markers because of their dominant nature (e.g., in polyploid species). The availability of probe DNA sequences, and thus the possibility of developing specific markers, is their unquestionable advantage. At the same time, the DArTseq technology (in contrast to GBS) provides a large pool of so-called silicoDArTs, which are also dominant (such a marker is either present or absent in a given genotype, but is not related to a difference in the DNA sequence of the marker).
One should also be aware of the fact that DArT markers, due to their known location in many utility species, may be a better solution than SNP markers when the chromosomal location or localization of linkage groups to specific chromosomes of a given species is important. Therefore, it is worthwhile to consider which type of markers will provide greater advantages in the case of specific research tasks.