Identification and Analysis of Candidate Genes Associated with Yield Structure Traits and Maize Yield Using Next-Generation Sequencing Technology

The main challenge of agriculture in the 21st century is the continuous increase in food production. In addition to ensuring food security, the goal of modern agriculture is the continued development and production of plant-derived biomaterials. Conventional plant breeding methods do not allow breeders to obtain new varieties within a short time. Advanced molecular biology tools now play a significant role worldwide, markedly contributing to biological progress. The aim of this study was to identify new markers linked to candidate genes determining grain yield. Next-generation sequencing, gene association, and physical mapping were used to identify markers. An additional goal was to optimize diagnostic procedures for identifying molecular markers on reference materials. As a result, 19 SNP markers significantly associated with yield structure traits in maize were identified. Five of these markers (28629, 28625, 28640, 28649, and 29294) are located within genes that can be considered candidate genes associated with yield traits. For two markers (28629 and 29294), different amplification products were obtained on the electropherograms. For marker 28629, a specific product of 189 bp was observed for genotypes 1, 4, and 10. For marker 29294, a specific product of 189 bp was observed for genotypes 1 and 10. Both markers can be used for the preliminary selection of well-yielding genotypes.


Introduction
Biological progress in modern plant breeding is defined as the development of new genotypes with traits relevant to agricultural practice [1,2]. These traits are associated with plant productivity and health, the suitability of produced materials for processing, and meeting the expectations of consumers (food) and users of non-consumable materials (e.g., cellulose-based resources). Maize breeding is aimed at developing high-yielding hybrid varieties [3–5]. This progress increasingly relies on advances in genomics and genetic engineering [6].
Maize breeding worldwide is based on a wide range of research techniques in molecular genetics, primarily in two areas. The first is making selection decisions based on DNA nucleotide sequence analysis, and the second is expanding genetic variability in breeding populations through genetic modification, primarily by developing plants that carry genes from foreign species [7,8]. This has not only created attractive prospects for achieving biological progress but has also opened new possibilities for the utilization of maize and other crops [5].
The introduction of molecular analysis of genetic markers has enabled the development of selection methodologies based on those markers, known as marker-assisted selection (MAS). The scope for applying this methodology clearly depends on how much is known about the genome of a given species [9,10]. The breakthrough in genomics was the completion of the first stage of sequencing the human genome, announced in February 2001 in two publications, in "Nature" and "Science", describing its organization [11,12]. This achievement opened up a wide range of possibilities for characterizing the genomic sequences of crop plants [13–16]. The identification of genome sequences has revealed a vast number of differences between the studied genomes, the majority being single-nucleotide polymorphisms (SNPs). These have been used to create a tool for global genome analysis, the so-called SNP microarray, which enables the simultaneous determination of genotypes at thousands of loci [17]. SNP microarrays have become a key tool in the search for unknown polymorphisms (mutations) responsible for genetic diseases (e.g., monogenic) or predisposition to their development (e.g., complex diseases), as well as for the phenotypic variability of production traits. The procedure based on SNP microarrays is commonly referred to as a genome-wide association study (GWAS) and is widely used primarily in human genomics and increasingly in animal and plant genomics, including maize [18].
To identify trait markers, it is necessary to have a large number of markers that densely and uniformly cover the genome. The required density of this coverage depends on the linkage disequilibrium, which is species- and trait-dependent. As a result, such studies require markers obtained by next-generation sequencing methods such as GBS [17] or DArTseq, as well as relatively large computational power. Two approaches are distinguished in association mapping: candidate gene association and genome-wide association studies (GWASs). Conducting a GWAS involves searching for trait-marker associations across the entire genome, assuming that some markers are in linkage disequilibrium with the gene regulating the expression of a given trait [19]. Initially, association mapping performed in maize [20] did not consider population structure, which can produce false-positive associations. These were filtered out in the study of Pritchard, who included population structure in his maize study [21].
With the advancement of efficient marker methods and the availability of statistical software (Genstat 23), the number of analyzed species has increased, and DNA markers identified by this method are currently used in breeding practice [22–24]. Association mapping has proven useful for identifying markers of traits whose quantitative loci explain a significant portion of trait variation [25]. However, this method has limited application for complex traits with weak effects of individual loci [26].
For several years, maize breeding worldwide has been supported by useful molecular markers. Many authors have stated that marker-assisted breeding accelerates yield growth not only in the USA but also in other countries, offering enormous potential to enhance maize productivity and germplasm value [27,28].
Therefore, the aim of the present study was to identify new markers linked to candidate genes determining grain yield using next-generation sequencing, gene association, and physical mapping, as well as to optimize the diagnostic procedures for the identification of 19 selected molecular markers.

Plant Material
The plant material included 66 inbred lines, 122 F1 hybrids, and 20 reference genotypes of maize (both high-yielding and low-yielding). The plant material was derived from Hodowla Roślin Smolice Sp. z o.o. Grupa IHAR (51  55 ′ 50 ′′ E). Part of the analyzed lines were flint grain lines of three different origins: F2 (a group related to the F2 line, bred at INRA in France from the Lacaune population), EP1 (a group related to the EP1 line, bred in Spain from a population derived from the Pyrenees), and German Flint. The second part of the plant material comprised dent-type lines derived from various groups of origin from the United States: Iowa Stiff Stalk Synthetic (BSSS), Iowa Dent (ID), and Lancaster.

Weather Conditions
The data came from the weather station belonging to the Poznań University of Life Sciences. In 2022, the average rainfall in Smolice was 39.94 mm, lower than the multi-year average of 48.27 mm. The wettest month of the year was July (102 mm of rainfall), and the driest was October (3 mm of rainfall). The temperature ranged from −2.1 °C in February to 21.6 °C in July. The average air temperature, 9.79 °C, was 1.05 °C higher than the long-term average. In the second decade of April, no rainfall was recorded, which had an adverse effect on maize emergence, while in May, rainfall totals were higher, which resulted in soil sealing and uneven maize emergence. In 2022, the average rainfall in Kobierzyce was 40.93 mm, which, as in Smolice, was lower than the multi-year average (by 7.37 mm). The highest rainfall was recorded in May (81 mm), and the lowest in October (12.4 mm). The average air temperature in Kobierzyce in 2022 was 10.3 °C, 1.42 °C higher than the long-term average (Figure 1). The warmest month was July (22.5 °C), and the coldest was February (−4.3 °C). In 2022, no weather anomalies were observed in Smolice or Kobierzyce. Despite periodic drought, the weather was typical for these areas of Poland.

DNA Isolation
DNA from 66 inbred lines and 122 F1 hybrids was isolated using a commercial reagent kit purchased from Promega. The isolated DNA samples were subjected to next-generation sequencing. DNA from the 20 reference genotypes was isolated using a commercial reagent kit purchased from A&A Biotechnology. The concentration and purity of the isolated DNA samples were determined using a DS-11 spectrophotometer from DeNovix. The isolated DNA template was adjusted to an equal concentration of 100 ng/µL by diluting it with double-distilled water (ddH2O).

Genotyping
The methodology followed Sobiech et al. [29]. DArTseq technology, which is based on next-generation sequencing, was applied for genotyping. The isolated DNA of the 188 maize plants tested (100 ng in 25 µL from each genotype) was sent in two 96-well Eppendorf plates for analysis to identify SilicoDArT and SNP markers. The analyses were performed at Diversity Arrays Technology, University of Canberra, Australia. Using methods proposed by Baird et al. [30], in the first step, the DNA template was digested with the ApeKI, PstI, and MspI restriction enzymes to reduce genome complexity. The original GBS method used a single enzyme, ApeKI (Elshire et al. [31]); the method was later expanded to include two additional enzymes: an infrequent cutter, PstI, in combination with a frequent cutter of genomic DNA, MspI (Poland and Rife [32]). This approach enabled the creation of a homogeneous library and the detection of most fragments associated with the infrequently cutting enzyme. A characteristic feature of the applied enzymes is their sensitivity to methylation, which allows non-coding regions and methylated repetitive sequences such as mobile elements to be filtered out. In the following step, the genomic DNA fragments cleaved by the restriction enzymes were ligated with adapters. Since the adapters contain identifiers (so-called barcodes), the origin of each sample was strictly defined, and the identifiers met the appropriate criteria (Poland et al. [33]). The resulting PCR products were analyzed for size and constituted a genomic library, which was subsequently sequenced on an Illumina NGS platform (Kilian et al. [34]), following the methodology detailed on the Diversity Arrays Technology website (https://www.diversityarrays.com/technology-and-resources/dartseq/; accessed on 20 October 2023).
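The complexity-reduction step can be illustrated with a minimal Python sketch, assuming the standard recognition sequences of the three enzymes (PstI: CTGCAG; MspI: CCGG; ApeKI: GCWGC, where W is A or T). The toy sequence and the function names are hypothetical, positions are reported as recognition-site starts rather than exact cut positions, and the sketch is not the DArTseq pipeline itself. It lists the spans bounded by a PstI site at one end and an MspI site at the other, which is the kind of fragment a two-enzyme library is generally designed to favor.

```python
import re

# Recognition sequences written as regular expressions (5'->3').
# [AT] encodes the degenerate base W in the ApeKI site GCWGC.
RECOGNITION_SITES = {
    "PstI": "CTGCAG",
    "MspI": "CCGG",
    "ApeKI": "GC[AT]GC",
}

# Hypothetical toy sequence used only for illustration.
TOY_SEQUENCE = "AACTGCAGTTTTCCGGAAAACCGGTTCTGCAGAAGCAGCTT"


def find_sites(sequence, pattern):
    """Return 0-based start positions of every match, overlapping ones included."""
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


def mixed_end_fragments(sequence):
    """Return (left, right) site positions of adjacent PstI/MspI site pairs.

    Only spans whose two neighbouring sites belong to different enzymes
    are kept; cut offsets inside the recognition site are ignored.
    """
    sites = sorted(
        (pos, enzyme)
        for enzyme in ("PstI", "MspI")
        for pos in find_sites(sequence, RECOGNITION_SITES[enzyme])
    )
    return [
        (left_pos, right_pos)
        for (left_pos, left_enz), (right_pos, right_enz) in zip(sites, sites[1:])
        if left_enz != right_enz
    ]


for enzyme, pattern in RECOGNITION_SITES.items():
    print(enzyme, find_sites(TOY_SEQUENCE, pattern))
print("PstI/MspI spans:", mixed_end_fragments(TOY_SEQUENCE))
```

Because PstI recognizes six bases and MspI four, PstI sites are expected to be much rarer in random sequence, which is why the text describes them as the infrequent and frequent cutter, respectively.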

Genes 2024, 15, 56

Association Mapping by Carrying Out a GWAS
Association mapping of 188 maize genotypes (66 lines and 122 hybrids) was performed for yield and yield structure traits by carrying out a GWAS. This mapping was based on the results of the genotyping and phenotyping analyses. The genotypic data were obtained from the DArTseq analysis, while the phenotypic data comprised results from field experiments concerning yield and ear structure traits. The following traits were analyzed: ear length, ear diameter, core length, core diameter, the number of rows, the number of kernels per row, TSW, and yield per plot. Based on the GWAS, the SilicoDArT and SNP markers showing the highest significance level, i.e., those most strongly associated with yield structure traits and yield, were selected for further study.
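The idea of ranking markers by the strength of their association with a trait can be sketched in a few lines of Python. This is a deliberately simplified illustration, not the procedure used in the study: it assumes presence/absence (0/1) marker scores, compares the trait means of the two marker classes with a Welch t-statistic, and ignores population structure, kinship, and multiple-testing correction, all of which a real GWAS has to handle. The marker names and trait values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(trait_values, marker_calls):
    """Welch t-statistic for trait means of genotypes scored 1 vs 0 at one marker."""
    present = [y for y, g in zip(trait_values, marker_calls) if g == 1]
    absent = [y for y, g in zip(trait_values, marker_calls) if g == 0]
    if len(present) < 2 or len(absent) < 2:
        return 0.0  # a variance cannot be estimated for one of the classes
    se = sqrt(variance(present) / len(present) + variance(absent) / len(absent))
    return (mean(present) - mean(absent)) / se if se > 0 else 0.0


def rank_markers(trait_values, markers):
    """Return marker names ordered from strongest to weakest association."""
    scores = {name: welch_t(trait_values, calls) for name, calls in markers.items()}
    return sorted(scores, key=lambda name: abs(scores[name]), reverse=True)


# Hypothetical ear lengths (cm) for six genotypes and two toy markers.
ear_length = [19.0, 18.5, 19.5, 17.0, 16.5, 17.5]
toy_markers = {
    "m1": [1, 1, 1, 0, 0, 0],  # separates long-eared from short-eared genotypes
    "m2": [1, 0, 1, 0, 1, 0],  # largely unrelated to ear length
}

print(rank_markers(ear_length, toy_markers))
```

Markers at the top of such a ranking would be the candidates carried forward, analogous to the selection of the most significant markers described above.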

Physical Mapping
Sequences of the SilicoDArT and SNP markers selected based on the GWAS were subjected to BLAST (Basic Local Alignment Search Tool) analysis, which involved searching databases for sequences highly homologous to the selected markers. The following publicly available databases and genome browsers were used: the CEPH Genotype database (http://www.cephb.fr/en/cephdb/), NCBI Map Viewer (http://www.ncbi.nlm.nih.gov/projects/mapview/), UCSC Genome Browser (http://genome.ucsc.edu/), and Ensembl Map View (http://ensembl.fugu-sg.org/common/helpview?kw=mapview;ref) (all URLs accessed on 20 October 2023). These tools helped identify the chromosomal locations of the retrieved sequences similar to the analyzed sequences and determine their physical location. The sequences of all genes located within the designated chromosomal region were further analyzed.

Functional Analysis of Gene Sequences
Our functional analysis was carried out using the Blast2GO program (https://www.blast2go.com/; accessed on 20 October 2023). The sequences of all genes located in the chromosomal regions identified from the BLAST analysis were analyzed. The aim was to obtain information about the biological function of the gene sequences located in a designated chromosomal region.

Designing Primers for Identified SilicoDArT and SNPs Associated with Yield and Traits
The Primer 3 Plus program was used to design primers. The program can be accessed online and does not need to be downloaded or installed. Primer 3 Plus offers a range of options, from several ways of specifying the sequence for which the primers are to be designed and general expectations regarding primer characteristics (size, the melting temperatures of both the primers and products, %GC, complementarity, etc.) to very detailed settings for primer parameters.
The PCR conditions were determined individually for each of the identified markers and differed in the primer annealing temperature, which was set according to the melting temperatures of the respective primers. The following amplification temperature profile was used: initial denaturation for 5 min at 95 °C, followed by 35 cycles of denaturation for 45 s at 95 °C, primer annealing for 1 min (at a temperature specific to each primer pair, consistent with its melting temperature), and extension for 1 min at 72 °C, and then a final extension for 5 min at 72 °C before cooling to 4 °C.
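The temperature profile above can be written down as a small data structure, which also makes the programmed run time easy to check. The Python sketch below is only an illustration: the annealing temperature of 58 °C is a hypothetical placeholder (the text gives a marker-specific value), the names `Step`, `build_profile`, and `total_minutes` are invented here, and ramping time and the final 4 °C hold are not counted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    name: str
    temp_c: float
    seconds: int


def build_profile(annealing_temp_c, cycles=35):
    """Expand the amplification profile into a flat list of steps."""
    cycle = [
        Step("denaturation", 95.0, 45),
        Step("annealing", annealing_temp_c, 60),  # primer-pair specific
        Step("extension", 72.0, 60),
    ]
    return (
        [Step("initial denaturation", 95.0, 5 * 60)]
        + cycle * cycles
        + [Step("final extension", 72.0, 5 * 60)]
    )


def total_minutes(profile):
    """Programmed hold time in minutes, excluding temperature ramps."""
    return sum(step.seconds for step in profile) / 60


profile = build_profile(annealing_temp_c=58.0)  # 58 °C is a placeholder value
print(f"{len(profile)} steps, {total_minutes(profile):.2f} min of programmed holds")
```

With 35 cycles of 165 s each plus two 5 min holds, the programmed time is 6375 s, i.e., a little over 106 min per run regardless of the annealing temperature chosen.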

Electrophoresis
The PCR products were separated by electrophoresis on a 2.5% agarose gel, with the addition of 1 µL of Midori Green solution, for 2 h at 100 V. The O'RangeRuler 50 bp (Fermentas, Waltham, MA, USA) was used as a reference to identify the sizes of the amplified products. The separated DNA fragments were visualized under UV light and captured as digital images using the Bio-Rad gel visualization and documentation system.

Field Experiment
The field experiment was established in two locations: Smolice and Kobierzyce (Poland). This allowed us to take and analyze biometric measurements of 188 maize genotypes. The measurement results were used for association mapping. After harvest, the following yield structure traits were recorded: ear length, ear diameter, core length, core diameter, the number of rows, the number of grains per row, grain weight per ear, TSW, and yield per plot (Figure 2). Density plots were constructed to examine the distribution of all analyzed variables in both locations. The peaks in the density plots show the ranges where the values of the analyzed traits are concentrated; e.g., for the majority of the analyzed genotypes in both locations, ear length falls within the range of 17-19 cm. As the plots show, the distribution of the analyzed variables differed between the locations for core diameter, the number of rows, the number of grains per row, grain weight per ear, and yield per plot (Figure 2).

Phenotyping
An analysis of variance between the genotypes was performed for the recorded traits, and significant variation was observed for all traits. The analysis of variance also showed statistically significant variation for all the studied traits between the locations where the field experiment was conducted. The line-location interaction was not significant only for the number of rows (Table 1). To determine the relationships between groups of variables in the dataset, i.e., observations of the yield structure traits and yield per plot in both locations, a multivariate technique, canonical variate analysis, was applied. All traits were characterized by a normal distribution. The grouping of genotypes into lines and hybrids could be observed (Figure 3).

Figure 2. Density charts showing the distribution of the analyzed traits: cob length (cm), ear diameter (cm), core length (cm), core diameter (cm), the number of rows, the number of kernels per row, mass of grain per cob (g), TSW (g), and grain yield (kg).


Correlations between the observed traits were analyzed in both locations, i.e., Smolice and Kobierzyce. In Smolice, the most strongly positively correlated traits were cob length and core length (97%), mass of grain from the cob and yield (92%), cob length and mass of grain from the cob (89%), and cob length and yield (87%) (Figure 4). In Kobierzyce, the following traits were strongly positively correlated: cob and core length (98%), mass of grain from the cob and yield (97%), cob and core diameter (94%), cob diameter and mass of grain from the cob (93%), and cob diameter and yield (93%) (Figure 5).
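For readers who wish to reproduce this kind of pairwise comparison, a correlation coefficient between two traits can be computed as sketched below in Python. The sketch assumes that the reported percentages correspond to Pearson correlation coefficients multiplied by 100, which the text does not state explicitly, and the trait values are hypothetical rather than taken from the field data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("both samples need the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant sample has no defined correlation")
    return sxy / sqrt(sxx * syy)


# Hypothetical cob and core lengths (cm) for four genotypes.
cob_length = [17.0, 18.0, 19.0, 20.0]
core_length = [16.0, 17.5, 18.0, 19.5]

r = pearson_r(cob_length, core_length)
print(f"r = {r:.3f} ({100 * r:.0f}%)")
```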

DNA Isolation
DNA isolation from the 188 genotypes sent for next-generation sequencing was performed using a kit from Promega. The yield from individual isolations was high, ranging from 106 ng/µL for line 15 to 935.24 ng/µL for line 34. The purity of the isolated DNA was very good, with an average A260/A280 absorbance ratio of 1.8. The exceptions were two samples: line 17, which had a purity of 1.54, and line 64, which had a purity of 2.39. Given the relatively high concentration of DNA obtained, the samples were adjusted to a uniform concentration of 100 ng/µL, as required for next-generation sequencing analyses.
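Adjusting each sample to 100 ng/µL follows the usual dilution relation C1·V1 = C2·V2. The Python sketch below applies it to the two extreme concentrations reported above; the final volume of 50 µL is an assumption made only for illustration, and the function name is hypothetical.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=100.0, final_volume_ul=50.0):
    """Return (stock_ul, water_ul) that give the target concentration.

    Uses C1 * V1 = C2 * V2; the final volume is an illustrative assumption.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is more dilute than the target concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Lowest and highest concentrations reported for the isolated DNA.
for name, concentration in [("line 15", 106.0), ("line 34", 935.24)]:
    stock, water = dilution_volumes(concentration)
    print(f"{name}: {stock:.2f} uL DNA + {water:.2f} uL ddH2O")
```

The most concentrated sample needs only a few microlitres of stock, whereas the least concentrated one is used almost undiluted.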

Genotyping
A total of 92,614 molecular markers were obtained as a result of next-generation sequencing, including 60,436 SilicoDArT markers and 32,178 SNPs. A minor allele frequency (MAF) > 0.25 and a proportion of missing observations < 10% were applied as criteria to determine the usefulness of the identified markers. This filtering reduced the number of markers to 32,900 (26,234 DArTs and 6666 SNPs), which were subsequently used for association mapping (Table 2). The largest numbers of SNP and SilicoDArT markers were associated with yield (18,352 in Kobierzyce and 18,751 in Smolice), mass of grain from the cob (17,685 in Kobierzyce and 18,314 in Smolice), and core diameter (17,787 in Kobierzyce and 16,018 in Smolice). The fewest markers were associated with the number of rows of grain (12,757 in Kobierzyce and 11,714 in Smolice) and the number of grains per row (13,265 in Kobierzyce and 13,981 in Smolice) (Table 2). To narrow down the number of markers for physical mapping, 20 markers were selected from among all the significant ones that were associated with the same traits in both locations (Kobierzyce and Smolice).
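The two filtering criteria can be expressed compactly in code. The Python sketch below is a minimal illustration for SNPs only, assuming diploid calls coded as the count of the alternative allele (0, 1, or 2) with `None` for a missing call; this coding, the function name, and the toy markers are assumptions, and dominant presence/absence markers such as SilicoDArT would need a different frequency estimate.

```python
def passes_filters(calls, min_maf=0.25, max_missing=0.10):
    """Apply the MAF > 0.25 and missing < 10% criteria to one SNP.

    calls: one entry per genotype, the alternative-allele count (0, 1 or 2)
    or None when the call is missing.
    """
    if not calls:
        return False
    observed = [c for c in calls if c is not None]
    missing_rate = (len(calls) - len(observed)) / len(calls)
    if not observed or missing_rate >= max_missing:
        return False
    alt_freq = sum(observed) / (2 * len(observed))
    return min(alt_freq, 1 - alt_freq) > min_maf


# The marker counts reported in the text are internally consistent.
assert 60_436 + 32_178 == 92_614
assert 26_234 + 6_666 == 32_900

# Hypothetical toy markers scored on ten genotypes.
toy_markers = {
    "snp_common": [0, 1, 2, 1, 1, 0, 2, 1, 1, 1],     # MAF 0.50, no missing calls
    "snp_rare": [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],       # MAF 0.05
    "snp_gappy": [1, 1, None, 1, 1, 1, 1, 1, 1, 1],   # 10% missing calls
}
kept = [name for name, calls in toy_markers.items() if passes_filters(calls)]
print(kept)
```

A relatively high MAF threshold such as 0.25 keeps only markers whose two classes are both well represented, which stabilizes the comparison of trait means between them.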

Based on the identified SNP and SilicoDArT molecular markers, a dendrogram of genetic similarity was constructed for the 188 analyzed genotypes (Figure 6). The dendrogram very clearly shows two distinct similarity groups. The first group consisted of 65 inbred lines from HR in Kobierzyce, while the second group included the 122 analyzed hybrids and 1 inbred line. Such clear clustering demonstrates the usefulness of SNP and SilicoDArT markers for grouping genotypes by genetic similarity.
A total of 20 of the 32,900 markers (26,234 DArTs and 6666 SNPs) significantly associated with the analyzed yield structure traits and yield were selected. These markers were significant for the same traits in both locations (Kobierzyce and Smolice) (Table 3). An attempt was also made to determine the location of the selected SNP markers; the position of one marker could not be determined. The next step was to design primers for the identification of the 19 selected and localized markers. The primer sequences are shown in Table 4.
Of the 19 markers selected, 2 (28629 and 29294) produced different amplification products on the electropherograms. Based on field observations, genotypes 1-10 are classified as the highest yielding, while genotypes 11-20 are classified as the lowest yielding. For marker 28629, a specific product of 189 bp was observed for genotypes 1, 4, and 10. Non-specific products of 200 bp were obtained for the remaining genotypes (Figure 7). This marker is located on chromosome 8, 3130 bp upstream of "protein senescence-associated gene 21, mitochondrial" and 91 bp downstream of "uncharacterized protein loc100382335". For marker 29294, a specific product of 189 bp was observed for genotypes 1 and 10. Non-specific products of 200 bp were obtained for the remaining genotypes (Figure 8). This marker is located on chromosome 5 within the hydroxyproline o-galactosyltransferase galt6 gene. Both markers will undergo further testing on a larger number of extreme genotypes to be used for the initial selection of high-yielding genotypes.


Discussion
The breeding of heterozygous maize varieties consistently aims to harness the potential of hybrid vigor, shorten the breeding process (e.g., by utilizing doubled haploid lines), and improve seed production while reducing its cost. The priority for all breeders is to obtain high-yielding and disease-resistant maize varieties [35].
In the present study, an analysis of variance was performed based on phenotypic observations (related to yield and yield structure traits). For all traits, significant variation was observed between the genotypes. The analysis of variance also showed statistically significant variation for all studied traits between the locations where the field experiment was established. The interaction of line and trial location was not significant only for the number of rows. To determine the relationships between the groups of variables in the dataset, i.e., observations of the yield structure traits and yield per plot in both locations, a multivariate technique, canonical variate analysis, was applied. All traits were characterized by a normal distribution. The grouping of genotypes into lines and hybrids could be observed.
Phenotypic analysis alone, unfortunately, does not allow for the selection of parental components for heterosis crosses, because the traditional methods used in heterosis breeding are no longer sufficient. In response, high-throughput techniques have been adopted for analyzing the genomes of crop plants and subsequently improving existing varieties, including maize [36]. Such a genomics-focused approach yields information about coding regions, which provide information on protein structure, as well as intergenic regions; both types can be successfully applied to improve crop plant varieties [37].
The introduction of next-generation sequencing (NGS) methods has enabled the elucidation of nucleotide sequences in plants other than model organisms such as Arabidopsis thaliana, which is characterized by a small genome. Crop species of interest mainly include maize, coffee, and sugarcane [38].
In recent years, many authors have attempted to identify molecular markers linked to functionally important traits in maize. Bocianowski et al. [39] used NGS technology and association mapping to identify markers related to the heterosis effect in maize. Using the same methods, Sobiech et al. [40] identified markers linked to the resistance of maize plants to Fusarium. In turn, Tomkowiak et al. [41] identified six SNP markers (1818, 14506, 2317, 3233, 11657, and 12812) located inside genes, on chromosomes 8, 9, 7, 3, 5, and 1, related to yield in maize. The authors of [42] identified four genes (the sucrose synthase 4 isoform X2 gene, the phosphoinositide phosphatase sac7 isoform X1 gene, the putative SET domain containing protein family isoform X1 gene, and grx_c8-glutaredoxin subgroup iii) that can significantly regulate the level of seed vigor and the germination of maize seeds.
In the present study, a total of 92,614 molecular markers were obtained using next-generation sequencing, including 60,436 SilicoDArT markers and 32,178 SNPs. A MAF > 0.25 and a proportion of missing observations < 10% were applied as criteria to determine the usefulness of the identified markers. In this way, 32,900 markers (26,234 DArTs and 6666 SNPs) were obtained and applied for association mapping.
NGS technology is utilized for sequencing genomes and transcriptomes, studying protein-DNA/RNA interactions, assessing methylation levels, discovering new DNA polymorphisms, and conducting metagenomic studies [43]. This technology allows for the analysis of various DNA fragments represented by multiple copies during a single reaction, library preparation, and the subsequent collection of gigabases of genomic data from a single sequencing run [44,45]. This not only increases the number of samples examined but also enhances the reliability of the obtained sequencing results, which is particularly valuable when the variation between specific genotypes is small [36]. The costs and time required for sequencing reactions, when calculated per unit of obtained information, are significantly lower than those of analyses conducted using traditional capillary sequencers [46].
Another sequencing strategy, primarily used to study the interactions between plants and their environment, is the use of NGS methods to characterize the plant transcriptome in different physiological states. The analysis of cDNA sequences provides information about expressed sequence tags (ESTs), which are transcribed in specific tissues and organs, and despite some limitations, these data are very useful for breeders [36,47,48]. Next-generation sequencing techniques also enable qualitative and quantitative analyses of genes expressed under different conditions, and the results of these analyses are used for association mapping [49][50][51][52].
In the present study, association mapping showed that the highest numbers of SNP and SilicoDArT markers were associated with yield per plot (18,352 in Kobierzyce and 18,751 in Smolice), grain weight per ear (17,685 in Kobierzyce and 18,314 in Smolice), and core diameter (17,787 in Kobierzyce and 16,018 in Smolice). The fewest markers were associated with the number of rows (12,757 in Kobierzyce and 11,714 in Smolice) and the number of grains per row (13,265 in Kobierzyce and 13,981 in Smolice). To narrow down the number of markers for physical mapping, 19 markers were selected from among all significant markers that were associated with the same traits in both locations (Kobierzyce and Smolice). These markers were tested on high- and low-yielding reference genotypes. As a result of this testing, two markers (28629 and 29294) that differentiated the tested genotypes were selected. For marker 28629, a specific product of 189 bp was observed for genotypes 1, 4, and 10. For marker 29294, a specific product of 189 bp was observed for genotypes 1 and 10.
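The step of retaining only markers associated with the same trait in both locations amounts to a per-trait set intersection. The sketch below illustrates this under stated assumptions: the marker identifiers (`M001`, ...), trait names, and function name are hypothetical placeholders, not results from the study.

```python
from typing import Dict, Set

Associations = Dict[str, Set[str]]  # trait name -> significant marker IDs


def shared_associations(site_a: Associations,
                        site_b: Associations) -> Associations:
    """Keep, per trait, only markers that are significant at both sites."""
    shared: Associations = {}
    for trait in sorted(site_a.keys() & site_b.keys()):
        common = site_a[trait] & site_b[trait]
        if common:
            shared[trait] = common
    return shared


# Hypothetical significant associations at the two trial locations.
kobierzyce = {
    "yield_per_plot": {"M001", "M002", "M003"},
    "number_of_rows": {"M004"},
}
smolice = {
    "yield_per_plot": {"M002", "M003", "M005"},
    "number_of_rows": {"M006"},
}

stable = shared_associations(kobierzyce, smolice)
for trait, marker_ids in stable.items():
    print(trait, sorted(marker_ids))
```

Requiring agreement across locations discards associations that appear in only one environment, which is the rationale implied by the selection described above.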
Thanks to their specific features, SNP and SilicoDArT markers have many applications, including the creation of molecular linkage maps and the identification of quantitative trait loci (QTLs) responsible for the inheritance of quantitative traits. Additionally, they are used for origin analysis, fingerprinting of cultivated varieties, studies on population genetic diversity and gene flow, and plant evolutionary genetics [52].
In the present study, we identified 19 SNP markers that are significantly associated with yield structure traits in maize. Five of these markers (28629, 28625, 28640, 28649, and 29294) are located within genes that can be considered candidate genes related to yield traits. Marker 28629 is located on chromosome 8 within the leucine-rich repeat receptor-like protein kinase gene. Receptor-like kinases (RLKs) are a diverse group of transmembrane proteins characterized by a ligand-binding domain that receives signal molecules, a membrane-spanning domain that anchors the protein, and a cytoplasmic protein kinase domain that transduces signals downstream [53]. According to reports in the literature, the first RLK was isolated from maize, and numerous RLKs were subsequently identified in over 20 plant species [54]. RLKs can indirectly influence the yield of maize because they mediate many signaling messages on the cell surface and act as key regulators during developmental processes [55][56][57]. Genetic and biochemical studies have also shown that plant LRR-RLKs play an important role in various processes during growth and development [58,59]. CLV and RPK2 have also been found to be essential receptor-like kinases in the formation and maintenance of the shoot apical meristem [60,61].
Another significant marker was SNP 28625, located on chromosome 1 within the arabinosyltransferase (arad1) gene. As reported in [62], the arabinosyltransferase ARAD1 catalyzes the polymerization of arabinose into the arabinan of arabinogalactan during secondary wall formation in loblolly pine. Research indicates a connection between arabinogalactan proteins and lignin biosynthesis for cell wall formation. Lignin occurs in the cell wall and is necessary for the transport of water and aqueous nutrients in plant stems. The polysaccharide components of plant cell walls are hydrophilic and therefore permeable to water, while lignin is hydrophobic. The cross-linking of polysaccharides by lignin prevents the absorption of water by the cell wall. Consequently, lignin enables the plant's vascular tissue to conduct water efficiently, which is very important for the proper growth and development of the plant and its yield [63]. Therefore, there is a high probability that the ARAD1 arabinosyltransferase gene may influence the yield of maize. Research conducted in recent years has shown that cell walls can play an important role in intercellular communication, not only as a pathway for the transport of signaling molecules or as an area of the cell in which the receptor domains of membrane proteins operate, but also as a source of signals that influence the functioning of cells [64][65][66].
The third significant SNP (28640) was located on chromosome 9 within the sugar phosphate gene. Sucrose phosphate synthase (SPS) is a key enzyme in the sugar metabolic pathways in plants. SPS catalyzes the conversion of fructose-6-phosphate and uridine diphosphate-glucose (UDP-glucose) into sucrose-6-phosphate, which is a substrate in the synthesis of sucrose [67,68]. SPS exists in many isoforms, which may play various functional roles and are specific to different tissues and stages of development. In particular, knowledge of the relationship between the localization of the individual forms of these enzymes and their role in plant responses to various stresses is highly desirable. It has been demonstrated, for instance, that several maize SPS sequences are most strongly expressed in the leaves and less intensively in pollen and kernels, and that this is related to reactions to different abiotic factors [69]. Sucrose synthase (SUS) isoforms have been found in different areas of cell walls [70,71]. In maize, the specificity of the function of individual SUS isoforms in both cytoplasmic and membrane-associated sucrose degradation was emphasized in [72].
The fourth significant SNP (28649) was located on chromosome 4 within the gene annotated as ubiquitin carboxyl-terminal hydrolase 15 isoform x2. Ubiquitin carboxyl-terminal hydrolases (UCHs) belong to an enzymatic subclass of deubiquitinating enzymes (DUBs). The function of this gene in maize has not been described in the literature.
The fifth significant SNP (29294) was located on chromosome 5 within the hydroxyproline o-galactosyltransferase galt6 gene. Our analyses and data in the literature suggest that this gene may be related to the yield of maize: according to Kaur et al. (2023), hydroxyproline o-galactosyltransferase galt6 plays an important role in various stages of plant growth and development [73]. Moreira et al. (2023) have reported that arabinogalactan proteins (AGPs) are hydroxyproline-rich, sugar-rich glycoproteins widely distributed in the plant kingdom [74]. The synthesis of their complex carbohydrates is initiated by a family of hydroxyproline galactosyltransferase (Hyp-GALT) enzymes, which add the first galactose to Hyp residues in the protein backbone.

Conclusions
In the present study, we identified 19 SNP markers that are significantly associated with yield structure traits in maize. Five of these markers are located within genes that can be considered candidate genes related to yield traits: marker 28629, located on chromosome 8 within the leucine-rich repeat receptor-like protein kinase gene; marker 28625, located on chromosome 1 within the arabinosyltransferase (arad1) gene; marker 28640, located on chromosome 9 within the sugar phosphate gene; marker 28649, located on chromosome 4 within ubiquitin carboxyl-terminal hydrolase 15 isoform x2/ubiquitin carboxyl-terminal hydrolase 15 isoform x3; and marker 29294, located on chromosome 5 within hydroxyproline o-galactosyltransferase galt6. These genes

Figure 1. Temperature and rainfall in Smolice and Kobierzyce in 2022.

Figure 2. Density charts showing the distribution of the analyzed traits: cob length (cm), ear diameter (cm), core length (cm), core diameter (cm), the number of rows, the number of kernels per row, mass of grain per cob (g), TSW (g), and grain yield (kg).


Figure 3. Canonical variate analysis for the studied traits.


Figure 6. Dendrogram showing the genetic similarity between the analyzed genotypes.


Table 1. F-statistics from our analysis of variance for the analyzed yield structure traits.


Table 2. SilicoDArT and SNP molecular markers significantly associated with the analyzed yield structure traits in Kobierzyce (K) and Smolice (S) (significant associations selected at p < 0.001 with Benjamini-Hochberg correction for multiple testing).

Table 3. Characteristics and locations of markers significantly associated with the analyzed traits.

Table 4. Sequences of the designed primers used to identify the newly selected markers significantly associated with the analyzed traits.