The Use of the Generalized Linear Model to Assess the Speed and Uniformity of Germination of Corn and Soybean Seeds

The use of seeds with high physiological quality allows rapid growth and establishment of seedlings in the field. Therefore, the accuracy of the information obtained when determining the physiological quality of seeds is of great importance. The objective was to use generalized linear models, investigating which link function (Probit, Logit and Complementary log-log) is suitable to predict T50 and uniformity during germination of soybean and corn seeds. To perform the experiments, we used seeds from five commercial hybrids and/or cultivars of corn and soybean. The germination speed was calculated by counting the germinated seeds and the results were expressed in the form of proportions. Germination uniformity was calculated by the difference in the times required for germination. The best model was selected according to the Deviance test and the AIC and BIC criteria. The Logit model showed accurate results for most cultivars. The evaluation of germination in the form of proportions, considering the assumption of binomial response, is satisfactory, and the choice of the link function depends on the characteristics of each lot and/or species evaluated. The use of this methodology makes it possible to estimate any germination time and uniformity.


Introduction
Physiological seed quality is the ability of the seed to perform vital functions, characterized by germination, vigor and longevity, which directly affects the establishment of a crop under field conditions. High-potential seeds guarantee the growth and development of plants and the eventual yield of crops [1][2][3].
The correct identification of lots and/or cultivars with seeds of high physiological potential is one of the main tasks of researchers and professionals working in seed physiology and technology. For this, two basic physiological components are taken into account, germination and vigor, these being the factors that theoretically govern the ability of seeds to express their vital functions under biotic and abiotic conditions [4,5].
There are several ways to assess the physiological quality of a seed lot; however, the most common method is the germination test [2,6], which determines the final germination percentage of the seed lots. In addition, the time for 50% of the seeds to germinate (T50) and the germination uniformity, which is the time difference between two germination percentages, together with other defined parameters [7][8][9][10], complement the germination data and can indicate the performance of seeds in the field.
To assess which seed lot has superior physiological quality, the response variable (percentage of germinated seeds) is commonly assessed by non-linear regressions, such as the Hill function [7,10,11]. Another way to analyze germination data is through linearization, in which a link function is used, with the Probit function being the most used [12][13][14].
However, such approaches are imprecise, as they consider the response variable to be continuous. For instance, in non-linear regressions, the proportions of germinated seeds are cumulative and residual autocorrelation may occur [15], whereas in linearization, the germination percentages are transformed into Probit units under the assumption that the data follow a normal distribution, which ends up generating inaccurate models or simply a failure to linearize [14,16].
As the germination process is qualitative, with a binary result (that is, the seed does or does not germinate), errors are not normally distributed. Thus, the classic regression analysis approach is not indicated [15]. An alternative for analyzing this type of data is the theory of generalized linear models, with the binomial distribution being a particular case and indicated for proportion data [13,17,18]. Therefore, our hypothesis is that generalized linear models are more suitable, since they may provide the most accurate information about the problem exposed [19].
In this context, the objective of this work was to use the generalized linear models, investigating which link function (Probit, Logit and Complementary log-log) is suitable to predict T50 and uniformity during germination of soybean and corn seeds.

Plant material
The commercial corn seeds used were: AS 1633 PRO3, 2B587 RR, 2A401PW, AL Bandeirante and BRS 4103. The first three refer to hybrid corn seeds, while the last two are cultivars. The commercial soybean cultivars used were: DS59716 IPRO, CD2737 RR, CD251 RR, CD2820 IPRO and CD2857 RR. Corn seeds were produced in the center of southern Brazil and soybean seeds were produced in the central region of Brazil (Goiás Province) grown in the crop season of 2016/17.

Germination, T50 and Uniformity Study
The corn and soybean seeds were distributed on sheets of paper towels, moistened with an amount of water equivalent to 2.5 times the dry paper mass, in Petri dishes. For each hybrid and/or cultivar, 20 seeds were used with four replications at 15 independent points in time. Then, the Petri dishes were transferred to a BOD (Biochemical Oxygen Demand)-type germinator, set at a constant temperature of 25 °C. Table 1 shows the average water content of seeds for each hybrid and/or cultivar used in this research. The germinated seeds were counted at regular intervals of 6, 12 and 24 h until 204 h (15 independent points in time), with the protrusion of the primary root ≥2 mm being adopted as the germination criterion [20]. The results were expressed as the proportion between the number of seeds germinated in each time interval and the number of viable seeds obtained at the end of the experiment, given by a sequence of Bernoulli trials.

Binomial Regression Models and Selection Criteria
Since the response variable, that is, the proportion of seeds germinated over time, has a distribution belonging to the exponential family, the use of the generalized linear models proposed by [21] was considered. These models are made up of three parts: a random component, in which the response variable has a distribution belonging to the exponential family; a systematic component, formed by a set of independent variables; and the link function, which connects the random component to the systematic component.
The default distribution assumed for the random variable $Y_i$, $i = 1, \ldots, n$, which is the number of seeds germinated, is the binomial with parameter $m_i$ and probability of success $\pi_i$, that is, $Y_i \sim \mathrm{Binomial}(m_i, \pi_i)$. Thus, the link functions used for binomial regression were Probit, Logit and Complementary log-log. The canonical forms of these models are presented below and, according to [17], the use of these functions in canonical form has as advantages an adequate scale for modeling, with practical interpretation of the model parameters, and the simplification of the estimation algorithms. Thus, these models in canonical form are:

$$\Phi^{-1}(\mu_i) = \beta_0 + \beta_1 x_i \qquad (1)$$

$$\ln\left(\frac{\mu_i}{1 - \mu_i}\right) = \beta_0 + \beta_1 x_i \qquad (2)$$

$$\ln\left[-\ln(1 - \mu_i)\right] = \beta_0 + \beta_1 x_i \qquad (3)$$

in which $\mu_i$ is the mean, $x_i$ is the germination time and $\Phi^{-1}$ is the inverse of the standard normal cumulative distribution function. The model parameters were obtained using the maximum likelihood method, which is recommended due to its consistency and good asymptotic properties [17,21]. Once the parameters $\beta_0$ and $\beta_1$ were obtained, the adjustment was verified by the Deviance criterion and by the Akaike (AIC) and Bayesian (BIC) information criteria.
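As an illustration of the three link functions named above, the following minimal Python sketch (standard library only) writes each link and its inverse, so that a linear predictor can be mapped back to a germination probability. The function names, the `LINKS` dictionary and the coefficients in the usage note are hypothetical choices made for this sketch; they are not taken from the SAS analysis.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def probit(mu):
    """Probit link: inverse of the standard normal CDF."""
    return _STD_NORMAL.inv_cdf(mu)


def logit(mu):
    """Logit link: log-odds of germination."""
    return math.log(mu / (1.0 - mu))


def cloglog(mu):
    """Complementary log-log link."""
    return math.log(-math.log(1.0 - mu))


def inv_probit(eta):
    return _STD_NORMAL.cdf(eta)


def inv_logit(eta):
    return 1.0 / (1.0 + math.exp(-eta))


def inv_cloglog(eta):
    return 1.0 - math.exp(-math.exp(eta))


# Hypothetical lookup table: link name -> (link, inverse link)
LINKS = {
    "probit": (probit, inv_probit),
    "logit": (logit, inv_logit),
    "cloglog": (cloglog, inv_cloglog),
}


def germination_probability(link_name, b0, b1, hours):
    """Map the linear predictor b0 + b1 * hours back to a probability."""
    _, inverse = LINKS[link_name]
    return inverse(b0 + b1 * hours)
```

For example, with made-up coefficients `b0 = -4.0` and `b1 = 0.1`, `germination_probability("logit", -4.0, 0.1, 40.0)` evaluates the Logit curve at 40 h, where the linear predictor is zero and the probability is one half.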
The Deviance criterion is used as a measure of discrepancy, measuring the quality of fit of the models; for this purpose, the log-likelihood ratio statistic is used. By this criterion, a model is considered ideal when it results in a sufficiently small value of the Deviance, that is, when there is evidence that, with a small number of parameters, an adjustment as good as that of the saturated model is obtained [21,22]. For the binomial distribution, the Deviance was defined by:

$$D = 2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{y_i}{\hat{\mu}_i}\right) + (m_i - y_i) \ln\left(\frac{m_i - y_i}{m_i - \hat{\mu}_i}\right) \right] \qquad (4)$$

in which $y_i$ represents the realization of the random variable; $m_i$ is the sample size; and $\hat{\mu}_i$, $i = 1, 2, \ldots, n$, are the adjusted values for the model of interest. Thus, when the Deviance divided by the number of degrees of freedom is not close to 1, there is an indication that the model may be poorly adjusted, since binomial models have a fixed dispersion parameter ψ = 1. Dispersion values ψ < 1 and ψ > 1 indicate underdispersion and overdispersion, respectively.
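The binomial Deviance described above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard binomial deviance form, with fitted values expressed as expected counts (number of viable seeds times the fitted probability); the function names and the sample numbers in the usage note are hypothetical.

```python
import math


def _term(a, b):
    """Return a * ln(a / b), using the convention 0 * ln(0) = 0."""
    return 0.0 if a == 0 else a * math.log(a / b)


def binomial_deviance(germinated, viable, fitted):
    """Binomial deviance for observed counts against fitted counts.

    germinated: observed numbers of germinated seeds (y_i)
    viable:     numbers of viable seeds (m_i)
    fitted:     fitted expected counts (mu_hat_i), strictly between 0 and m_i
    """
    total = 0.0
    for y, m, mu_hat in zip(germinated, viable, fitted):
        total += _term(y, mu_hat) + _term(m - y, m - mu_hat)
    return 2.0 * total
```

With a single made-up observation of 10 germinated seeds out of 20 and a fitted count of 5, `binomial_deviance([10], [20], [5.0])` gives about 5.75; when the fitted counts equal the observed counts the deviance is zero.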
The observed under- or overdispersion is incorporated into a constant dispersion parameter (ψ) using the quasi-likelihood estimation method. The correction factor was estimated as the square root of the quotient of the Deviance by the degrees of freedom. Thus, with this estimate, the standard errors of the parameters ($\beta_0$ and $\beta_1$) were corrected, and the inferences in the significance tests were reformulated. Such correction was performed as described by Equation (5):

$$SE_{\mathrm{corrected}} = SE \times \sqrt{\frac{D}{df}} \qquad (5)$$

in which $SE$ and $SE_{\mathrm{corrected}}$ are the standard error and the corrected standard error, respectively, $D$ is the Deviance and $df$ is the number of degrees of freedom.
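A minimal Python sketch of this quasi-likelihood correction follows. It assumes that the dispersion is estimated as Deviance divided by the residual degrees of freedom and that each standard error is multiplied by the square root of that ratio; the function names and the numbers in the usage note are hypothetical.

```python
import math


def dispersion_estimate(deviance, df_residual):
    """Estimated dispersion: deviance divided by residual degrees of freedom."""
    return deviance / df_residual


def corrected_standard_error(se, deviance, df_residual):
    """Scale a model-based standard error by sqrt(deviance / df)."""
    return se * math.sqrt(dispersion_estimate(deviance, df_residual))
```

For instance, with a made-up Deviance of 26.0 on 13 degrees of freedom, the estimated dispersion is 2.0 (overdispersion) and a standard error of 0.02 grows to about 0.028; a ratio below 1 would shrink the standard error instead.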
Among the more specific criteria used in the literature to assess goodness of fit are the Akaike (AIC) [23] and Bayesian (BIC) [24] information criteria. Both criteria penalize the lack of adjustment to the data and the complexity of the model, so that the lowest values are preferred [19]. The expressions are shown in (6) and (7), respectively:

$$AIC = -2\,LL + 2p \qquad (6)$$

$$BIC = -2\,LL + p \ln(n) \qquad (7)$$

in which p is the number of parameters estimated in the model; LL is the log likelihood evaluated at the estimated parameter values; and n is the total number of observations used.
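The two information criteria can be computed directly from a fitted model's log likelihood. The sketch below assumes the usual definitions (AIC = −2LL + 2p and BIC = −2LL + p ln n); the log-likelihood values in the usage note are invented for illustration and are not results from this study.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian (Schwarz) information criterion."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def best_by_aic(log_likelihoods, n_params):
    """Return the name of the candidate model with the lowest AIC.

    log_likelihoods: dict mapping a model name to its log likelihood.
    """
    return min(log_likelihoods, key=lambda name: aic(log_likelihoods[name], n_params))
```

With hypothetical log likelihoods such as `{"probit": -52.0, "logit": -50.0, "cloglog": -55.0}` and two parameters per model, `best_by_aic` returns `"logit"`; because all three candidates have the same number of parameters, AIC and BIC rank them identically.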

Germination Times and Germination Uniformity
After selecting the most parsimonious model, the times needed to reach 10, 20, 25, 30, 40, 50, 60, 70, 75, 80, 90 and 99% germination were calculated for each corn and soybean hybrid and/or cultivar, by replicate. Germination uniformity was calculated as the difference between the times to 75 and 25% germination (U7525).
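The following formulas and Table 2 were used to calculate each of the times already mentioned, depending on the model selected (Probit, Logit and Complementary log-log, respectively):

$$\hat{\theta}_p = \frac{\Phi^{-1}(p) - \hat{\beta}_0}{\hat{\beta}_1} \qquad (8)$$

$$\hat{\theta}_p = \frac{\ln\left(\frac{p}{1 - p}\right) - \hat{\beta}_0}{\hat{\beta}_1} \qquad (9)$$

$$\hat{\theta}_p = \frac{\ln\left[-\ln(1 - p)\right] - \hat{\beta}_0}{\hat{\beta}_1} \qquad (10)$$

in which $\hat{\theta}_p$ is the specific time for a given proportion p and $(\hat{\beta}_0, \hat{\beta}_1)$ is the pair of estimates for a simple linear regression model.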
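The calculation of germination times amounts to inverting the fitted linear predictor at the link-scale value of the target proportion. The following Python sketch illustrates this under that assumption; the function names and the coefficients in the usage note are hypothetical and do not correspond to any cultivar in this study.

```python
import math
from statistics import NormalDist

# Link-scale value of a target proportion p for each link function
LINK = {
    "probit": NormalDist().inv_cdf,
    "logit": lambda p: math.log(p / (1.0 - p)),
    "cloglog": lambda p: math.log(-math.log(1.0 - p)),
}


def germination_time(p, b0, b1, link="logit"):
    """Time at which a proportion p of viable seeds has germinated."""
    return (LINK[link](p) - b0) / b1


def uniformity(b0, b1, link="logit", upper=0.75, lower=0.25):
    """Germination uniformity, e.g. U7525 = T75 - T25."""
    return germination_time(upper, b0, b1, link) - germination_time(lower, b0, b1, link)
```

With made-up Logit estimates `b0 = -4.0` and `b1 = 0.1` (time in hours), `germination_time(0.5, -4.0, 0.1)` is 40 h and `uniformity(-4.0, 0.1)` is about 22 h. Using the same coefficients with `link="cloglog"` gives a T50 of about 36.3 h, which shows that the same pair of coefficients implies different times under different links.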

Statistical Analysis of Germination Times
The germination times T10, T50, T99 and U7525 (uniformity) were submitted to the Shapiro-Wilk (modified) normality test [25], followed by univariate analysis, considering the completely randomized design with the application of Fisher's LSD test, at 5% probability.
Subsequently, in order to describe the overall germination behavior of corn and soybean hybrids and/or cultivars, the germination times (T10, T20, T30, T40, T50, T60, T70, T80, T90 and T99) and uniformity (U7525) were submitted to multivariate cluster analysis by the Ward method, using the Euclidean distance computed on the data matrix standardized for each variable by the formula:

$$Z = \frac{X - \bar{X}}{S}$$

in which $Z$ is the standardized value; $X$ is the observed value; $\bar{X}$ is the average of the observed values; and $S$ is the standard deviation of the observed values.
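The standardization and distance steps that precede the clustering can be sketched as follows. This minimal Python example assumes the sample standard deviation is used for S and covers only the z-score and Euclidean distance calculations; the Ward agglomeration itself is left to a statistics package. The function names and the small data matrix in the usage note are hypothetical.

```python
import math
from statistics import mean, stdev


def standardize_columns(rows):
    """Z-score each column: (value - column mean) / column sample sd."""
    columns = list(zip(*rows))
    stats = [(mean(col), stdev(col)) for col in columns]
    return [
        [(x - m) / s for x, (m, s) in zip(row, stats)]
        for row in rows
    ]


def euclidean(a, b):
    """Euclidean distance between two standardized rows."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def distance_matrix(rows):
    """Pairwise Euclidean distances between standardized rows."""
    z = standardize_columns(rows)
    return [[euclidean(a, b) for b in z] for a in z]
```

For a made-up matrix with one row per cultivar and one column per variable, such as `[[10, 1], [20, 2], [30, 3]]`, the standardized rows are `[-1, -1]`, `[0, 0]` and `[1, 1]`, and the distance between the first and last rows is about 2.83.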

Computational Aspects
The work was carried out using SAS Software (Version 9.4, Cary, NC, USA) through GENMOD procedures for model adjustments and selection, and GLM for univariate analyses. For multivariate analysis and preparation of graphics, we used MINITAB Statistical Software (Version 17, State College, PA, USA).

Results and Discussion
In Figure 1 (experimental data), we present the germination process for the different hybrids and/or cultivars of corn and soybeans. Initially, germination is slow, with a low proportion of germinated seeds; then there is a period of acceleration and, finally, stabilization, with all viable seeds germinated. When fitting the Probit, Logit and Complementary log-log link functions for the 10 hybrids and/or cultivars tested, it was observed that no cultivar presented adequate adjustment by all three link functions simultaneously when considering the Deviance criterion. The corn hybrid and the soybean cultivar with dispersion parameter closest to 1 were AS 1633 PRO3 and CD251 RR, respectively, both adjusted by the Logit link function. Overdispersion was observed in six cultivars for all three link functions simultaneously, with Deviance values from 1.3, whereas only the corn cultivar BRS 4103 presented underdispersion, indicated by all three link functions (Table 3). The phenomena of under- and overdispersion were defined by [17] as a variance of the response variable below or above the variance expected by the model adopted. The main consequence of these phenomena is the incorrect estimation of standard errors, which can induce an inappropriate choice of models, potentially compromising the conclusions [26]. Even with researchers' efforts to control experimental conditions, phenomena such as overdispersion are common in agricultural systems, as there is great variability [27,28].
Among the tested link functions, the Complementary log-log gave the highest Deviance values; in other words, most models formulated with that link function are overdispersed, except for BRS 4103. The Probit and Logit functions showed the smallest deviances, with the latter being closer to 1; the Logit function provided the best adjustments, presenting the appropriate adjustment in 7 of the 10 hybrids and/or cultivars used in this research (Table 3).
These results demonstrate the importance of choosing the correct link function, since the use of inaccurate models can generate misleading conclusions [26,28]. Although the Logit model is currently preferred in some areas, for example in biometrics [22,29,30], and in this study it gave the smallest deviances for most hybrids and/or cultivars, it is still necessary to compare link functions in search of the one that best describes the probability of interest [31].
For all hybrids and/or cultivars tested, there was agreement between the Deviance criterion and the AIC and BIC information criteria; that is, the functions with the smallest deviances were also those with the lowest AIC and BIC values (Table 3). This agreement facilitated the selection of the most parsimonious models to evaluate the germination of corn and soybean seeds. Thus, the Probit model was chosen to evaluate the germination of two cultivars, the Logit model for seven cultivars and the Complementary log-log for one cultivar (Table 4). The information criteria presented penalize the lack of adjustment to the data and the complexity of the model; therefore, models with lower values were chosen [19,32]. According to these criteria, the Logit function stands out as a good alternative to evaluate the germination of corn and soybean seeds, providing a theoretical basis for seed science and technology and countering the idea that the Probit model should always be used to assess physiological quality, from germination to longevity in thermal models [12,14].
Keeping the focus on the estimation of the dispersion parameter, the selected models had the standard errors of their estimates corrected by quasi-likelihood, using the Deviance to estimate the constant dispersion parameter, following the procedure in (5). With the application of this correction, as expected, standard errors increased for models with overdispersion and decreased for models with underdispersion; however, the significance of the parameters was not affected (Table 4).
The use of quasi-likelihood to correct estimated standard errors is recommended by [33] and has been adopted in seed germination studies [18,29], in entomology data [34,35], in the assessment of ecological data [26], and in the modeling of the number and dry matter mass of Rhizobium nodules in bean culture [28]. Therefore, there is a solid body of literature on the use of this methodology.
With the selected models (Table 4), it is possible to estimate the germination times of interest to the researcher, using the formulas presented in (8), (9) and (10). The average germination time, or time required for 50% of the seeds used in the germination experiments to germinate (T50), is considered the preferred parameter to describe the germination and physiological quality of seeds submitted to different treatments or to compare different batches of seeds [7][8][9]11,14].
In the case of the Probit and Logit models, the T50 can be obtained easily, since the parameters $\beta_0$ and $\beta_1$ form a linear equation of the type y = a + bx; setting this equation equal to zero gives the T50, because both models are symmetric around zero. Thus, to calculate the T50, we can use the following formula:

$$T_{50} = -\frac{\hat{\beta}_0}{\hat{\beta}_1}$$

in which $\hat{\beta}_0$ is the intercept and $\hat{\beta}_1$ is the slope. Table 5 shows the values for T50 obtained in the 10 hybrids and/or cultivars, adopting the models provided in Table 4, and the expected interval for this parameter obtained experimentally. Thus, the methodology of generalized linear models, adopting the binomial distribution, can be considered efficient for evaluating the germination of corn and soybeans, since all the results estimated by the chosen functions are contained in the experimental intervals. Although some authors defend the use of linear models to estimate germination times, considering the assumption of normality of the data [12,14,36], a simple transformation of the germination percentages using a certain link function, such as the Probit model (the inverse normal function in Microsoft Excel), often does not allow the dataset to be linearized [14,16]. Thus, an approach considering germination as a binary variable, in which seeds may or may not germinate, has been more indicated [13,18].
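The T50 shortcut for the Probit and Logit models can be written out as a short worked derivation. The lines below are a sketch that assumes the linear predictor has the form $\hat{\beta}_0 + \hat{\beta}_1 t$, with $t$ the germination time; the Complementary log-log case is included only to show why the same shortcut does not apply to that link.

```latex
% Probit and Logit: both links equal zero at p = 0.5
\Phi^{-1}(0.5) = 0, \qquad
\ln\!\left(\frac{0.5}{1 - 0.5}\right) = \ln 1 = 0
\;\Rightarrow\; 0 = \hat{\beta}_0 + \hat{\beta}_1 T_{50}
\;\Rightarrow\; T_{50} = -\frac{\hat{\beta}_0}{\hat{\beta}_1}

% Complementary log-log: the link is not zero at p = 0.5
\ln\!\left[-\ln(1 - 0.5)\right] = \ln(\ln 2) \approx -0.3665
\;\Rightarrow\; T_{50} = \frac{-0.3665 - \hat{\beta}_0}{\hat{\beta}_1}
```

The asymmetry of the Complementary log-log link is the reason its 50% point sits at about −0.3665 on the link scale rather than at zero.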
It is worth mentioning that another distinguishing feature of this work is that the calculated germination times are based on the number of viable seeds; that is, the correct definition of the T50 in this research is the time required for 50% of the viable seeds to germinate, with no additional formulas required to calculate the actual number of germinated seeds.
In addition to the traditional T50, other parameters can be used to evaluate the germination of a seed batch. For example, it is possible to calculate the time for 10% of viable seeds to germinate, to identify whether two batches with the same final germination also have the same germination uniformity, or even to understand germination as a global process, without being restricted to just a few parameters that can lead to false conclusions about the physiological quality of a seed lot.
As an example of using other parameters to assess germination, we have the work of [7,11], which used Hill's four-parameter nonlinear function to estimate, beyond T50, the maximum germination time and the germination uniformity, which is the time interval between two predefined germination percentages. The germination times of 10 and 90% of the seeds have also been calculated [14,37] to evaluate the physiological quality of seeds. However, the two most widespread statistics for evaluating the germination of a seed lot are the T50 and the germination uniformity [7,10,37].
Thus, seed lots or cultivars that exhibit the lowest values of T50 or any other germination time can be considered of higher physiological quality; the same reasoning is valid for germination uniformity [7,8,37].
Currently, research involving the evaluation of seed germination to determine the previously mentioned parameters largely uses non-linear models or applies some link function directly to the percentage data, without paying attention to the type of variable studied and its probability distribution, which often causes convergence problems or even severe errors in parameter estimation [7,10,14]. However, when we use generalized linear models, these difficulties are overcome, as we are working with a simple linear equation. As a demonstration, the germination of the corn cultivar BRS 4103 was modeled by the Complementary log-log function, showing the T50 and the germination uniformity (see Figure 2). Substituting T50 = 37.59 into the equation shown in Figure 2 returns a value of approximately −0.3665, corresponding to 50% germination, as indicated in Table 2.

After selecting the link function best suited to evaluate the germination of each lot, the germination times T10, T50, T99 and U7525 were determined for all sampled data; analysis of variance was then performed, complemented with the means test (LSD), in order to compare the physiological potential of corn and soybean cultivars. According to the (modified) Shapiro-Wilk test [25], the four parameters evaluated have a normal distribution. The results of the analysis of variance revealed that the F test was significant for all corn parameters, whereas, for soybean cultivars, only germination uniformity was not significant at 5% probability.
When evaluating T10, the corn hybrid 2A401PW showed the slowest germination, while the cultivars AL Bandeirante and BRS 4103 showed the fastest. Among the soybean cultivars evaluated, CD2820 IPRO showed the slowest germination, differing from the other cultivars and taking twice as long to reach 10% germination as cultivars CD251 RR and CD2737 RR (Table 7).

Table 7 notes: T10 (95% CI), T50 (95% CI) and T99 (95% CI) = time required for germination of 10, 50 and 99% of viable seeds, with 95% confidence interval; U7525 (95% CI) = germination uniformity, given by the difference between the 75% and 25% percentiles, with 95% confidence interval; ns = not significant at the 5% probability level; averages followed by equal lowercase letters in the columns do not differ from each other by Fisher's LSD test at 5%.
The behavior of the cultivars at T50 changed in relation to T10 only for the hybrid 2B587 RR. The corn cultivars AL Bandeirante and BRS 4103 were again faster in reaching 50% germination, by approximately 16 h. For soybeans, the cultivar CD2820 IPRO continued to be the least vigorous, whereas cultivars CD251 RR and CD2737 RR continued to exhibit greater physiological quality (Table 7).
The time to 99% germination was calculated in order to determine the behavior of hybrids and/or cultivars when they are close to completing 100% germination. The corn hybrid 2B587 RR, which did not differ statistically from the hybrid AS 1633 PRO3 at the previous times, differed from it by more than 22 h. This differentiation may be due to uniformity, since the corn hybrid 2B587 RR proved to be less uniform. The best performing corn cultivar at T99 was AL Bandeirante, which was also the most uniform. For soybean cultivars at T99, it was confirmed that CD2820 IPRO is the least vigorous and that the CD251 RR and CD2737 RR cultivars have the highest physiological quality.
The lower or higher speed of germination of one cultivar in relation to the other is due to the time spent in the restoration of the damaged organelles and tissues before beginning the development of the embryonic axis, during the germination process [8,38]. According to [38], cultivars or seed lots with higher germination speed and uniformity are considered the most vigorous.
The vigor of a seed lot can be defined as the sum of the properties that determine the activity and performance of seed lots of acceptable germination in a wide range of environments [4,5]. Thus, the identification of high-performance seed lots or cultivars is an important initiative for the success of agricultural production [2].
Under these assumptions, we list the corn cultivars AL Bandeirante and BRS 4103 and the soybean cultivars CD251 RR and CD2737 RR as those of greater vigor based on germination times and uniformity. This statement is supported by several authors who, using a simple radicle count, managed to predict the vigor of several species [8,39,40].
The higher vigor of hybrids and/or cultivars is evidenced when analyzing germination in a broader context, considering various germination times (T10, T20, T30, T40, T50, T60, T70, T80, T90 and T99) and uniformity (U7525) through multivariate cluster analysis. In general, the cophenetic coefficient was above 0.90, indicating little distortion in relation to the original data matrix [41]. It was possible to verify the existence of three groups in the dendrogram for both corn and soybeans (Figure 3). The corn cultivars with the smallest distance were AL Bandeirante and BRS 4103, with a Euclidean distance of 0.96 (Table 8); that is, they had similar physiological quality, confirming the results for T10, T50 and T99 (Table 7). The greatest difference in physiological quality was observed between BRS 4103 and 2B587 RR, with a Euclidean distance greater than 7 (Table 8). Regarding soybean cultivars, the closest were CD2737 RR and CD251 RR, with a Euclidean distance equal to 0.15 (Table 8), confirming the results presented previously (Table 5). Additionally, the most distant physiological potential was observed between the cultivar CD2820 IPRO and the cultivars CD251 RR and CD2737 RR, with a Euclidean distance equal to 8.02 (Table 8).

Conclusions
Germination evaluation in the form of proportions, considering the assumption of binomial response, is satisfactory, and the choice of the link function will depend on the characteristics of each lot and/or species evaluated. The presented methodology allows calculation of any germination time and uniformity in a more robust way.