Analysis of Fluid Velocity inside an Agricultural Sprayer Using Generalized Linear Mixed Models

: The ﬂuid velocity inside the tank of agricultural sprayers is an indicator of the quality of the mixture. This study aims to formulate the best generalized linear mixed model to infer the ﬂuid velocity inside a tank under speciﬁc operational parameters of the agitation system, such as liquid level, circuit pressures, and number of active nozzles. A complex model was developed that included operational parameters as ﬁxed e ﬀ ects (FE) and the section of the tank as the random e ﬀ ect. The goodness of ﬁt of the model was evaluated by considering the lowest values of Akaike’s information criteria and Bayesian information criterion, and by estimating the residual variance. The gamma distribution and log-link function enhanced the goodness of ﬁt of the best model. The Toeplitz structure was chosen as the structure of the covariance matrix. SPSS and SAS software were used to compute the model. The analysis showed that the greatest inﬂuence on the ﬂuid velocity was exerted by the liquid level in the tank, followed by the circuit pressure and, ﬁnally, the number of active nozzles. The development presented here could serve as a guide for formulating models to evaluate the e ﬃ ciency of the agitation system of agricultural sprayers.


Introduction
The velocity of the fluid inside an agricultural sprayer, necessary for the homogeneous mixing of the active matter, is determined by the internal agitation system, which is hydraulic in most cases. It differs according to (a) certain working parameters such as circuit pressure, number of active nozzles, and liquid level in the tank; and (b) the geometry of the tank height, width, section type, presence of barriers and irregularities inside the tank, and the location of the nozzles.
The combination of these two sets of parameters makes it difficult for the observed fluid velocity data to fit a normal distribution. This requires the use of non-parametric methods when evaluating possible velocity variations based on working and geometry parameters [1]. This analytical approach helps reveal the effect of the parameters separately, masking the possible interactions between them, as well as the weight that each one exerts on the velocity variations.
Fluid velocities inside spray tanks can be correlated with the efficacy of the agitation. Xiongkui et al. [2] concluded that the efficacy of agitation was improved by increases in both flow rate and working pressure; similar conclusions were obtained by [3]. In turn, it can be concluded that Appl. Sci. 2020, 10, 5029 2 of 18 an increase in fluid velocity will produce an improvement in the efficacy of the agitation system [4]. Different studies have investigated fluid velocities inside spray tanks using Computational Fluid Dynamics (CFD) [5]. The use of this type of model is very useful, but it has the drawback of requiring long calculation times and the need for very precise data on the geometry of the interior of the tank and the location and characteristics of the components of the agitation system [6]. In this sense, Generalized Linear Mixed Models (GLMM) are an alternative to the CFD.
GLMM extend the set of generalized linear models by adding random effects to the linear estimator [7], allowing the modelling of correlated observations. In addition to generalized linear models [8], the GLMM can be formulated using a three-part sequence [9], on the basis of its general formulation [10,11] (Equation (1)).
where y = n × 1 observed data in a column vector; X = n × p matrix of the p predictor variables (all levels of fixed effects); β = p × 1 column vector of the fixed-effects regression coefficients; Z = n × q design matrix for the q random effects; u = q × 1 vector of the random effects; ε = n × 1 unobserved error vector.

Theoretical Sequencing of the Model
The conditional distribution of each y ij , given a vector q × 1 of random effects u i , belongs to the exponential family of distributions (positive or negative binomial, Poisson, normal, and gamma, among others). The variance is given by (y ij |u i ) = φυ(E(y ij |u i )), where, υ(·) is a known function for the variance, a function of the conditional mean E(y ij |u i ), and φ is a scalar parameter that may be known or needs to be estimated. It should be noted that, given the random effects u i , it is assumed that values of y ij are independent of each other, which assumes conditional independence [12].
The conditional mean of y ij , which is dependent on the fixed effects (FE) β and random effects (RE) u i , is linked to the linear predictor η ij , through the application of a known linkage function, g(·), which is monotonous and differentiable [13], as follows (Equation (2)): g{E(y ij |u i ; x ij ; z ij )} = η ij = x ij T β + z ij T u i , where, x ij and z ij are two vectors of covariates p × 1 and 1 × q dimensional, respectively. Although any function can be chosen for g(·), each distribution belonging to the exponential family has a special linking function called the canonical linking function. The known linkage function g(·) is given by (Equation (3)): g{E(y ij )} = θ i , where, θ i is the canonical location parameter. Generally, any multivariate distribution can be supported for u i . In practice, it is common to assume that the u i has a multivariate normal distribution [14], with zero mean and variance-covariance matrix G, sized q × q [15], where G is some function of θ. Additionally, the RE u i are assumed to be independent of covariates x i , if any.
The adjustment of the model requires the maximization of the marginal likelihood, which is obtained by integrating the RE. The form of distribution of RE has a notable influence on the unbiased inference of the maximum likelihood estimators (MLE) of FE [14]. Software packages offer several answers to solve this [16]. SPSS and SAS, by default, use iterative linear methods that maximize the penalized quasi-likelihood residual (PQL [17][18][19]; RSPL method in SAS), whose main criticism is that the inference of the plausibility that it may be inappropriate, as it generates a bias for large variances or small means. Existing methods to approximate the likelihood to estimate GLMM parameters include, marginal quasi-likelihood methods (MQL) [20], Laplace approximations [21], Gauss-Hermite quadrature (GHQ) [22], and Markov chain Monte Carlo (MCMC) algorithms [23]. Adaptive quadrature (GHQ) or Laplace approximations are the preferred methods for categorical response variables [15,22,24]. Residual methods account for the FE in the construction of the objective function, which reduces the bias in covariance parameter estimates. Estimation methods involving the Taylor series create pseudo-data for each optimization, which are transformed to have a zero mean in a residual method. While the covariance parameter estimates in a residual method are the maximum likelihood estimates for the transformed problem, the fixed-effects estimates are (estimated) the generalized least squares estimates. In a likelihood method that is not residual based, both the covariance parameters and the fixed-effects estimates are maximum likelihood estimates, but the former are known to have greater bias. In certain problems, residual likelihood estimates of covariance parameters are unbiased.
The degrees of freedom (df) for RE, required for Wald tests or F tests or Akaike's information criteria (AICc), must be between 1 and N-1 (where N is the number of RE levels). Software packages vary enormously in their approach to computing df. The simplest approach uses the minimum number of df contributed by RE that affect the term being tested (residual method) [25]. The Satterthwaite and Kenward-Roger (KR) approximations [25,26] use more complicated rules to approximate the df and adjust standard errors.
The overview of the model implies that the variance matrix of the target variable has the following form: V = ZGZ' + R. Knowing that ε is the vector of unobserved errors, the variance takes the form of (Equation (4)): The R matrix is by default the scaled identity matrix, R = σ 2 I, where I is the n × n identity matrix, and G is a diagonal matrix containing variance components [27]. The usual estimate of the variance-covariance matrix of the mean model parameters depends on the specified variance-covariance model for the data. The consideration of a wide class of variance-covariance models helps ensure that this specification is logical [28].

Objective
This study aims to explore, describe, and adjust a generalized linear mixed model, appropriate for the analysis of the influence that each variable (geometric and agitation system regulation) exerts on the internal velocity of the fluid. Considering that these models allow the dependent variable to not have a normal distribution [29], there are correlated observations and a linear relationship with the independent variables and covariates through a given linkage function.

Source of the Data
For this study, the data from the trial conducted by [4] with a hydropneumatic sprayer tank ( Figure 1) was used, in which the fluid velocity (cm/s; FV) was measured using a 3-D microacoustic Doppler velocimeter, at 32 points, distributed in several positions inside the tank (Figure 2; height-H; side-S; and section-SC), and at varying pressure of the agitation system (8, 10, 12 bar; P), number of active nozzles (2,4;NN), and liquid level in the tank (1000, 2000, 3000 L; LT). General terminology notation is specified in Table 1      First-order autoregressive covariance matrix structure ARH(1) Heterogeneous first-order autoregressive covariance matrix structure ARMA(1,1) First-order autoregressive moving-average covariance matrix structure CS Compound-symmetry covariance matrix structure CSH Heterogeneous compound-symmetry covariance matrix structure D Diagonal covariance matrix structure Toep Toeplitz covariance matrix structure UN Unstructured covariance matrix structurê β 0 andβ 1 Estimated linear regression intercept and slope (computed Error in prediction of linear regression

Software Used
The SPSS package (IBM Corp. Released 2017. IBM SPSS Statistics for Windows, Version 25.0. Armonk, NY: IBM Corp.) and PROC GLIMMIX of SAS/STAT, version 9.4 of the SAS System for Windows (SAS Institute Inc. 2018), whose scripts are shown in Appendix A, were used.

Useful Sequencing of the Best Model
In this study, for the formulation and selection of the best model in terms of the number and nature of the fixed and random effects (Tables 2-4), a canonical link function was assumed under the premise of a normal distribution of the errors of the model and the target variable, fluid velocity (FV). The Kolmogorov-Smirnov (K-S) test was used to evaluated conformance with the normal distribution (theoretical curve shown in Figure 3). Once the best model was selected and the FV showed positive scale values that deviated towards larger positive values, it was tested until the combination "distribution form-linking function", providing a better fit, was obtained.  Table 3. Increments of the Akaike information criterion corrected for each model with respect to the best model (minimum), and the relative likelihood of each model expressed through its relative Akaike weight (w i ). With respect to the variance-covariance matrix, the basic rationale was that "parsimony means power", that is, to obtain the most efficient inferences about the mean model, one selects the most parsimonious covariance structure possible, that continues to reasonably fit the data. It is, thus, recommended, in each case, to test both the formats with the least number of parameters and heterogeneous structures [30].

Distribution Analysis of the Targeted Variable: Fluid Velocity
Liquid level in Tank   The analysis of the fluid velocity (cm/s) variable as a whole (n = 7200) revealed an average value of 11.25 ± 6.08 cm/s, within a range of values from 0.69 cm/s to 61.73 cm/s, highlighting a mismatch with the normal distribution (K−S = 0.08; p < 0.001). The representation of the data based on the operational parameters of interest: LT, NN, and P, showed various extremes and outliers in the right  Table 5 shows estimates of mean and variance of the Pearson's residues according to the different covariance matrix structures tested. A linear regression was performed between the predicted and observed values (Table 6), assessing their significance through an analysis of variance of the regression model, the constant, and the slope. The results obtained with the best model, through SPSS and SAS, through three iterations, using the information criteria and the estimates of the residual variances (Table 7); using estimated averages for the different levels of FE and theirs p-values (Table 8 and Figure 4); and through the adjustment coefficients of a regression between observed and predicted values, were compared (Table 9).   Results obtained with both software agreed that fluid velocity was significantly lower as the liquid level inside the tank increased. While no differences were observed at 8 or 10 bar, the fluid velocity increased significantly when the pressure reached 12 bar. However, the results differed in the effect of the number of active nozzles. While the SPSS model did not observe differences between the presence of 2 and 4 open nozzles, the SAS model observed and separated them, highlighting that the velocity was significantly lower with four open nozzles, potentially due to their geometric location in the tank as they were facing each other ( Figure 1).
Thus, when the four nozzles operated, the liquid flowed facing 2 to 2 and produced a braking effect. In this case, the four nozzles were arranged on the same work plane and aligned with two sets of nozzles facing each other, with each pair located at the ends of the tank. In contrast, when two nozzles were used, they were located in the same plane but misaligned with each other, and thus did not directly face the fluid flows. There was, however, a gap in the flow lines of approximately 700 mm. This is consistent with the significance of the σ3 covariance, between the first and last section, observed in the Toeplitz variance-covariance matrix (Equation (6)). Generally, the least square means obtained with SAS were slightly higher than those obtained with SPSS, with the exception of the cases of 2000 L and 3000 L of liquid in the tank.
The predictions on the fluid velocity of the best model revealed significant differences between all the levels of FE (Figure 4). It is noteworthy that these were slightly lower than those observed during the test (Table 8), with the differences between what was predicted by the SPSS and SAS models, and the differences in the analysis of the observations made by both software being in a certain proportion.
Without the pretension of establishing a correct linear regression, as several of its premises are not fulfilled, if this method was used to find the output that adjusts its predictions with the measurements in the laboratory better, it was observed that the best model, computed with SAS, revealed a better adjustment in the coefficient of determination (R 2 ; almost 8%), a higher correlation and, the residues, having less skewness and kurtosis, than that computed using SPSS (Table 9).   The relationship between the observed and predicted values estimated using the best model is represented in Figure 5, through the linear regression that relates them. Starting from this, 95% predictor intervals are shown, whose calculation responds to what is explained in (Equation (5)) [31]. The points excluded from the prediction interval, the section they belonged to, and the dispersion statistic that characterized them were identified.
adjusted determination coefficient. Rho: Spearman's rank correlation coefficient. p-value its < 0.001 in all items. Therefore, the best model was the one computed with SAS, considering the parameters that have been discussed throughout the study. However, the predicted values reproduce the highest values in the positive direction of the target variable, fluid velocity (FV), overestimating some predictions from 2 cm/s to 32 cm/s, but reducing the maximum predicted velocity by at least 15 to 17 cm/s ( Figure  5). The best fit occurred when the liquid level in the tank was 1000 L (R 2 ad = 0.467), at a pressure of 8 bar, with two active nozzles (R 2 ad = 0.661). The model setting for the variable pressure was better for a pressure of 8 (R 2 ad = 0.453) and 12 bars (R 2 ad = 0.410), as opposed to 10 bars (R 2 ad = 0.392). Finally, the model was always better adjusted with two active nozzles (R 2 ad = 0.456) than with four (R 2 ad = 0.352). The model's worst adjusted combination of variables occurred with LT = 2000 L, P = 12 bar, and NN = 2 (R 2 ad = 0.146). The values excluded from the prediction limits in Figure 5 are those whose residue exceeding twice the width of the 95% prediction interval (e ≥ 5.85; Equation (5)). Sixty-nine percent of these were measured in Section 2, with σ 2 FV,SC2 all data = 49.27 (n = 1800) versus σ 2 FV,SC2 excluded data = 52.74 (n = 255), which was the closest to the breakwater element; while 21% were measured in Section 1, with σ 2 FV,SC1 all data = 39.10 (n = 1800) versus σ 2 FV,SC1 excluded data = 42.31 (n = 80). This shows that the excluded values in each section not only contain the variability of the whole section but also exceed it. In this regard, the following combination of FE levels are highlighted: LT = 2000 L, P = 10 bar, and NN = 4, with σ 2 FV excluded data = 124.67 ( Figure A1 in Appendix B).

Distribution Analysis of the Targeted Variable: Fluid Velocity
The analysis of the fluid velocity (cm/s) variable as a whole (n = 7200) revealed an average value of 11.25 ± 6.08 cm/s, within a range of values from 0.69 cm/s to 61.73 cm/s, highlighting a mismatch with the normal distribution (K−S = 0.08; p < 0.001). The representation of the data based on the operational parameters of interest: LT, NN, and P, showed various extremes and outliers in the right tail of the distribution, with Skewness = 1.24, and Kurtosis = 2.68 ( Figure 3). It is known that, to develop models that describe what is observed in reality, the variable should not be transformed, nor should extremes and outliers be treated and/or eliminated [32].

Model Comparison
Models without interactions between FE and with an increasing level of complexity were developed, whose fit was compared with (a) the AICc corrected [33,34], despite a high number of observations in relation to the number of parameters in the model (K), such that n/K > 40, and (b) the Schwarz's Bayesian criterion (BIC) [34,35], which modifies the model parameter scores with respect to the AICc (Table 2). This obviates the need to establish a level of significance in accepting or rejecting effects, similar to the stepwise method [36].
As can be seen in Table 2, the best model showed a lower value of the AICc and BIC information criteria coefficients, and also the highest relative Akaike weight (w i ; Table 3). The model that best represented the reality measured by the data, was number 11, which included the operational parameters of liquid level in the tank, number of active nozzles, and pressure, with FE and the section of the geometric shape of the tank as a RE. Models 12 and 13, respectively, followed this with a decreasing plausibility in terms of adjustment to the observed data.
Model 11, together with the Models 12 and 13, were those that estimated the lowest residual variance among the other models (Table 4).
Models which excluded SC, in the fixed and the random effects (Models 1 to 10 and 14), showed the lowest relative weights (w i ) in the aggregate, and the highest estimates of residual variance. The smallest values of residual variance, corresponding to the models that best fit the description of the observations [37] highlight the importance of SC, as a descriptor of the sprayer tank geometry, when assessing the velocity of the fluid inside, and the need for their inclusion as RE and not as FE.
The practical use of such sprayers causes them to gain importance in the interactions between the factors [38]. Nevertheless, the introduction of double interactions: LT×P, LT×NN, and P×NN, and the three-way interaction: LT×P×NN, among the FE, barely improves the fit by 0.41% (∆AICc = −179.75). It must be highlighted here that, the interaction double, LT×P, displayed greatest influence (absorbed more variability) on the fluid velocity inside the tank. Since the formulation of an efficient model was tested here, the inclusion of the double and triple interactions were dismissed, even as RE [29] (the G matrix did not converge), to simplify the interpretation of the results.
In this type of GLMM models, the normal parametric distribution for RE has been assumed [39]. This is mainly for conventional reasons [40], understanding that, despite having chosen four cross-sections for the location of the control points within the sprayer's tank, the cut-off point for the choice of the section could take any value from one end to the other. It should be remembered that it is not highly recommended that random effects have many levels so that the Z matrix can converge [27].
No further information was provided on the form of the joint distribution of the RE (SC, H, S), and, thus, the normal distribution assumption, for these RE as a whole, was not evaluated directly in the Models 12, 13, and 14 [41]. The choice of a RE only partly outweighs the natural concern about the incorrect specification of RE in GLMM models [42], and its possible impact on the FE inferences [43].
No evidence was found of the use of GLMM in the analysis of the operation of the agitation system of agricultural sprayers. Until now, studies carried out in this regard have focused on the use of linear models such as ANOVA [4] or numerical modeling using CFD [1,6]. Therefore, the GLMM application methodology proposed in this research can be extended to different tank geometries (cylindrical, asymmetric, cubic, etc.) and to different designs of the agitation system (number of nozzles, location of nozzles, etc.).

Best Model Description
The data was structured from control points, as observation units that were independent of each other. The measurements were carried out according to different levels of the FE, such as liquid level in the tank, pressure, and the number of active nozzles, in four sections and at three heights and distances from the perimeter (sides).
The target variable (dependent) is the fluid velocity inside the tank. As previously warned, considering the form of the distribution of the target variable and the outcome of the various tests carried out, the combination that best matched what was observed was a Gamma-type distribution [31] with a logarithmic link function (f(x) = log(x)).
The values of the information criteria after the assessment of the different forms of the variance-covariance matrix (Table 5) show that, in the case of the AICc, the structures that best fit the model are the First-order autoregressive moving-average, followed by the Toeplitz, while with the BIC, the best fit model was the identity model followed by the first-order autoregressive, and then the first-order autoregressive moving-average model, which corresponded to their respective relative weights (w i AICc or BIC). Among these, the model that recorded a minor statistical significance forσ 2 SC was the Compound-symmetry covariance structure, being the one that most closely related the observations to the assumption of independence [19,44].
Despite having a large number of observations, and only one RE, it was difficult to choose the most appropriate covariance structure based only on the information criteria shown in Table 5 [45]. The degree of adjustment of the model's predictions to the observed data was, thus, analyzed according to the different covariance structures (Table 6) [27,28].
The accordance between the observed and predicted values of the model with regard to the determination coefficient, based on the structure of the covariance matrix ( Table 6), revealed that the best fit corresponded to Unstructured and Toeplitz. The smallest standard error in the estimation was provided by First-order autoregressive moving-average, followed by the Identity. As for the estimation of the variance of Pearson's residues, the values closest to unity came from Unstructured, followed by the Toeplitz. Pearson's average residue estimate was very close to zero for all structures.
Through the AICc and BIC information criteria (Table 5), and with the coefficients of fit between what was observed and what was predicted by the model (Table 6), it seems that the Toeplitz covariance structure could be the most appropriate for the case under consideration (Equation (6)), obtaining the following p-values: p(σ 2 ) = 0.001; p(σ 1 ) = 0.308; p(σ 2 ) = 0.367; and p(σ 3 ) = 0.001. This shows that the variability of the fluid velocity within each level of SC was significant, as was the difference in the observed variability between the first and last sections of the tank.
The number of samples in this test was high, and only one RE was included, which made it difficult to bias the iterative linear methods that maximize the PQL. The Laplace and Gauss-Hermite quadrature approaches were tested, resulting in higher (worse) values in the AICc and BIC information criteria (41,652.69). The MQL method was also tested, obtaining lower (better) values in the information criteria with respect to the PQL. A lower adjustment, however, between what was inferred and what was observed, as well as a clear bias due to excess in the estimation of the FE was observed.
The KR [46] and Satterthwaite [47] approaches for the calculation of degrees of freedom did not improve the reporting criteria and the estimation of residual variance, compared to the Between-Within, Containment, and Residual methods. Thus, without complicated covariance structures, when the sample size is moderate to large, and the design is reasonably balanced, similar to the case considered here, the Between-Within method offered greater accuracy and compatibility with robust fixed-effects estimators [48]. As the approximate variance matrix of the fixed-effects estimates depends on the estimation method, assuming the least square mean is estimable, PROC GLIMMIX constructs an approximate t-test to test the null hypothesis that the associated population size equals zero. By default, the denominator for degrees of freedom for this test was the same as that of the effect in the type III tests of FE, although the Between-Within method divided the residual degrees of freedom into between-subject and within-subject portions so that PROC GLIMMIX determined changes in a FE within any subject. If so, it assigned the within-subject degrees of freedom to the effect, and assigned the between-subject degrees of freedom to the effect, if otherwise [49]. As multiple effects within the subject (measurement point) were related to the classification variables (LT, P, and NN), the degrees of freedom within the subject were divided into components that correspond to the subject of the interactions by the effect. Therefore, Tukey's test was conducted with the row adjustment, and multiple comparisons of the probability and confidence intervals were obtained for the differences between the least quadratic means of the FE levels [50].
The tests of effects and fixed coefficients were carried out with SPSS, and robust estimation was applied, in view of the extreme and atypical data whose effect could violate the homoscedasticity premise of the model [7,12]. In SAS, various estimators (known as sandwich or empirical) reduce to the heteroscedasticity consistent covariance matrix estimators (HCMM) [51]. Based on simulations in regression models, some authors strongly recommend the one known as HC3 estimator to accommodate non-normal data and correlated observations [52,53]. This approximation was motivated as a jack-knife estimator, based on the "leave-one-out" estimates ofβ [54].

Comparison of IBM SPSS and SAS Model Results
The estimate of the residual variance and the estimated variance of Pearson's residues, predicted by the model computed through SAS, offered better values than the one computed through SPSS (Table 7). Additionally, SAS showed the ratio of the generalized chi-square with its degrees of freedom, which is a measure of the variability of the observation on the mean model.
The distribution of Pearson's residues on the predicted values did not fit a normal distribution in either case. However, the probability of the result of the model computed with SAS showed a lower significance (K-S = 0.029; p = 2.79 × 10 −15 ) than the one computed with SPSS (K-S = 0.034; p = 4.61 × 10 −21 ).
As shown in Table 8, with the exception of the number of active nozzles in the SPSS model, all three FE had a significant effect on the internal fluid velocity. Both software showed that the level of liquid in the tank (TL) had the greatest influence on the fluid velocity, followed by the agitation system pressure (P), and finally, the number of open nozzles (NN). Considering the SPSS model, the most different FE was the number of open nozzles, as it seemed to have the least influence, with respect to the other two (<F). Considering the SAS model, the agitation system pressure and the number of open nozzles had a significant effect, and which was more similar to each other and much less than that of the liquid level in the tank.
Results obtained with both software agreed that fluid velocity was significantly lower as the liquid level inside the tank increased. While no differences were observed at 8 or 10 bar, the fluid velocity increased significantly when the pressure reached 12 bar. However, the results differed in the effect of the number of active nozzles. While the SPSS model did not observe differences between the presence of 2 and 4 open nozzles, the SAS model observed and separated them, highlighting that the velocity was significantly lower with four open nozzles, potentially due to their geometric location in the tank as they were facing each other ( Figure 1).
Thus, when the four nozzles operated, the liquid flowed facing 2 to 2 and produced a braking effect. In this case, the four nozzles were arranged on the same work plane and aligned with two sets of nozzles facing each other, with each pair located at the ends of the tank. In contrast, when two nozzles were used, they were located in the same plane but misaligned with each other, and thus did not directly face the fluid flows. There was, however, a gap in the flow lines of approximately 700 mm. This is consistent with the significance of the σ 3 covariance, between the first and last section, observed in the Toeplitz variance-covariance matrix (Equation (6)). Generally, the least square means obtained with SAS were slightly higher than those obtained with SPSS, with the exception of the cases of 2000 L and 3000 L of liquid in the tank.
The predictions on the fluid velocity of the best model revealed significant differences between all the levels of FE (Figure 4). It is noteworthy that these were slightly lower than those observed during the test (Table 8), with the differences between what was predicted by the SPSS and SAS models, and the differences in the analysis of the observations made by both software being in a certain proportion.
Without the pretension of establishing a correct linear regression, as several of its premises are not fulfilled, if this method was used to find the output that adjusts its predictions with the measurements in the laboratory better, it was observed that the best model, computed with SAS, revealed a better adjustment in the coefficient of determination (R 2 ; almost 8%), a higher correlation and, the residues, having less skewness and kurtosis, than that computed using SPSS (Table 9). Therefore, the best model was the one computed with SAS, considering the parameters that have been discussed throughout the study. However, the predicted values reproduce the highest values in the positive direction of the target variable, fluid velocity (FV), overestimating some predictions from 2 cm/s to 32 cm/s, but reducing the maximum predicted velocity by at least 15 to 17 cm/s ( Figure 5). The best fit occurred when the liquid level in the tank was 1000 L (R 2 ad = 0.467), at a pressure of 8 bar, with two active nozzles (R 2 ad = 0.661). The model setting for the variable pressure was better for a pressure of 8 (R 2 ad = 0.453) and 12 bars (R 2 ad = 0.410), as opposed to 10 bars (R 2 ad = 0.392). Finally, the model was always better adjusted with two active nozzles (R 2 ad = 0.456) than with four (R 2 ad = 0.352). The model's worst adjusted combination of variables occurred with LT = 2000 L, P = 12 bar, and NN = 2 (R 2 ad = 0.146). The values excluded from the prediction limits in Figure 5 are those whose residue exceeding twice the width of the 95% prediction interval (e ≥ 5.85; Equation (5)). Sixty-nine percent of these were measured in Section 2, with σ 2 FV,SC2 all data = 49.27 (n = 1800) versus σ 2 FV,SC2 excluded data = 52.74 (n = 255), which was the closest to the breakwater element; while 21% were measured in Section 1, with σ 2 FV,SC1 all data = 39.10 (n = 1800) versus σ 2 FV,SC1 excluded data = 42.31 (n = 80). This shows that the excluded values in each section not only contain the variability of the whole section but also exceed it. In this regard, the following combination of FE levels are highlighted: LT = 2000 L, P = 10 bar, and NN = 4, with σ 2 FV excluded data = 124.67 ( Figure A1 in Appendix B).

Conclusions
The best model tested included the liquid level in the tank, circuit pressure, and number of active nozzles as FE, and only the section of the tank as RE; height, and side of the control points, being·irrelevant. The inclusion of double and triple interactions among the FE hardly improved the fit of the best model and did not favor the interpretation of the results.
After the obligatory singular evaluation in the choice of the covariance structure, particularly for the fixed and random effects of this study, the Toeplitz covariance matrix structure showed the best values in terms of information criteria and adjustment between predicted and observed data as a whole. However, agreeing with prior studies, this structure did not show a conclusive advantage over other structures, so its choice could be discussed.
Using SPSS, the Satterthwaite approximation could be performed for the calculation of the degrees of freedom while using robust estimation methods for the tests of FE and coefficients. With SAS, however, the Between-Within method and the HC3 estimator were used. The best model in terms of reporting criteria was obtained using SAS.
Through this generalized linear mixed model, it has been possible to evaluate fixed and random effects together so that the FE that most influenced the fluid velocity could be identified as the liquid level in the tank, followed by circuit pressure, and finally, the number of active nozzles. The results obtained for the variable liquid levels in the tank and circuit pressure showed similar significant differences considering the best model calculated with both software. However, they differed with regard to the significance of the effect of the active nozzle. While the SAS calculation observed a significant effect, the SPSS did not.
The best model smoothed and moderated the extreme and outliers observed in the test but did not entirely eliminate its influence. Thus, it overestimated the predictions (from 2 cm/s to 32 cm/s) but limited the highest values (at least 15-17 cm/s). Almost 70% of the outliers were found in a single section, which was the one closest to the breakwater element.
The main limitation of this type of model (GLMM) is its singularity; it will always be necessary to formulate and validate the most appropriate one for each combination of variables. However, the use of GLMM will allow evaluating the influence of FE and RE simultaneously, and even by adding covariates. This opens up the possibility of evaluating the efficiency of different agricultural sprayers with variable geometries and specific operational parameters. Funding: This research has been funded by the LAMAGRI research group (http://lamagri.unizar.es/).