Assessment of the Use of Geographically Weighted Regression for Analysis of Large On-Farm Experiments and Implications for Practical Application

On-farm experimentation (OFE) is a farmer-centric process that can enhance the adoption of digital agriculture technologies and improve farm profitability and sustainability. Farmers work with consultants or researchers to design and implement experiments using their own machinery to test management practices at the field or farm scale. Analysis of data from OFE is challenging because of the large spatial variation influenced by spatial autocorrelation that is not due to the treatment being tested and is often much larger than treatment effects. In addition, the relationship between treatment and yield response may also vary spatially. We investigate the use of geographically weighted regression (GWR) for analysis of data from large on-farm experiments. GWR estimates local regressions, where data are weighted by distance from the site using a distance-decay kernel. It is a simple approach that can be easily explained to farmers and their agronomic advisors. We use simulated data to test the ability of GWR to separate yield variation due to treatment from any underlying spatial variation in yield that is not due to treatment; show that GWR kernel bandwidth can be based on experimental design to accurately separate the underlying spatial variability from treatment effects; and demonstrate a step-wise model selection approach to determine when the response to treatment is global across the experiment or locally varying. We demonstrate our recommended approach on two large-scale experiments conducted on farms in Western Australia to investigate grain yield response to potassium fertiliser. We discuss the implications of our results for routine practical application to OFE and conclude that GWR has potential for wide application in a semi-automated manner to analyse OFE data, improve farm decision-making, and enhance the adoption of digital technologies.

is not due to the treatment being tested [34]; and 3.
is often much larger than the variation due to treatment effects.
Spatial auto-correlation occurs when Tobler's First Law applies: "Everything is related to everything else. But near things are more related than distant things." [35]. Thus, spatial data are typically not independent, and regression models that assume independence cannot be applied. Various methods have been used to handle spatial autocorrelation in large-scale agricultural experiments. For experiments without geo-referenced measurements, these include standard methods typically applied to small plot experiments, such as experimental design with blocking, global regressions that allow spatially autocorrelated error terms [36], and linear mixed models [19,33,34,37,38]. However, standard methods are frequently not suitable for the analysis of on-farm experiments that are conducted using large plots in simple designs that may be systematic rather than randomised, and where there are multiple geo-referenced measurements recorded for each plot.
Spatial regression [39][40][41] and Bayesian conditional auto-regressive models [42] have been used to account for spatial autocorrelation in OFE data. However, these methods assume a global relationship between treatment and yield, and in large experiments, the relationship between treatment and yield may vary spatially. Lark and Wheeler [43] estimated local nitrogen response curves using moving window regression with a spatial error model. Pringle et al. [44] used co-kriging to interpolate response from all levels of a variable input onto a common grid, producing a yield corresponding to each level at each cell in the grid. For each cell, they then fit local polynomial response curves. Co-kriged yield from variable and discrete inputs has also been used for point-wise hypothesis testing, effectively considering fertiliser rates to be discrete treatments rather than estimating response curves directly [45,46]. In addition, more simple comparisons of two treatment types laid in strips have been performed using local methods [22,47] and an examination of changes across plot boundaries has also been considered [20].
Geographically weighted regression (GWR) is an alternative and simpler approach that has recently been used to estimate spatially varying response curves from OFE [48,49]. GWR can be simply described as performing local linear regressions, where data are weighted by distance from each local site [50][51][52]. The ability to describe GWR so simply is a major advantage of GWR, meaning that it can be easily explained to farmers and their agronomic advisors and it has the potential to be applied in a semi-automated or automated way in computer software. Consider the example of an on-farm experiment that aims to detect the response of yield to different fertiliser rates. Fertiliser response typically increases up to a maximum yield before reaching a plateau or decreasing as more fertiliser is applied [53]. It is often modelled using linear-plateau [54], quadratic-plateau [55], linear plus exponential [56], or modified Mitscherlich [57] functions. If the experiment applies rates in the approximately linear part of the fertiliser response curve, basic linear GWR can be used to estimate and map the spatially varying intercepts and slopes. The intercept map will capture spatial variation in yield that is not due to the applied fertiliser, and the slope map will capture the local yield response to fertiliser. We expect data analysis using GWR to be part of a broader OFE process of co-learning. For that process to lead to practice change, the results of analysis must be assimilated with a farmer's Agronomy 2020, 10, 1720 4 of 25 prior knowledge and intuition. The outputs from GWR using linear regressions are ideal for this, as they can be visualised using two simple maps of intercept and response that are easily understood and contextualised by farmers. They allow farmers to visualise spatial variation in yield and in response to fertiliser and adjust management practices using data in conjunction with their experience and intuition. This added interpretation of the analysis encourages and enables farmers to use data and evidence-based decision-making as an operational part of farm management. Making OFE a routine part of farming will bridge the disconnect between farmers and researchers, allowing for the continuous learning and improvement in farm management as well as generating new knowledge in the agricultural domain.
While local linear regressions cannot be used to determine yield-maximising or profit-maximising rates of fertiliser, OFE can have several objectives depending on the interests of the experimenter. The optimisation of inputs may not be the prime objective. In the case where we investigate using local linear regressions, the primary goal of the experiment is to quantify the relationship between applied fertiliser and yield and how that relationship varies in space. Additional learning can also arise from considering causes of variation in non-treatment effects captured by the intercept, which can offer insight into pre-existing soil nutrient supplies or causes of yield variability that may be due to soil constraints, weeds, or disease [14]. "One difficulty with GWR is that the estimated parameters are, in part, functions of the weighting function or kernel selected in the method" [50]. GWR results crucially depend on the kernel bandwidth, which determines the rate at which weights decay so that larger bandwidths lead to smoother changes in model parameters. Bandwidth is usually selected by minimising the leave-one-out cross-validation error or measuring that balance accuracy with model complexity such as the Akaike Information Criterion (AIC) [58] or the corrected AIC (Amick) [52,59]; see Section 2.3.2. However, because OFE data are densely distributed in space and have a non-random structure due to experimental design, these standard approaches to bandwidth selection can result in over-fitted models and confounded estimates of spatial and treatment effects. Rakshit et al. [48] describe and demonstrate two approaches to bandwidth selection that can be used for OFE: bandwidth based on the experimental design and spatial k-fold cross-validation [60].
An important question for potential practitioners of PA is whether parts of a paddock respond differently to treatment and should therefore be managed differently [43]. This is an issue of model selection: is the relationship between treatment and yield global, or are locally varying models required? Two categories of on-farm experiments can be defined: (1) those that aim to estimate local response curves for a variable input assuming that they vary spatially across a field, and (2) those that assess response globally across a field [34]. One approach for exploring non-stationarity in treatment-yield relationships is to apply standard methods to predefined landscape zones [40]. This can be also be considered in a geostatistical framework by comparing the use of global or local semivariograms. Pringle, McBratney, and Cook [44] compared global and local kriging methods by the use of an evaluation set containing 10% of the data from two field experiments that were systematic in design. They concluded that a local model was only justified for one of the two experiments, and the other could therefore be addressed as a whole. Model selection for GWR is commonly performed by comparing GWR with ordinary least squares regression applied to the entire dataset. Since yield data from OFE have large, spatially auto-correlated non-treatment effects, we suggest comparing GWR with "mixed" GWR, where the model intercept can vary spatially to capture spatial variation in yield, but the slope of the regression, or response, is fixed globally. GWR and mixed GWR models can be compared using the AICc, and the significance testing of terms is also possible [61,62].
The objective of this study is to assess whether GWR can be used to estimate spatially varying differences in fertiliser response within the approximately linear part of fertiliser response curves for use in the broader OFE process of co-learning and data-driven decision-making. While we are aware that by considering only the linear part of the response curve, we cannot determine yield-maximising or profit-maximising rates of fertiliser, we are primarily interested in assessing the use of GWR for quantifying the relationship between applied fertiliser and yield, and determining if and how that Agronomy 2020, 10, 1720 5 of 25 relationship varies in space. We do this by simulating on-farm experiments with linear response to fertiliser treatment and using the simulated data to assess the ability of GWR to separate treatment effects from non-treatment effects. We consider two issues of importance in the application of GWR to OFE: (1) bandwidth selection and (2) model selection to determine whether the relationship between treatment and yield is global or spatially varying. We test the use of a bandwidth based on experimental design and show that it provides a better separation of non-treatment and treatment effects and a more accurate estimation of treatment effects than the usual approach of minimising the AICc. We also propose and demonstrate a stepwise model selection approach for determining whether a global or local model of treatment effects should be used.
The article is structured as follows. Materials and methods are outlined in Section 2. Section 2.1 describes the simulated data that we use to test the ability of GWR to accurately separate underlying spatial variation in yield that is not due to treatment from treatment effects and to assess methods for bandwidth selection and model selection. Section 2.2 describes two on-farm experiments undertaken in Western Australia on which we demonstrate the application of GWR to understand spatial variation in yield response to potassium fertiliser. The GWR method is described in detail in Section 2.3. Results are presented in Section 3. Section 3.1 contains the results using simulated data and subsequent recommendations for application of GWR to OFE in practice. Section 3.2 applies these recommendations to the analysis of the two potassium trials. Finally, we discuss our results and implications for the practical application of GWR for the analysis of OFE in computer software systems and decision tools.
We used the 'GWmodel' R package [63,64] to conduct this research and provide R code as Supplementary Materials. Some modifications to the GWmodel package have been made to speed up the estimation of mixed GWR models, and they are freely available from https://github.com/ fionahevans/GWmodelFE.

Simulated Data
Data are simulated for three likely scenarios:
Linear response to fertiliser that varies within spatial zones 3.
Locally varying linear response to fertiliser For each scenario, minimum yield (yield.min), the yield when no fertiliser is applied, is assumed to vary spatially. It is simulated as a spatially autocorrelated random field with mean 2500 kg ha −1 , using an exponential variogram with sill equal to 2500 × 10 2 and range equal to 30 pixels. A fertiliser strip trial is simulated using a randomised complete block design with four rates (0, 30, 60, and 100) and two repetitions. Each strip is 3 pixels (30 m) wide. The response to fertiliser is added to the minimum yield.

Global Linear Response to Fertiliser
The effect of fertiliser is simulated globally as: where rate is the amount of fertiliser applied in kg ha −1 . The coefficient 6 corresponds to a global response of 6 kg of yield per kilogram of fertiliser applied. The simulated experiment, comprised of yield.min, rate, and yield, is shown in Figure 1.

Linear Response to Fertiliser that Varies within Spatial Zones
Local linear response is assumed to vary across three spatial zones, where the northern zone has no response, the middle zone has the same response as for the global linear simulation, and the southern zone has twice the response of the global linear simulation. The simulated experiment, comprised of yield.min, zone response, fertiliser rate, and yield, is shown in Figure 2.

Linear Response to Fertiliser that Varies within Spatial Zones
Local linear response is assumed to vary across three spatial zones, where the northern zone has no response, the middle zone has the same response as for the global linear simulation, and the southern zone has twice the response of the global linear simulation. The simulated experiment, comprised of . , zone response, fertiliser , and , is shown in Figure 2.

Locally Varying Linear Response to Fertiliser
The effect of fertiliser is simulated locally as: where response is simulated as a spatially autocorrelated random field with mean 6 (the coefficient used to simulate global linear response), using an exponential variogram with sill = 60 and range = 100. Local response to treatment varies at a larger scale than other yield variation. The simulated experiment, comprised of . , response, , and , is shown in Figure 3.

Locally Varying Linear Response to Fertiliser
The effect of fertiliser is simulated locally as: where response is simulated as a spatially autocorrelated random field with mean 6 (the coefficient used to simulate global linear response), using an exponential variogram with sill = 60 and range = 100. Local response to treatment varies at a larger scale than other yield variation. The simulated experiment, comprised of yield.min, response, rate, and yield, is shown in Figure 3.

Potassium Trials
Two large-scale experiments were conducted on farms in the grainbelt of Western Australia to investigate grain yield response to potassium fertiliser Muriate of Potash (MOP, or potassium chloride, KCl, 50% potassium).

Cunderdin Trial
An on-farm experiment was conducted near Cunderdin (31.68 S, 117.17 E), approximately 150 km east of Perth, to investigate barley yield response to potassium in 2018. Four rates of MOP were applied, with two repetitions, using a randomised complete block design in a strip trial ( Figure 4). The strips were 12 m wide and approximately 900 m long. Yield monitor data were collected using a harvester with 12 m swath width and cleaned to remove: 1. Extreme high and low yields. 2. Extreme high and low harvester speeds.

Potassium Trials
Two large-scale experiments were conducted on farms in the grainbelt of Western Australia to investigate grain yield response to potassium fertiliser Muriate of Potash (MOP, or potassium chloride, KCl, 50% potassium).

Cunderdin Trial
An on-farm experiment was conducted near Cunderdin (31.68 S, 117.17 E), approximately 150 km east of Perth, to investigate barley yield response to potassium in 2018. Four rates of MOP were applied, with two repetitions, using a randomised complete block design in a strip trial ( Figure 4). The strips were 12 m wide and approximately 900 m long. Yield monitor data were collected using a harvester with 12 m swath width and cleaned to remove:

1.
Extreme high and low yields.

2.
Extreme high and low harvester speeds.

Potassium Trials
Two large-scale experiments were conducted on farms in the grainbelt of Western Australia to investigate grain yield response to potassium fertiliser Muriate of Potash (MOP, or potassium chloride, KCl, 50% potassium).

Cunderdin Trial
An on-farm experiment was conducted near Cunderdin (31.68 S, 117.17 E), approximately 150 km east of Perth, to investigate barley yield response to potassium in 2018. Four rates of MOP were applied, with two repetitions, using a randomised complete block design in a strip trial ( Figure 4). The strips were 12 m wide and approximately 900 m long. Yield monitor data were collected using a harvester with 12 m swath width and cleaned to remove: 1. Extreme high and low yields. 2. Extreme high and low harvester speeds.

Narambeen Trial
An on-farm experiment was conducted near Narambeen (32.11 • S, 118.67 • E) to investigate wheat yield response to potassium in 2019. Four rates of MOP were applied in a trial design comprised Agronomy 2020, 10, 1720 8 of 25 of both strips and a checkerboard design ( Figure 5). The strips were approximately 30 m wide and 1200 m long. In the checkerboard, each strip was split into eight plots approximately 150 m in length. Some errors in fertiliser placement occurred, and therefore, the trial design was altered post hoc to adjust some plot boundaries. Yield monitor data were collected using a harvester with 15 m swath width and cleaned using the protocol above to remove locations near the boundaries of experimental plots where there was a possibility of mixed treatments.
An on-farm experiment was conducted near Narambeen (32.11° S, 118.67° E) to investigate wheat yield response to potassium in 2019. Four rates of MOP were applied in a trial design comprised of both strips and a checkerboard design ( Figure 5). The strips were approximately 30 m wide and 1200 m long. In the checkerboard, each strip was split into eight plots approximately 150 m in length. Some errors in fertiliser placement occurred, and therefore, the trial design was altered post hoc to adjust some plot boundaries. Yield monitor data were collected using a harvester with 15 m swath width and cleaned using the protocol above to remove locations near the boundaries of experimental plots where there was a possibility of mixed treatments.

Basic GWR
In its simplest form, GWR performs local linear regressions, accounting for spatial auto-correlation by weighting data by distance from each local site using a distance-decay kernel [50][51][52]. More generally, GWR performs local likelihood estimation [59,65], which means that a wider range of local template models can be used (such as generalised linear models), hypothesis testing can be performed using local t-scores, and models can be compared using model selection criteria.
Consider spatial data collected at geographic locations ( , ) for = 1, … , . Let y be the response variable and = ( 1 , … , ) be the explanatory variables. The basic GWR model takes the form where 0 ( , ) and ( , ) are the coefficients of the local regression at the i-th location and is the error. Letting = ( 0 , … , ), the maximum likelihood estimator for the i-th location is given by

Basic GWR
In its simplest form, GWR performs local linear regressions, accounting for spatial auto-correlation by weighting data by distance from each local site using a distance-decay kernel [50][51][52]. More generally, GWR performs local likelihood estimation [59,65], which means that a wider range of local template models can be used (such as generalised linear models), hypothesis testing can be performed using local t-scores, and models can be compared using model selection criteria.
Consider spatial data collected at geographic locations (u i , v i ) for i = 1, . . . , n. Let y be the response variable and X = (x 1 , . . . , x m ) be the explanatory variables. The basic GWR model takes the form where β 0 (u i , v i ) and β k (u i , v i ) are the coefficients of the local regression at the i-th location and i is the error. Letting β = (β 0 , . . . , β k ), the maximum likelihood estimator for the i-th location is given bŷ Agronomy 2020, 10, 1720 where W i is an n by n diagonal matrix whose diagonal elements correspond to the weights based on distance from the i-th data point used in the local regression and C i = X T W i X −1 X T W i . The variance of the estimated parameters is given by where σ 2 is the normalised residual sum of squares from the local regression. The hatmatrix S is defined to be the matrix with n rows given by X i C i . Writing ν 1 = tr(S) and ν 2 = tr (S T S ), the effective degrees of freedom is given by γ * = n − 2ν 1 + ν 2 , and σ 2 is given by and the local t-score for the j-th explanatory variable can be defined as where s jj is the j-th diagonal entry in SS T . A formal hypothesis test is performed using the corresponding p-value with γ degrees of freedom. To avoid false positives due to multiple testing, p-values can be adjusted using various methods [66]. Note that these p-values test whether local regression coefficients are significant (i.e., whether there is a yield response to treatment) but do not test the stationarity of covariates (i.e., whether yield response to treatment should be modelled as a global locally varying).

Bandwidth Selection for GWR
Comparison of kernels and bandwidth selection are usually performed using model selection criteria that encode a trade-off between the accuracy of the model and its complexity, such as the AIC [58] or the corrected AIC (AICc) recommended for GWR [52,59]. The AICc is given by: where the hatmatrix S, defined in Section 2.3.1 is a function of the bandwidth. The AICc is composed of two parts: the first is the negative of the log-likelihood or model accuracy, and the second is a measure of the number of parameters or model complexity. A good model is one that maximises accuracy while minimising complexity; thus, smaller values of the AICc indicate better models. The AIC is asymptotically equivalent to leave-one-out cross-validation [67], which means that it can provide overly generous estimates of model accuracy for large datasets. Since OFE data are large, densely distributed in space, and have a non-random structure due to experimental design, bandwidth selection using AICc can result in over-fitted models and confounded estimates of spatial and treatment effects. A recently proposed method for selecting bandwidth is based on the experimental design used, so that local regressions capture data covering the full range of treatments [48]. The recommended bandwidth is (τ − 1)d/2 , where d is the distance between adjacent treatments or the width of the strips in the trial, and τ is the number of treatments. In some cases, this may not be large enough to accommodate all treatments for some local regressions, in which case a larger bandwidth should be used.
We compare bandwidth selection by minimising AICc with bandwidth selection based on strip widths and experimental design.

Mixed GWR and Model Selection
Basic GWR assumes that all regression coefficients vary geographically. In some situations, this may not be the case; for example, yield response to fertiliser application may be constant across a trial. In this case, a mixed GWR model that allows some coefficients to be global may better describe the data. The mixed GWR model has the form where X a = x 1 , . . . , x q are response variables with fixed or global coefficients and X b = x q+1 , . . . , x m are response variables with spatially varying coefficients. The estimation of mixed GWR models is performed as follows [52]: 1.
For each column of X a , i. Regress the column against X b using basic GWR. ii.
Compute the residual from the above regression.

2.
Regress y against X b using basic GWR.

3.
Compute the residual from the above regression.

4.
Regress the y residuals against the X a residuals using ordinary least squares regression to get a = β 1 , . . . ,β q .

5.
Regress y − X aâ against X b using basic GWR to get the spatially varying coefficients.
The hatmatrix S * for a mixed GWR model can be defined and used to calculate the AICc bŷ where S is the hatmatrix for the spatially varying part of the model, and W = (I − S) T (I − S).
Model selection for GWR is commonly performed by comparing GWR with ordinary least squares regression applied to the entire dataset. For OFE, we instead compare basic and mixed GWR models. Fotheringham, Brunsdon, and Charlton [52] discuss several methods for testing whether a covariate is stationary and better modelled using a mixed GWR model, including a Monte Carlo approach, testing the variability of the variance of coefficients using the methods proposed by Leun, et al. [68] and by minimising the AICc. We propose a model selection approach for OFE that assesses goodness of fit, balanced by model complexity, using the AICc criterion. The models tested are shown in Table 1. Table 1. Model forms tested as part of model selection.

GWmodel R Package
The application of GWR requires careful selection of an appropriate kernel and kernel bandwidth. Kernel functions can be continuous, giving positive weight to all distances, or discontinuous, assigning zero weights to distances greater than the set bandwidth. Discontinuous kernels allow for efficient computation. The R package 'GWmodel' implements six kernels for geographic weighting [63], which are reproduced in Table 2. Gaussian and exponential kernels are continuous functions of distance, producing positive weights for all distances. Bi-square and tri-cube kernels are discontinuous functions that produce zero weights beyond a certain distance. The box-car kernel is a simple discontinuous function that gives equal weight to all data within a certain distance and zero weight beyond it. Bandwidth can be specified either as a fixed distance with a variable number of points (adaptive = FALSE) or as an adaptive distance where a fixed number of local sample points (adaptive = TRUE) is used to estimate the local likelihood estimation (Fotheringham et al., 2002). While adaptive bandwidths are useful when the spatial density of data varies, fixed distance bandwidths are more appropriate for OFE because data collected by yield monitors exhibit spatial structure where points are sampled continuously along the path of the harvester and the maximum distance between sample points is determined by the width of the harvester. As a result of this spatial structure, the density of points is regular in space, and fixed distance bandwidths are used throughout this study. Table 2. Distance weighting kernels implemented in the GWmodel R package where w i j is the j-th element of the diagonal of the weight matrix W used to estimate the local regression at location i, d i j is the distance between the i-th and j-th observations, and b is the bandwidth.

Kernel Name Formula
Gaussian Tri-Cube otherwise.

Bandwidth Selection
This section presents the results of applying basic GWR to three types of simulated data: global linear response, local linear response that varies across three zones, and locally varying linear response. For each simulation, the yield is modelled using a basic GWR model with a spatially varying intercept and slope (response) with a Gaussian kernel: gwr.basic( yield ∼ rate , kernel = "gaussian", adaptive = FALSE).
We show maps of locally estimated intercept and slope, which we refer to as the "response" to treatment. We map the error components: overall model error and error in estimating the intercept and response. We compare the results of using two bandwidths: the bandwidth that minimises AICc (identified using GWmodel function 'bw.gwr') and that based on the strip width. For the simulated data, the strips are 3 pixels (30 m) wide, and using the formula in Section 2.3.2, the optimal bandwidth is given by (τ − 1)d/2 = 4.5 pixels (45 m).
For the simulated data with global linear response to fertiliser, AICc is minimised using bandwidth = 0.87 pixels. The estimated intercepts and responses are mapped in Figure 6a with the error components. The intercept map looks like a slightly smoothed version of the minimum yield (yield.min). The mean of the response (estimated slope coefficient) is 5.7, which is near the constant value of 6 used to create the simulation. Estimated response ranges from −7.6 to 19.1. However, both the intercept and response maps show effects corresponding to the simulated trial strips, and there is clearly confounding between the intercept and response. The overall model error has no visible spatial autocorrelation; however, there is autocorrelation evident in the errors in the estimated intercept and response. The results using a larger bandwidth of 4.5 are shown in Figure 6b. Use of the larger bandwidth results in smoother estimates of both the intercept and response. The mean response is 5.2, but the response has a reduced range of 0.6 to 8.4. The errors in the overall model and intercept are larger and more spatially autocorrelated than when using the AICc-minimising bandwidth; however, the error in estimation of the response is smaller. For the simulated data where the linear response is constant for three fixed zones, a bandwidth of 0.87 pixels minimised AICc. The intercept, response, and errors are shown in Figure 7a. The response map shows some error along the zone and strip boundaries. Variation within the zones is visible in the response map, but the mean response across each of the three zones is −0.6, 6.1, and 11.5, which are near the values of 0, 6, and 12 used to simulate the data. The overall error is small and has no visible spatial autocorrelation; however, spatial autocorrelation is evident in the intercept and response error. The results using the larger bandwidth are shown in Figure 7b. Use of the larger bandwidth results in smoother estimates of the intercept and response without affecting the ability of GWR to identify the mean response for each zone. While the overall error is higher than that achieved using the AICc-minimising bandwidth, the error in the intercept and response is lower. For the simulated data where the linear response is constant for three fixed zones, a bandwidth of 0.87 pixels minimised AICc. The intercept, response, and errors are shown in Figure 7a. The response map shows some error along the zone and strip boundaries. Variation within the zones is visible in the response map, but the mean response across each of the three zones is −0.6, 6.1, and 11.5, which are near the values of 0, 6, and 12 used to simulate the data. The overall error is small and has no visible spatial autocorrelation; however, spatial autocorrelation is evident in the intercept and response error. The results using the larger bandwidth are shown in Figure 7b. Use of the larger bandwidth results in smoother estimates of the intercept and response without affecting the ability of GWR to identify the mean response for each zone. While the overall error is higher than that achieved using the AICc-minimising bandwidth, the error in the intercept and response is lower. Agronomy 2020, 10, x FOR PEER REVIEW 13 of 25 For the simulation of locally varying linear response, AICc was minimised using a bandwidth of 0.88 pixels. The results are shown in Figure 8a. They show confounding between the estimated intercept and response, with considerable error in estimation of the response. In contrast, when the larger bandwidth is used, as shown in Figure 8b, there is no confounding between the estimated intercept and response, and although there is more spatial autocorrelation in the overall and intercept errors, the response error is lower. For the simulation of locally varying linear response, AICc was minimised using a bandwidth of 0.88 pixels. The results are shown in Figure 8a. They show confounding between the estimated intercept and response, with considerable error in estimation of the response. In contrast, when the larger bandwidth is used, as shown in Figure 8b, there is no confounding between the estimated intercept and response, and although there is more spatial autocorrelation in the overall and intercept errors, the response error is lower. Agronomy 2020, 10, x FOR PEER REVIEW 14 of 25

Summary of Bandwidth Selection Results and Recommended Approach to Bandwidth Selection for OFE
For simulations of global linear response, linear response that varies across zones, and locally varying linear response, AICc-minimising bandwidths ranged from 0.87 to 0.88 pixels. Using AICc-minimising bandwidth, GWR on local linear response by zones showed an error in the part of the trial where the highest rate of fertiliser was applied, and GWR applied to locally varying linear response showed confounding between the estimated intercept and response. In contrast, the use of a larger bandwidth based on the width of strips in the trial resulted in smoother estimates of the intercept and response as well as more accurate estimates of response, with less confounding, despite a larger and more autocorrelated overall error. To test whether a more accurate estimation of response is an artefact of this particular simulation, we performed 500 simulations of minimum yield with locally varying linear response for bandwidths of 0.89, 1, 2.25, 4.5, 6, 8.5, and 10 pixels. The results show that AICc, overall error, and intercept error all increase with increasing bandwidth, but the response error decreases as the bandwidth increases to a minimum value achieved using bandwidth 4.5 before increasing again (Figure 9). This supports the theory presented in Section 2.3.2 with respect to determining the optimal bandwidth based on the experimental design for estimating yield response to treatment. There are also implications for the interpretation of results of GWR

Summary of Bandwidth Selection Results and Recommended Approach to Bandwidth Selection for OFE
For simulations of global linear response, linear response that varies across zones, and locally varying linear response, AICc-minimising bandwidths ranged from 0.87 to 0.88 pixels. Using AICc-minimising bandwidth, GWR on local linear response by zones showed an error in the part of the trial where the highest rate of fertiliser was applied, and GWR applied to locally varying linear response showed confounding between the estimated intercept and response. In contrast, the use of a larger bandwidth based on the width of strips in the trial resulted in smoother estimates of the intercept and response as well as more accurate estimates of response, with less confounding, despite a larger and more autocorrelated overall error. To test whether a more accurate estimation of response is an artefact of this particular simulation, we performed 500 simulations of minimum yield with locally varying linear response for bandwidths of 0.89, 1, 2. 25, 4.5, 6, 8.5, and 10 pixels. The results show that AICc, overall error, and intercept error all increase with increasing bandwidth, but the response error decreases as the bandwidth increases to a minimum value achieved using bandwidth 4.5 before increasing again (Figure 9). This supports the theory presented in Section 2.3.2 with respect to determining the optimal bandwidth based on the experimental design for estimating yield response to treatment. There are also implications for the interpretation of results of GWR analyses: reduced variation in the estimated intercept and response maps when using larger bandwidths should enable easier interpretation, and end users can have greater confidence in the ability of the GWR approach to separate treatment effects from other non-treatment influences on yield.
Agronomy 2020, 10, x FOR PEER REVIEW 15 of 25 analyses: reduced variation in the estimated intercept and response maps when using larger bandwidths should enable easier interpretation, and end users can have greater confidence in the ability of the GWR approach to separate treatment effects from other non-treatment influences on yield. Figure 9. Distribution of AICc and error components calculated for 500 simulations of locally varying linear response of yield to fertiliser application with varying bandwidths.

Model Selection
This section tests whether model selection correctly identifies the optimal model for global and locally varying response to fertiliser. We use a Gaussian kernel with both the AICc-minimising bandwidth and bandwidth of 4.5 pixels based on the width of experimental strips. Table 3 shows the results. Table 3. AICc for models of increasing order of complexity fitted to data with simulated global and locally varying linear response (minimum values of AICc identify optimal models; these are marked with asterisks, *).

Model
Global

Model Selection
This section tests whether model selection correctly identifies the optimal model for global and locally varying response to fertiliser. We use a Gaussian kernel with both the AICc-minimising bandwidth and bandwidth of 4.5 pixels based on the width of experimental strips. Table 3 shows the results. Table 3. AICc for models of increasing order of complexity fitted to data with simulated global and locally varying linear response (minimum values of AICc identify optimal models; these are marked with asterisks, *).

Model
Global Model selection performed on the global linear simulation correctly identified the optimal model as global linear when using the optimal bandwidth identified by minimising AICc; however, when a larger bandwidth based on strip width was used, model selection failed to identify the correct model (Table 3). Model selection performed on the simulation of locally varying linear response correctly identified the optimal model as local linear using both the optimal bandwidth identified by minimising AICc and when using a larger bandwidth based on strip width.

Summary of Model Selection Results and Recommended Approach to Model Selection for OFE
Stepwise model selection identified the correct model for global response only when using the optimal bandwidth identified by minimising AICc. For locally varying response, the correct model was identified using optimal and larger bandwidths. This suggests that to determine whether the assumption of locally varying response is valid, model selection should be performed using the AICc-minimising bandwidth; then, the selected model should be estimated using a larger bandwidth. To test whether this conclusion holds, we performed 500 simulations of minimum yield with both global and local response and calculated the percentage of times model selection identified the correct model ( Table 4). The results showed that use of the smaller AICc-minimising bandwidth correctly identified the global model 100% of the time and the local model 78% of the time. Use of the larger bandwidth led to local models being selected all of the time. Therefore, we recommend that model selection for OFE be performed as follows: 1.
Calculate the bandwidth that minimises AICc.

2.
Fit a mixed GWR that allows only the intercept to vary spatially.

3.
Fit a basic GWR that allows both the intercept and yield response to treatment to vary spatially. 4.
Select the model from 2 and 3 that has the lowest AICc.
We applied the Monte Carlo technique described in [52] to the same 500 simulations, but that method performed more poorly (Table 5). Table 4. Percentage of 500 simulations in which the correct model was identified by model selection.

Bandwidth Global Response Local Response
AICc-minimising 100 78 45 m <1 100 Table 5. Percentage of 500 simulations in which the correct model was identified by a Monte Carlo test for spatial variability.

Potassium Trials
Model selection for each of the potassium trials was performed using the recommended approach developed using the simulated data in Section 3.1.2. After identifying the optimal model, GWR bandwidths were determined using the optimal approach identified using simulated data in Section 3.1.1, such that the optimal bandwidth is given by (τ − 1)d/2 where τ is the number of fertiliser rates being trialed and d is the width of the plot trials.

Cunderdin Trial
Model selection for the Cunderdin trial identified the mixed GWR model as optimal, meaning that while there was a variation in yield that was not due to fertiliser, the barley yield response to potassium was homogenous across the trial. The optimal bandwidth was 12 m. Mixed GWR was applied using this bandwidth with a non-adaptive Gaussian kernel to predict the intercept (expected yield with no fertiliser applied) over a grid of 10 m by 10 m cells across the trial area. The intercept map, shown in Figure 10a, looks like a smoothed version of the cleaned yield data. This map is the yield expected if no MOP fertiliser treatment was applied. Therefore, the spatial variation visible in the intercept map is attributed to all non-treatment effects on yield, such as the amount of stored potassium and other nutrients in the soil, soil type, and constraints. Values range from 4000 to 6000 kg ha −1 . The global response (rate of change in yield due to applied MOP) was 1.7 kg ha −1 of barley yield per kg ha −1 of MOP applied. This response is extremely low and would not justify the use of MOP fertilisation: the expected yield with no fertiliser applied is 4605 kg ha −1 across the trial. The addition of 40 kg MOP ha −1 would result in a total yield increase of only 61 kg ha −1 , and 80 kg MOP ha −1 would result in an increase of only 121 kg ha −1 .
The model errors are shown in Figure 10b. Approximately 90% of the absolute errors are less than 500 kg ha −1 , and approximately 50% of the absolute errors are less than 250 kg ha −1 . The correlation between observed and predicted yields is 0.63.
Agronomy 2020, 10, x FOR PEER REVIEW 17 of 25 potassium was homogenous across the trial. The optimal bandwidth was 12 m. Mixed GWR was applied using this bandwidth with a non-adaptive Gaussian kernel to predict the intercept (expected yield with no fertiliser applied) over a grid of 10 m by 10 m cells across the trial area. The intercept map, shown in Figure 10a, looks like a smoothed version of the cleaned yield data. This map is the yield expected if no MOP fertiliser treatment was applied. Therefore, the spatial variation visible in the intercept map is attributed to all non-treatment effects on yield, such as the amount of stored potassium and other nutrients in the soil, soil type, and constraints. Values range from 4000 to 6000 kg ha −1 . The global response (rate of change in yield due to applied MOP) was 1.7 kg ha −1 of barley yield per kg ha −1 of MOP applied. This response is extremely low and would not justify the use of MOP fertilisation: the expected yield with no fertiliser applied is 4605 kg ha −1 across the trial. The addition of 40 kg MOP ha −1 would result in a total yield increase of only 61 kg ha −1 , and 80 kg MOP ha −1 would result in an increase of only 121 kg ha −1 .
The model errors are shown in Figure 10b. Approximately 90% of the absolute errors are less than 500 kg ha −1 , and approximately 50% of the absolute errors are less than 250 kg ha −1 . The correlation between observed and predicted yields is 0.63.

Narambeen Trial
Model selection for the Narambeen trial identified the basic GWR model as optimal, meaning that the yield response to treatment varies spatially across the trial. Each plot contained two swathes of yield data, so the plots are 30 m wide, and therefore, the optimal bandwidth is 45 m. Basic GWR was applied using this bandwidth with a non-adaptive Gaussian kernel to predict the intercept and response over a grid of 10 m by 10 m cells across the trial area. The intercept map shown in Figure  11a looks like a smoothed version of the cleaned yield data. This map is the yield expected if no MOP fertiliser treatment was applied. Therefore, the spatial variation visible in the intercept map is attributed to all non-treatment effects on wheat yield, such as soil nutrition, soil type, and management constraints. Values range from 500 to 4000 kg ha −1 , with three quarters of the paddock yielding less than 2000 kg ha −1 .
The response map, shown in Figure 11b, shows the rate at which applied MOP will affect yield (the slope of the local regressions). Across 95% of the trial area, MOP had a positive yield response. Local t-tests, shown in Figure 11d, showed that the effect of fertiliser on yield was significant (p < 0.05) over 75% of the trial area. However, the mean response was only 1.6 kg yield ha −1 per kg of MOP applied, and the maximum response is 5 kg ha −1 yield per kg of MOP applied. A recent analysis of potassium cost to wheat grain prices [69] showed that the that the break-even ratio for the fertilisation of barley with MOP was 7.4; thus, this analysis shows that MOP fertilisation is not profitable for this paddock. This begs the question: Is statistical significance as important as practical

Narambeen Trial
Model selection for the Narambeen trial identified the basic GWR model as optimal, meaning that the yield response to treatment varies spatially across the trial. Each plot contained two swathes of yield data, so the plots are 30 m wide, and therefore, the optimal bandwidth is 45 m. Basic GWR was applied using this bandwidth with a non-adaptive Gaussian kernel to predict the intercept and response over a grid of 10 m by 10 m cells across the trial area. The intercept map shown in Figure 11a looks like a smoothed version of the cleaned yield data. This map is the yield expected if no MOP fertiliser treatment was applied. Therefore, the spatial variation visible in the intercept map is attributed to all non-treatment effects on wheat yield, such as soil nutrition, soil type, and management constraints. Values range from 500 to 4000 kg ha −1 , with three quarters of the paddock yielding less than 2000 kg ha −1 .
or economic significance? Since farmers are ultimately interested in profit, statistical significance is not necessarily a useful piece of information for OFE.  The response map, shown in Figure 11b, shows the rate at which applied MOP will affect yield (the slope of the local regressions). Across 95% of the trial area, MOP had a positive yield response. Local t-tests, shown in Figure 11d, showed that the effect of fertiliser on yield was significant (p < 0.05) over 75% of the trial area. However, the mean response was only 1.6 kg yield ha −1 per kg of MOP applied, and the maximum response is 5 kg ha −1 yield per kg of MOP applied. A recent analysis of potassium cost to wheat grain prices [69] showed that the that the break-even ratio for the fertilisation of barley with MOP was 7.4; thus, this analysis shows that MOP fertilisation is not profitable for this paddock. This begs the question: Is statistical significance as important as practical or economic significance? Since farmers are ultimately interested in profit, statistical significance is not necessarily a useful piece of information for OFE.
GWR can be used to predict yield using different amounts of fertiliser by application of the regression equation; for example, Figure 11c shows the predicted change in yield if 134 kg ha −1 of MOP had been applied. The predicted average improvement in yield achieved across the trial by MOP fertiliser application was 106 kg ha −1 at a rate of 66 kg MOP ha −1 , 215 kg ha −1 at a rate of 134 kg MOP ha −1 , and 321 kg ha −1 at 200 kg MOP ha −1 .
Model errors ranged from -1519 to 1709 kg ha −1 with 90% of absolute errors less than 275 ha −1 . The correlation between observed and predicted yields is 0.90.

Discussion
A major benefit of OFE is that it provides substantially more information about performance at the field scale than conventional plot trials and thereby supports a direct interpretation of manageable effects. However, the additional complexity this information provides can make analysis challenging. Using simulation, this work has shown that GWR is a simple method for the analysis of OFE data that can accurately separate yield variation that is not due to the applied treatment from yield response due to treatment. A process of model selection can be used to determine whether yield response to treatment is constant across the trial or spatially varying. The GWR method is relatively easy to apply to large-scale on-farm experiments. It accepts a range of experimental designs that can be easily installed in commercial operations.
The method requires some care in application, particularly in bandwidth and model selection. In Section 3.1.1, we considered bandwidth selection for OFE and showed that the standard method that selects bandwidth by minimising AICc is sub-optimal for an analysis of OFE data, which are densely and regularly distributed in space. We tested a simple method for bandwidth selection based on trial design [48] and found that the larger bandwidths it prescribes are better able to separate treatment and non-treatment effects and provide more accurate estimates of yield response to treatment. The results presented here are similar to those obtained by Guo et al. [70], who concluded that AIC minimisation may not be optimal for bandwidth selection, and smaller bandwidths can overfit local models.
While conventional agronomic research seeks to represent underlying global truths about relationships between treatments and crop growth, we hypothesise that a spatially variable treatment-yield relationship will be appropriate in most cases, because factors controlling crop yield vary continuously in space and results obtained from any one location may not apply to new locations. Furthermore, decades of observation of crop yield variation from precision agriculture indicate a large spatial variation within and between management units. We test this hypothesis using model selection to compare basic GWR models that assume that yield response varies locally with mixed GWR models where response is assumed to be global across the trial. Our recommended approach for model selection did not give perfect results on simulated data: while it correctly identified simulations of global response with 100% accuracy, it only identified simulations of local response 78% of the time. We also tried a commonly used Monte Carlo technique [52], but that method performed more poorly.
When applied to two on-farm experiments that aimed to investigate yield response to MOP fertiliser, our recommended bandwidth and model selection approaches identified one trial as having global response and one as having local response. For each, the GWR intercept map estimated a variation in yield that was not due to treatment. The overall error for modelling each trial was low, and the R 2 was 0.63 for the Cunderdin trial and 0.9 for the Narambeen trial, which compares favourably with another published potassium-yield model derived from on-farm experiments, which had an R 2 of 0.48 [71]. Unfortunately, both of our potassium trials showed small yield responses to MOP fertiliser; and moreover, a response that was below the break-even ratio, indicating that fertilisation with MOP was not profitable at either location. This may because there was sufficient plant-available potassium in the soil, and it would be useful to compare the results of soil tests taken prior to experimentation with locally varying response in future trials. We intend to follow up this study by applying our recommended approach in situations where larger yield responses might be expected-for example, nitrogen trials in known deficient conditions. We note that while our applications are in broadacre cereal cropping, OFE can also be used for any horticultural or viticultural cropping system.
One concern is that while the estimation of basic GWR models is fast, the estimation of mixed GWR models and their AICc values can be slow for the large datasets acquired from OFE (approximately 40 min for our simulated data compared to less than one minute using basic GWR on a Dell Latitude laptop with 2.8 GHz CPU and 32 GB RAM). This has implications for the widespread practical application of GWR and its implementation in computer software. The importance of model selection is demonstrated in Section 3.1.2, and the use of an incorrect model could lead to spurious management decisions by the farmer. However, while the use of a mixed GWR model applied to simulated data with locally varying response gave poor results, basic GWR applied to simulated data with a global response gave realistic results that estimated the mean response equal to the global response used to simulate the data. This means that incorrectly applying a basic GWR model results in a smaller error than incorrectly applying a mixed GWR model. In practice, the likelihood of misinterpreting the results of simply applying basic GWR to OFE will be further reduced by several factors.
First, because on-farm experiments are conducted at the same scale as conventional management, the framework for evaluating results parallels the knowledge farmers use to make sense of complex information about the landscape they manage. Farmers and consultants draw information from a wide range of sources to reduce uncertainty [14]. While formal models can contribute much to support this decision-making under uncertainty, the end result will rely strongly on intuition that can be assisted by large-scale visualisations [72].
Second, the information that is extracted from OFE will vary depending on its purpose and context. While OFE may be used to support PA zoning, there are many other potential benefits from the learning around them, including the confirmation of suspected performance trends; the identification of specific treatment by soil interactions; or the use of results as performance indicators.
Thirdly is the question of practical significance of measured variation. A farmer, aided by an agronomic consultant [73], might conclude that the range of statistically significant response revealed by basic GWR was of interest but not large enough to require localised (PA) fertiliser management. That conclusion could be driven by practicality or economic considerations (as for the potassium trials we report) rather than statistical confidence, meaning that model selection for OFE could become part of the interpretation of results as well as part of the analysis. In that case, basic GWR could be quickly and easily applied to OFE in a semi-automated way to support decision-making. We use the term "semi-automated" because data analysis using GWR can be automated within a software platform, but interpretation of the results, assessment of the practical significance of results, and implications of the results for decision-making are all critical parts of the co-learning OFE process that cannot be automated. While the development of computer software for data analysis is possible and desirable, OFE is a larger process that requires systems for ongoing engagement, communication, and agronomic assistance in interpreting the results of analysis.
OFE is part of a learning process that is incremental in nature. Each experiment and analysis provides information about part of a complex managed system. Within this continued learning process, OFE results can lead to more questions being asked about the causes of spatial variability in yield Agronomy 2020, 10, 1720 21 of 25 response to treatment and spatial variability in yield that is not due to treatment. This in turn leads to more experiments being designed and conducted, in an ongoing cycle of knowledge development. For this, OFE requires easy-to-implement methods that can provide value from the process, even while more sophisticated methods emerge. In the early stages of this cycle, basic GWR analyses such as those described in this paper can provide sufficient learning to support this cycle as it progresses. We anticipate that as the experimental questions diversify, more complex methods of analysis can and should be sought.
By estimating local linear response curves, GWR can be used to predict the effects of different rates of fertiliser application. In this, it differs from geostatistical applications that interpolate yields from different treatments onto a common grid and then perform point-wise hypothesis testing [46,74]. However, fertiliser response is typically only linear in the part of the response curve before the rate of the response diminishes and a maximum yield is attained. The rate of fertiliser that maximises profits is lower that the yield-maximising rate, and it usually lies in the non-linear part of the response curve [75]. Thus, targeting just the linear part of the response curve omits consideration of the section of response that determines optimal management: that is, identification of the economic break-even ratio and profit-maximising rate. Local non-linear response curves can potentially be fitted to co-kriged yield maps from different treatments [44]. Based on the results of this study, we surmise that non-linearity can also be supported within a GWR framework by incorporating distance-based spatial weights in non-linear rather than linear regression. We are extending the GWR framework to this end and comparing GWR with other methods for analysis (linear, non-linear, and geostatistical) of data from large on-farm experiments in future work.
The use of additional data sources to attribute causes of variation is also required. These may include information about soil nutrition from soil samples taken prior to experimentation that can be used to tailor local response curves or other spatial data that can be used to explain non-fertiliser causes of spatial variability in yield. Ideally, the identification of suitable data sources for testing would become a part of the OFE co-learning framework.

Conclusions
OFE is an inexpensive means of acquiring knowledge to support on-farm decision-making and improve the uptake of digital agriculture. The process appeals to farmers, but an analysis of data from OFE can be challenging. GWR is a relatively simple method for testing for and modelling spatial variability in yield response that provides a pragmatic solution to this challenge. It has the advantage that it can be easily explained to farmers and their advisors.
Similar to all pragmatic solutions, the use of GWR requires judgment in the selection of appropriate approaches to application. This article used simulated data to show that GWR can accurately separate non-treatment and treatment effects and test approaches for bandwidth and model selection for GWR applied to OFE data, thereby developing more consistent methodologies for analysis that facilitate interpretation and subsequent farmer decision-making. It demonstrated the application of these methodologies on data from two potassium trials in Western Australia. Based on these results, we assert that GWR has the potential to be widely applied in a semi-automated manner to analyse OFE data, enhance adoption of digital technologies, and improve farm productivity and profitability.
Furthermore, rigorous statistical analysis of on-farm experiments using GWR as a base will enable agricultural researchers to use large on-farm trials to support the generation of agricultural knowledge and hypothesis testing at scale.