The methodological design follows the three research objectives outlined in the Introduction and integrates spatial econometrics, economic indicators, and sustainability risk assessment within a unified analytical structure.
2.5.3. Objective 1: Spatial Modelling of Rent Sensitivity
Before applying the geographically weighted regression (GWR) model, a global spatial autocorrelation analysis is conducted to examine whether farmland rental prices exhibit statistically significant spatial dependence across counties. Identifying spatial autocorrelation provides a useful diagnostic for assessing whether spatial processes may vary across locations and whether global modelling assumptions may be violated.
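Global Moran’s I is employed to quantify the overall degree of spatial autocorrelation in county-level farmland rental prices. The statistic is defined as
$$I = \frac{n}{S_0} \cdot \frac{\sum_{i=1}^{n}\sum_{j=1}^{n} w_{ij}\,(x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
where $n$ is the number of spatial units (counties), $x_i$ and $x_j$ denote farmland rent values at locations $i$ and $j$, $\bar{x}$ is the global mean of farmland rent, $w_{ij}$ represents the spatial weight between counties $i$ and $j$, and $S_0 = \sum_{i}\sum_{j} w_{ij}$ is the sum of all spatial weights.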
Moran’s I typically ranges between −1 and 1, where positive values indicate spatial clustering, negative values reflect spatial dispersion, and values near zero suggest the absence of global spatial structure [24]. To determine whether the observed statistic departs meaningfully from spatial randomness, a permutation-based p-value is computed. This p-value quantifies the probability of obtaining a value of I at least as extreme as the observed one under the null hypothesis of no spatial autocorrelation; small values (e.g., p < 0.05) provide evidence of significant spatial dependence, whereas large values imply that the spatial distribution is statistically indistinguishable from randomness. In addition, a corresponding Z-score is reported to indicate the degree to which the observed Moran’s I deviates from its expected value under the same null hypothesis.
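The global test described above can be illustrated with a minimal sketch. It assumes a dense weight matrix supplied as nested lists, a two-sided comparison around the expected value −1/(n − 1), and the usual (count + 1)/(permutations + 1) pseudo p-value; the function names and the four-county example are hypothetical and are not taken from the study.

```python
import random

def morans_i(x, w):
    """Global Moran's I for values x and an n-by-n spatial weight matrix w."""
    n = len(x)
    mean = sum(x) / n
    dev = [xi - mean for xi in x]
    s0 = sum(sum(row) for row in w)  # sum of all spatial weights
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / s0) * num / den

def permutation_p_value(x, w, n_perm=999, seed=0):
    """Observed I and a two-sided pseudo p-value from random relabelling."""
    rng = random.Random(seed)
    observed = morans_i(x, w)
    expected = -1.0 / (len(x) - 1)  # E[I] under spatial randomness
    shuffled = list(x)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # reassign rents to counties at random
        if abs(morans_i(shuffled, w) - expected) >= abs(observed - expected):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical example: four counties along a line, neighbours share a border.
rents = [10, 12, 30, 33]
adjacency = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(morans_i(rents, adjacency))  # approx. 0.393 (similar rents are adjacent)
```

With only four units the permutation distribution is very coarse, so the p-value here is illustrative rather than informative.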
To identify the local spatial structure of farmland rental prices, Local Indicators of Spatial Association (LISA) are further employed. LISA decomposes global spatial autocorrelation into location-specific statistics, allowing for the detection of local clusters and spatial outliers [25].
In this study, Local Moran’s I is calculated for each county to identify statistically significant High–High, Low–Low, High–Low, and Low–High spatial associations in farmland rental prices. High–High and Low–Low clusters indicate areas where counties with similar rent levels are surrounded by neighbours with comparable values, whereas High–Low and Low–High patterns represent spatial outliers. Statistical significance is assessed using a permutation-based approach.
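As an illustration of the local analysis, the sketch below computes Local Moran’s I for each unit, labels its quadrant, and attaches a conditional-permutation pseudo p-value. It assumes row-standardised weights and a two-sided comparison; the function name and example data are hypothetical, and the exact permutation scheme used in the study is not specified in the text.

```python
import random

def local_morans(x, w, n_perm=199, seed=0):
    """Return (local I, quadrant label, pseudo p-value) for every unit."""
    n = len(x)
    mean = sum(x) / n
    z = [xi - mean for xi in x]
    m2 = sum(zi * zi for zi in z) / n  # second moment of the deviations
    rng = random.Random(seed)
    results = []
    for i in range(n):
        others_idx = [j for j in range(n) if j != i]
        lag = sum(w[i][j] * z[j] for j in others_idx)  # weighted neighbour deviation
        local_i = z[i] * lag / m2
        quadrant = ("High" if z[i] > 0 else "Low") + "-" + ("High" if lag > 0 else "Low")
        # Conditional permutation: hold unit i fixed, shuffle all other values.
        others = [z[j] for j in others_idx]
        extreme = 0
        for _ in range(n_perm):
            rng.shuffle(others)
            perm_lag = sum(w[i][j] * zj for j, zj in zip(others_idx, others))
            if abs(z[i] * perm_lag / m2) >= abs(local_i):
                extreme += 1
        results.append((local_i, quadrant, (extreme + 1) / (n_perm + 1)))
    return results

# Hypothetical example: four counties along a line, row-standardised weights.
rents = [10, 12, 30, 33]
weights = [[0, 1, 0, 0], [0.5, 0, 0.5, 0], [0, 0.5, 0, 0.5], [0, 0, 1, 0]]
for local_i, quadrant, p in local_morans(rents, weights):
    print(round(local_i, 3), quadrant, p)
```

In practice a unit would be reported as a cluster or outlier only when its pseudo p-value falls below the chosen significance level.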
Geographically weighted regression (GWR) is a spatial modelling technique that enables regression coefficients to vary across locations, allowing relationships between the dependent and explanatory variables to change over space. In contrast to global regression models, which assume that a single set of parameters applies uniformly across the entire study area, GWR captures spatial heterogeneity by estimating location-specific coefficients [26]. The method employs a spatial weighting kernel in which observations closer to the focal location receive greater weight, while more distant observations contribute less, thereby producing a locally calibrated model.
This framework is particularly suited to analysing farmland rental prices, where localised conditions such as market accessibility, transportation networks, infrastructure, and regional economic activity can create substantial spatial variation [27]. Conventional regression approaches may overlook these geographically distinct influences, potentially leading to biased or overly generalised estimates. GWR mitigates this issue by fitting a separate local regression at each county, allowing the influence of macroeconomic and agricultural variables on rent to differ across regions. This provides a more refined understanding of spatial market dynamics and enhances the interpretability of rental price determinants for land use planning and policy analysis. The general specification of GWR is expressed as follows [28]:
$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{p} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$$
In this specification, $y_i$ is the dependent variable at location $i$, and the coefficient $\beta_k(u_i, v_i)$ for each predictor $x_{ik}$ is estimated at the specific spatial coordinates $(u_i, v_i)$, enabling the relationship between variables to vary across locations. The local intercept $\beta_0(u_i, v_i)$ reflects the expected value of the dependent variable at that location when all predictors equal zero. The disturbance term $\varepsilon_i$ is assumed to be normally distributed with constant variance, representing unexplained local variation in the model.
The spatial weighting matrix $W(u_i, v_i)$, with elements $w_{ij}$, represents the degree of influence that observation $j$ has on the estimation of local parameters at location $i$. This matrix governs how information from nearby locations is incorporated into the calibration of each coefficient $\beta_k(u_i, v_i)$, assigning greater weight to observations that are spatially closer and diminishing the influence of those farther away.
Weights are determined by a kernel function that quantifies spatial proximity between pairs of observations. One of the most widely applied kernels in GWR is the Gaussian function, which specifies the weight $w_{ij}$ between locations $i$ and $j$ as follows [28]:
$$w_{ij} = \exp\left(-\frac{d_{ij}^{2}}{2h^{2}}\right)$$
The spatial distance between observations, denoted as $d_{ij}$, is a key determinant of how much influence location $j$ exerts on the parameter estimates at location $i$ within the GWR framework. The degree of this influence is governed by the bandwidth parameter $h$, which defines the scale over which neighbouring observations contribute to the local regression. A smaller bandwidth assigns greater importance to nearby points and emphasises fine-grained spatial patterns, whereas a larger bandwidth incorporates information from a broader area, producing estimates that resemble a more global model.
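The local calibration can be sketched as a weighted least-squares fit repeated at every location. The example below assumes Euclidean distances between coordinates, a fixed bandwidth, and a Gaussian kernel of the form exp(−d²/(2h²)); the function names and synthetic data are hypothetical and do not reproduce the study's implementation.

```python
import numpy as np

def gaussian_weights(coords, i, h):
    """Gaussian kernel weights of all observations relative to location i."""
    d = np.linalg.norm(coords - coords[i], axis=1)  # Euclidean distances d_ij
    return np.exp(-0.5 * (d / h) ** 2)

def gwr_fit(coords, X, y, h):
    """Local coefficients (intercept first), one row per location."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])  # prepend the intercept column
    betas = np.empty((n, Xd.shape[1]))
    for i in range(n):
        w = gaussian_weights(coords, i, h)
        XtW = Xd.T * w  # X^T W_i, with W_i diagonal
        # Solve (X^T W_i X) beta = X^T W_i y for the local coefficients.
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas

# Hypothetical data: the slope on the single predictor rises from west to east.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1, size=(50, 2))
X = rng.normal(size=(50, 1))
y = 1.0 + (1.0 + 2.0 * coords[:, 0]) * X[:, 0] + rng.normal(scale=0.1, size=50)
betas = gwr_fit(coords, X, y, h=0.2)
print(betas[:3])  # local intercept and slope for the first three locations
```

Shrinking `h` lets the local slopes follow the west-to-east gradient more closely, at the cost of noisier estimates; a very large `h` drives every row of `betas` towards the global least-squares solution.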
Selecting an appropriate bandwidth is essential for balancing local detail with regional stability. In this study, the optimal bandwidth is determined by minimising the corrected Akaike Information Criterion ($\mathrm{AIC}_c$), a standard approach widely used for GWR model calibration [29]:
$$\mathrm{AIC}_c = -2\ln(L) + 2k + \frac{2k(k+1)}{n-k-1}$$
where $L$ is the maximum likelihood of the model, $k$ represents the number of estimated parameters, including the intercept and all local regression coefficients, and $n$ is the total number of observations used in the model. The first component, $-2\ln(L)$, reflects the goodness of fit, with smaller values indicating better agreement between the model and the observed data. The term $2k$ penalises model complexity by increasing the score as more parameters are added, thereby discouraging overfitting. The final adjustment term, $\frac{2k(k+1)}{n-k-1}$, further penalises models when the number of parameters approaches the sample size, ensuring more reliable inference in settings where local estimation reduces the effective degrees of freedom. AICc is adopted because it explicitly balances model goodness-of-fit and model complexity, making it well suited to geographically weighted regression, where increased flexibility may otherwise lead to overfitting. Compared with alternative criteria such as cross-validation, AICc provides a theoretically grounded and widely used approach for bandwidth optimisation in spatially structured datasets with multiple explanatory variables [30].
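A bandwidth search of this kind can be sketched as a grid search over candidate values. The example assumes a Gaussian likelihood, a Gaussian kernel, and that the parameter count k is taken as the trace of the GWR hat matrix (the effective number of parameters), which is a common convention but an assumption here; the candidate grid, function names, and data are hypothetical.

```python
import numpy as np

def aicc(log_lik, k, n):
    """Corrected AIC: -2 ln(L) + 2k + 2k(k + 1) / (n - k - 1)."""
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)

def gwr_aicc(coords, X, y, h):
    """AICc of a Gaussian-kernel GWR fitted with bandwidth h."""
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])
    S = np.empty((n, n))  # hat matrix: fitted values are S @ y
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)
        w = np.exp(-0.5 * (d / h) ** 2)
        XtW = Xd.T * w
        S[i] = Xd[i] @ np.linalg.solve(XtW @ Xd, XtW)
    resid = y - S @ y
    sigma2 = resid @ resid / n  # maximum-likelihood error variance
    log_lik = -0.5 * n * (np.log(2.0 * np.pi * sigma2) + 1.0)
    return aicc(log_lik, np.trace(S), n)  # assumption: k = tr(S)

def select_bandwidth(coords, X, y, candidates):
    """Return the candidate bandwidth with the smallest AICc and all scores."""
    scores = [gwr_aicc(coords, X, y, h) for h in candidates]
    return candidates[int(np.argmin(scores))], scores

# Hypothetical data with a spatially varying slope.
rng = np.random.default_rng(1)
coords = rng.uniform(0, 1, size=(60, 2))
X = rng.normal(size=(60, 1))
y = 1.0 + (1.0 + coords[:, 0]) * X[:, 0] + rng.normal(scale=0.1, size=60)
best_h, scores = select_bandwidth(coords, X, y, [0.3, 0.5, 1.0, 2.0])
print(best_h)
```

A coarse grid keeps the sketch short; a golden-section or similar one-dimensional search over the bandwidth would serve the same purpose with fewer model fits.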
Assessing uncertainty is a crucial component of geographically weighted regression (GWR), as it provides information on the robustness of the locally estimated coefficients. A commonly used measure of such uncertainty is the standard error (SE), which reflects the sampling variability of the parameter estimate at each location [31]. Examining SEs allows researchers to distinguish meaningful spatial patterns in coefficient estimates from variation that may simply arise from sampling noise.
The computation of local standard errors relies on the GWR hat matrix $S$, which represents the influence that each observation exerts on the local parameter estimates [32]. The standard error of a coefficient $\hat{\beta}_k$ at location $i$ is computed from $\hat{\sigma}^2$, the estimated variance of the residuals, and $S_{ii}$, the diagonal element of the hat matrix, representing the leverage of observation $i$.
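For orientation, one form of the local standard error that is commonly given in the GWR literature is set out below. It is offered as a reference sketch under standard assumptions and may differ in notation or detail from the expression used in this study. Here $X$ is the design matrix, $W_i$ the diagonal matrix of kernel weights for location $i$, $x_i^{\top}$ the $i$-th row of $X$, and $S$ the hat matrix whose $i$-th row is $x_i^{\top} C_i$.

$$C_i = \left(X^{\top} W_i X\right)^{-1} X^{\top} W_i, \qquad \operatorname{Var}\!\left[\hat{\boldsymbol{\beta}}(u_i, v_i)\right] = C_i C_i^{\top}\, \hat{\sigma}^2$$

$$\mathrm{SE}\!\left(\hat{\beta}_k(u_i, v_i)\right) = \sqrt{\left[C_i C_i^{\top}\right]_{kk}\, \hat{\sigma}^2}, \qquad \hat{\sigma}^2 = \frac{\sum_{j=1}^{n} \left(y_j - \hat{y}_j\right)^2}{n - 2\operatorname{tr}(S) + \operatorname{tr}\!\left(S^{\top} S\right)}$$

The denominator of $\hat{\sigma}^2$ plays the role of the residual degrees of freedom: as the bandwidth shrinks, $\operatorname{tr}(S)$ grows and the local standard errors widen accordingly.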
For the sensitivity analysis of mean farmland rent predictions, the local coefficients generated by the geographically weighted regression (GWR) model are utilised. Because GWR is a linear modelling framework, each coefficient directly reflects the marginal contribution of its corresponding variable to rental price variation. Examining these coefficients across geographic locations reveals how the influence of each factor changes spatially.
2.5.5. Evaluation Criteria
The coefficient of determination (COD) is widely used to assess the accuracy and explanatory power of predictive models by measuring how well the predicted outcomes align with the observed data [34]. It represents the fraction of total variation in the dependent variable that can be attributed to the model rather than random noise, thereby indicating how effectively the model captures key relationships in the dataset [35]. The formula for COD is as follows:
$$\mathrm{COD} = 1 - \frac{\sum_{i=1}^{n} \left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n} \left(y_i - \bar{y}\right)^2}$$
where $y_i$ is the observed value, $\bar{y}$ is the mean of the observed values, and $\hat{y}_i$ is the predicted value. A COD value approaching 1 indicates that the model accounts for most of the variability in the observed data, reflecting strong predictive capability. Conversely, values near zero imply that the model performs poorly and fails to explain the underlying variation in the target variable.
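The calculation can be checked by hand with a short sketch. It uses the standard definition of the coefficient of determination, one minus the ratio of the residual sum of squares to the total sum of squares; the function name and the rent figures are hypothetical.

```python
def coefficient_of_determination(y_true, y_pred):
    """COD = 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean) ** 2 for y in y_true)               # total variation
    return 1.0 - ss_res / ss_tot

# Hypothetical observed and predicted rents for four counties.
observed = [100, 150, 200, 250]
predicted = [110, 140, 210, 240]
print(coefficient_of_determination(observed, predicted))  # approx. 0.968
```

Predicting the observed mean for every county gives a COD of exactly zero, which is why values near zero signal a model with no explanatory power beyond the average.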
The importance of each explanatory variable is evaluated by examining the variability of its local regression coefficients, quantified through their standard deviation across all geographic locations [36]. The coefficient variability is calculated as
$$\sigma_{\beta_k} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{\beta}_k(u_i, v_i) - \bar{\beta}_k\right)^2}$$
where $\bar{\beta}_k$ is the mean of the local coefficients of variable $k$ across the $n$ counties. A larger value of $\sigma_{\beta_k}$ indicates stronger spatial heterogeneity, meaning the effect of that variable differs substantially from one county to another. Conversely, a smaller value suggests that the variable’s impact is relatively stable throughout the study area.
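A ranking of variables by coefficient variability can be sketched as follows. The example assumes the population form of the standard deviation (dividing by the number of locations); `statistics.stdev` would give the sample form instead. The variable names and coefficient values are hypothetical.

```python
from statistics import pstdev

def coefficient_variability(local_betas):
    """Rank variables by the standard deviation of their local coefficients.

    local_betas maps a variable name to its list of local coefficients,
    one value per county.
    """
    sd = {name: pstdev(values) for name, values in local_betas.items()}
    return sorted(sd.items(), key=lambda item: item[1], reverse=True)

# Hypothetical local coefficients for two variables across four counties.
local_betas = {
    "interest_rate": [0.1, 0.1, 0.1, 0.1],  # spatially stable effect
    "crop_price": [1.0, 3.0, 1.0, 3.0],     # spatially varying effect
}
print(coefficient_variability(local_betas))
```

Because the standard deviation is expressed in coefficient units, comparisons across variables are most meaningful when the predictors have been standardised beforehand.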
The confidence interval (CI) is used to quantify prediction uncertainty. To avoid data leakage, standard errors applied to the test set are estimated from the training set rather than recalculated on test observations. Using training-derived variance preserves out-of-sample independence and ensures an unbiased assessment of predictive uncertainty. The CI in this study is defined as follows [37]:
$$\mathrm{CI}_{95\%} = \hat{y}_i \pm 1.96 \times \mathrm{SE}$$
where $\hat{y}_i$ is the predicted value for observation $i$ and $\mathrm{SE}$ is the standard error estimated from the training set.
The quality of the confidence interval can be evaluated using the coverage rate, which measures the proportion of observed values falling within the 95% CI of their predictions. A well-calibrated interval should capture close to 95% of the samples, with an acceptable range of roughly 92–97% due to natural model and data variability.
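The interval construction and coverage check can be sketched in a few lines. The example assumes that the training-derived standard error is the residual standard error of the training fit, sqrt(RSS / (n − number of parameters)), and that a normal quantile of 1.96 is used; both are assumptions for illustration, as are the function names and data.

```python
import math

def training_se(train_true, train_pred, n_params):
    """Residual standard error estimated from training data only."""
    rss = sum((y - p) ** 2 for y, p in zip(train_true, train_pred))
    return math.sqrt(rss / (len(train_true) - n_params))

def confidence_intervals(y_pred, se, z=1.96):
    """Symmetric intervals: prediction +/- z * SE."""
    return [(p - z * se, p + z * se) for p in y_pred]

def coverage_rate(y_true, intervals):
    """Share of observed values that fall inside their interval."""
    inside = sum(lo <= y <= hi for y, (lo, hi) in zip(y_true, intervals))
    return inside / len(y_true)

# Hypothetical training and test values.
se = training_se([10, 20, 30, 40, 50], [12, 18, 33, 37, 50], n_params=2)
intervals = confidence_intervals([15, 25, 35, 45], se)
print(coverage_rate([14, 32, 36, 44], intervals))  # 0.75: three of four covered
```

The standard error is computed once from the training residuals and then reused unchanged on the test predictions, which is what keeps the coverage rate an out-of-sample quantity.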