LLM-Enabled Reconstruction of Farmer Fertilizer-Reduction Responses Under Policy Scenarios: Evidence from Sparse Stated-Preference Data

Liu, Shuaiwen; Zhang, Yichuan; Sun, Zhentao; Huang, Xiao; Yu, Chaoqing

doi:10.3390/agriculture16121266


by Shuaiwen Liu ^1,2, Yichuan Zhang ^1,2, Zhentao Sun ^3, Xiao Huang ^2,* and Chaoqing Yu ^2

^1 School of Computer Science and Technology, Hainan University, Haikou 570228, China

^2 Center for Eco-Environment Restoration Engineering of Hainan Province, School of Ecology, Hainan University, Haikou 570228, China

^3 Department of Energy and Power Engineering, Tsinghua University, Beijing 100084, China

^* Author to whom correspondence should be addressed.

Agriculture 2026, 16(12), 1266; https://doi.org/10.3390/agriculture16121266

Submission received: 26 April 2026 / Revised: 5 June 2026 / Accepted: 5 June 2026 / Published: 8 June 2026

(This article belongs to the Topic Applications of Artificial Intelligence Models and Spatiotemporal Data in Agriculture and the Ecological Environment)


Abstract

Agricultural fertilizer reduction depends on farmers’ responses to policy incentives, but such responses are often observed only at a few subsidy levels and under hypothetical conditions. Using survey-based stated-preference data from 15 counties in China, this study examines whether large language model (LLM)-based methods can reconstruct fertilizer-reduction response intervals under alternative subsidy scenarios. Three LLM-based inference strategies were designed and compared with 14 conventional methods within an exploratory evaluation framework covering interval recovery, extrapolation behavior, and curve-shape plausibility. LLM-based methods were competitive in this sparse-anchor reconstruction task. The incremental inference strategy, which reconstructs target intervals through local changes between subsidy anchors, produced the most stable results. DeepSeek V3.2 Increment obtained the highest IO (0.528) and a high EIO (0.602), while Qwen3-8B Increment achieved the lowest MAME (1.291) and the highest EIO (0.636). SHAP analysis showed that reconstruction difficulty was mainly associated with fertilizer bags per mu (0.2414), annual fertilizer cost (0.1808), and fertilization training (0.1473). Overall, this study explores the potential of LLM-based inference as a flexible approach for fertilizer-reduction policy-response analysis from limited stated-preference data.

Keywords: LLMs; fertilizer reduction; policy incentives; stated-preference data

1. Introduction

Agricultural nitrogen management is important for maintaining crop production while reducing environmental pressure [1]. Nitrogen fertilizer has greatly increased crop yields, but excessive and inefficient use has also caused soil acidification, water eutrophication, nitrate leaching, and greenhouse gas emissions [2]. To address these problems, China has introduced a series of policies to control fertilizer overuse and improve nutrient-use efficiency [3]. However, in a smallholder-based agricultural system, the effects of these policies depend not only on policy design, but also on how farmers respond to policy incentives [4]. Farmers differ in location, income, education, production conditions, fertilizer-use habits, and access to technical training [5]. These differences may lead to different responses to the same fertilizer-reduction policy [6]. Therefore, fertilizer-reduction policy analysis must account for heterogeneous farmer responses to different policy incentives [7].

Existing studies have examined many factors related to farmers’ fertilizer-reduction behavior [8]. These factors include household characteristics, farm size, production costs, technical training, environmental awareness, and institutional support [9,10]. This literature helps explain why fertilizer-reduction behavior differs across farmers and regions. However, policy design also requires forward-looking evidence. Policymakers need to know how farmers may change their fertilizer-reduction responses when subsidy levels or policy conditions change [11]. Policy design therefore requires not only explanations of current fertilizer-use behavior, but also estimates of possible response patterns under alternative subsidy scenarios. In practice, it is difficult to collect dense response data across many subsidy levels. Real policy experiments are also costly, slow, and affected by local implementation conditions [12]. As a result, there remains a gap between studies that explain existing fertilizer-use behavior and studies that estimate heterogeneous farmer responses before a policy is implemented.

Existing behavioral and policy-response modeling methods provide one way to address this gap [13]. Agent-based models (ABMs) have been widely used to represent human decision-making in coupled human–environment systems. They can describe heterogeneous agents, policy scenarios, and possible interactions among decision makers [14]. However, they often require predefined decision rules and behavioral assumptions, which are difficult to validate when empirical behavioral data are limited [15]. Traditional statistical models have clear parameters and can be useful when the model structure is well specified. Yet they often rely on strong assumptions about the data-generating process and may be less flexible in nonlinear and heterogeneous decision settings [16]. Modern machine-learning methods can improve prediction accuracy, but they usually require structured inputs, sufficient training samples, feature engineering, and careful calibration [17]. These methods are valuable, but they do not fully solve the task considered in this study: reconstructing interval-valued fertilizer-reduction responses from sparse subsidy anchors while incorporating heterogeneous farmer information.

This task involves more than mapping subsidy amount to response. A farmer’s stated fertilizer-reduction response may depend on crop type, fertilizer cost, labor constraints, agricultural income dependence, training experience, and attitude toward new technologies [18]. These factors do not act as isolated variables; their meaning may change across farming contexts. For example, the same subsidy level may imply different adjustment pressure for farmers with different fertilizer costs, labor availability, or training experience [19]. Natural-language representation provides a simple way to place farmer profiles, observed response anchors, and hypothetical policy scenarios in the same input format [20]. In this setting, large language models (LLMs) can estimate stated responses under new policy scenarios from textualized farmer information and sparse response anchors, without hand-coding detailed decision rules for each farmer type [21].

Recent studies on generative agents and LLM-based behavior simulation suggest that LLMs can produce context-sensitive responses in some social and decision-making tasks [22]. This has encouraged their use in agent-based modeling, human behavior simulation, and policy scenario analysis [23]. At the same time, LLM-generated behavior cannot be treated as automatically valid or realistic [24]. In agricultural policy analysis, it remains unclear whether LLM-based methods can reconstruct meaningful fertilizer-reduction response patterns from limited stated-preference data while reflecting farmer heterogeneity and sensitivity to policy incentives. Therefore, the present study focuses on whether LLM-based inference can recover observed response anchors, generate plausible response curves, and reveal variation in reconstruction difficulty across farmer groups.

Against this background, this study examines whether LLM-based inference can reconstruct heterogeneous farmer-stated fertilizer-reduction responses under hypothetical subsidy incentives. Here, farmer-response reconstruction refers to estimating farmer-stated fertilizer-reduction response intervals under hypothetical subsidy levels, based on sparse stated-preference anchors and farmer attributes. This study develops three LLM-based inference strategies and compares them with 14 conventional modeling methods. The study has three objectives: (i) to develop LLM-based methods for interval-valued response reconstruction under sparse subsidy anchors; (ii) to compare LLM-based and conventional methods in terms of anchor recovery, extrapolation behavior, and response-curve plausibility; and (iii) to examine whether reconstruction difficulty differs across farmer groups and to identify farmer-level factors associated with model performance.

2. Materials and Methods

The workflow, presented in Figure 1, comprises three interconnected modules: (i) Data Collection, (ii) Accuracy and Plausibility Evaluation, and (iii) Heterogeneity Analysis, in line with the study objectives.

The Data Collection module processes and cleans empirical data, including screening household-level attributes and policy scenario response data, and standardizes them into JavaScript Object Notation (JSON) format. The Accuracy and Plausibility Evaluation module adapts 14 baseline methods and three LLM methods and quantifies model performance. The best-performing method is then used for heterogeneity analysis to examine how farmer attributes are associated with reconstruction difficulty. Each of these modules is detailed in the following sections.

2.1. Data Collection

Because directly observing farmers’ adaptation to policy changes would require large-scale longitudinal surveys or real-world policy experiments, this study used self-reported intentions under hypothetical subsidy settings rather than observed responses to actual interventions. The survey was conducted from June to September 2025 in 10 major agricultural provinces in China. In each province, the research team selected one to three nearby counties, yielding a final sample of 15 counties in total (Table 1). Within the sampled counties, households from nearby villages who agreed to participate were randomly selected for interviews. Data were collected through a combination of online and offline methods: online questionnaires were administered in the sampled counties with the assistance of local investigators recruited by the research team, whereas face-to-face interviews were carried out in randomly chosen counties in Hainan Province.

The survey variables were selected to reflect the main factors commonly associated with farmers’ fertilization decisions, which can be broadly grouped into five dimensions: geographic context, household labor endowment, production characteristics, economic status, and education-related adoption behavior. Previous research suggests that fertilization practices are strongly influenced by policy incentives such as fertilizer subsidy reforms, as well as by labor constraints, production arrangements, and household economic conditions. At the same time, regional differences in climate, soil quality, and local governance contribute to considerable spatial heterogeneity in both fertilizer application and use efficiency. Educational attainment also plays an important role in shaping farmers’ propensity to adopt organic fertilizers or reduce chemical fertilizer inputs [25]. These insights guided the identification of the core variables and policy scenarios presented in Table 2.

The survey recorded location identifiers at multiple administrative levels, labor composition and economic factors, crop areas, fertilizer application (timing, type, price, quantity), and exposure to and engagement with fertilization training and government-promoted agricultural practices. Responses were screened for completeness: each retained record required accurate geographical identifiers, a full growing season of planting and fertilization data (per-hectare application, fertilizer type, prices), and at least one policy-response entry. Records affected by extreme events such as floods, typhoons, or severe pest outbreaks were retained so that the dataset reflects real-world heterogeneity rather than idealized conditions. Different crops grown by the same household were treated as independent data points, resulting in a final dataset of 328 behavioral responses across subsidy scenarios from 82 households. This expansion improves task coverage across crops and policy anchors, but it does not imply 328 fully independent households; accordingly, the results should be interpreted as evidence from a limited stated-preference sample rather than from a large representative panel.

After data screening and cleaning, each questionnaire record was converted into a structured JSON format on the basis of fixed mapping rules and predefined schemas, thereby ensuring compatibility with different testing methods. Figure S1 presents a representative example of the standardized JSON format. Because the input requirements varied across methods, these standardized records were further subjected to method-specific preprocessing, as detailed below. The resulting records were integrated into a unified dataset and randomly partitioned at the household level into training and validation sets using an 8:2 ratio (i.e., all crop records from the same household were assigned to the same set). To improve the robustness of the results, the random partitioning procedure was repeated 10 times.
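The household-level partitioning described above can be illustrated with a minimal sketch. It assumes each record carries a hypothetical `household_id` field and uses toy data; the actual schema and random seeds used in the study are not specified here.

```python
import random

def household_split(records, train_frac=0.8, seed=0):
    """Split farmer-crop records so that no household appears in both sets."""
    # `household_id` is a hypothetical field name used only for illustration.
    households = sorted({r["household_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(households)
    n_train = round(train_frac * len(households))
    train_ids = set(households[:n_train])
    train = [r for r in records if r["household_id"] in train_ids]
    valid = [r for r in records if r["household_id"] not in train_ids]
    return train, valid

# Toy data: 10 households with two crop records each (illustrative only).
records = [{"household_id": h, "crop": c}
           for h in range(10) for c in ("rice", "maize")]

# Ten repeated random partitions, one per seed.
splits = [household_split(records, seed=s) for s in range(10)]
```

Splitting on household identifiers rather than on individual records keeps all crop records of one household on the same side of the partition, which avoids leakage between training and validation sets.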

2.2. Reconstruction Settings and Modeling Strategies

To examine farmer-stated fertilizer-reduction responses under sparse subsidy anchors, this study grouped the evaluated methods into four categories according to input form and reconstruction logic. Anchor-based reconstruction methods used only the observed subsidy–response anchors of the same farmer. Pooled univariate modeling used subsidy amount as the only predictor and learned a shared response pattern across all samples. Tabular-feature methods used farmer attributes, fertilizer-use variables, and subsidy information in structured form. LLM-based methods used the same farmer attributes, fertilizer-use information, observed anchors, and target subsidy scenario, but represented them in textual form. This design kept the core information comparable between the LLM-based and tabular-feature settings while allowing their representation formats to differ.

The present task differs from standard stated-preference choice modeling in several respects. The target is an interval-valued fertilizer-reduction response rather than a discrete choice; the input contains sparse subsidy–response anchors rather than complete choice sets; and the objective is to reconstruct missing response intervals rather than estimate choice probabilities. For this reason, the conventional baselines were selected to represent the main modeling assumptions that may be relevant to sparse interval reconstruction of farmer policy responses.

Anchor-based reconstruction methods were included to test how much of a missing response interval could be recovered from the remaining anchors of the same farmer-crop record alone. Linear fitting, curve fitting, monotone piecewise linear interpolation, and per-sample isotonic regression were included as anchor-based references. Linear fitting and curve fitting represent simple parametric assumptions about the subsidy–response relationship. Monotone piecewise linear interpolation and per-sample isotonic regression represent local reconstruction rules that also incorporate the weak policy prior that higher subsidies should not decrease stated willingness to reduce fertilizer use.

Monotonic Gaussian Process was included as a pooled univariate reference to test whether a shared subsidy–response curve could be learned across all samples using subsidy amount alone. This setting ignores farmer-level covariates, but it provides a global one-dimensional reference for the sparse-anchor reconstruction task.

The tabular-feature methods were included as stronger conventional baselines because they use farmer-level covariates in addition to subsidy information. Generalized Additive Model (GAM) and Monotone GAM represent interpretable nonlinear statistical models. MLP, CatBoost, and XGBoost-based models represent commonly used machine-learning methods for nonlinear tabular prediction, with CatBoost and XGBoost serving as competitive tree-based references for structured data. Monotone variants were added to examine whether the weak policy prior of non-decreasing subsidy responses improved reconstruction. Conformal CatBoost was included because the target output is interval-valued, and Household Random Effects was included to account for repeated crop-level records from the same household. Together, these methods provide a task-specific comparison across anchor-only reconstruction, pooled subsidy–response modeling, covariate-based tabular prediction, monotonicity-constrained reconstruction, interval-oriented prediction, and grouped-data modeling. This design helps evaluate whether LLM-based textual reconstruction offers advantages over conventional structured approaches in the present sparse stated-response reconstruction setting. A summary of the evaluated modeling approaches is provided in Table 3.

Linear fitting, curve fitting, Monotone Piecewise Linear, and Per-Sample Isotonic were implemented as anchor-based reconstruction methods. In these methods, the held-out target anchor was reconstructed only from the remaining observed anchors of the same farmer. No cross-sample covariates were introduced and subsidy amount was the only explicit predictor, so data preprocessing was limited. Linear fitting and curve fitting estimated the interval bounds from simple functions of subsidy amount. Monotone Piecewise Linear and Per-Sample Isotonic further imposed monotonic structure through piecewise interpolation or isotonic regression. The corresponding input format for this anchor-only setting is illustrated in Figure S1.
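As an illustration of the anchor-only setting, the following sketch reconstructs one interval bound by monotone piecewise linear interpolation. Two details are assumptions made for this example rather than the study's documented implementation: monotonicity is enforced with a running maximum over the observed anchors, and targets outside the observed range are clamped to the nearest anchor value. The subsidy levels and values are hypothetical.

```python
from bisect import bisect_left

def monotone_piecewise_linear(anchors, target):
    """Estimate one interval bound at `target` from (subsidy, value) anchors."""
    pts = sorted(anchors)
    xs = [x for x, _ in pts]
    ys = []
    running = float("-inf")
    for _, y in pts:
        running = max(running, y)  # assumption: running maximum enforces non-decreasing values
        ys.append(running)
    if target <= xs[0]:
        return ys[0]               # assumption: clamp below the observed range
    if target >= xs[-1]:
        return ys[-1]              # assumption: clamp above the observed range
    j = bisect_left(xs, target)    # first anchor with subsidy >= target
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    return y0 + (target - x0) / (x1 - x0) * (y1 - y0)

# Hypothetical lower-bound anchors at 100, 200 and 400 RMB; 300 RMB is held out.
lower_anchors = [(100, 5.0), (200, 10.0), (400, 20.0)]
estimate = monotone_piecewise_linear(lower_anchors, 300)
```

Applying the same function to the upper-bound anchors would yield the second bound of the reconstructed interval.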

Monotonic Gaussian Process was used as a pooled univariate modeling setting. In this method, all training samples were pooled. However, the predictor space still contained only subsidy amount. A shared smooth function from subsidy amount to response midpoint was learned from the full training set. The target interval width was then recovered from interpolation of the sample’s observed anchor widths. Farmer-specific background variables were not used in this setting. This method therefore represents a global one-dimensional modeling strategy rather than a within-sample reconstruction rule. The corresponding input format for this pooled univariate setting is shown in Figure S2.

Isotonic Residual Projection, GAM, Monotone GAM, Multilayer perceptron (MLP), CatBoost, CatBoost Monotone, XGBoost Monotone, Conformal CatBoost, and Household Random Effects were implemented as tabular-feature modeling methods. In these methods, subsidy amount was combined with farmer background variables, crop information, and fertilizer-use characteristics. The resulting inputs were encoded into structured tabular representations. Method-specific preprocessing was then applied, such as numerical standardization, one-hot encoding, or native categorical handling. Household Random Effects further introduced farmer ID as a grouping variable, so that household-level random effects could be modeled in addition to fixed covariate effects. The corresponding input format for the tabular and household-grouped settings is presented in Figure S3.

Unlike tabular models, LLM-based methods can operate directly on textualized farmer profiles and anchor descriptions, thus reducing the need for extensive manual feature engineering [20,38]. The textual inputs were constructed from the same core information used by the tabular-feature models, including farmer attributes, fertilizer-use variables, observed subsidy-response anchors, and the target subsidy scenario. Thus, the difference between the two settings mainly lies in how the information was represented: tabular models used encoded structured variables, whereas LLM-based methods used natural-language descriptions of the same information.

In this study, two base LLMs were used: DeepSeek V3.2 [39] and Qwen3-8B [40]. DeepSeek V3.2 served as a larger-parameter model accessed through an application programming interface (API), whereas Qwen3-8B served as a smaller local model. Both models were examined under direct inference and incremental inference settings, allowing different forms of textual response reconstruction to be compared within the same sparse-anchor task. Low-Rank Adaptation (LoRA)-based fine-tuning was conducted only on Qwen3-8B, because fine-tuning DeepSeek V3.2 would require prohibitively expensive hardware and an extremely long training time. The purpose of this comparison was not to rank all available LLMs or to benchmark models exhaustively, but to examine whether different LLM-based inference strategies could support the same sparse-anchor reconstruction task under two practical deployment settings.

In the LLM direct inference setting, farmer background information was converted into a series of textual entries. The observed anchor points, excluding the target anchor, were rewritten as natural-language descriptions of the farmer’s stated fertilizer-reduction interval under specific subsidy levels. The target subsidy amount was then provided explicitly, and the model was asked to generate the predicted fertilizer-reduction interval directly. A representative prompt example is shown in Figure S4.
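The textualization step can be sketched as follows. The field names, wording, and subsidy levels are hypothetical and chosen only for illustration; the prompt actually used in the study is the one shown in Figure S4.

```python
def build_direct_prompt(profile, anchors, target_subsidy):
    """Turn a farmer profile and observed anchors into a plain-text prompt."""
    lines = ["Farmer profile:"]
    lines += [f"- {key}: {value}" for key, value in profile.items()]
    lines.append("Observed stated responses:")
    for subsidy, (low, high) in sorted(anchors.items()):
        lines.append(
            f"- Under a subsidy of {subsidy} RMB, the farmer stated a "
            f"fertilizer reduction between {low} and {high}."
        )
    lines.append(
        f"Question: under a subsidy of {target_subsidy} RMB, what "
        "fertilizer-reduction interval would this farmer state? "
        "Answer as 'min, max'."
    )
    return "\n".join(lines)

# Hypothetical profile fields and anchor values; the 300 RMB anchor is held out.
profile = {"crop": "rice", "fertilization training": "once a year"}
observed = {100: (5, 10), 200: (10, 15), 400: (20, 30)}
prompt = build_direct_prompt(profile, observed, 300)
```

The held-out anchor never appears among the observed responses, so the model sees only the remaining anchors and the target subsidy amount.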

The LoRA-fine-tuned LLM used the same textual input format as direct inference. The base model, Qwen3-8B, was further adapted on the training set using LoRA, a parameter-efficient fine-tuning method that injects trainable low-rank updates into frozen pretrained weights [41]. In this setting, textualized farmer profiles and their corresponding ground-truth fertilizer-reduction intervals were used as supervision for parameter-efficient fine-tuning. The implementation was carried out with the LLaMA-Factory framework, which was designed as a unified efficient fine-tuning platform for large language models [42]. The training hyperparameters, decoding parameters, and runtime environment for Qwen3-8B are detailed in Table A1; the loss function configuration is illustrated in Figure A1, and the full prompt template is provided in Figure S5.

In addition to the baseline methods above, an LLM-based incremental inference method was proposed for target interval reconstruction under sparse anchor observations. In this setting, the model was not asked to generate the target interval directly. Instead, only the adjacent known anchor information around the target subsidy level was provided. The subsidy range between anchors was discretized into 10-RMB steps, and the LLM was instructed to assign a raw relative incremental weight $w_{k}$ to each step $k$ rather than output the final interval. The raw weights were then normalized as $\alpha_{k} = \max(0, w_{k}) / \sum_{j=1}^{K} \max(0, w_{j})$, where $K$ is the number of 10-RMB steps in the local segment. If the sum of non-negative weights was zero, uniform weights $\alpha_{k} = 1/K$ were used. For each interval bound $c \in \{\min, \max\}$, let $y_{L}^{(c)}$ and $y_{R}^{(c)}$ denote the left and right anchor values. The total increment was defined as $\Delta^{(c)} = y_{R}^{(c)} - y_{L}^{(c)}$, and the increment allocated to step $k$ was $\delta_{k}^{(c)} = \alpha_{k} \Delta^{(c)}$. The reconstructed value after $m$ steps was therefore $\hat{y}_{m}^{(c)} = y_{L}^{(c)} + \sum_{k=1}^{m} \delta_{k}^{(c)}$. Because the incremental method works within a bounded local segment, it was not used for extrapolation directly. For extrapolation scenarios, this study adopted a linear model to estimate the missing outer anchors. The incremental method was then applied between the resulting anchors. In this way, the target response interval was reconstructed from local incremental transitions rather than generated in a single step. A representative prompt example is shown in Figure S6.
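The weight normalization and increment allocation can be checked with a short worked sketch. The raw weights below stand in for LLM output and are invented for illustration, as are the anchor values; only the arithmetic follows the definitions given above.

```python
def normalize_weights(raw_weights):
    """Clip negative weights to zero and rescale so the weights sum to one."""
    clipped = [max(0.0, w) for w in raw_weights]
    total = sum(clipped)
    k = len(clipped)
    if total == 0:
        return [1.0 / k] * k          # uniform fallback when no positive weight exists
    return [w / total for w in clipped]

def reconstruct_bound(y_left, y_right, raw_weights, m):
    """Value of one interval bound after m steps from the left anchor."""
    alphas = normalize_weights(raw_weights)
    delta = y_right - y_left          # total increment between the two anchors
    return y_left + sum(a * delta for a in alphas[:m])

# Hypothetical segment with K = 4 steps and invented raw weights.
raw = [1, 3, -2, 4]                   # the negative weight is clipped to zero
after_two_steps = reconstruct_bound(10.0, 18.0, raw, m=2)
at_right_anchor = reconstruct_bound(10.0, 18.0, raw, m=4)
```

Because the normalized weights sum to one, the reconstruction reaches the right anchor exactly after all $K$ steps, and clipping negative weights keeps each bound moving in a single direction within the segment.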

2.3. Evaluation Framework for Accuracy and Rationality

To characterize how different methods reconstruct missing response information under sparse subsidy anchors, model behavior was examined under a leave-one-anchor-out protocol [43]. Specifically, the first five groups of variables in Table 2, together with three of the four anchor points contained in the sixth group, were provided as model inputs, and the remaining anchor point was treated as the reconstruction target. This procedure was repeated four times so that each anchor point was held out once. In this way, each method was examined for its ability to recover one unobserved anchor from the other three observed anchors within the same stated-preference record.

Two types of recovery behavior were distinguished. A prediction was classified as interpolation when the target anchor lay between two provided anchors, and as boundary-anchor recovery (referred to as extrapolation hereafter) when the held-out anchor was located at the lower or upper boundary of the surveyed subsidy levels. This distinction was retained throughout the evaluation so that performance could be reported separately for easier within-range inference and more challenging out-of-range inference.
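A minimal sketch of the leave-one-anchor-out protocol, including the split into within-range and boundary cases, is given below. The subsidy levels and interval values are hypothetical; the surveyed levels are those listed in Table 2.

```python
def leave_one_anchor_out(anchors):
    """Build one reconstruction task per held-out anchor.

    anchors: dict mapping subsidy level -> (lower, upper) stated interval.
    """
    levels = sorted(anchors)
    tasks = []
    for held_out in levels:
        observed = {s: anchors[s] for s in levels if s != held_out}
        # Boundary anchors have no observed anchor on one side.
        is_boundary = held_out in (levels[0], levels[-1])
        tasks.append({
            "target_subsidy": held_out,
            "target_interval": anchors[held_out],
            "observed": observed,
            "kind": "extrapolation" if is_boundary else "interpolation",
        })
    return tasks

# Hypothetical record with four anchors.
record = {100: (5, 10), 200: (10, 15), 300: (15, 20), 400: (20, 30)}
tasks = leave_one_anchor_out(record)
```

With four anchors per record, two tasks fall in each category, so both types of recovery are represented for every record.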

To characterize the agreement between predicted and reference fertilizer-reduction intervals, the interval overlap (IO) metric was adopted [44]. Let the predicted interval for sample $i$ be denoted by $\hat{I}_{i} = [\hat{l}_{i}, \hat{u}_{i}]$, and let the survey-derived reference interval be denoted by $I_{i} = [l_{i}, u_{i}]$. For any interval $A$, $|A|$ denotes its length. The IO for sample $i$ was defined as:

$$\mathrm{IO}_{i} = \frac{|\hat{I}_{i} \cap I_{i}|}{|\hat{I}_{i} \cup I_{i}|} \tag{1}$$

When the two intervals did not overlap, $|\hat{I}_{i} \cap I_{i}|$ was set to 0. This metric is equivalent to the intersection-over-union ratio of two one-dimensional intervals, with larger values indicating better agreement between the predicted and reference ranges. For summary reporting, IO was averaged across all samples, and separate averages were also computed for interpolation and extrapolation cases.
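The IO definition can be expressed in a few lines. In this sketch the union length is taken as the sum of the two interval lengths minus their intersection, and a pair of zero-width intervals is scored as 0; both conventions are assumptions made for the example, since the handling of these edge cases is not stated above.

```python
def interval_overlap(pred, ref):
    """Intersection-over-union of two one-dimensional intervals (lower, upper)."""
    (pl, pu), (rl, ru) = pred, ref
    inter = max(0.0, min(pu, ru) - max(pl, rl))   # zero when the intervals are disjoint
    union = (pu - pl) + (ru - rl) - inter         # assumption: union as summed lengths minus overlap
    return inter / union if union > 0 else 0.0    # assumption: degenerate pairs score 0

partial = interval_overlap((10, 20), (15, 25))    # overlap 5, union 15
disjoint = interval_overlap((0, 5), (10, 20))
identical = interval_overlap((10, 20), (10, 20))
```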

However, IO alone may be inflated by overly wide prediction intervals. To complement overlap-based characterization, two additional metrics were introduced to characterize interval sharpness and midpoint fidelity [45], namely mean interval width (MIW) and mean absolute midpoint error (MAME). MIW was used to quantify the average width of the predicted intervals, thereby reflecting whether the model tended to produce excessively broad and weakly informative predictions. MAME was used to measure the absolute deviation between the midpoints of the predicted and reference intervals, thereby reflecting whether the predicted interval was centered appropriately around the target response.

The two metrics were defined as follows:

$$\mathrm{MIW} = \frac{1}{N} \sum_{i=1}^{N} (\hat{u}_{i} - \hat{l}_{i}) \tag{2}$$

$$\mathrm{MAME} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{\hat{u}_{i} + \hat{l}_{i}}{2} - \frac{u_{i} + l_{i}}{2} \right| \tag{3}$$

Smaller values of MIW indicate sharper interval predictions, whereas smaller values of MAME indicate better localization of the predicted interval around the reference midpoint.
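Both metrics reduce to simple averages over samples, as the following sketch with invented intervals shows.

```python
def mean_interval_width(preds):
    """MIW: average width of the predicted intervals (lower, upper)."""
    return sum(u - l for l, u in preds) / len(preds)

def mean_abs_midpoint_error(preds, refs):
    """MAME: average absolute gap between predicted and reference midpoints."""
    errors = [abs((pu + pl) / 2 - (ru + rl) / 2)
              for (pl, pu), (rl, ru) in zip(preds, refs)]
    return sum(errors) / len(errors)

# Invented predicted and reference intervals for two samples.
preds = [(10, 20), (0, 10)]
refs = [(12, 18), (5, 15)]
miw = mean_interval_width(preds)             # (10 + 10) / 2
mame = mean_abs_midpoint_error(preds, refs)  # (|15 - 15| + |5 - 10|) / 2
```

In the first sample the predicted interval is wider than the reference yet centered on the same midpoint, so it adds to MIW without adding to MAME; this is why the two metrics are reported together with IO.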

Curve-shape plausibility was examined by requiring each model to reconstruct a dense subsidy–response curve from 0 to 800 RMB with a step size of 10 RMB, using the farmer information and the four anchor points listed in Table 2 as inputs. Because the plausibility assessment focused on the overall shape of the generalized response curve, the midpoint of each predicted interval was used to represent the model-implied response at each subsidy level.

From a policy-scenario perspective, higher subsidy levels are generally expected to increase, or at least not reduce, farmers’ stated willingness to reduce fertilizer use [46]. However, this expectation should be treated as a weak policy prior rather than as a strict behavioral law [47]. A non-monotonic curve should therefore not be interpreted automatically as invalid. Farmers may show thresholds, plateaus, or local reversals because of crop differences, risk perception, labor constraints, or adjustment costs. In this study, monotonic consistency is interpreted only as a sign of stronger alignment with the assumed policy-response direction. Accordingly, monotonicity violation degree (MVD) and flat-segment ratio (FSR) were used as auxiliary plausibility indicators, not as direct measures of behavioral realism.

Let $m_{t}$ denote the midpoint prediction at subsidy level $s_{t}$, where $t = 0, 1, \dots, T$, and let $T$ denote the total number of adjacent intervals along the generated curve. The MVD was defined as

$$\mathrm{MVD} = \frac{\sum_{t=1}^{T} \max(0, m_{t-1} - m_{t})}{\sum_{t=1}^{T} |m_{t} - m_{t-1}| + \epsilon} \tag{4}$$

where $\epsilon$ is a very small positive constant introduced to avoid division by zero. This metric measures the proportion of downward movement relative to the total amount of variation in the generated curve. A smaller MVD indicates closer alignment with the non-decreasing response pattern implied by this monotonic policy prior.
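A short sketch of the MVD computation follows. The value of $\epsilon$ is an assumption made for the example, since the constant used in the study is not reported above, and the midpoint curve is invented.

```python
def monotonicity_violation_degree(midpoints, eps=1e-9):
    """Share of downward movement in the total variation of a midpoint curve."""
    # eps is an assumed value; the study only states that it is very small.
    diffs = [b - a for a, b in zip(midpoints, midpoints[1:])]
    downward = sum(max(0.0, -d) for d in diffs)
    total = sum(abs(d) for d in diffs)
    return downward / (total + eps)

# Invented curve with one local reversal: steps of +5, -1, +6.
mvd = monotonicity_violation_degree([0, 5, 4, 10])      # about 1 / 12
mvd_monotone = monotonicity_violation_degree([0, 2, 2, 7])
```

A fully non-decreasing curve gives an MVD of 0, and a fully decreasing curve gives a value close to 1.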

To further identify curves that remained nearly unchanged over long subsidy intervals, the FSR was defined as

$$\mathrm{FSR} = \frac{1}{T} \sum_{t=1}^{T} \mathbb{1}(|m_{t} - m_{t-1}| < \tau) \tag{5}$$

where $\mathbb{1}(\cdot)$ is the indicator function and $\tau$ is a small threshold for identifying near-flat changes. A larger FSR indicates that the generated curve contains a greater proportion of nearly constant segments, suggesting limited sensitivity to changes in subsidy amount.
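The FSR can be sketched in the same way. The threshold $\tau$ below is an assumed value chosen for the example, as the threshold used in the study is not given above, and the curve is invented.

```python
def flat_segment_ratio(midpoints, tau=1e-3):
    """Fraction of adjacent steps whose absolute change is below tau."""
    # tau is an assumed threshold; the study only describes it as small.
    diffs = [abs(b - a) for a, b in zip(midpoints, midpoints[1:])]
    return sum(d < tau for d in diffs) / len(diffs)

# Invented curve with absolute step changes 0, 5, 0, 0, 3.
fsr = flat_segment_ratio([0, 0, 5, 5, 5, 8])
```

Three of the five steps are flat in this example, giving an FSR of 0.6. A step-like curve can therefore have a high FSR while still satisfying the monotonic prior, which is why the two indicators are read together.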

Taken together, these two indicators were used to describe whether the generalized subsidy–response curves remained numerically coherent and broadly consistent with the monotonic policy prior adopted in this study. Because this prior is a modeling choice, the MVD and FSR results should be interpreted together with the response-recovery metrics rather than as standalone evidence of behavioral validity.

2.4. Heterogeneity and Interpretability Analysis

To examine whether model accuracy varied across farmer subgroups, heterogeneity analysis was conducted using the best-performing model from the accuracy and plausibility evaluation. For each farmer-crop sample, reconstruction difficulty was measured by the mean absolute midpoint error across the four leave-one-anchor-out tasks. A larger value indicates greater reconstruction difficulty for that sample.

The sample-level difficulty for sample $i$ was defined as

$$D_{i} = \frac{1}{R_{i}} \sum_{r=1}^{R_{i}} \left( \frac{1}{K} \sum_{k=1}^{K} |\hat{m}_{irk} - m_{irk}| \right) \tag{6}$$

where $R_{i}$ is the number of repeated runs for sample $i$, $K$ is the number of anchor points, $\hat{m}_{irk}$ is the predicted midpoint at anchor $k$ in run $r$, and $m_{irk}$ is the corresponding reference midpoint. The reference midpoint was calculated as

$$m_{irk} = \frac{l_{irk} + u_{irk}}{2} \tag{7}$$

where $l_{irk}$ and $u_{irk}$ are the lower and upper bounds of the observed interval, respectively. A larger $D_{i}$ indicates greater reconstruction difficulty.

After computing $D_{i}$, samples were grouped by crop type, fertilization training frequency, and attitude toward new technology. For each subgroup, the mean difficulty and its 95% confidence interval (CI) were calculated. These subgroup means were then compared with the overall sample mean. The resulting subgroup comparisons were visualized as a forest plot in the Results section. This analysis was used to identify whether some farmer groups were systematically easier or harder to predict than others.
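The difficulty score and the subgroup summary can be sketched as follows. The confidence interval here uses a normal approximation (mean plus or minus 1.96 standard errors), which is an assumption made for the example because the CI procedure is not specified above. All numbers are invented.

```python
from math import sqrt
from statistics import mean, stdev

def sample_difficulty(runs):
    """Average absolute midpoint error over anchors, then over repeated runs.

    runs: list of runs, each a list of (predicted_midpoint, reference_midpoint).
    """
    per_run = [mean([abs(p - m) for p, m in run]) for run in runs]
    return mean(per_run)

def subgroup_summary(difficulties, z=1.96):
    """Mean difficulty with an approximate 95% CI (normal approximation, assumed)."""
    mu = mean(difficulties)
    if len(difficulties) < 2:
        return mu, (mu, mu)
    half = z * stdev(difficulties) / sqrt(len(difficulties))
    return mu, (mu - half, mu + half)

# Invented midpoints for one sample: two runs, four anchors each.
runs = [
    [(10, 12), (20, 20), (30, 27), (40, 41)],   # absolute errors 2, 0, 3, 1
    [(11, 12), (20, 20), (28, 27), (40, 41)],   # absolute errors 1, 0, 1, 1
]
d_i = sample_difficulty(runs)
group_mean, group_ci = subgroup_summary([1.0, 2.0, 3.0])
```

For small subgroups a t-based or bootstrap interval would typically be wider than this normal approximation.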

To examine which farmer-level variables were associated with reconstruction difficulty, this study used a surrogate error-analysis approach. Directly applying Shapley Additive Explanations (SHAP) to the LLM was not suitable for this purpose, because the LLM used textual prompts as inputs and generated response intervals as outputs. Such an analysis would mainly explain the influence of prompt tokens on a generated answer, rather than the relationship between farmer attributes and reconstruction error [48,49]. Therefore, we first calculated the sample-level reconstruction difficulty from the outputs of the best-performing LLM method. This difficulty value was then treated as a structured target variable. A random forest model was trained to relate farmer-level variables to this target, and TreeSHAP was applied to this surrogate model to identify the variables most associated with reconstruction difficulty [50,51,52]. The SHAP value for the i-th input, ϕ_{i}, attributes changes in the surrogate model prediction to that input and is calculated using the classic Shapley value formulation [53]:

ϕ_{i} = \sum_{S \subseteq M \setminus \{i\}} \frac{|S|! \, (|M| - |S| - 1)!}{|M|!} \left[ f_{x}(S \cup \{i\}) - f_{x}(S) \right]

(8)

where M is the complete set of inputs, and S represents any subset not including i. The value function f_{x}(S) is defined as the conditional expectation of the model’s output given the feature subset S, denoted by E[f(x) | x_{S}]. A SHAP value of zero means that the input has no contribution to the predicted reconstruction difficulty; a positive value indicates that the input increases the predicted difficulty relative to the baseline, whereas a negative value indicates that it decreases the predicted difficulty.
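To make Equation (8) concrete, the sketch below evaluates it by brute-force enumeration for a small toy model. It is an illustration only: the toy function, the input values, and the background means are hypothetical, and the value function assumes independent inputs so that the conditional expectation can be obtained by replacing absent inputs with their means (exact for this multilinear toy function). The study itself applied TreeSHAP to a random forest surrogate; enumeration is used here only because it mirrors the formula term by term, and its cost grows exponentially with the number of inputs.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating every subset S of M without i."""
    players = list(range(n_features))
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Weight |S|! (|M| - |S| - 1)! / |M|! from Equation (8).
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


def toy_model(z):
    """Hypothetical stand-in for a surrogate model's prediction."""
    return 2.0 * z[0] + z[1] * z[2]


x = [3.0, 2.0, 4.0]           # sample being explained (illustrative)
background = [1.0, 1.0, 2.0]  # assumed input means


def value_fn(subset):
    # Inputs in the subset keep their observed value; the rest take the mean.
    z = [x[j] if j in subset else background[j] for j in range(len(x))]
    return toy_model(z)


phi = shapley_values(value_fn, len(x))
baseline = toy_model(background)
```

For this toy case the attributions are 4, 3, and 3, and they sum to the difference between the prediction for x (14) and the baseline prediction (4), which is the additivity property that makes positive and negative SHAP values interpretable relative to the baseline.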

3. Results

This section presents the empirical and modeling results in three steps. Section 3.1 first summarizes the main patterns observed in the survey data, including the distribution of responses across subsidy levels and the variation in intended fertilizer-reduction behavior among farmers. Section 3.2 then compares the performance of the evaluated modeling approaches from the perspectives of predictive accuracy, generalization ability, and extrapolation capability. Finally, Section 3.3 focuses on heterogeneity and interpretability analysis based on the best-performing approach, with the aim of identifying the key factors associated with differences in reconstruction difficulty.

3.1. Survey Results

Farmers’ intended fertilizer-reduction responses varied clearly across subsidy and household conditions (Figure 2a–d). As subsidies increased, responses shifted steadily away from unwillingness and toward willingness (Figure 2a). The distribution of intended reduction also moved toward higher reduction levels, indicating stronger stated adjustment under larger subsidies (Figure 2b). Together, these patterns suggest that subsidy incentives raise both the probability of accepting fertilizer reduction and the intensity of the intended response.

Stated responses also differed by agricultural income dependence and training exposure. Annual fertilizer expenditure rose sharply with household reliance on agricultural income, with median cost increasing from 1900 RMB in the low-dependence group to 8250 RMB in the high-dependence group (Figure 2c). Training was likewise associated with stronger willingness to reduce fertilizer use (Figure 2d). Farmers without training were mostly unwilling, whereas those with occasional or repeated training showed much larger willing shares. This pattern suggests that training, alongside subsidies, may strengthen acceptance of fertilizer-reduction practices.

3.2. Model Performance

Model performance was evaluated by comparing the reconstructed intervals with the survey-derived reference intervals under the stated subsidy scenarios. Therefore, the reported IO, EIO, IIO, MIW, and MAME values describe how closely each method recovered farmer-stated fertilizer-reduction responses in the survey setting.

3.2.1. Comparison of Conventional Modeling Approaches

Figure 3 compares the 14 conventional modeling approaches using normalized scores for MIW, MAME, IO, extrapolation interval overlap (EIO), and interpolation interval overlap (IIO). Overall, the methods differed substantially in their performance balance. Monotone XGBoost delivered the strongest overall profile, achieving the highest scores on four of the five metrics: MAME (84.9 ± 3.5), IO (77.1 ± 4.0), EIO (75.1 ± 2.4), and IIO (52.7 ± 7.2). By contrast, Household Random Effects ranked first on MIW (100.0 ± 5.9) but last on MAME (0.0 ± 10.2), suggesting that narrow intervals alone did not ensure accurate recovery. Among the other machine-learning baselines, MLP and CatBoost remained competitive on selected metrics, but neither matched the overall balance of Monotone XGBoost. The corresponding raw performance metrics and statistical comparisons are reported in Table 4.

Monotonic or shape-constrained baselines generally performed better on overlap-related metrics than the simpler fitting approaches. Monotone Piecewise Linear outperformed Linear Fitting on both IO and EIO, while Curve Fitting remained weak across IO, EIO, and IIO. Per-sample Isotonic showed relatively favorable MIW but only moderate overlap performance, and GAM-based methods were comparatively stable but less competitive overall. Taken together, Monotone XGBoost provided the strongest conventional reference, whereas simpler univariate fitting methods showed weaker recovery performance.

3.2.2. LLM Performance

Figure 4 compares the five LLM settings with Monotone XGBoost, the strongest conventional modeling approach. The specific numerical indicators of each method are shown in Table 5. Overall, the incremental settings showed more balanced results across the five metrics than the direct settings. DeepSeek V3.2 Increment gave the most even profile, with relatively high scores on MIW (60.0), MAME (86.1), IO (100.0), EIO (93.5), and IIO (76.9). Qwen3-8B Increment was especially strong on normalized MAME and EIO, both reaching 100.0, while also maintaining relatively high scores on MIW (56.3) and IO (85.4). By contrast, the direct settings were less even across metrics. DeepSeek V3.2 Direct achieved the highest IIO score (100.0), but remained moderate on the other metrics, whereas Qwen3-8B Direct was weaker on most dimensions. LoRA fine-tuning improved the local model only partly, with Qwen3-8B LoRA performing relatively well on IO and IIO, but remaining lower on MIW, MAME, and EIO.

Relative to Monotone XGBoost, the two incremental settings scored higher on most metrics, while the direct and LoRA settings showed a more mixed pattern. DeepSeek V3.2 Direct and Qwen3-8B LoRA exceeded Monotone XGBoost on selected overlap-related metrics, but not across all five dimensions.

Because the repeated splits were conducted at the household level, crop records from the same household were kept within the same training or validation split. This design reduced household-level information leakage and provided a stricter evaluation setting for the present survey structure. The relatively large standard deviations of some metrics, especially IIO, indicate that interpolation performance varied across household-level splits, which is reasonable given the limited number of surveyed households.
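The household-level splitting described above can be illustrated with a short sketch. It is not the authors' implementation: the function name, the household identifiers, the validation fraction of 0.2, and the number of repeats are all hypothetical. The point is only that whole households, rather than individual crop records, are assigned to one side of each split.

```python
import random


def household_split(household_ids, val_fraction=0.2, seed=0):
    """Return (train_idx, val_idx) so that no household appears on both sides."""
    households = sorted(set(household_ids))
    rng = random.Random(seed)
    rng.shuffle(households)
    # Hold out whole households; keep at least one for validation.
    n_val = max(1, round(val_fraction * len(households)))
    val_households = set(households[:n_val])
    train_idx = [i for i, h in enumerate(household_ids) if h not in val_households]
    val_idx = [i for i, h in enumerate(household_ids) if h in val_households]
    return train_idx, val_idx


# Eight crop records from five hypothetical households (H3 grows three crops).
records = ["H1", "H1", "H2", "H3", "H3", "H3", "H4", "H5"]

# Repeated splits with different seeds, as in a repeated hold-out design.
splits = [household_split(records, val_fraction=0.2, seed=s) for s in range(5)]
```

Because households contribute different numbers of crop records, the validation set size varies from split to split, which is one reason metric variability across splits can be large when the number of households is small.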

3.2.3. Generalized Curve Behavior

Figure 5 compares the generalized subsidy-response curves of all evaluated methods over the full subsidy range. Clear differences were observed in both monotonicity and flat-segment behavior. All conventional methods remained non-decreasing, with MVD = 0.0000 in every case, but their FSR values varied widely. Linear Fitting and MLP showed the lowest FSR values, whereas Curve Fitting, GAM, Monotone GAM, Household Random Effects, and Monotonic Gaussian Process were noticeably flatter. Stronger plateau behavior appeared in Monotone Piecewise Linear and Per-sample Isotonic, and the flattest curves were produced by CatBoost, Monotone CatBoost, Monotone XGBoost, and Conformal CatBoost, all with FSR = 0.9625. Among the LLM settings, DeepSeek V3.2 Increment and Qwen3-8B Increment both preserved monotonicity, with MVD = 0.0000 and FSR values of 0.4375 and 0.4750, respectively. By contrast, DeepSeek V3.2 Direct, Qwen3-8B Direct, and Qwen3-8B LoRA showed visible monotonicity violations, with MVD values of 0.1525, 0.3335, and 0.2960, respectively.

Under the monotonic policy prior adopted in this study, the incremental LLM settings produced relatively stable curve shapes. They preserved non-decreasing trends while avoiding the strongest flat-segment patterns observed in several conventional methods. However, monotonicity violations should be interpreted cautiously. They indicate departures from the assumed policy-response direction, not necessarily behavioral invalidity. In real farming decisions, local declines or flat segments may still occur because of crop differences, risk concerns, labor constraints, or adjustment costs. Therefore, MVD and FSR were used only as auxiliary diagnostics of curve shape.

3.3. Heterogeneity and Interpretability

To assess whether reconstruction difficulty varied across farmer subgroups, a heterogeneity analysis was conducted using the best-performing model, DeepSeek V3.2 Increment. Reconstruction difficulty was measured by the sample-level mean absolute midpoint error D_{i} defined in Equation (6), and subgroup means were compared with the overall average of 1.68. As shown in Figure 6, the comparison was organized by crop type, fertilization training frequency, and technology attitude. This comparison identified variation in reconstruction difficulty across farmer profiles.

As shown in Figure 6, clear but moderate heterogeneity was observed. By crop type, the highest error was recorded for rice (2.11; 95% CI: 1.73–2.49), followed by vegetables (1.96; 1.37–2.55), whereas fruit showed the lowest error (0.91; 0.59–1.23). Wheat (1.52) remained below the overall mean, while corn (1.72) was close to it. By training frequency, the greatest difficulty was found for occasional training (2.14; 1.80–2.48), followed by repeated training (1.88; 1.55–2.20), whereas no training showed the lowest value (1.23; 0.86–1.60). By technology attitude, the “risk-averse, not tried” group showed the highest error (1.84; 1.46–2.22), while “unaware of new technology” and “tried but gave up” were lower, at 1.53 and 1.59, respectively. These results indicate that reconstruction difficulty was higher in several groups with stronger or more variable stated responses. To examine which variables were associated with this pattern, SHAP analysis was then applied, as shown in Figure 7, and the corresponding mean absolute SHAP values are summarized in Table 6.

Figure 7 shows that the sample-level reconstruction error was most strongly associated with Fertilizer bags per mu, with a mean absolute SHAP value of 0.2414. This variable reflects farmers’ dependence on chemical fertilizer in daily production. Farmers who use more fertilizer per mu may face greater concern about yield loss when reducing fertilizer, so their stated responses may depend more on production risk and adjustment cost. Annual fertilizer cost ranked second, with a SHAP value of 0.1808. This variable reflects the economic pressure of fertilizer use. For farmers with higher fertilizer costs, subsidy incentives may be more relevant, but their responses may also depend on whether the subsidy can offset possible production losses.

Fertilization training also had a relatively high mean absolute SHAP value (0.1473). Training may affect how farmers understand fertilizer-reduction policies and evaluate their feasibility. Farmers with training may consider not only the subsidy amount, but also crop type, technical feasibility, and expected production effect. Province latitude also ranked among the influential variables, with a mean absolute SHAP value of 0.0904. This result should not be interpreted as a direct effect of latitude itself, but rather as a proxy for broader regional environmental and socioeconomic differences in the survey sample. In the Chinese agricultural context, higher-latitude regions may generally be colder and drier and may differ in cropping systems, soil conditions, market access, regional development, and local policy implementation. Taken together, the SHAP results suggest that reconstruction difficulty was associated not only with farmer-level factors such as fertilizer dependence, cost pressure, and technology adoption, but also with broader regional contexts captured by latitude.

4. Discussion

4.1. Advantages of LLM-Based Reconstruction

This study uses LLMs to reconstruct farmer-stated fertilizer-reduction responses under different subsidy levels. The goal is not to fully simulate real farmer behavior, but to estimate missing response intervals from limited observed anchors and farmer information. For this task, LLMs have a practical advantage: they can place farmer attributes, known subsidy-response anchors, and target policy scenarios into one textual context [38]. This makes the reconstruction process more flexible than methods that only use subsidy values or fixed tabular features [54].

Compared with ABMs, the LLM-based approach needs fewer predefined behavioral rules. ABMs are useful when decision rules and interaction mechanisms are clear [55]. However, in this study, the data are sparse, and each farmer only provides responses at a few subsidy levels. Under this condition, building an ABM would require many assumptions about farmer preferences, learning, and interactions. These assumptions would be difficult to verify [56]. The LLM-based method avoids this problem by using the observed anchors and farmer profiles directly for response reconstruction [57].

Compared with conventional machine-learning methods, the LLM-based approach in this study should not be interpreted as universally superior for all farmer-response prediction tasks. However, in the present sparse-anchor reconstruction task, its advantage over the evaluated conventional baselines was supported by the experimental results, especially under the incremental inference setting. This advantage may partly arise from the textual organization of heterogeneous farmer information and sparse response anchors, which is useful when responses depend on multiple production, cost, labor, and training-related factors [58]. More specifically, the comparison between direct and incremental inference shows that the incremental strategy was a more suitable way to use LLMs in the present sparse-anchor reconstruction task. Direct inference asked the LLM to generate the target interval in one step, whereas incremental inference used nearby anchors to define a bounded local segment and asked the LLM to allocate relative changes within that segment. Therefore, the stronger performance of the incremental setting supports the usefulness of the proposed LLM-based incremental reconstruction strategy. At the same time, this advantage should not be attributed to the LLM component alone. It may reflect both the LLM’s use of contextual farmer information and the structural benefit of anchor-constrained reconstruction [59].
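The structural effect of anchor-constrained reconstruction can be seen in a small sketch. This is a hypothetical simplification rather than the prompt or post-processing used in the study (the actual incremental prompt is given in Figure S6): it assumes the target subsidy lies between two observed anchors and that the LLM reply has already been parsed into a single share between 0 and 1. The function name and the numbers are illustrative.

```python
def allocate_within_segment(left, right, share):
    """Place a target interval between two neighbouring anchor intervals.

    left, right -- (lower, upper) intervals at the nearest lower and higher
                   subsidy anchors
    share       -- fraction of the local change assigned to the target,
                   assumed here to come from a parsed LLM reply
    """
    # Clamping keeps the result inside the segment even if the reply is out of range.
    share = min(1.0, max(0.0, share))
    lower = left[0] + share * (right[0] - left[0])
    upper = left[1] + share * (right[1] - left[1])
    return lower, upper


left_anchor = (10.0, 20.0)   # illustrative interval at the lower subsidy anchor
right_anchor = (30.0, 40.0)  # illustrative interval at the higher subsidy anchor

inside = allocate_within_segment(left_anchor, right_anchor, 0.25)
clamped = allocate_within_segment(left_anchor, right_anchor, 1.7)
```

Under these assumptions, any reply yields an interval bounded by the two anchors, so the reconstructed curve cannot decrease between non-decreasing anchors regardless of which model supplies the share. This is why the same design could in principle be paired with a non-LLM allocator when separating the contribution of the model from that of the anchor structure.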

4.2. Limitations and Improvement Prospects

This study is still limited by the type of data used. The responses come from stated-preference surveys, not from observed behavior after real policy implementation. Therefore, the results show what farmers said they would do under hypothetical subsidies, rather than what they actually did in practice [60]. This limits the validity of the reconstructed responses. Future work should use follow-up surveys, pilot subsidy programs, or field experiments to collect real fertilizer-use records [61]. These records may include fertilizer purchase, application amount, crop yield, subsidy receipt, and changes in management practice [62]. Comparing stated responses with real changes would provide a stronger test of the method [63].

The extrapolation evaluation also has methodological limits. In this study, extrapolation only means recovering a held-out boundary anchor. It does not establish that the model can reliably predict responses beyond the observed subsidy range, where no stated responses are available for comparison. Future studies should use denser subsidy designs, with more intermediate and boundary subsidy levels [64]. Some levels can be used for model input, while others can be reserved only for testing. Real policy data beyond the original subsidy range would also be needed before making stronger claims about out-of-range prediction [65].

The curve-shape indicators and the incremental strategy also need further methodological checks. MVD and FSR are based on a monotonicity assumption, but this assumption is a modeling prior rather than an established property of farmer behavior. Farmers may show thresholds, flat responses, or local reversals because of risk, labor limits, crop differences, or adjustment costs [66]. Therefore, these indicators should be treated only as auxiliary curve diagnostics, not as proof of behavioral realism. The incremental strategy should be evaluated with the same caution. As discussed above, its performance may depend not only on the LLM component, but also on the local-anchor structure used to constrain the reconstruction process. Future work should test this with ablation experiments, such as removing farmer attributes, changing the number of anchors, altering the prompt, or applying the same incremental design to non-LLM models. This would help clarify whether the advantage comes mainly from the LLM, the local-anchor design, or their combination [67].

4.3. Future Prospects for LLM-Based Reconstruction

Future work should examine whether the current reconstruction framework can be extended beyond fertilizer-reduction responses to other farmer management decisions, such as fertilization timing, fertilizer type adjustment, irrigation, pesticide use, crop switching, or adoption of green production technologies. These decisions are also central to agricultural management and have been widely discussed in studies of sustainable practice adoption, fertilizer management, and farmer behavioral change [68,69]. Such extensions would help test whether the framework is useful only for the present subsidy-response task or can support broader stated-response reconstruction under different agricultural decision settings.

5. Conclusions

This study examined whether LLM-based methods can reconstruct farmer-stated fertilizer-reduction responses under hypothetical subsidy scenarios. Using survey data from 15 counties in China, we compared three LLM-based strategies with 14 conventional methods. The evaluation covered interval recovery, extrapolation behavior, and curve-shape plausibility. LLM-based methods were competitive in this sparse-anchor reconstruction task, with the incremental inference strategy showing the most consistent performance. DeepSeek V3.2 Increment achieved the highest IO (0.528) and a high EIO (0.602), while Qwen3-8B Increment achieved the lowest MAME (1.291) and the highest EIO (0.636).

These findings indicate that LLM-based textual inference can support the reconstruction of heterogeneous stated-response patterns when policy-scenario data are limited. The approach combines farmer attributes, observed subsidy anchors, and target policy scenarios in a flexible textual format. Within this task setting, the incremental strategy provided a stable way to recover missing response intervals and describe possible response curves under alternative subsidy levels.

The heterogeneity analysis showed that reconstruction errors differed across farmer groups. Surrogate SHAP analysis linked these errors mainly to fertilizer-use intensity, annual fertilizer cost, and fertilization training, suggesting that predictions may be less reliable for farmers whose responses are strongly shaped by input intensity, cost pressure, or training-related differences. Overall, this study provides an exploratory framework for reconstructing heterogeneous stated policy responses from limited survey data. Future work should test this framework with larger samples, revealed fertilizer-use records, richer management information, and more diverse policy scenarios.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/agriculture16121266/s1, Figure S1: Representative example of the standardized JSON format for anchor-based input; Figure S2: Input format for the pooled univariate modeling setting; Figure S3: Input format for tabular-feature and household-grouped modeling settings; Figure S4: Representative prompt example for LLM direct inference; Figure S5: Full prompt template for LoRA-based fine-tuning; Figure S6: Representative prompt example for LLM incremental inference.

Author Contributions

Conceptualization, S.L. and X.H.; methodology, S.L.; software, S.L.; validation, S.L., X.H. and Y.Z.; formal analysis, X.H.; investigation, S.L.; resources, X.H.; data curation, S.L.; writing—original draft preparation, S.L. and Y.Z.; writing—review and editing, X.H., Z.S. and C.Y.; visualization, X.H.; supervision, Z.S. and C.Y.; project administration, X.H.; funding acquisition, X.H. and C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Hainan Talent Convergence Initiative (HNYT20250005), the First-class Discipline Breakthrough Initiative of Hainan University (XKTP2025C08), the Hainan Provincial Natural Science Foundation of China (326QN0502), the Hainan Province Science and Technology Special Fund (ZDYF2023XDNY181), the Hainan University Research Start-up Fund (XJ2400005276, RZ2300002823, RZ2300002833), and the Hainan University Undergraduate Innovation and Entrepreneurship Training Program (S202510589050).

Institutional Review Board Statement

Ethical review and approval were waived for this study due to the research only involving farmer surveys and interviews with voluntary, informed consent of respondents, anonymized data without identifiable personal information, and no involvement of animal breeding, experimentation, or other animal welfare-related research activities; thus not falling within the scope of animal ethics review and not requiring ethical approval in accordance with the regulations of Hainan University.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original data will be provided as required.

Acknowledgments

The authors sincerely thank the farmers who participated in the survey and the local investigators who assisted with questionnaire distribution, field interviews, and data collection. The authors also appreciate the constructive comments and support received during the preparation of this manuscript. DeepSeek V3.2 and Qwen3-8B were used only as part of the experimental LLM-based reconstruction procedures described in Section 2. The authors reviewed, verified, and are fully responsible for all content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ABM	Agent-based model
API	Application programming interface
CI	Confidence interval
EIO	Extrapolation interval overlap
FSR	Flat-segment ratio
GAM	Generalized Additive Model
IIO	Interpolation interval overlap
IO	Interval overlap
JSON	JavaScript Object Notation
LLM	Large language model
LoRA	Low-Rank Adaptation
MAME	Mean absolute midpoint error
MIW	Mean interval width
MLP	Multilayer perceptron
MVD	Monotonicity violation degree
SHAP	Shapley Additive Explanations

Appendix A

Table A1. The training hyperparameters, decoding parameters, and runtime environment for Qwen3-8B.

Category	Name	Value
Training Hyperparameter	Batch size	1
	Cutoff length	4096
	Gradient accumulation steps	1
	Learning rate	3 × 10⁻⁵
	LoRA rank	32
	Optimizer type	AdamW
	Training epochs	10
	Warmup ratio	0.1
	Validation split	0.1
	Lr scheduler	cosine
	Evaluation strategy	steps
	Evaluation steps	200
	Per-device eval batch size	1
	Precision	BF16
Inference Settings	Temperature	0.0
Inference Settings	Top p	0.9
Local Environment	System version	CentOS 7.9
	LLaMA-Factory version	0.9.5.dev0
	PyTorch version	2.6.0
	CPU	Intel Xeon Gold 5418Y
	GPU	4 × NVIDIA GeForce RTX 4090 GPUs, 24 GB each

Figure A1. Loss curves of one random run from 10 LoRA training trials of Qwen3-8B (left: training loss; right: evaluation loss).

References

Chen, X.; Cui, Z.; Fan, M.; Vitousek, P.; Zhao, M.; Ma, W.; Wang, Z.; Zhang, W.; Yan, X.; Yang, J. Producing more grain with lower environmental costs. Nature 2014, 514, 486–489.
Guo, J.H.; Liu, X.J.; Zhang, Y.; Shen, J.; Han, W.; Zhang, W.; Christie, P.; Goulding, K.; Vitousek, P.; Zhang, F. Significant acidification in major Chinese croplands. Science 2010, 327, 1008–1010.
Duan, J.; Liu, H.; Zhang, X.; Ren, C.; Wang, C.; Cheng, L.; Xu, J.; Gu, B. Agricultural management practices in China enhance nitrogen sustainability and benefit human health. Nat. Food 2024, 5, 378–389.
Cui, Z.; Zhang, H.; Chen, X.; Zhang, C.; Ma, W.; Huang, C.; Zhang, W.; Mi, G.; Miao, Y.; Li, X. Pursuing sustainable productivity with millions of smallholder farmers. Nature 2018, 555, 363–366.
Dessart, F.J.; Barreiro-Hurlé, J.; Van Bavel, R. Behavioural factors affecting the adoption of sustainable farming practices: A policy-oriented review. Eur. Rev. Agric. Econ. 2019, 46, 417–471.
Piñeiro, V.; Arias, J.; Dürr, J.; Elverdin, P.; Ibáñez, A.M.; Kinengyere, A.; Opazo, C.M.; Owoo, N.; Page, J.R.; Prager, S.D. A scoping review on incentives for adoption of sustainable agricultural practices and their outcomes. Nat. Sustain. 2020, 3, 809–820.
Kremmydas, D.; Athanasiadis, I.N.; Rozakis, S. A review of agent based modeling for agricultural policy evaluation. Agric. Syst. 2018, 164, 95–106.
Li, M.; Wang, J.; Zhao, P.; Chen, K.; Wu, L. Factors affecting the willingness of agricultural green production from the perspective of farmers’ perceptions. Sci. Total Environ. 2020, 738, 140289.
Liu, Y.; Ruiz-Menjivar, J.; Zhang, L.; Zhang, J.; Swisher, M.E. Technical training and rice farmers’ adoption of low-carbon management practices: The case of soil testing and formulated fertilization technologies in Hubei, China. J. Clean. Prod. 2019, 226, 454–462.
Ma, W.; Abdulai, A.; Goetz, R. Agricultural cooperatives and investment in organic soil amendments and chemical fertilizer in China. Am. J. Agric. Econ. 2018, 100, 502–520.
Berger, T. Agent-based spatial models applied to agriculture: A simulation tool for technology diffusion, resource use changes and policy analysis. Agric. Econ. 2001, 25, 245–260.
An, L. Modeling human decisions in coupled human and natural systems: Review of agent-based models. Ecol. Model. 2012, 229, 25–36.
Matthews, R.B.; Gilbert, N.G.; Roach, A.; Polhill, J.G.; Gotts, N.M. Agent-based land-use models: A review of applications. Landsc. Ecol. 2007, 22, 1447–1459.
Filatova, T.; Verburg, P.H.; Parker, D.C.; Stannard, C.A. Spatial agent-based models for socio-ecological systems: Challenges and prospects. Environ. Model. Softw. 2013, 45, 1–7.
Larooij, M.; Törnberg, P. Validation is the central challenge for generative social simulation: A critical review of LLMs in agent-based modeling. Artif. Intell. Rev. 2025, 59, 15.
Breiman, L. Statistical modeling: The two cultures (with comments and a rejoinder by the author). Stat. Sci. 2001, 16, 199–231.
Benos, L.; Tagarakis, A.C.; Dolias, G.; Berruto, R.; Kateris, D.; Bochtis, D. Machine learning in agriculture: A comprehensive updated review. Sensors 2021, 21, 3758.
Knowler, D.; Bradshaw, B. Farmers’ adoption of conservation agriculture: A review and synthesis of recent research. Food Policy 2007, 32, 25–48.
Zhang, Y.; Long, H.; Li, Y.; Ge, D.; Tu, S. How does off-farm work affect chemical fertilizer application? Evidence from China’s mountainous and plain areas. Land Use Policy 2020, 99, 104848.
Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, prompt, and predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 2023, 55, 1–35.
Park, J.S.; O’Brien, J.; Cai, C.J.; Morris, M.R.; Liang, P.; Bernstein, M.S. Generative agents: Interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, San Francisco, CA, USA, 29 October–1 November 2023; pp. 1–22.
Wang, L.; Zhang, J.; Yang, H.; Chen, Z.-Y.; Tang, J.; Zhang, Z.; Chen, X.; Lin, Y.; Sun, H.; Song, R. User behavior simulation with large language model-based agents. ACM Trans. Inf. Syst. 2025, 43, 1–37.
Lu, Y.; Aleta, A.; Du, C.; Shi, L.; Moreno, Y. LLMs and generative agent-based models for complex systems research. Phys. Life Rev. 2024, 51, 283–293.
Dillion, D.; Tandon, N.; Gu, Y.; Gray, K. Can AI language models replace human participants? Trends Cogn. Sci. 2023, 27, 597–600.
Huang, J.; Huang, Z.; Jia, X.; Hu, R.; Xiang, C. Long-term reduction of nitrogen fertilizer use through knowledge training in rice production in China. Agric. Syst. 2015, 135, 105–111.
Seber, G.A.; Lee, A.J. Linear Regression Analysis; John Wiley & Sons: Hoboken, NJ, USA, 2003.
Marquardt, D.W. An algorithm for least-squares estimation of nonlinear parameters. J. Soc. Ind. Appl. Math. 1963, 11, 431–441.
Fritsch, F.N.; Carlson, R.E. Monotone piecewise cubic interpolation. SIAM J. Numer. Anal. 1980, 17, 238–246.
Ayer, M.; Brunk, H.D.; Ewing, G.M.; Reid, W.T.; Silverman, E. An empirical distribution function for sampling with incomplete information. Ann. Math. Stat. 1955, 26, 641–647.
Tran, A.; Maupin, K.; Rodgers, T. Monotonic Gaussian process for physics-constrained machine learning with materials science applications. J. Comput. Inf. Sci. Eng. 2023, 23, 011011.
De Leeuw, J.; Hornik, K.; Mair, P. Isotone optimization in R: Pool-adjacent-violators algorithm (PAVA) and active set methods. J. Stat. Softw. 2010, 32, 1–24.
Hastie, T.; Tibshirani, R. Generalized additive models. Stat. Sci. 1986, 1, 297–310.
Pya, N.; Wood, S.N. Shape constrained additive models. Stat. Comput. 2015, 25, 543–559.
Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 2018, 31, 6639–6649.
Hu, L.; Aramideh, S.; Chen, J.; Nair, V.N. Monotone tree-based GAMI models by adapting XGBoost. arXiv 2023, arXiv:2309.02426.
Laird, N.M.; Ware, J.H. Random-effects models for longitudinal data. Biometrics 1982, 38, 963–974.
Hegselmann, S.; Buendia, A.; Lang, H.; Agrawal, M.; Jiang, X.; Sontag, D. TabLLM: Few-shot classification of tabular data with large language models. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Valencia, Spain, 25–27 April 2023; pp. 5549–5581.
Liu, A.; Mei, A.; Lin, B.; Xue, B.; Wang, B.; Xu, B.; Wu, B.; Zhang, B.; Lin, C.; Dong, C. DeepSeek-V3.2: Pushing the frontier of open large language models. arXiv 2025, arXiv:2512.02556.
Yang, A.; Li, A.; Yang, B.; Zhang, B.; Hui, B.; Zheng, B.; Yu, B.; Gao, C.; Huang, C.; Lv, C. Qwen3 technical report. arXiv 2025, arXiv:2505.09388.
Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-rank adaptation of large language models. In Proceedings of the International Conference on Learning Representations, Virtual, 25 April 2022.
Zheng, Y.; Zhang, R.; Zhang, J.; Ye, Y.; Luo, Z. LlamaFactory: Unified efficient fine-tuning of 100+ language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 3: System Demonstrations), Bangkok, Thailand, 11–16 August 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 400–410.
Stone, M. Cross-validatory choice and assessment of statistical predictions. J. R. Stat. Soc. Ser. B 1974, 36, 111–133.
Khosravi, A.; Nahavandi, S.; Creighton, D.; Atiya, A.F. Comprehensive review of neural network-based prediction intervals and new advances. IEEE Trans. Neural Netw. 2011, 22, 1341–1356.
Gneiting, T.; Balabdaoui, F.; Raftery, A.E. Probabilistic forecasts, calibration and sharpness. J. R. Stat. Soc. Ser. B Stat. Methodol. 2007, 69, 243–268.
Milgrom, P.; Shannon, C. Monotone comparative statics. Econom. J. Econom. Soc. 1994, 62, 157–180.
Foster, A.D.; Rosenzweig, M.R. Microeconomics of technology adoption. Annu. Rev. Econ. 2010, 2, 395–424.
Horovicz, M.; Goldshmidt, R. TokenSHAP: Interpreting large language models with Monte Carlo Shapley value estimation. In Proceedings of the 1st Workshop on NLP for Science (NLP4Science), Miami, FL, USA, 12–16 November 2024; pp. 1–8.
Chang, Y.; Cao, B.; Wang, Y.; Chen, J.; Lin, L. JOPA: Explaining Large Language Model’s Generation via Joint Prompt Attribution. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vienna, Austria, 27 July–1 August 2025; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 22106–22122. [Google Scholar] [CrossRef]
Han, C.; Kim, S.; Kim, D.W.; Celi, L.A.; Kim, J.; Bae, S.; Yoon, D. Surrogate modeling for interpreting black-box LLMs in medical predictions. arXiv 2026, arXiv:2604.20331. [Google Scholar] [CrossRef]
Wei, B.; Fazli, M.; Zhu, Z. Making Sense of LLM Decisions: A Prototype-based Framework for Explainable Classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Singapore, 20–27 January 2026; pp. 26814–26822. [Google Scholar]
Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.-I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67. [Google Scholar] [CrossRef]
Shapley, L.S. A value for n-person games. In Contributions to the Theory of Games II; Kuhn, H.W., Tucker, A.W., Eds.; Princeton University Press: Princeton, NJ, USA, 1953; pp. 307–318. [Google Scholar] [CrossRef]
Carballo, K.V.; Na, L.; Ma, Y.; Boussioux, L.; Zeng, C.; Soenksen, L.R.; Bertsimas, D. Tabtext: A flexible and contextual approach to tabular data representation. arXiv 2022, arXiv:2206.10381. [Google Scholar] [CrossRef]
Smajgl, A.; Brown, D.G.; Valbuena, D.; Huigen, M.G. Empirical characterisation of agent behaviours in socio-ecological systems. Environ. Model. Softw. 2011, 26, 837–844. [Google Scholar] [CrossRef]
Grimm, V.; Revilla, E.; Berger, U.; Jeltsch, F.; Mooij, W.M.; Railsback, S.F.; Thulke, H.-H.; Weiner, J.; Wiegand, T.; DeAngelis, D.L. Pattern-oriented modeling of agent-based complex systems: Lessons from ecology. Science 2005, 310, 987–991. [Google Scholar] [CrossRef] [PubMed]
Argyle, L.P.; Busby, E.C.; Fulda, N.; Gubler, J.R.; Rytting, C.; Wingate, D. Out of one, many: Using language models to simulate human samples. Political Anal. 2023, 31, 337–351. [Google Scholar] [CrossRef]
Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, S.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E. On the opportunities and risks of foundation models. arXiv 2021, arXiv:2108.07258. [Google Scholar] [CrossRef]
Horton, J.J.; Filippas, A.; Manning, B.S. Large language Models as Simulated Economic Agents: What Can We Learn from Homo Silicus? National Bureau of Economic Research: Cambridge, MA, USA, 2023. [Google Scholar]
Murphy, J.J.; Allen, P.G.; Stevens, T.H.; Weatherhead, D. A meta-analysis of hypothetical bias in stated preference valuation. Environ. Resour. Econ. 2005, 30, 313–325. [Google Scholar] [CrossRef]
Harrison, G.W.; List, J.A. Field experiments. J. Econ. Lit. 2004, 42, 1009–1055. [Google Scholar] [CrossRef]
Duflo, E.; Kremer, M.; Robinson, J. Nudging farmers to use fertilizer: Theory and experimental evidence from Kenya. Am. Econ. Rev. 2011, 101, 2350–2390. [Google Scholar] [CrossRef]
Adamowicz, W.; Louviere, J.; Williams, M. Combining revealed and stated preference methods for valuing environmental amenities. J. Environ. Econ. Manag. 1994, 26, 271–292. [Google Scholar] [CrossRef]
Heckman, J.J. Micro data, heterogeneity, and the evaluation of public policy: Nobel lecture. J. Political Econ. 2001, 109, 673–748. [Google Scholar] [CrossRef]
Todd, P.E.; Wolpin, K.I. Assessing the impact of a school subsidy program in Mexico: Using a social experiment to validate a dynamic behavioral model of child schooling and fertility. Am. Econ. Rev. 2006, 96, 1384–1417. [Google Scholar] [CrossRef] [PubMed]
Feder, G.; Just, R.E.; Zilberman, D. Adoption of agricultural innovations in developing countries: A survey. Econ. Dev. Cult. Change 1985, 33, 255–298. [Google Scholar] [CrossRef]
Lipton, Z.C.; Steinhardt, J. Troubling Trends in Machine Learning Scholarship: Some ML papers suffer from flaws that could mislead the public and stymie future research. Queue 2019, 17, 45–77. [Google Scholar] [CrossRef]
Miriti, P.K.; Lambarraa-Lehnhardt, F. Understanding farmer preferences and trade-offs for adopting sustainable crop production: A systematic review. Discov. Sustain. 2025, 6, 760. [Google Scholar] [CrossRef]
Liu, M.; Liu, H. Farmers’ adoption of agriculture green production technologies: Perceived value or policy-driven? Heliyon 2024, 10, e23925. [Google Scholar] [CrossRef]

Figure 1. The overall framework of modeling approaches for testing, evaluation, and heterogeneity analysis.

Figure 2. Surveyed responses. Panels (a–d) summarize four aspects of the survey results: (a) response composition across subsidy levels; (b) shift in intended fertilizer-reduction intensity across subsidy levels, with colors distinguishing subsidy levels; (c) annual fertilizer cost by agricultural income dependence; and (d) overall response mix by training exposure.

Figure 3. Comparison of conventional modeling approaches. Scores are reported on a normalized 0–100 scale. For each metric, 100 denotes the best value achieved among all evaluated methods and 0 denotes the worst; for MIW and MAME, the direction is reversed so that higher scores always indicate better performance.
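
The 0–100 normalization described in this caption can be sketched as follows. The caption fixes only the endpoints (100 for the best value, 0 for the worst) and the reversed direction for MIW and MAME; linear min-max scaling between those endpoints is an assumption, as is the helper name `normalize_scores`. The inputs are mean values from Tables 4 and 5 for a three-method subset, so the resulting scores differ from those in the figures, which are normalized over all evaluated methods.

```python
def normalize_scores(values, lower_is_better=False):
    """Min-max scale raw metric values to 0-100 so that 100 is always the best method.

    Assumption: scores are interpolated linearly between the worst and best value.
    """
    lo, hi = min(values.values()), max(values.values())
    if hi == lo:
        # All methods tie on this metric; treat every one as "best".
        return {name: 100.0 for name in values}
    scores = {}
    for name, value in values.items():
        frac = (value - lo) / (hi - lo)
        if lower_is_better:
            # Reverse the direction for error-type metrics such as MIW and MAME.
            frac = 1.0 - frac
        scores[name] = 100.0 * frac
    return scores


# Mean MAME (lower is better) and IO (higher is better) from Tables 4 and 5.
mame = {"Monotone XGBoost": 1.726, "CatBoost": 1.883, "Qwen3-8B Increment": 1.291}
io = {"Monotone XGBoost": 0.451, "CatBoost": 0.368, "DeepSeek V3.2 Increment": 0.528}

mame_scores = normalize_scores(mame, lower_is_better=True)
io_scores = normalize_scores(io)

for metric, scores in (("MAME", mame_scores), ("IO", io_scores)):
    for name, score in scores.items():
        print(f"{metric:5s} {name:25s} {score:6.1f}")
```

Within this subset, the method with the lowest MAME and the method with the highest IO both receive 100, which is the property the caption relies on when it says that higher scores always indicate better performance.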

Figure 4. Performance comparison between LLM modeling approaches and the best overall performing baseline modeling approach. Scores are reported on a normalized 0–100 scale. For each metric, 100 denotes the best value achieved among all evaluated methods and 0 denotes the worst; for MIW and MAME, the direction is reversed so that higher scores always indicate better performance. The red dashed horizontal segment in each metric group indicates the normalized score of the Monotone XGBoost baseline, and the red numeric labels highlight the scores of the two incremental LLM approaches.

Figure 5. Generalized subsidy–response curves of the evaluated methods over the full subsidy range. The solid blue line denotes the generalized midpoint response, and the shaded region with dashed boundaries denotes the generalized interval and its bounds. Red points and vertical bars indicate the observed anchor means and anchor intervals at the surveyed subsidy levels. MVD and FSR are reported in each panel to summarize monotonicity violation and flat-segment behavior. Red boxes mark representative local segments with downward changes, oscillations, or extended flat patterns.

Figure 6. Heterogeneity in reconstruction difficulty under DeepSeek V3.2 Increment.

Figure 7. SHAP beeswarm plot of the top 10 most influential features, arranged from highest to lowest influence.

Table 1. Provinces and counties included in the survey.

Province	County
Anhui	Feixi
Chongqing	Shizhu
Guangdong	Mei, Wuhua
Hainan	Yazhou, Ledong
Hebei	Luannan
Hunan	Hengnan
Jiangsu	Pizhou, Tongshan, Suining
Shandong	Yucheng, Feicheng
Shanxi	Zezhou
Sichuan	Luxian

Table 2. Categories of survey information and corresponding value ranges. Areas were recorded in the local unit mu (1 mu = 1/15 ha).

Category	Data
1. Geographical location (input)	1.1 Provincial-level divisions (e.g., Hainan, Jiangsu); 1.2 Prefecture-level divisions; 1.3 County-level divisions
2. Household labor force situation (input)	2.1 Number of household members (2–7); 2.2 Number of working-age household members aged 18–60 (2–7); 2.3 Number of laborers specifically engaged in agricultural production (0–4); 2.4 Whether additional hired workers are employed for planting work besides household labor (Y/N)
3. Crop cultivation situation (input)	3.1 Types of crops (rice, corn, and wheat/vegetables/fruits); 3.2 Cultivation area per crop (0 mu to more than 30 mu)
4. Economic factors (input)	4.1 Proportion of agricultural income in total annual household income (less than 10% to more than 50%); 4.2 Number of hired workers (2–50); 4.3 Average number of working days per person (5–30 days); 4.4 Daily wage per person (60–150 RMB/day); 4.5 Types of chemical fertilizers (urea/compound fertilizer/potash); 4.6 Timing of chemical fertilizer application (pre-sowing, pre-transplanting, post-harvest, tillering stage, booting stage, etc.); 4.7 Amount of chemical fertilizer used per application (0.5–2 bags); 4.8 Price per bag of chemical fertilizer (60–300 RMB); 4.9 Total annual expenditure on chemical fertilizers (700–10,000 RMB)
5. Educational level (input)	5.1 Attitude towards new technologies promoted by the government (e.g., direct-seeded rice, green pest control); 5.2 Whether the respondent has received training on scientific fertilization techniques
6. Policy scenario question and answer (output)	Subsidies for reducing fertilizer use: 50, 100, 200, and 500 RMB/mu. The output was the stated interval of intended fertilizer-reduction percentage under each subsidy level.

Table 3. Summary of evaluated modeling approaches.

Category	Method
Anchor-based reconstruction	Linear Fitting [26]
	Curve Fitting [27]
	Monotone Piecewise Linear [28]
	Per-Sample Isotonic [29]
Pooled univariate modeling	Monotonic Gaussian Process [30]
Tabular-feature modeling	Residual + Isotonic Projection [31]
	Generalized Additive Model [32]
	Monotone Generalized Additive Model [33]
	Multilayer Perceptron [34]
	CatBoost [35]
	Monotone CatBoost [35]
	Monotone XGBoost [36]
	Conformal CatBoost [35]
	Household Random Effects ¹ [37]
LLM-based modeling	LLM Direct Inference
	LLM Fine-tuned Inference
	LLM Incremental Inference

¹ Household Random Effects was included in tabular-feature modeling because it uses structured covariates with an additional household-level grouping term.

Table 4. Raw performance metrics of conventional baseline models across 10 repeated splits. Values are reported as mean ± standard deviation. For each metric, each method was compared with Monotone XGBoost using paired Wilcoxon signed-rank tests across the same repeated splits, with Holm correction applied for multiple comparisons. Lower values indicate better performance for MIW and MAME, whereas higher values indicate better performance for IO, EIO, and IIO.

Baseline Model	MIW	MAME	IO	EIO	IIO
Monotone XGBoost	3.039 ± 0.172	1.726 ± 0.320	0.451 ± 0.043	0.505 ± 0.040	0.397 ± 0.055
CatBoost	3.151 ± 0.188	1.883 ± 0.359 *	0.368 ± 0.058 *	0.399 ± 0.071 *	0.336 ± 0.059
Monotone CatBoost	2.585 ± 0.176 *	1.909 ± 0.345 *	0.361 ± 0.052 *	0.407 ± 0.061 *	0.314 ± 0.052 *
Conformal CatBoost	8.692 ± 1.976 *	2.938 ± 0.381 *	0.254 ± 0.037 *	0.229 ± 0.045 *	0.279 ± 0.045 *
Curve Fitting	3.581 ± 0.409 *	2.717 ± 0.269 *	0.189 ± 0.031 *	0.106 ± 0.020 *	0.271 ± 0.044 *
Generalized Additive Model	3.104 ± 0.252	2.331 ± 0.479 *	0.352 ± 0.049 *	0.342 ± 0.066 *	0.362 ± 0.049
Monotone Generalized Additive Model	3.045 ± 0.593	2.510 ± 0.485 *	0.341 ± 0.037 *	0.344 ± 0.058 *	0.339 ± 0.055 *
Household Random Effects	0.921 ± 0.746 *	4.168 ± 0.928 *	0.385 ± 0.054 *	0.390 ± 0.051 *	0.380 ± 0.069
Per-Sample Isotonic	2.831 ± 0.374	2.954 ± 0.323 *	0.363 ± 0.059 *	0.345 ± 0.065 *	0.382 ± 0.070
Residual + Isotonic Projection	4.002 ± 0.684 *	2.440 ± 0.455 *	0.336 ± 0.036 *	0.356 ± 0.043 *	0.317 ± 0.042 *
Linear Fitting	4.453 ± 0.582 *	3.379 ± 0.387 *	0.343 ± 0.062 *	0.318 ± 0.067 *	0.369 ± 0.065
Multilayer Perceptron	3.046 ± 0.328	2.155 ± 0.426 *	0.369 ± 0.040 *	0.420 ± 0.050 *	0.319 ± 0.071
Monotone Piecewise Linear	4.002 ± 0.721	3.058 ± 0.303 *	0.395 ± 0.064 *	0.407 ± 0.071 *	0.382 ± 0.070
Monotonic Gaussian Process	3.969 ± 0.701	3.322 ± 0.336 *	0.130 ± 0.023 *	0.127 ± 0.022 *	0.132 ± 0.033 *

* Indicates a statistically significant difference from Monotone XGBoost at adjusted p < 0.05.
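
The testing procedure named in the Table 4 caption (paired Wilcoxon signed-rank tests against Monotone XGBoost across the same splits, followed by Holm correction) can be illustrated with a minimal standard-library sketch. The per-split IO values below are hypothetical, since the table reports only means and standard deviations, and the helper names `wilcoxon_exact` and `holm_adjust` are illustrative. The exact enumeration is used here only to keep the example self-contained; the software and p-value method the authors used are not stated in this section.

```python
from itertools import product


def wilcoxon_exact(x, y):
    """Two-sided exact Wilcoxon signed-rank p-value for paired samples."""
    # Round to suppress floating-point noise, then drop zero differences.
    diffs = [round(a - b, 12) for a, b in zip(x, y)]
    diffs = [d for d in diffs if d != 0]
    n = len(diffs)
    # Rank absolute differences, assigning the average rank to ties.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_obs = min(w_plus, total - w_plus)
    # Under the null, every sign pattern is equally likely: enumerate all 2^n.
    hits = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= w_obs:
            hits += 1
    return hits / 2 ** n


def holm_adjust(pvalues):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * pvalues[idx])
        running_max = max(running_max, candidate)  # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted


# Hypothetical IO scores over 10 repeated splits (not the study's data).
reference = [0.45, 0.47, 0.41, 0.50, 0.44, 0.43, 0.48, 0.39, 0.46, 0.49]
method_a = [0.36, 0.40, 0.33, 0.41, 0.35, 0.37, 0.38, 0.30, 0.39, 0.42]
method_b = [0.46, 0.45, 0.43, 0.49, 0.45, 0.42, 0.47, 0.41, 0.45, 0.50]

raw_p = [wilcoxon_exact(reference, m) for m in (method_a, method_b)]
adjusted = holm_adjust(raw_p)
for name, p in zip(("method_a", "method_b"), adjusted):
    print(name, round(p, 4), "*" if p < 0.05 else "")
```

In this toy input, `method_a` is worse than the reference on every split, so it reaches the smallest two-sided p-value an exact test can produce with 10 pairs (2/1024) and stays below 0.05 after adjustment, whereas `method_b` does not.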

Table 5. Raw performance metrics of the LLM-based methods and Monotone XGBoost across 10 repeated splits. Values are reported as mean ± standard deviation.

Models	MIW	MAME	IO	EIO	IIO
Monotone XGBoost	3.038 ± 0.171	1.725 ± 0.319	0.450 ± 0.043	0.504 ± 0.040	0.396 ± 0.054
DeepSeek V3.2 Direct	3.686 ± 0.475	2.453 ± 0.228	0.461 ± 0.058	0.412 ± 0.068	0.509 ± 0.091
Qwen3-8B Direct	4.949 ± 0.599	2.580 ± 0.353	0.413 ± 0.082	0.386 ± 0.083	0.439 ± 0.086
Qwen3-8B LoRA	4.043 ± 0.477	2.371 ± 0.319	0.459 ± 0.068	0.422 ± 0.071	0.495 ± 0.077
DeepSeek V3.2 Increment	2.533 ± 0.294	1.692 ± 0.301	0.528 ± 0.076	0.602 ± 0.076	0.454 ± 0.109
Qwen3-8B Increment	2.682 ± 0.297	1.291 ± 0.195	0.479 ± 0.057	0.636 ± 0.060	0.321 ± 0.068

Table 6. Mean absolute SHAP values of key features.

Feature	Mean Absolute SHAP Value
Fertilizer bags per mu	0.2414
Annual fertilizer cost	0.1808
Fertilization training	0.1473
Province latitude	0.0904
Household size	0.0742
Agricultural labor	0.0596
Price per bag of chemical fertilizer	0.0585
Agricultural income share	0.0406
Working-age labor	0.0327
Planting area	0.0319
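
The quantity reported in Table 6 is the mean of the absolute per-sample SHAP values for each feature. A minimal sketch of that aggregation is given below, assuming the SHAP values are already available as one row per sample and one column per feature. The matrix is hypothetical and was chosen only to illustrate the computation; the helper name `mean_abs_shap` is likewise illustrative.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Average |SHAP| per feature and return features sorted by importance."""
    n = len(shap_rows)
    totals = [0.0] * len(feature_names)
    for row in shap_rows:
        for j, phi in enumerate(row):
            # Absolute values keep positive and negative contributions
            # from cancelling each other out across samples.
            totals[j] += abs(phi)
    importance = {name: total / n for name, total in zip(feature_names, totals)}
    return dict(sorted(importance.items(), key=lambda kv: kv[1], reverse=True))


# Hypothetical SHAP matrix: 4 samples x 3 features (not the study's values).
features = ["Annual fertilizer cost", "Fertilization training", "Fertilizer bags per mu"]
shap_rows = [
    [-0.20, 0.10, 0.30],
    [0.15, -0.20, -0.20],
    [-0.10, 0.15, 0.25],
    [0.25, -0.15, -0.25],
]

ranking = mean_abs_shap(shap_rows, features)
for name, value in ranking.items():
    print(f"{name:25s} {value:.4f}")
```

Because the signs are discarded, the ranking measures how strongly a feature moves the prediction on average, not the direction of its effect; the direction is what the beeswarm plot in Figure 7 conveys.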

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
