Wheat Yield Prediction Based on Parallel CNN-LSTM-Attention with Transfer Learning Model

Song, Caixia; Liu, Tengao; Ning, Weiguang; Xu, Tong; Song, Shuhui; Li, Zifei; Ouyang, Shuyun; Song, Xinquan; Han, Taoyang; Zhang, Zichen; Chen, Tianyu; Xie, Jinbao

doi:10.3390/agriculture15232519


by Caixia Song ¹,*, Tengao Liu ¹, Weiguang Ning ², Tong Xu ¹, Shuhui Song ², Zifei Li ², Shuyun Ouyang ¹, Xinquan Song ¹, Taoyang Han ¹, Zichen Zhang ¹, Tianyu Chen ¹ and Jinbao Xie ¹

¹ College of Science and Information, Qingdao Agricultural University, Changcheng Road, Qingdao 266109, China

² Qingdao Smart Rural Development Service Center, Licang District, Qingdao 266100, China

* Author to whom correspondence should be addressed.

Agriculture 2025, 15(23), 2519; https://doi.org/10.3390/agriculture15232519

Submission received: 22 October 2025 / Revised: 25 November 2025 / Accepted: 2 December 2025 / Published: 4 December 2025

(This article belongs to the Section Artificial Intelligence and Digital Agriculture)


Abstract

Accurate wheat yield prediction is essential for ensuring food security and supporting governmental decision-making. However, the scarcity of long-term agricultural time-series data and the complex interplay between meteorological and socio-economic factors pose significant challenges. To address these issues, this study proposes a Transfer-Learning-Based Parallel CNN–LSTM–Attention (TPCLA) model for wheat yield forecasting. A cross-regional transfer learning strategy is employed to mitigate data scarcity by leveraging temporal patterns learned from regions with similar ecological characteristics. The proposed parallel architecture integrates one-dimensional convolutional neural networks and long short-term memory networks to jointly extract spatial and temporal features, while an attention mechanism is incorporated to highlight key influencing factors and enhance feature interpretability. Unlike conventional studies that primarily focus on climatic variables, this work considers both direct factors (e.g., average temperature and precipitation) and indirect socio-economic factors (e.g., agricultural mechanization level, total agricultural output value, grain production scale, cultivated land area, and disaster-affected area). Experimental results on multivariate wheat data from 1993 to 2024 demonstrate that several indirect indicators exert a more substantial influence on yield than traditional meteorological variables, reflecting the increasing ability of modern agricultural practices to buffer climatic variability. The proposed TPCLA model achieves an RMSE of 0.394, MAE of 0.326, and an $R^{2}$ of 0.904, outperforming multiple benchmark models and confirming its robustness and predictive superiority under small-sample conditions. The findings not only validate the effectiveness of integrating indirect yield-influencing factors but also provide new insights for agricultural policy formulation and climate resilience strategies.

Keywords: transfer learning; deep learning; convolutional neural network (CNN); long short-term memory (LSTM); wheat yield prediction

1. Introduction

Wheat is one of the world’s three primary cereal grains, and it plays a decisive role in ensuring global food security and stabilizing socio-economic development. As a staple crop providing the largest share of global caloric intake, wheat production contributes significantly to the world’s food supply and nutrition structure [1,2,3,4]. Accurate and timely wheat yield prediction is, therefore, essential for preventing food shortages, optimizing agricultural input allocation, and supporting national macro-level policy-making [5,6]. China, the world’s largest wheat producer, accounts for 11.26% of the total global wheat planting area and 17.98% of the global wheat output [7]. Among its production regions, winter wheat grown between the Yellow River and the Huai River occupies an absolutely dominant position, contributing over 85% of the country’s total summer grain yield [8]. Consequently, improving the precision of wheat yield forecasting is of strategic importance for national food security and global agricultural stability.

Wheat yield is affected by a complex set of interacting elements, which can be categorized into direct and indirect influencing factors. Direct factors refer to climatic and environmental variables that directly affect crop growth physiology, including average temperature and precipitation. Indirect factors, on the other hand, shape farmers’ production decisions, resource inputs, and resilience to environmental variability. These include indicators such as the total power of agricultural machinery, the total agricultural output value, the comprehensive output of agriculture–forestry–animal husbandry–fishery sectors, the total grain output, the cultivated land area, and disaster-affected areas. Although climate factors have traditionally been considered the dominant predictors of crop yields, modern agricultural mechanization has increasingly enabled farmers to buffer or offset adverse weather impacts through enhanced operational efficiency and improved management practices [9,10,11]. Our empirical findings further reveal that, for the studied regions, the average temperature and precipitation are no longer the most influential factors. Instead, several indirect socio-economic indicators show a stronger association with wheat yields. This highlights the necessity of incorporating both direct and indirect factors into prediction models—an important aspect that is largely neglected in existing crop yield forecasting studies.

A substantial body of research has explored machine learning and deep learning techniques for crop yield prediction. Traditional statistical models—such as linear regression, ARIMA, support vector regression, and decision-tree–based ensembles—provide basic predictive capabilities but often struggle to represent nonlinear multivariate time-series relationships within agricultural systems [12]. With the rise of deep learning, various models including convolutional neural networks (CNNs), recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and hybrid architectures with attention mechanisms have demonstrated improved performance in capturing complex spatiotemporal patterns [6,12,13,14,15,16]. Recent studies have applied deep neural networks to integrate climatic, ecological, and remote sensing features in order to enhance model accuracy [17,18]. Meanwhile, attention mechanisms have been introduced to identify key temporal features and enhance interpretability in yield prediction tasks [16]. Despite these advancements, two critical challenges persist:

(1) The scarcity of high-quality, long-term multivariate agricultural time-series data in many regions.

(2) The lack of explicit consideration for indirect socio-economic and policy-related factors, which play increasingly significant roles under modern agricultural conditions.

To address these limitations, this paper proposes a Transfer-learning-based Parallel CNN-LSTM-Attention (TPCLA) model that integrates both direct and indirect yield-influencing variables. A cross-regional transfer learning strategy is employed to alleviate the problem of limited temporal samples by transferring knowledge from regions with similar climatic characteristics [19]. Meanwhile, the designed parallel CNN-LSTM architecture enables the extraction of spatial features via 1D convolution while simultaneously capturing long-term temporal dependencies through LSTM. An attention mechanism is applied to further highlight the most influential features and enhance model interpretability. Beyond improving predictive accuracy, an important contribution of this work is the explicit incorporation of indirect socio-economic factors—such as the mechanization level and agricultural output indicators—which reveals their substantial influence on wheat yield. This fills an important gap in the existing literature and provides a valuable empirical reference for policy-making and agricultural planning.

The main contributions of this paper are as follows.

(1) A cross-regional transfer learning strategy is introduced to overcome the scarcity of wheat yield time-series data, enabling more robust temporal feature extraction.

(2) A parallel CNN-LSTM-Attention network is designed to preserve spatiotemporal feature integrity, enhance data utilization, and improve prediction performance.

(3) Both direct and indirect yield-influencing factors are incorporated, demonstrating that indirect socio-economic variables significantly contribute to wheat yield prediction and filling a research gap in existing crop forecasting methodologies.

(4) Extensive experiments on multivariate wheat time-series data from 1993 to 2024 validate the accuracy, stability, and generalization capability of the proposed TPCLA model.

2. Related Works

In recent decades, many researchers have increasingly focused on improving crop yield prediction through different methods, including empirical statistical models, process-oriented crop growth models, and prediction through remote sensing data [5,20,21]. Traditional statistical models predict yields by establishing regression equations between weather variables (such as temperature, precipitation, solar radiation, etc.) and the yields measured at different time and spatial scales [4,22,23].

Process-based crop simulation models present a mechanistic alternative to purely statistical approaches for yield prediction. Models such as the Decision Support System for Agrotechnology Transfer (DSSAT) [24], the Agricultural Production Systems Simulator (APSIM) [25], and the World Food Studies (WOFOST) [8] simulate crop growth and development by modeling underlying biophysical processes (e.g., photosynthesis, phenology, and soil–water dynamics). Their primary strength lies in their strong interpretability and the ability to conduct “what-if” scenario analyses under changing environmental conditions [21,26]. However, a significant limitation hindering their widespread operational application is their dependency on extensive, high-quality input parameters—including detailed soil profiles, cultivar-specific genetic coefficients, and precise daily management data—which are often difficult or costly to obtain at regional scales [27]. This data requirement challenge, coupled with the computational complexity of running these models, has motivated the exploration of data-driven machine learning approaches that can learn complex, non-linear relationships directly from more readily available historical data, thus providing a complementary and often more practical pathway for large-area yield forecasting [28].

Substantial research efforts have been devoted to developing models for wheat yield forecasting, particularly through the utilization of multivariate time-series data. Earlier studies relied predominantly on linear statistical techniques to capture temporal yield variations. However, with advances in computing capabilities, machine learning and deep learning methods have increasingly been adopted across diverse domains—including image analysis, language processing, and signal interpretation [27]. Classical machine learning algorithms such as support vector machines (SVMs) and random forests (RFs) have been extensively applied in remote sensing–related tasks [28,29] and in agricultural yield prediction [30,31,32]. Building on these developments, a range of machine learning, deep learning, and hybrid frameworks have reported improved performance for wheat yield estimation.

Alongside these approaches, statistical analysis methods have continued to evolve. For example, Niedbała G. [33] proposed a multi-head linear regression framework for wheat prediction, while Amin et al. [34] demonstrated the effectiveness of the AutoRegressive Integrated Moving Average (ARIMA) model for forecasting wheat yields. Nevertheless, methods grounded solely in statistical assumptions often struggle to represent complex nonlinear dependencies, and their predictive capability may degrade when dealing with extended time horizons or numerous interacting variables.

Recent years have seen rapid advances in machine learning and deep learning, leading to their widespread use in various predictive modeling tasks. Sequence-based architectures, particularly RNN and LSTM models, have gained prominence due to their ability to represent temporal dynamics. Comparative evaluations of multiple machine learning models, including LSTM, have shown that both linear regression and LSTM approaches can produce competitive results in agricultural yield forecasting [35]. Enhanced variants of LSTM have also been introduced, such as the Deep LSTM (DLSTM) model, which demonstrated improved accuracy for production-related time-series problems [36], and an LSTM optimized using the Improved Optimization Framework (IOF), which further strengthened yield prediction performance [37]. Additional hybrid designs have been proposed, including an ARIMA–LSTM combination for wheat forecasting [38] and a CNN–LSTM architecture augmented with multi-head attention and skip connections [39].

Although the aforementioned deep learning models have achieved promising results in capturing temporal patterns and improving predictive performance, most existing architectures rely primarily on climatic variables or single-dimensional temporal features. For instance, LSTM variants and their optimized forms [36,37,40] exhibit strong nonlinear modeling capability, yet their performance may degrade in data-scarce scenarios due to the absence of mechanisms for parameter reuse across regions. Similarly, hybrid ARIMA–LSTM approaches [41] and CNN–LSTM-based architectures with attention and skip connections [39] effectively extract spatiotemporal features but often require sufficiently large datasets to fully exploit their representational capacity. In contrast, the transfer learning strategy adopted in the present study aims to alleviate limitations associated with small-sample datasets by allowing the model to reuse knowledge from ecologically similar regions, thereby improving generalization performance without depending solely on increasing data volume.

Furthermore, recent studies have incorporated attention mechanisms into neural architectures to emphasize critical temporal features and enhance interpretability, as demonstrated in attention-augmented LSTM and CNN–LSTM models [39]. While such methods strengthen feature discrimination, they predominantly focus on direct climatic variables and rarely integrate indirect socio-economic indicators that also influence yield fluctuations. The approach proposed in this study complements existing research by jointly modeling direct meteorological factors and indirect agricultural variables within a parallel spatiotemporal architecture. The inclusion of cross-regional transfer learning further enables the model to capture invariant patterns that conventional deep learning models [36,37,40] may overlook. Rather than replacing prior architectures, the present framework extends their applicability to small-sample, multi-factor agricultural prediction scenarios.

3. Methods

3.1. Data Preprocessing

The dataset used in this study is drawn entirely from provincial government statistical yearbooks covering 1993 to 2024 and includes features such as wheat output, grain output, and wheat cultivation area. Because the yearbooks are official, authoritative sources, no missing-value or outlier handling was applied. Normalization is still required, however: all features are rescaled proportionally to the range [0, 1], so that their influence on the yield predictions can be evaluated on a common scale. The Min–Max scaling method was applied for this transformation, formalized as follows:

$$X^{'} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{1}$$

where $X^{'}$, $X_{\max}$, and $X_{\min}$ represent the normalized value and the maximum and minimum values of the original feature, respectively.
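As an illustration of Equation (1), the following minimal sketch scales one feature column to [0, 1]. The function name `min_max_scale`, the handling of constant columns, and the sample precipitation values are assumptions made for this example and do not come from the yearbook data.

```python
def min_max_scale(values):
    """Scale a sequence of numbers to [0, 1] following Equation (1)."""
    x_min, x_max = min(values), max(values)
    span = x_max - x_min
    if span == 0:
        # Assumed convention: a constant feature is mapped to 0.0.
        return [0.0 for _ in values]
    return [(x - x_min) / span for x in values]


# Hypothetical precipitation values in mm (illustrative only).
precipitation = [520.0, 610.0, 480.0, 700.0]
scaled = min_max_scale(precipitation)
```

A common precaution, not stated in the source, is to compute the minimum and maximum on the training years only and reuse them for the validation and test years.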

3.2. Baselines

This section describes the models used as baselines for comparison with the proposed TPCLA: RNN, LSTM, LSTM-Attention, and PCLA.

3.2.1. RNN

In deep learning architectures, RNNs are specifically designed for modeling sequential temporal data. Their inherent structure integrates information from preceding timesteps with current inputs, thereby enabling the effective capture of temporal dependencies within sequences. A standard RNN unit typically consists of three layers: an input layer, a hidden (recurrent) layer, and an output layer, with the hidden state updated iteratively at each timestep. The unfolded architecture is illustrated in Figure 1.

In Figure 1, $x_{t}$ is the input at timestep $t$, $h_{t}$ is the hidden state, and $y_{t}$ is the output. The update of $h_{t}$ can be calculated as follows:

$$h_{t} = \sigma(W_{hh} \cdot h_{t-1} + W_{xh} \cdot x_{t} + b_{h}) \tag{2}$$

where $W_{hh}$ is the weight matrix for recurrent connections, $W_{xh}$ is the weight matrix mapping inputs to the hidden state, $b_{h}$ is the hidden-layer bias term, and $\sigma$ is the activation function.

The output $y_{t}$ is computed as follows:

$$y_{t} = \mathrm{softmax}(W_{hy} \cdot h_{t} + b_{y}) \tag{3}$$

where $W_{hy}$ represents the weight matrix for the hidden-to-output transformation, and $b_{y}$ is the output-layer bias term.
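The recurrence in Equations (2) and (3) can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: the activation σ is assumed to be tanh, and the layer sizes and random weights are hypothetical.

```python
import numpy as np


def softmax(z):
    z = z - np.max(z)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()


def rnn_step(x_t, h_prev, W_hh, W_xh, b_h, W_hy, b_y):
    """One recurrent step following Equations (2) and (3)."""
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t + b_h)  # sigma assumed to be tanh
    y_t = softmax(W_hy @ h_t + b_y)
    return h_t, y_t


rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 8, 4, 3  # hypothetical sizes
W_hh = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
W_xh = rng.normal(scale=0.1, size=(n_hidden, n_in))
W_hy = rng.normal(scale=0.1, size=(n_out, n_hidden))
b_h, b_y = np.zeros(n_hidden), np.zeros(n_out)

h = np.zeros(n_hidden)
for x_t in rng.random((3, n_in)):  # a three-step input sequence
    h, y = rnn_step(x_t, h, W_hh, W_xh, b_h, W_hy, b_y)
```

For a regression target such as yield, the softmax in the last step would typically be replaced by a linear output; the sketch keeps the softmax only to mirror Equation (3).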

3.2.2. LSTM

LSTM is an optimized variant of the RNN, specifically designed to address the vanishing and exploding gradient problems inherent in standard RNNs. Its core unit comprises three gating mechanisms (the forget, input, and output gates) along with a memory cell. This architecture enables the systematic regulation of information flow across time steps. By selectively retaining or discarding features from the previous state, it effectively propagates contextually relevant information forward in the sequence. The structure of the LSTM unit is shown in Figure 2.

The forget gate modulates historical memory preservation, the input gate controls new feature integration, and the output gate controls the current state output. The gating mechanism works in synergy with the memory cell, enabling the LSTM to preserve long-term states while remaining sensitive to short-term changes. This ability is crucial for modeling series with complex temporal dynamics.

The forget gate determines which information to discard from the memory cell. It processes the previous hidden state, $h_{t-1}$, and the current input, $x_{t}$, through a sigmoid activation function, generating the forget gate vector $f_{t}$. The governing equation is as follows:

$$f_{t} = \sigma(W_{f} \cdot [h_{t-1}, x_{t}] + b_{f}) \tag{4}$$

where $f_{t}$ is the retention proportion, $W_{f}$ is the weight matrix of the forget gate, $\sigma$ is the sigmoid function, $h_{t-1}$ is the hidden state from the previous timestep, $x_{t}$ is the current input, and $b_{f}$ is the bias vector.

The input gate regulates the incorporation of new information into the memory cell and operates in two stages.

Stage One: The input gate generates a vector, $i_{t}$, through a sigmoid activation function:

$$i_{t} = \sigma(W_{i} \cdot [h_{t-1}, x_{t}] + b_{i}) \tag{5}$$

Stage Two: A candidate value, $\tilde{C_{t}}$, is generated via the hyperbolic tangent (tanh) activation:

$$\tilde{C_{t}} = \tanh(W_{C} \cdot [h_{t-1}, x_{t}] + b_{C}) \tag{6}$$

The updated input value is derived by the element-wise multiplication of these two components, $i_{t} \odot \tilde{C_{t}}$. Here, $W_{i}$ and $W_{C}$ are weight matrices for the input gate and candidate cell, respectively; $b_{i}$ and $b_{C}$ are their corresponding bias terms; and $\odot$ is the Hadamard (element-wise) product. The memory cell serves as the core component of the LSTM unit, integrating outputs from both the forget and input gates to update its state, $C_{t}$:

$$C_{t} = f_{t} \odot C_{t-1} + i_{t} \odot \tilde{C_{t}} \tag{7}$$

This process ensures the selective retention of historical information and the controlled integration of new features.

The output gate determines the final hidden state, $h_{t}$. It first computes an output ratio, $o_{t}$, via sigmoid activation:

$$o_{t} = \sigma(W_{o} \cdot [h_{t-1}, x_{t}] + b_{o}) \tag{8}$$

where $W_{o}$ represents the output gate’s weight matrix, and $b_{o}$ is its bias vector.

The hidden state is then generated by filtering the memory cell through a tanh nonlinearity and scaling it with the output ratio:

$$h_{t} = o_{t} \odot \tanh(C_{t}) \tag{9}$$

where $h_{t}$ is the result that is propagated to subsequent timesteps, completing the recursive temporal dependency.
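The gate computations in Equations (4) to (9) can be combined into a single step function. The NumPy sketch below is illustrative only: it uses the standard LSTM cell update $C_{t} = f_{t} \odot C_{t-1} + i_{t} \odot \tilde{C_{t}}$ and matrix products for all gates, and the parameter dictionary, layer sizes, and random weights are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM step; each W_* acts on the concatenation [h_{t-1}, x_t]."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(params["W_f"] @ hx + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ hx + params["b_i"])      # input gate
    c_tilde = np.tanh(params["W_C"] @ hx + params["b_C"])  # candidate value
    c_t = f_t * c_prev + i_t * c_tilde                     # cell state (standard form)
    o_t = sigmoid(params["W_o"] @ hx + params["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                               # hidden state
    return h_t, c_t


rng = np.random.default_rng(0)
n_in, n_hidden = 8, 4  # hypothetical sizes
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(n_hidden, n_hidden + n_in))
    params[f"b_{gate}"] = np.zeros(n_hidden)

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
for x_t in rng.random((3, n_in)):  # three timesteps
    h, c = lstm_step(x_t, h, c, params)
```

With all weights and biases set to zero, every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved at each step, which gives a quick sanity check of the update.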

3.2.3. LSTM Attention

The LSTM-Attention model utilizes LSTM layers to capture temporal dependencies in sequential data, coupled with an attention mechanism that assigns context-dependent weights to salient input components. This design, inspired by human cognitive focus, dynamically directs computational resources toward the most relevant features, enhancing crucial information and filtering out noise—a capability particularly beneficial for sequential data analysis.

The attention mechanism computes context-aware representations through the following:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V \tag{10}$$

The triad of Q (Query), K (Key), and V (Value) is defined as follows:

Q: the target element requiring contextual processing.
K: elements in the input sequence used for similarity computation.
V: the actual content values of the input sequence.

Here, $d_{k}$ is the dimension of the key vectors. Dividing by $\sqrt{d_{k}}$ prevents the dot products from becoming too large, which would saturate the softmax and cause vanishing gradients.

The LSTM-Attention mechanism operates in three sequential stages.

Stage One, Similarity Scoring: calculate the pairwise affinity between Q and all K elements.

Stage Two, Probability Normalization: convert scores to attention weights via softmax.

Stage Three, Context Aggregation: compute the weighted summation of V using normalized weights.
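The three stages map directly onto Equation (10). The following NumPy sketch is a generic scaled dot-product attention, offered as an illustration rather than as the exact attention layer used in the experiments; the toy inputs are hypothetical.

```python
import numpy as np


def scaled_dot_product_attention(Q, K, V):
    """Return the context vectors and attention weights of Equation (10)."""
    d_k = K.shape[-1]
    # Stage One: similarity scores between each query and all keys.
    scores = Q @ K.T / np.sqrt(d_k)
    # Stage Two: softmax over the keys (shifted for numerical stability).
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Stage Three: weighted sum of the values.
    return weights @ V, weights


# Toy example: one query attending over three timesteps with identical keys.
Q = np.ones((1, 4))
K = np.ones((3, 4))
V = np.arange(6.0).reshape(3, 2)
context, weights = scaled_dot_product_attention(Q, K, V)
```

Because the three keys are identical, the weights are uniform and the context vector is the plain average of the rows of `V`.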

3.2.4. Parallel CNN-LSTM-Attention Model

To effectively capture the spatiotemporal dependencies embedded in multivariate time-series data, this study proposes a Parallel CNN–LSTM–Attention (PCLA) model. As illustrated in Figure 3, wheat yield is jointly influenced by spatial factors (e.g., soil properties, cultivated area, disaster-affected area) and temporal factors (e.g., precipitation, temperature, policy interventions), which exhibit strong interdependencies. The CNN branch extracts localized spatial patterns associated with soil and field characteristics, whereas the LSTM branch models long-term temporal dependencies and seasonal dynamics in variables such as rainfall and temperature. During feature fusion, these spatial and temporal representations are concatenated to construct an integrated descriptor, and an attention mechanism subsequently assigns adaptive weights to critical time steps—such as extreme weather events during key growth phases—enabling the model to identify both where (spatial) and when (temporal) influential factors occur.

The multivariate time-series data related to wheat production are structured into three-year segments and formatted as multivariate input arrays. Within the PCLA framework, the multi-year data are simultaneously fed into two parallel processing pathways. The CNN branch comprises two convolutional layers, each equipped with a 3 × 3 kernel, a stride of 1, batch normalization, ReLU activation, and adaptive max-pooling. To comply with the Conv1d operation, the input tensors are reshaped prior to convolution and subsequently restored to their temporal alignment after processing. In parallel, the LSTM branch consists of two stacked LSTM layers, designed to capture long-range dependencies and temporal patterns. The activation functions in the LSTM layers are optimized using particle swarm optimization (PSO). The LSTM processes the raw temporal sequences directly, ensuring an uninterrupted information flow along the time dimension.
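The three-year segmentation can be illustrated with a simple sliding window. The sketch below assumes that each window of three consecutive years is paired with the yield of the following year; the source does not state this pairing, and the function name and toy data are hypothetical.

```python
def make_windows(features, targets, window=3):
    """Build (window x n_features) inputs paired with the next year's target."""
    X, y = [], []
    for start in range(len(features) - window):
        X.append(features[start:start + window])
        y.append(targets[start + window])  # assumed: predict the year after the window
    return X, y


# Toy data: five "years" with two features each (illustrative only).
features = [[i, i * 10] for i in range(5)]
targets = [float(i) for i in range(5)]
X, y = make_windows(features, targets)
```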

During the fusion stage, the spatial features extracted via the CNN branch are concatenated with the temporal features produced through the LSTM branch along the feature axis. This concatenation integrates two complementary sources of information, resulting in a unified spatiotemporal representation whose dimensionality equals the sum of the CNN channels and LSTM hidden units. The fused representation is subsequently passed to the attention module, which consists of a two-layer neural structure that computes importance scores for each time step. These scores are normalized via a softmax function and applied to generate a weighted aggregation of the fused features. This process yields a compact context vector that emphasizes temporally salient components relevant to the prediction task. The context vector is then forwarded to a fully connected output layer to generate the final yield estimation. By combining the spatial modeling capability of CNNs, the temporal learning strength of LSTMs, and the adaptive weighting of attention, the model produces a more comprehensive and discriminative representation.
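The fusion and attention-pooling steps described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions: the branch outputs are replaced by random arrays, the two-layer scoring network uses a tanh hidden layer, and all sizes (`T`, `C`, `H`, `A`) and weight names are hypothetical.

```python
import numpy as np


def attention_pool(cnn_feats, lstm_feats, W1, b1, w2, b2):
    """Concatenate branch features and pool them over time with attention."""
    fused = np.concatenate([cnn_feats, lstm_feats], axis=-1)  # shape (T, C + H)
    hidden = np.tanh(fused @ W1 + b1)                         # first scoring layer (tanh assumed)
    scores = hidden @ w2 + b2                                 # one score per time step
    scores = scores - scores.max()                            # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()           # softmax over time
    context = weights @ fused                                 # weighted sum, shape (C + H,)
    return context, weights


rng = np.random.default_rng(0)
T, C, H, A = 3, 16, 32, 8             # time steps, CNN channels, LSTM units, scoring units
cnn_feats = rng.random((T, C))        # stand-in for the CNN branch output
lstm_feats = rng.random((T, H))       # stand-in for the LSTM branch output
W1, b1 = rng.normal(size=(C + H, A)), np.zeros(A)
w2, b2 = rng.normal(size=A), 0.0

context, weights = attention_pool(cnn_feats, lstm_feats, W1, b1, w2, b2)
w_out, b_out = rng.normal(size=C + H), 0.0
y_hat = context @ w_out + b_out       # fully connected output layer
```

The context vector has length `C + H`, matching the statement that the fused dimensionality equals the sum of the CNN channels and LSTM hidden units.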

Unlike conventional serial CNN–LSTM–Attention architectures—in which spatial features are learned first, and temporal dynamics are modeled sequentially—the PCLA model processes spatial and temporal information in parallel, thereby preserving the original temporal structure of the data. This design avoids distortions that may arise from early convolutional compression and enhances the overall utilization of the available features. Moreover, the attention mechanism improves interpretability and mitigates challenges associated with limited sample sizes by emphasizing critical spatiotemporal patterns.

3.2.5. Parallel CNN-LSTM-Attention Model Based on Transfer Learning

The Parallel CNN-LSTM-Attention Model Based on Transfer Learning (TPCLA) addresses the small volume and limited feature diversity of wheat yield data in a single region. It optimizes the PCLA model through cross-regional transfer learning, allowing the model to learn from data from other regions of the same type and thereby capture a richer set of data features. The aim is to improve prediction accuracy and to reduce the large prediction errors caused by the scarcity of wheat data.

The training process of this model is divided into two stages:

Pre-training stage: train on wheat datasets of similar dimensions or from similar agro-climatic regions to learn features that are invariant across the domain gap.

Fine-tuning stage: readjust the learning rate and train a second time on the Shandong wheat dataset, unfreezing the network layer by layer so that the pre-trained features are retained while the data features of the target domain are learned.
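One way to picture the layer-by-layer strategy in the fine-tuning stage is as a schedule that states which layers are trainable at each epoch. The framework-independent sketch below assumes that layers are released from the output side inward and that each stage lasts a fixed number of epochs; the layer names and both assumptions are hypothetical and are not taken from the source.

```python
def unfreeze_schedule(layer_names, epochs_per_stage):
    """Return (epoch, trainable_layers) pairs, unfreezing from the output inward."""
    schedule = []
    epoch = 0
    for n_unfrozen in range(1, len(layer_names) + 1):
        trainable = layer_names[-n_unfrozen:]  # the last n layers are trainable
        for _ in range(epochs_per_stage):
            schedule.append((epoch, list(trainable)))
            epoch += 1
    return schedule


# Hypothetical layer groups of a PCLA-style network, ordered input to output.
layers = ["cnn_branch", "lstm_branch", "attention", "output"]
plan = unfreeze_schedule(layers, epochs_per_stage=2)
```

In a deep learning framework, each entry of `plan` would translate into enabling gradients only for the listed layers during that epoch, typically with a reduced learning rate.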

3.3. Hyperparameters Optimization

Hyperparameters are critical factors influencing the performance of deep learning models. As model performance varies significantly with hyperparameter configurations, their appropriate selection is essential. In this study, the PSO algorithm was applied to optimize hyperparameters for a deep learning model. PSO randomly generates a population of particles, in which each particle represents a hyperparameter combination. The velocity and position of each particle were initialized, and the MAE was employed as the minimization objective function. During iterative updates, the particles’ individual historical best positions (pBest) and the global historical best position (gBest) were recorded. Upon the completion of the iterations, optimized hyperparameters were obtained. Table 1 summarizes the search ranges for each hyperparameter in the deep learning model optimization process.
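The search procedure can be sketched as a basic PSO loop. The example below is illustrative only: the inertia and acceleration coefficients are commonly used defaults rather than the authors' settings, and the objective is a hypothetical stand-in for the validation MAE, which in practice would require training a model for each particle.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `objective` over box `bounds` with a basic particle swarm."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    p_best = [p[:] for p in pos]                      # pBest positions
    p_best_val = [objective(p) for p in pos]
    g_idx = min(range(n_particles), key=p_best_val.__getitem__)
    g_best, g_best_val = p_best[g_idx][:], p_best_val[g_idx]  # gBest

    for _ in range(n_iters):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (p_best[i][d] - pos[i][d])
                             + c2 * r2 * (g_best[d] - pos[i][d]))
                # Move the particle and clamp it to the search range.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < p_best_val[i]:
                p_best[i], p_best_val[i] = pos[i][:], val
                if val < g_best_val:
                    g_best, g_best_val = pos[i][:], val
    return g_best, g_best_val


def fake_validation_mae(p):
    """Hypothetical stand-in objective with its minimum at (-3, 64)."""
    log_lr, hidden = p
    return (log_lr + 3.0) ** 2 + ((hidden - 64.0) / 64.0) ** 2


# Hypothetical search ranges: log10(learning rate) and number of hidden units.
bounds = [(-5.0, -1.0), (8.0, 128.0)]
best, best_val = pso(fake_validation_mae, bounds)
```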

3.4. Evaluation Criteria

The root mean square error ($RMSE$), the mean absolute error ($MAE$), and the coefficient of determination ($R^{2}$) were adopted as indicators for comparing the accuracy of the models. $RMSE$ is obtained by summing the squared differences between the true and predicted values, taking the average, and then taking the square root. Because it is expressed in the same units as the target, $RMSE$ is intuitive when the true and predicted values are of the same order of magnitude. The smaller the $RMSE$, the higher the average prediction accuracy. $MAE$ is calculated by summing the absolute differences between the true and predicted values and then taking the average. The smaller the $MAE$, the higher the prediction accuracy. $R^{2}$ is one minus the ratio of the residual sum of squares (the dispersion of the predictions about the true values) to the total sum of squares (the dispersion of the true values about their mean), reflecting the goodness of fit of the model. The closer its value is to 1, the better the fit. The specific formulas are as follows:

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} {(y_{i} - \hat{y_{i}})}^{2}} \tag{11}$$

$$MAE = \frac{1}{n} \sum_{i=1}^{n} | y_{i} - \hat{y_{i}} | \tag{12}$$

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} {(y_{i} - \hat{y_{i}})}^{2}}{\sum_{i=1}^{n} {(y_{i} - \bar{y})}^{2}} \tag{13}$$

In the equations, $n$ represents the number of data points, $y_{i}$ is the actual wheat yield, $\hat{y_{i}}$ is the predicted wheat yield, and $\bar{y}$ is the average wheat yield.
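Equations (11) to (13) translate directly into code. The sketch below uses only the Python standard library; the function names and the sample values are chosen for illustration and are not results from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error, Equation (11)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error, Equation (12)."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def r2(y_true, y_pred):
    """Coefficient of determination, Equation (13)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 6.0]
scores = (rmse(y_true, y_pred), mae(y_true, y_pred), r2(y_true, y_pred))
```

In this toy case a single error of 2 on four points gives an RMSE of 1.0, an MAE of 0.5, and an $R^{2}$ of 0.2, which shows how RMSE penalizes one large error more heavily than MAE does.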

4. Experiments

4.1. Wheat Dataset Preprocessing

The wheat production data are sourced from government statistical yearbooks of multiple provinces in China from 1993 to 2024. As official datasets, they are considered highly reliable and were used without outlier handling. The dataset includes eight features: rainfall; the average temperature; the total cultivated land area; the total grain output; the total output value of agriculture, forestry, animal husbandry, and fishery; the total agricultural output value; the total power of agricultural machinery; and the disaster-affected area. The target variable is the wheat yield per unit area. The data from 1993 to 2012 were used for training, the data from 2013 to 2016 for validation, and the data from 2017 to 2024 for testing.
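The chronological split can be expressed as a small helper. The sketch below assumes each record is a dictionary with a `year` key, which is a hypothetical structure chosen for illustration; only the year boundaries come from the text.

```python
def split_by_year(records):
    """Split records chronologically into training, validation, and test sets."""
    train = [r for r in records if 1993 <= r["year"] <= 2012]
    val = [r for r in records if 2013 <= r["year"] <= 2016]
    test = [r for r in records if 2017 <= r["year"] <= 2024]
    return train, val, test


# One placeholder record per year (real records would also hold the eight features).
records = [{"year": year} for year in range(1993, 2025)]
train, val, test = split_by_year(records)
```

Per region, this yields 20 training years, 4 validation years, and 8 test years, which illustrates the small-sample setting that motivates transfer learning.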

4.2. Feature Analysis

A feature analysis was conducted on the above-mentioned feature data to obtain the influence weights of each feature on wheat yields. This paper adopts the Pearson correlation coefficient and the Spearman correlation coefficient for the correlation analysis of each feature. The calculation formulas are as follows, respectively:

$$\rho_{XY} = \frac{\mathrm{Cov}(X, Y)}{\sigma_{x} \sigma_{y}} \tag{14}$$

$$r_{s} = 1 - \frac{6 \sum_{i=1}^{n} d_{i}^{2}}{n (n^{2} - 1)} \tag{15}$$

where $\rho_{XY}$ and $r_{s}$ are the Pearson and Spearman correlation coefficients, respectively; $X$ and $Y$ are the feature and the target variable, respectively; $\mathrm{Cov}(X, Y)$ is the covariance of $X$ and $Y$; $\sigma_{x}$ and $\sigma_{y}$ are the standard deviations of $X$ and $Y$; $n$ is the number of data points; and $d_{i}$ is the difference between the ranks of $X_{i}$ and $Y_{i}$.
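Both coefficients in Equations (14) and (15) can be computed with the standard library alone. The sketch below is illustrative: the rank-difference form of the Spearman coefficient is exact only when there are no tied values, and the sample data are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient, Equation (14)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


def ranks(values):
    """Rank values from 1 to n; assumes no ties."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        result[idx] = rank
    return result


def spearman(x, y):
    """Spearman rank correlation, Equation (15); valid when there are no ties."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d_squared / (n * (n ** 2 - 1))


# Illustrative data: a monotone but nonlinear relationship.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]
rho, r_s = pearson(x, y), spearman(x, y)
```

For this monotone but nonlinear pair, the Spearman coefficient is exactly 1 while the Pearson coefficient is slightly below 1, which is why reporting both, as in Table 2, is informative.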

As shown in Table 2, both the Pearson and Spearman correlation coefficients indicate that the total power of agricultural machinery exhibits the highest correlation with wheat yields, suggesting that it has the most substantial impact, followed by the total agricultural output value. Furthermore, the disaster-affected area is negatively correlated with wheat yields, whereas average temperature and precipitation show the weakest correlations.

4.3. Wheat Yield Forecasting Performance Evaluation of the Models

The hyperparameters optimized using the particle swarm algorithm are listed in Table 3. All models use the Adam optimizer and the MAE as the loss function.

As shown in Table 4, all models maintain prediction errors below 100 units at the thousand-unit scale while achieving relatively high confidence levels within a 2% confidence interval. These results demonstrate reliable predictive capability despite the inherent complexity of crop yield systems, justifying the selection of these models as benchmarks.

Specifically, the LSTM-Attention model achieves the smallest prediction errors in most years, attaining optimal forecasting accuracy (CD = 0.194) in the 2024 comprehensive evaluation. The traditional RNN exhibits relatively larger errors, whereas the TPCLA model demonstrates the most prominent confidence performance (0.803). Overall, all models effectively predict wheat yields while maintaining high confidence levels, validating the applicability and stability of the selected modeling framework.

As shown in Figure 4, the predicted wheat yield values from all models generally follow the trend of the actual values, though certain years exhibit notable systematic prediction deviations. Specifically, in 2018, all models consistently overestimated the actual yields, with the LSTM-Attention model showing the largest deviation (predicted 4.640 vs. actual 4.492). Conversely, in 2021, all models except PCLA underestimated the actual yield.

In terms of long-term performance, the TPCLA model demonstrated the best prediction stability throughout the entire period, with its prediction curve being closest to the actual values. Particularly between 2021 and 2024, the prediction accuracy of TPCLA was significantly superior to other benchmark models. It is noteworthy that the traditional RNN model exhibited large prediction errors in multiple years (e.g., 2022 and 2023), while the LSTM-Attention model tended to consistently overestimate the actual yield.

The phenomenon of multiple models exhibiting unidirectional prediction biases in specific years suggests the potential presence of systematic external influencing factors not captured in the model feature set, such as extreme climate events, agricultural policy adjustments, or global market fluctuations. This observation not only validates the consistency characteristics of the model predictions but also provides important clues for further investigation into key external factors affecting wheat yields.

As can be seen from Figure 5, the TPCLA model exhibits a narrower error distribution that is more concentrated near zero with a distinct left-skewed pattern. The PCLA model also demonstrates errors close to zero but with a wider distribution and notable right-skewed characteristics. Although the LSTM-Attention model achieves the narrowest error distribution, its overall errors deviate significantly from zero, indicating larger prediction inaccuracies and consequently higher training errors compared to the former two models. In contrast, both the RNN and LSTM models display wide error distributions with a substantial deviation from zero, resulting in notably inferior prediction accuracy relative to the LSTM-Attention, PCLA, and TPCLA models.

In summary, the TPCLA model optimized through cross-regional transfer learning achieves a compact error distribution centered near zero, effectively enhancing data utilization efficiency. This approach fully leverages the value of limited data samples and addresses data scarcity challenges in wheat yield prediction by enabling extensive feature learning through transfer learning mechanisms.

Table 5 compares model performance over the full test period, and TPCLA performs best. Compared with the second-best model, LSTM-Attention, it achieves an 18.36% reduction in RMSE, a 12.60% decrease in MAE, and a 4.39% improvement in R², supporting the feasibility of model optimization through transfer learning on small-sample wheat yield datasets. Furthermore, the LSTM-Attention model without transfer learning still outperforms the other benchmark architectures on the evaluation metrics, indicating that this attention-enhanced model can further improve prediction accuracy by enhancing data utilization. In conclusion, effective knowledge transfer can be achieved through cross-regional pre-training for parameter initialization followed by model fine-tuning. This approach helps overcome data scarcity constraints in target regions while improving prediction accuracy and fitting performance, thereby providing a viable methodology for accurate and efficient crop yield prediction.
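The relative improvements can be recomputed from the entries of Table 5, as in the short sketch below. The helper name `relative_change` is an assumption for this example. Because Table 5 reports values rounded to three decimals, the recomputed percentages can differ in the second decimal from figures derived from unrounded results.

```python
# Values copied from Table 5 (rounded to three decimals).
table5 = {
    "LSTM-Attention": {"RMSE": 0.482, "MAE": 0.373, "R2": 0.866},
    "TPCLA": {"RMSE": 0.394, "MAE": 0.326, "R2": 0.904},
}


def relative_change(baseline, new):
    """Signed relative change of `new` with respect to `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


base, ours = table5["LSTM-Attention"], table5["TPCLA"]
changes = {m: relative_change(base[m], ours[m]) for m in ("RMSE", "MAE", "R2")}
for metric, change in changes.items():
    print(f"{metric}: {change:+.2f}%")
```

Negative values for RMSE and MAE indicate error reductions, and a positive value for R² indicates a better fit.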

5. Discussion

Our experimental results demonstrate the effectiveness of the proposed TPCLA model in predicting wheat yields using limited time-series data. A particularly noteworthy finding emerged from the systematic analysis of prediction errors across multiple models. As illustrated in Figure 4, all models consistently overestimated the actual yield in 2018, while a collective underestimation occurred in 2021. This consistent directional bias across diverse architectures strongly suggests the influence of external drivers not captured in the feature set, rather than model failure. In fact, this sensitivity underscores the model’s capacity to detect unquantified external shocks.

The feature analysis in Table 2 identified the total power of agricultural machinery as the most influential positive factor, while the disaster-affected area exhibited a negative correlation. This provides a plausible explanation for the observed biases: the overestimation in 2018 may be attributed to unrecorded extreme climate events (e.g., regional drought) that reduced actual yields beyond model expectations, whereas the underestimation in 2021 could reflect the impact of a potent yield-enhancing policy (e.g., the promotion of new cultivars or temporary subsidies), whose effect exceeded projections based on historical data alone.

This insight elevates the role of our model from a mere forecasting tool to a diagnostic instrument, capable of retrospectively revealing significant external factors—such as policy efficacy or extreme weather impacts—that are otherwise poorly documented. Such a capability offers quantitative support for governmental evaluation of agricultural policies and the development of risk mitigation strategies.

In terms of predictive performance, the TPCLA model achieved the best results across all core metrics (RMSE, MAE, and R²). It reduced RMSE and MAE by 18.36% and 12.60%, respectively, compared with the second-best LSTM-Attention model, while raising R² to 0.904. This marked improvement supports the efficacy of cross-regional transfer learning, which enables the model to learn common yield-influencing patterns (e.g., climatic trends and cropping systems) during pre-training and subsequently adapt to the local characteristics of the target region during fine-tuning. This approach mitigates the overfitting common in small-sample scenarios and enhances generalization capability.

5.1. Comparative Advantages over Biophysical Models

When compared to traditional biophysical crop models, the TPCLA framework exhibits several distinct advantages in the context of regional yield prediction:

(1) Reduced Data Dependency and Enhanced Practicality: Biophysical models require extensive and often unavailable input parameters, such as detailed soil properties, genetic coefficients, and daily management records. In contrast, TPCLA relies solely on publicly available macroscopic indicators (e.g., cultivated area, gross agricultural output, machinery power), significantly lowering data acquisition barriers and enabling scalable and rapid yield estimation.

(2) Integration of Complex System Dynamics and Implicit Knowledge: While biophysical models excel at simulating known physiological processes, they struggle to incorporate socio-economic and human decision factors, such as policy shifts or market responses. TPCLA, as a data-driven approach, automatically learns the composite effects of these factors from historical data. The systematic prediction biases observed in 2018 and 2021 exemplify the model’s ability to internalize the impact of external shocks not explicitly included in the feature set.

(3) Computational Efficiency and Rapid Deployment: Biophysical models are computationally intensive, often requiring complex simulations at fine spatial resolutions. Once trained, TPCLA performs predictions via a single forward pass, allowing frequent and timely updates—a critical feature for supporting real-time agricultural decision-making.

(4) Generalization via Knowledge Transfer in Small-Sample Settings: A key contribution of this study is the use of cross-regional transfer learning to overcome data scarcity. By pre-training on data from agronomically similar regions, the model captures universal spatio-temporal patterns before fine-tuning on the target region (Shandong). This “learn-and-adapt” strategy effectively mitigates overfitting and yields more robust predictions on limited local data (32 annual observations, 1993 to 2024), as validated through the test results.
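The pre-train-then-fine-tune idea in point (4) can be shown in miniature. The sketch below is not the TPCLA training procedure: a one-variable linear model stands in for the network, the source-region and target-region data are synthetic, and the learning rates and step counts are assumptions. It only illustrates initializing from source-region parameters and then taking a few small steps on scarce target data.

```python
def fit_linear(xs, ys, w=0.0, b=0.0, lr=0.01, steps=500):
    """Gradient descent on mean squared error for y = w*x + b, starting from (w, b)."""
    n = len(xs)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def mse(xs, ys, w, b):
    """Mean squared error of the linear model on (xs, ys)."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic source region: many samples following y = 2.0*x + 1.0.
source_x = [i / 10 for i in range(30)]
source_y = [2.0 * x + 1.0 for x in source_x]

# Synthetic target region: only three samples, similar but not identical relation.
target_x = [0.5, 1.5, 2.5]
target_y = [2.1 * x + 1.3 for x in target_x]

# Step 1: pre-train on the source region from a zero initialization.
w0, b0 = fit_linear(source_x, source_y, steps=2000)

# Step 2: fine-tune on the target region, starting from the pre-trained
# parameters, with a smaller learning rate and far fewer steps.
w1, b1 = fit_linear(target_x, target_y, w=w0, b=b0, lr=0.005, steps=50)

print(mse(target_x, target_y, w0, b0), mse(target_x, target_y, w1, b1))
```

Starting from the pre-trained parameters means the few target samples only have to correct a small regional offset instead of determining the whole model.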

5.2. Rationale for Feature Inclusion and Architectural Design

The inclusion of the Gross Output Value of Agriculture, Forestry, Animal Husbandry, and Fishery is statistically justified by its comparatively high correlation with wheat yield (Pearson = 0.682, Spearman = 0.723; Table 2). This variable serves as a proxy for regional agricultural development, indirectly capturing the effects of technological progress, capital investment, and policy support, factors that are difficult to quantify directly but critically influence yield outcomes.

In the parallel CNN-LSTM-Attention architecture, the 1D-CNN branch is employed to capture short-term local temporal patterns within multi-year windows. This design is motivated by the recognition that crop yields are often influenced by sequential conditions over consecutive years (e.g., sustained investment in agricultural machinery). While LSTM models long-term temporal dependencies, the CNN complements it by detecting localized, multi-year interactions that may signify critical preparatory phases for high yields. This parallel setup allows the model to leverage both short-term fluctuations and long-term trends, enhancing its capacity to represent complex agricultural systems without manual feature engineering.
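To make the role of the 1D-CNN branch and the attention step more concrete, the sketch below applies a single fixed kernel of size 3 (the kernel size reported in Table 3) to a yearly series and then pools the result with softmax weights. It is an illustration under stated assumptions, not the PCLA network: in the actual model the kernel values and attention scores are learned, and the series, kernel, and scores used here are hypothetical.

```python
import math


def conv1d_valid(series, kernel):
    """Slide `kernel` over `series` without padding.

    Each output value summarizes len(kernel) consecutive years.
    """
    k = len(kernel)
    return [sum(series[i + j] * kernel[j] for j in range(k))
            for i in range(len(series) - k + 1)]


def softmax(scores):
    """Numerically stable softmax: non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(values, scores):
    """Weighted sum of `values` using softmax-normalized `scores`."""
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values)), weights


# Hypothetical yearly series for one feature (e.g., machinery power).
machinery_power = [1.0, 2.0, 4.0, 7.0, 11.0]

# A fixed difference kernel: responds to the change across a 3-year window.
local_trend = conv1d_valid(machinery_power, kernel=[-1.0, 0.0, 1.0])
print(local_trend)  # [3.0, 5.0, 7.0]

# Equal scores give equal weights, so pooling reduces to the mean.
pooled, weights = attention_pool(local_trend, scores=[0.0, 0.0, 0.0])
print(pooled, weights)
```

With learned scores, the weights would no longer be equal, and the windows judged most informative would dominate the pooled value.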

6. Conclusions

This study investigates the effectiveness of transfer learning in improving wheat yield prediction under small-sample conditions. To address the challenges posed by limited data availability and complex multivariate dependencies, a Transfer-learning-based Parallel CNN–LSTM–Attention (TPCLA) model is proposed. By integrating cross-regional transfer learning with a parallel spatiotemporal feature extraction framework, the model enhances data utilization and effectively captures invariant yield-related patterns.

Comparative experiments among five deep learning architectures—RNN, LSTM, LSTM–Attention, PCLA, and TPCLA—demonstrate that TPCLA consistently achieves the highest accuracy and robustness across all evaluation metrics. The results confirm that transfer learning can mitigate the effects of data scarcity and improve model generalization, especially when yield is influenced by both direct climatic variables and indirect socio-economic factors. An analysis of prediction residuals further indicates that unobserved external influences, such as weather anomalies, policy interventions, and market fluctuations, contribute to systematic deviations. These findings highlight the practical value of TPCLA for supporting agricultural planning and policy formulation in data-constrained settings.

Future Work: Although the incorporation of indirect socio-economic indicators improves prediction performance, some variables, such as the total agricultural output value, naturally exhibit upward trends due to macroeconomic growth, inflation, or rising GDP. This may introduce spurious correlations or inflated importance in the model, especially when the true crop yield remains stable over time. Addressing this limitation represents an important direction for future research. Potential solutions include the following: (1) de-trending or economically normalizing long-term socio-economic indicators; (2) employing causality-aware feature selection or structural causal models to disentangle genuine yield determinants from confounding macroeconomic trends; (3) designing trend-robust architectures that explicitly separate short-term agronomic signals from long-term economic drift.

Future work will explore these approaches to further enhance model interpretability and prevent biased correlations that can arise from inherently trending variables.
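Option (1), de-trending, could take several forms. A minimal sketch of one of them, removing a least-squares straight line fitted against the year index, is given below. The function name `linear_detrend` and the example series are assumptions for illustration; the paper does not prescribe a specific de-trending method.

```python
def linear_detrend(values):
    """Subtract the least-squares line fitted against the time index 0..n-1.

    Returns (residuals, slope), where slope is the fitted change per time step.
    """
    n = len(values)
    t_mean = (n - 1) / 2
    v_mean = sum(values) / n
    slope = (sum((t - t_mean) * (v - v_mean) for t, v in enumerate(values))
             / sum((t - t_mean) ** 2 for t in range(n)))
    intercept = v_mean - slope * t_mean
    residuals = [v - (intercept + slope * t) for t, v in enumerate(values)]
    return residuals, slope


# Hypothetical, steadily growing output-value series (arbitrary units).
output_value = [100.0, 112.0, 118.0, 133.0, 140.0]
residuals, slope = linear_detrend(output_value)
print(slope, residuals)
```

The residuals keep the year-to-year fluctuations while discarding the steady growth, so a correlation computed on them is less likely to reflect only a shared upward trend.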

Author Contributions

Conceptualization, C.S., T.L., and T.X.; methodology, C.S. and T.L.; formal analysis, W.N. and T.X.; investigation, S.S., Z.L., S.O., X.S., and T.H.; resources, X.S.; data curation, S.O., X.S., and T.H.; writing—original draft, C.S., T.L., W.N., and T.X.; writing—review and editing, S.S., Z.L., S.O., X.S., T.H., and Z.Z.; visualization, T.H., Z.Z., T.C., and J.X.; supervision, T.C. and J.X.; project administration, J.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the Shandong Province College Students Innovation and Entrepreneurship Training Program (No. 202410435034, No. S202410435101, No. S202410435069, No. S202410435066, and No. S202410435065), in part by the Qingdao Science and Technology Demonstration project—the new modern agriculture project in 2024 (No. 24-2-8-xdny-11-nsh)—and in part by the Shandong Province Technology Innovation Guidance Plan (No. YDZX2024018).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. RNN unfolded structure.

Figure 2. LSTM structural unit.

Figure 3. PCLA network architecture.

Figure 4. Prediction results of wheat yield on five models.

Figure 5. Box chart of yield forecasting error on five models.

Table 1. Hyperparameter optimization range.

Hyperparameter	Range
Learning rate	[0.0001, 0.01]
First output nodes	[8, 96]
CNN kernel	[2, 5]
Activation function	['tanh', 'ReLU']
Epochs	[300, 900]

Table 2. Correlation coefficients of Pearson and Spearman.

Feature	Pearson	Spearman
Total Power of Agricultural Machinery (10⁴ kW)	0.773455	0.760421
Gross Agricultural Output Value (billion CNY)	0.703582	0.747912
Gross Output Value of Agriculture, Forestry, Animal Husbandry and Fishery (billion CNY)	0.682246	0.723209
Grain Output (10⁴ tons)	0.662487	0.664379
Total Cultivated Land Area (10³ hectares)	0.521171	0.507828
Disaster-Affected Area (10³ hectares)	−0.272738	−0.264948
Average Temperature (°C)	0.175619	0.118955
Rainfall (mm)	0.167748	0.230432

Table 3. Search results of particle swarm optimization for deep learning models.

Hyperparameter	RNN	LSTM	LSTM-Attention	PCLA	TPCLA
Learning rate	0.00123	0.00572	0.00409	0.00220	0.00125
First output nodes	70.06	93.58	62.78	62.56	43
CNN kernel	-	-	-	3	3
Activation function	tanh	tanh	tanh	tanh	relu
Epochs	626	417	631	548	490

Table 4. The forecast error value of wheat yields.

Year	RNN	LSTM	LSTM-Attention	PCLA	TPCLA
2017	−0.0018	−0.033	0.087	−0.084	−0.026
2018	−0.093	0.029	0.148	−0.054	−0.171
2019	−0.253	−0.098	−0.016	−0.082	−0.195
2020	−0.235	−0.028	0.040	−0.081	−0.173
2021	−0.207	−0.084	0.096	0.004	−0.226
2022	−0.435	−0.088	0.060	−0.030	−0.171
2023	−0.424	−0.115	0.070	0.008	−0.163
2024	−0.332	−0.102	0.194	−0.069	−0.149
Confidence degree	0.607	0.744	0.66	0.617	0.803

Table 5. Model performance index results.

Model	RMSE	MAE	R-Squared
RNN	0.548	0.415	0.813
LSTM	0.813	0.465	0.548
LSTM-Attention	0.482	0.373	0.866
PCLA	0.490	0.417	0.851
TPCLA	0.394	0.326	0.904

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
