Article

Multi-Source Feature Fusion Network for LAI Estimation from UAV Multispectral Imagery

1 School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China
2 School of the Environment and Safety Engineering, Jiangsu University, Zhenjiang 212013, China
3 School of Mechatronic Engineering, Taizhou University, Taizhou 225300, China
4 School of Food and Biological Engineering, Jiangsu University, Zhenjiang 212013, China
5 Key Laboratory for Theory and Technology of Intelligent Agricultural Machinery and Equipment, Jiangsu University, Zhenjiang 212013, China
6 Jiangsu Province and Education Ministry Cosponsored Synergistic Innovation Center of Modern Agricultural Equipment, Jiangsu University, Zhenjiang 212013, China
* Authors to whom correspondence should be addressed.
Agronomy 2025, 15(4), 988; https://doi.org/10.3390/agronomy15040988
Submission received: 25 March 2025 / Revised: 12 April 2025 / Accepted: 19 April 2025 / Published: 20 April 2025
(This article belongs to the Section Precision and Digital Agriculture)

Abstract:
The leaf area index (LAI) is a critical biophysical parameter that reflects crop growth conditions and the canopy photosynthetic potential, serving as a cornerstone in precision agriculture and dynamic crop monitoring. However, traditional LAI estimation methods that rely on single-source remote sensing data often suffer from insufficient accuracy in high-density vegetation scenarios, limiting their capacity to reflect crop growth variability comprehensively. To overcome these limitations, this study introduces an innovative multi-source feature fusion framework utilizing unmanned aerial vehicle (UAV) multispectral imagery for precise LAI estimation in winter wheat. RGB and multispectral datasets were collected across seven different growth stages (from regreening to grain filling) in 2024. Through the extraction of color attributes, spatial structural information, and eight representative vegetation indices (VIs), a robust multi-source dataset was developed to integrate diverse data types. A convolutional neural network (CNN)-based feature extraction backbone, paired with a multi-source feature fusion network (MSF-FusionNet), was designed to effectively combine spectral and spatial information from both RGB and multispectral imagery. The experimental results revealed that the proposed method achieved superior estimation performance compared to single-source models, with an R2 of 0.8745 and RMSE of 0.5461, improving the R2 by 36.67% and 5.54% over the RGB and VI models, respectively. Notably, the fusion method enhanced the accuracy during critical growth phases, such as the regreening and jointing stages. Compared to traditional machine learning techniques, the proposed framework exceeded the performance of the XGBoost model, with the R2 rising by 4.51% and the RMSE dropping by 12.24%. Furthermore, our method facilitated the creation of LAI spatial distribution maps across key growth stages, accurately depicting the spatial heterogeneity and temporal dynamics in the field. These results highlight the efficacy and potential of integrating UAV multi-source data fusion with deep learning for precise LAI estimation in winter wheat, offering significant insights for crop growth evaluation and precision agricultural management.

1. Introduction

Wheat (Triticum aestivum L.) is one of the most important cereal crops globally, serving as a staple food for approximately 40% of the global population. The accurate monitoring of wheat growth is crucial in ensuring national food security and sustainability [1,2,3]. The leaf area index (LAI), defined as the total leaf area per unit ground surface, is a key metric, closely linked to wheat’s photosynthetic capacity, nutrient cycling, biomass accumulation, and yield potential. Consequently, the LAI is widely acknowledged as an essential indicator in monitoring crop growth and development [4,5]. However, traditional LAI measurement depends largely on manual field surveys and laboratory analyses. These methods are laborious, time-intensive, destructive, and lack real-time monitoring capabilities, making them impractical for large-scale agricultural monitoring.
In recent years, UAV remote sensing has emerged as an effective and versatile tool for agricultural monitoring, owing to its flexibility and ability to acquire high-resolution data [6,7,8]. Equipped with RGB, multispectral, or hyperspectral sensors, UAVs efficiently capture detailed spectral, structural, and textural information from crop canopies, offering a robust data source for precise LAI estimation. Several studies have integrated UAV-derived data with machine learning algorithms to enhance the LAI predictive accuracy [9,10,11]. For instance, Liu et al. [12] combined UAV multispectral imagery with the random forest (RF) algorithm to estimate the spring wheat LAI across multiple growth stages, achieving an R2 of 0.834. Similarly, Ma et al. [13] employed the extreme learning machine (ELM) algorithm with UAV hyperspectral data to accurately invert the cotton LAI under varying nitrogen conditions, yielding a validation R2 of 0.9066. Cheng et al. [14] utilized ensemble learning to predict the maize LAI under water and fertilizer stress, reporting an R2 of 0.876 and an RMSE of 0.481.
Despite these advances, many existing methods depend heavily on the spectral VI, which often exhibits saturation effects in dense canopies or late growth stages, reducing the estimation accuracy. Li et al. [15] found that relying solely on visible-band color indices fails to mitigate spectral saturation in rice, although incorporating texture and morphological features markedly improves the model stability and generalization. Recent developments in multi-source data fusion have broadened the scope of crop parameter estimation. Yu et al. [16] successfully fused multispectral, thermal infrared, and canopy structure data from UAVs with the RF model to enhance maize LAI estimation. Likewise, Du et al. [17] integrated spectral reflectance, VI, and texture information using RF to improve rice LAI prediction under nitrogen stress. However, traditional machine learning models like RF and support vector machines (SVM) depend on manually extracted features, limiting their capacity to address the complex nonlinearities and high-dimensional temporal dynamics inherent in crop growth monitoring.
Deep learning technologies have recently advanced agricultural image analysis, leveraging their superior capabilities in nonlinear modeling and automatic feature extraction [18,19,20]. Models such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and their variants have been widely applied to crop LAI estimation [21], pest and disease detection [22], and soil moisture monitoring [23]. For example, Zhao et al. [24] developed the DPeNet model, incorporating residual blocks and deformable convolution modules, achieving average precision of 0.932 for multi-scale pest detection. Wittstruck et al. [25] used CNNs with UAV-based RGB imagery and digital surface model (DSM) data to estimate the winter wheat LAI, attaining an R2 of 0.83. Similarly, Wei et al. [26] combined a CNN-LSTM model with UAV multi-source data to predict the wheat yield, biomass, and straw-to-grain ratios with high accuracy.
Despite the widespread adoption of UAV remote sensing, machine learning, and deep learning in crop LAI estimation, several challenges persist: (1) most studies rely on single-sensor data, which restricts the information diversity and leads to spectral saturation in dense canopies or late growth stages, compromising the accuracy; (2) traditional methods struggle to model complex phenotypic traits and temporal dynamics, limiting their generalization across regions and management practices; (3) existing multi-source fusion approaches typically apply basic feature concatenation or statistical weighting, without fully utilizing the complementary physiological and structural traits embedded in heterogeneous data sources. The aforementioned challenges, particularly the problem of spectral saturation in dense canopies and the limited effectiveness of conventional fusion strategies, underscore the necessity of a more physiologically informed and theoretically grounded approach to multi-source data integration. Spectral saturation frequently arises when vegetation indices such as the NDVI or NDRE reach reflectance plateaus under high-biomass or late-growth-stage conditions, resulting in diminished sensitivity to LAI variations [27]. In contrast, RGB imagery, although lacking near-infrared sensitivity, captures the fine-scale morphological and textural characteristics of the canopy, including the gap fraction, leaf orientation, and spatial heterogeneity. These structural features often remain informative, even when spectral indices become saturated, providing a complementary source of phenotypic information [28].
Accordingly, a model that jointly utilizes both spectral and structural information can offer a more comprehensive representation of the crop canopy status across different developmental stages. Deep convolutional neural networks are particularly well suited for this task, as they can learn hierarchical and nonlinear interactions between spatial and spectral features without relying on manually defined indices. The proposed cross-modal feature fusion network (FusionNet) is designed to integrate information from RGB and VI inputs through parallel feature extraction branches and a late-stage fusion module. This architecture allows the model to capture both physiological traits, such as chlorophyll-related reflectance, and morphological traits, such as the canopy structure and complexity. By leveraging these complementary data characteristics, the model enhances the LAI estimation accuracy while alleviating the saturation effects that hinder traditional methods. In addition, multi-temporal UAV data from different growth stages are employed to validate the model’s robustness and generalization under diverse physiological and canopy conditions.
This method advances the technical design of UAV-based LAI estimation while providing theoretical insights into the integration of cross-modal canopy signals to address the inherent limitations of single-source remote sensing.

2. Materials and Methods

2.1. Overview of the Study Area

The study area is located at the Runguo Agricultural Base in Zhenjiang, Jiangsu Province, China (32°8′13″ N, 119°43′55″ E), at an elevation of approximately 12 m (Figure 1a). The terrain is flat, with yellow–brown soil predominant. The region experiences a subtropical monsoon climate, with an average annual temperature of 15.4 °C and annual precipitation of 800 to 1100 mm. The experimental field spans 23,700 m2 and is primarily cultivated with winter wheat. Sowing typically occurs in November, with harvesting completed by June of the following year. This study adopted a plot-based planting design, incorporating 21 winter wheat varieties, predominantly from the Yangmai and Zhenmai series (Figure 1c). Each variety, numbered from 1 to 21, represents a unique genotype. For varieties 1 to 20, four sampling points were evenly distributed per variety, while variety 21 included 11 sampling points to ensure a uniform spatial distribution. The spatial distribution of all sampling points is shown in Figure 1b.
To minimize confounding effects due to agronomic management, all plots were subjected to uniform cultivation practices, including standardized fertilization, irrigation, and weed/pest control protocols. Fertilizer application rates were determined based on pre-planting soil nutrient assessments, and irrigation was implemented following a consistent schedule across all plots. This uniform management ensured that the observed differences in the LAI could be primarily attributed to varietal/genotypic traits or the phenological stage, rather than discrepancies in resource availability.

2.2. Data Acquisition

2.2.1. UAV Image Data Acquisition

A DJI Phantom 4 Multispectral (P4M) UAV was utilized for image acquisition in this study. It integrates a multispectral imaging system comprising five narrowband multispectral sensors (blue: 450 ± 16 nm, green: 560 ± 16 nm, red: 650 ± 16 nm, red edge: 730 ± 16 nm, and near-infrared (NIR): 840 ± 26 nm) and an RGB camera. The data collection occurred between 12 March and 8 May 2024, coinciding with seven key growth stages of winter wheat, from regreening to grain filling.
To minimize the impact of variable solar angles and to ensure consistent illumination, UAV flights were conducted exclusively between 10:00 and 14:00 local time under clear sky conditions and with wind speeds below 5 m/s. Images were captured in time-lapse mode at 2 s intervals, maintaining a forward overlap of 80% and a side overlap of 75%, optimized for orthophoto generation and three-dimensional (3D) reconstruction. Additionally, ten ground control points (GCPs) were distributed throughout the study area and measured using an RTK-GPS system with positioning accuracy of ±2 cm to enhance the geometric accuracy of image correction. A radiometric calibration was performed before each flight using a standard reflectance panel to mitigate the effects of atmospheric scattering and absorption, ensuring reliable multispectral data quality.

2.2.2. LAI Acquisition

Simultaneously with UAV image acquisition, LAI data were collected across the sampling plots, with each sampling frame being 1 m × 1 m. The LAI of the winter wheat canopy was measured using the SunScan system (Delta-T Devices Ltd., Cambridge, UK). Measurements were carried out during key growth stages under clear and windless conditions, between 11:00 and 14:00 local time. At each critical growth stage, 91 sampling plots were manually measured. To ensure data accuracy and stability, measurements were taken at five locations within each plot (four corners and the center). Each sampling point was measured 10 times, and outliers were excluded before calculating the average LAI value for each plot.

2.3. Data Processing

2.3.1. UAV Image Processing

This study employed the Pix4Dmapper software (version 4.6.3, Pix4D S.A., Lausanne, Switzerland) to process the raw multispectral UAV imagery through a workflow involving image mosaicking, geometric correction, radiometric calibration, and band registration. The procedure included the following: (1) multispectral images were imported into Pix4Dmapper, which automatically retrieved the spatial coordinates and flight orientation data; (2) image alignment was achieved through feature point matching and multi-view stereo (MVS) techniques, producing a dense point cloud refined through filtering methods; (3) a high-resolution digital surface model (DSM) and a georeferenced digital orthophoto map (DOM) were generated; (4) geometric corrections were applied using in situ ground control points (GCPs) to reduce mosaicking errors and terrain distortions; (5) radiometric calibration was performed with a synchronized standard reflectance panel to derive accurate surface reflectance properties.
Subsequently, the processed multispectral imagery was subjected to band composition using the ENVI 5.3 software (L3Harris Geospatial, Boulder, CO, USA). Regions of interest (ROIs) were defined within ENVI based on sampling points distributed across the study area, delineating the extent of subsequent data cropping. Finally, using Python 3.9 with the Rasterio and Geopandas libraries, UAV images from seven growth stages were cropped into sample images consistent with the actual sampling areas. For each sampling point, RGB and spectral images were extracted within ROI boundaries and synchronized with ground-based LAI measurements for subsequent vegetation index calculation and model training.
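As a sketch of this cropping step, the snippet below clips per-plot patches from the band-composited orthomosaic using Rasterio and Geopandas, the libraries named in the study; the file paths and the "point_id" field are hypothetical placeholders.

```python
import geopandas as gpd
import rasterio
from rasterio.mask import mask

# Hypothetical paths and field names; the actual ROI layer and orthomosaic differ.
rois = gpd.read_file("sampling_rois.gpkg")            # one polygon per sampling point
with rasterio.open("winter_wheat_DOM.tif") as src:
    rois = rois.to_crs(src.crs)                       # align ROI CRS with the raster
    for _, roi in rois.iterrows():
        # Clip the orthomosaic to the ROI footprint; all bands are retained.
        patch, transform = mask(src, [roi.geometry], crop=True)
        profile = src.profile.copy()
        profile.update(height=patch.shape[1], width=patch.shape[2], transform=transform)
        with rasterio.open(f"plot_{roi['point_id']}.tif", "w", **profile) as dst:
            dst.write(patch)
```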

2.3.2. Extraction of VIs

VIs are indicators constructed by combining reflectance features from different spectral bands. They are designed to amplify vegetation signals and highlight reflectance contrasts between vegetation and soil, particularly in the red, NIR, and red-edge bands [29]. This approach leverages the strong absorption of green plants in the red and blue bands and their elevated reflectance in the NIR and green bands, providing quantitative measures for the monitoring of crop canopy growth and LAI dynamics. Based on previous studies, this research selected eight commonly used VIs for the dynamic monitoring of the winter wheat LAI. Spectral indices are calculated using the reflectance, where B, G, R, RE, and NIR represent the reflectance data in the blue, green, red, red-edge, and near-infrared bands, respectively. The relevant VIs and their calculation formulas are detailed in Table 1.
Specifically, the normalized difference vegetation index (NDVI) and the optimized soil-adjusted vegetation index (OSAVI) are widely adopted for LAI and biomass estimation, with the OSAVI offering improved performance under conditions with significant soil background influence. The normalized difference red-edge index (NDRE) and the green normalized difference vegetation index (GNDVI) incorporate the red-edge and green bands, respectively, enhancing the sensitivity to the chlorophyll concentration and alleviating the saturation effects commonly observed in dense canopies. The modified chlorophyll absorption ratio index (MCARI) and the transformed chlorophyll absorption in reflectance index (TCARI) are tailored to accentuate chlorophyll absorption characteristics, thereby improving the detection of plant stress and pigment variability. Meanwhile, the green leaf index (GLI) and the red–green–blue vegetation index (RGBVI), derived solely from visible bands, offer valuable structural information and maintain robustness under varying illumination or when using low-cost RGB or multispectral sensors.
Collectively, these eight indices capture complementary physiological (e.g., chlorophyll content) and structural (e.g., canopy morphology) features of winter wheat across growth stages. Their integration into the LAI estimation framework supports a comprehensive and reliable assessment of the crop status using UAV-based multispectral imagery.
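For reference, the sketch below computes the eight VIs from per-band reflectance arrays using their standard published forms; Table 1 is not reproduced here, and the paper's exact formulations may differ slightly (e.g., in the OSAVI scaling factor).

```python
import numpy as np

def compute_vis(B, G, R, RE, NIR, eps=1e-6):
    """Eight vegetation indices (standard forms) from reflectance arrays."""
    vis = {
        "NDVI":  (NIR - R) / (NIR + R + eps),
        "OSAVI": 1.16 * (NIR - R) / (NIR + R + 0.16),            # Rondeaux et al. form
        "NDRE":  (NIR - RE) / (NIR + RE + eps),
        "GNDVI": (NIR - G) / (NIR + G + eps),
        "MCARI": ((RE - R) - 0.2 * (RE - G)) * (RE / (R + eps)),
        "TCARI": 3 * ((RE - R) - 0.2 * (RE - G) * (RE / (R + eps))),
        "GLI":   (2 * G - R - B) / (2 * G + R + B + eps),
        "RGBVI": (G**2 - B * R) / (G**2 + B * R + eps),
    }
    # Stack into the 8-channel VI image used later as model input (H x W x 8).
    return np.stack(list(vis.values()), axis=-1)
```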

2.3.3. LAI Data Preprocessing

To enhance the LAI data quality and improve the model training stability, this study preprocessed the raw LAI data and analyzed their distribution characteristics across different growth stages using boxplot analysis. Figure 2 illustrates the distribution of both the original and preprocessed LAI data, covering seven critical growth stages of winter wheat: regreening, jointing, booting, heading, pre-anthesis, post-anthesis, and grain filling. Preprocessing involved outlier removal using the interquartile range (IQR) method [37], where values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR were excluded, followed by linear interpolation to fill missing data.
Figure 2 presents the LAI distribution before and after preprocessing. The raw data (Figure 2a) indicate a mean LAI of 4.044 (standard deviation 0.988) at the heading stage, with a maximum value of 6.800, potentially influenced by measurement errors. After preprocessing (Figure 2b), the maximum LAI at heading decreased to 5.700, with the mean slightly reduced to 4.030 (<1% change) and the standard deviation lowered to 0.975. Outliers were reduced by approximately 5–10%. At the post-anthesis stage, the mean LAI dropped from 4.442 to 4.430, and the standard deviation decreased from 0.955 to 0.912. Post-preprocessing, the LAI distribution became more concentrated, with the coefficient of variation decreasing by about 5% on average, while preserving the dynamic trends across growth stages, making it suitable for subsequent model training.
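A minimal sketch of the IQR-based cleaning and linear interpolation described above, assuming the LAI records are held in a pandas DataFrame grouped by growth stage (column names are illustrative).

```python
import pandas as pd

def clean_lai(series: pd.Series) -> pd.Series:
    """Remove IQR outliers (below Q1 - 1.5*IQR or above Q3 + 1.5*IQR) and fill gaps linearly."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    within = series.between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return series.where(within).interpolate(method="linear", limit_direction="both")

# Illustrative usage on a DataFrame with columns ['stage', 'plot', 'LAI']:
# lai["LAI_clean"] = lai.groupby("stage")["LAI"].transform(clean_lai)
```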

2.4. Data Augmentation

To enhance the model’s generalization ability across diverse scene conditions and mitigate overfitting due to limited data volumes or imbalanced sample distributions, a data augmentation strategy [38] was introduced during the preprocessing phase of the image dataset. The primary techniques, illustrated in the sketch after this list, included the following.
(1) Horizontal Flip: Images were randomly flipped horizontally with a probability of 0.5 to simulate observational variations from different flight angles.
(2) Affine Transformation: Applied with a probability of 0.5, this involved scaling (factors between 0.9 and 1.1), translation (±5% of image dimensions), and rotation (angles from −10° to 10°), enhancing the model’s capacity to recognize features across diverse spatial scales and perspectives.
(3) Gaussian Noise Perturbation: Gaussian noise with variance ranging from 10 to 50 was added with a probability of 0.3, mimicking random noise interference during sensor acquisition to improve the model’s robustness to image noise.
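A possible implementation of the three operations above, assuming the Albumentations library (v1.x API); the paper does not name the augmentation library, so this configuration is illustrative only.

```python
import albumentations as A

# Parameters mirror Section 2.4: flip p=0.5; affine p=0.5 with scale 0.9-1.1,
# +/-5% translation, and +/-10 degree rotation; Gaussian noise (variance 10-50) p=0.3.
augment = A.Compose([
    A.HorizontalFlip(p=0.5),
    A.Affine(scale=(0.9, 1.1), translate_percent=0.05, rotate=(-10, 10), p=0.5),
    A.GaussNoise(var_limit=(10.0, 50.0), p=0.3),   # v1.x argument name
])

# augmented = augment(image=patch)["image"]        # patch: H x W x C uint8 array
```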

2.5. LAI Prediction Model Construction

In agricultural remote sensing, the LAI serves as a crucial biophysical parameter that reflects the structural characteristics and growth conditions of crop canopies. To achieve precise LAI predictions from remote sensing images, this study develops two deep learning models: a single-modal CNN model using RGB images and a multimodal model integrating RGB images with spectral VIs. Both models incorporate a CNN framework with residual blocks (ResBlocks) [39] to facilitate deeper feature extraction and improve the computational efficiency. Additionally, data augmentation techniques are employed to enhance the models’ ability to generalize and predict accurately. Through carefully designed architectures and data processing strategies, this research aims to enhance the precision and reliability of LAI predictions, providing robust technical support for agricultural remote sensing applications. The following subsections detail the model design, data preprocessing methods, and experimental configuration.

2.5.1. Lightweight CNN Prediction Model Based on Single-Modal RGB Imagery

This section introduces a lightweight CNN model developed to predict LAI values from RGB imagery. The model employs a streamlined CNN architecture for feature extraction, incorporating multiple convolutional layers that autonomously learn and capture multi-scale spatial features from the input images. These extracted features effectively represent key canopy characteristics, including texture, color variation, and spatial organization, which are essential for accurate LAI prediction.
As illustrated in Figure 3, the proposed model consists of two primary modules: a feature extraction module and an LAI regression module. The feature extraction module focuses on learning discriminative spatial features from the RGB images, while the regression module maps these features to LAI values. The detailed structures and functionalities of each module are described below.
I. Feature Extraction Module
The feature extraction module employs a deep CNN architecture, incorporating multiple ResBlocks to enhance the feature extraction efficiency and mitigate common challenges in deep network training. Each ResBlock comprises two convolutional layers and batch normalization (BN) operations, augmented by a skip connection mechanism to effectively address gradient vanishing and feature degradation, thereby strengthening the model’s feature learning capacity. The network consists of three primary ResBlocks, tasked with extracting shallow, intermediate, and deep image features, respectively.
(1) Input and Initial Convolution
The input is an RGB image with dimensions 18 × 18 × 3. Initial feature extraction is performed using a 2D convolutional layer (Conv2D) with the following parameters: kernel size 3 × 3, 32 channels, stride 1, and padding 1. This operation is mathematically expressed as
$F_0 = \sigma(W_0 * X + b_0),$
where $X$ represents the input RGB image, $W_0$ and $b_0$ are the convolutional kernel weights and bias, $*$ denotes convolution, and $\sigma$ is the ReLU activation function, introducing nonlinearity. Subsequently, BN is applied to accelerate training and stabilize the outputs:
$F_{BN} = \mathrm{BN}(F_0) = \gamma \dfrac{F_0 - \mu}{\sqrt{\sigma^2 + \epsilon}} + \beta,$
where $\mu$ and $\sigma^2$ are the mean and variance of the feature map, $\gamma$ and $\beta$ are learnable parameters, and $\epsilon$ is a small constant preventing division by zero.
(2) Residual Block Design
Following the initial convolution, three ResBlocks are stacked, with the channel sizes increasing to 64, 128, and 256. Each ResBlock includes two Conv2D layers (3 × 3 kernels), BN, ReLU activation, and a skip connection, preserving the input information. The ResBlock output is
$F_{res} = F_{in} + \sigma\left(\mathrm{BN}\left(W_2 * \sigma\left(\mathrm{BN}(W_1 * F_{in})\right)\right)\right),$
where $F_{in}$ is the input, and $W_1$ and $W_2$ are the weights of the convolutional layers. The skip connection ensures training stability in deep architectures.
(3) Pooling and Feature Compression
Each ResBlock is followed by a MaxPooling2D layer (2 × 2 kernel, stride 2), reducing the feature map dimensions from 18 × 18 to 9 × 9, 4 × 4, and 2 × 2. The pooling operation is
$F_{pool}(i, j) = \max_{(m, n) \in N} F(2i + m, 2j + n),$
where $N$ is the 2 × 2 neighborhood. The pooling operation reduces the computational complexity while emphasizing locally salient features. At the end of feature extraction, an AdaptiveAvgPool2D layer compresses the feature map into a global representation of 1 × 1 × 256:
$F_{avg} = \dfrac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F(i, j),$
where $H$ and $W$ are the feature map’s height and width.
(4) Regularization
To prevent overfitting, the feature extraction module introduces the dropout regularization operation with a dropout rate of 0.3, randomly deactivating 30% of the neurons to enhance generalization.
II. Regression Prediction Module
The regression prediction module is designed to map high-level features generated by the feature extraction module onto continuous LAI predictions. This module consists of multiple fully connected (dense) layers, progressively reducing the feature dimensionality to achieve precise regression.
(1) Fully Connected Layer Design
The flattened feature vector $F_{flatten}$ is first fed into a dense layer with 128 neurons and ReLU activation:
$Z_1 = \sigma(W_3 F_{flatten} + b_3),$
where $W_3$ and $b_3$ denote the weights and bias. A dropout step follows to improve the robustness. The output then proceeds to a second dense layer, shrinking the dimension to 64 with ReLU activation:
$Z_2 = \sigma(W_4 Z_1 + b_4).$
(2) Output Layer
Finally, an output layer with a single neuron generates the LAI prediction:
$\hat{y} = W_5 Z_2 + b_5.$
Given that LAI prediction is a regression task, no activation function (i.e., linear activation) is applied to the output layer, preserving the continuous nature of the output values.
(3) Design Goal
By systematically reducing the feature dimensions, this module ensures that the model effectively learns the optimal relationship between high-dimensional features and LAI predictions.
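Putting Section 2.5.1 together, the sketch below is one PyTorch realization of the described architecture (18 × 18 × 3 input, an initial 32-channel convolution, three ResBlocks with 64/128/256 channels each followed by 2 × 2 max pooling, adaptive average pooling, dropout of 0.3, and 256→128→64→1 dense layers). The 1 × 1 projection used when the channel count changes is an assumption, since the paper does not specify how the skip connection handles channel growth.

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Two 3x3 conv + BN layers with a skip connection (1x1 projection assumed
    when the number of channels changes)."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, padding=1), nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
            nn.Conv2d(c_out, c_out, 3, padding=1), nn.BatchNorm2d(c_out),
        )
        self.skip = nn.Conv2d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class LaiCNN(nn.Module):
    """Single-modal RGB model: feature extraction plus regression head."""
    def __init__(self, in_ch=3):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            ResBlock(32, 64),   nn.MaxPool2d(2),   # 18 -> 9
            ResBlock(64, 128),  nn.MaxPool2d(2),   # 9  -> 4
            ResBlock(128, 256), nn.MaxPool2d(2),   # 4  -> 2
            nn.AdaptiveAvgPool2d(1),               # -> 1 x 1 x 256
        )
        self.regressor = nn.Sequential(
            nn.Flatten(), nn.Dropout(0.3),
            nn.Linear(256, 128), nn.ReLU(inplace=True), nn.Dropout(0.3),
            nn.Linear(128, 64),  nn.ReLU(inplace=True),
            nn.Linear(64, 1),                      # linear output for regression
        )

    def forward(self, x):
        return self.regressor(self.features(x)).squeeze(-1)

# y_hat = LaiCNN()(torch.randn(16, 3, 18, 18))     # -> tensor of shape (16,)
```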

2.5.2. Construction of the MSF-FusionNet Model Based on RGB and VI Images

To improve the precision and robustness of LAI estimation from UAV imagery, this study proposes MSF-FusionNet, a multimodal LAI prediction model that integrates RGB and multispectral imagery, as shown in Figure 4. This model leverages the complementary information from the two sources: building upon the lightweight CNN structure for single-modal RGB imagery, it incorporates multispectral data and the derived VIs through a multimodal feature extraction and fusion mechanism, significantly improving the adaptability to complex terrain and environmental variations. The model design is detailed below in terms of three aspects: input data and preprocessing, the multimodal feature extraction network, and feature fusion with LAI prediction.
I. Input Data and Preprocessing
The dataset comprises high-resolution RGB and multispectral images collected at various time points to capture dynamic crop growth. RGB imagery, consisting of visible light bands (red, green, blue) with dimensions H × W × 3, primarily characterizes the crop morphology and color features. Multispectral imagery, spanning the blue, green, red, NIR, and red-edge bands with dimensions H × W × 5, is highly sensitive to the canopy chlorophyll content, moisture status, and structural changes. To quantify crop vigor and health, eight typical VIs (see Table 1) are calculated from the multispectral data as input features for VI images (H × W × 8).
During preprocessing, RGB and multispectral images are standardized to ensure consistent numerical distributions:
$X_{std} = \dfrac{X - \mu}{\sigma},$
where X is the raw image data, and μ and σ are the mean and standard deviation, respectively. Data augmentation, such as horizontal flips and affine transformations, is used to increase the sample diversity and reduce overfitting, boosting model generalization.
II. Multimodal Feature Extraction Network
To efficiently extract features from RGB and VI images, the model employs two independent CNN branches, one processing the RGB data and the other the VI data, each with a structure identical to the CNN described in Section 2.5.1 for RGB feature extraction. The design is as follows.
(1) RGB Branch: Takes a 3-channel RGB image (H × W × 3) as input, producing a 256-dimensional feature vector $F_{RGB} \in \mathbb{R}^{256}$ through multiple convolutional, pooling, and residual modules.
(2) VI Branch: Processes an 8-channel VI image (H × W × 8) using the same network architecture, yielding a 256-dimensional feature vector $F_{VI} \in \mathbb{R}^{256}$.
This dual-branch design processes different modalities in separate feature spaces, preserving the morphological features of RGB images and the physiological information of VI images, thus avoiding inter-modal interference and information loss from early fusion. The symmetry of the branches further ensures efficient feature learning across modalities.
III. Feature Fusion and LAI Prediction
To fully exploit the complementary information from RGB and VI images, the model employs a feature-level fusion strategy. The 256-dimensional feature vectors output by the RGB and VI branches are concatenated along the feature dimension to form a 512-dimensional fused feature vector, expressed as
$F_{fusion} = \mathrm{concat}(F_{RGB}, F_{VI}) \in \mathbb{R}^{512},$
where $F_{RGB}$ and $F_{VI}$ represent the feature vectors from the RGB and VI branches, respectively.
Subsequently, the fused feature $F_{fusion}$ is input into a fully connected fusion network (FusionNet) for LAI regression prediction. FusionNet refines the features progressively through a multi-layer fully connected architecture and outputs the predicted LAI value. The specific structure of FusionNet is detailed as follows.
First Fully Connected Layer: The 512-dimensional input is reduced to 128 dimensions, with the ReLU activation function applied to introduce nonlinearity:
$Z_3 = \sigma(W_6 F_{fusion} + b_6), \quad Z_3 \in \mathbb{R}^{128},$
where $\sigma$ denotes the ReLU function, $W_6 \in \mathbb{R}^{128 \times 512}$, and $b_6 \in \mathbb{R}^{128}$.
Second Fully Connected Layer: The 128-dimensional features are further compressed to 64 dimensions, with ReLU activation applied again:
$Z_4 = \sigma(W_7 Z_3 + b_7), \quad Z_4 \in \mathbb{R}^{64},$
where $W_7 \in \mathbb{R}^{64 \times 128}$, and $b_7 \in \mathbb{R}^{64}$.
Output Layer: A single-neuron fully connected layer generates the LAI prediction:
$\hat{y} = W_8 Z_4 + b_8, \quad \hat{y} \in \mathbb{R}.$
Since LAI prediction is formulated as a regression task, a linear activation function is applied in the output layer to ensure the continuity of the predicted values. By jointly optimizing the feature extraction sub-network and the integrated regression layers, the model significantly enhances its capability to interpret and represent multi-modal information.
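The dual-branch fusion described above can be sketched as follows; for brevity, the branch encoder here is a simplified stand-in for the ResBlock-based CNN of Section 2.5.1, while the fusion head follows the stated 512→128→64→1 layout.

```python
import torch
import torch.nn as nn

def branch_encoder(c_in):
    """Simplified stand-in for the Section 2.5.1 feature extractor: maps an
    image with c_in channels to a 256-dimensional feature vector."""
    return nn.Sequential(
        nn.Conv2d(c_in, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
        nn.MaxPool2d(2),
        nn.Conv2d(128, 256, 3, padding=1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),     # -> (batch, 256)
    )

class MSFFusionNet(nn.Module):
    """Independent RGB (3-ch) and VI (8-ch) branches, feature concatenation to
    512 dimensions, then the fully connected fusion network for LAI regression."""
    def __init__(self):
        super().__init__()
        self.rgb_branch = branch_encoder(3)
        self.vi_branch = branch_encoder(8)
        self.fusion = nn.Sequential(
            nn.Linear(512, 128), nn.ReLU(inplace=True),
            nn.Linear(128, 64),  nn.ReLU(inplace=True),
            nn.Linear(64, 1),                      # linear output (regression)
        )

    def forward(self, rgb, vi):
        fused = torch.cat([self.rgb_branch(rgb), self.vi_branch(vi)], dim=1)
        return self.fusion(fused).squeeze(-1)

# y_hat = MSFFusionNet()(torch.randn(4, 3, 18, 18), torch.randn(4, 8, 18, 18))
```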

2.5.3. Experimental Environment and Parameter Settings

(1) Dataset and Division
The dataset used in this study consists of 91 sampling points collected across seven key growth stages of winter wheat, from 12 March to 8 May 2024. These stages include regreening, jointing, booting, heading, pre-anthesis, post-anthesis, and grain filling. The dataset encompasses 21 winter wheat cultivars, with four sampling points allocated to each of the first 20 cultivars and 11 sampling points designated for the 21st cultivar due to its larger planting area. To ensure the representativeness of model training and evaluation, a stratified sampling strategy was employed. Specifically, 70 points were allocated to the training set and 21 to the testing set. For the first 20 cultivars, three samples were assigned to the training set and one to the test set. For the 21st cultivar, seven samples were assigned to training and four to testing. Furthermore, for multi-temporal data collected at the same sampling point, all temporal observations were consistently assigned to either the training or the test set, thereby maintaining spatial independence across time.
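A minimal sketch of the point-level stratified split described above; the mapping from cultivar id to its list of sampling-point ids is an assumed input structure.

```python
import random

def split_sampling_points(points_by_cultivar, seed=0):
    """Assign 3 of 4 points per cultivar (7 of 11 for cultivar 21) to training;
    all growth-stage observations of a point stay on the same side of the split."""
    rng = random.Random(seed)
    train_ids, test_ids = [], []
    for cultivar, point_ids in points_by_cultivar.items():
        ids = list(point_ids)
        rng.shuffle(ids)
        n_train = 7 if cultivar == 21 else 3
        train_ids.extend(ids[:n_train])
        test_ids.extend(ids[n_train:])
    return train_ids, test_ids
```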
(2) Model Training Configuration
All training experiments were executed on a high-performance computing workstation equipped with an AMD EPYC 9654 processor (96 cores, 2.4 GHz), 128 GB of RAM, and an NVIDIA RTX 4090 GPU (NVIDIA Corporation, Santa Clara, CA, USA; 24 GB VRAM), operating under Windows 10. The model architecture was developed using Python 3.9 and trained with the PyTorch 2.2.2 deep learning framework. The loss function employed was the Smooth L1 Loss, which offers a balance between robustness to outliers and sensitivity to minor prediction errors. The AdamW optimizer [40] was used, with an initial learning rate of 0.001, and a dynamic learning rate decay strategy was applied to improve the convergence stability. To mitigate overfitting, a weight decay coefficient of 10−4 was used. The model was trained for 20 epochs with a mini-batch size of 16.
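The training configuration translates into a loop of roughly the following form; the exact learning-rate decay schedule is not specified in the paper, so the cosine schedule here is an assumption, and the DataLoader (batch size 16, yielding RGB, VI, and LAI batches) is taken as given.

```python
import torch
from torch import nn

def train(model, train_loader, epochs=20, device="cuda"):
    """Smooth L1 loss, AdamW (lr=1e-3, weight decay 1e-4), decaying learning rate."""
    model = model.to(device)
    criterion = nn.SmoothL1Loss()
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1e-4)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)  # assumed decay
    for _ in range(epochs):
        model.train()
        for rgb, vi, lai in train_loader:          # batches of (RGB, VI, measured LAI)
            rgb, vi, lai = rgb.to(device), vi.to(device), lai.to(device)
            optimizer.zero_grad()
            loss = criterion(model(rgb, vi), lai)
            loss.backward()
            optimizer.step()
        scheduler.step()
    return model
```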

2.6. Model Evaluation

2.6.1. Spatial Autocorrelation Measure

Moran’s I is one of the most widely used global indicators to assess spatial autocorrelation, i.e., the degree to which a spatial variable is correlated with itself in space. Proposed by Patrick Moran in 1950 [41], this index quantitatively evaluates whether spatial units with similar values are more likely to be clustered together (positive spatial autocorrelation), dispersed (negative spatial autocorrelation), or randomly distributed (no spatial autocorrelation).
The formula for Moran’s I is as follows:
$I = \dfrac{n}{W} \cdot \dfrac{\sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$
where $n$ is the number of samples, $x_i$ is the LAI of the $i$th sample point, $\bar{x}$ is the average LAI of all samples, $w_{ij}$ is the spatial weight of sample points $i$ and $j$ (whether they are adjacent), and $W = \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}$ is the sum of all spatial weights. The value of Moran's I typically ranges within [−1, 1], where $I > 0$ indicates positive spatial autocorrelation (similar values cluster together), $I < 0$ indicates negative spatial autocorrelation (dissimilar values cluster together), and $I \approx 0$ suggests a random spatial pattern.
To determine the statistical significance of Moran’s I, a hypothesis test is often conducted under the null hypothesis that there is no spatial autocorrelation. A permutation test or Z-score-based approach is commonly used, where a significant deviation from 0 suggests non-random spatial structuring.
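The computation of Moran's I and its permutation test follows directly from the formula above; the binary adjacency weight matrix is assumed to be built from the sampling-point coordinates.

```python
import numpy as np

def morans_i(x, w):
    """Global Moran's I for values x (length n) and spatial weights w (n x n,
    w[i, j] > 0 for neighbouring points, zero diagonal)."""
    x, w = np.asarray(x, float), np.asarray(w, float)
    n = x.size
    dev = x - x.mean()
    num = np.sum(w * np.outer(dev, dev))   # sum_ij w_ij (x_i - mean)(x_j - mean)
    return (n / w.sum()) * num / np.sum(dev ** 2)

def permutation_p_value(x, w, n_perm=999, seed=0):
    """Two-sided permutation test under the null of no spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(x, w)
    sims = np.array([morans_i(rng.permutation(x), w) for _ in range(n_perm)])
    return (np.sum(np.abs(sims) >= abs(observed)) + 1) / (n_perm + 1)
```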

2.6.2. Baseline Models

To further validate the effectiveness and robustness of the proposed model, this study incorporates four classical machine learning methods and two representative deep learning architectures for comparative analysis: random forest regression (RF) [42], support vector machine (SVM) [43], extreme learning machine (ELM) [44], and eXtreme gradient boosting regression (XGBoost) [45]. Meanwhile, the deep learning baselines consist of ResNet18 [39] and AlexNet [46], both of which are widely adopted convolutional neural networks (CNNs) for image-based regression tasks. By systematically evaluating the predictive performance of these algorithms in LAI inversion tasks, this research aims to investigate their adaptability and performance differences in characterizing complex vegetation growth traits. Below is an academic description of each model along with its parameter configurations.
RF is an ensemble learning technique introduced by Breiman in 2001. It combines multiple decision trees using bootstrap aggregating (bagging) to form a strong predictive model. Its robustness to multicollinearity and ability to handle high-dimensional data make it ideal for remote sensing applications. For this study, RF employed 500 trees and a minimum leaf size of 1.
SVM optimizes regression by minimizing the structural risk and maximizing the margin. It employs a radial basis function (RBF) kernel for nonlinear regression tasks. Parameters: penalty parameter C = 1.0 to balance complexity and error, kernel coefficient γ = “scale” (automatically determined by feature count), and ε = 0.1 to define the tolerance range for prediction errors.
ELM is a fast learning algorithm based on single hidden layer feedforward neural networks (SLFNs), proposed by Huang et al. [44]. ELM randomly generates input weights and hidden layer biases and employs a least-squares method to compute output weights, thereby avoiding the convergence difficulties inherent in traditional neural networks. In this study, the ELM model was configured with 100 hidden layer neurons, the activation function was set to “ReLU”, the number of output nodes was 1, and the initial learning rate was set to 0.01.
XGBoost is an efficient gradient boosting decision tree (GBDT) framework. It incorporates optimized strategies such as parallelized tree construction and regularization to enhance the training speed and generalization capabilities, making it widely applicable in remote sensing estimation tasks. In this study, the XGBoost model was configured with 100 estimators, a maximum tree depth of 6, and a learning rate of 0.1. Additionally, to control model complexity and mitigate overfitting, the regularization parameter λ was set to 1, and the subsample ratio was set to 0.8.
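The machine learning baselines map onto standard library configurations roughly as follows (ELM is omitted because it has no standard scikit-learn implementation); hyperparameters follow the values reported above, and the inputs are assumed to be tabular feature vectors derived from the fused RGB and VI data.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor

baselines = {
    "RF":      RandomForestRegressor(n_estimators=500, min_samples_leaf=1),
    "SVM":     SVR(kernel="rbf", C=1.0, gamma="scale", epsilon=0.1),
    "XGBoost": XGBRegressor(n_estimators=100, max_depth=6, learning_rate=0.1,
                            reg_lambda=1.0, subsample=0.8),
}
# e.g. baselines["XGBoost"].fit(X_train, y_train); preds = baselines["XGBoost"].predict(X_test)
```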
ResNet18 is a deep residual convolutional network that introduces identity shortcut connections to address the vanishing gradient problem in deep architectures. Its residual blocks enable the efficient training of deeper networks by preserving low-level and high-level features through additive feature propagation. In this study, the ResNet18 model was adopted for regression by modifying its fully connected layer to output a single continuous LAI value, and it was trained using the Adam optimizer with a learning rate of 0.001 and 30 training epochs.
AlexNet is a classical CNN architecture that has significantly advanced deep learning in computer vision. It consists of five convolutional layers followed by three fully connected layers, employing ReLU activation and dropout regularization. Despite its historical impact, AlexNet’s relatively shallow depth and lack of residual learning mechanisms may limit its ability to extract fine-grained spatial–spectral information from UAV imagery. In this study, AlexNet was trained using the Adam optimizer at a learning rate of 0.001.

2.6.3. Evaluation Indicators

To thoroughly evaluate the predictive capabilities of the constructed models, this study performed validation analyses using the independent test samples. The model accuracy was quantitatively measured by the coefficient of determination (R2) and the root mean square error (RMSE), which reflected the precision and reliability of the LAI estimation models. Specifically, the R2 quantifies the proportion of variance in the observed data explained by the model, where values approaching 1 indicate superior model performance. The RMSE quantifies the average prediction error, where lower values reflect better stability and reduced variability between the predicted and actual observations.
$R^2 = 1 - \dfrac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
$RMSE = \sqrt{\dfrac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$
where $y_i$ and $\hat{y}_i$ denote the measured and predicted LAI values, $\bar{y}$ is the mean of the measured values, and $n$ is the number of test samples.
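Both metrics are straightforward to compute from the test-set predictions, for example:

```python
import numpy as np

def r2_rmse(y_true, y_pred):
    """Coefficient of determination and root mean square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot, float(np.sqrt(np.mean((y_true - y_pred) ** 2)))
```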

3. Results

3.1. Variability and Spatial Autocorrelation Analysis of LAI Data

Table 2 presents the LAI statistics for winter wheat across various growth stages, derived from ground measurements. The mean LAI increases initially and then stabilizes over time. At regreening, the mean LAI is 1.215, reflecting sparse early canopy cover, rising to 1.681 at jointing and 2.854 at booting as the canopy thickens. It reaches 4.044 at heading and then levels off between 4.200 and 4.572 from pre-anthesis to grain filling, indicating a stable canopy density in later stages. This trend mirrors the physiological growth of winter wheat, from sparse early coverage to dense maturity.
The coefficient of variation (CV) is higher in the early stages, such as regreening (CV = 0.318) and jointing (CV = 0.396), than in the mid-to-late stages. This may be attributed to greater field environmental heterogeneity or individual plant variability during seedling growth, with the subsequent decrease in CV reflecting enhanced canopy closure and growth uniformity. Overall, the variance, standard deviation, and CV across the seven key growth stages remain low, indicating minimal LAI dispersion among winter wheat varieties and a relatively uniform LAI distribution. These statistical characteristics confirm the high data stability across critical growth stages, providing a reliable foundation for subsequent sample partitioning and model training.
To assess whether spatial autocorrelation might affect the model’s validation accuracy, Moran’s I was calculated for LAI measurements at all 91 sampling points across the seven growth stages (Table 3). The results show that the Moran’s I values range narrowly between 0.0036 and 0.0181, with all p-values exceeding 0.10 and the z-scores falling below 1.3. These statistics indicate no significant global spatial autocorrelation (p > 0.05), suggesting that the LAI values are randomly distributed rather than clustered or dispersed in the study field. Consequently, the likelihood of spatial dependency between the training and test samples is minimal, confirming the effectiveness of the stratified sampling strategy and ensuring robust model evaluation.

3.2. Correlation Analysis Between VI and LAI Data

VIs, derived from variations in surface spectral reflectance, are key indicators of the vegetation distribution, growth status, and health. These indices, computed from multispectral remote sensing data, enhance vegetation signals and facilitate the quantitative assessment of crop canopy properties. This study selected eight representative VIs to analyze their relationships with the LAI across various growth stages (Table 4). These indices capture the structural and physiological characteristics of the crop canopy, providing essential input for accurate LAI estimation.
The findings indicate significant correlations between each vegetation index and the LAI across the growth stages, confirming their effectiveness for LAI estimation. However, the correlation strength varies by stage and index. During regreening and jointing, the NDRE, MCARI, and TCARI show the strongest correlations of 0.881, 0.867, and 0.867 at regreening and 0.861, 0.872, and 0.872 at jointing, demonstrating solid positive associations. The NDVI and GNDVI also perform well (0.78–0.85). At booting, the MCARI and TCARI remain highly correlated (0.878), and the OSAVI improves to 0.850. After heading, the correlations generally weaken, although the OSAVI retains notable strength at pre-anthesis (0.752), post-anthesis (0.750), and grain filling (0.773). Conversely, the RGBVI and GLI exhibit weaker correlations, especially in grain filling (0.182 and 0.257), indicating reduced LAI representation later in the cycle.
The results also reveal dynamic trends. The early stages (regreening to booting) show correlations of 0.70–0.88, reflecting sensitivity to canopy development. Post-heading, the correlations drop to 0.41–0.77, with the RGBVI and GLI being particularly weak in grain filling, likely due to stable canopies and reduced index sensitivity. The OSAVI, however, maintains stronger correlations, showing adaptability to late-stage canopy conditions.

3.3. LAI Estimation Based on RGB and Multispectral Data

This research compared LAI estimation using RGB images and multispectral VI data to assess their effectiveness in agriculture. Figure 5 shows scatter plots of the predicted versus measured LAI based on RGB images (Figure 5a) and multispectral VI data (Figure 5b).
From Figure 5a, the LAI estimation model based on RGB images exhibits a moderate correlation, with a coefficient of determination (R2) of 0.5078. However, the root mean square error (RMSE) is 1.0816, indicating significant deviations between the predicted and measured values. Several data points diverge from the ideal 1:1 line, underscoring the limitations of RGB images in capturing complex vegetation characteristics. This can be attributed to the restricted spectral information provided by RGB data, limited to the red, green, and blue bands, which lack the depth required to characterize vegetation’s spectral properties comprehensively, particularly under high vegetation coverage or variable environmental conditions.
In contrast, the LAI estimation results based on multispectral VI data, as illustrated in Figure 5b, exhibited substantially improved performance. The model achieved an R2 of 0.8291, indicating the strong predictive capability of multispectral data in estimating vegetation parameters. The RMSE decreased to 0.6373, suggesting closer agreement between the predicted and measured LAI values, with the scatter points distributed more closely along the 1:1 line. This enhancement can be attributed to the incorporation of the NIR and red-edge bands in the multispectral VIs (e.g., NDVI, NDRE), which provide a more accurate representation of the physiological and biochemical characteristics of vegetation, particularly in monitoring the leaf area dynamics and crop health status.

3.4. LAI Estimation Using Multi-Source Image Fusion

The LAI, a critical canopy parameter in agricultural remote sensing, is often underestimated when relying on a single data source due to its limited information capacity. RGB images offer visual morphological details, while multispectral VIs capture spectral reflectance properties. Combining these complementary sources can improve the LAI estimation accuracy. This study developed the MSF-FusionNet model, integrating RGB and multispectral VI data from different growth stages in 2024, to achieve precise LAI predictions, with results shown in Figure 6.
The findings demonstrate that the fusion of RGB and VI data substantially improves the LAI prediction accuracy. As shown in Figure 6, the MSF-FusionNet model achieves an R2 of 0.8745 and an RMSE of 0.5461, underscoring its superior capability to capture LAI variability. Compared to models relying on single data sources, MSF-FusionNet exhibits marked performance gains. Relative to the RGB-only model (R2 = 0.5078, RMSE = 1.0816), it improves the R2 by approximately 36.67% and reduces the RMSE by about 49.50%. Compared to the VI-only model (R2 = 0.8291, RMSE = 0.6373), the R2 increases by roughly 5.54%, and the RMSE decreases by approximately 14.30%.
Table 5 further details the model performance across growth stages. The model excels at regreening (R2 = 0.7679, RMSE = 0.1438) and jointing (R2 = 0.8611, RMSE = 0.2581), likely due to the pronounced spectral features and lower canopy coverage in early growth, facilitating accurate LAI detection. The robust performance persists at booting (R2 = 0.8592, RMSE = 0.3644), indicating strong model reliability during mid-growth stages. However, a gradual decline in prediction accuracy is observed from the heading stage onwards, with the R2 values decreasing to 0.6898 at heading, 0.6181 at pre-anthesis, 0.5875 at post-anthesis, and 0.5418 at grain filling. This performance reduction corresponds to the increasing canopy density and complexity during the late phenological stages. In these stages, overlapping leaf layers, enhanced shadowing, and spectral saturation, particularly in the near-infrared and red-edge bands, may hinder the model’s ability to distinguish fine-scale LAI variations. Despite these challenges, the model still maintains a reasonable estimation capacity, suggesting that the fusion of RGB and VI features remains informative under dense canopy conditions. These findings point to the need to incorporate additional structural or angular information in future model iterations to further improve the LAI estimation performance under high-LAI regimes. Nonetheless, the consistent trends in the test loss and RMSE between the training and testing sets across all stages suggest good generalization and minimal overfitting.

3.5. Performance Evaluation of LAI Models

To further validate the predictive performance of the MSF-FusionNet model for LAI estimation, four classic machine learning algorithms were selected for comparative analysis. These included ELM, RF, XGBoost, and SVM. Figure 7 presents the prediction results of these models, with the R2 and RMSE used as evaluation metrics.
As illustrated in the scatter plots of Figure 7, the ELM model exhibited the lowest accuracy in LAI prediction, with an R2 of 0.7672 and an RMSE of 0.7438. This indicates limited fitting capabilities for the fused RGB and VI features, particularly in high-LAI regions, where significant deviations from the measured values were observed. The RF model showed improved prediction performance, achieving an R2 of 0.8254 and an RMSE of 0.6442, effectively capturing some of the non-linear relationships through ensemble learning. XGBoost further optimized the performance, with an R2 of 0.8368 and an RMSE of 0.6228, demonstrating its advantage in handling high-dimensional features with minimal prediction bias. The SVM model, with an R2 of 0.8287 and an RMSE of 0.6381, performed comparably to RF but was slightly inferior to XGBoost, reflecting its capacity to model complex patterns via kernel mapping.
In addition to the four conventional machine learning models, two representative deep learning architectures, namely ResNet18 and AlexNet, were employed as benchmark baselines to evaluate the performance of the proposed MSF-FusionNet. Both are well-established CNN architectures frequently applied to image-based regression tasks.
As illustrated in Figure 8, ResNet18 demonstrated strong performance in LAI estimation, with an R2 of 0.8479 and an RMSE of 0.6012. This performance can be attributed to the use of residual connections, which enhance feature propagation and gradient flow in deeper layers. In comparison, the classical AlexNet model achieved an R2 of 0.7584 and an RMSE of 0.7578. Its relatively lower accuracy may result from a limited network depth, a lack of residual learning, and the coarser spatial resolution in its feature maps, which restricts its capacity to extract fine-grained spectral and spatial patterns from fused UAV imagery.
Overall, the proposed MSF-FusionNet achieved the best performance, with an R2 of 0.8745 and an RMSE of 0.5461, surpassing the traditional machine learning and deep learning baselines. These results demonstrate the effectiveness of combining modality-specific CNN encoders with a fully connected fusion network for the modeling of complex and non-linear relationships between RGB and vegetation index (VI) features in crop LAI estimation.

3.6. LAI Mapping with the Optimal Estimation Model

The MSF-FusionNet model, which demonstrated the best LAI prediction accuracy, was used to generate spatially continuous LAI maps across six key winter wheat growth stages (Figure 9). The results visually illustrate the dynamic changes in canopy development over time.
During the regreening stage (Figure 9a), the LAI values were relatively low (0.5–2.0) and uniformly distributed, reflecting early vegetative growth. As the wheat entered the jointing (Figure 9b) and booting (Figure 9c) stages, the LAI increased to 1.5–3.5 and 2.0–4.5, respectively, consistent with canopy densification. The heading stage (Figure 9d) marked the peak canopy development, with the LAI reaching 3.0–5.0 and spatial continuity strengthening. During the post-anthesis (Figure 9e) and grain-filling stages (Figure 9f), the LAI values plateaued or slightly decreased (2.5–4.5), reflecting physiological maturity and partial senescence. Spatial heterogeneity in the LAI values was evident, especially during the booting and heading stages, where high-LAI patches (>4.0) emerged. This variation may result from micro-environmental differences, including irrigation uniformity, soil nutrient availability, and genotype-specific responses to management.
In practical applications in agriculture, the generated LAI maps offer valuable agronomic insights. For instance, spatially localized low-LAI zones detected during the mid- to late growth stages could signal underlying water or nutrient deficiencies, thus prompting site-specific interventions such as supplemental fertilization or irrigation. Conversely, consistently high-LAI areas may correspond to high-yield potential zones, but also require attention regarding lodging risks or disease surveillance. Such interpretive layers support the core objective of precision agriculture, translating spatial information into actionable decisions.
Although this study did not explicitly validate the spatial LAI patterns against measured yields or plant nutrient content, field observations suggest good agreement between the predicted high-LAI zones and visually diverse subplots. Future work will integrate plot-level biomass and yield measurements to further confirm the physiological validity of LAI heterogeneity and assess its predictive value for yield mapping and nutrient demand modeling.

4. Discussion

4.1. Evaluation of RGB and Multispectral VI Data in Estimating LAI

RGB and multispectral VI data are essential in estimating the LAI of winter wheat, a key indicator of the canopy photosynthetic capacity and yield potential [47,48]. RGB imagery captures visible light, offering insights into the canopy color and texture. However, as shown in Figure 5a, RGB-based models exhibited limited performance (R2 = 0.5078, RMSE = 1.0816), particularly during the heading and grain-filling stages. The lack of NIR and red-edge bands limits its sensitivity to the canopy physiology in dense or disturbed conditions.
In contrast, multispectral VIs, incorporating the NIR and red-edge bands, offer more robust indicators of the canopy condition. As shown in Figure 5b, these models achieved greater accuracy (R2 = 0.8291, RMSE = 0.6373). VIs such as the NDRE and MCARI exhibited strong correlations with the LAI during the early growth stages (r = 0.881 and 0.872), as detailed in Table 4. Nonetheless, spectral saturation in the later growth stages reduced the sensitivity of some VIs (e.g., RGBVI and GLI), with the correlations dropping to 0.182 and 0.257 [49]. To address saturation effects and enhance LAI estimation across all growth stages, future research could explore the integration of thermal infrared data or the development of novel VI combinations.

4.2. Benefits and Limitations of Multi-Source Fusion in Dense Canopy LAI Estimation

RGB imagery and multispectral VI data can effectively monitor canopy changes in winter wheat LAI estimation, yet they offer different perspectives. RGB data capture visual characteristics, such as color and texture, responding well to external canopy structure shifts. In contrast, VI data, derived from the NIR and red-edge bands, highlight physiological and spectral traits, showing strong sensitivity to LAI variations in key stages like regreening and jointing (e.g., NDRE correlation of 0.881, Table 4). Previous studies have similarly noted VIs’ effectiveness during key growth periods [50,51]. However, when used alone, the RGB-based model (R2 = 0.5078, RMSE = 1.0816) and VI-based model (R2 = 0.8291, RMSE = 0.6373) falter in high-coverage stages like heading and grain filling, possibly due to spectral saturation or complex canopy structures.
The integration of RGB and VI data in the MSF-FusionNet model markedly improved the LAI estimation accuracy, as demonstrated in Figure 6. This model achieved an R2 of 0.8745 and an RMSE of 0.5461, reflecting substantial gains over single-source approaches. Compared with the RGB model, the R2 improved by 36.67% and the RMSE dropped by 49.50%. Compared with the VI model, the R2 increased by 5.54% and the RMSE decreased by 14.30%. These improvements stem from the complementary nature of the datasets. RGB imagery compensates for VIs’ limitations under sparse canopy conditions, while VIs provide vital spectral information for the assessment of the canopy physiology, consistent with the findings of Zhang et al. [52] and Li et al. [53].
Although the MSF-FusionNet model demonstrated substantial improvements in LAI prediction across the early and mid-growth stages, a notable decline in performance was observed during later phenological stages, particularly post-anthesis and grain filling, where the R2 values dropped to 0.5875 and 0.5418, respectively (Table 5). This reduction in predictive accuracy is closely associated with the increasing structural complexity of the crop canopy, which introduces shadowing effects, multi-layered occlusions, and spectral saturation, especially in the near-infrared and red-edge bands. The current model architecture employs a dual-branch design that processes RGB and VI inputs independently before feature-level fusion. While this configuration effectively captures complementary spatial and spectral patterns during earlier stages, it exhibits a limited capacity to characterize the vertical heterogeneity and nonlinear interactions present in densely vegetated canopies. In particular, the reliance on two-dimensional surface observations and conventional convolutional operations constrains the model’s ability to represent the internal canopy structure, which becomes increasingly important in high-LAI conditions.
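To clarify what such a dual-branch, feature-level fusion design looks like in practice, a minimal PyTorch sketch is given below. Channel counts, layer depths, and patch sizes are illustrative assumptions and do not reproduce the exact MSF-FusionNet configuration reported in this study.

```python
import torch
import torch.nn as nn

class SmallBranch(nn.Module):
    """Compact convolutional encoder for one modality (RGB image or stacked VI maps)."""
    def __init__(self, in_channels):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.BatchNorm2d(32), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),  # global pooling to a 64-d feature vector
        )

    def forward(self, x):
        return self.encoder(x).flatten(1)

class DualBranchFusionRegressor(nn.Module):
    """Two independent encoders whose features are concatenated before an MLP regression head."""
    def __init__(self, rgb_channels=3, vi_channels=8):
        super().__init__()
        self.rgb_branch = SmallBranch(rgb_channels)
        self.vi_branch = SmallBranch(vi_channels)
        self.head = nn.Sequential(
            nn.Linear(128, 64), nn.ReLU(inplace=True),
            nn.Linear(64, 1),  # single LAI value per sample
        )

    def forward(self, rgb, vi):
        fused = torch.cat([self.rgb_branch(rgb), self.vi_branch(vi)], dim=1)
        return self.head(fused).squeeze(1)

# Example forward pass on random tensors (batch of 4 patches of 64 x 64 pixels)
model = DualBranchFusionRegressor()
lai_pred = model(torch.randn(4, 3, 64, 64), torch.randn(4, 8, 64, 64))
```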
Furthermore, the spectral indices used in this study, although selected for their established relevance to crop biophysical traits, tend to saturate when the LAI exceeds approximately 4, reducing their sensitivity to physiological variations during late growth stages. At the same time, RGB features extracted under top-down UAV perspectives are susceptible to illumination variability and fail to adequately distinguish between productive and senescent tissue, especially under uneven lighting or occlusion. As a result, the fused features may retain high-level noise or redundant information, limiting their contribution to accurate regression.
These findings highlight the inherent limitations of conventional feature fusion strategies in resolving structural ambiguity under complex canopy conditions. Future model development may benefit from incorporating additional data modalities that convey three-dimensional canopy information, such as UAV-based LiDAR or multi-angle imagery. From a network design perspective, integrating attention mechanisms or Transformer-based architectures could enable adaptive feature weighting, improving the model’s focus on physiologically relevant regions while suppressing noise-prone areas. Alternatively, the introduction of prior constraints from radiative transfer models may offer a pathway to enhance physical interpretability and generalization in late-season crop monitoring.

4.3. Strengths and Limitations of Deep Learning in Crop LAI Estimation

The MSF-FusionNet developed in this study outperforms traditional machine learning algorithms in estimating the LAI of winter wheat (Figure 7). Compared to the best-performing traditional model, XGBoost, MSF-FusionNet achieves a significant improvement, with an R2 of 0.8745 and an RMSE of 0.5461, corresponding to a 4.51% increase in R2 and a 12.24% reduction in RMSE. This advantage is primarily attributed to the deep convolutional architecture of the CNN, which excels in automatic feature extraction and representation. By effectively integrating spatial features and spectral information from fused RGB imagery and VI data, the model captures the complex nonlinear dynamics of the LAI [54,55]. In contrast, traditional methods like ELM and RF, although effective to some extent, rely on manually crafted features and exhibit limitations when handling high-dimensional data, restricting their capacity to capture complex interactions between RGB imagery and VIs.
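For reference, the sketch below shows the kind of plot-level XGBoost baseline used for such comparisons, evaluated with R2 and RMSE. The feature matrix (e.g., plot-mean vegetation indices) and LAI targets are synthetic placeholders, not the study's data, and the hyperparameters are illustrative.

```python
import numpy as np
from xgboost import XGBRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_squared_error

# Placeholder data: rows = sampling plots, columns = plot-mean VI features
rng = np.random.default_rng(0)
X = rng.random((600, 8))                      # e.g., eight vegetation indices per plot
y = 5.0 * X[:, 1] + rng.normal(0, 0.3, 600)   # synthetic LAI-like target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_train, y_train)

pred = model.predict(X_test)
r2 = r2_score(y_test, pred)
rmse = float(np.sqrt(mean_squared_error(y_test, pred)))
print(f"R2 = {r2:.4f}, RMSE = {rmse:.4f}")
```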
Additionally, MSF-FusionNet demonstrated strong robustness and generalization across different growth stages. It achieved R2 values of 0.7679 and 0.8611 during the regreening and jointing stages, respectively (Table 5), indicating adaptability to sparse canopy and variable field conditions. The spatial LAI distribution maps (Figure 9) further validated its ability to capture canopy heterogeneity across plots during the booting and heading stages, offering valuable insights for precision agriculture practices such as irrigation management, targeted fertilization, and disease control. Nevertheless, the model’s performance declined in the later growth stages, with the R2 falling to 0.5418 during the grain-filling period. This highlights the limitations of current deep learning approaches in addressing complex canopy structures. Future research may benefit from integrating attention mechanisms and additional environmental parameters to improve the model robustness and scalability.

4.4. Model Interpretability and Phenological Independence

The proposed MSF-FusionNet model aims to improve the accuracy of winter wheat LAI estimation by integrating spatial features from RGB imagery and spectral information from multispectral VIs. As with most CNNs, the model functions as a black box, making it difficult to directly interpret the contributions of individual input features to the prediction outcomes. This limitation is particularly relevant in agronomic applications, where model transparency can enhance decision-making and support physiological understanding. Although this study primarily focused on the prediction accuracy, future work will incorporate interpretability techniques such as Grad-CAM, saliency mapping, and occlusion analysis. These methods will help to identify the importance of specific spectral bands, vegetation indices, and spatial textures across growth stages, providing insights into the underlying mechanisms driving model performance and improving its applicability in precision agriculture.
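As a simple starting point for such analyses, the sketch below computes input-gradient saliency maps for a two-input CNN regressor with the same interface as the illustrative fusion sketch in Section 4.2; Grad-CAM and occlusion analyses follow the same pattern with additional bookkeeping. It is an illustration only and is not part of the study's pipeline.

```python
import torch

def input_saliency(model, rgb, vi):
    """Return per-pixel |d(predicted LAI)/d(input)| maps for both input modalities."""
    model.eval()
    rgb = rgb.detach().clone().requires_grad_(True)
    vi = vi.detach().clone().requires_grad_(True)
    model(rgb, vi).sum().backward()   # sum over the batch so one backward pass suffices
    return rgb.grad.abs(), vi.grad.abs()

# Usage with the illustrative fusion model defined earlier:
# rgb_sal, vi_sal = input_saliency(model, torch.randn(1, 3, 64, 64), torch.randn(1, 8, 64, 64))
```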
Regarding temporal modeling, while the dataset spanned seven representative growth stages of winter wheat, the temporal observations were collected in a discrete rather than continuous manner. The intervals between image acquisitions ranged from 7 to 14 days, and each UAV campaign was executed under distinct environmental and illumination conditions. As such, the dataset lacks the temporal granularity and regularity required by recurrent neural networks (RNNs), long short-term memory (LSTM) units, or 3D CNNs to effectively capture temporal dynamics. Moreover, one of the primary design objectives of MSF-FusionNet was to enable accurate LAI estimation based on single-date UAV imagery, a scenario that reflects typical operational constraints in agricultural monitoring. Incorporating LSTM-like architectures would necessitate the availability of temporally continuous, multi-stage data for every prediction, which could substantially reduce the model’s scalability and applicability in real-world settings, where frequent data collection is not feasible.
Instead, MSF-FusionNet employs a non-sequential training approach, where multi-stage samples are used to learn phenological variation without enforcing explicit temporal dependencies. This enables the model to generalize across developmental stages while maintaining robustness under sparse temporal sampling conditions. Although it does not capture temporal transitions explicitly, the model retains stage-specific information through exposure to diverse growth periods during training, aligning well with the realities of UAV-based crop monitoring.

4.5. Generalizability and Scalability of the Multi-Source Fusion Model

Although MSF-FusionNet demonstrated strong performance across various winter wheat growth stages in this study, its generalizability beyond the current experimental setting requires further scrutiny. The dataset used was collected under relatively homogeneous conditions, including consistent management practices, uniform soil types, and the deployment of a single UAV platform. Standardized radiometric calibration also contributed to minimizing external variability. However, discrepancies may arise when the model is applied under different operational or environmental settings. For instance, variations in flight parameters (e.g., altitude, sensor spectral characteristics), atmospheric conditions (e.g., haze, solar zenith angle), or crop traits (e.g., cultivar-specific canopy architectures) could alter the reflectance profiles and derived features.
To enhance the model’s broader applicability, several key directions should be explored. First, multi-site or cross-regional validations involving diverse climatic zones, soil textures, and agronomic practices are essential to assess the model’s robustness to environmental heterogeneity. Second, domain adaptation strategies could be employed to adjust the learned representations when transferring the model to different UAV sensors or spectral configurations. Third, in cases where substantial genotypic variation exists in the canopy morphology, cultivar-specific calibration may be necessary to account for changes in the LAI–reflectance relationship. Lastly, incorporating additional sensing modalities such as thermal imaging or LiDAR could enrich structural feature extraction, helping to reduce the sensitivity to platform differences and operational inconsistencies.
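As a concrete illustration of the lighter-weight end of these adaptation strategies, the sketch below freezes both encoders of a pre-trained fusion model and re-fits only the regression head on a small labeled sample from a new site or sensor. The attribute names (rgb_branch, vi_branch, head) follow the illustrative fusion sketch in Section 4.2 and are assumptions, not the study's released code.

```python
import torch
import torch.nn as nn

def finetune_head(model, new_rgb, new_vi, new_lai, epochs=50, lr=1e-3):
    """Freeze both encoders (including BatchNorm statistics) and re-fit only the MLP head."""
    for branch in (model.rgb_branch, model.vi_branch):
        branch.eval()                  # keep BatchNorm running statistics fixed
        for p in branch.parameters():
            p.requires_grad = False

    model.head.train()
    optimizer = torch.optim.Adam(model.head.parameters(), lr=lr)
    loss_fn = nn.MSELoss()

    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(new_rgb, new_vi), new_lai)
        loss.backward()
        optimizer.step()
    return model

# Usage with a handful of labeled plots from a new site (shapes are illustrative):
# model = finetune_head(model, torch.randn(16, 3, 64, 64), torch.randn(16, 8, 64, 64), torch.rand(16) * 6)
```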
By systematically expanding the geographic and sensor coverage of the training datasets and refining the multimodal fusion framework, MSF-FusionNet holds promise for scalable and reliable applications across varied agroecological contexts.

5. Conclusions

The dynamic monitoring of the LAI across growth stages using UAV remote sensing is essential for efficient agricultural management and precision decision-making. This study leveraged high-resolution UAV-acquired RGB imagery and multispectral VI data to evaluate the efficacy of single-source and multi-source data in LAI estimation. A CNN-based model, MSF-FusionNet, was developed alongside four traditional machine learning algorithms (ELM, RF, XGBoost, and SVM) to estimate the LAI, and spatial distribution maps of the LAI for different growth stages were generated based on the optimal estimation model. The key findings are outlined below.
(1) UAV-derived RGB features and multispectral VI data effectively captured the spatiotemporal variation in the winter wheat LAI, demonstrating strong potential for the tracking of crop growth dynamics. Nevertheless, the estimation model based solely on RGB data exhibited lower accuracy (R2 = 0.5078, RMSE = 1.0816). Although VI-based models performed better (R2 = 0.8291, RMSE = 0.6373), their accuracy was diminished under high canopy closure and during late growth stages (e.g., grain filling), limiting their ability to accurately reflect the canopy structure and density.
(2) Integrating RGB and VI data markedly improved the LAI estimation performance. The MSF-FusionNet model achieved an R2 of 0.8745 and an RMSE of 0.5461, reflecting R2 increases of 36.67% and 5.54% and RMSE reductions of 49.50% and 14.30%, respectively, compared to the best single-source RGB and VI models. These improvements underscore the value of multi-source data fusion in enhancing the representation of winter wheat growth patterns, with significant accuracy gains observed during the regreening (R2 = 0.7679) and jointing stages (R2 = 0.8611).
(3) The CNN-based MSF-FusionNet outperformed traditional machine learning models in LAI estimation. Compared with XGBoost (R2 = 0.8368, RMSE = 0.6228), MSF-FusionNet improved the R2 by 4.51% and reduced the RMSE by 12.24%. Additionally, the LAI maps predicted by MSF-FusionNet across six key growth stages aligned closely with field observations, accurately capturing the spatial heterogeneity and dynamic changes in the LAI.
Despite the promising results, several aspects warrant further exploration in future studies. First, integrating additional data modalities, such as LiDAR-based canopy height models or multi-angle imagery, could help to resolve vertical occlusions and enhance LAI estimation under dense canopy conditions. Second, attention mechanisms or Transformer-based architectures may further improve feature fusion and model interpretability by dynamically focusing on the most informative canopy regions. Third, a more explicit link between LAI heterogeneity and agronomic traits (e.g., yield components, nutrient status) would facilitate the development of actionable management recommendations. Finally, multi-site and multi-year validations are essential to assess the robustness of the proposed framework across diverse agronomic conditions, thereby paving the way for scalable, reliable applications in precision agriculture.

Author Contributions

Conceptualization, L.Z.; methodology, L.Z. and C.W.; software, L.Z.; validation, L.Z. and X.W.; formal analysis, L.Z., H.Z. and X.H.; investigation, L.Z., H.Z., W.Y. and X.H.; resources, C.W., J.C. and B.Z.; data curation, L.Z., C.W., W.Y. and H.Z.; writing—original draft preparation, L.Z.; writing—review and editing, L.Z., C.W. and X.W.; visualization, L.Z.; supervision, C.W., J.C., B.Z. and X.W.; project administration, C.W.; funding acquisition, C.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD-2023-87), the Key and General Projects of Jiangsu Province (No. BE2022338), the Postgraduate Research and Practice Innovation Program of Jiangsu Province (No. KYCX24_3990), and the Project of the Faculty of Agricultural Engineering of Jiangsu University (No. NZXB20200102).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author(s).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LAI: Leaf area index
UAV: Unmanned aerial vehicle
VIs: Vegetation indices
CNN: Convolutional neural network
ELM: Extreme learning machine
SVM: Support vector machine
RNNs: Recurrent neural networks
DSM: Digital surface model
P4M: DJI Phantom 4 multispectral drone
NIR: Near-infrared band
GCPs: Ground control points
ROIs: Regions of interest
NDVI: Normalized difference vegetation index
NDRE: Normalized difference red-edge vegetation index
OSAVI: Optimized soil-adjusted vegetation index
MCARI: Modified chlorophyll absorption in reflectance index
TCARI: Transformed chlorophyll absorption in reflectance index
GNDVI: Green normalized difference vegetation index
RGBVI: Red–green–blue vegetation index
GLI: Green leaf index
IQR: Interquartile range method
ResBlocks: Residual blocks
Conv2D: 2D convolutional layer
BN: Batch normalization
RF: Random forest regression
XGBoost: eXtreme gradient boosting regression
R2: Coefficient of determination
RMSE: Root mean square error
CV: Coefficient of variation
MLP: Multi-layer perceptron

References

  1. Zheng, Z.; Hoogenboom, G.; Cai, H.J.; Wang, Z.K. Winter Wheat Production on the Guanzhong Plain of Northwest China Under Projected Future Climate with SimCLIM. Agric. Water Manag. 2020, 239, 106233. [Google Scholar] [CrossRef]
  2. Long, S.; Zhou, S.; He, H.; Zhang, L. Optimizing Food Crop Layout Considering Precipitation Uncertainty: Balancing Regional Water, Carbon, and Economic Pressures with Food Security. J. Clean. Prod. 2024, 467, 142881. [Google Scholar] [CrossRef]
  3. Zhang, L.; Song, X.; Niu, Y.; Zhang, H.; Wang, A.; Zhu, Y.; Zhu, X.; Chen, L.; Zhu, Q. Estimating Winter Wheat Plant Nitrogen Content by Combining Spectral and Texture Features Based on a Low-Cost UAV RGB System Throughout the Growing Season. Agriculture 2024, 14, 456. [Google Scholar] [CrossRef]
  4. Tao, H.; Feng, H.; Xu, L.; Miao, M.; Long, H.; Yue, J.; Li, Z.; Yang, G.; Yang, X.; Fan, L. Estimation of Crop Growth Parameters Using UAV-Based Hyperspectral Remote Sensing Data. Sensors 2020, 20, 1296. [Google Scholar] [CrossRef] [PubMed]
  5. An, M.; Xing, W.; Han, Y.; Bai, Q.; Peng, Z.; Zhang, B.; Wei, Z.; Wu, W. The Optimal Soil Water Content Models Based on Crop-LAI and Hyperspectral Data of Winter Wheat. Irrig. Sci. 2021, 39, 687–701. [Google Scholar] [CrossRef]
  6. Xu, S.; Xu, X.; Zhu, Q.; Meng, Y.; Yang, G.; Feng, H.; Yang, M.; Zhu, Q.; Xue, H.; Wang, B. Monitoring Leaf Nitrogen Content in Rice Based on Information Fusion of Multi-Sensor Imagery from UAV. Precis. Agric. 2023, 24, 2327–2349. [Google Scholar] [CrossRef]
  7. Li, L.; Xie, S.; Ning, J.; Chen, Q.; Zhang, Z. Evaluating Green Tea Quality Based on Multisensor Data Fusion Combining Hyperspectral Imaging and Olfactory Visualization Systems. J. Sci. Food Agric. 2019, 99, 1787–1794. [Google Scholar] [CrossRef]
  8. Lan, Y.; Huang, Z.; Deng, X.; Zhu, Z.; Huang, H.; Zheng, Z.; Lian, B.; Zeng, G.; Tong, Z. Comparison of Machine Learning Methods for Citrus Greening Detection on UAV Multispectral Images. Comput. Electron. Agric. 2020, 171, 105234. [Google Scholar] [CrossRef]
  9. Zhang, D.; Han, X.; Lin, F.; Du, S.; Zhang, G.; Hong, Q. Estimation of Winter Wheat Leaf Area Index Using Multi-Source UAV Image Feature Fusion. Trans. Chin. Soc. Agric. Eng. 2022, 38, 171–179. [Google Scholar]
  10. Chatterjee, S.; Baath, G.S.; Sapkota, B.R.; Flynn, K.C.; Smith, D.R. Enhancing LAI Estimation Using Multispectral Imagery and Machine Learning: A Comparison Between Reflectance-Based and Vegetation Indices-Based Approaches. Comput. Electron. Agric. 2025, 230, 109790. [Google Scholar] [CrossRef]
  11. Zhang, B.Y.; Gu, L.M.; Dai, M.L.; Bao, X.Y.; Sun, Q.; Zhang, M.Z.; Qu, X.Z.; Li, Z.H.; Zhen, W.C.; Gu, X.H. Estimation of Grain Filling Rate of Winter Wheat Using Leaf Chlorophyll and LAI Extracted from UAV Images. Field Crop Res. 2024, 306, 109198. [Google Scholar] [CrossRef]
  12. Liu, Q.; Qu, Z.; Bai, Y.; Yang, W.; Fang, H.; Bai, Q.; Yang, Y.; Zhang, R. Using Multispectral Spectrometry and Machine Learning to Estimate Leaf Area Index of Spring Wheat. J. Irrig. Drain. 2024, 43, 63–73. [Google Scholar] [CrossRef]
  13. Ma, Y.; Zhang, Q.; Yi, X.; Ma, L.; Zhang, L.; Huang, C.; Zhang, Z.; Lv, X. Estimation of Cotton Leaf Area Index (LAI) Based on Spectral Transformation and Vegetation Index. Remote Sens. 2022, 14, 136. [Google Scholar] [CrossRef]
  14. Cheng, Q.; Ding, F.; Xu, H.; Guo, S.; Li, Z.; Chen, Z. Quantifying Corn LAI Using Machine Learning and UAV Multispectral Imaging. Precis. Agric. 2024, 25, 1777–1799. [Google Scholar] [CrossRef]
  15. Li, S.; Yuan, F.; Ata-UI-Karim, S.T.; Zheng, H.; Cheng, T.; Liu, X.; Tian, Y.; Zhu, Y.; Cao, W.; Cao, Q. Combining Color Indices and Textures of UAV-Based Digital Imagery for Rice LAI Estimation. Remote Sens. 2019, 11, 1763. [Google Scholar] [CrossRef]
  16. Yu, X.; Fan, K.; Huo, X.; Yin, Q.; Qian, L.; Liu, Z.; Zhang, C.; Li, L.; Wang, W.; Hu, X. Dynamic Estimation of Summer Maize LAI Based on Multi-Feature Fusion of UAV Imagery. Trans. Chin. Soc. Agric. Eng. 2025, 41, 124–134. [Google Scholar] [CrossRef]
  17. Du, X.; Zheng, L.; Zhu, J.; He, Y. Enhanced Leaf Area Index Estimation in Rice by Integrating UAV-Based Multi-Source Data. Remote Sens. 2024, 16, 1138. [Google Scholar] [CrossRef]
  18. Qiu, D.; Guo, T.; Yu, S.; Liu, W.; Li, L.; Sun, Z.; Peng, H.; Hu, D. Classification of Apple Color and Deformity Using Machine Vision Combined with CNN. Agriculture 2024, 14, 978. [Google Scholar] [CrossRef]
  19. Memon, M.S.; Chen, S.; Shen, B.; Liang, R.; Tang, Z.; Wang, S.; Zhou, W.; Memon, N. Automatic Visual Recognition, Detection and Classification of Weeds in Cotton Fields Based on Machine Vision. Crop Prot. 2025, 187, 106966. [Google Scholar] [CrossRef]
  20. Liu, J.; Abbas, I.; Noor, R.S. Development of Deep Learning-Based Variable Rate Agrochemical Spraying System for Targeted Weeds Control in Strawberry Crop. Agronomy 2021, 11, 1480. [Google Scholar] [CrossRef]
  21. Li, Y.; Liu, H.; Ma, J.; Zhang, L. Estimation of Leaf Area Index for Winter Wheat at Early Stages Based on Convolutional Neural Networks. Comput. Electron. Agric. 2021, 190, 106480. [Google Scholar] [CrossRef]
  22. Li, Y.; Wang, H.; Dang, L.M.; Sadeghi-Niaraki, A.; Moon, H. Crop Pest Recognition in Natural Scenes Using Convolutional Neural Networks. Comput. Electron. Agric. 2020, 169, 105174. [Google Scholar] [CrossRef]
  23. Park, S.-H.; Lee, B.-Y.; Kim, M.-J.; Sang, W.; Seo, M.C.; Baek, J.-K.; Yang, J.E.; Mo, C. Development of a Soil Moisture Prediction Model Based on Recurrent Neural Network Long Short-Term Memory (RNN-LSTM) in Soybean Cultivation. Sensors 2023, 23, 1976. [Google Scholar] [CrossRef]
  24. Zhao, N.; Zhou, L.; Huang, T.; Taha, M.F.; He, Y.; Qiu, Z. Development of an Automatic Pest Monitoring System Using a Deep Learning Model of DPeNet. Measurement 2022, 203, 111970. [Google Scholar] [CrossRef]
  25. Wittstruck, L.; Jarmer, T.; Trautz, D.; Waske, B. Estimating LAI from Winter Wheat Using UAV Data and CNNs. IEEE Geosci. Remote Sens. Lett. 2022, 19, 1–5. [Google Scholar] [CrossRef]
  26. Wei, L.; Yang, H.; Niu, Y.; Zhang, Y.; Xu, L.; Chai, X. Wheat Biomass, Yield, and Straw-Grain Ratio Estimation from Multi-Temporal UAV-Based RGB and Multispectral Images. Biosyst. Eng. 2023, 234, 187–205. [Google Scholar] [CrossRef]
  27. Feng, W.; Wu, Y.; He, L.; Ren, X.; Wang, Y.; Hou, G.; Wang, Y.; Liu, W.; Guo, T. An Optimized Non-Linear Vegetation Index for Estimating Leaf Area Index in Winter Wheat. Precis. Agric. 2019, 20, 1157–1176. [Google Scholar] [CrossRef]
  28. Raj, R.; Walker, J.P.; Pingale, R.; Nandan, R.; Naik, B.; Jagarlapudi, A. Leaf Area Index Estimation Using Top-of-Canopy Airborne RGB Images. Int. J. Appl. Earth Obs. Geoinf. 2021, 96, 102282. [Google Scholar] [CrossRef]
  29. Zhao, B.; Ding, Y.; Cai, X.; Xie, J.; Liao, Q.; Zhang, J. Seedlings Number Identification of Rape Planter Based on Low Altitude Unmanned Aerial Vehicles Remote Sensing Technology. Trans. Chin. Soc. Agric. Eng. 2017, 33, 115–123. [Google Scholar] [CrossRef]
  30. Zhang, L.; Wang, X.; Zhang, H.; Zhang, B.; Zhang, J.; Hu, X.; Du, X.; Cai, J.; Jia, W.; Wu, C. UAV-Based Multispectral Winter Wheat Growth Monitoring with Adaptive Weight Allocation. Agriculture 2024, 14, 1900. [Google Scholar] [CrossRef]
  31. Gitelson, A.A.; Gritz, Y.; Merzlyak, M.N. Relationships Between Leaf Chlorophyll Content and Spectral Reflectance and Algorithms for Non-Destructive Chlorophyll Assessment in Higher Plant Leaves. J. Plant Physiol. 2003, 160, 271–282. [Google Scholar] [CrossRef] [PubMed]
  32. Haboudane, D.; Miller, J.R.; Tremblay, N.; Zarco-Tejada, P.J.; Dextraze, L. Integrated Narrow-Band Vegetation Indices for Prediction of Crop Chlorophyll Content for Application to Precision Agriculture. Remote Sens. Environ. 2002, 81, 416–426. [Google Scholar] [CrossRef]
  33. Daughtry, C.S.T.; Walthall, C.L.; Kim, M.S.; De Colstoun, E.B.; McMurtrey, J.E., III. Estimating Corn Leaf Chlorophyll Concentration from Leaf and Canopy Reflectance. Remote Sens. Environ. 2000, 74, 229–239. [Google Scholar] [CrossRef]
  34. Gitelson, A.A.; Kaufman, Y.J.; Merzlyak, M.N. Use of a Green Channel in Remote Sensing of Global Vegetation from EOS-MODIS. Remote Sens. Environ. 1996, 58, 289–298. [Google Scholar] [CrossRef]
  35. Candiago, S.; Remondino, F.; De Giglio, M.; Dubbini, M.; Gattelli, M. Evaluating Multispectral Images and Vegetation Indices for Precision Farming Applications from UAV Images. Remote Sens. 2015, 7, 4026–4047. [Google Scholar] [CrossRef]
  36. Louhaichi, M.; Borman, M.M.; Johnson, D.E. Spatially Located Platform and Aerial Photography for Documentation of Grazing Impacts on Wheat. Geocarto Int. 2001, 16, 65–70. [Google Scholar] [CrossRef]
  37. Vinutha, H.; Poornima, B.; Sagar, B. Detection of Outliers Using Interquartile Range Technique from Intrusion Dataset. In Information and Decision Sciences; Springer: Berlin/Heidelberg, Germany, 2018; pp. 511–518. [Google Scholar] [CrossRef]
  38. Hao, X.; Liu, L.; Yang, R.; Yin, L.; Zhang, L.; Li, X. A Review of Data Augmentation Methods of Remote Sensing Image Target Recognition. Remote Sens. 2023, 15, 827. [Google Scholar] [CrossRef]
  39. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
  40. Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations (ICLR 2019), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar] [CrossRef]
  41. Moran, P.A. Notes on Continuous Stochastic Phenomena. Biometrika 1950, 37, 17–23. [Google Scholar] [CrossRef]
  42. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  43. Awad, M.; Khanna, R. Support Vector Regression. In Efficient Learning Machines; Springer: Berlin/Heidelberg, Germany, 2015; pp. 67–80. [Google Scholar] [CrossRef]
  44. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme Learning Machine: Theory and Applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  45. Chen, T.; He, T.; Benesty, M.; Khotilovich, V.; Tang, Y. Xgboost: Extreme Gradient Boosting. R Package Version 0.4-2. 2015, pp. 1–4. Available online: https://cran.ms.unimelb.edu.au/web/packages/xgboost/vignettes/xgboost.pdf (accessed on 15 July 2020).
  46. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105. [Google Scholar]
  47. Gitelson, A.A.; Viña, A.; Arkebauer, T.J.; Rundquist, D.C.; Keydan, G.; Leavitt, B. Remote Estimation of Leaf Area Index and Green Leaf Biomass in Maize Canopies. Geophys. Res. Lett. 2003, 30, 1248. [Google Scholar] [CrossRef]
  48. Thenkabail, P.S.; Smith, R.B.; De Pauw, E. Hyperspectral Vegetation Indices and Their Relationships with Agricultural Crop Characteristics. Remote Sens. Environ. 2000, 71, 158–182. [Google Scholar] [CrossRef]
  49. Gitelson, A.A.; Kaufman, Y.J.; Stark, R.; Rundquist, D. Novel Algorithms for Remote Estimation of Vegetation Fraction. Remote Sens. Environ. 2002, 80, 76–87. [Google Scholar] [CrossRef]
  50. Xie, P.; Zhang, Z.; Ba, Y.; Dong, N.; Zuo, X.; Yang, N.; Chen, J.; Cheng, Z.; Zhang, B.; Yang, X. Diagnosis of Summer Maize Water Stress Based on UAV Image Texture and Phenotypic Parameters. Trans. Chin. Soc. Agric. Eng. 2024, 40, 136–146. [Google Scholar] [CrossRef]
  51. Liu, J.; Zhu, Y.; Tao, X.; Chen, X.; Li, X. Rapid Prediction of Winter Wheat Yield and Nitrogen Use Efficiency Using Consumer-Grade Unmanned Aerial Vehicles Multispectral Imagery. Front. Plant Sci. 2022, 13, 1032170. [Google Scholar] [CrossRef]
  52. Zhang, Y.; Xu, A.; Lan, D.; Zhang, X.; Yin, J.; Goh, H.H. ConvNeXt-Based Anchor-Free Object Detection Model for Infrared Image of Power Equipment. Energy Rep. 2023, 9, 1121–1132. [Google Scholar] [CrossRef]
  53. Li, Z.; Gu, T.; Li, B.; Xu, W.; He, X.; Hui, X. ConvNeXt-Based Fine-Grained Image Classification and Bilinear Attention Mechanism Model. Appl. Sci. 2022, 12, 9016. [Google Scholar] [CrossRef]
  54. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef]
  55. Davidson, C.; Jaganathan, V.; Sivakumar, A.N.; Czarnecki, J.M.P.; Chowdhary, G. NDVI/NDRE Prediction from Standard RGB Aerial Imagery Using Deep Learning. Comput. Electron. Agric. 2022, 203, 107396. [Google Scholar] [CrossRef]
Figure 1. Study area survey: (a) location of study site; (b) study area and sampling point distribution; (c) spatial distribution map of different winter wheat varieties in the study area, where numbers 1–21 represent different wheat genotypes.
Figure 2. Distribution of the original LAI data and preprocessed LAI data: (a) distribution of the original LAI data; (b) distribution of the preprocessed LAI data.
Figure 3. Single-modal feature extraction network framework.
Figure 4. Multimodal feature fusion network framework. (a) The overall framework; (b) the structure of the modules, such as the stem layer, multi-layer perceptron (MLP) layer, and residual block.
Figure 5. LAI prediction results based on single-modal data: (a) the LAI prediction results based on RGB data; (b) the LAI prediction results based on multispectral VI data.
Figure 6. Observed vs. predicted LAI using the MSF-FusionNet model. The red dashed line indicates the 1:1 reference line, and blue dots represent individual sample predictions. The model achieves an R2 of 0.8745 and an RMSE of 0.5461, indicating strong prediction accuracy and demonstrating the effectiveness of multi-source image fusion for LAI estimation.
Figure 7. Relationship between predicted and measured LAI values based on different machine learning models.
Figure 8. Relationship between predicted and measured LAI values based on different deep learning models.
Figure 9. Spatial distribution maps of LAI for different growth stages in the study area. Note: Due to space limitations, only the visualization results for the post-flowering stages were analyzed and presented.
Table 1. Details of multispectral vegetation indices.
Vegetation Index | Formulation | Reference
Normalized difference vegetation index (NDVI) | NDVI = (NIR − R)/(NIR + R) | [30]
Normalized difference red-edge vegetation index (NDRE) | NDRE = (NIR − RE)/(NIR + RE) | [31]
Optimized soil-adjusted vegetation index (OSAVI) | OSAVI = (NIR − R)/(NIR + R + L), L = 0.16 | [32]
Modified chlorophyll absorption in reflectance index (MCARI) | MCARI = [(RE − R) − 0.2(RE − G)] × (RE/R) | [33]
Transformed chlorophyll absorption in reflectance index (TCARI) | TCARI = 3[(RE − R) − 0.2(RE − G)(RE/R)] | [30]
Green normalized difference vegetation index (GNDVI) | GNDVI = (NIR − G)/(NIR + G) | [34]
Red–green–blue vegetation index (RGBVI) | RGBVI = (G² − B × R)/(G² + B × R) | [35]
Green leaf index (GLI) | GLI = (2G − R − B)/(2G + R + B) | [36]
Note: L represents the standard value of the canopy background adjustment factor (0.16).
Table 2. Descriptive statistics of LAI for winter wheat across various growth stages.
Date | Growth Stage | Number | Max | Min | Mean | Std | Var | CV
12 March 2024 | Regreening | 91 | 2.067 | 0.500 | 1.215 | 0.386 | 0.149 | 0.318
21 March 2024 | Jointing | 91 | 3.750 | 0.286 | 1.681 | 0.665 | 0.442 | 0.396
30 March 2024 | Booting | 91 | 5.700 | 0.643 | 2.854 | 1.023 | 1.046 | 0.358
10 April 2024 | Heading | 91 | 6.800 | 1.300 | 4.044 | 0.988 | 0.975 | 0.244
18 April 2024 | Pre-Anthesis | 91 | 5.983 | 2.017 | 4.200 | 0.857 | 0.734 | 0.204
24 April 2024 | Post-Anthesis | 91 | 6.743 | 1.729 | 4.442 | 0.955 | 0.912 | 0.215
8 May 2024 | Grain Filling | 91 | 6.733 | 2.050 | 4.572 | 0.913 | 0.834 | 0.200
Table 3. Global Moran’s I statistics of LAI at different wheat growth stages.
Growth Stage | Moran’s I | p-Value | z-Score
Regreening | 0.0045 | 0.2 | 0.7419
Jointing | 0.0177 | 0.106 | 1.2663
Booting | 0.0036 | 0.207 | 0.7025
Heading | 0.0181 | 0.105 | 1.2363
Pre-Anthesis | 0.0051 | 0.213 | 0.6799
Post-Anthesis | 0.0097 | 0.159 | 0.9486
Grain Filling | 0.0074 | 0.172 | 0.9046
Table 4. Correlation coefficients between VI and LAI at different growth stages.
Growth Stage | NDVI | NDRE | OSAVI | MCARI | TCARI | GNDVI | RGBVI | GLI
Regreening | 0.831 | 0.881 | 0.768 | 0.867 | 0.867 | 0.849 | 0.790 | 0.805
Jointing | 0.782 | 0.861 | 0.743 | 0.872 | 0.872 | 0.814 | 0.708 | 0.723
Booting | 0.708 | 0.805 | 0.850 | 0.878 | 0.878 | 0.734 | 0.706 | 0.707
Heading | 0.612 | 0.624 | 0.658 | 0.671 | 0.671 | 0.580 | 0.564 | 0.559
Pre-Anthesis | 0.579 | 0.645 | 0.752 | 0.710 | 0.710 | 0.592 | 0.410 | 0.424
Post-Anthesis | 0.626 | 0.638 | 0.750 | 0.724 | 0.724 | 0.615 | 0.420 | 0.450
Grain Filling | 0.615 | 0.644 | 0.773 | 0.674 | 0.674 | 0.625 | 0.182 | 0.257
Table 5. LAI estimation performance across key growth stages based on multi-source imagery fusion.
Growth Stage | Training Loss | Training R2 | Training RMSE | Validation Loss | Validation R2 | Validation RMSE
Regreening | 0.0204 | 0.7494 | 0.2051 | 0.0103 | 0.7679 | 0.1438
Jointing | 0.0581 | 0.7632 | 0.3162 | 0.0333 | 0.8611 | 0.2581
Booting | 0.0886 | 0.8329 | 0.4206 | 0.0664 | 0.8592 | 0.3644
Heading | 0.2079 | 0.5063 | 0.7010 | 0.1288 | 0.6898 | 0.5203
Pre-Anthesis | 0.3391 | 0.3101 | 0.6933 | 0.1481 | 0.6181 | 0.5479
Post-Anthesis | 0.4430 | 0.2978 | 0.8115 | 0.1515 | 0.5875 | 0.5703
Grain Filling | 0.3221 | 0.3328 | 0.7445 | 0.1819 | 0.5418 | 0.6072
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
