Utilizing Multi-View Morphological, Color–Textural and Multispectral Features for Interpretable Estimation of Lettuce Fresh Weight Using Machine Learning

Zhang, Xiaodong; Li, Tiezhu; Guo, Chuandong; Zhang, Deshen; Zhang, Yixue

doi:10.3390/horticulturae12060688

by Xiaodong Zhang ¹,²,*, Tiezhu Li ¹, Chuandong Guo ¹, Deshen Zhang ¹ and Yixue Zhang ³,*

¹ School of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China

² Department of Agricultural Engineering, Jiangsu University, Zhenjiang 212013, China

³ Basic Engineering Training Center, Jiangsu University, Zhenjiang 212013, China

* Authors to whom correspondence should be addressed.

Horticulturae 2026, 12(6), 688; https://doi.org/10.3390/horticulturae12060688

Submission received: 1 May 2026 / Revised: 22 May 2026 / Accepted: 31 May 2026 / Published: 2 June 2026

(This article belongs to the Special Issue Intelligent Agricultural Equipment Monitoring Technology for Vegetable Production)

Abstract

Accurate and reliable prediction of lettuce fresh weight is essential for optimising protected cultivation management and improving yield and quality. Multimodal data combined with machine learning models have been widely used for monitoring crop growth. However, existing approaches often fail to capture dynamic physiological changes during crop growth, and conventional machine learning models are frequently limited by their black-box nature and thus cannot reveal the intrinsic relationships between features and targets. To address these issues, this study developed a stationary, multi-sensor integrated data acquisition platform under controlled greenhouse conditions. By fusing multi-view morphological, color and texture, and multispectral features, the study constructed interpretable machine learning models for predicting the fresh weight of lettuce. From the data collected by the platform, 66 initial features covering morphology, color, texture, and vegetation indices were extracted. A two-stage feature-selection strategy combining Pearson correlation screening and variance inflation factor (VIF)-based multicollinearity elimination was used to select nine optimal input variables for the model. To achieve an accurate estimation of the fresh weight of lettuce, six models were compared: Support Vector Regression (SVR), Random Forest Regression (RFR), Gradient Boosted Decision Tree Regression (GBDT), K-nearest neighbour regression (KNN), XGBoost, and Backpropagation Neural Network (BPNN). The results indicate that the SVR model based on multimodal data fusion performed best, with an R² of 0.93, an RMSE of 3.23 g, an RMSEn of 5.60%, and an MAE of 2.31 g, demonstrating a significantly higher prediction accuracy than the other models. Furthermore, the SHAP interpretation method was used to reveal the contributions of key features to fresh weight estimation and their interaction mechanisms. This study provides a feasible approach and technical guidance for non-destructive estimation of fresh weight in lettuce under controlled conditions, and offers a preliminary basis for the development of phenotypic monitoring models for protected cultivation.

Keywords:

protected lettuce; multi-view images; fresh weight; data fusion; machine learning

1. Introduction

The accurate and reliable prediction of fresh weight in lettuce is a critical component for optimizing greenhouse production management, improving resource utilization efficiency, and ensuring yield [1,2]. As the most intuitive physical indicator of photosynthetic product accumulation and growth status, dynamic changes in fresh weight directly reflect the plant’s physiological response to environmental regulation [3]. Traditional fresh weight measurements rely on destructive sampling and weighing using electronic balances. Although this method offers high precision, its irreversible nature precludes continuous, non-destructive monitoring of individual plants throughout their entire life cycle. Furthermore, the time-consuming and labor-intensive nature of these operations makes it difficult to meet the high-throughput demands of intelligent phenotyping platforms [4,5]. Consequently, the development of non-destructive methods for estimating fresh weight has become a research hotspot in the field of precision agriculture [6,7].

The three-dimensional geometric characteristics of vegetation are strongly correlated with biomass, as they directly quantify the spatial occupancy and morphological distribution of plants [8,9]. Wang et al. [10] found that the coefficient of determination (R²) between canopy structure features extracted from UAV multispectral imagery and aboveground biomass in grasslands was 0.73; by integrating coupled information on height and coverage, this approach significantly outperformed traditional spectral vegetation indices. Xie et al. [11] employed LiDAR technology to calculate canopy height models and point cloud volumes, achieving an extraction accuracy of over 95%, thereby providing precise phenotypic parameters for biomass distribution modeling. However, relying solely on structural information has limitations. When plants are subjected to stress, internal physiological changes typically precede external morphological changes, resulting in a lag in the response of structural features during early growth stages or the initial phase of stress [12]. Additionally, canopy overlap and shading can affect the estimation accuracy of key plant phenotypic parameters [13,14]. Therefore, relying solely on 3D structural information makes it difficult to fully reconstruct the true growth status of vegetation in complex canopy scenarios [9,15].

Multispectral features sensitively reflect physiological and biochemical states, such as chlorophyll content and water stress, whereas color and texture features extract richer spatial distribution information from plant surfaces; both serve as effective supplements to structural information [16,17]. However, relying solely on single-source data or two-dimensional information from a single perspective is highly susceptible to interference from canopy overlap, sensor saturation effects, and complex environmental backgrounds [18]. Research indicates that the deep fusion of multisource phenotypic features through cross-modal information complementarity and multilevel feature synergy enables a more comprehensive analysis of complex systems [19]. Zhang et al. [20] significantly improved the estimation of aboveground biomass in sorghum by fusing morphological, color, and textural features from multiview images. In the diagnosis of water stress in summer maize, Xie et al. [21] constructed a backpropagation neural network model by combining vegetation indices, image texture, and phenotypic parameters, significantly mitigating the underestimation of low stomatal conductance values. In a study predicting sorghum chlorophyll content, Zhang et al. [22] constructed a PLSR model by integrating RGB color features, hyperspectral indices, and fluorescence intensity. The model achieved a prediction R² of 0.90, far exceeding that of models using single-sensor features. Che et al. [23] acquired image sequences at five growth stages using UAVs for maize biomass estimation. They extracted parameters, such as canopy structure and spectral characteristics, from both raw and reconstructed images and constructed aboveground biomass estimation models based on both single-parameter and multimodal data. Multimodal data fusion achieved a higher biomass estimation accuracy than single-parameter methods, with a coefficient of determination of 0.83, which also supports the advantage of multimodal information synergy over single data sources. In summary, cross-modal feature fusion can effectively compensate for the inherent limitations of single information sources, enabling a more comprehensive and accurate analysis of vegetation phenotype and physiological status [24,25].

Multi-view imaging technology is a key method for enhancing the robustness of plant phenotyping. By leveraging spatial redundancy to overcome the limitations of single-view imaging in terms of information coverage and data accuracy [26,27], it effectively addresses the shortcomings of traditional single-view, single-sensor approaches in analyzing complex three-dimensional plant structures, physiological states, and environmental interactions, thereby opening new avenues for achieving precise phenotypic analysis from the organ to the population scale [28]. Zhang et al. [20] utilized multi-view image fusion and multi-category features to assess the above-ground biomass of sorghum. Compared to single-type variables and image information from a single viewpoint, fusion based on averaged multi-view image information significantly enhances the ability to capture the phenotypic characteristics of sorghum above-ground biomass. Li et al. [29] estimated blueberry yield using multi-view images combined with the YOLOv8 object detection framework. Compared to single-view image information, a regression model based on multi-view image fusion significantly improved the accuracy of blueberry yield estimation, reducing the mean absolute percentage error to 24.6% and achieving an R² of 0.77—representing an improvement of 5.2% to 15.7% over single-view methods. Zhang et al. [30] used multi-angle remote sensing technology to estimate water use efficiency in winter wheat. Compared with traditional single-angle vertical observations, multi-angle remote sensing can capture richer information on canopy structure and outperform traditional single-angle spectral parameter models. Duan et al. [31] utilized multi-angle imaging technology to collect color images of rice from multiple angles, enabling a comprehensive determination of the number of panicles. 
Multi-angle imaging effectively overcomes occlusion issues associated with a single viewpoint, significantly improving the accuracy and stability of panicle number identification. In summary, multi-view imaging effectively overcomes issues such as single-view occlusion and information loss by integrating multidimensional phenotypic characteristics. This method provides more comprehensive data and reduces random errors caused by a single measurement angle through information complementarity and averaging, thereby improving the accuracy of phenotypic analysis.

Traditional machine learning models generally suffer from a lack of transparency in their decision-making processes and poor interpretability. While striving for high predictive accuracy, models often sacrifice interpretability, making it difficult to balance predictive performance and interpretability [32,33]. To address this challenge, explainable machine learning has rapidly developed in recent years. It not only reveals the internal processes of models but also uses model-agnostic explanation methods to transform any “black-box” model into an explainable one without requiring knowledge of its internal structure, providing explanations at both global and local levels [34,35]. Applying the aforementioned interpretability framework to crop fresh weight prediction using multimodal data fusion allows for the clear visualization of the influence pathways of different data sources on prediction results while maintaining model accuracy, thereby enhancing the model’s credibility and usability in practical applications [36,37].

To address the aforementioned issues, we designed an integrated platform for data acquisition using multiple sensors. By capturing multi-view morphological, color, and texture data, as well as multispectral features of mature lettuce, we propose a fresh weight prediction model based on multi-view and multi-modal feature fusion for greenhouse-grown lettuce. This study established a feature fusion framework that integrates optical sensor data, including multispectral vegetation indices and visible light color and texture features. It introduced a collaborative strategy for multi-view and multi-scale features and used variance inflation factor tests and correlation analysis to eliminate redundant features, thereby selecting an optimal feature subset. By combining ensemble learning methods, such as random forest, we constructed a mapping model between lettuce multimodal data and fresh weight. By utilizing SHAP explanations to enhance the scientific understanding of the prediction process, we fully leveraged the complementary nature of spectral, morphological, and color texture features to overcome the limitations of single-feature dimensions and angle dependency. This study provides a theoretical basis and technical guidance for non-destructive estimation of the fresh weight of greenhouse-grown lettuce.

2. Materials and Methods

2.1. Plant Materials and Experimental Design

This study used “Italian bolting-resistant lettuce” (Lactuca sativa L. var. ramosa Hort.) as the experimental material. The experiment was conducted from September to December 2025 in a Venlo-type glass greenhouse at Jiangsu University (32.2° N, 119.5° E). During the experiment, the average daytime temperature in the greenhouse was 20 °C, the average nighttime temperature was 16 °C, and the average relative humidity was maintained between 65% and 75%. When the lettuce seedlings reached the “three leaves and a heart” stage, vigorous individuals were selected and transplanted into cultivation pots. To establish a population of lettuce samples with distinct phenotypic differences, this study used a standard Hoagland nutrient solution formula as a basis and adjusted the NH₄⁺ to NO₃⁻ ratio to set five nitrogen levels: 25%, 50%, 100% (CK), 150%, and 200% of the standard nitrogen concentration. Each treatment consisted of 24 replicates, totaling 120 plants. All pots were arranged in a completely randomized block design to eliminate the effects of microenvironmental variations within the greenhouse. Perlite with a particle size of approximately 3 mm was used as the growing medium; the medium was rinsed with clean water and sterilized at a high temperature prior to use. Irrigation was performed twice daily at 09:00 and 17:00, with a volume of 150 mL per plant per irrigation, to ensure that the growing medium remained moist without waterlogging. During the experiment, the pH of the nutrient solution was regularly monitored and adjusted to 6.0 ± 0.2 to ensure a stable root zone environment. Once the lettuce reached 45 days after transplanting (maturity), multi-view image data were simultaneously acquired using a fixed multi-sensor platform on the morning of the harvest day (08:00–10:00). Immediately after data collection, destructive sampling was performed, and the fresh weight of the aboveground parts was measured using an electronic balance (LQ-C, Jiaxing Zhengfeng Intelligent Equipment Co., Ltd., Jiaxing, China) with a precision of 0.01 g. The lettuce cultivation site is shown in Figure 1.

2.2. Multi-View Data Acquisition and Image Extraction

To ensure the standardization and consistency of data collection, this study divided the collection process into three stages: multi-view stereoscopic vision data collection, top-down multi-trait data collection, and side-view multi-angle trait data collection. All data acquisitions were conducted within a proprietary light-shielded darkbox platform to eliminate ambient light interference. In the first stage, a low-cost RGB sensor (AT-36, Shenzhen Antong Electronics Development Co., Ltd., Shenzhen, China) was used to capture multi-angle image sequences of lettuce, which were then used to construct a 3D point cloud and extract morphological features. The construction of all point cloud models and the extraction of morphological parameters followed the experimental protocol described in a previous study [38]. In the second and third stages, a high-resolution RGB camera (MV-CS200-10GM/C, Hangzhou Hikvision Robotics Co., Ltd., Hangzhou, China) and a multispectral camera (MAX-G800, Shenzhen Zhongda Ruihe Technology Co., Ltd., Shenzhen, China) were used to simultaneously collect top-view and side-view data. To keep the subject-to-camera distance and field of view consistent across acquisition positions, and thus obtain uniform, standardized images, the high-resolution RGB camera was paired with an 8 mm fixed-focus lens (MVL-KF0814M-12MPE, Hangzhou Hikvision Robotics Co., Ltd., Hangzhou, China). The lens has an aperture range of F1.4 to F16.0, which can be adjusted according to acquisition requirements. For top-down imaging, the camera’s optical axis was kept perpendicular to the plant canopy (rotation angle set to 0°), and a single top-down image was captured at a fixed working distance of 1.5 m. For side-view imaging, four rotation angles (0°, 90°, 180°, and 270°) were set, with image acquisition completed in four separate sessions. The working distance of the camera for each side-view perspective remained consistent with that of the top-down phase to ensure the comparability of data collected from different angles.

To ensure imaging quality and improve acquisition efficiency, this study followed a “top-view first, side-view later” sequence. Specifically, after completing the collection of top-view data for all samples, multi-angle side-view data were collected uniformly to avoid interference with the lighting environment and imaging system caused by frequent adjustments to the shooting angle. To correct lens distortion in the high-resolution RGB camera, its intrinsic parameters were calibrated using the Zhang Zhengyou calibration method prior to image acquisition. Distortion coefficients were obtained using a calibration target, and OpenCV (version 4.13.0.92) was used to correct both radial and tangential lens distortion, thereby ensuring imaging accuracy. For the quantitative correction of multispectral data, this study performed radiometric calibration using a standard reflectance plate. After acquiring reference images of the whiteboard, rectangular regions of interest (ROIs) were defined at the center of each spectral channel image, and the average grayscale value within these regions was extracted. By combining the nominal reflectance of the whiteboard in the corresponding bands with a linear mapping model, a correspondence between image pixel grayscale values and target reflectance was established, thereby achieving quantitative correction and feature inversion for the entire image.
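The whiteboard-based linear mapping can be illustrated with a minimal sketch. It assumes a single reference panel and a linear model through the origin (no dark-current offset); the function name `calibrate_band`, the ROI half-size, and the 0.95 nominal reflectance are hypothetical and are not taken from the study.

```python
import numpy as np

def calibrate_band(band_dn, panel_dn, panel_reflectance, roi_half=10):
    """Convert one spectral band from grayscale values to reflectance.

    Assumed model: reflectance = DN * (panel_reflectance / mean panel DN),
    where the mean is taken over a rectangular ROI at the image centre.
    """
    h, w = panel_dn.shape
    cy, cx = h // 2, w // 2
    # Rectangular ROI at the centre of the whiteboard reference image.
    roi = panel_dn[cy - roi_half:cy + roi_half, cx - roi_half:cx + roi_half]
    gain = panel_reflectance / roi.mean()
    return band_dn.astype(np.float64) * gain

# Hypothetical example: the panel reads 200 everywhere and reflects 95 %.
panel = np.full((100, 100), 200.0)
plant = np.array([[100.0, 50.0], [200.0, 0.0]])
reflectance = calibrate_band(plant, panel, panel_reflectance=0.95)
```

A two-point model (white panel plus dark reference) would add an offset term; whether the study used one is not stated in the text.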

To achieve precise separation of plant objects from the background and ensure the accuracy of subsequent feature extraction, this study employs the classic U-Net deep learning network to perform semantic segmentation on RGB and multispectral images from stages two and three. This study did not introduce any innovations to the U-Net network architecture during the image segmentation process. Instead, it utilized an open-source network framework and trained the model using a labeled dataset of lettuce plants and background samples to ensure the stability and reliability of the segmentation results. During segmentation, the lettuce plant was treated as the target foreground, while the flower pot, growing medium, and darkbox background were treated as background regions. Through network inference, a binary segmentation mask was generated to achieve precise separation of the foreground and background, laying the foundation for subsequent image feature extraction.

2.3. Feature Extraction and Variable Construction

Based on image preprocessing, this study extracted multiple feature variables from multimodal data, including morphological features (MFs), color indices (CIs), texture indices (TIs), and multispectral vegetation indices (VIs). After variable selection, these were used as inputs for the model predicting the fresh weight of lettuce.

2.3.1. Extraction of Morphological Features (MFs)

Vegetation morphological parameters are core indicators for assessing crop growth and development and play a crucial role in biomass estimation [39]. The extent of lettuce canopy expansion directly reflects its photosynthetic efficiency and dry matter accumulation capacity; therefore, establishing a quantitative mapping relationship between external morphology and biomass yield is of significant scientific value. Drawing on previous research methods [38], this study extracted the three-dimensional point cloud features of lettuce through 3D reconstruction of image sequences. To characterize plant growth status from multiple dimensions, the system extracted 12 phenotypic parameters covering the spatial scale, two-dimensional projections, and three-dimensional geometric structures. These include plant height (PH), projected convex hull area (PCHA), projected circumscribed circle radius (PCCR), point cloud convex hull volume (PCCHV), projected circumscribed rectangle width (PER_W), projected circumscribed rectangle height (PER_H), projected bounding rectangle length (PER_L), point cloud convex hull surface area (PCCHSA), projected convex hull perimeter (PCHP), projected bounding rectangle width-to-height ratio (PER_WHR), ratio of projected convex area to circumscribed circle area (PCHA/CCA), and ratio of point cloud convex hull surface area to volume (PCCHSA/PCCHV), as shown in Table 1.
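As a rough illustration of how several of these traits can be derived from a point cloud, the sketch below uses SciPy convex hulls. It is not the protocol of [38]: it assumes the cloud contains only plant points, that the z axis is vertical, and that the projection plane is x-y. The function name `morphology_features` is hypothetical.

```python
import numpy as np
from scipy.spatial import ConvexHull

def morphology_features(points):
    """Compute a few convex-hull-based traits from an (N, 3) point cloud."""
    points = np.asarray(points, dtype=float)
    z = points[:, 2]
    hull3d = ConvexHull(points)
    # Project onto the x-y plane and drop duplicate projected points.
    projected = np.unique(points[:, :2], axis=0)
    # For 2-D input, SciPy reports the enclosed area as `volume`
    # and the perimeter as `area`.
    hull2d = ConvexHull(projected)
    return {
        "PH": z.max() - z.min(),            # plant height
        "PCCHV": hull3d.volume,             # point cloud convex hull volume
        "PCCHSA": hull3d.area,              # convex hull surface area
        "PCHA": hull2d.volume,              # projected convex hull area
        "PCHP": hull2d.area,                # projected convex hull perimeter
        "PCCHSA/PCCHV": hull3d.area / hull3d.volume,
    }

# Toy input: the eight corners of a unit cube.
cube = np.array([[x, y, z] for x in (0, 1) for y in (0, 1) for z in (0, 1)],
                dtype=float)
features = morphology_features(cube)
```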

2.3.2. Extraction of Color Feature Indices (CIs)

Color is one of the most visually apparent phenotypic traits of lettuce, and its spatiotemporal distribution is intrinsically linked to the plant’s internal physiological and biochemical composition, as well as its water content [40]. Studies have shown that subtle changes in leaf color characteristics often precede significant alterations in morphological structures when crops are subjected to water stress or nutrient deficiency [41]. Therefore, quantitatively analyzing the color characteristics from an optical perspective is key to modeling lettuce growth and diagnosing its physiological status. In this study, the mean values of the extracted RGB color components were first normalized to obtain standardized components r, g, and b. Subsequently, a series of color indices highly correlated with crop growth dynamics were constructed through mathematical combinations [42]. Building on previous research on biomass estimation, this study screened and calculated 19 color indices based on the RGB space and standardized color components as candidate feature variables for fresh weight prediction. Specific definitions are provided in Table 2.
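The normalization and index construction can be sketched as follows. The two indices shown, excess green (ExG) and the normalized green-red difference index (NGRDI), are common examples and are only assumed to be representative of the 19 indices in Table 2; the function name and toy image are hypothetical.

```python
import numpy as np

def color_indices(rgb_image, mask):
    """Mean-based colour indices over the plant pixels selected by `mask`."""
    pixels = rgb_image[mask].astype(np.float64)   # shape (n_pixels, 3)
    R, G, B = pixels.mean(axis=0)                 # mean of each channel
    total = R + G + B
    r, g, b = R / total, G / total, B / total     # normalized components
    return {
        "r": r, "g": g, "b": b,
        "ExG": 2 * g - r - b,          # excess green
        "NGRDI": (g - r) / (g + r),    # normalized green-red difference
    }

# Toy 2 x 2 image; the mask keeps only the two "plant" pixels.
image = np.array([[[60, 120, 20], [60, 120, 20]],
                  [[0, 0, 0], [255, 255, 255]]], dtype=np.uint8)
mask = np.array([[True, True], [False, False]])
ci = color_indices(image, mask)
```

In practice the mask would be the binary foreground mask produced by the segmentation step.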

2.3.3. Extraction of Texture Feature Indices (TIs)

Plant texture features are key visual parameters that characterize the geometric topology and tissue texture of plant surfaces and can sensitively reflect fluctuations in crop physiological status. In this study, the gray-level co-occurrence matrix (GLCM) method, introduced by Haralick et al. [47], was utilized to perform a quantitative analysis of textural information on lettuce surfaces, with the aim of revealing the heterogeneity of gray-level spatial distribution and textural fineness characteristics. Through systematic screening, eight GLCM statistical indices with clear physical significance were identified: mean (MEA), variance (VAR), homogeneity (HOM), contrast (CON), dissimilarity (DIS), entropy (ENT), angular second-order moment (ASM), and correlation (COR). These indices characterize the textural properties of plant surfaces in multiple dimensions; their specific calculation formulas are presented in Table 3.

According to previous reports, GLCM-based texture features used for biomass estimation are largely insensitive to the choice of calculation direction and window size [48]. In this study, we tested four calculation directions (0°, 45°, 90°, and 135°) and three sliding window sizes (1 × 1, 3 × 3, and 5 × 5), and the results confirmed this conclusion. Therefore, we selected a calculation direction of 0° and a window size of 3 × 3 for the analysis.

2.3.4. Extraction of Multispectral Feature Indices (VIs)

Multispectral vegetation indices can effectively eliminate the effects of atmospheric conditions, soil background, and lighting conditions by linearly or non-linearly combining spectral reflectance data from specific wavelength bands, thereby significantly enhancing the ability to characterize crop physiological status. This study utilized corrected reflectance data from five core spectral bands—blue (450 nm), green (550 nm), red (660 nm), red edge (720 nm), and near-infrared (850 nm)—acquired by a multispectral camera to construct a spectral feature set.

To systematically screen for key spectral features closely related to the fresh weight of lettuce aboveground biomass, this study, drawing on previous research on biomass remote sensing estimation and vegetation physiological monitoring, calculated and preliminarily selected 27 spectral vegetation indices as candidate input variables for the fresh weight prediction model. The specific definitions and calculation formulas are presented in Table 4.
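As an illustration of how such indices combine band reflectances, the sketch below computes four widely used indices from the five bands listed above. Whether each of these is among the 27 indices in Table 4 is an assumption, and the reflectance values are invented for the example.

```python
def vegetation_indices(blue, green, red, red_edge, nir):
    """A few widely used indices from band reflectances on a 0-1 scale."""
    return {
        "NDVI": (nir - red) / (nir + red),
        "NDRE": (nir - red_edge) / (nir + red_edge),
        "GNDVI": (nir - green) / (nir + green),
        "EVI": 2.5 * (nir - red) / (nir + 6 * red - 7.5 * blue + 1),
    }

# Hypothetical canopy-mean reflectances for one plant.
vi = vegetation_indices(blue=0.04, green=0.10, red=0.05,
                        red_edge=0.30, nir=0.45)
```

The same function works element-wise on NumPy arrays, so it can be applied to whole calibrated band images before averaging over the plant mask.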

2.3.5. Feature Selection and Multicollinearity Testing

To construct a high-sensitivity, low-redundancy multimodal feature set and provide a high-quality data foundation for predicting the fresh weight of lettuce, this study developed a two-step optimization strategy consisting of “initial screening based on correlation” and “removal of multicollinearity.”

(1) Initial feature screening based on Pearson correlation: For the four categories of initial features (morphology, color, texture, and multispectral data), the correlation coefficients with fresh weight were calculated separately. In the field of plant phenotyping, a large body of literature indicates that a correlation coefficient greater than 0.4 between image features and phenotypes is considered a significant correlation [20]. Therefore, a uniform screening criterion was established for the correlation screening process: statistical significance, p ≤ 0.05, and absolute value of the correlation coefficient, |r| ≥ 0.4. Through this step, variables with weak or statistically insignificant associations were rapidly removed from the original high-dimensional feature set, initially forming a subset of features for fresh weight prediction.

(2) Multicollinearity screening based on the variance inflation factor (VIF): To avoid model overfitting caused by high information overlap among features, this study employed a stepwise regression strategy to perform multicollinearity analysis on each independent feature dataset. While ensuring that the VIF ≤ 10, features with high correlations to the target variable and clear physical or physiological significance were prioritized for retention, whereas redundant variables were progressively removed to reduce model complexity and enhance generalization ability.
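The two-step strategy can be sketched as below. This is an illustration rather than the authors' implementation: the VIF is computed from the inverse of the feature correlation matrix, and the rule for which feature to drop (the high-VIF feature least correlated with the target) is an assumed reading of "prioritized for retention". Judgments about physical or physiological significance are not modelled, and all data are synthetic.

```python
import numpy as np
from scipy.stats import pearsonr

def vif(X):
    """VIF of each column: diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(X, rowvar=False)
    return np.diag(np.linalg.inv(corr))

def select_features(X, y, names, r_min=0.4, p_max=0.05, vif_max=10.0):
    # Stage 1: keep features with |r| >= r_min and p <= p_max.
    stats = {}
    for k, name in enumerate(names):
        r, p = pearsonr(X[:, k], y)
        stats[name] = (r, p)
    keep = [n for n in names
            if abs(stats[n][0]) >= r_min and stats[n][1] <= p_max]
    # Stage 2: while any VIF exceeds vif_max, drop the offending feature
    # that is least correlated with the target (assumed tie-break rule).
    while len(keep) > 1:
        cols = [names.index(n) for n in keep]
        values = vif(X[:, cols])
        offenders = [n for n, v in zip(keep, values) if v > vif_max]
        if not offenders:
            break
        keep.remove(min(offenders, key=lambda n: abs(stats[n][0])))
    return keep

# Synthetic data: x2 nearly duplicates x1, x4 is unrelated to the target.
rng = np.random.default_rng(0)
x1 = rng.normal(size=120)
x2 = x1 + 0.05 * rng.normal(size=120)
x3 = rng.normal(size=120)
x4 = rng.normal(size=120)
y = x1 + x3 + 0.3 * rng.normal(size=120)
X = np.column_stack([x1, x2, x3, x4])
selected = select_features(X, y, ["x1", "x2", "x3", "x4"])
```

Here the unrelated feature is removed in the first stage and one of the two near-duplicate features in the second.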

2.4. Fresh-Weight Prediction Modelling

In this study, to achieve accurate predictions of the fresh weight of lettuce, and given the relatively small size of the lettuce sample dataset, we selected six representative machine learning algorithms for modeling and comparative analysis: Support Vector Regression (SVR), Random Forest Regression (RFR), Gradient Boosted Decision Tree Regression (GBDT), K-Nearest Neighbors Regression (KNN), XGBoost, and Backpropagation Neural Networks (BPNNs). These machine learning methods offer significant advantages when handling small datasets and can improve prediction accuracy through algorithm optimization and tuning. These algorithms encompass diverse mathematical mechanisms, ranging from classical kernel mapping to ensemble learning and neural networks, enabling them to effectively capture the complex nonlinear relationships between multi-source phenotypic traits and the target variable. To eliminate the impact of differences in feature dimensions on model training, this study standardized the input features. All standardization rules were derived solely from the training set data; the test set was directly transformed using these rules and was not included in the standardization process, to strictly prevent data leakage. During model development, this study employed a grid search combined with 10-fold cross-validation for hyperparameter optimization. The specific parameters used in the model-building process are shown in Table 5; all other parameters were set to their default values. First, the search ranges for the core parameters were determined based on the characteristics of each algorithm. Subsequently, GridSearchCV was used to traverse all parameter combinations with the objective of maximizing the coefficient of determination R². This process ultimately identified the optimal parameter configuration on the validation set. This study included five nitrogen gradient treatment groups. To ensure balanced data distribution between the training and testing sets across different treatment levels, the dataset was divided using stratified sampling in a 7:3 ratio. Sample splitting was performed independently within each treatment group to ensure that both datasets fully covered the entire range of experimental treatments, thereby avoiding any imbalance in group distribution. The training set was used for model training and hyperparameter tuning, whereas the test set was used to independently evaluate the final predictive performance of the model. The detailed workflow for feature extraction and modeling is shown in Figure 2. All algorithms in this study were run on the Windows 11 operating system with the following hardware configuration: Intel(R) Core(TM) i9-13900HX CPU @2.20 GHz, NVIDIA GeForce RTX 4060 GPU, and 32 GB of RAM.
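A minimal scikit-learn sketch of this workflow for the SVR case is given below. The data are synthetic stand-ins, the parameter grid is illustrative and does not reproduce Table 5, and placing the scaler inside a `Pipeline` is one possible way to keep standardization restricted to training data; the study does not state how this was implemented.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in: 5 nitrogen groups x 24 plants, 9 selected features,
# fresh weight in grams (all values are made up).
rng = np.random.default_rng(42)
groups = np.repeat(np.arange(5), 24)
X = rng.normal(size=(120, 9)) + groups[:, None] * 0.5
y = 40 + 5 * X[:, 0] + 3 * X[:, 1] + rng.normal(scale=2.0, size=120)

# Stratified 7:3 split so every nitrogen level appears in both sets.
X_train, X_test, y_train, y_test, g_train, g_test = train_test_split(
    X, y, groups, test_size=0.3, stratify=groups, random_state=0)

# The scaler sits inside the pipeline, so it is refit on the training part
# of every fold and never sees validation or test samples.
pipe = Pipeline([("scale", StandardScaler()), ("svr", SVR(kernel="rbf"))])
param_grid = {"svr__C": [1, 10, 100], "svr__gamma": ["scale", 0.01, 0.1]}
search = GridSearchCV(pipe, param_grid, cv=10, scoring="r2")
search.fit(X_train, y_train)

test_r2 = search.score(X_test, y_test)   # R² on the held-out test set
```

After fitting, `search.best_params_` holds the selected configuration, and `search.best_estimator_` is the pipeline refit on the full training set.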

2.5. Performance Evaluation Metrics

To objectively and comprehensively evaluate the predictive accuracy and generalization ability of machine learning models for the fresh weight of lettuce, this study selected the coefficient of determination (R²), root mean square error (RMSE), normalized root mean square error (RMSEn), and mean absolute error (MAE) as quantitative evaluation metrics. By eliminating the influence of the target variable’s units and range of values, the normalized root mean square error (RMSEn) enables an objective comparison of the model performance across tasks. The formulas for each metric are as follows.

$$R^{2} = 1 - \frac{\sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{n} (y_{i} - \bar{y})^{2}} \tag{1}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}} \tag{2}$$

$$\mathrm{RMSEn} = 100 \times \frac{\mathrm{RMSE}}{\max(y_{i}) - \min(y_{i})} \tag{3}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} |y_{i} - \hat{y}_{i}| \tag{4}$$

where $y_{i}$ represents the measured fresh weight of the ith lettuce sample, $\hat{y}_{i}$ is the corresponding model prediction, $\bar{y}$ is the mean of the measured fresh weights of the lettuce samples, and n is the number of samples. $\max(y_{i})$ and $\min(y_{i})$ represent the maximum and minimum values of the measured fresh weight of lettuce, respectively.

Furthermore, the simulation performance of the model is classified based on the RMSEn value as follows: when the RMSEn is less than 10%, between 10% and 20%, between 20% and 30%, or greater than 30%, the simulation is considered excellent, good, fair, or poor, respectively [58].
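The four metrics and the RMSEn rating can be computed as in the following sketch. The sample values are invented, and the handling of values that fall exactly on a class boundary (10%, 20%, 30%) is an assumption, since the text does not specify it.

```python
import math

def evaluate(y_true, y_pred):
    """Return R2, RMSE, RMSEn (in percent) and MAE for paired samples."""
    n = len(y_true)
    mean_y = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    rmse = math.sqrt(ss_res / n)
    rmsen = 100 * rmse / (max(y_true) - min(y_true))
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return {"R2": 1 - ss_res / ss_tot, "RMSE": rmse,
            "RMSEn": rmsen, "MAE": mae}

def rate(rmsen):
    """Map RMSEn (percent) to a class; boundary handling is assumed."""
    if rmsen < 10:
        return "excellent"
    if rmsen < 20:
        return "good"
    if rmsen <= 30:
        return "fair"
    return "poor"

# Hypothetical measured and predicted fresh weights in grams.
measured = [20.0, 40.0, 60.0, 80.0]
predicted = [22.0, 38.0, 63.0, 79.0]
metrics = evaluate(measured, predicted)
label = rate(metrics["RMSEn"])
```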

2.6. SHAP-Based Model Interpretability Analysis

Although machine learning models possess excellent nonlinear fitting capabilities, their complex internal decision-making logic often exhibits “black-box” characteristics, making it difficult to intuitively interpret the response mechanisms between input features and output variables [59]. To thoroughly elucidate the quantitative contribution mechanisms of multisource phenotypic features to the fresh weight of lettuce, this study introduces the SHAP (SHapley Additive Explanations) framework to conduct a comprehensive interpretability analysis of the optimal prediction model. The SHAP method is based on Shapley value theory in cooperative game theory. By calculating the marginal contribution of each feature to the model’s predicted target, it achieves precise attribution of feature importance [60]. Compared to traditional methods, such as the Gini importance of random forests, SHAP satisfies mathematical axioms, including local accuracy, missingness, and consistency. It can precisely quantify the positive driving and negative inhibitory effects of features and supports the local decomposition of single-sample prediction results [61]. Prior to conducting the SHAP analysis, all variables were standardized, effectively mitigating the impact of differences in data dimensions on the calculation and interpretation of SHAP values.
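To make the Shapley value idea concrete, the sketch below computes exact Shapley values for a three-feature toy model by enumerating all feature subsets. It does not use the SHAP library and is not the study's analysis: absent features are replaced by a fixed baseline (SHAP implementations typically average over background data instead), and the model, inputs, and baseline are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values; absent features take their `baseline` value."""
    n = len(x)

    def value(subset):
        z = [x[k] if k in subset else baseline[k] for k in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [k for k in range(n) if k != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

# Hypothetical toy model with an interaction between features 0 and 1.
def toy_model(z):
    return 3.0 * z[0] + 2.0 * z[1] + z[0] * z[1] - z[2]

x = [1.0, 2.0, 0.5]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

The contributions sum to the difference between the prediction for `x` and the prediction for the baseline, which is the local accuracy property mentioned above; the interaction term is shared equally between the two features involved.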

3. Results

3.1. Statistical Analysis of Fresh Weight Under Nitrogen Gradients

Fresh weight is a key indicator for assessing lettuce yield and nutritional status, and its dynamic changes reflect the critical role of nitrogen in regulating the assimilation and distribution of photosynthetic products. As an essential element for the synthesis of proteins, chlorophyll, and nucleic acids, nitrogen serves as the primary driver of photosynthetic carbon assimilation and determines the biomass accumulation [62]. The statistical results for the lettuce fresh weight under different nitrogen gradients are presented in Table 6.

A statistical analysis of 120 lettuce samples showed that fresh weight ranged from 21.50 g to 81.65 g, with a mean of 42.63 g. As the nitrogen supply increased, the mean fresh weight of each group rose stepwise, from 31.34 g at the low nitrogen level (N1) to 51.68 g at the high nitrogen level (N5). Box plots of the fresh weight distribution for each treatment group are shown in Figure 3. The response of fresh weight to the nitrogen gradient had two distinct phases. From N1 to N4, the mean fresh weight increased in a near-linear manner, for a total increase of 62.58% relative to the N1 group. From N4 to N5, the mean fresh weight rose only slightly, from 50.95 g to 51.68 g (an increase of 1.43%), and the median shown in the box plots remained essentially unchanged. This indicates that, under the current greenhouse conditions, lettuce growth exhibits a physiological saturation effect: as the nitrogen concentration approaches the N4 level, the plants' assimilation capacity tends toward saturation, and the marginal contribution of additional nitrogen to fresh weight accumulation decreases markedly.
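The two growth rates quoted above can be reproduced from the group means given in the text. The short check below uses the rounded means, so the first value comes out at about 62.6%; the small difference from the reported 62.58% is presumably due to rounding of the means, which is an assumption.

```python
def percent_increase(old, new):
    """Relative increase from old to new, in percent."""
    return 100.0 * (new - old) / old

# Group mean fresh weights (g) as reported in the text.
n1_mean, n4_mean, n5_mean = 31.34, 50.95, 51.68

n1_to_n4 = percent_increase(n1_mean, n4_mean)
n4_to_n5 = percent_increase(n4_mean, n5_mean)
print(f"N1 -> N4: {n1_to_n4:.1f}%")
print(f"N4 -> N5: {n4_to_n5:.2f}%")
```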

3.2. Correlation Analysis Between Multi-View Phenotypic Features and Fresh Weight

In this study, the correlation between color indices and fresh weight depended strongly on the viewing angle. Under the top-down view, the absolute Pearson correlation coefficients (|r|) between the color indices and fresh weight were generally below 0.4, indicating that color information from the canopy top did not adequately characterize fresh weight accumulation. In contrast, the correlations between fresh weight and indices such as VARI, NDI, IGRVI, and GRRI were markedly stronger under the single-side and side-view-averaged perspectives. The correlation coefficients exceeded 0.48 under the single-side view and increased further to 0.52–0.53 after side-view averaging. These results indicate that oblique (side-view) imaging captures the three-dimensional structure of the plant and its pigment distribution in the vertical dimension more effectively, and therefore represents fresh weight better.
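The comparison above rests on the Pearson coefficient between each index and fresh weight, computed per view and after averaging the side views. A minimal sketch follows; the index values and fresh weights are hypothetical and serve only to show the calculation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: one color index measured from two side views.
fresh_weight = [25.0, 32.0, 41.0, 50.0, 63.0]
side_view_1 = [0.10, 0.14, 0.13, 0.19, 0.22]
side_view_2 = [0.12, 0.11, 0.17, 0.18, 0.24]

# Side-view averaging: mean of the index over the available side views.
side_avg = [(a + b) / 2 for a, b in zip(side_view_1, side_view_2)]

for label, values in [("side 1", side_view_1),
                      ("side 2", side_view_2),
                      ("side average", side_avg)]:
    print(label, round(abs(pearson_r(values, fresh_weight)), 3))
```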

In terms of textural features, the average oblique view demonstrated a significant advantage. Specifically, the absolute values of the correlation coefficients between the local uniformity (HOM), textural complexity (ENT), and angular second-order moment (ASM) of the image (which reflect homogeneity, complexity, and consistency, respectively), and fresh weight reached 0.58, 0.60, and 0.59, indicating a strong correlation. Furthermore, the correlation coefficients for mean area (MEA) and dissimilarity (DIS) reached 0.53. From a biophysical perspective, the leaf orderliness, geometric complexity, and irregularity extracted from the side view can be precisely mapped to the cumulative fresh weight levels. In contrast, from the top-down perspective, most texture features exhibited low correlations because of the difficulty in capturing the complex spatial structure of the plant; only variance (VAR) and correlation (COR) demonstrated moderate correlations.

The performance of multispectral vegetation indices derived from data acquired at different viewing angles demonstrated clear complementary characteristics. The side-view angle generally yields superior results, with correlation coefficients of 0.63 and 0.61 for the SR and MSR indices, respectively, indicating excellent predictive potential. However, the top-down view possesses unique advantages in capturing spectral features that reflect canopy physiological status; the correlation coefficients for NDRE and MTCI under this view reached 0.57 and 0.58, respectively, significantly outperforming the oblique view. The correlations of GNDVI, GWDRVI, and SV_CI_green under the top-down view also reached 0.52. The above analysis indicates that the side-view perspective excels at characterizing the three-dimensional morphology and textural attributes of plants, whereas the top-down view is more sensitive to spectral physiological responses at the top of the canopy. The organic integration of these two perspectives provides a scientific basis for constructing high-precision multimodal data-prediction models.

3.3. Feature Selection and Multicollinearity Test Results

Based on a comprehensive evaluation using Pearson’s correlation coefficient and the variance inflation factor (VIF), this study selected the nine optimal features from the 66 original feature variables and constructed the final feature space for fresh weight prediction. Table 7 lists the final variables selected through correlation and VIF screening.
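A two-stage screening of this kind can be sketched as below. The thresholds (|r| of at least 0.5 with the target, VIF of at most 10), the rule of dropping the highest-VIF feature one at a time, and all data are assumptions for illustration; the criteria actually used in this study are not stated in this section. The VIF of each feature is taken from the diagonal of the inverse correlation matrix, which equals 1/(1 − R²) for the regression of that feature on the others.

```python
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x)
                           * sum((b - my) ** 2 for b in y))

def invert(matrix):
    """Gauss-Jordan inverse of a small square matrix with partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [float(i == j) for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vif(features):
    """VIF per feature: diagonal of the inverse correlation matrix."""
    names = list(features)
    corr = [[pearson_r(features[a], features[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}

def two_stage_select(features, target, r_min=0.5, vif_max=10.0):
    # Stage 1: keep features sufficiently correlated with the target.
    kept = {k: v for k, v in features.items()
            if abs(pearson_r(v, target)) >= r_min}
    # Stage 2: repeatedly drop the feature with the highest VIF.
    while len(kept) > 1:
        scores = vif(kept)
        worst = max(scores, key=scores.get)
        if scores[worst] <= vif_max:
            break
        del kept[worst]
    return list(kept)

# Hypothetical data: "area_dup" nearly duplicates "area"; "noise" is unrelated.
weight = [21, 28, 35, 40, 47, 55, 62, 70]
features = {
    "area": [10, 14, 17, 21, 23, 28, 30, 36],
    "area_dup": [20.1, 27.9, 34.2, 41.8, 46.1, 55.9, 60.2, 71.8],
    "greenness": [0.30, 0.42, 0.35, 0.50, 0.41, 0.58, 0.49, 0.66],
    "noise": [5, 3, 6, 2, 6, 3, 5, 4],
}
selected = two_stage_select(features, weight)
print(selected)
```

In this toy case the unrelated feature is removed in the first stage, and one of the two near-duplicate features is removed in the second.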

3.4. Comparative Analysis of Fresh-Weight Prediction Models

To evaluate the predictive accuracy of fresh weight in lettuce, this study selected six machine learning algorithms for modeling and analysis: support vector regression (SVR), random forest regression (RFR), gradient boosted decision tree regression (GBDT), K-nearest neighbor regression (KNN), Extreme Gradient Boosting (XGBoost), and backpropagation neural networks (BPNNs). A baseline model was constructed using only morphological features as primary features; subsequently, RGB color indices were incorporated to investigate the complementary effect of visible light bands on spectral information; thereafter, textural features were introduced to correct for canopy geometric effects by utilizing microstructural information; finally, multispectral vegetation indices were introduced. Through an incremental feature fusion strategy, the study systematically examined the performance gains of multimodal data on the model. The regression evaluation results under different feature combinations, using four assessment metrics—coefficient of determination (R²), root mean square error (RMSE), normalized root mean square error (RMSEn), and mean absolute error (MAE)—are shown in Figure 4.
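The incremental fusion strategy amounts to evaluating each model on four cumulative feature sets. The sketch below only builds those sets. The feature names are taken from elsewhere in the text for illustration, and their assignment to groups is an assumption; the list is not the complete set of nine selected variables.

```python
# Feature groups in the order in which they are added.
# Assumption: group membership is illustrative, not the paper's Table 7.
FEATURE_GROUPS = [
    ("morphology", ["PCCHSA", "PCCR", "PER_W"]),
    ("color", ["GRRI"]),
    ("texture", ["ENT"]),
    ("multispectral", ["SV_SR", "SV_CI_green", "TV_CARI"]),
]

def cumulative_stages(groups):
    """Return (label, columns) pairs where each stage adds one feature group."""
    stages, names, columns = [], [], []
    for name, cols in groups:
        names.append(name)
        columns.extend(cols)
        stages.append(("+".join(names), list(columns)))
    return stages

stages = cumulative_stages(FEATURE_GROUPS)
for label, cols in stages:
    # Each model would be trained and scored on `cols` at this point.
    print(label, cols)
```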

The RFR model demonstrated excellent fitting capabilities across all feature combinations, with R² values consistently ranging between 0.80 and 0.82, indicating robust overall predictive performance. When using only morphological features as input, the RFR model achieved an R² of 0.80, an RMSE of 5.27 g, and an MAE of 3.95 g on the test set, indicating that morphological features could adequately reflect the variation patterns in the fresh weight of lettuce. After introducing RGB color features, the model’s R² value on the test set increased slightly to 0.81, while the improvement in RMSE was not significant, with an MAE of 3.90 g, indicating that color features had a limited effect on enhancing the performance of the RFR model. After further integrating texture features, the R² value on the test set increased to 0.82, the RMSE decreased to 5.09 g, and the MAE decreased to 3.65 g, indicating that texture information could supplement plant structural details, thereby improving prediction accuracy. When multispectral features were integrated to form a complete feature set, the testing error of the model further decreased, with an MAE of 3.61 g; however, the rate of performance improvement gradually leveled off.

The SVR model demonstrated excellent fitting capability and stable generalization performance under various feature combinations, particularly under multimodal feature fusion conditions. When using only morphological features, the SVR model achieved an R² of 0.84, an RMSE of 4.72 g, and an MAE of 3.51 g on the test set, with prediction accuracy significantly higher than that of the RFR model, indicating that SVR can effectively capture the nonlinear relationship between morphological features and fresh weight. After introducing RGB color features, the R² on the test set increased to 0.86, the RMSE decreased to 4.39 g, and the MAE decreased to 3.24 g, indicating that color information plays a significant complementary role in the SVR model. After further integrating texture features, the model maintained a stable predictive performance on the test set, with R² = 0.85, RMSE = 4.66 g, and MAE = 2.92 g, showing no obvious signs of overfitting. Under the full feature combination, the SVR model achieved the best prediction results: R² = 0.93, RMSE = 3.23 g, RMSEn = 5.60%, and MAE = 2.31 g on the test set, significantly outperforming other models.

When using only morphological features, the GBDT model achieved an R² value of 0.77, an RMSE of 5.64 g, and an MAE of 4.27 g on the test set. After introducing RGB color features, the R² value on the test set slightly improved to 0.79, and the MAE to 3.95 g; however, the model’s overall predictive performance remained limited. Upon further integration of texture and multispectral features, the model performance on the test set declined, with R² dropping to 0.72 and 0.70, respectively, while RMSE increased significantly, with MAE values of 3.55 g and 4.13 g, respectively. This indicates that, with a limited sample size, the GBDT model is sensitive to high-dimensional features, faces a significant risk of overfitting, and requires improvement in its generalization ability.

The XGBoost model maintained high training accuracy while demonstrating good predictive stability on the test set. When using only morphological features, the model achieved an R² of 0.72 and an MAE of 4.68 g on the test set, indicating relatively average predictive performance. After introducing RGB color features, the R² on the test set improved markedly to 0.85, the RMSE decreased to 4.62 g, and the MAE decreased to 3.46 g, indicating that color features substantially enhanced model performance. After further integrating texture features, the model's R² on the test set increased to 0.88, with the RMSE decreasing to 4.10 g and the MAE to 2.85 g, the best predictive performance achieved by this model. Under the full feature combination, the model's predictive performance declined slightly but remained at a high level, with R² = 0.87 and an MAE of 2.87 g.

The KNN model fit the training set perfectly, with R² reaching 1 for all feature combinations. However, its R² values on the test set were generally low and its prediction errors were large, indicating that the model depended heavily on the training samples and had limited generalization ability. In the training-set scatter plot, the predicted values aligned almost perfectly with the 1:1 reference line, reflecting a strong local fitting capability. In the test-set scatter plot, by contrast, the predicted points diverged clearly, with some samples deviating substantially from the 1:1 reference line; large prediction errors were observed in particular in the medium-to-high fresh weight range. The test-set MAE was 5.12 g for the morphological feature set, 3.75 g after incorporating RGB features, 3.58 g after integrating texture features, and 3.38 g with the full feature combination. These results indicate that when the sample size is limited and the feature dimension is high, the KNN model struggles to make stable predictions for unseen samples.
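A training-set R² of exactly 1 is what a nearest-neighbor regressor produces whenever every training sample is its own nearest neighbor, for example with k = 1 or with inverse-distance weighting. The KNN settings used in this study are not given in this section, so the configuration below is an assumption, and the data are hypothetical.

```python
def knn_predict(train_x, train_y, query, k=1):
    """Average the targets of the k training points closest to the query."""
    ranked = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = ranked[:k]
    return sum(train_y[i] for i in nearest) / k

# Hypothetical standardized features and fresh weights (g).
train_x = [[1.0, 0.2], [2.0, 0.1], [3.0, 0.4], [4.0, 0.3]]
train_y = [25.0, 33.0, 48.0, 60.0]

# Each training point is at distance zero from itself, so k = 1
# reproduces the training targets exactly.
train_pred = [knn_predict(train_x, train_y, x, k=1) for x in train_x]

# An unseen point simply inherits the target of its closest neighbor.
test_pred = knn_predict(train_x, train_y, [2.6, 0.2], k=1)
print(train_pred, test_pred)
```

Because the training fit is exact by construction in this setting, it carries no information about generalization, which is why the test-set metrics are the relevant ones for KNN.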

The overall prediction performance of the BPNN model improved gradually with an increasing feature dimension, indicating that multimodal data fusion has a positive effect on neural network models. The MAE for morphological features alone was 3.95 g; after adding RGB features, the MAE decreased to 3.44 g; after integrating texture features, the MAE was 3.27 g; and with the full feature combination, the MAE reached 2.81 g. However, the prediction accuracy of this model on the test set consistently remained lower than that of the SVR and XGBoost models.

3.5. Cross-Model Comparison of Machine-Learning Algorithms

Figure 5 illustrates the differences in the fresh-weight prediction performance of the six ML models under the four-stage incremental feature combination. The features are introduced in stages: morphological features (MFs), color indices (CIs), texture indices (TIs), and multispectral vegetation indices (VIs). Analysis of R², RMSE, MAE, and RMSEn shows how feature increments and model adaptability influence prediction performance.

When using morphological features alone, each model could achieve a basic estimation of fresh weight from the structural information of the plant; however, performance varied considerably. The SVR model performed best, with R² = 0.84, RMSE = 4.72 g, and MAE = 3.51 g, effectively capturing the nonlinear relationship between morphological features and fresh weight. The BPNN and RFR models demonstrated good stability, with R² values of 0.81 and 0.80, respectively, and an MAE of 3.95 g each. The XGBoost and GBDT models exhibited lower prediction accuracy, with R² values of 0.72 and 0.77 and MAE values of 4.68 g and 4.27 g, respectively. The KNN model had the highest error and poorest performance, with an R² of only 0.69 and an MAE of 5.12 g. This indicates that morphological features are the core foundation for estimating the fresh weight of lettuce, but that models differ considerably in their ability to extract information from structural features.

After introducing RGB color features, the overall prediction accuracy of all models improved, and the errors decreased slightly. The SVR model showed significant gains, with R² increasing to 0.86, RMSE decreasing to 4.39 g, and MAE decreasing to 3.24 g; the XGBoost model achieved a leap in performance, with R² increasing from 0.72 to 0.85 and MAE decreasing to 3.46 g; the BPNN, KNN, GBDT, and RFR models all showed varying degrees of optimization, with MAE decreasing to 3.44 g, 3.75 g, 3.95 g, and 3.90 g, respectively. It was evident that RGB color features could supplement information on the plant’s visual growth status and effectively address the information gaps in single morphological features, enhancing the models’ predictive capabilities.

After further integration of the texture features, the performance of most models continued to improve, demonstrating a notable synergistic effect from the combination of multiple features. XGBoost achieved the best performance at this stage, with R² = 0.88 and MAE = 2.85 g; SVR remained close behind, with R² = 0.85 and MAE reduced to 2.92 g; and the errors of the BPNN, RFR, and KNN models continued to converge, with MAE decreasing to 3.27 g, 3.65 g, and 3.58 g, respectively. The GBDT model was the exception: although its MAE fell to 3.55 g, its R² dropped to 0.72, so it showed no overall improvement. Texture features can characterize differences in canopy surface details, complementing morphological and color features and enriching the model's input.

After multispectral features were finally integrated to construct the full feature space, the performances of the various models diverged considerably. The SVR model demonstrated a distinct advantage in handling the full feature set, achieving the best results of any model and feature combination on the test set, with R² = 0.93, RMSE = 3.23 g, RMSEn = 5.60%, and MAE = 2.31 g. XGBoost and BPNN maintained high accuracy, with R² values of 0.87 and 0.85 and MAE values of 2.87 g and 2.81 g, respectively. The RFR and KNN models showed performance saturation, with no significant improvement from the added features. The GBDT model exhibited overfitting and degradation, with R² dropping to 0.70 and MAE rising to 4.13 g, indicating poor adaptability to the larger feature set and weak generalization ability.

In summary, progressive multimodal feature fusion can sequentially supplement plant structure, growth status, and physiological information, thereby effectively improving the prediction accuracy of the fresh weight. Low-dimensional feature fusion can achieve stable gains across all models, whereas model adaptability varies significantly under high-dimensional feature scenarios. Based on a comprehensive evaluation of all indicators, the SVR model demonstrated the highest prediction accuracy, lowest error rate, and best stability, indicating that it is the optimal model for estimating the fresh weight of lettuce.

3.6. SHAP-Based Feature Contribution Analysis

To elucidate the decision logic of the optimal SVR model on independent samples and to clarify how the phenotypic variables of each modality contribute to its predictions, this study applied the Kernel SHAP explainability framework and conducted the quantitative analysis using only the 30% independent test set. Because this held-out test set was never involved in model training or the hyperparameter grid search, and is therefore entirely unseen by the model, the marginal contributions quantified on it provide an unbiased assessment of how the features behave under generalization. The contribution weights quantified by the SHAP framework essentially constitute an a posteriori verification of the mapping fitted by the supervised learning algorithm, reflecting the marginal change in the predicted output attributable to each variable in the construction of the decision hyperplane.

As shown by the global feature importance calculated on the independent test set (Figure 6A), the decision-making of the optimal SVR model is characterized by “shape dominance, with multidimensional synergy among multispectral and color indices.” The point cloud convex hull surface area (PCCHSA), which characterizes the three-dimensional geometric envelope of the lettuce plant, dominates the test set decisions, with a marginal contribution clearly larger than that of any other feature. This post hoc analysis indicates that the overall baseline for fresh weight estimation depends heavily on strong morphological and dimensional variables. However, relying solely on morphological features can lead to fitting biases in algorithms that lack internal regularization, because they fail to suppress spatial noise at the edges of the greenhouse, resulting in severe local overfitting within small sample spaces. Through parameter regularization and margin maximization, the support vector regression (SVR) model selected in this study stably captures the dominant contribution of PCCHSA while integrating the two-dimensional morphological feature projected circumcircle radius (PCCR), the green–red ratio index (GRRI), and multispectral physiological features from the side and top views (such as SV_SR, SV_CI_green, and TV_CARI). Although these spectral, color, and morphological features have limited global marginal contributions, they provide fine-grained corrections for physiological state and complementary information within the model. This indicates that multimodal fusion improves generalization stability compared with predictions based on morphological features alone.

As shown in the topological distribution map of feature influences on the test set (Figure 6B), a clear nonlinear drive chain emerges between the feature values and model outputs. The distribution of SHAP values for PCCHSA in the test set samples is extremely skewed to the right, with high-value scatter points densely intertwined within the positive contribution range, indicating that when the volume of the three-dimensional lettuce structure exceeds the benchmark threshold, the fresh weight prediction output of the algorithm will exhibit a significant positive step response. The scatterplots of multispectral features (such as SV_SR and TV_CARI) and RGB color indices (GRRI) were relatively symmetrical and concentrated on both sides of the zero value, with high and low feature values exhibiting nonlinear overlap in localized regions. This indicates that, on the 30% independent test set, the algorithm’s utilization of spectral physiological features and color indices tends toward a posteriori “fine-tuning correction”—that is, based on the biomass baseline determined by morphological analysis, it performs bidirectional interactive fine-tuning of the final decision hyperplane according to the physiological activity mapped by multispectral reflectance and the differences in stress-induced pigments manifested in the RGB color space.

In summary, by conducting a SHAP analysis on an independently reserved test set, this study elucidates the direction and magnitude of the contribution of each modal feature to the model output from a post hoc perspective. The SVR model does not distribute equal weights across phenotypic variables; rather, it uses three-dimensional morphological features as the core driver, supplemented by multispectral physiological features and RGB color indices for fine-tuning. These findings are consistent with the test set coefficient of determination (R²) and provide an internal logical basis for evaluating the generalization behavior and robustness optimization of multimodal phenotypic inversion techniques in complex, uncontrolled facility environments.

3.7. Independent Sample Trial of the Optimal Model for Fresh Weight of Lettuce

To further validate the generalization performance of the optimal model on independent samples, we conducted field validation trials on independent batches at a Venlo-type greenhouse at Jiangsu University (32.2° N, 119.5° E). The test material comprised Italian bolt-resistant lettuce. Cultivation methods and nutrient solution management conditions were consistent with those used in the modeling experiments, with standardized irrigation using a modified Hoagland nutrient solution. Once the lettuce reached maturity, phenotypic data were collected, and destructive fresh weight measurements were taken to construct an independent validation dataset.

The validation process strictly followed the previous modeling workflow: the collected phenotypic traits were input into the trained optimal prediction model to obtain fresh-weight predictions, which were then compared and analyzed against the simultaneously obtained experimental data. Ninety lettuce samples were collected for this independent validation. The results of the fit analysis between the model predictions and experimental values based on the independent validation set are shown in Figure 7. The model predictions showed a significant linear correlation with the measured values, with a coefficient of determination (R²) of 0.86 and a root mean square error (RMSE) of 3.36 g. The prediction error fell within a reasonable range, indicating that the optimal model possessed good generalization ability under controlled cultivation conditions. This validated the feasibility and reliability of this modeling method for the non-destructive prediction of fresh weight in mature lettuce.

4. Discussion

4.1. Association Mechanism Between Multimodal Features and Lettuce Fresh Weight

The phenotypic characteristics of a plant’s aboveground parts serve as the direct physical basis for biomass accumulation, and their precise characterization plays a decisive role in predicting the fresh weight. Multimodal phenotypic traits, encompassing structural, physiological, color, and textural dimensions, collectively regulate the spatial configuration of photosynthetic organs, light capture efficiency, and rates of material synthesis. Through phenotypic plasticity, these traits respond to environmental changes, influencing biomass allocation patterns and accumulation levels. Three-dimensional morphological traits, such as PCCHSA and PER_W, directly reflect a plant’s ability to occupy space within the canopy, its potential for vertical expansion, and the overall complexity of its structure. These traits determine the efficiency of light radiation transmission, interception, and utilization within the canopy and represent the most critical physical factors driving fresh weight in lettuce [63]. The more developed the canopy structure and the more fully it expands spatially, the larger the plant’s photosynthetically active area becomes, and the faster the rate of fresh weight accumulation [64].

Color indices, such as the GRRI, can sensitively characterize chlorophyll content, photosynthetic efficiency, nitrogen nutrient levels, and plant health, providing critical physiological information to complement fresh weight predictions. This enables models to capture the contribution of internal physiological changes to fresh weight, rather than relying solely on external morphology. Multispectral indices, such as the SV_CI_green index, can accurately distinguish between plant growth health and senescence levels, rapidly indicating differences in overall growth vigor. They provide an intuitive basis for the visual assessment of fresh weight gradients in lettuce and serve as a crucial bridge linking external appearance to internal growth status [65]. RGB textural features analyze crop surface uniformity, complexity, and detail heterogeneity to reflect plant growth consistency, leaf arrangement patterns, and canopy compactness. They serve to refine and correct local variations that are difficult to capture through morphological and spectral features, further enhancing the model’s ability to distinguish fresh weight variations across different individuals and growth stages. The four categories of multimodal features—morphology, spectroscopy, color, and texture—are not mutually exclusive. Instead, they collectively describe the formation and accumulation patterns of lettuce fresh weight across four levels: structural volume, physiological vitality, visual color, and surface heterogeneity. These features complement and reinforce one another, jointly constructing a comprehensive, systematic, and biologically meaningful phenotypic characterization system [66].

4.2. Analysis of Multimodal Feature Selection and Modeling Strategy

The accuracy and reliability of biomass estimation depend heavily on the scientific selection of feature variables. Establishing a rigorous and efficient feature selection framework is a core prerequisite for improving model learning efficiency, enhancing prediction stability, and ensuring biological interpretability. The multimodal feature set constructed in this study encompasses four major categories of indicators: 3D morphology, spectral physiology, color, and texture. Features derived from different modalities but calculated from the same source data generally exhibit strong linear correlations and information redundancy; in particular, canopy structure parameters extracted from 3D point cloud reconstructions are highly prone to severe multicollinearity issues because of the nature of data generation from the same source. If all raw features are directly fed into the model for training, this will not only significantly increase computational redundancy and reduce model runtime efficiency but also cause bias and distortion in regression parameter estimates, amplify the risk of model overfitting, and simultaneously obscure and dilute the true driving effects of key core phenotypes on fresh weight. Ultimately, this will weaken the model’s generalization ability across samples and growth stages, as well as the interpretability of the results.

To address multicollinearity among multimodal features and improve the quality of model inputs, this study employed a two-step screening strategy that combines Pearson correlation analysis with the variance inflation factor (VIF) to reduce the dimensionality of the high-dimensional phenotypic data. First, Pearson correlation analysis was used to identify and eliminate highly redundant variable combinations within the feature set, thereby reducing information overlap. Subsequently, the VIF test was used to quantify the strength of multicollinearity among variables and to eliminate indicators with excessively high VIF values that could compromise model stability. This screening process is statistically grounded while retaining key phenotypic features with clear biological significance. It avoids the loss of valuable information that can result from simple exclusion or subjective selection, so that the final feature set is independent, representative, and interpretable.

To further elucidate how each selected feature contributes to fresh weight prediction, this study used the SHAP method to quantify and rank the contributions of the screened multimodal features. In the global importance ranking, the three core features retained through the two-step screening process (PCCHSA, GRRI, and PER_W) consistently ranked in the top three and played a dominant role in predicting lettuce fresh weight. They were followed by SV_SR, ENT, and PCCR, while the spectral color and low-level texture features contributed the least. Three-dimensional morphological features, such as PCCHSA, characterize the horizontal coverage, three-dimensional spatial expansion, and compactness of the canopy; they directly determine the total amount of light energy captured by the plant and the population's resource utilization efficiency, and they are closely coupled with biomass accumulation and fresh weight formation. Color features, such as GRRI, are sensitive to chlorophyll level, photosynthetic activity, and nutritional status. Two-dimensional morphological features, such as PER_W, add information on the scale of lateral canopy growth, and color and texture features depict the apparent color and surface heterogeneity of the canopy. These categories of data complement one another in modeling, reflecting the synergistic mechanisms of multimodal features.

In summary, the multi-stage feature selection method based on Pearson correlation coefficients and VIF tests can significantly reduce multicollinearity and information redundancy in multimodal data. While streamlining the model’s input dimensions, it effectively retains phenotypic features that play a key role in driving fresh weight. The optimized feature set constructed through this screening process supports high-precision, robust fresh weight prediction models and provides quantitative, interpretable phenotypic indicators for greenhouse lettuce cultivation management, population structure optimization, and the early, efficient screening of breeding materials, offering significant theoretical reference value and potential for practical application.

The six machine learning algorithms evaluated in this study differed markedly in performance on the pre-screened multimodal features, which reflects how well each algorithm architecture copes with small-sample, high-dimensional data. The K-Nearest Neighbors (KNN) model fitted the training set very closely, but its prediction error increased significantly on the test set. This suggests that instance-based learning algorithms, which lack internal regularization and rely solely on distances in feature space, are prone to overfitting local training noise when applied to complex nonlinear plant phenotypic data, and consequently generalize poorly. In contrast, models incorporating parameter penalties or ensemble learning mechanisms were more robust. The best-performing model, Support Vector Regression (SVR), showed consistent and stable error metrics across the training and test sets. This is primarily because SVR fits its regression function from the support vectors under a regularized objective, and because the data were Z-score standardized prior to modeling. Standardization removes the scale differences among features of different modalities and prevents features with large absolute values from dominating model training, thereby supporting a better global approximation.
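
The effect of scale on distance-based learning can be illustrated with a small example. The values below are hypothetical (a canopy area in cm² and a color index between 0 and 1), and the helper names are invented for this sketch. It shows how the nearest neighbor of a query sample changes once both features are Z-score standardized with statistics fitted on the training samples.

```python
import math
import statistics

# Hypothetical (canopy area in cm^2, color index in 0-1) pairs; not measured data.
train = [(400.0, 0.20), (430.0, 0.80), (520.0, 0.50), (300.0, 0.45)]
query = (405.0, 0.78)

def fit_zscore(rows):
    """Return per-feature means and population standard deviations."""
    columns = list(zip(*rows))
    means = [statistics.fmean(c) for c in columns]
    stds = [statistics.pstdev(c) for c in columns]
    return means, stds

def apply_zscore(row, means, stds):
    """Standardize one sample with previously fitted statistics."""
    return tuple((v - m) / s for v, m, s in zip(row, means, stds))

def nearest_index(rows, point):
    """Index of the row with the smallest Euclidean distance to point."""
    return min(range(len(rows)), key=lambda i: math.dist(rows[i], point))

# Raw features: the area term dominates the distance, so sample 0 wins
# even though its color index is very different from the query's.
raw_nn = nearest_index(train, query)

# Standardized features: both modalities contribute on a comparable scale.
means, stds = fit_zscore(train)
train_z = [apply_zscore(r, means, stds) for r in train]
query_z = apply_zscore(query, means, stds)
scaled_nn = nearest_index(train_z, query_z)

print(raw_nn, scaled_nn)  # prints "0 1"
```

The scaling statistics are fitted on the training samples only and then reused for the query, which matches the practice of keeping preprocessing parameters fixed when validating on new data.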

By the standards of data volume typical of machine learning, the 120 lettuce samples used in this study constitute a relatively small sample, which partly explains the slight discrepancy between training and test performance for some of the core models. However, because of constraints such as controlled-environment cultivation, destructive phenotypic data collection, and intensive physicochemical measurements, individual phenotyping experiments in agronomy are often limited in both space and time, making it difficult to accumulate large samples in the short term. According to statistical learning theory, a model’s generalization ability depends not only on the absolute number of samples but also on the ratio of independent observations to input features. In this study, Pearson correlation analysis and VIF tests reduced the initial 66 input variables to 9 core variables, raising the sample-to-feature ratio to over 13:1. This ratio satisfies the basic requirement for constructing stable, non-random decision boundaries in shallow machine learning models. Furthermore, this study introduced a temporally independent external batch dataset and conducted a blind validation with identical preprocessing parameters and the same optimal model structure. The model maintained high predictive accuracy on these entirely new, independent samples. This result indicates that, with rigorous feature selection, scaling, and regularization control, it is feasible to construct a high-precision plant phenotype estimation model that generalizes across batches and time points from a controlled sample of 120 plants.
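
The ratio quoted above follows from simple arithmetic. In the worked equation below, n is the number of plants and p the number of input features; the symbols are introduced here for illustration, and the calculation uses all 120 plants, as the text does.

$$
\frac{n}{p_{\text{initial}}} = \frac{120}{66} \approx 1.8, \qquad \frac{n}{p_{\text{selected}}} = \frac{120}{9} \approx 13.3
$$

Feature screening therefore raises the number of observations available per input variable by roughly a factor of seven.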

4.3. Analysis of the Impact and Applications of the Greenhouse Microenvironment

Spatial heterogeneity within controlled-environment greenhouses and the resulting microclimate gradients are key factors affecting the precise analysis of crop multimodal phenotypes. Areas near exterior walls, evaporative cooling systems, or entrances commonly exhibit uneven ventilation rates, attenuated sunlight, and localized shading. These variations in local total radiation directly trigger morphological plasticity responses in crops. For example, where direct sunlight is excessive or local light distribution is uneven, lettuce often optimizes light capture by adjusting internode elongation and leaf spread angles, leading to abnormal increases in plant height or heterogeneous changes in canopy geometry. From a modeling perspective, such spatial interference introduces fitting biases that affect algorithms with different mathematical architectures to different degrees. Algorithms such as K-nearest neighbors (KNN), which lack internal regularization and rely solely on distances in the multidimensional feature space for instance-based learning, are especially vulnerable to spatial noise. Once plants at certain locations show abnormal height or color shifts due to excessive sunlight, these location-induced pseudo-features dominate the Euclidean distances that determine each sample’s neighborhood. This makes the algorithm highly susceptible to local overfitting in small samples, resulting in markedly larger prediction errors on the test set. Better-performing models, such as support vector regression (SVR), which rely on regularization, margin maximization, or residual iteration mechanisms, possess a degree of mathematical robustness against nonlinear biological phenotypic noise. They can effectively mitigate the influence of features with extreme absolute values on the global hyperplane and therefore generalize more stably.
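
The regularization and margin terms mentioned above can be made concrete with the standard ε-insensitive SVR formulation. This is the textbook primal problem, given here as a reference sketch; the kernel map φ and the hyperparameters C and ε are generic symbols, and the values used in this study are not restated in this section.

$$
\min_{w,\,b,\,\xi,\,\xi^{*}} \; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right)
\quad \text{s.t.} \quad
\begin{cases}
y_i - w^{\top}\varphi(x_i) - b \le \varepsilon + \xi_i, \\
w^{\top}\varphi(x_i) + b - y_i \le \varepsilon + \xi_i^{*}, \\
\xi_i,\ \xi_i^{*} \ge 0 .
\end{cases}
$$

Residuals smaller than ε incur no penalty, and larger ones are penalized only linearly, so a few location-induced outliers shift the fitted function less than they would under a squared-error loss or a purely neighborhood-based predictor.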

Furthermore, assessing the performance loss incurred when this model is applied in situ in greenhouse production environments, without the controlled darkbox, is a crucial basis for evaluating its industrial transferability. The core function of the darkbox platform is to provide a constant light source and completely exclude external stray light, so that weak crop phenotypic signals can be captured with a high signal-to-noise ratio. When the model is deployed directly in greenhouse production rows, where lighting is variable, natural shadows overlap, and cultivation backgrounds are complex (for example, reflections from perlite substrates and shadows from pipes), its predictive performance is expected to decline significantly. The causes are dynamic fluctuations in natural sunlight intensity, slight plant deformations caused by fans, and reduced canopy segmentation accuracy against complex backgrounds. Therefore, analyzing the spatial distribution of shadows and developing physical denoising mechanisms for in situ greenhouse environments are key steps toward the practical application of multimodal phenotypic inversion technology. This study first established the mapping between phenotypic features and fresh weight under controlled conditions; subsequent work will gradually bridge the gap toward in situ online application in greenhouses by introducing an illumination-robust canopy segmentation network and a dynamic natural-light correction model.

It should also be emphasized that the core scenario of the multimodal phenotypic inversion system in this study is explicitly grounded in highly controlled greenhouse environments. Modern smart greenhouses exhibit highly standardized, industrialized, and equipment-integrated operational characteristics, typically equipped with automated PLC-controlled moving rails, suspended track-based phenotyping platforms, or inspection robots. Under this closed, intensive production model, integrating high signal-to-noise ratio controlled imaging darkrooms or shading yield measurement components directly into automated inspection lines through hardware modifications is highly feasible from an engineering perspective. Therefore, this study’s selection of a controlled greenhouse darkroom environment to establish phenotypic mapping boundaries precisely aligns with the industrial technological demands of modern facility agriculture for fully automated, non-destructive, high-precision yield measurement and precise growth regulation. By first elucidating the multimodal phenotype-driven mechanisms of lettuce fresh weight in an ideal controlled environment, this study lays a solid theoretical foundation for the next phase of solidifying this model into a standardized online detection algorithm module for factory-style smart greenhouse production lines. It also provides critical algorithmic and data support for subsequent, deeper-level digital production in smart facility agriculture, offering clear prospects for practical application in the modern, high-level facility horticulture industry.

4.4. Advantage Analysis of Multimodal Data Fusion in Lettuce Fresh Weight Prediction

Multimodal data fusion technology offers significant advantages in predicting the fresh weight of lettuce by enhancing information complementarity, improving model expressiveness, and increasing prediction stability. In terms of information representation, a single data source typically reflects only certain aspects of crop growth. For example, studies on fresh weight estimation using morphological parameters derived from 3D reconstruction primarily rely on geometric structural information, making it difficult to capture internal physiological changes within the plant [38]. When using multispectral data or vegetation indices to characterize chlorophyll content and nitrogen status, estimates are easily disrupted by variations in canopy structure, limiting prediction accuracy [51]. Furthermore, studies relying solely on RGB imagery tend to focus on color or texture information, resulting in a limited ability to capture complex canopy structures [42]. Multimodal data fusion technology integrates morphological structural information with spectral physiological information, thereby enabling a transition from structural characterization to structure–physiology coupling.

Multimodal data fusion demonstrated a significant cumulative gain effect in terms of model performance. With the introduction of color, texture, and multispectral features, the prediction accuracy of each model showed a stepwise improvement trend, consistent with the conclusions of previous studies on crop biomass estimation [46,57]. In this study, the support vector regression (SVR) model achieved the best results after fusing multimodal data (R² = 0.93, RMSE = 3.23 g), significantly outperforming scenarios with single-feature inputs. This indicates that multimodal data inputs not only provide more effective features but also enhance the model’s sensitivity to key variables in high-dimensional feature spaces.
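
The accuracy metrics quoted above can be computed as in the following sketch. The function name and the four observed and predicted weights are made up for illustration and are not results from this study. The normalization used for RMSEn (RMSE divided by the mean observed value) is an assumption, since this section does not restate the definition.

```python
import math

def regression_metrics(observed, predicted):
    """Return R2, RMSE, normalized RMSE (percent of the observed mean), and MAE."""
    n = len(observed)
    residuals = [o - p for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": rmse,
        # Assumption: RMSEn is RMSE as a percentage of the mean observed value.
        "RMSEn_percent": 100.0 * rmse / mean_obs,
        "MAE": sum(abs(r) for r in residuals) / n,
    }

# Illustrative fresh weights in grams (invented values, not the study's data).
observed = [50.0, 60.0, 70.0, 40.0]
predicted = [52.0, 58.0, 71.0, 43.0]
metrics = regression_metrics(observed, predicted)
print(metrics)
```

Reporting RMSE and MAE in grams alongside a normalized error makes results comparable across batches whose mean fresh weights differ.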

With respect to model generalization and stability, single-source data are susceptible to variations in imaging angles, lighting conditions, and individual plant differences; in contrast, multimodal data fusion enhances the model’s adaptability to complex environmental changes through information redundancy and complementarity. In a study on crop biomass estimation based on UAV multispectral imagery, Osco et al. [67] found that models integrating multiple vegetation indices and textural features exhibited more stable predictive performance across different growth stages, with a smaller variation in the coefficient of determination (R²) than single-feature models. This confirms the effectiveness of multimodal data fusion in enhancing the temporal stability of models [68].

Multimodal data fusion overcomes the dimensional limitations of single data sources. By leveraging the synergy of multimodal data, it enhances the accuracy, reliability, and applicability of fresh weight prediction. Compared with traditional single-modal methods, it offers significant advantages and provides more efficient and robust technical support for the dynamic monitoring of greenhouse lettuce growth and the precise estimation of yield.

4.5. Limitations and Future Work

This study achieved high inversion accuracy and good interpretability in multimodal data fusion and fresh weight prediction for lettuce. However, owing to limitations in the experimental conditions and the depth of feature extraction, including a small sample size and a single experimental design, the following limitations remain. First, this study used only one lettuce variety, a single greenhouse experiment, five nitrogen levels, and 120 plants, with data collected only once at maturity; it lacks validation across multiple varieties and batches. Second, regarding data acquisition, all data were collected under controlled greenhouse darkroom conditions, where the lighting was stable and the background was uniform, which reduced the impact of external interference on image quality. There is a significant gap between this idealized experimental environment and actual application scenarios in open fields and open-air agricultural facilities. Existing research has shown that in real-world agricultural settings, numerous factors, including differences between RGB and near-infrared imaging, heterogeneity in plant geometry, fluctuations in light intensity, canopy shading, and image quality degradation, can significantly reduce the operational stability and generalization performance of visual phenotyping models. Consequently, developing robust, highly transferable multimodal phenotyping models has become a key research direction in this field [69]. Future work should therefore include validation trials in field or semi-open environments, larger sample sizes, and more varieties and growth stages to assess the model’s robustness and generalizability in complex scenarios. Third, regarding feature construction and fusion methods, this study primarily employed manually designed morphological parameters, color indices, texture features, and vegetation indices for fusion analysis. Although this approach has clear physical significance and strong interpretability, unexploited information may remain in the high-dimensional feature space. In the future, deep learning methods could be introduced to enable automatic feature extraction and end-to-end modeling, thereby further enhancing model performance.

5. Conclusions

This study developed a model for predicting the fresh weight of lettuce by integrating multi-view morphological, color–textural, and multispectral features, thereby achieving accurate predictions of fresh weight under controlled greenhouse conditions. The results indicate that multimodal data fusion can effectively overcome the limitations of single data sources, jointly characterize plant geometry and physiological information, and improve prediction accuracy.

In terms of model performance, we compared six typical machine learning algorithms using an incremental feature fusion strategy. The SVR model performed best in the full-dimensional feature space, with a test set coefficient of determination (R²) of 0.93 and a root mean square error (RMSE) of 3.23 g, significantly outperforming RFR and the other compared models. As color, texture, and multispectral features were progressively introduced, the performance of all models continued to improve, validating the effectiveness of multimodal data fusion in fresh weight prediction. Feature contribution analysis further elucidated the decision-making mechanisms of the models. SHAP analysis indicated that PCCHSA is the core driving factor for the fresh weight prediction of lettuce, establishing the physical foundation for fresh weight inversion; meanwhile, spectral indices and texture features play a fine-tuning role by providing information on physiological status and spatial heterogeneity. Furthermore, multi-view imaging demonstrated a distinct complementary effect among features: the side view excels at capturing the three-dimensional structure, while the top view is more sensitive to spectral responses; the fusion of these two views effectively enhanced the depth of feature representation.

In summary, this study provides a feasible approach and technical guidance for the nondestructive estimation of the fresh weight of lettuce under controlled greenhouse conditions and offers a scientific basis for the phenotypic modeling of protected-culture crops.

Author Contributions

Conceptualization, T.L. and X.Z.; methodology, T.L. and C.G.; software, T.L.; validation, X.Z. and T.L.; formal analysis, T.L.; investigation, T.L.; resources, T.L.; data curation, T.L., C.G. and D.Z.; writing—original draft preparation, T.L.; writing—review and editing, T.L. and X.Z.; funding acquisition, X.Z. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (Grant No. 2022YFD2002302); the National Key Research and Development Program for Young Scientists (Grant No. 2022YFD2000200); and the Jiangsu Province Industry Forward-looking Program Project (Grant No. BE2023017).

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding authors.

Acknowledgments

The authors express their gratitude to the School of Agricultural Engineering, Jiangsu University, for providing the essential instruments without which this work would not have been possible. The authors also acknowledge the use of DeepL (Version: 26.4.1.1) and Paperpal (Version: 5.59.5) for translation and English grammar checking of this article, and thank the reviewers for their important feedback.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Moon, T.; Kim, D.; Kwon, S.; Ahn, T.I.; Son, J.E. Non-Destructive Monitoring of Crop Fresh Weight and Leaf Area with a Simple Formula and a Convolutional Neural Network. Sensors 2022, 22, 7728. [Google Scholar] [CrossRef]
Zhang, W.Y.; Cao, H.X.; Zhang, W.X.; Hanan, J.; Ge, D.K.; Cao, J.; Xia, J.A.; Xuan, S.L.; Liang, W.J.; Zhang, L.L.; et al. An aboveground biomass partitioning coefficient model for rapeseed (Brassica napus L.). Field Crops Res. 2020, 259, 107966. [Google Scholar] [CrossRef]
Zhang, W.X.; Wu, Q.; Sun, C.L.; Ge, D.K.; Cao, J.; Liang, W.J.; Yin, Y.J.; Li, H.; Cao, H.X.; Zhang, W.Y.; et al. Biomass-based lateral root morphological parameter models for rapeseed (Brassica napus L.). Food Energy Secur. 2023, 13, e519. [Google Scholar] [CrossRef]
Wu, J.; Li, C.; Pan, X.; Wang, X.; Zhao, X.; Gao, Y.; Yang, S.; Zhai, C. Model for Detecting Boom Height Based on an Ultrasonic Sensor for the Whole Growth Cycle of Wheat. Agriculture 2024, 14, 21. [Google Scholar] [CrossRef]
Ginzburg, D.N.; Cox, J.A.; Rhee, S.Y. Non-destructive, whole-plant phenotyping reveals dynamic changes in water use efficiency, photosynthesis, and rhizosphere acidification of sorghum accessions under osmotic stress. Plant Direct 2024, 8, e571. [Google Scholar] [CrossRef]
Wei, L.; Yang, H.; Niu, Y.; Zhang, Y.; Xu, L.; Chai, X. Wheat biomass, yield, and straw-grain ratio estimation from multi-temporal UAV-based RGB and multispectral images. Biosyst. Eng. 2023, 234, 187–205. [Google Scholar] [CrossRef]
Huang, Y.P.; Li, Z.A.; Bian, Z.Y.; Jin, H.J.; Zheng, G.Q.; Hu, D.; Sun, Y.; Fan, C.L.; Xie, W.J.; Fang, H.M. Overview of Deep Learning and Nondestructive Detection Technology for Quality Assessment of Tomatoes. Foods 2025, 14, 286. [Google Scholar] [CrossRef]
Li, W.; Zhang, C.; Ma, T.; Li, W. Estimation of summer maize biomass based on a crop growth model. Emir. J. Food Agric. 2021, 33, 742–750. [Google Scholar] [CrossRef]
Wang, J.; Zhang, Y.; Gu, R. Research Status and Prospects on Plant Canopy Structure Measurement Using Visual Sensors Based on Three-Dimensional Reconstruction. Agriculture 2020, 10, 462. [Google Scholar] [CrossRef]
Wang, T.; Lu, M.; He, G.; Wang, Q. Estimation model of grassland above ground biomass integrating three-dimensional structure and spectral characteristics of vegetation. Trans. Chin. Soc. Agric. Mach. 2025, 56, 76–83. [Google Scholar]
Xie, Y.; Wang, B.; Yao, Y.; Yang, L.; Gao, Y.; Zhang, Z.M.; Lin, L.X. Quantification of vertical community structure of subtropical evergreen broad-leaved forest community using UAV-Lidar data. Acta Ecol. Sin. 2020, 40, 940–951. [Google Scholar]
Tang, Z.; Miralles, D.G.; Guo, Z.; Maes, W.H. Fast response of satellite fluorescence-derived plant physiology to drought stress. Nat. Commun. 2026, 17, 2886. [Google Scholar] [CrossRef]
Dugdale, S.J.; Malcolm, L.A.; Hannah, D.M. Drone-based structure-from-motion provides accurate forest canopy data to assess shading effects in river temperature models. Sci. Total Environ. 2019, 678, 326–340. [Google Scholar] [CrossRef]
Niu, Y.X.; Han, W.T.; Zhang, H.H.; Zhang, L.Y.; Chen, H.P. Estimating maize plant height using a crop surface model constructed from UAV RGB images. Biosyst. Eng. 2024, 241, 56–67. [Google Scholar] [CrossRef]
Gu, W.; Wen, W.; Wu, S.; Zheng, C.; Lu, X.; Chang, W.; Xiao, P.; Guo, X. 3D Reconstruction of Wheat Plants by Integrating Point Cloud Data and Virtual Design Optimization. Agriculture 2024, 14, 391. [Google Scholar] [CrossRef]
Tunio, M.H.; Gao, J.; Lakhiar, I.A.; Solangi, K.A.; Qureshi, W.A.; Shaikh, S.A.; Chen, J. Influence of atomization nozzles and spraying intervals on growth, biomass yield, and nutrient uptake of butter-head lettuce under aeroponics system. Agronomy 2021, 11, 97. [Google Scholar] [CrossRef]
Mahmood ur Rehman, M.; Liu, J.; Nijabat, A.; Faheem, M.; Wang, W.; Zhao, S. Leveraging Convolutional Neural Networks for Disease Detection in Vegetables: A Comprehensive Review. Agronomy 2024, 14, 2231. [Google Scholar] [CrossRef]
Niu, G.; Gu, J.; Xu, J.; Chen, Z. Multi-object quantity estimation based on multi-view convolution neural network. Command Inf. Syst. Technol. 2022, 13, 71–79. [Google Scholar]
Zhu, W.; Feng, Z.; Dai, S.; Zhang, P.; Wei, X. Using UAV multispectral remote sensing with appropriate spatial resolution and machine learning to monitor wheat scab. Agriculture 2022, 12, 1785. [Google Scholar] [CrossRef]
Zhang, H.; Tian, Q.; Bian, L.; Ge, Y. Plants biomass acquisition based on morphological, color and texture features of multi-view visible images. Trans. Chin. Soc. Agric. Mach. 2024, 55, 295–305. [Google Scholar]
Xie, P.; Zhang, Z.; Ba, Y.; Dong, N.; Zuo, X.; Yang, N.; Chen, J.; Cheng, Z.; Zhang, B.; Yang, X. Diagnosis of summer maize water stress based on UAV image texture and phenotypic parameters. Trans. Chin. Soc. Agric. Eng. 2024, 40, 136–146. [Google Scholar]
Zhang, H.; Ge, Y.; Xie, X.; Atefi, A.; Wijewardane, N.K.; Thapa, S. High throughput analysis of leaf chlorophyll content in sorghum using RGB, hyperspectral, and fluorescence imaging and sensor fusion. Plant Methods 2022, 18, 60. [Google Scholar] [CrossRef]
Che, Y.; Wang, Q.; Li, S.; Li, B.; Ma, Y. Monitoring of maize phenotypic traits using super-resolution reconstruction and multimodal data fusion. Trans. Chin. Soc. Agric. Eng. 2021, 37, 169–178. [Google Scholar]
Sun, J.; Jiang, S.Y.; Mao, H.P.; Wu, X.H.; Li, Q.L. Classification of Black Beans Using Visible and Near Infrared Hyperspectral Imaging. Int. J. Food Prop. 2015, 19, 1687–1695. [Google Scholar] [CrossRef]
Xu, S.; Xu, X.; Zhu, Q.; Meng, Y.; Yang, G.; Feng, H.; Yang, M.; Zhu, Q.; Xue, H.; Wang, B. Monitoring leaf nitrogen content in rice based on information fusion of multi-sensor imagery from UAV. Precis. Agric. 2023, 24, 2327–2349. [Google Scholar] [CrossRef]
Wang, Z.; Zhang, H.; Bian, L.; Zhou, L.; Ge, Y. Accurate plant 3D reconstruction and phenotypic traits extraction via stereo imaging and multi-view point cloud alignment. Front. Plant Sci. 2025, 16, 1642388. [Google Scholar] [CrossRef]
Hayashi, A.; Kochi, N.; Kodama, K.; Isobe, S.; Tanabata, T. CLCFM3: A 3D Reconstruction Algorithm Based on Photogrammetry for High-Precision Whole Plant Sensing Using All-Around Images. Sensors 2025, 25, 5829. [Google Scholar] [CrossRef]
Li, Y.; Zhang, B.; Wang, Y.; Zhang, X.; Zhang, J.; Fan, X. Study on 3D reconstruction and double sided alignment method of maize based on multi-view images. Jiangsu Agric. Sci. 2023, 51, 177–184. [Google Scholar]
Li, Z.; Li, C.; Munoz, P. Blueberry yield estimation through multi-view imagery with YOLOv8 object detection. In Proceedings of the 2023 ASABE Annual International Meeting, Omaha, Nebraska, 9–12 July 2023. [Google Scholar]
Zhang, H.; Liu, M.; Feng, Z.; Song, L.; Li, X.; Liu, W.; Wang, C.; Feng, W. Estimations of water use efficiency in winter wheat based on multi-angle remote sensing. Front. Plant Sci. 2021, 12, 614417. [Google Scholar] [CrossRef]
Duan, L.; Huang, C.; Chen, G.; Xiong, L.; Liu, L.; Yang, Q. Determination of rice panicle numbers during heading by multi-angle imaging. J. Integr. Agric. 2015, 14, 211–219. [Google Scholar] [CrossRef]
Belle, V.; Papantonis, I. Principles and practice of explainable machine learning. Front. Big Data 2021, 4, 688969. [Google Scholar] [CrossRef]
Gosiewska, A.; Kozak, A.; Biecek, P. Simpler is better: Lifting interpretability-performance trade-off via automated feature engineering. Decis. Support Syst. 2021, 150, 113556. [Google Scholar] [CrossRef]
Bhatt, U.; Xiang, A.; Sharma, S.; Weller, A.; Taly, A.; Jia, Y.; Ghosh, J.; Puri, R.; Moura, J.M.F.; Eckersley, P. Explainable machine learning in deployment. In Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, Barcelona, Spain, 27–30 January 2020; pp. 648–657. [Google Scholar]
Maksymiuk, S.; Gosiewska, A.; Biecek, P. Landscape of R packages for eXplainable Artificial Intelligence. arXiv 2020, arXiv:2009.13248. [Google Scholar]
de Lara, A.; Mieno, T.; Luck, J.D.; Puntel, L.A. Predicting site-specific economic optimal nitrogen rate using machine learning methods and on-farm precision experimentation. Precis. Agric. 2023, 24, 1792–1812. [Google Scholar] [CrossRef]
Hu, T.; Zhang, X.; Bohrer, G.; Liu, Y.; Zhou, Y.; Martin, J.; Li, Y.; Zhao, K. Crop yield prediction via explainable AI and interpretable machine learning: Dangers of black box models for evaluating climate change impacts on crop yield. Agric. For. Meteorol. 2023, 336, 109458. [Google Scholar] [CrossRef]
Li, T.; Zhang, Y.; Hu, L.; Zhao, Y.; Cai, Z.; Yu, T.; Zhang, X. Multi-Trait Phenotypic Analysis and Biomass Estimation of Lettuce Cultivars Based on SFM-MVS. Agriculture 2025, 15, 1662. [Google Scholar] [CrossRef]
Camps-Valls, G.; Campos-Taberner, M.; Moreno-Martínez, Á.; Walther, S.; Duveiller, G.; Cescatti, A.; Mahecha, M.D.; Muñoz-Marí, J.; García-Haro, F.J.; Guanter, L.; et al. A unified vegetation index for quantifying the terrestrial biosphere. Sci. Adv. 2021, 7, eabd9445. [Google Scholar] [CrossRef]
Song, Z.; Yan, S.; Zang, Z.; Fu, Y.; Wei, D.; Cui, H.-L.; Lai, P. Temporal and Spatial Variability of Water Status in Plant Leaves by Terahertz Imaging. IEEE Trans. Terahertz Sci. Technol. 2018, 8, 192–199. [Google Scholar] [CrossRef]
Bulgari, R.; Riahi, J.; Cecire, R.; Celi, L.; Malandrino, M.; Stefanescu Miralles, G.; Comba, L.; Alfarano, L.; Pugliese, M. Characterisation of combined abiotic and biotic stresses effects on lettuce plants via a multi-analysis approach. Front. Plant Sci. 2025, 16, 1550577. [Google Scholar] [CrossRef]
Tavakoli, H.; Gebbers, R. Assessing Nitrogen and Water Status of Winter Wheat Using a Digital Camera. Comput. Electron. Agric. 2019, 157, 558–567. [Google Scholar] [CrossRef]
Zhang, J.; Deng, J.T.; Ni, G.W.; Niu, Z.J.; Pan, S.J.; Han, W.T. Influencing Factors of Soil Moisture Content Inversion in Kiwifruit Root Region Based on Vegetation Index. Trans. Chin. Soc. Agric. Mach. 2022, 53, 223–230. [Google Scholar]
Liu, Y.; Huang, J.; Sun, Q.; Feng, H.K.; Yang, G.J.; Yang, F.Q. Estimation of Plant Height and Above Ground Biomass of Potato Based on UAV Digital Image. Natl. Remote Sens. Bull. 2021, 25, 2004–2014. [Google Scholar]
Tang, Z.; Guo, J.; Xiang, Y.; Lu, X.; Wang, Q.; Wang, H.; Cheng, M.; Wang, H.; Wang, X.; An, J.; et al. Estimation of Leaf Area Index and Above-Ground Biomass of Winter Wheat Based on Optimal Spectral Index. Agronomy 2022, 12, 1729. [Google Scholar] [CrossRef]
Elsherbiny, O.; Zhou, L.; Feng, L.; Qiu, Z. Integration of Visible and Thermal Imagery with an Artificial Neural Network Approach for Robust Forecasting of Canopy Water Content in Rice. Remote Sens. 2021, 13, 1785. [Google Scholar] [CrossRef]
Haralick, R.M.; Shanmugam, K.; Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst. Man. Cybern. 1973, 3, 610–621. [Google Scholar] [CrossRef]
Lati, R.N.; Filin, S.; Eizenberg, H. Plant Growth Parameter Estimation from Sparse 3D Reconstruction Based on Highly-Textured Feature Points. Precis. Agric. 2013, 14, 586–605. [Google Scholar] [CrossRef]
Yang, B.; Li, X.F.; Zhang, J.Q.; Wan, H.W.; Yu, Y.L.; Gao, J.X.; Wang, Y.C. Estimation Above-Ground Biomass of Grassland Based on Unmanned Aerial Vehicle Multi-Spectral Images. Remote Sens. Technol. Appl. 2025, 40, 1333–1343. [Google Scholar]
Wang, Y.K.; Ma, Y.X.; Fan, X.D.; Chen, H.; Hu, X.T. Estimation Model of Comprehensive Moisture Index for Summer Maize Based on UAV Multispectral Data. Trans. Chin. Soc. Agric. Mach. 2025, 56, 74–85. [Google Scholar]
Cao, Q.; Miao, Y.; Feng, G.; Gao, X.; Li, F.; Liu, B.; Yue, S.; Cheng, S.; Ustin, S.L.; Khosla, R. Active Canopy Sensing of Winter Wheat Nitrogen Status: An Evaluation of Two Sensor Systems. Comput. Electron. Agric. 2015, 112, 54–67. [Google Scholar] [CrossRef]
Al-Saddik, H.; Simon, J.C.; Cointault, F. Assessment of the Optimal Spectral Bands for Designing a Sensor for Vineyard Disease Detection: The Case of ‘Flavescence dorée’. Precis. Agric. 2019, 20, 398–422. [Google Scholar] [CrossRef]
Kumar, C.; Dhillon, J.; Huang, Y.; Reddy, K. Explainable Machine Learning Models for Corn Yield Prediction Using UAV Multispectral Data. Comput. Electron. Agric. 2025, 231, 109990. [Google Scholar] [CrossRef]
Wang, W.K.; Zhang, J.Y.; Wang, H.; Cao, Q.; Tian, Y.C.; Zhu, Y.; Cao, W.X.; Liu, X.J. Non-Destructive Monitoring of Rice Growth Key Indicators Based on Fixed-Wing UAV Multispectral Images. Sci. Agric. Sin. 2023, 56, 4175–4191. [Google Scholar]
Wang, L.G.; He, J.; Zheng, G.Q.; Guo, Y.; Zhang, Y.; Zhang, H.L. Estimation of Maize FPAR Based on UAV Multispectral Remote Sensing. Trans. Chin. Soc. Agric. Mach. 2022, 53, 202–210. [Google Scholar]
Gu, X.B.; Xu, Y.; Cheng, Z.K.; Zhou, Z.H.; Wei, C.Y.; Du, Y.D. Inversion of Leaf Water Content for Mulched Winter Wheat Based on Multi-spectral Remote Sensing of Unmanned Aerial Vehicle. Trans. Chin. Soc. Agric. Mach. 2025, 56, 547–556, 565. [Google Scholar]
Marques Ramos, A.P.; Prado Osco, L.; Elis Garcia Furuya, D.; Nunes Gonçalves, W.; Cordeiro Santana, D.; Pereira Ribeiro Teodoro, L.; Antonio da Silva Junior, C.; Fernando Capristo-Silva, G.; Li, J.; Henrique Rojo Baio, F.; et al. A Random Forest Ranking Approach to Predict Yield in Maize with UAV-Based Vegetation Spectral Indices. Comput. Electron. Agric. 2020, 178, 105791. [Google Scholar] [CrossRef]
Nasim, W.; Ahmad, A.; Belhouchette, H.; Fahad, S.; Hoogenboom, G. Evaluation of the OILCROP-SUN Model for Sunflower Hybrids Under Different Agro-Meteorological Conditions of Punjab–Pakistan. Field Crops Res. 2016, 188, 17–30. [Google Scholar] [CrossRef]
Musolf, A.M.; Holzinger, E.; Malley, J.D.; Bailey-Wilson, J.E. What Makes a Good Prediction? Feature Importance and Beginning to Open the Black Box of Machine Learning in Genetics. Hum. Genet. 2021, 141, 1515–1528. [Google Scholar] [CrossRef] [PubMed]
Liu, Y.; Fu, Y.; Peng, Y.; Ming, J. Clinical Decision Support Tool for Breast Cancer Recurrence Prediction Using SHAP Value in Cooperative Game Theory. Int. J. Clin. Decis. Support 2024, 10, e24876. [Google Scholar] [CrossRef]
Rao, Y.; Zhang, L.; Gao, L.; Wang, S.; Yang, L. ExAutoGP: Enhancing Genomic Prediction Stability and Interpretability with Automated Machine Learning and SHAP. Animals 2025, 15, 1172. [Google Scholar] [CrossRef]
Khan, S.; Iqbal, M.Z.; Solangi, F.; Azeem, S.; Bodlah, M.A.; Zaheer, M.S.; Niaz, Y.; Ashraf, M.; Abid, M.; Gul, H.; et al. Impact of Amino Acid Supplementation on Hydroponic Lettuce (Lactuca sativa L.) Growth and Nutrient Content. Sci. Rep. 2025, 15, 15829. [Google Scholar] [CrossRef]
Si, C.; Lin, Y.; Luo, S.; Yu, Y.; Liu, R.; Naz, M.; Dai, Z. Effects of LED Light Quality Combinations on Growth and Leaf Colour of Tissue Culture-Generated Plantlets in Sedum rubrotinctum. Hortic. Sci. Technol. 2024, 2024, 53–67. [Google Scholar] [CrossRef]
Tian, J.; Wang, C.; Chen, F.; Qin, W.; Yang, H.; Zhao, S.; Xia, J.; Du, X.; Zhu, Y.; Wu, L.; et al. Maize Smart-Canopy Architecture Enhances Yield at High Densities. Nature 2024, 632, 576–584. [Google Scholar] [CrossRef]
Lu, N.; Zhou, J.; Han, Z.; Li, D.; Cao, Q.; Yao, X.; Tian, Y.; Zhu, Y.; Cao, W.; Cheng, T. Improved Estimation of Aboveground Biomass in Wheat from RGB Imagery and Point Cloud Data Acquired with a Low-Cost Unmanned Aerial Vehicle System. Plant Methods 2019, 15, 17. [Google Scholar] [CrossRef]
Li, X.; Cheng, Y.; Zhou, Y.; Shi, L.; Sun, J.; Ho, G.W.; Wang, R. Programmable Robotic Shape Shifting and Color Morphing Dynamics Through Magneto-Mechano-Chromic Coupling. Adv. Mater. 2024, 36, 2406714. [Google Scholar] [CrossRef]
Osco, L.P.; Ramos, A.P.M.; Faita Pinheiro, M.M.; Moriya, É.A.S.; Imai, N.N.; Estrabis, N.; Ianczyk, F.; Araújo, F.F.d.; Liesenberg, V.; Jorge, L.A.d.C.; et al. A Machine Learning Framework to Predict Nutrient Content in Valencia-Orange Leaf Hyperspectral Measurements. Remote Sens. 2020, 12, 906. [Google Scholar] [CrossRef]
Zhao, Z.; Yao, H.; Zeng, D.; Jiang, Z.; Zhang, X. UAV Multi-Source Data Fusion with Super-Resolution for Accurate Soybean Leaf Area Index Estimation. Front. Plant Sci. 2025, 16, 1700660. [Google Scholar] [CrossRef]
Rana, S.; Hensel, O.; Nasirahmadi, A. From Vineyard to Vision: Multi-Domain Analysis and Mitigation of Grape Cluster Detection Failures in Complex Viticultural Environments. Results Eng. 2025, 29, 108833. [Google Scholar] [CrossRef]

Figure 1. Lettuce cultivation environment.

Figure 2. Workflow.

Figure 3. Box plots of fresh weight under different nitrogen gradients.

Figure 4. Comparison of prediction performance of six models for lettuce fresh weight.

Figure 5. Polar bar chart for comprehensive evaluation of fresh-weight prediction performance across machine-learning models.

Figure 6. Feature importance and effect distribution for fresh-weight prediction based on SHAP values. (A) Importance ranking based on the overall influence of features, where higher-ranked features have greater effects on biomass prediction; (B) effects of individual feature points on model output in positive and negative directions. The color gradient from blue to red indicates the magnitude of feature values and reflects the actual contribution of input variables to the final prediction results.

Figure 7. Fit between actual and predicted fresh weights based on the optimal model.

Table 1. The morphological characteristic parameters used in this study.

Morphological Characteristic Parameters	Correlation Coefficient (Absolute Value) with the Fresh Weight of Lettuce
PH	0.8 *
PCCHSA	0.91 *
PCCHV	0.85 *
PCHA	0.91 *
PCHP	0.9 *
PCCR	0.67 *
PER_W	0.85 *
PER_H	0.7 *
PER_L	0.84 *
PER_WHR	0.096
PCHA/CCA	0.052
PCCHSA/PCCHV	0.76 *

Note: * indicates p ≤ 0.05.

Table 2. RGB color indices used in this study.

Index Name	Abbreviation	Calculation Formula	|r| with Fresh Weight (Top-Down View)	|r| with Fresh Weight (Single-Sided View)	|r| with Fresh Weight (Side-View Averaging)	Source
Normalized Red channel	R	$R = r / (r + g + b)$	0.25 *	0.29 *	0.32 *	[43]
Normalized Green channel	G	$G = g / (r + g + b)$	0.25 *	0.31 *	0.33 *
Normalized Blue channel	B	$B = b / (r + g + b)$	0.35 *	0.03	0.01
Visible-band Difference Vegetation Index	VDVI	$\mathrm{VDVI} = (2G - R - B) / (2G + R + B)$	0.25 *	0.30 *	0.32 *	[44]
Normalized Green-Blue Difference Index	NGBDI	$\mathrm{NGBDI} = (G - B) / (G + B)$	0.36 *	0.06	0.05
Visible Atmospheric Resistant Index	VARI	$\mathrm{VARI} = (G - R) / (G + R - B)$	0.12	0.48 *	0.52 *
Normalized Difference Index	NDI	$\mathrm{NDI} = (G - R) / (G + R)$	0.06	0.49 *	0.52 *
Modified Green-Red Vegetation Index	IGRVI	$\mathrm{IGRVI} = (G^{2} - R^{2}) / (G^{2} + R^{2})$	0.09	0.49 *	0.52 *
Green Leaf Index	GLI	$\mathrm{GLI} = (2G - B - R) / (G + R + B)$	0.26 *	0.35 *	0.36 *
Red-Green-Blue Vegetation Index	GBVI	$\mathrm{GBVI} = (G^{2} - BR) / (G^{2} + BR)$	0.35 *	0.10	0.09
Green-Red Ratio Index	GRRI	$\mathrm{GRRI} = G / R$	0.05	0.49 *	0.53 *
Green-Blue Ratio Index	GBRI	$\mathrm{GBRI} = G / B$	0.39 *	0.08	0.06
Blue-Green-Red Ratio Index	BGRRI	$\mathrm{BGRRI} = (G + B) / R$	0.14	0.30 *	0.33 *	[45]
Red-Green-Blue Ratio Index	RGBRI	$\mathrm{RGBRI} = (G + R) / B$	0.40 *	0.03	0.01
Red-Blue-Green Ratio Index	RBGRI	$\mathrm{RBGRI} = (R + B) / G$	0.23 *	0.28 *	0.29 *
Excess Green Index	ExG	$\mathrm{ExG} = 2G - R - B$	0.25 *	0.31 *	0.33 *	[46]
Excess Red Index	ExR	$\mathrm{ExR} = 1.4R - G$	0.06	0.48 *	0.52 *
Excess Green-Red Difference Index	ExGR	$\mathrm{ExGR} = \mathrm{ExG} - \mathrm{ExR}$	0.16	0.42 *	0.45 *
Color Index of Vegetation Extraction	CIVE	$\mathrm{CIVE} = 0.441R - 0.811G + 0.385B + 18.7$	0.24 *	0.33 *	0.35 *

Note: r, g, and b represent the normalized pixel values of the red, green, and blue channels in the RGB image, respectively; * indicates p ≤ 0.05.
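As a minimal sketch of how several Table 2 indices follow from the channel values, the Python function below first forms the normalized coordinates R, G, and B and then evaluates a few of the listed formulas. The function name `rgb_indices` and the example channel values are hypothetical and are not taken from the study; the per-plant averaging over segmented pixels is also not shown.

```python
def rgb_indices(r: float, g: float, b: float) -> dict:
    """Evaluate a subset of the Table 2 color indices from channel values r, g, b."""
    total = r + g + b
    if total == 0:
        raise ValueError("r + g + b must be non-zero")
    # Normalized channels, as in the first three rows of Table 2.
    R, G, B = r / total, g / total, b / total
    ex_g = 2 * G - R - B          # Excess Green Index
    ex_r = 1.4 * R - G            # Excess Red Index
    return {
        "R": R,
        "G": G,
        "B": B,
        "ExG": ex_g,
        "ExR": ex_r,
        "ExGR": ex_g - ex_r,      # Excess Green-Red Difference Index
        "NDI": (G - R) / (G + R),
        "GRRI": G / R,            # undefined when the red channel is zero
        "GLI": (2 * G - B - R) / (G + R + B),
    }


# Hypothetical mean channel values for one segmented plant region.
example = rgb_indices(r=60, g=120, b=20)
```

Ratio indices such as GRRI are undefined when their denominator is zero, so a practical pipeline would need to mask or skip such pixels.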

Table 3. RGB texture indices used in this study.

Texture Index	Abbreviation	Calculation Formula	|r| with Fresh Weight (Top-Down View)	|r| with Fresh Weight (Single-Sided View)	|r| with Fresh Weight (Side-View Averaging)
Mean	MEA	$\mathrm{MEA} = \sum_{i} \sum_{j} i \times P(i, j)$	0.40 *	0.47 *	0.53 *
Variance	VAR	$\mathrm{VAR} = \sum_{i} \sum_{j} (i - u_{x})^{2} \times P(i, j)$	0.48 *	0.40 *	0.47 *
Homogeneity	HOM	$\mathrm{HOM} = \sum_{i} \sum_{j} \frac{P(i, j)}{1 + (i - j)^{2}}$	0.18 *	0.52 *	0.58 *
Contrast	CON	$\mathrm{CON} = \sum_{i} \sum_{j} (i - j)^{2} \times P(i, j)$	0.10	0.43 *	0.48 *
Dissimilarity	DIS	$\mathrm{DIS} = \sum_{i} \sum_{j} |i - j| \times P(i, j)$	0.04	0.47 *	0.53 *
Entropy	ENT	$\mathrm{ENT} = - \sum_{i} \sum_{j} P(i, j) \times \ln P(i, j)$	0.18 *	0.53 *	0.60 *
Angular Second Moment	ASM	$\mathrm{ASM} = \sum_{i} \sum_{j} P(i, j)^{2}$	0.21 *	0.52 *	0.59 *
Correlation	COR	$\mathrm{COR} = \frac{\sum_{i} \sum_{j} (i - u_{x}) (j - u_{y}) P(i, j)}{\sigma_{x} \sigma_{y}}$	0.46 *	0.30 *	0.35 *

Note: In the above formulas, i denotes the gray level of the central pixel, j denotes the gray level of a neighboring pixel satisfying the specified distance and direction, and P(i, j) is the normalized value in the i-th row and j-th column of the gray-level co-occurrence matrix, corresponding to the probability that a pixel pair with gray levels i and j occurs in the whole image; mean is the mean of P(i, j); $\sigma_{x}$ and $\sigma_{y}$ are the standard deviations of the probability distributions in the x and y directions, respectively; $u_{x}$ and $u_{y}$ are the means of the probability distributions in the x and y directions, respectively; * indicates p ≤ 0.05.
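The texture formulas in Table 3 can be illustrated with a small pure-Python sketch that builds a normalized gray-level co-occurrence matrix P(i, j) and evaluates several of the indices on it. The pixel offset (one step to the right), the symmetric counting, the number of gray levels, and the toy image are all assumptions made for illustration; the settings used in the study are not stated in this section.

```python
import math


def glcm(image, levels, dx=1, dy=0):
    """Normalized, symmetric gray-level co-occurrence matrix for one pixel offset."""
    counts = [[0.0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for y in range(rows):
        for x in range(cols):
            ny, nx = y + dy, x + dx
            if 0 <= ny < rows and 0 <= nx < cols:
                i, j = image[y][x], image[ny][nx]
                counts[i][j] += 1  # pair (i, j)
                counts[j][i] += 1  # mirrored pair, which makes P symmetric
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def texture_features(P):
    """Evaluate MEA, CON, DIS, HOM, ASM, and ENT as written in Table 3."""
    n = len(P)
    cells = [(i, j, P[i][j]) for i in range(n) for j in range(n)]
    return {
        "MEA": sum(i * p for i, j, p in cells),
        "CON": sum((i - j) ** 2 * p for i, j, p in cells),
        "DIS": sum(abs(i - j) * p for i, j, p in cells),
        "HOM": sum(p / (1 + (i - j) ** 2) for i, j, p in cells),
        "ASM": sum(p ** 2 for i, j, p in cells),
        # Zero-probability cells are skipped because ln(0) is undefined.
        "ENT": -sum(p * math.log(p) for i, j, p in cells if p > 0),
    }


# Toy image with two gray levels (0 and 1), used only for illustration.
image = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]
P = glcm(image, levels=2)
features = texture_features(P)
```

In practice, a library routine such as the co-occurrence functions in scikit-image would likely replace the hand-written loops; the sketch only makes the summations explicit.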

Table 4. Multispectral vegetation indices used in this study.

Index Name	Abbreviation	Calculation Formula	|r| with Fresh Weight (Top-Down View)	|r| with Fresh Weight (Single-Sided View)	|r| with Fresh Weight (Side-View Averaging)	Source
Normalized Difference Vegetation Index	NDVI	$(\mathrm{NIR} - R) / (\mathrm{NIR} + R)$	0.22 *	0.43 *	0.53 *	[49]
Renormalized Difference Vegetation Index	RDVI	$(\mathrm{NIR} - R) / \sqrt{\mathrm{NIR} + R}$	0.07	0.20 *	0.13	[50]
Normalized Difference Red Edge Index	NDRE	$(\mathrm{NIR} - \mathrm{RE}) / (\mathrm{NIR} + \mathrm{RE})$	0.57 *	0.08	0.10	[49]
Green Normalized Difference Vegetation Index	GNDVI	$(\mathrm{NIR} - G) / (\mathrm{NIR} + G)$	0.52 *	0.37 *	0.40 *	[49]
Red Edge Green Normalized Difference Vegetation Index	REGNDVI	$(\mathrm{RE} - G) / (\mathrm{RE} + G)$	0.45 *	0.26 *	0.29 *	[51]
Green Wide Dynamic Range Vegetation Index	GWDRVI	$(0.12\,\mathrm{NIR} - G) / (0.12\,\mathrm{NIR} + G)$	0.53 *	0.41 *	0.44 *	[51]
MERIS Terrestrial Chlorophyll Index	MTCI	$(\mathrm{NIR} - \mathrm{RE}) / (\mathrm{RE} + R)$	0.58 *	0.17	0.21 *	[49]
Structure Insensitive Pigment Index	SIPI	$(\mathrm{NIR} - B) / (\mathrm{NIR} + R)$	0.05	0.17	0.21 *	[52]
Simple Ratio Index	SR	$\mathrm{NIR} / R$	0.32 *	0.60 *	0.63 *	[53]
Green Chlorophyll Index	CI_green	$(\mathrm{NIR} / G) - 1$	0.54 *	0.37 *	0.46 *	[49]
Chlorophyll Absorption Ratio Index	CARI	$(\mathrm{RE} - R) - 0.2 (\mathrm{RE} + R)$	0.51 *	0.08	0.24 *	[53]
Green-Blue Normalized Difference Vegetation Index	GBNDVI	$(\mathrm{NIR} - (G + B)) / (\mathrm{NIR} + (G + B))$	0.13	0.15	0.35 *	[53]
Difference Vegetation Index	DVI	$\mathrm{NIR} - R$	0.03	0.03	0.16	[49]
Red Edge Soil Adjusted Vegetation Index	RESAVI	$1.5 \times (\mathrm{NIR} - \mathrm{RE}) / (\mathrm{NIR} + \mathrm{RE} + 0.5)$	0.37 *	0.18 *	0.18 *	[54]
Enhanced Vegetation Index	EVI	$2.5 \times (\mathrm{NIR} - R) / (\mathrm{NIR} + 6R - 7.5B + 1)$	0.30 *	0.15	0.09	[55]
Two-band Enhanced Vegetation Index	EVI2	$2.5 \times (\mathrm{NIR} - R) / (\mathrm{NIR} + 2.4R + 1)$	0.06	0.24 *	0.19 *	[50]
Normalized Green Index	NGI	$G / (\mathrm{NIR} + G + \mathrm{RE})$	0.51 *	0.34 *	0.37 *	[55]
Optimized Soil-Adjusted Vegetation Index	OSAVI	$1.16 \times (\mathrm{NIR} - R) / (\mathrm{NIR} + R + 0.16)$	0.13	0.41 *	0.46 *	[56]
Modified Simple Ratio Index	MSR	$((\mathrm{NIR} / R) - 1) / \sqrt{\mathrm{NIR} / R + 1}$	0.28 *	0.57 *	0.61 *	[56]
Modified Chlorophyll Absorption in Reflectance Index	MCARI	$(\mathrm{RE} - R - 0.2 (\mathrm{RE} - G)) \times \mathrm{RE} / R$	0.27 *	0.15	0.14	[56]
Triangular Vegetation Index	TVI	$0.5 \times (120 \times (\mathrm{RE} - G) - 200 \times (R - G))$	0.36 *	0.09	0.17	[56]
Soil Adjusted Vegetation Index 2	SAVI2	$1.5 \times (\mathrm{NIR} - R) / (\mathrm{NIR} + R + 0.5)$	0.07	0.27 *	0.24 *	[56]
Perpendicular Vegetation Index	PVI	$(R - G) / (R + G)^{2} - 2 \times R \times G$	0.31 *	0.03	0.01	[55]
Wide Dynamic Range Vegetation Index	WDRVI	$(0.1\,\mathrm{NIR} - R) / (0.1\,\mathrm{NIR} + R)$	0.24 *	0.52 *	0.59 *	[57]
Wide Dynamic Range Vegetation Index 2	WDRVI2	$(0.2\,\mathrm{NIR} - R) / (0.2\,\mathrm{NIR} + R)$	0.23 *	0.49 *	0.57 *	[57]
Transformed Chlorophyll Absorption in Reflectance Index	TCARI	$3 \times [(\mathrm{RE} - R) - 0.2 \times (\mathrm{RE} - G)] / (\mathrm{RE} / R)$	0.34 *	0.03	0.09	[56]
Modified Triangular Vegetation Index 2	MTVI2	$1.5 \times [1.2 \times (\mathrm{NIR} - G) - 2.5 \times (R - G)] / \sqrt{(2\,\mathrm{NIR} + 1)^{2} - (6\,\mathrm{NIR} - 5 \sqrt{R} - 0.5)}$	0.17	0.01	0.06	[55]

Note: G, R, B, RE, and NIR denote the spectral reflectance of the green, red, blue, red-edge, and near-infrared bands, respectively; * indicates p ≤ 0.05.
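To make the band arithmetic concrete, the sketch below evaluates a handful of the Table 4 indices, including the four that appear in the final feature set (MTCI, CARI, SR, and CI_green), using the formulas exactly as printed in the table. The function name `vegetation_indices` and the reflectance values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def vegetation_indices(G: float, R: float, RE: float, NIR: float) -> dict:
    """Evaluate selected Table 4 indices from band reflectances (formulas as printed)."""
    return {
        "NDVI": (NIR - R) / (NIR + R),
        "NDRE": (NIR - RE) / (NIR + RE),
        "GNDVI": (NIR - G) / (NIR + G),
        "SR": NIR / R,
        "CI_green": NIR / G - 1,
        "MTCI": (NIR - RE) / (RE + R),
        "CARI": (RE - R) - 0.2 * (RE + R),
    }


# Hypothetical reflectances for a healthy green canopy (not measured data).
vi = vegetation_indices(G=0.10, R=0.05, RE=0.30, NIR=0.50)
```

With these inputs, SR is 0.50 / 0.05 = 10 and CI_green is 0.50 / 0.10 − 1 = 4, which shows how ratio indices amplify the contrast between near-infrared and visible reflectance.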

Table 5. List of ML algorithms and related hyperparameters used in this study.

ML Method	Input Features	Optimal Hyperparameter Values
RFR	MFs	{'max_depth': 5, 'max_features': 'sqrt', 'min_samples_leaf': 2, 'min_samples_split': 7, 'n_estimators': 150}
RFR	MF-CIs	{'max_depth': None, 'max_features': 'log2', 'min_samples_leaf': 1, 'min_samples_split': 10, 'n_estimators': 50}
RFR	MF-CI-TIs	{'max_depth': 5, 'max_features': 'log2', 'min_samples_leaf': 1, 'min_samples_split': 3, 'n_estimators': 50}
RFR	MF-CI-TI-VIs	{'max_depth': 25, 'max_features': 'log2', 'min_samples_leaf': 3, 'min_samples_split': 7, 'n_estimators': 50}
SVR	MFs	{'C': 10, 'degree': 2, 'epsilon': 0.5, 'gamma': 'scale', 'kernel': 'linear'}
SVR	MF-CIs	{'C': 10, 'degree': 2, 'epsilon': 0.5, 'gamma': 0.1, 'kernel': 'rbf'}
SVR	MF-CI-TIs	{'C': 10, 'degree': 2, 'epsilon': 0.2, 'gamma': 'scale', 'kernel': 'linear'}
SVR	MF-CI-TI-VIs	{'C': 100, 'degree': 2, 'epsilon': 0.5, 'gamma': 'scale', 'kernel': 'linear'}
GBDT	MFs	{'learning_rate': 0.05, 'max_depth': 3, 'max_features': 'sqrt', 'n_estimators': 50, 'subsample': 0.8}
GBDT	MF-CIs	{'learning_rate': 0.05, 'max_depth': 3, 'max_features': 'sqrt', 'n_estimators': 50, 'subsample': 0.8}
GBDT	MF-CI-TIs	{'learning_rate': 0.1, 'max_depth': 4, 'max_features': 'sqrt', 'n_estimators': 50, 'subsample': 0.8}
GBDT	MF-CI-TI-VIs	{'learning_rate': 0.01, 'max_depth': 5, 'max_features': 'sqrt', 'n_estimators': 200, 'subsample': 0.9}
XGBoost	MFs	{'colsample_bytree': 1.0, 'gamma': 0, 'learning_rate': 0.01, 'max_depth': 5, 'n_estimators': 100, 'subsample': 1.0}
XGBoost	MF-CIs	{'colsample_bytree': 1.0, 'gamma': 0.1, 'learning_rate': 0.05, 'max_depth': 7, 'n_estimators': 50, 'subsample': 0.6}
XGBoost	MF-CI-TIs	{'colsample_bytree': 1.0, 'gamma': 0, 'learning_rate': 0.01, 'max_depth': 10, 'n_estimators': 200, 'subsample': 1.0}
XGBoost	MF-CI-TI-VIs	{'colsample_bytree': 0.8, 'gamma': 0.1, 'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 200, 'subsample': 1.0}
KNN	MFs	{'algorithm': 'auto', 'leaf_size': 10, 'n_neighbors': 7, 'p': 2, 'weights': 'distance'}
KNN	MF-CIs	{'algorithm': 'auto', 'leaf_size': 10, 'n_neighbors': 15, 'p': 2, 'weights': 'distance'}
KNN	MF-CI-TIs	{'algorithm': 'auto', 'leaf_size': 10, 'n_neighbors': 7, 'p': 2, 'weights': 'distance'}
KNN	MF-CI-TI-VIs	{'algorithm': 'auto', 'leaf_size': 10, 'n_neighbors': 7, 'p': 1, 'weights': 'uniform'}
BPNN	MFs	{'activation': 'tanh', 'alpha': 0.001, 'batch_size': 32, 'hidden_layer_sizes': (100,), 'learning_rate_init': 0.001, 'solver': 'adam'}
BPNN	MF-CIs	{'activation': 'tanh', 'alpha': 1, 'batch_size': 'auto', 'hidden_layer_sizes': (50,), 'learning_rate_init': 0.01, 'solver': 'adam'}
BPNN	MF-CI-TIs	{'activation': 'tanh', 'alpha': 1, 'batch_size': 32, 'hidden_layer_sizes': (100,), 'learning_rate_init': 0.001, 'solver': 'adam'}
BPNN	MF-CI-TI-VIs	{'activation': 'relu', 'alpha': 0.1, 'batch_size': 32, 'hidden_layer_sizes': (100, 50), 'learning_rate_init': 0.01, 'solver': 'adam'}
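The hyperparameter names in Table 5 match the keyword arguments of scikit-learn estimators, so a configuration such as the SVR row for the MF-CI-TI-VIs feature set could plausibly be instantiated as sketched below. This is an illustration under stated assumptions, not the authors' pipeline: the data are synthetic, and the feature standardization and the 96/24 split are choices made here rather than details given in this section.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Hyperparameters reported in Table 5 for SVR with the MF-CI-TI-VIs inputs.
svr_params = {"C": 100, "degree": 2, "epsilon": 0.5, "gamma": "scale", "kernel": "linear"}

# Synthetic stand-in data: 120 samples x 9 features (NOT the study's measurements).
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 9))
weights = rng.uniform(1.0, 4.0, size=9)
y = 45.0 + X @ weights + rng.normal(scale=2.0, size=120)

# Standardizing the inputs before SVR is an assumption of this sketch.
model = make_pipeline(StandardScaler(), SVR(**svr_params))
model.fit(X[:96], y[:96])            # assumed split: first 96 samples for training
predictions = model.predict(X[96:])  # remaining 24 samples held out
```

With a linear kernel, scikit-learn ignores `degree` and `gamma`, so only `C` and `epsilon` affect the fitted model in this configuration.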

Table 6. Summary of fresh-weight statistical characteristics for 120 lettuce samples.

Nitrogen Gradient	Sample Size	Mean (g)	Standard Deviation (g)	Max (g)	Min (g)	CV (%)
N1	24	31.34	3.72	38.10	23.43	11.86
N2	24	35.35	6.85	47.20	21.51	19.38
N3	24	43.82	10.02	63.27	21.50	22.87
N4	24	50.95	6.07	65.14	40.93	11.91
N5	24	51.68	12.07	81.65	29.68	23.36
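The CV column in Table 6 is the standard deviation expressed as a percentage of the mean. The short check below recomputes it from the rounded mean and standard deviation in each row; small differences in the last digit are expected because the published values were presumably derived from unrounded data. The variable and function names are illustrative.

```python
# (mean in g, standard deviation in g, reported CV in %) copied from Table 6.
table6 = {
    "N1": (31.34, 3.72, 11.86),
    "N2": (35.35, 6.85, 19.38),
    "N3": (43.82, 10.02, 22.87),
    "N4": (50.95, 6.07, 11.91),
    "N5": (51.68, 12.07, 23.36),
}


def coefficient_of_variation(mean: float, sd: float) -> float:
    """Coefficient of variation in percent: CV = 100 * SD / mean."""
    return 100.0 * sd / mean


recomputed = {
    level: coefficient_of_variation(mean, sd)
    for level, (mean, sd, _reported) in table6.items()
}
max_gap = max(abs(recomputed[level] - table6[level][2]) for level in table6)
```

The total sample size implied by the table is 5 × 24 = 120, matching the table caption.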

Table 7. Feature set after VIF-based collinearity elimination.

Prediction Target	Feature Categories	Key Features
Fresh weight	MFs	PCCHSA, PCCR, PER_W
	CIs	GRRI
	TIs	ENT
	VIs	TV_MTCI, TV_CARI, SV_SR, SV_CI_green

Note: TV_: Top-down view, SV_: Side view.
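The VIF-based elimination that produced Table 7 can be sketched as follows: each feature is regressed on the others, VIF_j = 1 / (1 − R_j²) is computed, and the feature with the largest VIF is dropped until all values fall below a threshold. The threshold of 10 used here is a common rule of thumb and an assumption of this sketch, as are the function names and the synthetic data; the study's actual threshold is not given in this section.

```python
import numpy as np


def vif(X):
    """Variance inflation factor of each column: 1 / (1 - R^2) against the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    scores = []
    for j in range(k):
        target = X[:, j]
        # Regress column j on an intercept plus all remaining columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        scores.append(1.0 / (1.0 - r2))
    return np.array(scores)


def drop_collinear(X, names, threshold=10.0):
    """Iteratively remove the feature with the largest VIF above the threshold."""
    X = np.asarray(X, dtype=float)
    names = list(names)
    while X.shape[1] > 1:
        scores = vif(X)
        worst = int(np.argmax(scores))
        if scores[worst] <= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names


# Synthetic example: feature "c" is almost exactly a + b, so it is collinear.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)
c = a + b + 0.01 * rng.normal(size=200)
X = np.column_stack([a, b, c])
initial_vif = vif(X)
kept = drop_collinear(X, ["a", "b", "c"])
```

Because all three synthetic columns are mutually dependent, each starts with a very large VIF, and removing any one of them is enough to bring the remaining pair below the threshold.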

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
