Multimodal Optical Biosensing and 3D-CNN Fusion for Phenotyping Physiological Responses of Basil Under Water Deficit Stress

Jeon, Yu-Jin; Kim, Hyoung Seok; Lee, Taek Sung; Park, Soo Hyun; Yun, Heesup; Jung, Dae-Hyun

doi:10.3390/agronomy16010055

Open AccessArticle

Multimodal Optical Biosensing and 3D-CNN Fusion for Phenotyping Physiological Responses of Basil Under Water Deficit Stress

by

Yu-Jin Jeon

^1,2,

Hyoung Seok Kim

³,

Taek Sung Lee

³,

Soo Hyun Park

³,

Heesup Yun

⁴

and

Dae-Hyun Jung

^1,2,*

¹

Department of Smart Farm Science, Kyung Hee University, Yongin 17104, Republic of Korea

²

Interdisciplinary Program in IT-Bio Convergence System, Kyung Hee University, Yongin 17104, Republic of Korea

³

Smart Farm Research Center, Korea Institute of Science and Technology (KIST), Gangneung 25451, Republic of Korea

⁴

Department of Biological and Agricultural Engineering, University of California, Davis, CA 95616, USA

^*

Author to whom correspondence should be addressed.

Agronomy 2026, 16(1), 55; https://doi.org/10.3390/agronomy16010055

Submission received: 18 November 2025 / Revised: 18 December 2025 / Accepted: 22 December 2025 / Published: 24 December 2025

(This article belongs to the Special Issue Smart Farming: Advancing Techniques for High-Value Crops)

Download

Browse Figures

Versions Notes

Abstract

Water availability critically affects basil (Ocimum basilicum L.) growth and physiological performance, making the early and precise monitoring of water-deficit responses essential for precision irrigation. However, conventional visual or biochemical methods are destructive and unsuitable for real-time assessment. This study presents a multimodal optical biosensing and 3D convolutional neural network (3D-CNN) fusion framework for phenotyping physiological responses of basil under water-deficit stress. RGB, depth, and chlorophyll fluorescence (CF) imaging were integrated to capture complementary morphological and photosynthetic information. Through the fusion of 130 optical parameter layers, the 3D-CNN model learned spatial and temporal–spectral features associated with resistance and recovery dynamics, achieving 96.9% classification accuracy—outperforming both 2D-CNN and traditional machine-learning classifiers. Feature-space visualization using t-SNE confirmed that the learned latent representations reflected biologically meaningful stress–recovery trajectories rather than superficial visual differences. This multimodal fusion framework provides a scalable and interpretable approach for the real-time, non-destructive monitoring of crop water stress, establishing a foundation for adaptive irrigation control and intelligent environmental management in precision agriculture.

Keywords:

multimodal data fusion; chlorophyll fluorescence; 3D convolutional neural networks (3D-CNN); plant physiological monitoring; precision agriculture

1. Introduction

Global climate change has emerged as one of the most critical threats to the stability of agricultural production as the frequency and intensity of extreme weather events continue to increase [1]. These environmental disturbances destabilize the productivity of open-field farming and highlight the structural limitations of conventional production systems that rely heavily on regional and seasonal conditions. Consequently, Controlled-Environment Agriculture (CEA) has gained prominence as a key global strategy for ensuring stable food production by enabling the precise regulation of diverse environmental factors.

The advancement of CEA has accelerated the integration of engineering-based technologies—such as embedded sensing, automated irrigation, optical monitoring, and data-driven decision-making systems—into modern agricultural production frameworks. Recent progress in optical biosensing, multimodal imaging, and machine learning has enabled non-destructive and high-resolution monitoring of plant physiological states, offering engineering-driven solutions to challenges previously addressed solely through traditional agronomic approaches.

These technological innovations are particularly crucial for high-value horticultural crops cultivated in greenhouses and plant factories, where precise environmental control is directly linked to production efficiency and crop performance. Among such crops, basil (Ocimum basilicum L.) is one of the most widely cultivated herbs in controlled environments and has emerged as an ideal model crop for developing and validating engineering-based phenotyping and environmental optimization strategies [2].

Among various environmental factors, water availability is the most critical determinant of basil’s growth and physiological function. Water deficit reduces photosynthetic efficiency, nutrient uptake, and leaf expansion, ultimately degrading quality and marketable yield [3]. Although moderate water stress can enhance essential oil accumulation, severe drought significantly alters metabolic activity, leading to poor aroma quality and growth reduction [4]. Therefore, real-time, accurate monitoring of basil’s physiological response to water stress and subsequent recovery is essential for precision irrigation management.

Traditional diagnostic approaches—such as observing leaf wilting, discoloration, or reduced leaf area—detect visible symptoms that appear only after physiological disruption has occurred [5]. Although biochemical assays and gene-expression analyses can quantify drought responses [6], they are destructive, labor-intensive, and unsuitable for continuous monitoring in commercial cultivation.

To overcome these limitations, optical biosensing offers a non-destructive, real-time approach to monitor plant physiological states through optical signals [7]. In this framework, biological recognition elements such as chlorophyll fluorescence (CF) represent intrinsic photosynthetic indicators of stress; optical transducers (e.g., LEDs, filters, and sensors) convert these biological signals into measurable optical data; and computational interpretation using deep learning translates optical data into interpretable physiological information. This integration allows quantitative assessment of plant stress adaptation and recovery mechanisms in complex cultivation environments.

CF imaging, in particular, captures photochemical (qP) and non-photochemical quenching (NPQ) processes that directly reflect photosystem II (PSII) efficiency [8,9]. Because these dynamics precede morphological symptoms, CF serves as an ideal biorecognition signal for optical biosensing. However, single-modality measurements are often influenced by external illumination or geometric variations [10].

Recent advances in deep learning, especially convolutional neural networks (CNNs), have enabled the extraction of hierarchical spatial and temporal features from multimodal imaging data [11,12,13]. In particular, 3D convolutional neural networks (3D-CNNs) process volumetric data along x–y–z dimensions, capturing dynamic patterns within optical signal cubes more effectively than 2D models [14,15]. This approach has proven powerful in plant disease diagnosis and hyperspectral feature learning, yet its application to physiological stress detection using cost-effective optical sensors remains limited.

Previous studies employing 3D convolutional neural networks in agriculture have primarily focused on disease diagnosis or biochemical anomaly detection using hyperspectral image cubes, where volumetric learning has been shown to outperform 2D-CNNs by jointly modeling spatial and spectral correlations [16]. In parallel, chlorophyll fluorescence imaging has been extensively applied to early stress detection; however, many CF-based studies still rely on handcrafted feature extraction or sequential modeling approaches rather than direct volumetric representation learning [17]. More recently, multimodal fusion frameworks integrating hyperspectral and chlorophyll fluorescence information have demonstrated improved classification performance, yet these approaches typically treat each modality as an independent feature stream and do not explicitly preserve spatial–physiological alignment within a unified 3D volume [11].

In contrast, the present study advances beyond existing work by constructing an aligned RGB–depth–chlorophyll fluorescence fusion cube that encodes spatial structure and physiological dynamics simultaneously, and by applying a 3D-CNN to directly learn discriminative representations from this fused volume using cost-effective optical sensors. This design philosophy is consistent with recent cross-modal learning approaches in agricultural sensing, which emphasize modality-aware architectural integration to extract meaningful representations from heterogeneous inputs [18]. Furthermore, the proposed framework explicitly benchmarks its performance against traditional machine learning and 2D-CNN baselines, and systematically evaluates modality complementarity, thereby clarifying the specific contribution of volumetric multimodal learning for physiological stress and recovery phenotyping.

Therefore, this study proposes a 3D-CNN-based optical biosensing framework to classify basil’s physiological responses—normal, resistance, and recovery—under water deficit stress. RGB, depth, and chlorophyll fluorescence data were collected under controlled environmental conditions simulating plant factory systems. The specific objectives are to (1) acquire multimodal optical biosensing data of basil under controlled water-deficit and recovery conditions; (2) construct a 3D-CNN model that fuses multimodal optical signals into a unified biosensing parameter for feature learning; and (3) evaluate its performance compared to traditional machine learning and 2D-CNN approaches.

By bridging biological signal acquisition with deep multimodal representation learning, this study establishes a robust framework for non-destructive physiological monitoring. The proposed approach contributes to the advancement of AI-driven precision agriculture, offering a foundation for adaptive irrigation management and intelligent stress diagnosis in smart agriculture.

2. Materials and Methods

The overall workflow of this study (Figure 1) illustrates the optical biosensing pipeline, consisting of (i) biosignal acquisition from basil leaves, (ii) optical transduction and digitization, (iii) multimodal fusion, and (iv) deep-learning-based physiological classification.

2.1. Sample Preparation

Sweet basil (Ocimum basilicum L.) was selected as the biological recognition material due to its well-characterized photosynthetic and volatile responses to water availability.

The growth and imaging system used in this study consisted of a custom-built chamber equipped with environmental control and optical biosensing units (Figure 2a). The chamber integrated LED-based illumination and multiple optical sensors for synchronized image acquisition of RGB, depth, and chlorophyll fluorescence signals. The control unit managed imaging schedules and environmental parameters, including temperature, humidity, and light intensity, which were monitored and adjusted via a computer interface.

The environmental conditions followed general basil cultivation conditions [19], with daytime temperatures of 28 °C to 32 °C, nighttime temperatures of 22 °C to 24 °C, relative humidity of 40% to 70%, light intensity of 150 µmol/m²s to 200 µmol/m²s, and a photoperiod of 14 h.

Basil plants were cultivated hydroponically in growth trays (Figure 2b). After the basil plants developed at least four pairs of true leaves, plants with uniform growth were selected and divided into four groups: one control group and three treatment groups subjected to different irrigation conditions to induce varying degrees of water-deficit stress. The plants were then transferred to individual imaging containers within the growth and imaging chamber (Figure 2c), where both stress induction and subsequent recovery treatments were conducted under controlled environmental conditions. This setup ensured consistent imaging geometry, uniform environmental exposure, and reliable physiological measurements across all experimental groups.

Following the method of Gräf et al. (2021) [20], we captured thermal images of the leaves to check the leaf temperature differences between plants under normal and water-deficient conditions. During the water-deficit stress response, we observed that the average leaf temperature was more than 1 °C higher than that of the control group. Upon re-watering, the leaf temperature in the recovery response dropped to within 1 °C of the control group. To induce water-deficit stress responses, the treatment groups were maintained under drained conditions for 1 day, 3 days, and 9 days, respectively. To induce recovery responses, they were then re-watered under the same conditions as the control group. As per the treatment conditions, the plants in each treatment group showed increased leaf temperatures during the drainage period compared to the control group, and their leaf temperatures returned to levels similar to the control group after re-watering, indicating recovery responses. Although thermal imaging was acquired concurrently, it was not included in the multimodal fusion cube for 3D-CNN training. Thermal data were solely used to physiologically validate the Resistance and Recovery labels, as leaf temperature is an indirect and environment-sensitive indicator compared to PSII-centered chlorophyll fluorescence signals.

2.2. Image Acquisition

For 9 days, basil plants underwent a drainage and re-watering treatment while RGB, depth, and chlorophyll fluorescence image data were collected. The data collection was carried out three times a day (morning, afternoon, and evening) using a growth and imaging chamber (PhytoChamber; PhytoWorks Inc., Gangneung-si, South Korea) (Figure 3). This equipment was developed by PhytoWorks Inc. and is provided as a modular, assembled system. The equipment’s structure ensures consistent environmental conditions and includes a control part (Figure 3a) located at the top, which commands the imaging process. The LED part (Figure 3b) and camera part (Figure 3c) are fixed above to consistently capture the top view of the plant growth part. The equipment, as shown in Figure 3d, is a chamber that isolates it from the external environment, allowing for independent setting of conditions such as temperature and humidity. It includes fans and Peltier elements for temperature and humidity control, as well as internal sensors for monitoring temperature, humidity, and CO₂ levels.

The equipment configuration involves a control part with a single-board computer (Raspberry Pi 4; Raspberry Pi Foundation, Cambridge, UK) (Figure 3a-1) that manages the chamber’s internal environment and another single-board computer (LattePanda Alpha; DF Robot, Shanghai, China) (Figure 3a-2) that commands the image sensors. The LED part contains white LEDs for plant growth and red and blue LEDs for chlorophyll fluorescence imaging. The camera part includes an RGB-D camera (Figure 3c-1) with an Intel^® RealSense™ LiDAR Camera L515 (Intel Corporation, Santa Clara, CA, USA), a thermal camera (Figure 3c-2) with a FLIR Lepton 3.5 thermal sensor (Teledyne FLIR LLC, Wilsonville, OR, USA), and a chlorophyll fluorescence image acquisition device (Figure 3c-3) consisting of a Basler ace acA1300-60gm-NIR GigE camera (Basler AG, Ahrensburg, Germany), a 6 mm C Series VIS-NIR fixed focal length lens (Edmund Optics Inc., Barrington, NJ, USA), and a longpass OD4 650 nm 12.5 mm filter (Edmund Optics Inc., Barrington, NJ, USA). All devices are fixed in place.

Thus, through this single unit of equipment (Figure 3), all types of plant image data used in the experiment were collected under consistent environmental conditions, including RGB, depth, thermal, and CF images. For the CF images, a modified protocol based on [21] was employed. Initially, images of Fo and Fm were captured in a dark-adapted state after 20 min of dark adaptation, followed by exposure to actinic light to induce the Kautsky effect and acquire Fp images. Subsequently, during the light adaptation phase, the number of saturating flashes, denoted by n, was set to four. This notation explicitly indicates that the saturating flashes were activated a total of four times to capture Ft_Ln and Fm_Ln images. After these flashes, Ft_Lss and Fm_Lss images were taken upon reaching the steady-state fluorescence level in light. With these 13 directly measured images, we obtained 31 types of physiologically significant chlorophyll fluorescence parameters in the form of images. A detailed description of these 31 chlorophyll fluorescence parameters is provided in Table 1.

2.3. Dataset Preparation

Multimodal optical data were prepared through two parallel pipelines depending on the target learning framework: (i) volumetric image-based inputs for 3D-CNN training (Figure 4) and (ii) tabular feature vectors for conventional machine learning models.

2.3.1. ROIs Extraction and Labelling

After image acquisition, multimodal alignment was performed to ensure spatial correspondence among RGB, depth, and chlorophyll fluorescence (CF) modalities using the ORB feature-matching algorithm [28]. Regions of interest (ROIs) corresponding to basil leaves were then extracted and resized to 32 × 32 pixels. Each ROI was labeled as ‘Normal’, ‘Resistance’, or ‘Recovery’ according to the corresponding water treatment condition.

For deep learning–based analysis, multimodal image features within each ROI were organized into a unified volumetric representation. In total, the 3D fusion cube comprised 130 optical parameter layers, constructed from 31 chlorophyll fluorescence (CF) parameters mapped to three color channels (31 × 3 = 93 layers), together with six RGB–Depth channels and 31 single-channel CF parameter maps. Specifically, these 130 feature layers—consisting of 6 RGB–Depth channels, 93 RGB-mapped CF parameter layers, and 31 additional calculated CF parameter maps—were assembled and stacked to form a 3D fusion cube of size 130 × 32 × 32.

The overall preprocessing and fusion pipeline is schematically illustrated in Figure 4, including multimodal spatial alignment, ROI extraction, modality-wise normalization, and channel-wise stacking to construct the 3D fusion cube. This volumetric representation preserves both spatial heterogeneity and temporal–spectral continuity of the optical biosensing signals, forming the input for subsequent 3D-CNN model learning.

2.3.2. Preparation of Input Features for Machine Learning Models

In contrast to deep learning networks, conventional machine learning algorithms require a tabular input structure in which each sample is represented by a fixed-length feature vector. To enable a direct comparison between machine learning–based and deep learning–based approaches, a separate feature preparation pipeline was employed.

For this purpose, image-derived chlorophyll fluorescence (CF) and color parameters were aggregated over each ROI to generate representative numerical descriptors. Specifically, mean intensity values of CF indices that have been reported to correlate strongly with drought stress responses [8,29] were extracted. Two feature configurations were considered: (1) a single-parameter case using only the Fv/Fm index, and (2) a multi-parameter case combining eight parameters (Fv/Fm, Y_Lss, Rfd_L3, NPQ_L2, and RGB intensity values).

Each feature vector thus represented the averaged physiological and optical responses of a single plant under a specific water treatment condition. The resulting tabular dataset was standardized using z-score normalization prior to model training to ensure comparable feature scaling across all machine learning models (Equation (1)).

x' i j = \frac{x i j - μ_{j}}{σ_{j}}

(1)

where

x' i j

is the value of the

j

-th feature for the i-th sample, and

μ_{j}

and

σ_{j}

denote the mean and standard deviation of the

j

-th feature across all samples, respectively. This normalization ensured that each feature contributed equally to the classification process regardless of its original magnitude or unit.

2.3.3. Fusion as a 3D Fusion Parameter

The composition of the 3D fusion cube used for 3D-CNN training is summarized here. A total of 92, 78, and 56 plant images were collected for the ‘Normal’, ‘Resistance’, and ‘Recovery’ classes, respectively, yielding 368, 312, and 224 ROIs for each class.

Each ROI cube consisted of 130 image layers, including 6 layers from RGB and depth images, 93 layers corresponding to RGB-mapped images derived from 31 CF parameters, and 31 additional calculated CF parameter maps. These 32 × 32 pixel images were stacked along the spectral (z) axis to construct a 130 × 32 × 32 fusion parameter.

By organizing multimodal optical features in this volumetric form, parameters become learnable along the z-axis, particularly for CF data, where continuously acquired and gradually varying physiological responses are arranged sequentially within the cube. The resulting labeled fusion cubes were divided into training and test sets using an 8:2 ratio for each class. A detailed summary of the dataset composition is provided in Table 2.

2.4. Construction of Machine Learning and Deep Learning Models

2.4.1. Machine Learning Models

To establish baseline predictive models for basil’s physiological response to water availability, several machine learning algorithms were implemented, including Logistic Regression, k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), and Light Gradient Boosting Machine (LightGBM). These models were designed to compare the effectiveness of conventional machine learning with deep learning-based approaches in phenotyping data analysis.

All models were developed in Python 3.8 using open-source libraries such as Scikit-learn and Keras. The dataset was divided into training and test sets at an 8:2 ratio. Before training, all features were standardized using z-score normalization to ensure consistent scaling across variables.

The Logistic Regression model estimates class probabilities by applying the sigmoid function to the linear combination of input features and weights (Equation (2)).

\hat{y} = \frac{1}{1 + e^{- (w^{T x} + b)}}

(2)

Classification is performed using a threshold of 0.5. L2 regularization was applied to prevent overfitting, the convergence tolerance was set to

1 \times 10^{- 4}

the optimization algorithm was set to ‘lbfgs’, and the maximum iteration count was 100. The model minimizes the cross-entropy loss (Equation (3)).

L = - \frac{1}{N} \sum_{i = 1}^{N} [y_{i} \log ({\hat{y}}_{i}) + (1 - y_{i}) \log (1 - {\hat{y}}_{i})]

(3)

The k-NN model classifies a test sample based on the majority class among its nearest neighbors in the feature space. The Euclidean distance metric was used to calculate similarity, and the number of neighbors k was set to 3. Uniform weights were applied, meaning that all neighboring samples contributed equally to the final decision.

The SVM model seeks the optimal hyperplane that maximizes the margin between classes. The model minimizes the following objective function (Equation (4))

\min_{w, b} \frac{1}{2} {| w |}^{2} + C \sum_{i = 1}^{N} ξ_{i}

(4)

subject to

y_{i} (w^{T} ϕ (x_{i}) + b) \geq 1 - ξ_{i}

. Here,

C

is the penalty parameter controlling the trade-off between margin width and classification error, and

ϕ (x_{i})

represents the kernel mapping. The RBF (Radial Basis Function) kernel was selected with the kernel coefficient set to ‘scale’ and the convergence tolerance to

1 \times 10^{- 3}

.

The LightGBM model is a gradient boosting framework that sequentially builds decision trees to minimize classification error. The model uses the Gradient Boosting Decision Tree (GBDT) algorithm, where each new tree corrects the residuals of the previous ones. The objective function at iteration t is expressed as Equation (5)

{Obj}^{(t)} = \sum_{i = 1}^{N} l (y_{i}, \hat{y_{i}^{(t - 1)}} + f_{t} (x_{i})) + Ω (f_{t})

(5)

where

l

is the loss function and

Ω (f_{t})

represents the regularization term. The maximum number of leaf nodes per tree was set to 31, the learning rate to 0.1, and the number of boosting iterations to 100.

2.4.2. Deep Learning Models: 2D-CNN and 3D-CNN

A model based on 3D-CNN was constructed to learn the features of cube-shaped 3D fusion parameters. Additionally, a 2D-CNN-based model structure was developed to learn using 2D image parameters for comparison with the method utilizing the 3D fusion parameters. The convolution used in each model is explained as shown in Figure 5, and its equations follow (Equations (6)–(8)).

v_{i j}^{x y} = f (r_{i j} + \sum_{m = 0}^{M_{i} - 1} \sum_{h = 0}^{H_{i} - 1} \sum_{w = 0}^{W_{i} - 1} k_{i j m}^{h w} v_{(i - 1) m}^{(x + h) (y + w)})

(6)

v_{i j}^{x y z} = f (r_{i j} + \sum_{m = 0}^{M_{i} - 1} \sum_{b = 0}^{B_{i} - 1} \sum_{h = 0}^{H_{i} - 1} \sum_{w = 0}^{W_{i} - 1} k_{i j m}^{h w b} v_{(i - 1) m}^{(x + h) (y + w) (z + b)})

(7)

f (x) = m a x (0, x)

(8)

In Equations (6)–(8), v represents the output variable in the feature map. B, H, and W denote the size of the kernel along the spectral and spatial dimensions, respectively. (b, h, w) are the indices of the kernel, and (z, x, y) are the indices of the feature map, corresponding to the two spatial dimensions and one spectral dimension, respectively. k represents the kernel parameters. i, j, and m are the indices of the input layer, output layer, and feature map, respectively. M indicates the number of feature maps; thus, Mi represents the number of feature maps in the ith layer. r is the bias term. In this study, the Rectified Linear Unit (ReLU) is chosen as the activation function (Equation (8)).

In Figure 5, B, H, and W represent the size of the kernel along the spectral and spatial dimensions, respectively. M is the number of feature maps.

The architectures of the two models are illustrated in Figure 6. The 2D-CNN model (Figure 6a) consists of sequential 2D convolutional, max pooling, dropout, flatten, and dense layers. Each convolutional layer applies kernels of size (3, 3) across the x and y dimensions to generate spatial activation maps. The max pooling layers reduce feature map resolution, while the dropout layers randomly deactivate 25% of neurons to prevent overfitting. The flatten and dense layers transform extracted features into a fully connected representation for classification into ‘Normal’, ‘Resistance’, and ‘Recovery’ classes, with Softmax as the activation function.

The 3D-CNN model (Figure 6b) comprises 3D convolutional, max pooling, flatten, and dense layers. The 3D convolutional filters (kernel size: 3 × 3 × 3) slide across the x, y, and z dimensions of the 3D fusion cube, capturing both spatial and spectral correlations. The max pooling layer (2 × 2 × 2) reduces activation map size, while the flatten and dense layers convert the extracted features into a one-dimensional representation for final classification using Softmax.

In total, the network consists of five convolutional blocks with increasing numbers of filters (64–128), followed by fully connected layers for three-class classification using a Softmax activation function. The model was trained using the Adam optimizer with the default learning rate of 0.001, a batch size of 2, and 30 epochs. Early stopping and explicit regularization techniques were not applied, as the model exhibited stable convergence and no observable overfitting under the given training configuration. A detailed summary of the 3D-CNN architecture hyperparameters is provided in Table 3.

Both models were trained using the backpropagation algorithm and the Adam optimizer, with categorical cross-entropy as the loss function. For the 2D-CNN model, training was performed with a batch size of 32 and 100 epochs. For the 3D-CNN model, a smaller batch size of 2 and 30 epochs was used due to the higher computational load of volumetric data.

Model weights were iteratively updated to minimize prediction error, allowing each network to learn discriminative representations of basil’s physiological responses. All models were implemented using TensorFlow and Keras libraries under Python 3.8.

2.5. Performance Evaluation

To evaluate the performance of the constructed models, the types of basil responses to water availability were classified on the test set using the 3D-CNN model trained with fusion parameters and the model trained with 2D image parameters. The values of accuracy, precision, recall, and F1-score were compared using the results obtained from TP (the number of true positives), TN (the number of true negatives), FP (the number of false positives), and FN (the number of false negatives). These performance metrics explain how accurately the models classify. (Equations (9)–(12)):

A c c u r a c y = \frac{T P + T N}{T P + T N + F P + F N}

(9)

P r e c i s i o n = \frac{T P}{T P + F P}

(10)

R e c a l l = \frac{T P}{T P + F N}

(11)

F 1 s c o r e = 2 \times \frac{R e c a l l \times P r e c i s i o n}{R e c a l l + P r e c i s i o n}

(12)

A confusion matrix method was used to create a 3 × 3 matrix illustrating the relationship between actual and predicted values, thus providing a detailed evaluation of the model’s performance in classifying each label.

To further assess the robustness and generalizability of the proposed 3D-CNN model, additional validation strategies were employed. First, a stratified K-fold cross-validation (K = 5) was conducted using the entire dataset to evaluate the stability of model performance across different data partitions. In each fold, the dataset was divided into training and validation subsets while preserving class distributions, and the model was trained and evaluated independently. Performance metrics were averaged across all folds to quantify variability and robustness.

Second, learning curve analysis was performed by monitoring training and validation losses and accuracies over successive training epochs. Learning curves were used to examine model convergence behavior and to identify potential overfitting or underfitting tendencies during training. These complementary validation strategies provide a more comprehensive assessment of model reliability beyond a single train–test split.

Additionally, the receiver operating characteristic (ROC) curve method was used to represent the relationship between the true positive rate (TPR) and false positive rate (FPR), assessing the classification model’s performance based on discrimination thresholds. This involves calculating the TPR and FPR using various thresholds, plotting the ROC curve, and then calculating the area under the curve (AUC) using the trapezoidal integration method. A value close to 1.0 is considered indicative of good classification performance. (Equations (13)–(15))

F P R = \frac{F P}{F P + T N} = α

(13)

T P R = \frac{T P}{T P + F N} = 1 - β

(14)

A U C = \sum_{i} \{(1 - β_{i} \cdot Δ α) + \frac{1}{2} [Δ (1 - β) \cdot Δ α]\}, where Δ (1 - β) = (1 - β_{i}) - (1 - β_{i - 1}) Δ α = α_{i} - α_{i - 1}

(15)

2.6. Feature Visualization Using t-SNE

To visualize the high-dimensional structure of the 3D-CNN input data and the feature representations learned by the network, t-distributed Stochastic Neighbor Embedding (t-SNE) was employed. t-SNE is a machine learning-based dimensionality reduction technique that converts pairwise distances between data points into probability distributions. It maps high-dimensional data to a lower-dimensional space by minimizing the divergence between the probability distribution of the original data (modeled by a Gaussian distribution) and the probability distribution of the low-dimensional embeddings (modeled by a t-distribution). The algorithm follows the formulation (Equations (16)–(18)).

P_{i j} = \frac{\exp (- ∥ x_{i} - x_{j} ∥^{2} / 2 σ_{i}^{2})}{Σ_{k \neq l} \exp (- ∥ x_{k} - x_{l} ∥^{2} / 2 σ_{i}^{2})}

(16)

q_{i j} = \frac{\exp (- ∥ y_{i} - y_{j} ∥^{2})}{Σ_{k \neq l} \exp (- ∥ y_{k} - y_{l} ∥^{2})}

(17)

C = KL (P | | Q) = \sum \sum p_{i j} l o g \frac{p_{i j}}{q_{i j}}

(18)

In Equation (16),

P_{i j}

represents the similarity between high-dimensional data points

x_{i}

and

x_{j}

, calculated using a Gaussian distribution. Here,

x_{i}

and

x_{j}

are data points in the high-dimensional space,

∥ x_{i} - x_{j} ∥

is the Euclidean distance between data points

x_{i}

and

x_{j}

, and

σ_{i}

is the standard deviation of data point

x_{i}

. In Equation (17),

q_{i j}

represents the similarity between low-dimensional data points

y_{i}

and

y_{j}

. Here,

y_{i}

and

y_{j}

are data points in the low-dimensional space, and

∥ y_{i} - y_{j} ∥

is the Euclidean distance between data points

y_{i}

and

y_{j}

. Equation (18) defines

C

, the cost function, which uses the Kullback-Leibler (KL) Divergence,

KL (P | | Q)

, to measure the difference between the high-dimensional data probability distribution

P

and the low-dimensional data probability distribution

Q

. This difference is minimized using gradient descent.

3. Results and Discussion

3.1. Visualization of Chlorophyll Fluorescence Parameter Fusion

To illustrate the fusion process of optical biosensing data, representative chlorophyll fluorescence parameters were visualized before integration into the 3D fusion cube (Figure 7). Among the 31 fluorescence parameters, Fv/Fm and NPQ were selected as representative indicators of photosystem II efficiency and non-photochemical quenching, respectively. Each parameter image consisted of red, green, and blue color channels, as well as corresponding numerical value maps representing pixel-level intensity distributions. Temporal sequences (L1–L4 and Lss) were captured to reflect dynamic physiological responses under varying water availability conditions. These time-resolved images revealed gradual transitions in chlorophyll fluorescence intensity, corresponding to stress induction and recovery phases.

Through the fusion process, 130 parameter layers—including RGB, Depth, and multiple CF-derived channels—were stacked to form a single 3D cube-shaped fusion parameter. This cube structure preserved both spatial and temporal–spectral continuity, allowing the 3D-CNN model to learn complex physiological patterns within the integrated feature space. As shown in Figure 7, the color mapping along the temporal (z-axis) direction visually demonstrates the progressive changes in Fv/Fm and NPQ over time, confirming that the fusion approach effectively captures the continuous dynamics of basil’s physiological state under water-deficit stress.

In the context of AI-based analysis, the fusion cube serves not merely as a stacked collection of parameter images but as a temporal–spectral manifold that retains both spatial and chronological coherence of the basil’s physiological signals. This multidimensional representation allows the 3D-CNN to capture subtle variations that occur across time and spectral domains, enabling the model to recognize gradual stress–recovery transitions that may not be apparent in individual frames. By learning hierarchical spatiotemporal patterns within the fusion volume, the model effectively internalizes dynamic physiological cues that describe the plant’s adaptive responses under water deficit conditions.

Compared to conventional machine learning approaches such as Logistic Regression, k-NN, SVM, or lightGBM, which rely on pre-extracted statistical features from tabular data, the proposed 3D-CNN directly learns discriminative representations from raw fused images. Unlike 2D-CNNs that operate on single time slices and lack temporal continuity, the 3D-CNN framework leverages inter-frame dependencies along the temporal axis to infer sequential physiological processes.

This structural advantage allows the model to simultaneously analyze chlorophyll fluorescence dynamics and spatial heterogeneity, providing a more holistic interpretation of basil’s stress physiology. These observations are consistent with prior chlorophyll fluorescence imaging studies showing that time-resolved CF indicators sensitively capture early drought responses and recovery kinetics in horticultural and model plants [8,9].

3.2. Prediction Performance of Machine Learning Classifiers

To evaluate the ability of traditional algorithms to classify the physiological responses of basil to varying water availability, four representative machine learning classifiers—Logistic Regression, k-NN, SVM, and LightGBM—were trained and tested using chlorophyll fluorescence parameters (Table 4). Two scenarios were considered: (i) using only Fv/Fm as a single representative indicator of PSII efficiency, and (ii) using Fv/Fm combined with seven additional parameters (Y_Lss, Rfd_L3, NPQ_L2, R, G, and B) that are known to be associated with drought stress responses.

Overall, the inclusion of multiple stress-related parameters resulted in higher classification accuracy compared to using Fv/Fm alone, indicating that multi-parametric inputs provide a more comprehensive representation of plant physiological states. When Fv/Fm alone was used, Logistic Regression achieved the highest accuracy (0.5193), demonstrating that even a simple linear classifier can capture distinguishable trends in photosynthetic efficiency under limited water availability. However, when multiple parameters were used, SVM exhibited the best performance (0.6077), reflecting its capability to handle non-linear and high-dimensional feature spaces. This improvement suggests that integrating parameters related to photochemical efficiency (Fv/Fm), non-photochemical quenching (NPQ), and spectral color features (R, G, B) enhances model sensitivity to complex drought-induced changes.

As the dimensionality of the input data increased, the underlying relationship between CF parameters and physiological states became more non-linear. Consequently, kernel-based models such as SVM outperformed linear approaches, effectively learning boundary distributions within the multi-dimensional feature space. These findings are consistent with previous research showing that simple models such as Logistic Regression tend to underperform in small or feature-interactive datasets, while SVMs can better generalize to non-linear relationships through kernel optimization [30,31]. In particular, studies applying k-NN and Logistic Regression to phenotyping and germination image analysis also reported limited discrimination power when underlying biological variability was high [31], aligning with the trends observed in the present work.

Nevertheless, the overall accuracy of all tested machine learning classifiers remained below 0.61, highlighting intrinsic limitations in their capacity to capture the dynamic and spatially heterogeneous responses of basil under water-deficit stress. Similar constraints have been noted in prior optical sensing studies, where handcrafted features were insufficient to represent the temporal dependencies of chlorophyll fluorescence signals [26]. Traditional models rely on manually derived statistical features and assume independence between temporal observations, thus lacking the ability to learn latent spatiotemporal dependencies inherent in biosensing data. These constraints emphasize the necessity of advanced deep learning approaches—such as the 3D-CNN fusion framework introduced in the following Section 3.3—to extract hierarchical representations and interpret complex physiological responses beyond the scope of conventional classifiers.

3.3. Ablation Study on the Contribution of Individual and Combined Modalities

To quantitatively evaluate the contribution of each sensing modality and to clarify their complementary roles, an ablation study was conducted by varying the combinations of input modalities. For this purpose, 3D-CNN models were independently trained and evaluated using different modality configurations, including five cases: RGB-only, CF-only, RGB + Depth, RGB + CF, and full multimodal fusion (RGB + Depth + CF). To ensure a fair comparison, all models shared the same network architecture, training protocol, and train–validation data split, with only the input channel configurations differing across experiments.

Table 5 summarizes the classification accuracies obtained for each modality configuration. Among the single-modality models, the CF-only model achieved the highest classification accuracy (84.53%), indicating that chlorophyll fluorescence (CF) provides the most direct physiological information related to water-deficit stress. In contrast, the RGB-only model exhibited relatively limited performance, with an accuracy of 64.08%, which can be attributed to the fact that RGB information primarily relies on color and appearance cues and is therefore less sensitive to early physiological changes.

When dual-modality inputs were employed, classification performance generally improved compared to single-modality models. In particular, the RGB + CF configuration achieved an accuracy of 95.67%, significantly outperforming both the RGB-only and CF-only models. This result suggests that color and structural context from RGB images provides complementary information to fluorescence-derived physiological signals.

The full multimodal fusion model (RGB + Depth + CF) achieved the highest classification accuracy of 96.90%, outperforming all partial modality combinations and further improving upon the RGB + CF configuration. This improvement indicates that depth information contributes to capturing subtle structural recovery patterns that are not fully represented by fluorescence dynamics alone. Specifically, RGB encodes surface appearance, Depth captures structural geometry, and CF provides time-resolved physiological activity, with each modality contributing distinct yet complementary information. Consequently, the synergistic integration of these modalities enables the 3D-CNN model to construct a more holistic representation of basil’s stress–recovery dynamics, which cannot be achieved by any single modality or partial fusion alone.

3.4. Comparison of Prediction Performance Between 2D-CNN and 3D-CNN Models

To further evaluate the effectiveness of the proposed fusion approach, the classification performance of a 3D-CNN model using the fusion parameter was compared with that of a 2D-CNN model trained on 2D RGB image parameters (Table 6). The 2D-CNN achieved an accuracy of 0.7679, a precision of 0.8585, a recall of 0.6644, and an F1 score of 0.6631. The relatively low recall and F1 scores (~0.66) indicate that the 2D-CNN was limited in detecting the Positive class, suggesting that RGB images alone lack sufficient spectral–temporal information to capture subtle physiological transitions. In contrast, the 3D-CNN model using the fusion parameter exhibited remarkable improvements, achieving 0.9690 accuracy, 0.9733 precision, 0.9661 recall, and 0.9694 F1 score. The high recall and F1 values indicate that the model performs robustly in identifying all physiological response types with minimal false negatives.

Both deep learning models outperformed all machine learning classifiers presented in Table 4, confirming the superiority of deep feature extraction over manually engineered features in capturing basil’s complex physiological responses. Furthermore, the 3D-CNN, which learns from RGB, depth, and time-resolved CF parameters simultaneously, demonstrated a significant advantage over the 2D-CNN trained on single-frame RGB data. These results highlight that the inclusion of temporal–spectral information allows the model to infer subtle stress-induced patterns that static imaging cannot reveal.

This performance trend is consistent with prior multimodal imaging studies in plant phenotyping and stress detection, where 3D-CNN architectures outperformed 2D approaches by learning volumetric and spatiotemporal representations from hyperspectral and fluorescence imaging data. For example, Jung et al. [16] demonstrated that 3D-CNNs trained on hyperspectral image cubes achieved superior disease classification performance compared to 2D-CNNs by jointly modeling spatial and spectral correlations within plant tissues. Similarly, Dong et al. [17] reported that chlorophyll fluorescence-based stress diagnosis benefits from modeling temporal fluorescence dynamics, although their approach relied on feature extraction and sequential learning rather than direct volumetric convolution. More recently, Zhang et al. [11] showed that fusing hyperspectral and chlorophyll fluorescence information improves stress classification accuracy; however, their framework treated each modality as a separate feature stream rather than as a spatially aligned 3D volume.

In this context, the present study extends existing work by integrating RGB, depth, and chlorophyll fluorescence signals into a unified fusion cube and directly learning discriminative physiological representations through 3D convolution. Such volumetric multimodal learning enables more effective characterization of continuous stress and recovery dynamics, particularly for visually similar states such as ‘Normal’ and ‘Recovery’, which are difficult to distinguish using single-modality or 2D-based approaches.

Figure 8 shows the confusion matrices of the two deep learning models. The 2D-CNN achieved a 98% true positive rate for the ‘Normal’ class and 81% for the ‘Resistance’ class but exhibited substantial misclassification of the ‘Recovery’ samples as ‘Normal’, indicating poor discrimination of post-stress recovery responses. Conversely, the 3D-CNN achieved true positive rates above 91% for all three classes (Normal, Resistance, Recovery), demonstrating its ability to effectively differentiate among various physiological states. This suggests that basil leaves in the ‘Normal’ and ‘Recovery’ states share visually similar RGB textures, and that the integration of CF and depth information in the 3D-CNN enables a more accurate characterization of recovery dynamics.

3.5. Reliability and Practical Applicability of the 3D-CNN Model

Beyond classification accuracy, the practical deployment of deep learning–based phenotyping models requires reliable decision boundaries and sufficient computational efficiency. Therefore, this section evaluates the reliability of the proposed 3D-CNN model using ROC analysis and examines its inference time and model complexity to assess real-world applicability.

Before assessing classification reliability, the training stability and convergence behavior of the proposed 3D-CNN model were examined using learning curve analysis. As shown in Figure 9a, the training and validation curves exhibited consistent convergence with a minimal performance gap, indicating that the model learned generalized feature representations without severe overfitting. The smooth convergence trend further suggests that the selected training configuration was appropriate for multimodal feature learning.

Figure 9b presents the receiver operating characteristic (ROC) curves of the 3D-CNN fusion model, which depict the relationship between the true positive rate (TPR) and the false positive rate (FPR) for each response class. The ROC analysis was conducted to evaluate the model’s ability to distinguish among the three physiological states—Normal, Resistance, and Recovery—in basil leaves subjected to varying water conditions. The area under the curve (AUC) values were calculated as 0.90 for the Normal class, 0.93 for the Resistance class, and 0.92 for the Recovery class. These consistently high AUC values exceeding 0.90 demonstrate that the 3D fusion parameter model achieves strong separability in multi-class classification by maintaining a high TPR while minimizing the FPR across all categories.

The particularly high AUC observed for the Resistance class (0.93) indicates that the 3D-CNN model is exceptionally sensitive in detecting the onset of stress responses, even when physiological changes are subtle. This reflects the model’s ability to capture early alterations in photochemical efficiency and energy dissipation processes that occur during drought stress adaptation. Similar patterns have been observed in fluorescence-based physiological analyses, where Fv/Fm and NPQ were shown to sensitively reflect PSII regulation and photoprotective adjustments under water-deficit conditions [21,25].

Ultimately, the ROC analysis demonstrates that the proposed 3D-CNN fusion framework establishes stable and reliable decision boundaries across physiological response classes, providing a robust foundation for real-world deployment beyond improvements in classification accuracy. This emphasis on robust validation is consistent with recent multimodal machine learning studies, such as Guan et al. (2025) [32].

Furthermore, K-fold cross-validation results (Table 7) demonstrated stable classification performance across different data splits, with low variability in accuracy and F1-score among folds. This consistency confirms that the proposed 3D-CNN fusion framework is not sensitive to a specific train–test partition and is capable of generalizing across heterogeneous samples.

To evaluate the practical feasibility of the proposed 3D-CNN framework, the inference time and model complexity were quantitatively analyzed. Inference experiments were conducted on a workstation equipped with an NVIDIA RTX-3090 GPU, an Intel Core i9 CPU, and 64 GB of system memory. The inference time was measured as the average forward-pass latency per region of interest (ROI), excluding data loading and preprocessing, in order to reflect the pure computational cost of the model.

The proposed 3D-CNN model contains approximately 1.73 million trainable parameters, corresponding to a model size of approximately 6.6 MB. Despite its volumetric convolutional architecture, the average inference time of the 3D-CNN model was approximately 5 ms per ROI. From an operational perspective, stress diagnosis in greenhouse or vertical farming environments does not require frame-level video processing; instead, decision-support systems typically operate at minute- or hour-level intervals.

Considering these operational requirements, the observed inference time falls well within an acceptable range for practical deployment. Overall, the proposed 3D-CNN fusion model effectively achieves a balance between classification performance and computational efficiency, supporting its practical applicability in precision irrigation and smart farming applications.

3.6. Feature Distribution Analysis Using t-SNE

To interpret how the 3D-CNN model learns discriminative representations of basil physiological states, feature distributions at both the input level and the learned representation level were visualized using t-distributed stochastic neighbor embedding (t-SNE) (Figure 10). This analysis enables a qualitative assessment of how Normal, Resistance, and Recovery responses are organized within the feature space under different water availability conditions.

At the input-level embeddings (Figure 10a), the three classes exhibited partially separated yet overlapping distributions. Normal and Resistance samples were positioned relatively far apart, while Recovery samples formed an intermediate continuum between them. This overlap indicates that, despite the inclusion of multi-channel information, raw optical data alone are insufficient to clearly disentangle subtle physiological changes occurring during the early stages of water stress. Such ambiguity is consistent with previous chlorophyll fluorescence (CF)-based studies reporting that early drought responses are characterized by continuous dynamics—such as gradual induction of non-photochemical quenching (NPQ) and moderate declines in photosystem II (PSII) efficiency—rather than abrupt shifts in fluorescence indices [12,13].

In contrast, the t-SNE visualization of feature embeddings extracted from the final fully connected layer of the 3D-CNN (Figure 10b) revealed compact and well-separated clusters for the Normal, Resistance, and Recovery groups. This pronounced separation demonstrates that the 3D-CNN effectively learned nonlinear correlations between optical signals and physiological responses through hierarchical convolutional operations. The learned latent space preserves both spectral variability and temporal continuity, indicating that samples are organized according to underlying physiological processes rather than superficial visual similarities.

These results indicate that the 3D-CNN does not merely memorize image-level differences but instead learns a meaningful latent manifold that represents plant stress–response trajectories as smooth transitions across conditions. In particular, the intermediate positioning of the Recovery state between Normal and Resistance reflects gradual functional restoration following water-deficit-induced physiological impairment, consistent with known adaptation processes in which NPQ relaxation follows photoprotective activation and PSII quantum efficiency progressively recovers after re-watering [33].

The clear clustering in the learned feature space further suggests that the 3D-CNN performs implicit feature selection by emphasizing physiologically informative dimensions while suppressing less relevant noise. In this process, RGB, depth, and CF modalities provide complementary information: RGB features capture stress-related changes in leaf color distribution and chromatic uniformity [34], depth features encode structural responses such as leaf drooping and morphological restoration during recovery [35], and CF parameters provide direct insight into photosynthetic function.

Among these modalities, CF parameters play a central role by representing time-resolved functional states of PSII along the spectral–temporal axis of the 3D fusion cube. These include maximum and effective quantum efficiency indices (e.g., Fv/Fm and Y(II)) and dynamic NPQ metrics. As the 3D convolutional kernels slide across spatial (x–y) and temporal–spectral (z) dimensions, the network simultaneously learns the temporal evolution and spatial distribution of these signals [15]. Features associated with persistently elevated NPQ and suppressed PSII efficiency dominate the Resistance cluster, whereas gradual NPQ relaxation and PSII functional recovery characterize the Recovery cluster, explaining its distinct yet intermediate position in the learned t-SNE space.

Overall, the t-SNE analysis demonstrates that the proposed 3D-CNN does not rely on static intensity-based differences for classification. Instead, it integratively encodes appearance cues from RGB data, structural information from depth measurements, and photosynthetic physiological dynamics from CF signals into a biologically meaningful latent space. By aligning multimodal optical information with established mechanisms of photosynthetic regulation and stress adaptation, the proposed framework substantially enhances the biological interpretability of AI-driven plant phenotyping.

3.7. Limitations and Future Perspectives

Despite the strong performance of the proposed multimodal 3D-CNN framework, several limitations should be considered. First, the experiments were conducted under controlled environmental conditions using a custom imaging chamber, which ensured stable illumination and imaging geometry. While this setting was essential for isolating physiological responses, it may limit direct transferability to commercial greenhouses or open-field environments where background complexity and environmental variability are greater.

Second, the present study focused on a single crop species, basil, under water-deficit stress. Although basil represents a suitable model crop for controlled-environment phenotyping, physiological responses and optical signatures can differ across species and stress types. Future work should therefore evaluate the generalizability of the proposed fusion strategy across multiple crops and environmental conditions to establish broader applicability.

Finally, multimodal learning using 3D-CNNs inherently involves higher computational complexity than 2D-based approaches. While the current model achieved acceptable inference speed for practical decision-support intervals, further optimization will be required for large-scale or real-time deployment. In this regard, future studies may explore modular sensing [36] configurations and hybrid or attention-based network architectures [37] to balance computational efficiency with physiological interpretability, thereby facilitating scalable deployment in smart farming systems.

4. Conclusions

This study demonstrated the effectiveness of a 3D-CNN-based multimodal data fusion approach for phenotyping basil (Ocimum basilicum L.) under varying water availability. By integrating RGB, depth, and time-resolved CF data, the proposed model captured both spatial and temporal–spectral features that collectively describe the plant’s physiological state transitions. This fusion framework enabled comprehensive monitoring of water-stress responses, providing insights into the mechanisms of resistance and recovery.

Compared to traditional machine learning classifiers and a 2D-CNN model trained on single-frame RGB images, the 3D-CNN achieved significantly higher classification accuracy and learned more distinct and biologically meaningful feature representations. The model effectively distinguished Normal, Resistance, and Recovery states, accurately reflecting basil’s adaptive dynamics under water-deficit and rehydration conditions. Feature-space visualization using t-SNE confirmed that the learned spatial–spectral embeddings corresponded to physiologically interpretable clusters rather than superficial visual differences, validating that the 3D-CNN captured latent manifold structures underlying real plant responses.

The proposed approach contributes to precision agriculture by providing a non-destructive method for continuous stress monitoring. By linking optical biosensing with deep learning, the 3D fusion framework enables the early detection of subtle stress cues and dynamic visualization of recovery processes. This method can be extended to other crops and abiotic stress scenarios, offering a scalable foundation for intelligent irrigation control and phenotyping automation.

Future work will focus on expanding the dataset to include broader environmental variability, validating model generalizability in real greenhouse and field conditions, and coupling the system with automated irrigation or climate-control mechanisms for closed-loop water management. Ultimately, this study establishes a basis for developing intelligent, data-driven phenotyping systems that connect multimodal optical sensing with temporal deep learning architectures to enhance water-use efficiency and crop resilience under diverse agricultural conditions.

Author Contributions

Conceptualization, Y.-J.J., H.S.K. and D.-H.J.; methodology, Y.-J.J. and T.S.L.; software, T.S.L. and S.H.P.; validation, Y.-J.J., H.S.K. and H.Y.; formal analysis, Y.-J.J.; investigation, S.H.P. and T.S.L.; resources, D.-H.J.; data curation, H.S.K. and S.H.P.; writing—original draft preparation, Y.-J.J.; writing—review and editing, H.Y. and D.-H.J.; visualization, Y.-J.J. and H.Y.; supervision, D.-H.J.; project administration, D.-H.J.; funding acquisition, D.-H.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available upon reasonable request from the corresponding author. Data sharing requires approval from the research team due to internal policy and collaborative agreements.

Acknowledgments

This research was supported by the Regional Innovation System & Education(RISE) program through the Gyeonggi RISE Center, funded by the Ministry of Education(MOE) and the Gyeonggi-do, Republic of Korea (2025-RISE-09-A07).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CNN	convolutional neural networks
RGB	red–green–blue
CF	chlorophyll fluorescence
QA	primary quinone acceptor
qP	photochemical quenching coefficient
PS II	Photosystem II

References

Lim, Y.; Seo, M.-G.; Lee, J.; Hong, S.; An, J.-T.; Jeong, H.-Y.; Choi, H.-I.; Hong, W.-J.; Lee, C.; Park, S.J.; et al. Optimizing Plant Size for Vertical Farming by Editing Stem Length Regulators. Plant Biotechnol. J. 2025, 23, 3041–3053. [Google Scholar] [CrossRef]
Pennisi, G.; Pistillo, A.; Orsini, F.; Cellini, A.; Spinelli, F.; Nicola, S.; Fernandez, J.A.; Crepaldi, A.; Gianquinto, G.; Marcelis, L.F.M. Optimal Light Intensity for Sustainable Water and Energy Use in Indoor Cultivation of Lettuce and Basil under Red and Blue LEDs. Sci. Hortic. 2020, 272, 109508. [Google Scholar] [CrossRef]
Ekren, S.; Sönmez, Ç.; Özçakal, E.; Kurttaş, Y.S.K.; Bayram, E.; Gürgülü, H. The Effect of Different Irrigation Water Levels on Yield and Quality Characteristics of Purple Basil (Ocimum basilicum L.). Agric. Water Manag. 2012, 109, 155–161. [Google Scholar] [CrossRef]
Radácsi, P.; Inotai, K.; Sárosi, S.; Czövek, P.; Bernáth, J.; Németh, É. Effect of water supply on the physiological characteristic and production of basil (Ocimum basilicum L.). Eur. J. Hortic. Sci. 2010, 75, 193. [Google Scholar] [CrossRef]
Kalamartzis, I.; Menexes, G.; Georgiou, P.; Dordas, C. Effect of Water Stress on the Physiological Characteristics of Five Basil (Ocimum basilicum L.). Cultivars. Agron. 2020, 10, 1029. [Google Scholar] [CrossRef]
Seo, M.-G.; Lim, Y.; Hendelman, A.; Robitaille, G.; Beak, H.K.; Hong, W.-J.; Park, S.J.; Lippman, Z.B.; Park, Y.-J.; Kwon, C.-T. Evolutionary Conservation of Receptor Compensation for Stem Cell Homeostasis in Solanaceae Plants. Hortic. Res. 2024, 11, uhae126. [Google Scholar] [CrossRef]
Singh, A.K.; Mittal, S.; Das, M.; Saharia, A.; Tiwari, M. Optical Biosensors: A Decade in Review. Alex. Eng. J. 2023, 67, 673–691. [Google Scholar] [CrossRef]
Yao, J.; Sun, D.; Cen, H.; Xu, H.; Weng, H.; Yuan, F.; He, Y. Phenotyping of Arabidopsis Drought Stress Response Using Kinetic Chlorophyll Fluorescence and Multicolor Fluorescence Imaging. Front. Plant Sci. 2018, 9, 603. [Google Scholar] [CrossRef] [PubMed]
Gorbe, E.; Calatayud, A. Applications of Chlorophyll Fluorescence Imaging Technique in Horticultural Research: A Review. Sci. Hortic. 2012, 138, 24–35. [Google Scholar] [CrossRef]
Luo, Z.; Yang, W.; Yuan, Y.; Gou, R.; Li, X. Semantic Segmentation of Agricultural Images: A Survey. Inf. Process. Agric. 2024, 11, 172–186. [Google Scholar] [CrossRef]
Zhang, C.; Zhou, L.; Xiao, Q.; Bai, X.; Wu, B.; Wu, N.; Zhao, Y.; Wang, J.; Feng, L. End-to-End Fusion of Hyperspectral and Chlorophyll Fluorescence Imaging to Identify Rice Stresses. Plant Phenomics 2022, 2022, 9851096. [Google Scholar] [CrossRef]
Jia, N.; Zheng, C. Multimodal Rapid Identification of Growth Stages and Discrimination of Growth Status for Morchella. Smart Agric. Technol. 2024, 9, 100507. [Google Scholar] [CrossRef]
Ametefe, D.S.; Sarnin, S.S.; Ali, D.M.; Caliskan, A.; Caliskan, I.T.; Aliu, A.A.; John, D. Enhancing Leaf Disease Detection Accuracy through Synergistic Integration of Deep Transfer Learning and Multimodal Techniques. Inf. Process. Agric. 2025, 12, 279–299. [Google Scholar] [CrossRef]
Noshiri, N.; Beck, M.A.; Bidinosti, C.P.; Henry, C.J. A Comprehensive Review of 3D Convolutional Neural Network-Based Classification Techniques of Diseased and Defective Crops Using Non-UAV-Based Hyperspectral Images. Smart Agric. Technol. 2023, 5, 100316. [Google Scholar] [CrossRef]
Jeon, Y.-J.; Hong, S.; Lee, T.S.; Park, S.H.; Song, G.; Seo, M.-G.; Lee, J.; Lim, Y.; An, J.-T.; Lee, S.; et al. Volumetric Deep Learning-Based Precision Phenotyping of Gene-Edited Tomato for Vertical Farming. Plant Phenomics 2025, 7, 100095. [Google Scholar] [CrossRef]
Jung, D.-H.; Kim, J.D.; Kim, H.-Y.; Lee, T.S.; Kim, H.S.; Park, S.H. A Hyperspectral Data 3D Convolutional Neural Network Classification Model for Diagnosis of Gray Mold Disease in Strawberry Leaves. Front. Plant Sci. 2022, 13, 837020. [Google Scholar] [CrossRef]
Dong, Z.; Zhao, J.; Ji, W.; Wei, W.; Men, Y. Classification of Tomato Seedling Chilling Injury Based on Chlorophyll Fluorescence Imaging and DBO-BiLSTM. Front. Plant Sci. 2024, 15, 1409200. [Google Scholar] [CrossRef] [PubMed]
Wang, N.; Wu, Q.; Gui, Y.; Hu, Q.; Li, W. Cross-Modal Segmentation Network for Winter Wheat Mapping in Complex Terrain Using Remote-Sensing Multi-Temporal Images and DEM Data. Remote. Sens. 2024, 16, 1775. [Google Scholar] [CrossRef]
Khater, E.-S.; Bahnasawy, A.; Abass, W.; Morsy, O.; El-Ghobashy, H.; Shaban, Y.; Egela, M. Production of Basil (Ocimum basilicum L.) under different soilless cultures. Sci. Rep. 2021, 11, 12754. [Google Scholar] [CrossRef]
Gräf, M.; Immitzer, M.; Hietz, P.; Stangl, R. Water-Stressed Plants Do Not Cool: Leaf Surface Temperature of Living Wall Plants under Drought Stress. Sustainability 2021, 13, 3910. [Google Scholar] [CrossRef]
Baker, N.R.; Rosenqvist, E. Applications of Chlorophyll Fluorescence Can Improve Crop Production Strategies: An Examination of Future Possibilities. J. Exp. Bot. 2004, 55, 1607–1621. [Google Scholar] [CrossRef] [PubMed]
Oxborough, K.; Baker, N.R. Resolving Chlorophyll a Fluorescence Images of Photosynthetic Efficiency into Photochemical and Non-Photochemical Components—Calculation of qP and Fv-/Fm-; without Measuring Fo-; Photosynth. Res. 1997, 54, 135–142. [Google Scholar] [CrossRef]
Genty, B.; Briantais, J.-M.; Baker, N.R. The Relationship between the Quantum Yield of Photosynthetic Electron Transport and Quenching of Chlorophyll Fluorescence. Biochim. Biophys. Acta BBA Gen. Subj. 1989, 990, 87–92. [Google Scholar] [CrossRef]
Oxborough, K. Imaging of Chlorophyll a Fluorescence: Theoretical and Practical Aspects of an Emerging Technique for the Monitoring of Photosynthetic Performance. J. Exp. Bot. 2004, 55, 1195–1205. [Google Scholar] [CrossRef]
Horton, P.; Ruban, A.V. Regulation of Photosystem II. Photosynth. Res. 1992, 34, 375–385. [Google Scholar] [CrossRef]
Lichtenthaler, H.K.; Miehé, J.A. Fluorescence Imaging as a Diagnostic Tool for Plant Stress. Trends Plant Sci. 1997, 2, 316–320. [Google Scholar] [CrossRef]
Horton, P.; Ruban, A.V.; Walters, R.G. Regulation of Light Harvesting in Green Plants. Annu. Rev. Plant Physiol. Plant Mol. Biol. 1996, 47, 655–684. [Google Scholar] [CrossRef]
Rublee, E.; Rabaud, V.; Konolige, K.; Bradski, G. ORB: An Efficient Alternative to SIFT or SURF. In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; IEEE: Piscataway, NJ, USA, 2011; pp. 2564–2571. [Google Scholar]
Lazarević, B.; Šatović, Z.; Nimac, A.; Vidak, M.; Gunjača, J.; Politeo, O.; Carović-Stanko, K. Application of Phenotyping Methods in Detection of Drought and Salinity Stress in Basil (Ocimum basilicum L.). Front. Plant Sci. 2021, 12, 629441. [Google Scholar] [CrossRef] [PubMed]
Bailly, A.; Blanc, C.; Francis, É.; Guillotin, T.; Jamal, F.; Wakim, B.; Roy, P. Effects of Dataset Size and Interactions on the Prediction Performance of Logistic Regression and Deep Learning Models. Comput. Methods Programs Biomed. 2022, 213, 106504. [Google Scholar] [CrossRef] [PubMed]
Awty-Carroll, D.; Clifton-Brown, J.; Robson, P. Using K-NN to Analyse Images of Diverse Germination Phenotypes and Detect Single Seed Germination in Miscanthus Sinensis. Plant Methods 2018, 14, 5. [Google Scholar] [CrossRef]
Guan, T.; Gong, J.; Lin, J.; Palanisamy, C.P.; Pei, J.; Abd El-Aty, A.M. Machine Learning-Driven Multimodal Optimization of Selenium Biotransformation and Flavor Profiling in Fermented Apple–Yacon Functional Beverages. Innov. Food Sci. Emerg. Technol. 2025, 105, 104198. [Google Scholar] [CrossRef]
Chen, D.; Wang, S.; Cao, B.; Cao, D.; Leng, G.; Li, H.; Yin, L.; Shan, L.; Deng, X. Genotypic Variation in Growth and Physiological Response to Drought Stress and Re-Watering Reveals the Critical Role of Recovery in Drought Adaptation in Maize Seedlings. Front. Plant Sci. 2016, 6, 1241. [Google Scholar] [CrossRef] [PubMed]
Yang, H.; Fei, L.; Wu, G.; Deng, L.; Han, Z.; Shi, H.; Li, S. A Novel Deep Learning Framework for Identifying Soybean Salt Stress Levels Using RGB Leaf Images. Ind. Crops Prod. 2025, 228, 120874. [Google Scholar] [CrossRef]
Dayoub, E.; Lamichhane, J.R.; Schoving, C.; Debaeke, P.; Maury, P. Early-Stage Phenotyping of Root Traits Provides Insights into the Drought Tolerance Level of Soybean Cultivars. Agronomy 2021, 11, 188. [Google Scholar] [CrossRef]
Li, T.; Xiao, L.; Ling, H.; Yang, Y.; Zhong, S. Preparation of Artificial Substrate Binding Sites of Nanozyme with “Modular Structure” Strategy Used for the Construction of Visual Sensing Analysis Platform for Levodopa. Microchem. J. 2025, 212, 113292. [Google Scholar] [CrossRef]
Chen, X.; Chen, C.; Tian, X.; He, L.; Zuo, E.; Liu, P.; Xue, Y.; Yang, J.; Chen, C.; Lv, X. DBAN: An Improved Dual Branch Attention Network Combined with Serum Raman Spectroscopy for Diagnosis of Diabetic Kidney Disease. Talanta 2024, 266, 125052. [Google Scholar] [CrossRef] [PubMed]

Figure 1. Overview of the optical biosensing and deep-learning workflow for monitoring basil’s physiological responses under water deficit stress. The pipeline integrates biosignal acquisition (RGB–depth–chlorophyll fluorescence imaging), optical transduction and synchronization, 3D fusion cube generation, and physiological state classification using a 3D-CNN model.

Figure 2. Growth and imaging system for basil cultivation and data acquisition. (a) Schematic of the growth and imaging chamber integrating environmental control, LED illumination, and optical sensors managed via a computer interface. (b) Hydroponically grown basil plants before treatment. (c) Experimental setup inside the imaging chamber for controlled stress and recovery monitoring.

Figure 3. Optical biosensing imaging chamber and components. (a) Control unit for synchronized imaging and environmental regulation, including (a-1) Raspberry Pi 4 for environmental control and (a-2) LattePanda Alpha for sensor and LED management. (b) Plant growth platform with top-view imaging stage. (c) Imaging sensors: (c-1) RGB–Depth camera (c-2) thermal camera, and (c-3) chlorophyll-fluorescence camera. (d) Insulated imaging chamber with Peltier-based temperature/humidity control and internal environment sensors for stable RGB, Depth, CF, and thermal image acquisition.

Figure 4. Schematic illustration of the multimodal dataset preparation and 3D fusion pipeline.

Figure 5. Schematic illustration of convolution operations: (a) 2D convolution applied across spatial dimensions (x, y), and (b) 3D convolution extending along the spectral dimension (z).

Figure 6. Architectures of the 2D-CNN and 3D-CNN models for optical biosensing-based classification. (a) The 2D-CNN extracts spatial features from 2D inputs (32 × 32 × 3) using convolution, pooling, and dense layers. (b) The 3D-CNN processes 3D fusion cubes (130 × 32 × 32 × 1) to learn spatial–spectral features across modalities. Both models use Softmax for classification into Normal, Resistance, and Recovery classes.

Figure 7. Visualization of chlorophyll fluorescence parameters within a 3D fusion cube. Color mapping along the z-axis illustrates the gradual temporal–spectral variations in parameters such as Fv/Fm and NPQ, representing the physiological signal dynamics captured by the biosensing system.

Figure 8. Confusion matrices for classification using (a) the 2D image parameter model and (b) the 3D fusion parameter model.

Figure 9. Reliability analysis of the proposed 3D-CNN fusion model: (a) learning curves illustrating training and validation performance over epochs, and (b) receiver operating characteristic (ROC) curves for classification of basil physiological responses.

Figure 10. t-SNE visualization of input and learned feature distributions in the 3D-CNN model. (a) Input-level embeddings showing partial overlap among classes, with Recovery features positioned between Normal and Resistance. (b) Feature embeddings from the final dense layer showing clear separation, indicating that the 3D-CNN effectively learned discriminative spatial–spectral patterns associated with basil’s water-stress responses.

Table 1. List of chlorophyll fluorescence parameters derived from dark- and light-adapted imaging. Each parameter corresponds to a specific physiological component of photosystem II activity, photochemical quenching (qP), or non-photochemical quenching (NPQ).

CF Parameter	Name	Description	Formula	Reference
Fo	minimum fluorescence in dark-adapted state	QA oxidized (qP = 1), non-photochemical quenching relaxed (NPQ = 0)	Measured	[21]
Fm	maximum fluorescence in dark-adapted state	QA reduced (qP = 0), non-photochemical quenching relaxed (NPQ = 0)	Measured
Fp	peak fluorescence during the initial phase of the Kautsky effect	local F-maximum resulting from rapid reduction of plastoquinone pool and slower activation of re-oxidation mechanisms and of non-photochemical quenching	Measured
Fm_Ln	maximum fluorescence during light adaptation	QA reduced (qP = 0), non-photochemical quenching induced (NPQ > 0)	Measured
Fm_Lss	steady-state maximum fluorescence in light	QA reduced (qP = 0), non-photochemical quenching at maximum (NPQ max)	Measured
Ft_Ln	instantaneous fluorescence during light adaptation	instantaneous F-level during light adaptation that results from a dynamic equilibrium of plastoquinone reducing and re-oxidizing processes and from non-photochemical quenching	Measured
Ft_Lss	steady-state fluorescence in light	steady-state F-level that results from a dynamic equilibrium of plastoquinone reducing and re-oxidizing processes and from non-photochemical quenching	Measured
Fo_Ln	minimum fluorescence during light adaptation	calculated estimate: QA oxidized (qP = 1), non-photochemical quenching induced (NPQ > 0)	Fo/((Fm − Fo)/Fm + Fo/Fm_Ln)
Fo_Lss	steady-state minimum fluorescence in light	QA oxidized (qP = 1), non-photochemical quenching at maximum (NPQ max)	Fo/((Fm − Fo)/Fm + Fo/Fm_Lss)	[22]
Fv/Fm	maximum PSII quantum yield	maximum PSII quantum yield in dark-adapted state	(Fm − Fo)/Fm	[22]
Fv/Fm_Ln	PSII quantum yield of light adapted sample	PSII quantum yield in light-adapted state	(Fm_Ln − Fo_Lss)/Fm_Ln	[23]
Fv/Fm_Lss	PSII quantum yield of light adapted sample at steady state	PSII quantum yield in light-adapted steady-state	(Fm_Lss − Fo_Lss)/Fm_Lss	[24]
NPQ_Ln	instantaneous non-photochemical quenching during light adaptation	non-photochemical quenching induced in light	(Fm − Fm_Ln)/Fm_Ln	[24]
NPQ_Lss	steady-state non-photochemical quenching	steady-state non-photochemical quenching in light	(Fm − Fm_Lss)/Fm_Lss	[25]
Rfd_Ln	instantaneous fluorescence decline ratio in light	empiric parameter used to assess plant vitality	(Fp − Ft_Ln)/Ft_Ln	[25]
Rfd_Lss	fluorescence decline ratio in steady state	empiric parameter used to assess plant vitality	(Fp − Ft_Lss)/Ft_Lss	[26]
Y_Ln	instantaneous PSII quantum yield during light adaptation	PSII quantum yield induced in light	(Fm_Ln − Ft_Ln)/Fm_Ln	[26]
Y_Lss	steady-state PSII quantum yield	steady-state PSII quantum yield in light	(Fm_Lss − Ft_Lss)/Fm_Lss	[23]
qP_Ln	coefficient of photochemical quenching during light adaptation	estimate of the fraction of open PSII reaction centers PSIIopen/(PSIIopen+ PSIIclosed)	(Fm_Ln − Ft_Ln)/(Fm_Ln − Fo_Ln)	[23]
qP_Lss	coefficient of photochemical quenching in steady state	estimate of the fraction of open PSII reaction centers PSIIopen/(PSIIopen+ PSIIclosed)	(Fm_Lss − Ft_Lss)/(Fm_Lss − Fo_Lss)	[27]

Table 2. Summary of dataset composition used for 3D-CNN training and validation. Each ROI cube corresponds to one optical biosensing representation of basil’s physiological state.

Label	Number of Plant Images	Number of ROIs
Label	Number of Plant Images	Total	Train	Test
Normal	92	368	294	74
Resistance	78	312	250	62
Recovery	56	224	179	45

Table 3. Detailed architecture parameters of the proposed 3D-CNN model.

Layer	Configuration
Input	3D fusion cube of size 32 × 32 × 130 × 1 (H × W × spectral × channel)
Conv3D-1	64 filters, kernel size 3 × 3 × 3, stride 1 × 1 × 1, padding = same, activation = ReLU
MaxPooling3D-1	pool size 2 × 2 × 2
Conv3D-2	64 filters, kernel size 3 × 3 × 3, stride 1 × 1 × 1, padding = same, activation = ReLU
MaxPooling3D-2	pool size 3 × 2 × 2
Conv3D-3	128 filters, kernel size 3 × 3 × 3, stride 1 × 1 × 1, padding = same, activation = ReLU
MaxPooling3D-3	pool size 2 × 2 × 2
Conv3D-4	128 filters, kernel size 2 × 3 × 3, stride 1 × 1 × 1, padding = same, activation = ReLU
Conv3D-5	128 filters, kernel size 3 × 3 × 3, stride 1 × 1 × 1, padding = same, activation = ReLU
MaxPooling3D-4	pool size 2 × 2 × 2
Flatten	Feature vector flattening
Dense-1	256 units, activation = ReLU
Output	3 units, activation = Softmax

Table 4. Classification accuracy of traditional machine learning classifiers using a single (Fv/Fm) or multiple stress-related parameters.

Classifier	Fv/Fm	Fv/Fm, Y_Lss, Rfd_L3, NPQ_L2, R, G, B
Logistic Regression	0.5193	0.5967
k-NN	0.4420	0.5912
SVM	0.5028	0.6077
LightGBM	0.4917	0.5966

Table 5. Performance comparison of 3D-CNN models under different modality combinations.

Modality	Train Accuracy	Validation Accuracy	Test Accuracy
RGB-only	0.6522	0.6000	0.6408
CF-only	0.8696	0.8400	0.8453
RGB + Depth	0.6619	0.5517	0.5690
RGB + CF	0.9595	0.9583	0.9567
RGB + Depth + CF	0.9701	0.9655	0.9690

Table 6. Performance comparison between the 2D-CNN and 3D-CNN models on the test set.

Model	Accuracy	Precision	Recall	F1 Score
2D-CNN	0.7679	0.8585	0.6644	0.6631
3D-CNN	0.9690	0.9733	0.9661	0.9694

Table 7. K-fold cross-validation results of the 3D-CNN fusion model.

Metric	Fold 1	Fold 2	Fold 3	Fold 4	Fold 5	Mean ± SD
Accuracy	0.9618	0.9712	0.9646	0.9687	0.9589	0.9650 ± 0.00476
Precision	0.9686	0.9754	0.9709	0.9731	0.9658	0.9708 ± 0.00355
Recall	0.9627	0.9695	0.9672	0.9703	0.9609	0.9661 ± 0.00392
F1-score	0.9648	0.9723	0.9691	0.9710	0.9628	0.9680 ± 0.00395

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jeon, Y.-J.; Kim, H.S.; Lee, T.S.; Park, S.H.; Yun, H.; Jung, D.-H. Multimodal Optical Biosensing and 3D-CNN Fusion for Phenotyping Physiological Responses of Basil Under Water Deficit Stress. Agronomy 2026, 16, 55. https://doi.org/10.3390/agronomy16010055

AMA Style

Jeon Y-J, Kim HS, Lee TS, Park SH, Yun H, Jung D-H. Multimodal Optical Biosensing and 3D-CNN Fusion for Phenotyping Physiological Responses of Basil Under Water Deficit Stress. Agronomy. 2026; 16(1):55. https://doi.org/10.3390/agronomy16010055

Chicago/Turabian Style

Jeon, Yu-Jin, Hyoung Seok Kim, Taek Sung Lee, Soo Hyun Park, Heesup Yun, and Dae-Hyun Jung. 2026. "Multimodal Optical Biosensing and 3D-CNN Fusion for Phenotyping Physiological Responses of Basil Under Water Deficit Stress" Agronomy 16, no. 1: 55. https://doi.org/10.3390/agronomy16010055

APA Style

Jeon, Y.-J., Kim, H. S., Lee, T. S., Park, S. H., Yun, H., & Jung, D.-H. (2026). Multimodal Optical Biosensing and 3D-CNN Fusion for Phenotyping Physiological Responses of Basil Under Water Deficit Stress. Agronomy, 16(1), 55. https://doi.org/10.3390/agronomy16010055

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Multimodal Optical Biosensing and 3D-CNN Fusion for Phenotyping Physiological Responses of Basil Under Water Deficit Stress

Abstract

1. Introduction

2. Materials and Methods

2.1. Sample Preparation

2.2. Image Acquisition

2.3. Dataset Preparation

2.3.1. ROIs Extraction and Labelling

2.3.2. Preparation of Input Features for Machine Learning Models

2.3.3. Fusion as a 3D Fusion Parameter

2.4. Construction of Machine Learning and Deep Learning Models

2.4.1. Machine Learning Models

2.4.2. Deep Learning Models: 2D-CNN and 3D-CNN

2.5. Performance Evaluation

2.6. Feature Visualization Using t-SNE

3. Results and Discussion

3.1. Visualization of Chlorophyll Fluorescence Parameter Fusion

3.2. Prediction Performance of Machine Learning Classifiers

3.3. Ablation Study on the Contribution of Individual and Combined Modalities

3.4. Comparison of Prediction Performance Between 2D-CNN and 3D-CNN Models

3.5. Reliability and Practical Applicability of the 3D-CNN Model

3.6. Feature Distribution Analysis Using t-SNE

3.7. Limitations and Future Perspectives

4. Conclusions

Author Contributions

Funding

Data Availability Statement

Acknowledgments

Conflicts of Interest

Abbreviations

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI