1. Introduction
Grapes are a highly nutritious and flavorful fruit with a long history of cultivation, remaining among the most widely produced fruits globally [1]. They offer numerous health benefits, including antioxidant and anti-inflammatory properties, modulation of gut microbiota, anti-obesity effects, and protective actions on the heart and liver, as well as anti-diabetic and anti-cancer activities [2]. Beyond their consumption as fresh fruit, grapes have extensive applications in the food industry, contributing to products such as wine, grape juice, jams, raisins, and other derivatives [2,3]. Lorenz et al. categorized the phenology of the grapevine seasonal cycle into seven principal growth stages: budburst, leaf development, the appearance of inflorescences, flowering, fruit development, fruit maturation, and senescence [4]. Coombe proposed an alternative classification of the main grapevine growth stages, comprising budburst, shoots 10 cm, flowering begins, full bloom, setting, berries pea size, veraison, and harvest [5]. Regardless of the specific classification scheme, chlorophyll plays an indispensable role throughout the entire growth process of grapevines.
Chlorophyll refers to a group of green pigments found in higher plants and other photosynthetic organisms. Among the various types of chlorophyll, chlorophyll a and chlorophyll b are the main components, which are essential to the photosynthetic apparatus of terrestrial plants and green algae [6]. A deficiency in chlorophyll can result in leaf yellowing, poor growth and development, and reduced yield. As a fruit-bearing plant, grapevines require adequate chlorophyll to ensure normal growth and development. Changes in the chlorophyll content of grape leaves can also serve as a physiological index of responses and adaptation to UV-C radiation [7]. Therefore, the rapid, accurate, and efficient detection of chlorophyll content is crucial. Traditional methods for measuring chlorophyll content include spectrophotometry, fluorometry, high-performance liquid chromatography (HPLC), and portable chlorophyll meters [8,9,10,11]. However, these methods have limitations. While spectrophotometry, fluorometry, and HPLC provide accurate measurements, they require destructive sampling and the preservation of samples for laboratory analysis. Portable chlorophyll meters, such as the SPAD-502 (Konica Minolta Inc., Tokyo, Japan), CL-01 (Hansatech Instruments Ltd., King's Lynn, Norfolk, UK), Dualex Scientific+ (FORCE-A, Orsay, France), and CCM-200 (Opti-Sciences Inc., Hudson, NH, USA), allow for non-destructive, low-cost, and low-labor measurements [12]. However, these devices often require multiple measurements to reduce variation, and their results are significantly affected by factors such as plant variety, cultivation practices, and environmental conditions. In recent years, hyperspectral imaging (HSI) technology has been extensively studied and proven to be an effective method for determining chlorophyll content in various plant species, including maize leaves [13], sugar beet [14], lettuce [15], wheat [16], and millet leaves [17].
HSI is capable of capturing multiple images at different wavelengths [18], simultaneously collecting both spatial and spectral data [19]. Compared with standard red, green, blue (RGB) images, HSI provides significantly more information. In agricultural production, hyperspectral remote sensing has been widely applied in crop growth monitoring, health assessment, resource allocation, yield estimation, and disease detection [19]. However, during the acquisition of spectral data, issues such as noise, baseline drift, and scattering can easily arise. Spectral preprocessing can reduce or eliminate the influence of such non-target factors [20], enhancing data quality and improving the accuracy and efficiency of subsequent analysis. Gao et al. applied Savitzky–Golay (SG) smoothing, adaptive-window-length SG smoothing, standard normal variate (SNV), and multiplicative scatter correction (MSC) algorithms to preprocess raw spectral data and constructed prediction models to analyze the effects of different preprocessing methods on model prediction accuracy [21]. Zhang et al. applied four preprocessing methods—SG, MSC, SNV, and variable sorting for normalization—to filter noise and scattering information from the raw spectra [22]. Yu et al. used MSC, SNV, and SG convolution smoothing for spectral preprocessing to facilitate subsequent analysis [23].
Convolutional neural networks (CNNs) are a type of feedforward neural network characterized by their ability to automatically learn features in tasks such as image and speech processing. CNNs possess a unique ability to extract spatial–spectral features, making them highly effective for processing hyperspectral images, and they are thus widely used in this field [24]. Pyo et al. proposed a point-centered regression CNN to estimate the concentrations of phycocyanin and chlorophyll a in water bodies using hyperspectral images, achieving coefficient of determination (R²) values greater than 0.86 and 0.73, respectively, with root mean square error (RMSE) values below 10 mg/m³ [25]. Luo et al. introduced an attention residual CNN, which, in combination with near-infrared HSI (900–1700 nm), predicted the fat content of salmon fillets, achieving an R² of 0.9033, an RMSE of prediction of 1.5143, and a residual predictive deviation (RPD) of 3.2707 [26]. Li et al. employed short-wave infrared HSI combined with a one-dimensional (1D) CNN model to determine the soluble solids content in dried Hami jujube, achieving an R² of 0.857, an RMSE of 0.563, and an RPD of 2.648 [27]. Ye et al. proposed a 1D deep learning model based on a CNN with a spectral attention module to estimate the total chlorophyll content of greenhouse lettuce from full-spectrum hyperspectral images; the experimental results showed an average R² of 0.746 and an average RMSE of 2.018, outperforming existing standard methods [15]. Wang et al. established an attention-CNN incorporating multi-feature parameter fusion to estimate chlorophyll content in millet leaves at different growth stages using hyperspectral images. The model achieved an R² of 0.839, an RMSE of 1.451, and an RPD of 2.355, demonstrating superior predictive accuracy and regression fit compared with conventional models [17].
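For reference, the three evaluation metrics quoted throughout this review (R², RMSE, and RPD) can be computed as below. Note that RPD conventions vary slightly (population vs. sample standard deviation), so this is one common definition rather than necessarily the one used in each cited study.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """R^2, RMSE, and residual predictive deviation (RPD = SD(y_true) / RMSE)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    rmse = np.sqrt(np.mean(residuals ** 2))
    ss_res = np.sum(residuals ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot  # coefficient of determination
    rpd = y_true.std(ddof=1) / rmse  # sample SD convention for RPD
    return r2, rmse, rpd
```

As a rough rule of thumb in chemometrics, an RPD above roughly 2 is often taken to indicate a model suitable for quantitative prediction, which puts the RPD values cited above in context.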
CNNs are currently among the most widely used models in deep learning [28]. In addition to the aforementioned regression tasks, they can also be applied to image segmentation. Image segmentation divides images into regions with different characteristics and extracts regions of interest (ROIs) [29], which can be understood as a pixel classification problem [28]. Compared with conventional RGB images, each spatial location in a hyperspectral image contains hundreds of spectral bands. Due to this high dimensionality [30], the segmentation of hyperspectral images is more complex. Existing methods for hyperspectral image segmentation can be categorized into thresholding, watershed, clustering, morphological, region-based, deep learning, and superpixel-based segmentation [31]. Among deep learning methods, commonly used CNN architectures include AlexNet [32], VGGNet [33], ResNet [34], GoogLeNet [35], MobileNet [36], and DenseNet [37]. CNN models have been widely applied to both classification and regression tasks.
The combination of HSI and CNNs has yielded positive results in the field of agricultural production [19,24,38]. However, few studies have applied this combination to chlorophyll content regression models for grape leaves. Therefore, this study integrates hyperspectral image segmentation, spectral preprocessing, and CNNs to propose, for the first time, a data-driven ensemble framework for predicting chlorophyll a + b content in grape leaves. The specific objectives are to (1) compare the regression performance before and after hyperspectral image segmentation to demonstrate the effectiveness of ROI extraction through image segmentation; (2) identify the optimal spectral preprocessing method for chlorophyll prediction models; and (3) select the most effective model from the proposed self-developed CNN models, based on the extracted ROIs of grape leaves and the best preprocessing method, to predict the chlorophyll a + b content in grape leaves.
3. Results
3.1. VGG16-U-Net Masked Images
In this study, the VGG16-U-Net model was employed to segment grape leaves in hyperspectral images by annotating, training, and testing on the RGB images calculated from the hyperspectral data. As a result, 204 mask images were generated from the calculated RGB images of grape leaves. These masks were then applied to the hyperspectral images, producing masked hyperspectral images of the grape leaves.
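The masking step, applying a binary leaf mask to every band of the hyperspectral cube so that only leaf pixels contribute to the mean spectrum, can be sketched as follows. The array shapes are illustrative; the actual cube dimensions depend on the camera.

```python
import numpy as np

def apply_mask(cube, mask):
    """Zero out background pixels in a hyperspectral cube of shape
    (height, width, bands) using a binary leaf mask of shape (height, width),
    and return the masked cube plus the mean spectrum over leaf pixels only."""
    masked = cube * mask[:, :, None]               # broadcast mask across bands
    mean_spectrum = cube[mask.astype(bool)].mean(axis=0)  # average leaf pixels
    return masked, mean_spectrum

# Toy example: a 4x4 scene with 5 bands and a 2x2 "leaf" region.
cube = np.ones((4, 4, 5))
mask = np.zeros((4, 4), dtype=np.uint8)
mask[1:3, 1:3] = 1
masked, spec = apply_mask(cube, mask)
```

Averaging only over masked leaf pixels is what prevents background reflectance from diluting the spectra fed to the regression models.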
Using the sample "2020-09-10_013" as an example, Figure 6a displays the true RGB image captured by an RGB sensor. However, due to the positional offset between the RGB sensor and the hyperspectral camera, annotations cannot be made directly on the true RGB image. Therefore, the RGB image calculated from the hyperspectral image was used for annotation. Figure 6b shows the image before annotation, and Figure 6c shows the image after annotation. Figure 6d displays the hyperspectral image at a specific band; by applying the mask generated through the aforementioned annotation and training process, we obtain the masked hyperspectral image at that band, as shown in Figure 6e.
After training with the VGG16-U-Net model, the mask images of the grape leaves in the test set achieved an MIoU of 98.34%, an MPA of 99.24%, and an accuracy of 99.64%, demonstrating the effectiveness of the segmentation for grape leaf images.
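The reported segmentation metrics (MIoU, MPA, and pixel accuracy) can all be derived from a per-class confusion matrix; a minimal two-class (background vs. leaf) sketch, with toy labels rather than the study's data:

```python
import numpy as np

def segmentation_metrics(y_true, y_pred, num_classes=2):
    """Mean IoU, mean pixel accuracy, and overall pixel accuracy
    computed from flattened ground-truth and predicted label arrays."""
    cm = np.zeros((num_classes, num_classes), dtype=np.int64)
    for t, p in zip(y_true.ravel(), y_pred.ravel()):
        cm[t, p] += 1  # confusion matrix: rows = truth, cols = prediction
    ious, pas = [], []
    for c in range(num_classes):
        tp = cm[c, c]
        denom_iou = cm[c, :].sum() + cm[:, c].sum() - tp  # TP + FP + FN
        ious.append(tp / denom_iou if denom_iou else 0.0)
        pas.append(tp / cm[c, :].sum() if cm[c, :].sum() else 0.0)
    accuracy = np.trace(cm) / cm.sum()
    return float(np.mean(ious)), float(np.mean(pas)), float(accuracy)
```

With two classes, MIoU is the harshest of the three metrics because false positives and false negatives both enter the denominator, which is why the 98.34% MIoU reported here is a strong result.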
Table 6 shows the results of predicting the chlorophyll a + b content using the hyperspectral images before and after segmentation, with the self-developed CNN1-1 model used for training. Compared with the original images, the results obtained using the segmented mask images were superior: R² improved from 0.867 to 0.904, and RMSE decreased from 2.899 to 2.465. After 10-fold cross-validation, R² still improved from 0.822 to 0.847, and RMSE decreased from 3.531 to 3.290. These results confirm the effectiveness of the image segmentation, and subsequent experiments were therefore conducted using the segmented mask images.
3.2. Impact of Different Preprocessing Methods on Regression Models
Before establishing regression models to predict the chlorophyll a + b content, this study applied 15 distinct preprocessing methods to the average spectra of the segmented grape leaf hyperspectral images: SNV, MSC, FFT, FD, SD, and the combinations SNV + FFT, SNV + FD, SNV + SD, MSC + FFT, MSC + FD, MSC + SD, FFT + FD, FFT + SD, FD + FFT, and SD + FFT. The objective was to explore the impact of spectral preprocessing on the performance of the regression models.
Figure 7 presents the comparative spectral profiles of grape leaves, showing both the original mean reflectance spectra and their transformed counterparts after spectral preprocessing.
Chlorophyll a + b content predictions were performed using the self-developed CNN1-1 model, with the results displayed in Table 7. Among the 15 preprocessing methods, the five best performing were FFT, SNV + FD, MSC + FD, SNV + FFT, and MSC + FFT. Following spectral preprocessing, there was a noticeable improvement in R² and a reduction in RMSE both before and after 10-fold cross-validation. FFT preprocessing improved R² from 0.904 to 0.925 and reduced RMSE from 2.465 to 2.172, with the 10-fold cross-validation R² improving to 0.880 and RMSE decreasing to 2.983. The SNV + FD combination achieved an R² of 0.917 and an RMSE of 2.291, with cross-validation results of R² = 0.870 and RMSE = 3.069. The MSC + FD combination resulted in an R² of 0.917 and an RMSE of 2.294, with cross-validation results of R² = 0.869 and RMSE = 3.089. The SNV + FFT combination reached an R² of 0.916 and an RMSE of 2.300, with a post-cross-validation R² of 0.869 and an RMSE of 3.070. Lastly, the MSC + FFT combination showed an R² of 0.913 and an RMSE of 2.341, with cross-validation results of R² = 0.866 and RMSE = 3.059.
Among these methods, FFT preprocessing yielded the best outcomes, attaining the highest pre- and post-cross-validation R² values of 0.925 and 0.880, respectively. The efficacy of the FFT spectral preprocessing method was thus validated, and it was chosen as the uniform preprocessing method for the regression model comparisons in the subsequent sections.
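The study's exact FFT preprocessing is defined in its Methods section; as an illustration of the general idea, a simple frequency-domain low-pass filter can be sketched as below. The `keep_fraction` cutoff is an assumed parameter, not a value from this study.

```python
import numpy as np

def fft_lowpass(spectrum, keep_fraction=0.1):
    """One plausible form of FFT preprocessing: transform the spectrum to
    the frequency domain, zero the high-frequency coefficients (which carry
    mostly noise), and reconstruct with the inverse transform. The broad
    absorption features survive because they live at low frequencies."""
    coeffs = np.fft.rfft(spectrum)
    cutoff = max(1, int(len(coeffs) * keep_fraction))
    coeffs[cutoff:] = 0
    return np.fft.irfft(coeffs, n=len(spectrum))
```

A slowly varying reflectance curve passes through such a filter nearly unchanged, while band-to-band sensor noise is suppressed, consistent with the interpretation that FFT preprocessing captures structural patterns not apparent in the raw or derivative spectra.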
3.3. Evaluation of CNN Models for Accurate Prediction of Chlorophyll a + b Content
To establish regression models for predicting the chlorophyll a + b content of 204 grape leaf hyperspectral images, the dataset was divided into a training set and a test set in an 8:2 ratio. Using the average spectra of the grape leaves obtained after image segmentation and FFT spectral preprocessing, traditional regression models—including SVR, RFR, GBR, and PLSR—as well as 12 CNN models were built to predict the chlorophyll a + b content of grape leaves. The regression models were evaluated using various performance metrics.
Table 8 displays the results of the 12 self-developed CNN models compared with the four traditional regression models. The CNN1-1 model achieved an R² of 0.925 and an RMSE of 2.172, with 10-fold cross-validation results showing an R² of 0.880 and an RMSE of 2.983. The CNN1-2 model had an R² of 0.923 and an RMSE of 2.204; after 10-fold cross-validation, the R² was 0.874 and the RMSE was 3.003. The CNN1-3 model had an R² of 0.921 and an RMSE of 2.230, with 10-fold cross-validation results showing an R² of 0.872 and an RMSE of 3.025. The CNN1-4 model achieved an R² of 0.925 and an RMSE of 2.173, with 10-fold cross-validation results showing an R² of 0.879 and an RMSE of 2.944. The CNN2-1 model recorded an R² of 0.919 and an RMSE of 2.266, with 10-fold cross-validation results showing an R² of 0.861 and an RMSE of 3.109. The CNN2-2 model had an R² of 0.915 and an RMSE of 2.312, with 10-fold cross-validation results showing an R² of 0.858 and an RMSE of 3.156. The CNN2-3 model exhibited an R² of 0.920 and an RMSE of 2.250, with 10-fold cross-validation results showing an R² of 0.862 and an RMSE of 3.128. The CNN2-4 model recorded an R² of 0.919 and an RMSE of 2.259, with 10-fold cross-validation results showing an R² of 0.860 and an RMSE of 3.138. The CNN3-1 model had an R² of 0.917 and an RMSE of 2.288, with 10-fold cross-validation results showing an R² of 0.859 and an RMSE of 3.135. The CNN3-2 model showed an R² of 0.917 and an RMSE of 2.288, with 10-fold cross-validation results showing an R² of 0.859 and an RMSE of 3.126. The CNN3-3 model recorded an R² of 0.916 and an RMSE of 2.303, with 10-fold cross-validation results showing an R² of 0.859 and an RMSE of 3.151. The CNN3-4 model achieved an R² of 0.829 and an RMSE of 3.286, with 10-fold cross-validation results showing an R² of 0.806 and an RMSE of 3.388.
The SVR model had an R² of 0.131 and an RMSE of 7.407, with 10-fold cross-validation results of R² = 0.148 and RMSE = 8.302. The RFR model recorded an R² of 0.879 and an RMSE of 2.760, with a 10-fold cross-validation R² of 0.794 and an RMSE of 3.781. The GBR model showed an R² of 0.871 and an RMSE of 2.853, with 10-fold cross-validation results of R² = 0.813 and RMSE = 3.663. The PLSR model demonstrated an R² of 0.771 and an RMSE of 3.805, with 10-fold cross-validation results of R² = 0.769 and RMSE = 4.100.
The experimental results indicate that the CNN models developed in this study generally outperformed the traditional regression models. Among all the models evaluated, the self-developed CNN1-1 model showed the best performance, achieving the highest R² of 0.925 and maintaining an R² of 0.880 after 10-fold cross-validation. It also exhibited the lowest MSE and RMSE, thereby providing a more accurate prediction of the chlorophyll a + b content in grape leaves. This superior performance can be attributed to the strong feature learning capability of CNNs, which enables them to effectively capture the complex mapping between spectral information and chlorophyll content.
Traditional machine learning models such as SVR, RFR, GBR, and PLSR offer relatively high interpretability and reasonable modeling capabilities in hyperspectral analysis. They are particularly robust when working with limited sample sizes or under high-noise conditions. However, these models typically rely on manual feature selection or dimensionality reduction, which limits their ability to fully exploit the intricate non-linear structures and local spectral correlations between bands contained in high-dimensional spectral data.
In contrast, CNNs, as a deep learning approach, possess powerful automatic feature extraction capabilities. Through their multi-layer convolutional architecture, CNNs can learn hierarchical feature representations directly from raw spectra, effectively capturing complex non-linear relationships between spectral features and target variables without human intervention. Moreover, CNNs are particularly effective at identifying local spectral continuity and inter-band dependencies. As a result, they generally achieve better prediction performance and stronger generalization ability compared to traditional methods.
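The local feature extraction described above can be illustrated with a minimal valid-mode 1D convolution followed by a ReLU, the basic building block of the CNN models in this study. The step-shaped "spectrum" and first-difference kernel below are toy examples chosen to mimic a red-edge-like reflectance jump.

```python
import numpy as np

def conv1d(spectrum, kernels, stride=1):
    """Valid 1-D convolution of a spectrum with a bank of kernels,
    followed by ReLU -- the core feature-extraction step of a 1-D CNN."""
    k = kernels.shape[1]
    n_out = (len(spectrum) - k) // stride + 1
    out = np.empty((kernels.shape[0], n_out))
    for j in range(n_out):
        window = spectrum[j * stride : j * stride + k]
        out[:, j] = kernels @ window  # each kernel responds to a local pattern
    return np.maximum(out, 0.0)  # ReLU non-linearity

# A first-difference kernel fires where reflectance changes sharply,
# e.g. at the red edge where chlorophyll absorption drops off.
spectrum = np.concatenate([np.full(50, 0.1), np.full(50, 0.6)])
kernels = np.array([[-1.0, 0.0, 1.0]])
features = conv1d(spectrum, kernels)
```

Stacking many such learned kernels, rather than a single hand-chosen one, is precisely what lets a CNN discover the local spectral continuity and inter-band dependencies mentioned above without manual feature selection.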
3.4. Statistical Validation of Prediction Performance
To further validate the predictive accuracy and generalization capability of the best-performing CNN1-1 model, a series of statistical validation analyses were conducted.
A scatter plot of the predicted versus measured chlorophyll a + b content was generated, and a linear regression line was fitted to visualize their relationship, as shown in Figure 8. The R² was 0.925, indicating a strong correlation and high predictive accuracy. The regression equation was
To assess the statistical reliability of the predictions, two one-sample t-tests were performed. First, the slope was tested against the ideal value of 1. The result (t = −1.427, p = 0.161) showed no significant deviation, indicating that the predicted values are proportional to the measured chlorophyll content. Second, the intercept (bias) was tested against 0, yielding t = −0.152, p = 0.880, which suggests no significant systematic bias between the predicted and actual values.
The SEP was calculated to be 2.199 µg/cm², reflecting the typical deviation of predicted values from the reference measurements. To evaluate potential overfitting, an F-test was conducted to compare the variance of the prediction errors with that of the training errors. The result (F = 0.454, p = 0.993) showed no significant difference, providing statistical evidence that the model generalizes well and is not overfitted to the training data.
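These checks can be reproduced with SciPy. The sketch below assumes the slope and intercept t-tests use the standard errors from an ordinary least-squares fit of predicted on measured values and a one-sided F-test on the residual variances; this matches the reported direction of the result (F < 1 with a large p-value) but may differ in detail from the authors' exact procedure.

```python
import numpy as np
from scipy import stats

def validate_predictions(y_meas, y_pred, train_residuals):
    """t-tests on regression slope (vs. 1) and intercept (vs. 0), standard
    error of prediction (SEP), and an F-test comparing test-residual
    variance against training-residual variance."""
    y_meas = np.asarray(y_meas, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    n = len(y_meas)
    fit = stats.linregress(y_meas, y_pred)
    t_slope = (fit.slope - 1.0) / fit.stderr          # H0: slope = 1
    p_slope = 2.0 * stats.t.sf(abs(t_slope), df=n - 2)
    t_icpt = fit.intercept / fit.intercept_stderr     # H0: intercept = 0
    p_icpt = 2.0 * stats.t.sf(abs(t_icpt), df=n - 2)
    residuals = y_pred - y_meas
    sep = residuals.std(ddof=1)  # bias-corrected standard error of prediction
    f_stat = residuals.var(ddof=1) / np.asarray(train_residuals).var(ddof=1)
    p_f = stats.f.sf(f_stat, n - 1, len(train_residuals) - 1)
    return {"p_slope": p_slope, "p_intercept": p_icpt,
            "SEP": sep, "F": f_stat, "p_F": p_f}

# Toy check: near-perfect predictions should pass all three tests.
rng = np.random.default_rng(1)
y = np.linspace(10, 50, 40)
result = validate_predictions(y, y + rng.normal(0, 1, 40),
                              rng.normal(0, 1, 160))
```

Non-significant p-values on the slope and intercept indicate no proportional or systematic bias, and an F-statistic near (or below) 1 indicates the test-set errors are no more variable than the training errors, which is the overfitting check reported above.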
Collectively, these validation results demonstrate that the CNN1-1 model delivers robust predictive performance with high accuracy, minimal bias, and strong generalization ability.
4. Discussion
This study demonstrates the feasibility and effectiveness of integrating HSI with CNNs for predicting chlorophyll a + b content in grapevine leaves. By employing the VGG16-U-Net model for hyperspectral image segmentation, the approach enables the automatic extraction of leaf regions and reduces background noise interference—offering a distinct advantage over many traditional spectral studies that rely on manual region selection.
Among the spectral preprocessing methods, FFT, SNV + FD, MSC + FD, SNV + FFT, and MSC + FFT significantly enhanced the prediction accuracy of chlorophyll a + b content, whereas others, such as SNV, MSC, SD, and SNV + SD, actually reduced model performance. The proper choice of preprocessing is difficult to assess prior to model validation [43]. Although previous studies have predominantly employed common preprocessing methods such as SNV or FD, our findings reveal that FFT preprocessing, despite being less commonly used, produced the most effective results. This suggests that frequency-domain transformations can capture periodic or structural patterns in the spectral data that are associated with chlorophyll concentration. Such information may not be readily apparent in the original or derivative spectra, highlighting the potential of FFT as a valuable preprocessing technique for spectral regression tasks.
Interestingly, the shallow CNN architecture (CNN1-1) outperformed deeper and more complex models, indicating that increased network depth or complexity does not necessarily lead to improved performance—especially when feature extraction is already supported by appropriate preprocessing and ROI segmentation. This finding may be attributed to the relatively small sample size in this study, where deeper networks are more prone to overfitting.
To further evaluate the performance of the proposed VGG16-U-Net-FFT-CNN1-1 framework for chlorophyll content prediction, we compared its results with those reported in previous studies that also utilized hyperspectral imaging for chlorophyll estimation. As shown in Table 9, the proposed framework outperformed the models developed by Ye et al. [15], Yang et al. [16], and Wang et al. [17], offering a novel and more effective approach to chlorophyll content prediction.
From a practical perspective, this approach offers a non-destructive, efficient, and scalable solution for monitoring plant physiological traits. Within the context of precision agriculture, such models could be embedded into field-deployable platforms to support rapid chlorophyll diagnostics, facilitate nutrient management, and enable early stress detection. However, this study was based on hyperspectral data collected at a single time point. Future work will focus on expanding the dataset to encompass the entire growth cycle of grapevines, enabling the model to generalize across different developmental stages. In addition, future research will not be limited to chlorophyll content but will also address other plant physiological and biochemical indicators, such as nitrogen content, leaf water status, and disease markers, in order to establish a more comprehensive and integrated crop monitoring framework. Future research will also consider other crops of higher economic value and scientific significance.