Reflectance Spectroscopy for the Classification and Prediction of Pigments in Agronomic Crops

Reflectance spectroscopy, in combination with machine learning and artificial intelligence algorithms, is an effective method for classifying and predicting pigments and phenotyping in agronomic crops. This study aims to use hyperspectral data to develop a robust and precise method for the simultaneous evaluation of pigments, such as chlorophylls, carotenoids, anthocyanins, and flavonoids, in six agronomic crops: corn, sugarcane, coffee, canola, wheat, and tobacco. Our results demonstrate high classification accuracy and precision, with principal component analyses (PCAs)-linked clustering and a kappa coefficient analysis yielding results ranging from 92 to 100% in the ultraviolet–visible (UV–VIS) to near-infrared (NIR) to shortwave infrared (SWIR) bands. Predictive models based on partial least squares regression (PLSR) achieved R2 values ranging from 0.77 to 0.89 and ratio of performance to deviation (RPD) values over 2.1 for each pigment in C3 and C4 plants. The integration of pigment phenotyping methods with fifteen vegetation indices further improved accuracy, achieving values ranging from 60 to 100% across different full or range wavelength bands. The most responsive wavelengths were selected based on a cluster heatmap, β-loadings, weighted coefficients, and hyperspectral vegetation index (HVI) algorithms, thereby reinforcing the effectiveness of the generated models. Consequently, hyperspectral reflectance can serve as a rapid, precise, and accurate tool for evaluating agronomic crops, offering a promising alternative for monitoring and classification in integrated farming systems and traditional field production. It provides a non-destructive technique for the simultaneous evaluation of pigments in the most important agronomic plants.

Various techniques and equipment can be employed to optimize agricultural production strategies, according to Li et al. (2022) [6] and Wang et al. (2022) [7]. These include tools such as red-green-blue (RGB) sensors, multispectral image sensors (MSI), hyperspectral remote sensing (HRS), hyperspectral imaging sensors (HSI), and visible-near-infrared-shortwave infrared (VIS-NIR-SWIR) spectroscopy tools. These can be combined with machine learning and artificial intelligence algorithms [8]. This combination can lead to improved yields in a range of production methods, including indoor and vertical farming, as well as traditional agriculture [2,9].
Figure 1. Boxplot of leaf pigment concentrations expressed by leaf (A-F) area and (G-L) mass from corn, sugarcane, coffee, canola, wheat, and tobacco. F-test by one-way ANOVA (p < 0.001). Different letters over the boxes indicate significant differences by Duncan's test (p < 0.001) between crop plants (n = 60). Dash: means; square: mean ± SD; outer spread: min-max; triangle: raw data. The abbreviations are described in Table S2.

Hyperspectral Analysis in Leaves
When pigment was expressed on a mass basis (Figure 1G-L), higher values were observed for coffee and tobacco in terms of the chlorophyll and carotenoid concentrations. Similarly, higher anthocyanin and flavonoid concentrations were observed in corn, sugarcane, and coffee (Figure 1K,L). We also analyzed leaf pigments based on both area and mass (Table S1), establishing minimum and maximum values for corn, sugarcane, coffee, canola, wheat, and tobacco (Figure 1 and Table S2). The twelve attributes analyzed had coefficients of variation spanning from 37.1% to 93.4%, and all were classified as having high variance (Figure 1).
Figure 2 presents the UV-VIS-NIR-SWIR hyperspectral curves for the 360 samples of corn, sugarcane, coffee, canola, wheat, and tobacco leaves. The permutation multivariate analysis of variance (F: 45,657.4; p < 0.001) reveals the wavelengths with the highest significance among all spectra. Variations in the reflectance factor were noted in the UV region (350-400 nm), where many phenolic and flavonoid compounds were observed, as well as in the visible region (400-700 nm), where leaf pigments such as anthocyanins, carotenoids, and chlorophylls were present. The near-infrared region (700-1300 nm) showed structural differences in leaf tissues, such as pectin, hemicellulose, lignin, and cellulose, while the shortwave infrared (SWIR; 1300-2500 nm) was attributed to the structural water content of proteins and conjugate water in intrinsic structures, such as vacuoles and other organelles, as well as cell walls (Figure 2). The high variability (between 22.4 and 48.3%; 350-2500 nm) among the crop samples was primarily attributable to differences in pigments, structural composition, and leaf scattering (Figure 2).
Figure 2. The reflectance factor was analyzed in expanded leaves from corn, sugarcane, coffee, canola, wheat, and tobacco. The spectral range included UV-VIS (350-700 nm, which shows pigments in the leaves), NIR (700-1300 nm, which reveals the structural components), and SWIR (1300-2500 nm, which represents the structural-water interactions). The dotted line indicates the inflection points at 700 and 1300 nm. One-way ANOVA F-test showed significance (p < 0.001) with 360 samples (60 samples for each crop).
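The variability figures quoted above are coefficients of variation computed over the three spectral regions. The sketch below shows one way such a summary could be obtained. It assumes each spectrum is a plain list of reflectance factors aligned with a list of band centres in nm, and that the per-band values are averaged within each region. The source does not state how the values were aggregated, so the averaging step and the helper names (`region_of`, `cv_by_region`) are illustrative assumptions.

```python
from statistics import mean, stdev


def region_of(wavelength_nm):
    """Assign a band centre (nm) to the spectral regions named in the text."""
    if 350 <= wavelength_nm < 700:
        return "UV-VIS"
    if 700 <= wavelength_nm < 1300:
        return "NIR"
    if 1300 <= wavelength_nm <= 2500:
        return "SWIR"
    return None  # outside the 350-2500 nm range


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * stdev(values) / mean(values)


def cv_by_region(wavelengths, spectra):
    """Mean per-band CV (%) within each region (aggregation is an assumption)."""
    per_region = {}
    for i, wl in enumerate(wavelengths):
        region = region_of(wl)
        if region is None:
            continue
        band = [sample[i] for sample in spectra]  # one band across all samples
        per_region.setdefault(region, []).append(coefficient_of_variation(band))
    return {region: mean(cvs) for region, cvs in per_region.items()}


# Toy data: three samples, one band per region (values are made up).
toy_wavelengths = [500, 1000, 2000]
toy_spectra = [[1, 2, 4], [2, 2, 4], [3, 2, 4]]
toy_cv = cv_by_region(toy_wavelengths, toy_spectra)
```

In this toy case only the 500 nm band varies between samples, so the UV-VIS region carries all of the variability.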

Cluster Heatmap of Selected Wavelengths and Classification-Based UV-VIS-NIR-SWIR Bands
Figure 3 displays a cluster heatmap created to visualize the relationship between spectral data and pigment concentrations. The association between hyperspectral values and pigment concentrations was leveraged to categorize pigments (chloroplast or extrachloroplast) and identify distinct crops. Blue colors represent higher reflectance signals for crops with substantial concentrations of chlorophylls (as they have two major peak absorptions, with blue peaks at 430 and 453 nm for chlorophyll a and b), and carotenoids (broad absorption range in blue (400-500 nm)), while deeper shades of green and red indicate crops with high anthocyanin and flavonoid concentrations. Lighter shades denote a lack of association between specific wavelength bands and a certain pigment concentration. Clustering patterns based on UV to VIS to NIR to SWIR bands and crop groups were discernible. The clustering revealed the dominance of one class of pigment or the interaction between reflectance and cell mesophyll scattering. For instance, most UV-VIS bands exhibited similar correlation patterns within their own group (C3 vs. C4 metabolism; Figure 3). The NIR or SWIR bands demonstrated a strong and negative correlation with specific wavelength values, suggesting an interaction between pigments and leaf thickness, as well as the occurrence of scattering phenomena within leaf structures such as cell parenchyma and intercellular spaces.
Nevertheless, carotenoids (Car), anthocyanins (AnC), and flavonoids (Flv) showed no correlation with the NIR-SWIR-linked spectra associated with the structure and structure-water of the bands. All wheat samples exhibited a positive Z-score across all bands from UV-VIS to sugarcane. Furthermore, the clustering of pigments and metabolism reveals that each crop plant possesses a specific fingerprint, which could potentially be associated with a specific vegetation index. This association takes into account the vegetative growth stages, and whether they are associated with UV-VIS bands (350-700 nm), NIR bands (700-1300 nm), or SWIR fingerprints (1300-2500 nm).
Figure 3. Cluster heatmap displaying the correlation between the spectral data of crops (corn, sugarcane, coffee, canola, wheat, and tobacco) and their pigments (chloroplastidic, such as chlorophylls and carotenoids, and extrachloroplastidic, such as anthocyanins and flavonoids). These are grouped by UV-VIS, NIR, and SWIR wavelength bands, and spectral resolution of 10, 20, and 40 nm. The color blue indicates a positive relationship between the spectral bands, pigments present in crop plants (C3 and C4 metabolism), and pigment concentrations, while red represents negative correlations, as per the Z-scores (p < 0.001).
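The heatmap rests on band-by-band correlations between reflectance and pigment concentration. As a minimal sketch of that underlying quantity only (not the clustering or Z-score scaling used for the figure), the following computes a Pearson correlation coefficient for each band. The function names and the toy values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def band_pigment_correlations(spectra, pigment):
    """Return one correlation per band.

    spectra: list of samples, each a list of reflectance factors.
    pigment: one pigment concentration per sample, in the same order.
    """
    n_bands = len(spectra[0])
    return [
        pearson_r([sample[i] for sample in spectra], pigment)
        for i in range(n_bands)
    ]


# Toy data: band 0 rises with the pigment, band 1 falls (values are made up).
toy_spectra = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
toy_pigment = [10.0, 20.0, 30.0]
toy_r = band_pigment_correlations(toy_spectra, toy_pigment)
```

A positive coefficient would map to one end of the heatmap color scale and a negative one to the other; values near zero correspond to the lighter shades described in the text.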

Principal Component Analysis (PCA), Correlation Coefficients, and Loadings of the Wavelengths
The first three principal components (PCs) accounted for 100% of the variance in the six spectral analyses (UV-VIS-NIR-SWIR) of corn, sugarcane, coffee, canola, wheat, and tobacco crops. Based on the PCA 3D plot in Figure 4, two primary clusters formed between C3 and C4 metabolisms (350-700 nm, 700-1300 nm, 1300-2500 nm, or 350-2500 nm; Figure 4A-D). The high precision of the results was demonstrated by the accuracy and Kappa coefficients, which were greater than approximately 0.94 (Acc) and 0.92 (K), respectively (Figure 4A-D).
The correlation coefficients (CCs), principal component loadings (PCLs), and hyperspectral vegetation index (HVI) were obtained from the principal component analysis (PCA) across different crops. The results reveal the presence of three PCs, CC, HVI, and PCL, for the majority of the six spectral datasets (Figures 5 and S1A,B). Visible (VIS) wavelengths were the dominant contributors to the first PC, which was linked to 555 and 660 nm. Near-infrared (NIR) wavelengths dominated the second PC, which was linked to 710, 940, 1080, and 1190 nm, with minor effects on the CC. Shortwave infrared (SWIR) wavelengths were the major contributors to the third PC, which combined the HVI, CC, and PCL with peaks and valleys at 1470, 1850, and 2245 nm.
A high correlation between NIR and SWIR was confirmed in the CC-associated PC1 (Figures 5 and S1). The shapes of the HVI and PCL were complex, represented multiple contributions from UV-VIS-NIR-SWIR bands (Figures 5 and S1B), and showed a nonrandom distribution in the pigment phenotyping-based area and mass (Figure 5A-L).
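The accuracy and kappa coefficients reported for the PCA-linked clustering can both be derived from a confusion matrix. The sketch below uses the standard definition of Cohen's kappa (observed agreement corrected for chance agreement). The layout of the confusion matrix, with rows as true classes and columns as predicted classes, is an assumption, as the source does not describe it.

```python
def accuracy_and_kappa(confusion):
    """Overall accuracy and Cohen's kappa from a square confusion matrix.

    confusion[i][j] = number of samples of true class i assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    observed = sum(confusion[i][i] for i in range(len(confusion))) / total
    row_totals = [sum(row) for row in confusion]
    col_totals = [sum(col) for col in zip(*confusion)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Toy two-class example (counts are made up).
toy_accuracy, toy_kappa = accuracy_and_kappa([[9, 1], [1, 9]])
```

For six equally sized classes with balanced predictions the chance agreement is 1/6, so an accuracy of 0.94 corresponds to a kappa of roughly 0.93, which is in line with the pair of values quoted above.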

Machine Learning and Artificial Intelligence Algorithms for Classification
Phenotypic characterization of the pigments in corn, sugarcane, coffee, canola, wheat, and tobacco was performed using UV-VIS-NIR-SWIR hyperspectral data and ML and AI algorithms (Figure 6). Eight algorithms were employed, namely, AdaBoost (AdaB), gradient boosting (GB), K-nearest neighbors (KNN), naive Bayes (NB), neural network (NN), random forest (RF), support vector machine (SVM), and tree. These algorithms displayed a range of performances in classifying crops. Cross-validation data were used to classify crops using UV-VIS-NIR-SWIR spectra (Figure 6A-D). The NB algorithm demonstrated lower accuracy than the other AI algorithms, and the NN displayed a lower correlation with the accurate crop classification produced by the other AI algorithms. However, the AdaB, GB, KNN, NB, NN, RF, SVM, and Tree algorithms all achieved ≈100% accuracy with high precision and efficiency. The SWIR bands showed higher accuracy and precision and a faster evaluation and confusion matrix generation than the UV-VIS or NIR bands (Figures S2 and S3).
For example, NB, NN, and SVM demonstrated only moderate performance within the ranges of 350-700 nm and 700-1300 nm, with 40-76% accuracy and precision in classifying crops. The use of the full spectra with individual AI algorithms demonstrated high accuracy, low error, and high precision in crop classification (Figures S2 and S3). The results indicate significant discrimination (p < 0.01) in high-throughput phenotyping using individual AI algorithms for the UV-VIS, NIR, and SWIR bands (350-2500 nm).
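To make the cross-validated classification step concrete, here is a minimal sketch of one of the eight algorithm families named above, k-nearest neighbors, evaluated with leave-one-out cross-validation. The source does not specify the cross-validation scheme, the distance metric, or the value of k used for classification, so Euclidean distance, k = 3, and leave-one-out are all assumptions. The two-band "spectra" are made-up values.

```python
from collections import Counter
from math import dist


def knn_predict(train_x, train_y, query, k=3):
    """Predict a label by majority vote among the k nearest training spectra."""
    nearest = sorted(range(len(train_x)), key=lambda i: dist(train_x[i], query))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


def leave_one_out_accuracy(x, y, k=3):
    """Hold out each sample once, train on the rest, and score the prediction."""
    hits = 0
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_x, train_y, x[i], k) == y[i]
    return hits / len(x)


# Toy two-band reflectance values for two well-separated crops (made up).
toy_x = [
    [0.10, 0.10], [0.12, 0.11], [0.11, 0.09],
    [0.90, 0.90], [0.88, 0.91], [0.91, 0.89],
]
toy_y = ["corn", "corn", "corn", "wheat", "wheat", "wheat"]
toy_loo_accuracy = leave_one_out_accuracy(toy_x, toy_y, k=3)
```

Restricting the input lists to the bands of a single region (for example 350-700 nm) and repeating the evaluation would give a per-region accuracy comparable in kind to the UV-VIS, NIR, and SWIR comparisons reported here.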

Calibration, Cross-Validation, and Prediction Simultaneous Models by Crop Leaf-Based Partial Least Squares Regression
The results of the spectral assessment for all crops during the model calibration, cross-validation, and prediction steps are presented in Tables 1 and S3 and Figure S3. The optimal number of components selected was four PCA-PLSR factors, determined through leave-one-out cross-validation. The coefficients of determination (R2) at the calibration phase were ≥0.77 and reached a maximum of 0.89, which is similar to the results for the cross-validation phase (≥0.76, max. 0.88) and the prediction phase (≥0.66, max. 0.89). Additionally, high values of offset, root mean square error (RMSE), and RPD (≥2.1, max. 3.0) were observed, indicating significance in all PLSR parameters. The models showed low or approximately zero bias for corn, sugarcane, coffee, canola, wheat, and tobacco plants. In general, increasing the number of selected target-specific wavelength regions for the cross-validation and prediction models can lead to high accuracy (R2 > 0.77), as shown in Figure S3. For instance, the partial least squares regression (PLSR) method based on UV-VIS-NIR-SWIR hyperspectral data predicts β-loadings and weighted coefficients. However, compared to models that predict pigments in area and mass contents, the reflectance factors in the UV-VIS, NIR, and SWIR regions show high amplitudes of difference coefficients and distinct loadings, with great differences being observed.
To achieve higher precision than the models reported in Table 1 and Figure S3, it is important to consider many peaks and valleys distributed in all spectra. In this way, the high-throughput phenotyping crop-based pigments and β-loadings and weighted coefficients predicted by the PLSR method display larger differences in shape, although they are complex and represent several scattering contributions. Nevertheless, here, the generated models are of high accuracy and precision and have minimal bias and noise.
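The model-quality statistics discussed in this section (R2, RMSE, bias, and RPD) can be computed from paired observed and predicted pigment values. The sketch below uses common chemometric definitions: R2 as one minus the ratio of residual to total sum of squares, and RPD as the sample standard deviation of the reference values divided by the RMSE. The source does not state which variants were used, so these definitions are assumptions, and the numbers are made up.

```python
from math import sqrt
from statistics import mean, stdev


def regression_metrics(observed, predicted):
    """Return R2, RMSE, bias, and RPD for paired observed/predicted values."""
    residuals = [p - o for o, p in zip(observed, predicted)]
    ss_res = sum(r * r for r in residuals)
    obs_mean = mean(observed)
    ss_tot = sum((o - obs_mean) ** 2 for o in observed)
    rmse = sqrt(ss_res / len(observed))
    return {
        "r2": 1.0 - ss_res / ss_tot,      # coefficient of determination
        "rmse": rmse,                     # root mean square error
        "bias": mean(residuals),          # mean signed error
        "rpd": stdev(observed) / rmse,    # ratio of performance to deviation
    }


# Toy values (made up) standing in for measured vs. predicted pigment content.
toy_observed = [1.0, 2.0, 3.0, 4.0, 5.0]
toy_predicted = [1.1, 1.9, 3.2, 3.8, 5.0]
toy_metrics = regression_metrics(toy_observed, toy_predicted)
```

Applying the same function separately to the calibration, cross-validation, and prediction sets would yield the three groups of statistics compared in the text.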

Vegetation Indices and Pigment Profiling
Combining vegetation indices (VIs) resulted in a total of fifteen VIs, which showed both positive (13) and negative (2) statistical values for the classification and estimation of crop attributes (Figures 7 and S3; Table S3). The optimal band combinations for VIs were determined to possess high accuracy, precision, and significance (F: 245.3; p < 0.001) for classifying, predicting, and monitoring pigments. The most responsive indices, PSSRc and RARS, demonstrated high accuracy in distinguishing all phenotypes utilizing UV-VIS-NIR-SWIR hyperspectral analyses (Figure 7A). The correlation between VIs was evaluated using the circular correlation coefficient graph and showed high positive and negative correlations ranging from −1 to +1 (p < 0.001) (Figure 7B). NDVI750, RARS, PSND, and PSSRc displayed strong positive interactions, while PSRI2 and FR showed strong negative interactions (Figure 7B). SIPI, WBI, and MSI demonstrated minimal or negligible correlations with other VIs (Figure 7B).
The prediction analysis showed that WBI, ARI, PSRI2, VOG2, and FR had limited correlations and associations with chlorophylls, carotenoids, anthocyanins, and flavonoids (correlation normalized below 0.50). These findings suggest that the selected reflectance indices have high potential for accurate and precise classification and prediction for the six crops analyzed. On the other hand, other indices showed higher correlation coefficients (above 0.5), and a significant portion of the scatter points fell within the 99% confidence interval of prediction, indicating the suitability of all models tested for each crop.
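Narrow-band vegetation indices of the kind discussed here are simple arithmetic combinations of reflectance at a few wavelengths. The sketch below shows the generic ratio and normalized-difference forms together with two indices named in the text, using formulas that are commonly cited in the literature: WBI as R900/R970 and ARI as 1/R550 − 1/R700. The exact definitions used in this study are given in Table S3, which is not reproduced here, so these band choices are assumptions, as are the helper names.

```python
def reflectance_at(wavelengths, spectrum, target_nm):
    """Reflectance at the band whose centre is closest to target_nm."""
    nearest = min(range(len(wavelengths)), key=lambda i: abs(wavelengths[i] - target_nm))
    return spectrum[nearest]


def simple_ratio(wavelengths, spectrum, numerator_nm, denominator_nm):
    """Generic two-band ratio index: R(numerator) / R(denominator)."""
    return (reflectance_at(wavelengths, spectrum, numerator_nm)
            / reflectance_at(wavelengths, spectrum, denominator_nm))


def normalized_difference(wavelengths, spectrum, a_nm, b_nm):
    """Generic normalized-difference index: (Ra - Rb) / (Ra + Rb)."""
    ra = reflectance_at(wavelengths, spectrum, a_nm)
    rb = reflectance_at(wavelengths, spectrum, b_nm)
    return (ra - rb) / (ra + rb)


def water_band_index(wavelengths, spectrum):
    """WBI as R900 / R970 (literature form; assumed, not taken from Table S3)."""
    return simple_ratio(wavelengths, spectrum, 900, 970)


def anthocyanin_reflectance_index(wavelengths, spectrum):
    """ARI as 1/R550 - 1/R700 (literature form; assumed, not taken from Table S3)."""
    return (1.0 / reflectance_at(wavelengths, spectrum, 550)
            - 1.0 / reflectance_at(wavelengths, spectrum, 700))


# Toy four-band spectrum (values are made up).
toy_wavelengths = [550, 700, 900, 970]
toy_spectrum = [0.1, 0.2, 0.5, 0.4]
toy_wbi = water_band_index(toy_wavelengths, toy_spectrum)
toy_ari = anthocyanin_reflectance_index(toy_wavelengths, toy_spectrum)
```

Looking up the nearest band centre keeps the functions usable at different spectral resolutions, at the cost of silently substituting a neighbouring band when the exact wavelength is missing.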

Remote Sensing Sensor and Pigment Phenotyping in Leaves for High-Throughput Monitoring Crops
The application of phenotyping for crop-based pigment profiling through remote sensing techniques shows substantial promise. The use of full spectra based on hyperspectral curves has been demonstrated to provide a more precise classification and estimation of pigment profiles in C3 and C4 plant metabolisms compared to range spectra [3,11,29]. Moreover, the UV-VIS-NIR-SWIR wavelength bands and sensor resolutions contribute significantly to the efficient differentiation of crop metabolism (C3, C4, or CAM) [7,8,16,20]. Multivariate PCA has also been utilized with high accuracy and precision to differentiate C4 metabolism (as in corn and sugarcane) from C3 metabolism (as in coffee, canola, wheat, and tobacco) using select UV-VIS to NIR to SWIR bands.
The potential for high-throughput pigment phenotyping to evaluate and track growth and development, biophysical and biochemical characteristics, and diseases in crop production has been demonstrated in recent studies [1-3,8,13]. Spectral variations in pigment concentrations, including chlorophyll a and b, are closely correlated with differences in agronomic traits, such as plant height, grain yield, growth cycle, photosynthesis, transpiration, and water use efficiency [6,23,30,31].
Integrating hyperspectral sensors with chemometric techniques has proven effective for identifying and predicting a range of crop characteristics. For example, many studies have demonstrated the use of hyperspectral analyses to differentiate between livestock-integrated farming systems and indoor and vertical farming productions and to determine crop leaf characteristics [6,32-34]. Thus, the methods proposed under our first and second objectives have also shown potential for increasing yield production in crops such as wheat and canola.
According to recent studies by da Silva Junior et al. (2018) [35] and Wang et al. (2022) [7], biosensors are the most promising tools for remote sensing, as demonstrated in Figures 1-5. Hyperspectral analysis of chlorophylls and carotenoids in the VIS band has proven useful for monitoring [13,36]. Meanwhile, the best analysis of anthocyanins and flavonoids can be performed along with other pigment classes in the UV-VIS band [7,37,38]. This approach is a better tool for monitoring the status of six crops, and the increased presence of bioactive compounds or antioxidants in crops holds promise for increased yields [8,13,39,40]. Therefore, the UV-VIS band display serves as a robust technique for phenotyping and chemometrics, amalgamating numerous high-performance attributes in crop sciences, such as rapid precision, minimal sample requirement, and transparent technology for sample analysis, while safeguarding human health, safety, and quality [16,40].

Artificial Intelligence Algorithms Improvement Selection Pigment in Crops
The integration of remote sensing with AI techniques has proven highly effective for crop phenotyping and has contributed to improvements in crop yield, disease resistance, and cultivar selection. The use of hyperspectral analyses in conjunction with AI algorithms, such as random forest (RF) and neural network (NN), for pigment-based profiling has proven effective for classifying and monitoring six key crops with high accuracy. These algorithms can effectively link hyperspectral data with various factors, including nutritional deficiencies, heat and cold stresses, and crop yield [8,34,41].
Accordingly, the most accurate algorithms identified in previous studies include AdaBoost, gradient boosting, k-nearest neighbors, naive Bayes, neural networks, random forests, support vector machines, and decision trees [7,21,28]. However, it is important to note that an accuracy greater than 60% does not necessarily guarantee a strong correlation with pigment concentration in C3 and C4 plants. The use of AI algorithms offers promising prospects for improving the extraction of complex interactions between hyperspectral data and pigment complexes, resulting in more accurate spectral data classification compared to other spectroscopic techniques. However, further research is needed to understand why AI algorithms do not respond to certain changes in the growth and development stages of C3 and C4 plants. Thus, our first and second objectives proposed in this method for our analysis are as follows.

Quantitative and Optimization PLSR Models to Estimate Pigments in Crops
In recent studies by Zhou et al. (2022) [38] and Zhang et al. (2022) [2], PLSR models were utilized to quantify the correlation between ultraviolet, visible, near-infrared, and shortwave infrared spectral data and pigment concentration data in six crops: corn, sugarcane, coffee, canola, wheat, and tobacco. The data were divided into calibration (70%) and validation (30%) sets, with the calibration set of 270 samples and the validation set of 90 samples achieving high accuracy and precision [4,42]. The results highlighted robust generation models based on R2, offset, RMSE, RPD, bias, and weight coefficients, although estimating anthocyanins (AnC) and flavonoids (Flv) proved more challenging [40,43]. Despite the variation between C3 and C4 plants, the highest prediction values were obtained for AnC and Flv, with a full spectra method applied to crop analysis [13,34]. In previous work, spectral data pre-processing was found to remove irrelevant information and improve robustness and accuracy [23,44], but the results here suggest that pre-processing may not be necessary, as spectral data without pre-processing improved the accuracy, precision, and reliability of PLSR models [13,44].
The PLSR models required an optimum of four factors [13,16], with no evidence of overfitting in any of the parameters tested (Table 1). This work is significant, demonstrating the wide application of the PLSR method for detecting issues in future field crops using remote sensing and biosensors within integrated farming systems, as few studies have used combined UV-VIS-NIR-SWIR hyperspectroscopy to analyze Chls, Car, AnC, and Flv pigments in the leaves of C3 and C4 agronomic plants [2,40]. In this way, combining several types of sensors and conducting diverse studies on a large number of parameters of many plants at high speed can additionally be used for the estimation of plant morphological parameters, physiological processes, and biochemical composition. Therefore, the use of reflectance spectroscopy for the classification and prediction of pigment profiling in agronomic crops and for other plant research could be an effective strategy for optimizing growing conditions. This could contribute significantly to progress in the field of agronomic research and practice and relates to our third objective.
The hyperspectral vegetation index (HVI) is emerging as a vital tool for plant phenotyping, particularly for assessing pigmentation levels such as chlorophyll a, b, a+b, carotenoids, anthocyanins (AnC), and flavonoids (Flv). Using hyperspectral reflectance, HVI can select the most responsive wavelengths, optimizing methods for remote and proximal sensing of physiological, biochemical, and morphological characteristics of plants. This provides a non-invasive, highly accurate method for assessing plant health and development, and it could revolutionize current agronomic practices. As such, HVI presents a promising alternative for enhancing our understanding and management of pigment phenotyping in plants. In this sense, using a specific wavelength, such as 435, 470, 550, 680, 685, 705, or 750 nm, instead of broader bands such as blue, green, red, or near-infrared, holds more promise for characterizing C3 and C4 crops. Therefore, HVI algorithms and vegetation indices based on narrow bands could potentially enable more precise classification when different pigments are present in crop varieties within the same environment.
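A minimal sketch of how narrow-band selection of this kind can work is given below. It assumes a normalized-difference form, (Ri − Rj)/(Ri + Rj), for every two-band combination and ranks the pairs by the squared Pearson correlation with a measured trait; the index form, the function names, and the synthetic reflectance values are assumptions for illustration, not the algorithm or data used in this study.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def best_band_pair(spectra, wavelengths, trait):
    """Scan all two-band normalized differences; return (r2, wl_i, wl_j) of the best pair."""
    best = (-1.0, None, None)
    for i, j in combinations(range(len(wavelengths)), 2):
        index = [(s[i] - s[j]) / (s[i] + s[j]) for s in spectra]
        r2 = pearson_r(index, trait) ** 2
        if r2 > best[0]:
            best = (r2, wavelengths[i], wavelengths[j])
    return best


# Synthetic leaf reflectance at four narrow bands (nm); values are made up
wavelengths = [550, 680, 705, 750]
spectra = [
    [0.12, 0.05, 0.20, 0.50],
    [0.10, 0.06, 0.15, 0.52],
    [0.15, 0.04, 0.25, 0.48],
    [0.11, 0.07, 0.18, 0.55],
    [0.14, 0.05, 0.30, 0.45],
]
# Synthetic trait built from the 705/750 nm pair so the scan has a known answer
trait = [20.0 + 60.0 * (s[3] - s[2]) / (s[3] + s[2]) for s in spectra]

best = best_band_pair(spectra, wavelengths, trait)
print(best)
```

With a full 350-2500 nm spectrum, the same loop yields one r² value per wavelength pair, which is the quantity usually drawn as a contour map.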
A recent study by Koh et al. (2022) [45] discussed how hyperspectral vegetation indices (VIs) are increasingly being used in agriculture and plant phenotyping to estimate plant biophysical and biochemical traits. This study presented an automated hyperspectral vegetation index (AutoVI) system for the rapid generation of novel trait-specific indices and showed that AutoVI can rapidly generate complex novel VIs that correlate strongly with the measured chlorophyll and sugar contents in wheat [45,46].
While many reflectance indices have demonstrated superior performance in estimating certain plant traits compared to existing vegetation indices [47], there is still a need for a more robust model [15,48]. Specifically, we lack a system that adequately incorporates data related to anthocyanins and flavonoids in plants of agronomic importance [43,48]. Enhancing the current models to incorporate these pigment compounds could provide a more comprehensive assessment of plant health and development, thus further optimizing high-throughput plant pigment phenotyping platforms [36,41,47]. Therefore, it is crucial to continue research efforts to enhance the accuracy and efficiency of plant pigment phenotyping methodologies.

Vegetation Indices Combined for Pigment Phenotyping
Vegetation indices (VIs) are utilized for quantifying and expressing variables in crops, addressing the phenotyping gap discussed in numerous studies [16,34,47,49]. VIs, obtained by combining remote sensing reflectance data from UV-VIS-NIR-SWIR regions, are simple and effective parameters for characterizing vegetation cover and plant growth status. Combining reflectance indices with machine learning (ML) algorithms has led to successful phenotyping with high accuracy and precision [16,34]. Thus, the relative contributions of VIs were studied and could be used to select VIs corresponding to each pigment in different crops and for successful monitoring [24,50].
Morphological and anatomical changes can be identified using different VIs [7,13,16]. Evaluations of the mutual effect of VIs and pigment biosynthesis and degradation showed that specific or range bands can correlate with pigment classes and concentrations, structural components, and cell organelles. Digital agriculture tools linked with high-throughput pigment phenotyping can classify changes in C3 and C4 plant production features with high accuracy, speed, and efficiency at a low cost per sample [2,7,34,49]. These changes facilitate pigment profiling and identification by hyperspectral sensors, machine learning, AIAs, and recent HVI algorithms, providing an alternative to the standard approach of distribution within the leaf profile, both changing the optical properties of leaves in visible bands (>500 nm) and generating different VIs.
Our results align with those of Fu et al. (2021) [4] and Braga et al. (2021) [29], in which pigment concentrations and contents represent at most one responsive pathway for the classification of distinct plants [4,29]. Thus, the initial hypothesis indicated that the monitoring status significantly impacts classification, but the estimation process requires monitoring over a different timescale. This is confirmed when significant alterations in the visible and near-infrared spectra are not correlated with classic VIs, such as NDVI750, VOG, PSI, and PRI, despite extensive changes in colorimetric patterns and leaf reflectance factors (R) (Figures 2-4). It is noteworthy that changes in cellular components should not affect spectral changes in the visible spectrum [13], but this phenomenon is commonly observed both in ranges and in full bands [47]. Therefore, the correlation previously reported is valid, as are the hypotheses initially described.
The internal structure of a leaf, such as the volume of mesophyll cells, varies among species [29]. As a result, the near-infrared (NIR) and shortwave infrared (SWIR) regions are greatly influenced by air-cell interfaces. Additionally, the outer surface characteristics, including waxiness and epicuticular metabolites, trichomes, and stomata, also impact reflectance spectra in important crops. Therefore, the interplay of these external and internal leaf features with reflectance factors and pigment phenotyping may alter the energy flow within the leaf, potentially inducing toxic effects and impeding photosynthesis. Hence, the use of remote sensing sensors for high-throughput pigment phenotyping in crops is of utmost importance.
In contrast with Crusiol et al. (2022) [51] and Crusiol et al. (2023) [23], who did not find significant variation in pigments or the water status in soybean crops when monitored by combined AIAs, NB, and remote sensors, our study observed this variation in energy dissipation components. These components are mostly influenced by non-photosynthetic pigments, thermal influence by antioxidant mechanisms, the efficient use of water (WUE), and leaf water content (WBI, DSWI-5) (Figures 7 and 8). Therefore, these components exhibited changes in VIs across all regions of the analyzed spectra (350-2500 nm) [13]. Thus, it is important to evaluate the impact of grafting and various classes of pigments on crops such as corn, sugarcane, coffee, canola, wheat, and tobacco. We found that plants subjected to these classifications exhibited a higher degree of precision and accuracy in their monitoring status. This precision and accuracy were superior to, for instance, those stemming from integrated systems of cultivation and digital agriculture.

Plant Materials
Six plants of agronomic importance, corn (Zea mays L.), sugarcane (Saccharum officinarum L.), coffee (Coffea arabica L.), canola (Brassica napus L.), wheat (Triticum aestivum L.), and tobacco (Nicotiana tabacum L.), were collected from the Plant Cultivation Farms of the State University of Maringá (Maringá, Paraná, Brazil). The selection of these crop plants was based on their leaf development patterns. A total of 60 leaves were collected from each plant group, resulting in a total of 360 samples analyzed for modeling. Furthermore, plant metabolism was clustered into C3 and C4 categories based on wavelength bands, with corn and sugarcane classified as C4 metabolism, and coffee, canola, wheat, and tobacco classified as C3 metabolism. All leaves were collected during the vegetative growth phase and immediately taken to the laboratory for in vivo analyses (hyperspectral measurements) and in vitro analyses (extraction of pigment profiles). For the wheat plants, the flag leaf was evaluated.

Pigment Quantifications and Hyperspectral Analysis
The quantification of chlorophylls and carotenoids (Chla, Chlb, Chla+b, and Car), anthocyanins (AnC), and flavonoids (Flv) was performed using a methanolic extract-based method. The absorbance curve of pigments in vitro was analyzed between 350 and 1100 nm using a Shimadzu UV-3600 Plus UV-VIS-NIR spectrophotometer (Shimadzu Inc., Tokyo, Japan). The quantification of pigment profiles was performed and expressed in terms of area and mass, as detailed in [14,26].
In addition, hyperspectral reflectance was measured using a spectroradiometer (FieldSpec 3, ASD Inc., Boulder, CO, USA) across the ultraviolet, visible, near-infrared, and shortwave infrared bands, following the methodology outlined in [15]. In brief, the PlantProbe® leaf clip (Analytical Spectral Devices ASD Inc., Longmont, CO, USA) was used to ensure data acquisition free of atmospheric effects. Standard white reference plates (Spectralon®, Labsphere Inc., Longmont, CO, USA) were employed for equipment calibration and optimization. Reflectance spectra of the leaves were obtained in the 350-2500 nm range. The equipment was programmed to perform 50 readings for each sample, generating an average spectral curve. Measurements were taken at a single point on the adaxial face of the leaves. Furthermore, the same leaves used for pigment quantification were also analyzed with the spectroradiometer to establish correlations between the pigment profiles and the corresponding reflectance spectra. This step was crucial for generating and validating the models.

Analysis of Variance and Descriptive Statistics
One-way analysis of variance was performed to analyze the data. The results were considered statistically significant if the p-value was less than 0.001. To compare attributes, a post hoc Duncan's test was applied. Pearson's correlation was also calculated (p < 0.001). The results included the means, medians, minimum, maximum, and coefficient of variation (CV) of the calculated data, following the methods reported by [27].
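The two simplest quantities in this step, the coefficient of variation and the one-way ANOVA F statistic, can be sketched as follows. The sketch assumes the textbook definitions (CV as sample standard deviation over mean, F as the ratio of between-group to within-group mean squares); the crop labels and values are made up, and the p-value and Duncan's post hoc test are deliberately left out.

```python
import statistics


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation divided by the mean."""
    return 100.0 * statistics.stdev(values) / statistics.fmean(values)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    ss_within = sum(
        sum((v - statistics.fmean(g)) ** 2 for v in g) for g in groups
    )
    df_between, df_within = k - 1, n_total - k
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Made-up pigment values for three hypothetical crop groups
groups = {
    "corn": [1.0, 2.0, 3.0],
    "wheat": [2.0, 3.0, 4.0],
    "coffee": [5.0, 6.0, 7.0],
}
f_value, df_b, df_w = one_way_anova_f(list(groups.values()))
cv_corn = coefficient_of_variation(groups["corn"])
print(f_value, df_b, df_w, cv_corn)
```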

Analysis of Leaf Reflectance Spectral Fingerprints
Based on the hyperspectral reflectance curves, parameters derived from the machine learning and artificial intelligence algorithm decision analysis were evaluated for the main fingerprint groups associated with wavelengths (Table S1). The fingerprints, along with vibrational modes, were related to the C3 and C4 metabolisms of corn, sugarcane, coffee, canola, wheat, and tobacco crops. The correlations of coefficients, principal component (PC) loadings, beta-loadings, and weighted coefficients were analyzed to identify the key fingerprints. Principal component analysis (PCA) was performed using The Unscrambler (CAMO AS, Oslo, Norway).

Machine Learning, Artificial Algorithms, and Hyperspectral Vegetation Index
Machine learning (ML) and artificial intelligence algorithms (AIAs) were used to perform routine analysis with Orange software scripts. The algorithms employed included AdaBoost (AdaB), gradient boosting (GB), kernel k-nearest neighbors (KNN), naive Bayes (NB), neural network (NN), random forest (RF), support vector machines (SVM), and tree (Tree). The evaluation model was based on a 70:30 data split, with 70% of the data used for training and 30% for testing. The performance of the prediction AIAs was evaluated in terms of ranked precision and recall data, as detailed by [28]. In addition, each algorithm model was subjected to a confusion matrix analysis to assess the predicted data. All algorithm testing followed [13]. The hyperspectral vegetation index (HVI) method was also used to calculate all possible combinations of two spectral bands, each corresponding to a unique HVI algorithm. These HVIs were then correlated with leaf optical property efficiency measures, using the Pearson correlation coefficient and the coefficient of determination, and visualized as contour maps [23].
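The evaluation scheme described above can be sketched in a few lines. The example assumes a split stratified by crop (so that each crop contributes 70% of its leaves to training) and the standard per-class definitions of precision and recall derived from a confusion matrix; the stratification, the function names, the seed, and the toy predictions are assumptions for illustration and do not reproduce the Orange workflow.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, test_idx), splitting each class by train_fraction."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        cut = round(train_fraction * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test


def precision_recall(y_true, y_pred):
    """Return the confusion counts and {class: (precision, recall)}."""
    confusion = Counter(zip(y_true, y_pred))  # keys are (true, predicted)
    classes = sorted(set(y_true) | set(y_pred))
    report = {}
    for c in classes:
        tp = confusion[(c, c)]
        predicted = sum(confusion[(t, c)] for t in classes)
        actual = sum(confusion[(c, p)] for p in classes)
        report[c] = (
            tp / predicted if predicted else 0.0,
            tp / actual if actual else 0.0,
        )
    return confusion, report


# 60 leaves per crop, as in the sampling design
crops = ["corn", "sugarcane", "coffee", "canola", "wheat", "tobacco"]
labels = [crop for crop in crops for _ in range(60)]
train_idx, test_idx = stratified_split(labels)

# Toy predictions (made up) to show the precision/recall calculation
y_true = ["corn", "corn", "corn", "wheat", "wheat", "coffee"]
y_pred = ["corn", "corn", "wheat", "wheat", "wheat", "coffee"]
confusion, report = precision_recall(y_true, y_pred)
print(len(train_idx), len(test_idx), report)
```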

Vegetation Indices
The vegetation indices (VIs) were calculated based on the descriptions in Table S1. Reflectance hyperspectral data were tested with 15 VIs: NDVI750, WBI, RARS, ARI1, PSND, SIPI, PSRI, PSRI2, PSSRc, VOG1, VOG2, MSI, PRI, PVR, and FR (Table S1). These VIs were used to correlate pigment profiling and make decisions for the best crop classification based on correlation and association with a significance level of p < 0.001.
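As an illustration of how such indices are computed from a single reflectance spectrum, the sketch below uses widely cited forms of three of them: PRI as (R531 − R570)/(R531 + R570), WBI as R900/R970, and a red-edge normalized difference as (R750 − R705)/(R750 + R705). These formulas are assumptions taken from the general literature; the exact definitions in Table S1, in particular for NDVI750, may differ, and the reflectance values are made up.

```python
def normalized_difference(spectrum, band_a, band_b):
    """(R_a - R_b) / (R_a + R_b) for a {wavelength_nm: reflectance} mapping."""
    r_a, r_b = spectrum[band_a], spectrum[band_b]
    return (r_a - r_b) / (r_a + r_b)


def vegetation_indices(spectrum):
    """Compute three illustrative indices; formulas are assumed, not from Table S1."""
    return {
        "PRI": normalized_difference(spectrum, 531, 570),
        "WBI": spectrum[900] / spectrum[970],
        "ND_750_705": normalized_difference(spectrum, 750, 705),
    }


# Made-up reflectance factors for one leaf at the bands needed above
leaf = {531: 0.10, 570: 0.12, 705: 0.20, 750: 0.50, 900: 0.52, 970: 0.50}
indices = vegetation_indices(leaf)
print(indices)
```

Keeping the spectrum as a wavelength-keyed mapping makes each index read like its published formula, which helps when checking definitions against a supplementary table.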

Conclusions
In conclusion, this method enables efficient and simultaneous quantification of chlorophylls, carotenoids, anthocyanins, and flavonoids in six crops (corn, sugarcane, coffee, canola, wheat, and tobacco) based on area and mass. The approach combines artificial intelligence, vegetation indices, wavelength selection, and remote sensing sensors to develop UV-VIS-NIR-SWIR models, contributing to advancements in digital agriculture. Our results demonstrate that PLSR, vegetation indices such as NDVI750, VOG, PSI, and PRI, and hyperspectral vegetation index (HVI) algorithm models exhibit precision and accuracy in calibration, cross-validation, and prediction. They provide comprehensive, robust, and rapid evaluation tools for crop quality and agronomic science. In addition, the use of full spectra based on hyperspectral curves has been demonstrated to provide a more precise classification and estimation of pigment profiles in C3 and C4 plant metabolisms compared to range spectra. Consequently, our study offers promising opportunities for the simultaneous use of algorithms in deciphering complex interactions between hyperspectral data and pigment phenotyping in plants. This could lead to more accurate and precise spectral data classifications in the future, benefiting remote sensing in plant research.