Chemometric Analysis for the Prediction of Biochemical Compounds in Leaves Using UV-VIS-NIR-SWIR Hyperspectroscopy

Reflectance hyperspectroscopy is recognised for its potential to elucidate biochemical changes, thereby enhancing the understanding of plant biochemistry. This study used the UV-VIS-NIR-SWIR spectral range to identify the different biochemical constituents in Hibiscus and Geranium plants. Hyperspectral vegetation indices (HVIs), principal component analysis (PCA), and correlation matrices provided in-depth insights into spectral differences. The most responsive wavelengths were identified with the PLS, VIP, iPLS-VIP, GA, RF, and CARS algorithms. PLSR models consistently achieved R² values above 0.75, with prediction correlation coefficients of 0.86 for DPPH and 0.89 for lignin. The red-edge and SWIR bands were strongly associated with key plant pigments and structural molecules, broadening current perspectives on leaf spectral dynamics. These findings highlight the efficacy of spectroscopy coupled with multivariate analysis for evaluating biochemical compounds in leaves. A technique was introduced to measure photosynthetic pigments and structural compounds via UV-VIS-NIR-SWIR hyperspectroscopy, underpinned by rapid multivariate PLSR. Collectively, our results underscore the growing potential of hyperspectroscopy in precision agriculture and point to a promising shift in plant phenotyping and biochemical evaluation.


Introduction
Over the past few years, hyperspectral spectroscopy has become prominent in botanical and agronomic research, bridging the intricacies of plant biology with state-of-the-art technological advances [1,2]. Advances in remote sensing technology, particularly hyperspectral non-imaging and imaging, have expanded the frontiers of precision agriculture, environmental monitoring, and plant physiology research. Spanning the regions from ultraviolet to shortwave infrared (UV-VIS-NIR-SWIR),
Table 1. Descriptive statistics of biochemical parameters measured in leaves of Hibiscus rosa-sinensis L. (Hibiscus) and Pelargonium zonale (L.) L'Hér. ex Aiton (Geranium). For each parameter, the table presents the count (n), mean, median, minimum, maximum, and coefficient of variation (CV %) (n = 200).

Principal component analysis (PCA) was performed to explore the data further, and the results are shown in Figure 1B. The first two dimensions, Dimension 1 (Dim 1) and Dimension 2 (Dim 2), together accounted for 63.4% of the variance (42.2% for Dim 1 and 21.2% for Dim 2). The clear clustering in the PCA plot highlights the inherent spectral differences and distinct biochemical compositions of Hibiscus and Geranium. In addition, the vectors show stronger correlations for mass-based compounds (Car, Chla, Chla+b, Flv) in Hibiscus and for area-based compounds (Chlb, Chla+b, Car, Chla) in Geranium. This analysis aimed to identify specific compounds and their interactions, providing insights into the distinctive biochemical and structural attributes of each species (Figure 1).
The measured extracts and growth variables included area-based chlorophyll a (Chla(area)), area-based chlorophyll b (Chlb(area)), area-based chlorophyll a+b (Chla+b(area)), area-based carotenoids (Car(area)), mass-based chlorophyll a (Chla(mass)), mass-based chlorophyll b (Chlb(mass)), mass-based chlorophyll a+b (Chla+b(mass)), mass-based carotenoids (Car(mass)), the chlorophyll a/b ratio (Chla/b), area-based flavonoids (Flv(area)), mass-based flavonoids (Flv(mass)), phenolic compounds (Phe), radical scavenging activity (DPPH), lignin, and cellulose (n = 200).
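The PCA of the biochemical variables can be sketched in NumPy as follows. This is a minimal illustration, not the authors' pipeline: it assumes each variable is z-scored before decomposition (the scaling is not stated here), and the function name `pca_standardized` and the random input matrix are hypothetical. The loadings are scaled so that they equal the correlation of each variable with each component, which is what the vectors of a biplot usually display.

```python
import numpy as np


def pca_standardized(data):
    """PCA of a samples x variables matrix after z-scoring each variable.

    Returns (scores, loadings, explained_ratio). Loadings are eigenvectors
    scaled by the square root of their eigenvalues, i.e. the correlation of
    each variable with each component (the arrows of a biplot).
    """
    data = np.asarray(data, dtype=float)
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    u, s, vt = np.linalg.svd(z, full_matrices=False)   # z = U S Vt
    eigvals = s**2 / (z.shape[0] - 1)                   # variance carried by each component
    scores = u * s                                      # sample coordinates (Dim 1, Dim 2, ...)
    loadings = vt.T * np.sqrt(eigvals)
    explained_ratio = eigvals / eigvals.sum()
    return scores, loadings, explained_ratio


# Hypothetical data: 200 leaves x 15 biochemical variables driven by 2 latent factors
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
biochem = latent @ rng.normal(size=(2, 15)) + 0.3 * rng.normal(size=(200, 15))
scores, loadings, ratio = pca_standardized(biochem)
print(f"Dim 1: {ratio[0]:.1%}  Dim 2: {ratio[1]:.1%}")
```

Because every standardised variable has unit variance, its squared loadings summed over all components equal 1, a quick sanity check on any implementation.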

Spectral Reflectance and Principal Component Analysis of Hibiscus and Geranium Leaves
Following the vector analysis, the hyperspectral reflectance of Hibiscus rosa-sinensis L. and Pelargonium zonale (L.) L'Hér. ex Aiton was evaluated across the UV-VIS-NIR-SWIR bands at a spectral resolution of 1 nm (Figure 2). Within this spectrum, clear demarcations at 700 nm and 1300 nm mark the transitions from the visible (VIS) to the near-infrared (NIR) region and from the NIR to the shortwave infrared (SWIR) region, respectively. In addition, a t-test comparison yielded t = 9.38 with p = 0.03, signifying marked differences in the biochemical attributes of the leaves of the two species (Figure 2A).
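A two-sample t-test of this kind can be sketched with the Python standard library. The text does not state which t-test variant or which sample unit (for example, per-leaf mean reflectance) was used, so this sketch assumes Welch's unequal-variance test; `welch_t` and the example values are hypothetical. It returns the statistic and the Welch-Satterthwaite degrees of freedom; a p-value additionally requires the t distribution, for example via `scipy.stats.ttest_ind(a, b, equal_var=False)`.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    se2_a = statistics.variance(sample_a) / n_a  # squared standard error of mean A
    se2_b = statistics.variance(sample_b) / n_b  # squared standard error of mean B
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))
    return t, df


# Hypothetical per-leaf mean reflectance values for the two species
hibiscus = [0.31, 0.29, 0.33, 0.30, 0.32]
geranium = [0.24, 0.26, 0.25, 0.23, 0.27]
t_stat, dof = welch_t(hibiscus, geranium)
print(f"t = {t_stat:.2f}, df = {dof:.1f}")
```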
To further elucidate these differences, principal component analysis (PCA) was applied to the hyperspectral curves (Figure 2B). The first principal component (PC1) accounted for 83% of the total variance and the second (PC2) for 15%. The mean PC1 score was 0.932 for Hibiscus and −0.923 for Geranium, with an accuracy of 0.66 and a Kappa coefficient of 0.64, emphasising the distinct spectral characteristics of each species. This analysis was conducted to identify the specific compounds in the leaves of both plants.
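The species separation along PC1 and the agreement statistics can be sketched as follows. The text does not describe how the accuracy and Kappa values were obtained; this sketch assumes, purely for illustration, that each leaf is assigned to a species by the sign of its PC1 score. Because the sign of a principal component is arbitrary, the axis is first oriented so that Hibiscus has a positive mean score. Function names and the synthetic spectra are hypothetical.

```python
import numpy as np


def pc1_scores(spectra):
    """PC1 scores and explained-variance ratios of mean-centred spectra (samples x wavelengths)."""
    centred = spectra - spectra.mean(axis=0)
    _, s, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[0], s**2 / np.sum(s**2)


def accuracy_and_kappa(y_true, y_pred):
    """Overall accuracy and Cohen's kappa for two label sequences."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    p_obs = np.mean(y_true == y_pred)
    # agreement expected by chance, from the marginal frequency of each label
    labels = np.unique(np.concatenate([y_true, y_pred]))
    p_exp = sum(np.mean(y_true == c) * np.mean(y_pred == c) for c in labels)
    return p_obs, (p_obs - p_exp) / (1 - p_exp)


# Hypothetical spectra: 50 leaves per species, 200 bands, species differ in overall level
rng = np.random.default_rng(1)
species = np.array(["Hibiscus"] * 50 + ["Geranium"] * 50)
offset = np.where(species == "Hibiscus", 0.05, 0.0)[:, None]
spectra = rng.normal(0.3, 0.02, size=(100, 200)) + offset
pc1, explained = pc1_scores(spectra)
if pc1[species == "Hibiscus"].mean() < 0:   # fix the arbitrary PC sign
    pc1 = -pc1
predicted = np.where(pc1 > 0, "Hibiscus", "Geranium")
acc, kappa = accuracy_and_kappa(species, predicted)
print(f"PC1 {explained[0]:.0%}, accuracy {acc:.2f}, kappa {kappa:.2f}")
```

Kappa discounts the agreement expected by chance from the label frequencies, which is why it is reported alongside raw accuracy.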

The observed spectral differences between the two species highlight their unique biochemical and structural leaf optical properties.
The subsequent components, PC-3, PC-4, and PC-5, contributed minimally, accounting for just over 1% of the total variance, and PC-6 and PC-10 contributed 0.05% (Figure 3). The cumulative variability across the components is depicted with red circles, confirming the dominance of PC-1 and PC-2 in capturing the spectral differences between Hibiscus and Geranium (Figures 2 and 3). This analysis underscores the inherent spectral variability and the large contribution of the initial components to the variance in the data (Figure 3).
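Reading a cumulative-variance plot amounts to a running sum of the explained-variance ratios. In the minimal sketch below, the 83% and 15% shares come from the text, while the remaining shares are hypothetical values chosen only to illustrate the calculation; the 95% retention threshold is likewise an assumption, not a criterion stated by the authors.

```python
import numpy as np


def components_to_reach(explained_ratio, threshold=0.95):
    """Cumulative variance and the smallest number of leading PCs reaching `threshold`."""
    cumulative = np.cumsum(explained_ratio)
    if threshold > cumulative[-1]:
        raise ValueError("threshold exceeds the total explained variance")
    # searchsorted returns the first index where cumulative >= threshold
    return cumulative, int(np.searchsorted(cumulative, threshold)) + 1


# PC1 and PC2 shares from the text; PC3 onwards are hypothetical
ratios = np.array([0.83, 0.15, 0.011, 0.005, 0.004])
cumulative, n_pc = components_to_reach(ratios)
print(np.round(cumulative, 3), n_pc)
```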


Calibration, Cross-Validation, and Prediction of Biochemical Parameters in Hibiscus and Geranium Leaves
Calibration and cross-validation were undertaken using partial least squares regression (PLSR) to establish the relationships between hyperspectral reflectance data and the biochemical parameters in leaves of Hibiscus rosa-sinensis L. and Pelargonium zonale (L.) L'Hér. ex Aiton (Table 2). During calibration, total chlorophyll expressed per unit area (Chla+b(area)) and per unit mass (Chla+b(mass)) yielded R² values of 0.73 and 0.14, respectively, which decreased to 0.71 and 0.08 upon cross-validation. For chlorophyll a (Chla, mg m⁻²), R² was 0.73 in calibration and 0.71 in cross-validation. Chlorophyll b (Chlb, mg m⁻²) recorded 0.63 in calibration and 0.60 in cross-validation. Carotenoids expressed per area and per mass gave 0.86 and 0.49 in calibration and 0.85 and 0.43 in cross-validation. Other significant biochemical parameters were also examined. For instance, the lignin model reached R² = 0.74 in calibration and 0.71 in cross-validation. The radical scavenging potential (DPPH), cellulose, and flavonoids (Flv) showed significantly higher values; nevertheless, further research or data acquisition is required to establish the contribution of the most associated bands for these metrics (Table 2).
Table 2. Calibration and cross-validation statistics for biochemical parameters measured in leaves of Hibiscus rosa-sinensis L. (Hibiscus) and Pelargonium zonale (L.) L'Hér. ex Aiton (Geranium) using PLS regression models. The table presents the maximum PLS factor, coefficient of determination (R²), offset, root mean square error (RMSE), and ratio of prediction to deviation (RPD) for each parameter during calibration and cross-validation (n = 140).

Such patterns, spanning chlorophyll to lignin, underscore the robustness of the PLSR models for area-based parameters, whereas the models for mass-based units were weaker. The consistent correlations between the reflectance data and biochemical parameters indicate that the models successfully predicted and validated these parameters in the leaves of the study plants (Table 2).
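The calibration and cross-validation statistics in Table 2 can be illustrated with a small NumPy implementation of single-response PLS (PLS1) fitted by the NIPALS algorithm. This is a sketch under stated assumptions, not the authors' software: the 10-fold cross-validation scheme, the synthetic data, and all function names are hypothetical, and RPD is computed as the standard deviation of the reference values divided by the RMSE, one common definition. scikit-learn's `PLSRegression` combined with `cross_val_predict` offers equivalent, better-tested functionality.

```python
import numpy as np


def pls1_fit(X, y, n_comp):
    """Single-response PLS (PLS1) by NIPALS; returns (x_mean, y_mean, coef)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)        # weight vector (unit length)
        t = Xk @ w                       # scores
        tt = t @ t
        p = Xk.T @ t / tt                # X loadings
        c = (yk @ t) / tt                # y loading
        Xk = Xk - np.outer(t, p)         # deflate X and y
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)   # regression vector on the centred bands
    return x_mean, y_mean, coef


def pls1_predict(model, X):
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean


def fit_stats(y, y_hat):
    """R^2, RMSE and RPD (standard deviation of reference values / RMSE)."""
    resid = y - y_hat
    rmse = np.sqrt(np.mean(resid**2))
    r2 = 1.0 - np.sum(resid**2) / np.sum((y - y.mean()) ** 2)
    return r2, rmse, np.std(y, ddof=1) / rmse


def kfold_predictions(X, y, n_comp, k=10, seed=0):
    """Cross-validated predictions: each sample is predicted by a model that never saw it."""
    idx = np.random.default_rng(seed).permutation(len(y))
    y_cv = np.empty(len(y))
    for fold in np.array_split(idx, k):
        train = np.setdiff1d(idx, fold)
        y_cv[fold] = pls1_predict(pls1_fit(X[train], y[train], n_comp), X[fold])
    return y_cv


# Hypothetical data: 140 leaves x 20 bands, trait driven by the first 5 bands
rng = np.random.default_rng(1)
X = rng.normal(size=(140, 20))
beta = np.zeros(20)
beta[:5] = [1.0, -2.0, 0.5, 1.5, -1.0]
y = X @ beta + 0.1 * rng.normal(size=140)

model = pls1_fit(X, y, n_comp=5)
r2_cal, rmse_cal, _ = fit_stats(y, pls1_predict(model, X))
r2_cv, rmse_cv, rpd_cv = fit_stats(y, kfold_predictions(X, y, n_comp=5))
print(f"R2 cal={r2_cal:.3f}  R2 cv={r2_cv:.3f}  RPD cv={rpd_cv:.2f}")
```

Cross-validated R² is usually somewhat lower than calibration R² because each prediction comes from a model fitted without that sample, which matches the pattern of small decreases described above.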
The validation and prediction phases further attested to the accuracy of the established PLSR models and clarified the correlations between hyperspectral reflectance data and leaf biochemical parameters (Table 3 and Figure 4). The root mean square error of prediction (RMSEP) values were particularly informative, and the precision of the PLSR models is shown in the scatter plots of Figure 4. The correlation, slope, offset, and other predictive statistics for the biochemical parameters of both plants are summarised in Table 3.
Table 3. Predictive statistical parameters obtained from PLS regression models for biochemical parameters in leaves of Hibiscus rosa-sinensis L. (Hibiscus) and Pelargonium zonale (L.) L'Hér. ex Aiton (Geranium). The table presents the maximum PLS factor, correlation coefficient (r), slope, offset, standard error of prediction (SEP), ratio of prediction to deviation (RPD), bias, and the linear equation relating prediction to the calibration model (R²P) (n = 140).
Chlorophyll a (Chla, mg m⁻²) exhibited a high correlation coefficient (r = 0.93) and a robust RPD of 2.65. With a maximum PLS factor of 6, the model had a slope of 0.70, an offset of 303.9, an SEP of 144.5, and a bias of 97.2, underpinning its ability to predict Chla concentrations. Chlorophyll b (Chlb, mg m⁻²) and total chlorophyll (Chla+b, mg m⁻²) had correlation coefficients of 0.85 and 0.86, respectively, and their PLSR models predicted these concentrations with biases of 124.2 and 286.
Carotenoids (Car, mg m⁻²), which are essential for photoprotection in plants, stood out with an r value of 0.96 and an RPD of 3.53, indicating precise prediction of carotenoid concentrations. With a PLS factor of 6, the Car (mg m⁻²) model had a slope of 0.90 and a low bias of 1.8, reinforcing its predictive accuracy.

In contrast, flavonoids (Flv, nmol cm⁻²) showed a much lower correlation coefficient of 0.05. With a maximum PLS factor of 2, the prediction had an SEP of 0.7 and a near-neutral bias of 0.6. However, when flavonoids were expressed on a mass basis (Flv, µmol g⁻¹), the correlation improved markedly to 0.83, with an RPD of 1.81, in agreement with the calibration model (R² = 0.62). Phenolic compounds (Phe, mL L⁻¹) registered an RMSEP of 1.35 but a correlation coefficient of −0.11, indicating that the model did not predict phenolic content reliably. By contrast, antioxidant capacity, represented by DPPH, exhibited a strong correlation coefficient of 0.86 and an RPD of 1.98, highlighting the model's reliability for predicting antioxidant levels in plant samples.
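Area- and mass-based contents are linked through leaf mass per area (LMA), which varies from leaf to leaf, so converting between the two bases rescales each sample by its own factor; correlations obtained on one basis therefore need not carry over to the other. The sketch below shows only the unit arithmetic. LMA is not reported in this section, the mass-based values in the study may have been measured directly rather than converted, and the helper name is hypothetical.

```python
def area_to_mass_basis(content_nmol_per_cm2, lma_g_per_m2):
    """Convert an area-based content (nmol cm^-2) to a mass basis (umol g^-1).

    1 nmol cm^-2 = 1e-3 umol per 1e-4 m^2 = 10 umol m^-2; dividing by the
    leaf mass per area (g m^-2) then gives umol g^-1.
    """
    if lma_g_per_m2 <= 0:
        raise ValueError("leaf mass per area must be positive")
    return content_nmol_per_cm2 * 10.0 / lma_g_per_m2


# Two hypothetical leaves with the same area-based content but different LMA
print(area_to_mass_basis(5.0, 50.0), area_to_mass_basis(5.0, 100.0))
```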
Lignin (mg g⁻¹), crucial for plant structural integrity, showed notable performance, with a correlation coefficient of 0.89 and an RPD of 2.24. The ability of this model to predict lignin concentrations with a PLS factor of 7 is significant, given lignin's pivotal role in plant physiology and its influence on leaf reflectance spectra. For the cellulose PLSR model, with a maximum PLS factor of 5, the correlation coefficient (r) was 0.91, indicating a strong relationship between predicted and observed cellulose concentrations, and the RPD of 2.54 demonstrates its reliability. Overall, the PLSR models predicted most of the biochemical attributes of the examined leaves with good precision and reliability, with the area-based flavonoid and phenolic models as the main exceptions. This performance across parameters testifies to the potential of non-imaging hyperspectroscopy coupled with PLSR for estimating plant chemometric parameters (Table 3 and Figure 4).
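The prediction statistics reported in Table 3 can be computed from the observed and predicted values of an independent set. The sketch below uses common chemometric definitions: bias as the mean residual, SEP as the bias-corrected standard deviation of the residuals, and RPD as the standard deviation of the observed values divided by SEP. Whether Table 3 uses exactly these conventions, or regresses predicted on observed values for the slope and offset as done here, is not stated, so treat them as assumptions; the function name and example values are hypothetical.

```python
import numpy as np


def prediction_stats(observed, predicted):
    """Validation statistics for an independent prediction set."""
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    resid = pred - obs
    bias = resid.mean()                                          # mean systematic offset
    sep = np.sqrt(np.sum((resid - bias) ** 2) / (len(obs) - 1))  # bias-corrected standard error
    rmsep = np.sqrt(np.mean(resid**2))
    r = np.corrcoef(obs, pred)[0, 1]
    slope, offset = np.polyfit(obs, pred, 1)                     # predicted = slope * observed + offset
    rpd = obs.std(ddof=1) / sep
    return {"r": r, "slope": slope, "offset": offset, "SEP": sep,
            "RMSEP": rmsep, "bias": bias, "RPD": rpd}


# Hypothetical observed and predicted concentrations
obs = [10.0, 12.0, 15.0, 18.0, 20.0, 25.0]
pred = [11.0, 12.5, 14.0, 19.0, 21.5, 24.0]
stats = prediction_stats(obs, pred)
print({name: round(float(value), 3) for name, value in stats.items()})
```

SEP and bias together determine RMSEP: the squared RMSEP equals the bias squared plus the (population-scaled) variance of the residuals, so a model can have a small SEP yet a large RMSEP when its bias is large.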

Spectral Weighted Coefficients and Loadings from PLSR Analysis
In the PLSR model analysis, the weights and loadings across the UV-VIS-NIR-SWIR spectral range are shown in Figure 5. The analysis indicated a consistent distribution of regions with prominent peaks and valleys, underscoring the role of weights and loadings in the accuracy and precision of the predictive model. A detailed analysis of the PLSR models revealed salient wavelengths (a peak and a valley) for each parameter within the 350 to 2500 nm range, each linked to specific biochemical molecules.
For Chla (mg m⁻²), a pronounced peak was observed at 698 nm, complemented by a marked valley at 723 nm. Similarly, Chlb (mg m⁻²) exhibited a peak at 515 nm and a valley at 723 nm. Chla+b (mg m⁻²) peaked at 696 nm with a valley at 722 nm. Carotenoids (mg m⁻²) registered a peak at 789 nm and a valley at 725 nm, and flavonoids (nmol cm⁻²) were characterised by a peak at 719 nm and a valley at 1392 nm.
When the parameters were expressed in mass units, the wavelengths shifted discernibly. The Chla/b ratio peaked at 721 nm with a valley at 516 nm. Chla (mg g⁻¹) peaked at 1064 nm with a valley at 1521 nm, whereas Chlb (mg g⁻¹) peaked at 803 nm with a valley at 716 nm. Chla+b (mg g⁻¹) peaked at 698 nm with a valley at 1521 nm, and carotenoids (mg g⁻¹) peaked at 716 nm with a valley at 1505 nm. Flavonoids (µmol g⁻¹) also showed a peak at 715 nm and a valley at 1489 nm. Phenolics (mL L⁻¹) were marked by a peak at 354 nm and a valley at 1468 nm, and DPPH peaked at 363 nm with a valley at 719 nm. Lignin (mg g⁻¹) peaked at 698 nm with a valley at 723 nm, and cellulose (nmol mg⁻¹) peaked at 713 nm with a valley at 1383 nm (Figure 5).
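Locating the peak and valley of a spectral coefficient vector reduces to finding its maximum and minimum over the wavelength grid. In this minimal sketch the vector could be PLSR weights, loadings, or regression coefficients (the text does not specify which underlies each reported wavelength); the function name and the synthetic curve are hypothetical.

```python
import numpy as np


def peak_and_valley(wavelengths, coefficients):
    """Return the wavelengths of the maximum (peak) and minimum (valley) coefficient."""
    wavelengths = np.asarray(wavelengths)
    coefficients = np.asarray(coefficients, dtype=float)
    return int(wavelengths[np.argmax(coefficients)]), int(wavelengths[np.argmin(coefficients)])


# 1 nm grid over 350-2500 nm and a hypothetical coefficient curve
wl = np.arange(350, 2501)
coef = np.exp(-(((wl - 698) / 5.0) ** 2)) - np.exp(-(((wl - 723) / 5.0) ** 2))
print(peak_and_valley(wl, coef))
```

Only the single global extremes are returned; finding several local peaks per parameter would need a local-extremum search instead.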
The wavelengths identified in this study clarify the interplay between spectral data and molecular composition, providing a basis for further hyperspectral analyses and targeted modelling.

Hyperspectral Vegetation Index for Selected Most Responsive Wavelengths and Bands
Hyperspectral vegetation indices were evaluated across the 350 to 2500 nm spectrum, and clear correlation patterns were observed. Area-based chlorophyll a (Chla(area)) stood out with an R² of 0.89, indicating a pronounced linear association with the examined wavelengths. Conversely, several parameters, namely, the chlorophyll a/b ratio (Chla/b), area-based flavonoids (Flv(area)), mass-based chlorophyll a (Chla(mass)), mass-based chlorophyll a+b (Chla+b(mass)), phenolic compounds (Phe), and cellulose, showed weaker correlations. The cluster heatmap, with a colour gradient from deep blue to red, summarises the magnitude of these correlations across the biochemical constituents (Figure 6).
Plants 2023, 12, x FOR PEER REVIEW 10 of 25
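A correlation map of this kind can be built by computing an index for every pair of bands and correlating it with the measured parameter. The text does not specify the index formulation, so this sketch assumes the normalised difference form (Ri − Rj)/(Ri + Rj); the function name and the synthetic data are hypothetical. The resulting matrix of R² values is what a heatmap would display.

```python
import numpy as np


def ndsi_r2_map(reflectance, y):
    """R^2 between y and every normalised-difference index (Ri - Rj) / (Ri + Rj).

    reflectance: samples x bands array of positive reflectance values.
    Returns a bands x bands matrix; the diagonal is NaN (the index is constant zero).
    """
    R = np.asarray(reflectance, dtype=float)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    n_bands = R.shape[1]
    r2 = np.full((n_bands, n_bands), np.nan)
    with np.errstate(invalid="ignore", divide="ignore"):
        for i in range(n_bands):
            nd = (R[:, [i]] - R) / (R[:, [i]] + R)   # index for band pairs (i, j), all j at once
            ndc = nd - nd.mean(axis=0)
            r = (ndc.T @ yc) / np.sqrt((ndc**2).sum(axis=0) * (yc @ yc))
            r2[i] = r**2
    return r2


# Hypothetical data: 200 leaves x 8 bands; the trait follows the index of bands 2 and 5
rng = np.random.default_rng(2)
refl = rng.uniform(0.1, 0.9, size=(200, 8))
trait = 3.0 * (refl[:, 2] - refl[:, 5]) / (refl[:, 2] + refl[:, 5]) + 0.01 * rng.normal(size=200)
r2 = ndsi_r2_map(refl, trait)
best = np.unravel_index(np.nanargmax(r2), r2.shape)
print(best, round(float(r2[best]), 3))
```

Swapping the two bands only flips the sign of the index, so the map is symmetric; on a full 1 nm grid (2151 bands) it is a 2151 × 2151 matrix.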


Algorithms for Selected Most Responsive Wavelengths and Bands
To determine the most responsive wavelengths and their relative contributions across the UV-VIS-NIR-SWIR1-SWIR2 spectral regions for the Hibiscus and Geranium species, a suite of computational algorithms was employed. Partial least squares (PLS), variable importance in projection (VIP), interval PLS-VIP (iPLS-VIP), genetic algorithms (GA), random forest (RF), and competitive adaptive reweighted sampling (CARS), each with its own computational framework, provided complementary perspectives on the spectral data (Figures 7 and 8).
The partial least squares (PLS) technique showed a notable affinity for the UV and VIS regions. In the UV spectrum, 4 wavelengths emerged as pivotal, while in the VIS spectrum a larger set of 131 wavelengths was selected. The NIR domain, rich in spectral information, contributed 66 wavelengths, and the SWIR1 and SWIR2 regions, whose spectral behaviour is often complex, contributed 26 and 29 wavelengths, respectively.
The variable importance in projection (VIP) algorithm produced a broader spectral selection: 57 wavelengths in the UV domain, 316 and 154 in the VIS and NIR regions, respectively, and 133 and 75 in the SWIR1 and SWIR2 regions, respectively.
The integrated iPLS-VIP approach, which combines the principles of PLS and VIP, produced a diverse selection. It selected 39 and 157 wavelengths in the UV and VIS domains, respectively, 197 in the NIR region, and 335 and 187 in the SWIR1 and SWIR2 regions, respectively.
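VIP scores summarise how much each wavelength contributes to the PLS components, weighted by the share of y-variance each component explains. The sketch computes them for a single-response PLS model fitted by NIPALS. The "VIP > 1" selection rule used in the example is a common convention but an assumption here (the text does not state the threshold used), and all names and data are hypothetical.

```python
import numpy as np


def pls1_weights_scores(X, y, n_comp):
    """NIPALS PLS1 returning weights W (p x A), scores T (n x A) and y-loadings q (A)."""
    Xk = X - X.mean(axis=0)
    yk = y - y.mean()
    W, T, q = [], [], []
    for _ in range(n_comp):
        w = Xk.T @ yk
        w = w / np.linalg.norm(w)
        t = Xk @ w
        tt = t @ t
        Xk = Xk - np.outer(t, Xk.T @ t / tt)   # deflate X by its loading
        c = (yk @ t) / tt
        yk = yk - c * t
        W.append(w)
        T.append(t)
        q.append(c)
    return np.array(W).T, np.array(T).T, np.array(q)


def vip_scores(W, T, q):
    """Variable importance in projection for a single-response PLS model."""
    p = W.shape[0]
    ss = q**2 * np.sum(T**2, axis=0)            # y-variance explained per component
    w_norm = W / np.linalg.norm(W, axis=0)      # column-normalised weights
    return np.sqrt(p * (w_norm**2 @ ss) / ss.sum())


# Hypothetical data: 300 samples x 20 bands, only the first 3 bands carry signal
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 20))
y = X[:, :3] @ np.array([2.0, -1.5, 1.5]) + 0.1 * rng.normal(size=300)
W, T, q = pls1_weights_scores(X, y, n_comp=3)
vip = vip_scores(W, T, q)
print("bands with VIP > 1:", np.flatnonzero(vip > 1))
```

By construction the squared VIP scores average to 1, which is why 1 is the usual reference value for "above-average" importance.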
Genetic algorithms (GAs), valued for their computational adaptability, showed a distinct bias towards the UV and VIS domains, selecting 15 and 50 wavelengths, respectively. The NIR, SWIR1, and SWIR2 regions contributed 27, 44, and 14 wavelengths, respectively, demonstrating the versatility of the algorithm.
The random forest (RF) algorithm, known for its robust handling of data, yielded a balanced spectral distribution for model construction: 75 wavelengths in the UV domain, 433 in the VIS spectrum, and 202 in the NIR bands. The SWIR1 and SWIR2 regions contributed 134 and 71 wavelengths, respectively.
Finally, the competitive adaptive reweighted sampling (CARS) algorithm provided a comprehensive spectral selection. It identified 48 wavelengths in the UV bands, 320 in the VIS spectrum, 199 in the NIR bands, and a total of 243 spanning the SWIR1 (108) and SWIR2 (135) bands. All selected wavelengths were distributed across the evaluated biochemical parameters (Figure 7A
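Per-region counts like those above can be produced by binning each selected wavelength into a spectral region. Only the 700 nm (VIS/NIR) and 1300 nm (NIR/SWIR) transitions are stated in the text; the UV upper limit of 399 nm, the SWIR1/SWIR2 split at 1800 nm, and the assignment of 700 nm itself to VIS are assumptions of this sketch, and the names are hypothetical.

```python
from bisect import bisect_left
from collections import Counter

# Hypothetical region limits (inclusive upper bound, nm); only the 700 nm and
# 1300 nm transitions come from the text, the other limits are assumptions.
REGION_UPPER_NM = [399, 700, 1300, 1800, 2500]
REGION_NAMES = ["UV", "VIS", "NIR", "SWIR1", "SWIR2"]


def region_of(wavelength_nm):
    """Map a wavelength in 350-2500 nm to its spectral region name."""
    idx = bisect_left(REGION_UPPER_NM, wavelength_nm)
    if wavelength_nm < 350 or idx == len(REGION_NAMES):
        raise ValueError(f"{wavelength_nm} nm is outside 350-2500 nm")
    return REGION_NAMES[idx]


def count_by_region(selected_wavelengths):
    """Number of selected wavelengths falling in each spectral region."""
    counts = Counter(region_of(w) for w in selected_wavelengths)
    return {name: counts.get(name, 0) for name in REGION_NAMES}


# Hypothetical selection returned by a wavelength-selection algorithm
print(count_by_region([350, 500, 700, 701, 1500, 2000, 2500]))
```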

Biochemical Parameters
Understanding the biochemical parameters of plants provides insights into their physiological status, overall health, and responses to environmental stress. In this study, the parameters for Hibiscus rosa-sinensis L. (Hibiscus) and Pelargonium zonale (L.) L'Hér. ex Aiton (Geranium) were methodically assessed to predict and select the most responsive wavelengths and bands.
Chlorophylls, specifically chlorophyll a (Chla) and chlorophyll b (Chlb), are the key pigments facilitating photosynthesis. For the plants studied, the average Chla concentration (1322.8 mg m⁻² and 64.2 mg g⁻¹) was higher than that of Chlb (1012.0 mg m⁻² and 46.0 mg g⁻¹), consistent with the known predominance of Chla in plants. The higher coefficients of variation (CVs) associated with Chlb suggest greater variability, which can be attributed to the role of Chlb in adjusting light absorption and in dissipating excess energy, owing to its broader absorption peak, as suggested by [19,20,21,22]. Along these lines, carotenoids (Cars), when accumulated at higher levels, play an integral role in photoprotection. These pigments are essential for safely dissipating excess energy, particularly under intense light or stress conditions. Additionally, carotenoids help maintain the structural integrity of the photosynthetic apparatus and act as antioxidants, protecting plant cells from damage caused by reactive oxygen species [4,23,24]. Their accumulation indicates an adaptive response that maintains photosynthetic efficiency and minimises photodamage under varying environmental conditions.
Flavonoids are secondary metabolites known to protect plants against UV radiation and pathogens [20,25]. Their variable concentrations, indicated by the high CV, possibly reflect the adaptation of plants to varying environmental factors. The consistently low CV for DPPH, an indicator of antioxidant potential, suggests that the radical scavenging capacity was relatively stable across the samples studied. This aligns with previous findings wherein plants exhibited consistent antioxidant capabilities despite varying conditions [5,26,27].
Lignin and cellulose are vital components of the plant cell wall, imparting structural integrity. The lower CV of lignin compared to cellulose suggests a more uniform distribution or a consistent synthesis mechanism across both species. Bloem, Gerighausen, Chen & Schnug (2020) [28] suggested that lignin biosynthesis is intricately regulated by mechanisms related to light interaction with the leaves, which could account for the observed consistency.
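The coefficient of variation used in this discussion is the standard deviation expressed as a percentage of the mean, which makes variability comparable across parameters measured in different units. A minimal sketch, assuming the sample (n − 1) standard deviation; the function name and the values in the example are hypothetical, not the study's data.

```python
import statistics


def coefficient_of_variation(values):
    """CV (%) = 100 x sample standard deviation / mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return 100.0 * statistics.stdev(values) / mean


# Hypothetical measurements for two parameters
lignin = [95.0, 100.0, 105.0, 98.0, 102.0]
cellulose = [150.0, 210.0, 180.0, 240.0, 120.0]
print(f"lignin CV = {coefficient_of_variation(lignin):.1f}%, "
      f"cellulose CV = {coefficient_of_variation(cellulose):.1f}%")
```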
The correlative matrix and principal component analysis shed light on the intricate interactions between these biochemical parameters. The strong positive associations between lignin and chlorophyll parameters are consistent with those reported in previous studies. For instance, Vanholme, Demedts, Morreel, Ralph & Boerjan (2010) [29] proposed that lignin synthesis might be affected by the rate of photosynthesis and, consequently, chlorophyll content.
Moreover, the negative correlation between DPPH and the Chla/b ratio and Flv might suggest a compensatory mechanism wherein higher antioxidant potential is associated with a lowered Chla/b ratio, perhaps indicating stress conditions where Chla predominance is essential. Carotenoids, which play a crucial role in photoprotection and are precursors for abscisic acid, show significant correlations with various parameters. Their positive association with Chla, as noted by Steidle Neto et al. (2017) [21], is a testament to their synergistic role in photosynthesis.
Finally, the PCA results encapsulated 63.4% of the variance. They revealed distinctive biochemical compositions for Hibiscus and Geranium, reiterating species-specific biosynthetic pathways and regulatory mechanisms that differentiate plant species in terms of their biochemical constituents.
This exploration of the biochemical parameters of Hibiscus and Geranium leaves offers a comprehensive overview of their physiological and biochemical characteristics and underscores the intricate interplay of these parameters. These findings pave the way for further investigation into how alterations in plant biochemistry can modify the selected wavelengths and the most responsive bands.

Advanced Data Analysis for Hyperspectroscopy UV-VIS-NIR-SWIR
Hyperspectral reflectance, acquired with imaging or non-imaging sensors, is an efficient and effective way to discern spectral differences. It is a powerful tool that captures and analyses information across many narrow electromagnetic wavelengths. Its application in plant biochemistry, particularly in the UV-VIS-NIR-SWIR range, has grown significantly over the past decade because of its ability to provide detailed insights into the biochemical and structural properties of plant tissues without causing harm [1].
The spectral reflectance values of Hibiscus rosa-sinensis L. (Hibiscus) and Pelargonium zonale (L.) L'Hér. ex Aiton (Geranium), captured across the UV-VIS-NIR-SWIR bands, revealed inherent biochemical differences between the two species. The transitions at 700 and 1300 nm are crucial, marking the shifts from VIS to NIR and from NIR to SWIR. The VIS-to-NIR transition in particular is often associated with the chlorophyll absorption peak, which provides insight into the photosynthetic efficiency of plants [30]. The marked difference in reflectance values, corroborated by the t-test, underscores the intrinsic biochemical variance between Hibiscus and Geranium. These differences may be attributed to variations in chlorophyll content, cellular structure, and moisture content [31].
PCA is an efficient way to analyse data derived from hyperspectral curves, and it is particularly useful for aspects such as CA and spectral diversity. As demonstrated by the variance captured by PC1 (83%) and PC2 (15%) and by the reported accuracy and Kappa values, the technique effectively condenses complex spectral data into a few interpretable components. Only two principal components accounted for nearly 98% of the variance, emphasising the distinct spectral characteristics of Hibiscus and Geranium plants. These spectral differences can be attributed to variations in compounds such as flavonoids, chlorophylls, and phenolic compounds, each with unique reflectance and absorption profiles in the hyperspectral range [32]. Moreover, the dominance of PC1 and PC2, visible in the cumulative variability circles, further supports the robustness of PCA in representing the complex interplay of spectral wavelengths. Therefore, slight variations in the third and subsequent principal components, which account for only a negligible portion of the total variance, are not expected to play a pivotal role in deciphering the overall biochemical and structural attributes of the two plants.
In this sense, complemented by advanced data analysis techniques such as PCA, hyperspectral sensors offer a promising avenue to decipher the complex biochemical attributes of photosynthetic pigments and other compounds in plants. As demonstrated by the distinct reflectance profiles of Hibiscus and Geranium, this technology holds significant potential for distinguishing plant species, understanding their unique biochemical compositions, and gaining insights into their physiological and structural properties.

Biochemical Parameters for Calibration, Cross-Validation, and Prediction PLSR Models
Partial least squares regression (PLSR) has proven to be a robust method for establishing relationships between hyperspectral reflectance data and various biochemical parameters in plants [18,33,34]. In this study, PLSR was likewise employed to calibrate and cross-validate the relationships between reflectance data and biochemical parameters in the leaves of Hibiscus and Geranium (Pelargonium zonale) plants.
During the calibration phase, the area-based chlorophyll a and carotenoid concentrations showed strong correlations, with values of 0.93 and 0.96, respectively. Such high calibration values typically indicate a reliable model; however, as with most models, cross-validation often yields slightly lower values. This study corroborates this expectation, with R² values of 0.90 and 0.89 for chlorophylls and dissipation energy based on area and mass, respectively [35,36].
While chlorophylls a and b and carotenoids received significant attention in the study, less commonly studied biochemical parameters such as lignin concentration also showed respectable calibration values. For lignin, a complex organic polymer critical for structural support in vascular plants [37], R² was 0.74 in calibration and 0.71 in cross-validation, indicating reasonable model reliability.
The study also included DPPH, which represents radical scavenging potential, along with cellulose and SAE, to achieve a comprehensive understanding of plant biochemistry. However, these parameters evidently require further research or data acquisition. This highlights the challenges of calibrating and validating models for certain biochemical compounds. It also reflects a broader issue in the field: high calibration values for certain parameters remain elusive even with advanced techniques [21,38,39]. Table 2 shows the calibration and cross-validation statistics for the various biochemical parameters. The ratio of performance to deviation (RPD) values are particularly informative because they indicate the quality of the calibration models. An RPD greater than 2 typically indicates that a model is suitable for predictive purposes [40].
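RPD is commonly computed as the standard deviation of the observed reference values divided by the prediction error. The sketch below uses RMSE as that error; some authors use SEP instead, so treat the choice as an assumption. When the population SD is used, RPD is tied to R² by RPD = 1/√(1 − R²), which is one way to derive RPD "from the R² values" as the methods describe. The function names are illustrative.

```python
import numpy as np

def rmse(y_obs, y_pred):
    """Root mean square error between observed and predicted values."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def rpd(y_obs, y_pred, ddof=1):
    """Ratio of performance to deviation: SD of observed values / RMSE.

    ddof=1 gives the sample SD; ddof=0 gives the population SD.
    """
    return float(np.std(np.asarray(y_obs, float), ddof=ddof) / rmse(y_obs, y_pred))

def rpd_from_r2(r2):
    """RPD implied by R^2 = 1 - SSE/SST when the population SD (ddof=0) is used."""
    return 1.0 / np.sqrt(1.0 - r2)

print(rpd_from_r2(0.75))  # 2.0
```

Under this relation, the R² > 0.75 benchmark and the RPD > 2 criterion describe the same boundary. An R² of 0.75 corresponds exactly to an RPD of 2.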
The validation and prediction phases further demonstrated the efficacy of the PLSR models. The high correlation coefficient (r) values, especially for photosynthetic pigments (0.85 to 0.96), signify a strong relationship between the observed and predicted values. The strong performance of carotenoids, essential compounds for photoprotection in plants [41], was particularly noteworthy, with an r value of 0.96.
However, not all parameters were predicted equally well. Flavonoids, for instance, showed lower correlation values, underscoring the challenges of predicting certain biochemical parameters from hyperspectral data [42,43]. In contrast, lignin showed a reasonably strong r value of 0.89.
The presented data underscore the potential and challenges of employing PLSR models to predict plant biochemical parameters using hyperspectral sensors. The efficacy of this technique in predicting a plethora of parameters, from chlorophyll to lignin, highlights its importance in modern plant research and potential applications in precision agriculture, phenotyping, and other related fields.

Selected Most Responsive Wavelengths and Bands for Algorithms and Molecular Insights
Exploration of the UV-VIS-NIR-SWIR spectral range revealed notable peaks and valleys. These serve as key indicators of the correlation between specific wavelengths and particular biochemical molecules in plants. Such correlations underscore the importance of these wavelengths for determining the concentrations of the corresponding molecules in plant tissues. This is evident in the sensitivity of the red-edge region (approximately 690-730 nm) to chlorophyll a (Chla), chlorophyll b (Chlb), and total chlorophyll (Chla+b). The literature supports this observation well, including Gitelson and Solovchenko (2018) [44]. Similarly, the sensitivity of specific wavelengths to pigments such as carotenoids and flavonoids agrees with the results of Blackburn (2007) [45].
The hyperspectral vegetation indices (HVIs) further emphasise the correlation between spectral data and vegetation properties. Notably, photosynthetic pigment concentrations, structural molecules, and antioxidant compounds showed strong linear associations with the studied wavelengths. This demonstrates the efficacy of hyperspectral sensors as a non-invasive method, in line with the findings of Chen et al. (2019) [46].
Regarding wavelength selection, the different computational algorithms offered varied perspectives on the most responsive wavelengths for Hibiscus and Geranium. For example, the partial least squares (PLS) method showed a strong affinity for the UV and VIS regions, reinforcing their significance in determining biochemical concentrations. These results are consistent with those of Thenkabail et al. (2011) [33]. The variable importance in projection (VIP) method selected a broader spectral range, especially the VIS and NIR regions. Kycko, Zagajewski, Lavender and Dabija (2019) [47] previously recognised these regions for their role in determining water content and cellular structures. Other methods provided distinct interpretations of the spectral data: genetic algorithms (GAs) leaned towards the UV and VIS regions, whereas random forests (RF) selected wavelengths across a broader range. Finally, competitive adaptive reweighted sampling (CARS) iteratively retains the wavelengths with the largest absolute PLS regression coefficients. It captured a comprehensive spectral view, highlighting the value of a holistic spectral approach for deciphering vegetation biochemistry.
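The methods state that RF-based selection was done with scikit-learn, but they do not specify the importance measure. As one plausible reading, the minimal sketch below ranks wavelengths by the impurity-based `feature_importances_` of a `RandomForestRegressor`. The data are synthetic: one hypothetical band at 700 nm drives the response. The helper name `rank_wavelengths_rf` is illustrative, not taken from the study.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_wavelengths_rf(X, y, wavelengths, top_k=10, seed=0):
    """Return the top_k wavelengths ranked by impurity-based RF importance."""
    rf = RandomForestRegressor(n_estimators=200, random_state=seed)
    rf.fit(X, y)
    importance = rf.feature_importances_
    order = np.argsort(importance)[::-1][:top_k]   # most important first
    return wavelengths[order], importance[order]

# Synthetic example (assumption, not the study's data): 140 leaves, bands every 10 nm.
rng = np.random.default_rng(1)
wavelengths = np.arange(350, 2501, 10)
X = rng.uniform(0.05, 0.6, size=(140, wavelengths.size))
target_band = np.where(wavelengths == 700)[0][0]
y = 40.0 * X[:, target_band] + rng.normal(scale=0.5, size=140)

top_wl, top_imp = rank_wavelengths_rf(X, y, wavelengths, top_k=5)
print(top_wl, top_imp.round(3))
```

Real leaf spectra have strongly correlated neighbouring bands. Impurity importance is then shared across adjacent wavelengths, which is one reason RF can appear to select a broader spectral range than PLS-based methods.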
Understanding these correlations, and the insights they provide, will deepen our knowledge of plant hyperspectroscopy. It is also crucial for developing advanced models that predict biochemical content from hyperspectral data. The robust correlations between spectral data and molecular composition demonstrate the high potential of hyperspectroscopy in precision agriculture, ecology, and environmental monitoring [36,48,49]. Given its capacity for rapid, non-invasive, and detailed evaluation, hyperspectroscopy has emerged as a pivotal tool that enables timely interventions to ensure optimal plant health and productivity.
The interplay between spectral analysis and advanced computational algorithms has opened new avenues in hyperspectroscopy, highlighting its potential for mapping plant biochemical parameters effectively.

Experimental Design and Growth Conditions of Plants
Hibiscus rosa-sinensis L. (Hibiscus) and Pelargonium zonale (L.) L'Hér. ex Aiton (Geranium) plants were cultivated under greenhouse conditions in the Botanical Garden of the State University of Maringá, Maringá, Paraná, Brazil. The greenhouse provided natural ambient light, temperatures between 22 °C and 26 °C, and a 16 h photoperiod. To ensure consistent hydration, the plants were watered twice daily, at 8 a.m. and 6 p.m. Leaves of various ages were sampled from different parts of the plants. In total, 200 samples were collected for hyperspectral reflectance analysis and assessment of leaf biochemical profiles. To guarantee uniformity in data collection, all measurements were conducted between 11 a.m. and 1 p.m. A flowchart of the analysis is shown in Figure 9.
Figure 9. Flowchart of the methodology for assessing biochemical molecules in Hibiscus and Geranium leaves using UV-VIS-NIR-SWIR hyperspectral sensors. Plants were cultivated in a greenhouse, and hyperspectral reflectance measurements of the leaves were taken. Pigments and other cellular components were then extracted and quantified by absorbance in a microplate (ELISA) reader. Data from hyperspectral reflectance and biochemical absorbance were integrated and examined using PLS regression models. Responsive wavelengths were selected, and the corresponding PLS models were generated.

Acquisition of Hyperspectral Leaf Reflectance
Leaf hyperspectral reflectance was acquired using a FieldSpec® 3 spectroradiometer fitted with an ASD contact PlantProbe® (Analytical Spectral Devices ASD Inc., Boulder, CO, USA). The spectroradiometer incorporates three sensors that together cover wavelengths from 350 to 2500 nm. Because the PlantProbe® is a contact probe, the data were not contaminated by atmospheric interference. Measurements were taken on the adaxial surface of the leaves, deliberately avoiding the central vein. The instrument was periodically calibrated against a standard white reference plate (Spectralon®, Labsphere Inc., Longmont, CO, USA). Each spectrum comprised 2151 bands at 1 nm intervals across 350-2500 nm. This produced 200 distinct hyperspectral leaf profiles, each paired with the corresponding biochemical measurements. Optimal bands for chemometric analysis were identified through principal component analysis and wavelength selection algorithms that discerned the most responsive wavelengths.
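The acquisition produces 2151 bands at 1 nm intervals, with reflectance expressed relative to a white reference. The minimal sketch below builds that wavelength axis and computes relative reflectance (optionally dark-corrected). It also defines illustrative UV/VIS/NIR/SWIR masks. The region boundaries are an assumption, because conventions differ between studies. The sketch also ignores the Spectralon panel's certified reflectance factor, which the instrument software may apply.

```python
import numpy as np

# 350-2500 nm inclusive at 1 nm steps gives 2151 bands.
wavelengths = np.arange(350, 2501)

def relative_reflectance(target, white, dark=None):
    """Reflectance relative to a white reference, optionally dark-current corrected."""
    target = np.asarray(target, dtype=float)
    white = np.asarray(white, dtype=float)
    if dark is not None:
        dark = np.asarray(dark, dtype=float)
        target, white = target - dark, white - dark
    return target / white

# Illustrative, non-overlapping region boundaries in nm (assumed, not from the study).
REGIONS = {"UV": (350, 399), "VIS": (400, 699), "NIR": (700, 1299), "SWIR": (1300, 2500)}

def region_mask(wl, name):
    """Boolean mask selecting the bands of one spectral region."""
    lo, hi = REGIONS[name]
    return (wl >= lo) & (wl <= hi)

print(wavelengths.size, {k: int(region_mask(wavelengths, k).sum()) for k in REGIONS})
```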

Profiling of Biochemical Compounds
To quantify total chlorophyll (Chl), carotenoids (Car), anthocyanins (AnC), and flavonoids (Flv) in leaf extracts, we adopted a method modified from Gitelson and Solovchenko (2018) [44]. Leaf discs of 1 cm² were homogenised in 2 mL tubes with a chloroform:methanol mixture (2:1 v/v) supplemented with CaCO₃. After thorough extraction, distilled water equivalent to 20% of the extract volume was added to separate the polar and nonpolar phases. The solution was then centrifuged at 15,000 rpm for 9 min to obtain a distinct phase division. For quantification, 200 µL aliquots of the extract were placed in a quartz glass UV 96-well microplate. Absorbance was read with a Biochrom Asys UVM-340 microplate reader running ScanPlus VisibleWell® software version 1.0.2 (Biochrom Ltd., Milton Road, Cambridge, UK). The leaf segments used for extraction were oven-dried at 70 °C to constant weight and weighed on an analytical balance, so that the results could also be expressed per unit of mass.
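Results are reported both per leaf area (1 cm² discs) and per dry mass. The minimal sketch below converts an extract concentration into those two bases. The extract volume, concentration and dry mass in the example are hypothetical placeholders, since the final extract volume is not stated in the text.

```python
def content_per_area_mg_cm2(conc_ug_ml, extract_volume_ml, leaf_area_cm2=1.0):
    """Compound content per leaf area (mg cm^-2) from an extract concentration (µg mL^-1)."""
    total_ug = conc_ug_ml * extract_volume_ml        # µg of compound in the whole extract
    return total_ug / 1000.0 / leaf_area_cm2         # µg -> mg, then per cm²

def content_per_mass_mg_g(conc_ug_ml, extract_volume_ml, dry_mass_g):
    """Compound content per dry mass (mg g^-1)."""
    total_ug = conc_ug_ml * extract_volume_ml
    return total_ug / 1000.0 / dry_mass_g

# Hypothetical example: 10 µg/mL in 1.2 mL of extract from a 1 cm² disc weighing 5 mg dry.
print(content_per_area_mg_cm2(10.0, 1.2), content_per_mass_mg_g(10.0, 1.2, 0.005))
```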

Chlorophyll and Carotenoid Quantification
To quantify chlorophyll a, b, and a+b and carotenoids (carotenes and xanthophylls), 200 µL of methanolic extract was added to each well, and absorbance was recorded at 470, 652, and 665 nm. Chlorophyll and carotenoid concentrations, expressed in mg cm⁻² and mg g⁻¹, were calculated using the formulae presented by Falcioni et al. (2023) [35].
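The exact formulae from Falcioni et al. (2023) are not reproduced in this section. For illustration, the minimal sketch below uses the widely cited Lichtenthaler and Buschmann (2001) coefficients for 100% methanol extracts, which were defined at 470, 652.4 and 665.2 nm. These may differ from the coefficients in [35]. They may also be only approximate for a chloroform:methanol extract. Absorbances are normalised to a 1 cm path, because a microplate well has a shorter, volume-dependent path length (`path_length_cm` is a parameter you must supply).

```python
def pigments_methanol(a470, a652, a665, path_length_cm=1.0):
    """Chl a, Chl b, Chl a+b and total carotenoids (µg mL^-1) from absorbances of a
    100% methanol extract, using Lichtenthaler & Buschmann (2001) coefficients.
    Absorbances are first normalised to a 1 cm optical path."""
    a470, a652, a665 = (a / path_length_cm for a in (a470, a652, a665))
    chl_a = 16.72 * a665 - 9.16 * a652
    chl_b = 34.09 * a652 - 15.28 * a665
    chl_ab = 1.44 * a665 + 24.93 * a652
    # Carotenoids (xanthophylls + carotenes), corrected for chlorophyll absorbance at 470 nm
    car = (1000.0 * a470 - 1.63 * chl_a - 104.96 * chl_b) / 221.0
    return {"chl_a": chl_a, "chl_b": chl_b, "chl_ab": chl_ab, "car": car}

res = pigments_methanol(a470=0.30, a652=0.40, a665=0.80)
print({k: round(v, 3) for k, v in res.items()})
```

The Chl a+b equation is simply the sum of the Chl a and Chl b equations, which is a convenient check on transcribed coefficients. The per-area and per-mass values then follow by multiplying by the extract volume and dividing by leaf area or dry mass.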

Flavonoid and Anthocyanin Quantification
The polar fraction of the methanolic extract was analysed to determine flavonoid (Flv) concentrations. The absorbance of these extrachloroplastidic pigments was measured at 358 nm and converted using a molar absorption coefficient of ε358 = 25 mM⁻¹ cm⁻¹, as described by Gitelson and Solovchenko (2018) [44]. After Flv quantification, the water-methanol phase was acidified with hydrochloric acid to a final concentration of 0.1% HCl. Anthocyanin (AnC) levels were then determined at 530 nm using a molar absorption coefficient of ε530 = 30 mM⁻¹ cm⁻¹, as reported by Gitelson et al. (2020) [41].
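With a molar absorption coefficient, concentration follows from the Beer-Lambert law, c = A / (ε · l). The minimal sketch below applies the two coefficients given above and converts the result to nmol per cm² of leaf. The extract volume, leaf area and absorbance in the example are hypothetical. The optical path length of a 200 µL microplate well is not 1 cm and would have to be measured or estimated.

```python
# Molar absorption coefficients from the text, in mM^-1 cm^-1
EPSILON_MM = {"flavonoids_358nm": 25.0, "anthocyanins_530nm": 30.0}

def concentration_mM(absorbance, epsilon_mM_cm, path_length_cm=1.0):
    """Beer-Lambert law c = A / (epsilon * l); returns mmol L^-1."""
    return absorbance / (epsilon_mM_cm * path_length_cm)

def amount_nmol_per_cm2(absorbance, epsilon_mM_cm, extract_volume_ml,
                        leaf_area_cm2, path_length_cm=1.0):
    """Amount per leaf area: mM x mL = µmol; x 1000 = nmol; divided by area."""
    c = concentration_mM(absorbance, epsilon_mM_cm, path_length_cm)
    return c * extract_volume_ml * 1000.0 / leaf_area_cm2

# Hypothetical example: A358 = 0.5, 1 cm path, 1.2 mL extract, 1 cm² disc.
flv = amount_nmol_per_cm2(0.5, EPSILON_MM["flavonoids_358nm"],
                          extract_volume_ml=1.2, leaf_area_cm2=1.0)
print(round(flv, 3))
```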

Total Soluble Phenolic Compounds
Soluble phenolic compounds (PhCs) were quantified using a modified version of the procedure of Ragaee (2006) [50]. For this assay, a 2 mL Eppendorf tube was loaded with 150 µL of methanolic extract, 70 µL of 1 M Folin-Ciocalteu reagent, 140 µL of 3.56 M Na₂CO₃, and 850 µL of deionised water. After 50 min of incubation in the dark, the mixture was centrifuged at 15,000 rpm for 2 min. The absorbance of the supernatant was then measured at 725 nm in a quartz glass microplate. Gallic acid was used as the standard for estimating the equivalent PhC concentration, with the calibration equation Y = 87.651x + 1.6515 (R² = 0.993).
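The text does not state which axis of the gallic acid curve is absorbance and which is concentration, nor the units of Y. The minimal sketch below therefore provides the curve and its inverse, so that it can be applied in whichever direction it was fitted. The example assumes Y is the gallic acid equivalent and x the absorbance at 725 nm. Any dilution factor from the reaction mixture is left out.

```python
SLOPE, INTERCEPT = 87.651, 1.6515  # gallic acid calibration from the text

def apply_curve(x):
    """Y = 87.651 x + 1.6515."""
    return SLOPE * x + INTERCEPT

def invert_curve(y):
    """x = (Y - 1.6515) / 87.651, for when the curve was fitted the other way round."""
    return (y - INTERCEPT) / SLOPE

# Assumption: Y is the gallic acid equivalent and x the absorbance at 725 nm.
print(round(apply_curve(0.5), 4))
```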

Antioxidant Compounds
Antioxidant potential was determined with the DPPH (2,2-diphenyl-1-picrylhydrazyl) free radical scavenging method, adapted from Llorach et al. (2008) [27]. The reaction was initiated by adding 50 µL of methanolic extract to 200 µL of 1 mM DPPH solution. After vigorous mixing, the samples were incubated in the dark for 1 h. Absorbance was then measured at 515 nm in a quartz glass 96-well microplate [34].
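The section does not state how DPPH results were expressed. A common formulation is the percentage of radicals scavenged relative to a DPPH-only control, which is used here as an assumption. The minimal sketch below computes it, with an optional sample-blank correction for extract colour at 515 nm.

```python
def dpph_scavenging_percent(a_control, a_sample, a_sample_blank=0.0):
    """Percent DPPH radical scavenging (common formulation, assumed here):
    100 * (A_control - (A_sample - A_sample_blank)) / A_control."""
    corrected = a_sample - a_sample_blank   # remove the extract's own absorbance
    return 100.0 * (a_control - corrected) / a_control

# Hypothetical absorbances at 515 nm
print(round(dpph_scavenging_percent(0.80, 0.20), 2))
```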

PLSR Analysis of UV-VIS-NIR-SWIR Reflectance in Plants
For PLSR analysis, the dataset was divided into two subsets: 140 samples for calibration and cross-validation, and 60 samples for external validation of the model. Multiple biochemical parameters were assessed. These included area- and mass-based values of chlorophyll a, chlorophyll b, total chlorophyll a+b, carotenoids, flavonoids, chlorophyll a/b ratio, phenolic compounds, lignin, and cellulose. Each parameter was modelled independently against the UV-VIS-NIR-SWIR spectral curves. PLSR models were developed using the NIPALS algorithm. Outliers were identified and examined using leverage and Hotelling's T² statistics, with the threshold set at 5%. Model performance was evaluated using the coefficient of determination (R²) and the root mean square error (RMSE) at the calibration, cross-validation, and prediction stages. Following the benchmarks of Minasny et al. (2013) [51], R² values above 0.75 indicated optimal model performance, values between 0.5 and 0.75 were considered adequate, and values below 0.5 indicated suboptimal predictions. In addition, the ratio of performance to deviation (RPD) was derived from the R² values at each stage, providing insight into the precision of the PLS model predictions.
Calibration, cross-validation, and validation statistics for the two plant species: PLS factors, R², offset, RMSE, and RPD at the calibration and cross-validation stages for each parameter, together with prediction statistics (correlation coefficient (r), slope, offset, SEP, and RPD) and the equation linking prediction to the calibration model [52].

Evaluating Hyperspectral Vegetation Indices Using Optimal Wavelengths
To optimise the accuracy of biochemical assessments, key hyperspectral bands were identified using the normalised difference formula of Equation (1), following Crusiol et al. (2023) [53]. This approach generated distinct hyperspectral vegetation indices (HVIs) from pairs of wavelengths. Each HVI was correlated with the measured biochemical parameters. Correlations were quantified with the Pearson correlation coefficient and the coefficient of determination (R²), using custom IDL code. A ground-based sensor captured spectra from 350 to 2500 nm, and the results are depicted as contour maps.
$$\mathrm{HVI} = \frac{R_{\lambda_1} - R_{\lambda_2}}{R_{\lambda_1} + R_{\lambda_2}} \qquad (1)$$
where R_{λ1} and R_{λ2} are the reflectances at wavelengths λ1 and λ2.
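The contour maps are matrices of R² values, one for every wavelength pair used in the normalised difference index. The study used custom IDL code. As a rough analogue, the minimal Python sketch below computes the same kind of matrix: the Pearson r between a biochemical parameter and (R_i − R_j)/(R_i + R_j), squared, for all band pairs. The function name and toy data are hypothetical. With all 2151 bands, the loop evaluates about 4.6 million pairs, so spectral binning may be worth considering.

```python
import numpy as np

def hvi_r2_map(reflectance, y):
    """R² between y and HVI(i, j) = (R_i - R_j) / (R_i + R_j) for all band pairs.

    reflectance: (n_samples, n_bands); y: (n_samples,). The diagonal is NaN.
    """
    reflectance = np.asarray(reflectance, dtype=float)
    yc = np.asarray(y, dtype=float) - np.mean(y)
    n_bands = reflectance.shape[1]
    r2 = np.full((n_bands, n_bands), np.nan)
    for i in range(n_bands):
        ri = reflectance[:, [i]]
        hvi = (ri - reflectance) / (ri + reflectance)   # (n_samples, n_bands)
        hc = hvi - hvi.mean(axis=0)
        denom = np.sqrt((hc ** 2).sum(axis=0) * (yc ** 2).sum())
        with np.errstate(invalid="ignore", divide="ignore"):
            r = (hc * yc[:, None]).sum(axis=0) / denom  # Pearson r for each pair
        r2[i] = r ** 2
    return r2

# Toy data (assumption): y is exactly the index built from bands 2 and 5.
rng = np.random.default_rng(3)
refl = rng.uniform(0.05, 0.6, size=(60, 8))
y = (refl[:, 2] - refl[:, 5]) / (refl[:, 2] + refl[:, 5])
r2 = hvi_r2_map(refl, y)
best = np.unravel_index(np.nanargmax(r2), r2.shape)
print(best, round(float(r2[best]), 4))
```

The matrix is symmetric because swapping λ1 and λ2 only flips the sign of the index, leaving R² unchanged.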

Algorithmic Determination of Key Wavelengths in Plants
To identify the most relevant wavelengths for Hibiscus and Geranium, we used a suite of algorithms: partial least squares (PLS), variable importance in projection (VIP), interval PLS-VIP (iPLS-VIP), genetic algorithms (GA), random forests (RF), and competitive adaptive reweighted sampling (CARS). The analyses were carried out in R version 4.2.2 (R Core Team, 2021), including the corrplot package, and in Python version 3.11.5 (Python Software Foundation, Wilmington, DE, USA). In Python, RF procedures used the scikit-learn library, and GA evaluations used the DEAP library. In R, the pls package was used for the PLS-based analyses. iPLS analyses were performed in MATLAB 2022a version 9.12 (MathWorks, Inc., Natick, MA, USA) with PLS_Toolbox (Eigenvector Research, Inc., Manson, WA, USA). The most responsive wavelengths were identified from the maximum and minimum values returned by the wavelength selection algorithms. These values indicate the relative contribution of each wavelength.

Descriptive statistics were computed for each biochemical parameter: count (n), mean, median, minimum, maximum, and coefficient of variation (CV, %), as delineated by [4]. CV categories followed the criteria proposed by Zar (2010) [54]. Pearson's correlation coefficient was used to determine the interrelationships between biochemical attributes. These analyses were performed with Statistica 10® (StatSoft Inc., Tulsa, OK, USA) and R.
Graphs were produced with SigmaPlot 10.0® (Systat Inc., Santa Clara, Silicon Valley, CA, USA), specific R packages, Excel (Microsoft Corp., Redmond, WA, USA), and CorelDraw 2020® (Corel Corp., Ottawa, ON, Canada).

Principal Component Analysis (PCA)
PCA was conducted on the hyperspectral reflectance data in The Unscrambler X software, version 10.4 (CAMO Software, Oslo, Norway), with a statistical significance level of p < 0.01. To avoid underfitting and overfitting, the optimal number of principal components was determined from the first maximum value of overall accuracy [25].

Conclusions
These findings demonstrate the value of the UV-VIS-NIR-SWIR spectral range for identifying the distinctive biochemical constituents of Hibiscus and Geranium plants. The reliability of our models is reflected in R² values consistently above the 0.75 threshold. These results reinforce the role of the red edge in predicting vital plant molecules such as chlorophyll. Parameters such as DPPH and lignin also yielded strong outcomes, with R² values of 0.86 and 0.89, respectively. Our application of wavelength selection algorithms, particularly PLS, VIP, and CARS, revealed an intricate relationship between spectral data and plant biochemistry. The highly responsive wavelengths identified, particularly in the red-edge region, reflect deep-seated correlations with key plant pigments. The fusion of hyperspectroscopy and advanced computational methods therefore holds great promise for precision agriculture and environmental monitoring. These methods also reduce reagent costs and the environmental burden of reagent disposal, thereby contributing to sustainability. Overall, chemometric methods applied to hyperspectral analysis are effective predictive tools. The extensive yet largely untapped potential of hyperspectroscopy shown in this study invites further exploration and innovation in sustainable agricultural practices.