Assessing Contents of Sugars, Vitamins, and Nutrients in Baby Leaf Lettuce from Hyperspectral Data with Machine Learning Models

Lettuce (Lactuca sativa) is a leafy vegetable that provides a valuable source of phytonutrients for a healthy human diet. The assessment of plant growth and composition is vital for determining crop yield and overall quality; however, classical laboratory analyses are slow and costly. Therefore, new, less expensive, more rapid, and non-destructive approaches are being developed, including those based on (hyper)spectral reflectance. Additionally, it is important to determine how plant phenotypes respond to fertilizer treatments and whether these differences in response can be detected from analyses of hyperspectral image data. In the current study, we demonstrate the suitability of hyperspectral imaging in combination with machine learning models to estimate the content of chlorophyll (SPAD), anthocyanins (ACI), glucose, fructose, sucrose, vitamin C, β-carotene, nitrogen (N), phosphorus (P), potassium (K), dry matter content, and plant fresh weight. Five classification and regression machine learning models were implemented, showing high accuracy in classifying the lettuces based on the applied fertilizer treatments and in estimating nutrient concentrations. To reduce the dimension of the input (predictor data, i.e., hyperspectral data), 13 principal components were identified and applied in the models. The implemented artificial neural network models demonstrated high accuracy (r = 0.85 to 0.99) in estimating fresh leaf weight and the contents of chlorophyll, anthocyanins, N, P, K, and β-carotene. The four applied machine learning classification models demonstrated 100% accuracy in classifying the studied baby leaf lettuces by phenotype when specific fertilizer treatments were applied.


Introduction
Lettuce (Lactuca sativa L.) is the most commercially produced leafy vegetable grown in moderate climates worldwide. Fresh lettuce is a desirable, but highly perishable, product that should be consumed almost immediately after harvest or maintained in conditions that prevent its wilting, weight loss, enzymatic discoloration, senescence, and tissue deterioration [1]. Lettuce produced in the United States can be grouped into the following three broad categories: whole heads and bulk harvest (for salad processing), which are commonly harvested at market maturity, and baby leaf, which is grown at a very high density and harvested at early stages of plant development [2]. All lettuce products, including baby leaf lettuce, are a good source of bioactive compounds [3] that are beneficial to human health [4].
The mineral composition of plants reflects the mineral content and quality of the soil in which they grow. This information is valuable for assessing soil fertility and potential environmental contaminants absorbed by the plant. Minerals are vital for plant growth, development, and physiological processes [5]. Understanding a plant's mineral composition provides insights into its nutritional requirements, growth patterns, and ways to improve cultivation practices. It is important to determine the mineral composition of plants to assess their nutritional value, as minerals like iron, calcium, magnesium, and potassium are essential for human health. Knowing their levels can help meet dietary mineral requirements, particularly for individuals with specific mineral needs. Different cultivars may have varying mineral compositions, so analyzing them can assist growers, breeders, and the food processing industry in making informed choices when selecting cultivars with desirable mineral profiles. For instance, minerals found in lettuce, such as potassium and magnesium, may contribute to potential health benefits associated with its consumption [3]. Similar to mature lettuce, baby leaf lettuce also provides several dietary minerals, such as potassium, magnesium, and iron [6,7]. As in mature lettuce, the mineral composition of baby leaf lettuce varies among lettuce types, and the mineral content changes depending on cultural conditions [8].
Sugars, such as glucose, fructose, and sucrose, are essential for providing energy to the plant and play a role in the plant's metabolism and growth [9]. They are also important for human nutrition, providing a source of energy and sweetness [10]. Vitamins, such as vitamin C and β-carotene, are essential micronutrients that are important for human health. Vitamin C is an antioxidant that helps protect cells from damage, supports the immune system, and aids in the absorption of iron. β-carotene is a precursor to vitamin A, which is important for vision, immune function, and skin health [11]. Therefore, the presence and levels of sugars, vitamins, and nutraceuticals in plants [12], including lettuce, are important factors to consider when assessing its nutritional value and quality.
Although the evaluation of lettuce composition is important for determining its overall quality, laboratory analyses are slow and costly. Therefore, more rapid, less expensive, and non-destructive approaches based on (hyper)spectral reflectance are being developed to assess plant phenotypes [13], response to stress [14], and postharvest quality [15]. Recently, substantial progress has been made in using reflectance data to estimate lettuce quality and composition, including pH value and the content of chlorophyll, anthocyanins, soluble solids, water, nitrogen (N), phosphorus (P), potassium (K), calcium (Ca), magnesium (Mg), and sulfur (S) [16-25]. However, there is a lack of studies modeling the content of sugars, vitamins, and other compounds in baby leaf lettuce. Moreover, previous studies did not classify lettuce composition based on the applied fertilizer treatments using hyperspectral data and machine learning algorithms. Therefore, the current study focuses on assessing the content of certain sugars, vitamins, and nutrients in baby leaf lettuce from hyperspectral data using partial least squares regression and machine learning models.

Plant Material
Analyses were performed on a set of 300 greenhouse-grown plants from 50 accessions, which included 36 cultivars, seven plant introductions, six breeding lines, and an accession of Lactuca serriola L., the wild progenitor of cultivated lettuce (Appendix A, Table A1). These accessions were selected for this study based on our previous preliminary analyses and published information [26,27] to represent a broad range of phenotypic and composition diversity in lettuce.
Lettuce seeds were sown in potting soil (Premium Growers Mix, Sun Land Garden Products, Watsonville, CA, USA), and then covered with ~5 mm of sand, watered, and kept in the dark at 10 °C to enhance the uniformity of germination. After two days, the temperature was increased to 20 °C with a 16 h/8 h light/dark photoperiod. Two weeks later, well-established and uniform-looking plants were transplanted into 7.6 cm (~514 mL volume) pots. At this point, the plants were split into two groups that were planted into different substrates. One group of plants, later used to analyze the content of sugars and vitamins, was transplanted into a 1:1 mix of potting soil and sand and fertilized with 1.5 g of Osmocote Smart-Release Plant Food Flower and Vegetable (Scotts, Marysville, OH, USA). The second group of plants, later used for nutrient analyses, was transplanted into a 1:2 mix of Espoma VM8 8-Quart Organic Vermiculite (Espoma, Millville, NJ, USA) and sand. This group of plants was fertilized with five different combinations of nitrogen (N), phosphorus (P), and potassium (K) fertilizers to ensure a substantial difference in their nutrient composition. The "NPK" (control) treatment provided a full recommended dose of N, P, and K; the "nPK" treatment provided 1/3 of N and a full dose of P and K; the "PK" treatment provided a full dose of P and K, but no N; the "NP" treatment provided a full dose of N and P, but no K; and the "NK" treatment provided a full dose of N and K, but no P.
N, P, and K macronutrients were provided with Non-Coated Ammonium Nitrate 34-0-0 Prill Form Fertilizer (Intermountain Farmers Association, Salt Lake City, UT, USA), Triple Super Phosphate 0-46-0 Easy Peasy Plants 99% pure (Easy Peasy Plants, Alvin, IL, USA), and All-Natural Muriate of Potash-Easy Peasy 0-0-60 Potassium (Easy Peasy Plants, Alvin, IL, USA). All plants were sprayed with an identical amount of micronutrient mix solution that was prepared by dissolving 1 g of Axilo Mix 5 (0-0-0) Micronutrient Mix (Valagro USA, Houston, TX, USA) in 1 L of distilled water.
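The five N-P-K treatments described above can be written as a small lookup table, which is convenient when the treatment labels later serve as class labels. This is only an illustrative sketch: the names `TREATMENTS` and `withheld_nutrients` are hypothetical, and doses are expressed as fractions of the full recommended dose rather than as absolute amounts, which the text does not give.

```python
# Fraction of the full recommended dose supplied by each treatment,
# as described in the text (1.0 = full dose, 0.0 = nutrient withheld).
# The dictionary layout and names are illustrative assumptions.
TREATMENTS = {
    "NPK": {"N": 1.0, "P": 1.0, "K": 1.0},    # control
    "nPK": {"N": 1 / 3, "P": 1.0, "K": 1.0},  # one third of the N dose
    "PK": {"N": 0.0, "P": 1.0, "K": 1.0},     # no N
    "NP": {"N": 1.0, "P": 1.0, "K": 0.0},     # no K
    "NK": {"N": 1.0, "P": 0.0, "K": 1.0},     # no P
}


def withheld_nutrients(treatment):
    """Return the nutrients that a treatment supplies below the full dose."""
    doses = TREATMENTS[treatment]
    return sorted(name for name, dose in doses.items() if dose < 1.0)


for label in TREATMENTS:
    print(label, withheld_nutrients(label))
```

The Osmocote group is not included because it was grown in a different substrate with a single slow-release fertilizer.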
Plants from both groups were grown together in a greenhouse in a randomized complete block design with three replications. During the growing period, the average daily temperature in the greenhouse ranged from 20 to 24 °C, with a natural day length between 13 and 14 h. Outdoor average daily light integrals ranged from 40 to 55 mol m⁻² d⁻¹. Plants were watered daily as needed. When most plants from Osmocote and NPK treatments developed four true leaves about 10 cm long, all plants were evaluated for the content of pigments, scanned with a hyperspectral camera, and subsequently harvested for composition analyses. Plants used in composition analyses were evaluated for biomass production (fresh weight in g).

Content of Pigments
The content of pigments was determined two days before harvest with hand-held meters that use light transmittance to provide good in situ estimates of chlorophyll (SPAD-502 from Konica Minolta Sensing, Tokyo, Japan) and anthocyanins (ACM-200 plus from Opti-Sciences, Hudson, NH, USA). Measurements were taken on three leaves of similar age (avoiding youngest and oldest leaves) about 1 cm from the leaf edge and averaged for each plant. The content of chlorophylls is expressed in SPAD units; the content of anthocyanins is expressed in ACI (anthocyanins content index) units.

Hyperspectral Imaging
Lettuce plants' reflectance was measured with ASD FieldSpec 3 (Analytical Spectral Devices Inc., Boulder, CO, USA), which collects reflectance from 350 nm to 2500 nm, with a 1.4 nm sampling interval between 350 and 1050 nm, and a 2 nm sampling interval between 1000 and 2500 nm. Hyperspectral data were subdivided into 1 nm bandwidth using an ASD self-driven interpolation method. Spectral jump correction for 725-1000 nm and 1800-1950 nm bands was performed using the parabolic correction equations [28]. The hyperspectral image (HSI) data acquisition set (Figure 1) comprised ASD-Pro halogen lamps, ASD FieldSpec, a computer, a Spectralon reflectance standard, and connection wires.
The reflectance measurements were taken for individual plants in a dark room illuminated with four ASD-Pro-Lamps (Analytical Spectral Devices Inc., Boulder, CO, USA). A black, non-reflective cloth was used to cover pots and soil. Hyperspectral images were taken at a height of about 10 cm above the plant. Each plant was scanned four times, with a 90° turn between individual measurements. Reflectance calibration was performed using the Spectralon SRT-MS-100 reflectance standard (Labsphere, North Sutton, NH, USA).

Laboratory Quantification of Compounds
Immediately after taking reflectance measurements, all leaves from plants were harvested and submitted to the University of California Davis Analytical Laboratory to determine the content of sugars, vitamin C, β-carotene, N, P, and K using common quantification procedures developed for plant tissue samples (https://anlab.ucdavis.edu, accessed on 14 November 2022). A detailed description of all analytical methods was previously provided [8,29]; therefore, they will be mentioned only briefly.
Samples for the analyses of sugars were dried at 55 °C, ground, and extracted with hot deionized water [30]. The amounts of glucose, fructose, and sucrose in extracts were determined using a PerkinElmer Series 200 Quaternary HPLC (PerkinElmer, Waltham, MA, USA) with a Sciex API 200 mass spectrometer (Sciex, Redwood City, CA, USA). Vitamin C (ascorbic acid) was quantified in leaf tissue according to the previously developed protocol [31], with minor modifications. β-carotene was quantified from a homogenized sample prepared from fresh tissue and deionized water using HPLC analysis with an isocratic mobile phase of methanol/acetonitrile (90:10). N, P, and K were quantified using methods based on the extraction of soluble nitrate (NO3-N) [32], phosphorus (PO4-P) [33], and potassium [34] from plant material with a solution of 2% acetic acid. The content of glucose, fructose, sucrose, N, P, and K is reported in g per kg of fresh weight (g kg⁻¹ FW), and vitamin C and β-carotene are reported in mg per kg of fresh weight (mg kg⁻¹ FW). Another part of each sample was oven-dried at 105 °C to determine dry matter (DM) content. DM contents at 55 °C and 105 °C were expressed in percentages of fresh weight.

Hyperspectral Image Indexing and Extraction Models
The first-order derivative of reflectance (FDR) of the filtered HSI data was computed using the formulas elaborated in our previous studies [17,18]. The computed FDR values were explored to find the 13 most important principal components. The reflectance values at the found bandwidth indices were subsequently used to develop multivariate regression models based on the filtered HSI data and laboratory (ground truth) data by applying the PLS and PCA modeling approaches. The ground truth data were those for fresh leaf weight (FLW), percentage of dry matter content at 55 °C (55 °C DM) and 105 °C (105 °C DM), SPAD, ACI, and the content of glucose (Glu), fructose (Fru), sucrose (Suc), vitamin C (Vit-C), β-carotene (β-Carot), N, P, and K. Multivariate linear regression modeling [35] was applied to establish predictive models using the PLSR and PCR approaches.
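As a rough illustration of what a first-order derivative of reflectance looks like numerically, the sketch below approximates it with finite differences. The exact formulas used in the study are given in the cited earlier work and are not reproduced here, so the central-difference scheme, the function name `first_order_derivative`, and the toy spectrum are all assumptions.

```python
def first_order_derivative(wavelengths, reflectance):
    """Approximate dR/d(lambda) with central differences.

    One-sided differences are used at the two ends of the spectrum.
    This is a generic finite-difference scheme, assumed for illustration.
    """
    n = len(reflectance)
    if n != len(wavelengths) or n < 2:
        raise ValueError("need two or more matching wavelength/reflectance samples")
    fdr = []
    for i in range(n):
        lo = max(i - 1, 0)        # previous sample (or the first one)
        hi = min(i + 1, n - 1)    # next sample (or the last one)
        slope = (reflectance[hi] - reflectance[lo]) / (wavelengths[hi] - wavelengths[lo])
        fdr.append(slope)
    return fdr


# Toy spectrum on a 1 nm grid; the values are made up for illustration.
wl = [700, 701, 702, 703, 704]
refl = [0.10, 0.12, 0.16, 0.22, 0.30]
print(first_order_derivative(wl, refl))
```

Wavelengths where the derivative reaches a local maximum or minimum mark the steepest changes in reflectance, which is the kind of information the band selection described above relies on.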

Machine Learning Models for Classification and Prediction
To classify and estimate the responses of the studied lettuces to fertilizer treatments using their hyperspectral reflectance data, five classification and regression machine learning models were applied as follows: a support vector machine (SVM), an ensemble, a linear discriminant, a quadratic discriminant, and an artificial neural network (ANN). The studied lettuces were classified based on both their response to the fertilizer treatments (PK, NK, NP, NPK, nPK, and Osmocote) and the HSI data. The estimation and prediction models were found using ANN algorithms. A total of 80% of the input HSI data and corresponding response data (fresh leaf weight (FLW), percentage of dry matter content at 55 °C (55 °C DM) and 105 °C (105 °C DM), SPAD, ACI, and content of glucose (Glu), fructose (Fru), sucrose (Suc), vitamin C (Vit-C), β-carotene (β-Carot), N, P, and K) were used for model training. A total of 10% of the data were used for testing, and 10% of the data were used for validation. The predictive models were based on the least gradient of mean squared errors over simulation using 1000 epochs with 10 hidden layers and 1 output layer. The implementation of the ANN algorithm for multivariate linear predictive models (Figure 2) is composed of the following five major steps: data import, data preparation, model selection, model evaluation, and regression model performance.
In the training phase of the ANN model, we applied the Levenberg-Marquardt method [36], which is a built-in function in the MATLAB package (MathWorks 2023, Natick, MA, USA) for linear and non-linear least squares methods.
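The 80%/10%/10% partition described above can be sketched as a random split of sample indices. The study itself was carried out in MATLAB; the Python function below, its name `split_indices`, and the fixed seed are assumptions made only to illustrate the proportions.

```python
import random


def split_indices(n_samples, train=0.8, test=0.1, seed=0):
    """Shuffle sample indices and split them into train/test/validation lists.

    The validation set receives whatever remains after the training and
    testing shares are taken, so the three parts always cover all samples.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    n_train = round(n_samples * train)
    n_test = round(n_samples * test)
    return (
        indices[:n_train],
        indices[n_train:n_train + n_test],
        indices[n_train + n_test:],
    )


train_idx, test_idx, val_idx = split_indices(300)
print(len(train_idx), len(test_idx), len(val_idx))
```

With 300 samples this yields 240 training, 30 testing, and 30 validation indices, with no sample assigned to more than one set.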

Model Accuracy Metrics
Three accuracy metrics were used for the developed regression and ANN models: the correlation coefficient (r), the normalized root mean squared error (NRMSE), and accuracy (%). The correlation between the ground truth and predicted values of response variables was computed using Equation (1), as follows:

$$ r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2 \, \sum_{i=1}^{N} (y_i - \bar{y})^2}} \tag{1} $$

where $x_i$ and $\bar{x}$ are the laboratory-measured values of a response variable and their mean value, respectively, and $y_i$ and $\bar{y}$ are the predicted/estimated values of a response variable and their mean value, respectively. Another metric used to assess the accuracy of the regression models was NRMSE, which was computed using Equation (2), where N is the number of data points in the response variable. Four different machine learning classification models (SVM, ensemble, quadratic discriminant, and linear discriminant) were employed to classify the studied lettuce by the six applied fertilizer treatments (PK, NK, NP, NPK, nPK, and Osmocote). The accuracy of the classification models was estimated in % using Equation (3), as given in [37], as follows:

$$ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \times 100 \tag{3} $$

The variables in Equation (3) are true positive (TP), true negative (TN), false positive (FP), and false negative (FN) values. The accuracy metric shown in Equation (3) was applied for all six fertilizer treatments.
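The three metrics can be sketched in a few lines of Python. The correlation and accuracy functions follow the standard definitions in terms of the variables named above. The normalizing term of the NRMSE is not stated in this section, so dividing the RMSE by the range of the measured values is an assumption, as are the function names.

```python
import math


def pearson_r(measured, predicted):
    """Pearson correlation between measured (x) and predicted (y) values."""
    mean_x = sum(measured) / len(measured)
    mean_y = sum(predicted) / len(predicted)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(measured, predicted))
    var_x = sum((x - mean_x) ** 2 for x in measured)
    var_y = sum((y - mean_y) ** 2 for y in predicted)
    return cov / math.sqrt(var_x * var_y)


def nrmse(measured, predicted):
    """RMSE divided by the range of the measured values.

    ASSUMPTION: normalization by range; the study may normalize differently.
    """
    n = len(measured)
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(measured, predicted)) / n)
    return rmse / (max(measured) - min(measured))


def accuracy_percent(tp, tn, fp, fn):
    """Classification accuracy in percent from the four outcome counts."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


# Made-up numbers, for illustration only.
lab = [1.0, 2.0, 3.0, 4.0]
model = [1.1, 1.9, 3.2, 3.8]
print(pearson_r(lab, model), nrmse(lab, model), accuracy_percent(5, 3, 1, 1))
```

Other common normalizations (by the mean or the standard deviation of the measured values) would change the NRMSE scale but not the ranking of models on the same response variable.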

HSI Data
The captured hyperspectral data of the whole lettuce plant as elaborated in Section 2.3 were filtered using a seven-point moving average digital filter. The filtered hyperspectral reflectance data of all studied lettuce accessions (Figure 3) exhibit six volatile bandwidth regions as found using FDR simulations.
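A seven-point moving average replaces each reflectance value with the mean of itself and its three neighbors on either side. The sketch below shows one way to do this. How the study treated the first and last three bands is not stated, so the shrinking symmetric window used at the edges is an assumption, as is the function name `moving_average`.

```python
def moving_average(values, window=7):
    """Centered moving average with an odd window length.

    Near the ends, the window shrinks symmetrically so that it never
    reaches outside the data (an assumed edge treatment).
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(values)
    smoothed = []
    for i in range(n):
        k = min(half, i, n - 1 - i)      # largest symmetric half-width that fits
        segment = values[i - k:i + k + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


# A single spike is spread evenly over the seven-point window.
print(moving_average([0, 0, 0, 7, 0, 0, 0]))
```

Smoothing before differentiation matters here because a first-order derivative amplifies band-to-band noise.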

PLSR and PCR Model Performances
In addition to the laboratory-measured values for 13 response variables, two more adjusted values were calculated as follows: the square root of SPAD and the logarithm (base 2) of ACI, as previously recommended [38]. The computed FDR of reflectance values shows that the 13 most important principal components were found in seven different bandwidth regions, i.e., 500-575, 600-675, 700-775, 950-1000, 1100-1150, 1300-1400, and 1875-2000 nm, based on the maximum or minimum reflectance derivative values (Figure 4). These bandwidth regions are in close agreement with those determined previously on lettuce [39].
Correlations between the predicted values and laboratory measurements (ground truth) for multivariate linear PLSR and PCR models ranged from 0.5450 to 0.9476 (Figure 5), while NRMSE values ranged between 0.3397 and 0.9038. The highest correlation (0.9476 and 0.9306) between the predicted values found using PLS and PCR multivariate models and laboratory-measured values was observed for fresh leaf weight (FLW). The lowest correlation (0.6442 and 0.5445) between modeled and measured data was observed for the potassium (K) content. These correlation values show the range of values from minimum to maximum, which were found after several iterative simulations. The correlation values of PLSR and PCR models to estimate SPAD, N, P, and K nutrients were lower than those found in our previous studies [17,18]. This difference may be related to the scanning approach used on analyzed plants. In the current study, the whole lettuce plants were scanned using the hyperspectral camera, whereas, in our previous study, hyperspectral images were collected from individual lettuce leaves placed in a fully isolated chamber. Correlations for N, P, and K content determined in the current study using PLSR and PCR models are, however, in close agreement with the results from other laboratories [21]. The accuracy of the PLSR and PCR models was higher for N and P, but lower for K, in our study than the previously reported accuracies for the PLSR models [40].
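The two adjusted response variables mentioned above are simple element-wise transforms of the pigment readings. A minimal sketch, with a hypothetical function name and made-up readings, is:

```python
import math


def transform_pigments(spad, aci):
    """Return (sqrt(SPAD), log2(ACI)) for one plant.

    SPAD must be non-negative and ACI strictly positive, otherwise the
    transforms are undefined.
    """
    if spad < 0 or aci <= 0:
        raise ValueError("SPAD must be >= 0 and ACI must be > 0")
    return math.sqrt(spad), math.log2(aci)


# Made-up meter readings, for illustration only.
print(transform_pigments(25.0, 8.0))
```

Transforms of this kind compress the upper tail of a right-skewed variable, which can make a linear model fit more evenly across the measured range.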

Classification of Lettuces by the Applied Fertilizer Treatments Using Machine Learning Algorithms
To classify all studied lettuces by the six applied fertilizer treatments (PK, NK, NP, NPK, nPK, and Osmocote) from the collected hyperspectral image data, four different machine learning classification algorithms (SVM, ensemble, linear discriminant, and quadratic discriminant) were implemented and tested. The filtered HSI data were taken as the input variable, and the six applied treatments were converted to categorical arrays and used as the output variable. To reduce the dimension of the input (predictor) variable data, i.e., HSI data, 13 wavebands of HSI and 13 principal component values found using FDR reflectance values (Figure 4) were used. HSI data were classified based on the six applied fertilizer treatments. A total of 90% of the 300 data sets were used for model training, and 10% were used for model testing/validation. The employed classification scales were the individual classes (TP, FP, TN, and FN) and the combined classes TPR (true positive rate) and FNR (false negative rate). The plotted confusion matrices of the employed algorithms (Figure 6) show that the highest accuracy (77.8% for model training and 73.3% for model testing/validation) in classifying the applied treatments was achieved with the ensemble algorithm (Figure 6a). The second-highest accuracy (75.9% for model training and 70.0% for model testing/validation) of classification was reached using the SVM algorithm (Figure 6b). The other two algorithms, quadratic discriminant (Figure 6c) and linear discriminant (Figure 6d), resulted in 74.7% and 80% accuracy for model training, and 70.0% and 66.7% accuracy for model testing/validation, respectively. An accuracy of 100% was achieved in classifying lettuces using the SVM, ensemble, linear discriminant, and quadratic discriminant algorithms when NK, Osmocote, and PK treatments were applied (Figure 6a-d). This demonstrates that NK, Osmocote, and PK fertilizers have significant effects on the leaf pigment content. Similar conclusions (Figure 6a,b) can be drawn with adequate confidence about the nPK treatment effect. On the other hand, NP and NPK treatments (Figure 6a-d) had a very small or no effect on lettuce leaf pigments, and therefore, the hyperspectral imaging technique could not detect the impact of NP and NPK treatments.
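The per-treatment rates read from a confusion matrix can be computed as sketched below. The function names are hypothetical, and the three-class matrix holds made-up counts for illustration; it does not reproduce Figure 6.

```python
def per_class_rates(confusion, labels):
    """Compute TPR, FNR, and one-vs-rest accuracy (all in %) per class.

    confusion[i][j] is the number of samples whose true class is labels[i]
    and whose predicted class is labels[j].
    """
    total = sum(sum(row) for row in confusion)
    rates = {}
    for k, label in enumerate(labels):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp                 # true class k, predicted otherwise
        fp = sum(row[k] for row in confusion) - tp  # predicted k, true class differs
        tn = total - tp - fn - fp
        rates[label] = {
            "TPR": 100.0 * tp / (tp + fn),
            "FNR": 100.0 * fn / (tp + fn),
            "accuracy": 100.0 * (tp + tn) / total,
        }
    return rates


def overall_accuracy(confusion):
    """Share of samples on the diagonal of the confusion matrix, in %."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return 100.0 * correct / total


# Made-up counts for three treatments (rows: true class, columns: predicted).
toy_labels = ["PK", "NP", "NPK"]
toy_matrix = [[5, 0, 0],
              [0, 3, 2],
              [0, 1, 4]]
print(per_class_rates(toy_matrix, toy_labels))
print(overall_accuracy(toy_matrix))
```

In this toy case the first class is always recognized (TPR of 100%), while the other two are partly confused with each other, which mirrors the pattern described above for treatments that are or are not separable from the spectra.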

Performances of Machine Learning Algorithms to Estimate Nutrients of Lettuces
To estimate and predict nutrient concentration levels in lettuces based on hyperspectral image data using artificial neural network algorithms of machine learning, the HSI data were treated as a predictor variable, and laboratory-measured values were treated as a response variable. The acquired HSI data of lettuces were filtered using a Gaussian seven-point moving average filter. Two additional transformed values, the square root of SPAD and the logarithm of ACI, were included in ANN models. Thus, a total of 15 response variables were simulated.
The predictor variable (filtered HSI data) and response variable values were randomly split into 80%, 10%, and 10% for training, testing, and validation sets. A total of 1000 epoch simulations with 10 hidden layers and one output layer for ANN models were applied to find the best-fitting multivariate predictive linear model based on the following three performance metrics: mean squared error, error gradient, and correlation coefficient (r) between the predicted values and laboratory-measured (ground truth) values. For the ANN model simulations, two different sets of response variable values were tested: original data (laboratory-measured) (Figure 7) and a normalized version of the laboratory-measured data (Figure 8).
When the original data and the ANN model were applied, the highest correlations between the estimated nutrient values and laboratory-measured values were detected for fresh leaf weight (FLW), log2(ACI), N, P, K, SPAD, and β-carotene in the model training phase (r = 0.9808 to 0.9998), testing (r = 0.7500 to 0.9357), and validation (r = 0.8480 to 0.9400) (Figure 7). The accuracy of the estimation models for N, P, and K contents is in good agreement with the accuracy of the models developed using back propagation neural networks and random forest algorithms [40]. The lowest correlation between the estimated and ground truth values was found for Vit-C, Glu, Fru, 55 °C DM, 105 °C DM, and the square root of SPAD, which also showed much higher NRMSE values (Figure 7).
The ANN model's performance in estimating nutrient levels and other traits, when using normalized values of response variables, showed a slight decline in overall accuracy in terms of the correlation between predicted and measured values for all nutrients, except Glu and 55 °C DM, in the training set (Figure 8). Conversely, when evaluating the validation data sets, the models demonstrated improved accuracy for most traits, and the use of normalized response variables resulted in reduced normalized error margins (NRMSE).

Conclusions
Based on the data processing approach and the estimated performance of models that were developed from hyperspectral image data, the following can be concluded.

1. The performed studies demonstrate that the accuracy of the PLSR and PCR models degrades when the hyperspectral data of the whole lettuce plant are collected. Therefore, machine learning algorithms have become a more reliable solution for model development;
2. The application of machine learning algorithms for the classification of the lettuces by the applied treatments based on hyperspectral image data of whole lettuce plants is reliable and provides sufficiently high accuracy, in particular for NK, Osmocote, and PK applied nutrients. This approach can be used to eventually develop a valuable and practical model to determine plant mineral composition and to identify nutrient deficiency in plants using hyperspectral image data;
3. The accuracy of the models that assess nutrient levels of the studied lettuce cultivars using machine learning algorithms indicates that larger predictor (input) and response (output) datasets will likely result in more accurate estimation models. For example, SPAD and ACI data collected on 300 samples provided better results than traits with smaller datasets. Additionally, the skewness of the response variables may hinder the performance of the machine learning models.
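For readers unfamiliar with the PCR baseline named in the first conclusion, the following is a minimal sketch of principal component regression: the spectra are reduced to 13 principal components (the number used in this study) and a linear model is fitted on the component scores. It is not the authors' implementation. The spectra are synthetic, the number of bands (200) and latent factors (5) are hypothetical, and all function names are introduced here for illustration only.

```python
import numpy as np

def pca_fit(X, n_components=13):
    """Fit PCA by SVD of the mean-centered data matrix."""
    mean = X.mean(axis=0)
    _, S, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]                      # (n_components, n_bands)
    explained = (S ** 2)[:n_components] / np.sum(S ** 2)
    return mean, components, explained

def pca_transform(X, mean, components):
    """Project spectra onto the retained principal components."""
    return (X - mean) @ components.T

def pcr_fit(scores, y):
    """Least-squares fit of y on the PCA scores plus an intercept."""
    A = np.column_stack([np.ones(len(scores)), scores])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def pcr_predict(scores, coef):
    return coef[0] + scores @ coef[1:]

# Synthetic stand-in for hyperspectral data (assumption, not study data):
# 300 samples, 200 bands, driven by 5 hidden factors plus small noise.
rng = np.random.default_rng(0)
n_samples, n_bands = 300, 200
latent = rng.normal(size=(n_samples, 5))
loadings = rng.normal(size=(5, n_bands))
X = latent @ loadings + 0.01 * rng.normal(size=(n_samples, n_bands))
y = 2.0 * latent[:, 0] + 0.05 * rng.normal(size=n_samples)

mean, components, explained = pca_fit(X, n_components=13)
scores = pca_transform(X, mean, components)
coef = pcr_fit(scores, y)
y_hat = pcr_predict(scores, coef)
print("explained variance (13 PCs):", round(float(explained.sum()), 4))
```

The fit here is evaluated on the same synthetic samples it was trained on, so it only shows the mechanics of the method; a real assessment would use separate training, testing, and validation sets as in the study.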

Figure 2. Machine learning algorithm implementation for predictive models.

Figure 4. Principal components of multivariate PLS and PCR models.

Figure 5. Performances of multivariate PLS and PCR models.

Figure 7. Performances of multivariate ANN models with original data.

Figure 8. Performances of multivariate ANN models with normalized data.