Monitoring the Foliar Nutrients Status of Mango Using Spectroscopy-Based Spectral Indices and PLSR-Combined Machine Learning Models

Abstract: Conventional methods of plant nutrient estimation for nutrient management need a huge number of leaf or tissue samples and extensive chemical analysis, which is time-consuming and expensive. Remote sensing is a viable tool to estimate the plant's nutritional status and determine the appropriate amounts of fertilizer inputs. The aim of the study was to use remote sensing to characterize the foliar nutrient status of mango through the development of spectral indices, multivariate analysis, chemometrics, and machine learning modeling of the spectral data. Leaf spectra in the 350-1050 nm wavelength range and the corresponding leaf nutrient contents were analyzed to develop spectral indices and multivariate models. The normalized difference and ratio spectral indices and the multivariate models (partial least square regression (PLSR), principal component regression (PCR), and support vector regression (SVR)) were ineffective in predicting any of the leaf nutrients. An approach of using PLSR-combined machine learning models was found to be the best to predict most of the nutrients. Based on the independent validation performance and summed ranks, the best performing models were Cubist (R² ≥ 0.91, ratio of performance to deviation (RPD) ≥ 3.3, and ratio of performance to interquartile distance (RPIQ) ≥ 3.71) for nitrogen, phosphorus, potassium, and zinc; SVR (R² ≥ 0.88, RPD ≥ 2.73, RPIQ ≥ 3.31) for calcium, iron, copper, and boron; and elastic net (R² ≥ 0.95, RPD ≥ 4.47, RPIQ ≥ 6.11) for magnesium and sulfur. The results of the study revealed the potential of using hyperspectral remote sensing data for non-destructive estimation of mango leaf macro- and micro-nutrients. The developed approach is suggested to be employed within operational retrieval workflows for precision management of mango orchard nutrients.


Introduction
Over the last two decades, advancements in remote sensing technologies such as the use of reflectance spectroscopy, airborne and satellite technology, and statistical analysis approaches thereof have made it easy to understand several key processes and components of plants such as plant population [1][2][3], grain yield and biomass [4][5][6][7][8], pigment or chlorophyll [9][10][11], water stress response [12][13][14][15], nutritional status [16][17][18][19][20][21] or pest and disease identification [22][23][24][25]. Yet, in-field proximal sensing to estimate the nutritional status of the data in a wavelength range of 500-1700 nm of maize and soybean, [51] determined the macronutrients content (N predicted best followed by P, K, and S) (R² = 0.69-0.92 and ratio of performance to deviation (RPD) = 1.62-3.62) and micronutrients (Cu and Zn were best predicted, followed by Fe and Mn) (R² = 0.19-0.86, RPD = 1.09-2.69) satisfactorily. The predictions of sodium and B were not satisfactory. Machine learning models are a robust technique to analyze and model data that are non-linear and non-parametric [21,52,53]. Use of a combination of linear multivariate analysis models such as PLSR and principal component analysis (PCA) with non-linear and non-parametric models such as artificial neural network (ANN) [54][55][56][57], elastic net (ELNET), support vector regression (SVR), Gaussian process regression (GPR), multivariate adaptive regression spline (MARS), random forest (RF), extreme gradient boosting (XGB), generalized additive model (GAM), and k-nearest neighbor (KNN) [20] has been reported to retrieve information from spectral features. The machine learning models are capable of performing numerous calculations in several combinations and are thus useful to reduce the time involved in the analysis.
In [21], the authors predicted the leaf N, P, K, Ca, Mg, S, Cu, iron (Fe), manganese (Mn), and Zn of citrus (Valencia orange) by applying RF, ANN, and KNN to the spectral reflectance and its first-order derivative, with R² = 0.61 to 0.91. RF and support vector machine regression (SVMR) of airborne hyperspectral remote sensing imagery data were used by [58] to estimate the N, P, K, Zn, Na, Cu, and Mg (R² = 0.55-0.78 with RF) and the S and Mn (R² = 0.68-0.86 with SVMR) of mixed pasture in New Zealand. They emphasized a better performance of the non-linear machine learning models (SVMR and RF) than the linear model (PLSR). Some of the commonly employed multivariate modeling techniques to extract information from hyperspectral data and to establish a relationship between spectral reflectance and measured variables are stepwise MLR [59,60], PLSR [61,62], successive projections algorithm coupled with MLR [63,64], ANN [63,65], and SVMR [47,66]. Very recently, a combination of PCA- and PLSR-combined learning models has been used by [20] as a non-destructive tool to predict the leaf ion (K, Na, Ca, and Mg) content for phenotyping of rice under salt stress. They found the prediction accuracy of the different approaches to be in the order PLSR-combined models > PCA-combined models > indices-based models. A non-linear SVR with a radial basis function (RBF) kernel predicted the critical N concentration in the sugarcane canopy, which correlated with the actual N with an R² of 0.78 and an RMSE of 0.035% [67]. Nutrients with low plant or leaf concentration and subtle physical absorption features still pose a challenge, and less attention has been paid to their accurate estimation using remote sensing techniques. Hence, studies to develop techniques that can accurately estimate foliar mineral nutrients are required.
Use of the linear models such as PLSR, PCA, or PCR in combination with the non-linear machine learning models has been gaining popularity to retrieve information from the hyperspectral reflectance data [54][55][56][68]. This can be achieved by using the principal components, latent variables, or variables selected through variable importance [69] as an input for further regression or machine learning modeling. These approaches reduce the collinearity and data dimensionality and increase the computation speed but at the same time retain most of the information of the original dataset [54,68]. The use of linear and non-linear regression analysis has been successfully demonstrated in a few studies but very limited information is available on this aspect in fruit crops, specifically mango.
Mango is the "king of fruits" and its estimated area of cultivation in the world is 5.44 million hectares with production and productivity of 43.3 million tonnes and 7.96 t ha⁻¹, respectively [70]. In India, mango is grown over 2.52 million hectares and has a productivity of 6.92 t ha⁻¹. The share of India in the world's area of cultivation and production is 46% and 41.6%, respectively. Though India's share of the world's area and production is huge, its productivity is lower than the world's average and that of all other mango-producing countries. Among many others, one of the major constraints to the yield is suboptimal and inappropriate nutrient management [71]. Conventional agronomic methods for plant nutrient estimation are practiced regularly at important growth stages to manage the fertilizer nutrients [72]. These methods need the collection and analysis of a huge number of leaf or tissue samples, which is time-consuming and expensive [48,73]. Remote sensing could be a viable tool to estimate the plant's nutritional status and assist in understanding the appropriate amounts of fertilizer inputs in a cost-, labor-, and time-effective manner.
Owing to the limited knowledge available on the use of hyperspectral remote sensing to characterize the foliar nutrient status in mango, our study was undertaken with the objectives (1) to compare the efficacy of the spectral indices and chemometric modeling methods and (2) to develop robust quantification models by combining linear and non-linear machine learning models to estimate the foliar macro- and micro-nutrient status of mango.

Materials and Methods
The objectives of the current study were achieved through four different steps as demonstrated in the scheme of Figure 1. A brief outline about the steps is also presented as follows: Step 1-data collection: field-level leaf sampling and chemical analysis to determine the nutrient content and measurement of the spectral data in the laboratory; Step 2-spectral data pre-processing; Step 3-data analysis: development of vegetation indices and machine learning models, and Step 4-identification of spectral algorithms: identification of the robust spectral algorithms based on the model prediction evaluation parameters.

Experimental Setup
Around 400 leaf samples were collected from mango orchards located in North Goa and South Goa districts of Goa State on the west coast of India ( Figure 2). Leaf samples were collected from mature plants yielding fruits with an approximate age of 8-10 years. A 4-7-month-old leaf with petiole from the middle of the shoot was collected during the post-fruiting season, i.e., June-July of the years 2018 and 2019. The sampling was undertaken for three weeks. The post-fruiting season was selected in the view that the plant is exhausted of nutrients due to the fruiting in the previous season and gives the actual idea of the nutritional status. The fertilizer application is normally recommended in the second fortnight of June or the end of the monsoon season (second fortnight of September). The post-fruiting stage was ideal for the study to get actual nutritional status and to make the fertilizer prescription for the subsequent season. The samples were collected on dry sunny days.
Remote Sens. 2021, 13
Figure 1. The scheme of the methodology followed in the study.

Spectral Measurements
A total of 40 orchards were identified for the study, and samples from 10 trees from each orchard were collected, making a total of 400 samples. Immediately after collection from the field, the leaf samples were placed in a thermally insulated box to avoid any changes in biochemical properties due to change in temperature and transported to the laboratory. The spectral measurements of the detached leaves were carried out in the laboratory on the same day of leaf sample collection. The mango leaf samples were scanned to record the spectral data in the wavelength range of 282-1097 nm using the optical fiber of a visible near-infrared spectroradiometer (GER1500, Spectra Vista Corp., Poughkeepsie, NY, USA) as non-contact observations. The spectroradiometer was calibrated with the Spectralon® panel (Spectra Vista Corp., Poughkeepsie, NY, USA) (100% spectral reflectance) before recording the spectral measurements of the leaf samples. The spectral observations of the adaxial surface of mango leaves were taken within a black box to reduce the impact of stray light. It was ensured that the leaves covered the full field of view of the foreoptics (pistol grip). The spectral observations were taken at nadir position to reduce the impact of bidirectional reflectance. The calibration was repeated after every five samples. The spectral reflectance data were collected at a bandwidth of 1.5 nm. Further, spectral resampling at a 1 nm interval was done using spline interpolation. The spectral data were then smoothed to reduce the noise using Savitzky-Golay filtering across a 15-band moving window (a window length of 15 and zero polynomial order) with the "prospectr" package in R software version 3.5.2 [74]. Scatter correction of the data was further done using the standard normal variate technique. Only the spectral data in the 350-1050 nm range were used, because this range was free of noise.
For each leaf sample, an average of five measurements was considered as a representative spectral signature. The average spectral reflectance with standard deviation for the calibration and validation dataset has been presented as Figure 3.
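The pre-processing chain described above can be illustrated with a minimal Python sketch. It is not the "prospectr" implementation used in the study: the function names are hypothetical, a Savitzky-Golay filter of polynomial order zero is written out as the centred moving average it reduces to, and dropping the edge bands that lack a full window is an assumption of this sketch. The order in which replicate averaging and smoothing were applied is not stated in the text.

```python
from statistics import mean, stdev


def smooth_zero_order(spectrum, window=15):
    """Savitzky-Golay smoothing with polynomial order 0, i.e. a centred moving average.

    Edge bands without a full window are dropped (an assumption of this sketch).
    """
    if window % 2 == 0 or window > len(spectrum):
        raise ValueError("window must be odd and no longer than the spectrum")
    half = window // 2
    return [mean(spectrum[i - half:i + half + 1])
            for i in range(half, len(spectrum) - half)]


def standard_normal_variate(spectrum):
    """Centre one spectrum on its own mean and scale it by its own standard deviation."""
    centre = mean(spectrum)
    spread = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(value - centre) / spread for value in spectrum]


def average_replicates(replicates):
    """Average repeated scans of the same leaf band by band."""
    return [mean(band_values) for band_values in zip(*replicates)]
```

In this sketch a leaf signature would be obtained by averaging its five scans with `average_replicates`, then applying `smooth_zero_order` followed by `standard_normal_variate`.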

Chemical Analysis
After capturing the leaf spectral data, the samples were oven-dried at 60 °C until a constant weight was achieved. The samples were powdered and stored in a zip-lock plastic bag for further chemical analysis. The powdered leaf samples were digested using a mixture of nitric acid and perchloric acid in a 9:4 v/v proportion for analysis of the P, K, Ca, Mg, S, Fe, Mn, Zn, Cu, and B [75]. The total N concentration in the mango leaves was estimated using the modified micro-Kjeldahl method [76]. Leaf P concentration was determined by measuring the intensity of the yellow color developed by vanado-molybdate reagent with a spectrophotometer [75]. The S concentration of the leaf samples was estimated by measuring the turbidity developed by barium chloride using a spectrophotometer [77]. Total leaf K, Ca, Mg, Fe, Mn, Zn, and Cu concentrations were measured with an Atomic Absorption Spectrophotometer (nova400P, Analytik Jena, Germany). The B concentration was estimated by measuring the intensity of the pink color developed in the digest by Azomethine-H indicator with a spectrophotometer. The contents of the nutrients were expressed as percentage and parts per million on a dry weight basis.
Outliers in the 400 nutrient data points were identified and removed, resulting in a total of N = 376 samples for further statistical analysis. The data set was divided into 70% for model calibration (N = 263) and 30% for independent validation (N = 113). The equality of mean, variance, distribution, and CV of the calibration and validation datasets was analyzed using the t-test, F-test, Kolmogorov-Smirnov test, and Fligner-Killeen test, respectively.
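A random 70/30 division of this kind can be sketched as follows. The function name and the seed are hypothetical, and the study does not report how its random selection was generated, so this only illustrates that 376 samples split into 263 calibration and 113 validation samples.

```python
import random


def split_calibration_validation(sample_ids, calibration_fraction=0.7, seed=0):
    """Randomly split sample identifiers into calibration and validation sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)  # fixed seed only to make the sketch repeatable
    n_calibration = round(calibration_fraction * len(ids))
    return ids[:n_calibration], ids[n_calibration:]


calibration_ids, validation_ids = split_calibration_validation(range(376))
```

After such a split, the equality tests listed above would be run on the two subsets to confirm that they are statistically similar.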

Development of Parametric Regression Models
The best combination of the wavelengths for the development of the VIs was identified using contour plots. The normalized difference spectral index (NDSI) and ratio spectral index (RSI) were calculated as

$$\mathrm{NDSI} = \frac{R_i - R_j}{R_i + R_j}, \qquad \mathrm{RSI} = \frac{R_i}{R_j}$$

where $R_i$ and $R_j$ are the reflectance values at wavelengths $i$ and $j$. The spectral indices were calculated in MATLAB for all possible two-band combinations of the wavelengths. For each nutrient, the combination of wavelengths with the highest correlation coefficient with the leaf nutrient content was selected for the spectral index.
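The exhaustive band-pair search can be illustrated with a short sketch. The study performed this step in MATLAB; the Python below is only an illustration that assumes the commonly used definitions NDSI = (R_i - R_j)/(R_i + R_j) and RSI = R_i/R_j, and all function names are hypothetical. Each pair is tested in one order only (i before j), which for the RSI ignores the reciprocal ratio.

```python
from itertools import combinations
from math import sqrt


def ndsi(r_i, r_j):
    """Normalized difference spectral index of two reflectance values."""
    return (r_i - r_j) / (r_i + r_j)


def rsi(r_i, r_j):
    """Ratio spectral index of two reflectance values."""
    return r_i / r_j


def pearson_r(x, y):
    """Pearson correlation coefficient; returns 0.0 when either input is constant."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    denominator = sqrt(sxx * syy)
    return sxy / denominator if denominator > 0 else 0.0


def best_band_pair(spectra, nutrient, index_fn):
    """Return (i, j, r) for the band pair whose index correlates most strongly
    (largest absolute r) with the measured nutrient content."""
    best = (None, None, 0.0)
    for i, j in combinations(range(len(spectra[0])), 2):
        index_values = [index_fn(s[i], s[j]) for s in spectra]
        r = pearson_r(index_values, nutrient)
        if abs(r) > abs(best[2]):
            best = (i, j, r)
    return best
```

The full matrix of r values over all pairs is what a contour plot of the kind mentioned above would display; the sketch keeps only the maximum.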

Development of Nonparametric Regression Models
Multivariate models were built using the hyperspectral reflectance data and corresponding leaf nutrient content. Initially, three different models, i.e., PLSR, PCR, and SVR, were tested to retrieve the leaf nutrient contents from the spectral data. The best performing nutrient-specific model was identified. The latent variables (LVs) generated from the PLSR model were used as input variables for developing different linear and non-linear models. The machine learning regression algorithms evaluated in the current study were elastic net (ELNET), support vector regression (SVR), Gaussian process regression (GPR), multivariate adaptive regression splines (MARS) [78], random forest (RF), k-nearest neighbors (KNN), extreme gradient boosting (XGB) [79], neural network (NNET), and Cubist [80]. The hyperparameters of each model were calibrated using 10-fold cross-validation with five repetitions in the "caret" [81] package of R statistical software version 3.5.2 [82]. The hyperparameters optimized for each machine learning model were: ELNET: alpha, lambda; SVR: sigma, C; GPR: sigma; MARS: nprune, degree; RF: mtry, splitrule, min.node.size; KNN: number of neighbors (k); XGB: nrounds, max_depth, eta, gamma, colsample_bytree, min_child_weight, subsample; NNET: size, decay; and Cubist: committees, neighbors. Because every machine learning model was calibrated on the training dataset using 10-fold cross-validation with five repetitions, each model was run 50 times. The performance of a particular model to predict the leaf nutrient content was assessed based on the values of model evaluation parameters such as R², d-index, mean bias error (MBE), root mean square error (RMSE), ratio of performance to deviation (RPD), and ratio of performance to inter-quartile distance (RPIQ).
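The evaluation parameters listed above can be written out in a compact sketch. The exact formulas used in the study are not given in the text, so the following choices are assumptions: R² is the squared Pearson correlation between observed and predicted values, MBE is the mean of predicted minus observed, the d-index is Willmott's index of agreement, RPD is the sample standard deviation of the observed values divided by RMSE, and RPIQ uses quartiles computed with the inclusive method. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, quantiles, stdev


def evaluate(observed, predicted):
    """Return R2, d-index, MBE, RMSE, RPD and RPIQ for one set of predictions.

    Assumes the predictions are not perfect (RMSE > 0) and not constant.
    """
    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)
    rmse = sqrt(sse / n)
    obs_mean, pred_mean = mean(observed), mean(predicted)
    sxy = sum((o - obs_mean) * (p - pred_mean) for o, p in zip(observed, predicted))
    sxx = sum((o - obs_mean) ** 2 for o in observed)
    syy = sum((p - pred_mean) ** 2 for p in predicted)
    agreement = sum((abs(p - obs_mean) + abs(o - obs_mean)) ** 2
                    for o, p in zip(observed, predicted))
    q1, _, q3 = quantiles(observed, n=4, method="inclusive")
    return {
        "R2": sxy * sxy / (sxx * syy),
        "d": 1 - sse / agreement,
        "MBE": sum(errors) / n,
        "RMSE": rmse,
        "RPD": stdev(observed) / rmse,
        "RPIQ": (q3 - q1) / rmse,
    }
```

Both RPD and RPIQ scale the prediction error by the spread of the observed data; RPIQ uses the interquartile range, which is less sensitive to skewed nutrient distributions than the standard deviation.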
The prediction accuracy of different models was categorized based on RPD as excellent (>2.0), acceptable (1.4-2.0), and non-reliable (<1.4) [83] and based on RPIQ as very poor (<1.5), poor (1.5-2.0), good (>2.0-2.5), and very good (>2.5) [84]. The values of these parameters are suffixed with the letters c and v for calibration and validation, respectively. It is difficult to decide the best-performing model from evaluation parameters such as R², RMSE, RPD, and RPIQ individually. So, a composite summed rank based on these parameters was developed, considering the performance of each model parameter-wise for the calibration and validation. Each model evaluation parameter for a particular nutrient in calibration or validation was ranked using the RANK.AVG function of Microsoft Excel. The ranks of calibration and validation were summed separately and all together, and the total was referred to as the summed rank for a particular model. The model with the lowest summed rank predicted the nutrient with the greatest accuracy and the one with the highest summed rank had the poorest prediction accuracy.
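The categorization thresholds and the summed-rank procedure can be sketched as below. The ranking in the study was done with the RANK.AVG function of Microsoft Excel; this Python version only imitates its tie handling (tied values share the average of their positions). The function names, the dictionary layout, and the rule that rank 1 is the best value of each parameter are assumptions of the sketch.

```python
def classify_rpd(rpd):
    """RPD classes as quoted in the text from [83]."""
    if rpd > 2.0:
        return "excellent"
    if rpd >= 1.4:
        return "acceptable"
    return "non-reliable"


def classify_rpiq(rpiq):
    """RPIQ classes as quoted in the text from [84]."""
    if rpiq > 2.5:
        return "very good"
    if rpiq > 2.0:
        return "good"
    if rpiq >= 1.5:
        return "poor"
    return "very poor"


def rank_avg(values, higher_is_better=True):
    """Rank values so that 1 is best; tied values receive the average of their ranks."""
    ordered = sorted(values, reverse=higher_is_better)
    return [ordered.index(v) + 1 + (ordered.count(v) - 1) / 2 for v in values]


def summed_ranks(scores, higher_is_better):
    """scores: {model: {parameter: value}}; higher_is_better: {parameter: bool}.

    Returns {model: summed rank}; the lowest total marks the best model.
    """
    models = list(scores)
    totals = {model: 0.0 for model in models}
    for parameter, better_high in higher_is_better.items():
        ranks = rank_avg([scores[m][parameter] for m in models], better_high)
        for model, rank in zip(models, ranks):
            totals[model] += rank
    return totals
```

For example, with the thresholds above an RPIQ of 2.13 falls in the good class and an RPD of 1.42 in the acceptable class.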

Descriptive Statistics
The descriptive statistics of the mango leaf nutrients in the full, calibration, and validation datasets are presented in Table 1. The coefficient of variation (CV) for the nutrients analyzed varied from 10.30-93.30% for the calibration dataset and 10.90-88.40% for the validation dataset. For both these datasets, the greatest and least CV was observed for Cu and N, respectively. Similarly, for the full dataset, the greatest (91.90%) and least (10.50%) CV was observed for Cu and N, respectively. All the parameters were positively skewed except for N in the full and calibration datasets and N, P, K, and Mg in the validation dataset. The results revealed that the difference between the calibration and validation datasets for mean, variance, and CV was insignificant. The Kolmogorov-Smirnov test showed an equal distribution of leaf nutrient content across the calibration and validation datasets (p > 0.05) except for P (p = 0.03). These results confirm that the calibration and validation datasets are statistically similar and that the random selection employed is appropriate. The calibration and validation datasets represented the variability present in the full dataset. The Jarque-Bera test of normality indicated that all the parameters were normally distributed except N, Ca, Mn, Zn, Cu, and B (Table 1). The values of these nutrients were Box-Cox transformed to make them normally distributed before the data were employed for further statistical analysis (Table 2).
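For reference, the CV and the Box-Cox transformation mentioned above can be written as a small sketch. The transformation parameter (lambda) used for each nutrient is not reported in this section, so `lam` below is a free, hypothetical argument, and the function names are illustrative.

```python
from math import log
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV in percent: sample standard deviation relative to the mean."""
    return 100.0 * stdev(values) / mean(values)


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values for a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]
```

In practice lambda is usually chosen per variable, for example by maximum likelihood, so that the transformed values are as close to normal as possible.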

Indices Development and Prediction Performance
The best combinations of the wavelengths for the development of the VIs were identified through contour plots, which are presented in Figures 4 and 5. The NDSI and RSI identified for each nutrient are listed in Table 3, with the results of the prediction performance. In the case of NDSI, for calibration, the prediction accuracy as indicated by R²c ranged from 0.002 (N) to 0.466 (Mg) with RMSEc of 0.09 (S) to 2672.08 (Mg), respectively, while for validation, the R²v varied from 0.05 (N) to 0.41 (K) with RMSEv of 0.11 (P) to 2739.48 (Mg). In general, for calibration and validation, the RPD and RPIQ values were ≤0.94 and ≤1.25, respectively. This indicated that the predictions were very poor for all the nutrients using the NDSIs. In the case of RSIs, the R² and RMSE varied from 0.04 (N) to 0.50 (Mg) and 0.11 (P) to 871.58 (Mg), respectively, during calibration and from 0.06 (Cu) to 0.38 (Ca) and 0.08 (S) to 1081.80 (Mg), respectively, for validation. For the calibration and validation predictions using the RSIs, the RPD and RPIQ were ≤1.03 and ≤1.35, respectively. Similar to the NDSIs, the predictions using the RSIs were also very poor. In the current study, none of the spectral indices developed could yield successful predictions for any of the nutrients.

Performance of Nonparametric Regression Analysis
Multivariate analysis techniques such as PLSR, PCR, and SVR were employed to predict the mango leaf nutrient contents using the calibration and validation datasets (Table 4). Model evaluation parameters such as R², d-index, MBE, RMSE, RPD, and RPIQ were used to evaluate the prediction accuracy of the models. To avoid the complexity of judging the performance of a model from the evaluation parameters individually, a composite rank based on these parameters was developed considering the performance during calibration and validation. The overall summed ranking showed that the PLSR model was the best to predict most of the nutrients except N and S (best obtained by SVR) and Mn (best obtained by PCR), for which predictions were unreliable. The accuracy of the PLSR to predict P, K, Ca, Mg, Fe, Mn, Zn, and B with respect to R²c, RPDc, and RPIQc for calibration varied from 0.34-0.59, 1.23-1.56, and 1.47-2.13, respectively. During validation, these indices ranged from 0.26-0.53, 1.11-1.42, and 1.34-1.79, respectively. The greatest prediction accuracy was achieved for the leaf Ca (R²v = 0.53, RPDv = 1.42, and RPIQv = 1.79 for the independent validation). Based on the RPIQ values for both calibration and validation, predictions for all the nutrients were categorized as poor or worse except for Mg during calibration (RPIQ = 2.13, good). However, as per the criteria of RPD, the P, K, Ca, and Mg predictions for calibration and the Ca and Mg predictions for validation were acceptable. Among all the nutrients, the performance of these three multivariate models to predict N and Cu was the poorest, with R² ≤ 0.19, RPD ≤ 1.10, and RPIQ ≤ 1.90, indicating very poor prediction for both calibration and validation.

Performance of the PLSR-Combined Machine Learning Models
The results pertaining to the prediction performance of the PLSR-combined machine learning models are presented in Table A1 and the best performing models in Figure 6. The optimum number of latent variables (LVs) was generated and selected using PLSR and 10-fold cross-validation and used as predictor variables for machine learning model development. The LVs are linear combinations of all the input variables but orthogonal to each other, which helps to reduce collinearity. The numbers of LVs selected for N, P, K, Ca, Mg, S, Fe, Mn, Zn, Cu, and B were 1, 6, 5, 5, 7, 10, 5, 5, 5, 1, and 6, respectively. Overall, the prediction performance improved significantly with the PLSR-combined machine learning models over the single PLSR model. For all the nutrients, the performance of the best performing PLSR-combined machine learning models with respect to R², RPD, and RPIQ for calibration ranged from 0.95 to 0.99, 4.42 to 11.06, and 6.55 to 13.80, and for validation, these were 0.88 to 0.99, 2.73 to 5.76, and 3.31 to 7.65, respectively. Based on the RPD and RPIQ values, it was evident that all the machine learning models combined with PLSR were effective to predict all the macro- and micro-nutrients with very good to excellent prediction accuracy. Based on the independent validation performance and the summed ranks, the best performing models for different nutrients were Cubist for N (R²v = 0.94, RPDv = 4.27, and RPIQv = 6.03), P (R²v = 0.91, RPDv = 3.3, and RPIQv = 3. Although the prediction accuracies for all the models were very good to excellent, the most robust PLSR-combined models were Cubist, SVR, and ELNET. Table 5 gives an overview of the independent validation performance and identifies the best machine learning models combined with PLSR to predict mango leaf nutrients based on the RPD and RPIQ.
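The idea of feeding PLSR latent variables into a second learner can be illustrated with a self-contained sketch. It is not the "caret" workflow used in the study: the PLS scores are computed with a plain NIPALS PLS1 routine, and a simple k-nearest-neighbour average stands in for the tuned learners (Cubist, SVR, ELNET, and others). All function names are hypothetical, and the number of LVs is passed in rather than selected by cross-validation.

```python
def dot(a, b):
    """Dot product of two equal-length sequences."""
    return sum(x * y for x, y in zip(a, b))


def fit_pls1(X, y, n_lv):
    """NIPALS PLS1: return per-sample LV scores and the parts needed to project new spectra."""
    n = len(X)
    x_mean = [sum(col) / n for col in zip(*X)]
    y_mean = sum(y) / n
    E = [[v - m for v, m in zip(row, x_mean)] for row in X]  # centred spectra
    f = [v - y_mean for v in y]                               # centred nutrient values
    weights, loadings, scores = [], [], []
    for _ in range(n_lv):
        w = [dot(col, f) for col in zip(*E)]      # direction of maximum covariance with y
        norm = dot(w, w) ** 0.5
        w = [v / norm for v in w]
        t = [dot(row, w) for row in E]            # scores on this latent variable
        tt = dot(t, t)
        p = [dot(col, t) / tt for col in zip(*E)]
        q = dot(f, t) / tt
        # Deflate X and y so that the next latent variable is orthogonal to this one.
        E = [[v - ti * pj for v, pj in zip(row, p)] for row, ti in zip(E, t)]
        f = [v - q * ti for v, ti in zip(f, t)]
        weights.append(w)
        loadings.append(p)
        scores.append(t)
    sample_scores = [list(row) for row in zip(*scores)]
    return sample_scores, {"x_mean": x_mean, "weights": weights, "loadings": loadings}


def project(x, model):
    """Compute the LV scores of one new spectrum with a fitted model."""
    residual = [v - m for v, m in zip(x, model["x_mean"])]
    lv_scores = []
    for w, p in zip(model["weights"], model["loadings"]):
        t = dot(residual, w)
        lv_scores.append(t)
        residual = [v - t * pj for v, pj in zip(residual, p)]
    return lv_scores


def knn_predict(train_scores, train_y, query_scores, k=3):
    """Stand-in second-stage learner: mean nutrient value of the k nearest samples in LV space."""
    by_distance = sorted(
        zip(train_scores, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query_scores)),
    )
    nearest = by_distance[:k]
    return sum(value for _, value in nearest) / len(nearest)
```

A validation spectrum would be passed through `project` and its LV scores handed to the second-stage learner, so that the learner only ever sees a handful of mutually orthogonal predictors instead of hundreds of collinear bands.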
Among the nine machine learning models tested, the performance of the MARS, RF, and KNN was the poorest and yielded non-reliable predictions for most of the nutrients except MARS for K and Zn and KNN for Mg and B.

Figure 6. Performance of the best performing PLSR-combined models for predicting nutrients as (a) N using PLSR-Cubist, (b) P using PLSR-Cubist, (c) K using PLSR-Cubist, (d) Ca using PLSR-SVR, (e) Mg using PLSR-SVR, (f) S using PLSR elastic net (ELNET), (g) Fe using PLSR-SVR, (h) Mn using PLSR-SVR, (i) Zn using PLSR-Cubist, (j) Cu using PLSR-SVR, and (k) B using PLSR-SVR.

Variations in Leaf Nutrient Concentrations and Spectral Data
Before employing the nutrient and spectral data for the statistical analysis, it is very important to appropriately split the data into calibration and validation datasets. The insignificant results of the t-test, F-test, Kolmogorov-Smirnov test, and Fligner-Killeen test indicated that the random division of the data into calibration and validation datasets was appropriate, rendering it suitable for further statistical analysis. The variations in the spectral data were more prominent in the NIR region than in the visible region. Similar findings were noted by [85] and [20] while predicting the leaf ion content using remote sensing in cotton and rice, respectively. The spectral pattern and variations recorded made the spectral data suitable for further analysis. Prominent leaf spectral variations in visible-NIR regions were reported by [21] for predicting macro- and micro-nutrient content in orange. In [49], the hyperspectral features in the spectral region of 470-800 nm were shown to be useful for detecting concentrations of leaf nutritional elements. In [86], variations in the spectral signature of oil palm for different nutrients such as N, P, K, Mg, Ca, and B were observed. Higher reflectance in the infrared region (650-900 nm) was also observed by [87] in groundnut plants while predicting the N, P, and K content and yield. Wide variation in the nutrient status of the plant is an important prerequisite to developing prediction models from remote sensing data. In the current study, a wide variation in the nutrient data was observed for all nutrients except N and Zn. Such observations are supported by the results obtained by [49], with the highest variations for Ca and the least for Mg in tallgrass prairie vegetation. The degree of variation may also affect the prediction of nutrients using spectral data and different statistical analysis methods.

Vegetation Indices
In the current study, none of the spectral indices developed could predict any of the leaf nutrients successfully. The inability of model development by spectral indices could be the outcome of an unsuccessful match of selected indices and wavelengths as individual wavelengths and/or regions might not have strongly correlated with nutrient concentrations. Another probable explanation could be the inability to better deal with confounding factors such as reflectance saturation, leaf area, roughness, and moisture in the leaf, which reduces the performance of raw spectral bands [88]. Earlier studies used different vegetation indices for predicting foliar nutrient in different crops and most of these were used to detect foliar or canopy N, P, and K content as they are powerful indicators of plant nutrition status [89,90]. Normalized difference spectral indices were effectively used by [91] and [92] to estimate leaf N, P, or K content in different plant species.
In [11], 43 previously published empirical spectral indices were found to predict the N, P, and K content of shrub and grass vegetation in China poorly. To overcome this, a linear regression analysis was performed to optimize the band-band combinations, which effectively retrieved the leaf N, P, and K content (R² > 0.5, p < 0.05). This confirms that hyperspectral data could be potentially used for fine-scale monitoring of degraded vegetation.
The use of a few wavelengths to develop a spectral index and predict nutrients offers a simple way to model any parameter, but at the same time, it does not consider the information hidden in the other parts of the spectrum. A poor prediction accuracy was found by [93] for the coffee canopy N using satellite data with the inverted red-edge chlorophyll index (R² = 0.66), relative normalized difference index (R² = 0.48), red-edge chlorophyll index (R² = 0.28), and normalized difference infrared index, with R² ranging from 0.28 to 0.67. Thus, our results on the poor performance of spectral or vegetation indices are consistent with those reported by [11,93], among others. A prediction accuracy of R² = 0.16-0.48 was obtained by [94] to predict the N:P ratio of grass vegetation using previously published vegetation indices computed from satellite data; however, the performance improved to R² = 0.59-0.72 with optimized vegetation indices.

Chemometrics and Machine Learning Regression Modeling
Among the three multivariate models tested, the PLSR was the best to predict most of the nutrients, the exceptions being SVR for N and S and PCR for Mn. However, the prediction performance was poor, with R² ≤ 0.53 and low values of RPD and RPIQ (Table 4). The leaf nutritional elements were predicted with PLSR modeling of the spectral data by [95] and [96]. Our results are also consistent with those of [49], who used the PLSR model to predict the tallgrass prairie leaf pigment and nutritional status with the lowest RMSE of prediction. A reasonable selection of modeling and validation datasets is important to improve the prediction accuracy of the PLSR models. The spectral modeling of leaf nutrients is complex and depends on spectral features [49]. The PLSR has the capability of building linkages between the high-dimensional spectral features and the vegetation properties. The reliability of PLSR is due to its ability to address collinearity and over-fitting in the hyperspectral data better than other multivariate models [97,98], and hence the PLSR is widely preferred in hyperspectral analysis [99][100][101]. The poor prediction performance in the current study could be due to the low nutrient ranges and weak relationships between nutrients and reflectance that hinder the model development.
The results of better prediction of N% with the SVR are consistent with those reported by [38] in pear (R² = 0.66) and apple (R² = 0.77). In [51], satisfactory results for all macronutrients (R² = 0.69-0.92, RPD = 1.62-3.62) were also observed, with N predicted best followed by P, K, and S. The micronutrients group showed lower prediction accuracy (R² from 0.19 to 0.86, RPD from 1.09 to 2.69); within this group, Cu and Zn were best predicted, followed by Fe and Mn. In the current study, we employed PLSR, but other multivariate modeling techniques such as random forest [102] and artificial neural networks [103] can also be used. The constraint of using advanced modeling tools could be that these are purely data-driven approaches, and it might be difficult to interpret the biological processes and significance. Owing to the poor performance of the single PLSR and other multivariate models, a new approach of combining the PLSR with machine learning [20] was attempted to retrieve the leaf nutrient content with improved accuracy. In general, prediction accuracy improved significantly with the PLSR-combined machine learning models over the single PLSR model. The ELNET, SVR, GPR, MARS, RF, KNN, XGB, NNET, and Cubist combined with the PLSR retrieved N, P, K, Ca, Mg, S, Fe, Mn, Zn, Cu, and B with very good to excellent prediction accuracy with few exceptions such as poor prediction of MARS for K and Zn and KNN for Mg and B, which yielded non-reliable predictions.
The current study explored an approach of using the PLSR in combination with machine learning models of the spectral data to retrieve leaf nutrient content; very few studies have been conducted in this context so far. Such an approach of latent variable modeling reduces the redundancy and dimensionality of the data and increases the computation speed, with a meager loss of information from the original data [104]. Furthermore, the use of visible-NIR (350-1050 nm) spectral data in the study also provides a greater opportunity to upscale it to the field level [20].
The approach identified in the current study would help in offering guidelines for precision nutrient management in mango crops, which might further help to improve the fruit yield and quality. The present approach is suitable for rapid and reliable estimation of the leaf nutrients at the laboratory level; however, field investigations are needed to upscale this research to the canopy level using ground-based or airborne hyperspectral remote sensing. A major constraint to upscaling the research to the field or canopy level is cloud cover, which coincides with the sampling time, i.e., the post-fruiting season in the study region.

Conclusions
In the current study, spectroscopy-based novel spectral indices, chemometric modeling methods (solo PLSR, PCR, and SVR), and PLSR-based machine learning models were evaluated to predict the mango leaf macro- and micro-nutrient contents. Both the spectral indices and the chemometric modeling methods were ineffective and could not retrieve any of the nutrients satisfactorily. In the study, a combination of linear and non-linear machine learning methods yielded the best predictions. The PLSR-combined machine learning models of Cubist, SVR, and ELNET were found to be the most robust in predicting most of the nutrients and provided very good to excellent prediction accuracy. The results of the study revealed that hyperspectral sensing data could be employed to retrieve the foliar nutritional status of mango. The presented approach is suitable for rapid and reliable estimation of the leaf nutrients at the laboratory level; however, field investigations are needed to upscale this research to the canopy level using ground-based or airborne hyperspectral remote sensing.