A Machine Learning Framework to Predict Nutrient Content in Valencia-Orange Leaf Hyperspectral Measurements

Osco, Lucas Prado; Ramos, Ana Paula Marques; Faita Pinheiro, Mayara Maezano; Moriya, Érika Akemi Saito; Imai, Nilton Nobuhiro; Estrabis, Nayara; Ianczyk, Felipe; Araújo, Fábio Fernando de; Liesenberg, Veraldo; Jorge, Lúcio André de Castro; Li, Jonathan; Ma, Lingfei; Gonçalves, Wesley Nunes; Marcato Junior, José; Eduardo Creste, José

doi:10.3390/rs12060906

by Lucas Prado Osco ¹,*, Ana Paula Marques Ramos ², Mayara Maezano Faita Pinheiro ², Érika Akemi Saito Moriya ³, Nilton Nobuhiro Imai ³, Nayara Estrabis ¹, Felipe Ianczyk ¹, Fábio Fernando de Araújo ⁴, Veraldo Liesenberg ⁵, Lúcio André de Castro Jorge ⁶, Jonathan Li ⁷, Lingfei Ma ⁷, Wesley Nunes Gonçalves ¹, José Marcato Junior ¹ and José Eduardo Creste ⁴

¹ Faculty of Engineering, Architecture, and Urbanism and Geography, Federal University of Mato Grosso do Sul (UFMS), 79070-900 Campo Grande, Brazil

² Environmental and Regional Development, University of Western São Paulo (UNOESTE), 19050-920 Presidente Prudente, Brazil

³ Department of Cartographic Science, São Paulo State University (UNESP), 19060-900 Presidente Prudente, Brazil

⁴ Department of Agronomy, University of Western São Paulo (UNOESTE), 19050-920 Presidente Prudente, Brazil

⁵ Forest Engineering Department, Santa Catarina State University (UDESC), 88520-000 Conta Dinheiro, Brazil

⁶ National Research Center of Development of Agricultural Instrumentation, Brazilian Agricultural Research Agency (EMBRAPA), 13560-970 São Carlos, Brazil

⁷ Department of Geography and Environmental Management and Department of Systems Design Engineering, University of Waterloo (UW), Waterloo, ON N2L 3G1, Canada

* Author to whom correspondence should be addressed.

Remote Sens. 2020, 12(6), 906; https://doi.org/10.3390/rs12060906

Submission received: 4 February 2020 / Revised: 6 March 2020 / Accepted: 7 March 2020 / Published: 12 March 2020

(This article belongs to the Special Issue Hyperspectral Remote Sensing of Agriculture and Vegetation)

Abstract

This paper presents a framework based on machine learning algorithms to predict nutrient content in leaf hyperspectral measurements. This is the first approach to evaluate macro- and micronutrient content with both machine learning and reflectance/first-derivative data. For this, citrus leaves collected at a Valencia-orange orchard were used. Their spectral data were measured with an ASD FieldSpec® HandHeld 2 spectroradiometer, and the surface reflectance and first-derivative spectra in the spectral range of 380 to 1020 nm (640 spectral bands) were evaluated. A total of 320 spectral signatures were collected, and the leaf-nutrient content (N, P, K, Mg, S, Cu, Fe, Mn, and Zn) was associated with them. For this, 204,800 (320 × 640) combinations were used. The following machine learning algorithms were used in this framework: k-Nearest Neighbor (kNN), Lasso Regression, Ridge Regression, Support Vector Machine (SVM), Artificial Neural Network (ANN), Decision Tree (DT), and Random Forest (RF). The training methods were assessed based on Cross-Validation and Leave-One-Out. The Relief-F metric of the algorithms’ prediction was used to determine the most contributive wavelength or spectral region associated with each nutrient. This approach was able to predict, with high coefficients of determination (R²), nutrients like N (0.912), Mg (0.832), Cu (0.861), Mn (0.898), and Zn (0.855), and, to a lesser extent, P (0.771), K (0.763), and S (0.727). These accuracies were obtained with different algorithms, but RF was the most suitable to model most of them. The results indicate that, for the Valencia-orange leaves, surface reflectance data are more suitable to predict macronutrients, while first-derivative spectra are better linked to micronutrients. A final contribution of this study is the identification of the wavelengths that contribute most to these predictions.

Keywords:

spectroscopy; proximal sensor; macronutrient; micronutrient; artificial intelligence

1. Introduction

Remote sensing techniques can be useful for the estimation of plant health conditions, including monitoring the nutritional status [1,2,3,4], the stress response [5,6,7], plant count [8,9], yield prediction [10,11,12], chlorophyll content [13,14,15], pest and disease identification [16,17], and biomass estimation [18], among others. Multisensor data are often used to accomplish this task, including data acquired by orbital sensors, aircraft or Unmanned Aerial Vehicle (UAV)-embedded cameras, terrestrial sensors, and field spectroradiometers, known as proximal sensors [19,20,21,22,23]. Proximal sensors can measure the spectral response of a target at very high resolutions while suffering less radiometric interference because they operate near the leaf sample.

The usage of proximal sensors for plant evaluation has assisted phenological studies of different species. Due to the high spectral resolution capability of these sensors, studies have been relatively successful in modeling phenomena, such as the ones previously stated, but at the leaf level, like plant stress, yield prediction, nutrient content, chlorophyll, and many other attributes [24,25,26,27]. They also have the advantage of helping to define, in detail, the appropriate spectral regions to estimate these phenomena. This definition is relatively important as it can guide future research towards the development of equipment specifically designed to measure these regions [23]. Another type of contribution is that it can assist in creating spectral vegetation indices or other simpler mathematical models that contribute to identifying the different characteristics of plants [13,28].

Currently, one of the most common problems in monitoring crops is determining the proper fertilization rates. Traditional agronomic methods used to evaluate plant nutrients are applied regularly, in key periods, to manage the fertilization of agricultural fields [29]. These methods require the collection of a high number of leaves for the chemical analysis of the leaf tissue. However, this chemical analysis is a time-consuming, labor-intensive, and polluting task [3,30,31]. Remote sensing, specifically proximal sensing, can provide an effective alternative to assist the nutritional analysis of plants. The use of proximal sensors has an advantage over traditional agronomic methods since it allows vegetation conditions to be inferred in a non-invasive and non-destructive manner [32,33,34,35].

Regarding the monitoring of plant and leaf nutritional conditions by remote sensing systems, recent research has made significant advances, especially in the estimation of nitrogen (N) content [1,2,3,4,21,25,28,31]. These studies were conducted at orbital, aerial, terrestrial, or proximal levels in different crops. N deficiency is linked to a characteristic chlorosis symptom, which is observable in the visible spectrum [21,25,28]. Still, considerable research was also able to identify spectral bands and wavelengths in the near and short-wave infrared regions related to this nutrient [2,3,25,28,36]. However, even though N is commonly evaluated by remote sensing systems, the same cannot be said about other nutrients.

The evaluation of nutrients other than N by proximal sensors is less common. One study was able to infer potassium (K) content by computing random two-band spectral indices calculated from hyperspectral data ranging from 350 to 2500 nm [37]. Another focused on evaluating a larger group of macronutrients, namely magnesium (Mg), sulfur (S), phosphorus (P), K, and calcium (Ca), and found important associations between them and the spectral region of 470 to 800 nm [32]. Lastly, one approach aimed to predict macronutrients like K, Ca, and Mg, as well as micronutrients like manganese (Mn) and iron (Fe), by using near-infrared spectroscopy, but the method did not return satisfactory results for micronutrients [24]. Recent literature demonstrates how hyperspectral measurements are being linked to nutrients, specifically macronutrients. However, there is a gap in terms of micronutrient prediction by spectral sensors that needs to be addressed by new research, and few studies have been conducted on this theme. In citrus, a study performed a Partial Least-Squares Regression (PLSR) evaluation on both macro- and micronutrients and achieved interesting results using Laser-Induced Breakdown Spectroscopy (LIBS) [38]. Similar research, focusing only on near-infrared spectroscopy, also returned high predictions for both classes of nutrients [39].

Another way to infer chemical components from hyperspectral measurements is by applying a derivative analysis. The derivation of the reflectance data highlights absorption features of components that, in a traditional spectral curve (i.e., reflectance curve), may not be measured with the same accuracy or even be detected [40,41,42]. Studies that apply a derivation of the reflectance curves in plants have found good correlations with N [40,41] and cadmium (Cd) concentrations [42]. Since the gains of derivative analysis are known in the literature, other data analysis methods used in remote sensing may also benefit from it. The advantages provided by derivatives may assist in the evaluation of leaf nutritional content when combined with more robust techniques.

The aforementioned studies found strong relationships with hyperspectral data by employing various statistical methods in their analysis. However, methods like Partial Least-Squares Regression (PLSR), Principal Component Analysis (PCA), and Stepwise-Multiple Linear Regression (SMLR), among others, returned different accuracies even for the same crops [15,24,32,37,38,39,40,42]. Some of these methods are also reductive, and their prediction accuracy may decrease as the model complexity increases [32]. Since hyperspectral measurements produce large and complex amounts of data, one type of approach that could deal with this is machine learning.

Machine learning algorithms are robust techniques that can model different types of data [43,44]. These algorithms have the advantage of being non-parametric and non-linear while being able to analyze noisy and imperfect data [45,46,47]. They are also capable of performing numerous combinations and calculations in a matter of seconds, and have achieved relative success in remote sensing applications regarding plant analysis [48,49]. Concerning hyperspectral measurements, these algorithms were able to return state-of-the-art performances in many of the applications evaluated [5,16,23,50,51,52]. Even so, to date, no study has evaluated the performance of machine learning algorithms in inferring both plant macro- and micro-nutritional content with only leaf hyperspectral measurements. Since these algorithms have returned good accuracies in different hyperspectral analyses, they could be appropriate to deal with the complexity imposed by this type of dataset in the described situation.

As previously stated, the first-derivative of the reflectance data has already been proven effective in associating with different chemical components. From this information, it could be assumed that both the reflectance data and its first-derivative could be of assistance in predicting different nutrients of the leaf tissue. Since the derivation of a reflectance curve can highlight components that are hard to detect in the original curve, it is possible that, by integrating these curves with machine learning algorithms, one can obtain important information regarding proximal sensing and plant nutritional analysis. In this spirit, a framework adopting machine learning algorithms is proposed to predict macro- and micronutrient content in the leaf tissue directly from its hyperspectral response. This is the first approach to evaluate different nutrient content combining machine learning methods and reflectance/first-derivative data.

In this study, citrus leaves—more specifically, from Valencia-orange trees—were selected to compose the experimental dataset. It is well known that a sufficient supply of both macro- and micronutrients is critical to the management and sustainability of these plants, and the balance of available nutrients is a key component to profitability [53]. Citrus plants are economically important to the agricultural sector of many countries and may benefit significantly from a rapid and indirect nutritional assessment, such as the one proposed here. In this manner, the aims of this work are to a) show a method to indicate the most suitable spectra (reflectance/first-derivative), in order to model the nutrient content according to the algorithms’ performance; and b) determine the important wavelengths or spectral regions associated with each nutrient.

2. Materials and Methods

The framework proposed in this paper was divided into four main phases (Figure 1). In the first phase, the hyperspectral measurements of the leaf samples in a Valencia-orange orchard were performed. These measurements were conducted with a field spectroradiometer. In the second phase, the spectral measurements were corrected, and the data were pre-processed. These corrections aimed to convert the radiance signal to reflectance, as well as to remove the noise and calculate the first-derivative. The third phase involved the data analysis by machine learning algorithms. In this phase, a fine-tuning to determine the most appropriate parameters to model the data was performed. The fourth and final phase consisted of the organization of the prediction values into a hyperspectral map, in which the most appropriate algorithm and wavelength (i.e., spectral window) to predict each nutrient were identified.

2.1. Study Area and Data Acquisition

As a study area, an open field of citrus trees, located on private property in the municipality of Ubirajara, São Paulo state, Brazil, was selected. All trees analyzed were Valencia-orange (Citrus sinensis “Valencia”), planted on a Citrumelo–Swingle rootstock. During the evaluation, the trees were in their vegetative phase, with an adult size, measuring nearly 3 m in height (ground-related), with crown areas around 5.5 m². During the survey, the trees were at their maturing stage, which is five years from their initial planting. The area contains 752 trees per hectare, planted at approximately a 7 × 1.9 m spacing. Each plantation field was 250 m × 250 m in size, with some fields compensating for others according to their location. The plantation fields were selected randomly inside this property and represented the different treatment conditions. Before the analysis, the soil was fertilized with 250 kg/ha of N in the form of urea, 125 kg/ha of phosphorus, expressed as P₂O₅, and 167 kg/ha of potassium oxide (K₂O). The area is predominantly composed of red-yellow podzolic soil, situated in a Cwa Köppen [54] subtropical climate type unit.

This paper evaluated leaf samples from multiple orange trees scattered around different planting fields in an experimental portion of the orchard. In each field, the number of trees was selected according to the size of the planting field and the number of trees planted per area. The selected trees were used both to measure the leaf hyperspectral response and to collect the leaves. A total of 320 samples, with both spectral curves and later-known nutrient content, were gathered in the field during this survey. The sampling method followed standard recommended agronomic procedures, guided by a field specialist. To represent the proper conditions of a citrus tree, only leaves at a medium canopy height that were visually healthy, with no signs of disease or damage, were evaluated. A lift platform was used to elevate the person with the equipment. Since the chemical analysis of the leaf tissue recommends the 3rd or 4th leaf of a fruit branch to be sampled, the spectroradiometer was directed as close as possible to the leaves that matched this description.

After measuring the spectral radiance, the leaves were removed from their respective branches, separated, and placed in identified plastic bags to be submitted to the laboratory. The leaf samples consisted of the same leaves that had their spectral radiance measured. They were conditioned at an appropriate temperature and transported accordingly. In the laboratory, the leaves were washed with a neutral detergent to remove any impurities. Later, they were dried in an oven for 48 h at 60–65 °C, and then crushed. From the crushed material, 100 mg was used for the N analysis. For that, the Kjeldahl titration method [55], divided into 3 phases, was followed: (1) digestion; (2) distillation in a nitrogen distiller; and (3) titration with sulfuric acid (H₂SO₄). The remaining material was separated and used for the analysis of the other macronutrients (P, K, Ca, Mg, and S) and the micronutrients (copper (Cu), Fe, Mn, and zinc (Zn)), following standard laboratory procedures of chemical analysis of the leaf tissue [56].

2.2. Hyperspectral Measurement Processing

The spectral radiance of the Valencia-orange leaves was measured with an ASD FieldSpec® HandHeld 2 spectroradiometer. To record each target, the equipment was positioned at a 45° angle relative to the tree canopy. For that, a lift platform was used to ensure the correct height. This equipment operates at a spectral range of 325 nm to 1075 nm. In this study, a 10° aperture lens was adopted, and 10 readings were taken on each leaf to produce one mean spectral signature. This procedure was important to reduce noise and variance for the same target. Before each spectral measurement, a Lambertian surface plate (Spectralon®) was registered. This Lambertian plate was used to calibrate the equipment and convert the digital number to a physical signal. As mentioned, the in-field leaf spectral response was recorded in 320 measurements for this experiment.

The measured spectral curves consist of the radiance value of the target (i.e., leaf samples) spread along the electromagnetic spectrum. To produce the reflectance value (i.e., reflectance factor), the Hemispherical Conical Reflectance Factor (HCRF) was calculated as shown in Equation (1) [57]:

$$\mathrm{HCRF}(\omega_{i}, \omega_{r}) = \frac{dL(\theta_{r}, \Phi_{r})\,(\text{target})}{dL(\theta_{r}, \Phi_{r})\,(\text{reference})}\, K(\theta_{i}, \Phi_{i}, \theta_{r}, \Phi_{r}) \tag{1}$$

where dL is the radiance; ω is the solid angle; θ and Φ are the zenith and azimuth angles, respectively; i is the incident flux; and r is the reflected energy flux. The K value is the calibration coefficient (i.e., correction factor specified for the equipment). The target corresponds to the radiance of the leaf and the reference is the radiance of the Lambertian surface plate. The HCRF represents the spectral signature of the recorded target.
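
As a minimal sketch of Equation (1), the reflectance factor can be computed band by band as the ratio between the leaf radiance and the reference-plate radiance, scaled by the calibration coefficient. The function name, the scalar `k`, and the radiance values below are illustrative assumptions, not values or code from this study.

```python
def reflectance_factor(target_radiance, reference_radiance, k=1.0):
    """Per-band reflectance factor: (L_target / L_reference) * K.

    `k` is treated here as a single scalar calibration coefficient
    (an assumption; in practice it may be specified per band).
    """
    if len(target_radiance) != len(reference_radiance):
        raise ValueError("target and reference must have the same number of bands")
    return [k * t / r for t, r in zip(target_radiance, reference_radiance)]


# Hypothetical radiance readings for three bands (illustrative only).
leaf = [0.02, 0.05, 0.30]
panel = [0.40, 0.50, 0.60]
hcrf = reflectance_factor(leaf, panel, k=1.0)
```

Under these assumed values, each element of `hcrf` is the fraction of the reference signal returned by the leaf in that band.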

After obtaining the reflectance factor of each leaf, bands with a low signal-to-noise ratio were removed by excluding wavelengths under 380 nm and above 1020 nm. After this, the first-derivative of all the HCRFs (n = 320) was calculated. The first-derivative calculation is a traditional method for modeling spectral data, and many approaches have discussed this issue. For this study, a linear least mean-squared smoothing filter [58] was first applied to reduce the random noise that may vary with the wavelengths and affect the derivative function. In most cases, noise can be assumed to be stationary with constant variance. A noise-free spectrum s(λ) can then be estimated in terms of the current value of the observed data. By knowing the correct signal of the spectrum at a given wavelength, s(λ), it is possible to estimate derivatives by suitable difference schemes according to a finite band resolution, Δλ. Thus, the first-derivative was calculated according to [58]:

$$\left.\frac{ds}{d\lambda}\right|_{i} = \frac{s(\lambda_{i}) - s(\lambda_{j})}{\Delta\lambda} \tag{2}$$

where Δλ is calculated as |λj − λi|, assuming that the interval between the bands is constant. Additional tests involving further derivatives, such as the second, third, and fourth, were also made in the experimental phase of this study. However, there were no indications of an improvement over the first-derivative for the used dataset during the machine learning analysis. For this reason, the proposed framework was limited to the first-derivative, but future research using different leaf data to process additional derivatives is encouraged.
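
The smoothing and differencing steps can be sketched as follows. This is an illustration under stated assumptions: a simple centered mean filter stands in for the linear least mean-squared filter of [58], the difference is taken forward between adjacent bands (so a rising spectrum gives a positive derivative, which is an assumed sign convention), and the 1 nm grid and synthetic spectrum are hypothetical.

```python
def moving_average(values, window=5):
    """Smooth a spectrum with a centered mean filter (stand-in smoother).

    The window shrinks at the edges so the output keeps the input length.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def first_derivative(wavelengths, spectrum):
    """Finite difference between adjacent bands, divided by the band spacing."""
    return [
        (spectrum[i + 1] - spectrum[i]) / (wavelengths[i + 1] - wavelengths[i])
        for i in range(len(spectrum) - 1)
    ]


# Synthetic example: a linear ramp sampled every 1 nm (assumed spacing).
wavelengths = list(range(380, 1021))
spectrum = [0.001 * (w - 380) for w in wavelengths]
derivative = first_derivative(wavelengths, moving_average(spectrum))
```

Away from the edges, the derivative of this synthetic ramp equals its slope (0.001 per nm), which is a quick sanity check for the difference scheme.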

From the total leaf measurements (n = 320) used here, 10% (n = 32) were randomly separated and designated to compose the testing dataset (Figure 2). Wavelengths ranging from 380 to 1020 nm were used in the software as columns, while the leaf measurements (320) were used as rows. The 32 measurements were configured as an independent dataset, which belonged to Valencia-orange trees located at different plantation fields, never before seen by the algorithms. The other 288 measurements configured the dependent dataset and belonged to trees with conditions or characteristics distinct from one another, observed during the field campaign. To indicate that, a descriptive statistical analysis was conducted with the nutrient concentrations from the chemical analysis of the leaf tissue, and the following parameters were calculated: minimum, maximum, mean, standard deviation, median, and coefficient of variation. They were important to demonstrate the variability of the dependent (calibration/training) data used, and how representative it could be of the nutritional conditions of Valencia-orange leaf tissue in the analyzed period.
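
A minimal sketch of the 90/10 separation and of the descriptive parameters listed above is given next. It assumes a plain random shuffle of sample indices, which simplifies the field-based separation described in the text; the function names and the seed are hypothetical.

```python
import random
import statistics


def split_indices(n_samples, test_fraction=0.10, seed=0):
    """Randomly split sample indices into (training, testing) lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


def describe(values):
    """Minimum, maximum, mean, standard deviation, median, and CV (%)."""
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    return {
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "std": std,
        "median": statistics.median(values),
        "cv_percent": 100.0 * std / mean,
    }


train_idx, test_idx = split_indices(320)
```

With 320 samples this yields 288 training indices and 32 testing indices; `describe` can then be applied to each nutrient column of the training subset.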

2.3. Machine Learning Analysis and Hyperspectral Mapping

In a computational environment, the nutrients were individually selected as the target variables. As input parameters, the reflectance and the first-derivative were used, and the performance of the algorithms in predicting these nutrients was evaluated. As stated, the curves were separated into three sets. The training dataset was used to tune the hyperparameters of the chosen algorithms. For that, the Random Search approach [59] was used. The same combination of training/validation data was adopted for all algorithms. This process was repeated with a fine-tuning until the reduction in the mean absolute error (MAE) no longer resulted in practical gains, as the modification in the parameters impacted the processing time. Once the hyperparameters of each algorithm were defined, the testing dataset was used to verify its real performance.
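
The Random Search idea can be sketched as sampling hyperparameter combinations at random and keeping the one with the lowest validation MAE. The search space, the parameter names, and the toy scoring function below are hypothetical placeholders; they do not reproduce the configuration used in this study.

```python
import random


def random_search(param_space, score_fn, n_trials=50, seed=0):
    """Sample random parameter combinations and keep the lowest score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.choice(choices) for name, choices in param_space.items()}
        score = score_fn(params)  # e.g., validation MAE of a fitted model
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical search space for a tree ensemble (illustrative only).
space = {"n_trees": [50, 100, 200, 400], "max_depth": [2, 4, 8, 16]}


def toy_validation_mae(params):
    """Stand-in for training a model and measuring its validation MAE."""
    return abs(params["n_trees"] - 200) / 1000 + abs(params["max_depth"] - 8) / 100


best_params, best_mae = random_search(space, toy_validation_mae, n_trials=200)
```

In a real run, `score_fn` would train the algorithm with the sampled parameters and return the MAE on the validation folds, and the loop would stop once further trials no longer give practical gains.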

To configure and run the algorithms, the open-source computer program RapidMiner 9.5 was used, which is based on a particular Python Library [60], while still permitting the development and implementation of different codes. The workstation for this task was equipped with an Intel(R) Core(TM) i7-8550U CPU (4.00 GHz), an Nvidia GeForce MX-150 GPU (4 GB GDDR5, 64-bit, 6008 MHz), and 8 GB of DDR4 RAM (2400 MHz). The algorithms for the proposed framework were as follows: k-Nearest Neighbor (kNN), Lasso Regression, Ridge Regression, Support Vector Machine (SVM), Artificial Neural Network (ANN), Decision Tree (DT), and Random Forest (RF). The prediction metrics to evaluate these algorithms were the coefficient of determination (R²), mean absolute error (MAE), and root-mean-squared error (RMSE). To ascertain the relationship between the measured data and the predicted data, the overall best models were evaluated in a regression plot.
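
For reference, the three evaluation metrics can be written out as below. This sketch assumes R² is computed as one minus the ratio of residual to total sum of squares; the study does not state which R² definition its software used, and the example values are hypothetical.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root-mean-squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured vs. predicted nutrient values (illustrative only).
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.1, 1.9, 3.2, 3.8]
scores = (mae(measured, predicted), rmse(measured, predicted), r_squared(measured, predicted))
```

MAE and RMSE are expressed in the unit of the nutrient concentration, while R² is unitless, which is why the three are reported together.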

Regarding the configuration of each algorithm, the parameters of the used methods were set to the library default values, except those described in Table 1. For both the DT and RF algorithms, an Extreme Gradient Boosting (XGBoost) model was used to increase their performances. This model adopts a forward-learning ensemble method [61], which obtains predictive results through gradually improved estimations. The machine learning architecture of the proposed analysis, in terms of data inputs and outputs, is illustrated in Figure 3.

It is also important to note that, although hyperspectral data is relatively easy to obtain, leaf tissue analysis can be limited. This is mostly because the chemical analysis can be costly, considering the amounts of data required to train the machine learning algorithms. Therefore, the appropriate number of samples is something to be observed in each case. In this study, the amount of data used to train and validate/test the used algorithms should be sufficient based on the literature. One study [62] compared different learners, such as RF, SVM, kNN, and others, in diverse settings. Among these settings, they evaluated the number of classes per problem (from 2 to 50) and the number of samples per class (from 5 to 100). This returned a variation of 10 to 5000 samples. Their study demonstrated that these algorithms could model data from a few to a high number of samples and still achieve appropriate results. In comparison with the proposed approach, other machine learning frameworks also adopted similar sample sizes, like 324 leaf measurements that were used to model the water-stress response from lettuce [23], 189 hyperspectral observations that were used to model grapevine water status [63], and 266 observations that were used as training to predict nitrogen content in rice fields [64].

As a discussion example, a recent paper collected 500 samples per class with 540 spectral bands and adopted a Cross-Validation method with a dataset considering 200 samples for each validation to demonstrate the importance of the feature selection methods [65]. Regardless, hyperspectral data have a characteristic distinct from most data, which is a high number of bands/wavelengths available to model a given problem. The used dataset was composed of 320 leaf measurements (of which 32 were separated as a test set) and 640 spectral bands (380–1020 nm). This gives a total of 204,800 combinations to work with, which should be enough to configure a training/testing dataset. Although this high dimensionality could pose problems for hyperspectral data processing [65], studies have already suggested that maintaining the original data could also outperform feature-selected subsets [66,67].

As mentioned, though the aforementioned studies did use similar sample sizes of data to train, validate, and test their learners, little information relating hyperspectral wavelengths to machine learning sample size could be found in the literature [65]. Since there is no research focused on evaluating the impact of the training set to model spectral data, a preliminary comparison of two well-known sampling methods was performed. The first is the Cross-Validation method, which is more suitable to deal with the most common tasks in machine learning data curation [43]. The Cross-Validation method was performed with 10 folds (k = 10). This method separates the data into k = 10 parts, using nine of them to train the algorithm and one to validate it. This process is done sequentially, changing the fold used for validation each time. In this manner, the chosen algorithm is always validated by data not used during the training phase. The second method used was the Leave-One-Out approach. This method is similar to Cross-Validation, but it takes only one data instance for validation each time. The method is considered a very time-intensive procedure, and it is only recommended for smaller datasets [43]. After applying the Random Search approach [59] to perform a fine-tuning, the results of both training methods were compared (Table 2).

In the Cross-Validation method, from the 288 samples, 90% were used to train while 10% were used to validate, and this was repeated 10 times randomly. In the Leave-One-Out method, 287 samples were used to train, while one sample was used to test it. This was repeated until all instances were used. The low difference between MAE predictions in each nutrient for both methods indicates that, even when adopting a more suitable training approach to model smaller datasets (Leave-One-Out), the training results are similar. Still, while the Leave-One-Out method is approximately unbiased, it could result in a high variance. Normally, the variance in fitting a model tends to be higher in small datasets since it is more sensitive to noise and artifacts in the used training sample. Because of that, a Cross-Validation method would also show signs of high variance, as well as a high bias, if given a limited amount of data. This was not the case here, since both methods returned high predictions and similar metrics, thus indicating that, whatever the training method, the amount of data (204,800 combinations) was sufficient to model the given problem. Regardless, the Leave-One-Out method had a higher computational cost, which is something to be considered when evaluating the amount of processed data. In the workstation, the Leave-One-Out-averaged processing time for all algorithms was 7.5 times slower than the Cross-Validation method. Because of that, the Cross-Validation method was adopted in this study, but future research should consider both methods according to their respective dataset size and characteristics.
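
The two sampling schemes compared above can be sketched as index generators. This is a simplified illustration, not the software's implementation: the fold assignment by shuffled striding and the seed are assumptions.

```python
import random


def k_fold_splits(n_samples, k=10, seed=0):
    """Yield (training, validation) index lists for k-fold Cross-Validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k nearly equal parts
    for i in range(k):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, folds[i]


def leave_one_out_splits(n_samples):
    """Yield (training, validation) index lists with one held-out sample each."""
    for i in range(n_samples):
        yield [j for j in range(n_samples) if j != i], [i]


cv_splits = list(k_fold_splits(288, k=10))
loo_splits = list(leave_one_out_splits(288))
```

For 288 samples, Cross-Validation fits each algorithm 10 times while Leave-One-Out fits it 288 times, which is consistent with the higher processing time reported for Leave-One-Out.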

Lastly, the contribution of each spectral wavelength to the performance of the algorithm was computed by displaying its Relief-F value. Relief-F uses a kNN scoring to handle noisy and incomplete data [68]. It is considered a reliable metric to assign a feature score, which can then be applied to rank top-scoring features. Here, the Relief-F values were used to map the hyperspectral response of each nutrient regarding the strength of the individual wavelengths to the performance of the evaluated algorithms. Aside from that, to help ascertain the hyperspectral relationship with the evaluated nutrients dataset, an analysis of each nutrient and a Shapiro–Wilk normality test at a 95% confidence level were performed. As the normality test returned a p-value under 0.05, a non-parametric Spearman’s correlation test in a pairwise comparison was executed to verify the association between each nutrient.
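
As an illustration of the pairwise Spearman comparison, the coefficient can be computed as the Pearson correlation of the ranked values, with tied values receiving their average rank. The sketch below covers only the coefficient (not its significance test or the Relief-F scoring), and the nutrient values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical concentrations of two nutrients in four leaves.
nutrient_a = [24.1, 25.3, 26.0, 27.8]
nutrient_b = [1.9, 1.6, 1.5, 1.2]
rho = spearman(nutrient_a, nutrient_b)
```

Because only the ranks matter, the coefficient is insensitive to the non-normal distributions indicated by the Shapiro–Wilk test.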

3. Results

The chemical analysis of the leaf tissue returned heterogeneous and non-parametric results for the nutrient content of the analyzed leaves (Table 3). The analysis showed that the majority of the nutrients presented a high variability and uniform distribution. This behavior was most noticeable in nutrients such as Ca, Fe, Mn, and Zn. This condition is important to demonstrate the applicability of the proposed framework, since machine learning algorithms are advantageous for modeling heterogeneous datasets such as this one.

The correlation between nutrients (Figure 4) is important information to characterize a dataset. The correlation coefficients indicated that, although significant, most nutrients have a low correlation value among themselves. Still, the pairwise comparison returned an expected behavior. Macronutrients, such as N, P, and K, showed positive correlations with each other while presenting negative correlation coefficients (N and P) with the other nutrients. The correlation coefficients between the nutrients varied around 0.5 or below, with the highest reaching 0.59 and lowest reaching −0.60. The low correlation value is also favorable for the proposed framework, as it helps to isolate the effects of the nutrient on the evaluated wavelengths.

For the machine learning algorithms used, the results were separated between the two datasets: reflectance (Table 4) and first-derivative (Table 5). The algorithms returned good performances (R² > 0.80) for the macronutrients with the spectral reflectance as predictors. When the first-derivative was used, the algorithms performed well on both macro- and micronutrients (some R² > 0.80), but all performances were improved for the micronutrients. This is an important finding, as it highlights the importance of first-derivative measurements and their relationship with micronutrients in the Valencia-orange leaf tissue. In both datasets, algorithms like RF, ANN, and kNN returned better predictions than most linear ones, such as Lasso and Ridge Regressions and SVM. The MAE values returned here are similar to those resulting from the training phase, which indicates that the sampling method was well adjusted.

To assess each nutrient's predictions, the predicted values were plotted against the laboratory measurements (Figure 5a,b). A quick comparison of the best-predicted values with the laboratory-measured values shows how the performance of the algorithms varied as nutrient concentrations increased. Predictions for P and Ca followed the 1:1 line (dashed line in Figure 5a,b) less closely than those of the other nutrients, including nutrients with lower R² such as Fe. Regardless, most predictions agreed well with the laboratory data, and on-site measurements of nutrients like N, K, Mg, S, Cu, Mn, and Zn may benefit from the approach presented here.
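The agreement between predicted and laboratory values is summarized in the result tables by R², MAE, and RMSE. A minimal Python sketch of these three metrics follows. It assumes that R² is the coefficient of determination, 1 − SS_res/SS_tot, which this section does not state explicitly (the squared Pearson correlation is another common convention). The function name and the sample values are hypothetical.

```python
import math


def regression_metrics(measured, predicted):
    """Return R², MAE, and RMSE for paired measured and predicted values."""
    if len(measured) != len(predicted) or not measured:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(measured)
    mean_measured = sum(measured) / n
    # Residual and total sums of squares.
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_measured) ** 2 for m in measured)
    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(m - p) for m, p in zip(measured, predicted)) / n,
        "rmse": math.sqrt(ss_res / n),
    }


# Hypothetical laboratory and predicted values; not data from the study.
lab_values = [1.0, 2.0, 3.0, 4.0]
model_values = [2.0, 2.0, 3.0, 3.0]
print(regression_metrics(lab_values, model_values))
```

A point lying exactly on the 1:1 line contributes nothing to any of the three metrics. MAE and RMSE are expressed in the units of the nutrient (g/kg or mg/kg), so they are not directly comparable across nutrients, whereas R² is unitless.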

The calculated Relief-F value showed the contribution of each wavelength to the algorithms’ performance (Figure 6a,b). This contribution helps to isolate the spectral regions and wavelengths of the electromagnetic spectrum most closely related to each nutrient. The relationship, however, is limited to the evaluated algorithm and its performance. Still, since performance was relatively high for most nutrients (Table 4 and Table 5), this metric is informative: it sheds some light on the spectral mapping of Valencia-orange leaf nutrients, whose spectral behavior is not well known.
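As an illustration of how a Relief-type score ranks wavelengths, the sketch below implements a simplified RReliefF, the regression variant of Relief-F. It is not the implementation used in the study. It rests on several assumptions: the regression variant is chosen because nutrient contents are continuous, every sample is visited once instead of being drawn at random, the k nearest neighbors are weighted equally rather than by rank, and the function name and toy data are hypothetical.

```python
def rrelieff(features, target, k=2):
    """Simplified RReliefF weights, one per feature column."""
    n = len(features)
    n_feat = len(features[0])

    def span(values):
        return (max(values) - min(values)) or 1.0  # avoid division by zero

    feat_span = [span([row[a] for row in features]) for a in range(n_feat)]
    target_span = span(target)

    def diff_feat(a, i, j):
        return abs(features[i][a] - features[j][a]) / feat_span[a]

    def diff_target(i, j):
        return abs(target[i] - target[j]) / target_span

    n_dc = 0.0                # accumulated difference in the target
    n_da = [0.0] * n_feat     # accumulated difference in each feature
    n_dc_da = [0.0] * n_feat  # accumulated joint difference

    for i in range(n):
        # Rank the other samples by normalized Manhattan distance.
        others = sorted(
            (sum(diff_feat(a, i, j) for a in range(n_feat)), j)
            for j in range(n) if j != i
        )
        neighbors = [j for _, j in others[:k]]
        w = 1.0 / len(neighbors)
        for j in neighbors:
            dt = diff_target(i, j)
            n_dc += dt * w
            for a in range(n_feat):
                da = diff_feat(a, i, j)
                n_da[a] += da * w
                n_dc_da[a] += dt * da * w

    if n_dc == 0.0 or n_dc == n:
        raise ValueError("target differences are degenerate for these neighbors")
    return [
        n_dc_da[a] / n_dc - (n_da[a] - n_dc_da[a]) / (n - n_dc)
        for a in range(n_feat)
    ]


# Toy data: band_a tracks the target, band_b alternates independently of it.
rows = [[float(i), float(i % 2)] for i in range(8)]
nutrient = [float(i) for i in range(8)]
weights = rrelieff(rows, nutrient, k=2)
print(weights)
```

In this toy case the band that follows the target receives a positive weight and the unrelated band receives zero. With hundreds of wavelengths the same ranking logic applies, although scores for strongly correlated neighboring bands tend to rise and fall together.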

As mentioned, the Relief-F value calculated for each wavelength indicated important contributions in different ranges for each nutrient (Figure 6a,b), so certain bands of the electromagnetic spectrum contributed more than others. To summarize the information obtained from the proposed framework, Table 6 lists each nutrient and its class (macro or micro), the machine learning method most suitable to model it, its coefficient of determination (R²), the spectral data from which its prediction was calculated, and the wavelengths or spectral regions that contributed most according to the Relief-F value. These results demonstrate the potential of applying different machine learning algorithms to this task. So far, this is the first approach of its kind for nutrient content analysis in leaf tissue.
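The selection step behind such a summary can be sketched as a lookup of the highest R² per nutrient. The example below uses only the reflectance results from Table 4, so it covers half of the comparison: the first-derivative results (Table 5) would need to be added in the same way before the outcome could be compared with Table 6. The variable and function names are illustrative.

```python
nutrients = ["N", "P", "K", "Ca", "Mg", "S", "Cu", "Fe", "Mn", "Zn"]

# R² per algorithm for the reflectance dataset, copied from Table 4.
r2_reflectance = {
    "kNN":   [0.852, 0.623, 0.621, 0.179, 0.797, 0.119, 0.834, 0.437, 0.592, 0.431],
    "Lasso": [0.394, 0.452, 0.315, 0.157, 0.413, 0.660, 0.215, 0.180, 0.189, 0.128],
    "Ridge": [0.351, 0.153, 0.354, 0.169, 0.139, 0.056, 0.232, 0.222, 0.190, 0.137],
    "SVM":   [0.638, 0.404, 0.530, 0.336, 0.458, 0.400, 0.277, 0.308, 0.742, 0.447],
    "ANN":   [0.860, 0.656, 0.762, 0.481, 0.733, 0.438, 0.841, 0.340, 0.698, 0.595],
    "DT":    [0.743, 0.661, 0.613, 0.576, 0.759, 0.452, 0.731, 0.453, 0.730, 0.640],
    "RF":    [0.912, 0.771, 0.699, 0.624, 0.832, 0.727, 0.754, 0.527, 0.854, 0.741],
}


def best_per_nutrient(scores, names):
    """Map each nutrient to the (algorithm, R²) pair with the highest R²."""
    best = {}
    for col, name in enumerate(names):
        algorithm = max(scores, key=lambda alg: scores[alg][col])
        best[name] = (algorithm, scores[algorithm][col])
    return best


best = best_per_nutrient(r2_reflectance, nutrients)
for name, (algorithm, r2) in best.items():
    print(f"{name}: {algorithm} (R² = {r2:.3f})")
```

On the reflectance data alone, RF ranks first for eight of the ten nutrients and ANN for K and Cu.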

4. Discussion

In the proposed framework, both reflectance data and their first derivatives were used to predict macro- and micronutrients. Modeling the hyperspectral data with a robust technique (machine learning) helped to establish several findings. The results demonstrated compelling performances in predicting most of the nutrients (Table 4, Table 5 and Table 6). Nutrients like N, Mg, Cu, Mn, and Zn were predicted with an R² of 0.912, 0.832, 0.861, 0.898, and 0.855, respectively. Other nutrients like P, K, and S presented inferior performances with an R² of 0.771, 0.763, and 0.727, respectively. The worst performances were obtained for Ca and Fe, with R² values of 0.624 and 0.612, respectively. In comparison to the literature, most of these performances, specifically those related to macronutrients, were similar or superior to those reported for other types of plants and methods. For N, predictions using visible to infrared data returned R² values between 0.73 and 0.87 [2,30,40]. For K, a three-band combination index predicted the nutrient with an R² equal to 0.74 [37]. For nutrients like Mg, S, P, Ca, and others, the reported R² values varied from as low as 0.27 up to 0.98, depending on the method applied and the plant evaluated [24,26,30,32,38,39,50].

One important finding from this research is the relationship between nutrients, algorithms, and leaf-spectral curves. For macronutrients, the performance of the algorithms was superior when adopting the surface reflectance data. For the micronutrients, the first derivative of the spectral reflectance returned better performances (Figure 5a,b and Table 6). This finding can be related to information reported in the literature [39,40,41]. Since first-derivative spectra highlight absorption features of the original spectra, they could potentially be linked to components that are not easily observable in spectral reflectance data alone. All micronutrients were better predicted when the algorithms used the first derivative, so this could offer a possible explanation. As mentioned in Section 2.2, other derivatives of the dataset were evaluated, but no significant difference over the first derivative was found. Still, further research should continue to explore the association between first-, second-, and third-derivative spectra and micronutrients.

Another contribution of the proposed framework is that, within the limits of the algorithms’ accuracy, it identifies the wavelengths and spectral regions that contributed most to predicting each nutrient (Figure 6a,b and Table 6). While some nutrients show contributions from the same wavelengths, these contributions vary in value (Relief-F). Even so, most of the nutrients showcase particular wavelengths that could potentially be isolated or used in combination with others to ascertain their relationship with the prediction (Table 6). This finding could help to map the Valencia-orange leaf spectral behavior related to both macro- and micronutrients and promote the investigation of simpler mathematical models or spectral indices capable of modeling these nutrients by focusing on these wavelengths.

Machine learning algorithms have the advantage of modeling data in a non-linear and non-parametric manner. Unlike many traditional statistical methods, these algorithms are built to deal with noisy, complex, and heterogeneous data [16,23,50,51,52]. These characteristics proved to be an advantage for this study, as the data had high variance, were non-normal (Table 3), and, while statistically significant, showed low pairwise correlations (Figure 4). Previous research modeled nutrient content with similar characteristics using multiple mathematical methods for analyzing plant hyperspectral data, but did not reach the same accuracies [15,24,32,40,42]. Since machine learning methods can deal with most data inconsistencies, both in hyperspectral measurements and in nutrient content analysis, the proposed framework should be more appropriate for combining these features: it requires no data modification while still returning good performances.

Finally, the different performances returned by the algorithms should be discussed. It is clear that regression models like Lasso, Ridge, and SVM were inferior to the others (Table 4 and Table 5) in both scenarios (reflectance and first-derivative). Although SVM is known to handle high-dimensional data and to do well with a limited training dataset [45], it performed poorly on this dataset in comparison with the rest. The DT algorithm, though not as weak as the aforementioned algorithms, achieved middling results in comparison with the remaining methods. For the DT, the XGBoost model was adopted to improve prediction performance; it was also implemented in the RF base model. During the experimental phase, this boosting model helped to enhance the performance of both algorithms. Still, DT did not return predictions as accurate as those of the RF algorithm.

The highest performances were obtained by the RF, ANN, and kNN algorithms, for both macro- and micronutrients. While RF was better in almost all predictions, ANN and kNN performed well only in particular cases, notably K (reflectance data) and Mn (first-derivative data), respectively. kNN is a simpler and relatively faster method than ANN and RF. However, across the different nutrients, RF and ANN were more consistent. The ANN was constructed so that it could predict most of the nutrients, but its performance was limited. The amount of data available for training could also be a hindrance for deep learning networks. While the ANN method benefited from a multilayer perceptron with two hidden layers and a high number of neurons and iterations, the RF algorithm was boosted with the XGBoost model, which returned consistent performance for the reflectance and first-derivative datasets. Regardless, as no machine learning algorithm is universally appropriate for every task, a framework like the one proposed here is recommended: it makes use of multiple algorithms, since different data could affect the performance of each one.
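The simplicity of kNN noted above can be seen in a short sketch that pairs it with leave-one-out evaluation, one of the two training schemes reported in Table 2. The Euclidean distance and k = 5 follow Table 1; averaging the neighbors with equal weights is an assumption, and the function names and the two-band samples are hypothetical.

```python
import math


def knn_predict(train_x, train_y, query, k=5):
    """Average the targets of the k training samples closest to the query."""
    ranked = sorted((math.dist(x, query), y) for x, y in zip(train_x, train_y))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / len(nearest)


def leave_one_out_mae(xs, ys, k=5):
    """Hold out each sample once, predict it from the rest, and average the absolute errors."""
    errors = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        errors.append(abs(knn_predict(train_x, train_y, xs[i], k) - ys[i]))
    return sum(errors) / len(errors)


# Hypothetical two-band reflectance samples and nutrient values; not study data.
spectra = [[0.10, 0.42], [0.12, 0.45], [0.15, 0.47], [0.18, 0.50],
           [0.21, 0.52], [0.25, 0.55], [0.28, 0.57]]
nitrogen = [24.0, 25.5, 27.0, 29.0, 31.0, 33.5, 36.0]
print(f"Leave-one-out MAE: {leave_one_out_mae(spectra, nitrogen, k=5):.3f}")
```

The same evaluation loop can wrap any of the other regressors, which is how a multi-algorithm framework keeps the comparison between methods on equal footing.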

5. Conclusions

The proposed approach uses leaf spectral data in the visible and near-infrared regions, and switches between reflectance and its first-derivative data to predict the amount of macro- and micronutrients measured in the laboratory. This method returned high R² values for nutrients like N (0.912), Mg (0.832), Cu (0.861), Mn (0.898), and Zn (0.855), and, to a lesser extent, P (0.771), K (0.763), and S (0.727). These accuracies were obtained with the RF, ANN, and kNN algorithms, among which RF performed the best. Another finding was that reflectance data are more suitable for modeling macronutrients, while the first derivative of the reflectance data is better related to micronutrients. A further contribution of this study is the identification (by the Relief-F value) of the wavelengths most responsible for the prediction results. Each nutrient was better correlated to one or more spectral wavelengths. For this reason, future research should evaluate simpler models or spectral vegetation indices capable of modeling the nutrient content by focusing on these wavelengths. Although the presented method was used for evaluating the nutritional conditions of Valencia-orange leaves, it can be replicated for different plants and cultivars, possibly with even better performances. Furthermore, this framework may be applied to hyperspectral data obtained with sensors embedded in UAV-based systems.

Author Contributions

Conceptualization, L.P.O., A.P.M.R., J.M.J., and J.E.C.; methodology, L.P.O., A.P.M.R., É.A.S.M., and J.E.C.; formal analysis, L.P.O. and M.M.F.P.; resources, J.E.C., N.N.I., F.F.d.A., and J.M.J.; data curation, L.P.O., M.M.F.P., A.P.M.R., É.A.S.M., J.E.C., and J.M.J.; writing (original draft preparation), L.P.O.; writing (review and editing), A.P.M.R., É.A.S.M., N.N.I., F.F.d.A., N.E., F.I., V.L., L.A.d.C.J., J.L., L.M., W.N.G., and J.M.J.; supervision, A.P.M.R., J.E.C., N.N.I., and J.M.J.; project administration, A.P.M.R., J.E.C., and J.M.J.; funding acquisition, L.P.O. and J.E.C. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financed in part by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—Finance Code 001 and supported by the Universidade Federal de Mato Grosso do Sul (UFMS). É.A.S. Moriya is supported by FUNDUNESP/Print (p: 3030/2019). V. Liesenberg is supported by FAPESC (2017TR1762) and CNPq (313887/2018-7). N.N. Imai is supported by CNPq (310128/2018-8). J. Marcato Junior is supported by CNPq (433783/2018-4, 303559/2019-5) and Fundect (59/300.066/2015).

Acknowledgments

The authors acknowledge Universidade Federal de Mato Grosso do Sul (UFMS) for supporting the research, and Fazenda Brasilia, located in Areia Branca, Ubirajara—SP (Brazil), for contributing to the experimental site.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

Li, F.; Wang, L.; Liu, J.; Wang, Y.; Chang, Q. Evaluation of leaf N concentration in winter wheat based on discrete wavelet transform analysis. Remote Sens. 2019, 11, 1331.
Li, Z.; Jin, X.; Yang, G.; Drummond, J.; Yang, H.; Clark, B.; Li, Z.; Zhao, C. Remote sensing of leaf and canopy nitrogen status in winter wheat (Triticum aestivum L.) based on N-PROSAIL model. Remote Sens. 2018, 10, 1463.
Osco, L.P.; Marques Ramos, A.P.; Saito Moriya, É.A.; de Souza, M.; Marcato Junior, J.; Matsubara, E.T.; Imai, N.N.; Creste, J.E. Improvement of leaf nitrogen content inference in Valencia-orange trees applying spectral analysis algorithms in UAV mounted-sensor images. Int. J. Appl. Earth Obs. Geoinf. 2019, 83, 101907.
Zheng, H.; Li, W.; Jiang, J.; Liu, Y.; Cheng, T.; Tian, Y.; Zhu, Y.; Cao, W.; Zhang, Y.; Yao, X. A comparative assessment of different modeling algorithms for estimating leaf nitrogen content in winter wheat using multispectral images from an unmanned aerial vehicle. Remote Sens. 2018, 10, 2026.
Loggenberg, K.; Strever, A.; Greyling, B.; Poona, N. Modelling water stress in a Shiraz vineyard using hyperspectral imaging and machine learning. Remote Sens. 2018, 10, 202.
Gerhards, M.; Schlerf, M.; Rascher, U.; Udelhoven, T.; Juszczak, R.; Alberti, G.; Miglietta, F.; Inoue, Y. Analysis of airborne optical and thermal imagery for detection of water stress symptoms. Remote Sens. 2018, 10, 1139.
Johnson, K.; Sankaran, S.; Ehsani, R. Identification of Water Stress in Citrus Leaves Using Sensing Technologies. Agronomy 2013, 3, 747–756.
Osco, L.P.; dos Santos de Arruda, M.; Marcato Junior, J.; da Silva, N.B.; Ramos, A.P.M.; Moriya, É.A.S.; Imai, N.N.; Pereira, D.R.; Creste, J.E.; Matsubara, E.T.; et al. A convolutional neural network approach for counting and geolocating citrus-trees in UAV multispectral imagery. ISPRS J. Photogramm. Remote Sens. 2020, 160, 97–106.
Weinstein, B.G.; Marconi, S.; Bohlman, S.; Zare, A.; White, E. Individual tree-crown detection in rgb imagery using semi-supervised deep learning neural networks. Remote Sens. 2019, 11, 1309.
Zhang, K.; Ge, X.; Shen, P.; Li, W.; Liu, X.; Cao, Q.; Zhu, Y.; Cao, W.; Tian, Y. Predicting rice grain yield based on dynamic changes in vegetation indexes during early to mid-growth stages. Remote Sens. 2019, 11, 387.
Nevavuori, P.; Narra, N.; Lipping, T. Crop yield prediction with deep convolutional neural networks. Comput. Electron. Agric. 2019, 163, 104859.
Hunt, M.L.; Blackburn, G.A.; Carrasco, L.; Redhead, J.W.; Rowland, C.S. High resolution wheat yield mapping using Sentinel-2. Remote Sens. Environ. 2019, 233, 111410.
Cui, B.; Zhao, Q.; Huang, W.; Song, X.; Ye, H.; Zhou, X. A new integrated vegetation index for the estimation of winter wheat leaf chlorophyll content. Remote Sens. 2019, 11, 974.
Guo, T.; Tan, C.; Li, Q.; Cui, G.; Li, H. Estimating leaf chlorophyll content in tobacco based on various canopy hyperspectral parameters. J. Ambient Intell. Humaniz. Comput. 2019, 10, 3239–3247.
Peng, Z.; Guan, L.; Liao, Y.; Lian, S. Estimating total leaf chlorophyll content of gannan navel orange leaves using hyperspectral data based on partial least squares regression. IEEE Access 2019, 7, 155540–155551.
Abdulridha, J.; Batuman, O.; Ampatzidis, Y. UAV-based remote sensing technique to detect citrus canker disease utilizing hyperspectral imaging and machine learning. Remote Sens. 2019, 11, 1373.
Yao, Z.; Lei, Y.; He, D. Early visual detection of wheat stripe rust using visible/near-infrared hyperspectral imaging. Sensors (Switzerland) 2019, 19, 952.
Pham, T.D.; Yokoya, N.; Bui, D.T.; Yoshino, K.; Friess, D.A. Remote sensing approaches for monitoring mangrove species, structure, and biomass: Opportunities and challenges. Remote Sens. 2019, 11, 230.
Brinkhoff, J.; Dunn, B.W.; Robson, A.J.; Dunn, T.S.; Dehaan, R.L. Modeling mid-season rice nitrogen uptake using multispectral satellite data. Remote Sens. 2019, 11, 1837.
Zhou, C.; Ye, H.; Xu, Z.; Hu, J.; Shi, X.; Hua, S.; Yue, J.; Yang, G. Estimating maize-leaf coverage in field conditions by applying a machine learning algorithm to UAV remote sensing images. Appl. Sci. 2019, 9, 2389.
Delloye, C.; Weiss, M.; Defourny, P. Retrieval of the canopy chlorophyll content from Sentinel-2 spectral bands to estimate nitrogen uptake in intensive winter wheat cropping systems. Remote Sens. Environ. 2018, 216, 245–261.
Vanbrabant, Y.; Tits, L.; Delalieux, S.; Pauly, K.; Verjans, W.; Somers, B. Multitemporal chlorophyll mapping in pome fruit orchards from remotely piloted aircraft systems. Remote Sens. 2019, 11, 1468.
Osco, L.P.; Ramos, A.P.M.; Moriya, E.A.S.; Bavaresco, L.G.; Lima, B.C.; Estrabis, N.; Pereira, D.R.; Creste, J.E.; Marcato Junior, J.; Gonçalves, W.N.; et al. Modeling hyperspectral response of water-stress induced lettuce plants using artificial neural networks. Remote Sens. 2019, 11, 2797.
de Oliveira, D.M.; Fontes, L.M.; Pasquini, C. Comparing laser induced breakdown spectroscopy, near infrared spectroscopy, and their integration for simultaneous multi-elemental determination of micro- and macronutrients in vegetable samples. Anal. Chim. Acta 2019, 1062, 28–36.
Chen, J.; Li, F.; Wang, R.; Fan, Y.; Raza, M.A.; Liu, Q.; Wang, Z.; Cheng, Y.; Wu, X.; Yang, F.; et al. Estimation of nitrogen and carbon content from soybean leaf reflectance spectra using wavelet analysis under shade stress. Comput. Electron. Agric. 2019, 156, 482–489.
Cuq, S.; Lemetter, V.; Kleiber, D.; Levasseur-Garcia, C. Assessing macro-element content in vine leaves and grape berries of vitis vinifera by using near-infrared spectroscopy and chemometrics. Int. J. Environ. Anal. Chem. 2019.
Santoso, H.; Tani, H.; Wang, X.; Segah, H. Predicting oil palm leaf nutrient contents in kalimantan, indonesia by measuring reflectance with a spectroradiometer. Int. J. Remote Sens. 2019, 40, 7581–7602.
Osco, L.P.; Ramos, A.P.M.; Pereira, D.R.; Moriya, E.A.S.; Imai, N.N.; Matsubara, E.T. Predicting canopy nitrogen content in citrus-trees using random forest algorithm associated to spectral vegetation indices from UAV-Imagery. Remote Sens. 2019, 11, 2925.
Allen, V.; Barker, D.J.P. Handbook of Plant Nutrition; Taylor & Francis, 2015. Available online: https://www.bokus.com/bok/9781439881989/handbook-of-plant-nutrition/ (accessed on 4 February 2020).
Malmir, M.; Tahmasbian, I.; Xu, Z.; Farrar, M.B.; Bai, S.H. Prediction of macronutrients in plant leaves using chemometric analysis and wavelength selection. J. Soils Sediments 2019.
Ye, X.; Abe, S.; Zhang, S. Estimation and mapping of nitrogen content in apple trees at leaf and canopy levels using hyperspectral imaging. Precis. Agric. 2019.
Ling, B.; Goodin, D.G.; Raynor, E.J.; Joern, A. Hyperspectral analysis of leaf pigments and nutritional elements in tallgrass prairie vegetation. Front. Plant Sci. 2019, 10, 1–13.
Zhao, Y.; Yan, C.; Lu, S.; Wang, P.; Qiu, G.Y.; Li, R. Estimation of chlorophyll content in intertidal mangrove leaves with different thicknesses using hyperspectral data. Ecol. Indic. 2019, 106, 105511.
Shi, J.; Li, W.; Zhai, X.; Guo, Z.; Holmes, M.; Elrasheid Tahir, H.; Zou, X. Nondestructive diagnostics of magnesium deficiency based on distribution features of chlorophyll concentrations map on cucumber leaf. J. Plant Nutr. 2019, 42, 2773–2783.
Román, J.R.; Rodríguez-Caballero, E.; Rodríguez-Lozano, B.; Roncero-Ramos, B.; Chamizo, S.; Águila-Carricondo, P.; Cantón, Y. Spectral response analysis: An indirect and non-destructive methodology for the chlorophyll quantification of biocrusts. Remote Sens. 2019, 11, 1350.
Bruning, B.; Liu, H.; Brien, C.; Berger, B.; Lewis, M.; Garnett, T. The Development of hyperspectral distribution maps to predict the content and distribution of nitrogen and water in wheat (Triticum aestivum). Front. Plant Sci. 2019.
Lu, J.; Yang, T.; Su, X.; Qi, H.; Yao, X.; Cheng, T.; Zhu, Y.; Cao, W.; Tian, Y. Monitoring leaf potassium content using hyperspectral vegetation indices in rice leaves. Precis. Agric. 2019.
Jull, H.; Künnemeyer, R.; Schaare, P. Nutrient quantification in fresh and dried mixtures of ryegrass and clover leaves using laser-induced breakdown spectroscopy. Precis. Agric. 2018, 19, 823–839.
Galvez-Sola, L.; García-Sánchez, F.; Pérez-Pérez, J.G.; Gimeno, V.; Navarro, J.M.; Moral, R.; Martínez-Nicolás, J.J.; Nieves, M. Rapid estimation of nutritional elements on citrus leaves by near infrared reflectance spectroscopy. Front. Plant Sci. 2015, 6, 1–8.
Yang, J.; Du, L.; Gong, W.; Shi, S.; Sun, J.; Chen, B. Analyzing the performance of the first-derivative fluorescence spectrum for estimating leaf nitrogen concentration. Opt. Express 2019, 27, 3978.
Yang, J.; Cheng, Y.; Du, L.; Gong, W.; Shi, S.; Sun, J.; Chen, B. Selection of the optimal bands of first-derivative fluorescence characteristics for leaf nitrogen concentration estimation. Appl. Opt. 2019, 58, 5720–5727.
Zhou, W.; Zhang, J.; Zou, M.; Liu, X.; Du, X.; Wang, Q.; Liu, Y.; Liu, Y.; Li, J. Prediction of cadmium concentration in brown rice before harvest by hyperspectral remote sensing. Environ. Sci. Pollut. Res. 2019, 26, 1848–1856.
Mitchell, T.M. Machine Learning, 1st ed.; McGraw-Hill, Inc.: New York, NY, USA, 1997.
Ball, J.E.; Anderson, D.T.; Chan, C.S. Comprehensive survey of deep learning in remote sensing: Theories, tools, and challenges for the community. J. Appl. Remote Sens. 2017, 11, 1.
Ma, L.; Liu, Y.; Zhang, X.; Ye, Y.; Yin, G.; Johnson, B.A. Deep learning in remote sensing applications: A meta-analysis and review. ISPRS J. Photogramm. Remote Sens. 2019, 152, 166–177.
Gao, J.; Meng, B.; Liang, T.; Feng, Q.; Ge, J.; Yin, J. Modeling alpine grassland forage phosphorus based on hyperspectral remote sensing and a multi-factor machine learning algorithm in the east of Tibetan. ISPRS J. Photogramm. Remote Sens. 2019, 147, 104–117.
Feng, P.; Wang, B.; Liu, D.L.; Yu, Q. Machine learning-based integration of remotely-sensed drought factors can improve the estimation of agricultural drought in South-Eastern Australia. Agric. Syst. 2019, 173, 303–316.
Han, L.; Yang, G.; Dai, H.; Xu, B.; Yang, H.; Feng, H.; Li, Z.; Yang, X. Modeling maize above-ground biomass based on machine learning approaches using UAV remote-sensing data. Plant Methods 2019, 15, 10.
Singhal, G.; Bansod, B.; Mathew, L.; Goswami, J.; Choudhury, B.U.; Raju, P.L.N. Chlorophyll estimation using multi-spectral unmanned aerial system based on machine learning techniques. Remote Sens. Appl. Soc. Environ. 2019, 15, 100235.
Chanda, S.; Hazarika, A.K.; Choudhury, N.; Islam, S.A.; Manna, R.; Sabhapondit, S.; Tudu, B.; Bandyopadhyay, R. Support vector machine regression on selected wavelength regions for quantitative analysis of caffeine in tea leaves by near infrared spectroscopy. J. Chemom. 2019, 33, 10.
Shah, S.H.; Angel, Y.; Houborg, R.; Ali, S.; McCabe, M.F. A random forest machine learning approach for the retrieval of leaf chlorophyll content in wheat. Remote Sens. 2019, 11, 920.
Fu, P.; Meacham-Hensold, K.; Guan, K.; Bernacchi, C.J. Hyperspectral leaf reflectance as proxy for photosynthetic capacities: An ensemble approach based on multiple machine learning algorithms. Front. Plant Sci. 2019, 10.
Obreza, T.A.; Morgan, K.T. (Eds.) Nutrition of Florida Citrus Trees, 2nd ed.; IFAS Extension; University of Florida: Gainesville, FL, USA, 2008.
Köppen, W.; Volken, E.; Brönnimann, S. The thermal zones of the Earth according to the duration of hot, moderate and cold periods and to the impact of heat on the organic world. Meteorol. Zeitschrift 2011, 20, 351–360.
Nitrogen Determination by Kjeldahl Method PanReac AppliChem ITW Reagents. Available online: https://www.itwreagents.com/uploads/20180114/A173_EN.pdf (accessed on 27 February 2019).
Malavolta, E.; Vitti, G.C.; Oliveira, S.A. Evaluation of Nutritional Status of Plants: Principles and Perspectives, 2nd ed.; POTAFOS: Piracicaba, SP, Brazil, 1997; 319p.
Anderson, K.; Rossini, M.; Labrador, J.P.; Balzarolo, M.; Arthur, A.; Fava, F.; Julitta, T.; Vescovo, L. Inter-comparison of hemispherical conical reflectance factors (HCRF) measured with four fiber-based spectrometers. Opt. Express 2013, 21, 605–617.
Tsai, F.; Philpot, W. Derivative Analysis of Hyperspectral Data. Remote Sens. Environ. 1998, 66, 41–51.
Bergstra, J.; Bengio, Y. Random search for hyper-parameter optimization. J. Mach. Learn. Res. 2012, 13, 281–305.
RapidMiner. RapidMiner Python Package. Available online: https://github.com/rapidminer/pythonrapidminer (accessed on 5 December 2019).
XGBoost. eXtreme Gradient Boosting. Available online: https://github.com/dmlc/xgboost (accessed on 5 December 2019).
Amancio, D.R.; Comin, C.H.; Casanova, D.; Travieso, G.; Bruno, O.M.; Rodrigues, F.A.; Da Fontoura Costa, L. A systematic comparison of supervised classifiers. PLoS ONE 2014, 9, 1–14.
Pôças, I.; Tosin, R.; Gonçalves, I.; Cunha, M. Toward a generalized predictive model of grapevine water status in Douro region from hyperspectral data. Agric. For. Meteorol. 2020, 280, 107793.
Zha, H.; Miao, Y.; Wang, T.; Li, Y.; Zhang, J.; Sun, W. Sensing-Based Rice Nitrogen Nutrition Index Prediction with Machine Learning. Remote Sens. 2020, 12, 215.
Hennessy, A.; Clarke, K.; Lewis, M. Hyperspectral Classification of Plants: A Review of Waveband Selection Generalisability. Remote Sens. 2020, 12, 113.
Chan, J.C.W.; Paelinckx, D. Evaluation of Random Forest and Adaboost tree-based ensemble classification and spectral band selection for ecotope mapping using airborne hyperspectral imagery. Remote Sens. Environ. 2008, 112, 2999–3011.
Alonzo, M.; Bookhagen, B.; Roberts, D.A. Urban tree species mapping using hyperspectral and lidar data fusion. Remote Sens. Environ. 2014, 148, 70–83.
Urbanowicz, R.J.; Meeker, M.; La Cava, W.; Olson, R.S.; Moore, J.H. Relief-based feature selection: Introduction and review. J. Biomed. Inform. 2018, 85, 189–203.

Figure 1. The workflow of the four main processes adopted for the proposed approach.

Figure 2. Spectral wavelengths used for testing the machine learning algorithms’ performance. The spectral reflectance is shown in green and its respective first derivative in dark red.

Figure 3. Structure of the machine learning architecture of the proposed framework.

Figure 4. Correlation between the measured nutrients for the Valencia-orange leaf samples.

Figure 5. (a) Macronutrient prediction comparison against laboratory measurements for the best algorithms’ results. (b) Micronutrient prediction comparison against laboratory measurements for the best algorithms’ results.

Figure 6. (a) The individual contribution of wavelengths to each macronutrient’s prediction. (b) The individual contribution of wavelengths to each micronutrient’s prediction.

Table 1. Detailed information regarding the algorithms adopted in the proposed framework.

Algorithm	Hyperparameter	Criteria
kNN	Distance; Number of Neighbors	Euclidean; k-Neighbors = 5
Lasso Regression (L1)	Strength (α); Elastic Net Mixing Proportion (L1–L2)	1.0; 0.57:0.43
Ridge Regression (L2)	Strength (α); Elastic Net Mixing Proportion (L1–L2)	0.1; 0.57:0.43
SVM	Radial Basis Function (RBF) Kernel exp(−g|x−y|²)	g = automatic; Regression Loss = 1.00; SVM Type Cost = 1; Tolerance = 0.001; Iteration (limit) = Unlimited
ANN	Activation: Logistic Function; Adam Optimizer; Regularization (α) = 0.0001	Neurons (First Hidden Layer) = 400; Neurons (Second Hidden Layer) = 200; Iterations = 10,000
DT	Number of Leaves; Tree Depth	Leaves (minimal) = 2; Tree depth (maximum) = 100
RF	Number of Trees; Nodes	Trees = 100; Nodes (maximum) = 5

Table 2. MAE returned from both training methods: Cross-Validation and Leave-One-Out.

Method	N	P	K	Ca	Mg	S	Cu	Fe	Mn	Zn
kNN
MAE (Ref.)—Cross-Validation	0.682	0.122	1.070	6.125	0.201	0.179	6.793	17.022	4.890	5.058
MAE (1st Der.)—Cross-Validation	0.884	0.174	1.235	6.358	0.357	0.154	6.589	16.789	3.012	4.246
MAE (Ref.)—Leave-One-Out	0.700	0.158	1.107	6.549	0.224	0.144	6.723	17.686	4.705	5.157
MAE (1st Der.)—Leave-One-Out	0.912	0.178	1.301	6.897	0.377	0.163	6.453	16.949	3.192	4.349
Lasso Regression
MAE (Ref.)—Cross-Validation	1.986	0.083	1.322	7.010	0.523	0.118	17.552	31.849	8.004	7.128
MAE (1st Der.)—Cross-Validation	1.056	0.079	1.756	6.849	0.446	0.287	17.389	30.595	7.341	9.058
MAE (Ref.)—Leave-One-Out	1.898	0.095	1.352	7.101	0.538	0.107	17.515	32.256	8.058	7.185
MAE (1st Der.)—Leave-One-Out	1.285	0.087	1.975	6.942	0.455	0.284	17.395	30.112	7.578	9.088
Ridge Regression
MAE (Ref.)—Cross-Validation	2.058	0.245	1.022	9.388	0.589	0.284	17.383	27.218	9.085	6.143
MAE (1st Der.)—Cross-Validation	1.359	0.234	1.766	6.238	0.687	0.235	17.287	33.185	7.898	8.897
MAE (Ref.)—Leave-One-Out	2.045	0.277	1.079	9.183	0.587	0.225	17.858	27.351	9.056	6.041
MAE (1st Der.)—Leave-One-Out	1.350	0.202	1.768	6.156	0.678	0.202	17.084	33.584	7.984	8.789
SVM
MAE (Ref.)—Cross-Validation	0.789	0.134	1.012	5.888	0.456	0.179	18.894	20.170	3.374	7.071
MAE (1st Der.)—Cross-Validation	1.058	0.158	1.415	5.979	0.568	0.199	10.152	23.158	6.385	5.759
MAE (Ref.)—Leave-One-Out	0.798	0.159	1.028	5.978	0.456	0.178	18.265	20.588	3.318	7.052
MAE (1st Der.)—Leave-One-Out	1.158	0.178	1.456	5.899	0.589	0.158	10.128	23.480	6.318	5.878
ANN
MAE (Ref.)—Cross-Validation	0.789	0.157	1.025	7.126	0.285	0.134	7.895	29.389	5.058	6.388
MAE (1st Der.)—Cross-Validation	1.058	0.193	1.453	6.087	0.456	0.146	9.185	19.241	4.358	5.268
MAE (Ref.)—Leave-One-Out	0.744	0.155	1.064	7.235	0.259	0.138	7.563	29.289	5.568	6.456
MAE (1st Der.)—Leave-One-Out	1.057	0.189	1.487	6.023	0.482	0.105	9.458	19.568	4.238	5.215
DT
MAE (Ref.)—Cross-Validation	0.689	0.158	1.056	6.054	0.315	0.113	7.874	19.498	4.286	5.158
MAE (1st Der.)—Cross-Validation	1.055	0.123	1.126	6.586	0.467	0.205	6.894	31.218	4.878	4.238
MAE (Ref.)—Leave-One-Out	0.641	0.102	1.085	6.088	0.305	0.106	7.415	19.352	4.984	5.512
MAE (1st Der.)—Leave-One-Out	1.028	0.112	1.285	6.547	0.489	0.202	6.897	31.189	4.489	4.354
RF
MAE (Ref.)—Cross-Validation	0.689	0.078	1.112	3.025	0.210	0.087	7.289	18.548	4.898	3.789
MAE (1st Der.)—Cross-Validation	0.638	0.101	1.207	6.238	0.389	0.179	6.046	16.189	3.874	1.789
MAE (Ref.)—Leave-One-Out	0.677	0.089	1.103	3.045	0.207	0.088	7.358	18.895	4.984	3.898
MAE (1st Der.)—Leave-One-Out	0.622	0.107	1.201	6.125	0.379	0.189	6.215	16.189	3.789	1.875

Table 3. Descriptive data from the chemical analysis of the Valencia-orange leaves.

	Macronutrient (g/kg)						Micronutrient (mg/kg)
Summary	N	P	K	Ca	Mg	S	Cu	Fe	Mn	Zn
Mean	29.55	2.13	17.07	30.72	5.36	2.36	72.20	86.95	36.69	27.77
Std. Dev.	2.95	0.43	3.34	13.18	1.39	0.38	26.09	39.50	19.11	13.09
Median	29.45	2.17	16.70	28.85	5.25	2.35	69.90	78.35	33.10	22.80
Min.	24.00	1.21	11.80	10.70	2.70	1.60	25.50	26.20	14.30	10.90
Max.	36.70	2.98	28.30	78.60	9.90	3.60	128.90	207.30	122.10	69.80
Coeff. Var. (%)	9.98	20.39	19.60	42.90	25.97	16.37	36.14	45.44	52.105	47.16

For all nutrients, the Shapiro–Wilk normality test returned a p-value below 0.05 (95% confidence level).
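
The coefficient-of-variation row of Table 3 is the standard deviation expressed as a percentage of the mean. The short check below recomputes it from the mean and standard deviation rows. The dictionary layout and function name are illustrative choices, while the numbers are copied from Table 3.

```python
# (mean, std. dev., reported coefficient of variation in %) copied from Table 3
table3 = {
    "N": (29.55, 2.95, 9.98),
    "P": (2.13, 0.43, 20.39),
    "K": (17.07, 3.34, 19.60),
    "Ca": (30.72, 13.18, 42.90),
    "Mg": (5.36, 1.39, 25.97),
    "S": (2.36, 0.38, 16.37),
    "Cu": (72.20, 26.09, 36.14),
    "Fe": (86.95, 39.50, 45.44),
    "Mn": (36.69, 19.11, 52.105),
    "Zn": (27.77, 13.09, 47.16),
}

def coefficient_of_variation(mean, std_dev):
    """Standard deviation as a percentage of the mean."""
    return 100.0 * std_dev / mean

recomputed = {
    name: coefficient_of_variation(mean, std_dev)
    for name, (mean, std_dev, _) in table3.items()
}

for name, (_, _, reported) in table3.items():
    print(f"{name:>2}: recomputed {recomputed[name]:6.2f} %, reported {reported:6.2f} %")
```

The recomputed values agree with the reported row to within about 0.3 percentage points. Small differences are expected because the table lists the mean and standard deviation rounded to two decimals.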

Table 4. The machine learning algorithms’ accuracy performance for the reflectance data.

Method	N	P	K	Ca	Mg	S	Cu	Fe	Mn	Zn
kNN
R²	0.852	0.623	0.621	0.179	0.797	0.119	0.834	0.437	0.592	0.431
MAE	0.704	0.163	1.087	6.765	0.285	0.204	7.083	18.142	5.105	6.005
RMSE	1.245	0.278	2.041	13.905	0.445	0.416	11.362	36.248	11.707	10.157
Lasso Regression
R²	0.394	0.452	0.315	0.157	0.413	0.660	0.215	0.180	0.189	0.128
MAE	2.145	0.193	1.542	7.304	0.627	0.137	19.881	34.140	8.470	7.852
RMSE	2.526	0.335	2.745	14.091	0.757	0.258	24.744	43.751	16.513	12.573
Ridge Regression
R²	0.351	0.153	0.354	0.169	0.139	0.056	0.232	0.222	0.190	0.137
MAE	2.468	0.284	1.347	9.923	0.698	0.298	19.321	28.456	9.456	6.541
RMSE	2.912	0.417	2.597	13.989	0.916	0.431	24.485	42.158	16.502	12.982
SVM
R²	0.638	0.404	0.530	0.336	0.458	0.400	0.277	0.308	0.742	0.447
MAE	0.902	0.247	1.546	6.551	0.505	0.233	19.692	21.421	3.666	7.891
RMSE	1.952	0.349	2.275	12.501	0.752	0.344	23.754	40.201	9.309	9.741
ANN
R²	0.860	0.656	0.762	0.481	0.733	0.438	0.841	0.340	0.698	0.595
MAE	0.840	0.177	1.265	7.637	0.359	0.174	8.377	30.259	5.880	6.949
RMSE	1.211	0.265	1.619	11.052	0.510	0.332	11.120	39.251	10.078	8.567
DT
R²	0.743	0.661	0.613	0.576	0.759	0.452	0.731	0.453	0.730	0.640
MAE	0.787	0.178	1.434	6.375	0.345	0.166	8.835	20.016	5.090	5.681
RMSE	1.644	0.263	2.064	12.123	0.484	0.328	14.472	35.726	9.525	8.0811
RF
R²	0.912	0.771	0.699	0.624	0.832	0.727	0.754	0.527	0.854	0.741
MAE	0.706	0.093	1.146	3.525	0.234	0.100	7.828	19.375	5.093	4.246
RMSE	1.059	0.216	1.818	9.404	0.405	0.231	13.850	33.233	7.007	6.846

Bold values in the table indicate the overall best performance for each nutrient.
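
The three scores reported per method (R², MAE, RMSE) can be computed from measured and predicted values as sketched below. R² is taken here as the coefficient of determination, 1 − SS_res/SS_tot, which is an assumption because this section does not state the definition used. The sample values are hypothetical.

```python
import math

def mae(y_true, y_pred):
    """Mean absolute error, in the units of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error; penalizes large errors more than MAE."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical measured and predicted values, not from the study.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.5, 2.0, 2.5, 4.0]

print(f"MAE  = {mae(y_true, y_pred):.3f}")
print(f"RMSE = {rmse(y_true, y_pred):.3f}")
print(f"R2   = {r_squared(y_true, y_pred):.3f}")
```

RMSE is never smaller than MAE for the same predictions, so a large gap between the two (as for Ca and Fe in the table) points to a few large individual errors.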

Table 5. The machine learning algorithms’ accuracy performance for first-derivative data.

Method	N	P	K	Ca	Mg	S	Cu	Fe	Mn	Zn
kNN
R²	0.669	0.453	0.329	0.311	0.512	0.172	0.752	0.512	0.898	0.587
MAE	0.944	0.180	1.341	6.994	0.398	0.184	7.456	17.846	3.594	4.948
RMSE	1.867	0.335	2.717	12.039	0.689	0.404	13.903	33.740	5.859	8.655
Lasso Regression
R²	0.257	0.401	0.161	0.287	0.401	0.158	0.292	0.190	0.234	0.130
MAE	1.489	0.197	1.986	7.048	0.486	0.304	19.513	33.146	7.934	9.588
RMSE	2.804	0.354	3.040	12.258	0.789	0.407	23.503	43.487	16.055	12.561
Ridge Regression
R²	0.210	0.310	0.157	0.302	0.129	0.222	0.273	0.183	0.300	0.158
MAE	1.650	0.265	1.997	6.978	0.789	0.248	19.212	34.240	8.242	9.047
RMSE	2.987	0.380	3.101	12.268	0.987	0.358	23.819	44.289	15.348	11.978
SVM
R²	0.373	0.323	0.330	0.531	0.270	0.459	0.679	0.423	0.649	0.513
MAE	1.357	0.250	1.714	6.745	0.678	0.240	11.870	24.653	6.676	6.159
RMSE	2.262	0.373	2.716	10.511	0.844	0.326	15.829	36.705	10.855	9.397
ANN
R²	0.721	0.554	0.445	0.566	0.564	0.582	0.800	0.444	0.838	0.731
MAE	1.287	0.219	1.680	6.495	0.559	0.197	10.030	20.466	4.617	5.605
RMSE	1.712	0.302	2.471	10.113	0.652	0.287	12.493	36.028	7.372	6.983
DT
R²	0.703	0.633	0.491	0.491	0.479	0.474	0.786	0.509	0.728	0.584
MAE	1.298	0.136	1.223	6.850	0.597	0.232	7.013	32.559	4.976	4.832
RMSE	2.767	0.274	2.368	13.015	0.832	0.314	25.209	33.861	15.036	11.391
RF
R²	0.866	0.765	0.548	0.501	0.507	0.453	0.861	0.612	0.879	0.855
MAE	0.738	0.119	1.225	6.668	0.424	0.209	6.509	17.280	4.050	2.075
RMSE	1.185	0.219	2.231	10.839	0.693	0.328	10.389	31.640	6.377	5.121

Bold values in the table indicate the overall best performance for each nutrient.
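
Table 5 reports the same scores for first-derivative spectra. One common way to obtain such a derivative is a finite-difference approximation, sketched below. The study's exact derivative procedure is not described in this section, so the method, the function name, and the sample spectrum are assumptions.

```python
def first_derivative(wavelengths, reflectance):
    """Finite-difference derivative of reflectance with respect to wavelength.

    Central differences are used for interior bands and one-sided
    differences at the two ends of the spectrum.
    """
    n = len(wavelengths)
    if n != len(reflectance) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 bands")
    derivative = []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        derivative.append(
            (reflectance[hi] - reflectance[lo]) / (wavelengths[hi] - wavelengths[lo])
        )
    return derivative

# Hypothetical four-band spectrum (nm, reflectance factor), not from the study.
wavelengths = [400, 401, 402, 403]
reflectance = [0.10, 0.20, 0.40, 0.40]
slope = first_derivative(wavelengths, reflectance)
print(slope)
```

Differentiation emphasizes changes in slope between neighboring bands and also amplifies noise, which is one reason smoothing is often applied before it in practice.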

Table 6. Summarized information on the results obtained by the proposed framework.

Nutrient	Class	Method	R²	MAE	Spectral Data	Contributive Wavelengths/Spectral Regions (nm) *
N	Macro	RF	0.912	0.706	Reflectance	384–412; 421; 423; 432; 433; 435; 440–455; 464–472; 480–487
P	Macro	RF	0.771	0.093	Reflectance	385–411; 438–456; 472–477; 502; 521; 527; 544–555
K	Macro	ANN	0.763	1.265	Reflectance	762–764; 816; 838; 857; 903 908; 915–925; 934–957; 973–1020
Ca	Macro	RF	0.624	3.525	Reflectance	545–551; 749–787; 843–888; 901–1020
Mg	Macro	RF	0.832	0.234	Reflectance	390; 411–412; 445; 496; 554; 586–630; 643; 656–669
S	Macro	RF	0.727	0.100	Reflectance	579; 590; 595; 609–612; 618; 624–632; 645–680; 684–689; 700
Cu	Micro	RF	0.861	6.509	First-Derivative	388; 394; 416–419; 430–432; 440; 452–456; 461; 475; 512; 523; 823; 863–865; 951; 977–979
Fe	Micro	RF	0.612	17.280	First-Derivative	391–396; 405; 421–424; 433–436; 474–477; 552; 758; 810; 837; 890–892; 910; 926
Mn	Micro	kNN	0.898	3.594	First-Derivative	381; 392–410; 414; 438; 555–568; 582; 819; 607; 761–767; 823–835; 841
Zn	Micro	RF	0.855	2.075	First-Derivative	381; 398; 407–411; 420; 449; 555–559; 604–607; 858

* These wavelengths and regions are those with the highest Relief-F scores for each prediction, obtained by ranking the scores in descending order.
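
The last column of Table 6 mixes single wavelengths and contiguous regions. The sketch below shows one way such a listing could be produced from per-wavelength scores: keep the highest-scoring wavelengths, then merge runs of adjacent bands. The scores are hypothetical placeholders for Relief-F weights, and the cut-off `k` and the 1 nm merging rule are assumptions, since the exact rule used in the study is not stated here.

```python
def top_k_wavelengths(scores, k):
    """Return the k highest-scoring wavelengths, in ascending wavelength order.

    `scores` maps wavelength (nm) to a feature-relevance score.
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    return sorted(ranked[:k])

def group_into_regions(wavelengths, step=1):
    """Merge runs of wavelengths spaced `step` nm apart into 'start–end' regions."""
    regions = []
    start = prev = None
    for w in sorted(set(wavelengths)):
        if start is None:
            start = prev = w
        elif w - prev == step:
            prev = w  # extend the current run
        else:
            regions.append((start, prev))
            start = prev = w
    if start is not None:
        regions.append((start, prev))
    return "; ".join(str(a) if a == b else f"{a}–{b}" for a, b in regions)

# Hypothetical scores, not Relief-F values from the study.
scores = {384: 0.90, 385: 0.80, 386: 0.85, 400: 0.10, 421: 0.70, 500: 0.05}
selected = top_k_wavelengths(scores, k=4)
print(group_into_regions(selected))
```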

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

