Plot-Based Classiﬁcation of Macronutrient Levels in Oil Palm Trees with Landsat-8 Images and Machine Learning

: Oil palm crops are essential for ensuring sustainable edible oil production, in which production is highly dependent on fertilizer applications. Using Landsat-8 imageries, the feasibility of macronutrient level classiﬁcation with Machine Learning (ML) was studied. Variable rates of compost and inorganic fertilizer were applied to experimental plots and the following nutrients were studied: nitrogen (N), phosphorus (P), potassium (K), magnesium (Mg) and calcium (Ca). By applying image ﬁlters, separability metrics, vegetation indices (VI) and feature selection, spectral features for each plot were acquired and used with ML models to classify macronutrient levels of palm stands from chemical foliar analysis of their 17th frond. The models were calibrated and validated with 30 repetitions, with the best mean overall accuracy reported for N and K at 79.7 ± 4.3% and 76.6 ± 4.1% respectively, while accuracies for P, Mg and Ca could not be accurately classiﬁed due to the limitations of the dataset used. The study highlighted the effectiveness of separability metrics in quantifying class separability, the importance of indices for N and K level classiﬁcation, and the effects of ﬁlter and feature selection on model performance, as well as concluding RF or SVM models for excessive N and K level detection. Future improvements should focus on further model validation and the use of higher-resolution imaging. assessed the ability of available satellite images from Landsat-8 OLI/TIRS and machine learning models in creating an open-source method for classifying nutrient levels of palm trees on a plot basis. This was conducted using mean reﬂectance extracted from each as predictors for nutrient levels acquired from chemical analysis of in palm stands. In this study, the of separability metrics, ﬁlters, and selection were also put to the test via constructing models with the dataset on different


Introduction
The global population is projected to reach 7.58 billion by the end of the year and an additional 27.7 million tonnes of edible oil will be required to fulfill food demands. With its greater per-hectare production and economic competitiveness, oil palm is a pivotal crop in ensuring sufficient edible oil is available in the global market [1]. Agriculture has at times become a controversial topic among conservationists due to its negative impacts on the environment, such as biodiversity loss, deforestation, and increased carbon emissions [2][3][4][5]. Precision agriculture (PA), which involves informed decision making in agriculture using information interpreted from sensor-based data (such as the remote sensing data in this paper) or other sources, is currently sought as a solution for improved sustainable food production [6][7][8][9]. 2 of 28 In fertilizer application, PA enables site specific management by determining macronutrient status and fertilizer requirement in individual plants. These macronutrients include nitrogen (N), phosphorus (P), potassium (K), magnesium (Mg) and calcium (Ca), which are essential for ensuring good plant health [7][8][9][10]. Like most plants, macronutrient levels in palm trees are diagnosed via removal of leaflets for destructive chemical analysis, such as the Kjeldahl method for N determination [11][12][13][14][15][16][17][18][19][20][21][22][23]. By relating remote sensing and GIS technology data with field results at promising accuracy and precision, the findings could be extrapolated to plantation scales, enabling more efficient, economic and non-destructive means of fertilizer management. This ensures global food security is met under sustainable terms via increasing crop production with available land and resource [7,8,15,20].
It can be seen that the wavelengths selected in literature for N prediction are focused in the green (i.e., 510-550 nm) and red edge (i.e., 710-750 nm) regions, which correspond to chlorophyll characteristics [12,20,37]; this is contrary to P and K, in which wavelengths at the SWIR region play a large role, in addition to those in the VISNIR region [13,14,23,29,30,[32][33][34][35]. Additionally, specific wavelengths identified as significant explanatory variables may be used to derive vegetation indices (VI) mathematically for improved prediction [23,29]. However, the exact wavelengths identified for predictions may differ between crops, their varieties, and methods used. The authors of [20] identified 526 nm and 716 nm as ideal predictors for wheat N, while [25] concluded 522 nm and 740 nm for rice N, with both studies using the ratio of the readings from the first derivatives of their respective wavelength pairs. On the other hand, [13,30] identified several highly similar wavelengths despite use of different methods for oil seed rape N prediction ( [13]: 513 nm, 542 nm, 718 nm, 928 nm, 1015 nm; [30]: 574 nm, 719 nm, 918 nm, 1017 nm).
It remains unfortunate that spectroradiometers are unaffordable for most agricultural practitioners, in addition to its more laborious nature when plants have to be scanned individually for plantation-scale monitoring. Multispectral imaging from satellite sensors may offer a solution to wide-scale and affordable measures for nutrient monitoring. To date, a handful of research has attempted N prediction using high spatial resolution (<5 m) multispectral images captured from commercial satellites, such as QuickBird [38], GeoEye-1 [39], SPOT 7 [40] and WorldView-2 [41], with promising results. Sadly, free medium-resolution data from satellite sensors such as Landsat-8 OLI, Sentinel 2 MSI, ASTER and Sentinel 3 OLCI were even greater in rarity [42,43], or at best, simulated with spectroradiometer readings [31]. In palm trees, similar platforms (i.e., spectroradiometer and high resolution images) have been explored for the prediction of N, P and K [40,[44][45][46], although difficulty in its widespread application still remains with its inaccessibility to oil palm smallholders.
Taking leverage of its free availability and consistent revisit frequency, this study assessed Landsat-8 OLI satellite imageries and ML algorithms (i.e., Support Vector Machine (SVM), Artificial Neural Network (ANN), Random Forest (RF)) in classifying nutrient levels of palm trees with different treatments for the following macronutrients: nitrogen (N), phosphorus (P), potassium (K), magnesium (Mg) and calcium (Ca). Given its 30 m resolution, the study proposed an open-source and plot-based method to classify the nutrient status of palm trees via image processing, feature extraction and ML classification. The aim of this study is to produce a freely available nutrient level classification model with Landsat-8 OLI imageries as input. Studies to date on nutrient estimation have only focused on spectroradiometer data or high resolution imaging, particularly for N. The positive results acquired from this research will open insights to the potential of using easily available coarse satellite imaging in classifying plot N and other macronutrient levels via ML, subsequently allowing long-term monitoring of palm plantations at a large scale. This would not only promote efficient, convenient and cost-effective nutrient management at a plantation scale, but increase the accessibility of nutrient monitoring to smallholders.

Study Area and Experimental Setup
The study was carried out at an oil palm plantation located in Johor, Malaysia, from 2013 to 2017 ( Figure 1). The palm trees were planted at a 144 trees/hectare (ha) density. Malaysia is a country with a tropical climate due to its proximity to the equator. According to [47,48], the study area is characterized by lowlands consisting of soil from the Rengam series (USDA Soil Taxonomy: Ultisols). Temperature, average monthly rainfall and average wind speed in the area during the study period ranged from 23 • C to 33 • C (annual average: 29 • C), 64.01 mm to 350.9 mm (annual average: 2066.4 mm) and 3.9 to 9.7 km/h, respectively [49]. Overall, soil and climate conditions were suitable for palm cultivation with possible improvements from fertilizer application [48,50]. In total, the experimental plots covered 3.97 ha of palm trees, spanning across 39.44 ha of palm plantation. Palm stands were aged five and half years when the experiment was initially conducted. Three levels of inorganic fertilizer and four levels of compost were applied for N, P, K and Mg in a factorial design with three replicates for each tree. N, P, K and Mg fertilizers applied were ammonium chloride (NH 4 Cl), rock phosphate (P), muriate of potash (KCl) and kieserite (MgSO 4 ·H 2 O). This resulted in 36 plot observations in total. Specific details on treatment levels are shown in Table 1 and Figure 1. It should be noted that soil and plant interactions were not taken into account in this study as this research's purpose is to evaluate the potential of ML models in classifying nutrient levels in palm tree plots using coarse imagery, and fertilizer applications were conducted to induce nutrient level variability in plots. Table 1. Description of fertilizer and compost treatment levels for nitrogen (N), phosphorus (P), potassium (K) and magnesium (Mg). Amounts are measured in kg nutrient per tree.

Treatment Levels
Fertilizer Treatment (N, P, K, Mg in kg palm − thermal infrared 2 (TIR2): 11.5-12.51 μm), at 30 m (panchromatic at 15 m) and 100 m spatial resolution respectively. Images close to sampling dates were manually screened to ensure the subset consisting of the study site was cloud-free, followed by further inspection with the quality assessment band (BQA) provided in the download. Overall, the selected images were +/− 2 weeks from the stipulated date. Specific details of selected imageries are as below (Table 2).

Figure 1.
Location of study area in Malaysia. Highlighted boxes in enlarged image of study area are plots considered for the study while numbers in each plot represent treatments applied to the respective plots. The first and second numbers represent treatment levels for fertilizer and compost, respectively (e.g., a plot with the number "13" is treated with level 1 fertilizer treatment and level 3 compost treatment). Compiled from GoogleEarth Pro.

Materials/Data Collection
Each plot consisted of 4 × 3 (12) palm trees, with an average plot area of 1100 m 2 . In each plot, frond 17 from all 12 palm trees was sampled to produce ground truth by destructive foliar analysis and its nutrient status was acquired as a mean of all observations in the plot. Frond 17 was selected as the reference for palm tree nutrient levels due to past studies indicating its greater representation of nutrient status and correlation between nutrient contents and yield [51]. N was acquired using combustion or near-infrared (NIR) spectroscopy, while P, K, Mg and Ca were acquired by wet ashing or NIR spectroscopy [52]. The acquired foliar nutrient status was reported in % dry matter (DM). By conducting the experiment for five consecutive years (2013-2017), 36 plot observations from each year yielded 180 samples in total. The coordinates of the four corners of each plot ( Figure 1 further inspection with the quality assessment band (BQA) provided in the download. Overall, the selected images were +/− 2 weeks from the stipulated date. Specific details of selected imageries are as below (Table 2). QGIS and the Python programming language were used for all data processing in the study. Nutrient level classes were constructed based on critical value ranges provided by [53] (See Appendix A.3, Table A1). As a result, five class ranges were produced and assigned with an ordinal class value: Deficient = 1, Marginally Deficient = 2, Optimum = 3, Marginally Excessive = 4 and Excessive = 5. For each nutrient observation of each plot, its value is therefore assigned based on the derived class ranges (e.g., a plot observation of 2.50% N in DM would be in the Optimum class range and thus assigned the class value 3). The distribution of samples among nutrient levels is shown in Table 3 (Left). To reduce overfitting or biased analysis, observations from nutrient level classes identified to have less than 10 samples are merged with the adjacent class, yielding Table 3 (Right). The best image scene identified each year was atmospherically corrected by applying a Dark Object Subtraction (i.e., DOS1) algorithm from a plugin in QGIS by [54]. Despite the availability of Landsat-8 Surface Reflectance images (Level-2 Product) provided by USGS, the study proceeded with DOS due the transferrability of the approach between images from different platforms. This approach would enable a transferrable model to be developed in events where surface reflectance products are not available for a particular image and simple atmospheric correction without cost and climatological parameters are required. Using GoogleEarth Pro as reference, the series of satellite images was checked for co-registration, followed by a subset and resampling with the nearest neighbour to produce a Region of Interest (ROI). In a shapefile layer, the recorded coordinates for the four corners of each plot were used to produce a rectangular polygon feature encompassing the plot area. The shapefile layer was rasterized to produce a binary mask with a dimension equal to the ROI (294 × 524 pixels) as means to extract the band mean reflectance values of all plots in the ROI (Appendix A.1, Figure A1). The spatial resolution of the resampled ROI and the binary mask was determined at 1.6 m as the resolution required to preserve the rectangular shape of the polygon features during rasterization. It was essential to ensure a suitable resampling resolution was selected based on the size of polygons as a low resampling resolution would result in overestimation of polygon area and incorrect value extraction [55][56][57] (See Figures A2 and A3 at Appendix A.1). Masking and extraction enabled only mean values calculated from the extent of each plot in the ROI to be included in further analysis. The acquisition of these values allowed subsequent calculation of the Jeffries-Matusita (J-M) distance for all possible pairwise combinations of nutrient levels in each nutrient to quantify the separability of values between plots from different nutrient levels. The J-M distance is a distance which takes into account the covariance matrix between features of pairwise classes (Equation (1)). The value ranges from 0 to 2, with a value >1.8 between classes considered good for separability [54,[58][59][60]: where: x, y = first and second spectral signature vector ∑ x, ∑ y = covariance matrix of plot B = Bhattacharya distance J xy = Jeffries-Matusita distance In attempts to reduce image noise and improve separability, the following filters and transforms were applied to the series of images: standard filters (i.e., min, median, max, gaussian and rank) and fast fourier transform (FFT). The process was carried out using Numpy, Pywt and Scipy libraries in Python. For FFT filtering, the image was transformed to the 2D frequency domain and swapped for quadrants to obtain the image center [61]. Values for coordinates within 1 x or y unit from the centre of domain (i.e., (0, 0), (0, 1), (1, 0), (0, −1), (−1, 0)) were retained, while other values in higher frequencies were suppressed by 10 times by multiplying the values by a coefficient of 0.1. Therefore, the filter functions as a minimum filter with the ability to suppress values at specific frequencies via the 2D frequency domain (Appendix A.2, Figure A4). To identify the best-performing filter, value extraction and distance calculation were repeated for all unfiltered (as control) and filtered image scenes, whereby the approach yielding the highest J-M distance averaged from all five nutrients was considered the best choice.

Vegetation Index and Feature Selection
Vegetation indices (VI) are effective in enhancing signals from vegetation and suppressing unintended noises. VIs were selected for this study by evaluating VIs listed in a review on VI development [62]. The corresponding research article where each VI was first mentioned was studied to identify its feasibility for application, with the following criteria in mind, and with priority in descending order: Criterion 1: VI was derived with strong theoretical or mathematical foundation Criterion 2: VI was derived with general applicability in mind Criterion 3: VI was derived with satellite bands or broadband wavelengths In total, 16 VIs were shortlisted for application in this study, in which their respective derivation was carried using values from four bands (i.e., B, G, R and NIR) of the best filtered image. In addition to the initial bands (i.e., B, G, R, NIR, SWIR1, SWIR2), a total of 22 spectral features would be used as model inputs for nutrient level classification. The reference, index name and formula of each VI are portrayed in the table below (Table 4): Table 4. Vegetation indices (VI) selected as additional feature and for analysis. Parenthesis after full name of VI refers to acronym used throughout study.

Reference
VI Formula [63] Normalized Difference Vegetation Index (NDVI) Green NDVI (GNDVI) Soil adjusted and Atmospherically Resistant Vegetation Atmospherically Resistance Vegetation Index (ARVI) [72] Modified Simple Ratio (MSR) VIs will be applied in two of the studied scenarios. As the VIs selected are derived from mostly similar initial bands (e.g., RDVI and IPVI both use values from R and NIR band), multicollinearity problems may occur. To address this, feature selection via correlation analysis and variance inflation factor (VIF) was conducted for one of the scenarios (i.e., Scenario 3, see Section 2.3.4). VIF has been applied in many studies involving multiple variable regressions to ensure minimal collinearity between independent variables [76][77][78]: where: R 2 = coefficient of determination between an independent variable and the other independent variables.
For each nutrient, the first 11 indices with the highest correlation coefficient were selected. VIF was applied to reduce the number of features further for dimension reduction and avoiding multicollinearity problems. VIFs between the variables were generated and the six variables with the lowest VIFs were selected. It is suggested that the VIF value for each variable should be less than 10 to avoid multicollinearity. To achieve this, the calculation of VIF values for variables with each other and the elimination of the variable with the highest VIF value were conducted iteratively until all variables possessed a VIF value of less than 10 upon VIF re-calculation.
In a nutshell, processing images via Sections 2.3.1 and 2.3.2 subsequently resulted in specific vector features with paired class values for each nutrient or scenario, which would be applied as inputs for the algorithms of interest.

Machine Learning
Supervised machine learning (ML) algorithms were applied in the classification problem of this study. Supervised learning, the most common type of ML algorithm, constructs a decision function for future predictions by associating given sample features with their corresponding class values in a dataset given during model training. Support Vector Machine (SVM), Artificial Neural Network (ANN) and Random Forest (RF) were selected for this study. All models were implemented in the Python 3.7 environment using the Scikit-learn library.
The SVM is an instance-based supervised learning algorithm developed by Vapnik in the 20th century and originated from the Vapnik-Chervonenkis (VC) theory. At its core, the model classifies samples by deriving an optimal separating hyperplane which maximizes the margin between the boundaries of different classes through solving a convex optimization problem [79,80]. Following suggestions by [81], the optimal hyperparameter C (C) and gamma (gamma) were each searched in exponents of 2 (i.e., 2 −15 , 2 −14 , . . . , 2 15 ), while the RBF kernel (kernel = rbf) was adopted due to its ability to map data implicitly into a higher dimensional space.
The Multilayer Perceptron (MLP) was selected as the ANN model for this study. The model is a Feedforward Neural Network obeying empirical risk minimization. The model attempts to minimize the squared error in a cost function using gradient descent methods such as the back-propagation algorithm, in which error values are propagated backwards in the model to improve its performance at each iteration [82,83]. The model consists of one hidden layer, and the Rectified Linear Unit was selected as the activation function. Hidden layer sizes (i.e., hidden_layer_sizes) were searched between 1 to 10 with learning rate at 0.2, 0.1 and 0.05 (i.e., learning_rate_init), while setting maximum iterations (i.e., max_iter) to 2000.
RF is an ensemble model consisting of a collection of classification and regression trees (CART) [84]. The model applies the bootstrap aggregation (or Bagging) which reduces model bias and increases variance. In each tree, samples are split into nodes based on their function (i.e., Gini index) and the number of features defined by the user (features are randomly selected) [85,86]. In RF, the number of estimators or trees was searched between 100, 200 and 500; the minimum sample for splitting was explored at multiples of 2 from 2 to 14; while the maximum number of features was set between 1 and 6.

Performance Evaluation
For each model, the classification accuracy and the confusion matrix were used as the performance metrics to evaluate the model in both the calibration and the validation stage. Cohen's kappa score was also provided for each of the respective models (See Appendix A.3, Table A9). Together with grid search and 3-fold cross validation, 50% of all samples were randomly selected and used to identify the best combination of hyperparameters for the model. Calibration and validation were conducted for 30 repetitions, where the samples were randomly split to 50:50 at every iteration. The process was also conducted in the Python 3.7 environment with the Scikit-learn library.
To explore the effectiveness of filters, VIs and feature selection, classification by the model was assessed under four different scenarios: Scenario 1: Unfiltered (Control) band mean reflectance of sites (number of features = 6); Scenario 2: Best filtered band mean reflectance of sites (number of features = 6); Scenario 3: Feature selection of the best filtered case (number of features vary with each nutrient); and Scenario 4: All features of the best filtered case (number of features = 22). Figure 2 summarizes the methodology applied for this study.
To explore the effectiveness of filters, VIs and feature selection, classification by the model was assessed under four different scenarios: Scenario 1: Unfiltered (Control) band mean reflectance of sites (number of features = 6); Scenario 2: Best filtered band mean reflectance of sites (number of features = 6); Scenario 3: Feature selection of the best filtered case (number of features vary with each nutrient); and Scenario 4: All features of the best filtered case (number of features = 22). Figure 2 summarizes the methodology applied for this study.

Data Description and Processing
Most samples from N, P, K, Mg and Ca were grouped in the Marginally Excessive (Mar Ex), Optimum (Opt), Marginally Excessive, Optimum and Optimum classes, respectively ( Table 5). Given that P only has less than 10 samples, merging observations resulted in all observations being in the optimum class and its exclusion from further model derivation. Other nutrients with their observations from one class merged to the subsequent class include Mg and Ca, with samples at Deficient (Def) and Marginally Deficient (Mar Def) classes merged to the subsequent class respectively. Among nutrients to be analysed, Mg had a rather balanced sample distribution despite the dominant class occupying more than 50% of all observations. From Table 6, it could be seen that the overall spectral signature exhibited from the observations was similar to plants or oil palms: peak G, absorptions at B and R, NIR reflectance shoulder and SWIR absorptions. When compared, the Covs of both nutrients and reflectance values of NIR in addition to SWIR regions were similar in magnitudes: nutrients ranged from 5.2% to 12.6%, while reflectance values from 4.5% to 8.8%. Mg and B band reported with the highest value respectively (Tables 5 and 6).

Filter and Feature Selection
J-M distances when applying filters for the studied nutrients were summarized in Table 7. At control, N was identified to have the highest J-M distance (thus separability) between different classes, while Ca reported the lowest value. All applied methods led to an overall improvement in J-M distances between classes, suggesting better feature separability. Both identified filters were found as functionally similar to the minimum filter: Fourier filter involving 2D-FFT and subsequent suppression of high frequency values at the frequency domain; while the Rank filter set with Rank 1 selects the 2nd-lowest value among the centre pixel and its neighbours to replace its value. N and Ca benefited from the Fourier filter, with the latter gaining a relative improvement of 50% in distance. K and Mg experienced more improvements with the Rank filter instead, acquiring a <2.5% relative increase in distance. Overall, VI transformation was conducted with values extracted from images using the Rank filter, given its higher J-M distance when averaged among nutrients. All pairwise J-M distances (i.e., between classes of nutrients for all filters) are provided in Appendix A.3 (Tables A2-A8). All nutrients were found to have statistically significant relationships with each other at the 1% significance level (α = 0.01) (Figure 3). N is reported to be more correlated with P or Ca than with K or Mg. The highest correlation coefficient between nutrients was reported between P and Ca, at 0.61. K reported negative correlation with all studied nutrients at magnitude less than 0.30, except for Ca. Visible bands seem to correlate with each other highly and positively (>0.90). NIR exhibited low correlation with SWIR1 and SWIR2, contrary to the latter two exhibiting strong correlation (0.89) with each other. Excluding Ca, correlation coefficients between nutrients and visible bands ranged between 0.20 and 0.40. In contrast, IR bands reported their best magnitude of correlation with N: NIR at 0.5, SWIR1 at 0. 44 Table 8 shows the first 11 spectral features possessing the strongest correlation with the corresponding nutrient. The feature which best correlates with each nutrient could be distinguished into two groups: initial spectral bands for K and Mg; while correction-related indices for N and Ca. For N, the selected features ranged from 0.55 (MSR) to 0.77 (EVI) in terms of correlation coefficient; while K ranged from 0.34 (EVI2) to 0.40 (G), Mg from 0.26 (MSR) to 0.31 (G) and Ca from 0.24 (TVI) to 0.34 (SARVI).  Figure 3). It could be seen that the numbers of variables selected for all nutrient cases were lower than the use of initial bands from the satellite images (i.e., Scenario 1, no. of features = 6). Interestingly, none of the features selected involved the use of SWIR bands during their respective derivations. This is attributed to the high correlation between selected indices with SWIR, as observed between GARI or ARVI with SWIR1 (>0.70) or SWIR2 (>0.90) (Figure 3).

Machine Learning Model Performance
At calibration, the performance for N of almost all models were centred (i.e., median and mean) at more than 0.8 (80%) in all scenarios, with the best performance of SVM, MLP and RF reported in Scenario 1, 3 and 4 respectively (Figure 4). For K and Mg, the average performances of all models seem to span a wider range: positioned in between 0.75 (75%) to 1 (100%) for K, while 0.60 (60%) to 1 (100%) for Mg. All models were reported to have the highest median and mean performance (>0.85) with Ca. When compared, the following models were affected by scenarios: RF for K, MLP for N and MLP for K. For instance, the average performance of MLP models in their best-performing scenario (Scenario 3: 0.844) was 9.2% higher than its worst performing counterpart (Scenario 1: 0.752) ( Table 9). RF reported the highest mean performance for most cases, achieving a perfect 1 (100%) at least once for each nutrient. MLP, on the other hand, observed greater variability in performance than RF and SVM, especially at lower quartiles. MLP for N (Scenario 3 and 4) and K (Scenario 2-4) reduced the most performance variability (i.e., boxplot size) with scenarios, while RF and SVM found reduced performance variability with Scenarios 3 and 4 for K and Mg respectively.  When mean was considered, nutrients reported the best performance with SVM at Scenario 4 (0.797 ± 0.043), SVM at Scenario 1 or 2 (0.766 ± 0.041), SVM at Scenario 4 (0.635 ± 0.05) and MLP at Scenario 4 (0.870 ± 0.028) for N, K, Mg and Ca respectively ( MLP acquired more instances of performance accuracy beyond its boxplot, and  The average performance of all models at validation stage decreased for all nutrients regardless of scenarios: N, K, Mg and Ca achieved an average greater than 0.70 (70%), 0.60 (60%), 0.50 (50%) and 0.80 (80%) respectively ( Figure 5). For most cases, the best performing scenario reported by each model in each nutrient during validation was different from its calibration counterpart (Table 9).
MLP acquired more instances of performance accuracy beyond its boxplot, and greater performance variability (i.e., larger boxplot size) and standard deviation than SVM and RF ( Figure 5 and Table 9). MLP was the greatest benefactor of scenarios. For instance, Scenario 3 improved the mean accuracy of MLP from 0.701 to 0.754 for N and from 0.675 to 0.718 for K, in addition to decreased performance variability with Scenario 2 or 3. RF and SVM experienced increase in performance for similar scenarios as well, but to a lesser extent (<2% in mean accuracy) than MLP (>3% in mean accuracy). Overall, SVM has the best classification accuracy while RF had the upper hand in standard deviation and size of boxplot.

Discussion
J-M distance was shown to be a strong separability metric in this study. This was observed by instances for N level classification: perfect classification of samples between Optimum (Opt) and Excessive (Ex) levels as well as low misclassification between Opt and Marginally Excessive (Mar Ex) levels, given the pairwise J-M distance values were at 1.99 and 1.87 respectively (Table 10A). After filtering, pairwise distance of N for between Mar Ex and Ex as well as K or Ca for Opt and Mar Ex were increased by slight amounts. Unfortunately, this did not translate into any form of improved classification accuracy. Most samples from different classes of Mg or Ca remain misclassified with the given pairwise distance. These findings were consistent with those reported by [58][59][60], who noted requirement of J-M distance values greater than 1.80 for effective class separability.  However, low J-M distance values may also be a result of uneven sampling encountered between classes for all nutrients in this study, particularly N, K and Ca. Uneven sampling could lead to model overfitting and complex decision surfaces formed and dominated by samples from the majority class. In remote sensing, RF is susceptible to uneven sampling between classes for classification problems, although findings regarding its impact remained inconclusive [84]. For SVM in this study, more than 50% of all support vectors were selected from the majority class, leading to greater misclassification of samples from another class as the majority class (Table 10B-D). Some SVM instances were noted to have high support vector to total sample ratio (50%) as well. An increase in ratio may subsequently result in increased overfitting and misclassification [87,88]. Although a high classification accuracy was acquired for Ca (Table 9, Figure 4) during the validation stage, it has to be reflected that most of the correct classifications (>90%) were from the majority class (Table 10D). Taking into account the required J-M distance value greater than 1.80 for effective class separability [58][59][60] and the need for even sampling, it is suggested that the classification of nutrient levels for Mg and Ca using Landsat-8 imagery remains inconclusive based on the limitations of the dataset.
Identification of SWIR2 as a strong predictor for N concurs with findings from [44,89], who conducted similar experiments with hyperspectral spectroradiometers instead. Several other researchers have also identified SWIR regions as potential regions for N predictions. The SWIR2 band (2.11-2.29 µm) region is associated with absorption features as a result of vibration activities from amide bonds of N-containing proteins. SWIR regions are also said to have low scattering by canopy structural variation, thus making them perfect candidates for canopy-level monitoring [90,91]. Sadly, the limited wavelength coverage by band reflectance from Landsat-8 OLI satellite prevented further comparison of other spectral regions as predictors for the studied nutrients. VIs were also applied in hopes of magnifying signals from biophysical parameters of vegetation. Several VIs related to soil-line and atmospheric adjustments were identified as potential predictors for N (i.e., SAVI, SARVI, EVI, MSAVI, EVI2 and GARI), while NIR-related indices for K (i.e., NDVI, TVI and IPVI). This may be attributed to the following: (1) the role of N in photosynthesis as well as the susceptibility of involved spectral regions (i.e., visible) to soil background or atmospheric effects and (2) the role of K in plant cellular structure maintenance, development and disease resistance, which may be spectrally reflected at the corresponding NIR region (i.e., 815-879 nm) of the applied band [92,93].
In this study, atmospherically-adjusted indices (i.e., EVI, SARVI, ARVI, GARI) were prioritised over soil indices in N. This was suggested by the little-to-no difference in correlation coefficients between mathematically related indices; the correlation coefficient between N and ARVI was greater than N and SAVI, in addition to the former possessing a coefficient closer in magnitude to their composite, SARVI. Yet, soil-related indices were still important in this study. Developed by [64], SARVI combined SAVI and ARVI to address both atmospheric and soil background effects. Using cotton plants, the index was shown to outperform ARVI and SAVI when atmospheric and soil effects were strong, particularly when LAI < 3. A similar conclusion could be drawn by observing greater coefficients of SAVI than OSAVI. SAVI had a higher L parameter (L = 0.5) set in this study compared to OSAVI (L = 0.16) which had greater performance when scenes contain greater soil background effects [64,71]. This suggests the presence of background soil effects from the study site, despite being visually confirmed to have closed canopy cover.
Based on literature [63][64][65][66][67][68][69][70][71][72][73][74][75], most indices were initially derived to quantify biophysical parameters such as LAI, vegetation cover or fPAR. Subsequently, one would expect growth in palm stands due to greater N and K levels to be captured in VIs [94]. In addition, most identified VIs were originally derived from satellite data (i.e., Landsat and MODIS imageries), thus suggesting further compatibility in application [63,65,[67][68][69]72,75]. Still, many of the indices evaluated in this study were highly correlated with each other, due to indices being successions of other indices, such as EVI2 being a 2-band approximation of EVI [68,75]. Because of this, care should be taken to include VIs identified as strong predictors but uncorrelated with each other, such that issues related to multicollinearity could be avoided. The use of feature selection may aid in remediating the issue, as applied in Scenario 3.
Using scenarios, it could be seen that MLP experienced the greatest improvement for N and K in both classification accuracy and consistency with the use of filters (Scenario 2) or filters and feature selection (Scenario 3), as observed in improved minimum, mean and maximum accuracy, in addition to reduced standard deviation and boxplot size. MLPs are able to benefit from greater number of features which improves the description of the response variable to be classified [80]. Increased accuracy was also identified in RF and SVM models under similar scenarios and nutrients. However, the use of filters and all features (Scenario 4) led to decreased mean accuracy and increased standard deviation of models for several models compared to Scenario 3: SVM for K, RF for K and MLP for N. It may be plausible to suggest the Hughes' phenomenon or curse of dimensionality as its cause, where increasing data dimension with further inclusion of features resulted in sample points being so sparsely distributed such that models were unable to acquire a generalize solution or establishing an effective decision surface [95]. Using VIF to address multicollinearity and dimension reduction (Scenario 3), it was found MLP and RF acquired their respective best performance for N during validation, despite number of features applied were less than the use of initial bands. This is consistent the previous finding for feature selection and may suggest the potential use of fewer indices to represent or improve the information captured in the initial bands of the images, including bands not applied in their derivation, such as the SWIR bands.
On another note, SVM or RF for N at Scenario 4 during validation was the best scenario despite reduced accuracy during calibration. SVM and RF possessed the upper hand in performance accuracy, variability and accuracy difference between scenarios compared to MLP. This suggests the robustness and stability conferred to these models in handling high dimensional data at low samples [95]. The main contribution to such differences is each model's approach in acquiring its respective generalized solution: SVM follows the structural risk minimization and the kernel method, thus focusing samples involved in constructing the decision boundaries only and allowing the ability to handle both low and high dimension data respectively; and RF is able to mediate these factors by applying bootstrap aggregation (or bagging) mechanism which involves decision making from hundreds of tree classifiers [77,86,96]. MLPs require greater number of features to perform well and solve non-convex problems by minimizing observed errors, which may, at times, result in local optima convergence and overfitting [83]. Still, it is worth noting MLP was able to classify several instances of N for the Ex class accurately using the selected features (Table 11). Overall, SVM has the best performance in terms of accuracy (i.e., minimum, mean, median, maximum) for both N and K while RF in terms of stability (i.e., boxplot size, standard deviation). Model performance and stability is summarized as SVM > RF > MLP and RF > SVM > MLP, respectively. The coefficient of variation (Cov) of models may be used as a compromise for both aspects when selecting a model of choice for a particular nutrient. Models with lower Cov (i.e., low standard deviation/high mean) are preferred due to lower performance dispersal. Table 12 summarizes the performance of the best model in each ML algorithm for N and K. Based on the table, RF is preferred over SVM for both N and K, although SVM may be selected for K instead if accuracy is prioritised over standard deviation, as shown by the slight difference in Cov and a difference of 3% in mean classification accuracy. Nevertheless, the performance of all models in classifying nutrient levels of palm tree plots in this study remained optimistic, particularly for N and K. Despite the coarse resolution of Landsat-8 OLI/TIRS imageries, the study yielded models with performance greater or comparable to several studies [14,40,46] which conducted similar research with data of higher spatial or spectral resolutions (i.e., SPOT7 imagery and spectroradiometer). In fact, in several iterations, merging samples from Ex to Mar Ex to produce a binary problem (i.e., Ex or Opt) for N resulted in a nearly perfect classification (>90%, Figure 6). This may open opportunities for developing models which are able to detect nutrient excessiveness in palms and subsequently guide reduction in fertilizer application. If further validations reap consistent results, ML models trained with Landsat-8 images may become a possible approach to informed decision making in reducing excessive application of fertilizers. Contrary to this, further studies are required for N deficiency detection as no sample for the class was produced with the experimental set-up, although [14] had shown such possibilities with reflectance from a spectroradiometer. While not performing as well as N, K levels may still be classified with satisfactory accuracy using SVM or RF. Further studies are required to study the transferability of the models in terms of generalizing nutrient level classes. Oil palm trees are perennial crops with an industrial life cycle of approximately 25 years, which led to controlled experiments and monitoring being more challenging than annual crops (i.e., maize, rice, etc.). As such, studies on such applications for palm stands beyond the age range (i.e., 6.5-11 years) in this study are required. To gain better insights, higher-resolution imaging, such as UAV imaging, should be deployed to study nutrient prediction with ML on individual palm trees to check for consistency. The use of UAV data increases the variability in spectral and textural information captured for each plot and individual trees.

Conclusions
Precision agriculture plays an essential role in ensuring food security is sought sustainably in the near future. Thanks to their greater oil production on a per hectare basis, oil palm trees contribute to the sustainable production of edible oils by freeing up more land when compared to other oil crops. By applying sensor technology and ML models, this study assessed the ability of freely available satellite images from Landsat-8 OLI/TIRS and machine learning models in creating an open-source method for classifying nutrient levels of palm trees on a plot basis. This was conducted using mean reflectance extracted from each plot as predictors for nutrient levels acquired from chemical analysis of frond 17 in palm stands. In this study, the potential of separability metrics, image filters, VIs and feature selection were also put to the test via constructing models with the dataset on different scenarios.
Overall, nutrients with high pairwise J-M distances such as N and K were able to achieve satisfactory performance. However, the performance of most models was undermined by uneven sample distribution, resulting in possible overfitting by the majority class. Uneven sample distribution also poses a risk of result misinterpretation if not taken into account, as observed with Ca. Rank filter was selected as the filter of choice and the visible region had greater correlation than IR regions for K and Mg, with the inverse being true for N. For VIs, atmospherically or soil-corrected indices were selected for N (i.e., SAVI, SARVI, EVI, MSAVI, EVI2 and GARI), while those related to NIR (i.e., NDVI, TVI, IPVI) for K. Using VIF to address multicollinearity, the study further identified the potential of using fewer VIs, such as GARI and ARVI to represent information from all initial bands, including those not involved in their derivation.
When the considered algorithms were compared, SVM was superior to RF and was the best in terms of accuracy, while the inverse was true for model stability. In terms of Further studies are required to study the transferability of the models in terms of generalizing nutrient level classes. Oil palm trees are perennial crops with an industrial life cycle of approximately 25 years, which led to controlled experiments and monitoring being more challenging than annual crops (i.e., maize, rice, etc.). As such, studies on such applications for palm stands beyond the age range (i.e., 6.5-11 years) in this study are required. To gain better insights, higher-resolution imaging, such as UAV imaging, should be deployed to study nutrient prediction with ML on individual palm trees to check for consistency. The use of UAV data increases the variability in spectral and textural information captured for each plot and individual trees.

Conclusions
Precision agriculture plays an essential role in ensuring food security is sought sustainably in the near future. Thanks to their greater oil production on a per hectare basis, oil palm trees contribute to the sustainable production of edible oils by freeing up more land when compared to other oil crops. By applying sensor technology and ML models, this study assessed the ability of freely available satellite images from Landsat-8 OLI/TIRS and machine learning models in creating an open-source method for classifying nutrient levels of palm trees on a plot basis. This was conducted using mean reflectance extracted from each plot as predictors for nutrient levels acquired from chemical analysis of frond 17 in palm stands. In this study, the potential of separability metrics, image filters, VIs and feature selection were also put to the test via constructing models with the dataset on different scenarios.
Overall, nutrients with high pairwise J-M distances such as N and K were able to achieve satisfactory performance. However, the performance of most models was undermined by uneven sample distribution, resulting in possible overfitting by the majority class. Uneven sample distribution also poses a risk of result misinterpretation if not taken into account, as observed with Ca. Rank filter was selected as the filter of choice and the visible region had greater correlation than IR regions for K and Mg, with the inverse being true for N. For VIs, atmospherically or soil-corrected indices were selected for N (i.e., SAVI, SARVI, EVI, MSAVI, EVI2 and GARI), while those related to NIR (i.e., NDVI, TVI, IPVI) for K. Using VIF to address multicollinearity, the study further identified the potential of using fewer VIs, such as GARI and ARVI to represent information from all initial bands, including those not involved in their derivation.
When the considered algorithms were compared, SVM was superior to RF and was the best in terms of accuracy, while the inverse was true for model stability. In terms of scenarios, MLP gained the most from filters and selected features (Scenario 2 and 3), though use of filters and all features (Scenario 4) led to worse performance. SVM and RF experienced similar situations, though to a lesser extent. This may be caused by the Hughes' phenomenon. The study concluded N and K as potential variables predictable by reflectance value from Landsat-8 imageries and respective machine learning algorithms (RF for N and RF or SVM for K), with the best mean accuracy reported at 79.7% and 76.6% respectively. In fact, the results acquired for N from this study by collapsing the classification problem into a simpler version may be the first to point towards the possibility of producing a one-of-its-kind classification model for excessive N detection in oil palm trees using freely available Landsat-8 imageries. Unfortunately, Mg and Ca remained not possible for classification in this study.
While this study has comparable or better results than several studies conducted with data of greater resolution, further research is required to ensure the models' transferability, with rooms for further improvement via higher resolution data or different analytical approaches. The results from the free-source approach used by this study thus bring the palm plantation cultivation community one step closer to open-source precision agriculture.         . From left to right: 1st Image: Image before filtering; 2nd Image: Image at 2D fourier domain; and 3rd Image: Image after filtering. Note only signal at the centre of the image is faintly visible in 2D domain (In inner orange circle of top row image), while other magnitudes for other frequencies are relatively low (As illustrated by bottom row image). Semitransparent circles added to identify region where signal is still visible. Given only frequencies beyond the centre are further suppressed in terms of magnitude, this method serves as a minimum filter for specific frequencies (as observed with processed image in top row being smoother).
Appendix A. 3. Tables   Table A1. Critical values for nutrient levels in leaf 17 of palm stands aged more than 6 years from planting (Source: [52]  . From left to right: 1st Image: Image before filtering; 2nd Image: Image at 2D fourier domain; and 3rd Image: Image after filtering. Note only signal at the centre of the image is faintly visible in 2D domain (In inner orange circle of top row image), while other magnitudes for other frequencies are relatively low (As illustrated by bottom row image). Semi-transparent circles added to identify region where signal is still visible. Given only frequencies beyond the centre are further suppressed in terms of magnitude, this method serves as a minimum filter for specific frequencies (as observed with processed image in top row being smoother).
Appendix A.3. Tables   Table A1. Critical values for nutrient levels in leaf 17 of palm stands aged more than 6 years from planting (Source: [52]