Machine-Learning Approach Using SAR Data for the Classification of Oil Palm Trees That Are Non-Infected and Infected with the Basal Stem Rot Disease

Abstract: Basal stem rot (BSR) disease in oil palm plants is caused by the fungus Ganoderma boninense (G. boninense). BSR is a major disease that affects oil palm plantations in Malaysia and Indonesia. Because there is no effective treatment for BSR, the only available sustaining measure is to prolong the life of infected oil palm trees. This study used an ALOS PALSAR-2 image with dual polarization: Horizontal transmit and Horizontal receive (HH), and Horizontal transmit and Vertical receive (HV). The aims of this study were to (1) identify the potential backscatter variables; and (2) examine the performance of machine learning (ML) classifiers (Multilayer Perceptron (MLP) and Random Forest (RF)) in classifying oil palm trees that are non-infected and infected by G. boninense. The sample consisted of 55 non-infected trees and 37 infected trees. Because the class sizes differed, we applied an imbalanced-data approach, the Synthetic Minority Over-Sampling Technique (SMOTE), in these classifications. The results showed that the HV backscatter variable gave the highest correct classification of non-infected and infected oil palm trees for both classifiers: the MLP model had a robust success rate, correctly classifying 100% of non-infected and 91.30% of infected trees, and the RF model had a robust success rate, correctly classifying 94.11% of non-infected and 91.30% of infected trees. Using the most significant variable, HV, the MLP model had a balanced accuracy (BCR) of 95.65% compared to 92.70% for the RF model. The area under the receiver operating characteristic (ROC) curve (AUC) was 0.92 for the MLP model and 0.95 for the RF model. Therefore, it can be concluded that, using only the HV polarization, both MLP and RF can be used to predict BSR disease with relatively high accuracy.
and A.R.M.S.; validation, A.R.M.S.; formal analysis, I.C.H.; investigation, I.C.H.; resources, A.R.M.S., S.K.B., F.M.M. and K.A.; data curation, I.C.H.; writing - original draft preparation, I.C.H.; writing - review and editing, A.R.M.S. and F.M.M.; supervision, A.R.M.S. All authors have read and agreed to the published version of the manuscript.


Introduction
Oil palm (Elaeis guineensis), from the Palmae family, is a major crop in Indonesia and Malaysia, and palm oil is a main agricultural export of both countries. Indonesia is the largest producer of palm oil, with a total production of 34.52 million metric tons, equivalent to 58% in the year 2016, whereas Malaysia is the second largest producer, with a production of 17.32 million metric tons, equivalent to 29% in the same year [1]. Oil palm trees in Southeast Asian countries, such as Malaysia, are exposed to attack by various fungi, the most common being G. boninense, which causes Basal Stem Rot (BSR) disease [2]. A G. boninense attack reduces production yield and can eventually lead to tree mortality [3]. This disease has been one of the major causes of palm oil yield reduction throughout most of the production areas in the country [4]. Until now, there has been no effective treatment for the BSR disease, and the current sustaining measures can only extend the life of oil palm trees [5][6][7].
In young oil palm trees, the roots and stem decay, the leaves turn yellowish, and the plants die within six months [8]. Young plants grown at sites where old oil palm trees have been felled for replanting are commonly infected by G. boninense from the felled trees [8]. In mature trees, symptoms on the leaves and fronds usually become apparent only when the base of the palm leaves (petiole) breaks, the shoots stop growing, and the leaves turn yellowish; the trees then die within 12 to 36 months [9]. When disease incidence exceeds 30%, the production of fresh fruit bunches (FFB) falls by 26% and the fruits produced are of poor quality. When incidence exceeds 60%, FFB production falls by 45% [10]. To control the spread of the BSR disease in oil palm estates, plantation management must monitor the health status of the palms. With disease monitoring in place, oil palm life can be extended to increase productivity [6]. The need for an automated, non-destructive approach has driven the development of rapid, specific methods suitable for the early detection of diseases, and remote sensing techniques can be used to monitor plant diseases and stress [11]. Many researchers have used remote sensing techniques for the early detection and mapping of the BSR disease in oil palm plants based on the symptoms of G. boninense infection [12].
Three types of remote sensing techniques were used to study the BSR disease in previous studies. First, the BSR disease was detected using ground-based platform sensors, i.e., hyperspectral imaging spectrometers: [13] reached an overall accuracy (OA) of 97% using multivariate pattern recognition; [14] achieved 82% net accuracy using Jeffries-Matusita (JM) distance analysis and maximum likelihood classification; and [15] obtained an OA of 94% using Partial Least Squares Discriminant Analysis (PLS-DA). The second technique used airborne hyperspectral platform sensors: [16] obtained 84% precision using the Lagrangian interpolation red edge technique. The third technique used spaceborne optical platform sensors: [17] used a multispectral QuickBird image to identify and map the BSR disease, with an identification accuracy of 85% and a mapping accuracy of 67%, using a vegetation index.
Previous studies have shown that remote sensing (RS) sensors can detect and monitor the BSR disease in oil palm plantations effectively [12]. Hyperspectral sensor platforms have been used at two scales: ground-based (in situ) and airborne. Both showed satisfactory results with acceptable accuracy. However, a ground platform is confined to a small area [18], while an airborne platform has disadvantages such as the need to obtain flight permission, which can be a long and costly process, because airplanes are the common airborne platform [19]. Spaceborne optical remote sensing remains the most widely used platform because of its advantages in radiometric and spatial resolution. However, clouds and haze are prevalent in Malaysia and in other Southeast Asian countries, and because optical sensors cannot penetrate them, spaceborne optical imagery can be degraded [20]. In contrast, remote sensing satellites that use RAdio Detection and Ranging (RADAR) technology employ microwaves, which can pass through most clouds and haze. Hence, the backscattering signals obtained using RADAR remote sensing satellites are less influenced by weather conditions. Several studies have shown the ability of Synthetic Aperture Radar (SAR) data to monitor crop conditions and estimate biophysical parameters [21][22][23][24]. Studies have shown that the sensitivity of SAR backscatter to plant conditions depends on SAR sensor parameters, i.e., polarization, incidence angle, and wavelength [25][26][27]. In general, SARs with short wavelengths, such as X-band (~3 cm) and C-band (~6 cm), have a lower chance of penetrating the canopy. On the other hand, L-band (~20 cm) and P-band (~100 cm), which have long wavelengths, can penetrate the plant cover to the ground surface [28]. The penetration depth achieved depends on the biophysical parameters of the scatterers in the plant layer (e.g., geometry, size, and water content), which can strengthen or weaken the interaction between the microwaves and the scattering elements [28]. SAR images have the potential not only to distinguish different crop types, but also to monitor crop growth [29,30]. However, to date, health monitoring of oil palm trees using microwave wavelengths is still lacking. Therefore, there is a need to assess the feasibility of SAR data for classifying oil palm trees that are non-infected and infected by G. boninense.
Machine-learning (ML) algorithms are one candidate method for classifying oil palm trees that are non-infected and infected by G. boninense. ML algorithms use computational methods to learn information directly from the data without relying on a predetermined equation as a model [31]. In the past decade, ML algorithms have been used in various applications, such as land cover mapping [32][33][34][35], forest monitoring [36][37][38][39], and agricultural monitoring [40][41][42]. ML algorithms have also been used to classify remote sensing data and crop diseases: [43] presented the application of ML algorithms to plant resistance gene discovery and plant disease classification; [44] used ML algorithms to classify healthy and unhealthy leaves of sorghum, citrus, and cabbage; and [43] contributed a procedure for the early detection and differentiation of sugar beet diseases based on ML algorithms. Nonetheless, class imbalance in the data poses a challenge to ML classifiers, since they often favor the majority class [45]. To overcome class imbalance problems, data-level approaches are commonly used. The Synthetic Minority Oversampling Technique (SMOTE) is the most widely used data-level approach to the imbalance problem in various applications in the agricultural field [46,47].
Hence, the aim of this research is to introduce an L-band ALOS PALSAR-2 (Japan Aerospace Exploration Agency (JAXA)) dataset to discriminate oil palm trees that are non-infected and infected by G. boninense using ML algorithms. The two objectives of this research are to: (1) identify the potential backscatter variables; and (2) examine the performance of ML classifiers (Multilayer Perceptron (MLP) and Random Forest (RF)). In this research, an imbalanced-data approach, the Synthetic Minority Over-Sampling Technique (SMOTE), is used in classifying oil palm trees that are non-infected and infected by G. boninense.

Study Area
The study area is located in the oil palm plantations of Felcra Seberang Perak 10, Phase

Field Survey Data
Field data collection was conducted on 20 March 2017. The census of oil palm trees that were non-infected and infected by G. boninense was obtained from the Malaysian Palm Oil Board (MPOB), and the infection status of the BSR disease was identified by an expert based on visual symptoms, as shown in Table 1 and Figure 2. The GPS coordinates of individual trees were recorded using a Trimble R6 GPS receiver (Trimble Navigation Limited, Sunnyvale, CA, USA). The numbers of non-infected and infected oil palm trees collected for this study were 55 and 37, respectively.
Table 1. Category and description of infection status of the oil palm trees.

Category        Description
Non-infected    Healthy palm, no foliage symptom (0%), no fruiting body
Infected        Foliage symptom more than 25%, produces fruiting bodies

Dataset Description
This study used ALOS PALSAR-2 image data archived on 20 March 2017. The scene captured by ALOS PALSAR-2 is a fine beam dual polarization (HH and HV) image with a high spatial resolution of 10 m, a pixel size of 6.25 m × 6.25 m, and a 24 cm radar wavelength. Table 2 shows the specification of the ALOS PALSAR-2 sensor data.

Data Analysis and Evaluation
The four main steps involved in this research and the workflow of the processing are shown in Figure 3.

SAR Image Pre-Processing
As shown in Figure 3, the ALOS PALSAR-2 data were first subjected to speckle filtering and radiometric correction. Since the ALOS PALSAR-2 data are stored as a Level 1.5 product, they have already been geometrically corrected; thus, a terrain correction step was unnecessary for these data. To reduce high-frequency noise in the ALOS PALSAR-2 data, speckle filtering was applied to the image. Several researchers have developed filtering algorithms for SAR images [48][49][50], and a wide range of filtering techniques has been applied in different studies. Reducing speckle noise is one of the most important steps in improving the quality of coherent radar images. Image variance or speckle is a granular noise that inherently exists in, and degrades the quality of, active radar and SAR images. [51] found that the Lee filter is best suited to agricultural areas; hence, Lee filtering was adopted in this research. A filter's kernel size plays an important role in smoothing the image [52]. To preserve the details of the image, an appropriate size should be selected. A 7 × 7 filter window was used for HH and HV because it gave high Equivalent Number of Looks (ENL) values and low Speckle Suppression and Mean Preservation Index (SMPI) values, which indicate better speckle reduction and mean preservation. Radiometric correction was then applied to the image to obtain the sigma naught value in decibels (dB), as in Equation (1) [53], where DN is the digital number of the ALOS PALSAR-2 data. Subsequently, the SAR image was subset to the region of interest, and the image was converted from the .dim format to the .tiff format prior to being processed using ArcGIS Desktop software version 10.6.1 (Environmental Systems Research Institute (ESRI), Redlands, CA, USA).
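The radiometric correction and the ENL criterion can be illustrated with a short sketch. It assumes the conversion takes the form commonly published by JAXA for PALSAR-2 amplitude products, sigma naught (dB) = 10·log10(DN²) + CF with CF = -83.0 dB; neither that form nor the constant is stated in the text here, so both should be checked against [53]. ENL is computed with the usual definition, squared mean over variance of a homogeneous patch. The function names are hypothetical.

```python
import math
from statistics import mean, pvariance

# Assumption: -83.0 dB is the calibration factor commonly published by JAXA
# for PALSAR-2 amplitude products; it is not given in the text above.
CALIBRATION_FACTOR_DB = -83.0


def dn_to_sigma0_db(dn, cf_db=CALIBRATION_FACTOR_DB):
    """Convert a digital number to sigma naught in dB: 10*log10(DN^2) + CF."""
    if dn <= 0:
        raise ValueError("DN must be positive to take the logarithm")
    return 10.0 * math.log10(dn ** 2) + cf_db


def equivalent_number_of_looks(intensities):
    """ENL of a homogeneous patch: squared mean divided by variance."""
    values = list(intensities)
    variance = pvariance(values)
    if variance == 0:
        raise ValueError("patch has zero variance; ENL is undefined")
    return mean(values) ** 2 / variance


if __name__ == "__main__":
    print(dn_to_sigma0_db(1000))  # hypothetical DN value
    print(equivalent_number_of_looks([9.0, 11.0, 10.0, 10.0]))
```

A higher ENL over the same patch after filtering indicates stronger speckle suppression, which is the sense in which the 7 × 7 window was preferred.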

Training Data
In the second step, an orthorectified UAV image, acquired with a 3DR Iris+ drone carrying a MAPIR Survey 2 camera (MAPIR, Peau Productions, Inc., San Diego, CA, USA), was used as a base map and overlaid with the GPS coordinates of each individual tree, which had been recorded using the Trimble GPS device during the field survey. Afterwards, the SAR image and the coordinates of each tree were overlaid to extract backscattering values. The backscattering value for each oil palm point was extracted from the processed SAR image using the Extract Multi Values tool in ArcGIS. As the SAR image had dual polarization (HH and HV), each digitized point possessed two different backscatter values.
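The point extraction step amounts to finding the pixel that contains each tree coordinate and reading its value in every band. The sketch below shows that lookup for a north-up raster stored as nested lists; it is an illustration only, not the ArcGIS tool, and the function name, the raster origin, and the sample values are all hypothetical. The 6.25 m pixel size is the one given for the ALOS PALSAR-2 scene.

```python
def extract_at_point(raster, origin_x, origin_y, pixel_size, x, y):
    """Return the value of the pixel containing map coordinate (x, y).

    (origin_x, origin_y) is the top-left corner of a north-up raster, so
    columns grow with x and rows grow as y decreases.
    """
    col = int((x - origin_x) // pixel_size)
    row = int((origin_y - y) // pixel_size)
    if not (0 <= row < len(raster) and 0 <= col < len(raster[0])):
        raise IndexError("point falls outside the raster")
    return raster[row][col]


if __name__ == "__main__":
    # Hypothetical 2 x 2 backscatter grids in dB, one per polarization.
    hh = [[-7.1, -6.8], [-8.0, -7.5]]
    hv = [[-13.9, -13.2], [-15.1, -14.4]]
    tree_x, tree_y = 7.0, 11.0  # hypothetical tree coordinate in map units
    values = {
        name: extract_at_point(band, 0.0, 12.5, 6.25, tree_x, tree_y)
        for name, band in (("HH", hh), ("HV", hv))
    }
    print(values)
```

Each tree therefore yields one HH and one HV value, matching the two backscatter values per digitized point described above.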

Imbalance Data Approach
In this study, data imbalance was an issue because the infected class had fewer instances than the non-infected class. Therefore, the imbalanced-data approach SMOTE was applied for the classification of trees non-infected and infected by G. boninense. The algorithm was chosen because it is of low complexity, has less bias, and requires minimal computation [45,54]. After SMOTE, the non-infected and infected samples were pooled and divided into training and testing datasets with a 70:30 ratio, respectively.

Data Sampling
SMOTE is a technique that oversamples the minority class by generating synthetic examples [55]. SMOTE first randomly selects a minority class example a and finds its k nearest minority class neighbors. A synthetic instance is then created by choosing one of the k nearest neighbors, b, at random and connecting a and b to form a line segment in feature space. The synthetic samples are produced as a convex combination of the two selected samples a and b [56]. In this way, new instances of the minority class are synthesized. In this study, SMOTE was executed using an open-source ML program, Waikato Environment for Knowledge Analysis (WEKA) version 3.8.5 [57]. Prior to the classification, the SMOTE parameters were tuned; they are summarized in Table 3.
Manipulating the dual-polarization backscatter values can result in more effective classification [58]. Therefore, in addition to the original backscatter (HH and HV) polarization values, other variables derived from the dual-polarization backscatter values were also tried. The derived backscatter variables generated for this study were (i) range (HH - HV); (ii) average ((HH + HV)/2); and (iii) simple band ratios (HH/HV) and (HV/HH). All six (6) backscatter variables were used in this study, as shown in Table 4, and served as the ML inputs, as illustrated in Table 5.
Table 4. Variables used to classify trees that were non-infected and infected by G. boninense (Adapted from [59]).
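The two operations described in this section, deriving the six backscatter variables and synthesizing minority samples by interpolation, can be sketched as follows. This is a minimal illustration in plain Python, not the WEKA filter used in the study: the function names, the default k = 5, the seed, and the sample backscatter values are assumptions, and the tuned parameters of Table 3 are not reproduced.

```python
import random


def derive_variables(hh, hv):
    """Six backscatter variables for one tree from its HH and HV values (dB)."""
    return {
        "HH": hh,
        "HV": hv,
        "range": hh - hv,
        "average": (hh + hv) / 2,
        "HH/HV": hh / hv,
        "HV/HH": hv / hh,
    }


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic minority samples by SMOTE-style interpolation.

    minority is a list of equal-length numeric tuples. k = 5 is an assumed
    default, not a value taken from the study.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to a.
        others = [p for p in minority if p is not a]
        others.sort(key=lambda p: sum((pi - ai) ** 2 for pi, ai in zip(p, a)))
        b = rng.choice(others[:k])  # one of the k nearest neighbours
        gap = rng.random()          # position along the segment from a to b
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


if __name__ == "__main__":
    print(derive_variables(-7.0, -14.0))           # hypothetical dB values
    infected_hv = [(-13.0,), (-12.0,), (-14.5,), (-13.5,)]
    print(smote(infected_hv, n_new=3, k=2))
```

Because each synthetic value is a convex combination of two real minority samples, it always lies between them, so the oversampled class stays within the range of the observed backscatter.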

Machine Learning Approach
The ML approach can be used to classify both large and small numbers of samples [60,61]. Due to the small number of samples, we performed a cross-validation process to evaluate model performance. The k-fold cross-validation function in WEKA was used to split the data into training and testing datasets and obtain an independent evaluation of the model accuracy. Thus, the resulting model was validated on more than one subset of the data. Ten-fold cross-validation was performed, in which the original data were randomly partitioned into ten subsamples. Two ML methods were used, as described below:
(1) Multilayer Perceptron (MLP). MLP is a feed-forward artificial neural network model that maps sets of input data onto a set of appropriate outputs. An MLP consists of multiple layers of nodes, each layer connected to the next [62]. Every node except the input nodes is a neuron (or processing element) with a nonlinear activation function. To train the network, MLP uses a supervised learning technique called backpropagation [63]. MLP is a modification of the standard linear perceptron and can distinguish data that are not linearly separable.
(2) Random Forest (RF). RF is a classification algorithm built from an ensemble of randomized decision trees. In RF, an individual bootstrap sample is drawn from the original dataset to train each tree, and each node is split using a random subset of the features from the original dataset [64]. Each sample is assigned a class according to the majority vote of the ensemble of trees built by the RF algorithm [65]. Moreover, RF has high predictive accuracy, works well with imbalanced datasets, and is robust against noise [66,67].
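The ten-fold partitioning used to evaluate both classifiers can be sketched in plain Python. This is a minimal illustration rather than WEKA's implementation: the function name `stratified_k_fold`, the seed, and the decision to stratify the folds by class are assumptions made here. The class counts in the usage lines are the 55 non-infected and 37 infected field samples, before any oversampling.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs; each sample is tested exactly once."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin so folds keep similar class proportions.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(
            idx for f, fold in enumerate(folds) if f != i for idx in fold
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    labels = ["non-infected"] * 55 + ["infected"] * 37
    for fold, (train_idx, test_idx) in enumerate(stratified_k_fold(labels), 1):
        print(fold, len(train_idx), len(test_idx))
```

In each iteration a classifier would be trained on `train_idx` and scored on `test_idx`, and the ten scores combined, so that every sample contributes to the evaluation without ever being used to test a model that was trained on it.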

Accuracy Assessment
Overall accuracy is a biased assessment metric under class imbalance, since it mainly represents the accuracy of the majority class [54]. This analysis therefore reports, as an alternative, the confusion matrix in terms of the success rates for the non-infected and infected classes, along with the balanced classification rate or balanced accuracy (BCR) and the area under the receiver operating characteristic (ROC) curve (AUC). These metrics were used to evaluate the different classifier approaches and measure their performance. Following the classification of Kappa values by [68], we considered a BCR below 40.00% as poor, between 40.00% and 80.00% as moderate, and above 80.00% as robust. An AUC value of 0.50 is regarded as fail, 0.51-0.69 as poor, 0.70-0.79 as acceptable, 0.80-0.89 as excellent, and 0.90-1.00 as outstanding [69]. Finally, ANOVA and the Tukey Honest Significant Difference (HSD) test at p < 0.05, run in IBM SPSS Statistics for Windows, version 19 (IBM Corp., Armonk, NY, USA), were used to determine the significance of the differences between classifiers and backscatter variables.
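The metrics and rating scales above can be written out as a small sketch. It assumes that BCR is the mean of the two per-class success rates, which reproduces the figures reported in this study (for example, 100% and 91.30% give 95.65%), and it computes AUC by pairwise ranking of classifier scores. The function names are hypothetical, and treating AUC values below 0.50 as "fail" is an assumption, since the scale only names 0.50 itself.

```python
def balanced_accuracy(success_non_infected, success_infected):
    """BCR as the mean of the two per-class success rates (same units as inputs)."""
    return (success_non_infected + success_infected) / 2


def auc(scores_infected, scores_non_infected):
    """Area under the ROC curve by pairwise ranking; ties count one half."""
    wins = 0.0
    for p in scores_infected:
        for n in scores_non_infected:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_infected) * len(scores_non_infected))


def rate_bcr(bcr_percent):
    """Rating scale for BCR (%) adopted in this study."""
    if bcr_percent < 40.0:
        return "poor"
    if bcr_percent <= 80.0:
        return "moderate"
    return "robust"


def rate_auc(value):
    """Rating scale for AUC; values below 0.50 are assumed to rate as 'fail'."""
    if value <= 0.50:
        return "fail"
    if value < 0.70:
        return "poor"
    if value < 0.80:
        return "acceptable"
    if value < 0.90:
        return "excellent"
    return "outstanding"


if __name__ == "__main__":
    bcr = balanced_accuracy(100.0, 91.30)  # MLP with HV, testing dataset
    print(bcr, rate_bcr(bcr), rate_auc(0.92))
```

Averaging the per-class rates gives each class equal weight regardless of its size, which is why BCR is less affected by class imbalance than overall accuracy.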

Sensitivity of Variables of Backscatter Used
The success rate (%), BCR (%), and AUC for the non-infected and infected classes are illustrated as heat maps in Figures 4-6, respectively. Since the accuracy assessments of the training and testing datasets were similar, the results and discussion of success rate, BCR, and AUC focus on the testing dataset. As shown in Figure 4, the MLP model performed better than the RF model for any backscatter variable. Among the backscatter variables, HV gave the highest correct classification of non-infected and infected oil palm trees for both classifiers. The MLP classifier model for the variable HV had a robust success rate, correctly classifying 100% of non-infected and 91.30% of infected trees, with BCR 95.65% (robust) and AUC 0.92 (outstanding). Meanwhile, the RF classifier model for the variable HV had a robust success rate, correctly classifying 94.11% of non-infected and 91.30% of infected trees, with BCR 92.70% (robust) and AUC 0.95 (outstanding).
In contrast, the HH variable classified the non-infected and infected oil palm trees poorly for both classifiers. The MLP classifier model for the variable HH had a poor success rate for non-infected trees (17.65%) and a robust success rate for infected trees (86.96%), with BCR 52.30% (moderate) and AUC 0.49 (fail). Meanwhile, the RF classifier model for the variable HH had a poor to moderate success rate, correctly classifying 29.41% of non-infected and 52.17% of infected trees, with BCR 40.80% (moderate) and AUC 0.44 (fail).
Among the derived variables, only the average ((HH + HV)/2) reached high accuracy. The MLP classifier model for the variable (HH + HV)/2 had a robust success rate, correctly classifying 88.24% of non-infected and 82.61% of infected trees, with BCR 85.40% (robust) and AUC 0.88 (excellent). Meanwhile, the RF classifier model for the variable (HH + HV)/2 had a moderate to robust success rate, correctly classifying 76.47% of non-infected and 82.61% of infected trees, with BCR 79.55% (moderate) and AUC 0.85 (excellent).
The ANOVA test (Table 6) demonstrated that there is no statistically significant difference between backscatter variables overall. However, a Tukey post hoc test (Table 7) showed that the difference between HH and HV was statistically significant (p = 0.03), while there was no statistically significant difference between the other backscatter variables. The Tukey's HSD test (Table 8) showed that the mean BCR was highest for HV (70.65%) and lowest for HH (50.53%).

Effects of Classifiers on the Model Performance
Based on the results of this study, the MLP model performed slightly better than the RF model for any backscatter variable. The MLP model had BCR ranging from 52.30% to 95.65% (moderate to robust), AUC = 0.49-0.92 (fail to outstanding), non-infected success rate = 17.65-100% (poor to robust), and infected success rate = 82.61-100% (robust). The RF model had BCR ranging from 40.80% to 92.70% (moderate to robust), AUC = 0.44-0.95 (fail to outstanding), non-infected success rate = 29.41-94.11% (poor to robust), and infected success rate = 52.17-100% (moderate to robust).
The results also show that good variable selection plays a role in determining model performance. The HV polarized backscatter variable performed better than the other backscatter variables in terms of success rate, balanced accuracy (BCR), and the area under the receiver operating characteristic (ROC) curve (AUC) for both classifiers.
The ANOVA also showed that there is no statistically significant difference in model performance between the classifiers (p = 0.271). The Tukey's HSD test showed a slightly higher mean BCR for the MLP model (70.65%) than for the RF model (64.44%).

Sensitivity of Variables of Backscatter Used
The current study aimed to identify the potential backscatter variables for the classification of the BSR disease in oil palm plants. Several studies have shown the sensitivity of SAR backscatter to plant conditions [25][26][27]. Radar backscattering from crops is sensitive to the crop canopy structure and the underlying soil condition [70]. As regards the polarization response, cross-polarized backscatter is found to be more sensitive than co-polarized backscatter. This is attributed to depolarization caused by multiple reflections of the incoming signal inside the plant canopy. The finding that HV polarization is more sensitive than HH polarization is consistent with these conclusions in the literature.
This study assessed the feasibility of using backscatter variables from SAR data to classify oil palm trees that are non-infected and infected by G. boninense, and reported the prediction accuracy. The HV polarization backscatter had greater success in correctly classifying non-infected and infected oil palm trees for both classifiers. The MLP classifier model for the HV polarization had a robust success rate, correctly classifying 100% of the non-infected and 91.30% of the infected trees, with BCR 95.65% (robust) and AUC 0.92 (outstanding). Meanwhile, the RF classifier model for the HV polarization had a robust success rate, correctly classifying 94.11% of the non-infected and 91.30% of the infected trees, with BCR 92.70% (robust) and AUC 0.95 (outstanding).

Effects of Classifiers on the Model Performance
Based on the BCR, AUC, and success rates for the non-infected and infected classes, as well as the ANOVA, the MLP-based and RF-based models performed similarly in classifying the oil palm trees. A thorough understanding of the output of these methods is needed to make the best use of each classifier.
Two factors can be considered when choosing an algorithm. (1) Performance; when deciding which algorithm to use for classification or regression, the algorithm's overall performance is a significant determinant. Both MLP and RF can model linear and nonlinear relationships, and MLP has greater potential for this [71].
(2) Robustness; when evaluating performance, it is necessary to look at how robustly a model applies to new data rather than only at how closely it fits the training data [72]. A model may fit the data it was trained on yet not work well with new data, even when the new data come from the same source. Such a condition is considered "overfitting" [73]. An overfitted model is not robust, that is, it does not generalize. The chance of overfitting can be reduced by restricting the complexity of the models that an algorithm can propose. Both MLP and RF allow this, each with advantages and disadvantages. In MLP, the number of hidden neurons and layers, as well as the amount of regularization used when optimizing the weights, has a significant effect on the complexity of the model. RF allows the number of trees, and the size and depth of individual trees, to be changed to fit the problem. In this way, each of these methods can manage uncertainty and overfitting. MLPs are, however, reputed to be especially sensitive to their inputs, which may lead to deviations in their outputs [74].
Many previous studies have compared these classifiers. One study dealt with the prediction of carbon and nitrogen in soil and how the amounts of these two elements differ between fields [75]. It compared MLP, RF, and Gradient Boosted Machines (GBM) to determine which gave better results. The findings indicate that the results differed according to the kind of data used, and that the RF model outperformed the other models in most situations. Another study compared MLP and RF in predicting building energy usage, which is a numerical prediction rather than a classification problem [76]. That study found that MLP performed significantly better than RF.
In practice, both techniques are useful for different aspects of applications. Experience in ML shows that no single algorithm is best in every situation: the results of an algorithm differ significantly depending on the application and the size of the dataset. Hence, the outputs of various learning algorithms should be compared for a particular problem to find the best one. It is also useful to construct ensembles of multiple models generated with different approaches to combine their strengths, thus reducing their weaknesses.

Conclusions
In this study, the backscatter values of oil palm trees were extracted from ALOS PALSAR-2 image data and divided into two status levels: non-infected and infected by G. boninense. Six (6) backscatter variables were used, namely HH, HV, range (HH - HV), average ((HH + HV)/2), simple band ratio (HH/HV), and simple band ratio (HV/HH), to generate ML models for the prediction of the BSR disease. Two classifier models were used: MLP and RF. In terms of model performance across the backscatter variables, the MLP model had a balanced accuracy (BCR) ranging from 52.30% to 95.65% compared to 40.80% to 92.70% for the RF model. The area under the receiver operating characteristic (ROC) curve (AUC) ranged from 0.49 to 0.92 for the MLP model and from 0.44 to 0.95 for the RF model.
Regarding the polarization response, certain variables stand out, notably HV, which captures the phenomenon under study better. The cross-polarized backscatter (HV) was found to be more sensitive to crop canopy structure than the co-polarized backscatter (HH), which is more sensitive to the surface. As a result, the success rate, balanced accuracy (BCR), and area under the receiver operating characteristic (ROC) curve (AUC) are low when HH polarization is involved as a variable. Using the most significant variable, HV, the MLP model had a balanced accuracy (BCR) of 95.65% compared to 92.70% for the RF model. The AUC was 0.92 for the MLP model and 0.95 for the RF model.
In conclusion, using only the HV polarization, both MLP and RF can be used to predict BSR disease with relatively high accuracy. The major benefit derived from this study is the demonstrated potential of using SAR data and an imbalanced-data approach to classify oil palm trees infected by G. boninense using ML techniques. In future studies, samples of different severity levels will be used to analyse backscatter characteristics and identify oil palm trees that have been infected with BSR disease.