1. Introduction
Palm oil is a highly adaptable vegetable oil, used in applications ranging from biofuels to soaps to snack foods. In Asia, palm oil is valued for its health and food preservation properties, while in Europe it is frequently used as a biofuel because of its relatively high energy content and its ability to blend well with other oils. It also appeals to both regions because of its low cost compared with other oils. As a result, global palm oil production has quadrupled since the 1990s. Malaysia was predicted to generate over 30% of the world’s total palm oil by 2020 [1]. Malaysia’s economy benefits considerably from the solid financial returns associated with palm oil sales. In 2020, palm oil was expected to account for roughly 38% of Malaysia’s agricultural output and contribute 3% to the country’s gross domestic product [2]. An expanded mature oil palm acreage and higher productivity, through increased fresh fruit bunch (FFB) yield and higher oil extraction rates, are predicted to help Malaysia’s palm oil production reach 22 million metric tonnes (mt) by 2025 and up to 25 million mt by 2030 [3]. Meanwhile, palm oil plantations cover approximately 18% of the country’s territory, directly employ 441,000 people (more than half of whom are small landholders), and indirectly employ many more in a country with a population of 32 million [1].
Even though the palm oil industry in Malaysia is more than a century old and palm oil is the country’s most important commodity, the industry continues to face numerous significant issues. Integrated disease and pest management has therefore been adopted for oil palm farms. This is critical for avoiding substantial crop destruction, mainly from major diseases such as basal stem rot (BSR). Numerous control strategies have been employed or developed to mitigate the economic impact of the disease, such as destroying the infected palms, treating the infected palms, or protecting young or healthy palms that are not yet infected [4]. At present, BSR disease has no effective cure [5]. Most control techniques can only extend the productive lives of infected palms without completely curing the disease.
BSR disease begins with pathogenic fungi colonizing the oil palm root, followed by the destruction of basal stem tissue [6]. The disease eventually kills the internal tissue and palm xylem, disrupting water and nutrient flow from the root to the plant’s upper half [7]. Infected trees show signs of wilting, dry fronds, and unopened spears. Later, basidiocarps (fruiting bodies) in the shape of a conch emerge on the palm trunk. At this point, the fungal infection has spread widely throughout the palm and typically results in plant death [8]. The fruiting body might then release spores, which can spread to the soil or to neighboring palms. As a result, Ganoderma reduces the productive life of oil palm trees, resulting in considerable output losses for the oil palm business [9]. With an average mortality rate of 3.7%, the anticipated yield loss owing to this disease in Malaysia might exceed USD 500 million [10]. To control the spread of BSR disease in oil palm estates, it is crucial that plantation management monitors the health status of the palms. With disease monitoring in place, oil palm life can be extended and productivity increased [11]. The need for an automated, non-destructive approach has led to the development of rapid, specific methods suitable for the early detection of diseases, and remote sensing techniques can be used to monitor plant diseases and stress [12].
Recently, numerous researchers have employed remote detection approaches for the early detection and mapping of BSR disease in oil palm plants based on the symptoms of G. boninense infection [13]. Non-invasive remote sensing techniques, including ground-based, airborne, and space-borne remote sensing, have also been investigated to identify and map BSR-infected trees. Recent studies have demonstrated that hyperspectral and multispectral remote sensing methods can distinguish healthy and BSR-infected trees [14,15,16,17,18,19,20]. Terrestrial laser scanning (TLS) [21,22,23], synthetic aperture radar (SAR) data [24,25], intelligent electronic nose (E-Nose) systems [26,27], tomographic sensors [28,29], and microfocus X-ray fluorescence [30] also showed positive results in detecting BSR-infected trees. These reports showed that the approaches employed can detect BSR early and distinguish healthy from BSR-infected trees. However, several of the techniques were limited in their ability to further characterize the degree of BSR infection [13].
Biological activity produces metabolic heat, which causes the temperature of the products to rise [31]. A temperature differential forms at the surface due to the loss of water through transpiration. Because transpiration depends on the plant’s growth stage, leaves with different features show different temperature distributions on their surfaces [32]. Thermal imaging is a technique for measuring temperature distributions from a distance. Using a sensitive camera and appropriate image analysis software, it allows the surface temperature of plant leaves to be displayed visually and quantified with high resolution. Therefore, thermal imaging has the potential to determine plant properties in a non-contact and non-destructive manner.
Numerous studies demonstrate thermal imaging’s capacity to detect plant diseases, either in a controlled environment such as a plant growth chamber or a greenhouse, or in an uncontrolled environment such as a field site. The authors of [33] used infrared (IR) thermography to characterize the temperature range of affected leaves while monitoring scab disease on apple leaves in a greenhouse. The maximum temperature difference (MTD) was shown to increase in direct proportion to scab formation and to be highly correlated with the extent of the infected regions. In later phases, the MTD was reduced because of leaf withering. However, the leaf area with enhanced transpiration was larger than the leaf area with scab lesions, and the percentage decreased from more than 70% in the early stage to 20% in mature lesions. The study in [34] examined the ability of thermal imaging to identify the early indicators of fungal diseases on rose plants (Rosa hybrida L.). Two tests were conducted in a plant growth chamber to determine the impact of powdery mildew and gray mold infections. Feature selection was carried out, and the extracted thermal properties with the highest linguistic hedge values were chosen. The findings demonstrated that pre-symptomatic detection of powdery mildew and gray mold infections is possible. The best prediction rates for identifying powdery mildew and gray mold in their pre-symptomatic stages were 69% and 80% (on the second day after inoculation), respectively. Imaging spectroscopy and thermal imaging were employed in [35] to identify peanut leaf spot in peanut fields. Two sets of thermal assessments were conducted: one spanning the entire canopy and the other focusing on single plants. In the first set, thermal infrared measurements revealed a greater radiance in the diseased zone than in the healthy region. The decreased root absorption efficiency seen in infected plants, which was more pronounced during the hottest hours of the day when the plant’s water requirement was greater, may contribute to this thermal behavior. The second set of assessments was conducted on single plants, which were observed for thermal activity and IR response throughout the day. The temperature of the diseased plants was found to be 2.2 °C higher than that of the healthy plants. This temperature difference enabled infected and healthy leaves to be distinguished before necrosis became apparent on the leaves. Subsequently, the authors of [36] used thermal images of the canopy regions of non-infected and BSR-infected oil palm trees. The images were processed to derive intensity values that correspond to the plants’ thermal characteristics, and these values were analyzed statistically. Selected principal component scores were used in multivariate classification algorithms such as k-nearest neighbor (kNN) and support vector machine (SVM). The findings indicated that when the average intensity value of trees was used, the SVM-based model achieved the highest overall classification accuracy: 89.2% for the training set and 84.4% for the test set. A recent study [37] used thermal imagery to detect BSR in oil palm at the seedling stage. Thermal characteristics were extracted by processing thermal images of infected and healthy seedlings. Statistical analysis was performed to find significant differences between healthy and diseased seedlings. Principal component analysis (PCA) was employed to reduce the dimensionality of the input. The SVM (fine Gaussian) classification model using principal components 1 and 3 as input parameters produced the best results, with an accuracy of 80%.
Although thermal imaging studies have been conducted to detect trees infected by BSR [36,37], the present study differs from them in terms of image acquisition and image processing. In terms of image acquisition, this study is novel because it compensates for the effects of several different radiation sources, such as emissivity, reflected temperature, and other environmental parameters (atmospheric temperature, relative humidity, and camera distance). In previous studies, by contrast, the parameters of the thermal camera were set to fixed values: emissivity (0.98), reflected apparent temperature (RAT) (20 °C), atmospheric temperature (20 °C), and relative humidity (50%). In this investigation, the emissivity was also kept constant (0.98), but the RAT was varied according to the value measured from a reflector. The reflector is positioned within the field of view of the infrared camera, and its temperature is measured with the emissivity set to one; the resulting reflector temperature is used as the reflected temperature. Atmospheric temperature and relative humidity values were updated every half hour and ranged over 24–30 °C and 67–92%, respectively. In terms of image processing, prior research standardized the image temperature scale to 24–34 °C so that a given pixel intensity corresponds to the same temperature in every image, whereas this study assessed each thermal image by focusing on features of temperature variation. In this regard, we strive to improve existing methodologies and to develop novel approaches for detecting BSR disease in oil palm plantations.
A machine learning (ML) algorithm is one possible method for classifying oil palm trees as non-infected or BSR-infected. ML algorithms use computational methods to learn information directly from the data, without relying on a predetermined equation as a model [38]. In the last decade, ML algorithms have been used in various applications, such as agricultural monitoring [39,40,41], land cover mapping [42,43,44], and forest monitoring [45,46,47]. ML approaches have also been applied to precision farming, which is now known as digital farming [41]. One of the most significant concerns of digital agriculture is pest and disease control. Recently, ML algorithms have also been used to classify remote sensing data and to detect crop diseases [48].
Numerous researchers have investigated BSR disease detection using ML. The authors of [49] used electrical properties to detect BSR disease in oil palm trees at an early stage. Only 56 mature tree samples were chosen, with 14 trees representing each of the four infection levels. Among the classifiers, quadratic discriminant analysis (QDA) achieved the highest accuracy, and among the electrical properties, impedance performed the best, with an overall accuracy of 82–100%. Multispectral Quickbird satellite images were employed by [50] for BSR disease classification. The plot contained 144 oil palm trees ranging in age from 10 to 21 years. Compared with the SVM and classification and regression tree (CART) models, the RF classifier performed the best, with the highest producer’s (91%), user’s (83%), and overall (91%) accuracy. In a recent study, the authors of [22] used TLS to classify the severity levels of BSR disease. The results indicated that the kernel naïve Bayes (KNB) model built using principal components 1 and 2 as input parameters performed the best among 90 other models.
Nevertheless, class imbalance in the data presents a challenge for machine learning classifiers, as an imbalanced dataset frequently biases the classifier toward the majority class [51]. Data-level techniques are frequently used to address class imbalance. Random oversampling (ROS), random undersampling (RUS), and the synthetic minority oversampling technique (SMOTE) are the most commonly used data-level techniques for resolving the imbalance problem in a variety of agricultural applications [52,53].
Thus, this research aims to use a thermal imaging dataset to distinguish non-infected from BSR-infected oil palm trees using an imbalanced data technique and a machine learning algorithm. The two objectives of this research are to: (1) identify potential temperature features and (2) assess the performance of machine learning (ML) classifiers (naïve Bayes (NB), multilayer perceptron (MLP), and random forest (RF)) in distinguishing non-infected from BSR-infected oil palm trees.
2. Materials and Methods
2.1. Data Collections
The study site is located within the Felcra Seberang Perak 10, Phase 1, Parcel 3 oil palm farms. It is located approximately at latitude 4°06′01″–4°06′44″ N and longitude 100°53′07″–100°53′42″ E in Mukim Pasir Salak, Perak Tengah district, Perak. Parcel 3 covers an area of 26 hectares and contains a total of 3660 trees. Oil palm trees for Phase 1, Parcel 3 were planted in 2005 as second-generation plants. The oil palm trees in this study were 13 years old, and 2009 was the first year of fruit production for the plantation. The oil palm planting was maintained similarly to commercial palm oil plantations, including fertilization, fruit harvesting, trimming, and weed control. Pruning mature palms properly was necessary to remove dead or senescing leaves and to provide access to the FFBs in the appropriate harvesting period. The plot was planted at a density of 142 palms per hectare, and the palms were spaced 9 × 9 × 9 m apart in an equilateral triangular design.
The data for the classification model were collected between 20 and 22 March 2017. A total of 92 oil palm trees were selected randomly as samples and categorized as non-infected (healthy) or BSR-infected. There were 55 non-infected trees and 37 BSR-infected trees. The health status of each tree was determined by an expert based on the visual signs described by the Malaysian Palm Oil Board (MPOB).
A FLIR T620 infrared thermal imaging camera (FLIR Systems, Inc., Wilsonville, OR, USA) was used for data acquisition. Images of the trunk section of each tree were captured at three different, randomly chosen angles. The trees were 13 years old and more than 4 m tall. The thermal camera was positioned 1 m above the ground and 1 m away from the tree. Images of the trunk sections were acquired in two sessions: morning and afternoon. In the morning session, the images were captured from 7:30 a.m. to 10 a.m.; in the afternoon session, they were captured from 4:30 p.m. to 7 p.m. These sessions were selected because crop plants gradually absorb the sun’s heat energy during daylight hours, and as the ambient temperature rises throughout the day, the trunks become less distinct from the other warm objects that the camera’s sensor detects. A diagram of the experimental setup for the trunk is shown in Figure 1.
To measure temperature accurately, the effects of several different radiation sources must be compensated for, such as emissivity, reflected temperature, and other environmental parameters (atmospheric temperature, relative humidity, and camera distance). The temperature of the object (Tobj) can be calculated from Equation (1). Different camera manufacturers use similar equations to perform temperature measurements [54]. In order to solve Equation (1), the camera or software requires several input parameters to estimate the temperature of the object precisely.
In Equation (1), εobj refers to the object’s emissivity, Tref refers to the reflected temperature, τatm refers to the transmittance of the atmosphere, and Tatm refers to the atmosphere’s temperature. Generally, the transmittance of the atmosphere is determined by the distance between the object and the camera and by the relative humidity. This value is typically close to one, so the atmosphere’s emittance (1 − τatm) is close to zero and the atmosphere has a negligible effect on temperature measurements. On the other hand, the emissivity of the object and the reflected temperature have a significant impact on the temperature measurement and must be determined precisely.
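For reference, the radiometric measurement equation commonly given in the thermography literature takes the form below. It is offered as a sketch of what Equation (1) is likely to resemble, not as a transcription of it. Two symbols are assumptions introduced here because they are not defined in the text: Wtot, the total radiation received by the camera, and σ, the Stefan–Boltzmann constant.

```latex
% Assumed model: W_tot = eps*tau*sigma*T_obj^4 + (1-eps)*tau*sigma*T_ref^4 + (1-tau)*sigma*T_atm^4,
% solved for T_obj. W_tot and sigma are symbols introduced for this sketch.
T_{obj} = \sqrt[4]{\frac{W_{tot} - (1-\varepsilon_{obj})\,\tau_{atm}\,\sigma\,T_{ref}^{4} - (1-\tau_{atm})\,\sigma\,T_{atm}^{4}}{\varepsilon_{obj}\,\tau_{atm}\,\sigma}}
```

In this form, as τatm approaches one the term in Tatm vanishes, which is consistent with the negligible atmospheric contribution described above, while εobj and Tref remain in both the numerator and the denominator.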
2.1.1. Emissivity Measurement
Emissivity is the efficiency with which an object radiates heat. Provided that the object and an ideal blackbody are at the same temperature, emissivity can be defined as the ratio of the infrared energy emitted by the object to that emitted by the blackbody, expressed as a percentage or a decimal. In this experimental study, the emissivity of the oil palm tree’s surface was estimated using an emissivity coating method [55]. If part of the surface under study can be coated with a black paint of known emissivity, the emissivity of the surface can be obtained by changing the emissivity value set on the device until the temperatures measured on the coated and uncoated surfaces are the same [56]. Several authors used a similar approach with black electrical tape instead of black paint [57,58,59,60]. The emissivity setting is adjusted until the uncoated surface reads the same temperature as the tape, and the final setting is taken as the emissivity of the object. In this study, the oil palm tree’s temperature and the tape’s temperature recorded by the thermal camera were the same at an emissivity of 0.98.
2.1.2. Reflected Apparent Temperature (RAT)
The reflected apparent temperature must be set correctly for accurate measurement. This parameter compensates for the radiation from the surroundings that the object reflects into the camera. Setting it accurately is even more crucial when the emissivity is low and the object temperature differs significantly from the reflected apparent temperature. A crumpled and re-flattened sheet of aluminum foil is a frequently used reflector [55]. The reflector is positioned within the infrared camera’s field of view, and its temperature is determined using an emissivity of one and a distance of zero. Finally, the measurement is repeated with the reflector’s temperature entered as the reflected temperature.
2.1.3. Atmospheric Temperature and Humidity
The camera can also take the effect of atmospheric temperature into account, and the atmospheric transmittance it applies depends on the relative humidity. The temperature and humidity of the atmosphere were recorded every half hour using a TFA Dostmann Digital Thermo-Hygrometer (30.5002) (TFA-Dostmann.de., Wertheim-Reicholzheim, Germany).
2.1.4. The Distance between the Object and the Camera
Distance is a parameter that indicates the distance between an object and the front lens of the camera. In this research, the distance was fixed at 1 m. The camera was focused on the trunk of the oil palm tree at a height of 1 m above the ground, where the G. boninense fruiting bodies appear on the basal stem.
2.2. Data Pre-Processing
The temperature variation for each of the thermal images was analyzed using the camera manufacturer’s software, FLIR ResearchIR Max (FLIR Systems, Inc., Wilsonville, OR, USA). The two primary image processing steps involved in this study and the processing workflow are depicted in Figure 2.
2.2.1. Image Enhancement
Image enhancement seeks to improve the perceived utility of images for human viewers or to support other computer-based image processing techniques. Image contrast can be changed through two kinds of input: setting the scale limits, and applying automatic gain control (AGC) algorithms, which can improve image detail and contrast. This study used the “scale limits from image” option together with plateau equalization (PE). The “scale limits from image” option examines the entire image to determine the minimum and maximum values for the scale, while PE provides excellent contrast in almost all scenes. Users can control the aggressiveness of the algorithm, and hence the intensity of the enhancement, using a PE slider. In short, image enhancement improves the quality of the image and makes its content easier to view. The differences between an image before and after enhancement can be seen in Figure 2.
2.2.2. Identifying the Region of Interest (ROI)
The oil palm trunk area first needs to be separated from its background. The process starts with recognizing the regions in the image that are likely to contain foreground objects. Defining the region of interest (ROI) is the first and primary step in thermographic analysis. The software offers various shapes for defining these regions, including a box, ellipse, line, bendable line, polygon, freehand, spot cursor, and measurement cursor. In this study, the ROI was represented by a polygon, which was selected because of the irregular shape of the oil palm trunk. The temperatures within the ROI were then used for analysis, as shown in Figure 2.
2.3. Feature Extraction—Thermal Image
Feature extraction is a technique for reducing the dimension of an image by efficiently representing its salient regions as a compact feature vector. It was carried out on the processed thermal images. FLIR Tools in the FLIR ResearchIR Max software environment was used to extract features from each thermal image. The following features were retrieved from the ROI of the thermal images that represents the oil palm tree, denoted by A, and are defined as below:
Maximum temperature of oil palm trunk, (Tmax) = max (A)
Minimum temperature of oil palm trunk, (Tmin) = min (A)
Center temperature of oil palm trunk, (Tcenter) = center (A)
Mean temperature of oil palm trunk, (Tmean) = $\frac{1}{N}\sum_{i=1}^{N} A_i$
Standard deviation temperature of oil palm trunk, (Tsd) = std (A), the standard deviation of the values $A_i$ about Tmean
where N is the total number of pixels and $A_i$ is the pixel value at i.
Every feature was extracted from the three images taken at different angles and the values averaged. These averaged features were then used to analyze the characteristics of non-infected and infected trees using a statistical analysis of variance (ANOVA) using JMP Pro 16 (SAS Institute Inc., Cary, NC, USA).
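The feature definitions and the averaging over three angles can be sketched in a few lines of Python. This is a minimal illustration, not the FLIR software's implementation: the ROI is assumed to be a grid of temperatures with `None` marking cells outside the polygon, "center" is read as the middle cell of that grid, and the standard deviation divides by N. All three choices are assumptions of this sketch, and the function names are hypothetical.

```python
import math
from statistics import mean


def trunk_features(roi):
    """Summarise one ROI given as a 2-D list of temperatures in degrees C.

    Cells outside the polygon are None (an assumption of this sketch).
    """
    pixels = [t for row in roi for t in row if t is not None]
    n = len(pixels)
    t_mean = sum(pixels) / n
    # Population form (divide by N). Whether the software divides by N or
    # N - 1 is not stated in the text, so this is an assumption.
    t_sd = math.sqrt(sum((t - t_mean) ** 2 for t in pixels) / n)
    # "center" is taken as the middle cell of the grid (assumption); it is
    # expected to lie inside the polygon.
    t_center = roi[len(roi) // 2][len(roi[0]) // 2]
    return {"Tmax": max(pixels), "Tmin": min(pixels), "Tcenter": t_center,
            "Tmean": t_mean, "Tsd": t_sd}


def average_over_angles(rois):
    """Average each feature over the images taken at different angles."""
    per_image = [trunk_features(r) for r in rois]
    return {name: mean(f[name] for f in per_image) for name in per_image[0]}


# Hypothetical 3 x 3 ROI, reused for the three angles for brevity.
example_roi = [[26.0, 27.0, 26.0],
               [27.0, 28.0, 27.0],
               [26.0, 27.0, 26.0]]
features = average_over_angles([example_roi, example_roi, example_roi])
```

The resulting dictionary holds one averaged value per feature for a tree, which is the form in which the features enter the statistical analysis and the classifiers.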
2.4. Statistical Analysis
Since a series of comparisons was carried out in the present study, an analysis of variance (ANOVA) test was used to determine whether the means of the dependent variables differ between the groups involved. The ANOVA tests were conducted in the following order:
(i) to assess whether the temperature characteristics of the non-infected and infected trees differed significantly during the morning and afternoon sessions;
(ii) to evaluate the relationship between tree status (non-infected or infected) and the temperature features captured by the thermal camera.
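The comparison behind a one-way ANOVA can be illustrated with a short sketch that computes the F statistic from between-group and within-group variability. The study itself used JMP Pro, so the function below is only a hypothetical stand-in for the underlying arithmetic, and the temperatures in the example are invented for illustration.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variability of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variability of the observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Invented Tmax values (degrees C) for two small groups of trees.
non_infected = [27.0, 28.0, 29.0]
infected = [29.0, 30.0, 31.0]
f_stat, df_b, df_w = one_way_anova_f(non_infected, infected)
```

A large F relative to the F distribution with (df_between, df_within) degrees of freedom indicates that the group means differ. Converting F into a p-value requires the F distribution and is left to a statistics package.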
2.5. Machine Learning Approach
A machine learning approach can be used to classify both large and small numbers of samples [22,61]. Classification is performed using the features retrieved from the thermal images as input. The extracted features act as the predictors, while the oil palm status serves as the response. To differentiate between non-infected and BSR-infected trees, the classification was performed using the Waikato Environment for Knowledge Analysis (WEKA) version 3.8.5. Because of the small sample size, we used cross-validation to evaluate the model’s performance. WEKA’s k-fold cross-validation function divided the data into training and testing sets and carried out an independent assessment of the model’s accuracy. As a result, the assessment of the model was not restricted to a single partition of the data. We used 10-fold cross-validation: the original data were randomly partitioned into ten subsamples, and in each of the ten iterations one subsample was held out for testing while the remaining nine were used for training. Three ML techniques were employed, as follows:
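The partitioning step of k-fold cross-validation can be sketched as follows. This is a minimal illustration rather than WEKA's own routine: the function name is hypothetical, and keeping the class proportions similar in every fold (stratification) is a design choice of this sketch.

```python
import random


def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of each class round-robin into the folds,
        # continuing where the previous class stopped to keep sizes even.
        for j, i in enumerate(indices):
            folds[(offset + j) % k].append(i)
        offset += len(indices)
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train, test


# Class sizes as reported for this dataset: 55 non-infected, 37 BSR-infected.
labels = ["non-infected"] * 55 + ["infected"] * 37
splits = list(stratified_k_fold(labels, k=10, seed=1))
```

Each of the ten splits trains on roughly nine tenths of the 92 trees and tests on the remainder, so every tree contributes to the performance estimate exactly once.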
(i) Naïve Bayes (NB). The NB algorithm is a probabilistic generative model based on the assumption of conditional independence of the predictor features: given the class, the value of one feature is assumed to be unrelated to the value of any other feature [62]. This assumption allows the class-conditional probabilities to be estimated directly from the training data, one feature at a time, rather than by assessing every combination of feature values [63].
(ii) Multilayer perceptron (MLP). MLP is a feed-forward artificial neural network model that maps input data onto a set of appropriate outputs. An MLP consists of multiple layers of connected nodes [64]. Except for the input nodes, every node is a neuron (or processing element) with a nonlinear activation function. To train the network, MLP uses a supervised learning technique called backpropagation [65]. MLP is a modification of the standard linear perceptron and can distinguish data that are not linearly separable.
(iii) Random forest (RF). RF is a classification algorithm that consists of a collection of randomized decision trees. Each tree is trained on a separate bootstrap sample of the original dataset, and each node is split using a random subset of the features [66]. An instance is assigned to a class by the majority vote of the ensemble of trees [67]. Additionally, RF has high predictive accuracy, is resilient to noise, and is effective with imbalanced datasets [68,69].
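As an illustration of the first of these classifiers, the sketch below fits a Gaussian naïve Bayes model, in which each feature is modelled per class by a normal distribution and the per-feature log-likelihoods are simply added, which is where the conditional independence assumption enters. The Gaussian form, the function names, and the temperatures are assumptions made for the example; this is not WEKA's implementation.

```python
import math


def fit_gaussian_nb(X, y):
    """Estimate a prior plus a per-feature mean and variance for each class."""
    model = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        stats = []
        for column in zip(*rows):
            mu = sum(column) / len(column)
            # A small constant keeps the variance strictly positive.
            var = sum((v - mu) ** 2 for v in column) / len(column) + 1e-9
            stats.append((mu, var))
        model[label] = (len(rows) / len(y), stats)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log posterior for feature vector x."""
    def log_posterior(label):
        prior, stats = model[label]
        total = math.log(prior)
        # Independence assumption: per-feature log-likelihoods are summed.
        for v, (mu, var) in zip(x, stats):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mu) ** 2 / (2 * var)
        return total
    return max(model, key=log_posterior)


# Invented single-feature training data (one temperature per tree).
X = [[30.1], [30.4], [29.8], [32.0], [32.5], [31.8]]
y = ["non-infected"] * 3 + ["infected"] * 3
nb_model = fit_gaussian_nb(X, y)
```

Calling `predict_gaussian_nb(nb_model, [30.0])` then returns the class whose fitted distribution best explains the new value, weighted by how common that class was in training.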
2.6. Imbalance Data Approach
Using ML, classifiers are developed with the goal of minimizing classification errors and increasing predictive accuracy. These classification methods make the basic assumption that the dataset under study contains a well-balanced number of examples for each class, so the prior probabilities of the target classes are considered identical [70]. Classification algorithms have traditionally been motivated by improving the predictive accuracy of the generated classifiers.
Nonetheless, maximizing overall accuracy may not be the optimal strategy for an imbalanced dataset. To maximize overall accuracy, a classifier concentrates on the majority class, which carries the most weight in the data. As a result, the classifier can achieve high accuracy on the majority class while performing poorly on the minority class, because the minority class contributes little to the overall accuracy. Our concern is with the minority class. Because the number of BSR-infected samples was small relative to the number of non-infected samples, data imbalance was a challenge in this study. In machine learning, class imbalance can be addressed either by altering the learning process of the underlying algorithm or by modifying the dataset itself. This study took the latter, data-level approach, in which the numbers of instances in the two classes are algorithmically equalized to improve the imbalance ratio.
Data Sampling
The data-level approach is often referred to as the data sampling technique. It works by artificially balancing the class instances in the dataset. Resampling corrects for imbalance by changing the number of instances of each class, typically through undersampling, oversampling, or a combination of the two [71]. Resampling techniques are adaptable because they do not depend on the classifier chosen [72].
(i) Undersampling is one of the simplest strategies for handling imbalanced data. The basic undersampling method arbitrarily eliminates majority class examples in order to balance the dataset [73]. It is most commonly implemented as random undersampling (RUS), in which majority class instances are discarded at random until a more balanced distribution is attained [74]. Consider, for example, a dataset consisting of 100 majority class instances and 10 minority class instances. In RUS, one might create a balanced class distribution by selecting 90 majority class instances at random to be removed. The resulting dataset will then have 20 instances: the 10 original minority class instances and the 10 remaining majority class instances.
(ii) Oversampling is another common sampling method employed in dealing with an imbalanced class problem. Numerous oversampling methods are available including random oversampling (ROS), focused oversampling, and synthetic sampling [
75,
76]. In ROS, minority class instances are copied and also repeated in the dataset until a more balanced distribution is attained. Therefore, if there are 100 majority class instances and two minority class instances, traditional oversampling would copy the two minority class instances 49 times. The resulting dataset will have 200 instances: the 100 majority class instances and 100 minority class instances (i.e., 50 of each of the two minority class instances). In focused oversampling, only those minority class values having samples that occur on the boundary between the majority and minority class values are resampled.
(iii) SMOTE is a method that generates synthetic samples to oversample the minority class [75]. SMOTE selects a minority class instance, A, at random and then locates its k nearest minority class neighbors. One of these neighbors, B, is selected at random, and A and B are linked in attribute space to form a line segment. A synthetic sample is then created as a convex combination of A and B, that is, as a point on this line segment [77]. In this way, new minority class instances are synthesized.
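The three resampling strategies can be sketched side by side. The functions below are hypothetical stand-ins for the WEKA filters used in the study: instances are assumed to be tuples of numeric features, and the ROS variant draws minority instances with replacement rather than copying each one an equal number of times.

```python
import math
import random


def random_undersample(majority, minority, rng):
    """RUS: keep a random subset of the majority class, same size as the minority."""
    return rng.sample(majority, len(minority)) + list(minority)


def random_oversample(majority, minority, rng):
    """ROS: add randomly drawn copies of minority instances until balanced."""
    shortfall = len(majority) - len(minority)
    copies = [rng.choice(minority) for _ in range(shortfall)]
    return list(majority) + list(minority) + copies


def smote(minority, n_new, k, rng):
    """SMOTE: create n_new synthetic points between minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        a = rng.choice(minority)
        # k nearest minority neighbours of A (Euclidean distance).
        neighbours = sorted((p for p in minority if p is not a),
                            key=lambda p: math.dist(a, p))[:k]
        b = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from A to B
        synthetic.append(tuple(ai + gap * (bi - ai) for ai, bi in zip(a, b)))
    return synthetic


# Invented two-feature instances matching the sizes in the examples above.
rng = random.Random(0)
majority = [(float(i), 0.0) for i in range(100)]
minority = [(float(i), 1.0) for i in range(10)]
balanced_rus = random_undersample(majority, minority, rng)
balanced_ros = random_oversample(majority, minority[:2], rng)
synthetic_points = smote(minority, 5, 3, rng)
```

With 100 majority and 10 minority instances, RUS yields the 20-instance dataset described above, and ROS applied to 100 majority and two minority instances yields 200. Unlike ROS, SMOTE adds points that did not exist in the original data, each lying on a segment between two real minority instances.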
This study used three resampling approaches: RUS, ROS, and SMOTE. Resampling was performed using the open-source ML program WEKA. Prior to classification, the resampling parameters were determined; they are summarized in Table 1.
We divided the dataset into 70% for training and 30% for testing. Table 2 illustrates the imbalanced data techniques used to pre-process the dataset for non-infected and BSR-infected trees.
2.7. Accuracy Assessment
Because of the data imbalance problem, overall accuracy is a biased assessment metric, since it mainly represents the accuracy of the majority class [54]. As an alternative, this analysis therefore reports the confusion matrix in terms of the success rates for the non-infected and BSR-infected trees, along with the area under the receiver operating characteristic (ROC) curve (AUC) and the precision–recall curve (PRC). These metrics were used to evaluate the different classifiers and imbalanced data approaches and to measure their performance. The ROC curve is a probability curve, whereas the AUC represents the degree or measure of separability. In the ROC curve, the true positive rate (TPR), on the y-axis, is plotted against the false positive rate (FPR), on the x-axis [78]. The AUC is a good metric for classifier performance because it does not depend on the decision threshold, and the score is always confined between 0 and 1 [79]. No viable classifier has an AUC value less than 0.5 [80]. The higher the AUC value, the more capable the model is of discriminating between the positive and negative classes. The PRC is an alternative to the ROC curve. It plots precision (y-axis) against recall (x-axis) for a single classifier at various thresholds [78]. In general, the greater the area under the PRC, the better a classifier performs. In contrast to the ROC curve, the PRC does not take into account the number of true negative outcomes [81].
This study considered an AUC or PRC value of 0.50 as fail, 0.51–0.69 as poor, 0.70–0.79 as acceptable, 0.80–0.89 as excellent, and 0.90–1.00 as outstanding [82]. Meanwhile, the success rate of classifying the non-infected and BSR-infected trees was rated as poor if it was less than 40.00 percent, moderate if it was 40.00–80.00 percent, and robust if it was greater than 80.00 percent [83].
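The assessment measures can be sketched as follows. The AUC is computed here through its rank interpretation (the proportion of infected/non-infected pairs in which the infected tree receives the higher score), which equals the area under the ROC curve. The function names are hypothetical, the scores are invented, and the handling of values that fall between the published bands or below 0.50 is an assumption of this sketch.

```python
def success_rates(y_true, y_pred, positive="infected"):
    """Per-class success rate (%) read off the confusion matrix."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    return {"infected": 100.0 * tp / (tp + fn),
            "non-infected": 100.0 * tn / (tn + fp)}


def roc_auc(y_true, scores, positive="infected"):
    """AUC as the share of positive/negative pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    # A tie between a positive and a negative score counts as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def grade(value):
    """Map an AUC or PRC value onto the bands used in this study."""
    if value >= 0.90:
        return "outstanding"
    if value >= 0.80:
        return "excellent"
    if value >= 0.70:
        return "acceptable"
    if value > 0.50:
        return "poor"
    return "fail"  # 0.50 or below (treating values below 0.50 as fail is assumed)


# Invented scores: estimated probability that each tree is infected.
y_true = ["infected"] * 3 + ["non-infected"] * 4
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
y_pred = ["infected" if s >= 0.5 else "non-infected" for s in scores]
rates = success_rates(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

In this toy example the per-class success rates depend on the 0.5 threshold used to form `y_pred`, whereas the AUC is computed from the scores alone, which is why it is independent of the decision threshold.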
5. Conclusions
In this study, the feature properties of oil palm trees were extracted from thermal data and divided into two status levels: non-infected and infected by G. boninense. Eight (8) temperature features were used, namely, Tmean, Tsd, Tcenter, Tmax, Tmin, a combination of Tmean, Tsd, Tmax, Tmin, and a combination of Tmean, Tsd, Tcenter, Tmax, Tmin. Single, RUS, ROS, and SMOTE approaches were used with three classifier models: NB, MLP, and RF.
In a comparison of performance, the AUC and PRC results for all temperature features (except Tsd) increased for all three classifiers when the ROS approach was used instead of the single approach. In terms of model performance under the ROS approach, across the temperature features the RF model had an AUC ranging from 0.754 to 0.921, compared with 0.649 to 0.827 for the NB model and 0.628 to 0.810 for the MLP model. For the PRC, the RF, NB, and MLP models gave ranges of 0.739–0.902, 0.657–0.782, and 0.623–0.772, respectively.
Regarding the temperature features, specific features such as Tmax stand out more than others. Using the most significant variable, Tmax, the RF model had an AUC of 0.921 compared to 0.797 for the MLP model and 0.762 for the NB model. For PRC, a comparison of the RF model, the MLP model, and the NB model yielded values of 0.902, 0.764, and 0.736.
In conclusion, by using only the Tmax feature, RF can predict BSR disease with outstanding performance, compared with MLP and NB, whose performance was acceptable. The significant benefit derived from this study is the demonstrated potential of using thermal data and the imbalanced data approach to classify oil palm trees infected by G. boninense using ML techniques. In future studies, samples of different severity levels will be used to analyze temperature features and identify oil palm trees that have been infected with BSR disease.