Decision Tree Algorithms for Developing Rulesets for Object-Based Land Cover Classification

Abstract: Decision tree (DT) algorithms are important non-parametric tools used for land cover classification. While different DTs have been applied to Landsat land cover classification, their individual classification accuracies and performance have not been compared, especially on their effectiveness to produce accurate thresholds for developing rulesets for object-based land cover classification. Here, the focus was on comparing the performance of five DT algorithms: Tree, C5.0, Rpart, Ipred, and Party. These DT algorithms were used to classify ten land cover classes using Landsat 8 images on the Copperbelt Province of Zambia. Classification was done using object-based image analysis (OBIA) through the development of rulesets with thresholds defined by the DTs. The performance of the DT algorithms was assessed based on: (1) DT accuracy through cross-validation; (2) land cover classification accuracy of thematic maps; and (3) other structure properties such as the sizes of the tree diagrams and variable selection abilities. The results indicate that only the rulesets developed from DT algorithms with simple structures and a minimum number of variables produced high land cover classification accuracies (overall accuracy > 88%). Thus, algorithms such as Tree and Rpart produced higher classification results as compared to C5.0 and Party DT algorithms, which involve many variables in classification. This high accuracy has been attributed to the ability to minimize overfitting and the capacity to handle noise in the data during training by the Tree and Rpart DTs. The study produced new insights on the formal selection of DT algorithms for OBIA ruleset development. Therefore, the Tree and Rpart algorithms could be used for developing rulesets because they produce high land cover classification accuracies and have simple structures. As an avenue of future studies, the performance of DT algorithms can be compared with contemporary machine-learning classifiers (e.g., Random Forest and Support Vector Machine).


Introduction
Object-based image analysis (OBIA) has become an effective method of land cover classification of remotely sensed data [1,2]. Unlike traditional pixel-based analysis, OBIA offers an opportunity to develop discrete objects which relate to real-world objects through image segmentation [3,4].
The segmentation process reduces within-class spectral variations and offers an opportunity to increase classification accuracy, especially when conducted at an appropriate scale [5,6].
The ability to incorporate texture, compactness, and other object-related information with spectral information has differentiated OBIA from other methods of classification such as pixel and sub-pixel approaches. Compared to pixel-based image analysis, OBIA is effective in reducing salt-and-pepper effects on thematic maps [7,8]. There are currently a number of segmentation algorithms available in eCognition Developer; however, multiresolution segmentation is the most common method used in land cover classification [1,4,9].
Apart from segmentation, another important component of OBIA is the actual classification of segmented objects [1,2]. Myint et al. [10] explained that there are two ways of assigning classes to segmented objects: (1) employing expert knowledge through rulesets; and (2) using automated classifiers. Under expert knowledge, classification is done by developing rulesets which are based on the thresholds of different object-related information. Under the automated classification approach, objects are classified based on contemporary machine-learning classifiers such as Nearest Neighbor (NN), Random Forest (RF), Support Vector Machine (SVM), and classification and regression tree (CART), which have been incorporated into eCognition Developer 9.1 [2,11]. However, developing rulesets using thresholds of different object-related information remains a common practice in OBIA land cover classification [1,12].
There are many ways of establishing thresholds during ruleset development such as using expert knowledge, trial-and-error, and using binary recursive decision trees (DTs) [10]. Although not a common practice, the implementation of DTs seems more formal in establishing thresholds and the eventual development of effective rulesets [2,3,13]. Here, it is important to note that the statistical packages that have been used for developing rulesets are referred to as DT algorithms [14]. These algorithms are generally referred to as "black box" or "white box" depending on how easily an interpreter can follow the process. Black box algorithms, such as RF, have been used extensively in land cover classification, especially with the advancements in machine learning techniques [15]. On the other hand, simple machine learning DT algorithms such as Rpart, C5.0, and Tree have also been used for land cover classification and establishment of thresholds when developing rulesets [13].
Simple DTs are useful tools for establishing thresholds for developing decision rules for land cover classification of remote sensing data because they are non-parametric and are easy to interpret [10,16]. During classification, many variables are generated based on spectral and textural object-related information which can be used to develop effective rulesets if appropriate techniques such as DTs are applied. Belgiu et al. [17] suggested that DTs can be helpful for selecting the most influential variable and identifying the thresholds for different land cover classes because they are non-parametric and hence ideal for most landscapes.
While decision tree algorithms have been used in different areas associated with land cover classification, these algorithms have not been individually assessed on their effectiveness in establishing thresholds for developing rulesets for OBIA land cover classification. The main aim of this study was to conduct a multiple criteria evaluation of five different machine learning DT algorithms based on their performance when classifying Landsat 8 images. The performance comparison focused on the effectiveness of these five algorithms in handling different sizes of data and on how simple each algorithm is to interpret.

Study Site
The study was conducted in the Copperbelt Province of Zambia (Figure 1), which is located in the northern part of the country (latitude: 12.82° S, longitude: 28.21° E). The area receives between 1000 and 1200 mm of rainfall per annum and experiences temperatures ranging from 7 to 35 °C [18,19]. Mining and agriculture are the major economic activities in this area. The area is highly urbanized and has a population density of 62.5 persons per square kilometer [18,20]. As a result of the non-productivity of the mines, most people practice small-scale shifting cultivation, causing rapid land cover change, especially the conversion of forest areas to agriculture and settlements. The Copperbelt Province also has the largest proportion of forest plantations in Zambia, which are owned by a parastatal company called Zambia Forest and Forestry Company (ZAFFICO) [18].
ISPRS Int. J. Geo-Inf. 2020, 9, x FOR PEER REVIEW

Datasets
Landsat 8 Operational Land Imager (OLI) images [21], acquired from the United States Geological Survey (USGS) website (http://glovis.usgs.gov), were used in this study. The images were taken in 2016, and the September images were selected because, during this period, the study area experiences a dry season (i.e., no rains) and hence has less cloud cover. Landsat 8 has a spatial resolution of 30 m, a spectral resolution of 11 bands, a temporal resolution of 16 days, and a radiometric resolution of 12 bits. In addition, Landsat 8 images have a panchromatic band with a spatial resolution of 15 m [22,23]. In this study, six bands that range from visible to infrared were used. Apart from the Landsat images, the Shuttle Radar Topography Mission (SRTM) Digital Elevation Model (DEM), with a spatial resolution of 30 m, was used for pre-processing and as auxiliary data for classification.
In total, 2600 points were randomly overlaid on the Landsat images using the ArcGIS 10.4 software package [24]. Visual interpretation and prior expert knowledge of the different land cover classes were used to assign each of the 2600 points to one of the 10 land cover types identified on the ground (Table 1 and Table S1). These data were separated into DT training (1000 sample points), land cover classification (1000 sample points), and accuracy assessment for both the DT algorithms and the land cover maps (600 sample points). The samples were distributed among the land cover classes in proportion to the percentage area covered by each class (see Table 1). Note that there were 1000 independent training samples for the DTs and 1000 for land cover classification, while there were 600 validation sample points for both the DT algorithms and the land cover maps, giving 2600 samples in total.
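The proportional distribution of samples described above can be illustrated with a short sketch. It is not the procedure used in ArcGIS; it only shows one way (the largest-remainder method, an assumption here) of allocating a fixed number of points to classes by area share so that the counts sum exactly to the target. The class names and area percentages are hypothetical placeholders, since the actual shares are given in Table 1.

```python
def allocate_by_area(total, area_share):
    """Split `total` sample points across classes in proportion to area share.

    Uses the largest-remainder method so the counts sum exactly to `total`.
    """
    weight_sum = sum(area_share.values())
    exact = {c: total * s / weight_sum for c, s in area_share.items()}
    counts = {c: int(v) for c, v in exact.items()}  # round down first
    leftover = total - sum(counts.values())
    # Give the remaining points to the classes with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda c: exact[c] - counts[c], reverse=True)
    for c in by_remainder[:leftover]:
        counts[c] += 1
    return counts


# Hypothetical area shares in percent (four classes only, for brevity).
area_share = {"forest": 45.0, "grassland": 30.0, "settlement": 15.0, "water": 10.0}

# The three sample sets described in the text: 1000 + 1000 + 600 = 2600 points.
set_sizes = [("dt_training", 1000), ("classification", 1000), ("validation", 600)]
splits = {name: allocate_by_area(n, area_share) for name, n in set_sizes}

for name, counts in splits.items():
    print(name, counts, "total =", sum(counts.values()))
```

Allocating each of the three sets separately keeps the class proportions the same in the training, classification, and validation samples.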

Pre-processing
Pre-processing included the correction of the images for atmospheric effects and topographic variation. This process converts digital numbers (DN) into ground reflectance values, which are more useful for image analysis. To ensure consistency during analysis, all images were projected to the Universal Transverse Mercator (UTM) projection system Zone 35S and World Geodetic System 84 (WGS 84) datum. Automated ATCOR 3, available in PCI Geomatics (PCI Geomatics, Ontario, Canada), was used for haze removal, atmospheric correction, and topographic correction by incorporating a 30 m digital elevation model.

Image Segmentation
Segmentation creates spectrally homogeneous objects which can be related to real objects on the ground [25,26]. Past research recognized the challenges in establishing the optimal segmentation parameters [2,5]. Thus, the segmentation parameters, which are scale (Sc), shape (Sh), and compactness (Cm), are commonly established by using trial-and-error methods [4,25,27]. Drǎguţ et al. [5] proposed a formal method of establishing optimal levels of scale factors using an Estimation of Scale Parameter (ESP) tool. For this study, the ESP tool indicated 12 for Sc, 0.2 for Sh, and 0.8 for Cm. With these parameters, the multiresolution segmentation algorithm in eCognition Developer 9.1 (Trimble Navigation Ltd., Sunnyvale, California) was used to segment the images into spectrally homogeneous objects.

Sample Selection and Feature Extraction
After segmentation, the next step was to select sample objects using the 1000 random training points and extract object-related information. Several object-related feature values were developed based on spectral indices (Table 2), DEM values, spectral values of each band, and the grey-level co-occurrence matrix (GLCM). This process was done in eCognition Developer 9.1 by using the "assign class by thematic layer" and "classified image object to sample" tools. The samples and extracted object-related information were then exported to a spreadsheet.
To assess the performance of the DT algorithms in land cover classification, the DTs' accuracies and the classification accuracy of the final thematic maps were considered. It is important to note that the accuracy derived from cross-validation of the DTs after training is referred to as DT accuracy, while the final accuracy of the thematic maps, which was derived from accuracy assessment, is referred to as land cover classification accuracy or thematic map accuracy.

Decision Tree Algorithms
DTs have been used in image-based classification because they are non-parametric and can be interpreted easily [13,14,46]. In OBIA, establishing decision rulesets is an important step towards land cover classification. However, this stage requires thresholds related to classes, which can be established by using knowledge-based methods or simple DTs. The knowledge-based approach can be complex, especially when many land covers and decision variables are involved. Here, the focus was on binary recursive DTs, which repeatedly split the data on predictor variables until no further splitting is possible.
The performance of five DT algorithms (Table 3), Rpart, Tree, Party, C5.0, and Ipred, was assessed in this study using a multiple criteria approach [47]. The assessment included three components: (1) assessing the accuracy of the DTs built from the training data; (2) assessing the accuracy of land cover classification; and (3) examining the simplicity of the structure of the DTs (e.g., tree diagram and number of variables). The DT algorithms were trained using the object-related information extracted for the samples. After training the DT algorithms, an independent dataset (600 points) was used to cross-validate the DTs which were produced during training. The comparison was made between the results predicted by the DTs and the independent data, which were used as reference data. Percentage accuracies were derived from this comparison for all five DT algorithms.
A comparison was also made of the accuracies of the DTs when different sample sizes were used to train the DT algorithms. The 1000 training points were used to form ten samples whose sizes were multiples of 100. Thus, the smallest sample size was 100, and the largest sample size was 1000 (Table 4). Each sample was established by randomly selecting the specified number of points. The independent samples were then used to assess the accuracy of the DTs over increasing sample sizes.
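To make the link between a binary recursive split and a ruleset threshold concrete, the sketch below searches one object feature for the cut-off that best separates the classes. It uses the Gini impurity, the default splitting criterion in CART-style trees; the other algorithms compared here use different criteria, so this is an illustration of the principle rather than the internals of any of the five packages. The feature name and values are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((k / n) ** 2 for k in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, weighted_gini) for the best split `value <= threshold`."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))  # no split unless impurity is reduced
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot split between identical values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2.0, score)  # midpoint between neighbours
    return best


# Hypothetical mean index values for six sample objects of two classes.
ndvi = [0.05, 0.10, 0.25, 0.50, 0.60, 0.70]
label = ["bare land", "bare land", "bare land", "forest", "forest", "forest"]

threshold, impurity = best_threshold(ndvi, label)
print("split at", threshold, "weighted Gini", impurity)
```

A full tree repeats this search on each resulting subset and over all candidate features; every selected threshold then becomes one condition in the ruleset.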
A Kruskal-Wallis non-parametric test was then used to assess whether the DT accuracies of the five algorithms differed significantly; this test was chosen because of the limited number of classification attempts and the non-normality of the data, as suggested by Li et al. [2]. After making the 10 classifications based on the sample size for each DT, the best result for each DT algorithm was used in developing rulesets for land cover classification. Rulesets were developed from output summaries and DT diagrams. The rulesets were then implemented in the eCognition Developer 9.1 process tree for land cover classification to produce thematic maps for the five DTs.
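For readers unfamiliar with the test, the following is a minimal sketch of the Kruskal-Wallis H statistic with the standard tie correction, compared against the chi-square critical value for four degrees of freedom (five groups) at the 0.05 level. The accuracy values are made-up numbers for illustration only and are not the results of this study; the chi-square comparison is an approximation that is weaker for small samples.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of samples."""
    pooled = sorted((x, g) for g, sample in enumerate(groups) for x in sample)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2.0  # mean of the ranks i+1 .. j
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(s) for r, s in zip(rank_sums, groups)
    ) - 3.0 * (n + 1)
    correction = 1.0 - tie_term / float(n ** 3 - n)
    if correction == 0.0:  # every observation identical: no evidence of difference
        return 0.0
    return h / correction


# Made-up DT accuracies (%) for four training sample sizes per algorithm.
accuracies = {
    "Tree": [70, 75, 80, 84],
    "C5.0": [72, 79, 85, 88],
    "Rpart": [63, 74, 81, 83],
    "Ipred": [68, 76, 82, 85],
    "Party": [66, 73, 78, 82],
}

h_stat = kruskal_wallis_h(list(accuracies.values()))
CHI2_CRITICAL_DF4 = 9.488  # chi-square, 4 degrees of freedom, alpha = 0.05
print("H =", round(h_stat, 3), "significant:", h_stat > CHI2_CRITICAL_DF4)
```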

Assessing Thematic Map Accuracy
Land cover classification is complete only after an accuracy assessment has been done on the thematic maps [1,48,49]. The independent sample of 600 random validation points was used to build confusion matrices by comparing the classified and reference land cover points. The confusion matrices were used to calculate the user's, producer's, and overall accuracies (Figure 2). The suitability of the five land cover classification outputs was compared using the user's and producer's accuracies. The general accuracy of the thematic maps was assessed using the overall accuracy.
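The accuracy measures named above follow directly from the confusion matrix, as the sketch below shows. It assumes the convention that rows hold the reference classes and columns hold the mapped classes, and it also computes the Kappa coefficient, which is reported alongside the overall accuracy later in the paper. The two-class matrix is a hypothetical example, not data from this study.

```python
def accuracy_metrics(matrix, classes):
    """Accuracy measures from a confusion matrix.

    matrix[i][j] is the number of validation points whose reference class is
    classes[i] and whose mapped (classified) class is classes[j].
    """
    k = len(classes)
    n = sum(sum(row) for row in matrix)
    diag = [matrix[i][i] for i in range(k)]
    ref_tot = [sum(matrix[i]) for i in range(k)]  # row totals
    map_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]  # column totals
    overall = sum(diag) / n
    producers = {c: diag[i] / ref_tot[i] for i, c in enumerate(classes)}
    users = {c: diag[i] / map_tot[i] for i, c in enumerate(classes)}
    # Agreement expected by chance, used by the Kappa coefficient.
    expected = sum(r * m for r, m in zip(ref_tot, map_tot)) / (n * n)
    kappa = (overall - expected) / (1.0 - expected)
    return overall, producers, users, kappa


# Hypothetical two-class example with 100 validation points.
classes = ["forest", "water"]
matrix = [
    [40, 10],  # reference forest: 40 mapped as forest, 10 as water
    [5, 45],   # reference water: 5 mapped as forest, 45 as water
]

overall, producers, users, kappa = accuracy_metrics(matrix, classes)
print("overall:", overall, "kappa:", round(kappa, 3))
print("producer's:", producers)
print("user's:", users)
```

Producer's accuracy divides the correctly mapped points by the reference total of the class (omission errors), whereas user's accuracy divides them by the mapped total (commission errors).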

DT Accuracy
The DT algorithms were tested by comparing predicted results against an independent sample through cross-validation. The DT accuracies from this assessment showed that C5.0 had the highest mean DT accuracy (83%), while Party had the lowest (77%) (Figure 3). However, these differences were not statistically significant when tested using a Kruskal-Wallis non-parametric test (p-value > 0.05) [50]. The results of the performance of the DT algorithms with different sample sizes show that the individual accuracy increases with the increase in sample size. Figure 4 shows that C5.0 had the highest DT accuracy of 88% when the largest sample size of 1000 was used, while Rpart had the lowest accuracy of 63% when the smallest sample size was used.

Number of Variables and Classification Accuracy
The efficiency of the DT algorithms was also assessed by comparing the variables used and the final thematic map accuracies. In this study, 197 initial variables were considered for training the DTs. Rpart had the fewest variables (8), while C5.0 used 35 variables for building a single DT (Figure 5). Tree and Ipred also used few variables (fewer than 15) and retained high land cover classification accuracies of over 85%. The ability of the DT algorithms to select a minimum number of variables and retain high classification accuracies was important for DT algorithm selection because this simplifies the structure of the DT.

Thematic Map Classification Accuracy
The overall accuracy and Kappa coefficient, calculated from the confusion matrices, were used to compare the OBIA classification accuracy resulting from the five DT algorithms. The producer's (PA) and user's (UA) accuracies were considered to establish the classification accuracy of each land cover class. The thematic maps showed variation in the results which were produced. The C5.0 and Party algorithms had small land cover components, especially on bare land, while the thematic maps for Rpart, Tree, and Ipred showed continuous sections for bare land (Figure 6).
The results show that the thematic map produced using the Tree algorithm had the highest overall accuracy of 89%, while the Party algorithm thematic map had the lowest overall accuracy of 73% (Table 5 and Figure 6). Rpart, Ipred, and C5.0 had overall accuracies of 88%, 85%, and 74%, respectively. The classifications by the DT algorithms with relatively low overall accuracies, C5.0 and Party, had very low PAs and UAs (43-59%) for classes such as bare land and wetlands. Bare land had the lowest PA and UA (74% and 62%, respectively) for the Rpart algorithm, and primary forests had the lowest PA of 66%, while the UA was 96%. Primary and secondary forests had some of the lowest user's and producer's accuracies (39% and 37%, respectively), indicating the challenges in separating the two land covers.
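Table 5. Producer's (PA) and user's (UA) accuracies (%) per land cover class, given as PA/UA pairs for the five DT algorithms.
Land cover class     PA   UA   PA   UA   PA   UA   PA   UA   PA   UA
Bare land            93   83   74   62   74   81   43   47   59   58
Dry Agriculture      97   99   97   97   94   95   93   70   91   72
Grassland            92   86   98   98   93   93   78   82   65   95
Irrigated Crops     100   81  100  100   97  100  100   95  100  100
Plantation Forest   100   93   69   94   91  100   82   90   81   54
Primary Forests      66   96   89   89   88   82   80   39   90   80
Secondary Forests    85   82   85   87   87   82   64   84   82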

Other DT Characteristics
The structure of each algorithm was also considered in terms of graphic output, useful summaries, and the ability to produce rulesets as part of the output. Rpart, Tree, C5.0, and Party have tree diagrams as part of their output. Unlike the Rpart and Tree algorithms, C5.0 and Party produced large tree diagrams which are difficult to interpret. This is because of their ability to include many variables when building DTs. On the other hand, Ipred aims at improving the classification results by developing many DTs and improving their accuracies. It is difficult to produce a single tree diagram because the Ipred algorithm focuses on many DTs at once [51]. All DT algorithms produce summaries which are useful for developing rulesets; however, C5.0 and Rpart produce comprehensive rulesets for each terminal node.
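The idea of one rule per terminal node can be sketched as follows: each path from the root of a binary tree to a leaf is a conjunction of threshold conditions that can be entered as a ruleset condition. The nested-dictionary tree, the feature names, and the thresholds are hypothetical and do not reproduce the output format of any of the five packages.

```python
def extract_rules(node, conditions=()):
    """Yield (conditions, class_label) for every terminal node of a binary tree."""
    if "label" in node:  # terminal node reached
        yield list(conditions), node["label"]
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from extract_rules(node["left"], conditions + (f"{feature} <= {threshold}",))
    yield from extract_rules(node["right"], conditions + (f"{feature} > {threshold}",))


# Hypothetical two-level tree with three terminal nodes.
tree = {
    "feature": "NDVI",
    "threshold": 0.3,
    "left": {
        "feature": "Mean_NIR",
        "threshold": 0.12,
        "left": {"label": "water"},
        "right": {"label": "bare land"},
    },
    "right": {"label": "forest"},
}

rules = [" AND ".join(conds) + " -> " + label for conds, label in extract_rules(tree)]
for rule in rules:
    print(rule)
```

The number of rules equals the number of terminal nodes, which is why trees with many variables and nodes lead to rulesets that are laborious to implement.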

DT Accuracy
The comparison of the five DT algorithms showed that the DT accuracy was not significantly different among these algorithms. However, C5.0 showed a high mean DT accuracy across different sample sizes. These findings are in line with the findings of Powers et al. [13], who achieved over 88% DT accuracy using the C5.0 algorithm. This can be attributed to C5.0's ability to integrate more decision variables in developing a DT through boosting and bagging [52,53]. Boosting improves classification by building trees sequentially, with each new tree concentrating on the errors of the previous ones, while, in bagging, several trees are fitted to resampled versions of the training data and the final result is established by voting [46,47,53]. Apart from C5.0, DT algorithms such as Ipred and random forest also use bagging to improve classification accuracy [51,54,55].
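The bagging principle can be sketched in a few lines: draw bootstrap samples from the training data, fit a small tree to each, and classify by majority vote. This is a generic illustration using one-split trees on a single hypothetical feature; it is not the implementation used by Ipred, C5.0, or random forest, and the data values are invented.

```python
import random
from collections import Counter


def fit_stump(sample):
    """Fit a one-split tree to (value, label) pairs and return a predict function."""
    sample = sorted(sample)
    labels = [lab for _, lab in sample]
    majority = Counter(labels).most_common(1)[0][0]
    best = None  # (errors, threshold, left_label, right_label)
    for i in range(1, len(sample)):
        if sample[i - 1][0] == sample[i][0]:
            continue
        left = Counter(labels[:i]).most_common(1)[0]
        right = Counter(labels[i:]).most_common(1)[0]
        errors = len(sample) - left[1] - right[1]
        if best is None or errors < best[0]:
            midpoint = (sample[i - 1][0] + sample[i][0]) / 2.0
            best = (errors, midpoint, left[0], right[0])
    if best is None:  # all values identical: predict the majority class
        return lambda x: majority
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def bagged_predict(data, x, n_trees=25, seed=0):
    """Classify x by majority vote over trees fitted to bootstrap samples."""
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_trees):
        boot = [rng.choice(data) for _ in data]  # sample with replacement
        votes[fit_stump(boot)(x)] += 1
    return votes.most_common(1)[0][0]


# Invented, well-separated feature values for two classes.
data = [(v, "bare land") for v in (0.10, 0.12, 0.15, 0.18, 0.20)]
data += [(v, "forest") for v in (0.60, 0.62, 0.65, 0.68, 0.70)]

print(bagged_predict(data, 0.05), bagged_predict(data, 0.90))
```

Because the prediction comes from many trees at once, there is no single tree diagram from which to read thresholds, which is the difficulty noted for Ipred when rulesets have to be extracted.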
In evaluating the performance of DT algorithms, it is important to employ a multiple criteria approach. DeFries et al. [47] used a multiple criteria approach in evaluating DT performances by considering their accuracy, ability to handle noise in the data, computation time, and structure of the algorithms. While the DT accuracies of the five algorithms were not significantly different, C5.0 had a relatively higher mean DT accuracy than the other four DTs. However, the C5.0 algorithm is very sensitive to noise in the data and has a larger structure [55]. In the present study, 35 variables were used and 105 nodes were developed for a single C5.0 DT as compared to other algorithms such as Rpart which had 10 variables and 12 nodes.
DT algorithms such as Rpart and Tree, which have simple structures, are effective in selecting and reducing the number of predictor variables [46]. Rodriguez-Galiano et al. [50] reported that the accuracy among different DT algorithms did not differ on land cover mapping in the Mediterranean region; however, the performance of these DTs was sensitive to noise in the data and the size of the sample. Therefore, an algorithm should not be selected based on statistical accuracy alone but rather on multiple criteria, which include several considerations and the eventual classification accuracy of the thematic maps.

Thematic Map Classification Accuracy
This study has shown that DT algorithms are effective tools in developing decision rulesets for land cover thematic maps. DTs have been used in land cover classification because they are non-parametric in nature and can be used with a number of auxiliary data such as digital elevation models (DEM), spectral indices, and spatial data [56,57]. For example, Im et al. [58] used Light Detection and Ranging (LiDAR) data and OBIA in land cover classification using the C5.0 DT algorithm.
High thematic map classification accuracy can be achieved by using these DTs; however, the accuracy differs from algorithm to algorithm. Due to spectral similarities between some classes such as primary and secondary forests, their accuracies were not as high as those of the other land cover classes (e.g., water and plantation forests). Phiri [19] reported similar challenges when conducting land cover classification in Zambia. Sharma et al. [56] reported that land cover classification accuracy using CART algorithms was better (>88%) than traditional classification approaches such as Maximum Likelihood and ISODATA, which attained overall accuracies of less than 72%. However, most studies have reported that other machine-learning algorithms such as support vector machines, random forests, and neural networks produced higher classification accuracies than the DTs used in this study [52,56,57]. Apart from the C5.0 algorithm, CART-based algorithms, such as Rpart and Tree, are the most commonly used algorithms because of their simplicity and ability to select and reduce variables for classification [16,57].
A high DT accuracy from cross-validation may not result in a high land cover classification accuracy because of: (1) the overfitting and saturation of the algorithm; (2) the ability to handle noise in the data; and (3) the size and structure of the algorithm [47]. For example, C5.0 had a high mean DT accuracy similar to other algorithms; however, it had a relatively low land cover classification accuracy (Table 2). This is largely because the C5.0 algorithm is susceptible to noise in the data, as this algorithm does not have a strong ability to handle outliers and is more prone to overfitting [47,55]. Large DTs need to be pruned in order to reduce the effects of overfitting; however, pruning may affect classification accuracy [13].
In this study, land cover classification accuracy was high for DT algorithms such as Rpart and Tree; these DTs also had simple structures, an ability to deal with noise in the data, and high statistical prediction accuracy. Simple algorithms which use a minimum number of decision variables have simple structures, are less saturated, and hence can be easily interpreted [56,57]. Among these DT algorithms, Rpart and C5.0 are commonly used in land cover classification and usually produce high classification accuracies. Powers et al. [13] reported 88% overall accuracy when C5.0 was used for mapping fine-scale industrial disturbance. Another DT algorithm which is simple and has high classification accuracy is the Tree algorithm; however, Rpart is preferable to Tree because it is more flexible and has many supporting packages currently available in the R statistical software [55]. When working with DT algorithms for land cover classification, it is important to establish the effects of different tuning parameters such as the number of variables, number of splits, size of a tree, and allowable error because they can influence the classification results [6,50]. Future studies could focus more on the influence of different tuning parameters on the classification accuracy of different landscapes and different remote sensing data.

Selecting the Best DT for Ruleset Development
Choosing the ideal DT algorithm to use for land cover classification should be the most important objective when using these DT algorithms to establish decision rulesets in OBIA land cover classification. It is important to consider all the properties of the DT algorithms such as model accuracy, simplicity, ability to handle different numbers of variables, and sizes of datasets (Table 6). This can be achieved by using a multiple criteria evaluation approach as suggested by DeFries et al. [47]. The focus in this evaluation should be on the DT algorithms which have high classification accuracies and simple structures, are not susceptible to noise in the data, and are easy to interpret. In this study, the Tree and Rpart algorithms are recommended for developing decision rulesets, especially when larger numbers of variables are involved, as these algorithms have the ability to select a small number of influential variables for classification and hence achieve simplicity and high classification accuracy [47,56]. The Ipred algorithm does not differ from Rpart and Tree in most of its functionalities; however, this algorithm was built on the principle of bagging, which makes it difficult to achieve simplicity and extract decision rulesets as it produces several decision trees. To successfully use Ipred, an argument (nbagg = 1), which specifies the production of one DT, could be employed [51]. The C5.0 and Party algorithms should be used when the objective is to include more predictor variables and to produce a high DT accuracy during cross-validation.
Although the overall accuracies achieved in this study were high, the user's and producer's accuracies for spectrally similar classes such as primary and secondary forest were low. Therefore, there is a need to define the classes so that they are spectrally separable. In addition, other methods such as non-parametric machine learning classifiers, e.g., Random Forest [15,19,59] and Support Vector Machine [60,61], which have proved to be more effective, can be used during classification in order to achieve higher accuracy.
The current study covers only one location, and this has the potential to affect the transferability and the generalization of the results. This can be a challenge for future studies, and the results from this study can be generalized only with some level of uncertainty. It is important to note that the generalization, transferability, and reproducibility of the results are largely influenced by the type of DT used, the sample size, and the type of input features.

Conclusions
In this paper, a systematic comparison of the performance of five DT algorithms on land cover classification using Landsat 8 is presented. The main focus was selecting DT algorithms which have high classification accuracies, simple structures, and are easy to interpret, by using the multiple criteria approach suggested by DeFries et al. [47]. While all algorithms had high mean DT accuracies, it was established that the Tree and the Rpart algorithms were simple, easy to interpret, and not affected by noise from datasets. The rulesets from the Tree and Rpart DT algorithms produced high overall accuracies of over 86%. The C5.0 and Party algorithms were equally good with respect to DT accuracy; however, they incorporate a large number of decision variables in the output, which can be difficult to implement, and exhibit the effects of overfitting and saturation. Further analysis showed that Rpart and Tree can select the minimum number of variables and hence retain simple but accurate rulesets. Based on the DT and land cover accuracies and other important aspects such as the number of variables and the simplicity of the DT structure, Rpart or Tree can be recommended for developing rulesets for OBIA land cover classification of Landsat 8 imagery. Going forward, future studies can compare the performance of these simple DTs with contemporary machine learning classifiers such as RF and SVM in different geographic locations at multiple time periods.