1. Introduction
Land cover classification from satellite images is one of the primary fields in remote sensing. Finer spatial resolution data (10–30 m), in particular from Landsat, have been widely used for regional studies of land cover and change, and very fine spatial resolution imagery (<1 to 5 m) plays an important role in local studies. Wall-to-wall mapping of large areas with 10–30 m data is expensive in terms of financial and computational resources, and there are only a few such efforts for large areas, for example the National Land Cover Dataset (NLCD) of the United States [1], the National Land Cover of South Africa (NLC) [2], and the pan-European maps of the European Coordination of Information on the Environment (CORINE) [3]. Recently, global forest cover [4,5] and global land cover maps [6] were derived from 30 m Landsat data. Most macro-regional, continental, and global applications, however, employ data of relatively coarse spatial resolution (250–1000 m) from Terra-Aqua/MODIS, SPOT/VEGETATION, NOAA/AVHRR, and ENVISAT/MERIS. Besides posing fewer difficulties in handling data volumes, the larger number of available cloud-free images allows for the generation of data composites, and the dense temporal information helps to discern classes by their distinct phenological patterns. The latter is advantageous for mapping across various ecoregions, where classes are likely to be represented by multiple clusters in feature space [7,8].
The lack of spatial detail in coarse resolution data limits accurate land cover characterization [9,10,11]. The assignment of discrete classes to coarse resolution cells cannot adequately describe spatially complex areas [12]. The likelihood of mixed pixels is a function of the spatial resolution, the thematic detail to be mapped, and the size and spatial pattern of land cover patches [13]. Discrete class assignment of mixed pixels not only poses serious difficulties for the classification of coarse image data but also alters the area estimation. Several studies have noted that, at coarser spatial resolution, dominating classes with large patches yield higher area proportions than expected at the expense of dispersed, small-patch classes [7,13,14]. Studies have postulated that area calculations from fractional estimates are more accurate than those from discrete classifications [7,15].
Several algorithms have been explored for large area mapping with coarse resolution data. For instance, Fernandes et al. [10] compared a hard classifier, artificial neural networks (ANN), linear spectral unmixing, clustering, and linear regression for fractional class estimation and found differences of approximately 20% compared to fine resolution reference data. Studies focusing on urban land cover compared advanced regression algorithms [16] or various discrete classifiers [17]. Several studies using the same global 1° spatial resolution AVHRR Normalized Difference Vegetation Index (NDVI) dataset have shown that classification of 11 land cover classes with decision trees (DT) performs best, with 93% overall accuracy [18], compared to Maximum Likelihood classification (78%) [19] and ANN (85%) [20]. Most automated processing systems for macro-regional to global land cover characterization employ DT approaches [1,12,21,22,23,24,25]. There are two general types of DT: classification trees (CT) with a discrete target value and regression trees (RT) with a continuous result.
Besides the classification algorithm, features and training data for supervised image classification have to be defined. Several studies address feature generation and selection processes [26,27,28,29] and various aspects of training data selection [17,30,31,32]. However, only a few studies have focused on training data allocation schemes, such as between-class sample balance or the structure of heterogeneous samples. In particular, classification trees may suffer from an unbalanced sample size between classes because the number of samples in each leaf defines the class [33,34], and several allocation schemes have been recommended [24,26,27,32]. A few studies recommend heterogeneous training data for discrete classification [35,36], but most large-area mapping projects select homogeneous areas for training [7,22,27]. For regression techniques, the impact of non-random selection of heterogeneous training data is unknown, and the impact of combined tree models for several classes on correct area estimation has been widely overlooked.
The objective of this study is to compare the accuracy and area estimations of several decision tree approaches trained with specific sample allocation schemes from an existing higher spatial resolution map for discrete and continuous land cover mapping. Specific goals are:
- (1) Evaluate the performance of DT algorithms using two common approaches of classification and regression trees.
- (2) For classification trees, compare (a) heterogeneous training pixels with different allocation schemes against homogeneous pixels and (b) schemes of sample allocation between classes.
- (3) For regression trees, assess (a) sample allocations for heterogeneous samples and (b) normalized and non-normalized results to combine multiple models.
4. Results and Analysis
4.1. Reference Data
4.1.1. Spatial Co-Registration
Table 2 shows near-perfect spatial co-registration between the NDVI from ten Landsat images and MODIS composites of the corresponding dates. The offsets are negligible, with averages of x = −3 m, y = −3 m and extremes no larger than ±30 m. The correlation coefficients themselves are all positive and sufficiently high, i.e., the spatial patterns in Landsat and MODIS NDVI are closely related to each other. This finding is an important prerequisite for the following analysis as it permits a direct relation between Landsat-based NLCD2006 maps and MODIS.
Table 2.
Spatial offset between Landsat images (for their spatial location see Figure 1) and temporally corresponding composites of MODIS data using the NDVI.
Path-Row | Location | Acquisition of Landsat Image | X-Offset (m) | Y-Offset (m) | Correlation Coefficient |
---|---|---|---|---|---|
021-037 | East Gulf Coastal Plain, AL | 15 June 2006 | −30 | 30 | 0.71 |
025-036 | Ouachita Mountains, AR | 15 September 2006 | −30 | 0 | 0.76 |
018-038 | East Gulf Coastal Plain, GA | 10 June 2006 | 0 | 0 | 0.74 |
028-034 | Osage Plains, KS | 13 April 2006 | −30 | −30 | 0.83 |
020-034 | Interior Low Plateaus, KY | 23 May 2006 | 30 | −30 | 0.75 |
023-037 | Mississippi Valley, MS | 3 October 2006 | −30 | 0 | 0.90 |
023-034 | Mississippi Valley, MO | 31 July 2006 | 30 | −30 | 0.55 |
018-036 | Piedmont, SC | 10 June 2006 | 0 | 0 | 0.68 |
027-037 | Dallas Area, TX | 13 September 2006 | 0 | 30 | 0.86 |
028-038 | Central Texas, TX | 20 September 2006 | 30 | 0 | 0.84 |
4.1.2. NLCD2006 Data
Figure 3A shows the NLCD2006 map recoded to nine classes (Table 1) at 30 m spatial resolution. The map illustrates some spatial details, such as the road network in Kansas, that disappear in Figure 3B, which shows the spatial distribution of the dominant class at 900 m spatial resolution derived with the majority rule argmax(H). Figure 3C indicates the corresponding area proportion of the dominating class, max(H). There are distinct regional patterns with homogeneous areas in the western portion (Shrubland, Grassland, Cultivated crops), the Mississippi Valley (Cultivated crops), the southern Ozark and Appalachian Mountains (Deciduous forest), the Okefenokee Swamp in southern Georgia (Wetlands), and large metropolitan areas like Atlanta, Dallas-Fort Worth, and St. Louis (Developed). In particular, the southeastern region is highly heterogeneous with area proportions of the dominating class below 50%; similar heterogeneous patterns exist in eastern Texas, Oklahoma, Louisiana, and Arkansas.
Figure 3.
(A) Reference map at 30 m spatial resolution; (B) coarsened map at 900 m using majority rule, argmax(H); and (C) area proportion of that class, max(H).
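The coarsening shown in Figure 3 can be sketched as follows. This is a minimal illustration, not the processing chain used in the study: the function name `aggregate_majority`, the integer class codes 0 to 8, and the toy input map are assumptions. It derives, for each 900 m cell (30 × 30 pixels of 30 m), the per-class homogeneity H, the dominant class argmax(H), and its area proportion max(H).

```python
import numpy as np

def aggregate_majority(class_map, block=30, n_classes=9):
    """Return (H, dominant, max_h) for every coarse cell.

    class_map : 2-D integer array with class codes 0..n_classes-1 (fine resolution).
    block     : fine pixels per coarse cell edge (30 x 30 m = 900 m).
    """
    rows, cols = class_map.shape
    if rows % block or cols % block:
        raise ValueError("map dimensions must be multiples of the block size")
    # Rearrange so that the last axis holds all fine pixels of one coarse cell.
    cells = (class_map
             .reshape(rows // block, block, cols // block, block)
             .swapaxes(1, 2)
             .reshape(rows // block, cols // block, block * block))
    # H[..., k] is the percentage of the coarse cell covered by class k.
    H = np.stack([(cells == k).mean(axis=-1) * 100.0 for k in range(n_classes)],
                 axis=-1)
    return H, H.argmax(axis=-1), H.max(axis=-1)

# Toy 60 x 60 map (2 x 2 coarse cells); class codes are hypothetical.
fine = np.zeros((60, 60), dtype=int)
fine[:30, :18] = 3   # top-left cell: 60% class 3, 40% class 0
fine[30:, 30:] = 5   # bottom-right cell: pure class 5
H, dominant, max_h = aggregate_majority(fine)
```

Note that `argmax` resolves ties by returning the lowest class code; how ties were handled in the study is not stated here.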
Table 3 shows, for each class, the distribution of homogeneity across 12 bins. Pixels with low homogeneity clearly predominate, but the magnitude differs between classes. For instance, class Water only exists in selected parts of the map and thus H = 0% makes up 76.7% of the study area. Class Deciduous forest is rather ubiquitous with a proportion of 37.6% for 10% ≤ H < 60%. Class Developed is an interesting example: because many roads cause a homogeneity slightly above 0%, only 21.3% of the pixels have H = 0% but 61.5% have 0% < H < 10%.
Table 3.
Homogeneity (H) in 10-percent bins and bins for 0 and 100 percent derived from NLCD2006. For abbreviations of class names see Table 1.
Homogeneity (%) | Wat | Dev | DF | EF | Shb | Grs | Past | Crop | Wet |
---|---|---|---|---|---|---|---|---|---|
H = 0 | 76.67 | 21.32 | 23.43 | 39.92 | 55.22 | 39.46 | 47.16 | 58.16 | 61.32 |
0 < H < 10 | 19.39 | 61.45 | 24.37 | 26.40 | 27.14 | 33.34 | 18.13 | 12.89 | 24.46 |
10 ≤ H < 20 | 1.53 | 10.36 | 12.01 | 9.50 | 8.10 | 8.10 | 9.33 | 5.52 | 5.63 |
20 ≤ H < 30 | 0.65 | 2.28 | 8.73 | 6.45 | 3.72 | 4.31 | 6.60 | 3.75 | 2.90 |
30 ≤ H < 40 | 0.38 | 1.08 | 6.86 | 4.87 | 1.77 | 3.08 | 5.20 | 3.02 | 1.69 |
40 ≤ H < 50 | 0.29 | 0.74 | 5.51 | 3.84 | 1.02 | 2.48 | 4.21 | 2.63 | 1.09 |
50 ≤ H < 60 | 0.23 | 0.57 | 4.52 | 3.05 | 0.73 | 2.09 | 3.34 | 2.41 | 0.76 |
60 ≤ H < 70 | 0.18 | 0.48 | 3.86 | 2.40 | 0.59 | 1.81 | 2.59 | 2.34 | 0.56 |
70 ≤ H < 80 | 0.17 | 0.45 | 3.52 | 1.79 | 0.52 | 1.62 | 1.88 | 2.45 | 0.45 |
80 ≤ H < 90 | 0.15 | 0.42 | 3.39 | 1.23 | 0.49 | 1.53 | 1.16 | 2.89 | 0.41 |
90 ≤ H < 100 | 0.19 | 0.49 | 3.37 | 0.56 | 0.59 | 1.91 | 0.40 | 3.60 | 0.52 |
H = 100 | 0.18 | 0.36 | 0.41 | 0.01 | 0.12 | 0.26 | 0.01 | 0.35 | 0.22 |
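The binning behind Table 3 can be sketched as below. The helper names `bin_index` and `bin_percentages` and the sample values are hypothetical; only the 12 bin boundaries (H = 0, 0 < H < 10, nine 10-percent bins, H = 100) are taken from the table.

```python
from collections import Counter

# Labels follow the row order of Table 3.
BIN_LABELS = (["H = 0", "0 < H < 10"]
              + [f"{lo} ≤ H < {lo + 10}" for lo in range(10, 100, 10)]
              + ["H = 100"])

def bin_index(h):
    """Map a homogeneity value in percent to one of the 12 bins."""
    if not 0.0 <= h <= 100.0:
        raise ValueError("homogeneity must lie between 0 and 100 percent")
    if h == 0:
        return 0
    if h == 100:
        return 11
    return 1 + int(h // 10)

def bin_percentages(h_values):
    """Percentage of pixels per bin for one class."""
    counts = Counter(bin_index(h) for h in h_values)
    n = len(h_values)
    return [100.0 * counts.get(i, 0) / n for i in range(len(BIN_LABELS))]

# Hypothetical homogeneity values of one class for eight coarse pixels.
example = [0, 0, 4.5, 10, 55.0, 99.9, 100, 100]
shares = bin_percentages(example)
```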
4.2. Sample Allocation of Training Data
This section demonstrates the training sample allocation schemes by example. Each of the following tables shows the expected sample frequency, which is calculated from the number of samples that fulfill the specific allocation criteria; the corresponding expected number of samples, in many cases with a minimum of 50 samples per class or sample bin; and the actual sample allocation. All numbers are specific to this study and are meant to demonstrate the sample allocation process in practice.
Table 4 presents the random sample allocation for homogeneous pixels. The expected frequency, and thus the expected number of samples, is relative to the class proportion of H = 100% in Table 3. Actual sampling starts at H = 100% and homogeneity is decreased until the expected number of samples per class is reached. Sufficient samples of fully homogeneous pixels (H = 100%) were available for classes Deciduous forest, Grassland, and Cultivated crops. To reach the expected number of 358 samples for class Shrubland, homogeneity had to be decreased to 96%.
Table 4.
Random allocation with a minimum of 50 samples per class using homogeneous pixels H = 100%. Homogeneity (H) in percent. See Table 1 for abbreviations of class names.
| Wat | Dev | DF | EF | Shb | Grs | Past | Crop | Wet |
---|---|---|---|---|---|---|---|---|---|
Expected sample frequency (%) | 9.29 | 18.70 | 21.46 | 0.44 | 6.23 | 13.77 | 0.31 | 18.26 | 11.54 |
Expected number of samples | 510 | 976 | 1112 | 72 | 358 | 731 | 66 | 954 | 621 |
H = 100% | 352 | 623 | 1112 | 28 | 181 | 731 | 0 | 954 | 420 |
99% ≤ H < 100% | 71 | 172 | 0 | 44 | 71 | 0 | 66 | 0 | 170 |
98% ≤ H < 99% | 53 | 117 | 0 | 0 | 56 | 0 | 0 | 0 | 31 |
97% ≤ H < 98% | 34 | 64 | 0 | 0 | 47 | 0 | 0 | 0 | 0 |
96% ≤ H < 97% | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 0 |
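The expected numbers of samples in Table 4 appear consistent with reserving the minimum of 50 samples for each class and distributing the remainder in proportion to the expected frequency. The sketch below rests on that assumption; the formula, the function name `expected_samples`, and the total of 5400 (the row sum of Table 4) are inferred from the tabulated values rather than stated in the text.

```python
def expected_samples(freq_percent, total=5400, minimum=50):
    """Expected samples per class: a guaranteed minimum plus a proportional share.

    Assumed rule: n_c = minimum + f_c / 100 * (total - minimum * n_classes).
    """
    n_classes = len(freq_percent)
    free = total - minimum * n_classes
    if free < 0:
        raise ValueError("total is too small for the requested minimum")
    return [round(minimum + f / 100.0 * free) for f in freq_percent]

# Expected sample frequency (%) and expected number of samples from Table 4
# (Wat, Dev, DF, EF, Shb, Grs, Past, Crop, Wet).
table4_freq = [9.29, 18.70, 21.46, 0.44, 6.23, 13.77, 0.31, 18.26, 11.54]
table4_expected = [510, 976, 1112, 72, 358, 731, 66, 954, 621]

estimate = expected_samples(table4_freq)
largest_gap = max(abs(e - t) for e, t in zip(estimate, table4_expected))
```

Because the frequencies in Table 4 are rounded to two decimals, the sketch reproduces the tabulated counts only to within one sample per class.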
Table 5 shows the allocation proportional to the expected area from NLCD2006 (Table 1). Heterogeneous pixels were allocated uniformly across six bins of H ≥ 50%. Sampling should start at the bin with the highest homogeneity (H = 100%) because in some cases the expected sample size may not be available and the shortfall will be allocated from the next bin. For instance, for class Evergreen forest with an expected total of 715 samples, each bin should contain 119.17 samples (rounded to 119 or 120 samples), but only 29 samples could be selected for H = 100% and the remaining 90 samples were allocated from bin 90% ≤ H < 100%.
Table 5.
Allocation proportional to expected area with a minimum of 50 samples per class using heterogeneous pixels with uniform allocation across six bins of H ≥ 50%. Homogeneity (H) in percent. For abbreviations of class names see Table 1.
| Wat | Dev | DF | EF | Shb | Grs | Past | Crop | Wet |
---|---|---|---|---|---|---|---|---|---|
Expected sample frequency (%) | 1.91 | 7.52 | 23.84 | 13.43 | 6.31 | 12.44 | 13.69 | 15.35 | 5.51 |
Expected number of samples | 145 | 422 | 1230 | 715 | 362 | 666 | 727 | 810 | 323 |
H = 100% | 24 | 70 | 205 | 29 | 60 | 111 | 0 | 135 | 54 |
90% ≤ H < 100% | 24 | 71 | 205 | 209 | 61 | 111 | 242 | 135 | 54 |
80% ≤ H < 90% | 25 | 70 | 205 | 120 | 60 | 111 | 122 | 135 | 54 |
70% ≤ H < 80% | 24 | 70 | 205 | 119 | 60 | 111 | 121 | 135 | 53 |
60% ≤ H < 70% | 24 | 71 | 205 | 119 | 61 | 111 | 121 | 135 | 54 |
50% ≤ H < 60% | 24 | 70 | 205 | 119 | 60 | 111 | 121 | 135 | 54 |
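The carry-over rule described for Table 5 (start at the most homogeneous bin and move any shortfall to the next bin) can be sketched as follows. The function name `allocate_with_carry` is hypothetical, and the availability of 10,000 pixels in all but the first bin is a placeholder assumption; only the per-bin targets and the 29 available pixels at H = 100% for Evergreen forest come from the text and table.

```python
def allocate_with_carry(targets, available):
    """Allocate samples bin by bin, moving any shortfall to the next bin.

    targets   : desired samples per bin, ordered from highest homogeneity down.
    available : candidate pixels per bin, in the same order.
    Returns the actual allocation and the shortfall left after the last bin.
    """
    actual = []
    carry = 0
    for target, avail in zip(targets, available):
        wanted = target + carry
        taken = min(wanted, avail)
        actual.append(taken)
        carry = wanted - taken
    return actual, carry

# Evergreen forest in Table 5: 715 samples across six bins of H >= 50%.
targets = [119, 119, 120, 119, 119, 119]
available = [29, 10_000, 10_000, 10_000, 10_000, 10_000]  # placeholder except 29
actual, unmet = allocate_with_carry(targets, available)
# actual == [29, 209, 120, 119, 119, 119], matching the EF column of Table 5
```

The same rule reproduces the first two bins of the Uniform and Random-50 schemes in Table 7 (29 and 871, and 29 and 98, respectively).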
Table 6 presents equal allocation between classes of heterogeneous pixels with random allocation across sample bins of argmax(H). Although the lowest potential level of dominance could be as low as 11.1% (1/9 classes), in reality the lowest homogeneity was above 20%. For most classes, pixels are highly heterogeneous with 40% ≤ H < 70%, i.e., the area of the dominating class makes up approximately half of the pixel. Only classes Grassland and Cropland showed more homogeneous pixels with 70% ≤ H < 100%.
Table 6.
Equal class allocation of heterogeneous pixels with random allocation across bins of argmax(H). Homogeneity (H) in percent. See Table 1 for abbreviations of class names.
| Wat | Dev | DF | EF | Shb | Grs | Past | Crop | Wet |
---|---|---|---|---|---|---|---|---|---|
Expected sample frequency (%) | 11.11 | 11.11 | 11.11 | 11.11 | 11.11 | 11.11 | 11.11 | 11.11 | 11.11 |
Expected number of samples | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 | 600 |
H = 100% | 42 | 36 | 6 | 0 | 9 | 14 | 0 | 14 | 20 |
90% ≤ H < 100% | 54 | 66 | 62 | 20 | 50 | 100 | 12 | 121 | 60 |
80% ≤ H < 90% | 62 | 59 | 82 | 38 | 55 | 100 | 48 | 94 | 61 |
70% ≤ H < 80% | 63 | 71 | 82 | 73 | 66 | 92 | 93 | 101 | 46 |
60% ≤ H < 70% | 68 | 84 | 94 | 102 | 95 | 83 | 97 | 87 | 69 |
50% ≤ H < 60% | 108 | 90 | 104 | 131 | 117 | 91 | 137 | 73 | 104 |
40% ≤ H < 50% | 108 | 96 | 101 | 120 | 103 | 73 | 132 | 71 | 109 |
30% ≤ H < 40% | 80 | 75 | 58 | 94 | 75 | 40 | 69 | 27 | 101 |
20% ≤ H < 30% | 15 | 23 | 11 | 22 | 30 | 7 | 12 | 12 | 30 |
Table 7 shows sample allocation schemes for regression trees for class Evergreen forest. Heterogeneous pixels were allocated randomly or uniformly across all bins of homogeneity (see also Table 3). Allocation started at H = 100%; if a bin did not contain enough samples, the shortfall was added to the next bin.
Table 7.
Random and uniform allocation of heterogeneous pixels for regression trees for class Evergreen forest. Homogeneity (H) in percent.
Homogeneity (%) | Random-50: Expected Sample Frequency (%) | Random-50: Expected Number of Samples | Random-50: Actual Number of Samples | Random-0: Expected Sample Frequency (%) | Random-0: Expected Number of Samples | Random-0: Actual Number of Samples | Uniform: Expected Sample Frequency (%) | Uniform: Expected Number of Samples | Uniform: Actual Number of Samples |
---|---|---|---|---|---|---|---|---|---|
H = 100 | 0.01 | 50 | 29 | 0.01 | 0 | 0 | 8.33 | 450 | 29 |
90 ≤ H < 100 | 0.56 | 77 | 98 | 0.56 | 31 | 31 | 8.33 | 450 | 871 |
80 ≤ H < 90 | 1.23 | 109 | 109 | 1.23 | 66 | 66 | 8.33 | 450 | 450 |
70 ≤ H < 80 | 1.79 | 136 | 136 | 1.79 | 97 | 97 | 8.33 | 450 | 450 |
60 ≤ H < 70 | 2.40 | 165 | 165 | 2.40 | 129 | 129 | 8.33 | 450 | 450 |
50 ≤ H < 60 | 3.05 | 197 | 197 | 3.05 | 165 | 165 | 8.33 | 450 | 450 |
40 ≤ H < 50 | 3.84 | 234 | 234 | 3.84 | 207 | 207 | 8.33 | 450 | 450 |
30 ≤ H < 40 | 4.87 | 283 | 283 | 4.87 | 263 | 263 | 8.33 | 450 | 450 |
20 ≤ H < 30 | 6.45 | 360 | 360 | 6.45 | 348 | 348 | 8.33 | 450 | 450 |
10 ≤ H < 20 | 9.50 | 505 | 505 | 9.50 | 513 | 513 | 8.33 | 450 | 450 |
0 < H < 10 | 26.40 | 1318 | 1318 | 26.40 | 1425 | 1425 | 8.33 | 450 | 450 |
H = 0 | 39.92 | 1966 | 1966 | 39.92 | 2156 | 2156 | 8.33 | 450 | 450 |
4.3. Accuracy Assessment of Classification and Regression Trees
4.3.1. Reference Data for Discrete Map Assessment
Table 8 provides details on the reference sample allocation process and reference label assignment. For each class, the potential samples (each sample corresponds to one 900 m MODIS pixel) meet the following conditions: (1) the homogeneity in NLCD2006 is higher than 50% and (2) the sample is located within the extent of the Landsat images (Figure 1). The average homogeneity shows that, although the majority of each sample belongs to one class (H > 50%), the level of dominance is moderate and most samples are not pure. For each stratum in NLCD2006, 150 samples were extracted. Out of 1350 samples, four were excluded from analysis because response data were obscured by clouds or class assignment was too uncertain. The columns for primary and alternative label indicate for each class the number of assigned reference samples. For instance, there are 120 samples with a primary label of class Water and 84 samples labeled as Water by the alternative call. For 73 samples, the primary and alternative calls agree, i.e., class assignment is quite certain. On the other hand, there are 47 samples for which the alternative class was not Water and 11 samples for which the primary call was not Water. As the assignment of Water in image interpretation is quite simple, these samples were likely located along the edge of a water body and contain a mixture of land cover types. There are extreme cases of ambiguity such as Grassland and Pasture, both of which indicate a specific land use that is difficult to classify using satellite imagery alone, and frequently mixed classes, e.g., Wetland. Less than half of the samples (48.8%) had matching class labels in the primary and alternative call.
Table 8.
Sample allocation from NLCD2006 (H > 50%) and location in Landsat path-rows (Figure 1) and primary and alternative reference label assignment from Landsat and Google Earth image interpretation. Agreement shows the number of samples with equal primary and alternative label. Homogeneity (H) in percent.
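Class/Strata | NLCD-Based Sample Allocation: Potential Samples | NLCD-Based Sample Allocation: Mean (H) | Reference Set from Response Data: Primary | Reference Set from Response Data: Alternative | Reference Set from Response Data: Agreement |
---|---|---|---|---|---|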
Water | 4857 | 76.67 | 120 | 84 | 73 |
Developed | 14,197 | 79.32 | 155 | 153 | 130 |
Deciduous forest | 60,678 | 72.35 | 322 | 309 | 193 |
Evergreen forest | 35,733 | 67.06 | 90 | 84 | 20 |
Shrubland | 17,996 | 71.40 | 178 | 196 | 111 |
Grassland | 36,388 | 73.58 | 96 | 155 | 6 |
Pasture | 32,439 | 66.88 | 121 | 123 | 1 |
Cropland | 63,522 | 77.81 | 218 | 171 | 108 |
Wetland | 11,502 | 74.47 | 46 | 71 | 15 |
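The label counts quoted in the text can be recomputed directly from Table 8. The short sketch below uses only the tabulated values; the variable names are illustrative.

```python
# Primary, alternative, and agreeing reference labels per class (Table 8).
primary = {"Water": 120, "Developed": 155, "Deciduous forest": 322,
           "Evergreen forest": 90, "Shrubland": 178, "Grassland": 96,
           "Pasture": 121, "Cropland": 218, "Wetland": 46}
alternative = {"Water": 84, "Developed": 153, "Deciduous forest": 309,
               "Evergreen forest": 84, "Shrubland": 196, "Grassland": 155,
               "Pasture": 123, "Cropland": 171, "Wetland": 71}
agreement = {"Water": 73, "Developed": 130, "Deciduous forest": 193,
             "Evergreen forest": 20, "Shrubland": 111, "Grassland": 6,
             "Pasture": 1, "Cropland": 108, "Wetland": 15}

n_samples = sum(primary.values())                # 1350 drawn minus 4 excluded
agreement_share = 100.0 * sum(agreement.values()) / n_samples

# Samples carrying a class label in only one of the two calls.
primary_only = {c: primary[c] - agreement[c] for c in primary}
alternative_only = {c: alternative[c] - agreement[c] for c in alternative}
```

For class Water this gives the 47 and 11 samples mentioned in the text, and the overall agreement share rounds to 48.8%.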
4.3.2. Discrete Map Assessment
The overall accuracies (OA) of all classifications for discrete maps are shown in Table 9, counting either the primary reference label (P) or the primary and alternative label (P + A) as correctly classified. Confidence intervals (p < 5%, two-tailed z-test) range between 2.5% and 2.7% and are therefore not presented.
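The two accuracy variants and the size of the confidence intervals can be illustrated with a short sketch. It assumes a normal-approximation interval for a proportion with z = 1.96 and n = 1346 reference samples (1350 minus the four excluded samples); the exact procedure used in the study is not spelled out in this section, and the function names and toy labels are hypothetical.

```python
import math

def overall_accuracy(predicted, primary, alternative=None):
    """Overall accuracy in percent; a match with the alternative label also
    counts as correct when an alternative label list is supplied."""
    hits = 0
    for i, label in enumerate(predicted):
        correct = label == primary[i]
        if alternative is not None and label == alternative[i]:
            correct = True
        hits += correct
    return 100.0 * hits / len(predicted)

def ci_half_width(oa_percent, n, z=1.96):
    """Half-width (percent) of a normal-approximation confidence interval."""
    p = oa_percent / 100.0
    return 100.0 * z * math.sqrt(p * (1.0 - p) / n)

# Toy labels (hypothetical) to show the difference between P and P + A.
pred = ["Water", "Developed", "Cropland"]
prim = ["Water", "Grassland", "Cropland"]
alt = ["Water", "Developed", "Pasture"]
oa_p = overall_accuracy(pred, prim)            # two of three correct
oa_pa = overall_accuracy(pred, prim, alt)      # all three correct

# Lowest and one of the highest OA values for the primary label in Table 9.
low = ci_half_width(46.14, 1346)
high = ci_half_width(69.84, 1346)
```

Under these assumptions the half-widths come out at roughly 2.5% to 2.7%, in line with the range stated above.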
Overall accuracies of RF-C are, on average, 1% higher than those of C5.0, and Cubist yields about 0.5% higher accuracies than RF-R. For discrete maps from classification trees (C5.0, RF-C), heterogeneous training pixels show, on average, 6% better accuracy than homogeneous training data. There are no notable differences between uniform and random allocation of heterogeneous training samples. Area-proportional between-class sample allocations show 1% higher overall accuracies than equalized sampling, and accuracy for random training sampling decreases by another 0.5%. Discrete maps from regression trees show a consistent pattern of 2%–3% higher accuracies for uniform allocation. Random allocation with no minimum sample size per bin resulted in 1%–2% lower accuracies than allocating at least 50 samples to each bin. Note that normalization has no effect on discrete maps obtained with the majority rule. For classification trees, the best results were obtained with RF-C, uniform allocation, and area-balanced between-class sample allocation; for regression trees, the best results were obtained with uniform sampling, with negligible differences between Cubist and RF-R (highlighted cells in Table 9).
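Why normalization leaves the discrete maps unchanged can be shown with a small sketch. It assumes that normalization rescales the per-class regression outputs of each pixel so that they sum to 100%; this definition, the function name `normalize_memberships`, and the example values are assumptions, since the method description is not part of this section.

```python
import numpy as np

def normalize_memberships(memberships):
    """Rescale per-pixel class memberships (percent) to sum to 100.

    Assumed normalization: negative predictions are clipped to zero and each
    pixel is divided by its own total. Pixels with a zero total are left at 0.
    """
    m = np.clip(np.asarray(memberships, dtype=float), 0.0, None)
    totals = m.sum(axis=-1, keepdims=True)
    safe = np.where(totals > 0, totals, 1.0)
    return 100.0 * m / safe

# Two hypothetical pixels with three classes; independent per-class models
# need not sum to 100% (here 140% and 20%).
raw = np.array([[70.0, 40.0, 30.0],
                [10.0, 5.0, 5.0]])
norm = normalize_memberships(raw)

# Dividing a pixel by a positive constant keeps the order of its classes,
# so the majority-rule label (argmax) is the same before and after.
labels_raw = raw.argmax(axis=-1)
labels_norm = norm.argmax(axis=-1)
```

The continuous memberships and the areas derived from them do change, which is why normalization matters for the continuous columns of Table 9 but not for the discrete ones.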
Assessments that count both the primary and the alternative reference label as correctly classified result in, on average, 14% higher accuracies, which indicates ambiguity in the reference label assignment of some classes. In terms of class accuracies (see supplemental material), Water and Developed are well classified (on average 75% or better in user's and producer's accuracy). Shrubland and Cropland form a second group with above 50% in both class accuracies. There is confusion between Evergreen and Deciduous forest, and between both classes and Wetland, as many forests in the southeastern US are interconnected with wetlands either as riparian vegetation or along estuaries at the coast. It should be noted that Wetland was the class with the lowest accuracies in NLCD [40]. Other classes with below 50% class accuracy are Pasture and Grassland, as both indicate land use forms of herbaceous areas.
Table 9.
Accuracy measures and absolute difference in area for discrete and continuous (class memberships) classifications. OA: overall accuracy using primary (P) or primary and alternative (P + A) label of reference data as correctly classified. r: correlation coefficient. MAD: mean absolute difference. Int: Intercept. AD: absolute difference in million hectares and percent. Classification trees C5.0 and Random forest classification (RF-C) with homogeneous samples (H = 100) or heterogeneous samples allocated uniformly for H ≥ 50% or randomly (argmax(H)). Sample allocation between classes with random, area-proportional, equal allocation. Regression trees Cubist and Random Forest Regression (RF-R) with uniform and random allocation with no minimum or at least 50 samples per bin. NN: no normalization. Norm: normalization. Highlighted cells indicate best results for classification and regression trees.
Algorithm | Allocation | Class/Norm. | Discrete: OA (%) P | Discrete: OA (%) P + A | Continuous: r | Continuous: MAD (%) | Continuous: Slope | Continuous: Int. (%) | AD Discrete (Mio ha) | AD Discrete (%) | AD Continuous (Mio ha) | AD Continuous (%) |
---|---|---|---|---|---|---|---|---|---|---|---|---|
C5.0 | Homogen | Random | 46.14 | 56.24 | 0.64 | 9.94 | 0.70 | 3.32 | 89.99 | 55.55 | 85.47 | 52.76 |
| | Area | 49.63 | 62.11 | 0.74 | 8.75 | 0.85 | 1.65 | 48.77 | 30.10 | 47.90 | 29.57 |
| | Equal | 49.70 | 61.29 | 0.69 | 9.45 | 0.77 | 2.59 | 66.93 | 41.31 | 65.76 | 40.59 |
| Uniform | Random | 52.23 | 65.23 | 0.81 | 7.34 | 0.90 | 1.12 | 28.73 | 17.73 | 15.45 | 9.54 |
| | Area | 54.53 | 67.83 | 0.81 | 7.25 | 0.89 | 1.20 | 23.35 | 14.41 | 8.52 | 5.26 |
| | Equal | 54.09 | 67.01 | 0.78 | 7.81 | 0.82 | 1.95 | 20.40 | 12.59 | 27.29 | 16.85 |
| Random | Random | 53.12 | 67.38 | 0.81 | 7.16 | 0.85 | 1.68 | 32.41 | 20.01 | 15.26 | 9.42 |
| | Area | 53.79 | 66.86 | 0.81 | 7.22 | 0.82 | 2.00 | 21.78 | 13.44 | 7.11 | 4.39 |
| | Equal | 52.82 | 66.86 | 0.78 | 7.76 | 0.77 | 2.58 | 22.01 | 13.59 | 30.94 | 19.10 |
RF-C | Homogen | Random | 47.03 | 57.06 | 0.67 | 9.49 | 0.71 | 3.19 | 91.14 | 56.26 | 82.34 | 50.83 |
| | Area | 49.78 | 63.22 | 0.76 | 8.31 | 0.87 | 1.45 | 46.06 | 28.43 | 45.81 | 28.28 |
| | Equal | 51.63 | 64.04 | 0.72 | 8.92 | 0.79 | 2.33 | 61.36 | 37.88 | 60.53 | 37.37 |
| Uniform | Random | 53.27 | 67.24 | 0.83 | 7.02 | 0.92 | 0.82 | 31.89 | 19.68 | 16.83 | 10.39 |
| | Area | 55.57 | 69.47 | 0.83 | 6.96 | 0.91 | 0.96 | 22.82 | 14.09 | 8.74 | 5.39 |
| | Equal | 54.09 | 67.61 | 0.80 | 7.41 | 0.85 | 1.69 | 20.93 | 12.92 | 27.03 | 16.68 |
| Random | Random | 52.97 | 67.24 | 0.83 | 6.70 | 0.87 | 1.39 | 36.55 | 22.56 | 18.03 | 11.13 |
| | Area | 54.75 | 68.72 | 0.83 | 6.67 | 0.85 | 1.66 | 27.37 | 16.90 | 6.36 | 3.93 |
| | Equal | 53.64 | 67.53 | 0.81 | 7.20 | 0.80 | 2.25 | 23.26 | 14.36 | 29.37 | 18.13 |
Cubist | Random-50 | NN | 53.79 | 68.50 | 0.86 | 6.25 | 0.85 | 2.79 | 29.14 | 17.99 | 16.20 | 10.00 |
| | Norm | 53.79 | 68.50 | 0.86 | 6.07 | 0.73 | 2.93 | 29.14 | 17.99 | 5.45 | 3.36 |
| Random-0 | NN | 51.93 | 65.90 | 0.86 | 5.95 | 0.76 | 2.56 | 35.36 | 21.83 | 2.51 | 1.55 |
| | Norm | 51.93 | 65.90 | 0.86 | 5.93 | 0.75 | 2.71 | 35.36 | 21.83 | 4.22 | 2.61 |
| Uniform | NN | 55.94 | 69.84 | 0.79 | 12.37 | 0.86 | 11.71 | 15.32 | 9.46 | 148.94 | 91.94 |
| | Norm | 55.94 | 69.84 | 0.81 | 8.79 | 0.46 | 5.96 | 15.32 | 9.46 | 49.87 | 30.78 |
RF-R | Random-50 | NN | 53.19 | 66.79 | 0.85 | 6.95 | 0.81 | 4.11 | 31.02 | 19.15 | 29.84 | 18.42 |
| | Norm | 53.19 | 66.79 | 0.85 | 6.66 | 0.66 | 3.79 | 31.02 | 19.15 | 10.45 | 6.45 |
| Random-0 | NN | 51.41 | 65.75 | 0.86 | 6.54 | 0.73 | 3.59 | 39.21 | 24.20 | 9.15 | 5.65 |
| | Norm | 51.41 | 65.75 | 0.85 | 6.49 | 0.68 | 3.56 | 39.21 | 24.20 | 2.66 | 1.64 |
| Uniform | NN | 56.24 | 69.76 | 0.77 | 14.24 | 0.81 | 14.26 | 16.20 | 10.00 | 177.88 | 109.80 |
| | Norm | 56.24 | 69.76 | 0.79 | 9.55 | 0.40 | 6.65 | 16.20 | 10.00 | 56.28 | 34.74 |
The differences in classification accuracies were statistically tested using the McNemar test, and Figure 4A depicts the statistically significant differences. Compared to using only the primary reference label (lower-left triangle), there are fewer significant differences for assessments with the primary and alternative reference label (upper-right triangle). Most obvious is that sampling homogeneous training data for classification trees almost always performs significantly worse (for actual accuracies see Table 9 and supplemental material). There are statistically significant differences between classification trees using heterogeneous training samples and regression trees, even though the differences in overall accuracies are small. This is due to the nature of the test, which considers the number of reference samples that are classified correctly by one classification but incorrectly by the other. The statistically significant differences in overall accuracies are shown in Figure 4B. Again, most notable is that classifications with homogeneous training samples perform significantly worse than all others. The main difference from the McNemar test is that there are more statistically significant differences for the reference set using primary + alternative calls as correctly classified, due to the larger range in overall accuracies (see also Table 9).
Figure 4.
Statistical significance of difference in accuracies between (A) image classifications using McNemar test and (B) overall accuracies. Lower-left triangle shows results for the primary reference label, upper-right triangle for the primary and alternative reference label as correctly classified.
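A McNemar test on two classifications of the same reference samples can be sketched as follows. This version uses the chi-square form with continuity correction and one degree of freedom; whether the study applied the continuity correction is not stated, so that choice, the function name `mcnemar`, and the toy data are assumptions. Only the discordant samples (correct in one classification, wrong in the other) enter the statistic.

```python
import math

def mcnemar(correct_a, correct_b):
    """McNemar chi-square statistic (continuity corrected) and p-value.

    correct_a, correct_b : sequences of booleans, one entry per reference
    sample, stating whether classification A or B labeled it correctly.
    """
    b = sum(1 for a, bb in zip(correct_a, correct_b) if a and not bb)
    c = sum(1 for a, bb in zip(correct_a, correct_b) if not a and bb)
    if b + c == 0:
        return 0.0, 1.0          # no discordant samples, no evidence of difference
    stat = max(abs(b - c) - 1, 0) ** 2 / (b + c)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical example: 30 samples correct only in A, 10 correct only in B,
# 50 correct in both.
a_ok = [True] * 30 + [False] * 10 + [True] * 50
b_ok = [False] * 30 + [True] * 10 + [True] * 50
stat, p_value = mcnemar(a_ok, b_ok)
```

Because samples on which both classifications agree do not contribute, two maps with similar overall accuracies can still differ significantly, as noted above.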
4.3.3. Class-Membership Assessment
The continuous reference derived from NLCD2006 is used for assessing class memberships across all classes using four statistics (Table 9): correlation coefficient (r), mean absolute difference (MAD), slope, and intercept (Int). Memberships from C5.0 generally show inferior results, with the lowest r and highest MAD compared to the other tested algorithms. Homogeneous training data for classification trees are clearly inferior to heterogeneous training pixels (∆r = 0.11 and ∆MAD = 1.94%). Equal allocation between classes results in slightly lower correlation and higher MAD than random or area-proportional allocation. For regression trees, random allocation shows 0.07 higher correlations and a notably (4.9%) lower MAD than uniform sampling. Normalization only marginally improves correlation coefficients, but the MAD decreases by 1.5%. The best results for classification trees were obtained for RF-C with area-proportional between-class sample allocation and randomly allocated heterogeneous samples (r = 0.83, MAD = 6.67%), which is almost as good as the best results from regression trees with Cubist, random allocation of heterogeneous pixels with no minimum set, and normalization (r = 0.86 and MAD = 5.93%).
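The four statistics can be computed as in the sketch below. It assumes that the regression line is fitted with the reference memberships as the independent variable and the predicted memberships as the dependent variable; this orientation, the function name `membership_stats`, and the example values are assumptions.

```python
import numpy as np

def membership_stats(reference, predicted):
    """Correlation, mean absolute difference, slope, and intercept between
    reference and predicted class memberships (both in percent)."""
    ref = np.asarray(reference, dtype=float).ravel()
    pred = np.asarray(predicted, dtype=float).ravel()
    r = np.corrcoef(ref, pred)[0, 1]
    mad = np.mean(np.abs(pred - ref))
    slope, intercept = np.polyfit(ref, pred, 1)   # pred ~ slope * ref + intercept
    return {"r": r, "MAD": mad, "slope": slope, "intercept": intercept}

# Hypothetical memberships: predictions compress the range (slope < 1) and
# overestimate low values (intercept > 0).
ref = np.array([0.0, 20.0, 40.0, 60.0, 80.0, 100.0])
pred = 0.8 * ref + 5.0
stats = membership_stats(ref, pred)
```

A slope below 1 combined with a positive intercept, as in this toy case, means low memberships are overestimated and high memberships underestimated.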
Figure 5 shows the spatial distribution of the MAD, computed for each pixel individually. The figure only displays results for RF-C and Cubist; the spatial distribution of the error was similar for C5.0 and RF-R, respectively. For classification trees there are no spatial differences between among-class allocations (area-proportional allocation is shown), and there are no differences between allocations of heterogeneous pixels (uniform is displayed), which corresponds to the spatial patterns shown in Figure 3C. Allocating only homogeneous pixels for training shows notably higher errors in general, and in particular for transitional zones from Deciduous forest to Evergreen forest in Mississippi, Alabama, and Georgia as well as transitions from Shrubland to Cultivated crops to Grassland in Texas and Oklahoma. Regression tree results with random allocation of heterogeneous pixels do not differ from each other (Random-0 is depicted), and normalization has no impact on the spatial distribution of their errors. There are isolated areas with high errors, e.g., the Okefenokee Swamp in southeastern Georgia, for which the membership values for class Wetland were underestimated. Uniform allocation shows high MAD throughout the entire image, which decreases when normalization is applied.
Figure 5.
Spatial mean absolute difference (MAD) of selected image sets of class memberships. Random forest classification (RF-C) with area-proportional sample allocation between classes and homogeneous and heterogeneous, uniformly allocated training pixels. Cubist with uniform and random allocation of heterogeneous training pixels and with and without normalization.
Parameters of the regression line between reference and predicted values show generally better results for classification trees, with higher slopes and lower intercepts, than for regression trees. RF-C with uniform allocation of heterogeneous training pixels shows the highest slope (0.92) and the lowest intercept (0.82%). For regression trees, uniform allocation of heterogeneous pixels without normalization gives the highest slope (0.86) for Cubist at the expense of a very high intercept of 11.71%; the lowest intercept of 2.56% was found for random sampling without normalization.
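The role of slope and intercept can be illustrated with a small sketch. It assumes an ordinary least-squares fit of predicted on reference memberships (the fitting method is an assumption here, not stated in the text), and the function name and data are hypothetical. A slope near 1 and an intercept near 0% indicate that the full dynamic range of memberships is reproduced.

```python
def fit_line(x, y):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical memberships (percent): the predictions compress the range,
# overestimating low values and underestimating high values.
reference = [0.0, 25.0, 50.0, 75.0, 100.0]
predicted = [10.0, 30.0, 50.0, 70.0, 90.0]

slope, intercept = fit_line(reference, predicted)
```

For these made-up values the fit gives a slope below 1 together with a positive intercept, the pattern the text describes as a poorer reproduction of the dynamic range.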
4.4. Area Analysis
A second criterion for classifier performance and analysis of different sampling schemes is the similarity of area estimates.
Table 9 reports the total absolute difference between area proportions of the NLCD2006 map as reference and class membership or discrete maps, expressed in million hectares and in percent of the total study area. For instance, class memberships of the C5.0 classification tree with homogeneous training pixels and random sample allocation between classes (first line in
Table 9) show a difference of 85.47 million ha or 52.76% relative to the NLCD2006 reference.
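The area comparison can be sketched as below. The sketch assumes that the area of a class is the sum of its memberships over all pixels multiplied by the pixel area (81 ha for a 900 m cell), and that the total difference is the sum of absolute per-class differences divided by the reference total. The function names and sample values are hypothetical.

```python
PIXEL_AREA_HA = 81.0  # assumed: 900 m x 900 m = 810,000 m^2 = 81 ha


def class_areas(memberships):
    """Sum per-class memberships (percent) over all pixels into hectares."""
    n_classes = len(memberships[0])
    return [
        sum(pixel[c] for pixel in memberships) / 100.0 * PIXEL_AREA_HA
        for c in range(n_classes)
    ]


def total_abs_difference(ref_areas, map_areas):
    """Total absolute area difference in hectares and in percent of the study area."""
    diff_ha = sum(abs(m - r) for r, m in zip(ref_areas, map_areas))
    diff_pct = 100.0 * diff_ha / sum(ref_areas)
    return diff_ha, diff_pct


# Hypothetical memberships (percent) for three classes in two pixels.
reference = [[60.0, 30.0, 10.0], [0.0, 100.0, 0.0]]
predicted = [[50.0, 35.0, 15.0], [10.0, 80.0, 10.0]]

diff_ha, diff_pct = total_abs_difference(class_areas(reference), class_areas(predicted))
```

Note that per-pixel errors of opposite sign cancel in the area totals of a class, so the area criterion complements rather than repeats the per-pixel MAD.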
Area differences from discrete maps for classification trees show no notable differences among algorithms (average of 24.30% for C5.0, 24.79% for RF-C) and a clearly better performance of heterogeneous pixel allocation (16.02%) compared to 41.59% for homogeneous pixels. For heterogeneous training pixels, equal allocation of samples between classes shows up to 2% lower differences than area-proportional allocation and 5%–8% lower differences than random allocation. For this sample allocation the C5.0 algorithm shows a slightly better result than RF-C. For regression trees, Cubist shows slightly lower differences (average of 16.42%) than RF-R (17.78%). Uniform allocation using Cubist shows the lowest difference (9.46%), which is in line with better overall accuracies when measured with homogeneous test data (H = 100).
For membership estimates from classification trees, on average, there is no notable difference between C5.0 (20.83%) and RF-C (20.24%). Homogeneous training data show clearly inferior results, with on average 39.90% difference compared to 10.85% for heterogeneous training pixels. Area-proportional sample allocation of randomly sampled heterogeneous pixels yields the best result, with 3.93% total difference for RF-C. Memberships from regression trees show lower differences for Cubist (average 23.37%) than for RF-R (29.45%). Random allocation (6.21%) clearly outperforms uniform sampling (66.82%). The table also indicates the importance of normalization (13.27%), because non-normalized results on average cannot correctly estimate the total area (39.56%). For Cubist (RF-R), sampling with Random-50 estimated 110.0% (118.4%), Random-0 99.1% (105.6%), and Uniform 191.9% (209.8%) of the total area as compared to NLCD2006. Total areas of non-normalized results for random sampling are relatively close to the true total area, which is also the best result using Cubist, with an absolute difference of 1.55%.
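Why non-normalized memberships can misestimate the total area is shown in the following sketch. It assumes that normalization rescales the memberships of each pixel proportionally so that they sum to 100%; the exact procedure used in the study may differ, and the function names and values are hypothetical.

```python
def normalize(pixel):
    """Rescale the class memberships of one pixel so that they sum to 100%."""
    total = sum(pixel)
    if total <= 0:
        return list(pixel)  # nothing to rescale; left unchanged in this sketch
    return [100.0 * value / total for value in pixel]


def total_area_percent(memberships):
    """Mapped total area relative to the true total (100% per pixel)."""
    return sum(sum(pixel) for pixel in memberships) / len(memberships)


# Hypothetical raw regression outputs (percent) whose per-pixel sums exceed 100%.
raw = [[70.0, 40.0, 10.0], [30.0, 60.0, 20.0]]
normalized = [normalize(pixel) for pixel in raw]

raw_total = total_area_percent(raw)
normalized_total = total_area_percent(normalized)
```

Because each class is predicted independently by a regression tree, nothing forces the per-pixel sum to 100%; the rescaling restores the total area but does not by itself correct how that area is distributed among classes.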
6. Conclusions
This study tested several sampling methods for discrete classification and class membership estimation (
i.e., continuous land cover) using decision-tree methods. It employed an annual time series of spectral bands of MODIS data at 900 m spatial resolution and a subset of the 2006 National Land Cover Database as a wall-to-wall finer resolution reference map from which training samples were allocated. Spatial co-registration was ensured with baseline Landsat data that also served as response data for discrete map assessment. There are three main conclusions:
- (1)
Regression trees show higher accuracies and lower differences in expected area but classification trees better predict the full dynamic range of values. For tested regression tree methods, results of Cubist are better than random forest regression. Random forest classification performs better than C5.0 with boosted trees.
- (2)
For classification trees, heterogeneous training data perform clearly better than homogeneous pixels for both discrete and continuous land cover mapping. Uniform allocation of heterogeneous pixels is slightly better than random allocation. For between-class sample allocation, area-proportional training data allocation is recommended.
- (3)
For regression trees, normalization is imperative to correctly estimate the total area of class memberships. Random allocation is very important for estimating class memberships. A uniform sampling structure can be recommended for deriving discrete maps.
This study only focused on one study area, the southeastern United States. Further tests in other regions of the world and with different data sets and scales, e.g., 30 m image classification trained with 1 m reference data, will be needed to confirm and generalize its results.