Land Cover Heterogeneity Effects on Sub-Pixel and Per-Pixel Classifications

Per-pixel and sub-pixel are two common classification methods in land cover studies. The characteristics of a landscape, particularly the land cover itself, can affect the accuracies of both methods. The objectives of this study were to: (1) compare the performance of sub-pixel vs. per-pixel classification methods for a broad heterogeneous region; and (2) analyze the impact of land cover heterogeneity (i.e., the number of land cover classes per pixel) on both classification methods. The results demonstrated that the accuracy of both per-pixel and sub-pixel classification methods was generally reduced by increasing land cover heterogeneity. Urban areas, for example, were found to have the lowest accuracy for the per-pixel method, because they had the highest heterogeneity. Conversely, rural areas dominated by cropland and grassland had low heterogeneity and high accuracy. When a sub-pixel method was used, the producer’s accuracy for artificial surfaces increased by more than 20%. For all other land cover classes, sub-pixel and per-pixel classification methods performed similarly. Thus, the sub-pixel classification was only advantageous for heterogeneous urban landscapes. Both creators and users of land cover datasets should be aware of the inherent landscape heterogeneity and its potential effect on map accuracy.


Introduction
Land cover maps derived from remotely sensed images are widely used to investigate human-environment interactions [1]. Accordingly, concern about the accuracy of these maps has grown. If accuracy refers to "the degree of 'correctness' of a map" [2], accuracy assessment is a process of quantifying the degree to which the derived map conforms to the ground "truth" [2]. The confusion matrix is a key means for accuracy assessment, because it quantifies not only the overall accuracy, but also the errors of omission and commission associated with individual map classes [2,3]. However, the confusion matrix is set by the resolution of the imagery and, thus, assumes that each pixel is homogeneous. In areas with small land cover features (e.g., isolated trees) or high land cover heterogeneity (e.g., urban areas), the confusion matrix may not characterize the full extent of land cover accuracy [4]. Over the last decade, there has been a call to move beyond the confusion matrix to include the spatial pattern of classification errors when documenting the accuracy of land cover maps [4][5][6]. Understanding the spatial variation of these errors helps scientists to identify whether regions of interest have sufficient accuracy [4] or to pinpoint regions of low accuracy for further classification enhancement procedures [5,7]. In order to reveal the spatial pattern of these errors, it is necessary to recognize their sources.
In addition to errors associated with sensor limitations and atmospheric effects, landscape characteristics (i.e., the spatial arrangement and properties of land cover) can also be a source of error [8]. Elevation, for example, has been found to both enhance and reduce classification accuracy. In the former case, vegetated regions at higher elevation may have higher classification accuracy, because phenology at higher elevations tends to be more homogeneous [9]. In the latter case, drastic elevation changes can cause variation in brightness values between a horizontal surface and a sloped surface for the same land cover [10].
Patch size and land cover heterogeneity (i.e., the number of land cover types found within a defined spatial window, which can be a pixel, a block of pixels or a study region) can also have an effect on classification accuracy. Typically, larger patches have higher accuracy, while more heterogeneous patches have lower accuracy [11,12]. For example, the probability of a correct classification of a Landsat pixel will be greater than 0.5 if it is contained in a patch of 56 × 56 pixels or if it is homogeneous [13]. Depending on the image resolution and classification method used, the influence of patch size and heterogeneity on accuracy may be different. Heterogeneity can reduce per-pixel classification accuracy by increasing intra-class variation if the image resolution is fine or by increasing the number of mixed pixels (i.e., pixels comprising more than one land cover type) if the image resolution is coarse [14]. The mixed-pixel problem, however, may be resolved by applying sub-pixel classification to unmix the pixels to land cover proportions [15][16][17].
Although sub-pixel classification has potential in a variety of applications [18][19][20], its accuracy is still affected by high intra-class variation caused by land cover heterogeneity [16,21]. While statistical analyses have been performed to examine the effect of land cover heterogeneity on per-pixel classification [11,13,22], the same has not been done for sub-pixel classification. Therefore, the objectives of this study are to: (1) analyze the effects of land cover heterogeneity on sub-pixel classification; and (2) compare sub-pixel classification with per-pixel classification over a broad heterogeneous region to assess in which landscapes sub-pixel classifications may be advantageous.

Study Area
This study examines a 10,000 km² area in central Arkansas (USA), centered on the capital of Little Rock (Figure 1). This area was selected because of its heterogeneity in physiography and land cover. Little Rock is situated at the intersection of four Omernik Level III ecoregions [23]. The Arkansas Valley ecoregion north of Little Rock is characterized by forested hills (31% forest in 2006) that bound large valleys covered in a mixture of agricultural activities (9% cropland). The Mississippi Alluvial Plain to the east is a relatively flat ecoregion historically covered by forested wetlands and several large grasslands, but is now agriculturally dominated (54% cropland). South of Little Rock lies the South Central Plains ecoregion, composed of rolling forested plains (54% forest) with many small patches of urban (14%), agriculture (0.4%) and barren (0.3%) lands. The Ouachita Mountains ecoregion to the west is mostly forested (68%), with steep slopes along east-west trending ridges. Commercial logging is the major land use in these latter two ecoregions. More details of land cover composition within this study area, as well as temporal changes and their drivers, can be found in Jawarneh and Julian [24]. In all, land cover is most heterogeneous in the Arkansas Valley, followed by the South Central Plains, the Mississippi Alluvial Plain and, lastly, the Ouachita Mountains.

Data
Landsat imagery (30-m resolution) and aerial photographs (1-m resolution) were used to analyze the impact of land cover heterogeneity on per-pixel and sub-pixel classification methods. Landsat 5 Thematic Mapper (TM) images (Path 24, Row 36) at Level 1T with six bands (excluding the thermal band) were downloaded from the U.S. Geological Survey (USGS) Earth Explorer. To reduce confusion between cropland and other land covers (e.g., barren and grassland/shrub), this study used multi-seasonal images acquired on 12 April 2010, 19 September 2010, and 6 November 2010. These images were cloud-free and closest to the acquisition date of the National Agriculture Imagery Program (NAIP) photos (June-September 2010). Fast Line-of-sight Atmospheric Analysis of Spectral Hypercubes (FLAASH) [25] was used to convert Landsat images from top-of-atmosphere radiance to surface reflectance. Ultimately, all images were layer-stacked and subset to the predefined 100 × 100 km study area.
Land cover types were classified by a per-pixel classification using the supervised Maximum Likelihood Classification (MLC; [26]) and by a sub-pixel classification using the supervised Fuzzy Maximum Likelihood Classification (FMLC; [27]). Training and validation data for these classification methods were developed based on the National Land Cover Database 2006 (NLCD 2006; [28]) and NAIP aerial photos obtained from the U.S. Department of Agriculture (USDA) Geospatial Data Gateway [29]. These 1-m resolution natural-color photos were acquired during the agricultural growing seasons and were administered by the USDA Farm Service Agency through the Aerial Photography Field Office in Salt Lake City. NAIP photos were available as orthorectified photos with a reported horizontal accuracy of 6 m [29,30].

Classification
The land cover classes in Table 1 were used for both per-pixel and sub-pixel classification methods. The training and validation data were developed using stratified random samples. By aggregating similar NLCD 2006 classes, we identified seven land cover strata: water (NLCD Class 11), artificial surface (21, 22, 23, 24), barren land (31), forest (41, 42, 43), grassland/shrub (52, 71, 81), cropland (82) and wetland (90, 95). We did not incorporate the wetland class in our classification scheme, due to its easy confusion with other land cover classes (e.g., agricultural crops and upland forests) (Table 1) [31], but it was considered in the sampling strategy in order to account for the spectral variation of forest cover in wetland and upland areas. For each sample, its land cover proportions were determined by visually interpreting the 2010 NAIP aerial photos.
Training samples are recommended to be homogeneous (i.e., composed of only one land cover type) to make sure that the radiance histogram of a class is unimodal to facilitate the calculation of statistical measures (e.g., mean and covariance matrix) [32]. However, in a heterogeneous landscape, selecting homogeneous samples can be problematic, because it is difficult to have enough homogeneous samples located in all parts of the study area. Moreover, as the landscape becomes more heterogeneous with mixed pixels, the radiance data extracted from samples may not be representative of other mixed pixels for the same class. That is, pixel radiance is a function of radiance from all classes within the pixel and its neighboring pixels [27]. Accordingly, our samples ("soft" training samples) consisted of anywhere from one to seven land cover types. In all, we used 1575 samples (225 "soft" training samples per land cover class). The size of "soft" training samples was one Landsat pixel (30 × 30 m). Land cover proportions for each training sample were estimated by visually interpreting NAIP photos at a scale of 1:2000. These "soft" training samples were used in sub-pixel classification. For per-pixel classification, however, "hard" training samples were used. These "hard" training samples were created by applying the dominant rule to the "soft" training samples (i.e., each sample was assigned to its dominant land cover type). We used the Maximum Likelihood Classification algorithm (per-pixel) to produce a categorical map with six classes (i.e., wetland classes were converted to the forest class) and the Fuzzy Maximum Likelihood Classification algorithm (sub-pixel) to produce a fractional map constructed by overlaying six proportional images, each of which represented a land cover type. The land cover proportions within each pixel of the fractional map summed to one, as assumed by the FMLC [27].
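The conversion from "soft" to "hard" training samples can be illustrated with a minimal sketch. It assumes that the dominant rule means selecting the class with the largest proportion; the class codes, the tie-breaking order and the function name `harden` are hypothetical and are not taken from the study.

```python
# Hypothetical class codes, following the abbreviations in Table 1.
CLASSES = ("CR", "AR", "BA", "FO", "GR", "WA")


def harden(soft_sample):
    """Return the class with the largest proportion (assumed dominant rule)."""
    total = sum(soft_sample.values())
    if abs(total - 1.0) > 1e-6:
        raise ValueError("land cover proportions must sum to one")
    # Ties are broken by the order of CLASSES; this tie rule is an assumption.
    return max(CLASSES, key=lambda c: soft_sample.get(c, 0.0))


# A "soft" sample: proportions estimated for one 30 x 30 m pixel.
soft = {"AR": 0.45, "GR": 0.35, "FO": 0.20}
hard = harden(soft)  # the "hard" label used for per-pixel training
```

Under this reading, a sample that is 45% artificial surface is labeled entirely as artificial surface for the per-pixel classifier, which is where information on the remaining 55% is lost.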

Class: Definition
Cropland (CR): Areas used for the production of crops, such as corn, soybeans, vegetables, tobacco and cotton. This class also includes fallow cropland.
Artificial surface (AR): Construction materials, such as asphalt, concrete and rooftops.
Barren (BA): Areas of bedrock, bare soil, quarries and any accumulation of earthen material.
Forest (FO): All trees over 5 m, including low-density trees in urban areas.
Grassland/Shrub (GR): Areas with >80% coverage of graminoid or herbaceous vegetation; or areas with >20% coverage of shrubs less than 5 m high.
Water (WA): Areas of open water with <25% coverage of any other class.

Validation
In this study, we used the cross tabulation matrix [33], proposed as a customization of the conventional confusion matrix [34], to validate the performance of the two classification methods. The cross tabulation matrix aims at assessing the accuracy of a classification at the sub-pixel scale. The conventional confusion matrix measures the agreement/disagreement between a categorical map resulting from a per-pixel classification and a set of "hard" validation samples, each of which consists of exactly one class. In contrast, the cross tabulation matrix measures the agreement/disagreement between a fractional map resulting from a sub-pixel classification and a set of "soft" validation samples, each of which may consist of more than one class [33]. For a study area, while there is exactly one conventional confusion matrix, there may be multiple cross tabulation matrices, one for each "soft" validation sample. In a cross tabulation matrix, diagonal entries represent the agreements calculated as the overlapped class proportions between a "soft" validation sample and a corresponding pixel (or block of pixels) in the fractional map. Similarly, off-diagonal entries represent the disagreements calculated based on the non-overlapped class proportions. In addition to the individual cross tabulation matrices created for the "soft" validation samples, an overall cross tabulation matrix can be created for an average validation sample to characterize the accuracy of a classification at the level of the study area. The average validation sample consists of all classes found in all "soft" validation samples. The proportion of each class in the average validation sample is calculated as an average of the proportions of the same class across all "soft" validation samples. Both individual and overall cross tabulation matrices can be used for either sub-pixel or per-pixel classification. In the case of per-pixel classification, "soft" validation samples can be used to validate a categorical map with the assumption that the class proportion of pixels in the categorical map equals 100%.
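The construction of an individual cross tabulation matrix can be sketched as follows. The diagonal follows the description above (the overlapped proportion, taken here as the minimum of the mapped and reference proportions). The off-diagonal rule, which shares the leftover mapped proportion among classes in proportion to their leftover reference proportion, is an assumption for illustration; the text states only that off-diagonal entries derive from the non-overlapped proportions. The function name and the row/column orientation are likewise hypothetical.

```python
def cross_tab(reference, mapped):
    """Cross tabulation matrix for one sample.

    Rows are map classes and columns are reference classes (assumed layout).
    Both inputs are lists of class proportions that sum to one.
    """
    n = len(reference)
    # Agreement: the proportion of each class shared by map and reference.
    diag = [min(m, r) for m, r in zip(mapped, reference)]
    # Non-overlapped proportions left over on each side.
    map_left = [m - d for m, d in zip(mapped, diag)]
    ref_left = [r - d for r, d in zip(reference, diag)]
    total_left = sum(ref_left)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                matrix[i][j] = diag[i]
            elif total_left > 0:
                # Assumed allocation of disagreement between classes i and j.
                matrix[i][j] = map_left[i] * ref_left[j] / total_left
    return matrix


# Three hypothetical classes: the map overestimates class 0 and misses class 2.
reference = [0.5, 0.3, 0.2]
mapped = [0.7, 0.3, 0.0]
matrix = cross_tab(reference, mapped)
agreement = sum(matrix[k][k] for k in range(3))
```

For a categorical (per-pixel) map, the same function applies with `mapped` set to 1.0 for the assigned class and 0.0 elsewhere, matching the 100% assumption described above.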
In this study, individual cross tabulation matrices and overall cross tabulation matrices were created for both per-pixel and sub-pixel classification methods based on 700 "soft" validation samples (100 per sampling class). The size of "soft" validation samples was a block of 3 × 3 pixels to reduce the effect of misregistration between the referenced NAIP photos and the Landsat data [35,36]. For each validation sample, we estimated reference land cover proportions by visually interpreting the NAIP photos at the scale of 1:2000, which was equivalent to the procedure applied to the training samples. In addition, for each classification method, the individual cross tabulation matrices allowed us to determine the producer's and user's accuracies for "soft" validation samples to facilitate the statistical analysis of the impact of heterogeneity on the classification accuracies. Furthermore, the overall cross tabulation matrix provided a general understanding of the misclassifications between classes for each classification method.
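As an illustration of how producer's and user's accuracies follow from a cross tabulation matrix, the sketch below uses the standard definitions: the diagonal entry divided by the reference total (producer's) or by the map total (user's). It assumes rows are map classes and columns are reference classes; this orientation, the function name and the example values are not taken from the study.

```python
def accuracies(matrix):
    """Return (producer's, user's, overall) accuracies from a square matrix.

    Assumes rows = map classes and columns = reference classes.
    A class with a zero total has an undefined accuracy, reported as None.
    """
    n = len(matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    users = [matrix[k][k] / row_totals[k] if row_totals[k] > 0 else None
             for k in range(n)]
    producers = [matrix[k][k] / col_totals[k] if col_totals[k] > 0 else None
                 for k in range(n)]
    overall = sum(matrix[k][k] for k in range(n)) / sum(row_totals)
    return producers, users, overall


# Hypothetical matrix for one 3-class "soft" validation sample.
example = [
    [0.5, 0.0, 0.2],
    [0.0, 0.3, 0.0],
    [0.0, 0.0, 0.0],
]
producers, users, overall = accuracies(example)
```

In this example the third class is present in the reference but absent from the map, so its producer's accuracy is zero while its user's accuracy is undefined.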

Statistical Analyses
In these statistical analyses, land cover heterogeneity was calculated for each validation sample as the number of land cover types existing within the sample. For instance, a sample occupied by two land cover types has a land cover heterogeneity of two classes. Samples were grouped by heterogeneity into six groups, ranging from one to six classes. Because there were too few (<5) samples in group six, this group was excluded from further statistical analyses.
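The heterogeneity measure and grouping described above can be sketched in a few lines. The function names, the dictionary representation of a sample and the `min_group_size` parameter are hypothetical; the threshold of five follows the exclusion rule stated in the text.

```python
from collections import defaultdict


def heterogeneity(sample):
    """Number of land cover types with a non-zero proportion in a sample."""
    return sum(1 for proportion in sample.values() if proportion > 0)


def group_by_heterogeneity(samples, min_group_size=5):
    """Group samples by heterogeneity and drop groups that are too small."""
    groups = defaultdict(list)
    for sample in samples:
        groups[heterogeneity(sample)].append(sample)
    return {h: members for h, members in groups.items()
            if len(members) >= min_group_size}


# Hypothetical samples: five single-class, five two-class, one three-class.
samples = (
    [{"CR": 1.0}] * 5
    + [{"AR": 0.6, "GR": 0.4}] * 5
    + [{"AR": 0.5, "GR": 0.3, "FO": 0.2}]
)
groups = group_by_heterogeneity(samples)
```

Here the lone three-class sample is dropped, mirroring the exclusion of group six in the study.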
In order to test the effect of heterogeneity on classification accuracies, two statistical tests were performed based on the "soft" validation samples. First, the Wilcoxon signed-rank test was conducted to test whether differences in the producer's and user's accuracies between the two classification methods were significant. Second, the Steel-Dwass test was conducted to study whether a change in the number of land cover classes (i.e., heterogeneity) would result in significant change in the producer's, user's and overall accuracies of both classification methods. These two non-parametric tests were conducted because the producer's and user's accuracies of the two classification methods were not normally distributed.
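To make the paired comparison concrete, the following sketch computes the two rank sums on which the Wilcoxon signed-rank test is based. It is an illustration only: the function name and the accuracy values are hypothetical, zero differences are discarded (one common convention), and no p-value is computed. In practice, a statistics library such as SciPy (`scipy.stats.wilcoxon`) would supply the p-value.

```python
def signed_rank_sums(x, y):
    """Return (W+, W-), the rank sums of positive and negative differences."""
    # Paired differences; pairs with no difference are discarded.
    diffs = [a - b for a, b in zip(x, y) if a != b]
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    pos = 0
    while pos < len(order):
        # Find the run of tied absolute differences starting at pos.
        end = pos
        while (end + 1 < len(order)
               and abs(diffs[order[end + 1]]) == abs(diffs[order[pos]])):
            end += 1
        # Tied values share the average of their 1-based ranks.
        average_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = average_rank
        pos = end + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical producer's accuracies (%) for the same four validation samples.
sub_pixel = [90, 80, 70, 60]
per_pixel = [80, 50, 70, 80]
w_plus, w_minus = signed_rank_sums(sub_pixel, per_pixel)
```

A large imbalance between the two sums suggests that one method tends to be more accurate on the same samples; the test statistic is conventionally the smaller of the two.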

Land Cover Map and Accuracy
The categorical per-pixel land cover map (Figure 1) and the fractional sub-pixel land cover maps (Figure 2) provide two different perspectives of the same landscape with similar accuracy. The per-pixel classification had an overall accuracy of 81.9%, and the sub-pixel classification had an overall accuracy of 82.3% (Table 2). Although these overall accuracies were slightly lower than the common threshold of 85% [37], they were acceptable, because: (1) they were in the range of the accuracies published during the last decade [38]; (2) they exceeded the level of 75% suggested by Goodchild et al. [39]; and (3) they were higher than the regional accuracy of NLCD 2001, Level I (79% for Region 7; [40]). Indeed, this South-Central U.S. region has the lowest overall accuracy for the NLCD, likely due to high landscape heterogeneity and large areas of ephemeral wetlands [40]. Among all classes, cropland had the highest producer's and user's accuracies (Figure 3). Using multi-seasonal Landsat data to identify cropland definitely improved these accuracies, but we argue in the next section that it has more to do with croplands having low heterogeneity. Nevertheless, misclassifications did occur for croplands, particularly between fallow cropland and barren lands (Table 2). Barren and artificial surfaces were also confused. In the per-pixel classification, 27% (2.33/8.48) of the artificial surface was misclassified as barren and 9% (0.78/8.32) of the barren surface was misclassified as artificial surface. In the sub-pixel classification, 21% (1.8/8.48) of the artificial surface was misclassified as barren and 14% (1.15/8.32) of the barren surface was misclassified as artificial surface. An exploration of these classification errors revealed that the misclassifications between the artificial surface and the barren surface occurred due to spectral similarities between high-albedo artificial surfaces (e.g., rooftops) and barren surfaces (e.g., sand and dry soil), as well as between low-albedo artificial surfaces (e.g., asphalt and roads) and barren surfaces (e.g., gravel) [41][42][43][44]. Artificial surfaces were also confused with grassland/shrub (nearly 8% for per-pixel and 3% for sub-pixel), as well as forest (7% for per-pixel and 6% for sub-pixel). The most likely reason was the spectral similarity in the near-infrared band between high-albedo artificial surface and vegetation [42]. Barren surfaces were confused with cropland, grassland/shrub and forest, especially along class boundaries, such as narrow and/or linear farm paths, forest paths and dirt shoulders along tree-lined roads. These misclassifications showed that both per-pixel and sub-pixel classification methods had difficulty in identifying narrow/linear features [12]. This difficulty in identifying narrow/linear features might also explain some of the misclassifications between forest and grassland/shrub, which were commonly found in low-intensity residential areas and open woodlands. Spectral confusion was also a likely culprit for some of these misclassifications [45].
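The misclassification percentages quoted above are ratios of an off-diagonal entry of the overall cross tabulation matrix to the total proportion of the class in question. A short sketch reproduces the arithmetic from the figures given in the text; the function and variable names are hypothetical.

```python
def misclassified_share(entry, class_total):
    """Percentage of a class allocated to another class."""
    return 100.0 * entry / class_total


# Values quoted in the text (from Table 2).
ARTIFICIAL_TOTAL = 8.48
BARREN_TOTAL = 8.32

per_pixel_ar_as_ba = misclassified_share(2.33, ARTIFICIAL_TOTAL)
per_pixel_ba_as_ar = misclassified_share(0.78, BARREN_TOTAL)
sub_pixel_ar_as_ba = misclassified_share(1.8, ARTIFICIAL_TOTAL)
sub_pixel_ba_as_ar = misclassified_share(1.15, BARREN_TOTAL)
```

Rounded to whole percentages, these four values match the 27%, 9%, 21% and 14% reported above.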
Wilcoxon signed-rank tests were conducted to test the differences between the producer's accuracies of per-pixel vs. sub-pixel classification methods and the differences between the user's accuracies of per-pixel vs. sub-pixel classification methods for each land cover class (Figure 3). User's accuracies were similar for all land cover classes, except for barren (p < 0.05) and grassland/shrub surfaces (p < 0.001), which were significantly higher for sub-pixel classification, and artificial surface, which was significantly higher (p < 0.05) for per-pixel classification. The producer's accuracy of sub-pixel classification was significantly higher than that of per-pixel classification for artificial surface (p < 0.001) and forest (p < 0.001) classes (Figure 3). Most notably, while the difference between the producer's accuracy of sub-pixel vs. per-pixel classification for most classes was small (<5%), sub-pixel classification performed far better for artificial surfaces, with a producer's accuracy that was 20% higher than for per-pixel classification. This finding is consistent with claims that sub-pixel classification is more advantageous than per-pixel classification, because it relaxes the assumption that pixels are homogeneous [46,47]. Many studies, especially urban studies characterizing artificial surfaces with remote sensing data, have been successful in using sub-pixel classification to develop fractional maps of land cover types [19,48,49]. For other classes, the preference of sub-pixel vs. per-pixel classification remains a matter of debate, because the producer's and user's accuracies were not consistently higher for one classification over the other (Figure 3). However, classifiers other than FMLC (e.g., spectral mixture analysis, fuzzy c-means, neural network) could have provided a different result where sub-pixel classification would have been better than per-pixel classification for classes other than artificial surfaces. A note of caution is that Landsat data, with a resolution of 30 m, likely do not produce many mixed pixels, which may have masked differences between per-pixel and sub-pixel classification methods in our study. For coarser resolution data, such as MODIS, sub-pixel classifications may be needed to properly characterize land cover/use in heterogeneous landscapes.

Heterogeneity and Its Effect on Classification Accuracy
Both the per-pixel (Figure 1) and sub-pixel (Figure 2) land cover maps show varying levels of heterogeneity across the study area. Cropland-dominant areas tended to have the lowest heterogeneity, while urban areas tended to have the highest heterogeneity. Among the six classes, barren validation samples were most heterogeneous, with an average heterogeneity of 3.0 classes (i.e., any "soft" validation samples containing barren land typically had three classes), followed closely by artificial surfaces at 2.9 (Figure 3). In contrast, cropland validation samples were the least heterogeneous, with an average heterogeneity of 2.1 classes. Given that barren and artificial surface were mostly found in urban areas, while grassland/shrub and cropland were found in rural areas, the trend of decreasing heterogeneity from barren to cropland suggests that the urban landscape is more heterogeneous than the rural landscape.
A rural-urban heterogeneity trend was also evident in the producer's and user's accuracies of the per-pixel classification, where urban areas usually had lower accuracy and rural areas had much higher accuracy (Figure 3). In other words, where land cover was less heterogeneous, the accuracies of per-pixel classification were higher. This result was similar to the findings from other scholars for per-pixel classification [11,13,22]. Interestingly, we found that this trend generally held for sub-pixel classification, as well. The exception was artificial surfaces, due to their exceptionally high producer's accuracy (see the previous section). Overall, the accuracy of per-pixel classification is expected to decrease as pixels become more heterogeneous [11,13]. This is likely the reason why most sub-pixel classifications have been conducted in urban landscapes with large areas of artificial surfaces, e.g., [50,51].
The effect of heterogeneity on classification accuracy was further analyzed by testing the differences in the mean rank of the overall accuracy, as well as the producer's and user's accuracies among groups of heterogeneity using the Steel-Dwass non-parametric test. The results demonstrated that the accuracies of validation samples having one class (heterogeneity = 1 class) were significantly and distinctly the highest, followed by validation samples with two classes. If a validation sample had three or more classes, its accuracy dropped considerably, and in general, there were no significant differences among validation samples with a heterogeneity of three, four or five classes. These findings further support our previous results that increasing land cover heterogeneity reduces the accuracy of both per-pixel and sub-pixel land cover classification methods.
While the variation in accuracy among validation samples with three or more classes was not significant, such variation could be due to other factors, such as differences in the spatial extents of classes within validation samples or misregistration between NAIP photos and Landsat data. In fact, classes having a small spatial extent often have low accuracy in per-pixel classification [11,12]. The effect of misregistration on the classification accuracy in this study was expected to be small due to the use of window-based (3 × 3 pixels) validation samples. Indeed, other studies indicated that increasing the window size of validation samples could result in increased accuracy [35]. However, this study used a window size of 3 × 3 pixels, because with a larger window, it would be difficult to visually estimate proportions of all land cover types within the samples, due to their high heterogeneity.

Conclusion
From our study of a broad heterogeneous landscape, we found that: (1) both per-pixel and sub-pixel classification accuracies are reduced by land cover heterogeneity; and (2) sub-pixel classification performs better overall than per-pixel classification for artificial surfaces. Therefore, when faced with the decision of using a per-pixel or sub-pixel classification, sub-pixel classification may only become advantageous for heterogeneous urban landscapes. For all other types of landscapes, a sub-pixel classification is probably not worth the extra time required to develop its "soft" training and "soft" validation samples. Our experience showed that the time needed to visually estimate all land cover proportions within 50 training samples for a sub-pixel classification was 15 times longer than the time needed to visually identify the dominant land cover type within the same 50 training samples for a per-pixel classification. Regardless of which type of classification is used or the intended application, both creators and users of land cover datasets should be aware of the inherent landscape heterogeneity and its potential effect on map accuracy.

Figure 1. The study area centered on Little Rock, Arkansas, USA. The base layer is the land cover map (2010; 30-m resolution) generated by the per-pixel classification in this study. The four Omernik Level III ecoregions are outlined in black.

Figure 3. The mean producer's and user's accuracies of per-pixel and sub-pixel classification methods together with the mean number of classes across all 3 × 3 pixel "soft" validation samples. The mean producer's and user's accuracies were calculated as the averages of all producer's and user's accuracies taken from the individual cross tabulation matrices developed for all "soft" validation samples. The error bars represent one standard error. The asterisks indicate * p < 0.05, ** p < 0.01 and *** p < 0.001 from the Wilcoxon signed-rank test.

Table 1. Land cover classes adapted from the National Land Cover Database 2006 (NLCD 2006).

Table 2. The overall cross tabulation matrix for the two classification methods: per-pixel (normal text) and sub-pixel (bold italicized text).