Designing an Experiment to Investigate Subpixel Mapping as an Alternative Method to Obtain Land Use / Land Cover Maps

Various subpixel mapping (SPM) methods have been proposed as downscaling techniques to reduce uncertainty in classifying mixed pixels. Such methods can provide category maps of a higher spatial resolution than the original input images. The aim of this study was to explore and validate the potential of SPM as an alternative method for obtaining land use/land cover (LULC) maps of regions where high-spatial-resolution LULC maps are unavailable. An experimental design was proposed to evaluate the feasibility of SPM for providing the alternative LULC maps. A case study was implemented in the Jingjinji region of China. SPM results for spatial resolutions of 500–100 m were derived from a single 1-km synthetic fraction image using two representative SPM methods. The 1-km synthetic fraction image was assumed to be error free. Accuracy assessment and analysis showed that overall accuracies of the SPM results were reduced from about 85% to 75% with increasing spatial resolution, and that producer’s accuracies varied considerably from about 62% to 93%. SPM performed best when handling areal features in comparison with linear and point features. The highest accuracies were achieved for areas with the lowest complexity. The study concluded that the results from SPM could provide an alternative LULC data source with acceptable accuracy, especially in areas with low complexity and with a large proportion of areal features.


Introduction
Various aspects of land use and land cover (LULC) are important in geo-information, environmental, and socioeconomic applications [1].Remote sensing is a cost-effective and efficient means to obtain data for LULC monitoring over large land surfaces at a variety of spatial and temporal scales [2].The presence of mixed pixels in remote sensing images, however, can cause difficulties in the extraction of accurate LULC information because their spectral characteristics reflect the composite signature of different LULC classes [3].Soft classification (or spectral unmixing) methods have been proposed to address this problem by estimating the memberships of LULC classes within a pixel [4].Unfortunately, such techniques might be unable to provide any indication of the spatial distribution of such LULC classes within each mixed pixel [5].Alternatively, subpixel mapping (SPM) may be used as a post-processing step of soft classification to reduce the uncertainty in the spatial distribution of the subpixels within each mixed pixel.SPM, initially proposed by Atkinson [5], decomposes each mixed pixel into a fixed number of subpixels based on a zoom factor and it then assigns these subpixels to specific LULC classes [6,7].It can be considered a classifier that transforms fraction images (i.e., the output of soft classifications) into a hard classification map at the subpixel scales [8].In contrast to common classifiers, SPM can produce a LULC map with finer spatial resolution than the original moderate-or low-spatial-resolution input image [8].Thus, SPM offers a solution to the tradeoff between the spatial resolution of a sensor and its spectrum.It can process mixed pixels in low-, moderate-, and high-spatial-resolution imagery and thus save on the cost of obtaining images with higher spatial resolution [9,10].In particular, in the analysis of LULC time series data, e.g., for change detection, SPM can be applied to low-or moderate-resolution images acquired in an earlier phase to produce LULC maps with spatial resolution consistent with maps produced from high-spatial-resolution images acquired during the current phase.It is therefore attractive to use SPM to derive finer-resolution LULC maps from coarser-resolution remote sensing images.
Since 1997, a number of techniques related to SPM have been developed.These include Hopfield neural networks [9,11], back-propagation neural networks [12], pixel swapping [13,14], spatial attraction models [15,16], indicator co-kriging [17,18], Markov random fields [19][20][21], advanced artificial intelligence-based algorithms [22,23], interpolation-based methods [24][25][26], and geometric methods [27,28].To date, more than 200 articles on SPM methods have been published, a few of which have focused on applications for special classes, such as urban tree identification [29], urban building extraction [30], and lake area estimation [31].Generally, however, these articles have focused on the development of SPM methods and performed relatively well.Little if any consideration has been given to whether SPM could feasibly provide alternative LULC data in practical applications for large and complex regions where high-resolution LULC maps are lacking.Moreover, no research has examined how landscape heterogeneity [32], the different spatial distribution patterns (i.e., areal, linear, or point patterns) of geographical objects [8,33], and zoom factors affect the performance of SPM in practical applications over large areas.
The objective of this study was to propose an experimental design to investigate the feasibility of using SPM to obtain LULC data in the absence of high-spatial-resolution LULC maps.Meanwhile, a combined index (CI) of landscape shape index (LSI) and areal pattern proportion (APP) was developed to explore the relationship between land surface complexity and the performance of SPM results.In the experimental design, the impact factors, such as zoom factor and land surface complexity, of SPM were explored to provide indications regarding practical applications.Different to most previous SPM studies that have concentrated on methodological development, this study was intended to provide practical guidelines for users to obtain LULC data using SPM.To assess the effectiveness of the experimental design, a case study of the Jingjinji region of China was implemented.

Experimental Design
The experimental design is presented in Figure 1.First, SPM is implemented to obtain finer-resolution LULC maps.The specific steps include the extraction of fraction images and reference data from remote sensing images or existing datasets, setting more than three zoom factors, and selecting at least two representative SPM algorithms to generate SPM maps.Second, the performance of the SPM is evaluated by analysis of the results, including accuracy assessment indices, i.e., overall accuracy (OA), Kappa coefficient (KP), producer's accuracy (PA), and user's accuracy (UA), spatial distribution patterns of geographical objects and landscape heterogeneity.Third, the impact of the SPM results is analyzed based on the zoom factors, different class levels, and SPM results in different subareas with different LSI, APP and CI.

Step 1: Implementation of SPM Methods
We first use SPM methods to obtain a classification map with a spatial resolution finer than that of the input data.Input data for SPM are the fraction images that express the proportion of each class within individual pixels.Fraction images can be used to determine the number of subpixels for each class within mixed pixels with a given zoom factor.The zoom factor is set to determine how many subpixels are in a pixel and to determine the spatial resolution of the SPM results.

Extracting Fraction Images and Reference Data
We distinguish two ways to obtain fraction images: soft classification from real remote sensing images and generation from existing LULC datasets such as upscaled data from high spatial-resolution classification data.For real remote sensing images, soft classification uses a classifier, such as a Bayesian or a support vector machine soft classifier, to directly generate fraction images with values ranging from 0 to 1 that indicate the proportion or possibility of pixels belonging to each class.Another way to extract fraction images is to aggregate existing classification data to simulate the output of soft classification, such as the Resources and Environmental Scientific Data Center of the Chinese Academy of Sciences [34], National Land Cover Data 1992 [35], and Global Land Cover datasets [36].Usually, a mean filter (such as 3 × 3 pixels) or a regular polygon are used to calculate the class proportions within the mean filter or the polygon from the existing classification data to produce fraction images [7,13,17,18,33].
Reference data can be obtained in two ways: (1) validation points of ground truth for the SPM of real remote sensing images; and (2) reference maps for the SPM of synthetic fraction images degraded from existing classification data.Note that, for synthetic fraction images, each SPM map should have its corresponding reference map to ensure they have the same spatial resolution and size.The reference map is classified maps, which can be aggregated from existing classification datasets.More details for the generation of the reference maps can be discussed in Section 3.1.

Select SPM methods
Step 1

All classes Each class
An assessment of SPM as an alternative method to obtain land use data

Hard classification
Fraction images

Reference data
Each class PA UA SPM results

Soft classifier Aggregation
Step 2 Step 3 Figure 1.Flowchart of the experimental design.

Step 1: Implementation of SPM Methods
We first use SPM methods to obtain a classification map with a spatial resolution finer than that of the input data.Input data for SPM are the fraction images that express the proportion of each class within individual pixels.Fraction images can be used to determine the number of subpixels for each class within mixed pixels with a given zoom factor.The zoom factor is set to determine how many subpixels are in a pixel and to determine the spatial resolution of the SPM results.

Extracting Fraction Images and Reference Data
We distinguish two ways to obtain fraction images: soft classification from real remote sensing images and generation from existing LULC datasets such as upscaled data from high spatial-resolution classification data.For real remote sensing images, soft classification uses a classifier, such as a Bayesian or a support vector machine soft classifier, to directly generate fraction images with values ranging from 0 to 1 that indicate the proportion or possibility of pixels belonging to each class.Another way to extract fraction images is to aggregate existing classification data to simulate the output of soft classification, such as the Resources and Environmental Scientific Data Center of the Chinese Academy of Sciences [34], National Land Cover Data 1992 [35], and Global Land Cover datasets [36].Usually, a mean filter (such as 3 ˆ3 pixels) or a regular polygon are used to calculate the class proportions within the mean filter or the polygon from the existing classification data to produce fraction images [7,13,17,18,33].
Reference data can be obtained in two ways: (1) validation points of ground truth for the SPM of real remote sensing images; and (2) reference maps for the SPM of synthetic fraction images degraded from existing classification data.Note that, for synthetic fraction images, each SPM map should have its corresponding reference map to ensure they have the same spatial resolution and size.The reference map is classified maps, which can be aggregated from existing classification datasets.More details for the generation of the reference maps can be discussed in Section 3.1.

Setting Zoom Factors
The zoom factor Z is a critical parameter that divides each coarse pixel in the fraction images into Z ˆZ finer subpixels.With the zoom factor of Z, the number of subpixels for each class within mixed pixels can be calculated by multiplying the class proportion by Z 2 .Theoretically, it could be any integer value (>1) but an appropriate zoom factor should meet three requirements: (1) express the surface adequately; (2) retain the accuracy within an acceptable range; and (3) provide SPM output at the desired spatial resolution for practical applications, e.g., change detection requires maps with the same spatial resolution pertaining to different times.Therefore, selection of the optimal zoom factor could be considered the search for the optimal scale of SPM and several zoom factors should be tested and compared.

Selecting the SPM Algorithm
Different methods are suited to different situations because each has its own characteristics.Most SPM methods are based on spatial correlation, and therefore they are good at dealing with areal patterns.Some of these, such as pixel swapping [13] and Hopfield neural networks [9], involve iteration processes; thus, they require longer processing time.Others, such as geostatistics [37], are based on pattern prediction, and are more suited to dealing with point features.These methods use training images to predict subpixel distributions, and therefore auxiliary high-spatial-resolution data are essential for the training sample.Selection of any SPM method should meet the requirements of the application according to the specific situation, e.g., different land surfaces or types of LULC classes.To explore the potential of different SPM methods, two or more representative SPM methods should be used.

2.2.
Step 2: Assessment of the SPM Output SPM output is usually assessed using a confusion matrix and its statistical metrics.The reference data for the accuracy assessment are considered the correct classification of the surface; thus, the agreement between the SPM output and the reference data is considered to reflect the effectiveness of the SPM.In SPM, the spatial pattern of LULC is also significant in determining the classification results and to their accuracies.Thus, the SPM output is assessed in terms of accuracy assessments and the values of the landscape pattern index [38].

Accuracy Assessment
The accuracy of the SPM output is evaluated using a confusion matrix [2].This matrix presents the differences between the SPM output and the reference data, where the diagonal elements show correct classifications and the others represent commission and omission errors for each class [2].Four accuracy assessment indices (OA, KP, PA, and UA) are extracted from the confusion matrices to provide a quantitative assessment of the SPM results [2].Both OA and KP measure the degree of agreement between the classification results and the reference data [2].The difference between OA and KP is that KP excludes the possibility that a pixel might be classified correctly by randomness [2].PA measures the ratio of pixels within a certain class in the reference map that are classified correctly in the classification map.Similarly, UA measures the ratio of pixels classified as a specific class in the classification map with those that are actually in that class in the reference map.In the experimental design, the accuracy of the SPM results with different zoom factors, at different class levels, and using different SPM methods are assessed and compared.Note, similar to traditional accuracy assessments for classified maps of remote sensing imagery, all subpixels within both pure and mixed pixels of fraction images can be used in the accuracy assessment, because the experimental design aims to investigate the overall performance of SPM for practical applications and because the LSI and the APP are calculated in an area including both pure and mixed pixels.

Landscape Heterogeneity Analysis
The LSI represents the measurement of spatial patterns, which could be the measurement of area/edge, shape, core area, and aggregation.It can be calculated at both the landscape and the class level, reflecting the measurement of the spatial pattern of the classification data over an entire area or for each class, respectively.Of the various indices, the LSI is chosen as a measurement of surface complexity because it has a direct interpretation for landscape heterogeneity [39].It is defined as the total length of patch edges within a landscape, divided by the total area, and adjusted by a constant for a square standard.It is represented as where E is the edge length and A is the patch area.The value of LSI begins at one and increases without limit as the landscape shape becomes increasingly irregular.Note that the calculation of LSI needs classified maps and each classified map can produce a LSI value by Equation (1).

Areal Pattern Proportion
The APP is an index with which to evaluate the areal spatial pattern, because most existing SPM methods are suited to areal features [16,18,24,25,40].The APP is equal to the area of features with areal pattern, divided by the total area: where A area is the area of the areal features, A linear is the area of features with linear pattern, and A point is the area of features with point pattern [33].The sum of A area , A linear and A point is the total area.A area , A linear and A point are extracted by a feature pattern recognition method [33].First, all features are segmented by the seeded region-growing model from each fraction image.Second, the shape-density index is calculated for each extracted feature.Last, features are divided into the three types according to different values of shape-density index.More details can be found in [33].

Combined Index
Both the LSI and the APP affect the performance of the SPM output.A CI of LSI and APP is developed next to assess the performance of the SPM results: where α and β are control parameters that balance the contribution of LSI and APP.To control the range of parameters (´1 to 1) and of CI (0 to 1), the LSI is normalized into the range of 0 to 1.

Representative Spatial Pattern Classes
To investigate the performance of SPM in dealing with features with different spatial patterns, some LULC classes are selected as representatives of areal, linear, and point features.In Geographical Information Sciences research, geographical features are usually generalized and classified into three types, i.e., polygon, polyline, and point, and these three patterns have been considered in many applications [33].Areal features refer to high-resolution (H-resolution) cases where pixels are smaller than the features of interest, such as cropland and lakes, while point features refer to low-resolution (L-resolution) cases where pixels are larger than the features of interest [8], e.g., buildings.Linear features differ from both areal and point features because pixels within this pattern are both shorter and wider than the features of interest and they typically retain connectivity [33], e.g., roads.

SPM Performance with Different Zoom Factors
Based on the accuracy assessment of all SPM output, the change of accuracy with different zoom factors is analyzed.The accuracy might decrease with increasing zoom factor [6,24,40,41], and therefore we are interested in establishing at which zoom factor the accuracy stabilizes.

SPM Performance of Different Classification Levels
Accuracies of SPM output at different classification levels are analyzed and compared.In a hierarchical classification system, different classification levels reflect the surface to differing degrees and they have different class amounts that also influence the SPM output.

SPM Performance of Different SPM Methods
The SPM output produced by different SPM methods is compared, and both similarities and differences are explored.The accuracy range and its change over the entire study area and for each class could also be discussed.

SPM Performance in Subareas with Different LSI Values
The spatial distribution of measurements calculated in subareas could be used to analyze SPM performance with different LSI values.The change of accuracy with LSI value can be illustrated using a scatterplot of measurements in subareas, where the LSI value is plotted as the x-axis and the accuracy assessment, such as OA and PA, is plotted as the y-axis.The accuracy assessment for different LSI ranges and the correlations between LSI and the accuracy assessment are calculated to explore the relationship between accuracy and LSI.

SPM Performance of Different Spatial Distribution Patterns
LULC classes with areal, linear, and point features are selected as representative features.Their accuracy assessments are compared to investigate the performance of SPM in dealing with different types of feature.Furthermore, the relationship between the APP values and accuracy assessments is discussed, as are the scatterplots of measurements in subareas for the APP values and accuracy assessments, and the correlation values between the APP and accuracy assessment.

Analysis of SPM Performance with CI Value
To analyze the relationship of SPM performance with both LSI and APP, the LSI and APP values are used as variables in a regression of accuracy assessments to obtain an estimation equation.Thus, when the LSI and APP values of a subarea are known, the equation could be used to estimate the performance of SPM and consequently, to determine whether the use of SPM is appropriate.Note, when exploring the relationship of the SPM results with the LSI, APP, and CI values, the LSI, APP, and CI indices and the accuracy of SPM results are calculated in county subareas.

Study Area and Data
The study area of the Jingjinji region, encompassing Beijing, Tianjin, and Hebei provinces, comprises an area of 220,000 km 2 .The Jingjinji region continues to play a considerable role in the regional development of China because it contains the political and cultural center of Beijing and a significant port city (Tianjin).The area mainly comprises cropland, most of which is represented by dry land on the plains.The northern and western parts of the study area are characterized by hilly topography dominated by woodland and grassland.Built-up areas are dispersed widely throughout the plains but are more concentrated along the coast.
The fraction images are degraded from land use data with 30-m spatial resolution (Figure 2a,b), which are derived from Landsat TM imagery by means of human-machine interactive interpretation [42].The input data for the SPM are synthetic land use fraction images with 1-km spatial resolution from 2005 (Figure 2c,d), provided by the Resources and Environmental Scientific Data Center of the Chinese Academy of Sciences.This land use data mainly focus on classes associated with human activities.In contrast to this, land cover data is more related to the physical and biological cover of the earth's land.To meet the requirements of many land resource management and ecological applications in China, the Chinese Academy of Sciences launched a project for generating land use datasets over China.This project was supported by many Chinese key technology research and development programs.The 1-km land use fraction images can be freely obtained from the Resources and Environmental Scientific Data Center, whereas the original 30-m spatial resolution classification data are not freely available.The steps for the generation of 1-km-resolution fraction images and reference data are as follows.(1) Polygons of the grids (such as the 1-km 2 grid) covering the Jingjinji region are produced; (2) Proportions of the classes within each polygon are determined by dividing the area of each class by the area of the polygon; (3) The estimated class proportions within each polygon are converted into fraction images; (4) Reference maps are obtained by transforming (or hardening) these fraction images with different spatial resolutions into labeled maps [16].Additional details about the 1-km fraction images and the 30-m land used data can be found at [34].
In practice, 1-km-spatial-resolution land use datasets are often too coarse to meet the requirements of many geo-information applications [42].Therefore, according to the experimental design, SPM methods are employed to produce land use maps at a scale finer than 1-km spatial resolution and they are freely provided for practical applications.The downscaling of these 1-km land use fraction images serves as an alternative means for sharing these data.Reference data used in the accuracy assessment are aggregated from the 30-m land use map and they have the same spatial resolution as the SPM output.Errors in the generation of the 30-m land use maps from the remote sensing imagery are not considered here, i.e., we assume the 30-m land use map is error free.
The classification system used for these data consists of several land use categories grouped into a hierarchical nomenclature at two levels with names and codes, as shown in Table 1.The classification system was designed by the Resources and Environmental Scientific Data Center of the Chinese Academy of Sciences [34].To explore the performance of SPM for different classification systems, the two classification levels are tested.Level I contains six land use categories: cropland, woodland, grassland, water, built-up, and bare land areas, each of which is separated into several subcategories at Level II.The study area incorporates all six land use categories at Level I and 22 of the 25 land use subcategories at Level II.

Experiments and Methods
To apply the experimental design as described in Section 2, the 1-km land use fraction images as the input of SPM and reference data are generated by the four steps described in Section 3.1.The zoom factor of the SPM is set at values from 2 up to 10 to generate the SPM results with spatial resolutions of 500, 333, . . ., and 100 m.For each zoom factor, there is a reference map that has the same spatial resolution as the corresponding SPM result.For the accuracy assessment of the entire study area, confusion matrices are calculated by comparing the SPM results with the corresponding reference images.The OA, KP, PA, and UA values are extracted as statistical indices from the confusion matrices.Note that, when exploring the relationships of LSI, APP, and CI to the accuracies of the SPM results, the LSI, APP and, CI are calculated in units of county from the 250 m reference map at Level I while OA and PA are also calculated in units of county by comparing the 250 m reference map with 250 m SPM maps at Level I.
To explore the potential of different SPM methods, two representative SPM methods based on spatial dependence are chosen: spatial-attraction-model-based subpixel mapping (SAMSPM) and vectorial-boundary-based subpixel mapping (VBSPM).SAMSPM was proposed originally by Mertens et al. (2006) and, subsequently, it has been applied successfully in several tests [6,16,22,23].It is considered an efficient method for predicting the spatial distribution of LULC classes at subpixel scale.VBSPM was proposed by Ge et al. (2014) to improve the geometric SPM method [28].Similar to SAMSPM, VBSPM is based on spatial dependence.These two SPM methods were chosen because neither involves iteration and they are both more efficient than other methods, e.g., pixel-swapping [13,14] for the generation of SPM maps with 22 classes over a large and complex area.

SAMSPM
SAMSPM uses the attraction between a subpixel and its neighboring pixels to represent the measurement of spatial dependence in estimating the spatial distribution of subpixels within a mixed pixel [15].The attraction between a subpixel and its neighboring pixels is calculated using fraction values in the neighborhood as where p ab (c) is the raw attraction of subpixel p ab (a, b = 1, 2, . . ., S; S is the zoom factor) for class c (c = 1, 2, . . ., C; C is the class number); P ij (c) is the fraction value of pixel P ij (i = 1, 2, . . ., M, j = 1, 2, . . ., N; M and N are the width and length of the image, respectively) for class c in neighborhood N t [p ab ] of subpixel p ab ; d(p ab ,P ij ) is the distance between pixel P ij and subpixel p ab .Avg{} is the operator that gets the mean of its arguments.Further details about SAMSPM can be found in [15].

VBSPM
VBSPM first geometrically partitions a mixed pixel and then estimates the locations and lengths of the segments for each LULC class polygon within the mixed pixel using a vectorial boundary extraction model, as shown in Equation (5).A ray-crossing algorithm is then used to determine the LULC class of each subpixel within each vectorial boundary [27]: , P pi`1qc ‰ P pi´1qc p1 ´Pic q 1  2 , P pi`1qc " P pi´1qc where V ic (0) and V ic are the start and final locations of segment i (i = 0, 1, . . ., 7) corresponding to pixel P i (i = 0, 1, . . ., 7) of class c (c = 1, 2, . . ., C); P ic , P (i+1)c , and P (i-1)c indicate the proportions of class c in pixel P i and to the left and right of neighboring pixel P i , respectively.Further details can be found in [27].

SPM Output and Accuracies
Using the 1-km fraction images in Figure 2c,d as the inputs of SPM methods, representative output of SAMSPM and VBSPM with spatial resolutions of 500 m (i.e., zoom factor = 2) and 100 m (i.e., zoom factor = 10) at Levels I and II are shown in Figure 3.A similar spatial pattern is observed from the study area and more detail is obviously resolved with increasing spatial resolution.It indicates that SPM can produce maps at different spatial resolutions from the 1-km fraction images and that these maps are visually similar to the original classification map.

SPM Output and Accuracies
Using the 1-km fraction images in Figure 2c,d as the inputs of SPM methods, representative output of SAMSPM and VBSPM with spatial resolutions of 500 m (i.e., zoom factor = 2) and 100 m (i.e., zoom factor = 10) at Levels I and II are shown in Figure 3.A similar spatial pattern is observed from the study area and more detail is obviously resolved with increasing spatial resolution.It indicates that SPM can produce maps at different spatial resolutions from the 1-km fraction images and that these maps are visually similar to the original classification map.

Entire Area
The OAs, KPs, PAs, and UAs of the SPM output obtained for Levels I and II are presented in Figure 4.It shows that OA decreases from 85% to 77% at Level I and from 83% to 75% at Level II, as the spatial resolution changes from 500 to 100 m.Similarly, KP decreases from 0.761 to 0.661 and from 0.751 to 0.656 at Levels I and II, respectively.Note, for a zoom factor greater than five, the accuracy assessments remain stable with a value of OA approximately equal to 78% at Level I and to 75% at Level II; generally, the OA is 2%-3% larger at Level I than at Level II.

Entire Area
The OAs, KPs, PAs, and UAs of the SPM output obtained for Levels I and II are presented in Figure 4.It shows that OA decreases from 85% to 77% at Level I and from 83% to 75% at Level II, as the spatial resolution changes from 500 to 100 m.Similarly, KP decreases from 0.761 to 0.661 and from 0.751 to 0.656 at Levels I and II, respectively.Note, for a zoom factor greater than five, the accuracy assessments remain stable with a value of OA approximately equal to 78% at Level I and to 75% at Level II; generally, the OA is 2%-3% larger at Level I than at Level II.

Each Class
At Level I, the PAs of the classes in SAMSPM have the same rank as in VBSPM for all zoom factors.Cropland and woodland are highest, followed by grassland and water, and bare land and

Each Class
At Level I, the PAs of the classes in SAMSPM have the same rank as in VBSPM for all zoom factors.Cropland and woodland are highest, followed by grassland and water, and bare land and built-up areas are the lowest, as shown in Figure 4. Cropland, which is distributed mainly in plain areas, is the dominant land use class.It has the highest accuracy among the six classes at Level I with a range of 83%-93% because of the large proportion of areal features.Woodland is distributed mainly in the mountains and foothills.Its accuracy is the second highest among the six classes with a range of 82%-89%, which is also attributable to the relatively large proportion of areal features.Grassland is distributed mainly in the mountains, plateaus, and foothills.Its accuracy follows cropland and woodland with a range of 66%-76% because it has a relatively complex distribution.Water is distributed mainly in the plain and valley areas.It has relatively low accuracy with a range of 62%-88% because of the large proportion of linear and point features.The built-up area is distributed mainly in the plain areas.It has relatively low accuracy because of the large proportion of point features.Bare land has accuracy of up to 82% because of the various pattern features.Note, the influence of bare land on OA is limited because it accounts for only a very small proportion of the entire area.
At Level II, the PA and UA values vary much more than at Level I, as shown in Figure 4. Paddy land and dry land, subdivided from cropland at Level I, have accuracies ranging from 79% to 94%.The accuracies of forest, shrub land, and open forest land, subdivided from woodland at Level I, all range from 63% to 86% because they have similar areal features.The high, medium, and low coverages of grassland, subdivided from grassland at Level I, have similar patterns and accuracies ranging from 63% to 77%.Rivers, lakes, reservoirs, intertidal areas, and beaches, subdivided from water at Level I, have various accuracies.Intertidal areas have the highest accuracy because of the areal features distributed along the shoreline.Lakes and reservoirs also have relatively high accuracy ranging from 74% to 87% because of their areal patterns, whereas rivers have lower accuracy (<68%) because of their linear patterns.Urban areas, rural areas, and other construction land, subdivided from built-up areas in Level I, have varying accuracies.Urban areas have high accuracy ranging from 87% to 92% because most urban areas have a large patch size.Rural areas have relatively low accuracy (<70%) because the class patches are small, even though some are smaller or similar to the fraction pixel size (1 km 2 ).Other construction land has relatively high accuracy ranging from 72% to 90%.Classes subdivided from bare land at Level I also have various accuracies.For example, marshland and saline-alkali soil have relatively high accuracies of 71% and 65%, respectively, whereas rock land with linear pattern and bare and sandy land with complex patterns have relatively low accuracies at Level II of about 30%, 40%, and 53%, respectively.

Relationship between LSI and Accuracies within Subareas
Figure 5 shows the LSI values calculated in units of county from the reference map (Level I) at a spatial resolution of 250-m.Generally, the mountainous areas in the north and west have relatively high LSI values, whereas the plains in central and southeastern areas have relatively low LSI values.At the class level, the mountainous areas in the north and east have relatively low LSI values for cropland, woodland, and grassland.Woodland in central parts and water both have LSI values with only small differences.
To explore the relationship between LSI and the accuracy measurements in units of county, the OA and PA values (y-axis) are plotted against LSI (x-axis) in Figure 6.Apparently, the OA values show a declining trend as LSI grows.The same trend is found for the classes of cropland and built-up areas.Water has relatively low LSI values but its PA values are widely separated.The plotted dots for built-up areas can be separated into two groups: one group has the same trend as the OA (i.e., the PA values decline as LSI grows), and the other group shows the PA values tending to increase with the increase of LSI.This may be due to the presence of two types of spatial pattern features: point features with LSI values of near one and low SPM accuracy, and areal features with high SPM accuracy.Figure 6 shows that OA has a relatively strong negative correlation of ´0.75 with LSI, whereas PA has a low correlation with LSI in grassland, woodland and built-up areas.Less complex areas, with low LSI values, usually have high accuracy because of the stronger spatial correlation between pixels/subpixels and lower uncertainty in class allocation.Some classes, however, might contain many regular point features with low LSI values and low class accuracy.The reason for this may be that spatial correlation only works if the patch size is smaller than the pixel size.When predicting subpixel classes using neighboring pixel information, SPM may not perform well.This reveals that areas with low LSI values and classes with areal features usually have high accuracy.To explore the relationship between LSI and the accuracy measurements in units of county, the OA and PA values (y-axis) are plotted against LSI (x-axis) in Figure 6.Apparently, the OA values show a declining trend as LSI grows.The same trend is found for the classes of cropland and built-up areas.Water has relatively low LSI values but its PA values are widely separated.The plotted dots for built-up areas can be separated into two groups: one group has the same trend as the OA (i.e., the PA values decline as LSI grows), and the other group shows the PA values tending to increase with the increase of LSI.This may be due to the presence of two types of spatial pattern features: point features with LSI values of near one and low SPM accuracy, and areal features with high SPM accuracy.Figure 6 shows that OA has a relatively strong negative correlation of −0.75 with LSI, whereas PA has a low correlation with LSI in grassland, woodland and built-up areas.Less complex areas, with low LSI values, usually have high accuracy because of the stronger spatial correlation between pixels/subpixels and lower uncertainty in class allocation.Some classes, however, might contain many regular point features with low LSI values and low class accuracy.The reason for this may be that spatial correlation only works if the patch size is smaller than the pixel size.When predicting subpixel classes using neighboring pixel information, SPM may not perform well.This reveals that areas with low LSI values and classes with areal features usually have high accuracy.The average OA and PA values in units of county within the different LSI ranges are displayed in Table 2.It shows that units of county with an LSI range of 1-4 have an average OA of 85%, units of county with an LSI range of 4-7 have an average OA of 79%, units of county with an LSI range of 7-10 have an average OA of 75%, and units of county with an LSI value >10 have an average OA of 70%-71%.The PA values for cropland tend to decline with the increase of LSI.The PA values for grassland and water are relatively low for small LSI values but they remain relatively high for median LSI values.The average PA values for woodland are relatively low (61.48%) when the value of LSI is The average OA and PA values in units of county within the different LSI ranges are displayed in Table 2.It shows that units of county with an LSI range of 1-4 have an average OA of 85%, units of county with an LSI range of 4-7 have an average OA of 79%, units of county with an LSI range of 7-10 have an average OA of 75%, and units of county with an LSI value >10 have an average OA of 70%-71%.The PA values for cropland tend to decline with the increase of LSI.The PA values for grassland and water are relatively low for small LSI values but they remain relatively high for median LSI values.The average PA values for woodland are relatively low (61.48%) when the value of LSI is within the range 1-4, whereas they remain relatively high when the value of LSI is within the range 4-13.The PA values for built-up areas tend to increase with the increase of LSI.Furthermore, some average PA values show large jumps that could be caused by extreme values.

Three Spatial Distribution Patterns and APP Values within Subareas
A representative subarea containing all three spatial patterns is chosen to explore the SPM performance for different spatial patterns and the results and accuracy assessments of the subarea are shown in Figure 7.In this subarea, woodland is a typical areal feature with the highest accuracy.Cropland, distributed in a linear pattern along the valley, has relatively lower accuracy than woodland.The built-up area, representative of a point pattern class, has the lowest accuracy.
Classes with areal patterns where class patches are much larger than the pixel size always have high accuracy, e.g., woodland at Level I with PA and UA values >94% (Figure 7).Classes with linear patterns, such as cropland (Figure 7) with PA and UA values ranging from 48% to 77%, have accuracies between those of areal and point features.The accuracy of classes with point patterns such as built-up areas (Figure 5), which are the so-called L-resolution cases [8], is the lowest among the three classes.
Figure 8 shows the APP values calculated in units of county.It can be seen that the spatial pattern of the APP values is similar to that of the LSI values.
The OA and PA values (y-axis) are plotted against APP (x-axis) in Figure 9.For all classes, OA does not show a significant increasing trend with the increase of APP in Figure 9, but there is a slight increase from 78.52% to 84.43% in Table 3.However, the trend of increase of PA with the increase of APP is apparent in all of the classes, as shown in Figure 9.The reason for this increase in PA with APP value may be that higher APP values mean a greater proportion of areal features that usually have higher accuracy in SPM. Figure 9 shows PA has relatively strong correlation with APP, while the relationship between OA and APP is less obvious (correlation coefficient: 0.21).Less complex areas with low APP values usually have high accuracy because of their stronger spatial correlation between pixels/subpixels and lower uncertainty in class allocation.Some classes, however, may contain many regular point features with low APP values and low accuracy.The reason for this is spatial correlation works well for features where the patch size is smaller than the pixel size.This indicates that areas with low APP values and classes with areal features usually have high accuracy.Classes with areal patterns where class patches are much larger than the pixel size always have high accuracy, e.g., woodland at Level I with PA and UA values >94% (Figure 7).Classes with linear patterns, such as cropland (Figure 7) with PA and UA values ranging from 48% to 77%, have accuracies between those of areal and point features.The accuracy of classes with point patterns such as built-up areas (Figure 5), which are the so-called L-resolution cases [8], is the lowest among the three classes.
Figure 8 shows the APP values calculated in units of county.It can be seen that the spatial pattern of the APP values is similar to that of the LSI values.
The OA and PA values (y-axis) are plotted against APP (x-axis) in Figure 9.For all classes, OA does not show a significant increasing trend with the increase of APP in Figure 9, but there is a slight increase from 78.52% to 84.43% in Table 3.However, the trend of increase of PA with the increase of APP is apparent in all of the classes, as shown in Figure 9.The reason for this increase in PA with APP value may be that higher APP values mean a greater proportion of areal features that usually have higher accuracy in SPM. Figure 9 shows PA has relatively strong correlation with APP, while the relationship between OA and APP is less obvious (correlation coefficient: 0.21).Less complex areas with low APP values usually have high accuracy because of their stronger spatial correlation between pixels/subpixels and lower uncertainty in class allocation.Some classes, however, may contain many regular point features with low APP values and low accuracy.The reason for this is spatial correlation works well for features where the patch size is smaller than the pixel size.This indicates that areas with low APP values and classes with areal features usually have high accuracy.The average OA and PA values of subareas within different APP ranges are shown in Table 3.It can be seen that subareas with an APP range of 0-0.2 have an average OA of 79%, subareas with an APP range of 0.2-0.4 have an average OA of 80%, subareas with an APP range of 0.4-0.6 have an average OA of 82%, subareas with an APP range of 0.6-0.8 have an average OA of 81%, and subareas with an APP value >0.8 have an average OA of 84%.The PA values for cropland tend to increase with the increase of APP.For example, cropland with an APP value of <0.8 has an average PA value of <65%, whereas cropland with an APP value of >0.8 has an average PA value of about 83%.The PA values for woodland, grassland, water, bare land, and built-up areas are similar to cropland, showing an increase of the average PA value with increasing APP.Generally, the greater the proportion of the areal feature, as indicated by higher APP values, the higher the accuracy of SPM.The average OA and PA values of subareas within different APP ranges are shown in Table 3.It can be seen that subareas with an APP range of 0-0.2 have an average OA of 79%, subareas with an APP range of 0.2-0.4 have an average OA of 80%, subareas with an APP range of 0.4-0.6 have an average OA of 82%, subareas with an APP range of 0.6-0.8 have an average OA of 81%, and subareas with an APP value >0.8 have an average OA of 84%.The PA values for cropland tend to increase with the increase of APP.For example, cropland with an APP value of <0.8 has an average PA value of <65%, whereas cropland with an APP value of >0.8 has an average PA value of about 83%.The PA values for woodland, grassland, water, bare land, and built-up areas are similar to cropland, showing an increase of the average PA value with increasing APP.Generally, the greater the proportion of the areal feature, as indicated by higher APP values, the higher the accuracy of SPM.

Impact of CI (Combined Index of LSI and APP) on SPM Results
To evaluate the impact of the CI on the SPM results, linear regression is used to determine the most suitable parameters for each CI of all classes and each class.The CI regression equations and plots of CI (x-axis) against OA and PA (y-axes) are shown in Figure 10.
For all classes, LSI mainly affects the OA values, while the relationship between APP and OA is not significant.For each class, both LSI and APP are involved in the CI formulations of the six classes and the CI values have relatively good agreement with PA.It is noteworthy that some classes have low correlations between PA and either LSI or APP, while they have relatively high correlations with CI.For example, grassland and woodland do not show a declining trend in OA with increasing LSI values (see Figure 6), whereas they do show good linear relationships between PA and CI (see Figure 10).Thus, although LSI and APP may not effectively characterize the relationship with PA for some classes, CI is an effective solution for measuring the relationship between landscape heterogeneity and areal feature proportions with PA.Therefore, CI represents an alternative index of the relationship between PA and land surface complexity.To evaluate the impact of the CI on the SPM results, linear regression is used to determine the most suitable parameters for each CI of all classes and each class.The CI regression equations and plots of CI (x-axis) against OA and PA (y-axes) are shown in Figure 10.For all classes, LSI mainly affects the OA values, while the relationship between APP and OA is not significant.For each class, both LSI and APP are involved in the CI formulations of the six classes and the CI values have relatively good agreement with PA.It is noteworthy that some classes have low correlations between PA and either LSI or APP, while they have relatively high correlations with CI.For example, grassland and woodland do not show a declining trend in OA with increasing LSI values (see Figure 6), whereas they do show good linear relationships between PA and CI (see Figure 10).Thus, although LSI and APP may not effectively characterize the relationship with PA for some classes, CI is an effective solution for measuring the relationship between landscape heterogeneity and areal feature proportions with PA.Therefore, CI represents an alternative index of the relationship between PA and land surface complexity.

SPM Performance with Change of Zoom Factor
This study shows that OA and KP generally decrease as the spatial resolution of the SPM results becomes finer with the increase of the zoom factor.The reason for this can be attributed to the increase in the number of subpixels within a coarse pixel as the zoom factor increases; thus, greater uncertainty is generated regarding class allocation of the subpixels [25,40].The reason for the stability of the accuracy assessment when the zoom factor is above five is that the proportion of mixed pixels within the 1-km fraction images becomes stable at this zoom factor.This is because the criterion defining

SPM Performance with Change of Zoom Factor
This study shows that OA and KP generally decrease as the spatial resolution of the SPM results becomes finer with the increase of the zoom factor.The reason for this can be attributed to the increase in the number of subpixels within a coarse pixel as the zoom factor increases; thus, greater uncertainty is generated regarding class allocation of the subpixels [25,40].The reason for the stability of the accuracy assessment when the zoom factor is above five is that the proportion of mixed pixels within the 1-km fraction images becomes stable at this zoom factor.This is because the criterion defining mixed or pure pixels depends upon the zoom factor.As an illustration, it can be seen that the spatial resolution changes from 500 to 200 m when the zoom factor increases from 2 to 5, whereas it changes from 200 to 100 m when the zoom factor increases from 5 to 10.

SPM Performance at the Two Classification Levels
Comparing the accuracy assessments of the SPM results at the two classification levels reveals that classes with areal patterns at Level II, subdivided from Level I, have similar PA and UA values as those at Level I because of the similarities in their distribution patterns.For example, the PA of dry land (83%-94%), subdivided from cropland, is close to cropland (83%-93%).Classes with linear or point patterns, however, are different from classes with areal patterns.For example, the classes of intertidal areas and rivers at Level II, subdivided from water, have different accuracies.Intertidal areas have high accuracy (>90%) and rivers have low accuracy (<68%), whereas the accuracy of water is 62%-88%.Another example is built-up areas (58%-64%) at Level I, which has the subclass of rural areas (<40%) with a point pattern.

Performances of SAMSPM and VBSPM
Both SAMSPM and VBSPM perform well in the experiments and produce equivalent accuracies in different areas, but some differences remain in the detail (Figure 10).VBSPM produces slightly better results with greater OA (range: 0.3%-1.1%)than SAMSPM.VBSPM, however, generates results for built-up areas at Level I and rivers, rural areas, and rock land at Level II that are slightly less accurate (<1%) than SAMSPM with a low zoom factor (<4).This is because these classes do not represent H-resolution cases, for which VBSPM is most applicable [27].The different performances between SAMSPM and VBSPM in these classes, however, have only minor influence on the OA.

SPM Performance in Subareas with Different Landscape Heterogeneity
A scatterplot is used to investigate the relationship between accuracy and LSI values.It shows that the OA and PA both decline with increasing surface complexity, which is measured by LSI.For the entire image, the less complex the surface, the better SPM performs.The complexity of the surface for the entire area and for each class can be measured separately by LSI at both the landscape (all classes) and the class level.For example, the experiment shows that subareas with LSI values in the range

SPM Performance of Point, Linear, and Areal Pattern Classes
Among the three spatial patterns, classes with areal pattern have the highest accuracy.One explanation for this is that areal features have strong spatial correlation between pixels and subpixels; thus, those SPM methods based on the assumption of spatial dependence are well able to deal with such correlation, especially VBSPM.A second explanation is that the object size of areal features is generally much larger than the pixel size; thus, it contains a relatively large proportion of pure pixels that can lead to relatively high accuracy in the downscaled classification.Because the SPM methods used in this study are based on the assumption of spatial correlation, land use classes with linear patterns, such as rivers, often do not remain contiguous in the SPM results; thus, their accuracy is not high, especially for linear features with small widths.However, classes with point patterns have the lowest accuracy assessment among the three spatial patterns.This is because greater uncertainty in predicting the spatial locations of point features is generated because of the fewer constraints or lack of complementary information.Atkinson [8] stated that the goal of SPM for point features is to predict patterns rather than to generate an accurate prediction on a subpixel-by-subpixel basis.Thus, some SPM algorithms based on pattern prediction may be more suited to mapping land use classes with point patterns [33].

Analysis of SPM Performance with CI Value
The CI is a linear combination of the LSI and APP, which are the two indices related to SPM performance.The values of the regression parameters reveal that the contributions of the two indices may vary for different areas and different classes.For all classes, CI is similar to LSI, which indicates that the less complex the area, the less impact APP has and the better SPM may perform.At the class level, CI has different combinations of the LSI and APP.The SPM performance is better for smaller values of LSI and larger values of APP.Generally, CI could be used as an indicator to predict the performance of SPM prior to its use in practical applications.

Uncertainty of SPM Caused by Fraction Images of Soft Classification
The experiment uses the error free fraction images to avoid the impact of the uncertainty in soft classification on the SPM results.Actually, uncertainty or errors in soft classification are inevitable when classifying real remote sensing images [27].As a result, uncertainty or errors from the soft classification of real remote sensing images would be propagated into SPM and the accuracy of SPM would be decreased.Ge [4] proposed a solution using the multiple-point simulation to reduce the uncertainty or errors in soft classification and it can be used in future SPM experiments on real remote sensing images.

Potential of SPM as an Alternative Method for Generating LULC Maps
Using the 1-km synthetic fraction images, the SPM methods based on the assumption of spatial dependence are able to produce land use maps at subpixel scales, with OA values of 77%-85% and 75%-83% for the six classes at Level I and 22 classes at Level II, respectively, over the large area of Jingjinji region in China.These accuracies decline with the increase of S, and stabilize at an OA of approximately 78% at Level I and 75% at Level II, when the zoom factor reaches five.The two SPM methods produce the highest accuracy for areal pattern classes (e.g., cropland, woodland, and grassland), medium accuracy for linear pattern classes (e.g., water), and the lowest accuracy for point pattern classes (e.g., bare land and built-up areas).In the subareas (i.e., counties), less complex areas exhibit higher accuracy; i.e., subareas with high LSI values have lower accuracy, and subareas with LSI values of <18 generate SPM results with OAs of >80%.
In summary, according to the SPM output, SPM has the potential to generate alternative LULC maps for regions where high-spatial-resolution maps are unavailable, especially in less complex areas with a large proportion of areal features.Thus, when using SPM to obtain finer-resolution LULC maps from real coarse remote sensing images or existing fraction images in future applications, the experimental design can be used.A primary assessment of the LSI and APP can be done to provide an indication of the most suitable method and to determine the appropriate zoom factor and classification system to use.

Conclusions
The objective of this study was to investigate the feasibility of using subpixel mapping (SPM) to obtain land use/land cover (LULC) data in the absence of high-spatial-resolution LULC maps.An experimental design was proposed to evaluate its feasibility for providing alternative LULC maps based on accuracy assessments and landscape measurements.In accordance with the experimental design, a case study was implemented in the Jingjinji region of China using 1-km land use fraction imagery as input data, the landscape shape index (LSI) as the measurement of surface complexity, and areal pattern proportion as the measurement of the spatial distribution pattern of geographical features.The results and analysis showed that overall accuracy (OA) was approximately 80% and that the accuracy declined with increasing zoom factor before stabilizing at a zoom factor of five.The accuracy of SPM was apparently associated with both the complexity and spatial pattern of the geographical features.Higher accuracy was obtained in less complex areas and when handling areal features, compared with handling linear and point features.Subareas with LSI values <4 showed OA values of approximately 85%, subareas with LSI values of 4-10 showed OA values of approximately 77%, and subareas with LSI values >10 had OA values of approximately 70%.
This experiment used an existing proportional land use dataset.In the future, soft classification results derived from remote sensing images should also be practicable.Moreover, other newly developed SPM methods, such as pattern-prediction-based SPM, might be worth consideration.Therefore, it would be interesting in the future to repeat this experiment using other SPM algorithms and actual remotely sensed images.

Figure 1 .
Figure 1.Flowchart of the experimental design.

Figure 4 .
Figure 4. OA, KP, PA, and UA of SAMSPM and VBSPM at Levels I and II.(a) OA and KP values of SAMSPM and VBSPM at Level I and Level II; (b) PA and UA values of SAMSPM and VBSPM at Level I; (c) PA and UA of SAMSPM and VBSPM values at Level II.

Figure 4 .
Figure 4. OA, KP, PA, and UA of SAMSPM and VBSPM at Levels I and II.(a) OA and KP values of SAMSPM and VBSPM at Level I and Level II; (b) PA and UA values of SAMSPM and VBSPM at Level I; (c) PA UA of SAMSPM and VBSPM values at Level II.

Figure 10 .
Figure 10.Correlation between CI value (x-axis) and accuracy (y-axis) at landscape level.

Figure 10 .
Figure 10.Correlation between CI value (x-axis) and accuracy (y-axis) at landscape level.

Table 1 .
Classification system used in this study at hierarchical levels I and II.

Table 2 .
Average OA and PA in units of county with different LSI ranges.

Table 3 .
Average OA and PA in units of county with different APP ranges.
Impact of CI (Combined Index of LSI and APP) on SPM Results 1-4 have OA values of about 85%, subareas with LSI values in the range 4-10 have OA values of about 77%, and subareas with LSI values >10 have average OA values of about 70%.However, at the class level, PA is not always consistent with the LSI value.Usually, classes with H-resolution features (e.g., cropland and woodland) have relatively high PA values.Some classes with point features (e.g., water and built-up areas) have relatively low LSI and low PA values, because SPM does not perform well when dealing with point features.Areal features with lower LSI values have higher PA values than point features with low LSI values.