Mapping Complex Urban Land Cover from Spaceborne Imagery: The Influence of Spatial Resolution, Spectral Band Set and Classification Approach

Detailed land cover information is valuable for mapping complex urban environments. Recent enhancements to satellite sensor technology promise fit-for-purpose data, particularly when processed using contemporary classification approaches. We evaluate this promise by comparing the influence of spatial resolution, spectral band set and classification approach for mapping detailed urban land cover in Nottingham, UK. A WorldView-2 image provides the basis for a set of 12 images with varying spatial and spectral characteristics, and these are classified using three different approaches (maximum likelihood (ML), support vector machine (SVM) and object-based image analysis (OBIA)) to yield 36 output land cover maps. Classification accuracy is evaluated independently and McNemar tests are conducted between all paired outputs (630 pairs in total) to determine which classifications are significantly different. Overall accuracy varied between 35% for ML classification of 30 m spatial resolution, 4-band imagery and 91% for OBIA classification of 2 m spatial resolution, 8-band imagery. The results demonstrate that spatial resolution is clearly the most influential factor when mapping complex urban environments, and modern “very high resolution” or VHR sensors offer great advantage here. However, the advanced spectral capabilities provided by some recent sensors, coupled with contemporary classification approaches (especially SVMs and OBIA), can also lead to significant gains in mapping accuracy. Ongoing developments in instrumentation and methodology offer huge potential here and imply that urban mapping opportunities will continue to grow.


Introduction
Detailed land cover information is crucial for mapping and managing complex urban environments across local and regional scales [1,2], and remote sensing is the only practical and cost-effective means of generating such information over large areas [3]. However, mapping urban land poses a significant challenge for remote sensing due to the high spatial frequency of surface features [4][5][6]; urban land is highly heterogeneous, involving a mosaic of both human-made materials (such as asphalt, concrete, roof tiles and other impervious surfaces) and semi-natural surfaces (for instance grass, trees, bare soil, water etc.). Therefore, although spaceborne remote sensing has been employed for urban land cover classification over several decades, early work was limited by the relatively coarse spatial resolution of available sensors, perhaps most commonly Landsat Thematic Mapper (TM) and its 30 m resolution multispectral imagery [7,8]. In urban environments, this level of spatial detail leads inevitably to mixed pixels, whereby each pixel exhibits some spectral average representing multiple surface features [9][10][11].

VHR Sensors
A breakthrough for urban mapping came around the turn of the millennium with the advent of so-called "very high resolution" (VHR) satellite sensors, led by the 4 m spatial resolution (multispectral) IKONOS mission in 1999, but followed by a series of other instruments including OrbView-3 (also 4 m resolution), QuickBird (2.4 m) and GeoEye-1 (1.6 m) [12]. The advantage of these image sources for classifying urban land cover is obvious; the fine spatial resolution enables relatively accurate identification of small urban features [13][14][15]. Nonetheless, despite their benefit of fine spatial resolution, these VHR instruments tended to have rather limited spectral capabilities. For instance, compared with Landsat TM's seven spectral bands (three visible, near infrared, two shortwave infrared and thermal infrared), IKONOS, OrbView-3, QuickBird and GeoEye-1 each has only four (visible and near infrared) spectral bands. Such a limited spectral band set potentially constrains the ability of remotely sensed imagery to distinguish between urban surfaces, given their often subtly varying spectral properties [4,16]. This is perhaps especially a problem where detailed thematic classification is attempted (i.e., where many specific land cover classes are mapped rather than few broad categories). For instance, a planning agency official conducting a land use inventory may wish to map several different types of roofing materials rather than a single "buildings" class [17].

Enhanced Spectral Capabilities
Now, the latest generation of satellite sensors is emerging with enhanced spectral, as well as advanced spatial, properties. Notably, two new VHR instruments, WorldView-2 (WV2) and WorldView-3 (WV3), acquire multispectral imagery with eight spectral bands: coastal, blue, green, yellow, red, red-edge, near infrared 1 (NIR1) and near infrared 2 (NIR2). In particular, the coastal, yellow and red-edge bands, as well as NIR2, represent "new" spectral bands, not routinely found on multispectral sensors. (Indeed, these advancements in spectral capability are not restricted to VHR instruments; the most recent Landsat sensor, Operational Land Imager (OLI), has ten multispectral bands, including new coastal, cirrus and thermal infrared 2 bands.) These enhanced spectral properties may prove especially valuable for urban mapping, enabling subtly varying spectral classes to be identified [18]. This advantage is likely to be most pertinent where detailed classification schema are involved, for instance when identifying many different land cover classes in complex urban environments.

Pixel-Based versus Object-Based Classification
Notwithstanding these spectrally and spatially advanced satellite sensors, difficulties remain for urban mapping. Traditionally, urban classification has been conducted using pixel-based approaches, whereby land cover classes are allocated to each individual pixel [18,19]; and historically most such analysis has employed statistical parametric classifiers such as the maximum likelihood (ML) algorithm [15,20,21]. Though ML classification is a perfectly valid method, it makes certain statistical assumptions about the data, and the nature of VHR imagery can mean that it is difficult to honour these assumptions. Specifically, ML classification tends to work well where training data are relatively "clean", such as where coarse spatial resolution imagery is used to classify general land cover classes.
Where training data are rather noisier, such as where fine resolution imagery is used to map complex, e.g., urban, environments, the ML classifier can be considerably less accurate [9,21]. More generally, pixel-based approaches as a whole have limitations when it comes to urban analysis using VHR imagery [21,22]. Contrary to the problem of mixed pixels which occurs where image spatial resolution is too coarse, VHR imagery can effectively "over-sample" the scene whereby within-feature variation (occurring where image resolution is too fine) reduces pixel-based classification accuracy [5,23,24,25].
Recent years have seen significant development in classification methodologies, and some of these have particular relevance for urban mapping. Non-parametric pixel-based classifiers such as support vector machines (SVMs) seem well-suited to VHR urban classification since they are better able to handle noisy training data than, for instance, the ML classifier [18,26]. Moreover, object-based classification has grown in popularity, whereby land cover classes are allocated to objects representing real-world features instead of somewhat arbitrary pixel structures [23]. Other practitioners have tested spatial indices and wavelet-based approaches to enhance classification performance [27].
The object-based approach directly addresses, and to an extent overcomes, the problem of within-feature variation and its attendant (pixel-based) misclassification [1,28,29]. Consequently, object-based classification, which exploits spatial, textural and topological (as well as spectral) information [30][31][32], may facilitate highly accurate mapping of complex urban environments using VHR imagery.

Mapping Complex Urban Land Cover
Theory underpinning spectral and spatial image properties and their influence on land cover classification accuracy is fairly well-established, and different image sources and classifiers have been tested quite widely on urban environments. However, such experiments have often tended to be limited in scope, perhaps comparing only one or two variables (e.g., spatial resolution or classification approach); and/or adopting general and unambitious thematic classification schema; and/or working with small image data sets. For instance, many studies have conducted fairly basic, procedural comparisons between pixel- and object-based classification [21,33,34], but these have not necessarily considered other influential variables such as input spatial resolution or spectral bands, or classification algorithm. Some practitioners have (sensibly) adopted VHR imagery, including WV2 data with its enhanced spectral properties, for mapping urban environments, but commonly these attempt only broad differentiation between a few general land cover classes [34][35][36] rather than detailed distinction of many specific classes. In effect, these studies seem content to replicate the sort of classification schema used for decades with medium spatial resolution imagery, rather than attempting to exploit the full information content of WV2 imagery and create very thematically-detailed urban land cover maps. Also, most test WV2 data sets tend to be very small, often only a few hectares [37], and this can limit the strength of the scientific findings since there is no consideration of spatial extrapolation or transferability of the approach. That is, the results may be very parochial and depend strongly on local context.
This paper builds on our theoretical understanding of how image and classifier characteristics influence land cover accuracy by presenting an exhaustive and rigorous practical experiment to compare the influence of spatial resolution, spectral band set and classification approach for mapping complex urban environments. Uniquely this study provides a full test of the latest VHR imagery for urban classification, demonstrating how its component (spatial and spectral) parts contribute to output mapping accuracy. Analysis is iterated using a series of different spatial resolutions and spectral band sets to simulate imagery ranging from traditional medium spatial resolution satellite sensors such as Landsat TM to state-of-the-art VHR sensors like WV2, adapting an approach developed by [38]. Moreover, given the influential role that classification approach plays on output accuracy, and how this is linked intrinsically with image specifications, all image data sets are classified using parametric and non-parametric pixel-based, and object-based, classifiers. Unlike earlier work, this study adopts a detailed classification system, including many specific land cover classes rather than few general categories. This enables a fuller and more robust assessment of the WV2 data, but also delivers helpful practical information for urban planners and other user communities on the level of thematic detail that can be achieved when mapping complex urban environments. Finally, analysis is conducted using a relatively large image covering approximately 121 km² of the city of Nottingham, UK and its environs. This means that urban land cover information is generated at a scale of practical value and relevance (the whole city-scale), unlike earlier experiments that have been limited to very small, local areas.

Study Area and Classification Schema
The study area is the city of Nottingham, UK and its environs (Figure 1), located at 52°57′N latitude, 1°08′W longitude. Nottingham has a population of slightly more than 300,000 [39] and covers an area of approximately 121 km². The climate is cool, moist temperate, with average high summer temperatures around 20 °C, and average monthly precipitation around 50 mm throughout the year. The topography is fairly flat, with altitude generally around 100 m. Nottingham is a relatively typical UK city, in that it comprises a mixture of residential, industrial and commercial land use, and therefore represents a good test for urban mapping methodologies. Land cover can be broadly categorised into various types of anthropogenic features (e.g., asphalt, concrete, roof materials) intersecting with the semi-natural environment (e.g., vegetation (grass, trees), bare soil and water). The central urban core is generally more built-up and less vegetated than the outlying residential areas, though this varies considerably across the city and its districts. A classification schema was developed that captured the detailed spatial heterogeneity of the urban land cover throughout Nottingham. In total, eleven classes were identified: asphalt, concrete roofs, clay roofs, slate roofs, metal roofs, grass, broadleaved trees, needle-leaved trees, bare soil, water and shadow (Table 1).

Image and Reference Data
A WorldView-2 (WV2) image of Nottingham was acquired on 26 May 2012. The multispectral imagery was supplied in 11 bit data format, at a spatial resolution of 2 m and with eight spectral wavebands: coastal, blue, green, yellow, red, red edge, NIR1 and NIR2 (Figure 2, top line). Image preprocessing requirements were minimal for two reasons. First, a single source data set was used whereby all comparative outputs were derived from the original WV2 image, and this meant that geometric distortion was of relatively little consequence. Nonetheless, the image's geometric fidelity was examined manually by cross-referencing the image with ancillary map data; geometric accuracy proved relatively high in general. Second, analysis involved thematic classification and the accuracy of output land cover maps was assessed independently (of the original spectral imagery). This meant that external factors such as atmospheric distortion that influence original (input) pixel digital numbers were of little consequence.

The original 2 m spatial resolution, 8 spectral band WV2 image was modified to create a series of spatial/spectral data sets for comparative classification analysis. First, the imagery was degraded successively to a series of coarser spatial resolutions: 4 m, 10 m and 30 m. These particular values were chosen to approximate the spatial properties of commonly used satellite sensors, ranging from state-of-the-art VHR imagery to traditional medium resolution imagery. For instance, while 2 m represents WV2, 4 m matches earlier VHR imagery from IKONOS, 10 m matches the new Sentinel-2 MultiSpectral Instrument (MSI), and 30 m matches Landsat TM or OLI.
Second, two additional spectral band subsets were created from the 8 band original. This was a simple process that just involved deselecting spectral bands as required; a 4 band subset was created using the blue, green, red and NIR1 bands, and a 6 band subset was created using these four plus the red edge and NIR2 bands (Figure 2). Again, the aim here was to compare a range of spectral band sets, and where possible approximate the spectral properties of commonly used satellite sensors. The original 8 band WV2 image represents state-of-the-art VHR remote sensing, but also shares some spectral innovations with other recently developed sensors. For instance, Landsat OLI, Sentinel-2 MSI and RapidEye use certain novel bands, including, in common with WV2, coastal (OLI) and red edge (RapidEye). The 4 band subset represents a conventional and widely used visible/near infrared band set. For instance, other VHR sensors such as IKONOS and GeoEye-1 use these four bands; and many medium resolution sensors, including some early Landsat instruments, typically use three or four visible and near infrared bands. The 6 band subset is less direct in matching real-world sensors, but represents an intermediate step between the 4 and 8 band data sets, and also specifically targets spectral bands of value for characterising terrestrial features. In total, 12 spatial/spectral image combinations (4 spatial resolutions, 3 spectral band sets) were used for comparative classification analysis (see Figure 3).
Before proceeding to classification analysis, the (4, 6 and 8) spectral band sets were supplemented with certain spectral indices, with the underlying intention to increase the accuracy of the resultant classifications. Specifically, the 4 and 6 band sets were supplemented with a normalized difference vegetation index (NDVI) [40] layer (so they in effect became 5 and 7 band data sets, respectively). The 8 band set was supplemented with NDVI, Normalized Difference Bare Soil Index (NDBSI) and Normalized Difference Brick Roof Index (NDBRI) layers [41] (so this in effect became an 11 band data set). Note, NDBSI and NDBRI could be calculated for the 8 band set, but not the 4 or 6 band sets, because only the full 8 band set included a yellow band. These particular spectral indices were added through trial-and-error whereby many indices were tested and these three proved useful to aid identification of vegetation, soil and roof classes. The indices were added initially at the object-based classification stage (described below), but to ensure a fair comparison between all classification analysis, the same input data layers (i.e., spectral bands plus indices) were used for all classifiers.
Reference data were collected from a range of sources to create an independent data set for training and testing the classification analysis. Field land cover survey was conducted at locations throughout the study area in May 2013, matching the anniversary date of original image acquisition. Free online spatial data resources such as Google Street View and Bing Maps were used to supplement field survey [42,43], whereby secondary ground photos and images were browsed to identify the land cover classes present at sample locations. Detailed vector map data (specifically MasterMap data [44] created by Ordnance Survey) were also consulted and cross-referenced with the imagery to gain a fuller appreciation of the land cover and land use present throughout the study area. Reference data sources were compiled and triangulated to create a comprehensive reference data set of land cover at locations throughout the study area. This data set was split into two parts, one used to create training class samples for classification analysis, and the other used to test the accuracy of the output land cover maps.
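The construction of the spatial/spectral data sets described in this section can be illustrated with a short sketch. It rests on several assumptions that the text does not confirm: spatial degradation is modelled as simple block averaging (the actual resampling method is not stated), the band order and the names `BANDS`, `SUBSETS`, `degrade`, `select_bands` and `ndvi` are hypothetical, and NDVI is taken in its standard form (NIR − red)/(NIR + red) with NIR1 as the near infrared band. NDBSI and NDBRI are omitted because their definitions are not given here.

```python
import numpy as np

# Hypothetical band order, following the order in which the text lists the WV2 bands.
BANDS = ["coastal", "blue", "green", "yellow", "red", "red_edge", "nir1", "nir2"]

# Band subsets as described in the text (4, 6 and 8 bands).
SUBSETS = {
    4: ["blue", "green", "red", "nir1"],
    6: ["blue", "green", "red", "nir1", "red_edge", "nir2"],
    8: BANDS,
}


def degrade(image, factor):
    """Coarsen a (bands, rows, cols) array by averaging factor x factor blocks.

    Assumption: block averaging stands in for whatever resampling was used.
    Rows and columns that do not fill a whole block are trimmed.
    """
    n_bands, rows, cols = image.shape
    r2, c2 = rows // factor, cols // factor
    trimmed = image[:, : r2 * factor, : c2 * factor]
    return trimmed.reshape(n_bands, r2, factor, c2, factor).mean(axis=(2, 4))


def select_bands(image, n_bands):
    """Keep only the bands belonging to the 4, 6 or 8 band set."""
    idx = [BANDS.index(name) for name in SUBSETS[n_bands]]
    return image[idx]


def ndvi(image, band_names):
    """Standard NDVI from the red and NIR1 bands; 0 where the denominator is 0."""
    red = image[band_names.index("red")].astype(float)
    nir = image[band_names.index("nir1")].astype(float)
    denom = nir + red
    out = np.zeros_like(denom)
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# From 2 m pixels, factors of 2, 5 and 15 give 4 m, 10 m and 30 m pixels.
FACTORS = {2: 1, 4: 2, 10: 5, 30: 15}
```

With a 2 m array `wv2` of shape `(8, rows, cols)`, a 30 m, 4 band data set plus its NDVI layer would be `sub = select_bands(degrade(wv2, FACTORS[30]), 4)` followed by `ndvi(sub, SUBSETS[4])`.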

Research Methods
Three different classification approaches were tested:
1. Maximum likelihood classification: a parametric pixel-based approach;
2. Support vector machine classification: a non-parametric pixel-based approach; and
3. Object-based classification.
In total, land cover classification was conducted using 36 different data set/classifier combinations (4 spatial resolutions × 3 spectral band sets × 3 classifiers; Figure 3).
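The experimental design can be enumerated directly, which also confirms the 630 pairwise comparisons reported in the abstract. The sketch below is illustrative only: the names are hypothetical, and the McNemar statistic is written in the common z-score form without continuity correction, which is an assumption because the exact form used is not stated in this section.

```python
from itertools import combinations, product
from math import sqrt

RESOLUTIONS_M = (2, 4, 10, 30)
BAND_SETS = (4, 6, 8)
CLASSIFIERS = ("ML", "SVM", "OBIA")

# Every data set/classifier combination: 4 x 3 x 3 = 36 output maps.
outputs = list(product(RESOLUTIONS_M, BAND_SETS, CLASSIFIERS))

# Every unordered pair of output maps: 36 * 35 / 2 = 630 comparisons.
pairs = list(combinations(outputs, 2))


def mcnemar_z(f12, f21):
    """McNemar z statistic from the two discordant counts (assumed form).

    f12: test samples correct in map 1 but wrong in map 2.
    f21: test samples wrong in map 1 but correct in map 2.
    """
    if f12 + f21 == 0:
        return 0.0  # the two maps never disagree on correctness
    return (f12 - f21) / sqrt(f12 + f21)


print(len(outputs), len(pairs))  # 36 630
```

Under this form, two maps would be judged significantly different at the 5% level (two-sided) when |z| exceeds 1.96.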

Pixel-Based Class Training
The first step in supervised land cover classification is generally class training. Indeed, choosing appropriate training samples is one of the most critical aspects of classification methodologies, and can be very significant in determining the final success (or otherwise) of the classification process. Here, training was first conducted for pixel-based classification, and this involved laborious trial-and-error, iterating training samples to optimise classification performance. Initially, some theoretical considerations influenced training data selection. The author of [45] recommends a minimum of 10ρ to 30ρ training samples per class, where ρ = number of spectral bands. In this study, with eight spectral bands, the minimum requirement is therefore between 80 and 240 samples per class, and, with 11 classes, between 880 and 2640 for the whole classification. Also, training samples were selected randomly from locations throughout the study area, thereby avoiding any spatial bias that can be caused where training samples are spatially clustered.
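The sample-size rule above is simple arithmetic, sketched here for reference; the function name is hypothetical.

```python
def training_sample_range(n_bands, n_classes, low=10, high=30):
    """Minimum training samples under the 10p-30p rule (p = number of bands).

    Returns ((per-class low, per-class high), (total low, total high)).
    """
    per_class = (low * n_bands, high * n_bands)
    total = (per_class[0] * n_classes, per_class[1] * n_classes)
    return per_class, total


print(training_sample_range(8, 11))  # ((80, 240), (880, 2640))
```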
Training was first carried out using the 2 m spatial resolution imagery. Because of the great spatial complexity of this VHR data, and drawing on contemporary research practice (e.g., [46]), it was decided that 3 × 3 blocks of pixels would be used for training here, rather than individual pixels. Blocks or groups of pixels provide some representation of the natural variation present within land cover structures at this scale of observation and, as [47] notes, this approach can avoid the selection of potentially noisy and unrepresentative individual pixels. In total, for the 2 m resolution image, 479 training samples were selected, each representing a block of nine (3 × 3) pixels, so 4311 pixels overall. This is well above the minimum requirement specified by [45].
Once class training was complete for the 2 m spatial resolution imagery, the process was repeated successively on the 4, 10 and 30 m imagery. Every attempt was made to use the same or similar training points at the different resolutions to ensure direct comparability between results, but some slight modifications were necessary. First, because of the spatial averaging implicit to coarsening resolution, it was neither desirable nor possible to maintain 3 × 3 blocks of pixels as training samples, so individual pixels were used instead. As spatial resolution becomes coarser, pixels cover larger areas on the Earth's surface, so individual pixels are less likely to represent very small, unrepresentative features. Also, as resolution coarsens, it becomes harder to identify homogeneous training samples that extend over 3 × 3 pixels; for a 30 m spatial resolution image, training samples would need to cover almost a hectare in size, and this is unlikely and uncommon in an urban environment.
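The "almost a hectare" figure follows from the ground footprint of a 3 × 3 block at each resolution, as this small calculation shows (the function name is hypothetical).

```python
def block_footprint_ha(resolution_m, block=3):
    """Ground area, in hectares, covered by a block x block group of pixels."""
    side_m = resolution_m * block
    return side_m * side_m / 10_000  # 1 ha = 10,000 square metres


for res in (2, 4, 10, 30):
    print(f"{res} m pixels: {block_footprint_ha(res):.4f} ha")
```

At 2 m a block covers 36 m² (0.0036 ha), whereas at 30 m it covers 90 m × 90 m = 8100 m², or 0.81 ha.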
Second, for accurate classification results (where hard training as opposed to soft or fuzzy training is used), training classes should be pure, or as pure as possible. That is, each training sample should represent only its designated land cover class, not a mixture of classes. Clearly, as spatial resolution coarsens, it becomes harder to identify pure pixels as training samples since there is more pixel mixing in general. Here, to ensure training samples were as pure as possible, each original (2 m imagery) training point was inspected to determine whether or not it represented a pure land cover class at the coarser spatial resolution. Only those samples that were deemed pure were retained for classification; others were discarded. This had the effect that the total number of training samples reduced successively at each coarser spatial resolution (4 m resolution = 412 training samples, 10 m = 299, 30 m = 254), meaning it was not always possible to achieve the recommended number. Nonetheless, relatively large samples were maintained for all classifications, and this approach enabled direct comparability between results. Further, to ensure the suitability of training classes, various statistical tests were conducted.
While conducting class training, care was taken to investigate and ensure the spectral separability of classes. In particular, class spectral graphs were examined and transformed divergence (TD) measures were calculated to enable a statistical assessment of class separability. Where initial TD values were relatively low, e.g., below 1.3 on the scale of 0-2 (where 0 = not separable and 2 = completely separate) as recommended by [48], training classes were inspected and refined, with the removal and addition of points as necessary. Eventually, through repeated evaluation of TD values and refinement of training classes, all training class sets achieved satisfactory spectral separability.
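For readers unfamiliar with the measure, a minimal sketch of transformed divergence between two training classes follows. It assumes the widely used textbook definition, in which the divergence D between two Gaussian class models is rescaled as TD = 2(1 − exp(−D/8)) to lie on the 0-2 scale; the exact implementation used in the study is not stated, and the function name is hypothetical.

```python
import numpy as np


def transformed_divergence(x_i, x_j):
    """Transformed divergence (0-2 scale) between two sets of training pixels.

    x_i, x_j: arrays of shape (n_samples, n_bands), one per class.
    Assumes each class has enough samples for an invertible covariance matrix.
    """
    mu_i, mu_j = x_i.mean(axis=0), x_j.mean(axis=0)
    c_i = np.cov(x_i, rowvar=False)
    c_j = np.cov(x_j, rowvar=False)
    ci_inv, cj_inv = np.linalg.inv(c_i), np.linalg.inv(c_j)
    d = (mu_i - mu_j).reshape(-1, 1)

    # Divergence: a covariance term plus a mean-difference term.
    divergence = 0.5 * np.trace((c_i - c_j) @ (cj_inv - ci_inv)) \
        + 0.5 * np.trace((ci_inv + cj_inv) @ d @ d.T)

    # Saturating transform onto the 0-2 scale.
    return 2.0 * (1.0 - np.exp(-divergence / 8.0))
```

Identical classes give a value of 0, and the value approaches 2 as the class means move apart relative to their spread.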

Pixel-Based Classification
Initially, maximum likelihood (ML) classification was performed on the 12 image data sets, using the training data as described above. The ML algorithm is perhaps the most commonly used image classification approach [49,50] and is now widely-known and well-understood (e.g., see [51] for a full description), so only brief detail is provided. The main intention of using ML classification here was to demonstrate the performance of a conventional parametric pixel-based classifier as a benchmark against which other, newer classification approaches could be compared. Though ML classification is generally effective where its assumptions of data normality are met, it may be that the inherent "noisiness" (i.e., spatial heterogeneity) of VHR image pixels renders this form of data unsuitable for parametric classification.
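As a reminder of what the ML classifier computes, the following is a minimal sketch of Gaussian maximum likelihood classification. It is not the software implementation used in the study: equal prior probabilities are assumed, and the function names are hypothetical. Each class is modelled by the mean and covariance of its training pixels, and a pixel is assigned to the class with the highest Gaussian log-likelihood.

```python
import numpy as np


def fit_ml(X, y):
    """Estimate a Gaussian model (mean, inverse covariance, log-determinant) per class.

    X: (n_samples, n_bands) training pixels; y: (n_samples,) integer class labels.
    """
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        cov = np.cov(Xk, rowvar=False)
        params[k] = (Xk.mean(axis=0), np.linalg.inv(cov), np.linalg.slogdet(cov)[1])
    return params


def predict_ml(params, X):
    """Assign each pixel to the class with the largest log-likelihood (equal priors)."""
    labels = list(params)
    scores = []
    for k in labels:
        mu, inv_cov, logdet = params[k]
        d = X - mu
        # Squared Mahalanobis distance of every pixel to the class mean.
        maha = np.einsum("ij,jk,ik->i", d, inv_cov, d)
        scores.append(-0.5 * (logdet + maha))
    best = np.argmax(np.stack(scores, axis=1), axis=1)
    return np.array(labels)[best]
```

The reliance on a single mean and covariance per class is what makes the approach parametric, and it is this normality assumption that noisy VHR training data may violate.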
Next, a support vector machine (SVM) classification was performed on the 12 image data sets. The SVM is a non-probabilistic binary linear classifier which, through the operation of the kernel trick, determines the radial position of decision boundaries (support vectors) that yield the optimal separation of classes [52,53]. SVMs are increasingly used in image classification, often increasing classification accuracy over traditional approaches [37,54,55]. In locating the support vectors, SVMs tend to use only a subset of the training data and so they are particularly advocated for use with high-dimensional data sets primarily because it is believed that the decision making is not constrained by the Hughes effect [16,56,57]. Although others dispute this somewhat [58], use of the SVM as a classifier should benefit complex classification problems (e.g., where fine spatial resolution imagery is used to map detailed classification schema in heterogeneous environments) and can perform better than ML classification for urban environments using VHR imagery [59,60]. However, as a pixel-based approach, it may still suffer from within-feature variation leading to some degree of misclassification [1,23].
Parameter settings for the SVM classifier were chosen through consideration of prevailing theory and literature where available, plus trial-and-error testing, ultimately leading to optimum classification outputs. (SVM classification was conducted using ENVI image processing software [61].) A radial basis function (Gaussian) nonlinear kernel was used [34,37,62] because it deals with non-linear problems [63] and can be used for various applications [34,64]. This kernel requires two main parameters to be determined: gamma and penalty. Gamma expresses the degree of influence of training samples on the classification process (as gamma increases, influence decreases), and penalty controls the trade-off between misclassification of training samples and simplicity of the decision surface [65]. After extensive trial-and-error testing, gamma and penalty were set at 0.5 and 500 respectively.
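To illustrate the role of gamma, the following minimal sketch evaluates a Gaussian radial basis function kernel for two pixel vectors. It assumes the standard form exp(-gamma * ||x - y||^2); the exact parametrisation inside ENVI is not stated in the text, and the function name and pixel values are hypothetical.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel, assumed form: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Two hypothetical 4-band pixel vectors (values invented for illustration).
pixel_a = [0.10, 0.12, 0.08, 0.45]
pixel_b = [0.30, 0.25, 0.20, 0.10]

# Larger gamma makes the kernel value fall off faster with spectral distance,
# so each training sample influences a smaller neighbourhood.
for gamma in (0.1, 0.5, 5.0):
    print(gamma, rbf_kernel(pixel_a, pixel_b, gamma))
```

With this form, identical pixels always give a kernel value of 1, and the value decays towards 0 as spectral distance or gamma grows, which is consistent with the statement that influence decreases as gamma increases. The penalty parameter (often denoted C elsewhere) does not appear in the kernel itself; it enters the optimisation that selects the support vectors.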

Object-Based Classification
Following pixel-based classification, object-based classification was performed on the 12 image data sets. Object-based classification operates at the scale of identifiable objects or patches in the landscape, rather than pixels. Usually these objects are derived directly from remotely sensed imagery, whereby spectrally similar neighbouring pixels are grouped together to form objects [32,51]. This is the main focus of the now established field of object-based image analysis (OBIA) or geographic object-based image analysis (GEOBIA). The development of OBIA has been linked closely with the emergence of VHR imagery since fine spatial resolution imagery is especially susceptible to within-feature (or within-object) variation and resultant pixel-based misclassification [1,19].
Object-based classification generally involves two main steps, segmentation and classification. Segmentation is conducted first, and this process can be influenced by various spatial parameters. For instance, in the case of eCognition [66] (the OBIA software package used here), the three main parameters of interest are scale, shape and compactness. These three parameters determine segmented objects on the basis of, respectively, the object's size (determined by spatial heterogeneity), its regularity of form (i.e., the complexity of an object's boundary configuration) and how closely packed the object's pixels are (through comparison of the object to a circle). Following segmentation, classification is conducted on the segmented objects. Each object is classified on the basis of its pixels' spectral information, but this can also be supplemented by additional discriminating variables such as object size and shape.
Here, considerable experimentation was conducted to determine the optimum OBIA approach, and ultimately a multi-stage (sometimes referred to as multi-scale) object-based classification procedure was developed (Figure 4). Initially, vegetation and non-vegetation features were distinguished (stage 1). Then, vegetation features were divided into their constituent classes (stage 2a), and separately non-vegetation features were divided into their constituent classes (stage 2b). Multi-stage OBIA approaches have been used widely in recent times to classify complex environments [24,67] since single-stage procedures cannot always achieve balanced segmentation outcomes for all classes of interest. That is, specific segmentation parameters may be suitable for certain classes (e.g., large areas of grassland), but lead to considerable under- (or over-) segmentation of other classes (e.g., buildings). Since multi-stage OBIA allows different parameter settings for different classes, this can achieve optimum classification outcomes for all classes [68].
A key factor for segmentation is how well segmented outputs correspond to real-world features. While the optimum shape and compactness settings remained consistent between input data sets (see Table 2 below), the scale setting had a significant impact on segmentation outcome [29]. Some recent work has promoted the use of built-in segmentation assessment, where appropriate segmentation scales are determined during the OBIA process (e.g., [69,70]). Here, we conducted sensitivity testing to compare a range of scale parameter settings and assessed their accuracy using a combination of objective metrics, as described by [71], and human assessment. Forty objects were selected randomly and compared against reference data acquired from the MasterMap vector coverage and field survey. In line with the recommendation in [72] to use multiple metrics to test the full range of segmentation characteristics, here we used five different metrics from [71] to check segmentation accuracy. The metrics employed were the Area Fit Index (AFI), which shows how closely segments overlap reference objects; two Relative Area (RA) measures, RAsub and RAsuper, which indicate over- and under-segmentation respectively; the Quality Rate (QR), which is an area-based measure that includes consideration of false positives when determining segmentation success; and the D index, which is a combined metric that considers both over- and under-segmentation to indicate how closely objects produced match ideal segmentation output. See [71] for further detail on these. Collectively, the five metrics provided a strong and varied test of segmentation accuracy. Nonetheless, the somewhat arbitrary nature of accuracy metrics' units and their sometimes conflicting outcomes [72] means that a visual check can also be useful [73][74][75]. The authors of [76] claimed that human interpretation represents the most effective means of assessing segmentation output, supported later by [77]. Therefore, ultimately, both quantitative metrics and qualitative assessment were used in combination to determine final scale settings.
Following segmentation, standard nearest neighbour classification was conducted to label objects to the most appropriate class. To ensure a fair comparison between pixel-based and object-based classification, the same training samples were used in all cases. As well as the straightforward spectral information provided by the different band sets, classification performance was enhanced (determined through trial-and-error) with additional discriminatory variables. Certain spatial object characteristics were incorporated, including area, shape and length/width ratio; and various spectral indices (described above) were also used: NDVI for all spectral band sets, plus NSDBI and NSDBRI for the 8 band set only. Finally, because of shadow effects with certain roof classes when using the 2 m spatial resolution imagery, individual buildings often tended to be classified as two objects, one representing the non-shaded side of the roof and the other representing the shaded side. Here, therefore, the concrete and clay roof classes were each first classified as two separate sub-classes (e.g., non-shadowed clay roofs, shadowed clay roofs) and then later combined to form a single (e.g., clay roofs) class (Figure 4, stage 3).
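The five area-based segmentation metrics named above can be sketched for a single reference object and its best-matching segment as follows. This is an illustrative sketch only: the formulas are a commonly cited formulation of these measures and are an assumption here, since the definitions actually used in the study are those given in [71]. The function name and input areas are hypothetical.

```python
import math


def segmentation_metrics(ref_area, seg_area, overlap_area):
    """Area-based metrics for one reference object and one segment.

    ref_area     -- area of the reference object (e.g., a map polygon)
    seg_area     -- area of the largest segment overlapping that object
    overlap_area -- area of their intersection
    Formulas are an assumed, commonly cited formulation (see lead-in).
    """
    union_area = ref_area + seg_area - overlap_area
    ra_sub = overlap_area / ref_area      # share of the reference covered
    ra_super = overlap_area / seg_area    # share of the segment inside it
    over_seg = 1.0 - ra_sub               # 0 means no over-segmentation
    under_seg = 1.0 - ra_super            # 0 means no under-segmentation
    return {
        "AFI": (ref_area - seg_area) / ref_area,
        "RAsub": ra_sub,
        "RAsuper": ra_super,
        "QR": 1.0 - overlap_area / union_area,
        "D": math.sqrt((over_seg ** 2 + under_seg ** 2) / 2.0),
    }


# Hypothetical case: a 100 m2 building covered by a 50 m2 segment lying
# wholly inside it, i.e., the building is over-segmented.
metrics = segmentation_metrics(ref_area=100.0, seg_area=50.0, overlap_area=50.0)
print(metrics)
```

Under this formulation a perfect match gives AFI = 0, RAsub = RAsuper = 1, QR = 0 and D = 0, so the metrics differ in scale and direction, which is one reason a visual check remains useful alongside them.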
In the past, considerable attention has focused on specific OBIA parameter settings, especially for the widely used eCognition [29], though this has created some difficulties for transferability since the OBIA process can be highly idiosyncratic to each particular study or image data set. As such, there is only limited benefit in reporting parameter settings, since these may not be directly transferable to another context. However, here, for completeness, but especially since this study is principally concerned with comparison between spatial, spectral and classifier characteristics using a common study area and data set, parameter settings are presented in Table 2. Notably, it is interesting to compare settings between the different spatial resolution inputs (30 m, 10 m, 4 m, 2 m), bearing in mind that in each case sensitivity testing was used to optimize classification outcome. It can be seen that the shape and compactness settings are consistent throughout all 12 classifications, whereas the optimum scale setting increased consistently from 30 m to 2 m resolution. Broadly speaking, increasing the scale parameter increases average segment size, and this makes sense in the current context whereby the higher scale settings at finer resolutions offset smaller pixel sizes, leading to consistently sized objects (i.e., consistent between varying input spatial resolution). Also, the optimum scale setting was consistently larger for vegetation classes (e.g., large parcels of grassland and woodland) than non-vegetation classes (e.g., small urban features such as buildings and roads). Finally, following segmentation, a merging procedure can be used to combine spectrally similar objects, thus refining the final segmented output. Here, this proved helpful only in the case of the input 2 m spatial resolution data, since the complexity of this imagery led inevitably to some degree of over-segmentation.

Accuracy Assessment and Statistical Testing
Following classification, the 36 output land cover maps were tested against reference data to calculate their accuracy. To enable direct comparison between the three different classification approaches, it was necessary to adopt a common means of accuracy assessment, so point-based checking was conducted. It is important to note that alternative object-based approaches are now available for use with OBIA outputs [78][79][80][81], and these can have the benefit of providing an exact match between analysis data (i.e., classified objects) and reference data (e.g., vector map features). In total, 438 sample points were checked, with between 30 and 43 points used for each individual class. The same points were used for all classification outputs, ensuring direct comparability between results. Confusion or error matrices were generated to show correspondence between predicted (classified) and reference class labels, indicating class-level accuracies (including user's and producer's accuracy), inter-class confusion or error, and overall classification accuracy.
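As a minimal sketch of how these accuracy measures follow from an error matrix, the function below computes overall, user's and producer's accuracy. The row/column convention (rows are classified labels, columns are reference labels) is an assumption, and the three-class matrix is invented for illustration; it is not the study's data.

```python
def accuracy_summary(matrix):
    """Return (overall, user's, producer's) accuracy from an error matrix.

    matrix[i][j] is the number of test points classified as class i whose
    reference label is class j (assumed convention). Every class is assumed
    to have at least one classified and one reference point.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    # User's accuracy: correct points divided by all points mapped as class i.
    users = [matrix[i][i] / sum(matrix[i]) for i in range(n)]
    # Producer's accuracy: correct points divided by all reference points of class j.
    producers = [
        matrix[j][j] / sum(matrix[i][j] for i in range(n)) for j in range(n)
    ]
    return correct / total, users, producers


# Invented three-class example with 100 test points.
example = [
    [28, 2, 0],
    [3, 30, 4],
    [1, 1, 31],
]
overall, users, producers = accuracy_summary(example)
print(overall, users, producers)
```

Because all 36 maps are checked against the same 438 points, matrices built in this way are directly comparable between data set/classifier combinations.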
While error matrices provide a useful means of comparing classification results, they can only provide an "estimate" of classification accuracy (based on the sample of points used), and therefore only tentative conclusions can be drawn [82]. This is especially the case where differences in accuracy are marginal, for instance a few percentage points apart. For example, it may be unwise to assert that a land cover map with an accuracy of, say, 93% is definitively more accurate than a map with 89% accuracy. This 4% difference may in fact be a statistical artefact of the sample of test points. The authors of [83] state that accuracy statements should be compared in a statistically rigorous manner and the results expressed with confidence limits. Here, the McNemar test was used to compare classification outputs and indicate the statistical significance of any difference in results [82]. That is, in the example above, the McNemar test could indicate whether or not a difference of 4% is statistically significant. As [82] notes, this is a non-parametric test that is focused on the binary distinction between correct and incorrect class allocations of two classification outputs (LC map 1 and LC map 2). The McNemar test calculates the z value:

$$z = \frac{f_{12} - f_{21}}{\sqrt{f_{12} + f_{21}}}$$

where $f_{12}$ indicates the total number of paired class allocations correct in LC map 1 but incorrect in LC map 2, and $f_{21}$ indicates the total number of paired class allocations correct in LC map 2 but incorrect in LC map 1. If z ≥ 3.2, this demonstrates a significant difference between two LC maps at the 99% confidence level [84].
Here, a fully rigorous and exhaustive approach was adopted for expressing the statistical significance of classification output differences. The McNemar test was conducted on every possible pair of classified land cover maps. With 36 original maps, this meant 630 paired combinations. The results, expressed as a matrix, enable straightforward comparison between all classifications, clearly identifying those classification pairs that are significantly different and those that are not.
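The pairwise test can be sketched as below, assuming the standard McNemar z statistic without continuity correction, z = (f12 - f21) / sqrt(f12 + f21). The function name and the per-point correctness lists are hypothetical; only the 438 test points, the 36 maps and the 3.2 threshold come from the text.

```python
import math


def mcnemar_z(correct_1, correct_2):
    """McNemar z for two maps assessed at the same test points.

    correct_1 and correct_2 are equal-length lists of booleans stating
    whether each test point was correctly labelled in map 1 and map 2.
    """
    pairs = list(zip(correct_1, correct_2))
    f12 = sum(1 for a, b in pairs if a and not b)  # right in 1, wrong in 2
    f21 = sum(1 for a, b in pairs if b and not a)  # right in 2, wrong in 1
    if f12 + f21 == 0:
        return 0.0  # the two maps never disagree on correctness
    return (f12 - f21) / math.sqrt(f12 + f21)


# Invented example over 438 points: 398 agree, 30 favour map 1, 10 favour map 2.
map_1 = [True] * 398 + [True] * 30 + [False] * 10
map_2 = [True] * 398 + [False] * 30 + [True] * 10
z = mcnemar_z(map_1, map_2)
print(z, z >= 3.2)

# Number of distinct pairs among 36 maps.
n_pairs = math.comb(36, 2)
print(n_pairs)
```

Only the points on which the two maps disagree contribute to z, which is why two maps with visibly different overall accuracies can still fall below the significance threshold.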

Results
In total, 36 land cover maps were produced, using a combination of four spatial resolutions (30 m, 10 m, 4 m, 2 m), three spectral band sets (4 bands, 6 bands, 8 bands) and three classifiers (ML, SVM, OBIA). The main aim of this paper is to provide a comprehensive comparison between these variables, so for completeness extracts of all 36 classified maps are provided in Figure 5. Note, this figure should be interpreted with some caution since it shows only one small area and is not therefore fully representative of land cover throughout the whole study extent. Nonetheless, the figure clearly shows the most significant and consistent pattern evident throughout the results: classification improves as spatial resolution becomes finer.

While the full set of statistical classification results are summarized below, two full error matrices are first presented to give some examples of class-level detail. To provide contrast and show the full range of classification success, the most accurate classification overall (2 m, 8 bands, OBIA) and least accurate classification overall (30 m, 4 bands, OBIA) are presented (Tables 3 and 4, respectively). The highest overall classification accuracy (91%) was achieved by arguably the most sophisticated data set/classifier combination, using the most advanced spatial and spectral characteristics of WV2 imagery and state-of-the-art OBIA (Table 3). This classification enabled relatively accurate mapping of all classes, with only minor confusion between vegetation classes and between concrete and other impervious classes. In contrast, it is clear that the less sophisticated data set/classifier combination (using relatively coarse 30 m resolution and only four basic spectral bands) is wholly inadequate in classifying such detailed urban land cover classes (Table 4). Few classes are mapped with any success and overall classification accuracy is only 35%. Perhaps the most significant factor here, as will be discussed below, is the coarse spatial resolution, which prevents accurate identification of small urban features.
A summary of classification accuracies for all 36 land cover maps is provided in Figure 6. This figure presents raw overall classification accuracies and enables direct assessment of the differences between classifications. However, this does not indicate which of these differences are statistically significant. Therefore, a full matrix of z values (calculated from the McNemar test of statistical significance) between all 630 classification pairs is also presented, in Figure 7. Statistically significant differences (i.e., z values ≥ 3.2, at the 99% confidence level) are highlighted in grey. Note, the figure clearly contains a large volume of information and requires careful interpretation, but it is included here to enable comprehensive and unlimited comparison between data set/classifier combinations.
The most obvious finding is the clear correlation between spatial resolution and classification accuracy. As spatial resolution becomes finer (from 30 m, through 10 m and 4 m, to 2 m), classification accuracy increases consistently, with all spectral band sets and all classifiers (Figure 6). Moreover, these differences across resolutions (i.e., comparing common spectral band sets and classifiers) are all statistically significant (Figure 7). At the coarsest resolution (30 m), accuracy ranges between 30% and 40%, while the finest (2 m) resolution imagery leads to accuracies routinely above 80%, and at maximum in excess of 90% (Table 3). This finding reaffirms the contention that accurate and detailed mapping of complex urban environments requires spatially detailed data, and here contemporary VHR imagery holds considerable value for the mapping community.
The relationship between the number of spectral bands and classification accuracy is less marked than that of spatial resolution. Nonetheless, overall, increasing the number of spectral bands (from 4, through 6, to 8) does lead to modest increases in classification accuracy (Figure 6), though these are not always statistically significant (Figure 7). This trend is consistent at all spatial resolutions, though more pronounced at the finer (4 m and especially 2 m) resolutions than the coarser (30 m and 10 m) resolutions. For instance, for the 2 m resolution imagery, average accuracy (i.e., the average of all three classifiers) increases from 79.1% when using 4 bands, to 83.5% when using 6 bands and 86.9% when using 8 bands. In contrast, for the 30 m imagery, average accuracy increases only very slightly from 32.9% for 4 bands, to 34.8% for 6 bands and 35.3% for 8 bands. (Note, the influence of the number of spectral bands on classification accuracy also depends on the classifier used, as discussed below.) This finding demonstrates that enhanced spectral information can aid distinction of detailed thematic classes in complex environments. Notably, here, the additional bands offered by contemporary VHR sensors such as WV2 and WV3 may offer some advantage over early VHR sensors such as IKONOS.
The influence of choice of classifier on classification accuracy is more complex than that of spatial resolution or spectral band set. The results show that the classifier can have a noticeable effect on accuracy, but only when considered in combination with spatial resolution and/or number of spectral bands (Figure 6). At the coarsest spatial resolution (30 m), differences between classifiers are marginal, and generally not statistically significant (Figure 7). However, it is interesting to note that the ML classifier performs slightly better overall at this resolution (or at least no worse, when factoring in statistical significance) than the more sophisticated SVM and OBIA approaches. This pattern continues at the next finest resolution (10 m), and here both pixel-based classifiers (ML, SVM) also prove significantly more accurate than OBIA.
At the finest spatial resolutions (4 m, 2 m), patterns related to choice of classifier change from those observed at the coarser resolutions, and also become more defined (Figure 6). The SVM classifier is now consistently (often significantly) more accurate than the ML classifier (Figure 7). For instance, for the 4 m resolution imagery, average SVM accuracy (i.e., the average of all three spectral band sets) is 75.7%, compared to an average ML accuracy of 73.9%. At 2 m resolution, the difference is even more pronounced: average SVM accuracy is 84.6%, compared to an average ML accuracy of 82%.

The Key Role of Spatial Resolution
Spatial resolution is the most significant factor in determining the success or otherwise of mapping complex urban land cover (Figure 6); this is clear, and indeed unsurprising [21]. VHR satellite sensor imagery has proved a game-changer here, obviously increasing the spatial detail and spatial accuracy of urban land cover maps as compared against medium resolution imagery, but also substantially increasing the level of thematic detail. While Landsat-like image classifications were often limited to a single general "urban" class [11,85], VHR imagery enables many constituent urban land cover types to be discriminated [35,86].

Spectral Data Dimensionality
While the role of spatial resolution in urban mapping is fairly straightforward, the role of image spectral characteristics is less clear. Recent VHR sensors such as WV2 and WV3 now have enhanced spectral capabilities compared to early generation VHR instruments like IKONOS and QuickBird. It should be noted that the new bands provided by WV2 and WV3 (coastal, yellow, red edge and NIR2) were not necessarily developed with urban environments in mind. Instead, the main stated intentions were to enhance capabilities for bathymetry (coastal) and vegetation (yellow, red edge, NIR2) analysis. However, we show some evidence here that the greater spectral capability of WV2 can indeed increase urban mapping accuracy over older four-band (e.g., IKONOS) imagery (Figure 6). This benefit is most pronounced, and only really statistically significant, at the finest spatial resolutions, and especially using OBIA.
When designing this experiment, we did wonder whether the Hughes effect would play any obvious role in influencing classification accuracy. This effect refers to the "curse of dimensionality" where adding spectral bands can in fact reduce classification accuracy, essentially since more statistical demands are being made of (inherently limited or sparse) training data. Clearly, any such effect would counteract the intuitive expectation, as here, that added spectral detail should increase class separation. Overall, there was no noticeable Hughes effect. Generally classification accuracy stayed static or increased modestly as the number of spectral bands increased; there were certainly no obvious cases where accuracy decreased (Figure 6). This outcome seems satisfactory. The Hughes effect is perhaps more of a concern with higher dimension (e.g., hyperspectral) data, where it may be necessary to perform data reduction on a data set with tens or hundreds of spectral bands [87]. It seems WV2's eight-band data set is sufficiently small not to invoke any Hughes effect. This is useful since it means there would be no particular need to consider data reduction at the outset of any project, at least from an accuracy perspective (there may be other considerations, e.g., computer processing time).

Classifier Choice
Choice of classifier is important in determining the success of classifying complex urban environments and the optimum choice will vary depending on image data characteristics. First, we consider the comparison between parametric (ML) and non-parametric (SVM) pixel-based approaches. An interesting pattern emerged here: ML was generally more accurate at the coarser spatial resolutions, but this trend was reversed at the finer resolutions with SVM becoming superior (Figure 6). This outcome is likely explained by the quality of the training data at the different resolutions and the fact that SVMs are better able than ML to handle complex, noisy data (i.e., as may occur at finer resolutions) [18]. As might be expected, this finding was most pronounced at the finest, 2 m, resolution, where differences between SVM and ML were generally statistically significant (Figure 7).
Next, we consider the comparison between pixel- and object-based approaches. Here, somewhat surprisingly, at the coarser spatial resolutions, pixel-based approaches tended to be more accurate than OBIA (Figure 6). In fact, even at the finer resolutions, for small spectral band sets (4 bands, and sometimes even 6 bands), ML and SVM outperformed OBIA. However, for the most sophisticated and complex data sets (2 m/8 bands; also 4 m/8 bands), OBIA was markedly (and significantly, Figure 7) more accurate. Indeed the OBIA classification of the 2 m resolution, 8 band data set was comfortably the most accurate result overall (Table 3). This finding reinforces the contention that OBIA is particularly well suited for VHR imagery [88][89][90]. At this fine scale of observation, within-feature variation is likely, which may well lead to pixel-based misclassification, but may be mitigated by aggregation at the scale of the object. It is interesting, though, that the number of spectral bands has a noticeable influence on OBIA classification accuracy, and the results imply that contemporary VHR instruments with enhanced spectral capabilities have particular potential for urban mapping, holding a considerable advantage over early VHR sensors.
When considering pixel- and object-based classification accuracy, it should be noted that a point-based assessment procedure was adopted since this enabled direct comparison between the different classification approaches. However, some practitioners have recently promoted the uptake of object-based assessment procedures, suggesting they may provide a more appropriate test of OBIA outputs [78][79][80][81].

Project Requirements versus Project Resources
This research presents various data and analysis considerations for urban mapping projects. The other essential consideration relates to project resourcing, since this will have a bearing on both the imagery acquired and the methodology employed. The results here show that VHR imagery is essential for accurate, detailed thematic mapping of urban land cover. Unfortunately this imagery can be costly, unlike the free provision of all Landsat data. Moreover, the more advanced image products (e.g., 2 m, 8 band WV3 imagery) tend to be considerably more expensive than basic (e.g., 4 m, 2 band) products. Also important here are computer, software and operator resources, and in general OBIA approaches tend to require more resource than pixel-based classification approaches. For instance, OBIA generally involves considerably more operator input than ML or SVM analysis, some OBIA operations require substantial computer processing resources, and OBIA packages can be relatively costly. This experiment found OBIA classification of 2 m spatial resolution, 8 spectral band imagery most accurate, though this combination is perhaps the most expensive in terms of resourcing. Satisfactory, cheaper alternatives may exist, depending on user requirements. Here, for instance, SVM classification of 2 m, 4 band data resulted in only a fairly modest decrease in accuracy against the maximum, and this approach would incur considerable savings in terms of data costs (4 band WV2 or WV3 imagery), software requirements (no OBIA purchase) and manpower (reduced operator time).

Conclusions
This paper presents an exhaustive practical experiment to demonstrate the success of contemporary spaceborne imagery and classification methodologies for mapping complex urban environments. This is a unique investigation to provide a full test of the latest VHR imagery for detailed urban classification, examining the influence of spatial resolution and spectral band set, as well as comparing traditional and modern classification approaches. In contrast, previous studies have generally tended to conduct limited comparisons between, for instance, coarse and fine resolution or pixel- and object-based classification. A detailed, 11 class classification schema is used here to identify the maximum level of thematic information that can be achieved using VHR imagery. Again, this contrasts with earlier work that has usually opted for few, broad land cover classes. Finally, our work is conducted on a relatively large image area, the city of Nottingham, UK and its environs, ensuring that urban land cover information is generated at a scale of practical value. In contrast, earlier experiments have often been limited to very small, local areas.
Overall, it is clear that spatial resolution is the most influential factor in enabling accurate mapping of complex urban environments: the finer the resolution, the higher the accuracy. New VHR sensors offer huge potential here, and ongoing technological advancement (and accompanying changes in legislation) implies that opportunities will continue to grow. WV3 was launched in 2014 with the potential for acquiring multispectral (8 band) imagery at a resolution of 1.2 m. Crucially, in 2015 new U.S. governmental legislation was passed that then allowed this image resolution to be made available to commercial users. While not as influential as spatial resolution, the new spectral capabilities provided by, for instance, WV3 can also lead to modest increases in urban mapping accuracy. This advantage is maximized through the use of contemporary classification approaches (e.g., SVM and especially OBIA), when compared against traditional ML classification. Overall, state-of-the-art VHR imagery (2 m resolution, 8 bands) and OBIA classification provides the most accurate combination for mapping complex urban land cover, but this is perhaps also the most costly and resource-hungry approach. Where resources are limited, requiring some compromise between imagery and methodology, the recommended order of priority is, first, spatial resolution (as fine as possible); second, classifier (first choice OBIA, second choice SVM); and, third, spectral band set (8 bands if possible).

Figure 2. WorldView-2 spectral wavebands (top line) and spectral band subsets used for comparative analysis.

Figure 3. The 36 image data set/classifier combinations (4 spatial resolutions × 3 spectral band sets × 3 classifiers) used for comparative classification analysis.
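
As a quick check on the experimental design, the 36 combinations and the 630 classification pairs can be enumerated directly. This is a minimal sketch, not the authors' code: the resolution and band-count values are taken from the Figure 7 caption, and the variable names and label format are illustrative assumptions.

```python
from itertools import combinations, product

# Factor levels as listed in the Figure 7 caption.
resolutions_m = [30, 10, 4, 2]       # spatial resolution in metres
band_sets = [4, 6, 8]                # number of spectral bands
classifiers = ["ML", "SVM", "OBIA"]  # classification approaches

# One label per data set/classifier combination (label format is hypothetical).
labels = [
    f"{clf} {res} m {bands}b"
    for res, bands, clf in product(resolutions_m, band_sets, classifiers)
]

# Every unordered pair of classifications, as compared by the McNemar tests.
pairs = list(combinations(labels, 2))

print(len(labels), len(pairs))
```

The pair count follows from 36 × 35 / 2, which matches the 630 paired comparisons reported.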

Figure 5. Land cover maps (detail) for the 36 data set/classifier combinations, plus the WorldView-2 image (© DigitalGlobe, Inc. All Rights Reserved) and MasterMap vector data (© Crown Copyright and Database Right 2015. Ordnance Survey (Digimap Licence)).

Figure 6. Overall land cover classification accuracies for the 36 data set/classifier combinations (ML = maximum likelihood, SVM = support vector machine, OBIA = object-based image analysis).

Figure 7. Matrix of McNemar test z values showing the statistical significance of differences between all classification pairs (ML = maximum likelihood, SVM = support vector machine, OBIA = object-based image analysis; 30 m, 10 m, 4 m and 2 m refer to spatial resolution; 4b, 6b and 8b refer to the number of spectral bands; z values ≥ 3.2 (highlighted grey) are statistically significant at the 99% confidence level).
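
For readers wishing to reproduce a single cell of such a matrix, the sketch below computes a McNemar z value for two classifications evaluated on the same reference samples. It assumes the commonly used form without continuity correction, z = (f12 − f21) / √(f12 + f21), where f12 and f21 count the samples that only one of the two classifications labelled correctly; whether a correction was applied in this study is not stated here. The function names and toy labels are hypothetical, and the 3.2 threshold is simply the value quoted in the caption.

```python
import math

Z_THRESHOLD = 3.2  # significance threshold quoted in the Figure 7 caption


def mcnemar_z(reference, pred_a, pred_b):
    """McNemar z statistic (no continuity correction) for two classifications."""
    f12 = 0  # samples correct in A but wrong in B
    f21 = 0  # samples wrong in A but correct in B
    for ref, a, b in zip(reference, pred_a, pred_b):
        a_ok = a == ref
        b_ok = b == ref
        if a_ok and not b_ok:
            f12 += 1
        elif b_ok and not a_ok:
            f21 += 1
    if f12 + f21 == 0:
        return 0.0  # the classifications never disagree in correctness
    return (f12 - f21) / math.sqrt(f12 + f21)


def significantly_different(z, threshold=Z_THRESHOLD):
    """True when the absolute z value reaches the chosen threshold."""
    return abs(z) >= threshold


# Toy example: A is right on all 20 samples, B is wrong on 16 of them.
reference = ["grass"] * 20
pred_a = ["grass"] * 20
pred_b = ["tree"] * 16 + ["grass"] * 4
z = mcnemar_z(reference, pred_a, pred_b)
print(z, significantly_different(z))
```

Only the discordant samples enter the statistic, so two classifications with similar overall accuracy can still differ significantly if they err on different samples.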

Table 1. Nottingham urban land cover classification schema.

Table 2 below), the scale setting had a significant impact on segmentation outcome [29]. Some recent

Table 2. Segmentation parameter settings for multi-stage object-based classification.

Table 3. Error matrix for the OBIA classification using 2 m spatial resolution, 8 spectral band imagery.

Table 4. Error matrix for the OBIA classification using 30 m spatial resolution, 4 spectral band imagery.