Mapping of Coastal Cities Using Optimized Spectral – Spatial Features Based Multi-Scale Superpixel Classification

The high interior heterogeneity of land surface covers in high-resolution image of coastal cities makes classification challenging. To meet this challenge, a Multi-Scale Superpixels-based Classification method using Optimized Spectral–Spatial features, denoted as OSS-MSSC, is proposed in this paper. In the proposed method, the multi-scale superpixels are firstly generated to capture the local spatial structures of the ground objects with various sizes. Then, the normalized difference vegetation index and extend multi-attribute profiles are introduced to extract the spectral–spatial features from the multi-spectral bands of the image. To reduce the redundancy of the spectral–spatial features, the crossover-based search algorithm is utilized for feature optimization. The pre-classification results at each single scale are, therefore, obtained based on the optimized spectral–spatial features and random forest classifier. Finally, the ultimate classification is generated via the majority voting of those pre-classification results in each scale. Experimental results on the Gaofen-2 image of Qingdao and WorldView-2 image of Hong Kong, China confirmed the effectiveness of the proposed method. The experiments verify that the OSS-MSSC method not only works effectively on the homogeneous regions, but also is able to preserve the small local spatial structures in the high-resolution remote sensing images of coastal cities.


Introduction
The term "coastal cities" usually refers to cities located within 100 km from the shoreline where many physical, natural, social and economic elements intersect [1,2].The development of coastal cities favors fast concentration of population and expansion of urban areas.The land surface covers of coastal cities are, therefore, facing rapid change.Due to changes of land surface covers having a close bearing on ecological protection and sustainable development of cities, prompt and accurate production of land-cover maps using remote sensing images play a key roles in the land-use planning and management of coastal cities [3,4].
Over the past two decades, high-resolution remote sensing images with fine spatial details of land surface covers have drawn much attention of researchers [4][5][6].The related mapping techniques have evolved from the pixel-based approaches to superpixels-based methods [7][8][9][10][11][12][13]. This is mainly due to the complex structure of diverse urban land covers which make the pixel-based classification methods hardly applicable for high-resolution remote sensing images.To be specific, due to the abundant details in high-resolution image cause the problem of excessive heterogeneity, ground objects of the same land surface covers may reflect excessively different spectral features, thereby the spectral separation of different land surface covers is more difficult and conventional pixel-based classification methods are not efficient enough in such cases [10].By contrast, the superpixels-based methods can simultaneously make use of the spectral and spatial information of images.Especially, the scale information of different ground objects are also considered in the superpixels-based methods, which is useful for remaining the edge information of different geometric structure [14].For example, Csillik proposed a fast segmentation and classification method of the very high-resolution remote sensing data using the Simple Linear Iterative Clustering (SLIC) superpixels.In [15], to improve the classification accuracy, both supervised and unsupervised Fractal Net Evolution Approach (FNEA, has been embedded in the commercial software eCognition) were developed for extracting buildings from very high-resolution imagery.These methods can firstly segment the high-resolution image into a number of homogeneous regions to reduce the interior heterogeneity of different kinds of land surface covers.The efficiency of the superpixel-based methods also have been verified in the classification of highly built-up urban areas with abundant fine spatial details [16][17][18].
However, due to the coastal cities being located in the surface junction area of continent and sea, the urbanization of a coastal area brings out increasingly fragmented landscapes including those typical types of urban areas such as buildings, blacktop, building shadows and particular land-cover types like wharf, bathing beach, rocky islets, etc.These diverse urban land covers form complex urban morphology and structures which make the superpixels-based methods challenging.On one hand, the so-called optimal scale superpixels are always hard to generate.Once the under-segmentation problem occurs, the classification accuracy will be inferior.The fusion of multi-scale superpixels are therefore can be an effective method for accommodating the various sizes of ground objects in a coastal city image.On the other hand, the single-feature extraction method focuses on the spectral or spatial feature and cannot sufficiently describe characteristics of different land types.Although the combination of spectral-spatial features has increased the separability of land types, many of the feature extraction methods are low-utility since they require predefined image structures, whereas ground objects in the coastal cities are size various and irregularly shaped [19,20].
Recently, the Attribute Profiles (APs) have widely been used for modeling the spatial characteristics of the ground objects in remotely sensed images [21][22][23][24].In [21], a set of profiles including the area and standard deviation attributes were used to extract the spatial information of hyperspectral images.Further, for the high-resolution images, APs can perform a multilevel analysis of the image by the sequential application of different kinds of attribute filters, which leads to the concept of Extended Multi-Attribute Profiles (EMAPs).As described in [25][26][27], one typical property of EMAPs is that their constructions avoid the predefining of image structures.In addition, the EMAPs can preserve the geometric characteristics of the relevant regions and effectively attenuate unimportant details [23,25,28].This is useful for keeping the boundaries of different land surface covers and decreasing their interior heterogeneity of spectral and spatial features.Nevertheless, due to the EMAPs usually appearing as high dimensions with redundancy and relevance, feature selection or optimization is imperative.
Exhaustive algorithms are the most basic method for finding a subset of prominent features to improve predictive accuracy and to remove the redundant features [29].However, this kind of algorithm is times consuming, that is, is if an image has Dim features, an exhaustive algorithm will have to test a 2 Dim times feature combination to search for the most informative subset features.Apart from the exhaustive algorithms, some intelligent optimization algorithms that are computationally inexpensive in terms of both memory and runtime, such as genetic algorithm (GA) and Particle Swarm Optimizer (PSO), also have been adopted to solve the feature selection problem [30][31][32][33][34].This kind of method can select the most informative feature subset from the initial feature space based on the optimization of certain criteria, like classification accuracy.The so-called Gravitational Search Algorithm (GSA) is a recently proposed intelligent optimization algorithm whose simple concept and superior performance have attracted much attention [35][36][37].More recently, we have proposed a crossover-based algorithm, for the dimensionality reduction of hyperspectral images [29].With the based method, the spectral subset with less bands and more discriminative spectral information are selected based on a combined optimization criterion constructed by the overall classification accuracy and the size of the band subset.Thus, the CROSSOVER-BASED ALGORITHM can be an effective method for the optimization of the EMAPs.
Given the background above, this paper proposes a Multi-Scale Superpixels-based Classification (MSSC) method for coastal cities using the Optimized Spectral-spatial (OSS) features (OSS-MSSC).In the proposed OSS-MSSC, the remote sensing image is firstly divided into multi-scale superpixels using the FNEA segmentation method based on three sets of scale parameters.Then, we extract the spectral and spatial features via the Normalized Difference Vegetation Index (NDVI) and EMAPs.The initial feature space is thereby constructed using the extracted spatial-spectral features and the multi-spectral bands of the image.To decrease the redundancy of these features, the CROSSOVER-BASED ALGORITHM is adopted to select the optimal combination of the spectral-spatial features.Accordingly, the pre-classification results at each single scale are generated using the Random Forest (RF) classifier and the majority vote scheme.The RF classifier is selected as the classier mainly because of the random and rotation forest classifiers having achieved outstanding performances in classification of multi-source and multi-temporal images [38][39][40][41][42], such as urban sprawl modelling [43], agriculture water management [44], and mapping of the wetland park [45].Finally, these pre-classification results are fused to obtain the ultimate land-cover map of coastal cities.
The remainder of this paper is organized as follows.Section 2 presents the details of the proposed OSS-MSSC method.Section 3 describes the experimental dataset and the corresponding comparison results.In Section 4, the detailed discussions of the proposed method is presented with some future scopes.Finally, Section 5 concludes the whole paper.

Methodology
The overall framework of the proposed OSS-MSSC is illustrated in Figure 1.As shown in Figure 1, there are three major steps within the proposed method: (1) Multi-scale superpixel generation: generating the superpixels with multiple scales to inosculate with different land surface covers; (2) Spectral-spatial feature extraction and optimization: extracting NDVI and EMAPs to construct the initial feature space along with the multi-spectral bands and optimizing the initial features to construct the optimized feature space; (3) Multi-scale superpixels-based classification: classifying the optimized spectral-spatial features first using the RF classifiers, then restricting the classification result by the generated superpixels at each single scale and majority voting scheme, and finally fusing these classification results at different scales to get the ultimate land-cover map of the coastal city via the majority voting rule.

Multi-Scale Superpixel Generation
Segmentation techniques are powerful tools for defining the spatial dependences of the pixels on a high-resolution remote sensing image [46].In the segmentation techniques, adjacent pixels with similar spectral features are always gathered into superpixels.Comparison to the pixels-based methods, the superpixel-based classification methods can effectively remove the 'salt and pepper' effect.In this paper, the multi-resolution segmentation (MRS) algorithm of Ecognition software, i.e., the FNEA-based segmentation method is adopted to perform the hierarchical over-segmentation to generate the superpixels.

Multi-scale Superpixel Generation
Segmentation techniques are powerful tools for defining the spatial dependences of the pixels on a high-resolution remote sensing image [46].In the segmentation techniques, adjacent pixels with similar spectral features are always gathered into superpixels.Comparison to the pixels-based methods, the superpixel-based classification methods can effectively remove the 'salt and pepper' effect.In this paper, the multi-resolution segmentation (MRS) algorithm of Ecognition software, i.e., the FNEA-based segmentation method is adopted to perform the hierarchical over-segmentation to generate the superpixels.
In FNEA, a set of segmentation parameters including scale, shape, and compactness are desired.Traditionally, users can get the proper parameters through a trial-and-error process.However, the trial-and-error process is time-consuming and operator-dependent [47].To overcome these two problems, many researchers have carried out different experiments to identify the relationship between segmentation parameters and segmentation results to find an effective way for segmentation parameter selection [48,49].However, in most of the cases, the solutions can only suit specific applicable conditions.
In this paper, we do not consider the selection of optimal segmentation parameters, but segment the high-resolution remote sensing image into a series of over-segmented superpixels using N set of parameters.As show in Figure 1, the superpixels at multiple scales from 1 to N can represent the local image information in an adaptive domain.For example, the small scale can accommodate to the small size objects like grass and trees while the large scale is suitable for the objects with large size such as buildings and roads.Specifically, the multi-scale superpixels can be generated following the five main steps below: (1) Define a scale parameter i∈ [1,2, … , ].The scale parameter is the threshold for breaking the merge of different pixels and determines the size of the superpixels/objects.In FNEA, a set of segmentation parameters including scale, shape, and compactness are desired.Traditionally, users can get the proper parameters through a trial-and-error process.However, the trial-and-error process is time-consuming and operator-dependent [47].To overcome these two problems, many researchers have carried out different experiments to identify the relationship between segmentation parameters and segmentation results to find an effective way for segmentation parameter selection [48,49].However, in most of the cases, the solutions can only suit specific applicable conditions.
In this paper, we do not consider the selection of optimal segmentation parameters, but segment the high-resolution remote sensing image into a series of over-segmented superpixels using N set of parameters.As show in Figure 1, the superpixels at multiple scales from 1 to N can represent the local image information in an adaptive domain.For example, the small scale can accommodate to the small size objects like grass and trees while the large scale is suitable for the objects with large size such as buildings and roads.Specifically, the multi-scale superpixels can be generated following the five main steps below: (1) Define a scale parameter i ∈ [1, 2, . . ., N].The scale parameter is the threshold for breaking the merge of different pixels and determines the size of the superpixels/objects.(2) Calculate the color and shape of each superpixel by: where D is the number of features utilized to calculate the color heterogeneity such as spectral heterogeneity, ω k is the weight of the k-th feature, σ k is the standard deviation of the feature values in the k-th feature in the superpixel, f smooth and f compact represent the smoothness and compactness of the superpixel, respectively, and ω s is the weight of the smoothness.(3) Calculate the regional heterogeneity of each superpixel by where ω ∈ [0, 1] is the weight of the shape.(4) Compare the regional heterogeneity h of each superpixel with the predefined scale parameter i and merge any of the two adjacent superpixels whose regional heterogeneity is smaller than i. (5) If the regional heterogeneity of the newly generated superpixel is larger than i, the merging process will be terminated.
Accordingly, as shown in Figure 1, with a series scale parameters i ∈ [1, 2, . . ., N], the multi-scale superpixels are generated for the following processes.

Spectral-Spatial Feature Extraction and Optimization
The coastal cities scenes in study areas are complicated, multi-scaled and irregularly shaped, which leads to a high between-class homogeneity and low within-class heterogeneity.Thus, it is difficult to classify land surface covers merely depending on the spectral information.Fortunately, the high-resolution images provide fine spatial structural details.In this paper, in addition to the multi-spectral bands, both the EMAPs and NDVI are also taken to describe the spectral-spatial features of superpixels on each single scale.The NDVI is used to increase the comparability of vegetation while the EMAP is used to model the geometrical structures of different artificial facilities like buildings, roads, and bridges.
Mathematical morphology is a well-established framework which provides operators to extract high-quality spatial features [21].The morphological profiles from these operators are particularly suitable for representing the size variability of different ground objects, but they are not sufficient to model other geometrical features.Thus, the morphological attribute filters were presented and the application of attribute filters in a multi-level way archives the extraction of the APs.Traditionally, for a grayscale image I, the APs can be generated by a sequence of thickening and thinning operations as follows [50]: where φ K I represents the filtered image generated by thickening operation φ with threshold λ K .Similarly, the γ K I is the filtered image generated by thinning operation γ with threshold λ K .Therefore, the output AP I is a stack of (2K + 1) images, including the original image, K filtered images from the thickening profiles, and the other K filtered images from the thinning profiles.
For the multi-spectral image, the APs can be expressed as: where B i is i-th band in the multi-spectral image and L is the number of bands.EMAP is the extension of APs obtained using multiple types of attributes.The three morphological attributes [22,51,52], area, standard deviation, and moment of inertia, are taken into account in this paper.Thereby the EMAPs utilized in this paper can be defined as: where AP a MS , AP s MS , AP in MS are the EMAPs for the multi-spectral image considering attribute area, standard deviation, and moment of inertia, respectively.
The performance of filtering operations implemented in EMAPs highly relies on the threshold value setting of each attribute [27].Due to the irregularly shaped and multi-scale characteristics of coastal city land surface covers, adopting a proper threshold value can be a challenging task.A smaller value may introduce noise when processing complex land surface covers in a large size, while a large one could attenuate the information of small sized-objects.In order to solve this issue, an automatic threshold selection strategy [21,53] is utilized in this paper, which generates a dense sampling of threshold values from a wide range according to the characteristics of each band from the tested datasets.
The constructed EMAPs are rich in spatial information but inevitably result in increase of the computational burden and curse of dimensionality problem [27].To reduce the information redundancy, the crossover-based method [29] is utilized to optimize the extracted spectral-spatial features in this paper.Comparing to the traditional exhaustive algorithm based feature selection method, the crossover-based method is adaptive and time-saving.In this method, pop feature subsets (X i , i ∈ [1, 2, . . ., pop]) are initialized randomly and the discriminative capability of each subset is evaluated based on the corresponding overall classification accuracy (OA(X i )) and size of the subset (D b ) as illustrated in Equation (7).
where ω b is a weight factor for the balancing the classification accuracy and the size of the i-th subset, D is the dimension of the spectral.Apparently, a subset with higher fitness value (fit) represents a better discriminative capability for image classification.
Following the iterative operators of the CROSSOVER-BASED method, including update of acceleration, mass, velocity, position, and the crossover operator, the optimal spectral-spatial features can be finally acquired when reaching the termination condition as shown in Figure 2. Usually, the maximum iteration times are adopted as the termination condition.

{ }
, , where , , AP AP AP are the EMAPs for the multi-spectral image considering attribute area, standard deviation, and moment of inertia, respectively.The performance of filtering operations implemented in EMAPs highly relies on the threshold value setting of each attribute [27].Due to the irregularly shaped and multi-scale characteristics of coastal city land surface covers, adopting a proper threshold value can be a challenging task.A smaller value may introduce noise when processing complex land surface covers in a large size, while a large one could attenuate the information of small sized-objects.In order to solve this issue, an automatic threshold selection strategy [21,53] is utilized in this paper, which generates a dense sampling of threshold values from a wide range according to the characteristics of each band from the tested datasets.
The constructed EMAPs are rich in spatial information but inevitably result in increase of the computational burden and curse of dimensionality problem [27].To reduce the information redundancy, the crossover-based method [29] is utilized to optimize the extracted spectral-spatial features in this paper.Comparing to the traditional exhaustive algorithm based feature selection method, the crossover-based method is adaptive and time-saving.In this method, pop feature subsets ) are initialized randomly and the discriminative capability of each subset is evaluated based on the corresponding overall classification accuracy ( ( ) i OA X ) and size of the subset ( b D ) as illustrated in Equation ( 7).
where b ω is a weight factor for the balancing the classification accuracy and the size of the i-th subset, D is the dimension of the spectral.Apparently, a subset with higher fitness value (fit) represents a better discriminative capability for image classification.
Following the iterative operators of the CROSSOVER-BASED method, including update of acceleration, mass, velocity, position, and the crossover operator, the optimal spectral-spatial features can be finally acquired when reaching the termination condition as shown in Figure 2. Usually, the maximum iteration times are adopted as the termination condition.

Multi-Scale Superpixel-based Classification
Based on the optimized features, we first perform classification of the coastal city based on the optimized spectral-spatial features using the RF classifier.However, the thickening and thinning operations in the EMAPs make the boundary of some land surface covers blurred.Moreover, the pixel-based RF classifier method always faces the 'salt and pepper noise' problem.In contrast, the

Multi-Scale Superpixel-based Classification
Based on the optimized features, we first perform classification of the coastal city based on the optimized spectral-spatial features using the RF classifier.However, the thickening and thinning operations in the EMAPs make the boundary of some land surface covers blurred.Moreover, the pixel-based RF classifier method always faces the 'salt and pepper noise' problem.In contrast, the boundaries offered by superpixels are clear and the pixels in superpixels are always more homogeneous.Therefore, the obtained classification results are then optimized by the superpixels generated at each single scale using the majority vote scheme.
Specifically, the obtained classification result is firstly mapped to the superpixels generated at each single scale to assign classification labels for all pixels.Then, the classification unit is defined as superpixel instead of individual pixel in the post-processing and the mostly frequently appeared label in a superpixel is considered as the new label of this superpixel as shown in Equation (8).
where Num is the total number of the expected classes, (i, j) is the coordinate of pixel reg(i, j), and f reg(i,j) is the label of pixel reg(i, j) in superpixel reg from the initial classification result.Thereby, the pre-classification results at each single scale are obtained and there are N classification results for each pixel of the coastal city image.The ultimate land-cover map is obtained via the fusion of the N classification results using the majority vote scheme following Equation (8).The classification results are quantitatively evaluated using two widely used precision evaluation criteria calculated from the confusion matrix, i.e., the Overall Accuracy (OA) and kappa coefficient (kappa) [54].

Introduction of Datasets
In this study, two high-resolution images from Gaofen-2 and Worldview-2 are selected as the study area as shown in Figure 3. Figure 3a shows the Gaofen-2 images of a core area of Qingdao (119 • 30 ~121 • 00 E, 35 • 35 ~37 • 09 N) which located on the west coast of the Yellow Sea.It is a key node city for the construction of the "21st Century Maritime Silk Road" of China with temperate monsoon climate which has undergone a rapid urbanization in the past twenty years.The Gaofen-2 data contained 4 multispectral bands spanning from visible to Near Infrared (NIR) spectral region, i.e., 3 visible spectral bands (blue: 450-520 nm defined as B1, green: 520-590 nm defined as B2, red: 630-690 nm defined as B3), 1 NIR band (770~890 nm defined as B4) with a spatial resolution of 4 m, and 1 panchromatic band (450-900 nm) with a resolution of 1 m.The used Gaofen-2 image (with a size of 4792 × 4800 pixels) was collected on 16 February 2016 under clear weather condition.The main land surface covers in this study area can be classified into 12 classes, i.e., building, open space, road, vegetation, shadow, rock, bare soil, beach, red sport ground, green sport ground, sand, and vacant.To make full use of the spectral and spatial information of the Gaofen-2 image, the tested image is fused using the Gram-Schmidt Pan Sharpening method [55] after the positive shooting correction in this paper.

Experimental Settings
In this study, we conduct a series of systematical experiments to validate the effectiveness of the proposed method.The experiments contain: (1).Experiment merely using the raw multi-spectral features and NDVI (defined as Raw); Figure 3b presents the Worldview-2 image of a marginal area called YuenLong of Hong Kong (113 • 31 -114 • 18 E, 22 • 05 -22 • 22 N) located on the south coast of South China Sea.The utilized Worldview-2 data contained 8 multispectral bands spanning from visible to NIR spectral region (blue: 450-510 nm defined as B1, green: 510-580 nm defined as B2, red: 630-690 nm defined as B3, NIR: 770-895 nm defined as B4, coastal: 400-450 nm defined as B5, yellow: 585-625 nm defined as B6, red edge: 705-745 nm defined as B7, and NIR 2: 860-1040 nm defined as B8) with a spatial resolution of 2 m.The used Worldview-2 image (with a size of 2636 × 2113 pixels) was collected on 14 January 2016 under clear weather condition.The main land surface covers in this study area can be classified into nine classes, i.e., building, open space, road, vegetation, shadow, river, inland water, aquafarm, and mudflat.As shown in Figure 3b, this study area contains more vegetation and nature types that than that of Qingdao.

Experimental Settings
In this study, we conduct a series of systematical experiments to validate the effectiveness of the proposed method.The experiments contain: (1) Experiment merely using the raw multi-spectral features and NDVI (defined as Raw); (2) Experiment using the spectral-spatial features (defined as SS); (3) Experiment using the optimized spectral-spatial features (defined as OSS); (4) Experiment using the proposed method (defined as OSS-MSSC); (5) Experiment using a favorite image classification model using convolutional neural network with 16 weight layers (denoted as VGG (Visual Geometry Group), the source code can be download from https://github.com/ry/tensorflow-vgg16)because of its simplicity and accuracy; (6) Experiment using the multi-scale superpixel based VGG by majority voting (defined as MSS-VGG).
The first two experiments can be used to test the effects of the extracted spectral and spatial features while the third experiments can be used to verify the effectiveness of the feature selection method.The fourth experiments are used to confirm the superiority of the proposed OSS-MSSC method.The last two experiments are taken to verify the effectiveness of the OSS-MSSC compared to the advanced classification method using deep learning which have demonstrated their superiority in large-scale image recognition [56].A typical advantage of the deep learning method is that they can abstract the spectral and spatial features in multiple levels using the repeated convolution and pooling operations.Note that the classifiers in all the first four experiments are the same RF classifier.Moreover, the utilized samples for Gaofen-2 and Worldview-2 images which contain 173,350 and 100,710 pixels, respectively, are also keep the same as listed in Table 1 (readers can ask for the tested image and samples by email).All these samples are manfully selected by referring to the Google Earth images.For the parameter setting, this study mainly concerns the parameters in the generation of superpixels, the extraction of EMAPs, the optimization of features using the CROSSOVER-BASED ALGORITHM, the RF classifier, and the network structures of utilized VGG.
Due to the multi-scale superpixels are generated using the FNEA method of the Ecognition Software, there are 3 parameters need to be set, including the scale, weight of the smoothness, and weight of the shape.To distinguish different kinds of land surface covers, we set three different scale levels in this paper as listed in Table 2.For the Gaofen-2 image of Qingdao, due to the vegetation mainly displays as small patches as shown in Figure 1, the first level focuses on the vegetation their shadow and the scale is set to 10.Moreover, considering various texture and shape of the vegetation, the weights of smoothness and shape in the first level is set to 0.2 and 0.8, respectively.The second level mainly concerns the artificial facilities with larger size.The scale, shape, and smoothness parameters is set to 20, 0.5, and 0.7, respectively.The third level concentrates on the line objects with large scale, such as the road in the coastal cities.In this level, the scale, shape, and smoothness parameters is set to 30, 0.7, and 0.7, respectively.For the Worldview-2 image, due to the study area located in the north of Hong Kong where preserves large area of nature land surface covers, river, aquafarm, and vegetation in this area show larger scales while building and shadow display smaller scales.Thereby, the scale, shape, and smoothness parameters of building and their shadow are set to 10, 0.7, and 0.8, respectively.For the water spectral with low heterogeneity, i.e., the river, inland water, and aquafarm, the three parameters are set to 20, 0.5, and 0.5, respectively.For the other urban land covers including vegetation, the three parameters are set to 15, 0.2, and 0.5, respectively as listed in Table 2.
For the generation of the EMAPs, the thresholds were automatically computed by using the automatic framework, which had been tested on many well-known datasets with different spatial resolutions and proven to be excellent in terms of classification accuracies [21,53].Specifically, a wide range of threshold settings for each attribute was established according to the characteristics of tested dataset as follows (each multi-spectral band): For the area attribute, λ a = (100/ϕ) × {1, 3, 5, 7, 9}; For the standard deviation attribute, λ s (B i ) = µ i × {σ min , σ min + δ s , σ min + 2δ s , . . ., σ max }; For the moment of inertia attribute, λ in = {0.1,0.3, 0.5, 0.7}; where ϕ and µ i are the resolution (i.e., 1 m for Gaofen-2 image and 2 m for Worldview-2 image) and average value of the i-th band, respectively.The values of σ min , σ max , and δ s are set to 2.5%, 27.5%, and 2.5%, respectively [21].Accordingly, 164 (including 44 area, 88 standard deviation, and 32 moment of inertia attributes as shown in Table 3) and 328 (including 88 area, 176 standard deviation, and 64 moment of inertia attributes as shown in Table 3) EMAPs features for the tested Gaofen-2 and Worldview-2 images can be generated, respectively.For the feature optimization of using the CROSSOVER-BASED ALGORITHM, all the parameters are set to the recommended values in [29], i.e., the initial constant, the decrease coefficient, the population size, the maximum number of iterations, and the weight factor are set to 20, 100, 10, 10, and 0.6, respectively.To decrease randomicity, 30 independent runs for ach tested image is performed and the results with median performance is utilized for the following classification.For the RF classifier, the value of tree in each of the experiment is set to 200 for the fair comparison.
For the convolutional network in VGG with 16 layers, the recommended parameters in [56] are adopted and the patch size is set to 21.Moreover, to make fair comparison, all the algorithms are implemented using Matlab 2017b on a computer with Intel (R) Xeon (R) CPU (1.70 GHz) and 32 GB of memory.

Optimization Results of the Features
As introduced in Section 3.2, in the process of the feature extraction, 164 and 328 dimensional EMAPs are generated for the Qingdao and Hong Kong area, respectively.That is to say, there are 169 features (includes 5 spectral features) available for the Gaofen-2 image and 337 features (includes 9 spectral features) for the Worldview-2 image.With the crossover-based feature optimization method, 55 and 102 of the 169 and 337 spectral-spatial features are selected.The corresponding results are listed in Table 3.
For the Gaofen-2 image of Qingdao, as shown in Table 3, with the feature selection method, the dimension of the EMAPs are sharply decreased from 164 to 50 (19 area attributes, 18 standard deviation attributes, and 13 moment of inertia attributes).Especially nearly 80% (18 of 88) of the standard deviation attributes in the EMAPs have been removed.Moreover, ten of the 18 selected standard deviation attributes come from the features at µ × 2.5 and µ × 5, and the others show roughly average distribution.This indicates that the internal heterogeneities of some objects are small and that of many other objects are large.Similarly, the 19 selected area attributes present roughly average distribution as well.This confirms the high variability of scales in the complicated tested image.Due to the moment of inertia is the measure of regional compactness, the calculated value of narrow area is larger while that of the compact area is smaller.Therefore, the moment of inertia attributes contribute larger to the extraction of road than other land surface covers.This is the main reason for the number of the selected moment of inertia attributes is relatively smaller than the other two kinds of attributes.In addition, from Table 3 we can conclude that the multi-spectral bands and the NDVI feature play a key role in the mapping of the coastal cities using the Gaofen-2 image.
For the Worldview-2 image of Hong Kong, as illustrated in Table 3, the dimension of the EMAPs are decreased to 93 (32 area attributes, 29 standard deviation attributes, and 32 moment of inertia attributes) from 328.Similar to that of the Qingdao image, the standard deviation attributes are sharply decreased from 176 to 29 and their distributions are roughly average.For the area attributes, it can be seen that most (20 of 32) of the selected attributes concentrate on the small and large areas.This may come from the buildings in the Yuenlong area are basically small in size and the open spaces are scattered, while the aquafarm, river, vegetation etc. are coherent and relatively large in area.Moreover, due to the aquafarm, river, inland water, and road occupy most of the areas and show regular shape, half (32 of 64) of the moment of inertia attributes are selected and they can be used to discriminate the narrow and square objects.Besides, similar to that of the Gaofen-2 image, all the multi-spectral bands and the NDVI feature are selected as shown in Table 3.

Qualitative Evaluation
The classification results of the two study area are shown in the Figures 4 and 5.For the Gaofen-2 image of Qingdao, the classification results only with the multi-spectral features (Figure 4a) is worse than the other methods based on the spectral-spatial features (Figure 4b-d), especially for the buildings and road with obvious morphological characteristics as highlighted in the white circles.For the classification results of VGG-based methods shown in Figure 4e,f, it can be found that these methods can also obtained acceptable classification results of the study area.However, some of the buildings (red) are misclassified to the open space (blue) as indicated by the black rectangles.For the Worlview-2 image of Hong Kong, as shown in Figure 5a,b, the first two methods display more complicated 'salt and pepper' problem.With the feature optimization strategy, the classification results of OSS show better integrality as presented in Figure 5c.Moreover, the proposed OSS-MSSC provides the best performance as shown in Figure 5d, especially for the circled water bodies.By contrast, some of the aquafarm is misclassified to inland water in the VGG-based methods as highlighted by the blue rectangles in Figure 5e,f.In addition, similar to the results of the Qingdao area, there are some buildings (red) are misclassified to the open space (blue) in the VGG-based methods.Besides, due to both the spectral and spatial information are utilized in the VGG-based method and the study areas are very large, the effectiveness of the multi-scale superpixels for the VGG-MSS method is hard to say and it is difficult to give more detailed analysis of the classification results.For further qualitative evaluation, 5 respective study areas shown in Figure 6, i.e., T1-T5 framed in Figure 3, are zoomed in and the corresponding classification results are presented in Figures 7-11.
From the 5 local details of the classification results given in Figures 7-11 we can find that the combination of the spectral-spatial features effectively improves the classification results in all the tested areas.As shown in Figures 7a, 8a, 9a, 10a and 11a, without the spatial information, the classification results show serious 'salt and pepper' problems.Especially in the bathing beach area where the buildings (red) and the open space (blue) are heavily confused with each other as illustrated in Figures 7a and 8a and Figure 10a.Similarly, the river (yellow), inland water (cyan), and aquafarm (magenta) in aquafarm area are confused with each other as illustrated in Figure 11a.This is mainly because of the Raw multi-spectral features based method only takes use of the spectral information and cannot utilize the abundant textures information of objects, while their spectral features are similar as shown in T4 and T5 of Figure 3.In contrast, the results shown in Figures 7b, 8b, 9b, 10b and 11b shown that the introduction of EMAPs effectively promotes the classification performance.In particular, from Figures 7b, 8b and 10b we can find that the confusion of the buildings and the open space has been largely decreased.This comes from their differences in spatial features, such as texture and roughness.For example, in the wharf area, the textures of the buildings and the open space like parking lot are very different, thus there can be separated using the SS-based method as shown in Figure 7b.For the bathing beach, the surface of the beach is very smooth while that of the building is rougher.They also can be separated with the comprehensive utilization of the spectral-spatial features as shown in Figure 10b.These results confirm that the spatial features extracted by the EMAPs describe the complicated textures of land surface covers in coastal city and this demonstrates that the spatial and geometry information is essential for the classification of the high-resolution images.Their efficiencies are also verified in the quantitative evaluation shown in Table 4.More importantly, from Figures 7d, 8d, 9d, 10d and 11d we can find that the classification results of the OSS-MSSC show satisfactory results in different kinds of land surface covers with various spectral and spatial characteristics.This is mainly becausethe multi-scale superpixels help to fully utilize the local information and overcome the "between-class homogeneity" problem in the pixel-based classification methods.Moreover, the VGG-based method faces the contour distortion problem as shown in Figures 7e, 8e, 9e, 10e and 11e.For example, the outlines of the buildings are badly smoothed and the outlines of the road are expanded as shown in Figures 7e, 8e, 9e and 10e.Even the combination of the multi-scale superpixels-based method can hardly alleviate these problem as displayed in Figures 7f, 8f, 9f and 10f.This is mainly caused by the convolution and pooling strategies of VGG.The problem of VGG-based method is more obviously revealed in the aquafarm area as shown in Figure 11.It can be found that the edges of the aquafarm (magenta) and river (yellow) are corroding inward and the surrounding areas are misclassified.

Quantitative Evaluation
From Table 4, it can be concluded that the proposed method has the best classification performance for the two study areas.For the Qingdao area, the OA reaches 97.25% with a best kappa coefficient 0.96.For the study area of Hong Kong, these evaluation criteria achieve 95.40% and 0.95, respectively.By comparison, the classification accuracy of the Raw spectral feature-based method is much lower when only spectral features are utilized as shown in Table 4.After adding the morphological profile features, the classification accuracy of the SS-based method increased greatly, the OA and kappa of Qingdao and Hong Kong images increased to 95.19%, 0.93 and 93.00%, 0.91, respectively.This demonstrate the importance of the spatial and geometry information in the mapping of the high-resolution image of the coastal cities.Moreover, for the Hong Kong image, the OSS-based method can promote the classification to a certain degree.Although the classification accuracy of OSS-based method is a little lower than that of the SS-based method for the Qingdao image, the OSS-based method greatly reduces the dimension of features and reduces the computation consuming.For the VGG and VGG-MSS based methods, owing to the deep learning of the spectral and spatial features of the high-resolution images, these methods performed superior results than the Raw-based based method.Nevertheless, due to the inaccurate boundary location problem, the classification accuracy is still lower that of the SS-, OSS-, and OSS-MSSC-based methods.Moreover, the VGG based method usually require much more computing time than that of the proposed method.

Discussion
A novel mapping method for coastal cities, denoted as OSS-MSSC, is proposed in this paper based on the multi-scale superpixels with optimized spectral-spatial features.This algorithm achieves effective classification of the high-resolution remote sensing image of a coastal city by using three strategies, i.e., the multi-scale superpixel generation, the spectral-spatial feature extraction and optimization, and the superpixels-based decision fusion.The contributions of these strategies are discussed as follows.

The Effectiveness of Spectral-Spatial Features
In the proposed method, apart from the multi-spectral bands and the NDVI, the EMAPs are utilized to extract the geometrical structures of different ground objects which can also decrease the between-class homogeneity of different land surface covers and increase the within-class heterogeneity of one type of land surface cover.Compared to those traditional feature extraction approaches like Markov Random Field (MRF) and Gray-Level Co-occurrence Matrix (GLCM), the EMAPs avoid the need of predefined image structures and effectively preserve the geometrical characteristics of the objects with irregularly shape.From the classification results shown in Figures 4 and 5, and Figures 7c,  8c, 9c, 10c and 11c, it can be concluded that the combination of spectral and spatial features effectively improves the holistic nature of classification results and eliminates many of the salt and pepper noise.Especially for the urban land covers with large within-class heterogeneity like open space (blue), road (white), and vacant (pink).This is mainly becausethe spectral features of the road are similar to the buildings while their morphological characteristics are very different.In addition, the OA and kappa obtained by the SS-based method shown in Table 4 also confirms the effectiveness of the spectral-spatial features.Moreover, the classification results of the residential area of Qingdao shown in Figure 8c reveal that for some of the land types, the feature optimization strategy is effective for decreasing the 'salt and pepper' problem.

The Effectiveness of Feature Optimization
The typical problem of high-dimensional features is their redundancy which always lead to the issue of dimensionality.Thereby, although high-dimensional spectral-spatial features have been extracted, their application needs the selection and optimization strategies.Moreover, the huge data of the extracted features requires large amount of storage space and computation times.To meet these challenges, the crossover-based feature optimization method is adopted to realize the feature selection.It has been validated that the crossover-based method outperforms many state-of-the-art intelligent algorithms in solving the dimensionality reduction problem [29].As shown in Table 3, the operator can remarkable decrease the dimension of the EMAPs.Moreover, the multi-spectral bands and NDVI feature are demonstrated with great importance for the classification of the Gaofen-2 image.Nevertheless, one problem that cannot be ignored is that, for the Gaofen-2 image of Qingdao, the classification results after the feature selection results are little worse than that produced with all the features although the amount of data has been significantly reduced.This is because some useful information in the unselected features are lost.We will try to explore and incorporate some more excellent feature optimization algorithm in our future work.

The Effectiveness of the Multi-Scale Superpixel Classification
Due to the high interior heterogeneity of different types of land surface covers, it is hard to fully utilize their spectral and spatial features to achieve high-quality image classification.To solve this problem, the multi-scale FNEA method is introduced to generated multi-scale superpixels to capture the edges of those land surface covers by the simultaneous utilization of the spectral and spatial information.In this way, there is no need to search for the optimal scale for image segmentation and a user can set the scale parameters in a relaxed environment.Moreover, the fusion decision strategy based on the majority vote scheme can further promote the classification accuracy.The experimental results shown in  obviously confirm the superiority of the multi-scale superpixels-based method, especially for those land surface covers with very similar spectral information and various scales like bare soil and sand in the Qingdao image.In other words, the OSS-MSSC method not only works effectively on homogeneous regions, but also is able to preserve the small local spatial structures in the high-resolution remote sensing images of coastal cities.

Conclusions
Classification of high-resolution images of a coastal city is a challenging task because of the complicated land surface covers.In this paper, the EMAPs are extracted to provide multilevel spatial description of the various land surface covers in a coastal city, and thereby decrease the inter-class variability and improve their intra-class variability.The optimized spectral-spatial features based on the crossover-based method are utilized to perform pre-classification, and the multi-scale superpixels are generated to capture the local spatial structures of the ground objects with various sizes.Experimental results on the Gaofen-2 image of Qingdao and Worlview-2 image of Hong Kong show the superiority of the proposed OSS-MSSC.The proposed method can preserve the small local spatial structures of the complicated land surface covers with irregular shapes and discriminate those land surface covers with similar spectral features.Especially, the proposed method has an outstanding capability for distinguishing the rock, bare soil, sand, and vacant areas as well as rivers, inland water, and aquafarm whose spectral features are strikingly similar to each other.However, although the crossover-based feature optimization method can effectively decrease the dimension of the spectral-spatial features, the method maybe at risk to information loss.This is mainly due to the fact that this kind of feature optimization method is based on feature reduction rather than information centralization.To meet this challenge, in the future, we will try to explore other feature optimization algorithms to promote classification accuracy.

Figure 1 .
Figure 1.Framework of the proposed OSS-MSSC (Multi-Scale Superpixels-based Classification method using the Optimized Spectral-spatial features).

Figure 1 .
Figure 1.Framework of the proposed OSS-MSSC (Multi-Scale Superpixels-based Classification method using the Optimized Spectral-spatial features).

Figure 2 .
Figure 2. The crossover-based search algorithm-based feature selection.

Figure 2 .
Figure 2. The crossover-based search algorithm-based feature selection.

Figure 3 .
Figure 3.The study area and the four zoomed-in representative area.(a) Gaofen-2 image of Qingdao; (b) Worldview-2 image of Hong Kong.

Figure 3 .
Figure 3.The study area and the four zoomed-in representative area.(a) Gaofen-2 image of Qingdao; (b) Worldview-2 image of Hong Kong.

Table 1 .
Training and testing samples for the two study area.

Table 2 .
Parameter settings in the generation of superpixels.

Table 3 .
Results of feature optimization.