Modelling the Spectral Uncertainty of Geographic Features in High-Resolution Remote Sensing Images: Semi-Supervising and Weighted Interval Type-2 Fuzzy C-Means Clustering

Spectral uncertainty refers to the diversity and variation of spectral characteristics within a single geographic object or across different objects of the same class. Existing methods usually represent spectral characteristics as precise single-valued curves, so spectral variations cannot be modeled, which restricts the analysis and classification performance of remote sensing images. Moreover, unsupervised methods perform poorly in classification and uncertainty modeling, while supervised methods need a large number of high-quality samples. Fuzzy semi-supervised clustering (FSSC) methods achieve high accuracy with limited labelled samples and have therefore attracted increasing attention. This paper proposes a novel method, based on interval type-2 fuzzy sets (IT2 FSs), to model the spectral uncertainty of very-high-resolution (VHR) images: the hierarchical semi-supervising and weighted interval type-2 fuzzy c-means for objects (hierarchical SSW-IT2FCM-O) clustering method. In this method, the VHR image is segmented into image objects to reduce spectral uncertainty within objects, and spectral values, spectral indices and textures are weighted for object-based image classification. To further reduce spectral uncertainty across different objects of the same class, the spectral characteristics of land cover types are represented as banded curves with certain widths instead of precise single-valued spectral curves. The experimental results show that the banded spectral curves produced by the hierarchical SSW-IT2FCM-O can effectively model the spectral uncertainty of geographic objects.
From the perspective of classification, four typical validity indices, the confusion matrix and the kappa coefficient were used to test the effectiveness of the hierarchical SSW-IT2FCM-O method. These measures show that SSW-IT2FCM-O achieves greater classification accuracy than existing FSSC methods and, more importantly, requires fewer training samples.


Introduction
Currently, determining land cover from remote sensing data is common and important. Because of the inherent fuzziness of geographical objects, and because the sensor, data acquisition, processing, conversion and transmission processes may produce or propagate errors, uncertainty is widespread in remote sensing data [1]. In the object-based image analysis of very-high-resolution (VHR) remote sensing images, this uncertainty appears at two levels. First, the spectral attributes of different pixels within a single object are often different. Second, the spectral characteristics of the same land cover type vary from object to object. Some studies have pointed out that the spectral curves of nearly all geographic objects are bands with certain ranges [1,2]. Figure 1 shows the difference between the single-valued spectrum and the banded spectrum of water, where the single-valued spectrum is the average spectrum on each band. For the banded spectrum, the lower and upper boundaries are the first and third quartiles of ground reflectance, so the spectral attribute on each band is expressed as an interval number. The single-valued spectrum does not contain this width information. As a result, the banded spectrum contains much more information than the single-valued curve and can express the uncertainty of the spectrum, whereas the single-valued curve cannot.
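The following minimal Python sketch, using made-up reflectance values, contrasts the two representations for one land cover type: the single-valued curve is the per-band mean, and the banded curve stores the interval [Q1, Q3] for each band. The function name `spectral_curves` and the sample data are illustrative assumptions; NumPy's default linear-interpolation percentile is used, and other quartile conventions give slightly different boundaries.

```python
import numpy as np

def spectral_curves(reflectance):
    """Return the single-valued and the banded spectral curve.

    reflectance: array of shape (n_samples, n_bands), one row per pixel or object.
    """
    reflectance = np.asarray(reflectance, dtype=float)
    mean_curve = reflectance.mean(axis=0)            # one value per band
    q1 = np.percentile(reflectance, 25, axis=0)      # lower boundary of the band
    q3 = np.percentile(reflectance, 75, axis=0)      # upper boundary of the band
    banded_curve = np.stack([q1, q3], axis=1)        # one interval [Q1, Q3] per band
    return mean_curve, banded_curve

# Hypothetical reflectance of four water samples in three bands.
water = np.array([[0.05, 0.04, 0.02],
                  [0.06, 0.05, 0.03],
                  [0.08, 0.06, 0.03],
                  [0.09, 0.07, 0.04]])
mean_curve, banded_curve = spectral_curves(water)
widths = banded_curve[:, 1] - banded_curve[:, 0]     # information the mean curve discards
```

The width column is exactly what the single-valued curve cannot carry: two classes with the same mean curve but different widths are indistinguishable without it.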
Remote Sens. 2019, 11, x FOR PEER REVIEW 2 of 30
Many studies have aimed to address the uncertainty of remote sensing data in the classification process. Among unsupervised methods, fuzzy c-means (FCM) clustering [3] is the classical method and is based on type-1 fuzzy sets [4]. The spectral curves adopted by FCM are precise single-valued curves, which may deviate greatly from the real spectral curves of geographical objects; this leads to the low classification accuracy of FCM and its extended algorithms. Some researchers improved FCM with interval type-2 fuzzy sets (IT2 FSs) to handle the uncertainty of the membership grade, for example interval type-2 fuzzy c-means (IT2FCM) [5] and interval-valued possibilistic fuzzy c-means (IPFCM) [6]. These methods use two fuzzifiers during classification, and the membership grade matrix of land cover types contains lower and upper membership values. However, the spectral curves used by these methods are still single-valued, so they still cannot effectively model the spectral uncertainty of geographic features.
Supervised methods can effectively simulate spectral curves but require a large amount of sample data to train a stable and robust classifier [7,8]. Ma et al. [7] summarized that most supervised methods need training samples amounting to 30-50% of the total data; for example, Wang et al. used approximately 30% of the total data to train an interval type-2 fuzzy set-based supervised classification method [9]. In practice, however, it is difficult to select large, high-quality samples, especially when users are unfamiliar with a study area; labelled samples are often scarce because they are difficult or expensive to collect, and sometimes the amount of data is too large to label completely [10]. Unlike supervised and unsupervised methods, fuzzy semi-supervised clustering (FSSC) methods can take advantage of limited labelled samples to improve the performance of unsupervised methods [11]. FSSC methods are therefore well suited to remote sensing data classification, where the investigation of a study area is usually limited. Recently, several extended versions of FSSC methods have been developed, such as the semi-supervised fuzzy c-means (ssFCM) [12], the semi-supervised kernel-based fuzzy c-means (S2KFCM) [13], the semi-supervised fuzzy clustering algorithm with feature discrimination (SFFD) [10], the semi-supervising interval type-2 fuzzy c-means algorithm using spatial information (SIIT2-FCM) [14], semi-supervised kernel fuzzy c-means in feature space (SKFCM-F) and semi-supervised multiple kernel fuzzy c-means (SMKFCM) [15]. Lai and Garibaldi compared existing studies and concluded that the Mahalanobis distance (ssFCM) can provide slightly better performance than the Gaussian kernel-based distance (S2KFCM) [11]. Like the FCMs and IT2 FCMs, these FSSC methods still treat the spectral characteristics as a single-valued curve.
Consequently, in our experiments, the above-mentioned unsupervised and semi-supervised methods had difficulty distinguishing water from shadows and grass from forest.
In summary, existing methods cannot effectively model the uncertainty of spectral curves, and how to model the spectral uncertainty of geographic objects and thereby improve classification performance remains an open problem. The objective of this study is to propose a hierarchical method to model spectral uncertainty in very-high-resolution (VHR) images based on the semi-supervising and weighted interval type-2 fuzzy c-means for objects (SSW-IT2FCM-O). In this method, the VHR image is first segmented into objects, and object-based features, including spectra, spectral indices (SIs) and textures, are extracted for classification. Then, the SSW-IT2FCM-O is used to classify the image objects. The proposed method has the following advantages: (1) The spectral curves of land cover types are bands with certain widths, so that the uncertainty of spectral characteristics can be modeled explicitly. (2) The prior knowledge from labeled samples is of three types: the hierarchical organization of land cover types, the weights of bands and SIs, and the statistical information of the labeled samples. (3) A novel weighted distance is proposed to measure the distance between interval cluster centers and segmented objects. (4) The fuzzy membership degree matrix of the proposed method consists of lower and upper membership grade matrices, whereas the fuzzy membership degree matrix of existing FSSC methods is single-valued; thus, the presented method can deal with the uncertainty of membership values. (5) The SSW-IT2FCM-O classifies image objects, which greatly reduces the uncertainty caused by the salt-and-pepper noise that is widespread in other FSSC methods, and it can effectively distinguish land cover types with similar spectral attributes. These advantages make the proposed method superior to existing FSSC methods.
The paper is organized as follows. In Section 2, the concept of IT2 FS, the weighted distance between interval number vectors, SIs and textures are reviewed. In Section 3, the main idea behind the proposed algorithm is described. In Section 4, the proposed method is tested experimentally and compared with the online support vector machine (LaSVM) [16], a popular supervised method, and with other semi-supervised methods such as ssFCM, SKFCM-F and SIIT2-FCM; the confusion matrix and kappa coefficient are used to validate the classification results, and the spectral curves modeled by the proposed method and the effects of sample size are discussed. Section 5 presents the conclusions.

Prerequisites
Our study is based on IT2 FSs; in the proposed method, banded curves are expressed as interval number vectors and each band has a different weight. Therefore, this section briefly introduces IT2 FSs and the weighted distance between interval number vectors. Because SIs are used to improve classification performance in this study, the selected SIs are also introduced.

IT2 FS
Fuzzy sets (i.e., type-1 fuzzy sets) have been applied in many domains because they can model fuzziness, as shown in Figure 2a. However, the membership function of a type-1 fuzzy set is single-valued and therefore cannot account for uncertainty in the membership values of fuzzy objects. This limitation can be overcome by type-2 and type-N fuzzy sets [17]. A general type-2 fuzzy set is characterized by a non-interval secondary membership function, which is difficult to handle and makes computation extremely expensive. IT2 FSs are a special case of type-2 fuzzy sets in which the secondary membership grade is a constant equal to one [18][19][20]. As shown in Figure 2b, the uncertainty of the membership grade is expressed by the footprint of uncertainty (FOU), which is bounded by the lower membership function (LMF) and the upper membership function (UMF). Compared with general type-2 fuzzy sets, IT2 FSs require much less computation, which makes them popular in applications. In this study, the definition of IT2 FS introduced by Mendel [20] is adopted.
Definition 1 (IT2 FS) [20]: an IT2 FS $\tilde{A}$ on the universe $X \neq \emptyset$ is given by
$$\tilde{A} = \left\{ \left((x, u), \mu_{\tilde{A}}(x, u) = 1\right) \mid \forall x \in X,\ \forall u \in U \right\},$$
where $U$ is the universe of discourse for the secondary variable $u$. Note that $U$ is a subset of $[0, 1]$; for the sake of convenience, the IT2 FS is denoted $\tilde{A}$.
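To make the FOU concrete, the sketch below builds one commonly used IT2 FS: a Gaussian primary membership function with a fixed mean and an uncertain standard deviation in [σ1, σ2]. The UMF is the Gaussian with the larger σ, the LMF is the one with the smaller σ, and every grade between them has secondary grade 1. The function name and parameter values are assumptions chosen only for illustration; the paper's IT2 FSs arise from two fuzzifiers rather than from Gaussians.

```python
import numpy as np

def gaussian_it2_mf(x, mean, sigma_low, sigma_high):
    """LMF and UMF of a Gaussian IT2 FS whose standard deviation lies in
    [sigma_low, sigma_high]; the region between them is the FOU."""
    x = np.asarray(x, dtype=float)
    lmf = np.exp(-0.5 * ((x - mean) / sigma_low) ** 2)    # narrower Gaussian
    umf = np.exp(-0.5 * ((x - mean) / sigma_high) ** 2)   # wider Gaussian
    return lmf, umf

x = np.linspace(0.0, 1.0, 11)
lmf, umf = gaussian_it2_mf(x, mean=0.5, sigma_low=0.1, sigma_high=0.2)
fou_width = umf - lmf   # vertical extent of the FOU at each x
```

The FOU collapses to a single point at the mean (both functions equal 1 there) and is widest on the flanks, which is where the membership grade is most uncertain.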

Weighted Distance between Interval Number Vectors
Interval numbers can handle the uncertainty of data. An interval number can be expressed as $\bar{x} = [x^{-}, x^{+}]$ ($x^{-} \le x^{+}$), where $x^{-}$ and $x^{+}$ are the infimum and supremum of $\bar{x}$, respectively, and the width of $\bar{x}$ is $x^{+} - x^{-}$; the greater the width of $\bar{x}$, the greater its uncertainty. An interval number vector is a vector whose elements are all interval numbers; it can be expressed as $\bar{\mathbf{x}} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$, where n is the number of dimensions and every element $\bar{x}_i$ ($1 \le i \le n$) is an interval number.
The weighted distance between interval number vectors is used to measure the distance between the spectral attributes of image objects and the banded curves of land cover types. There are currently multiple definitions of the distance between interval numbers. The Euclidean distance between interval numbers is commonly used, but it considers only the endpoints of the intervals [21]. The definition proposed by Li et al. [22] has been shown to be more effective for remote sensing image classification [23] and is adopted in this study.
Let $\bar{a} = [a^{-}, a^{+}]$ and $\bar{b} = [b^{-}, b^{+}]$ be two interval numbers; then the interval distance between $\bar{a}$ and $\bar{b}$ is calculated as follows:
Then, for two interval number vectors $\bar{\mathbf{x}} = (\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_n)$ and $\bar{\mathbf{y}} = (\bar{y}_1, \bar{y}_2, \ldots, \bar{y}_n)$, the distance between them is expressed as follows:
Let $W = (w_1, w_2, \ldots, w_n)$ be the weight vector of the attributes of $\bar{\mathbf{x}}$ and $\bar{\mathbf{y}}$; then the weighted interval distance is expressed as:
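The exact expressions of Li et al. [22] are not reproduced in this text. Purely to show how an interval distance, a vector distance and a weighted vector distance fit together, the sketch below assumes a midpoint-width distance, $d(\bar a, \bar b)^2 = (m_a - m_b)^2 + (w_a - w_b)^2/12$ with midpoint m and width w (the value obtained by integrating the squared gap between corresponding points of the two intervals), and aggregates it across dimensions as a weighted Euclidean sum. Both choices are assumptions and may differ from the paper's definitions; unlike the endpoint-only Euclidean distance, this form separates a location difference from a width difference.

```python
import numpy as np

def interval_distance(a, b):
    """Assumed midpoint-width distance between intervals stored as [..., 2] arrays."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    d_mid = (a[..., 0] + a[..., 1]) / 2 - (b[..., 0] + b[..., 1]) / 2
    d_width = (a[..., 1] - a[..., 0]) - (b[..., 1] - b[..., 0])
    return np.sqrt(d_mid ** 2 + d_width ** 2 / 12.0)

def weighted_interval_distance(x, y, w=None):
    """Weighted distance between interval vectors x, y of shape (n, 2).

    With w=None every dimension has weight 1 (the unweighted vector distance).
    """
    d = interval_distance(x, y)                       # per-dimension distances, shape (n,)
    w = np.ones_like(d) if w is None else np.asarray(w, dtype=float)
    return float(np.sqrt(np.sum(w * d ** 2)))

x = [[0.0, 2.0], [1.0, 1.0]]   # hypothetical interval vectors
y = [[1.0, 3.0], [0.0, 2.0]]
d_weighted = weighted_interval_distance(x, y, w=[0.7, 0.3])
```

In the first dimension the intervals differ only in location (distance 1); in the second they share a midpoint and differ only in width (distance $\sqrt{1/3}$), a case an endpoint-only comparison would conflate with a shift.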

Spectral Indices and Textures of Image Objects
In this study, SIs are used to improve the performance of uncertainty modelling and classification. Because different sensors have different numbers and widths of bands, SIs may vary for data from different sensors. We focus on the WorldView-2 sensor (8 bands), for which many SIs have been developed [24][25][26], including the normalized difference vegetation index (NDVI) [27], NDVI6 [26], the soil adjusted vegetation index (SAVI) [28], the normalized difference water index (NDWI) [29], the forest and crop index (FCI) [30], the normalized difference bare soil index [30], the verified enhanced vegetation index (VEVI) [31], the WorldView non-homogeneous feature difference (NHFD) [32], the normalized difference soil index (NDSI) [33], the normalized difference built-up index (NDBI) [34], the shaded vegetation index (SVI) [35], the gray value, and the average of the green, yellow and red bands (Table 1). Besides these SIs, the sum of all bands (SUM), brightness and the ratio of Nir2 are also selected. The chosen SIs are treated as newly generated bands for classification. An SI can provide useful information for a particular land cover type but may introduce uncertainty for other types, so two issues arise: which SIs should be selected and integrated into image clustering, and how to calculate the weights of these SIs. The gray level co-occurrence matrix (GLCM) [36] and its derived indicators, including energy (ASM), contrast (CON), correlation (COR) and entropy (ENT), are useful for remote sensing image classification [37]. In this study, these four texture features, homogeneity (HOMO) and the shape index are employed.
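Table 1 is not reproduced here, so the sketch below shows only the generic normalized-difference form shared by several of the listed SIs (for example NDVI from the NIR1 and Red bands, NDWI from the Green and NIR1 bands of WorldView-2) and computes the five GLCM textures in plain NumPy. The horizontal one-pixel offset, the symmetric GLCM, the 8 grey levels and the natural-log entropy are assumptions for illustration and need not match the texture settings used in the study; the patch is assumed to have at least two columns.

```python
import numpy as np

def normalized_difference(a, b):
    """Generic (a - b) / (a + b); e.g. NDVI = ND(NIR1, Red), NDWI = ND(Green, NIR1)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return (a - b) / (a + b)

def glcm_textures(patch, levels=8):
    """Textures from a symmetric, normalized GLCM of horizontal neighbour pairs."""
    patch = np.asarray(patch, dtype=float)
    lo, hi = patch.min(), patch.max()
    if hi == lo:                                   # constant patch: a single grey level
        q = np.zeros(patch.shape, dtype=int)
    else:                                          # quantize into `levels` grey levels
        q = np.minimum((patch - lo) / (hi - lo) * levels, levels - 1).astype(int)
    glcm = np.zeros((levels, levels))
    np.add.at(glcm, (q[:, :-1].ravel(), q[:, 1:].ravel()), 1)  # (pixel, right neighbour)
    glcm = glcm + glcm.T                                        # count both directions
    p = glcm / glcm.sum()
    i, j = np.indices(p.shape)
    mu_i, mu_j = (i * p).sum(), (j * p).sum()
    sd_i = np.sqrt(((i - mu_i) ** 2 * p).sum())
    sd_j = np.sqrt(((j - mu_j) ** 2 * p).sum())
    nz = p > 0
    # Correlation is undefined for a constant patch; report 1.0 in that case.
    cor = ((i - mu_i) * (j - mu_j) * p).sum() / (sd_i * sd_j) if sd_i * sd_j > 0 else 1.0
    return {
        "ASM": (p ** 2).sum(),                     # energy
        "CON": ((i - j) ** 2 * p).sum(),           # contrast
        "COR": cor,                                # correlation
        "ENT": -(p[nz] * np.log(p[nz])).sum(),     # entropy
        "HOMO": (p / (1.0 + (i - j) ** 2)).sum(),  # homogeneity
    }

ndvi = normalized_difference(0.5, 0.1)             # hypothetical NIR1 and Red reflectance
checker = glcm_textures(np.array([[0, 1], [1, 0]]), levels=2)
```

A two-level checkerboard is a useful sanity check: every horizontal pair changes level, so contrast is 1, homogeneity 0.5 and correlation -1.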

Methodology
The proposed hierarchical method is object based, and its process is described in Figure 3. In this section, we will illustrate this method in detail.

Preprocessing VHR Images
The VHR image (WorldView-2) contains one panchromatic band with a spatial resolution of 0.5 m and eight multispectral bands with a spatial resolution of 2 m. The necessary preprocessing, including radiometric and atmospheric correction and Gram-Schmidt sharpening of the eight multispectral bands, was performed with the ENVI software.

Segmentation
The sharpened image was segmented with the multi-resolution segmentation algorithm in the eCognition software (Version 8.64). The average spectral values of the pixels within a segmented object are defined as the spectral characteristics of that object, and the SIs in Table 1, the four texture features and the shape index are then calculated for each segmented object.
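The object-level averaging step can be sketched as follows, assuming the segmentation has been exported as an integer label raster with contiguous object ids 0..n-1 (the eCognition export itself is not shown, and `object_means` is a hypothetical helper). The per-object pixel count is returned as well, because it is proportional to the object area that later weights each object.

```python
import numpy as np

def object_means(image, labels):
    """Average every band over the pixels of each segmented object.

    image:  (rows, cols, bands) sharpened reflectance
    labels: (rows, cols) integer object ids 0..n_objects-1 (assumed contiguous)
    Returns (n_objects, bands) mean spectra and (n_objects,) areas in pixels.
    """
    flat_labels = np.asarray(labels).ravel()
    n_objects = flat_labels.max() + 1
    area = np.bincount(flat_labels, minlength=n_objects).astype(float)
    bands = np.asarray(image, dtype=float).reshape(-1, image.shape[-1])
    sums = np.stack([np.bincount(flat_labels, weights=bands[:, b], minlength=n_objects)
                     for b in range(bands.shape[1])], axis=1)
    return sums / area[:, None], area

image = np.array([[[1.0, 10.0], [3.0, 30.0]],
                  [[5.0, 50.0], [7.0, 70.0]]])   # 2 x 2 pixels, 2 bands
labels = np.array([[0, 0], [1, 1]])              # two objects, one per row
means, area = object_means(image, labels)
```

Using `np.bincount` with weights sums each band per object in a single pass, which scales to scenes with many thousands of objects.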

Sample Selection
As mentioned in Section 1, FSSC methods use limited samples for classification rather than the large number of samples required by supervised methods. In FSSC methods, these limited samples are used repeatedly to train the class centers during the classification process. Several representative samples are necessary for each land cover type, and the samples should be uniformly distributed over the study area. In this study, we use the ArcGIS software to draw sample regions for each land cover type and then select the samples through an overlay operation between this layer and the segmented image object layer.
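The overlay step can be approximated in raster form, as sketched below: the drawn sample regions are rasterized to class codes (0 meaning no sample), and an object becomes a labelled sample of the class that covers at least a given fraction of its pixels. The rasterized workflow, the function name and the `min_overlap` threshold are all hypothetical; the study performs this step with an ArcGIS overlay of vector layers.

```python
import numpy as np

def select_samples(labels, sample_raster, min_overlap=0.5):
    """Assign a class to segmented objects that overlap drawn sample regions.

    labels:        (rows, cols) object ids
    sample_raster: (rows, cols) class code of the drawn regions, 0 = no sample
    min_overlap:   hypothetical minimum fraction of an object's pixels that must
                   fall inside one class's regions for the object to be a sample
    Returns {object_id: class_code}.
    """
    samples = {}
    for obj in np.unique(labels):
        codes = sample_raster[labels == obj]      # class codes under this object
        drawn = codes[codes > 0]
        if drawn.size == 0:
            continue
        values, counts = np.unique(drawn, return_counts=True)
        best = counts.argmax()                    # majority class inside the object
        if counts[best] / codes.size >= min_overlap:
            samples[int(obj)] = int(values[best])
    return samples

labels = np.array([[0, 0, 1, 1],
                   [0, 0, 1, 1]])
sample_raster = np.array([[2, 2, 0, 0],
                          [2, 0, 0, 3]])
chosen = select_samples(labels, sample_raster)
```

Here object 0 is 75% covered by class 2 and is kept, while object 1 is only 25% covered by class 3 and is rejected at the default threshold.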

Hierarchical Classification
This step is the core of our study. In classical FCMs, IT2 FCMs and FSSC methods, all target land cover types are recognized at the same level, which makes it difficult to distinguish types with high spectral similarity, such as water bodies and shadows. From a cognitive point of view, multi-level or hierarchical methods help express and understand geographical knowledge and are widely used in VHR image classification [31,38]. These methods usually apply different thresholds to different SIs to recognize land cover types in a crisp way; however, suitable threshold values are often difficult to find. Unlike these strategies, our method distinguishes land cover types hierarchically in a fuzzy way.

Hierarchical Organization of Land Cover Types
Our method organizes the land cover types (and hence the labeled samples) as a multi-level decision tree (Figure 4). In this tree, all non-leaf nodes are classification nodes, at which the proposed soft classification method is used to distinguish their child types. All child nodes in the decision tree are statistical nodes, for which the average and the first (Q1) and third (Q3) quartiles of the spectral, SI and texture information are computed later.
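Because Figure 4 is not reproduced here, the tree below is a hypothetical example only (grouping water and shadows under "dark" is an assumption, not the paper's hierarchy). It illustrates the data structure described above: nodes with children are classification nodes, every node can report statistics (area-weighted mean, Q1, Q3) from the labelled samples of its subtree, and a pre-order traversal gives the top-down order in which classification nodes would be processed. The class `Node` and helper `leaf` are illustrative names.

```python
from dataclasses import dataclass, field
from typing import List, Optional

import numpy as np


@dataclass
class Node:
    """A land cover node; nodes with children are classification nodes."""
    name: str
    children: List["Node"] = field(default_factory=list)
    samples: Optional[np.ndarray] = None  # (n_samples, n_features), leaves only
    areas: Optional[np.ndarray] = None    # (n_samples,) object areas, leaves only

    def all_samples(self):
        """A leaf's own samples, or the union of its descendants' samples."""
        if not self.children:
            return self.samples, self.areas
        parts = [child.all_samples() for child in self.children]
        return (np.vstack([x for x, _ in parts]),
                np.concatenate([a for _, a in parts]))

    def statistics(self):
        """Area-weighted mean and first/third quartiles of every feature."""
        x, a = self.all_samples()
        mean = (a[:, None] * x).sum(axis=0) / a.sum()
        q1, q3 = np.percentile(x, [25, 75], axis=0)
        return mean, q1, q3


def classification_order(node: Node) -> List[str]:
    """Pre-order (top-down) list of the classification (non-leaf) nodes."""
    if not node.children:
        return []
    order = [node.name]
    for child in node.children:
        order += classification_order(child)
    return order


def leaf(name, values, areas):
    """Hypothetical helper: a leaf whose samples have a single feature."""
    return Node(name, samples=np.array(values, dtype=float)[:, None],
                areas=np.array(areas, dtype=float))


# Hypothetical hierarchy for illustration only (not the tree of Figure 4).
root = Node("root", [
    Node("dark", [leaf("water", [0.05, 0.07], [10, 30]),
                  leaf("shadows", [0.08, 0.10], [5, 5])]),
    Node("vegetation", [leaf("grass", [0.40, 0.50], [1, 1]),
                        leaf("woods", [0.60, 0.70], [2, 2])]),
    leaf("buildings", [0.30, 0.35], [4, 4]),
])
```

Storing samples only at the leaves and deriving parent statistics from the union keeps the labelled data in one place, so statistics at any level of the hierarchy stay consistent.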

Determining Subsets of Bands, SIs and Textures
For object-based classification, each image object carries three types of information: spectra, SIs and textures. Appropriate features are effective in distinguishing a land cover type from others, whereas inappropriate ones contribute little to this distinction. We therefore select the most useful bands, SIs and textures for each classification node with the feature selection method in the eCognition software.

Calculating the Weights
Once the useful bands, SIs and textures have been selected, weights must be assigned to them. The weighting is based on the hypothesis that if a land cover type can be distinguished from its nearest type, it can also be distinguished from the other types. In this step, we calculate the average and the first (Q1) and third (Q3) quartiles of the spectra of the selected bands for each node from the labeled samples. Let J be the number of bands (1 ≤ j ≤ J) for the i-th land cover type and let there be L selected samples; then the average value of each band's spectrum is calculated as:
$$\mathrm{Mean}_{ij} = \frac{\sum_{k=1}^{L} A_k B_{kj}}{\sum_{k=1}^{L} A_k},$$
where $A_k$ is the area of the k-th feature, $B_{kj}$ is the spectral value of the j-th band of the k-th feature, and $\mathrm{Mean}_{ij}$ is the mean spectral value of the j-th band of the i-th land cover type. The means of the selected spectral indices and textures of all land cover types are calculated in the same way. The initial spectral characteristic curve of the i-th land cover type can be expressed as $([Q1, Q3]_1, [Q1, Q3]_2, \ldots, [Q1, Q3]_J)$. In the decision tree (Figure 4), each classification node contains several child nodes; the interval distance between any child and each of its sibling nodes can be calculated from the first and third quartiles by Equations (2) and (3), and the nearest sibling is found by sorting these distances. If the t-th node is the nearest sibling of the i-th node, the weights $w_{i1}, w_{i2}, \ldots, w_{ij}$ (1 ≤ j ≤ J) of the selected bands can be determined as:
The SSW-IT2FCM-O is executed repeatedly for each classification node in the decision tree. That is, the SSW-IT2FCM-O is first executed at the top node, and then a subset of bands, spectral indices and textures is built for each child classification node. If a child node has its own child classification nodes, the SSW-IT2FCM-O is executed at these nodes in turn. The final result is obtained by collecting the results of all classification nodes.
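The nearest-brother (sibling) search can be sketched as below, treating each child's initial curve as one [Q1, Q3] interval per band. Two loud assumptions apply: the interval distance is the midpoint-width form (not necessarily that of Li et al. [22]), and because Equation (6) is not reproduced in the text, `hypothetical_band_weights` is a stand-in that simply weights each band by how strongly it separates a node from its nearest sibling, normalized to sum to 1. It is not the paper's weighting rule.

```python
import numpy as np

def interval_distance(a, b):
    """Assumed midpoint-width distance; a, b have shape (n_bands, 2)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    d_mid = a.mean(axis=-1) - b.mean(axis=-1)
    d_width = (a[..., 1] - a[..., 0]) - (b[..., 1] - b[..., 0])
    return np.sqrt(d_mid ** 2 + d_width ** 2 / 12.0)

def nearest_siblings(curves):
    """curves: {name: (n_bands, 2) array of [Q1, Q3]}; returns {name: closest sibling}."""
    nearest = {}
    for n in curves:
        dists = {m: np.sqrt((interval_distance(curves[n], curves[m]) ** 2).sum())
                 for m in curves if m != n}
        nearest[n] = min(dists, key=dists.get)
    return nearest

def hypothetical_band_weights(curve, sibling_curve):
    """HYPOTHETICAL stand-in for Equation (6): bands that separate the node
    from its nearest sibling more strongly receive larger weights."""
    per_band = interval_distance(curve, sibling_curve)
    total = per_band.sum()
    if total == 0:                                 # indistinguishable: equal weights
        return np.full(per_band.shape, 1.0 / per_band.size)
    return per_band / total

# Hypothetical [Q1, Q3] curves (two bands) of three sibling nodes.
curves = {
    "water":  np.array([[0.04, 0.06], [0.02, 0.03]]),
    "shadow": np.array([[0.05, 0.07], [0.05, 0.08]]),
    "grass":  np.array([[0.08, 0.12], [0.40, 0.50]]),
}
nearest = nearest_siblings(curves)
w_water = hypothetical_band_weights(curves["water"], curves[nearest["water"]])
```

In this toy case water and shadow are each other's nearest siblings, and the second band, where they differ more, receives the larger weight, which matches the stated hypothesis that separating a type from its nearest neighbour is what matters.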
For object-based classification, image objects have different sizes, whereas in existing pixel-based FSSC methods all pixels have the same area. Thus, pixel-based formulations are inappropriate for image objects. To make the SSW-IT2FCM-O fit image objects, the object areas are used as weights in the objective function. Considering the uncertainty of the fuzzifier m in classical FSSC methods, the objective function of SSW-IT2FCM-O is:
where C is the number of classes, N is the number of samples at the classification node, $A_j$ is the area of the j-th object, and $wd_{ij}$ is the weighted interval distance defined below. α is the scaling factor that balances the supervised and unsupervised components of the optimization. $F = [f_{ik}]$ is the prior knowledge matrix indicating the membership grades of the labeled samples, and $b_k$ is an indicator that distinguishes labeled from unlabeled patterns: if sample $x_k$ is labeled, $b_k = 1$; otherwise, $b_k = 0$. $m_1$ and $m_2$ are the two fuzzifiers. To handle the uncertainty of the remote sensing image, our method expresses the three types of information (spectra, SIs and texture values) of all class centers as interval number vectors:
$$V^X = (\bar{v}^X_1, \ldots, \bar{v}^X_C), \quad V^Y = (\bar{v}^Y_1, \ldots, \bar{v}^Y_C), \quad V^Z = (\bar{v}^Z_1, \ldots, \bar{v}^Z_C),$$
where C is the number of classes and all elements of $V^X$, $V^Y$ and $V^Z$ are interval number vectors. Because each of them carries a different weight in SSW-IT2FCM-O, a weighted distance between image objects and land cover types must be defined. Let each classification node contain the selected bands X, selected SIs Y and selected textures Z.
A weighted distance $wd_{ij}$ between object j and centroid i can be expressed as follows:
where $\|\cdot\|_W$ is the weighted interval distance defined by Equation (4). More specifically, $\|x_j - v^X_i\|_{W_X}$ is the weighted spectral attribute distance of object j to the spectral attribute center of class i, $\|y_j - v^Y_i\|_{W_Y}$ is the weighted spectral index distance of object j to the spectral index center of class i, and $\|z_j - v^Z_i\|_{W_Z}$ is the weighted texture distance of object j to the texture center of class i. The parameters $\beta_1$, $\beta_2$ and $\beta_3$ control the effects of the selected bands, SIs and texture values, respectively; for example, $\beta_3 = 0$ means that no texture information is selected. The weights $W_X$, $W_Y$ and $W_Z$ are calculated by Equation (6).
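Since the expression for $wd_{ij}$ is not reproduced in the text, the sketch below assumes the simplest form consistent with the description: a β-weighted sum of the three group distances, $wd_{ij} = \beta_1\|x_j - v^X_i\|_{W_X} + \beta_2\|y_j - v^Y_i\|_{W_Y} + \beta_3\|z_j - v^Z_i\|_{W_Z}$. The midpoint-width interval distance and the treatment of an object's crisp mean features as degenerate intervals [x, x] are further assumptions; all function names are illustrative.

```python
import numpy as np

def interval_distance(a, b):
    """Assumed midpoint-width distance; a, b are arrays of intervals with shape (n, 2)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    d_mid = a.mean(axis=-1) - b.mean(axis=-1)
    d_width = (a[..., 1] - a[..., 0]) - (b[..., 1] - b[..., 0])
    return np.sqrt(d_mid ** 2 + d_width ** 2 / 12.0)

def weighted_norm(x, v, w):
    """Assumed ||x - v||_W: weighted Euclidean aggregate of per-feature interval distances."""
    return float(np.sqrt(np.sum(np.asarray(w, dtype=float) * interval_distance(x, v) ** 2)))

def object_center_distance(obj, center, weights, betas):
    """Assumed wd_ij = b1*||x_j - v_i^X||_WX + b2*||y_j - v_i^Y||_WY + b3*||z_j - v_i^Z||_WZ.

    obj, center, weights: dicts keyed by 'X' (bands), 'Y' (SIs), 'Z' (textures).
    Object features are crisp means stored as degenerate intervals [x, x].
    """
    total = 0.0
    for key, beta in zip(("X", "Y", "Z"), betas):
        if beta == 0:  # feature group not selected at this classification node
            continue
        total += beta * weighted_norm(obj[key], center[key], weights[key])
    return total

obj = {"X": [[0.5, 0.5]], "Y": [[1.0, 1.0]], "Z": [[0.2, 0.2]]}
center = {"X": [[0.0, 0.0]], "Y": [[0.0, 0.0]], "Z": [[0.9, 0.9]]}
weights = {"X": [1.0], "Y": [1.0], "Z": [1.0]}
wd_no_texture = object_center_distance(obj, center, weights, betas=(1.0, 0.5, 0.0))
wd_with_texture = object_center_distance(obj, center, weights, betas=(1.0, 0.5, 1.0))
```

Setting $\beta_3 = 0$ drops the texture term entirely, which mirrors the text's example of a node where no texture information is selected.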
In type-1 fuzzy set-based FSSC methods, different fuzzifiers produce different membership values of samples and hence different classification results. To handle the uncertainty of membership values, two fuzzifiers $m_1$ and $m_2$ are used: the lower and upper membership functions are constructed from $m_1$ and $m_2$, and each membership value is expressed as an interval number $\bar{u}_{ij} = [\underline{u}_{ij}, \overline{u}_{ij}]$. The method of Lagrange multipliers is used to minimize the objective function (7)(8), giving the upper and lower membership grades of unlabeled samples:
and the upper and lower membership grades of labeled samples:
When type-1 fuzzy set-based FCMs and FSSC methods are used for remote sensing classification, the resulting spectral curves are single-valued: each is the membership-weighted average of the spectral values of the pixels in every band. For the interval type-2 fuzzy set-based IT2 FCMs and SIIT2FCMs, the centers are determined by the Karnik-Mendel (KM) algorithm [39], an effective method for computing the centroids of interval type-2 fuzzy sets in each iteration; however, all centers are then type-reduced and defuzzified into crisp values. As a result, the spectral curves obtained by IT2 FCMs and SIIT2FCMs are also single-valued rather than banded, and the width information of the spectral band is lost, which is inconsistent with the phenomenon of "same material, different spectra" illustrated in Figure 1. In this study, all class centers are expressed as interval number vectors to handle the uncertainty of the centers.
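Because Equations (10)-(13) are not reproduced in the text, the sketch below shows only one plausible way to obtain interval memberships for unlabeled objects: compute ordinary FCM memberships with each fuzzifier and take their element-wise minimum and maximum as the lower and upper grades. This min/max construction is an assumption, not necessarily the paper's update rule, and the supervised term for labeled objects (α, F, $b_k$) is omitted. If the area $A_j$ multiplies every term of object j in the objective, it cancels from that object's Lagrangian, so these memberships do not depend on area; area enters through the centroids instead.

```python
import numpy as np

def fcm_memberships(D, m):
    """Type-1 FCM memberships u_ij = 1 / sum_k (D_ij / D_kj) ** (1 / (m - 1)).

    D: (C, N) dissimilarities that multiply u**m in the objective
       (e.g. squared distances); rows are classes, columns are objects.
    """
    D = np.maximum(np.asarray(D, dtype=float), 1e-12)           # guard against D = 0
    ratio = (D[:, None, :] / D[None, :, :]) ** (1.0 / (m - 1.0))  # ratio[i, k, j]
    return 1.0 / ratio.sum(axis=1)

def interval_memberships(D, m1, m2):
    """Lower/upper memberships of unlabeled objects from two fuzzifiers (min/max rule)."""
    u1, u2 = fcm_memberships(D, m1), fcm_memberships(D, m2)
    return np.minimum(u1, u2), np.maximum(u1, u2)

# Two classes, one object; the experiments in Section 4 use m1 = 2.1 and m2 = 5.
D = np.array([[1.0], [3.0]])
u_lower, u_upper = interval_memberships(D, m1=2.1, m2=5.0)
```

A larger fuzzifier pulls memberships toward 1/C, so the two fuzzifiers bracket each grade; the lower grades of an object sum to less than 1 and the upper grades to more than 1.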
In addition, because image objects differ in size, the KM algorithm cannot be used directly in our study and must be modified for the SSW-IT2FCM-O. In type-1 FCM, the area-weighted centroid is expressed as:
$$v_i = \frac{\sum_{j=1}^{N} A_j u_{ij}^{m} x_j}{\sum_{j=1}^{N} A_j u_{ij}^{m}}.$$
The area-weighted KM algorithm is then described in Algorithm 1.
Algorithm 1. The area-weighted KM algorithm for SSW-IT2FCM-O (inputs: $\underline{u}_{ij}$, $\overline{u}_{ij}$, $m_1$, $m_2$, $A$)
Step 1: Set $m = \frac{m_1 + m_2}{2}$ and $u_{ij} = \frac{\underline{u}_{ij} + \overline{u}_{ij}}{2}$, and calculate $v_i$ by Equation (30).
Step 3: Calculate $v'_i$ as follows. In the case of computing $v^L_i$:
In the case of computing $v^R_i$:
Step 4: If $v'_i = v_i$, go to Step 6. Otherwise, set $v_i = v'_i$ and go back to Step 2.
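Step 2 and the update formulas of Step 3 are not fully recoverable here, so the sketch below implements the standard KM iteration for one feature of one class, with each object's weight allowed to range between $A_j \underline{u}_{ij}^{m}$ and $A_j \overline{u}_{ij}^{m}$, following the area-weighted centroid above. Per Step 1, the caller would pass $m = (m_1 + m_2)/2$. Locating the switch point by comparing each value with the current estimate (instead of sorting explicitly) and the iteration cap are implementation choices of this sketch.

```python
import numpy as np

def area_weighted_km(x, area, u_lower, u_upper, m):
    """Interval centroid [v_L, v_R] of one feature for one class.

    Object k may take any weight between area_k * u_lower_k**m and
    area_k * u_upper_k**m; KM iterations find the smallest (v_L) and
    largest (v_R) weighted means attainable with such weights.
    """
    x = np.asarray(x, dtype=float)
    area = np.asarray(area, dtype=float)
    u_lower = np.asarray(u_lower, dtype=float)
    u_upper = np.asarray(u_upper, dtype=float)
    w_lo, w_hi = area * u_lower ** m, area * u_upper ** m

    def endpoint(left):
        # Step 1: start from the centroid computed with mid-range memberships.
        w = area * ((u_lower + u_upper) / 2.0) ** m
        v = np.sum(w * x) / np.sum(w)
        for _ in range(100):  # KM converges in finitely many steps; cap defensively
            below = x <= v
            # v_L: heavy weights on values at or below v; v_R: heavy weights above v.
            w = np.where(below, w_hi, w_lo) if left else np.where(below, w_lo, w_hi)
            v_new = np.sum(w * x) / np.sum(w)
            if np.isclose(v_new, v):
                break
            v = v_new
        return float(v_new)

    return endpoint(True), endpoint(False)

v_left, v_right = area_weighted_km(x=[0.0, 1.0], area=[1.0, 1.0],
                                   u_lower=[0.5, 0.5], u_upper=[1.0, 1.0], m=1.0)
```

With equal areas the interval [1/3, 2/3] comes purely from membership uncertainty; when lower and upper memberships coincide, the interval collapses to the ordinary area-weighted centroid, so larger objects pull the center toward their values.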
The iteration stops when $\left|J_m^{(t+1)} - J_m^{(t)}\right| \le \varepsilon$. The lower and upper membership grades of each sample in each class then form an interval number vector $(\bar{u}_{1k}, \bar{u}_{2k}, \ldots, \bar{u}_{Ck}) = ([\underline{u}_{1k}, \overline{u}_{1k}], [\underline{u}_{2k}, \overline{u}_{2k}], \ldots, [\underline{u}_{Ck}, \overline{u}_{Ck}])$, $k = 1, 2, \ldots, N$. The possibility degree between any two intervals in the vector can then be calculated as follows:
where $L(\bar{u}_{ik}) = \overline{u}_{ik} - \underline{u}_{ik}$ and $L(\bar{u}_{jk}) = \overline{u}_{jk} - \underline{u}_{jk}$ are the widths of the interval numbers $\bar{u}_{ik}$ and $\bar{u}_{jk}$, respectively, for $i, j = 1, 2, \ldots, C$ and $k = 1, 2, \ldots, N$.
We can then obtain a possibility matrix $P_k = (p_{ij})_{C \times C}$ for each sample k. The ranking vector $w_k = (w_{1k}, w_{2k}, \ldots, w_{Ck})^T$ is calculated by
$$w_{ik} = \frac{1}{C(C-1)}\left(\sum_{j=1}^{C} p_{ij} + \frac{C}{2} - 1\right),$$
and the index of the maximum value in $w_k$ is the class index of the sample.
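Equation (17) itself is not reproduced in the text, so the sketch below assumes the widely used possibility-degree formula for interval numbers (due to Xu and Da), $p(\bar a \ge \bar b) = \min\{\max\{(a^{+} - b^{-})/(L(\bar a) + L(\bar b)), 0\}, 1\}$, which is built from the same widths L(·) defined above; the ranking step follows the formula given in the text. With this choice $p_{ij} + p_{ji} = 1$ and $p_{ii} = 0.5$, so the ranking values of one sample sum to 1.

```python
import numpy as np

def possibility_degree(a, b):
    """Assumed p(a >= b) for intervals a = [a-, a+], b = [b-, b+]."""
    width = (a[1] - a[0]) + (b[1] - b[0])
    if width == 0:                      # both intervals degenerate: compare the points
        return 1.0 if a[0] > b[0] else (0.5 if a[0] == b[0] else 0.0)
    return float(min(max((a[1] - b[0]) / width, 0.0), 1.0))

def rank_classes(u_lower, u_upper):
    """Ranking vector of one sample from its C membership intervals,
    w_i = (sum_j p_ij + C/2 - 1) / (C (C - 1)); the argmax is the class."""
    C = len(u_lower)
    intervals = list(zip(u_lower, u_upper))
    P = np.array([[possibility_degree(intervals[i], intervals[j]) for j in range(C)]
                  for i in range(C)])
    w = (P.sum(axis=1) + C / 2.0 - 1.0) / (C * (C - 1))
    return w, int(np.argmax(w))

# Hypothetical membership intervals of one object for three classes.
w, label = rank_classes(u_lower=[0.5, 0.1, 0.2], u_upper=[0.7, 0.3, 0.3])
```

Ranking by possibility degree uses the whole interval rather than, say, only its midpoint: classes 2 and 3 here overlap, and the narrower but higher interval of class 3 wins the pairwise comparison.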
It is easy to know that when the area of each sample is equal to 1, centroid is sampled as Step 1: Initialization, set values for two fuzzifiers m 1 , m 2 and the termination criterion value ε and set the class number C, and initialize the band centroid V B and the spectral indices centroid V S by the first (Q1) and third (Q3) quantiles, respectively. The weight of selected bands and SIs is calculated by Equation (6).
Step 2: Calculate the new distance between the object k and the centroid i using Equation (7) and calculate the lower and upper membership degree matrix by Equations (10)-(13).
Step 3: Calculate all centroids of the band subset, $V_B = \{v_i^B \mid i = 1, 2, \ldots, C\}$, and of the spectral indices, $V_S = \{v_i^S \mid i = 1, 2, \ldots, C\}$, and determine their lower and upper bounds $v_i^L$ and $v_i^R$ via the area-weighted KM algorithm (Algorithm 1).
Step 5: Calculate the possibility matrix by Equation (17), obtain the ranking vector, assign each sample to the cluster with the largest ranking value, and return the clustering results.
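Step 2 relies on a weighted interval distance between an object and an interval centroid. Equation (7) defines the actual distance; the form below is only a hypothetical stand-in, consistent with the statement in the Conclusions that the distance is based on the Euclidean distance. It treats the object's feature vector as a degenerate interval, averages the squared gaps to the two centroid endpoints per feature, and normalizes the feature weights (for example, those from Equation (6)) to sum to 1.

```python
import numpy as np

def weighted_interval_distance(x, v_lo, v_hi, weights):
    """Hypothetical weighted Euclidean distance from an object to an interval centroid.

    x: feature vector of one object (bands, SIs, textures), treated as [x, x];
    v_lo, v_hi: lower and upper centroid bounds; weights: per-feature weights.
    """
    x, v_lo, v_hi, w = (np.asarray(a, dtype=float) for a in (x, v_lo, v_hi, weights))
    w = w / w.sum()                                   # normalise the feature weights
    gap2 = 0.5 * ((x - v_lo) ** 2 + (x - v_hi) ** 2)  # mean squared gap to both endpoints
    return float(np.sqrt(np.sum(w * gap2)))
```

An object lying exactly on a degenerate centroid has distance 0, and a wider centroid interval increases the distance only through the endpoint gaps, so heavily weighted features dominate the assignment.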

Modelling the Banded Spectral Curves of Land Cover Types
Because not all bands are selected at each classification node, only the interval centroids of the selected bands are obtained at a given node (Section 3.4). To obtain a complete banded spectral curve, we collect the lower and upper membership degree matrices of each target land cover type from the classification nodes in which it is produced, and then execute Algorithm 1 on all bands to produce the banded spectral curve of each land cover type.

Experiments
In this section, a WorldView-2 (WV-2) dataset is used to test the proposed method. The dataset is classified into six types: water body, shadows, buildings, bare lands, grass, and woods. The fuzzifiers $m_1$ and $m_2$ are set to 2.1 and 5, respectively, the maximum number of iterations is 100, and the termination criterion $\varepsilon$ is set to 0.0001. The results are compared with those of other FSSC methods in terms of four typical validity indices: the partition coefficient (PC), the partition entropy (PE), the Xie and Beni index (XB) and the Fukuyama and Sugeno index (FS) [40,41]. The confusion matrix and kappa coefficient are also used to verify the accuracy of the proposed method. In addition, the spectral bands of the target land cover types produced by the proposed method are examined and compared with the spectral curves produced by other FSSC methods. Finally, the effects of sample size on the performance of the method are investigated.

Study Area and Materials
The study area (Figure 5) is the campus of Tianjin Normal University, located in southwestern Tianjin City, a metropolis on the northern coast of mainland China. This region was chosen mainly because it is convenient for validation and contains many typical land cover types, mainly buildings, wetlands, trees, grasslands, trails, bare soil and concrete roads.
In this paper, WV-2 multispectral images of 3988 × 2532 pixels (252 ha) are used to test the classification performance of the presented hierarchical SSW-IT2FCM-O algorithm. The dataset was acquired on 13 September 2015. Radiometric and atmospheric correction and Gram-Schmidt pan-sharpening were performed in the ENVI software. All eight bands were segmented with the multi-resolution segmentation algorithm in the eCognition software, using a scale parameter of 120 and shape and compactness weights of 0.1 and 0.5, respectively. A total of 20,078 objects were produced.
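As a quick consistency check on the stated area, using the 0.5 m pixel size given in the comparison with other methods (where the same scene at 2 m resolution is 997 × 633 pixels):

$$3988 \times 2532 \times (0.5\ \text{m})^2 = 10\,097\,616 \times 0.25\ \text{m}^2 = 2\,524\,404\ \text{m}^2 \approx 252\ \text{ha}$$

$$997 \times 633 \times (2\ \text{m})^2 = 631\,101 \times 4\ \text{m}^2 = 2\,524\,404\ \text{m}^2$$

Both resolutions therefore describe the same 252 ha footprint.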
All the SIs in Table 1, together with the sum of all bands (SUM), brightness, and the ratio of Nir2, are considered. The land cover types are organized hierarchically into a tree (Figure 6). The top node contains three types: dark objects, vegetation, and impervious surfaces (ISs) and bare soils. At the second level, dark objects are further classified into water body and shadows, vegetation into grasslands and woodlands, and the ISs and bare soils node into impervious surfaces and bare soils. At the third level, woodlands are further refined into dense and sparse woodlands; in this study, sparse woodlands mainly refer to shrubs, groves and emergent macrophytes. The impervious surfaces are further classified into roads, buildings and other ISs, which mainly comprise parking lots, basketball courts, etc.
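The hierarchy described above can be written down as a nested mapping, which makes explicit that each internal node is a separate, smaller clustering task. This is a hypothetical encoding with class names paraphrased from the text; Figure 6 remains the reference definition.

```python
# Hypothetical encoding of the Figure 6 hierarchy; class names paraphrase the text.
HIERARCHY = {
    "dark objects": {"water body": {}, "shadows": {}},
    "vegetation": {
        "grasslands": {},
        "woodlands": {"dense woodlands": {}, "sparse woodlands": {}},
    },
    "impervious surfaces and bare soils": {
        "bare soils": {},
        "impervious surfaces": {"roads": {}, "buildings": {}, "other ISs": {}},
    },
}

def leaves(tree):
    """Final land cover classes (nodes without children), depth-first."""
    out = []
    for name, children in tree.items():
        out.extend(leaves(children) if children else [name])
    return out

def classification_nodes(tree, name="root"):
    """(node, child classes) pairs: each internal node is one clustering task."""
    tasks = [(name, list(tree))]
    for child, grandchildren in tree.items():
        if grandchildren:
            tasks.extend(classification_nodes(grandchildren, child))
    return tasks

if __name__ == "__main__":
    for node, classes in classification_nodes(HIERARCHY):
        print(f"{node}: C = {len(classes)} -> {classes}")
```

This encoding yields nine leaf classes and six clustering tasks with only two or three classes each, so every node solves a much smaller problem than a flat nine-class clustering would.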
The labelled dataset covers approximately 9.1% of the study area (Figure 7a), and the test dataset (Figure 7b) covers 51.6%. Both datasets are distributed uniformly across the study area and were used to test the proposed method and to compare it with other methods.

Results
We use the labeled datasets to select training samples from the segmented dataset by an intersection operation, and then use these training samples to calculate the mean, Q1 and Q3 for each statistic node (target class) (Figure 6). The bands, SIs and texture values used at each classification node were selected in the eCognition software and are listed in Table 2. The selected SIs are shown in Figure 8, and the weights of the selected bands, SIs and texture values are calculated by Equation (6). At the top node, we classify the study area into three types: dark objects, vegetation and impervious surfaces. The results are shown in Figure 9, and the user's accuracy (U.A.), producer's accuracy (P.A.), overall accuracy (O.A.) and kappa coefficient are provided in Table 3.
The overall accuracy and kappa coefficient are 0.9859 and 0.9777, respectively, indicating very high accuracy at this level; vegetation has the highest accuracy, followed by dark objects, while impervious surfaces have relatively lower accuracy. The final result is shown in Figure 10, and the confusion matrix and accuracies are listed in Table 4. At this level, the dark objects are classified into water and shadows; the P.A. of water is as high as 0.9986, and that of shadows is 0.7159. The vegetation node is further classified into woodlands and grasslands, which share almost the same spectral absorption features, so they are difficult to separate using spectral information alone. When spectral indices are also considered, woodlands and grasslands can be distinguished with high accuracy; after the woodlands are further divided into dense and sparse woodlands, the P.A. values of grasslands, dense woodlands and sparse woodlands are 0.9578, 0.8829 and 0.5357, respectively. Within the impervious surfaces and bare soil node, the P.A. of bare soil is 0.8987. The impervious surfaces are further classified into roads, buildings and other ISs, with accuracies of 0.8075, 0.7051 and 0.6249, respectively. The lowest accuracy (0.5357, for sparse woodlands) reflects the poor discrimination between sparse and dense woodlands, which is mainly due to the fuzziness of sparse woodlands: in this study area, sparse woodlands contain shrubs, groves and emergent macrophytes, and almost none of them are pure. Building extraction is also difficult because building roofs are complex and their spectra are highly similar to those of roads and other ISs; the same holds for the other impervious surfaces.
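The accuracy measures reported in Tables 3 and 4 can be computed from a confusion matrix as in the following sketch. It assumes rows are reference classes and columns are classified labels (conventions vary between software packages) and uses the standard Cohen's kappa; the function name is hypothetical.

```python
import numpy as np

def accuracy_report(cm):
    """cm[i, j]: number of objects (or pixels) of reference class i labelled as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    correct = np.diag(cm)
    producers = correct / cm.sum(axis=1)   # P.A.: correct / reference total per class
    users = correct / cm.sum(axis=0)       # U.A.: correct / classified total per class
    overall = correct.sum() / total        # O.A.
    # Expected chance agreement from the row and column marginals.
    chance = np.sum(cm.sum(axis=0) * cm.sum(axis=1)) / total ** 2
    kappa = (overall - chance) / (1.0 - chance)
    return producers, users, overall, kappa
```

Kappa discounts the agreement expected by chance from the class proportions, which is why it is slightly lower than the overall accuracy (0.9777 versus 0.9859 at the top node).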

Comparison with Other Methods
In this section, we compare the results of the hierarchical SSW-IT2FCM-O with those of three FSSC methods, namely ssFCM, SIIT2-FCM and SKFCM-F, and with that of LaSVM, a popular supervised method. All these methods use the same labelled and test datasets as described in Section 4.1.
The parameter values of SIIT2-FCM are the same as those of the hierarchical SSW-IT2FCM-O. For ssFCM and SKFCM-F, the fuzzifier $m$ is set to 2, the termination criterion $\varepsilon$ to 0.0001, and the maximum number of iterations to 100. For LaSVM, the radial basis function (RBF) is selected as the kernel function, the parameter gamma is set to the average of the variances of the band values of the labeled samples, and the parameter C is set to 10. The results of these methods are shown in Figure 11, and the accuracies and kappa coefficients are reported in Table 5. Comparing the results of ssFCM, SIIT2-FCM, SKFCM-F, LaSVM and the proposed method (Figures 10 and 11), it is clear that object-based methods greatly reduce the salt-and-pepper effect, and the proposed method performs better than the other methods. In region A of Figure 10, shadows are misclassified as water by SKFCM-F and SIIT2-FCM, which means that these methods are poor at distinguishing water from shadows. In region B, dense woodland is partly grouped into sparse woodland by SKFCM-F and SIIT2-FCM. In region C, the sparse woodland is not detected by ssFCM, SIIT2-FCM, SKFCM-F or LaSVM: it is misclassified as bare soil or grass by ssFCM, as water by LaSVM, and as dense woodland by SIIT2-FCM and SKFCM-F. These methods are therefore weak at distinguishing grass, dense woodland and sparse woodland. In region D, the fuzzy methods (ssFCM, SIIT2-FCM, SKFCM-F and hierarchical SSW-IT2FCM-O) present more detailed information than LaSVM.
In region E, most of the grass is classified as bare soil by ssFCM, SIIT2-FCM and SKFCM-F, some bare soil is classified as impervious surface by LaSVM, and the water body is partly grouped into shadows by SIIT2-FCM and SKFCM-F. In region F, part of a sports field is misclassified as roads by SIIT2-FCM and SKFCM-F. Region G is a wetland containing shallow water, Typha domingensis, reeds and lotus leaves. Here, LaSVM misclassifies some vegetation as buildings or water and performs worse than the FSSC methods. In region H, the planted shrubs are misclassified as bare soil by ssFCM and as water by LaSVM. In region I, bare soil is misclassified as impervious surface by LaSVM and SIIT2-FCM.
Like other object-based methods, the proposed method effectively avoids the negative influence of salt-and-pepper noise. The other FSSC methods are pixel-based and sensitive to salt-and-pepper noise, so their results (Figure 11a-c) appear more fragmented than that of the proposed method. Table 5 shows that SKFCM-F is weak at distinguishing shadows from water, achieves only 16.68% accuracy, and performs poorly in impervious surface classification; consequently, its O.A. is the lowest. ssFCM performs best for bare soil and other ISs but poorly in woodland classification. SKFCM-F performs best for water because almost all dark objects are classified as water. LaSVM performs best for buildings, with a U.A. of 87.94%, but it loses much detailed information. Although LaSVM performs well overall, some of its results are unreasonable. For example, the landscape transition in region E of Figure 10 is water body → shallow water → bare soil → grass → woods, a typical fuzziness phenomenon. The fuzzy methods (ssFCM, SIIT2-FCM, SKFCM-F and hierarchical SSW-IT2FCM-O) classify the shallow water as shadows, which seems reasonable because the spectral features of shallow water are very similar to those of shadows. However, in this region LaSVM classifies the bare soil as impervious surface, and in region G it also misclassifies some vegetation as impervious surface, which is unreasonable. On the whole, the proposed method shows the best performance among these methods (Table 5). The validity indices of the FSSC methods are listed in Table 6. PC indicates the average relative amount of membership sharing between pairs of fuzzy subsets; the higher the PC value, the better the classification result. PE is a scalar measure of the amount of fuzziness in a set of results.
FS measures the discrepancy between fuzzy compactness and fuzzy separation, and XB measures the average within-cluster fuzzy compactness against the minimum between-cluster separation. For PE, FS and XB, smaller values indicate better clustering performance. PC and FS both indicate that the proposed method performs best. Although SIIT2-FCM has the smallest XB value, the hierarchical SSW-IT2FCM-O has the highest accuracy among the four FSSC methods (Table 5). In terms of running time, the proposed method took much less time than the other three FSSC methods (ssFCM, SIIT2-FCM and SKFCM-F) in our experiment. The main reason is that SSW-IT2FCM-O is object-based and classifies only 20,078 objects, whereas the other three FSSC methods are pixel-based and classify 3988 × 2532 pixels (0.5 m resolution) or 997 × 633 pixels (2 m resolution), so the input of the proposed method is much smaller. Another reason is that each classification node handles fewer classes than the total number of expected classes, and each child node receives only a subset of the full input dataset. The process of SSW-IT2FCM-O is similar to those of IT2FCM and SIIT2-FCM, and its computational complexity has been shown to be $R \times O(N^2)$ [5,14], where $R$ is the number of required iterations. Generally, $R \ll N$; thus, the computational complexity of both SSW-IT2FCM-O and IT2FCM is $O(N^2)$.
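For reference, the four validity indices can be computed with the standard textbook definitions sketched below: PC $= \frac{1}{N}\sum_i\sum_k u_{ik}^2$, PE $= -\frac{1}{N}\sum_i\sum_k u_{ik}\log u_{ik}$, XB as the ratio of $\sum_i\sum_k u_{ik}^m\lVert x_k - v_i\rVert^2$ to $N\min_{i\ne j}\lVert v_i - v_j\rVert^2$, and FS $= \sum_i\sum_k u_{ik}^m(\lVert x_k - v_i\rVert^2 - \lVert v_i - \bar v\rVert^2)$. The sketch assumes type-1 memberships (for interval memberships, one option is to pass the midpoints) and takes $\bar v$ as the mean of the centroids; some references use the data mean instead, and the variants in [40,41] may differ.

```python
import numpy as np

def validity_indices(X, U, V, m=2.0):
    """X: (N, d) features, U: (C, N) memberships, V: (C, d) centroids."""
    X, U, V = (np.asarray(a, dtype=float) for a in (X, U, V))
    N = X.shape[0]
    d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)    # (C, N) squared distances
    Um = U ** m
    pc = np.sum(U ** 2) / N                                    # higher is better
    with np.errstate(divide="ignore", invalid="ignore"):
        pe = -np.nansum(U * np.log(U)) / N                     # 0 * log 0 treated as 0
    cd2 = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)   # centroid separations
    min_sep = cd2[~np.eye(len(V), dtype=bool)].min()
    xb = np.sum(Um * d2) / (N * min_sep)                       # lower is better
    v_bar = V.mean(axis=0)                                     # assumption: mean of centroids
    fs = np.sum(Um * (d2 - ((V - v_bar) ** 2).sum(axis=1)[:, None]))  # lower is better
    return {"PC": pc, "PE": pe, "XB": xb, "FS": fs}
```

A crisp, well-separated partition gives PC = 1, PE = 0 and XB = 0, which is the ideal against which the fuzzier results in Table 6 are compared.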
Similarly, the computational complexity of SIIT2-FCM is also $O(N^2)$. As discussed above, the number of segmented objects is far lower than the number of pixels for the same study area, so SSW-IT2FCM-O usually runs faster than SIIT2-FCM.

Discussion
In FSSC methods, the centroid (spectral curve) of a class is crucial: the closer the centroid is to the true value, the better the classification performance. Among the three selected typical FSSC methods (ssFCM, SIIT2-FCM and SKFCM-F), ssFCM shows the best performance, so it is used in this section. Here, we treat the first (Q1) and third (Q3) quartiles of the test dataset as the true values of the spectral curves of the land cover types (Figure 12). Figure 12 shows that some banded curves overlap, such as those of water and shadows and of dense and sparse woodlands, and the spectral curves of the three types of impervious surfaces overlap heavily. The greater the similarity between classes, the greater the classification uncertainty, so this spectral uncertainty makes classification difficult. In this section, we discuss the ability of the proposed method to model spectral uncertainty, followed by the uncertainty of the membership degrees of the six types and the effects of sample size.
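The reference bands used as true values, and the overlap between the bands of two classes, can be sketched as follows. The sketch assumes NumPy's default linear interpolation for the quartiles, which may differ slightly from the software used in the paper; the function names are hypothetical.

```python
import numpy as np

def reference_band(samples):
    """samples: (N, B) test-set values of one class; returns per-band Q1 and Q3."""
    q1, q3 = np.percentile(samples, [25, 75], axis=0)
    return q1, q3

def band_overlap(a_lo, a_hi, b_lo, b_hi):
    """Per-band length of the intersection of two banded curves (0 = disjoint)."""
    return np.clip(np.minimum(a_hi, b_hi) - np.maximum(a_lo, b_lo), 0.0, None)
```

A positive overlap in many bands, as for water and shadows or for the three impervious surface types, signals the classes whose spectral uncertainty makes them hardest to separate.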
The spectral curves of the nine land cover types modeled from the test dataset and by ssFCM and LaSVM are shown in Figure 13. The curves modeled by ssFCM and LaSVM are single-valued, so their width cannot be estimated; as a result, ssFCM and LaSVM cannot model spectral uncertainty and have limited ability to handle it. The distances between the true-value curves and the spectral curves modeled by ssFCM and LaSVM are calculated by Equation (3) and listed in Table 7. The spectral curves modeled by ssFCM deviate most from the true values. Among the land cover types, the building class has the largest deviation for the three methods. The curves of buildings and sparse woodlands produced by ssFCM lie almost entirely outside the true-value ranges, which explains the low classification accuracy of these two types. The banded spectral curves produced by the hierarchical SSW-IT2FCM-O are shown in Figure 14. All of them have lower and upper boundaries, so the band width is expressed explicitly and spectral uncertainty can be modeled by the width of the banded curves. Visually, the banded spectral curves modeled by SSW-IT2FCM-O are similar in shape to the true values, and all of them are close to the true values, especially for water, grass, dense woodlands and shadows.
The banded spectral curves of water, grass, dense woodlands and impervious surfaces modeled by SSW-IT2FCM-O lie almost entirely within their corresponding true-value ranges. In short, banded curves express the spectra of geographic objects more objectively and accurately than single-valued curves, and the presented method therefore performs better than ssFCM.

Figure 14. The banded spectral curves of true value and spectral curves modeled by hierarchical SSW-IT2FCM-O (wavelength: nm). The x-axis is the wavelength, and the y-axis represents the reflectance for a scale factor of 10,000.

Effects of Sample Volume
As mentioned in Section 1, FSSC methods need fewer training samples than supervised methods. In this section, we investigate the effect of sample volume on the hierarchical SSW-IT2FCM-O by reducing the labeled samples so that they cover approximately 8% and 6% of the study area. Figure 15 shows the reduced samples and the results of ssFCM, SIIT2-FCM, SKFCM-F, LaSVM and hierarchical SSW-IT2FCM-O when the reduced samples cover approximately 8%, and the accuracies and kappa coefficients are reported in Table 8. The accuracy of ssFCM drops greatly, with some grasslands and woodlands misclassified as bare soil. The hierarchical SSW-IT2FCM-O loses approximately 0.47% accuracy, and LaSVM loses approximately 0.8%.
The reduced samples and the results of ssFCM, SIIT2-FCM, SKFCM-F, LaSVM and hierarchical SSW-IT2FCM-O when the reduced samples cover approximately 6% are shown in Figure 16, and the accuracies and kappa coefficients are reported in Table 9. Shadows are misclassified as water by ssFCM, SIIT2-FCM and SKFCM-F, and some woodlands are misclassified as grass (Figure 16b-d). Although LaSVM achieved high accuracy in the previous test, when the sample volume is reduced to 6%, some water regions are misclassified as buildings (Figure 16e); its accuracy drops sharply, and its loss is more serious than those of the FSSC methods. Table 9 shows that the accuracies of ssFCM and LaSVM are reduced greatly, whereas the hierarchical SSW-IT2FCM-O loses only approximately 1.45%. Therefore, in this experiment, the hierarchical SSW-IT2FCM-O depends less on the number of samples and is more stable than the other methods.

Figure 16. Classification results of five methods when the sample size was reduced to 8%.

Conclusions
This study proposed a hierarchical method based on IT2 FSs to model the spectral uncertainty of VHR images, which achieves higher classification accuracy than existing FSSC methods, namely ssFCM, SIIT2-FCM and SKFCM-F. In this method, the prior knowledge from labeled samples comprises three types: the hierarchical organization of land cover types, the weights of bands and spectral indices, and the statistical information of the labeled samples. The input image is first classified into several broad classes according to the hierarchical organization of land cover types, and these are then further divided into sub-classes.
The selection of bands, spectral indices and textures is essential for the proposed method. Although feature selection methods exist in the machine learning domain, such as the information gain algorithm and the correlation-based feature selection algorithm [42][43][44], these methods can produce very different results, and the classification results obtained with the features they selected were poor in our experiments. Therefore, in this study, bands, spectral indices and textures were first selected in the eCognition software. Weight determination is another important aspect of this work; it is based on the assumption that if a land cover type can be distinguished from its nearest type, it can also be distinguished from the other types. We proposed a weight determination method based on the first (Q1) and third (Q3) quartiles and defined a weighted interval distance. The proposed method effectively exploits the ability of interval type-2 fuzzy sets to handle the uncertainty of membership degrees and models the spectral curves of land cover types as banded curves, avoiding the single-valued curves of methods based on type-1 fuzzy sets. The WorldView-2 dataset was used to test the proposed method, and the results show that the proposed hierarchical method can effectively model the spectral uncertainties of land cover types. From the perspective of classification, it effectively distinguishes water bodies from shadows and woodlands from grasslands.
The weighted interval distance defined in this study is based on the Euclidean distance. In future work, other similarity metrics, such as spectral similarity metrics [45] and the spectral angle metric [46], will be studied. As in other FSSC methods, each centroid is calculated from all samples, which increases the computation time and affects the accuracy. However, the centroid of a land cover type should be determined by the samples belonging to that type rather than by other samples. Thus, future work will improve the centroid determination method.