Optimal Decision Fusion for Urban Land-Use/Land-Cover Classification Based on Adaptive Differential Evolution Using Hyperspectral and LiDAR Data

Hyperspectral images and light detection and ranging (LiDAR) data provide, respectively, the high spectral resolution and the accurate elevation information required for urban land-use/land-cover (LULC) classification. To combine the respective advantages of hyperspectral and LiDAR data, this paper proposes an optimal decision fusion method based on adaptive differential evolution, namely ODF-ADE, for urban LULC classification. In the ODF-ADE framework, the normalized difference vegetation index (NDVI), gray-level co-occurrence matrix (GLCM) and digital surface model (DSM) are extracted to form the feature map. Three classifiers, namely the maximum likelihood classifier (MLC), support vector machine (SVM) and multinomial logistic regression (MLR), are used to classify the extracted features. The resulting classification maps are combined by weighted voting, and the weights of each classification map are optimized by a differential evolution algorithm that uses a self-adaptive strategy to set its control parameters. The final classification map is obtained after post-processing based on conditional random fields (CRF). The experimental results confirm that the proposed algorithm is effective in urban LULC classification.


Introduction
Urban land-use/land-cover (LULC) classification plays an important role in various applications, including urban change studies and urban planning [1]. With the continuous development of Earth observation technology, there is now a variety of remote sensing sensors with different functions. These multiple sensors provide ample data for urban LULC classification. However, recent studies of urban LULC classification have mainly used a single source of remote sensing data [2]. Hyperspectral images can provide both detailed structural and spectral information about urban scenes [3]. Therefore, many researchers have used hyperspectral images in urban LULC classification [3][4][5][6][7][8]. However, under the influence of urbanization, urban classes are becoming more and more diversified, and classification using a single sensor has some drawbacks. Hyperspectral images have abundant spectral information, allowing the spectral characteristics of ground objects to be characterized well [9,10]. However, different objects can possess similar spectral characteristics [11], which makes them difficult to distinguish. Unlike hyperspectral sensors, light detection and ranging (LiDAR) has the advantage of acquiring dense, discrete, detailed, and accurate 3D point coverage over both objects and ground surfaces [12]. LiDAR can therefore provide elevation information for urban LULC classification to distinguish objects with similar spectral characteristics but different elevations. Fusing the two types of data can greatly improve the classification accuracy [13].
In recent years, a number of researchers have fused hyperspectral images with LiDAR data. The classification accuracy of urban LULC has been greatly improved as a result of combining the spectral, spatial, and elevation information. In Liao et al. [14], the urban LULC was acquired by classifying the spatial features, elevation features, spectral features and fusion features, respectively, and the final result was obtained by majority voting. In Ghamisi et al. [15], the attribute profile was used to model the spatial information of the LiDAR and hyperspectral data, and two classification techniques, random forest (RF) and support vector machine (SVM), were considered to build the final classification map. In Wang et al. [16], both maximum likelihood and SVM classifiers were used to classify the combined synthesized waveform/hyperspectral image features. In [17][18][19][20], RF was used to classify the features extracted from the hyperspectral images and LiDAR data to generate the classification map of the urban area. In general, hyperspectral and LiDAR data fusion mostly uses different feature extraction methods and multiple classifiers or RF (which is also a multi-classifier ensemble system) to complete the classification. Previous studies have focused on voting among different classifiers with equal weights. However, because different classifiers differ in their ability to distinguish different types of objects, voting with equal weights is unreasonable.
To solve the problem, this paper proposes optimal decision fusion for urban LULC classification based on adaptive differential evolution (ODF-ADE) to optimize the weights of the different classifiers for hyperspectral remote sensing imagery and LiDAR data. In ODF-ADE, the differential evolution (DE) algorithm, a powerful population-based stochastic search and global optimization technique [21,22], is used to find the optimal weights of the different classification maps. DE uses genetic operators such as crossover, mutation and selection to guarantee strong global convergence ability and robustness, and is suitable for complex optimization problems. The DE algorithm has been widely applied in many real applications such as numerical optimization [22][23][24][25], mechanical engineering [26], feedforward neural network training [27], digital filter design [28], image processing [29,30] and pattern recognition [31,32]. Furthermore, it has also been used in a number of applications in remote sensing such as clustering [33,34], endmember extraction [35] and subpixel mapping [36].
The contributions of this paper are as follows:
(1) The optimal decision fusion framework. ODF-ADE is built for use with hyperspectral imagery and LiDAR data. Before the voting operation, the classification maps are generated by the support vector machine (SVM) [37], the maximum likelihood classifier (MLC) [38] and multinomial logistic regression (MLR) [39], which have different advantages in dealing with samples of different distributions. In line with this strategy, the weight optimization problem is transformed into an optimization problem in the feature space by maximizing the objective function, which is constructed using the minimum Euclidean distance between each pixel and the corresponding predicted class in the training samples. Because DE is a population-based stochastic search and global optimization technique, it is used to optimize the constructed objective function. By initializing a set of weights and optimizing them with crossover and mutation operations, the value of the objective function can be improved.
(2) Adaptive differential evolution. There are two control parameters involved in DE: the scaling factor F and the crossover rate CR. These parameters are often kept fixed throughout the optimization process and can significantly influence the optimization performance of DE [40,41]. An adaptive DE method is proposed to solve the optimal decision fusion problem, in which an adaptive strategy is utilized to determine the scaling factor F and crossover rate CR. The parameters that need to be determined are encoded into an individual, i.e., an individual has a set of parameters and uses genetic operators such as crossover, mutation and selection for the evolution process. Individuals with better parameters are more likely to survive and produce offspring. This method reduces the time required for finding the appropriate parameters and can produce flexible DE for optimal decision fusion.
(3) Post-processing based on conditional random fields (CRF). The commonly used classifiers do not consider the correlations between neighboring pixels, leading to the presence of much low-level noise in the classification map. As an improved model of Markov random fields (MRF), CRF has the ability to consider the spatial contextual information in both the labels and the observed image data. In order to consider the spatial contextual information and preserve the spatial details in classification, pairwise CRF with an 8-neighborhood is used to smooth the final classification map. The pairwise potential uses the spatial smoothing and local class label cost terms to favor spatial smoothing in the local neighborhood and to take the spatial contextual information into account.
The experimental results obtained in this study demonstrate the efficiency of the proposed ODF-ADE fusion algorithm with the datasets provided by the Data Fusion Technical Committee (DFTC) of the IEEE Geoscience and Remote Sensing Society.
The rest of this paper is organized as follows: Section 2 briefly introduces the basics of the DE algorithm. Section 3 describes the proposed ODF-ADE approach for the fusion of hyperspectral and LiDAR data. The experimental results and analysis are given in Section 4. Section 5 discusses the main properties of ODF-ADE in theoretical and empirical terms. Finally, the conclusions are provided in Section 6.

Differential Evolution (DE) Algorithm
DE was proposed in 1995 by Storn and Price [21]. Like other evolutionary algorithms, DE is a stochastic model that simulates biological evolution through repeated iterations, preserving the individuals that adapt to the environment. Compared with other evolutionary algorithms, however, DE retains a population-based global search strategy while using real-valued encoding and simple mutation strategies to reduce the complexity of the genetic operations. The DE algorithm is mainly used for solving global optimization problems. Its main steps are the mutation, crossover and selection operations, which evolve a randomly generated initial population toward the final solution [42]. In the proposed method, we use classical DE [21,23] because this strategy is the most often used in practice. As shown in Figure 1, DE can be described as follows. The minimization problem in the continuous feature space can be represented as:

$$\min f(X), \quad X = (x_1, \ldots, x_D), \quad x_j^{\min} \le x_j \le x_j^{\max}, \quad j = 1, \ldots, D$$

where $D$ indicates the dimension of the problem, and $x_j^{\min}$ and $x_j^{\max}$ indicate the minimum and maximum of the $j$th element of the individual vector $X$, respectively. The process of DE can be described as the following four steps:
Step 1 Initialization: Initialize the population X randomly, where the size of the population is NP.
Step 2 Mutation: Take the difference vector of two individuals randomly chosen from the population, scale it by the scaling factor F, and add it to a third randomly chosen individual to generate the mutant individual.
Step 3 Crossover: Mix the parameters of a predetermined target individual $X_i^t$ and the mutant vector $V_i^t$ to produce a trial individual $U_i^t$ according to the crossover probability CR.
Step 4 Selection: If the fitness value of the trial individual is better than the fitness value of the target individual, the trial individual replaces the target individual in the next generation; otherwise, the target individual survives.
In the evolutionary process of each generation, each individual vector is considered as the target individual once.The algorithm retains the excellent individuals while eliminating the inferior individuals and guides the search process to the global optimum solution approximation through continuous iteration calculation.
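The four steps can be sketched in a few lines of Python. This is a minimal illustration of classical DE/rand/1/bin for a box-constrained minimization problem, not the implementation used in this paper; the function name, the default values of `np_size`, `F`, `CR` and `generations`, the clipping rule for out-of-bound mutants, and the sphere test function are all assumptions made for the example.

```python
import random

def differential_evolution(f, bounds, np_size=20, F=0.5, CR=0.9,
                           generations=200, seed=0):
    """Minimise f inside box bounds with classical DE/rand/1/bin."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: initialise NP individuals uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(np_size)]
    fit = [f(x) for x in pop]
    for _ in range(generations):
        new_pop, new_fit = list(pop), list(fit)
        for k in range(np_size):
            # Step 2: mutation from three distinct individuals other than k.
            r1, r2, r3 = rng.sample([i for i in range(np_size) if i != k], 3)
            mutant = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                      for j in range(dim)]
            # Clip to the bounds (one common repair rule; an assumption here).
            mutant = [min(max(v, lo), hi)
                      for v, (lo, hi) in zip(mutant, bounds)]
            # Step 3: binomial crossover; j_rand guarantees one mutant gene.
            j_rand = rng.randrange(dim)
            trial = [mutant[j] if (rng.random() < CR or j == j_rand)
                     else pop[k][j] for j in range(dim)]
            # Step 4: the trial replaces the target only if it is no worse.
            f_trial = f(trial)
            if f_trial <= fit[k]:
                new_pop[k], new_fit[k] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = min(range(np_size), key=fit.__getitem__)
    return pop[best], fit[best]

def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_x, best_f = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
```

Each individual serves as the target exactly once per generation, and the next population is assembled only after all targets have been processed, which matches the generation-wise description above.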
Remote Sens. 2017, 9, 868

ODF-ADE Methodology
Before describing the proposed method, the notations used throughout this paper are defined (Table 1).

Table 1. Notation used throughout this paper.

Notation      Description
$Map$         The classification maps used in the weighted voting.
$w$           The weight of each class for each classification map.
$C_i$         The $i$th class of the classification map.
$D_{ai}$      Minimum distance between pixel $a$ and the training data $T_i$, for which the label is class $i$.
$P_i$         The population of the $i$th generation.
$X_{i,j}$     The $j$th individual of $P_i$.
$NP$          The size of the population.
$F$, $CR$     The two parameters in DE (i.e., the mutation scale and the crossover probability).
$p_m$         The parameter in the self-adaptive strategy (i.e., the mutation ratio).
To address the inadequate use of classifier information caused by equal-weight voting, the proposed framework uses adaptive DE to optimize the weights of the different classification maps. Firstly, the normalized difference vegetation index (NDVI), the gray-level co-occurrence matrix (GLCM) textures and the digital surface model (DSM) elevation feature are added to the spectral features extracted by principal component analysis (PCA) or minimum noise fraction (MNF), to form the feature vector. Three classification algorithms (i.e., MLC, SVM, and MLR) are then used to obtain the initial classification maps. A more accurate classification map is generated by weighted voting using the adaptive DE algorithm. The final classification map is generated after post-processing. The main procedure of the data fusion framework is shown in Figure 2 and is described as follows.

Multi-Feature Extraction
In order to represent the features of objects from different angles, MNF or PCA is used to reduce the dimensionality of the hyperspectral image, and the NDVI is used to distinguish the vegetation. To utilize the spatial information, the GLCM is computed. Finally, the DSM is used to represent the elevation information. The final feature maps are obtained by stacking these features (i.e., MNF + NDVI + GLCM + DSM or PCA + NDVI + GLCM + DSM).
The NDVI is a simple ratio that can be used to analyze remote sensing measurements and to assess whether or not the target being observed contains live green vegetation. In general, if there is much more reflected radiation in the near-infrared wavelengths than in the red wavelengths, then the vegetation in that pixel is likely to be healthy.
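As a brief illustration, the standard NDVI definition is (NIR - Red) / (NIR + Red). The sketch below applies it to a few hypothetical reflectance pairs; the function name, the `eps` guard against division by zero and the sample values are assumptions for the example, and the band selection used in this paper is not specified here.

```python
def ndvi(nir, red, eps=1e-12):
    """Return (NIR - Red) / (NIR + Red); eps avoids division by zero."""
    return (nir - red) / (nir + red + eps)

# Hypothetical (NIR, Red) reflectance pairs for three pixels.
pixels = [(0.50, 0.08), (0.30, 0.25), (0.05, 0.04)]
ndvi_values = [ndvi(nir, red) for nir, red in pixels]
```

Values close to 1 indicate a strong near-infrared response relative to red, which is the situation the paragraph above associates with healthy vegetation.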
A gray-level co-occurrence matrix (GLCM), or gray-level co-occurrence distribution, is a matrix defined over an image as the distribution of the co-occurring pixel values (grayscale values) at a given offset. GLCMs can measure the texture of an image, but because they are typically large and sparse, various metrics derived from them are used to obtain a more useful set of features. The GLCM can therefore be utilized to increase the separability between classes. Homogeneity, also called the inverse difference moment, measures the local gray-level uniformity of an image. If the textures of the different regions are similar and the local gray level of the image is uniform, then the homogeneity will be larger. Therefore, the homogeneity of the GLCM is used to describe the spatial texture feature.
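The following sketch shows how a GLCM and its homogeneity can be computed for a small patch. It assumes the common definition of homogeneity, the sum of p(i, j) / (1 + (i - j)^2), and a single horizontal offset; the window size, offsets and quantization used in this paper are not given in this section, so the function names and parameters are illustrative only.

```python
def glcm(image, levels, dx=1, dy=0):
    """Normalised co-occurrence matrix of a 2D patch for offset (dx, dy)."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows):
        for c in range(cols):
            r2, c2 = r + dy, c + dx
            if 0 <= r2 < rows and 0 <= c2 < cols:
                counts[image[r][c]][image[r2][c2]] += 1
                total += 1
    return [[v / total for v in row] for row in counts]

def homogeneity(p):
    """Sum of p(i, j) / (1 + (i - j)^2) over the whole matrix."""
    n = len(p)
    return sum(p[i][j] / (1 + (i - j) ** 2)
               for i in range(n) for j in range(n))

flat = [[1, 1, 1], [1, 1, 1]]        # uniform patch
checker = [[0, 3, 0], [3, 0, 3]]     # strongly alternating patch
h_flat = homogeneity(glcm(flat, levels=4))
h_checker = homogeneity(glcm(checker, levels=4))
```

The uniform patch yields the larger homogeneity, in line with the statement that homogeneity grows as the local gray level becomes more uniform.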
The DSM is an elevation model that incorporates the ground surface together with buildings, bridges and trees. In comparison, a digital elevation model (DEM) contains only the elevation information of the terrain and does not contain other surface information. The DSM thus contains the elevation information of all surface elements (soil, vegetation, artificial structures, etc.). Therefore, the DSM data obtained from the LiDAR data are added to characterize the elevation information.

Urban LULC Classification by Different Classifiers
SVM is established based on the Vapnik-Chervonenkis (VC) dimension theory and the structural risk minimization principle to obtain the best classification result, thereby finding the best balance between model complexity (i.e., learning accuracy of the specific training samples) and learning ability (i.e., the ability to identify any sample without error), according to the limited sample information. SVM has many unique advantages in solving small-sample, nonlinear and high-dimensional pattern recognition problems.
MLC is an image classification method based on statistical knowledge and probability computation. Firstly, the nonlinear discriminant function set is established according to Bayes' decision criterion. It is then assumed that the distribution function of each class is a normal distribution. Finally, the training area is selected to calculate the attribution probability of each sample area to obtain the classification map. When classifying, MLC not only considers the distance of the sample to the class center, but also takes into account the distribution characteristics.
MLR is a particular solution to the classification problem that assumes that a linear combination of the observed features and some problem-specific parameters can be used to determine the probability of each particular outcome of the dependent variable. The best values of the parameters for a given problem are usually determined from training data. The algorithm adopts a multilevel logistic (MLL) prior to model the spatial information present in the class label images.
These three algorithms, which are all robust, can make full use of the prior information of the samples and are therefore suitable for the classification of complex objects in urban areas. Six classification maps are obtained by applying these three classifiers to the two sets of features (MNF + NDVI + GLCM + DSM and PCA + NDVI + GLCM + DSM).

Optimal Decision Fusion Based on Adaptive DE
After the classification maps are obtained by the classification step, they can be used to generate a more accurate classification map by decision fusion, e.g., majority voting. Different classifiers have different abilities to distinguish different objects. Majority voting treats all classifiers equally and therefore makes inadequate use of this information, so weighted voting is used for the decision-level fusion. The DE algorithm allows for global optimization and can be applied to optimize the weights. In addition, a self-adaptive parameter selection method is proposed to adaptively choose the appropriate parameters during the course of DE.
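The difference between the two fusion rules can be illustrated for a single pixel. In this sketch each classification map votes for the label it predicts, using the weight that map holds for that class; the function names and the weight values are hypothetical and serve only to show how a well-chosen set of per-class weights can overturn an equal-weight vote.

```python
from collections import Counter

def majority_vote(labels):
    """Equal-weight vote over the labels predicted for one pixel."""
    return Counter(labels).most_common(1)[0][0]

def weighted_vote(labels, weights):
    """labels[b] is the class predicted by map b and weights[b][c] is the
    weight of class c in map b. Each map adds its own weight for the class
    it predicts, and the class with the largest total wins."""
    score = {}
    for b, c in enumerate(labels):
        score[c] = score.get(c, 0.0) + weights[b][c]
    return max(score, key=score.get)

labels = [0, 0, 1]                 # predictions of three maps for one pixel
weights = [[0.2, 0.5],             # hypothetical weights: map 0, classes 0 and 1
           [0.3, 0.5],             # map 1
           [0.5, 0.9]]             # map 2 is trusted for class 1
fused_equal = majority_vote(labels)
fused_weighted = weighted_vote(labels, weights)
```

Here the equal-weight vote returns class 0, whereas the weighted vote returns class 1 because the single map predicting class 1 carries a weight of 0.9 against a combined 0.5 for class 0.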

Initial Population
After obtaining the classification maps of the different classifiers, the population can be initialized as $P_g = \{X_{1,g}, \ldots, X_{k,g}, \ldots, X_{NP,g}\}$, where $X_{k,g}$ represents the $k$th individual in the $g$th generation and $NP$ is the size of the population. As shown in Figure 3, each individual $X_{k,g} = (x^1_{k,g}, \ldots, x^t_{k,g}, \ldots, x^D_{k,g})$ denotes the weight of each class for each classification map. The weights also need to be initialized. Two variables, $M$ and $N$, are defined to represent the number of land-cover labels and the number of classification maps, respectively. $D$ equals $M \times N$ and denotes the number of chromosomes that one individual $X_{k,g}$ contains. The initial population $P_1$ is generated randomly, with each weight drawn from the range 0 to 1.
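A minimal sketch of this initialization is given below. The map-major ordering of the genes inside an individual is an assumption (the actual layout is defined by Figure 3), and the population size and the number of classes in the usage line are hypothetical values; only the number of maps (six) comes from the text above.

```python
import random

def init_population(np_size, n_classes, n_maps, seed=None):
    """Create NP individuals, each holding D = M x N weights in [0, 1)."""
    rng = random.Random(seed)
    dim = n_classes * n_maps
    return [[rng.random() for _ in range(dim)] for _ in range(np_size)]

def weight_of(individual, map_idx, class_idx, n_classes):
    """Weight of one class in one map, assuming map-major gene order."""
    return individual[map_idx * n_classes + class_idx]

# Hypothetical sizes: 10 individuals, 15 land-cover classes, 6 maps.
population = init_population(np_size=10, n_classes=15, n_maps=6, seed=1)
w = weight_of(population[0], map_idx=2, class_idx=3, n_classes=15)
```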

Calculation of the Objective Function
In this paper, the objective function is constructed using the sum of the minimum Euclidean distances between each pixel and the corresponding predicted class in the training samples. In the proposed algorithm, the purpose of DE is to obtain the maximum value of the objective function.
The classification map $Map_{basis}$ is obtained by the usual majority voting. Using $Map_{basis}$ as the basis, voting is undertaken with the weights encoded in the chromosomes $x^t_{k,g}$. If a pixel in $Map_1$ belongs to class $C_i$, then the weight contributed by each classification map is the weight that the map holds for class $C_i$. The new classification map $Map_{new}$ is obtained by traversing the whole image.
If the predicted label of pixel $a$ is class $i$, as predicted by classification maps $b_1, \ldots, b_m$ (where $m$ denotes the number of classification maps that predict label $i$), then the value $j_a$ of the pixel is computed from the weights $w_{b_1 i}, \ldots, w_{b_m i}$ and the distance $D_{ai}$. As shown in Figure 4, $w_{b_m i}$ denotes the weight of class $i$ of classification map $b_m$ and $D_{ai}$ denotes the minimum distance between pixel $a$ and the training data $T_i$, for which the label is class $i$:

$$D_{ai} = \min_{1 \le m \le n} d(a, i_m), \qquad d(a, i_m) = \lVert \mu_a - \mu_{i_m} \rVert_2$$

where $n$ represents the number of pixels in $T_i$, $i_m$ represents the $m$th pixel of $T_i$, $\mu_a$ represents the image vector of pixel $a$ and $\mu_{i_m}$ represents the image vector of pixel $i_m$. $d(a, i_m)$ denotes the Euclidean distance between $\mu_a$ and $\mu_{i_m}$. The smaller the value of $D_{ai}$ and the greater the value of $w_{b_m i}$, the greater the value of $j_a$, which means a larger weight. The fitness of the individual $X_{k,g}$ is then calculated over the whole image, where $num$ denotes the total number of image pixels.
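The distance term can be sketched as follows. The example covers only the minimum Euclidean distance between one pixel's feature vector and the training vectors of one class; how this distance is combined with the weights into the per-pixel value and the fitness is not reproduced here. The function name and the sample vectors are hypothetical.

```python
import math

def min_distance(pixel, training_vectors):
    """Smallest Euclidean distance from a pixel's feature vector to the
    feature vectors of the training pixels of one class."""
    return min(math.dist(pixel, t) for t in training_vectors)

# Hypothetical two-band feature vectors.
pixel_a = (0.0, 0.0)
class_i_training = [(3.0, 4.0), (1.0, 0.0), (2.0, 2.0)]
d_ai = min_distance(pixel_a, class_i_training)
```

A pixel that lies close to at least one training sample of its predicted class yields a small distance, which the text above associates with a larger contribution to the objective.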
In DE, the mutant vector $V_i^t$ is generated using the scaling factor F, and the trial individual $U_i^t$ is produced by mixing the parameters of a predetermined target individual $X_i^t$ and the mutant vector $V_i^t$ using the crossover probability CR. Suitable control parameters are always different for different real problems. However, in some cases, the time for finding these appropriate parameters can be unacceptably long.
To solve the problem, a self-adaptive strategy for the control parameters is used. As shown in Figure 5, the control parameters F and CR are encoded into each individual. This means that each individual has its corresponding F and CR values, which can be adjusted during evolution [42].
The weight optimization solution is represented by the $D$-dimensional vector $X_{k,g}$ and two control parameters $F_{k,g}$ and $CR_{k,g}$ in the $g$th generation, where $k = 1, 2, \ldots, NP$.
For each vector $X_{k,g}$ at generation $g$, its associated mutant vector $V_{k,g}$ can be generated via the strategy DE/rand/1/bin, which is the strategy most often used in practice [30,43,44]. Here, rand refers to the mutation strategy, which selects individuals at random to prevent the population from becoming trapped in a local search; 1 represents the number of difference vectors; and bin refers to the binomial crossover strategy, which expands the search space. The mutation operator combines three individuals whose indices $r_1$, $r_2$ and $r_3$ are mutually exclusive integers randomly generated within the range $[1, NP]$, with $r_1 \neq r_2 \neq r_3 \neq k$.
The higher the objective function in Equation (5), the more likely the individual is to survive and produce offspring, which results in better individuals and increases the probability of finding the optimal solution. The mutation rate $p_m$ is determined adaptively according to the derivative of the objective value of each individual. With probability $p_m$, the control parameters $F_{k,g+1}$ and $CR_{k,g+1}$ of the $(g+1)$th generation are updated, where $rand_t$, $t \in \{1, 2\}$, denotes uniform random values within the range (0,1), $G_{max}$ is the maximum iteration number, $g$ is the iteration number and $b$ is a parameter that decides the degree of non-uniformity, for which the empirical value is set to three [45].
After the mutation phase, a crossover operation is applied to the mutant vector $V_{k,g}$ to generate a trial vector $U_{k,g} = (U^1_{k,g}, \ldots, U^D_{k,g})$. $CR_{k,g+1}$ is updated following [38], where $rand_t$, $t \in \{3, 4\}$, denotes uniform random values within the range (0,1). $F_{k,g+1}$ and $CR_{k,g+1}$ are obtained before the mutation is performed. Therefore, they influence the mutation, crossover and selection operations of the new vector $X_{k,g+1}$.
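For orientation, the standard DE/rand/1/bin operators with per-individual control parameters can be sketched as below. This uses the textbook form of the operators (mutant = X_r1 + F (X_r2 - X_r3), followed by binomial crossover) and assumes that the paper's operators follow it; the adaptive update of F and CR itself is not reproduced, so the values stored in `params` are fixed, hypothetical placeholders. No bound handling is applied to the mutated weights.

```python
import random

def mutate_rand_1(pop, k, F, rng):
    """DE/rand/1: X_r1 + F * (X_r2 - X_r3), with r1, r2, r3 distinct
    and all different from the target index k."""
    others = [i for i in range(len(pop)) if i != k]
    r1, r2, r3 = rng.sample(others, 3)
    return [a + F * (b - c) for a, b, c in zip(pop[r1], pop[r2], pop[r3])]

def crossover_bin(target, mutant, CR, rng):
    """Binomial crossover: take each gene from the mutant with probability
    CR; gene j_rand always comes from the mutant."""
    j_rand = rng.randrange(len(target))
    return [m if (rng.random() < CR or j == j_rand) else t
            for j, (t, m) in enumerate(zip(target, mutant))]

rng = random.Random(0)
pop = [[rng.random() for _ in range(4)] for _ in range(6)]
# Each individual carries its own F and CR (hypothetical fixed values).
params = [{"F": 0.5, "CR": 0.9} for _ in pop]
k = 0
mutant = mutate_rand_1(pop, k, params[k]["F"], rng)
trial = crossover_bin(pop[k], mutant, params[k]["CR"], rng)
```

Because F and CR are read from the individual rather than from global constants, replacing the fixed values in `params` with an adaptive update rule changes the behavior of both operators without touching their code.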

Selection
After the calculation of the objective function using Equation (5), a selection operation is performed. The objective function value of each trial vector, $J(U_{k,g})$, is compared with that of its corresponding target vector, $J(X_{k,g})$, in the current population. If the trial vector obtained in this generation has an objective function value higher than or equal to that of the corresponding target vector, the trial vector replaces the target vector and enters the population of the next generation. Otherwise, the target vector remains in the population for the next generation. The selection operation can be expressed as follows:

$$X_{k,g+1} = \begin{cases} U_{k,g}, & \text{if } J(U_{k,g}) \ge J(X_{k,g}) \\ X_{k,g}, & \text{otherwise} \end{cases}$$
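The selection rule translates directly into code. In the sketch below the objective is maximized and ties favor the trial vector, as described above; the function names and the toy objective (the sum of the genes) are placeholders for the actual objective function of the paper.

```python
def select(target, trial, objective):
    """Keep the trial if its objective value is higher or equal."""
    return trial if objective(trial) >= objective(target) else target

def next_generation(population, trials, objective):
    """Apply the selection rule to every target/trial pair."""
    return [select(x, u, objective) for x, u in zip(population, trials)]

# Toy objective for illustration only: the sum of the genes.
population = [[0.1, 0.2], [0.9, 0.8], [0.5, 0.5]]
trials = [[0.3, 0.3], [0.1, 0.1], [0.6, 0.4]]
new_population = next_generation(population, trials, sum)
```

The first trial improves on its target and replaces it, the second is worse and is discarded, and the third ties with its target and therefore replaces it.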

Stopping Condition
If generation g has not reached the maximum generation number $G_{max}$, go to Step 2. Otherwise, output the best individual as the weights of each class for each classification map. Finally, obtain the final classification result using the optimized weights.
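The loop formed by mutation, crossover, selection and the stopping condition can be sketched as follows. This is an illustrative implementation under stated assumptions, not the authors' code: F and CR are fixed (as in the non-adaptive ODF-DE setting), the weights are clipped to [0, 1], and `objective` is a placeholder for the objective function of Equation (5), which is maximized.

```python
import random

def differential_evolution(objective, D, NP=30, F=0.3, CR=0.8, G_max=500, seed=0):
    """Maximize `objective` over [0, 1]^D with DE/rand/1/bin and greedy selection."""
    rng = random.Random(seed)
    # Initial population: NP individuals drawn uniformly from [0, 1]^D.
    pop = [[rng.random() for _ in range(D)] for _ in range(NP)]
    fit = [objective(x) for x in pop]
    for g in range(G_max):  # stopping condition: maximum generation number
        new_pop, new_fit = list(pop), list(fit)
        for k in range(NP):
            r1, r2, r3 = rng.sample([i for i in range(NP) if i != k], 3)
            j_rand = rng.randrange(D)
            trial = list(pop[k])
            for j in range(D):
                if rng.random() < CR or j == j_rand:
                    v = pop[r1][j] + F * (pop[r2][j] - pop[r3][j])
                    trial[j] = min(1.0, max(0.0, v))
            f_trial = objective(trial)
            # Selection: the trial replaces the target if it is at least as good.
            if f_trial >= fit[k]:
                new_pop[k], new_fit[k] = trial, f_trial
        pop, fit = new_pop, new_fit
    best = max(range(NP), key=fit.__getitem__)
    return pop[best], fit[best]
```

For a quick check, a toy objective such as `lambda x: -sum((v - 0.5) ** 2 for v in x)` can stand in for the fusion objective; the returned vector should then lie close to 0.5 in every dimension.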

Post-Classification
(1) Decision fusion for viaducts. Viaducts are common in urban areas. However, due to the similarity of the construction materials, they can be easily confused with tall buildings. Therefore, the proposed framework employs an object-based method to extract and classify the viaducts in the urban area, so as to improve the overall classification accuracy. Viaducts are easy to extract because their elevation changes gradually. Region growing is a simple region-based image segmentation method. It examines the neighboring pixels of the initial seed points (which are selected manually) and determines whether each neighbor should be added to the region or not. The process is iterated in the same manner as general data clustering algorithms. Region growing is therefore performed on the DSM image to extract the viaducts.
(2) Post-classification by CRF. The spatial contextual information of remote sensing imagery is very important for the classification task [46,47]. The preceding operations do not consider the correlations between neighboring pixels, which leaves considerable low-level noise in the classification map. As an improved model of MRF, CRF has the ability to consider the spatial contextual information in both the labels and the observed image data. In order to consider the spatial contextual information and preserve the spatial details in the classification, pairwise CRF with an 8-neighborhood is used to smooth the final classification map. The pairwise potential uses the spatial smoothing and local class label cost terms to favor spatial smoothing in the local neighborhood and to take the spatial contextual information into account. The local class label cost term can also alleviate an oversmoothed classification result, since it considers the different label information of the neighboring pixels at each iterative step in the classification.
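The region growing step of item (1) can be illustrated with a short sketch. It assumes a DSM stored as a 2D list of elevations, 4-connected neighbors and a hypothetical elevation threshold `max_step` that encodes the gradual change in viaduct height; none of these choices is specified in the text.

```python
from collections import deque

def region_grow(dsm, seeds, max_step):
    """Grow a region on a DSM grid from manually chosen seed pixels.

    A 4-connected neighbor joins the region when its elevation differs from
    the adjacent accepted pixel by at most `max_step` (hypothetical threshold,
    in the same units as the DSM).
    """
    rows, cols = len(dsm), len(dsm[0])
    region = set(seeds)
    queue = deque(seeds)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in region:
                if abs(dsm[nr][nc] - dsm[r][c]) <= max_step:
                    region.add((nr, nc))
                    queue.append((nr, nc))
    return region
```

A gently rising row of elevations is followed from the seed, while the ground beside it and an abrupt building edge are left out because the elevation jump exceeds the threshold.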

Hyperspectral Data
The hyperspectral imagery was acquired on 23 June 2012 between 17:37:10 UTC and 17:39:50 UTC. The hyperspectral sensor used was the CASI visible near-infrared (VNIR) sensor and the average height of the sensor above ground was 1676.4 m. The hyperspectral imagery consists of 144 spectral bands in the 380-1050 nm region. The spatial and spectral resolutions are 2.5 m and 4.8 nm, respectively. The image size is 1905 × 349 pixels with 144 bands.

LiDAR Data
The LiDAR data were acquired on 22 June 2012 between 14:37:55 UTC and 15:38:10 UTC. The LiDAR point cloud data were obtained from the National Science Foundation Funded Center for Airborne Laser Mapping (NCALM). The sensor recorded five returns and intensity at a platform altitude of 609.6 m above ground, with an average point spacing of 0.74 m. The LiDAR data were rasterized at a spatial resolution of 2.5 m, which is identical to the spatial resolution of the hyperspectral image. In this study, the scan angle and atmospheric effects were not taken into account. The hyperspectral image and the DSM are shown in Figure 6.
Remote Sens. 2017, 9, 868

Training Samples and Validation Samples
Each pixel in the image was mapped to one of 15 classes, namely, healthy grass, stressed grass, synthetic grass, trees, soil, water, residential, commercial, road, highway, railway, parking lot 1 (with cars in the parking lot), parking lot 2 (with no cars in the parking lot), tennis court and running track. The numbers of training and validation samples are shown in Table 2. The location and distribution of the training and validation samples are shown in Figure 7.

Experimental Results
MNF and PCA were the two methods used to extract the spectral features. The 22 features containing the most information for the hyperspectral image were kept for both MNF and PCA. The vegetation was characterized by the NDVI (band 69 is the red band, band 82 is the infrared band). In order to increase the class separability, the GLCM was added to characterize the texture information. The GLCM texture was produced by the homogeneity measure with a window size of nine using the first three principal components obtained by PCA. Finally, the DSM data generated from the LiDAR data were added to form the final feature image. The two feature maps (PCA + NDVI + GLCM + DSM and MNF + NDVI + GLCM + DSM) were classified by SVM, MLC and MLR, and DE was used to optimize the weights of each classification map. The final classification map was obtained by a post-classification operation on the weighted voting result. The features (i.e., MNF, PCA, NDVI, GLCM) were extracted with ENVI. The SVM, MLC and MLR classifiers were run using Visual C++ 6.0, ENVI and Matlab R2014a. The DE and CRF algorithms were both implemented in Visual C++ 6.0. The final classification map is shown in Figure 8. The final overall classification accuracy is 93.5% and the Kappa coefficient is 0.9299.
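As a small illustration of the NDVI feature, the sketch below computes the index for one pixel from its 144-band spectrum, using the standard definition NDVI = (NIR − Red) / (NIR + Red). It assumes that the band numbers quoted above (69 for red, 82 for infrared) are 1-based; the function names and the handling of a zero denominator are illustrative choices.

```python
def ndvi(red, nir):
    """Normalized difference vegetation index for one pixel."""
    denom = nir + red
    # Assumption: return 0.0 when both bands are zero to avoid division by zero.
    return (nir - red) / denom if denom != 0 else 0.0

def ndvi_from_spectrum(spectrum, red_band=69, nir_band=82):
    """Compute NDVI from a per-pixel spectrum using 1-based band numbers."""
    return ndvi(spectrum[red_band - 1], spectrum[nir_band - 1])
```

Applying `ndvi_from_spectrum` to every pixel of the hyperspectral cube yields a single-band NDVI layer that can be stacked with the spectral, texture and DSM features.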
The classification accuracy of each category is shown in Table 3. The data show that the algorithm performs well for most of the classes, especially the grass_stressed, grass_synthetic, tree, soil, tennis court and running track classes, where the accuracy reaches or almost reaches 100%. However, the spectral, texture and elevation information of the residential and commercial classes are very similar, so the classification results for these categories are less satisfactory. There is also some confusion between highway and railway due to the influence of the shadow area in the hyperspectral image.
In order to verify the effects of the proposed algorithm, a multi-group comparison experiment is carried out. In addition, McNemar's test [48] is used to determine the statistical significance of the differences between the classification results obtained by the different algorithms, using the same test sample set. Given two classifiers $C_1$ and $C_2$, the number of pixels misclassified by $C_1$ but not by $C_2$ is denoted as $M_{12}$, and $M_{21}$ represents the number of pixels misclassified by $C_2$ but not by $C_1$. If $M_{12} + M_{21} \geq 20$, the $X^2$ statistic can be considered as following a chi-squared distribution with one degree of freedom. This test checks whether the difference between two classification results is meaningful. Given a significance level of 0.05, $\chi^2_{0.05,1} = 3.841459$. If $X^2$ is greater than $\chi^2_{0.05,1}$, the results of the two classifiers $C_1$ and $C_2$ are significantly different.
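McNemar's test can be sketched in a few lines. The statistic below assumes the commonly used continuity-corrected form, X² = (|M12 − M21| − 1)² / (M12 + M21), with an option to drop the correction; the exact variant used in the experiments is not recoverable from the text, so this choice and the function names are assumptions.

```python
CHI2_CRITICAL_005_1 = 3.841459  # critical value quoted in the text

def mcnemar_counts(truth, pred1, pred2):
    """Count pixels misclassified by one classifier but not by the other."""
    m12 = sum(1 for t, a, b in zip(truth, pred1, pred2) if a != t and b == t)
    m21 = sum(1 for t, a, b in zip(truth, pred1, pred2) if a == t and b != t)
    return m12, m21

def mcnemar_statistic(m12, m21, continuity=True):
    """McNemar's X^2 (continuity-corrected by default, an assumption)."""
    n = m12 + m21
    if n == 0:
        raise ValueError("the classifiers never disagree on correctness")
    diff = abs(m12 - m21) - (1 if continuity else 0)
    return diff * diff / n
```

For example, with M12 = 40 and M21 = 15 the corrected statistic is 576/55, which exceeds the critical value of 3.841459, so the two classification results would be judged significantly different at the 0.05 level.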

Effects of Adding LiDAR Data
In this paper, the DSM is used to form the two feature maps, i.e., PCA + NDVI + GLCM + DSM and MNF + NDVI + GLCM + DSM. Adding the LiDAR data to characterize the elevation information can improve the classification result. In order to verify the effects of adding the LiDAR data, the two feature maps extracted from the hyperspectral image alone (MNF + NDVI + GLCM and PCA + NDVI + GLCM) were classified. The weights of the six classification maps obtained by the MLC, MLR and SVM classifiers were then optimized using the proposed ODF-ADE. The overall accuracy (OA) of the classification results and the accuracy of each category are shown in Table 4, where S means that the result was obtained by optimal weights using only the hyperspectral image, and S + L means that the result was obtained by the proposed method, i.e., by optimal weights using both hyperspectral and LiDAR data. Both results are without post-processing.
In terms of both the OA and the accuracy of the various categories, the method proposed in this paper achieves very good results. As can be seen from Table 4, the overall accuracy increases by 3% after adding the LiDAR data. For certain classes, such as commercial, road and railway, the classification accuracy is greatly improved. These classes have similar spectral information, but can be distinguished by the LiDAR data because of their different elevations. These data show that fusing LiDAR and hyperspectral data achieves good results in urban land-use classification and meets the expected goal. In addition, the McNemar's test values of these two approaches are given in Tables 5 and 6 to evaluate the statistical significance. It can be seen from Tables 5 and 6 that the McNemar's test values between S and S + L are greater than the critical value of $\chi^2_{0.05,1}$ (3.841459), which means that the differences are significant.
Because the different classifiers differ in their ability to distinguish different types of objects, a voting approach that integrates the classification results using equally weighted classifiers lacks scientific rigor. Therefore, the weights of each class for each classifier were optimized through weighted voting of the six classification images obtained by the different classifiers using different features. The weights of the 15 classes of the six classification maps were initialized randomly from the range (0,1). The initial population size was 30 and the maximum number of iterations was 500. The weights were optimized iteratively according to the distance between each pixel and the corresponding training sample data, in search of the global optimum. After the optimal decision fusion, the OA of the classification map reached 90.83%, which is much higher than that of any of the prior classification maps. The classification accuracy of each classifier and the voting result are shown in Table 7, where most of the class accuracies obtained by the weighted voting are better than those of any of the six initial classification maps. It can also be seen from Table 8 that the McNemar's test values between the weighted voting and each of the other classifiers are greater than the critical value of $\chi^2_{0.05,1}$ (3.841459), which means that the differences are significant.
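The weighted voting rule can be illustrated for a single pixel as follows. The sketch assumes that each classification map casts a vote for its predicted class, that the vote is scaled by the weight of that class for that map, and that the class with the largest accumulated score wins; the function name and the tie-breaking rule (lowest class index) are assumptions.

```python
def weighted_vote(labels, weights):
    """Fuse the labels of one pixel from several classification maps.

    labels:  predicted class index of the pixel in each map (length N).
    weights: weights[m][c] is the weight of class c for map m (N x M values).
    Returns the class index with the largest accumulated weight.
    """
    n_classes = len(weights[0])
    score = [0.0] * n_classes
    for m, c in enumerate(labels):
        score[c] += weights[m][c]
    # Ties are resolved in favor of the lowest class index (assumption).
    return max(range(n_classes), key=score.__getitem__)
```

With equal weights this reduces to majority voting; with optimized weights, a single map that is reliable for a given class can outvote several maps that are not, which is the behavior the DE optimization is intended to exploit.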
In order to verify the effect of the weighted voting, the results of majority voting, in which the weights are equal, and weighted voting, in which the weights are optimized by adaptive DE, are compared. The input classification maps were the same six maps obtained before. The OA of the classification results and the accuracy of each category are shown in Table 9. MV means that the result was obtained by voting with equal weights using both the hyperspectral image and LiDAR data; WV means that the result was obtained by optimal weights using the same data. Both results are without post-processing.
As can be seen from Table 9, the OA is improved after the weighted voting. The accuracy of certain classes, such as highway, railway, parking_lot1 and parking_lot2, is greatly improved. The results show that the weighted voting can fully utilize the differences among the different classifiers and improve the classification result. The result of McNemar's test is shown in Table 6. The value is greater than the critical value of $\chi^2_{0.05,1}$ (3.841459), which means that the proposed algorithm differs significantly from majority voting.
In order to solve the problem of the viaduct being confused with tall buildings, the DSM data were used to identify the viaduct by a region growing operation. The extracted viaduct was used to correct the classification of the highway class and is shown in Figure 9. The CRF-based smoothing approach was used in the experiment to generate the final classification map. Table 10 shows the classification accuracy of the results obtained by weighted voting and post-classification. WV denotes the result of weighted voting and APC denotes the result after post-classification. The accuracy of the highway class increases by 20% after the post-classification operation. It can also be clearly seen that the algorithm considering the spatial interaction shows an improvement of more than 3% in OA over the result of weighted voting, which demonstrates the effectiveness of incorporating the spatial contextual information. The post-classification also has a great effect on the accuracy of each class.

Sensitivity to Features
In the proposed framework, multiple features are extracted to characterize the experimental area. In order to verify the effect of adding the multiple features, we compared different combinations of features. Twelve feature maps were classified by MLC and the OA is shown in Table 11. In addition, McNemar's test was applied to verify whether there are significant differences between the different features, and the results are shown in Table 12.
As can be seen in Table 11, the combination of spectral, NDVI and spatial information is better than any other combination. The NDVI is added to highlight the vegetation, the texture operators based on the GLCM are utilized to increase the separability between classes and the DSM data obtained from the LiDAR data are added to characterize the elevation information. Therefore, two groups of feature maps, i.e., MNF + NDVI + GLCM + DSM and PCA + NDVI + GLCM + DSM, are used in consideration of the OA and McNemar's test values.
As can be seen in Table 12, most of the McNemar's test values between the combination of spectral, NDVI and spatial information (i.e., MNF + NDVI + GLCM + DSM and PCA + NDVI + GLCM + DSM) and any other combination are greater than the critical value of $\chi^2_{0.05,1}$ (3.841459), which means that the differences between the feature combinations are significant.

Sensitivity to the Parameter of ODF-DE
To compare the self-adaptive version of the ODF-DE algorithm, i.e., ODF-ADE, with the original ODF-DE algorithm, the best control parameter settings for ODF-DE are needed. For the ODF-DE algorithm, the control parameter values during the decision fusion process were not changed, except for the analyzed parameter. The empirical parameters of the original ODF-DE algorithm were set as follows: CR = 0.8, F = 0.3, and the maximum number of iterations was 500.

Sensitivity of Parameter F
According to the brief introduction to DE provided above, F is an important parameter for the ODF-DE algorithm. Hence, the impact of parameter F on the algorithm was tested. Parameter F was varied from 0.1 to 1.0 with a step size of 0.1, and the other parameters were fixed as NP = 30, CR = 0.8 and a maximum of 500 iterations. The experimental results are presented in Figure 10a. From this figure, the best adjusted OA of ODF-DE for the experimental image, i.e., 90.86%, is obtained when F is equal to 0.3. Although ODF-DE with a tuned F can obtain a slightly higher OA than ODF-ADE, the ODF-ADE algorithm does not need any prior knowledge.

Sensitivity of Parameter CR
For the experimental images, the ODF-DE algorithm was run with CR varied from 0.1 to 1.0 with a step size of 0.1, while the other parameters were set as follows: NP = 30, F = 0.3, and the maximum number of iterations was 500. The experimental results are shown in Figure 10b. The best adjusted OA value of ODF-DE for the experimental image, i.e., 90.86%, is obtained with CR = 0.2. This value is slightly higher than that of ODF-ADE, i.e., 90.83%. Although ODF-DE can obtain satisfactory results by adjusting the value of parameter CR, ODF-ADE can adaptively provide similar decision fusion results without prior knowledge or experience.

Sensitivity of Parameter NP
The size of the initial population NP is very important in maintaining the diversity of the population and extending the search range in the feature space. To analyze the sensitivity in relation to parameter NP, the other parameters, i.e., CR and F, were determined adaptively and NP assumed the following values for the experimental images: NP = {5, 10, 15, 20, 25, 30, 35, 40, 45, 50}. Figure 10c shows the sensitivity of ODF-ADE in relation to parameter NP by analyzing the relationship between OA and NP. There is an upward trend in the OA of the ODF-ADE algorithm as NP increases from 5 to 50. The highest OA of ODF-ADE, 90.87%, is obtained when NP is equal to 35.
Based on the aforementioned sensitivity analyses, the original ODF-DE has a disadvantage, i.e., its best control parameter settings are problem-dependent. The proposed ODF-ADE overcomes this disadvantage and depends much less on parameter tuning than the original ODF-DE. Therefore, the conclusion is that ODF-ADE is an effective decision fusion algorithm.

Conclusions
Based on DE theory, this paper has proposed a new optimal decision fusion strategy for the fusion of hyperspectral images and LiDAR data, namely ODF-ADE. In line with this strategy, the optimal decision fusion problem is transformed into an optimization problem in the feature space by maximizing the objective value. The traditional voting algorithm always uses equal weights to fuse the classification maps, which results in the differences among the different classifiers not being fully utilized. In the proposed method, DE, which has the ability of global optimization, is used to obtain the weights of the different classification maps. In addition, in the traditional DE it is necessary to choose appropriate control parameters, employing the prior experience of the user, for population size NP, crossover rate CR and scaling factor F. This is quite a difficult task because the best settings for the control parameters are not easy to determine for complex problems. In the proposed method, a self-adaptive strategy is utilized to determine the parameters.
The data sets of the 2013 Data Fusion Contest were used to test the effectiveness of the proposed algorithm. The experimental results show that ODF-ADE can not only make full use of the advantages of LiDAR data, but can also obtain more reasonable classification maps using the weighted voting. ODF-ADE can overcome the shortcomings of classification using data from a single sensor and can achieve good results in urban LULC classification.
In our future work, the number of classification maps and the diversity of these maps will be increased to obtain a better result. Ensemble voting will also be considered to further improve the classification result, and additional information from the LiDAR data may be considered in future work.

Figure 1. Framework of the differential evolution algorithm.

Figure 2. Framework of the proposed methodology.

Initial Population
After obtaining the classification maps of the different classifiers, the population can be initialized as $P_g = \{X_{1,g}, \ldots, X_{NP,g}\}$, where $X_{k,g}$ represents the kth individual in the gth generation and NP is the size of the population. As shown in Figure 3, each individual holds the weights of each class for each classification map. The weights also need to be initialized. The two variables M and N represent the number of land-cover labels and the number of classification maps, respectively. D equals M × N and denotes the number of chromosomes that one individual $X_{k,G}$ contains. The initial population $P_1$ is generated randomly from 0 to 1.

$w_{i}^{b_m}$ denotes the weight of class i of classification map $b_m$, and $D_{ai}$ denotes the minimum distance between pixel a and the training data $T_i$, for which the label is class i:

$$D_{ai} = \min\{d(a, i_1), d(a, i_2), \ldots, d(a, i_m), \ldots, d(a, i_n)\}$$

where n represents the number of pixels in $T_i$, $i_m$ represents the mth pixel of $T_i$, $\mu_a$ represents the image vector of pixel a and $\mu_{i_m}$ represents the image vector of pixel $i_m$. $d(a, i_m)$ denotes the Euclidean distance between $\mu_a$ and $\mu_{i_m}$. The smaller the value of $D_{ai}$ and the greater the value of $w_{i}^{b_m}$
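The minimum distance used in the objective function can be computed directly from its definition. The sketch below assumes that each pixel is represented by its feature vector as a sequence of floats and that the training data of one class are given as a list of such vectors; the function name is illustrative.

```python
import math

def min_distance(pixel, training_pixels):
    """Minimum Euclidean distance between a pixel and the training pixels of one class."""
    return min(math.dist(pixel, t) for t in training_pixels)
```

For instance, for the pixel (0, 0) and the training vectors (3, 4) and (1, 0), the distances are 5 and 1, so the minimum distance is 1.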

Figure 4. The objective function based on the minimum distance.

Adaptive Mutation and Crossover
DE generates the mutant individual by adding the difference vector, scaled by a factor F, to a third individual. The trial individual $U_i^t$ is produced by mixing the parameters of a predetermined target individual $X_i^t$ and the mutant vector $V_i^t$ using the crossover probability CR. Suitable control parameters are always different for different real problems.

Figure 7. Location and distribution of the training and validation samples. (a) Location and distribution of the training samples. (b) Location and distribution of the validation samples.

Figure 10. Sensitivity to the parameters of ODF-DE. (a) Sensitivity of ODF-DE in relation to F. (b) Sensitivity of ODF-DE in relation to CR. (c) Sensitivity of ODF-DE in relation to NP.

Table 1. The defined notations.

Table 2. Number of training and validation samples.

Table 4. Comparison of the classification accuracy after adding LiDAR data (%).

Table 5. Comparison of the McNemar's test values after adding LiDAR data.

Table 6. McNemar's test values of majority voting and weighted voting.

Table 7. Comparison of the different classification strategies (%).

Table 8. McNemar's test values of the different classification strategies.

Table 9. Classification accuracy of majority voting and weighted voting (%).

Table 10. Classification accuracy of weighted voting and post-classification (%).

Table 11. Classification accuracy of the different features.

Table 12. McNemar's test values of the different features.