Tensor-Based Sparse Representation Classification for Urban Airborne LiDAR Points

Abstract: Common statistical methods for supervised classification usually require a large amount of training data to achieve reasonable results, which is time consuming and inefficient. In many methods, only the features of each point are used, regardless of their spatial distribution within a certain neighborhood. This paper proposes a tensor-based sparse representation classification (TSRC) method for airborne LiDAR (Light Detection and Ranging) points. To keep the features in their spatial arrangement, each LiDAR point is represented as a 4th-order tensor. Then, TSRC is performed for point classification based on the 4th-order tensors. Firstly, a structured and discriminative dictionary set is learned by using only a few training samples. Subsequently, for classifying a new point, the sparse tensor is calculated based on the tensor OMP (Orthogonal Matching Pursuit) algorithm. The test tensor data is approximated by the sub-dictionary set and its corresponding subset of the sparse tensor for each class. The point label is determined by the minimal reconstruction residual. Experiments are carried out on eight real LiDAR point clouds, and the results show that objects can be distinguished successfully by TSRC. The overall accuracy of all the datasets is beyond 80% by TSRC. TSRC also shows a good improvement on LiDAR point classification when compared with other common classifiers.


Introduction
LiDAR (Light Detection and Ranging) point cloud classification in urban areas has always been an essential and challenging task. Before classification, various features are extracted from the raw three-dimensional (3D) point cloud, which should be able to distinguish different objects. Due to the complexity of urban scenes, it is difficult to label objects correctly using only single- or multi-feature thresholds. Thus, research in recent years has mainly focused on statistical methods for the supervised classification of LiDAR points. Common machine learning methods include the support vector machine (SVM) algorithm, AdaBoost, decision trees, random forest, and other classifiers. These machine learning methods aim to build a classification rule or probability function that determines the label based on the features. SVM seeks the optimal hyperplane that efficiently separates the classes, and the Gaussian kernel function can be used to map non-linear decision boundaries to higher dimensions, where they are linear [1]. AdaBoost is a binary algorithm, but several extensions have been explored for multiclass categorization. In [2], weak hypothesis generation routines are combined into the AdaBoost algorithm to classify terrain and non-terrain areas. Decision trees build a hierarchical binary tree model from training data, so that new objects can be classified based on previous knowledge [3]. Random forest is an ensemble learning method that uses a group of decision trees, provides measures of feature importance for each class [4], and runs efficiently on large datasets. However, those approaches barely consider the spatial distribution of points, which is an important cue for classification in complex urban scenes. Some studies have applied graphical models to incorporate spatial context information in the classification. Graphical models take neighboring points into account, which allows us to encode the spatial and semantic relationships between objects via a set of edges between nodes in a graph [5]. Markov networks and conditional random fields (CRF) are two mainstream methods to define the graphical model. However, a large amount of training data is necessary to obtain the classifier in those statistical studies. Anguelov et al. [6] use the Associative Markov Network (AMN) to classify objects on the ground. The study takes 1/6 of the points as training data and achieves an overall classification accuracy of 93%. Niemeyer et al. [7] use 3000 points per class to train the CRF model. Seven classes (grass land, road, tree, low vegetation, buildings with gable roof, buildings with flat roof, and facade) are distinguished based on the CRF and the overall accuracy is 83%, which is a fine result in a complex urban scene. As for statistical methods, Im [8] uses 316 training samples (1%) to generate decision trees with an overall accuracy of 92.5%. Moreover, Lodha uses half of the dataset as training data for the AdaBoost algorithm and the average accuracy is 92%. As a consequence, classifier training would be very time-consuming, especially when a Markov network or CRF is used as the classifier.
In order to combine spatial distribution and feature information, we suggest using a high-dimensional tensor data structure for representing each point (to avoid misunderstandings, we stress that tensor refers here to a high-dimensional data structure, and is not related to the concepts of tensor voting or the structure tensor). Normally, multidimensional data has to be embedded into vectors in traditional methods. However, the vectorization breaks the original multidimensional structure of the signal and reduces the reliability of post processing. Therefore, high-dimensional tensors are utilized in several approaches. A high-dimensional tensor means that the elements in the data are addressed by more than two indices. Tensors have been widely applied to hyperspectral images, face images, and video data representation. Renard and Bourennane introduce a hyperspectral image representation based on tensors to jointly take advantage of the spatial and spectral information [9]. They show that the spatial projection into a lower orthogonal subspace, combined with spectral dimension reduction, can efficiently improve the classification. In face recognition, Tensorfaces are proposed by Vasilescu et al. to overcome the influence of different factors that are related to facial geometries, expressions, head poses, and lighting conditions [10]. Tensorfaces improve the facial recognition rates when compared with the standard eigenfaces. Kuang et al. use a unified tensor model to represent large-scale and heterogeneous data [11]. Their model shows a great ability of dimensionality reduction by using incremental high-order singular value decomposition.
This paper aims to use as little training data as possible to achieve effective classification. Therefore, sparse representation-based classification is used in this paper. Sparse representation-based classification (SRC) is a well-known technique that represents data by a sparse linear combination of bases, which are extracted from a fixed dictionary or a learned dictionary. It classifies unknown data based on the reconstruction criteria. SRC has been successfully applied to the processing of signals [12] and images [13]. SRC includes two important parts: sparse coding and dictionary learning. Sparse coding finds a small number of base atoms from the dictionary for reconstructing the raw data. Sparse coding can be solved by Orthogonal Matching Pursuit (OMP) [14], LASSO (least absolute shrinkage and selection operator) [13], or the gradient descent algorithm [15]. The dictionary can generally come from two sources: mathematical model-based methods and dictionary learning from training data. The mathematical model-based methods for building a dictionary include Fourier series, wavelets, and discrete cosine transform bases [16,17]. However, such a predefined dictionary is fixed and cannot be adapted to the dataset. Therefore, dictionary learning from the dataset is preferable due to its flexibility for a specific dataset. The dictionary learning problem can be solved by the method of optimal directions (MOD) [18], K-SVD [19], and the gradient descent algorithm [20]. Furthermore, previous research formulates the high-dimensional data SRC problem in terms of tensors. Tensor extensions of the dictionary learning and sparse coding algorithms have been developed, such as Tensor MOD and K-SVD for dictionary learning [21], and tensor OMP [22]. Moreover, tensor-based sparse representation has yielded good performance in high-dimensional data classification [9], face recognition [23], and image de-noising [24].
Remote Sens. 2017, 9, 1216
In this paper, we propose a tensor-based sparse representation classification method for urban airborne LiDAR point identification. The main innovations of this paper are summarized below:

1.
A new data structure is introduced to represent each point. To keep the feature description in its original geometrical 3D space, the LiDAR points are represented as 4th-order tensors.
A point and its neighboring points are rearranged by their spatial distribution in the tensor space, while the features of each point in the neighborhood are attached as the fourth mode of the tensor. In this tensor data structure, both spatial and feature information can be used for classification.

2.
A structured and discriminative dictionary set is learned for tensors based on a few samples of training data. Firstly, we present a structured and discriminative dictionary learning method adapted to high-dimensional tensor data. Additionally, the dictionary learning only uses a few samples of training data. The dictionary classifier shows better classification ability than other popular classifiers (KNN, decision tree, random forest, SVM) when using the same amount of training data.
Finally, the decision on which class a point belongs to is based on the minimum reconstruction residual from the sub-dictionary and its subset of the sparse tensor. The sparse tensor approximation of each test tensor can be obtained by projecting the test tensor onto the dictionaries, and the sparse tensor only has a few nonzero entries, which correspond to the selected atoms in the dictionary set. We expect that the sparse tensors of points belonging to the same class have a similar structure. The label of an unknown point can then be predicted by the minimum reconstruction residual over the class-specific sub-dictionaries and the corresponding subsets of the sparse tensor.
In the following, we first introduce the tensor generation process in Section 2. Subsequently, conventional sparse representation classification (SRC) is briefly introduced in Section 3.1. Then, the procedure of tensor-based sparse representation classification is described in detail in Section 3.2, which includes the sparse coding algorithm for tensor data (Section 3.2.1), structured and discriminative dictionary learning (Section 3.2.2), and the classification procedure (Section 3.2.3). After that, the tensor-based sparse representation classification (TSRC) results and the comparison with other classifiers are presented in Section 4, followed by a discussion on the influence of parameter selection in Section 5. Finally, the major findings of this work are summarized in Section 6.

Tensor Notations and Preliminaries
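A tensor denotes a multidimensional object whose elements are addressed by more than two indices. The order of a tensor, also known as the number of modes [25], is the number of dimensions. Tensors are denoted as boldface italic capital letters, e.g., $\boldsymbol{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$; matrices are denoted as upright capital letters, e.g., $\mathrm{T} \in \mathbb{R}^{I_1 \times I_2}$; vectors are denoted as upright lowercase letters, e.g., $\mathrm{t} \in \mathbb{R}^{I}$. The element of $\boldsymbol{T}$ at position $(i_1, i_2, \ldots, i_N)$ is denoted as $t_{i_1 i_2 \cdots i_N}$. The Frobenius norm of a tensor $\boldsymbol{T}$ is defined as:

$$\|\boldsymbol{T}\|_F = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} t_{i_1 i_2 \cdots i_N}^2}$$

The tensor can be transformed into a vector or matrix, and this process is known as unfolding or flattening. Given an Nth-order tensor $\boldsymbol{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$, an n-mode vector of $\boldsymbol{T}$ is obtained by fixing every index except the one in mode n [26]. The n-mode unfolding matrix is defined by arranging all of the n-mode vectors as columns of a matrix, i.e., the n-mode unfolding matrix is $\mathrm{T}_{(n)} \in \mathbb{R}^{I_n \times (I_1 \cdots I_{n-1} I_{n+1} \cdots I_N)}$. The product between two matrices can be extended to the product of a tensor and a matrix. The n-mode product of a tensor $\boldsymbol{T} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ with a matrix $\mathrm{U} \in \mathbb{R}^{J \times I_n}$ is denoted by $\boldsymbol{T} \times_n \mathrm{U}$ and is a tensor of size $I_1 \times \cdots \times I_{n-1} \times J \times I_{n+1} \times \cdots \times I_N$. For processing, this can be converted to the matrix product of $\mathrm{U}$ and the unfolded tensor $\mathrm{T}_{(n)}$, which is expressed as Equation (1):

$$\boldsymbol{Y} = \boldsymbol{T} \times_n \mathrm{U} \iff \mathrm{Y}_{(n)} = \mathrm{U}\,\mathrm{T}_{(n)} \quad (1)$$

The Tucker decomposition is a form of higher-order principal component analysis, which can be described as Equation (2). It decomposes a tensor into a core tensor multiplied by a matrix along each mode:

$$\boldsymbol{T} \cong \boldsymbol{X} \times_1 \mathrm{U}_1 \times_2 \mathrm{U}_2 \cdots \times_N \mathrm{U}_N \quad (2)$$

$\boldsymbol{X}$ is the core tensor, and its entries show the level of interaction between the different components [25]. The matrices $\mathrm{U}_1, \mathrm{U}_2, \ldots, \mathrm{U}_N$ can be considered as the principal components in each mode. The core tensor $\boldsymbol{X}$ can be considered as a compressed version of the original tensor. In these equations, the $\cong$ sign means "approximately equal".
The Tucker model can also be written in a Kronecker representation; the two representations are equivalent. Let $\otimes$ denote the Kronecker product, and define the vectorization operation on tensors as $\mathrm{t} = \mathrm{vec}(\boldsymbol{T})$, $\mathrm{t} \in \mathbb{R}^{I_1 I_2 \cdots I_N}$. The vectorization operation stacks all of the columns of the mode-1 unfolding matrix $\mathrm{T}_{(1)}$ in a single vector. Then, given $\mathrm{t} = \mathrm{vec}(\boldsymbol{T})$ and $\mathrm{x} = \mathrm{vec}(\boldsymbol{X})$, the equivalent Kronecker representation is:

$$\mathrm{t} \cong (\mathrm{U}_N \otimes \mathrm{U}_{N-1} \otimes \cdots \otimes \mathrm{U}_1)\,\mathrm{x} \quad (3)$$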
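The unfolding, n-mode product, and Kronecker form described above can be checked numerically. The following is a minimal NumPy sketch, not code from the paper; the helper names (`unfold`, `fold`, `mode_product`, `tucker_reconstruct`, `vec`) are illustrative, and a column-major ordering of the remaining modes is assumed, which is what makes the Kronecker factors appear in reverse mode order.

```python
import numpy as np

def unfold(T, n):
    """Mode-n unfolding: mode-n vectors become columns (column-major order of the other modes)."""
    return np.reshape(np.moveaxis(T, n, 0), (T.shape[n], -1), order="F")

def fold(M, n, shape):
    """Inverse of unfold for a tensor with the given full shape."""
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(np.reshape(M, [shape[n]] + rest, order="F"), 0, n)

def mode_product(T, U, n):
    """n-mode product T x_n U, computed as U @ T_(n) and folded back."""
    shape = list(T.shape)
    shape[n] = U.shape[0]
    return fold(U @ unfold(T, n), n, shape)

def tucker_reconstruct(X, factors):
    """Core tensor X multiplied by one factor matrix along each mode."""
    T = X
    for n, U in enumerate(factors):
        T = mode_product(T, U, n)
    return T

def vec(T):
    """Column-major vectorization, i.e. stacking the columns of the mode-1 unfolding."""
    return T.reshape(-1, order="F")

# Small demonstration with arbitrary sizes (assumed for illustration only)
rng = np.random.default_rng(0)
core = rng.standard_normal((2, 3, 2, 2))
factors = [rng.standard_normal((4, 2)), rng.standard_normal((5, 3)),
           rng.standard_normal((3, 2)), rng.standard_normal((2, 2))]
tensor = tucker_reconstruct(core, factors)
kron = np.kron(factors[3], np.kron(factors[2], np.kron(factors[1], factors[0])))
print(np.allclose(vec(tensor), kron @ vec(core)))
```

With a row-major vectorization the Kronecker factors would appear in forward order instead, so the ordering convention has to be fixed once and used consistently.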

Tensor Representation of LiDAR Point
The process of representing a point p as the tensor $\boldsymbol{T}$ is shown in Figure 1. Firstly, the k closest neighbors of the point p are found and denoted as point set P. Then, the global Cartesian coordinates of point set P are transformed to the local PCA (Principal Component Analysis) coordinate system. The local axes $(\mathrm{e}_1, \mathrm{e}_2, \mathrm{e}_3)$ are defined by the principal directions of variance of the point set P based on PCA. The PCA transformation ensures that the first axis $\mathrm{e}_1$ has the most variation, the second axis $\mathrm{e}_2$ has the second-most, and the third axis $\mathrm{e}_3$ the least. Therefore, the spatial distribution is reflected by the coordinate values along the third axis. Points of a volumetric structure will have varying coordinates along the third axis in the local coordinate system, whereas points of locally planar surfaces will have consistent coordinates along the third axis. Let $p_i(x_i, y_i, z_i)$ be a point in the global Cartesian coordinate system and $p_{ei}(u_i, v_i, w_i)$ be the same point in the local PCA coordinate system. The transformation is calculated by:

$$\begin{pmatrix} u_i \\ v_i \\ w_i \end{pmatrix} = [\mathrm{e}_1, \mathrm{e}_2, \mathrm{e}_3]^{T} \begin{pmatrix} x_i - x_0 \\ y_i - y_0 \\ z_i - z_0 \end{pmatrix} \quad (4)$$

where $(x_0, y_0, z_0)$ are the mean coordinates of the points within the neighborhood in the global coordinate system.
After that, all points in the local PCA coordinate system are converted into voxel coordinates by Equation (5), where $(v_{x_i}, v_{y_i}, v_{z_i})$ represents the voxel index within the voxel array, Int() is the function that rounds the result to the nearest integer, $(\min(u_i), \min(v_i), \min(w_i))$ are the minimum values of $(u, v, w)$, and $(\Delta v_x, \Delta v_y, \Delta v_z)$ indicates the voxel element size. In this paper, $\Delta v_x = \Delta v_y = \Delta v_z = 0.2$ m, covering a cubic space of 1 m$^3$, which means $v_{x_i} \in [1, 5]$; $v_{y_i} \in [1, 5]$; $v_{z_i} \in [1, 5]$, i.e., a unique natural number ranging from 1 to 5 is associated with each voxel in the X, Y, and Z dimensions. Subsequently, the mean feature vector of all the points that are assigned to each voxel is set as the voxel value, as shown in Figure 1. As a result, the center point p with its k nearest neighbors is represented as a 4th-order tensor, whose entries are accessed by the voxel index and the feature number. Based on this data structure, the attributes of each point are regarded as entries in the tensor, which are arranged as $r_{i_1 i_2 i_3 i_4}$, where $i_1 = 1, \ldots, I_1$; $i_2 = 1, \ldots, I_2$; $i_3 = 1, \ldots, I_3$; $i_4 = 1, \ldots, I_4$. In our approach, $I_1 = I_2 = I_3 = 5$ and $I_4$ equals the number of attributes of each point. Finally, the point p is described as the 4th-order tensor $\boldsymbol{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times I_4}$, where $I_1, I_2, I_3, I_4$ indicate the X, Y, Z, and attribute modes, respectively. In this way, the spatial distribution and the attributes are preserved simultaneously. Each point in the LiDAR dataset is processed into a 4th-order tensor, which is used for tensor-based dictionary learning and sparse coding.
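The tensor generation steps described above (local PCA frame, voxelization, mean feature per voxel) can be sketched as follows. This is an illustrative NumPy sketch under stated assumptions, not the authors' implementation: the function name `point_to_tensor` is hypothetical, the voxel index is assumed to be the floor of the offset from the minimum coordinate (0-based here, whereas the text uses indices 1 to 5), indices are clipped to the grid, and empty voxels are left at zero.

```python
import numpy as np

def point_to_tensor(neighbors_xyz, neighbors_feat, voxel_size=0.2, grid=5):
    """Build a (grid, grid, grid, F) tensor from a point neighborhood.

    neighbors_xyz:  (k, 3) global coordinates of the neighborhood points.
    neighbors_feat: (k, F) feature vectors of the same points.
    """
    centered = neighbors_xyz - neighbors_xyz.mean(axis=0)
    # PCA axes: eigenvectors of the covariance matrix, sorted by decreasing variance
    eigval, eigvec = np.linalg.eigh(np.cov(centered.T))
    axes = eigvec[:, np.argsort(eigval)[::-1]]          # columns e1, e2, e3
    local = centered @ axes                              # (u, v, w) per point
    # Assumed indexing rule: floor of the offset from the minimum, clipped to the grid
    idx = np.floor((local - local.min(axis=0)) / voxel_size).astype(int)
    idx = np.clip(idx, 0, grid - 1)
    n_feat = neighbors_feat.shape[1]
    sums = np.zeros((grid, grid, grid, n_feat))
    counts = np.zeros((grid, grid, grid))
    np.add.at(sums, (idx[:, 0], idx[:, 1], idx[:, 2]), neighbors_feat)
    np.add.at(counts, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
    occupied = counts > 0
    sums[occupied] /= counts[occupied][:, None]          # mean feature vector per voxel
    return sums

# Example with synthetic data: 30 neighbors, 18 features (values are arbitrary)
rng = np.random.default_rng(0)
tensor = point_to_tensor(rng.uniform(0, 1, (30, 3)), rng.uniform(0, 1, (30, 18)))
print(tensor.shape)
```

Clipping is a design choice of this sketch: a neighborhood whose extent along a principal axis exceeds 1 m would otherwise fall outside the 5 × 5 × 5 grid.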

Sparse Representation Classification Model
The sparsity algorithm finds the best representation of a test sample by a sparse linear combination of training samples from a dictionary [13]. Given a certain number of training samples from each class, the sub-dictionary $\mathrm{D}_i$ of the ith class is learned. Assume that there are c classes of subjects, and let $\mathrm{D} = [\mathrm{D}_1, \mathrm{D}_2, \ldots, \mathrm{D}_c]$ be the overall structured dictionary over the entire dataset. Denote by $\mathrm{y}$ a test sample vector and by $\mathrm{x}$ the sparse coefficient vector of $\mathrm{y}$; the linear representation of $\mathrm{y}$ can be written as:

$$\mathrm{y} = \mathrm{D}\mathrm{x} \quad (6)$$

The sparse coefficient vector $\mathrm{x}$ is calculated by projecting $\mathrm{y}$ onto the dictionary $\mathrm{D}$, which is called the sparse coding procedure. The sparse coefficient vector $\mathrm{x}$ can be obtained by solving the following optimization problem:

$$\min_{\mathrm{x}} \|\mathrm{x}\|_0 \quad \text{s.t.} \quad \|\mathrm{y} - \mathrm{D}\mathrm{x}\|_2^2 \le \varepsilon \quad (7)$$

where $\|\cdot\|_0$ is the $l_0$-norm of vector $\mathrm{x}$, which counts the number of nonzero elements in $\mathrm{x}$, and $\varepsilon$ is a pre-specified residual level parameter. The problem in (7) is NP-hard (nondeterministic polynomial-time hard) due to the non-differentiability and non-convex nature of the $l_0$-norm. Typical approaches for solving (7) are either approximation of the original problem with $l_1$-norm based convex relaxation [27], or resorting to greedy schemes, such as matching pursuit and basis pursuit algorithms [28]. The optimization of (7) can also be reformulated as:

$$\min_{\mathrm{x}} \|\mathrm{y} - \mathrm{D}\mathrm{x}\|_2^2 \quad \text{s.t.} \quad \|\mathrm{x}\|_0 \le s \quad (8)$$

where s is the sparsity level of vector $\mathrm{x}$. With the additional constraint $\|\mathrm{x}\|_0 \le s$, the sparse vector $\mathrm{x}$ for the test sample $\mathrm{y}$ on the dictionary $\mathrm{D}$ can be obtained. Based on the class information of the structured dictionary $\mathrm{D}$, the sparse coefficient vector $\mathrm{x}$ can be written as $\mathrm{x} = [\mathrm{x}_1, \mathrm{x}_2, \ldots, \mathrm{x}_c]$, where $\mathrm{x}_i$ is the subset of the sparse coefficient vector $\mathrm{x}$ associated with class i. Thus, for a test sample from class i, $\mathrm{x}$ should be a sparse coefficient vector whose entries are zero except those corresponding to the ith class. According to this assumption, the test sample $\mathrm{y}_i$ from class i can be well represented by a linear combination of the sub-dictionary $\mathrm{D}_i$ and its corresponding subset of the sparse vector, $\mathrm{x}_i$. Sparse representation classification (SRC) uses the reconstruction error $e_i$ that is associated with each class to perform the data classification. First of all, the sparse representation $\mathrm{x}$ of the test sample $\mathrm{y}$ is recovered with respect to the whole dictionary. Then, $\mathrm{x}_i$ is extracted from $\mathrm{x}$ as the subset vector corresponding to class i. The test sample is reconstructed by each class-specific sub-dictionary $\mathrm{D}_i$ and its corresponding sparse vector $\mathrm{x}_i$. The class label of $\mathrm{y}$ is then determined as the one with minimal residual.
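The decision rule described above can be written compactly as follows. This is a sketch of the usual SRC formulation, assuming the residual is measured in the $l_2$-norm; the name $\mathrm{label}(\cdot)$ is introduced here for illustration only.

```latex
e_i(\mathrm{y}) = \left\| \mathrm{y} - \mathrm{D}_i \mathrm{x}_i \right\|_2 ,
\qquad
\mathrm{label}(\mathrm{y}) = \arg\min_{i \in \{1, \ldots, c\}} e_i(\mathrm{y})
```

If the sparse code of a sample concentrates on the atoms of one class, the residual of that class is small while all other residuals stay close to $\|\mathrm{y}\|_2$, which is what makes the minimum a usable class indicator.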

Tensor-Based Sparse Representation Classification
For the 4th-order tensors that are used in this work, the dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$ has to be learned on the X, Y, Z, and attribute modes. The sparse coefficient vector $\mathrm{x}$ is also extended to a 4th-order sparse tensor $\boldsymbol{X}$. After the tensor generation for each point, the 4th-order tensors are used as the input data. At the beginning, the training tensor samples are randomly selected for learning the dictionary set. The dictionary on each mode is also composed of several sub-dictionaries, each associated with one class i. Subsequently, for the sparse coding, the test tensor is projected onto the dictionaries on each mode to obtain the sparse tensor. This sparse coding is solved by TOMP (Tensor-based Orthogonal Matching Pursuit). Then, the test tensor data is recovered by the class-specific sub-dictionaries and their corresponding subset of the sparse tensor. Finally, the label of the test tensor is predicted by the minimal reconstruction error. The whole procedure is shown in Figure 2.
We use an alternating strategy to solve the dictionary learning problem. It can be divided into two sub-problems: updating the sparse tensor $\boldsymbol{X}$ by fixing the dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$, and updating the dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$ by fixing the sparse tensor $\boldsymbol{X}$; the two steps are repeated until convergence. As a result, the desired dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$ and the sparse tensor $\boldsymbol{X}$ are obtained.

Tensor-Based Sparse Coding
To calculate the sparse core tensor $\boldsymbol{X}$, we use the greedy algorithm TOMP (Tensor-based Orthogonal Matching Pursuit) proposed in [22]. Classical OMP locates the support of the sparse vector that gives the best approximation of the sample data from the dictionary. It adds one index to the support set at each iteration until s atoms are selected or the approximation error is within a preset threshold [14], where s is the sparsity.
Given a fourth-order LiDAR point tensor $\boldsymbol{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times I_4}$, suppose that the dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$ is fixed and that $\boldsymbol{X}$ is the sparse tensor of $\boldsymbol{T}$ to be calculated. The objective function is converted to a sparse coding problem with $l_0$-norm regularization, which can be written as:

$$\min_{\boldsymbol{X}} \left\| \boldsymbol{T} - \boldsymbol{X} \times_1 \mathrm{D}_1 \times_2 \mathrm{D}_2 \times_3 \mathrm{D}_3 \times_4 \mathrm{D}_4 \right\|_F^2 \quad \text{s.t.} \quad \|\boldsymbol{X}\|_0 \le s \quad (10)$$

where $\|\cdot\|_F$ is the Frobenius norm. $\Gamma_n = [j_n^1, j_n^2, \ldots, j_n^{s_n}]$ is the subset of $s_n$ indices of nonzero values in the sparse core tensor on mode n (the X, Y, Z, and attribute modes for n = 1, 2, 3, 4) and thus denotes all possible combinations of $s_n$ nonzero indices on that mode. Therefore, the cross product $\Gamma_1 \times \Gamma_2 \times \Gamma_3 \times \Gamma_4$ is the set of all the possible nonzero indices that can appear. Moreover, $s_1, s_2, s_3, s_4$ represent the X, Y, Z, and attribute mode sparsity, indicating the number of selected columns of each dictionary for the sparse representation. The total sparsity of the fourth-order sparse core tensor is denoted by $s = s_1 \times s_2 \times s_3 \times s_4$. Because the dictionary set is overcomplete, the size of the sparse tensor $\boldsymbol{X}$ is larger than that of the LiDAR tensor $\boldsymbol{T}$.
Tensor-based Orthogonal Matching Pursuit (TOMP) relies on the equivalence of the Tucker model and its Kronecker representation. Given $\mathrm{t} = \mathrm{vec}(\boldsymbol{T})$ and $\mathrm{x} = \mathrm{vec}(\boldsymbol{X})$, the following two representations are equivalent:

$$\boldsymbol{T} \cong \boldsymbol{X} \times_1 \mathrm{D}_1 \times_2 \mathrm{D}_2 \times_3 \mathrm{D}_3 \times_4 \mathrm{D}_4 \iff \mathrm{t} \cong (\mathrm{D}_4 \otimes \mathrm{D}_3 \otimes \mathrm{D}_2 \otimes \mathrm{D}_1)\,\mathrm{x} \quad (11)$$

where $\otimes$ is the Kronecker product. Equation (11) is similar to the conventional linear sparse representation formulation. Based on this equivalence, if the vectorized version $\mathrm{t}$ admits an s-sparse representation over the Kronecker dictionary $\mathrm{D}_{kron} = (\mathrm{D}_4 \otimes \mathrm{D}_3 \otimes \mathrm{D}_2 \otimes \mathrm{D}_1)$, then the 4th-order tensor $\boldsymbol{T} \in \mathbb{R}^{I_1 \times I_2 \times I_3 \times I_4}$ also has a sparse representation with respect to the dictionaries $\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4$ on each mode. In the standard Tucker model, the core tensor usually has a smaller size than the data tensor, and the main objective is to find such a decomposition, i.e., to compute both the core tensor and the factor matrices. In our approach, the data tensor and the dictionaries are known, and the objective is to calculate the core tensor $\boldsymbol{X}$ that can approximately recover the input tensor. Additionally, the core tensor $\boldsymbol{X}$ is sparse and its size is larger than that of the data tensor. The TOMP algorithm is given in Table 1.
Table 1. The algorithm for TOMP (Tensor-based Orthogonal Matching Pursuit).
Algorithm: maximum number of non-zero coefficients $s_n$ in each mode.
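To illustrate the Kronecker equivalence that TOMP relies on, the sketch below runs classical OMP on the explicit Kronecker dictionary and reshapes the result into a sparse core tensor. It is a naive stand-in for illustration, not the TOMP procedure of Table 1 or of [22], which avoids forming the Kronecker dictionary; the function names `omp` and `kron_omp` are hypothetical, unit-norm dictionary columns are assumed, and a single total sparsity s is used instead of per-mode sparsities.

```python
import numpy as np

def omp(D, y, s, tol=1e-8):
    """Classical OMP: greedily select at most s columns of D to approximate y."""
    residual = y.copy()
    support = []
    coef = np.zeros(0)
    for _ in range(s):
        if np.linalg.norm(residual) <= tol:
            break
        # atom most correlated with the current residual (columns assumed unit-norm)
        j = int(np.argmax(np.abs(D.T @ residual)))
        if j in support:
            break
        support.append(j)
        # least-squares refit of all selected atoms, then update the residual
        coef, *_ = np.linalg.lstsq(D[:, support], y, rcond=None)
        residual = y - D[:, support] @ coef
    x = np.zeros(D.shape[1])
    x[support] = coef
    return x

def kron_omp(T, dicts, s):
    """Sparse-code tensor T over the mode dictionaries via the explicit Kronecker dictionary."""
    D_kron = dicts[-1]
    for D in reversed(dicts[:-1]):
        D_kron = np.kron(D_kron, D)          # builds D_N (x) ... (x) D_1
    x = omp(D_kron, T.reshape(-1, order="F"), s)
    core_shape = [D.shape[1] for D in dicts]
    return x.reshape(core_shape, order="F")  # sparse core tensor X
```

Forming the Kronecker dictionary explicitly is only feasible for small tensors; for a 5 × 5 × 5 × 18 point tensor with overcomplete mode dictionaries it already has several thousand rows and many more columns, which is the practical motivation for a mode-wise algorithm.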

Structured and Discriminative Dictionary Learning
Dictionary learning aims to build a dictionary that is composed of basis vectors, which can fully represent test samples by the sparse coding procedure. For the 4th-order tensor data, the dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$ on the X, Y, Z, and attribute modes has to be learned. To improve the performance of the dictionary learning method, a structured and discriminative dictionary is estimated in our approach. Instead of learning a shared dictionary over all the classes, we derive a structured dictionary $\mathrm{D}_n = [\mathrm{D}_n^1, \mathrm{D}_n^2, \ldots, \mathrm{D}_n^c]$ $(n = 1, 2, 3, 4)$, where $\mathrm{D}_n^i$ is the class-specific sub-dictionary associated with class i on the X, Y, Z, and attribute modes, and c is the total number of classes. With such a dictionary set, we can use the reconstruction error for classification based on SRC.
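Denote by $\boldsymbol{T} = [\boldsymbol{T}_1, \boldsymbol{T}_2, \ldots, \boldsymbol{T}_c]$ the set of training point tensors, where $\boldsymbol{T}_i$ is the subset of the training tensor samples from class i. Correspondingly, $\boldsymbol{X}_i$ is the sparse tensor of $\boldsymbol{T}_i$ over the entire dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$. Furthermore, $\boldsymbol{X}_i$ can be partitioned by class into sub-tensors $\boldsymbol{X}_i^j$ $(j = 1, \ldots, c)$, where $\boldsymbol{X}_i^j$ contains the coefficients of $\boldsymbol{T}_i$ over the sub-dictionary set of class j. The initial dictionaries are composed of the k leading principal vectors of the factor matrices along each mode, obtained by Tucker decomposition. Denote by $\boldsymbol{T}_{ij}$ the jth tensor from class i; $\boldsymbol{T}_{ij}$ is Tucker-decomposed in the form of Equation (2) to obtain $\mathrm{U}_1, \mathrm{U}_2, \mathrm{U}_3, \mathrm{U}_4$ (Equation (12)), and then the first k basis vectors of $\mathrm{U}_1, \mathrm{U}_2, \mathrm{U}_3, \mathrm{U}_4$ are added into the sub-dictionaries of class i on the respective modes. Then, the initial dictionary set is optimized by the discriminative dictionary learning model.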
Besides requiring that $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$ has a strong reconstruction ability for each tensor, the dictionary set should also be able to distinguish tensor samples of different classes. Consequently, discriminative fidelity terms are added to the dictionary learning model. Firstly, the dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$ should be able to recover the training tensor set $\boldsymbol{T}$ well, so the corresponding reconstruction residual should be small. As a result, the dictionary learning model with the discriminative fidelity terms is defined as Equation (13). Again, c is the number of classes, and $\|\boldsymbol{X}\|_0 \le s$ is the sparsity constraint, which means that tensor $\boldsymbol{X}$ has at most s nonzero entries.
These discriminative tensor dictionaries are learned by an alternating minimization rule: all other dictionaries and the sparse core tensors are fixed when learning the dictionary of one mode. Namely, $\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4$ are learned independently of each other. To learn the dictionary on a certain mode, we update the sub-dictionary $\mathrm{D}^i$ class by class. When updating $\mathrm{D}^i$, all of the other sub-dictionaries $\mathrm{D}^j$, $j \ne i$, are fixed. Then, the objective function can be written as Equation (14). Mathematically, a tensor equation can be represented in an unfolded form; the following two equations are equivalent:

$$\boldsymbol{T} \cong \boldsymbol{X} \times_1 \mathrm{D}_1 \times_2 \mathrm{D}_2 \times_3 \mathrm{D}_3 \times_4 \mathrm{D}_4 \iff \mathrm{T}_{(n)} \cong \mathrm{D}_n \mathrm{X}_{(n)} \left( \mathrm{D}_4 \otimes \cdots \otimes \mathrm{D}_{n+1} \otimes \mathrm{D}_{n-1} \otimes \cdots \otimes \mathrm{D}_1 \right)^{T}$$

where $\mathrm{T}_{(n)}$ is the mode-n unfolding matrix of the tensor $\boldsymbol{T}$ and $\mathrm{X}_{(n)}$ is the mode-n unfolding matrix of the tensor $\boldsymbol{X}$. With this relation, Equation (14) can be rewritten into its unfolded version. The unfolded problem is a constrained convex quadratic optimization problem, and it can be solved by the gradient algorithm in [29]. Figure 3 shows the residuals of the objective function (13) as the dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$ is updated.
In this way, the dictionaries on the four modes are updated. In the next iteration, these newly learned dictionaries are used to obtain the new sparse tensor in the sparse coding procedure. As a result, the dictionary learning process alternates between tensor dictionary learning and sparse tensor update until a stopping criterion is reached.
Given a test tensor $\boldsymbol{T}_k$, its sparse tensor $\boldsymbol{X}_k$ over the whole dictionary set $(\mathrm{D}_1, \mathrm{D}_2, \mathrm{D}_3, \mathrm{D}_4)$ is calculated by TOMP; then $\boldsymbol{X}_k^i$ is the subset of the sparse tensor corresponding to the sub-dictionaries $\mathrm{D}_1^i, \mathrm{D}_2^i, \mathrm{D}_3^i, \mathrm{D}_4^i$ that are associated with class i. The test tensor should be well recovered by its corresponding sub-dictionary set and subset of the sparse tensor, whereas the residual should be large when using other sub-dictionary sets and subsets of the sparse tensor.
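The class-wise reconstruction described above can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the names `mode_product` and `classify_by_residual` are hypothetical, the residual is assumed to be the Frobenius norm of the reconstruction error, and `class_cols[i][n]` is assumed to hold the column indices of class i in the mode-n dictionary.

```python
import numpy as np

def mode_product(T, U, n):
    """n-mode product T x_n U: contract the columns of U with mode n of T."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, n)), 0, n)

def classify_by_residual(T, X, dicts, class_cols):
    """Return the class with the smallest reconstruction residual, plus all residuals.

    T:          test tensor.
    X:          sparse tensor of T over the whole dictionary set.
    dicts:      list of mode dictionaries [D1, D2, D3, D4].
    class_cols: class_cols[i][n] = column indices of class i in dicts[n] (assumed layout).
    """
    residuals = []
    for cols in class_cols:
        # subset of the sparse tensor that belongs to this class
        approx = X[np.ix_(*cols)]
        # multiply by the class-specific sub-dictionary on every mode
        for n, (D, c) in enumerate(zip(dicts, cols)):
            approx = mode_product(approx, D[:, c], n)
        residuals.append(float(np.linalg.norm(T - approx)))
    return int(np.argmin(residuals)), residuals
```

A point whose sparse tensor has all of its nonzero entries inside one class block is reconstructed exactly by that class, so its residual is zero there and equal to the norm of the test tensor for every other class.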

Data Description
We perform the classification on eight real airborne LiDAR datasets of Vienna city.The area of each dataset is 100 × 100 m 2 .The densities of datasets mostly range from 8 to 75 points/m 2 .Multiple echoes were recorded and the point clouds in all of the datasets are fully labeled.The datasets contain various kinds of objects, such as: high-rising buildings with balcony, small detached houses, single trees, grouped and low vegetation, ground with consistent height, and ground with slopes.In the classification procedure, the objects are categorized into five classes: open ground which is uncovered or not blocked by any objects; building roof ; vegetation; covered ground, which is usually under the high trees or building roof; and, façade.Height difference.Height difference is measured between the LiDAR point and the lowest point found in a multiple scale cylindrical neighborhood.By varying the size of the local cylindrical neighborhood, height differences are calculated for each scale.The cylinder radii have been set experimentally to 10 m and 2 m, and correspondingly the height differences are denoted by ∆H r1 and ∆H r2 .The height difference ∆H is given by: The threshold λ = 70% × max(∆H r1 ), and the maximum is taken over the ∆H r1 of all the points.If the height difference value that is found in the large neighborhood is higher than the threshold, then this is considered as the reliable height difference value for this object.Otherwise, the objects may be points located on the slope, and the height difference should be calculated in a smaller neighborhood.Normally, the ground point should be selected as the lowest point and have a low height difference value.However, in the area of inclined ground, the ground points on the slope would also have relatively high height difference values for a large neighborhood selection, which can be seen in the rectangular area, as marked in Figure 4a. 
Figure 4a indicates that sloped ground areas have the same height difference values as roof points, which leads to misclassification. Therefore, a small neighborhood is more suitable for sloped areas. With the multiple neighborhood selection, the height difference values in the rectangular area marked in Figure 4b are much more reasonable: the ground area in the rectangle shows height difference values consistent with those of most ground points. Thus, the height difference can be calculated correctly in both sloped and flat environments by using multiple neighborhood selections.
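The two-scale rule above can be illustrated with a small brute-force sketch. It assumes the points are given as an N × 3 array and uses a full pairwise distance matrix, which is only practical for small examples; the function name `height_difference` and its parameter names are hypothetical, and a spatial index would be needed for real point clouds.

```python
import numpy as np

def height_difference(points, r_large=10.0, r_small=2.0, ratio=0.7):
    """Multi-scale height difference for an (N, 3) array of x, y, z."""
    xy, z = points[:, :2], points[:, 2]
    # Horizontal distances define the vertical cylinder around each point.
    dist = np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2)

    def dh(radius):
        # Lowest z inside the cylinder (each point is in its own cylinder).
        z_min = np.where(dist <= radius, z[None, :], np.inf).min(axis=1)
        return z - z_min

    dh_large, dh_small = dh(r_large), dh(r_small)
    lam = ratio * dh_large.max()  # threshold: 70% of the maximum
    # Keep the large-scale value only where it exceeds the threshold.
    return np.where(dh_large > lam, dh_large, dh_small)
```

In a toy scene with flat ground, one roof point and one isolated point on a slope, the roof point keeps its large-scale value while the slope point falls back to the small cylinder.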
5. NormalSigma0: the standard deviation of the normal estimation. The value is high in rough areas and low in smooth areas.
6. NormalZSigma0: the standard deviation of the Normal Z estimation in a cylindrical neighborhood. The value can reflect the penetrability of the object.
7. Normal planeoffset: the offset between the current point and its local estimated plane.
8-10. Eigenvalues: Eigenvalue1, Eigenvalue2, and Eigenvalue3. The covariance matrix used for the normal vector computation is decomposed by eigenvalue analysis.

The echo number denotes the q-th echo of a certain pulse. The number of echoes is the maximum number of echoes detected for the pulse to which the echo belongs. The echo number ratio can indicate the penetrability of objects.

Local Shape-Based Features
The local shape-based features are obtained from the normalized eigenvalues $\lambda_i$ ($\lambda_1 \geq \lambda_2 \geq \lambda_3$) and include linearity, planarity, sphericity, anisotropy, omnivariance, and eigenentropy. They are calculated following Niemeyer et al. [7], and are defined as follows:

$$\text{Linearity} = \frac{\lambda_1 - \lambda_2}{\lambda_1}; \quad \text{Planarity} = \frac{\lambda_2 - \lambda_3}{\lambda_1}; \quad \text{Sphericity} = \frac{\lambda_3}{\lambda_1};$$

$$\text{Anisotropy} = \frac{\lambda_1 - \lambda_3}{\lambda_1}; \quad \text{Omnivariance} = \sqrt[3]{\lambda_1 \lambda_2 \lambda_3}; \quad \text{Eigenentropy} = -\sum_{i=1}^{3} \lambda_i \ln \lambda_i$$

The $k_f$ nearest neighbors are selected for feature extraction, and $k_f$ is set to 30. The radius for the ER and NormalZSigma0 calculation is set to 1 m. After the feature extraction, all feature values are normalized to the interval [0,1]. A feature vector of 18 dimensions is thus obtained for each point and attached as the 4th order of the tensor data. Figure 5 shows a selection of feature extraction results for Dataset 3.
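As a minimal sketch of how the six shape features follow from the eigenvalues, the snippet below computes them for one neighborhood. It assumes the neighbors are given as a k × 3 coordinate array and that the eigenvalues are normalized to sum to one; the function name `shape_features` and the small clipping constant are illustrative choices, not taken from the paper.

```python
import numpy as np

def shape_features(neighbors):
    """Eigenvalue-based shape features of a (k, 3) neighborhood."""
    cov = np.cov(neighbors.T)                     # 3 x 3 covariance matrix
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]  # l1 >= l2 >= l3
    lam = np.clip(lam, 1e-12, None)               # guard against log(0)
    lam = lam / lam.sum()                         # normalized eigenvalues
    l1, l2, l3 = lam
    return {
        "linearity": (l1 - l2) / l1,
        "planarity": (l2 - l3) / l1,
        "sphericity": l3 / l1,
        "anisotropy": (l1 - l3) / l1,
        "omnivariance": (l1 * l2 * l3) ** (1.0 / 3.0),
        "eigenentropy": float(-np.sum(lam * np.log(lam))),
    }
```

Linearity, planarity, and sphericity sum to one by construction, which gives a quick sanity check for any implementation.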


Classification Results
Based on the 18 features described in Section 4.2, we conduct a series of experiments for TSRC. Firstly, the general behavior of TSRC is analyzed in Section 4.3.1; then, KNN (k-nearest neighbors), DT (Decision Tree), RF (Random Forest), and SVM (Support Vector Machine) are used for comparison in Section 4.3.2. In each experiment, the training data are randomly selected from the whole dataset, and the remaining data are used as test samples. The same number of training samples is always selected for each class. The overall accuracy (OA) is used to evaluate all of the classifiers.

Tensor-Based Sparse Representation Classification Results and Discussion
For TSRC, each 3D LiDAR point is represented as a fourth-order tensor $\mathcal{T} \in \mathbb{R}^{5 \times 5 \times 5 \times 18}$, which means that the neighboring points are sampled into a regular 5 × 5 × 5 grid in three-dimensional space, and each grid cell is attached with a 1 × 18 feature vector. The sparsity level is set to 9, and 27 training sample tensors are randomly selected from each class to learn the dictionary.
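To make the 5 × 5 × 5 × 18 layout concrete, the following sketch bins the neighbors of one point into a regular grid and stores a feature vector per cell. The binning over the bounding box of the neighborhood and the averaging of features within a cell are assumptions made for illustration; the function name `points_to_tensor` is hypothetical and the sketch is not the tensor generation procedure of Figure 1.

```python
import numpy as np

def points_to_tensor(xyz, feats, grid=5):
    """Bin (k, 3) coordinates with (k, F) features into a grid^3 x F tensor."""
    lo, hi = xyz.min(axis=0), xyz.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)        # avoid division by zero
    idx = ((xyz - lo) / span * grid).astype(int)
    idx = np.minimum(idx, grid - 1)               # upper border -> last cell
    cells = (idx[:, 0], idx[:, 1], idx[:, 2])

    tensor = np.zeros((grid, grid, grid, feats.shape[1]))
    counts = np.zeros((grid, grid, grid))
    np.add.at(tensor, cells, feats)               # sum features per cell
    np.add.at(counts, cells, 1)                   # points per cell

    occupied = counts > 0
    tensor[occupied] /= counts[occupied][:, None]  # mean feature per cell
    return tensor
```

Empty cells stay at zero in this sketch, so the spatial arrangement of the neighborhood is preserved in the first three modes while the fourth mode carries the features.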
We conduct TSRC on the eight real airborne LiDAR datasets. To avoid biased results, we repeated TSRC 10 times on each dataset. Visual inspection of Figure 6 indicates that most objects are classified correctly. The unlabeled points are objects that do not belong to any class mentioned in Section 4.1, such as fences, cars, power lines, and others. Unlabeled points are not involved in the accuracy evaluation. The number of points in each LiDAR dataset, the percentage of training data, the mean OA of the 10 classification experiments, and the standard deviation of the OA are summarized in Table 2. The overall accuracies of all the datasets are beyond 80%, which is a rather good classification result considering that only a few training samples are used. Moreover, the OA deviations of all datasets are less than 1%, and the OA values of the 10 TSRC experiments remain stable for all of the datasets. This indicates that TSRC is barely affected by the training data selection.

The confusion matrices in Table 3 present the correctness and incorrectness for each class of all the datasets. From the confusion matrices and the qualitative assessment of the classification (Figure 7), we can see that the major confusions occur between open ground and covered ground: 14.44% of open ground points are mislabeled as covered ground in Dataset 6, and 8.65% of covered ground points are wrongly labeled as open ground in Dataset 3. This is caused by the attributes that open and covered ground points share, such as the same height difference, roughness, and local shape parameters. Moreover, open and covered ground points are easily mixed in the neighborhood when generating the tensor. According to Table 3, 4.45% of open ground points are wrongly labeled as roof in Dataset 3. Some slope areas and low roofs are confused with each other at this site because they have the same feature values and geometry. However, open and covered ground points are scarcely assigned to other classes in the other datasets. Therefore, the ground points are well distinguished from other objects by TSRC, which shows great potential for ground filtering. As for the roof classification result, incorrect points are found essentially on building edges (as seen in Figure 7). They are labeled as vegetation, since such points behave similarly for many attributes, such as low ER values, high NormalSigma0 values, and low planarity. Vegetation is classified with a high accuracy. Its errors are not concentrated in certain classes; they occur randomly in the other four classes according to the statistics in Table 3. Finally, the accuracy of façade is relatively low. Large numbers of façade points are labeled as vegetation. They mainly correspond to façades where the points are not sufficiently dense and co-planar. Those points have high local plane-based feature values, which behave similarly to those of vegetation points. Furthermore, some façade points are very close to roof and ground, and they are often misclassified due to the neighborhood selection used for tensor generation.

In total, TSRC leads to a good classification result on the airborne urban LiDAR points with only a few training data. Based on the statistics of the number of points per class, the accuracy has no direct relation to the amount of training data. With 27 training samples given in each class, the performance remains stable regardless of whether a class contains a large or a small number of points. Finally, when compared with the object points, the ground points are most likely to be correctly detected by TSRC, which is meaningful for filtering and the generation of a DTM (Digital Terrain Model).
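The evaluation quantities used here (OA per run, then mean and standard deviation over the repetitions) can be computed from confusion matrices as in the sketch below. It assumes rows are reference classes and columns are predicted classes, and it uses the sample standard deviation; whether the paper uses the sample or the population form is not stated, and the function names are hypothetical.

```python
import numpy as np

def overall_accuracy(confusion):
    """OA: correctly labeled points (diagonal) divided by all points."""
    confusion = np.asarray(confusion, dtype=float)
    return float(np.trace(confusion) / confusion.sum())

def summarize_runs(confusions):
    """Mean and sample standard deviation of the OA over repeated runs."""
    oa = np.array([overall_accuracy(c) for c in confusions])
    return float(oa.mean()), float(oa.std(ddof=1))
```

For instance, a run with confusion matrix [[8, 2], [1, 9]] has 17 of 20 points on the diagonal, giving an OA of 0.85.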

Classification Comparison
For the comparison, the classifiers KNN, DT, RF, and SVM are applied to the eight LiDAR datasets. The training data are randomly selected for 10 repetitions of the experiment, and in each experiment the same training data are used for the TSRC dictionary learning, the training of the other classifiers, and the parameter optimization. To find the optimal parameters for KNN, DT, RF, and SVM, we built a misclassification rate function for each classifier based on the training dataset; the minimum error rate was then searched by parameter optimization, and the best parameters of each classifier were selected. The parameters to be optimized for each classifier are listed in the following.

KNN: the number k of nearest neighbor points, and the distance computation function.
DT: the minimum observations on each leaf, the minimum observations in each branch node, and the maximum number of branch node splits.
RF: the number of predictors, and the parameters for generating the decision trees, namely the minimum observations on each leaf, the minimum observations in each branch node, and the maximum number of branch node splits.
SVM: the kernel function, the kernel size, and the box constraint, which is the weight of the cost of misclassification.
The mean OA and the OA standard deviation of the 10 experiments for all of the classifiers are summarized in Table 4. Based on Table 4, TSRC shows the best performance over all of the datasets in terms of mean OA, except for Dataset 1, where RF provides the best result. However, TSRC achieves the second best OA there, just 0.36% lower than the best OA value for this dataset. TSRC also has the lowest OA deviation for Datasets 2-8, while the OA deviation of TSRC for Dataset 1 differs by only 0.09% from that achieved by SVM. According to the OA deviation, TSRC is less affected by the training data selection than the other classifiers investigated.
Figure 8 shows the average accuracy per class for all eight datasets when using TSRC and the other classifiers in 10 repetitions of the experiment. For the ground classification, TSRC has the best accuracy in most cases; however, its accuracy is slightly lower than that of DT in Dataset 3 and lower than that of DT, RF, and SVM in Dataset 8. The accuracy gap in vegetation classification among TSRC, RF, and SVM is narrow for all of the datasets, but TSRC increases the vegetation classification accuracy by 14.28% for Dataset 3 and 20.91% for Dataset 4 when compared to KNN. As displayed in Figure 8d, the accuracy of roof is significantly improved by TSRC for Dataset 4 and Dataset 5. As for the covered ground classification, the accuracies of TSRC are significantly higher than those of DT, and higher than the accuracies obtained by KNN and RF in most cases. The differences between the accuracies of TSRC and SVM are small. Additionally, TSRC delivers the best results among all of the classifiers for façade classification. The improvement in façade detection is especially significant for Dataset 3 and Dataset 4, where the façades are poorly distinguished by the other four classifiers: the accuracy of DT is only 51.03% and 43.62%, while the accuracies increase to 79.73% and 84.97% with TSRC.
KNN directly searches for similar feature vectors in the training data, and all of the attributes are used without any selection or weight assignment. Therefore, the classification results of KNN are not as good as those of TSRC. The optimal parameters for DT, RF, and SVM depend on a large amount of training data and multiple cross validations. Since only a few training data are used to train DT, RF, and SVM in this paper, these classifiers easily overfit and become biased. They work well for the training points, but do not lead to a good classification result for a large number of test points. In contrast, TSRC performs better than those classifiers when using the same amount of training data.
In a nutshell, the classification results for the eight LiDAR datasets demonstrate the effectiveness of TSRC in improving the classification performance, particularly in enhancing the façade detection accuracy. Because façade points are affected by their low point density and by small structures on the wall, their feature values are not reliable and yield poor detection results with feature-based classifiers. Due to the combination of the spatial distribution of points and the feature values, TSRC can effectively improve the façade detection accuracy.

Discussion
One of the crucial parts of our framework is the tensor reconstruction. The tensor is reconstructed by each sub-dictionary set and its corresponding subset of the sparse tensor. Then, the class label is determined by the minimum reconstruction error. Theoretically, it is possible that two classes yield equal reconstruction errors. However, this tie situation never happened in our experiments. Since the bases in the sub-dictionaries differ from each other, it is very unlikely that different subsets of a sparse tensor recover the same tensor. Therefore, ties between reconstruction errors barely happen.
Since several parameters need to be set manually in this approach, such as the neighborhood size in tensor generation, the sparsity level in TOMP, and the number of training data, we conducted a series of experiments on how those parameters influence the classification result. This is discussed below.

Impact of Neighborhood Size Selection in Tensor Generation
The impact of the KNN neighborhood size in tensor generation on the classification results is assessed. The neighborhood size indicates how many points are involved in the tensor generation, and it depends on the scale parameter $k_t$ (the number of nearest neighbor points). We utilize Dataset 4 and vary $k_t$ over the interval between $k_t$ = 20 and $k_t$ = 120 with a step of $\Delta k_t$ = 20. Dataset 4 contains various types of objects, such as slopes, small detached houses, high-rise buildings, low vegetation, and high trees, so it is used to test the impact of neighborhood size selection. The classification is evaluated by OA and Kappa index in Table 5.
The OA and Kappa index change only slightly for the various $k_t$ values, and the tendency is similar across the open ground, vegetation, roof, and covered ground classes. Only the façade class is influenced by the $k_t$ values: the accuracy of façade is lower when smaller $k_t$ values are used, and it increases when the $k_t$ value is larger than 80. In order to achieve high accuracies for all types of objects, we suggest setting $k_t$ in the range of 80-120. We use a $k_t$ value of 80 in our classification.
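For reference, the Kappa index reported alongside the OA can be derived from a confusion matrix as sketched below. The sketch uses the standard Cohen's kappa definition, which is assumed to be what the paper reports; the function name `kappa_index` is hypothetical.

```python
import numpy as np

def kappa_index(confusion):
    """Cohen's kappa: agreement beyond chance, from a confusion matrix."""
    c = np.asarray(confusion, dtype=float)
    n = c.sum()
    p_observed = np.trace(c) / n                       # equals the OA
    # Chance agreement from the row and column marginals.
    p_expected = np.sum(c.sum(axis=0) * c.sum(axis=1)) / n**2
    return float((p_observed - p_expected) / (1.0 - p_expected))
```

Unlike the OA, kappa discounts the agreement expected from the class proportions alone, so it is less flattering when one class dominates a dataset.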


Impact of Sparsity Level
The sparsity level indicates the number of bases that need to be extracted from the dictionary for data reconstruction in the sparse coding process. As in Section 5.1, Dataset 4, which exhibits a large variety within each class, is used to test the impact of various sparsity levels on the classification result. The sparsity level s is varied from s = 5 to s = 18, as shown in Table 6. The OA and Kappa index remain nearly unchanged for the different sparsity levels in the TOMP phase, which demonstrates that the classification result is not sensitive to the sparsity level. As the OA obtained with the initial value s = 9 is only 0.1% lower than the optimal value, this initial value is kept for further experiments.


Conclusions
In this paper, a tensor-based sparse representation classification framework is proposed for 3D LiDAR point cloud classification. In this framework, each point is represented as a fourth-order tensor in order to make full use of geometry and feature information. Eighteen features per point are extracted from the 3D LiDAR points, and all features are utilized for classification without any feature selection procedure. Based on the Tucker decomposition, structured and discriminative dictionaries along each mode are learned for tensor data classification. Then, the test tensor data is projected onto the dictionary set to obtain its sparse tensor. After that, the test tensor is recovered using each class-specific dictionary set and its corresponding subset of the sparse tensor, and the residual of each class is determined. Finally, the label of the test tensor is determined by the minimal residual.
A series of TSRC experiments suggests that TSRC is barely dependent on the neighborhood size in tensor generation and on the sparsity level. TSRC can be successfully conducted using only a few training samples. Based on the classification results for the eight real airborne LiDAR datasets, the OAs of TSRC are beyond 80%, with only 27 training tensors being used per class. Additionally, TSRC achieves good classification results when compared with other classifiers. TSRC shows respectable performance in identifying objects with less distinguishable features, such as façades.

Figure 1. The procedure of tensor generation from a point cloud set.


Figure 3. The reconstruction residuals of the dictionary set at each iteration.


4.2. Feature Extraction
A set of 18 features is extracted from the 3D LiDAR points, comprising height-based features, local plane-based features, penetrability-based features, and local shape-based features. The four feature groups are detailed below.
4.2.1. Height-Based Features
1. Height difference.

Figure 4. Height difference feature results: (a) height difference via constant scale neighborhood; (b) height difference via multiple scale neighborhood.
4.2.2. Local Plane-Based Features
For a given 3D point and its k closest neighbors, the local plane-based features are derived by estimating a local orthogonal regression plane. The local plane-based features contain:
2-4. Normal vector: Normal X, Normal Y, and Normal Z. The normal vectors of the local planes are estimated from the k neighbor points; Normal X, Normal Y, and Normal Z are the components of the normal vector in the X, Y, and Z directions.


Figure 6. Three-dimensional view of the classification results of the eight datasets using TSRC.


Figure 7. Qualification of the classification over eight datasets.


5.3. Impact of Training Data
Since learning a classifier strongly depends on the given training data, we further consider the influence of varying the amount of training data on the classification results. We consider 10 different amounts of training samples: the number of training samples $N_t$ varies from $N_t$ = 9 to $N_t$ = 90 with a step size of $\Delta N_t$ = 9. The general behavior of TSRC and the other classifiers under various numbers of training samples is analyzed. KNN, DT, RF, and SVM are chosen for the classification comparison. Again, LiDAR Dataset 4, which contains 352,318 points, is used for classification. The overall accuracy values are given in Figure 9. Generally, TSRC performs better than the other four classifiers independent of the number of training samples. The overall accuracy tends to increase with the number of training samples for all of the classification methods. The overall accuracy of TSRC remains steady from $N_t$ = 27 to $N_t$ = 90. Therefore, the number of training samples is set to 27 in the classification experiments.

Figure 9. The OA of dataset 3 with different amounts of training tensors for different classifiers.

The eigenvalue analysis of the covariance matrix used for the normal vector computation yields Eigenvalue1 $\lambda_1$, Eigenvalue2 $\lambda_2$, and Eigenvalue3 $\lambda_3$ ($\lambda_1 > \lambda_2 > \lambda_3$). $\lambda_2$ and $\lambda_3$ have low values for planar objects and higher values for voluminous point clouds.
11. Echo ratio (ER): $ER = n_{3D} / n_{2D} \times 100\%$, with $n_{3D} \leq n_{2D}$, where $n_{3D}$ is the number of neighbors found within a certain search distance measured in 3D and $n_{2D}$ is the number of neighbors found within the same distance measured in two dimensions (2D). The ER is nearly 100% for a flat surface, whereas it decreases for penetrable surface parts, since there are more points in a vertical search cylinder than in a sphere with the same radius.
12. Echo number ratio. The echo number ratio of each point is defined as the ratio between its echo number and the number of echoes.

Table 2. The percentage of training samples for all of the datasets.


Table 3. The confusion matrices for each dataset and the number of points per class in each dataset. (Error rates above 3% are highlighted in italic type, except for the mixtures between open ground and covered ground.)

Table 4. Mean overall accuracy (OA) and OA deviation for all datasets using different classifiers. Bold values indicate the highest overall accuracy and the lowest standard deviation with the respective classifier.

Table 5. Accuracy per class, OA, and Kappa index of TSRC for Dataset 4 with different $k_t$ values.


Table 6. Accuracy per class, OA, and Kappa index of TSRC for Dataset 4 with different sparsity levels.