Remote Sensing
  • Article
  • Open Access

3 February 2022

Enhanced TabNet: Attentive Interpretable Tabular Learning for Hyperspectral Image Classification

1 Department of Electrical and Computer Engineering, Mississippi State University, Starkville, MS 39762, USA
2 Cotiviti Inc., South Jordan, UT 84095, USA
* Author to whom correspondence should be addressed.

Abstract

Tree-based methods and deep neural networks (DNNs) have drawn much attention in image classification. The interpretable canonical deep tabular data learning architecture (TabNet), which combines concepts from tree-based techniques and DNNs, can be used for hyperspectral image classification. In this architecture, sequential attention selects the salient features at each decision step, which enables interpretability and efficient learning with increased learning capacity. In this paper, TabNet with spatial attention (TabNets) is proposed to include spatial information, in which a 2D convolutional neural network (CNN) is incorporated inside an attentive transformer for spatial soft feature selection. In addition, spatial information is exploited by feature extraction in a preprocessing stage, where an adaptive texture smoothing method is used to construct a structure profile (SP), and the extracted SP is fed into TabNet (sTabNet) to further enhance performance. Moreover, the performance of TabNet-class approaches can be improved by introducing unsupervised pretraining. The overall accuracy of the unsupervised pretrained version of the proposed TabNets, i.e., uTabNets, improves by 11.29% to 12.61%, 3.6% to 7.67%, and 5.97% to 8.01% over the other classification techniques, at the cost of increases in computational complexity by factors of 1.96 to 2.52, 2.03 to 3.45, and 2.67 to 5.52, respectively. Experimental results obtained on different hyperspectral datasets demonstrate the superiority of the proposed approaches in comparison with other state-of-the-art techniques, including DNNs and decision tree variants.

1. Introduction

Hyperspectral imagery (HSI) contains abundant spatial and spectral information in a 3D data cube with hundreds of narrow spectral bands. Due to its high spectral resolution, it has been used in many applications, such as pollution monitoring, urban planning, and land-use and land-cover analysis [,,,]. However, the increase in spatial and spectral information poses a challenge for HSI analysis. Thus, HSI analysis tasks, such as classification, dimensionality reduction [,], and feature extraction [,], have received much attention in the remote sensing community for decades []. Moreover, such approaches are applicable to vision technology in other engineering domains [,,], multispectral remote sensing, and synthetic aperture radar (SAR) imagery [,].
In recent decades, spectral-based classification approaches such as the support vector machine (SVM) and composite kernel SVM (SVM-CK) have been widely used in remote sensing [,,]. In addition, different spatial-spectral features have been introduced for HSI classification [,]. Sparse representation (SR) was successfully applied to HSI classification in [], inspired by its success in face recognition []. Consequently, many sparse and collaborative representation-based classifiers have been introduced, such as the joint sparse representation classifier (JSRC) [], the joint version of spatial-aware collaborative-competition preserving graph embedding with Tikhonov regularization (JSaCCPGT) [], nonlocal weighted JSRC (NLW-JSRC) [], and correntropy-based robust JSRC (RJSRC) []. Furthermore, multiple morphological operations were utilized in [] to construct spatial-spectral features of HSI, and a spatial-spectral classifier was proposed in [] to address the issue of mixed pixel characterization. Multiple kernel learning has also been designed in [] to improve the SVM classifier.
Moreover, tree-based techniques, such as the random forest method, were introduced in []. More recently, enhanced performance of the random forest classifier was presented in [,] for HSI classification. Similarly, the performance of extreme gradient boosting (XGBoost) was investigated in [,] for HSI. Tree-based approaches have the advantages of efficiently representing decision manifolds with approximate hyperplane boundaries, being interpretable by tracking the decision nodes, and being fast to train. A deep neural network (DNN) based on multiscale spectral-spatial fusion was proposed for HSI classification in [,]. However, classification performance decreases with a deeper network because the input to such an architecture is one-dimensional and lacks neighborhood information in the spatial dimension. Moreover, a conventional DNN based on stacked convolutional layers or multilayer perceptrons (MLPs) fails to find an optimal solution for decision manifolds in the spectral domain due to the lack of an appropriate inductive bias []. In addition, convolutional neural networks (CNNs) have drawn much attention in image classification [], and a patch-to-patch CNN was presented in [] to obtain better performance than existing techniques. However, CNNs have the shortcoming of not considering spectral information effectively.
When a DNN is used for large datasets, classification performance can be improved because it enables gradient descent-based end-to-end learning. Tree learning does not use backpropagation to guide its inputs with error signals [], which limits its performance on large datasets. TabNet, a new canonical deep neural architecture for tabular data, was proposed in [,]. It combines the valuable benefits of tree-based methods and DNN-based methods to obtain both high performance and interpretability; the high performance of DNNs is thus made interpretable through tree-like sequential feature selection. Inspired by this work, we propose to use TabNet for HSI classification in this paper, since the spectral signatures of HSI pixels can be organized as a tabular dataset. One aim of this paper is to overcome the deficiencies of existing neural networks and decision trees in HSI classification. In this regard, we explore TabNet and modify its original architecture for HSI. The original TabNet takes raw data without any feature preprocessing and is trained with a gradient descent-based method. Moreover, it uses sequential attention at each decision step. This enables local interpretability, which determines the combination and importance of input features, and global interpretability, which measures the contribution of each input feature to the trained model. However, sequential attention-based TabNet has drawbacks as well. Although TabNet can analyze the spectral signatures of HSI effectively, it lacks proper use of local contextual information in the spatial domain. For this reason, we modify the original architecture of TabNet by incorporating spatial information into the attentive transformer, yielding TabNet with spatial attention (TabNets). Specifically, a 2D convolutional neural network (CNN) is used in the attentive transformer to spatially process the masks that contribute to soft selection of the abstract features. TabNets can thus overcome the deficiency of CNNs by combining spatial information with sequential attention.
Recently, different integrated networks, such as stacked autoencoders (SAE) and convolutional autoencoders (CAE), were presented in [,] for feature extraction. However, such methods lack powerful feature extraction capability in both the spatial and spectral domains.
In this work, we observed enhanced performance of unsupervised pretraining on TabNet (uTabNet) for HSI classification, and pretraining was extended to TabNets, resulting in uTabNets. The unsupervised pretrained version of TabNets, i.e., uTabNets, can consider sequential attention in addition to spatial processing of masks by using 2D CNN in the attentive transformer.
Moreover, the existing TabNet does not include any preprocessing stage, which limits its ability to learn effectively. Including spatial information in a spectral classifier has been shown to increase classification accuracy. Many deep learning classifiers, such as recurrent neural networks (RNN) [] and generative adversarial networks (GAN) [], use a CNN with several convolutional and pooling layers for deep feature extraction [,]. However, most deep learning methods need massive training data to learn their parameters accurately. To deal with such issues, various classification frameworks, such as active learning [] and ensemble learning [], have been introduced. In addition, spatial optimization using a structure profile (SP) was introduced in [] for feature extraction purposes. In this paper, we incorporate the SP into TabNet, yielding TabNet with structure profile (sTabNet). Similarly, the SP is used in extended versions of TabNet, including uTabNet with SP (suTabNet), TabNets with SP (sTabNets), and uTabNets with SP (suTabNets).
The main contribution of this work can be summarized as follows:
  • It introduces TabNet for HSI classification and improves classification performance by applying unsupervised pretraining in uTabNet;
  • It develops TabNets and uTabNets after including spatial information in the attentive transformer;
  • It includes SP in sTabNet as a feature extraction step to further improve the classification performance of the SP versions of TabNet, i.e., suTabNet, sTabNets, and suTabNets.
The remainder of this article is organized as follows. Section 2 presents related work. Section 3 discusses the proposed TabNet versions for hyperspectral image classification. Section 4 shows experimental results along with a discussion. Section 5 concludes the article.

3. Proposed Method

The different variants of enhanced TabNet classifiers proposed in this work are summarized in Table 1.
Table 1. Acronyms and their meaning for variants of proposed TabNet classifiers.

3.1. TabNet for Hyperspectral Image Classification

Suppose that a hyperspectral dataset with d spectral bands contains M labeled samples from C classes, represented by X = {x_1, x_2, …, x_M} ∈ R^{M×d} with the corresponding label matrix Y = {y_1, y_2, …, y_M} ∈ R^{M×C}. As shown in Figure 1, spectral features are used as inputs to TabNet. Suppose the training data X is passed to the initial decision step with batch size B. Then, the feature selection process includes the following steps:
Figure 1. Encoder for TabNets.
(1)
The “split” module separates the output of the initial feature transformer to obtain the features a[i−1] in Step 1 when i = 1;
(2)
If we disregard the spatial information in the attentive transformer of TabNets shown in Figure 4 below, it becomes the attentive transformer for TabNet. It uses a trainable function h_i, consisting of a fully connected (FC) layer and a batch normalization (BN) layer, to generate high-dimensional features;
(3)
In each step, interpretable information is provided by masks for selecting features, and global interpretability can be attained by aggregating the masks from different decision steps. This process can enhance the discriminative ability in the spectral domain by implementing local and global interpretability for HSI feature selection.
The attentive transformer then generates the masks M[i] ∈ R^{B×d} as a soft selection of salient features, using the processed features a[i−1] from the previous step (a code sketch of this mask-generation step is given after this list):
M[i] = \mathrm{entmax}\left( P[i-1] \cdot h_i(a[i-1]) \right) \qquad (1)
Entmax normalization [] inherits the desirable sparsity of sparsemax and provides smoother, differentiable curvature, whereas sparsemax is piecewise linear, denoted as sparsemax(P[i−1] · h_i(a[i−1])). Here, P[i−1] is the prior scale term that denotes how much a particular feature has been used previously:
P[i-1] = \prod_{j=1}^{i-1} \left( \gamma - M[j] \right) \qquad (2)
where γ is a relaxation parameter such that a feature is used at only one decision step when γ = 1, and features can be used in multiple decision steps as γ increases. For the input attention z = P[i−1] · h_i(a[i−1]), the sparsemax output can be estimated as:
\mathrm{sparsemax}(z) = \arg\min_{p \in \Delta^{D}} \lVert p - z \rVert^{2} \qquad (3)
where Δ^D represents the probability simplex and sparsemax(z) assigns zero probability to choices with low scores.
However, entmax normalization provides a continuous probability distribution, estimating better distributions than sparsemax normalization, and can be stated as:
\mathrm{entmax}(z) = \arg\max_{p \in \Delta^{D}} \; p^{T} z + F_{\upsilon}^{T}(p) \qquad (4)
where F_υ^T(p) is a continuous function defined as
F_{\upsilon}^{T}(p) = \begin{cases} \frac{1}{\upsilon(\upsilon - 1)} \sum_{n} \left( p_n - p_n^{\upsilon} \right), & \upsilon \neq 1 \\ -\sum_{n} p_n \log p_n, & \upsilon = 1; \end{cases}
(4)
The sparsity regularization term can be used in the form of entropy [] for controlling the sparsity of selected features.
L_{sparse} = \sum_{i=1}^{N_{steps}} \sum_{b=1}^{B} \sum_{j=1}^{d} \frac{-\, M_{b,j}[i]}{N_{steps} \cdot B} \log\left( M_{b,j}[i] + \epsilon \right) \qquad (5)
where ε takes a small value for numerical stability. The sparsity regularization coefficient λ_sparse is also added to the overall loss as λ_sparse × L_sparse, which provides a favorable bias for convergence to high accuracy on datasets with redundant features;
(5)
A sequential multi-step decision process with N_steps steps is used in TabNet’s encoding. The processed information from the (i−1)-th step is passed to the i-th step to decide which features to use. The outputs are obtained by aggregating the processed feature representations in the overall decision function, as shown by the feature attributes in Figure 1.
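To make the mask-generation step concrete, the following is a minimal PyTorch sketch of the attentive transformer for TabNet (without spatial attention), assuming sparsemax as a stand-in for the smoother entmax of Equation (4); the class name AttentiveTransformer and the example shapes are illustrative, not part of the original implementation.

```python
import torch
import torch.nn as nn

def sparsemax(z):
    """Row-wise projection of z onto the probability simplex (Eq. 3)."""
    z_sorted, _ = torch.sort(z, dim=-1, descending=True)
    k = torch.arange(1, z.size(-1) + 1, device=z.device, dtype=z.dtype)
    cssv = z_sorted.cumsum(dim=-1) - 1.0
    rho = (z_sorted - cssv / k > 0).sum(dim=-1, keepdim=True)   # support size per row
    tau = cssv.gather(-1, rho - 1) / rho.to(z.dtype)            # threshold per row
    return torch.clamp(z - tau, min=0.0)

class AttentiveTransformer(nn.Module):
    """Generates the soft feature-selection mask M[i] (Eq. 1) and updates the prior scales (Eq. 2)."""
    def __init__(self, n_a, n_features, gamma=1.5):
        super().__init__()
        self.fc = nn.Linear(n_a, n_features)    # trainable function h_i: FC ...
        self.bn = nn.BatchNorm1d(n_features)    # ... followed by BN
        self.gamma = gamma                       # relaxation parameter

    def forward(self, a_prev, prior):
        # M[i] = normalize(P[i-1] * h_i(a[i-1])); sparsemax stands in for entmax here
        mask = sparsemax(prior * self.bn(self.fc(a_prev)))
        prior = prior * (self.gamma - mask)      # P[i] = P[i-1] * (gamma - M[i])
        return mask, prior

# Example shapes for the Indian Pines spectral input (d = 200 bands, N_a = 256, B = 64):
att = AttentiveTransformer(n_a=256, n_features=200)
a_prev, prior = torch.randn(64, 256), torch.ones(64, 200)   # a[i-1] and P[0] = 1
mask, prior = att(a_prev, prior)                             # M[1] and updated prior scales
```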
With the masks M[i] obtained from the attentive transformer, the following steps are used for feature processing.
(1)
The feature transformer in Figure 2 is used to process the filtered features, which can be used in the decision step output and information for subsequent steps:
Figure 2. Feature transformer for TabNets.
[d[i], a[i]] = f_i\left( M[i] \odot X \right) \qquad (6)
where f_i denotes the feature transformer at the i-th decision step and the [·, ·] operator denotes splitting into d[i] ∈ R^{B×N_d} and a[i] ∈ R^{B×N_a}, with N_d being the width of the prediction layer for the decision and N_a being the width of the attention layer for the masks;
(2)
For efficient learning with high capacity, the feature transformer is comprised of layers that are shared across decision steps such that the same features can be input for different decision steps, and decision step-dependent layers in which features in the current decision step depend upon the output from the previous decision step;
(3)
In Figure 2, it can be observed that the feature transformer consists of the concatenation of two shared layers and two decision step-dependent layers, in which each fully connected (FC) layer is followed by batch normalization (BN) and a gated linear unit (GLU) []. Normalization with √0.5 is also used to stabilize learning throughout the network [] (a sketch of this block is given after the list below);
(4)
All BN operations, except the one applied to the input features, are implemented as ghost BN [] by selecting only part of the samples rather than the entire batch at one time to reduce the computational cost. This improves performance by using a virtual (small) batch size B_v and momentum m_B instead of the entire batch. Moreover, decision tree-like aggregation is implemented by constructing the overall decision embedding as:
d_{out} = \sum_{i=1}^{N_{steps}} \mathrm{LeakyReLU}\left( d[i] \right) \qquad (7)
where N_steps represents the number of decision steps;
(5)
The linear mapping W_final · d_out is applied for output mapping, and softmax is employed during training to obtain discrete outputs.
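The following minimal PyTorch sketch of the feature transformer of Figure 2 is given for reference, assuming plain batch normalization in place of ghost BN for brevity; the class names GLUBlock and FeatureTransformer are illustrative.

```python
import math
import torch.nn as nn
import torch.nn.functional as F

class GLUBlock(nn.Module):
    """One FC -> BN -> gated linear unit block of the feature transformer (Figure 2)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.fc = nn.Linear(in_dim, 2 * out_dim)   # doubled width: the second half acts as the gate
        self.bn = nn.BatchNorm1d(2 * out_dim)       # ghost BN in the full model; plain BN for brevity

    def forward(self, x):
        return F.glu(self.bn(self.fc(x)), dim=-1)

class FeatureTransformer(nn.Module):
    """Two shared GLU blocks followed by two decision-step-dependent blocks,
    with residual connections normalized by sqrt(0.5) for stable learning."""
    def __init__(self, shared, width):
        super().__init__()
        self.shared = shared                        # nn.ModuleList reused by every decision step
        self.step_dependent = nn.ModuleList([GLUBlock(width, width) for _ in range(2)])
        self.scale = math.sqrt(0.5)

    def forward(self, x):
        for i, block in enumerate(list(self.shared) + list(self.step_dependent)):
            h = block(x)
            x = h if i == 0 else (x + h) * self.scale   # residual normalized by sqrt(0.5)
        return x

# The shared blocks map the masked input features to width = N_d + N_a and are built once:
#   shared = nn.ModuleList([GLUBlock(n_features, width), GLUBlock(width, width)])
# After each feature transformer, the output is split into d[i] = out[:, :N_d] and
# a[i] = out[:, N_d:], and the decision embedding of Eq. (7) is d_out = sum_i F.leaky_relu(d_i).
```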

3.2. TabNet with Unsupervised Pretraining

To include unsupervised pretraining in TabNet (uTabNet), a decoder architecture is incorporated [,]. As shown in Figure 3, the decoder is composed of a feature transformer and FC layers at each decision step, and it reconstructs the features by combining the step outputs. Missing feature columns can be predicted from the other feature columns. Suppose S ∈ {0, 1}^{B×d} is a binary mask and r is the pretraining ratio of features that are randomly discarded for reconstruction, i.e., r represents the ratio of masked entries in S. The prior term in the encoder is initialized as P[0] = (1 − S) so that the model focuses on the known features, and the last FC layer of the decoder outputs the product of S and the reconstructed (unknown) output features. For this purpose, the reconstruction loss (L_rec), used in an unsupervised manner without label information, is formed as:
L_{rec} = \sum_{i=1}^{B} \sum_{j=1}^{d} \left| \frac{ \left( \hat{X}_{i,j} - X_{i,j} \right) \cdot S_{i,j} }{ \sqrt{ \sum_{i=1}^{B} \left( X_{i,j} - \frac{1}{B} \sum_{i=1}^{B} X_{i,j} \right)^{2} } } \right|^{2} \qquad (8)
where X̂_{i,j} represents the reconstructed output and X_{i,j} denotes the original input.
Figure 3. TabNets decoder.
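The following is a minimal sketch of the reconstruction loss of Equation (8); the small constant eps added to the denominator is an assumption for numerical stability, and the tensors below are random stand-ins for the decoder output, the input batch, and the binary mask S.

```python
import torch

def pretraining_reconstruction_loss(x_hat, x, s, eps=1e-9):
    """Eq. (8): only masked-out entries (S = 1) contribute, and each feature column
    is normalized by its standard deviation over the batch."""
    std = torch.sqrt(((x - x.mean(dim=0, keepdim=True)) ** 2).sum(dim=0, keepdim=True)) + eps
    return ((((x_hat - x) * s) / std) ** 2).sum()

# Stand-in tensors: batch size B, d spectral bands, pretraining ratio r.
B, d, r = 64, 200, 0.5
x = torch.randn(B, d)                      # original input batch
s = (torch.rand(B, d) < r).float()         # binary mask S: 1 marks entries to reconstruct
x_hat = torch.randn(B, d)                  # decoder output in the real model
loss = pretraining_reconstruction_loss(x_hat, x, s)
```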

3.3. TabNet with Spatial Attention (TabNets)

The generated masks M[i] in Equation (1) are used in Equation (2) to update the prior P[i] in the attentive transformer for soft feature selection. Spatial information is incorporated by including a 2D CNN inside the attentive transformer, resulting in TabNet with spatial attention (TabNets), as shown in Figure 4. The output feature maps of each layer in TabNets are shown in Table 2.
Figure 4. Attentive transformer for TabNets.
Table 2. Layer-wise summary of spatial attention in the proposed TabNets for a window size of 25 × 25. The last layer of the TabNets encoder is based on the Indian Pines data.
In a CNN, 2D kernels convolve the input data by computing the sum of products between the kernel and the input. To cover the entire spatial area, the kernel is strided over the input data. Nonlinearity is introduced with an activation function on the convolved features. The activation value A_{k,l}^{u,v} at spatial position (u, v) in the l-th feature map of the k-th layer can be expressed as:
A_{k,l}^{u,v} = \psi\left( e_{k,l} + \sum_{\delta=1}^{o_{m-1}} \sum_{\theta=-\tau}^{\tau} \sum_{\beta=-\Phi}^{\Phi} f_{k,l,\delta}^{\beta,\theta} \times A_{k-1,\delta}^{u+\beta, v+\theta} \right) \qquad (9)
where ψ represents the activation function and e_{k,l} is the bias parameter. o_{m−1} denotes the number of feature maps in the (m−1)-th layer, which equals the depth of the kernel f_{k,l} at the k-th layer for the l-th feature map. 2τ + 1 represents the width of the kernel and 2Φ + 1 denotes the height of the kernel with weight parameters f_{k,l}.
First of all, the 3D patch input of size T × P × P, with T reduced channels from principal component analysis (PCA) and patch size P × P, is converted to a 1D input vector (a patch-flattening sketch is given after the following list). For instance, in the Indian Pines data, the 3D input of size 10 × 25 × 25 becomes a 6250 × 1 vector. The feature size from each layer in the encoder is shown in the second part of Table 2:
(1)
The first BN generates a 6250 × 1 vector;
(2)
It is converted by the first feature transformer layer before Step 1 into a feature vector of size N_d + N_a = 512;
(3)
The Split layer divides it into two parts and provides a feature of size N_a = 256 for the attentive transformer;
(4)
The Attentive transformer layer generates output masks for the 6250 × 1 feature;
(5)
The Mask layer in Step 1 generates the multiplicative output M[i] ⊙ X, a 6250 × 1 feature, for the feature transformer layer;
(6)
The feature transformer generates a feature of size N_d + N_a = 512, which is separated into two parts: N_d = 256 for the LeakyReLU and N_a = 256 for the attentive transformer in Step 1;
(7)
The output of each decision step is then concatenated in the TabNets encoder and converted to a feature map with 16 classes by the FC layer.
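As a reference for the patch-flattening step, the sketch below reduces the spectral dimension with PCA and flattens a 25 × 25 window around every pixel into the 6250-dimensional vectors that enter the TabNets encoder; reflection padding at the image borders and the use of scikit-learn's PCA are assumptions, since the paper does not specify the border handling.

```python
import numpy as np
from sklearn.decomposition import PCA

def hsi_to_flat_patches(cube, n_components=10, patch=25):
    """Reduce the spectral dimension with PCA and flatten a patch around every pixel.

    cube: HSI array of shape (H, W, bands), e.g. 145 x 145 x 200 for Indian Pines.
    Returns an array of shape (H*W, n_components*patch*patch), e.g. (21025, 6250)."""
    H, W, bands = cube.shape
    flat = cube.reshape(-1, bands)
    pcs = PCA(n_components=n_components).fit_transform(flat).reshape(H, W, n_components)
    pad = patch // 2
    padded = np.pad(pcs, ((pad, pad), (pad, pad), (0, 0)), mode="reflect")
    patches = np.empty((H * W, n_components * patch * patch), dtype=np.float32)
    k = 0
    for i in range(H):
        for j in range(W):
            window = padded[i:i + patch, j:j + patch, :]          # patch x patch x n_components
            patches[k] = window.transpose(2, 0, 1).reshape(-1)    # T x P x P flattened to 1D
            k += 1
    return patches

# Stand-in cube with the Indian Pines dimensions:
cube = np.random.rand(145, 145, 200).astype(np.float32)
flat_patches = hsi_to_flat_patches(cube)    # shape (21025, 6250)
```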
For the spatial attention inside the attentive transformer, the feature map of each layer is shown in the first part of Table 2 (a sketch of this module follows the list).
(1)
The output of entmax from Equation (1) is reshaped to 10 × 25 × 25 as input to the first 2D convolution layer. For a kernel size of 3 × 3 and stride = 3, the first 2D convolution layer provides a 16 × 8 × 8 output;
(2)
The second convolution layer generates an output of size 32 × 6 × 6 with a kernel size of 3 × 3 and stride = 1;
(3)
The third convolutional layer generates an output shape of 64 × 4 × 4 with a kernel size of 3 × 3 and stride = 1;
(4)
The flatten layer provides an output of size 1024 × 1;
(5)
Finally, the FC layer generates an output of size 6250 × 1 that is provided as input to the prior scales for updating the abstract features generated by the FC and BN layers inside the attentive transformer.
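Below is a minimal PyTorch sketch of the spatial attention module of Figure 4, reproducing the layer shapes in the first part of Table 2 for the Indian Pines setting; the ReLU activations between convolutions and the class name SpatialAttention are assumptions, since the paper specifies only the layer shapes.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """2D CNN inside the attentive transformer of TabNets (Figure 4, first part of Table 2).
    Layer shapes follow the Indian Pines setting: 10 x 25 x 25 input, 6250-dimensional output."""
    def __init__(self, channels=10, patch=25, n_features=6250):
        super().__init__()
        self.channels, self.patch = channels, patch
        self.net = nn.Sequential(
            nn.Conv2d(channels, 16, kernel_size=3, stride=3), nn.ReLU(),   # 10x25x25 -> 16x8x8
            nn.Conv2d(16, 32, kernel_size=3, stride=1), nn.ReLU(),          # -> 32x6x6
            nn.Conv2d(32, 64, kernel_size=3, stride=1), nn.ReLU(),          # -> 64x4x4
            nn.Flatten(),                                                    # -> 1024
            nn.Linear(64 * 4 * 4, n_features),                               # -> 6250
        )

    def forward(self, mask):
        # mask: (B, 6250) output of entmax in Eq. (1), reshaped back to the 3D patch layout
        x = mask.view(-1, self.channels, self.patch, self.patch)
        return self.net(x)

# Example: a batch of 64 masks from the attentive transformer.
out = SpatialAttention()(torch.rand(64, 6250))   # shape (64, 6250)
```

The resulting 6250-dimensional output feeds the prior scales, so the soft feature selection of Equations (1) and (2) also reflects the local spatial context.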
In addition, TabNets with unsupervised pretraining (uTabNets) can be obtained by using steps of unsupervised pretraining and Equation (8) on TabNets.

3.4. Structure Profile on TabNet (sTabNet)

By applying spatial feature extraction with the structure profile (SP) [] in a preprocessing stage, the performance of TabNet can be enhanced, yielding TabNet with structure profile (sTabNet).
Spatial feature extraction with structure profile:
First of all, the original input image is divided into M subsets. The structure profile S can be extracted from the input image X using an adaptive texture smoothing model as:
\arg\min_{S} \; \lVert S - X \rVert_{2,w}^{2} + \lambda \lVert S \rVert_{TV} \qquad (10)
where λ is a free parameter and w is the weight that controls the similarity of adjacent pixels. For smoothing purposes, a local polynomial p = \sum_{l=1}^{m} c_l p_l of degree L can be used, where the polynomial space is denoted as 𝒫_L and m is the number of basis elements in 𝒫_L. For N pixels in Ω(x), assume Ω(x) = {x_1, x_2, …, x_N} is a set of points around x in X. To obtain the structure profile, S(x) := p(x) is computed for each x ∈ Ω with the optimization function:
\arg\min_{p \in \mathcal{P}_L} \left\{ \sum_{i=1}^{N} \lVert p(x_i) - X(x_i) \rVert_{2}^{2} \, w(x, x_i) + \lambda \lVert p(x_i) \rVert_{TV} \right\} \qquad (11)
where \mathcal{P}_L = \{ x^{\alpha} : x \in \mathbb{R}^{2}, \alpha \in \mathbb{Z}_{+}^{2}, \lvert \alpha \rvert_{1} \leq L \} is the polynomial basis of degree L, and w decides the contribution of the pixels X(x_i) towards the construction of the polynomial p(x_i), such that
w(x_i, x) = \exp\left( - \sum_{y \in Y(x_i)} \frac{ \lVert X(x_i + y) - X(x + y) \rVert_{2}^{2} \, G_{\sigma}(\lVert y \rVert) }{ h_0^{2} } \right) \qquad (12)
where Y(·) is the small region used for comparing patches around x_i and x, the scale parameter h_0 is set to 1, and G_σ is the Gaussian function with standard deviation σ. Equation (10) can now be expressed as:
\arg\min_{p \in \mathcal{P}_L} \left\{ \sum_{i=1}^{N} \lVert p(x_i) - X(x_i) \rVert_{2}^{2} \, w(x, x_i) + \lambda \lVert p(x_i) \rVert_{1} \right\} \qquad (13)
Using the Bregman iteration algorithm [], Equation (13) can be solved as below:
Update p^{k+1}(x_i):
p^{k+1}(x_i) = \arg\min_{p \in \mathcal{P}_L} \sum_{i=1}^{N} \lVert p(x_i) - X(x_i) \rVert_{2}^{2} \, w(x, x_i) + \lambda \lVert d^{k}(x_i) - p(x_i) - b^{k}(x_i) \rVert_{2}^{2} \, w(x, x_i) \qquad (14)
Update d^{k+1}(x_i):
d^{k+1}(x_i) = \arg\min_{d} \; \lvert d(x_i) \rvert_{1} + \lambda \lVert d(x_i) - p^{k+1}(x_i) - b^{k}(x_i) \rVert_{2}^{2} \qquad (15)
The soft-thresholding method can be used:
d^{k+1}(x_i) = \mathrm{soft}\left( p^{k+1}(x_i) + b^{k}(x_i), \, 1/\lambda \right) \qquad (16)
Update b^{k+1}(x_i):
b^{k+1}(x_i) = b^{k}(x_i) + p^{k+1}(x_i) - d^{k+1}(x_i) \qquad (17)
These updates of p^{k+1}(x_i), d^{k+1}(x_i), and b^{k+1}(x_i) are repeated until convergence is attained (one iteration is sketched below).
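The following is a minimal sketch of one split-Bregman iteration following Equations (14)-(17); the data-fit subproblem of Equation (14) is left as a user-supplied callable, since its closed form depends on the polynomial basis and the weights of Equation (12), and the toy usage below is only an assumption-laden illustration.

```python
import numpy as np

def soft(x, t):
    """Soft-thresholding operator of Eq. (16): shrink x toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def split_bregman_step(d, b, solve_p, lam):
    """One split-Bregman iteration (Eqs. (14)-(17)).

    solve_p(d, b) must return the minimizer of the weighted least-squares subproblem
    of Eq. (14); it is problem-specific (polynomial basis, weights w(x, x_i))."""
    p_new = solve_p(d, b)                  # Eq. (14): quadratic subproblem
    d_new = soft(p_new + b, 1.0 / lam)     # Eqs. (15)-(16): shrinkage of the auxiliary variable
    b_new = b + p_new - d_new              # Eq. (17): Bregman variable update
    return p_new, d_new, b_new

# Toy usage with a trivial data-fit subproblem (the real one fits a local polynomial
# weighted by w(x, x_i)); iterate until the change in p falls below a tolerance.
lam = 0.1
x_obs = np.random.rand(100)
solve_p = lambda d, b: (x_obs + lam * (d - b)) / (1.0 + lam)   # closed form for this toy case
p = d = b = np.zeros_like(x_obs)
for _ in range(50):
    p, d, b = split_bregman_step(d, b, solve_p, lam)
```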
After obtaining convergence, the aforementioned TabNet classifier is implemented on the extracted SPs to obtain the classification results for sTabNet.

3.5. Structure Profile on Unsupervised Pretrained TabNet (suTabNet)

After applying SP feature extraction before uTabNet, TabNet with unsupervised pretraining and SP feature extraction (suTabNet) is obtained. Similarly, SP feature extraction can be applied to TabNets and uTabNets to obtain their SP-extracted versions, sTabNets and suTabNets, respectively, and to the other comparative methods for a fair comparison.

4. Experiments

4.1. Datasets

Three different datasets were used to validate the proposed methods.
The first dataset used in the experiments is the Indian Pines dataset, collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) sensor. It consists of 16 different classes with a spatial size of 145 × 145 pixels and 220 spectral bands (200 after noise removal). The water-absorption bands 104–108, 150–163, and 220 were removed. The spectral wavelength ranges from 0.4 to 2.5 μm. Ten percent of the samples from each class were used for training and the remaining samples were used for testing. The number of training and testing samples for each class is listed in Table 3.
Table 3. Training and testing samples with class labels in the Indian Pines dataset.
The second dataset is the University of Pavia dataset, which was acquired by the Reflective Optics System Imaging Spectrometer (ROSIS) sensor in Italy. It has a spatial size of 610 × 340 pixels and a total of 103 spectral bands after noisy band removal, covering the range from 0.43 to 0.86 μm. Nine different classes exist in this dataset; 200 samples were taken from each class for training and the remaining were used for testing. Table 4 shows the number of training and testing samples for each class.
Table 4. Training and testing samples with class labels in the Pavia dataset.
The third dataset is the Salinas dataset, which was collected with the AVIRIS sensor over Salinas Valley, California. It comprises a spatial size of 512 × 217 pixels with 224 bands (204 bands after band removal). The water-absorption bands 108–112, 154–167, and 224 were removed. It has a spatial resolution of 3.7 m per pixel with 16 different classes. For training, 200 samples from each class were taken and the remaining were used for testing. Table 5 shows the number of training and testing samples in the different classes.
Table 5. Training and testing samples with class labels in the Salinas dataset.

4.2. Experimental Setup

For all methods in comparison, such as RF, MLP, LightGBM, CatBoost, XGBoost, and CAE, parameters were estimated according to [,,,,,,,,]. For our proposed methods, the Adam optimizer was used to estimate the optimal parameters. In all three datasets, 10% of the training samples were allocated for validation and the remaining 90% were used for learning the optimal weights and tuning the hyperparameters of the network. The performance of TabNet, uTabNet, TabNets, uTabNets, and their SP-extracted versions sTabNet, suTabNet, sTabNets, and suTabNets was investigated over a predefined set of parameters: N_d and N_a were selected from {8, 16, 24, 32, …, 1024}, γ from {1, 1.5, 2}, λ_sparse from {0, 0.0001, …, 0.1}, B from {16, 32, …, 16384}, m_B from {0.2, …, 1}, N_steps from {1, 2, …, 10}, and B_v from {16, 32, …, 1024}. In all three datasets, λ_sparse = 0.01, γ = 1.5, B = 64, m_B = 0.6, N_steps = 5, and B_v = 128 were selected. The proposed TabNets and uTabNets can provide enhanced results in a smaller number of epochs, namely 200 epochs for the Indian Pines data and 500 epochs for the other two datasets. Each experiment was repeated 10 times and the average value is reported to reduce statistical variability. The optimal parameters of the proposed methods are listed in Table 6 for all three datasets.
Table 6. Parameter tuning in different algorithms.
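For reference, the baseline TabNet configuration above maps onto the open-source pytorch-tabnet package; the sketch below is illustrative only and assumes that package (the spatial-attention and SP variants require the modifications of Section 3 and are not available off the shelf). The arrays below are random stand-ins for the flattened spectral (or SP) features and labels, and N_d = N_a = 256 is taken from the TabNets encoder description in Section 3.3.

```python
import numpy as np
from pytorch_tabnet.tab_model import TabNetClassifier

# Random stand-ins for spectral features (200 bands) and 16-class labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(1024, 200)).astype(np.float32), rng.integers(0, 16, 1024)
X_valid, y_valid = rng.normal(size=(128, 200)).astype(np.float32), rng.integers(0, 16, 128)

# Baseline TabNet with the selected hyperparameters:
# gamma = 1.5, lambda_sparse = 0.01, N_steps = 5, B = 64, B_v = 128, m_B = 0.6.
clf = TabNetClassifier(
    n_d=256, n_a=256, n_steps=5,
    gamma=1.5, lambda_sparse=0.01, momentum=0.6,
)   # Adam is the default optimizer in this package

clf.fit(
    X_train, y_train,
    eval_set=[(X_valid, y_valid)],      # 10% of the training samples held out for validation
    max_epochs=200,                     # 200 epochs for Indian Pines, 500 for the other datasets
    batch_size=64,                      # B = 64
    virtual_batch_size=128,             # ghost BN virtual batch size B_v = 128
)
# The same package also provides TabNetPretrainer, which implements the unsupervised
# pretraining step used to obtain uTabNet.
```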
In addition, varying the window size in the range {19 × 19, 21 × 21, 23 × 23, 25 × 25, 27 × 27} was investigated to incorporate more spatial information. However, choosing too large a window size may add redundancy due to interclass variation among neighboring pixels. As shown in Table 7, 25 × 25 was found to be the most suitable for all datasets. For the Indian Pines and Salinas data, an input of 10 × 25 × 25 was used, and 7 × 25 × 25 was used for the University of Pavia data.
Table 7. Varying window size in TabNets (OA in percentage).

4.3. Result of Classification

Classification accuracies in terms of overall accuracy, average accuracy, Kappa coefficient, and per-class accuracy are listed in Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13. It can be observed that TabNet shows better classification accuracy than the other methods, i.e., RF [,,], MLP [], LightGBM [], CatBoost [], and XGBoost [,]. In addition, TabNet with spatial attention (TabNets) and its unsupervised pretrained version (uTabNets) outperform TabNet and its unsupervised version uTabNet on all three datasets. Additionally, uTabNets outperforms the convolutional autoencoder (CAE) [,] on all three datasets. Moreover, sTabNet outperforms TabNet and the SP-extracted versions of other methods, such as sRF, sMLP, sLightGBM, sCatBoost, and sXGBoost. Additionally, SP on TabNets (sTabNets) and its unsupervised pretrained version (suTabNets) outperform TabNet, uTabNet, TabNets, and uTabNets, along with all other SP-extracted versions, on all three datasets.
Table 8. Classification accuracies on the Indian Pines dataset (10 percent training samples per class).
Table 9. SP Classification accuracies on Indian Pines dataset (10 percent training samples per class).
Table 10. Classification accuracies on University of Pavia dataset (200 training samples per class).
Table 11. SP Classification accuracies on University of Pavia dataset (200 training samples per class).
Table 12. Classification Accuracies on Salinas dataset (200 training samples per class).
Table 13. SP Classification accuracies on Salinas dataset (200 training samples per class).
In Figure 5, Figure 6 and Figure 7, the classification maps of the three datasets are consistent with the results in Table 8, Table 9, Table 10, Table 11, Table 12 and Table 13. Figure 5 shows the classification maps for Indian Pines, with the ground truth of the original image in Figure 5a. In these classification maps, the labeled pixels show that sTabNet outperforms TabNet and the SP versions of the other techniques. Furthermore, suTabNet outperforms uTabNet and sTabNet. The proposed TabNets shows less noise in the areas of Soybean-notill and Woods, and uTabNets shows less noise in the region of Woods.
Figure 5. Classification maps for Indian pines data obtained using different methods including (a) ground truth image, (b) RF (77.52%), (c) MLP (77.48%), (d) LightGBM (76.54%), (e) CatBoost (75.32%), (f) XGBoost (73.79%), (g) TabNet (82.32%), (h) uTabNet (84.44%), (i) CAE (85.07%), (j) TabNets (94.93%), (k) uTabNets (96.36%), (l) sRF (88.98%), (m) sMLP (84.17%), (n) sLightGBM (89.68%), (o) sCatBoost (89.93%), (p) sXGBoost (80.93%), (q) sTabNet (94.41%), (r) suTabNet (95.85%), (s) sCAE (95.95%), (t) sTabNets (96.40%), and (u) suTabNets (97.51%).
Figure 6. Classification maps for University of Pavia data obtained for different methods including (a) ground truth image, (b) RF (78.17%), (c) MLP (80.12%), (d) LightGBM (85.21%), (e) CatBoost (82.19%), (f) XGBoost (81.06%), (g) TabNet (90.19%), (h) uTabNet (92.58%), (i) CAE (94.26%), (j) TabNets (96.58%), (k) uTabNets (97.86%), (l) sRF (89.54%), (m) sMLP (92.28%), (n) sLightGBM (92.94%), (o) sCatBoost (94.26%), (p) sXGBoost (94.26%), (q) sTabNet (97.62%), (r) suTabNet (98.95%), (s) sCAE (98.58%), (t) sTabNets (98.38%), and (u) suTabNets (99.29%).
Figure 7. Classification maps for Salinas data obtained for different methods including (a) ground truth image, (b) RF (79.76%), (c) MLP (83.73%), (d) LightGBM (87.87%), (e) CatBoost (84.12%), (f) XGBoost (86.99%), (g) TabNet (90.45%), (h) uTabNet (91.31%), (i) CAE (92.39%), (j) TabNets (97.32%), (k) uTabNets (98.36%), (l) sRF (89.10%), (m) sMLP (93.57%), (n) sLightGBM (95.05%), (o) sCatBoost (94.70%), (p) sXGBoost (93.93%), (q) sTabNet (96.20%), (r) suTabNet (98.85%), (s) sCAE (98.95%), (t) sTabNets (98.34%), and (u) suTabNets (99.33%).
Moreover, their SP-extracted versions sTabNets and suTabNets show less noise in the areas of Soybean-mintill and Woods, respectively. In Figure 6, the classification maps for the University of Pavia are shown. It can be observed that the maps from the proposed TabNets and uTabNets are smoother in the regions of Bare soil and Meadows, respectively. Similarly, their SP-extracted versions sTabNets and suTabNets produce smoother areas of Bare soil and Meadows, respectively. In Figure 7, the classification maps for the different methods on the Salinas dataset are shown. The maps from the proposed TabNets and uTabNets are less noisy in the regions of Corn-senesced-green-weeds and Grapes-untrained. In addition, the maps from their SP-extracted versions sTabNets and suTabNets contain less noise in the areas of Grapes-untrained and Vinyard-untrained.
Figure 8 shows the classification performance of different methods for varying numbers of training samples in all datasets. For Indian Pines, the training samples per class are varied as {10%, 20%, 30%, 40%, and 50%}. The training samples per class are varied as {100, 200, 300, 400, and 500} in both the University of Pavia and Salinas datasets. It can be observed that the proposed TabNets, uTabNets, sTabNets, and suTabNets outperform all other methods, such as RF, MLP, LightGBM, CatBoost, XGBoost, TabNet, uTabNet, CAE, and their SP versions, for all numbers of training samples in all three datasets.
Figure 8. Overall classification accuracy (with standard deviations) of considered methods and SP-extracted versions with different numbers of training samples per class: (a,b) Indian Pines, (c,d) University of Pavia, (e,f) Salinas dataset.
To evaluate the statistical significance of the OA improvements, McNemar's test [] is reported in Table 14 for different pairs of methods. Two methods are considered statistically different if |z|, the absolute value of the McNemar test statistic, is larger than 1.96 or 2.58, which corresponds to statistical difference at the 95% or 99% confidence level, respectively. The comparison among TabNet, uTabNet, TabNets, uTabNets, sTabNet, suTabNet, sTabNets, suTabNets, and the other classifiers indicates their superiority over their counterparts.
Table 14. Significance from the standard McNemar’s test for the difference between algorithms.
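For reference, a minimal sketch of the McNemar statistic used in Table 14 is given below, assuming the classifiers are compared on the same set of labeled test pixels.

```python
import numpy as np

def mcnemar_z(y_true, pred_a, pred_b):
    """McNemar's test statistic for two classifiers evaluated on the same test pixels.
    |z| > 1.96 (2.58) indicates a statistically significant difference at the 95% (99%) level."""
    correct_a = (pred_a == y_true)
    correct_b = (pred_b == y_true)
    f12 = np.sum(correct_a & ~correct_b)    # pixels only method A classifies correctly
    f21 = np.sum(~correct_a & correct_b)    # pixels only method B classifies correctly
    return (f12 - f21) / np.sqrt(f12 + f21)

# Example with random stand-in predictions over 1000 test pixels and 16 classes:
rng = np.random.default_rng(0)
y = rng.integers(0, 16, 1000)
z = mcnemar_z(y, rng.integers(0, 16, 1000), rng.integers(0, 16, 1000))
```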
To estimate the computational complexity of the proposed algorithms, the execution time of the different algorithms on the three hyperspectral datasets is listed in Table 15. All the experiments were run using an NVIDIA Tesla K80 GPU and MATLAB on an Intel(R) Core(TM) i7-4770 central processing unit with 16 GB of memory.
Table 15. Execution time (in seconds) in different experimental datasets.
It can be observed that TabNet has higher computational complexity than the other tree-based methods, which may be due to the sequential attention involved in tabular learning. In addition, the unsupervised pretraining version of TabNet (uTabNet) has higher complexity than TabNet because of the pretraining operation.
Additionally, the proposed TabNets and its unsupervised pretraining version uTabNets show slightly higher complexity than TabNet and uTabNet because of the convolution layers in the attentive transformer for spatial processing of the masks. Moreover, the SP-extracted versions sTabNet, suTabNet, sTabNets, and suTabNets are slightly costlier than their counterparts due to the SP extraction.

5. Conclusions

In this work, we proposed the TabNets network, which uses spatial attention to enhance the performance of the original TabNet for HSI classification by including a 2D CNN in the attentive transformer. Moreover, unsupervised pretraining on TabNets (uTabNets) was introduced, which can outperform TabNets. SP-extracted versions of TabNet, uTabNet, TabNets, and uTabNets were also developed to further utilize spatial information. The experimental results obtained on different hyperspectral datasets illustrate the superiority of the proposed TabNets and uTabNets and their SP versions in terms of classification accuracy over other techniques, such as RF, MLP, LightGBM, CatBoost, XGBoost, and their SP versions. However, the proposed networks show slightly higher complexity for network optimization. In future work, more spatial and spectral information will be incorporated into TabNet to enhance the classification performance with reduced computational cost. Moreover, the performance of the enhanced TabNet on hyperspectral anomaly detection will be investigated, which has potential applications for solving similar classification and feature extraction problems for high-resolution thermal or remote sensing images.

Author Contributions

Conceptualization, C.S., Q.D. and Y.X.; methodology, C.S. and Q.D.; writing—original draft, C.S.; writing—review and editing, C.S. and Q.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors would like to thank the authors of all references used in the paper, the editors, and the anonymous reviewers for their detailed comments and suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Shah, C.; Du, Q. Collaborative and Low-Rank Graph for Discriminant Analysis of Hyperspectral Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 5248–5259. [Google Scholar] [CrossRef]
  2. Shah, C.; Du, Q. Spatial-Aware Probabilistic Collaborative Representation for Hyperspectral Image Classification. In Proceedings of the Image and Signal Processing for Remote Sensing XXVI (Proc. Of SPIE), Edinburgh, UK, 21–25 September 2020. art no 115330Q. [Google Scholar] [CrossRef]
  3. Li, W.; Du, Q. Joint Within-Class Collaborative Representation for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2200–2208. [Google Scholar] [CrossRef]
  4. Shah, C.; Du, Q. Modified Structure-Aware Collaborative Representation for Hyperspectral Image Classification. In Proceedings of the 2021 IEEE International Geoscience and Remote Sensing Symposium IGARSS, Brussels, Belgium, 11–16 July 2021. [Google Scholar] [CrossRef]
  5. Pan, L.; Li, H.-C.; Deng, Y.-J.; Zhang, F.; Chen, X.-D.; Du, Q. Hyperspectral Dimensionality Reduction by Tensor Sparse and Low-Rank Graph-Based Discriminant Analysis. Remote Sens. 2017, 9, 452. [Google Scholar] [CrossRef] [Green Version]
  6. Li, W.; Wang, Z.; Li, L.; Du, Q. Feature extraction for hyperspectral images using local contain profile. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 5035–5046. [Google Scholar] [CrossRef]
  7. Hong, D.; Wu, X.; Ghamisi, P.; Chanussot, J.; Yokoya, N.; Zhu, X.X. Invariant attribute profiles: A spatial-frequency joint feature extractor for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 58, 3791–3808. [Google Scholar] [CrossRef] [Green Version]
  8. Chang, C.-I. Hyperspectral Data Exploitation: Theory and Applications; Wiley: Hoboken, NJ, USA, 2007. [Google Scholar] [CrossRef]
  9. Chen, M.; Tang, Y.; Zou, X.; Huang, Z.; Zhou, H.; Chen, S. 3D global mapping of large-scale unstructured orchard integrating eye-in-hand stereo vision and SLAM. Comput. Electron. Agric. 2021, 187, 106237. [Google Scholar] [CrossRef]
  10. Wu, F.; Duan, J.; Chen, S.; Ye, Y.; Ai, P.; Yang, Z. Multi-Target Recognition of Bananas and Automatic Positioning for the Inflorescence Axis Cutting Point. Front. Plant Sci. 2021, 12, 705021. [Google Scholar] [CrossRef]
  11. Cao, X.; Yan, H.; Huang, Z. A Multi-Objective Particle Swarm Optimization for Trajectory Planning of Fruit Picking Manipulator. Agronomy 2021, 11, 2286. [Google Scholar] [CrossRef]
  12. Du, P.; Samat, A.; Waske, B.; Liu, S.; Li, Z. Random Forest and rotation forest for fully polarized SAR image classification using polarimetric and spatial features. ISPRS J. Photogramm. Remote Sens. 2015, 105, 38–53. [Google Scholar] [CrossRef]
  13. Samat, A.; Persello, C.; Liu, S.; Li, E.; Miao, Z.; Abuduwaili, J. Classification of VHR multispectral images using extratrees and maximally stable extremal region-guided morphological profile. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3179–3195. [Google Scholar] [CrossRef]
  14. Melgani, F.; Bruzzone, L. Classification of Hyperspectral Remote Sensing Images with Support Vector Machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1778–1790. [Google Scholar] [CrossRef] [Green Version]
  15. Camps-Valls, G.; Gomez-Chova, L.; Munoz-Mari, J.; Vila-Frances, J.; Calpe-Maravilla, J. Composite Kernels for Hyperspectral Image Classification. IEEE Geosci. Remote Sens. Lett. 2006, 3, 93–97. [Google Scholar] [CrossRef]
  16. Li, J.; Bioucas-Dias, J.M.; Plaza, A. Spectral–Spatial Hyperspectral Image Segmentation Using Subspace Multinomial Logistic Regression and Markov Random Fields. IEEE Trans. Geosci. Remote Sens. 2012, 50, 809–823. [Google Scholar] [CrossRef]
  17. Hughes, G. On the Mean Accuracy of Statistical Pattern Recognizers. IEEE Trans. Inf. Theory 1968, 14, 55–63. [Google Scholar] [CrossRef] [Green Version]
  18. Fauvel, M.; Tarabalka, Y.; Benediktsson, J.A.; Chanussot, J.; Tilton, J.C. Advances in Spectral-Spatial Classification of Hyperspectral Images. Proc. IEEE 2013, 101, 652–675. [Google Scholar] [CrossRef] [Green Version]
  19. Cui, M.; Prasad, S. Class-Dependent Sparse Representation Classifier for Robust Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2015, 53, 2683–2695. [Google Scholar] [CrossRef]
  20. Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.S.; Yi, M. Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227. [Google Scholar] [CrossRef] [Green Version]
  21. Chen, Y.; Nasrabadi, N.M.; Tran, T.D. Hyperspectral Image Classification Using Dictionary-Based Sparse Representation. IEEE Trans. Geosci. Remote Sens. 2011, 49, 3973–3985. [Google Scholar] [CrossRef]
  22. Shah, C.; Du, Q. Spatial-Aware Collaboration-Competition Preserving Graph Embedding for hyperspectral image classification. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5. [Google Scholar] [CrossRef]
  23. Zhang, H.; Li, J.; Huang, Y.; Zhang, L. A Nonlocal Weighted Joint Sparse Representation Classification Method for Hyperspectral Imagery. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2014, 7, 2056–2065. [Google Scholar] [CrossRef]
  24. Peng, J.; Du, Q. Robust Joint Sparse Representation Based on Maximum Correntropy Criterion for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 7152–7164. [Google Scholar] [CrossRef]
  25. Benediktsson, J.A.; Palmason, J.A.; Sveinsson, J.R. Classification of Hyperspectral Data from Urban Areas Based on Extended Morphological Profiles. IEEE Trans. Geosci. Remote Sens. 2005, 43, 480–491. [Google Scholar] [CrossRef]
  26. Khodadadzadeh, M.; Li, J.; Plaza, A.; Ghassemian, H.; Bioucas-Dias, J.M.; Li, X. Spectral–Spatial Classification of Hyperspectral Data Using Local and Global Probabilities for Mixed Pixel Characterization. IEEE Trans. Geosci. Remote Sens. 2014, 52, 6298–6314. [Google Scholar] [CrossRef]
  27. Fang, L.; Li, S.; Duan, W.; Ren, J.; Benediktsson, J.A. Classification of Hyperspectral Images by Exploiting Spectral–Spatial Information of Superpixel via Multiple Kernels. IEEE Trans. Geosci. Remote Sens. 2015, 53, 6663–6674. [Google Scholar] [CrossRef] [Green Version]
  28. Ho, T.K. The Random Subspace Method for Constructing Decision Forests. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 832–844. [Google Scholar]
  29. Xia, J.; Ghamisi, P.; Yokoya, N.; Iwasaki, A. Random Forest Ensembles and Extended Multiextinction Profiles for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 202–216. [Google Scholar] [CrossRef] [Green Version]
  30. Rasti, B.; Hong, D.; Hang, R.; Ghamisi, P.; Kang, X.; Chanussot, J.; Benediktsson, J.A. Feature Extraction for Hyperspectral Imagery: The Evolution from Shallow to Deep: Overview and Toolbox. IEEE Geosci. Remote Sens. Mag. 2020, 8, 60–88. [Google Scholar] [CrossRef]
  31. Chen, T.; Guestrin, C. XGBoost. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016. [Google Scholar]
  32. Samat, A.; Li, E.; Wang, W.; Liu, S.; Lin, C.; Abuduwaili, J. Meta-XGBoost for Hyperspectral Image Classification Using Extended MSER-Guided Morphological Profiles. Remote Sens. 2020, 12, 1973. [Google Scholar] [CrossRef]
  33. Li, Z.; Huang, L.; Zhang, D.; Liu, C.; Wang, Y.; Shi, X. A Deep Network Based on Multiscale Spectral-Spatial Fusion for Hyperspectral Classification. Proc. Int. Knowl. Sci. Eng. Manag. 2018, 11062, 283–290. [Google Scholar]
  34. Li, Z.; Huang, L.; He, J. A Multiscale Deep Middle-Level Feature Fusion Network for Hyperspectral Classification. Remote Sens. 2019, 11, 695. [Google Scholar] [CrossRef] [Green Version]
  35. Heaton, J. Ian Goodfellow, Yoshua Bengio, and Aaron Courville: Deep Learning. Genet. Program. Evolvable Mach. 2017, 19, 305–307. [Google Scholar] [CrossRef] [Green Version]
  36. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional Neural Networks. Commun. ACM 2017, 60, 84–90. [Google Scholar] [CrossRef]
  37. Zhang, M.; Li, W.; Du, Q.; Gao, L.; Zhang, B. Feature extraction for classification of Hyperspectral and LIDAR data using patch-to-patch CNN. IEEE Trans. Cybern. 2020, 50, 100–111. [Google Scholar] [CrossRef]
  38. Hestness, J.; Narang, S.; Ardalani, N.; Diamos, G.; Jun, H.; Kianinejad, H.; Patwary, M.M.A.; Yang, Y.; Zhou, Y. Deep Learning Scaling Is Predictable, Empirically. Available online: https://arxiv.org/abs/1712.00409 (accessed on 29 October 2021).
  39. Arik, S.O.; Pfister, T. TabNet: Attentive Interpretable Tabular Learning. arXiv 2020, arXiv:1908.07442. Available online: https://arxiv.org/abs/1908.07442v4 (accessed on 6 November 2021).
  40. Arik, S.O.; Pfister, T. TabNet: Attentive Interpretable Tabular Learning. AAAI 2021, 35, 6679–6687. Available online: https://ojs.aaai.org/index.php/AAAI/article/view/16826 (accessed on 29 October 2021).
  41. Kemker, R.; Kanan, C. Self-Taught Feature Learning for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2017, 55, 2693–2705. [Google Scholar] [CrossRef]
  42. Hang, R.; Liu, Q.; Hong, D.; Ghamisi, P. Cascaded Recurrent Neural Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5384–5394. [Google Scholar] [CrossRef] [Green Version]
  43. Zhu, L.; Chen, Y.; Ghamisi, P.; Benediktsson, J.A. Generative Adversarial Networks for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 5046–5063. [Google Scholar] [CrossRef]
  44. Chen, Y.; Jiang, H.; Li, C.; Jia, X.; Ghamisi, P. Deep Feature Extraction and Classification of Hyperspectral Images Based on Convolutional Neural Networks. IEEE Trans. Geosci. Remote Sens. 2016, 54, 6232–6251. [Google Scholar] [CrossRef] [Green Version]
  45. Cheng, G.; Li, Z.; Han, J.; Yao, X.; Guo, L. Exploring Hierarchical Convolutional Features for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6712–6722. [Google Scholar] [CrossRef]
  46. Haut, J.M.; Paoletti, M.E.; Plaza, J.; Li, J.; Plaza, A. Active Learning with Convolutional Neural Networks for Hyperspectral Image Classification Using a New Bayesian Approach. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6440–6461. [Google Scholar] [CrossRef]
  47. Chen, Y.; Wang, Y.; Gu, Y.; He, X.; Ghamisi, P.; Jia, X. Deep Learning Ensemble for Hyperspectral Image Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2019, 12, 1882–1897. [Google Scholar] [CrossRef]
  48. Duan, P.; Ghamisi, P.; Kang, X.; Rasti, B.; Li, S.; Gloaguen, R. Fusion of Dual Spatial Information for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2021, 59, 7726–7738. [Google Scholar] [CrossRef]
  49. Guyon, I.; Elisseeff, A. An Introduction to Variable and Feature Selection. J. Mach. Learn. Res. 2003, 3, 1157–1182. [Google Scholar]
  50. Chen, J.; Song, L.; Wainwright, M.J.; Jordan, M.I. Learning to Explain: An Information-Theoretic Perspective on Model Interpretation. In Proceedings of the International Conference on Machine Learning (ICML), 2018. Available online: https://arxiv.org/abs/1802.07814 (accessed on 2 November 2021).
  51. Yoon, J.; Jordon, J.; Schaar, M. Invase: Instance-wise variable selection using neural networks: Semantic scholar. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; Available online: https://openreview.net/forum?id=BJg_roAcK7 (accessed on 2 November 2021).
  52. Grabczewski, K.; Jankowski, N. Feature Selection with Decision Tree Criterion. In Proceedings of the Fifth International Conference on Hybrid Intelligent Systems, Rio de Janeiro, Brazil, 6–9 November 2005. [Google Scholar]
  53. Catboost. Catboost/Benchmarks: Comparison Tools. Available online: https://github.com/catboost/benchmarks (accessed on 4 November 2021).
  54. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  55. Wang, X.; Tan, K.; Du, Q.; Chen, Y.; Du, P. Caps-Triplegan: Gan-Assisted CapsNet for Hyperspectral Image Classification. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7232–7245. [Google Scholar] [CrossRef]
  56. Peters, B.; Niculae, V.; Martins, A.F. Sparse Sequence-to-Sequence Models. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July–2 August 2019. [Google Scholar]
  57. Grandvalet, Y.; Bengio, Y. Entropy Regularization. In Semi-Supervised Learning; 2006; pp. 151–168. [Google Scholar] [CrossRef] [Green Version]
  58. Dauphin, Y.N.; Fan, A.; Auli, M.; Grangier, D. Language Modeling with Gated Convolutional Networks. 2016. Available online: https://arxiv.org/abs/1612.08083 (accessed on 28 October 2021).
  59. Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional Sequence to Sequence Learning. 2017. Available online: https://arxiv.org/abs/1705.03122v1 (accessed on 1 November 2021).
  60. Hoffer, E.; Hubara, I.; Soudry, D. Train Longer, Generalize Better: Closing the Generalization Gap in Large Batch Training of Neural Networks. 2017. Available online: http://arxiv-export-lb.library.cornell.edu/abs/1705.08741?context=cs (accessed on 27 October 2021).
  61. Goldstein, T.; Osher, S. The Split Bregman Method for L1-Regularized Problems. SIAM J. Imaging Sci. 2009, 2, 323–343. [Google Scholar] [CrossRef]
  62. Foody, G.M. Thematic Map Comparison. Photogramm. Eng. Remote Sens. 2004, 70, 627–633. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
