Article

PolSAR Scene Classification via Low-Rank Constrained Multimodal Tensor Representation

1 School of Artificial Intelligence, Xidian University, No. 2 South Taibai Road, Yanta District, Xi’an 710071, China
2 Key Laboratory of Digital Earth Science, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China
3 CNRS, Grenoble INP, GIPSA-Lab, Université Grenoble Alpes, 38000 Grenoble, France
* Author to whom correspondence should be addressed.
Remote Sens. 2022, 14(13), 3117; https://doi.org/10.3390/rs14133117
Submission received: 12 May 2022 / Revised: 23 June 2022 / Accepted: 27 June 2022 / Published: 28 June 2022
(This article belongs to the Special Issue State-of-the-Art Remote Sensing Image Scene Classification)

Abstract

Polarimetric synthetic aperture radar (PolSAR) data can be acquired at all times and are not impacted by weather conditions. They can efficiently capture geometrical and geographical structures on the ground. However, due to the complexity of the data and their limited availability, PolSAR image scene classification remains a challenging task. To this end, in this paper, a low-rank constrained multimodal tensor representation method (LR-MTR) is proposed to integrate PolSAR data in multimodal representations. To preserve the multimodal polarimetric information simultaneously, the target decompositions in a scene from multiple spaces (e.g., Freeman, H/A/α, Pauli, etc.) are exploited to provide multiple pseudo-color images. Furthermore, a representation tensor is constructed via the representation matrices and constrained by the low-rank norm to keep the cross-information from multiple spaces. A projection matrix is also calculated by minimizing the differences between the whole cascaded data set and the features in the corresponding space. It also reduces the redundancy of those multiple spaces and solves the out-of-sample problem in the large-scale data set. To support the experiments, two new PolSAR image data sets are built via ALOS-2 full polarization data, covering the areas of Shanghai, China, and Tokyo, Japan. Compared with state-of-the-art (SOTA) dimension reduction algorithms, the proposed method achieves the best quantitative performance and demonstrates superiority in fusing multimodal PolSAR features for image scene classification.

Graphical Abstract

1. Introduction

Polarimetric synthetic aperture radar (PolSAR), a kind of multichannel synthetic aperture radar (SAR), has four channels, HH, HV, VH, and VV, which are formed according to the vertical and horizontal transmission and reception of electromagnetic waves. The polarization of the backscattered waves changes after reflection from the targets, so polarimetric sensors provide more information to describe the land surface structure, land covers, geometrical structures, etc. The targets on the ground can be described using a so-called scattering coefficient that represents the interaction of an electromagnetic wave with the targets [1].
Compared with optical sensors, PolSAR can actively obtain more information on land cover by alternately transmitting and receiving electromagnetic waves. With this more comprehensive backscattering information, this multichannel sensor can describe land covers from many perspectives, such as color and texture in pseudo-color images, components in coherent or incoherent target decomposition algorithms, and parameters and coefficients in back-scattering matrices [2,3,4]. Recently, many researchers have devoted themselves to PolSAR land cover classification, segmentation, clustering, and other pixel-level tasks [5,6,7].
With the development of radar imaging techniques, more sensors with high precision and polarimetric channels (such as ALOS-2, GF3, and TerraSAR-X) have been launched. The continual launching of remote sensing satellites produces an ever-increasing amount of data. These satellites provide large amounts of data for learning and produce more high-resolution images, extending the research field from pixel-level to target-level and scene-level tasks [8]. Limited by the difficulty of accessing full polarimetric data, however, PolSAR image scene classification has lagged behind. Nevertheless, there is still huge potential for PolSAR data in scene-level data understanding [9].
PolSAR data have multimodal features in different representation spaces to create a comprehensive description of the targets. The eigenvalue decomposition H/A/α with a Wishart-based distance was adopted for unsupervised classification in the early days [10]. Furthermore, target decomposition theorems (including model-, eigenvector-, or eigenvalue-based incoherent and coherent decompositions) are exploited to provide physical interpretations of the targets, and some feature extraction methods have been proposed to describe the PolSAR samples as comprehensively as possible [1,11,12,13,14,15,16]. In the literature [2], texture was first investigated as a valuable resource for PolSAR classification, and several image feature extraction techniques (such as the gray-level co-occurrence matrix [17,18], Gabor wavelets [19,20], fractal features [21,22], etc.) were analyzed to represent the pixels in the image.
Overall, with those multimodal features, one still needs to analyze their properties and design a comprehensive fusion technique for the corresponding tasks. Existing multi-view subspace clustering (MSC) methods open a new avenue for combining those features by maintaining consistency among different feature spaces in a clustering structure [23]. Low-rank or sparse constraints are exploited to construct the affinity matrix and encourage the sparsity and low-rankness of the solution [24]. Especially for image-level tasks, an image can be represented by a tensor; tensor-based MSC methods have therefore been designed, which may also be constrained by a variety of regularization terms [25]. PolSAR data are represented by features in multiple spaces, and PolSAR images can be formulated as tensors. Therefore, determining how to make full use of those PolSAR tensorial forms and exploit the underlying properties in a multimodal feature space is very significant for PolSAR data interpretation.
Inspired by multi-view subspace clustering algorithms, a novel low-rank constrained multimodal tensor representation method (LR-MTR) is proposed to combine multiple features in PolSAR data for scene classification. Different from most prevalent tensor-based PolSAR or hyperspectral land cover classification methods, which regard each sample as a tensor [26,27,28,29,30], the proposed method constructs all the representation matrices in different spaces as a tensor, as shown in Figure 1. Furthermore, the high-order representation tensor is constrained by a low-rank norm to model the cross-information from multiple different subspaces. This reduces the unnecessary information of subspace representations in calculations and captures the complementary information among those multiple features. A projection matrix is also calculated by minimizing the difference between all the cascaded features and the features in the corresponding space, which reduces the difference between the global and single feature spaces. The learned projection matrix can effectively solve the out-of-sample problem in a large-scale data set. Two typical matrices are alternately updated in the optimization: the affinity matrices learned from the multimodal PolSAR data samples and the projection matrix calculated from the representation tensor. Finally, to produce the final results, K-nearest neighbor (KNN) and support vector machine (SVM) classifiers are applied. To solve the complex minimization problem, an augmented Lagrangian alternating direction minimization (AL-ADM) algorithm is applied [31,32].
This study has four contributions, as follows:
(1) A new dimension reduction algorithm is introduced. It integrates the original multimodal features and extracts an eigenvalue decomposition-based projection matrix. We develop a new framework for image feature extraction in which our method searches for the optimal projection matrix and the optimal subspace simultaneously.
(2) PolSAR coherency/covariance/scattering matrices and various polarimetric target decompositions can be utilized to describe PolSAR data in multimodal feature spaces. To derive a comprehensive representation, the pseudo-color images from the Freeman, Cloude, and Pauli decompositions are provided to represent the data. This paper describes PolSAR data via multimodal images and tests the visual information via some efficient texture extraction methods.
(3) To obtain the correlations among the underlying multimodal data and to aggregate those features, LR-MTR stacks all the subspace representations into a high-order tensor. Under the constraint of a low-rank term, the tensor models the cross-information and yields representation matrices in the multimodal subspace. LR-MTR reduces the redundancy of the learned subspace representation. Meanwhile, the LR-MTR algorithm also achieves better accuracy on PolSAR scene classification.
(4) Due to the difficulty of accessing PolSAR data, studies in PolSAR image scene classification are relatively rare. To effectively investigate this field, we use semantic information to label images according to the actual land cover situation. Two new full-polarization PolSAR data sets covering Shanghai and Tokyo, produced by ALOS-2, are exploited to build scene classification data.
At the algorithm level, LR-MTR is a novel multimodal tensor representation method. It does not directly represent samples in the data domain but constructs a representation tensor under a new tensor low-rank constraint. It learns a representative projection matrix to reduce redundant information and projects all the features from different modalities into the representation space. The fused features are then used for the subsequent PolSAR scene classification.
We organize the remaining parts of this study as follows. Section 2 introduces the related works. In Section 3, we introduce the motivation of our algorithm, the theory and methodology of the proposed algorithm, and the optimal solution. In Section 4, experiments are undertaken to show the efficiency of our algorithm. The experimental analysis and visualization results are used to present our data sets. In Section 5, the conclusion is provided.

2. Related Work

2.1. Multi-View Learning

In the real world, a sample can be described by the features of a multi-view space, the same way a person can be recognized by their face, fingerprints, iris, and gait. Many kinds of multi-view methods have been used in different fields. In general, there are three main kinds of multi-view methods: co-training, co-regularizations, and margin-consistency style algorithms [33,34].
Co-regularized approaches design specific regularization terms for objective functions to explore the complementary information by jointly regularizing the hypotheses within two or more distinct views [35]. The co-training approach is one of the earliest algorithms; it trains alternately to maximize the mutual information and solve the problem of semi-supervised learning [24,36]. The margin-consistency-style algorithm was recently proposed to regularize the margins from multi-view data consistently in large-margin classifiers [37,38]. Apart from the above-mentioned styles, there are further multi-view learning algorithms for specific tasks, which have significant relationships with those three main categories and provide some fine research directions.
Co-regularization algorithms, as a significant branch of multi-view learning, can be divided into several groups according to the criterion used to design the regularization constraint for learning a latent subspace or applying labeled or unlabeled data to maintain consistency in a multi-view space. The first series are canonical correlation analysis (CCA)-based algorithms [39], which can exploit the underlying co-relationships within the multi-view space. Specifically, kernel CCA extends the original CCA to a kernel form [40], and tensor CCA is a high-order form that handles data with any number of views [41]. Unlike CCA, Fisher’s multi-view discriminant analysis, as an extension of linear discriminant analysis (LDA), makes use of labeled data to learn an informative projection matrix that maximizes the inter-class distance and minimizes the intra-class distance [42]. Based on those works, some graph-based methods fuse the multiple graphs built from multi-view features into a common graph before clustering [43]. They all attempt to construct an embedding projection or affinity matrix to represent the graph or similarity among different samples.

2.2. Multi-View Subspace Clustering

Unlike previous multi-view learning algorithms, multi-view subspace clustering (MSC) first calculates the affinity matrix in each view and then constructs the representation tensor. It can capture the high-order correlations of different views instead of independently constructing them. Inspired by the information correlation in MSC, we extend MSC to a linear dimension reduction technique [44], namely LR-MTR. LR-MTR mainly searches for an informative projection matrix for all the viewed data; subspace representation in a multi-view space is achieved as well.
In this section, the MSC model is introduced. The algorithm constructs the affinity matrix from the reconstruction coefficients. Given multi-view data whose features are stacked as columns, let $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{D \times N}$ denote the $D$-dimensional data matrix. The MSC model learns $Z = [z_1, z_2, \ldots, z_N] \in \mathbb{R}^{N \times N}$, where each column $z_i$ is the subspace representation of the sample vector $x_i$. A multi-view subspace representation matrix is computed through the following problem:
$$\min_{Z^{(v)}, E^{(v)}} \ \sum_{v=1}^{V} R\big(Z^{(v)}\big) + \lambda_v \big\|E^{(v)}\big\|_{2,1} \quad \mathrm{s.t.} \quad X^{(v)} = X^{(v)} Z^{(v)} + E^{(v)}, \ v = 1, 2, \ldots, V, \tag{1}$$
where $\lambda_v$ controls the loss penalty of each view and $R(\cdot)$ is the regularizer. $Z^{(v)}$, $X^{(v)}$, and $E^{(v)}$ denote the subspace representation, the data matrix, and the error matrix in the $v$-th view, respectively. $\|E\|_{2,1} = \sum_{j=1}^{n} \big( \sum_{i=1}^{m} E_{i,j}^2 \big)^{1/2}$ is the $\ell_{2,1}$-norm, which measures the loss in Equation (1) and also encourages the columns of $E$ to be zero. $V$ is the number of views. The objective is to find the series of affinity matrices representing the similarities of the samples in each view.
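As a small illustration, the $\ell_{2,1}$-norm and the per-view self-representation residual of Equation (1) can be computed as in the following Python sketch (the dimensions and random data are purely illustrative, not from the paper):

import numpy as np

def l21_norm(E):
    # l2,1-norm: sum of the l2 norms of the columns of E.
    return float(np.sum(np.linalg.norm(E, axis=0)))

# Toy multi-view data: two views of N samples with different feature dimensions.
rng = np.random.default_rng(0)
N = 100
X_views = [rng.standard_normal((D, N)) for D in (24, 45)]  # e.g., GLCM- and LBP-sized features

# Self-representation residual E = X - XZ for one view, given a representation Z.
Z = 0.01 * rng.standard_normal((N, N))
E = X_views[0] - X_views[0] @ Z
print(l21_norm(E))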

2.3. Low-Rank Representation

To find the correlations among multi-view spaces, low-rank regularization can be added to the objective function. As a subspace clustering method, low-rank representation (LRR) captures the low-rank structure of the given data set by solving the following problem:
$$Z^* = \arg\min_{Z} \ \mathrm{rank}(Z), \quad \mathrm{s.t.} \quad X = XZ \tag{2}$$
Since rank minimization is intractable, low-rank regularization in subspace clustering is usually realized through the nuclear norm of a matrix:
$$Z^* = \arg\min_{Z} \ \|Z\|_*, \quad \mathrm{s.t.} \quad X = XZ, \tag{3}$$
in which $\|\cdot\|_*$ stands for the matrix’s nuclear norm. The LRR method is effective for solving robust clustering problems.
However, it cannot be used directly for the linear dimension reduction problem. The rank of a tensor $\mathcal{X}$ is defined as the smallest number of rank-1 tensors whose sum represents $\mathcal{X}$ [45], analogous to the rank of a matrix $X$ being the smallest number of rank-1 matrices that can generate $X$. Since searching for the tensor rank is NP-hard, its practical definition is not unique. Therefore, several low-rank tensor decomposition approaches for approximating higher-order tensors have been proposed [46].
There is no straightforward method to determine the rank of a specific tensor. Thus, a range of approximate approaches based on the CANDECOMP/PARAFAC (CP) decomposition and on the Tucker decomposition have been developed [47,48]. Additionally, the tensor rank approximation problem can be formulated as a convex optimization problem with norm regularizations [49].

3. LR-MTR

In this section, we introduce the motivation of LR-MTR, the specific algorithm, and the optimization. $X^{(1)}, \ldots, X^{(V)}$ denote the multi-view feature data collection. LR-MTR calculates the projection matrix $U$ by minimizing the difference between the whole cascaded data set and the features in the corresponding space, and combines the subspace representation features into a low-rank tensor $\mathcal{Z}$. The projection matrix is used for linear dimension reduction. LR-MTR utilizes the tensor to model the high-order correlations of the multimodal data and obtains a series of representation matrices in the multimodal subspace. An overview of LR-MTR is shown in Figure 1.

3.1. The Motivation of the Method

Most existing representative linear dimension reduction methods mainly use geometric, sample-metric, or class-specific structures to calculate the similarity of samples and learn a projection matrix for feature extraction. For example, the well-known locality preserving projection (LPP) constructs a similarity graph and retains the local relationships among the data; neighbor preserving embedding (NPE) follows the same idea; and linear discriminant analysis (LDA) employs label information to search for a linear combination of features from two or more known categories. These approaches all process the cascaded features without considering the consistency and independence of the multi-view features of one sample. In the real world, one modal feature is not guaranteed to describe a sample effectively, especially for remote sensing data. PolSAR is a multichannel radar sensor, and its data can be represented by several typical modalities, such as color and textures in pseudo-color images, coefficients in incoherent/coherent target decompositions, and free parameters in scattering matrices. PolSAR scene classification, which takes each image as one sample, can make full use of the information in the PolSAR image, including not only the spatial knowledge but also the redundant features provided by PolSAR multichannel descriptors.
Thus, it is important to exploit the multimodal PolSAR features simultaneously for data processing. In different representation spaces, PolSAR data have multimodal features that create a comprehensive description of the targets and can be directly constructed into a high-order tensor, which captures the spatial information of the multi-view data and correlates the multimodal features. Such high-order tensor samples can extend linear dimension reduction algorithms to tensor forms, such as tensor locality preserving projection (TLPP) or tensor neighbor preserving embedding (TNPE), but this results in a huge computational cost for calculating the tensor similarities and constructing tensor graphs. Instead, we use the representation matrices from the multi-view subspaces to construct a tensor, which effectively improves the extraction of correlations among multiple views. Compared with these tensor dimension reduction techniques, LR-MTR constructs a representation tensor instead of a feature tensor, effectively reducing the computational complexity of the optimization.
Therefore, the robustness of LRR to noise, corruption, and occlusion can be fully exploited when learning linear dimension reduction in a robust multi-view subspace. Given that LRR can extract a robust feature matrix Z from the data set, the low-rank representation relation is explored.

3.2. The Main Algorithm

To achieve linear dimension reduction on PolSAR data with sample-specific outliers and corruptions, we assume that the PolSAR data can be approximately reconstructed by an orthogonal projection matrix $U$. Thus, the following objective function can be designed based on Equation (1) in a multi-view subspace:
$$\min_{Z^{(v)}, E^{(v)}} \ \sum_{v=1}^{V} \big\|Z^{(v)}\big\|_* + \lambda_v \big\|E^{(v)}\big\|_{2,1} + \lambda \big\|U^\top X - U^\top X W\big\|_{2,1} + \sum_{v=1}^{V} \big\|S^{(v)}\big\|_F^2,$$
$$\mathrm{s.t.} \quad X^{(v)} = X^{(v)} Z^{(v)} + E^{(v)}, \ v = 1, 2, \ldots, V, \quad U^\top U = I, \quad S^{(v)} = W - Z^{(v)}, \quad X = \big[X^{(1)}; X^{(2)}; \ldots; X^{(V)}\big] \tag{4}$$
This objective function is built to calculate the optimal projection and representation matrices. The term $\|U^\top X - U^\top X W\|_{2,1}$ measures the reconstruction error, and the term $\|S^{(v)}\|_F^2$ minimizes the difference between the subspace representation matrix $Z^{(v)}$ and the overall self-representation matrix $W$. Since the rank minimization problem is NP-hard, the rank function $R(\cdot)$ is replaced with its convex lower bound $\|\cdot\|_*$.
Common approaches to solve the multi-view subspace clustering problem deal with each view feature separately. Considering the correlations of different views, we introduce a tensor nuclear norm to capture hidden related information from different views and learn their subspace representations. The tensor nuclear norm can be written as:
$$\|\mathcal{Z}\|_* = \sum_{m=1}^{M} \xi_m \big\|Z_{(m)}\big\|_*, \tag{5}$$
where the $\xi_m$ are constants satisfying $\xi_m > 0$ and $\sum_{m=1}^{M} \xi_m = 1$, and $\mathcal{Z} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_M}$ is an $M$-order tensor. Unfolding the tensor $\mathcal{Z}$ along the $m$-th mode forms the matrix $Z_{(m)} \in \mathbb{R}^{I_m \times (I_1 \cdots I_{m-1} I_{m+1} \cdots I_M)}$. Through the nuclear norm $\|\cdot\|_*$, we can restrict the tensor to a low rank; the nuclear norm approximates the rank of a matrix because it is its compact convex envelope. The objective function of the low-rank tensor representation can be written as follows:
$$\min_{Z^{(v)}, E^{(v)}} \ \|\mathcal{Z}\|_* + \lambda \|E\|_{2,1},$$
$$\mathrm{s.t.} \quad E = \big[E^{(1)}; E^{(2)}; \ldots; E^{(V)}\big], \quad \mathcal{Z} = \psi\big(Z^{(1)}, Z^{(2)}, \ldots, Z^{(V)}\big), \quad X^{(v)} = X^{(v)} Z^{(v)} + E^{(v)}, \ v = 1, 2, \ldots, V, \tag{6}$$
where the function $\psi(\cdot)$ fuses the different matrices $Z^{(v)}$ into a third-order tensor $\mathcal{Z}$. $E$ is formed by vertically concatenating the error matrices of all views, and the $\ell_{2,1}$-norm encourages its columns to be zero. The underlying assumption is that corruption is sample-specific, i.e., some instances may be corrupted. In the integrated approach, the $E^{(v)}$ are constrained by a consistent, common quantity. To reduce the variation in error magnitude between views, the data matrices are normalized so that the errors of different views have the same scale: each sample is normalized as $x_i \leftarrow x_i / \|x_i\|_2$. The lowest-rank self-representation matrix is obtained from this objective function.
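To make the mode-$m$ unfolding and the weighted tensor nuclear norm of Equation (5) concrete, here is a minimal numpy sketch; stacking the per-view representation matrices into a third-order tensor mirrors the role of $\psi(\cdot)$, and all dimensions are illustrative assumptions:

import numpy as np

def unfold(T, mode):
    # Mode-m unfolding: move axis `mode` to the front and flatten the remaining axes.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def tensor_nuclear_norm(T, xi=None):
    # Weighted sum of the nuclear norms of all mode unfoldings, as in Equation (5).
    M = T.ndim
    xi = np.full(M, 1.0 / M) if xi is None else np.asarray(xi)
    return sum(w * np.linalg.norm(unfold(T, m), ord='nuc') for m, w in enumerate(xi))

# Stack V = 3 per-view N x N representation matrices (made low-rank here) into a tensor.
rng = np.random.default_rng(0)
N, V, r = 50, 3, 5
Z = np.stack([rng.standard_normal((N, r)) @ rng.standard_normal((r, N)) for _ in range(V)])
print(tensor_nuclear_norm(Z))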
Based on Equations (1) and (5), an objective function is designed to calculate the projection matrix $U$ by minimizing the difference between the whole cascaded data set and the features in the corresponding subspace. We combine this way of calculating the projection matrix with the higher-order tensor construction above to achieve both high-order correlation among the multi-view data and linear dimension reduction. To adopt the alternating-direction minimization strategy and make the equation separable, auxiliary variables $G_m$ are used to replace the unfoldings $Z_{(m)}$ in the tensor nuclear norm. The objective function is as follows:
$$\min_{U, W} \ \lambda_1 \big\|U^\top X - U^\top X W\big\|_{2,1} + \lambda_2 \|E\|_{2,1} + \sum_{m=1}^{M} \gamma_m \|G_m\|_* + \sum_{v=1}^{V} \big\|S^{(v)}\big\|_F^2,$$
$$\mathrm{s.t.} \quad P_m z = g_m, \ m = 1, 2, \ldots, M, \quad E = \big[E^{(1)}; E^{(2)}; \ldots; E^{(V)}\big], \quad \mathcal{Z} = \psi\big(Z^{(1)}, Z^{(2)}, \ldots, Z^{(V)}\big),$$
$$X^{(v)} = X^{(v)} Z^{(v)} + E^{(v)}, \ v = 1, 2, \ldots, V, \quad U^\top U = I, \quad S^{(v)} = W - Z^{(v)}, \quad X = \big[X^{(1)}; X^{(2)}; \ldots; X^{(V)}\big] \tag{7}$$
where $\gamma_m$ is the strength of the nuclear norm, $z$ is the vectorization of $\mathcal{Z}$, and $g_m$ is the vectorization of $G_m$. $P_m$ is the alignment matrix corresponding to the mode-$m$ unfolding, which aligns the corresponding elements between $Z_{(m)}$ and $G_m$. The feature matrix $X = [X^{(1)}; X^{(2)}; \ldots; X^{(V)}]$ is arranged row-wise from the matrices $X^{(v)}$ of the different modalities. Since each $G_m$ is low-rank, the first constraint ensures that the solution $\mathcal{Z}$ is low-rank. Meanwhile, the multi-view features representing the samples are constrained by the representation matrices through the low-rank tensor norm. The projection matrix $U$ is calculated to minimize the difference between the whole cascaded data set and the features in the corresponding subspace.
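In effect, $P_m$ is a permutation matrix that maps the vectorized tensor $z = \mathrm{vec}(\mathcal{Z})$ onto the vectorized mode-$m$ unfolding. The following sketch builds such a matrix explicitly for illustration (it assumes row-major vectorization; in practice the permutation would be applied without forming $P_m$):

import numpy as np

def alignment_matrix(shape, mode):
    # Permutation matrix P_m mapping vec(Z) to vec(Z_(m)), the mode-m unfolding.
    n = int(np.prod(shape))
    idx = np.arange(n).reshape(shape)             # flat index of each tensor element
    perm = np.moveaxis(idx, mode, 0).reshape(-1)  # element order after unfolding
    P = np.zeros((n, n))
    P[np.arange(n), perm] = 1.0
    return P

# Check that P_m vec(Z) equals vec(unfold(Z, m)) on a random third-order tensor.
Z = np.random.default_rng(0).standard_normal((4, 5, 6))
P1 = alignment_matrix(Z.shape, 1)
assert np.allclose(P1 @ Z.reshape(-1), np.moveaxis(Z, 1, 0).reshape(-1))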

3.3. The Optimal Solution

We introduce the optimization of our method in this part. There is no way to attain the optimal solutions of the two variables in the model simultaneously, so an alternating method is proposed. The iterative algorithm has two steps: fix U to calculate the optimal W, and then fix W to calculate U.
Step 1: Fix U to calculate W
In Step 1, the AL-ADM method is used for the optimization problem of Equation (7) to minimize the Lagrangian function below:
$$L_{\mu>0}\big(Z^{(1)}, \ldots, Z^{(V)}; E^{(1)}, \ldots, E^{(V)}; S^{(1)}, \ldots, S^{(V)}; G_1, \ldots, G_M; E_h\big) = \lambda_1 \|E_h\|_{2,1} + \lambda_2 \|E\|_{2,1} + \sum_{m=1}^{M} \Big( \gamma_m \|G_m\|_* + \mu \Phi\big(\alpha_m, P_m z - g_m\big) \Big) + \sum_{v=1}^{V} \Big( \big\|S^{(v)}\big\|_F^2 + \mu \Phi\big(Y_v^\top, X^{(v)} - X^{(v)} Z^{(v)} - E^{(v)}\big) + \mu \Phi\big(R_v^\top, S^{(v)} - W + Z^{(v)}\big) \Big) + \mu \Phi\big(\beta^\top, U^\top X - U^\top X W - E_h\big) \tag{8}$$
in which $\mu > 0$ is a penalty parameter, $\|\cdot\|_F^2$ stands for the squared Frobenius norm of a matrix, and $Y_v$ and $\alpha_m$ are the Lagrangian multipliers. $\Phi$ is defined as $\Phi(Y, C) = \langle Y, C \rangle + \frac{1}{2}\|C\|_F^2$, where $\langle \cdot, \cdot \rangle$ represents the matrix inner product. The above problem is unconstrained; by fixing the other variables, we can minimize it in turn with respect to the variables $Z^{(v)}$, $S^{(v)}$, $G_m$, $E$, and $E_h$, and the Lagrangian multipliers $Y_v$ and $\alpha_m$ are then updated. In each iteration, we update each variable through the following equations:
1. $Z^{(v)}$ subproblem:
The subspace representation $Z^{(v)}$ can be updated through the following subproblem:
$$Z^{(v)*} = \arg\min_{Z^{(v)}} \ \sum_{m=1}^{M} \mu \Phi\big(\alpha_m, P_m z - g_m\big) + \mu \Phi\big(Y_v^\top, X^{(v)} - X^{(v)} Z^{(v)} - E^{(v)}\big) + \mu \Phi\big(R_v^\top, S^{(v)} - W + Z^{(v)}\big)$$
$$= \arg\min_{Z^{(v)}} \ \sum_{m=1}^{M} \mu \Phi\big(\Omega_v(\alpha_m^\top), \Omega_v(P_m z - g_m)\big) + \mu \Phi\big(Y_v^\top, X^{(v)} - X^{(v)} Z^{(v)} - E^{(v)}\big) + \mu \Phi\big(R_v^\top, S^{(v)} - W + Z^{(v)}\big) \tag{9}$$
where $\Omega_v(\cdot)$ is the operation that selects the elements corresponding to the $v$-th view and reshapes them into a matrix. Thus, $Z^{(v)}$ is obtained as follows:
$$Z^{(v)*} = \Big( (M+1) I + X^{(v)\top} X^{(v)} \Big)^{-1} \Big( \sum_{m=1}^{M} B_m^{(v)} - \sum_{m=1}^{M} A_m^{(v)} + X^{(v)\top} X^{(v)} - X^{(v)\top} E^{(v)} + X^{(v)\top} Y_v - S^{(v)} + W - R_v \Big) \tag{10}$$
with
$$A_m^{(v)} = \Omega_v(\alpha_m), \qquad B_m^{(v)} = \Omega_v(g_m) \tag{11}$$
An $M$-way tensor has $M$ unfolding modes. For each mode in our experiments, the $N \times N$ elements corresponding to the $v$-th view are selected by $\Omega_v(\cdot)$ and reshaped into the $N \times N$ matrices $A_m^{(v)}$ and $B_m^{(v)}$ corresponding to $Z^{(v)}$.
2. $z$ subproblem:
We update $z$ directly from the $Z^{(v)}$, i.e., $z^* \leftarrow \{Z^{(v)}\}$. Since $E^{(v)}$ and $Y_v$ in each iteration have no relation to the other views, each $Z^{(v)}$ is updated independently; once all the $Z^{(v)}$ are obtained, $z$ can be updated.
3. E subproblem:
The error matrix E can be updated by the following:
$$E^* = \arg\min_{E} \ \lambda_2 \|E\|_{2,1} + \sum_{v=1}^{V} \mu \Phi\big(Y_v^\top, X^{(v)} - X^{(v)} Z^{(v)} - E^{(v)}\big) = \arg\min_{E} \ \frac{\lambda_2}{\mu} \|E\|_{2,1} + \frac{1}{2} \big\|E - F\big\|_F^2 \tag{12}$$
where $F$ is formed by vertically concatenating the matrices $X^{(v)} - X^{(v)} Z^{(v)} + Y_v$ along the columns. The above subproblem can be solved via Lemma 3.2 in [50].
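For reference, the closed-form solution given by that lemma is a column-wise shrinkage, sketched below in numpy (the threshold $\tau$ plays the role of $\lambda_2 / \mu$ here):

import numpy as np

def prox_l21(F, tau):
    # Solves argmin_E tau * ||E||_{2,1} + 0.5 * ||E - F||_F^2 column by column:
    # columns with l2 norm <= tau become zero; the rest are shrunk toward zero.
    norms = np.linalg.norm(F, axis=0)
    scale = np.maximum(norms - tau, 0.0) / np.where(norms > 0, norms, 1.0)
    return F * scale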
4. $Y_v$ subproblem:
In each iteration, the multiplier $Y_v$ can be updated by the following:
$$Y_v^* = Y_v + \big( X^{(v)} - X^{(v)} Z^{(v)} - E^{(v)} \big) \tag{13}$$
5. $W$ subproblem:
The optimizer $W$ can be updated through the following subproblem:
$$W^* = \arg\min_{W} \ \mu \Phi\big(\beta^\top, U^\top X - U^\top X W - E_h\big) + \mu \Phi\big(R_v^\top, S^{(v)} - W + Z^{(v)}\big) \tag{14}$$
Taking the derivative of the above formula and setting it to zero, the solution of $W$ is obtained as follows:
$$W^* = \Big( I + X^\top U U^\top X \Big)^{-1} \Big( X^\top U U^\top X + X^\top U \beta - X^\top U E_h + S^{(v)} + Z^{(v)} + R_v \Big) \tag{15}$$
6. $R_v$ subproblem:
In each iteration, the multiplier $R_v$ is updated by the following:
$$R_v^* = R_v + \big( S^{(v)} - W + Z^{(v)} \big) \tag{16}$$
7. $S^{(v)}$ subproblem:
The optimizer $S^{(v)}$ can be updated through the following subproblem:
$$S^{(v)*} = \arg\min_{S^{(v)}} \ \sum_{v=1}^{V} \big\|S^{(v)}\big\|_F^2 + \sum_{v=1}^{V} \mu \Phi\big(R_v^\top, S^{(v)} - W + Z^{(v)}\big) = \arg\min_{S^{(v)}} \ \frac{1}{\mu} \sum_{v=1}^{V} \big\|S^{(v)}\big\|_F^2 + \frac{1}{2} \big\|S^{(v)} - \big(W - Z^{(v)} - R_v\big)\big\|_F^2 \tag{17}$$
The solution of $S^{(v)}$ is obtained by the following:
$$S^{(v)*} = \mu \big(W - Z^{(v)} - R_v\big) \big(2I + \mu I\big)^{-1} = \frac{\mu}{2 + \mu} \big( W - Z^{(v)} - R_v \big) \tag{18}$$
8. $E_h$ subproblem:
The error matrix $E_h$ can be optimized by the following equation:
$$E_h^* = \arg\min_{E_h} \ \lambda_1 \|E_h\|_{2,1} + \mu \Phi\big(\beta^\top, U^\top X - U^\top X W - E_h\big) = \arg\min_{E_h} \ \frac{\lambda_1}{\mu} \|E_h\|_{2,1} + \frac{1}{2} \big\|E_h - \big(U^\top X - U^\top X W + \beta\big)\big\|_F^2 \tag{19}$$
Similarly to the update of $E$, the solution of $E_h$ can be obtained by vertically concatenating the matrices $U^\top X - U^\top X W + \beta$ and applying the same column-wise shrinkage.
9. $\beta$ subproblem:
The multiplier $\beta$ can be updated by the following:
$$\beta^* = \beta + \big( U^\top X - U^\top X W - E_h \big) \tag{20}$$
10. $G_m$ subproblem:
$G_m$ can be updated by the following:
$$G_m^* = \arg\min_{G_m} \ \gamma_m \|G_m\|_* + \mu \Phi\big(\alpha_m, P_m z - g_m\big) = \mathrm{prox}^{tr}_{\beta_m}\big(\Omega_m(P_m z + \alpha_m)\big) \tag{21}$$
where $\Omega_m(P_m z + \alpha_m)$ denotes reshaping the vector $P_m z + \alpha_m$ into the matrix corresponding to the $m$-th mode unfolding. $\mathrm{prox}^{tr}_{\beta_m}(L) = U \max(S - \beta_m, 0) V^\top$ is the spectral soft-thresholding operation, where $\beta_m = \gamma_m / \mu$ denotes the threshold and $L = U S V^\top$ is the singular value decomposition (SVD) of the matrix $L$.
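A minimal numpy sketch of this spectral soft-thresholding (singular value thresholding) operator:

import numpy as np

def svt(L, tau):
    # Proximal operator of tau * ||.||_*: shrink the singular values of L by tau.
    U, s, Vt = np.linalg.svd(L, full_matrices=False)
    return (U * np.maximum(s - tau, 0.0)) @ Vt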
11. $g_m$ subproblem:
We replace the corresponding elements to update $g_m$:
$$g_m^* \leftarrow \mathrm{vec}(G_m) \tag{22}$$
12. $\alpha_m$ subproblem:
The variable $\alpha_m$ can be updated by the following:
$$\alpha_m^* = \alpha_m + \big( P_m z - g_m \big) \tag{23}$$
Step 2: Fix W to calculate U
When W is given, the problem is transformed into the following:
$$U^* = \arg\min_{U} \ \big\|U^\top X - U^\top X W\big\|_{2,1} \quad \mathrm{s.t.} \quad U^\top U = I \tag{24}$$
This equation can be solved by $\ell_{2,1}$-norm minimization in two steps. First, we calculate a diagonal matrix $H$:
$$H_{ii} = \frac{1}{2 \big\| \big(U^\top X - U^\top X W\big)_i \big\|_2} \tag{25}$$
in which $(\cdot)_i$ denotes the $i$-th column of the matrix. With $H$, Equation (24) can be transformed into the equivalent trace minimization problem:
$$U^* = \arg\min_{U} \ \mathrm{tr}\Big( U^\top X (I - W) H (I - W)^\top X^\top U \Big) \quad \mathrm{s.t.} \quad U^\top U = I \tag{26}$$
The optimal solution of Equation (26) is given by the following eigenvalue problem:
$$X (I - W) H (I - W)^\top X^\top u = \alpha u \tag{27}$$
in which $\alpha$ is the eigenvalue and $u$ is the corresponding eigenvector. The optimal solution $U^*$ consists of the eigenvectors corresponding to the smallest non-zero eigenvalues.
Algorithm 1 gives the steps of the optimal solution. To facilitate better understanding, the block diagram of the LR-MTR algorithm is shown in Figure 2.
Algorithm 1 Algorithm LR-MTR.
Input: multimodal feature matrices $X^{(1)}, X^{(2)}, \ldots, X^{(V)}$, the number of iterations $T$, the cluster number $K$, and the parameters $\lambda_1$ and $\lambda_2$.
Output: the optimal projection $U$.
Initialize: set $G = I$ and $U$ as a matrix with orthogonal column vectors.
for $t = 1, 2, \ldots, T$ do
  Initialize: $E_h = 0$; $W = 0$; $\beta = 0$; $R_1 = \cdots = R_V = 0$;
   $Z^{(1)} = \cdots = Z^{(V)} = 0$; $S^{(1)} = \cdots = S^{(V)} = 0$;
   $E^{(1)} = \cdots = E^{(V)} = 0$; $Y_1 = \cdots = Y_V = 0$;
   $\alpha_1 = \alpha_2 = \cdots = \alpha_M = 0$; $G_1 = \cdots = G_M = 0$;
   $\mu = 10^{-6}$; $\rho = 1.1$; $\epsilon = 10^{-8}$; $\mathrm{max}_\mu$ set to a large constant.
  while not converged do
   for each of the $V$ views do
    Update $Z^{(v)}$, $E^{(v)}$, $Y_v$, $W$, $R_v$, $S^{(v)}$ according to Equations (10), (12), (13), (15), (16), and (18);
   end for
   Update $z$, $E_h$, $\beta$ by Equations (19) and (20);
   for each of the $M$ modes do
    Update $G_m$, $g_m$, and $\alpha_m$ according to Equations (21), (22), and (23);
   end for
   Update $\mu$ by $\mu = \min(\rho \mu, \mathrm{max}_\mu)$;
   Check the convergence conditions:
    $\|X^{(v)} - X^{(v)} Z^{(v)} - E^{(v)}\| < \epsilon$,
    $\|S^{(v)} - W + Z^{(v)}\| < \epsilon$,
    $\|U^\top X - U^\top X W - E_h\| < \epsilon$, and
    $\|P_m z - g_m\| < \epsilon$;
  end while
  Update $H$ using Equation (25).
  Solve the eigenvalue problem of Equation (27) to obtain the optimal $U$.
end for
Project the samples onto the low-dimensional subspace to obtain the low-dimensional features $\tilde{X} = U^\top X$ for classification.
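For orientation, the outer alternation of Algorithm 1 can be sketched as follows. This is a minimal runnable illustration, not the reference implementation: Step 1 is deliberately simplified to a ridge-regularized self-representation as a stand-in for the full AL-ADM inner loop, while Step 2 follows Equations (25)-(27):

import numpy as np

def lr_mtr_outer(X_views, d, T=3, lam=1.0, eps=1e-8):
    # Cascade the V views row-wise, as in the constraint on X in Equation (7).
    X = np.vstack(X_views)
    D, N = X.shape
    rng = np.random.default_rng(0)
    U = np.linalg.qr(rng.standard_normal((D, d)))[0]
    for _ in range(T):
        # Step 1 (stand-in): W = argmin ||U^T X - U^T X W||_F^2 + lam * ||W||_F^2.
        Y = U.T @ X
        W = np.linalg.solve(Y.T @ Y + lam * np.eye(N), Y.T @ Y)
        # Step 2: reweighting (Equation (25)) and eigenvalue problem (Equations (26) and (27)).
        R = Y - Y @ W
        h = 1.0 / (2.0 * np.maximum(np.linalg.norm(R, axis=0), eps))
        A = X @ (np.eye(N) - W) @ np.diag(h) @ (np.eye(N) - W).T @ X.T
        vals, vecs = np.linalg.eigh(A)
        U = vecs[:, vals > eps][:, :d]  # eigenvectors of the smallest non-zero eigenvalues
    return U, U.T @ X                   # projection matrix and reduced features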

3.4. Complexity Analysis

In the optimization procedure, a two-level iteration is used to calculate the results. The outer loop comprises the calculations of the projection matrix $U$ and the affinity matrix $W$. When calculating $W$, a sub-loop is needed to iteratively update every parameter, including $Z^{(v)}$, $S^{(v)}$, $G_m$, $E$, $E_h$, and the Lagrangian multipliers $Y_v$. First, to calculate $W$, the ALM optimization involves several sub-optimization problems; the main computational complexity, $O(nmn)$, comes from the matrix multiplications in updating $Z^{(v)}$, $W$, and $S^{(v)}$. Second, to calculate $U$, the computational complexity comes from the eigen-decomposition of Equation (27), which costs $O(n^3)$. If LR-MTR converges in $T$ steps, the computational complexity is, at most, $O(Tn^3 + Ttnmn)$, where $t$ is the number of inner ALM iterations. The outer iteration converges quickly (three iterations at most), so the computational burden is not too high.

3.5. Convergence Analysis

The objective function of LR-MTR is solved through alternating iterative optimization. We divide this optimization into two parts, one for updating $W$ and another for $U$. The sub-optimization of $W$ is an AL-ADM process, and $U$ can be updated via an eigenvalue decomposition [51,52].
Denote the objective function of Equation (7) by $J(W, U)$. Then the following theorem can be given.
Theorem 1.
The sequence J ( W t , U t ) monotonically decreases in the iterative scheme.
Proof. 
Suppose that, in the t-th iteration of the outer loop, we have the result from Equation (8) as follows:
$$J(W_t, U_t) \triangleq J\big(W_t, U_t; E, \{G_m\}_{m=1}^{M}, \{S^{(v)}\}_{v=1}^{V}\big)$$
That is, $W_t$ is updated in the inner loop together with the parameters $E$, $\{G_m\}_{m=1}^{M}$, and $\{S^{(v)}\}_{v=1}^{V}$ according to the AL-ADM method, which reduces the value of $J(W_t, U_t)$ in both loops, and an optimized $W$ is obtained. The convergence properties of the inner loop can also be proven as in the literature [53]. Then we have
$$J(W_t, U_t) \triangleq J\big(W_t, U_t; E, \{G_m\}_{m=1}^{M}, \{S^{(v)}\}_{v=1}^{V}\big) \geq J\big(W_{t+1}, U_t; E, \{G_m\}_{m=1}^{M}, \{S^{(v)}\}_{v=1}^{V}\big) \triangleq J(W_{t+1}, U_t)$$
Hence, $J(W_t, U_t) \geq J(W_{t+1}, U_t)$. After updating $W$, we further update $U$ via Equation (24). A diagonal matrix $H$ is constructed via Equation (25), and the matrix $U$ is obtained from the standard eigenvalue problem of Equation (26). We have $J(W_{t+1}, U_t) \triangleq J(W_{t+1}, U_t, H_{t+1})$, and the update further reduces the objective value as follows:
$$J(W_{t+1}, U_t, H_{t+1}) \geq J(W_{t+1}, U_{t+1}, H_{t+1}) \geq J(W_{t+1}, U_{t+1})$$
It can be concluded from the above formulations that $J(W_{t+1}, U_t) \geq J(W_{t+1}, U_{t+1})$ and $J(W_t, U_t) \geq J(W_{t+1}, U_{t+1})$, i.e., the sequence is monotonically decreasing. Since the objective function has a positive lower bound and decreases monotonically by Theorem 1, the sequence $J(W_t, U_t)$ converges to a local optimal solution. □

4. Experiments

To test the effectiveness of the LR-MTR algorithm, we run several experiments and compare the proposed method with previous algorithms. Two PolSAR data sets are used to demonstrate the LR-MTR algorithm, and SVM and KNN classifiers are used for the final classification. In our experiments, terrain classification is performed by labeling each tile in a large PolSAR image with predefined terrain category labels. In high-resolution, large-scale PolSAR image scenes, patch-level classification is more effective than pixel-level classification, as patches contain richer terrain details than a single pixel can provide.

4.1. Data Sets

The LR-MTR method was evaluated on two data sets. The first covers the region of Tokyo, Japan, in January 2015. The geographical location of the Tokyo data set and the scene categories are presented in Figure 3. The Tokyo data set comes from ALOS-2 fully polarimetric SAR data with a spatial resolution of six meters. The size of the PolSAR scene used in the experiments is 14,186 × 16,514 pixels.
For the typical land covers, water and smooth surfaces correspond to surface scattering, rough covers to diffuse scattering, thick tree trunks and buildings to dihedral scattering, and areas of foliage to volume scattering. Patches covering those typical land covers, or jointly constructed by them, are selected as the terrain categories. Five main terrain categories are defined in the Tokyo data set: building, water, woodland, coast, and farmland; patches not associated with any of these are counted as a final, uncertain category. The Tokyo data set is labeled with reference to the categories above. The data are then preprocessed to construct the PolSAR scene data set: first, geographic calibration of the raw data is performed in the PolSARpro software; second, Lee filtering is applied; finally, the large image is cut into small scene patches (samples) of 256 × 256 pixels each, and the cut samples are labeled. In total, we obtain 2000 samples in the Tokyo data set: 855, 697, 205, 187, and 56 scenes for the building, water, woodland, coast, and farmland categories, respectively.
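A minimal sketch of this patch-cutting step (non-overlapping 256 × 256 tiles; dropping edge remainders is an assumption of this illustration):

def cut_patches(scene, size=256):
    # Tile a large scene (an H x W x C array) into non-overlapping size x size samples.
    H, W = scene.shape[:2]
    return [scene[r:r + size, c:c + size]
            for r in range(0, H - size + 1, size)
            for c in range(0, W - size + 1, size)]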
To extract the multimodal features, we apply the parameters from polarimetric decomposition. Furthermore, to extract features hidden in the polarimetric phases, we use three polarimetric parameter groups: Pauli decomposition, H/alpha decomposition, and Freeman decomposition. Pauli decomposition is a typical coherent decomposition, in which the scattering matrix $S$ is expressed as the complex sum of the Pauli matrices.
Figure 4 shows four images from different categories. The first column shows the Pauli image of each category, where red comes from $|S_{HH} - S_{VV}|$, green from $|S_{HV} + S_{VH}|$, and blue from $|S_{HH} + S_{VV}|$. The second column is the Cloude decomposition pseudo-color image of each category, where red represents $T_{22}$, green $T_{33}$, and blue $T_{11}$, as described in Figure 6.10 in [1]. The third column is the Freeman decomposition pseudo-color image of each category, in which red is Odd, green is Dbl, and blue is Vol. The fourth column presents the category in Google Earth.
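A minimal sketch of how such a Pauli pseudo-color composite can be formed from the complex scattering channels (the simple global normalization at the end is an assumption of this illustration; per-channel clipping or log scaling is also common):

import numpy as np

def pauli_rgb(S_hh, S_hv, S_vh, S_vv):
    # Pauli composite: red |S_HH - S_VV|, green |S_HV + S_VH|, blue |S_HH + S_VV|.
    rgb = np.stack([np.abs(S_hh - S_vv),
                    np.abs(S_hv + S_vh),
                    np.abs(S_hh + S_vv)], axis=-1)
    return rgb / np.maximum(rgb.max(), 1e-12)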
The second data set covers the region of Shanghai, China, in March 2015. The geographical location of the Shanghai data set and the scene categories are presented in Figure 5. The Shanghai data set also comes from ALOS-2, with the same resolution as the Tokyo data set. The size of the labeled PolSAR scene is 8300 × 11,960 pixels, and the data set has five main categories: urban areas, suburban areas, farmland, water, and coast. Each sample has 256 × 256 pixels, and there are 1600 samples in total: 262, 346, 378, 417, and 197 scenes for the urban areas, suburban areas, farmland, water, and coast categories, respectively. In the Shanghai data set, we also use polarimetric decomposition to extract multimodal features; four example images from different categories are shown in Figure 6, analogous to the Tokyo data set.

4.2. Feature Descriptors

To evaluate the classification performance, we use two feature descriptors for comparison: the local binary pattern (LBP) and the gray-level co-occurrence matrix (GLCM). The dimensions of the GLCM and LBP features are eight and fifteen, respectively. The original LBP operator works in a 3 × 3 window: the eight neighboring pixels are compared with the center pixel, whose gray value serves as the threshold. If a neighboring pixel’s value is greater than the threshold, its position is set to 1; otherwise, it is set to 0. The eight comparisons in the 3 × 3 window thus generate an 8-bit binary number, which is taken as the LBP value of the center pixel and captures the texture information of the region of interest.
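A minimal numpy sketch of this original 3 × 3 LBP operator (the clockwise bit ordering of the neighbors is an assumption of this illustration):

import numpy as np

def lbp_image(gray):
    # Threshold the 8 neighbors against each center pixel and pack the bits into a code.
    g = np.asarray(gray, dtype=float)
    center = g[1:-1, 1:-1]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dr, dc) in enumerate(offsets):
        neigh = g[1 + dr:g.shape[0] - 1 + dr, 1 + dc:g.shape[1] - 1 + dc]
        code |= (neigh > center).astype(np.uint8) << bit
    return code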

4.3. Experiment Setup

To evaluate the LR-MTR algorithm, we compare it with several SOTA methods in the experiments: linear dimension reduction algorithms (multidimensional scaling (MDS), independent component analysis (ICA), and principal component analysis (PCA)), linearizations of manifold learning methods (locality preserving projection (LPP), neighbor preserving embedding (NPE), and locally linear embedding (LLE)), and nonlinear dimension reduction methods (kernel principal component analysis (KPCA) and stochastic neighbor embedding (SNE)). We set the number of neighbors in the graph embedding algorithms to 15. To ensure a fair comparison, the features of all algorithms are reduced to the same fixed dimension. Before the experiments, we need to ascertain the influence of the parameters, and the feature dimensions of the data sets also influence algorithm performance. During data processing, training samples were selected randomly from the whole set of samples. Moreover, the number of training samples is important for learning the projection matrices and also needs to be considered. We selected L images from each category for training, and the other images in the data set were used as the test set. We set different L according to the scale of each category in the two data sets, i.e., L = 10, 20, 30, 40, 50, 60, and 70 for the Tokyo and Shanghai databases. In both data sets, the M parameters of LR-MTR were set to the same value, i.e., $\gamma_1 = \gamma_2 = \cdots = \gamma_M = \gamma$, and the parameter $\gamma$ was tuned. Each task was conducted five times, and the mean accuracy was recorded. For each comparative experiment, we tuned the parameters to their best performance. Overall accuracy (OA), average accuracy (AA), and Kappa coefficients are used to report the performance.
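A minimal sketch of the per-category random split used in this protocol (a hypothetical helper, not the authors’ code):

import numpy as np

def split_per_class(labels, L, seed=0):
    # Randomly pick L training indices per category; the remainder form the test set.
    rng = np.random.default_rng(seed)
    train, test = [], []
    for c in np.unique(labels):
        idx = rng.permutation(np.where(labels == c)[0])
        train.extend(idx[:L])
        test.extend(idx[L:])
    return np.array(train), np.array(test)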

4.4. Experimental Result

As discussed in the last subsection, we used eight dimension reduction methods to process the 24- and 45-dimensional features for comparison. These features are composed of three modes of GLCM and LBP features, and the high-dimensional features were reduced to the same dimension. The final results are recorded on the two PolSAR data sets, which have different typical terrain categories. The classification results and maps are shown in Table 1, Table 2, Table 3 and Table 4, Figure 7 and Figure 8. The confusion matrices in Figure 7 show that our method has good classification ability on the different categories of both data sets. The classification map in Figure 8 is randomly selected from one of the repeated experiments. Fifty samples were randomly selected to train the projection matrix U for each linear dimension reduction algorithm, and 50 samples were used to train the classifiers (KNN and SVM). The parameter k, the number of nearest neighbors, is set to 21 in the KNN classifier. The SVM classifier uses cross-validation to seek the optimal model parameters. In the following experimental analysis, the KNN classification map is used to show the visualization results.
From the classification results and classification maps, it is clear that the LR-MTR algorithm outperforms the other SOTA algorithms on all indices. For OA, the LR-MTR algorithm is approximately 2%, 2%, and 1% better than the second-best algorithms. This reveals that LR-MTR is an effective algorithm for PolSAR scene classification. In Table 1, Table 2, Table 3 and Table 4, we assign the terrain categories corresponding numbers. In Table 1 and Table 2, the numbers 1–5 represent building, farmland, water, coast, and woodland, respectively. In Table 3 and Table 4, the numbers 1–5 represent urban areas, suburban areas, water, farmland, and coast, respectively.

4.5. Experiment Analysis

(1) Dimension: Fifty training samples were randomly selected from each category and used to learn the projection matrix U. As the dimension varies over 1, 3, 5, …, 15, the KNN results change accordingly, as shown in Figure 8. The LR-MTR method achieves better classification accuracy than the other dimension reduction algorithms when the number of dimensions is greater than seven. When it is greater than three, all algorithms except NPE show improved classification performance, and once the performance stabilizes, all algorithms achieve good results.
(2) Parameter Sensitivity: Figure 9 shows parameter tuning examples. In the LR-MTR algorithm, the balance parameter γ is the most important parameter in the model. From Figure 9, it is clear that γ affects the results on both data sets; a larger γ is needed on the Shanghai data set than on the Tokyo data set for promising results. The results vary as the parameter increases, and there is an optimal value of γ, but this optimal value differs between data sets.
(3) Convergence: To demonstrate the convergence of the LR-MTR algorithm, Figure 10 shows the objective values versus the iteration steps on the two representative data sets. The convergence of LR-MTR on the Tokyo and Shanghai data sets is shown in Figure 10a,b, where errors 1–4 correspond to the computed values of the four convergence conditions in Algorithm 1, respectively. All four convergence errors decrease as the number of iterations increases. Generally, the LR-MTR method converges within 40 to 60 iterations. In the experiments, the outer loop iteration is set to two or three.
(4) Running Time: LR-MTR and the competing algorithms were run with the same number of samples (50 samples per category), and the time was recorded, as shown in Table 5. Compared with the other algorithms, our algorithm is not competitive in running time, as it spends more time improving the feature representation. This is because the LR-MTR algorithm is based on an optimization with two loops, which takes more time to converge. The running time could be further reduced by improving the optimization.
(5) Generalization: To test the generalization of the proposed method, we used the projection matrix learned on one data set for classification on the other. For a fair comparison, the DR techniques, including PCA, LPP, NPE, ICA, KPCA, LLE, SNE, and MDS, all used the same 50 samples from each category for learning the projection matrices. The OA and per-category accuracy are given. Table 6 shows the results of projection matrices learned on the Tokyo data set and used on the Shanghai data set, while Table 7 shows the results of projection matrices learned on the Shanghai data set and used on the Tokyo data set. In these tests, we chose 50 samples from one data set as training samples and used the other samples for testing. In these experiments, our algorithm achieves the best classification OA: 79.81% (U learned on Tokyo, used on Shanghai) and 87.35% (U learned on Shanghai, used on Tokyo). This is due to its ability to capture the correlations among features, which shows that LR-MTR also generalizes better than the other DR techniques.

5. Conclusions

The LR-MTR method is proposed to coordinate PolSAR data in a multi-view representation and solve the PolSAR scene classification problem. The method keeps the spatial and polarimetric information simultaneously by jointly exploiting the complementary information of the multi-view data. We construct a high-order representation tensor to capture the correlations underlying the multi-view data, constrained by the low-rank norm to model the cross-information from multiple spaces. To solve the out-of-sample problem in the large-scale data set, a projection matrix is calculated. The experimental results show that the LR-MTR algorithm has advantages on the two ALOS-2 PolSAR data sets. The low-rank representation has strong robustness, which ensures that the proposed method can acquire high-quality classification results. Compared with the currently popular vector dimensionality reduction methods (PCA, LLE, SNE, MDS, etc.), LR-MTR takes into account the associations of the data from different perspectives, which better captures the features that benefit scene classification and achieves better classification results. In future research, we will focus on feature extraction techniques for the data, apply LR-MTR to different tasks, and optimize the time and computational complexity.

Author Contributions

Conceptualization, B.R. and M.C.; Funding acquisition, B.H. and L.J.; Project administration, B.H.; Software, M.C. and S.M.; Writing—original draft, B.R., M.C. and S.M.; Writing—review and editing, D.H. and J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Natural Science Foundation of China: 62001355, 61671350, 61771379, 61836009; Fundamental Research Funds for the Central Universities: No. XJS211901; the Foundation for Innovative Research Groups of the National Natural Science Foundation of China: 61621005; the Key Research and Development Program in Shaanxi Province of China: 2022GY-067, 2019ZDLGY03-05; China Postdoctoral Science Foundation: No. 2018M633468.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Lee, J.S.; Pottier, E. Polarimetric Radar Imaging: From Basics to Applications; CRC Press: Boca Raton, FL, USA, 2009.
2. Uhlmann, S.; Kiranyaz, S. Integrating color features in polarimetric SAR image classification. IEEE Trans. Geosci. Remote Sens. 2013, 52, 2197–2216.
3. Ren, B.; Hou, B.; Wen, Z.; Xie, W.; Jiao, L. PolSAR image classification via multimodal sparse representation-based feature fusion. Int. J. Remote Sens. 2018, 39, 7861–7880.
4. Akbarizadeh, G.; Rahmani, M. Efficient combination of texture and color features in a new spectral clustering method for PolSAR image segmentation. Natl. Acad. Sci. Lett. 2017, 40, 117–120.
5. Ren, B.; Hou, B.; Zhao, J.; Jiao, L. Unsupervised classification of polarimetric SAR image via improved manifold regularized low-rank representation with multiple features. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2016, 10, 580–595.
6. Wu, W.; Li, H.; Li, X.; Guo, H.; Zhang, L. PolSAR image semantic segmentation based on deep transfer learning-Realizing smooth classification with small training sets. IEEE Geosci. Remote Sens. Lett. 2019, 16, 977–981.
7. Liu, F.; Jiao, L.; Tang, X. Task-oriented GAN for PolSAR image classification and clustering. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2707–2719.
8. Cheng, G.; Xie, X.; Han, J.; Guo, L.; Xia, G.S. Remote sensing image scene classification meets deep learning: Challenges, methods, benchmarks, and opportunities. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 3735–3756.
9. Wu, W.; Li, H.; Zhang, L.; Li, X.; Guo, H. High-resolution PolSAR scene classification with pretrained deep convnets and manifold polarimetric parameters. IEEE Trans. Geosci. Remote Sens. 2018, 56, 6159–6168.
10. Cloude, S.R.; Pottier, E. An entropy based classification scheme for land applications of polarimetric SAR. IEEE Trans. Geosci. Remote Sens. 1997, 35, 68–78.
11. Yang, J.; Peng, Y.N.; Yamaguchi, Y.; Yamada, H. On Huynen’s decomposition of a Kennaugh matrix. IEEE Geosci. Remote Sens. Lett. 2006, 3, 369–372.
12. Cloude, S.R.; Pottier, E. A review of target decomposition theorems in radar polarimetry. IEEE Trans. Geosci. Remote Sens. 1996, 34, 498–518.
13. Freeman, A.; Durden, S.L. A three-component scattering model for polarimetric SAR data. IEEE Trans. Geosci. Remote Sens. 1998, 36, 963–973.
14. Van Zyl, J.J. Unsupervised classification of scattering behavior using radar polarimetry data. IEEE Trans. Geosci. Remote Sens. 1989, 27, 36–45.
15. Yamaguchi, Y.; Moriyama, T.; Ishido, M.; Yamada, H. Four-component scattering model for polarimetric SAR image decomposition. IEEE Trans. Geosci. Remote Sens. 2005, 43, 1699–1706.
16. Touzi, R.; Charbonneau, F. Characterization of target symmetric scattering using polarimetric SARs. IEEE Trans. Geosci. Remote Sens. 2002, 40, 2507–2516.
17. Dell’Acqua, F.; Gamba, P. Texture-based characterization of urban environments on satellite SAR images. IEEE Trans. Geosci. Remote Sens. 2003, 41, 153–159.
18. Kandaswamy, U.; Adjeroh, D.A.; Lee, M.C. Efficient texture analysis of SAR imagery. IEEE Trans. Geosci. Remote Sens. 2005, 43, 2075–2083.
19. Clausi, D.A. Comparison and fusion of co-occurrence, Gabor and MRF texture features for classification of SAR sea-ice imagery. Atmos. Ocean 2001, 39, 183–194.
20. Li, H.C.; Celik, T.; Longbotham, N.; Emery, W.J. Gabor feature based unsupervised change detection of multitemporal SAR images based on two-level clustering. IEEE Geosci. Remote Sens. Lett. 2015, 12, 2458–2462.
21. Kaplan, L.M. Improved SAR target detection via extended fractal features. IEEE Trans. Aerosp. Electron. Syst. 2001, 37, 436–451.
22. Solberg, A.S.; Jain, A.K. Texture fusion and feature selection applied to SAR imagery. IEEE Trans. Geosci. Remote Sens. 1997, 35, 475–479.
23. Gao, H.; Nie, F.; Li, X.; Huang, H. Multi-view subspace clustering. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 4238–4246.
24. Brbić, M.; Kopriva, I. Multi-view low-rank sparse subspace clustering. Pattern Recognit. 2018, 73, 247–258.
25. Zhang, C.; Fu, H.; Liu, S.; Liu, G.; Cao, X. Low-rank tensor constrained multiview subspace clustering. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1582–1590.
26. Ren, B.; Hou, B.; Chanussot, J.; Jiao, L. PolSAR feature extraction via tensor embedding framework for land cover classification. IEEE Trans. Geosci. Remote Sens. 2019, 58, 2337–2351.
27. Ren, B.; Hou, B.; Chanussot, J.; Jiao, L. Modified tensor distance-based multiview spectral embedding for PolSAR land cover classification. IEEE Geosci. Remote Sens. Lett. 2020, 17, 2095–2099.
28. Hong, D.; Gao, L.; Yokoya, N.; Yao, J.; Chanussot, J.; Qian, D.; Zhang, B. More diverse means better: Multimodal deep learning meets remote-sensing imagery classification. IEEE Trans. Geosci. Remote Sens. 2020, 59, 4340–4354.
29. Hong, D.; Gao, L.; Yao, J.; Zhang, B.; Antonio, P.; Chanussot, J. Graph convolutional networks for hyperspectral image classification. IEEE Trans. Geosci. Remote Sens. 2020, 7, 5966–5978.
30. Hong, D.; Yokoya, N.; Chanussot, J.; Xu, J.; Zhu, X.X. Joint and progressive subspace analysis (JPSA) with spatial-spectral manifold alignment for semi-supervised hyperspectral dimensionality reduction. IEEE Trans. Cybern. 2020, 51, 3602–3615.
31. Hong, D.; Yokoya, N.; Chanussot, J.; Zhu, X. An augmented linear mixing model to address spectral variability for hyperspectral unmixing. IEEE Trans. Image Process. 2019, 28, 1923–1938.
32. Hong, D.; Yokoya, N.; Ge, N.; Chanussot, J.; Zhu, X. Learnable manifold alignment (LeMA): A semi-supervised cross-modality learning framework for land cover and land use classification. ISPRS J. Photogramm. Remote Sens. 2019, 147, 193–205.
33. Zhao, J.; Xie, X.; Xu, X.; Sun, S. Multi-view learning overview: Recent progress and new challenges. Inf. Fusion 2017, 38, 43–54.
34. Gao, L.; Han, Z.; Hong, D.; Zhang, B.; Chanussot, J. CyCU-Net: Cycle-consistency unmixing network by learning cascaded autoencoders. IEEE Trans. Geosci. Remote Sens. 2020, 60, 1–14.
35. Rai, A.K.; Daumé, H. Co-regularized multi-view spectral clustering. In NIPS’11: Proceedings of the 24th International Conference on Neural Information Processing Systems, Granada, Spain, 12–15 December 2011; Curran Associates Inc.: Red Hook, NY, USA, 2011.
36. Evans, X.Z.; Dugelay, J. A subspace co-training framework for multi-view clustering. Pattern Recognit. Lett. 2014, 41, 73–82.
37. Cao, X.; Zhang, C.; Fu, H.; Liu, S.; Zhang, H. Diversity-induced multi-view subspace clustering. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 586–594.
38. Chao, G.; Sun, S. Consensus and complementarity based maximum entropy discrimination for multi-view classification. Inf. Sci. 2016, 367, 296–310.
39. Hardoon, D.R.; Szedmak, S.; Shawe-Taylor, J. Canonical correlation analysis: An overview with application to learning methods. Neural Comput. 2004, 16, 2639–2664.
40. Bach, F.R.; Jordan, M.I. Kernel independent component analysis. J. Mach. Learn. Res. 2002, 3, 1–48.
41. Luo, Y.; Tao, D.; Ramamohanarao, K.; Xu, C.; Wen, Y. Tensor canonical correlation analysis for multi-view dimension reduction. IEEE Trans. Knowl. Data Eng. 2015, 27, 3111–3124.
42. Kan, M.; Shan, S.; Zhang, H.; Lao, S.; Chen, X. Multi-view discriminant analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2015, 38, 188–194.
  43. Ma, H.T.H.L.Z.W. Graph based multi-modality learning. In Proceedings of the 13th Annual ACM International Conference on Multimedia, Singapore, 6–11 November 2005; pp. 6–11. [Google Scholar]
  44. Hong, D.; He, W.; Yokoya, N.; Yao, J.; Gao, L.; Zhang, L.; Chanussot, J.; Zhu, X.X. Interpretable Hyperspectral Artificial Intelligence: When Non-Convex Modeling meets Hyperspectral Remote Sensing. IEEE Geosci. Remote Sens. Mag. 2021, 9, 52–87. [Google Scholar] [CrossRef]
  45. Kolda, T.G.; Bader, B.W. Tensor decompositions and applications. SIAM Rev. 2009, 51, 455–500. [Google Scholar] [CrossRef]
  46. Grasedyck, L.; Kressner, D.; Tobler, C. A literature survey of low-rank tensor approximation techniques. GAMM-Mitt. 2013, 36, 53–78. [Google Scholar] [CrossRef] [Green Version]
  47. Comon, P. Tensors: A brief introduction. IEEE Signal Process Mag. 2014, 31, 44–53. [Google Scholar] [CrossRef] [Green Version]
  48. Liu, Y.; Long, Z.; Huang, H.; Zhu, C. Low CP rank and tucker rank tensor completion for estimating missing components in image data. IEEE Trans. Circuits Syst. Video Technol. 2019, 30, 944–954. [Google Scholar] [CrossRef]
  49. Liu, J.; Musialski, P.; Wonka, P.; Ye, J. Tensor completion for estimating missing values in visual data. IEEE Trans. Pattern Anal. Mach. Intell. 2012, 35, 208–220. [Google Scholar] [CrossRef] [PubMed]
  50. Guangcan, L.; Zhouchen, L.; Shuicheng, Y.; Ju, S.; Yong, Y.; Yi, M. Robust recovery of subspace structures by low-rank representation. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 171–184. [Google Scholar]
  51. Deng, Y.J.; Li, H.C.; Fu, K.; Du, Q.; Emery, W.J. Tensor low-rank discriminant embedding for hyperspectral image dimensionality reduction. IEEE Trans. Geos. Remote Sens. 2018, 56, 7183–7194. [Google Scholar] [CrossRef]
  52. Wong, W.K.; Lai, Z.; Wen, J.; Fang, X.; Lu, Y. Low-rank embedding for robust image feature extraction. IEEE Trans. Image Process. 2017, 26, 2905–2917. [Google Scholar] [CrossRef]
  53. Lin, Z.; Chen, M.; Ma, Y. The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv 2010, arXiv:1009.5055. [Google Scholar]
Figure 1. Overview of LR-MTR.
Figure 2. LR-MTR algorithm flow chart.
Figure 3. Geographical location of the Tokyo data set and its scene categories.
Figure 4. Visualization examples from the Tokyo data set. (a) Building. (b) Water. (c) Woodland. (d) Coast. (e) Farmland.
Figure 5. Geographical location of the Shanghai data set and its scene categories.
Figure 6. Visualization examples from the Shanghai data set. (a) Urban areas. (b) Suburban areas. (c) Farmland. (d) Water. (e) Coast.
Figure 7. Confusion matrices for (a) the Tokyo data set and (b) the Shanghai data set.
Figure 8. OA of KNN as the reduced dimension varies. (a) Tokyo data results. (b) Shanghai data results.
Figure 9. Classification accuracy as a function of the parameter γ for the Tokyo data set (left) and the Shanghai data set (right). As we set γ1 = γ2 = ⋯ = γM = γ in our experiments, γ is the only parameter that needs to be tuned.
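Because all γm are tied to a single value, the sweep behind Figure 9 reduces to a one-dimensional search. Below is a minimal sketch of such a sweep; evaluate(gamma) is a hypothetical helper (its body here is a placeholder, not the authors' code) that would train LR-MTR with the given γ and return validation OA.

```python
# Minimal sketch of the one-parameter sweep illustrated in Figure 9.
# `evaluate` is a hypothetical stand-in for "train LR-MTR with this gamma
# and report validation OA"; replace its body with the real pipeline.
import numpy as np

def evaluate(gamma: float) -> float:
    # placeholder objective peaking at gamma = 1e-2; not the real model
    return -abs(np.log10(gamma) + 2.0)

candidates = [10.0 ** e for e in range(-5, 3)]   # 1e-5 ... 1e2, log-spaced grid
best_gamma = max(candidates, key=evaluate)       # keep the gamma with best OA
print(f"selected gamma = {best_gamma:g}")
```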
Figure 10. The convergence property of the proposed algorithm. (a) Tokyo data results. (b) Shanghai data results.
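Convergence curves such as those in Figure 10 are commonly obtained with a relative-change stopping rule on the objective, as is standard practice for ALM-style solvers [53]. A minimal sketch of such a rule, where step() is a hypothetical function performing one solver iteration and returning the current objective value (not the authors' solver):

```python
# Minimal sketch of a relative-change stopping criterion for an iterative
# (e.g., ALM-style) solver. `step` runs one iteration and returns the
# current objective value.
def solve(step, tol=1e-6, max_iter=200):
    prev = step()
    for it in range(1, max_iter):
        cur = step()
        # stop when the objective changes by less than `tol` (relative)
        if abs(prev - cur) / max(abs(prev), 1e-12) < tol:
            return cur, it
        prev = cur
    return prev, max_iter

# Tiny demo with a hand-made decreasing objective sequence.
demo = iter([10.0, 4.0, 2.5, 2.2, 2.19, 2.1899])
print(solve(lambda: next(demo), tol=1e-3))   # -> (2.1899, 5)
```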
Table 1. Classification results with GLCM features on the Tokyo data set: per-class accuracy, OA, and AA (%), and Kappa, for each dimension-reduction method with KNN and SVM classifiers.

Class | PCA-KNN | PCA-SVM | LPP-KNN | LPP-SVM | NPE-KNN | NPE-SVM | ICA-KNN | ICA-SVM | KPCA-KNN | KPCA-SVM | LLE-KNN | LLE-SVM | SNE-KNN | SNE-SVM | MDS-KNN | MDS-SVM | LR-MTR-KNN | LR-MTR-SVM
1 | 94.97 | 97.66 | 87.49 | 92.98 | 49.24 | 60.82 | 25.50 | 39.65 | 94.74 | 98.25 | 90.18 | 79.30 | 96.02 | 83.63 | 84.91 | 58.48 | 97.78 | 95.79
2 | 51.79 | 64.29 | 82.14 | 96.43 | 62.50 | 46.43 | 83.93 | 76.79 | 53.57 | 50.00 | 50.00 | 64.29 | 53.57 | 87.50 | 62.50 | 48.21 | 73.93 | 67.14
3 | 100.00 | 100.00 | 89.81 | 91.10 | 51.08 | 16.93 | 91.39 | 93.40 | 99.86 | 100.00 | 99.71 | 81.21 | 100.00 | 27.40 | 99.86 | 59.68 | 97.13 | 98.85
4 | 42.93 | 40.98 | 77.07 | 60.49 | 58.05 | 23.90 | 86.34 | 84.39 | 41.46 | 35.61 | 30.24 | 62.93 | 47.32 | 0.00 | 29.27 | 76.10 | 56.10 | 79.81
5 | 78.07 | 72.19 | 97.86 | 75.94 | 64.71 | 67.38 | 92.51 | 67.91 | 88.24 | 83.96 | 80.75 | 71.12 | 86.63 | 77.01 | 43.32 | 52.94 | 82.89 | 84.92
OA | 88.60 | 89.35 | 88.05 | 87.50 | 52.60 | 41.95 | 62.60 | 66.65 | 89.30 | 89.75 | 85.35 | 77.10 | 90.35 | 54.95 | 80.00 | 59.90 | 92.28 | 93.40
AA | 73.55 | 75.02 | 86.88 | 83.39 | 57.11 | 43.09 | 75.93 | 72.43 | 75.57 | 73.56 | 70.18 | 71.77 | 76.71 | 55.11 | 63.97 | 59.08 | 82.12 | 83.36
Kappa | 0.8319 | 0.8420 | 0.8305 | 0.8204 | 0.3879 | 0.2666 | 0.5302 | 0.5670 | 0.8422 | 0.8476 | 0.7855 | 0.6790 | 0.8572 | 0.4158 | 0.7109 | 0.4750 | 0.9986 | 0.9204
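For readers reproducing Tables 1–4, the reported metrics follow the usual definitions: per-class accuracy is the class-wise recall taken from the confusion matrix, OA is its trace over the total sample count, AA is the mean per-class accuracy, and Kappa is Cohen's kappa. A minimal sketch with scikit-learn on stand-in data (the feature matrix and labels below are random placeholders, not the released data sets):

```python
# Minimal sketch: per-class accuracy, OA, AA, and Kappa from KNN predictions.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, cohen_kappa_score

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 30))     # stand-in for reduced features
labels = rng.integers(0, 5, size=500)     # stand-in for 5 scene classes

X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.5, stratify=labels, random_state=0)
y_pred = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr).predict(X_te)

cm = confusion_matrix(y_te, y_pred)
per_class = cm.diagonal() / cm.sum(axis=1)   # class-wise accuracy (recall)
oa = cm.diagonal().sum() / cm.sum()          # overall accuracy
aa = per_class.mean()                        # average accuracy
kappa = cohen_kappa_score(y_te, y_pred)
print(per_class, oa, aa, kappa)
```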
Table 2. Classification results with GLCM features on the Shanghai data set (same layout as Table 1).

Class | PCA-KNN | PCA-SVM | LPP-KNN | LPP-SVM | NPE-KNN | NPE-SVM | ICA-KNN | ICA-SVM | KPCA-KNN | KPCA-SVM | LLE-KNN | LLE-SVM | SNE-KNN | SNE-SVM | MDS-KNN | MDS-SVM | LR-MTR-KNN | LR-MTR-SVM
1 | 83.21 | 69.85 | 79.77 | 66.79 | 33.97 | 32.82 | 74.81 | 83.59 | 79.77 | 54.96 | 79.01 | 55.34 | 82.06 | 72.90 | 62.21 | 46.56 | 66.85 | 70.77
2 | 67.34 | 79.48 | 82.37 | 81.21 | 49.42 | 60.40 | 71.10 | 78.90 | 71.10 | 71.68 | 58.09 | 78.03 | 65.32 | 78.32 | 62.43 | 69.94 | 77.31 | 73.72
3 | 89.68 | 84.13 | 84.13 | 86.24 | 53.97 | 41.53 | 81.48 | 87.83 | 83.33 | 83.86 | 89.68 | 75.66 | 87.57 | 79.63 | 85.71 | 67.72 | 81.38 | 82.96
4 | 98.32 | 81.29 | 83.21 | 73.86 | 76.74 | 43.41 | 77.94 | 76.02 | 98.08 | 63.79 | 98.08 | 70.26 | 96.88 | 79.38 | 98.32 | 54.92 | 84.21 | 76.04
5 | 45.69 | 64.97 | 56.85 | 77.66 | 32.99 | 57.36 | 64.97 | 78.68 | 43.65 | 76.14 | 27.41 | 54.31 | 41.62 | 54.82 | 23.86 | 50.25 | 65.60 | 76.80
OA | 80.07 | 77.69 | 79.44 | 77.69 | 57.67 | 46.63 | 56.88 | 81.00 | 80.25 | 70.31 | 75.18 | 68.81 | 78.63 | 75.13 | 72.64 | 59.25 | 82.88 | 81.29
AA | 76.85 | 75.94 | 77.27 | 77.16 | 49.42 | 47.11 | 74.06 | 81.00 | 75.19 | 70.09 | 70.46 | 66.72 | 74.69 | 73.01 | 66.51 | 57.88 | 75.07 | 76.06
Kappa | 0.7523 | 0.7185 | 0.7396 | 0.7206 | 0.4016 | 0.3406 | 0.6857 | 0.7609 | 0.7331 | 0.6306 | 0.6874 | 0.6094 | 0.7269 | 0.6862 | 0.6463 | 0.4951 | 0.9981 | 0.6583
Table 3. Classification results with LBP features on the Tokyo data set (same layout as Table 1).

Class | PCA-KNN | PCA-SVM | LPP-KNN | LPP-SVM | NPE-KNN | NPE-SVM | ICA-KNN | ICA-SVM | KPCA-KNN | KPCA-SVM | LLE-KNN | LLE-SVM | SNE-KNN | SNE-SVM | MDS-KNN | MDS-SVM | LR-MTR-KNN | LR-MTR-SVM
1 | 67.60 | 88.07 | 82.92 | 85.96 | 13.92 | 23.51 | 69.24 | 60.00 | 65.61 | 87.84 | 64.33 | 77.08 | 68.89 | 83.51 | 63.04 | 72.05 | 71.93 | 82.34
2 | 78.57 | 80.36 | 89.29 | 94.64 | 37.50 | 28.57 | 48.21 | 62.50 | 73.21 | 78.57 | 69.64 | 71.43 | 66.07 | 76.79 | 64.29 | 78.57 | 83.93 | 87.50
3 | 97.27 | 60.69 | 97.85 | 88.52 | 26.97 | 14.49 | 53.66 | 22.24 | 95.70 | 60.55 | 94.98 | 77.47 | 94.55 | 88.09 | 98.13 | 93.97 | 85.65 | 70.45
4 | 46.83 | 12.20 | 52.20 | 40.49 | 12.68 | 22.93 | 39.02 | 65.85 | 49.76 | 9.76 | 40.49 | 13.17 | 43.41 | 22.93 | 51.71 | 15.12 | 64.39 | 74.63
5 | 87.70 | 63.10 | 94.12 | 83.96 | 28.34 | 6.95 | 60.43 | 74.87 | 65.78 | 44.39 | 51.87 | 40.64 | 69.52 | 53.48 | 63.10 | 37.43 | 86.63 | 92.51
OA | 78.00 | 68.20 | 86.20 | 82.25 | 30.45 | 18.90 | 59.30 | 48.90 | 74.70 | 66.00 | 71.55 | 67.10 | 75.20 | 75.90 | 74.15 | 70.80 | 77.65 | 78.50
AA | 75.60 | 60.88 | 83.27 | 78.72 | 23.88 | 19.29 | 54.11 | 57.09 | 70.01 | 56.22 | 64.26 | 55.96 | 68.49 | 64.96 | 68.05 | 59.43 | 78.51 | 81.49
Kappa | 0.6926 | 0.5641 | 0.8017 | 0.7469 | 0.0243 | -0.0610 | 0.4422 | 0.3374 | 0.6493 | 0.5350 | 0.6063 | 0.5469 | 0.6541 | 0.6586 | 0.6411 | 0.5922 | 0.9985 | 0.9986
Table 4. Classification results with LBP features on the Shanghai data set (same layout as Table 1).

Class | PCA-KNN | PCA-SVM | LPP-KNN | LPP-SVM | NPE-KNN | NPE-SVM | ICA-KNN | ICA-SVM | KPCA-KNN | KPCA-SVM | LLE-KNN | LLE-SVM | SNE-KNN | SNE-SVM | MDS-KNN | MDS-SVM | LR-MTR-KNN | LR-MTR-SVM
1 | 60.31 | 46.95 | 71.37 | 83.21 | 85.50 | 15.65 | 66.79 | 73.28 | 59.54 | 54.58 | 45.04 | 41.60 | 55.73 | 66.41 | 53.44 | 57.63 | 69.47 | 80.15
2 | 80.06 | 70.23 | 80.92 | 75.14 | 65.32 | 24.28 | 75.14 | 76.01 | 81.79 | 69.94 | 71.97 | 66.76 | 81.79 | 68.21 | 78.03 | 72.25 | 77.75 | 79.48
3 | 83.60 | 77.51 | 81.22 | 88.36 | 30.69 | 29.89 | 79.37 | 87.57 | 65.34 | 79.63 | 66.14 | 79.37 | 60.58 | 75.93 | 50.26 | 69.84 | 87.57 | 87.83
4 | 98.08 | 77.22 | 98.08 | 86.81 | 88.73 | 17.75 | 87.29 | 82.25 | 90.89 | 77.46 | 99.28 | 76.50 | 99.52 | 79.86 | 84.65 | 79.38 | 87.05 | 74.82
5 | 67.01 | 71.57 | 62.94 | 75.13 | 22.34 | 59.39 | 61.42 | 76.65 | 60.91 | 80.71 | 54.82 | 77.16 | 44.67 | 76.14 | 47.72 | 67.01 | 78.68 | 91.37
OA | 80.75 | 70.13 | 81.69 | 82.63 | 61.25 | 26.81 | 76.25 | 80.00 | 74.06 | 73.00 | 71.19 | 69.44 | 72.56 | 73.75 | 65.44 | 70.50 | 82.25 | 82.81
AA | 77.81 | 68.70 | 78.91 | 81.73 | 58.51 | 29.39 | 74.00 | 79.15 | 71.70 | 72.46 | 67.45 | 68.28 | 68.46 | 73.31 | 62.82 | 69.22 | 80.10 | 82.73
Kappa | 0.7550 | 0.6255 | 0.7667 | 0.7810 | 0.5077 | 0.0988 | 0.6976 | 0.7479 | 0.6711 | 0.6613 | 0.6332 | 0.6169 | 0.6485 | 0.6699 | 0.5633 | 0.6279 | 0.9976 | 0.9975
Table 5. Running time of each dimension-reduction algorithm.

Algorithm | PCA | LPP | NPE | FastICA | KernelPCA | LLE | SNE | MDS | LR-MTR
Time (s) | 0.64 | 1.23 | 0.50 | 1.49 | 0.62 | 1.29 | 2.72 | 0.42 | 6.32
Table 6. Per-class accuracy and OA (%) on the Shanghai data set using the projection U learned from the Tokyo data set.

Method | Buildings | Farmland | Water | Woodland | Coast | OA
PCA | 84.35 | 56.94 | 82.54 | 82.01 | 18.27 | 69.25
LPP | 41.67 | 7.75 | 67.99 | 83.45 | 69.04 | 69.94
NPE | 75.57 | 66.18 | 73.54 | 75.78 | 38.07 | 68.50
ICA | 87.02 | 45.09 | 80.42 | 64.03 | 12.18 | 61.19
KPCA | 86.64 | 44.22 | 85.98 | 93.05 | 21.32 | 70.94
LLE | 80.92 | 58.96 | 82.28 | 95.68 | 12.18 | 71.88
SNE | 83.59 | 62.72 | 89.68 | 93.76 | 40.10 | 77.81
MDS | 81.68 | 43.06 | 85.98 | 93.05 | 19.29 | 69.63
LR-MTR | 85.11 | 62.71 | 91.79 | 89.44 | 59.39 | 79.81
Table 7. Per-class accuracy and OA (%) on the Tokyo data set using the projection U learned from the Shanghai data set.

Method | Urban | Suburban | Farmland | Water | Coast | OA
PCA | 92.98 | 60.71 | 98.28 | 18.05 | 79.14 | 84.95
LPP | 79.42 | 64.29 | 81.21 | 24.39 | 78.61 | 73.90
NPE | 36.61 | 26.79 | 59.25 | 29.76 | 59.89 | 45.70
ICA | 92.98 | 60.71 | 98.28 | 18.05 | 79.14 | 84.95
KPCA | 37.43 | 41.07 | 96.27 | 42.44 | 50.80 | 59.80
LLE | 52.16 | 57.14 | 97.27 | 40.98 | 83.42 | 69.80
SNE | 95.56 | 55.36 | 92.68 | 22.44 | 81.28 | 84.60
MDS | 96.84 | 44.64 | 52.51 | 44.88 | 68.45 | 71.95
LR-MTR | 97.31 | 53.57 | 94.98 | 27.32 | 89.31 | 87.35
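Tables 6 and 7 reuse the projection matrix U learned on one data set to embed the other, which is the out-of-sample use case highlighted in the abstract. A minimal sketch of this transfer, assuming U maps cascaded features to the reduced space; all shapes below are illustrative, not those of the released data sets:

```python
# Minimal sketch: applying a learned linear projection U to unseen samples.
import numpy as np

rng = np.random.default_rng(1)
U = rng.normal(size=(120, 30))        # stand-in learned projection (D x d)
X_new = rng.normal(size=(200, 120))   # unseen scenes, cascaded features (N x D)
Z_new = X_new @ U                     # reduced features, fed to KNN/SVM
print(Z_new.shape)                    # (200, 30)
```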
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
