Peer-Review Record

An Efficient Representation-Based Subspace Clustering Framework for Polarized Hyperspectral Images

by Zhengyi Chen 1,2,3, Chunmin Zhang 1,2,3,*, Tingkui Mu 1,2,3, Tingyu Yan 1,2,3, Zeyu Chen 1,2,3 and Yanqiang Wang 1,2,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Remote Sens. 2019, 11(13), 1513; https://doi.org/10.3390/rs11131513
Submission received: 7 May 2019 / Revised: 20 June 2019 / Accepted: 24 June 2019 / Published: 26 June 2019
(This article belongs to the Section Urban Remote Sensing)

Round 1

Reviewer 1 Report

I have found this paper clear and well written.
The Authors have provided a well-structured exposition of their material.
The content is described with a sufficient amount of details to understand the topic, results and techniques.
The analysis provided and corresponding figures are quite appropriate to the text and its content.
The list of references to the literature related to this field is also appropriate.

Overall, the content is novel, original and accurate.
But the argument and analysis could be improved in some places.

1) Above Table 1, it seems reasonable to discuss

2) Why SLIC before a superpixel-based method and not only a superpixel-based method applied twice (with different settings)? This needs to be discussed and justified better. How to set the value of N_1 compared to that of N_2?

3) Why a number of superpixels of 3249 here, which is a very specific number compared to any arbitrary one within [2000, 5000]?

4) In practice, how should the lambda' coefficients be chosen? What are general recommendations for setting the values of these adjustment coefficients? How optimal are these values? How is optimality defined in this framework?

5) Can we expect the recommendations drawn from the analysis of Figure 6 to be sufficiently general for any other dataset(s)?
My opinion is that the assertion just above subsection 4.4 ("can take the same value for different datasets") is too strong and should be relaxed based on the experiments provided in this manuscript. Given a new dataset, how to guarantee a certain level of OA at output remains an open question from my point of view.

6) Please clarify more precisely what you mean by "the number of in-sample data in our experiment is the most suitable for 429"

Typo(s):
Especially for HSIs and PHSIs, which usually accompanied by both high ...
 Y=[y_1, y_2, ⋯, y_{mn} ] in R^D
In the case of the number of superpixls is too small, ...
And Figure 7(c) and Figure 7(d) show the shows the corresponding ...

Author Response

I have found this paper clear and well written.

The Authors have provided a well-structured exposition of their material.

The content is described with a sufficient amount of details to understand the topic, results and techniques.

The analysis provided and corresponding figures are quite appropriate to the text and its content.

The list of references to the literature related to this field is also appropriate.

Overall, the content is novel, original and accurate.

But the argument and analysis could be improved in some places.

We would like to thank the editor for giving us a chance to revise the paper, and to thank the reviewer for the constructive suggestions. We feel fortunate that our manuscript went to this reviewer, whose valuable comments not only helped us improve the manuscript but also suggested some neat ideas for future studies. Here we submit a new version of our manuscript, titled “An Efficient Representation-based Subspace Clustering Framework for Polarized Hyperspectral Images”, which has been carefully modified according to the reviewers’ suggestions. We hope the new manuscript will meet the journal’s standard. Below you will find our point-by-point responses to the reviewer’s comments.

 

Point 1: Above Table 1, it seems reasonable to discuss.

Response 1: Thanks for the suggestion. The section in which Table 1 is located has been re-written, and more details about Table 1 have been added.

 

Point 2: Why SLIC before a superpixel-based method and not only a superpixel-based method applied twice (with different settings)? This needs to be discussed and justified better. How to set the value of N_1 compared to that of N_2?

Response 2: Thank the reviewer for the comments, and sorry for the unclear expression. There is no separate SLIC operation before the superpixel-based sampling method: the superpixel-based sampling method is achieved by applying SLIC twice with different superpixel-number settings. The SLIC operation with the larger number aims only at reducing the pixel count, and its segmentation results are the actual input of the algorithms; the SLIC operation with the smaller number is for selecting in-sample points. This section has been re-written, and an analysis of the selection of the number of superpixels and of the in-sample data has been added to Section 4.4.
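For concreteness, here is a minimal sketch of this two-pass sampling in Python, using scikit-image's `slic` as a stand-in for the implementation the authors used; the band indices, default counts, and the `superpixel_sampling` helper are illustrative assumptions, not code from the paper:

```python
import numpy as np
from skimage.segmentation import slic

def superpixel_sampling(cube, n1=3000, n2=400):
    """Two SLIC passes with different superpixel counts (sketch).

    cube: (H, W, B) hyperspectral or polarized-hyperspectral cube.
    The band indices and counts below are placeholders.
    """
    rgb = cube[:, :, [29, 19, 9]].astype(np.float64)  # 3-band composite
    rgb = (rgb - rgb.min()) / (np.ptp(rgb) + 1e-12)

    # Pass 1 (larger N_1): reduce the mn pixels to ~N_1 superpixels;
    # the mean spectrum of each superpixel is the algorithms' input.
    seg1 = slic(rgb, n_segments=n1, compactness=10, start_label=0)
    feats = np.stack([cube[seg1 == k].mean(axis=0)
                      for k in range(seg1.max() + 1)])

    # Pass 2 (smaller N_2): coarser superpixels used only to select
    # in-sample points, e.g. one representative per coarse superpixel.
    seg2 = slic(rgb, n_segments=n2, compactness=10, start_label=0)
    return seg1, seg2, feats
```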

 

Point 3: Why a number of superpixels of 3249 here, which is a very specific number compared to any arbitrary one within [2000, 5000]?

Response 3: Thank the reviewer for the comments. We adopted an open-source implementation of SLIC to segment the images. In this SLIC code, input images are first over-segmented into more superpixels than the number set by the user, and these superpixels are then automatically merged based on the similarity of adjacent superpixels. The number of final output superpixels therefore does not equal the number set by the user and may vary from dataset to dataset. In our experiments, the number was set to 3000, and the output contained 3249 superpixels. We have changed the expression “3249” to “about 3000” in the revised manuscript. We have also compared the overall accuracy and running time under different numbers of superpixels, and the analysis has been added to Section 4.4.
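The requested-versus-actual gap can be illustrated with scikit-image's `slic` (a stand-in here; the implementation the authors used additionally merges similar superpixels, so its behavior differs in detail):

```python
import numpy as np
from skimage.segmentation import slic

rng = np.random.default_rng(0)
img = rng.random((512, 512, 3))   # stand-in for the 3-band composite
labels = slic(img, n_segments=3000, compactness=10, start_label=0)
# The returned count rarely equals the requested n_segments exactly.
print("requested 3000 superpixels, got", len(np.unique(labels)))
```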

 

Point 4: In practice, how should the lambda' coefficients be chosen? What are general recommendations for setting the values of these adjustment coefficients? How optimal are these values? How is optimality defined in this framework?

Response 4: Thanks for the comments. These coefficients cannot yet be determined adaptively; this will be a direction for our future work. Although we use (17.a)-(17.e) to reduce the impact of different datasets, it is still very difficult to find fixed constants for these coefficients that reach the best performance on all datasets. However, since the proposed method has a low sensitivity to these parameters, acceptable performance can be achieved over wide ranges. The suggested values of lambda'_1, lambda'_2, beta'_1, beta'_2 and alpha' are 1000, 100, 10000, 1000 and 15, respectively; users can adjust these five parameters around these values to find the best ones. For sigma_s and sigma_p there are no suggested values that hold for all datasets, and users need to tune them by trying different orders of magnitude, but thanks to the low sensitivity this is not a complicated task. For datasets collected by the same instrument in similar measurement environments, all of these parameters can adopt the same set of values.
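A minimal sketch of how a user might search around these suggested values; the `coarse_tune` helper and the `evaluate` callable are hypothetical stand-ins for a run of the proposed framework, not code from the paper:

```python
import itertools

# Suggested starting values from the response above; sigma_s and
# sigma_p have no universal defaults and are searched separately
# over orders of magnitude.
DEFAULTS = {"lambda1": 1000, "lambda2": 100,
            "beta1": 10000, "beta2": 1000, "alpha": 15}

def coarse_tune(evaluate, factors=(0.5, 1.0, 2.0)):
    """Coarse search around the suggested defaults (sketch).

    evaluate: hypothetical callable mapping a parameter dict to an
    overall accuracy (one framework run); 3^5 = 243 runs here.
    """
    best, best_oa = dict(DEFAULTS), float("-inf")
    for fs in itertools.product(factors, repeat=len(DEFAULTS)):
        params = {k: v * f for (k, v), f in zip(DEFAULTS.items(), fs)}
        oa = evaluate(params)
        if oa > best_oa:
            best, best_oa = params, oa
    return best, best_oa
```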

 

Point 5: Can we expect the recommendations drawn from the analysis of Figure 6 to be sufficiently general for any other dataset(s)?

My opinion is that the assertion just above subsection 4.4 ("can take the same value for different datasets") is too strong and should be relaxed based on the experiments provided in this manuscript. Given a new dataset, how to guarantee a certain level of OA at output remains an open question from my point of view.

Response 5: Thanks for the comments. In fact, we previously ran some experiments not listed in the manuscript, and the results showed that lambda'_1, lambda'_2, beta'_1, beta'_2 and alpha' can take the same values for different datasets. But the datasets we used were collected by the same instrument under similar measurement conditions, so the conclusion was indeed too strong. We have relaxed this conclusion in the revised manuscript.

 

Point 6: Please clarify more precisely what you mean by "the number of in-sample data in our experiment is the most suitable for 429"

Response 6: Thanks for the suggestion. This sentence has been modified as follows:

“Following consideration of the cost of time, the suggested number of in-sample data is about one tenth of the number of superpixels. Specifically, for images with a size of 512×512, a high OA as well as relatively short running time can be achieved when the number of in-sample data is 400-500.”

 

Point 7: Typo(s):

Especially for HSIs and PHSIs, which usually accompanied by both high ...

 Y=[y_1, y_2, ⋯, y_{mn} ] in R^D

In the case of the number of superpixls is too small, ...

And Figure 7(c) and Figure 7(d) show the shows the corresponding ...

Response 7: Thanks for the comments. The manuscript has been carefully checked, and all the typos, together with some other issues, have been corrected. We hope it is now closer to the requirements for publication.


Author Response File: Author Response.docx

Reviewer 2 Report

The authors propose an algorithm for clustering data that possesses a joint HSI and polarized HSI representation.  While the problem is interesting, the proposed algorithm appears to me a rather ad-hoc combination of existing methods.  Moreover, it is not clear in principle what can be gleaned from introducing polarized data, compared to the traditional HSI alone.  I outline several major and minor comments below.


1.)  It is fundamentally unclear to me what benefit is hoped for from considering the polarized data in addition to the traditional HSI.  By contrast, it is well understood that incorporating spatial information or data modalities such as lidar may help HSI-driven learning tasks.  Without delving into this important problem, the proposed work appears quite unmotivated.

2.)  While the authors discuss subspace clustering at length in the literature review, several recent methods allow one to go past subspaces and consider data lying close to low-dimensional manifolds.  Just two recent works that could be considered in this regard are:

a.)  Murphy, James M., and Mauro Maggioni. "Unsupervised Clustering and Active Learning of Hyperspectral Images With Nonlinear Diffusion." IEEE Transactions on Geoscience and Remote Sensing 57.3 (2019): 1829-1845

b.)  Saranathan, Arun M., and Mario Parente. "On Clustering and Embedding Mixture Manifolds Using a Low Rank Neighborhood Approach." IEEE Transactions on Geoscience and Remote Sensing (2019). 

The work of Gillis on matrix factorizations for HSI also seems relevant to the proposed work, for example: Gillis, Nicolas, Da Kuang, and Haesun Park. "Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization." IEEE Transactions on Geoscience and Remote Sensing 53.4 (2015): 2066-2078.

3.)  It would seem natural to compare the proposed method with non-subspace based clustering methods, for example the works above (some of which have open-source code available).  Moreover, it would be nice to see some benchmark methods like K-means on the raw data, just for reference.

4.)  The crucial subsampling step seems totally unclear to me.  This section needs to be totally re-written to make clear how stability is achieved.  This is one of my biggest concerns---I frankly could not understand Section 3.2, in particular Fig. 2.

5.)  No formal definition of "representation-based clustering" is given, which is a bit confusing.

6.)  The English is consistently awkward and often incorrect.  It needs a thorough revision, ideally by a careful native speaker.  There are also some spelling errors.

7.)  Formatting issues abound, for example equations (8), (9).

8.)  Equation (7) feels extremely ad-hoc to me.  Why is this a reasonable way to combine the C matrices?  Taking a convex combination seems more natural, frankly.

9.)  Equation (6) implicitly assumes that classes should be distinct in either of the two data modalities.  Does this really make sense philosophically?  Are there examples where one modality is erroneously discriminative, and one should use the other modality to compensate?

10.)  What is meant by "on actual demand"?

11.)  Why should the sum of the dimensions d_i be equal to D?  What if I have two classes that live on 1-dimensional subspaces?  This is quite bizarre to me.

12.)  In equation (4), is y_i a spectral or spatial feature?

13.)  Typically || ||_p denotes an l_p norm (for vectors) or an induced matrix norm, not an arbitrary "proper norm."

14.)  Which matrix inner product is considered?

15.)  Algorithm 1 leaves many things implicit, i.e. (10)-(14).  It would be better to write the algorithm out in greater detail.

16.)  The first sentence of Section 4.1 is totally redundant.

17.)  When the SLIC superpixels are constructed, why are three bands only considered?

18.)  Performing some robustness analysis for the number of superpixels seems more appropriate than simply stating "2000-5000" works well.

19.)  Why in Figure 6 are (h), (i) on different scales?


Author Response

The authors propose an algorithm for clustering data that possesses a joint HSI and polarized HSI representation. While the problem is interesting, the proposed algorithm appears to me a rather ad-hoc combination of existing methods.  Moreover, it is not clear in principle what can be gleaned from introducing polarized data, compared to the traditional HSI alone.  I outline several major and minor comments below.

We would like to thank the editor for giving us a chance to revise the paper, and to thank the reviewer for the constructive suggestions. We feel fortunate that our manuscript went to this reviewer, whose valuable comments not only helped us improve the manuscript but also suggested some neat ideas for future studies. Here we submit a new version of our manuscript, titled “An Efficient Representation-based Subspace Clustering Framework for Polarized Hyperspectral Images”, which has been carefully modified according to the reviewers’ suggestions. We hope the new manuscript will meet the journal’s standard. Below you will find our point-by-point responses to the reviewer’s comments.

 

Point 1: It is fundamentally unclear to me what benefit is hoped for from considering the polarized data in addition to the traditional HSI. By contrast, it is well understood that incorporating spatial information or data modalities such as lidar may help HSI-driven learning tasks. Without delving into this important problem, the proposed work appears quite unmotivated.

Response 1: Thanks for the comments. Polarized data is a good representation of the surface texture and roughness of objects, and we believe it is a good complement to spectral data, especially in the field of urban remote sensing (it is well known that artificial and natural objects usually have different surface properties). For example, when we need to distinguish targets under different lighting conditions, or targets whose surfaces are composed of substances with similar spectral structure, polarized data performs better than traditional HSI. As shown in the experimental part of the manuscript, traditional HSI alone cannot distinguish between the windows and the surrounding walls in dataset A, and adding polarized data solves this problem. For the other regions in dataset A and in dataset B, polarization also significantly improves the clustering accuracy. In short, higher clustering accuracy and a better ability to distinguish between different targets are what introducing polarized data provides.

 

Point 2: While the authors discuss in the literature review subspace clustering at length, several recent methods allow to go past subspaces, and consider data lying close to low-dimensional manifolds. Just two recent works that could be considered in this regard are:

a.) Murphy, James M., and Mauro Maggioni. "Unsupervised Clustering and Active Learning of Hyperspectral Images With Nonlinear Diffusion." IEEE Transactions on Geoscience and Remote Sensing 57.3 (2019): 1829-1845

b.) Saranathan, Arun M., and Mario Parente. "On Clustering and Embedding Mixture Manifolds Using a Low Rank Neighborhood Approach." IEEE Transactions on Geoscience and Remote Sensing (2019). 

The work of Gillis on matrix factorizations for HSI also seems relevant to the proposed work, for example: Gillis, Nicolas, Da Kuang, and Haesun Park. "Hierarchical clustering of hyperspectral images using rank-two nonnegative matrix factorization." IEEE Transactions on Geoscience and Remote Sensing 53.4 (2015): 2066-2078.

Response 2: Thanks for the suggestion. We have read the articles you recommended and learned a lot from them. These works all aim at higher accuracy and higher efficiency in HSI clustering; since their purpose matches that of our work, they should indeed be considered in the article. At the same time, as far as we understand, our work differs from them in the following ways: 1) our work can be considered an improved and more efficient version of existing subspace clustering methods, whose principles and performance differ from those of the works above, and the way we achieve higher efficiency is also different; 2) our work targets PHSI clustering tasks rather than HSI clustering tasks, so the combination of polarized data and spectral data must be considered. We have also introduced some of these works as comparisons in the revised version of the manuscript, which is discussed in detail in the response to Point 3.

 

Point 3: It would seem natural to compare the proposed method with non-subspace based clustering methods, for example the works above (some of which have open-source code available). Moreover, it would be nice to see some benchmark methods like K-means on the raw data, just for reference.  

Response 3: Thanks for the suggestion. We found open-source code for two of the three works above (the work of Saranathan and the work of Gillis). Unfortunately, we failed to run the code of Saranathan’s work before the deadline, but Gillis’s code has been successfully applied to our data. The experimental results show that although the rank-two NMF algorithm for HSI runs faster than SSC/LRR/LSR and our proposed methods, it does not achieve a higher OA; both rank-two NMF and our work perform better than k-means. The results of rank-two NMF and k-means have been added to the experimental part of the manuscript as comparisons.
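A k-means baseline of the kind added for reference can be sketched as follows, using scikit-learn; the helper name and reshaping convention are ours, not the paper's:

```python
import numpy as np
from sklearn.cluster import KMeans

def kmeans_baseline(cube, n_classes, seed=0):
    """K-means on raw pixel spectra as a reference baseline (sketch).

    cube: (H, W, B) array; returns an (H, W) cluster-label map.
    """
    h, w, b = cube.shape
    X = cube.reshape(-1, b).astype(np.float64)   # one row per pixel
    labels = KMeans(n_clusters=n_classes, n_init=10,
                    random_state=seed).fit_predict(X)
    return labels.reshape(h, w)
```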

 

Point 4: The crucial subsampling step seems totally unclear to me. This section needs to be totally re-written and made clear how stability is achieved. This is one of my biggest concerns---I frankly could not understand Section 3.2, in particular Fig. 2. 

Response 4: Thanks for the comments and sorry for the unclear expression. This section has been re-written and Figure 2 has been redrawn. I hope it will be easier for readers to read.

 

Point 5: No formal definition of "representation-based clustering" is given, which is a bit confusing.

Response 5: Thanks for the comments. A formal definition of "representation-based subspace clustering" has been supplemented in section 2.1 as follows:

"Subspace clustering usually contains two tasks: 1) projecting the data into low-dimensional subspaces and 2) calculating the cluster membership of the dataset using statistical methods or spectral clustering. And the core of spectral clustering is the construction of similar graph of which each vertex denotes a data point and the edge weights represent the similarities between connected points. The pairwise distance (PD) which computes the similarity based on the distance (e.g., the Euclidean distance) between two data points, and the reconstruction coefficients (RC) which denotes each data point as a linear representation of the other points, are two widely used approaches to build a similarity graph. Many of the recent studies, for example, the sparse subspace clustering (SSC) , low-rank representation subspace segmentation (LRR) and least squares regression subspace segmentation (LSR) have shown that the RC has a superior performance. These methods collectively called representation-based subspace clustering have a similar form. "

 

Point 6: The English is consistently awkward and often incorrect.  It needs a thorough revision, ideally by a careful native speaker.  There are also some spelling errors. 

Response 6: Thanks for the comments. The English has now been carefully checked and revised. We hope it is closer to the requirements for publication.

 

Point 7: Formatting issues abound, for example equations (8), (9).  

Response 7: Thanks for the comments. All equations as well as some other formatting issues have been re-edited.

 

Point 8: Equation (7) feels extremely ad-hoc to me.  Why is this a reasonable way to combine the C matrices?  Taking a convex combination seems more natural, frankly.  

Response 8: Thank the reviewer for the comments. In fact, equation (7) is a convex combination, but a convex combination of the WC matrices rather than of the C matrices. We think combining the WC matrices integrates the polarization and spectral information better. A convex combination of the C matrices is also a viable solution, and we ran several experiments on it earlier: the two methods are close in performance, but the accuracy with the WC matrices is slightly (about 1-2%) higher.
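A hedged sketch of this fusion step, using the Hadamard product mentioned in Response 14; the exact form of equation (7) is not reproduced in this record, and the function name and symmetrization are our assumptions:

```python
import numpy as np

def fuse_coefficients(Cs, Cp, Ws, Wp, alpha=0.5):
    """Convex combination of weighted coefficient matrices (sketch).

    Cs, Cp: representation coefficients from spectral/polarized data;
    Ws, Wp: the corresponding weight matrices; '*' is the Hadamard
    (elementwise) product. alpha in [0, 1]; the paper's Eq. (7)
    may differ in detail.
    """
    M = alpha * (Ws * Cs) + (1.0 - alpha) * (Wp * Cp)
    return 0.5 * (np.abs(M) + np.abs(M).T)  # symmetrize for spectral clustering
```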

 

Point 9: Equation (6) implicitly assumes that classes should be distinct in either of the two data modalities.  Does this really make sense philosophically?  Are there examples where one modality is erroneously discriminative, and one should use the other modality to compensate?  

Response 9: Thank the reviewer for the comments. Here we adopt the assumption that both polarization and spectral information can distinguish all classes, and that they differ only in the strength of their discriminating ability on different targets. In practice, many classes cannot be correctly distinguished in both data modalities, and polarization can compensate for spectral information in many cases: for example, targets under different lighting conditions, the difference between artificial and natural objects, the identification of camouflaged targets, etc.

 

Point 10: What is meant by "on actual demand"?

Response 10: Thanks for the comments. Here we add the two σ parameters to make the weight matrices Ws and Wp adjustable by users on different datasets. We failed to find fixed constants that reach the best performance on all datasets, so the σ parameters need to be adjusted by the user. But as described in Section 4, these two parameters have a wide range of acceptable values on each dataset, which keeps the tuning process uncomplicated.
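The record does not reproduce the definition of Ws and Wp; a common choice, assumed here purely for illustration, is a Gaussian kernel whose bandwidth is the adjustable σ:

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_weights(F, sigma):
    """Weight matrix with a user-tuned bandwidth sigma (sketch).

    F: (N, d) features (spectral for W_s, polarized for W_p). The
    Gaussian form exp(-d^2 / sigma^2) is our assumption; the paper's
    actual definition of W_s and W_p may differ.
    """
    d2 = cdist(F, F, metric="sqeuclidean")
    return np.exp(-d2 / sigma ** 2)
```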

 

Point 11: Why should the sum of the dimensions d_i be equal to D?  What if I have two classes that live on 1-dimensional subspaces?  This is quite bizarre to me.  

Response 11: Thanks for the comments. The sum of the dimensions d_i being equal to D is an assumption of subspace algorithms themselves. Indeed, classes may sometimes lie on one- or two-dimensional subspaces, and usually, when there are enough data points in each subspace, the dimension of each subspace is comparatively small. It does not matter if the sum of the dimensions is smaller than D: it is easy to make the sum equal to D, because low-dimensional subspaces can be described by higher-dimensional ones. In our opinion, the equality assumption is a convention; we only need to ensure that the sum of the dimensions is not greater than D, and it is even preferable for the dimensions to be as low as possible.

 

Point 12: In equation (4), is y_i a spectral or spatial feature?

Response 12: Thanks for the comments. In equation (4), y_i is a spectral feature. In our method, images are segmented into thousands of superpixels; since superpixel segmentation is already a spatial down-sampling operation, we did not extract further spatial features. A deeper spatial feature might improve the performance, and this will be considered in our future work.

 

Point 13: Typically || ||_p denotes an l_p norm (for vectors) or an induced matrix norm, not an arbitrary "proper norm." 

Response 13: Thanks for the comments. The expression “a proper norm” has been modified to || ||_x.

 

Point 14: Which matrix inner product is considered?

Response 14: Thanks for the comments. Here we adopt the Hadamard product, and the expression “inner product” has been corrected.

 

Point 15: Algorithm 1 leaves many things implicit, i.e. (10)-(14).  It would be better to write the algorithm out in greater detail. 

Response 15: Thank the reviewer for the comments. The content about (10)-(14) has been rewritten and more details have been added.

 

Point 16: The first sentence of Section 4.1 is totally redundant.  

Response 16: Thank the reviewer for the comments. This paragraph has been re-written.

 

Point 17: When the SLIC superpixels are constructed, why are three bands only considered?  

Response 17: Thanks for the comments. Many superpixel segmentation algorithms, including the open-source SLIC code we referenced, operate by default on RGB images. It is acceptable, and not difficult, to modify the code to use more bands or three other bands, but the experimental results show that the RGB bands perform well enough, so we use the RGB bands to segment the images.

 

Point 18: Performing some robustness analysis for the number of superpixels seems more appropriate than simply stating "2000-5000" works well.  

Response 18: Thanks for the suggestion. We compared the overall accuracy and running time under different numbers of superpixels, and the analysis has been added to Section 4.4.
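The robustness sweep added to Section 4.4 might look like the following sketch; `run_framework` is a hypothetical callable standing in for one full run of the proposed framework at a given superpixel count:

```python
import time

def superpixel_robustness(run_framework,
                          counts=(1000, 2000, 3000, 4000, 5000)):
    """Sweep superpixel counts, recording OA and runtime (sketch).

    run_framework: hypothetical callable, superpixel count -> OA.
    """
    results = []
    for n in counts:
        t0 = time.perf_counter()
        oa = run_framework(n)
        dt = time.perf_counter() - t0
        results.append((n, oa, dt))
        print(f"N={n:5d}  OA={oa:.4f}  time={dt:.1f}s")
    return results
```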

 

Point 19: Why in Figure 6 are (h), (i) on different scales?

Response 19: Thank the reviewer for the comments. This is because there is no fixed constant for σs or σp that reaches the best performance on all datasets; for dataset A and dataset B, the acceptable ranges of σp lie on different scales. The values of σs and σp therefore need to be tuned by users, but both parameters achieve good performance over wide ranges, so tuning is not a complicated task.


Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The clarity of the content of this revised manuscript has been significantly improved based on the efforts provided by the authors to reply to the reviewers' requirements and suggestions.

Formally, the Authors replied correctly to almost all of my concerns. The overall readability has been upgraded.


Below are a few additional questions to further improve the quality of the manuscript.


1) Please specify how, in practice, the boundary between the two targets is determined/delineated and how vague areas are detected. What exactly do you mean by "vague" areas? This appears necessary, as the number of unlabeled pixels ignored while calculating the accuracy (AC) may alter the analysis somewhat.


2) I also believe it necessary to better motivate, explain and justify why a sampling strategy is necessary to avoid clustering directly on polarized hyperspectral datasets.


3) Looking at the content of Tables 3 and 4, we can observe that the algorithms using the proposed framework (FPS-SSC, FPS-LRR, FPS-LSR) can indeed achieve superior clustering performance when averaged over the whole dataset (in terms of OA). But meanwhile, they also fail to get the highest accuracy (AC) for many classes. For such classes, what are the reasons for not getting the best results? What are the levers to improve the situation?


4) Although efforts have been provided to check and improve English, there are still some places worth improving and refining.


Typo(s):

Where α, λ1 and λ 2 are trade-off parameters


Author Response

The clarity of the content of this revised manuscript has been significantly improved based on the efforts provided by the authors to reply to the reviewers' requirements and suggestions.

Formally, the Authors replied correctly to almost all of my concerns. The overall readability has been upgraded.

Below are a few additional questions to further improve the quality of the manuscript.

We would like to thank the reviewer again for the recognition of our work and for the valuable comments he or she gave. The constructive suggestions are very helpful for revising and improving our paper, and they provide important guidance for our research. Here we submit a new version of our manuscript, which has been carefully modified according to the reviewer’s suggestions. We hope the corrections will meet with approval. Below you will find our point-by-point responses to the reviewer’s comments.

 

Point 1: Please specify how, in practice, the boundary between the two targets is determined/delineated and how vague areas are detected. What exactly do you mean by "vague" areas? This appears necessary, as the number of unlabeled pixels ignored while calculating the accuracy (AC) may alter the analysis somewhat.

Response 1: Thanks for the comments. The “vague” areas in the manuscript are pixels with rich spatial-information changes that are difficult to distinguish by the human eye. We ignored them while calculating the accuracy because, due to the limited spatial resolution of the instruments, it is usually difficult to assign a manual “ground truth” to these pixels. Our methods do not adaptively detect target boundaries or vague areas: the algorithms produce clustering results for these pixels, but it is hard for us to know whether those results are correct. In practice, when applying clustering algorithms to a dataset whose “ground truth” is unknown, users do not need to judge which pixels should be ignored.

The analysis of the experimental results is indeed altered somewhat by neglecting these pixels, but we still believe it is reliable, for the following reasons: (1) the number of “unlabeled” pixels in the datasets we used is much smaller than the total number of pixels in the images; (2) subjective observation of the cluster maps (i.e., Figure 4 and Figure 5) shows that the results on these pixels are also relatively accurate. In fact, neglecting some pixels is common in HSI classification and clustering research; in the relevant papers, the adopted datasets usually do not have “ground truth” for all pixels.
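For clarity, the masked accuracy computation described above can be written as a minimal sketch; the convention that label 0 marks unlabeled pixels, and the helper name, are our assumptions rather than details from the record:

```python
import numpy as np

def overall_accuracy(pred, gt, unlabeled=0):
    """OA computed over labeled pixels only (sketch).

    gt: ground-truth map where `unlabeled` marks the ignored
    vague/unlabeled pixels. Assumes cluster labels have already
    been matched to the ground-truth classes.
    """
    mask = gt != unlabeled
    return float((pred[mask] == gt[mask]).mean())
```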

In addition, pixels at the intersection of different targets may contain a mixture of their information. Pixel unmixing is another important task in hyperspectral image processing, but it is relatively independent of our work.

 

Point 2: I also believe it necessary to better motivate, explain and justify why a sampling strategy is necessary to avoid clustering directly on polarized hyperspectral datasets.

Response 2: Thanks for the comments. Polarized hyperspectral datasets are usually characterized by both high dimensionality and large scale. For subspace-based models, high-dimensional, large-scale datasets place strict demands on memory and running time. Moreover, many existing subspace-based clustering algorithms achieve high accuracy on small-scale datasets but may not perform as well on large-scale ones; for example, in our experiments, when the number of in-sample points exceeds 400, the in-sample accuracy shows a downward trend as the number increases. Therefore, although a sampling strategy that avoids clustering directly on PHSI datasets is not strictly necessary (subspace-based methods can also obtain acceptable clustering results without sampling), it significantly improves the accuracy and reduces the running time. We have enriched the description of this in the newest manuscript.

 

Point 3: Looking at the content of Tables 3 and 4, we can observe that the algorithms using the proposed framework (FPS-SSC, FPS-LRR, FPS-LSR) can indeed achieve superior clustering performance when averaged over the whole dataset (in terms of OA). But meanwhile, they also fail to get the highest accuracy (AC) for many classes. For such classes, what are the reasons for not getting the best results? What are the levers to improve the situation?

Response 3: Thanks for the comments. Hyperspectral data (i.e., the S0 data in our experiments) is by itself sufficient to achieve high accuracy on some classes in practice, and adding polarization does not help in those cases. In addition, in real experimental data (at least the two datasets used in the manuscript), DOLP usually has a higher noise level and a lower signal-to-noise ratio (SNR) than S0, which can sometimes decrease the accuracy. In our planned future work, we will try to design an adaptive method that adjusts the parameter alpha according to the intensity and SNR of DOLP: in areas where DOLP has low intensity and low SNR, the DOLP data will carry a low weight in the clustering process and the results will be determined mainly by S0.
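As a purely illustrative sketch of this planned adaptive weighting (the gating form, reference levels, and helper name are our assumptions, not the authors' design):

```python
import numpy as np

def adaptive_alpha(dolp_intensity, dolp_snr, alpha_max=1.0,
                   i_ref=0.1, snr_ref=10.0):
    """Down-weight DOLP where intensity or SNR is low (sketch).

    Purely illustrative gating of the planned adaptive alpha; the
    reference levels i_ref and snr_ref are assumptions, not values
    from the paper. Returns a weight in [0, alpha_max).
    """
    gate = (dolp_intensity / (dolp_intensity + i_ref)) * \
           (dolp_snr / (dolp_snr + snr_ref))
    return alpha_max * gate   # per-pixel/superpixel weight for the DOLP term
```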

 

Point 4: Although efforts have been provided to check and improve English, there are still some places worth improving and refining.

Response 4: Thanks for the comments. We have carefully checked the manuscript again and some sentences have been modified.

 

Point 5: Typo(s):

Where α, λ1 and λ 2 are trade-off parameters

Response 5: Thanks for the comments. We feel very sorry for being careless; this typo has been corrected.

 

Special thanks to you for your good comments.


Author Response File: Author Response.doc

Reviewer 2 Report

The authors have thoroughly revised their paper, and I now recommend acceptance.  

Author Response

The authors have thoroughly revised their paper, and I now recommend acceptance. 

 

Response: We would like to thank the reviewer again for the recognition of our work and for the valuable comments he or she gave. The constructive suggestions are very helpful for revising and improving our paper, and they provide important guidance for our research. Once again, thank you very much for your comments and suggestions.


Author Response File: Author Response.doc

Round 3

Reviewer 1 Report

My opinion is that the quality of the paper has been greatly improved based on valuable efforts provided by the authors. They replied correctly to all my suggestions and requirements.

I have no further questions.

The overall content is thus worth publishing.
