Article
Peer-Review Record

Underwater Small Target Classification Using Sparse Multi-View Discriminant Analysis and the Invariant Scattering Transform

J. Mar. Sci. Eng. 2024, 12(10), 1886; https://doi.org/10.3390/jmse12101886
by Andrew Christensen 1,*, Ananya Sen Gupta 1 and Ivars Kirsteins 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Reviewer 4:
Submission received: 7 August 2024 / Revised: 1 September 2024 / Accepted: 6 October 2024 / Published: 21 October 2024
(This article belongs to the Section Ocean Engineering)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

In this paper, a multi-stage algorithm exploiting salient and discriminative features is introduced for underwater target classification. The invariant scattering transform is first applied to extract nonlinear features robust to noise and deformation; a multi-view discriminant method with a sparsity constraint is then developed to reduce dimensionality and enhance class separation; finally, a support vector machine is trained for classification. This work contributes new insights and ideas for sonar automatic target recognition, applying state-of-the-art machine learning to a traditionally difficult problem, and can therefore be of great interest to researchers in the field. However, some major issues should be addressed before a fair evaluation can be made.
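For concreteness, the three-stage pipeline summarized above can be sketched in a few lines of Python. This is an illustrative outline only: scikit-learn's LinearDiscriminantAnalysis stands in for the paper's sparse multi-view discriminant step, and the random feature matrix is a placeholder for actual scattering coefficients; only the sample and class counts (72 samples, 6 classes) are taken from the paper.

```python
# Sketch of the three-stage pipeline: scattering-style features ->
# discriminant projection -> linear SVM. LDA is a stand-in for Sparse MvDA,
# and the random features are placeholders for real scattering coefficients.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_samples, n_features, n_classes = 72, 256, 6           # feature count is illustrative
X = rng.normal(size=(n_samples, n_features))            # placeholder scattering coefficients
y = np.repeat(np.arange(n_classes), n_samples // n_classes)  # 12 placeholder samples per class

pipeline = Pipeline([
    ("reduce", LinearDiscriminantAnalysis(n_components=n_classes - 1)),
    ("classify", SVC(kernel="linear")),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)
pipeline.fit(X_train, y_train)
print("test accuracy:", pipeline.score(X_test, y_test))
```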

1.     Presentation

Given the complexity of the developed algorithm, the presentation of the mathematics needs to be much improved. It is very hard to follow in places, mainly because of undefined, duplicated, or incorrectly used symbols. For example, \lambda is used for multiple different parameters; c and c_i in Line 263 are not defined; the "x" in Eq. (16) and the "*" in Eq. (18) are confusing; "l'" in Eq. (26) is misused; and there are many others. This reviewer suggests the authors provide a symbol table to ensure that each symbol denotes one physical quantity and that all symbols are properly defined. It is also hard to distinguish scalars, vectors, and matrices, along with their dimensions, in the current expressions.

In addition, some operations are introduced without context. For example, in Line 371, "p eigenvectors" comes from nowhere: an eigen-decomposition of what? Similarly, in Appendix A.2, Algorithm 1, the "p smallest eigenvalues" of what? The reasoning from Line 650 to Eq. (A.11) is not obvious and should be explained.

2.     Methodology

The approach is presented almost purely mathematically, without connection to the physical features of the studied acoustic signal or discussion of why it can be applied. For example, cascaded wavelets are used, and Line 317 states that "small variations or distortions in the input signal will only have minor effects on the scattering coefficients". How are these choices and arguments related to sonar imaging, or are they equally applicable to optical images? How are the multi-view features related to the physics of sonar imaging, and why do they provide better discrimination? With those connections, the novelty and significance of the contribution would be more convincing. Otherwise, the method may be read as a direct application of existing algorithms.

In addition, the selection of the relevant parameters used in the algorithm (\lambda, \gamma, the threshold in Line 573, the eigenvector number p, and \epsilon in Algorithm 1) seems quite ad hoc. Even though it is explored experimentally, readers would be interested to know how these parameters should be chosen for a different dataset.

3.     Results

Although the approach is quite interesting, the testing results are a bit disappointing and offer little new insight. In Section 4.1, the targets analyzed from the PondEx dataset are said to comprise six classes. However, little information is given in this and later sections about the dataset's size, quality, and class distribution, or about the classification accuracy for each class. This reviewer suggests that the toy example be removed and that relevant details of the PondEx dataset be added instead. The figures should also be improved. For example, it is strange to see 2.5 as an index value (Fig. 12); in Fig. 8, each subpanel should be labeled (what is \theta at the top?); similarly for Fig. 11.

One of the purposes of this study is to remove noisy features. However, the discussion in Lines 565-568 is more about the effect of removing signal features. Perhaps an even smaller \gamma should be used to highlight the cases with and without noisy-feature removal.

The link provided in the Data Availability Statement is currently inaccessible.

Comments on the Quality of English Language

There are grammar, incomplete-sentence, and formatting issues (e.g., incorrect indentation before "where") in various places; these should be double-checked.

Author Response

Comment 1: Given the complexity of the developed algorithm, the presentation of the mathematics needs to be much improved. It is very hard to follow in places, mainly because of undefined, duplicated, or incorrectly used symbols. For example, \lambda is used for multiple different parameters; c and c_i in Line 263 are not defined; the "x" in Eq. (16) and the "*" in Eq. (18) are confusing; "l'" in Eq. (26) is misused; and there are many others. This reviewer suggests the authors provide a symbol table to ensure that each symbol denotes one physical quantity and that all symbols are properly defined. It is also hard to distinguish scalars, vectors, and matrices, along with their dimensions, in the current expressions.

Response 1: We added a section defining notation, changed all vectors to bold lowercase letters and all matrices to bold uppercase letters, and made the corrections pointed out in Comment 1.

 

Comment 2: In addition, some operations are introduced without context. For example, in Line 371, "p eigenvectors" comes from nowhere: an eigen-decomposition of what? Similarly, in Appendix A.2, Algorithm 1, the "p smallest eigenvalues" of what? The reasoning from Line 650 to Eq. (A.11) is not obvious and should be explained.

Response 2: These issues have been corrected. See Lines 402, 702, and 713 in the revised manuscript.

 

Comment 3: How are these choices and arguments related to sonar imaging, or are they equally applicable to optical images? How are the multi-view features related to the physics of sonar imaging, and why do they provide better discrimination? With those connections, the novelty and significance of the contribution would be more convincing. Otherwise, the method may be read as a direct application of existing algorithms.

Response 3: This non-linear wavelet transform was used because the features in the PondEx images are localized in the time/spatial domain. The multiple views for each PondEx image are generated by scaling, rotating, and translating the Morlet wavelet. The Morlet wavelet can sense localized singularities that follow a particular direction. Therefore, the Invariant Scattering Transform can isolate features that are persistent along a particular direction in the "smile" in the PondEx images. Multi-view Discriminant Analysis is simply a statistical method to better capture discriminative features that are persistent across the multiple views.
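As a rough illustration of the directional sensitivity described in this response, the sketch below builds a simplified 2D Morlet-style filter (a Gaussian-windowed complex exponential, omitting the zero-mean correction term) at a chosen scale and orientation and takes the modulus of its convolution with a toy image containing a vertical ridge. This is the building block of a first-order scattering coefficient; the parameter values and the toy image are arbitrary illustrations, not those used in the paper.

```python
# Simplified 2D Morlet-style filter: a complex exponential oscillating along
# direction `theta`, windowed by a Gaussian of width `sigma`. The modulus of
# the filtered image is the building block of a first-order scattering coefficient.
import numpy as np
from scipy.signal import fftconvolve

def morlet_filter(size, sigma, xi, theta):
    half = size // 2
    yy, xx = np.mgrid[-half:half + 1, -half:half + 1]
    u = xx * np.cos(theta) + yy * np.sin(theta)      # oscillation along orientation theta
    envelope = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return envelope * np.exp(1j * xi * u)

def first_order_coefficient(image, theta, sigma=3.0, xi=1.5, size=15):
    psi = morlet_filter(size, sigma, xi, theta)
    return np.abs(fftconvolve(image, psi, mode="same"))

# Toy image with a vertical ridge: the modulus response is largest when the
# filter's oscillation runs across the ridge (theta = 0) and smallest when it
# runs along the ridge (theta = pi/2).
img = np.zeros((64, 64))
img[:, 32] = 1.0
for angle in (0.0, np.pi / 4, np.pi / 2):
    print(f"theta={angle:.2f}  mean response={first_order_coefficient(img, angle).mean():.4f}")
```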

 

Comment 4: In addition, the selection of the relevant parameters used in the algorithm (\lambda, \gamma, the threshold in Line 573, the eigenvector number p, and \epsilon in Algorithm 1) seems quite ad hoc. Even though it is explored experimentally, readers would be interested to know how these parameters should be chosen for a different dataset.

Response 4: There is no universally optimal way to choose the sparsity parameters; they depend on the specific dataset.

 

Comment 5: Although the approach is quite interesting, the testing results are a bit disappointing and offer little new insight. In Section 4.1, the targets analyzed from the PondEx dataset are said to comprise six classes. However, little information is given in this and later sections about the dataset's size, quality, and class distribution, or about the classification accuracy for each class. This reviewer suggests that the toy example be removed and that relevant details of the PondEx dataset be added instead. The figures should also be improved. For example, it is strange to see 2.5 as an index value (Fig. 12); in Fig. 8, each subpanel should be labeled (what is \theta at the top?); similarly for Fig. 11.

Response 5: We added details about the sample size of each target class in the PondEx dataset and added confusion matrices for per-class accuracy using the various subspace projection methods and different linear classifiers. See Figure 11 for the confusion matrices and Table 1 in the revised manuscript. Also see Lines 579 through 609 for discussion.
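A minimal sketch of how such per-class confusion matrices can be produced from a classifier's test-set predictions, using scikit-learn; the class names, labels, and predictions below are placeholders, not the paper's actual pipeline or data.

```python
# Per-class confusion matrix from test-set predictions, row-normalized so the
# diagonal reads as per-class accuracy. Labels and predictions are placeholders;
# in practice they come from the fitted classifier.
import numpy as np
from sklearn.metrics import confusion_matrix

class_names = [f"target_{k}" for k in range(6)]   # hypothetical class names
y_true = np.repeat(np.arange(6), 5)               # 5 placeholder test samples per class
y_pred = y_true.copy()
y_pred[::7] = (y_pred[::7] + 1) % 6               # inject a few misclassifications

cm = confusion_matrix(y_true, y_pred, labels=list(range(6)), normalize="true")
for name, row in zip(class_names, cm):
    print(name, np.round(row, 2))
```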

 

Comment 6: One of the purposes of this study is to remove noisy features. However, the discussion in Lines 565-568 is more about the effect of removing signal features. Perhaps an even smaller \gamma should be used to highlight the cases with and without noisy-feature removal.

Response 6: Adding a trial with \gamma approaching zero would be redundant, since ordinary MvDA already serves as the baseline in which no noisy features are removed.

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Editor and authors,

 

 

After an in-depth review of the paper "Underwater Small Target Classification using Sparse Multi-view Discriminant Analysis and the Invariant Scattering Transform", I believe that this paper explores a new method for classifying small targets in underwater sonar images, combining the advantages of sparse multi-view discriminant analysis (Sparse MvDA) and the invariant scattering transform (IST). The research is innovative and of great significance for improving the performance of underwater detection systems. However, to further enhance the academic rigor, readability, and impact of the paper, I believe a fairly comprehensive revision is needed. Below are my suggestions; with careful revision, this paper can become a high-quality academic contribution.

 

1. The review of related work is relatively comprehensive, but some references are outdated. It is suggested that important research results from the past five years be added to reflect the innovation and advancement of this research.

 

2. The paper analyzes the advantages and disadvantages of scattering networks and traditional convolutional networks for feature extraction; it is suggested to add experiments comparing and analyzing the influence of these two feature extraction methods on the results.

 

3. For dimensionality reduction, PCA, MvDA, and Sparse MvDA are used in this paper. It is suggested to add other dimensionality reduction methods for comparison, such as linear methods like LDA and ICA and nonlinear methods like autoencoders.

 

4. SVM is used as the classifier in this paper; it is suggested to compare additional classification methods, such as random forests, logistic regression, and gradient-boosted decision trees.

 

5. For the evaluation of classification performance, it is suggested that, in addition to accuracy, other criteria such as recall and F1 score be considered to evaluate the algorithm more comprehensively.

 

6. It is recommended to add ablation experiments to evaluate the contribution of each module to the overall system.

 

7. In Table 2, when the input data is Scattering Coefficients, the upper limit of the reported accuracy range exceeds 100%.

 

To sum up, the paper is innovative but needs improvement in many respects; a major revision is suggested. It is expected that the authors will comprehensively address the above problems, supplement the experimental verification, and bring the paper to publication quality after revision.

Author Response

Comment 1: The review of related work is relatively comprehensive, but some references are outdated. It is suggested that important research results from the past five years be added to reflect the innovation and advancement of this research.

Response 1: We added more references from the past five years that are relevant to this work. See Lines 97 through 121 in the revised manuscript.

 

Comment 2: The paper analyzes the advantages and disadvantages of scattering networks and traditional convolutional networks for feature extraction; it is suggested to add experiments comparing and analyzing the influence of these two feature extraction methods on the results.

Response 2: While we do compare and contrast the Invariant Scattering Transform with traditional CNNs, the purpose of this study is to avoid using CNNs given the small sample size and instead show that robust classification results can be achieved without needing to learn the convolution filters.

 

Comment 3: For dimensionality reduction, PCA, MvDA, and Sparse MvDA are used in this paper. It is suggested to add other dimensionality reduction methods for comparison, such as linear methods like LDA and ICA and nonlinear methods like autoencoders.

Response 3: We included results for LDA (see Table 4 and Figure 11), but given the small sample size we wish to avoid using deep learning techniques for dimensionality reduction. Also see Lines 579 through 609 for discussion.

 

Comment 4: SVM is used as the classifier in this paper; it is suggested to compare additional classification methods, such as random forests, logistic regression, and gradient-boosted decision trees.

Response 4: We added results for linear logistic regression, but given the small sample size, we wish to avoid non-linear classifiers. We use the Invariant Scattering Transform to handle the nonlinearities in the PondEx data. See Table 4 and Figure 11, and Lines 579 through 609, for discussion.

 

Comment 5: For the evaluation of classification performance, it is suggested that, in addition to accuracy, other criteria such as recall and F1 score be considered to evaluate the algorithm more comprehensively.

Response 5: We added confusion matrices for classification accuracy to evaluate the classification performance of each class. See Figure 11.

 

Comment 6: It is recommended to add ablation experiments to evaluate the contribution of each module to the overall system.

Response 6: We replaced the Invariant Scattering Transform features of the PondEx images with the standard Acoustic Color images of the PondEx data and observed a degradation in classification accuracy.

 

Comment 7: In Table 2, when the input data is Scattering Coefficients, the upper limit of the reported accuracy range exceeds 100%.

Response 7: The mean classification accuracy using the Invariant Scattering Coefficients does not exceed 100%. Could the reviewer clarify what is meant by this comment?

Reviewer 3 Report

Comments and Suggestions for Authors

This paper is nicely written and explained. There are a few minor issues such as: 

1. The novelty of this new algorithm is not very clear. It is mentioned that a classification accuracy of 97.3% has been achieved using this technique. However, is it better than the accuracy achieved with existing algorithms? If not, how does the proposed algorithm improve classification? Does it reduce the classification time?

2. Please explain the data, using a table if possible, in terms of the number of labels and samples you used. 

3. How did you validate the classifier performance? What are the sizes of the training and test sets? Did you try any classifiers other than SVM, such as MLP or KNN?

4. How do you propose to improve the classification performance in the future? 

I would suggest a minor revision. 

Comments on the Quality of English Language

Good, minor editing is required. 

Author Response

Comment 1: The novelty of this new algorithm is not very clear. It is mentioned that a classification accuracy of 97.3% has been achieved using this technique. However, is it better than the accuracy achieved with existing algorithms? If not, how does the proposed algorithm improve classification? Does it reduce the classification time?

Response 1: The proposed sparsity-constrained algorithm improves the interpretability of the results by forcing many of the coefficients in the projection matrix to zero, while maintaining classification accuracies on par with the unconstrained MvDA algorithm.
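To make the "forcing coefficients to zero" point concrete, here is a small illustration of the soft-thresholding (L1 proximal) operator that sparsity penalties of this kind typically induce. The threshold value and the dense matrix below are arbitrary, and this is not the paper's actual update rule, only a generic sketch of how an L1-type penalty prunes entries of a projection matrix.

```python
# Soft-thresholding (the proximal operator of the L1 norm): entries whose
# magnitude falls below the threshold are set exactly to zero, which is how an
# L1-type sparsity penalty prunes coefficients of a projection matrix.
import numpy as np

def soft_threshold(W, tau):
    return np.sign(W) * np.maximum(np.abs(W) - tau, 0.0)

rng = np.random.default_rng(0)
W_dense = rng.normal(scale=0.5, size=(8, 3))   # a small dense "projection matrix"
W_sparse = soft_threshold(W_dense, tau=0.4)    # arbitrary threshold for illustration

print("nonzero entries before:", np.count_nonzero(W_dense))
print("nonzero entries after: ", np.count_nonzero(W_sparse))
```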

 

Comment 2: Please explain the data, using a table if possible, in terms of the number of labels and samples you used. 

Response 2: A table describing the sample sizes of all targets in the PondEx dataset has been added. See Table 1.

 

Comment 3: How did you validate the classifier performance? What are the sizes of the training and test sets? Did you try any classifiers other than SVM, such as MLP or KNN?

Response 3: We performed 10 random splits of the data and report the mean test classification accuracy over the splits. We also included confusion matrices to visualize the classification accuracy for each target class. The PondEx dataset has 72 samples, and we used an 80/20 test/train split. The breakdown of the sample size of each class is provided in Table 1. We added results for linear logistic regression, but given the small sample size, we wish to avoid non-linear classifiers; we use the Invariant Scattering Transform to handle the nonlinearities in the PondEx data. See Lines 579 through 609 in the revised manuscript for discussion.
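A minimal sketch of the repeated-random-split evaluation described in this response (several random splits, mean test accuracy), using scikit-learn's ShuffleSplit; the linear SVM, the placeholder features and labels, and the split proportion shown here are illustrative stand-ins, not the actual features, model, or split used in the paper.

```python
# Repeated random splits: average test accuracy over 10 random train/test
# partitions of a small dataset. Placeholder data stands in for the real
# scattering-coefficient features; test_size is illustrative.
import numpy as np
from sklearn.model_selection import ShuffleSplit
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(72, 50))                  # placeholder features (72 samples)
y = np.tile(np.arange(6), 12)                  # placeholder labels (6 classes)

clf = SVC(kernel="linear")
splitter = ShuffleSplit(n_splits=10, test_size=0.2, random_state=0)
scores = []
for train_idx, test_idx in splitter.split(X):
    clf.fit(X[train_idx], y[train_idx])
    scores.append(clf.score(X[test_idx], y[test_idx]))

print(f"mean test accuracy over 10 splits: {np.mean(scores):.3f}")
```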

 

Comment 4: How do you propose to improve the classification performance in the future? 

Response 4: Future steps could include extending Sparse MvDA to non-linear dimensionality reduction using the kernel method. Kernel methods allow one to project data into a higher-dimensional feature space in which linear dimensionality reduction can be performed. They would be useful if the scattering coefficients can be separated by a higher-order polynomial but not by a simple linear decision boundary.
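As a rough illustration of the kernel idea mentioned in this response (implicitly mapping the data into a higher-dimensional feature space where a linear method can act), the sketch below applies scikit-learn's KernelPCA with a polynomial kernel to placeholder features before a linear classifier. This is a generic example of the kernel trick, not an implementation of a kernelized Sparse MvDA; all data and parameter values are assumptions for illustration.

```python
# Kernel trick illustration: a polynomial-kernel feature map applied implicitly
# via KernelPCA, followed by a linear classifier. Generic nonlinear
# dimensionality reduction, not a kernelized Sparse MvDA.
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(72, 50))          # placeholder scattering coefficients
y = np.arange(72) % 2                  # placeholder binary labels

model = Pipeline([
    ("kpca", KernelPCA(n_components=5, kernel="poly", degree=2)),
    ("svm", SVC(kernel="linear")),
])
model.fit(X, y)
print("training accuracy:", model.score(X, y))
```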

Reviewer 4 Report

Comments and Suggestions for Authors

See attached file

Comments for author File: Comments.pdf

Comments on the Quality of English Language

Fair

Author Response

Comment 1: From Table 2, it is evident that the performance improvement due to the invariant scattering transform is significant. In contrast, the results for MvDA and Sparse MvDA show little difference, with the average value for MvDA even being higher. This undermines the claim that applying a sparse penalty effectively selects the most important features. Although the mean classification accuracy with varying levels of sparsity is shown later, it does not necessarily imply that Sparse MvDA performs better than MvDA on actual data. Therefore, it is important to either show results where Sparse MvDA performs better than MvDA with an appropriately set level of sparsity, or provide a thorough explanation if the performance is lower.

Response 1: Due to the limited sample size of the PondEx dataset, the additional sparsity constraints aim to eliminate noisy features, enhancing the interpretability of the solution by reducing the number of selected features while still achieving classification accuracy comparable to that of MvDA. The proposed Sparse MvDA algorithm therefore trades a slight decrease in accuracy for greater interpretability.

 

Comment 2: In line 73, the full form of an abbreviation is missing: scale-invariant feature transform (SIFT).

Response 2: This has been corrected. See Line 73 in revised manuscript.

 

Comment 3: In line 263, the definitions of C and n_i are not given, and the definition of \mu seems to be incorrect.

Response 3: We included a Notation section to clarify the definitions. See Lines 214 through 237 in revised manuscript.

 

Comment 4: The trace operator tr(), introduced in line 270, is already used in Equation (6); it should therefore be defined earlier, by line 263.

Response 4: This issue has been corrected. The trace operator is defined in the Notation section.

 

Comment 5: In line 297, it seems that W^T (S_b + \lambda I) W = I would be the correct expression.

Response 5: This issue has been corrected. See Lines 329 through 330 in the revised manuscript.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

The paper has been revised according to the comments and there are no more issues.

Reviewer 4 Report

Comments and Suggestions for Authors

I am satisfied with your revision.
