Review Reports - Lightweight Multi-Scale Network for Segmentation of Riverbank Sand Mining Area in Satellite Images

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

The manuscript presents a novel Lightweight Multi-scale Network (LMS Net) designed for the segmentation of riverbank sand mining areas in satellite images. The authors address a critical issue in ecological protection and shipping management by proposing a deep learning-based approach that enhances the efficiency of large-area extraction while maintaining accuracy. The paper is well-structured, with a clear statement of the problem, methodology, experiments, and results. The authors have successfully demonstrated the superiority of LMS Net over traditional methods and other lightweight networks in terms of accuracy and efficiency.

Recommendation:

The manuscript is recommended for acceptance after minor revisions. The authors are encouraged to address the following points:

1. Expand the discussion on the generalizability of LMS Net to other remote sensing tasks and scenarios.

2. Include a comparison with the latest state-of-the-art models in the field to benchmark the performance of LMS Net.

3. Provide a more detailed analysis of the trade-offs between accuracy, computational efficiency, and resource usage.

4. Discuss the practical deployment considerations, such as the memory footprint and inference time, which are critical for real-world applications.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

Comments and Suggestions for Authors

The paper is innovative in proposing a lightweight multi-scale network for riverbank sand mining area segmentation, but it has shortcomings in terms of the diversity of the dataset, the detailed description of the method, the rigor of the experimental design, and the verification of practical applications, such as:

1. Limitations of the dataset

Geographical uniformity: The datasets used in the paper are all from satellite images of Shulan City, Jilin Province, China, with a short time span (November 2022 to October 2023). This single geographical source may limit the generalization ability of the model in other geographical areas.

Insufficient data volume: Although it contains a total of 10,270 images, a dataset of this size may be relatively small for deep learning models, especially in terms of diversity and complexity, which may not be enough to cover the changes in various practical scenarios.

2. Lack of comparison with the latest methods

Limited selection of comparison models: The comparison models selected in the paper are mainly classic lightweight networks (such as MobileNet, ShuffleNet, etc.) and the Unet series. The lack of comparison with recent models that perform well in remote sensing image segmentation tasks (such as recent Transformer-based models) may lead to an overestimation of the performance of LMS Net.

Limited evaluation indicators: The paper mainly uses OA, mIoU, F1-score and Kappa as evaluation indicators. It does not report more detailed measures such as precision and recall, nor does it explore the performance of the model in different categories through a confusion matrix or similar methods.
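
To illustrate the kind of analysis requested here, the following is a minimal sketch of how the indicators named above can all be derived from one binary confusion matrix. The function names and the label encoding (1 = sand mining area, 0 = non-sand mining area) are assumptions made for illustration and are not taken from the manuscript.

```python
from typing import Dict, Sequence


def binary_confusion(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count TP, FP, FN, TN, assuming label 1 = sand mining area."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            counts["tp"] += 1
        elif t == 0 and p == 1:
            counts["fp"] += 1
        elif t == 1 and p == 0:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def segmentation_metrics(c: Dict[str, int]) -> Dict[str, float]:
    """Derive OA, precision, recall, F1, per-class IoU, mIoU and Kappa."""
    tp, fp, fn, tn = c["tp"], c["fp"], c["fn"], c["tn"]
    n = tp + fp + fn + tn

    def safe_div(a: float, b: float) -> float:
        # Return 0.0 instead of raising when a denominator is empty.
        return a / b if b else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    f1 = safe_div(2 * precision * recall, precision + recall)
    iou_pos = safe_div(tp, tp + fp + fn)  # IoU of the sand mining class
    iou_neg = safe_div(tn, tn + fp + fn)  # IoU of the background class
    oa = safe_div(tp + tn, n)
    # Cohen's kappa: observed agreement corrected for chance agreement.
    p_chance = safe_div((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    kappa = safe_div(oa - p_chance, 1 - p_chance)
    return {
        "oa": oa,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "iou_pos": iou_pos,
        "iou_neg": iou_neg,
        "miou": (iou_pos + iou_neg) / 2,
        "kappa": kappa,
    }
```

Reporting precision and recall next to OA matters when one class dominates, because OA can stay high even if most sand mining pixels are missed.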

3. Insufficient details in the description of the method

The detailed mechanism of LMS Block is not clear enough: Although the paper gives a basic description of LMS Block, it does not explain in depth how the block contributes to multi-scale feature extraction or how it interacts with other network modules, which may make it difficult for readers to fully reproduce or understand its innovations.

Lack of visual analysis: In addition to quantitative results, there is a lack of visual analysis of feature maps or attention mechanisms, which cannot intuitively show the advantages of LMS Block and CAM in feature extraction and attention allocation.

4. Insufficient experimental design

Unclear training details: The paper mentions that the model was trained for 100 epochs, but does not detail the early stopping strategy or whether hyperparameter tuning was performed, which may affect the reliability and repeatability of the results.

Lack of cross-validation: The experiment was repeated only 5 times and averaged, lacking more rigorous cross-validation (such as k-fold cross-validation) to ensure the stability and generalization of the results.
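
As an illustration of the k-fold procedure suggested above, here is a minimal sketch that splits sample indices into k disjoint validation folds. The function name `k_fold_indices`, the default `k=5`, and the fixed seed are hypothetical choices, not details from the manuscript.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_indices(
    n_samples: int, k: int = 5, seed: int = 0
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Shuffle once with a fixed seed so the split is reproducible.
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, val
        start += size
```

With the 10,270 images mentioned in this report, `k_fold_indices(10270, k=5)` would give five validation folds of 2,054 images each, and every image would be used for validation exactly once.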

5. Insufficient analysis of the trade-off between efficiency and performance

The relationship between floating-point operations and actual running time: Although the paper reports FLOPS and FPS, it does not discuss in detail the actual running time and energy consumption under different hardware environments, especially the performance in actual application scenarios with limited resources.

Matching of model complexity and real-time requirements: The paper claims that LMS Net is suitable for real-time monitoring, but lacks deployment and testing in actual real-time systems, and fails to verify its real-time performance and stability in real environments.
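
To make the distinction between operation counts and measured running time concrete, the following is a minimal sketch of a wall-clock latency benchmark. The function name `benchmark_latency`, the warm-up count, and the run count are assumptions for illustration. `infer` stands for any callable that runs one forward pass; on a GPU it would also need to synchronize the device before returning, otherwise the timings are not meaningful.

```python
import statistics
import time
from typing import Callable, Dict


def benchmark_latency(
    infer: Callable[[], object], n_warmup: int = 5, n_runs: int = 30
) -> Dict[str, float]:
    """Time repeated calls to `infer` and summarize the latency."""
    # Warm-up calls are excluded so one-off start-up costs do not skew results.
    for _ in range(n_warmup):
        infer()
    samples = []
    for _ in range(n_runs):
        start = time.perf_counter()
        infer()
        samples.append(time.perf_counter() - start)
    median = statistics.median(samples)
    return {
        "median_s": median,
        "max_s": max(samples),
        # Frames per second implied by the median latency.
        "fps": 1.0 / median if median > 0 else float("inf"),
    }
```

Running the same benchmark on each target device (for example a workstation GPU and a resource-limited edge device) would show whether the reported efficiency carries over to the hardware used in practice.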

6. Insufficient generalization and robustness

Adaptability to different resolutions and lighting conditions: The satellite images used in the paper have specific resolutions and lighting conditions. The performance of the model under different resolutions, different lighting or weather conditions is not discussed, which may affect its effectiveness in diverse practical applications.

Handling of noise and outliers: Actual remote sensing images often contain abnormal conditions such as noise and cloud cover. The paper does not mention the robustness and processing ability of the model in these cases.

7. Lack of in-depth analysis of errors

Insufficient analysis of mis-segmentation cases: Although some examples of segmentation results are provided, there is no in-depth analysis of the model's mis-segmentation or missed segmentation in complex scenarios, so the model's weaknesses and directions for improvement remain unclear.

Imbalanced categories: The dataset contains two categories, "sand mining area" and "non-sand mining area", but the impact of category imbalance on model training and performance, and whether corresponding balancing measures (such as weighted loss function) have been taken, are not discussed.
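
As an illustration of the weighted loss function mentioned above, here is a minimal sketch of a class-weighted binary cross-entropy with inverse-frequency weights. The function names and the weighting heuristic (weight = n / (2 × class count)) are assumptions for illustration; the manuscript's actual loss is not known from this report.

```python
import math
from typing import Sequence, Tuple


def inverse_frequency_weights(labels: Sequence[int]) -> Tuple[float, float]:
    """Return (w_neg, w_pos) so that the rarer class gets the larger weight."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return n / (2 * n_neg), n / (2 * n_pos)


def weighted_bce(
    labels: Sequence[int],
    probs: Sequence[float],
    w_neg: float,
    w_pos: float,
    eps: float = 1e-7,
) -> float:
    """Mean binary cross-entropy with a separate weight per class."""
    total = 0.0
    for y, p in zip(labels, probs):
        # Clamp probabilities to avoid log(0).
        p = min(max(p, eps), 1 - eps)
        total += -(w_pos * y * math.log(p) + w_neg * (1 - y) * math.log(1 - p))
    return total / len(labels)
```

With one sand mining pixel among four, the weights are 2.0 for the sand mining class and about 0.67 for the background, so an error on the rare class costs three times as much as an error on the common one.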

8. The outlook for future work is relatively broad

The specific direction of improvement is unclear: The paper mentions in the conclusion that it will focus on the design of the decoder structure and the optimization of the encoder parameters, but does not specifically point out possible improvement methods or strategies, and lacks targeted future research plans.

9. Lack of open source and reproduction support

Code and data are not public: In order to promote academic exchanges and verification, it is recommended that the author disclose the code and some data sets, but the paper does not mention whether relevant resources will be provided, which may affect the transparency and reproducibility of the research.

10. Limitations in practical applications

Dynamic monitoring of environmental changes: The mining of riverbank sand mining areas may be dynamic. The paper does not discuss the application and performance of the model on time series data, and its applicability in dynamic monitoring cannot be evaluated.

Comments on the Quality of English Language

Unclear sentence structure:

Sentence fragmentation and conjunction usage: Some sentences are too long and a bit wordy, which makes reading less smooth. For example:

“The output features of the LMS block in each level need to be input to a channel attention module (CAM), which is particularly defined in Section 3.4.”

It can be changed to: “The output features of the LMS block at each level are input into a channel attention module (CAM), which is defined in Section 3.4.”

Translation tone: Some sentences have obvious translation tone, which affects the natural fluency of English. For example:

“These four matrices are concatenated by channel to form a new feature F5, which has half the number of channels as the output matrix.”

A more natural expression could be: “These four matrices are concatenated along the channel dimension to form a new feature F5, which has half the number of channels compared to the output matrix.”

Terms and abbreviations are not explained at the first appearance:

Abbreviation explanation: Although many abbreviations are used in the paper, some abbreviations are not explained when they first appear, which may affect readers' understanding.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Comments and Suggestions for Authors

Dear Authors,

I reviewed your manuscript entitled “Lightweight Multi-scale Network for Segmentation of Riverbank Sand Mining Area in satellite Images” by Hongyang Zhang and Shuo Liu with great interest. In my opinion it deals with an important problem; however, it has been discussed in many papers before. In the last few years, we can observe a rapid increase in publications on the development of satellite data and numerical methods in the analysis of satellite images. For example:

https://www.mdpi.com/2077-1312/12/1/130

https://www.mdpi.com/2072-4292/16/20/3912

https://www.mdpi.com/1424-8220/21/5/1848

This work is in line with this trend and the authors provide a literature review devoted to earlier studies conducted by various authors. The achievements of other researchers should therefore be an inspiration to introduce original novelties that would be published in the form of an article. But will it be scientific? What is scientific? I ask this question to the authors of the discussed manuscript.

The authors' contribution is:

"we proposed a Lightweight Multi-scale Network (LMS Net) for the quick segmentation tasks of riverbank sand mining area from high-resolution satellite images."

"The performance of our proposed network meets the requirements of river management."
OK, in the further part of this work the authors present their analytical tool in a convincing way.

BUT:
1. As a researcher, I would like to be able to evaluate this success. To what extent does the performance of your proposed network meet the requirements of river management? 100%?

2. The publication mentions the phrase "river management requirements" several times. What are these requirements?

3. A large part of the Conclusions section consists of a description of what the authors intend to do and what has not been done. In the authors' opinion, does this not indicate that the work is unfinished?

However, the weakest element of this work, which negatively stands out from other publications, is the small amount of experimental data. Why should the algorithm be limited to the issue:

"... we will pay more attention to sand mining area not only in the riverbank, and focus on addressing the issue of confusion between sand mining areas and bare land."
Can't it be used for other areas of the terrain surface?

At this stage, taking into account the scope of the work and the authors' scientific contribution to the problem, I would assign the work to the technical notes group.
The scientific nature of the work will result from the appropriate scope of the analyzed empirical data, the provision of a quantitative characteristic of the success of the implemented analytical approach and the definition of the limits of its application.

I present only some general remarks here; there are many further handwritten corrections in the text (see the attached PDF file). Between us: the quality of the language is not so bad.
Reviewer

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

This paper still has the following problems:

1. The paper has many formatting issues, such as those on pages 5 and 14.

2. Lack of substantive details and evidence

Although each reply stated "we added data", "we made comparisons", and "we improved the experimental design", no specific explanation or quantitative results were provided in the reply (for example: what data sets were added? What are the comparison results with the latest methods? What evaluation indicators were added?). This makes it difficult for reviewers to evaluate the effectiveness of the changes in your point-to-point reply.

3. Inconsistency between "we have added data" and "we will open source the data"

In the reply, the author mentioned both "we have added data" and "we are willing to open source our data set". But it is not clear whether it has been open sourced or may be open sourced in the future. Reviewers need to see clear open source links or specific ways to obtain data, rather than just "willing" or "want".

4. Lack of systematic summary or reference location of new content

From the content of the reply, the author frequently uses expressions such as "we have added...see in the results/discussion section", but does not clearly mark which sections, figures or experiments have been added in the revised manuscript. For reviewers, it is difficult to quickly locate these updates in the revised manuscript.

5. The response to the reviewer's comments is relatively "superficial"

Most responses only state measures such as "added data", "improved expression", and "added indicators", but do not explain how these changes improve the overall quality of the paper or how they address the reviewer's initial concerns.

6. In the revision response letter (Cover Letter), the ideal approach is:

First, summarize the key points of the reviewer's questions and briefly acknowledge or explain the reasons;

Then, explain the specific changes, such as which chapters or experiments were added;

Finally, provide supporting evidence (comparison of data results, example tables or diagrams) and indicate the corresponding page number or paragraph position of the manuscript for the reviewer to compare.

Author Response

Please see the attachment.

Author Response File: Author Response.docx