Open Access Article

Peer-Review Record

EGSDK-Net: Edge-Guided Stepwise Dual Kernel Update Network for Panoptic Segmentation

Algorithms 2025, 18(2), 71; https://doi.org/10.3390/a18020071

by Pengyu Mu ¹,*, Hongwei Zhao ¹ and Ke Ma ²,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 16 November 2024 / Revised: 12 January 2025 / Accepted: 20 January 2025 / Published: 1 February 2025

(This article belongs to the Special Issue Machine Learning Algorithms for Image Understanding and Analysis)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Although the paper is interesting and technically correct, some scientific aspects are missing. The proposed changes to the ANN make the paper suitable for a technical magazine, but they are not enough for a scientific journal.

1. You should add SOTA from 2023-24 to Introduction and relate it with the motivation of your paper.

2. You presented the proposed changes. However, there is no explanation of how you arrived at them. Why these modifications? Why not some others?

3. You should explain why you chose the Canny edge detector. Did you perform a comparative analysis? What was the criterion for choosing Canny over other edge detectors? You cite reference 22, from 1986. Is there anything newer?

4. SDKUM: how did you calculate how many more input channels you need? It should not be an arbitrary number. You should at least carry out a comparative study with different numbers of channels and find the optimum based on some criterion. Why has no one else tried this?

Minor comments:

5. In the Abstract, you use two terms which should be connected: "we propose an Edge-Guided Stepwise Dual Kernel Update Network" and "superior performance of SDK-Net." Is this the same? It should be clarified.

6. The proposed name SDK-Net is similar to the .NET SDKs for Visual Studio (Microsoft). This can confuse search engines. You might consider renaming it.

7. "Extensive experiments" cannot be a contribution. They are a normal research activity. Consider deleting or rephrasing this contribution.

Author Response

Thank you very much for taking the time to review this manuscript. Your comments have been incredibly helpful. Please find below the detailed responses and the corresponding revisions highlighted in the resubmitted file.

Comment 1: You should add SOTA from 2023-24 to Introduction and relate it with the motivation of your paper.

Response 1: Thank you for pointing this out, and we agree with your comments. We have introduced two panoptic segmentation works from 2023-2024 in the introduction, summarizing their contributions and limitations. Additionally, we have referenced two boundary-guided papers from 2023-2024 to further elaborate on how existing works have inspired our approach. These revisions are highlighted in yellow in the second paragraph of the introduction (pages 1-2, lines 33-56).

Comment 2: You presented the proposed changes. However, there is no explanation of how you arrived at them. Why these modifications? Why not some others?

Response 2: Thank you for pointing this out, and we apologize for not making our motivations clearer in our description. We would like to clarify that our work primarily focuses on two key components: a real-time edge guidance module and a stepwise dual kernel update module. The inspiration for the first component comes from our observation that recent semantic segmentation and object detection methods have successfully incorporated edge guidance modules to achieve good results. Given the relationships between semantic segmentation, object detection, and panoptic segmentation, we wondered whether incorporating an edge guidance module could similarly enhance panoptic segmentation by integrating insights from external domains to improve performance. The second component arises from our reevaluation of RT-K-Net and K-Net, where we identified a low information utilization issue with their kernel update strategies. To address this, we aimed to improve the effective use of earlier-stage information by introducing additional updates and combinations to the kernels, thereby enhancing overall information utilization in later stages. After introducing the latest edge-guided methods in the introduction, we describe how these methods have inspired our approach. These revisions are highlighted in yellow in the second paragraph of the introduction (page 2, lines 52-56).

Comment 3: You should explain why you chose the Canny edge detector. Did you perform a comparative analysis? What was the criterion for choosing Canny over other edge detectors? You cite reference 22, from 1986. Is there anything newer?

Response 3: Thank you for your comment. We have added a description regarding the choice of the Canny algorithm in the third paragraph of Section 3.2. You can find the portion highlighted in yellow on page 7, lines 254-259. Please allow us to provide further clarification. There are two main reasons for our choice of the Canny algorithm. First, several prior works have used the Canny operator as the ground truth generation method, and it has been well-recognized in the field, which gives us confidence in its effectiveness. Second, as one of the most classic and widely used traditional edge detection algorithms, Canny provides reliable results with minimal additional computational cost. Moreover, since traditional edge detection algorithms, including Canny, are based on the physical properties of edges, they are known to yield stable results across different datasets. While the Canny algorithm was introduced in 1986, it remains a strong choice due to its robustness and reliability. Although modern edge detection methods, particularly those based on deep learning, have made significant advancements, these newer methods are often tailored to specific datasets. When applied to different datasets, they may not always produce equally satisfactory results. Therefore, in the early stages of our experiments, we opted for the more stable and dependable Canny algorithm.

Comment 4: SDKUM: how did you calculate how many more input channels you need? It should not be an arbitrary number. You should at least carry out a comparative study with different numbers of channels and find the optimum based on some criterion. Why has no one else tried this?

Response 4: Thank you for pointing this out, and we apologize for any confusion caused by our description. Please allow us to clarify. The number of input channels for SDKUM is a fixed value, determined by the features in the neck after they have been optimized by RTEGM. The number of channels in these features is determined by the backbone network, and it is not a configurable option in the experiment. Moreover, within SDKUM, the channels are doubled during the kernel update process. This doubling is intentional, designed to generate both the current-stage kernel and the predicted kernel for the next stage. It is not an experimental setting, as our method inherently includes two kernels: one for the current stage and one predicted for the next stage. Additionally, the number of channels in each kernel and the input channels for SDKUM follow the configurations of RT-K-Net and K-Net. We hope this clarifies our approach. If you still have any questions, please feel free to let us know, and we look forward to further discussion with you.
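
To illustrate the channel doubling described in Response 4, the following minimal Python sketch splits a channel-doubled kernel vector into a current-stage kernel and a predicted next-stage kernel. The function name `split_dual_kernel`, the use of plain lists, and the equal-halves split are assumptions made for illustration only; the response does not state how SDKUM separates the two kernels.

```python
from typing import List, Tuple


def split_dual_kernel(updated: List[float]) -> Tuple[List[float], List[float]]:
    """Split a channel-doubled kernel vector into two equal halves.

    Assumption (not from the paper): the first half is taken as the
    current-stage kernel and the second half as the predicted kernel
    for the next stage.
    """
    if len(updated) % 2 != 0:
        raise ValueError("expected an even number of channels after doubling")
    half = len(updated) // 2
    current_kernel = updated[:half]
    next_kernel = updated[half:]
    return current_kernel, next_kernel


# Hypothetical kernel with 3 channels, doubled to 6 by the update step.
current, predicted = split_dual_kernel([0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
```

Under this assumption, each kernel keeps the original channel count, which is consistent with the statement that the doubling is a structural property of the method and not a tunable setting.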

Comment 5: In the Abstract, you use two terms which should be connected: "we propose an Edge-Guided Stepwise Dual Kernel Update Network" and "superior performance of SDK-Net." Is this the same? It should be clarified.

Response 5: Thank you for pointing this out, and we agree with your comment. We have revised the corresponding description in the abstract. You can find the relevant revisions marked in red in the abstract.

Comment 6: The proposed name SDK-Net is similar to the .NET SDKs for Visual Studio (Microsoft). This can confuse search engines. You might consider renaming it.

Response 6: Thank you for pointing this out, and we agree with your comment. There is indeed a confusion with the abbreviation "SDK" and the .NET SDKs for Visual Studio (Microsoft). We have changed the abbreviation for our method to EGSDK-Net.

Comment 7: "Extensive experiments" cannot be a contribution. They are a normal research activity. Consider deleting or rephrasing this contribution.

Response 7: Thank you for pointing this out, and we agree with your comment. We have deleted the relevant description.

Reviewer 2 Report

Comments and Suggestions for Authors

This manuscript introduces an edge-guided network for panoptic segmentation, which is quite reasonable, as edges play an important role in segmentation. The authors made some comparisons with other models. However, the reviewer found that the compared models are quite old and not state of the art. In the reviewer's opinion, this work is well organized and well written, and can be accepted after the following major revisions.

1. The authors compared their results with some models dating back to 2020, which may be out of date. The authors are advised to compare the proposed method with more recent models, such as transformer-based and diffusion-based ones.

2. Edge detection may confuse edges with noise. Would the authors please explain how the two are distinguished?

Author Response

Comment 1: The authors compared their results with some models dating back to 2020, which may be out of date. The authors are advised to compare the proposed method with more recent models, such as transformer-based and diffusion-based ones.

Response 1: Thank you for pointing this out, and we agree with your comment. In the experimental section, we have additionally compared our method with two SOTA works from 2023-2024.

Comment 2: Edge detection may confuse edges with noise. Would the authors please explain how the two are distinguished?

Response 2: Thank you for pointing this out, and please allow me to provide a detailed explanation. First, we use edge maps generated by the Canny algorithm as ground truth. Since the Canny algorithm employs smoothing filters, non-maximum suppression, and double thresholding, it effectively mitigates the impact of noise, allowing us to obtain accurate ground truth maps with almost no additional computational cost. At the same time, we have designed a small branch within the RTEGM, which generates edge prediction maps during training. The generated edge prediction maps are supervised using a loss function, optimizing RTEGM's ability to extract edge information and reducing the influence of noise in the boundary guidance process. Additionally, in our method, we primarily focus on edge information related to the contours of things or stuff, and we do not pay much attention to the noise inside them. This is because the role of edge guidance is essentially to enhance the boundary pixels of each category’s segment, while internal classification is primarily handled by the backbone network and decoder. Hence, by ignoring some of the noise within the objects in the image, and through supervision via the loss function, we reduce the impact of the remaining noise, ultimately improving the effectiveness of our approach. We have added a description in Section 3.2, third paragraph (page 7, line 251), explaining that supervision via the loss function helps reduce the impact of noise, as highlighted in yellow.
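
The double-thresholding step mentioned in Response 2 is the part of Canny that most directly rejects isolated noise. The sketch below is a minimal pure-Python illustration of that step alone (hysteresis on a gradient-magnitude grid); it is not the authors' ground-truth pipeline. The function name `hysteresis_threshold`, the toy grid, and the threshold values are assumptions chosen for illustration, and the smoothing and non-maximum suppression stages are omitted.

```python
from collections import deque
from typing import List


def hysteresis_threshold(
    magnitude: List[List[float]], low: float, high: float
) -> List[List[bool]]:
    """Keep strong pixels (>= high) and weak pixels (>= low) that are
    8-connected to a strong pixel; discard everything else."""
    rows, cols = len(magnitude), len(magnitude[0])
    edges = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the search with all strong pixels.
    for r in range(rows):
        for c in range(cols):
            if magnitude[r][c] >= high:
                edges[r][c] = True
                queue.append((r, c))

    # Grow edges into neighbouring weak pixels (breadth-first).
    while queue:
        r, c = queue.popleft()
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (
                    0 <= nr < rows
                    and 0 <= nc < cols
                    and not edges[nr][nc]
                    and magnitude[nr][nc] >= low
                ):
                    edges[nr][nc] = True
                    queue.append((nr, nc))
    return edges


# Toy gradient magnitudes: one strong pixel with a weak neighbour,
# plus one isolated weak pixel in the bottom-right corner.
grid = [
    [0.9, 0.5, 0.1],
    [0.1, 0.1, 0.1],
    [0.1, 0.1, 0.5],
]
edges_result = hysteresis_threshold(grid, low=0.4, high=0.8)
```

In this toy grid, the weak pixel next to the strong one is kept, while the isolated weak pixel in the corner is dropped, which is the behaviour the response relies on when it says double thresholding mitigates noise.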

Reviewer 3 Report

Comments and Suggestions for Authors

In the paper “SDK-Net: Edge-Guided Stepwise Dual Kernel Update Network for Panoptic Segmentation” the authors propose an Edge-Guided Stepwise Dual Kernel Update Network for panoptic segmentation, where the core components are the real-time edge guidance module and the stepwise dual kernel update module.

The work presented by the authors is of high interest, but there are some points that need to be improved. My comments are as follows.

The Abstract should provide greater detail on the results obtained with the proposed approach.

The Introduction section should be revised, referring to recently published works (2023 and 2024). Based on the results of more recent proposals, authors should consider reviewing the positioning of the results they obtain with the proposals they present in this paper, as well as indicating the limitations that still exist. Likewise, the Related Work section should be updated to take into account newer methods/approaches.

Figures 1, 2 and 3 should be placed after, and as close as possible to, the text where they are referenced.

In the caption placed on the graph in Figure 1, the reference to the method proposed by Hou et al. should be more specific, including, for example, the corresponding citation.

While the text only describes the limitations of the method ... compared to that proposed by the authors, Figure 1 presents results from other methods in addition to these two. These methods/results should also be mentioned and explained in the text.

At least one paragraph must be written between section titles and subtitles.

Lines 133 – 134 – The sentence “… while retaining a significant amount of semantic information but …” should be revised.

From Equation 1, the balancing factors for the loss function should be better explained. The same applies for the balancing factors of Equation 2.

Line 294 – PQ metric must be defined and its use justified/explained.

Figure 5 should be placed close to the text where it is referenced. It appears in a different section of the document than the one in which it is referenced.

Visual results should be explained in greater detail. On the other hand, images should be improved so that the reader can better analyze them.

The Conclusions section should begin with a brief contextualization of the study/approach presented in the paper.

Author Response

Comment 1: The Abstract should provide greater detail on the results obtained with the proposed approach.

Response 1: Thank you for pointing this out, and we agree with your comment. We have added a detailed description of the results achieved by our method in the abstract. You can find the revision marked in red at the end of the abstract.

Comment 2: The Introduction section should be revised, referring to recently published works (2023 and 2024). Based on the results of more recent proposals, authors should consider reviewing the positioning of the results they obtain with the proposals they present in this paper, as well as indicating the limitations that still exist. Likewise, the Related Work section should be updated to take into account newer methods/approaches.

Response 2: Thank you for pointing this out, and we agree with your comment. We have introduced two panoptic segmentation works from 2023-2024 in the introduction, summarizing their contributions and highlighting their limitations. Additionally, the related work and experimental sections have been updated to include the newly referenced methods. You can find the highlighted revisions in yellow in the second paragraph of the introduction (pages 1-2, lines 33-41) as well as in the second paragraph of the related work section (page 4, lines 127-132).

Comment 3: Figures 1, 2 and 3 should be placed after, and as close as possible to, the text where they are referenced.

Response 3: Thank you for pointing this out, and we agree with your comment. We have moved Figures 1, 2, and 3 to the positions immediately following their references.

Comment 4: In the caption placed on the graph in Figure 1, the reference to the method proposed by Hou et al. should be more specific, including, for example, the corresponding citation.

Response 4: Thank you for pointing this out, and we agree with your comment. We have added a more detailed description in the caption of Figure 1, including the references corresponding to each method. The relevant revisions have been highlighted in yellow in the caption of Figure 1.

Comment 5: While the text only describes the limitations of the method ... compared to that proposed by the authors, Figure 1 presents results from other methods in addition to these two. These methods/results should also be mentioned and explained in the text.

Response 5: Thank you for pointing this out, and we agree with your comment. We have added new detailed descriptions about the other methods in Figure 1, both before and after the reference to the figure. You can find these revisions highlighted in yellow in the second paragraph of the introduction (page 2, lines 60-62 and 67-70).

Comment 6: At least one paragraph must be written between section titles and subtitles.

Response 6: Thank you for pointing this out, and we agree with your comment. We have added a paragraph between the section and subsection titles in section 3 and section 4. You can find the relevant changes highlighted in yellow on page 4, lines 161-166, and page 10, lines 336-338.

Comment 7: Lines 133 – 134 – The sentence “… while retaining a significant amount of semantic information but …” should be revised.

Response 7: Thank you for pointing this out. We agree with your comment. We have corrected the error in the sentence accordingly. You can find the revisions highlighted in yellow on page 5, lines 176-177.

Comment 8: From Equation 1, the balancing factors for the loss function should be better explained. The same applies for the balancing factors of Equation 2.

Response 8: Thank you for pointing this out. We agree with your comment. We have provided a more detailed explanation of the balancing factors in Formula 1 and Formula 2. You can find the revisions highlighted in yellow on page 5, lines 189-190 and 194-196.

Comment 9: Line 294 – PQ metric must be defined and its use justified/explained.

Response 9: Thank you for pointing this out. We agree with your comment. We have added a feasibility explanation for PQ, along with its detailed formula definition. You can find the revisions highlighted in yellow on page 10, lines 346-360.
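
For readers of this record, the PQ (panoptic quality) metric is commonly defined as the sum of IoU values over matched segment pairs (true positives, matched when IoU exceeds 0.5) divided by |TP| + 0.5|FP| + 0.5|FN|. The manuscript's own formula is not reproduced here, so the Python sketch below follows this standard definition as an assumption. The function name `panoptic_quality` and the example values are hypothetical.

```python
from typing import Sequence


def panoptic_quality(
    matched_ious: Sequence[float], num_fp: int, num_fn: int
) -> float:
    """Compute PQ for one class from the IoUs of matched segment pairs
    and the counts of unmatched predictions (FP) and ground truths (FN)."""
    # Under the standard definition, a match requires IoU > 0.5.
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched pairs must have IoU greater than 0.5")
    num_tp = len(matched_ious)
    denominator = num_tp + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        # No segments at all for this class; returning 0.0 is a convention
        # chosen for this sketch.
        return 0.0
    return sum(matched_ious) / denominator


# Hypothetical class with two matches, one false positive, one false negative.
pq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
```

Here the numerator is 0.8 + 0.6 = 1.4 and the denominator is 2 + 0.5 + 0.5 = 3, so PQ is roughly 0.467. The metric penalizes both poorly overlapping matches and unmatched segments in a single number.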

Comment 10: Figure 5 should be placed close to the text where it is referenced. It appears in a different section of the document than the one in which it is referenced.

Response 10: Thank you for pointing this out. We agree with your comment. We have moved Figure 5 to the location close to its reference in the text.

Comment 11: Visual results should be explained in greater detail. On the other hand, images should be improved so that the reader can better analyze them.

Response 11: Thank you for pointing this out. We agree with your comment. We have adjusted the image sizes in the visual results to allow readers to view and analyze the results more clearly. Additionally, we have revised the caption for Figure 6, and we hope the new caption provides a more detailed description. The corresponding revisions can be found on page 13.

Comment 12: The Conclusions section should begin with a brief contextualization of the study/approach presented in the paper.

Response 12: Thank you for pointing this out. We agree with your comment. We have added a description of the research background at the beginning of the conclusion. You can find the corresponding revisions on page 14, lines 445-449.
