Article
Peer-Review Record

DFA-UNet: Efficient Railroad Image Segmentation

Appl. Sci. 2023, 13(1), 662; https://doi.org/10.3390/app13010662
by Yan Zhang 1, Kefeng Li 1, Guangyuan Zhang 1,*, Zhenfang Zhu 1 and Peng Wang 1,2
Submission received: 11 December 2022 / Revised: 21 December 2022 / Accepted: 28 December 2022 / Published: 3 January 2023
(This article belongs to the Special Issue Applications of Video, Digital Image Processing and Deep Learning)

Round 1

Reviewer 1 Report

DFA-UNet: An Efficient Railroad Image Segmentation

 

The authors present an efficient railroad image segmentation algorithm based on an improved U-Net network architecture. The performance evaluation is based on the railsem19 dataset, and the proposed approach shows some improvement over the original U-Net implementation in terms of mIoU, F1-Score, Accuracy, Precision and Recall. I believe the work is interesting, but there are some gaps that should be fixed. General comments regarding paper writing and content are listed as follows.

 

"Percsion" -> "Precision"

"advancement. For the current" -> "advancement for the current"

"structure as U-Net," -> "structure as U-Net."

"deformation model-based," -> "model-based deformation,"

"railroad images often contain" -> "they often contain"

"the real-time of railroad segmentation" -> "the real-time railroad segmentation"

"To improve the real-time of railroad segmentation and improve the ability of segmentation to adapt to complex scenes." -> this sentence seems incomplete

 

"Therefore, image segmentation using deep learning techniques is a fast and accurate segmentation method." -> how fast? The authors provide other examples of image segmentation approaches, and deep learning techniques are certainly not faster than threshold-based segmentation, for instance. They are probably more accurate than conventional approaches, but more computationally intensive.

 

"to segmentation the railroad" -> "to segment the railroad"

"proposed by" -> "was proposed by"

"Zhou et al. [5] Presented UNet++ argue that direct" -> rewrite

"differences, In" -> "differences. In"

"U-Net, By" -> "U-Net. By"

"Resnet  for" -> remove extra space

"structure, Capable" -> "structure, capable"

"However the lack" -> "However, the lack"

"vision, Hu et al. [11] by studying the relationship between channels, a new structural unit, SE-Net is proposed" -> rewrite

"characterization.Oktat et al.[6]" -> "characterization. Oktat et al.[6]"

 

"network, Attention Gate is able" -> "network, able"

 

"architecture, To" -> "architecture. To"

"CNNs, The strategy" -> "CNNs, the strategy"

 

"encoder, Transformer" -> "encoder. Transformer"

"resolution, This" -> "resolution. This"

"decoder, It is" -> "decoder, it is"

 

"to its s long-range" -> "to its long-range"

"Camparing our" -> "Comparing our"

"input images ." -> "input images."

 

"image, The" -> "image. The"

"module, Enable" -> "module, enable"

"decoding, Each" -> "decoding, each"

 

"designis." -> "designs."

 

"24G" -> "24GB"

 

The processor used in the experiments should also be specified.

 

Please increase the font size of the labels in Figure 5.

 

"it is obvious from the results" -> rewrite

This is not obvious at all. In fact, for the first input image, the result of DFA-UNet is very similar to the one from Attention U-Net. The same behavior repeats with the third input image. In the case of the second input image, the result is too distant from the ground truth. How did the authors measure the comparison to the ground truth? Did they use the number of pixels correctly classified, or was it just visually compared? The result for the second input image seems very bad, even though the authors state that it is the best of the tested algorithms. This is very subjective.
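
To make the question about pixel-level comparison concrete, the following is a minimal sketch of how a predicted mask could be scored against the ground truth using pixel accuracy and mean IoU. It is not the authors' evaluation code: the mask format (a 2-D grid of integer class labels) and the function names `pixel_accuracy`, `per_class_iou` and `mean_iou` are assumptions made only for illustration.

```python
from typing import Dict, Sequence

# Assumed mask format: a 2-D grid of integer class labels, one per pixel.
Mask = Sequence[Sequence[int]]


def pixel_accuracy(pred: Mask, truth: Mask) -> float:
    """Fraction of pixels whose predicted label equals the ground-truth label."""
    total = 0
    correct = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            total += 1
            if p == t:
                correct += 1
    return correct / total


def per_class_iou(pred: Mask, truth: Mask, num_classes: int) -> Dict[int, float]:
    """Intersection over union per class; classes absent from both masks are skipped."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == t:
                # Pixel belongs to both the predicted and the true region of class p.
                inter[p] += 1
                union[p] += 1
            else:
                # Pixel counts toward the union of both classes involved.
                union[p] += 1
                union[t] += 1
    return {c: inter[c] / union[c] for c in range(num_classes) if union[c] > 0}


def mean_iou(pred: Mask, truth: Mask, num_classes: int) -> float:
    """Unweighted mean of the per-class IoU values."""
    ious = per_class_iou(pred, truth, num_classes)
    return sum(ious.values()) / len(ious)


# Toy 2x3 masks with two classes (0 = background, 1 = rail), purely illustrative.
truth_mask = [[0, 0, 1],
              [0, 1, 1]]
pred_mask = [[0, 1, 1],
             [0, 1, 0]]

accuracy = pixel_accuracy(pred_mask, truth_mask)
miou = mean_iou(pred_mask, truth_mask, num_classes=2)
```

Reporting such per-image scores next to each qualitative result would allow the visual comparison to be checked against a number rather than judged by eye.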

 

How was the comparison of the different kernel sizes performed? Which datasets were used in the comparison? This should be explained in more detail.

 

"Stronger capture of low-level semantic information through fusion strategy." -> this sentence seems to be detached from the rest of the text.

 

"scales, In" -> "scales. In"

 

Some important references are not cited in the text. For instance, the authors do not mention RailNet (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8859360&tag=1), which has a higher mIoU than the proposed work and runs at 20 fps using a simpler GPU (NVIDIA GTX1080).

 

The authors should provide more details regarding the datasets used for validation. According to the railsem19 dataset webpage, it has 8500 unique images, in contrast to the 2000 mentioned by the authors.

 

In summary, I believe the authors should provide more details regarding the validation of their work, together with more comparisons with state-of-the-art rail segmentation algorithms.

 

Other works that should be mentioned/compared:

 

https://dl.acm.org/doi/pdf/10.1145/3503161.3548050

 

https://hrcak.srce.hr/file/410699

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

There are many typos and errors in this paper, which requires extensive proofreading.

The results presented are adequate, but there is very little explanation and analysis of the contribution of this study.

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors have made the necessary adjustments; I'm satisfied.

Please correct these few items in the final version of the paper:

"Percision" -> "Precision"

"although their network has very good detection speed, their accuracy is relatively low" -> "Although their network has very good detection speed, their accuracy is relatively low."

"For image segmentation task" -> the font is different from the rest of the text

"Particulatly," -> "Particularly,"

"these two models respectively." -> "these two models, respectively."

 
