DFA-UNet: Efficient Railroad Image Segmentation
Round 1
Reviewer 1 Report
DFA-UNet: An Efficient Railroad Image Segmentation
The authors present an efficient railroad image segmentation algorithm based on an improved U-Net network architecture. The performance evaluation is based on the RailSem19 dataset, and the proposed approach shows some improvement over the original U-Net implementation in terms of mIoU, F1-Score, Accuracy, Precision and Recall. I believe the work is interesting, but there are some gaps that should be addressed. General comments regarding the writing and content of the paper are listed as follows.
"Percsion" -> "Precision"
"advancement. For the current" -> "advancement for the current"
"structure as U-Net," -> "structure as U-Net."
"deformation model-based," -> "model-based deformation,"
"railroad images often contain" -> "they often contain"
"the real-time of railroad segmentation" -> "the real-time railroad segmentation"
"To improve the real-time of railroad segmentation and improve the ability of segmentation to adapt to complex scenes." -> this sentence seems incomplete
"Therefore, image segmentation using deep learning techniques is a fast and accurate segmentation method." -> How fast? The authors give other examples of image segmentation approaches, and deep learning techniques are certainly not faster than threshold-based segmentation, for instance. They are probably more accurate than conventional approaches, but more computationally intensive.
"to segmentation the railroad" -> "to segment the railroad"
"proposed by" -> "was proposed by"
"Zhou et al. [5] Presented UNet++ argue that direct" -> rewrite
"differences, In" -> "differences. In"
"U-Net, By" -> "U-Net. By"
"Resnet for" -> remove extra space
"structure, Capable" -> "structure, capable"
"However the lack" -> "However, the lack"
"vision, Hu et al. [11] by studying the relationship between channels, a new structural unit, SE-Net is proposed" -> rewrite
"characterization.Oktat et al.[6]" -> "characterization. Oktat et al.[6]"
"network, Attention Gate is able" -> "network, able"
"architecture, To" -> "architecture. To"
"CNNs, The strategy" -> "CNNs, the strategy"
"encoder, Transformer" -> "encoder. Transformer"
"resolution, This" -> "resolution. This"
"decoder, It is" -> "decoder, it is"
"to its s long-range" -> "to its long-range"
"Camparing our" -> "Comparing our"
"input images ." -> "input images."
"image, The" -> "image. The"
"module, Enable" -> "module, enable"
"decoding, Each" -> "decoding, each"
"designis." -> "designs."
"24G" -> "24GB"
The processor used in the experiments should also be specified.
Please increase the font size of the labels in Figure 5.
"it is obvious from the results" -> rewrite
This is not obvious at all. In fact, for the first input image, the result of DFA-UNet is very similar to the one from Attention U-Net. The same behavior repeats with the third input image. In the case of the second input image, the result is too distant from the ground truth. How did the authors measure the comparison to the ground truth? Did they use the number of pixels correctly classified, or was it just compared visually? The result for the second input image seems very bad, even though the authors state that it is the best of the tested algorithms. This is very subjective.
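For illustration, a pixel-based comparison against the ground truth could be computed along the following lines. This is only a minimal sketch for a single foreground class: the function name `binary_segmentation_metrics` and the toy masks are assumptions made for this example and are not taken from the manuscript. For a multi-class dataset, mIoU would be the average of the per-class IoU values.

```python
def binary_segmentation_metrics(pred, truth):
    """Pixel-wise metrics for two binary masks given as nested lists of 0/1.

    Hypothetical helper for illustration only; not the authors' evaluation code.
    """
    tp = fp = fn = tn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1      # rail pixel correctly predicted
            elif p and not t:
                fp += 1      # background predicted as rail
            elif not p and t:
                fn += 1      # rail pixel missed
            else:
                tn += 1      # background correctly predicted

    def ratio(num, den):
        # Avoid division by zero when a class is absent from both masks.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + fn + tn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "iou": ratio(tp, tp + fp + fn),
    }


# Toy 3x4 masks (assumed data, 1 = rail, 0 = background).
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0]]
pred = [[0, 1, 0, 0],
        [0, 1, 1, 0],
        [0, 1, 1, 1]]

metrics = binary_segmentation_metrics(pred, truth)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting such per-image numbers next to the qualitative figures would make the visual comparison less subjective.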
How was the comparison of the different kernel sizes performed? Which datasets were used in the comparison? This should be explained in more detail.
"Stronger capture of low-level semantic information through fusion strategy." -> this sentence seems to be detached from the rest of the text.
"scales, In" -> "scales. In"
Some important references are not cited in the text. For instance, the authors do not mention RailNet (https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=8859360&tag=1), which has a higher mIoU than the proposed work and runs at 20 fps using a simpler GPU (NVIDIA GTX1080).
The authors should provide more details regarding the datasets used for validation. According to the RailSem19 dataset webpage, it has 8500 unique images, in contrast to the 2000 mentioned by the authors.
In summary, I believe the authors should provide more details regarding the validation of their work, together with further comparison with state-of-the-art rail segmentation algorithms.
Other works that should be mentioned/compared:
https://dl.acm.org/doi/pdf/10.1145/3503161.3548050
https://hrcak.srce.hr/file/410699
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Reviewer 2 Report
There are many typos and errors in this paper, which requires extensive proofreading.
The results presented are adequate, but there is very little explanation and analysis of the contribution of this study.
Comments for author File: Comments.pdf
Author Response
Please see the attachment.
Author Response File: Author Response.docx
Round 2
Reviewer 1 Report
The authors have made the necessary adjustments; I'm satisfied.
Please correct these few items in the final version of the paper:
"Percision" -> "Precision"
"although their network has very good detection speed, their accuracy is relatively low" -> "Although their network has very good detection speed, their accuracy is relatively low."
"For image segmentation task" -> the font is different from the rest of the text
"Particulatly," -> "Particularly,"
"these two models respectively." -> "these two models, respectively."