Article
Peer-Review Record

Detecting Object-Level Scene Changes in Images with Viewpoint Differences Using Graph Matching

Remote Sens. 2022, 14(17), 4225; https://doi.org/10.3390/rs14174225
by Kento Doi 1,2,*, Ryuhei Hamaguchi 2, Yusuke Iwasawa 1, Masaki Onishi 2, Yutaka Matsuo 1 and Ken Sakurada 2
Reviewer 1:
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 8 July 2022 / Revised: 11 August 2022 / Accepted: 18 August 2022 / Published: 27 August 2022
(This article belongs to the Section AI Remote Sensing)

Round 1

Reviewer 1 Report

General language comments.

a) To describe the procedure, the methods, and the results, you alternate between the past tense and the present tense. Please stick to one of the two.

b) There are a few instances of “, and” that seem incorrect, e.g. lines 298 and 377.

c) Sometimes you use “Fig.” and sometimes “Figure”. Please choose one.

d) The order of the figures is strange; e.g., Figure 7 is cited before Figure 6. Please check.

Line-by-line comments, indicating the line number, follow below.

1 - Univercity -> University

18 - I would use “instantaneous” instead of “instant”.

29 - estimates -> estimate

30 - “3D map.Recently” -> “3D map. Recently”

31 - convolutional neural networks (CNNs) [ to define the acronym for the first time]

37 - “sometimes” seems a bit generic

51 - “where” -> since [or the sentence is unclear]

51 - “There is a viewpoint … shutter timing” please rephrase.

54 - “previous studies” -> add a reference to papers or to the section where you discuss the state of the art.

79 - “which are” -> which is

86 - the indentation does not look right

86 - missing colon at the end of the sentence

Chapter 2 general comment: while it is interesting to know the full state of the art, this chapter seems far too detailed and long for the scope of this paper. I would summarize it, especially Section 2.1. The citations already appear in lines 22-23, so the papers listed here already receive their rightful acknowledgment.


156 - “object-level” - a definition would help

166 - t1>t2?

166 - What exactly are I1 and I2? Please define them (I imagine some sort of matrices).

168 - The network may return bounding boxes *or* instance masks. What does this mean exactly? That you have two different networks? You may have defined it later, but it would help to have this explained here.

171 - “follows: First” -> “follows: first”

171 - “from them” -> “from them, respectively.”

174 - “appear”: given the natural order in which I2 follows I1, “appear” does not seem to be the right verb. I would say that the object disappears from I2 if the sum in line 175 equals 0. The same applies to “disappear” later; it seems you want to say the object appears in I2.

Section 3.3.1 general comment: this paragraph is quite obscure. Also, while they most probably refer to weights and biases, there is no definition of Ws and bs.

Chapter 4 general comments: 

- It seems you want to say “columns” where you write “rows”, e.g. lines 246-247 (the second column from the right, not “the second row from the bottom”).

- I would use “this work” in place of “ours”.

251 - what exactly is an environment?

264 - I would say that another advantage is that you have an easy way to check ground truth. 

273 - Is it Figure 5? If not, Figure 5 is in any case not cited anywhere, and it is not clear to me what the bounding boxes and masks for changes in I1 are, considering that they seem not to be linked with the changes in I2.

279 - it could be useful to know the binning of the ranges.

Section 5.1.2. : a scheme of the networks could help.

308/309 - what does “respectively” mean in this context? That you multiplied by 0.1 at epoch 16 and again by 0.1 at epoch 22? This is not clear.

313 - I would remove “The learning … parallel”; this is implied by the fact that you used a Tesla V100.

334 - a reference to, or a definition of, mIoU and mCA would help.

339/340 - “follows: For” -> “follows: for”

343 - Why are you putting this sentence in brackets and after the dot? I don’t understand. 

350 - Some details on DyHead could help

Figure 10 caption - “matching module, respectively”

374/375 - why is it important in infrastructure inspection?

380 - “in the object detection layer” or something similar

393/397 - I would swap the order (1) <-> (2), since you define “out of view” in (2).

415/416 - I would not propose using a broader field of view as a simple solution, since this is more of a limit on the images that you could use. It is an improper solution; you are just suggesting reducing the scope of your network.

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

This paper innovatively uses graph matching to detect object-level scene changes under viewpoint differences: detection algorithms identify objects from different perspectives, and a graph neural network then establishes relationships between them to determine the changed objects.

1. The usual pixel-level change detection algorithms use a Siamese neural network in the feature extraction part. In the target detection part of your algorithm, did you try to use a Siamese neural network to reduce the amount of computation?

2. What are the advantages of your algorithm over conventional pixel-level change detection algorithms?

3. If it is a natural scene dataset, can your algorithm be generalized?

Comments for author File: Comments.pdf

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Reviewer 3 Report

Excellent work that is novel and easy to read and understand!

In this research work, the authors present a novel change detection network with the aim of detecting object-level changes, as opposed to pixel-level changes, from an image pair with viewpoint changes. Their proposed methodology is delivered over two phases. In the first phase, they utilize an object detection module. In the second phase, they utilize a graph matching module. Combined, the two phases allow for accurate and robust detection of object-level changes in image pairs, as well as the extraction of object regions in the form of a bounding box or an instance mask. The introduction and the related work sections present the problem and showcase the latest developments in the literature to solve it. This eases the reader into the topic and paves the way for better understanding of what this manuscript has to offer. The proposed scheme is then well described. This is aided with appropriate figures and mathematical formulation. Next, the experiments, computations and their results are described clearly. This section exhibits its strengths not only through the computed values, but also through the comparison with counterpart schemes and datasets from the state of the art. The proposed scheme shows either comparable or superior performance. The discussion section provides adequate commentary on the performance of the proposed scheme, while the conclusions section summarizes the contents of the manuscript and refers to the achieved performance. This is especially the case where the proposed scheme is shown not to suffer from loss of recall even with changes in viewpoint. Furthermore, error modes are found and categorized into various classes, including object-detection errors, graph matching errors, and errors due to objects transitioning out of the field of view.


The related works are adequate and mostly recent. The manuscript is well-written and easy to comprehend. Very few typing mistakes are detected. Examples of such mistakes include:

·      Missing spaces at the beginning of sentences, as in lines 23 and 30.

·      Extra spaces in lines 145, 158 and 159.

Author Response

Please see the attachment.

Author Response File: Author Response.docx
