Article
Peer-Review Record

Direct Aerial Visual Geolocalization Using Deep Neural Networks

Remote Sens. 2021, 13(19), 4017; https://doi.org/10.3390/rs13194017
by Winthrop Harvey 1,*, Chase Rainwater 1 and Jackson Cothren 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 31 August 2021 / Revised: 27 September 2021 / Accepted: 28 September 2021 / Published: 8 October 2021
(This article belongs to the Special Issue Computer Vision and Deep Learning for Remote Sensing Applications)

Round 1

Reviewer 1 Report

The paper describes a study performed in the field of Aerial Visual Geolocalization. 

I have no particular comments on this work, since it is rigorous in its methods and CNNs apparently yield good results for the specific task.

My main concern relates to the scientific soundness of this test, since it should be considered more a scientific report than a scientific article. I'll explain why.

The task for which the CNN is trained is not difficult in itself, and, as far as the authors state, no modifications were made to the architectures. The authors "simply" test different loss functions and networks and compare their performance. This, to some extent, reduces the soundness of the paper, since it is not novel that, for similar tasks, AI-driven computer vision outperforms (and not always!) feature-based methods.

All in all, the application is interesting, but the results are far from being applicable in a real scenario.

I can reconsider the paper only after the following recommendations have been addressed.

1 - Please explain in more detail how the ground truth data were collected.

2 - Please explain in detail the training methods, dataset annotation and training accuracy, so that the reader can understand how the network was trained.

3 - At least one comparison with a well-known method (SIFT, for instance) should be made, and the authors should explain the pros and cons of exploiting AI.

4 - The literature review is poor, as the topic of safe landing and visual localization is well known and has been tackled by several researchers in the past.

5 - The main contribution of the manuscript should be better defined, in both the introduction and the discussion (by comparing it with the existing literature, once expanded).

Author Response

Please see attachment.

Author Response File: Author Response.docx

Reviewer 2 Report

In the paper, the authors presented a study of how effectively a modern CNN designed for visual classification can be applied to the problem of absolute visual geolocalization (localization without a prior location estimate). An Xception based CNN architecture was trained for this purpose on data of orthorectified high altitude photographs of the region in Washington County, Arkansas. It is an interesting study whose results may have practical application. The paper still needs some minor revisions:

75 Add the reference [15] as [3] to "Xception CNN architecture" and consecutively renumber references according to MDPI editorial standards.

172 Explain the abbreviation ROI (region of interest). Abbreviations should be explained at their first appearance in the text.

182-184 This normalization procedure is not explained clearly. Where are the (0,0) coordinates after scaling/normalization if the labels start from the image centers? Either the labels start in one of the image corners, in which case they are scaled to range between 0 and 1, or they start in the center, in which case the range should be -1 to 1. An explanatory figure could be added.

199 Add a reference to Keras.

222 When the variable is normalized, the statistic is commonly prefixed by the letter N for normalized or R for relative. A scaled or normalized (dimensionless) RMSE should rather be named NRMSE.

Table 1 Analogous to the previous comment: NMAE instead of MAE.

238-239 RMSE is a loss (error) function, while Euclidean distance is just a metric. You may use the RMSE of the Euclidean distance between two observations as the loss function. The error made by your predictor is the Euclidean distance, and your loss function would be the RMSE of these errors. What do you mean by stating that Euclidean distance differs from RMSE by a constant multiplicative factor? If it is the result of normalization, then explain it clearly.

486-496 The "Discussion" section is comprehensive but the "Conclusion" lacks numerical data showing how effectively the designed CNN can be applied to the problem of Absolute Visual Geolocalization. The "Conclusion" should be expanded with the most important results.
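
As an illustration of the comment on lines 182-184, the two label conventions it distinguishes can be sketched as follows. This is a minimal sketch, not the authors' procedure: the function names `normalize_corner` and `normalize_center` and the image size used below are hypothetical and chosen only for this example.

```python
def normalize_corner(x_px, y_px, width, height):
    """Corner-origin convention (hypothetical helper).

    (0, 0) is an image corner, so normalized labels lie in [0, 1].
    """
    return x_px / width, y_px / height


def normalize_center(x_px, y_px, width, height):
    """Center-origin convention (hypothetical helper).

    (0, 0) is the image center, so normalized labels lie in [-1, 1].
    """
    return 2.0 * x_px / width - 1.0, 2.0 * y_px / height - 1.0


if __name__ == "__main__":
    # Assumed image size in pixels, for illustration only.
    width, height = 1000, 800
    for x_px, y_px in [(0, 0), (500, 400), (1000, 800)]:
        corner = normalize_corner(x_px, y_px, width, height)
        center = normalize_center(x_px, y_px, width, height)
        print((x_px, y_px), "corner-origin:", corner, "center-origin:", center)
```

Under the first convention the image center maps to (0.5, 0.5); under the second it maps to (0, 0), which is why the stated range and the stated origin must agree.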

Comments for author File: Comments.pdf
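
One possible reading of the "constant multiplicative factor" questioned in the comment on lines 238-239 is sketched below; whether this is what the authors meant is an assumption. For a single two-dimensional prediction, the RMSE taken over the two coordinate residuals equals the Euclidean distance divided by the square root of 2. Over a batch, the same factor relates the coordinate RMSE to the root mean square of the distances, but not to their plain mean. All function names and sample values are hypothetical.

```python
import math


def euclidean(pred, true):
    """Euclidean distance between one predicted and one true (x, y) point."""
    return math.hypot(pred[0] - true[0], pred[1] - true[1])


def coordinate_rmse(preds, trues):
    """RMSE over every coordinate residual (x and y pooled together)."""
    squared = [(p[i] - t[i]) ** 2 for p, t in zip(preds, trues) for i in range(2)]
    return math.sqrt(sum(squared) / len(squared))


def mean_distance(preds, trues):
    """Plain average of the per-sample Euclidean distances."""
    return sum(euclidean(p, t) for p, t in zip(preds, trues)) / len(preds)


def rms_distance(preds, trues):
    """Root mean square of the per-sample Euclidean distances."""
    return math.sqrt(sum(euclidean(p, t) ** 2 for p, t in zip(preds, trues)) / len(preds))


if __name__ == "__main__":
    # Hypothetical predictions and ground-truth labels.
    preds = [(3.0, 4.0), (0.0, 0.0)]
    trues = [(0.0, 0.0), (0.0, 0.0)]
    rmse = coordinate_rmse(preds, trues)
    print("coordinate RMSE * sqrt(2):", rmse * math.sqrt(2))
    print("RMS of distances:         ", rms_distance(preds, trues))
    print("mean of distances:        ", mean_distance(preds, trues))
```

The factor is therefore independent of any label normalization, but it applies to the squared-distance average rather than to the mean Euclidean error.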

Author Response

Please see the attachment.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

The authors followed my suggestions and the article is now ready for publication.
