


Open Access Article

Peer-Review Record

Unsupervised Knowledge Extraction of Distinctive Landmarks from Earth Imagery Using Deep Feature Outliers for Robust UAV Geo-Localization

Mach. Learn. Knowl. Extr. 2025, 7(3), 81; https://doi.org/10.3390/make7030081

by Zakhar Ostrovskyi¹, Oleksander Barmak¹, Pavlo Radiuk¹,* and Iurii Krak²,³

Reviewer 1: John Adegoke

Reviewer 2: Shihong Yue

Reviewer 3: Anonymous

Submission received: 24 June 2025 / Revised: 31 July 2025 / Accepted: 11 August 2025 / Published: 13 August 2025

(This article belongs to the Special Issue Deep Learning in Image Analysis and Pattern Recognition, 2nd Edition)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This work presents the use of outlier detection in a deep feature embedding space for autonomous landmark curation.

The authors' findings show that the constructed embedding space yields a highly discriminative, computationally efficient set of landmarks suitable for real-time, robust UAV navigation. Overall, the paper is well structured and includes a detailed breakdown of segmentation, feature extraction, embedding construction, and unsupervised filtering. I find the use of analytical metrics such as S2S and S2D particularly insightful.
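Reviewer 1 lists embedding construction as one of the pipeline stages. As a rough illustration only, the sketch below shows one common way to turn multi-layer feature maps into a single descriptor: global max-pooling each map over its spatial dimensions, L2-normalizing each pooled vector, and concatenating the results. The number of layers, the channel sizes, the per-layer normalization, and the helper name `aggregate_multilayer` are assumptions made for illustration, not the authors' exact recipe; random arrays stand in for real YOLOv11n activations.

```python
import numpy as np


def aggregate_multilayer(feature_maps):
    """Max-pool each (C, H, W) map over space, L2-normalize it, and concatenate."""
    parts = []
    for fmap in feature_maps:
        pooled = fmap.reshape(fmap.shape[0], -1).max(axis=1)  # one value per channel
        norm = np.linalg.norm(pooled)
        parts.append(pooled / norm if norm > 0 else pooled)
    return np.concatenate(parts)


# Hypothetical activations from three backbone layers for one building crop.
rng = np.random.default_rng(0)
maps = [
    rng.standard_normal((64, 40, 40)),
    rng.standard_normal((128, 20, 20)),
    rng.standard_normal((256, 10, 10)),
]
embedding = aggregate_multilayer(maps)
print(embedding.shape)  # (448,)
```

Normalizing each layer separately keeps a wide layer from dominating the concatenated vector; whether this helps landmark selection in practice is an empirical question.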

However, it would be good if the authors could include an empirical comparison with strong D2R methods such as NetVLAD, DELF, or recent transformer-based models as baselines for their work. I know this was discussed in the introduction, but can the authors demonstrate that their results are superior to those obtained with these techniques?

The authors should be more explicit about the broader impact and practical benefits of their study to society. While the technical contributions are well presented, articulating how this work advances real-world applications would significantly enhance its appeal to a wider audience, including policymakers, engineers, and the general public.

Given that the dataset used in this work was collected over an urban area in Germany with specific climatic conditions and environmental features, can the authors explain how this method would perform when applied to other regions, especially those whose landscape patterns differ from the one used in this work? How will the model fare under seasonal variations? Perhaps this can be added as a direction for future work.

I see that the metric “Accuracy” was reported for the YOLOv11n segmentation; however, I believe it is paramount for the authors to also perform a detailed error analysis. This would help in understanding the reasons behind inaccurate segmentations and allow targeted improvements in the model's performance, especially for real-world deployment.
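One way to go beyond a single accuracy figure, as suggested above, is a pixel-level confusion matrix with per-class intersection over union (IoU), which separates false positives (background labelled as building) from false negatives (missed building pixels). The sketch below assumes a hypothetical two-class setup (0 = background, 1 = building) and toy 3×3 masks; the helper names are illustrative and not taken from the manuscript.

```python
import numpy as np


def confusion_matrix(pred, target, num_classes):
    """Pixel-level confusion matrix: rows = reference class, columns = predicted class."""
    flat = target.ravel() * num_classes + pred.ravel()
    counts = np.bincount(flat, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def per_class_iou(cm):
    """IoU = TP / (TP + FP + FN) per class; NaN for classes absent from both masks."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp  # predicted as the class but labelled otherwise
    fn = cm.sum(axis=1) - tp  # labelled as the class but predicted otherwise
    denom = tp + fp + fn
    return np.divide(tp, denom, out=np.full_like(tp, np.nan), where=denom > 0)


# Hypothetical 3x3 masks: 0 = background, 1 = building.
target = np.array([[0, 1, 1], [0, 1, 1], [0, 0, 0]])
pred = np.array([[0, 1, 0], [1, 1, 1], [0, 0, 0]])
cm = confusion_matrix(pred, target, num_classes=2)
print(cm.tolist())           # [[4, 1], [1, 3]]
print(per_class_iou(cm)[1])  # 0.6
```

Breaking these counts down further, for example by building size or scene type, would show where segmentation errors concentrate.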

The Dataset and Experimental Setup section should be moved to the Methods section rather than remaining in the Results section.

Lines 307-309: The authors should cite a reference for this claim. Have other related studies suggested this?

Line 409: What kind of learnable alternatives? The authors should include examples.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This study addresses the navigation of unmanned aerial vehicles in environments without GPS signals, proposing a novel target and path recognition and localization method based on landmark databases and lightweight networks. Using a lightweight automated pipeline that identifies visually diverse urban buildings and landmarks from Earth observation data, the authors construct a landmark database for drone positioning that can function normally even without GPS signals. After pre-training, the network can be combined with semantic information without additional training. The experimental results validate the correctness of the proposed method. The study also discusses advantages, disadvantages, and the range of applicability, and is thus of useful reference value to peers. The technical designs for implementing the method are feasible and reasonable. We recommend accepting the manuscript after major revisions.

The following aspects should be addressed:

1) What is the resolution of the landmark imagery included in the database, and does it match the resolution of the drone sensor? This is a key factor in determining applicability. Furthermore, to what extent is this navigation affected by weather and light intensity?

2) The effect of outlier detection in the second section is not fully analyzed.

3) How many similarity computations does this lightweight network require, on average, in critical database queries and similarity searches?
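Points 1 and 3 can be sized with simple arithmetic. The sketch below, under stated assumptions, computes the ground sampling distance (GSD) of a nadir-looking pinhole camera, GSD = sensor width × altitude / (focal length × image width), and the cost of an exhaustive cosine-similarity query, which is N × D multiply-adds for N database embeddings of dimension D. All camera parameters, the 0.3 m/px reference resolution, the database size, and the helper names are hypothetical; an approximate nearest-neighbour index would lower the query cost.

```python
import numpy as np


def ground_sampling_distance(sensor_width_mm, focal_length_mm, image_width_px, altitude_m):
    """Ground metres covered by one pixel for a nadir-looking pinhole camera."""
    return (sensor_width_mm * altitude_m) / (focal_length_mm * image_width_px)


def normalize_rows(matrix):
    """L2-normalize each row so that a dot product equals cosine similarity."""
    return matrix / np.linalg.norm(matrix, axis=1, keepdims=True)


def cosine_top_k(query, db_normalized, k=5):
    """Exhaustive cosine search against a pre-normalized (N, D) database."""
    q = query / np.linalg.norm(query)
    scores = db_normalized @ q                     # N dot products of length D
    top = np.argsort(-scores)[:k]
    multiply_adds = db_normalized.shape[0] * db_normalized.shape[1]
    return top, scores[top], multiply_adds


# Point 1: resolution matching (all camera numbers are hypothetical).
gsd_uav = ground_sampling_distance(6.3, 4.5, 4000, 300.0)  # 0.105 m/px
gsd_ref = 0.3                                               # hypothetical reference GSD, m/px
scale = gsd_uav / gsd_ref  # resize factor for the UAV frame; < 1 means downsample

# Point 3: query cost against N = 1000 hypothetical embeddings of size D = 448.
rng = np.random.default_rng(0)
db = normalize_rows(rng.standard_normal((1000, 448)))  # normalized once, offline
query = db[42] + 0.01 * rng.standard_normal(448)       # noisy view of entry 42
top, top_scores, cost = cosine_top_k(query, db)
print(round(gsd_uav, 3), round(scale, 2), top[0], cost)
```

Normalizing the database once keeps the per-query work to a single matrix-vector product, so the cost grows linearly with the number of stored landmarks.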

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript proposes an unsupervised framework for identifying distinctive urban landmarks from satellite imagery to support UAV navigation in GPS-denied environments. The approach involves extracting multi-layer embeddings from a pre-trained segmentation network (YOLOv11n), applying aggregation techniques (max-pooling), and selecting distinctive landmarks via the Isolation Forest algorithm. The method is validated on the VPAIR benchmark dataset, achieving promising improvements in Recall@1 and Recall@5 for selected landmarks versus typical buildings. The manuscript demonstrates a thoughtful combination of existing techniques. While the topic is relevant, I have some questions/comments on methodological novelty, depth of evaluation, and generality required for acceptance.
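The selection step summarized above can be sketched as a minimal from-scratch Isolation Forest in NumPy. Random axis-aligned splits isolate each embedding, and the anomaly score is s = 2^(-E[h(x)]/c(ψ)), where h is the path length and c(ψ) is the average path length for a subsample of size ψ. This is not the authors' implementation. As one possible data-driven alternative to a fixed cutoff, a point Reviewer 3 raises below, the sketch selects landmarks whose score exceeds the median plus three scaled median absolute deviations (MAD). That rule, the synthetic 4-D data, and all function names are assumptions.

```python
import math

import numpy as np

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Average path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(X, depth, max_depth, rng):
    """Split on a random feature at a random value until isolated or too deep."""
    n = X.shape[0]
    if depth >= max_depth or n <= 1:
        return ("leaf", n)
    feat = rng.integers(X.shape[1])
    lo, hi = X[:, feat].min(), X[:, feat].max()
    if lo == hi:
        return ("leaf", n)
    split = rng.uniform(lo, hi)
    mask = X[:, feat] < split
    return ("node", feat, split,
            build_tree(X[mask], depth + 1, max_depth, rng),
            build_tree(X[~mask], depth + 1, max_depth, rng))


def path_length(x, node, depth=0):
    """Depth at which x reaches a leaf, plus the expected remaining depth c(leaf size)."""
    if node[0] == "leaf":
        return depth + c_factor(node[1])
    _, feat, split, left, right = node
    return path_length(x, left if x[feat] < split else right, depth + 1)


def isolation_scores(X, n_trees=100, sample_size=256, seed=0):
    """Anomaly score in (0, 1]; higher means easier to isolate, i.e. more distinctive."""
    rng = np.random.default_rng(seed)
    psi = min(sample_size, X.shape[0])
    max_depth = math.ceil(math.log2(psi))
    trees = [build_tree(X[rng.choice(X.shape[0], psi, replace=False)], 0, max_depth, rng)
             for _ in range(n_trees)]
    mean_path = np.array([np.mean([path_length(x, t) for t in trees]) for x in X])
    return 2.0 ** (-mean_path / c_factor(psi))


def mad_threshold(scores, k=3.0):
    """Median + k * 1.4826 * MAD; 1.4826 makes MAD comparable to a Gaussian std."""
    med = np.median(scores)
    mad = np.median(np.abs(scores - med))
    return med + k * 1.4826 * mad


# Hypothetical data: 300 "typical" embeddings plus 5 far-away "distinctive" ones.
rng = np.random.default_rng(1)
typical = rng.standard_normal((300, 4))
distinctive = 50.0 * np.array([[1, 1, 1, 1], [-1, 1, -1, 1], [1, -1, -1, 1],
                               [-1, -1, 1, -1], [1, 1, -1, -1]], dtype=float)
X = np.vstack([typical, distinctive])
scores = isolation_scores(X)
selected = np.flatnonzero(scores > mad_threshold(scores))
print(selected)
```

Unlike a fixed contamination rate, the MAD rule adapts to the spread of scores in each city tile. It still assumes that most buildings are typical, so the median describes the inliers.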

In my view, the scope of the manuscript is limited. The pipeline is designed exclusively for building landmarks, which limits its generalizability to rural or mixed-terrain settings.
The use of Isolation Forest is appropriate for unsupervised outlier detection, but threshold selection appears arbitrary. It would benefit from an adaptive or data-driven strategy.
While conceptual comparisons are presented (Table 3), no empirical comparisons with state-of-the-art Detect-to-Retrieve (D2R) or global embedding methods (e.g., NetVLAD, DELF) are provided.
The method is described as lightweight, but there is no analysis of actual computational cost, memory usage, or runtime on resource-constrained UAV platforms.
The quality of the landmark selection heavily depends on the segmentation accuracy. However, the manuscript does not explore how segmentation errors affect the final localization performance.
Recall@1 and Recall@5 are the main evaluation metrics used in the manuscript. Could the authors discuss other metrics as well?
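As one example of a complementary metric, the sketch below computes Recall@K alongside mean reciprocal rank (MRR), which also rewards placing the correct reference higher within the ranking. The ranked lists and ground-truth sets are made up for illustration. In geo-localization, a retrieval is often counted as correct when the retrieved reference lies within a distance threshold of the query position, and such a test would replace the set-membership check used here.

```python
def recall_at_k(ranked_lists, relevant_sets, k):
    """Fraction of queries with at least one relevant item among the top k results."""
    hits = sum(
        1 for ranked, relevant in zip(ranked_lists, relevant_sets)
        if any(item in relevant for item in ranked[:k])
    )
    return hits / len(ranked_lists)


def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """Mean of 1/rank of the first relevant item (0 when none is retrieved)."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


# Hypothetical retrieval results for four queries (database indices, best first).
ranked = [[3, 1, 2], [5, 4, 6], [7, 8, 9], [2, 0, 1]]
truth = [{3}, {4}, {10}, {1}]
print(recall_at_k(ranked, truth, 1))                 # 0.25
print(recall_at_k(ranked, truth, 3))                 # 0.75
print(round(mean_reciprocal_rank(ranked, truth), 4))  # 0.4583
```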

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed all the issues raised. I recommend that the paper be accepted for publication in MAKE.

Reviewer 3 Report

Comments and Suggestions for Authors

The authors have well addressed my comments/questions and therefore I would like to recommend it for publication.
