MuRDE-FPN: Precise UAV Localization Using Enhanced Feature Pyramid Network
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
Recommend accept after revision.
This manuscript presents MuRDE-FPN, a lightweight one-stream cross-view UAV-to-satellite localization framework that improves Finding Point in Map (FPI) performance by selectively refining the deepest semantic feature layer and mitigating cross-scale misalignment.
I recommend acceptance after minor revisions to strengthen clarity, reproducibility, and presentation.
comments
1) Tables and reporting clarity (Tables 2–5): Please add standard deviations (or confidence intervals) over multiple runs, and explicitly state whether results are from a single seed. Also, ensure consistent formatting for Params/GFLOPs across methods (e.g., same precision, same counting protocol).
2) Related work narrative and positioning: The Related Work section is comprehensive, but the storyline can be tightened. Please add a short paragraph at the end of Sections 2.2–2.3 that explicitly maps prior decoder/FPN variants in FPI to your design choices (why “deepest-layer selective refinement” is the missing piece, and why alignment is critical specifically for cross-view FPI rather than generic dense prediction).
3) Method details for reproducibility: Provide exact architectural specifications for MuRDE and FAM (kernel sizes, dilations, groups, GN settings, where DCNv2 is applied, and feature channel dimensions at each pyramid level).
4) Formula-level clarification and potential issues: For Equations (1)–(2), please clarify how offsets/masks are parameterized and initialized, and whether sampling uses bilinear interpolation (as in standard DCN). For Equations (3)–(4), confirm the definition of RD: the current expression appears to include a “/2” in the square root term.
Comments on the Quality of English Language
n/a
Author Response
Comments 1: Tables and reporting clarity (Tables 2–5): Please add standard deviations (or confidence intervals) over multiple runs, and explicitly state whether results are from a single seed. Also, ensure consistent formatting for Params/GFLOPs across methods (e.g., same precision, same counting protocol).
Response 1: Thank you for your observation. All reported results are obtained from a single run with a fixed seed that we now report in subsection 3.5, lines 556-564. We note that this evaluation protocol is consistent with prior FPI and cross-view UAV–satellite localization works, where datasets have fixed train/test splits and deterministic evaluation procedures, and results are commonly reported from a single run. We have revised Tables 2–5 to ensure consistent formatting and counting procedures across all methods. You may notice that the reported computational costs differ from those in the previous version. This change is intentional. In the initial submission, computational values for comparative methods were taken directly from prior studies. In the revised version, we recomputed all values using a unified counting protocol to ensure consistency. During this process, we identified that the previously reported GFLOPs values corresponded to multiply–accumulate operations (GMACs), reflecting a terminological inconsistency that also appears in parts of the existing literature.
Comments 2: Related work narrative and positioning: The Related Work section is comprehensive, but the storyline can be tightened. Please add a short paragraph at the end of Sections 2.2–2.3 that explicitly maps prior decoder/FPN variants in FPI to your design choices (why “deepest-layer selective refinement” is the missing piece, and why alignment is critical specifically for cross-view FPI rather than generic dense prediction).
Response 2: Yes, we completely agree with this comment. For this reason, at the end of Section 2.3 we included a short paragraph (lines 327-335) that briefly summarizes previous FPI research and argues the intuition behind selective last-layer enhancement and the use of a non-generic cross-scale fusion.
Comments 3: Method details for reproducibility: Provide exact architectural specifications for MuRDE and FAM (kernel sizes, dilations, groups, GN settings, where DCNv2 is applied, and feature channel dimensions at each pyramid level).
Response 3: Thank you for this observation; we completely agree that the methodology and specifications were not sufficiently explicit. We updated the manuscript as follows: for MuRDE, the update can be seen in the description of Figure 6 and in lines 525-527, and for FAM in lines 536-537. Channel dimensions are now specified in line 492.
Comments 4: Formula-level clarification and potential issues: For Equations (1)–(2), please clarify how offsets/masks are parameterized and initialized, and whether sampling uses bilinear interpolation (as in standard DCN). For Equations (3)–(4), confirm the definition of RD: the current expression appears to include a “/2” in the square root term.
Response 4: Yes, we agree and have incorporated your suggestions for clarity. We updated the manuscript to include the initialization of the offsets and masks (lines 508-511), and yes, our sampling uses bilinear interpolation. We also confirm that the previous version had the correct expression of RD: the “/2” is a normalizing term that keeps RD bounded between 0 and 1. This normalization enables us to compare our results with previous FPI research that used this exact formulation.
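The normalization argument in Response 4 can be checked numerically. The sketch below assumes RD has the form used in earlier FPI work, RD = sqrt(((dx/w)^2 + (dy/h)^2) / 2), where dx and dy are the pixel offsets between the predicted and true positions and w and h are the satellite map width and height. The manuscript's Equations (3)–(4) are not reproduced in this record, so the exact form, the function name, and the 400-pixel map size are assumptions made for illustration only.

```python
import math

def relative_distance(dx, dy, width, height):
    """Normalized distance between prediction and ground truth.

    Assumed form (not quoted from the manuscript):
        RD = sqrt(((dx / width)**2 + (dy / height)**2) / 2)
    """
    return math.sqrt(((dx / width) ** 2 + (dy / height) ** 2) / 2.0)

# Hypothetical 400 x 400 pixel satellite map.
# Worst case: the prediction is a full map width and height away.
worst = relative_distance(400, 400, 400, 400)

# The same worst case without the "/2" exceeds 1.
unnormalized = math.sqrt((400 / 400) ** 2 + (400 / 400) ** 2)

# A perfect prediction gives zero distance.
perfect = relative_distance(0, 0, 400, 400)

print(worst, unnormalized, perfect)
```

Under this assumed form, dividing by 2 inside the square root is what maps the largest possible offset to exactly 1, which is the bound the authors describe.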
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
Below you can find some remarks for your Submission:
- Why are Inertial Navigation Systems not mentioned in the Introduction? Today, INS and SINS are autonomous navigation systems.
- What is w(p) in expressions (1) and (2)?
- Why is time t not in expression (3) or (4)?
- What can you say about the efficiency of your method under adverse conditions, such as vibration from driving UAV motors?
- What about the computational efficiency of your method?
- Except for Fig. 10 and Fig. 12, do you have any other experiments with sufficient statistics?
- What about MuRDE-FPN processing in bad weather conditions?
Conclusion: The submission should be revised.
Comments for author File:
Comments.pdf
Author Response
Comments 1: Why are Inertial Navigation Systems not mentioned in the Introduction? Today, INS and SINS are autonomous navigation systems.
Response 1: Thank you for the comment. We would like to clarify that INS was already mentioned in the Introduction (previous version, Lines 53–58: “...internal navigation systems, INS…”). However, we agree that our description did not explicitly mention strapdown INS (SINS) and did not sufficiently emphasize INS/SINS as a standard autonomous navigation baseline. To address this, we revised the paragraph to explicitly introduce SINS and to clarify that unaided INS/SINS is autonomous but exhibits unbounded drift over time and is therefore typically aided by external measurements. We position our FPI method as a possible supplement to INS/SINS. The revised text is provided in Lines 53–61.
Comments 2: What is w(p) in expressions (1) and (2)?
Response 2: Thank you for this comment. In our previous version, formulas (1) and (2) contained a typo (it should have been w(pk)), which is now fixed. w(pk) denotes the weight of the convolution kernel at the point pk; we added this clarification to our manuscript (lines 502-503).
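To make the role of w(pk) concrete, here is a minimal single-channel sketch of the standard modulated deformable convolution (DCNv2) sum, y(p) = sum_k w(pk) * x(p + pk + dpk) * dmk, with bilinear sampling at fractional positions. It assumes the manuscript's Equations (1)–(2) follow this standard form, which this record does not confirm. The function names, the 3x3 kernel, and the test values are hypothetical.

```python
import numpy as np

def bilinear_sample(x, row, col):
    """Sample 2-D array x at a fractional (row, col); outside the array counts as zero."""
    r0, c0 = int(np.floor(row)), int(np.floor(col))
    value = 0.0
    for r in (r0, r0 + 1):
        for c in (c0, c0 + 1):
            if 0 <= r < x.shape[0] and 0 <= c < x.shape[1]:
                # Bilinear weight of the neighbouring integer pixel (r, c).
                weight = (1 - abs(row - r)) * (1 - abs(col - c))
                value += weight * x[r, c]
    return value

def deformable_conv_at(x, w, p, offsets, masks):
    """Evaluate y(p) = sum_k w(p_k) * x(p + p_k + dp_k) * dm_k for a 3x3 kernel."""
    grid = [(i, j) for i in (-1, 0, 1) for j in (-1, 0, 1)]  # the kernel points p_k
    y = 0.0
    for k, (pi, pj) in enumerate(grid):
        row = p[0] + pi + offsets[k][0]
        col = p[1] + pj + offsets[k][1]
        y += w[pi + 1, pj + 1] * bilinear_sample(x, row, col) * masks[k]
    return y

x = np.arange(25, dtype=float).reshape(5, 5)   # toy feature map
w = np.ones((3, 3)) / 9.0                      # averaging kernel, w(p_k) = 1/9
zero_offsets = np.zeros((9, 2))
unit_masks = np.ones(9)

# Zero offsets and unit masks reduce the sum to an ordinary 3x3 convolution.
plain = deformable_conv_at(x, w, (2, 2), zero_offsets, unit_masks)

# Moving every sampling point half a pixel to the right needs bilinear sampling.
shifted = deformable_conv_at(x, w, (2, 2), zero_offsets + [0.0, 0.5], unit_masks)

print(plain, shifted)
```

In this sketch w(pk) depends only on the kernel position pk, not on the output location p or the learned offset, which is the distinction the corrected notation expresses.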
Comments 3: Why is time t not in expression (3) or (4)?
Response 3: Thank you for your question. Time is not a variable in expressions (3) and (4) simply because our main task is positioning in one frame. In other words: we evaluated each sample consisting of UAV image and corresponding satellite map independently and we calculated the relevant scores in the spatial domain. Your suggestion to use time as a variable is very useful for our future work, in which we will incorporate our positioning algorithm to make navigation pipelines more robust (e.g., alongside INS). However, it is out of scope of our current manuscript.
Comments 4: What can you say about the efficiency of your method under adverse conditions, such as vibration from driving UAV motors?
Response 4: Thank you for highlighting this practical aspect. Our current evaluation is performed on processed public single-frame UAV–satellite datasets that do not provide vibration examples. Therefore, we did not explicitly quantify robustness under motor-induced vibration artifacts like motion blur or rolling-shutter distortions. Such degradations are known to occur in UAV imaging and can affect automated processing. For this manuscript, we have included a limitations statement, Lines 726-731, and a revised paragraph in the Discussion clarifying scope and expected impact.
Comments 5: What about the computational efficiency of your method?
Response 5: Thank you for raising computational efficiency. We explicitly report model complexity using Params and GMACs (Table 2). MuRDE-FPN has 14.15M parameters and 11.81 GMACs, which is +0.28M Params / +1.12 GMACs compared to the closest one-stream baseline (DCD-FPI). While DCD-FPI is lighter, the added cost is modest relative to the observed accuracy gains on UL14 and UAV-Sat. We also note that our design choices were selected to remain lightweight (e.g., one-stream backbone and ECA).
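The overhead quoted in Response 5 can be restated in relative terms with a few lines of arithmetic. The baseline values below are obtained by subtracting the quoted differences from the MuRDE-FPN figures, so they are implied values and may differ slightly from Table 2 because of rounding. The conversion of one multiply-accumulate to two FLOPs is a common convention and is an assumption here, not something stated in the response.

```python
# Figures quoted in Response 5.
murde_params_m, murde_gmacs = 14.15, 11.81
delta_params_m, delta_gmacs = 0.28, 1.12

# Implied DCD-FPI baseline (derived by subtraction, not read from Table 2).
base_params_m = murde_params_m - delta_params_m   # about 13.87 M
base_gmacs = murde_gmacs - delta_gmacs            # about 10.69 GMACs

# Relative overhead of MuRDE-FPN over the implied baseline, in percent.
rel_params_pct = 100 * delta_params_m / base_params_m
rel_gmacs_pct = 100 * delta_gmacs / base_gmacs

# Assumed convention: one multiply-accumulate counts as two FLOPs.
murde_gflops = 2 * murde_gmacs

print(f"params: +{rel_params_pct:.1f}%  MACs: +{rel_gmacs_pct:.1f}%  "
      f"approx. GFLOPs: {murde_gflops:.2f}")
```

Read this way, the added cost is roughly 2% in parameters and roughly 10% in multiply-accumulate operations relative to the implied baseline.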
Comments 6: Except for Fig. 10 and Fig. 12, do you have any other experiments with sufficient statistics?
Response 6: In our work, we provide results in two sections: the Experimental Results and the Ablation study. In the Experimental section, we provide comparative results on the UL14 and UAV-Sat datasets. In that section, we evaluate the methods with the RDS, MSD (where applicable), and MA@k metrics. The results of the comparisons can be seen in tables 2-4 and figures 8-10 of the current version. Given that we expanded our UAV-Sat analysis, table 3 is updated, and table 4 and figure 10 are new. The results of the Ablation study can be seen in figures 10-12 as well as in tables 5-6 of our current and previous versions.
Comments 7: What about MuRDE-FPN processing in bad weather conditions?
Response 7: Our current evaluation is conducted on UL14 and UAV-Sat, which do not provide controlled labels/metadata for adverse weather (rain/fog/snow); therefore, we do not claim quantified robustness under such conditions in this manuscript. However, adverse weather degrades camera imagery by reducing contrast, introducing blur or occlusions, and creating appearance shifts that can reduce the reliability of vision-based methods. In the Discussion section (lines 791-802), we clarify that weather-induced appearance changes may reduce matching quality and that robustness should be evaluated in future work using targeted augmentations (imitating fog, rain, snow, contrast, and motion-blur corruptions) and/or dedicated adverse-condition datasets.
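As a rough illustration of what such targeted augmentations could look like, the sketch below applies three simple synthetic corruptions to a grayscale image with values in [0, 1]: a uniform fog blend, a contrast reduction, and a horizontal motion blur. These are generic image operations chosen for illustration. The function names and parameter values are hypothetical, and nothing here is taken from the authors' planned evaluation.

```python
import numpy as np

def add_fog(img, density=0.5, airlight=1.0):
    """Blend the image toward a uniform airlight value; density lies in [0, 1]."""
    return img * (1.0 - density) + airlight * density

def reduce_contrast(img, factor=0.5):
    """Pull pixel values toward the image mean; factor=1 leaves the image unchanged."""
    mean = img.mean()
    return mean + factor * (img - mean)

def motion_blur_horizontal(img, length=5):
    """Average each pixel with its horizontal neighbours (odd length, edge-padded)."""
    pad = length // 2
    padded = np.pad(img, ((0, 0), (pad, pad)), mode="edge")
    out = np.zeros(img.shape, dtype=float)
    for shift in range(length):
        out += padded[:, shift:shift + img.shape[1]]
    return out / length

rng = np.random.default_rng(0)
image = rng.random((8, 8))          # stand-in for a UAV image patch

foggy = add_fog(image, density=0.6)
flat = reduce_contrast(image, factor=0.3)
blurred = motion_blur_horizontal(image, length=3)

print(foggy.min(), flat.std() / image.std(), blurred.shape)
```

A corrupted copy of an existing test set built this way would give only a first-order robustness estimate, since real fog, rain, and vibration produce spatially varying effects that these uniform operations do not model.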
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The paper proposes a new UAV localization method based on the lightweight OS-PCPVT transformer and a modified Feature Pyramid Network decoder called MuRDE-FPN. The paper introduces two new modules: MultiReceptive Deformable Enhancement (MuRDE) and Feature Alignment Module (FAM), as well as a new, more diverse UAV-Sat dataset. The paper effectively addresses a real-world problem – the need for reliable UAV localization without reliance on GNSS in conditions of interference or signal loss.
Strengths
- The authors demonstrated a significant improvement in accuracy compared to state-of-the-art methods (FPI, WAMF-FPI, OS-FPI, DCD-FPI) on two datasets;
- Introduction of a new UAV-Sat dataset;
- The decoder design is well-thought-out and sensible, and the proposal is lightweight, which may be crucial for implementing a system for resource-constrained UAVs;
- The paper is well-structured.
Weaknesses
- Limited analysis of the impact of flight altitude and satellite map size - the authors emphasize that the method performs well at various altitudes and on large maps, but no detailed analysis is provided (e.g., RDS vs. altitude or vs. map size plots);
- Weak discussion of the method's limitations - there is no discussion of cases in which the method fails (e.g., dense development, forests, significant seasonal changes, nighttime UAV images);
- Few evaluation metrics (besides RDS and MA@K);
- The workflow should be improved in some places.
Suggested improvements
- Add more evaluation metrics;
- If possible, present results in sample scenarios, e.g.:
* low vs. high UAV flight altitude
* urban vs. suburban/rural areas
- Expand the discussion of the method's limitations;
- Improve the writing style – for example, avoid starting sentences with a citation, e.g., p.6, l.282: "[13] proposed" – use "In [13], ..." or "Chen et al. proposed in [13] ...", etc.
In summary, the article is solid and generally well-written, but the above improvements would significantly improve its value.
Author Response
Comments 1: Limited analysis of the impact of flight altitude and satellite map size - the authors emphasize that the method performs well at various altitudes and on large maps, but no detailed analysis is provided (e.g., RDS vs. altitude or vs. map size plots);
Response 1: Yes, we completely agree that the analysis on our UAV-Sat Dataset is limited. For this reason, we include an additional stratification of RDS results with respect to altitude when evaluating the methods on UAV-Sat dataset. Moreover, given that we introduce a new metric (MSD, see further response), we also correct MA@k metrics by choosing different k’s that better reflect the metric accuracies on the dataset. We provide a stratified result on these updated metrics depending on map size and RDS differences between high and very high altitudes.
Comments 2: Weak discussion of the method's limitations - there is no discussion of cases in which the method fails (e.g., dense development, forests, significant seasonal changes, nighttime UAV images);
Response 2: Thank you for this comment. Our method was indeed not tested on more difficult scenarios, apart from varying altitude and map size. In the current work, we also did not investigate the specific scenarios in which our method fails, as this is planned for future work. We outline these limitations in both the new Conclusions section (lines 746-752) and the Discussion section (lines 790-801). For now, we can conclude that all methods show decreased performance in terms of the relative metric (RDS) at higher altitudes, and in the future we plan to incorporate this information as well (e.g., high-altitude flights during nighttime).
Comments 3: Few evaluation metrics (besides RDS and MA@K);
Response 3: Yes, we agree. In our previous version, we used metrics that are staples of the FPI literature. In this version, we also introduce mean spatial distance (MSD), which is similar but not equivalent to RDS. We provide the definition of this metric in subsection 3.5 (lines 584-589); in a nutshell, this metric is a more intuitive representation of RDS since it is tied to metric distance. Moreover, just like MA@k, it is sensitive to map size: we expect bigger maps to have bigger average distances.
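The relationship between the two meter-based metrics can be sketched as follows. The manuscript's definitions (subsection 3.5) are not reproduced in this record, so the sketch assumes that MSD is the mean Euclidean localization error in meters and that MA@k is the share of samples whose error is at most k meters. The function names, the ground resolution of 0.5 meters per pixel, and the sample coordinates are hypothetical.

```python
import math

def spatial_errors_m(pred_px, true_px, metres_per_pixel):
    """Euclidean error of each prediction, converted from pixels to metres."""
    return [
        math.hypot(px - tx, py - ty) * metres_per_pixel
        for (px, py), (tx, ty) in zip(pred_px, true_px)
    ]

def mean_spatial_distance(errors_m):
    """Assumed MSD: the average localization error in metres."""
    return sum(errors_m) / len(errors_m)

def ma_at_k(errors_m, k):
    """Assumed MA@k: share of samples whose error is at most k metres."""
    return sum(e <= k for e in errors_m) / len(errors_m)

# Three hypothetical predictions against their true map positions (pixels).
pred = [(100, 100), (203, 204), (50, 80)]
true = [(100, 100), (200, 200), (50, 20)]

errors = spatial_errors_m(pred, true, metres_per_pixel=0.5)
msd = mean_spatial_distance(errors)
ma5 = ma_at_k(errors, k=5)

print(errors, msd, ma5)
```

With the same pixel error, a map covering more ground per pixel yields a larger error in meters, which is why both quantities would be expected to vary with map size under these assumed definitions.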
Comments 4: The workflow should be improved in some places.
Response 4: Yes, we agree, and we proofread our manuscript to improve the readability.
Suggested improvements
Comments 5: Add more evaluation metrics;
Response 5: We agree and provide a new metric (MSD). Additionally, we correct thresholds for MA@k in UAV-Sat evaluation that better reflect metric performance.
Comments 6: If possible, present results in sample scenarios, e.g.:
* low vs. high UAV flight altitude
* urban vs. suburban/rural areas
Response 6: Thank you for this recommendation. We incorporated your suggestion in the current version with a more extensive evaluation on the UAV-Sat dataset and, as mentioned in our previous reply, we provide an additional result in which lower- and higher-altitude conditions are separated. Since UL14 is already a comparatively low-altitude dataset, our combined findings cover low, high, and very high altitude conditions. Regarding the urban vs. suburban/rural (or any other content-based) comparison, we plan to include and expand this in our future work.
Comments 7: Expand the discussion of the method's limitations;
Response 7: Yes, we updated our manuscript with the new Conclusions section and revised the Discussion to include the limitations of our evaluation. In particular, our limitations are tied to the scarcity of difficult data. As you previously pointed out, no comparison of night-time images or significant seasonal changes is provided. We emphasize this argument and propose a possible research direction to relieve the data scarcity (GAN-based augmented test sets).
Comments 8: Improve the writing style – for example, avoid starting sentences with a citation, e.g., p.6, l.282: "[13] proposed" – use "In [13], ..." or "Chen et al. proposed in [13] ...", etc.
Response 8: Thank you for this suggestion. We made small changes to our text as you proposed and fixed English typos.
In summary, the article is solid and generally well-written, but the above improvements would significantly improve its value.
Author Response File:
Author Response.pdf
Reviewer 4 Report
Comments and Suggestions for Authors
This paper proposes MuRDE-FPN, which is designed for precise UAV localisation and achieves a lightweight network through the OS-PCPVT transformer backbone. I think the content of the article has engineering value, but some issues need to be addressed:
- The UAV satellite pair dataset was mentioned in the Contribution section, but its performance and display are insufficient. Is it open source, and will it bring more significant application prospects to the industry?
- Regarding the experimental evaluation section of 5.1, the description of parameter tuning for different methods is vague. Can you provide a detailed explanation of the tuning values?
- For UAV application papers, end-to-end validation and feasibility are very important. Is there any testing result of end-to-end computing power equipment, or how to conduct similar environmental testing?
- Near line 588, it is suggested to add a discussion on the RDS results.
- The conclusion needs to include a clear description of the comparative improvement results of this method. The author's current writing is quite lengthy, and the limitations should be clarified at the end.
Author Response
Comments 1: The UAV satellite pair dataset was mentioned in the Contribution section, but its performance and display are insufficient. Is it open source, and will it bring more significant application prospects to the industry?
Response 1: Thank you for your comment. We agree that the usability of UAV-Sat was unclear in our previous version. In the current version, the evaluation on the UAV-Sat dataset is expanded, and with the additional analysis we can attest that UAV-Sat is the more difficult dataset. First, our dataset offers the opportunity to evaluate end-to-end approaches over a larger search area, which would support autonomous UAV positioning in more expansive areas and more flexible missions in a GNSS-denied setting. Second, our dataset contains data for higher altitudes (400-800 meters), which is in itself a more challenging scenario.
Our dataset is derived from open-source material, and in subsection 3.1.2 we provide a detailed description of how to reproduce it. Additionally, we can provide the dataset upon request. In the future, we will make our dataset open source, but for now we also acknowledge its limitations, mainly that no adverse weather conditions, realistic visual obstructions (such as clouds or fog), or adverse scenarios (night-time images, seasonal change) are present.
Comments 2: Regarding the experimental evaluation section of 5.1, the description of parameter tuning for different methods is vague. Can you provide a detailed explanation of the tuning values?
Response 2: Thank you for the comment. We can see that in the previous version it was somewhat ambiguous how the hyperparameters were chosen. We previously outlined the tuning values briefly in subsection 3.5; that subsection is now updated with more explicit choices. For the other methods, we followed the training protocols used in previous literature (lines 556-564).
For our method, we mainly tuned the learning rate (we observed that our method prefers lower and discriminative learning rates) with scheduling, while monitoring the loss and RDS curves until convergence.
Comments 3: For UAV application papers, end-to-end validation and feasibility are very important. Is there any testing result of end-to-end computing power equipment, or how to conduct similar environmental testing?
Response 3: Thank you for this important comment. In this work, we focus on algorithmic design and evaluation rather than full system-level deployment. For this reason, we assess computational complexity (GMACs) and parameter count on a modern GPU (NVIDIA GeForce RTX 4090), which provides an upper bound on achievable real-time performance. End-to-end validation on embedded UAV hardware and under real environmental conditions (e.g., vibration, weather) represents an important direction for future work and would involve deploying the trained model on onboard computing platforms and evaluating performance during real or simulated environmental scenarios.
Comments 4: Near line 588, it is suggested to add a discussion on the RDS results.
Response 4: We agree; in the previous version there was no discussion of the RDS results when comparing different map sizes. We expanded the section on the variability of the compared models when evaluating the RDS metric. Moreover, since we expanded our evaluation strategy (with the addition of the new MSD metric and stratification by UAV height and map size), similar brief discussions were added to those sections (e.g., lines 667-673).
Comments 5: The conclusion needs to include a clear description of the comparative improvement results of this method. The author's current writing is quite lengthy, and the limitations should be clarified at the end.
Response 5: Thank you for pointing this out; we also feel that the Discussion section is convoluted and unnecessarily long. For this reason, we added the Conclusions section where our current work is concisely summarized and limitations are noted. Furthermore, the Discussion section outlines both algorithmic limitations (our conclusions can be applied to backbones of similar design) and expands on the limitations of our data/evaluations (e.g. no extreme weather conditions).
Author Response File:
Author Response.pdf
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The revised submission is accepted.
Author Response
We sincerely thank the Reviewer for valuable comments and constructive suggestions. Their feedback has helped us to improve the clarity, consistency, and overall quality of the manuscript. We believe that the revisions made in response to the comments have significantly strengthened the article.
Reviewer 3 Report
Comments and Suggestions for Authors
All the comments have been addressed by the authors. Thank you very much.
Author Response
We sincerely thank the Reviewer for valuable comments and constructive suggestions. Their feedback has helped us to improve the clarity, consistency, and overall quality of the manuscript. We believe that the revisions made in response to the comments have significantly strengthened the article.
Reviewer 4 Report
Comments and Suggestions for Authors
I believe the author has further improved the manuscript and recommend its acceptance. However, the visual appeal of the figures still needs further refinement before publication.
Author Response
Figures were slightly revised to improve clarity and ensure consistent terminology across the manuscript. We sincerely thank the Reviewer for valuable comments and constructive suggestions. Their feedback has helped us to improve the clarity, consistency, and overall quality of the manuscript. We believe that the revisions made in response to the comments have significantly strengthened the article.

