Benchmarking Adversarial Patch Selection and Location
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
- While the paper introduces the PatchMap dataset and its contributions, some of the technical terms may require further elaboration, particularly for readers unfamiliar with adversarial patch attacks. For example, terms such as "attack-success rate (ASR)" and "confidence drop" could be better explained in simpler terms or with a brief introductory context. This would help readers with varying levels of expertise better follow the content.
- In the introduction section, the novelty of PatchMap compared to existing benchmarks such as ImageNet-Patch and REAP is not made fully explicit, especially regarding what is truly “spatially exhaustive”; it is recommended to add a concise comparison paragraph clarifying the unique contributions and limitations relative to these datasets.
- The paper identifies “hot-spots” where adversarial patches cause significant misclassifications, but a more detailed discussion on the factors influencing the appearance of these hot-spots (e.g., object class, patch size, model architecture) could provide readers with a deeper understanding of why these areas are more vulnerable. It would also be helpful to explain how these results compare to existing studies on patch vulnerability.
- In the related work section, the comparison to existing large-scale benchmarks such as REAP and other patch-placement studies is mostly qualitative and does not clearly articulate how PatchMap’s spatial coverage, dataset size, and architectural breadth surpass or complement these efforts. It is recommended to add a dedicated subsection or table that quantitatively contrasts PatchMap with prior benchmarks (e.g., number of images, locations, architectures, and physical realism), thereby sharpening the paper’s novelty and positioning.
- In the dataset design section, the choice of stride-2 grid and the three specific patch sizes is only briefly justified, which may leave readers unsure about the trade-off between spatial resolution, compute cost and realism; it is recommended to include a short sensitivity or ablation discussion to support these design decisions.
- In the evaluation protocol section, the paper defines confidence- and calibration-related metrics (such as ∆conf, ECE, and Brier score) but the later analysis section primarily emphasises ASR and confidence histograms, with little to no numerical reporting on calibration changes. It is recommended to include explicit quantitative results for ECE and Brier score before and after patching (with confidence intervals), and to discuss how spatially varying patches affect model calibration, which would strengthen the “mathematics/benchmarking” character of the work.
- In the analysis and findings section, the paper focuses on a few representative patches (“Plate”, “Guitar”, “Typewriter”) while the dataset contains ten, and it is recommended either to justify this selection or to provide aggregate statistics over all patches (perhaps in an appendix) to support claims of generality.
- In the experimental results on optimisation-free placement strategies, runtime, query complexity, and computational cost are only briefly mentioned, and it is recommended to include a more systematic comparison of computational overhead across Random, Fixed and Seg-guided strategies to substantiate the claim of efficiency.
Author Response
The reply is available both here and in the attached file.
Authors: We thank the reviewer for the careful reading and constructive suggestions. We have revised the manuscript to improve accessibility, sharpen the novelty positioning relative to existing benchmarks, and strengthen the reporting of metrics and efficiency.
- Reviewer: While the paper introduces the PatchMap dataset and its contributions, some of the technical terms may require further elaboration, particularly for readers unfamiliar with adversarial patch attacks. For example, terms such as "attack-success rate (ASR)" and "confidence drop" could be better explained in simpler terms or with a brief introductory context. This would help readers with varying levels of expertise better follow the content.
Authors: Thank you for pointing this out. In the revised manuscript we added short, plain-language definitions before introducing our location-based metrics. Specifically, we now define ASR as “the fraction of clean, correctly classified images whose predicted label changes after applying a patch,” and confidence drop as “the decrease in the model’s probability assigned to the correct class after patching.” We also added a brief explanation of ASRq as a robustness-to-location measure (how often an image is fooled across a large fraction of possible placements). These definitions are included where the metrics first appear and are also reiterated in the Evaluation Protocol section for readability.
- Reviewer: In the introduction section, the novelty of PatchMap compared to existing benchmarks such as ImageNet-Patch and REAP is not made fully explicit, especially regarding what is truly “spatially exhaustive”; it is recommended to add a concise comparison paragraph clarifying the unique contributions and limitations relative to these datasets.
Authors: We agree that the novelty should be clearer earlier. We revised the Introduction to include a concise comparison paragraph that distinguishes PatchMap from existing resources. In particular, we now define “spatially exhaustive” explicitly as evaluating a patch over a dense grid of feasible locations (stride-2; 112×112 positions per image) and releasing the resulting location-conditioned predictions/confidences. We also added a short table contrasting PatchMap with ImageNet-Patch and REAP (task, number of images, placement coverage, realism, and whether outputs are cached), which clarifies that ImageNet-Patch primarily standardizes patch textures and transformed samples, and REAP emphasizes physically realistic rendering for detection under domain constraints, whereas PatchMap isolates and benchmarks placement at dense spatial resolution.
- Reviewer: The paper identifies “hot-spots” where adversarial patches cause significant misclassifications, but a more detailed discussion on the factors influencing the appearance of these hot-spots (e.g., object class, patch size, model architecture) could provide readers with a deeper understanding of why these areas are more vulnerable. It would also be helpful to explain how these results compare to existing studies on patch vulnerability.
Authors: Thank you—this is an important interpretability point. We expanded the discussion accompanying the qualitative results to better explain the factors that shape hot-spots. Concretely, we now discuss how hot-spots tend to align with: semantically informative object regions (e.g., heads, logos, distinctive parts), patch size (larger patches create broader vulnerable regions, while smaller patches yield more localized peaks), and model architecture/robust training (which shifts and sometimes attenuates, but does not remove, high-impact regions). We also added an explicit comparison to prior placement studies: works that restrict placement to “on-object” regions or constrained surfaces (e.g., sign-only placement in REAP) can still miss the worst-case locations identified by our exhaustive maps, highlighting why dense spatial benchmarking is useful even when semantic priors are available.
- Reviewer: In the related work section, the comparison to existing large-scale benchmarks such as REAP and other patch-placement studies is mostly qualitative and does not clearly articulate how PatchMap’s spatial coverage, dataset size, and architectural breadth surpass or complement these efforts. It is recommended to add a dedicated subsection or table that quantitatively contrasts PatchMap with prior benchmarks (e.g., number of images, locations, architectures, and physical realism), thereby sharpening the paper’s novelty and positioning.
Authors: We agree and have strengthened the positioning. In the revised manuscript we added a dedicated quantitative comparison table that contrasts PatchMap with ImageNet-Patch and REAP along the dimensions the reviewer suggested: number of base images, number of patch textures, number of evaluated locations per image, task setting (classification vs. detection), and whether the benchmark releases cached location-conditioned outputs. This makes it clear how PatchMap complements prior work: it is designed to benchmark placement sensitivity at high spatial resolution, which is not directly provided by either ImageNet-Patch (patch-content benchmark) or REAP (physically realistic detection benchmark with constrained placement).
- Reviewer: In the dataset design section, the choice of stride-2 grid and the three specific patch sizes is only briefly justified, which may leave readers unsure about the trade-off between spatial resolution, compute cost and realism; it is recommended to include a short sensitivity or ablation discussion to support these design decisions.
Authors: Thank you for requesting more justification. We expanded the Dataset Design section with a short ablation discussion. We now motivate the stride-2 grid as a practical resolution that preserves spatial trends while reducing computation, and we explain why it is particularly appropriate for the ResNet-50 setting due to the early stride-2 downsampling in the network. We also clarify the role of the three patch sizes as a controlled “conspicuity vs. effectiveness” sweep (from small patches with limited visibility to larger patches with higher success), and we explicitly frame the design as a compromise between exhaustive coverage and tractable compute.
- Reviewer: In the evaluation protocol section, the paper defines confidence- and calibration-related metrics (such as ∆conf, ECE, and Brier score) but the later analysis section primarily emphasises ASR and confidence histograms, with little to no numerical reporting on calibration changes. It is recommended to include explicit quantitative results for ECE and Brier score before and after patching (with confidence intervals), and to discuss how spatially varying patches affect model calibration, which would strengthen the “mathematics/benchmarking” character of the work.
Authors: We agree that other confidence metrics are also important, and we added a quantitative comparison between different patch-placement methods. We emphasized that Attack Success Rate is the major metric compared and discussed across adversarial attack methods in general. We believe that other metrics such as ∆conf, while they do add benefit, provide information that is secondary to the Attack Success Rate.
- Reviewer: In the analysis and findings section, the paper focuses on a few representative patches (“Plate”, “Guitar”, “Typewriter”) while the dataset contains ten, and it is recommended either to justify this selection or to provide aggregate statistics over all patches (perhaps in an appendix) to support claims of generality.
Authors: Thank you for the suggestion. We clarified the scope and selection in the Dataset Design section. PatchMap is built from the ImageNet-Patch texture set, and in our public v1 release we focus on a small subset of representative, high-impact patches to keep compute and storage tractable while still enabling meaningful placement benchmarking. We now justify this choice with an ablation study over the full patch set on a smaller sample, reporting that the selected patches produce strong optimal-location success and are therefore informative for studying spatial vulnerability. We also clarified in the text which patches are included in the current release versus planned larger-scale releases.
- Reviewer: In the experimental results on optimisation-free placement strategies, runtime, query complexity, and computational cost are only briefly mentioned, and it is recommended to include a more systematic comparison of computational overhead across Random, Fixed and Seg-guided strategies to substantiate the claim of efficiency.
Authors: The Random/Fixed patch-location strategies mark the baseline, with no added overhead; the segmentation-based method adds roughly ~40% to the total runtime when using DeepLabv3-ResNet101, and the choice of segmentation model affects the runtime overhead. Optimization-based patch-placement strategies increase the runtime by a far larger factor: roughly a 50–60% increase for Grad-CAM-based approaches, and 1,000–100,000% for iterative optimization attacks.
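To make the relative overhead concrete, a minimal timing sketch along the following lines could be added; the model construction and input size are illustrative assumptions (untrained weights suffice for a timing comparison), not the exact evaluation pipeline:

```python
import time
import torch
import torchvision

# Random/Fixed strategies only pick coordinates; the segmentation-guided strategy
# adds one forward pass of a segmentation network per image.
device = "cuda" if torch.cuda.is_available() else "cpu"
seg_model = torchvision.models.segmentation.deeplabv3_resnet101(weights=None).eval().to(device)
image = torch.rand(1, 3, 224, 224, device=device)  # placeholder input

t0 = time.perf_counter()
xy = torch.randint(0, 224 - 50, (2,))              # Random placement: negligible cost
t_random = time.perf_counter() - t0

t0 = time.perf_counter()
with torch.no_grad():
    _ = seg_model(image)["out"]                    # Seg-guided placement: one extra forward pass
t_seg = time.perf_counter() - t0

print(f"random: {t_random * 1e3:.3f} ms, segmentation pass: {t_seg * 1e3:.1f} ms")
```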
Author Response File:
Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
While the paper clearly states its contributions, the introduction and related work sections could more sharply contrast PatchMap with prior location-optimization methods (e.g., LOAP, RL-based approaches). Emphasize how PatchMap’s exhaustive, query-free, and model-agnostic benchmark differs from prior optimization-based or sparse evaluations.
The description of the spatial sweep (Section 3) is clear, but the rationale for choosing stride-2 and the three specific scales (50, 25, 10 px) could be briefly justified—e.g., computational trade-offs, coverage sufficiency, or relevance to physical patch sizes.
The heuristic is well-motivated, but the paper should discuss its limitations more explicitly: a. How does it perform on images with multiple objects or ambiguous segmentation? b. Is there a risk that segmentation models themselves could be adversarially vulnerable, affecting placement quality? c. A brief comparison with other possible heuristics (e.g., saliency maps, edge density) would strengthen the methodological discussion.
Table 3 is comprehensive but dense. Consider adding a summary table or figure that highlights the average improvement of Seg-guided over Random/Fixed across all models and patch sizes.
The conclusion briefly mentions future directions (medical, robotics). It would strengthen the paper to include a short subsection on limitations, such as: a) The focus on ImageNet-classification (not detection/segmentation tasks). b) The use of a single segmentation model (DeepLab-v3+). c) The assumption of static, non-rotated patches.
In Figure 5, the correlation analysis is informative, but the caption should clarify what each subplot represents (patch type/size). Also, discuss why correlation is weaker for lower segmentation confidence (<0.2).
Author Response
The reply is available both here and in the attached file.
Authors: We thank the reviewer for the constructive feedback. We revised the manuscript to sharpen the positioning of PatchMap relative to prior placement-optimization methods, better justify key dataset design choices, state limitations of the segmentation-guided heuristic more explicitly and improve the clarity and summarization of results and figures.
Reviewer: While the paper clearly states its contributions, the introduction and related work sections could more sharply contrast PatchMap with prior location-optimization methods (e.g., LOAP, RL-based approaches). Emphasize how PatchMap’s exhaustive, query-free, and model-agnostic benchmark differs from prior optimization-based or sparse evaluations.
Authors: Thank you for this suggestion. We strengthened the contrast in both the Introduction and Related Work. In particular, we now explicitly distinguish between:
(*) optimization-based placement methods (e.g., LOAP and RL-based approaches) that search for a strong location per image using gradients or repeated model queries, and
(**) PatchMap, which is a benchmarking resource that exhaustively evaluates placement on a dense grid and releases location-conditioned outcomes.
We additionally highlight that PatchMap is model-agnostic (placement maps can be used to evaluate any placement strategy under identical conditions) and that the segmentation-guided placement we propose is query-free (black-box). We also added a concise quantitative table contrasting PatchMap with prior benchmarks and evaluation setups (coverage, images, locations, realism assumptions), which clarifies what we mean by “spatially exhaustive” and how it differs from sparse or optimization-driven evaluations.
Reviewer: The description of the spatial sweep (Section 3) is clear, but the rationale for choosing stride-2 and the three specific scales (50, 25, 10 px) could be briefly justified—e.g., computational trade-offs, coverage sufficiency, or relevance to physical patch sizes.
Authors: We agree and added justification in the Dataset Design section (“Choice of Parameters”). For stride, we now report an empirical comparison showing that moving from stride-1 to stride-2 preserves the spatial trends in the resulting maps while reducing compute substantially; we also explain why stride-2 aligns well with the effective spatial resolution induced by early downsampling in standard ResNet architectures.
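For illustration, the kind of stride ablation described above can be sketched as follows; the dense map here is a random placeholder standing in for a real stride-1 confidence-drop map of one image:

```python
import numpy as np

# Placeholder for a dense stride-1 per-location confidence-drop map (H x W).
dense_map = np.random.rand(224, 224)

# The stride-2 grid used by PatchMap keeps every other location.
coarse_map = dense_map[::2, ::2]

# Agreement between kept locations and the offset locations the coarse grid skips:
# high correlation indicates that stride-2 preserves the spatial trend.
corr = np.corrcoef(dense_map[::2, ::2].ravel(), dense_map[1::2, 1::2].ravel())[0, 1]

# Does the coarse grid still recover (approximately) the strongest location?
best_dense = np.unravel_index(dense_map.argmax(), dense_map.shape)
best_coarse = tuple(2 * c for c in np.unravel_index(coarse_map.argmax(), coarse_map.shape))
print(f"neighbour correlation: {corr:.3f}, argmax stride-1: {best_dense}, argmax stride-2: {best_coarse}")
```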
Reviewer: The heuristic is well-motivated, but the paper should discuss its limitations more explicitly: a. How does it perform on images with multiple objects or ambiguous segmentation? b. Is there a risk that segmentation models themselves could be adversarially vulnerable, affecting placement quality? c. A brief comparison with other possible heuristics (e.g., saliency maps, edge density) would strengthen the methodological discussion.
Authors: Thank you for pointing that out. We added a more detailed explanation and an explicit discussion (in the Limitations / Qualitative Results discussion) covering a–c:
a. Multiple objects: Our heuristic selects the patch location that maximizes summed segmentation “objectness.” When multiple foreground objects exist or the segmentation is uncertain, the highest-confidence region may correspond to a non-target object or to a subset of the scene that is not maximally influential for the classifier’s decision. In such cases the placement may be sub-optimal relative to the true worst-case location. We explicitly discuss this as a limitation and note that class-conditional or instance-aware variants (e.g., selecting the largest connected component, or restricting the search to the dominant instance) are natural extensions.
b. Vulnerability of the segmentation model: We acknowledge that segmentation networks can themselves be adversarially vulnerable. In our setting, segmentation is used as an attacker-side tool to choose placement and is computed on the clean image; thus, an adversary would need to manipulate the input before the attacker observes it to degrade the placement map. More practically, the relevant failure mode is segmentation error under domain shift or ambiguous scenes, which can reduce the quality of the placement signal. We now highlight this limitation and emphasize that the heuristic is modular: stronger or more robust segmentation backbones can be swapped in without changing the placement procedure.
c. Discussion of other heuristics: We added a brief comparison discussion contrasting our approach with saliency/Grad-CAM-based placement (often requiring white-box access or additional victim queries) and simple image-only heuristics such as edge-density/texture-based scores (query-free but less semantically aligned). This situates our method as a lightweight, semantically informed, optimization-free baseline that does not depend on access to the attacked model.
Reviewer: Table 3 is comprehensive but dense. Consider adding a summary table or figure that highlights the average improvement of Seg-guided over Random/Fixed across all models and patch sizes.
Authors: We agree and added a compact summary derived from Table 3. Specifically, we now report the mean ΔASR of Seg-guided relative to Random and Fixed (averaged over the evaluated models, and reported separately per patch and patch size). This provides a single-glance view of the typical gains (consistent with the 8–13 pp improvement highlighted in the main text) while keeping Table 3 as the full detailed breakdown.
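For illustration, the summary can be derived from a long-format version of Table 3 roughly as follows; the column names and numbers are placeholders, not the reported values:

```python
import pandas as pd

# Placeholder long-format results: one ASR value per (model, strategy).
results = pd.DataFrame({
    "model":    ["resnet50"] * 3 + ["vit_b16"] * 3,
    "strategy": ["Random", "Fixed", "Seg-guided"] * 2,
    "asr":      [0.31, 0.35, 0.44, 0.22, 0.25, 0.34],
})

wide = results.pivot(index="model", columns="strategy", values="asr")
delta = pd.DataFrame({
    "dASR_vs_Random": wide["Seg-guided"] - wide["Random"],
    "dASR_vs_Fixed":  wide["Seg-guided"] - wide["Fixed"],
})
print(delta.mean())  # mean improvement of Seg-guided, in ASR points, across models
```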
Reviewer: The conclusion briefly mentions future directions (medical, robotics). It would strengthen the paper to include a short subsection on limitations, such as: a) The focus on ImageNet-classification (not detection/segmentation tasks). b) The use of a single segmentation model (DeepLab-v3+). c) The assumption of static, non-rotated patches.
Authors: We agree. We added a short Limitations and Scope subsection (and reflected it in the conclusion) that explicitly states:
(a) PatchMap v1 focuses on ImageNet-style classification, and extending the benchmark to detection/segmentation tasks is an important next step;
(b) the segmentation-guided placement uses a single pretrained segmentation model (DeepLab-v3+), and performance may vary with the segmentation backbone/dataset;
(c) PatchMap v1 assumes static, non-rotated patches, and does not yet model physical transforms such as rotation, perspective, or illumination changes.
We also briefly outline how each limitation can be addressed in future releases or extensions.
Reviewer: In Figure 5, the correlation analysis is informative, but the caption should clarify what each subplot represents (patch type/size). Also, discuss why correlation is weaker for lower segmentation confidence (<0.2).
Authors: Each subplot in Figure 5 (now Figure 6 after the addition of one figure) has a title clarifying which patch and which size it represents. We added an explanation to Section 6.4 noting that the correlation is weaker for lower segmentation confidence because, at those locations, the effect of the patch arises from adding a new object to the image (the adversarial patch) rather than from changing the object itself. Thus, in certain areas of the image, covering the object with the adversarial patch has no greater effect than placing the patch beside the object.
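The correlation analysis behind this figure can be sketched as follows; the arrays are placeholders standing in for the per-location segmentation confidences and confidence drops of one patch/size configuration:

```python
import numpy as np

# Placeholder per-location values for one image, one patch and one size:
# seg_conf[l]  = summed segmentation confidence inside the patch window at location l
# conf_drop[l] = drop in the true-class probability when the patch is placed at l
n_locations = 112 * 112
seg_conf = np.random.rand(n_locations)
conf_drop = np.random.rand(n_locations)

r_all = np.corrcoef(seg_conf, conf_drop)[0, 1]

# Restricting to low segmentation confidence (< 0.2): there the patch mostly acts as a
# newly added object rather than occluding the existing one, so the correlation weakens.
low = seg_conf < 0.2
r_low = np.corrcoef(seg_conf[low], conf_drop[low])[0, 1]

print(f"all locations: r={r_all:.3f}, segmentation confidence < 0.2: r={r_low:.3f}")
```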
Author Response File:
Author Response.pdf
Reviewer 3 Report
Comments and Suggestions for Authors
The manuscript seems more like a workshop paper, and it does have some critical issues and weaknesses, as follows:
- The main drawbacks of this review are related to its novelty and the proposed idea. I fail to see new significant findings or advances over the state of the art. I could not see the contribution of the work. Also, the authors fail to highlight the work contributions in comparison to the recent review studies and show the contributions of this work. More specifically, it lacks true novelty, since the main contribution in this work is the PatchMap, described as the "first spatially exhaustive benchmark" for patch placement. However, this is essentially a brute-force evaluation of existing patches from the ImageNet-Patch dataset. The exhaustive spatial sweeping, e.g., a stride-2 grid with 12,544 positions per image, is computationally expensive, and it is a straightforward extension of prior small-scale approaches in previous works. Also, the presented dense evaluations have already been conducted in the related works, such as the REAP and PatchAttack. This work is more likely a resource-intensive data collection effort. The claim to "decouple" the patch is also presented in many previous studies.
- The segmentation-guided placement presented in this work is a basic argmax over the summed confidences given in equation 3. This is not new and has been proposed in many previous works, such as PS-GAN [13] and Grad-CAM-based placement in [15].
- The proposed technique is presented as a tool for future research; however, no theoretical framework has been provided. The listed contributions seem to be more descriptive than prescriptive.
- There is no real standardization for the dataset design. It looks like a collective from existing ones, such as ImageNet-Patch, ImageNet-1K validation, and ResNet-50 backbone. In fact, no new data curation has been given. Creating the datasets should follow standards, e.g., different patch shapes, rotations, and physical simulations, as in the REAP or the Adversarial Sticker.
- There are not enough introductions, and no sufficient references have been added to the introduction section. It jumps from the general adversarial vulnerability to the motivation part without a clear problem statement or gap analysis. This section should be rewritten.
- The authors have not compared their work with other existing studies to highlight their work contributions. More specifically, there is a lack of comparison with the most recent state-of-the-art studies. Given the rapid increase in publications, the authors should compare their work with the most recent papers on relevant work. More specifically, the authors compared their work only with weak baseline studies, such as random and fixed four-center offsets. It should be clearly compared with stronger approaches, such as saliency/Grad-CAM and zero-query methods. Also, no quantitative comparisons are given against LOAP, PatchAttack, GDPA, and PS-GAN.
- The experimental results and analysis are very weak and do not show significant findings.
- The writing needs to be improved as it lacks fluency and needs professional proofreading. I strongly suggest that the authors carefully proofread the entire manuscript.
In the current situation, the manuscript reads like a workshop-level draft rather than a novel journal paper, since it requires significant development to meet the expected standards of a journal.
Author Response
The reply is available both here and in the attached file.
Authors: We thank the reviewer for the careful reading and candid feedback. We understand the concerns about novelty and positioning, and we revised the manuscript to:
(*) Clarify the benchmark contribution and how it differs from prior optimization-based or domain-constrained evaluations.
(**) Sharpen the scope and claims of the segmentation-guided heuristic (as a simple, query-free baseline).
(***) Add quantitative comparisons and stronger baselines where feasible under a fixed-patch setting.
(****) Improve clarity, writing, and the introduction’s gap analysis.
Reviewer: The main drawbacks of this review are related to its novelty and the proposed idea. I fail to see new significant findings or advances over the state of the art. I could not see the contribution of the work. Also, the authors fail to highlight the work contributions in comparison to the recent review studies and show the contributions of this work. More specifically, it lacks true novelty, since the main contribution in this work is the PatchMap, described as the "first spatially exhaustive benchmark" for patch placement. However, this is essentially a brute-force evaluation of existing patches from the ImageNet-Patch dataset. The exhaustive spatial sweeping, e.g., a stride-2 grid with 12,544 positions per image, is computationally expensive, and it is a straightforward extension of prior small-scale approaches in previous works. Also, the presented dense evaluations have already been conducted in the related works, such as the REAP and PatchAttack. This work is more likely a resource-intensive data collection effort. The claim to "decouple" the patch is also presented in many previous studies.
Authors: We appreciate this concern and agree that we must be precise about what is novel. Our main contribution is not a new patch-generation algorithm, but a benchmarking resource that enables controlled, reproducible study of patch placement by releasing dense, location-conditioned outcomes.
Concretely, PatchMap differs from prior works in three key ways (which we now articulate explicitly in the Introduction/Related Work and in a quantitative comparison table):
- Exhaustive full-image placement maps (classification setting). PatchMap evaluates placement on a dense grid over the entire image plane (not a small set of candidate locations and not a constrained surface), producing per-location predicted labels and confidence scores. This enables direct measurement of optimal-location and distribution-over-location behavior (e.g., hot-spots and ASRq) that cannot be obtained from sparse placement protocols.
- Cached, reusable outputs for benchmarking placement strategies. The computational cost is a one-time investment. We release the resulting maps so that future work can evaluate placement heuristics without rerunning an exhaustive search for each new method. This is central to our goal: enabling fair and repeatable comparison of placement strategies under identical conditions.
- Clear separation of “patch appearance” vs “patch placement” in evaluation. We use fixed transferable patch textures (from ImageNet-Patch) specifically to avoid conflating patch content optimization with placement effects. Prior work often couples content and location optimization, which makes it hard to isolate the role of placement. Our “decoupling” claim is therefore about the evaluation axis (holding content fixed while sweeping location), not a claim that the phrase itself is unprecedented.
Regarding the comparison to REAP and PatchAttack: REAP focuses on detection under a realistic, differentiable rendering pipeline with placement constrained to the sign surface, and PatchAttack focuses on learning a policy under a query budget rather than releasing dense full-image spatial vulnerability maps. We now clarify these distinctions and explicitly scope PatchMap as complementary: PatchMap addresses exhaustive spatial placement benchmarking in ImageNet-style classification, while REAP emphasizes physical realism and detection.
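To illustrate how the cached, location-conditioned outputs are meant to be consumed, a minimal sketch follows; the field names and grid shape are hypothetical and do not describe the released file format:

```python
import numpy as np

# Hypothetical cached PatchMap entry for one image / patch texture / patch size:
# per stride-2 location, the predicted label and the confidence of the true class.
entry = {
    "true_label": 207,
    "pred_label": np.random.randint(0, 1000, size=(112, 112)),
    "true_conf":  np.random.rand(112, 112),
}

fooled = entry["pred_label"] != entry["true_label"]   # per-location attack success

oracle_success = bool(fooled.any())                   # oracle: best possible location
location_asr = fooled.mean()                          # how often this image is fooled over all placements

# Evaluate any placement strategy without querying the model again:
row, col = 40, 75                                     # e.g. a location proposed by some heuristic
strategy_success = bool(fooled[row, col])

print(oracle_success, round(float(location_asr), 3), strategy_success)
```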
Reviewer: The segmentation-guided placement presented in this work is a basic argmax over the summed confidences given in equation 3. This is not new and has been proposed in many previous works, such as PS-GAN [13] and Grad-CAM-based placement in [15].
Authors: We consider segmentation-guided placement distinct from PS-GAN and Grad-CAM-based placement: Grad-CAM-based placement requires knowledge of the model weights, making it a white-box setting, and PS-GAN requires pre-training against a specific model or family of models. The segmentation-guided placement approach differs substantially in that it requires no prior knowledge of the attacked model and is completely agnostic to the content of the patch.
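For concreteness, this kind of query-free heuristic can be sketched as follows; this is an illustrative re-implementation with a torchvision DeepLabv3 backbone (the weights argument assumes torchvision >= 0.13), not the exact code used in the paper:

```python
import torch
import torch.nn.functional as F
import torchvision

# Illustrative segmentation backbone standing in for the DeepLab model used in the paper.
seg_model = torchvision.models.segmentation.deeplabv3_resnet101(weights="DEFAULT").eval()

def seg_guided_location(image: torch.Tensor, patch_size: int) -> tuple[int, int]:
    """Return the top-left corner of the patch_size x patch_size window with the
    highest summed foreground (non-background) segmentation confidence."""
    with torch.no_grad():
        probs = seg_model(image.unsqueeze(0))["out"].softmax(dim=1)  # (1, classes, H, W)
    objectness = 1.0 - probs[:, 0]                                   # 1 - P(background), shape (1, H, W)

    # Summed confidence of every patch-sized window = average pooling * window area.
    scores = F.avg_pool2d(objectness, kernel_size=patch_size, stride=1) * patch_size ** 2
    flat = scores.view(-1).argmax().item()
    n_cols = scores.shape[-1]
    return flat // n_cols, flat % n_cols

# Usage with a placeholder image (a real image would be ImageNet-normalised first).
row, col = seg_guided_location(torch.rand(3, 224, 224), patch_size=50)
print(row, col)
```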
- Reviewer: The proposed technique is presented as a tool for future research; however, no theoretical framework has been provided. The listed contributions seem to be more descriptive than prescriptive.
Authors: We appreciate this point and have clarified the paper’s nature: PatchMap is primarily a benchmark/dataset paper, where the “framework” is an evaluation formalization.
To make this explicit and more rigorous, we emphasized in the manuscript:
- We formalize a placement vulnerability function over location (and size), and we clearly define the derived metrics (location-wise ASR, ASRq, and confidence-drop surfaces) as operators on this function. This makes the benchmark prescriptive in the sense that it specifies what should be measured to evaluate placement strategies, and how to compare them consistently.
- We added clearer guidance on how future methods can use PatchMap: evaluating an algorithm’s chosen locations against the full spatial map, reporting oracle upper bounds (best possible location), and measuring robustness to location via ASRq or confidence drop.
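As a concrete sketch of these operators over a location-conditioned outcome map (the arrays are placeholders, and the exact ASRq threshold convention shown is an assumption):

```python
import numpy as np

# Placeholder outcomes: N images, each evaluated at L = 112 * 112 stride-2 locations.
N, L = 1000, 112 * 112
fooled     = np.random.rand(N, L) < 0.3   # fooled[i, l]: prediction flipped at location l
conf_clean = np.random.rand(N)            # clean true-class confidence per image
conf_patch = np.random.rand(N, L)         # patched true-class confidence per (image, location)

# Location-wise ASR: fraction of images fooled at each location (a spatial vulnerability map).
asr_map = fooled.mean(axis=0).reshape(112, 112)

# ASRq: fraction of images fooled at at least a fraction q of all locations
# (how robust the attack is to where the patch lands).
q = 0.5
asr_q = (fooled.mean(axis=1) >= q).mean()

# Confidence-drop surface: mean drop of the true-class probability per location.
conf_drop_map = (conf_clean[:, None] - conf_patch).mean(axis=0).reshape(112, 112)

print(asr_map.max(), asr_q, conf_drop_map.max())
```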
Reviewer: There is no real standardization for the dataset design. It looks like a collective from existing ones, such as ImageNet-Patch, ImageNet-1K validation, and ResNet-50 backbone. In fact, no new data curation has been given. Creating the datasets should follow standards, e.g., different patch shapes, rotations, and physical simulations, as in the REAP or the Adversarial Sticker.
Authors: We agree that “standardization” needs to be stated more clearly. Our design choice in PatchMap v1 is to standardize by controlling variables, not by inventing new raw imagery. Specifically, we intentionally fix the base images (ImageNet validation subset), the attacked model (ResNet-50 in v1), and the patch textures (transferable patches from ImageNet-Patch), so that the benchmark isolates a single axis of interest: placement (location and size).
This approach is common in benchmark construction where the goal is controlled evaluation under fixed conditions. That said, we fully agree that additional controlled factors (rotation, shape, and physical transforms) would broaden realism. We therefore:
(*) explicitly list these as limitations in a dedicated limitations subsection, and
(**) describe how PatchMap can be extended in future releases to incorporate rotations, perspective, illumination (more in the spirit of REAP/Adversarial Sticker) while preserving the same “exhaustive placement” principle.
In other words, PatchMap v1 is not intended to replace physically realistic benchmarks; it complements them by providing a standardized, dense placement benchmark in a canonical classification setting.
Reviewer: There are not enough introductions, and no sufficient references have been added to the introduction section. It jumps from the general adversarial vulnerability to the motivation part without a clear problem statement or gap analysis. This section should be rewritten.
Authors: Thank you. We rewrote the Introduction to include a clearer problem statement separating patch appearance versus placement, an explicit gap analysis explaining why sparse placement protocols and per-image optimization do not provide a reliable benchmark for worst-case placement, and a concise positioning paragraph contrasting PatchMap with ImageNet-Patch and REAP. This makes the novelty claim concrete early.
Reviewer: The authors have not compared their work with other existing studies to highlight their work contributions. More specifically, there is a lack of comparison with the most recent state-of-the-art studies. Given the rapid increase in publications, the authors should compare their work with the most recent papers on relevant work. More specifically, the authors compared their work only with weak baseline studies, such as random and fixed four-center offsets. It should be clearly compared with stronger approaches, such as saliency/Grad-CAM and zero-query methods. Also, no quantitative comparisons are given against LOAP, PatchAttack, GDPA, and PS-GAN.
Authors: As for the segmentation-guided placement regime, we believe that, since it is an optimization-free patch-placement method, it should not be compared to optimization-based methods. It is also not feasible to compare against methods such as PS-GAN, which generate a patch on their own and do not utilize universal patches.
Reviewer: The experimental results and analysis are very weak and do not show significant findings.
Authors: We respectfully disagree and have revised the presentation to make the findings clearer and more measurable.
On the method side, the segmentation-guided placement is intentionally simple; its purpose is to demonstrate that PatchMap enables practical, query-free placement strategies. We now report improvements more transparently, including averages across models and patch sizes (with confidence intervals where applicable), and we contextualize the gains relative to an oracle “best location” baseline.
On the benchmark side, the main findings are not limited to a single placement heuristic. PatchMap reveals that patch effectiveness can vary dramatically across location, producing consistent spatial hot-spots and large confidence collapse in specific regions—phenomena that sparse placement protocols can miss. We highlight these benchmark-level results more explicitly, and we show how they persist across architectures and under adversarial training.
Reviewer: The writing needs to be improved as it lacks fluency and needs professional proofreading. I strongly suggest that the authors carefully proofread the entire manuscript.
Authors: Thank you. We performed a thorough proofreading pass and revised multiple sections for clarity and flow, including the Introduction, Related Work, Dataset Design, and the Segmentation-Guided section. We corrected grammar issues, improved terminology consistency, removed ambiguous phrasing, and added clarifying definitions where needed. We believe the revised manuscript reads more smoothly and communicates the benchmark contribution more clearly.
Author Response File:
Author Response.pdf
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
The authors have addressed my concerns; I have no other concerns.
Reviewer 3 Report
Comments and Suggestions for Authors
I would like to thank the authors for their efforts in revising the manuscript and for addressing the major comments appropriately.

