Enhancing the Transferability of Generative Targeted Adversarial Attacks via Cosine-Based Logit Alignment

Shi, Tengfei; Wang, Shihai; Liu, Bin

doi:10.3390/math14081370

by Tengfei Shi ¹,², Shihai Wang ¹,²,* and Bin Liu ¹,²

¹ School of Reliability and Systems Engineering, Beihang University, Beijing 100191, China
² Science and Technology on Reliability and Environmental Engineering Laboratory, Beijing 100191, China
* Author to whom correspondence should be addressed.

Mathematics 2026, 14(8), 1370; https://doi.org/10.3390/math14081370

Submission received: 2 March 2026 / Revised: 15 April 2026 / Accepted: 17 April 2026 / Published: 19 April 2026

(This article belongs to the Special Issue Advanced Research in Neural Networks, Machine Learning, and Image Processing)

Abstract

Adversarial examples reveal critical vulnerabilities in deep neural networks, posing significant risks in real-world deployment. In black-box settings, transferable targeted attacks rely on surrogate models but often suffer from low success rates. We argue that this limitation arises not only from surrogate-boundary overfitting but also from insufficient alignment with the target semantic space, which restricts the ability of adversarial examples to encode target-specific characteristics. To address this issue, we propose Cosine-Based Logit Alignment (CBLA), a unified framework for transferable targeted attacks. CBLA replaces the conventional cross-entropy loss with a cosine similarity objective to enhance directional alignment in logit space and alleviate gradient saturation. In addition, a semantic-invariant transformation strategy is introduced to improve structural consistency and cross-model generalization. Experiments on the ImageNet validation set demonstrate that CBLA consistently improves targeted attack success rates, achieving an average gain of 4.55% over strong baselines across multiple architectures.

Keywords: deep neural networks; adversarial examples; black-box adversarial attack; transferable attack; targeted attack; transferability

MSC: 68T07

1. Introduction

Deep neural networks (DNNs) have achieved remarkable success in a wide range of domains [1,2]. However, the emergence of adversarial examples has exposed inherent vulnerabilities in DNNs [3], raising serious security concerns for safety-critical applications such as autonomous driving [4] and facial recognition [5,6]. Consequently, research efforts have increasingly shifted from improving network performance and architecture to investigating adversarial attacks [7], adversarial training [8,9], and defense mechanisms [10,11,12].

Adversarial examples (AEs) [13,14] are constructed by applying imperceptible perturbations to benign samples, leading the model to make incorrect predictions. Following their initial discovery [13,14], numerous approaches have been proposed to generate adversarial examples in different scenarios. A fundamental categorization of adversarial attacks is based on the distinction between white-box [15] and black-box [16] settings. In white-box attacks, the adversary has full access to the target model, enabling high success rates with methods such as PGD [17] and MIM [7]. In contrast, black-box attacks assume no access to the target model’s internal parameters or architecture and may allow only limited feedback on queries. Such a setting more closely reflects practical threat models in which the attacker cannot inspect or modify the deployed system. Under these constraints, the inherent transferability of adversarial examples becomes crucial: perturbations crafted on one model often remain effective on other unseen models trained for the same task. This phenomenon has led to the prominence of transfer-based attacks [18,19,20], in which adversarial examples are generated using a surrogate model and subsequently evaluated against black-box targets.

This paper focuses on targeted transfer attacks in black-box settings. Unlike untargeted attacks, targeted attacks aim to mislead the model into a specific class, making them more challenging. Transfer-based attacks can be broadly categorized into iterative and generative methods. Iterative attacks [7,21,22,23] compute gradients for each input and apply them iteratively to craft perturbations. Although highly effective in white-box settings, iterative attacks often exhibit limited transferability because they tend to overfit the surrogate model’s decision boundary. In contrast, generative approaches [18,19] learn a parametric mapping from clean inputs to adversarial examples by training a generator on a dataset using surrogate supervision. Once trained, the generator produces adversarial samples in a feed-forward manner, enabling efficient inference and improved generalization beyond individual input instances. This data-driven formulation mitigates sample-specific overfitting and typically yields stronger cross-model transferability. Our method is based on the generative attack paradigm.

Previous studies largely attribute the limited transferability of adversarial examples to surrogate-specific overfitting, and accordingly propose remedies such as gradient smoothing [21], input transformations [24], and model ensembles [25]. While these strategies mitigate boundary overfitting to some extent, they often overlook a complementary and equally critical issue: insufficient alignment with the target semantic space. We posit that two intertwined yet fundamentally distinct mechanisms govern transferable targeted attacks. On the one hand, overfitting to the surrogate model’s decision boundary constrains cross-model generalization, as the resulting perturbations exploit architecture-dependent vulnerabilities. On the other hand, even when an adversarial example successfully crosses the surrogate boundary, it may remain weakly aligned with the intrinsic semantic direction of the target class. Such under-alignment limits its capacity to induce consistent target predictions across heterogeneous architectures.

From a geometric perspective, targeted attacks require not only crossing the boundary but also converging in direction within the representation space. Cross-entropy optimization primarily aims to increase the probability of the target class, an effect implicit in softmax normalization, and can lead to gradient attenuation when the target logit becomes dominant [22]. As a result, optimization may prematurely converge once the surrogate prediction reaches the target class, without enforcing deeper semantic alignment, defined as the directional consistency between the adversarial logit vector and the target class vector in logit space. Consequently, the optimization may achieve classification success without sufficiently aligning with target-consistent representations, and the crafted adversarial examples may remain close to the surrogate boundary and lack structural stability under architectural variations.

To address this dual imbalance, we propose a simple yet effective targeted transfer framework, termed Cosine-Based Logit Alignment (CBLA). First, we replace the conventional cross-entropy objective with a cosine similarity loss in the logit space. Unlike probability-based objectives, cosine similarity directly encourages directional alignment between the adversarial logits and the one-hot representation of the target class. This formulation mitigates exponential gradient decay induced by softmax normalization and maintains meaningful gradient signals even after successful surrogate misclassification. By focusing on angular alignment rather than magnitude amplification, the optimization process promotes deeper convergence toward the target semantic direction.

Second, we introduce a semantic-invariant transformation strategy to regularize the adversarial space. Instead of relying on clean target-domain samples, we enforce consistency of attackability across multiple transformed views of each adversarial example. Specifically, adversarial samples must preserve their targeted behavior under semantic-preserving input transformations. This consistency constraint implicitly enlarges the effective attack subspace and reduces dependence on surrogate-specific decision geometry. In contrast to prior work that augments training data, our method directly constrains adversarial examples, thereby strengthening their semantic robustness without requiring access to target-domain information.

Together, cosine-based logit alignment and transformation-invariant regularization act in a complementary manner: the former enhances optimization stability and semantic convergence in logit space, while the latter improves structural invariance in attack space. This combination enables the generation of adversarial examples that are less sensitive to architectural discrepancies and dataset shifts, ultimately improving targeted transferability across models and domains.

Our contributions are summarized as follows:

We identify a previously underexplored limitation of transferable targeted attacks, namely, insufficient alignment with the target semantic space, and analyze it from geometric and optimization perspectives.
We propose a cosine-based logit alignment objective that mitigates gradient attenuation induced by softmax normalization and promotes stable directional convergence toward the target class.
We introduce a semantic-invariant transformation strategy that regularizes adversarial examples in the input space and enhances cross-architecture and cross-domain generalization without requiring target-domain data.
Extensive experiments under both in-domain and cross-domain settings demonstrate that the proposed framework consistently outperforms existing iterative and generative baselines in targeted transfer scenarios.

2. Related Work

Transfer-based adversarial attacks have been widely studied in black-box settings [26,27,28]. Existing approaches can be broadly categorized into two main classes: iterative attacks and generative attacks. Although both paradigms rely on gradient information from surrogate models, they differ significantly in optimization structure, generalization behavior, and computational efficiency.

2.1. Iterative Attacks

Iterative attacks formulate adversarial example generation as the following constrained optimization problem:

$$\max_{\delta \in \mathbb{R}^{d}} J(f(x + \delta), y) \quad \text{s.t.} \quad \|\delta\|_{p} \leq \epsilon, \quad x + \delta \in [0, 1]^{d}, \tag{1}$$

where $\delta \in \mathbb{R}^{d}$ denotes the perturbation, $\epsilon > 0$ controls its magnitude under the $\ell_{p}$-norm, and the box constraint ensures valid pixel ranges.

Early methods such as FGSM [14] perform a single-step update along the sign of the gradient:

$$x^{adv} = x + \epsilon \cdot \mathrm{sign}(\nabla_{x} J(f(x), y)). \tag{2}$$

Multi-step variants, including BIM [15] and PGD [17], refine the perturbation through iterative projected updates,

$$x_{t+1} = \Pi_{B_{\epsilon}(x)}\left(x_{t} + \alpha \cdot \mathrm{sign}(\nabla_{x} J(f(x_{t}), y))\right), \tag{3}$$

where $\Pi_{B_{\epsilon}(x)}$ denotes projection onto the $\ell_{p}$-ball of radius $\epsilon$ centered at $x$. Momentum-based methods, such as MIM [7], accumulate gradient directions across iterations to avoid oscillations and escape poor local optima. Nesterov-based [21] variants further incorporate look-ahead mechanisms to anticipate future update directions. Other approaches exploit gradient variance reduction [29] and neighborhood aggregation [30] to enhance robustness of the update trajectory. These refinements improve cross-model transfer to some extent by mitigating overfitting to the surrogate model’s local decision boundary.
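The update rules in Equations (2) and (3) can be illustrated with a minimal sketch. It assumes a toy binary logistic model $f(x) = \sigma(w^{\top} x + b)$ with an analytic input gradient and the $\ell_{\infty}$ norm; the model, the helper names, and all numeric values are hypothetical and serve only to show the mechanics of the signed-gradient step followed by projection.

```python
import math

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def loss_grad(x, y, w, b):
    """Gradient w.r.t. x of the binary cross-entropy J(f(x), y), f(x) = sigmoid(w.x + b)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def project(x_adv, x, eps):
    """Project onto the l_inf ball of radius eps around x, intersected with [0, 1]^d."""
    return [min(max(a, xi - eps, 0.0), xi + eps, 1.0) for a, xi in zip(x_adv, x)]

def pgd(x, y, w, b, eps, alpha, steps):
    """Eq. (3): repeated signed-gradient ascent steps, each followed by projection."""
    x_adv = list(x)
    for _ in range(steps):
        g = loss_grad(x_adv, y, w, b)
        x_adv = project([a + alpha * sign(gi) for a, gi in zip(x_adv, g)], x, eps)
    return x_adv

def fgsm(x, y, w, b, eps):
    """Eq. (2) as the single-step special case (alpha = eps), plus clipping to [0, 1]."""
    return pgd(x, y, w, b, eps, alpha=eps, steps=1)

# Toy example: all numbers below are arbitrary illustration values.
x, y = [0.2, 0.8, 0.5], 0
w, b = [1.0, -2.0, 0.5], 0.3
x_fgsm = fgsm(x, y, w, b, eps=0.1)
x_pgd = pgd(x, y, w, b, eps=0.1, alpha=0.025, steps=10)
```

Because the perturbation is recomputed from the gradient at each individual input, the result is specific to that input and to the surrogate used to compute the gradient.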

Another line of research focuses on input transformations and diversity. Methods such as DIM [31] apply random resizing, cropping, or padding at each iteration, thereby simulating data augmentation and expanding the optimization region. Admix [24] perturbs gradient computation by mixing the input with additional images from other classes, encouraging perturbations to align with more transferable directions in input space.

Loss-function design has also been explored as a means to enhance transferability. Logit-based [32] objectives directly manipulate the margin between target and non-target logits, avoiding the normalization effects of the softmax layer. Despite these improvements, iterative methods remain inherently input-specific and often exhibit limited transferability in targeted black-box scenarios, especially when surrogate and target models differ significantly.

2.2. Generative Attacks

Generative approaches model adversarial example generation as a parametric mapping learned over a dataset. Specifically, the generator $G_{\theta}$ is trained by solving

$$\min_{\theta} \mathbb{E}_{x \sim \mathcal{D}} J(f(G_{\theta}(x)), y^{t}) \quad \text{s.t.} \quad \|G_{\theta}(x) - x\|_{p} \leq \epsilon, \quad G_{\theta}(x) \in [0, 1]^{d}. \tag{4}$$

Here, $\mathcal{D}$ denotes the training distribution and $y^{t}$ is the target label. Early generative attacks such as GAP [33] and ATN [34] employ trainable generators to produce image-dependent perturbations. By optimizing over a data distribution rather than individual inputs, these methods improve efficiency and enhance transferability. Subsequent studies investigate distribution-level alignment strategies. CDA [20] reveals the existence of domain-invariant adversarial features, suggesting that adversarial perturbations may lie in a shared cross-domain subspace. TTP [18] enhances targeted transferability by aligning adversarial samples with the target class’s global and local distributional characteristics. M3D [19] analyzes targeted black-box attacks from a generalization perspective, showing that the target-model attack error is bounded by the surrogate error and inter-model discrepancy, and proposes minimizing this discrepancy during generator training. However, both TTP and M3D require in-domain data during training, as their objectives are formulated with respect to the surrogate model and its associated dataset distribution. Such reliance limits their applicability in scenarios with limited or alternative data sources and may restrict generalization across datasets.

Compared with distribution-alignment methods such as TTP and M3D, which rely on matching global or local data statistics, and logit-based objectives such as margin loss and C&W loss that emphasize magnitude differences, existing approaches still lack an explicit mechanism to enforce directional alignment in logit space.

Moreover, input transformation strategies and distribution alignment techniques often introduce additional computational overhead, and their improvements are sometimes coupled with increased model complexity or reliance on in-domain data. As a result, achieving consistent cross-model transferability remains challenging, especially in targeted attack settings.

In contrast, our work revisits transferable attacks from a directional alignment perspective, explicitly modeling the relationship between adversarial logits and target semantic directions, and introducing a lightweight yet effective optimization strategy that improves both stability and transferability.

2.3. Gradient Behavior Analysis of Loss Functions

The loss function plays a critical role in adversarial optimization, as it directly determines gradient quality and optimization dynamics. The Carlini & Wagner (C&W) loss [35] is widely used for targeted attacks due to its margin-based formulation:

$$\mathcal{L}_{\mathrm{C\&W}} = \max\left(\max_{i \neq t} z_{i}(x) - z_{t}(x), -\kappa\right), \tag{5}$$

which encourages the target logit to exceed all others by a margin.

However, C&W relies on logit differences, and its gradients are driven by the relative gap between classes. As the target logit becomes dominant, this margin quickly saturates, leading to diminished gradient magnitude and less informative updates. Such behavior can hinder optimization stability and limit transferability, especially in cross-model settings where consistent gradient signals are essential.
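The saturation described above can be seen directly from Equation (5). The following minimal sketch evaluates the targeted C&W loss on hand-picked logit vectors; the function name `cw_loss` and the numeric values are illustrative assumptions, not part of the original method.

```python
def cw_loss(z, t, kappa=0.0):
    """Targeted C&W margin loss of Eq. (5): max(max_{i != t} z_i - z_t, -kappa)."""
    best_other = max(zi for i, zi in enumerate(z) if i != t)
    return max(best_other - z[t], -kappa)

# Arbitrary illustration values for a 3-class problem with target t = 1.
before_boundary = cw_loss([2.0, 1.0, 0.5], t=1)             # target logit still below the best competitor
inside_margin = cw_loss([1.0, 2.5, 0.0], t=1, kappa=2.0)    # target leads by 1.5, less than kappa
beyond_margin = cw_loss([1.0, 5.0, 0.0], t=1, kappa=2.0)    # target leads by 4.0, more than kappa
further_beyond = cw_loss([1.0, 9.0, 0.0], t=1, kappa=2.0)   # larger lead, same loss value
```

Once the target logit leads by more than $\kappa$, the loss is clamped at $-\kappa$ and stays constant, so it provides no further gradient with respect to the logits.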

In contrast, CBLA adopts a cosine similarity objective that emphasizes directional alignment in logit space rather than absolute magnitude differences. By optimizing angular consistency between adversarial logits and the target direction, CBLA maintains stable gradients even when the target logit is large. This effectively mitigates gradient saturation and promotes more consistent optimization trajectories.

Moreover, the cosine-based formulation implicitly normalizes the logit scale, reducing sensitivity to model-specific distributions. Compared to margin-based objectives such as C&W, CBLA provides a more stable, geometry-aware optimization mechanism that is better suited to transferable targeted attacks.

3. Approach

In this section, we first present a formal formulation of the adversarial attack problem, along with the corresponding notation. We then elaborate on the motivation driving this study. Subsequently, we detail the design of the loss functions and the semantic-consistency transformations incorporated into CBLA. Finally, we describe the overall training framework and algorithmic procedure of CBLA.

3.1. Notions and Definitions

Given a trained classification model $f(x): x \in \mathcal{X} \to y \in \mathcal{Y}$, the model predicts a label $y$ for an input $x$, where the ground-truth label of $x$ is denoted by $y^{true}$. An adversarial attack aims to construct a small perturbation $\varepsilon$ that is added to a clean sample such that the classifier produces an incorrect prediction, i.e., $f(x + \varepsilon) \neq y^{true}$. The perturbation $\varepsilon$ is typically constrained by an $\ell_{p}$-norm bound, where $p \in \{1, 2, \infty\}$. The perturbed input is defined as $x^{adv} = x + \varepsilon$, which is referred to as an adversarial example. The adversarial attack can therefore be formulated as:

$$\arg\max_{x^{adv}} J(f(x^{adv}), y) \quad \text{s.t.} \quad \|x^{adv} - x\|_{p} \leq \epsilon, \tag{6}$$

where $J(\cdot)$ typically denotes the cross-entropy loss, and $\epsilon$ is a small constant controlling the perturbation magnitude.

In this work, we focus on targeted attacks, in which the adversarial example is required not only to induce misclassification but also to drive the prediction toward a specified target label $y^{t}$, i.e., $f(x + \varepsilon) = y^{t}$. Compared with untargeted attacks, targeted attacks constitute a more challenging optimization scenario.

In this study, adversarial examples are generated using a generative model $G_{\theta}(\cdot)$, and the targeted attack objective is formulated as:

$$\arg\min_{\theta} J(f(G_{\theta}(x)), y^{t}) \quad \text{s.t.} \quad \|G_{\theta}(x) - x\|_{\infty} \leq \epsilon, \tag{7}$$

where $G_{\theta}(\cdot)$ denotes the generator network with parameters $\theta$. In this work, the perturbation is constrained by the $\ell_{\infty}$-norm.

3.2. A Geometric Perspective on Transferability

Adversarial examples generated on a surrogate model often transfer to other models trained for the same task. Nevertheless, a significant performance gap remains between transfer-based attacks and white-box attacks, especially in the targeted setting. This gap suggests that the geometric relationships among the decision regions of different models intrinsically constrain transferability.

A key observation is that adversarial examples optimized on a single surrogate model frequently lie near the boundary of its attackable region, rather than in regions that are jointly vulnerable across multiple models. Consequently, such perturbations may fail to generalize when evaluated on unseen architectures. To formalize this intuition and motivate our method, we introduce a geometric description of structures shared across classifiers.

Shared Target Region Across Models. Let $\{f_{m}\}_{m=1}^{M}$ denote a collection of classifiers trained for the same task, with label set $\mathcal{Y}$ and input space $\mathcal{X} \subset \mathbb{R}^{d}$. For each class $t \in \mathcal{Y}$, define the decision region of model $f_{m}$ as:

$$D_{m}^{t} := \{x \in \mathcal{X} \mid f_{m}(x) = t\}. \tag{8}$$

The shared target region across models is given by

$$S^{t} := \bigcap_{m=1}^{M} D_{m}^{t}. \tag{9}$$

Under standard supervised training on a common dataset, different models tend to agree on a nontrivial subset of samples per class. Therefore, $S^{t}$ is typically non-empty in practice. This is consistent with the empirical observation that clean samples are often consistently classified across heterogeneous architectures.

Shared Targeted Region Across Models. For a fixed input $x$ and target label $t$, define the targeted adversarial region of model $f_{m}$ under perturbation budget $\epsilon$ as:

$$\Omega_{m}^{t}(x) = \{\delta \in \mathbb{R}^{d} \mid f_{m}(x + \delta) = t, \ \|\delta\|_{p} \leq \epsilon\}. \tag{10}$$

The shared targeted region across models is

$$T^{t}(x) = \bigcap_{m=1}^{M} \Omega_{m}^{t}(x). \tag{11}$$

In general, the non-emptiness of $T^{t}(x)$ cannot be guaranteed without specific assumptions on model geometry. However, the well-established phenomenon of adversarial transfer indicates that perturbations effective on one surrogate model often remain effective across multiple models. This empirical evidence suggests that transferable targeted adversarial examples tend to lie in the vicinity of shared target-consistent regions.
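As a rough illustration of Equations (10) and (11), the sketch below tests whether a perturbation belongs to the targeted region of a single model and to the shared targeted region of several models. The two linear classifiers, the input, and the perturbations are hypothetical toy values, and the $\ell_{\infty}$ norm is assumed for the budget.

```python
def predict(model, x):
    """Argmax class of a linear classifier given as a list of (weights, bias) rows."""
    scores = [sum(w * xi for w, xi in zip(ws, x)) + b for ws, b in model]
    return max(range(len(scores)), key=scores.__getitem__)

def in_targeted_region(model, x, delta, t, eps):
    """Membership test for Omega_m^t(x) in Eq. (10), using the l_inf norm."""
    within_budget = max(abs(d) for d in delta) <= eps
    return within_budget and predict(model, [xi + d for xi, d in zip(x, delta)]) == t

def in_shared_targeted_region(models, x, delta, t, eps):
    """Membership test for T^t(x) in Eq. (11): delta must succeed on every model."""
    return all(in_targeted_region(m, x, delta, t, eps) for m in models)

# Two toy 2-class models whose decision boundaries differ by a small offset.
model_a = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
model_b = [([1.0, 0.0], 0.0), ([0.0, 1.0], -0.1)]
x, t, eps = [0.5, 0.45], 1, 0.1

weak_delta = [0.0, 0.08]     # barely crosses the boundary of model_a only
strong_delta = [-0.1, 0.1]   # moves further into the target side
```

In this toy setting, `weak_delta` lies in the targeted region of `model_a` but not in the shared region, whereas `strong_delta` lies in both, which mirrors the intuition that perturbations stopping near one surrogate boundary need not transfer.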

These constructions lead to the following observations.

(1) Relative size of shared regions. The shared targeted adversarial region $T^{t}(x)$ is generally much smaller than the shared decision region $S^{t}$. While $S^{t}$ consists of inputs naturally classified as class $t$ by all models, $T^{t}(x)$ additionally requires reachability from a given input $x$ under a constrained perturbation budget. Therefore, transferable targeted attacks can be interpreted as searching for feasible perturbations that move $x$ into a relatively small intersection region embedded within a larger shared class-consistent domain.

(2) Prototype-based interpretation. The existence of $S^{t}$ implies that there exist samples that are consistently recognized as class $t$ across models. Such samples can be regarded as prototype points located in the interior of the shared decision region. From this perspective, effective targeted adversarial examples should not only cross the decision boundary of a surrogate model, but also move toward directions that approximate the geometric structure of these prototype regions. In particular, cosine similarity encourages alignment between the logit vector $z$ and the target basis vector $e_{t}$, which provides a normalized directional objective independent of logit magnitude. This can be interpreted as approximating movement toward a target-consistent prototype direction in logit space, thereby improving cross-model transferability.

Transfer-based targeted attacks seek perturbations that move an input sample into the shared adversarial region $T^{t}$ using only a surrogate model. The effectiveness of this process depends critically on the attack objective and the gradient feedback it provides. In most existing methods, the cross-entropy loss is adopted to guide perturbation updates.

However, once the perturbed sample crosses the surrogate model’s decision boundary, the cross-entropy gradient rapidly diminishes. As a consequence, the optimization trajectory often stagnates near the boundary of the surrogate model’s attack region. We refer to regions in the optimization space where gradient magnitudes become extremely small and provide little informative direction as attack dead zones. In these regions, optimization updates become unstable or ineffective, and perturbations are dominated by local surrogate-specific decision boundaries rather than target-consistent directions, resulting in poor transferability. This issue is particularly evident in gradient-based attacks such as the single-step FGSM [14] (Figure 1a) and the iterative MIM [7] (Figure 1b). Although generative approaches alleviate sample-wise overfitting [18] (Figure 1c), they still inherit the gradient-vanishing behavior induced by the cross-entropy objective.

Motivated by this limitation, we replace the cross-entropy objective with a loss that mitigates gradient decay and enlarges the region in which meaningful update signals can be obtained. By shrinking the attack dead zone, the optimization is less confined to surrogate-specific boundary artifacts and more likely to move toward the shared adversarial region $T^{t}$.

Nevertheless, reducing gradient vanishing alone does not guarantee that adversarial examples will lie inside $T^{t}$, whose structure is unknown. From the geometric perspective introduced earlier, highly transferable perturbations should approximate the shared target-consistent region $S^{t}$. Clean target samples naturally reside in this region and exhibit strong cross-model agreement. Existing methods, such as TTP [18] and M3D [19], attempt to exploit this observation by encouraging similarity between adversarial and target samples, either through logit alignment or feature-level discrimination (Figure 1c). While effective to some extent, these strategies remain dependent on surrogate-model representations and require access to real target samples.

To further reduce surrogate dependence, we instead impose constraints directly in the input space. Clean target samples preserve their class identity under semantic-preserving augmentations. Therefore, if an adversarial example maintains its attack objective under similar transformations, it is likely to capture more stable and model-agnostic target features. This transformation-based consistency serves as implicit regularization toward a shared target structure, without requiring target-domain data.

In summary, improving targeted transferability requires addressing two coupled aspects: (i) reducing model-dependent perturbations caused by gradient saturation, and (ii) enhancing target-oriented, model-agnostic features. Our framework jointly mitigates gradient vanishing and enforces semantic-invariance constraints, thereby promoting adversarial examples that are both structurally stable and strongly transferable.

3.3. Cosine Similarity as an Alternative to Cross-Entropy

The choice of loss function plays a central role in transfer-based targeted attacks, as it determines the optimization trajectory in the surrogate model’s attack space. In standard classification, cross-entropy (CE) is widely used for its stable gradients and strong class-separability properties [36]. However, in the context of targeted adversarial optimization, its behavior becomes suboptimal [32].

For a logit vector $z = (z_{1}, \dots, z_{K})$ and target class $t$, the gradient of the CE loss with respect to $z_{i}$ is

$$\frac{\partial \mathcal{L}_{CE}}{\partial z_{i}} = p_{i} - y_{i}, \qquad p_{i} = \frac{e^{z_{i}}}{\sum_{j} e^{z_{j}}}, \tag{12}$$

where $y_{i}$ denotes the $i$-th entry of the one-hot target label.

In targeted attacks, the objective encourages $z_{t}$ to dominate other logits. Let $\Delta = z_{t} - \max_{k \neq t} z_{k}$ denote the logit margin, and assume that $z_{t}$ is the largest logit, i.e., $\Delta \geq 0$, which holds once the surrogate predicts the target class. Then,

$$p_{t} = \frac{1}{1 + \sum_{k \neq t} e^{z_{k} - z_{t}}} = \frac{1}{1 + \sum_{k \neq t} e^{-\Delta_{k}}}, \tag{13}$$

where $\Delta_{k} = z_{t} - z_{k} \geq \Delta$.

As $\Delta \to +\infty$, all terms $e^{-\Delta_{k}}$ vanish exponentially, yielding

$$1 - p_{t} \approx \sum_{k \neq t} e^{-\Delta_{k}} = O(e^{-\Delta}). \tag{14}$$

Thus, the gradient on the target logit satisfies

$$\left|\frac{\partial \mathcal{L}_{CE}}{\partial z_{t}}\right| = 1 - p_{t} = O(e^{-\Delta}), \tag{15}$$

and similarly for non-target classes, since $p_{k} \leq 1 - p_{t}$ for every $k \neq t$. Therefore,

$$\|\nabla_{z} \mathcal{L}_{CE}\| = O(e^{-\Delta}), \qquad \Delta \to +\infty. \tag{16}$$

This shows that CE gradients decay exponentially as the logit margin increases, leading to rapid gradient saturation once the decision boundary is crossed.
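The exponential decay in Equation (16) can be checked numerically. The sketch below is a minimal illustration under a simplifying assumption that is not part of the paper: the target logit equals the margin $\Delta$ and the remaining $K - 1$ logits are all zero. The helper names are hypothetical.

```python
import math

def softmax(z):
    m = max(z)  # subtract the maximum for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def ce_grad(z, t):
    """Gradient of cross-entropy w.r.t. the logits, p - e_t, as in Eq. (12)."""
    p = softmax(z)
    return [pi - (1.0 if i == t else 0.0) for i, pi in enumerate(p)]

def l2(v):
    return math.sqrt(sum(x * x for x in v))

def ce_grad_norm_at_margin(delta, k=10):
    """CE gradient norm when z_t = delta and the other k - 1 logits are 0 (toy setup)."""
    z = [delta] + [0.0] * (k - 1)
    return l2(ce_grad(z, 0))

# Gradient norms for increasing margins; each entry is (margin, norm).
norms = [(d, ce_grad_norm_at_margin(d)) for d in (0.0, 5.0, 10.0, 20.0)]
```

In this setup the norm shrinks by roughly a factor of $e^{-5}$ each time the margin grows by 5, consistent with the $O(e^{-\Delta})$ rate.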

To address this limitation, we instead optimize directly in logit space using cosine similarity. Let $z$ denote the logit vector and $e_{t}$ the one-hot target vector. The cosine loss is defined as:

$$\mathcal{L}_{\cos} = 1 - \frac{\langle z, e_{t} \rangle}{\|z\|_{2} \|e_{t}\|_{2}}. \tag{17}$$

Since $\|e_{t}\|_{2} = 1$ and $\langle z, e_{t} \rangle = z_{t}$, this simplifies to

$$\mathcal{L}_{\cos} = 1 - \frac{z_{t}}{\|z\|_{2}}. \tag{18}$$

Taking the gradient with respect to $z$, we obtain

$$\nabla_{z} \mathcal{L}_{\cos} = -\frac{e_{t}}{\|z\|_{2}} + \frac{z_{t}}{\|z\|_{2}^{3}} z. \tag{19}$$

Bounding the norm of this gradient with the triangle inequality yields

$$\|\nabla_{z} \mathcal{L}_{\cos}\|_{2} \leq \frac{1}{\|z\|_{2}} + \frac{|z_{t}|}{\|z\|_{2}^{2}} \leq \frac{C}{\|z\|_{2}}, \tag{20}$$

where $C = 2$ suffices because $|z_{t}| \leq \|z\|_{2}$. Therefore,

$$\|\nabla_{z} \mathcal{L}_{\cos}\|_{2} = O\left(\frac{1}{\|z\|_{2}}\right), \qquad \|z\|_{2} \to \infty. \tag{21}$$

Unlike CE, whose gradients decay exponentially with respect to the logit margin, the cosine loss exhibits only polynomial decay with respect to the logit norm. As a result, it maintains non-negligible gradient signals even when the target logit is dominant.
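A minimal sketch of the cosine loss in Equation (18) and its gradient in Equation (19) follows. It checks the analytic gradient against central finite differences and against the bound of Equation (20) with $C = 2$, which holds because $|z_{t}| \leq \|z\|_{2}$. The function names and the example logits are illustrative assumptions.

```python
import math

def l2(v):
    return math.sqrt(sum(x * x for x in v))

def cos_loss(z, t):
    """Eq. (18): 1 - z_t / ||z||_2."""
    return 1.0 - z[t] / l2(z)

def cos_grad(z, t):
    """Eq. (19): -e_t / ||z||_2 + (z_t / ||z||_2^3) z."""
    n = l2(z)
    return [(-1.0 / n if i == t else 0.0) + z[t] * zi / n ** 3 for i, zi in enumerate(z)]

def numeric_grad(z, t, h=1e-6):
    """Central finite-difference approximation, used only as a sanity check."""
    g = []
    for i in range(len(z)):
        zp, zm = list(z), list(z)
        zp[i] += h
        zm[i] -= h
        g.append((cos_loss(zp, t) - cos_loss(zm, t)) / (2 * h))
    return g

# Arbitrary example logits with target class t = 0.
z, t = [3.0, -1.0, 0.5, 2.0], 0
analytic = cos_grad(z, t)
numeric = numeric_grad(z, t)
bound = 2.0 / l2(z)
```

In a training loop the same loss would normally be written with an automatic-differentiation framework; the closed form is spelled out here only to make the gradient expression explicit.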

Geometrically, cross-entropy emphasizes probability saturation, whereas cosine similarity enforces directional alignment in logit space. This allows CBLA to continue optimizing after successful misclassification, pushing adversarial examples deeper into target-consistent regions and improving transferability. Therefore, compared with CE or other magnitude-based objectives, cosine similarity yields substantially less rapidly vanishing gradients and provides a more suitable optimization signal for transferable targeted attacks.

3.4. Semantic-Invariant Constraint

The previous section addresses gradient degeneration by modifying the attack objective in logit space. However, even with sustained gradients, the optimization trajectory may remain surrogate-dependent, since the update direction is entirely determined by surrogate-specific feedback.

To further reduce this dependency, we introduce a structural constraint directly in the input domain, termed the Semantic-Invariant Constraint (SIC). The key idea is to enforce target consistency under semantic-preserving transformations.

Let $x^{adv} = G(x)$ denote the generated adversarial example targeting class $t$. We consider two semantic-invariant transformations (SIT): an appearance-domain transformation $T_{a}(\cdot)$ and a spatial-domain transformation $T_{s}(\cdot)$. SIC requires that the adversarial objective remain valid after applying either transformation:

$$f(T_{a}(x^{adv})) = t, \qquad f(T_{s}(x^{adv})) = t. \tag{22}$$

This design is motivated by the observation that clean target samples located in the shared decision region $S^{t}$ remain correctly classified under semantic-invariant transformations. Therefore, enforcing such invariance encourages adversarial examples to approximate the structural properties of genuine target samples, rather than merely crossing surrogate-specific decision boundaries.

In this work, the following transformations are used:

Appearance-domain transformation $T_{a}$. This includes global, small-amplitude perturbations such as horizontal flipping, color jittering (brightness, contrast, saturation, hue), and grayscale conversion.
Spatial-domain transformation $T_{s}$. This consists of four discrete rotation operations: $0^{\circ}$, $90^{\circ}$, $180^{\circ}$, and $270^{\circ}$.

These transformations are selected according to four principles:

Semantic preservation. The chosen operations do not alter class identity and act globally on the image, avoiding local occlusion or cropping that may distort semantics.
High-frequency robustness. Transfer failures are often induced by high-frequency perturbation artifacts; these transformations reduce sensitivity to such components.
Gradient stability. Color jittering is implemented via small parametric adjustments, and $90^{\circ}$ rotations are realized through tensor permutations without interpolation, ensuring stable gradient propagation.
Controlled comparison. By adopting the same base transformations as in TTP [18], performance differences can be attributed to the application target (adversarial examples versus training samples) rather than to the transformation design.
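The interpolation-free rotations mentioned under gradient stability can be sketched as pure index permutations. The example below operates on a single-channel image stored as a nested list; the function names are hypothetical, and a multi-channel image would apply the same permutation to each channel.

```python
def rot90(img, k=1):
    """Rotate an H x W grid counter-clockwise by k * 90 degrees via transpose and row reversal."""
    out = [list(row) for row in img]
    for _ in range(k % 4):
        out = [list(row) for row in zip(*out)][::-1]
    return out

def hflip(img):
    """Horizontal flip: reverse the column order of every row."""
    return [list(row[::-1]) for row in img]

# The four discrete rotations applied to a small example grid.
img = [[1, 2, 3],
       [4, 5, 6]]
rotations = [rot90(img, k) for k in range(4)]
```

Since every output pixel is an exact copy of one input pixel, no resampling error is introduced, and the backward pass of such an operation simply applies the inverse permutation to the gradient.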

Unlike previous approaches that apply transformations to training data to regularize surrogate learning, SIC directly constrains the adversarial example itself. Rather than modifying the surrogate model’s representation space, we impose structural consistency on the perturbation outcome. Consequently, the generated adversarial examples are encouraged to move toward target-consistent regions that are stable under semantic variation, thereby improving cross-model transferability.

Formally, the SIC loss is defined as:

$$L_{SIC} = L_{\cos}\big(f(T_{a}(x^{adv})), y^{t}\big) + L_{\cos}\big(f(T_{s}(x^{adv})), y^{t}\big). \tag{23}$$

The overall generator objective combines the logit-based loss and the semantic-invariance constraint:

$$\min_{\theta} \; L_{adv} + L_{SIC}, \quad \text{s.t.} \quad \| G_{\theta}(x) - x \|_{\infty} \leq \epsilon. \tag{24}$$

This joint formulation simultaneously maintains gradient signals in logit space and enforces semantic consistency in input space, thereby providing complementary mechanisms to enhance transferable targeted attacks.

3.5. Overall Framework and Implementation

The overall training procedure of the proposed method is illustrated in Figure 2. Given a mini-batch of clean images $x \in [0, 1]^{B \times C \times H \times W}$, the generator $G_{\theta}$ directly outputs unconstrained adversarial candidates within the valid pixel range $[0, 1]$. These preliminary outputs may contain high-frequency artifacts induced by optimization dynamics. To suppress such undesirable noise and promote smoother perturbation structures, a Gaussian smoothing operator is applied to the generated images. After smoothing, the samples are projected onto the feasible perturbation set to ensure compliance with the $\ell_{\infty}$ constraint and valid image bounds. The resulting adversarial batch is then subjected to semantic-preserving transformations, and both the original and the transformed samples are forwarded through the surrogate model. Cosine-based logit alignment losses are computed and combined with semantic-invariance regularization to update the generator parameters.

Generator. The generator $G_{\theta}$ adopts a U-Net [37] architecture and learns a parametric mapping from clean images to unconstrained perturbed examples.

Gaussian Smoothing. To suppress high-frequency noise and stabilize gradient propagation, a Gaussian smoothing operator $S(\cdot)$ is applied to the generated examples. Specifically, $S(\cdot)$ is implemented as a Gaussian filter with kernel size $3 \times 3$ and standard deviation $\sigma = 2$. This operation reduces localized artifacts and encourages smoother perturbation patterns, which are empirically more transferable across architectures [18].
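
To make the smoothing operator concrete, the sketch below builds a normalized $3 \times 3$ Gaussian kernel with $\sigma = 2$. It assumes the common construction in which the continuous Gaussian is sampled at integer offsets and then normalized to sum to one; the text does not specify the exact discretization, so `gaussian_kernel` is an illustrative helper rather than the authors' implementation.

```python
import math


def gaussian_kernel(size=3, sigma=2.0):
    """Sample a 2-D Gaussian at integer offsets and normalize it to sum to 1."""
    half = size // 2
    raw = [[math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
            for dx in range(-half, half + 1)]
           for dy in range(-half, half + 1)]
    total = sum(sum(row) for row in raw)
    return [[w / total for w in row] for row in raw]


kernel = gaussian_kernel(3, 2.0)
for row in kernel:
    print(" ".join(f"{w:.4f}" for w in row))
```

Under this assumption the weights are close to uniform (the center weight is about 0.131 and the corner weights about 0.102, against $1/9 \approx 0.111$ for a box filter), so the operator acts as a mild low-pass filter over each pixel's immediate neighborhood.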

Projection and Clipping. The adversarial example must satisfy two constraints: (i) the $\ell_{\infty}$ perturbation budget relative to the clean image, and (ii) the valid pixel range $[0, 1]$. To enforce these requirements, we first project the smoothed perturbation onto the admissible perturbation set:

$$\hat{x} = \Pi_{\| G_{\theta}(x) - x \|_{\infty} \leq \epsilon}\big(S * G_{\theta}(x)\big), \tag{25}$$

which guarantees that

$$\| \hat{x} - x \|_{\infty} \leq \epsilon. \tag{26}$$

Subsequently, we apply element-wise clipping to obtain a valid image:

$$x^{adv} = \Pi_{[0, 1]}(\hat{x}), \tag{27}$$

where $\Pi_{[0, 1]}$ denotes pixel-wise truncation into the interval $[0, 1]$.
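
The two-step procedure in Equations (25)–(27) can be sketched on a flattened list of pixel values as follows. This is a minimal illustration that assumes the projection is the usual element-wise clamp onto the $\ell_{\infty}$ ball around the clean image and that $\epsilon = 16$ is expressed on the 0–255 pixel scale; the function name `project_and_clip` and the toy values (including the out-of-range entries, chosen only to exercise the clipping step) are hypothetical.

```python
def clamp(v, lo, hi):
    """Truncate a scalar into the interval [lo, hi]."""
    return max(lo, min(hi, v))


def project_and_clip(smoothed, clean, eps):
    """Apply the l_inf projection (Eq. 25), then pixel-range clipping (Eq. 27)."""
    # Element-wise clamp onto [x - eps, x + eps]: assumed form of the projection.
    x_hat = [clamp(s, c - eps, c + eps) for s, c in zip(smoothed, clean)]
    # Truncate every pixel into the valid range [0, 1].
    x_adv = [clamp(v, 0.0, 1.0) for v in x_hat]
    return x_hat, x_adv


eps = 16 / 255                         # assumption: budget given on the 0-255 scale
clean = [0.00, 0.50, 0.98, 0.01]       # toy clean pixels in [0, 1]
smoothed = [0.40, 0.45, 1.10, -0.05]   # toy smoothed generator outputs

x_hat, x_adv = project_and_clip(smoothed, clean, eps)
linf = max(abs(a - c) for a, c in zip(x_adv, clean))
print(x_hat, x_adv, linf <= eps + 1e-12)
```

Since the clean pixels already lie in $[0, 1]$, the final clipping can only move a value toward its clean counterpart, so the bound in Equation (26) still holds for $x^{adv}$.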

Semantic-Invariant Transformation. To enforce structural consistency, two semantic-preserving transformations are applied to $x^{adv}$: an appearance-domain transformation $T_{a}(\cdot)$ and a spatial-domain transformation $T_{s}(\cdot)$. This produces two transformed adversarial samples:

$$x_{a}^{adv} = T_{a}(x^{adv}), \quad x_{s}^{adv} = T_{s}(x^{adv}). \tag{28}$$

Attack Loss. All adversarial samples are forwarded through the pretrained surrogate model $F_{\psi}$ to obtain logits:

$$z = F_{\psi}(x^{adv}), \quad z_{a} = F_{\psi}(x_{a}^{adv}), \quad z_{s} = F_{\psi}(x_{s}^{adv}). \tag{29}$$

The total generator loss combines the primary cosine objective and the semantic-invariance regularization:

$$L_{G} = L_{\cos}(z, t) + L_{\cos}(z_{a}, t) + L_{\cos}(z_{s}, t). \tag{30}$$

The generator parameters $\theta$ are optimized via gradient descent to minimize $L_{G}$. The complete training procedure is summarized in Algorithm 1.

Algorithm 1 Cosine-Based Logit Alignment Algorithm

Require: Training data $X$, pretrained surrogate model $F_{\psi}$, perturbation budget $\epsilon$, target class $t$, loss criterion $L_{\cos}$, number of iterations $T$.

Ensure: Generator $G_{\theta}$

1: Initialize the generator $G_{\theta}$.
2: for $epoch = 1, \dots, T$ do
3:     Randomly sample a mini-batch $x \in X$.
4:     Generate unbounded perturbed examples $x^{per} = G_{\theta}(x)$.
5:     Get adversarial examples $x^{adv}$:
       $\hat{x} = \Pi_{\| x^{per} - x \|_{\infty} \leq \epsilon}(S * x^{per})$,
       $x^{adv} = \Pi_{[0, 1]}(\hat{x})$.
6:     Obtain transformed adversarial samples $x_{a}^{adv}, x_{s}^{adv}$ using Equation (28).
7:     Forward pass $x^{adv}, x_{a}^{adv}, x_{s}^{adv}$ through $F_{\psi}$ and get logits:
       $z = F_{\psi}(x^{adv}), \; z_{a} = F_{\psi}(x_{a}^{adv}), \; z_{s} = F_{\psi}(x_{s}^{adv})$.
8:     Calculate attack loss:
       $L_{adv} = L_{\cos}(z, t), \; L_{SIC} = L_{\cos}(z_{a}, t) + L_{\cos}(z_{s}, t)$.
9:     Calculate loss of the generator:
       $L_{G} = L_{adv} + L_{SIC}$.
10:    Update $\theta$ with a gradient descent step that minimizes $L_{G}$.
11: end for
12: return Generator $G_{\theta}$
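
Steps 8 and 9 of Algorithm 1 can be sketched for a single sample as follows. The exact form of $L_{\cos}$ is defined earlier in the paper and is not reproduced in this section, so the sketch assumes $L_{\cos}(z, t) = 1 - \cos(z, e_{t})$, where $e_{t}$ is the one-hot vector of the target class; the functions `cosine_loss` and `generator_loss` and the toy logits are illustrative only.

```python
import math


def cosine_loss(logits, target):
    """Assumed form: 1 - cos(z, e_t), with e_t the one-hot target vector."""
    norm = math.sqrt(sum(v * v for v in logits))
    if norm == 0.0:
        return 1.0  # cosine is undefined for a zero vector; treat it as orthogonal
    # The dot product of z with a one-hot vector is just the target logit.
    return 1.0 - logits[target] / norm


def generator_loss(z, z_a, z_s, target):
    """Return (L_adv, L_SIC, L_G) following steps 8 and 9 of Algorithm 1."""
    l_adv = cosine_loss(z, target)
    l_sic = cosine_loss(z_a, target) + cosine_loss(z_s, target)
    return l_adv, l_sic, l_adv + l_sic


# Toy three-class logits for the original, appearance-transformed, and
# rotated adversarial sample; the target class is index 1.
z, z_a, z_s = [0.0, 5.0, 0.0], [3.0, 4.0, 0.0], [4.0, 3.0, 0.0]
l_adv, l_sic, l_g = generator_loss(z, z_a, z_s, target=1)
print(l_adv, l_sic, l_g)
```

In this toy case the untransformed sample is perfectly aligned with the target direction, so the whole loss comes from the two transformed views, which is the situation the semantic-invariance term is meant to penalize.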

4. Experiments

This section first introduces the experimental setup. We then present the in-domain targeted attack results to evaluate the effectiveness of the proposed method, followed by cross-domain transfer experiments where adversarial generators are trained on auxiliary datasets and evaluated on the ImageNet validation set. Next, we analyze the contribution of individual components through ablation studies, including cosine-based alignment and semantic-invariance transformations. In addition, we provide a comprehensive robustness and practical analysis, covering performance under defense methods, computational efficiency, generator architecture, perturbation budget, and perceptual quality.

4.1. Experiment Setup

Datasets. All experiments are conducted on the ImageNet classification task. To evaluate transferability under different training distributions, we consider both in-domain and cross-domain settings. In the in-domain setting, for fair comparison with TTP [18] and M3D [19], we follow their protocol and randomly select 50,000 images from the ImageNet training set [38]. In the cross-domain setting, we use two alternative datasets, namely Comics [39] and Paintings [40]. The entire Comics training split (42,175 images) is used, while 50,000 images are randomly sampled from the Paintings training set. This design enables a systematic evaluation of the robustness and cross-domain transferability of the proposed method. All images are resized to $224 \times 224 \times 3$ and normalized to the range $[0, 1]$.

Models. Following prior work [18,19], we adopt VGG19_BN [41], DenseNet121 [42], and ResNet50 [36] as surrogate models for training the adversarial generator. The generated adversarial examples are subsequently evaluated on a diverse set of target architectures, including VGG19_BN [41], DenseNet121 [42], ResNet50 [36], ResNet152 [36], and WRN-50-2 [36]. This evaluation protocol enables a systematic assessment of both white-box and cross-architecture transfer performance, thereby providing a comprehensive analysis of the proposed method’s generalization capability.

Baselines. We evaluate the proposed method against a diverse set of baseline approaches, including MIM [7], DIM [31], GAP [33], CDA [20], TTP [18], and M3D [19]. These methods span both iterative gradient-based attacks and generative adversarial attacks, and include techniques explicitly designed to enhance cross-model and cross-domain transferability. As such, they constitute a representative and comprehensive benchmark for assessing the robustness and transfer performance of adversarial attacks.

Implementation Details. The generator is optimized using the Adam optimizer [43] with a learning rate of $5 \times 10^{-4}$. The exponential decay rates for the first- and second-moment estimates are set to 0.5 and 0.999, respectively. The generator is trained for 15 epochs. The adversarial perturbation is constrained under an $\ell_{\infty}$ norm bound of $\epsilon = 10$ during training [18,19]. For evaluation, the perturbation budget is set to $\epsilon = 16$.
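
For reference, the settings above can be gathered into a small configuration sketch. Because images are normalized to $[0, 1]$ while the budgets are quoted as 10 and 16, the sketch assumes that $\epsilon$ is expressed on the 0–255 pixel scale, as is common in the transfer-attack literature; this interpretation, the dictionary keys, and the helper `eps_to_unit_range` are assumptions rather than details stated in the text.

```python
# Hyperparameters as reported in the implementation details.
TRAIN_CONFIG = {
    "optimizer": "Adam",
    "learning_rate": 5e-4,
    "betas": (0.5, 0.999),   # first- and second-moment decay rates
    "epochs": 15,
    "eps_train_255": 10,     # assumption: budget on the 0-255 scale
    "eps_eval_255": 16,      # assumption: budget on the 0-255 scale
}


def eps_to_unit_range(eps_255):
    """Convert a budget on the 0-255 scale to images normalized to [0, 1]."""
    return eps_255 / 255.0


eps_train = eps_to_unit_range(TRAIN_CONFIG["eps_train_255"])
eps_eval = eps_to_unit_range(TRAIN_CONFIG["eps_eval_255"])
print(f"train eps = {eps_train:.4f}, eval eps = {eps_eval:.4f}")
```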

Evaluation Metrics. Targeted attacks are evaluated on the ImageNet validation set. We select ten target classes $T = \{24, 99, 245, 344, 471, 555, 661, 701, 802, 919\}$ [18,19], and report the Average Attack Success Rate (AASR) over these targets.

For each target class $t \in T$, adversarial examples are generated for all validation samples whose ground-truth label is not $t$. In total, $N = 49{,}950$ validation images are used for evaluation per target (the 50,000 validation images minus the 50 that belong to class $t$). The Attack Success Rate (ASR) for class $t$ is defined as

$$ASR_{t} = \frac{1}{N} \sum_{i=1}^{N} I\big(f(x_{i}^{adv,t}) = t\big), \tag{31}$$

where $I(\cdot)$ denotes the indicator function.

The final Average Attack Success Rate (AASR) is computed as

$$AASR = \frac{1}{|T|} \sum_{t \in T} ASR_{t}. \tag{32}$$
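
Equations (31) and (32) amount to a per-target hit rate followed by an unweighted mean over targets. The sketch below illustrates this with a toy dictionary of predicted labels; the function names `asr` and `aasr` and the toy predictions are hypothetical and are not experimental data.

```python
def asr(predictions, target):
    """Eq. (31): fraction of adversarial examples classified as the target."""
    return sum(1 for p in predictions if p == target) / len(predictions)


def aasr(predictions_by_target):
    """Eq. (32): unweighted mean of the per-target success rates."""
    rates = [asr(preds, t) for t, preds in predictions_by_target.items()]
    return sum(rates) / len(rates)


# Toy example with two target classes and four adversarial examples each.
toy = {
    24: [24, 24, 7, 24],   # 3 of 4 hit the target
    99: [99, 3, 99, 5],    # 2 of 4 hit the target
}
print(asr(toy[24], 24), asr(toy[99], 99), aasr(toy))

# Number of evaluation images per target: 50,000 validation images minus
# the 50 images whose ground-truth label is the target class.
n_per_target = 50_000 - 50
```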

4.2. In-Domain Targeted Attack Results

Table 1 reports the targeted transfer performance across multiple surrogate–target combinations. All results are averaged over 3 independent runs with different random seeds. Statistical significance is evaluated using paired t-tests across the same set of input images ($p < 0.01$). Overall, CBLA consistently achieves the highest success rates among all compared baselines, demonstrating superior transferability under both white-box and cross-architecture settings. Iterative attacks such as MIM [7] and DIM [31] exhibit limited cross-model generalization, typically yielding success rates below 15% on unseen architectures. Although prior generative approaches, including TTP [18] and M3D [19], significantly improve transferability compared with iterative methods, their performance remains notably inferior to CBLA. This indicates that simply learning transferable perturbations from surrogate supervision is insufficient without explicitly enhancing target-space alignment.

The superiority of CBLA becomes more evident in heterogeneous transfer scenarios. For example, when VGG19_BN serves as the surrogate model, the success rate on WRN-50-2 increases from 32.63% (TTP [18]) and 68.43% (M3D [19]) to 82.25% under CBLA. This corresponds to an absolute improvement of 13.82 percentage points over the strongest generative baseline (M3D [19]) and 49.62 percentage points over TTP, representing approximately a 2.5× increase relative to TTP [18]. Similar trends are observed under other surrogate architectures. In particular, when ResNet50 is used as the surrogate, CBLA achieves success rates exceeding 96% on all unseen target models, approaching white-box performance while maintaining full transferability.
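
The gains quoted above for the VGG19_BN surrogate and WRN-50-2 target follow directly from the three reported success rates, as the short check below shows. Only the three percentages are taken from the text; the variable names are illustrative.

```python
# Reported targeted success rates (%) on WRN-50-2 with a VGG19_BN surrogate.
ttp, m3d, cbla = 32.63, 68.43, 82.25

gain_over_m3d = round(cbla - m3d, 2)   # absolute gain in percentage points
gain_over_ttp = round(cbla - ttp, 2)   # absolute gain in percentage points
ratio_to_ttp = cbla / ttp              # relative factor with respect to TTP

print(gain_over_m3d, gain_over_ttp, round(ratio_to_ttp, 2))
```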

These results suggest that CBLA not only enhances attack strength on the surrogate model but, more importantly, promotes model-agnostic target alignment. By mitigating gradient saturation and enforcing semantic consistency, the generated adversarial examples move deeper into shared target-consistent regions rather than remaining near surrogate-specific decision boundaries. Such structural alignment explains the substantial and stable gains observed across diverse architectural pairs.

To further illustrate the effectiveness of CBLA, Figure 3 presents qualitative targeted adversarial examples generated using ResNet50 as the surrogate model. For each clean source image, adversarial samples are produced toward multiple target classes. Despite the strict $\ell_{\infty}$ constraint ($\epsilon \leq 16$), the generated examples exhibit clear semantic alignment with the designated target categories. This observation supports our geometric interpretation: CBLA encourages adversarial samples to move toward target-consistent regions that are structurally stable across models, rather than merely crossing surrogate-specific decision boundaries.

4.3. Cross-Domain Targeted Attack Results

Table 2 reports the targeted transfer performance under the cross-domain setting. In this configuration, adversarial generators are trained on auxiliary datasets—Paintings (P) [40] or Comics (C) [39]—and evaluated on the ImageNet validation set. This setup enables a systematic assessment of both cross-domain generalization and cross-architecture transferability.

Overall, CBLA consistently outperforms TTP-P [18] across all surrogate–target combinations. Under the VGG19_BN surrogate, for example, CBLA increases the transfer success rate on WRN-50-2 from 31.00% (TTP-P) to 82.25%, an improvement of over 50 percentage points. On DenseNet121 and ResNet50, the success rates are further raised to around 88%, substantially exceeding prior generative baselines. These results demonstrate that CBLA effectively alleviates distribution shift and produces adversarial examples with stronger cross-domain target alignment.

The gains remain stable across different surrogate models. When trained on either Paintings or Comics and evaluated on ImageNet, CBLA maintains high transfer success rates even on unseen architectures. For instance, with ResNet50 as the surrogate, CBLA-P achieves over 94% success rates on all heterogeneous target models. Such consistent improvements suggest that CBLA reduces reliance on surrogate-specific decision boundaries and enhances alignment with target semantics under both dataset shifts and architectural variations.

Although a performance gap between in-domain and cross-domain settings is expected due to distribution shift, the degradation of CBLA remains moderate. Compared with the in-domain results in Table 1, the cross-domain success rates decrease slightly but remain consistently high across heterogeneous architectures. For example, under the ResNet50 surrogate, CBLA achieves success rates above 96% in the in-domain setting and maintains comparable performance under cross-domain training. This relatively small gap suggests that the proposed framework does not rely heavily on the training distribution and preserves strong target-aligned transferability even when trained on auxiliary datasets. Such robustness highlights the effectiveness of the cosine-based alignment and semantic-invariance constraints in promoting cross-domain generalization.

Figure 4 presents qualitative targeted adversarial examples on the Comics and Paintings datasets. Although the generator is trained on cross-domain datasets, the generated adversarial examples remain highly effective across visually distinct domains. The preservation of attack effectiveness across heterogeneous visual domains further supports the hypothesis that transferable adversarial examples approximate shared target-consistent regions beyond a single dataset distribution.

4.4. Effectiveness of Cosine Alignment and Semantic-Invariant Constraints

We analyze the effectiveness of the two key components in CBLA, namely cosine-based logit alignment (COS) and semantic-invariant constraints (SIC). Four configurations are evaluated, including cross-entropy (CE), COS, SIC + CE, and the full CBLA framework. Table 3 presents the ablation results in the in-domain setting (ImageNet).

Replacing CE with cosine similarity (COS) yields consistent but moderate improvements in cross-architecture transfer. For example, under the VGG19_BN surrogate, the success rate on DenseNet121 increases from 32.58% (CE) to 36.38% (COS), indicating that alleviating exponential gradient decay improves target alignment.
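
The saturation effect referred to above can be illustrated numerically. For softmax cross-entropy, the gradient with respect to the target logit is $p_{t} - 1$, which shrinks rapidly as the logits grow in the target direction, whereas a cosine objective depends only on the direction of the logit vector. The sketch below assumes $L_{\cos}(z, t) = 1 - \cos(z, e_{t})$ as a stand-in for the paper's loss and uses hypothetical toy logits; it illustrates the general phenomenon and is not the authors' derivation.

```python
import math


def softmax(z):
    """Numerically stable softmax over a list of logits."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def ce_grad_target(z, t):
    """Gradient of softmax cross-entropy w.r.t. the target logit: p_t - 1."""
    return softmax(z)[t] - 1.0


def cosine_loss(z, t):
    """Assumed form: 1 - cos(z, e_t), with e_t the one-hot target vector."""
    return 1.0 - z[t] / math.sqrt(sum(v * v for v in z))


base = [3.0, 1.0, 0.0]   # toy logits; the target class is index 0
rows = []
for scale in (1, 2, 4, 8):
    z = [scale * v for v in base]
    rows.append((scale, abs(ce_grad_target(z, 0)), cosine_loss(z, 0)))
    print(rows[-1])
```

As the logits are scaled up, the cross-entropy gradient magnitude on the target logit decays toward zero, while the cosine loss value stays the same because the direction of the logit vector is unchanged.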

Introducing SIC leads to substantial gains. Under the same surrogate, SIC + CE boosts the DenseNet121 success rate to 71.11%, more than doubling the performance of CE. A similar trend is observed under the ResNet50 surrogate, where the success rate on ResNet152 rises from 64.58% (COS) to 91.53% (SIC + CE). This demonstrates that enforcing transformation consistency effectively enhances cross-model generalization. Finally, the complete CBLA framework consistently achieves the best results across all surrogate–target pairs. Under the ResNet50 surrogate, CBLA attains 97.17% on DenseNet121 and 96.40% on ResNet152, approaching white-box performance while remaining transferable. The consistent superiority of CBLA across different architectures confirms that cosine-based logit alignment and semantic-invariance constraints play complementary roles in strengthening targeted transferability.

Figure 5 presents the mean black-box AASR for different loss configurations under each surrogate model. A consistent monotonic improvement is observed as one progresses from CE to COS, then to SIC + CE, and finally to the full CBLA framework. Replacing cross-entropy with cosine similarity yields moderate but stable gains, confirming that mitigating gradient saturation enhances directional alignment in logit space. Introducing semantic-invariant constraints further yields substantial improvements, indicating that enforcing structural consistency significantly enhances cross-model generalization.

Most notably, CBLA consistently achieves the highest performance across all surrogate architectures. For example, under VGG19_BN, the mean black-box AASR increases from 21.26% (CE) to 84.50% (CBLA), representing an absolute gain of over 63 percentage points. Similar trends are observed for DenseNet121 and ResNet50, where CBLA approaches or exceeds 90% transfer success. These results verify that directional logit alignment and semantic-invariant regularization act in a complementary manner, jointly mitigating surrogate overfitting and enhancing target-space alignment.

4.5. Robustness and Practical Analysis

4.5.1. Robustness Against Defense Methods and Adversarially Trained Models

To better understand the practical robustness of different methods, we further evaluate their performance under several commonly used defense strategies, including JPEG compression [44], bit-depth reduction (Bit-Red) [45], feature denoising (FD) [46], random resizing and padding (R&P) [47], and neural representation purification (NRP) [48]. We also consider transferability to an adversarially trained model, Inc-v3_adv [49].

The results are summarized in Table 4. Without any defense, CBLA already achieves a higher attack success rate than the other methods. This advantage remains consistent under most transformation-based defenses. For example, under JPEG compression and Bit-Red, CBLA maintains relatively high success rates, indicating that the generated perturbations are less sensitive to input preprocessing.

A more noticeable difference can be observed on the adversarially trained model. CBLA achieves 71.91% on Inc-v3_adv, which is clearly higher than M3D and TTP, suggesting better generalization to more robust models.

All methods experience a sharp drop in performance under NRP, which is expected given its strong purification capability. This behavior can be explained by the interaction between the learned perturbation structure and the purification mechanism. CBLA explicitly enforces alignment between adversarial logits and target class directions, which relies on preserving global semantic consistency in the input. In contrast, purification-based defenses such as NRP operate in feature space and aim to project perturbed inputs back onto the manifold of clean data. As a result, perturbations that encode target-oriented directional signals are suppressed or distorted during purification. Consequently, although CBLA produces semantically aligned perturbations, these signals are particularly vulnerable to purification operations, leading to a significant drop in attack success rate. Nevertheless, CBLA still performs comparably to other methods under NRP.

Overall, the results indicate that CBLA produces perturbations that are not only transferable under standard settings but also more resilient to common defense operations and adversarial training.

4.5.2. Computational Efficiency

We evaluate the computational efficiency of different methods in terms of runtime and memory consumption. All results are measured with a batch size of 16. As shown in Table 5, CBLA demonstrates a clear advantage over existing methods.

Specifically, CBLA achieves the lowest average batch processing time and memory usage. In comparison, TTP requires approximately twice the runtime and significantly more memory, while M3D is substantially more expensive, with over 2 s per batch and more than 21 GB of memory.

These differences stem from distinct optimization strategies. TTP applies data augmentation before the generator, leading to repeated forward passes over transformed inputs. M3D further increases the cost by combining pre-generator augmentation with the simultaneous optimization of multiple surrogate models. In contrast, CBLA performs transformation after the generator, avoiding redundant forward passes and eliminating additional model updates, resulting in a more efficient pipeline.

Moreover, the lower memory footprint of CBLA enables the use of larger batch sizes in practice, which can further accelerate training and improve scalability.

4.5.3. Effect of Generator Architecture

To justify the choice of generator architecture, we compare U-Net with a ResNet-based generator under the same training configuration. Both models are trained for 15 epochs, and their average attack success rates (AASR) across multiple black-box target models are reported in Figure 6a.

U-Net consistently outperforms the ResNet-based generator across all training stages. In particular, at the early stage (epoch 1), U-Net achieves 67.79% AASR, substantially higher than the ResNet generator’s 36.27%, indicating significantly faster convergence. Although the performance gap narrows as training progresses, the U-Net still maintains a consistent advantage at later stages.

This improvement can be attributed to the encoder–decoder architecture with skip connections in U-Net, which preserves spatial details and enables more precise perturbation generation. In contrast, the ResNet-based generator relies on deep residual transformations, which may weaken fine-grained control over perturbations, particularly during early optimization. These results demonstrate that U-Net provides more stable and efficient optimization for adversarial example generation, and is therefore adopted as the default generator architecture.

4.5.4. Effect of Perturbation Budget

We further analyze the effect of the perturbation budget $\epsilon$ on attack performance. Although the model is trained under an $\ell_{\infty}$ constraint with $\epsilon = 10$, we evaluate it over a wider range of perturbation budgets to assess its generalization. The results are shown in Figure 6b. CBLA consistently outperforms competing methods across all perturbation levels. At $\epsilon = 10$, which matches the training constraint, CBLA achieves 76.58% AASR, outperforming M3D (71.35%) and TTP (50.42%), demonstrating its effectiveness under the standard setting. As $\epsilon$ increases, all methods show improved performance due to the enlarged perturbation space; however, CBLA maintains a clear performance advantage throughout the entire range.

Notably, although the model is trained with $\epsilon = 10$, it generalizes well to larger perturbation budgets such as $\epsilon = 16$, where CBLA achieves 96.69% AASR. This suggests that the learned perturbation patterns are not overfit to a specific constraint but remain effective under relaxed bounds. In general, these results justify using $\epsilon = 16$ during evaluation as a common, more challenging setting in previous work, while also confirming that CBLA remains highly effective under the standard training budget of $\epsilon = 10$.

4.5.5. Perceptual Quality

We further evaluate the perceptual quality of adversarial examples using SSIM [50], LPIPS [51], and $\ell_{2}$ distance. As shown in Table 6, CBLA achieves perceptual similarity comparable to existing generative baselines.

Compared with TTP, CBLA slightly improves both SSIM and LPIPS, while also producing a smaller perturbation magnitude, as reflected by the lower $\ell_{2}$ norm. Although M3D attains the best SSIM and LPIPS values, it requires a larger perturbation magnitude than CBLA. In contrast, CBLA achieves the lowest $\ell_{2}$ norm among all compared methods, indicating that it produces more compact perturbations while maintaining competitive perceptual quality.

The differences between CBLA and M3D in SSIM and LPIPS stem from their distinct optimization objectives. M3D explicitly promotes perceptual consistency through extensive input transformations and multi-model optimization, thereby preserving global structure and yielding higher SSIM and lower LPIPS. In contrast, CBLA emphasizes transferability via logit alignment, which enhances semantic alignment while introducing more localized or frequency-sensitive perturbations. As a result, CBLA may slightly reduce structural similarity or increase perceptual distance, despite maintaining a smaller overall perturbation magnitude.

These results suggest that CBLA provides a favorable trade-off between perturbation compactness and visual fidelity. We note that this trade-off arises from the design choice to prioritize transferability via directional alignment in logit space, rather than explicitly optimizing perceptual similarity. Incorporating an additional perceptual regularization term (e.g., an LPIPS-based loss) could further improve SSIM and LPIPS scores. However, such a modification may introduce competing objectives that could affect transferability. Exploring this balance between perceptual quality and transfer performance remains an interesting direction for future work.

5. Discussion

Figure 7 provides further insight into how model capacity and architectural design influence transferable targeted attacks. Although the convolutional architectures (VGG19_BN, DenseNet121, and ResNet50) achieve comparable Top-1 accuracies, their mutual transfer behaviors exhibit noticeable asymmetry. In particular, ResNet50 consistently serves as the most effective surrogate, yielding the highest cross-architecture transfer rates, whereas VGG19_BN produces comparatively weaker transfer to residual networks.

These observations suggest that transferability is shaped not only by classification accuracy, but also by structural properties that affect gradient propagation during generator training. Skip connections in DenseNet121 (dense concatenation) and ResNet50 (residual addition) facilitate smoother gradient flow and more stable feature representations, whereas the strictly sequential structure of VGG19_BN may be more susceptible to local saturation effects. Consequently, the benefits of replacing cross-entropy with cosine similarity become more pronounced in skip-connected architectures, where sustained directional optimization can be more effectively exploited.

Importantly, when the evaluation is extended to ViT [52], a consistent drop in performance is observed across all CNN-based surrogates. Despite strong transferability across CNN architectures, the AASR values for ViT are significantly lower, indicating a clear architectural gap. This phenomenon highlights that transformer-based models, which rely on global self-attention mechanisms rather than local convolutional operations, exhibit different feature organization and decision boundaries. As a result, perturbations learned from CNN surrogates may not align well with the global context modeling in ViTs.

Nevertheless, the observation of non-trivial transfer to ViT indicates that certain components of adversarial perturbations are not entirely model-specific. This suggests that transferable attacks may rely on shared high-level representations across architectures; however, this alignment weakens when the underlying model structures differ significantly. From a broader perspective, these results highlight that the optimization objective does not solely determine transferability, but is also strongly influenced by the choice of surrogate architecture. Different model families may induce different perturbation characteristics, and their compatibility plays a key role in cross-model generalization. In this context, approaches based on directional alignment, such as CBLA, may improve transferability but do not fully resolve the challenges posed by heterogeneous architectures.

Overall, the empirical results suggest that transferable targeted attacks depend on both intra-architecture generalization and cross-architecture alignment. Within similar model families, perturbations tend to generalize more effectively due to shared structural properties, while transfer across heterogeneous architectures relies on partially aligned high-level representations. While CBLA improves transferability in several settings, the reduced performance across structurally different architectures (e.g., CNNs and transformers) indicates that cross-architecture generalization remains a challenging and important direction for future work.

6. Conclusions

This work revisits transferable targeted adversarial attacks from the perspective of directional optimization and semantic consistency. Rather than attributing limited transferability solely to surrogate-boundary overfitting, we show that insufficient alignment with a shared target-oriented representation space is a key bottleneck to stable cross-model transfer.

To address this issue, we propose CBLA, a unified framework that combines cosine-based logit alignment with semantic-invariance regularization. The cosine-based objective alleviates gradient attenuation and supports more persistent target-oriented optimization, while transformation-based semantic constraints encourage perturbations to remain effective under input variations. Taken together, these two components improve both optimization stability and the performance of transferable targeted attacks. Experimental results in both in-domain and cross-domain scenarios demonstrate that CBLA consistently improves attack success rates over competitive baselines. At the same time, the additional evaluation on heterogeneous architectures suggests that cross-paradigm transfer remains challenging, especially when transferring from convolutional surrogates to attention-based models such as Vision Transformers. This indicates that although directional semantic alignment improves model-agnostic transferability, architectural discrepancies still limit the extent to which adversarial perturbations generalize across fundamentally different feature-extraction mechanisms.

Several directions merit further investigation. Future work should focus on improving cross-architecture transferability, particularly between CNNs and transformer-based models, by developing alignment strategies that better capture shared target semantics across heterogeneous representations. More effective optimization objectives could be explored beyond cosine similarity, such as hybrid losses that jointly model logit direction, feature-space consistency, and inter-model gradient agreement. In addition, the current transformation-based regularization can be extended with more diverse and adaptive augmentation policies, enabling stronger semantic invariance under appearance, frequency, and geometric changes. These directions may help establish a more general framework for transferable targeted attacks and provide deeper insight into adversarial alignment across model families.

Author Contributions

Conceptualization, T.S. and S.W.; methodology, T.S.; software, T.S.; validation, T.S.; investigation, T.S., S.W. and B.L.; data curation, T.S. and S.W.; writing—original draft preparation, T.S.; writing—review and editing, T.S., S.W. and B.L.; visualization, T.S.; supervision, B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available in ImageNet (ILSVRC 2012)/Comics/Paintings at [https://www.image-net.org/; https://www.kaggle.com/cenkbircanoglu/comic-books-classification (accessed on 28 February 2026); https://kaggle.com/competitions/painter-by-numbers (accessed on 28 February 2026)], reference number [38,39,40].

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

CBLA	Cosine-Based Logit Alignment
AE	Adversarial Examples
DNN	Deep Neural Networks
SIT	Semantic-Invariant Transformations
SIC	Semantic-Invariant Constraint
CE	Cross-Entropy
COS	Cosine Similarity
ASR	Attack Success Rate
AASR	Average Attack Success Rate

References

1. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
2. Alzubaidi, L.; Zhang, J.; Humaidi, A.J.; Al-Dujaili, A.; Duan, Y.; Al-Shamma, O.; Santamaría, J.; Fadhel, M.A.; Al-Amidie, M.; Farhan, L. Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. J. Big Data 2021, 8, 53.
3. Yuan, X.; He, P.; Zhu, Q.; Li, X. Adversarial examples: Attacks and defenses for deep learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 2805–2824.
4. Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; Song, D. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 1625–1634.
5. Singh, I.; Araki, T.; Kakizaki, K. Powerful physical adversarial examples against practical face recognition systems. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision; IEEE: Piscataway, NJ, USA, 2022; pp. 301–310.
6. Kilany, S.; Mahfouz, A. A comprehensive survey of deep face verification systems adversarial attacks and defense strategies. Sci. Rep. 2025, 15, 30861.
7. Dong, Y.; Liao, F.; Pang, T.; Su, H.; Zhu, J.; Hu, X.; Li, J. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 9185–9193.
8. Jia, X.; Zhang, Y.; Wei, X.; Wu, B.; Ma, K.; Wang, J.; Cao, X. Improving fast adversarial training with prior-guided knowledge. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 6367–6383.
9. Liu, X.; Yang, Y.; He, K.; Hopcroft, J.E. Parameter interpolation adversarial training for robust image classification. IEEE Trans. Inf. Forensics Secur. 2025, 20, 1613–1623.
10. Yu, Y.; Gao, X.; Xu, C.Z. LAFIT: Efficient and reliable evaluation of adversarial defenses with latent features. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 354–369.
11. Lu, Y.; Love, P.E.; Luo, H.; Fang, W. Mitigating adversarial attacks and building robust deep learning models for assessing risks in tunnel construction. Reliab. Eng. Syst. Saf. 2025, 265, 111491.
12. Mangussi, A.D.; Pereira, R.C.; Lorena, A.C.; Santos, M.S.; Abreu, P.H. Studying the robustness of data imputation methodologies against adversarial attacks. Comput. Secur. 2025, 157, 104574.
13. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing Properties of Neural Networks. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014.
14. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
15. Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial examples in the physical world. In Artificial Intelligence Safety and Security; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018; pp. 99–112.
16. Chen, P.Y.; Zhang, H.; Sharma, Y.; Yi, J.; Hsieh, C.J. Zoo: Zeroth order optimization based black-box attacks to deep neural networks without training substitute models. In Proceedings of the 10th ACM Workshop on Artificial Intelligence and Security; ACM: New York, NY, USA, 2017; pp. 15–26.
17. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
18. Naseer, M.; Khan, S.; Hayat, M.; Khan, F.S.; Porikli, F. On generating transferable targeted perturbations. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2021; pp. 7708–7717.
19. Zhao, A.; Chu, T.; Liu, Y.; Li, W.; Li, J.; Duan, L. Minimizing maximum model discrepancy for transferable black-box targeted attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2023; pp. 8153–8162.
20. Naseer, M.M.; Khan, S.H.; Khan, M.H.; Shahbaz Khan, F.; Porikli, F. Cross-domain transferability of adversarial perturbations. Adv. Neural Inf. Process. Syst. 2019, 32, 12885–12895.
21. Lin, J.; Song, C.; He, K.; Wang, L.; Hopcroft, J.E. Nesterov Accelerated Gradient and Scale Invariance for Adversarial Attacks. In Proceedings of the International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020.
22. Li, M.; Deng, C.; Li, T.; Yan, J.; Gao, X.; Huang, H. Towards transferable targeted attack. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2020; pp. 641–649.
23. Chen, J.; Feng, Z.; Zeng, R.; Pu, Y.; Zhou, C.; Jiang, Y.; Gan, Y.; Li, J.; Ji, S. Enhancing adversarial transferability with adversarial weight tuning. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2025; Volume 39, pp. 2061–2069.
24. Wang, X.; He, X.; Wang, J.; He, K. Admix: Enhancing the transferability of adversarial attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2021; pp. 16158–16167.
25. Chen, B.; Yin, J.; Chen, S.; Chen, B.; Liu, X. An adaptive model ensemble adversarial attack for boosting adversarial transferability. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2023; pp. 4489–4498.
26. Zhang, C.; Zhou, L.; Xu, X.; Wu, J.; Liu, Z. Adversarial attacks of vision tasks in the past 10 years: A survey. ACM Comput. Surv. 2025, 58, 1–42.
27. Nguyen, K.N.T.; Zhang, W.; Lu, K.; Wu, Y.H.; Zheng, X.; Tan, H.L.; Zhen, L. A survey and evaluation of adversarial attacks in object detection. IEEE Trans. Neural Netw. Learn. Syst. 2025, 36, 15706–15722.
28. Shi, Y.; Han, Y.; Zhang, Q.; Kuang, X. Adaptive iterative attack towards explainable adversarial robustness. Pattern Recognit. 2020, 105, 107309.
29. Dong, Y.; Pang, T.; Su, H.; Zhu, J. Evading defenses to transferable adversarial examples by translation-invariant attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 4312–4321.
30. Liu, X.; Zhong, Y.; Zhang, Y.; Qin, L.; Deng, W. Enhancing generalization of universal adversarial perturbation through gradient aggregation. In Proceedings of the IEEE/CVF International Conference on Computer Vision; IEEE: Piscataway, NJ, USA, 2023; pp. 4435–4444.
31. Xie, C.; Zhang, Z.; Zhou, Y.; Bai, S.; Wang, J.; Ren, Z.; Yuille, A.L. Improving transferability of adversarial examples with input diversity. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2019; pp. 2730–2739.
32. Zhao, Z.; Liu, Z.; Larson, M. On success and simplicity: A second look at transferable targeted attacks. Adv. Neural Inf. Process. Syst. 2021, 34, 6115–6128.
33. Poursaeed, O.; Katsman, I.; Gao, B.; Belongie, S. Generative adversarial perturbations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 4422–4431.
34. Baluja, S.; Fischer, I. Learning to attack: Adversarial transformation networks. In Proceedings of the AAAI Conference on Artificial Intelligence; AAAI Press: Washington, DC, USA, 2018; Volume 32.
35. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP); IEEE: Piscataway, NJ, USA, 2017; pp. 39–57.
36. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2016; pp. 770–778.
37. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015; Springer: Cham, Switzerland, 2015; pp. 234–241.
38. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 2015, 115, 211–252.
39. Bircanoglu, C. Comic Books Classification Dataset. 2018. Available online: https://www.kaggle.com/cenkbircanoglu/comic-books-classification (accessed on 28 February 2026).
40. Small Yellow Duck; Kan, W. Painter by Numbers. Kaggle. 2016. Available online: https://kaggle.com/competitions/painter-by-numbers (accessed on 28 February 2026).
41. Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
42. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2017; pp. 4700–4708.
43. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
44. Guo, C.; Rana, M.; Cisse, M.; Van Der Maaten, L. Countering adversarial images using input transformations. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
45. Xu, W.; Evans, D.; Qi, Y. Feature squeezing: Detecting adversarial examples in deep neural networks. In Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 18–21 February 2018.
46. Liu, Z.; Liu, Q.; Liu, T.; Xu, N.; Lin, X.; Wang, Y.; Wen, W. Feature distillation: Dnn-oriented jpeg compression against adversarial examples. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR); IEEE: Piscataway, NJ, USA, 2019; pp. 860–868.
47. Xie, C.; Wang, J.; Zhang, Z.; Ren, Z.; Yuille, A. Mitigating adversarial effects through randomization. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
48. Naseer, M.; Khan, S.; Hayat, M.; Khan, F.S.; Porikli, F. A self-supervised approach for adversarial robustness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2020; pp. 262–271.
49. Tramèr, F.; Kurakin, A.; Papernot, N.; Goodfellow, I.; Boneh, D.; McDaniel, P. Ensemble adversarial training: Attacks and defenses. In Proceedings of the International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018.
50. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
51. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition; IEEE: Piscataway, NJ, USA, 2018; pp. 586–595.
52. Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale. In Proceedings of the International Conference on Learning Representations (ICLR), Virtual, 3–7 May 2021.

Figure 1. Geometric illustration of targeted transferability under different attack strategies in a two-dimensional representation space. The surrogate and target decision boundaries are shown in black and red, respectively. Cross-entropy optimization produces a large gradient-weak region near the surrogate boundary, leading to boundary overfitting and limited transferability. In contrast, cosine-based alignment maintains active gradients beyond the boundary, encouraging movement toward shared target-consistent regions. Panels (a–d) compare FGSM, MIM, TTP, and the proposed CBLA framework.

Figure 2. Framework of the Proposed CBLA Method.

Figure 3. Targeted adversarial examples generated by CBLA using ResNet50 as the surrogate model. Each row corresponds to a clean source image (in the leftmost column) and its adversarial versions targeted to different target classes. Column headers indicate the target categories, while row labels denote the original classes. All adversarial examples satisfy the $\ell_{\infty}$ constraint with $\epsilon \leq 16$.
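The $\ell_{\infty}$ budget stated in the caption is commonly enforced in generative attacks by clipping the generator output to within $\epsilon$ of the clean image and then to the valid pixel range. The sketch below is an illustration under stated assumptions rather than the authors' implementation: it assumes pixel values on a 0–255 scale with $\epsilon = 16$, and the helper name `project_linf` and the sample values are hypothetical.

```python
def project_linf(clean, candidate, eps=16.0, lo=0.0, hi=255.0):
    """Clip `candidate` so each pixel stays within `eps` of `clean`
    and inside the valid range [lo, hi] (assumed 0-255 pixel scale)."""
    projected = []
    for x, x_adv in zip(clean, candidate):
        # Step 1: stay inside the l_inf ball of radius eps around the clean pixel.
        x_adv = min(max(x_adv, x - eps), x + eps)
        # Step 2: stay inside the valid pixel range.
        projected.append(min(max(x_adv, lo), hi))
    return projected


# Hypothetical flattened pixel values.
clean_pixels = [0.0, 100.0, 250.0]
raw_output = [40.0, 90.0, 300.0]

adv_pixels = project_linf(clean_pixels, raw_output)
print(adv_pixels)  # [16.0, 90.0, 255.0]
```

Clipping to the valid range after the $\epsilon$-ball projection cannot violate the budget, because the clean pixel itself already lies inside that range.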

Figure 4. Cross-domain targeted adversarial examples generated by CBLA using ResNet50 as the surrogate model. The left panel shows results on the Comics dataset, and the right panel shows results on the Paintings dataset. All adversarial examples satisfy the $\ell_{\infty}$ constraint with $\epsilon \leq 16$.

Figure 5. Ablation study on loss configurations under the in-domain (ImageNet) setting. For each surrogate model (VGG19_BN, DenseNet121, and ResNet50), we report the mean black-box AASR averaged over four unseen target models. The results demonstrate the incremental contribution of cosine similarity and semantic-invariant constraints, with CBLA achieving the highest transfer performance across all surrogate models.

Figure 6. Attack performance under different settings using ResNet50 as the surrogate model. The reported results are averaged over multiple black-box target models. (a) Effect of generator architecture. (b) Effect of perturbation budget $\epsilon$. CBLA consistently achieves superior performance across different configurations.

Figure 7. Mutual targeted transferability (AASR, %) under the CBLA framework across convolutional and transformer-based architectures. Rows denote surrogate models used to train the generator, and columns denote target models. The diagonal entries correspond to white-box attacks, while off-diagonal entries indicate cross-architecture transfer. In addition to three convolutional neural networks (VGG19_BN, DenseNet121, and ResNet50), a Vision Transformer (ViT) is included to evaluate CBLA’s generalization across fundamentally different architectural paradigms. Top-1 accuracies of all models [36,41,42,52] are shown for reference.

Table 1. AASR (%) in the in-domain setting (ImageNet). The adversarial generator is trained on ImageNet using different surrogate models. * denotes white-box attacks, and all adversarial examples satisfy the perturbation constraint $\ell_{\infty} \leq 16$. The Avg column reports the average success rate over all black-box target models. Boldface indicates the best result in each setting. ^† indicates that the improvement of CBLA over the best baseline is statistically significant according to a paired t-test ($p < 0.01$).

Model	Attack	Target Model					Avg
Model	Attack	VGG19_BN	Dense121	ResNet50	ResNet152	WRN-50-2	Avg
VGG19_BN	MIM	99.91 *	0.92	0.68	0.36	0.47	0.61
	DIM	99.38 *	3.10	2.08	1.02	1.29	1.87
	GAP	98.23 *	16.19	15.83	5.89	7.78	11.42
	CDA	98.30 *	16.26	16.22	5.73	8.35	11.64
	TTP	98.54 *	45.77	45.87	27.18	32.63	37.36
	M3D	99.22 *	79.46	81.91	68.41	68.43	74.55
	CBLA	99.32 *	89.26 ^†	87.75 ^†	78.75 ^†	82.25 ^†	84.50 ^†
Dense121	MIM	1.85	99.90 *	2.71	1.68	1.88	2.03
	DIM	7.31	98.81 *	9.06	5.78	6.29	7.11
	GAP	39.01	97.30 *	47.85	39.25	34.79	40.23
	CDA	42.77	97.22 *	54.28	44.11	46.01	46.79
	TTP	58.90	97.61 *	68.72	57.11	56.80	60.38
	M3D	92.73	98.60 *	94.23	90.06	90.75	91.94
	CBLA	93.19	98.97 *	93.74	91.41 ^†	91.93 ^†	92.57 ^†
ResNet50	MIM	1.58	3.37	98.76 *	3.39	3.17	2.88
	DIM	9.14	15.47	99.01 *	12.45	12.61	12.42
	GAP	58.47	71.72	96.81 *	64.89	61.82	64.23
	CDA	64.58	73.57	96.30 *	70.30	69.27	69.43
	TTP	78.15	81.64	97.02 *	80.56	78.25	79.65
	M3D	92.41	94.39	98.33 *	93.85	93.87	93.63
	CBLA	96.85 ^†	97.17 ^†	98.98 *	96.40 ^†	96.35 ^†	96.69 ^†
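The Avg column and the average gain quoted in the abstract can be checked directly against the values in Table 1. The short sketch below recomputes the black-box average for the VGG19_BN surrogate under CBLA, and the mean gap between CBLA and M3D across the three surrogates. Reading the 4.55% figure in the abstract as this particular gap is an assumption made here, and the variable names are illustrative.

```python
# Black-box AASR (%) for CBLA with the VGG19_BN surrogate, copied from Table 1.
# The white-box VGG19_BN entry (marked *) is excluded from the average.
cbla_vgg_blackbox = [89.26, 87.75, 78.75, 82.25]
avg_cbla_vgg = sum(cbla_vgg_blackbox) / len(cbla_vgg_blackbox)

# Avg column for the three surrogates (VGG19_BN, Dense121, ResNet50).
cbla_avg = [84.50, 92.57, 96.69]
m3d_avg = [74.55, 91.94, 93.63]


def mean(values):
    return sum(values) / len(values)


# Mean improvement of CBLA over M3D, in percentage points.
gain = mean(cbla_avg) - mean(m3d_avg)

print(f"{avg_cbla_vgg:.2f} {gain:.2f}")  # 84.50 4.55
```

Under this reading, the reported gain is a difference in percentage points between averaged success rates, not a relative improvement.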

Table 2. AASR (%) in the cross-domain setting. The adversarial generator is trained on auxiliary datasets, where “-P” denotes the Paintings dataset and “-C” denotes the Comics dataset, and evaluated on the ImageNet validation set. * indicates white-box results where the surrogate and target models are identical. All adversarial examples satisfy the perturbation constraint $\ell_{\infty} \leq 16$. The Avg column reports the average success rate over all black-box target models. Boldface indicates the best result in each setting.

Model	Attack	Target Model					Avg
Model	Attack	VGG19_BN	Dense121	ResNet50	ResNet152	WRN-50-2	Avg
VGG19_BN	TTP-P	97.38 *	45.53	42.90	26.72	31.00	36.54
	CBLA-P	99.33 *	87.96	86.90	75.41	80.79	82.77
	CBLA-C	99.36 *	87.44	88.00	78.58	81.33	83.84
Dense121	TTP-P	57.91	97.41 *	71.35	55.57	53.45	59.57
	CBLA-P	91.49	98.90 *	92.70	89.45	91.18	91.21
	CBLA-C	91.89	98.75 *	91.99	87.04	88.27	89.80
ResNet50	TTP-P	73.09	84.76	96.63 *	76.27	75.92	77.51
	CBLA-P	95.14	95.71	98.74 *	94.76	94.74	95.09
	CBLA-C	93.91	94.49	98.71 *	93.49	93.78	93.92

Table 3. AASR (%) in the in-domain setting (ImageNet) for different configurations. CE, COS, SIC + CE, and CBLA denote cross-entropy, cosine similarity, semantic-invariance constrained cross-entropy, and the full proposed framework, respectively. * indicates white-box results where the surrogate and target models are identical. All adversarial examples satisfy $\ell_{\infty} \leq 16$. The Avg column reports the average success rate over all black-box target models. Boldface indicates the best result in each setting.

Model	Attack	Target Model					Avg
Model	Attack	VGG19_BN	Dense121	ResNet50	ResNet152	WRN-50-2	Avg
VGG19_BN	CE	87.20 *	32.58	21.22	17.44	13.80	21.26
	COS	96.20 *	36.38	27.67	19.49	19.29	25.71
	SIC + CE	99.18 *	71.11	66.36	53.29	50.42	60.30
	CBLA	99.32 *	89.26	87.75	78.75	82.25	84.50
Dense121	CE	38.27	93.78 *	40.87	35.49	29.31	35.99
	COS	48.49	98.16 *	56.24	50.11	51.00	51.46
	SIC + CE	81.49	98.96 *	85.58	80.11	74.62	80.45
	CBLA	93.19	98.97 *	93.74	91.41	91.93	92.57
ResNet50	CE	40.62	56.87	84.51 *	45.89	40.31	45.92
	COS	59.33	78.71	95.56 *	64.58	68.76	67.85
	SIC + CE	88.87	92.96	98.60 *	91.53	89.24	90.65
	CBLA	96.85	97.17	98.98 *	96.40	96.35	96.69

Table 4. Attack success rate (%) under various defense methods and adversarially trained models. Clean denotes no defense. The last column reports the average performance. Boldface indicates the best result in each setting.

Method	Clean	JPEG	Bit-Red	FD	R&P	NRP	Inc-v3_adv	Avg
TTP	79.65	64.39	74.44	56.14	51.89	0.08	34.33	51.56
M3D	93.63	83.81	91.43	82.98	86.01	6.99	61.80	72.38
CBLA	96.69	89.24	94.91	86.31	89.45	6.21	71.91	76.39

Table 5. Computational efficiency comparison of different attack methods. Time denotes the average processing time per batch, and Memory denotes the average peak GPU memory usage. Lower values, as indicated by (↓), represent better efficiency. Boldface indicates the best result in each setting.

Method	Time (s) ↓	Memory (GB) ↓
M3D	2.2067	21.623
TTP	0.5122	17.022
CBLA	0.2299	6.503

Table 6. Perceptual quality comparison of adversarial examples. SSIM measures structural similarity, LPIPS measures perceptual distance, and $\ell_{2}$ denotes the perturbation magnitude. Higher values, as indicated by (↑), and lower values, as indicated by (↓), represent better perceptual quality for the corresponding metrics. Boldface indicates the best result in each setting.

Method	SSIM ↑	LPIPS ↓	$\ell_{2}$ ↓
TTP	0.7132	0.2980	20.1045
M3D	0.7238	0.2821	19.1139
CBLA	0.7146	0.2972	18.9696

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shi, T.; Wang, S.; Liu, B. Enhancing the Transferability of Generative Targeted Adversarial Attacks via Cosine-Based Logit Alignment. Mathematics 2026, 14, 1370. https://doi.org/10.3390/math14081370
