Article

G&G Attack: General and Geometry-Aware Adversarial Attack on the Point Cloud

1 College of Computer, National University of Defense Technology, Changsha 410073, China
2 Beijing Institute for Advanced Study, National University of Defense Technology, Beijing 100020, China
3 College of Advanced Interdisciplinary Studies, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(1), 448; https://doi.org/10.3390/app15010448
Submission received: 20 November 2024 / Revised: 24 December 2024 / Accepted: 26 December 2024 / Published: 6 January 2025

Abstract

Deep neural networks have been shown to produce incorrect predictions when imperceptible perturbations are introduced into clean inputs. This phenomenon has garnered significant attention and extensive research for 2D images, but related work on point clouds is still in its infancy. Current methods suffer from issues such as generating outlier points and poor attack generalization, so it is not feasible to rely solely on overall or geometry-aware attacks to generate adversarial samples. In this paper, we integrate adversarial transfer networks with a geometry-aware method to introduce adversarial loss into the attack objective. A state-of-the-art autoencoder is employed, and sensitivity maps are utilized: the autoencoder generates a sufficiently deceptive mask covering the original input, and a geometry-aware trick adjusts the critical subset to distort the point cloud gradient. Our proposed approach is quantitatively evaluated in terms of the attack success rate (ASR), imperceptibility, and transferability. Compared to other baselines on ModelNet40, our method demonstrates an approximately 38% improvement in ASR for black-box transferability query attacks, with an average query count of around 7.84. Comprehensive experimental results confirm the superiority of our method.

1. Introduction

Point clouds captured by LiDAR are now widely used in various scenarios, such as autonomous driving [1] and satellite security monitoring [2]. Deep Neural Networks (DNNs) have achieved tremendous success in various fields [1,3,4,5] due to their ability to fit arbitrarily complex functions. However, as research progresses, it has been discovered that DNNs are susceptible to adversarial examples, i.e., clean samples with imperceptible, intentionally added perturbations [6,7]. These perturbations can mislead targeted models, rendering them untrustworthy [8]. Despite this, there is a notable lack of research concerning data privacy and security related to 3D point cloud data.
Recently, research on adversarial attacks has gradually transitioned from 2D images [9,10,11,12,13,14] to 3D point clouds. Existing 3D point cloud adversarial attack methods use geometry-aware attacks [15,16], which rely on global constraints such as the $L_2$-norm to limit perturbation distances while disregarding constraints on perturbation dimensions [17,18]. As illustrated in Figure 1, to sidestep this problem, other attacks [11,19] are explored under minimal constraints, needing to perturb only a few outlier points to succeed. However, this results in excessively large steps for each point and makes the “sharp” [20] features of the surface easily perceivable; a simple Statistical Outlier Removal (SOR) [21] can readily defend against such attacks.
The geometry-aware attack also exhibits limited generalization ability: it moves points [22] at key positions with minimal perturbation to successfully deceive classifiers, but over-reliance on the distribution characteristics of point clouds hampers attack transferability [23]. Current methods [20] show that iteratively perturbing the top 30% of points reduces classification accuracy by over 40% across different models. Furthermore, applying the same point cloud sensitivity map [24] to different networks reveals that about 62% of critical points overlap within the top 30%, with an overlap of approximately 34% across three models. This enables a critical point subset obtained from a surrogate model to be applied in black-box query attacks, enhancing their generalization. It underscores the commonality of different networks in classifying a single point cloud, which affirms the feasibility of black-box transfer attacks.
Moreover, existing 3D point cloud adversarial attack approaches mainly focus on white-box attacks [24,25], resulting in unsatisfactory performance against well-defended or black-box models. In the white-box setting, attackers have full knowledge of the architecture and parameters of the network, which makes it useful for testing and detecting vulnerabilities. However, this fails to reflect the actual robustness of models in real-world applications [26,27]. Due to intellectual property protection and product ecosystems, attackers rarely have access to a model’s gradients, parameter weights, or structure [28,29]. Consequently, launching backdoor attacks [30] by constructing poisoned samples based on the targeted model becomes nearly impossible. Attackers can only make decisions based on the confidence scores of outputs, which poses greater challenges and has increased practical significance.
To address these challenges, we propose a general and geometry-aware adversarial attack on point clouds (G&G attack), which allows us to balance the effectiveness and imperceptibility of point-cloud attacks from novel perspectives of global reconstruction and local perturbation. By incorporating contextual semantics, we aim to generate high-quality counterfactual examples. Compared with previous works, the contributions of this approach can be summarized as follows:
  • To solve the problem of limited generalization and enable black-box attacks, an autoencoder is employed for 3D point cloud reconstruction. To mitigate the potential impact on attack effectiveness of benign surfaces produced by resampling, the autoencoder is made to generate adversarial surfaces that modify the density structure and local contextual features of the surface [31], so the generated adversarial point clouds are smooth and uniform. We project the perturbation variables onto the input point cloud and account for classification and distance losses effectively. After multiple reconstructions, we successfully deceive the surrogate model.
  • Point cloud sensitivity maps are used to implement adaptive geometry-aware attacks. We introduce tangents, curvature, and integrated gradients (IGs) [32,33] to evaluate each point’s feature confidence in the classification results. We adaptively select the optimal attack direction and step size in the orthogonal search subspace. To address the problem of disturbance dimension explosion, global reconstruction and local interference are integrated.
  • Through comprehensive experiments, we convincingly showcase the superiority of our method over existing approaches, boasting high attack success rates, robust generalization capabilities, and minimal perceptibility. In PointNet++ robustness testing, our method achieved an impressive ASR of 79.57%, a transferability rate of 79.2%, and an adversarial distance of $1.2 \times 10^{-3}$.
To the best of our knowledge, G&G attack marks the pioneering fusion of global perturbations with local attacks, enhancing our comprehension of black-box classification and advancing the exploration of decision boundaries simultaneously.

2. Related Work

2.1. Point Cloud Classification

PointNet [34] is a pioneering work that leverages the principles of permutation and transformation invariance to map the original point cloud to a higher-dimensional space. Through symmetric operations, it effectively preserves geometric information. Thanks to its swift response time and simple network architecture, PointNet has become the backbone of various classification models [32,35]. Subsequent advancements, such as PointNet++ [36], improve model generalization by introducing components for extracting local features. Another notable contribution is the Point Cloud Transformer (PCT) [37], which maps point clouds to higher-dimensional spaces. The PCT employs self-attention [38,39] to capture global features and intricate local context among points. Furthermore, the PCT [37] employs multiple attention heads to enhance model spatial fitting. Recently, multi-scale geometry-aware transformers have been proposed, which boast superior classification accuracy and robustness.

2.2. 3D Point Cloud Adversarial Attacks

The current adversarial attack methods on point clouds can be classified into point-add, point-drop, and point-move. Point-add [19,26,27] and point-drop [24] methods aim to mislead classification models by altering the number of critical points. For instance, point-add utilizes the $L_2$-norm to generate adversarial points around benign points, subsequently clustering the adversarial points iteratively. Conversely, point-drop [24] constructs a saliency map to evaluate the confidence of each critical point and removes those with higher scores. Intuitively, the effectiveness of adversarial attacks depends on the number of points that can be optimized during generation and on the attack distance; therefore, point-move often outperforms the methods above [15].
Point-move [15,16,20,22,40,41] often constructs subsets of critical points using gradients in white-box models, limiting the dimensions, distances, and attack frequencies within these subsets. For example, SI-Adv [20] adopts the C&W attack [42] and achieves shape invariance through coordinate transformations. In contrast, GeoA3 [15] enhances the imperceptibility of point cloud attacks by projecting noise onto the underlying surface of the input, thereby implementing geometry-aware constraints. These methods perform admirably in white-box scenarios. However, they tend to perform poorly in black-box transfer attacks and against defended classification models, partly due to their lack of gradient information [43] and local semantic understanding [44]. While most existing attack methods try to keep perturbed points from straying far from the surface to preserve local geometric features, they still induce perceptible irregularities on local surfaces [45,46].
Compared to white-box attacks, most black-box attack strategies have stronger generalization capabilities. Black-box attacks typically perform a limited number of queries on a classifier with unknown internal structures to obtain partial information. They are mainly divided into two categories: score-based [20,28,29,47,48] and decision-based [49,50]. The difference lies in the output of the target model: the former outputs confidence scores, while the latter directly provides the label.
Deep neural networks are vulnerable to data leakage and backdoor attacks, raising significant concerns about their deployment in critical applications. In the context of point clouds, for instance, attackers can manipulate LiDAR sensors by introducing adversarial objects with specific shapes and textures, causing the target vehicle to ‘disappear’ from the sensor data. This can lead the autonomous driving system to make incorrect decisions, such as initiating emergency braking or making erratic lane changes. As a result, adversarial attacks on point clouds can deepen our understanding of how deep neural networks behave under adversarial conditions, facilitating the evaluation and improvement of robustness in black-box attack models. Furthermore, such attacks can contribute to the development of more robust models for critical infrastructure applications, including autonomous vehicles and predictive maintenance.

3. Methods

This paper presents a revised autoencoder architecture for reconstructing 3D point clouds and a novel loss function that incorporates adversarial loss. We then optimize existing key point selection strategies, construct a more compelling point cloud sensitivity map, and develop a new method based on geometric attacks. Finally, building on the content above, we propose a novel query-based black-box transferable attack strategy that balances both global and local features. The framework diagram for Generalization and Geometric Adversarial Attack (G&G Attack) on 3D point clouds is shown in Figure 2.

3.1. 3D Point Cloud

We define a point cloud $P \in \mathbb{R}^{N \times 3}$, which can be viewed as a set of high-dimensional random variables sampled from a density function over $\mathbb{R}^3$, comprising $N$ unordered points $p_i \in \mathbb{R}^3$. Every point $p_i(x_i^1, x_i^2, x_i^3)$ has $K$ nearest neighbour points, from which the normal vector at the point can be calculated and the local curvature estimated. Each point cloud corresponds uniquely to a label $y \in Y$. Assume there is a point cloud classification model $f$ such that $f: P \rightarrow Y$. An adversarial point cloud $P'$ is generated by adding a perturbation $\triangle$ to $P$, making $f$ fail to classify it as $Y$. For simplicity, our method is an untargeted adversarial attack based on point-move ($\mathrm{num}(P') = \mathrm{num}(P)$). Our attack restricts the perturbation $\triangle$ to within a small distance $\varepsilon$, where $D$ is the difference between $P$ and $P'$ and $\lambda$ is a hyper-parameter. The formulation is as follows:
$$f\left(P'\right) \neq Y \quad \mathrm{s.t.} \quad D\left(P, P'\right) \leq \varepsilon,$$
or equivalently
$$\min_{P'} f\left(P'\right) + \lambda D\left(P, P'\right).$$
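As a concrete illustration, one optimization step of the relaxed objective (2) can be evaluated in a few lines. The PyTorch sketch below assumes a classifier handle `model` that returns logits, uses negative cross-entropy as the classification term, and substitutes a mean squared displacement for $D$; none of these choices are prescribed by the paper.

```python
import torch
import torch.nn.functional as F

def attack_objective(model, p_adv, p_clean, label, lam=1.0):
    """Evaluate min f(P') + lambda * D(P, P') for one step.

    The negative cross-entropy makes minimization push the prediction
    away from the true label (untargeted attack); the squared
    displacement is a simple stand-in for the distance D.
    """
    logits = model(p_adv)                                    # (B, num_classes)
    adv_term = -F.cross_entropy(logits, label)
    dist_term = ((p_adv - p_clean) ** 2).sum(dim=-1).mean()  # stand-in for D
    return adv_term + lam * dist_term
```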

3.2. Geometry-Aware

As shown in Figure 2, we construct a point cloud sensitivity map using a novel method, from which we extract the critical points. Various classification models place different emphases on critical point selection, yet significant overlap exists. We indirectly determine this subset through an alternative model, which adds scientific rigor and rationality to the process.
We determine the 3D point cloud sensitivity map by monitoring the cumulative gradient loss and calculating the geometric information content of each point. The sensitivity map summarizes the input’s skeleton shape, and points located on the surface typically have higher scores. Therefore, we use the following formula to approximate $S_1$:
$$S_1 = \frac{G}{\left\|G\right\|_2} + H\left(x_i; P\right) \sum_{i=1}^{3} \left(p^i - p_c^i\right)^2,$$
where $H$ is the local curvature estimated from the $k$ adjacent points, and $p_c\left(x_c^1, x_c^2, x_c^3\right)$ is the overall center of gravity of the object.
Gradient-based visualization calculates the gradient of the decision score for the target class with respect to the original input. The gradient reflects the impact of local position perturbations; however, the raw gradient alone has limitations. Inspired by [51], we define $\tilde{p}$ as the baseline input, representing a sample with no information; in our paper, we use the center of gravity as the baseline input. The attribution of the $i$-th component of $p$ can be viewed as the accumulation of all gradients along the linear path from the baseline $\tilde{p}$ to the input $p$. The integrated gradient (IG) can be approximated by summation, as illustrated by the following formula:
$$S_2 = \left(p_i - \tilde{p}_i\right) \times \sum_{k=1}^{m} \frac{\partial f\left(\tilde{p} + \frac{k}{m}\left(p - \tilde{p}\right)\right)}{\partial p_i} \times \frac{1}{m},$$
where $m$ is the number of steps in the summation that approximates the integral.
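A minimal PyTorch sketch of this Riemann-sum approximation is given below; the `model` interface (a batch of clouds in, class logits out) is an assumption, while the center-of-gravity baseline follows the text above.

```python
import torch

def integrated_gradients(model, p, label, m=20):
    """Approximate S2 (Equation (4)) for one point cloud p of shape (N, 3).

    The baseline p~ is the cloud's center of gravity, broadcast to every
    point; gradients of the class score are accumulated along the
    straight line from p~ to p in m steps.
    """
    baseline = p.mean(dim=0, keepdim=True).expand_as(p)
    total_grads = torch.zeros_like(p)
    for k in range(1, m + 1):
        interp = (baseline + (k / m) * (p - baseline)).detach().requires_grad_(True)
        score = model(interp.unsqueeze(0))[0, label]   # class score f(...)
        grad, = torch.autograd.grad(score, interp)
        total_grads += grad
    return (p - baseline) * total_grads / m            # (p - p~) * mean gradient
```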
We utilize the Integrated Gradient (IG) [51] to capture the output of the max pooling layer. The core is a deep network that requires a baseline to simulate the absence of causes. We accumulate the gradient along the line between the baseline and the input so that the IG satisfies both sensitivity and implementation invariance. The critical point set $S_2$ is determined by monitoring the max pooling layer and the integrated gradient loss. The robustness of the model is studied by taking the union of the 200 points with the highest scores selected from $S_1$ and $S_2$. Here, $S$ represents the final point cloud sensitivity map, and $\mathrm{rank}(\cdot)$ denotes the selection of the top 200 points in a set. The sensitivity map of the point cloud is shown in the figure.
$$S = \mathrm{rank}\left(S_1\right) \cup \mathrm{rank}\left(S_2\right).$$
In summary, the generalization method is used to reduce the discriminability of the logits; the critical point set $S$ is then obtained, and selected critical points are moved toward the boundary until the classification network can no longer classify them correctly.

3.3. Autoencoder

In the ideal scenario, the generator $G$ projects the perturbation onto the natural manifold. Ref. [23] uses a structurally simple fully connected autoencoder $E$. However, that method has drawbacks, such as susceptibility to overfitting and weak feature extraction. Therefore, we employ a unified autoencoder $E_{united}$, whose decoder $D_{mlp}$ uses a multi-layer perceptron (mlp) to establish the nonlinear perturbed point cloud $P'$. The specific formula is as follows:
$$G\left(P\right) = D_{mlp}\left(E_{united}\left(P\right)\right).$$

3.3.1. The Curve Aggregation Strategy

The purpose of curve aggregation is to enrich the intra-channel features of the encoding. The starting point of each curve determines the quality of feature fusion, and dividing $n$ curves in the point cloud requires $n$ starting points. Using the top-$k$ method, the points are scored and the top $n$ points with the highest scores are selected. We use an adaptable network to seek a walk strategy $\pi$ based on the state $s_i$, which is obtained from $p_i$, making $p_{i+1} = \pi\left(p_i\right)$, $1 \leq i \leq l$, where $l$ is the total number of steps. We choose the next step by adopting an mlp:
$$\pi\left(s_{i+1}\right) = \arg\max\left(\mathrm{softmax}\left(\mathrm{mlp}\left(s_i\right)\right)\right).$$
Since the argmax function is non-differentiable, gradients cannot be computed directly. To address this, we approximate the backpropagation of argmax with a softmax and a one-hot label to obtain the index of the maximum-probability category.
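One standard realization of this trick is the straight-through estimator; the sketch below shows the concrete form we have in mind, as an assumption rather than the paper’s verbatim implementation.

```python
import torch
import torch.nn.functional as F

def straight_through_argmax(logits):
    """Hard one-hot argmax in the forward pass, softmax gradient backward.

    (one_hot - probs).detach() + probs equals one_hot in value, but its
    gradient is that of probs, so the walk policy remains trainable.
    """
    probs = F.softmax(logits, dim=-1)
    index = probs.argmax(dim=-1, keepdim=True)
    one_hot = torch.zeros_like(probs).scatter_(-1, index, 1.0)
    return (one_hot - probs).detach() + probs
```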
Curves tend to form closed loops within small ranges due to excessive bending, which degenerates into k-NN. The key to avoiding such loops is to dynamically encode the state $s_i$ based on the current curve. We maintain a curve descriptor $r_i \in \mathbb{R}^s$; the state descriptor $h_{s_j}$ for each neighbour $s_j$ of the key point $s_i$ then becomes a concatenation of $s_j$ and $r_i$:
$$\beta = \mathrm{softmax}\left(\mathrm{mlp}\left(\left[r_{i-1}, s_i\right]\right)\right), \quad r_i = \beta\, r_{i-1} + \left(1 - \beta\right) s_i.$$
Given the grouped curves $C = \left\{c_1, c_2, \ldots, c_n\right\} \in \mathbb{R}^{C \times n \times l}$, to aggregate the features in the curves we consider both the mutual relationship $f_{inter}$ between curves and the internal relationship $f_{intra}$ within each curve. We introduce a CA module to obtain two fine-grained feature vectors, which are then processed by the mlp.

3.3.2. Set Abstraction

Set Abstraction consists of three modules: sampling, grouping, and PointNet. In each Set Abstraction layer, the point set is abstracted to a smaller-scale subset during sampling to obtain point features at different scales and levels. Farthest Point Sampling (FPS) randomly selects an initial point as the first sampled point, computes the distance from each point in the unsampled set to the sampled set, greedily adds the farthest point, updates the distances, and iterates until the desired number of sampled points is obtained (see the sketch below). These key points then serve as centres during grouping: we find their neighbouring points at multiple scales and use PointNet to convert points from coordinate space into feature space. As the layers deepen, the number of centre points decreases, but each centre point carries more and more information. The structure of the SA module is shown in Figure 3.
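For reference, the greedy max-min selection just described can be sketched in a few lines of PyTorch:

```python
import torch

def farthest_point_sampling(points, n_samples):
    """Greedy FPS over an (N, 3) tensor; returns indices of the centres.

    Starts from a random point, then repeatedly adds the point farthest
    from the selected set, maintaining each point's distance to its
    nearest chosen centre.
    """
    n = points.shape[0]
    selected = torch.zeros(n_samples, dtype=torch.long)
    selected[0] = torch.randint(n, (1,)).item()
    dist = torch.linalg.norm(points - points[selected[0]], dim=1)
    for i in range(1, n_samples):
        selected[i] = torch.argmax(dist)
        new_dist = torch.linalg.norm(points - points[selected[i]], dim=1)
        dist = torch.minimum(dist, new_dist)
    return selected
```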

3.3.3. Attention Mechanism Fusion

With the same aim as above, the Transformer [38] first encodes the clean input into higher-dimensional feature vectors $F$. Afterwards, $F$ is fed into the stacked attention modules $AT$ for concatenation. Inspired by [37], we discard positional embedding, giving the following expression:
$$F_{i+1} = AT_{i+1}\left(F_i\right), \quad F_{sa} = \mathrm{concat}\left(F_1, F_2, \ldots\right) \cdot W_0,$$
where $AT_i$ represents the $i$-th attention layer, with identical input and output dimensions, and $W_0$ is the weight of the mlp.
$$\left(Q, K, V\right) = F_{in} \cdot \left(W_q, W_k, W_v\right), \quad Q, K \in \mathbb{R}^{N \times d_a}, \; V \in \mathbb{R}^{N \times d_e}, \; W_q, W_k \in \mathbb{R}^{d_e \times d_a}, \; W_v \in \mathbb{R}^{d_e \times d_e},$$
where $Q$, $K$, and $V$ are the query, key, and value matrices generated by linear transformation of the input features, and $W_q$, $W_k$, $W_v$ are shared learnable linear transformations. Attention weights are calculated from $Q$ and $K$ through the matrix dot product. The dimension $d_a$ of the $Q$ and $K$ vectors is generally not equal to $d_e$; $d_a$ is usually set to $d_e/4$. These weights are normalized and standardized before producing the output.
Finally, inspired by residual networks, the output feature $F_{out}$ is obtained from $F_{sa}$ and $F_{in}$ through the LBR (Linear, BatchNorm, and ReLU):
$$F_{out} = \mathrm{LBR}\left(F_{sa}\right) + F_{in}.$$
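To make the block concrete, the following PyTorch sketch assembles one attention layer with $d_a = d_e/4$ and the LBR residual of Equation (11); the scaled dot-product normalization and the layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AttentionLBR(nn.Module):
    """One attention layer with d_a = d_e / 4 and an LBR residual."""

    def __init__(self, d_e=128):
        super().__init__()
        d_a = d_e // 4
        self.w_q = nn.Linear(d_e, d_a, bias=False)
        self.w_k = nn.Linear(d_e, d_a, bias=False)
        self.w_v = nn.Linear(d_e, d_e, bias=False)
        self.linear = nn.Linear(d_e, d_e)
        self.bn = nn.BatchNorm1d(d_e)

    def forward(self, f_in):                             # f_in: (B, N, d_e)
        q, k, v = self.w_q(f_in), self.w_k(f_in), self.w_v(f_in)
        attn = torch.softmax(q @ k.transpose(1, 2) / q.shape[-1] ** 0.5, dim=-1)
        f_sa = attn @ v                                  # (B, N, d_e)
        # LBR: Linear, BatchNorm over channels, ReLU; then the residual.
        h = self.linear(f_sa).transpose(1, 2)            # (B, d_e, N)
        h = torch.relu(self.bn(h)).transpose(1, 2)
        return h + f_in                                  # Equation (11)
```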
Moreover, we use a walk curve aggregation strategy to capture the shape and geometric features of point cloud objects, forming line segments that connect critical points of the point cloud. To increase the receptive field, we use spheres of different radii defined in Euclidean space and subsets of various sizes, with PointNet as the backbone, to enrich feature variations within each channel. The multi-scale transformation of the receptive field achieves multi-level downsampling, obtaining detailed information across regions of different density. On the other hand, we leverage the inherent order-invariance of the Transformer and use a multi-head attention mechanism based on neighbourhood embedding for local semantic enhancement. This autoencoder effectively enhances the ability to process details and generalize to complex scenes while maintaining the overall arrangement invariance of the point cloud. The structure is shown in Figure 4.
After obtaining the three types of local-global features of the point cloud, a three-layer mlp with the nonlinear activation function ReLU performs feature aggregation and transforms the point cloud from feature space back to coordinate space, completing the reconstruction of the input. The decoder $D_{mlp}$ formula is shown in (12). In summary, the overall structure of the autoencoder is shown in Figure 4.
$$\mathrm{mlp}\left(\left[\mathrm{mlp}\left(SA\right),\; \mathrm{mlp}\left(CIC\right),\; \mathrm{mlp}\left(PCT\right)\right]\right).$$

3.4. Manipulation

In this section, we employ surrogate and test models to enhance the success rate of adversarial samples. After generating adversarial samples A and B (Figure 2) with the autoencoder and the geometry-aware method, respectively, we combine A and B and apply farthest point downsampling to produce the adversarial sample C (Figure 2), maintaining the original number of points. Given that the overlap of key points across most classification models can frequently reach 34%, we substitute the target model with a surrogate model and a test model to improve the transferability of adversarial samples. The specific method is as follows: we input the generated adversarial sample C into the surrogate model, where $f_{surr}$ maps C to a probability distribution across all categories, yielding $l(C)$; $f_{surr}$ employs softmax to yield $L_{cls\_surr}$. If $y' = y$, then $L_{cls\_test} = 0$; conversely, if $y' \neq y$, we input C into the test model for analogous processing to obtain $L_{cls\_test}$. We therefore arrive at (13). $L_{cls}$ measures the deviation between the prediction and the target model’s ground truth (GT) under the attacks that C mounts on the surrogate and test models.
$$L_{cls} = L_{cls\_surr} + L_{cls\_test}.$$
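A sketch of this two-stage loss follows, under the assumptions that both models expose logits and that cross-entropy realizes the softmax-based classification terms; per-sample masking would be a more faithful batched variant.

```python
import torch.nn.functional as F

def cls_loss(f_surr, f_test, C, y):
    """Equation (13): the test-model term is added only once the surrogate
    is already fooled (y' != y); otherwise L_cls_test = 0."""
    logits_surr = f_surr(C)
    l_surr = F.cross_entropy(logits_surr, y)
    if logits_surr.argmax(dim=-1).eq(y).all():   # y' == y: surrogate not fooled
        l_test = logits_surr.new_zeros(())
    else:                                        # y' != y: query the test model
        l_test = F.cross_entropy(f_test(C), y)
    return l_surr + l_test
```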

3.5. Injection

The perturbation $\triangle \leq \varepsilon$ is generated for global attacks on the reconstructed point cloud. The constraints are defined by the Hausdorff Distance (HD) [52] and the Chamfer Distance (CD) [33]. During the training of the autoencoder $G$, the CD and HD terms are referred to as $L_{CD}$ and $L_{HD}$, respectively. A gradient-based global attack is introduced to disrupt the target’s global features. Throughout the iterative attack and reconstruction, the goal is to damage the geometry-aware information on the surface while maintaining a high ASR and imperceptibility. For simplicity, we define the loss function:
$$L_{dis} = \alpha L_{CD} + \beta L_{HD}.$$
After disrupting the geometric properties of the point cloud, we use an alternative model F to evaluate the effectiveness of the attack through multiple iterations. The loss function is outlined as follows:
$$L\left(P', t, \theta\right) = \max\left(\max_{i \neq t} f\left(x\right)_i - f\left(x\right)_t,\, 0\right) + \lambda L_{dis} + \omega L_{cls}.$$
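The margin term of (15) can be sketched as follows; the weights $\lambda$ and $\omega$ are placeholders, and $L_{dis}$ and $L_{cls}$ are assumed to be computed as in (14) and (13), respectively.

```python
import torch

def total_loss(logits, t, l_dis, l_cls, lam=1.0, omega=1.0):
    """C&W-style margin of Equation (15) plus weighted L_dis and L_cls.

    logits: (B, num_classes); t: (B,) class indices for the margin term.
    """
    f_t = logits.gather(1, t.view(-1, 1)).squeeze(1)        # f(x)_t
    masked = logits.scatter(1, t.view(-1, 1), float('-inf'))
    f_other = masked.max(dim=1).values                      # max_{i != t} f(x)_i
    margin = torch.clamp(f_other - f_t, min=0.0)            # max(..., 0)
    return margin.mean() + lam * l_dis + omega * l_cls
```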
For any point $p_i\left(x_i, y_i, z_i\right)$, a local neighbourhood fitting method based on the $k$ nearest neighbouring points can be used to calculate the normal vector $n_i\left(n_i^1, n_i^2, n_i^3\right)$ of the point.
$$G = \left\{g_i\right\}_{i=1}^{N} = \nabla_{P'} L = \frac{\partial L\left(P', t, \theta\right)}{\partial P'}.$$
Here $\left(g_i^1, g_i^2, g_i^3\right) = \left(\frac{\partial L}{\partial x_i}, \frac{\partial L}{\partial y_i}, \frac{\partial L}{\partial z_i}\right)$ is the gradient of the loss $L$ at each point. We use Adam [53] to minimize $L$.

4. Experiments Settings and Results

4.1. Experimental Setup

The ModelNet40 dataset [54] consists of 12,311 CAD models across 40 common categories, with 9843 models for training and 2468 for testing in classification and segmentation benchmarks. We normalize each point cloud to the unit sphere to standardize inputs and validate our method. Preprocessing steps such as rotation, translation, and scaling are applied to the data, as described in [27]. To ensure the validity of black-box transfer query attacks, the targeted classifiers by default use parameters pre-trained on the down-sampled dataset based on the original content.
We compare the G&G attack with various outstanding methods, including gradient-based point-move methods such as PGD [18], SI-Adv [20], and I-FGSM [11]; gradient-based point-drop and point-add methods such as Saliency Map [24] and Add [27]; and constraint-based optimization methods like AOF [44] and L3A-attack [55]. All methods adopt the best parameters provided by their original papers. In our method, for the global attack, the step size is set to 0.02 with 150 iterations. The dropout rate of the autoencoder is 0.4; after training, some parameters are randomly frozen to improve the network’s performance. We use LeakyReLU as the activation function in the backbone. For local attacks, the step size is 0.16, the batch size is 16, and the number of iterations is 100. All adversarial examples are constrained by an $L_\infty$-norm ball of radius 0.07, and the attacks are untargeted. We conduct comprehensive evaluations of the Attack Success Rate (ASR), Chamfer Distance (CD), Hausdorff Distance (HD), $L_2$-norm, and Attack Time (AT) under the same experimental environment (NVIDIA GeForce RTX 4090 GPU and Intel(R) Xeon(R) Gold 6326 CPU @ 2.90 GHz).

4.2. Main Results

In this section, we will compare the G&G attack with several mainstream attacks. These methods generate adversarial point clouds in a black-box setting. For simplicity, in the comparative experiments, we use PointNet [34] as the white-box surrogate model and introduce four classification models for black-box attacks on the ModelNet40 dataset: PointNet++ [36], CurveNet [32], DGCNN [56], and PointCNN [57]. The Sensitivity Map effectively describes the vulnerabilities of classification models and the confidence of a single point under perturbation, making the perturbation search more targeted.
We compare the ASR of different attacks under no defense, SOR (Statistical Outlier Removal), and DupNet [58]. We assume the baseline does not have access to the training point clouds or classifiers. For SOR, we use the same parameters proposed in the original paper (k = 2, α = 1.1). For training DupNet on ModelNet40, we use an upsampling rate of 2. Based on empirical observations, as the number of dropouts increases, the natural accuracy of the classifier decreases further, as shown in Table 1.
Overall, our black-box transfer attack poses a significant threat to various defenses. As shown in Table 2 and Table 3, under the same conditions, the method can reduce the original classification accuracy of PointNet++ [36] from 90.7% to 19.6%. Compared to the listed methods, our method consistently achieves the highest ASR. It is worth noting that, among DGCNN [56], PointNet++ [36], and PointCNN [57], PointCNN shows the highest robustness, with a lower ASR for various attacks; more significant perturbations are required to attack it successfully. Part of the reason is that DGCNN [56] is weaker at handling sparse graph data. Additionally, DGCNN uses pairwise dynamic distance computation to search for k-nearest neighbors, which incorporates semantic features across multiple layers but ignores some vector directions between adjacent points. Therefore, models that incorporate advanced grouping operations (such as PointNet++ [36] and PointCNN [57]) tend to have higher robustness.
From the table, we can see that AOF outperforms the G&G Attack when targeting CurveNet. One of the key reasons for this is that AOF applies the Graph Fourier Transform (GFT) to the point cloud, focusing specifically on its low-frequency components. During the upsampling phase of the DupNet defense, AOF is able to more effectively capture the fundamental shape of the point cloud. By perturbing the low-frequency components, AOF disrupts the internal relational structures that CurveNet relies on. As a result, AOF achieves a higher attack success rate. However, due to the sharp increase in perturbation frequency, AOF is less imperceptible than the G&G Attack.
In contrast, X-conv, which leverages spatial local correlations, operates only on representative points that contain rich information. PointCNN, built with dual-layer X-conv, is highly sensitive to the number of key points. Meanwhile, the Saliency Map effectively reduces the classification accuracy of PointCNN by removing critical subsets of points.
The generated samples are compared with other methods using the CD, HD, and $L_2$-norm, and the G&G attack generates smaller disturbances. Although some methods, like I-FGSM [11], may outperform ours on specific metrics, their low ASR makes them impractical to deploy. Another advantage of our method is that it requires fewer queries: compared to commonly used black-box query attacks (SimBA++ [29], SimBA [28], SI-Adv [20]), our method has an average query count of 7.84, lower than that of mainstream methods. We therefore conclude that our method is difficult to detect while preserving a high ASR, as shown in Table 4.

5. Discussion

5.1. Imperceptibility

In comparison with AdvPC [23] and L3A-attack [55], we found that the values of the non-diagonal elements for the G&G attack are greater than those of the comparison models, which suggests that our method has better overall generalization. Furthermore, we noticed that the G&G attack shows the highest robustness when PointNet++ [36] and CurveNet [32] serve as surrogate networks. Part of the reason might lie in the fact that these surrogate models aggregate local points by concatenating neighborhood features instead of merely retaining individual points. We believe the stability of PointNet++ [36] stems from its Multi-Scale Grouping (MSG) and the mlp’s robust fitting, while CurveNet [32] creatively adopts long-distance feature extraction. Although DGCNN [56] stacks multiple layers to learn global shape information, the extracted deep features and neighborhoods may be too similar to provide valuable edge vectors, resulting in weak robustness.

5.2. Transferability

Adversarial examples are generated from the surrogate network gradients to the black-box network across different networks. Figure 5 illustrates the adversarial transferability on PointNet, PointNet++ [36], and DGCNN [56]. The diagonal elements of the transferability matrix represent the ASR on the same network, while the off-diagonal elements serve as metrics for multiple transferabilities, indicating the average transferability between different networks.
Since point clouds are unordered and irregular, traditional metrics such as the Manhattan Distance and Chebyshev Distance are not suitable. Therefore, this paper uses the Chamfer Distance (CD) [33], Hausdorff Distance (HD) [52], and Euclidean Distance ($L_2$) to measure the imperceptibility of adversarial samples by computing the distance between each point in $P'$ and its nearest point in $P$. The formulas are as follows:
Bidirectional Chamfer Distance:
$$\frac{1}{N_{P'}} \sum_{p_1 \in P'} \min_{p_2 \in P} \left\| p_1 - p_2 \right\|_2 + \frac{1}{N_P} \sum_{p_2 \in P} \min_{p_1 \in P'} \left\| p_2 - p_1 \right\|_2$$
Bidirectional Hausdorff Distance:
$$\max_{p' \in P'} \min_{p \in P} \left\| p' - p \right\|_2 + \max_{p \in P} \min_{p' \in P'} \left\| p - p' \right\|_2$$
The distance between each point in $P'$ and its nearest point in $P$ reflects the similarity between the two clouds. The Attack Success Rate (ASR) is the proportion of misclassified samples among test samples. Additionally, the query count $Q$ and attack time $AT$ are also important.
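Both metrics reduce to operations on a pairwise distance matrix. The following PyTorch sketch computes them for ModelNet40-sized clouds, using unsquared $L_2$ nearest-neighbour distances to match the formulas above:

```python
import torch

def chamfer_hausdorff(P, P_adv):
    """Bidirectional CD and HD between a clean cloud P and an adversarial
    cloud P_adv, both (N, 3) tensors. Builds the full pairwise distance
    matrix, which is fine for ~1k-point ModelNet40 clouds."""
    d = torch.cdist(P_adv, P)          # (N', N) pairwise Euclidean distances
    nearest_fwd = d.min(dim=1).values  # each adversarial point -> clean cloud
    nearest_bwd = d.min(dim=0).values  # each clean point -> adversarial cloud
    cd = nearest_fwd.mean() + nearest_bwd.mean()
    hd = nearest_fwd.max() + nearest_bwd.max()
    return cd.item(), hd.item()
```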

5.3. Ablation Study

To evaluate the effectiveness of the G&G attack and demonstrate the crucial importance of each component, we conducted ablation studies by separately removing the autoencoder and the local geometric attack, and then analyzing the results both quantitatively and qualitatively. As shown in Table 5, removing any part causes a decrease in the ASR. The primary role of the autoencoder is to learn the distribution of point clouds. Geometry-aware methods consider two types of constraints. Removing either of these components resulted in the generation of numerous irregular surface fragments and noticeable outliers.
The reason is that the AE moves the point cloud to the vicinity of the decision boundary during the destruction-reconstruction process, while attacks restricted to certain critical points increase the unrecognizability and uncertainty of the geometric structure while keeping the surface as smooth and dense as possible. The local geometry-aware method reduces the generation of large numbers of outliers in adversarial attacks: local geometric attacks suppress outliers, making adversarial perturbations softer and more evenly distributed globally. Through experiments, we found that the ASR of overall attacks is higher (about twice that of local attacks), although their imperceptibility is poorer. This confirms our speculation.
To confirm the superiority of the joint autoencoder, we compared the simple autoencoder in AdvPC [23] with the joint autoencoder used in the G&G attack. As shown in Figure 6, the joint autoencoder exhibits better convergence. Experiments have confirmed that, under similar imperceptibility metrics, the ASR of the joint autoencoder is higher, as shown in Figure 7. As the number of epochs increases, the network becomes more complex, and the query time also increases. The ASR increases significantly from 1 to 400 epochs and then stabilizes.

5.4. Visualization of Adversarial Samples

We further visualize the adversarial point clouds to validate the effectiveness of the G&G attack. We selected the seven most challenging class attack samples and compared our method with four baseline methods visually. As shown in Figure 8, the baselines produce significant outliers for all samples, which severely disrupt the local semantic information of the objects (e.g., tables, chairs, and airplanes). The adversarial point clouds generated by our method maintain geometry-aware structures similar to their corresponding benign point clouds. Notably, our adversarial samples exhibit neither many outliers nor uneven local point distributions.
Our method integrates gradients, normal vectors, and other relevant surface point information from the point cloud to determine the attack direction. Curvature, acting as an adaptive hyperparameter, plays a pivotal role in weighting these components and effectively guiding the attack direction. To investigate the impact of noise on local curvature, we conducted comparative experiments. In these experiments, we introduced various types of noise, including Gaussian and random noise, into the clean point clouds and observed their effects on the attack success rates and imperceptibility metrics. The results are presented in Table 6.
To evaluate the attack success rate (ASR) and imperceptibility of our method under different noise conditions, we added Gaussian noise and random noise to the clean point clouds. The Gaussian noise had a mean of μ = 0, with standard deviations of σ = 0.01 and σ = 0.1, while the random noise coefficients were set to η = 0.5 and η = 0.8. In each iteration, the specified noise was applied to the point clouds for testing.
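For reproducibility, the noise injection can be sketched as follows; the uniform scaling of the ‘random’ variant is our assumption, since the text specifies only the coefficient η:

```python
import torch

def perturb_with_noise(P, kind="gaussian", sigma=0.01, eta=0.5):
    """Apply test-time noise to a point cloud P of shape (N, 3).

    Gaussian: zero mean, standard deviation sigma (0.01 or 0.1 above).
    Random: uniform noise scaled by eta (0.5 or 0.8); the [-eta, eta]
    range is an assumption about the exact form.
    """
    if kind == "gaussian":
        return P + sigma * torch.randn_like(P)
    return P + eta * (2 * torch.rand_like(P) - 1)
```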
In the case of PointNet++ [36], the attack success rate (ASR) experienced a significant reduction, while in DGCNN, the decline in ASR was more modest. The persistent weakening effect of different types and intensities of noise on attack performance is noteworthy. PointNet++ [36] extracts features using multi-layer perceptrons (MLPs) from each local point cloud through a grouping technique, whereas DGCNN determines node attributes based on a neighborhood defined in feature space using a graph neural network. This makes PointNet++ more sensitive to small local disturbances, leading to a sharp decrease in ASR. In contrast, our attack method relies more heavily on the global structure of the point cloud, so introducing noise causes shifts in key points and changes in local curvature, thereby reducing the attack’s effectiveness.
Additionally, we observed that as the noise coefficient increases, the attack time decreases. This can be attributed, in part, to the fact that the noise disrupts the point cloud, introducing sufficiently strong disturbances that hinder the model’s ability to classify accurately. We plan to conduct further research to explore these phenomena in greater depth, with the goal of gaining a more comprehensive understanding of their impact and underlying mechanisms.

6. Conclusions

This paper introduces a novel and advanced method for performing black-box transfer query attacks on point cloud data. The method employs a powerful autoencoder for point cloud reconstruction, combined with a global attack strategy based on the Adam optimizer, designed to disrupt the neighborhood information of surface points within the point cloud. Specifically, we implement global attack strategies by constructing a new autoencoder with a multi-head architecture, which disturbs the local geometric structure of surface points, thereby enhancing the effectiveness of the attack. Additionally, we introduce secondary “modifications” to key points using a novel point cloud sensitivity map. Simultaneously, local attacks are carried out using the SimBA method, utilizing Integrated Gradients (IGs) and tangential synthesis directions. This approach strikes a balance between the imperceptibility and effectiveness of the attack, ensuring that the adversarial samples remain sufficiently concealed while still significantly impairing the performance of the target model.
Our method demonstrates excellent transferability, maintaining a high Attack Success Rate (ASR) across different models and environments. After testing against specially designed defense mechanisms for adversarial 3D samples, our attack method successfully bypasses these defenses, significantly improving the ASR. These experimental results validate the effectiveness and robustness of our approach.
Looking ahead, we plan to release the generated adversarial point clouds on the Amazon Mechanical Turk platform to collect human user preference data, which will support future qualitative evaluations. This will help us gain deeper insights into human perception differences and reactions to adversarial samples in 3D point clouds. Furthermore, we will organize the collected adversarial samples into the ModelNet40 attack dataset and release it publicly, providing a valuable resource for the academic community and further promoting research on classifier robustness and interpretability. Building on this, our goal is to continue proposing and deploying high-quality 3D point cloud attack methods. This will not only enhance the diversity and accuracy of attack effectiveness but also drive deeper advancements in adversarial sample research and defense strategies in this field. Through these efforts, we aim to advance 3D point cloud attack techniques and provide stronger theoretical support and practical references for research in related areas.

Author Contributions

Methodology, G.C.; validation, G.C. and T.L.; writing—original draft preparation, G.C. and C.L.; writing—review and editing, G.C. and Z.Z.; resources, G.C. and Y.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Key R&D Program of China (2021ZD0140301), the National Natural Science Foundation of China: 91948303-1; the National Natural Science Foundation of China: No. 61803375, No. 12002380, No. 62106278, No. 62101575, No. 61906210; the Postgraduate Scientific Research Innovation Project of Hunan Province: QL20210018.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SOR: statistical outlier removal
DupNet: denoiser and upsampler network
FPS: farthest point sampling
KNN: k-nearest neighbor
ASR: attack success rate
CD: Chamfer Distance
HD: Hausdorff Distance
DGCNN: dynamic graph convolution neural network
AOF: adversarial attacks with attacking on frequency
PCT: point cloud transformer

References

  1. Huang, C.Q.; Jiang, F.; Huang, Q.H.; Wang, X.Z.; Han, Z.M.; Huang, W.Y. Dual-Graph Attention Convolution Network for 3-D Point Cloud Classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 4813–4825. [Google Scholar] [CrossRef] [PubMed]
  2. Nazir, S.; Fairhurst, G.; Verdicchio, F. WiSE–a satellite-based system for remote monitoring. Int. J. Satell. Commun. Netw. 2017, 35, 201–214. [Google Scholar] [CrossRef]
  3. Gao, Y.; Liu, X.; Li, J.; Fang, Z.; Jiang, X.; Huq, K.M.S. LFT-Net: Local Feature Transformer Network for Point Clouds Analysis. IEEE Trans. Intell. Transp. Syst. 2023, 24, 2158–2168. [Google Scholar] [CrossRef]
  4. Zou, X.; Li, K.; Li, Y.; Wei, W.; Chen, C. Multi-Task Y-Shaped Graph Neural Network for Point Cloud Learning in Autonomous Driving. IEEE Trans. Intell. Transp. Syst. 2022, 23, 9568–9579. [Google Scholar] [CrossRef]
  5. Chan, K.C.; Koh, C.K.; George Lee, C.S. A 3-D-Point-Cloud System for Human-Pose Estimation. IEEE Trans. Syst. Man, Cybern. Syst. 2014, 44, 1486–1497. [Google Scholar] [CrossRef]
  6. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  7. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2014, arXiv:1312.6199. [Google Scholar]
  8. Cui, Y.; Zhang, B.; Yang, W.; Yi, X.; Tang, Y. Deep CNN-based Visual Target Tracking System Relying on Monocular Image Sensing. In Proceedings of the 2018 International Joint Conference on Neural Networks (IJCNN), Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–8. [Google Scholar] [CrossRef]
  9. Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; Song, D. Robust Physical-World Attacks on Deep Learning Visual Classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  10. Komkov, S.; Petiushko, A. AdvHat: Real-World Adversarial Attack on ArcFace Face ID System. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; pp. 819–826. [Google Scholar] [CrossRef]
  11. Kurakin, A.; Goodfellow, I.J.; Bengio, S. Adversarial examples in the physical world. In Artificial Intelligence Safety and Security; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018; pp. 99–112. [Google Scholar]
  12. Papernot, N.; McDaniel, P.; Goodfellow, I.; Jha, S.; Celik, Z.B.; Swami, A. Practical black-box attacks against machine learning. In Proceedings of the 2017 ACM on Asia Conference on computer and Communications Security, Abu Dhabi, United Arab Emirates, 2–6 April 2017; pp. 506–519. [Google Scholar]
  13. Ilyas, A.; Engstrom, L.; Athalye, A.; Lin, J. Black-box adversarial attacks with limited queries and information. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; PMLR: New York, NY, USA, 2018; pp. 2137–2146. [Google Scholar]
  14. Dong, Y.; Liao, F.; Pang, T.; Su, H.; Zhu, J.; Hu, X.; Li, J. Boosting Adversarial Attacks With Momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  15. Wen, Y.; Lin, J.; Chen, K.; Chen, C.P.; Jia, K. Geometry-aware generation of adversarial point clouds. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2984–2999. [Google Scholar] [CrossRef]
  16. Tang, K.; Shi, Y.; Lou, T.; Peng, W.; He, X.; Zhu, P.; Gu, Z.; Tian, Z. Rethinking Perturbation Directions for Imperceptible Adversarial Attacks on Point Clouds. IEEE Internet Things J. 2023, 10, 5158–5169. [Google Scholar] [CrossRef]
  17. Ding, D.; Qiu, C.; Liu, F.; Pan, Z. Point Cloud Upsampling via Perturbation Learning. IEEE Trans. Circuits Syst. Video Technol. 2021, 31, 4661–4672. [Google Scholar] [CrossRef]
  18. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards Deep Learning Models Resistant to Adversarial Attacks. arXiv 2019, arXiv:1706.06083. [Google Scholar]
  19. Yang, J.; Zhang, Q.; Fang, R.; Ni, B.; Liu, J.; Tian, Q. Adversarial Attack and Defense on Point Sets. arXiv 2021, arXiv:1902.10899. [Google Scholar]
  20. Huang, Q.; Dong, X.; Chen, D.; Zhou, H.; Zhang, W.; Yu, N. Shape-Invariant 3D Adversarial Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 15335–15344. [Google Scholar]
  21. Rusu, R.B.; Marton, Z.C.; Blodow, N.; Holzbach, A.; Beetz, M. Model-based and learned semantic object labeling in 3D point cloud maps of kitchen environments. In Proceedings of the 2009 IEEE/RSJ International Conference on Intelligent Robots and Systems, St Louis, MO, USA, 11–15 October 2009; IEEE: Piscataway, NJ, USA, 2009; pp. 3601–3608. [Google Scholar]
  22. Liu, D.; Yu, R.; Su, H. Extending Adversarial Attacks and Defenses to Deep 3D Point Cloud Classifiers. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 2279–2283. [Google Scholar] [CrossRef]
  23. Hamdi, A.; Rojas, S.; Thabet, A.; Ghanem, B. Advpc: Transferable adversarial perturbations on 3d point clouds. In Proceedings of the Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, 23–28 August 2020; Proceedings, Part XII 16. Springer: Cham, Switzerland, 2020; pp. 241–257. [Google Scholar]
  24. Zheng, T.; Chen, C.; Yuan, J.; Li, B.; Ren, K. PointCloud Saliency Maps. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27–28 October 2019. [Google Scholar]
  25. Liu, B.; Zhang, J.; Zhu, J. Boosting 3d adversarial attacks with attacking on frequency. IEEE Access 2022, 10, 50974–50984. [Google Scholar] [CrossRef]
  26. Daniel Liu, R.Y.; Su, H. Adversarial Shape Perturbations on 3D Point Clouds. In Proceedings of the ECCV 2020 Workshops, Glasgow, UK, 23–28 August 2020. [Google Scholar]
  27. Xiang, C.; Qi, C.R.; Li, B. Generating 3D Adversarial Point Clouds. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  28. Guo, C.; Gardner, J.; You, Y.; Wilson, A.G.; Weinberger, K. Simple black-box adversarial attacks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; PMLR: New York, NY, USA, 2019; pp. 2484–2493. [Google Scholar]
  29. Yang, J.; Jiang, Y.; Huang, X.; Ni, B.; Zhao, C. Learning black-box attackers with transferable priors and query feedback. Adv. Neural Inf. Process. Syst. 2020, 33, 12288–12299. [Google Scholar]
  30. Li, X.; Chen, Z.; Zhao, Y.; Tong, Z.; Zhao, Y.; Lim, A.; Zhou, J.T. Pointba: Towards backdoor attacks in 3d point cloud. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 16492–16501. [Google Scholar]
  31. Zhang, J.; Dong, Y.; Liu, B.; Ouyang, B.; Zhu, J.; Kuang, M.; Wang, H.; Meng, Y. The art of defense: Letting networks fool the attacker. IEEE Trans. Inf. Forensics Secur. 2023, 12, 3267–3276. [Google Scholar] [CrossRef]
  32. Xiang, T.; Zhang, C.; Song, Y.; Yu, J.; Cai, W. Walk in the cloud: Learning curves for point clouds shape analysis. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 915–924. [Google Scholar]
  33. Fan, H.; Su, H.; Guibas, L.J. A point set generation network for 3d object reconstruction from a single image. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 605–613. [Google Scholar]
  34. Qi, C.R.; Su, H.; Mo, K.; Guibas, L.J. PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  35. Sheshappanavar, S.V.; Kambhamettu, C. SimpleView++: Neighborhood Views for Point Cloud Classification. In Proceedings of the 2022 IEEE 5th International Conference on Multimedia Information Processing and Retrieval (MIPR), Virtual, 2–4 August 2022; pp. 31–34. [Google Scholar] [CrossRef]
  36. Qi, C.R.; Yi, L.; Su, H.; Guibas, L.J. PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space. In Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Newry, UK, 2017; Volume 30. [Google Scholar]
  37. Guo, M.H.; Cai, J.X.; Liu, Z.N.; Mu, T.J.; Martin, R.R.; Hu, S.M. Pct: Point cloud transformer. Comput. Vis. Media 2021, 7, 187–199. [Google Scholar] [CrossRef]
  38. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2017; Volume 30. [Google Scholar]
  39. Zhao, H.; Jiang, L.; Jia, J.; Torr, P.H.; Koltun, V. Point transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 10–17 October 2021; pp. 16259–16268. [Google Scholar]
  40. Kim, J.; Hua, B.S.; Nguyen, T.; Yeung, S.K. Minimal Adversarial Examples for Deep Learning on 3D Point Clouds. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 7797–7806. [Google Scholar]
  41. Ma, C.; Meng, W.; Wu, B.; Xu, S.; Zhang, X. Efficient joint gradient based attack against sor defense for 3d point cloud classification. In Proceedings of the 28th ACM International Conference on Multimedia, Virtual, 12–16 October 2020; pp. 1819–1827. [Google Scholar]
  42. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (sp), San Jose, CA, USA, 22–26 May 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 39–57. [Google Scholar]
  43. Li, K.; Zhang, Z.; Zhong, C.; Wang, G. Robust Structured Declarative Classifiers for 3D Point Clouds: Defending Adversarial Attacks With Implicit Gradients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 15294–15304. [Google Scholar]
  44. Zhang, J.; Dong, Y.; Zhu, J.; Zhu, J.; Kuang, M.; Yuan, X. Improving transferability of 3D adversarial attacks with scale and shear transformations. Inf. Sci. 2024, 662, 120245. [Google Scholar] [CrossRef]
  45. Sun, J.; Zhang, Q.; Kailkhura, B.; Yu, Z.; Xiao, C.; Mao, Z.M. Benchmarking Robustness of 3D Point Cloud Recognition Against Common Corruptions. arXiv 2022, arXiv:2201.12296. [Google Scholar]
  46. Zhou, Y.; Tuzel, O. VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection. arXiv 2017, arXiv:1711.06396. [Google Scholar]
  47. Shi, Y.; Han, Y.; Hu, Q.; Yang, Y.; Tian, Q. Query-efficient black-box adversarial attack with customized iteration and sampling. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2226–2245. [Google Scholar] [CrossRef] [PubMed]
  48. Tu, C.C.; Ting, P.; Chen, P.Y.; Liu, S.; Zhang, H.; Yi, J.; Hsieh, C.J.; Cheng, S.M. Autozoom: Autoencoder-based zeroth order optimization method for attacking black-box neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; Volume 33, pp. 742–749. [Google Scholar]
  49. Brendel, W.; Rauber, J.; Bethge, M. Decision-based adversarial attacks: Reliable attacks against black-box machine learning models. arXiv 2017, arXiv:1712.04248. [Google Scholar]
  50. Dong, Y.; Su, H.; Wu, B.; Li, Z.; Liu, W.; Zhang, T.; Zhu, J. Efficient decision-based black-box adversarial attacks on face recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7714–7722. [Google Scholar]
  51. Sundararajan, M.; Taly, A.; Yan, Q. Axiomatic attribution for deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017. [Google Scholar]
  52. Taha, A.A.; Hanbury, A. Metrics for evaluating 3D medical image segmentation: Analysis, selection, and tool. BMC Med Imaging 2015, 15, 1–28. [Google Scholar] [CrossRef]
  53. Diederik, P.K.; Jimmy, B. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  54. Wu, Z.; Song, S.; Khosla, A.; Yu, F.; Zhang, L.; Tang, X.; Xiao, J. 3d shapenets: A deep representation for volumetric shapes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1912–1920. [Google Scholar]
  55. Sun, Y.; Chen, F.; Chen, Z.; Wang, M. Local aggressive adversarial attacks on 3d point cloud. In Proceedings of the Asian Conference on Machine Learning, Virtual, 17–19 November 2021; PMLR: New York, NY, USA, 2021; pp. 65–80. [Google Scholar]
  56. Wang, Y.; Sun, Y.; Liu, Z.; Sarma, S.E.; Bronstein, M.M.; Solomon, J.M. Dynamic graph cnn for learning on point clouds. ACM Trans. Graph. (tog) 2019, 38, 1–12. [Google Scholar] [CrossRef]
  57. Li, Y.; Bu, R.; Sun, M.; Wu, W.; Di, X.; Chen, B. Pointcnn: Convolution on x-transformed points. In Advances in Neural Information Processing Systems; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2018; Volume 31. [Google Scholar]
  58. Zhou, H.; Chen, K.; Zhang, W.; Fang, H.; Zhou, W.; Yu, N. Dup-net: Denoiser and upsampler network for 3d adversarial point clouds defense. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 1961–1970. [Google Scholar]
Figure 1. A geometry-aware attack typically commences by extracting pivotal contours using specialized techniques. These contours then undergo targeted Adam-based optimization. Following this, they are concatenated with non-pivotal points and undergo multiple rounds of iteration to yield adversarial samples.
Figure 2. The G&G attack combines the advantages of geometry-aware techniques and autoencoders. (A) It reconstructs the target using a revised autoencoder. (B) Point cloud sensitivity maps are leveraged to extract critical subsets. (C) Global perturbations are acquired and adversarial samples are refined through the surrogate and test classifiers, employing both classification loss and distance loss. (D) SimBA, based on gradients, curvature, and normal vectors, enhances the perturbations with Gaussian random masking and then generates adversarial samples iteratively.
Figure 2. The G&G attack combines the advantages of geometry-aware techniques and autoencoders. (A) It reconstructs the target using a revised autoencoder. (B) Point cloud sensitivity maps are leveraged to extract critical subsets. (C) Global perturbations are acquired and adversarial samples are refined through the surrogate and test classifiers, employing both classification loss and distance loss. (D) Simba, based on gradients, curvature, and normal vectors, enhances the perturbations with Gaussian random masking and then generates adversarial samples iteratively.
Applsci 15 00448 g002
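Step (B) can be approximated with a simple gradient-based saliency ranking. The sketch below is one plausible realization, assuming a single backward pass and a hypothetical subset size `k`; the sensitivity maps used in the paper may be computed differently.

```python
import torch
import torch.nn.functional as F

def critical_subset(points, label, model, k=128):
    """Return a boolean mask over the k points with the largest gradient magnitude."""
    pts = points.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(pts.unsqueeze(0)), label.view(1))
    loss.backward()
    sensitivity = pts.grad.norm(dim=-1)            # (N,) per-point sensitivity score
    mask = torch.zeros(points.shape[0], dtype=torch.bool)
    mask[sensitivity.topk(k).indices] = True       # keep the k most sensitive points
    return mask
```

The resulting mask is exactly the kind of `critical_mask` consumed by the geometry-aware loop sketched under Figure 1.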
Figure 3. We employ a four-layer Set Abstraction (SA) module to extract local point cloud features. In each layer, points are sampled and grouped by spheres of varying diameters.
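One common way to parameterize such a hierarchy is with per-layer sampling and grouping settings, as in the sketch below; the point counts, sphere radii, and MLP widths are assumptions for illustration, not the settings used in the paper.

```python
# Illustrative configuration of four SA layers: each samples n_points centroids,
# groups n_samples neighbors inside a sphere of the given radius, and applies an MLP.
sa_layers = [
    dict(n_points=512, radius=0.1, n_samples=32, mlp=(64, 64, 128)),
    dict(n_points=256, radius=0.2, n_samples=32, mlp=(128, 128, 256)),
    dict(n_points=128, radius=0.4, n_samples=32, mlp=(256, 256, 512)),
    dict(n_points=64,  radius=0.8, n_samples=32, mlp=(512, 512, 1024)),
]
```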
Figure 4. Our autoencoder initially applies a single LBR to the features obtained from CIC (N1 × C1), SA (N2 × C1), and PCT (N3 × C1) individually, followed by reshaping the features. The merged features have size N4 × C1, where N4 = N1 + N2 + N3 and C1 = 1024. We construct the decoder using a partially frozen feedforward neural network. The target size is N × C, with C = 3.
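The merging step in Figure 4 can be sketched as follows. The LBR composition matches the caption (Linear + BatchNorm + ReLU), while the decoder widths and the choice of which layers to freeze are our assumptions.

```python
import torch
import torch.nn as nn

class LBR(nn.Module):
    """Linear + BatchNorm + ReLU over (N_i, 1024) branch features."""
    def __init__(self, dim=1024):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, dim), nn.BatchNorm1d(dim), nn.ReLU())

    def forward(self, x):
        return self.net(x)

def merge_branches(f_cic, f_sa, f_pct):
    # One LBR per branch, then concatenation along the point axis:
    # (N1, 1024), (N2, 1024), (N3, 1024) -> (N4, 1024) with N4 = N1 + N2 + N3.
    # (In practice these LBRs would be trained modules held by the autoencoder.)
    return torch.cat([LBR()(f) for f in (f_cic, f_sa, f_pct)], dim=0)

# A partially frozen feedforward decoder mapping merged features to N x 3 coordinates;
# which layers are frozen is our assumption, not specified by the figure.
decoder = nn.Sequential(nn.Linear(1024, 512), nn.ReLU(), nn.Linear(512, 3))
for p in decoder[0].parameters():
    p.requires_grad = False
```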
Figure 5. Visualizing the overall transferability of L3A-Attack (left), AdvPC (middle), and G&G attack (right). Elements in the same row correspond to attacks crafted on the same surrogate network, while elements in the same column correspond to the networks the attacks are transferred to. Because the diagonal corresponds to attacking the surrogate network itself, diagonal values tend to be larger. For clarity, brighter elements indicate better transferability. We observe that the G&G attack exhibits higher transferability than the others. The transferability score under each matrix is the average of all values within it, summarizing the overall transferability of each attack.
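The summary score under each matrix can be reproduced from the ASR matrix itself; in the sketch below, the entries are placeholders rather than values read from the figure.

```python
import numpy as np

# asr[i, j]: ASR when the attack is crafted on surrogate i and evaluated on target j.
asr = np.array([[0.97, 0.81, 0.78],
                [0.84, 0.95, 0.80],
                [0.79, 0.83, 0.96]])
score = asr.mean()                                   # summary score under each matrix
transfer_only = asr[~np.eye(3, dtype=bool)].mean()   # variant excluding the diagonal
```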
Figure 6. In comparison to AdvPC, the G&G attack curve shows a more rapid increase, leveling off after 100 iterations. Furthermore, our method achieves better performance in terms of ASR.
Figure 7. In our comparison with AdvPC regarding imperceptibility using different criteria, such as L2, CD, and HD, we found that our method outperforms AdvPC between 200 and 500 iterations. Although our approach requires slightly more queries than AdvPC, this number gradually decreases as the iteration rounds increase. Considering both ASR and imperceptibility, we conclude that around 300 iterations are optimal.
Figure 8. Visual comparisons of different methods. × represents attack failure, meaning the classification model correctly identifies the point cloud; ✓ indicates attack success, meaning the classification model misclassifies the point cloud.
Table 1. Attack success rate (%) of black-box transfer attacks for each method with no defense.

| Target Model | Attack | ASR | CD | HD | L2 | AT |
|---|---|---|---|---|---|---|
| PointNet++ | PGD | 42.34% | 0.00512 | 0.02053 | 2.56925 | 1.32729 |
| PointNet++ | I-FGSM | 9.64% | 0.00009 | 0.00836 | 0.20468 | 1.81929 |
| PointNet++ | SI-ADV | 52.07% | 0.00180 | 0.06930 | 1.85480 | 1.81142 |
| PointNet++ | Saliency Map | 13.29% | 0.00493 | 0.11499 | None | 3.43626 |
| PointNet++ | Add | 8.63% | 0.00012 | 0.00457 | 0.25674 | 4.56375 |
| PointNet++ | AOF | 73.74% | 0.00708 | 0.02498 | 3.13824 | 39.04768 |
| PointNet++ | L3A-attack | 30.06% | 0.00076 | 0.01834 | 0.70428 | 1.24706 |
| PointNet++ | G&G Attack | 78.97% | 0.00127 | 0.00942 | 0.88586 | 11.17533 |
| DGCNN | PGD | 61.95% | 0.00512 | 0.02053 | 2.56925 | 0.76333 |
| DGCNN | I-FGSM | 13.37% | 0.00009 | 0.00836 | 0.20467 | 0.67050 |
| DGCNN | SI-ADV | 41.41% | 0.00180 | 0.06930 | 1.85480 | 1.85480 |
| DGCNN | Saliency Map | 24.47% | 0.00495 | 0.11492 | None | 4.04345 |
| DGCNN | Add | 11.02% | 0.00012 | 0.00456 | 0.25669 | 4.69601 |
| DGCNN | AOF | 83.55% | 0.00702 | 0.02507 | 3.12184 | 10.75699 |
| DGCNN | L3A-attack | 36.30% | 0.00076 | 0.01834 | 0.70428 | 0.54858 |
| DGCNN | G&G Attack | 97.20% | 0.00136 | 0.00423 | 1.44489 | 10.04447 |
| CurveNet | PGD | 40.44% | 0.00512 | 0.02053 | 2.56925 | 1.12563 |
| CurveNet | I-FGSM | 12.97% | 0.00009 | 0.00836 | 0.20466 | 1.14258 |
| CurveNet | SI-ADV | 48.58% | 0.00180 | 0.06930 | 1.85480 | 1.53974 |
| CurveNet | Saliency Map | 19.21% | 0.00494 | 0.11536 | None | 2.66501 |
| CurveNet | Add | 9.97% | 0.00012 | 0.00459 | 0.25670 | 3.27948 |
| CurveNet | AOF | 75.12% | 0.00708 | 0.02513 | 3.13730 | 22.04874 |
| CurveNet | L3A-attack | 30.11% | 0.00076 | 0.01834 | 0.70428 | 1.06008 |
| CurveNet | G&G Attack | 95.30% | 0.00141 | 0.00410 | 1.47170 | 9.67680 |
| PointCNN | PGD | 31.60% | 0.00512 | 0.02054 | 2.56925 | 1.12270 |
| PointCNN | I-FGSM | 23.99% | 0.00009 | 0.00835 | 0.20464 | 1.22103 |
| PointCNN | SI-ADV | 21.47% | 0.00180 | 0.06930 | 1.85480 | 1.31106 |
| PointCNN | Saliency Map | 36.55% | 0.00492 | 0.11484 | None | 2.08533 |
| PointCNN | Add | 23.30% | 0.00012 | 0.00456 | 0.25681 | 3.46817 |
| PointCNN | AOF | 49.76% | 0.00706 | 0.02498 | 3.13396 | 24.89447 |
| PointCNN | L3A-attack | 23.91% | 0.00076 | 0.01834 | 0.70428 | 1.48106 |
| PointCNN | G&G Attack | 81.36% | 0.00149 | 0.00409 | 1.46869 | 13.32512 |
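For reference, the CD and HD columns measure the Chamfer and Hausdorff distances between the clean and adversarial clouds. A common symmetric formulation is sketched below; the exact convention behind the tables (one-sided vs. symmetric, squared vs. unsquared) is an assumption here.

```python
import torch

def chamfer_hausdorff(X, Y):
    """X: (N, 3) clean cloud; Y: (M, 3) adversarial cloud."""
    d = torch.cdist(X, Y)                        # (N, M) pairwise Euclidean distances
    nn_xy = d.min(dim=1).values                  # each X point to its nearest Y point
    nn_yx = d.min(dim=0).values                  # each Y point to its nearest X point
    cd = nn_xy.mean() + nn_yx.mean()             # symmetric Chamfer distance
    hd = torch.max(nn_xy.max(), nn_yx.max())     # symmetric Hausdorff distance
    return cd.item(), hd.item()
```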
Table 2. Attack success rate (%) of black-box transfer attacks for each method with SOR defense.

| Target Model | Attack | ASR | CD | HD | L2 | AT |
|---|---|---|---|---|---|---|
| PointNet++ | PGD | 34.48% | 0.00512 | 0.02053 | 2.56925 | 0.98669 |
| PointNet++ | I-FGSM | 8.91% | 0.00009 | 0.00836 | 0.20468 | 1.08552 |
| PointNet++ | SI-ADV | 13.33% | 0.00180 | 0.06930 | 1.85480 | 2.35535 |
| PointNet++ | Saliency Map | 14.10% | 0.00493 | 0.11499 | None | 1.27082 |
| PointNet++ | Add | 9.93% | 0.00186 | 0.00903 | 1.46697 | 3.34064 |
| PointNet++ | AOF | 65.19% | 0.00708 | 0.02498 | 3.13824 | 10.16391 |
| PointNet++ | L3A-attack | 17.10% | 0.00076 | 0.01834 | 0.70428 | 1.75391 |
| PointNet++ | G&G Attack | 69.25% | 0.00153 | 0.00389 | 1.48500 | 14.72075 |
| DGCNN | PGD | 44.65% | 0.00512 | 0.02053 | 2.56925 | 1.04294 |
| DGCNN | I-FGSM | 26.74% | 0.00009 | 0.00836 | 0.20467 | 1.11005 |
| DGCNN | SI-ADV | 16.86% | 0.00180 | 0.06930 | 1.85480 | 1.54077 |
| DGCNN | Saliency Map | 55.19% | 0.00495 | 0.11492 | None | 1.05804 |
| DGCNN | Add | 32.01% | 0.00186 | 0.00902 | 1.46481 | 3.85096 |
| DGCNN | AOF | 87.76% | 0.00702 | 0.02507 | 3.12184 | 34.62557 |
| DGCNN | L3A-attack | 35.94% | 0.00076 | 0.01834 | 0.70428 | 0.82829 |
| DGCNN | G&G Attack | 94.77% | 0.00139 | 0.00359 | 1.45356 | 13.18447 |
| CurveNet | PGD | 44.65% | 0.00512 | 0.02053 | 2.56925 | 1.04294 |
| CurveNet | I-FGSM | 18.44% | 0.00009 | 0.00836 | 0.20466 | 0.88368 |
| CurveNet | SI-ADV | 14.10% | 0.00180 | 0.06930 | 1.85480 | 1.91420 |
| CurveNet | Saliency Map | 31.85% | 0.00492 | 0.11534 | None | 1.43144 |
| CurveNet | Add | 18.68% | 0.00186 | 0.00902 | 1.46567 | 3.43966 |
| CurveNet | AOF | 74.64% | 0.00708 | 0.02513 | 3.13730 | 13.58630 |
| CurveNet | L3A-attack | 22.45% | 0.00076 | 0.01834 | 0.70428 | 1.91566 |
| CurveNet | G&G Attack | 87.84% | 0.00150 | 0.00376 | 1.50523 | 9.00795 |
| PointCNN | PGD | 36.51% | 0.00512 | 0.02054 | 2.56925 | 1.61203 |
| PointCNN | I-FGSM | 30.67% | 0.00009 | 0.00835 | 0.20464 | 1.57526 |
| PointCNN | SI-ADV | 20.75% | 0.00180 | 0.06930 | 1.85480 | 1.83620 |
| PointCNN | Saliency Map | 49.43% | 0.00492 | 0.11484 | None | 3.30967 |
| PointCNN | Add | 27.76% | 0.00186 | 0.00903 | 1.46394 | 4.54837 |
| PointCNN | AOF | 53.40% | 0.00706 | 0.02498 | 3.13396 | 13.46032 |
| PointCNN | L3A-attack | 29.86% | 0.00076 | 0.01834 | 0.70428 | 1.80675 |
| PointCNN | G&G Attack | 90.52% | 0.00137 | 0.00345 | 1.43998 | 12.94643 |
Table 3. Attack success rate (%) of black-box transfer attacks for each method with DUP-Net defense.

| Target Model | Attack | ASR | CD | HD | L2 | AT |
|---|---|---|---|---|---|---|
| PointNet++ | PGD | 42.75% | 0.00512 | 0.02053 | 2.56925 | 2.00264 |
| PointNet++ | I-FGSM | 12.36% | 0.00009 | 0.00836 | 0.20470 | 1.94486 |
| PointNet++ | SI-ADV | 19.17% | 0.00180 | 0.06930 | 1.85480 | 3.96888 |
| PointNet++ | Saliency Map | 19.17% | 0.00496 | 0.11553 | None | 2.15756 |
| PointNet++ | Add | 13.82% | 0.00186 | 0.00902 | 1.46693 | 5.04174 |
| PointNet++ | AOF | 55.68% | 0.00668 | 0.02393 | 2.99472 | 25.12229 |
| PointNet++ | L3A-attack | 19.73% | 0.00076 | 0.01834 | 0.70428 | 3.01929 |
| PointNet++ | G&G Attack | 60.53% | 0.00182 | 0.00450 | 1.60559 | 7.46479 |
| DGCNN | PGD | 83.47% | 0.00512 | 0.02054 | 2.56925 | 1.52712 |
| DGCNN | I-FGSM | 67.59% | 0.00009 | 0.00836 | 0.20467 | 1.50940 |
| DGCNN | SI-ADV | 52.15% | 0.00180 | 0.06930 | 1.85480 | 3.86802 |
| DGCNN | Saliency Map | 19.17% | 0.00496 | 0.11553 | None | 2.50536 |
| DGCNN | Add | 74.07% | 0.00186 | 0.00901 | 1.46448 | 4.28733 |
| DGCNN | AOF | 65.68% | 0.00668 | 0.02393 | 2.99472 | 25.12229 |
| DGCNN | L3A-attack | 71.88% | 0.00076 | 0.01834 | 0.70428 | 3.97484 |
| DGCNN | G&G Attack | 83.83% | 0.00139 | 0.00356 | 1.45443 | 12.97936 |
| CurveNet | PGD | 33.35% | 0.00512 | 0.02053 | 2.56925 | 2.34982 |
| CurveNet | I-FGSM | 14.75% | 0.00009 | 0.00835 | 0.20467 | 2.19222 |
| CurveNet | SI-ADV | 18.88% | 0.00180 | 0.06930 | 1.85480 | 5.03818 |
| CurveNet | Saliency Map | 26.13% | 0.00497 | 0.11533 | None | 3.34793 |
| CurveNet | Add | 14.87% | 0.00186 | 0.00902 | 1.46607 | 6.76772 |
| CurveNet | AOF | 63.21% | 0.00669 | 0.02393 | 2.99106 | 31.26717 |
| CurveNet | L3A-attack | 20.46% | 0.00076 | 0.01834 | 0.70428 | 4.01711 |
| CurveNet | G&G Attack | 39.38% | 0.00153 | 0.00377 | 1.51820 | 10.31737 |
| PointCNN | PGD | 82.94% | 0.00512 | 0.02053 | 2.56925 | 2.14265 |
| PointCNN | I-FGSM | 83.51% | 0.00009 | 0.00836 | 0.20471 | 18.71319 |
| PointCNN | SI-ADV | 81.00% | 0.00180 | 0.06930 | 1.85480 | 4.65357 |
| PointCNN | Saliency Map | 86.26% | 0.00493 | 0.11509 | None | 32.97012 |
| PointCNN | Add | 83.95% | 0.00186 | 0.00901 | 1.46387 | 6.84810 |
| PointCNN | AOF | 85.58% | 0.00671 | 0.02404 | 3.01063 | 19.06874 |
| PointCNN | L3A-attack | 83.31% | 0.00076 | 0.01834 | 0.70428 | 6.19376 |
| PointCNN | G&G Attack | 58.51% | 0.00018 | 0.00186 | 0.28452 | 10.43081 |
Table 4. Query counts of black-box transfer attacks for each method. Q denotes the average number of queries.

| Attack | ASR (PointNet++) | Q | CD | HD | ASR (CurveNet) | Q | CD | HD | ASR (PointCNN) | Q | CD | HD |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| SimBA | 8.27% | 50.74 | 1.95 × 10⁻⁷ | 2.59 × 10⁻⁵ | 16.73% | 73.73 | 1.11 × 10⁻⁵ | 2.26 × 10⁻⁴ | 59.36% | 1558.16 | 1.03 × 10⁻⁵ | 1.78 × 10⁻³ |
| SimBA++ | 8.14% | 50.74 | 4.62 × 10⁻⁷ | 1.73 × 10⁻⁵ | 16.33% | 48.53 | 4.04 × 10⁻⁶ | 6.92 × 10⁻⁵ | 58.27% | 1559.68 | 2.08 × 10⁻⁵ | 1.65 × 10⁻³ |
| SI-ADV | 8.39% | 4.24 | 1.26 × 10⁻⁷ | 2.09 × 10⁻⁵ | 16.29% | 2.94 | 5.91 × 10⁻⁷ | 3.87 × 10⁻⁵ | 58.95% | 146.47 | 7.59 × 10⁻⁶ | 1.62 × 10⁻³ |
| G&G Attack | 78.97% | 34.41 | 1.27 × 10⁻³ | 9.42 × 10⁻³ | 95.30% | 9.66 | 1.41 × 10⁻³ | 4.10 × 10⁻³ | 81.36% | 8.28 | 1.49 × 10⁻³ | 4.09 × 10⁻³ |
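For intuition about the Q column, a minimal SimBA-style query loop is sketched below; the query budget, step size `eps`, and acceptance rule are illustrative assumptions rather than the configurations used in Table 4.

```python
import torch

def simba_query_attack(points, label, query_model, budget=2000, eps=0.01):
    """points: (N, 3); label: 0-dim class index tensor; budget counts outer iterations."""
    adv = points.clone()
    p_true = query_model(adv.unsqueeze(0)).softmax(dim=1)[0, label]
    queries = 1
    for _ in range(budget):
        probe = torch.zeros_like(adv)
        i = torch.randint(adv.shape[0], (1,)).item()       # pick a random point
        probe[i, torch.randint(3, (1,)).item()] = eps      # and a random coordinate axis
        for direction in (probe, -probe):
            logits = query_model((adv + direction).unsqueeze(0))
            queries += 1
            if logits.argmax(dim=1).item() != int(label):
                return adv + direction, queries            # target misclassifies: done
            p_new = logits.softmax(dim=1)[0, label]
            if p_new < p_true:                             # keep prob-reducing steps
                adv, p_true = adv + direction, p_new
                break
    return adv, queries
```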
Table 5. Ablation experiments of the G&G attack.

| Target Model | Method | ASR | Q | CD | HD | L2 | AT |
|---|---|---|---|---|---|---|---|
| PointNet++ | G&G attack | 78.97% | 34.41 | 0.00127 | 0.00942 | 0.88586 | 11.18 |
| PointNet++ | w/o local geometry-aware attack | 61.51% | 2.93 | 0.00131 | 0.00243 | 1.40282 | 2.27 |
| PointNet++ | w/o autoencoder | 13.82% | 115.83 | 0.00097 | 0.00504 | 0.96031 | 33.01 |
| DGCNN | G&G attack | 97.20% | 8.76 | 0.00136 | 0.04232 | 1.44489 | 10.04 |
| DGCNN | w/o local geometry-aware attack | 66.61% | 3.34 | 0.00129 | 0.00234 | 1.41377 | 0.24 |
| DGCNN | w/o autoencoder | 33.87% | 88.26 | 0.00072 | 0.00390 | 0.72971 | 7.34 |
| CurveNet | G&G attack | 95.30% | 9.66 | 0.00141 | 0.00410 | 1.47170 | 9.68 |
| CurveNet | w/o local geometry-aware attack | 63.05% | 3.03 | 0.00133 | 0.00244 | 1.44300 | 1.51 |
| CurveNet | w/o autoencoder | 35.70% | 101.85 | 0.00085 | 0.00454 | 0.85685 | 23.90 |
| PointCNN | G&G attack | 81.36% | 8.26 | 0.00149 | 0.00409 | 1.46869 | 13.33 |
| PointCNN | w/o local geometry-aware attack | 78.16% | 3.03 | 0.00142 | 0.00242 | 1.44440 | 0.44 |
| PointCNN | w/o autoencoder | 57.74% | 116.18 | 0.00097 | 0.00502 | 0.96174 | 32.72 |
Table 6. Ablation studies with different types of noise.

| Target Model | Attack | ASR | CD | HD | L2 | AT |
|---|---|---|---|---|---|---|
| PointNet | Gauss Noise (μ = 0, σ = 0.01) | 38.65% | 0.00142 | 0.07518 | 0.43488 | 21.68067 |
| PointNet | Gauss Noise (μ = 0, σ = 0.1) | 38.65% | 0.00142 | 0.07518 | 0.43488 | 15.85776 |
| PointNet | Random Noise (η = 0.5) | 38.81% | 0.00142 | 0.07518 | 0.43491 | 13.22587 |
| PointNet | Random Noise (η = 0.8) | 38.70% | 0.00142 | 0.07520 | 0.43491 | 2.17309 |
| DGCNN | Gauss Noise (μ = 0, σ = 0.01) | 58.87% | 0.00197 | 0.07824 | 0.43535 | 10.69092 |
| DGCNN | Gauss Noise (μ = 0, σ = 0.1) | 58.87% | 0.00197 | 0.07824 | 0.43535 | 8.95007 |
| DGCNN | Random Noise (η = 0.5) | 58.67% | 0.00198 | 0.07823 | 0.43536 | 11.80419 |
| DGCNN | Random Noise (η = 0.8) | 58.87% | 0.00198 | 0.07824 | 0.43536 | 9.97715 |
| CurveNet | Gauss Noise (μ = 0, σ = 0.01) | 64.58% | 0.00191 | 0.07835 | 0.43522 | 24.13438 |
| CurveNet | Gauss Noise (μ = 0, σ = 0.1) | 64.60% | 0.00191 | 0.07845 | 0.43562 | 3.58111 |
| CurveNet | Random Noise (η = 0.5) | 64.71% | 0.00191 | 0.07837 | 0.43523 | 22.96008 |
| CurveNet | Random Noise (η = 0.8) | 63.09% | 0.00192 | 0.07865 | 0.43735 | 20.45849 |
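The two noise families in Table 6 can be generated as follows; how η scales the uniform noise, and whether the perturbed coordinates are clipped afterwards, are assumptions of this sketch.

```python
import torch

def gauss_noise(points, mu=0.0, sigma=0.01):
    # Additive Gaussian noise N(mu, sigma) on every coordinate.
    return points + torch.randn_like(points) * sigma + mu

def random_noise(points, eta=0.5):
    # Additive uniform noise in [-eta, eta] on every coordinate.
    return points + (torch.rand_like(points) * 2.0 - 1.0) * eta
```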
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
