Abstract
Research on deep neural network (DNN)-based multi-dimensional data visualization has extensively explored cross-modal hash retrieval (CMHR) systems, yet these systems remain vulnerable to adversarial examples. Recent work improves the robustness of CMHR networks by augmenting training datasets with adversarial examples. Prior approaches typically formulate the generation of cross-modal adversarial examples as an optimization problem and solve it iteratively. Although effective, these techniques generate examples slowly, which limits research efficiency. To address this, we propose a generative method that rapidly synthesizes adversarial examples with a purpose-built adversarial generator network. Specifically, we introduce Cross-Gen, a parallel cross-modal framework that constructs semantic triplet data by querying the target model and using its feedback. The generator is optimized with a tailored objective comprising an adversarial loss, a reconstruction loss, and a quantization loss. Experimental results show that Cross-Gen generates adversarial examples significantly faster than iterative methods while achieving competitive attack performance.