MDPI - Publisher of Open Access Journals

21 pages, 1317 KiB

Open Access Article

Research on Hidden Backdoor Prompt Attack Method

by Huanhuan Gu, Qianmu Li, Yufei Wang, Yu Jiang, Aniruddha Bhattacharjya, Haichao Yu and Qian Zhao

Symmetry 2025, 17(6), 954; https://doi.org/10.3390/sym17060954 - 16 Jun 2025


Abstract

Existing studies on backdoor attacks against large language models (LLMs) have explored trigger-based strategies, such as rare tokens or syntactic anomalies. These explicit triggers, however, limit both the stealth and the generalizability of the attacks and leave them susceptible to detection. In this study, we propose HDPAttack, a novel hidden backdoor prompt attack method designed to overcome these limitations by using the semantic and structural properties of prompts as triggers rather than relying on explicit markers. Unlike traditional approaches, HDPAttack injects carefully crafted fake demonstrations into the training data, semantically re-expressing prompts to generate examples whose input semantics and corresponding labels are highly consistent. This guides models to learn latent trigger patterns embedded in their deep representations, so that the backdoor can be activated through natural language prompts without altering user inputs or introducing conspicuous anomalies. Experimental results on four datasets (SST-2, SMS, AGNews, Amazon) show that HDPAttack achieved an average attack success rate of 99.87%, outperforming baseline methods by 2–20% while incurring a classification accuracy loss of ≤1%. These findings set a new benchmark for undetectable backdoor attacks and underscore the urgent need for advancements in prompt-based defense strategies.
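The abstract reports two quantities, attack success rate and clean classification accuracy, without defining them. The sketch below shows how such metrics are commonly computed in the backdoor literature; it is an assumption about the convention, not the authors' evaluation code. The function names and the toy predictions are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def clean_accuracy(preds: Sequence[int], labels: Sequence[int]) -> float:
    """Fraction of clean (untriggered) inputs classified correctly."""
    if len(preds) != len(labels) or len(labels) == 0:
        raise ValueError("preds and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def attack_success_rate(
    triggered_preds: Sequence[int],
    true_labels: Sequence[int],
    target_label: int,
) -> float:
    """Fraction of triggered inputs predicted as the attacker's target label.

    Assumed convention: inputs whose true label already equals the target
    are excluded, since they would count as successes without any backdoor.
    """
    if len(triggered_preds) != len(true_labels):
        raise ValueError("triggered_preds and true_labels must be equal in length")
    pairs = [(p, y) for p, y in zip(triggered_preds, true_labels) if y != target_label]
    if not pairs:
        raise ValueError("no non-target examples to evaluate")
    return sum(p == target_label for p, _ in pairs) / len(pairs)


# Toy illustration with made-up predictions (not results from the paper).
acc_benign_model = clean_accuracy([1, 0, 1, 1], [1, 0, 1, 1])
acc_backdoored_model = clean_accuracy([1, 0, 0, 1], [1, 0, 1, 1])
accuracy_loss = acc_benign_model - acc_backdoored_model
asr = attack_success_rate([1, 1, 0, 1], [0, 0, 0, 1], target_label=1)
```

Under this convention, a stealthy attack is one where `asr` is close to 1 while `accuracy_loss` stays small, which is the trade-off the abstract summarizes.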

(This article belongs to the Section Mathematics)

