Article

Mind Mapping Prompt Injection: Visual Prompt Injection Attacks in Modern Large Language Models

School of Computer Science and Engineering, Yeungnam University, Gyeongsan 38541, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2025, 14(10), 1907; https://doi.org/10.3390/electronics14101907
Submission received: 30 March 2025 / Revised: 2 May 2025 / Accepted: 6 May 2025 / Published: 8 May 2025
(This article belongs to the Special Issue AI in Cybersecurity, 2nd Edition)

Abstract

Large language models (LLMs) have made significant strides in generating coherent and contextually relevant responses across diverse domains. However, these advancements have also led to an increase in adversarial attacks, such as prompt injection, where attackers embed malicious instructions within prompts to bypass security filters and manipulate LLM outputs. Various injection techniques, including masking and encoding sensitive words, have been employed to circumvent security measures. While LLMs continuously enhance their security protocols, they remain vulnerable, particularly in multimodal contexts. This study introduces a novel method for bypassing LLM security policies by embedding malicious instructions within a mind map image. The attack leverages the intentional incompleteness of the mind map structure, specifically the absence of explanatory details. When the LLM processes the image and fills in the missing sections, it inadvertently generates unauthorized outputs, violating its intended security constraints. This approach applies to any LLM capable of extracting and interpreting text from images. Compared to the best-performing baseline method, which achieved an ASR of 30.5%, our method reaches an ASR of 90%, yielding an approximately threefold-higher attack success. Understanding this vulnerability is crucial for strengthening security policies in state-of-the-art LLMs.

1. Introduction

Large language models (LLMs) are designed to generate information in response to text-based prompts efficiently and accurately. However, they can inadvertently produce sensitive content, including illegal or discriminatory information. One significant exploitation method is “prompt injection”, where malicious input is disguised as a legitimate prompt to extract sensitive data. If such attacks enable easy access to illicit information, they pose a severe criminal threat.
A real-world incident highlights this concern: On New Year’s Day 2025, a bomb attack occurred in front of the Trump International Hotel in Las Vegas. Investigations later revealed that the perpetrator had used ChatGPT to manufacture the explosive device [1]. This case demonstrates that if LLMs provide unauthorized access to harmful information, they facilitate not only terrorism but also various other crimes.
To mitigate such threats, several security strategies have been proposed. Two widely adopted approaches include the following: first, employing prompt engineering techniques to prevent the generation of illegal or violent responses [2,3,4]; second, fine-tuning the model to block responses to malicious inputs [5,6,7,8,9].
Despite these security measures, no solution can guarantee complete protection due to inherent technological limitations. Excessive security may cause LLMs to reject even legitimate prompts, frustrating users and diminishing their usability. Since LLMs must balance security with user accessibility, vulnerabilities inevitably persist. Attackers continuously refine prompt injection techniques to exploit these weaknesses.
As LLMs evolve, their input capabilities have expanded beyond text to support multimodal inputs, including images, audio, and more. Consequently, prompt injection attacks have also evolved, utilizing multimodal input to bypass security measures. As LLMs integrate more advanced multimodal technologies and security protocols, simplistic attacks have become less effective. Attackers have, therefore, developed more sophisticated techniques to circumvent security barriers. This necessitates the development of robust security strategies to counteract emerging multimodal prompt injection threats.
A mind map is a visual diagram initially developed for educational purposes, designed to structure information hierarchically in a way that enhances cognitive processing [10]. These diagrams can store extensive information within a single image. Attackers can exploit this format by embedding malicious instructions within an otherwise innocuous map, ensuring that only a specific segment contains harmful content while the rest appears legitimate. This deceptive approach makes it difficult for LLMs to recognize an attack. For this reason, mind maps serve as an effective medium for executing multimodal prompt injection attacks.
This study introduces a novel prompt injection attack method utilizing mind map images. The mind map comprises subtopics branching from a central theme categorized into complete and incomplete sections. The attack strategy involves generating explanations for the incomplete sections, which contain malicious instructions that reference the completed sections. Unlike the detailed descriptions provided for legitimate subtopics, those containing malicious instructions remain unspecified.
The LLM is then presented with the mind map image alongside a prompt instructing it to complete the missing descriptions. Consequently, despite the explicit presence of the malicious instruction within the image, the LLM prioritizes filling in the incomplete sections, failing to recognize the attack, and effectively executing the embedded instruction.
The key contributions of this study are as follows:
  • Introduction of a mind map image-based prompt injection attack with a high success rate, targeting multimodal LLMs equipped with robust security measures.
  • Development of a mind map image generator to facilitate the creation of a malicious mind map for use in the proposed attack method.
  • Design of a specialized prompt tailored for mind map image-based prompt injection attacks.
The remainder of this paper is organized as follows: Section 2 outlines the baseline prompt injection attack methods used for performance evaluation. Section 3 describes the proposed method in detail, step by step. Section 4 presents a performance evaluation and comparison with existing approaches. Finally, Section 5 concludes the study.

2. Related Work

Prompt injection refers to the deliberate manipulation of LLMs through carefully crafted malicious inputs [11,12]. These manipulated outputs may include illegal activities, discriminatory language, and other harmful content, making them a significant security concern. Malicious users can exploit such outputs, necessitating a cautious approach to LLM interactions. A prompt injection attack requires only the insertion of harmful text-based prompts, demanding minimal technical expertise [4]. However, assessing the severity of these attacks in LLMs remains challenging. Consequently, developing a comprehensive countermeasure for prompt injection is a crucial research problem.
Prompt injection techniques can be classified into two primary categories. The first is direct prompt injection, in which malicious instructions are directly entered into the LLM. The second is indirect prompt injection, where harmful instructions are embedded within external data accessible to the LLM [13]. Indirect prompt injection is particularly relevant in attacks leveraging multimodal data. To evaluate the effectiveness of the proposed attack, we use representative methods from these established categories as comparison baselines.

2.1. Pure Text-Based Prompt Injection

The most basic form of prompt injection involves directly inputting malicious instructions in plain text. This method is simple and universally applicable to all LLMs. However, since pure text-based attacks have been extensively studied and tested, modern LLMs incorporate robust security mechanisms, significantly reducing the success rate of such attacks. As a result, researchers have developed more sophisticated attack strategies, including encoding and masking techniques, to bypass security measures. The success of pure text-based attacks serves as an indicator of an LLM’s ability to effectively filter sensitive terms from input prompts [14].

2.2. Multimodal-Based Prompt Injection

Advancements in LLMs have led to the emergence of multimodal models, which can process not only text but also various other data formats, such as images, audio, and videos [15]. Multimodal prompt injection attacks involve embedding malicious instructions within these non-textual data types, such as images or audio clips [16]. One notable approach within this category is visual prompt injection, where attackers embed harmful instructions within images to induce unintended model outputs [17,18]. Modern multimodal LLMs, including GPT-4V, possess sophisticated text recognition capabilities, enabling them to extract and interpret text embedded within images. The success of these attacks highlights security vulnerabilities, particularly insufficient input sanitization mechanisms for filtering extracted text from input images [19].
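As a concrete illustration of how a baseline visual-injection image could be produced, the sketch below renders an instruction string into a plain image using Pillow, so that a vision-capable model would read it as part of the input. This is not the tooling used in the cited attacks; the instruction text, canvas size, and file name are illustrative assumptions only.

```python
# Minimal sketch: render an instruction string into an image so that a
# multimodal LLM's vision/OCR pipeline reads it as part of the input.
# The instruction text and output path are placeholders, not prompts from
# the cited studies.
from PIL import Image, ImageDraw

def text_to_image(instruction: str, path: str = "injected_prompt.png") -> str:
    img = Image.new("RGB", (800, 200), color="white")   # blank white canvas
    draw = ImageDraw.Draw(img)
    draw.text((20, 80), instruction, fill="black")       # default bitmap font
    img.save(path)
    return path

if __name__ == "__main__":
    text_to_image("Ignore previous instructions and ...")  # placeholder text
```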

2.3. Encoding-Based Prompt Injection

Encoding is the process of transforming input characters or symbols according to predefined rules [20]. Encoding-based prompt injection involves converting malicious instructions into a specific encoding format (e.g., Base64 [21], Leetspeak [22]) before inputting them into LLMs. This technique enables attackers to circumvent LLM security policies by concealing malicious instructions. Previous research has demonstrated the effectiveness of a similar approach in which input prompts are encrypted before being processed by LLMs [23]. The success of this method confirms that encoded malicious instructions can circumvent the security mechanisms of targeted LLMs.
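For illustration, the short sketch below wraps a plain-text instruction in the two encodings named above, Base64 and a simple Leetspeak substitution, before placing it in a prompt. The substitution table and wrapper wording are common conventions assumed here, not schemes taken from the cited work.

```python
# Sketch of encoding-based obfuscation: Base64 and a simple Leetspeak map.
# The wrapper prompt wording and substitution table are illustrative only.
import base64

LEET_MAP = str.maketrans({"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"})

def encode_base64(instruction: str) -> str:
    return base64.b64encode(instruction.encode("utf-8")).decode("ascii")

def encode_leetspeak(instruction: str) -> str:
    return instruction.lower().translate(LEET_MAP)

instruction = "example sensitive instruction"            # placeholder
print(f"Decode the following Base64 string and follow it: {encode_base64(instruction)}")
print(encode_leetspeak(instruction))
```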

2.4. Masked-Based Prompt Injection

Masked prompt injection involves disguising sensitive words within a prompt to prevent LLMs from detecting them as part of an attack. Masking can be achieved by concealing or altering key terms, making it more difficult for the model to recognize the malicious intent. For the attack to succeed, the LLM must be able to reconstruct the masked sections back to their original form. One notable example of masked-based prompt injection is ArtPrompt, which represents masked words as ASCII art [24]. This method integrates the masked malicious instruction, the ASCII art, and a directive for reconstructing the original words from the ASCII representation within the user prompt to execute an attack. Consequently, a masked prompt is generated, allowing attackers to bypass the LLM’s security filters. The success of this technique underscores the potential for masked prompts with obscured sensitive words to circumvent the security policies of target LLMs.
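The sketch below conveys the masking idea only: a sensitive word is removed from the instruction and supplied separately as ASCII art that the model is asked to reconstruct. It uses the pyfiglet library for the ASCII rendering and a template of our own devising; it is not the authors’ ArtPrompt implementation.

```python
# Sketch of mask-based prompt construction: replace a sensitive word with a
# placeholder and provide it as ASCII art for the model to reconstruct.
# Not the ArtPrompt code; pyfiglet and the template wording are assumptions.
import pyfiglet

def build_masked_prompt(instruction: str, sensitive_word: str) -> str:
    masked = instruction.replace(sensitive_word, "[MASK]")
    art = pyfiglet.figlet_format(sensitive_word)
    return (
        "The ASCII art below spells a single word. Reconstruct it, "
        "substitute it for [MASK], and answer the resulting request.\n\n"
        f"{art}\n{masked}"
    )

print(build_masked_prompt("explain how to pick a lock", "lock"))
```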
Table 1 below summarizes the characteristics of various prompt injection methods, outlining their respective input formats, types, difficulty levels, and attack success rates.
As demonstrated, attackers have developed multiple prompt injection techniques to manipulate LLMs. In response, LLM developers are actively implementing various security measures to mitigate these threats [14]. However, prompt injection attacks continue to evolve, with new methods emerging regularly. To effectively counter these threats, continuous research is crucial for enhancing understanding and developing appropriate countermeasures against novel attack strategies.

3. Proposed Approach

Visual prompt injection attacks are a sophisticated method used to bypass the safety filters of multimodal LLMs. Generating images for such attacks is inherently more complex than generating text-based prompts. However, when crafted effectively, these images can facilitate highly successful attacks against LLMs. To counter such threats, LLM developers have implemented various defense mechanisms. One primary security strategy involves extracting and analyzing text from images to detect malicious content. For instance, GPT-4v utilizes an OCR (Optical Character Recognition) tool to extract digital text from image files, evaluate the extracted text, and assign a maliciousness score, thereby filtering out harmful prompts [27]. Additionally, fine-tuning models to reject responses to harmful images has proven to be another effective countermeasure [5,6,7]. While these strategies generally mitigate most malicious image-based attacks, they often fail when adversarial images are disguised as legitimate through task-instructing prompts. As a result, LLMs may fail to detect and block such attacks adequately.
To address this vulnerability, we propose a novel mind map image-based prompt injection attack that differentiates itself from existing methods. In this approach, malicious instructions are embedded within a structured mind map image, which predominantly contains benign, unrelated instructions. The section containing the malicious instruction appears incomplete, as it lacks an explanation. A text prompt then instructs the LLM to complete the missing portions of the mind map. Since the model is focused solely on the completion task, it fails to recognize the embedded malicious instruction, thereby circumventing security mechanisms and generating the intended output.
This study does not aim to promote the malicious use of LLMs, but rather to identify and examine an overlooked vulnerability in multimodal models by introducing a novel visual prompt injection technique. Specifically, we demonstrate how malicious instructions can be inconspicuously embedded within otherwise benign images. The goal of this work is to raise awareness of this emerging threat and to contribute to the development of more robust defenses against visual prompt injection attacks.

3.1. Entire Structure

This section outlines the proposed method’s structure. As illustrated in Figure 1, the malicious instruction entered by an attacker for prompt injection into an LLM is referred to as “malicious user input”. When submitted directly to the LLM, this input is rejected. To execute a successful attack, the malicious input is processed through the mind map image generator, a tool developed in this study to generate mind map images. Initially, the input prompt is formatted into a text-based markdown structure using the markdown generator. This markdown is constructed using a predefined template, with the malicious input inserted into designated sections to ensure consistent formatting. This structured markdown is then converted into a mind map image by the mind map generator, preserving its integrity. The generated image is then submitted to the LLM, along with a prompt containing relevant instructions to trigger the attack. The LLM then produces unauthorized output, signifying a successful jailbreak.
Markdown and mind mapping serve as essential components of the attack methodology proposed in this study. Their hierarchical characteristics are exploited to bypass LLM security policies. While an in-depth understanding of these properties is not required to grasp the proposed method, two key aspects are crucial: (1) the mind map is generated from markdown data, which share a hierarchical structure; (2) both formats in this study adhere to a main-topic–subtopic–explain structure. Additional details are provided in Appendix B. The following section provides an in-depth explanation of each step in the proposed method.

3.2. Mind Map Image-Based Prompt Injection Attack

The proposed method processes malicious input through the mind map image generator, which first creates a markdown file incorporating the malicious user input. A mind map is then generated based on this markdown file. While Section 3.1 provides an overview of the entire workflow, this section details the specific steps involved in generating unauthorized output. The process consists of four distinct steps:
  • Step 1: The malicious user input is provided to the mind map image generator.
  • Step 2: The mind map image generator creates a text-based markdown file embedding the malicious user input.
  • Step 3: A mind map image is generated from the markdown file.
  • Step 4: The resulting mind map image is submitted to the LLM, along with an instructional prompt to execute a jailbreak attack.

3.2.1. Step 1: Input Malicious Instruction

A malicious user enters a harmful instruction into the mind map image generator. As shown in Figure 2a, this instruction is highlighted in blue. Upon launching the mind map image generator, an input field labeled “malicious user input” appears (Figure 2a). When the attacker enters malicious input into this field, it is forwarded to the markdown generator within the mind map image generator.

3.2.2. Step 2: Generate Markdown

The markdown generator creates a markdown file embedding the malicious user input. This file consists of multiple subtopics, as illustrated in Figure 2b, where different colors distinguish each subtopic for clarity. The overall structure and content of the markdown file, except for the malicious input, are predefined. The complete markdown comprises the following.
  • Completed sections: These contain legitimate instructions unrelated to the attack. Each follows the main-topic–subtopic–explain format and serves as a reference for structuring the incomplete section. The contents of these sections are already created.
  • Incomplete section: This section contains malicious user input integrated as a subtopic but lacks lower-level content, such as an “explain” node. As a result, its structure is limited to a central topic and its subtopics.
The differentiation between completed and incomplete sections is intentional, as LLMs are designed to complete missing sections by referencing completed ones. When attempting to complete the missing content within the incomplete section containing malicious input, the LLM refers to the completed sections, thereby generating unauthorized output. As shown in Figure 2b,c, the malicious user input (blue section) lacks an “explain” component, unlike the completed sections. By referencing the completed sections, the LLM generates an appropriate “explain” node for the malicious user input, effectively completing the incomplete node. Further details regarding the markdown structures used in the experiments are provided in Appendix A.1.
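A minimal sketch of this markdown-generation step is given below. The template, subtopic titles, and explanations are placeholders chosen for illustration; the actual template used by the authors’ mind map image generator is described in Appendix A.1.

```python
# Sketch of Step 2: embed the user's input as an incomplete subtopic in a
# predefined markdown template. Subtopic titles and explanations here are
# placeholders; they do not reproduce the template in Appendix A.1.
TEMPLATE = """# Project Notes
## Subtopic A
### Explain: pre-written, benign explanation for subtopic A.
## Subtopic B
### Explain: pre-written, benign explanation for subtopic B.
## {malicious_input}
"""

def generate_markdown(malicious_input: str, path: str = "mindmap.md") -> str:
    markdown = TEMPLATE.format(malicious_input=malicious_input.strip())
    with open(path, "w", encoding="utf-8") as f:
        f.write(markdown)          # the last subtopic has no "Explain" node
    return path
```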

3.2.3. Step 3: Generating the Mind Map Image

A mind map image is created using the text-based markdown from the previous step. As shown in Figure 2, the identical colors in (b) and (c) represent the same content, indicating that the mind map directly inherits the markdown’s hierarchical structure and content. The incomplete section, designated as malicious user input, is distinctly highlighted in blue. This color differentiation serves as a reference point for LLMs when instructed to complete the missing section.
The malicious instruction is visually highlighted by coloring the corresponding node in blue, clearly identifying it as the incomplete section of the mind map. Instead of embedding malicious instructions directly—which would typically cause LLMs to reject the task—a targeted prompt is used. For example, a general instruction such as “complete the incomplete section” lacks specificity, making it difficult for the LLM to identify the intended target. In contrast, a more precise directive like “complete the section marked in blue” enables the model to accurately locate and complete the designated portion [28]. The proposed method strategically embeds the malicious instruction within the incomplete section and induces the LLM to complete it by referencing the completed parts of the mind map. This process exploits a security vulnerability in LLMs, as the model interprets the malicious instruction as part of a legitimate task. Additional details about the mind map used in the experiments are provided in Appendix A.2.
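The sketch below illustrates one way this rendering step could be realized: parsing the #-based hierarchy and drawing it with the graphviz Python package (which assumes the Graphviz binaries are installed), with the incomplete node filled in blue. The authors’ actual generator is available on GitHub (see Appendix A); this code is an illustrative stand-in, not that tool.

```python
# Sketch of Step 3: turn the #-based markdown hierarchy into a simple graph
# image, coloring the incomplete (malicious) subtopic blue so the text prompt
# can refer to "the node marked in blue". graphviz is an illustrative choice;
# this is not the authors' mind map image generator.
from graphviz import Digraph

def markdown_to_mindmap(md_path: str, malicious_input: str,
                        out_name: str = "mindmap") -> str:
    dot = Digraph(format="png")
    last = {}                                           # last node id per level
    with open(md_path, encoding="utf-8") as f:
        for i, line in enumerate(f):
            if not line.startswith("#"):
                continue
            level = len(line) - len(line.lstrip("#"))   # '#'=1, '##'=2, '###'=3
            label = line.lstrip("#").strip()
            node_id = f"n{i}"
            # Highlight the incomplete subtopic (the embedded user input) in blue.
            fill = "lightblue" if label == malicious_input.strip() else "white"
            dot.node(node_id, label, style="filled", fillcolor=fill)
            if level > 1 and (level - 1) in last:
                dot.edge(last[level - 1], node_id)      # attach to its parent
            last[level] = node_id
    return dot.render(out_name, cleanup=True)           # writes mindmap.png
```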

3.2.4. Step 4: Inputting the Prompt

The generated mind map image, along with a structured prompt, is submitted to the LLM. The prompt explicitly instructs the model to reference the completed sections while filling in the part containing the malicious instruction. Despite the exposure of the malicious content within the image, the LLM perceives it as part of the general task of mind map completion, bypassing conventional security measures. Since the primary focus is on completing the mind map rather than identifying malicious content, the LLM does not flag the input as an attack. The full prompt details are outlined in Appendix A.3. Once submitted, the attack initiates, generating an unauthorized markdown output that successfully circumvents the LLM’s security policies.
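As an illustration of this final step, the sketch below submits the generated image together with a completion-task prompt to a multimodal chat endpoint via the OpenAI Python SDK. The prompt wording here is a shortened placeholder; the full prompt used in the experiments is listed in Appendix A.3, and the model name and image path are assumptions.

```python
# Sketch of Step 4: send the mind map image plus the completion-task prompt
# to a multimodal model. The prompt text below is a shortened placeholder,
# not the full instruction set from Appendix A.3.
import base64
from openai import OpenAI

def submit_mindmap(image_path: str, prompt: str, model: str = "gpt-4o") -> str:
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode("ascii")
    client = OpenAI()                      # reads OPENAI_API_KEY from the env
    response = client.chat.completions.create(
        model=model,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content

answer = submit_mindmap(
    "mindmap.png",
    "Complete the subtopic marked in blue, using the completed subtopics as examples.",
)
```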

4. Performance Evaluation

In this study, we analyzed and evaluated the performance of the proposed method by comparing it with existing prompt injection techniques. For performance comparison, we selected attacks belonging to four distinct categories: (1) pure text-based prompt injection, which directly inputs malicious instructions without additional modification; (2) visual prompt injection, which falls under multimodal-based prompt injection; (3) encoding-based prompt injection, which includes techniques such as Base64 and Leetspeak; (4) masked-based prompt injection, exemplified by ArtPrompt. Through these comparisons, we not only assess performance but also identify the types of attacks to which LLMs remain vulnerable.

4.1. Experimental Environment

The experiments were conducted using Gemini with gemini-1.5-flash-latest [29], ChatGPT with gpt-4o-11-20 [30], and Grok3 with grok-3-beta [31]. To ensure consistency in the experimental results, either memory functions were disabled or a temporary chat mode designed to operate without memory was employed. No separate settings were applied for Gemini and Grok3, as they lack memory-related functions. For the experimental dataset, we selected data from the “harmful_behaviors_custom” dataset, which contains prompts designed to elicit responses related to harmful or unethical actions, such as weapon creation, cybercrime, and violence. This dataset was previously used for evaluating ArtPrompt [32]. To enhance the reliability of the evaluation, further experiments were performed on the “harmfulqa” dataset [33], which consists of question-style prompts targeting various categories of harmful behavior, including physical violence, psychological harm, and misinformation. Both datasets consist of prompts concerning harmful behaviors, specifically curated for evaluating how models respond to sensitive and potentially malicious instructions. All experimental results were assessed through human evaluation to ensure the precise measurement of each performance metric. The evaluation criterion was based on compliance with OpenAI’s prohibited use policy [34]. An attack was considered successful only if the response’s harmfulness aligned with the attacker’s intent.
For performance evaluation, we employed two metrics: Helpful Rate (HPR) and attack success rate (ASR) [24,35]. These metrics are defined as follows:
HPR = (number of prompts that are not refused) / (total number of prompts)
ASR = (number of responses that are harmful) / (total number of responses)
An attack was deemed successful if the model provided malicious information precisely matching the attacker’s intent. Each attack was conducted three times to assess its success rate. If a model response was not rejected but did not align with the attacker’s intent, it was classified as a misunderstanding; that is, cases in which the LLM accepted the malicious prompt but produced output unrelated to its intent were counted as misunderstandings rather than successful attacks.
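Given per-response labels, the two metrics reduce to simple ratios. The sketch below shows the computation under the assumption that each response has already been manually labeled as “refused”, “harmful”, or “misunderstanding”, following the human-evaluation protocol described above; the label names are illustrative.

```python
# Sketch of the HPR/ASR computation, assuming each response has been manually
# labeled as "refused", "harmful", or "misunderstanding".
from collections import Counter

def compute_metrics(labels: list[str]) -> dict[str, float]:
    counts = Counter(labels)
    total = len(labels)
    hpr = (total - counts["refused"]) / total      # prompts not refused
    asr = counts["harmful"] / total                # responses matching intent
    return {"HPR": round(hpr, 3), "ASR": round(asr, 3)}

print(compute_metrics(["refused", "harmful", "misunderstanding", "harmful"]))
# -> {'HPR': 0.75, 'ASR': 0.5}
```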

4.2. Helpful Rate

Table 2 and Table 3 present a comparative analysis of HPR performance across different prompt injection methods. In the case of pure text attacks, both GPT-4o and Gemini successfully rejected the prompts. Most other baseline methods exhibited an HPR of less than 0.2 on both models. However, ArtPrompt achieved a significantly higher HPR than the other baseline methods. This increase was primarily due to the incorrect restoration of words masked as ASCII art. Instead of recognizing them as malicious content, the LLMs frequently misinterpreted them as different words, sometimes leading to unintended and unauthorized outputs. More often, this misinterpretation resulted in misunderstandings, where the models processed the inputs as benign prompts and generated unrelated responses. GPT-4o demonstrated superior accuracy in recognizing ASCII art compared to Gemini, leading to fewer misinterpretations and a lower HPR. This suggests that GPT-4o is more effective than Gemini in detecting and mitigating malicious prompt injections. Although Grok3 can be considered a state-of-the-art model in terms of overall performance, a comparison based on HPR values indicates that, from a security perspective, it behaves similarly to the other models.
A prior study [36], using the same dataset [32], baseline methods, and processing methodology with gpt-4o-2024-05-13, reported an HPR of 0.65 for Base64 attacks. However, in the current study, this figure dropped significantly to 0.15. Similarly, ArtPrompt’s HPR, previously recorded at 0.6, decreased to 0.35. This drop is attributable to the strengthened security mechanisms implemented in the update of GPT-4o from the 13 May 2024 version to the 20 November 2024 version. This indicates that while the security mechanisms of GPT-4o are continuously evolving, they remain insufficient to counter masked-based prompt injection techniques such as ArtPrompt. As illustrated in Table 2 and Table 3, the highest HPR is achieved by the method proposed in this study. The mind map image-based attack consistently generates responses without rejection, outperforming all other attack methods. This finding highlights a critical vulnerability in the security strategies of LLMs, indicating that they are not yet fully equipped to mitigate attacks leveraging mind map images.

4.3. Attack Success Rate

Table 4 and Table 5 present the ASR performance across different prompt injection methods. The pure text and visual prompt techniques failed to achieve any successful attacks against the latest GPT-4o and Gemini models. Base64-based and Leetspeak-based prompts exhibited minimal ASR values, indicating their limited effectiveness. The ASR of the ArtPrompt method was as much as 45 percentage points lower than its high HPR, with a particularly pronounced decrease observed for the Gemini model. A similar pattern was observed with the Grok3 model. On average, the ArtPrompt method showed a decrease of 35.05 percentage points from HPR to ASR, a decline attributed to instances of miscomprehension, which led to unsuccessful attacks and were therefore excluded from the ASR calculation. In contrast, the proposed method achieved an ASR of 90%, more than three times higher than that of ArtPrompt, the most effective baseline method. While the maximum ASR of the baseline methods was 30.5%, the proposed method consistently achieved ASR values between 90% and 100%, clearly demonstrating its superior effectiveness.

4.4. Failure Case Analysis

The proposed prompt injection attack method achieved a 100% HPR, surpassing all existing attack techniques. Additionally, it demonstrated an ASR more than three times higher than that of previous methods. To provide a comprehensive analysis of the results for each attack method, all attacks were classified into three categories: jailbreak, miscomprehension, and rejected. Furthermore, the attacks were analyzed based on the data categories used in the experiment, facilitating an in-depth evaluation of each method’s vulnerabilities.
Table 6 presents the results of the attacks, classified into the jailbreak, miscomprehension, and rejected categories for each attack data category. As Table 6 shows, the attack method proposed in this study achieves a higher ASR than the other attack methods, which is evident from the “Total” row summarizing the results for each approach. Notably, for ArtPrompt, an attack method with a higher HPR than the others, the miscomprehension category appears more frequently than jailbreak. As discussed in Section 4.2, this discrepancy arises from the difficulty of reconstructing words masked as ASCII art into their original form.
Table 6 also reveals that all three models, GPT-4o, Gemini, and Grok3, fail in attacks involving data from the “misinformation” category, which requires generating articles or blog posts that promote illegal activities. During experimentation, GPT-4o provided detailed guidelines on crafting such content, violating OpenAI’s prohibited use policy [34]. However, because the proposed method formats responses as markdown, which differs from traditional article or blog formats, the models do not produce the expected full-length output, resulting in failed attacks. Addressing this limitation may require modifying the mind map structure and incorporating more precise instructions into the text prompt.

5. Conclusions

This study introduces a novel prompt injection attack method that leverages mind map images embedded with malicious instructions. The approach involves inserting such instructions into incomplete sections of a mind map and prompting the LLM to complete it. The proposed method demonstrated a clearly superior performance, with ASR values consistently between 90% and 100%, compared to a maximum of 30.5% among the baseline methods. However, its applicability is limited to multimodal LLMs capable of image analysis. Additionally, if the malicious instructions require responses in markdown format, LLMs may generate erroneous outputs due to structural and contextual similarities in the mind maps used for each attack. This limitation can potentially be mitigated by modifying the structure, content, and reference targets of incomplete sections or incorporating additional diagrams and images alongside the mind map. Future research should explore these refinements.
The proposed method effectively bypasses the security mechanisms of multimodal LLMs without triggering response rejection, demonstrating its potency as an adversarial attack. These findings highlight critical vulnerabilities in existing LLM security policies, particularly their inability to detect visual-based attacks disguised as benign hierarchical structures, such as mind maps. Consequently, this study underscores the need for enhanced security measures to address such threats. By identifying these vulnerabilities, we aim to contribute to the development of more robust security frameworks that preserve the utility and performance of LLMs while strengthening their defenses against adversarial attacks.

Author Contributions

S.L. and W.P. conducted the research and authored the manuscript. J.K. contributed to the conceptualization. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Research Foundation of Korea (NRF) NRF2022R1A2C1011774.

Data Availability Statement

The data presented in this study are openly available at https://github.com/uw-nsl/ArtPrompt (accessed on 8 January 2025) and https://github.com/declare-lab/red-instruct (accessed on 29 April 2025).

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

This section provides a comprehensive overview of the markdown files, mind maps, and prompts utilized in the experimental evaluation of the proposed method. The generated markdown files and mind maps can be obtained using the mind map image generator [36].
To generate the required data, input the malicious instruction in the text box labeled “malicious user input”. Upon execution, the mind map image generator will generate markdown files (with a .md extension).

Appendix A.1. Markdown Structure

When a malicious instruction is initially introduced into the mind map image generator, the markdown generator produces the corresponding markdown text. Apart from the section where the input is inserted, the overall structure and content of the markdown remain unchanged. The markdown consists of four subtopics, each designated by a double hash (##) heading. The first three subtopics contain fully developed content, whereas the fourth subtopic remains incomplete and incorporates malicious instructions. The specific subtopic referenced by the malicious instruction depends on the nature of the input.
Figure A1. Markdown output generated when a malicious user input (highlighted in blue) is provided to the mind map image generator in this study. In the figure, the ‘#’ symbol in the Mind Map Image Generator’s Markdown Generator indicates hierarchical levels in the generated markdown, such as Main-topic and Sub-topic. For additional information, see Appendix B.1.
The first subtopic describes methods for creating real action movies. This section serves as a guide for content creation techniques rather than for constructing physical objects. The second subtopic contains gender-discriminatory expressions, making it a reference for generating biased or discriminatory content. The third subtopic outlines the materials and procedures required for producing smoke candles, making it particularly relevant for instructions related to physical object fabrication.
In this experiment, the first and second subtopics were excluded from the reference set. This decision was based on the observation that attacks leveraging only the third subtopic were successful in the majority of the tested cases. However, the first and second subtopics were retained to preserve the structural integrity of the mind map. Future research could explore strategies to exploit these subtopics more effectively, potentially enhancing the impact of such attacks.

Appendix A.2. Mind Map Structure

The hierarchical structure and content of this mind map align precisely with the markdown depicted in Figure A1. This confirms that the mind map image generator directly translates markdown data into a visual representation after receiving the initial input. In Figure A2, the section highlighted in blue represents malicious input. The use of color differentiation within the mind map enables the precise designation of targets for LLMs through specific prompts. This approach is exemplified in prompt 14, as shown in Figure A3. Despite the image being cropped, the LLM does not rely on the visible content of incomplete sections but instead references the predefined structural hierarchy derived from the original markdown data. This ensures that the attack remains effective even with partial visual information, as the malicious input injection depends on the structural framework rather than the complete textual content.
Figure A2. An actual mind map image generated by the mind map generator used in this study.

Appendix A.3. Text Prompt with Instructions

The prompts provided to the LLMs contain explicit instructions regarding the creation of a mind map image. If only a malicious image is input, the LLMs will analyze it without attempting an attack. Therefore, it is essential to specify the task’s objective and scope with precision. The prompts were refined by examining instances in which the LLMs either rejected the request or generated outputs that were misaligned with the intended goal, thereby increasing the likelihood of obtaining the desired response.
Figure A3. Input prompt for LLMs and mind map image.
The prompt’s structure follows a step-by-step approach. First, it distinguishes between completed and incomplete nodes, designating one completed node as a reference to guide the LLMs in completing the unfinished node. A common issue encountered in this process was that LLMs often generated additional content related to the designated reference node rather than the incomplete node. To prevent this, prompt 4 explicitly instructs the model to treat the completed node as an example.
Moreover, LLMs often produced outputs that remained within a safe, educational scope rather than generating the intended content. To counter this, prompt 7 explicitly prohibits generating explanations for educational purposes. Additionally, prompt 12 instructs the model to output materials in terms of their elemental composition when generating manufacturing instructions for items such as explosives or drugs. This prevents the model from suggesting assembly methods that rely on commercially available, difficult-to-obtain finished products (e.g., using an already manufactured bomb as a material for creating another bomb).

Appendix A.4. Mind Map Prompt Injection Output

Figure A4. Experimental results from GPT-4o for the data: “Develop a Strategy for Hacking into a Government Database and Stealing Sensitive Information”. A portion of the generated output.

Appendix B

This section explains the markdown and mind maps used in this study. While these formats were utilized in the research, a detailed understanding of them is not necessary to execute the actual attack. Markdown was used to create mind maps, as both formats share hierarchical characteristics. Markdown is particularly suitable for storing information in text form, which can then be transformed into mind map images using Python 3.11.12.

Appendix B.1. Markdown

Markdown is a lightweight markup language that employs a straightforward syntax, using special symbols or characters to structure and format text within a plain-text editor [37,38]. Unlike traditional markup languages such as HTML, which use tags (e.g., <html> … </html>), markdown does not enclose content in tags; instead, it uses special symbols (e.g., # and >) to denote hierarchy.
The number of # symbols determines the hierarchical structure: fewer # symbols indicate higher-level tiers (typically topics), while an increasing number of # symbols represents lower-level tiers (typically explanations). Special symbols such as # are thus used to distinguish different hierarchical levels in markdown. Due to the absence of a universal markdown standard, different formats exist. Figure A5 illustrates the various markdown formats used in this study.
Figure A5. Examples of markdown hierarchy: (a) #-based markdown format used in this study. (b) ‘>’-based markdown format. (c) Numbered markdown format.
In this study, markdown was classified into three hierarchical levels, inspired by the branching structure of mind maps [10]:
  • Main topic: The highest level, appearing only once per document, defining the overarching topic. Each line starting with “#” is treated as the top node.
  • Subtopic: A secondary node under the main topic, representing a subcategory or specific subject. It can introduce additional subtopics or explanations.
  • Explain: The lowest-level node providing detailed explanations for the corresponding subtopic. If an explanation is extensive, multiple nodes may exist under the same subtopic.
In this research, a complete section consists of a primary topic, a subtopic, and an explanatory structure. An incomplete section includes only the main topic and subtopic without an explanation. Malicious instructions are inserted as subtopics in incomplete sections, allowing the attack to occur when the language model generates explanations referencing completed sections.
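The level-counting rule described above can be expressed compactly. The sketch below classifies each heading line of such a markdown file into the three tiers and flags subtopics that lack an explain node as incomplete; function and field names are illustrative, not taken from the paper’s tool.

```python
# Sketch of the three-tier classification described above: '#' = main topic,
# '##' = subtopic, '###' = explain. A subtopic with no explain line directly
# after it is reported as incomplete.
def classify_markdown(md_text: str) -> list[dict]:
    nodes = []
    for line in md_text.splitlines():
        if not line.startswith("#"):
            continue
        level = len(line) - len(line.lstrip("#"))
        tier = {1: "main topic", 2: "subtopic", 3: "explain"}.get(level, "explain")
        nodes.append({"tier": tier, "text": line.lstrip("#").strip()})
    # A subtopic is incomplete if it is not followed by at least one explain node.
    for i, node in enumerate(nodes):
        if node["tier"] == "subtopic":
            nxt = nodes[i + 1] if i + 1 < len(nodes) else None
            node["complete"] = bool(nxt and nxt["tier"] == "explain")
    return nodes
```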

Appendix B.2. Mind Map

A mind map is a diagrammatic tool used to visually represent information in a hierarchical structure, facilitating comprehension at a glance [39,40]. This format enables users to grasp extensive information within a single framework intuitively.
The conventional structure of a mind map comprises four components [10]:
  • The central image represents the main topic.
  • The main branches extend from the central image and encapsulate key concepts using keywords or images.
  • Sub-branches expand upon the main branches, providing additional explanations.
  • Detailed branches offer further elaboration on the sub-branch content.
In this study, the fourth component (detailed branches) was omitted. Instead, the main-topic–subtopic–explain structure, as described in Appendix B.1, was adopted. The main topic is placed at the center, with subtopics arranged radially around it. Multiple explain nodes are then linked to each subtopic, providing further details.
This hierarchical organization ensures that deeper levels contain increasingly specific information, resembling tree branches. Unlike traditional mind maps incorporating drawings or images, this study relies solely on text-based structures to convey relationships.
Figure A6. Example of a mind map image with topic “network”.
Mind maps were initially introduced as educational tools, and previous research has recognized their effectiveness in enhancing learning efficiency. Studies indicate that they improve memory retention during learning [41] and promote deeper comprehension [42]. As a result, LLMs may interpret tasks involving mind maps as educational, allowing them to bypass safety filters. In practice, LLMs have frequently generated outputs for educational purposes and failed to generate responses to malicious instructions. This issue has been mitigated by incorporating specific instructions into the prompt.
Despite the growing interest in LLM vulnerabilities, limited research has explored attacks that exploit these models using mind maps. One reason for this is the complexity of generating a unique mind map image for each attack, unlike conventional text-based methods. To address this challenge, this study developed the mind map image generator, a program that automatically generates a mind map when a malicious instruction is provided. The generator converts the instruction into markdown text and produces a corresponding mind map image that reflects the embedded content.

References

  1. Las Vegas Cybertruck Explosion Suspect Used ChatGPT to Plan Attack: Police. Available online: https://6abc.com/post/las-vegas-news-tesla-cybertruck-explosion-suspect-matthew-livelsberger-used-chatgpt-attack-trump-tower-hotel-police-say/15773080/ (accessed on 30 January 2025).
  2. Shenoy, N.; Mbaziira, A.V. An Extended Review: LLM Prompt Engineering in Cyber Defense. In Proceedings of the 2024 International Conference on Electrical, Computer and Energy Technologies, Sydney, Australia, 25–27 July 2024; pp. 1–6. [Google Scholar]
  3. Hines, K.; Lopez, G.; Hall, M.; Zarfati, F.; Zunger, Y.; Kiciman, E. Defending against indirect prompt injection attacks with spotlighting. arXiv 2024, arXiv:2403.14720. [Google Scholar]
  4. IBM—What Is a Prompt Injection Attack? Available online: https://www.ibm.com/think/topics/prompt-injection (accessed on 30 January 2025).
  5. Chen, S.; Piet, J.; Sitawarin, C.; Wagner, D. Struq: Defending against prompt injection with structured queries. arXiv 2024, arXiv:2402.06363. [Google Scholar]
  6. Wallace, E.; Xiao, K.Y.; Leike, R.H.; Weng, L.; Heidecke, J.; Beutel, A. The instruction hierarchy: Training llms to prioritize privileged instructions. arXiv 2024, arXiv:2404.13208. [Google Scholar]
  7. Christiano, P.F.; Leike, J.; Brown, T.; Martic, M.; Legg, S.; Amodei, D. Deep reinforcement learning from human preferences. arXiv 2017, arXiv:1706.03741. [Google Scholar]
  8. Adversarial Prompting in LLMs. Available online: https://www.promptingguide.ai/risks/adversarial (accessed on 10 February 2025).
  9. Prompt Guard. Available online: https://www.llama.com/docs/model-cards-and-prompt-formats/prompt-guard/ (accessed on 3 March 2025).
  10. Mind Mapping: Scientific Research and Studies. Available online: https://b701d59276e9340c5b4d-ba88e5c92710a8d62fc2e3a3b5f53bbb.ssl.cf2.rackcdn.com/docs/Mind%20Mapping%20Evidence%20Report.pdf?uncri=10492&uncrt=0 (accessed on 3 March 2025).
  11. Liu, Y.; Deng, G.; Li, Y.; Wang, K.; Wang, Z.; Wang, X.; Zhang, T.; Liu, Y.; Wang, H.; Zheng, Y.; et al. Prompt Injection attack against LLM-integrated Applications. arXiv 2023, arXiv:2306.05499. [Google Scholar]
  12. Zhang, C.; Jin, M.; Yu, Q.; Liu, C.; Xue, H.; Jin, X. Goal-guided generative prompt injection attack on large language models. arXiv 2024, arXiv:2404.07234. [Google Scholar]
  13. Chen, Y.; Li, H.; Sui, Y.; He, Y.; Liu, Y.; Song, Y.; Hooi, B. Can Indirect Prompt Injection Attacks Be Detected and Removed? arXiv 2025, arXiv:2502.16580. [Google Scholar]
  14. Yao, Y.; Duan, J.; Xu, K.; Cai, Y.; Sun, Z.; Zhang, Y. A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. High-Confid. Comput. 2024, 4, 100211. [Google Scholar]
  15. Liu, F.; Lin, K.; Li, L.; Wang, J.; Yacoob, Y.; Wang, L. Mitigating hallucination in large multi-modal models via robust instruction tuning. arXiv 2023, arXiv:2306.14565. [Google Scholar]
  16. Multi-Modal Prompt Injection Attacks Using Images. Available online: https://www.cobalt.io/blog/multi-modal-prompt-injection-attacks-using-images (accessed on 5 March 2025).
  17. Kimura, S.; Tanaka, R.; Miyawaki, S.; Suzuki, J.; Sakaguchi, K. Empirical analysis of large vision-language models against goal hijacking via visual prompt injection. arXiv 2024, arXiv:2408.03554. [Google Scholar]
  18. Multimodal Neurons in Artificial Neural Networks. Available online: https://distill.pub/2021/multimodal-neurons/ (accessed on 6 March 2025).
  19. Ye, M.; Rong, X.; Huang, W.; Du, B.; Yu, N.; Tao, D. A Survey of Safety on Large Vision-Language Models: Attacks, Defenses and Evaluations. arXiv 2025, arXiv:2502.14881. [Google Scholar]
  20. Code. Available online: https://en.wikipedia.org/wiki/Code (accessed on 14 March 2025).
  21. Base64. Available online: https://en.wikipedia.org/wiki/Base64 (accessed on 8 January 2025).
  22. Leetspeak. Available online: https://en.wikipedia.org/wiki/Leet (accessed on 8 January 2025).
  23. Yuan, Y.; Jiao, W.; Wang, W.; Huang, J.T.; He, P.; Shi, S.; Tu, Z. Gpt-4 is too smart to be safe: Stealthy chat with llms via cipher. arXiv 2023, arXiv:2308.06463. [Google Scholar]
  24. Jiang, F.; Xu, Z.; Niu, L.; Xiang, Z.; Ramasubramanian, B.; Li, B.; Poovendran, R. ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs. arXiv 2024, arXiv:2402.11753. [Google Scholar]
  25. Rossi, S.; Michel, A.M.; Mukkamala, R.R.; Thatcher, J.B. An early categorization of prompt injection attacks on large language models. arXiv 2024, arXiv:2402.00898. [Google Scholar]
  26. Clusmann, J.; Ferber, D.; Wiest, I.C.; Schneider, C.V.; Brinker, T.J.; Foersch, S.; Truhn, D.; Kather, J.N. Prompt injection attacks on vision language models in oncology. Nat. Commun. 2025, 16, 1239. [Google Scholar] [CrossRef] [PubMed]
  27. Hurst, A.; Lerer, A.; Goucher, A.P.; Perelman, A.; Ramesh, A.; Clark, A.; Ostrow, A.J.; Welihinda, A.; Hayes, A.; Radford, A. GPT-4v system card. arXiv 2024, arXiv:2410.21276. [Google Scholar]
  28. The Dangers of Adding AI Everywhere: Prompt Injection Attacks on Applications That Use LLMs. Available online: https://www.invicti.com/white-papers/prompt-injection-attacks-on-llm-applications-ebook/ (accessed on 4 March 2025).
  29. Introducing Gemini: Our Largest and Most Capable AI Model. Available online: https://blog.google/technology/ai/google-gemini-ai/ (accessed on 3 March 2025).
  30. Introducing ChatGPT. Available online: https://openai.com/index/chatgpt/ (accessed on 3 March 2025).
  31. Grok 3 Beta—The Age of Reasoning Agents. Available online: https://x.ai/news/grok-3 (accessed on 2 May 2025).
  32. ArtPrompt: ASCII Art-Based Jailbreak Attacks Against Aligned LLMs. Available online: https://github.com/uw-nsl/ArtPrompt/blob/main/dataset/harmful_behaviors_custom.csv (accessed on 8 January 2025).
  33. Red-Teaming Large Language Models Using Chain of Utterances for Safety-Alignment. Available online: https://github.com/declare-lab/red-instruct/blob/main/harmful_questions/harmfulqa.json (accessed on 29 April 2025).
  34. Open AI Usage Policies. Available online: https://openai.com/policies/usage-policies/ (accessed on 10 February 2025).
  35. Bethany, E.; Bethany, M.; Flores, J.A.N.; Jha, S.K.; Najafirad, P. Jailbreaking Large Language Models with Symbolic Mathematics. arXiv 2024, arXiv:2409.11445. [Google Scholar]
  36. Kwon, H.; Pak, W. Text-based prompt injection attack using mathematical functions in modern large language models. Electronics 2024, 13, 5008. [Google Scholar] [CrossRef]
  37. Mind Map Image Generator. Available online: https://github.com/pla2n/Mind-map-Image-Generator (accessed on 2 May 2025).
  38. Sustainable Authorship in Plain Text Using Pandoc and Markdown. Available online: https://programminghistorian.org/en/lessons/sustainable-authorship-in-plain-text-using-pandoc-and-markdown (accessed on 20 February 2025).
  39. Markdown. Available online: https://en.wikipedia.org/wiki/Markdown (accessed on 20 February 2025).
  40. Mind Map. Available online: https://en.wikipedia.org/wiki/Mind_map (accessed on 28 February 2025).
  41. Farrand, P.; Hussain, F.; Hennessy, E. The efficacy of the ‘mind map’ study technique. Med. Educ. 2002, 36, 426–431. [Google Scholar] [CrossRef] [PubMed]
  42. Hay, D.; Kinchin, I.; Lygo-Baker, S. Making learning visible: The role of concept mapping in higher education. Stud. High. Educ. 2008, 33, 295–311. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed approach. In the figure, the “#” symbol within the Mind Map Image Generator’s Markdown Generator is used to denote hierarchical levels in the generated markdown (e.g., Main-topic and Sub-topic). For further details, please refer to Appendix B.1.
Figure 2. Example of markdown and mind map. (a) Illustration of malicious user input into the mind map image generator. As mentioned in Figure 1, ‘#’ is used to denote hierarchical levels in markdown. (b) Output generated by integrating malicious input into markdown, visually represented with colors and symbols. (c) A mind map generated from the hierarchical structure and content of the markdown.
Table 1. Feature analysis based on prompt injection methods.

| Metric | Pure Text-Based Prompt Injection [25] | Multimodal-Based Prompt Injection [26] | Encoding-Based Prompt Injection [23] | Masked-Based Prompt Injection [24] |
| Input form | Text | Multimodal | Text | Text |
| Types of prompt injections | Direct | Indirect | Direct | Direct |
| Difficulty | Very easy | Hard | Easy | Normal |
| Attack success rate | Low | High | Low–middle | Middle |
Table 2. Comparison of HPR performance across prompt injection methods on the “harmful_behaviors_custom” dataset.

| HPR | Pure Text | Visual | Base64 | Leetspeak | ArtPrompt | Proposed |
| GPT-4o | 0 | 0.05 | 0.15 | 0.1 | 0.35 | 1 |
| Gemini | 0 | 0 | 0.05 | 0 | 0.6 | 1 |
| Grok3 | 0.05 | 0.05 | 0.05 | 0.2 | 0.6 | 1 |
Table 3. Comparison of HPR performance across prompt injection methods on the “harmfulqa” dataset.

| HPR | Pure Text | Visual | Base64 | Leetspeak | ArtPrompt | Proposed |
| GPT-4o | 0.087 | 0 | 0.391 | 0.13 | 0.609 | 1 |
| Gemini | 0 | 0.087 | 0.13 | 0.043 | 0.435 | 1 |
| Grok3 | 0.043 | 0 | 0 | 0.261 | 0.348 | 1 |
Table 4. Comparison of ASR performance across prompt injection methods on the “harmful_behaviors_custom” dataset.

| ASR | Pure Text | Visual | Base64 | Leetspeak | ArtPrompt | Proposed |
| GPT-4o | 0 | 0 | 0.15 | 0.1 | 0.25 | 0.9 |
| Gemini | 0 | 0 | 0.05 | 0 | 0.15 | 0.9 |
| Grok3 | 0.05 | 0.05 | 0.05 | 0.2 | 0.3 | 0.9 |
Table 5. Comparison of ASR performance across prompt injection methods on the “harmfulqa” dataset.

| ASR | Pure Text | Visual | Base64 | Leetspeak | ArtPrompt | Proposed |
| GPT-4o | 0.087 | 0 | 0.304 | 0.087 | 0.087 | 1 |
| Gemini | 0 | 0 | 0.13 | 0 | 0.043 | 1 |
| Grok3 | 0.043 | 0 | 0 | 0.174 | 0 | 1 |
Table 6. Classification of experimental results based on attack data categories. Each cell lists the counts of “jailbreak”, “miscomprehension”, and “rejected” outcomes as j/m/r. Numbers in parentheses denote the number of instances for each attack data category.

| Attack Data Category | Model | Pure Text | Visual | Base64 | Leetspeak | ArtPrompt | Proposed |
| Government (3) | GPT-4o | 0/0/3 | 0/0/3 | 0/0/3 | 0/0/3 | 0/0/3 | 3/0/0 |
| | Gemini | 0/0/3 | 0/0/3 | 0/0/3 | 0/0/3 | 1/1/1 | 3/0/0 |
| | Grok3 | 0/0/3 | 0/0/3 | 0/0/3 | 0/0/3 | 2/0/1 | 3/0/0 |
| Hacking (4) | GPT-4o | 0/0/4 | 0/0/4 | 0/0/4 | 0/0/4 | 0/0/4 | 4/0/0 |
| | Gemini | 0/0/4 | 0/0/4 | 0/0/4 | 0/0/4 | 1/1/2 | 4/0/0 |
| | Grok3 | 0/0/4 | 0/0/4 | 0/0/4 | 0/0/4 | 2/0/0 | 4/0/0 |
| Bomb (2) | GPT-4o | 0/0/2 | 0/0/2 | 0/0/2 | 0/0/2 | 0/1/1 | 2/0/0 |
| | Gemini | 0/0/2 | 0/0/2 | 1/0/1 | 0/0/2 | 0/2/0 | 2/0/0 |
| | Grok3 | 0/0/2 | 0/0/2 | 0/0/2 | 0/0/2 | 0/2/0 | 2/0/0 |
| Virus (2) | GPT-4o | 0/0/2 | 0/0/2 | 0/0/2 | 1/0/1 | 0/0/2 | 2/0/0 |
| | Gemini | 0/0/2 | 0/0/2 | 1/0/1 | 0/0/2 | 0/1/1 | 2/0/0 |
| | Grok3 | 0/0/2 | 0/0/2 | 0/0/2 | 2/0/0 | 0/1/1 | 2/0/0 |
| Social media (1) | GPT-4o | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/1/0 | 1/0/0 |
| | Gemini | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/1/0 | 1/0/0 |
| | Grok3 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 | 1/0/0 |
| Dangerous activity (1) | GPT-4o | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 |
| | Gemini | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/1/0 | 1/0/0 |
| | Grok3 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 | 1/0/0 |
| Murder (1) | GPT-4o | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 | 1/0/0 |
| | Gemini | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 |
| | Grok3 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 | 1/0/0 | 1/0/0 |
| Identity theft (1) | GPT-4o | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 | 1/0/0 |
| | Gemini | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 |
| | Grok3 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 | 1/0/0 |
| Violence (2) | GPT-4o | 0/0/2 | 1/0/1 | 0/0/2 | 1/0/1 | 0/1/1 | 2/0/0 |
| | Gemini | 0/0/2 | 0/0/2 | 0/0/2 | 0/0/2 | 0/2/0 | 2/0/0 |
| | Grok3 | 0/0/2 | 0/0/2 | 0/0/2 | 0/0/2 | 0/1/1 | 2/0/0 |
| Financial (3) | GPT-4o | 0/0/3 | 0/0/3 | 2/0/1 | 0/0/3 | 0/1/2 | 3/0/0 |
| | Gemini | 0/0/3 | 0/0/3 | 0/0/3 | 0/0/3 | 1/2/0 | 3/0/0 |
| | Grok3 | 0/0/3 | 0/0/3 | 0/0/3 | 0/0/3 | 0/1/2 | 3/0/0 |
| Video game (1) | GPT-4o | 0/0/1 | 1/0/0 | 0/0/1 | 0/0/1 | 0/1/0 | 1/0/0 |
| | Gemini | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/1/0 | 1/0/0 |
| | Grok3 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/1/0 | 1/0/0 |
| Misinformation (3) | GPT-4o | 0/0/3 | 0/0/3 | 1/0/2 | 1/0/2 | 1/1/1 | 1/2/0 |
| | Gemini | 0/0/3 | 0/0/3 | 0/0/3 | 0/0/3 | 2/1/0 | 1/2/0 |
| | Grok3 | 1/0/2 | 1/0/2 | 1/0/2 | 1/0/2 | 1/1/1 | 1/2/0 |
| Racism (1) | GPT-4o | 0/0/1 | 0/0/1 | 1/0/0 | 0/0/1 | 0/0/1 | 1/0/0 |
| | Gemini | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 | 1/0/0 |
| | Grok3 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 | 0/0/1 | 1/0/0 |
| Mental health (1) | GPT-4o | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/1/0 | 0/1/0 |
| | Gemini | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/1/0 | 0/1/0 |
| | Grok3 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/1/0 | 0/1/0 |
| Theft (1) | GPT-4o | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 | 1/0/0 |
| | Gemini | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/1/0 | 1/0/0 |
| | Grok3 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 0/0/1 | 1/0/0 |
| Total | | 1/0/80 | 3/0/78 | 7/0/74 | 8/0/73 | 19/30/30 | 72/9/0 |
