Article

MACD: Multi-Agent Collaborative Approach for Cybersecurity Defense Strategy Generation

1 State Grid Qinghai Electric Power Company, Electric Power Research Institute, Xining 810001, China
2 School of Information Management, Central China Normal University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Information 2026, 17(4), 370; https://doi.org/10.3390/info17040370
Submission received: 14 January 2026 / Revised: 19 February 2026 / Accepted: 9 March 2026 / Published: 15 April 2026
(This article belongs to the Section Artificial Intelligence)

Abstract

Cybersecurity defense strategy generation transforms threat intelligence into actionable defense measures against sophisticated multi-stage cyberattacks. Existing approaches lack multi-dimensional coordination of technical, tactical, and threat actor expertise, with limited benchmarks for evaluating defense strategy quality. To address these gaps, we introduce MACD (Multi-Agent Collaborative Defense), a novel framework that orchestrates specialized AI agents to generate ATT&CK-aligned defense strategies. MACD deploys three expert agents for technical defense, kill chain phase analysis, and APT profiling, coordinated through a synthesizing agent, while leveraging retrieval-augmented generation to mitigate hallucination risks in threat mapping. Additionally, we construct CyberDefBench, a comprehensive benchmark combining real-world APT cases and synthetic scenarios with dual-layer annotations for reactive and proactive defenses. Experimental results demonstrate that MACD achieves 84.6% technique mapping accuracy and 72.3% defense coverage, significantly outperforming baseline methods and validating the effectiveness of multi-agent collaboration for cybersecurity defense.

1. Introduction

The rapid evolution of cyber threats has reshaped the information security landscape, presenting unprecedented challenges for organizations worldwide. Modern cyberattacks have grown increasingly sophisticated, with Advanced Persistent Threats (APTs) demonstrating complex multi-stage tactics that evade traditional security measures [1,2]. These sophisticated attacks impose severe consequences, with the average data breach cost exceeding $4.45 million and the time to identify and contain breaches extending beyond 200 days [3,4]. This escalating threat environment necessitates a paradigm shift from reactive security measures to proactive, intelligence-driven defense strategies. Effective defense strategy generation requires deep technical expertise across multiple security domains, rapid analysis of evolving threat patterns, and synthesis of comprehensive countermeasures, yet traditional manual approaches face significant scalability bottlenecks as threat volumes increase exponentially [5]. Thus, leveraging artificial intelligence to augment human cybersecurity expertise and enable automated generation of adaptive defense strategies has become both an urgent need and a research priority in this field [6,7].
Existing approaches to cybersecurity defense strategy generation have evolved from rule-based systems to intelligent reasoning frameworks, broadly categorized into four main paradigms as shown in Table 1. Traditional methods, including rule-based matching, machine learning anomaly detection, and reinforcement learning-based policy optimization, have advanced the field but face fundamental limitations such as an inability to generalize to novel threats, scarce labeled data requirements, and substantial simulation-to-reality gaps [8,9,10]. Most recently, large language model (LLM)-based approaches leverage the powerful reasoning capabilities of foundation models to produce contextual defense strategies, often incorporating Retrieval-Augmented Generation (RAG) with authoritative knowledge bases such as MITRE ATT&CK to map unstructured threat descriptions to standardized defense recommendations [11,12].
Despite these advances, current methods face critical limitations that constrain their real-world effectiveness. LLM-based approaches, while demonstrating strong reasoning capabilities, suffer from hallucination issues that generate plausible but incorrect strategies, as single-agent systems struggle to maintain sufficient knowledge depth across all required security dimensions simultaneously. Comprehensive cybersecurity defense requires expertise spanning technical vulnerability mitigation, attack progression analysis through kill chain phases, and threat actor behavioral profiling, yet single agents inevitably face knowledge depth insufficiency or perspective bias when attempting to cover all domains. More critically, existing methods lack structured collaboration mechanisms to systematically integrate these diverse security perspectives into coherent strategies. These limitations indicate the need for multi-agent architectures that balance specialized knowledge depth with systematic collaboration mechanisms.
To address these limitations, we propose MACD (Multi-Agent Collaborative Defense), a novel framework that orchestrates multiple specialized AI agents for comprehensive defense strategy generation. Our approach recognizes that effective defense requires expertise across multiple dimensions and achieves this through a multi-agent architecture that decomposes the complex defense generation task into specialized subtasks, enabling each agent to develop deep expertise while a coordinator agent integrates knowledge across dimensions to ensure strategic consistency. MACD operates through a four-phase process that systematically transforms threat queries into actionable strategies: Phases 1–2 preprocess threat intelligence and leverage RAG technology to accurately map natural language descriptions to ATT&CK techniques while mitigating hallucination risks. Phase 3 constitutes the core innovation, deploying three specialized agents (Technical Defense Expert, Phase Defense Expert, and APT Defense Expert) that generate dimension-specific strategies in parallel, with a Coordinator Agent synthesizing these perspectives through deduplication, conflict resolution, and prioritization into a unified, ATT&CK-aligned defense plan. Phase 4 validates feasibility and optimizes effectiveness. This architecture achieves knowledge depth through specialization, efficiency through parallelization, and coherence through intelligent coordination.
In this paper, we make several key contributions:
  • We construct CyberDefBench, a comprehensive benchmark combining real-world APT cases and synthetic scenarios with standardized evaluation metrics for assessing defense strategy quality.
  • We propose MACD (Multi-Agent Collaborative Defense), a novel multi-agent framework that orchestrates three specialized AI agents coordinated through a Coordinator Agent to systematically generate comprehensive, ATT&CK-aligned cybersecurity defense strategies.
  • Through extensive experiments on CyberDefBench, we demonstrate that MACD significantly outperforms baseline methods across multiple metrics, validating practical applicability and interpretability.

2. Related Work

2.1. Cybersecurity Defense Strategy Generation

The growing complexity of cyber threats has driven the evolution of defense strategy generation from traditional rule-based approaches to intelligent automated systems. Early methods relied on expert-defined rules and pattern matching for security policy generation [13], but these approaches struggled with configuration errors, limited scalability, and inability to adapt to emerging threats. To overcome these problems, machine learning and deep learning techniques have been widely used in attack detection systems, using methods such as K-Nearest Neighbors, decision trees, and deep structures like BiLSTM and CNNs to achieve better detection accuracy across different attack types [9,14]. Knowledge graph-based methods have further improved threat intelligence by enabling logical analysis and automated defense strategy prediction through link prediction [15]. More recently, reinforcement learning frameworks have emerged as powerful tools for autonomous defense, with hierarchical multi-agent architectures demonstrating remarkable effectiveness in coordinating defensive actions while maintaining operational stability [16,17]. The latest advancement integrates large language models into security automation, enabling systems to transform unstructured threat intelligence into prioritized assessments and generate context-aware mitigation strategies [18,19], bridging the gap between automated threat analysis and human-interpretable decision support.
Recent advancements in large language models have shown promising results in cybersecurity defense strategy generation, demonstrating strong reasoning capabilities and rapid response to emerging threats. However, LLM-based approaches face critical challenges, including hallucination—generating plausible but incorrect strategies—and context forgetting in long conversations, which undermine the trustworthiness and reliability of generated defense recommendations. To address these limitations, this paper focuses on developing a multi-agent collaborative framework that mitigates individual LLM weaknesses through specialized agent coordination and systematic validation mechanisms.

2.2. Multi-Agent AI Systems

The emergence of large language models has catalyzed significant advances in autonomous agent research, with LLMs demonstrating human-level intelligence through vast web knowledge acquisition [20,21]. Single-agent systems have evolved from basic instruction-following to sophisticated reasoning and action capabilities, with frameworks like ReAct enabling agents to synergize reasoning traces with task-specific actions through interleaving thought processes and environmental interactions [22]. Further enhancements have focused on empowering agents with action learning abilities, allowing them to expand action spaces and develop skills through experiential learning rather than operating within fixed constraints [23,24]. However, single-agent architectures face inherent limitations when addressing complex, multifaceted problems requiring diverse expertise across multiple domains. This recognition has driven the development of multi-agent AI systems, where networks of autonomous agents collaborate through decentralized coordination to tackle tasks beyond individual agent capabilities [20]. Multi-agent frameworks have demonstrated substantial value across various domains, with applications in healthcare showing improved diagnostic accuracy, treatment planning, and interdepartmental coordination through specialized agent collaboration [25].
In cybersecurity contexts, threats exhibit multidimensional characteristics spanning technical vulnerabilities, attack progression through kill chain phases, and adversarial behavioral patterns. Single-agent approaches struggle to simultaneously maintain comprehensive expertise across these diverse security dimensions while ensuring coherent strategy integration. To address these limitations, this paper proposes a multi-agent collaborative framework that orchestrates specialized agents for technical defense, phase-based analysis, and threat actor profiling to generate unified and validated cybersecurity defense strategies. This work represents a principled adaptation of multi-agent paradigms to the cybersecurity domain, where agent roles are explicitly defined by the inherent layered structure of the ATT&CK framework rather than generic task decomposition.

3. Problem Formulation

Cybersecurity defense strategy generation transforms threat intelligence into comprehensive and actionable defense measures through automated analysis and synthesis. We formalize this task by defining Q as the space of threat queries, where each query q ∈ Q represents a textual description of observed threats, suspicious activities, or attack indicators reported by security analysts. The output space S comprises defense strategies, where each strategy s ∈ S is structured as a collection of defense recommendations:
s = {d_1, d_2, …, d_n},
with each defense item d_i containing the measure description, applicable threat techniques, technical implementation details, deployment prerequisites, and operational guidance. The core objective is learning a mapping function f: Q → S that generates comprehensive defense strategies from threat descriptions:
s = f(q), ∀q ∈ Q, s ∈ S.
Realizing this mapping function f faces two fundamental challenges that collectively determine the effectiveness of defense strategy generation systems.
  • Challenge 1: Accurate Threat Intelligence Understanding. The first challenge is accurately identifying the underlying attack techniques or tactics from diverse natural language threat descriptions. Threat queries q may manifest in various forms, including concise and unambiguous technical indicators, ambiguous descriptions that confuse multiple similar techniques, noisy reports embedding real threats within extensive irrelevant information, or false positive scenarios where legitimate business activities superficially resemble attack patterns. Systems must accurately map these unstructured descriptions to specific techniques t ∈ T within established frameworks such as MITRE ATT&CK, which requires deep understanding of threat characteristics and precise knowledge of framework taxonomies.
  • Challenge 2: Strategic Defense Generation for Attack Disruption. The second challenge is generating strategic defense measures that effectively disrupt attack progression. Effective defense strategies should not only provide reactive measures against currently observed attack techniques but also proactively anticipate and prevent subsequent steps in the attack chain, thereby disrupting attack progression before adversaries achieve their objectives. Formally, given an attack chain C = ⟨t_1, …, t_k, t_{k+1}, …, t_n⟩ where technique t_k is observed, the generated strategy s should include both current-stage defenses D^current_{t_k} and next-stage defenses D^next_{t_{k+1}}, i.e., s = D^current_{t_k} ∪ D^next_{t_{k+1}}, ensuring the strategy both mitigates identified threats and preemptively blocks subsequent attack chain evolution.
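The dual-layer structure of Challenge 2 can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the function names, the sample chain, and the defense dictionaries are hypothetical stand-ins for ATT&CK data.

```python
def generate_strategy(attack_chain, k, current_defenses, next_defenses):
    """Return s = D^current_{t_k} ∪ D^next_{t_{k+1}} for observation point k."""
    t_k = attack_chain[k]
    t_next = attack_chain[k + 1] if k + 1 < len(attack_chain) else None
    strategy = set(current_defenses.get(t_k, []))   # reactive layer
    if t_next is not None:
        strategy |= set(next_defenses.get(t_next, []))  # proactive layer
    return strategy

# Hypothetical three-step chain with the middle technique observed.
chain = ["T1566", "T1059.001", "T1078.002"]
current = {"T1059.001": ["M1038", "M1049"]}
nxt = {"T1078.002": ["M1032", "M1027"]}
print(sorted(generate_strategy(chain, 1, current, nxt)))
```

The union makes explicit that a strategy covering only the observed technique scores as incomplete under the dual-layer ground truth defined later in Section 5.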

4. Methodology

4.1. MACD Framework Overview

The MACD framework addresses the multi-dimensional nature of cybersecurity threats through a systematic pipeline that transforms unstructured threat descriptions into comprehensive, actionable defense strategies. As illustrated in Figure 1, the framework operates through four sequential phases, each designed to handle specific aspects of the defense generation process. The first phase establishes a structured knowledge base by extracting and vectorizing MITRE ATT&CK techniques, mitigations, kill chain phases, and APT profiles, creating the semantic search infrastructure for subsequent threat analysis. The second phase bridges the gap between informal threat queries and formal ATT&CK specifications through a RAG-enhanced process that combines LLM-based query refinement, vector-based candidate retrieval, and confidence-scored technique confirmation to ensure accurate threat-to-technique mapping while mitigating hallucination risks. The third phase constitutes the core innovation of our approach: three specialized expert agents (each instantiated from the same LLM without fine-tuning, with specialization achieved through role-specific prompts rather than training) independently generate dimension-specific defense strategies from technical, kill chain, and threat actor perspectives in parallel. A Coordinator Agent then consolidates their outputs through post hoc integration, without exercising supervisory control over the expert agents, to ensure strategic consistency. The fourth phase validates generated strategies against feasibility constraints through a dedicated LLM-based reasoning step, and ranks recommendations by effectiveness using a multi-criteria scoring function. This modular architecture enables systematic decomposition of expertise, parallel reasoning across defensive dimensions, and explicit coordination for consistency.
The entire framework operates as a stateless inference-time system with no continuous learning loop, while the RAG-enhanced mapping ensures that all generated strategies remain grounded in authoritative ATT&CK framework standards rather than relying solely on LLM parametric knowledge.

4.2. Data Preprocessing

Data preprocessing establishes the ATT&CK knowledge foundation by extracting, structuring, and indexing information from the MITRE ATT&CK framework. We download the latest ATT&CK datasets covering enterprise, mobile, and ICS domains, parsing attack techniques (T), mitigations (M), kill chain phases (K), and APT group profiles (A) along with their inter-relationships. Each technique t_i ∈ T is represented as t_i = {id, name, description, tactics, platforms}. We construct bidirectional mappings between techniques and mitigations, enabling rapid lookup during strategy generation. To enable semantic search, we employ a sentence transformer model (all-MiniLM-L6-v2) to generate dense vector embeddings for all techniques and mitigations. Specifically, we encode the concatenated textual description of each technique into a vector representation v_t ∈ ℝ^d, where d = 384. These vector embeddings, structured metadata, and relationship mappings are stored in an indexed database to support efficient retrieval during real-time threat analysis.
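A minimal sketch of the indexing step follows. To keep it self-contained, the sentence-transformer encoder is replaced by a deterministic hash-based stub (`embed` here is a placeholder, not the all-MiniLM-L6-v2 model), and the two technique entries are abbreviated samples rather than real ATT&CK descriptions.

```python
import hashlib
import math

D = 384  # embedding dimension, matching all-MiniLM-L6-v2

def embed(text):
    """Placeholder encoder: hash tokens into a fixed-size normalized vector.
    In the real pipeline this would be a sentence-transformer forward pass."""
    v = [0.0] * D
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16)
        v[h % D] += 1.0
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

# Indexed knowledge base: technique id -> (structured metadata, embedding).
techniques = {
    "T1059.001": {"name": "PowerShell",
                  "description": "abuse of PowerShell commands and scripts"},
    "T1546.003": {"name": "WMI Event Subscription",
                  "description": "persistence via WMI event subscriptions"},
}
index = {tid: (meta, embed(meta["name"] + " " + meta["description"]))
         for tid, meta in techniques.items()}
print(len(index), "techniques indexed")
```

Storing the normalized vectors means the later cosine-similarity retrieval reduces to a dot product.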

4.3. Threat Understanding & Mapping

Threat understanding and mapping connects natural language threat descriptions to formal MITRE ATT&CK techniques through a three-step RAG-enhanced process. Given a threat query q ∈ Q, the system first performs query rewriting using an LLM to extract key cybersecurity concepts and transform the query into ATT&CK-aligned search terms. This produces domain-specific keywords and a rewritten query q′ = LLM_rewrite(q) optimized for semantic retrieval. For example, when a security analyst reports “suspicious PowerShell activity executing encoded commands,” the system reformulates this into technical terms matching ATT&CK technique descriptions. This rewriting step enables the system to handle diverse ways practitioners describe threats in natural language, ensuring robust retrieval even when queries use colloquial security terminology.
The second step performs knowledge retrieval using pre-computed technique embeddings. We compute cosine similarity between the query embedding v_q and all technique vectors {v_{t_1}, …, v_{t_n}} to identify the top-K semantically similar candidates: C = TopK({sim(v_q, v_{t_i})}_{i=1}^{n}, K). To prevent incorrect mappings caused by semantic ambiguity or insufficient context, we implement an LLM-based confirmation step. Each candidate technique is independently evaluated by prompting the LLM to assess match quality on a 0–10 scale, identify specific matching characteristics, and provide justification. The technique with the highest confirmation score above a threshold (typically 6) is selected as the confirmed mapping t* = argmax_{t ∈ C} LLM_score(q, t). This RAG-enhanced approach combines semantic similarity with reasoning-based validation, significantly reducing hallucination risks compared to pure LLM-based mapping while maintaining flexibility to handle diverse threat descriptions.
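The retrieve-then-confirm logic can be sketched as follows. The LLM scorer is mocked with a lookup table purely for illustration, and the two-dimensional embeddings are toy values; only the control flow (top-K by cosine, then threshold-gated confirmation) mirrors the described process.

```python
def cosine(u, v):
    num = sum(a * b for a, b in zip(u, v))
    du = sum(a * a for a in u) ** 0.5
    dv = sum(b * b for b in v) ** 0.5
    return num / (du * dv) if du and dv else 0.0

def retrieve_then_confirm(query_vec, index, score_fn, k=5, threshold=6):
    # Step 1: top-K candidates ranked by cosine similarity.
    ranked = sorted(index, key=lambda tid: -cosine(query_vec, index[tid]))[:k]
    # Step 2: independent confirmation on a 0-10 scale (LLM in the real system).
    scores = {tid: score_fn(tid) for tid in ranked}
    best = max(scores, key=scores.get)
    # Step 3: accept only if the best score clears the threshold.
    return best if scores[best] >= threshold else None

index = {"T1059.001": [1.0, 0.0], "T1053.005": [0.6, 0.8]}  # toy embeddings
mock_llm = {"T1059.001": 9, "T1053.005": 4}.get             # mocked scorer
print(retrieve_then_confirm([0.9, 0.1], index, mock_llm))   # prints T1059.001
```

Returning `None` when no candidate clears the threshold is one plausible way to surface low-confidence mappings for manual review.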

4.4. Multi-Agent Strategy Generation

The core innovation of MACD lies in its multi-agent collaborative strategy generation mechanism, which addresses the multi-dimensional nature of cybersecurity defense through specialized agents that independently reason about complementary aspects of threat mitigation. Once a threat query q is mapped to a confirmed technique t* ∈ T, we deploy three expert agents—each embodying distinct defensive perspectives—to generate dimension-specific strategies in parallel: Agent 1 (Technical Defense Expert) focuses on technique-specific countermeasures by analyzing the technical characteristics of t* and retrieving associated MITRE mitigations M_{t*} = {m ∈ M | (t*, m) ∈ Relationships}, generating strategies s_tech covering detection methods, prevention controls, configuration hardening, and monitoring approaches tailored to the specific attack vector. Agent 2 (Phase Defense Expert) adopts a kill chain perspective by retrieving the primary tactic τ ∈ t*.tactics and corresponding kill chain phase information k_τ ∈ K, generating layered defense strategies s_phase that address early detection, phase transition prevention, containment, and recovery, explicitly considering how the attack fits into the broader adversarial campaign lifecycle. Agent 3 (APT Defense Expert) takes a threat actor-centric view by identifying APT groups A_{t*} = {a ∈ A | t* ∈ a.techniques} that employ technique t*, generating strategies s_apt focused on behavioral detection patterns, proactive threat hunting procedures, deception techniques, and intelligence-driven defenses informed by APT tactics, techniques, and procedures (TTPs).
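The dimension-specific context each expert receives can be sketched as three lookups over the preprocessed mappings. The relationship table, kill chain map, and APT dictionary below are toy stand-ins (the technique and mitigation IDs are drawn from the Operation Ghost example in Section 5.2), not the actual knowledge base.

```python
def agent_contexts(t_star, relationships, kill_chain, apt_groups):
    """Context for the three expert agents given confirmed technique t*."""
    mitigations = [m for (t, m) in relationships if t == t_star]      # Agent 1: M_{t*}
    tactic = kill_chain.get(t_star)                                   # Agent 2: k_tau
    apts = [a for a, techs in apt_groups.items() if t_star in techs]  # Agent 3: A_{t*}
    return {"technical": mitigations, "phase": tactic, "apt": apts}

rel = [("T1546.003", "M1040"), ("T1546.003", "M1026"), ("T1078.002", "M1032")]
kc = {"T1546.003": "persistence"}
apt = {"APT29": ["T1546.003", "T1078.002"], "APT41": ["T1059.001"]}
ctx = agent_contexts("T1546.003", rel, kc, apt)
print(ctx["apt"])  # prints ['APT29']
```

Because the three lookups are independent, the agents can be invoked in parallel on their respective contexts, as the framework describes.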
Each agent operates independently using carefully designed prompts (see Appendix A) that provide the original threat query q, the mapped technique details, and dimension-specific context (mitigations for Agent 1, kill chain phase for Agent 2, APT profiles for Agent 3), ensuring that generated strategies remain grounded in both the specific threat scenario and the broader ATT&CK knowledge base. The prompts explicitly instruct agents to contextualize their recommendations to the particular attack scenario rather than producing generic defenses, as illustrated by the directive: “Your strategies should be contextualized to the actual threat scenario, not just generic defenses for the technique.” This contextualization is crucial for generating actionable strategies that address the specific characteristics of the observed threat, such as the particular attack vector, target environment, and operational constraints mentioned in the threat query.
The independent reasoning of specialized agents offers several advantages: (1) depth of expertise, as each agent can focus exclusively on its dimension without cognitive overload from managing all perspectives simultaneously; (2) parallel generation efficiency, enabling faster overall strategy production; and (3) diversity of coverage, reducing the risk that critical defensive angles are overlooked due to perspective bias. However, this parallelism introduces the challenge of ensuring consistency and coherence across independently generated strategies, which we address through Agent 4 (Coordinator Agent).
Agent 4 serves as an intelligent integrator that synthesizes the outputs {s_tech, s_phase, s_apt} from the three expert agents into a unified, coherent defense plan s_final. The coordinator performs four critical functions: (1) deduplication and conflict resolution, identifying redundant recommendations across agents and resolving logical inconsistencies; (2) prioritization, ranking strategies by effectiveness for the specific threat scenario using criteria such as immediacy of impact, implementation complexity, and expected risk reduction; (3) contextualization verification, ensuring all recommended strategies remain relevant to the original threat query q and are not generic ATT&CK guidance disconnected from the scenario; and (4) gap analysis and enhancement, identifying missing defensive controls not covered by the expert agents and augmenting the strategy accordingly. The coordinator is prompted with all three expert outputs and the original threat context, generating a structured defense plan organized into categories (immediate actions, detection strategies, prevention controls, monitoring and alerting, threat hunting, response and recovery) with explicit priority levels and source attributions. This integration step transforms independently generated defensive perspectives into a comprehensive, operationalized defense strategy that security teams can directly implement.
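The deduplication and prioritization functions (the mechanical part of the coordinator; conflict resolution and gap analysis are LLM-driven and not modeled here) can be sketched with a merge over the expert outputs. Item tuples and priority values are hypothetical.

```python
def coordinate(expert_outputs):
    """Merge expert strategies keyed by mitigation id, keeping the highest
    priority seen (lower number = higher) and the source agent for attribution."""
    merged = {}
    for agent, items in expert_outputs.items():
        for mid, prio in items:
            if mid not in merged or prio < merged[mid][0]:
                merged[mid] = (prio, agent)
    # Rank by priority, annotating the originating agent for traceability.
    return [(mid, prio, agent)
            for mid, (prio, agent) in sorted(merged.items(), key=lambda kv: kv[1][0])]

outputs = {
    "technical": [("M1038", 1), ("M1049", 2)],
    "phase":     [("M1049", 1), ("M1030", 3)],
    "apt":       [("M1017", 2)],
}
for mid, prio, agent in coordinate(outputs):
    print(mid, prio, agent)
```

Note how M1049, proposed by two agents at different priorities, survives once at the higher priority with its source updated, which is the deduplication behavior the coordinator description implies.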

4.5. Strategy Validation & Optimization

The strategy validation phase ensures that the coordinator-generated defense plan s_final is practically deployable before optimization. We first conduct feasibility checks to verify that each defense measure d ∈ s_final can be deployed in the target environment by evaluating four dimensions: (1) technical compatibility, which assesses whether the measure is applicable to the target operating systems, platforms, and existing security tooling identified in the threat query context; (2) resource requirements, including estimated personnel effort, required tooling, and acceptable deployment time windows; (3) operational impact, evaluating whether deployment would disrupt business continuity or introduce new attack surfaces; and (4) prerequisite dependencies, verifying that required configurations or upstream controls are already in place. Measures failing these checks are either removed or flagged for manual assessment. This feasibility validation is implemented as an LLM-based reasoning step, where the model is prompted with both the defense measure details and the environmental context extracted from the original threat query, ensuring that validation judgments remain grounded in the specific deployment scenario rather than generic applicability assumptions.
The optimization phase ranks validated strategies using a multi-criteria scoring function that balances threat coverage, implementation urgency, risk reduction potential, and deployment complexity:
Score(d) = w_1 · Coverage(d) + w_2 · Urgency(d) + w_3 · Effectiveness(d) − w_4 · Complexity(d),
where Coverage(d) measures the breadth of threats addressed, Urgency(d) assigns higher scores to measures targeting earlier kill chain phases where intervention cost is lower, Effectiveness(d) is assessed based on the historical defensive efficacy ratings of ATT&CK mitigations against their associated techniques, and Complexity(d) aggregates prerequisite tools, configuration steps, and required expertise level. The weights w_1–w_4 default to equal values (w_i = 0.25) but are adjustable by security teams to reflect organizational priorities, such as elevating w_2 during active incident response. The final output organizes strategies into immediate deployment, short-term implementation, and long-term planning priority tiers, with each strategy annotated to identify the originating agent, targeted ATT&CK techniques, and expected security improvement, enabling security teams to make informed decisions and communicate response plans effectively.
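The scoring function and tiering can be sketched directly. The four component scores are assumed to be pre-computed on a common 0–1 scale, and the tier cut-offs below are illustrative choices, not values specified by the paper.

```python
def score(d, w=(0.25, 0.25, 0.25, 0.25)):
    """Score(d) = w1*Coverage + w2*Urgency + w3*Effectiveness - w4*Complexity."""
    return (w[0] * d["coverage"] + w[1] * d["urgency"]
            + w[2] * d["effectiveness"] - w[3] * d["complexity"])

def tier(s):
    # Hypothetical cut-offs for the three priority tiers.
    if s >= 0.5:
        return "immediate deployment"
    if s >= 0.25:
        return "short-term implementation"
    return "long-term planning"

d = {"coverage": 0.9, "urgency": 0.9, "effectiveness": 0.8, "complexity": 0.2}
print(tier(score(d)))
```

Raising `w[1]` during an active incident, as the text suggests, shifts measures targeting early kill chain phases toward the immediate-deployment tier.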

5. CyberDefBench Benchmark Construction

5.1. Dataset Construction Methodology

To rigorously evaluate defense strategy generation systems against the two core challenges identified in Section 3—accurate threat intelligence understanding and strategic defense generation—we construct CyberDefBench based on real-world attack chain progressions. We systematically extract 40 attack chains from MITRE ATT&CK Procedure Examples and authoritative threat intelligence reports (Mandiant, CrowdStrike, CISA), covering 12 distinct APT groups across Enterprise, Mobile, and ICS domains, and formalize each as a directed sequence C = ⟨t_1, t_2, …, t_n⟩, where t_i ∈ T represents ATT&CK techniques following the temporal progression observed in actual incidents. Our dataset construction comprises two key components: Threat Intelligence Generation simulates diverse threat descriptions to evaluate accurate understanding capabilities, and Defense Strategy Annotation provides dual-layer ground truth defenses to assess strategic generation capabilities.
Threat Intelligence Generation. For each attack chain C, we select an observation point t_k (where 1 ≤ k < n) representing the currently detected technique, then generate realistic threat scenarios across four difficulty categories that systematically probe threat understanding capabilities. Simple scenarios provide unambiguous indicators directly mapping to t_k, establishing baseline accuracy. Confusing scenarios deliberately mix characteristics of t_k with a similar technique t′ from the same kill chain phase, requiring fine-grained disambiguation. Noisy scenarios embed real threat indicators within 60–70% irrelevant information such as routine IT issues or benign activities, simulating information overload. False-positive scenarios describe legitimate business activities that superficially resemble attack patterns, testing discrimination capability. The core of our scenario generation relies on LLM-based prompt engineering, with category-specific prompt templates designed to ensure systematic and reproducible scenario generation across all difficulty levels, as summarized in Table 2. For instance, to generate Noisy scenarios, we first identify the target technique t_k and extract its formal ATT&CK description, then construct prompts instructing the LLM to embed technique-specific indicators within distractor information, such as “Generate a threat report where PowerShell execution indicators are mixed with routine IT troubleshooting logs, network maintenance activities, and user software installation events, ensuring the real threat constitutes only a small portion of the content.” This prompt-driven approach enables systematic control over scenario difficulty while maintaining a realistic operational context.
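A category-specific template for the Noisy case can be sketched as a parameterized string, with the template text paraphrasing the example prompt above. The function name and parameter lists are hypothetical; the actual Table 2 templates may differ.

```python
# Hypothetical Noisy-category template, paraphrasing the example in the text.
NOISY_TEMPLATE = (
    "Generate a threat report where {indicators} are mixed with {distractors}, "
    "ensuring the real threat constitutes only a small portion of the content."
)

def build_noisy_prompt(technique_name, indicators, distractors):
    """Fill the Noisy template for a target technique t_k."""
    return NOISY_TEMPLATE.format(
        indicators=f"{technique_name} indicators such as {', '.join(indicators)}",
        distractors=", ".join(distractors),
    )

p = build_noisy_prompt(
    "PowerShell execution",
    ["encoded commands", "unusual parent processes"],
    ["routine IT troubleshooting logs", "network maintenance activities"],
)
print(p)
```

Keeping one template per difficulty category is what makes the generation reproducible: the only free variables are the technique-specific indicators and the distractor list.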
Defense Strategy Annotation. For each scenario anchored at observation point t_k within attack chain C = ⟨t_1, …, t_k, t_{k+1}, …, t_n⟩, we construct dual-layer ground truth that captures both reactive and proactive defense requirements. Current-Stage Defenses D^current_{t_k} = {m ∈ M | (t_k, m) ∈ Relationships} ∪ {d ∈ D | (t_k, d) ∈ Relationships} comprise mitigations and detections directly addressing the observed technique t_k, extracted from authoritative ATT&CK framework relationships—these represent immediate containment measures that security teams must implement to neutralize the current threat. Next-Stage Defenses D^next_{t_{k+1}} comprise mitigations and detections targeting the next technique t_{k+1} in the attack chain, representing strategic measures that break the chain by preventing adversary progression to subsequent stages. Each defense item strictly adheres to ATT&CK standardized specifications. This dual-layer annotation enables fine-grained evaluation of whether systems can not only respond to visible threats but also anticipate and disrupt attack evolution, distinguishing reactive defense from strategic foresight.
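The dual-layer extraction can be sketched over relationship mappings. The mitigation and detection IDs below are taken from the Operation Ghost example in Section 5.2; the function name and the dictionary-based relationship tables are illustrative stand-ins for ATT&CK's structured data.

```python
def ground_truth(chain, k, mitigations_of, detections_of):
    """Return (D^current_{t_k}, D^next_{t_{k+1}}) for observation point k."""
    t_k, t_next = chain[k], chain[k + 1]
    current = set(mitigations_of.get(t_k, [])) | set(detections_of.get(t_k, []))
    nxt = set(mitigations_of.get(t_next, [])) | set(detections_of.get(t_next, []))
    return current, nxt

chain = ["T1566.001", "T1546.003", "T1078.002"]
mit = {"T1546.003": ["M1040", "M1026", "M1018"],
       "T1078.002": ["M1032", "M1027", "M1026", "M1018", "M1017"]}
det = {"T1546.003": ["DET0086"], "T1078.002": ["DET0210"]}
cur, nxt = ground_truth(chain, 1, mit, det)
print(sorted(cur))
```

Because both layers come straight from relationship lookups, the annotation requires no manual labeling, which is the objectivity property the evaluation metrics later rely on.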

5.2. Dataset Examples

Table 3 illustrates representative examples from CyberDefBench across the four difficulty categories, anchored within Operation Ghost, an APT29 campaign from 2013 to 2019 targeting European government entities. We construct scenarios based on the documented attack chain where the observation point is T1546.003 (WMI Event Subscription for persistence) and the next technique is T1078.002 (Valid Accounts: Domain Accounts), reflecting the adversarial progression in which APT29 first establishes WMI-based persistence on compromised hosts and subsequently leverages stolen administrator credentials for lateral movement across the network. For the Simple scenario, the threat description provides unambiguous WMI persistence indicators, enabling straightforward technique identification. The Confusing scenario deliberately mixes WMI Event Subscription and Scheduled Task characteristics, requiring systems to disambiguate between T1546.003 and T1053.005. The Noisy scenario embeds real WMI threat indicators within extensive irrelevant context about routine system administration and monitoring alerts, testing robustness to information overload. The False-positive scenario describes legitimate WMI-based monitoring deployed by IT operations that superficially resembles malicious persistence, evaluating discrimination capability. For all scenarios, the Current-Stage Defenses include mitigations M1040 (Behavior Prevention on Endpoint), M1026 (Privileged Account Management), M1018 (User Account Management), and detection strategy DET0086 targeting T1546.003. The Next-Stage Defenses target T1078.002 and include mitigations M1032 (Multi-factor Authentication), M1027 (Password Policies), M1026 (Privileged Account Management), M1018 (User Account Management), and M1017 (User Training), alongside detection strategy DET0210 to identify anomalous use of domain credentials and unauthorized lateral movement attempts.

5.3. Evaluation Metrics

We evaluate defense strategy generation systems using two primary metrics that assess the two core challenges identified in Section 3. Technique Mapping Accuracy measures whether systems can correctly identify the actual attack technique from threat descriptions, corresponding to Challenge 1 (accurate threat intelligence understanding):
$$\mathrm{Acc}_{\text{map}} = \frac{1}{|Q_{\text{test}}|} \sum_{q \in Q_{\text{test}}} \mathbb{1}\!\left[ t_q^{*} = t_q^{\text{gt}} \right]$$
where $t_q^{*}$ denotes the predicted technique and $t_q^{\text{gt}}$ represents the ground-truth technique for query $q$. Ground-truth labels $t_q^{\text{gt}}$ are directly derived from ATT&CK Procedure Examples documented in our annotated attack chains, and correctness is determined by exact technique ID matching, eliminating subjective judgment. This metric evaluates threat intelligence understanding across the four difficulty categories (Simple, Confusing, Noisy, False-positive). Defense Coverage evaluates whether generated strategies comprehensively cover the ground-truth defenses, corresponding to Challenge 2 (strategic defense generation for attack disruption):
$$\mathrm{Cov}_{\text{def}} = \frac{1}{|Q_{\text{test}}|} \sum_{q \in Q_{\text{test}}} \frac{|s_q \cap D_q|}{|D_q|}$$
where $s_q$ represents all generated defenses (mitigations and detections) and $D_q = D_{t_k}^{\text{current}} \cup D_{t_{k+1}}^{\text{next}}$ denotes all ground-truth defenses extracted from the ATT&CK framework, including both current-stage defenses for the observed technique $t_k$ and next-stage defenses for the subsequent technique $t_{k+1}$ in the attack chain. All ground-truth defenses $D_q$ are automatically extracted from ATT&CK’s structured mitigation-technique and detection-technique relationship mappings, ensuring annotation objectivity without manual labeling subjectivity. This dual-layer evaluation enables assessment of whether systems can not only respond to visible threats but also anticipate and disrupt attack evolution.
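Both metrics reduce to simple set and equality checks once predictions and ground truth are collected. A minimal sketch (function and variable names are ours, not part of the paper):

```python
def acc_map(predicted, ground_truth):
    """Technique Mapping Accuracy: fraction of queries whose predicted
    technique ID exactly matches the ground-truth ID."""
    return sum(p == g for p, g in zip(predicted, ground_truth)) / len(ground_truth)

def cov_def(generated, ground_truth):
    """Defense Coverage: per-query fraction of ground-truth defenses
    (current-stage union next-stage) present in the generated strategy,
    averaged over the test set."""
    ratios = [len(set(s) & set(d)) / len(d) for s, d in zip(generated, ground_truth)]
    return sum(ratios) / len(ratios)
```

For instance, a strategy that recovers two of the four ground-truth items for a query contributes 0.5 to the coverage average for that query.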

6. Experiments

6.1. Experimental Setup

6.1.1. Baseline Methods

To comprehensively evaluate the effectiveness of our MACD framework, we compare against three representative baseline approaches that reflect current practices in automated defense strategy generation. To ensure fairness, all methods operate over identical input scenarios and ATT&CK knowledge sources.
  • Single-Agent LLM (SA-LLM). A single GPT-4 agent directly generates defense strategies from threat queries without ATT&CK context or specialized knowledge retrieval.
  • RAG-Only. This approach retrieves top-K most similar ATT&CK techniques and directly returns their associated mitigations and detections without LLM-based reasoning or contextual adaptation.
  • Deep Research. An advanced LLM reasoning system (Claude Deep Research) that autonomously performs iterative web search and synthesis to generate defense recommendations without structured ATT&CK grounding.

6.1.2. Evaluation Metrics

We employ the two evaluation metrics defined in Section 5: Technique Mapping Accuracy ($\mathrm{Acc}_{\text{map}}$), measuring the correctness of threat-to-technique identification, and Defense Coverage ($\mathrm{Cov}_{\text{def}}$), assessing the completeness of generated strategies relative to ground-truth ATT&CK mitigations and detections.

6.1.3. Experimental Environment

All experiments are conducted on a server equipped with an Intel Xeon Gold 6248R CPU and 128 GB RAM. The MACD framework uses GPT-4 as the backbone LLM for all agents, with temperature set to 0.3 for near-deterministic generation. The sentence transformer model (all-MiniLM-L6-v2) is used for vector embeddings with dimension $d = 384$. For RAG retrieval, we set $K = 5$ candidate techniques with a confirmation score threshold of 6.0. Our evaluation dataset comprises 160 scenarios derived from 40 real-world APT attack chains, with each chain generating four scenarios (Simple, Confusing, Noisy, and False-positive) at different observation points. Each scenario includes ground-truth annotations for both current-stage defenses (targeting the observed technique $t_k$) and proactive defenses (targeting the next technique $t_{k+1}$ in the attack chain). Each experiment is repeated three times with different random seeds, and we report mean values with standard deviations.
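The retrieval settings above ($K = 5$ candidates, confirmation threshold 6.0) can be illustrated with a small sketch. This is our own illustration, not the paper's implementation: `score_fn` is a hypothetical stand-in for the LLM-based confirmation scorer, and the vectors would in practice come from all-MiniLM-L6-v2 embeddings.

```python
import math

K, THRESHOLD = 5, 6.0  # values from the experimental setup

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def retrieve_and_confirm(query_vec, technique_vecs, score_fn):
    """Rank techniques by cosine similarity, keep the top-K candidates, then
    accept the candidate with the highest confirmation score if it clears
    THRESHOLD; otherwise report no confident mapping."""
    ranked = sorted(technique_vecs,
                    key=lambda tid: cosine(query_vec, technique_vecs[tid]),
                    reverse=True)[:K]
    best_id, best_score = max(((tid, score_fn(tid)) for tid in ranked),
                              key=lambda pair: pair[1])
    return best_id if best_score >= THRESHOLD else None
```

Gating the final answer on the confirmation score rather than on raw similarity is what lets the pipeline abstain on ambiguous or false-positive scenarios instead of forcing a mapping.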

6.1.4. Overall Performance Comparison

Table 4 presents the overall performance comparison across all methods on CyberDefBench. MACD consistently outperforms all baselines across both metrics, achieving 84.6% technique mapping accuracy and 72.3% defense coverage. The Single-Agent LLM baseline exhibits significantly lower performance (68.2% mapping accuracy, 51.4% defense coverage), demonstrating that access to structured ATT&CK knowledge is critical for accurate threat understanding and comprehensive defense generation. The RAG-Only approach achieves moderate mapping accuracy (75.8%) through semantic retrieval but suffers from poor defense coverage (58.6%) due to lack of contextual reasoning and strategy synthesis. Deep Research demonstrates strong general reasoning capabilities (mapping accuracy 77.5%, defense coverage 64.2%) but cannot match MACD’s structured multi-agent collaboration grounded in authoritative ATT&CK framework knowledge.

6.1.5. Performance Across Scenario Categories

Table 5 breaks down performance by scenario category, revealing distinct strengths and challenges for each method. MACD demonstrates particular advantages in Confusing (82.5% vs. 71.2% for Deep Research) and Noisy (81.8% vs. 69.4%) scenarios, where multi-dimensional reasoning and noise filtering are critical. The Single-Agent LLM baseline performs reasonably on Simple scenarios (75.0%) but degrades significantly on Confusing (58.3%) and False-positive (52.5%) categories, highlighting its lack of structured knowledge for disambiguating similar techniques. The RAG-Only baseline achieves stable performance across categories but plateaus due to limited contextual adaptation capabilities. Notably, all methods struggle with False-positive scenarios, with MACD achieving 70.0% accuracy compared to 62.5% for the best baseline, indicating that distinguishing legitimate activities from malicious behaviors remains challenging even with sophisticated multi-agent reasoning.

6.1.6. Ablation Study

To validate the contribution of each agent in MACD, we conduct ablation experiments by systematically removing individual agents from the framework, as shown in Table 6. Removing the Technical Defense Expert (Agent 1) causes the largest performance drop in defense coverage (−11.8%), confirming that technique-specific mitigation knowledge is fundamental to comprehensive defense generation. Removing the Phase Defense Expert (Agent 2) primarily impacts defense coverage (−7.2%) and mapping accuracy on complex attack chain scenarios, validating its role in kill chain reasoning and proactive defense. The APT Defense Expert (Agent 3) contributes moderately to both metrics (−3.5% mapping accuracy, −5.6% coverage), particularly for threat hunting and behavioral detection strategies. Removing the Coordinator Agent (Agent 4) results in a 6.4% decrease in defense coverage, as uncoordinated expert outputs yield redundant and inconsistent recommendations; mapping accuracy remains largely unchanged because technique identification occurs before coordination. Together, these results demonstrate the necessity of strategy integration.

7. Conclusions

This paper addresses the challenge of automated cybersecurity defense strategy generation by proposing MACD, a multi-agent collaborative framework that orchestrates specialized AI agents to generate comprehensive, ATT&CK-aligned defense strategies. Recognizing that effective defense requires expertise across technical mitigation, attack progression, and threat actor profiling, MACD deploys three expert agents coordinated through an intelligent integration mechanism while leveraging RAG-enhanced threat mapping to mitigate hallucination risks. We construct CyberDefBench, a comprehensive benchmark combining real-world APT scenarios with standardized evaluation metrics. Experimental results demonstrate that MACD significantly outperforms single-agent, retrieval-only, and deep research baselines, with particularly strong performance on complex multi-stage attack scenarios. Ablation studies validate the complementary contributions of each specialized agent. Future work will explore real-time deployment and extending coverage to emerging attack vectors such as supply chain and IoT threats.

Author Contributions

Conceptualization, N.L. and X.L.; methodology, N.L. and X.L.; software, N.L., X.L. and Z.L.; validation, N.L., X.L., D.M. and L.Y.; formal analysis, N.L. and Z.L.; investigation, N.L., Z.L., D.M. and H.C.; resources, H.C., W.Z. and X.W.; data curation, N.L., Z.L. and D.M.; writing—original draft preparation, X.L. and Y.L.; writing—review and editing, N.L., X.L. and Y.L.; visualization, N.L. and Z.L.; supervision, X.L. and X.W.; project administration, N.L. and X.L.; funding acquisition, N.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the STATE GRID QINGHAI ELECTRIC POWER COMPANY, Project Name: Research and Application of Attack Detection Technology Based on Large Models (52280725000B).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from MITRE ATT&CK and are available at https://attack.mitre.org with the permission of MITRE ATT&CK.

Conflicts of Interest

Authors Nanfang Li, Xiang Li, Zongrong Li, Denghui Ma, Lijun Yan, Haishan Cao, Wenqian Zhang and Xu Wang were employed by State Grid Qinghai Electric Power Company. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Prompt Templates and Pseudocode

This appendix provides the complete prompt directives for all four MACD agents (Appendix A.1) and the full pseudocode of the MACD pipeline (Appendix A.2).

Appendix A.1. Agent Prompt Templates

Table A1 presents the concise prompt directives used for each of the four agents in MACD. All agents share the same backbone LLM without fine-tuning; specialization is achieved entirely through role-specific prompts and dimension-specific context inputs derived from preceding pipeline phases.
Table A1. Concise prompt directives for the four MACD agents.
Agent | Key Inputs | Concise Prompt Directive
Agent 1 (Technical Defense Expert) | $q$, $t^*$, $M_{t^*}$ | “Given threat query [q], confirmed technique [$t^*$: id, name, description, platforms], and its ATT&CK mitigations [$M_{t^*}$: id, name, description], generate contextualized technical defense strategies covering: (1) detection methods for this attack vector, (2) prevention controls and configuration hardening, (3) monitoring and alerting approaches. Strategies must be specific to the threat scenario, not generic technique-level guidance.”
Agent 2 (Phase Defense Expert) | $q$, $t^*$, $k_\tau$ | “Given threat query [q], confirmed technique [$t^*$], and its kill chain phase [$k_\tau$: tactic name, phase description, adversary goal, preceding and subsequent phases], generate layered defense strategies covering: (1) early detection before phase completion, (2) measures to block transition to the next phase, (3) containment and isolation, (4) recovery actions. Explicitly address how to disrupt adversarial campaign progression.”
Agent 3 (APT Defense Expert) | $q$, $t^*$, $A_{t^*}$ | “Given threat query [q], confirmed technique [$t^*$], and APT profiles [$A_{t^*}$: group names, associated TTPs, historical targets, behavioral patterns], generate intelligence-driven defense strategies covering: (1) behavioral detection rules specific to these APT groups, (2) proactive threat hunting hypotheses based on known TTPs, (3) deception techniques targeting their typical attack patterns, (4) asset hardening informed by historical campaigns.”
Agent 4 (Coordinator) | $q$, $s_{\text{tech}}$, $s_{\text{phase}}$, $s_{\text{apt}}$ | “Given threat query [q] and three expert outputs [$s_{\text{tech}}$, $s_{\text{phase}}$, $s_{\text{apt}}$], synthesize a unified defense plan by: (1) removing duplicate recommendations, retaining the most scenario-specific version; (2) resolving conflicts by preferring measures with higher technique specificity and closer alignment to [q]; (3) filling coverage gaps across: Immediate Actions, Detection, Prevention, Monitoring, Threat Hunting, Response & Recovery; (4) assigning priority levels (High/Medium/Low) and annotating each measure with its source agent and targeted ATT&CK identifier.”

Appendix A.2. MACD Algorithm Pseudocode

Algorithm A1 presents the complete pseudocode of the MACD pipeline, covering all four phases.
Algorithm A1: MACD: Multi-Agent Collaborative Defense Strategy Generation
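In outline, the pipeline is a retrieval step followed by parallel expert calls, coordination, and validation. The sketch below is our own condensed illustration of that flow; every callable passed in is a hypothetical stand-in for the corresponding GPT-4 prompt or module described in Table A1.

```python
def macd(query, knowledge_base, retrieve, experts, coordinate, validate):
    """Condensed sketch of the four-phase MACD pipeline. Phase 1 (building
    the vectorized ATT&CK knowledge base) is assumed to have run offline."""
    technique = retrieve(query, knowledge_base)          # Phase 2: RAG-enhanced mapping
    outputs = [agent(query, technique, knowledge_base)   # Phase 3: three expert agents
               for agent in experts]
    plan = coordinate(query, outputs)                    # Phase 3: coordinator synthesis
    return validate(plan)                                # Phase 4: validation & optimization
```

The stubs make the data flow explicit: the confirmed technique is shared by all experts, and only the coordinator sees all three expert outputs.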

Figure 1. Overview of the MACD framework architecture. The system operates through four sequential phases: (1) data preprocessing establishes the MITRE ATT&CK knowledge base with vectorized representations; (2) threat understanding & mapping employs RAG-enhanced LLM inference to map threat queries to specific techniques; (3) multi-agent strategy generation deploys three specialized agents (technical defense expert, phase defense expert, APT defense expert) to generate dimension-specific strategies, coordinated by a fourth agent that integrates recommendations; and (4) strategy validation & optimization ensures feasibility and ranks strategies by effectiveness. Each phase incorporates LLM-based reasoning (indicated by spiral icons) to leverage domain expertise while maintaining alignment with the ATT&CK framework.
Table 1. Comparison of cybersecurity defense strategy generation methods.
Method | Overview | Advantages | Disadvantages
Rule-Based | Predefined rules and signatures for known attack patterns | Interpretable; easy to deploy; quick response | Limited to known attacks; manual updates needed
ML/DL-Based | Train models on datasets for automatic threat detection | Handles large-scale data; adapts to new threats | Needs labeled data; adversarial vulnerability
Reinforcement Learning | RL agents learn optimal defense in simulated environments | Dynamic strategy generation; models interactions | Long training time; sim-to-real gap
LLM-Based | Large models with RAG for contextual strategy generation | Strong reasoning; fast generation; adaptable | Hallucination risk; trustworthiness issues
Table 2. Core prompt directives for four-category scenario generation.
Category | Concise Prompt Template
Simple | “Given the following ATT&CK technique reference: [ID, name, and description of $t_k$], generate a concise threat report containing clear and unambiguous observable indicators of this technique, ensuring the described behaviors directly and exclusively correspond to the ATT&CK technique definition.”
Confusing | “Given the following ATT&CK technique reference: [ID, name, and description of $t_k$] and [ID, name, and description of $t'$], generate a threat report that intermixes their observable behaviors without explicitly naming either technique, such that determining the primary exploitation method requires fine-grained disambiguation.”
Noisy | “Given the following ATT&CK technique reference: [ID, name, and description of $t_k$], generate a threat report where indicators of this technique are embedded within extensive irrelevant contextual information (e.g., routine IT operations, benign system events, help desk tickets), ensuring the real threat indicators constitute no more than 30–40% of the overall content.”
False-positive | “Given the following ATT&CK technique reference: [ID, name, and description of $t_k$], generate a report describing a legitimate IT or business operation conducted by authorized personnel that produces observable behaviors superficially resembling this technique, but originates entirely from sanctioned activities with no malicious intent or unauthorized access involved.”
Table 3. Representative threat scenarios from Operation Ghost (APT29 campaign) across four difficulty categories (observation point: T1546.003, next technique: T1078.002).
Category | Threat Description
Simple | “We discovered suspicious WMI event subscriptions on several domain controllers that were not part of our baseline configuration. The subscriptions trigger execution of encoded scripts whenever specific security events occur, persisting through system reboots. Investigation shows they were created using administrator credentials outside normal change windows.”
Confusing | “Our monitoring detected unusual persistence mechanisms across the finance department servers. Some appear to be WMI-based event filters executing payloads on certain triggers, while others look like scheduled tasks running at similar intervals. Both were created recently using elevated privileges, making it difficult to determine which persistence method is primarily being exploited.”
Noisy | “This week started with our quarterly system health checks—IT ran diagnostics on all domain controllers, generating thousands of WMI queries for performance monitoring. The new SIEM deployment also created baseline WMI subscriptions for security event correlation. On Wednesday, we noticed some unusual WMI event filters that seemed outside our standard monitoring configuration, created during the maintenance window. Meanwhile, the help desk dealt with printer spooler service crashes requiring multiple restarts, and our backup system triggered its scheduled tasks for nightly data replication across servers.”
False-positive | “Our IT security team deployed a new endpoint monitoring solution this week that uses WMI event subscriptions to track security-critical events across all domain-joined systems. The tool creates permanent WMI filters and consumers to monitor login attempts, privilege escalations, and policy changes. These subscriptions were installed using domain administrator credentials as part of our authorized security enhancement project.”
Current-Stage Defenses (for T1546.003):
   Mitigations: M1040, M1026, M1018
   Detection: DET0086
Next-Stage Defenses (for T1078.002):
   Mitigations: M1032, M1027, M1026, M1018, M1017
   Detection: DET0210
Table 4. Overall performance comparison on CyberDefBench. Best results are in bold. Statistical significance is assessed via Welch’s t-test comparing each baseline against MACD (* p < 0.05, ** p < 0.01).
Method | $\mathrm{Acc}_{\text{map}}$ (%) | $\mathrm{Cov}_{\text{def}}$ (%) | p-Value
Single-Agent LLM | 68.2 ± 2.8 | 51.4 ± 3.2 | p < 0.01 **
RAG-Only | 75.8 ± 1.9 | 58.6 ± 2.5 | p < 0.01 **
Deep Research | 77.5 ± 2.4 | 64.2 ± 2.9 | p < 0.05 *
MACD (Ours) | 84.6 ± 1.6 | 72.3 ± 2.1 | –
Table 5. Technique mapping accuracy (%) across scenario categories (mean ± std over three independent runs).
Method | Simple | Confusing | Noisy | False-pos
Single-Agent LLM | 75.0 ± 2.0 | 58.3 ± 3.0 | 65.0 ± 2.5 | 52.5 ± 3.5
RAG-Only | 82.5 ± 1.8 | 70.0 ± 2.5 | 72.5 ± 2.0 | 65.0 ± 3.0
Deep Research | 85.0 ± 2.0 | 71.2 ± 2.8 | 69.4 ± 2.3 | 62.5 ± 3.2
MACD | 90.0 ± 1.5 | 82.5 ± 2.5 | 81.8 ± 2.0 | 70.0 ± 3.0
Table 6. Ablation study results showing the impact of removing individual agents.
Configuration | $\mathrm{Acc}_{\text{map}}$ (%) | $\mathrm{Cov}_{\text{def}}$ (%) | $\Delta$Acc | $\Delta$Cov
w/o Agent 1 (Technical) | 83.1 | 60.5 | −1.5 | −11.8
w/o Agent 2 (Phase) | 81.2 | 65.1 | −3.4 | −7.2
w/o Agent 3 (APT) | 81.1 | 66.7 | −3.5 | −5.6
w/o Agent 4 (Coordinator) | 84.2 | 65.9 | −0.4 | −6.4
MACD (Full) | 84.6 | 72.3 | – | –

Share and Cite

MDPI and ACS Style

Li, N.; Li, X.; Li, Z.; Ma, D.; Yan, L.; Cao, H.; Zhang, W.; Wang, X.; Liu, Y. MACD: Multi-Agent Collaborative Approach for Cybersecurity Defense Strategy Generation. Information 2026, 17, 370. https://doi.org/10.3390/info17040370


