1. Introduction
The rapid evolution of cyber threats has reshaped the information security landscape, presenting unprecedented challenges for organizations worldwide. Modern cyberattacks have grown increasingly sophisticated, with Advanced Persistent Threats (APTs) demonstrating complex multi-stage tactics that evade traditional security measures [1,2]. These sophisticated attacks impose severe consequences, with the average data breach cost exceeding $4.45 million and the time to identify and contain breaches extending beyond 200 days [3,4]. This escalating threat environment necessitates a paradigm shift from reactive security measures to proactive, intelligence-driven defense strategies. Effective defense strategy generation requires deep technical expertise across multiple security domains, rapid analysis of evolving threat patterns, and synthesis of comprehensive countermeasures, yet traditional manual approaches face significant scalability bottlenecks as threat volumes increase exponentially [5]. Thus, leveraging artificial intelligence to augment human cybersecurity expertise and enable automated generation of adaptive defense strategies has become both an urgent need and a research priority in this field [6,7].
Existing approaches to cybersecurity defense strategy generation have evolved from rule-based systems to intelligent reasoning frameworks, broadly categorized into four main paradigms as shown in Table 1. Traditional methods, including rule-based matching, machine learning anomaly detection, and reinforcement learning-based policy optimization, have advanced the field but face fundamental limitations such as an inability to generalize to novel threats, reliance on scarce labeled data, and substantial simulation-to-reality gaps [8,9,10]. Most recently, large language model (LLM)-based approaches leverage the powerful reasoning capabilities of foundation models to produce contextual defense strategies, often incorporating Retrieval-Augmented Generation (RAG) with authoritative knowledge bases such as MITRE ATT&CK to map unstructured threat descriptions to standardized defense recommendations [11,12].
Despite these advances, current methods face critical limitations that constrain their real-world effectiveness. LLM-based approaches, while demonstrating strong reasoning capabilities, suffer from hallucination issues that generate plausible but incorrect strategies, as single-agent systems struggle to maintain sufficient knowledge depth across all required security dimensions simultaneously. Comprehensive cybersecurity defense requires expertise spanning technical vulnerability mitigation, attack progression analysis through kill chain phases, and threat actor behavioral profiling, yet single agents inevitably face knowledge depth insufficiency or perspective bias when attempting to cover all domains. More critically, existing methods lack structured collaboration mechanisms to systematically integrate these diverse security perspectives into coherent strategies. These limitations indicate the need for multi-agent architectures that balance specialized knowledge depth with systematic collaboration mechanisms.
To address these limitations, we propose MACD (Multi-Agent Collaborative Defense), a novel framework that orchestrates multiple specialized AI agents for comprehensive defense strategy generation. Our approach recognizes that effective defense requires expertise across multiple dimensions and achieves this through a multi-agent architecture that decomposes the complex defense generation task into specialized subtasks, enabling each agent to develop deep expertise while a coordinator agent integrates knowledge across dimensions to ensure strategic consistency. MACD operates through a four-phase process that systematically transforms threat queries into actionable strategies: Phases 1–2 preprocess threat intelligence and leverage RAG technology to accurately map natural language descriptions to ATT&CK techniques while mitigating hallucination risks. Phase 3 constitutes the core innovation, deploying three specialized agents (Technical Defense Expert, Phase Defense Expert, and APT Defense Expert) that generate dimension-specific strategies in parallel, with a Coordinator Agent synthesizing these perspectives through deduplication, conflict resolution, and prioritization into a unified, ATT&CK-aligned defense plan. Phase 4 validates feasibility and optimizes effectiveness. This architecture achieves knowledge depth through specialization, efficiency through parallelization, and coherence through intelligent coordination.
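The retrieval step in Phases 1–2, which maps a natural language threat description to ATT&CK techniques, can be illustrated with a minimal sketch. This is not the MACD implementation: it swaps in plain token-overlap (Jaccard) scoring where a real RAG pipeline would likely use embedding-based retrieval, and the function names, the `TECHNIQUES` catalog, and its shortened descriptions are assumptions made for the example. Only the technique IDs and names follow MITRE ATT&CK.

```python
import re

# Toy catalog (assumption): IDs and names follow MITRE ATT&CK, but the
# descriptions are shortened paraphrases written only for this example.
TECHNIQUES = {
    "T1566": ("Phishing",
              "adversary sends email with malicious attachment or link to gain access"),
    "T1110": ("Brute Force",
              "adversary guesses passwords repeatedly to gain access to accounts"),
    "T1486": ("Data Encrypted for Impact",
              "adversary encrypts files on target systems to interrupt availability ransomware"),
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def map_query_to_techniques(query, top_k=2):
    """Rank techniques by Jaccard overlap between query and description tokens.

    Token overlap is a stand-in for the retriever; candidates with zero
    overlap are dropped so that unrelated techniques are never returned.
    """
    q = tokenize(query)
    scored = []
    for tid, (name, desc) in TECHNIQUES.items():
        d = tokenize(name + " " + desc)
        union = q | d
        score = len(q & d) / len(union) if union else 0.0
        scored.append((score, tid, name))
    scored.sort(reverse=True)
    return [(tid, name, round(score, 3))
            for score, tid, name in scored[:top_k] if score > 0]


if __name__ == "__main__":
    print(map_query_to_techniques(
        "Employees received an email with a malicious attachment"))
```

Restricting the generator to techniques returned by a retriever, rather than letting it name techniques freely, is one plausible way to obtain the hallucination mitigation that the paragraph above attributes to RAG.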
In this paper, we make several key contributions:
- We construct CyberDefBench, a comprehensive benchmark combining real-world APT cases and synthetic scenarios with standardized evaluation metrics for assessing defense strategy quality.
- We propose MACD (Multi-Agent Collaborative Defense), a novel multi-agent framework that orchestrates three specialized AI agents coordinated through a Coordinator Agent to systematically generate comprehensive, ATT&CK-aligned cybersecurity defense strategies.
- Through extensive experiments on CyberDefBench, we demonstrate that MACD significantly outperforms baseline methods across multiple metrics, validating practical applicability and interpretability.
4. Methodology
4.4. Multi-Agent Strategy Generation
The core innovation of MACD lies in its multi-agent collaborative strategy generation mechanism, which addresses the multi-dimensional nature of cybersecurity defense through specialized agents that independently reason about complementary aspects of threat mitigation. Once a threat query q is mapped to a confirmed technique, we deploy three expert agents, each embodying a distinct defensive perspective, to generate dimension-specific strategies in parallel. Agent 1 (Technical Defense Expert) focuses on technique-specific countermeasures: it analyzes the technical characteristics of the technique and retrieves the associated MITRE mitigations, generating strategies that cover detection methods, prevention controls, configuration hardening, and monitoring approaches tailored to the specific attack vector. Agent 2 (Phase Defense Expert) adopts a kill chain perspective: it retrieves the technique's primary tactic and the corresponding kill chain phase information, generating layered defense strategies that address early detection, phase transition prevention, containment, and recovery, and that explicitly consider how the attack fits into the broader adversarial campaign lifecycle. Agent 3 (APT Defense Expert) takes a threat actor-centric view: it identifies APT groups that employ the technique and generates strategies focused on behavioral detection patterns, proactive threat hunting procedures, deception techniques, and intelligence-driven defenses informed by APT tactics, techniques, and procedures (TTPs).
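The parallel dispatch of the three expert agents can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names and the returned dictionary layout are hypothetical, and each expert is a stub that returns a fixed string where a real system would call an LLM with dimension-specific context.

```python
from concurrent.futures import ThreadPoolExecutor


# Stub experts (assumption): a real agent would prompt an LLM with the
# query, the mapped technique, and its own dimension-specific context.
def technical_expert(query, technique):
    return {"agent": "technical",
            "strategies": [f"Apply MITRE mitigations for {technique}"]}


def phase_expert(query, technique):
    return {"agent": "phase",
            "strategies": [f"Block the phase transition following {technique}"]}


def apt_expert(query, technique):
    return {"agent": "apt",
            "strategies": [f"Hunt for groups known to use {technique}"]}


EXPERTS = (technical_expert, phase_expert, apt_expert)


def generate_expert_strategies(query, technique):
    """Run all experts concurrently and return outputs in a fixed order."""
    with ThreadPoolExecutor(max_workers=len(EXPERTS)) as pool:
        futures = [pool.submit(expert, query, technique) for expert in EXPERTS]
        # Collect in submission order so downstream source attribution
        # does not depend on which agent happens to finish first.
        return [future.result() for future in futures]


if __name__ == "__main__":
    for output in generate_expert_strategies("Suspicious email campaign", "T1566"):
        print(output["agent"], output["strategies"])
```

Threads are a reasonable fit here because LLM calls are typically I/O-bound, so the three requests overlap in time even though the experts share no state.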
Each agent operates independently using carefully designed prompts (see Appendix A) that provide the original threat query q, the mapped technique details, and dimension-specific context (mitigations for Agent 1, kill chain phase for Agent 2, APT profiles for Agent 3), ensuring that generated strategies remain grounded in both the specific threat scenario and the broader ATT&CK knowledge base. The prompts explicitly instruct agents to contextualize their recommendations to the particular attack scenario rather than producing generic defenses, as illustrated by the directive: “Your strategies should be contextualized to the actual threat scenario, not just generic defenses for the technique.” This contextualization is crucial for generating actionable strategies that address the specific characteristics of the observed threat, such as the particular attack vector, target environment, and operational constraints mentioned in the threat query.
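The prompt structure described above could be assembled along the following lines. The actual prompts are given in Appendix A and are not reproduced here; the template wording, the `build_prompt` helper, and the context labels are assumptions for illustration. Only the contextualization directive is quoted from the text.

```python
from string import Template

# Directive quoted from the paper; everything else in the template is a
# hypothetical layout, not the Appendix A prompt.
DIRECTIVE = ("Your strategies should be contextualized to the actual threat "
             "scenario, not just generic defenses for the technique.")

PROMPT = Template(
    "You are the $role.\n"
    "Threat query: $query\n"
    "Mapped technique: $technique\n"
    "$context_label: $context\n"
    "$directive"
)

# Dimension-specific context label for each expert agent.
CONTEXT_LABELS = {
    "Technical Defense Expert": "Mitigations",
    "Phase Defense Expert": "Kill chain phase",
    "APT Defense Expert": "APT profiles",
}


def build_prompt(role, query, technique, context):
    """Fill the shared template with one agent's role and context."""
    return PROMPT.substitute(
        role=role,
        query=query,
        technique=technique,
        context_label=CONTEXT_LABELS[role],
        context=context,
        directive=DIRECTIVE,
    )


if __name__ == "__main__":
    print(build_prompt("Phase Defense Expert",
                       "Employees received emails with malicious attachments",
                       "T1566 Phishing",
                       "Initial Access"))
```

Sharing one template across agents keeps the threat query and technique details identical for all three, so differences in their output come from the role and context fields alone.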
The independent reasoning of specialized agents offers several advantages: (1) depth of expertise, as each agent can focus exclusively on its dimension without cognitive overload from managing all perspectives simultaneously; (2) parallel generation efficiency, enabling faster overall strategy production; and (3) diversity of coverage, reducing the risk that critical defensive angles are overlooked due to perspective bias. However, this parallelism introduces the challenge of ensuring consistency and coherence across independently generated strategies, which we address through Agent 4 (Coordinator Agent).
Agent 4 serves as an intelligent integrator that synthesizes the outputs from the three expert agents into a unified, coherent defense plan. The coordinator performs four critical functions: (1) deduplication and conflict resolution, identifying redundant recommendations across agents and resolving logical inconsistencies; (2) prioritization, ranking strategies by effectiveness for the specific threat scenario using criteria such as immediacy of impact, implementation complexity, and expected risk reduction; (3) contextualization verification, ensuring all recommended strategies remain relevant to the original threat query q and are not generic ATT&CK guidance disconnected from the scenario; and (4) gap analysis and enhancement, identifying missing defensive controls not covered by the expert agents and augmenting the strategy accordingly. The coordinator is prompted with all three expert outputs and the original threat context, generating a structured defense plan organized into categories (immediate actions, detection strategies, prevention controls, monitoring and alerting, threat hunting, response and recovery) with explicit priority levels and source attributions. This integration step transforms independently generated defensive perspectives into a comprehensive, operationalized defense strategy that security teams can directly implement.
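The deduplication, prioritization, and categorization steps can be approximated deterministically, as in the sketch below. In MACD the coordinator is an LLM-prompted agent, so this is a simplified stand-in rather than the authors' method: duplicates are detected by normalized exact text match instead of semantic comparison, the `impact` and `complexity` scores are hypothetical inputs, and contextualization verification and gap analysis are omitted.

```python
from dataclasses import dataclass


@dataclass
class Strategy:
    text: str
    category: str    # e.g. "prevention controls", "threat hunting"
    source: str      # which expert agent proposed it
    impact: int      # hypothetical score, 1 (low) to 5 (high)
    complexity: int  # hypothetical score, 1 (easy) to 5 (hard)


def coordinate(strategies):
    """Merge duplicates, rank by impact then ease, and group by category."""
    merged = {}
    for s in strategies:
        # Normalize case and whitespace; a stand-in for semantic matching.
        key = " ".join(s.text.lower().split())
        if key in merged:
            item = merged[key]
            if s.source not in item["sources"]:
                item["sources"].append(s.source)
            item["impact"] = max(item["impact"], s.impact)
            item["complexity"] = min(item["complexity"], s.complexity)
        else:
            merged[key] = {"text": s.text, "category": s.category,
                           "sources": [s.source], "impact": s.impact,
                           "complexity": s.complexity}

    # Higher impact first; among equals, lower implementation complexity first.
    ranked = sorted(merged.values(),
                    key=lambda m: (-m["impact"], m["complexity"]))

    plan = {}
    for priority, item in enumerate(ranked, start=1):
        plan.setdefault(item["category"], []).append(
            {"priority": priority, "text": item["text"],
             "sources": item["sources"]})
    return plan


if __name__ == "__main__":
    plan = coordinate([
        Strategy("Enable MFA for remote access", "prevention controls", "technical", 5, 2),
        Strategy("enable  MFA for remote access", "prevention controls", "apt", 4, 2),
        Strategy("Hunt for anomalous logins", "threat hunting", "apt", 3, 3),
    ])
    for category, items in plan.items():
        print(category, items)
```

Keeping the list of contributing agents on each merged entry preserves the source attributions that the structured plan is described as carrying.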
Author Contributions
Conceptualization, N.L. and X.L.; methodology, N.L. and X.L.; software, N.L., X.L. and Z.L.; validation, N.L., X.L., D.M. and L.Y.; formal analysis, N.L. and Z.L.; investigation, N.L., Z.L., D.M. and H.C.; resources, H.C., W.Z. and X.W.; data curation, N.L., Z.L. and D.M.; writing—original draft preparation, X.L. and Y.L.; writing—review and editing, N.L., X.L. and Y.L.; visualization, N.L. and Z.L.; supervision, X.L. and X.W.; project administration, N.L. and X.L.; funding acquisition, N.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the STATE GRID QINGHAI ELECTRIC POWER COMPANY, Project Name: Research and Application of Attack Detection Technology Based on Large Models (52280725000B).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Restrictions apply to the availability of these data. Data were obtained from MITRE ATT&CK and are available at https://attack.mitre.org with the permission of MITRE ATT&CK.
Conflicts of Interest
Authors Nanfang Li, Xiang Li, Zongrong Li, Denghui Ma, Lijun Yan, Haishan Cao, Wenqian Zhang and Xu Wang were employed by State Grid Qinghai Electric Power Company. The remaining author declares that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
References
- Che Mat, N.I.; Jamil, N.; Yusoff, Y.; Mat Kiah, M.L. A systematic literature review on advanced persistent threat behaviors and its detection strategy. J. Cybersecur. 2024, 10, tyad023.
- Mulahuwaish, A.; Qolomany, B.; Gyorick, K.; Abdo, J.B.; Aledhari, M.; Qadir, J.; Carley, K.; Al-Fuqaha, A. A survey of social cybersecurity: Techniques for attack detection, evaluations, challenges, and future prospects. Comput. Hum. Behav. Rep. 2025, 18, 100668.
- Malik, V.; Khanna, A.; Sharma, N.; Nalluri, S. Advanced Persistent Threats (APTs): Detection Techniques and Mitigation Strategies. Int. J. Glob. Innov. Solut. 2024.
- Cobos, E.V.; Cakir, S.; Straub, S.; Qiang, C.Z.; Torgusson, C. A Review of the Economic Costs of Cyber Incidents; World Bank: Washington, DC, USA, 2024.
- Mohamed, N. Current trends in AI and ML for cybersecurity: A state-of-the-art survey. Cogent Eng. 2023, 10, 2272358.
- Ferrag, M.A.; Alwahedi, F.; Battah, A.; Cherif, B.; Mechri, A.; Tihanyi, N. Generative AI and large language models for cyber security: All insights you need. arXiv 2024, arXiv:2405.12750.
- Brandão, P.; Silva, C. Unveiling the Shadows—A Framework for APT’s Defense AI and Game Theory Strategy. Algorithms 2025, 18, 404.
- Sarker, I.H.; Janicke, H.; Ferrag, M.A.; Abuadbba, A. Multi-aspect rule-based AI: Methods, taxonomy, challenges and directions towards automation, intelligence and transparent cybersecurity modeling for critical infrastructures. Internet Things 2024, 25, 101110.
- Czaja, P.; Gdowski, B.; Niemiec, M.; Mees, W.; Stoianov, N.; Votis, K.; Kharchenko, V.; Katos, V.; Merialdo, M. Cybersecurity challenges and opportunities of machine learning-based artificial intelligence. Neural Comput. Appl. 2025, 37, 27931–27956.
- Tong, Y.; Liang, H.; Ma, H.; Zhang, S.; Yang, X. A Survey on Reinforcement Learning-Driven Adversarial Sample Generation for PE Malware. Electronics 2025, 14, 2422.
- Jaffal, N.O.; Alkhanafseh, M.; Mohaisen, D. Large language models in cybersecurity: A survey of applications, vulnerabilities, and defense techniques. AI 2025, 6, 216.
- Motlagh, F.N.; Hajizadeh, M.; Majd, M.; Najafi, P.; Cheng, F.; Meinel, C. Large language models in cybersecurity: State-of-the-art. arXiv 2024, arXiv:2402.00891.
- Wu, Y.; Lang, R.; Yang, H.; Li, X. An automated security policy generation method based on rule-matching and machine-learning models. In Proceedings of the 2024 International Conference on Advanced Control Systems and Automation Technologies (ACSAT), Nanjing, China, 15–17 November 2024; pp. 217–220.
- Noor, K.; Imoize, A.L.; Li, C.-T.; Weng, C.-Y. A review of machine learning and transfer learning strategies for intrusion detection systems in 5G and beyond. Mathematics 2025, 13, 1088.
- Zhang, S.; Li, S.; Chen, P.; Wang, S.; Zhao, C. Generating network security defense strategy based on cyber threat intelligence knowledge graph. In Proceedings of the International Conference on Emerging Networking Architecture and Technologies, Shenzhen, China, 15–17 October 2022; pp. 507–519.
- Singh, A.V.; Rathbun, E.; Graham, E.; Oakley, L.; Boboila, S.; Oprea, A.; Chin, P. Hierarchical Multi-agent Reinforcement Learning for Cyber Network Defense. arXiv 2024, arXiv:2410.17351.
- Xu, T.; Wen, Z.; Zhao, X.; Wang, J.; Li, Y.; Liu, C. L2M-AID: Autonomous Cyber-Physical Defense by Fusing Semantic Reasoning of Large Language Models with Multi-Agent Reinforcement Learning (Preprint). arXiv 2025, arXiv:2510.07363.
- Tang, L.; Meng, Y.; Patra, A.; Ma, W.; Ye, M.; Xi, Z. POLAR: Automating Cyber Threat Prioritization through LLM-Powered Assessment. arXiv 2025, arXiv:2510.01552.
- Mukherjee, S.; Chatterjee, S.; Purvine, E.; Fujimoto, T.; Emerson, T. Large Language Model-Based Reward Design for Deep Reinforcement Learning-Driven Autonomous Cyber Defense. arXiv 2025, arXiv:2511.16483.
- Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. Front. Comput. Sci. 2024, 18, 186345.
- Chowa, S.S.; Alvi, R.; Rahman, S.S.; Rahman, M.A.; Raiaan, M.A.K.; Islam, M.R.; Hussain, M.; Azam, S. From language to action: A review of large language models as autonomous agents and tool users. arXiv 2025, arXiv:2508.17281.
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.R.; Cao, Y. ReAct: Synergizing reasoning and acting in language models. In Proceedings of the Eleventh International Conference on Learning Representations, Virtual, 25–29 April 2022.
- Zhao, H.; Ma, C.; Wang, G.; Su, J.; Kong, L.; Xu, J.; Deng, Z.-H.; Yang, H. Empowering Large Language Model Agents through Action Learning. arXiv 2024, arXiv:2402.15809.
- Xiong, W.; Song, Y.; Zhao, X.; Wu, W.; Wang, X.; Wang, K.; Li, C.; Peng, W.; Li, S. Watch every step! LLM agent learning via iterative step-level process refinement. arXiv 2024, arXiv:2406.11176.
- Nweke, I.P.; Ogadah, C.O.; Koshechkin, K.; Oluwasegun, P.M. Multi-Agent AI Systems in Healthcare: A Systematic Review Enhancing Clinical Decision-Making. Asian J. Med Princ. Clin. Pract. 2025, 8, 273–285.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.