Next Article in Journal
Dealing with Class Overlap Through Cluster-Based Sample Weighting
Previous Article in Journal
Research on the Application of Federated Learning Based on CG-WGAN in Gout Staging Prediction
Previous Article in Special Issue
A Context-Aware Doorway Alignment and Depth Estimation Algorithm for Assistive Wheelchairs
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Article

Investigation of Cybersecurity Bottlenecks of AI Agents in Industrial Automation

1
University West, 461 32 Trollhättan, Sweden
2
AGH University of Science & Technology, 30-059 Kraków, Poland
3
UAE University, Al Ain P.O. Box 15551, United Arab Emirates
*
Author to whom correspondence should be addressed.
Computers 2025, 14(11), 456; https://doi.org/10.3390/computers14110456
Submission received: 7 July 2025 / Revised: 8 September 2025 / Accepted: 10 September 2025 / Published: 23 October 2025
(This article belongs to the Special Issue AI for Humans and Humans for AI (AI4HnH4AI))

Abstract

The growth of Agentic AI systems in Industrial Automation has brought forth new cybersecurity issues which in turn put at risk the reliability and integrity of these systems. In this study we look at the cybersecurity issues in industrial automation in terms of the threats, risks, and vulnerabilities related to Agentic AI. We conducted a systematic literature review to report on the present day practices in terms of cybersecurity for industrial automation and Agentic AI. Also we used a simulation based approach to study the security issues and their impact on industrial automation systems. Our study results identify the key areas of focus and what mitigation strategies may be put in place to secure the integration of Agentic AI in industrial automation. Our research brings to the table results which will play a role in the development of more secure and reliable industrial automation systems, which in the end will improve the overall cybersecurity of these systems.

1. Introduction

Over the past few decades, industrial automation has undergone a dramatic transformation, evolving from basic mechanization to highly interconnected cyber physical systems (CPS) [1]. Today’s production lines, power plants, and factories integrate digital control networks, robotics, sensors, and actuators to maximize performance, safety, and efficiency. Artificial intelligence (AI) has increasingly become a central driver of this evolution, enabling automated systems to respond, schedule, predict, and adapt effectively [2]. A more recent advancement, known as Agentic AI, has emerged alongside these developments. By allowing autonomous decision making, Agentic AI systems go beyond traditional automation. With little or no human input, they are capable of perceiving their environment, processing complex data, making decisions, and interacting with both digital and physical systems [3]. Agentic AI offers transformative opportunities for smart manufacturing [4]. For example, intelligent agents can optimize production lines, enable autonomous maintenance and scheduling, dynamically adapt supply chains, and improve workplace safety. Since such systems and processes can operate independently, they also promise significant operational and financial benefits [5]. However, this autonomy introduces new cybersecurity challenges. Unlike traditional control systems that follow fixed logic, Agentic AI systems are more flexible, yet this very flexibility makes them vulnerable to manipulation, adversarial influence, data poisoning, and decision drift [2,6]. Attacks targeting an agent’s ability to reason, learn, or interpret sensor data can result in unsafe operational decisions, process disruptions, or even physical damage to industrial infrastructure. Existing cybersecurity frameworks in industrial environments were not originally designed to protect autonomous systems capable of defining priorities and making decisions independently [7]. Detecting anomalies or attacks in such systems is particularly difficult, as Agentic AI agents operate autonomously and continuously adapt their behavior based on learning and shifting priorities [8,9]. The lack of cybersecurity strategies tailored specifically for these systems underscores the urgency of this research.
This paper investigates cybersecurity risks, threats, and vulnerabilities related to Agentic AI systems in industrial automation [10,11]. It focuses on identifying key security challenges, simulating adversarial attacks, exploring detection methods, and proposing practical recommendations to strengthen security resilience [2,6]. In particular, the contributions of this study are threefold: first, it presents a structured and comprehensive literature review of cybersecurity risks, threats, and vulnerabilities specific to Agentic AI in industrial automation, highlighting major gaps in current research and practice; second, it designs and implements simulated adversarial attacks, including Distributed Denial of Service (DDoS), False Data Injection, Replay, and Adversarial Attacks, on two Agentic AI frameworks (CrewAI and LangFlow), while evaluating their resilience using GAN-based anomaly detection and large language model reasoning; and third, it proposes practical mitigation strategies tailored to Agentic AI, such as advanced anomaly detection, secure communication protocols, and integration with standardized security frameworks. By combining these three contributions, this research provides actionable insights into how the next generation of autonomous industrial systems can be made more secure and resilient.

2. Background and Related Work

Agentic AI refers to intelligent systems that operate autonomously by setting their own goals and adapting through continuous learning. These systems rely on advanced frameworks and large language model (LLM) agents to automate complex operations and streamline processes across industries [12]. The shift from traditional rule-based automation to Agentic AI represents a significant transformation in industrial automation, where systems no longer just execute predefined instructions but can instead reason, adapt, and make decisions independently [13]. This promises greater adaptability, efficiency, and resilience, while also introducing new challenges for cybersecurity [14].
By structuring the discussion in this way, the chapter highlights not only what is known about Agentic AI in industrial automation, but also where critical gaps remain, particularly in addressing the cybersecurity risks of systems capable of making autonomous decisions. These sections establish the background and related work necessary to understand the opportunities and challenges of securing Agentic AI in industrial automation.

2.1. Background

Agentic AI represents the next step in this evolution: systems that operate with little or no human guidance, demonstrating adaptability, advanced decision making, and self sufficiency [2,15]. This shift is transforming industrial manufacturing into smart factories, where AI driven sensors monitor equipment, predict failures, and prevent costly downtime [16]. Research shows that Agentic AI enhances resource allocation and operational efficiency while improving safety and resilience in complex industrial environments [13]. Its core strength lies in situational flexibility and goal oriented reasoning, allowing it to adapt in real time and manage unforeseen conditions [10]. These capabilities are supported by large language models, vertical AI agents, and reinforcement learning frameworks, which extend the ability of these systems to learn from interaction and execute multi step reasoning with minimal supervision [17,18]. Recent studies emphasize the scalability of Agentic AI across industries. For example, vertical AI agents are increasingly integrated into workflows in manufacturing, logistics, and finance, while architectures such as transformers are being optimized for tasks like intelligent threat analysis and vulnerability detection [2,9]. Beyond conceptual advances, practical implementations have been proposed such as edge-based middleware for cyber physical manufacturing systems (CPMS) to enable real-time coordination among smart agents [19]. Other research highlights explainability, human–robot teaming, and cooperative multi agent designs as essential factors for adoption in industrial environments [20,21,22,23,24].
This section provides the theoretical and technological context for the paper by reviewing how Agentic AI has been defined and applied in industrial automation, what opportunities it creates, and what risks accompany its deployment. It is organized into several subsections. Section 2.1 outlines the background of Agentic AI and industrial automation, explaining the capabilities and architectures of these systems. Section 2.1.1 discusses applications in industrial environments, including predictive maintenance, process optimization, and supply chain management. Section 2.1.3 provides in-depth analysis of threats and vulnerabilities of Agentic AI, highlighting the types of adversarial attack that exploit its autonomy and decision-making capabilities. Section 2.1.3 reviews the cybersecurity methods and frameworks that have been proposed to defend against such threats. Section 2.1.4 identifies the limitations and gaps in existing research that set the foundation for this paper. And Section 2.1.5 higlights the emerging trends in security of AI. By consolidating prior research and critically assessing both opportunities and risks, this chapter provides a foundation for investigating how Agentic AI can be deployed securely in industrial automation.

2.1.1. Agentic AI in Industrial Automation

The integration of Agentic AI into industrial automation marks a paradigm shift from rigid, rule-based systems to autonomous, adaptive, and intelligent operations. Traditional programmable logic controllers (PLCs) and control systems were designed to execute predefined instructions, offering limited flexibility. By contrast, Agentic AI introduces autonomy, context awareness, and continuous learning, enabling systems to analyze real-time data, adapt their decision making processes, and respond dynamically to changing industrial environments [14,15]. At the system level, Agentic AI enhances resilience and efficiency by learning directly from operational data. Reinforcement learning techniques allow these systems to fine tune detection thresholds in real time, identifying risks such as unusual equipment vibrations, pressure fluctuations, or temperature changes before they escalate into failures [25]. This ability supports predictive maintenance, reduces downtime, and improves safety in critical industrial settings. In energy management, for example, Agentic AI agents are integrated with communication and information technologies (ICT) in smart grids, where agent-based models and service oriented architectures (SOA) enable collaborative energy distribution and real-time optimization [26].
Beyond system integration, Agentic AI is increasingly applied across diverse industrial use cases. In manufacturing, intelligent agents optimize production processes, reduce resource waste, and automate scheduling. In hyper automation environments, Agentic AI works alongside robotic process automation (RPA), IoT devices, and cloud systems to create unified and scalable ecosystems [27]. These applications extend to anomaly detection and compliance with industrial security standards such as IEC 62351 and ISA 99 [16,28], as well as to intelligent energy management systems that optimize power consumption based on contextual and environmental data [29]. By combining adaptability with real-world applications, Agentic AI is not only reshaping industrial workflows but also redefining the boundaries of automation. However, the same autonomy that enables predictive decision making and operational resilience also increases the attack surface, underscoring the need to critically assess the security challenges of these systems, which will be discussed in the following sections [16,30].

2.1.2. Threats and Vulnerabilities in Agentic AI

As Agentic AI systems become increasingly integrated into industrial automation, they also face growing cybersecurity threats [31,32]. These threats target the very features that make Agentic AI powerful: its autonomy, adaptability, and ability to reason with minimal human oversight. Unlike traditional control systems with predictable logic, Agentic AI systems learn continuously, interact dynamically with their environment, and communicate across distributed agents. This creates new opportunities for adversaries to exploit vulnerabilities in models, data, and communication channels [33,34].One major category of threats is adversarial attacks on AI models. Techniques such as data poisoning, embedding inversion, and adversarial prompting can manipulate training data or input signals, misleading AI systems into making incorrect or unsafe decisions without raising obvious alarms for human operators [33,35]. Multi agent environments are particularly at risk, as attackers can exploit weak communication protocols, such as unencrypted Bluetooth or insecure APIs, to intercept, alter, or inject malicious commands [36,37].
Automated attack generation, fueled by advances in generative AI, has further complicated the security landscape. Attackers can now leverage AI tools to identify vulnerabilities in real time, craft adversarial samples, or launch coordinated campaigns against critical industrial components [38]. For example, programmable logic controllers (PLCs) and sensors that manage essential processes in manufacturing or energy systems may be manipulated through spoofing, false data injection, or denial of service attacks, leading to production delays, supply chain disruptions, or even safety incidents [1,5]. The consequences of such attacks are not merely technical but also operational and strategic. Compromising decision making agents can disrupt entire production lines, destabilize smart grids, or undermine sustainability goals that rely on efficiency and reliability [6,39]. Although researchers have proposed adaptive defenses such as adversarial transformers, anomaly detection models, and reinforcement learning-based monitoring, deploying these solutions in real-world industrial environments remains difficult. Performance constraints, scalability issues, and the need for uninterrupted operations mean that many systems cannot easily adopt heavyweight defenses [6,13,25]. Threats to Agentic AI systems are evolving as rapidly as the systems themselves. Attackers continue to find new ways to evade detection, which underscores the importance of ongoing monitoring, improved forensic tools, and comprehensive security testing. These vulnerabilities form the basis for the defense methods and mitigation strategies discussed in the next section [26,40,41].

2.1.3. Cybersecurity Methods for Agentic AI

In response to the growing vulnerabilities of Agentic AI systems, researchers and practitioners have proposed a variety of cybersecurity methods aimed at safeguarding industrial automation. These methods focus on detecting anomalies, protecting communication channels, and enhancing resilience against adversarial manipulation [42]. Anomaly detection is one of the most widely studied approaches. By monitoring patterns of system behavior, anomaly detection tools can identify deviations that may indicate cyberattacks or system malfunctions [39]. In cyber–physical systems (CPS), where operational failures can lead to physical damage, anomaly detection plays a particularly critical role [43]. Machine learning-based detection models, including support vector machines and transformer based classifiers, are increasingly applied to identify subtle adversarial signals in large datasets [32]. Generative Adversarial Networks (GANs) have also been explored as powerful tools for anomaly detection, capable of modeling normal system behavior and flagging deviations that may correspond to malicious activity [43,44]. In addition to detection, secure communication protocols and encryption are essential for protecting sensitive industrial data. Standards such as IEC 62351 and ISA 99 provide guidelines for ensuring confidentiality, integrity, and availability in industrial networks [16,28]. These frameworks emphasize authentication, access control, and risk based security measures, which are vital for multi agent environments where autonomous systems exchange data continuously.
Industry leaders have also proposed specialized frameworks tailored to multi agent systems. For example, Microsoft’s AutoGen and OpenAI’s Assistant API illustrate how secure orchestration of LLM powered agents can be combined with encryption and access controls to mitigate risks associated with autonomous coordination [26,45]. Privacy preserving methods, such as secure coding practices and federated learning, are also gaining traction for limiting data exposure during training and inference. Beyond technical safeguards, structured methodologies such as STRIDE threat modeling help organizations systematically analyze risks at each stage of an Agentic AI system’s lifecycle. By considering factors like spoofing, tampering, and elevation of privilege, STRIDE provides a structured way to anticipate and mitigate attack vectors before deployment [15,41,46].
While these methods demonstrate promising directions, their deployment in real-time industrial settings remains limited. The complexity and scale of industrial automation often make it difficult to implement advanced detection or cryptographic techniques without affecting performance and reliability. Nevertheless, the combination of anomaly detection, secure communication, and structured threat modeling forms the foundation of current cybersecurity efforts for Agentic AI systems [42,43].

2.1.4. Challenges and Limitations

Despite promising advances in cybersecurity methods, significant challenges remain in securing Agentic AI systems within industrial automation. These challenges arise not only from the sophistication of adversarial threats but also from the complexity and unpredictability of autonomous decision making [26]. One major limitation is the lack of specialized security frameworks tailored specifically to Agentic AI. Traditional models designed for rule-based automation are insufficient for systems that learn dynamically, communicate autonomously, and adapt their behavior in real time [42,43]. The OWASP Agentic AI project has highlighted critical risk areas, including perception, decision making, memory management, and inter agent communication, but comprehensive frameworks addressing these issues are still in their infancy [33,47]. Practical deployment also suffers from interoperability challenges. Many industrial environments rely on legacy infrastructure that does not integrate seamlessly with modern AI technologies. This creates bottlenecks and fragmented security implementations, particularly in smart factories where cross platform communication is essential [29,48]. Similarly, limited compatibility among IoT devices and industrial sensors often undermines efforts to build unified security architectures [43].
Data privacy and regulatory compliance pose further obstacles. Agentic AI systems typically process large volumes of sensitive operational and personal data, raising the risk of breaches and privacy violations without robust encryption, authentication, and layered defenses [28]. The shortage of skilled professionals who understand both AI and cybersecurity compounds these issues, limiting organizational capacity to design, deploy, and maintain secure Agentic AI systems [30]. Emerging approaches such as Reinforcement Learning from Human Feedback (RLHF) aim to improve alignment and accountability of autonomous systems. However, their effectiveness in safety critical industrial contexts remains unproven, as these methods introduce new uncertainties and require extensive validation before large scale adoption [33]. Likewise, while anomaly detection and predictive modeling techniques such as MADS can forecast potential threats, their real-world performance is still constrained by false positives and scalability challenges [48,49].
Finally, developments in quantum computing present a dual challenge. On the one hand, quantum algorithms could improve detection and defense capabilities; on the other, they threaten to render current cryptographic protections obsolete, necessitating urgent research into quantum resistant security models [38]. Taken together, these challenges highlight a critical gap between theoretical security methods and their practical deployment in industrial environments. Addressing these limitations requires multidisciplinary collaboration across technical, organizational, and policy domains. More importantly, it underscores the need for applied research such as simulation based adversarial testing and systematic anomaly detection to bridge the gap between conceptual security models and operational resilience [3,34,42].

2.1.5. Emerging Trends in AI Security

The evolution of cyber threats has accelerated the search for innovative defenses, and several emerging trends are reshaping the security landscape for Agentic AI systems. One promising direction is the rise of autonomous cybersecurity agents AI-powered defense systems capable of continuously monitoring networks, detecting anomalies, and responding to attacks in real time. These agents leverage behavioral analytics, reinforcement learning, and adaptive controls to create self healing security environments that can scale with industrial demands [28,38]. Blockchain and distributed ledger technologies are also gaining attention for their ability to ensure data integrity and transparency across networked systems. By creating immutable audit trails, blockchain can support secure communication and verification in multi agent environments where trust and accountability are critical [15,29]. At the same time, adversarial defenses are becoming more specialized. Tools such as SmoothLLM and LLM Guard have been developed to reduce prompt injection and model manipulation risks in large language models (LLMs) [50]. These solutions, combined with techniques like paraphrasing, instruction defense, and filtering, are early steps toward making LLM-enabled Agentic AI systems more secure [50]. Quantum computing presents both a disruptive challenge and an opportunity. Although quantum capabilities threaten to break conventional encryption schemes, they also offer the potential for advanced quantum resistant cryptography and faster anomaly detection. Post quantum security research is therefore emerging as a critical area of focus for future proofing agentic AI [29,38,51].
Finally, human AI collaboration is increasingly recognized as an essential component of security strategy. Rather than fully replacing human oversight, future systems are expected to combine the speed and scale of AI with human judgment and contextual understanding. This hybrid approach offers a pathway to more reliable and trustworthy cybersecurity outcomes [38,52]. Together, these emerging trends illustrate the dynamic nature of AI security research. Although challenges remain, advances in autonomous defense, blockchain, quantum resilience, and human–AI collaboration signal a shift toward more adaptive and holistic approaches to secure Agentic AI in industrial automation [13,52].

2.2. Related Work

The security of Agentic AI systems in industrial automation is a multidimensional topic that includes AI system vulnerabilities, threat detection models, industrial architectural integration, and human AI collaborations. Despite promising developments, numerous critical gaps remain unfilled. Several studies look into vulnerabilities related to Large Language Models (LLMs) and Agentic AI. Papers present detailed analysis of threats such as data poisoning, model inversion, and rapid injection [47,53]. These efforts primarily address general purpose AI systems but they do not apply their research to industrial control systems (ICS) or production contexts, where operational restrictions and real-time processing add new obstacles [54,55]. Similarly, the work on Semantic Adversarial Diagnostic Attacks (SADA) presents a relevant attack model. However, it lacks industrial case studies to validate its impact on real-time decision making systems [56]. There is a need for a cybersecurity defense compass to secure Agentic AI across development and deployment phases, yet no simulation or deployment evidence is presented to validate these strategies [10]. Ref. [49] discuss GenAI applications in manufacturing, highlighting business value but neglecting security architecture and defense mechanisms. Meanwhile, Ref. [18] and the Agentic Systems Guide describe vertical AI agents for industry specific tasks but do not analyze threat resilience or adversarial robustness. The paper “Securing the Industrial Backbone” contribute by detailing machine tool communication middleware, but do not consider the cybersecurity implications of interagent communication or data integrity risks [28]. Mubarak and Awa propose machine learning-based anomaly detection models for ICS and Industry 4.0 environments [57]. Yet, their evaluations focus on offline datasets without evidence of real-time deployment or adversarial robustness [57,58]. The paper introduces an adaptive transformer-based anomaly detection model, demonstrating the potential of transformers for industrial applications [11]. However, the study does not address long-term learning stability or false-positive mitigation, which are critical for production environments [11]. Furthermore, work on GAN-based anomaly detection in smart manufacturing shows promise for quality control, but does not evaluate the system resilience against poisoning or evasion attacks. Leitão provides agent-based manufacturing and cyber physical systems (CPS) frameworks [26]. However, these works rarely integrate adversarial threat models or STRIDE based risk assessments, leaving security considerations underdeveloped [26].
Similarly, industrial guidelines such as IEC 62351 and ISA 99 are mentioned in several studies, but practical integration into Agentic AI workflows is seldom demonstrated. Human AI collaboration is addressed by the paper “Human–robot interaction in industrial collaborative robotics” through hyper automation frameworks integrating human oversight and AI driven response [20]. However, the scalability and effectiveness of these approaches under continuous adversarial pressure remains unexplored [13,20]. Despite growing academic interest, the following limitations and gaps persist:
  • Insufficient empirical validation in real industrial environments [9].
  • Lack of integrated solutions combining anomaly detection, prevention, and response [1].
  • Minimal exploration of adversarial resilience in ML-based detection systems [2].
  • Limited application of security standards in Agentic AI operational models [59].
  • Neglect of human-in-the-loop security practices in most technical models [20].
These gaps highlight the need for applied research, such as the proof-of-concept system and simulation-based adversarial testing proposed in this thesis, to bridge theoretical insights with operational realities in securing Agentic AI systems for industrial automation.

3. Research Methodology

This project is divided into two main phases to address the research question. The flow diagram was drawn to give the high level overview of the project flow as shown in Figure 1.
Figure 1 presents the progressive workflow followed in this study, emphasizing the connection between the Systematic Literature Review (SLR) and the simulation based evaluation. The process began with the Kickoff and Project Introduction where we have established the primary research question, reviewed the background of Agentic AI in industrial automation, highlighted why the study is important, and outlined our two-part approach (literature review and simulations). Followed by establishing the Background and Related Work. The systematic literature review was conducted according to PRISMA guidelines, involved identifying relevant publications, synthesizing their findings, and evaluating them against predefined inclusion and exclusion criteria. From this review, we successfully identified key gaps and major threats in Agentic AI used in industries, this guided our simulation design including attacks, metrics, and framework settings. The simulation phase uses two agentic AI frameworks LangFlow and CrewAI tested under comparable conditions. The outputs of simulation were combined and investigated through qualitative and quantitative analyses. Finally, results from both analyses were integrated in the Results and Discussion section, enabling comparisons with the literature review findings and drawing conclusions on performance, resilience, and applicability.

3.1. Systematic Literature Review

This research follows the systematic review approach based on the Preferred Reporting Items for Systematic Reviews and Meta Analyses (PRISMA) framework [60]. The PRISMA process was selected because it provides a transparent and replicable methodology for identifying, screening, and synthesizing relevant studies. To ensure comprehensive coverage, we focused on reputable and widely recognized databases, including IEEE Xplore, Scopus, ACM Digital Library, SpringerLink, and ScienceDirect. IEEE Xplore and ACM DL were prioritized for their strong focus on computer science, artificial intelligence, and cybersecurity research. Scopus and ScienceDirect were included to broaden the scope toward interdisciplinary studies covering industrial automation, control systems, and applied engineering. SpringerLink was added to capture emerging work on cyber physical systems and multi agent architectures. These databases collectively provide a balanced view of both technical and applied perspectives relevant to Agentic AI in industrial automation.
The search strategy employed carefully selected keywords and Boolean queries, such as: “Agentic AI” AND (“cybersecurity” OR “security” OR “threats” OR “vulnerabilities”) AND (“industrial automation” OR “cyber–physical systems” OR “smart manufacturing”). Additional variations included terms like “adversarial attacks,” “anomaly detection,” “multi agent systems,” and “GAN based detection” to capture literature on specific threats and defensive methods. We also included the keyword “bottleneck” because it is widely used in cybersecurity and industrial automation literature to describe limiting factors, weak points, or critical challenges. Since our study investigates security constraints that restrict the safe deployment of Agentic AI, this term helped us capture research directly addressing such limitations.
To refine the scope, we restricted the search to peer-reviewed articles, conference papers, and journal publications between 2015 and 2025, published in English. This time window was chosen to reflect the recent emergence of Agentic AI as a field, while ensuring inclusion of foundational security-related works on AI and industrial systems. By combining multiple databases, structured queries, and clear inclusion/exclusion criteria, this study ensured that the literature review captured both the breadth of existing research and the depth of specialized contributions, aligning with the objectives of this thesis.

3.1.1. Identification of Studies

We took this initial step by identifying the reputable databases using the University Library website. Since Agentic AI application in Industrial Automation is just an emerging area and still under studies, the research availability is limited. To obtain the most reliable literature, both peer-reviewed and scientific publications were considered. The Databases that were searched are ACM Digital Library, Scopus SpringerLink and Academic Search Premier. These databases were selected due to their reputation for having peered reviewed papers. They provide extensive coverage of research centered on artificial intelligence, cyber security, and software engineering fields. In order to extract the more relevant studies, we created well structured queries that integrated Boolean operators “AND” “OR” using identified keywords to identify security bottlenecks of Agentic AI in Industrial Automation. The key words included Agentic AI, Industrial Automation, Smart Manufacturing, Autonomous Agents, Adversarial attack, Cyber Threats. These keywords were used iteratively to improve and maximize on the search outcomes. The Table 1 below shows the initial results from each database.

3.1.2. Articles Screening

The retrieved articles were screened based on defined eligibility criteria for both inclusion and exclusion to ensure the choice of high quality relevant articles. By using this criterion, we removed articles whose content did not fit the research goals or whose findings were just about general AI and did not discuss anything about cyber security.
The Table 2 below gives the explanation of inclusion criteria.
Table 3 presents the exclusion criteria applied during the screening process.

3.1.3. Data Extraction

A structured data extraction technique was applied to gather information systematically from the final set of selected studies. The extracted information included metadata (title, authors, publication year, source), research methods, security implications, and mitigation strategies. To organize the process, we focused on four main elements:
  • Metadata: Basic details such as title, authors, year of publication, venue, and source.
  • Security bottlenecks: Identified risks, threats, and vulnerabilities of Agentic AI in industrial automation.
  • Mitigation Strategies: Approaches proposed or evaluated by researchers to address identified bottlenecks.
  • Results and ConclusionsKey findings, recommendations, and best practices.
To consistently classify bottlenecks, we coded them into three categories: (i) risks/vulnerabilities, (ii) attack types, and (iii) mitigation/defensive measures. This thematic coding allowed us to map recurring security issues across studies.
To ensure reliability, two authors independently reviewed and coded each paper. Any disagreements were resolved through discussion until consensus was reached. This double check process minimized bias and increased consistency in the final dataset.
This systematic extraction and validation process enabled us to identify key cybersecurity challenges of Agentic AI in industrial automation while also mapping available mitigation strategies in a structured way.

3.2. Experimental Setup

We developed an Agentic AI system based on two frameworks, CrewAI and LangFlow. The ultimate goal was to compare the performance of two AI orchestration frameworks, CrewAI and LangFlow, through replicated simulated cyber security attacks and to measure the effectiveness of integrated GAN-based anomaly detection systems in both frameworks. Both of the selected solutions were applied as intelligent agents in simulated environments in a smart factory, with the capability to sense physical parameters (like temperature, pressure, vibration, and humidity) in real time, operate autonomously, and react to security events.

3.2.1. CrewAI Framework

CrewAI framework was selected due to its flexibility in developing complex multi agent systems capable of achieving complex goals. CrewAI is an open source framework designed to coordinate AI agents with role playing and autonomous operations, facilitating cooperation among agents to solve complex problems [61]. In CrewAI, an agent is essentially an intelligent unit designed to perform specific tasks without human intervention. Agents operate autonomously, where they can make decisions and execute actions based on their given instructions. The framework allows for what are called “role customized agents,” which means that each agent can have a unique role or function depending on the needs of the application. The authors note that CrewAI allows developers to define AI agents with specific roles, objectives, and tools. CrewAI is known for its non code graphical interface, which allows non programmers to design and manage MAS effectively [62]. The platform provided an interface to create multi agent workflows through the use of large language models (LLMs), sensor simulation, and real-time anomaly detection.
The design consisted of seven nodes where five were agents assigned specific tasks and two were input and output nodes.

3.2.2. LangFlow Framework

LangFlow was initially created to facilitate the development of agents using drag and drop GUI and to allow rapid prototyping. Agent logic used in CrewAI was reintroduced in LangFlow for consistency across experiments. Langflow is a specialized No Code platform tailored for designing and executing conversational AI workflows. Langflow provides a scenario editor that enables users to design chatbot interactions step by step. This helps build flows where the conversation logic is clearly mapped out through visual components [63]. The platform is highly compatible with LangChain components, which are libraries that simplify connecting various AI modules in a sequence, ensuring that the designed workflows can easily integrate different tools and models. It is designed to empower users who may not have technical expertise by offering a drag and drop visual interface, simplifying the creation of complex workflows and thereby reducing the barrier to entry for AI application development
Tasks delegated to LangFlow agents included monitoring input streams of data from industrial sensors, triggering decision nodes in response to environmental thresholds, interacting with a GAN discriminator to detect anomalous readings, logging response outcomes for later evaluation. However, LangFlow was later evolved into a Python: 3.10-based backend simulation due to limitations imposed by its user interface. Despite this limitation, we still tried to design an architecture that would offer a node-based simulation that incorporated sensor inputs, GAN implementation and evaluation as well as LLM reasoning. A streamlined Generative Adversarial Network construct was developed using the PyTorch: 2.7.0 framework. Training of the discriminator portion was solely carried out with normal sensor data to create a profiling baseline. Any significant deviation recognized through the discriminator constituted an outlier.

3.2.3. Large Language Model Integration

OpenAI’s GPT 4o mini was used in both CrewAI and LangFlow frameworks. Agents interacted with GAN detectors and OpenAI’s Large Language Model for analyzing the received data in order to check for anomalies and detecting possible threats. Once anomalies were detected, sensor data was passed to the LLM with the following prompt: Sensor data: data. You are a safety monitoring AI in a smart industrial environment. Evaluate the sensor report and do the following:
  • Summarize the environmental conditions.
  • Assess whether the system is operating normally or abnormally.
  • Take Action Automatically.
This made LLM make relevant reasoning, enhancing the transparency of automated safety assessments based on the data it received.

3.2.4. Simulated Attacks

To evaluate system robustness, we focused on four representative cyberattacks: Distributed Denial of Service (DDoS), False Data Injection (FDI), Replay Attacks, and Adversarial Attacks. These attack types were selected because they represent some of the most critical threats to Agentic AI in industrial automation, where autonomous decision making, continuous learning, and inter agent communication expand the attack surface. Together, they reflect the three pillars of cybersecurity availability, integrity, and confidentiality making them highly relevant for this study.
Distributed Denial of Service (DDoS): A DDoS attack attempts to overload system resources by flooding them with excessive traffic. In Agentic AI systems, this disrupts the availability of data and prevents autonomous agents from maintaining communication. In smart factories, where timing and availability are critical, DDoS attacks can delay sensor readings, disrupt feedback loops, or even crash agent services altogether [1].
False Data Injection (FDI): In this attack, adversaries tamper with data from sensors or control units before presenting it to the AI system. Because Agentic AI agents often make decisions independently, corrupted data may result in false alarms, failure to detect real faults, or unsafe operational choices. FDI can therefore cause misconfigurations, shutdowns, or even deliberate sabotage of industrial processes [6].
Replay Attacks: Replay attacks involve capturing legitimate sensor or command data and resending it at a later time to create a false impression of system conditions. For example, a safe temperature reading may be replayed to conceal an overheating event. Such attacks degrade data integrity and freshness, pushing agents toward obsolete or unsafe decisions. This undermines both system trust and safety in real-time environments [5].
Adversarial Attacks: Adversarial attacks target the AI models themselves by introducing malicious inputs designed to deceive machine learning algorithms. For Agentic AI, this is especially concerning when systems rely on LLMs, computer vision, or reinforcement learning. Perturbations as subtle as a few pixels in an image can cause an agent to misclassify safe vs. unsafe states, potentially leading to hazardous decisions [39].
By focusing on these four attack types, we capture a realistic range of threats that Agentic AI systems are most likely to encounter in industrial automation. They directly test the resilience of agents in maintaining availability, integrity, and trustworthy decision making under hostile conditions.

3.2.5. Evaluation Metrics

We have set seven evaluation metrics, which are common in evaluating AI systems capability in detecting cyber threats, so we could comparatively analyze our frameworks. The metrics were accuracy, precision, recall, F1 score, detection rate, force positive/negative rate and AUC-ROC score. Accuracy measures the overall degree of correctness for the model and is the proportion of true predictions, true positives and true negatives, over the entire data [64]. But for unbalanced datasets, Precision and Recall give better indications. Precision is the number of times in which the correct threats were identified [65], while Recall (true positive rate), indicates the correct number of actual attacks that were identified [66]. F1 Score is the mean of Precision and Recall and it also provides a fair measure when an absence of both false positives and false negatives are essential [66]. False positive rate and false negative rate complement Recall by showing the rates of false alarms and missed attacks respectively [67]. Detection rate is basically aligned with Recall, but it explicitly indicates the proportion of successfully detected attacks. Lastly, AUC-ROC measures the model’s performance in discriminating attack and normal cases at different thresholds [68].
The Table 4 below gives contextualized definitions for true positive, true negative, false positive, and false negative, as applied in our case.
The Table 5 gives details of the actual evaluation metrics and their formula.

3.3. Research Questions

The following research questions have been developed to breach important knowledge gaps in cybersecurity for Agentic AI systems within industrial automation. The questions are based on a thorough evaluation of the literature:
  • RQ1: What are the cyber security bottlenecks (threats, risks and vulnerabilities) of Agentic AI in industrial automation?
  • RQ2: What simulations can be conducted to demonstrate attacks on Agentic AI in industrial automation, with focus on adversarial attacks?
  • RQ3: What are the mechanisms that can be used to detect attacks on Agentic AI systems?
  • RQ4: What are the recommendations that can be used to secure Agentic AI systems against attacks and threats?
These research questions provide a natural progression from problem identification (RQ1) to detection and monitoring (RQ3), experimental validation (RQ2), and actionable cyber security tactics (RQ4). When combined, they provide a systematic and comprehensive analysis of how to securely integrate Agentic AI into vital industrial systems in the future.

3.4. Research Design

This paper uses a mixed-method approach, carefully integrating simulation based experiments, Proof of Concept (PoC) creation, and Systematic Literature Review (SLR). Systematic literature review was first carried out to accurately identify and classify the cyber security risks, vulnerabilities, and threats directly related to Agentic AI systems within industrial automation. A strong theoretical foundation for additional practical study is established, knowledge gaps are clearly identified, and the research is well grounded in current academic knowledge because of the literature review. After studying the literature, a workable proof of concept system was created using frameworks like CrewAI and LangFlow, which made it possible to mimic Agentic AI features in an industrial setting in a regulated and realistic way. Key parameters like temperature, vibration, and pressure could be separately monitored in this proof of concept stage, allowing for in depth tracking and analysis of the Agentic AI system’s decision making procedures. Simulation-based tests were used in the final phase to methodically assess the security vulnerabilities found in the literature review. The created Agentic AI system was tested with attack simulations to verify theoretical vulnerabilities and evaluate their possible practical implications. To address real-world security concerns, these simulations also provide a platform for incorporating and analyzing anomaly detection techniques, like Generative Adversarial Networks (GANs).
This multi step, mixed-method approach was chosen because of its potential to effectively address the multifaceted and intricate character of the research topic. Thorough coverage of both conceptual comprehension and practical applicability is guaranteed by combining theoretical literature study with practical validation through proof of concept and simulations, providing strong and useful research findings.

3.5. Data Collection Method

Three main tasks included data collecting, simulation based cyber attack experiments, the creation of a prototype Agentic AI system, and a thorough literature assessment to systematically answer the research questions. Several academic resources, such as IEEE Xplore, ScienceDirect, SpringerLink, ACM Digital Library, and arXiv, were used in the systematic literature review (SLR). Targeted search queries that included keywords like “Agentic AI” OR “Autonomous AI” OR “LLM Agents” along with “Industrial Automation” OR “Smart Manufacturing” OR “ICS” were used to find relevant research articles and academic papers. These queries were then further concise down by terms like “Cybersecurity” OR “Security Risk” OR “Threats” OR “Vulnerabilities”. To maintain the inclusion and exclusion criteria rigorous and relevant only peer-reviewed journal articles, respectable conference proceedings, and acknowledged preprints published in the previous five years were included. Articles that were not directly related to cybersecurity, agentic AI, or industrial automation were excluded. The most relevant research was chosen after reviewing abstracts, and full texts to provide a solid theoretical framework.

3.6. Data Analysis Method

To create realistic and useful datasets for analyzing the vulnerabilities found in the literature review, cyber attack simulations were conducted. Adversarial manipulation of sensor data streams was used in these simulations, including the injection of fake pressure drops, unusual vibration patterns, and abnormal temperature spikes. Prompt injection attacks were one of the other attack vectors used to trick the AI bots’ decision making. For these simulated attack scenarios, Python-based scripts and security testing tools were used. This configuration made it possible to systematically track, record, and analyze the responses of the Agentic AI system, allowing for thorough assessments of the AI system’s detection and resilience.

4. Systematic Review and Simulations

This section combines insights from our systematic literature review with the threat modeling exercise. The review helped us identify recurring risks, gaps, and security challenges in Agentic AI and industrial automation. These findings directly informed the design of our threat modeling approach, ensuring that the simulated attacks and defenses were not chosen randomly but reflected real concerns raised in existing research. By first summarizing what the literature revealed and then applying those insights in a structured threat modeling exercise, we provide a clearer picture of both the theoretical and practical cybersecurity bottlenecks in Agentic AI systems.

4.1. Literature Review Findings

Our work started with planning and identifying the reputable databases and a total number of 339 records were identified from all the databases. After screening and inclusion phases, a total of 46 articles were included in the review. In order to have the best insight of these articles, we grouped them in 5 categories namely; Agentic AI focus, Industrial Automation Relevance, Cyber security Relevance, Experiment focus and publication year trend.
Out of 45 articles, 4 papers had a strong focus on Agentic AI, while 15 articles had moderate focus; 5 papers did not focus in Agentic AI, 20 papers had strong focus and 3 papers were not very in-depth. Figure 2 shows the distribution of papers based on Agentic AI Focus.
As on Cyber Security focus, we looked at papers which also talked about Cyber Security issues in Agentic AI; 52% of the papers had strong discussion on cyber security, 23.9% had moderate discussion, 19.6% had a shallow discussion on cyber security, while 4.3% of papers did not have any discussion on cyber security. Figure 3 gives the distribution of articles based on Cyber Security focus.
In the category of Industrial Automation focus, 28 articles had a significant discussion about Industrial Automation, 16 papers had moderate focus on Industrial Automation while 2 papers had a very low focus. Figure 4 shows the distribution of papers based on Industrial Automation relevance.
Lastly, we checked the number of publication on our topic domain from 2014 to 2025. We chose to start from 2014 because this is when Industry 4.0 had just started (3 years prior) and automation started being employed. The distribution of publications about Agentic AI, Industrial Automation and Cyber security showed growth from 2014 up to 2025, as shown in Figure 5.

4.2. Threat Modeling Application

During simulations, we built two frameworks in order to have a comparative analysis. The frameworks that we used are CrewAI and LangFlow.
Before describing these frameworks in detail, we first categorized the identified bottlenecks into four broad groups: data related, model related, communication related, and system level. This classification provides a structured way to link the literature review findings with the threat modeling and the simulations. Table 6 presents this categorization.

4.2.1. CrewAI Design and Implementation

To better illustrate the structure of the CrewAI framework, we mapped each simulated agent or node to its corresponding role in real-world industrial systems. This mapping highlights how abstract simulation components represent actual industrial elements such as IoT sensors, PLCs, intrusion detection systems, or SOC tools. Table 7 provides an overview of the agents, their functions in the simulation, and their relevance to real-world settings.
The design consisted of seven nodes where five were agents assigned specific tasks and two were input and output nodes. The Figure 6 shows the prototype of the CrewAI Framework.
Each node had a specific task to perform as given in the description below:
  • Sensor monitoring agent. It was responsible for the real-time processing of input sensor data from the input nodes. The simulated sensors were temperature, vibration, humidity and pressure. The baseline readings were configured in the input node and sent to the sensor monitoring node. This was setting operational base line standards for normal system behavior, as shown in Figure 7.
  • Response Implementation Node. This was responsible for interpretation of the sensor data processed by sensor monitoring agent. It was responsible for taking actions if any threshold exceeded predefined baseline. This node could make an autonomous decision for the safety of the simulated factory as shown in Figure 8.
  • Attack Simulation Node. This node was specifically responsible for simulating attacks on the system as shown in Figure 9.
  • GAN Defense Node. This node was responsible for performing independent assessments and the detection of security threats breached by deploying a GAN-based detection algorithm. The data received from the response implantation node was used as baseline to measure against the data received by attack simulation node. In this way, that historic data was used as a discriminator, to identify adversarial inputs generated by attack simulation node as shown in Figure 10.
  • Data Analysis Node. This node was responsible for analyzing all the data throughout the simulation and providing data on how the systems performed before, during, and after the attacks and how the GAN performed in defending the systems against the simulated attacks as shown in Figure 11.

4.2.2. LangFlow Design and Implementation

The LangFlow consisted of four nodes namely sensor input node, Prompt node, LLM node and output node. The Figure 12 shows the prototype of the LangFlow Framework.
As with CrewAI, each node had a specific task to perform
  • Sensor input node: It was responsible for real-time input of sensor data The simulated sensors were temperature, vibration, humidity and pressure. The baseline readings were configured in the input node and sent to the sensor monitoring node. This was setting operational base line standards for normal system behavior. The data was sent to prompt node. The Figure 13 show input node.
  • Prompt node. This was responsible for taking this received data and sending it to the Large Language Model for interpretation and action. Figure 14 shows prompt node.
  • LLM Node. This was the decision node which was taking autonomous decisions based on the data received from the prompt node. Figure 15 shows the LLM node.
  • Output Node. This was responsible for displaying the decision taken by the LLM. Figure 16 shows the output node.

5. Results

In this chapter, we are going to give the results of the work that we carried out. The results are presented in two sections: the systematic review section and simulations sections. Each section dives deep into the results and is divided in subsection to provide sufficient focus and detail on the results. This chapter also dives deep into the analysis of the SLR papers, their findings, and critical insights into cyber security bottlenecks, as discussed by different authors in the retrieved articles.

5.1. Systematic Literature Review Results Analysis

The results of the research were derived from the data which was gathered during the systematic literature review and simulations. Initially, 339 records were obtained from four databases. After removing 23 duplicate articles, irrelevant records, and 34 records for other reasons, we remained with 224 records. Subsequent screening based on titles, abstracts, and other exclusion criteria defined in the methodology led to the exclusion of 163 articles, leaving 61 records. Further eligibility screening excluded an additional 15 articles due to various factors; most of these articles lacked a direct link to industrial application. Ultimately, we included 46 articles in our study. Figure 17 summarizes this process.
Figure 17 illustrates the identification and screening process, where a total of 339 records were collected from all databases. Figure 18 gives the summary results of total number of articles, included articles, excluded articles and those that were assessed for eligibility per database. Figure 19 and Figure 20 show the ratio of included and excluded articles and per defined criteria, respectively.

5.2. Cyber Security Bottlenecks of Agentic AI

Agentic AI systems provide significant improvements in process efficiency and autonomous decision making, but they also bring with them a new range of cybersecurity risks. The very characteristics that give agentic systems strong independence, flexibility, and access to outside data sources are the cause of these weaknesses, including prompt injection and adversarial attacks, unpredictability of Agentic AI system dynamic behavior, denial of service and jamming attacks, increased attack surface, and data privacy issues. We will further present the gaps in the research in Agentic AI and Industrial Automation. In this section, we are going to breakdown the results of the reviewed articles.

5.2.1. Prompt Injection and Adversarial Attacks

Four studies picked prompt injection, adversarial sampling and hallucinated outputs as common cyber risks in Agentic AI systems that use LLMs and other algorithms, and this can result in unpredictable or malevolent actions [71,72,73,74]. Although tackled from different angles, all these studies agree that prompt injection and adversarial attacks are the big challenges. Altered prompts could lead an autonomous agent entrusted with ordering inventory to place excessive orders or reroute deliveries. The real-world threat of weaponized LLMs that can produce malware code, social engineering scripts, and phishing campaigns is demonstrated by the offensive deployment of models such as FraudGPT and WormGPT [75]. The integration of these agents with vector databases and settings that are rich in APIs presents yet another significant risk. Agentic AI may inadvertently reveal private information or carry out destructive operations when it uses plugins or third party services without strict input sanitization or authorization restrictions [74].
Another interesting finding is that in a multi agent industrial environment, other agents can be considered as adversaries in cases of data injection attacks, where part of the agents in a network may be considered being adversarial [74]. Ref. [71] cements this finding by adding a synchronization problem involving two systems, where one system is a malicious or considered an adversarial agent whose aim is to disrupt the behavior of the entire system through its interactions at the physical layer. The results indicated that such malicious behaviors have high potential of disrupting overall system performance [76] emphasizes the same that complexity of autonomous decision making and raises concerns about adversarial attacks that can deceive AI models, potentially causing unintended behaviors. Adversarial attacks exploit vulnerabilities in AI models by crafting inputs designed to mislead them. In industrial automation, such attacks could disrupt production processes or damage equipment. For example, adversarial examples in sensor data could trick AI systems into perceiving normal operations as anomalies or vice versa [77,78,79]. Although [79] does not specifically tackle critical cyber security vulnerabilities and risks of integrating Agentic AI in industrial automation, it focuses on vulnerabilities in machine learning driven autonomous control systems within advanced nuclear reactor designs, which is greatly linked to Agentic AI and autonomy.

5.2.2. Unpredictability of Agentic AI Systems Behavior

This problem is supported by five studies. According to Ref [10], decision drift, operational blind spots, or security policy circumventions might arise from the opacity of agentic decision logic in conjunction with quickly evolving settings [80]. In industrial settings where decisions have a direct impact on physical systems, this is especially risky. Another unnoticed attack surface is the integrity of training data. Performance can be subtly deteriorated or latent behaviors introduced through poisoning assaults, in which adversaries introduce tainted data during training [4,81]. Such poisoning may lead to dangerous equipment states or even improper process changes in CPS scenarios. Research has demonstrated that transformer or GAN-based models can be considerably impacted by even a tiny proportion of contaminated samples. Vulnerabilities in recommendation manipulation can be used by adversaries [82].
By creating adversarial settings, [23] address how LLM-based agents can be tricked into making risky operational recommendations. This is particularly important in settings where agentic advice are used for energy efficiency or maintenance. Limitations on explainability make cybersecurity efforts even more difficult. Transparent decision auditing techniques are absent from many agentic systems [45]. Human operators are unable to properly validate, intervene, or identify the underlying causes of a system’s behavior if they do not comprehend the reasoning behind it. This causes conflict in Security Operations Centers (SOCs) [20]. Although promising, federated and distributed learning approaches come with their own security vulnerabilities, federated edge learning models may experience identity spoofing by compromised nodes, fraudulent updates, or model inversion in OT contexts [83]. In the same vain, [76] states that malicious attacks, data breaches, and manipulation of AI decision making processes can be disastrous on AI agents. They continue to argue that systems can be targeted for unauthorized access, leading to operational disruptions or safety hazards, as compromised data can result in incorrect actions or system failures [76].

5.2.3. Denial of Service and Jamming Attacks

Agentic AI can come under DoS attacks, causing operation disruption in a manufacturing environment [74]. In their study, it is highlighted that DoS as one of the issues faced by AI agents. Attackers can block or disrupt communication channels, causing packet drops or delays, which impede the agents’ ability to share information and achieve consensus. This becomes a big problem in an industrial setup where availability is of the most important security objects to achieve. The paper examines models based on energy constraints of attackers, analyzing how the frequency and duration of such attacks influence system robustness. Another way of DoS discussed by the same researcher is jamming and communication blockades where external interference signals or malicious jamming can prevent reliable data transmission, challenging the integrity and availability of the control communication links. Industrial Automation also depends much on sensor communication where sensors relay messages to each other for operational efficiency. Ref. [84] looks at communication channels in a networked cyber–physical systems in which the fidelity of sensor data is critical for the safe operation of the physical plant. If this data is misrepresented and fed back by a primary sensor, it may cause service disruption or damage to the wider plant through interconnected physical processes which was looking at Securing AI agents argues that the DoS attack vulnerability brings the multi agent exploitation attack [47]. They explain that multi agent systems (MAS) utilize a set of capabilities of many agents to solve various issues and these systems are very susceptible to probing, repudiation, flooding, impersonation, and denial of service (DoS) attacks. Losing one agent causes major problems for the system and creates many problems and even exposes other agents [28].
Another study [85] which divided IIoT network architecture into physical, network and application layers, supports the notion of DoS attacks, especially in the network layer. The researchers explicitly emphasize that the network layer faces an increased variety of cyber threats, including Distributed Denial of Service (DDoS) attacks. The study further explores malware and intrusion attempts as well as threats such as protocol vulnerabilities, man-in-the middle attacks, and other tampering issues becoming more prominent since smart factories connect via 5G. All these threats are more likely to cause DDoS attacks on industrial agents, threatening their service availability. Another interesting risk identified by this research is about network stability [1]. Even though this might not be considered as a direct threat caused by threat actors, it introduces serious vulnerabilities in interconnected Ai agents in smart factories where availability is the most serious factor to be prioritized. The paper contextualizes network instability in the network layer in such a way that network slicing, a technique to divide physical networks into multiple virtual networks, creates challenges in maintaining stable and isolated slices. An attack in one slice may affect the overall service or cause cascading effects on other agents network isolation and segmentation is not properly isolated. Ensuring a steady and trustworthy connection requires resource scheduling, controller placement, and efficient flow control to address problems like delays and packet loss which is not a simple task. This clearly poses a threat as one of the bottlenecks [1,13].
In industrial automation, ensuring secure and efficient communication between agents is paramount. The challenges of communication delays and the need for secure protocols applicable to industrial settings, where timely and secure data exchange is vital for system coordination and performance, as discussed by [86]. Even though this is more into developing algorithms to compensate for any delays, it first agrees that communication delays occur, which have the potential to impact service availability. This paper, entitled "Neuro Adaptive Formation Control of Nonlinear Multi Agent Systems with Communication Delays", confirms that in industrial automation, ensuring secure and efficient communication between agents is crucial and that challenges of communication delays and the need for secure protocols discussed in the paper are directly applicable to industrial settings, where timely and secure data exchange is crucial for system coordination and performance [87].

5.2.4. Increased Attack Surface and Data Privacy Concerns

Much as AI in industrial manufacturing has completely redefined how production is handled by increasing efficiency and production, this has come with its cost in terms of cyber security. Unlike some decades ago when critical infrastructure was air gapped, we have noe experienced the implementation of Industry 4.0, Industry 5.0, and the Industrial Internet of Things (IIoT), which are connected to the outside world [88,89]. This has increased the attack surface and vulnerabilities in critical manufacturing infrastructure, raising the data privacy concerns in IIoT. Studies agree on this crucial issue that the increased inter connectivity between physical and cyber systems has created many security weaknesses and vulnerabilities, which have led to more cyber attacks and challenges in defending systems using traditional methods [28,50]. The study further highlights how conventional security mechanisms are no longer enough to prevent these attacks in industrial AI systems, and conventional security methods, which work by preventing or detecting attacks, are also no longer enough. As a response to this, Artificial Intelligence (AI) comes into play and is used to solve very complex security problems. Think of AI as a smart tool that can learn from past situations to help predict and identify threats before they cause harm [90]. Another study, entitled “Privacy and Security Concerns in AI Database Systems”, dives deep into the implications of AI Agent systems having direct access to databases and highlighted the privacy and security risks associated with this integration of AI agents with databases [13]. The study showed that the interaction between AI and databases raises concerns regarding unauthorized access to sensitive data and the potential for data breaches. The article emphasized that as AI technology becomes more intertwined with databases, the risks of data exposure and security breaches escalate, especially in the absence of updated security protocols. The study used a qualitative approach to investigate the privacy and security risks linked to AI agents with database system access. To make the finding more profound, it also used expert interviews by gathering insightful information from industry experts in the field [81].
Another study by [85] dived deep into the 5G IIoT network Architecture of smart factories. The study states that through the deployment of 5G networks providing wide broadband, low latency, and massive machine-type communications, industrial wireless networks, cloud, and fixed/mobile end devices in smart factories are interoperated in a harmony [85]. However, it highlights different privacy issues that come with this harmony since IIoT devices generate a lot of sensor messages for communication. The paper gives crucial security and privacy issues for 5G IIoT smart factories at the physical layer, data layer, and data layer of the communication protocol. The pain points discussed by these researchers are physical layer, data layer, and application layer. To break it down further, this paper picks device authentication and production line safety as some of the physical layer issues. With many IIoT devices, ensuring that each device is genuine is challenging. Traditional security methods may not suit these small, low-cost devices [89]. Further, factories must always monitor equipment and production lines to detect faults, manage system failures, and keep operations running even if an attack happens. This involves real-time monitoring and decision making processes to prevent complete shutdowns. The authors also acknowledge defending IIoT device attack surface from cyber criminals is very hard as devices like sensors and actuators are often resource constrained and deployed in various environments yet these components are susceptible to attacks. Addressing vulnerabilities in these devices is crucial to protect the overall factory operations. Another more interesting privacy aspect described by this study, which has not been studied by others, relates to the actual sensor information, and operational logs, which are highly sensitive. Sensor information and unauthorized access to operational logs or data leakage can likely pose serious risks to industrial competitiveness and operational integrity, since this information can contain production secrets of the factory [33,91,92].
Although the privacy concern of “Security Threats in Agentic AI System’s” is specifically on the connection between AI agents and access to database, privacy concern is about the actual messages exchanged amongst the sensors in IIoT; however, both papers agree that data privacy remains a critical issue when it relates to autonomy in smart factories. The paper further stressed that cyber criminals are also using AI to craft more sophisticated and hidden attacks. This creates a sort of arms race where both defenders and attackers use advanced technology to outsmart each other [13,85]. In simple terms, as the good guys get smarter with AI, the bad guys also find ways to misuse it, making the overall security challenge even greater. Differential privacy, secure aggregation, and robust validation are currently being developed. Semantic red teaming, proactive anomaly detection, adaptive access restriction, and contextual trust scoring are examples of new defense tactics. Hybrid defensive strategies that include AI explainability, policy compliance, and decentralized verification are advocated by certain studies [4,93]. Security for Agentic AI necessitates a multi layered, proactive strategy. Since these agents must protect against language, logic, and context manipulation in addition to code vulnerabilities, cybersecurity is not merely an external protection but rather an integral component of the architecture, unlike traditional systems [16,27].

5.3. Simulation Results

The experiment evaluation was carried out in three phases according to the purpose of the simulation. The first phase is pre attack or baseline, which is the evaluation conducted before any attack was initiated, in order provide the benchmark of the system performance under normal operation. The second phase was attack simulation without any defense mechanism. The purpose of this phase was to evaluate how susceptibility of Agentic AI system to different attacks thereby exposing their vulnerabilities. The last phase, which formed the core part of our simulation, was GAN detection performance during attacks. This was aimed at evaluating the effectiveness of GAN in detecting the attacks and defending our system against those attacks.

5.3.1. Pre Attack/Baseline Results

In both systems, Phase 1 delivered the expected stability. All readings were within the defined safety parameters, and no assaults were reported since no attacks were simulated. Under this baseline condition, the GAN correctly classified all input data as normal. This resulted in an observed accuracy of 100%, with Precision, Recall, and F1 scores undefined or zero, reflecting the absence of positive detections when no attacks were present. Both LangFlow and CrewAI also showed very few false positives, establishing a reliable control condition. Figure 21 illustrates the normal behavior recorded when no adversarial scenarios were introduced.
It is important to note, however, that this 100% accuracy applies only to the baseline test set under external no attack conditions. It does not imply that the GAN model can universally achieve perfect performance, nor does it rule out potential blind spots or internal model faults. GAN-based anomaly detection methods are effective at recognizing deviations from learned patterns, but they may be less capable of identifying subtle internal errors, miscalibrations, or faults within the model itself. These limitations highlight the need for continuous validation and complementary security mechanisms when applying GANs in real-world industrial environments.

5.3.2. Attack Simulation Without Any Defense Mechanism

In Phase 2, when attacks were simulated without any GAN defenses, neither of the frameworks succeeded in detecting the injected threats and the detection rate dropped to 0. False Negative Rate rose to 1.0, confirming total failure to recognize threats LLM outputs relied purely on unfiltered data, often misinterpreting attack patterns. Figure 22 shows the FNR score for both frameworks.

5.3.3. Attack Simulation with GAN in Place

The final stage integrated the GAN detection model into the decision-making mechanism. All the frameworks were evaluated using the same metrics to avoid any bias. The Table 8 below shows the results of each framework on these metrics.
CrewAI achieved 67% accuracy, 1.0 precision, 0.5 recall. LangFlow showed similar trends, but with a slightly decreased recall of 0.43. FPR remained at 0.0 in both frameworks, indicating no false positives were reported by both frameworks after implementing GAN. AUC-ROC improved to 0.95 for CrewAI and 0.88 for LangFlow, confirming meaningful GAN discrimination.
Comparatively, the results show that CrewAI outperformed Langflow in most of the evaluation metrics. Notably, its accuracy rate was 0.67 compared to LangFlow’s 0.60. The result means that CrewAI correctly classified a larger percentage of instances. The two models achieved a perfect precision rate of 1.00, implying that all identified threats were legitimate ones and no false positives occurred. CrewAI also showed a much improved recall rate of 0.50 compared to LangFlow with 0.43, implying higher effectiveness in detection of actual threats. The same was demonstrated in the F1 Score where CrewAI outpaced LangFlow’s to 0.60 compared to CrewAI’s 0.67.
Additionally, CrewAI had a higher detection rate of 0.50 than LangFlow’s at 0.47, still indicating a marginal superior capability to detect attacks. When it comes to false negatives, LangFlow showed higher vulnerability, with an FNR of 0.57 compared to CrewAI’s improved rate of 0.50, which simply means LangFlow was more likely to miss attacks. It is notable that both models had an optimal AUC-ROC performance of 0.00 in respect of false positives, thus confirming that neither model mislabeled benign samples as attacks. Figure 23 shows the performance comparison between CrewAI and LangFlow across matrices during attacks when GAN was implemented.
Finally, CrewAI AUC-ROC score of 0.95 was better than LangFlow’s 0.88, demonstrating CrewAI’s higher competency in distinguishing attacks from non-attack data. However, both scores indicate higher performance of both CrewAI and LangFlow frameworks as shown in Figure 24.
In summary, the consolidated results presented in Table 8 and Figure 23 and Figure 24 show a clear comparison of CrewAI and LangFlow across all evaluation metrics. CrewAI consistently outperformed LangFlow, achieving higher accuracy (67 percent vs. 60 percent), recall (50 percent vs. 43 percent), F1 score (0.67 vs. 0.60), and AUC-ROC (0.95 vs. 0.88). Both frameworks achieved perfect precision (1.00) with no false positives (FPR = 0), but LangFlow demonstrated a higher false negative rate (0.57) compared to CrewAI (0.50), indicating a greater likelihood of missing attacks. Overall, the results highlight that while both frameworks benefit from the integration of GAN, CrewAI demonstrates stronger resilience and reliability under simulated adversarial scenarios.

6. Discussions

In this section we interpret the empirical results from our SLR, in alignment with the observed system behaviors to highlight the known vulnerabilities, threats and risk of Agentic AI in industrial automation. It also discusses the architectural factors behind these results, offering theoretical and practical implications.

6.1. Limitations of GAN-Based Detection Systems

The implementation of Generative Adversarial N-based anomaly detection models significantly enhanced resilience in both CrewAI and LangFlow frameworks, agreeing with research that emphasizes the strength of GANs in unsupervised anomaly detection tasks within cyber physical systems [77,78]. The discriminator, which was trained on nominal real-time data generated by simulated systems sensors, effectively flagged out the outlier inputs, resulting in high precision of 1.00 in both frameworks. This shows that GAN in both frameworks was able to detect all attacks on the system. GANs are capable of achieving high precisions when adversarial anomalies show significant deviations from the normal trained or learned baselines as highlighted by [79]. The notable low recall values of 0.50 and 0.43 in CrewAI and LangFlow, respectively, highlight GANs’ capability limitations in detecting adversarial perturbations of low magnitude, which often imitated normal occurrences. This observation is in line with research by [23,94,95], who argue that GANs can miss adversarial inputs prudently crafted to hide within the statistical defined decision boundary. This poses a challenge for GANs to identify small data manipulations due to their reliance on surface-level statistical deviations.
Additionally, the use of GANs in isolation still exposes the Agentic AI systems to drifts, where normal shifts in operational baselines could be misclassified as malicious, thereby making the systems take unwanted decisions which could even have impact on the normal operations. To contextualize this, if there is a real high temperature, GAN can classify this as an attack. Without continuous retraining or reinforcement learning, GAN models in remain vulnerable to false negatives, especially in a dynamic evolving manufacturing industries [22]. This a risk can amplifies when adversaries exploit these system weaknesses over time. The experiments show that Agentic AI systems if embedded with detection logic like GANs and situation aware decision models like LLMs can bring reliability in industrial automation environments under cyber threat conditions.

6.2. Augmenting Detection with LLMs: Not Defense

When configuring out prompt to be sent to the GPT 4o mini LLM in both frameworks, we integrated a configuration for LLM to differentiate between normal and abnormal sensors. This was specifically carried out to allow the system undertake correction decision in context of any behavior that could harm the factory. So the LLM responded accordingly in line with its role; for example, if temperature are beyond threshold, it could trigger automatic shutdown. This integration offered meaningful validation for anomaly detection which added a second layer of security through reasoning. By prompting the LLM with flagged sensor data and contextualizing it as a “factory supervisor”, the system produced outputs that could be explained, validating the GAN’s detection with human readable reasoning.
During our attack simulations before GAN was used, LLMs consistently interpreted adversarial inputs as normal due to lack of embedded anomaly awareness, supporting vulnerabilities raised by [74] in their multi agent synchronization study. When LLMs are blind to context drift like prompt manipulation or false sensor inputs, they use surface level language reasoning, which can create dangerous failure modes. Without proper input sanitization, LLMs tend to exaggerate outputs or give malicious output when exploited with adversarial attacks, as seen with WormGPT and FraudGPT [76]. Therefore, while LLMs provide essential transparency, but they should not act as primary classifiers.

6.3. CrewAI vs. Langflow: Framework-Level Observation

The outstanding performance of CrewAI in many metrics as shown in Table 1 over LangFlow performance highlights not just tool maturity and flexibility but also frameworks architectural scalability and agent granularity. CrewAI’s design allowed us to do discrete task delegation. Agents were categorically separated with specific assigned function like agents for data sensing, attack simulation, GAN filtering, and decision making and each agent had its LLM embedded before for processing of input before passing the output to another agent. This modular design enhanced the processing of data for each agents and had impact on the quality of overall output. This design is supported by [22] decentralized intelligent architecture, which emphasizes role separated agents being key to resilient industrial orchestration. Study [61] says CrewAI is designed to coordinate AI agents by giving them roles and tasks which enable them to in solving complex problems together. CrewAI relies on agents role customization, task delegation, and task management for to enhance collaboration among agents efficiently. The framework includes essential components such as Agent, Task, Tool, and a Crew that collectively contribute to the overall and execution of complex tasks.
The outstanding performance of CrewAI in many metrics as shown in Table 1 over LangFlow performance highlights not just tool maturity and flexibility but also frameworks architectural scalability and agent granularity. CrewAI’s design allowed us to do discrete task delegation. Agents were categorically separated with specific assigned function like agents for data sensing, attack simulation, GAN filtering, and decision making and each agent had its LLM embedded before for processing of input before passing the output to another agent. This modular design enhanced the processing of data for each agents and had impact on the quality of overall output. This design is supported by [22] decentralized intelligent architecture, which emphasizes role separated agents being key to resilient industrial orchestration. Study [61] says CrewAI is designed to coordinate AI agents by giving them roles and tasks which enable them to in solving complex problems together. CrewAI relies on agents role customization, task delegation, and task management for to enhance collaboration among agents efficiently. The framework includes essential components such as Agent, Task, Tool, and a Crew that collectively contribute to the overall and execution of complex tasks.
On the other hand, LangFlow, though user friendly, lacked this in depth architectural flexibility, which is why full implementation had to be carried out in full Python. However, this can be understood because Lanflow is primarily designed for user input prompts. According to [63], LangFlow offers a scenario editor that enables users to design interactions step by step. The researcher continues to note that its specialization in conversational AI means that LangFlow is optimized for environments that require engaging, interactive interfaces for end users. Since our environment was more dynamic and autonomous, we used LangFlow for prototyping and implementation was carried out in Python. The authors admit LangFlow needs broader integration with external tools and APIs which could expand its functionality. The researcher admits that LangFlow would allow users to create more comprehensive and powerful workflows, thereby extending the range of solutions it can support if it could be integrated with other tools. We hope this approach could make LangFlow compete with crewAI in modularity and flexibility. However, we still noticed that even after transitioning to a Python-based backend implementation, LangFlows’ performace was still outpaced by CrewAI because it was still using LangFlow based libraries.

6.4. Demonstrated Security Implications in Industrial CPS

The simulations that we conducted like Denial of Service, validates many vulnerabilities highlighted the in literature. Denial of Service (DoS) and jamming attacks significantly compromised the availability of the systems, exposing risks the network layer risks identified [85] in 5G-enabled IIoT systems especially in multi agent systems architecture where slight delay in commutations among agents can have adverse impacts on production lines. Simulations showed that communication delays and attack timing influenced both detection and response time, supporting [86] conclusion that temporal sensitivity is central to Multi Agent Systems resilience in non deterministic environments. The simulation results also highlight the issue of attack surface increase in interconnected industrial Agentic systems. The study of [13] highlights that if sensor data IIoT is compromised, it can expose production secrets or cause behaviors that are unsafe to the production of plants for humans. Replay attacks and false data injection attack simulations confirmed that even minor data perturbation can completely mislead LLMs and go undetected through weak GAN configurations. This highlights the need for edge-based filtering, tracing of data origination, and checks of input authorization even at the sensor level. Both of these approaches are not yet implemented in both frameworks under our study.
Notably, even though the absence of false positives across both frameworks might look good, it can also be potentially misleading. High precision does not fully reflect safety if recall always remains weak. Systems that never misclassify normal traffic are still susceptible to failure in subtle, well-crafted slow developing intrusions, which is a serious cyber security concern in SCADA integrated factories [96].

6.5. Proposed Mitigation Strategies

After identifying the cyber security bottlenecks of Agentic AI systems as applied in Industrial Automation, this section discusses the proposed mitigation strategies based on literature review and simulation results from in Section 5. The Table 9 gives the STRIDE model framework and mitigation strategies based on the threats identified in our study. This STRIDE is modeled as per discussions in [46,97,98].

Toward Secure, Transparent Agentic AI

Securing agentic systems cannot rely only on reactive anomaly detection, but integration of proactive policy enforcement, agent validation, and behavior auditing must be part of the architectural design and management of whole Agentic AI systems [4]. The simulations highlight the urgent need for unified agentic security architectures that blend statistical detection, semantic reasoning, behavioral traceability, and protocol level hardening.

6.6. Limitations, Gaps and Future Direction

This extensive review produced valuable insights into the cyber security bottleneck of Agentic AI in industrial automation. However, we must accept the following constraints. The research examined only four reputable academic databases, namely ACM Digital Library, Scopus, SpringerLink, and Academic Search Premier, for conducting its literature review. Although these databases produced high quality results, we acknowledge that these databases might not include all studies on the topic. Future research should broaden its investigation by reviewing more databases together with the gray literature to provide a thorough review of the topic.
This research analysis included only publications published in English which might have eliminated important findings contained in literature written in different languages. Our decision to use only articles in English language might create a potential bias because security challenges of Agentic AI tend to differ among various regional contexts. Future research should include publications in other languages other than English only to achieve global representation of the topic. While the simulations highlighted a number of cyber security issues in Agentic AI, the study focused only on two frameworks, CrewAI and LangFlow, which are most common, but this creates a biased evaluation of Agentic AI system behaviors since other frameworks also exist. To broaden the scope in Agentic AI and Cyber Security Research, we propose that more research should be conducted with other existing frameworks. In the same vain, we acknowledge the use of GAN-based anomaly detection only also creates a bias. We suggest that future research should try more detection mechanisms or even combined approaches.

6.7. Summary of Key Risks and Mitigation Findings

To consolidate the findings of both the literature review and simulation experiments, this section summarizes the main cybersecurity risks and mitigation strategies relevant to Agentic AI in industrial automation. The categorization follows the key security pillars of data integrity, availability, confidentiality, and trust/reliability, as reflected in prior studies [2,13,46]. Table 10 provides a structured overview that connects identified threats with mitigation approaches, serving as a bridge between the detailed analysis and the concluding discussion.

7. Conclusions

The aim of this study was to investigate the cybersecurity bottlenecks of Agentic AI in industrial automation. The study was approached in two phases: Systematic Literature Review and Experiments Simulations. A comprehensive Literature review was conducted using PRISMA framework. The study highlights particular risks, threats and vulnerabilities that come when autonomous Agentic AI systems operate in crucial Cyber Physical Systems with minimum human intervention and supervision. A research gap was highlighted in protecting Agentic AI systems in Industrial Automation, specifically in regard to data poisoning attacks, Denial of Service and jamming attacks, model inversion, prompt injection attacks and the unpredictability nature of Agentic AI autonomous reasoning. Even though Agentic AI is a rapidly emerging solution to smart factories owing to its intelligent control, cyber security measures to protect these systems are still underdeveloped. In order to confirm these bottlenecks and close the gaps, the study created a proof of concept simulation by building Agentic AI system using LangFlow and CrewAI frameworks. The experimental simulations were aimed at observing the system behavior before and during attacks without having any defense mechanism in place. We simulated a GAN-based anomaly detection as a defense and detection mechanism. The simulation results clearly exposed the vulnerable Large Language Model-driven agents and how susceptible they are to attacks in the absence of any cyber security defense mechanisms like GAN. The results showed that the simulated attacks interfered with decision making of the agents. After implementation of the GAN technique, the results showed that most attacks were detected as indicated by quantifiable metrics of accuracy, precision, recall, and AUC ROC score. Even though the GAN performance was not perfect, but it showed that with a simple defense mechanism in place, the of Agentic AI systems can be secured.
In order to close this gap, the paper created a proof of concept simulation environment that included baseline and attack scenarios. This was followed by detection mechanisms, particularly the use of Generative Adversarial Networks (GANs) for anomaly detection. The study highlighted the vulnerability of Large Language Model-driven agents in the absence of strong security measures by showing that adversarial attacks considerably interfered with system decision making. However, the addition of GAN-based detection greatly enhanced the capacity to identify anomalies in real time as proved by evaluation metrics of accuracy, recall, detection rate, precision, and AUC-ROC score. The study also highlighted that the choice of a good Agentic AI framework during development is paramount to augment the AI detection logic when integrated with LLMs. In our simulation, even though both CrewAI and LangFlow demonstrated resilience to attacks, but CrewAI outperformed LangFlow in most of the metrics. However, both frameworks showed some drawbacks in configuration using GUI, highlighting the need for continuous development and fine tuning of their GUI interface interactions.
Lastly, this study developed the STRIDE model for threat modeling for Agentic AI system with Industrial Automation. This calls for interdisciplinary coordination among different parties like AI developed, cyber security engineers, industry stakeholder to foresee the threats associated with these systems in the earliest stage of development. and pre plan how to mitigate these threats. All in all, much as Agentic AI systems play a big role in industrial automation, their rapid adoption comes with increased vulnerabilities. There is need to employ more security measures to safeguard these systems in mission critical infrastructure.

7.1. Future Work and Research

This paper highlighted some of the cyber security challenges of Agentic AI in Industrial Automation. The STRIDE model framework can be utilized to give general knowledge of threats that are associated with these systems and plan mitigation strategies in the earliest development stage. Future work should involve testing other frameworks like AutogenAI, Mesa and others and compare their performance under different scenarios. In this project, we used one Large Language Mode, GPT 4o mini. Future work should focus on the deployment and testing of other LLMs to see how they contribute to the security Agentic AI systems. Future work should also dive deep into other applications of Agentic AI, outside of Industrial Automation. This paper also proposes working with real-world datasets and real industrial environments for more realistic observations. Lastly, we propose that more research should be conducted, especially on the detection and defense mechanisms. We also propose more exploration of combined defense mechanisms for robust and resilient Agentic AI systems. Our research should be expanded further to look into more cyber security threats and defense mechanisms.

7.2. Critical Discussion

Our simulation results are broadly consistent with existing anomaly detection approaches in industrial contexts, yet they demonstrate notable advantages in terms of resilience and adaptability. For example, [99] developed a hybrid anomaly detection framework that combined Generative Adversarial Networks (GANs) with Distributional Reinforcement Learning for Industrial IoT systems, achieving high accuracy (over 97 percent) and strong recall rates in controlled test environments. Although their results underscore the value of GAN-based detection, their evaluation was performed on a single data set and did not consider cross framework comparisons or multiple attack categories. In contrast, our study applies comparable GAN-based detection principles within the CrewAI framework but evaluates performance under multiple, diverse attack types, allowing a more robust assessment of real-world resilience [99].
Likewise, the work on anomaly based intrusion detection for Industrial Control Systems (ICS) reported competitive accuracy using a GAN architecture trained on ICS specific datasets. However, their approach primarily targeted external intrusion attempts and did not address internal faults or model drift, leaving questions about long-term reliability in dynamic environments. Our findings extend this perspective by showing that CrewAI, while also GAN driven, maintains comparable detection rates across both familiar and unfamiliar attack scenarios, and can be benchmarked against an alternative AI agentic framework (LangFlow) to highlight trade-offs in detection performance [100].
In addition, [101] proposed a hybrid XGBoost LSTM model for anomaly detection in industrial IoT environments, reporting strong AUC-ROC (0.984) and precision scores across multiple benchmark datasets. While their approach demonstrates that classical and deep learning models can be effectively combined, their framework lacks the adaptive agentic capabilities observed in CrewAI and LangFlow. Our results show that CrewAI achieves similar detection rates while offering agent-based adaptability, enabling automated decision making and framework reconfiguration based on evolving network conditions, a capability not present in their model.
Together, these comparisons indicate that while our detection results are in line with top performing approaches in the literature, the methodological scope of our study is broader in terms of attack diversity, framework adaptability, and comparative analysis. This strengthens the practical relevance of our findings for real-world industrial automation environments, where both external intrusions and evolving operational states must be addressed.

7.3. Generalization of the Results

The results of this study can be partially generalized to other cyber physical and systems that rely on autonomous agents and real-time sensor input. The prototype combines agentic decision making with GAN-based anomaly detection adaptability in detecting different types of cyber security threats in both frameworks. While the experiments were industrial-manufacturing-centered, the approach is adaptable and transferable to other fields involving real-time data integrity and autonomy. This study is unique in such a way that it integrates Agentic AI, multi agent reasoning operating at different layers, and simulation within two autonomous frameworks. This approach is currently underexplored in the literature and even simulations. Apart from contributing to experimental evidence, this study also offers a replicable simulation model for future research, which will allow practical deployments to be adapted for broader industrial automation, Agentic AI and cyber security research.

Author Contributions

Conceptualization, A.K.M.; methodology, S.S. and C.B.; software, S.S.; validation, A.K.M., F.D. and D.P.; formal analysis, S.S. and C.B.; investigation, S.S. and C.B.; resources, S.S. and C.B.; data curation, S.S. and C.B.; writing—original draft preparation, S.S. and C.B.; writing review and editing, A.K.M.; visualization, S.S.; supervision, A.K.M., F.D. and D.P.; project administration, A.K.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially funded through the MSc program support. The APC was funded by University West.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We would like to express our sincere gratitude to Rena Zhu for her continuous guidance, support, and motivation throughout the revision and publication process.Special thanks are also extended to Stark Bikram Deuba for his valuable assistance during the final revision and for being a constant source of encouragement.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Pandey, R.K.; Das, T.K. Anomaly detection in cyber-physical systems using actuator state transition model. Int. J. Inf. Technol. 2025, 17, 1509–1521. [Google Scholar] [CrossRef]
  2. Bousetouane, F. Agentic Systems: A Guide to Transforming Industries with Vertical AI Agents. arXiv 2025, arXiv:2501.00881. [Google Scholar] [CrossRef]
  3. Suresh, P. Agentic AI: Redefining Autonomy for Complex Goal-Driven Systems. ResearchGate, 2025. Available online: https://doi.org/10.13140/RG.2.2.24115.75047 (accessed on 23 May 2025).
  4. Ismail, I.; Kurnia, R.; Brata, Z.A.; Nelistiani, G.A.; Heo, S.; Kim, H.; Kim, H. Toward Robust Security Orchestration and Automated Response in Security Operations Centers with a Hyper-Automation Approach Using Agentic AI. Information 2025, 16, 365. [Google Scholar] [CrossRef]
  5. Shao, X.; Xie, L.; Li, C.; Wang, Z. A Study on Networked Industrial Robots in Smart Manufacturing: Vulnerabilities, Data Integrity Attacks and Countermeasures. J. Intell. Robot. Syst. 2023, 109, 60. [Google Scholar] [CrossRef]
  6. Ajiga, D.; Folorunsho, S. The role of software automation in improving industrial operations and efficiency. Int. J. Eng. Res. Appl. 2024, 7, 22–35. [Google Scholar] [CrossRef]
  7. Ocaka, A.; Briain, D.O.; Davy, S.; Barrett, K. Cybersecurity Threats, Vulnerabilities, Mitigation Measures in Industrial Control and Automation Systems: A Technical Review. In Proceedings of the 2022 Cyber Research Conference-Ireland (Cyber-RCI 2022), Galway, Ireland, 25 April 2022; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2022. [Google Scholar] [CrossRef]
  8. Bendjelloul, A.; Gaham, M.; Bouzouia, B.; Moufid, M.; Mihoubi, B. Multi Agent Systems Based CPPS—An Industry 4.0 Test Case. In Proceedings of the Lecture Notes in Networks and Systems (LNNS), Amsterdam, The Netherlands, 1–2 September 2022; Springer Science and Business Media Deutschland GmbH: Berlin/Heidelberg, Germany, 2022; Volume 413, pp. 187–196. [Google Scholar] [CrossRef]
  9. Orabi, M.; Tran, K.P.; Egger, P.; Thomassey, S. Anomaly detection in smart manufacturing: An Adaptive Adversarial Transformer-based model. J. Manuf. Syst. 2024, 77, 591–611. [Google Scholar] [CrossRef]
  10. Castro, J.P. Agentic AI and the Cybersecurity Compass: Optimizing Cyber Defense. ResearchGate. 2024. Available online: https://www.researchgate.net/publication/388079969 (accessed on 16 May 2025).
  11. Elía, I.; Pagola, M. Anomaly detection in Smart-manufacturing era: A review. Eng. Appl. Artif. Intell. 2024, 139, 109578. [Google Scholar] [CrossRef]
  12. Bui, M.T.; Boffa, M.; Valentim, R.V.; Navarro, J.M.; Chen, F.; Bao, X.; Houidi, Z.B.; Rossi, D. A Systematic Comparison of Large Language Models Performance for Intrusion Detection. Proc. ACM Netw. 2024, 2, 1–23. [Google Scholar] [CrossRef]
  13. Khan, R.; Sarkar, S.; Mahata, S.K.; Jose, E. Security Threats in Agentic AI System. arXiv 2024, arXiv:2410.14728. [Google Scholar]
  14. Dhameliya, N. Revolutionizing PLC Systems with AI: A New Era of Industrial Automation. Am. Digit. J. Comput. Digit. Technol. 2023, 1, 33–48. [Google Scholar]
  15. Yigit, Y.; Ferrag, M.A.; Sarker, I.H.; Maglaras, L.A.; Chrysoulas, C.; Moradpoor, N.; Janicke, H. Critical Infrastructure Protection: Generative AI, Challenges, and Opportunities. arXiv 2024, arXiv:2405.04874. [Google Scholar] [CrossRef]
  16. Parmar, A.; Gnanadhas, J.; Mini, T.T.; Abhilash, G.; Biswal, A.C. Multi-agent approach for anomaly detection in automation networks. In Proceedings of the International Conference on Circuits, Communication, Control and Computing, Bangalore, India, 21–22 November 2014; pp. 225–230. [Google Scholar] [CrossRef]
  17. Berti, A.; Maatallah, M.; Jessen, U.; Sroka, M.; Ghannouchi, S.A. Re-Thinking Process Mining in the AI-Based Agents Era. arXiv 2024, arXiv:2408.07720. [Google Scholar]
  18. Biswas, D. Stateful Monitoring and Responsible Deployment of AI Agents. In Proceedings of the 17th International Conference on Agents and Artificial Intelligence—Volume 1: ICAART. INSTICC, Porto, Portugal, 23–25 February 2025; SciTePress: Setúbal, Portugal, 2025; pp. 393–399. [Google Scholar] [CrossRef]
  19. Sunny, S.M.A.; Liu, X.F.; Shahriar, M.R. Development of machine tool communication method and its edge middleware for cyber-physical manufacturing systems. Int. J. Comput. Integr. Manuf. 2023, 36, 1009–1030. [Google Scholar] [CrossRef]
  20. Hentout, A.; Aouache, M.; Maoudj, A.; Akli, I. Human–robot interaction in industrial collaborative robotics: A literature review of the decade 2008–2017. Adv. Robot. 2019, 33, 764–799. [Google Scholar] [CrossRef]
  21. Shrestha, L.; Balogun, H.; Khan, S. AI-Driven Phishing: Techniques, Threats, and Defence Strategies. In Cybersecurity and Human Capabilities Through Symbiotic Artificial Intelligence; Jahankhani, H., Issac, B., Eds.; Springer: Cham, Switzerland, 2025; pp. 121–143. [Google Scholar]
  22. Mazhar, T.; Irfan, H.M.; Khan, S.; Haq, I.; Ullah, I.; Iqbal, M.; Hamam, H. Analysis of Cyber Security Attacks and Its Solutions for the Smart grid Using Machine Learning and Blockchain Methods. Future Internet 2023, 15, 83. [Google Scholar] [CrossRef]
  23. Sharma, P.; Dash, B. Impact of Big Data Analytics and ChatGPT on Cybersecurity. In Proceedings of the 2023 4th International Conference on Computing and Communication Systems (I3CS), Shillong, India, 16–18 March 2023; pp. 1–6. [Google Scholar] [CrossRef]
  24. Al-Jaroodi, J.; Mohamed, N.; Jawhar, I. A Service-Oriented Middleware Framework for Manufacturing Industry 4.0. ACM SIGBED Rev. 2018, 15, 29–36. [Google Scholar] [CrossRef]
  25. Yaseen, A. Reducing Industrial Risk with AI and Automation. Int. J. Intell. Autom. Comput. 2021, 4, 60–80. Available online: https://link.springer.com/journal/11633 (accessed on 23 May 2025).
  26. Leitão, P.; Karnouskos, S.; Ribeiro, L.; Lee, J.; Strasser, T.; Colombo, A.W. Smart Agents in Industrial Cyber-Physical Systems. Proc. IEEE 2016, 104, 1086–1101. [Google Scholar] [CrossRef]
  27. Li, Y.; Wang, J. Multi-Objective Dynamic Scheduling Model of Flexible Job Shop Based on NSGA-II Algorithm and Scroll Window Technology. In Advances in Swarm Intelligence; Tan, Y., Shi, Y., Niu, B., Eds.; Lecture Notes in Computer Science; Springer: Cham, Switzerland, 2020; Volume 12145, pp. 435–444. [Google Scholar] [CrossRef]
  28. Slimane, J.B.; Alshammari, A. Securing the Industrial Backbone: Cybersecurity Threats, Vulnerabilities, and Mitigation Strategies in Control and Automation Systems. J. Electr. Syst. 2024, 20. [Google Scholar] [CrossRef]
  29. Varadarajan, M.N.; Viji, C.; Rajkumar, N.; Mohanraj, A. Integration of Ai and Iot for Smart Home Automation. SSRG Int. J. Electron. Commun. Eng. 2024, 11, 37–43. [Google Scholar] [CrossRef]
  30. Khan, A.H.; Lucas, M. AI-Powered Automation: Revolutionizing Industrial Processes and Enhancing Operational Efficiency. Rev. Intel. Artif. Med. 2024, 15, 1151–1175. [Google Scholar]
  31. Amoo, O.O.; Sodiya, E.O.; Atagoga, A. AI-driven warehouse automation: A comprehensive review of systems. GSC Adv. Res. Rev. 2024, 18, 272–282. [Google Scholar] [CrossRef]
  32. Yigit, Y.; Ferrag, M.A.; Sarker, I.H.; Maglaras, L. Generative AI and LLMs for Critical Infrastructure Protection: Evaluation Benchmarks, Agentic AI, Challenges, and Opportunities. Sensors 2025, 25, 1666. [Google Scholar] [CrossRef]
  33. Deng, Z.; Guo, Y.; Han, C.; Ma, W.; Xiong, J.; Wen, S.; Xiang, Y. AI Agents Under Threat: A Survey of Key Security Challenges and Future Pathways. ACM Comput. Surv. 2024, 1, 1–35. [Google Scholar] [CrossRef]
  34. Edris, E.K.K. Utilisation of Artificial Intelligence and Cyber Security Capabilities: The Symbiotic Relationship for Enhanced Security and Applicability. Electronics 2025, 14, 2057. [Google Scholar] [CrossRef]
  35. Lebed, S.V.; Namiot, D.E.; Zubareva, E.V.; Khenkin, P.V.; Vorobeva, A.A.; Svichkar, D.A. Large Language Models in Cyberattacks. Dokl. Math. 2024, 110, S510–S520. [Google Scholar] [CrossRef]
  36. Sotiropoulos, J.; Del Rosario, R.F.; Kokuykin, E.; Oakley, H.; Habler, I.; Underkoffler, K.; Huang, K.; Steffensen, P.; Aralimatti, R.; Bitton, R.; et al. OWASP Top 10 for LLM Apps & Gen AI Agentic Security Initiative: Agentic AI—Threats and Mitigations; Version 1.0.1; OWASP: Wakefield, MA, USA, 2025; Available online: https://hal.science/hal-04985337v1 (accessed on 23 September 2025).
  37. Ahmed, U.; Lin, J.C.W.; Srivastava, G. Exploring the Potential of Cyber Manufacturing System in the Digital Age. ACM Trans. Internet Technol. 2023, 23, 1–38. [Google Scholar] [CrossRef]
  38. Folorunso, A.; Adewumi, T.O.; Okonkwo, R. Impact of AI on cybersecurity and security compliance. Glob. J. Eng. Technol. Adv. 2024, 21, 167–184. [Google Scholar] [CrossRef]
  39. Makhija, N.; Konatam, S.; Acharya, S.; Najana, M. GenAI Current Use Cases and Future Challenges in Advanced Manufacturing. Int. J. Glob. Innov. Solut. (IJGIS) 2024. [Google Scholar] [CrossRef]
  40. Mishra, A.; Gupta, N.; Gupta, B.B. Defense mechanisms against DDoS attack based on entropy in SDN-cloud using POX controller. Telecommun. Syst. 2021, 77, 47–62. [Google Scholar] [CrossRef]
  41. Zhang, C.; Costa-Perez, X.; Patras, P. Tiki-Taka: Attacking and Defending Deep Learning-based Intrusion Detection Systems. In Proceedings of the CCSW 2020—2020 ACM SIGSAC Conference on Cloud Computing Security Workshop, Virtual Event, USA, 9 November 2020; Association for Computing Machinery, Inc.: New York, NY, USA, 2020; pp. 27–39. [Google Scholar] [CrossRef]
  42. Admass, W.; Yayeh, Y.; Diro, A.A. Cyber Security: State of the Art, Challenges and Future Directions. Cyber Secur. Appl. 2023, 2, 100031. [Google Scholar] [CrossRef]
  43. Pulikottil, T.; Estrada-Jimenez, L.A.; Rehman, H.U.; Mo, F.; Nikghadam-Hojjati, S.; Barata, J. Agent-based manufacturing—Review and expert evaluation. Int. J. Adv. Manuf. Technol. 2023, 127, 2151–2180. [Google Scholar] [CrossRef]
  44. Siraparapu, S.R.; Azad, S.M. A Framework for Integrating Diverse Data Types for Live Streaming in Industrial Automation. IEEE Access 2024, 12, 111694–111708. [Google Scholar] [CrossRef]
  45. Acharya, D.B.; Kuppan, K.; B, D. Agentic AI: Autonomous Intelligence for Complex Goals—A Comprehensive Survey. IEEE Access 2024, 13, 18912–18936. [Google Scholar] [CrossRef]
  46. Shostack, A. Threat Modeling: Designing for Security; John Wiley & Sons: Hoboken, NJ, USA, 2014. [Google Scholar]
  47. Rashid, S.M.Z.U.; Montasir, I.; Haq, A.; Ahmmed, M.T.; Alam, M.M. Securing Agentic AI: Threats, Risks and Mitigation. Preprint, January 2025. Available online: https://www.researchgate.net/publication/388493552 (accessed on 20 May 2025). [CrossRef]
  48. Mishra, S. Exploring the Impact of AI-Based Cyber Security Financial Sector Management. Appl. Sci. 2023, 13, 5875. [Google Scholar] [CrossRef]
  49. Zhang, T.; Wang, G.; Xue, C.; Wang, J.; Nixon, M.; Han, S. Time-Sensitive Networking (TSN) for Industrial Automation: Current Advances and Future Directions. ACM Comput. Surv. 2024, 57, 1–38. [Google Scholar] [CrossRef]
  50. Das, B.C.; Amini, M.H.; Wu, Y. Security and privacy challenges of large language models: A survey. ACM Comput. Surv. 2025, 57, 1–39. [Google Scholar] [CrossRef]
  51. Radanliev, P.; Roure, D.D.; Nicolescu, R.; Huth, M.; Santos, O. Digital twins: Artificial intelligence and the IoT cyber-physical systems in Industry 4.0. Int. J. Intell. Robot. Appl. 2022, 6, 171–185. [Google Scholar] [CrossRef]
  52. Salikutluk, V.; Schöpper, J.; Herbert, F.; Scheuermann, K.; Frodl, E.; Balfanz, D.; Jäkel, F.; Koert, D. An Evaluation of Situational Autonomy for Human-AI Collaboration in a Shared Workspace Setting. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI ’24), Honolulu, HI, USA, 11–16 May 2024. [Google Scholar] [CrossRef]
  53. Zhang, G.; Gao, W.; Li, Y.; Guo, X.; Hu, P.; Zhu, J. Detection of False Data Injection Attacks in a Smart Grid Based on WLS and an Adaptive Interpolation Extended Kalman Filter. Energies 2023, 16, 7203. [Google Scholar] [CrossRef]
  54. Subramanian, J.; Sinha, A.; Seraj, R.; Mahajan, A. Approximate Information State for Approximate Planning and Reinforcement Learning in Partially Observed Systems. J. Mach. Learn. Res. 2022, 23, 1–83. [Google Scholar]
  55. Lebed, S.; Namiot, D.; Zubareva, E.; Khenkin, P.; Vorobeva, A.; Svichkar, D. Large Language Models in Cyberattacks. In Proceedings of the Doklady Mathematics; Springer: Berlin/Heidelberg, Germany, 2024; Volume 110, pp. S510–S520. [Google Scholar]
  56. Hamdi, A.; Müller, M.; Ghanem, B. SADA: Semantic adversarial diagnostic attacks for autonomous applications. Proc. Aaai Conf. Artif. Intell. 2020, 34, 10901–10908. [Google Scholar] [CrossRef]
  57. Mubarak, S.; Habaebi, M.H.; Islam, M.R.; Rahman, F.D.A.; Tahir, M. Anomaly Detection in ICS Datasets with Machine Learning Algorithms. Comput. Syst. Sci. Eng. 2021, 37, 34–46. [Google Scholar] [CrossRef]
  58. Zainuddin, A.A.; Amir Hussin, A.A.; Hassan, M.K.A.; Handayani, D.O.D.; Ahmad Puzi, A.; Zakaria, N.A.; Rosdi, N.N.H.; Ahmadzamani, N.Z.A. International Grand Invention, Innovation and Design EXPO (IGIIDeation) 2025; Technical Report; KICT Publishing: Kuala Lumpur, Malaysia, 2025. [Google Scholar]
  59. Sarsam, S.M. Cybersecurity Challenges in Autonomous Vehicles: Threats, Vulnerabilities, and Mitigation Strategies. SHIFRA 2023, 2023, 34–42. [Google Scholar] [CrossRef]
  60. Moher, D.; Shamseer, L.; Clarke, M.; Ghersi, D.; Liberati, A.; Petticrew, M.; Shekelle, P.; Stewart, L.A.; PRISMA-P Group. Preferred Reporting Items for Systematic Review and Meta-Analysis Protocols (PRISMA-P) 2015 Statement. Syst. Rev. 2015, 4, 1. [Google Scholar] [CrossRef]
  61. Duan, Z.; Wang, J. Exploration of LLM Multi-Agent Application Implementation Based on LangGraph+CrewAI. arXiv 2024, arXiv:2411.18241. [Google Scholar]
  62. Venkadesh, P.; Divya, S.; Kumar, K.S. Unlocking AI Creativity: A Multi-Agent Approach with CrewAI. J. Trends Comput. Sci. Smart Technol. 2024, 6, 338–356. [Google Scholar] [CrossRef]
  63. Jeong, C. Beyond Text: Implementing Multimodal Large Language Model-Powered Multi-Agent Systems Using a No-Code Platform. arXiv 2025, arXiv:2501.00750. [Google Scholar]
  64. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and Harnessing Adversarial Examples. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  65. Davis, J.; Goadrich, M. The Relationship Between Precision-Recall and ROC Curves. In Proceedings of the 23rd International Conference on Machine Learning (ICML), Pittsburgh, PA, USA, 25–29 June 2006; pp. 233–240. [Google Scholar]
  66. Saito, T.; Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 2015, 10, e0118432. [Google Scholar] [CrossRef]
  67. Fawcett, T. An Introduction to ROC Analysis. Pattern Recognit. Lett. 2006, 27, 861–874. [Google Scholar] [CrossRef]
  68. Bradley, A.P. The Use of the Area Under the ROC Curve in the Evaluation of Machine Learning Algorithms. Pattern Recognit. 1997, 30, 1145–1159. [Google Scholar] [CrossRef]
  69. Khan, S.I.; Kaur, C.; Ansari, M.S.A.; Muda, I.; Borda, R.F.C.; Bala, B.K. Implementation of cloud based IoT technology in manufacturing industry for smart control of manufacturing process. Int. J. Interact. Des. Manuf. 2025, 19, 773–785. [Google Scholar] [CrossRef]
  70. Conlon, N.; Ahmed, N.R.; Szafir, D. A Survey of Algorithmic Methods for Competency Self-Assessments in Human-Autonomy Teaming. ACM Comput. Surv. 2024, 56, 1–31. [Google Scholar] [CrossRef]
  71. Zhu, J.P.; Cai, P.; Xu, K.; Li, L.; Sun, Y.; Zhou, S.; Su, H.; Tang, L.; Liu, Q. AutoTQA: Towards Autonomous Tabular Question Answering through Multi-Agent Large Language Models. Proc. VLDB Endow. 2024, 17, 3920–3933. [Google Scholar] [CrossRef]
  72. Hong, Y.; Chen, G.; Bushnell, L. Distributed observers design for leader-following control of multi-agent networks. Automatica 2008, 44, 846–850. [Google Scholar] [CrossRef]
  73. Akyol, E.; Rose, K.; Basar, T. On Optimal Jamming Over an Additive Noise Channel. arXiv 2013, arXiv:1303.3049. [Google Scholar] [CrossRef]
  74. Ishii, H.; Wang, Y.; Feng, S. An overview on multi-agent consensus under adversarial attacks. Annu. Rev. Control 2022, 53, 252–272. [Google Scholar] [CrossRef]
  75. Falade, P.V. Decoding the Threat Landscape: ChatGPT, FraudGPT, and WormGPT in Social Engineering Attacks. Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol. 2023, 185–198. [Google Scholar] [CrossRef]
  76. Sritriratanarak, W.; Garcia, P. Cyber Physical Games: Rational Multi-Agent Decision-Making in Temporally Non-Deterministic Environments. ACM Trans. Cyber-Phys. Syst. 2025, 9, 17. [Google Scholar] [CrossRef]
  77. Bairaktaris, J.A.; Johannssen, A.; Tran, K.P. Security Strategies for AI Systems in Industry 4.0. Qual. Reliab. Eng. Int. 2024, 41, 897–915. [Google Scholar] [CrossRef]
  78. Lu, J.; Sibai, H.; Fabry, E. Adversarial Examples that Fool Detectors. arXiv 2017, arXiv:1712.02494. [Google Scholar] [CrossRef]
  79. Idaho National Laboratory (INL). Autonomous System Inference, Trojan, and Adversarial Reprogramming Attack and Defense (Final); Idaho National Laboratory (INL): Idaho Falls, ID, USA, 2023. [CrossRef]
  80. Casella, A.; Wang, W. Performant LLM Agentic Framework for Conversational AI. In Proceedings of the 2025 1st International Conference on Artificial Intelligence and Computing, Kuala Lumpur, Malaysia, 14–16 February 2025. [Google Scholar]
  81. Ahmed, I.; Syed, M.A.; Maaruf, M.; Khalid, M. Distributed computing in multi-agent systems: A survey of decentralized machine learning approaches. Computing 2025, 107, 2. [Google Scholar] [CrossRef]
  82. Chien, C.H.; Trappey, A.J. Human-AI cooperative generative adversarial network (GAN) for quality predictions of small-batch product series. Adv. Eng. Inform. 2025, 65, 103327. [Google Scholar] [CrossRef]
  83. Sivakumar, S. Agentic AI in Predictive AIOps Enhancing IT Autonomy and Performance. Int. J. Sci. Res. Manag. (IJSRM) 2024, 12, 1631–1638. [Google Scholar] [CrossRef]
  84. Severson, T.A.; Croteau, B.; Rodríguez-Seda, E.J.; Kiriakidis, K.; Robucci, R.; Patel, C. A resilient framework for sensor-based attacks on cyber–physical systems using trust-based consensus and self-triggered control. Control Eng. Pract. 2020, 101, 104509. [Google Scholar] [CrossRef]
  85. Lin, C.C.; Tsai, C.T.; Liu, Y.L.; Chang, T.T.; Chang, Y.S. Security and Privacy in 5G-IIoT Smart Factories: Novel Approaches, Trends, and Challenges. Mob. Netw. Appl. 2023, 28, 1043–1058. [Google Scholar] [CrossRef]
  86. Aryankia, K.; Selmic, R.R. Neuro-Adaptive Formation Control of Nonlinear Multi-Agent Systems with Communication Delays. J. Intell. Robot. Syst. Theory Appl. 2023, 109, 92. [Google Scholar] [CrossRef]
  87. Marali, M.; Sudarsan, S.D.; Gogioneni, A. Cyber security threats in industrial control systems and protection. In Proceedings of the 2019 International Conference on Advances in Computing and Communication Engineering (ICACCE), Sathyamangalam, India, 4–6 April 2019; pp. 1–7. [Google Scholar] [CrossRef]
  88. Rai, R.; Tiwari, M.K.; Ivanov, D.; Dolgui, A. Machine learning in manufacturing and industry 4.0 applications. Int. J. Prod. Res. 2021, 59, 4773–4778. [Google Scholar] [CrossRef]
  89. Upadhyay, D.; Sampalli, S. SCADA (Supervisory Control and Data Acquisition) systems: Vulnerability assessment and security recommendations. Comput. Secur. 2020, 89, 101666. [Google Scholar] [CrossRef]
  90. Jia, J.; Yu, R.; Du, Z.; Chen, J.; Wang, Q.; Wang, X. Distributed localization for IoT with multi-agent reinforcement learning. Neural Comput. Appl. 2022, 34, 7227–7240. [Google Scholar] [CrossRef]
  91. Bayrak, B.; Giger, F.; Meurisch, C. Insightful Assistant: AI compatible Operation Graph Representations for Enhancing Industrial Conversational Agents. arXiv 2020, arXiv:2007.12929. [Google Scholar] [CrossRef]
  92. Miyachi, T.; Yamada, T. Current issues and challenges on cyber security for industrial automation and control systems. In Proceedings of the 2014 SICE Annual Conference (SICE), Tokyo, Japan, 13–15 December 2014; pp. 821–826. [Google Scholar] [CrossRef]
  93. Liu, M.; Zhang, L.; Chen, J.; Chen, W.A.; Yang, Z.; Lo, L.J.; Wen, J.; O’Neill, Z. Large language models for building energy applications: Opportunities and challenges. Build. Simul. 2025, 18, 225–234. [Google Scholar] [CrossRef]
  94. Zhu, Q.; Başar, T. Robust and Resilient Control Design for Cyber-Physical Systems with an Application to Power Systems. In Proceedings of the 50th IEEE Conference on Decision and Control and European Control Conference (CDC-ECC), Orlando, FL, USA, 12–15 December 2011; pp. 4066–4071. [Google Scholar] [CrossRef]
  95. Ullah, F.; Naeem, H.; Jabbar, S.; Khalid, S.; Latif, M.A.; Al-Turjman, F.; Mostarda, L. Cyber security threats detection in internet of things using deep learning approach. IEEE Access 2019, 7, 124379–124389. [Google Scholar] [CrossRef]
  96. Kim, S.J.; Cho, D.E.; Yeo, S.S. Secure Model against APT in m-Connected SCADA Network. Int. J. Distrib. Sens. Netw. 2014, 10, 594652. [Google Scholar] [CrossRef]
  97. Saßnick, O.; Rosenstatter, T.; Schäfer, C.; Huber, S. STRIDE-based Methodologies for Threat Modeling of Industrial Control Systems: A Review. In Proceedings of the 2024 IEEE 7th International Conference on Industrial Cyber-Physical Systems (ICPS), St. Louis, MO, USA, 12–15 May 2024; pp. 1–8. [Google Scholar] [CrossRef]
  98. Khan, R.; McLaughlin, K.; Laverty, D.; Sezer, S. STRIDE-based threat modeling for cyber-physical systems. In Proceedings of the 2017 IEEE PES Innovative Smart Grid Technologies Conference Europe (ISGT-Europe), Torino, Italy, 26–29 September 2017; pp. 1–6. [Google Scholar] [CrossRef]
  99. Benaddi, H.; Jouhari, M.; Ibrahimi, K.; Ben Othman, J.; Amhoud, E.M. Anomaly Detection in Industrial IoT Using Distributional Reinforcement Learning and Generative Adversarial Networks. Sensors 2022, 22, 8085. [Google Scholar] [CrossRef] [PubMed]
  100. TBD, A. Anomaly-Based Intrusion Detection Using GAN for Industrial Control Systems. In Proceedings of the 2022 10th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 13–14 October 2022. [Google Scholar]
  101. Chen, Z.; Li, Z.; Huang, J.; Long, H. An Effective Method for Anomaly Detection in Industrial Internet of Things Using XGBoost and LSTM. Sci. Rep. 2024, 14, 23969. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Project workflow diagram.
Figure 1. Project workflow diagram.
Computers 14 00456 g001
Figure 2. Distribution of articles based on relevance levels of articles on AI agents.
Figure 2. Distribution of articles based on relevance levels of articles on AI agents.
Computers 14 00456 g002
Figure 3. Distribution of articles based on Cyber Security focus.
Figure 3. Distribution of articles based on Cyber Security focus.
Computers 14 00456 g003
Figure 4. Distribution of articles based on Industrial Automation relevance.
Figure 4. Distribution of articles based on Industrial Automation relevance.
Computers 14 00456 g004
Figure 5. Yearly publication trend.
Figure 5. Yearly publication trend.
Computers 14 00456 g005
Figure 6. Task-oriented CrewAI framework where each node (agent) performs specific functions.
Figure 6. Task-oriented CrewAI framework where each node (agent) performs specific functions.
Computers 14 00456 g006
Figure 7. Crew AI Sensor Monitoring Node.
Figure 7. Crew AI Sensor Monitoring Node.
Computers 14 00456 g007
Figure 8. CrewAI Response Node.
Figure 8. CrewAI Response Node.
Computers 14 00456 g008
Figure 9. CrewAI Attack Simulation Node.
Figure 9. CrewAI Attack Simulation Node.
Computers 14 00456 g009
Figure 10. GAN Defence Node.
Figure 10. GAN Defence Node.
Computers 14 00456 g010
Figure 11. CrewAI Data Analysis Node.
Figure 11. CrewAI Data Analysis Node.
Computers 14 00456 g011
Figure 12. LangFlow framework showing node configuration and LLM connection.
Figure 12. LangFlow framework showing node configuration and LLM connection.
Computers 14 00456 g012
Figure 13. LangFlow Input Node.
Figure 13. LangFlow Input Node.
Computers 14 00456 g013
Figure 14. Langflow Prompt Node.
Figure 14. Langflow Prompt Node.
Computers 14 00456 g014
Figure 15. LangFlow LLM.
Figure 15. LangFlow LLM.
Computers 14 00456 g015
Figure 16. LangFlow OutPut Node.
Figure 16. LangFlow OutPut Node.
Computers 14 00456 g016
Figure 17. Article screening.
Figure 17. Article screening.
Computers 14 00456 g017
Figure 18. Summary of article selection.
Figure 18. Summary of article selection.
Computers 14 00456 g018
Figure 19. Inclusion criteria ratio.
Figure 19. Inclusion criteria ratio.
Computers 14 00456 g019
Figure 20. Exclusion criteria ratio.
Figure 20. Exclusion criteria ratio.
Computers 14 00456 g020
Figure 21. Baseline Normal Behavior.
Figure 21. Baseline Normal Behavior.
Computers 14 00456 g021
Figure 22. System behavior with No GAN.
Figure 22. System behavior with No GAN.
Computers 14 00456 g022
Figure 23. Comparison of CrewAI and LangFlow Frameworks.
Figure 23. Comparison of CrewAI and LangFlow Frameworks.
Computers 14 00456 g023
Figure 24. Comparison of CrewAI and LangFlow frameworks.
Figure 24. Comparison of CrewAI and LangFlow frameworks.
Computers 14 00456 g024
Table 1. Summary of database search.
Table 1. Summary of database search.
DatabaseSearch Query StringNumber of Articles Retrieved
ACM Digital Library(“Security” OR “Cybersecurti*” OR “IT Security*” OR “Internet Security*”) AND (“Bottleneck*” OR “Threat*” OR “Risk*” OR “Vulnerabilit*” OR “Challenge*” OR “Problem*”) AND (“Agentic AI” OR “LLM” OR “Multi-Agent*” OR “AI Agent*”) AND (“Industrial Automation” OR “Smart Factor*” OR “Smart Manufactur*” OR “Robot*”)64
ScopusTITLE-ABS-KEY ((“Bottleneck*” OR “Threat*” OR “Risk*” OR “Vulnerabilit*” OR “Challenge*” OR “Problem*”)) AND (“Agentic AI” OR “LLM” OR “Multi-Agent*” OR “AI Agent*”) AND (“Industrial Automation” OR “Smart Factor*” OR “Smart Manufactur*” OR “Robot*”) AND (“Security” OR “Cybersecur*” OR “IT Security*” OR “Internet Security*”)53
SpringerLink(“Security” OR “Cybersecurti*” OR “IT Security*” OR “Internet Security*”) AND (“Agentic AI” OR “LLM” OR “Multi-Agent*” OR “AI Agent*”) AND (“Bottleneck*” OR “Threat*” OR “Risk*” OR “Vulnerabilit*” OR “Challenge*” OR “Problem*”) AND (“Industrial Automation” OR “Smart Factor*” OR “Smart Manufactur*” OR “Robot*”)213
Academic Search Premier(“Security” OR “Cybersecurti*” OR “IT Security*” OR “Internet Security*”) AND (“Bottleneck*” OR “Threat*” OR “Risk*” OR “Vulnerabilit*” OR “Challenge*” OR “Problem*”) AND (“Agentic AI” OR “LLM” OR “Multi-Agent*” OR “AI Agent*”) AND (“Industrial Automation” OR “Smart Factor*” OR “Smart Manufactur*” OR “Robot*”)9
The * sign is a wildcard. It means any ending of the word.
Table 2. Revised inclusion criteria.
Table 2. Revised inclusion criteria.
Inclusion CriteriaExplanation
IC1Studies focusing on agentic AI systems (e.g., autonomous agents, decision making AI, multi agent AI, self learning systems, or LLM-enabled agents) in industrial automation. Example: a paper on multi agent coordination in smart factories.
IC2Studies addressing cybersecurity challenges, threats, or vulnerabilities in the context of AI driven or agent-based industrial systems. Example: adversarial attacks on CPS enabled manufacturing.
IC3Empirical studies, simulation based works, or case studies with a clear cybersecurity component in industrial environments. Example: GAN-based anomaly detection in an IoT-enabled production line.
IC4Published in peer-reviewed journals, conference proceedings, or other high quality academic sources.
IC5Published between 2015 to 2025, ensuring focus on contemporary developments in Agentic AI and industrial cybersecurity.
IC6Written in English.
IC7Papers including technical contributions (e.g., architectures, frameworks, threat models, attack vectors, countermeasures). Example: STRIDE-based threat modeling for multi agent systems.
Table 3. Revised Exclusion Criteria.
Table 3. Revised Exclusion Criteria.
Exclusion CriteriaExplanation
EC1Studies limited to general purpose AI. Example: LLM chatbot studies in education.
EC2Papers related to AI in non industrial sectors (e.g., finance) that do not address automation.
EC3Papers not addressing cybersecurity aspects, even if they focus on industrial AI (e.g., works limited to efficiency optimization without security evaluation).
EC4Non-peer-reviewed works (e.g., blog posts, opinion pieces, white papers, grey literature).
EC5Articles published before 2015.
EC6Papers with only abstract available or without full text access.
EC7Papers not written in English.
Table 4. Metrics.
Table 4. Metrics.
TermDefinition
True Positive (TP)Correctly detected attacks
False Positive (FP)Normals data was incorrectly flagged as an attack
True Negative (TP)Normal data that was correctly ignored
False Negative (FN)Attack data was incorrectly flagged as normal
Table 5. Evaluation metrics, definitions and formula.
Table 5. Evaluation metrics, definitions and formula.
MetricDefinitionFormula
AccuracyProportion of correctly classified sensor reports (attack or normal) out of all events in both framework simulations T P + T N T P + T N + F P + F N
PrecisionTotal number of actual attacks out of all flagged attacks, T P T P + F P
Recall (TPR)Total number of correctly detected attacks out of all simulated attacks (DDOs, FDI, Replay, Adversarial) T P T P + F N
F1 ScoreBalance between Precision and Recall 2 × P r e c i s i o n × R e c a l l P r e c i s i o n + R e c a l l
FPRData falsely incorrectly as attacks out of all normal data, F P F P + T N
FNRThe number of missed by the system out of all real attacks (DDOs, FDI, Replay, Adversarial) F N F N + T P
Detection RateProportion of attacks (DDOs, FDI, Replay, Adversarial) that were successfully detected T P T P + F N
AUC-ROCPerformance curve plotting TPR vs. FPRPlotted graphically
Table 6. Categorization of bottlenecks relevant to Agentic AI in industrial automation.
Table 6. Categorization of bottlenecks relevant to Agentic AI in industrial automation.
CategoryBottlenecks IdentifiedExplanation
Data relatedFalse Data Injection, Replay Attacks, Poor sensor integrityCompromise of input data directly affects the reliability of Agentic AI, leading to false or misleading decisions [5]
Model relatedAdversarial Attacks, Model overfitting, MisclassificationWeaknesses in ML/LLM reasoning and generalization expose vulnerabilities in autonomous decision making [55,69]
Communication relatedLatency, Distributed Denial of Service (DDoS), Insecure communicationBottlenecks in inter agent coordination and network resilience that can block or delay safe responses [37]
System levelDecision drift, Unintended actions, LLM exploitationFailures in coordinating autonomous systems and ensuring safe operations across industrial environments [34,39]
Table 7. Mapping of CrewAI agents to their roles and real-world industrial system components.
Table 7. Mapping of CrewAI agents to their roles and real-world industrial system components.
Agent/NodeRole in SimulationReal-World Mapping
Sensor Monitoring AgentProcesses sensor inputs (temperature, vibration, humidity, pressure) and sets baseline for normal operationIndustrial IoT sensors and monitoring devices that continuously track machine/environmental conditions
Response Implementation NodeInterprets sensor data, triggers safety actions when thresholds are exceededPLCs (Programmable Logic Controllers) or automated control systems in factories that shut down or adjust processes
Attack Simulation NodeInjects malicious inputs (e.g., FDI, DDoS, replay) to test resilienceCyber adversaries or penetration testing tools that mimic real-world cyberattacks
GAN Defence NodeUses GAN-based detection to identify anomalies and adversarial inputsIntrusion detection systems (IDS), anomaly detection AI, or ML based cybersecurity defense tools [5]
Data Analysis NodeAnalyzes simulation outcomes before, during, and after attacksSecurity operation center (SOC) tools and forensic systems for monitoring and post incident analysis [39,70]
Input NodeProvides baseline sensor values to the systemReal-world sensor feeds providing operational thresholds
Output NodeDisplays final system decisions and responsesDashboard/control panels for operators or automated alerting systems
Table 8. Evaluation Metrics.
Table 8. Evaluation Metrics.
FrameworkAccuracyPrecisionRecallF1 ScoreAUC-ROCDetection RateFNPFPR
CrewAI0.671.000.500.670.950.500.500.00
LangFlow0.601.000.430.600.880.430.570.00
Table 9. STRIDE threat model and proposed mitigation.
Table 9. STRIDE threat model and proposed mitigation.
CategoryThreat DescriptionImpactsMitigation Strategy
SpoofingFalse/fake sensor data was sent to the LLM agentsSystem was fooled to trust wrong source and took wrong and harmful decisionsInput signature validation and enforcement of source verification in agent prompts [55]
TamperingSensor readings were sent through adversarial attacksSystem was fooled to trust wrong source and took wrong and harmful decisions based on fake input readingsImplement anomaly detection systems; e.g., GAN-based anomaly detection flagged inconsistent sensor input patterns [13,89]
RepudiationAgentic actions not logged or traceable (e.g., false detection unrecorded)Difficult to audit or trace attacker behaviorUse structured, timestamped logging (e.g., in CrewAI), including agent ID and decisions [16]
Info DisclosureUnauthorized exposure of sensitive system or operational dataExposes vulnerabilities to attackers or usersImplement end to end encryption in securing communication between AI agents operating across distributed architectures [18]
DoSLangFlow and CrewAI nodes were flooded with malicious requestsThe system crashed or became unresponsiveFine tune GAN pre filters at input channels and implement rate limiting [28]
Privilege EscalationMalicious user manipulates task prompts or gains unauthorized controlUnauthorized actions executed at higher privilege levelEnforce role based task restrictions; validate task flow in CrewAI using logic rules [47]
Table 10. Summary of key risks, threats, and mitigation strategies for Agentic AI in industrial automation.
Table 10. Summary of key risks, threats, and mitigation strategies for Agentic AI in industrial automation.
Theme/ChallengeRisks & Threats IdentifiedProposed Mitigation Strategies
Data IntegrityFalse Data Injection (FDI), Data PoisoningGAN-based anomaly detection, Secure data pipelines, Validation layers
AvailabilityDistributed Denial of Service (DDoS), Replay AttacksRedundancy mechanisms, Rate limiting, Adaptive anomaly detection
ConfidentialityAdversarial Attacks, Model Inversion, Prompt InjectionAdversarial training, Model hardening, Secure communication
Trust & ReliabilityUnintended decision drift, Misclassification, LLM exploitationSTRIDE threat modeling, Human in the loop oversight, Continuous monitoring
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Shrestha, S.; Banda, C.; Mishra, A.K.; Djebbar, F.; Puthal, D. Investigation of Cybersecurity Bottlenecks of AI Agents in Industrial Automation. Computers 2025, 14, 456. https://doi.org/10.3390/computers14110456

AMA Style

Shrestha S, Banda C, Mishra AK, Djebbar F, Puthal D. Investigation of Cybersecurity Bottlenecks of AI Agents in Industrial Automation. Computers. 2025; 14(11):456. https://doi.org/10.3390/computers14110456

Chicago/Turabian Style

Shrestha, Sami, Chipiliro Banda, Amit Kumar Mishra, Fatiha Djebbar, and Deepak Puthal. 2025. "Investigation of Cybersecurity Bottlenecks of AI Agents in Industrial Automation" Computers 14, no. 11: 456. https://doi.org/10.3390/computers14110456

APA Style

Shrestha, S., Banda, C., Mishra, A. K., Djebbar, F., & Puthal, D. (2025). Investigation of Cybersecurity Bottlenecks of AI Agents in Industrial Automation. Computers, 14(11), 456. https://doi.org/10.3390/computers14110456

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop