Abstract
Agentic AI systems are a recently emerged and important approach that goes beyond traditional AI, generative AI, and autonomous systems by focusing on autonomy, adaptability, and goal-driven reasoning. This study provides a clear review of agentic AI systems by bringing together their definitions, frameworks, and architectures, and by comparing them with related areas such as generative AI, autonomic computing, and multi-agent systems. To do this, we reviewed 143 primary studies on current LLM-based and non-LLM-driven agentic systems and examined how they support planning, memory, reflection, and goal pursuit. Furthermore, we classified architectural models, input–output mechanisms, and applications by the task domains in which agentic AI is applied, supported by tabular summaries that highlight real-world case studies. We classified evaluation metrics into qualitative and quantitative measures and summarized the available testing methods for checking the performance and reliability of agentic AI systems. This study also highlights the main challenges and limitations of agentic AI, covering technical, architectural, coordination, ethical, and security issues. By organizing the conceptual foundations, available tools, architectures, and evaluation metrics, this research provides a structured foundation for understanding and advancing agentic AI. These findings aim to help researchers and developers build better, clearer, and more adaptable systems that support responsible deployment across domains.
1. Introduction
Agentic AI refers to AI systems that do not just answer prompts. They set sub-goals, choose tools, and take multi-step actions to achieve a user’s objective with limited supervision. In practice, these systems often coordinate multiple agents, each handling part of the job, through an orchestration layer that keeps them aligned with the goal [1]. An agent is a system that senses its environment, decides what to do, and takes action to achieve a goal. A simple example is a travel planner agent. Given ‘plan a 3-day trip to Chicago under USD 1500’, it breaks the task into subtasks such as flights, hotel, and itinerary. It then queries real-time sources, compares options, and books reservations. Finally, it produces a shareable plan, asking for confirmation only at key steps. This goes beyond traditional chatbots by planning, acting, and verifying outcomes in a loop [2]. The term AI agent refers to an autonomous software entity that perceives, reasons, and acts to complete specific, goal-directed tasks adaptively. The term agentic AI refers to a multi-agent system in which specialized agents collaborate, coordinate, and plan to achieve complex, high-level objectives. Figure 1 illustrates the difference between these terms for the flight booking example.
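The travel planner example can be made concrete as a small plan-act-verify loop. The Python sketch below is illustrative only and is not drawn from any cited system. The fixed decomposition, the hard-coded price quotes standing in for real-time sources, and the `confirm` callback are all hypothetical assumptions. The sketch shows three behaviors from the paragraph: decomposing the goal into subtasks, comparing options against a budget constraint, and asking the user only before committing.

```python
from dataclasses import dataclass, field

# Hypothetical price quotes (USD) standing in for real-time flight/hotel sources.
QUOTES = {
    "flight": [("Airline A", 420), ("Airline B", 380)],
    "hotel": [("Hotel X", 3 * 260), ("Hotel Y", 3 * 210)],  # three nights
    "itinerary": [("Guided tours", 240), ("Self-guided", 0)],
}

@dataclass
class Plan:
    budget: int
    choices: dict = field(default_factory=dict)

    def total(self) -> int:
        return sum(price for _, price in self.choices.values())

def decompose(goal: str) -> list:
    # Hypothetical fixed decomposition; a real agent would derive subtasks from the goal.
    return ["flight", "hotel", "itinerary"]

def plan_trip(goal: str, budget: int, confirm) -> Plan:
    plan = Plan(budget)
    for subtask in decompose(goal):
        # Act: query the stubbed source and compare options, cheapest first.
        for name, price in sorted(QUOTES[subtask], key=lambda q: q[1]):
            # Verify: accept an option only if the running total stays within budget.
            if plan.total() + price <= budget:
                plan.choices[subtask] = (name, price)
                break
        else:
            raise ValueError(f"no {subtask} option fits the remaining budget")
    # Ask the user only at the key step: before committing to any booking.
    if not confirm(plan):
        raise RuntimeError("user declined the plan")
    return plan

plan = plan_trip("plan a 3-day trip to Chicago under USD 1500", 1500, confirm=lambda p: True)
print(plan.choices, plan.total())
```

Choosing the cheapest option first is a deliberately simple strategy. A real agent might trade off price against ratings or travel time, and it would re-plan if a booking failed.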
Figure 1.
Agentic AI vs. AI agents.
Interest in agentic AI is rising quickly as organizations look for automation that affects real work, not just content generation. Industry analysts now track “AI agents” as a distinct market. They estimate its size at ∼USD 5.3–5.4B in 2024 and project ∼USD 50–52B by 2030 (≈41–46% CAGR), reflecting strong enterprise demand for agents that can reason, use tools, and execute workflows [3,4]. Scholarly research shows the same movement. One study from Stanford HAI [5] describes agents that exhibited human-like behavior and sustained consistent personas. Additionally, industry guidance such as that from AWS [2] describes increasing levels of autonomy. At these levels, the agent iteratively plans, acts, checks the results, and adjusts at each step on its own, rather than waiting for step-by-step prompts from a human. Together, these advances move AI from reactive assistants to proactive collaborators in various fields [6,7,8].
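As a quick sanity check on the quoted figures, compound annual growth rate (CAGR) is (end / start)^(1 / years) - 1. The Python sketch below assumes that both endpoints refer to 2024 and 2030, a six-year window. That assumption is ours. The cited reports [3,4] may use different base years or forecast periods, which would shift the result.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by two values `years` apart."""
    return (end / start) ** (1 / years) - 1

# Endpoints quoted in the text (USD billions), assuming a 2024 -> 2030 window.
low = cagr(5.4, 50, 6)   # smallest end/start ratio in the quoted ranges
high = cagr(5.3, 52, 6)  # largest end/start ratio in the quoted ranges
print(f"{low:.1%} to {high:.1%}")
```

Under that six-year assumption, the implied rate is roughly 45–46% per year, at the upper end of the quoted range.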
According to Google Trends, interest in “agentic AI” remained minimal for years, then rose sharply from April 2024, reaching its peak in July 2025. A value of 100 represents the term’s highest recorded popularity, as shown in Figure 2. This sharp rise suggests a global shift toward building AI that can plan, act, and adapt with minimal human guidance.
Figure 2.
Google Trends shows the popularity of agentic AI.
Traditional systems automated individual steps that followed limited rules. Agentic AI systems connect these steps, track progress, and recover from errors. This allows them to automate end-to-end processes that were previously limited to human effort, such as month-end close, claims adjudication, sales outreach, and even research workflows. Early enterprise reporting highlights this architectural shift from smarter models to agentic, process-embedded systems [9]. This review synthesizes how the field defines agentic AI, the architectures that enable it, where it is being applied, and how it is measured. It also covers the open challenges, such as reliability, coordination, safety, and governance, that must be addressed for widespread deployment.
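The difference between automating single steps and running an end-to-end process can be sketched in a few lines. The Python example below is a hypothetical illustration, not taken from the cited reporting. It chains named steps, passes each step the outputs of earlier ones, and logs progress. It also retries steps that raise a `TransientError`, an exception type we introduce for the sketch. The month-end-close steps are made up for demonstration.

```python
import time

class TransientError(Exception):
    """Hypothetical error type for failures that may succeed on retry (e.g., a flaky API)."""

def run_workflow(steps, max_retries=2, backoff_s=0.0):
    """Run (name, fn) steps in order; each fn receives the outputs produced so far.

    Transient failures are retried up to `max_retries` times, and every attempt is
    logged so progress stays visible. Returns (outputs, log).
    """
    outputs, log = {}, []
    for name, fn in steps:
        for attempt in range(1, max_retries + 2):  # first try plus retries
            try:
                outputs[name] = fn(outputs)
                log.append((name, attempt, "ok"))
                break
            except TransientError as exc:
                log.append((name, attempt, f"error: {exc}"))
                time.sleep(backoff_s * attempt)  # simple linear backoff
        else:
            # Recovery failed: stop and report which steps completed.
            raise RuntimeError(f"step {name!r} failed; completed: {list(outputs)}")
    return outputs, log

# Hypothetical month-end-close steps; `reconcile` fails once, then succeeds.
calls = {"reconcile": 0}

def reconcile(prev):
    calls["reconcile"] += 1
    if calls["reconcile"] == 1:
        raise TransientError("ledger API timeout")
    return "balanced"

steps = [
    ("collect", lambda prev: ["inv-1", "inv-2"]),
    ("reconcile", reconcile),
    ("report", lambda prev: f"{len(prev['collect'])} invoices, {prev['reconcile']}"),
]
outputs, log = run_workflow(steps)
print(outputs["report"])
```

When recovery fails, the function stops with a record of completed steps rather than silently continuing. That record lets a supervising agent or a human resume the process from the last good checkpoint.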
1.1. Research Purpose
The purpose of this research is to provide a comprehensive review of agentic AI by synthesizing its definitions, frameworks, and architectures, while comparing agentic AI with related paradigms. This study aims to identify current tools and frameworks that support agentic capabilities, describe the architectural models of such systems, and explore their applications across domains. Furthermore, it examines the input–output mechanisms of agentic AI, existing testing methods, and the metrics used to assess agentic performance, and it addresses the key challenges and limitations. By doing so, the paper establishes a structured foundation for understanding, assessing, and advancing the field of agentic AI.
1.2. Research Questions
The motivation for this research stems from the growing importance of agentic AI and the need to better understand what it is and how it works. This study aims to give researchers and developers a clear overview of agentic AI by examining several aspects. These are its definitions, how it differs from related technologies, the tools and frameworks that support it, and its system design. They also include the types of tasks it can perform, how it handles inputs and outputs, methods for evaluating it, and the challenges it faces. The objective is to support responsible development and deployment, and to help improve the design, use, and evaluation of agentic systems across domains.
- RQ1
- How is agentic AI conceptually defined, and how does it differ from related paradigms?
- Ans:
- We define agentic AI’s core characteristics, distinguishing it from generative AI, autonomic computing, and multi-agent systems. A Venn diagram-based taxonomy illustrates overlaps and distinctions, clarifying terminology and providing a structured foundation.
- RQ2
- To what extent do current LLM-based and non-LLM-driven agentic AI, tools, and frameworks enable agentic capabilities?
- Ans:
- We analyze frameworks such as LangChain, AutoGPT, BabyAGI, OpenAgents, AutoGen, CAMEL, MetaGPT, SuperAGI, and TB-CSPN, as well as non-LLM-driven agentic systems. We compare their features (planning, memory, reflection, and goal pursuit) to evaluate how effectively current tools enable agentic capabilities and approach true autonomy.
- RQ3
- What are the core components or architectural models used to build agentic AI systems?
- Ans:
- We define the core architectural components of agentic AI, emphasizing planning modules, memory systems, and reasoning engines.
- RQ4
- What types of goals and tasks are currently being solved using agentic AI across domains?
- Ans:
- We provide a comprehensive classification of goals and tasks being addressed by agentic AI across multiple domains, along with their application contexts and corresponding systems used in the literature, presented in a tabular format.
- RQ5
- What kinds of input and output formats do agentic AI systems handle in comparison to traditional AI systems?
- Ans:
- We provide a detailed overview of the input and output formats handled by agentic AI systems, along with the specific tasks and corresponding applications, summarized in a tabular format, and compare this with the input–output mechanisms of traditional AI systems.
- RQ6
- What evaluation methods and metrics are used to assess the performance of agentic AI systems?
- Ans:
- We present a classification of both the testing methods and the evaluation metrics used to assess agentic AI systems. The evaluation metrics are further classified as qualitative or quantitative.
- RQ7
- What are the key challenges and limitations in designing and deploying agentic AI systems?
- Ans:
- We present the challenges of agentic AI. These include architectural and technical challenges, performance and tool integration issues, coordination among multiple agents, and user experience issues, along with ethical and security challenges.
1.3. Contributions and Research Significance
This survey provides detailed answers to the research questions, showing the importance and real-world impact of agentic AI systems. Exploring the definitions, frameworks, and architectural models of agentic AI provides important guidance to researchers and developers. Understanding these foundational elements is useful for designing agentic AI systems capable of autonomous planning, decision-making, and adaptive behavior. This study also reviews current tools and frameworks for agentic AI systems and their input and output mechanisms, and it describes how agentic capabilities are implemented in real-world applications. Organizing architectures, applications, and evaluation methods into a clear overview helps readers understand and choose the right approaches and tools for different tasks and domains.
Additionally, this research examines the key challenges and limitations in developing and deploying agentic AI, including reliability, safety, interpretability, and governance concerns. By highlighting these challenges and discussing robust evaluation methods, it contributes to establishing reliable assessment frameworks that improve the credibility and practical application of agentic AI systems. These insights facilitate informed decision-making in both academic and industrial contexts and support the development of more advanced, dependable, and context-aware agentic AI solutions. Overall, this study offers critical knowledge on conceptual, architectural, and practical aspects of agentic AI, advancing understanding and guiding future research and implementation efforts.
1.4. Organization of the Paper
The remainder of this paper is organized as follows. Section 1 introduces the motivation and background for studying agentic AI, the research purpose, proposed research questions, and outlines the overall organization of the paper. Section 2 describes the methodology used to conduct the review and analysis. Section 3 presents the results, beginning with the conceptual definition of agentic AI and its differences from related paradigms. It then examines the extent to which current LLM-based frameworks enable agentic capabilities, followed by a discussion of core components and architectural models, including a comparison between architectures and components. The section further explores the types of goals and tasks addressed by agentic AI across domains, the input–output mechanisms in comparison to generative AI systems, and the evaluation methods and metrics used to assess performance. Challenges and limitations in designing and deploying agentic AI systems are also highlighted. Section 4 concludes the paper by summarizing the findings and offering directions for future research. Appendix A presents tables summarizing the bibliographic search results of primary studies, including publication venues, years, and paper types, while Appendix B lists the abbreviations used throughout the paper.
2. Methodology
To carry out this study, we searched several academic databases, including Library Search, ACM Digital Library, ScienceDirect, Google Scholar, and Semantic Scholar. We created a precise search query (“agentic AI” OR “agent” OR “autonomous AI” OR “multi-agent systems” OR “autonomic computing” OR “generative AI” OR “AI planning” OR “AI memory systems” OR “AI feedback loops”) AND (“survey” OR “overview” OR “review” OR “summary” OR “literature review”) to find studies relevant to our research. After collecting the results, we removed duplicates and examined abstracts and conclusions to determine relevance. We included papers that discussed agentic AI concepts, systems, or challenges, using case studies, surveys, and other research methods. We also looked at the references of these papers to find additional useful studies. Finally, we extracted key information from all selected studies to summarize current knowledge on agentic AI.
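The boolean query above and the deduplication step can be expressed compactly. The Python sketch below rebuilds the query string from its two term groups and removes duplicates by normalized title. Matching on normalized titles is our assumption for illustration, because this review does not specify its exact deduplication rule. Real exports from the listed databases would also need their own field mapping.

```python
import re

# Term groups copied from the search query described in the text.
CONCEPT_TERMS = ["agentic AI", "agent", "autonomous AI", "multi-agent systems",
                 "autonomic computing", "generative AI", "AI planning",
                 "AI memory systems", "AI feedback loops"]
REVIEW_TERMS = ["survey", "overview", "review", "summary", "literature review"]

def or_group(terms):
    """Quote each term and join the terms with OR inside parentheses."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"

QUERY = f"{or_group(CONCEPT_TERMS)} AND {or_group(REVIEW_TERMS)}"

def normalize_title(title):
    """Lowercase and collapse punctuation/whitespace so near-identical titles match."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records):
    """Keep the first record for each normalized title (an assumed dedup key)."""
    seen, unique = set(), []
    for rec in records:
        key = normalize_title(rec["title"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

print(QUERY)
```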
The search process yielded 143 primary studies, published from 2005 to the present, from which we extracted data. More than 90% of these papers were published in 2024 and 2025, which indicates that most research on agentic AI systems is very recent. Table A1 gives year-wise publication statistics. Of these papers, 17% are conference proceedings, 46% are peer-reviewed journal articles, and 28% are preprints. The remainder are dissertations, articles, workshop papers, and similar publications. Table A2 shows the distribution of publication types by number of papers.
Turning to publication platforms, 32 of the selected papers appeared on arXiv. Another 27 papers come from conference proceedings, including ACM conferences on Autonomous Agents and Multi-agent Systems, Interactive Media Experiences, Information and Knowledge Management, and Human Factors in Computing Systems. Additionally, we found 43 papers published in journal venues such as IEEE Access and Engineering Applications of Artificial Intelligence. One of the selected papers is a PhD dissertation, and the others come from venues such as Procedia CIRP and MethodsX. Table A3 provides the complete list of venues. We observed that most of the publications are multidisciplinary. This is expected, since many disciplines are applying agentic AI to their own problems.
Along with the main research papers, we also studied 21 survey and review articles that provide useful perspectives on agentic AI. Each of these surveys was examined based on our chosen dimensions, and their contributions were marked as L, M, H, or NA to represent low, medium, high, and not applicable. As shown in Table 1, many of the surveys emphasize specific aspects such as taxonomy, system architecture, or domain-focused applications of LLM-based agents, while giving little attention to evaluation methods or input–output classifications. Some papers discuss specialized areas like planning, reasoning, integration with 6G networks, or retail systems, while others examine broader themes including trust, risk, operational practices, and public attitudes toward AI.
Table 1.
Summary of important surveys on agentic AI (L—Low, M—Medium, H—High, NA—Not Applicable).
From this analysis, it is clear that most existing surveys cover only selected dimensions. Important areas such as standardized benchmarks, performance evaluation, and input–output modeling remain underdeveloped. In contrast, our paper covers all of the dimensions considered: definitions and concepts, taxonomy, architecture, applications, input–output classifications, evaluation metrics, and challenges. By taking this broader, unified approach, our research addresses gaps in prior surveys and provides a better understanding of agentic AI that can inform future research and practice.
3. Results
3.1. How Is Agentic AI Conceptually Defined, and How Does It Differ from Related Paradigms?
Agentic artificial intelligence (agentic AI) marks a major change in how AI systems are designed. Moving beyond passive and reactive tools, agentic AI refers to autonomous, goal-driven systems that can operate on their own for long periods, requiring minimal human supervision [27,31,32,33]. Unlike traditional AI or standard large language models (LLMs) that respond only to single prompts [34], these systems can understand broad objectives, break them down into smaller tasks, and carry out multi-step plans while adapting to feedback from changing environments [35,36,37]. Their “agentic” nature also highlights their ability to take on responsibilities from humans [32], act purposefully, and be accountable for the results they produce.
What makes agentic AI stand out is its combination of key abilities in one system. These include strategic planning, memory that preserves context over time, use of external tools to extend capabilities, and collaboration with other agents [38,39,40]. These systems combine foundation models with planners, knowledge bases, sensors, actuators, and feedback mechanisms. This combination lets them understand complex environments, adjust strategies in real time, and achieve their goals independently [37,41,42]. These capabilities make them highly valuable in areas such as scientific research, industrial automation, and compliance management [43,44,45], where precision, continuous learning, and smart decision-making are essential. In practice, autonomy can be defined as a system’s capacity to establish sub-goals, finish tasks without human involvement, and adjust to unexpected events [22]. Although existing frameworks show some autonomy, they are frequently constrained by problems such as reliance on human-defined goals, vulnerability to errors in long-horizon planning, and difficulties with accountability.
At the heart of agentic AI is self-directed, goal-focused intelligence. These systems operate with a “degree of agenticness,” meaning they can work proactively, plan ahead, and shape outcomes over time [14,46]. As research and industry continue to adopt these systems, agentic AI has the potential to become a reliable partner for humans, extending our abilities while raising new questions about trust, oversight, and collaboration [36].
Agentic AI goes beyond simple automation by enabling systems to understand high-level intent, create actionable plans, and carry them out with minimal human input [27,31,38,47]. These systems combine goal-driven behavior, flexible decision-making, and adaptive reasoning to work effectively in dynamic and evolving situations [48,49,50]. They can pursue multiple goals at once, plan for the long term, and collaborate across several agents or components [37,38,51]. Architecturally, they often integrate large AI models with knowledge bases, planners, access to external tools, and persistent memory to improve understanding and independence [21,42,52]. By continuously learning from experience and adjusting to complex, unpredictable environments [53], agentic AI shifts from simple automation toward reflective, autonomous systems capable of achieving complex objectives.
The comparative overview in Table 2 presents the similarities and differences between multi-agent systems, generative AI, and agentic AI in a structured format. It organizes the paradigms across key aspects such as decision-making, adaptability, learning approach, and workflow management, which highlights their distinctive features. The side-by-side format illustrates the progression from distributed coordination in MAS and pattern-based generation in GenAI toward the autonomy and goal-driven reasoning of agentic AI. This format helps communicate complex concepts clearly.
Table 2.
Comparative analysis of AI paradigms: multi-agent systems, generative AI, and agentic AI.
The Venn diagram in Figure 3 visually represents how multiple fields of artificial intelligence overlap and intersect, with emphasis on agentic AI. It shows how agentic AI overlaps with reinforcement learning and multi-agent systems and places it within the broader universe of AI. This format makes the dependencies and shared characteristics between agentic AI and related fields easier to understand. Overall, the diagram complements the written discussion with a concise, accessible picture of where agentic AI sits within the wider ecosystem of AI technologies.
Figure 3.
A Venn diagram illustrating the conceptual foundations of agentic AI.
Figure 3 shows the relationships between different areas of artificial intelligence. At the broadest level, everything falls under AI, with Machine Learning (ML) as a major subset. Inside ML, deep learning (DL), generative AI (Gen AI), and LLMs are shown as nested areas, reflecting their strong connection. Reinforcement learning (RL) also sits within ML, but separate from these nested areas. Agentic AI overlaps with these technologies, as it draws on both LLMs and RL, while also sharing ideas with MAS [77,78]. Autonomic computing is placed outside, showing that it is related to AI but not fully inside the AI family. It focuses on self-managing systems that can configure, heal, optimize, and protect themselves. This focus overlaps with AI but also extends into broader fields such as distributed systems and IT infrastructure.
Agentic AI builds heavily on LLMs and RL. LLMs give agents the ability to understand and generate natural language, enabling them to reason, plan, and interact in ways that feel human-like. RL, meanwhile, allows agents to learn from their actions and continually improve their decisions, making them adaptive and focused on achieving goals [31,79]. Together, LLMs provide intelligence in communication and knowledge, while RL contributes the ability to learn from experience. This combination is at the heart of how agentic AI systems can plan tasks, solve problems, and act independently.
The overlap between MAS and agentic AI is also important. MAS focus on how multiple autonomous agents interact, cooperate, or compete within a shared environment. These skills are essential for agentic AI, where several agents often need to coordinate to achieve complex objectives. Beyond using LLMs and RL, agentic AI depends on planning algorithms and memory systems, which help agents remember past experiences and develop long-term strategies. Altogether, these components form the foundation of agentic AI, allowing it to act not just as a tool but as an intelligent, goal-driven partner.
3.2. To What Extent Do Current LLM-Based and Non-LLM-Driven Agentic Systems, Tools, and Frameworks Enable Agentic Capabilities?
3.2.1. LLM-Based Agentic Systems
The development of LLM-based frameworks such as LangChain [26,31,78,80], AutoGPT [26,31,78], BabyAGI [31], AutoGen [80], and OpenAgents [26] shows a clear trend toward building smarter, more independent AI systems. These frameworks give AI agents the ability to plan, remember past actions, reflect on their performance, pursue goals, and use external tools. These capabilities make agents more adaptable and able to handle complex tasks with minimal human guidance.
Table 3 presents an organized overview of major LLM-based frameworks. It highlights each framework’s primary purpose, key agentic capabilities, and underlying LLM. Structuring this information in tabular form makes complex details easier to navigate. The format also facilitates side-by-side comparison, making it easier to identify both unique features and areas of overlap in agentic capabilities.
Table 3.
Agentic capabilities across current LLM-based frameworks.
LangChain is a widely used open-source framework that connects LLMs to external tools, data sources, and APIs, which simplifies building complex multi-step workflows [26,31,78,79]. It allows developers to construct sequences of reasoning steps (i.e., chains) that help the model reach an objective. LangChain provides basic memory for storing and recalling information. It also integrates external knowledge through Retrieval-Augmented Generation (RAG), which gives agents access to documents, databases, or search results while they reason [77,80]. It facilitates task planning and chaining but does not natively include self-reflection or optimization, so its behavior is largely guided by the developer. LangChain is well suited to structured, goal-oriented workflows in which models interact with external systems and perform contextual reasoning.
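The chaining-plus-retrieval pattern described above can be illustrated without any framework. The Python sketch below deliberately does not use LangChain’s actual API. The two documents, the word-overlap retriever, and the `fake_llm` stub are hypothetical stand-ins. Each step takes and returns a state dictionary, and `chain` composes the steps left to right, which mirrors how a retrieval step feeds context into a prompt before the model is called.

```python
from functools import reduce

# Hypothetical knowledge base standing in for documents or a vector store.
DOCS = {
    "refund-policy": "Refunds are issued within 14 days of a return.",
    "shipping": "Standard shipping takes 3 to 5 business days.",
}

def retrieve(state):
    # Toy retrieval: keep documents that share at least one word with the question.
    words = set(state["question"].lower().split())
    hits = [text for text in DOCS.values() if words & set(text.lower().split())]
    return {**state, "context": hits}

def build_prompt(state):
    context = "\n".join(state["context"]) or "(no context)"
    return {**state, "prompt": f"Context:\n{context}\n\nQuestion: {state['question']}"}

def fake_llm(state):
    # Stand-in for a model call: answers with the first retrieved passage.
    answer = state["context"][0] if state["context"] else "I don't know."
    return {**state, "answer": answer}

def chain(*steps):
    """Compose steps left to right; each takes and returns a state dict."""
    return lambda state: reduce(lambda acc, step: step(acc), steps, state)

qa = chain(retrieve, build_prompt, fake_llm)
result = qa({"question": "How long do refunds take?"})
print(result["answer"])
```

Passing an explicit state dictionary between steps keeps each step independently testable. A real deployment would replace `retrieve` with embedding search and `fake_llm` with an actual model call.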
AutoGPT is built as a fully autonomous agent capable of completing complex tasks without needing constant human guidance [43,45,47]. It can take high-level goals, break them down into smaller actionable steps, and adjust its plan dynamically as circumstances change. With both short-term and long-term memory, AutoGPT keeps track of context and can handle tasks that go beyond a single interaction. It also leverages tools like web search, code execution, and file management to operate effectively in real-world settings [29,44]. Through its iterative process, AutoGPT can analyze outcomes and refine its actions, combining planning, memory, reflection, and goal-oriented execution to manage tasks autonomously.
BabyAGI takes a slightly different approach to autonomous agents, emphasizing simplicity and accessibility for task generation [49,81]. It decomposes high-level goals into smaller, actionable steps and uses memory to track progress on a multi-step task. BabyAGI has extensions for working with external tools, such as those for code generation and robotics, so it can act in both digital and physical environments. It lacks an explicit reflection ability like AutoGPT’s, but it is iterative. It can execute a task, receive feedback, and refine or adopt new actions over time, which gives it a practical level of self-correction toward its goal [82]. BabyAGI lets developers implement autonomous problem-solving agents of moderate complexity.
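BabyAGI’s core loop is commonly described as a task list driven by three LLM-backed roles. One role executes the current task, one creates follow-up tasks from the result, and one re-prioritizes the list. The Python sketch below reproduces that control flow with plain functions standing in for the LLM calls. It is a simplified illustration, not BabyAGI’s source code, and the report-writing tasks are hypothetical.

```python
from collections import deque

def run_task_loop(objective, first_task, execute, create_tasks, prioritize, max_iters=10):
    """Execute the next task, create follow-up tasks from its result, then re-prioritize.

    `execute`, `create_tasks`, and `prioritize` stand in for LLM calls. `memory` keeps
    (task, result) pairs so later calls can see progress toward the objective.
    """
    tasks, memory = deque([first_task]), []
    for _ in range(max_iters):
        if not tasks:
            break
        task = tasks.popleft()
        result = execute(objective, task, memory)
        memory.append((task, result))
        done = {t for t, _ in memory}
        new = [t for t in create_tasks(objective, task, result)
               if t not in done and t not in tasks]
        tasks = deque(prioritize(objective, list(tasks) + new))
    return memory

# Hypothetical follow-up rules and a trivial alphabetical prioritizer.
FOLLOW_UPS = {
    "outline report": ["draft section 2", "draft section 1"],
    "draft section 1": ["review draft"],
    "draft section 2": ["review draft"],
}
memory = run_task_loop(
    "write a short report", "outline report",
    execute=lambda obj, task, mem: f"done: {task}",
    create_tasks=lambda obj, task, res: FOLLOW_UPS.get(task, []),
    prioritize=lambda obj, tasks: sorted(tasks),
)
print([task for task, _ in memory])
```

The `max_iters` cap is a simple safeguard against a task generator that never stops proposing work, a known failure mode of open-ended task loops.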
AutoGen is designed for multi-agent systems, allowing several AI agents to communicate, collaborate, and work together to solve tasks [41]. Each agent can create its own plan while coordinating with others, using memory to track past actions and shared context. The framework supports reflection and optimization, so agents can evaluate their plans and improve them over time. By combining tool use with teamwork, AutoGen reduces errors, boosts performance, and can tackle complex, distributed goals more effectively. This makes it especially useful for research or applications that rely on multi-agent reasoning and collaborative workflows.
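The conversational pattern behind multi-agent frameworks such as AutoGen can be sketched without the framework itself. The Python example below does not use AutoGen’s API. The `Agent` class, the `converse` function, the stop word chosen for this sketch, and the coder/reviewer reply rules are hypothetical stand-ins. Two agents take turns over a shared transcript, which acts as their common memory. The reviewer’s critique drives the coder to revise until the reviewer signals completion.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Agent:
    name: str
    reply: Callable[[list], str]  # maps the shared transcript to this agent's next message

def converse(first: Agent, second: Agent, opening: str, max_turns=6, stop="TERMINATE"):
    """Alternate turns over a shared transcript until a message contains `stop`."""
    transcript = [(first.name, opening)]
    order = [second, first]  # `second` answers the opening message
    for turn in range(max_turns):
        speaker = order[turn % 2]
        message = speaker.reply(transcript)
        transcript.append((speaker.name, message))
        if stop in message:
            break
    return transcript

def coder_reply(transcript):
    # Hypothetical coder: revises its draft once the reviewer has asked for a return value.
    asked = any("return" in msg for who, msg in transcript if who == "reviewer")
    return "def add(a, b): return a + b" if asked else "def add(a, b): a + b"

def reviewer_reply(transcript):
    # Hypothetical reviewer: approves only code that returns its result.
    last = transcript[-1][1]
    return "LGTM. TERMINATE" if "return" in last else "The function must return a value."

coder, reviewer = Agent("coder", coder_reply), Agent("reviewer", reviewer_reply)
log = converse(coder, reviewer, opening=coder_reply([]))
for who, msg in log:
    print(f"{who}: {msg}")
```

The `max_turns` bound matters in practice. Without it, two agents that never agree would loop indefinitely.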
OpenAgents focuses on developing LLM-based agents that can work together in a shared ecosystem to accomplish complex goals [26,78,79]. The framework uses contextual memory to keep interactions coherent over multiple steps, and it supports iterative planning and reasoning so agents can refine their decisions over time. By integrating external tools, agents can carry out tasks effectively while combining internal reasoning with knowledge from outside sources [31,77,80]. OpenAgents also lets agents improve themselves through repeated reasoning cycles and supports collaboration among multiple agents on tasks that require continuous decision-making, dynamic problem-solving, or integrating diverse sources of knowledge.
Communicative Agents for “Mind” Exploration of Large-Scale Language Model Society (CAMEL) is a multi-agent framework based on role-playing. LLM agents converse with each other, which makes it possible to observe the new behaviors that emerge. To solve tasks, CAMEL assigns complementary roles (such as “User” and “Assistant”) and promotes dialogue-based collaboration [83,87]. It facilitates autonomous reasoning, negotiation, and iterative solution refinement by breaking objectives down into conversational exchanges. In contrast to conventional single-agent systems, CAMEL uses the interactions of multiple agents to produce more complex results. Most implementations use GPT-4 and Large Language Model Meta AI (LLaMA) as the underlying models, though it can operate on other LLMs. In constrained or open-ended environments, CAMEL is especially helpful for studying agent coordination, cooperation, and emergent problem-solving.
MetaGPT is an advanced framework for multi-agent collaborative problem-solving that leverages GPT-4 as a central intelligent agent. It automates tasks that can be represented as a series of smaller sub-problems. To do so, it breaks each problem into sub-problems, automatically assigns them to agents based on their specialization, and coordinates the agents’ interactions to solve the overall problem [84,87]. By managing an organized “society” of agents, MetaGPT can achieve efficiency and scalability on challenging problems. MetaGPT can be used to automate workflows or research. It can also support decision-making and reasoning tasks that draw on diverse expertise and collaborative reasoning. It is designed for seamless communication and the management of dynamically complex tasks among multiple agents, showcasing an advanced form of AI-enabled collaboration.
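The assignment of sub-problems by specialization can be illustrated with a small dispatcher. The Python sketch below is a simplified, hypothetical illustration rather than MetaGPT’s implementation. The role names, the skill tags, and the rule of picking the first agent with a matching skill are our assumptions, and each agent’s `handle` method stands in for an LLM call. Outputs of earlier sub-problems are passed downstream as shared context, mirroring the coordination described above.

```python
from dataclasses import dataclass, field

@dataclass
class RoleAgent:
    role: str
    skills: set
    done: list = field(default_factory=list)

    def handle(self, subtask: str, upstream: list) -> str:
        # Stand-in for an LLM call; records the subtask and the context it built on.
        self.done.append((subtask, tuple(upstream)))
        return f"{self.role}:{subtask}"

def dispatch(subtasks, agents):
    """Assign each (name, skill) sub-problem to the first agent with that skill,
    passing all earlier outputs downstream as shared context."""
    outputs = []
    for name, skill in subtasks:
        agent = next((a for a in agents if skill in a.skills), None)
        if agent is None:
            raise LookupError(f"no agent can handle {skill!r}")
        outputs.append(agent.handle(name, outputs))
    return outputs

team = [RoleAgent("architect", {"design"}), RoleAgent("engineer", {"code"}),
        RoleAgent("qa", {"test"})]
subtasks = [("api design", "design"), ("implement api", "code"), ("write tests", "test")]
result = dispatch(subtasks, team)
print(result)
```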
SuperAGI is an open-source framework for autonomous agents that lets developers quickly define, deploy, and manage LLM-powered agents with considerable flexibility. The software is largely model-agnostic and supports LLMs from many providers (e.g., Meta’s LLaMA models and models from Anthropic and OpenAI). SuperAGI agents support goal-directed execution, memory management, multi-step planning, and tool use, among other capabilities [85]. In addition, SuperAGI includes resource controls and monitoring to constrain autonomous behavior. Unlike lighter-weight orchestration tools, SuperAGI is oriented toward production readiness, allowing agents to be scaled and monitored within real-world workflows. Its modularity and extensibility make it a strong contender for enterprise-level autonomous agent applications.
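Resource controls of the kind SuperAGI offers can be approximated by a guard that every agent action must pass. The Python sketch below is hypothetical and does not reproduce SuperAGI’s configuration or API. `ResourceGuard`, its step and token limits, and the per-action token estimates are assumptions made for illustration. The guard refuses an action before it would exceed a limit and keeps a log that a monitoring view could display.

```python
class BudgetExceeded(RuntimeError):
    """Raised when an action would push the agent past a configured limit."""

class ResourceGuard:
    """Hypothetical guard capping agent steps and estimated tokens, with an audit log."""

    def __init__(self, max_steps: int, max_tokens: int):
        self.max_steps, self.max_tokens = max_steps, max_tokens
        self.steps, self.tokens = 0, 0
        self.events = []  # (action, tokens) pairs for monitoring

    def charge(self, action: str, tokens: int) -> None:
        # Check limits before the action runs, so the budget is never overspent.
        if self.steps + 1 > self.max_steps:
            raise BudgetExceeded(f"step limit {self.max_steps} reached before {action!r}")
        if self.tokens + tokens > self.max_tokens:
            raise BudgetExceeded(f"token limit {self.max_tokens} would be exceeded by {action!r}")
        self.steps += 1
        self.tokens += tokens
        self.events.append((action, tokens))

guard = ResourceGuard(max_steps=3, max_tokens=1000)
completed, halted = [], None
for action, cost in [("search web", 300), ("summarize", 500), ("draft email", 400)]:
    try:
        guard.charge(action, cost)
    except BudgetExceeded as exc:
        halted = str(exc)
        break
    completed.append(action)  # the action itself would run here
print(completed, halted)
```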
A hybrid framework called Task-Based Cognitive Sequential Planning Network (TB-CSPN) blends selective LLM use with formal rule-based coordination [86]. Its architecture limits LLM involvement to semantic topic extraction and uses Colored Petri Nets for deterministic task planning and execution. This removes LLM hallucinations and non-determinism from planning and execution, guaranteeing that reasoning and sequencing are logically consistent and verifiable. Instead of relying on unrestricted LLM reasoning, TB-CSPN agents communicate using structured tokens, and formal cognitive models guide their interactions. Many frameworks rely mainly on language models for planning. In contrast, TB-CSPN demonstrates a more cautious integration of AI, balancing LLM flexibility with symbolic rigor. It works particularly well in fields that need topic-driven task coordination, auditability, and dependability.
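To make the Petri-net side of this design concrete, the Python sketch below implements a minimal colored Petri net. Places hold tokens tagged with a color (here, a topic), and a transition fires only when its guard accepts a combination of input tokens. This is a generic illustration under our own assumptions, not TB-CSPN’s actual net. The `topics` and `agents` places, the matching guard, and the pre-extracted topic tokens (standing in for the LLM’s topic-extraction step) are hypothetical. Firing transitions in a fixed order makes the run deterministic and replayable, which supports the auditability the text emphasizes.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable

@dataclass
class Transition:
    name: str
    inputs: list       # distinct input places; one token is consumed from each
    guard: Callable    # predicate over a tuple of candidate input tokens
    outputs: Callable  # maps the consumed tokens to {place: [new tokens]}

def find_binding(t, marking):
    """Return the first combination of input tokens that satisfies the guard, or None."""
    pools = [marking.get(place, []) for place in t.inputs]
    for combo in product(*pools):
        if t.guard(combo):
            return combo
    return None

def fire(t, marking, binding):
    for place, token in zip(t.inputs, binding):
        marking[place].remove(token)
    for place, tokens in t.outputs(binding).items():
        marking.setdefault(place, []).extend(tokens)

def run(transitions, marking, max_steps=100):
    """Fire the first enabled transition (in list order) until none is enabled."""
    trace = []
    for _ in range(max_steps):
        for t in transitions:
            binding = find_binding(t, marking)
            if binding is not None:
                fire(t, marking, binding)
                trace.append((t.name, binding))
                break
        else:
            break  # no transition enabled: the net is quiescent
    return trace

# Colored tokens are (color, payload) pairs; topics stand in for LLM-extracted topics.
marking = {
    "topics": [("finance", "Q3 invoice dispute"), ("hr", "leave request")],
    "agents": [("finance", "agent-F"), ("hr", "agent-H")],
}
assign = Transition(
    "assign",
    inputs=["topics", "agents"],
    guard=lambda b: b[0][0] == b[1][0],  # topic color must match the agent's specialty
    outputs=lambda b: {"in_progress": [(b[0][0], b[0][1], b[1][1])]},
)
trace = run([assign], marking)
print(marking["in_progress"])
```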
3.2.2. Non-LLM-Driven Agentic Systems
Intelligent agents can act and make decisions without large language models. Non-LLM agentic systems achieve autonomy through rules, learning, planning, or reactive behaviors, and they have been used extensively in robotics, simulations, and automated decision-making. These systems show that AI can be effective and adaptive without natural language understanding.
One important type is the rule-based multi-agent system, in which agents follow predefined rules or logic to interact and achieve coordinated outcomes. Examples include CLIPS, Drools, and JADE [88], which are used for simulations, workflow automation, and decision-support tasks. Another key category is RL agents [89,90], which learn by trial and error. Using algorithms such as Q-learning, Deep Q-Networks (DQN) [91], and policy gradient methods, RL agents improve their performance over time by interacting with their environment. These agents are commonly used in robotics, autonomous vehicles, and dynamic systems.
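A minimal tabular Q-learning agent illustrates the trial-and-error learning described above. After each step it applies the update Q(s, a) ← Q(s, a) + α[r + γ max_a′ Q(s′, a′) − Q(s, a)]. The Python sketch below uses a hypothetical five-state corridor in which only reaching the rightmost state yields a reward. The learning rate, discount factor, exploration rate, and episode count are arbitrary choices for the example. DQN uses the same temporal-difference target but replaces the lookup table with a neural network and adds techniques such as experience replay.

```python
import random

N_STATES, GOAL = 5, 4   # hypothetical corridor: states 0..4, reward only at state 4
ACTIONS = (-1, 1)       # move left or move right

def step(state, action):
    """Environment dynamics: clamp to the corridor; reward 1 on reaching the goal."""
    nxt = min(max(state + action, 0), N_STATES - 1)
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)                         # explore
            else:
                a = max(ACTIONS, key=lambda act: q[(s, act)])  # exploit
            s2, r, done = step(s, a)
            # Temporal-difference target: r + gamma * max_a' Q(s', a'), or just r at the goal.
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            s = s2
    return q

q = q_learning()
greedy = {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)}
print(greedy)
```

The agent is never told the rule "move right". That preference emerges from reward propagated backward through the Q-table over many episodes.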
Other non-LLM agentic systems include planning agents, which represent actions and goals in formalisms such as STRIPS or PDDL [92] and search for action sequences that achieve those goals; agent-based simulation frameworks like NetLogo [93] and MASON [94], which model complex social, ecological, or economic behaviors; and behavior-based robotic agents, such as those using subsumption architectures, which respond to environmental stimuli through simple sensorimotor rules. Together, these systems demonstrate that intelligence, adaptability, and autonomous behavior can emerge through rules, learning, and planning, highlighting the broad potential of agentic AI beyond language-based models.
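The STRIPS idea of describing each action by preconditions, an add list, and a delete list can be sketched as follows. This minimal Python example uses plain breadth-first forward search rather than the heuristic search that practical PDDL planners often employ, and the two-location delivery domain is hypothetical.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]     # preconditions that must hold
    add: FrozenSet[str]     # facts the action makes true
    delete: FrozenSet[str]  # facts the action makes false


def make_action(name, pre, add, delete=()):
    return Action(name, frozenset(pre), frozenset(add), frozenset(delete))


def plan(initial, goal, actions) -> Optional[List[str]]:
    """Breadth-first forward search over STRIPS states; returns a shortest plan or None."""
    start, goal = frozenset(initial), frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if goal <= state:                          # every goal fact holds
            return path
        for act in actions:
            if act.pre <= state:                   # the action is applicable
                nxt = (state - act.delete) | act.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, path + [act.name]))
    return None


actions = [
    make_action("move_a_b", {"at_a"}, {"at_b"}, {"at_a"}),
    make_action("move_b_a", {"at_b"}, {"at_a"}, {"at_b"}),
    make_action("pick_pkg", {"at_b", "pkg_at_b", "hand_empty"}, {"holding"}, {"pkg_at_b", "hand_empty"}),
    make_action("drop_pkg", {"at_a", "holding"}, {"pkg_at_a", "hand_empty"}, {"holding"}),
]
result = plan({"at_a", "pkg_at_b", "hand_empty"}, {"pkg_at_a"}, actions)
print(result)
```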
3.3. Core Components and Architectural Models for Agentic AI
This section defines a set of agent components (perception, world/state, memory, planning, execution/tooling, reflection/evaluation, orchestration, and interaction) and uses them to compare architectures by control flow and module wiring rather than by terminology. We then outline five architectural models, namely ReAct (Reasoning + Acting) single-agent, supervisor/hierarchical, hybrid reactive–deliberative, BDI, and layered neuro-symbolic, and highlight when to use each.
3.3.1. Core Functional Components of Agentic AI Systems
Core components are the reusable building blocks of agentic AI, wired together by architectural choices. A common set includes the following:
- Perception and world modeling—Ingests and structures external inputs (e.g., text, sensors, APIs) into internal representations [41,95].
- Memory (Short-Term, Long-Term, Episodic)—Stores short- and long-term knowledge; retrieval/promotion rules connect past to present reasoning [39,95].
- Planning, Reasoning, and Goal Decomposition—Transforms goals into actionable steps, evaluates alternatives, and selects next actions [95].
- Execution and Actuation—Carries out actions via APIs or actuators, with monitoring and dynamic replanning [41].
- Reflection and Evaluation—Enables self-critique, verification, and refinement of actions and plans [50].
- Communication, Orchestration, and Autonomy—Coordinates task flow, retries, and timeouts, either centrally (e.g., LLM-based supervisor) or via decentralized protocols [50,96].
This component stack recurs across both academic and industry implementations, including high-stakes domains like finance [39].
The model, typically an LLM, large agent model (LAM), or foundation model (FM), provides the system’s core intelligence. It often serves as the reasoning engine, perceptual front-end, and, in many cases, the orchestrator. In Multi-Agent Artificial Intelligence (MAAI) frameworks, the model forms a base layer for perception, action, orchestration, workflows, and user interaction [96]. Tutorials increasingly integrate LLMs/LAMs with planners, memory, tools, and knowledge bases [42].
Agentic AI behavior arises from a coordinated set of reusable components rather than from a single model, as shown in Figure 4. In the following, we synthesize the modules and how they interact.
Figure 4.
Core components of agentic AI system.
Perception and World Modeling: Perception ingests external inputs (text, events, sensors/APIs) and normalizes them into structured observations. World/state modeling maintains an internal representation used for prediction, consistency checks, and counterfactual simulation. In embodied or data-rich settings, perception is multimodal and frequently layered with probabilistic inference to manage uncertainty before symbolic planning or execution [41,42,95].
Memory (Short-Term, Long-Term, Episodic): Memory provides temporal continuity. Short-term memory (STM) maintains episode context (e.g., current plan, recent exchanges); long-term memory (LTM) stores episodic/semantic knowledge (e.g., preferences, histories, artifacts). Retrieval and promotion rules connect STM and LTM so that prior outcomes inform future decisions, reflection, and personalization [47,50,95].
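A minimal sketch of the STM/LTM split described above follows, assuming a bounded short-term buffer, a caller-supplied importance score, and a promotion rule that moves an evicted item into long-term memory only if its importance meets a threshold. Word-overlap retrieval stands in for the embedding search most systems would use; the class name, threshold, and scoring are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import List


@dataclass
class MemoryItem:
    text: str
    importance: float   # 0..1, supplied by the caller (hypothetical scoring)


class AgentMemory:
    """Bounded short-term memory (STM) with a promotion rule into long-term memory (LTM)."""

    def __init__(self, stm_capacity: int = 3, promote_threshold: float = 0.7):
        self.stm = deque()
        self.ltm: List[MemoryItem] = []
        self.stm_capacity = stm_capacity
        self.promote_threshold = promote_threshold

    def observe(self, text: str, importance: float) -> None:
        self.stm.append(MemoryItem(text, importance))
        if len(self.stm) > self.stm_capacity:
            evicted = self.stm.popleft()
            # Promotion rule: an evicted item survives only if it was important enough.
            if evicted.importance >= self.promote_threshold:
                self.ltm.append(evicted)

    def retrieve(self, query: str, k: int = 2) -> List[str]:
        """Rank STM and LTM items by word overlap with the query (a stand-in for embeddings)."""
        words = set(query.lower().split())
        scored = []
        for item in list(self.stm) + self.ltm:
            overlap = len(words & set(item.text.lower().split()))
            if overlap > 0:
                scored.append((overlap, item.text))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [text for _, text in scored[:k]]


mem = AgentMemory()
mem.observe("user prefers aisle seats", 0.9)
mem.observe("weather is sunny today", 0.2)
mem.observe("searched flights to Chicago", 0.5)
mem.observe("hotel budget is 150 per night", 0.8)  # evicts the seat preference, which is promoted
mem.observe("user said thanks", 0.1)               # evicts the weather note, which is dropped
print([m.text for m in mem.ltm])
print(mem.retrieve("book seats for user"))
```

Here the seat preference survives eviction because its importance (0.9) meets the threshold, while the low-importance weather note is forgotten.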
Planning, Reasoning, and Goal Decomposition: The planning/reasoning module transforms goals into actionable steps, evaluates alternatives, and selects next actions across short and long horizons. Granularity varies by paradigm: BDI filters desires into intentions; hierarchical reinforcement learning (HRL) decomposes abstract goals into sub-tasks; single-agent ReAct interleaves reasoning with action (often with tool calls) [41,47,95].
Execution and Actuation: Execution bridges cognition to impact. It invokes tools/APIs, actuators, or workflow steps; validates outcomes against expectations; and triggers retries or replanning on deviation. Production-oriented variants emphasize schema checks, budget/latency limits, and robust error handling to support closed-loop operation in dynamic environments [41,42,95].
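The execution behavior described above (schema checks, budget limits, retries, and escalation to replanning) can be sketched in a few lines of Python. The `execute` helper, the `ReplanNeeded` exception, the type-map schema format, and the flaky tool are hypothetical illustrations rather than the API of any particular framework.

```python
class ReplanNeeded(Exception):
    """Signals that a step could not meet its contract, so the planner should re-plan."""


def execute(tool, args, schema, max_retries=2, budget=None):
    """Call `tool(**args)`, check its output against `schema`, and retry on failure.

    schema: required output keys mapped to expected Python types (hypothetical format).
    budget: optional dict with a shared 'calls' counter that caps total tool calls.
    """
    last_error = "no attempts made"
    for _ in range(max_retries + 1):
        if budget is not None:
            if budget["calls"] <= 0:
                raise ReplanNeeded("call budget exhausted")
            budget["calls"] -= 1
        try:
            result = tool(**args)
        except Exception as exc:              # tool/API failure: record it and retry
            last_error = f"tool error: {exc!r}"
            continue
        if not isinstance(result, dict):
            last_error = "schema violation: output is not a mapping"
            continue
        missing = [k for k, t in schema.items() if not isinstance(result.get(k), t)]
        if not missing:
            return result                     # output satisfies the contract
        last_error = f"schema violation: {missing}"
    raise ReplanNeeded(last_error)


calls = {"n": 0}


def flaky_price_lookup(city):
    """Hypothetical tool that times out on its first call."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise TimeoutError("upstream timeout")
    return {"city": city, "price": 129.0}


budget = {"calls": 5}
out = execute(flaky_price_lookup, {"city": "Chicago"}, {"price": float}, budget=budget)
print(out, budget)

try:
    execute(lambda: {"price": "n/a"}, {}, {"price": float}, max_retries=1)
except ReplanNeeded as err:
    outcome = str(err)                        # handed back to the planner
print(outcome)
```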
Reflection and Evaluation: Reflection/evaluation modules verify intermediate or final outputs, critique candidate plans, and trigger selective replanning. Practical patterns include self-critique, external tool-assisted verification, and nested “critic” roles that reduce hallucination and improve reliability while adding computational overhead [50,95].
Communication, Orchestration, and Autonomy: Interaction modules support human–agent and agent–agent dialogue (clarification, negotiation, oversight) and surface trace information (e.g., actions, tools, sources) for transparency. Autonomy emerges when perception, planning, execution, memory, and reflection are orchestrated over time; in many systems, an LLM-based supervisor coordinates sub-agents, invokes memory or tools, and maintains coherence across steps [41,50,97].
3.3.2. Architectural Models in Agentic AI
Agentic AI systems are not monolithic; their capabilities emerge from architectures that coordinate planning, execution, memory, reasoning, and interaction. Below, we outline five commonly used models and their control flows.
- ReAct Single-Agent:
Figure 5 represents the architecture flow of ReAct single agent. ReAct instantiates a single agent that interleaves stepwise reasoning and acting, optionally inserting a lightweight evaluator before committing an output [98]. The core components are perception/world-state, planning/reasoning, execution and actuation, memory (STM/LTM), and reflection/evaluation; the architecture wires these into a minimal, locally orchestrated pipeline with no external supervisor. The agent observes via perception, reasons in plan/reason to select the next step, acts in execute/tools to affect the world or produce a user reply, may evaluate the result, and then updates memory before repeating. The goal is fast iteration on well-bounded tasks, where a compact loop outperforms heavier orchestration.
Figure 5.
ReAct single-agent loop.
Components and flow: Data and control proceed left to right: the agent observes (perception), optionally retrieves salient context from memory, reasons in plan/reason to select the next step with success criteria, acts in act/execute via a tool/API, and may evaluate the outcome before logging/promoting state and repeating. In the diagram, rounded rectangles denote processing modules (perception → reason/plan → act/execute) connected by solid arrows (primary progression); memory is shown as a cylinder with an upward arrow to reason/plan (retrieval) and dashed downward arrows from act/execute (and, if present, the evaluator) for logging/promotions; evaluation appears as a hexagon branching by a solid arrow from act/execute and returning dashed feedback to reason/plan and/or memory. User-visible responses are produced in act/execute; the loop closes at the environment boundary when the updated world or next user turn re-enters perception as a new observation.
Use and trade-offs: Interleaving reasoning with action tends to improve multi-step task success, and brief critique/verification further suppresses hallucinations; both, however, introduce extra inference steps and wall-clock latency. ReAct is therefore a strong baseline for bounded, single-persona workloads and rapid prototyping. Its simplicity does not scale gracefully: the loop affords little parallelism, weak long-horizon coordination, and increasing tool-selection brittleness as the tool surface and dependency depth grow. For tasks requiring persistent goals, specialization, or oversight, escalation to supervisor/hierarchical (delegation and parallel execution) or Hybrid reactive–deliberative (real-time reflexes with planning) is appropriate despite added coordination overhead [98].
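The ReAct control flow can be sketched as a short Python loop. To keep the example runnable, a deterministic `reason` function stands in for the LLM and `TOOLS` returns canned observations; both, like the travel-planning task, are hypothetical assumptions. The sketch also folds perception into the task dictionary and tool outputs rather than modeling it as a separate module.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Step:
    thought: str   # the reasoning trace, kept for transparency
    action: str    # a tool name, or "finish"
    arg: str


def reason(task: Dict[str, str], memory: List[str]) -> Step:
    """Stand-in for an LLM call: choose the next step from the task and retrieved memory."""
    if not any(m.startswith("flights:") for m in memory):
        return Step("I need flight options first.", "search_flights", task["city"])
    if not any(m.startswith("hotels:") for m in memory):
        return Step("Flights found; now I need a hotel.", "search_hotels", task["city"])
    return Step("I have enough to answer.", "finish", "; ".join(memory))


TOOLS = {  # hypothetical tools returning canned observations
    "search_flights": lambda city: f"flights: 2 options to {city}, cheapest $320",
    "search_hotels": lambda city: f"hotels: 3 options in {city}, from $150/night",
}


def evaluate(observation: str) -> bool:
    """Lightweight evaluator: reject empty or error-looking tool output."""
    return bool(observation) and "error" not in observation.lower()


def react(task: Dict[str, str], max_steps: int = 5):
    memory: List[str] = []
    trace: List[str] = []
    for _ in range(max_steps):
        step = reason(task, memory)            # reason/plan over the current context
        trace.append(step.action)
        if step.action == "finish":
            return step.arg, trace             # user-visible response
        obs = TOOLS[step.action](step.arg)     # act/execute via a tool
        if evaluate(obs):                      # optional evaluation gate
            memory.append(obs)                 # update memory before the next iteration
    return None, trace                         # step budget exhausted


answer, trace = react({"city": "Chicago"})
print(trace)
print(answer)
```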
- Supervisor/Hierarchical:
A supervisor/orchestrator as shown in Figure 6 decomposes high-level goals into subtasks and delegates them to role-specialized sub-agents, each of which executes locally using a compact ReAct-style loop, perception → reason/plan → act/execute, with optional evaluation and memory, while an optional shared/federated state maintains coherence across agents [47,50,95].
Figure 6.
Supervisor/hierarchical architecture.
Components and flow: Operationally, control proceeds as planning → delegation → execution → reporting → possible re-planning. The supervisor (diamond) issues assignments (solid progression) and receives reports (dashed) that trigger retries or re-planning when outcomes deviate. Within each sub-agent, memory supports retrieval (solid) into planning and accepts logging/promotions (dashed) from execution/evaluation; evaluation appears as a hexagon branching from act/execute, feeding feedback into planning/memory.
Use and trade-offs: This architecture is well-suited to decomposable, multi-stage tasks that benefit from parallel specialization and clear ownership; it improves scalability via hierarchical delegation but incurs coordination overhead and risks a supervisor bottleneck/single point of failure if the controller is not engineered for throughput, fault-tolerance, and observability [47,50,95].
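A minimal Python sketch of the supervisor pattern follows, assuming three hypothetical role-specialized sub-agents and a supervisor that runs them in parallel with `concurrent.futures.ThreadPoolExecutor`, collects their reports, and re-dispatches only failed subtasks. In surveyed systems the supervisor is often an LLM; here it is a plain function so that the control flow stays visible.

```python
from concurrent.futures import ThreadPoolExecutor

attempts = {"itinerary": 0}


def flights_agent(goal):
    return True, f"flight booked to {goal['city']}"


def hotel_agent(goal):
    return True, f"hotel reserved in {goal['city']}"


def itinerary_agent(goal):
    attempts["itinerary"] += 1
    if attempts["itinerary"] == 1:
        return False, "attractions API unavailable"   # simulated first-attempt failure
    return True, f"{goal['days']}-day itinerary drafted"


SUB_AGENTS = {"flights": flights_agent, "hotel": hotel_agent, "itinerary": itinerary_agent}


def supervisor(goal, max_rounds=2):
    pending = list(SUB_AGENTS)                     # decomposition: one subtask per role
    reports = {}
    for _ in range(max_rounds):
        with ThreadPoolExecutor() as pool:         # delegation with parallel execution
            outcomes = list(pool.map(lambda role: SUB_AGENTS[role](goal), pending))
        results = dict(zip(pending, outcomes))     # sub-agents report back
        reports.update({role: msg for role, (ok, msg) in results.items() if ok})
        pending = [role for role, (ok, _) in results.items() if not ok]  # re-plan failures only
        if not pending:
            break
    return reports, pending


reports, unresolved = supervisor({"city": "Chicago", "days": 3})
print(reports)
print(unresolved)
```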
- Hybrid reactive–deliberative:
A hybrid architecture runs two coordinated loops in parallel, as shown in Figure 7: a fast reactive loop that handles time-critical events and a slower deliberative loop that maintains goals and plans. The loops share a common memory and are overseen by an Arbitrator (diamond) that resolves conflicts when immediate reactions and long-horizon plans disagree [38,47,95].
Figure 7.
Hybrid reactive–deliberative.
Components and flow: The reactive path couples perception of events to actions/actuation with minimal latency; safety/guard checks are typically embedded here. The deliberative path performs planning/reasoning over longer horizons, updating goals and selecting action sequences; it reads from and writes to memory (cylinder) to maintain temporal coherence. A shared memory module synchronizes state between the loops. The Arbitrator (diamond) mediates which loop has control when their recommendations diverge, using task/goal priorities and policy constraints. Flow proceeds: perceive (reactive and abstract inputs) → (branch) fast react vs. slower plan → arbitrate (diamond) → execute → evaluate (hexagon) → update memory → possibly re-plan.
Use and trade-offs: The hybrid model balances speed with strategic reasoning, but this comes with arbitration complexity and a requirement for reliable state synchronization across loops to avoid instability or “thrash.” Empirically and in surveyed practice, it is preferred for real-time operations with long-horizon objectives; risks concentrate in arbitration design and consistency across loops [38,47,95].
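The arbitration step can be illustrated with a small Python sketch in which a reflex rule and a plan-following rule each propose an action on every tick and a priority-based arbitrator picks one. The obstacle threshold, priority values, and tie-breaking policy are hypothetical, and the two loops are evaluated sequentially per tick rather than on separate threads.

```python
def reactive(obs):
    """Fast loop: map a time-critical event directly to a reflex action."""
    if obs.get("obstacle_distance", float("inf")) < 1.0:
        return {"action": "brake", "priority": 10, "source": "reflex"}
    return None


def deliberative(memory):
    """Slow loop: propose the next step of the plan held in shared memory."""
    idx = memory["plan_index"]
    if idx < len(memory["plan"]):
        return {"action": memory["plan"][idx], "priority": 1, "source": "plan"}
    return None


def arbitrate(reflex, planned):
    """Choose the higher-priority proposal; the reflex is listed first, so it wins ties."""
    candidates = [p for p in (reflex, planned) if p is not None]
    return max(candidates, key=lambda p: p["priority"]) if candidates else None


def run(observations, plan):
    memory = {"plan": plan, "plan_index": 0}   # shared state read by both loops
    executed = []
    for obs in observations:                   # one tick per observation
        choice = arbitrate(reactive(obs), deliberative(memory))
        if choice is None:
            break                              # plan finished and nothing urgent
        executed.append(choice["action"])
        if choice["source"] == "plan":
            memory["plan_index"] += 1          # advance the plan only when a plan step ran
    return executed


executed = run([{}, {"obstacle_distance": 0.5}, {}, {}], ["forward", "turn_left", "forward"])
print(executed)
```

The reflex preempts the plan for one tick, and the plan then resumes from the step it had not yet executed, which is the synchronization property the shared memory is meant to preserve.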
- Belief–Desire–Intention (BDI):
The BDI architecture, shown in Figure 8, models deliberation with explicit mental states: beliefs (world information/state), desires (candidate goals), and intentions (the subset of goals the agent commits to pursue). A canonical cycle is observe → update beliefs → generate desires → filter to intentions → act → repeat; commitment rules maintain intentions until a goal is achieved, becomes impossible, or is superseded [38,95].
Figure 8.
Belief–desire–intention (BDI) architecture.
Components and flow: External world inputs are observed and update beliefs/world state. The agent derives desires and filters them into intentions according to domain rules and commitment policies. Act/execute realizes the current intention via tools/APIs/actuators, producing effects on the external world. An optional evaluation gate verifies outcomes or constraints and can feed revisions back to the desire set or commitment filter. Memory supplies retrieval to the desire/intention stage (schemas, facts, episodic traces) and receives logs/promotions from act/execute and evaluation (summaries, artifacts, outcomes). Control proceeds top→down, world → beliefs → desires → intentions → act/execute, followed by evaluation and memory update; the loop closes when the changed world yields the next observation back into beliefs [38,95].
Use and trade-offs: BDI is a strong fit for tasks where explainability, goal discipline, and traceability matter (e.g., auditability or safety cases). It trades some adaptability for clarity: commitment rules and symbolic state make reasoning legible but can become brittle in open-world settings without robust belief-revision and exception policies [47,50,95].
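The BDI cycle and its commitment rule can be sketched in Python as follows. The delivery-robot beliefs, desire priorities, and action effects are hypothetical; the point of the example is that an adopted intention persists until it is achieved, marked impossible, or outranked by a higher-priority desire.

```python
PRIORITY = {"recharge": 2, "deliver": 1}   # hypothetical goal priorities (higher wins)


def generate_desires(beliefs):
    """Options the agent would like to pursue given what it currently believes."""
    desires = set()
    if beliefs["battery"] < 20:
        desires.add("recharge")
    if beliefs["pending_orders"] > 0:
        desires.add("deliver")
    return desires


def achieved(intention, beliefs):
    if intention == "recharge":
        return beliefs["battery"] >= 90
    return beliefs["pending_orders"] == 0


def filter_intention(current, desires, beliefs):
    """Commitment rule: keep `current` unless achieved, impossible, or outranked."""
    keep = (current is not None
            and not achieved(current, beliefs)
            and current not in beliefs["impossible"]
            and all(PRIORITY[d] <= PRIORITY[current] for d in desires))
    if keep:
        return current
    options = [d for d in desires if d not in beliefs["impossible"]]
    return max(options, key=PRIORITY.get) if options else None


def act(intention, world):
    """Execute the intention; effects land in the world, not directly in beliefs."""
    if intention == "recharge":
        world["battery"] = min(100, world["battery"] + 40)
        return "charge"
    if intention == "deliver":
        world["pending_orders"] -= 1
        world["battery"] -= 15
        return "deliver_order"
    return "idle"


def run_bdi(world, cycles=6):
    beliefs = {"impossible": set()}   # hook for goals judged unachievable
    intention, actions = None, []
    for _ in range(cycles):
        beliefs.update(world)                                      # observe, update beliefs
        desires = generate_desires(beliefs)                        # generate desires
        intention = filter_intention(intention, desires, beliefs)  # filter to an intention
        actions.append(act(intention, world))                      # act on the world
    return actions


actions = run_bdi({"battery": 40, "pending_orders": 3})
print(actions)
```

In the fourth cycle the agent keeps recharging even though the recharge desire has already disappeared (the battery is back above 20), because the intention is not yet achieved; an agent that re-selected goals from scratch every cycle would have resumed delivery with a half-charged battery.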
- Layered Decision (Neuro-Symbolic):
Layered decision architectures, shown in Figure 9, integrate neural perception and probabilistic inference with a symbolic planner and an execution layer, typically followed by evaluation and memory update. The aim is to handle uncertainty via statistical inference while preserving interpretability and traceable decisions through symbolic planning [38,39,41,95].
Figure 9.
Layered decision (neuro-symbolic) architecture.
Components and flow: Perception ingests signals (text, images, telemetry) and encodes features. Probabilistic inference converts these into calibrated state estimates and event hypotheses, exposing uncertainty to discrete reasoning. The symbolic planner operates on this structured state with explicit rules, goals, and constraints; it retrieves schemas, facts, and episodic traces from memory. Act/execute materializes the chosen step via tools/APIs/actuators, producing effects on the external world. An optional evaluation stage performs verification (e.g., constraint checks, cross-model critics), returning feedback to the planner and logging/promoting artifacts to memory; act/execute likewise logs/promotes to memory. Control is top→down, external world → perception → probabilistic inference → symbolic planner → act/execute, followed by evaluation and memory updates; the loop closes as the updated world re-enters perception [38,95].
Use and trade-offs: Layered neuro-symbolic designs are well-suited to open-world decision-making under uncertainty and to domains needing transparent, auditable reasoning (e.g., public-sector workflows). The principal cost is integration and representation alignment, bridging neural encodings with symbolic state/rules and evaluation/feedback, which increases design and maintenance overhead. These strengths and costs are summarized in Table 4 and the accompanying discussion.
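A compressed Python sketch of the layered flow follows, using a hypothetical claims-triage example in the spirit of the public-sector workflows mentioned above. The neural perception stage is stubbed with fixed likelihoods, the inference layer applies Bayes' rule, and an ordered rule list plays the symbolic planner; the priors, thresholds, and rule identifiers are illustrative assumptions.

```python
from typing import Dict


def infer(prior: Dict[str, float], likelihood: Dict[str, float]) -> Dict[str, float]:
    """Bayes' rule: posterior is proportional to prior times likelihood, then normalized."""
    unnorm = {h: prior[h] * likelihood[h] for h in prior}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}


# Symbolic layer: ordered, human-readable rules (hypothetical policy and thresholds).
RULES = [
    ("R1", lambda s: s["p_fraud"] >= 0.8, "escalate_to_auditor"),
    ("R2", lambda s: s["p_fraud"] <= 0.2 and s["docs_complete"], "approve"),
    ("R3", lambda s: True, "request_more_info"),
]


def decide(case, prior):
    # Perception stage (stubbed): a neural model would produce these likelihoods from the raw claim.
    posterior = infer(prior, case["likelihood"])            # probabilistic inference layer
    state = {"p_fraud": posterior["fraud"], "docs_complete": case["docs_complete"]}
    for rule_id, condition, action in RULES:                 # symbolic planner: first match wins
        if condition(state):
            return {"action": action, "rule": rule_id, "p_fraud": round(state["p_fraud"], 3)}


prior = {"fraud": 0.1, "legit": 0.9}
cases = [
    {"likelihood": {"fraud": 0.9, "legit": 0.05}, "docs_complete": True},
    {"likelihood": {"fraud": 0.2, "legit": 0.8}, "docs_complete": True},
    {"likelihood": {"fraud": 0.99, "legit": 0.01}, "docs_complete": False},
]
decisions = [decide(c, prior) for c in cases]
for d in decisions:
    print(d)
```

Each decision carries the identifier of the rule that fired and the posterior it was based on, which is the kind of traceable output that motivates the symbolic layer.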
3.3.3. Coordination and Modularity in Agentic Architectures
Agentic AI systems succeed or fail not only by which components they contain but by how those components are modularized and coordinated at runtime. This section summarizes dominant coordination mechanisms and their trade-offs.
- Modular Composition:
Architectures commonly encapsulate perception, planning/reasoning, memory, execution/tooling, and reflection into independently deployable subsystems with clear interfaces. This promotes scalability (scale out specific modules), fault isolation (contain failures), and parallelism (concurrent role execution). Modularity also enables “plug-and-play” evolution, e.g., swapping a planner or adding a compliance checker, without redesigning the full system. Design guidance emphasizes tight control of interfaces and state contracts between modules to support predictable orchestration and observability [42,47,50,96].
- Orchestration and Supervisory Control:
Centralized orchestration: Many systems employ a supervisor (often LLM-based) to route tasks, maintain shared context, trigger reflection or replanning, and arbitrate conflicts between competing goals. The supervisor coordinates specialist agents (planner, retriever, executor, critic), aggregates results, and enforces guardrails and approval gates where required [50,95].
Decentralized orchestration: In multi-agent settings, decision-making is distributed via message-passing and shared-state protocols; roles negotiate or vote to coordinate actions. This improves fault tolerance and throughput but increases testing and observability complexity and requires robust protocols for consensus and backoff [42,47,96].
When to prefer which: Centralized control suits tightly regulated or safety-critical workflows with clear accountability; distributed control fits large, decomposable workloads with dynamic specialization and resilience needs.
- Communication Protocols and Workflow Graphs:
Production systems typically (i) exchange structured messages (e.g., JSON) including task intents, state snapshots, and tool results, enabling replay and lineage; (ii) use workflow graphs to represent control flow (branching, retries, compensations); and (iii) attach evaluation hooks (critics, policy checks) at key transitions to gate risky actions and trigger replanning or escalation. Graph-based orchestration frameworks and multi-agent planning protocols are frequently reported to improve robustness under non-determinism while preserving transparency for operators [41,42,50,96].
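Point (i) can be sketched with JSON messages that carry a parent identifier, so that an append-only log supports both replay and lineage queries. The `Message` fields, intent names, and sequential integer identifiers are hypothetical choices; a real deployment would likely use globally unique identifiers and a richer envelope.

```python
import itertools
import json
from dataclasses import asdict, dataclass
from typing import List, Optional

_next_id = itertools.count(1)   # sequential ids keep the example deterministic


@dataclass
class Message:
    sender: str
    intent: str                     # e.g. "task", "tool_result", "approve" (hypothetical vocabulary)
    payload: dict
    parent_id: Optional[int] = None
    msg_id: int = 0

    def __post_init__(self):
        if self.msg_id == 0:
            self.msg_id = next(_next_id)


def send(log: List[str], msg: Message) -> int:
    """Append the JSON-serialized message to an append-only log, which enables replay."""
    log.append(json.dumps(asdict(msg), sort_keys=True))
    return msg.msg_id


def lineage(log: List[str], msg_id: Optional[int]) -> List[str]:
    """Follow parent_id links back to the root to show how a result was produced."""
    by_id = {m["msg_id"]: m for m in map(json.loads, log)}
    chain = []
    while msg_id is not None:
        m = by_id[msg_id]
        chain.append(f"{m['sender']}:{m['intent']}")
        msg_id = m["parent_id"]
    return list(reversed(chain))


log: List[str] = []
t = send(log, Message("supervisor", "task", {"goal": "find hotel", "city": "Chicago"}))
r = send(log, Message("retriever", "tool_result", {"hotels": 3}, parent_id=t))
c = send(log, Message("critic", "approve", {"ok": True}, parent_id=r))
print(lineage(log, c))
```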
3.3.4. Integration of Components into Architecture
Agentic AI systems differ from conventional pipelines in that components are integrated into continuous decision loops that sense, plan, act, evaluate, and learn.
- Layered and Modular Pipelines:
A common integration pattern is a layered flow that couples components into an end-to-end decision loop: perception → world/state modeling → planning/reasoning → execution/tooling → evaluation/feedback → memory → (back to) planning. Although often depicted linearly, these flows are event-driven and cyclic: evaluation and feedback can branch to retries or replanning; memory promotion/demotion changes subsequent retrieval; and perception can interrupt with high-priority events. Effective designs define explicit interfaces (inputs/outputs, schemas, confidence) at each boundary and attach gates (e.g., policy checks, critics) before risky actions [41,42,95]. This organization supports persistent goal pursuit (long horizon), real-time adaptation (short horizon), and safe recovery from deviations [41,95].
- Orchestration Mechanisms:
Centralized orchestration: Many systems elevate an LLM to a supervisor role that maintains shared context, routes tasks to specialist modules/agents (planner, retriever, executor, critic), triggers reflection or replanning, and arbitrates conflicts among goals. The supervisor executes a graph (DAG/state machine) with branching, retries, compensations, and evaluator hooks at critical transitions to keep lineage auditable and side effects idempotent [42,50,97].
- Graph-based execution:
Regardless of centralization, integrating components with an explicit workflow graph improves robustness under non-determinism. Nodes encapsulate component calls (with contracts for inputs/outputs and budgets), and edges encode control logic (success/failure, timeouts, escalation). Evaluators or policy guards can be placed at ingress/egress of high-risk nodes (e.g., external tools) [42,50].
Trade-offs: Centralized supervisors simplify accountability and debugging but may bottleneck or introduce single-point-of-failure risk; decentralized flows increase resilience and throughput but require stronger state contracts, message protocols, and observability [42,47,50].
- Multi-Agent Integration and Decentralization:
In decentralized designs, agents each own a subset of components (e.g., local planner + executor + memory) and coordinate via shared context and message passing. Common patterns include role-based teams (planner/solver/critic), peer networks with negotiation/voting, and hybrid hierarchies that mix supervisor(s) with collaborating peers. Integration hinges on (i) shared state (what is global vs. local), (ii) protocols for task handoff and conflict resolution, and (iii) placement of evaluators/approvals for safety or compliance. This model yields fault tolerance and parallel specialization but increases testing/monitoring complexity and demands clear traceability of who decided what and why [39,47,97].
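A minimal Python sketch of graph-based execution follows: nodes wrap component calls with a retry count and an optional policy guard, and edges route success, failure, or a blocked guard to the next node. The `Node` structure, the travel-booking graph, and the budget guard are hypothetical; wall-clock timeouts are approximated here by a step budget, and real orchestration frameworks add persistence and parallel branches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Node:
    run: Callable[[dict], bool]                      # component call; mutates state, returns success
    on_success: Optional[str]                        # next node, or None to stop
    on_failure: Optional[str]                        # failure/escalation edge
    retries: int = 0
    guard: Optional[Callable[[dict], bool]] = None   # policy check at the node's ingress


def execute_graph(nodes: Dict[str, Node], start: str, state: dict, max_steps: int = 20) -> List[str]:
    current, trace, steps = start, [], 0
    while current is not None:                       # None marks a terminal edge
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted")
        steps += 1
        node = nodes[current]
        if node.guard is not None and not node.guard(state):
            trace.append(f"{current}:blocked")
            current = node.on_failure                # escalate instead of acting
            continue
        ok = False
        for _ in range(node.retries + 1):
            ok = node.run(state)
            if ok:
                break
        trace.append(f"{current}:{'ok' if ok else 'failed'}")
        current = node.on_success if ok else node.on_failure
    return trace


state = {"price": None, "budget": 1500, "attempts": 0}


def search(s):
    """Hypothetical search step that fails once, then returns a quote."""
    s["attempts"] += 1
    if s["attempts"] < 2:
        return False
    s["price"] = 1720
    return True


nodes = {
    "search": Node(search, on_success="book", on_failure="escalate", retries=1),
    "book": Node(lambda s: True, on_success=None, on_failure="escalate",
                 guard=lambda s: s["price"] is not None and s["price"] <= s["budget"]),
    "escalate": Node(lambda s: True, on_success=None, on_failure=None),
}
trace = execute_graph(nodes, "search", state)
print(trace)
```

The guard blocks the booking because the quoted price exceeds the budget, so control follows the failure edge to human escalation instead of committing a side effect.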
3.3.5. Architectural Models vs. Core Components
Table 4 compares how the major architectural families wire the core components introduced earlier, together with their typical uses and associated risks. It shows how different designs emphasize particular strengths: BDI architectures rely on explicit beliefs, desires, and intentions to support explainable decision systems, but require extensive symbolic modeling and can be brittle under uncertainty [38,41]; hierarchical/HRL models scale well for decomposable tasks through multi-level goal delegation, though they risk supervisor bottlenecks [50,96]; ReAct single agents offer lightweight reasoning–action loops for rapid iteration, but struggle with parallelism and reliable tool selection [47]; hybrid reactive–deliberative systems balance fast reflexes with long-horizon planning, yet require careful arbitration design [38]; and layered neuro-symbolic approaches integrate neural inference with symbolic reasoning to manage uncertainty and provide traceability, though at the cost of added integration complexity [39,41,95]. Together, the table shows how architectural choices map to trade-offs in scalability, transparency, and robustness, guiding when each model is most appropriate.
Table 4.
Components and risks across architecture families.
3.4. What Types of Goals and Tasks Are Currently Being Solved Using Agentic AI Across Domains?
Agentic AI is used across many domains and applications to accomplish a wide variety of goals and tasks. Agentic AI systems can take a high-level objective, break it down into smaller tasks, and accomplish those tasks efficiently, even in environments that are evolving, inherently unpredictable, or complex. Agentic AI can reason, build plans, and act autonomously to adapt to new scenarios, work as part of a team, and produce results that may be difficult or impractical to achieve with other types of AI.
Agentic AI systems are used to complete routine activities, support real-time decision-making, handle complex problems, and provide personalized assistance by adapting to context. Their strength lies in combining data from many sources, making informed and ethically grounded decisions, and coordinating multiple agents to work together in real time.
The rationale for a domain-based categorization of agentic AI is to clarify how these systems are shaped by the specific needs, constraints, and objectives of each field in which they are deployed. The domain-based framework presents agentic capabilities across multiple domains, highlighting both capabilities common to many domains and those unique to a particular one. An overview of the application categories is shown in Figure 10. The framework also lets the reader identify recurring patterns within a specific domain while supporting tabular comparison across domains.
Figure 10.
Overview of agentic AI applications across multiple domains.
Table 5 has three columns: the domain, the application, and the tasks or considerations associated with the agents' work. This gives quick access to key information without reading full paragraphs. The diagram in Figure 10 complements the table by adding a layer that links each domain to its applications and shows where domains share characteristics. Read together, the table and diagram give the reader both an overview and enough detail to assess whether agentic AI systems act as genuine agents in real and experimental contexts.
Table 5.
Agentic AI applications and tasks across multiple domains.
3.4.1. Healthcare
In healthcare, agentic AI supports diagnostics, treatment planning, and personalized care. It enables precise risk stratification, rapid segmentation of medical images, personalized diet generation, real-time seizure prediction, and robust human–robot interaction [59]. Fitness and wellness agents provide adaptive coaching, posture analysis, multimodal feedback, progress tracking, and coordinated recommendations [60]. In electrodiagnostic (EDX) testing, agents can organize EMG/NCS data in a standardized format, incorporate relevant diagnostic context through RAG, and produce accurate physician-level reports [99]. Genomic assessment agents can predict disease risk, suggest treatments, indicate progression, and personalize care with attention to cost while maintaining security safeguards [35]. Agents can also inform ethical frameworks for generative AI [44], clean and automate medical computer vision pipelines [100], provide bioethical decision support [101], enhance clinical decision-making in surgical and drug discovery workflows [56], and streamline real-time differential diagnosis of drug-induced liver injury (DILI) from clinical notes [102].
3.4.2. Military
In military and cybersecurity settings, agentic AI enables groups of autonomous agents to carry out both offensive and defensive actions based on coordinated collective decisions. Each agent can act independently according to its own preferences: agents can defend their positions, attack enemy positions, move as needed, adapt their movements based on performance metrics, and select among a variety of effective formations [119]. In cloud-based AI platforms, agentic systems can provide continual protection by rotating active services to reduce risk, preemptively defending against known threats, limiting vulnerabilities exposed by repeated attacks, enforcing strict security rules, reporting unusual activity, and responding automatically with system changes [120].
3.4.3. Transportation
Intelligent agentic traffic control systems simulate traffic situations, control dynamic routing, implement on-demand signal operations, and coordinate agents over distributed transport networks to improve safety, efficiency, and sustainability [103]. Real-time process optimization using generative models and soft-sensor inputs with SIC architectures predicts quality variables and performs closed-loop control, helping to scale biomanufacturing, personalize product design, and minimize energy use and emissions [43]. Context-aware navigation reduces collision risk in dynamic and complex environments by combining multimodal perception with semantic understanding to inform decision-making [74]. Multi-agent routing and scheduling coordinate agents traveling across multiple traffic channels to eliminate delays, avoid deadlock, and enable agents to reach their destinations under occupancy constraints [61]. Human-centered, adaptable approaches to multi-phase transportation lifecycle management strive to remove barriers to inclusion while addressing persistent issues of congestion, collisions, environmental impact, and equity, optimizing travel outcomes for individuals and collectives [104].
3.4.4. Software
Agents independently decompose complex queries, determine which tools are relevant, and execute subtasks while providing adjustable feedback in real time [105]. Behavior-based development agents take a natural language query as input and generate structured test cases that support collaboration across teams [106], whereas autonomous visual intelligence agents fuse spatial context with multimodal sensory data to perceive the environment, adapt, and collaborate in real time for tasks such as robotic surgery, disaster response, and augmented/mixed reality applications [107]. Software engineering and business automation agents decompose a goal, assign roles, and automate workflows, including inventory management, customer relationship management, invoicing, and demand forecasting [30]. Self-directed AI pipelines autonomously select models, handle class-imbalanced data, and support knowledge-aware, interpretable decision-making [54]. Agentic search, review, and personalization systems are capable of multi-step reasoning, cross-lingual retrieval, and bias mitigation [108]. Multi-agent orchestration systems can automate computer vision tasks and manage complex, fault-tolerant, high-risk systems [62,109].
End-user programming agents are helping to democratize software development [110]. Near-real-time multi-agent frameworks can support software development, operational IT workflow control, and strategic decision-making at the same time [111]. Autonomous software development and multi-domain digital task agents plan, assign, execute, and self-correct the software development, browsing, coding, and file operations associated with a release [50,113]. Human–algorithmic agency modeling also theorizes an evolving model of collective human–AI agency [114]. Conversational agents are increasingly integrated into software development to improve test case development, ML workflows, and SOC automation through natural language interpretation, test case generation, pipeline and incident monitoring and detection, zero-trust rule enforcement, and multi-agent interaction coordination [115,116,117]. For fully autonomous software development, predictive incident detection, and IT automation, agents can decompose prompts, write and test code, integrate third-party APIs, optimize resource use and performance, manage Git workflows, and manage incidents through coordinated multi-agent planning [49,51,118].
3.4.5. Finance, Banking and Insurance
Agents operating in finance, banking, and insurance have improved customer risk profiling and automated loan forecasting, treasury management, and fraud detection by analyzing real-time data and behavioral patterns [80]. Agents also strengthen model risk management through automated ML workflows, compliance checks, and agent-to-agent cooperation [39]. In customer service, agents provide 24/7 support, investment and insurance recommendations, and automated transactions, due diligence, and know-your-customer (KYC) checks [128]. Risk and fraud detection agents can spot risky actions, break down complicated processes, and use explainable AI to show how they reach their conclusions [129]. Personalized banking agents automate budgeting and product recommendations and provide instant behavioral interventions that also support inclusion through quasi-real-time feedback mechanisms [130]. Agents for insurance claims evaluation use multimodal inputs, decompose tasks into assigned subtasks, and respond in a policy-compliant way [96]. Finally, customer-centric chatbots improve sales, diagnostics, and predictive analytics, and can automate IoT-driven processes to improve interactive engagement across organizational partners [131].
3.4.6. Manufacturing and Industrial
In manufacturing and industrial settings, agents improve efficiency, safety, and decision-making across production and supply chains. Multi-agent systems help optimize tasks spanning wastewater treatment, factory operations, predictive maintenance, and energy-aware task scheduling, among others [33,63]. Supply chain agents can forecast demand, monitor for contamination, and recommend actions on quality and pricing [55]. Cognitive robots and collaborative human–robot agent systems can predict the possible states of a scenario, learn from social interactions with humans, adapt their movements based on what they learn, and collaborate safely with humans on shared tasks [122,124]. Workflow agents and autonomic computing agents can improve efficiency in robotic assembly and automate compliance, scheduling, and resource planning [121,123]. Multi-agent pickup and delivery systems address coordination, collision avoidance, and delay reduction [69].
3.4.7. Tourism and Traveling
In tourism and travel, agents optimize pricing, planning, and customer service using human–AI teams. They provide real-time decision support ethically and transparently through retrieval-augmented generation (RAG), Interactive Cognitive Agents (ICAs), and other management support systems [127].
3.4.8. Multi-Domain
Agents model user behavior, provide recommendations and context, coordinate processes across domains, automate business processes, break down goals, and use external knowledge, all while operating safely and ethically [53,132,133]. Agents also make real-time decisions, minimize costs, use resources more efficiently, produce intelligible outputs, and plan and execute tasks under uncertainty [36,135]. Multi-domain agents interact with tools and APIs, manage complex workflows in areas such as coding, healthcare, finance, and crime analysis, and provide memory-based learning that lets them continue a task by absorbing adaptive information from external cues [31,67,75,76,77,136,139]. Agents also enhance personal assistants, cybersecurity, and autonomous knowledge management, while reducing computation cost and enabling smarter cross-domain problem solving [32,46,66,134,137,138].
3.4.9. Scientific Discovery and Research
Autonomous systems aid scientific inquiry by generating hypotheses, automating literature review and analysis, and designing experiments. They can physically carry out most steps of robotic chemical and biological experiments, analyze data across domains, and iteratively refine experiments to make experimentation more efficient in drug discovery, materials science, and energy applications [64,65,150,151]. LLM-based multi-agent systems can combine physical models with domain knowledge to plan simulations, predict material properties, and produce multimodal outputs (numerical data, charts, text reports) for engineers [64,65]. Agents can assist with automatic writing of research papers, analyze textual or hypertext content, organize qualitative interviews or conversations, and build knowledge graphs, while embedding interpretability, bias detection, and collaborative reasoning throughout scientific workflows [45,152,153].
3.4.10. Retail, Business, and E-Commerce
Autonomous systems personalize customer interactions and coordinate supply chain management by offering personalized shopping experiences, situation-aware support through chatbots and virtual assistants, and navigation support inside buildings or stores, and by coordinating autonomous operations such as inventory management, logistics, and fulfillment through multi-agent systems [78]. Autonomous systems also automate task scheduling, digital content management, and multi-step workflows such as travel booking or e-commerce tasks [141,144]. They rely on natural language processing (NLP) and situational reasoning algorithms to adapt to changing situations [144]. Autonomous decision-making agents are also designed to assist an organization or team quickly with strategic and tactical planning, or with rapid data analytics, error detection, and forecasting, to improve productivity and customer satisfaction [143]. The increasingly sophisticated capabilities of AI service agents and chatbots support sustained personalized interactions, financial advice, reservations, and smart device activation while reducing human workload [140,145]. Workflow automation is applied in areas such as law and retail and to Kubernetes tasks with minimal supervision, while e-commerce process optimization offers opportunities to reduce costs and increase customer satisfaction [27,142]. Human–machine collaboration is extended through humanoid robots and agentic LLMs for claims handling, smart marketing, HR, and building trust in customer service [68,146]. Interactive recommender systems provide conversational, personalized recommendations in the movie/entertainment and retail domains [79].
GenAI agents demonstrate reasoning, planning, and task execution across multiple domains, including finance and banking, cybersecurity, healthcare and remote healthcare, and software operation and administrative tasks [40].
3.4.11. Smart Cities and Energy
Autonomous systems support city services such as transport, health, and emergency management, and can proactively address problems such as pollution and traffic in real time [125]. Agents can take on building energy work that involves data processing, coding, and simulation [58]. Multi-agent systems reduce energy use in smart buildings while maintaining comfort, reducing cost, and preventing energy spikes [70]. Digital twin agents support building monitoring, fault detection, preventive maintenance, and energy efficiency [126].
3.4.12. Public Administration
Autonomous systems help public administration manage resources (water and energy) efficiently, anticipate demand accurately, simulate policies with digital twins, and make adjustments in real time. They also enhance services to citizens, automate administrative workflows, aid fraud detection, and support decision-making for smart cities and population health [37,41].
3.4.13. Education
In education, autonomous systems personalize learning by tracking student performance, adjusting content, giving timely feedback, taking on administrative responsibilities, identifying signs of disengagement, and promoting collaborative learning [57]. They can also help automate the scoring of essay questions, offering a consistent, fair, and explainable marking process through the collaboration of multiple agents [148]. Adaptive AI agents can act as metacognitive support, helping diagnose learning gaps, scaffolding complex content, reducing learner burden, supporting remote participation, and automating marking [147]. Multi-agent LLM systems help deliver personalized and consistent results for both subjective and objective assessment, combining coordinated reasoning across agents, validation, and high-quality data coding and generation [97]. Beyond helping learners, agents support cognitive workflows related to research, decision-making, report writing, travel organization, and many other complex academic workloads [149].
3.5. What Kinds of Input and Output Formats Do Agentic AI Systems Handle in Comparison to Traditional AI Systems?
Agentic AI systems designed to operate in complex environments are built with flexible input–output structures that support independent decision-making. Different agentic AI models use input–output schemes suited to the nature of the task, and these schemes differ in structural design and interaction style. This section presents the ways of interacting with agentic AI systems, including how they process different input forms and produce appropriate outputs. Table 6 categorizes these systems by input type, together with the task performed and the technologies used to produce the output.
Table 6.
Agentic AI models for various input–output transformations.
The purpose of Table 6 is to show how agentic AI systems are designed by mapping input modalities directly to their outputs, the tasks performed, and the technologies used. The table is organized for easy comparison of how a single input can produce several types of output and support different tasks, which gives insight into system flow in real-world applications of agentic AI. It also highlights the diversity and specialization of agentic AI systems, helping researchers and practitioners identify models or technologies suited to particular applications.
3.5.1. Text to Actions
Text input is free-form language passed from the user to the system; most often it is natural language, which may be structured or unstructured. In agentic AI models, action output means executing operations in the system to complete tasks [120]. More advanced models, such as Vectara-agentic [27] and AutoGPT [34], can take text input, interpret the user’s prompt to create outputs or take actions, and coordinate with other agents without human assistance.
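The text-to-actions flow described above (interpret the prompt, choose tools, execute them, record results) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how Vectara-agentic or AutoGPT are implemented: the tool registry `TOOLS` and the planner `plan_from_text` are hypothetical, and the planner uses keyword rules where a real agent would call an LLM.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real agent would call search or booking APIs here.
TOOLS: Dict[str, Callable[[str], str]] = {
    "search_flights": lambda city: f"found flights to {city}",
    "book_hotel": lambda city: f"reserved hotel in {city}",
}


def plan_from_text(prompt: str) -> List[Tuple[str, str]]:
    """Stand-in for an LLM planner: keyword rules map a prompt to tool calls."""
    words = prompt.lower().replace(",", " ").split()
    # Take the word after "to" as the destination, if there is one.
    city = words[words.index("to") + 1] if "to" in words[:-1] else "unknown"
    steps: List[Tuple[str, str]] = []
    if "trip" in words or "flight" in words:
        steps.append(("search_flights", city))
    if "trip" in words or "hotel" in words:
        steps.append(("book_hotel", city))
    return steps


def run_agent(prompt: str) -> List[str]:
    """Execute each planned tool call and keep a log of actions and results."""
    log = []
    for tool_name, arg in plan_from_text(prompt):
        result = TOOLS[tool_name](arg)  # the "action" step
        log.append(f"{tool_name}({arg}) -> {result}")
    return log


for entry in run_agent("Plan a trip to Chicago"):
    print(entry)
```

Replacing `plan_from_text` with a model call that returns the same list of `(tool, argument)` pairs would leave the execution loop unchanged, which is why the planner and executor are kept separate here.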
Beyond task execution, agentic AI systems also contribute to robotic process automation. An agentic AI model can analyze manufacturing data and automatically execute optimized robotic assembly sequences based on a structured framework such as MAPE-K [121]. Similarly, in network operations [71], an agentic AI can interpret control commands and respond to failures independently by identifying and executing candidate actions in real time [82]. In doing so, these models improve system fault tolerance and resilience [62]. Moreover, in smart environments, agentic AI can drive energy savings by managing the power supply across home appliances [38] through multiple LLM-enhanced AI agents that coordinate with one another.
Agentic AI models can improve themselves. BrainBody-LLM and RASC [75] use self-reinforcing feedback to keep learning and optimizing over time. When past outputs have been wrong, self-reflective models such as SELF-REFINE [50] can improve their behavior by reviewing their own outputs, without relying on outside parties. Agentic AI systems can also resolve ambiguity, act on user preferences [154], and derive next actions by generating signals through self-reflection [144].
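The generate, feedback, refine cycle behind self-reflective models such as SELF-REFINE can be sketched as a small loop. In SELF-REFINE all three roles are played by the same LLM; here `draft`, `critic`, and `reviser` are hypothetical deterministic stand-ins so that the control flow is visible and testable.

```python
from typing import Callable, List, Optional, Tuple


def self_refine(task: str,
                generate: Callable[[str], str],
                feedback: Callable[[str, str], Optional[str]],
                refine: Callable[[str, str, str], str],
                max_iters: int = 3) -> Tuple[str, List[str]]:
    """Generate -> feedback -> refine loop; stops when the critic has no feedback."""
    output = generate(task)
    critiques: List[str] = []
    for _ in range(max_iters):
        critique = feedback(task, output)
        if critique is None:  # critic is satisfied: stop refining
            break
        critiques.append(critique)
        output = refine(task, output, critique)
    return output, critiques


# Hypothetical deterministic stand-ins for the three LLM calls.
def draft(task: str) -> str:
    return "Day 1: museum; Day 2: river cruise; Day 3: architecture tour; Day 4: shopping"


def critic(task: str, output: str) -> Optional[str]:
    days = output.count("Day ")
    return f"Plan has {days} days but the task allows 3." if days > 3 else None


def reviser(task: str, output: str, critique: str) -> str:
    return "; ".join(output.split("; ")[:3])  # keep only the first three days


plan, notes = self_refine("plan a 3-day trip", draft, critic, reviser)
print(plan)   # Day 1: museum; Day 2: river cruise; Day 3: architecture tour
print(notes)  # ['Plan has 4 days but the task allows 3.']
```

The `max_iters` cap is a design choice in this sketch that keeps a critic which is never satisfied from looping forever.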
3.5.2. Text to Text
Text output is human-readable language, often a combination of formatted and structured text, that people read and understand. Recent LLMs can share their knowledge with other agents, forming a multi-agent ecosystem capable of collective learning [143]. LLaMA-3 models [40] can help users retrieve information by parsing their prompts [77], searching for relevant responses, and aggregating information from other sources [52]. LLMs support problem-solving [137] and programming [155] by completing or correcting code and explaining runtime errors [110]. GPT models can also help automate customer interaction workflows [140], logic-based processing in automated support [137], and responses to inquiries [41].
3.5.3. Text to Multimodal Data
Agentic AI systems can produce structured outputs, such as JSON or YAML scripts, after completing complex tasks [105] or orchestration steps [100], or when documenting research [151]. AgentGPT models can also produce multimodal output such as data visualizations, for example projecting future growth on an interactive timeline [138] or charting current and historical data and actions [64]. LLMs can also populate logs and audit trails by processing textual input and producing structured records of activity [40].
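Because structured outputs such as JSON plans are consumed by other components, they are typically validated before execution. The sketch below shows one way to do this with Python's standard `json` module; the step schema (`id`, `action`, `depends_on`) is a hypothetical example, not a format defined by the cited systems.

```python
import json
from typing import Any, Dict, List

# Hypothetical schema for a plan step; real frameworks define their own.
REQUIRED_STEP_KEYS = {"id", "action", "depends_on"}


def parse_plan(raw: str) -> List[Dict[str, Any]]:
    """Parse an agent's JSON plan and check that it is well formed and ordered."""
    try:
        plan = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(plan, list):
        raise ValueError("plan must be a JSON array of steps")
    seen_ids = set()
    for step in plan:
        if not isinstance(step, dict):
            raise ValueError("each step must be a JSON object")
        missing = REQUIRED_STEP_KEYS - set(step)
        if missing:
            raise ValueError(f"step missing keys: {sorted(missing)}")
        # A step may only depend on steps that appear earlier in the plan.
        unknown = set(step["depends_on"]) - seen_ids
        if unknown:
            raise ValueError(f"step {step['id']} depends on unknown steps {sorted(unknown)}")
        seen_ids.add(step["id"])
    return plan


raw_output = (
    '[{"id": 1, "action": "search_flights", "depends_on": []}, '
    '{"id": 2, "action": "book_flight", "depends_on": [1]}]'
)
steps = parse_plan(raw_output)
print([s["action"] for s in steps])  # ['search_flights', 'book_flight']
```

Rejecting a malformed plan with a descriptive `ValueError` gives the agent a concrete message it can feed back to the model for a corrected attempt.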
3.5.4. Audio Files to Text/Audio
Audio input to agentic AI systems can be speech, music, or other signals, which the system processes to understand the user’s goal; this allows more natural, context-rich interaction. Audio output is sound-based content produced as a response for people to listen to. Agentic AI systems use technologies including Automatic Speech Recognition (ASR), short-term and long-term memory, and advanced language models such as GPT-4 or Llama 3.2 to process spoken language and turn it into actionable insights. Audio input can be used in ethical advisory scenarios, where agentic AI generates fairness-aware decisions by parsing spoken queries [101]. Similarly, interactive feedback-generating systems can use audio to parse goals and produce context-aware outputs [60]. Empathetic interviewing is a more advanced use of audio input for multimodal agentic AI: both the spoken input and the response are captured and then delivered as a structured, emotionally intelligent discussion [153]. Audio input thus extends agentic AI to real-time communication with human-like attributes.
3.5.5. Real-Time Data to Actions
Agentic AI can receive continuous real-time data from sensors or cameras, giving it live information. Sensor data can monitor a building’s power grid [73], allowing the agent to analyze how power is being used and instruct the building to save energy [70]. Agentic AI can also suggest optimal traffic routes [103]. ML-based agentic systems can detect issues [131] and optimize operations [123]. During operational tasks, the AI also measures and detects threats, assesses the environment, decides on immediate actions [156], and responds to disruptions [30]. Using a trial-based real-time learning approach, the system recognizes and stabilizes its performance as its strategy evolves over time [36]. Real-time data can also include coordinates, the precise location of a person, car, or agent in a game, allowing the system to respond immediately and act in a timely manner [72,119].
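The sense, decide, act cycle for building energy described above can be illustrated with a simple rule-based loop. This is a hypothetical sketch, not the multi-agent method of [70]: the peak limit, the shedding step, and the 80% restore threshold (which adds hysteresis so the controller does not oscillate) are all assumed values.

```python
from typing import Iterable, List


def control_loop(readings_kw: Iterable[float], peak_limit_kw: float,
                 shed_step_kw: float) -> List[str]:
    """Sense-decide-act loop that sheds flexible load to avoid demand spikes."""
    shed_kw = 0.0
    actions: List[str] = []
    for t, demand_kw in enumerate(readings_kw):
        net_kw = demand_kw - shed_kw  # sense: demand after current shedding
        if net_kw > peak_limit_kw:  # decide: a spike is under way
            shed_kw += shed_step_kw  # act: defer more flexible load
            actions.append(f"t={t}: shedding {shed_kw:.0f} kW")
        elif shed_kw > 0 and net_kw < 0.8 * peak_limit_kw:
            # act: restore deferred load (and comfort) once headroom returns
            shed_kw = max(0.0, shed_kw - shed_step_kw)
            actions.append(f"t={t}: shedding reduced to {shed_kw:.0f} kW")
    return actions


for line in control_loop([80, 95, 110, 120, 90, 60], peak_limit_kw=100, shed_step_kw=15):
    print(line)
```

With the sample readings, shedding rises to 30 kW during the spike at t=2 and t=3 and is released again at t=4 and t=5.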
3.5.6. Datasets to Text
A dataset input is a pre-collected, structured set of data that helps the system analyze, learn, or make decisions. When datasets are used as inputs, agentic AI systems can carry out a whole hierarchy of autonomous and semi-autonomous actions in employee management and coaching, product management, risk identification, and pricing. These datasets can be textual, tabular, or multimodal. Datasets of employee feedback and behavior are fed to employee management [146] and coaching applications [145], which process them with agentic AI, NLP, SML, and deep learning models to create performance assessments and feedback loops. In e-commerce applications, datasets of product metadata and customer interactions are fed to transformer models implemented in PyTorch, an open-source deep learning framework [142], to automatically generate and update product descriptions. Pricing applications typically use Q-learning-based reinforcement learning algorithms [157] that learn from sales datasets and adjust prices to maximize profitability.
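To show how Q-learning can learn a pricing policy, the sketch below trains a tabular agent on a toy market. It is a minimal illustration, not the method of [157]: the two demand regimes and their linear demand curves are hypothetical, revenue stands in for profit, and each episode lasts one step, so the Q-learning target reduces to the immediate reward (there is no bootstrapped next-state term).

```python
import random
from typing import Dict, Tuple

PRICES = [2, 4, 6, 8, 10]
REGIMES = ["weekday", "weekend"]


def demand(regime: str, price: float) -> float:
    """Hypothetical linear demand curves, purely for illustration."""
    if regime == "weekday":
        return max(0.0, 100 - 8 * price)
    return max(0.0, 120 - 6 * price)


def train_pricing_agent(episodes: int = 5000, alpha: float = 0.2,
                        epsilon: float = 0.2, seed: int = 0) -> Dict[Tuple[str, int], float]:
    rng = random.Random(seed)
    q = {(s, p): 0.0 for s in REGIMES for p in PRICES}
    for _ in range(episodes):
        state = rng.choice(REGIMES)
        if rng.random() < epsilon:
            price = rng.choice(PRICES)  # explore
        else:
            price = max(PRICES, key=lambda p: q[(state, p)])  # exploit
        reward = price * demand(state, price)  # revenue as the reward signal
        # One-step episode: the next state is terminal, so the target is the reward.
        q[(state, price)] += alpha * (reward - q[(state, price)])
    return q


def greedy_policy(q: Dict[Tuple[str, int], float]) -> Dict[str, int]:
    return {s: max(PRICES, key=lambda p: q[(s, p)]) for s in REGIMES}


print(greedy_policy(train_pricing_agent()))  # {'weekday': 6, 'weekend': 10}
```

Under these demand curves the revenue-maximizing prices are 6 on weekdays (revenue 312) and 10 on weekends (revenue 600), which the learned greedy policy recovers.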
3.5.7. Datasets to Actions
In applications that require action or intelligence, such as safety or efficiency management through the automation of utilities, robotics, autonomous vehicles, or processes and systems (such as supply chains), agentic AI systems use action-based datasets, which record previously executed actions, so that they can learn to recognize anomalies [117], mitigate threats, or identify the best actions to optimize processes [63,158]. Feedback is then generated autonomously in real time, based on how the system adapted [66]. Given such datasets, an agentic AI system can autonomously participate in trading and investment [80] and adapt to tasks [152].
3.5.8. Dataset to Dataset
Datasets are also used in predictive applications in areas such as health, policing, and financial services, where agentic AI applies structured datasets to predict where threats or issues might arise, using XAI [127], ML algorithms [67], and LLMs [35] for forecasting, risk assessment, or strategy formulation. Agentic AI also supports software development by producing the training and testing datasets for ML models from a given raw dataset. This shows that agentic AI can work with streaming data and provide both static and dynamic datasets.
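Producing training and testing datasets from a raw dataset, as described above, usually comes down to a reproducible random split. The sketch below is a generic illustration using only the standard library; the 80/20 ratio and the seed value are assumed defaults, not parameters from the cited works.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(rows: Sequence[T], test_ratio: float = 0.2,
                     seed: int = 42) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the rows reproducibly and cut it into train and test parts."""
    if not 0.0 < test_ratio < 1.0:
        raise ValueError("test_ratio must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    n_test = max(1, round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


train, test = train_test_split(list(range(100)), test_ratio=0.2)
print(len(train), len(test))  # 80 20
```

Seeding a private `random.Random` instance, rather than the global generator, keeps the split reproducible without affecting other code that uses `random`.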
3.5.9. Dataset to Alerts
Alert outputs are notifications or warnings that inform users about specific events; in agentic AI systems, these events are most often detected risks [102] or fraud [129]. Here, the dataset acts as the input knowledge source, which the system analyzes to detect risks or fraud.
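A simple way to turn a dataset into alerts is to compare incoming records against statistics of historical data. The sketch below flags transactions whose z-score exceeds a threshold. The threshold of 3 and the sample amounts are hypothetical, and the cited systems [102,129] may use quite different detectors.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def zscore_alerts(history: Sequence[float], incoming: Sequence[float],
                  threshold: float = 3.0) -> List[str]:
    """Flag incoming values whose z-score against the historical data exceeds the threshold."""
    mu, sigma = mean(history), pstdev(history)
    if sigma == 0:
        raise ValueError("historical data has no variance")
    alerts = []
    for i, amount in enumerate(incoming):
        z = (amount - mu) / sigma
        if abs(z) > threshold:
            alerts.append(f"ALERT: transaction {i} (amount={amount}, z={z:.1f})")
    return alerts


past_amounts = [20, 25, 22, 30, 27, 24, 26, 23]
print(zscore_alerts(past_amounts, [25, 400, 21]))
```

Computing the mean and standard deviation from historical data only, rather than from the incoming batch, prevents a single extreme value from inflating the spread and masking itself.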
3.5.10. Multimodal Data to Multimodal Data
Agentic AI can use multimodal data, including text, video, images, audio, graphs, and datasets, to generate personalized outputs and act more intelligently, flexibly, and responsively. Agentic AI uses several sensory channels to perceive complex environments, align its behavior with human intentions, and make decisions grounded in real-world context. Work using LSTM-based deep learning and reinforcement learning effectively aligned robot behavior with human goals using simultaneous visual and textual trainable inputs [124]. Additionally, multimodal pipelines based on audio and video can provide spatial–temporal reasoning, enabling perception and context-aware decision-making in complex situations [95]. LLMs and VLMs enable adaptive decision-making in agentic systems whose processes combine text, images, and audio, allowing the system to demonstrate embodied reasoning and task execution toward articulated goals [33,135].
Structured representations of user goals and multimodal inputs such as speech signals, visual cues, and brief actions are the main ways of prompting agents to behave in particular ways [132]. Agents act on abstract, high-level user goals instead of explicit commands, which lets the agent work out how and when to respond. Many of the reviewed processes used intermediate state representations and often included feedback loops for decision-making in dynamic environments, as noted in our annotations [71,105].
3.5.11. Beyond Conventional Input–Output Methods
In addition to standard input–output models, which usually take structured text inputs or standard datasets and files, agentic AI systems increasingly use other kinds of input data, such as spatial, document-based, or alert-based inputs. These input media give AI agents a high degree of freedom, control, and adaptability in the actions they perform in domain-oriented tasks. The exact coordinates of agents allow path planning that is fully collision-free with respect to all other agents [69]. This enables robotics and multi-agent systems to function without a centralized controller. Numeric data from SCADA connections allow AI agents [43] to optimize key process parameters, such as emission output, typically in chemical plants. Alert-based inputs act as real-time triggers to which agentic AI can respond autonomously, reducing incident risk and carrying out Tier 1 and Tier 2 incident response actions [161].
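Collision-free planning from exact coordinates depends on detecting conflicts between agents' planned paths. The sketch below checks for two standard conflict types from multi-agent path finding: vertex conflicts (two agents in the same cell at the same time step) and swap conflicts (two agents exchanging cells in one step). Grid cells, unit time steps, and agents waiting at their goal after arriving are simplifying assumptions, not details taken from [69].

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Conflict = Tuple[str, str, str, int]  # (kind, agent_a, agent_b, time step)


def find_conflicts(paths: Dict[str, List[Cell]]) -> List[Conflict]:
    """Return vertex and swap conflicts between every pair of planned paths."""
    agents = sorted(paths)
    horizon = max(len(p) for p in paths.values())

    def pos(agent: str, t: int) -> Cell:
        path = paths[agent]
        return path[min(t, len(path) - 1)]  # agents wait at their goal after arriving

    conflicts: List[Conflict] = []
    for t in range(horizon):
        for i, a in enumerate(agents):
            for b in agents[i + 1:]:
                if pos(a, t) == pos(b, t):
                    conflicts.append(("vertex", a, b, t))
                elif t > 0 and pos(a, t) == pos(b, t - 1) and pos(b, t) == pos(a, t - 1):
                    conflicts.append(("swap", a, b, t))
    return conflicts


print(find_conflicts({"A": [(0, 0), (0, 1), (0, 2)], "B": [(0, 2), (0, 1), (0, 0)]}))
```

A decentralized planner would call a check like this after each agent publishes its path and replan only the agents involved in a reported conflict.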
Code files, including .cs files, serve as inputs to AI agents such as Codex, which modify codebases and create Git diffs and logs, enabling autonomous software maintenance and version tracking [51]. CSV files can provide inputs that, combined with pretrained ML models, prompt AI to produce and debug energy modeling code with tools such as LangGraph with ReAct and PythonREPL [58]. PDFs are generally treated as unstructured inputs. LLM APIs such as Gemini 1.5 [99] can parse, extract, and structure clinical datasets from them, showing that AI can now handle semi-structured documents. These examples show agentic systems moving beyond the inputs of traditional models toward AI systems that accept varied data formats in order to operate in specialized, high-stakes contexts.
3.5.12. Core Input–Output Mechanisms in Agentic vs. Traditional Systems
Traditional AI systems generally work on structured, static inputs and produce fixed [71] or predefined outputs [135]. These systems evaluate input against rule-based logic or supervised learning models trained on labeled data, within an institutional context that requires human intervention to make decisions and take action [47]. Inputs typically consist of historical data [78] or clearly defined user commands or queries, and are mapped to single outputs such as classifications or recommendations [132]. Because they rely on narrow sets of inputs and outputs, these systems work well in constrained environments with predictable data structures but are less successful with complex, evolving problems [95,106]. Research papers [57,80] emphasize that traditional AI with structured inputs produces shallow outputs and becomes burdensome to use because it requires frequent human intervention.
In comparison, agentic AI systems operate on multimodal [33], dynamic [106], and contextual inputs, such as real-time sensor data [116], natural language [126], intent-based actions [148], and feedback, and they produce autonomous outputs and actions: goal-directed multi-step plans, executable actions, and adaptive decisions [157]. Agentic AI reasoning is temporally continuous and socially collaborative, and it learns from its environment [38,136]. Paper [77] supports the argument that agentic AI outputs carry rich, contextual information and proactive responses, while [143] describes how agentic AI systems adapt to user-defined workflows through contextually adapted inputs and prediction-oriented outputs. Unlike non-agentic systems, agentic AI operates continuously and proactively and does not depend on an input to act. Rather than making a prediction from a data input and taking a simple action, it anticipates requirements, operates autonomously, provides strategic options, remembers and learns, and frequently collaborates with other agents to produce an output [55,161].
The authors of [37,132] discuss the real-time adaptability of agentic AI and long-term preference modeling, while [31,125] begin to explore situational awareness and internal goals in greater depth. Papers [38,71] make explicit comparisons of the ordering of decision-making chains and of the points at which systems are orchestrated. Among the works offering applied system analyses or definitive architectural comparisons, [31,41,95] align most closely with research on modeling agentic input–output pathways. Understanding these input–output functions is important for designing future AI systems that are intelligent, self-directed, and situationally aware, and that can operate unattended.
3.6. What Evaluation Methods and Metrics Are Used to Assess the Performance of Agentic AI Systems?
Evaluation metrics for agentic AI systems are becoming increasingly critical as these systems expand across domains. With agentic AI now applied in areas ranging from autonomous robotics to intelligent decision-making, it is important to validate and test system performance to ensure reliability, efficiency, and safety. Evaluation helps identify a system’s strengths and limitations, which supports its improvement and responsible use. This section discusses the main evaluation metrics and testing methods.
Figure 11 shows our categorization of agentic AI performance measures into two categories: testing methods and evaluation metrics. In practice, testing and evaluation metrics are closely related but distinct. Testing subjects the system to experiments or scenarios that exercise its performance. Evaluation metrics are the specific measures used to quantify how well the agentic AI system performs, and they are used to interpret the results of testing.
Figure 11.
Classification diagram for testing methods and evaluation metrics that are used to assess the performance of agentic AI systems.
The rationale for this classification is to present the testing methods and evaluation metrics for agentic AI systems in one place. It organizes methods and metrics according to the aspect of the system being tested. The results of the testing methods are interpreted through evaluation metrics, which are then used to assess the system’s performance. Figure 11 lists methods and metrics separately because many testing methods in agentic AI share overlapping metrics, such as accuracy. Each metric is further labeled as qualitative or quantitative to make clear what is being measured and how.
3.6.1. Testing Methods
Automated Test Generation [110] is a method in which tools such as CodeT, Reflexion, and ClarifyGPT are used to refine code and detect ambiguous requirements. Formal Verification [95] checks that the system behaves exactly as specified, without errors or unexpected behavior.
Runtime monitoring [51] is a live, ongoing check of how well the system is running in real time. It identifies errors and abnormal behavior so that agents can respond quickly, for example by rolling back or remediating before the system crashes. Heartbeat monitoring [123] checks whether the system is alive by sending periodic requests; it detects failures and alerts IT staff without interrupting users.
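Heartbeat monitoring reduces to recording when each component was last heard from and reporting components that stay silent past a timeout. The sketch below is a push-style variant in which agents report heartbeats themselves; the timeout value and agent names are hypothetical, and timestamps are passed in explicitly so the behavior is deterministic (a deployment would use something like `time.monotonic()`).

```python
from typing import Dict, List


class HeartbeatMonitor:
    """Track the last heartbeat per agent and report agents that have gone silent."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self.last_seen: Dict[str, float] = {}

    def beat(self, agent_id: str, now: float) -> None:
        self.last_seen[agent_id] = now  # record the latest heartbeat time

    def failed_agents(self, now: float) -> List[str]:
        """Agents whose last heartbeat is older than the timeout."""
        return sorted(agent for agent, seen in self.last_seen.items()
                      if now - seen > self.timeout_s)


monitor = HeartbeatMonitor(timeout_s=5.0)
monitor.beat("planner", now=0.0)
monitor.beat("executor", now=0.0)
monitor.beat("planner", now=4.0)
print(monitor.failed_agents(now=8.0))  # ['executor']
```

Because the check only reads recorded timestamps, it can run on a schedule without interrupting the agents or their users.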
Fault injection [123] deliberately introduces errors or faults into a model in order to observe its behavior and confirm that it can still perform its job correctly in a faulty state. Automated recovery [150] tests the system’s ability to reflect on and recover from failures and return to normal operation without human intervention.
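Fault injection and automated recovery are often tested together: faults are injected into a component, and the test checks that the recovery logic still delivers a correct result. In the sketch below, `inject_faults` and `run_with_recovery` are hypothetical helpers; the injector fails the first `fail_first` calls, and recovery is a simple bounded retry, a stand-in for richer strategies such as rollback or replanning.

```python
from typing import Any, Callable, List, Tuple


class InjectedFault(Exception):
    """Raised by the fault injector to simulate a tool failure."""


def inject_faults(func: Callable[..., Any], fail_first: int) -> Callable[..., Any]:
    """Wrap func so that its first `fail_first` calls raise InjectedFault."""
    calls = {"n": 0}

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        calls["n"] += 1
        if calls["n"] <= fail_first:
            raise InjectedFault(f"injected failure #{calls['n']}")
        return func(*args, **kwargs)

    return wrapped


def run_with_recovery(func: Callable[..., Any], *args: Any,
                      max_attempts: int = 3) -> Tuple[Any, int, List[str]]:
    """Retry after injected faults; return (result, attempts used, errors seen)."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args), attempt, errors
        except InjectedFault as exc:
            errors.append(str(exc))
    raise RuntimeError(f"unrecovered after {max_attempts} attempts: {errors}")


tool = inject_faults(lambda x: x * 2, fail_first=2)
print(run_with_recovery(tool, 21))  # (42, 3, ['injected failure #1', 'injected failure #2'])
```

A test suite would vary `fail_first` to confirm both that the system recovers within its retry budget and that it fails loudly, rather than silently, once the budget is exceeded.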
Benchmark testing [61,113] evaluates an agentic AI system’s performance on standardized tasks and metrics to validate its operational reliability in real-world applications. Stress testing [39] exposes the model to unusual or extreme inputs to check that it remains reliable and adapts to change. A/B testing [132] runs live experiments with users to determine which version works better in practice.
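An A/B test usually ends with a significance test on a success metric. The sketch below applies a standard two-sided two-proportion z-test to task-success counts from two agent versions. The counts in the example are hypothetical, and the normal approximation assumes reasonably large samples.

```python
import math
from typing import Tuple


def two_proportion_z_test(success_a: int, n_a: int,
                          success_b: int, n_b: int) -> Tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both versions have the same success rate."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)  # success rate under H0
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("success rate is 0 or 1 in both groups; test is undefined")
    z = (p_b - p_a) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


z, p = two_proportion_z_test(100, 1000, 150, 1000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A small p-value indicates that the difference in success rates between the two versions is unlikely to be due to chance alone; it does not by itself measure how large or practically important the difference is.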
Cross-domain tests [76] evaluate whether an agentic AI performs well on environments or tasks outside its original training, using domains such as MineDojo, ALFRED, and ScienceWorld. The Cross-platform Agent Benchmark (CRAB) [75] evaluates multimodal embodied language model agents with graph-based tests and a unified Python interface.
Simulations [162] are used to model ethical dilemmas, especially in areas such as autonomous vehicles and healthcare, allowing people to evaluate AI decisions in realistic, interactive scenarios. Digital twins [163] are real-time virtual models that, combined with LLMs, support fault prediction and performance optimization, enabling smart, safe, and automated system management.
Human-in-the-loop verification [45] involves humans in the process to improve the reliability, interpretability, and trustworthiness of AI-driven scientific discoveries, highlighting the need for human oversight to address AI limitations and support validation and accountability. User Acceptance Testing (UAT) [104] tests the system with real users to confirm that it meets their needs and expectations, ensuring user satisfaction and accountability.
Testing for Harmful Capabilities [46] checks whether agents generate harmful outputs or enable malicious actions, so that these can be mitigated to ensure safety. Heuristic AGI testing [134] is a simple way to evaluate models against specific milestones, such as passing a Turing test or solving important problems; it is quick because it does not attempt to measure their full abilities.
Workflow-centric evaluation [50] assesses the full decision-making process, from perception to reasoning to action, rather than only the end product. The agentic AI’s behavior is validated and verified across workflows using AI-driven testing workflows, advanced monitoring, and metrics. Independent Testing of Subtasks [46] isolates each subtask of a complex task and tests it on its own for reliability, with high-risk actions always taking precedence.
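Independent testing of subtasks, with high-risk actions first, can be organized as a small harness. The sketch below is hypothetical: the `Subtask` record, the 1 to 3 risk scale, and the example checks are illustrative assumptions, and each check is meant to exercise a single subtask in isolation.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Subtask:
    name: str
    risk: int                  # hypothetical scale: higher means riskier
    check: Callable[[], bool]  # isolated test for this subtask only


def run_subtask_tests(subtasks: List[Subtask]) -> List[Tuple[str, bool]]:
    """Run each subtask's test in isolation, highest-risk subtasks first."""
    results = []
    for task in sorted(subtasks, key=lambda s: s.risk, reverse=True):
        try:
            passed = bool(task.check())
        except Exception:  # a crashing check counts as a failure
            passed = False
        results.append((task.name, passed))
    return results


def fetch_quotes() -> bool:
    raise TimeoutError("quote service unavailable")  # simulated failing subtask


checks = [
    Subtask("parse_request", 1, lambda: True),
    Subtask("execute_payment", 3, lambda: False),
    Subtask("fetch_quotes", 2, fetch_quotes),
]
print(run_subtask_tests(checks))
# [('execute_payment', False), ('fetch_quotes', False), ('parse_request', True)]
```

Ordering by risk means that the most consequential failures, such as a payment step, are reported first even when the full test run is interrupted.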
3.6.2. Evaluation Metrics
Evaluation metrics are divided into qualitative and quantitative metrics for assessing the system’s overall performance, behavior, and human–machine interaction. Quantitative metrics measure numerical performance, while qualitative metrics capture the quality and adaptability of the system and the user’s experience of interacting with it.
Explainability: The degree to which the system provides clear, understandable reasons or explanations for its decisions or outputs. Explainability is tested through self-reflection and cross-agent reflection with reasoning traceability for foundation models such as GPT-4 [133,150], LLaMA [133], and the Compliance Agentic Model (CAM) [44], and in Artificial Cognitive Agents [122] by evaluating how well they mimic human reasoning and explain their decisions clearly.
Transparency: A measure of how openly the AI system’s inner workings and decision processes are made visible or available for inspection. Transparency is tested through user-facing clarity for models such as AMBY and Drawcto [135], through predefined ethical principles for Constitutional AI models [44], and through open-source governance components and external audits for Hybrid Transparency Models [37].
User Satisfaction: A measure of how well the agentic AI system meets the needs, expectations, and preferences of users [163]. For models such as the Bi-level Learnable LLM Planner (BiLLP) [132], user satisfaction is evaluated by measuring how well the agent balances short-term and long-term user preferences in recommendation tasks. In agentic customer service chatbots [47], satisfaction can be measured by how well the system accommodates user mood and personalizes its decision-making responses.
Fairness: This metric identifies biases in agentic AI systems, for example, responding differently or with differing response quality across demographic groups [56]. Fairness and bias can also be measured with methods such as SHAP [101] or counterfactual analysis [101].
Bias Mitigation: This covers reducing or eliminating bias in agentic AI systems. Agentic AI systems that amplify societal biases can lead to unfair outcomes [49], particularly by compounding aggregated biases against vulnerable groups. LLMs such as GPT-4 [56] therefore need bias mitigation in their deployment contexts to ensure equitable and ethical use.
Co-operative Behavior: A measure of how well AI agents collaborate and coordinate with one another to achieve mutual goals. Agentic AI systems can collaborate through social contract rules, for example, shared rules [157], or learn cooperative behavior through reinforcement learning. Cooperative behavior is actively promoted in the MAAC model [70] to improve resource management across multiple agents.
Adaptability: This encompasses ongoing interpersonal learning, agile feedback, and goal tuning, and it is measured through conceptual multi-agent models, formal policy operationalizations such as Petri nets [154], and applied benchmarks that simulate unpredictable, complex, and dynamic environments [75]. Adaptability is also tested by how well AI agents and humanoid robots adjust tasks and processes dynamically in real time [68].
Robustness: A measure of an agentic AI system’s ability to maintain goal-directed performance [48] despite internal failures and external adversity. For agentic coding systems such as Codex (OpenAI) [51] and Google Jules [51], robustness is tested through sandboxed execution, traceability, and rollback. The AgentHarm benchmark [136] is used to test robustness against adversarial LLM agent behavior.
Accuracy: It measures how correct and precise an agentic AI system’s outputs are compared with the expected or true results. Agentic frameworks such as LangGraph [53] and CrewAI [53] show around 94% accuracy in complex task execution. Models such as Claude 3.5 [138] and LLaMA-3.2 [138] can reach 96% accuracy with efficient plan caching and keyword-based retrieval. Platforms such as IBM Watson Health [66] and scoring tools such as E-rater [57] and IntelliMetric [57] maintain high accuracy in prediction and evaluation.
Precision: It is the fraction of a model’s positive predictions that are actually correct, TP / (TP + FP), where TP and FP are the numbers of true and false positives; it therefore measures the accuracy of positive predictions. Models such as YOLO [109] and Mask R-CNN [109] achieved near-perfect precision (close to 1) in automated injury detection tasks. GPT-3 [65] and LLaMA-2 [102] achieved higher precision than BERT [81], and the Swarm Agent [119] framework achieved higher precision than a Decision Tree [116] in classification.
Recall: It is the fraction of actual positive instances that the model correctly identifies, TP / (TP + FN), where FN is the number of false negatives; it measures how completely the model finds all positive instances. Agentic AI models have maintained high recall across different applications, such as EcoptiAI [142] in e-commerce retrieval, LLaMA 3.1 [102] in clinical text extraction, InteRecAgent (potentially using RecLLaMA) [79] in recommendation systems, and GPT-4o [126] for fault detection in digital twins.
F1 score: It combines precision and recall into a single number by taking their harmonic mean: F1 = 2 × precision × recall / (precision + recall) [126,142]. Because the harmonic mean is pulled toward the smaller of the two values, the F1 score is high only when both precision and recall are high.
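The definitions of accuracy, precision, recall, and F1 above reduce to simple arithmetic on confusion-matrix counts. The sketch below uses made-up counts purely for illustration; it does not reproduce any figure reported in the cited studies.

```python
def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct among predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # found among actual positives
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical counts: 80 true positives, 20 false positives,
# 10 false negatives, 90 true negatives.
metrics = classification_metrics(tp=80, fp=20, fn=10, tn=90)
print({name: round(value, 3) for name, value in metrics.items()})
```

With these counts, precision is 0.80 and recall is about 0.89, so F1 (about 0.84) lies between them but closer to the smaller value.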
Graph Edit Distance (GED): An important structural metric in agentic AI, it measures the similarity between an AI-generated task graph and a ground-truth graph. In GPT-4-based frameworks [52,105] (gpt-4-0613), GED was computed as the minimum number of node and edge edits needed to transform one graph into the other. It thus quantifies the accuracy of multi-hop reasoning and tool-use planning: the lower the score, the closer the agent’s plan is to expert behavior.
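A minimal GED sketch for task graphs, under a simplifying assumption that is not taken from [52,105]: every node is a uniquely named tool call, inserting or deleting a node or edge costs 1, and relabeling is not allowed. Under those assumptions, the minimum edit count is the size of the symmetric difference of the node sets plus that of the edge sets. The function name and the travel-planning graphs are hypothetical.

```python
def plan_graph_edit_distance(ref_nodes, ref_edges, pred_nodes, pred_edges):
    """Edit distance between two task graphs whose nodes carry unique labels.

    Assumes unit cost for inserting or deleting a node or an edge and no
    relabeling. Under those assumptions the minimum number of edits equals
    the size of the symmetric difference of the node sets plus that of the
    edge sets (edges are directed (source, target) pairs).
    """
    node_edits = len(set(ref_nodes) ^ set(pred_nodes))
    edge_edits = len(set(ref_edges) ^ set(pred_edges))
    return node_edits + edge_edits


# Hypothetical ground-truth plan for the travel example
ref_nodes = {"search_flights", "search_hotels", "compare_options", "book"}
ref_edges = {("search_flights", "compare_options"),
             ("search_hotels", "compare_options"),
             ("compare_options", "book")}

# Hypothetical agent plan that skips the comparison step
pred_nodes = {"search_flights", "search_hotels", "book"}
pred_edges = {("search_flights", "book"), ("search_hotels", "book")}

distance = plan_graph_edit_distance(ref_nodes, ref_edges, pred_nodes, pred_edges)
print(distance)
```

Here the agent's plan is 6 edits away from the reference: one missing node, three missing edges, and two extra edges. For general graphs without unique labels, NetworkX provides `graph_edit_distance`, which computes exact GED but can take exponential time.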
Rule Fidelity: It measures how accurately the symbolic, human-readable rules generated by the system reflect its actual decision-making process. Frameworks such as the Neuro-Symbolic Agent [161], AutoML Agent [54], and the Agentic Planner [54] achieve roughly 94% rule fidelity, reflecting their ability to concisely convert a learned model’s decisions into human-readable rules.
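Rule fidelity is commonly computed as the fraction of inputs on which the extracted rules make the same decision as the underlying model. The following is a minimal sketch under that assumption. The model, the rule, and the inputs are all hypothetical, and this agreement-based definition may differ from the one used in [161] and [54].

```python
def rule_fidelity(model_predict, rule_predict, inputs):
    """Fraction of inputs on which the extracted rules agree with the model."""
    inputs = list(inputs)
    if not inputs:
        return 0.0
    agree = sum(model_predict(x) == rule_predict(x) for x in inputs)
    return agree / len(inputs)


def model_predict(x):
    # Hypothetical stand-in for a learned model's decision boundary.
    return x["risk"] > 0.62 and x["amount"] > 900


def rule_predict(x):
    # Hypothetical extracted rule: "flag if risk > 0.6 and amount > 1000".
    return x["risk"] > 0.6 and x["amount"] > 1000


inputs = [
    {"risk": 0.9, "amount": 2000},   # both flag
    {"risk": 0.1, "amount": 2000},   # neither flags
    {"risk": 0.61, "amount": 1500},  # only the rule flags
    {"risk": 0.8, "amount": 950},    # only the model flags
    {"risk": 0.7, "amount": 100},    # neither flags
]
fidelity = rule_fidelity(model_predict, rule_predict, inputs)
print(f"rule fidelity = {fidelity:.0%}")
```

Fidelity measures agreement with the model, not correctness: a rule can be perfectly faithful to a model that is itself wrong.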
Task Completion Time (TCT): It measures the time an agent takes to plan, execute, and complete a particular task. It reflects the system’s operational performance and how quickly the AI completes complex workflows without human involvement. Implementations based on LangGraph [58], AutoGen [151], and GoEX [53] averaged a TCT of 12 ± 2 min, a 34.2% improvement over traditional AI systems.
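A simple way to report TCT as mean ± standard deviation is to time repeated end-to-end runs. The sketch below assumes a hypothetical `toy_workflow` whose sleep stands in for real planning and tool latency. It also shows how a percentage improvement is derived; the 34.2% figure above depends on a baseline that is not specified here, so the baseline values in the example are made up.

```python
import time
from statistics import mean, stdev


def measure_tct(run_task, runs=5):
    """Run a task end to end several times; return (mean, stdev) in seconds."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        run_task()  # plan, execute, and verify, with no human in the loop
        durations.append(time.perf_counter() - start)
    spread = stdev(durations) if runs > 1 else 0.0
    return mean(durations), spread


def relative_improvement(baseline, candidate):
    """Percentage reduction in completion time relative to a baseline."""
    return (baseline - candidate) / baseline * 100


def toy_workflow():
    # Hypothetical stand-in for an agent workflow; the sleep mimics tool latency.
    time.sleep(0.01)


avg, spread = measure_tct(toy_workflow, runs=3)
print(f"TCT = {avg:.3f} ± {spread:.3f} s")
print(f"{relative_improvement(20.0, 15.0):.1f}% faster than a 20 s baseline")
```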
Click-Through Rate (CTR): It measures how effectively an agentic AI system gets users to click on the recommendations or content it presents (clicks divided by impressions), and serves as a proxy for the system’s relevance and impact. Simulators such as GPT-4 (ColdLLM)6, CG4CTR6, and LLM-InS6 were used to test whether different CTR scenarios could be linked to the recommendation list by simulating common-interest content personalization, visual design impact, inter-environmental dialog personality, or deferred dialog.
Gross Merchandise Value (GMV): It measures the total monetary value of items sold, that is, of completed transactions, resulting from the AI’s actions or recommendations. ColdLLM [132] used simulated user interactions to improve recommendations for new items, which increased GMV in real-world A/B testing.
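Both metrics are straightforward to compute per arm of an A/B test. In the minimal sketch below, CTR is clicks divided by impressions and GMV is the sum of completed-order values attributed to the recommendations. `ArmStats`, `lift`, and all numbers are hypothetical. A real A/B analysis would also test whether the difference is statistically significant, which is not shown.

```python
from dataclasses import dataclass, field


@dataclass
class ArmStats:
    """Hypothetical per-arm counters for a recommendation A/B test."""
    impressions: int                 # recommendations shown
    clicks: int                      # recommendations clicked
    order_values: list = field(default_factory=list)  # value of each completed order

    @property
    def ctr(self):
        return self.clicks / self.impressions if self.impressions else 0.0

    @property
    def gmv(self):
        return sum(self.order_values)


def lift(control, treatment):
    """Relative change of the treatment arm over control (0.25 means +25%)."""
    return (treatment - control) / control if control else float("nan")


control = ArmStats(impressions=1000, clicks=50, order_values=[40.0, 60.0])
treatment = ArmStats(impressions=1000, clicks=65, order_values=[40.0, 60.0, 25.0])

print(f"CTR {control.ctr:.3f} -> {treatment.ctr:.3f} (lift {lift(control.ctr, treatment.ctr):+.0%})")
print(f"GMV {control.gmv:.2f} -> {treatment.gmv:.2f} (lift {lift(control.gmv, treatment.gmv):+.0%})")
```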
Agentic AI accuracy is context-dependent and is evaluated in different ways. Many systems report numerical accuracy, such as an ECG AI system with 96–97% accuracy [164] and Organa, which reports >90% accuracy [150]. Other studies emphasize that such scores do not imply that the AI works on a fixed set of tasks or goals against which it can be evaluated; the systems they discuss prioritize operational relevance over static scores [35,134]. Accuracy in agentic AI has been improved by structuring problems across modular, collaborating agent teams; by simulations such as Agent4Rec, which model and replay user activity [132]; and by self-improvement through feedback-loop reasoning and knowledge graphs [37,134]. In domains such as healthcare and fraud detection, some papers judge system success by accuracy alone [41,47], while others argue that success also depends on transparency, fairness, and misinformation suppression [156,161]. In our view, accuracy in agentic AI is best understood not as a single metric but as a multi-faceted outcome shaped by goal alignment, domain context, and the AI’s ability to plan, adapt, and reason.
Self-healing architectures can identify failures and correct problems on their own [71,121,123], while retrieval-augmented generation, tree-of-thought backtracking, and iterative debugging can help correct hallucinations and task-related mistakes [58,75,82,132]. Multi-agent cross-verification, confidence-based reruns, and actor–critic feedback loops improve reliability in dynamic scenarios [65,79,140]. Together, these are the main error-recovery mechanisms of agentic AI systems. However, metrics for error recovery remain underdeveloped, and systems often struggle with conceptual mistakes, cascading faults, and complex edge cases [105,136,165]. Human checks, backup systems, clear logs, and interpersonal support make these systems more robust [39,95,135]. Future agentic AI systems will aim to identify and recover from errors automatically through reinforcement learning, dynamic replanning, and fault-tolerant architectures [33,49,66,166].
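Confidence-based reruns with a human fallback can be sketched as a small retry loop. This is a hypothetical illustration, not a mechanism taken from [65,79,140]. It assumes each agent step reports its own confidence score, treats exceptions as failed attempts, and escalates the last result to a reviewer when no attempt clears the threshold.

```python
def run_with_recovery(step, max_attempts=3, threshold=0.8, escalate=None):
    """Rerun `step` until its self-reported confidence reaches `threshold`.

    step(attempt) must return (result, confidence). An exception counts as a
    failed attempt. If no attempt is confident enough, the last result is
    passed to `escalate` (for example, a human reviewer) when one is given.
    """
    last_result, errors = None, []
    for attempt in range(1, max_attempts + 1):
        try:
            result, confidence = step(attempt)
        except Exception as exc:  # log the fault and retry
            errors.append(f"attempt {attempt}: {exc}")
            continue
        last_result = result
        if confidence >= threshold:
            return {"status": "ok", "result": result, "attempts": attempt, "errors": errors}
    if escalate is not None:
        return {"status": "escalated", "result": escalate(last_result),
                "attempts": max_attempts, "errors": errors}
    return {"status": "failed", "result": last_result,
            "attempts": max_attempts, "errors": errors}


def flaky_step(attempt):
    # Hypothetical agent step: crashes once, then improves its answer.
    if attempt == 1:
        raise TimeoutError("tool timed out")
    return ("draft plan", 0.5) if attempt == 2 else ("final plan", 0.9)


outcome = run_with_recovery(flaky_step)
print(outcome["status"], outcome["result"], outcome["attempts"])

review = run_with_recovery(lambda attempt: ("uncertain plan", 0.4),
                           escalate=lambda r: f"human-reviewed {r}")
print(review["status"], review["result"])
```

Self-reported confidence can itself be miscalibrated, which is one reason the text pairs such reruns with human checks and logs.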
Parallel processing, semantic filtering, and real-time planning reduce duplicated work and response delays, improving both the speed and scalability of the system [36,81,105]. Task efficiency improves through modular multi-agent systems, lightweight models, and adaptive architectures [31,111,132]. Efficiency here means fast responses and low energy use, and most studies agree that agentic AI outperforms traditional systems through goal-oriented, optimized task execution.
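The latency benefit of parallel processing and duplicate filtering can be illustrated with `asyncio`. This sketch makes two simplifying assumptions. First, the tool calls are hypothetical, and `asyncio.sleep` stands in for network latency. Second, duplicates are removed by exact string match, which is only a stand-in for semantic filtering; that would compare requests by meaning, for example with embeddings.

```python
import asyncio
import time


async def call_tool(name, delay=0.05):
    """Hypothetical tool call; asyncio.sleep stands in for network latency."""
    await asyncio.sleep(delay)
    return f"{name}: done"


async def run_sequential(names):
    # Each call waits for the previous one to finish.
    return [await call_tool(name) for name in names]


async def run_parallel(names):
    unique = list(dict.fromkeys(names))  # drop duplicate requests, keep order
    results = await asyncio.gather(*(call_tool(name) for name in unique))
    return dict(zip(unique, results))


async def main():
    requests = ["flights", "hotels", "weather", "flights"]  # one duplicate
    t0 = time.perf_counter()
    sequential = await run_sequential(requests)
    t1 = time.perf_counter()
    parallel = await run_parallel(requests)
    t2 = time.perf_counter()
    return sequential, parallel, t1 - t0, t2 - t1


sequential, parallel, t_seq, t_par = asyncio.run(main())
print(f"sequential: {len(sequential)} calls in {t_seq:.2f} s")
print(f"parallel:   {len(parallel)} calls in {t_par:.2f} s")
```

The sequential version pays for every call, including the duplicate, one after another. The parallel version issues three unique calls concurrently, so its wall time is roughly that of the slowest single call.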
Transparency, explainability, and ethical design are further key components of trust in agentic AI systems. Studies [37,132,135] argue that trust can be built through reasonable explanations, fairness, and accountability. Others [65,71] highlight that trust can break down because large models remain prone to errors, attacks, and hallucinations. Some researchers suggest dynamic approaches such as system refresh [120] or incremental querying and reflection [133] to manage these risks.
Some papers propose conceptual models of trust, for example, cognitive, affective, and social trust [128], while others propose metrics, for example, Explainability Scores [105] and reliability [99]. This variety shows that research on trust in agentic AI is still emerging. Some studies report that user satisfaction depends on how easy to use and trustworthy an agentic AI system is [130,141], while others report that it depends more on how well the AI adapts to and matches the user’s values [31,37]. In practice, user satisfaction depends on many factors, including meeting users’ expectations, usefulness, ease of use, speed, task completion, and accuracy.
3.7. What Are the Key Challenges and Limitations in Designing and Deploying Agentic AI Systems?
Agentic AI systems bring a number of technical and social challenges. Because these systems must constantly sense, plan, and act in dynamic environments, they require well-designed, layered architectures. At the same time, deploying agentic AI systems raises challenges of trust, safety, ethics, and governance that require oversight and justifiable design frameworks.
Architectural/Technical Challenges: Architectural design is a key consideration for agentic AI systems. Current systems are limited in dynamic task decomposition, hierarchical planning, and adaptive reasoning in changing environments [105]. Recent frameworks that aim to generalize across diverse tasks have limitations in symbolic reasoning and suffer from structural rigidity [38]. Classical agent models do not support long-term strategic planning, which makes them brittle in dynamic settings [106]. Additionally, handling large and variable data, with its challenges of volume, velocity, variety, and veracity, places excessive strain on current system architectures [54]. Opaque architectures limit interpretability and make integration with real-world deployments difficult [157]. Overall, the goal is to develop modular, transparent, and scalable agent architectures that can operate under uncertainty [167].
Ethical/Societal/Governance Limitations: Agentic AI raises many ethical, societal, and governance problems that need attention. First, there are privacy and security risks: almost 78% of organizations reported data-safety problems when using these systems, which shows the need for stronger rules to protect personal information [53]. Second, accountability is difficult: because these systems are trained with reinforcement learning, they sometimes act in unexpected ways, and even their developers cannot explain why the agent made a certain choice, which makes it hard to decide who is responsible [157]. Third, when several agentic AI systems work together, communication between them can fail, creating safety risks and increasing unfair or biased decisions [105]. Finally, the data used by agentic AI systems are often broken, incomplete, or of poor quality, which also leads to biased and unfair outcomes and can undermine trust and fairness [135].
Performance/Tool Integration Issues: Agentic AI systems continue to face problems with performance and tool integration. When executing complicated multi-step tasks, they often suffer delays and computational inefficiencies, especially when agents interact simultaneously with a diverse set of external tools in real time [53]. These issues are compounded by the lack of standard interfaces for integrating tools and software, especially in commercial or domain-specific workflows [105]. Furthermore, limited support for designing, testing, and deploying agents restricts their scalability and stability [135]. This is especially true in critical areas such as healthcare, where these limitations can undermine trust and operational effectiveness. This highlights the need for effective agent architectures and powerful toolchains that support reliable, scalable deployment.
Multi-Agent Coordination: Multi-agent coordination remains a recurring difficulty in agentic AI systems, especially in collaboration, synchronization, and scaling [105]. When multiple agents operate in a shared environment, breakdowns in communication or strategy alignment can lead to inefficient task allocation or even complete system failure. Operational challenges in the real world often stem from coordination failures such as inconsistent data access, communication latency, and misalignment caused by a failure to share context [53]. Cooperative systems are also highly susceptible to correlated failures that result in collusion, even when no agent has malicious intent, which undermines fairness, robustness, and transparency [135]. These problems demonstrate the need for scalable coordination protocols, context-awareness methods, and safety guarantees in multi-agent systems.
Human–AI Interaction: Human–AI interaction presents ongoing difficulties for the effective deployment of agentic AI systems. Many agents offer limited transparency and interpretability, which discourages user trust and limits collaboration, particularly when the stakes are high [105]. Effective human–AI interaction is often hindered by poor or insecure interface design and by overly rigid communication protocols that fail to account for differing user roles and contexts, including non-expert or at-risk user groups [53]. In addition, the lack of participatory elements limits opportunities for oversight and corrective feedback in ethical or safety-critical contexts. Current systems struggle to interpret and align with the subtleties of human intention, leading to misaligned actions even on simple tasks and to moral misalignment in work meant to help others [162]. These problems call for human-centered agent designs that increase explainability and allow adaptable interaction while fostering broader value alignment across communities and environments with diverse technical backgrounds.
Security: Security remains a fundamental issue in the design and implementation of agentic AI systems. Agents often function autonomously and operate in dynamic, diverse real-world conditions, which makes them susceptible to a range of risks (e.g., adversarial attacks, data poisoning, or inappropriate use of external tools or APIs) [53]. The lack of verifiable pipelines for connecting external knowledge sources or tools also gives malicious actors an opening to intervene and spread false information [105]. Without adequate oversight processes as a safeguard, humans may also find it difficult to detect or intervene in harmful behavior when it occurs [162]. To achieve resilience, agentic AI systems must follow secure-by-design principles, undergo continuous audits, and have guardrails that restrain misuse while still allowing them to perform their tasks [135].
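One concrete form of such a guardrail is a tool-call gate that permits only allowlisted tools, validates their arguments, and logs every decision for audit. The policy, validators, and tools below are hypothetical, and the sketch is far from a complete security layer. For example, it does not address prompt injection in tool outputs.

```python
import posixpath


def _valid_query(args):
    query = args.get("query")
    return isinstance(query, str) and 0 < len(query) <= 500


def _inside_workspace(args):
    # Normalize first so paths such as "/workspace/../etc/passwd" are rejected.
    path = posixpath.normpath(str(args.get("path", "")))
    return path.startswith("/workspace/")


# Hypothetical policy: allowlisted tool name -> validator for its arguments.
ALLOWED_TOOLS = {"search_web": _valid_query, "read_file": _inside_workspace}

audit_log = []  # every decision is recorded for later review


def guarded_call(tool, args, tools):
    """Run a tool only if it is allowlisted and its arguments pass validation."""
    validator = ALLOWED_TOOLS.get(tool)
    if validator is None:
        audit_log.append(("blocked", tool, "not allowlisted"))
        raise PermissionError(f"tool {tool!r} is not allowlisted")
    if not validator(args):
        audit_log.append(("blocked", tool, "invalid arguments"))
        raise PermissionError(f"arguments rejected for {tool!r}")
    audit_log.append(("allowed", tool, dict(args)))
    return tools[tool](**args)


tools = {
    "search_web": lambda query: f"results for {query!r}",
    "read_file": lambda path: f"contents of {path}",
    "delete_file": lambda path: f"deleted {path}",  # exists but is not allowlisted
}

print(guarded_call("search_web", {"query": "agentic AI"}, tools))
for tool, args in [("delete_file", {"path": "/workspace/a.txt"}),
                   ("read_file", {"path": "/workspace/../etc/passwd"})]:
    try:
        guarded_call(tool, args, tools)
    except PermissionError as exc:
        print("blocked:", exc)
```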
4. Conclusions
Agentic AI refers to intelligent systems that act autonomously, plan and adjust their steps, and operate in complex environments with minimal human oversight. By combining LLMs and reinforcement learning, these systems manage multi-step workflows, remember past interactions, and pursue multiple goals. Frameworks such as LangChain, AutoGPT, BabyAGI, AutoGen, and OpenAgents illustrate this evolution, progressing from structured reasoning and iterative tasks to multi-agent collaboration and adaptive problem-solving. Agentic AI systems share many key components, namely perception, planning, execution, memory, reflection, orchestration, and interaction, but differ in their orchestration style and control loop. LLMs are sometimes used as reasoners, communicators, and orchestrators. The most effective agentic systems combine explicit memory with built-in feedback and graph-based orchestration to produce adaptive, reliable behavior. Because it offers autonomous, coordinated, and responsive action, agentic AI can improve efficiency, personalization, and decision-making in a wide variety of settings. Agentic AI can complement humans, undertake more sophisticated processes, and provide greater safety, as well as intelligent, real-time solutions across many aspects of work. Agentic AI systems move well beyond simple input–output frameworks. They handle a wide variety of data types, such as text, audio, multimodal inputs, datasets, real-time streams, and even specialized formats such as code files and PDFs. They support autonomous work in structured and unstructured environments while coping with the unpredictability of the real world. Evaluating agentic AI systems is an important process that requires a multi-faceted approach integrating technical accuracy with qualitative factors.
The most successful agentic AI systems will balance operational performance with explainability and fairness, ensuring that their outputs best support users’ needs, values, and expectations. Agentic AI systems must contend with deeply intertwined technical, social, and governance challenges. The further evolution of agentic AI must involve creating modular, transparent, and scalable systems while addressing alignment with human values, accountability, and fairness. Secure-by-design practices, ongoing oversight, and ethical guidelines for human interactions will be vital for establishing trust in agentic AI systems and ensuring safety in the many types of dynamic real-world environments.
Funding
This research received no external funding.
Data Availability Statement
See Appendix A and Appendix B.
Conflicts of Interest
The authors declare no conflict of interest.
Appendix A
Table A1.
Numerical overview of papers published per year.
| Year Published | Number of Papers |
|---|---|
| 1999 | 1 |
| 2005 | 1 |
| 2007 | 1 |
| 2014 | 1 |
| 2017 | 1 |
| 2018 | 1 |
| 2019 | 1 |
| 2020 | 1 |
| 2021 | 3 |
| 2022 | 6 |
| 2023 | 12 |
| 2024 | 21 |
| 2025 | 93 |
| Total | 143 |
Table A2.
Numerical overview of different publication types.
| Paper Type | Number of Papers |
|---|---|
| Journals | 70 |
| Preprints | 38 |
| Conference Proceedings | 26 |
| Articles | 2 |
| Working papers | 2 |
| Book | 2 |
| Dissertation | 1 |
| Workshop | 2 |
| Total | 143 |
Table A3.
Numerical overview of different publication venues.
| Publication Venue | Number of Papers | Reference Code |
|---|---|---|
| arXiv | 32 | [31,33,38,39,40,42,51,71,75,77,81,87,90,98,100,111,113,116,120,132,136,137,138,141,143,144,148,150,157,159,166,167] |
| SSRN | 6 | [27,37,41,55,129,160] |
| Preprints | 3 | [36,53,96] |
| IEEE Access | 3 | [47,89,142] |
| Companion Proceedings of the ACM on Web Conference | 3 | [45,112,139] |
| IFAC-PapersOnLine | 3 | [61,72,119] |
| AI | 2 | [30,88] |
| ACM Conference on Fairness, Accountability, and Transparency | 2 | [34,168] |
| Journal of Business Research | 2 | [131,146] |
| Engineering Applications of Artificial Intelligence | 2 | [62,76] |
| ACM International Conference on Intelligent User Interfaces | 2 | [52] |
| Neural Information Processing Systems | 2 | [83,92] |
| Metallurgical and Materials Engineering | 2 | [130,134] |
| American Advanced Journal for Emerging Disciplinaries | 1 | [78] |
| ACM Transactions on Software Engineering and Methodology | 1 | [110] |
| ACM International Conference on User Modeling, Adaptation and Personalization | 1 | [97] |
| Advanced Engineering Informatics | 1 | [124] |
| Rani Channamma University Belagavi | 1 | [57] |
| AIS Transactions on Human Computer Interaction | 1 | [114] |
| American Journal of Computing and Engineering | 1 | [103] |
| Annual Review of Psychology | 1 | [162] |
| Applied Energy | 1 | [70] |
| Architectural Intelligence | 1 | [169] |
| Array | 1 | [66] |
| Association for Computing Machinery | 1 | [79] |
| Automation in Construction | 1 | [58] |
| BioSystems | 1 | [158] |
| Cell Reports Physical Science | 1 | [64] |
| Chapman and Hall/CRC | 1 | [170] |
| Clinical Neurophysiology | 1 | [99] |
| Computer Methods and Programs in Biomedicine | 1 | [59] |
| Computers and Electrical Engineering | 1 | [73] |
| Computers in Human Behavior: Artificial Humans | 1 | [128] |
| Cureus | 1 | [101] |
| Current Opinion in Chemical Engineering | 1 | [43] |
| Digital Discovery | 1 | [152] |
| European Journal of Analytics and Artificial Intelligence | 1 | [35] |
| Engineering | 1 | [82] |
| European Management Journal | 1 | [68] |
| Expert Systems with Applications | 1 | [135] |
| Extreme Mechanics Letters | 1 | [165] |
| PhD Dissertation | 1 | [32] |
| Foreign Languages in Higher Education | 1 | [147] |
| Frontiers in Computational Neuroscience | 1 | [122] |
| Frontiers in Human Dynamics | 1 | [154] |
| Informatics and Health | 1 | [56] |
| Information and Organization | 1 | [67] |
| International Journal of Computational Mathematical Ideas | 1 | [54] |
| International Conference on the AI Revolution: Research, Ethics, and Society | 1 | [50] |
| International Journal of Human-Computer Studies | 1 | [164] |
| International Journal of Research Publication and Reviews | 1 | [107] |
| Journal of Retailing | 1 | [140] |
| Journal of Building Engineering | 1 | [126] |
| Journal of Clinical and Experimental Hepatology | 1 | [102] |
| Journal of Computer Information Systems | 1 | [161] |
| Journal of Computer Science and Technology Studies | 1 | [95] |
| Journal of Retailing and Consumer Services | 1 | [145] |
| Journal of Strategic Information Systems | 1 | [149] |
| Journal of Systems and Software | 1 | [133] |
| Journal of Water Process Engineering | 1 | [63] |
| MethodsX | 1 | [109] |
| Multidisciplinary, Scientific Work and Management Journal | 1 | [80] |
| NeurIPS 2024 Workshop on Open-World Agents | 1 | [105] |
| Optical Switching and Networking | 1 | [163] |
| International Conference on Agents and Artificial Intelligence | 1 | [106] |
| Procedia CIRP | 1 | [121] |
| Conference on Human Factors in Computing Systems | 1 | [155] |
| Australasian Computer Science Week | 1 | [44] |
| Special Interest Group on Management Information Systems—Computer Personnel Research | 1 | [156] |
| ACM International Conference on Autonomous Agents and Multiagent Systems | 1 | [74] |
| ACM International Conference on Information and Knowledge Management | 1 | [108] |
| ACM International Conference on Interactive Media Experiences | 1 | [60] |
| ACM International Conference on Human Factors in Computing Systems | 1 | [153] |
| International Conference on Automated Assembly Systems | 1 | [123] |
| Pacific Asia Conference on Information Systems | 1 | [48] |
| IEEE/ACM International Conference on Automated Software Engineering | 1 | [115] |
| ACM on Software Testing and Analysis | 1 | [118] |
| OpenAI | 1 | [46] |
| Review of Materials Research | 1 | [65] |
| Telecommunications Policy | 1 | [117] |
| The Artificial Intelligence Business Review | 1 | [49] |
| Tourism Management | 1 | [127] |
| Transport Policy | 1 | [104] |
| UIUC Spring 2025 CS598 LLM Agent Workshop | 1 | [151] |
| Urban Informatics | 1 | [125] |
| International Conference on Learning Representations | 1 | [84] |
| Future Internet | 1 | [86] |
| Springer Science and Business Media LLC | 1 | [91] |
| Sage Publications, Thousand Oaks, CA | 1 | [94] |
| Wiley Online Library | 1 | [93] |
| SmythOS | 1 | [85] |
| Total | 143 | |
Appendix B
Table A4.
Commonly used abbreviations.
| Abbreviation | Full Form |
|---|---|
| ABT | Agent-Based Traffic |
| ACP | Adaptive Control Planning |
| ASR | Automatic Speech Recognition |
| BDD | Behavior-Driven Development |
| BDI | Belief–Desire–Intention |
| BERT | Bidirectional Encoder Representations from Transformers |
| CAM | Compliance Agentic Model |
| CAMEL | Communicative Agents for “Mind” Exploration of Large Scale Language Model Society |
| CIF | Common Information Framework |
| CLIPS | C Language Integrated Production System |
| CRAB | Cross-Platform Agent Benchmark |
| CTR | Click-Through Rate |
| CUDA | Compute Unified Device Architecture |
| DILI | Drug-induced liver injury |
| DL | Deep Learning |
| DQN | Deep Q-Networks |
| EDX | Electrodiagnostic tests |
| FM | Foundation Model |
| GED | Graph Edit Distance |
| GMV | Gross Merchandise Value |
| ICA | Interactive Cognitive Agents |
| JADE | Java Agent DEvelopment Framework |
| LAM | Large Agent Model |
| LC | Logic-based Computing |
| LLMs | Large Language Models |
| LLaMA | Large Language Model Meta AI |
| LTM | Long-Term Memory |
| MADRL | Multi-Agent Deep Reinforcement Learning |
| MAAI | Multi-Agent Artificial Intelligence |
| MAS | Multi-Agent Systems |
| MASON | Multi-Agent Simulator Of Neighborhoods |
| MCTS | Monte Carlo Tree Search |
| ML | Machine Learning |
| NLP | Natural Language Processing |
| OOAD | Object-Oriented Analysis and Design |
| PDDL | Planning Domain Definition Language |
| PIBT | Priority Inheritance with Backtracking |
| PIBTTP | Priority Inheritance with Backtracking for Tree-shaped Paths |
| PPO | Proximal Policy Optimization |
| RAG | Retrieval-Augmented Generation |
| RL | Reinforcement Learning |
| RLHF | Reinforcement Learning from Human Feedback |
| SCADA | Supervisory Control and Data Acquisition |
| SOC | Security Operations Center |
| SOCRATEST | Self-Optimizing Contextual Reasoning and Adaptive Testing System |
| STM | Short-Term Memory |
| STRIPS | Stanford Research Institute Problem Solver |
| TCT | Task Completion Time |
| TB-CSPN | Task-Based Cognitive Sequential Planning Network |
| UAT | User Acceptance Testing |
| WWTP | Wastewater Treatment Plant Operation |
| XAI | Explainable Artificial Intelligence |
References
- IBM. Agentic AI. Available online: https://www.ibm.com/think/topics/agentic-ai (accessed on 14 August 2025).
- Amazon Web Services. The Rise of Autonomous Agents: What Enterprise Leaders Need to Know About the Next Wave of AI. Available online: https://aws.amazon.com/blogs/aws-insights/the-rise-of-autonomous-agents-what-enterprise-leaders-need-to-know-about-the-next-wave-of-ai/ (accessed on 14 August 2025).
- MarketsandMarkets. AI Agents Market. Available online: https://www.marketsandmarkets.com/Market-Reports/ai-agents-market-15761548.html (accessed on 14 August 2025).
- Grand View Research. AI Agents Market Report. Available online: https://www.grandviewresearch.com/industry-analysis/ai-agents-market-report (accessed on 14 August 2025).
- Stanford HAI. Simulating Human Behavior with AI Agents. Available online: https://hai.stanford.edu/policy/simulating-human-behavior-with-ai-agents (accessed on 14 August 2025).
- Zou, J.; Topol, E.J. The rise of agentic AI teammates in medicine. Lancet 2025, 405, 457. [Google Scholar] [CrossRef]
- Chawla, C.; Chatterjee, S.; Gadadinni, S.S.; Verma, P.; Banerjee, S. Agentic AI: The building blocks of sophisticated AI business applications. J. AI Robot. Workplace Autom. 2024, 3, 1–15. [Google Scholar] [CrossRef]
- White, J. Building living software systems with generative & agentic AI. arXiv 2024, arXiv:2408.01768. [Google Scholar] [CrossRef]
- TechRadar Pro. The Enterprise AI Paradox: Why Smarter Models Alone Aren’t the Answer. Available online: https://www.techradar.com/pro/the-enterprise-ai-paradox-why-smarter-models-alone-arent-the-answer (accessed on 14 August 2025).
- Cardoso, R.C.; Ferrando, A. A review of agent-based programming for multi-agent systems. Computers 2021, 10, 16. [Google Scholar] [CrossRef]
- Jin, H.; Huang, L.; Cai, H.; Yan, J.; Li, B.; Chen, H. From llms to llm-based agents for software engineering: A survey of current, challenges and future. arXiv 2024, arXiv:2408.02479. [Google Scholar] [CrossRef]
- Zhou, J.; Lu, Q.; Chen, J.; Zhu, L.; Xu, X.; Xing, Z.; Harrer, S. A taxonomy of architecture options for foundation model-based agents: Analysis and decision model. arXiv 2024, arXiv:2408.02920. [Google Scholar] [CrossRef]
- Li, X.; Wang, S.; Zeng, S.; Wu, Y.; Yang, Y. A survey on LLM-based multi-agent systems: Workflow, infrastructure, and challenges. Vicinagearth 2024, 1, 9. [Google Scholar] [CrossRef]
- Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. Front. Comput. Sci. 2024, 18, 186345. [Google Scholar] [CrossRef]
- Gao, C.; Lan, X.; Li, N.; Yuan, Y.; Ding, J.; Zhou, Z.; Xu, F.; Li, Y. Large language models empowered agent-based modeling and simulation: A survey and perspectives. Humanit. Soc. Sci. Commun. 2024, 11, 1259. [Google Scholar] [CrossRef]
- Novozhilova, E.; Mays, K.; Katz, J.E. Looking towards an automated future: US attitudes towards future artificial intelligence instantiations and their effect. Humanit. Soc. Sci. Commun. 2024, 11, 132. [Google Scholar] [CrossRef]
- Sami, A.M.; Rasheed, Z.; Kemell, K.K.; Waseem, M.; Kilamo, T.; Saari, M.; Duc, A.N.; Systä, K.; Abrahamsson, P. System for systematic literature review using multiple ai agents: Concept and an empirical evaluation. arXiv 2024, arXiv:2403.08399. [Google Scholar] [CrossRef]
- Dev, K.; Khowaja, S.A.; Singh, K.; Zeydan, E.; Debbah, M. Advanced architectures integrated with agentic ai for next-generation wireless networks. arXiv 2025, arXiv:2502.01089. [Google Scholar] [CrossRef]
- Huang, X.; Liu, W.; Chen, X.; Wang, X.; Wang, H.; Lian, D.; Wang, Y.; Tang, R.; Chen, E. Understanding the planning of LLM agents: A survey. arXiv 2024, arXiv:2402.02716. [Google Scholar] [CrossRef]
- Ke, Z.; Jiao, F.; Ming, Y.; Nguyen, X.P.; Xu, A.; Long, D.X.; Li, M.; Qin, C.; Wang, P.; Savarese, S.; et al. A survey of frontiers in llm reasoning: Inference scaling, learning to reason, and agentic systems. arXiv 2025, arXiv:2504.09037. [Google Scholar] [CrossRef]
- Ferrag, M.A.; Tihanyi, N.; Debbah, M. From llm reasoning to autonomous ai agents: A comprehensive review. arXiv 2025, arXiv:2504.19678. [Google Scholar] [CrossRef]
- Schneider, J. Generative to agentic ai: Survey, conceptualization, and challenges. arXiv 2025, arXiv:2504.18875. [Google Scholar] [CrossRef]
- Mishra, L.N.; Senapati, B. Retail Resilience Engine: An Agentic AI Framework for Building Reliable Retail Systems With Test-Driven Development Approach. IEEE Access 2025, 13, 50226–50243. [Google Scholar] [CrossRef]
- Raza, S.; Sapkota, R.; Karkee, M.; Emmanouilidis, C. Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems. arXiv 2025, arXiv:2506.04133. [Google Scholar]
- Cao, P.; Men, T.; Liu, W.; Zhang, J.; Li, X.; Lin, X.; Sui, D.; Cao, Y.; Liu, K.; Zhao, J. Large language models for planning: A comprehensive and systematic survey. arXiv 2025, arXiv:2505.19683. [Google Scholar] [CrossRef]
- Joshi, S. LLMOps, AgentOps, and MLOps for Generative AI: A Comprehensive Review. Int. J. Comput. Appl. Technol. Res. 2025, 14, 1–11. [Google Scholar]
- Joshi, S. Comprehensive Review of Artificial General Intelligence AGI and Agentic GenAI: Applications in Business and Finance. Int. J. Multidiscip. Res. Growth Eval. 2025, 6, 681–688. [Google Scholar] [CrossRef]
- Bolanos, F.; Salatino, A.; Osborne, F.; Motta, E. Artificial intelligence for literature reviews: Opportunities and challenges. Artif. Intell. Rev. 2024, 57, 259. [Google Scholar] [CrossRef]
- Sapkota, R.; Roumeliotis, K.I.; Karkee, M. Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenge. arXiv 2025, arXiv:2505.10468. [Google Scholar] [CrossRef]
- Olujimi, P.A.; Owolawi, P.A.; Mogase, R.C.; Wyk, E.V. Agentic AI frameworks in SMMEs: A systematic literature review of ecosystemic interconnected agents. AI 2025, 6, 123. [Google Scholar] [CrossRef]
- Shah, C.; White, R.W. Agents are not enough. arXiv 2024, arXiv:2412.16241. [Google Scholar] [CrossRef]
- Dharanikota, S. Psychological and Agentic Effects of Human-Bot Delegation in Open-Source Software Development (OSSD) Communities: An Empirical Investigation of Information Systems Delegation Framework. 2022. Available online: https://digitalcommons.fiu.edu/etd/5116/ (accessed on 14 August 2025).
- Ren, Y.; Liu, Y.; Ji, T.; Xu, X. AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing. arXiv 2025, arXiv:2507.01376. [Google Scholar] [CrossRef]
- Chan, A.; Salganik, R.; Markelius, A.; Pang, C.; Rajkumar, N.; Krasheninnikov, D.; Langosco, L.; He, Z.; Duan, Y.; Carroll, M.; et al. Harms from Increasingly Agentic Algorithmic Systems. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, New York, NY, USA, 12–15 June 2023. [Google Scholar] [CrossRef]
- Suura, S.R. Agentic artificial intelligence systems for dynamic health management and real-time genomic data analysis. Eur. J. Anal. Artif. Intell. (EJAAI) 2024, 2. [Google Scholar]
- Pavani, S.; Shwetha, H. Agentic AI: Redefining Autonomy for Complex Goal-Driven Systems. 2025. Available online: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C26&q=Agentic+AI%3A+Redefining+Autonomy+for+Complex+Goal-Driven+Systems.&btnG= (accessed on 14 August 2025).
- Mohammed Salah, A.; Alnoor, A.; Abdelfattah, F.; Chew, D.X. Agentic AI Forging Adaptive, Equity-Driven Governance Pathways for Sustainable Futures. In XinYing, Agentic AI Forging Adaptive, Equity-Driven Governance Pathways for Sustainable Futures. 2025. Available online: http://dx.doi.org/10.2139/ssrn.5229744 (accessed on 14 August 2025).
- Botti, V. Agentic AI and Multiagentic: Are We Reinventing the Wheel? arXiv 2025, arXiv:2506.01463. [Google Scholar] [CrossRef]
- Okpala, I.; Golgoon, A.; Kannan, A.R. Agentic AI Systems Applied to tasks in Financial Services: Modeling and model risk management crews. arXiv 2025, arXiv:2502.05439. [Google Scholar]
- Narajala, V.S.; Narayan, O. Securing agentic AI: A comprehensive threat model and mitigation framework for generative AI agents. arXiv 2025, arXiv:2504.19956. [Google Scholar] [CrossRef]
- Salah, M.; Alnoor, A.; Abdelfattah, F.; Al Halbusi, H. Agentic Artificial Intelligence in Public Administration: Foundations, Applications, and Governance. 2025. Available online: https://ssrn.com/abstract=5249100 (accessed on 10 May 2025).
- Jiang, F.; Pan, C.; Dong, L.; Wang, K.; Dobre, O.A.; Debbah, M. From large AI models to agentic AI: A tutorial on future intelligent communications. arXiv 2025, arXiv:2505.22311. [Google Scholar] [CrossRef]
- Boskabadi, M.R.; Cao, Y.; Khadem, B.; Clements, W.; Nevin Gerek, Z.; Reuthe, E.; Sivaram, A.; Savoie, C.J.; Mansouri, S.S. Industrial Agentic AI and generative modeling in complex systems. Curr. Opin. Chem. Eng. 2025, 48, 101150. [Google Scholar] [CrossRef]
- Menezes, V.P.; Chowdhury, M.J.M.; Mahmood, A. An Agentic Framework for Compliant, Ethical and Trustworthy GenAI Applications in Healthcare. In Proceedings of the 2025 Australasian Computer Science Week, New York, NY, USA, 10–13 February 2025. [Google Scholar] [CrossRef]
- Huang, L.; Koutra, D.; Kulkarni, A.; Prioleau, T.; Wu, Q.; Yan, Y.; Yang, Y.; Zou, J.; Zhou, D. Towards Agentic AI for Science: Hypothesis Generation, Comprehension, Quantification, and Validation. In Companion Proceedings of the ACM on Web Conference 2025, Sydney, Australia, 28 April–2 May 2025; pp. 1639–1642. [Google Scholar]
- Shavit, Y.; Agarwal, S.; Brundage, M.; Adler, S.; O’Keefe, C.; Campbell, R.; Lee, T.; Mishkin, P.; Eloundou, T.; Hickey, A.; et al. Practices for Governing Agentic AI Systems; Research Paper; OpenAI: San Francisco, CA, USA, 2023. [Google Scholar]
- Acharya, D.B.; Kuppan, K.; Bhaskaracharya, D. Agentic AI: Autonomous Intelligence for Complex Goals—A Comprehensive Survey. IEEE Access 2025, 13, 18912–18936. [Google Scholar] [CrossRef]
- Wissuchek, C.; Zschech, P. Exploring Agentic Artificial Intelligence Systems: Towards a Typological Framework. arXiv 2025, arXiv:2508.00844. [Google Scholar]
- Allam, H.; AlOmar, B.; Dempere, J. Agentic AI for IT and Beyond: A Qualitative Analysis of Capabilities, Challenges, and Governance. Artif. Intell. Bus. Rev. 2025, 1. Available online: https://theaibr.com/index.php/aibr/article/view/3 (accessed on 14 August 2025). [CrossRef]
- Horne, D. The Agentic AI Mindset–A Practitioner’s Guide to Architectures, Patterns, and Future Directions for Autonomy and Automation. 2025. Available online: https://www.researchgate.net/profile/Dwight-Horne/publication/390958865_The_Agentic_AI_Mindset_-_A_Practitioner’s_Guide_to_Architectures_Patterns_and_Future_Directions_for_Autonomy_and_Automation/links/6805a1eadf0e3f544f432cad/The-Agentic-AI-Mindset-A-Practitioners-Guide-to-Architectures-Patterns-and-Future-Directions-for-Autonomy-and-Automation.pdf (accessed on 14 August 2025).
- Sapkota, R.; Roumeliotis, K.I.; Karkee, M. Vibe coding vs. agentic coding: Fundamentals and practical implications of agentic AI. arXiv 2025, arXiv:2505.19443. [Google Scholar] [CrossRef]
- Brachman, M.; Kunde, S.; Miller, S.; Fucs, A.; Dempsey, S.; Jabbour, J.; Geyer, W. Building Appropriate Mental Models: What Users Know and Want to Know about an Agentic AI Chatbot. In Proceedings of the 30th International Conference on Intelligent User Interfaces, Cagliari, Italy, 24–27 March 2025; pp. 247–264. [Google Scholar]
- Sawant, P. Agentic AI: A Quantitative Analysis of Performance and Applications. 2025. Available online: https://www.preprints.org/frontend/manuscript/b540908df60641a985f056de30899adb/download_pub (accessed on 14 August 2025).
- Peddisetti, S. Agentic AI Meets Data Engineering: Toward Self-Directed, Interpretable, and Balanced Pipelines. Int. J. Comput. Math. Ideas (IJCMI) 2025, 17, 17313–17325. [Google Scholar]
- Pamisetty, A. Application of Agentic Artificial Intelligence in Autonomous Decision Making Across Food Supply Chains. 2024. Available online: https://ssrn.com/abstract=5231360 (accessed on 14 August 2025).
- Karunanayake, N. Next-generation agentic AI for transforming healthcare. Inform. Health 2025, 2, 73–83. [Google Scholar] [CrossRef]
- Raidas, M.A.; Bhandari, R. Agentic AI in Education: Redefining Learning for the Digital Era. In Artificial Intelligence in Education: Transforming Learning for the Future; Scholars’ Press: London, UK, 2025; p. 89. [Google Scholar]
- Zhang, L.; Fu, X.; Li, Y.; Chen, J. Large language model-based agent schema and library for automated building energy analysis and modeling. Autom. Constr. 2025, 176, 106244. [Google Scholar] [CrossRef]
- Montagna, S.; Mariani, S.; Schumacher, M.I.; Manzo, G. Agent-based systems in healthcare. Comput. Methods Programs Biomed. 2024, 248, 108140. [Google Scholar] [CrossRef] [PubMed]
- Vahdati, M.M.; Gholizadeh HamlAbadi, K.; Laamarti, F.; El Saddik, A. A Multi-Agent Digital Twin Framework for AI-Driven Fitness Coaching. In Proceedings of the 2025 ACM International Conference on Interactive Media Experiences, New York, NY, USA, 3–6 June 2025. [Google Scholar] [CrossRef]
- Daugherty, G.; Reveliotis, S.; Mohler, G. Optimized Multi-Agent Routing in Guidepath Networks. IFAC-PapersOnLine 2017, 50, 9686–9693. [Google Scholar] [CrossRef]
- Nourani, C. Multiagent AI implementations an emerging software engineering trend. Eng. Appl. Artif. Intell. 1999, 12, 37–42. [Google Scholar] [CrossRef]
- Nam, K.; Heo, S.; Kim, S.; Yoo, C. A multi-agent AI reinforcement-based digital multi-solution for optimal operation of a full-scale wastewater treatment plant under various influent conditions. J. Water Process Eng. 2023, 52, 103533. [Google Scholar] [CrossRef]
- Li, J.-H.; Hu, Y.; Xia, G.; Mo, W.; Li, B.; Jia, Y.; Gao, Y.; Xuan, F.; Liu, H.; Lian, C. Coevolution of large language models with physical models boosts advanced battery research. Cell Rep. Phys. Sci. 2025, 6, 102553. [Google Scholar] [CrossRef]
- Wang, G.; Hu, J.; Zhou, J.; Liu, S.; Li, Q.; Sun, Z. Knowledge-guided large language model for material science. Rev. Mater. Res. 2025, 1, 100007. [Google Scholar] [CrossRef]
- Hosseini, S.; Seilani, H. The role of agentic AI in shaping a smart future: A systematic review. Array 2025, 26, 100399. [Google Scholar] [CrossRef]
- Leonardi, P.M. Homo agenticus in the age of agentic AI: Agency loops, power displacement, and the circulation of responsibility. Inf. Organ. 2025, 35, 100582. [Google Scholar] [CrossRef]
- Korzynski, P.; Edwards, A.; Gupta, M.C.; Mazurek, G.; Wirtz, J. Humanoid robotics and agentic AI: Reframing management theories and future research directions. Eur. Manag. J. 2025, 43, 548–560. [Google Scholar] [CrossRef]
- Fujitani, Y.; Yamauchi, T.; Miyashita, Y.; Sugawara, T. Deadlock-Free Method for Multi-Agent Pickup and Delivery Problem Using Priority Inheritance with Temporary Priority. Procedia Comput. Sci. 2022, 207, 1552–1561. [Google Scholar] [CrossRef]
- Xie, J.; Ajagekar, A.; You, F. Multi-Agent attention-based deep reinforcement learning for demand response in grid-responsive buildings. Appl. Energy 2023, 342, 121162. [Google Scholar] [CrossRef]
- Sifakis, J.; Li, D.; Huang, H.; Zhang, Y.; Dang, W.; Huang, R.; Yu, Y. A Reference Architecture for Autonomous Networks: An Agent-Based Approach. arXiv 2025, arXiv:2503.12871. [Google Scholar]
- Deng, L.; Shu, Z.; Chen, T. Event-Triggered Robust Distributed MPC for Multi-Agent Systems with A Two-Step Event Verification. IFAC-PapersOnLine 2022, 55, 144–149. [Google Scholar] [CrossRef]
- Coordination of modular nano grid energy management using multi-agent AI architecture. Comput. Electr. Eng. 2024, 115, 109112. [CrossRef]
- Multimodal Agentic Model Predictive Control. In Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, Detroit, MI, USA, 19–23 May 2025.
- Luo, J.; Zhang, W.; Yuan, Y.; Zhao, Y.; Yang, J.; Gu, Y.; Wu, B.; Chen, B.; Qiao, Z.; Long, Q.; et al. Large language model agent: A survey on methodology, applications and challenges. arXiv 2025, arXiv:2503.21460. [Google Scholar] [CrossRef]
- Liu, J.; Hao, W.; Cheng, K.; Jin, D. Large language model-based planning agent with generative memory strengthens performance in textualized world. Eng. Appl. Artif. Intell. 2025, 148, 110319. [Google Scholar] [CrossRef]
- Bousetouane, F. Agentic Systems: A Guide to Transforming Industries with Vertical AI Agents. arXiv 2025, arXiv:2501.00881. [Google Scholar] [CrossRef]
- Kalisetty, S.; Singireddy, J. Agentic AI in Retail: A Paradigm Shift in Autonomous Customer Interaction and Supply Chain Automation. Am. Adv. J. Emerg. Discip. (AAJED) 2023, 1. [Google Scholar]
- Huang, X.; Lian, J.; Lei, Y.; Yao, J.; Lian, D.; Xie, X. Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations. ACM Trans. Inf. Syst. 2025, 43, 1–33. [Google Scholar] [CrossRef]
- Paleti, S. Agentic AI in Financial Decision-Making: Enhancing Customer Risk Profiling, Predictive Loan Approvals, and Automated Treasury Management in Modern Banking. Multidiscip. Sci. Work. Manag. J. 2024, 34, 832–843. [Google Scholar]
- Trirat, P.; Jeong, W.; Hwang, S.J. Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding. arXiv 2025, arXiv:2505.19764. [Google Scholar] [CrossRef]
- Li, X.; Shi, W.; Zhang, H.; Peng, C.; Wu, S.; Tong, W. The Agentic-AI Core: An AI-Empowered, Mission-Oriented Core Network for Next-Generation Mobile Telecommunications. Engineering 2025, in press. [CrossRef]
- Li, G.; Hammoud, H.; Itani, H.; Khizbullin, D.; Ghanem, B. CAMEL: Communicative agents for “mind” exploration of large language model society. Adv. Neural Inf. Process. Syst. 2023, 36, 51991–52008. [Google Scholar]
- Hong, S.; Zhuge, M.; Chen, J.; Zheng, X.; Cheng, Y.; Zhang, C.; Wang, J.; Wang, Z.; Yau, S.K.S.; Lin, Z.; et al. MetaGPT: Meta programming for a multi-agent collaborative framework. In Proceedings of the International Conference on Learning Representations, ICLR, Vienna, Austria, 7–11 May 2024. [Google Scholar]
- De Ridder, A. SuperAGI vs LangChain: A Comprehensive Guide. Available online: https://smythos.com/developers/agent-comparisons/superagi-vs-langchain/ (accessed on 29 August 2025).
- Borghoff, U.M.; Bottoni, P.; Pareschi, R. Beyond Prompt Chaining: The TB-CSPN Architecture for Agentic AI. Future Internet 2025, 17, 363. [Google Scholar] [CrossRef]
- Liu, Z.; Yao, W.; Zhang, J.; Yang, L.; Liu, Z.; Tan, J.; Choubey, P.K.; Lan, T.; Wu, J.; Wang, H.; et al. AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System. arXiv 2024, arXiv:2402.15538. [Google Scholar] [CrossRef]
- Thirumalainambi, R. Pitfalls of JESS for Dynamic Systems. In Proceedings of the Artificial Intelligence and Pattern Recognition, Citeseer, Orlando, FL, USA, 9–12 July 2007; pp. 491–494. [Google Scholar]
- Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [Google Scholar] [CrossRef]
- Zhang, K.; Yang, Z.; Başar, T. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms. arXiv 2021, arXiv:1911.10635. [Google Scholar] [CrossRef]
- Hernandez-Leal, P.; Kartal, B.; Taylor, M.E. A survey and critique of multiagent deep reinforcement learning. Auton. Agents Multi-Agent Syst. 2019, 33, 750–797. [Google Scholar] [CrossRef]
- Silver, T.; Hariprasad, V.; Shuttleworth, R.S.; Kumar, N.; Lozano-Pérez, T.; Kaelbling, L.P. PDDL planning with pretrained large language models. In Proceedings of the NeurIPS 2022 Foundation Models for Decision Making Workshop. 2022. Available online: https://drive.google.com/file/d/1PqCYzzfdUFitxG7NxFs0D6dn-TzI97q_/view (accessed on 14 August 2025).
- Chiacchio, F.; Pennisi, M.; Russo, G.; Motta, S.; Pappalardo, F. Agent-based modeling of the immune system: NetLogo, a promising framework. BioMed Res. Int. 2014, 2014, 907171. [Google Scholar] [CrossRef] [PubMed]
- Luke, S.; Cioffi-Revilla, C.; Panait, L.; Sullivan, K.; Balan, G. Mason: A multiagent simulation environment. Simulation 2005, 81, 517–527. [Google Scholar] [CrossRef]
- Garg, V. Designing the Mind: How Agentic Frameworks Are Shaping the Future of AI Behavior. J. Comput. Sci. Technol. Stud. 2025, 7, 182–193. [Google Scholar] [CrossRef]
- Allmendinger, S.; Bonenberger, L.; Endres, K.; Fetzer, D.; Gimpel, H.; Kühl, N. Multi-Agent AI. OSF Preprints. 2023. Available online: https://osf.io/hndm3 (accessed on 14 August 2025).
- Nurturing Code Quality: Leveraging Static Analysis and Large Language Models for Software Quality in Education. In Proceedings of the Adjunct 33rd ACM Conference on User Modeling, Adaptation and Personalization. New York, NY, USA, 16–19 June 2025. [CrossRef]
- Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. ReAct: Synergizing Reasoning and Acting in Language Models. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
- Gorenshtein, A.; Sorka, M.; Khateb, M.; Aran, D.; Shelly, S. Agent-guided AI-powered interpretation and reporting of nerve conduction studies and EMG (INSPIRE). Clin. Neurophysiol. 2025, 177, 2110792. [Google Scholar] [CrossRef]
- Kim, J.; Wahi-Anwa, M.; Park, S.; Shin, S.; Hoffman, J.M.; Brown, M.S. Autonomous Computer Vision Development with Agentic AI. arXiv 2025, arXiv:2506.11140. [Google Scholar] [CrossRef]
- Roy, T.P.D. Bioethics Artificial Intelligence Advisory (BAIA): An Agentic Artificial Intelligence (AI) Framework for Bioethical Clinical Decision Support. Cureus 2025, 17, e80494. [Google Scholar] [CrossRef]
- Suechkul, P.; Tribuddharat, N.; Kulthamrongsri, N. Toward Real-Time Detection of Drug-Induced Liver Injury Using Large Language Models: A Feasibility Study from Clinical Note. J. Clin. Exp. Hepatol. 2025, 15, 102627. [Google Scholar] [CrossRef]
- Bharadiya, J.P. Artificial Intelligence in Transportation Systems A Critical Review. Am. J. Comput. Eng. 2023, 6, 35–45. [Google Scholar] [CrossRef]
- Yu, J. Preparing for an agentic era of human-machine transportation systems: Opportunities, challenges, and policy recommendations. Transp. Policy 2025, 171, 78–97. [Google Scholar] [CrossRef]
- Jeyakumar, S.K.; Ahmad, A.A.; Gabriel, A.G. Advancing Agentic Systems: Dynamic Task Decomposition, Tool Integration and Evaluation using Novel Metrics and Dataset. In Proceedings of the NeurIPS 2024 Workshop on Open-World Agents, Vancouver, BC, Canada, 16 December 2024. [Google Scholar]
- Paduraru, C.; Zavelca, M.; Stefanescu, A. Agentic AI for Behavior-Driven Development Testing using Large Language Models. Available online: https://www.scitepress.org/Papers/2025/133744/133744.pdf (accessed on 14 August 2025).
- Ogbu, D. Agentic AI in Computer Vision Domain-Recent Advances and Prospects. Available online: https://www.researchgate.net/profile/Daniel-Ogbu/publication/386292786_Agentic_AI_in_Computer_Vision_Domain_-Recent_Advances_and_Prospects/links/674c6ec6a7fbc259f1a33618/Agentic-AI-in-Computer-Vision-Domain-Recent-Advances-and-Prospects.pdf (accessed on 14 August 2025).
- Zhang, Y.; Liu, Z.; Wen, Q.; Pang, L.; Liu, W.; Yu, P.S. AI Agent for Information Retrieval: Generating and Ranking. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 5605–5607. [Google Scholar]
- Saxena, A.; Chaudhari, A.Y.; Gupta, A. AgCV: An Agentic framework for automating computer vision application. MethodsX 2025, 15, 103424. [Google Scholar] [CrossRef]
- Robinson, D.; Cabrera, C.; Gordon, A.D.; Lawrence, N.D.; Mennen, L. Requirements Are All You Need: The Final Frontier for End-User Software Engineering. ACM Trans. Softw. Eng. Methodol. 2025, 34, 141. [Google Scholar] [CrossRef]
- Belcak, P.; Heinrich, G.; Diao, S.; Fu, Y.; Dong, X.; Muralidharan, S.; Lin, Y.C.; Molchanov, P. Small Language Models are the Future of Agentic AI. arXiv 2025, arXiv:2506.02153. [Google Scholar] [CrossRef]
- Wen, Q.; Zhang, Y.; Liu, Z.; McAuley, J.; Wei, H.; Pang, L.; Liu, W.; Yu, P.S. The 3rd Workshop on AI Agent for Information Retrieval: Generating and Ranking. In Companion Proceedings of the ACM on Web Conference 2025, New York, NY, USA, 28 April–2 May 2025. [Google Scholar] [CrossRef]
- Casper, S.; Bailey, L.; Hunter, R.; Ezell, C.; Cabalé, E.; Gerovitch, M.; Slocum, S.; Wei, K.; Jurkovic, N.; Khan, A.; et al. The AI Agent Index. arXiv 2025, arXiv:2502.01635. [Google Scholar]
- Meske, C.; Kuss, P.M. Theorizing the Concept of Agency in Human-Algorithmic Ensembles with a Socio-Technical Lens. 2022. Available online: https://core.ac.uk/download/pdf/542549024.pdf (accessed on 14 August 2025).
- Feldt, R.; Kang, S.; Yoon, J.; Yoo, S. Towards Autonomous Testing Agents via Conversational Large Language Models. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, Luxembourg, 11–15 September 2023; IEEE Press: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
- Fatouros, G.; Makridis, G.; Kousiouris, G.; Soldatos, J.; Tsadimas, A.; Kyriazis, D. Towards Conversational AI for Human-Machine Collaborative MLOps. arXiv 2025, arXiv:2504.12477. [Google Scholar] [CrossRef]
- Transforming cybersecurity with agentic AI to combat emerging cyber threats. Telecommun. Policy 2025, 49, 102976. [CrossRef]
- Bouzenia, I.; Pradel, M. You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects. Proc. ACM Softw. Eng. 2025, 2, 1054–1076. [Google Scholar] [CrossRef]
- Wang, L.; Qiu, T.; Pu, Z.; Yi, J.; Zhu, J.; Zhao, Y. A Decision-making Method for Swarm Agents in Attack-defense Confrontation. IFAC-PapersOnLine 2023, 56, 7858–7864. [Google Scholar] [CrossRef]
- Sheriff, A.; Huang, K.; Nemeth, Z.; Nakhjiri, M. ADA: Automated Moving Target Defense for AI Workloads via Ephemeral Infrastructure-Native Rotation in Kubernetes. arXiv 2025, arXiv:2505.23805. [Google Scholar]
- Ungen, M.; Kampert, D.; Feldotto, B.; Huber, E.; Riedel, O. Automated workflow generation supporting the value stream design of reconfigurable robot assembly cells. Procedia CIRP 2024, 128, 609–614. [Google Scholar] [CrossRef]
- Sandini, G.; Sciutti, A.; Morasso, P. Artificial cognition vs. artificial intelligence for next-generation autonomous robotic agents. Front. Comput. Neurosci. 2024, 18, 1349408. [Google Scholar] [CrossRef]
- Bennett, J.; Sterritt, R. Autonomic Computing in Total Achievement of Quality. In Proceedings of the Twentieth International Conference on Autonomic and Autonomous Systems, Athens, Greece, 10–14 March 2024. [Google Scholar]
- Contexts Matter: Robot-Aware 3D human motion prediction for Agentic AI-empowered Human-Robot collaboration. Adv. Eng. Inform. 2025, 68, 103591. [CrossRef]
- Tiwari, A. Conceptualising the emergence of Agentic Urban AI: From automation to agency. Urban Inform. 2025, 4, 13. [Google Scholar] [CrossRef]
- Yoon, S.; Song, J.; Li, J. Ontology-enabled AI agent-driven intelligent digital twins for building operations and maintenance. J. Build. Eng. 2025, 108, 112802. [Google Scholar] [CrossRef]
- Stylos, N.; Okumus, F.; Onder, I. Beauty or the Borg: Agentic artificial intelligence organizational socialization in synergistic Hybrid Transformative Dynamic Flows. Tour. Manag. 2025, 111, 105205. [Google Scholar] [CrossRef]
- Pathak, A.; Bansal, V. AI as decision aid or delegated agent: The effects of trust dimensions on the adoption of AI digital agents. Comput. Hum. Behav. Artif. Humans 2024, 2, 100094. [Google Scholar] [CrossRef]
- Sriram, H.K.; Bharath M, B.M. Beyond Automation: Exploring the Potential of Agentic AI in Risk Management and Fraud Detection in Banks. Eksplorium 2025, 46, 192–220. [Google Scholar] [CrossRef]
- Inala, R.; Somu, B. Building Trustworthy Agentic AI Systems for Personalized Banking Experiences. Metall. Mater. Eng. 2025, 31, 1336–1360. [Google Scholar]
- Wünderlich, N.V.; Blut, M.; Brock, C.; Heirati, N.; Jensen, M.; Paluch, S.; Rötzmeier-Keuper, J.; Tóth, Z. How to use emerging service technologies to enhance customer centricity in business-to-business contexts: A conceptual framework and research agenda. J. Bus. Res. 2025, 192, 115284. [Google Scholar] [CrossRef]
- Huang, C.; Huang, H.; Yu, T.; Xie, K.; Wu, J.; Zhang, S.; McAuley, J.; Jannach, D.; Yao, L. A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms. arXiv 2025, arXiv:2504.16420. [Google Scholar]
- Liu, Y.; Lo, S.K.; Lu, Q.; Zhu, L.; Zhao, D.; Xu, X.; Harrer, S.; Whittle, J. Agent design pattern catalogue: A collection of architectural patterns for foundation model based agents. J. Syst. Softw. 2025, 220, 112278. [Google Scholar] [CrossRef]
- Yellanki, S.K.; Kummari, D.N.; Sheelam, G.K.; Kannan, S.; Chakilam, C. Synthetic Cognition Meets Data Deluge: Architecting Agentic AI Models for Self-Regulating Knowledge Graphs in Heterogeneous Data Warehousing. Metall. Mater. Eng. 2025, 31, 569–586. [Google Scholar] [CrossRef]
- Piccialli, F.; Chiaro, D.; Sarwar, S.; Cerciello, D.; Qi, P.; Mele, V. AgentAI: A comprehensive survey on autonomous agents in distributed AI for industry 4.0. Expert Syst. Appl. 2025, 291, 128404. [Google Scholar] [CrossRef]
- Plaat, A.; van Duijn, M.; van Stein, N.; Preuss, M.; van der Putten, P.; Batenburg, K.J. Agentic large language models, a survey. arXiv 2025, arXiv:2503.23037. [Google Scholar] [CrossRef]
- Hu, S.; Lu, C.; Clune, J. Automated design of agentic systems. arXiv 2024, arXiv:2408.08435. [Google Scholar] [CrossRef]
- Zhang, Q.; Wornow, M.; Olukotun, K. Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching. arXiv 2025, arXiv:2506.14852. [Google Scholar]
- Zhang, Z.; Dai, Q.; Chen, X.; Li, R.; Li, Z.; Dong, Z. MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents. In Companion Proceedings of the ACM on Web Conference 2025, New York, NY, USA, 18 May 2025. [Google Scholar] [CrossRef]
- Sohn, S.; Labrecque, L.; Siemon, D.; Morana, S. Artificial intelligence versus human service agents: How their presence shapes consumer information privacy concerns. J. Retail. 2025, 101, 263–278. [Google Scholar] [CrossRef]
- Floridi, L.; Buttaboni, C.; Hine, E.; Morley, J.; Novelli, C.; Schroder, T. Agentic AI Optimisation (AAIO): What it is, how it works, why it matters, and how to deal with it. arXiv 2025, arXiv:2504.12482. [Google Scholar] [CrossRef]
- Alecsoiu, O.R.; Faruqui, N.; Panagoret, A.A.; Ceausescu, A.I.; Panagoret, D.M.; Nitu, R.V.; Mutu, M.A. EcoptiAI: E-Commerce Process Optimization and Operational Cost Minimization through Task Automation using Agentic AI. IEEE Access 2025, 13, 70254–70268. [Google Scholar] [CrossRef]
- Narechania, A.; Endert, A.; Sinha, A.R. Agentic Enterprise: AI-Centric User to User-Centric AI. arXiv 2025, arXiv:2506.22893. [Google Scholar]
- Qiao, S.; Qiu, Z.; Ren, B.; Wang, X.; Ru, X.; Zhang, N.; Chen, X.; Jiang, Y.; Xie, P.; Huang, F.; et al. Agentic Knowledgeable Self-awareness. arXiv 2025, arXiv:2504.03553. [Google Scholar] [CrossRef]
- Chong, T.; Yu, T.; Keeling, D.I.; de Ruyter, K. AI-chatbots on the services frontline addressing the challenges and opportunities of agency. J. Retail. Consum. Serv. 2021, 63, 102735. [Google Scholar] [CrossRef]
- Jeon, Y.A. Let me transfer you to our AI-based manager: Impact of manager-level job titles assigned to AI-based agents on marketing outcomes. J. Bus. Res. 2022, 145, 892–904. [Google Scholar] [CrossRef]
- Sargsyan, L. Integrating Agentic AI in Higher Education: Balancing Opportunities, Challenges, and Ethical Imperatives. Foreign Lang. High. Educ. 2025, 29, 87–100. [Google Scholar] [CrossRef]
- Kamalov, F.; Calonge, D.S.; Smail, L.; Azizov, D.; Thadani, D.R.; Kwong, T.; Atif, A. Evolution of AI in education: Agentic workflows. arXiv 2025, arXiv:2504.20082. [Google Scholar] [CrossRef]
- AI Agents: Potential implications for IS Research? Available online: https://www.sciencedirect.com/science/article/pii/S0963868725000216 (accessed on 14 August 2025).
- Gridach, M.; Nanavati, J.; Abidine, K.Z.E.; Mendes, L.; Mack, C. Agentic AI for scientific discovery: A survey of progress, challenges, and future directions. arXiv 2025, arXiv:2503.08979. [Google Scholar]
- Zhou, R.; Sikand, V.; Rao, S. AI Agents for Deep Scientific Research. In Proceedings of the UIUC Spring 2025 CS598 LLM Agent Workshop, Urbana, IL, USA, 27 April 2025; Submitted. Available online: https://openreview.net/forum?id=wODNrFtTT2 (accessed on 14 August 2025).
- Yager, K.G. Towards a science exocortex. Digit. Discov. 2024, 3, 1933–1957. [Google Scholar] [CrossRef]
- Budig, T.; Nißen, M.; Kowatsch, T. Towards the Embodied Conversational Interview Agentic Service ELIAS: Development and Evaluation of a First Prototype. In Proceedings of the Adjunct 33rd ACM Conference on User Modeling, Adaptation and Personalization, New York, NY, USA, 16–19 June 2025. [Google Scholar] [CrossRef]
- Borghoff, U.M.; Bottoni, P.; Pareschi, R. Human-artificial interaction in the age of agentic AI: A system-theoretical approach. Front. Hum. Dyn. 2025, 7, 1579166. [Google Scholar] [CrossRef]
- Weisz, J.D.; He, J.; Muller, M.; Hoefer, G.; Miles, R.; Geyer, W. Design Principles for Generative AI Applications. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024. [Google Scholar] [CrossRef]
- Vajpayee, P.; Hossain, G. Cyber Defense through Agentic AI Enabled Automation: An Approach to Reduce Cyber Risk. In Proceedings of the 2025 Computers and People Research Conference, Waco, TX, USA, 28–30 May 2025. [Google Scholar] [CrossRef]
- Mukherjee, A.; Chang, H.H. Agentic AI: Expanding the Algorithmic Frontier of Creative Problem Solving. arXiv 2025, arXiv:2502.00289. [Google Scholar]
- Seifert, G.; Sealander, A.; Marzen, S.; Levin, M. From reinforcement learning to agency: Frameworks for understanding basal cognition. BioSystems 2024, 235, 105107. [Google Scholar] [CrossRef]
- Zhang, R.; Tang, S.; Liu, Y.; Niyato, D.; Xiong, Z.; Sun, S.; Mao, S.; Han, Z. Toward agentic AI: Generative information retrieval inspired intelligent communications and networking. arXiv 2025, arXiv:2502.16866. [Google Scholar] [CrossRef]
- Motamary, S. Transforming Customer Experience in Telecom: Agentic AI-Driven BSS Solutions for Hyper-Personalized Service Delivery. 2024. Available online: https://ssrn.com/abstract=5240126 (accessed on 14 August 2025).
- Hughes, L.; Dwivedi, Y.K.; Malik, T.; Shawosh, M.; Albashrawi, M.A.; Jeon, I.; Dutot, V.; Appanderanda, M.; Crick, T.; De’, R.; et al. AI agents and agentic systems: A multi-expert analysis. J. Comput. Inf. Syst. 2025, 65, 489–517. [Google Scholar] [CrossRef]
- Bonnefon, J.; Rahwan, I.; Shariff, A.F. The Moral Psychology of Artificial Intelligence. Annu. Rev. Psychol. 2023, 75, 653–675. [Google Scholar] [CrossRef]
- Cruzes, S. Revolutionizing optical networks: The integration and impact of large language models. Opt. Switch. Netw. 2025, 57, 100812. [Google Scholar] [CrossRef]
- Cabitza, F.; Campagner, A.; Simone, C. The need to move away from agential-AI: Empirical investigations, useful concepts and open issues. Int. J. Hum.-Comput. Stud. 2021, 155, 102696. [Google Scholar] [CrossRef]
- Ni, B.; Buehler, M.J. MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge. Extrem. Mech. Lett. 2024, 67, 102131. [Google Scholar] [CrossRef]
- Rathakrishnan, M.; Gayan, S.; Singh, R.; Kaur, A.; Inaltekin, H.; Edirisinghe, S.; Poor, H.V. Towards AI-Driven RANs for 6G and Beyond: Architectural Advancements and Future Horizons. arXiv 2025, arXiv:2506.16070. [Google Scholar] [CrossRef]
- Miehling, E.; Ramamurthy, K.N.; Varshney, K.R.; Riemer, M.; Bouneffouf, D.; Richards, J.T.; Dhurandhar, A.; Daly, E.M.; Hind, M.; Sattigeri, P.; et al. Agentic AI needs a systems theory. arXiv 2025, arXiv:2503.00237. [Google Scholar]
- Ajmani, L.H.; Abdelkadir, N.A.; Chancellor, S. Secondary Stakeholders in AI: Fighting for, Brokering, and Navigating Agency. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, New York, NY, USA, 23–26 June 2025. [Google Scholar] [CrossRef]
- Cheung, L.H.; Wang, L.; Lei, D. Conversational, agentic AI-enhanced architectural design process: Three approaches to multimodal AI-enhanced early-stage performative design exploration. Archit. Intell. 2025, 4, 1–25. [Google Scholar] [CrossRef]
- Jilk, D.J. Limits to verification and validation of agentic behavior. In Artificial Intelligence Safety and Security; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018; pp. 225–234. [Google Scholar]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).