Review

The Rise of Agentic AI: A Review of Definitions, Frameworks, Architectures, Applications, Evaluation Metrics, and Challenges

1 School of Computer Science and Information Systems, Northwest Missouri State University, Maryville, MO 64468, USA
2 Independent Researcher, Omaha, NE 68022, USA
* Author to whom correspondence should be addressed.
These authors contributed equally to this work. The authors are listed alphabetically by last name.
Future Internet 2025, 17(9), 404; https://doi.org/10.3390/fi17090404
Submission received: 18 August 2025 / Revised: 1 September 2025 / Accepted: 1 September 2025 / Published: 4 September 2025

Abstract

Agentic AI systems are a recently emerged and important approach that goes beyond traditional AI, generative AI, and autonomous systems by focusing on autonomy, adaptability, and goal-driven reasoning. This study provides a clear review of agentic AI systems by bringing together their definitions, frameworks, and architectures, and by comparing them with related areas such as generative AI, autonomic computing, and multi-agent systems. To do this, we reviewed 143 primary studies on current LLM-based and non-LLM-driven agentic systems and examined how they support planning, memory, reflection, and goal pursuit. Furthermore, we classified architectural models, input–output mechanisms, and applications by the task domains in which agentic AI is applied, supported by tabular summaries that highlight real-world case studies. Evaluation metrics were classified into qualitative and quantitative measures, alongside the available testing methods for checking the performance and reliability of agentic AI systems. This study also highlights the main challenges and limitations of agentic AI, covering technical, architectural, coordination, ethical, and security issues. By organizing the conceptual foundations, available tools, architectures, and evaluation metrics, this research defines a structured foundation for understanding and advancing agentic AI. These findings aim to help researchers and developers build better, clearer, and more adaptable systems that support responsible deployment in different domains.

1. Introduction

Agentic AI refers to AI systems that do not just answer prompts; they set sub-goals, choose tools, and take multi-step actions to achieve a user’s objective with limited supervision. In practice, these systems coordinate multiple agents, each handling part of the job, and an orchestration layer that keeps them aligned with the goal [1]. An agent is a system that senses its environment, decides what to do, and takes action to achieve a goal. A simple example is a travel planner agent: given ‘plan a 3-day trip to Chicago under USD 1500’, it breaks the task into subtasks such as flights, hotel, and itinerary, queries real-time sources, compares options, books reservations, and then produces a shareable plan, asking for confirmation only at key steps. This goes beyond traditional chatbots by planning, acting, and verifying outcomes in a loop [2]. The term AI agent refers to an autonomous software entity that perceives, reasons, and acts to complete specific tasks adaptively. The term agentic AI refers to a multi-agent system in which specialized agents collaborate, coordinate, and plan to achieve complex, high-level objectives. The difference between these terms, illustrated with a flight-booking example, is shown in Figure 1.
Interest in agentic AI is rising quickly as organizations look for automation that impacts real work, not just content generation. Industry analysts now track “AI agents” as a distinct market, estimating ∼USD 5.3–5.4 billion in 2024 and projecting ∼USD 50–52 billion by 2030 (≈41–46% CAGR), reflecting strong enterprise demand for agents that can reason, use tools, and execute workflows [3,4]. Scholarly research points in the same direction. One study from Stanford HAI [5] demonstrated agents that exhibited believable human behavior and sustained personas. Additionally, industry guidance such as that from AWS [2] describes increasing levels of autonomy in which the agent iteratively plans, acts, checks outcomes, and adjusts each step autonomously rather than expecting step-by-step prompts from a human. Together, these advances move AI from reactive assistants to proactive collaborators in various fields [6,7,8].
According to Google Trends, interest in “agentic AI” remained minimal for years, then spiked beginning in April 2024, reaching its peak in July 2025. A value of 100 represents the term’s highest recorded popularity, as shown in Figure 2. This sharp rise reflects a global shift toward building AI that can plan, act, and adapt with minimal human guidance.
Traditional systems enabled the automation of individual steps that complied with limited rules; agentic AI connects these steps, tracks progress, recovers from errors, and can automate end-to-end processes, such as month-end close, claims adjudication, sales outreach, and even research workflows, that were previously limited to human effort. Early enterprise reporting highlights this architectural turn from smarter models to agentic, process-embedded systems [9]. This review synthesizes how the field defines agentic AI, the architectures that enable it, where it is being applied, how it is measured, and the open challenges, such as reliability, coordination, safety, and governance, that must be addressed for widespread deployment.

1.1. Research Purpose

The purpose of this research is to provide a comprehensive review of agentic AI by synthesizing its definitions, frameworks, and architectures, while distinguishing agentic AI from related paradigms. This study aims to identify current tools and frameworks that support agentic capabilities, describe available architectural models of such systems, and explore their applications across domains. Furthermore, it concentrates on the input–output mechanisms of agentic AI, existing testing methods, and metrics used to assess agentic performance, and addresses the key challenges and limitations. By doing so, the paper establishes a structured foundation for understanding, assessing, and advancing the field of agentic AI.

1.2. Research Questions

The motivation for this research resulted from the growing importance of agentic AI and the need to better understand what it is and how it works. This study aims to provide researchers and developers with a clear overview of agentic AI by examining its definitions, how it differs from related technologies, the tools and frameworks that support it, its system design, the types of tasks it can perform, how it interacts with inputs and outputs, methods for evaluating it, and the challenges it faces. The objective is to support responsible development and deployment while also assisting in the improvement of agentic system design, use, and evaluation across various domains.
RQ1
How is agentic AI conceptually defined, and how does it differ from related paradigms?
Ans:
We define agentic AI’s core characteristics, distinguishing it from generative AI, autonomic computing, and multi-agent systems. A Venn diagram-based taxonomy illustrates overlaps and distinctions, clarifying terminology and providing a structured foundation.
RQ2
To what extent do current LLM-based and non-LLM-driven agentic AI systems, tools, and frameworks enable agentic capabilities?
Ans:
We analyze frameworks such as LangChain, AutoGPT, BabyAGI, OpenAgents, AutoGen, CAMEL, MetaGPT, SuperAGI, TB-CSPN, and non-LLM-driven agentic systems. Their features, such as planning, memory, reflection, and goal pursuit, are compared to evaluate how effectively current LLM-based tools enable agentic capabilities and approach true autonomy.
RQ3
What are the core components or architectural models used to build agentic AI systems?
Ans:
We define the core architectural components of agentic AI, emphasizing planning modules, memory systems, and reasoning engines.
RQ4
What types of goals and tasks are currently being solved using agentic AI across domains?
Ans:
We provide a comprehensive classification of goals and tasks being addressed by agentic AI across multiple domains, along with their application contexts and corresponding systems used in the literature, presented in a tabular format.
RQ5
What kinds of input and output formats do agentic AI systems handle in comparison to traditional AI systems?
Ans:
We provide a detailed overview of the input and output formats handled by agentic AI systems, along with the specific tasks and corresponding applications, summarized in a tabular format, and compare this with the input–output mechanisms of traditional AI systems.
RQ6
What evaluation methods and metrics are used to assess the performance of agentic AI systems?
Ans:
We present a classification of both the testing methods and the evaluation metrics used to assess agentic AI systems. The evaluation metrics are further classified into qualitative and quantitative measures.
RQ7
What are the key challenges and limitations in designing and deploying agentic AI systems?
Ans:
We present the key challenges of agentic AI, including architectural and technical challenges, performance and tool-integration issues, coordination among multiple agents, and user-experience issues, along with ethical and security challenges.

1.3. Contributions and Research Significance

This survey provides detailed answers to research questions that show the importance and impact of agentic AI systems in the real world. Exploring the definitions, frameworks, and architectural models of agentic AI provides important guidance and information for researchers and developers. Understanding these foundational elements is useful for designing agentic AI systems capable of autonomous planning, decision-making, and adaptive behavior. This study also reviews current tools and frameworks of agentic AI systems and their input and output mechanisms, and describes how agentic capabilities are implemented in real-time applications. Organizing architectures, applications, and evaluation methods into a clear overview helps readers understand and choose appropriate approaches and tools for different tasks and areas.
Additionally, this research examines the key challenges and limitations in developing and deploying agentic AI, including reliability, safety, interpretability, and governance concerns. By highlighting these challenges and discussing robust evaluation methods, it contributes to establishing reliable assessment frameworks that improve the credibility and practical application of agentic AI systems. These insights facilitate informed decision-making in both academic and industrial contexts and support the development of more advanced, dependable, and context-aware agentic AI solutions. Overall, this study offers critical knowledge on conceptual, architectural, and practical aspects of agentic AI, advancing understanding and guiding future research and implementation efforts.

1.4. Organization of the Paper

The remainder of this paper is organized as follows. Section 1 introduces the motivation and background for studying agentic AI, the research purpose, proposed research questions, and outlines the overall organization of the paper. Section 2 describes the methodology used to conduct the review and analysis. Section 3 presents the results, beginning with the conceptual definition of agentic AI and its differences from related paradigms. It then examines the extent to which current LLM-based frameworks enable agentic capabilities, followed by a discussion of core components and architectural models, including a comparison between architectures and components. The section further explores the types of goals and tasks addressed by agentic AI across domains, the input–output mechanisms in comparison to generative AI systems, and the evaluation methods and metrics used to assess performance. Challenges and limitations in designing and deploying agentic AI systems are also highlighted. Section 4 concludes the paper by summarizing the findings and offering directions for future research. Appendix A presents tables summarizing the bibliographic search results of primary studies, including publication venues, years, and paper types, while Appendix B lists the abbreviations used throughout the paper.

2. Methodology

To carry out this study, we searched several academic databases, including Library Search, ACM Digital Library, ScienceDirect, Google Scholar, and Semantic Scholar. We created a precise search query (“agentic AI” OR “agent” OR “autonomous AI” OR “multi-agent systems” OR “autonomic computing” OR “generative AI” OR “AI planning” OR “AI memory systems” OR “AI feedback loops”) AND (“survey” OR “overview” OR “review” OR “summary” OR “literature review”) to find studies relevant to our research. After collecting the results, we removed duplicates and examined abstracts and conclusions to determine relevance. We included papers that discussed agentic AI concepts, systems, or challenges, using case studies, surveys, and other research methods. We also looked at the references of these papers to find additional useful studies. Finally, we extracted key information from all selected studies to summarize current knowledge on agentic AI.
The search process yielded 143 primary studies, published from 2005 to the present, from which we extracted the data. More than 90% of these papers were published in 2024 and 2025, which makes it clear that most agentic-AI-related papers are being published very recently. Further statistics on year-wise publication counts are shown in Table A1. Of these papers, 17% are conference proceedings, 46% are peer-reviewed journal articles, and 28% are preprints; the remainder are dissertations, articles, workshop papers, etc. Table A2 presents the distribution of publication types and the corresponding number of papers.
Turning to publication venues, 32 of the selected papers were published on the arXiv platform, followed by 27 papers from conference proceedings, including ACM international conferences on Autonomous Agents and Multi-Agent Systems, Interactive Media Experiences, Information and Knowledge Management, Human Factors in Computing Systems, and others. Additionally, 43 papers were published in journal venues such as IEEE Access and Engineering Applications of Artificial Intelligence. One of the selected papers is a PhD dissertation, and the remaining papers come from venues such as Procedia CIRP and MethodsX. Table A3 provides the complete list of venues. We observed that most of the publications are multidisciplinary, which is consistent with agentic AI being adopted across disciplines to solve domain-specific problems.
Along with the main research papers, we also studied 21 survey and review articles that provide useful perspectives on agentic AI. Each of these surveys was examined based on our chosen dimensions, and their contributions were marked as L, M, H, or NA to represent low, medium, high, and not applicable. As shown in Table 1, many of the surveys emphasize specific aspects such as taxonomy, system architecture, or domain-focused applications of LLM-based agents, while giving little attention to evaluation methods or input–output classifications. Some papers discuss specialized areas like planning, reasoning, integration with 6G networks, or retail systems, while others examine broader themes including trust, risk, operational practices, and public attitudes toward AI.
From this analysis, it is clear that most existing surveys cover only selected dimensions, leaving important areas such as standardized benchmarks, performance evaluation, and input–output modeling less developed. In contrast, our paper provides complete coverage of all six dimensions: definitions, concepts, and taxonomy; architecture; applications; input–output classifications; evaluation metrics; and challenges. By taking this wider and unified approach, our research addresses the gaps in prior work and provides a better understanding of agentic AI that will help answer the questions and challenges of future research and real-world practice.

3. Results

3.1. How Is Agentic AI Conceptually Defined, and How Does It Differ from Related Paradigms?

Agentic artificial intelligence (agentic AI) marks a major change in how AI systems are designed. Moving beyond passive and reactive tools, agentic AI refers to autonomous, goal-driven systems that can operate on their own for long periods, requiring minimal human supervision [27,31,32,33]. Unlike traditional AI or standard large language models (LLMs) that respond only to single prompts [34], these systems can understand broad objectives, break them down into smaller tasks, and carry out multi-step plans while adapting to feedback from changing environments [35,36,37]. Their “agentic” nature also highlights their ability to take on responsibilities from humans [32], act purposefully, and be accountable for the results they produce.
What makes agentic AI stand out is its combination of key abilities in one system. These include strategic planning, memory that preserves context over time, use of external tools to extend capabilities, and collaboration with other agents [38,39,40]. By combining foundation models with planners, knowledge bases, sensors, actuators, and feedback mechanisms, these systems can understand complex environments, adjust strategies in real time, and achieve their goals independently [37,41,42]. In practice, autonomy can be defined as a system’s capacity to establish sub-goals, finish tasks without human involvement, and adjust to unexpected events [22]. Even though existing frameworks show some autonomy, they are frequently still constrained by problems such as reliance on human-defined goals, vulnerability to mistakes in long-term planning, and difficulties with accountability. Despite these constraints, such capabilities make agentic systems highly valuable in areas like scientific research, industrial automation, and compliance management [43,44,45], where precision, continuous learning, and smart decision-making are essential.
At the heart of agentic AI is self-directed, goal-focused intelligence. These systems operate with a “degree of agenticness,” meaning they can work proactively, plan ahead, and shape outcomes over time [14,46]. As research and industry continue to adopt these systems, agentic AI has the potential to become a reliable partner for humans extending our abilities while raising new questions about trust, oversight, and collaboration [36].
Agentic AI goes beyond simple automation by enabling systems to understand high-level intent, create actionable plans, and carry them out with minimal human input [27,31,38,47]. These systems combine goal-driven behavior, flexible decision-making, and adaptive reasoning to work effectively in dynamic and evolving situations [48,49,50]. They can pursue multiple goals at once, plan for the long term, and collaborate across several agents or components [37,38,51]. Architecturally, they often integrate large AI models with knowledge bases, planners, access to external tools, and persistent memory to improve understanding and independence [21,42,52]. By continuously learning from experience and adjusting to complex, unpredictable environments [53], agentic AI shifts from simple automation toward reflective, autonomous systems capable of achieving complex objectives.
The purpose of the comparative overview is to present the similarities and differences between multi-agent systems, generative AI, and agentic AI in a structured format, as shown in Table 2. By organizing the paradigms across key aspects such as decision-making, adaptability, learning approach, and workflow management, it highlights their distinctive features. The side-by-side format illustrates the progression from distributed coordination in MAS and pattern-based generation in GenAI toward the autonomy and goal-driven reasoning of agentic AI. This approach ensures complex concepts are communicated clearly.
The purpose of the Venn diagram in Figure 3 is to provide a visual representation of how multiple fields of artificial intelligence overlap and intersect while emphasizing the importance of agentic AI. It shows how agentic AI overlaps with reinforcement learning and multi-agent systems and places it within the broader universe of AI. This format allows us to better understand and visualize the dependencies and shared characteristics between agentic AI and related fields. In general, the diagram serves as a complement to the written discussion, providing a concise and more accessible picture of how agentic AI is situated and interacts within the greater ecosystem of AI technologies.
Figure 3 shows the relationships between different areas of artificial intelligence. At the broadest level, everything falls under AI, with Machine Learning (ML) as a major subset. Inside ML, deep learning (DL), generative AI (Gen AI), and LLMs are shown as nested areas, reflecting their strong connection. Reinforcement learning (RL) also sits within ML but separately. Agentic AI overlaps with these technologies, as it draws from both LLMs and RL, while also sharing ideas with MAS [77,78]. Autonomic computing is placed outside, showing that it is related but not fully inside the AI family; it focuses on self-managing systems that can configure, heal, optimize, and protect themselves, which overlaps with AI but also extends into broader fields like distributed systems and IT infrastructure.
Agentic AI is built strongly on top of LLMs and RL. LLMs give agents the ability to understand and generate natural language, enabling them to reason, plan, and interact in ways that feel human-like. RL, meanwhile, allows agents to learn from their actions and continually improve their decisions, making them adaptive and focused on achieving goals [31,79]. Together, LLMs provide intelligence in communication and knowledge, while RL contributes the ability to learn from experience. This combination is at the heart of how agentic AI systems can plan tasks, solve problems, and act independently.
The overlap between MAS and agentic AI is also important. MAS focus on how multiple autonomous agents interact, cooperate, or compete within a shared environment. These skills are essential for agentic AI, where several agents often need to coordinate to achieve complex objectives. Beyond using LLMs and RL, agentic AI depends on planning algorithms and memory systems, which help agents remember past experiences and develop long-term strategies. Altogether, these components form the foundation of agentic AI, allowing it to act not just as a tool but as an intelligent, goal-driven partner.

3.2. To What Extent Do Current LLM-Based and Non-LLM-Driven Agentic Systems, Tools, and Frameworks Enable Agentic Capabilities?

3.2.1. LLM-Based Agentic Systems

The development of LLM-based frameworks like LangChain [26,31,78,80], AutoGPT [26,31,78], BabyAGI [31], AutoGen [80], and OpenAgents [26] shows a clear trend toward building smarter, more independent AI systems. These frameworks give AI agents the ability to plan, remember past actions, reflect on their performance, pursue goals, and use external tools. By equipping agents with these cognitive skills, they become more adaptable and capable of handling complex tasks with minimal human guidance.
Table 3 presents an organized overview of major LLM-based frameworks. It highlights each framework’s primary purpose, key agentic capabilities, and the underlying LLM used. Structuring this information in tabular form makes complex details easier to navigate. The format also facilitates side-by-side comparison, enabling the identification of unique features as well as areas of overlap in agentic capabilities.
LangChain is a popular open-source framework that connects LLMs to external tools, data sources, and APIs, simplifying the construction of complex multi-step workflows [26,31,78,79]. It allows developers to construct sequences of reasoning steps (i.e., chains) to help the model reach an objective. LangChain provides basic memory that allows agents to store and recall information and integrates external knowledge using Retrieval-Augmented Generation (RAG), which gives agents access to documents, databases, or search results while they reason [77,80]. While it facilitates task planning and chaining, it does not natively include self-reflection or optimization, so its intelligence is largely guided by the developer. LangChain is ideal for creating structured, goal-oriented workflows where models can interact with external systems and perform contextual reasoning.
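To make the chain-plus-retrieval pattern concrete, the sketch below shows a framework-agnostic Python loop in the spirit of LangChain-style chaining; it deliberately does not use the actual LangChain API, and the `llm`, `retrieve`, and `search_flights` callables are hypothetical stand-ins for a model endpoint, a vector store, and an external tool.

```python
from typing import Callable, Dict, List

def build_chain(llm: Callable[[str], str],
                retrieve: Callable[[str], List[str]],
                tools: Dict[str, Callable[[str], str]]):
    """Return a three-step chain: retrieve context, pick a tool, then answer."""
    def run(goal: str) -> str:
        # Step 1: Retrieval-Augmented Generation -- pull supporting documents.
        context = "\n".join(retrieve(goal))
        # Step 2: let the model decide which registered tool to call.
        tool_name = llm(f"Goal: {goal}\nContext: {context}\nPick one tool of {list(tools)}:").strip()
        tool_output = tools.get(tool_name, lambda _: "no tool")(goal)
        # Step 3: final answer grounded in retrieved context and tool output.
        return llm(f"Goal: {goal}\nContext: {context}\nTool result: {tool_output}\nAnswer:")
    return run

# Usage with stub components (replace with a real model, vector store, and APIs).
chain = build_chain(
    llm=lambda prompt: "search_flights" if "Pick one tool" in prompt else "Draft itinerary...",
    retrieve=lambda q: ["Chicago guide excerpt", "Budget travel tips"],
    tools={"search_flights": lambda q: "MCI->ORD, USD 180 round trip"},
)
print(chain("Plan a 3-day trip to Chicago under USD 1500"))
```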
AutoGPT is built as a fully autonomous agent capable of completing complex tasks without needing constant human guidance [43,45,47]. It can take high-level goals, break them down into smaller actionable steps, and adjust its plan dynamically as circumstances change. With both short-term and long-term memory, AutoGPT keeps track of context and can handle tasks that go beyond a single interaction. It also leverages tools like web search, code execution, and file management to operate effectively in real-world settings [29,44]. Through its iterative process, AutoGPT can analyze outcomes and refine its actions, combining planning, memory, reflection, and goal-oriented execution to manage tasks autonomously.
BabyAGI takes a slightly different approach to autonomous agents, emphasizing simplicity and general accessibility for task generation [49,81]. It can decompose high-level goals into smaller, actionable steps and uses memory to keep track of progress on a multi-step task. BabyAGI has extensions for working with external tools, such as those for code generation and robotics, so it can act effectively in both digital and physical environments. Although it lacks the fully explicit reflection ability of AutoGPT, it is still iterative: it can execute a task, receive feedback, and refine or adopt new actions over time, giving it a practical level of self-correction for goal achievement [82]. BabyAGI provides developers with the ability to implement autonomous problem-solving agents of moderate complexity.
AutoGen is designed for multi-agent systems, allowing several AI agents to communicate, collaborate, and work together to solve tasks [41]. Each agent can create its own plan while coordinating with others, using memory to track past actions and shared context. The framework supports reflection and optimization, so agents can evaluate their plans and improve them over time. By combining tool use with teamwork, AutoGen reduces errors, boosts performance, and can tackle complex, distributed goals more effectively. This makes it especially useful for research or applications that rely on multi-agent reasoning and collaborative workflows.
OpenAgents focuses on developing LLM-based agents that can work together in a shared ecosystem to accomplish complex goals [26,78,79]. The framework uses contextual memory to keep interactions coherent over multiple steps, and it supports iterative planning and reasoning so agents can refine their decisions over time. By integrating external tools, agents can carry out tasks effectively while combining internal reasoning with knowledge from outside sources [31,77,80]. OpenAgents also lets agents improve themselves through repeated reasoning cycles and supports collaboration among multiple agents on tasks that require continuous decision-making, dynamic problem-solving, or integrating diverse sources of knowledge.
Communicative Agents for “Mind” Exploration of Large-Scale Language Model Society (CAMEL) is a multi-agent framework that enables role-playing: LLM agents converse with each other so that new collaborative behaviors can emerge. To solve tasks, it assigns complementary roles (such as “User” and “Assistant”) and promotes dialogue-based collaboration [83,87]. CAMEL facilitates autonomous reasoning, negotiation, and iterative solution refinement by breaking objectives down into conversational exchanges. In contrast to conventional single-agent systems, CAMEL leverages the interactions of multiple agents to produce more complex results. Most implementations use GPT-4 and Large Language Model Meta AI (LLaMA) as the underlying models, though it can operate on other LLMs. CAMEL is especially helpful for studying agent coordination, cooperation, and emergent problem-solving in constrained or open-ended environments.
MetaGPT is an advanced framework for multi-agent collaborative problem-solving that leverages GPT-4 as a central intelligent agent. It automates tasks that can be represented as a series of smaller sub-problems by breaking each problem into sub-problems, automatically assigning them to agents based on their specialization, and coordinating the agents’ interactions to solve the overarching problem [84,87]. By managing an organized “society” of agents, MetaGPT achieves efficiency and scalability in solving challenging problems. It can be used to automate workflows or research, or to support decision-making and reasoning tasks that draw upon diverse expertise and collaborative reasoning. It is designed for seamless communication and task management among multiple agents under dynamic complexity, showcasing a highly advanced evolution of AI-enabled collaboration.
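The role-playing pattern behind CAMEL (and the role assignment behind MetaGPT) can be sketched as a simple two-role conversation loop. The code below is an illustrative simplification, not the CAMEL or MetaGPT API; the `llm` callable and the `TASK_DONE` termination token are assumptions introduced only for this example.

```python
from typing import Callable

def role_play(task: str, llm: Callable[[str, str], str], max_turns: int = 6) -> list[str]:
    """CAMEL-style loop: a 'User' agent refines instructions, an 'Assistant' agent solves them."""
    transcript: list[str] = []
    user_msg = f"Task: {task}. Give the assistant its first instruction."
    for _ in range(max_turns):
        instruction = llm("You play the User role; issue the next instruction.", user_msg)
        solution = llm("You play the Assistant role; carry out the instruction.", instruction)
        transcript += [f"User: {instruction}", f"Assistant: {solution}"]
        if "TASK_DONE" in instruction:      # termination signal emitted by the User role
            break
        user_msg = solution                 # the Assistant's answer conditions the next instruction
    return transcript
```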
SuperAGI is an open-source framework for autonomous agents. Developers can quickly define, deploy, and manage LLM-powered agents, with great flexibility. The software is mostly model-agnostic and supports LLMs from many providers (LLaMA, Anthropic, and OpenAI). SuperAGI agents have the capability for goal-directed execution; memory management; multi-step planning; and tool use, among others [85]. In addition, SuperAGI includes resource controls and monitoring to constrain autonomous behavior. Unlike lighter-weight orchestration tools, SuperAGI is oriented toward production readiness, allowing agents to be scaled and monitored within real-world workflows. Its modularity and extensibility make it a strong contender for enterprise-level autonomous agent applications.
A hybrid framework called Task-Based Cognitive Sequential Planning Network (TB-CSPN) blends selective LLM use with formal rule-based coordination [86]. Its architecture limits LLM involvement to semantic topic extraction and uses Colored Petri Nets for deterministic task planning and execution. This mitigates LLM hallucinations and non-determinism by guaranteeing that reasoning and sequencing are logically consistent and verifiable. Instead of using unrestricted LLM reasoning, TB-CSPN agents communicate through structured tokens, and their interactions are guided by formal cognitive models. In contrast to frameworks that rely mainly on language models for planning, TB-CSPN demonstrates a more cautious integration of AI, balancing LLM flexibility with symbolic rigor. It works particularly well in fields that need topic-driven task coordination, auditability, and dependability.

3.2.2. Non-LLM-Driven Agentic Systems

Intelligent agents can act and make decisions without relying on large language models. Non-LLM-driven agentic systems achieve autonomy through rules, learning, planning, or reactive behaviors; they have been used extensively in robotics, simulations, and automated decision-making. These systems show that AI can be effective and adaptive without natural language understanding.
One important type is rule-based multi-agent systems. In these systems, agents follow predefined rules or logic to interact and achieve coordinated outcomes. Examples include CLIPS, Drools, and JADE [88], which are used for simulations, workflow automation, and decision-support tasks. Another key category is RL agents [89,90], which learn from trial and error. Using algorithms like Q-learning, Deep Q-Networks (DQN) [91], and Policy Gradient methods, RL agents improve their performance over time by interacting with their environment. These agents are commonly used in robotics, autonomous vehicles, and dynamic systems.
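As a concrete illustration of the trial-and-error learning these RL agents rely on, the following minimal tabular Q-learning sketch trains an agent on a toy five-state corridor; the environment, reward, and hyperparameters are invented for illustration and are not drawn from the cited systems.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning on a toy 5-state corridor (goal at state 4).
ALPHA, GAMMA, EPSILON, EPISODES = 0.1, 0.9, 0.2, 500
ACTIONS = [+1, -1]                       # move right / move left
Q = defaultdict(float)                   # Q[(state, action)]

def step(state, action):
    nxt = max(0, min(4, state + action))
    reward = 1.0 if nxt == 4 else 0.0
    return nxt, reward, nxt == 4

for _ in range(EPISODES):
    s, done = 0, False
    while not done:
        # epsilon-greedy action selection
        a = random.choice(ACTIONS) if random.random() < EPSILON else max(ACTIONS, key=lambda x: Q[(s, x)])
        s2, r, done = step(s, a)
        # Q-learning update: move Q toward reward plus discounted best next value
        best_next = max(Q[(s2, x)] for x in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        s = s2

print({s: max(ACTIONS, key=lambda x: Q[(s, x)]) for s in range(5)})  # learned greedy action per state
```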
Other non-LLM agentic systems include planning agents, which use algorithms such as STRIPS or PDDL [92] to achieve goals; agent-based simulation frameworks like NetLogo [93] and MASON [94], which model complex social, ecological, or economic behaviors; and behavior-based robotic agents, such as those using subsumption architectures, which respond to environmental stimuli through simple sensorimotor rules. Together, these systems demonstrate that intelligence, adaptability, and autonomous behavior can emerge through rules, learning, and planning, highlighting the broad potential of agentic AI beyond language-based models.

3.3. Core Components and Architectural Models for Agentic AI

This section defines a set of agent components and uses them to compare architectures by control flow and module wiring rather than terminology: perception, world/state, memory, planning, execution/tooling, reflection/evaluation, orchestration, and interaction. We then outline five orchestration models: ReAct (Reasoning + Acting) single-agent, supervisor/hierarchical, hybrid reactive–deliberative, BDI, and layered neuro-symbolic, highlighting when to use each.

3.3.1. Core Functional Components of Agentic AI Systems

Core components are the reusable building blocks of agentic AI, wired together by architectural choices. A common set includes the following:
  • Perception and world modeling—Ingests and structures external inputs (e.g., text, sensors, APIs) into internal representations [41,95].
  • Memory (Short-Term, Long-Term, Episodic)—Stores short- and long-term knowledge; retrieval/promotion rules connect past to present reasoning [39,95].
  • Planning, Reasoning, and Goal Decomposition—Transforms goals into actionable steps, evaluates alternatives, and selects next actions [95].
  • Execution and Actuation—Carries out actions via APIs or actuators, with monitoring and dynamic replanning [41].
  • Reflection and Evaluation—Enables self-critique, verification, and refinement of actions and plans [50].
  • Communication, Orchestration, and Autonomy—Coordinates task flow, retries, and timeouts, either centrally (e.g., LLM-based supervisor) or via decentralized protocols [50,96].
    This component stack recurs across both academic and industry implementations, including high-stakes domains like finance [39].
The model, typically an LLM, large agent model (LAM), or foundation model (FM), provides the system’s core intelligence. It often serves as the reasoning engine, perceptual front-end, and, in many cases, the orchestrator. In Multi-Agent Artificial Intelligence (MAAI) frameworks, the model forms a base layer for perception, action, orchestration, workflows, and user interaction [96]. Tutorials increasingly integrate LLMs/LAMs with planners, memory, tools, and knowledge bases [42].
Agentic AI behavior arises from a coordinated set of reusable components rather than from a single model, as shown in Figure 4. In the following, we synthesize the modules and how they interact.
Perception and World Modeling: Perception ingests external inputs (text, events, sensors/APIs) and normalizes them into structured observations. World/state modeling maintains an internal representation used for prediction, consistency checks, and counterfactual simulation. In embodied or data-rich settings, perception is multimodal and frequently layered with probabilistic inference to manage uncertainty before symbolic planning or execution [41,42,95].
Memory (Short-Term, Long-Term, Episodic): Memory provides temporal continuity. Short-term memory (STM) maintains episode context (e.g., current plan, recent exchanges); long-term memory (LTM) stores episodic/semantic knowledge (e.g., preferences, histories, artifacts). Retrieval and promotion rules connect STM and LTM so that prior outcomes inform future decisions, reflection, and personalization [47,50,95].
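A minimal sketch of how STM/LTM retrieval and promotion rules might be wired is shown below; the `AgentMemory` class, its promotion flag, and the keyword-overlap retrieval are simplifying assumptions rather than a reference implementation.

```python
from collections import deque
from dataclasses import dataclass, field

@dataclass
class AgentMemory:
    """Sketch of a two-tier memory: bounded STM plus keyword-indexed LTM."""
    stm: deque = field(default_factory=lambda: deque(maxlen=8))   # short-term: recent turns only
    ltm: dict = field(default_factory=dict)                       # long-term: durable facts/artifacts

    def observe(self, item: str, important: bool = False) -> None:
        self.stm.append(item)
        if important:                        # promotion rule: persist flagged items into LTM
            self.ltm[item.split(":")[0]] = item

    def retrieve(self, query: str, k: int = 3) -> list[str]:
        # Naive retrieval: recent STM entries plus any LTM entry sharing a word with the query.
        hits = [v for v in self.ltm.values() if set(query.lower().split()) & set(v.lower().split())]
        return list(self.stm)[-k:] + hits[:k]

mem = AgentMemory()
mem.observe("user_preference: prefers aisle seats", important=True)
mem.observe("turn: asked about Chicago weather")
print(mem.retrieve("book seats for the flight"))
```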
Planning, Reasoning, and Goal Decomposition: The planning/reasoning module transforms goals into actionable steps, evaluates alternatives, and selects next actions across short and long horizons. Granularity varies by paradigm: BDI filters desires into intentions; HRL decomposes abstract goals into sub-tasks; single-agent ReAct interleaves reasoning with action (often with tool calls) [41,47,95].
Execution and Actuation: Execution bridges cognition to impact. It invokes tools/APIs, actuators, or workflow steps; validates outcomes against expectations; and triggers retries or replanning on deviation. Production-oriented variants emphasize schema checks, budget/latency limits, and robust error handling to support closed-loop operation in dynamic environments [41,42,95].
Reflection and Evaluation: Reflection/evaluation modules verify intermediate or final outputs, critique candidate plans, and trigger selective replanning. Practical patterns include self-critique, external tool-assisted verification, and nested “critic” roles that reduce hallucination and improve reliability while adding computational overhead [50,95].
Communication, Orchestration, and Autonomy: Interaction modules support human–agent and agent–agent dialogue (clarification, negotiation, oversight) and surface trace information (e.g., actions, tools, sources) for transparency. Autonomy emerges when perception, planning, execution, memory, and reflection are orchestrated over time; in many systems, an LLM-based supervisor coordinates sub-agents, invokes memory or tools, and maintains coherence across steps [41,50,97].

3.3.2. Architectural Models in Agentic AI

Agentic AI systems are not monolithic; their capabilities emerge from architectures that coordinate planning, execution, memory, reasoning, and interaction. Below, we outline five commonly used models and their control flows.
  • ReAct Single-Agent:
Figure 5 represents the architectural flow of the ReAct single agent. ReAct instantiates a single agent that interleaves stepwise reasoning and acting, optionally inserting a lightweight evaluator before committing an output [98]. The core components are perception/world-state, planning/reasoning, execution and actuation, memory (STM/LTM), and reflection/evaluation; the architecture wires these into a minimal, locally orchestrated pipeline with no external supervisor. The agent observes via perception, reasons in plan/reason to select the next step, acts in execute/tools to affect the world or produce a user reply, may evaluate the result, and then updates memory before repeating. The goal is fast iteration on well-bounded tasks, where a compact loop outperforms heavier orchestration.
Components and flow: Data and control proceed left to right: the agent observes (perception), optionally retrieves salient context from memory, reasons in plan/reason to select the next step with success criteria, acts in act/execute via a tool/API, and may evaluate the outcome before logging/promoting state and repeating. In the diagram, rounded rectangles denote processing modules (perception → reason/plan → act/execute) connected by solid arrows (primary progression); memory is shown as a cylinder with an upward arrow to reason/plan (retrieval) and dashed downward arrows from act/execute (and, if present, the evaluator) for logging/promotions; evaluation appears as a hexagon branching by a solid arrow from act/execute and returning dashed feedback to reason/plan and/or memory. User-visible responses are produced in act/execute; the loop closes at the environment boundary when the updated world or next user turn re-enters perception as a new observation.
Use and trade-offs: Interleaving reasoning with action tends to improve multi-step task success, and brief critique/verification further suppresses hallucinations; both, however, introduce extra inference steps and wall-clock latency. ReAct is therefore a strong baseline for bounded, single-persona workloads and rapid prototyping. Its simplicity does not scale gracefully: the loop affords little parallelism, weak long-horizon coordination, and increasing tool-selection brittleness as the tool surface and dependency depth grow. For tasks requiring persistent goals, specialization, or oversight, escalation to supervisor/hierarchical (delegation and parallel execution) or Hybrid reactive–deliberative (real-time reflexes with planning) is appropriate despite added coordination overhead [98].
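The control flow described above can be condensed into a short Python sketch; the `FINISH:`/`tool:` protocol, the `llm` callable, and the tool registry are assumptions made for illustration and simplify the published ReAct prompting format.

```python
from typing import Callable, Dict

def react_loop(goal: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_steps: int = 5) -> str:
    """Minimal ReAct-style loop: think, act with a tool, observe, repeat."""
    scratchpad = f"Goal: {goal}\n"
    for _ in range(max_steps):
        # Reason: the model emits either 'FINISH: <answer>' or '<tool>: <input>'.
        thought = llm(scratchpad + "Next step (FINISH: answer, or tool: input)?").strip()
        if thought.upper().startswith("FINISH:"):
            return thought.split(":", 1)[1].strip()
        tool_name, _, tool_input = thought.partition(":")
        observation = tools.get(tool_name.strip(), lambda _: "unknown tool")(tool_input.strip())
        # Observe: append the result so the next reasoning step can use it.
        scratchpad += f"Action: {thought}\nObservation: {observation}\n"
    return "Step budget exhausted; returning best partial result.\n" + scratchpad
```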
  • Supervisor/Hierarchical:
A supervisor/orchestrator as shown in Figure 6 decomposes high-level goals into subtasks and delegates them to role-specialized sub-agents, each of which executes locally using a compact ReAct-style loop, perception → reason/plan → act/execute, with optional evaluation and memory, while an optional shared/federated state maintains coherence across agents [47,50,95].
Components and flow: Operationally, control proceeds as planning → delegation → execution → reporting → possible re-planning. The supervisor (diamond) issues assignments (solid progression) and receives reports (dashed) that trigger retries or re-planning when outcomes deviate. Within each sub-agent, memory supports retrieval (solid) into planning and accepts logging/promotions (dashed) from execution/evaluation; evaluation appears as a hexagon branching from act/execute, feeding feedback into planning/memory.
Use and trade-offs: This architecture is well-suited to decomposable, multi-stage tasks that benefit from parallel specialization and clear ownership; it improves scalability via hierarchical delegation but incurs coordination overhead and risks a supervisor bottleneck/single point of failure if the controller is not engineered for throughput, fault-tolerance, and observability [47,50,95].
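A minimal sketch of the supervisor's plan–delegate–collect–retry control flow follows; the `plan` and `workers` callables are hypothetical stand-ins, and real systems would add budgets, guardrails, and observability around this loop.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

def supervise(goal: str,
              plan: Callable[[str], List[tuple[str, str]]],   # goal -> [(role, subtask), ...]
              workers: Dict[str, Callable[[str], str]],       # role -> sub-agent callable
              max_rounds: int = 2) -> Dict[str, str]:
    """Sketch of supervisor control flow: plan, delegate in parallel, collect, retry failures."""
    results: Dict[str, str] = {}
    pending = plan(goal)
    for _ in range(max_rounds):
        with ThreadPoolExecutor() as pool:                    # parallel specialist execution
            futures = {sub: pool.submit(workers[role], sub) for role, sub in pending}
        reports = {sub: f.result() for sub, f in futures.items()}
        results.update({s: r for s, r in reports.items() if not r.startswith("ERROR")})
        pending = [(role, sub) for role, sub in pending if reports[sub].startswith("ERROR")]
        if not pending:                                       # all subtasks succeeded
            break
    return results
```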
  • Hybrid reactive–deliberative:
A hybrid architecture runs two coordinated loops in parallel, as shown in Figure 7: a fast reactive loop that handles time-critical events and a slower deliberative loop that maintains goals and plans. The loops share a common memory and are overseen by an Arbitrator (diamond) that resolves conflicts when immediate reactions and long-horizon plans disagree [38,47,95].
Components and flow: The reactive path couples perception of events to actions/actuation with minimal latency; safety/guard checks are typically embedded here. The deliberative path performs planning/reasoning over longer horizons, updating goals and selecting action sequences; it reads from and writes to memory (cylinder) to maintain temporal coherence. A shared memory module synchronizes state between the loops. The Arbitrator (diamond) mediates which loop has control when their recommendations diverge, using task/goal priorities and policy constraints. Flow proceeds: perceive (reactive and abstract inputs) → (branch) fast react vs. slower plan → arbitrate (diamond) → execute → evaluate (hexagon) → update memory → possibly re-plan.
Use and trade-offs: The hybrid model balances speed with strategic reasoning, but this comes with arbitration complexity and a requirement for reliable state synchronization across loops to avoid instability or “thrash.” Empirically and in surveyed practice, it is preferred for real-time operations with long-horizon objectives; risks concentrate in arbitration design and consistency across loops [38,47,95].
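The two-loop-plus-arbitrator control flow can be sketched as a single tick function; the obstacle/waypoint example and the simple "reactive wins" arbitration policy are assumptions for illustration, whereas production arbitrators would encode task priorities and safety constraints.

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Arbitrator:
    """Toy policy: prefer the reactive action whenever one is proposed."""
    def choose(self, reactive: Optional[str], deliberative: Optional[str]) -> Optional[str]:
        return reactive if reactive is not None else deliberative

def hybrid_step(observation: dict,
                reflexes: Callable[[dict], Optional[str]],    # fast loop: event -> action or None
                planner: Callable[[dict], Optional[str]],     # slow loop: state -> planned action
                memory: dict,
                arbitrator: Arbitrator) -> Optional[str]:
    """One tick of the hybrid architecture: both loops propose, the arbitrator decides."""
    memory["last_observation"] = observation                  # shared state between loops
    reactive_action = reflexes(observation)                   # e.g., emergency stop on obstacle
    planned_action = planner(memory)                          # e.g., next waypoint toward the goal
    action = arbitrator.choose(reactive_action, planned_action)
    memory["last_action"] = action
    return action

# Usage: the reflex overrides the plan when an obstacle is observed.
act = hybrid_step({"obstacle": True},
                  reflexes=lambda o: "BRAKE" if o.get("obstacle") else None,
                  planner=lambda m: "GO_TO_WAYPOINT_3",
                  memory={}, arbitrator=Arbitrator())
print(act)  # -> BRAKE
```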
  • Belief–Desire–Intention (BDI):
The BDI architecture models, as shown in Figure 8, deliberation with explicit mental states: beliefs (world information/state), desires (candidate goals), and intentions (the subset of goals the agent commits to pursue). A canonical cycle is observe → update beliefs → generate desires → filter to intentions → act → repeat; commitment rules maintain intentions until a goal is achieved, becomes impossible, or is superseded [38,95].
Components and flow: External world inputs are observed and update beliefs/world state. The agent derives desires and filters them into intentions according to domain rules and commitment policies. Act/execute realizes the current intention via tools/APIs/actuators, producing effects on the external world. An optional evaluation gate verifies outcomes or constraints and can feed revisions back to the desire set or commitment filter. Memory supplies retrieval to the desire/intention stage (schemas, facts, episodic traces) and receives logs/promotions from act/execute and evaluation (summaries, artifacts, outcomes). Control proceeds top→down, world → beliefs → desires → intentions → act/execute, followed by evaluation and memory update; the loop closes when the changed world yields the next observation back into beliefs [38,95].
Use and trade-offs: BDI is a strong fit for tasks where explainability, goal discipline, and traceability matter (e.g., auditability or safety cases). It trades some adaptability for clarity: commitment rules and symbolic state make reasoning legible but can become brittle in open-world settings without robust belief-revision and exception policies [47,50,95].
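The observe → revise beliefs → deliberate → commit → act cycle can be illustrated with a compact sketch; the battery/patrol rules and the commitment policy are toy assumptions, not drawn from any cited BDI implementation.

```python
from typing import Callable, Dict, List, Optional

def bdi_cycle(observations: List[dict],
              options: Callable[[Dict], List[str]],           # beliefs -> candidate desires
              commit: Callable[[List[str], Optional[str]], Optional[str]],  # desires -> intention
              act: Callable[[str], bool]) -> Dict:
    """Sketch of the BDI loop: observe, revise beliefs, deliberate, commit, act."""
    beliefs: Dict = {}
    intention: Optional[str] = None
    for obs in observations:
        beliefs.update(obs)                                   # belief revision from new percepts
        desires = options(beliefs)                            # generate candidate goals
        intention = commit(desires, intention)                # keep or replace the current commitment
        if intention is not None and act(intention):          # intention achieved
            intention = None
    return {"beliefs": beliefs, "intention": intention}

# Usage with toy rules: desire 'recharge' when battery is low, otherwise 'patrol'.
state = bdi_cycle(
    observations=[{"battery": 0.9}, {"battery": 0.1}],
    options=lambda b: ["recharge"] if b.get("battery", 1.0) < 0.2 else ["patrol"],
    commit=lambda desires, current: current if current in desires else (desires[0] if desires else None),
    act=lambda goal: goal == "recharge",                      # only 'recharge' completes immediately
)
print(state)
```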
  • Layered Decision (Neuro-Symbolic):
Layered decision architecture is shown in Figure 9. Layered decision architectures integrate neural perception and probabilistic inference with a symbolic planner and an execution layer, typically followed by evaluation and memory update. The aim is to handle uncertainty via statistical inference while preserving interpretability and traceable decisions through symbolic planning [38,39,41,95].
Components and flow: Perception ingests signals (text, images, telemetry) and encodes features. Probabilistic inference converts these into calibrated state estimates and event hypotheses, exposing uncertainty to discrete reasoning. The symbolic planner operates on this structured state with explicit rules, goals, and constraints; it retrieves schemas, facts, and episodic traces from memory. Act/execute materializes the chosen step via tools/APIs/actuators, producing effects on the external world. An optional evaluation stage performs verification (e.g., constraint checks, cross-model critics), returning feedback to the planner and logging/promoting artifacts to memory; act/execute likewise logs/promotes to memory. Control is top→down, external world → perception → probabilistic inference → symbolic planner → act/execute, followed by evaluation and memory updates; the loop closes as the updated world re-enters perception [38,95].
Use and trade-offs: Layered neuro-symbolic designs are well-suited to open-world decision-making under uncertainty and to domains needing transparent, auditable reasoning (e.g., public-sector workflows). The principal cost is integration and representation alignment, bridging neural encodings with symbolic state/rules and evaluation/feedback, which increases design and maintenance overhead. These strengths and costs are summarized in the comparative table (Table 4) and the accompanying discussion.
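A stripped-down sketch of the neural-perception → probabilistic-inference → symbolic-rules → execution stack follows; the fraud-screening example, the stub encoder, and the linear probability estimate are assumptions chosen only to make the layering explicit.

```python
from typing import Callable, Dict, List, Tuple

def layered_decision(signal: List[float],
                     perceive: Callable[[List[float]], Dict[str, float]],   # neural encoder (stub)
                     infer: Callable[[Dict[str, float]], Dict[str, float]], # probabilistic state estimate
                     rules: List[Tuple[Callable[[Dict[str, float]], bool], str]],  # symbolic planner rules
                     execute: Callable[[str], str]) -> str:
    """Sketch: perception -> probabilistic inference -> rule-based symbolic planning -> execution."""
    features = perceive(signal)
    state = infer(features)                        # calibrated estimates exposed to the symbolic layer
    for condition, action in rules:                # first matching rule wins (explicit, auditable)
        if condition(state):
            return execute(action)
    return execute("defer_to_human")               # no rule fired: fail safe

# Usage with stub layers: flag a transaction if the estimated fraud probability exceeds a threshold.
out = layered_decision(
    signal=[0.9, 0.2],
    perceive=lambda x: {"amount_score": x[0], "history_score": x[1]},
    infer=lambda f: {"p_fraud": 0.5 * f["amount_score"] + 0.5 * (1 - f["history_score"])},
    rules=[(lambda s: s["p_fraud"] > 0.7, "block_transaction"),
           (lambda s: s["p_fraud"] > 0.4, "request_review")],
    execute=lambda a: f"executed: {a}",
)
print(out)  # p_fraud = 0.85 -> executed: block_transaction
```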

3.3.3. Coordination and Modularity in Agentic Architectures

Agentic AIs succeed or fail not only by which components they contain but by how those components are modularized and coordinated at runtime. This section summarizes dominant coordination mechanisms and their trade-offs.
  • Modular Composition:
Architectures commonly encapsulate perception, planning/reasoning, memory, execution/tooling, and reflection into independently deployable subsystems with clear interfaces. This promotes scalability (scale out specific modules), fault isolation (contain failures), and parallelism (concurrent role execution). Modularity also enables “plug-and-play” evolution, e.g., swapping a planner or adding a compliance checker, without redesigning the full system. Design guidance emphasizes tight control of interfaces and state contracts between modules to support predictable orchestration and observability [42,47,50,96].
  • Orchestration and Supervisory Control:
Centralized orchestration: Many systems employ a supervisor (often LLM-based) to route tasks, maintain shared context, trigger reflection or replanning, and arbitrate conflicts between competing goals. The supervisor coordinates specialist agents (planner, retriever, executor, critic), aggregates results, and enforces guardrails and approval gates where required [50,95].
Decentralized orchestration: In multi-agent settings, decision-making is distributed via message-passing and shared-state protocols; roles negotiate or vote to coordinate actions. This improves fault tolerance and throughput but increases testing and observability complexity and requires robust protocols for consensus and backoff [42,47,96].
When to prefer which: Centralized control suits tightly regulated or safety-critical workflows with clear accountability; distributed control fits large, decomposable workloads with dynamic specialization and resilience needs.
  • Communication Protocols and Workflow Graphs:
Production systems typically (i) exchange structured messages (e.g., JSON) including task intents, state snapshots, and tool results, enabling replay and lineage; (ii) use workflow graphs to represent control flow (branching, retries, compensations); and (iii) attach evaluation hooks (critics, policy checks) at key transitions to gate risky actions and trigger replanning or escalation. Graph-based orchestration frameworks and multi-agent planning protocols are frequently reported to improve robustness under non-determinism while preserving transparency for operators [41,42,50,96].
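To make items (i) and (iii) concrete, the sketch below shows a structured JSON message with lineage fields and a policy-gate evaluation hook; the field names, intents, and threshold are illustrative assumptions, not a standardized protocol.

```python
import json
import uuid
from datetime import datetime, timezone
from typing import Optional

def make_message(sender: str, intent: str, payload: dict, parent_id: Optional[str] = None) -> str:
    """Sketch of a structured inter-agent message carrying lineage fields for replay and audit."""
    return json.dumps({
        "id": str(uuid.uuid4()),
        "parent_id": parent_id,               # links replies to requests (lineage)
        "sender": sender,
        "intent": intent,                     # e.g., 'task.assign', 'tool.call', 'state.snapshot'
        "payload": payload,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

def policy_gate(message: str, max_amount_usd: float = 1000.0) -> bool:
    """Evaluation hook at a workflow-graph transition: gate high-risk tool calls before execution."""
    msg = json.loads(message)
    amount = msg["payload"].get("amount_usd", 0.0)
    return not (msg["intent"] == "tool.call" and amount > max_amount_usd)

call = make_message("planner", "tool.call", {"tool": "payments_api", "amount_usd": 25000})
print(policy_gate(call))   # -> False: block the action and escalate for approval or replanning
```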

3.3.4. Integration of Components into Architecture

Agentic AI systems differ from conventional pipelines in that components are integrated into continuous decision loops that sense, plan, act, evaluate, and learn.
  • Layered and Modular Pipelines:
A common integration pattern is a layered flow that couples components into an end-to-end decision loop: perception → world/state modeling → planning/reasoning → execution/tooling → evaluation/feedback → memory → (back to) planning. Although often depicted linearly, these flows are event-driven and cyclic: evaluation and feedback can branch to retries or replanning; memory promotion/demotion changes subsequent retrieval; and perception can interrupt with high-priority events. Effective designs define explicit interfaces (inputs/outputs, schemas, confidence) at each boundary and attach gates (e.g., policy checks, critics) before risky actions [41,42,95]. This organization supports persistent goal pursuit (long horizon), real-time adaptation (short horizon), and safe recovery from deviations [41,95].
  • Orchestration Mechanisms:
Centralized orchestration: Many systems elevate an LLM to a supervisor role that maintains shared context, routes tasks to specialist modules/agents (planner, retriever, executor, critic), triggers reflection or replanning, and arbitrates conflicts among goals. The supervisor executes a graph (DAG/state machine) with branching, retries, compensations, and evaluator hooks at critical transitions to keep lineage auditable and side effects idempotent [42,50,97].
  • Graph-based execution:
Regardless of centralization, integrating components with an explicit workflow graph improves robustness under non-determinism. Nodes encapsulate component calls (with contracts for inputs/outputs and budgets), and edges encode control logic (success/failure, timeouts, escalation). Evaluators or policy guards can be placed at ingress/egress of high-risk nodes (e.g., external tools) [42,50].
Trade-offs: Centralized supervisors simplify accountability and debugging but may bottleneck or introduce single-point-of-failure risk; decentralized flows increase resilience and throughput but require stronger state contracts, message protocols, and observability [42,47,50].
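A minimal sketch of the graph-based execution pattern above, with success/failure edges, retries, and escalation, is shown below; the `Node` contract and the toy plan/act/review graph are assumptions for illustration rather than a specific orchestration framework's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class Node:
    """A workflow-graph node wrapping one component call with a retry budget and edges."""
    name: str
    run: Callable[[dict], dict]              # contract: takes and returns a state dict
    max_retries: int = 1
    on_success: List[str] = field(default_factory=list)
    on_failure: List[str] = field(default_factory=list)

def execute_graph(nodes: Dict[str, Node], start: str, state: dict) -> dict:
    """Sketch of graph-based orchestration: follow success/failure edges, retry, then escalate."""
    frontier = [start]
    while frontier:
        node = nodes[frontier.pop(0)]
        for attempt in range(node.max_retries + 1):
            try:
                state = node.run(state)
                frontier.extend(node.on_success)
                break
            except Exception as exc:          # failure edge after retries are exhausted
                if attempt == node.max_retries:
                    state.setdefault("escalations", []).append(f"{node.name}: {exc}")
                    frontier.extend(node.on_failure)
    return state

# Usage: plan -> act; the failing 'act' node escalates to a 'review' node.
def flaky_tool(state: dict) -> dict:
    raise RuntimeError("tool timeout")         # simulate a failing external call

graph = {
    "plan": Node("plan", lambda s: {**s, "plan": ["step1"]}, on_success=["act"]),
    "act": Node("act", flaky_tool, on_failure=["review"]),
    "review": Node("review", lambda s: {**s, "reviewed": True}),
}
print(execute_graph(graph, "plan", {}))        # escalation recorded, 'review' node reached
```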
  • Multi-Agent Integration and Decentralization:
In decentralized designs, agents each own a subset of components (e.g., local planner + executor + memory) and coordinate via shared context and message passing. Common patterns include role-based teams (planner/solver/critic), peer networks with negotiation/voting, and hybrid hierarchies that mix supervisor(s) with collaborating peers. Integration hinges on (i) shared state (what is global vs. local), (ii) protocols for task handoff and conflict resolution, and (iii) placement of evaluators/approvals for safety or compliance. This model yields fault tolerance and parallel specialization but increases testing/monitoring complexity and demands clear traceability of who decided what and why [39,47,97].

3.3.5. Architectural Models vs. Core Components

Table 4 synthesizes how common architectural families connect the components introduced earlier and compares the core agent components between the major architectural families, highlighting their typical uses and associated risks. It shows how different designs emphasize particular strengths: BDI architectures rely on explicit beliefs, desires, and intentions to support explainable decision systems, but require extensive symbolic modeling and can be brittle under uncertainty [38,41]; hierarchical/HRL models scale well for decomposable tasks through multi-level goal delegation, though they risk supervisor bottlenecks [50,96]; ReAct single agents offer lightweight reasoning–action loops for rapid iteration, but struggle with parallelism and reliable tool selection [47]; Hybrid reactive–deliberative systems balance fast reflexes with long-horizon planning, yet require careful arbitration design [38]; and layered neuro-symbolic approaches integrate neural inference with symbolic reasoning to manage uncertainty and provide traceability, though at the cost of added integration complexity [39,41,95]. Together, the table underscores how architectural choices map to trade-offs in scalability, transparency, and robustness, guiding when each model is most appropriate.

3.4. What Types of Goals and Tasks Are Currently Being Solved Using Agentic AI Across Domains?

Agentic AI is used across a wide range of areas and applications to accomplish a large variety of goals and tasks. Agentic AI systems can take a high-level objective, break it down into smaller tasks, and accomplish those tasks efficiently, even in environments that are evolving, inherently unpredictable, or governed by complex processes. Agentic AI can reason, build plans, and act autonomously to adapt to new scenarios, work as part of a team, and produce results that may be difficult or impractical to obtain with other types of AI.
Agentic AI systems are used to complete routine activities, support real-time decision-making, handle complex tasks or problems, and provide personalized assistance by adapting to the context. Their strength lies in combining data from many sources, making ethical and informed decisions, and coordinating multiple agents to work together in real time.
The rationale for creating a domain-based categorization of agentic AI is to clarify how these systems are shaped by the specific needs, boundaries, and objectives of the fields in which they are deployed. The domain-based framework displays agentic capabilities across multiple domains, highlighting abilities that are common as well as those unique to a specific agent. An overview of the application categories is shown in Figure 10. Moreover, this organization allows the reader to identify active patterns within a specific domain while enabling tabular comparison across domains.
Table 5 has three components: the domain, the application, and the tasks and/or considerations associated with the agents’ work. This structure allows quick access to important information without reading full paragraphs. The diagram serves as a companion piece, adding a layer that represents each domain, its relationships to applications, and where domains share characteristics. Reading the table and diagram together gives the reader both a meaningful representation and sufficient detail to assess whether agentic AI systems act as agents in real and experimental contexts.

3.4.1. Healthcare

In healthcare, agentic AI supports diagnostics, treatment planning, and personalized care. Agentic AI enables precise risk stratification, rapid segmentation of medical images, personalized diet generation, real-time seizure prediction, and robust human–robot interaction [59]. Fitness and wellness agents provide adaptive coaching, strengthen posture-analysis paradigms, offer multimodal feedback, track progress, and coordinate recommendations [60]. An electrodiagnostic (EDX) assistant can organize EMG/NCS data, incorporate known diagnostic context through RAG, and generate physician-level reports [99]. Genomic assessment agents can predict disease risk, specify treatments, indicate progression, and personalize care cost-effectively while respecting security safeguards [35]. Agents can also support ethical frameworks for generative AI [44], clean and automate medical computer vision pipelines [100], provide bioethical decision support [101], enhance clinical decision-making for use cases such as surgical workflows and drug discovery [56], and streamline real-time differential diagnosis of drug-induced liver injury (DILI) from clinical notes [102].

3.4.2. Military

In military and cybersecurity contexts, agentic AI enables groups of autonomous agents to carry out both offensive and defensive actions based on collective, coordinated decisions. Each agent can act independently according to its own preferences: agents can defend their positions, attack enemy positions, move as needed, adapt their movements based on performance metrics, and deploy a variety of effective formations [119]. In cloud-based AI platforms, for example, agentic systems can provide continual protection by rotating active services to minimize risk, preemptively defending against known threats, limiting vulnerabilities associated with repeated attacks, applying strict security rules, reporting unusual activity, and automatically responding with system changes [120].

3.4.3. Transportation

Intelligent agentic traffic control systems simulate traffic situations, perform dynamic routing and on-demand signal operations, coordinate agents over distributed transport networks, and work toward improving safety, efficiency, and sustainability [103]. Real-time process optimization using generative models and soft inputs with SIC architectures predicts quality variables, enables closed-loop control with soft sensors to scale biomanufacturing, personalizes product design, and minimizes energy use and emissions [43]. Context-aware navigation reduces collision risk in dynamic environments and complex place types by combining multimodal perception with semantic understanding to inform decision-making [74]. Multi-agent routing and scheduling coordinate agents traveling across multiple traffic channels to eliminate delays, avoid deadlock, and allow agents to reach their destinations under occupancy constraints [61]. Multi-phase transportation lifecycle management is human-centered and adaptable; it strives to address barriers to inclusion while engaging with persistent issues of congestion, collisions, environmental impact, and equity, optimizing travel outcomes for individuals and collectives [104].

3.4.4. Software

Agents independently decompose complex queries, determine which tools are relevant, and execute subtasks while providing adjustable feedback in real time [105]. Behavior-based development agents take a natural language query as input and generate structured test cases for collaboration across teams [106], whereas autonomous visual intelligence agents fuse spatial context with multimodal sensory data to perceive the environment, adapt, and collaborate in real time for tasks such as robotic surgery, disaster response, and augmented/mixed reality [107]. Software engineering and business automation agents take a goal, decompose it, oversee role assignment, and automate workflows including inventory management, customer relationship management, invoicing, and demand forecasting [30]. Self-directed AI pipelines autonomously select models, handle imbalanced learning classes, and manage knowledge-aware, interpretable decision-making [54]. Agentic search, review, and personalization systems are capable of multi-step reasoning, cross-lingual retrieval, and bias mitigation [108]. Multi-agent orchestration systems can automate computer vision tasks and manage complex, fault-tolerant, high-risk systems [62,109].
End-user programming agents are giving rise to democratized software development [110]. Near-real-time multi-agent frameworks can simultaneously be used for software development, controlling operational IT workflows, and strategic decision-making [111]. Autonomous software development and multi-domain digital task agents plan, assign, execute, and self-correct the software development, browsing, coding, and file operations associated with a release [50,113]. Human–algorithmic agency modeling also theorizes an evolving model of collective human–AI agency [114]. Conversational agents will become increasingly integrated into software development, improving test case development, ML workflows, and SOC automation through natural language interpretation, test case generation, pipeline and incident monitoring and detection, zero-trust rule enforcement, and multi-agent interaction coordination [115,116,117]. In fully autonomous software development, predictive incident detection, IT automation, and related tasks, agents decompose prompts, write and test code, integrate third-party APIs, optimize resource use and performance, manage Git workflows, and handle incidents using coordinated multi-agent planning, with little or no human involvement [49,51,118].

3.4.5. Finance, Banking and Insurance

Agents operating in finance, banking, and insurance have improved customer risk profiling, automated loan forecasting, treasury management, and fraud detection by examining real-time data and behavioral patterns [80]. Agents also strengthen model risk management through automated ML workflows, compliance checks, and agent-to-agent cooperation [39]. Further, customer services receive 24/7 support, recommendations on investments and insurance, and automated transactions, due diligence, and KYC [128]. Risk and fraud detection agents can spot risky actions, break down complicated processes, and provide explainable outputs [129]. Personalized banking agents automate budgeting and product recommendations and can deliver instant behavioral interventions, supporting inclusion through quasi-real-time feedback mechanisms [130]. Agents for insurance claims evaluation utilize multimodal inputs, decompose tasks into assigned subtasks, and respond in a policy-compliant way [96]. Finally, customer-centric chatbots improve sales, diagnostic ability, and predictive analytics, and can automate IoT-driven processes for better interactive engagement across organizational partners [131].

3.4.6. Manufacturing and Industrial

Agents are useful in manufacturing and industrial contexts for increasing efficiency, safety, and decision-making across production and supply chains. Multi-agent systems help optimize tasks spanning wastewater treatment, factory operations, predictive maintenance, and energy-aware task scheduling, among others [33,63]. Supply chain agents can forecast demand, monitor for contamination, and recommend actions regarding quality and pricing [55]. Cognitive robots and collaborative human–robot agent systems can predict the possible states of a scenario, learn from social interactions with humans, adapt their movements based on that learning, and collaborate safely with humans on shared tasks [122,124]. Workflow agents and autonomic computing agents can also increase efficiency in robotic assembly and automate compliance, scheduling, and resource planning [121,123]. Multi-agent pickup and delivery systems address coordination, collision avoidance, and delay reduction [69].

3.4.7. Tourism and Traveling

In tourism and travel, agentic AI systems optimize pricing, planning, and customer service through human–AI teams. They provide ethical and transparent real-time decision support using RAG, Interactive Cognitive Agents (ICAs), and other management support systems [127].

3.4.8. Multi-Domain

Agents model user behavior, provide recommendations and context, coordinate processes across domains, automate business processes, break down goals, utilize external knowledge, and coordinate safely and ethically [53,132,133]. Agents also engage in real-time decision-making, minimize costs, make better use of resources, produce intelligible outputs, and plan and execute tasks under uncertainty [36,135]. Multi-domain agents engage with tools and/or APIs, manage inherently complex workflows in areas such as coding, healthcare, finance, and crime analysis, and use memory-based learning that allows them to continue a task by absorbing adaptive information from external cues [31,67,75,76,77,136,139]. Agents also enhance personal assistants, cybersecurity, and autonomous knowledge management, while reducing computation cost and enabling smarter cross-domain problem solving [32,46,66,134,137,138].

3.4.9. Scientific Discovery and Research

Autonomous systems are aiding scientific inquiry by generating hypotheses, automating literature review and analysis, and designing experiments. They can physically perform the majority of robotic chemical and biological experiments, analyze data across domains, and iteratively refine experiments to enable efficient experimentation for drug discovery, materials science, and energy applications [64,65,150,151]. Multi-agent LLM systems can combine physical models with domain knowledge to plan simulations, predict material properties, and produce multimodal outputs (numerical data, charts, text reports) for engineers [64,65]. Agents can facilitate automatic writing of research papers, analyze textual or hypertext content, organize qualitative interviews or conversations, and create knowledge graphs while embedding interpretability, bias detection, and collaborative reasoning throughout scientific workflows [45,152,153].

3.4.10. Retail, Business, and E-Commerce

Autonomous systems create customized customer interactions and coordinated supply chain management by offering personalized shopping experiences, situation-aware support through chatbots and virtual assistants, navigation support inside buildings or stores, and coordinated autonomous operation of multi-agent systems for inventory management, logistics, and fulfillment [78]. Autonomous systems also manage task scheduling, digital content management, and multi-step workflows such as travel booking or e-commerce tasks [141,144]. They depend on natural language processing (NLP) and situational reasoning to adapt to changing situations [144]. Autonomous decision-making agents can quickly assist an organization or team with strategic and tactical planning, rapid data analytics, error detection, and forecasting to improve productivity and customer satisfaction [143]. The increasingly sophisticated capabilities of AI service agents and chatbots allow for continued personalized interactions, financial advice, reservations, smart device activation, and decreased human workload [140,145]. Workflow automation is at play in areas such as law and retail, along with automating Kubernetes tasks with minimal supervision, and e-commerce process optimization offers opportunities for reduced costs and increased customer satisfaction [27,142]. Human–machine collaboration is extended through humanoid robotics and agentic LLMs for claims handling, smart marketing, HR, and trust establishment in customer service [68,146]. Interactive recommender systems provide conversational personalized recommendations across movie/entertainment and retail domains [79]. GenAI agents showcase multi-domain reasoning, planning, and task execution capabilities spanning finance and banking, cybersecurity, healthcare and remote healthcare, and software operation/administrative tasks [40].

3.4.11. Smart Cities and Energy

Autonomous systems support city services such as transport, health, and emergency management, and can proactively provide real-time solutions to problems like pollution and traffic [125]. Agents can take on building-energy modeling work that now involves data processing, coding, and simulations [58]. Multi-agent systems reduce energy use in smart buildings while maintaining comfort, cutting cost, and preventing energy spikes [70]. Digital twin agents support building monitoring, fault detection, preventive maintenance, and energy-efficient operation [126].

3.4.12. Public Administration

Autonomous systems assist public administration with efficiently managing resources (water and energy), accurately anticipating demand, simulating policies with digital twins, and making adjustments in real time. Autonomous systems also enhance citizen services, automate administrative workflows, aid fraud detection, and support decision-making for smart cities and population health [37,41].

3.4.13. Education

The personalized learning capabilities of autonomous systems in education derive from tracking student performance, adjusting content, giving timely feedback, taking on administrative responsibilities, identifying signs of disengagement, and promoting collaborative learning [57]. They can also help automate the marking of objective and essay questions, offering a consistent, fair, and explainable way of grading through the collaborative engagement of multiple agents [148]. Adaptive AI agents can provide metacognitive support, facilitating the diagnosis of learning gaps, scaffolding complex content, relieving learner burden in remote participation, and automating marking [147]. Multi-agent LLM systems help deliver personalized and consistent results for both subjective and objective assessment, combining coordinated agent reasoning, validation, and quality-controlled data generation [97]. Beyond helping learners, agents support cognitive workflows related to research, decision-making, report writing, travel organization, and other complex academic workloads [149].

3.5. What Kinds of Input and Output Formats Do Agentic AI Systems Handle in Comparison to Traditional AI Systems?

Agentic AI systems designed to operate in complex environments are developed with flexible input–output structures for independent decision-making. Agentic AI models use input–output schemes suited to the nature of the task, which differ in structural design and styles of user interaction. This section presents the various ways of interacting with agentic AI systems, including how they process different input forms and produce appropriate outputs. Table 6 is organized by the input types given to the system, the task performed, and the technologies used to produce the output.
The rationale for categorizing agentic AI models by input–output format is to show how these systems are designed by directly mapping input modalities to their outputs, the tasks performed, and the technologies used. The table is organized for easy comparison of how a single input type can produce multiple types of output and support different tasks. This provides important information about system flow in real-world applications of agentic AI. The table also highlights the diversity and specialization of agentic AI systems, helping researchers and practitioners identify models or technologies appropriate for particular applications.

3.5.1. Text to Actions

Text as input is free-form language from the user to the system; it is mostly natural language, which can be structured or unstructured. In agentic AI models, actions as output mean executing something in the system to complete tasks [120]. More advanced models, like Vectara-agentic [27] and AutoGPT [34], can take text input, read the user's prompt to create outputs or take actions, and coordinate with other agents without human assistance.
Beyond task execution, agentic AI systems also contribute to robotic process automation. An agentic AI model can analyze manufacturing data and automatically execute optimized robotic assembly sequences based on a structured framework like MAPE-K [121]. Similarly, in network operations [71], an agentic AI can interpret control commands and independently react to failures by selecting and implementing possible actions in real time [82]. In doing so, these models improve fault tolerance and enhance system resilience [62]. Moreover, in smart environments, agentic AI can drive energy savings by coordinating the power supply across home appliances [38] using multiple AI agents collaborating in an LLM-enhanced coordination process.
Agentic AI models can also improve themselves. BrainBody-LLM and RASC [75] utilize feedback for self-reinforcement, continuing to learn and optimize over time. When past outputs have been wrong, self-reflective models like SELF-REFINE [50] can improve their behavior by reviewing their own actions, without relying on outside parties. Agentic AI systems can remove ambiguity, enact user preferences [154], and derive next actions through self-reflection to generate signals [144].
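To make the text-to-action pattern concrete, the sketch below shows a generic plan, act, and reflect loop of the kind these systems implement; it is a minimal illustration only, and the `call_llm` and `execute_tool` functions are hypothetical stand-ins rather than the interfaces of AutoGPT, SELF-REFINE, or any other framework cited above.

```python
# Minimal sketch of a text-to-action loop with self-reflection.
# `call_llm` and `execute_tool` are hypothetical stand-ins, not a real API;
# they return canned strings here so the loop can be run end to end.

def call_llm(prompt: str) -> str:
    """Stand-in for a language-model call; proposes one action, then stops."""
    if "Next action:" in prompt:
        return "DONE" if "search_flights" in prompt else "search_flights"
    return "OK"  # critique step always approves in this toy version

def execute_tool(action: str) -> str:
    """Stand-in for executing a tool/command and returning an observation."""
    return f"executed {action}"

def run_agent(user_goal: str, max_steps: int = 5) -> list[str]:
    history: list[str] = []
    for _ in range(max_steps):
        # 1. Ask the model for the next action given the goal and the history so far.
        action = call_llm(f"Goal: {user_goal}\nHistory: {history}\nNext action:")
        if action.strip().upper() == "DONE":
            break
        # 2. Execute the action and observe the outcome.
        observation = execute_tool(action)
        # 3. Self-reflection: ask the model to critique the outcome before continuing.
        critique = call_llm(f"Critique this step. Action: {action}, Observation: {observation}")
        history.append(f"{action} -> {observation} (critique: {critique})")
    return history

print(run_agent("book a flight to Chicago"))
```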

3.5.2. Text to Text

Text as output is human-readable language, a combination of formatted and structured text that people can read and understand. Recent LLM models can share their knowledge with other agents, resulting in a multi-agent ecosystem capable of collective learning [143]. LLaMA-3 models [40] can help users with information retrieval by parsing users' prompts [77], looking for responses to the prompts, and aggregating information from other sources [52]. LLM models support problem-solving [137] and programming [155] by completing or correcting code and describing runtime errors [110]. GPT models can also help automate customer-interaction workflows [140], logic-based processing in automated support [137], and responses to inquiries [41].

3.5.3. Text to Multimodal Data

Agentic AI systems can produce structured outputs, such as JSON or YAML scripts, after completing complex tasks [105], orchestrating workflows [100], or documenting research [151]. AgentGPT models can also produce multimodal data as output, such as data visualizations projecting future growth on an interactive timeline [138] or charting current and historical data and actions [64]. LLM models can populate logs and audit trails when they process textual input and produce structured records of activity [40].
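As a minimal sketch of this text-to-structured-output pattern, the example below validates a JSON record produced by an agent before it is appended to a log or audit trail; the field names are illustrative assumptions, not a schema from the cited systems.

```python
import json
from typing import Any

# Minimal sketch: validating an agent's structured (JSON) output before it is
# written to a log or audit trail. The expected fields are illustrative only.
REQUIRED_FIELDS = {"task_id": str, "action": str, "status": str}

def parse_agent_record(raw_output: str) -> dict[str, Any]:
    record = json.loads(raw_output)          # raises ValueError on malformed JSON
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"missing or ill-typed field: {field}")
    return record

# Example usage with a well-formed record.
print(parse_agent_record('{"task_id": "t-42", "action": "retrieve", "status": "ok"}'))
```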

3.5.4. Audio Files to Text/Audio

Audio as input in agentic AI systems can be speech, music, or signals, which are processed by the system to understand the goal, supporting more natural, context-rich interactions. Audio as output is sound-based content delivered as a response, i.e., human-perceivable sound for listening. Agentic AI systems utilize technologies including Automatic Speech Recognition (ASR), short-term and long-term memory, and advanced language models such as GPT-4 or Llama 3.2 to process spoken language and turn it into actionable insights. Audio input can be used in ethical advisory scenarios, where agentic AI generates fairness-aware decisions by parsing spoken queries [101]. Similarly, feedback-generating interactive systems can use audio to parse goals and produce context-aware outputs [60]. In cases such as empathetic interviewing, multimodal agentic AI represents a more advanced use of audio input, where both the parsed input and the response are captured and provided in a structured, emotionally intelligent discussion [153]. Audio input thus extends agentic AI's real-time communication capabilities with human-like attributes.

3.5.5. Real-Time Data to Actions

Agentic AI can receive continuous real-time data from sensors or cameras, giving it access to live information. Sensor data can be used to monitor a building's power grid [73], analyze how it is being used, and instruct the building to save energy [70]. Agentic AI also suggests optimal routes in traffic [103]. ML-based agentic systems can detect issues [131] and optimize operations [123]. When engaged in operational tasks, the AI also measures and detects threats, assesses the environment, and decides on an immediate action [156] or a response to disruptions [30]. Using a real-time learning approach, the system recognizes and stabilizes performance in its strategizing over time [36]. Real-time data can also include coordinates or the accurate location of a person, car, or agent in a game, allowing the system to respond immediately and perform actions in a timely manner [72,119].
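The following sketch illustrates the basic sense, decide, and act loop that underlies such real-time agents, using a simulated power sensor and an illustrative threshold; it is a simplified assumption-based example, not the control logic of any system cited above.

```python
import random
import time

# Minimal sketch of a sense-decide-act loop over streaming sensor data,
# e.g., a building-energy agent reacting to live power readings.
# The threshold and the simulated sensor are illustrative assumptions.
POWER_LIMIT_KW = 50.0

def read_power_sensor() -> float:
    """Stand-in for a live sensor reading (kW)."""
    return random.uniform(30.0, 70.0)

def act(reading: float) -> None:
    if reading > POWER_LIMIT_KW:
        print(f"{reading:.1f} kW above limit -> shedding non-critical loads")
    else:
        print(f"{reading:.1f} kW within limit -> no action")

if __name__ == "__main__":
    for _ in range(5):          # bounded loop for the sketch; real agents run continuously
        act(read_power_sensor())
        time.sleep(0.5)
```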

3.5.6. Datasets to Text

A dataset as input for an agentic AI system is a pre-collected, structured set of data that helps the system analyze, learn, or make decisions. When datasets are used as inputs, agentic AI systems can implement a whole hierarchy of autonomous and semi-autonomous actions in employee management and coaching, product management, risk identification, and pricing. These datasets can be textual, tabular, or multimodal. Datasets with employee feedback and behaviors are provided to employee management [146] and coaching applications [145], where they are processed using agentic AI, NLP, SML, and deep learning models to create performance assessments and feedback loops. In e-commerce applications, datasets with product metadata and customer interaction data are provided to transformer models using PyTorch, an open-source deep learning framework [142], to automatically generate and update product descriptions. Pricing applications typically employ Q-learning-based reinforcement learning algorithms [157] to learn from sales datasets and adjust pricing for maximum profitability.
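As a simplified illustration of the Q-learning-based pricing approach mentioned above, the sketch below learns a preferred price level from a toy demand model; the single-state (bandit-style) formulation, price grid, and demand curve are assumptions made for brevity, not the setup of the cited work.

```python
import random

# Minimal sketch of Q-learning for dynamic pricing, assuming a single state and
# a small discrete set of price levels; the demand model is a toy assumption.
PRICES = [8.0, 10.0, 12.0]
q_values = {p: 0.0 for p in PRICES}
alpha, epsilon = 0.1, 0.2               # learning rate and exploration rate

def simulated_profit(price: float) -> float:
    demand = max(0.0, 20.0 - 1.5 * price) + random.gauss(0, 1)
    return price * demand               # reward = revenue in this toy setting

for episode in range(1000):
    # epsilon-greedy choice of a price (the action)
    if random.random() < epsilon:
        price = random.choice(PRICES)
    else:
        price = max(q_values, key=q_values.get)
    reward = simulated_profit(price)
    # single-state Q-update (no next-state term is needed here)
    q_values[price] += alpha * (reward - q_values[price])

print("Learned price preference:", max(q_values, key=q_values.get))
```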

3.5.7. Datasets to Actions

In applications where action or intelligence is needed, such as safety or efficiency management through automation in utilities, robotics, autonomous vehicles, or processes and systems such as supply chains, agentic AI systems use action-based datasets, which contain information about conducted actions, so they can learn to recognize anomalies [117], mitigate threats, or identify the best actions to optimize processes [63,158]. Based on how the system adapts, feedback is generated autonomously in real time [66]. Given suitable datasets, an agentic AI system can autonomously participate in trading and investments [80] and adapt to new tasks [152].

3.5.8. Dataset to Dataset

The purpose of datasets extends to predictive applications, such as health, policing, and financial services, where agentic AI applies structured datasets to predict where possible threats or issues might arise, using XAI [127], ML algorithms [67], and LLMs [35] for forecasting, risk assessment, or strategy formulation. Agentic AI also helps in software applications by generating training and testing datasets for ML models when a raw dataset is given. This shows that agentic AI can work with both static and dynamic (streaming) datasets.
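A minimal sketch of this dataset-to-dataset pattern is shown below: a raw tabular dataset is split into training and testing subsets for a downstream model. The column names are illustrative assumptions, and the split stands in for the richer dataset transformations agentic systems perform.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Minimal sketch of the dataset-to-dataset pattern: a raw tabular dataset is
# transformed into training and testing splits for a downstream ML model.
raw = pd.DataFrame({
    "feature_a": range(100),
    "feature_b": [x * 0.5 for x in range(100)],
    "label":     [x % 2 for x in range(100)],
})

train_df, test_df = train_test_split(raw, test_size=0.2, random_state=42,
                                     stratify=raw["label"])
print(len(train_df), "training rows,", len(test_df), "test rows")
```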

3.5.9. Dataset to Alerts

Alerts as output are notifications or warnings informing users about specific events; most of the time, these events involve detected risks [102] or fraud [129] in agentic AI systems. Here, the dataset acts as an input knowledge source that the system analyzes to detect risks or fraud.
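The sketch below illustrates the dataset-to-alerts pattern with a simple z-score rule over transaction amounts; real fraud- and risk-detection agents use far richer models, so the threshold and data here are purely illustrative assumptions.

```python
import statistics

# Minimal sketch of the dataset-to-alerts pattern: flag transactions whose
# amount deviates strongly from the dataset mean (a simple z-score rule).
amounts = [42.0, 39.5, 41.2, 40.8, 950.0, 38.9, 43.1]
mean, stdev = statistics.mean(amounts), statistics.stdev(amounts)

for i, amount in enumerate(amounts):
    z = (amount - mean) / stdev
    if abs(z) > 2.0:                       # threshold is an illustrative assumption
        print(f"ALERT: transaction {i} amount {amount} (z-score {z:.1f})")
```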

3.5.10. Multimodal Data to Multimodal Data

Agentic AI can utilize multimodal data of various types, including text, video, images, audio, graphs, and datasets, to generate personalized outputs and act more intelligently, flexibly, and responsively. Agentic AI uses a number of sensory channels to help the machine perceive complex environments, align behavior with human intentions, and make decisions based on real-world context. Prior work using LSTM-based deep learning and reinforcement learning aligned robot behavior with human goals using simultaneous visual and textual inputs [124]. Additionally, multimodal pipelines based on audio and video can build spatial–temporal reasoning capabilities, enabling perception and context-aware decision-making in complex situations [95]. Models like LLMs and VLMs enable adaptive decision-making when processing combinations of text, images, and audio, allowing an agentic system to demonstrate embodied reasoning and task execution toward articulated goals [33,135].
Structured representations of user goals and multimodal inputs such as speech signals, visual cues, and brief actions are the main methods of prompting agents to behave in particular ways [132]. Agents act upon abstract, high-level user goals instead of explicit commands, which lets them figure out how and when to respond. Many of these processes use intermediate state representations and feedback loops to make decisions in dynamic environments [71,105].

3.5.11. Beyond Conventional Input–Output Methods

In addition to standard input–output models, which usually use structured text inputs or standard datasets and files, agentic AI systems are increasingly utilizing other kinds of input data that are spatial, document-based, or alert-based. These different input media give AI agents high degrees of freedom, control, and adaptability in the actions they typically perform in domain-oriented tasks. Exact coordinates of the agents help plan paths that are entirely collision-free with respect to all agents involved [69], enabling robotics and multi-agent systems to function without a centralized controller. Numeric data from SCADA connections allow AI agents [43] to optimize key process parameters, such as emission output, typically in chemical plants. Alert-based inputs represent real-time triggers where agentic AI can respond autonomously to reduce incident risk and carry out Tier 1 and Tier 2 incident response actions [161].
Code files, including .cs files, are an input to AI agents like Codex that modify codebases or create Git diffs and logs, allowing agentic, autonomous software maintenance and version tracking [51]. CSV files can provide inputs that, combined with pretrained ML models, cue the AI to produce and debug energy modeling code with tools like LangGraph with ReAct and PythonREPL [58]. PDFs are mostly considered unstructured input types; LLM APIs like Gemini 1.5 [99] can parse, extract, and then structure clinical datasets, showing that AI can now handle semi-structured documents. These examples represent agentic systems that push beyond the inputs of traditional models toward AI systems that accommodate varied data formats in order to operate in specialized and high-stakes contexts.
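As a hedged sketch of the CSV-to-code-generation workflow described above, the example below summarizes a CSV file and builds a prompt asking a language model to draft energy-modeling code; `call_llm` is a hypothetical placeholder, and the orchestration layers used in the cited work (e.g., LangGraph with ReAct and PythonREPL) are not reproduced here.

```python
import csv

# Hedged sketch of the CSV-to-code-generation workflow. `call_llm` is a
# hypothetical stand-in for a language-model call.

def call_llm(prompt: str) -> str:
    """Hypothetical stand-in for a model call; returns a canned reply."""
    return "# (model-generated energy-modeling code would be returned here)"

def summarize_csv(path: str, max_rows: int = 5) -> str:
    """Read the header and a few sample rows to ground the prompt."""
    with open(path, newline="") as f:
        rows = list(csv.reader(f))
    return f"Columns: {rows[0]}\nSample rows: {rows[1:max_rows + 1]}"

def draft_energy_model_code(path: str) -> str:
    prompt = ("Write Python code that loads this building-energy dataset and "
              "fits a simple consumption model.\n" + summarize_csv(path))
    return call_llm(prompt)

# Example usage with a tiny illustrative CSV written to disk first.
with open("energy.csv", "w", newline="") as f:
    csv.writer(f).writerows([["hour", "kwh"], [0, 12.1], [1, 11.4], [2, 10.9]])
print(draft_energy_model_code("energy.csv"))
```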

3.5.12. Core Input–Output Mechanisms in Agentic vs. Traditional Systems

Traditional AI systems generally work on structured, static inputs and produce fixed [71] or predefined outputs [135]. These systems evaluate input against rule-based logic or supervised learning models trained on labeled training sets, within an institutional context that requires human action or intervention to make decisions and act [47]. Inputs typically comprise historical data [78] or clearly defined user commands or queries, and they are mapped to singular outputs such as classifications or recommendations [132]. With narrow sets of inputs and outputs, these systems work well in constrained environments with predictable data structures and have less success in complex and evolving problem contexts [95,106]. Research papers [57,80] emphasize that traditional AI with structured inputs provides shallow outputs and becomes burdensome to use because of frequent human intervention.
In comparison, agentic AI systems operate with multimodal [33], dynamic [106], and contextual inputs such as real-time sensor data [116], natural language [126], intent-based actions [148], and feedback, and they produce autonomous outputs and/or actions in the form of goal-directed multi-step plans, executable actions, and adaptive decisions [157]. Agentic AI reasoning is temporally continuous, socially collaborative, and learns from its environment [38,136]. Paper [77] supports the argument that agentic AI outputs reflect rich, contextual information and proactive responses, while [143] describes an agentic AI system's adaptability to user-defined workflows through contextually adapted inputs and predictions. Unlike non-agentic systems, agentic AI operates continuously and proactively and does not depend on an input to act; rather, it anticipates requirements, operates autonomously, provides strategic options, remembers and learns, and frequently works collaboratively with other agents to produce an output, rather than making a prediction from a single data input with a simple action [55,161].
Researchers in [37,132] discuss the real-time adaptability of agentic AI and long-term modeling of preferences, while [31,125] begin to explore deeper notions of situational awareness and internal goals. Papers [38,71] make explicit comparisons, examining the ordering of decision-making chains and the moments at which systems are orchestrated. Among the works offering applied system analyses or definitive architectural comparisons, [31,41,95] align most closely with research on modeling agentic input–output pathways. Understanding these input–output functions is important for designing future AI systems that are intelligent, self-directed, situationally aware, and able to operate unattended.

3.6. What Evaluation Methods and Metrics Are Used to Assess the Performance of Agentic AI Systems?

Evaluation metrics used to assess the performance of agentic AI systems are increasingly critical as these systems expand into every domain. With the growing application of agentic AI in areas ranging from autonomous robotics to intelligent decision-making, it is very important to validate and test system performance to ensure reliability, efficiency, and safety. This evaluation helps identify the strengths and limitations of a system, supporting its improvement and responsible use. This section discusses various evaluation metrics and testing methods.
Figure 11 presents the categorization of performance measures for agentic AI, which divides into two categories: evaluation metrics and testing methods. In practice, testing and evaluation metrics are closely related but not the same. Testing is where the system is subjected to experiments or scenarios that probe its performance. Evaluation metrics are the specific measures used to quantify how well the agentic AI system performs; they are used to interpret the results of testing.
The rationale for our classification of agentic AI performance analysis is to present the different types of testing methods and evaluation metrics for agentic AI systems. It organizes methods and metrics based on the aspects of the system being tested. It gathers different testing methods from the literature in one place; the results of those tests are interpreted through evaluation metrics, which are then used to evaluate the system's performance. Figure 11 separates methods and metrics because many methods in agentic AI share overlapping metrics, such as accuracy. Each metric is further specified as qualitative or quantitative to identify independently what is being measured and how it is measured.

3.6.1. Testing Methods

Automated Test Generation [110] is a method in which tools like CodeT, Reflexion, and ClarifyGPT are used to refine code and detect ambiguous requirements. Formal Verification [95] tests whether the system behaves exactly as specified, without errors or unexpected behavior.
Runtime monitoring [51] is a live, ongoing test of how well the system is running in real time. It allows the identification of errors and abnormal behavior so that agents can respond quickly, for example by rolling back or remediating before a system crash. Heartbeat monitoring [123] simply checks whether the system is alive by sending requests periodically; it helps detect failures and alert IT without interrupting users.
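A minimal heartbeat-monitoring sketch is shown below: it periodically probes a health endpoint and flags failures. The URL, interval, and alerting behavior are illustrative assumptions rather than the monitoring setup of the cited systems.

```python
import time
import urllib.request

# Minimal sketch of heartbeat monitoring: periodically probe a health endpoint
# and report failures. The URL and interval are illustrative assumptions.
HEALTH_URL = "http://localhost:8080/health"

def heartbeat_once(url: str, timeout: float = 2.0) -> bool:
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        return False

if __name__ == "__main__":
    for _ in range(3):                    # bounded for the sketch; monitors run indefinitely
        status = "alive" if heartbeat_once(HEALTH_URL) else "unreachable -> alert IT"
        print(f"heartbeat: {status}")
        time.sleep(2)
```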
Fault injection [123] is the act of injecting errors or faults into a model to observe its behavior and ensure it can still perform its job correctly in a faulty state. Automated recovery [150] is deployed to test the system's ability to reflect on and recover from failures, returning to normal operation without human intervention.
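The sketch below illustrates fault injection in its simplest form: a component is wrapped so that it fails with a configurable probability, and the caller is checked for graceful degradation. The component, failure rate, and fallback value are illustrative assumptions.

```python
import random

# Minimal sketch of fault injection: wrap a component so that it fails with a
# given probability, then check that the caller still degrades gracefully.
def flaky(component, failure_rate: float = 0.3):
    def wrapper(*args, **kwargs):
        if random.random() < failure_rate:
            raise RuntimeError("injected fault")
        return component(*args, **kwargs)
    return wrapper

def fetch_reading() -> float:
    return 21.5                            # pretend sensor value

def robust_read(reader, fallback: float = 20.0) -> float:
    try:
        return reader()
    except RuntimeError:
        return fallback                    # graceful degradation under injected faults

faulty_reader = flaky(fetch_reading)
print([robust_read(faulty_reader) for _ in range(5)])
```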
Benchmark testing [61,113] evaluates an agentic AI system's performance on standardized tasks and metrics to validate its operational capabilities and reliability in real-world applications. Stress testing [39] exposes the model to unusual or extreme inputs to see whether it stays reliable and functionally adaptable to change. A/B testing [132] runs live experiments with users to see which version works better in practice.
Cross-domain tests [76] evaluate whether an agentic AI performs well across different models or tasks outside its original training. They involve tests in different domains such as MineDojo, ALFRED, and ScienceWorld. The Cross-platform Agent Benchmark (CRAB) [75] evaluates multimodal embodied language model agents with graph-based tests and a unified Python interface.
Simulations [162] are used to model ethical dilemmas, especially in areas like autonomous vehicles and healthcare, allowing people to evaluate AI decisions in realistic and interactive scenarios. Digital twins [163] are real-time virtual models used for fault prediction and performance optimization, combined with LLMs, enabling smart, safe, and automated management of the system.
Human-in-the-loop verification [45] includes human involvement to improve the reliability, interpretability, and trustworthiness of AI-driven scientific discoveries. It highlights the need for human oversight to address AI limitations and support validation and accountability. User Acceptance Testing (UAT) [104] examines the system with real users to validate that it satisfies their needs, wants, and expectations, ensuring user satisfaction and accountability.
Testing for Harmful Capabilities [46] checks whether agents generate harmful outputs or enable malicious actions so that these can be mitigated and safety ensured. Heuristic AGI testing [134] is a simple way to evaluate models based on certain achievements, like passing a Turing test or solving important problems; it is quick because it does not involve fully measuring their true abilities.
Workflow-centric evaluation [50] evaluates the full decision-making process from perception to reasoning to acting, rather than just the end product. Here, the agentic AI's behavior is validated and verified across workflows using AI-driven testing workflows, advanced monitoring, and metrics. Independent Testing of Subtasks [46] isolates each subtask of a complex task and tests it on its own for reliability, with high-risk actions receiving priority.

3.6.2. Evaluation Metrics

The evaluation metrics are divided into qualitative and quantitative metrics to assess the system's overall performance, behavior, and human–machine interaction. Quantitative metrics measure numerical performance, while qualitative metrics measure the quality and adaptability of the system, along with the user experience of interacting with it.
Explainability: The degree to which the system produces clear and understandable reasons or explanations for its decisions or outputs. Explainability is tested using self-reflection and cross-agent reflection for foundation models like GPT-4 [133,150], LLaMA [133], and the Compliance Agentic Model (CAM) [44] through reasoning traceability, and in Artificial Cognitive Agents [122] by evaluating how well they mimic human reasoning and explain decisions clearly.
Transparency: A measure of how openly the AI system's inner workings and decision processes are made visible or available for inspection. Transparency for models like AMBY and Drawcto [135] is tested through user-facing clarity, for Constitutional AI models [44] through predefined ethical principles, and for Hybrid Transparency Models [37] through open-source governance components and external audits.
User Satisfaction: A measure of how well the agentic AI system meets the needs, expectations, and preferences of users [163]. For models like the Bi-level Learnable LLM Planner (BiLLP) [132], user satisfaction is evaluated by measuring how well the agent balances short-term and long-term user preferences in recommendation tasks. For agentic customer service chatbots [47], satisfaction can be measured by how well the system accommodates user mood and personalizes its responses.
Fairness: It can be used to identify biases in agentic AI systems, for example, responding in different ways or with differing response quality across demographic groups [56]. Fairness and bias in the agentic AI experience can also be measured with methods such as SHAP [101] or counterfactual analysis [101].
Bias Mitigation: It refers to reducing or eliminating bias from agentic AI systems. Agentic AI systems that amplify societal biases can lead to unfair outcomes [49], particularly by compounding aggregated biases against vulnerable groups. It is important for LLMs such as GPT-4 [56] to implement bias mitigation in context, ensuring both equitable and ethical deployment.
Co-operative Behavior: It measures how well AI agents collaborate and coordinate with one another to achieve mutual goals. Agentic AI systems can collaborate via social contract rules, for example using shared rules [157], or learn cooperative behavior via reinforcement learning. Cooperative behavior is actively promoted using the MAAC model [70] to improve resource management across multiple agents.
Adaptability: It encompasses ongoing learning, agile feedback, and goal tuning, measured via conceptual multi-agent models, formal policy operationalizations such as Petri Nets [154], and benchmarks that simulate unpredictable, complex, and dynamic environments [75]. Adaptability is also tested by how AI agents and humanoid robots adjust tasks and processes dynamically in real time [68].
Robustness: A measure of the ability of an agentic AI system to maintain goal-directed performance [48] despite internal failures and external adversities. For agentic coding systems like Codex (OpenAI) [51] and Google Jules [51], robustness is tested via sandboxed execution, traceability, and rollback. The AgentHarm benchmark [136] is used to test robustness against adversarial LLM agent behavior.
Accuracy: It is a measure of how correct and precise an agentic AI system’s outputs are compared to the expected or true results. Agentic frameworks like LangGraph [53] and CrewAI [53] show around 94% accuracy in complex task execution. Models like Claude 3.5 [138] and LLaMa-3.2 [138] can show 96% accuracy with efficient plan caching and keyword-based retrieval. Platforms like IBM Watson Health [66] and tools like E-rater [57] and IntelliMetric [57] maintain high scoring accuracy in predicting and evaluating.
Precision: It measures how many of all positive predictions made by a model were actually correct, i.e., the accuracy of positive predictions. Models like YOLO [109] and Mask R-CNN [109] achieve nearly perfect precision (close to 1) for automated injury detection tasks. Models like GPT-3 [65] and LLaMA-2 [102] achieved higher precision than BERT [81], and the Swarm Agent framework [119] outperforms a Decision Tree [116] in classification precision.
Recall: It measures how many of all actual positives the model identifies correctly, i.e., its ability to capture all positive instances. Agentic AI models have maintained high recall across different applications, such as EcoptiAI [142] in e-commerce retrieval, LLaMA 3.1 [102] in clinical text extraction, InteRecAgent, potentially using RecLLaMA [79], in recommendation systems, and GPT-4o [126] in digital twins for fault detection.
F1 score: It combines precision and recall into a single number by taking their harmonic mean: the F1 score is twice the product of precision and recall divided by their sum [126,142]. The F1 score therefore depends directly on precision and recall.
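For reference, the short sketch below computes precision, recall, and F1 from binary predictions exactly as defined above; the label vectors are illustrative.

```python
# Minimal sketch computing precision, recall, and F1 from binary labels.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)   # harmonic mean of the two
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```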
Graph Edit Distance (GED): An important structural metric in agentic AI that measures the structural similarity between an AI-generated task graph and a ground-truth graph. In GPT-4-based frameworks [52,105] (gpt-4-0613), GED is estimated as the minimum number of edits to the nodes and edges of a graph. This quantifies the accuracy of multi-hop reasoning and tool-use planning; the lower the score, the closer the agent is to expert behavior.
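A small sketch of computing GED between a ground-truth task graph and an agent-generated one is shown below, using the graph_edit_distance function from the networkx library; the two toy task graphs are assumptions for illustration, and exact GED is tractable only for small plans.

```python
import networkx as nx

# Minimal sketch of Graph Edit Distance between an agent-generated task graph
# and a ground-truth graph; the graphs here are toy examples.
ground_truth = nx.DiGraph([("parse_query", "search"), ("search", "summarize")])
generated    = nx.DiGraph([("parse_query", "search"), ("search", "rank"),
                           ("rank", "summarize")])

ged = nx.graph_edit_distance(ground_truth, generated)
print("graph edit distance:", ged)        # lower values mean closer to expert behavior
```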
Rule Fidelity: A measure of how accurately the symbolic, human-readable rules generated by the system reflect the actual decision-making process. Frameworks building the Neuro-Symbolic Agent [161], AutoML Agent [54], and Agentic Planner [54] achieve roughly 94% rule fidelity based on their ability to concisely convert the learned decision model into human-readable rules.
Task Completion Time (TCT): It measures the time taken by the agent to plan, execute, and complete a particular task. This reflects how well the system performs operationally, along with the speed at which the AI completes complex workflows without human involvement. Model implementations such as LangGraph [58], Autogen [151], and GoEX [53] averaged 12 ± 2 min TCT, a 34.2% improvement over traditional AI.
Click-Through Rate (CTR): It assesses how effectively the agentic AI system prompts users to click a recommendation or listed content, associating CTR with the system's relevance and impact. GPT-4-based simulators such as ColdLLM, CG4CTR, and LLM-InS were used to test whether different CTR scenarios could be linked to the recommendation list through simulations of common-interest content personalization, visual design impact, and dialog personalization.
Gross Merchandise Value (GMV): It measures the total dollar amount of items sold or completed transactions attributable to the actions or recommendations of the AI. ColdLLM [132] simulations used user interactions to improve recommendations for new items, which positively impacted GMV in actual A/B testing.
Agentic AI accuracy is context-dependent and evaluated in different ways. In many cases, systems report numerical accuracy, such as ECG AI with 96–97% accuracy [164] and Organa with >90% accuracy [150]. However, such figures do not mean the AI has a fixed set of tasks or goals against which it can always be evaluated; the systems discussed earlier emphasize operational relevance rather than static scores [35,134]. Accuracy in agentic AI has been improved through issue structuring in modular team collaborations, through simulations like Agent4Rec that replay user activity and model those activities in simulation [132], and through self-improvement via reasoning with feedback loops and knowledge graphs [37,134]. Some papers base system success on accuracy alone in domains like healthcare and fraud detection [41,47], while others argue that success also depends on transparency, fairness, and misinformation suppression [156,161]. Our view is that accuracy in agentic AI is best understood not as a single metric but as a multi-faceted outcome shaped by goal alignment, domain context, and the AI's ability to plan, adapt, and reason.
Technologies like self-healing architectures can identify failures and correct problems on their own [71,121,123], while tools like retrieval-augmented generation, tree-of-thought backtracking, and iterative debugging can help rectify hallucinations and task-related mistakes [58,75,82,132]. Multi-agent cross-verification, confidence-based reruns, and actor–critic feedback loops improve reliability in dynamic scenarios [65,79,140]. These are the main error recovery mechanisms in agentic AI systems. Metrics for error recovery remain underdeveloped, and systems often struggle with conceptual mistakes, cascading faults, or complex edge cases [105,136,165]. Human checks, backup systems, clear logs, and human support make them more robust [39,95,135]. Future agentic AI systems will strive to automatically identify and recover from errors with reinforcement learning, dynamic replanning, and fault-tolerant architectures [33,49,66,166].
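As a minimal illustration of the confidence-based rerun mechanism mentioned above, the sketch below re-invokes a generation step until its self-reported confidence clears a threshold or the retry budget is exhausted; `generate_with_confidence`, the threshold, and the retry count are hypothetical assumptions, not the mechanism of any cited system.

```python
import random

# Minimal sketch of confidence-based reruns: re-invoke a generation step until
# its self-reported confidence clears a threshold or retries are exhausted.
def generate_with_confidence(task: str) -> tuple[str, float]:
    """Hypothetical stand-in returning (answer, confidence in [0, 1])."""
    return f"answer to {task}", random.uniform(0.4, 1.0)

def answer_with_reruns(task: str, threshold: float = 0.8, max_tries: int = 3) -> str:
    best_answer, best_conf = "", 0.0
    for _ in range(max_tries):
        answer, conf = generate_with_confidence(task)
        if conf >= threshold:
            return answer                   # confident enough, accept immediately
        if conf > best_conf:
            best_answer, best_conf = answer, conf
    return best_answer                      # fall back to the most confident attempt

print(answer_with_reruns("classify incident severity"))
```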
Parallel processing, semantic filtering, and real-time planning reduce duplication and response delays, improving both the speed and scalability of the system [36,81,105]. Task efficiency improves through modular multi-agent systems, lightweight models, and adaptive architectures [31,111,132]. While efficiency here covers quick responses and low energy use, most studies agree that agentic AI outperforms traditional systems through goal-oriented, optimized task execution.
A further central component of trust in agentic AI systems is transparency, explainability, and ethical design. Papers [37,132,135] argue that trust can be developed through reasonable explanations, fairness, and accountability. Papers [65,71] highlight the potential for trust to break down because large models remain subject to errors, attacks, and hallucinations. Some researchers suggest dynamic approaches like system refresh [120] or incremental querying and reflection [133] to manage these risks.
Some papers discuss conceptual models of trust, for example, cognitive, affective, and social trust [128], and others discuss metrics such as Explainability Scores [105] and reliability [99]. This illustrates that agentic AI trust research is still emerging. Some studies find that user satisfaction depends on how easy and trustworthy an agentic AI system is to use [130,141], while others find it depends more on how well the AI adapts and matches the user's values [31,37]. In practice, user satisfaction depends on many factors, including meeting user expectations, usefulness, ease of use, speed, task completion, and accuracy.

3.7. What Are the Key Challenges and Limitations in Designing and Deploying Agentic AI Systems?

Agentic AI systems bring a number of technical and social challenges. Their impact depends largely on the need to constantly sense, plan, and act in dynamic environments, which requires well-designed architectural and layered systems. At the same time, the implementation of agentic AI systems raises challenges of trust, safety, ethics, and governance that require oversight and justifiable design frameworks.
Architectural/Technical Challenges: Architectural design is a key consideration for agentic AI systems. There are limitations surrounding dynamic task decomposition, hierarchical planning, and adaptive reasoning in changing environments [105]. Recent frameworks designed to support generalization to diverse tasks have limitations pertaining to symbolic reasoning and structural rigidity [38]. Classical agent models do not support long-term strategic planning, leading to brittleness in dynamic settings [106]. Additionally, handling large and variable data, including issues of volume, velocity, variety, and veracity, places excessive strain on current system architectures [54]. Opaque architectures limit interpretability and make coupling with real-world instantiations difficult [157]. Overall, the field needs modular, transparent, and scalable agent architectures that can operate under uncertainty [167].
Ethical/Societal/Governance Limitations: Agentic AI raises many ethical, societal, and governance problems that need attention. First, there are privacy and security risks; almost 78% of organizations reported problems with data safety while using these systems, which shows the need for stronger rules to protect personal information [53]. Second, there are accountability concerns: because agents are trained with reinforcement learning, they sometimes act in unexpected ways, and even their developers cannot explain why the agentic AI made a certain choice, making it difficult to decide who is responsible [157]. Third, when several agentic AIs work together, communication failures can occur, creating safety risks and increasing unfair or biased decisions [105]. Finally, the data used by agentic AI may be broken, incomplete, or of poor quality, which also leads to biased and unfair outcomes and may affect trust and fairness [135].
Performance/Tool Integration Issues: Agentic AI systems continue to face ongoing problems regarding performance and tool integration. When executing multi-step, complicated tasks, there are often delays or computational inefficiencies, especially when agents interact simultaneously with a diverse set of external tools in real time [53]. These issues are compounded by the lack of standard interfaces for integrating tools and software, especially with commercial or domain-specific workflows [105]. Furthermore, limited support for agent design, testing, and deployment environments constrains the scale and stability of agent development [135]. This is especially true in critical areas such as healthcare, where these limits can threaten trust and operational effectiveness. This highlights the need for effective agent architectures and powerful toolchains to support reliable and scalable deployment.
Multi-Agent Coordination: Multi-agent coordination presents recurrent difficulties in agentic AI systems, especially in terms of collaboration, synchronization, and scaling [105]. When multiple agents operate in shared environments, breakdowns in communication or strategy alignment between agents can lead to inefficiencies in task allocation or complete system failure. Operational challenges that arise in the real world are often the result of coordination failures such as inconsistency in data access, communication latency, and misalignment stemming from a failure to share context [53]. Cooperative systems are also highly susceptible to correlated failures and collusion, even when there is no malicious intent, thereby undermining fairness, robustness, and transparency [135]. These problems demonstrate the need for scalable coordination protocols, methods for context awareness, and safety guarantees in multi-agent systems.
Human–AI Interaction: Human–AI interaction presents ongoing difficulties for the effective deployment of agentic AI systems. Many agents have limited transparency and interpretability, discouraging user trust and limiting collaboration, particularly when the stakes are high [105]. Human–AI interaction is often hindered by poor or insecure interface design and overly rigid communication protocols, which fail to address differing user roles and contexts, including non-expert or at-risk user groups [53]. In addition, the lack of participatory mechanisms limits opportunities for oversight and corrective feedback in ethical or safety-critical contexts. Current systems struggle to properly interpret and align with the subtleties of human intention, leading to misaligned actions even on simple tasks and to moral misalignment in work meant to help others [162]. These problems call for human-centered agent approaches that increase explainability, allow adaptable interaction, and foster broader value alignment in communities and environments with diverse levels of technical expertise.
Security: Security remains a fundamental issue in the design and implementation of agentic AI systems. Agents often function autonomously and operate in dynamic, diverse real-world conditions that make them susceptible to a range of risks (e.g., adversarial attacks, data poisoning, or inappropriate use of external tools or APIs) [53]. The lack of verifiable pipelines for connecting external knowledge sources or tools also creates openings for malicious misconduct and the spread of false information [105]. The absence of adequate oversight processes may further make it difficult for humans to detect or intervene in harmful behavior when it occurs [162]. To achieve resilience, agentic AI must adopt secure-by-design principles, undertake continuous audits, and establish guardrails that restrain misuse while allowing agents to continue performing their tasks [135].

4. Conclusions

Agentic AI refers to intelligent systems that act autonomously, plan and adjust steps, and operate in complex environments with minimal human oversight. Combining LLMs and reinforcement learning, these systems manage multi-step workflows, remember past interactions, and pursue multiple goals. Frameworks like LangChain, AutoGPT, BabyAGI, AutoGen, and OpenAgents illustrate this evolution, progressing from structured reasoning and iterative tasks to multi-agent collaboration and adaptive problem-solving. Agentic AI systems share many key components, including perception, planning, execution, memory, reflection, orchestration, and interaction, but differ in orchestration style and control loop. LLMs are often used as reasoners, communicators, and orchestrators. The most capable agentic systems build on explicit memory, built-in feedback, and graph-based orchestration to elicit adaptive, reliable behavior.
Agentic AI can bring efficiency, individualization, and better decision-making to a wide variety of settings because it offers autonomous, coordinated, and responsive action. It can complement humans, undertake more sophisticated processes, provide greater safety, and deliver intelligent, real-time solutions across many aspects of work. Agentic AI systems move well beyond simple input–output frameworks: they handle a wide variety of data types, including text, audio, multimodal inputs, datasets, real-time streams, and even specialized formats like code files and PDFs, facilitating autonomous work across structured and unstructured environments while counteracting the unpredictable elements of the real world.
Evaluating agentic AI systems is an important process and requires a multi-faceted approach that integrates technical accuracy with qualitative factors. The most successful agentic AI systems will balance operational performance with explainability and fairness, ensuring that their outputs best support users' needs, values, and expectations. Agentic AI systems must also contend with deeply intertwined technical, social, and governance challenges. The necessary evolution of agentic AI involves creating modular, transparent, scalable systems while addressing alignment with human values, accountability, and fairness. Secure-by-design practices, ongoing oversight, and practical guidelines for human interaction will be vital for establishing trust in agentic AI systems and ensuring safety in the many kinds of dynamic real-world environments.

Funding

This research received no external funding.

Data Availability Statement

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Table A1. Numerical overview of papers published per year.
Year Published | Number of Papers
1999 | 1
2005 | 1
2007 | 1
2014 | 1
2017 | 1
2018 | 1
2019 | 1
2020 | 1
2021 | 3
2022 | 6
2023 | 12
2024 | 21
2025 | 93
Total | 143
Table A2. Numerical overview of different publication types.
Paper Type | Number of Papers
Journals | 70
Preprints | 38
Conference Proceedings | 26
Articles | 2
Working papers | 2
Book | 2
Dissertation | 1
Workshop | 2
Total | 143
Table A3. Numerical overview of different publication venues.
Publication Venue | Number of Papers | Reference Code
arXiv | 32 | [31,33,38,39,40,42,51,71,75,77,81,87,90,98,100,111,113,116,120,132,136,137,138,141,143,144,148,150,157,159,166,167]
SSRN | 6 | [27,37,41,55,129,160]
Preprints | 3 | [36,53,96]
IEEE Access | 3 | [47,89,142]
Companion Proceedings of the ACM on Web Conference | 3 | [45,112,139]
IFAC-PapersOnLine | 3 | [61,72,119]
AI | 2 | [30,88]
ACM Conference on Fairness, Accountability, and Transparency | 2 | [34,168]
Journal of Business Research | 2 | [131,146]
Engineering Applications of Artificial Intelligence | 2 | [62,76]
ACM International Conference on Intelligent User Interfaces | 2 | [52]
Neural Information Processing Systems | 2 | [83,92]
Metallurgical and Materials Engineering | 2 | [130,134]
American Advanced Journal for Emerging Disciplinaries | 1 | [78]
ACM Transactions on Software Engineering and Methodology | 1 | [110]
ACM International Conference on User Modeling, Adaptation and Personalization | 1 | [97]
Advanced Engineering Informatics | 1 | [124]
Rani Channamma University Belagavi | 1 | [57]
AIS Transactions on Human Computer Interaction | 1 | [114]
American Journal of Computing and Engineering | 1 | [103]
Annual Review of Psychology | 1 | [162]
Applied Energy | 1 | [70]
Architectural Intelligence | 1 | [169]
Array | 1 | [66]
Association for Computing Machinery | 1 | [79]
Automation in Construction | 1 | [58]
BioSystems | 1 | [158]
Cell Reports Physical Science | 1 | [64]
Chapman and Hall/CRC | 1 | [170]
Clinical Neurophysiology | 1 | [99]
Computer Methods and Programs in Biomedicine | 1 | [59]
Computers and Electrical Engineering | 1 | [73]
Computers in Human Behavior: Artificial Humans | 1 | [128]
Cureus | 1 | [101]
Current Opinion in Chemical Engineering | 1 | [43]
Digital Discovery | 1 | [152]
European Journal of Analytics and Artificial Intelligence | 1 | [35]
Engineering | 1 | [82]
European Management Journal | 1 | [68]
Expert Systems with Applications | 1 | [135]
Extreme Mechanics Letters | 1 | [165]
PhD Dissertation | 1 | [32]
Foreign Languages in Higher Education | 1 | [147]
Frontiers in Computational Neuroscience | 1 | [122]
Frontiers in Human Dynamics | 1 | [154]
Informatics and Health | 1 | [56]
Information and Organization | 1 | [67]
International Journal of Computational Mathematical Ideas | 1 | [54]
International Conference on the AI Revolution: Research, Ethics, and Society | 1 | [50]
International Journal of Human-Computer Studies | 1 | [164]
International Journal of Research Publication and Reviews | 1 | [107]
Journal of Retailing | 1 | [140]
Journal of Building Engineering | 1 | [126]
Journal of Clinical and Experimental Hepatology | 1 | [102]
Journal of Computer Information Systems | 1 | [161]
Journal of Computer Science and Technology Studies | 1 | [95]
Journal of Retailing and Consumer Services | 1 | [145]
Journal of Strategic Information Systems | 1 | [149]
Journal of Systems and Software | 1 | [133]
Journal of Water Process Engineering | 1 | [63]
MethodsX | 1 | [109]
Multidisciplinary, Scientific Work and Management Journal | 1 | [80]
NeurIPS 2024 Workshop on Open-World Agents | 1 | [105]
Optical Switching and Networking | 1 | [163]
International Conference on Agents and Artificial Intelligence | 1 | [106]
Procedia CIRP | 1 | [121]
Conference on Human Factors in Computing Systems | 1 | [155]
Australasian Computer Science Week | 1 | [44]
Special Interest Group on Management Information Systems—Computer Personnel Research | 1 | [156]
ACM International Conference on Autonomous Agents and Multiagent Systems | 1 | [74]
ACM International Conference on Information and Knowledge Management | 1 | [108]
ACM International Conference on Interactive Media Experiences | 1 | [60]
ACM International Conference on Human Factors in Computing Systems | 1 | [153]
International Conference on Automated Assembly Systems | 1 | [123]
Pacific Asia Conference on Information Systems | 1 | [48]
IEEE/ACM International Conference on Automated Software Engineering | 1 | [115]
ACM on Software Testing and Analysis | 1 | [118]
OpenAI | 1 | [46]
Review of Materials Research | 1 | [65]
Telecommunications Policy | 1 | [117]
The Artificial Intelligence Business Review | 1 | [49]
Tourism Management | 1 | [127]
Transport Policy | 1 | [104]
UIUC Spring 2025 CS598 LLM Agent Workshop | 1 | [151]
Urban Informatics | 1 | [125]
International Conference on Learning Representations | 1 | [84]
Future Internet | 1 | [86]
Springer Science and Business Media LLC | 1 | [91]
Sage Publications, Thousand Oaks, CA | 1 | [94]
Wiley Online Library | 1 | [93]
SmythOS | 1 | [85]
Total | 143

Appendix B

Table A4. Commonly used abbreviations.
Acronym | Full Form
ABT | Agent-Based Traffic
ACP | Adaptive Control Planning
ASR | Automatic Speech Recognition
BDD | Behavior-Driven Development
BDI | Belief–Desire–Intention
BERT | Bidirectional Encoder Representations from Transformers
CAM | Compliance Agentic Model
CAMEL | Communicative Agents for “Mind” Exploration of Large Scale Language Model Society
CIF | Common Information Framework
CLIPS | C Language Integrated Production System
CRAB | Cross-Platform Agent Benchmark
CTR | Click-Through Rate
CUDA | Compute Unified Device Architecture
DILI | Drug-Induced Liver Injury
DL | Deep Learning
DQN | Deep Q-Networks
EDX | Electrodiagnostic Tests
FM | Foundation Model
GED | Graph Edit Distance
GMV | Gross Merchandise Value
ICA | Interactive Cognitive Agents
JADE | Java Agent DEvelopment Framework
LAM | Large Agent Model
LC | Logic-based Computing
LLMs | Large Language Models
LLaMA | Large Language Model Meta AI
LTM | Long-Term Memory
MADRL | Multi-Agent Deep Reinforcement Learning
MAAI | Multi-Agent Artificial Intelligence
MAS | Multi-Agent Systems
MASON | Multi-Agent Simulator Of Neighborhoods
MCTS | Monte Carlo Tree Search
ML | Machine Learning
NLP | Natural Language Processing
OOAD | Object-Oriented Analysis and Design
PDDL | Planning Domain Definition Language
PIBT | Priority Inheritance with Backtracking
PIBTTP | Priority Inheritance with Backtracking for Tree-shaped Paths
PPO | Proximal Policy Optimization
RAG | Retrieval-Augmented Generation
RL | Reinforcement Learning
RLHF | Reinforcement Learning with Human Feedback
SCADA | Supervisory Control and Data Acquisition
SOC | Security Operations Center
SOCRATEST | Self-Optimizing Contextual Reasoning and Adaptive Testing System
STM | Short-Term Memory
STRIPS | Stanford Research Institute Problem Solver
TCT | Task Completion Time
TB-CSPN | Task-Based Cognitive Sequential Planning Network
UAT | User Acceptance Testing
WWTP | Wastewater Treatment Plant Operation
XAI | Explainable Artificial Intelligence

References

  1. IBM. Agentic AI. Available online: https://www.ibm.com/think/topics/agentic-ai (accessed on 14 August 2025).
  2. Amazon Web Services. The Rise of Autonomous Agents: What Enterprise Leaders Need to Know About the Next Wave of AI. Available online: https://aws.amazon.com/blogs/aws-insights/the-rise-of-autonomous-agents-what-enterprise-leaders-need-to-know-about-the-next-wave-of-ai/ (accessed on 14 August 2025).
  3. MarketsandMarkets. AI Agents Market. Available online: https://www.marketsandmarkets.com/Market-Reports/ai-agents-market-15761548.html (accessed on 14 August 2025).
  4. Grand View Research. AI Agents Market Report. Available online: https://www.grandviewresearch.com/industry-analysis/ai-agents-market-report (accessed on 14 August 2025).
  5. Stanford HAI. Simulating Human Behavior with AI Agents. Available online: https://hai.stanford.edu/policy/simulating-human-behavior-with-ai-agents (accessed on 14 August 2025).
  6. Zou, J.; Topol, E.J. The rise of agentic AI teammates in medicine. Lancet 2025, 405, 457. [Google Scholar] [CrossRef]
  7. Chawla, C.; Chatterjee, S.; Gadadinni, S.S.; Verma, P.; Banerjee, S. Agentic AI: The building blocks of sophisticated AI business applications. J. AI Robot. Workplace Autom. 2024, 3, 1–15. [Google Scholar] [CrossRef]
  8. White, J. Building living software systems with generative & agentic AI. arXiv 2024, arXiv:2408.01768. [Google Scholar] [CrossRef]
  9. TechRadar Pro. The Enterprise AI Paradox: Why Smarter Models Alone Aren’t the Answer. Available online: https://www.techradar.com/pro/the-enterprise-ai-paradox-why-smarter-models-alone-arent-the-answer (accessed on 14 August 2025).
  10. Cardoso, R.C.; Ferrando, A. A review of agent-based programming for multi-agent systems. Computers 2021, 10, 16. [Google Scholar] [CrossRef]
  11. Jin, H.; Huang, L.; Cai, H.; Yan, J.; Li, B.; Chen, H. From llms to llm-based agents for software engineering: A survey of current, challenges and future. arXiv 2024, arXiv:2408.02479. [Google Scholar] [CrossRef]
  12. Zhou, J.; Lu, Q.; Chen, J.; Zhu, L.; Xu, X.; Xing, Z.; Harrer, S. A taxonomy of architecture options for foundation model-based agents: Analysis and decision model. arXiv 2024, arXiv:2408.02920. [Google Scholar] [CrossRef]
  13. Li, X.; Wang, S.; Zeng, S.; Wu, Y.; Yang, Y. A survey on LLM-based multi-agent systems: Workflow, infrastructure, and challenges. Vicinagearth 2024, 1, 9. [Google Scholar] [CrossRef]
  14. Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. Front. Comput. Sci. 2024, 18, 186345. [Google Scholar] [CrossRef]
  15. Gao, C.; Lan, X.; Li, N.; Yuan, Y.; Ding, J.; Zhou, Z.; Xu, F.; Li, Y. Large language models empowered agent-based modeling and simulation: A survey and perspectives. Humanit. Soc. Sci. Commun. 2024, 11, 1259. [Google Scholar] [CrossRef]
  16. Novozhilova, E.; Mays, K.; Katz, J.E. Looking towards an automated future: US attitudes towards future artificial intelligence instantiations and their effect. Humanit. Soc. Sci. Commun. 2024, 11, 132. [Google Scholar] [CrossRef]
  17. Sami, A.M.; Rasheed, Z.; Kemell, K.K.; Waseem, M.; Kilamo, T.; Saari, M.; Duc, A.N.; Systä, K.; Abrahamsson, P. System for systematic literature review using multiple ai agents: Concept and an empirical evaluation. arXiv 2024, arXiv:2403.08399. [Google Scholar] [CrossRef]
  18. Dev, K.; Khowaja, S.A.; Singh, K.; Zeydan, E.; Debbah, M. Advanced architectures integrated with agentic ai for next-generation wireless networks. arXiv 2025, arXiv:2502.01089. [Google Scholar] [CrossRef]
  19. Huang, X.; Liu, W.; Chen, X.; Wang, X.; Wang, H.; Lian, D.; Wang, Y.; Tang, R.; Chen, E. Understanding the planning of LLM agents: A survey. arXiv 2024, arXiv:2402.02716. [Google Scholar] [CrossRef]
  20. Ke, Z.; Jiao, F.; Ming, Y.; Nguyen, X.P.; Xu, A.; Long, D.X.; Li, M.; Qin, C.; Wang, P.; Savarese, S.; et al. A survey of frontiers in llm reasoning: Inference scaling, learning to reason, and agentic systems. arXiv 2025, arXiv:2504.09037. [Google Scholar] [CrossRef]
  21. Ferrag, M.A.; Tihanyi, N.; Debbah, M. From llm reasoning to autonomous ai agents: A comprehensive review. arXiv 2025, arXiv:2504.19678. [Google Scholar] [CrossRef]
  22. Schneider, J. Generative to agentic ai: Survey, conceptualization, and challenges. arXiv 2025, arXiv:2504.18875. [Google Scholar] [CrossRef]
  23. Mishra, L.N.; Senapati, B. Retail Resilience Engine: An Agentic AI Framework for Building Reliable Retail Systems With Test-Driven Development Approach. IEEE Access 2025, 13, 50226–50243. [Google Scholar] [CrossRef]
  24. Raza, S.; Sapkota, R.; Karkee, M.; Emmanouilidis, C. Trism for agentic ai: A review of trust, risk, and security management in llm-based agentic multi-agent systems. arXiv 2025, arXiv:2506.04133. [Google Scholar]
  25. Cao, P.; Men, T.; Liu, W.; Zhang, J.; Li, X.; Lin, X.; Sui, D.; Cao, Y.; Liu, K.; Zhao, J. Large language models for planning: A comprehensive and systematic survey. arXiv 2025, arXiv:2505.19683. [Google Scholar] [CrossRef]
  26. Joshi, S. LLMOps, AgentOps, and MLOps for Generative AI: A Comprehensive Review. Int. J. Comput. Appl. Technol. Res. 2025, 14, 1–11. [Google Scholar]
  27. Joshi, S. Comprehensive Review of Artificial General Intelligence AGI and Agentic GenAI: Applications in Business and Finance. Int. J. Multidiscip. Res. Growth Eval. 2025, 6, 681–688. [Google Scholar] [CrossRef]
  28. Bolanos, F.; Salatino, A.; Osborne, F.; Motta, E. Artificial intelligence for literature reviews: Opportunities and challenges. Artif. Intell. Rev. 2024, 57, 259. [Google Scholar] [CrossRef]
  29. Sapkota, R.; Roumeliotis, K.I.; Karkee, M. Ai agents vs. agentic ai: A conceptual taxonomy, applications and challenge. arXiv 2025, arXiv:2505.10468. [Google Scholar] [CrossRef]
  30. Olujimi, P.A.; Owolawi, P.A.; Mogase, R.C.; Wyk, E.V. Agentic AI frameworks in SMMEs: A systematic literature review of ecosystemic interconnected agents. AI 2025, 6, 123. [Google Scholar] [CrossRef]
  31. Shah, C.; White, R.W. Agents are not enough. arXiv 2024, arXiv:2412.16241. [Google Scholar] [CrossRef]
  32. Dharanikota, S. Psychological and Agentic Effects of Human-Bot Delegation in Open-Source Software Development (OSSD) Communities: An Empirical Investigation of Information Systems Delegation Framework. 2022. Available online: https://digitalcommons.fiu.edu/etd/5116/ (accessed on 14 August 2025).
  33. Ren, Y.; Liu, Y.; Ji, T.; Xu, X. AI Agents and Agentic AI-Navigating a Plethora of Concepts for Future Manufacturing. arXiv 2025, arXiv:2507.01376. [Google Scholar] [CrossRef]
  34. Chan, A.; Salganik, R.; Markelius, A.; Pang, C.; Rajkumar, N.; Krasheninnikov, D.; Langosco, L.; He, Z.; Duan, Y.; Carroll, M.; et al. Harms from Increasingly Agentic Algorithmic Systems. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, New York, NY, USA, 12–15 June 2023. [Google Scholar] [CrossRef]
  35. Suura, S.R. Agentic artificial intelligence systems for dynamic health management and real-time genomic data analysis. Eur. J. Anal. Artif. Intell. (EJAAI) 2024, 2. [Google Scholar]
  36. Pavani, S.; Shwetha, H. Agentic AI: Redefining Autonomy for Complex Goal-Driven Systems. 2025. Available online: https://scholar.google.com/scholar?hl=en&as_sdt=0%2C26&q=Agentic+AI%3A+Redefining+Autonomy+for+Complex+Goal-Driven+Systems.&btnG= (accessed on 14 August 2025).
  37. Mohammed Salah, A.; Alnoor, A.; Abdelfattah, F.; Chew, D.X. Agentic AI Forging Adaptive, Equity-Driven Governance Pathways for Sustainable Futures. 2025. Available online: http://dx.doi.org/10.2139/ssrn.5229744 (accessed on 14 August 2025).
  38. Botti, V. Agentic AI and Multiagentic: Are We Reinventing the Wheel? arXiv 2025, arXiv:2506.01463. [Google Scholar] [CrossRef]
  39. Okpala, I.; Golgoon, A.; Kannan, A.R. Agentic AI Systems Applied to tasks in Financial Services: Modeling and model risk management crews. arXiv 2025, arXiv:2502.05439. [Google Scholar]
  40. Narajala, V.S.; Narayan, O. Securing agentic ai: A comprehensive threat model and mitigation framework for generative ai agents. arXiv 2025, arXiv:2504.19956. [Google Scholar] [CrossRef]
  41. Salah, M.; Alnoor, A.; Abdelfattah, F.; Al Halbusi, H. Agentic Artificial Intelligence in Public Administration: Foundations, Applications, and Governance. 2025. Available online: https://ssrn.com/abstract=5249100 (accessed on 10 May 2025).
  42. Jiang, F.; Pan, C.; Dong, L.; Wang, K.; Dobre, O.A.; Debbah, M. From large ai models to agentic ai: A tutorial on future intelligent communications. arXiv 2025, arXiv:2505.22311. [Google Scholar] [CrossRef]
  43. Boskabadi, M.R.; Cao, Y.; Khadem, B.; Clements, W.; Nevin Gerek, Z.; Reuthe, E.; Sivaram, A.; Savoie, C.J.; Mansouri, S.S. Industrial Agentic AI and generative modeling in complex systems. Curr. Opin. Chem. Eng. 2025, 48, 101150. [Google Scholar] [CrossRef]
  44. Menezes, V.P.; Chowdhury, M.J.M.; Mahmood, A. An Agentic Framework for Compliant, Ethical and Trustworthy GenAI Applications in Healthcare. In Proceedings of the 2025 Australasian Computer Science Week, New York, NY, USA, 10–13 February 2025. [Google Scholar] [CrossRef]
  45. Huang, L.; Koutra, D.; Kulkarni, A.; Prioleau, T.; Wu, Q.; Yan, Y.; Yang, Y.; Zou, J.; Zhou, D. Towards Agentic AI for Science: Hypothesis Generation, Comprehension, Quantification, and Validation. In Proceedings of the Companion Proceedings of the ACM on Web Conference, Sydney, Australia, 28 April–2 May 2025; pp. 1639–1642. [Google Scholar]
  46. Shavit, Y.; Agarwal, S.; Brundage, M.; Adler, S.; O’Keefe, C.; Campbell, R.; Lee, T.; Mishkin, P.; Eloundou, T.; Hickey, A.; et al. Practices for Governing Agentic AI Systems; Research Paper; OpenAI: San Francisco, CA, USA, 2023. [Google Scholar]
  47. Acharya, D.B.; Kuppan, K.; Bhaskaracharya, D. Agentic AI: Autonomous Intelligence for Complex Goals—A Comprehensive Survey. IEEE Access 2025, 13, 18912–18936. [Google Scholar] [CrossRef]
  48. Wissuchek, C.; Zschech, P. Exploring Agentic Artificial Intelligence Systems: Towards a Typological Framework. arXiv 2025, arXiv:2508.00844. [Google Scholar]
  49. Allam, H.; AlOmar, B.; Dempere, J. Agentic AI for IT and Beyond: A Qualitative Analysis of Capabilities, Challenges, and Governance. Artif. Intell. Bus. Rev. 2025, 1. Available online: https://theaibr.com/index.php/aibr/article/view/3 (accessed on 14 August 2025). [CrossRef]
  50. Horne, D. The Agentic AI Mindset–A Practitioner’s Guide to Architectures, Patterns, and Future Directions for Autonomy and Automation. 2025. Available online: https://www.researchgate.net/profile/Dwight-Horne/publication/390958865_The_Agentic_AI_Mindset_-_A_Practitioner’s_Guide_to_Architectures_Patterns_and_Future_Directions_for_Autonomy_and_Automation/links/6805a1eadf0e3f544f432cad/The-Agentic-AI-Mindset-A-Practitioners-Guide-to-Architectures-Patterns-and-Future-Directions-for-Autonomy-and-Automation.pdf (accessed on 14 August 2025).
  51. Sapkota, R.; Roumeliotis, K.I.; Karkee, M. Vibe coding vs. agentic coding: Fundamentals and practical implications of agentic ai. arXiv 2025, arXiv:2505.19443. [Google Scholar] [CrossRef]
  52. Brachman, M.; Kunde, S.; Miller, S.; Fucs, A.; Dempsey, S.; Jabbour, J.; Geyer, W. Building Appropriate Mental Models: What Users Know and Want to Know about an Agentic AI Chatbot. In Proceedings of the 30th International Conference on Intelligent User Interfaces, Cagliari, Italy, 24–27 March 2025; pp. 247–264. [Google Scholar]
  53. Sawant, P. Agentic AI: A Quantitative Analysis of Performance and Applications 2025. Available online: https://www.preprints.org/frontend/manuscript/b540908df60641a985f056de30899adb/download_pub (accessed on 14 August 2025).
  54. Peddisetti, S. Agentic AI Meets Data Engineering: Toward Self-Directed, Interpretable, and Balanced Pipelines. Int. J. Comput. Math. Ideas (IJCMI) 2025, 17, 17313–17325. [Google Scholar]
  55. Pamisetty, A. Application of Agentic Artificial Intelligence in Autonomous Decision Making Across Food Supply Chains. 2024. Available online: https://ssrn.com/abstract=5231360 (accessed on 14 August 2025).
  56. Karunanayake, N. Next-generation agentic AI for transforming healthcare. Inform. Health 2025, 2, 73–83. [Google Scholar] [CrossRef]
  57. Raidas, M.A.; Bhandari, R. Agentic AI in Education: Redefining Learning for the Digital Era. In Artificial Intelligence in EducationTransforming Learning for the Future; Scholars’ Press: London, UK, 2025; p. 89. [Google Scholar]
  58. Zhang, L.; Fu, X.; Li, Y.; Chen, J. Large language model-based agent Schema and library for automated building energy analysis and modeling. Autom. Constr. 2025, 176, 106244. [Google Scholar] [CrossRef]
  59. Montagna, S.; Mariani, S.; Schumacher, M.I.; Manzo, G. Agent-based systems in healthcare. Comput. Methods Programs Biomed. 2024, 248, 108140. [Google Scholar] [CrossRef] [PubMed]
  60. Vahdati, M.M.; Gholizadeh HamlAbadi, K.; Laamarti, F.; El Saddik, A. A Multi-Agent Digital Twin Framework for AI-Driven Fitness Coaching. In Proceedings of the 2025 ACM International Conference on Interactive Media Experiences, New York, NY, USA, 3–6 June 2025. [Google Scholar] [CrossRef]
  61. Daugherty, G.; Reveliotis, S.; Mohler, G. Optimized Multi-Agent Routing in Guidepath Networks. IFAC-PapersOnLine 2017, 50, 9686–9693. [Google Scholar] [CrossRef]
  62. Nourani, C. Multiagent AI implementations an emerging software engineering trend. Eng. Appl. Artif. Intell. 1999, 12, 37–42. [Google Scholar] [CrossRef]
  63. Nam, K.; Heo, S.; Kim, S.; Yoo, C. A multi-agent AI reinforcement-based digital multi-solution for optimal operation of a full-scale wastewater treatment plant under various influent conditions. J. Water Process Eng. 2023, 52, 103533. [Google Scholar] [CrossRef]
  64. Li, J.-H.; Hu, Y.; Xia, G.; Mo, W.; Li, B.; Jia, Y.; Gao, Y.; Xuan, F.; Liu, H.; Lian, C. Coevolution of large language models with physical models boosts advanced battery research. Cell Rep. Phys. Sci. 2025, 6, 102553. [Google Scholar] [CrossRef]
  65. Wang, G.; Hu, J.; Zhou, J.; Liu, S.; Li, Q.; Sun, Z. Knowledge-guided large language model for material science. Rev. Mater. Res. 2025, 1, 100007. [Google Scholar] [CrossRef]
  66. Hosseini, S.; Seilani, H. The role of agentic ai in shaping a smart future: A systematic review. Array 2025, 26, 100399. [Google Scholar] [CrossRef]
  67. Leonardi, P.M. Homo agenticus in the age of agentic AI: Agency loops, power displacement, and the circulation of responsibility. Inf. Organ. 2025, 35, 100582. [Google Scholar] [CrossRef]
  68. Korzynski, P.; Edwards, A.; Gupta, M.C.; Mazurek, G.; Wirtz, J. Humanoid robotics and agentic AI: Reframing management theories and future research directions. Eur. Manag. J. 2025, 43, 548–560. [Google Scholar] [CrossRef]
  69. Fujitani, Y.; Yamauchi, T.; Miyashita, Y.; Sugawara, T. Deadlock-Free Method for Multi-Agent Pickup and Delivery Problem Using Priority Inheritance with Temporary Priority. Procedia Comput. Sci. 2022, 207, 1552–1561. [Google Scholar] [CrossRef]
  70. Xie, J.; Ajagekar, A.; You, F. Multi-Agent attention-based deep reinforcement learning for demand response in grid-responsive buildings. Appl. Energy 2023, 342, 121162. [Google Scholar] [CrossRef]
  71. Sifakis, J.; Li, D.; Huang, H.; Zhang, Y.; Dang, W.; Huang, R.; Yu, Y. A Reference Architecture for Autonomous Networks: An Agent-Based Approach. arXiv 2025, arXiv:2503.12871. [Google Scholar]
  72. Deng, L.; Shu, Z.; Chen, T. Event-Triggered Robust Distributed MPC for Multi-Agent Systems with A Two-Step Event Verification. IFAC-PapersOnLine 2022, 55, 144–149. [Google Scholar] [CrossRef]
  73. Coordination of modular nano grid energy management using multi-agent AI architecture. Comput. Electr. Eng. 2024, 115, 109112. [CrossRef]
  74. Multimodal Agentic Model Predictive Control. In Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems, Detroit, MI, USA, 19–23 May 2025.
  75. Luo, J.; Zhang, W.; Yuan, Y.; Zhao, Y.; Yang, J.; Gu, Y.; Wu, B.; Chen, B.; Qiao, Z.; Long, Q.; et al. Large language model agent: A survey on methodology, applications and challenges. arXiv 2025, arXiv:2503.21460. [Google Scholar] [CrossRef]
  76. Liu, J.; Hao, W.; Cheng, K.; Jin, D. Large language model-based planning agent with generative memory strengthens performance in textualized world. Eng. Appl. Artif. Intell. 2025, 148, 110319. [Google Scholar] [CrossRef]
  77. Bousetouane, F. Agentic Systems: A Guide to Transforming Industries with Vertical AI Agents. arXiv 2025, arXiv:2501.00881. [Google Scholar] [CrossRef]
  78. Kalisetty, S.; Singireddy, J. Agentic AI in Retail: A Paradigm Shift in Autonomous Customer Interaction and Supply Chain Automation. Am. Adv. J. Emerg. Discip. (AAJED) 2023, 1. [Google Scholar]
  79. Huang, X.; Lian, J.; Lei, Y.; Yao, J.; Lian, D.; Xie, X. Recommender AI Agent: Integrating Large Language Models for Interactive Recommendations. ACM Trans. Inf. Syst. 2025, 43, 1–33. [Google Scholar] [CrossRef]
  80. Paleti, S. Agentic AI in Financial Decision-Making: Enhancing Customer Risk Profiling, Predictive Loan Approvals, and Automated Treasury Management in Modern Banking. Multidiscip. Sci. Work. Manag. J. 2024, 34, 832–843. [Google Scholar]
  81. Trirat, P.; Jeong, W.; Hwang, S.J. Agentic Predictor: Performance Prediction for Agentic Workflows via Multi-View Encoding. arXiv 2025, arXiv:2505.19764. [Google Scholar] [CrossRef]
  82. Li, X.; Shi, W.; Zhang, H.; Peng, C.; Wu, S.; Tong, W. The Agentic-AI Core: An AI-Empowered, Mission-Oriented Core Network for Next-Generation Mobile Telecommunications. Engineering 2025, in press. [CrossRef]
  83. Li, G.; Hammoud, H.; Itani, H.; Khizbullin, D.; Ghanem, B. CAMEL: Communicative agents for “mind” exploration of large language model society. Adv. Neural Inf. Process. Syst. 2023, 36, 51991–52008. [Google Scholar]
  84. Hong, S.; Zhuge, M.; Chen, J.; Zheng, X.; Cheng, Y.; Zhang, C.; Wang, J.; Wang, Z.; Yau, S.K.S.; Lin, Z.; et al. MetaGPT: Meta programming for a multi-agent collaborative framework. In Proceedings of the International Conference on Learning Representations, ICLR, Vienna, Austria, 7–11 May 2024. [Google Scholar]
  85. De Ridder, A. SuperAGI vs LangChain: A Comprehensive Guide. Available online: https://smythos.com/developers/agent-comparisons/superagi-vs-langchain/ (accessed on 29 August 2025).
  86. Borghoff, U.M.; Bottoni, P.; Pareschi, R. Beyond Prompt Chaining: The TB-CSPN Architecture for Agentic AI. Future Internet 2025, 17, 363. [Google Scholar] [CrossRef]
  87. Liu, Z.; Yao, W.; Zhang, J.; Yang, L.; Liu, Z.; Tan, J.; Choubey, P.K.; Lan, T.; Wu, J.; Wang, H.; et al. AgentLite: A Lightweight Library for Building and Advancing Task-Oriented LLM Agent System. arXiv 2024, arXiv:2402.15538. [Google Scholar] [CrossRef]
  88. Thirumalainambi, R. Pitfalls of JESS for Dynamic Systems. In Proceedings of the Artificial Intelligence and Pattern Recognition, Citeseer, Orlando, FL, USA, 9–12 July 2007; pp. 491–494. [Google Scholar]
  89. Nguyen, T.T.; Nguyen, N.D.; Nahavandi, S. Deep Reinforcement Learning for Multiagent Systems: A Review of Challenges, Solutions, and Applications. IEEE Trans. Cybern. 2020, 50, 3826–3839. [Google Scholar] [CrossRef]
  90. Zhang, K.; Yang, Z.; Başar, T. Multi-Agent Reinforcement Learning: A Selective Overview of Theories and Algorithms. arXiv 2021, arXiv:1911.10635. [Google Scholar] [CrossRef]
  91. Hernandez-Leal, P.; Kartal, B.; Taylor, M.E. A survey and critique of multiagent deep reinforcement learning. Auton. Agents Multi-Agent Syst. 2019, 33, 750–797. [Google Scholar] [CrossRef]
  92. Silver, T.; Hariprasad, V.; Shuttleworth, R.S.; Kumar, N.; Lozano-Pérez, T.; Kaelbling, L.P. PDDL planning with pretrained large language models. In Proceedings of the NeurIPS 2022 Foundation Models for Decision Making Workshop. 2022. Available online: https://drive.google.com/file/d/1PqCYzzfdUFitxG7NxFs0D6dn-TzI97q_/view (accessed on 14 August 2025).
  93. Chiacchio, F.; Pennisi, M.; Russo, G.; Motta, S.; Pappalardo, F. Agent-based modeling of the immune system: NetLogo, a promising framework. BioMed Res. Int. 2014, 2014, 907171. [Google Scholar] [CrossRef] [PubMed]
  94. Luke, S.; Cioffi-Revilla, C.; Panait, L.; Sullivan, K.; Balan, G. Mason: A multiagent simulation environment. Simulation 2005, 81, 517–527. [Google Scholar] [CrossRef]
  95. Garg, V. Designing the Mind: How Agentic Frameworks Are Shaping the Future of AI Behavior. J. Comput. Sci. Technol. Stud. 2025, 7, 182–193. [Google Scholar] [CrossRef]
  96. Allmendinger, S.; Bonenberger, L.; Endres, K.; Fetzer, D.; Gimpel, H.; Kühl, N. Multi-Agent AI. OSF Preprints. 2023. Available online: https://osf.io/hndm3 (accessed on 14 August 2025).
  97. Nurturing Code Quality: Leveraging Static Analysis and Large Language Models for Software Quality in Education. In Proceedings of the Adjunct 33rd ACM Conference on User Modeling, Adaptation and Personalization. New York, NY, USA, 16–19 June 2025. [CrossRef]
  98. Yao, S.; Zhao, J.; Yu, D.; Du, N.; Shafran, I.; Narasimhan, K.; Cao, Y. ReAct: Synergizing Reasoning and Acting in Language Models. In Proceedings of the International Conference on Learning Representations (ICLR), Kigali, Rwanda, 1–5 May 2023. [Google Scholar]
  99. Gorenshtein, A.; Sorka, M.; Khateb, M.; Aran, D.; Shelly, S. Agent-guided AI-powered interpretation and reporting of nerve conduction studies and EMG (INSPIRE). Clin. Neurophysiol. 2025, 177, 2110792. [Google Scholar] [CrossRef]
  100. Kim, J.; Wahi-Anwa, M.; Park, S.; Shin, S.; Hoffman, J.M.; Brown, M.S. Autonomous Computer Vision Development with Agentic AI. arXiv 2025, arXiv:2506.11140. [Google Scholar] [CrossRef]
  101. Roy, T.P.D. Bioethics Artificial Intelligence Advisory (BAIA): An Agentic Artificial Intelligence (AI) Framework for Bioethical Clinical Decision Support. Cureus 2025, 17, e80494. [Google Scholar] [CrossRef]
  102. Suechkul, P.; Tribuddharat, N.; Kulthamrongsri, N. Toward Real-Time Detection of Drug-Induced Liver Injury Using Large Language Models: A Feasibility Study from Clinical Note. J. Clin. Exp. Hepatol. 2025, 15, 102627. [Google Scholar] [CrossRef]
  103. Bharadiya, J.P. Artificial Intelligence in Transportation Systems A Critical Review. Am. J. Comput. Eng. 2023, 6, 35–45. [Google Scholar] [CrossRef]
  104. Yu, J. Preparing for an agentic era of human-machine transportation systems: Opportunities, challenges, and policy recommendations. Transp. Policy 2025, 171, 78–97. [Google Scholar] [CrossRef]
  105. Jeyakumar, S.K.; Ahmad, A.A.; Gabriel, A.G. Advancing Agentic Systems: Dynamic Task Decomposition, Tool Integration and Evaluation using Novel Metrics and Dataset. In Proceedings of the NeurIPS 2024 Workshop on Open-World Agents, Vancouver, BC, Canada, 16 December 2024. [Google Scholar]
  106. Paduraru, C.; Zavelca, M.; Stefanescu, A. Agentic AI for Behavior-Driven Development Testing using Large Language Models. Available online: https://www.scitepress.org/Papers/2025/133744/133744.pdf (accessed on 14 August 2025).
  107. Ogbu, D. Agentic AI in Computer Vision Domain-Recent Advances and Prospects. Available online: https://www.researchgate.net/profile/Daniel-Ogbu/publication/386292786_Agentic_AI_in_Computer_Vision_Domain_-Recent_Advances_and_Prospects/links/674c6ec6a7fbc259f1a33618/Agentic-AI-in-Computer-Vision-Domain-Recent-Advances-and-Prospects.pdf (accessed on 14 August 2025).
  108. Zhang, Y.; Liu, Z.; Wen, Q.; Pang, L.; Liu, W.; Yu, P.S. AI Agent for Information Retrieval: Generating and Ranking. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 5605–5607. [Google Scholar]
  109. Saxena, A.; Chaudhari, A.Y.; Gupta, A. AgCV: An Agentic framework for automating computer vision application. MethodsX 2025, 15, 103424. [Google Scholar] [CrossRef]
  110. Robinson, D.; Cabrera, C.; Gordon, A.D.; Lawrence, N.D.; Mennen, L. Requirements Are All You Need: The Final Frontier for End-User Software Engineering. ACM Trans. Softw. Eng. Methodol. 2025, 34, 141. [Google Scholar] [CrossRef]
  111. Belcak, P.; Heinrich, G.; Diao, S.; Fu, Y.; Dong, X.; Muralidharan, S.; Lin, Y.C.; Molchanov, P. Small Language Models are the Future of Agentic AI. arXiv 2025, arXiv:2506.02153. [Google Scholar] [CrossRef]
  112. Wen, Q.; Zhang, Y.; Liu, Z.; McAuley, J.; Wei, H.; Pang, L.; Liu, W.; Yu, P.S. The 3rd Workshop on AI Agent for Information Retrieval: Generating and Ranking. In Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, New York, NY, USA, 28 April–2 May 2025. [Google Scholar] [CrossRef]
  113. Casper, S.; Bailey, L.; Hunter, R.; Ezell, C.; Cabalé, E.; Gerovitch, M.; Slocum, S.; Wei, K.; Jurkovic, N.; Khan, A.; et al. The AI Agent Index. arXiv 2025, arXiv:2502.01635. [Google Scholar]
  114. Meske, C.; Kuss, P.M. Theorizing the Concept of Agency in Human-Algorithmic Ensembles with a Socio-Technical Lens. 2022. Available online: https://core.ac.uk/download/pdf/542549024.pdf (accessed on 14 August 2025).
  115. Feldt, R.; Kang, S.; Yoon, J.; Yoo, S. Towards Autonomous Testing Agents via Conversational Large Language Models. In Proceedings of the 38th IEEE/ACM International Conference on Automated Software Engineering, Luxembourg, 11–15 November 2024; IEEE Press: New York, NY, USA, 2024. [Google Scholar] [CrossRef]
  116. Fatouros, G.; Makridis, G.; Kousiouris, G.; Soldatos, J.; Tsadimas, A.; Kyriazis, D. Towards Conversational AI for Human-Machine Collaborative MLOps. arXiv 2025, arXiv:2504.12477. [Google Scholar] [CrossRef]
  117. Transforming cybersecurity with agentic AI to combat emerging cyber threats. Telecommun. Policy 2025, 49, 102976. [CrossRef]
  118. Bouzenia, I.; Pradel, M. You Name It, I Run It: An LLM Agent to Execute Tests of Arbitrary Projects. Proc. Acm Softw. Eng. 2025, 2, 1054–1076. [Google Scholar] [CrossRef]
  119. Wang, L.; Qiu, T.; Pu, Z.; Yi, J.; Zhu, J.; Zhao, Y. A Decision-making Method for Swarm Agents in Attack-defense Confrontation. IFAC-PapersOnLine 2023, 56, 7858–7864. [Google Scholar] [CrossRef]
  120. Sheriff, A.; Huang, K.; Nemeth, Z.; Nakhjiri, M. ADA: Automated Moving Target Defense for AI Workloads via Ephemeral Infrastructure-Native Rotation in Kubernetes. arXiv 2025, arXiv:2505.23805. [Google Scholar]
  121. Ungen, M.; Kampert, D.; Feldotto, B.; Huber, E.; Riedel, O. Automated workflow generation supporting the value stream design of reconfigurable robot assembly cells. Procedia CIRP 2024, 128, 609–614. [Google Scholar] [CrossRef]
  122. Sandini, G.; Sciutti, A.; Morasso, P. Artificial cognition vs. artificial intelligence for next-generation autonomous robotic agents. Front. Comput. Neurosci. 2024, 18, 1349408. [Google Scholar] [CrossRef]
  123. Bennett, J.; Sterritt, R. Autonomic Computing in Total Achievement of Quality. In Proceedings of the The Twentieth International Conference on Autonomic and Autonomous Systems, Athens, Greece, 10–14 March 2024. [Google Scholar]
  124. Contexts Matter: Robot-Aware 3D human motion prediction for Agentic AI-empowered Human-Robot collaboration. Adv. Eng. Inform. 2025, 68, 103591. [CrossRef]
  125. Tiwari, A. Conceptualising the emergence of Agentic Urban AI: From automation to agency. Urban Inform. 2025, 4, 13. [Google Scholar] [CrossRef]
  126. Yoon, S.; Song, J.; Li, J. Ontology-enabled AI agent-driven intelligent digital twins for building operations and maintenance. J. Build. Eng. 2025, 108, 112802. [Google Scholar] [CrossRef]
  127. Stylos, N.; Okumus, F.; Onder, I. Beauty or the Borg: Agentic artificial intelligence organizational socialization in synergistic Hybrid Transformative Dynamic Flows. Tour. Manag. 2025, 111, 105205. [Google Scholar] [CrossRef]
  128. Pathak, A.; Bansal, V. AI as decision aid or delegated agent: The effects of trust dimensions on the adoption of AI digital agents. Comput. Hum. Behav. Artif. Humans 2024, 2, 100094. [Google Scholar] [CrossRef]
  129. Sriram, H.K.; Bharath M, B.M. Beyond Automation: Exploring the Potential of Agentic AI in Risk Management and Fraud Detection in Banks. Eksplorium 2025, 46, 192–220. [Google Scholar] [CrossRef]
  130. Inala, R.; Somu, B. Building Trustworthy Agentic Ai Systems FOR Personalized Banking Experiences. Metall. Mater. Eng. 2025, 31, 1336–1360. [Google Scholar]
  131. Wünderlich, N.V.; Blut, M.; Brock, C.; Heirati, N.; Jensen, M.; Paluch, S.; Rötzmeier-Keuper, J.; Tóth, Z. How to use emerging service technologies to enhance customer centricity in business-to-business contexts: A conceptual framework and research agenda. J. Bus. Res. 2025, 192, 115284. [Google Scholar] [CrossRef]
  132. Huang, C.; Huang, H.; Yu, T.; Xie, K.; Wu, J.; Zhang, S.; Mcauley, J.; Jannach, D.; Yao, L. A Survey of Foundation Model-Powered Recommender Systems: From Feature-Based, Generative to Agentic Paradigms. arXiv 2025, arXiv:2504.16420. [Google Scholar]
  133. Liu, Y.; Lo, S.K.; Lu, Q.; Zhu, L.; Zhao, D.; Xu, X.; Harrer, S.; Whittle, J. Agent design pattern catalogue: A collection of architectural patterns for foundation model based agents. J. Syst. Softw. 2025, 220, 112278. [Google Scholar] [CrossRef]
  134. Yellanki, S.K.; Kummari, D.N.; Sheelam, G.K.; Kannan, S.; Chakilam, C. Synthetic Cognition Meets Data Deluge: Architecting Agentic AI Models for Self-Regulating Knowledge Graphs in Heterogeneous Data Warehousing. Metall. Mater. Eng. 2025, 31, 569–586. [Google Scholar] [CrossRef]
  135. Piccialli, F.; Chiaro, D.; Sarwar, S.; Cerciello, D.; Qi, P.; Mele, V. AgentAI: A comprehensive survey on autonomous agents in distributed AI for industry 4.0. Expert Syst. Appl. 2025, 291, 128404. [Google Scholar] [CrossRef]
  136. Plaat, A.; van Duijn, M.; van Stein, N.; Preuss, M.; van der Putten, P.; Batenburg, K.J. Agentic large language models, a survey. arXiv 2025, arXiv:2503.23037. [Google Scholar] [CrossRef]
  137. Hu, S.; Lu, C.; Clune, J. Automated design of agentic systems. arXiv 2024, arXiv:2408.08435. [Google Scholar] [CrossRef]
  138. Zhang, Q.; Wornow, M.; Olukotun, K. Cost-Efficient Serving of LLM Agents via Test-Time Plan Caching. arXiv 2025, arXiv:2506.14852. [Google Scholar]
  139. Zhang, Z.; Dai, Q.; Chen, X.; Li, R.; Li, Z.; Dong, Z. MemEngine: A Unified and Modular Library for Developing Advanced Memory of LLM-based Agents. In Proceedings of the Companion ACM on Web Conference 2025, New York, NY, USA, 18 May 2025. [Google Scholar] [CrossRef]
  140. Sohn, S.; Labrecque, L.; Siemon, D.; Morana, S. Artificial intelligence versus human service agents: How their presence shapes consumer information privacy concerns. J. Retail. 2025, 101, 263–278. [Google Scholar] [CrossRef]
  141. Floridi, L.; Buttaboni, C.; Hine, E.; Morley, J.; Novelli, C.; Schroder, T. Agentic AI Optimisation (AAIO): What it is, how it works, why it matters, and how to deal with it. arXiv 2025, arXiv:2504.12482. [Google Scholar] [CrossRef]
  142. Alecsoiu, O.R.; Faruqui, N.; Panagoret, A.A.; Ceausescu, A.I.; Panagoret, D.M.; Nitu, R.V.; Mutu, M.A. EcoptiAI: E-Commerce Process Optimization and Operational Cost Minimization through Task Automation using Agentic AI. IEEE Access 2025, 13, 70254–70268. [Google Scholar] [CrossRef]
  143. Narechania, A.; Endert, A.; Sinha, A.R. Agentic Enterprise: AI-Centric User to User-Centric AI. arXiv 2025, arXiv:2506.22893. [Google Scholar]
  144. Qiao, S.; Qiu, Z.; Ren, B.; Wang, X.; Ru, X.; Zhang, N.; Chen, X.; Jiang, Y.; Xie, P.; Huang, F.; et al. Agentic Knowledgeable Self-awareness. arXiv 2025, arXiv:2504.03553. [Google Scholar] [CrossRef]
  145. Chong, T.; Yu, T.; Keeling, D.I.; de Ruyter, K. AI-chatbots on the services frontline addressing the challenges and opportunities of agency. J. Retail. Consum. Serv. 2021, 63, 102735. [Google Scholar] [CrossRef]
  146. Jeon, Y.A. Let me transfer you to our AI-based manager: Impact of manager-level job titles assigned to AI-based agents on marketing outcomes. J. Bus. Res. 2022, 145, 892–904. [Google Scholar] [CrossRef]
  147. Sargsyan, L. Integrating Agentic AI in Higher Education: Balancing Opportunities, Challenges, and Ethical Imperatives. Foreign Lang. High. Educ. 2025, 29, 87–100. [Google Scholar] [CrossRef]
  148. Kamalov, F.; Calonge, D.S.; Smail, L.; Azizov, D.; Thadani, D.R.; Kwong, T.; Atif, A. Evolution of ai in education: Agentic workflows. arXiv 2025, arXiv:2504.20082. [Google Scholar] [CrossRef]
  149. AI Agents: Potential implications for IS Research? Available online: https://www.sciencedirect.com/science/article/pii/S0963868725000216 (accessed on 14 August 2025).
  150. Gridach, M.; Nanavati, J.; Abidine, K.Z.E.; Mendes, L.; Mack, C. Agentic AI for scientific discovery: A survey of progress, challenges, and future directions. arXiv 2025, arXiv:2503.08979. [Google Scholar]
  151. Zhou, R.; Sikand, V.; Rao, S. AI Agents for Deep Scientific Research. In Proceedings of the UIUC Spring 2025 CS598 LLM Agent Workshop, Urbana, Illinois, USA, 27 April 2025; Submitted. Available online: https://openreview.net/forum?id=wODNrFtTT2 (accessed on 14 August 2025).
  152. Yager, K.G. Towards a science exocortex. Digit. Discov. 2024, 3, 1933–1957. [Google Scholar] [CrossRef]
  153. Budig, T.; Nißen, M.; Kowatsch, T. Towards the Embodied Conversational Interview Agentic Service ELIAS: Development and Evaluation of a First Prototype. In Proceedings of the Adjunct 33rd ACM Conference on User Modeling, Adaptation and Personalization, New York, NY, USA, 16–19 June 2025. [Google Scholar] [CrossRef]
  154. Borghoff, U.M.; Bottoni, P.; Pareschi, R. Human-artificial interaction in the age of agentic ai: A system-theoretical approach. Front. Hum. Dyn. 2025, 7, 1579166. [Google Scholar] [CrossRef]
  155. Weisz, J.D.; He, J.; Muller, M.; Hoefer, G.; Miles, R.; Geyer, W. Design Principles for Generative AI Applications. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, Honolulu, HI, USA, 11–16 May 2024. [Google Scholar] [CrossRef]
  156. Vajpayee, P.; Hossain, G. Cyber Defense through Agentic AI Enabled Automation: An Approach to Reduce Cyber Risk. In Proceedings of the 2025 Computers and People Research Conference, Waco, TX, USA, 28–30 May 2025. [Google Scholar] [CrossRef]
  157. Mukherjee, A.; Chang, H.H. Agentic AI: Expanding the Algorithmic Frontier of Creative Problem Solving. arXiv 2025, arXiv:2502.00289. [Google Scholar]
  158. Seifert, G.; Sealander, A.; Marzen, S.; Levin, M. From reinforcement learning to agency: Frameworks for understanding basal cognition. BioSystems 2024, 235, 105107. [Google Scholar] [CrossRef]
  159. Zhang, R.; Tang, S.; Liu, Y.; Niyato, D.; Xiong, Z.; Sun, S.; Mao, S.; Han, Z. Toward agentic ai: Generative information retrieval inspired intelligent communications and networking. arXiv 2025, arXiv:2502.16866. [Google Scholar] [CrossRef]
  160. Motamary, S. Transforming Customer Experience in Telecom: Agentic AI-Driven BSS Solutions for Hyper-Personalized Service Delivery. 2024. Available online: https://ssrn.com/abstract=5240126 (accessed on 14 August 2025).
  161. Hughes, L.; Dwivedi, Y.K.; Malik, T.; Shawosh, M.; Albashrawi, M.A.; Jeon, I.; Dutot, V.; Appanderanda, M.; Crick, T.; De’, R.; et al. AI agents and agentic systems: A multi-expert analysis. J. Comput. Inf. Syst. 2025, 65, 489–517. [Google Scholar] [CrossRef]
  162. Bonnefon, J.; Rahwan, I.; Shariff, A.F. The Moral Psychology of Artificial Intelligence. Annu. Rev. Psychol. 2023, 75, 653–675. [Google Scholar] [CrossRef]
  163. Cruzes, S. Revolutionizing optical networks: The integration and impact of large language models. Opt. Switch. Netw. 2025, 57, 100812. [Google Scholar] [CrossRef]
  164. Cabitza, F.; Campagner, A.; Simone, C. The need to move away from agential-AI: Empirical investigations, useful concepts and open issues. Int. J. Hum.-Comput. Stud. 2021, 155, 102696. [Google Scholar] [CrossRef]
  165. Ni, B.; Buehler, M.J. MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge. Extrem. Mech. Lett. 2024, 67, 102131. [Google Scholar] [CrossRef]
  166. Rathakrishnan, M.; Gayan, S.; Singh, R.; Kaur, A.; Inaltekin, H.; Edirisinghe, S.; Poor, H.V. Towards AI-Driven RANs for 6G and Beyond: Architectural Advancements and Future Horizons. arXiv 2025, arXiv:2506.16070. [Google Scholar] [CrossRef]
  167. Miehling, E.; Ramamurthy, K.N.; Varshney, K.R.; Riemer, M.; Bouneffouf, D.; Richards, J.T.; Dhurandhar, A.; Daly, E.M.; Hind, M.; Sattigeri, P.; et al. Agentic ai needs a systems theory. arXiv 2025, arXiv:2503.00237. [Google Scholar]
  168. Ajmani, L.H.; Abdelkadir, N.A.; Chancellor, S. Secondary Stakeholders in AI: Fighting for, Brokering, and Navigating Agency. In Proceedings of the 2025 ACM Conference on Fairness, Accountability, and Transparency, New York, NY, USA, 23–26 June 2025. [Google Scholar] [CrossRef]
  169. Cheung, L.H.; Wang, L.; Lei, D. Conversational, agentic AI-enhanced architectural design process: Three approaches to multimodal AI-enhanced early-stage performative design exploration. Archit. Intell. 2025, 4, 1–25. [Google Scholar] [CrossRef]
  170. Jilk, D.J. Limits to verification and validation of agentic behavior. In Artificial Intelligence Safety and Security; Chapman and Hall/CRC: Boca Raton, FL, USA, 2018; pp. 225–234. [Google Scholar]
Figure 1. Agentic AI vs. AI agents.
Figure 2. Google Trends shows the popularity of agentic AI.
Figure 3. A Venn diagram illustrating the conceptual foundations of agentic AI.
Figure 4. Core components of agentic AI system.
Figure 5. ReAct single-agent loop.
Figure 6. Supervisor/hierarchical architecture.
Figure 7. Hybrid reactive–deliberative.
Figure 8. Belief–desire–intention (BDI) architecture.
Figure 9. Layered decision (neuro-symbolic) architecture.
Figure 10. Overview of agentic AI applications across multiple domains.
Figure 11. Classification diagram for testing methods and evaluation metrics that are used to assess the performance of agentic AI systems.
Table 1. Summary of important surveys on agentic AI (L—Low, M—Medium, H—High, NA—Not Applicable).
Reference | Year Published | Definition and Concept Taxonomy | Architecture | Applications Classification | Input/Output Formats Classification | Evaluation Metrics and Metric Classification | Challenges and Limitations | Remarks
[10] | 2021 | H | M | H | L | M | H | Review of agent-based programming languages and frameworks for multi-agent systems, bridging theory and practice while identifying future research directions.
[11] | 2024 | M | L | H | NA | L | L | Focuses on the use of LLMs and LLM-based agents, highlighting their applications, limitations, and future research directions across key SE tasks.
[12] | 2024 | L | M | NA | L | NA | NA | Focuses on a unified taxonomy and decision model to enhance the architectural design and operational understanding of foundation-model-based AI agents.
[13] | 2024 | H | M | H | NA | NA | H | Survey on a unified framework detailing the architecture, applications, and challenges of LLM-based multi-agent systems.
[14] | 2024 | H | H | M | M | NA | M | Focuses on a unified framework for LLM-based autonomous agents, covering their construction, evaluation methods, and future challenges.
[15] | 2024 | M | M | H | M | M | H | Survey on LLMs enhancing agent-based simulations across physical, cyber, social, and hybrid domains, highlighting challenges and future directions.
[16] | 2024 | L | NA | L | NA | L | L | The study investigates U.S. public comfort with AI across occupations, showing attitudes vary by automation likelihood and individual traits.
[17] | 2024 | L | L | L | NA | L | L | Focuses on a multi-AI-agent model using LLMs to fully automate and streamline the systematic literature review process.
[18] | 2025 | M | H | L | M | NA | L | This paper explores 6G network integration with agentic AI, emphasizing automation and support for intelligent, energy-efficient, low-latency operations.
[19] | 2024 | M | H | L | L | M | M | This survey systematically reviews LLM-based agent planning, providing a taxonomy and analyzing recent approaches to improve planning abilities.
[20] | 2025 | NA | NA | NA | NA | M | M | Focuses on reasoning methods in LLMs, offering a systematic framework to understand evolving LLM reasoning capabilities and trends.
[21] | 2025 | M | M | L | L | M | H | Focuses on LLMs and autonomous AI agents, covering benchmarks, frameworks, applications, and future research directions.
[22] | 2025 | M | NA | L | NA | NA | L | This survey explores the evolution from generative AI to agentic AI, highlighting enhanced reasoning, autonomy, and key challenges for future research.
[23] | 2025 | M | H | M | L | M | M | The paper proposes an agentic AI framework using LLMs and TDD to enhance reliability, scalability, and decision-making in retail systems.
[24] | 2025 | M | M | L | L | M | H | Reviews trust, risk, and security management in LLM-based agentic multi-agent systems and proposes a roadmap for responsible deployment.
[25] | 2025 | M | H | L | L | H | M | Survey on LLM-based planning, categorizing methods and discussing evaluation frameworks and future directions.
[26] | 2025 | H | M | L | L | M | H | Reviews LLMOps, AgentOps, and MLOps, outlining best practices and challenges for operationalizing generative AI systems.
[27] | 2025 | L | H | H | L | M | M | Review of AGI and agentic AI, emphasizing their definitions, architectures, applications, and associated challenges.
[28] | 2024 | L | M | NA | L | L | M | Review of AI-enhanced tools for SLRs, focusing on screening, extraction, and the integration of LLMs.
[29] | 2025 | H | H | M | L | M | M | Focuses on a taxonomy and comparative analysis of AI agents and agentic AI, highlighting their divergent architectures, applications, and challenges.
[30] | 2025 | H | M | H | L | L | M | Reviews agentic AI applications in SMMEs, highlighting enhanced efficiency, adaptability, and innovation through interconnected autonomous agents.
This Paper | 2025 | H | H | H | H | H | H | Our paper covers all six areas and contributes significant results such as a definition and concept taxonomy, architecture, applications classification, input/output format classification, evaluation metrics, and challenges and limitations.
Table 2. Comparative analysis of AI paradigms: multi-agent systems, generative AI, and agentic AI.
Aspects | Multi-Agent Systems (MAS) | Generative AI (GenAI) | Agentic AI | Ref.
Primary Function | Task division among multiple autonomous agents for collaboration/competition | Task division among multiple autonomous agents for collaboration/competition | Goal-oriented autonomous decision-making with multi-step workflow execution | [27,35,37,39,41,43,45,47,49,54,55,56]
Decision-Making | Distributed across multiple agents with coordination mechanisms | Pattern-based responses from training data | Independent analysis, reasoning, and contextual decisions | [11,12,13,14,15,19,20,25,26,27,31,41,44,47,57,58]
Task Scope | Complex goals broken into smaller tasks for individual agents | Single-turn interactions focused on content creation | Multi-step, multi-layered tasks over extended time frames | [10,27,37,47,59,60,61,62,63]
Learning Approach | Individual agent learning with inter-agent communication | Static dataset learning during training phase | Interactive learning with continuous feedback loops and adaptation | [41,57,64,65]
Architecture | Multiple coordinated agents with communication protocols | Primarily transformer-based neural networks | Planning modules, memory systems, tool integration, reasoning components | [27,35,47]
Adaptability | Coordination challenges; varies by system design | Limited to training data patterns; requires retraining for adaptation | Dynamic adaptation to changing environments and goals | [27,41,47,57]
Human Intervention | Varies by implementation; often requires coordination oversight | Moderate intervention for prompt engineering and guidance | Low intervention; operates autonomously with minimal supervision | [47,57,66,67,68]
Goal Orientation | Distributed goal achievement through agent collaboration | Content generation goals defined by prompts | Self-defined and executed goal-oriented behavior | [18,22,23,24,27,33,34,37,38,41,47,48]
Workflow Management | Distributed workflow across multiple agents | Cannot manage end-to-end workflows independently | Comprehensive end-to-end workflow orchestration | [37,69,70,71,72,73]
Memory Systems | Individual agent memory with shared knowledge bases | No persistent memory between interactions | Sophisticated memory systems with reflection and experience storage | [35,37,39,41,74]
Real-time Adaptation | Complex goals broken into smaller tasks for individual agents | Single-turn interactions focused on content creation | Multi-step, multi-layered tasks over extended time frames | [27,37,47]
Use Cases | Complex distributed problem-solving, simulation systems | Content creation, text generation, image synthesis | Virtual assistants, autonomous systems, adaptive planning, scientific discovery | [37,47,56,75,76]
Table 3. Agentic capabilities across current LLM-based frameworks.
Framework | Primary Purpose | Key Agentic Capabilities | Primary LLM Used | Ref.
LangChain | Building structured workflows with LLMs | Planning, memory, tool use | OpenAI, Cohere, Anthropic | [26,31,78,79]
AutoGPT | Fully autonomous multi-step task execution | Goal pursuit, planning, reflection, tool use | GPT-4 | [43,45,47]
BabyAGI | Accessible autonomous task management | Task decomposition, memory, adaptive execution | OpenAI | [49,81,82]
OpenAgents | Multi-agent collaboration and coordination | Multi-agent reasoning, goal pursuit, iterative planning | GPT-4 | [26,78,79]
AutoGen | Coordinating multiple AI agents via dialogue | Multi-agent collaboration, planning, reflection | GPT-4 | [41]
CAMEL | Simulating multi-agent dialogues with defined roles | Role-based interaction, dialogue simulation, collaborative exploration | GPT-4, LLaMA | [83]
MetaGPT | Multi-agent collaborative problem solving | Automated problem decomposition, role assignment, coordination of agent societies | GPT-4 | [84]
SuperAGI | Open-source framework for developing and deploying autonomous agents | Task orchestration, multi-agent workflows, deployment at scale | LLM-agnostic (commonly GPT-4, OpenAI, Anthropic, LLaMA) | [85]
TB-CSPN | Context-aware agentic reasoning beyond prompt chaining | Task decomposition, constraint satisfaction, context-driven decision-making | No specific LLM | [86]
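To ground these frameworks, the following framework-agnostic Python sketch illustrates the supervisor-style coordination pattern that systems such as AutoGen, OpenAgents, and MetaGPT embody, in which an orchestrator decomposes a goal and routes sub-tasks to role-specialized agents. The roles, routing rule, and message format here are illustrative assumptions, not any framework's actual API.

```python
# Minimal sketch of supervisor-style multi-agent coordination (illustrative only).
from typing import Callable, Dict, List, Tuple

Agent = Callable[[str], str]   # an agent maps an instruction to a result

def researcher(task: str) -> str:
    return f"notes on: {task}"                 # stub for an LLM-backed research agent

def writer(task: str) -> str:
    return f"draft based on: {task}"           # stub for an LLM-backed writing agent

def supervisor(goal: str, agents: Dict[str, Agent]) -> List[Tuple[str, str]]:
    """Decompose the goal, route each sub-task to a role, and collect the transcript."""
    subtasks = [("researcher", f"gather background for '{goal}'"),
                ("writer", f"summarize findings for '{goal}'")]
    transcript = []
    for role, subtask in subtasks:
        transcript.append((role, agents[role](subtask)))
    return transcript

print(supervisor("agentic AI review", {"researcher": researcher, "writer": writer}))
```

In real deployments, each role would wrap an LLM call plus tools, and the supervisor's fixed sub-task list would itself be produced by a planning step.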
Table 4. Components and risks across architecture families.
Architecture | Perception/State | Planning and Reasoning | Memory | Execution and Action | Reflection/Feedback | Orchestration/Autonomy | Typical Uses | Salient Risks
BDI (Belief–Desire–Intention) | ✓ explicit beliefs | ✓ desire filtering, intention commitment | ∘ belief updates persistent | ✓ goal-driven loop | ∘ intention revision | ✓ central loop, commitment | Explainable decision systems; simulations | Symbolic modeling effort; brittle under high uncertainty [38,41]
Hierarchical (HRL/modular) | ✓ local/implicit state | ✓ multi-level goal decomposition | ∘ episodic (reward/trace) | ✓ layered policies, sub-task execution | ∘ performance-driven replans | ✓ supervisor(s) (top-down) | Decomposable, large programs; parallel teams | Supervisor bottlenecks; debugging across tiers [50,96]
ReAct single-agent (LLM) | ✓ LLM-driven interpretation | ✓ stepwise CoT/ReAct | ∘ STM/LTM as needed | ✓ tool/API invocation | ∘ language-based self-critique | ✓ local orchestration | Fast baselines; scoped assistants | Limited parallelism; tool-selection errors at scale [47]
Hybrid (reactive–deliberative) | ✓ sensor + abstract model | ✓ deliberative planner | ∘ localized recall | ✓ reactive + planned actions | ∘ arbitration triggered | ✓ supervisor arbitration | Real-time ops with long horizon | Consistency across loops; arbitration design [38]
Layered Neuro-Symbolic | ✓ neural perception + inference | ✓ symbolic planner | ✓ memory for rules/episodes | ✓ planner-to-actuator tools | ✓ verification/reflection | ✓ structured orchestration | Open-world planning under uncertainty; public sector | Integration overhead; representation alignment [39,41,95]
✓ = typically required; ∘ = optional.
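As a concrete companion to the ReAct row above (see also [98]), the following Python sketch shows the single-agent thought-action-observation loop in its simplest form. The llm() stub, the action syntax, and the scripted demo are assumptions standing in for a real model and real tools.

```python
# Minimal ReAct-style single-agent loop: alternate model output and tool observations
# until the model emits a final answer or the step budget is exhausted.
from typing import Callable, Dict

def react_loop(question: str,
               llm: Callable[[str], str],
               tools: Dict[str, Callable[[str], str]],
               max_turns: int = 4) -> str:
    trace = f"Question: {question}\n"
    for _ in range(max_turns):
        step = llm(trace)                        # model produces a thought/action line
        trace += step + "\n"
        if step.startswith("Final:"):
            return step.removeprefix("Final:").strip()
        if step.startswith("Action:"):           # e.g. "Action: search | agentic AI"
            name, _, arg = step.removeprefix("Action:").partition("|")
            result = tools.get(name.strip(), lambda a: "unknown tool")(arg.strip())
            trace += f"Observation: {result}\n"  # feed the observation back to the model
    return "no answer within the step budget"

# Tiny scripted demo standing in for a real LLM:
scripted = iter(["Action: search | agentic AI evaluation",
                 "Final: evaluation combines quantitative and qualitative metrics"])
answer = react_loop("How is agentic AI evaluated?",
                    llm=lambda trace: next(scripted),
                    tools={"search": lambda q: f"papers about {q}"})
print(answer)
```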
Table 5. Agentic AI applications and tasks across multiple domains.
Domain | Application | Goal/Task Attempted by Agentic AI | Ref.
Healthcare | Diagnostics and treatment planning | Enhancing healthcare decision support, diagnosis, and personalization. | [59]
Healthcare | Fitness coaching | Personalized, adaptive fitness coaching using multimodal multi-agent digital twin systems. | [60]
Healthcare | Neuromuscular electrodiagnostics | Standardized, AI-assisted interpretation and reporting of neuromuscular EDX tests for electrophysiologists. | [99]
Healthcare | Genomic analysis and dynamic health management | Personalized, real-time healthcare management using agentic AI for prediction, intervention, monitoring, and workflow optimization. | [35]
Healthcare | Ethical oversight in generative AI | Ethical, compliant, and trustworthy deployment of GenAI in healthcare via agentic oversight. | [44]
Healthcare | Computer vision (medical CV) | Autonomous construction and execution of medical image segmentation pipelines via agentic AI. | [100]
Healthcare | Bioethical clinical decision support | Advisory system for ethical clinical decision support in complex healthcare scenarios. | [101]
Healthcare | Clinical decision and drug discovery | Transforming healthcare to enable accurate diagnostics, personalized treatment planning, real-time patient monitoring, workflow automation, and drug discovery. | [56]
Healthcare | Clinical risk assessment | Real-time identification and assessment of DILI risks using LLMs on clinical notes. | [102]
Transportation | Traffic control in smart cities | Enhancing traffic control, safety, and sustainability in urban transportation systems. | [103]
Transportation | Industrial systems: transportation and biomanufacturing | Real-time optimization, control, and personalization in industrial and biomanufacturing systems. | [43]
Transportation | Autonomous vehicle control and safety | Multimodal model predictive control for safe, context-aware autonomous vehicle navigation. | [74]
Transportation | Multi-agent routing and scheduling | Multi-agent routing and scheduling optimization in logistics and guidepath networks. | [61]
Transportation | Human–machine transportation systems | Multi-agent management of transportation systems across lifecycle phases to improve travel outcomes, safety, and system adaptability. | [104]
Software | Automated query and task processing | Autonomously decompose complex queries, select relevant tools, and execute tasks efficiently with real-time feedback. | [105]
Software | BDD test case generation | Automate BDD test case generation from natural language while supporting code mapping and user collaboration. | [106]
Software | Computer vision in healthcare and robotics | Enable autonomous, real-time decision-making and adaptive behavior from visual data for complex perception and interaction tasks. | [107]
Software | Software and task automation in SMMEs | Automate software engineering and business tasks through autonomous, goal-driven multi-agent systems. | [30]
Software | Fraud and sensor drift management | Automate and optimize data pipelines with adaptive, interpretable, and balanced agent-driven learning. | [54]
Software | Personalized search and recommendations | Develop AI agents for personalized, relevant, and fair information retrieval across complex and multimodal queries. | [108]
Software | CV task automation for non-expert users | Automate CV tasks via natural language commands for non-experts. | [109]
Software | Fault-tolerant multi-agent software systems | Design fault-tolerant software systems with autonomous multi-agent control and recovery. | [62]
Software | Automatic programming for end-users | Enable end-user software development from natural requirements to deployment using LLMs. | [110]
Software | SLM-based software and IT task automation | Automate software and IT tasks and coordinate multi-agent workflows using SLMs. | [111]
Software | Personalized and autonomous IR systems | Improve relevance, accuracy, and personalization in IR systems with autonomous task handling. | [112]
Software | Autonomous software development | Automate multi-step software development and optimize workflows, code quality, and maintenance autonomously. | [50,51]
Software | Web and file automation | Execute complex digital tasks with minimal oversight using modular multi-agent systems. | [113]
Software | Human-algorithmic agency modeling | Clarify and theorize agency in human-algorithmic ensembles to inform organizational governance and socio-technical system design. | [114]
Software | Conversational software testing agent | Develop SOCRATEST, an autonomous LLM-based conversational agent for software testing. | [115]
Software | Conversational MLOps | Facilitate human–AI collaborative MLOps using conversational interfaces. | [116]
Software | Agentic SOC automation | Automate SOC operations for proactive detection, mitigation, and response to cyber threats. | [117]
Software | Autonomous IT operations and automation | LLM-driven autonomous IT operations enabling adaptive, collaborative incident detection, prediction, and resolution in real time. | [49,118]
Military and Security | Multi-agent attack-defense coordination | Autonomous swarm decision-making for coordinated attack-defense in military confrontations. | [119]
Military and Security | AI workloads and infrastructure defense | Automated moving-target defense for AI workloads through ephemeral infrastructure rotation in Kubernetes. | [120]
Manufacturing and Industrial | WWTP optimization | Optimize full-scale wastewater treatment plant operation to cut costs and improve water quality under varying conditions. | [63]
Manufacturing and Industrial | Smart manufacturing and robotic assembly | Develop autonomous, multi-agent systems to optimize manufacturing workflows, task sequencing, and adaptive production processes. | [33,121]
Manufacturing and Industrial | Automated supply chain decisions | Enable autonomous decision-making in food supply chains to improve efficiency and safety and reduce waste. | [55]
Manufacturing and Industrial | Cognitive robots for collaboration | Develop autonomous, cognitive robotic agents that can safely cooperate with humans, learn continuously, and plan for future scenarios. | [122]
Manufacturing and Industrial | Autonomic quality management | Achieve self-managing, autonomic manufacturing systems that reduce human effort, ensure compliance, and optimize execution for total quality achievement. | [123]
Manufacturing and Industrial | Human–robot collaboration | Predict 3D human motion to enable safe and efficient human–robot collaboration in construction tasks. | [124]
Manufacturing and Industrial | Multi-agent pickup and delivery | Enable multi-agent pickup and delivery systems to complete tasks efficiently and without deadlocks in complex environments. | [69]
Smart Cities and Energy | Optimize city services and coordination | Autonomously manage, optimize, and adapt urban systems for resilient, equitable, and sustainable cities. | [125]
Smart Cities and Energy | Energy management and fault detection | Optimize building energy use, operations, and maintenance for efficiency, cost reduction, and reliability. | [58,70,126]
Public Administration | Resource and service management | Optimize governance, public services, and policy-making for efficiency, equity, and sustainability. | [37,41]
Tourism and Traveling | Decision optimization in tourism | Optimize tourism management decisions using agentic AI for pricing and personalized services. | [127]
Finance, Banking, and Insurance | Autonomous banking operations | Enhance financial decision-making, risk profiling, and automated banking operations using AI. | [80]
Finance, Banking, and Insurance | Financial modeling and risk management | Automate financial modeling and model risk management workflows using LLMs to ensure compliance, robustness, and accurate decision-making. | [39]
Finance, Banking, and Insurance | AI for BFSI customer support | Assist consumers in BFSI by providing information, recommendations, decision aid, and delegated actions. | [128]
Finance, Banking, and Insurance | Risk and fraud detection | Enhance banking risk management and fraud detection using LLMs, ML, and XAI. | [129]
Finance, Banking, and Insurance | Personalized banking | Provide trustworthy, personalized banking experiences and financial guidance through autonomous agentic AI. | [130]
Finance, Banking, and Insurance | AI-driven automated customer service | Enable AI-driven, autonomous customer service and decision-making to improve efficiency, personalization, and workflow automation in business and insurance contexts. | [96,131]
Multi-Domain | Personalized recommendations and task management | Develop foundation-model-powered agents for autonomous, multi-domain task execution, combining personalized recommendations and general task automation with planning, reasoning, and collaboration. | [32,75,76,132,133,134]
Multi-Domain | Multi-agent systems | Enable autonomous, goal-driven AI agents and multi-agent systems to plan, act, reason, and adapt across multiple domains, completing complex, long-horizon tasks with minimal human intervention. | [31,34,36,47,53,77,135,136]
Multi-Domain | Coding and problem solving | Enable autonomous agents to reason, plan, and solve complex tasks across diverse domains. | [46,137]
Multi-Domain | Cost-efficient plan caching | Enable autonomous LLM agents to perform multi-step reasoning and planning efficiently across domains while reducing computational costs. | [66,138]
Multi-Domain | Radiology and predictive crime analytics | Enable autonomous, agentic AI systems to support decision-making, diagnostics, and automated scheduling in professional and medical domains. | [67]
Multi-Domain | Memory-augmented agent systems | Enable LLM agents to perform multi-domain tasks with memory for reflection, recall, and long-term planning. | [139]
Retail, Business, and E-commerce | Retail and supply chain automation | Automate and personalize retail customer interactions and supply chain operations. | [27,78,140]
Retail, Business, and E-commerce | Digital task automation | Autonomously optimize digital tasks, schedules, and workflows across platforms using agentic AI. | [141,142]
Retail, Business, and E-commerce | Enterprise automation | Automate enterprise decision-making and workflows to enhance productivity and reduce errors. | [68,143,144]
Retail, Business, and E-commerce | Cognitive robots for collaboration | Develop autonomous, cognitive robotic agents that can safely cooperate with humans, learn continuously, and plan for future scenarios. | [122]
Retail, Business, and E-commerce | Customer service and retail support | Optimize AI chatbots for customer service, decision support, and human–agent collaboration in retail. | [145,146]
Retail, Business, and E-commerce | Interactive recommender systems | Deliver interactive, personalized, multi-turn recommendations using LLM-enhanced chatbots. | [79]
Retail, Business, and E-commerce | Automate tasks and coordinate AI agents | Enable AI agents to autonomously reason, plan, and act across enterprise systems while securely automating tasks. | [40]
Education | Personalized educational assistance | Enhance learning, teaching, and educational administration through personalized and adaptive support. | [57,97,147]
Education | Evolution of AI in education | Improve consistency, reliability, and fairness in automated essay grading using multi-agent systems. | [148]
Education | Research and task management | Enhance research, decision-making, and cognitive support through intelligent task execution. | [149]
Scientific Discovery and Research | Automated scientific research | Automate and accelerate scientific discovery across chemistry, materials science, bioinformatics, and molecular biology. | [45,150,151]
Scientific Discovery and Research | LLM-based battery simulation | Integrate LLMs with physical models to autonomously simulate, analyze, and guide advanced battery research. | [64]
Scientific Discovery and Research | LLM-driven materials discovery | Accelerate materials discovery using knowledge-guided LLMs with autonomous labs. | [65]
Scientific Discovery and Research | Science exocortex for research automation | Augment human cognition and automate scientific research using a “science exocortex” of AI agents. | [152]
Scientific Discovery and Research | Automated interviews and data coding | Automate and streamline qualitative interviews and analysis in social and health sciences. | [153]
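A pattern that recurs across these domains is a coordinator that decomposes a goal and delegates sub-tasks to specialized agents. The Python sketch below illustrates that decomposition for a clinical-decision-support style workflow; the TriageAgent, PlanAgent, their heuristics, and the human-review flag are hypothetical placeholders rather than designs from the cited studies.

    # Illustrative decomposition of one application row (clinical decision
    # support) into specialized agents under a coordinator. All agent names
    # and rules are hypothetical stand-ins for LLM/ML components.
    from typing import Dict, List

    class TriageAgent:
        def run(self, note: str) -> Dict[str, str]:
            # Toy heuristic standing in for an LLM or ML classifier.
            urgency = "high" if "chest pain" in note.lower() else "routine"
            return {"urgency": urgency}

    class PlanAgent:
        def run(self, note: str, triage: Dict[str, str]) -> List[str]:
            steps = ["order baseline labs"]
            if triage["urgency"] == "high":
                steps.insert(0, "escalate to on-call clinician")
            return steps

    class Coordinator:
        """Sequences the specialized agents and assembles the recommendation."""
        def __init__(self) -> None:
            self.triage = TriageAgent()
            self.planner = PlanAgent()

        def handle(self, note: str) -> Dict[str, object]:
            triage = self.triage.run(note)
            plan = self.planner.run(note, triage)
            # Keep a human in the loop for high-stakes domains.
            return {"triage": triage, "plan": plan, "requires_human_review": True}

    print(Coordinator().handle("Patient reports chest pain and shortness of breath."))

The same coordinator-plus-specialists shape applies, with different agents, to rows such as supply-chain decisions or SOC automation.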
Table 6. Agentic AI models for various input–output transformations.
Input | Output | Task Performed | Technology | Ref.
Text | Actions | Process user prompts to run tasks and coordinate the agents | LLMs, MCP servers | [120]
Text | Actions | Process user queries and handle multi-step workflows | RAG systems, Vectara-agentic | [27]
Text | Actions | Manage live network operations and deliver network services | Autonomous agents, orchestration APIs | [82]
Text | Actions | Automatically generate efficient robotic assembly workflows | MAPE-K, ML, knowledge base | [121]
Text | Actions | Design and implement fault-tolerant software | Multi-agent AI and concurrent SE techniques | [62]
Text | Actions | Enable direct network control as per telecom control | Decision-making module | [71]
Text | Actions | Coordinate multiple home appliances for energy saving | MAS enhanced by LLMs | [38]
Text | Actions | Decide the next action via self-reflection or knowledge use and generate signals | LLMs, KnowSelf, DeepSpeed, vLLM | [144]
Text | Actions | Execute arbitrary computer tasks | LLMs, Adept ACT-1, AutoGPT, OpenAI’s tools | [34]
Text | Actions | Critically analyze and refine previous outputs | LLMs, SELF-REFINE | [50]
Text | Actions | Learn from feedback and self-improve dynamically | BrainBody-LLM, RASC, REVECA, AIFP | [75]
Text | Actions | Resolve ambiguity and adapt to user preferences | LAM, neuro-symbolic program, LLM | [154]
Text | Text | Parse user intent, then map, retrieve, and combine data to respond | Router Agent (LLM), domain APIs | [77]
Text | Text | Understand and respond to inquiries quickly | AI agents, LLM workflows | [41]
Text | Text | Learn and share expertise with other agents | LLM, AI agent with locally privacy-preserving learning | [143]
Text | Text | Understand, search, compute, and combine results | LLaMA-3-70B, Bee agent (ReAct) | [52]
Text | Text | Solve complex problems through iterative reasoning | GPT-4, GPT-3.5, Claude, self-refinement, CoT | [137]
Text | Text | Plan and handle multi-step support for customer interactions | GPT-3.5, service robots | [140]
Text | Text | Assist software development by completing code | LLM | [155]
Text | Text | Generate and refine novel research ideas and hypotheses | LLMs, reinforcement learning, RAG | [150]
Text | Text | Understand, plan, and correct errors in code | GPT-4, fine-tuned LLaMA 2, ItemCF, SASRec | [110]
Text | Text | Plan and generate executable synthesis plans | GPT-4, 18 tools, RoboRXN | [65]
Text | Text | Split large goals into step-by-step plans and achieve multiple targets | Autonomous HVAC agents | [37]
Text | Text/JSON scripts | Break down, assign, execute, and monitor tasks | LLM orchestrators, FAISS | [105]
Text | JSON scripts | Automate research by assigning tasks to AI agents | RAG, LLM, citation traversal | [151]
Text | Text/Charts | Battery material analysis and SEI growth prediction | AgentGPT, DFT, VASP, APIs, Cloud, SEI models | [64]
Text | YAML config file/Image | Plan steps, generate YAML, verify, self-correct, and execute | LLM, SimpleMind | [100]
Text | Numerical data | Extract, plan, and refine financial data | GPT-4o-mini, LLaMA-3.2-8B | [138]
Text | Text/Numerical results | Clinical decision support for epilepsy detection | MAS, agent abstraction, ML | [59]
Text | Logs/Audit events | Workflow execution and data retrieval | LLM, RAG, LangChain, LangFlow, AutoGen, CrewAI | [40]
Audio | Text | Review data, support ethical decision-making, and give advice | Agentic AI, LLMs, fairness tools | [101]
Audio | Text/Audio | Goal parsing and context-aware feedback generation | ASR, STM, LTM, GPT-4 | [60]
Audio | Audio | Conduct structured, empathetic interviews | GPT-4o mini, Whisper3-turbo, Llama3.2 | [153]
Real-time data | Action | Analyze sensor data to detect issues and optimize operations | IIoT, ML, DT, Cloud | [131]
Real-time data | Action | Adjust building energy use to support the power grid | MADRL, AC, AM, DNN, CityLearn | [70]
Real-time data | Action | Multi-agent navigation and collision avoidance | PIBT+, priority decentralized control | [72]
Real-time data | Action | Monitor, analyze, plan, and optimize to enhance system performance | MAPE-K loop, sensors, machine learning | [123]
Real-time data | Action | Smart nano-grid energy management | Fuzzy logic control, VSI, SAF | [73]
Real-time data | Action | Automated threat detection, response, and system optimization | Agentic AI, OOAD, CIF, DA | [156]
Real-time data | Action | Adjust plans in response to disruptions | Agentic AI with RL + MAS | [30]
Real-time data | Action | Learn through trial and error to improve strategies | RL, Q-learning, DQN, PPO | [36]
Real-time data | Action | Goal setting, strategic adaptation, and decision-making | Urban sensing, LLMs | [125]
Real-time data | Text/Dataset | Suggest better routes to reduce traffic congestion and manage flow | ACP, ABT, AI/ML, DS, AI, VA, CB | [103]
Real-time coordinates | Actions | Autonomous task completion and movement decision-making | Swarm algorithms, stepwise decisions | [119]
Datasets | Text | Monitor and provide feedback on employee performance | AI, NLP, SML, DL, MA | [146]
Datasets | Text | Generate e-commerce product descriptions and update and manage catalogs | Transformer model, PyTorch | [142]
Datasets | Text | Learn and adjust prices to increase profits | Reinforcement learning (RL), Q-learning | [157]
Datasets | Text | Provide coaching and insights for team members | CoachBot, BRiN, Amanda | [145]
Datasets | Actions | Real-time feedback-based adaptation | MPA, A-Core integration | [66]
Datasets | Actions | Analyze datasets, detect threats, and autonomously mitigate them | LLMs, agentic AI, AI agents, networks of specialized models | [117]
Datasets | Actions | Adaptive task performance | LLMs, RLHF, RAG | [152]
Datasets | Actions | Automated trading and investment | Agentic AI, neural nets, trading algorithms | [80]
Datasets | Actions | Optimize processes and identify opportunities | Agentic AI, agentic platform, predictive analytics, adaptive reasoning | [158]
Datasets | Actions | Make quick decisions to adjust plant controls and improve results | MARL and G2ANet | [63]
Datasets | Dataset | Predictive policing and analysis | CAS, ML algorithms | [67]
Datasets | Dataset | Extract datasets and split them into training/testing sets and subsamples | Code execution tool + GPT-3.5 Turbo | [39]
Datasets | Dataset | Data analysis and pricing strategy formulation | Explainable AI (XAI), rule-based logic | [127]
Datasets | Alerts | Monitor data, detect risks, and notify | Autonomous AI system with LLMs | [102]
Datasets | Alerts | Fraud detection and risk management | AI, ML, LLMs, NLP, GNN | [129]
Datasets | Images/Visuals | Communicate complex information visually | Kubernetes bots, OSSD bots, Gatekeeper | [32]
Datasets | Dataset/Text | Analyze and predict health outcomes | Agentic AI, LLM | [35]
Text/Images | Text | Review policies, verify claims, craft responses, and coordinate agents | Multi-agent AI, pretrained LMs, RAG | [96]
Text/Images | Text | Diagnose issues, generate treatment plans, and support healthcare processes | LLMs, DL, ML, multimodal AI | [56]
Text/Dataset | Text | Create, modify, and guide BDD test steps with user feedback | LLMs, NLP, RAG, ReAct agent, BDD frameworks, Streamlit UI | [106]
Text/Dataset | Text/Test code | Autonomous software testing; plan and execute tests | MW, LLMs, MA, LC, ACM, NN | [115]
Text/Dataset | Text/JSONL data | Generate personalized, contextual, and synthetic data | LLaMA 3.2 3B, LoR, JSONL | [97]
Text/Dataset | Text/Actions | User intent retrieval and action mapping | LLMs, RAG, RL, knowledge graphs | [159]
Text/Images/Dataset | Text/Code/Logs | Problem-solving, adaptive execution, and information gathering | Agentic AI/LLM-based assistant | [46]
Text/Images/Dataset | Text/Audio/Actions | Personalized service, action monitoring, and fault detection | Agentic AI, NLP, ML, DL, RL | [160]
Images/Video/Text | Actions | Predict and align with human intent to adapt robot behavior | LSTM-based deep learning | [124]
Images/Video | Coordinates/Image | Object detection, classification, and result refinement | YOLO, ResNet, LLMs, RAG, VGG16 | [109]
Video/Audio | Actions | Perception and reasoning about complex situations | MM pipelines, adaptive mechanisms; spatial and temporal reasoning | [95]
Text/Images/Audio | Actions | Adaptive decision-making and optimized task execution | LLMs, VLMs, RL, embodied AI | [135]
Text/Images/Audio/Video | Actions | Strategy refinement and optimization | Self-supervised learning and RL | [33]
Image/Real-time data | Actions | Real-time prediction and adjustment of navigation | Agentic AI, DRL, computer vision | [107]
Text/Audio | Actions | Plan and execute complex tasks such as booking hotel rooms | GPT-4, APIs, web automation | [141]
Text/Audio/Image | Actions | Understand and fulfill user needs through NLP interaction | CLIP, multimodal foundation models | [132]
Text/Images/Audio/Graphs/Charts | Graphs/Charts | Recommend products, provide feedback, and manage tasks | Agentic knowledge graphs, ML | [134]
Text/Image/Audio | Text/Image/Audio | Customer query and feedback handling | Transformer models, PT, CUDA | [42]
Text/Video/Audio | Text/Visual analytics | Real-time adaptation and intelligent decision-making | RL, NLU, contextual reasoning | [57]
Graph/Code/Text | Numeric data | Predict and rank workflow performance without execution | GNNs, MLPs, T5/BERT (MiniLM) | [81]
Numeric + Text | Word document | Compile FDD results and analyses into a comprehensive report | AI agent, Code Interpreter, LangSmith | [126]
Spatial coordinates | Actions | Plan collision-free moves to complete tasks | Decentralized control algorithms, PI, BT, PIBTTP, PIBTTP-TA | [69]
Numeric | Actions | Adjust crystallization residence time and optimize emissions | Data denoising, SCADA integration | [43]
Alert | Actions/Alerts | Analyze alerts, mitigate risks, and manage Tier 1/2 responses | Agentic AI with generative AI | [161]
Code files (e.g., .cs) | Code files | Modify code and generate git diffs, logs, and summaries | Codex (OpenAI), git diff, sandbox | [51]
CSV file | ML model (.pt, .pkl) | Generate and debug energy modeling code | gpt-4o on LangGraph (ReAct), PythonREPL | [58]
PDFs | Dataset | Extract and organize clinical data | AANEM references, LLM (Gemini 1.5 API) | [99]
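Many rows in this table follow the same dispatch pattern: an orchestrator inspects the input and the desired output modality and routes the payload to a specialized agent or toolchain. The sketch below shows one way to express that routing in Python; the modality keys, the AgentSpec structure, and the toy handlers are assumptions for illustration, not interfaces from the cited systems.

    # Illustrative routing of tasks by (input, output) modality, loosely
    # mirroring the transformations catalogued in Table 6. All names and
    # handlers are hypothetical; real systems would plug in LLM, vision,
    # or control backends.
    from dataclasses import dataclass
    from typing import Callable, Dict, Tuple

    @dataclass
    class AgentSpec:
        task: str                       # what the agent is responsible for
        handler: Callable[[str], str]   # input payload -> output payload

    # (input modality, output modality) -> specialized agent
    REGISTRY: Dict[Tuple[str, str], AgentSpec] = {
        ("text", "actions"): AgentSpec("execute multi-step workflow",
                                       lambda x: f"executed steps for: {x}"),
        ("text", "text"): AgentSpec("answer user inquiry",
                                    lambda x: f"answer to: {x}"),
        ("csv", "ml_model"): AgentSpec("generate and debug modeling code",
                                       lambda x: f"trained model from {x}"),
    }

    def dispatch(input_modality: str, output_modality: str, payload: str) -> str:
        spec = REGISTRY.get((input_modality, output_modality))
        if spec is None:
            raise KeyError(f"no agent registered for {input_modality} -> {output_modality}")
        return spec.handler(payload)

    print(dispatch("text", "actions", "book a hotel in Chicago"))

Keeping the modality mapping explicit, as in this registry, is one simple way to make an agentic system's supported input–output transformations inspectable and testable.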