1. Introduction
Sea transport forms the backbone of global trade and accounts for about 90 percent of total trade volume. Despite its indispensable role in the economy, the sector urgently needs to reduce its environmental footprint. The shipping industry contributes a significant share of global greenhouse gas emissions, estimated at approximately 2.5 percent of total global emissions [1]. Today, industrial standards require that economic prosperity be coupled with environmental sustainability. The industry is increasingly focusing on pollution mitigation measures and energy efficiency [2].
The sustainability of supply chains has taken on new meaning. It now relies on the triple bottom line model to balance social and environmental obligations against economic goals [3]. The need to invest in green options and to innovate technologically has driven growth in research, particularly on smart ships and sustainable port infrastructure [4]. Such complex problems demand that the maritime industry become more digitised and automated. Intelligent agents are becoming increasingly pivotal within the broader context of digital transformation in logistics. In general, an agent can be described as an entity that senses its environment through sensors and acts on it through actuators [5]. Wooldridge and Jennings [6] define agents in terms of autonomy, social capability, reactivity and proactiveness.
The focus is gradually shifting to multi-agent systems, in which agents collaborate to resolve complex problems arising from the distributed and dynamic character of global supply chains. To analyse large datasets in real time in a decentralized environment, agents use AI techniques such as machine learning, natural language processing and reinforcement learning. Through these techniques, intelligent agents can extract value from unstructured data and act in transformative ways. This capability to convert heterogeneous data into homogeneous, machine-readable formats enables a systems-level transformation from reactive strategies toward proactive and anticipatory planning.
This paper presents a detailed discussion of intelligent agent applications in maritime logistics. It demonstrates the transformative impact of intelligent-agent-enabled autonomy on maritime challenges by bridging the gap between theoretical agent models and practical operational realities. To this end, a semi-systematic review approach is adopted. Literature was sourced from major academic databases (Scopus, Web of Science, IEEE Xplore) using the keywords “intelligent agents”, “multi-agent systems”, “maritime logistics”, and “autonomous shipping”, with a focus on advancements from 2015 to 2025. The distinctive contribution of this review is to answer three research questions: (i) How can various architectures of intelligent agents be operationalized in maritime contexts? (ii) To what extent do current agent applications generate measurable sustainability outcomes (e.g., CO2 reduction)? and (iii) What is the deployment maturity gap between simulated models and real-world operations? By answering these questions, this paper goes beyond descriptive analysis to provide an actionable research agenda.
The remaining part of the paper is organized as follows. Section 2 presents an intelligent agent architecture taxonomy that is specific to the maritime industry, consisting of reactive, deliberative, hybrid, and multi-agent systems. Section 3 examines the real-world application of these architectures. It is organized into two main categories: ship-centred solutions and integrated port applications, concluding with a comparative synthesis of existing research trends. Section 4 provides a critical assessment of the technical and regulatory obstacles to wide adoption, including the reality gap in simulation-based training, the black-box characteristics of deep-learning models, and weaknesses of decentralized networks, and suggests a future research agenda necessary to pave the way toward robust autonomy. Finally, Section 5 concludes the study.
2. Classification of Architectures of Intelligent Agents in Maritime Logistics
The development of a functional intelligent agent within a logistics environment primarily depends on the selection of an appropriate architecture that defines how perception, memory, learning, planning, and action are organized and coordinated. The implementation of agent functionalities in a maritime logistics context can be examined through the following three key architectures: (i) reactive agents; (ii) deliberative agents; and (iii) hybrid agents.
Reactive agents operate as low-level, real-time entities in which sensors are directly mapped to actuators via condition-action rules [5]. In contrast to deliberative architectures, reactive agents have neither an internal symbolic representation of the world nor a memory of previous states, which reduces computational latency. They generate immediate reflex actions such as obstacle avoidance or emergency braking, and are indispensable for safety-critical functions in which the system must react to dynamic environmental events much faster than a complex planning algorithm would allow. They typically form the foundational control layer, serving as a fail-safe to ensure vessel safety even if higher-level modules falter [7].
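The condition-action mapping described above can be illustrated with a minimal sketch. The sensor fields, thresholds and action names are hypothetical and chosen only for illustration; they are not taken from any of the reviewed systems.

```python
from dataclasses import dataclass


@dataclass
class SensorReading:
    obstacle_range_m: float  # distance to the nearest detected obstacle
    roll_deg: float          # current roll angle of the vessel


# Hypothetical thresholds, assumed for this example only.
MIN_SAFE_RANGE_M = 500.0
MAX_SAFE_ROLL_DEG = 25.0


def reactive_action(reading: SensorReading) -> str:
    """Map the current percept directly to an action.

    The function keeps no memory and no world model: the same reading
    always yields the same action, which is what keeps latency low.
    """
    if reading.obstacle_range_m < MIN_SAFE_RANGE_M:
        return "evasive_turn"
    if abs(reading.roll_deg) > MAX_SAFE_ROLL_DEG:
        return "reduce_speed"
    return "hold_course"
```

Because the rules are evaluated in a fixed order, the collision rule takes priority over the stability rule in this sketch; a real system would need a justified prioritisation.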
Deliberative (cognitive) agents: unlike reactive agents, deliberative agents are based on an internal symbolic view of the environment. They use abstract, symbolic or numerical planning and state estimation to make decisions on medium to long-term horizons [8]. By maintaining an explicit representation of the environment and their goals, they enable complex reasoning, such as strategic voyage planning or fleet scheduling, but at the cost of higher computational latency. Contemporary deliberative architectures increasingly incorporate a learning element to address the inefficiencies of fixed symbolic models. Whereas conventional systems are based upon fixed, pre-programmed rules, modern deliberative agents often use data-driven methods to continuously improve their internal models of the environment. For example, in model-based reinforcement learning, the agent learns the transition dynamics of the system explicitly from experience and uses this learned model to plan optimal trajectories. It is important to note an operational nuance in modern learning systems: algorithms such as deep reinforcement learning (DRL) often blur strict architectural boundaries. While the training phase is based on a deliberative assessment of long-term rewards and complex state spaces, the resulting deployed policy often acts reactively at runtime (e.g., immediate rudder adjustments), effectively behaving like a reactive agent during operations.
Hybrid deliberative/reactive architectures: This design paradigm arises from the need to balance between quick reflexive reactions and long-term strategic planning in complex real-world systems [9]. These architectures divide intelligence into a hierarchy of layers which operate at different temporal scales and levels of abstraction:
- Reactive layer: handles high-frequency, safety-critical execution. It deals with immediate sensor-actuator mappings for tasks such as avoiding obstacles and controlling stability.
- Deliberative layer: reasons and plans at a complex level based on a medium-term horizon that includes path planning, vessel state estimation, and compliance with navigational rules.
- Meta-cognitive (strategic) layer: oversees long-term mission goals and global optimization, dynamically adjusting strategies to manage trade-offs between competing objectives like fuel efficiency and time-of-arrival.
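The layering can be sketched as a single control loop in which each layer runs at its own rate and the reactive layer may override the plan. This is an illustrative sketch only: the update periods, speeds, the 500 m safety range and the 30-degree evasive offset are assumed values, not parameters from the reviewed literature.

```python
from typing import Optional, Tuple

# Assumed update periods, in control ticks (illustrative values).
STRATEGIC_PERIOD = 100
DELIBERATIVE_PERIOD = 10


class HybridAgent:
    def __init__(self) -> None:
        self.target_speed_kn = 14.0
        self.planned_heading_deg = 0.0

    def strategic_layer(self, hours_behind_schedule: float) -> None:
        # Long horizon: trade fuel efficiency against time of arrival.
        self.target_speed_kn = 16.0 if hours_behind_schedule > 0 else 12.0

    def deliberative_layer(self, bearing_to_waypoint_deg: float) -> None:
        # Medium horizon: replan the heading toward the next waypoint.
        self.planned_heading_deg = bearing_to_waypoint_deg

    def reactive_layer(self, obstacle_range_m: float) -> Optional[float]:
        # Short horizon: return an override heading if an obstacle is close.
        if obstacle_range_m < 500.0:
            return (self.planned_heading_deg + 30.0) % 360.0
        return None

    def step(
        self,
        tick: int,
        hours_behind_schedule: float,
        bearing_to_waypoint_deg: float,
        obstacle_range_m: float,
    ) -> Tuple[float, float]:
        """Run one control tick and return (heading_deg, speed_kn)."""
        if tick % STRATEGIC_PERIOD == 0:
            self.strategic_layer(hours_behind_schedule)
        if tick % DELIBERATIVE_PERIOD == 0:
            self.deliberative_layer(bearing_to_waypoint_deg)
        override = self.reactive_layer(obstacle_range_m)  # runs every tick
        heading = override if override is not None else self.planned_heading_deg
        return heading, self.target_speed_kn
```

The slower layers only set targets, while the reactive layer has the final say on the commanded heading at every tick, which mirrors the fail-safe role described above.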
While the previously discussed architectures focus on the internal reasoning of individual entities, a comprehensive overview of maritime logistics requires expanding this scope to multi-agent systems. Multi-agent systems have become a foundation of the digital transformation of maritime logistics, especially in the context of smart ports. The multi-agent system paradigm extends the concept of intelligent agents to orchestrate complex tasks through a network of decentralized, interacting agents [10]. Unlike centralized systems, multi-agent systems rely on the collaboration of autonomous agents, whether reactive, deliberative, or hybrid, to solve problems that are beyond the capabilities or knowledge of any individual agent. Because they are decentralized, multi-agent systems avoid a single point of failure, which makes the overall system more robust.
The characteristics of the above architectures and their main features are summarized in Table 1. This classification highlights that no single architecture is universally optimal; rather, the choice of a specific model must align with the operational constraints and performance goals of the proposed maritime use case.
3. Applications of Intelligent Agents in Maritime Logistics
Building upon the architectural classifications defined in the previous section, this section analyses the practical deployment of intelligent agents across the maritime logistics spectrum. For analytical clarity, the applications are divided into two separate but closely interrelated areas: ship-centric applications, which are mainly concerned with vessel autonomy, navigation and onboard systems; and port and supply chain applications, which include land-side logistics networks, terminal control, and intermodal coordination.
3.1. Ship-Centric Applications (Seaside)
The application of intelligent agents within the ship-centric domain is inextricably linked to the development of maritime autonomous surface ships (MASS). As the industry transitions from manned to fully autonomous operations, intelligent agents serve as the core technological enabler for automating the critical functions of guidance, navigation, and control [11]. Here, the agents are not only a support tool but are also being granted authority to make decisions on vessel stability, optimal routing, and collision avoidance. This section categorizes ship-centric agent applications into navigation safety, energy efficiency, and predictive maintenance, demonstrating how the reactive and deliberative architectures reviewed in Section 2 are implemented in real-world maritime environments.
3.1.1. Autonomous Navigation and Collision Avoidance
Autonomous navigation is the basic capability of autonomous ships. Achieving it requires combining multi-sensor data fusion (radar, light detection and ranging (LiDAR), automatic identification system (AIS)) with complex decision-making algorithms. In agent-based architectures, this functionality is usually decomposed into two hierarchical levels: global path planning and local collision avoidance.
Global path planning implies the use of deliberative agents that calculate the optimal route using the static charts and weather forecasts prior to the start of the voyage. This is insufficient because the dynamic nature of the sea often necessitates changing plans and taking corrective actions. Consequently, the system requires reactive agents capable of local collision avoidance by detecting dynamic obstacles and executing immediate manoeuvres to prevent accidents. The most important challenge in this area is ensuring that the international regulations for preventing collisions at sea (COLREGs) are strictly followed in the decisions made by agents.
Figure 1 represents this hierarchical decision-making process. The architecture illustrates the hybrid integration of a deliberative global planner (for economic route optimization) and a reactive local planner (for real-time collision avoidance), underpinned by multi-sensor data fusion and COLREGs compliance modules. Beginning with the perception layer, the raw data provided by heterogeneous sensors (radar, LiDAR, AIS, camera) is aggregated by sensor fusion algorithms to generate a unified state estimate of the environment. This world model feeds two distinct cognitive loops:
- the global path planner (deliberative layer), which makes use of static charts and weather forecasts to compute the most energy-efficient long-term trajectory;
- the local planner (reactive layer), which works at high frequency to identify dynamic obstacles. It modifies the global trajectory in real time to execute collision avoidance manoeuvres, strictly constrained by a logic block dedicated to COLREGs compliance.
Finally, the computed control actions (rudder angle, engine rpm) are sent to the vessel’s actuators, completing the control loop.
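As a rough illustration of the perception step, the sketch below fuses several range estimates of the same target into one estimate using inverse-variance weighting. The source does not specify which fusion algorithm is used in Figure 1, so this choice, the function name and the example variances are assumptions.

```python
from typing import List, Tuple


def fuse_estimates(measurements: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse independent (value, variance) pairs into one estimate.

    Each measurement is weighted by the inverse of its variance, so more
    precise sensors contribute more. Returns (fused_value, fused_variance).
    """
    weights = [1.0 / variance for _, variance in measurements]
    total_weight = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, measurements)) / total_weight
    return fused_value, 1.0 / total_weight


# Example: radar and LiDAR report the range to the same obstacle (metres).
radar = (1000.0, 25.0)   # noisier sensor (assumed variance)
lidar = (990.0, 4.0)     # more precise sensor (assumed variance)
fused_range, fused_var = fuse_estimates([radar, lidar])
```

The fused variance is always smaller than the smallest input variance, which is the formal sense in which combining heterogeneous sensors yields a better state estimate than any single sensor.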
In the domain of deep reinforcement learning (DRL), agents are frequently trained to perform reflexive behaviour based on learned policies. Meyer et al. examined the use of the proximal policy optimization algorithm for the continuous control of autonomous vessels. In their approach, the agent learns an optimal guidance policy through trial and error, enabling a direct reaction to sensor data to ensure compliance with COLREGs [12]. Likewise, Pan et al. proposed an improved deep Q-network system that employs duelling networks [13]. This architecture enables the agent to quickly evaluate state values and select manoeuvres, striking a good balance between safety and efficiency in navigation. Furthermore, Gan et al. developed an actor-critic model that integrates intrinsic and extrinsic rewards [14]. Although the training process involves evaluation, the execution component of the agent operates reactively to address the challenge of sparse feedback in complex inland waterways.
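For readers unfamiliar with duelling networks, the sketch below shows the standard aggregation step that combines a state value with per-action advantages. It illustrates the general duelling architecture, not the specific network of Pan et al.; the manoeuvre names and numbers are hypothetical.

```python
from typing import List


def dueling_q_values(state_value: float, advantages: List[float]) -> List[float]:
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage form:

        Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')

    Subtracting the mean makes the decomposition identifiable.
    """
    mean_advantage = sum(advantages) / len(advantages)
    return [state_value + a - mean_advantage for a in advantages]


# Hypothetical manoeuvres and network outputs for one state.
manoeuvres = ["hold_course", "turn_starboard", "turn_port"]
q_values = dueling_q_values(state_value=10.0, advantages=[1.0, 3.0, 2.0])
best_manoeuvre = manoeuvres[q_values.index(max(q_values))]
```

Separating the state value from the advantages lets the network learn how good a situation is independently of which manoeuvre is chosen, which is the property the paragraph above refers to as quickly evaluating state values.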
Wang et al. provide a prominent example of the deliberative approach, applying game theory to route negotiation among ships [15]. In this model, agents not only respond to immediate positioning but also strategically share navigation intentions and negotiate manoeuvres iteratively to reach an optimal solution that satisfies the interests of all parties. Additionally, Lazarowska and Zak used a deterministic trajectory base algorithm in which the agent searches a database of candidate paths to find an optimal safe path. Such a process requires calculating and analysing future conditions (trajectories) before any action is performed [16].
Moon et al. developed the intelligent autonomous ship framework, which explicitly employs a hybrid agent architecture [17]. The system has functional layers in which the global path planner (deliberative layer) plans the route using economic factors, whereas the local path planner and path track commander (reactive layers) enforce collision avoidance and manoeuvres in real time. Similarly, Xiao et al. proposed the marine multi-agent framework (MarineMAS), which integrates planning, situational awareness, and decision-making through a hierarchy of agents [18]. Their work emphasizes the importance of knowledge transfer and coordination between system components for achieving systemic intelligence.
Wang et al. proposed a collaborative method based on multi-agent deep reinforcement learning with a deep recurrent Q-network, in which agents learn to cooperate by combining their local action values into a joint value [19]. Malviya and Rajendran used the multi-agent proximal policy optimization (MAPPO) algorithm to implement decentralized control [20]. In this context, all agents make decisions independently but learn a policy that treats other agents as moving obstacles.
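The idea of combining local action values into a joint value can be sketched as follows. The source does not state how the combination is performed, so the additive form used here (as in value-decomposition methods) is an assumption, and the vessels, actions and values are invented for illustration.

```python
from typing import Dict, List

# One local Q-table per vessel for the current observation (hypothetical values).
LocalQ = Dict[str, float]


def joint_value(local_qs: List[LocalQ], joint_action: List[str]) -> float:
    """Assumed additive decomposition: Q_joint = sum_i Q_i(o_i, a_i)."""
    return sum(q[action] for q, action in zip(local_qs, joint_action))


def greedy_joint_action(local_qs: List[LocalQ]) -> List[str]:
    """With an additive joint value, each agent can maximise locally."""
    return [max(q, key=q.get) for q in local_qs]


vessel_a = {"hold": 0.2, "starboard": 0.9, "port": 0.1}
vessel_b = {"hold": 0.7, "starboard": 0.4, "port": 0.3}
best = greedy_joint_action([vessel_a, vessel_b])
```

Under this assumption the joint maximisation decomposes into independent local choices, which is what allows training on a shared objective while each vessel still acts on its own observations.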
To ensure compliance with maritime regulations in a multi-agent setting, Wei and Kuo designed an algorithm that employs the centralized training with decentralized execution (CTDE) model [21]. This algorithm enables agents to learn cooperative strategies that align with COLREGs. Furthermore, Niu et al. introduced a data-driven approach utilizing real AIS data within a multi-agent deep reinforcement learning framework [22]. Their model explicitly accounts for both coordinated and uncoordinated vessel actions to enhance the robustness of the system in real-world scenarios.
3.1.2. Energy Efficiency
With the maritime industry facing mounting pressure to decarbonize, intelligent agents have become key to optimizing vessel operation so as to minimize fuel usage and greenhouse gas emissions. The primary application in this domain is intelligent weather routing. Whereas collision avoidance is based on reactive capabilities, energy efficiency requires advanced deliberative agents that can plan their strategies over a long horizon. Unlike traditional static route planning, agent-based systems continuously monitor dynamic meteorological conditions (wind, waves, currents) and hydrodynamic ship models. By processing such stochastic data, the agent is able to generate optimal trajectories that trade off travel time against fuel consumption. It has been demonstrated that using AI-based agents for such dynamic route optimization may decrease the environmental impact of shipping by a significant margin compared with traditional navigation [1].
In addition to route selection, agent-based operational profile optimization methods are also used [4]. These agents serve as supervisory controllers of the ship's propulsion system, dynamically adjusting engine speed and trim to match environmental resistance. This capability enables the implementation of just-in-time arrival strategies. Specifically, the agent interacts with port-side agents to regulate the speed, thereby preventing unnecessary anchoring and waiting time at the destination. Wu et al. proposed a generic agent-based energy management strategy for hybrid-electric propulsion systems (fuel cell and battery) [23]. In contrast to earlier methods, which used a discrete action space, this agent has a continuous action space and can therefore precisely control the power split between energy sources. The agent is a deliberative agent that is trained to reduce voyage costs and power source degradation by monitoring the state of charge and load profiles.
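The just-in-time arrival logic can be made concrete with a small worked example. The sketch assumes that fuel consumption per hour scales with the cube of speed (a common first-order approximation, not a result from the reviewed papers), so fuel per nautical mile scales with the square of speed. Function names, speed limits and voyage figures are hypothetical.

```python
def jit_speed_kn(distance_nm: float, hours_until_berth: float,
                 min_kn: float, max_kn: float) -> float:
    """Speed that makes the vessel arrive when the berth becomes free,
    clamped to the vessel's feasible speed range."""
    required = distance_nm / hours_until_berth
    return max(min_kn, min(max_kn, required))


def relative_fuel(speed_kn: float, reference_kn: float) -> float:
    """Fuel for a fixed distance relative to sailing at reference_kn,
    under the assumed cubic speed-power law (fuel per mile ~ speed**2)."""
    return (speed_kn / reference_kn) ** 2


# The port-side agent reports that the berth is free in 20 hours;
# the vessel is 240 nm away and would otherwise sail at 16 knots.
speed = jit_speed_kn(distance_nm=240.0, hours_until_berth=20.0,
                     min_kn=8.0, max_kn=18.0)
fuel_ratio = relative_fuel(speed, reference_kn=16.0)
```

In this example the agent slows to 12 knots and, under the stated assumption, uses about 56 percent of the fuel it would burn at 16 knots, instead of arriving five hours early and waiting at anchor.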
Focusing on specific environmental pollutants, Abdalla et al. investigated deep reinforcement learning agents for reducing methane slip in hybrid liquefied natural gas (LNG) vessels [24]. Their agents function as cognitive controllers that optimize the engine load to avoid inefficient operating zones where methane slip is highest. Results indicated that the agent outperformed traditional peak-shaving strategies in reducing emissions.
Alshareef and Alghanmi proposed an agent that optimizes operational parameters (e.g., engine power and speed) for different types of vessels, including those that use alternative fuels such as bio-liquefied natural gas and hydrogen [25]. This agent demonstrates high-level deliberative behaviour because it balances fuel efficiency against strict regulatory limits and achieves high levels of carbon intensity reduction in various operating conditions.
Multi-agent systems are required for onboard microgrids and ship-port interfaces because of the complexity of controlling various energy sources and consumers. In this case, the system intelligence is shared across several components (generators, batteries, loads), each acting as an individual agent that negotiates with the others to reach a global energy balance. Tang et al. proposed a power and energy management system based on a multi-agent system [26]. Such a decentralized system is more robust and integrates more easily with renewable energy sources. Furthermore, it ensures system stability even in the event of individual agent failure.
To balance competing needs and facilitate long-term voyage optimization, hybrid agent architectures are frequently employed. As demonstrated by the Intelligent Autonomous Ship Framework (IASF) [17], hybrid agent systems enable clear functional separation: the deliberative layer acts as an energy manager, generating fuel-efficient routes based on weather predictions, while the reactive layer handles immediate manoeuvring.
3.1.3. Predictive Maintenance of Ship Systems
The third pillar of ship-based agent application deals with the reliability and availability of onboard machinery through predictive maintenance. Conventionally, maintenance in the maritime sector has been done on a predetermined (preventive) or reactive (corrective) basis. Intelligent agents promote a paradigm shift to condition-based maintenance (CBM) and prognostics and health management (PHM) by constantly processing sensor data streams from engines, propulsion systems, and auxiliary equipment to predict equipment degradation before it leads to functional failure [27]. The agents in this field are categorized according to their reasoning horizon (immediate anomaly detection and long-term prognosis) and the complexity of their architecture.
At the operational level, deliberative agents based on deep learning methods play the role of active protectors of vessel health. These agents are data-driven rather than simple rule-based systems. Instead of relying on fixed thresholds, they establish dynamic baselines of normal operational behaviour.
Ellefsen et al. emphasize the use of deep learning agents, including those based on autoencoders and restricted Boltzmann machines, to build data-driven health indicators [28]. These agents sense high-dimensional sensor data, such as vibration, temperature, and pressure, and respond to anomalies that indicate impending faults. They work well even in the harsh and fluctuating maritime environment, where labelled failure data are limited. Similarly, Durlik et al. explain how real-time AI agents apply clustering algorithms and statistical thresholding to monitor propulsion systems continuously [29]. These agents operate proactively and identify micro-anomalies, e.g., a gradual rise in engine vibration, which are precursors of bearing failures, and thus prescribe maintenance actions well before emergency repairs are required.
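A dynamic baseline with statistical thresholding can be sketched as a rolling z-score monitor. This is a deliberately simple stand-in for the deep learning and clustering methods cited above; the class name, window size, warm-up length and z-score limit are assumptions made for the example.

```python
from collections import deque
from statistics import mean, pstdev


class BaselineMonitor:
    """Flags readings that deviate strongly from a rolling baseline."""

    def __init__(self, window: int = 50, z_limit: float = 3.0,
                 warmup: int = 5) -> None:
        self.history = deque(maxlen=window)  # recent normal readings
        self.z_limit = z_limit
        self.warmup = warmup

    def update(self, value: float) -> bool:
        """Return True if the reading is anomalous w.r.t. the baseline."""
        is_anomaly = False
        if len(self.history) >= self.warmup:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            if sigma > 0 and abs(value - mu) / sigma > self.z_limit:
                is_anomaly = True
        if not is_anomaly:
            # Only normal readings update the baseline, so a fault
            # does not get absorbed into what counts as "normal".
            self.history.append(value)
        return is_anomaly


# Hypothetical vibration readings (mm/s) ending with a sudden jump.
monitor = BaselineMonitor(window=10)
readings = [1.0, 1.1, 0.9, 1.0, 1.1, 0.9, 1.0, 5.0]
flags = [monitor.update(r) for r in readings]
```

Because the baseline is recomputed from recent data rather than fixed in advance, the same monitor adapts to different engines and operating regimes. Note that a very slow drift would gradually shift this baseline and go undetected, which is one reason the literature turns to learned health indicators.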
Raptodimos and Lazakis used nonlinear autoregressive (NAR) neural networks and nonlinear autoregressive networks with exogenous inputs (NARX) to model the dynamic behaviour of main engine parameters [30]. These agents act deliberatively by forecasting future values of critical parameters (e.g., exhaust gas temperatures) based on past observations, enabling potential failures to be predicted several time steps ahead of their occurrence.
Göksu and Erginer propose a deliberative strategy based on neural networks to analyse the historical failure data in a ship's planned maintenance system [31]. The agent was trained on a decade of failure history for subsystems such as the fuel and air charge systems, and learned to predict the approximate dates of future component failures. This capability enables ship operators to shift from reactive repairs to planned interventions, allowing maintenance to be scheduled during port calls to ensure operational schedule reliability.
To address the drawbacks of purely data-driven designs, such as the black-box character of neural networks, hybrid agent designs integrate data-driven learning with physics-based or reliability-based reasoning.
Jimenez et al. propose a framework that combines data pre-processing agents and domain knowledge on failure modes [27]. Their work emphasizes that, although data-driven agents can recognize patterns, they should be combined with maintenance records and expert logic to define the importance of assets and the corrective measures to be implemented.
A prominent example of a hybrid architecture is the combination of agents with digital twin technology. According to Durlik et al., hybrid AI-physics models enable agents to be trained in physics-consistent virtual worlds [29]. These agents add physical constraints to the inference process, which makes them more robust and enables them to detect anomalies by comparing real-time sensor data with the expected behaviour modelled by the digital twin. Such a hybrid configuration ensures that the agent focuses its learning on the most important elements identified through engineering knowledge, thereby optimizing computational resources and making maintenance decisions more interpretable.
3.2. Port Applications (Landside)
Although the previous section concentrated on the autonomy of ships, the overall efficiency of maritime logistics depends on the seamless integration of the vessel and the land-based infrastructure. The cargo interface transition adds another layer of complexity, characterized by the interaction of various actors and multimodal transport systems. In this context, intelligent agents can bridge the gap between maritime operations and land-based management.
3.2.1. Intelligent Port Operations
Ports are multifaceted intermodal nodes, and organizing their various logistical processes involves advanced decision-making. The use of intelligent agents in this field transforms the traditional centralized approach to scheduling into dynamic decentralized coordination. Multi-agent systems have been found especially suitable in this environment because they can manage the distributed nature of port stakeholders and the stochasticity of maritime traffic.
This decentralized system is represented in Figure 2. The diagram shows how the high-level Port orchestrator agent (which performs global monitoring) can interact with operational agents (Ship, Quay Crane, Yard Truck, Yard Crane) through negotiation and bidding protocols to dynamically assign tasks through a common communication layer. The Port orchestrator agent is responsible for global system optimization, while operational execution relies on direct negotiation between the ship agent and terminal equipment. The communication between quay cranes and yard truck agents involves the use of bidding protocols to dynamically distribute transport tasks.
Within such a framework, Lv et al. note the growing importance of multi-agent reinforcement learning methods in the organization of seaside resources, including berths, quay cranes, and tugboats [32]. In this regard, various entities are modelled as different agents that learn the optimal strategies through interactions with the environment, either by collaborating or competing. To illustrate, a multi-agent proximal policy optimization (MAPPO) algorithm is applied to find solutions to complex scheduling problems, where agents must take real-time decisions by considering dynamic perturbations, such as vessel delays or equipment failures. Such agents are not just following predetermined rules but optimizing their actions by successively maximizing reward functions that balance across operational performance, costs, and environmental objectives.
Douma et al. addressed the barge handling problem in large ports using a multi-agent system, which includes the agents of barge operators and terminal operators [33]. Instead of following schedules set by a central authority, these agents engage in a negotiation protocol involving service-time profiles. Barge agents act opportunistically to reduce their sojourn time, whereas terminal agents strive to maximize quay crane utilization. Such a decentralized system enables the rotation of barges to be flexibly adjusted to terminal capacities and is more resilient in dynamic environments than central planning.
In the specific context of roll-on/roll-off (Ro-Ro) terminals, Gonzalez-Cancelas et al. proposed a multi-agent architecture in which an operations orchestrating entity relies on a contract net protocol (CNP) to organize specialized agents [34]. For example, vehicle distribution agents and shipping coordination agents negotiate tasks such as space allocation and boarding sequences. These agents employ heuristic optimization algorithms to find the best actions, reducing transit times and improving space utilization compared with traditional manual management.
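The announce-bid-award cycle at the heart of a contract net protocol can be sketched in a few lines. The sketch uses yard trucks bidding on a transport task as the example; the class, the distance-based bid and the positions are hypothetical and do not reproduce the architecture of Gonzalez-Cancelas et al.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TruckAgent:
    name: str
    position_m: float   # position along the quay (hypothetical 1-D layout)
    busy: bool = False

    def bid(self, pickup_m: float) -> Optional[float]:
        """Respond to a task announcement with a cost, or decline (None)."""
        if self.busy:
            return None
        return abs(self.position_m - pickup_m)  # cost = travel distance


def run_contract_net(pickup_m: float,
                     trucks: List[TruckAgent]) -> Optional[TruckAgent]:
    """Manager side: announce the task, collect bids, award the cheapest."""
    bids = []
    for truck in trucks:
        cost = truck.bid(pickup_m)
        if cost is not None:
            bids.append((cost, truck))
    if not bids:
        return None  # nobody can take the task right now
    _, winner = min(bids, key=lambda item: item[0])
    winner.busy = True  # the award commits the contractor
    return winner


fleet = [TruckAgent("T1", 100.0), TruckAgent("T2", 400.0),
         TruckAgent("T3", 250.0, busy=True)]
first = run_contract_net(300.0, fleet)
second = run_contract_net(300.0, fleet)
```

The manager never needs to know each truck's internal state; it only compares bids. This is what makes the protocol suitable for heterogeneous stakeholders that are unwilling or unable to share full information with a central planner.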
Kanellos et al. proposed a power management approach for large ports in which the port is treated as a microgrid with flexible loads (e.g., reefers, electric cranes) and renewable generation [35]. The individual agents in this multi-agent system represent separate entities such as ships at berth or refrigerated container clusters. These agents negotiate their power consumption profiles to smooth the overall load curve of the port and maximize the use of local renewable energy, so that operating costs and emissions are minimized without affecting the integrity of cargo.
Addressing the triple bottom line requires moving beyond conceptual optimization to measurable key performance indicators (KPIs), such as reductions in port waiting times, berth congestion, and localized emissions. A critical lever for port decarbonization is the integration of alternative power supply pathways, including onshore power supply and renewable microgrids. Intelligent agents are particularly well suited to optimize these hybrid energy architectures by negotiating shore power connections and balancing energy loads dynamically, thereby significantly reducing greenhouse gas emissions during port calls [36].
3.2.2. Supply Chain Coordination
The introduction of maritime transport into the broader logistics chain demands the coordination of the material, information and financial flows among various autonomous stakeholders. The distributed nature of data, competing interests and the sheer complexity of contemporary networks often render traditional centralized optimization ineffective. As a result, the use of intelligent agents in this field has evolved into multi-agent systems that exploit decentralized architectures to mitigate the so-called bullwhip effect, as well as deliberative agents that consider physical constraints and risk assessment in their decision making.
In order to deal with the complexity of end-to-end logistics, researchers have proposed multi-agent system designs in which agents model certain structural entities in the supply chain, enabling decentralized autonomy.
Xu et al. proposed a methodology for designing an agent-based autonomous supply chain in which structural entities (e.g., suppliers, wholesalers, logistics providers) are represented as autonomous agents [37]. In this architecture, agents control their units within the organization and communicate with peers to synchronize the three basic flows (material, information, financial). With this framework, agents are able to independently control processes such as procurement and order fulfilment, effectively shifting the paradigm from manual coordination to a self-operating system.
Keesara proposed an architectural design in which agents are organized in the model-view-control architecture [38]. This architecture comprises a view layer for monitoring and interaction, a control layer for decision-making and optimization algorithms, and a model layer for storing digital twins of supply chain entities. This decoupling enables agents acting on behalf of procurement, production and logistics to retain local authority while remaining coherent at the system level through shared standardized protocols, which improves scalability and visibility in distributed ecosystems.
Etebari et al. demonstrate that the use of agents to share information and centralize decisions has a significant impact on the performance of supply chains [39]. The system can become more coordinated by sharing point-of-sale information and calculating echelon safety stock with the help of agents such as RetailerInfo, SupplierInfo and OrderQuantity.
Muravev et al. propose a new hybrid simulation framework, a combination of agent-based modelling, system dynamics, and discrete event simulation to plan strategies and connect seaports to the hinterland (dry ports) [
40]. Agents in their model represent not only physical objects (e.g., trains or cranes), but also important parameters of the terminal (e.g., traffic intensity, storage capacity). These agents interact to reach an equilibrium state of the system, which enables managers to find optimal design and operating parameters of dry ports under conditions of uncertainty and traffic irregularity. The methodology is a two-step optimization. The first step uses agent-based system dynamics to rapidly assess and stabilize parameters. The second step uses agent-based discrete event simulation for detailed operational and financial analysis.
3.3. Synthesis and Comparative Overview
The analysis in this section indicates a clear evolution in the application of intelligent agents within maritime logistics. Although the earlier studies were mostly limited to isolated, reactive problems such as collision avoidance, recent trends, especially those in 2024 and 2025, indicate a significant shift toward multi-agent reinforcement learning methods and hybrid designs that balance safety and economic efficiency.
On the seaside, the emphasis is shifting from mere compliance with COLREGs to intricate energy management and predictive maintenance through digital twins. On the landside, studies have established that decentralized systems are more robust than centralized planning, particularly in the presence of the stochastic disruptions that are characteristic of port terminals.
Hybrid architectures are becoming the norm in the ship-centric world, driven by the need to balance short-term safety-critical responses with long-term economic planning. Conversely, the port domain is characterized by decentralized multi-agent systems, where the primary challenge is not the control of individual assets, but the coordination of heterogeneous entities with conflicting objectives. From a sustainability perspective, these agents are proving instrumental: they drive economic benefits through optimized resource utilization and environmental gains by minimizing emissions via intelligent routing and grid management. Nevertheless, a maturity gap remains: predictive maintenance agents have been implemented in the real world, while autonomous control agents are still mostly used in simulations, highlighting a disconnect between algorithmic ability and operational trust.
A full summary of the most important studies discussed in this part, in terms of application domain, the specific agent architecture used (as described in
Section 2), and the specific problem addressed, is given in
Table 2.
Table 2 provides a consolidated reference framework of the reviewed literature in a concise form. Although these studies demonstrate a high level of algorithmic capability, critical analysis reveals a distinct trend in deployment maturity: most of the solutions are validated solely in simulation. Since physical deployment is a rare exception in the existing literature, this maturity gap is discussed as one of the main systemic challenges in
Section 4.
In summary, while the examined literature demonstrates the algorithmic maturity of agents in simulated settings, there is a lack of studies validating these systems under real-world operational conditions. This gap between efficiency in simulation and practical applicability underlies the challenges discussed in the next section.
4. Challenges and Future Research Agenda
While the potential of intelligent agents in maritime logistics is transformative, the transition from controlled experiments to open-water deployment remains fraught with risks. Widespread industrial adoption is currently hindered not by a lack of algorithmic capability, but by the absence of frameworks ensuring safety and accountability in stochastic environments. This section critically examines these impediments, categorizing them into three core dimensions: the robustness of learning algorithms against physical uncertainties, the interpretability of opaque models for human operators and regulators, and the integrity of decentralized agent ecosystems against cyber-physical threats. Based on the review in
Section 3, this section outlines a targeted research agenda to bridge these gaps.
4.1. Bridging the “Reality Gap” in Autonomous Navigation
One of the most common weaknesses in existing research on deep reinforcement learning is the reliance on simplified simulation environments. The majority of agents are trained under ideal conditions or within obstacle fields that are not dynamic enough to reflect the stochastic nature of the real ocean. This introduces a reality gap, in which an agent that performs efficiently in simulation may act unpredictably in the real world when exposed to disturbances such as sensor noise, extreme weather, or unpredictable manned vessel behavior.
In order to address this gap, future studies need to focus on sim-to-real transfer methods. This can be done by training agents on high-fidelity simulators that use domain randomization, in which variations in physical coefficients (e.g., friction, wind shear) are intentionally introduced during training to make the learned policy robust enough to transfer to the physical world. Moreover, to cope with the unpredictable nature of real-world traffic, research ought to incorporate adversarial reinforcement learning (ARL), in which the agent is trained against adversarial scenarios that are deliberately created to reveal safety weaknesses. Lastly, the validation process should evolve from pure simulation to hardware-in-the-loop verification. In such setups, agent algorithms are executed on actual shipboard hardware integrated with a digital twin, ensuring their functionality within real-time latency constraints [
29].
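The domain randomization idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not a reproduction of any reviewed method: the parameter names, their ranges, and the `run_episode` callback (standing in for a simulator rollout plus policy update) are hypothetical.

```python
import random
from dataclasses import dataclass

@dataclass
class SimParams:
    hull_friction: float      # dimensionless drag multiplier (hypothetical)
    wind_speed_ms: float      # wind speed in m/s (hypothetical)
    sensor_noise_std: float   # std. dev. of position noise (hypothetical)

# Assumed randomization ranges; real ranges would come from sea-trial data.
RANGES = {
    "hull_friction": (0.8, 1.2),
    "wind_speed_ms": (0.0, 20.0),
    "sensor_noise_std": (0.0, 0.5),
}

def sample_params(rng):
    """Draw one randomized set of physical coefficients."""
    return SimParams(**{name: rng.uniform(lo, hi)
                        for name, (lo, hi) in RANGES.items()})

def train(num_episodes, run_episode, seed=0):
    """Re-sample the simulator physics before every training episode."""
    rng = random.Random(seed)
    history = []
    for _ in range(num_episodes):
        params = sample_params(rng)
        run_episode(params)  # placeholder: rollout and policy update
        history.append(params)
    return history

# Usage with a no-op episode function, only to show the control flow.
history = train(5, run_episode=lambda params: None)
```

Because the policy never sees the same physics twice, it cannot overfit to a single idealized simulator configuration, which is the intended source of robustness.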
4.2. The “Black Box” Problem: Explainability and Trust
The maritime industry operates under strict liability frameworks where accountability is paramount. Nonetheless, the most effective agents, especially those based on deep neural networks, are “black boxes”, which is a significant obstacle to adoption. A captain cannot trust an autonomous system that initiates a sharp maneuver without a clear, understandable reason, and investigation agencies cannot audit incomprehensible probability weights following an accident [
4].
In this regard, future research must devote more attention to the design of multimodal explainable artificial intelligence tailored to maritime operations. This involves moving from simple performance measures to post-hoc explanatory approaches that visualize exactly which sensor inputs (e.g., a particular radar blip) caused a decision. More to the point, the agents have to be capable of generating counterfactual explanations, i.e., explaining not just why a certain action was taken, but also why a different action was not. Furthermore, to ensure regulatory compliance, researchers should explore symbolic architectures that combine the strength of deep learning perception with the interpretability of symbolic logic, so that decisions taken during navigation can be traced against the COLREGs [
1]. Future research should also explore the role of generative artificial intelligence and large language models (LLMs) in translating these “black box” numerical weights into user-friendly reports that are easily interpretable by maritime operators and regulators.
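A counterfactual explanation of the kind mentioned above can be illustrated with a small search procedure. The sketch below is an assumption-laden toy: `decide` is a hypothetical stand-in for an opaque policy, the CPA and TCPA thresholds are invented for illustration, and the one-feature line search is only one of many ways to generate counterfactuals.

```python
def decide(cpa_nm, tcpa_min):
    """Hypothetical stand-in policy: alter course when the closest point of
    approach (CPA) is small and the time to CPA (TCPA) is short."""
    return "alter_course" if cpa_nm < 0.5 and tcpa_min < 12.0 else "keep_course"

def counterfactual(policy, inputs, feature, step, max_steps=1000):
    """Find the smallest change to one input feature that flips the decision.

    The policy is treated as a black box; only its outputs are inspected.
    """
    baseline = policy(**inputs)
    for k in range(1, max_steps + 1):
        for direction in (+1, -1):
            candidate = dict(inputs)
            candidate[feature] = inputs[feature] + direction * k * step
            outcome = policy(**candidate)
            if outcome != baseline:
                return {"feature": feature, "from": inputs[feature],
                        "to": candidate[feature], "decision": outcome}
    return None  # no flip found within the search range

situation = {"cpa_nm": 0.3, "tcpa_min": 8.0}
explanation = counterfactual(decide, situation, feature="cpa_nm", step=0.05)
```

The returned record can be read as a statement of the form "the agent would have kept its course had the predicted CPA been at least the reported value", which addresses why the alternative action was not chosen.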
4.3. Regulatory Compliance and Standardization of Agent Communication
At present, a critical interoperability barrier exists because there is no standardized framework for how intelligent agents should declare their intentions to other vessels or port infrastructure. Although the COLREGs form the legal foundation of navigation, they are designed to be interpreted by humans and rely on subjective concepts such as good seamanship that cannot be easily translated into binary code [
4].
The development and standardization of relevant ontologies is one of the methods of attaining system interoperability in the maritime industry [
41]. An ontology is a conceptual framework for modeling and representing knowledge in a given domain [
42]. Ontological standardization in the maritime sector would ensure that, in addition to internal interoperability (among components of the same system), external interoperability (among different systems) is established, which is of paramount importance. In addition, standardization of ontologies would be useful in the development of verification methods to determine whether the decision-making logic of a system is safe under all state space configurations.
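As a rough illustration of what a shared vocabulary buys, the sketch below validates an intent message against a fixed set of terms before it is accepted. It is a heavily simplified stand-in for an ontology (a flat controlled vocabulary without relations or axioms), and the term names, message fields and agent identifier are hypothetical rather than taken from any maritime standard.

```python
from dataclasses import dataclass
from enum import Enum

class Maneuver(Enum):
    """Hypothetical shared vocabulary of declarable intentions."""
    ALTER_COURSE_STARBOARD = "alter_course_starboard"
    ALTER_COURSE_PORT = "alter_course_port"
    REDUCE_SPEED = "reduce_speed"
    STAND_ON = "stand_on"

@dataclass(frozen=True)
class IntentMessage:
    sender_id: str
    maneuver: Maneuver
    valid_for_s: int

def parse_intent(raw):
    """Accept a message only if its maneuver term is in the shared vocabulary."""
    try:
        maneuver = Maneuver(raw["maneuver"])
    except (KeyError, ValueError) as exc:
        raise ValueError(f"unknown or missing maneuver term: {raw!r}") from exc
    return IntentMessage(str(raw["sender_id"]), maneuver, int(raw["valid_for_s"]))

# A message from a different system is understood because both share the terms.
msg = parse_intent({"sender_id": "vessel-A", "maneuver": "reduce_speed",
                    "valid_for_s": 120})
```

A message using a vendor-specific term such as "slow_down" would be rejected instead of being silently misinterpreted, which is the external interoperability property discussed above. Because the admissible terms are enumerable, a verifier can also iterate over all of them when checking decision logic.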
4.4. Human-Agent Teaming and Cognitive Load
Complete autonomy in the maritime industry is still decades away. The interim reality will rely heavily on human-in-the-loop systems. A critical challenge in this transition is preventing “automation surprise” and “mode confusion,” where an operator becomes detached from the control loop due to over-trust in the agent, failing to intervene effectively when the system encounters edge cases outside its training distribution [
4].
To address this, the next generation of studies ought to examine adaptive autonomy, whereby the agent dynamically adapts its degree of intervention by switching between decision-support and active control depending on the changing complexity of the situation. This necessitates the development of cognitive load estimation algorithms that identify operator fatigue or overload in real time. This is a new direction for agent-based systems, since such methods involve the application of physiological sensors (e.g., eye-tracking or electroencephalography). All these sensors may be modeled as agents in a multi-agent system. Moreover, human-agent interfaces must evolve to present information in a form that keeps the human operator actively involved in the decision-making process.
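One possible switching rule for adaptive autonomy is sketched below. It is an assumption, not a method from the reviewed literature: both input scores are taken to be normalized to [0, 1], the thresholds are invented, and the choice to hand control to the agent when either score is high is only one conceivable policy.

```python
from enum import Enum

class Mode(Enum):
    DECISION_SUPPORT = "decision_support"  # agent advises, human steers
    ACTIVE_CONTROL = "active_control"      # agent steers, human supervises

def select_mode(current, complexity, operator_load,
                engage_at=0.8, release_at=0.6):
    """Choose the autonomy mode from two scores normalized to [0, 1].

    Hysteresis (engage_at > release_at) prevents rapid switching between
    modes, which could itself contribute to mode confusion.
    """
    demand = max(complexity, operator_load)
    if current is Mode.DECISION_SUPPORT and demand >= engage_at:
        return Mode.ACTIVE_CONTROL
    if current is Mode.ACTIVE_CONTROL and demand <= release_at:
        return Mode.DECISION_SUPPORT
    return current

mode = Mode.DECISION_SUPPORT
mode = select_mode(mode, complexity=0.3, operator_load=0.9)  # operator overloaded
```

In a multi-agent setting, the `operator_load` score would be supplied by the sensor agents (for example an eye-tracking agent), while `complexity` would come from the navigation agent's own situation assessment.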
4.5. Cybersecurity in Decentralized Multi-Agent Ecosystems
With the transition to decentralized multi-agent systems in maritime logistics, the attack surface increases significantly. For example, multi-agent environments are vulnerable to Sybil attacks, in which a rogue agent assumes multiple fake identities. In such an attack, the agent can manipulate market-based negotiations (e.g., bidding on terminal slots) or cause gridlock by using data poisoning techniques to corrupt the learning models of other agents [
1]. A strong threat model must define explicit trust boundaries and attack surfaces, recognizing that sensor spoofing or adversarial data injection can cascade directly into safety-critical navigation failures.
To secure this distributed landscape, future research must establish zero-trust architectures for agents. In such an architecture, no agent is trusted by default, irrespective of its location in the network. Moreover, as edge devices (e.g., smart buoys, drones) are limited in computational resources, lightweight cryptographic protocols are urgently needed to ensure authentication and data integrity without introducing latency that may jeopardize safety-critical real-time operation.
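The zero-trust principle can be illustrated with per-message authentication. The sketch below uses HMAC-SHA256 from the Python standard library as a stand-in for a lightweight scheme; the agent identifiers, demo keys and envelope layout are hypothetical, and a real deployment would additionally need key provisioning and replay protection (e.g., nonces or timestamps).

```python
import hashlib
import hmac
import json

def sign(body, key):
    """Compute an authentication tag over a canonical JSON encoding."""
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

# Hypothetical registry: one pre-shared key per enrolled agent identity.
REGISTRY = {
    "crane-07": b"demo-key-crane-07",
    "buoy-12": b"demo-key-buoy-12",
}

def accept(envelope):
    """Verify every message, regardless of where in the network it came from."""
    key = REGISTRY.get(envelope.get("sender_id"))
    if key is None:
        return False  # unenrolled identity, e.g. one created for a Sybil attack
    expected = sign(envelope["body"], key)
    return hmac.compare_digest(expected, envelope["tag"])  # constant-time compare

bid = {"slot": "T3", "price": 120}
genuine = {"sender_id": "crane-07", "body": bid,
           "tag": sign(bid, REGISTRY["crane-07"])}
sybil = {"sender_id": "crane-99", "body": bid, "tag": sign(bid, b"guessed-key")}
tampered = {"sender_id": "crane-07", "body": {"slot": "T3", "price": 1},
            "tag": genuine["tag"]}
```

Here `accept(genuine)` succeeds, while the fabricated identity and the altered bid are both rejected, so a rogue agent cannot inflate its influence on slot bidding simply by inventing identities.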
5. Conclusions
The maritime logistics industry is at the crossroads of environmental sustainability and high-level digitalization. This paper shows that intelligent agents represent a paradigm shift towards dynamic, decentralized orchestration. The primary finding is that ship-centric applications rely on hybrid architectures to balance reactive collision avoidance and deliberative energy efficiency. In contrast, port-side logistics increasingly relies on multi-agent systems to autonomously broker the allocation of resources. Ultimately, these technologies play a vital role in achieving the industry’s triple bottom line through optimized fuel use, reduced emissions, and streamlined operations.
Several limitations hinder widespread adoption, both in the reviewed literature and in this study. The main limitation identified is the deployment maturity gap. The vast majority of current research validates agent models in simplified simulation environments. To date, only a limited number of studies have tested the proposed systems under real-world conditions at sea and in ports. Furthermore, assessing the actual sustainability impact is limited by a lack of standardized, measurable key performance indicators across studies. Moreover, the “black box” nature of deep learning algorithms continues to impede trust from regulators.
In future research, real-world studies must be expanded to fully realize the theoretical potential of agent-based applications. To this end, the implementation of domain-randomized sim-to-real transfer procedures is highly recommended. The development of multimodal explainable artificial intelligence is urgently needed to translate agent decisions into human-readable formats. Finally, establishing standardized communication ontologies and zero-trust cybersecurity frameworks will be critical to securing the decentralized networks of future autonomous maritime logistics.
Author Contributions
Conceptualization, M.R.; investigation, L.M., D.S. and M.R.; methodology, M.R.; project administration, M.R.; resources, L.M.; writing—original draft, L.M. and D.S.; writing—review and editing, M.R. and L.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
| MAS | Multi-agent system |
| XAI | Explainable artificial intelligence |
| MASS | Maritime autonomous surface ships |
| LiDAR | Light detection and ranging |
| AIS | Automatic identification system |
| COLREGs | International regulations for preventing collisions at sea |
| DRL | Deep reinforcement learning |
| MAPPO | Multi-agent proximal policy optimization |
| CTDE | Centralized training with decentralized execution |
| LNG | Liquefied natural gas |
| CBM | Condition-based maintenance |
| PHM | Prognostics and health management |
| Ro-Ro | Roll-on/roll-off |
| CNP | Contract net protocol |
| KPI | Key performance indicators |
| ARL | Adversarial reinforcement learning |
| LLM | Large language model |
References
- Durlik, I.; Miller, T.; Kostecka, E.; Łobodzińska, A.; Kostecki, T. Harnessing AI for sustainable shipping and green ports: Challenges and opportunities. Appl. Sci. 2024, 14, 5994.
- Shabib, A.; Maraqa, M.A.; Mustafa, S.; Peng, S.; Ma, G.; Jia, J.; Qiu, N. Bridging theory and practice in port sustainability: A critical review of frameworks, technologies, and implementation strategies. Int. J. Sustain. Eng. 2025, 18, 2538856.
- Özispa, N. How Ports Can Improve Their Sustainability Performance: Triple Bottom Line Approach. J. ETA Marit. Sci. 2021, 9, 41–50.
- Xiao, G.; Wang, Y.; Wu, R.; Li, J.; Cai, Z. Sustainable maritime transport: A review of intelligent shipping technology and green port construction applications. J. Mar. Sci. Eng. 2024, 12, 1728.
- Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach, Global Edition, 4th ed.; Pearson Education: London, UK, 2021; p. 1166.
- Wooldridge, M.; Jennings, N.R. Intelligent Agents—Theory and Practice. Knowl. Eng. Rev. 1995, 10, 115–152.
- Campbell, S.; Naeem, W.; Irwin, G.W. A review on improving the autonomy of unmanned surface vehicles through intelligent collision avoidance manoeuvres. Annu. Rev. Control 2012, 36, 267–283.
- Rao, A.S.; Georgeff, M.P. BDI agents: From theory to practice. In Proceedings of the First International Conference on Multiagent Systems, San Francisco, CA, USA, 12–14 June 1995; pp. 312–319.
- Gat, E.; Bonnasso, R.P.; Murphy, R. On three-layer architectures. Artif. Intell. Mob. Robot. 1998, 195, 210.
- Wooldridge, M. An Introduction to Multiagent Systems; John Wiley & Sons: Hoboken, NJ, USA, 2009.
- Males, L.; Sumic, D.; Rosic, M. Applications of Multi-Agent Systems in Unmanned Surface Vessels. Electronics 2022, 11, 3182.
- Meyer, E.; Heiberg, A.; Rasheed, A.; San, O. COLREG-compliant collision avoidance for unmanned surface vehicle using deep reinforcement learning. IEEE Access 2020, 8, 165344–165364.
- Pan, R.; Zhang, W.; Wang, S.; Kang, S. Deep reinforcement learning model for Multi-Ship collision avoidance decision making design implementation and performance analysis. Sci. Rep. 2025, 15, 21250.
- Gan, S.; Zhang, Z.; Wang, Y.; Wang, D. Multi-Ship Collision Avoidance in Inland Waterways Using Actor–Critic Learning with Intrinsic and Extrinsic Rewards. Symmetry 2025, 17, 613.
- Wang, Z.; Chen, P.; Chen, L.; Mou, J. Collaborative Collision Avoidance Approach for USVs Based on Multi-Agent Deep Reinforcement Learning. IEEE Trans. Intell. Transp. Syst. 2025, 26, 4780–4794.
- Lazarowska, A.; Żak, A. A Concept of Autonomous Multi-Agent Navigation System for Unmanned Surface Vessels. Electronics 2022, 11, 2853.
- Moon, K.D.; Jeong, C.Y.; Kim, M.S.; Park, Y.K.; Lee, K. Develop and evaluate of intelligent autonomous-ship framework. IOP Conf. Ser. Mater. Sci. Eng. 2020, 929, 012006.
- Xiao, Z.; Fu, X.; Zhang, L.; Zhang, W.; Agarwal, M.; Goh, R.S.M. MarineMAS: A multi-agent framework to aid design, modelling, and evaluation of autonomous shipping systems. J. Int. Marit. Saf. Environ. Aff. Shipp. 2019, 2, 43–57.
- Wang, Y.; Ye, Q.; Lau, H.C.; Wang, T.; Wu, B. Nash Bargaining Strategy in Autonomous Decision Making for Multi-Ship Collision Avoidance Based on Route Exchange. IET Intell. Transp. Syst. 2025, 19, e70025.
- Malviya, A.; Rajendran, S. Multi-Agent Reinforcement Learning for Collision Avoidance and Path Following of Autonomous Surface Vehicles. In Proceedings of the 2025 IEEE Underwater Technology (UT), Taipei, Taiwan, 2–5 March 2025; pp. 1–6.
- Wei, G.; Kuo, W. COLREGs-compliant multi-ship collision avoidance based on multi-agent reinforcement learning technique. J. Mar. Sci. Eng. 2022, 10, 1431.
- Niu, Y.; Zhu, F.; Wei, M.; Du, Y.; Zhai, P. A multi-ship collision avoidance algorithm using data-driven multi-agent deep reinforcement learning. J. Mar. Sci. Eng. 2023, 11, 2101.
- Wu, P.; Partridge, J.; Anderlini, E.; Liu, Y.; Bucknall, R. An intelligent energy management framework for hybrid-electric propulsion systems using deep reinforcement learning. Int. J. Hydrogen Energy 2025, 106, 282–294.
- Abdalla, A.; Kirchen, P.; Gopaluni, B. Deep reinforcement learning for methane slip reduction in hybrid-powered liquefied natural gas marine vessels. Sustain. Energy Technol. Assess. 2025, 81, 104404.
- Alshareef, M.H.; Alghanmi, A.F. Optimizing Maritime Energy Efficiency: A Machine Learning Approach Using Deep Reinforcement Learning for EEXI and CII Compliance. Sustainability 2024, 16, 10534.
- Tang, D.; Yan, X.; Yuan, Y.; Wang, K.; Qiu, L. Multi-agent based power and energy management system for hybrid ships. In Proceedings of the 2015 International Conference on Renewable Energy Research and Applications (ICRERA), Palermo, Italy, 22–25 November 2015; pp. 383–387.
- Jimenez, V.J.; Bouhmala, N.; Gausdal, A.H. Developing a predictive maintenance model for vessel machinery. J. Ocean Eng. Sci. 2020, 5, 358–386.
- Ellefsen, A.L.; Æsøy, V.; Ushakov, S.; Zhang, H. A comprehensive survey of prognostics and health management based on deep learning for autonomous ships. IEEE Trans. Reliab. 2019, 68, 720–740.
- Durlik, I.; Miller, T.; Kostecka, E.; Kozlovska, P.; Ślączka, W. Enhancing Safety in Autonomous Maritime Transportation Systems with Real-Time AI Agents. Appl. Sci. 2025, 15, 4986.
- Raptodimos, Y.; Lazakis, I. An artificial neural network approach for predicting the performance of ship machinery equipment. In Proceedings of the Maritime Safety and Operations 2016 Conference Proceedings, Glasgow, UK, 13–14 October 2016.
- Göksu, B.; Erginer, K. Prediction of ship main engine failures by artificial neural networks. J. ETA Marit. Sci. 2020, 8, 98–113.
- Lv, Y.; Wang, J.; Liu, Z.; Zou, M. From heuristics to multi-agent learning: A survey of intelligent scheduling methods in port seaside operations. Mathematics 2025, 13, 2744.
- Douma, A.M.; Schuur, P.C.; Schutten, J.M.J. Aligning barge and terminal operations using service-time profiles. Flex. Serv. Manuf. J. 2011, 23, 385–421.
- González-Cancelas, N.; Vaca-Cabrero, J.; Camarero-Orive, A. Multi-Agent System for Smart Roll-on/Roll-off Terminal Management: Orchestration and Communication Strategies for AI-Driven Optimization. Appl. Sci. 2025, 15, 6079.
- Kanellos, F.D.; Volanis, E.-S.M.; Hatziargyriou, N.D. Power management method for large ports with multi-agent systems. IEEE Trans. Smart Grid 2017, 10, 1259–1268.
- Issa-Zadeh, S.B.; Esteban, M.D.; López-Gutiérrez, J.-S.; Garay-Rondero, C.L. Unveiling the Sensitivity Analysis of Port Carbon Footprint via Power Alternative Scenarios: A Deep Dive into the Valencia Port Case Study. J. Mar. Sci. Eng. 2024, 12, 1290.
- Xu, L.; Mak, S.; Minaricova, M.; Brintrup, A. On implementing autonomous supply chains: A multi-agent system approach. Comput. Ind. 2024, 161, 104120.
- Keesara, V. AI Agents with MCV Architecture in Supply Chain Management: Toward Autonomous and Collaborative Networks. J. Inf. Syst. Eng. Manag. 2025, 10, 336–350.
- Etebari, F.; Abedzadeh, M.; Khoshalhan, F. Investigating Impact of Intelligent Agents in Improving Supply Chain Performance. Int. J. Ind. Eng. Prod. Res. 2011, 22, 63–72.
- Muravev, D.; Hu, H.; Rakhmangulov, A.; Mishkurov, P. Multi-agent optimization of the intermodal terminal main parameters by using AnyLogic simulation platform: Case study on the Ningbo-Zhoushan Port. Int. J. Inf. Manag. 2021, 57, 102133.
- Rosic, M.; Sumic, D.; Males, L. Semantic Interoperability of Multi-Agent Systems in Autonomous Maritime Domains. Electronics 2025, 14, 2630.
- Gruber, T.R. A translation approach to portable ontology specifications. Knowl. Acquis. 1993, 5, 199–220.