Article

Path Planning Design and Experiment for a Recirculating Aquaculture AGV Based on Hybrid NRBO-ACO with Dueling DQN

1
College of Engineering, Huazhong Agricultural University, Wuhan 430070, China
2
Key Laboratory of Aquaculture Facilities Engineering, Ministry of Agriculture and Rural Affairs, Wuhan 430070, China
3
College of Fisheries, Huazhong Agricultural University, Wuhan 430070, China
4
Wuhan Second Ship Design and Research Institute, Wuhan 430205, China
*
Author to whom correspondence should be addressed.
Drones 2025, 9(7), 476; https://doi.org/10.3390/drones9070476
Submission received: 26 May 2025 / Revised: 24 June 2025 / Accepted: 4 July 2025 / Published: 5 July 2025

Abstract

This study introduces an advanced automated guided vehicle (AGV) specifically designed for application in recirculating aquaculture systems (RASs). The proposed AGV seamlessly integrates automated feeding, real-time monitoring, and an intelligent path-planning system to enhance operational efficiency. To achieve optimal and adaptive navigation, a hybrid algorithm is developed, incorporating Newton–Raphson-based optimisation (NRBO) alongside ant colony optimisation (ACO). Additionally, dueling deep Q-networks (dueling DQNs) dynamically optimise critical parameters, thereby improving the algorithm’s adaptability to the complexities of RAS environments. Both simulation-based and real-world experiments substantiate the system’s effectiveness, demonstrating superior convergence speed, path quality, and overall operational efficiency compared to traditional methods. The findings of this study highlight the potential of AGVs to enhance precision and sustainability in recirculating aquaculture management.

1. Introduction

Advancements in modern automation technologies have led factory-based aquaculture systems [1], particularly recirculating aquaculture systems (RASs) [2], to integrate intelligent automated guided vehicle (AGV) [3] solutions to enhance operational efficiency. These AGVs automate essential tasks, including water quality monitoring [4], equipment inspection [5], and feeding management, thereby optimising [6] productivity, reducing operational costs, and promoting sustainable aquaculture practices. However, the operational environment for recirculating aquaculture AGVs [7] is highly challenging due to high humidity and limited spaces [8].
Path planning is a critical technology for aquaculture AGVs, enabling precise and reliable navigation in structured yet dynamic aquaculture environments [9]. Efficient movement is crucial for tasks [10], such as water quality monitoring [11] and feeding management [12,13], particularly in narrow pathways with irregular obstacles. Conventional methods, such as A* [14], Dijkstra’s algorithm [15], and the Bellman–Ford algorithm [16], efficiently determine the shortest paths [15] but are computationally intensive [17,18] and require full recalculations in dynamic or large-scale environments [19].
To address these limitations, metaheuristic algorithms, such as particle swarm optimisation (PSO) [20] and ant colony optimisation (ACO) [20,21], have gained prominence. ACO is particularly effective in navigating complex environments with multiple obstacles and irregular layouts, utilising pheromone-based communication to iteratively refine path solutions. However, variants including layered potential field (LF)-ACO [22] and artificial potential field (APF)-ACO [23] encounter difficulties in parameter tuning, often leading to suboptimal paths due to excessive avoidance or inadequate obstacle clearance.
To further enhance path planning, this study integrates dueling deep Q-networks (dueling DQNs) [24,25], an advanced reinforcement learning methodology that differentiates between state-value and advantage functions, thereby enhancing learning efficiency and decision-making accuracy. By dynamically adjusting ACO parameters, such as pheromone influence [26] and heuristic weighting [27], this proposed hybrid dueling DQN-ACO approach significantly improves adaptability and robustness within complex and dynamic aquaculture environments, effectively balancing local exploration and global convergence. Importantly, to rigorously evaluate the performance advantages of our hybrid model, we benchmark it against several prominent and established DQN variants within the evaluation section. These include the following: (i) Prioritized Experience Replay DQN (P-DQN) [24], which accelerates learning by biasing sampling towards experiences with high temporal-difference (TD) error, effectively prioritizing transitions that offer the most significant learning potential, and (ii) Noisy Nets DQN (N-DQN) [28], which replaces traditional exploration mechanisms (like ε-greedy) by adding adaptive, state-dependent noise directly to the neural network’s weight parameters, thereby fostering richer exploration patterns throughout the training process.
Building upon the aforementioned analyses, this study presents a novel and robust path-planning method for recirculating aquaculture AGVs [29]. This approach integrates a Newton–Raphson-based optimisation (NRBO)-ACO hybrid algorithm with dueling DQN [30]. The proposed NRBO-ACO algorithm leverages the NRBO to achieve rapid local convergence while employing ACO for global exploration [31]. To further enhance adaptability, dueling DQN dynamically adjusts critical ACO parameters [27,32], including pheromone influence and heuristic factor, enabling real-time adaptation and optimised navigation efficiency. By integrating NRBO, ACO, and dueling DQN, this study provides an advanced solution for intelligent navigation and management in aquaculture systems. The main contributions of this study are summarised as follows:
1.
A novel hybrid path planning method is proposed, integrating NRBO with ACO. This approach leverages the NRBO’s fast convergence for local optimisation and ACO’s global exploration capability. This approach enhances the AGV’s ability to navigate complex environments and find optimal paths in real time.
2.
A novel path fitness function is designed for complex aquaculture environments, optimising navigation through narrow passageways and irregular obstacles. This function enables adaptive path planning to accommodate the dynamic and unpredictable nature of aquaculture systems.
3.
A dueling DQN is employed to optimise performance by dynamically adjusting critical ACO parameters, such as pheromone influence and heuristic factor. This reinforcement learning approach enables real-time adaptation, enhancing path planning by effectively balancing exploration and exploitation within complex and evolving scenarios.
The remainder of this paper is organised as follows: Section 2 provides an overview of the aquaculture AGV prototype. Section 3 introduces the NRBO-ACO integrated optimisation algorithm with the dueling DQN approach. Section 4 presents the simulation results. Section 5 provides the experimental validation. Section 6 concludes the paper.

2. Problem Formulation

The RAS environment presents unique challenges, such as narrow pathways, irregular obstacle layouts, slippery surfaces, and dynamic feeding schedules, as illustrated in Figure 1. These constraints considerably heighten the complexity of path planning and navigation, necessitating the AGV to optimally balance efficiency, precision, and adaptability.
To validate the efficacy and practicality of the proposed algorithms in real-world applications, extensive experiments were conducted using an AGV within a controlled aquaculture setting. The experimental setup, situated at the RAS facility of Huazhong Agricultural University, was specifically designed to simulate realistic operational conditions, as illustrated in Figure 2.

3. Path Planning with NRBO-ACO Integrated Optimisation Algorithm

To overcome RAS-specific challenges, a hybrid NRBO-ACO path planning algorithm is proposed, as illustrated in Figure 3. The framework combines environmental threat assessment, narrow passage constraints, and path feasibility evaluation with the Newton–Raphson-based optimiser (NRBO) for local refinement. The dueling DQN module dynamically adjusts ACO parameters to improve convergence. The trap avoidance operator (TAO) is used to escape local minima, and the final path is selected from all ant-generated candidates based on its adaptability in the RAS system.
The proposed NRBO-ACO algorithm combines the rapid local adjustments of NRBO with the global pathfinding efficiency of ACO, leveraging pheromone-based exploration to enhance route optimisation. This framework includes path adaptability evaluation, addressing obstacle-induced threat costs, and implementing path-smoothing techniques to navigate narrow passage constraints. These features ensure precise navigation in complex environments. Additionally, the trap avoidance operator (TAO) strengthens the algorithm’s robustness by preventing suboptimal solutions.
To enhance adaptability, dueling DQNs dynamically adjust key ACO parameters, including pheromone influence and heuristic weighting, to achieve an optimal equilibrium between exploration and exploitation. The introduction of a path fitness function specifically designed for aquaculture settings addresses irregular obstacles and confined spaces, ensuring reliable and efficient AGV operation.

3.1. Traditional Ant Colony Algorithm

According to the traditional ant colony algorithm, the transition probability that the k-th ant selects the next node can be calculated using Equation (1).
$$
p_{ij}^{k}(t)=\begin{cases}\dfrac{\left[\tau_{ij}(t)\right]^{\alpha}\left[\eta_{ij}(t)\right]^{\beta}}{\sum_{s\in allowed_k}\left[\tau_{is}(t)\right]^{\alpha}\left[\eta_{is}(t)\right]^{\beta}}, & j\in allowed_k\\[6pt] 0, & j\notin allowed_k\end{cases}\tag{1}
$$
where $\tau_{ij}(t)$ signifies the pheromone concentration on the path from node $i$ to node $j$ at time $t$; $\eta_{ij}(t)$ denotes the heuristic information for moving from $i$ to $j$; $\alpha$ and $\beta$ weight the pheromone and heuristic terms, respectively; and $allowed_k$ refers to the set of feasible nodes surrounding the current path node $i$.
Pheromone update: After an ant completes an iteration, the pheromone concentration along its path must be updated. The update method for the k -th ant is outlined in Equations (2) and (3):
$$
\tau_{ij}(t+1)=(1-\rho)\,\tau_{ij}(t)+\sum_{k=1}^{m}\Delta\tau_{ij}^{k},\tag{2}
$$
$$
\Delta\tau_{ij}^{k}(t)=\begin{cases}\dfrac{Q}{l_k}, & \text{if ant } k \text{ passes through } (i,j)\\[4pt] 0, & \text{otherwise},\end{cases}\tag{3}
$$
where $\rho$ signifies the pheromone decay coefficient; $m$ denotes the total number of ants; $Q$ denotes the pheromone intensity; and $l_k$ refers to the length of the path travelled by ant $k$.
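As a concrete illustration, the transition rule and pheromone update in Equations (1)–(3) can be sketched in Python. This is a minimal sketch, not the authors' implementation: the matrices `tau` and `eta` and all function names are illustrative.

```python
import numpy as np

def transition_probs(tau, eta, i, allowed, alpha, beta):
    """Probability of moving from node i to each node in `allowed` (Eq. 1)."""
    weights = np.array([(tau[i, j] ** alpha) * (eta[i, j] ** beta) for j in allowed])
    return weights / weights.sum()

def update_pheromone(tau, paths, path_lengths, rho, Q):
    """Evaporate, then deposit pheromone along each ant's path (Eqs. 2-3)."""
    tau = (1.0 - rho) * tau                      # evaporation
    for path, l_k in zip(paths, path_lengths):
        for i, j in zip(path[:-1], path[1:]):
            tau[i, j] += Q / l_k                 # deposit Q / l_k on each visited edge
    return tau
```

In practice, `eta[i, j]` is typically set to the inverse of the distance between nodes, so that nearer feasible nodes receive higher transition probabilities.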

3.2. Path Adaptability Evaluation in the RAS System

Path length cost: The path length cost represents the total length covered by the planned path. If the path consists of $n$ nodes, each with coordinates $P_i$ (where $i$ ranges from 1 to $n$), the path length cost $L_1$ is calculated as follows:
$$
L_1=\sum_{i=1}^{n-1}\left\|P_{i+1}-P_{i}\right\|\tag{4}
$$

3.3. Obstacle Threat Cost Based on Fish Tanks, Path Generation, and Smoothing Under Narrow Passage Constraints

(1) Obstacle threat cost based on fish tanks
In the RAS environment, fish tanks serve as large, fixed obstacles that have a significant impact on path planning. Each fish tank’s centre can be considered the centre of an obstacle, with the obstacle radius R j determined by the actual size of the fish tank. To account for the influence of these obstacles on the path, a threat cost is applied to the path nodes, which is calculated using Equation (5):
$$
C_1=\gamma_c\sum_{i=1}^{n}\sum_{j=1}^{p}\max\left(0,\;R_j-\left\|P_i-O_j\right\|\right)\tag{5}
$$
where $\gamma_c$ indicates the amplification factor for the threat cost; $p$ denotes the total number of fish tanks; $P_i$ denotes a specific node on the path; and $O_j$ represents the centre position of the $j$-th fish tank.
As illustrated in Figure 4, the red circles denote the fish tanks, while the green and black circles indicate the next-level and subsequent-level cost zones, respectively, with the cost decreasing progressively. Rj1 and Rj2 represent the distances from the center of the fish tanks to the green and black circles, respectively. When a path node P i enters the influence radius R j of a fish tank, the threat cost increases, prompting the path to steer away from the edges of the fish tanks. This mechanism effectively minimizes the risk of collisions.
(2) Path generation and smoothing under narrow passage constraints
As shown in Figure 5, the distribution of path nodes is influenced by environmental constraints and optimization objectives. By calculating the spacing between nodes and setting a predefined threshold, it is possible to determine whether the path passes through a narrow region. Within such narrow passages, the algorithm primarily selects path points constrained by boundary conditions, resulting in denser node distributions that reflect the characteristics of the confined space. The C2 value in narrow regions (C2(1)) is greater than that in non-narrow regions (C2(2)), which aligns with the differences in C2 sizes demonstrated in Figure 5 for paths in narrow and wide areas.
a.
Narrow region detection
During initial path generation, the channel width is determined by measuring the distance between nodes. If the distance d ( P i , P i + 1 ) between two adjacent nodes falls below a predefined threshold d 1 , the corresponding area is marked as a ‘narrow passage’.
In narrow regions, an additional path cost increment C 2 is used to account for the increased traversal difficulty, as expressed in Equation (6):
$$
C_2=\sum_{i=1}^{n-1}\max\left(0,\;d_1-d(P_i,P_{i+1})\right)\tag{6}
$$
where C 2 refers to the incremental cost for the path in narrow regions.
b.
Path smoothness
Path smoothness is essential for AGVs operating in RAS environments. To minimise sharp turns in narrow regions, a smoothness penalty term S is incorporated into the path length cost, as outlined in Equation (7):
$$
S=\sum_{i=2}^{n-1}\mu\left\|P_{i-1}-2P_i+P_{i+1}\right\|\tag{7}
$$
where μ indicates the smoothness penalty coefficient and S refers to the smoothness penalty term. The overall path length cost L is subsequently modified as follows:
$$
L=\sum_{i=1}^{n-1}\left\|P_{i+1}-P_i\right\|+S\tag{8}
$$
c.
Comprehensive path adaptability function F
To account for factors, including narrow passages, slippery surfaces, and obstacles, a comprehensive adaptability function is employed to assess and optimise the AGV’s path. This function integrates multiple cost components using weight coefficients, enabling the system to prioritise different aspects of path planning, as expressed in Equation (9):
$$
F=\omega_1 L+\omega_2 C_1+\omega_3 C_2\tag{9}
$$
where ω 1 indicates the weight coefficient for the importance of path length and smoothness; ω 2 refers to the weight coefficient for obstacle threat, guiding cautious navigation near fish tanks; and ω 3 implies the weight coefficient for narrow passage cost, simplifying paths in tight areas.
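The full adaptability function in Equations (4)–(9) can be sketched compactly in NumPy. This is a hedged illustration, assuming `P` is an (n, 2) array of path nodes, `O` an array of tank centres, and `R` their radii; the default weights and thresholds are placeholders, not the tuned values used in the paper.

```python
import numpy as np

def path_fitness(P, O, R, w1=1.0, w2=1.0, w3=1.0, gamma_c=1.0, d1=1.0, mu=0.5):
    """Composite path adaptability F = w1*L + w2*C1 + w3*C2 (Eqs. 4-9)."""
    seg = np.linalg.norm(np.diff(P, axis=0), axis=1)          # ||P_{i+1} - P_i||
    # smoothness penalty (Eq. 7): second differences of consecutive nodes
    S = mu * np.linalg.norm(P[:-2] - 2 * P[1:-1] + P[2:], axis=1).sum()
    L = seg.sum() + S                                         # smoothed length (Eq. 8)
    # obstacle threat cost (Eq. 5): penalise nodes inside any tank's radius
    d_to_tanks = np.linalg.norm(P[:, None, :] - O[None, :, :], axis=2)
    C1 = gamma_c * np.maximum(0.0, R[None, :] - d_to_tanks).sum()
    # narrow-passage cost (Eq. 6): penalise adjacent nodes closer than d1
    C2 = np.maximum(0.0, d1 - seg).sum()
    return w1 * L + w2 * C1 + w3 * C2
```

A straight, well-spaced path far from any tank incurs only its length cost, while paths that hug tank walls or compress their nodes in narrow passages accumulate $C_1$ and $C_2$ penalties.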

3.4. Newton–Raphson-Based Optimiser

The NRBO, drawing inspiration from the Newton–Raphson method, utilises two fundamental rules to comprehensively explore the search process: the Newton–Raphson search rule (NRSR) and the TAO.
1.
Calculation of derivatives
The derivatives of the path length cost and obstacle threat cost are computed as follows:
$$
\nabla_x F(x)=w_1\,\nabla L(x)+w_2\,\nabla C_{\text{threat}}(x)
$$
2.
Path position update
The path position is updated using the NRSR as follows:
$$
\mathrm{NRSR}=\mathrm{randn}\times\frac{(x_w-x_b)\,\Delta x}{2\,(x_w+x_b-2x_n)}
$$
$$
\Delta x=\mathrm{rand}(1,dim)\times\left|x_b-x_n\right|
$$
$$
x_{n+1}=x_n-\mathrm{NRSR}
$$
where x w indicates the worst position, and x b denotes the best position. The adaptive coefficient Δ x governs the balance between exploration and exploitation capabilities.
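The NRSR position update above can be sketched as a single vectorised step. This is a minimal sketch under assumptions: the random-number conventions (`randn`, `rand`) follow the equations literally, and the small denominator guard is an added safeguard not stated in the text.

```python
import numpy as np

def nrsr_step(x_n, x_b, x_w, rng=None):
    """One NRSR update: x_{n+1} = x_n - NRSR, with x_b best and x_w worst."""
    rng = np.random.default_rng(0) if rng is None else rng
    dim = x_n.shape[0]
    dx = rng.random(dim) * np.abs(x_b - x_n)             # Delta x
    denom = 2.0 * (x_w + x_b - 2.0 * x_n)
    denom = np.where(np.abs(denom) < 1e-12, 1e-12, denom)  # guard (assumption)
    nrsr = rng.standard_normal(dim) * (x_w - x_b) * dx / denom
    return x_n - nrsr
```

Because $\Delta x$ scales with $|x_b - x_n|$, the step size shrinks automatically as a candidate approaches the current best position, which is what gives the NRSR its rapid local convergence.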

3.5. Trap Avoidance Operator

The decision to perform the TAO operation is determined by the difference between the fitness value at the current path position $x_{n+1}$ and the fitness value at the current best path position $x_b$. A threshold $\zeta$ is defined; if $\left|F(x_{n+1})-F(x_b)\right|<\zeta$, with $\zeta=0.01$, the TAO operation is executed.
By computing the partial derivatives of the fitness function $F$ with respect to the parameters $a$ and $b$, it is possible to determine the necessary adjustments to these parameters to enhance the overall fitness value. The derivatives are approximated by forward differences:
$$
\frac{\partial F}{\partial a}\approx\frac{F(a+\Delta a,\,b)-F(a,\,b)}{\Delta a}
$$
$$
\frac{\partial F}{\partial b}\approx\frac{F(a,\,b+\Delta b)-F(a,\,b)}{\Delta b}
$$
where a refers to the coefficient adjusting the influence of the current best solution x b , and b refers to the coefficient adjusting the influence of the differential random positions x r 1 x r 2 .
The updated rules for a and b are expressed as follows:
$$
a\leftarrow a+\lambda\,\frac{\partial F}{\partial a}
$$
$$
b\leftarrow b+\lambda\,\frac{\partial F}{\partial b}
$$
where λ signifies the learning rate.
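The finite-difference update of $a$ and $b$ reduces to a few lines of Python. This sketch assumes `F` is any callable surrogate of the fitness as a function of $(a, b)$; the step sizes and learning rate are illustrative.

```python
def update_ab(F, a, b, lam=0.1, da=1e-4, db=1e-4):
    """Forward-difference estimates of dF/da, dF/db, then one ascent step."""
    dF_da = (F(a + da, b) - F(a, b)) / da
    dF_db = (F(a, b + db) - F(a, b)) / db
    return a + lam * dF_da, b + lam * dF_db
```

For a smooth $F$, repeated application moves $(a, b)$ towards a local maximiser of the fitness, which is all the TAO stage requires of these coefficients.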
To enhance the algorithm further, the optimised coefficients are integrated into the NRBO framework through the correction term $\varphi$:
$$
\varphi=a\times(x_b-x_n)+b\times(x_{r1}-x_{r2})
$$
where x b refers to the best position, x n signifies the current position, and x r 1 and x r 2 denote random positions.
Based on the optimised values of a and b, the equation for updating the path position is given by the following:
$$
x_{n+1}^{T}=x_n-\mathrm{randn}\times\frac{(x_w-x_b)\,\Delta x}{2\,(x_w+x_b-2x_n)}+\varphi
$$
The temporary position x n + 1 T integrates the combined effects of the NRSR and the parameter optimisation derived from the ACO algorithm. To prevent convergence to local optima, TAO should be employed as follows:
$$
x_{TAO}=\begin{cases}x_{n+1}^{T}+\theta_1\,(\mu_1 x_b-\mu_2 x_n)+\theta_2\,\delta\,(\mu_1 x_w-\mu_2 x_n), & \text{if } \mu_1<0.5\\[4pt] x_b+\theta_1\,(\mu_1 x_b-\mu_2 x_n)+\theta_2\,\delta\,(\mu_1 x_w-\mu_2 x_n), & \text{otherwise}\end{cases}
$$
where $x_{TAO}$ is the final position after applying the trap avoidance operator; $\mu_1,\mu_2$ are random numbers in the range $[0,1]$; $\theta_1,\theta_2$ are scaling factors controlling the perturbation amplitude; and $\delta$ is a perturbation coefficient.
$$
\mu_1=\phi\times3\times\mathrm{rand}+(1-\phi),\qquad \mu_2=\phi\times\mathrm{rand}+(1-\phi)
$$
$$
\delta=\left(1-\frac{2\,NC}{NC_{max}}\right)^{5}
$$
where $NC$ denotes the current iteration count and $NC_{max}$ the maximum number of iterations.
Final update of the path position:
x n + 1 = x T A O
Additionally, the current best and worst path positions, $x_b$ and $x_w$, are updated based on the fitness value at the newly determined path position $x_{n+1}$:
$$
\text{If } F(x_{n+1})<F(x_b),\quad x_b\leftarrow x_{n+1}
$$
$$
\text{If } F(x_{n+1})>F(x_w),\quad x_w\leftarrow x_{n+1}
$$
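To make the TAO perturbation concrete, a minimal NumPy sketch follows. This is an illustrative assumption-laden version, not the authors' code: the switching parameter `phi` and the default scaling factors are placeholders, and the random draws follow the equations above literally.

```python
import numpy as np

def tao_step(x_T, x_b, x_w, x_n, NC, NC_max, theta1=1.0, theta2=1.0,
             phi=1.0, rng=None):
    """One TAO perturbation of the temporary position x_T."""
    rng = np.random.default_rng(0) if rng is None else rng
    mu1 = phi * 3.0 * rng.random() + (1.0 - phi)
    mu2 = phi * rng.random() + (1.0 - phi)
    delta = (1.0 - 2.0 * NC / NC_max) ** 5       # decays over iterations
    pert = (theta1 * (mu1 * x_b - mu2 * x_n)
            + theta2 * delta * (mu1 * x_w - mu2 * x_n))
    base = x_T if mu1 < 0.5 else x_b             # branch on mu1
    return base + pert
```

Because $\delta$ shrinks in magnitude as the iteration count approaches $NC_{max}$, the perturbation is strongest early on, when escaping local minima matters most.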
Figure 6 presents the algorithm’s logical flowchart, offering a visual representation of the relationships among its various modules and their workflow.
This diagram illustrates the integration of the Newton–Raphson optimiser into the ACO loop. It begins with population initialisation and threat-aware path cost calculation, followed by adaptive derivative computation and parameter adjustment using finite difference methods. The TAO mechanism is triggered when local optima are detected to ensure robust exploration and solution quality in RAS-specific constraints.

3.6. Dynamic Parameter Adjustment Strategy

In order to dynamically regulate the key parameters of the Ant Colony Optimisation (ACO) algorithm in complex aquaculture environments, we introduce a Dynamic Parameter Adjustment Strategy (DPAS) based on dueling deep Q-network (Dueling DQN). This module adjusts the pheromone influence factor ( α ), the heuristic factor ( β ), and the pheromone evaporation rate ( ρ ) in real time to improve convergence speed and adaptability.
Unlike traditional DRL approaches that directly map sensor inputs to control actions (end-to-end control), our method integrates a dueling DQN model to dynamically adjust key parameters of the Ant Colony Optimization (ACO) algorithm. This design choice is based on the following considerations:
(1) Interpretability: Retaining ACO for path generation allows explainable and constraint-aware planning.
(2) Sample Efficiency: DRL in low-dimensional parameter spaces ( α , β , ρ ) converges faster than full action space learning.
(3) Robustness: The decoupling between learning and planning provides resilience in unseen scenarios.
(4) Feasibility in Safety-Critical Domains: In RAS environments, fully DRL-based control can be risky and unstable. Hybrid schemes enable conservative exploration.
Moreover, simple adaptive rules or linear parameter scheduling fail to account for nonlinear and time-varying interactions between terrain features (e.g., narrow passages, slippery surfaces) and algorithmic sensitivity.
The dueling DQN module comprises a dual-network structure: a current network for value evaluation and a target network for computing the update target. Both networks are optimised via gradient descent on the following loss function:
$$
L=\left[r_t+\gamma\,Q\!\left(s_{t+1},\,\arg\max_{a}Q(s_{t+1},a;\alpha_t,\beta_t,\rho_t);\,\alpha_t^{-},\beta_t^{-},\rho_t^{-}\right)-Q(s_t,a_t;\alpha_t,\beta_t,\rho_t)\right]^{2}
$$
$$
a_t^{m}=\arg\max_{a}Q(s_t,a;\alpha_t,\beta_t,\rho_t)
$$
where $r_t$ refers to the reward at time step $t$; $\gamma$ signifies the discount factor; $s_t$ and $s_{t+1}$ denote the current and next states, respectively; the superscript minus marks target-network parameters; and $a_t^{m}$ signifies the action that maximises the $Q$-value in state $s_t$.
The state $s_t\in\mathbb{R}^{4}$ fed into the dueling DQN at time step $t$ is defined as follows:
$$
s_t=\left[q_t,\ \Delta q_t,\ \sigma_t,\ iter_t\right]
$$
$$
\Delta q_t=q_t-q_{t-1}
$$
where q t refers to the current best solution quality; Δ q t indicates the quality improvement; σ t represents the variance of solution qualities from all ants; and i t e r t signifies the current ACO iteration count.
Each component is a scalar normalised to $[0, 1]$; together they form the 4-D state vector input to the dueling DQN.
Each action $a_t$ is a discrete selection of a parameter modification scheme. We define a 3-D discrete action space: $a_t=(\Delta\alpha,\Delta\beta,\Delta\rho)$, with $\Delta\alpha,\Delta\beta,\Delta\rho\in\{-0.1,\ 0,\ +0.1\}$.
That is, the DQN selects whether to increase, decrease, or retain each of the three parameters, yielding a total action space of $3^3 = 27$ possible actions.
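The enumeration of this 27-action space is straightforward in Python; `ACTIONS` and `apply_action` are illustrative names for how a chosen action index maps back to parameter adjustments.

```python
from itertools import product

# Each action adjusts (alpha, beta, rho) by -0.1, 0, or +0.1: 3**3 = 27 combos.
ACTIONS = list(product((-0.1, 0.0, 0.1), repeat=3))

def apply_action(alpha, beta, rho, action_index):
    """Apply the (d_alpha, d_beta, d_rho) adjustment selected by the DQN."""
    d_alpha, d_beta, d_rho = ACTIONS[action_index]
    return alpha + d_alpha, beta + d_beta, rho + d_rho
```

In a full agent, `action_index` would be the argmax over the 27-dimensional Q-value vector, and the adjusted parameters would be clipped to their admissible ranges (a detail not specified here).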
The dueling DQN outputs a Q-value vector $Q(s_t,\cdot)\in\mathbb{R}^{27}$, estimating the expected return for each of the 27 actions. The selected action is as follows:
$$
a_t=\arg\max_{a}Q(s_t,a)
$$
We define a composite reward function to guide dueling DQN’s learning:
$$
R_t=w_1\,\Delta q_t+w_2\,\frac{1}{iter_t+1}
$$
where i t e r t indicates the iteration count.
The DQN is trained with experience replay and target network techniques, and its loss function is as follows:
$$
L(\theta)=\mathbb{E}_{(s,a,r,s')}\left[\left(r+\gamma\,\max_{a'}Q(s',a';\theta^{-})-Q(s,a;\theta)\right)^{2}\right]
$$
where $\theta$ and $\theta^{-}$ are the current and target network parameters, respectively.
As shown in Figure 7, the network uses fully connected layers (MLP) with a shared base and two heads: one for the state-value function and the other for the advantage function. These are combined as follows:
$$
Q(s,a;\alpha_t,\beta_t,\rho_t)=V(s;\alpha_t,\beta_t,\rho_t)+\left(A(s,a;\alpha_t,\beta_t,\rho_t)-\frac{1}{|\mathcal{A}|}\sum_{a'\in\mathcal{A}}A(s,a';\alpha_t,\beta_t,\rho_t)\right)
$$
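The dueling aggregation of the two heads can be sketched independently of any deep learning framework; `dueling_q` below is a hypothetical NumPy stand-in for the network's output layer, not the paper's PyTorch model.

```python
import numpy as np

def dueling_q(V, A):
    """Combine state value V(s) with mean-centred advantages A(s, a).

    V: scalar state value; A: (num_actions,) advantage vector.
    """
    A = np.asarray(A, dtype=float)
    return V + (A - A.mean())
```

Subtracting the mean advantage makes the decomposition identifiable: adding a constant to every advantage no longer changes the resulting Q-values.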

4. Simulation Results

All simulation experiments were conducted on the MATLAB/Simulink 2024a platform under a Windows 10 64-bit operating system. The hardware system was equipped with an AMD Ryzen 5 7600X six-core processor (base frequency: 4.7 GHz, boost: 5.3 GHz) and an NVIDIA GeForce RTX 4060 GPU (8 GB GDDR6 VRAM, 3072 CUDA cores), ensuring sufficient computational capacity for hybrid optimisation tasks.
The path planning components involving the Newton–Raphson optimisation (NRBO) and the Ant Colony Optimisation (ACO) were implemented using native MATLAB scripts. For the dynamic parameter adjustment module, which incorporates the dueling deep Q-network (dueling DQN), a Python interface was established through MATLAB Engine API for Python to enable seamless integration with external deep learning models. The dueling DQN model was developed and trained using Python 3.10, with the PyTorch 2.1.0 deep learning framework accelerated by CUDA 12.1, allowing real-time interaction with the MATLAB-based simulation environment.

4.1. Verification of Dueling DQN Method

The effectiveness of the dueling DQN in path planning is substantiated through experiments measuring learning speed, convergence, stability, and computational efficiency. Specifically, the algorithm’s performance at each time step is measured using average reward as a key indicator, as it represents how closely the optimisation approach aligns with the ideal solution. Higher average rewards indicate superior learning effectiveness and path quality. To ensure a thorough evaluation, path length and runtime are additionally employed as performance metrics, as they offer a direct representation of the algorithm’s practicality in real-world applications. More efficient routes are indicated by shorter path lengths, while reduced runtime highlights the algorithm’s computational efficiency.
As illustrated in Table 1, learning is divided into three distinct phases: short-term (the first 10 ACO episodes), medium-term (20 episodes), and long-term (50 episodes), each signifying a distinct stage of the algorithm’s learning process.
Table 2 presents the complexity levels of the experimental tasks, classifying grid sizes into three categories: simple (25 × 25), moderate (70 × 70), and complex (100 × 100) tasks, as shown in Figure 8.
The complexity at each level is defined by the density of obstacles and the degree of path irregularity.
To explicitly quantify irregularity, we define a Path Irregularity Index (PII), calculated as follows:
PII = (Number of Corners)/(Path Length in meters)
This metric captures the frequency of directional changes along the path. A higher PII indicates more frequent turns per unit distance, suggesting lower smoothness.
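A minimal sketch of the PII computation follows; the corner-detection tolerance is an assumption, since the text does not specify how a "corner" is detected numerically.

```python
import numpy as np

def path_irregularity_index(P, tol_deg=1.0):
    """PII = (number of corners) / (path length in metres)."""
    P = np.asarray(P, dtype=float)
    seg = np.diff(P, axis=0)                      # segment vectors
    length = np.linalg.norm(seg, axis=1).sum()    # total path length
    headings = np.arctan2(seg[:, 1], seg[:, 0])   # heading of each segment
    turns = np.abs(np.rad2deg(np.diff(headings)))
    corners = int((turns > tol_deg).sum())        # count significant turns
    return corners / length
```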
These complexity factors serve as critical challenges in assessing the algorithm’s adaptability and robustness. When visualised, the metrics offer a comprehensive depiction of the algorithm’s strengths:
1.
Higher curves indicate enhanced performance and faster adaptation.
2.
Earlier plateaus signify a quicker convergence towards optimal solutions.
3.
Smoother curves denote stability across various scenarios, thereby reducing performance fluctuations.
4.
Higher rewards accompanied by reduced runtimes illustrate superior computational efficiency and practicality.
1.
Learning speed and convergence
Figure 9a illustrates that, in comparison with both static and dynamic parameter methods, dueling DQN attains superior average rewards throughout short-term (10 steps), medium-term (20 steps), and long-term (50 steps) learning phases. Additionally, Figure 9b highlights that under simple, moderate, and complex task conditions—where complexity is determined by obstacle density and path irregularity—dueling DQN exhibits significantly faster convergence rates. The adaptive parameter adjustment mechanism of dueling DQN facilitates a more rapid convergence towards optimal paths, thereby outperforming both static and dynamic parameter methods.
2.
Long-term stability
Figure 10a illustrates the consistently stable performance of dueling DQN across varying levels of task complexities, whereas methods employing dynamic parameter methods exhibited fluctuations, and static approaches consistently underperformed. Similarly, Figure 10b further substantiates the faster convergence of dueling DQN, particularly in complex environments, attributable to its capability for dynamic parameter adjustment.

4.2. Cross-Scale Evaluation of D-DQN: Training Dynamics and Testing Efficiency

To assess the generalisation ability of the proposed D-DQN framework, we conducted extensive training and testing on three map types, as shown in Figure 8. The simulation results are presented in Figure 11 and Figure 12, based on selected runs. The underlying data are drawn from 100 independent experiments on each map. The success rate and accuracy are evaluated as follows:
Success Rate: Percentage of trials in which the agent successfully reached the target node within 500 steps.
Accuracy: Ratio of the optimal path length to the actual path length in completed successful trials:
Accuracy (%) = (Optimal Path Length)/(Actual Path Length) × 100
Figure 11 presents the performance of DQN, P-DQN, N-DQN, and D-DQN during the training process, focusing on the changes in success rate and accuracy. As training progresses, all models gradually adapt to the environment, with both success rate and accuracy showing steady improvement.
In Figure 11a,b, the increase in success rate is particularly notable. From a policy design perspective, P-DQN and D-DQN exhibit faster learning speeds in the early stages of training, benefiting from their ability to accumulate richer experience. In contrast, DQN and N-DQN show relatively lower learning efficiency. In terms of network architecture, N-DQN and D-DQN significantly enhance learning capabilities and speed through dense connections and efficient feature extraction mechanisms. Notably, in Figure 11b, D-DQN achieves a higher success rate than N-DQN in the early stages, but due to the complexity of its network architecture, it is slightly surpassed by N-DQN after convergence. Nevertheless, D-DQN demonstrates excellent overall performance, particularly in convergence speed and stability.
Figure 11c,d illustrate the trends in accuracy. Compared to success rate, accuracy is more significantly influenced by policy design. The D-DQN strategy not only improves accuracy but also enhances success rate, highlighting its superior design. Overall, both policy optimisation and network architecture improvements play a crucial role in enhancing model performance, with D-DQN standing out in these aspects.
Figure 12 shows the models’ performance during the testing phase, conducted in 25 × 25 and 70 × 70 map environments over 200,000 steps. Figure 12a,b display the success rates in the two map sizes. The results indicate that P-DQN and D-DQN outperform DQN and N-DQN in complex environments. In the 70 × 70 map (Figure 12b), the dense connection mechanisms of N-DQN and D-DQN demonstrate significant advantages in handling complex scenarios. In terms of accuracy, Figure 12c,d further validate the effectiveness of the D-DQN strategy. In the 25 × 25 map (Figure 12c), D-DQN surpasses N-DQN in accuracy due to its efficient learning strategy. In the 70 × 70 map (Figure 12d), D-DQN once again proves its strong learning capabilities in complex environments. Overall, D-DQN significantly outperforms other models in both success rate and accuracy.
Table 3 summarises the performance metrics of each model during the testing phase, averaging the success rate and accuracy over the last 50,000 steps. In the 25 × 25 map, D-DQN achieves a path-finding probability of 99.65% and an accuracy of 99.97%; in the 70 × 70 map, its path-finding probability is 98.85%, and accuracy is 99.75%. These results fully demonstrate D-DQN’s significant advantages across all performance metrics.

4.3. Verification of Overall Path Planning Performance

Detailed experimental parameters are summarised in Table 4.
The simulation results are presented in Figure 13 and Table 5. This environment comprises randomly generated grid maps of different dimensions (25 × 25, 70 × 70, and 100 × 100), each featuring an obstacle density of 40%. Figure 13 displays the path planning outcomes of the three algorithms on these maps from one randomly selected trial out of 100 experiments, while Table 5 provides the complete statistical data collected from all 100 experimental trials.
To further address the limitations of average smoothness metrics, we evaluated path dynamics using two additional criteria: maximum curvature and jerk. This provides a more nuanced assessment of trajectory fluidity and motion safety, especially in high-density environments with abrupt changes. We define the following standardised calculation methods:
Maximum curvature $\kappa_i$:
$$
\kappa_i=\frac{2\left|(P_{i+1}-P_i)\times(P_i-P_{i-1})\right|}{\left\|P_{i+1}-P_i\right\|\,\left\|P_i-P_{i-1}\right\|\,\left\|P_{i+1}-P_{i-1}\right\|}
$$
The maximum over all $\kappa_i$ quantifies the sharpest turning point along the entire path.
Jerk metric $J_i$:

$$J_i = \frac{x_{i+2} - 3x_{i+1} + 3x_i - x_{i-1}}{\Delta t^{3}}$$

Here, $x_i$ represents the position at time step $i$, and $\Delta t$ is the sampling interval.
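As a concrete reference, both metrics can be computed from a sampled path; the sketch below is ours (function names and the example data are illustrative, not the paper's implementation):

```python
import numpy as np

def max_curvature(path):
    """Largest discrete curvature along a 2-D path, using the
    circumscribed-circle formula on each triple of consecutive points."""
    P = np.asarray(path, dtype=float)
    kappas = []
    for i in range(1, len(P) - 1):
        a = P[i + 1] - P[i]        # P_{i+1} - P_i
        b = P[i] - P[i - 1]        # P_i - P_{i-1}
        c = P[i + 1] - P[i - 1]    # P_{i+1} - P_{i-1}
        cross = abs(a[0] * b[1] - a[1] * b[0])   # |a x b| in 2-D
        denom = np.linalg.norm(a) * np.linalg.norm(b) * np.linalg.norm(c)
        kappas.append(2.0 * cross / denom if denom > 0 else 0.0)
    return max(kappas)

def jerk(x, dt):
    """Third finite difference of a 1-D position series divided by dt^3,
    i.e. J_i = (x_{i+2} - 3 x_{i+1} + 3 x_i - x_{i-1}) / dt^3."""
    x = np.asarray(x, dtype=float)
    return (x[3:] - 3.0 * x[2:-1] + 3.0 * x[1:-2] - x[:-3]) / dt ** 3
```

For three points sampled on a circle of radius R the curvature formula returns exactly 1/R, and for a cubic position profile x(t) = t³ the jerk estimate is the constant 6.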
As illustrated in Figure 13 and Table 5 and Table 6, NRBO-ACO consistently attains the shortest path lengths across all map sizes. For instance, within the 25 × 25 grid, NRBO-ACO shortened the path length by 1.44 metres compared to APF-ACO, whereas in the 100 × 100 grid, it achieved a reduction of 21.24 metres compared to LF-ACO. Moreover, NRBO-ACO demonstrates significantly faster running times, completing the 100 × 100 grid in a mere 0.92 s, while LF-ACO requires 8.03 s. Additionally, NRBO-ACO necessitates fewer iterations for convergence, achieving completion of the 100 × 100 map in 67 iterations—faster than APF-ACO’s 86 iterations and significantly ahead of LF-ACO’s 93 iterations. With regard to path smoothness, NRBO-ACO consistently preserves an optimal and reasonable number of directional changes across all scenarios, thereby facilitating efficient navigation without unnecessary turns.

4.4. Parameter Sensitivity Analysis

To evaluate the robustness of the proposed dueling DQN-enhanced NRBO-ACO algorithm, two typical Recirculating Aquaculture System (RAS) scenarios were constructed:
Map 1 (25 × 25): Simulates a small-scale aquaculture pond with fixed feeding points, water pumps, and semi-enclosed terrain formed by curved pool walls.
Map 2 (50 × 50): Represents a multi-layer intensive aquaculture workshop, including dynamic obstacles, U-shaped water channels, and narrow passage areas (width < 0.6 m).
An orthogonal experimental design was adopted to test five weight combinations ( ω 1 , ω 2 ): {(0.1, 0.9), (0.3, 0.7), (0.5, 0.5), (0.7, 0.3), (0.9, 0.1)}, where ω 1 and ω 2 denote the weights for path length and collision avoidance, respectively. The safety distance threshold was set to 0.3 m to ensure AGV navigation in RAS environments. The evaluation metrics are as follows:
Aquaculture Efficiency Index (AEI):
$$AEI = \frac{1}{N}\sum_{i=1}^{N}\left(\omega_1 \frac{L_i}{L_{opt}} + \omega_2 \frac{L_i}{L_{opt}}\right)$$
Equipment Disturbance Index (EDI):
$$EDI = \frac{1}{T}\int_{0}^{T} \max\!\left(0,\ 1 - \frac{d_t}{d_{safe}}\right) dt \times 100\%$$

where $d_t$ is the AGV's distance to the nearest equipment at time $t$ and $d_{safe}$ is the 0.3 m safety distance threshold.
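With uniformly sampled distances, the EDI integral reduces to a mean; a minimal sketch (sampling scheme and names are illustrative, not the paper's):

```python
import numpy as np

def edi(distances, d_safe=0.3):
    """Equipment Disturbance Index: time-averaged penetration of the
    safety band, max(0, 1 - d_t / d_safe), expressed as a percentage.
    `distances` holds the AGV's distance to the nearest equipment at
    uniformly spaced time steps, so the time integral divided by T
    reduces to a simple mean of the per-sample penalties."""
    d = np.asarray(distances, dtype=float)
    penalty = np.maximum(0.0, 1.0 - d / d_safe)
    return 100.0 * penalty.mean()
```

An AGV that never enters the 0.3 m band scores 0%; one that holds a constant 0.15 m distance, half the threshold, scores 50%.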
Table 7 summarises the performance of four algorithms across two maps. The proposed method demonstrates superior parameter robustness: (1) Under the extreme weight combination ( ω 1 : ω 2 = 0.9:0.1) in Map 2, the standard deviation of AEI is only 1.7, representing a 62.8% reduction compared to APF-ACO. (2) When ω 2 = 0.9, the EDI in Map 1 remains below 4.3%, meeting RAS safety standards.
Through 3D response surface analysis (Figure 14a–d and Figure 15a–d), the proposed algorithm exhibits a gradual transition pattern in the 50 × 50 map, with an AEI range of only 0.17, significantly outperforming the bimodal structure of traditional ACO (AEI range: 0.52). When α > 0.7, the proposed method shows the lowest EDI slope (0.23 vs. 0.41 for APF-ACO), demonstrating its robustness in complex environments. This improvement is attributed to the NRBO pre-optimisation mechanism and the dueling DQN dynamic regulation framework: the former precomputes the path topology using the Newton–Raphson method to reduce initial exploration randomness, while the latter dynamically adjusts parameter weights through its value and advantage networks to handle dynamic disturbances. Additionally, the segmented path optimisation strategy reduces computational complexity from O(n²) to O(n log n), further enhancing efficiency. The smoothness of the surfaces in Figure 14a–d and Figure 15a–d is positively correlated with parameter robustness, and Table 7 shows that the proposed algorithm outperforms the comparative methods in both AEI and EDI, meeting RAS safety standards.

4.5. NRBO-ACO Performance in Multi-Obstacle Aquaculture Scenarios

Experimental validation in two critical aquaculture scenarios (Single Obstacle Navigation and High-Density Obstacle Field) demonstrates the enhanced capabilities of NRBO-ACO. All trajectories were generated with 0.1 s resolution over 10 s simulations, as shown in Figure 16a–d for single obstacle scenarios and Figure 17a–d for five-obstacle environments. As shown in Table 8, the proposed method achieves a significant reduction in standard deviation of AEI and maintains low EDI values, ensuring compliance with RAS safety standards. Key findings include the following:
1. Single Obstacle Navigation: Conventional ACO (red dashed line, Figure 16a) shows ±0.15 m oscillations near obstacles, caused by high Gaussian noise. NRBO-ACO (black solid line, Figure 16d) reduces deviations to 0.03 m through Newton–Raphson gradient optimisation. When encountering sudden obstacles (t = 6.8 s in Figure 16c,d), NRBO-ACO achieves path correction within 0.4 s via dueling DQN adaptation, while APF-ACO (blue dotted line, Figure 16c) exhibits 1.2 s of latency from potential field overshoot.
2. High-Density Obstacle Field (5 Obstacles): In clustered obstacle environments (Figure 17a–d), NRBO-ACO maintains a 0.06 m average clearance from obstacles through adaptive trajectory smoothing, outperforming LF-ACO's 0.22 m periodic offsets (green dash-dot line, Figure 17b) caused by low-frequency oscillations. For simultaneous target tracking (0.15 × sin(2t) reference path), NRBO-ACO (Figure 17d) demonstrates a 0.05 m mean error, significantly better than ACO's 0.31 m maximum deviation (Figure 17a) in obstacle-dense regions (t = 3.5–5.2 s).

5. Experiment Results

The experiments were conducted on a real AGV (automated guided vehicle) running the Ubuntu 20.04 operating system with the ROS Noetic platform. The AGV, depicted in Figure 18, is a TRACER mobile robot from Songling Robotics (AgileX). The robot was modified to communicate via the CAN bus and was equipped with a Slamtec RPLIDAR S3, which features a 40 m measurement range, a 32 kHz sampling frequency, and a 20 Hz scanning frequency.
The global and local path planning algorithms are configured with optimised parameters tailored to the experimental environments. Table 9 lists the specific parameters for the different algorithms, including the pheromone influence factor (alpha), the heuristic factor (beta), and the adaptive weighting of the fitness function. Moreover, the TEB algorithm handles dynamic obstacles with a local planning frequency of 10 Hz.
Figure 18. The AGV used in the experiment.

5.1. Experimental Environment 1: Feeding Task Navigation

The first experiment evaluates the performance of the AGV in navigating from its starting position at the charging station to the furthest fish tank, simulating a feeding task. As illustrated in Figure 19, the environment consists of the following components:
1. Dynamic obstacles: moving individuals representing real-world unpredictability.
2. Static obstacles: objects such as feed bags or maintenance tools placed along the path.
3. Aquaculture-specific challenges: narrow pathways and slippery surfaces, replicating the typical conditions of an RAS.
The experimental results for Environment 1 are presented in Figure 20 and Table 10. Figure 20 illustrates the path-planning outcomes of the three algorithms within a simulated aquaculture environment featuring narrow passages, irregular obstacles, and dynamic conditions.
Figure 20a illustrates the path generated by the A* algorithm, which follows the obstacle boundaries closely and includes several sharp turns. The A* algorithm’s reliance on heuristic search, without the capacity for global optimisation, results in frequent re-planning, particularly when dynamic obstacles are encountered. As shown in Table 10, the A* algorithm exhibited longer path lengths and an increased runtime in comparison to the NRBO-ACO algorithm, highlighting its inefficiency when dealing with complex aquaculture conditions.
In contrast, Figure 20b illustrates the path planned by the Dijkstra algorithm. While Dijkstra guarantees a solution through exhaustive search, it exhibits considerable runtime delays, especially near dynamic obstacles. This delay can be attributed to the absence of heuristic guidance, which results in slower convergence. Additionally, as shown in Table 10, the paths generated by Dijkstra contain sharper turns, thereby increasing the risk of instability during navigation.
Figure 20c illustrates the path generated by the proposed NRBO-ACO algorithm, which ensures the shortest and smoothest path with minimal sharp turns. By combining the Newton–Raphson optimisation method for rapid local adjustments with ACO for global exploration, the NRBO-ACO algorithm successfully strikes a balance between efficiency and adaptability. As shown in Table 10, both the runtime and path length of the algorithm are significantly reduced compared to A* and Dijkstra, thereby guaranteeing smooth and reliable navigation, even when dynamic obstacles are present.

5.2. Experimental Environment 2: Slippery Surface Adaptation

The second experiment focuses on assessing the effectiveness of the proposed algorithm in handling slippery surfaces, which are common in aquaculture environments. As depicted in Figure 21, the experimental area features predefined slippery zones with low-friction surfaces.
The experimental results of Environment 2 are shown in Figure 22 and Table 11. Figure 22 illustrates the performance of the algorithms on slippery surfaces, simulating wet aquaculture pathways.
Figure 22a illustrates the trajectory generated by the A* algorithm. The paths often display abrupt directional changes, which significantly increase the risks of instability, particularly in slippery zones. As documented in Table 11, the A* algorithm exhibited a higher computational runtime and failed to minimise path deviations effectively.
Figure 22b illustrates the performance of the Dijkstra algorithm. Due to its exhaustive search process, substantial delays were observed, particularly when recalculating paths near slippery zones. In multiple trials, navigation remained incomplete, as excessive time was expended on re-planning.
Figure 22c depicts the path generated by the NRBO-ACO algorithm. This algorithm dynamically adjusts its path to circumvent slippery zones while preserving an optimal balance between smoothness and minimal path length. By integrating a customised fitness function that evaluates obstacle threat and path smoothness, the NRBO-ACO algorithm successfully balances exploration and exploitation, guaranteeing stable and efficient navigation.
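The exact fitness function is not reproduced in this excerpt; the sketch below shows the general shape of such a cost, a weighted sum of an obstacle-threat term and a heading-change smoothness term (the weights, threshold, and function names are hypothetical, not the paper's):

```python
import math

def path_fitness(path, obstacles, w_threat=0.5, w_smooth=0.5, d_safe=0.3):
    """Hypothetical fitness (lower is better) combining obstacle threat and
    path smoothness. `path` is a list of (x, y) waypoints; `obstacles`
    a list of (x, y) obstacle or slippery-zone centres."""
    # Threat term: linear penalty for waypoints inside the safety band.
    threat = sum(max(0.0, 1.0 - math.hypot(px - ox, py - oy) / d_safe)
                 for px, py in path for ox, oy in obstacles)
    # Smoothness term: total absolute heading change between segments.
    smooth = 0.0
    for (x0, y0), (x1, y1), (x2, y2) in zip(path, path[1:], path[2:]):
        h1 = math.atan2(y1 - y0, x1 - x0)
        h2 = math.atan2(y2 - y1, x2 - x1)
        dh = abs(h2 - h1)
        smooth += min(dh, 2.0 * math.pi - dh)   # wrap to [0, pi]
    return w_threat * threat + w_smooth * smooth

# A straight path far from all obstacles costs nothing; each 90-degree
# turn adds w_smooth * pi/2 to the cost.
```

Weighting the two terms against each other is what lets the planner trade a slightly longer route for fewer sharp turns near slippery zones.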

5.3. Experimental Environment 3: Complex Return-Path Navigation with Multiple Turns

A third experiment was designed to evaluate the path planning algorithm under more complex real-world conditions involving multiple directional changes and a round-trip traversal. In this setup, the AGV starts from the start position, moves forward sequentially to Tank 9, and then returns to the initial point along a different path. This round-trip mission introduces compound turning requirements and simulates operational challenges typical of dense aquaculture environments, as illustrated in Figure 23.
The performance comparison of the three path planning algorithms in this scenario is illustrated in Figure 24, where the red arrow represents the path the robot takes to reach the target fish tank for the first time, and the black arrow indicates the path the robot follows when turning back to the starting point. The detailed results are summarised in Table 12.
In Figure 24a, the path generated by the A* algorithm follows the tank boundaries very closely, resulting in multiple sharp angle turns, particularly during the return path where the AGV must execute sudden changes in direction at tank intersections. This is quantitatively reflected in the highest number of average corners (14) and longest average running time (160.8 s) among all algorithms. The increased number of corner points not only reduces trajectory smoothness but also contributes to computational delays due to the frequent need for re-evaluation near obstacle boundaries.
In Figure 24b, the Dijkstra algorithm produces a more direct path in certain sections compared to A*, thanks to its full graph traversal strategy. However, the lack of heuristic acceleration results in longer computation time (154.2 s). Although the average number of corners (11) is slightly lower than A*, the path length (59.8 m) is still higher than that of NRBO-ACO. This indicates that while Dijkstra avoids excessive detours, it sacrifices efficiency due to uniform-cost exploration without path optimisation priorities.
In contrast, Figure 24c shows the trajectory generated by the NRBO-ACO algorithm, which significantly improves both path quality and computational efficiency. With only five average corners, it achieves a much smoother and more stable path, especially in regions requiring compound turns. Additionally, the shortest path length (56.4 m) and lowest average runtime (121.5 s) confirm the algorithm’s ability to balance global search effectiveness (via ACO) with rapid local refinement (through Newton–Raphson optimisation). These results demonstrate the algorithm’s superior adaptability in constrained aquaculture environments, where maneuverability and stability are crucial.

6. Discussion

This section synthesises the theoretical framework from Section 4 with the experimental results from Section 5 to provide an in-depth analysis of the NRBO-ACO algorithm's performance in autonomous navigation systems for recirculating aquaculture environments. Through a comparative evaluation of three path planning algorithms (A*, Dijkstra, and NRBO-ACO) across three experimental scenarios, we demonstrate NRBO-ACO's superior adaptability and performance in complex aquaculture settings.

6.1. Algorithm Performance Evaluation

In Experimental Scenario 1 (feeding task navigation), NRBO-ACO exhibited significant advantages. Compared to A* and Dijkstra, NRBO-ACO achieved superior performance across all metrics: path length (shortest average: 28.1 m), computation time (minimum: 70.25 s), and path smoothness (only two turns). These results indicate that NRBO-ACO’s hybrid approach—combining Newton–Raphson optimisation for local adjustments with ant colony optimisation for global exploration—enables efficient and smooth path planning even with dynamic obstacles.
In contrast, the A* algorithm's reliance on heuristic search limited its global optimisation capability, resulting in longer paths and increased computation time. While Dijkstra guarantees optimal solutions, its exhaustive search approach caused noticeable delays near dynamic obstacles and produced paths with sharp turns that compromise navigation stability.
In Experimental Scenario 2 (slippery surface adaptation), NRBO-ACO again demonstrated robust performance. By dynamically adjusting paths to avoid slippery areas, it maintained smooth trajectories while minimising path length and computation time. A* exhibited unstable pathing near slippery zones with higher computational costs, while Dijkstra failed to complete the navigation task due to excessive computational burden.
In Experimental Scenario 3 (complex round-trip navigation with multiple turns), NRBO-ACO once again outperformed baseline algorithms in terms of both path quality and computational efficiency. The NRBO-ACO algorithm produced the shortest average path length (56.4 m) and lowest average computation time (121.5 s) while also maintaining a significantly lower number of turns (5). This indicates that the algorithm not only optimises travel efficiency but also ensures greater trajectory stability in environments requiring frequent directional changes.
In contrast, the A* algorithm generated paths with the highest number of turns (14) and the longest computation time (160.8 s), due to frequent re-planning at tank intersections and abrupt direction shifts. The Dijkstra algorithm reduced turn frequency to 11 but still suffered from long runtime (154.2 s) owing to its uniform-cost exhaustive search strategy. Both traditional algorithms struggled to maintain efficiency and smoothness in the constrained bidirectional layout.

6.2. Algorithm Adaptability Insights

NRBO-ACO’s success stems from its unique adaptive mechanisms:
1. Newton–Raphson optimisation enables rapid local path adjustments for dynamic obstacles and surface conditions;
2. Ant colony optimisation ensures global path optimality;
3. Custom fitness functions simultaneously optimise obstacle avoidance and path smoothness.
Comparatively, A* and Dijkstra show limited adaptability. A* lacks comprehensive optimisation capabilities for complex environments, while Dijkstra’s computational inefficiency hinders real-time performance in dynamic scenarios.
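The pheromone/heuristic interplay behind mechanisms 1 and 2 above follows the classical ACO state-transition rule, where alpha weights the pheromone trail and beta the heuristic desirability; a generic sketch (not the paper's implementation — in the proposed framework the dueling DQN would tune alpha and beta online):

```python
import random

def choose_next(current, candidates, tau, eta, alpha=1.0, beta=2.0, rng=random):
    """Classical ACO transition rule: candidate j is selected with
    probability proportional to tau[(current, j)]**alpha *
    eta[(current, j)]**beta, where tau is the pheromone table and eta
    the heuristic desirability (e.g. inverse distance to the goal)."""
    weights = [tau[(current, j)] ** alpha * eta[(current, j)] ** beta
               for j in candidates]
    total = sum(weights)
    r = rng.random() * total           # roulette-wheel selection
    acc = 0.0
    for j, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            return j
    return candidates[-1]
```

With equal pheromone, a candidate whose heuristic desirability is zero is never chosen, so beta steers early search toward promising edges while alpha lets reinforced trails dominate later iterations.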

6.3. Comparison with Recent Deep Reinforcement Learning Path Planning Methods

To further highlight the advantages of the proposed hybrid NRBO-ACO framework, we briefly compare it with recent deep reinforcement learning (DRL)-based path planning methods. Many DRL approaches, such as Soft Actor-Critic (SAC), Proximal Policy Optimisation (PPO), and traditional deep Q-network (DQN) variants, have been applied to robot navigation tasks in structured and semi-structured environments.
However, these methods typically rely on end-to-end policy learning, where the DRL model directly maps sensor observations to control actions. While effective in some dynamic scenarios, they often require extensive training episodes, suffer from high sample inefficiency, and may exhibit poor generalisation when the obstacle distribution changes or when faced with rare edge cases.
In contrast, our approach adopts a model-guided adaptation mechanism, where the dueling DQN is not responsible for generating actions directly but rather adapts the internal parameters of the ACO optimiser in real time. This structural decoupling leads to the following:
(1) Higher interpretability, as the search space and path generation are handled by a classical algorithm;
(2) Improved generalisation, as ACO operates consistently across domains and DQN only tunes meta-parameters;
(3) Lower data requirements, as the learning module focuses on parameter adjustment rather than full behaviour control;
(4) Robust convergence, especially in cluttered and dynamically shifting RAS-like environments.
Compared to traditional DQN variants such as P-DQN (which incorporates Prioritized Experience Replay) and N-DQN (which uses parameterised Noise Networks for exploration), our D-DQN method shows superior adaptability (see Section 4.2) while requiring fewer network parameters and providing explicit control over path smoothness and risk avoidance. Moreover, because the proposed framework maintains a separation between planning and learning, it avoids many instabilities found in fully end-to-end RL systems.
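The value/advantage decomposition that distinguishes the dueling architecture can be sketched as follows (a NumPy toy with illustrative layer sizes; in the proposed framework the discrete actions are meta-parameter adjustments for the ACO planner, not motor commands):

```python
import numpy as np

class DuelingHead:
    """Dueling DQN head: a shared ReLU trunk feeds two streams, a scalar
    state value V(s) and per-action advantages A(s, a), recombined as
    Q(s, a) = V(s) + (A(s, a) - mean_a A(s, a));
    the mean subtraction keeps V and A identifiable."""
    def __init__(self, state_dim=8, n_actions=5, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.1, (state_dim, hidden))
        self.Wv = rng.normal(0.0, 0.1, (hidden, 1))
        self.Wa = rng.normal(0.0, 0.1, (hidden, n_actions))

    def q_values(self, s):
        h = np.maximum(0.0, s @ self.W1)   # shared trunk
        v = h @ self.Wv                    # state value, shape (batch, 1)
        a = h @ self.Wa                    # advantages, shape (batch, n_actions)
        return v + (a - a.mean(axis=1, keepdims=True))
```

Because the advantage stream is mean-centred, the per-state mean of the Q-values equals V(s), so the value stream learns how good a state is independently of which parameter adjustment is chosen.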
This hybrid architecture effectively bridges the gap between hand-crafted heuristic planning and data-driven adaptation, providing a flexible and high-performance solution tailored to real-world aquaculture scenarios.

6.4. Practical Implications and Limitations

NRBO-ACO’s successful implementation in aquaculture suggests promising applications in broader agricultural automation, including greenhouse cultivation and livestock management. However, practical deployment requires addressing additional factors like sensor accuracy and communication latency.
While NRBO-ACO demonstrates excellent experimental performance, its computational complexity may pose challenges in large-scale implementations. Future research should investigate parallel computing and hardware acceleration to enhance real-time performance.
Another important limitation lies in the handling of slippery surfaces. In the current implementation, the robot cannot detect slippery zones autonomously; instead, these areas are predefined based on prior environmental knowledge. While this approach is effective in controlled experiments, it limits the system's practical deployment in unstructured or dynamically changing real-world environments. In future work, integrating real-time perception modules, such as vision-based friction estimation or surface classification using tactile sensors, will be essential to enable online adaptation and improve robustness in diverse aquaculture conditions.

7. Conclusions

The AGV introduced in this study demonstrates significant potential to transform management practices within RAS. The hybrid path planning algorithm, integrating NRBO with ACO, effectively mitigates the complexities arising from irregular obstacles and narrow passageways while concurrently facilitating optimised and adaptive path selection through dueling DQN-based dynamic parameter adjustment. Simulation and real-world experimental results conducted across diverse environments substantiate the superiority of the proposed system as regards convergence speed, path quality, and operational adaptability, outperforming traditional methods. The AGV’s proficiency in autonomous feeding and environmental surveillance significantly enhances operational efficiency, establishing it as a transformative tool in modern aquaculture practices. Future endeavours will expand upon this work by exploring collaborative multi-AGV frameworks to further advance sustainable and intelligent aquaculture management practices.

Author Contributions

Conceptualisation, Z.G. and Y.X.; methodology, Z.G. and Y.X.; software, Z.G.; validation, Z.G., J.L., and K.X.; formal analysis, P.W.; investigation, J.L.; resources, Y.X.; data curation, J.L.; writing—original draft preparation, Z.G.; writing—review and editing, Y.X., J.G., and P.W.; visualisation, K.X.; supervision, Y.X.; project administration, J.G.; funding acquisition, Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Special Funds for Basic Scientific Research in Central Universities of China (Grant Nos. 2662024JC003 and 2662025GXPY008), the Technology Innovation Program of Hubei Province (Grant No. 2024BBB054), and the Foundation of the Hubei Province Key Laboratory for Unmanned Underwater Vehicle and Manipulating Technology, Huazhong University of Science and Technology (Grant No. CHK202401).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Data will be made available on request.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Issues in the RAS environment: (a) Irregular obstacle configuration. (b) Slippery surface. (c) Constricted passage.
Figure 1. Issues in the RAS environment: (a) Irregular obstacle configuration. (b) Slippery surface. (c) Constricted passage.
Drones 09 00476 g001
Figure 2. Experimental testing scenarios for AGV in the RAS environment.
Figure 2. Experimental testing scenarios for AGV in the RAS environment.
Drones 09 00476 g002
Figure 3. Integrated framework for path planning using NRBO-ACO and dueling DQN.
Figure 3. Integrated framework for path planning using NRBO-ACO and dueling DQN.
Drones 09 00476 g003
Figure 4. Obstacle threat cost analysis for fish tanks.
Figure 4. Obstacle threat cost analysis for fish tanks.
Drones 09 00476 g004
Figure 5. Distribution and optimisation of path nodes in narrow passages.
Figure 5. Distribution and optimisation of path nodes in narrow passages.
Drones 09 00476 g005
Figure 6. Logical framework of the NRBO-ACO algorithm.
Figure 6. Logical framework of the NRBO-ACO algorithm.
Drones 09 00476 g006
Figure 7. Architecture of the dueling deep Q-network.
Figure 7. Architecture of the dueling deep Q-network.
Drones 09 00476 g007
Figure 8. Random grid maps with varying obstacle densities, ranging from (a) 25 × 25, (b) 70 × 70, (c) 100 × 100.
Figure 8. Random grid maps with varying obstacle densities, ranging from (a) 25 × 25, (b) 70 × 70, (c) 100 × 100.
Drones 09 00476 g008
Figure 9. (a) Comparison of learning speed: dueling DQN vs. alternative approaches. (b) Convergence speed across different task complexities.
Figure 9. (a) Comparison of learning speed: dueling DQN vs. alternative approaches. (b) Convergence speed across different task complexities.
Drones 09 00476 g009
Figure 10. (a) Convergence speed across different task complexities. (b) Computational efficiency across varying task scales.
Figure 11. The success rate and the accuracy of training process. (a) Success rate in 25 × 25 map. (b) Success rate in 70 × 70 map. (c) Accuracy in 25 × 25 map. (d) Accuracy in 70 × 70 map.
Figure 12. The success rate and the accuracy of testing process. (a) Success rate in 25 × 25 map. (b) Success rate in 70 × 70 map. (c) Accuracy in 25 × 25 map. (d) Accuracy in 70 × 70 map.
Figure 13. Path planning outcomes in a random obstacle environment.
Figure 14. Parameter sensitivity analysis for small-scale aquaculture pond (Map 1). (a) ACO, (b) APF-ACO, (c) LF-ACO, (d) NRBO-ACO.
Figure 15. Parameter sensitivity analysis for multi-layer intensive aquaculture workshop (Map 2). (a) ACO, (b) APF-ACO, (c) LF-ACO, (d) NRBO-ACO.
Figure 16. Parameter sensitivity analysis for single obstacle environment (Map 1). (a) ACO, (b) APF-ACO, (c) LF-ACO, (d) NRBO-ACO.
Figure 17. Parameter sensitivity analysis for five obstacles environment (Map 2). (a) ACO, (b) APF-ACO, (c) LF-ACO, (d) NRBO-ACO.
Figure 19. Environmental configuration for the first aquaculture experiment.
Figure 20. Path planning results in simulated aquaculture Environment 1. (a) A*, (b) Dijkstra, (c) NRBO-ACO.
Figure 21. Environmental setup for the second aquaculture experiment.
Figure 22. Path planning results in simulated aquaculture Environment 2. (a) A*, (b) Dijkstra, (c) NRBO-ACO.
Figure 23. Environment and path target setup for the third aquaculture experiment.
Figure 24. Path planning results in simulated aquaculture Environment 3. (a) A*, (b) Dijkstra, (c) NRBO-ACO.
Table 1. Learning phases and time steps for dueling DQN evaluation.
| Learning Phase | Time Steps | Phase Description |
|---|---|---|
| Short-term | 10 | Initial learning phase |
| Medium-term | 20 | Intermediate learning phase |
| Long-term | 50 | Long-term learning phase |
Table 2. Task complexity and grid size for path planning experiments.
| Task Complexity | Grid Size | PII | Obstacle Density | Path Irregularity |
|---|---|---|---|---|
| Simple | 25 × 25 | 0.21 | Low-density | Low irregularity |
| Moderate | 70 × 70 | 0.28 | Medium-density | Medium irregularity |
| Complex | 100 × 100 | 0.34 | High-density | High irregularity |
Table 3. Result of cross-scale evaluation for D-DQN.
| Model | Map Size | Success Rate (%) | Accuracy (%) |
|---|---|---|---|
| DQN | 25 × 25 | 97.33 | 99.35 |
| DQN | 70 × 70 | 95.68 | 99.15 |
| P-DQN | 25 × 25 | 98.10 | 99.58 |
| P-DQN | 70 × 70 | 96.47 | 99.20 |
| N-DQN | 25 × 25 | 98.51 | 99.66 |
| N-DQN | 70 × 70 | 98.65 | 99.45 |
| D-DQN | 25 × 25 | 99.65 | 99.97 |
| D-DQN | 70 × 70 | 98.85 | 99.75 |
Table 4. Initial parameter settings for ant colony algorithm.
| Parameter | Value | Parameter | Value |
|---|---|---|---|
| Number of ants per metre/m | 500 | Pheromone enhancement coefficient | 10 |
| Initial pheromone value | 1.5 | Heuristic factor | 2 |
| Global pheromone evaporation coefficient | 0.9 | Ant movement step length | 30 |
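For context, the values in Table 4 plug into the standard ACO transition and pheromone-update rules. The following is a minimal sketch under that standard formulation, not the paper's implementation: the pheromone-influence exponent `ALPHA` is an assumption (Table 4 does not list one), and the evaporation coefficient 0.9 is treated here as a retention factor.

```python
import random

# Values from Table 4 (names per the standard ACO formulation; this sketch
# is illustrative, not the paper's implementation).
ALPHA = 1.0   # pheromone influence exponent (assumed; not listed in Table 4)
BETA = 2.0    # heuristic factor
RHO = 0.9     # global pheromone evaporation coefficient (used as retention)
Q = 10.0      # pheromone enhancement coefficient
TAU0 = 1.5    # initial pheromone value

def choose_next(current, candidates, tau, eta):
    """Roulette-wheel selection: p(j) proportional to tau^alpha * eta^beta."""
    weights = [(tau[(current, j)] ** ALPHA) * (eta[j] ** BETA) for j in candidates]
    r = random.uniform(0, sum(weights))
    acc = 0.0
    for j, w in zip(candidates, weights):
        acc += w
        if acc >= r:
            return j
    return candidates[-1]

def update_pheromone(tau, tours):
    """Evaporate all edges, then deposit Q / tour_length on each edge used."""
    for edge in tau:
        tau[edge] *= RHO
    for tour, length in tours:
        for edge in zip(tour, tour[1:]):
            tau[edge] = tau.get(edge, TAU0) + Q / length
    return tau
```

With these rules, a 10 m tour deposits Q / 10 = 1.0 units of pheromone on each of its edges after the evaporation pass.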
Table 5. Simulation data for random obstacle environment.
| Map Size | Algorithm | Avg. Path Length (m) | Std. Path Length (m) | Avg. Runtime (s) | Std. Runtime (s) | Avg. Iterations | Avg. Corners |
|---|---|---|---|---|---|---|---|
| 25 × 25 | APF-ACO | 34.21 | 2.14 | 0.81 | 0.12 | 16 | 7 |
| 25 × 25 | LF-ACO | 33.07 | 1.88 | 1.92 | 0.15 | 30 | 7 |
| 25 × 25 | NRBO-ACO | 32.03 | 1.29 | 0.72 | 0.05 | 13 | 6 |
| 70 × 70 | APF-ACO | 54.94 | 3.65 | 1.95 | 0.21 | 51 | 9 |
| 70 × 70 | LF-ACO | 58.03 | 2.97 | 4.01 | 0.36 | 72 | 10 |
| 70 × 70 | NRBO-ACO | 53.01 | 2.11 | 0.97 | 0.05 | 32 | 8 |
| 100 × 100 | APF-ACO | 136.12 | 5.78 | 3.99 | 0.41 | 86 | 10 |
| 100 × 100 | LF-ACO | 158.89 | 7.24 | 8.11 | 0.51 | 94 | 7 |
| 100 × 100 | NRBO-ACO | 134.15 | 3.65 | 3.08 | 0.11 | 68 | 7 |
Table 6. Supplementary smoothness metrics over 100 five-obstacle scenarios.
| Algorithm | Avg. Jerk (m/s³) | Max Curvature (rad/m) |
|---|---|---|
| APF-ACO | 1.97 | 0.58 |
| LF-ACO | 1.72 | 0.44 |
| ACO | 2.35 | 0.65 |
| NRBO-ACO | 0.91 | 0.29 |
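The smoothness metrics in Table 6 can be approximated from a sampled trajectory by finite differences. The sketch below is one plausible computation, not the paper's exact procedure: the fixed sampling interval `dt` and the finite-difference scheme are assumptions.

```python
import math

def smoothness_metrics(points, dt=0.1):
    """Approximate average jerk magnitude and maximum curvature of a 2-D
    path sampled at a fixed interval dt, using successive first differences.

    points: list of (x, y) samples; dt: assumed sampling interval in seconds.
    """
    vel = [((x2 - x1) / dt, (y2 - y1) / dt)
           for (x1, y1), (x2, y2) in zip(points, points[1:])]
    acc = [((vx2 - vx1) / dt, (vy2 - vy1) / dt)
           for (vx1, vy1), (vx2, vy2) in zip(vel, vel[1:])]
    jerk = [((ax2 - ax1) / dt, (ay2 - ay1) / dt)
            for (ax1, ay1), (ax2, ay2) in zip(acc, acc[1:])]
    avg_jerk = sum(math.hypot(jx, jy) for jx, jy in jerk) / len(jerk)

    # Curvature kappa = |x'y'' - y'x''| / (x'^2 + y'^2)^(3/2)
    max_curv = 0.0
    for (vx, vy), (ax, ay) in zip(vel, acc):
        speed = math.hypot(vx, vy)
        if speed > 1e-9:
            max_curv = max(max_curv, abs(vx * ay - vy * ax) / speed ** 3)
    return avg_jerk, max_curv
```

A straight line traversed at constant speed yields zero for both metrics, while sharp turns raise both the jerk and the curvature values.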
Table 7. Performance comparison of algorithms (30 independent trials).
| Algorithm | Map 1 AEI | Map 1 EDI (%) | Map 2 AEI | Map 2 EDI (%) |
|---|---|---|---|---|
| ACO | 0.82 | 9.7 | 1.24 | 18.3 |
| LF-ACO | 0.78 | 7.2 | 1.13 | 15.6 |
| APF-ACO | 0.73 | 5.9 | 1.05 | 12.4 |
| NRBO-ACO | 0.61 | 3.8 | 0.89 | 6.2 |
Table 8. Cross-scenario comparison (single and five obstacles).
| Algorithm | Scenario | Trajectory Smoothness (σ/m) | Maximum Position Error (m) | Obstacle Distance (m) | Computation Time (s) |
|---|---|---|---|---|---|
| ACO | Single Obstacle | 0.10 | 0.15 | 0.15 ± 0.02 | 0.21 |
| ACO | Five Obstacles | 0.12 | 0.31 | 0.18 ± 0.03 | 0.37 |
| LF-ACO | Single Obstacle | 0.07 | 0.12 | 0.18 ± 0.03 | 0.25 |
| LF-ACO | Five Obstacles | 0.09 | 0.18 | 0.20 ± 0.04 | 0.40 |
| APF-ACO | Single Obstacle | 0.09 | 0.10 | 0.20 ± 0.04 | 0.29 |
| APF-ACO | Five Obstacles | 0.11 | 0.25 | 0.22 ± 0.05 | 0.45 |
| NRBO-ACO | Single Obstacle | 0.02 | 0.08 | 0.06 ± 0.01 | 0.15 |
| NRBO-ACO | Five Obstacles | 0.05 | 0.12 | 0.08 ± 0.02 | 0.25 |
Table 9. Parameters of the path planning algorithm based on AGV.
Table 9. Parameters of the path planning algorithms deployed on the AGV.
| Parameter | NRBO-ACO | A* | Dijkstra | TEB |
|---|---|---|---|---|
| Map resolution (m/cell) | 0.1 | 0.1 | 0.1 | 0.1 |
| Pheromone influence | 1.2 | - | - | - |
| Heuristic factor | 2.5 | - | - | - |
| Pheromone evaporation rate | 0.4 | - | - | - |
| Local planning frequency (Hz) | - | - | - | 10 |
| Slippery zone penalty weight | - | - | - | - |
| Maximum iterations | 500 | 500 | - | - |
| Friction coefficient threshold | 0.3 | - | - | - |
| Step size for Newton–Raphson | 0.02 | - | - | - |
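Table 9's "Step size for Newton–Raphson" enters the NRBO position update. As a hedged illustration only (the paper's exact update rule is not reproduced here), a damped Newton–Raphson step on a scalar cost uses the listed step size 0.02 as a damping factor, with the listed maximum of 500 iterations:

```python
def newton_raphson_step(x, grad, hess, step=0.02):
    """One damped Newton-Raphson update: x <- x - step * f'(x) / f''(x).

    `step` mirrors Table 9's step-size entry; treating it as a damping
    factor is an assumption made for illustration.
    """
    if abs(hess) < 1e-12:  # guard against a near-zero second derivative
        return x
    return x - step * grad / hess

def minimise(f_grad, f_hess, x0, iters=500):  # 500 = max iterations in Table 9
    """Iterate the damped step from x0; returns the final estimate."""
    x = x0
    for _ in range(iters):
        x = newton_raphson_step(x, f_grad(x), f_hess(x))
    return x
```

For f(x) = (x − 3)², each step scales the error (x − 3) by 0.98, so 500 iterations bring the estimate within about 10⁻⁴ of the minimiser; a small step trades convergence speed for stability near obstacles.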
Table 10. Performance comparison of path planning algorithms in Environment 1.
| Algorithm | Average Path Length (m) | Average Running Time (s) | Average Corners |
|---|---|---|---|
| A* | 32.4 | 108 | 4 |
| Dijkstra | 30.2 | 100.7 | 4 |
| NRBO-ACO | 28.1 | 70.25 | 2 |
Table 11. Performance comparison of path planning algorithms in Environment 2.
| Algorithm | Average Path Length (m) | Average Running Time (s) | Average Corners |
|---|---|---|---|
| A* | 9.5 | 35.7 | 3 |
| Dijkstra | - | - | - |
| NRBO-ACO | 8.6 | 28.6 | 1 |
Table 12. Performance comparison of path planning algorithms in Environment 3.
| Algorithm | Average Path Length (m) | Average Running Time (s) | Average Corners |
|---|---|---|---|
| A* | 61.2 | 160.8 | 14 |
| Dijkstra | 59.8 | 154.2 | 11 |
| NRBO-ACO | 56.4 | 121.5 | 5 |