1. Introduction
The decline in water quality across oceans, rivers, and lakes stems from pressures acting at multiple scales: diffuse nutrient inputs from agriculture, urban and industrial effluents, emerging contaminants, and climate forcing that intensifies hydrometeorological extremes [
1,
2]. In this context, global assessments that integrate land use, hydrology, and water quality show that clean-water scarcity triples when pollution is included in availability models, reshaping management priorities across thousands of sub-basins [
3].
Among emerging contaminants, microplastics have shifted from a surface issue to a full water-column problem: a synthesis of 1885 vertical profiles (2014–2024) reports concentrations from ≈
to
particles per
and vertical patterns driven by particle size, density, and mixing dynamics, which shows that surface-only sampling is insufficient [
4]. As a result, there are direct effects on organisms and ecosystems: many marine species mistake these particles for food, leading to contamination risks and adverse impacts that propagate through food webs.
Another driver of degradation is acidification, which reflects a sustained chemical alteration of the water. The ocean absorbs CO$_2$, which, once dissolved, forms carbonic acid, lowers pH, reduces carbonate ions (CO$_3^{2-}$), and decreases the saturation state of calcium carbonate (CaCO$_3$), making calcification more difficult [
5]. The outcome is not a single “toxic” substance but a more corrosive habitat for corals, mollusks, and calcifying plankton, with documented effects on growth and larval survival [
6] as well as increased shell fragility [
7], with knock-on consequences for food webs and fisheries [
8].
In parallel, harmful algal blooms (HABs) have intensified since the 1980s, with a 44% rise when comparing the 2000s to the 2010s across multiple regions, driven by nutrient surpluses linked to urbanization, wastewater, agricultural expansion, and stratification [
9]. This trend calls for early warning systems and continuous observing networks. Documented cases include cyanobacterial blooms in Lake Taihu (China), with recent analyses of environmental drivers [
10], recurrent events in Lake Erie (USA) associated with cyanotoxin risks [
11], and episodes in Lake Okeechobee (USA) linked to nutrient management and releases to urban estuaries [
12]. In South America, Lake Ypacaraí (Paraguay) shows eutrophication and cyanobacterial proliferation under watershed pressures, with environmental and social impacts [
2]. Overall, these examples underscore the urgent need for efficient, continuous water-quality monitoring capable of informing timely environmental management and policy decisions.
Given this urgency, the 2030 Agenda of the United Nations prioritizes clean water and sanitation (SDG 6), promoting comprehensive and consistent monitoring methods to support sustainable management practices [
13]. However, traditional monitoring approaches, such as fixed laboratories and manual sampling, have persistent limitations: high operating costs, limited spatial coverage, staff exposure to hazards, and inflexibility [
14].
To address these limits, autonomous surface vehicles (ASVs) have emerged as a robust option in recent years [
15,
16]. Equipped with specialized sensors, ASVs enable real-time data collection and adaptive environmental monitoring while reducing human exposure. Within the ASV system, Informative Path Planning (IPP) is a key component: it generates trajectories dynamically from collected data to maximize information under energy, time, and sensor constraints [
17].
In recent IPP literature, deep reinforcement learning (DRL) has been used to learn adaptive sensing and monitoring policies that can be executed online after an offline training phase, which is useful under partial observability. Among recent applications, ref. [
18] proposes an adaptive IPP approach that combines tree search with an offline-trained neural network to predict informative sensing actions. In simulation, the method matches benchmark performance while reducing computing time, and it is additionally validated with real surface-temperature data. In a related direction, ref. [
19] presents a framework in which UAVs autonomously acquire training images to retrain semantic segmentation models, reporting performance gains and reduced labeling effort when compared against local planning baselines.
More generally, DRL-based IPP has also been studied with action-space design mechanisms that keep online decision-making tractable, such as constructing a local graph online to restrict available actions while supporting replanning [
20]. A broader synthesis of learning-based adaptive IPP methods, including common design choices and open issues such as scenario coverage and transfer validation, is provided in [
21].
For aquatic monitoring, the work [
22] combines local Gaussian Processes (GPs) with a DRL policy that conditions its decisions on the posterior mean and variance through an information-gain reward. Safety and coordination are handled through a consensus-based heuristic, and the reported results show lower estimation errors compared to alternative monitoring approaches. A related line of work, ref. [
23], formulates continuous water-quality patrolling as a partially observable Markov game for Lake Ypacaraí and adopts a multi-agent deep Q-learning setup with a shared policy for homogeneous ASVs. In that formulation, exploration and intensification are treated as separate phases, and a transition variable controls the shift toward prioritizing highly polluted areas while maintaining revisit behavior.
Swarm-intelligence planners have also shown promise. AquaFeL-PSO [
16] combines Particle Swarm Optimization (PSO), federated learning, and Gaussian Processes (GPs) as a surrogate model in two phases: exploration to estimate an initial model and exploitation that partitions the search space into action zones to refine hotspots. In a case study, the planner achieved a 300% improvement in water-quality modeling and about 4000% in peak detection. In [
24], the authors present a hybrid HGWO–PSO planner that blends Gray Wolf Optimization (GWO) for exploration with PSO for exploitation in static environments. Validation across four obstacle-rich scenarios for a mobile robot shows shorter paths than PSO, GWO, and other heuristic algorithms.
Within Ant Colony Optimization (ACO)-based path planners, several recent contributions stand out. Ref. [
25] introduces MACOGA, which uses ACO to propose feasible routes and an enhanced Genetic Algorithm (GA) to refine them, incorporating pheromones and heuristic factors with adaptive probabilistic crossover/mutation and a reconnection step to ensure length and smoothness. Across six grid models of different sizes and complexity, MACOGA attains shorter paths and planning times, with 100% success in relatively complex settings. In turn, ref. [
26] proposes ADL-ACO (two layers) for dynamic planning and ADWA for real-time obstacle avoidance: the first speeds convergence and improves global search via adaptive parameter tuning, and the second refines length, number of turns, safety, and smoothness using segmented B-splines. In an industrial context, ref. [
27] integrates ACO with Bayesian Optimization (BO) to tune critical parameters and prioritize high-demand regions on CAD-derived maps, improving coverage and reducing waste. For multi-vehicle adaptive ocean sampling, ref. [
28] presents V-ACO, which couples Voronoi partitioning (with tournament selection) and a modified-heuristic ACO to generate collision-free trajectories under mission time, inter-vehicle spacing, and obstacle constraints. Simulations maximize data collection in high-interest regions, and field tests confirm practical feasibility.
In summary, IPP with autonomous vehicles has advanced in quantifying uncertainty and adopting adaptive replanning. Even so, most ACO-based planners remain focused on global/local path planning and do not integrate an informative map derived from water-quality models to guide sampling. Consequently, current ACO approaches remain disconnected from predictive models that use uncertainty information to guide exploration and contamination data to drive exploitation.
To address this gap, this work proposes ACO-Path, an IPP method that combines ACO with a GP as an online surrogate model of the water quality parameters (WQPs) and maintains an exploration–exploitation logic through dynamic action zones. The GP is updated online with ASV measurements and provides two outputs: the posterior mean (contamination map) and the posterior variance (model uncertainty). Building on the notion of action zones introduced in AquaFeL-PSO [
16], ACO-Path defines operational regions from the GP models to steer where ASVs are more likely to transit: high-variance zones promote exploration by densifying sampling where knowledge is weak, while high-mean zones promote exploitation by characterizing hotspots.
This paper is presented as a proof of concept for using ACO as the backbone of an IPP strategy in a multi-ASV setting. In this design, the GP updates the action zones and, through them, the set of admissible candidate locations, while ACO optimizes the route within that set. To keep the contribution focused, the ACO component is kept simple by adopting the classical Ant System as the baseline ACO variant [
29,
30]. More recent ACO variants could be incorporated within the same framework as a follow-up step, but they are not the objective of the present study. The case study is based on the Lake Ypacaraí scenario.
In contrast to DRL pipelines that require extensive training data and careful validation to ensure transfer across scenarios, the present work keeps the planner training-free and uses the GP only to update the action zones and, consequently, the set of candidate locations that ASVs can visit.
The main contributions of this work are as follows:
A new informative path planner that integrates Ant Colony Optimization (Ant System) with an online Gaussian Process (GP) updated throughout the mission, so that waypoint selection is driven by both the estimated contamination (GP mean) and the associated uncertainty (GP variance). The resulting trajectories are smoother, which can reduce the burden on the low-level controller and lower energy demand during execution.
A GP-driven action zone construction that translates the mean and variance maps into candidate sampling regions and implements an explore-then-exploit policy: early decisions prioritize uncertainty reduction, and later decisions focus on informative (potential hotspot) areas using a combined criterion.
This paper is organized as follows.
Section 2 formulates the monitoring problem and summarizes the operational assumptions.
Section 3 presents the methodological foundations of the proposed IPP approach.
Section 4 introduces ACO-Path, the proposed IPP.
Section 5 describes the simulation setup, ground-truth generation, evaluation metrics, parameter settings, and experimental results.
Section 6 discusses the main findings and their implications. Finally,
Section 7 concludes the paper and outlines directions for future work.
4. ACO-Path: Proposed Informative Path Planner
ACO-Path is a novel IPP designed for autonomous environmental monitoring of water resources using fleets of ASVs. The planner addresses a gap in existing ACO-based path planning approaches: while traditional ACO methods excel at finding collision-free geometric paths, they typically do not integrate informative mapping derived from predictive models of the environment.
The key innovation of ACO-Path lies in its integration of three complementary techniques: (1) ACO, specifically the Ant System (AS) variant, for distributed, adaptive path construction with pheromone trails; (2) GP regression as a surrogate model that provides both a contamination map (mean $\mu$) and an uncertainty map (variance $\sigma^2$); (3) Dynamic Action Zones, inspired by AquaFeL-PSO [
16] but adapted for ACO, that segment the monitoring area into regions of interest based on real-time GP outputs.
By coupling these methods, ACO-Path achieves an exploration–exploitation balance: the GP variance guides exploration (sampling where the model is uncertain), while the GP mean guides exploitation (characterizing contamination hotspots).
4.1. Dynamic Action Zones
Action zones are operational regions defined from the GP maps where ASVs should focus their efforts. Unlike AquaFeL-PSO [
16], which generates zones once after an exploration phase, ACO-Path regenerates zones dynamically at every planning cycle to adapt to evolving information.
The incorporation of action zones in ACO-Path serves a dual purpose: (1) to provide a structured candidate set of waypoints for ACO decision-making, and (2) to prevent the ACO algorithm from converging prematurely to local optima.
In classical ACO path planning, artificial ants explore the entire graph uniformly, which can lead to two undesirable behaviors. First, ants may become trapped in locally attractive regions (e.g., nearby nodes with high pheromone concentration) without exploring distant areas that could yield higher information gain [
37]. Second, the search space grows combinatorially with the number of nodes, making exhaustive exploration computationally prohibitive for large environments such as Lake Ypacaraí [
38].
Action zones address these limitations by dynamically filtering the search space. Instead of evaluating all navigable cells as potential targets, ACO constructs paths among the centers of action zones, a much smaller set of high-value candidate waypoints derived from the GP model.
ACO-Path defines two types of zones based on GP outputs:
- 1.
Exploration Zones. Regions with high GP variance where model uncertainty is elevated. The goal is to reduce uncertainty by densifying sampling in unexplored areas.
Criterion: Cells whose variance exceeds an upper fraction of $\sigma^2_{\max}$ define high-priority exploration regions, and cells whose variance exceeds a lower fraction define medium-priority regions, where $\sigma^2_{\max}$ is the current maximum variance.
Centers: Local maxima of $\sigma^2$.
- 2.
Exploitation Zones. Regions with high GP mean where contamination is likely. The goal is to characterize hotspots with detailed sampling.
Criterion: Cells whose mean exceeds an upper fraction of $\mu_{\max}$ define high-priority exploitation regions (risk level), and cells whose mean exceeds a lower fraction define medium-priority regions (warning level), where $\mu_{\max}$ is the current maximum mean.
Centers: Local maxima of $\mu$.
It is worth noting that the boundaries of each priority region are defined following the approach described in AquaFeL-PSO [
16].
Figure 2 summarizes how action zones are generated and then used by ACO-Path. First, the exploration stage fits a GP over the water resource (Figure 2a), yielding a mean map $\mu$ (contamination estimate) and a variance map $\sigma^2$ (model uncertainty). Next, the planner thresholds these maps to flag regions of interest and instantiates non-overlapping circular action zones sized for the fleet (Figure 2b). Finally, the center of each action zone is extracted (Figure 2c) and provided to the ACO module at time $t$.
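The zone-generation steps above (threshold a GP map, keep local maxima, place non-overlapping circular zones) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the threshold fraction `min_frac`, the zone `radius`, the greedy strongest-first placement, and the toy `variance_map` are all assumptions introduced here.

```python
from math import hypot


def local_maxima(grid):
    """Return (row, col, value) for cells strictly greater than all 8-neighbours."""
    rows, cols = len(grid), len(grid[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            neighbours = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(grid[r][c] > n for n in neighbours):
                peaks.append((r, c, grid[r][c]))
    return peaks


def action_zone_centers(grid, min_frac=0.33, radius=1.5):
    """Greedy selection of non-overlapping circular zone centers.

    min_frac (threshold as a fraction of the map maximum) and radius (in cells)
    are illustrative assumptions, not values taken from the paper.
    """
    v_max = max(max(row) for row in grid)
    candidates = [p for p in local_maxima(grid) if p[2] >= min_frac * v_max]
    candidates.sort(key=lambda p: p[2], reverse=True)  # strongest peaks first
    centers = []
    for r, c, _ in candidates:
        # Equal-radius circles do not overlap if centers are >= 2 * radius apart.
        if all(hypot(r - cr, c - cc) >= 2 * radius for cr, cc in centers):
            centers.append((r, c))
    return centers


# Toy variance map (made-up numbers) with two uncertainty peaks.
variance_map = [
    [0.1, 0.2, 0.1, 0.1, 0.1],
    [0.2, 0.9, 0.2, 0.1, 0.1],
    [0.1, 0.2, 0.1, 0.1, 0.2],
    [0.1, 0.1, 0.1, 0.2, 0.8],
    [0.1, 0.1, 0.1, 0.1, 0.2],
]
exploration_centers = action_zone_centers(variance_map)
print(exploration_centers)
```

Applying the same function to the mean map would give the exploitation centers; enlarging `radius` merges nearby peaks into a single zone.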
4.2. Path Planner
Figure 3 outlines the workflow of the proposed path planner. The main steps are as follows:
- (1)
Initialization
Set the IPP parameters (number of ants and iterations, $\alpha$, $\beta$, and $\rho$ for pheromone dynamics, among others).
- (2)
Sensing and model update
At their current waypoints, ASVs acquire WQP measurements and update the GP, with the following results: the mean (contamination) and the variance (uncertainty) maps.
- (3)
Action zone generation
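After updating the GP, action zones are recomputed on both maps. Let $C_\mu$ be the set of centers from the mean map and $C_\sigma$ be the set from the uncertainty map. The candidate set $C$ is defined as
$$C = \begin{cases} C_\sigma, & d < d_{\mathrm{exp}} \\ C_\mu \cup C_\sigma, & d \ge d_{\mathrm{exp}} \end{cases}$$
where $d$ is the cumulative traveled distance and $d_{\mathrm{exp}}$ is the exploration distance threshold. Early in the mission, data are scarce, and mean peaks tend to lie near the current positions of the ASVs, which can trigger premature exploitation of a single region. Temporarily restricting candidates to $C_\sigma$ encourages coverage of poorly sampled areas. Once $d_{\mathrm{exp}}$ is exceeded, both sets are used within the ACO to balance exploration and exploitation.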
- (4)
ACO routing over action zones
Given the admissible set of action zone centers $C$, the routing problem is defined on the graph induced by these centers (nodes), with edge weights given by the travel distance between centers. Ants then construct candidate routes by sampling transitions according to pheromone information $\tau$ and a distance-based heuristic $\eta$. After all ants finish, pheromones evaporate and are reinforced according to route quality.
The best-ranked solution yields an assignment of each ASV to a target center and an ordered sequence of centers to be visited. In ACO-Path, however, this plan is executed in a receding-horizon manner. Only the first center of the selected route is used as the next waypoint for the first vehicle. This is illustrated in
Figure 4, where the first waypoint of the best route is highlighted with a red box. The assigned center is then removed from the candidate set and the ACO step is run again for the next vehicle to avoid selecting the same target. After each ASV reaches its assigned center and collects new measurements, the GP is updated, action zones and their centers are recomputed, and the ACO routing step is executed again to select the next waypoint under the updated belief.
- (5)
Measurement acquisition
During the mission, ASVs move toward their assigned zones and acquire new measurements only when the traveled distance between consecutive samples reaches the adaptive sampling distance $l$ [35] (Equation (12)):
$$l = \lambda\,\ell_t \qquad (12)$$
where $\lambda$ is a scaling factor that controls how strict the sampling spacing is, and $\ell_t$ is the current GP length scale at time $t$. This choice provides enough measurements for modeling while avoiding oversampling and unnecessary GP updates.
- (6)
Stopping criterion
Consistent with Assumption 3, the monitoring mission terminates once the average per-vehicle path length reaches the maximum mission distance.
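The routing step (4) can be illustrated with a minimal Ant System sketch over a handful of zone centers. It assumes the textbook AS rule, in which the probability of moving from node $i$ to node $j$ is proportional to $\tau_{ij}^{\alpha}\,\eta_{ij}^{\beta}$ with $\eta_{ij} = 1/d_{ij}$, followed by evaporation and length-based deposit. The parameter values, the coordinates, and all function names are hypothetical and chosen only for illustration.

```python
import random
from math import hypot


def distance_matrix(points):
    """Pairwise Euclidean distances, floored to avoid division by zero."""
    return [[max(hypot(ax - bx, ay - by), 1e-9) for bx, by in points]
            for ax, ay in points]


def transition_probabilities(current, unvisited, tau, dist, alpha, beta):
    """Ant System rule: p_ij proportional to tau_ij^alpha * (1/d_ij)^beta."""
    weights = {j: tau[current][j] ** alpha * (1.0 / dist[current][j]) ** beta
               for j in unvisited}
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def build_route(start, tau, dist, alpha, beta, rng):
    """One ant visits every node once, sampling each move from the AS rule."""
    route, unvisited = [start], set(range(len(dist))) - {start}
    while unvisited:
        probs = transition_probabilities(route[-1], unvisited, tau, dist, alpha, beta)
        nodes = sorted(probs)
        nxt = rng.choices(nodes, weights=[probs[j] for j in nodes])[0]
        route.append(nxt)
        unvisited.remove(nxt)
    return route


def route_length(route, dist):
    return sum(dist[a][b] for a, b in zip(route, route[1:]))


def ant_system(points, alpha=1.0, beta=2.0, rho=0.5, n_ants=10, n_iter=20, seed=0):
    """Node 0 is the ASV position; the remaining nodes are zone centers.

    All default parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    dist = distance_matrix(points)
    n = len(points)
    tau = [[1.0] * n for _ in range(n)]
    best = None
    for _ in range(n_iter):
        routes = [build_route(0, tau, dist, alpha, beta, rng) for _ in range(n_ants)]
        for row in tau:  # evaporation
            for j in range(n):
                row[j] *= 1.0 - rho
        for route in routes:  # reinforcement: shorter routes deposit more
            deposit = 1.0 / route_length(route, dist)
            for a, b in zip(route, route[1:]):
                tau[a][b] += deposit
                tau[b][a] += deposit
        candidate = min(routes, key=lambda r: route_length(r, dist))
        if best is None or route_length(candidate, dist) < route_length(best, dist):
            best = candidate
    return best


points = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (2.0, 1.0)]  # made-up coordinates
best_route = ant_system(points)
next_waypoint = points[best_route[1]]  # receding horizon: only the first hop is used
print(best_route, next_waypoint)
```

Only the first hop of the best route is kept, mirroring the receding-horizon execution described in step (4); the rest of the route is discarded and replanned after the next GP update.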
5. Experiments
This section describes the experimental setup, the ground-truth generation procedure, the evaluation metrics, and the parameter settings for both the planner and the simulation environment. All methods are evaluated under the same fleet configuration and mission distance to ensure a fair comparison.
5.1. Setup
The proposed IPP (
https://github.com/Natitesis/ACO_Nati.git, accessed on 1 February 2026) was implemented in Python 3.10 using NumPy 1.19.5, Pandas 1.2.0, Matplotlib 3.3.3, scikit-learn 0.23.2, and SciPy 1.5.4. All simulations ran on a workstation with an 11th-gen Intel Core i7@2.80 GHz, 16 GB RAM, and a 64-bit OS.
5.2. Case Study
Lake Ypacaraí was adopted as the case study due to its documented eutrophication and contamination pressures from sewage and agricultural runoff [
2]. This water resource is the largest lake of Paraguay (about
), fed mainly by Pirayú (SE) and Yukyry (NW) streams, with the Salado river as its natural outlet [
39].
The simulated search area follows the discretization in Figure 1. Since in situ WQP fields are unavailable, the ground-truth contamination map is synthesized with a multimodal Shekel function:
$$f(\mathbf{x}) = \sum_{i=1}^{M} \frac{1}{c_i + \sum_{j=1}^{N} (x_j - a_{ij})^2}$$
where $M$ is the number of peaks, $N$ is the number of spatial dimensions, $A = [a_{ij}]$ sets peak locations, and $c$ controls peak prominence. For evaluation, 10 ground-truth maps were generated with 2–4 peaks over the search area. Peak locations and the entries of $c$ were drawn at random, producing multimodal contamination fields with varying hotspot prominence. An example ground truth is shown in Figure 5. In these tests, no cross-WQP correlation is modeled.
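As a rough illustration of how such ground truths can be produced, the sketch below evaluates a two-dimensional Shekel function on a grid with 2 to 4 randomly placed peaks. The grid size, the sampling range for the prominence terms, and the function names are assumptions made for this example; the lake mask is ignored.

```python
import random


def shekel(x, y, peaks, c):
    """2-D Shekel function: sum over peaks of 1 / (c_i + squared distance)."""
    return sum(1.0 / (ci + (x - ax) ** 2 + (y - ay) ** 2)
               for (ax, ay), ci in zip(peaks, c))


def random_ground_truth(width, height, rng, min_peaks=2, max_peaks=4):
    """Draw peak locations and prominence terms at random, then fill the grid.

    The range used for c (0.05 to 0.5) is an illustrative assumption.
    """
    n_peaks = rng.randint(min_peaks, max_peaks)
    peaks = [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n_peaks)]
    c = [rng.uniform(0.05, 0.5) for _ in range(n_peaks)]
    grid = [[shekel(col, row, peaks, c) for col in range(width)]
            for row in range(height)]
    return grid, peaks, c


grid, peaks, c = random_ground_truth(width=20, height=15, rng=random.Random(1))
print(len(peaks), max(max(row) for row in grid))
```

A smaller $c_i$ gives a taller, sharper peak, since the value exactly at peak $i$ is at least $1/c_i$.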
5.3. Evaluation Metrics
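The analysis reports the following: (i) mean squared error over the lake grid,
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
where $y$ is the ground truth and $\hat{y}$ the GP estimate, (ii) absolute error at contamination peaks,
$$\mathrm{Error}_{\mathrm{peak}} = \left| y_{\mathrm{peak}} - \hat{y}_{\mathrm{peak}} \right|$$
(iii) the coefficient of determination $R^2$,
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where $\bar{y}$ is the mean of the ground-truth values over the evaluated grid cells.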
Together, these metrics characterize both global map reconstruction and hotspot fidelity, providing a consistent and quantitative basis for comparing planner performance across methods.
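The three metrics can be computed directly from flattened ground-truth and estimated maps. The following sketch uses the standard definitions of MSE and $R^2$ and takes the peak error as the absolute difference at a given peak cell; the function names and the sample values are illustrative only.

```python
def mse(y_true, y_pred):
    """Mean squared error over all evaluated cells."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def peak_error(y_true, y_pred, peak_index):
    """Absolute error at one contamination peak (index into the flattened map)."""
    return abs(y_true[peak_index] - y_pred[peak_index])


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up flattened maps, for illustration only.
truth = [1.0, 2.0, 3.0, 4.0]
estimate = [2.0, 2.0, 3.0, 3.0]
print(mse(truth, estimate), peak_error(truth, estimate, 3), r2_score(truth, estimate))
```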
5.4. Statistical Significance Analysis
To complement the comparison of the evaluation metrics, statistical tests are used to determine whether the differences observed in MSE, peak error, and $R^2$ reflect consistent performance differences rather than variability between scenarios. First, a one-way analysis of variance (ANOVA) is performed independently for each metric to test the null hypothesis that all planners have the same mean performance ($H_0$). In one-way ANOVA, evidence against $H_0$ is summarized by the $F$ statistic, which compares the variance explained by differences between planners with the residual variance within planners [40]:
$$F = \frac{\mathrm{MS}_{\mathrm{between}}}{\mathrm{MS}_{\mathrm{within}}} = \frac{\mathrm{SS}_{\mathrm{between}}/(k-1)}{\mathrm{SS}_{\mathrm{within}}/(N-k)}$$
where $k$ is the number of planners and $N$ is the total number of observations. All tests are performed at a fixed significance level, and differences are considered statistically significant when the $p$-value falls below that level [40].
Since ACO-Path and AquaFeL-PSO [16] are evaluated on the same set of reference scenarios, their direct comparison is further evaluated using a paired $t$-test [41]. For each scenario $i$, the difference is defined as $d_i = x_i^{\mathrm{ACO\text{-}Path}} - x_i^{\mathrm{AquaFeL\text{-}PSO}}$, where $x_i$ is the value of the metric under test. Next, the paired $t$ statistic is calculated as follows:
$$t = \frac{\bar{d}}{s_d / \sqrt{n}}$$
where $\bar{d}$ is the mean of the differences, $s_d$ is the standard deviation of the differences, $n$ is the number of paired scenarios, and $n - 1$ represents the degrees of freedom.
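Both test statistics are simple enough to compute by hand, which helps when checking library output. The sketch below implements the one-way ANOVA $F$ statistic and the paired $t$ statistic from their textbook definitions; the sample numbers are invented for illustration and are not results from this study.

```python
from math import sqrt
from statistics import mean, stdev


def one_way_anova_f(groups):
    """F = (SS_between / (k - 1)) / (SS_within / (N - k)) for k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def paired_t(a, b):
    """Paired t statistic and degrees of freedom for matched samples a and b."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # stdev() uses the sample (n - 1) denominator, as the paired test requires.
    return mean(diffs) / (stdev(diffs) / sqrt(n)), n - 1


# Invented per-scenario errors for three hypothetical planners.
planner_a = [1.0, 2.0, 3.0]
planner_b = [2.0, 3.0, 4.0]
planner_c = [5.0, 6.0, 7.0]
f_stat = one_way_anova_f([planner_a, planner_b, planner_c])
t_stat, dof = paired_t([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 1.0, 3.0])
print(f_stat, t_stat, dof)
```

In practice, `scipy.stats.f_oneway` and `scipy.stats.ttest_rel` return the same statistics together with their $p$-values.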
5.5. Parameter Settings
The classical AS is used as the baseline ACO variant [29, 31]. Parameter ranges follow the literature (Table 1). Following prior studies, the pheromone weight $\alpha$ is kept fixed, and the remaining ACO hyperparameters are taken from reported settings to maintain comparability. Because the heuristic weight $\beta$ shows larger variability across related works and directly changes the bias of the transition rule, a focused sensitivity check is carried out by evaluating two values of $\beta$ while keeping the other parameters unchanged. The pheromone evaporation rate $\rho$ and the number of ants are likewise held fixed.
The environment and fleet configuration are summarized in Table 2 and follow the settings in [16, 35]. The fleet includes four ASVs, whose maximum speed is listed in Table 2. The grid resolution is 100 m × 100 m. The sampling distance is $l = \lambda \ell$, with $\lambda$ taken from [16] and the GP length scale $\ell$ set as in [35]. Missions terminate when each ASV reaches its distance budget of 20 km, consistent with Assumption 3.
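The distance-triggered sampling rule (a measurement is taken once the distance traveled since the previous sample reaches the scaling factor times the GP length scale) can be sketched as below. The values of `lam` and `length_scale` and the straight-line path are placeholders chosen for the example, not the settings used in the experiments.

```python
from math import hypot


def measurement_indices(path, lam, length_scale):
    """Return the path indices at which a new measurement is triggered.

    A sample is taken when the distance accumulated since the last sample
    reaches l = lam * length_scale; the accumulator is then reset.
    """
    threshold = lam * length_scale
    taken, since_last = [], 0.0
    for i in range(1, len(path)):
        (x0, y0), (x1, y1) = path[i - 1], path[i]
        since_last += hypot(x1 - x0, y1 - y0)
        if since_last >= threshold:
            taken.append(i)
            since_last = 0.0
    return taken


# Straight path with unit steps; lam and length_scale are placeholder values.
path = [(float(i), 0.0) for i in range(11)]
samples = measurement_indices(path, lam=0.5, length_scale=5.0)
print(samples)
```

Because the threshold scales with the length scale, a smoother field (larger length scale) leads to sparser sampling and fewer GP updates.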
5.6. Evaluation of the Proposed System
This section assesses the proposed IPP under varying exploration distances and ACO settings. Performance is analyzed on the contamination map and the uncertainty map, and with quantitative metrics reported later.
5.6.1. Results with and
Three exploration distances were tested (5 km, 10 km, and 15 km), following prior practice in [16]. ACO-Path uses the GP outputs to generate action zones. At each iteration, it selects the next zone center using AS. The decision rule is adaptive: while the cumulative traveled distance is below the exploration threshold, only uncertainty-driven centers ($C_\sigma$) are admissible. Once the threshold is reached, both mean and uncertainty centers ($C_\mu \cup C_\sigma$) are considered. Newly acquired WQP measurements update the GP online, reducing local uncertainty and reshaping subsequent action zones.
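The adaptive decision rule and the one-target-per-vehicle assignment can be sketched as follows. The nearest-center selector is a deliberately simple stand-in for the ACO routing step, used only to keep the example short; the function names and coordinates are hypothetical.

```python
from math import hypot


def admissible_centers(traveled, d_exp, centers_var, centers_mean):
    """Uncertainty centers only before the threshold, both sets afterwards."""
    if traveled < d_exp:
        return list(centers_var)
    merged = list(centers_var)
    merged += [c for c in centers_mean if c not in merged]
    return merged


def nearest_center(position, candidates):
    """Placeholder selector standing in for the ACO routing step."""
    return min(candidates,
               key=lambda c: hypot(c[0] - position[0], c[1] - position[1]))


def assign_targets(positions, candidates, select=nearest_center):
    """Assign one target per vehicle, removing each chosen center from the pool."""
    remaining = list(candidates)
    targets = []
    for pos in positions:
        if not remaining:
            targets.append(None)  # fewer centers than vehicles
            continue
        choice = select(pos, remaining)
        remaining.remove(choice)
        targets.append(choice)
    return targets


centers_var = [(0, 2), (5, 5)]   # made-up uncertainty-driven centers
centers_mean = [(1, 1), (5, 5)]  # made-up mean-driven centers
early = admissible_centers(3.0, 5.0, centers_var, centers_mean)
late = admissible_centers(6.0, 5.0, centers_var, centers_mean)
targets = assign_targets([(0, 0), (0, 1)], early)
print(early, late, targets)
```

Removing each assigned center before the next vehicle is served is what prevents two ASVs from being sent to the same zone.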
Figure 6 summarizes the final models after the monitoring task. For each exploration distance, the upper subfigure (orange scale) shows the GP predictive uncertainty map, and the lower subfigure shows the GP predictive mean (the estimated WQP or contamination map). In the uncertainty maps, darker regions indicate areas that were insufficiently explored and thus remain poorly informed. The ASV trajectories are also overlaid: each colored polyline denotes the path of a different ASV. The black dot marks the initial position and the red dot marks the final position at the end of the mission. In the mean maps, higher values indicate the more contaminated regions of the water body.
At runtime, the ACO variant AS assigns the next action zone center for each ASV using the current ASV position and a set of candidate action zone centers derived from the GP uncertainty alone or from both the GP mean and the GP uncertainty (depending on whether the $C_\sigma$ or the $C_\mu \cup C_\sigma$ criterion is active). After assigning targets, ASVs move toward their corresponding zones, and ACO-Path accumulates traveled distance per vehicle $p$. Measurements collected by the on-board sensors are used to update the GP online, which in turn refines both the mean estimate of contamination and the uncertainty field, especially in newly sampled regions.
When the exploration distance is 5 km (Figure 6a), the ASV trajectories cover substantial portions of the map, with emphasis around contamination peaks. ASVs begin by prioritizing high-uncertainty regions to improve the GP model. After reaching the 5 km threshold, the criterion switches to $C_\mu \cup C_\sigma$, which promotes targeted sampling near suspected hotspots while retaining some exploration. Unvisited dark regions in the uncertainty map indicate areas without samples, resulting in higher residual uncertainty. The GP mean successfully recovers the principal contamination peaks (two prominent peaks and a third with smaller elevation), showing qualitative similarity to the ground truth (Figure 5).
With an exploration distance of 10 km (Figure 6b), the uncertainty is notably higher than in the 5 km and 15 km cases. The trajectories do not adequately traverse one of the simulated hotspots (approximately in the southwest region), and the model retains large poorly informed areas, which elevates the uncertainty. Consistently, the GP mean captures only two of the three high peaks. The third one is underestimated, yielding lower resemblance to the ground truth.
With an exploration distance of 15 km (Figure 6c), ASVs undertake longer trajectories before the switch to $C_\mu \cup C_\sigma$, which extends the exploration phase. As a result, the uncertainty is lower than in the 10 km case, and the GP mean recovers all three contamination peaks, closely matching the ground truth. However, because the switch to exploitation is delayed by the longer exploration requirement, there is less early intensification on hotspots compared to the 5 km case.
Table 3 reports the MSE, the peak error, and $R^2$ for the three exploration distances. The 5 km distance achieves the lowest MSE and peak error, indicating the best joint coverage and hotspot exploitation under this parameter setting. In contrast, the 10 km distance yields the largest MSE and peak error among the three settings, aligning with the qualitative observation that one hotspot remained insufficiently sampled. The 15 km distance improves over 10 km, particularly in recovering all three peaks and reducing uncertainty, but remains inferior to 5 km in terms of early exploitation due to the delayed switch from $C_\sigma$ to $C_\mu \cup C_\sigma$.
Regarding $R^2$, the highest value is obtained at 5 km ($R^2 \approx 0.977$), indicating that, on average, the model explains roughly 97.7% of the variance relative to the ground truth in this setting. For 10 km and 15 km, the 95% confidence intervals are wider and the average is lower, reflecting less consistent performance across simulations. In particular, the drop in $R^2$ at 10 km is consistent with excessive exploration and insufficient exploitation, which can introduce redundancy and hinder convergence.
In summary, under this parameter setting, the 5 km distance provides the most favorable balance between exploration and exploitation, yielding the lowest map error and peak error and the highest average $R^2$. The 10 km distance over-explores relative to exploitation, misses a hotspot, and performs worst across metrics. The 15 km distance extends exploration sufficiently to recover all peaks and reduce the uncertainty compared to 10 km, but its delayed exploitation makes it less effective than 5 km for early hotspot intensification.
5.6.2. Simulation Results with and
This subsection reports results for the same exploration distances (5 km, 10 km, and 15 km) using a second parameter setting with a larger heuristic weight $\beta$ (see Table 1). A large $\beta$ strongly amplifies the influence of the distance heuristic in the AS transition rule, i.e., ants have a much higher preference for moving toward nearby action zone centers.
With this large $\beta$, AS prioritizes proximity almost exclusively: it tends to choose the nearest action zone centers. This emphasis reinforces exploitation around already contaminated areas while down-weighting exploration of highly uncertain, distant regions. Due to the high value of $\beta$, ACO-Path assigns lower probabilities to faraway centers even when those areas are known to be under-explored.
Figure 7a shows the uncertainty and mean maps for an exploration distance of 5 km. The largest uncertainty appears near the boundaries of the simulated lake. Because the required exploration distance is short, ACO-Path switches to the $C_\mu \cup C_\sigma$ criterion relatively early. Combined with the high $\beta$, this promotes repeated paths around nearby sectors. Even so, the predictive mean recovers three contamination peaks, achieving an effective balance between exploration and subsequent exploitation. The mean map exhibits noticeable similarity to the ground truth in Figure 5, aside from localized traces of uncertainty visible in Figure 7a.
For an exploration distance of 10 km, the uncertainty and mean maps are shown in Figure 7b. The overall pattern of uncertainty resembles the 5 km case but with persistently high uncertainty in the central and southeastern regions, indicating insufficient coverage there. Nevertheless, the two principal peaks are detected, and the mean retains qualitative similarity to the ground truth. In terms of areal coverage, however, the 10 km setting with the high $\beta$ performs worse than 5 km, reflecting the strong bias toward nearby routes and the reduced variety in exploration.
For an exploration distance of 15 km (Figure 7c), the preference for short moves remains dominant and leads to higher uncertainty compared with the two shorter exploration distances. ASVs continue to select nearby paths, frequently reusing previously traveled routes. The contamination map shows two clear peaks and a third weaker peak, yielding a map that is less consistent with the ground truth than in the shorter-distance cases. Despite the longer exploration distance, the large $\beta$ keeps the next target selection tightly focused on nearby action zone centers. Consequently, the mean estimate does not improve substantially relative to shorter distances, and uncertainty stays elevated over wide areas.
These behaviors indicate that a high $\beta$ causes the ACO to select the shortest path. While this can intensify exploitation near known hotspots, it limits the ability of ACO-Path to explore new regions and to build a more complete picture of the environment. The effect is that some peaks are captured well, but valuable information in other lake regions is missed. The metrics in Table 4 confirm this limitation, with moderate $R^2$ values and higher errors in both the GP map (MSE) and the detected contamination peaks (peak error).
Examining $R^2$ under this setting reveals strong performance at shorter exploration distances. At 15 km, the average drops and the 95% confidence interval widens, indicating substantial variability across simulated runs. This variability is consistent with the sensitivity to proximity between action zone centers, which reduces exploration diversity and can degrade model fit in parts of the map.
5.6.3. Discussion of ACO-Path Evaluation Results
This section reviews the behavior of the proposed IPP under the two $\beta$ settings across the three exploration distances (5 km, 10 km, and 15 km).
With the lower $\beta$, the distance heuristic has a moderate effect. ASVs tend to choose nearby action zone centers, but they can still move to farther ones when the pheromone level supports that choice. As a result, paths are relatively uniform. In principle, an exploration distance of 5 km allows broader coverage. However, the actual coverage depends on how diverse the generated action zone centers are. When the exploration distance is 10 km or 15 km, ASVs often follow similar paths and collect repeated measurements. In the mean maps, this leads to only partial reconstruction of the contamination peaks because the GP receives less diverse measurements.
With the higher $\beta$, the distance heuristic dominates. In the selection rule, the nearest action zone center is chosen most of the time. The pheromone update reinforces this behavior by adding pheromone to short and frequently used paths. Paths become local and are visited repeatedly. Distant areas remain unexplored even if their uncertainty is high. In some cases (10 km and 15 km), the planner still identifies nearby peaks, but the reconstruction is local rather than global.
By contrast, with
and exploration distance ≥ 5 km, the planner reaches a better balance between exploration and exploitation.
Figure 6a shows a consistent reduction in uncertainty over the map and correct estimation of the three main ground truth peaks (
Figure 5). This setting also yields the lowest errors for the full map and the peak estimates, together with an
close to 1, supporting its overall performance (see
Table 3 and
Table 4).
5.7. Comparison with Other Planners
This section compares the best-performing configuration of ACO-Path (identified in the previous section) against five path planners from the literature using four ASVs. The evaluation focuses on three metrics: the , the , and the . All values are reported as mean ± 95% confidence interval over repeated simulations, which also indicates the stability of each planner.
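As a reference for how a mean ± 95% confidence interval can be obtained from repeated runs, the following sketch computes the half-width from the sample standard deviation. It assumes a Student-t interval with 10 runs (critical value 2.262 for 9 degrees of freedom); whether the reported intervals use this or a normal approximation is not stated here, and the function name `mean_with_ci` and the sample scores are hypothetical.

```python
import math
import statistics


def mean_with_ci(samples, critical_value=2.262):
    """Return (mean, half_width) of a two-sided confidence interval.

    The default critical value is the Student-t 97.5% quantile for
    9 degrees of freedom, i.e. a 95% interval for n = 10 runs.
    Pass another value for other sample sizes.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two runs")
    mean = statistics.fmean(samples)
    half_width = critical_value * statistics.stdev(samples) / math.sqrt(n)
    return mean, half_width


# Hypothetical per-run scores for one planner (not values from the tables).
scores = [0.91, 0.88, 0.93, 0.90, 0.86, 0.92, 0.89, 0.94, 0.87, 0.90]
mean, half_width = mean_with_ci(scores)
print(f"{mean:.3f} ± {half_width:.3f}")
```

A wide half-width relative to the mean signals that a planner's performance varies strongly between runs, which is how the intervals are read as a stability indicator.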
The planners are as follows: Lawnmower, which executes a uniform sweep with parallel tracks and does not adapt to information gathered during the mission; Classical PSO, a PSO scheme inspired by collective motion [
51] that explores the space but does not tightly couple waypoint selection to the evolving predictive model; Random Path, which assigns a random heading and directs the vehicles to continue on that course following each measurement; Random Grid, which likewise randomizes headings but restricts them to right angles (
); AquaFeL-PSO [
16], which integrates GP feedback to place waypoints in informative zones; and ACO-Path, which combines an inverse-distance heuristic
with the probabilistic ACO rule and pheromone deposition. In ACO-Path, the candidate action zone centers are generated from the GP and updated as new measurements arrive.
All planners were evaluated under identical conditions: the same discretized environment, fleet size (), initial ASV positions, sampling model, and maximum distance of the mission (20 km per vehicle). For each planner, the GP was updated with the measurements collected along its trajectories to produce the estimated map .
Table 5 reports the quantitative results. AquaFeL-PSO achieves the lowest
and the highest
, indicating the most accurate global reconstruction. ACO-Path achieves the lowest
, showing greater consistency in detecting contamination hotspots, while ranking second on
and
. On the other hand, Classical PSO exhibits the largest
and the lowest
, ranking last overall. Lawnmower attains a moderate
and a high
, but its
is larger than that of ACO-Path and Random Path. Similar to Lawnmower, Random Path has a low
with the second-lowest
, though it remains non-adaptive to the model. Random Grid (right-angle random walk) underperforms in
and
and shows higher variability across runs.
To assess whether these differences are statistically significant,
Table 6 reports a one-way ANOVA conducted independently for each metric across all planners (
, 10 scenarios). Statistically significant differences are observed for
and
(
and
). For
, the ANOVA result is not significant at
(
and
), and therefore, the null hypothesis of equal means cannot be rejected for this metric.
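The F statistic behind a one-way ANOVA of this kind can be sketched with the standard library alone. The function name `one_way_anova_f` and the toy groups are illustrative assumptions; the p-value would then be read from the F distribution with the returned degrees of freedom (for example via `scipy.stats.f_oneway`, if SciPy is available).

```python
import statistics


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of sample groups."""
    k = len(groups)
    all_values = [x for g in groups for x in g]
    n_total = len(all_values)
    grand_mean = statistics.fmean(all_values)

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (statistics.fmean(g) - grand_mean) ** 2 for g in groups
    )
    # Variation of the samples around their own group mean.
    ss_within = sum(
        (x - statistics.fmean(g)) ** 2 for g in groups for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Toy data: three groups of three runs each (not values from the tables).
print(one_way_anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```

With six planners and ten scenarios each, and assuming every run enters the test, the degrees of freedom would be 5 between groups and 54 within groups.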
In addition, since ACO-Path and AquaFeL-PSO are the two best-performing planners according to
Table 5 and are evaluated on the same scenarios,
Table 7 reports a paired
t-test to directly assess whether the performance gap between them is statistically significant. No statistically significant differences are found for any metric (two-sided
).
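Because both planners are run on the same scenarios, the paired test operates on per-scenario differences rather than on the two samples separately. The sketch below computes the paired t statistic under that assumption; the function name `paired_t_statistic` and the example values are hypothetical.

```python
import math
import statistics


def paired_t_statistic(a, b):
    """Return (t, degrees_of_freedom) for matched samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean_diff = statistics.fmean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_error, n - 1


# Hypothetical per-scenario errors for two planners on the same scenarios.
planner_a = [1.0, 2.0, 3.0, 4.0]
planner_b = [0.0, 1.0, 1.0, 2.0]
t_stat, dof = paired_t_statistic(planner_a, planner_b)
print(f"t = {t_stat:.3f}, dof = {dof}")
```

For 10 matched scenarios (9 degrees of freedom), a two-sided test at the 5% level rejects equality only when the absolute t value exceeds about 2.262. If SciPy is available, `scipy.stats.ttest_rel` returns the same statistic together with the p-value.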
In
Figure 8, each subfigure shares the same layout. The top panel shows the GP uncertainty over the water resource (darker orange means higher uncertainty) with ASV trajectories overlaid. Black dots mark the initial positions and red markers indicate the final positions of the ASVs. The bottom panel shows the GP mean, that is, the estimated contamination map.
In the lawnmower path planner (
Figure 8a), the ASVs travel along parallel tracks to obtain uniform coverage. In the uncertainty graph, the long, straight paths reduce the uncertainty in the interior but leave higher uncertainty near the corners. As a consequence, in the mean graph, the field is smooth and coherent, but the hotspots are diluted: peaks are present yet underestimated and slightly displaced with respect to the ground truth. This behavior explains the larger peak error.
The classical PSO (
Figure 8b) updates candidate waypoints using social and cognitive terms, which tends to concentrate sampling near locally attractive regions without an explicit rule to avoid overlap. In the uncertainty graph, paths cluster around a few local optima, leaving other areas with high uncertainty. The mean map recovers parts of the main peaks but misses structure where the fleet did not sample.
In Random Path (
Figure 8d), allowing arbitrary course changes produces more diffuse, winding coverage. The ASVs visit several sectors and reduce
across much of the interior, though some high-uncertainty regions remain without a targeting rule. In the mean map, the main hotspots are reconstructed with reasonable location and contrast. This is reflected in a competitive
(around
in our runs) and a peak error lower than Lawnmower.
In Random Grid (
Figure 8c), constraining the directions to
yields straight, block-like trajectories. This pattern often repeats segments and leaves gaps, especially near the boundaries. In the uncertainty map, these gaps persist as areas of high
. On the mean map, the hotspots appear diluted or slightly shifted, consistent with a higher
, a lower
than in Random Path, and a higher mean maximum error due to missing or misdetected peaks.
In the AquaFeL-PSO (
Figure 8e), the PSO objective is shaped by the GP so that sampling increases information while still tracking high mean values. In the uncertainty graph, paths spread more effectively and reduce uncertainty more uniformly, including near boundaries. In the contamination map, the reconstruction closely matches the ground truth (
Figure 5): the two dominant hotspots are well located and the weaker eastern peak appears clearly. This balance explains the low map-level error and stable performance across runs.
ACO-Path (
Figure 8f) selects the next action zone center using an AS rule that combines pheromone (memory of successful choices) with the distance-based heuristic. The policy prioritizes uncertainty to explore first. Once an ASV reaches the exploration distance, it switches to
criterion to generate the action zones. In the uncertainty map, vehicles first visit high-uncertainty regions and later concentrate sampling in informative areas with high mean values, leaving some peripheral areas with moderate uncertainty values. In the mean map, the main hotspots are estimated with good location and contrast: the southeast and central west peaks stand out and the weaker eastern feature is visible. This targeted behavior yields low peak error and high
.
In addition to reconstruction accuracy, the smoothness of the resulting trajectories was also analyzed, since frequent sharp heading changes can increase the burden on the low-level controller and make practical execution more demanding in terms of control effort and energy consumption. In practice, abrupt turns translate into large and fast-varying turning commands, which may drive the commanded inputs closer to actuator limits (magnitude and rate constraints), thereby reducing tracking authority and increasing the required control action [
52,
53].
This effect was quantified by computing the absolute heading change between consecutive trajectory segments,
, as well as by counting the number of turns above a threshold (here,
) along the mission distance.
Figure 9 reports the evolution of the turning demand versus distance as mean ± 95% confidence interval over the simulations, and
Table 8 summarizes the cumulative number of sharp turns at the end of the analyzed segment. The values are reported as mean ± 95% confidence interval over 10 mission tasks.
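The turning metric described above can be sketched as follows: headings are derived from consecutive waypoints, the change between successive headings is wrapped to the range [-180°, 180°) before taking its absolute value, and turns above the threshold are counted. The function names and the example path are assumptions for illustration; the 45° default follows the sharp-turn threshold used in the discussion.

```python
import math


def heading_changes_deg(waypoints):
    """Absolute heading change (degrees) between consecutive segments."""
    headings = [
        math.degrees(math.atan2(y1 - y0, x1 - x0))
        for (x0, y0), (x1, y1) in zip(waypoints, waypoints[1:])
    ]
    changes = []
    for h0, h1 in zip(headings, headings[1:]):
        # Wrap the difference so that 170 deg -> -170 deg counts as 20 deg.
        delta = (h1 - h0 + 180.0) % 360.0 - 180.0
        changes.append(abs(delta))
    return changes


def count_sharp_turns(waypoints, threshold_deg=45.0):
    """Number of heading changes strictly above the threshold."""
    return sum(1 for c in heading_changes_deg(waypoints) if c > threshold_deg)


# Hypothetical path: one straight stretch followed by two right-angle turns.
path = [(0, 0), (1, 0), (2, 0), (2, 1), (3, 1)]
print(heading_changes_deg(path), count_sharp_turns(path))
```

Wrapping the difference matters for vehicles heading close to due west, where raw heading values jump between +180° and -180° without any real turn taking place.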
From an execution-oriented perspective, reducing sharp turns is relevant not only for control effort but also for energy usage. In mobile robots, power or energy demand is strongly dependent on the motion regime: straight segments typically yield the lowest consumption, whereas trajectories with pronounced rotational components require higher actuation due to increased lateral slip and frictional losses during maneuvering [
54,
55].
Although AquaFeL-PSO attains the best global reconstruction (lowest and highest ), it consistently produces trajectories with a higher incidence of sharp turns than ACO-Path. At the end of the segment, AquaFeL-PSO accumulates turns with , whereas ACO-Path accumulates , which corresponds to approximately fewer sharp turns for ACO-Path. This difference is also observed earlier in the mission. Overall, these results indicate that ACO-Path yields smoother trajectories (fewer abrupt heading changes) while remaining competitive in map-level metrics, suggesting a more favorable trade-off when practical execution constraints are considered.
A paired statistical test was conducted to verify whether the difference in sharp turns between the two planners is significant under matched scenarios. Using the common runs that reached the 20 km of traveled distance (, ), a paired two-sided t-test indicates a statistically significant reduction in the number of sharp turns for ACO-Path compared to AquaFeL-PSO (, ).
In summary, across all planners, a clear trade-off appears between broad spatial coverage and hotspot fidelity. The random baselines provide a useful reference: allowing continuous headings (Random Path) generally reconstructs the field better than restricting motion to right angles (Random Grid), although both remain behind planners that explicitly exploit GP feedback. Lawnmower enforces systematic coverage, which reduces uncertainty over large areas but also tends to smooth and attenuate peaks. Classical PSO can concentrate sampling around locally attractive regions, but without an explicit mechanism to avoid overlap, it may leave parts of the domain insufficiently explored. Focusing on the two best methods, the paired comparison in
Table 7 shows that the differences between ACO-Path and AquaFeL-PSO in
,
, and
are not statistically significant (two-sided
), even though their mean values differ. Overall, AquaFeL-PSO delivers the most accurate global reconstruction, whereas ACO-Path achieves the lowest hotspot error and, as the turning analysis shows, does so with smoother trajectories.
The evaluation relies on ground truth fields and an idealized execution model (e.g., noise-free measurements and perfect trajectory tracking). This setting was chosen to focus on the planning component and to separate it from sensing and control effects. Assessing robustness under measurement noise, localization errors, and communication constraints will be addressed in future work.
6. Discussion of the Results
Across the tested configurations, the best performance is obtained with
, and an exploration distance of 5 km (
Table 3). This setting reduces uncertainty early and then concentrates samples on high-mean regions, which aligns with the qualitative maps in
Figure 6.
When the exploration distance is set to 10 km with , the proposed IPP remains too long in exploration. One hotspot is undersampled and large regions keep high uncertainty. This aligns with the increase in and , as well as the drop in . At 15 km, coverage improves compared to 10 km and all peaks appear, but the late switch to exploitation leaves less distance to exploit them, so results remain weaker than at 5 km.
With
, the distance term dominates the choice of targets. Ants almost always prefer nearby centers. This sharpens local sampling around known areas but limits coverage, especially at 10–15 km (
Figure 7). Uncertainty remains high in parts of the map, and the confidence intervals in
Table 4 widen. The 5 km setup is again the most solid in this group, but it does not surpass the best
setting. In summary, too much weight on distance leads to repeated short moves and missed regions.
The multi-planner comparison puts these findings in context (
Table 5,
Figure 8). AquaFeL-PSO, which also uses the GP to guide sampling, achieves the lowest
and the highest
, and therefore the most accurate global map. ACO-Path obtains the lowest
and ranks second on
and
, which indicates reliable hotspot detection with competitive global fit. Lawnmower ensures coverage but tends to smooth peaks. Random Path improves peak error relative to Lawnmower with similar
. Random Grid, constrained to right angles, repeats segments and leaves gaps, which is reflected by its higher
and
. Classical PSO explores the space, but without a tight coupling to the GP during waypoint selection, it ends up with the highest
and the lowest
on average.
This interpretation is supported by the inferential analysis. The one-way ANOVA among all planners (
Table 6) detects statistically significant differences for the map-level metrics
and
(
), reinforcing the conclusion that the ranking observed in
Table 5 reflects systematic differences in performance rather than fluctuations between scenarios. However, for
, the ANOVA result does not meet the criterion
(
), so it does not provide sufficient evidence to claim that there is a clear separation between all planners in terms of hotspot error in this test.
Since ACO-Path and AquaFeL-PSO are evaluated on identical reference scenarios, a paired analysis provides a more appropriate direct comparison. As indicated in
Table 7, the paired
t-tests do not indicate statistically significant differences between these two planners for
,
, or
(two-sided
). Therefore, although AquaFeL-PSO achieves the most robust global reconstruction and ACO-Path achieves the lowest average error in hotspots, the current set of scenarios does not support a conclusive claim of superiority of one method over the other in these accuracy metrics.
Beyond reconstruction accuracy, the turning analysis adds an execution-focused dimension to the comparison between planners. Based on the absolute change in heading between consecutive segments and the count of sharp turns, AquaFeL-PSO generates more aggressive maneuvers than ACO-Path for comparable mission distances. At 20 km, the average number of turns exceeding 45° is higher for AquaFeL-PSO, while ACO-Path produces smoother heading profiles. This difference matters operationally: frequent sharp turns imply larger and more rapidly varying turn commands, which reduce the tracking margin as the demanded inputs approach the magnitude and/or rate constraints of the actuator [
52,
53]. Furthermore, trajectories with pronounced rotational components tend to require more energy than nearly rectilinear motions, as turns increase actuation requirements and associated losses [
54,
55]. A paired two-sided
t-test on the runs that reach 20 km of traveled distance confirms that this reduction is statistically significant (
,
,
).
These results were obtained under controlled assumptions (noise-free measurements, synchronized motion, centralized coordination, and synthetic ground truths adapted to the lake grid). Within this setting, the results consistently show that coupling Ant System with GP-driven action zones and an explore-then-exploit policy improves hotspot reconstruction while maintaining a competitive global fit against the selected baselines.
7. Conclusions
This work presented an IPP based on ACO for monitoring WQPs with a fleet of ASVs. The proposed planner, ACO-Path, couples a GP surrogate with Ant System so that decisions are guided not only by geometric proximity but also by the GP mean (contamination estimate) and variance (model uncertainty). From these maps, action zone centers are generated online, and the planner follows an explore-then-exploit policy: it prioritizes uncertainty () until an ASV reaches an exploration distance , then switches to a combined criterion to concentrate sampling on informative regions.
In simulations inspired by Lake Ypacaraí, ACO-Path was assessed under three exploration distances (5, 10, and 15 km) and two heuristic weights (). The configuration , , with a 5 km exploration distance yielded the most consistent performance, achieving the best trade-off across map-level reconstruction and hotspot fidelity. Increasing the exploration threshold to 10–15 km under delayed the exploitation stage, which reduced refinement near hotspots within the available mission distance. For , the inverse-distance heuristic dominated target selection, strengthening local sampling but limiting coverage and leaving higher uncertainty in less visited regions.
A comparative study against Lawnmower, Classical PSO, Random Path, Random Grid, and AquaFeL-PSO further highlighted these trends. AquaFeL-PSO achieved the strongest global reconstruction on average, while ACO-Path achieved the lowest mean error at contamination peaks and remained competitive in map-level metrics. Statistical analysis supports these conclusions: differences between planners are significant for global reconstruction metrics, while hotspot error shows weaker separation under the same experiments. When the two best planners, ACO-Path and AquaFeL-PSO, are compared directly on identical scenarios, the paired analysis does not provide sufficient evidence to claim a consistent advantage of one method over the other in reconstruction metrics, suggesting that both methods are competitive under the evaluated conditions.
Beyond reconstruction accuracy, the analysis of turns adds an execution-oriented perspective. At comparable mission distances, ACO-Path consistently produces smoother trajectories than AquaFeL-PSO, with fewer abrupt course changes. This feature is relevant for practical deployments, where frequent abrupt turns can increase low-level control effort and energy consumption.
If the method is taken to deployment, the IPP should be integrated with a local path planning layer in charge of obstacle avoidance and inter-ASV collision avoidance during execution, so that safety is handled at the control level while the IPP provides the nominal path.
As future work, the authors propose to (i) enable online adaptation of key ACO parameters; (ii) extend ACO-Path to multi-objective planning that jointly considers field accuracy, energy consumption, and travel time; (iii) adopt a spatiotemporal GP so that the planner adapts to the changing dynamics of a water resource rather than a static field; (iv) conduct a more extensive sensitivity analysis of the ACO hyperparameters and evaluate ACO-Path against additional IPP planners, including more recent ACO variants and learning-based approaches, under a unified benchmarking protocol; and (v) consider more realistic sensing and navigation conditions (e.g., measurement noise and localization uncertainty) and additional deployment constraints to further assess robustness in real-world monitoring missions.