1. Introduction
Waterways play an important role in modern society and are used for various purposes, such as ship operation, ecosystem maintenance, and water quality management. A waterway is a dynamic environment that mixes physical and ecological diversity, combining complex elements such as ships entering and leaving the route, the activities of various aquatic organisms, and water quality that changes over time and across regions. These characteristics mean that waterways are more than mere passageways: they are an important axis of natural ecosystems and an essential foundation for human activities. Therefore, understanding and monitoring the waterway environment is essential in various fields such as logistics management, multipath routing, environmental protection, and water quality improvement [1,2,3,4]. In addition to spatial monitoring, recent research highlights the importance of temporal data validity, particularly through the concept of AoI (Age of Information) [5,6]. In dynamic environments like waterways, delayed or outdated environmental data can significantly reduce the effectiveness of decision-making. Incorporating AoI into surveillance design ensures that the system not only captures wide-area data but also prioritizes data freshness in real-time operations.
However, the surveillance and monitoring of waterways presents challenges that are inherently different from those of existing land-based systems. Surveillance systems in land environments are designed around relatively fixed sensor networks and stable monitoring conditions, in combination with AI (Artificial Intelligence), 5G-6G communications, WSNs (Wireless Sensor Networks), IoT (Internet of Things), and UAVs (Unmanned Aerial Vehicles) [7,8,9,10,11,12,13]. In the ever-changing fluid environment of a waterway, by contrast, the limitations of traditional monitoring systems become apparent. For example, changes in a ship's route or in water quality are difficult to predict, and a fixed sensor network cannot guarantee sufficient coverage and efficient data collection.
Figure 1 depicts waterway surveillance of a specific region covered by movable sensors. Due to the nature of waterways, a flexible system design is required that can efficiently cover a large monitoring area and reflect dynamic changes in real time [14,15].
Existing barrier-based sensor systems are designed for surveillance-oriented purposes such as intruder detection [16,17,18,19]. In a waterway environment, however, it is important to efficiently collect environmental data over a wide area rather than merely detect intruders. To this end, a new approach is needed that monitors a wide range by minimizing overlap between deployed sensors and maximizing the effective sensing radius. For example, a sensor network with reduced overlap is better suited to collecting environmental data such as water temperature or pollution level, since it enables broader and more even data collection.
This study addresses the problem of using a movable sensor network for monitoring tasks in a waterway. Unlike existing static sensor arrangements, the sensors in this study are designed to move to a target area and perform sensing there effectively. To this end, the proposed system consists of two steps: first, a moving step in which the sensors travel to the target area, and second, an arranging step in which the sensors are positioned within the target area to maximize sensing efficiency. In the first step, the sensors are trained to move efficiently to the target area using an MLP (Multi-Layer Perceptron). This reduces the computational complexity of the path extraction process and allows the simple movement task to be performed quickly. In the second step, the sensors find an optimal arrangement within the target area using RL (Reinforcement Learning). In particular, the goal is to minimize overlap between the sensing areas of different sensors and to maximize the sensing radius, reflecting the characteristics of the waterway environment. Unlike conventional security-oriented surveillance, the design goals center on more effectively collecting environmental data such as water temperature and pollution level.
The main contributions of this study are summarized as follows.
First, we design a movable sensor-assisted waterway surveillance model with enhanced coverage using a Multi-Layer Perceptron and Reinforcement Learning. We also present the system overview and settings, and give a formal problem definition for a dynamic waterway environment.
Second, to resolve the problem, we propose a scheme consisting of a Movement Phase and a Deployment Phase that provides enhanced coverage and minimizes overlapping areas. The performance of the proposed scheme is then evaluated with numerical results from extensive simulations over various scenarios and factors, including loss, Find Ratio, training time, traveled distance, and overlapped area.
The remainder of this paper is organized as follows. Section 2 introduces the proposed system with an overview, settings, and problem definition. Section 3 describes the proposed schemes consisting of the Movement Phase and the Deployment Phase. In Section 4, the performance of the proposed algorithms is evaluated with detailed discussions through extensive simulations. Finally, Section 5 concludes the paper.
2. Proposed System
In this section, we explain the system overview, settings, and problem definitions that are used in the proposed waterway surveillance model.
2.1. System Overview
A waterway is an environment with dynamic characteristics that is used for various purposes such as ship movement, ecosystem activities, and environmental monitoring. These characteristics differ essentially from the environments for which existing surveillance and monitoring systems are designed. While existing monitoring systems are designed to respond to dynamic intruders in a relatively fixed environment, the waterway environment is constantly changing due to dynamic factors such as water flow, ship movement, and water quality changes.
The proposed system consists largely of two stages: the Movement Phase and the Deployment Phase. These stages work complementarily for efficient operation of the sensor network, with specialized techniques utilized at each stage. In the Movement Phase, the sensor is moved to the target region using an MLP. This step learns and predicts the movement direction so that the sensor can reach the target region from its current position, with the goal of moving the sensor to the target region effectively while reducing computational complexity. In the Deployment Phase, RL is used to efficiently arrange the sensors within the target area. In a waterway environment, the main task is to minimize the overlap of sensing areas between sensors and to maximize overall coverage; to achieve this, the RL model is trained to optimize the sensor placement. Figure 2 and Figure 3 present an overview of the Movement Phase and the Deployment Phase, respectively.
2.2. System Setting
All simulations were implemented in Python 3.8.
The MLP and RL models used in the proposed framework were implemented using PyTorch 2.0.
The waterway environment is a 2D area of 100 m × 100 m.
The target area, where sensors intensively conduct sensing operations, is 10 m × 10 m.
The sensor used in the proposed system is a movable sensor in the waterway.
In a waterway, the movable sensor can only move in four directions: up, down, left, and right, based on the environmental 2D coordinates.
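The environment described above can be sketched as a small grid-world class. This is an illustrative sketch based on the stated settings (100 m × 100 m environment, 10 m × 10 m target area, four-direction movement); the class name, target origin, and clamping behavior at the boundary are assumptions, not the authors' implementation.

```python
# Sketch of the 2D waterway setting described above: a 100 x 100 grid with a
# 10 x 10 target area, and sensors that move one step up/down/left/right.
class WaterwayGrid:
    ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

    def __init__(self, size=100, target_origin=(45, 45), target_size=10):
        self.size = size
        self.tx, self.ty = target_origin  # lower-left corner of target area (assumed)
        self.ts = target_size

    def step(self, pos, action):
        """Move one cell in the chosen direction, clamped to the grid bounds."""
        dx, dy = self.ACTIONS[action]
        x = min(max(pos[0] + dx, 0), self.size - 1)
        y = min(max(pos[1] + dy, 0), self.size - 1)
        return (x, y)

    def in_target(self, pos):
        """True if the position lies inside the 10 x 10 target area."""
        return (self.tx <= pos[0] < self.tx + self.ts
                and self.ty <= pos[1] < self.ty + self.ts)
```
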
2.3. Problem Definition
This study addresses the problem of achieving efficient surveillance and monitoring in dynamic waterway environments by utilizing a mobile sensor network. Unlike traditional static environments, waterways are characterized by continuous changes due to factors such as water flow, vessel movement, and environmental variability. These dynamic characteristics require sensors to not only respond to moving targets but also monitor static or semi-static phenomena across a constantly shifting landscape. To tackle these challenges, this study proposes a system that ensures efficient sensor movement and deployment within a designated target region.
The problem is divided into two phases: the Movement Phase, which focuses on guiding sensors from random initial positions to the target region, and the Deployment Phase, where sensors are strategically positioned within the target region to optimize sensing coverage while minimizing overlap. The two steps serve complementary roles and aim to propose an effective sensor network that provides computationally efficient movement within the dynamic characteristics of the waterway environment and less overlap of sensing radii in the waterway.
4. Experimental Evaluations
This section evaluates the performance of the proposed Movement and Deployment Phases through detailed numerical experiments. The experimental analysis focuses on indicators such as the Find Ratio (the percentage of sensors that accurately move to the target area), the size of the coverage overlap area, and training time. The experimental environment is configured to match the system settings described above, and both phases use the parameters of their respective algorithms. The Movement Phase evaluation examines a node's ability to reach its target location with minimal travel, while the Deployment Phase evaluation focuses on deploying sensors optimally to maximize coverage and minimize redundancy. To ensure robustness and statistical validity, all results are averaged over repeated independent simulation runs.
4.1. Movement Phase Evaluation
The Movement Phase evaluation analyzes the process of moving sensors to the target area in the experimental environment, that is, each sensor's ability to reach the target area efficiently from its initial position. The MLP model is used to determine the movement direction of each sensor: it predicts the optimal direction of movement based on the sensor's current coordinate (x, y) in the 100 × 100 simulation environment and selects the direction that maximizes the chance of successfully arriving in the target area, as represented in Figure 5.
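The direction-prediction model can be sketched in PyTorch as follows. The input/output interface (an (x, y) coordinate in, four direction scores out) follows the text; the hidden-layer sizes and activation choices are assumptions, since the paper does not specify the architecture here.

```python
import torch
import torch.nn as nn

# Illustrative sketch of the direction-prediction MLP: input is a sensor's
# (x, y) coordinate, output is a score per movement direction. Layer widths
# are assumed, not taken from the paper.
class DirectionMLP(nn.Module):
    def __init__(self, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2, hidden),    # (x, y) coordinate in the 100 x 100 grid
            nn.ReLU(),
            nn.Linear(hidden, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 4),    # logits for up / down / left / right
        )

    def forward(self, xy):
        return self.net(xy)

model = DirectionMLP()
logits = model(torch.tensor([[12.0, 87.0]]))  # one sensor position
direction = ["up", "down", "left", "right"][logits.argmax(dim=1).item()]
```

At inference time, a single forward pass per step yields the next movement direction, which is what makes the per-decision cost constant regardless of environment size.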
In this experiment, we analyzed the performance of the Movement Phase under various initial sensor deployments and target area settings. The key performance indicators are training time, traveled distance, and the Find Ratio, which is the percentage of sensors that reached the target area. We also evaluate whether the proposed loss function was applied properly in the Movement Phase, examining the efficiency of the proposed movement strategy and the MLP-based approach in minimizing losses and optimizing sensor movement. These metrics are analyzed comprehensively to verify that the Movement Phase effectively achieves its intended goals.
4.1.1. Loss
The convergence of the loss function correlates directly with the quality of model training, indicating that the model is gradually learning the optimal pattern. Movement Phase training combines a Classification Loss with a Distance-Based Loss to guide the model toward moving efficiently to the target area.
As shown in Figure 6, the total loss is high in the early stages of training but gradually decreases and converges stably as training progresses. This means that the model progressively learns the best path to the target area. In particular, the Classification Loss plays an important role in increasing the predictive accuracy of the travel direction, while the Distance Loss helps sensors reduce their distance from the target area. The combination of these loss functions allows the model to simultaneously learn optimized travel paths and access to the target area. The loss convergence results show that the model returns an efficient navigation path by gradually reducing uncertainty during movement.
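The combined objective described above can be sketched as follows. The cross-entropy term scores the predicted direction against a supervised target direction, and the distance term penalizes remaining (four-direction) distance to the target. The weighting factor `alpha` is an assumption; the paper does not give the exact combination here.

```python
import torch
import torch.nn.functional as F

# Sketch of the combined Movement Phase objective: classification loss on the
# predicted direction plus a distance-based term toward the target centre.
# `alpha` is an assumed weighting, not a value from the paper.
def movement_loss(logits, true_dir, pos, target_center, alpha=0.1):
    cls_loss = F.cross_entropy(logits, true_dir)               # direction accuracy
    dist_loss = (pos - target_center).abs().sum(dim=1).mean()  # mean L1 distance
    return cls_loss + alpha * dist_loss
```
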
4.1.2. Find Ratio and Train Time
The Find Ratio is the percentage of sensors that successfully reach the target area within 3000 steps, based on a total of 1000 generated coordinates. The performance metrics in Figure 7 are calculated as the average over 100 independent simulation runs for each setting. The Find Ratio serves as an important indicator of how well the MLP has generalized in the Movement Phase: a higher Find Ratio means that the model has learned to move accurately from initial locations to the target area.
We used training sets of 100, 200, 300, 400, and 500 data points and analyzed how performance changes with the number of data points. The number of epochs was gradually increased from 100 to 500, and the learning rate was set to 0.001. Analysis of the model's performance in these simulation environments shows that the Find Ratio tends to increase with the number of data points and epochs. In particular, with 500 data points, the Find Ratio increased rapidly from the early epochs, showing high performance. This indicates that the more data used for training, the better the MLP can learn diverse patterns and generalize paths to the target area.
Figure 7 presents measurements of the Find Ratio and training time across epochs. For example, when training with 500 data points, the Find Ratio remained close to 100% from epoch 200 onward, showing excellent performance. Because the coordinates used in training were randomly sampled, performance may fluctuate when biased coordinates are drawn. This can be seen in the graph: in some cases, a smaller dataset (e.g., 400 points) showed a higher Find Ratio than a larger one (e.g., 500 points). This indicates that random coordinate sampling can overfit to particular patterns or miss some significant paths. In addition, even when the initial Find Ratio is low, it gradually improves as the number of epochs increases, showing that the MLP model progressively learns more accurate directions toward the target area. However, the rate of improvement and the final Find Ratio may vary depending on the number of data points and the diversity of the initial coordinates. In summary, the Find Ratio analysis confirms that the performance of the MLP model in the Movement Phase is strongly influenced by the amount of training data and the number of epochs. It is therefore important to balance the composition of the training data and the coordinate sampling process, which can further improve the generalization performance of the MLP model.
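The Find Ratio computation can be sketched as below. To keep the sketch self-contained, a greedy step toward the target centre stands in for the trained MLP policy; the target location and all function names are assumptions for illustration.

```python
import random

# Find Ratio sketch: the fraction of random start coordinates from which a
# sensor reaches the target area within a step budget. A greedy four-direction
# policy replaces the trained MLP here, purely to make the sketch runnable.
def find_ratio(n_starts=1000, max_steps=3000, target=(45, 45, 10), size=100):
    tx, ty, ts = target                     # assumed target-area placement
    cx, cy = tx + ts // 2, ty + ts // 2     # target centre
    hits = 0
    for _ in range(n_starts):
        x, y = random.randrange(size), random.randrange(size)
        for _ in range(max_steps):
            if tx <= x < tx + ts and ty <= y < ty + ts:
                hits += 1
                break
            # greedy step toward the target centre (placeholder policy)
            if abs(cx - x) >= abs(cy - y):
                x += 1 if cx > x else -1
            else:
                y += 1 if cy > y else -1
    return hits / n_starts
```

With this placeholder policy every start reaches the target well within 3000 steps, so the ratio is 1.0; a trained MLP policy would be evaluated the same way.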
4.1.3. Traveled Distance
To evaluate the performance of the Movement Phase, the Manhattan Distance was introduced as a baseline against which the traveled distance of the Movement Phase is compared. All reported values represent averages over 100 independent simulation runs under the same conditions. The traveled distance of the Movement Phase is calculated as the total number of moves a sensor makes, following the directions predicted by the model, before reaching the target area; this serves as an important indicator of whether the model moves to the target area efficiently. The Manhattan Distance, used for comparison, is the shortest-path length under the assumption that movement is restricted to the four directions (up, down, left, right), calculated as the sum of the horizontal and vertical travel distances between two points. It is therefore a simple and effective criterion that reflects the realistic movement constraints.
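The baseline and the comparison described above reduce to a few lines. The `path_overhead` helper is an assumed name for illustration; it expresses a measured path length relative to the Manhattan lower bound, which is how the up-to-5-percent figures below can be read.

```python
# Manhattan Distance: shortest path length under the four-direction movement
# constraint, i.e. the sum of horizontal and vertical offsets.
def manhattan_distance(start, goal):
    return abs(goal[0] - start[0]) + abs(goal[1] - start[1])

def path_overhead(traveled_steps, start, goal):
    """Relative overhead of a traveled path over the Manhattan shortest path."""
    shortest = manhattan_distance(start, goal)
    return (traveled_steps - shortest) / shortest
```
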
As shown in Table 1, Table 2 and Table 3, the distance traveled along the route obtained in the Movement Phase differed by at most 5 percent from the Manhattan shortest path under the same conditions. This indicates that the model has a highly efficient path-finding capability.
While the proposed MLP-based approach does not explicitly guarantee the absolute shortest path as calculated by graph-based algorithms, it is designed to prioritize computational efficiency and scalability. Conventional graph-based methods often require exhaustive searches or iterative path recalculations, which can substantially increase computational overhead and hinder real-time applicability in dynamic environments. In contrast, the MLP model generates movement decisions with constant time complexity O(1) after training, enabling rapid inference regardless of the size of the environment. Although the measured travel distances may deviate by up to 5% from the corresponding Manhattan Distances, the proposed model consistently maintains a high Find Ratio. This observation indicates that such minor deviations in path length do not compromise the model’s ability to reach the target area effectively. Consequently, the MLP-based approach achieves a practical balance between path efficiency, decision-making speed, and adaptability, making it suitable for real-world deployment scenarios.
4.2. Deployment Phase Evaluation
The Deployment Phase evaluation assesses the optimal placement of the sensors that successfully reached the target area during the Movement Phase. This phase focuses on maximizing the coverage of the 10 × 10 m target area, where the x and y axes represent coordinate positions within this area, while minimizing the overlap between the sensors' sensing ranges, as shown in Figure 8. By achieving this balance, the Deployment Phase ensures efficient sensor deployment and optimal resource utilization.
The performance of the Deployment Phase was analyzed under various sensor deployments and target area configurations. The key performance indicators are the coverage ratio and the overlap ratio. The coverage ratio measures the proportion of the target area covered by the sensors, while the overlap ratio quantifies unnecessary coverage caused by overlapping sensing ranges between sensors. These metrics provide critical insights into the effectiveness of the proposed deployment strategy. We also evaluate whether the reinforcement learning-based optimization algorithm, specifically Proximal Policy Optimization (PPO), was successfully applied during the Deployment Phase. The PPO algorithm incorporates a loss function that penalizes overlap to minimize redundancy while simultaneously optimizing for uncovered regions to ensure maximum coverage. By analyzing these metrics comprehensively, it can be verified that the Deployment Phase effectively achieved its objectives of maximizing coverage and minimizing overlap. The sensing redundancy rate, for example, is determined by the configured target area size and the number of sensors, and the measured results confirm that the proposed algorithm effectively minimizes redundancy while maximizing coverage in the Deployment Phase.
4.2.1. Loss
The Deployment Phase focuses on optimizing sensor placement to maximize coverage of the target area while minimizing overlap between sensor sensing ranges. The reward function utilized during this phase incorporates three critical components: coverage, overlap penalty, and boundary penalty. The coverage term measures the proportion of the target area effectively covered by the sensors. The overlap penalty penalizes overlapping regions between sensor sensing ranges, ensuring efficient utilization of sensors and minimizing redundancy. Lastly, the boundary penalty discourages sensor placements outside the allowable area, guiding sensors to remain within the target area boundaries.
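The three reward components described above (coverage, overlap penalty, boundary penalty) can be sketched as a single scoring function. The grid-based coverage estimate, the penalty weights, and the pairwise-intersection test are all illustrative assumptions; the paper does not give the exact formulation here.

```python
import math

# Sketch of the Deployment Phase reward: coverage minus overlap and boundary
# penalties. Weights and the cell-based coverage estimate are assumptions.
def deployment_reward(sensors, r, area=10, w_overlap=1.0, w_bound=1.0):
    # coverage: fraction of 1 m grid cells in the target area whose centre
    # lies within sensing radius of at least one sensor
    covered = 0
    for i in range(area):
        for j in range(area):
            cx, cy = i + 0.5, j + 0.5
            if any(math.hypot(cx - x, cy - y) <= r for x, y in sensors):
                covered += 1
    coverage = covered / (area * area)

    # overlap penalty: number of sensor pairs whose sensing discs intersect
    overlap = sum(
        1
        for a in range(len(sensors))
        for b in range(a + 1, len(sensors))
        if math.hypot(sensors[a][0] - sensors[b][0],
                      sensors[a][1] - sensors[b][1]) < 2 * r
    )

    # boundary penalty: sensors placed outside the target area
    outside = sum(1 for x, y in sensors if not (0 <= x <= area and 0 <= y <= area))

    return coverage - w_overlap * overlap - w_bound * outside
```

A PPO agent trained against such a reward is pushed toward placements that spread sensors apart (no overlap penalty) while keeping them inside the target area.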
As illustrated in Figure 9, the total loss during the Deployment Phase initially starts at a high value due to significant overlap and suboptimal placements. However, as the optimization progresses, the loss decreases substantially, showcasing a stable convergence pattern. This indicates that the proposed PPO-based reinforcement learning algorithm effectively learns to balance maximizing coverage and minimizing overlap. The loss convergence highlights the successful application of the overlap penalty and boundary penalty in the reward function. These penalties ensure that the sensors are optimally distributed across the target area, covering as much area as possible while avoiding unnecessary overlap. This demonstrates that the Deployment Phase achieves its objective of efficient sensor placement. The stable convergence of the loss function underscores the reliability and effectiveness of the Deployment Phase in optimizing sensor placement. By achieving a balance between coverage maximization and overlap minimization, the Deployment Phase demonstrates its capability to support robust and efficient sensor networks in the target area.
4.2.2. Overlap Minimization
In the Deployment Phase, minimizing overlap between sensor coverage areas is essential for maximizing total coverage within a limited target area. By reducing overlap, the sensing regions are distributed more efficiently, avoiding redundancy and improving resource utilization. To evaluate the effectiveness of the Deployment Phase, the degree of overlap between sensors is measured. The overlap between a pair of sensors is calculated as the lens-shaped intersection area of two circles of equal radius:

$A_{\text{overlap}} = 2r^2 \cos^{-1}\!\left(\frac{d}{2r}\right) - \frac{d}{2}\sqrt{4r^2 - d^2}$

where r is the sensing radius and d is the Euclidean distance between the two sensor centers. If d is smaller than 2r, the sensors overlap and this formula gives the overlap area; otherwise the overlap is zero. By summing the overlap areas over all sensor pairs, the total overlap of the deployment is quantified.
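The pairwise lens-area formula and the pairwise summation can be sketched directly; the function names are assumptions for illustration.

```python
import math

# Lens area of two equal circles of radius r whose centres are d apart
# (zero when d >= 2r, i.e. the discs do not intersect).
def overlap_area(r, d):
    if d >= 2 * r:
        return 0.0
    return 2 * r * r * math.acos(d / (2 * r)) - (d / 2) * math.sqrt(4 * r * r - d * d)

def total_overlap(sensors, r):
    """Sum of pairwise overlap areas over all sensor pairs."""
    total = 0.0
    for i in range(len(sensors)):
        for j in range(i + 1, len(sensors)):
            d = math.dist(sensors[i], sensors[j])
            total += overlap_area(r, d)
    return total
```

As a sanity check, two coincident unit circles (d = 0) overlap in exactly the full disc area pi, and circles exactly 2r apart overlap in zero area.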
To validate the proposed approach, it is compared with a baseline method using K-Means clustering. K-Means clustering is commonly used to partition data points into evenly distributed groups, ensuring that sensors are positioned uniformly across the area. However, it does not explicitly address overlap minimization, which can result in suboptimal resource utilization. We therefore compare the proposed Deployment Phase with the K-Means clustering scheme to confirm that the overlap rate in the region is reduced and that near-maximal coverage can be achieved.
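The baseline can be sketched as a minimal from-scratch K-Means (the paper does not name its implementation, so this is an assumed stand-in): cluster random candidate points in the 10 m × 10 m target area and place one sensor at each cluster centre, which spreads sensors evenly but never reasons about overlap.

```python
import math
import random

# Minimal K-Means baseline sketch: one sensor per cluster centre of random
# candidate points in the target area. All parameters are assumed.
def kmeans_placement(n_sensors, area=10, n_points=500, iters=20, seed=0):
    rng = random.Random(seed)
    pts = [(rng.uniform(0, area), rng.uniform(0, area)) for _ in range(n_points)]
    centres = rng.sample(pts, n_sensors)
    for _ in range(iters):
        clusters = [[] for _ in centres]
        for p in pts:  # assign each point to its nearest centre
            k = min(range(len(centres)), key=lambda i: math.dist(p, centres[i]))
            clusters[k].append(p)
        centres = [  # move each centre to its cluster mean
            (sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            if c else centres[i]
            for i, c in enumerate(clusters)
        ]
    return centres
```
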
Table 4 presents a comparison of overlap areas during the Deployment Phase using the proposed reinforcement learning-based approach and the K-Means clustering algorithm. Each value in Table 4 represents the average result of 100 simulation runs for the corresponding sensor configuration. Overall, the results indicate that both methods perform similarly in minimizing overlap. For the case of 3 sensors, both approaches achieved an overlap area of 0, demonstrating their effectiveness in avoiding redundant coverage in low-density scenarios. As the number of sensors increased to 5 and 7, some overlap was observed in both methods, with the reinforcement learning-based approach yielding slightly higher values compared with K-Means clustering. However, the differences remain relatively minor, indicating that the proposed model is capable of achieving performance levels close to K-Means clustering. The results highlight the strength of the proposed Deployment Phase model in adapting to different sensor configurations. Unlike K-Means clustering, which distributes sensors evenly without considering specific constraints, the proposed model incorporates objectives such as maximizing coverage and minimizing penalties for overlapping and boundary violations. This adaptability ensures that the model can handle diverse deployment scenarios while achieving the desired outcomes.
5. Conclusions
This study proposed a novel sensor deployment framework designed to address the challenges of dynamic waterway environments. By integrating a Movement Phase using a Multi-Layer Perceptron (MLP) and a Deployment Phase optimized via reinforcement learning (RL), the framework successfully optimized sensor placement for enhanced coverage and minimized overlap. Through comprehensive evaluations, the effectiveness of the proposed framework was validated across various scenarios. The results demonstrate that the Movement Phase enabled sensors to efficiently navigate toward the designated target area, while the Deployment Phase ensured optimal arrangement to maximize coverage while reducing redundancy. Comparisons with the K-Means clustering algorithm further highlighted the adaptability and performance of the proposed approach, achieving comparable results while incorporating additional constraints for real-world applicability. Furthermore, although the current framework does not explicitly optimize for temporal freshness, future extensions could incorporate AoI-based evaluation to quantify how rapidly and reliably environmental updates are delivered. This would provide an additional performance dimension, particularly relevant in fast-changing aquatic environments where delayed information could lead to suboptimal responses.