3.3. Spatiotemporal Clustering
Fishing vessel trajectories exhibit high variability, making it challenging to detect patterns. Point-based methods analyze each AIS position independently, failing to capture the continuous nature of fishing operations [25]. Spatiotemporal clustering addresses this limitation by grouping trajectory points based on both spatial proximity and temporal continuity. This approach identifies concentrated fishing areas while preserving the behavioral context essential for distinguishing fishing from transit activities.
Density-based clustering methods, such as DBSCAN, can identify clusters of any shape and handle noise in vessel data [26]. However, DBSCAN only considers spatial attributes. We use ST-DBSCAN, which adds temporal constraints to capture the dynamic nature of fishing activities. This is important because fishing vessels operate in specific areas over extended time periods. ST-DBSCAN has been empirically validated to be well suited for analyzing dynamic spatiotemporal data, such as navigation trajectories [24].
Specifically, we consider a fishing vessel trajectory point sequence P = {p1, p2, …, pi, …, pn}, where n represents the total number of points in the given vessel trajectory, and each point pi contains spatial coordinates (xi, yi), a timestamp ti, and other non-spatial attributes (e.g., speed and direction). The ST-DBSCAN algorithm involves three parameters: the spatial radius threshold εs, the temporal radius εt, and the minimum number of points minPts. Here, points satisfying both spatial (distance ≤ εs) and temporal (time difference ≤ εt) constraints form neighborhoods. Regions with at least minPts form clusters representing fishing activities, while isolated points are classified as noise, typically indicating transit movements.
To quantify the proximity between vessel trajectory points, we establish a spatial and temporal joint distance metric. Specifically, given two trajectory points pi and pj ∈ P, the spatial similarity between pi and pj is measured by their geographical distance, while the temporal similarity between pi and pj is quantified through their absolute timestamp difference. The spatiotemporal joint distance djoint between pi and pj is then formulated as follows:
where dgeo represents the geographical distance (e.g., great-circle distance or projected Euclidean distance) between pi and pj, while ti and tj denote the timestamp information of pi and pj, respectively.
The use of this spatial and temporal joint distance ensures that only adjacent vessel trajectory points exhibiting both spatial density consistency and temporal continuity are grouped together, reflecting real-world fishing activities where vessels conduct localized operations (e.g., trawling within confined areas) over sustained periods. Then, we can use the ST-DBSCAN algorithm to find meaningful clusters from vessel trajectory points. The clustering process is implemented as follows:
Step 1: Denote all vessel trajectory points in P as unlabeled (denoted as Pu).
Step 2: Randomly select an unlabeled point pi from the set Pu.
Step 3: Compute the spatiotemporal neighborhood Nst(pi) of pi by identifying all unlabeled points that satisfy both spatial and temporal proximity constraints relative to pi, i.e., find all pj ∈ Pu such that ds(pi, pj) ≤ εs and dt(pi, pj) ≤ εt. If the number of vessel trajectory points in this spatiotemporal neighborhood satisfies |Nst(pi)| ≥ minPts, pi is designated as a core point. Otherwise, repeat Step 2 until a core point is found.
Step 4: Beginning with the identified core point pi, incorporate all points in its spatiotemporal neighborhood and recursively extend its cluster boundary by traversing the points in its neighborhood and performing Step 3 until no additional points can be added to this cluster.
Step 5: If a point cannot be classified as a core point or is not within the spatiotemporal neighborhood of the core points during the expansion process, it is labeled as a noise point.
Step 6: Repeat Steps 2–5 until all points in Pu have been assigned to a cluster or labeled as noise.
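The six steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in this study: points are assumed to be (lat, lon, t) tuples with t in seconds, the spatial distance is assumed to be the great-circle (haversine) distance, the neighborhood search is a brute-force scan over all points rather than only unlabeled ones, and points are visited in index order rather than at random. The names haversine_m, neighbors, and st_dbscan are hypothetical.

```python
import math
from collections import deque

NOISE = -1  # label for points that end up in no cluster


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres (one possible choice of spatial distance)."""
    r = 6_371_000.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def neighbors(points, i, eps_s, eps_t):
    """Indices j with distance <= eps_s and |t_i - t_j| <= eps_t (includes i itself)."""
    lat_i, lon_i, t_i = points[i]
    return [
        j for j, (lat, lon, t) in enumerate(points)
        if abs(t - t_i) <= eps_t and haversine_m(lat_i, lon_i, lat, lon) <= eps_s
    ]


def st_dbscan(points, eps_s, eps_t, min_pts):
    """points: list of (lat, lon, t_seconds). Returns one label per point."""
    labels = [None] * len(points)  # None means "unlabeled"
    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(points, i, eps_s, eps_t)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # provisional: may still become a border point
            continue
        labels[i] = cluster_id
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster_id
            nbrs = neighbors(points, j, eps_s, eps_t)
            if len(nbrs) >= min_pts:  # j is a core point too: keep expanding
                queue.extend(nbrs)
        cluster_id += 1
    return labels
```

In this sketch a point that revisits the same location much later is not absorbed into the earlier cluster, because the temporal constraint excludes it from the neighborhood.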
The parameters εs and εt are crucial for defining the spatiotemporal density thresholds, thus influencing the clustering results. To enhance the adaptability of ST-DBSCAN across diverse scenarios, including varying vessel speeds, vessel types, and meteorological conditions, we propose a data-driven parameter optimization strategy. Specifically, the average spatial distance and the average time interval between consecutive vessel trajectory points within a given vessel trajectory are computed as initial estimates for εs and εt, respectively. They can be further fine-tuned based on domain-specific knowledge, such as historical fishing ground distributions and user experiences. In this way, the parameters of ST-DBSCAN can be adaptively set according to the inherent characteristics of the specific vessel data without the need for complex parameter search processes.
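The initial estimates described above can be computed directly from consecutive point pairs. The following is a minimal sketch assuming chronologically sorted (x, y, t) tuples in projected coordinates; the function names initial_eps and euclid are hypothetical, and any later fine-tuning by domain knowledge is left out.

```python
import math


def euclid(a, b):
    """Planar distance between two (x, y, t) points (assumes projected coordinates)."""
    return math.hypot(b[0] - a[0], b[1] - a[1])


def initial_eps(points, dist=euclid):
    """Return (eps_s, eps_t): mean spatial step and mean time step between consecutive points."""
    if len(points) < 2:
        raise ValueError("need at least two trajectory points")
    pairs = list(zip(points, points[1:]))
    eps_s = sum(dist(a, b) for a, b in pairs) / len(pairs)
    eps_t = sum(b[2] - a[2] for a, b in pairs) / len(pairs)
    return eps_s, eps_t
```

For a vessel reporting every few minutes, eps_t obtained this way is close to the reporting interval, so irregular AIS gaps directly widen the temporal radius.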
In the ST-DBSCAN output, dense groups of trajectory points are labeled as clusters, and sparse points are labeled as noise (i.e., cluster label −1). When examining this noise in the context of a time series, we can observe that some subsequences of noise points (i.e., noise segments) exhibit some degree of coherence and persistence over time. These temporal noise segments, despite being labeled as noise due to their sparsity, may represent meaningful fishing patterns. Therefore, to improve the coherence of the segmentation results at the temporal level, we further propose a post-processing step that re-evaluates these noise segments and treats them as independent new clusters.
After clustering, we apply post-processing to improve results. Some consecutive noise points may represent meaningful fishing patterns despite low density. We identify these temporal segments and reclassify them as new clusters when they exhibit temporal continuity and sufficient duration. Specifically, continuous sequences of noise points are promoted to new clusters if they maintain temporal coherence and exceed a minimum duration threshold, ensuring that only persistent behavioral patterns are captured rather than transient movements. This captures fishing activities in less crowded areas or during dispersed operations.
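The promotion of persistent noise segments can be sketched as follows. The text does not give numeric thresholds, so min_duration (minimum span of a run) and max_gap (largest time step still counted as continuous) are hypothetical parameters introduced only for illustration, as is the function name.

```python
def promote_noise_segments(labels, times, min_duration, max_gap):
    """Relabel temporally continuous runs of noise (-1) as new clusters.

    labels and times are chronological and aligned; min_duration and max_gap
    are illustrative thresholds, not values taken from the text.
    """
    out = list(labels)
    next_id = max([lab for lab in labels if lab != -1], default=-1) + 1
    i, n = 0, len(labels)
    while i < n:
        if labels[i] != -1:
            i += 1
            continue
        j = i
        # extend the run while the next point is noise and close enough in time
        while j + 1 < n and labels[j + 1] == -1 and times[j + 1] - times[j] <= max_gap:
            j += 1
        if times[j] - times[i] >= min_duration:
            for k in range(i, j + 1):
                out[k] = next_id
            next_id += 1
        i = j + 1
    return out
```

Short or isolated noise runs keep the label −1, so transient movements are not turned into clusters.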
3.4. Trajectory Segmentation
Trajectory segmentation is necessary because fishing patterns may not hold throughout the entire trajectory, but rather in specific parts of it. Traditional methods use fixed-size windows that cannot adapt to varying fishing behaviors. Fixed-window approaches often merge segments with distinct motion patterns, thereby reducing the accuracy of identifying fishing activities. Current adaptive segmentation methods face two key limitations. First, they typically adjust windows based on single parameters, such as speed, missing the multi-dimensional nature of fishing movements. Second, the continuous-valued data in the AIS message requires methods that consider both spatial patterns and temporal continuity simultaneously. Instead of working with individual points, effective segmentation must preserve behavioral coherence across trajectory segments. In addition, when vessel trajectories exhibit diverse spatiotemporal patterns, such as rapid transitions from high-speed navigation to low-speed fishing, a single adaptive criterion often fails to subdivide different fishing behavioral trajectory segments accurately.
Our adaptive segmentation leverages spatiotemporal clusters as fundamental units rather than arbitrary time intervals. The segmentation divides the trajectories into comparable segments by aligning window boundaries with cluster transitions. This approach ensures that each segment captures coherent behavioral patterns identified by ST-DBSCAN clustering, whether representing concentrated fishing activities or transit movements. The method adapts window lengths based on actual vessel activities rather than predetermined thresholds.
Window length thresholds are determined dynamically from trajectory properties. For a trajectory of length L, the initial threshold is set to L/20, balancing segment granularity with computational efficiency. This ratio ensures that there are sufficient data points within each window for meaningful feature extraction, while avoiding over-segmentation. The adaptive mechanism operates by accumulating spatiotemporal clusters until reaching the desired window size. Rather than using fixed-length windows, the algorithm dynamically includes clusters sequentially until the total number of points approaches the threshold. This approach ensures that window boundaries naturally align with behavioral transitions identified by ST-DBSCAN clustering, with each window containing a variable number of clusters depending on their density and size.
As illustrated in Figure 3, the segmentation process follows four steps. First, trajectory points are organized chronologically with their cluster labels from ST-DBSCAN. Second, windows are constructed by accumulating clusters until the length threshold is reached, starting from the first cluster. Third, subsequent windows incorporate a 50% overlap to maintain continuity between segments, preventing abrupt transitions at window boundaries. Finally, this process iterates until the entire trajectory is covered. To detect complete fishing activities, a run-length encoding technique is used to combine close fishing windows in post-processing.
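One possible reading of the four steps is sketched below. It is illustrative only: the text does not specify how the 50% overlap is realized, so the sketch assumes that each new window starts about half of the previous window's cluster runs later (and always at least one run later). The threshold follows the L/20 rule through the ratio argument; the function names are hypothetical.

```python
from itertools import groupby


def cluster_runs(labels):
    """Split a chronological label sequence into (start, end) runs of equal label; end is exclusive."""
    runs, start = [], 0
    for _, grp in groupby(labels):
        n = len(list(grp))
        runs.append((start, start + n))
        start += n
    return runs


def adaptive_windows(labels, ratio=20, overlap=0.5):
    """Build windows whose boundaries coincide with cluster transitions."""
    runs = cluster_runs(labels)
    threshold = max(1, len(labels) // ratio)  # L / ratio points per window
    windows, first = [], 0
    while first < len(runs):
        last, size = first, 0
        # accumulate whole runs until the point count reaches the threshold
        while last < len(runs) and size < threshold:
            size += runs[last][1] - runs[last][0]
            last += 1
        windows.append((runs[first][0], runs[last - 1][1]))
        if last == len(runs):
            break
        # assumed overlap rule: advance by roughly half of the runs just used
        first += max(1, int((last - first) * (1 - overlap)))
    return windows
```

Because whole runs are accumulated, a window never cuts through a cluster, and window lengths vary with cluster size.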
Using clusters as segmentation units allows the method to account for heterogeneous movement patterns within trajectories. This approach captures both concentrated fishing activities and transitional movements. The adaptive segmentation handles the uncertainty inherent in fishing vessel movements, distinguishing between purposeful fishing operations and random vessel wandering. The result is trajectory segments that correspond to actual fishing behaviors rather than arbitrary divisions.
3.5. Feature Extraction
To classify the trajectories, features are extracted from the tracks, enabling the modeling of ship behavior. Various factors, including vessel performance, hydrological characteristics, and meteorological conditions, influence ship motion. Within each trajectory segment, vessels exhibit distinct patterns during fishing operations compared to transit movements. During a fishing trip, the movement profiles of vessels depend on the activity they are engaged in. Therefore, we extract features that capture both temporal dynamics and spatial patterns to distinguish these behaviors [27]. The multi-dimensional feature set enables more accurate classification of fishing behavior compared to single-domain methods, particularly in distinguishing complex fishing operations from transit movements [28]. We therefore select features that directly reflect vessel operational states. The kinematics of ships according to different fishing gears can be effectively classified using movement parameters extracted from each segment. All features are calculated within the adaptive windows to maintain behavioral consistency. Temporal and spatial features provide different but necessary information. Temporal features measure changes in speed and direction over time. Spatial features describe the shape and pattern of vessel movements. Using only one type leads to errors. For example, fishing vessels exhibit similar low speeds during both slow transit and active fishing. However, slow transit typically follows straight paths, while fishing operations create curved or repetitive patterns. By analyzing both speed and path shape together, the algorithm can correctly distinguish between these different vessel states. The features used in this study fall into six main categories.
(1) Velocity Dynamics: This includes average velocity and the frequency of velocity change.
Average velocity: The average velocity in the window represents the overall motion state, calculated as the mean of all velocity values within the window. Straight-line migration behavior tends to have high average velocities, while low average velocities or near-stalling states tend to be low-speed maneuvering for fishing operations.
Velocity change frequency: This metric counts the number of short-term velocity changes, which are likely to be associated with fishing activities. The frequency is calculated as the number of significant velocity changes (where |vi+1 − vi| > δv) divided by the window duration ΔT. The number of speed changes reflects the ship’s movement in response to the conditions of the fishery and the distribution of the fish school.
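The two velocity features can be written compactly. This is a minimal sketch assuming a list of speeds sampled within one window, a threshold delta_v, and a window duration in the same time unit as the desired frequency; the function names are hypothetical.

```python
def average_velocity(v):
    """Mean of all speed values in the window."""
    return sum(v) / len(v)


def velocity_change_frequency(v, delta_v, window_duration):
    """Number of consecutive speed changes with |v[i+1] - v[i]| > delta_v, per unit time."""
    changes = sum(1 for a, b in zip(v, v[1:]) if abs(b - a) > delta_v)
    return changes / window_duration
```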
(2) Acceleration and steering patterns: This feature class captures the dynamic maneuvering behavior of fishing vessels, such as the volatility of acceleration and the frequency of turning events.
Acceleration volatility: This feature calculates the variability in acceleration patterns. For the provided acceleration sequence {a1, a2, …, ab} within the window, the variance in the acceleration is calculated as follows:
$$\sigma_a^2 = \frac{1}{b}\sum_{i=1}^{b}\left(a_i - \bar{a}\right)^2$$
where ā denotes the average acceleration within the window, and b is the number of trajectory points within the window. Larger acceleration variance indicates frequent correction of fishing vessels’ propulsion, which is consistent with fine-tuning movements during fishing operations.
Turning event frequency: It measures the frequency of directional changes within the window. Given the heading sequence {θ1, θ2, …, θb} and a predefined directional change threshold θthr, it is calculated as follows:
$$f_{turn} = \frac{1}{\Delta T}\sum_{i=1}^{b-1} I\left(\Delta\theta_i - \theta_{thr}\right)$$
where ΔT denotes the time length of the window, Δθi = |θi+1 − θi| represents the absolute difference between adjacent headings, and I(Δθi − θthr) is an indicator function that equals 1 if Δθi > θthr and 0 otherwise. Generally, a high number of turning events implies frequent direction changes in the short term, most commonly corresponding to the repeated circling or maneuvering behaviors of vessels in fishing areas [29].
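Both maneuvering features can be sketched as follows. Two assumptions go beyond the text: the variance is taken as the population variance (division by the number of points), and heading differences are wrapped at 360° so that a change from 350° to 10° counts as 20° rather than 340°. The function names are hypothetical.

```python
def acceleration_variance(a):
    """Population variance of the acceleration values in the window."""
    mean = sum(a) / len(a)
    return sum((x - mean) ** 2 for x in a) / len(a)


def heading_change(h1, h2):
    """Smallest absolute angle between two headings in degrees (wrap-around is an added assumption)."""
    d = abs(h2 - h1) % 360.0
    return min(d, 360.0 - d)


def turning_event_frequency(headings, theta_thr, window_duration):
    """Count of adjacent heading changes above theta_thr, divided by the window duration."""
    events = sum(
        1 for h1, h2 in zip(headings, headings[1:])
        if heading_change(h1, h2) > theta_thr
    )
    return events / window_duration
```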
(3) Low-speed dwelling and temporal features: Two features are utilized to quantify low-speed dwelling behaviors, which are crucial for identifying fishing operations.
Low-speed ratio: This feature calculates the ratio of low-speed points within the window:
$$R_{low} = \frac{1}{b}\sum_{i=1}^{b} I\left(v_i \le v_\epsilon\right)$$
where vϵ is a predefined low-speed threshold, b is the length of the speed sequence within the window, and I() is an indicator function that equals 1 if vi ≤ vϵ and 0 otherwise. A high value of the low-speed ratio indicates that the vessel is more likely to be conducting slow search or net-casting operations in a specific sea area.
Maximum dwell duration: This feature calculates the maximum duration of continuous low-speed segments within the window, where continuous low-speed segments are identified using the same low-speed threshold vϵ.
$$T_{dwell} = \max\{t_i\}$$
where {ti} denotes the set of durations for continuous low-speed segments within the window. This feature can effectively discriminate between transient deceleration events and sustained fishing operations through duration-based thresholding. Focusing on continuous low-speed segments rather than isolated points can reduce misclassification caused by sporadic AIS gaps and capture operationally significant dwelling patterns.
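A minimal sketch of the two dwelling features is given below, assuming aligned lists of speeds and timestamps for one window. Measuring the duration of a low-speed run from its first to its last timestamp is an assumption; the function names are hypothetical.

```python
def low_speed_ratio(v, v_eps):
    """Fraction of points in the window with speed <= v_eps."""
    return sum(1 for x in v if x <= v_eps) / len(v)


def max_dwell_duration(v, t, v_eps):
    """Longest time span covered by consecutive points with speed <= v_eps."""
    best, start = 0.0, None
    for speed, ts in zip(v, t):
        if speed <= v_eps:
            if start is None:
                start = ts  # a new low-speed run begins here
            best = max(best, ts - start)
        else:
            start = None  # the run is interrupted by a faster point
    return best
```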
In addition to the above temporal features that emphasize motion dynamics, spatial patterns provide complementary information about the fishing behaviors of vessels. Fishing operations often generate distinctive movement patterns, such as repeated turns and complex path shapes within confined areas, which differ significantly from the straight trajectories typical of transit movements. To effectively capture these patterns, we implement a set of spatial features, including geometric morphology, tortuosity, directional complexity, and spatial distribution and compactness features.
(4) Geometric morphology: This type of feature quantifies the global characteristics (i.e., overall shape, structure, and spatial extent) of the vessel trajectory within the window.
Trajectory length: It quantifies the spatial path coverage of a vessel’s movement and can be calculated by accumulating the distance between consecutive points within the window.
$$L = \sum_{i=1}^{b-1} d\left(p_i, p_{i+1}\right)$$
where d(pi, pi+1) denotes the geographical distance between point pi and its next point pi+1, and b is the number of trajectory points within the window. Fishing trajectories typically exhibit longer cumulative lengths within localized areas, which are characterized by a higher density of trajectory points due to repetitive search patterns.
Straightness: The straightness index calculates the ratio of the total trajectory length L to the displacement (direct distance between start point p1 and end point pb), which can be used to further distinguish linear transit from curved fishing paths.
$$R_{straight} = \frac{L}{d\left(p_1, p_b\right)}$$
When Rstraight approaches 1, the vessel trajectory approximates a straight line. When Rstraight is significantly greater than 1, this indicates a curved trajectory, which aligns with the meandering pattern generally observed during fishing operations.
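The two geometric features follow directly from their definitions. The sketch below assumes points given as (x, y) pairs in projected coordinates so that the Euclidean distance can stand in for the geographical distance; returning infinity for a closed path (zero displacement) is an added convention, and the function names are hypothetical.

```python
import math


def path_length(pts):
    """Sum of distances between consecutive points."""
    return sum(math.dist(a, b) for a, b in zip(pts, pts[1:]))


def straightness(pts):
    """Path length divided by start-to-end displacement (1.0 for a straight line)."""
    disp = math.dist(pts[0], pts[-1])
    return math.inf if disp == 0 else path_length(pts) / disp
```

For example, the path (0, 0) → (3, 4) → (6, 0) has length 10 and displacement 6, giving a ratio of about 1.67.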
(5) Tortuosity and directional complexity: These two features characterize the local path variability (e.g., curvature and turning complexity) of the vessel trajectory within the window.
Local sinuosity: It evaluates the curvature or twistiness of the path by comparing the total path length to the maximum straight-line segment distance within the window, as defined in Equation (8).
where b is the number of trajectory points within the window and L denotes the trajectory path distance within the window.
High sinuosity signifies that the trajectory exhibits multiple turns, windings, and localized dwelling patterns within the spatial domain [30]. Complex fishing behaviors, such as trawling or seine netting, typically result in high sinuosity values due to frequent directional changes and confined movement patterns.
Directional diversity: This metric quantifies the diversity in the heading directions of the vessel. Specifically, the direction space (0~360°) is partitioned into M equal intervals, and the number of trajectory points in each interval is enumerated. Then, the directional distribution entropy is defined to measure the uniformity of directional distribution, as formalized in Equation (9).
$$H_\theta = -\sum_{\theta=1}^{M} \frac{n_\theta}{b}\log\frac{n_\theta}{b} \tag{9}$$
where b is the total number of trajectory points within the window, and nθ is the number of trajectory points whose direction falls within the direction interval θ. High Hθ values indicate uniform directional distribution and frequent turning, which is often associated with fishing operations.
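The directional entropy can be sketched as below. The number of bins m = 8 and the natural logarithm are assumptions, since the text fixes neither; empty bins are skipped, following the usual convention that 0 · log 0 = 0. The function name is hypothetical.

```python
import math


def directional_entropy(headings, m=8):
    """Shannon entropy of headings binned into m equal intervals over [0, 360)."""
    counts = [0] * m
    width = 360.0 / m
    for h in headings:
        counts[int((h % 360.0) // width) % m] += 1
    b = len(headings)
    return -sum((n / b) * math.log(n / b) for n in counts if n > 0)
```

A window with a single heading yields 0, while headings spread evenly over all m bins yield the maximum value log(m).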
(6) Spatial distribution and compactness features: Two features, the convex hull area and ratio and the spatial density variance, describe the spatial distribution and compactness of trajectory points within the window.
Convex hull area and ratio: We construct the minimum convex hull of the trajectory points within the window and compute the area (denoted as Ahull) and perimeter (denoted as Phull) of the convex hull. Then, the aspect ratio of the convex hull is defined as follows:
where q denotes the number of vertices of the convex hull, and d(pi, pj) represents the distance between any pair of vertices of the convex hull.
Fishing activities often involve repeated maneuvering, leading to a relatively compact convex hull, characterized by smaller Ahull values and Rhull values approaching 1.
Spatial density variance: Point density is defined as the number of points falling within a circular neighborhood of radius r centered at a given point. We use the variance in spatial point density to measure the spatial distribution of trajectory points within the window, which is formulated as follows:
$$\sigma_\rho^2 = \frac{1}{b}\sum_{i=1}^{b}\left(\rho_i - \bar{\rho}\right)^2$$
where b is the number of trajectory points within the window, ρi denotes the density of point pi, and ρ̄ represents the mean density of points within the window. A higher density variance suggests local point clustering within the trajectory, which aligns with the high-density wandering patterns of fishing areas [31].
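The spatial density variance can be sketched with a brute-force neighbor count. The sketch assumes (x, y) points in projected coordinates, counts each point in its own neighborhood, and uses the population variance; these choices and the function name are assumptions.

```python
import math


def density_variance(pts, r):
    """Variance of per-point neighbor counts within radius r (each point counts itself)."""
    dens = [sum(1 for q in pts if math.dist(p, q) <= r) for p in pts]
    mean = sum(dens) / len(dens)
    return sum((d - mean) ** 2 for d in dens) / len(dens)
```

Evenly spaced transit points give a variance near zero, whereas a tight group of points plus a few outliers gives a clearly positive value.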
3.6. Classification and Post-Processing
Vessel trajectory classification requires algorithms that can handle multi-dimensional features while maintaining computational efficiency. Tree-structure-based classifiers generally achieve high evaluation metrics for trajectory-based vessel classification [32,33]. XGBoost is selected as the classification algorithm for this study. The method constructs an ensemble of decision trees using gradient boosting, enabling it to capture nonlinear patterns in vessel movements. The XGBoost and Random Forest algorithms outperformed other methods in classification metrics for this problem. We selected XGBoost for its computational efficiency and robust performance with multi-dimensional features. The key contributions of STACS lie in the integrated framework that combines spatiotemporal clustering, adaptive segmentation, and comprehensive feature extraction, enabling accurate recognition of fishing activities. The algorithm processes the extracted features from each trajectory segment to predict whether the segment is fishing or non-fishing. Its efficiency in handling sparse features, rapid training convergence, and robust generalization have been demonstrated in trajectory analysis and vessel type identification tasks [32,34,35].
During model construction, labeled trajectory subsegments are partitioned into training and validation sets. Formally, the training set comprises M labeled instances D = {(xj, yj)}, j = 1, …, M, where xj denotes the feature vector of the j-th trajectory subsegment, and yj ∈ {0, 1} indicates the fishing status (1 for fishing and 0 for navigation). XGBoost constructs an additive ensemble of decision trees by minimizing the following regularized objective function.
$$\mathcal{L}(\Theta) = \sum_{j=1}^{M} l\left(y_j, \hat{y}_j\right) + \sum_{t} \Omega\left(f_t\right)$$
where l(·) represents the loss function (e.g., logistic loss), ŷj denotes the predicted probability, Ω(ft) penalizes the complexity of the t-th tree, and Θ encompasses all tree parameters. Through second-order gradient approximation and greedy tree-splitting optimization, XGBoost incrementally improves predictive accuracy while controlling overfitting.
After classification, each trajectory segment receives a binary label indicating whether it is fishing or non-fishing. However, fishing activities can last for hours or even days, and multiple fishing windows likely belong to a single complete fishing activity. Point-based classification often yields fragmented results, where continuous fishing operations appear as alternating segments of fishing and non-fishing activity. Therefore, post-processing is necessary to merge these fragments into coherent fishing episodes.
To resolve transient prediction inconsistencies, run-length encoding (RLE) compresses the label sequence {y1, y2, …, yn} into alternating intervals of consecutive fishing (y = 1) and non-fishing (y = 0) windows.
where
represent the duration (window count) of each interval.
Valid fishing segments are identified via two criteria: (1) Boundary Integrity: Segments must start and end with fishing intervals (
). (2) Interval Dominance: Non-fishing interruptions
must satisfy the following:
This constraint ensures that only brief interruptions are merged while preserving distinct fishing operations. In practice, temporary pauses within a fishing operation are typically much shorter than the fishing activity itself. When a non-fishing period exceeds the duration of adjacent fishing segments, it is more likely to represent a genuine transition between separate fishing operations rather than a temporary interruption. This prevents the over-merging of independent fishing events while effectively reducing fragmentation from operational pauses.
RLE post-processing corrects prediction inconsistencies by merging brief non-fishing segments into surrounding fishing segments when they meet defined criteria. This simple scan extracts continuous fishing segments while identifying non-fishing segments as complementary intervals, ensuring temporal coherence in activity recognition.
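The RLE scan can be sketched as follows. The merge rule used here is an assumed reading of the interval-dominance criterion: a non-fishing run is relabeled as fishing only when it lies between two fishing runs and is shorter than both of them. The function names are hypothetical, and the rule is applied in a single pass over the original run lengths.

```python
from itertools import groupby


def rle(labels):
    """Run-length encode a 0/1 sequence into [label, run_length] pairs."""
    return [[k, len(list(g))] for k, g in groupby(labels)]


def merge_short_gaps(labels):
    """Relabel short non-fishing runs enclosed by longer fishing runs as fishing."""
    runs = rle(labels)
    # only interior runs are considered, so merged spans start and end with fishing
    for i in range(1, len(runs) - 1):
        label, length = runs[i]
        if label == 0 and length < min(runs[i - 1][1], runs[i + 1][1]):
            runs[i][0] = 1
    return [k for k, n in runs for _ in range(n)]
```

In a binary sequence the neighbors of an interior non-fishing run are always fishing runs, so no explicit label check on the neighbors is needed.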
3.7. Performance Evaluation
The evaluation examines STACS performance using two datasets with different characteristics. The Danish dataset contains high-resolution coastal trajectories, while the Global Fishing Watch (GFW) dataset provides global-scale vessel movements. Two evaluation levels are used. Point-level metrics measure classification accuracy for individual trajectory points. Segment-level metrics assess the coherence of fishing episodes. This approach evaluates both classification accuracy and the continuity of detected fishing activities, which is essential for practical fisheries management applications.
Specifically, at the point level, we adopt three standard metrics derived from the confusion matrix between ground-truth labels and the predicted labels of fishing activities.
(1) Precision: It is defined as the ratio of the number of correctly classified fishing points to the total number of points predicted as fishing, which reflects the reliability of detected fishing points.
$$Precision = \frac{TP}{TP + FP}$$
where TP represents correctly classified fishing points and FP denotes non-fishing points incorrectly classified as fishing points.
(2) Recall: It is defined as the ratio of the number of correctly classified fishing points to the total number of actual fishing points, which measures the ability of the algorithm to detect true fishing activities.
$$Recall = \frac{TP}{TP + FN}$$
where FN represents fishing points incorrectly classified as non-fishing points.
(3) F1-score: It is defined as the harmonic mean of precision and recall, which provides a balanced evaluation metric of the algorithm’s ability to identify fishing points.
$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
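The three point-level metrics can be computed from the confusion counts as sketched below, assuming aligned lists of 0/1 ground-truth and predicted labels. Returning 0.0 when a denominator is zero is an added convention, and the function name is hypothetical.

```python
def point_metrics(y_true, y_pred):
    """Return (precision, recall, f1) for the fishing class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```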
For segment-level evaluation, we quantify semantic coherence through two complementary metrics and their harmonic mean. Let ΘT = {θ1, θ2, …, θT} denote the predicted fishing segments and ΨV = {Ψ1, Ψ2, …, ΨV} represent the ground-truth fishing episodes. We use average purity, average coverage, and their harmonic mean to assess segment-level classification quality.
(1) Average Purity (AP): It evaluates internal consistency within predicted segments:
$$AP = \frac{1}{T}\sum_{i=1}^{T}\frac{\max_{j \in \{1,\dots,L\}} N_{i,j}}{\sum_{j=1}^{L} N_{i,j}}$$
where Ni,j denotes the number of points with label γj in segment θi, L represents the total number of unique ground-truth labels, T denotes the total number of predicted segments, j ∈ {1, …, L} indicates the label index, and i ∈ {1, …, T} represents the segment index.
(2) Average Coverage (AC): It measures how completely each ground-truth fishing segment is captured by the predicted fishing segments, which quantifies the completeness of fishing activities.
$$AC = \frac{1}{V}\sum_{i=1}^{V}\frac{\max_{j \in \{1,\dots,T\}} N_{intersects}\left(\Psi_i, \theta_j\right)}{N_i}$$
where Nintersects(Ψi, θj) represents the number of data points that intersect between the ground-truth fishing segment Ψi and the predicted fishing segment θj, Ni denotes the total number of data points in the ground-truth fishing segment Ψi, V is the total number of ground-truth fishing segments, i ∈ {1, …, V} indicates the index of ground-truth segments, and j ∈ {1, …, T} represents the index of predicted segments.
(3) Harmonic Mean (H): It resolves the intrinsic trade-off between purity and coverage and can be calculated as follows:
$$H = \frac{2 \times AP \times AC}{AP + AC}$$
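The segment-level metrics can be sketched as follows. Segments are assumed to be lists of point indices, purity is taken as the share of the dominant ground-truth label in each predicted segment, and coverage is taken as the largest overlap of any single predicted segment with each ground-truth segment; the use of the maximum in both cases is an assumption, as are the function names.

```python
from collections import Counter


def average_purity(pred_segments, true_labels):
    """Mean share of the dominant ground-truth label within each predicted segment."""
    purities = []
    for seg in pred_segments:
        counts = Counter(true_labels[i] for i in seg)
        purities.append(max(counts.values()) / len(seg))
    return sum(purities) / len(purities)


def average_coverage(true_segments, pred_segments):
    """Mean of the best single-segment overlap ratio for each ground-truth segment."""
    coverages = []
    for gt in true_segments:
        gt_set = set(gt)
        best = max(len(gt_set & set(p)) for p in pred_segments)
        coverages.append(best / len(gt_set))
    return sum(coverages) / len(coverages)


def harmonic_mean(ap, ac):
    """Harmonic mean of purity and coverage (0.0 when both are zero)."""
    return 2 * ap * ac / (ap + ac) if ap + ac else 0.0
```

Over-segmentation raises purity but lowers coverage, and under-segmentation does the opposite, which is why the harmonic mean is reported alongside both.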
Seven baseline methods are compared to demonstrate the performance of the proposed STACS algorithm. Traditional approaches include speed thresholding (1–5 knots) and Gaussian mixture models (GMMs) for analyzing bimodal speed distributions [16,36]. Advanced methods comprise Hidden Markov Models (HMMs) for state-based behavioral classification using movement parameters [37], convolutional neural networks (CNNs) for spatiotemporal feature extraction, and three segmentation-centric algorithms: GRASP-UTS for multi-objective optimization, sliding window segmentation (SWS), and WBS-RLE, which combines wild binary segmentation with run-length encoding [5,11,14,20].
Experiments are conducted using standardized protocols to ensure comparability. Both the pre-processing pipelines and hardware configurations are identical for all approaches, with trajectories divided by unique vessel identifiers to preserve temporal continuity. This setup enables both instantaneous classification fidelity and operational episode coherence to be stringently assessed on heterogeneous maritime scenarios. Detailed parameter settings and implementation specifications for all baseline methods are provided in Appendix A. While inspired by the original publications, parameters were adjusted to accommodate the characteristics of our dataset and experimental requirements. All processing and computations were performed on an Intel i9-13900K/NVIDIA RTX 4090 platform using Python 3.8 with Pandas and NumPy libraries. The standardized pipeline ensures reproducibility across experiments.