1. Introduction
The Automatic Identification System (AIS) functions as a cornerstone of maritime technology, facilitating the real-time exchange of critical navigational data (including vessel position, speed, and course) to mitigate collision risks and enhance the efficiency of maritime operations. To strengthen navigational safety and ensure effective traffic management, the International Maritime Organization has mandated the installation and operation of AIS on all passenger ships and vessels engaged in international voyages, establishing it as an essential component of the global maritime safety framework [1,2]. By providing dynamic information (e.g., position, speed, and course), static information (e.g., vessel name and Maritime Mobile Service Identity (MMSI)), and voyage-related information (e.g., destination and estimated time of arrival), AIS significantly improves situational awareness among ships and contributes to the systematic management of maritime traffic flow [1].
However, AIS data acquired in real-world maritime environments is frequently compromised by data loss and errors. The primary causes include packet collisions arising from congestion and interference in Very High Frequency (VHF) communication channels [3], signal attenuation or blockage due to adverse weather conditions and topographic obstructions [4], as well as equipment failures or deliberate signal manipulation (spoofing) [5]. These data irregularities extend beyond intermittent gaps, significantly undermining the overall reliability of maritime traffic systems. Recent comprehensive reviews on AIS track anomaly detection and studies on base station credibility monitoring under spoofing scenarios have highlighted these vulnerabilities [6,7]. To address these data integrity challenges, this study proposes an interpolation framework designed to ensure kinematic consistency, even in the presence of missing or erroneous AIS segments.
Notably, with the emergence of the Maritime Autonomous Surface Ship (MASS) era, the quality of AIS data has become paramount. AIS serves as a primary information source that enables MASS to maintain situational awareness of surrounding vessels and the maritime environment, forming the foundation for critical navigational decisions such as voyage planning and collision avoidance. However, when AIS data is compromised by losses or errors due to the aforementioned factors, autonomous systems are forced to rely on inaccurate or incomplete information. This reliance can precipitate erroneous decision-making processes, potentially resulting in catastrophic maritime accidents. Therefore, beyond the existing challenges, such as reduced accuracy in Vessel Traffic Service (VTS) monitoring [7] and diminished reliability in maritime accident investigations and route analyses [5,8], guaranteeing the continuity and accuracy of AIS data is indispensable for ensuring the safe and efficient operation of MASS.
To mitigate these practical limitations of AIS data quality and safeguard mission-critical systems that depend on data-driven decision-making (such as MASS), the application of interpolation methods is essential. Interpolation functions as a fundamental mechanism for reconstructing fragmented trajectories by estimating missing navigational parameters, including vessel position, speed, and course. This process ensures data continuity and generates information that approximates the vessel’s actual dynamic state with high fidelity [8,9].
Recent studies have highlighted the deficiencies of conventional interpolation methods, such as linear and cubic spline interpolation, in adequately capturing the complex non-linear dynamics of vessel motion [10,11]. Consequently, alternative approaches have been proposed, including Improved Kinematic Interpolation (IKI) and machine learning (ML)-based methods such as Long Short-Term Memory (LSTM) networks and Random Forests [4,8]. While these methods demonstrate improved performance compared to traditional approaches, the direct application of kinematic interpolation to real-world AIS data, which is characterized by frequent losses and errors, can lead to error propagation. Moreover, ML-based techniques face significant challenges due to their reliance on large-scale training datasets, which are often scarce or computationally expensive to acquire in the maritime domain.
To address these challenges, this study proposes a boundary-aware, data-efficient interpolation framework (BAARTR) that robustly reconstructs vessel trajectories under realistic AIS loss and noise conditions, leveraging only time, latitude, and longitude inputs. The core innovation involves estimating the endpoint velocities of each missing segment and enforcing them as boundary constraints during interpolation, thereby preserving kinematic consistency and suppressing overshoots at curvature transitions. The reconstruction strategy adapts to the AIS reporting interval: for short gaps (defined as those shorter than the nominal reporting period), vessel speed is estimated from positions using a central-difference scheme; for long or irregular gaps, trajectory vectors are decomposed, and a complementary hierarchical regression model is used to predict endpoint velocities. These velocities are subsequently integrated into a clamped spline interpolation to produce kinematically plausible paths, all while maintaining a lightweight computational structure (local windows with ≤16 samples and a few iterations, no pre-training).
2. Related Work
A significant body of research has addressed vessel trajectory reconstruction using AIS data to ensure safe and efficient navigation. Existing literature can be broadly categorized into three streams: statistical and rule-based approaches, kinematic model-based approaches, and deep learning-based approaches.
Early research primarily utilized statistical characteristics and heuristic rules derived from navigation patterns. Sang et al. [12] established a reconstruction procedure based on stepwise outlier removal and segment connection rules. In a subsequent study, Sang et al. [13] refined this method by incorporating three heuristic rules and combining linear, curvilinear, and arc segments to improve reconstruction accuracy in inland waterways. Zhang et al. [14] leveraged vector analysis for anomaly detection, differentiating between straight and curved segments to apply linear and cubic spline interpolation, respectively. Addressing Unmanned Surface Vehicle scenarios, Shi et al. [15] integrated outlier removal, Empirical Mode Decomposition for noise reduction, and curve fitting with Fermat’s spiral, effectively demonstrating the feasibility of AIS-based approaches in real-world waters. To mitigate uncertainty in sparse AIS data, Zhang et al. [16] employed an ant colony optimization algorithm with weighted nodes and edges, successfully reconstructing optimal paths in open-sea environments. More recently, Deng et al. [17] utilized Graph Signal Variation Detection to eliminate false connections in complex crossing trajectories, thereby enhancing the stability of downstream interpolation and prediction. Furthermore, Chen and Huang [18] proposed a clustering algorithm based on trajectory reconstruction, an unsupervised method that aggregates similar trajectory patterns without manual labeling. Finally, Liang et al. [19] developed the AISClean framework, which integrates statistical detection, polynomial interpolation, and Dynamic Time Warping, verifying the capability to reconstruct long-term missing segments.
Concurrently with statistical approaches, research incorporating the explicit physical motion characteristics of vessels has gained significant traction. Hedger et al. [20] presented a position–velocity optimal interpolation framework for hydrophone array data, highlighting the criticality of interpolation design in reconstructing trajectories from sparse and irregular observations. Similarly, Hintzen et al. [21] applied cubic Hermite splines to Vessel Monitoring System (VMS) data; by incorporating derivative values (i.e., speed) at each data point, this method significantly improved trajectory accuracy compared to standard linear interpolation. Long [22] introduced a kinematic interpolation scheme that directly integrates velocity and acceleration into the interpolation process, demonstrating its effectiveness in bridging AIS data gaps. Building on this foundation, Du et al. [23] enhanced linear interpolation by incorporating quadratic polynomials constrained by speed, heading, and estimated acceleration. Guo et al. [10] further advanced this line of inquiry by introducing IKI, which refined acceleration estimation accuracy and boosted overall trajectory reconstruction performance. Addressing the challenge of irregular AIS sampling intervals, Zaman et al. [24] employed a two-stage approach comprising trajectory reconstruction followed by waypoint detection, which outperformed conventional methods in identifying turning points. More recently, Liu et al. [25] combined navigation state recognition with Bi-Directional Kinematic Interpolation (BDKI), leveraging both forward and backward boundary conditions to improve reconstruction accuracy. This approach aligns with the boundary-aware methodology proposed in our study, further validating the empirical effectiveness of such kinematic constraints. Wang et al. [26] corroborated these insights by reporting that piecewise cubic Hermite interpolating polynomial (Hermite) preprocessing improved trajectory classification and prediction in real-world confluence waterways, a finding consistent with the comparative analysis between Hermite and clamped interpolation presented herein.
In recent years, deep learning paradigms have been increasingly deployed to capture complex temporal patterns in AIS data for trajectory reconstruction. Zhong et al. [27] leveraged a bidirectional LSTM recurrent neural network model, achieving high accuracy with an average RMSE of approximately 10 m. Adapting architectures from computer vision, Li et al. [28] applied the U-Net model (originally designed for image segmentation) to trajectory reconstruction, demonstrating superior performance in handling curved segments. To address data stability, Murray and Perera [29] utilized dual autoencoders to compress and reconstruct spatiotemporal features. Further enhancing accuracy, Chen et al. [30] proposed an integrated framework that combines outlier removal (via moving average models) with neural network prediction, effectively mitigating AIS noise while reconstructing realistic trajectories. Similarly, Wu et al. [31] introduced a unified solution merging low-rank tensor completion with outlier separation to handle missing data and noise simultaneously. Advancing multi-step prediction capabilities, Ye et al. [32] integrated feature-correlation-based variable selection with a Seq2Seq model. Taking a hybrid approach, Li et al. [33] proposed a Graph Attention Network–LSTM model capable of jointly learning spatiotemporal features, which reduced the mean error by 44–56% compared to existing methods. Expanding the domain to fisheries, Zhao et al. [34] introduced HiTrip, a deep learning-based framework fusing fishing vessel VMS data with oceanographic data; this method overcame challenges associated with irregular vessel movements and low-resolution datasets to accurately reconstruct missing segments of historical trajectories. Most recently, Chen et al. [35] presented a framework integrating advanced denoising and interpolation, demonstrating the potential to simultaneously address real-world noise and data gaps.
Collectively, these studies have consistently advanced the accuracy of vessel trajectory reconstruction through statistical, kinematic, and deep learning approaches. However, significant limitations persist. First, despite the high performance of deep learning models, they entail substantial computational costs and data dependency; furthermore, their generalization capability often diminishes when encountering anomalous navigation patterns absent from the training dataset. Second, although kinematic models effectively capture vessel dynamics, they tend to decouple velocity estimation from position interpolation or apply a uniform interpolation strategy regardless of the missing interval duration.
To bridge these gaps, this study proposes the BAARTR framework. BAARTR employs an adaptive strategy that selectively applies central differencing or regression analysis contingent on the length of missing intervals, while precisely estimating velocity in data gaps through iterative polynomial regression (assuming variable acceleration) with limited inputs. Crucially, the predicted velocities are directly enforced as boundary conditions in a clamped spline interpolation, thereby establishing a unified process between velocity estimation and position reconstruction. Consequently, BAARTR minimizes reliance on complex deep learning models while capturing vessel dynamics with high fidelity, enabling reliable trajectory reconstruction even from sparse AIS data.
3. Research Method
3.1. Overall Framework
As illustrated in Figure 1, the proposed BAARTR framework comprises three main stages: (1) data preprocessing, (2) trajectory segmentation and velocity estimation, and (3) trajectory reconstruction.
In the preprocessing stage, raw AIS data are decoded and cleaned. Data acquisition was performed using a custom receiver system integrating a Raspberry Pi 4 with a dAISy HAT, capturing AIS messages in ASCII format. The study dataset encompasses vessels navigating the Mokpo Port waterway in the Republic of Korea, recorded from June 2023 to June 2024 (an overview is provided in Table 1).
In real-world scenarios, while raw AIS data typically includes vessel position, MMSI, and other static/dynamic parameters, frequent data anomalies are observed. Specifically, some records lack critical attributes (e.g., vessel dimensions, Speed Over Ground (SOG), Course Over Ground (COG)), and significant discrepancies often exist between the reported speed data and the actual positional displacement. To address these irregularities, this study developed the BAARTR framework to robustly reconstruct vessel trajectories even when input data is compromised.
The study area encompasses the Mokpo Port channel, which operates under regulations enforced by the West Sea Regional Coast Guard. Vessels navigating the 3.2 km-long and 450 m-wide channel between Dalli Island and the Hwawon Peninsula are subject to a maximum speed limit of 20 knots. Consequently, AIS data records indicating a SOG exceeding 20 knots were identified as anomalies and filtered out.
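The speed-limit rule above can be sketched as a simple record filter. This is a minimal illustration, not the authors' preprocessing code: the record layout (dictionaries with `mmsi` and `sog` keys), the function name, and the decision to keep records whose SOG is missing are all assumptions made for the example.

```python
KNOT_LIMIT = 20.0  # channel speed limit stated in the text, in knots


def filter_speed_anomalies(records, limit_knots=KNOT_LIMIT):
    """Drop records whose reported SOG exceeds the channel speed limit.

    Records without a SOG value are kept here (an assumption of this sketch),
    because the text notes that some records lack SOG altogether.
    """
    kept = []
    for rec in records:
        sog = rec.get("sog")
        if sog is not None and sog > limit_knots:
            continue  # reported speed above the limit: treated as an anomaly
        kept.append(rec)
    return kept


# Hypothetical records; the identifiers are placeholders, not real MMSI values.
sample = [
    {"mmsi": "A", "sog": 8.4},
    {"mmsi": "B", "sog": 23.1},
    {"mmsi": "C", "sog": None},
]
cleaned = filter_speed_anomalies(sample)
```

Only the record reporting 23.1 knots is discarded; the threshold is exposed as a parameter because other waterways would apply a different limit.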
In the trajectory segmentation and velocity estimation stage, the reconstruction strategy is adaptively determined based on the AIS reporting interval, which varies according to vessel speed as summarized in Table 2. To ensure kinematic consistency rather than relying on potentially unreliable raw SOG values reported in AIS, vessel velocity is independently recalculated using the central difference method. Based on this computed velocity, the reconstruction procedure bifurcates into two distinct pathways: when the reporting interval is short, velocity is estimated using the central difference method; conversely, when the interval is long, a regression-based velocity estimation model is applied.
The BAARTR framework employs an adaptive velocity estimation strategy contingent on the AIS reporting interval. When the transmission interval is relatively short, so that distance variations between consecutive observations are reliably captured, vessel speed is computed using the central difference method. Specifically, the velocity vector $\mathbf{v}_i$ at the $i$-th position, defined with respect to the distance between two adjacent positions, is expressed as follows:
Conversely, when the AIS reporting interval extends beyond one minute or exhibits significant irregularity, the accuracy of the central difference method diminishes due to discretization errors. To mitigate this, our approach employs a regression-based velocity prediction framework. In this scheme, position and time data for each segment serve as inputs to iteratively estimate the continuous time-series function of vessel speed.
In this study, a vessel’s trajectory is modeled as a time-series sequence within a two-dimensional vector space, where position is defined by latitude and longitude coordinates. The position vector at each time step consists of the longitude and latitude at that instant. The velocity vector, defined as the first derivative of position with respect to time, is decomposed into orthogonal components along the longitude and latitude axes to facilitate independent prediction and interpolation. This decoupling strategy ensures the stability of velocity and acceleration estimation while minimizing error propagation arising from cross-axis correlations during the reconstruction process.
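The central-difference estimate and the per-axis decomposition described above can be sketched as follows. This is a generic illustration under stated assumptions rather than the paper's exact formula: velocities are expressed in degrees per second on each axis, endpoints fall back to one-sided differences, and the function name and sample coordinates are hypothetical.

```python
def central_difference_velocity(times, lons, lats):
    """Per-axis velocity (degrees per second) for every sample.

    Interior samples use the central difference between their two neighbours;
    the first and last samples use a one-sided difference so that the output
    has the same length as the input.
    """
    n = len(times)
    if n < 2:
        raise ValueError("need at least two samples")
    v_lon, v_lat = [], []
    for i in range(n):
        lo = max(i - 1, 0)
        hi = min(i + 1, n - 1)
        dt = times[hi] - times[lo]  # assumes strictly increasing timestamps
        v_lon.append((lons[hi] - lons[lo]) / dt)
        v_lat.append((lats[hi] - lats[lo]) / dt)
    return v_lon, v_lat


# Hypothetical samples at a 10 s reporting interval.
times = [0.0, 10.0, 20.0, 30.0]
lons = [126.300, 126.301, 126.303, 126.306]
lats = [34.780, 34.780, 34.781, 34.781]
v_lon, v_lat = central_difference_velocity(times, lons, lats)
```

Treating longitude and latitude separately mirrors the decoupling described in the text; converting to metres per second would additionally require a local projection, which is omitted here.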
A vessel can be kinematically characterized as a moving object whose acceleration, velocity, and position vary over time. These changes arise from factors such as waves, wind, encounters with other vessels, obstacles, engine operations depending on the route, and maneuvers such as turning, all of which influence acceleration. To describe these dynamics, this study formulates acceleration using a non-uniform acceleration equation of motion. Although such equations can be expressed using polynomial forms ranging from quadratic to higher orders, increasing the polynomial degree tends to cause overfitting. While it is recognized that a rigorous description of vessel motion necessitates hydrodynamic modeling (e.g., MMG or Nomoto models) that accounts for mass, drag, and external forces, applying such dynamic models to sparse AIS data is often infeasible due to the unavailability of specific hydrodynamic coefficients and force inputs.
Consequently, this study adopts a parametric kinematic approach, modeling the time-variant acceleration as a quadratic polynomial. This formulation serves as a robust kinematic approximation designed to capture variable acceleration trends without overfitting, rather than attempting to simulate a fully grounded dynamic vessel model. Through successive integration with respect to time, we derive a cubic velocity model and a quartic position model for each trajectory segment.
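The successive integration mentioned above can be written out explicitly. The coefficient names $a_0, a_1, a_2$ and the initial values $v_0, p_0$ are notation assumed here for illustration, with $t$ measured from the start of the segment and the same form applied independently to the longitude and latitude components.

```latex
\begin{aligned}
a(t) &= a_0 + a_1 t + a_2 t^2, \\
v(t) &= v_0 + \int_0^t a(\tau)\,d\tau
      = v_0 + a_0 t + \frac{a_1}{2} t^2 + \frac{a_2}{3} t^3, \\
p(t) &= p_0 + \int_0^t v(\tau)\,d\tau
      = p_0 + v_0 t + \frac{a_0}{2} t^2 + \frac{a_1}{6} t^3 + \frac{a_2}{12} t^4 .
\end{aligned}
```

Each integration raises the polynomial degree by one, which is why a quadratic acceleration yields a cubic velocity model and a quartic position model.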
To enhance the accuracy of velocity estimation in missing segments, this study performs regression analysis iteratively, incorporating the estimated values from the previous step as inputs for the subsequent regression. This iterative approach ensures both the continuity and physical plausibility of segment-wise predictions, while the numbers of training samples and iterations are optimized across the entire vessel trajectory based on RMSE to prevent overfitting.
Once velocity has been estimated, BAARTR applies adaptive clamped quartic polynomial interpolation to reconstruct positions within each segment. Clamped interpolation employs the slopes at the segment endpoints as boundary constraints, thereby maximizing the continuity and smoothness of the trajectory in accordance with the estimated velocities. The interpolation polynomial is defined separately on each segment.
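To make the role of the endpoint slopes concrete, the sketch below builds a clamped interpolant on a single segment and one axis. Note that it uses a cubic Hermite polynomial, which is fully determined by the two endpoint positions and two endpoint velocities, as a simplified stand-in for the quartic form described above; the function name and the numbers in the usage lines are hypothetical.

```python
def clamped_cubic(t0, p0, v0, t1, p1, v1):
    """Return p(t) on [t0, t1] matching positions p0, p1 and slopes v0, v1.

    This is a cubic Hermite segment: the endpoint velocities act as the
    clamped boundary conditions.
    """
    h = t1 - t0
    if h <= 0:
        raise ValueError("t1 must be later than t0")

    def position(t):
        s = (t - t0) / h  # normalised time in [0, 1]
        h00 = 2 * s**3 - 3 * s**2 + 1
        h10 = s**3 - 2 * s**2 + s
        h01 = -2 * s**3 + 3 * s**2
        h11 = s**3 - s**2
        return h00 * p0 + h10 * h * v0 + h01 * p1 + h11 * h * v1

    return position


# One axis of a hypothetical 10 s gap: the vessel moves 20 units at a steady
# 2 units/s, so the clamped curve should reduce to a straight line.
steady = clamped_cubic(0.0, 0.0, 2.0, 10.0, 20.0, 2.0)
midpoint = steady(5.0)

# Same endpoints, but the vessel is decelerating towards the end of the gap.
slowing = clamped_cubic(0.0, 0.0, 3.0, 10.0, 20.0, 1.0)
```

In a two-dimensional reconstruction, one such interpolant would be built for longitude and another for latitude, each clamped with the corresponding velocity component.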
The core of this study lies in a double-anchoring strategy, whereby missing segments are reconstructed using clamped interpolation while the boundary slopes (velocities) are reliably estimated through adaptive hierarchical regression. First, the velocity estimation pathway is chosen according to the AIS reporting interval. For short intervals, the central difference method is employed to minimize noise, whereas for long intervals, boundary slopes are refined through iterative feedback of cubic velocity regression with acceleration constrained to a quadratic form. The resulting endpoint slopes are imposed as boundary conditions in the clamped quartic polynomial interpolation, effectively suppressing oscillatory artifacts in high-curvature segments while ensuring velocity continuity. The number of training samples (n) and the number of iterations are determined via RMSE-based sensitivity analysis across the entire trajectory, thereby achieving a balance between overfitting suppression and continuity assurance.
3.2. Boundary-Aware Adaptive Regression
Before regression, raw AIS trajectories were subjected to a light low-pass stabilization to suppress transient positional jitter caused by Global Positioning System noise while preserving the original trend. This step aimed to ensure numerical stability during the regression process rather than to smooth or reshape the data. The cut-off frequency was empirically determined as 0.1 Hz, which effectively removed short-term fluctuations (<10 s) without affecting vessel-scale motion patterns.
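A light low-pass stabilization of this kind can be illustrated as follows. The text gives the 0.1 Hz cut-off but not the filter design, so the first-order filter run forward and then backward (to avoid a phase lag) is an assumption of this sketch, as is the uniform sampling interval `dt`; real AIS data would first need resampling onto a regular time grid.

```python
import math


def low_pass(values, dt, cutoff_hz=0.1):
    """First-order low-pass filter applied forward and then backward.

    dt is the (assumed uniform) sampling interval in seconds. Running the
    filter in both directions cancels the phase lag of a single pass.
    """
    if not values:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)  # filter time constant
    alpha = dt / (rc + dt)                  # per-sample smoothing weight

    def one_pass(seq):
        out = [seq[0]]
        for x in seq[1:]:
            out.append(out[-1] + alpha * (x - out[-1]))
        return out

    forward = one_pass(values)
    backward = one_pass(forward[::-1])
    return backward[::-1]


# Hypothetical 1 Hz series: a constant signal and sample-to-sample jitter.
constant = low_pass([5.0] * 20, dt=1.0)
jitter = [float(i % 2) for i in range(20)]
smoothed = low_pass(jitter, dt=1.0)
```

A constant series passes through unchanged while the alternating jitter is attenuated, which is the behaviour the text asks for: suppress short-term fluctuations without reshaping the underlying trend.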
When the reporting interval is long, errors increase if trajectory reconstruction relies solely on the central difference method. To address this, the present study employs the Boundary-Aware Adaptive Regression (BAAR) technique to calculate velocity while accounting for vessel dynamics. For each data point of interest, BAAR performs multiple stages of hierarchical regression based on n training samples, differing from conventional one-time regression models by incorporating the predicted values from the previous stage into the subsequent learning process. The number of training samples (n) and the number of regression iterations are empirically predetermined through RMSE-based analysis across the entire trajectory.
Step 1: Initial Regression.
The velocity at the $i$-th point is estimated using the previously described cubic velocity model, with the training dataset drawn from the n neighboring samples. In the same manner, the velocity at the opposite endpoint of the missing segment is also estimated.
Step 2: Complementary Hierarchical Regression.
At each subsequent iterative regression, the training process incorporates the predicted values obtained from the previous stage.
The iterative regression aims at empirical stabilization, not a formal convergence proof. Preliminary experiments were performed over a range of training sample sizes (n) and regression iteration counts. The results indicated that n = 16 provided the lowest RMSE while preserving local responsiveness, whereas larger n values led to excessive smoothing. Similarly, a small number of iterations yielded an optimal balance between regression refinement and computational cost, as RMSE improvement saturated beyond that point. Within these ranges, BAARTR showed stable RMSE variation (<5%), indicating that its performance is not highly sensitive to parameter changes. Additionally, these parameters vary depending on factors such as the ports through which vessels navigate and vessel traffic volume.
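One possible reading of the two steps above is sketched below: a cubic velocity model is fitted to the most recent n samples, the prediction is appended to a sliding training window, and the fit is repeated for a fixed number of stages until the gap boundary is reached. This is an interpretation for illustration only, not the authors' implementation; the function names, the sliding-window feedback, and the equal stage spacing are assumptions.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    size = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * size
    for r in range(size - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, size))
        x[r] = (aug[r][size] - tail) / aug[r][r]
    return x


def fit_polynomial(xs, ys, degree=3):
    """Least-squares coefficients (lowest order first) via normal equations."""
    m = degree + 1
    ata = [[sum(x ** (r + c) for x in xs) for c in range(m)] for r in range(m)]
    aty = [sum(y * x ** r for x, y in zip(xs, ys)) for r in range(m)]
    return solve_linear(ata, aty)


def predict_boundary_velocity(times, speeds, t_target, n=16, iterations=3):
    """Roll a cubic velocity fit forward to t_target in `iterations` stages."""
    if iterations < 1:
        raise ValueError("need at least one regression stage")
    win_t, win_v = list(times[-n:]), list(speeds[-n:])
    if len(win_t) < 4:
        raise ValueError("a cubic fit needs at least four samples")
    t_last = win_t[-1]
    step = (t_target - t_last) / iterations
    estimate = win_v[-1]
    for stage in range(1, iterations + 1):
        origin, span = win_t[0], win_t[-1] - win_t[0]
        xs = [(t - origin) / span for t in win_t]  # rescale for conditioning
        coeffs = fit_polynomial(xs, win_v)
        t_next = t_last + stage * step
        x_next = (t_next - origin) / span
        estimate = sum(c * x_next ** p for p, c in enumerate(coeffs))
        # Feed the prediction back: slide the window so the next fit sees it.
        win_t = win_t[1:] + [t_next]
        win_v = win_v[1:] + [estimate]
    return estimate


def true_speed(t):
    """Synthetic cubic speed profile used only to exercise the sketch."""
    return 5.0 + 0.2 * t - 0.01 * t ** 2 + 0.0005 * t ** 3


times = [float(i) for i in range(16)]
speeds = [true_speed(t) for t in times]
v_end = predict_boundary_velocity(times, speeds, t_target=18.0)
```

On noise-free data that already follows a cubic profile, every stage reproduces the same polynomial, so the staged prediction matches the true speed at the boundary; with noisy data the sliding window is what lets later stages differ from a single one-shot fit.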
This complementary hierarchical structure enables accurate velocity estimation even in long-term missing segments by incorporating feedback among predicted values while reflecting vessel dynamics. To preserve the physical realism of velocity, BAAR constrains acceleration to a quadratic polynomial and integrates it to obtain a cubic polynomial regression model for velocity; this accounts for the smooth variation in velocity inherent in vessel motion while minimizing the degrees of freedom. The clamped approach is distinctive in that it imposes kinematic constraints directly, unlike purely geometric interpolations. By fixing the slopes (velocities) at the start and end of interpolation segments, overshoot and oscillations that typically arise at turning initiation and termination are structurally suppressed. Iterative regression further updates boundary estimates in a complementary manner, suppressing the propagation of uncertainty even in long-duration gaps. This combined framework ultimately enhances the overall accuracy of trajectory reconstruction (Figure 2).
4. Results and Analysis
4.1. Experimental Setup Overview
In this study, AIS data from vessels navigating the Mokpo Port channel were collected between June 2023 and June 2024, and three representative vessels were selected for performance evaluation of BAARTR, as illustrated in Figure 3, Figure 4 and Figure 5 (the background colors represent the electronic navigational chart features: brown indicates land, blue represents navigable waters, and light green indicates shallow waters or tidal flats). The selected vessels consisted of one fishing vessel, one cargo ship, and one tugboat, which are the dominant ship types operating in the Mokpo Port channel. These vessels exhibited different speeds, reporting intervals, and trajectory patterns, and the dataset was obtained under real navigation conditions. To validate performance, artificial data removal was conducted to generate missing segments, which were then reconstructed to assess interpolation accuracy.
To objectively evaluate the performance of the proposed BAARTR, we conducted comparative experiments against six representative deterministic interpolation methods. Four are conventional: linear, spline, Hermite, and Bezier. The other two, PCHIP and Makima, are advanced shape-preserving techniques included as state-of-the-art (SOTA) lightweight baselines. While standard cubic splines often suffer from overshoot oscillations, PCHIP and Makima are designed to preserve the shape of the data, offering a rigorous benchmark for assessing the stability of the proposed algorithm. These fundamental geometric and kinematic interpolation methods were selected as baselines to isolate the contribution of the proposed boundary-aware mechanism. Smoothing filters such as Savitzky–Golay were excluded because they are designed for denoising continuous data rather than reconstructing large gaps in sparse datasets. Additionally, complex state-recognition-based frameworks like BDKI were not included in the direct comparison, in order to focus on evaluating the performance of the pure interpolation kernel itself under identical input conditions. The evaluation metric was the RMSE, and the mean RMSE for each segment was calculated through repeated experiments.
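The evaluation protocol described above (remove a segment, reconstruct it, score the result by RMSE) can be sketched with the simplest baseline, linear interpolation. This is an illustrative harness under stated assumptions, not the study's evaluation code: positional error is measured as great-circle distance in metres using the haversine formula with a mean Earth radius of 6,371 km, and the track layout, function names, and synthetic coordinates are hypothetical.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, an assumption of this sketch


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def rmse_m(truth, estimate):
    """Root-mean-square positional error in metres over paired (lat, lon)."""
    squared = [haversine_m(a[0], a[1], b[0], b[1]) ** 2
               for a, b in zip(truth, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def linear_fill(track, start, stop):
    """Reconstruct track[start:stop] linearly between its two neighbours.

    track holds (time, lat, lon) tuples; requires 1 <= start < stop < len(track).
    """
    t_a, lat_a, lon_a = track[start - 1]
    t_b, lat_b, lon_b = track[stop]
    filled = []
    for t, _, _ in track[start:stop]:
        w = (t - t_a) / (t_b - t_a)
        filled.append((lat_a + w * (lat_b - lat_a), lon_a + w * (lon_b - lon_a)))
    return filled


# Synthetic straight-line track at 10 s intervals; samples 3..5 are "removed".
track = [(10.0 * i, 34.78 + 0.0001 * i, 126.30 + 0.0002 * i) for i in range(8)]
start, stop = 3, 6
truth = [(lat, lon) for _, lat, lon in track[start:stop]]
error = rmse_m(truth, linear_fill(track, start, stop))

# Sanity check of the metric: 0.001 degrees of latitude is roughly 111 m.
offset_error = rmse_m([(34.0, 126.0)], [(34.001, 126.0)])
```

Any other reconstruction method can be scored by swapping `linear_fill` for a function with the same signature, which keeps the removed segments and the metric identical across methods.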
4.2. Interpolation Results
The influence of the number of training samples and regression iterations on interpolation accuracy, according to the trajectories of the target vessels, can be examined through the thresholds presented in Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11 for the three vessel scenarios.
(1) Selection of Training Sample Size.
Across all three scenarios, (a) id440XXX000, (b) id440XXX860, and (c) id440XXX320, a sharp decline in RMSE was observed with increasing numbers of training samples, followed by a gradual transition into a saturation region. In this study, n = 16 was selected by considering (i) the residual reduction margin within the saturation region, (ii) fairness of comparison through a common setting across scenarios, and (iii) stability of boundary conditions (velocity) in long-term missing segments. This selection incurs minimal computational overhead while maintaining robust performance across all scenarios.
(2) Selection of Regression Iterations.
The distribution of RMSE with respect to the number of regression iterations is shown in Figure 9, Figure 10 and Figure 11. The optimal number of iterations was identified separately for scenarios (a), (b), and (c). The required number of regression iterations varies depending on the trajectory patterns generated by the vessels during navigation.
The comparative results between BAARTR and other interpolation methods, based on the selected number of training samples and regression iterations for each scenario, are presented in Figure 12, Figure 13 and Figure 14. The visualization of reconstructed trajectories demonstrates that BAARTR generates smoother and more kinematically consistent paths compared with conventional interpolation methods (linear, spline, Hermite, etc.). In particular, in scenario id440XXX860, which involves sharp turning maneuvers, traditional methods exhibited oscillations and overfitting effects, whereas BAARTR produced reconstruction results that reflect continuous maneuvering characteristics during navigation.
Across all scenarios, conventional interpolation methods failed to produce trajectories that aligned with the actual vessel paths. In cases involving gentle curves, sharp turns, and high-curvature segments, BAARTR’s advantages in preserving kinematic consistency and suppressing oscillations were clearly evident compared to other methods.
4.3. Comparison of RMSE Distributions Across Interpolation Methods
Figure 15, Figure 16 and Figure 17 present boxplots and violin plots of RMSE distributions for the Linear, Spline, Hermite, Bezier, PCHIP, Makima, and proposed BAARTR methods across the various scenarios. RMSE was calculated by artificially removing one to three segments from each trajectory, creating intervals of 15–30 s, 30–45 s, and 45–60 s. Statistical outliers, depicted as red crosses, are defined as data points falling beyond
of the RMSE distribution. These points correspond to trajectories with irregular gap patterns or abrupt maneuvers, causing unusually high interpolation errors. In all cases, BAARTR exhibited the lowest median RMSE and the smallest variance. By contrast, geometric interpolations (Spline/Bezier) repeatedly showed increased variance and a higher frequency of outliers, particularly in curvature transitions and long-term missing segments.
Analyzing the performance drivers, BAARTR achieved the lowest median and the narrowest IQR, while also producing the fewest outliers. Linear interpolation, conversely, showed elevated median values and wide variance due to structural discontinuities (lack of smoothness at segment boundaries). Spline and Bezier interpolations suffered from overshoots and oscillatory artifacts at curvature transitions and endpoint connections, which contributed to greater variance and error magnitude. Hermite interpolation offered greater stability than the purely geometric methods but still exhibited minor deviations contingent on the accuracy of endpoint slope estimation. The Average Speed method demonstrated moderate performance, serving as a baseline reference. A quantitative summary of the detailed RMSE values for each scenario is provided in Table 3.
4.4. Computational Complexity and Efficiency Analysis
To objectively verify the “lightweight” claim of the proposed framework, we analyzed the theoretical time complexity of BAARTR in comparison with six deterministic interpolation methods (Linear, Spline, Hermite, Bezier, PCHIP, Makima) and a deep learning-based reference.
As summarized in Table 4, all deterministic methods, including the state-of-the-art shape-preserving algorithms (PCHIP and Makima), operate within linear time complexity, O(N), where N is the number of trajectory points. This indicates that the computational cost increases linearly with the number of trajectory points, ensuring high scalability.
However, a qualitative distinction exists in their operational logic. While standard methods (Linear, Spline) and shape-preserving methods (PCHIP, Makima) rely solely on geometric constraints to determine the curve, BAARTR incorporates kinematic constraints (specifically, velocity estimation via multi-round regression) within the same complexity class. The core processes of BAARTR are sequential and deterministic, requiring no iterative optimization or heavy matrix multiplications. Consequently, BAARTR maintains the computational lightness of mathematical interpolation methods while offering the kinematic consistency typically sought in complex models, making it more suitable for real-time VTS systems than deep learning approaches (e.g., LSTM), which incur a substantially higher computational cost.
5. Discussion
This study proposed the BAARTR framework designed for environments where AIS data are compromised by data loss and irregular reporting intervals. We evaluated its performance against conventional interpolation methods using AIS data from three vessels operating in actual port waters. BAARTR adopts clamped polynomial interpolation, in which the boundary slopes (velocities) of missing segments are explicitly enforced as constraints. Unlike simple geometric interpolations (linear, spline, Bezier), this approach guarantees both velocity continuity and kinematic plausibility, thereby suppressing overshoots and oscillatory artifacts at trajectory endpoints (Figure 5 and Figure 6). Consequently, oscillations in high-curvature regions were mitigated, and unrealistic turning radius distortions were alleviated.
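The clamped construction can be illustrated with a minimal sketch, assuming a single coordinate axis and a cubic Hermite form. The function name `clamped_cubic_segment`, the one-second `step`, and the sample values are illustrative assumptions and not the BAARTR implementation itself.

```python
import numpy as np

def clamped_cubic_segment(t0, t1, p0, p1, v0, v1, step=1.0):
    """Fill the gap [t0, t1] along one coordinate axis with a cubic that
    matches the boundary positions (p0, p1) and boundary velocities (v0, v1)."""
    h = t1 - t0
    # Sample times every `step` seconds; t1 is included when h is a multiple of step.
    t = np.arange(t0, t1 + 0.5 * step, step)
    s = (t - t0) / h  # normalized time in [0, 1]
    # Cubic Hermite basis functions.
    h00 = 2 * s**3 - 3 * s**2 + 1
    h10 = s**3 - 2 * s**2 + s
    h01 = -2 * s**3 + 3 * s**2
    h11 = s**3 - s**2
    x = h00 * p0 + h10 * h * v0 + h01 * p1 + h11 * h * v1
    return t, x

# Hypothetical 10 s gap: position goes from 0 to 50 with boundary velocity 5.
# Because both boundary velocities equal the mean velocity, the cubic reduces to a line.
t, x = clamped_cubic_segment(t0=0.0, t1=10.0, p0=0.0, p1=50.0, v0=5.0, v1=5.0)

# Unequal boundary velocities bend the curve while still honoring both end slopes.
t_fine, x_fine = clamped_cubic_segment(0.0, 10.0, 0.0, 50.0, v0=2.0, v1=8.0, step=0.001)
```

In this sketch the boundary velocities are passed in directly; in the framework described here they would come from central differencing or from the regression stage. Applying the same function to each coordinate separately yields a two-dimensional track.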
For short reporting intervals, central differencing was applied, while for long or irregular intervals, velocity prediction was performed using a hierarchical iterative regression strategy. This dual approach enabled both numerical stability in short gaps and prediction reliability in long gaps. Specifically, constraining acceleration to a quadratic polynomial and integrating it to construct a cubic velocity model effectively limited excessive degrees of freedom while still reflecting vessel inertia and hydrodynamic resistance as a first-order approximation (Equations (2) and (3)).
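The short-interval branch can be sketched as follows, assuming irregular timestamps and one coordinate axis. The function name `central_difference_velocity` and the threshold `max_dt` are hypothetical; the actual cut-off between "short" and "long" intervals is not restated in this section. Points adjacent to a long interval are left as NaN so that a separate estimator (here, the regression stage) can supply them.

```python
import numpy as np

def central_difference_velocity(t, x, max_dt=10.0):
    """Estimate velocity at interior samples by central differencing.

    A sample gets an estimate only if both neighboring intervals are at most
    `max_dt` seconds (an assumed threshold); otherwise it stays NaN.
    """
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    v = np.full(t.shape, np.nan)
    dt = np.diff(t)
    # Both the preceding and the following interval must be short.
    short = (dt[:-1] <= max_dt) & (dt[1:] <= max_dt)
    span = t[2:] - t[:-2]
    v[1:-1] = np.where(short, (x[2:] - x[:-2]) / span, np.nan)
    return v

# Hypothetical track moving at a constant 3 units/s with a 34 s reporting gap.
t_obs = [0.0, 2.0, 5.0, 6.0, 40.0, 42.0, 44.0]
x_obs = [3.0 * ti for ti in t_obs]
v_est = central_difference_velocity(t_obs, x_obs)
```

On non-uniform timestamps this simple two-point form is exact for constant-velocity motion and only approximate otherwise, which is one reason to restrict it to short intervals.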
Across all three scenarios, the RMSE curves for training sample size showed a sharp decline followed by convergence, with n = 16 providing stable low-variance performance at minimal computational cost (Figure 6, Figure 7 and Figure 8). The optimal number of regression iterations varied depending on route geometry (specifically curvature and turning segments), suggesting that boundary velocity estimates can be sufficiently stabilized with only a few feedback iterations (Figure 9, Figure 10 and Figure 11).
Boxplot comparisons further revealed that BAARTR consistently achieved the lowest median values and narrowest IQR across all scenarios (Figure 15, Figure 16 and Figure 17). As shown in Table 3, BAARTR yielded the minimum RMSE in each case, outperforming Linear, Spline, Hermite, Bezier, PCHIP and Makima. Notably, in scenario (b), which involved sharp turns, geometric interpolations produced greater variance and outliers due to endpoint oscillations and overshoots; in contrast, BAARTR maintained stability through boundary-condition-based regularization. The consistently lower median and IQR values of BAARTR can be attributed to: (i) the accuracy of the regression module, which filters out noise inherent in spline/Bezier methods, and (ii) the use of clamped interpolation, which enforces velocity continuity (kinematic consistency) in curved trajectories, preventing error propagation even in complex segments with significant data loss. These findings are substantiated by the RMSE results in Figure 5, Figure 6 and Table 3. Furthermore, the theoretical complexity analysis confirms that BAARTR operates with strictly linear time complexity, maintaining the same level of computational efficiency as lightweight mathematical models (e.g., Makima). This contrasts with deep learning approaches and makes BAARTR well suited to embedded systems.
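For readers who wish to reproduce this kind of comparison, the summary statistics discussed above (RMSE, median, and IQR) can be computed as in the following sketch. The helper names and the sample numbers are illustrative assumptions; they are not the study's data, and the exact error definition used in the paper may differ.

```python
import numpy as np

def rmse(truth, estimate):
    """Root-mean-square error between two equally long sequences."""
    truth = np.asarray(truth, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return float(np.sqrt(np.mean((truth - estimate) ** 2)))

def median_and_iqr(values):
    """Median and interquartile range (Q3 - Q1), as drawn in a boxplot."""
    q1, med, q3 = np.percentile(values, [25, 50, 75])
    return float(med), float(q3 - q1)

# Made-up per-trial RMSE values for two hypothetical methods.
trial_errors = {
    "method_a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "method_b": [2.0, 2.5, 3.0, 3.5, 4.0],
}
summary = {name: median_and_iqr(vals) for name, vals in trial_errors.items()}
```

A lower median indicates better typical accuracy, while a narrower IQR indicates less trial-to-trial spread; the two methods in this toy example share a median but differ in spread.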
Considering the realities of VHF communication, where data losses, delays, and packet collisions occur frequently, BAARTR ensures the continuity and kinematic consistency of trajectories even under conditions of sparse sensor availability. The method can be seamlessly integrated as a post-processing module to compensate for missing AIS data in MASS navigation systems, particularly within voyage planning and collision-avoidance modules, as well as in VTS accident reconstruction, investigative support, and port traffic flow analysis. Moreover, the high-resolution (one-second) reconstruction outputs offer high potential for fusion with radar and camera tracking systems.
Nevertheless, this study acknowledges certain limitations. Only three variables—time, latitude, and longitude—were used, while velocity was reconstructed via central differencing or regression estimation. Although this strategy maximizes model simplicity and general applicability, it may not fully capture dynamics in extreme environmental conditions, such as strong cross-currents, tidal effects, or abrupt steering inputs. Additionally, the dataset was limited to AIS data collected from a specific waterway and temporal period. Future extension to open seas, high-speed navigation, or heavily congested communication environments may yield different performance characteristics.
6. Conclusions
This study introduced BAARTR, a novel interpolation framework that integrates boundary awareness (velocity and slope) with adaptive regression to mitigate the challenges of data loss and irregular sampling in AIS data. The efficacy of the method was rigorously evaluated against conventional interpolation methods (Linear, Spline, Hermite, Bezier, PCHIP, Makima) across three real-world vessel trajectories collected in a port channel. In the proposed approach, velocities at the boundaries of missing segments are estimated through hierarchical regression and explicitly enforced as constraints in clamped interpolation, thereby ensuring kinematic continuity and smooth curvature transitions in the reconstructed trajectories. As a result, BAARTR consistently achieved lower median RMSE, narrower IQR, and fewer outliers across all scenarios, demonstrating superiority in both accuracy and stability. Visual comparisons further confirmed that BAARTR effectively suppressed overshoots and oscillatory artifacts (observed in Spline/Bezier) as well as structural discontinuities (observed in Linear), thereby reproducing trajectories that adhere to realistic navigational dynamics.
Sensitivity analysis confirmed that using n = 16 training samples provides a favorable balance among performance, stability, and computational cost. Furthermore, keeping the number of regression iterations conservatively small, chosen according to segment dynamics, effectively prevents overfitting and error propagation while delivering robust performance gains. The theoretical complexity analysis also indicated that BAARTR operates with linear complexity, supporting real-time feasibility in contrast to deep learning-based approaches. BAARTR therefore serves as a lightweight solution for restoring missing trajectory data in resource-constrained maritime environments. Beyond operational monitoring, BAARTR provides realistic trajectories without overfitting in accident reconstruction and route analysis. Moreover, when applied as a data normalization pre-processor for downstream tasks such as prediction, classification, and anomaly detection, it can mitigate performance variability caused by irregular AIS reporting intervals.
Despite the promising results, several limitations of this study warrant mention. First, the BAARTR framework remains contingent upon the quality and density of AIS observations. When long-term data gaps coincide with large curvature transitions, uncertainty in boundary velocity estimation increases, potentially leading to elevated local errors. Second, the method employs a simplified quadratic acceleration model as a computational heuristic. Since this purely kinematic approach does not explicitly account for vessel mass, added mass, or hydrodynamic damping, it inherently limits the ability to fully capture nonlinear dynamics or abrupt changes in motion under strong external forces (e.g., gusts, complex currents) or complex maneuvers. Third, this study relied exclusively on univariate interpolation based on time, latitude, and longitude, thereby omitting exogenous variables such as currents, wind, vessel type/load, and engine response. Fourth, complex interaction scenarios caused by multi-vessel encounters (e.g., collision avoidance or formation keeping) may complicate boundary estimation. Finally, the dataset was confined to a specific region and time period, necessitating further validation to ensure generalization across diverse waterways and environmental conditions.
While the framework was intentionally designed to operate with minimal inputs (time, latitude, longitude) to ensure low latency, we acknowledge that excluding SOG/COG, vessel type/handling characteristics, and environmental forcing may limit absolute accuracy. However, when available, these variables can be readily incorporated as training data for endpoint-velocity estimation.
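As one possible way to bring SOG/COG into endpoint-velocity estimation, the reported values can first be converted into east and north velocity components. The sketch below assumes SOG in knots and COG in degrees clockwise from true north; the function name `sog_cog_to_velocity` is hypothetical, and how such components would enter the regression is left open here.

```python
import math

KNOT_TO_MS = 1852.0 / 3600.0  # one knot is one nautical mile (1852 m) per hour

def sog_cog_to_velocity(sog_knots, cog_deg):
    """Convert speed/course over ground to (east, north) velocity in m/s.

    Assumes COG is measured clockwise from true north, so a course of
    90 degrees points due east.
    """
    speed = sog_knots * KNOT_TO_MS
    theta = math.radians(cog_deg)
    return speed * math.sin(theta), speed * math.cos(theta)

# Hypothetical report: 10 knots on a due-east course.
v_east, v_north = sog_cog_to_velocity(10.0, 90.0)
```

If the interpolation is carried out directly in latitude and longitude, these metric components would still need to be scaled into degrees per second at the vessel's latitude before being used as boundary slopes.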
Future research will focus on the following key areas. First, generalization capability will be enhanced by integrating exogenous variables, such as Speed and Course Over Ground (SOG/COG), vessel type/tonnage, and environmental factors including weather and currents, into a multivariate boundary estimation framework. Second, navigation state recognition (e.g., distinguishing between straight sailing and turning) will be combined with waypoint-aware strategies to enable the automatic selection of a suitable interpolation method for each segment. Third, hybridization with path-planning-based restoration algorithms will be explored to suppress kinematically implausible trajectories, particularly under conditions of severe data loss or spatial constraints. Fourth, multimodal fusion with radar, vision-based tracking, and IMU data, as well as validation across multiple waterways, will be conducted to improve robustness and general applicability. Finally, we aim to refine BAARTR through data-driven enhancements, using machine learning with large-scale training datasets and multiple parameters.
In conclusion, BAARTR has demonstrated the ability to achieve both accuracy and stability under conditions of missing and irregular AIS data, while incorporating critical features such as noise suppression and continuous trajectory connectivity. By balancing a lightweight model with high extensibility, including multivariate integration, state awareness, and path-planning linkage, the proposed boundary-aware adaptive regression framework is positioned as a promising foundational technology with significant potential for applications in autonomous navigation, maritime traffic control, and accident reconstruction.