A Method to Infer Customary Routes via Analysis of the Movement Importance of Ship Trajectories Calculated Using TF-IDF

Sim, Seung; Cho, Jun-Rae; Jung, Jae-Ryong; Baek, Jong-Hwa; Cho, Deuk-Jae

doi:10.3390/jmse14010029

by Seung Sim ¹,*, Jun-Rae Cho ¹, Jae-Ryong Jung ², Jong-Hwa Baek ³ and Deuk-Jae Cho ³,*

¹ Suresoft Technologies Inc., Pangyo Headquarters AX Center, Seongnam 13453, Republic of Korea

² Suresoft Technologies Inc., Daejeon Office AX Center, Daejeon 34129, Republic of Korea

³ Korea Research Institute of Ships & Ocean Engineering, Maritime Digital Transformation Research Center, Daejeon 34103, Republic of Korea

* Authors to whom correspondence should be addressed.

J. Mar. Sci. Eng. 2026, 14(1), 29; https://doi.org/10.3390/jmse14010029

Submission received: 21 November 2025 / Revised: 19 December 2025 / Accepted: 20 December 2025 / Published: 23 December 2025

(This article belongs to the Special Issue Advanced Ship Trajectory Prediction and Route Planning)

Abstract

Ship positional data are widely used for route inference, yet most existing studies rely on automatic identification system data, which contain irregular transmission intervals and limit the ability to capture vessel-specific operational habits and subtle route choices. This study addresses these limitations by proposing a methodology to infer customary routes using periodic 3 s ship position data collected through the Korean e-Navigation system based on long-term evolution maritime communication. The method comprises three main steps: constructing a sea-area grid with an associated weight map, determining data-driven importance and updating weights, and performing pathfinding. Domestic waters are divided into 100 m grids, and navigable and non-navigable areas are binarized to establish a framework for route exploration. Ship positional data are processed to extract inter-port trajectories, which are then classified by ship size and tidal time zone to account for navigational differences arising from vessel characteristics and tide-dependent accessibility. These trajectories are combined with spatial grids and transformed into a document–word structure, enabling the calculation of movement importance between grid cells using a modified term frequency–inverse document frequency measure. The resulting weights are applied to a pathfinding graph to derive routes that reflect vessel size and tidal conditions. The effectiveness of the proposed method is evaluated by computing cosine similarity between the inferred routes and actual trajectories.

Keywords: customary route; trajectory data; TF-IDF; pathfinding; Korean e-Navigation; LTE-M

1. Introduction

The International Maritime Organization (IMO) has established and is globally disseminating the e-Navigation strategy to improve maritime safety, security, operational efficiency, and digitalization [1,2,3]. This trend is laying the foundation for a faster integration of maritime traffic information services and digitization of ship operations. The Korean e-Navigation service has developed the world’s first periodic (3 s) trajectory network based on the long-term evolution for maritime (LTE-M) communication protocol to complement the irregular transmission cycles and gaps in existing automatic identification system (AIS) data [4]. This uniform high-frequency data provides a foundation for detailed analysis of the actual operational characteristics of coastal vessels, and thus provides higher reliability compared to traditional AIS data.

The development of e-navigation services and maritime autonomous surface ships (MASS) [1,5] has sparked interest in technologies that infer ship routes based on data. Researchers are also investigating path-following control methods for MASS [6]. Conventional methods to provide ship routes have focused on presenting uniform routes based on navigational safety regulations. However, coastal vessels tend to select habitual routes based on factors such as weather, tide, and experience, which are known as “customary routes.” In this study, we propose a method to infer customary routes from real navigation data.

The proposed method calculates the movement importance for each ship size using LTE-M positional data from the Korean e-Navigation system, and then applies this to a pathfinding algorithm to infer customary routes. Specifically, trajectory data between ports are generated, and the movement importance between grid cells is calculated by applying a modified term frequency–inverse document frequency (TF-IDF) algorithm, which is typically used in information retrieval. Subsequently, the calculated movement importance is combined with navigational constraints such as coastlines, fishing grounds, aquaculture farms, and electronic chart objects to construct a weight map for pathfinding. Finally, the customary route inference results are evaluated using the A* algorithm [7] and Dijkstra’s algorithm [8].

The main contributions of this study are summarized as follows:

More precise trajectory-based customary routes can be inferred using LTE-M positional data, which allows for uniform data collection compared to AIS.
The proposed method quantifies movement importance in pathfinding by utilizing TF-IDF, a statistical weighting technique, to analyze trajectory data.
The outputs of the proposed customary route inference method exhibited high similarity with actual trajectories, and the feasibility of the proposed approach was verified in that no significant reduction in calculation speed was observed compared to an existing method.

2. Research Related to Route Search

2.1. Research on Trajectory Prediction

AIS data are a representative example of trajectory data that can collect real-time information such as the position, speed, and course of a ship. According to the International Telecommunication Union Technical Standard [4], for Class-A vessels, the AIS data transmission rate is determined by the dynamic information status and speed, as listed in Table 1.

AIS has been reported to exhibit irregular transmission cycles, gaps in shaded areas, and reduced reliability owing to human manual input errors or equipment configuration issues [9]. Numerous prediction techniques have been proposed to address these issues.

Gao, D. et al. [10] developed a multi-point long short-term memory (LSTM) model for ship trajectory prediction and demonstrated that deep recurrent neural network architectures can effectively capture the nonlinear spatiotemporal patterns of vessel movements. Sun, Y. et al. [11] showed that denoising AIS data and normalizing temporal intervals can significantly improve the performance of trajectory prediction models. In addition to deep-learning-based methods, Liu, J. et al. [12] utilized an improved support vector machine (SVM), a traditional machine-learning technique, for vessel trajectory prediction. More recently, Zaman, U. et al. [13] compared the performance of various neural network models, including convolutional neural networks (CNN), deep neural networks (DNN), LSTM, and gated recurrent units (GRU), using AIS data collected from Korean coastal waters.

2.2. Research on Route Calculation

Previous studies on route generation based on electronic charts have applied various pathfinding algorithms, as listed in Table 2.

Liu et al. [14] proposed an A*-based pathfinding method that reflected depth and obstacle information. Li et al. [15] improved computational efficiency and the quality of avoidance paths by improving the D* Lite algorithm, while Gu et al. [16] presented an improved rapidly exploring random tree (RRT) based route-planning method that uses representative routes extracted from AIS trajectories and compressed via the Douglas-Peucker algorithm as prior information. Charalambopoulos et al. [17] proposed a ship weather-routing method that incorporates meteorological and oceanographic forecasts into the cost function within a probabilistic roadmap (PRM) framework. Zhai et al. [18] introduced a sparse A* algorithm that reduces the search space by using only key points as nodes while maintaining path quality. Liu et al. [19] proposed an intelligent route-planning method based on an adaptive step-size Informed-RRT* that adjusts the extension step according to environmental complexity, thereby reducing path length and the number of sampled nodes. He et al. [20] computed optimal ship routes that reflect actual sailing patterns by extracting and clustering turning points from AIS trajectories and then applying Dijkstra’s algorithm and ant colony optimization (ACO).

2.3. Research on Clustering

Approaches to generalize patterns in trajectory data to derive representative paths have also been actively pursued. Yuan et al. [21] organized the routes of inland waterway vessels using ordering points to identify the clustering structure (OPTICS) clustering and Douglas-Peucker simplification. Huang et al. [22] proposed a trajectory framework based on density-based spatial clustering of applications with noise (DBSCAN) to extract representative trajectories for each cluster. Oh et al. [23] classified the passage characteristics of large vessels by analyzing their operational patterns and berthing characteristics based on k-means clustering.

However, methods based on clustering average multiple trajectories to derive representative paths, which makes it difficult to fully reflect the actual choices and preferences of operators or differences based on conditions. To overcome these limitations, we propose a method to calculate the importance of specific intervals.

2.4. Research on Importance Calculation

To improve the accuracy of route inference, it is necessary not only to perform pathfinding, but also to quantify the relative importance of each grid cell or segment. In network analysis, graph-centrality measures such as closeness and betweenness have been widely used [24], and PageRank is another representative approach that evaluates the importance of nodes based on information about link structures [25]. To infer customary routes, higher importance values must be assigned to grid cells that are frequently selected across many trajectories. Therefore, we adopt TF-IDF as an information retrieval method to compute relative importance [26].

3. Problem Definition and Data Collection

Prior research on route inference has focused on deriving mathematically optimal routes considering factors such as operational safety, fuel efficiency, and computational performance. However, such approaches fail to reflect customary routes that differ by ship size. Therefore, the proposed method infers a route that reflects actual operational characteristics based on ship specifications and historical trajectory data. To specify the direction of operation, we compiled data on port areas to be used as points of origin and destination. Obstacle data such as coastlines, fishing grounds, fish farms, and electronic chart objects were used to identify nonnavigable areas. In addition, depth and tide data were reflected in the obstacle data to account for the differences in operating areas based on the size of a given vessel.

3.1. Static Data from Vessels

Static data were collected from vessels using the Korean e-Navigation service. We also collected data linked to the Korea Research Institute of Ships and Ocean Engineering (KRISO). As listed in Table 3, tables containing ship specifications and ship type data were collected along with a status table to confirm the normal collection of ship operation data. However, the Maritime Resource Name (MRN) and Maritime Mobile Service Identity (MMSI) were pseudonymized using a hash.

3.2. LTE-M Positional Data

LTE-M-based ship positional data obtained from the Korean e-Navigation service were collected. We developed a data pipeline at KRISO to collect data from January 2022 to February 2023. A total of 4,541,753,073 LTE-M positional data points were collected, including dynamic data such as hourly coordinates, speed, and heading, as listed in Table 4.

3.3. Port Area Data

As shown in Figure 1, areas identifiable as ports within the study area such as harbors and piers were labeled as polygons. We compiled 1502 area data points for ports, as listed in Table 5, using Naver Maps [27] and Google Maps [28].

3.4. Navigation Obstacles

Object data from electronic charts (S-101) based on the S-100 framework used in Korean e-Navigation services were collected. Datasets linked to KRISO were also collected (Table 6).

In addition, data related to nonnavigable waters were further collected through the Public Data Portal. This study collected data on coastlines (Table 7), fishing grounds (Table 8), and aquaculture farms (Table 9) from the Korea Hydrographic and Oceanographic Agency.

3.5. Depth and Tidal Data

Depth and tidal data for domestic waters were collected. This study collected depth data linked to KRISO (Table 10) and tidal data from the Korea Hydrographic and Oceanographic Agency (Table 11).

4. Constructing Sea Area Grids for the Pathfinding Algorithm

The study areas were divided into grid cells for use by a pathfinding algorithm. Subsequently, navigable areas were determined by removing grids that overlapped with obstacle data. The grid size was selected by comprehensively considering both calculation speed and route accuracy. Reducing the grid size further could increase the accuracy of route calculations, but would also reduce processing speed owing to the increased computational load. In this work, we generated a total of 40,157,334 grid cells by dividing the sea area bounded by 33.120581° N to 38.726809° N and 124.896901° E to 132.058734° E into cells measuring 100 m each. The grid resolution of 100 m was selected based on prior algorithm design studies conducted within the Korean e-Navigation R&D framework [29].
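As a rough illustration of the gridding step, the sketch below maps a latitude/longitude position to a 100 m cell inside the bounding box quoted above. The function name `cell_of`, the spherical metres-per-degree constant, and the use of a single mid-latitude scale for longitude are assumptions of this sketch; the projection actually used to build the 40,157,334 cells is not described in the text.

```python
import math

# Bounding box quoted in the text (degrees).
LAT_MIN, LAT_MAX = 33.120581, 38.726809
LON_MIN, LON_MAX = 124.896901, 132.058734
CELL_M = 100.0  # grid resolution in metres

# Assumption: spherical Earth, with the longitude scale fixed at the
# mid-latitude of the box. A real implementation would use a map projection.
M_PER_DEG_LAT = 111_320.0
M_PER_DEG_LON = M_PER_DEG_LAT * math.cos(math.radians((LAT_MIN + LAT_MAX) / 2))


def cell_of(lat, lon):
    """Return (row, col) of the 100 m cell containing a position, or None if outside the box."""
    if not (LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX):
        return None
    row = int((lat - LAT_MIN) * M_PER_DEG_LAT // CELL_M)
    col = int((lon - LON_MIN) * M_PER_DEG_LON // CELL_M)
    return row, col
```

Rows count northward from the southern edge and columns eastward from the western edge, so a position 150 m north of the south-west corner falls in cell (1, 0).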

4.1. Reflecting Obstacle Data

Obstacle data were compiled to designate areas where ships could not navigate. Obstacle areas were extracted by combining the geometric values of coastline, fishing grounds, fish farms, and electronic chart object data as shown in Figure 2. Subsequently, the obstacle area was excluded from the pathfinding target.

After the obstacle data were reflected, we found that some LTE-M positional data still passed through fishing grounds and aquaculture areas. This indicated a need to represent the navigable grid more precisely for small vessels, because the grid size was 100 m whereas the width of small vessels is approximately 4 m. Therefore, when LTE-M positional data points of small vessels overlapped with obstacle data, the corresponding location was corrected to a navigable area for small vessels, as shown in Figure 3.

4.2. Reflecting Depth by Ship Size

The grids of nonnavigable depths for each vessel size are reflected in the obstacle data. Nonnavigable grid cells were identified by considering the draft and under keel clearance (UKC) of the vessel. Referring to the UKC coefficient from Lee [30], we modeled the UKC at 30% of the draft and established the depth of navigable water for each ship size group, as listed in Table 12.
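The depth rule can be written out in a few lines. The sketch below assumes only what the text states (UKC modelled as 30% of the draft). The draft values in the usage note are hypothetical, since Table 12 is not reproduced here, and adding tide height to charted depth is an assumption of this sketch.

```python
UKC_RATIO = 0.30  # UKC modelled as 30% of the draft, as stated in the text


def required_depth(draft_m):
    """Minimum water depth (m) for a vessel: draft plus under keel clearance."""
    return draft_m * (1.0 + UKC_RATIO)


def is_navigable(depth_m, draft_m, tide_m=0.0):
    """True if the water depth covers draft plus UKC.

    Adding the tide height to the charted depth is an assumption of this sketch.
    """
    return depth_m + tide_m >= required_depth(draft_m)
```

For a hypothetical draft of 10 m, `required_depth(10.0)` gives 13 m, so a cell charted at 12 m is nonnavigable at zero tide but becomes navigable with 1.5 m of tide.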

4.3. Grid Binarization

For each ship size, the grid cells were represented by binarizing the grid into navigable and nonnavigable cells to reflect the obstacle data as shown in Figure 4. In addition, a weighted map was created in which the Euclidean distance between grid cells served as the movement weight to provide a foundation to apply a pathfinding algorithm. Generally, pathfinding uses an 8-directional weight map that moves up, down, left, right, and diagonally. However, in this study, we considered 32 directions by subdividing the path as shown in Figure 5. Consequently, a total of five pairs of grid cells and weight maps were generated for each ship size group.
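One way to obtain a 32-direction neighbourhood is sketched below. Because Figure 5 is not reproduced here, the construction is an assumption: it takes every cell offset within three cells whose row and column steps share no common divisor. This yields exactly 32 moves whose longest step is √13 cells, which matches the upper bound on movement weights quoted later in Section 5.4.

```python
import math


def directions_32():
    """Return {(d_row, d_col): Euclidean length in cells} for a 32-direction neighbourhood.

    Assumption: offsets within +/-3 cells whose components are coprime,
    so no move is a multiple of a shorter move in the same direction.
    """
    offsets = {}
    for dr in range(-3, 4):
        for dc in range(-3, 4):
            # gcd(0, 0) == 0 excludes the null move; gcd(0, 2) == 2 excludes (0, 2), etc.
            if math.gcd(abs(dr), abs(dc)) == 1:
                offsets[(dr, dc)] = math.hypot(dr, dc)
    return offsets
```

The shortest moves are the four axis-aligned steps of length 1, and the longest are the eight (±2, ±3) and (±3, ±2) steps.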

5. Calculating and Reflecting Weights for Trajectory-Based Movement Importance

The vessel table was compiled using static vessel data and was then used to refine the LTE-M positional data and prepare it for analysis. The LTE-M positional data were processed into trajectory data by sorting them chronologically by vessel, and the trajectory data between ports were extracted. The importance of movement between grid cells was calculated by applying a modified TF-IDF technique to the port-to-port trajectories.

5.1. Constructing the Ship Table

Among the collected static ship data, defective records with missing dimensions or duplicate key identifiers such as MRN and MMSI were removed. In addition, vessels that were out of operation and those that lacked LTE-M positional data transmission were excluded. We obtained static data for a total of 3829 vessels, including their dimensions, type, and position. A set of rules for assigning ship codes was established, as listed in Table 13, to analyze LTE-M positional data considering the specifications and ship types of the target vessels. The first code was assigned based on size, and the second to fourth codes were assigned based on the detailed ship type.

Subsequently, hexadecimal numbers were sequentially assigned to the 5th to 8th codes to compile a table of vessels similar to Table 14, which allowed us to estimate vessel specifications using an 8-digit ship code.

5.2. Extracting Trajectory Data Between Ports

Using the ship table, the LTE-M positional data were processed into the form shown in Table 15, which provides information on the size, type, and position of each ship.

The processed positional data were grouped by ship code and sorted chronologically to allow vessel movements to be tracked. Using the speed over ground (SOG), we removed the stopping points of each ship to extract the trajectory data from the origin of the ship to its destination; a ship was treated as stopped when its SOG was two or less. Finally, as shown in Figure 6, only the trajectories whose origin and destination both lay within a port area were retained as trajectory data between ports. Consequently, 118,039 trajectory data points between ports were extracted.
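The extraction step can be sketched as follows. This is a minimal illustration under stated assumptions rather than the pipeline used in the study: positions are `(timestamp, lat, lon, sog)` tuples for a single vessel, already sorted by time, and `in_port` is a hypothetical helper that returns a port identifier when a position lies inside a port polygon and `None` otherwise.

```python
SOG_STOP = 2.0  # stop threshold from the text; the unit is whatever the SOG field uses


def split_trajectories(points, in_port):
    """Cut one vessel's track at stopping points and keep port-to-port segments.

    points  : chronologically sorted (timestamp, lat, lon, sog) tuples
    in_port : hypothetical callable (lat, lon) -> port id or None
    Returns a list of (origin_port, destination_port, segment_points).
    """
    segments, current = [], []
    for p in points:
        if p[3] <= SOG_STOP:
            # Stopping point: drop it and close the segment in progress.
            if current:
                segments.append(current)
                current = []
        else:
            current.append(p)
    if current:
        segments.append(current)

    result = []
    for seg in segments:
        origin = in_port(seg[0][1], seg[0][2])
        destination = in_port(seg[-1][1], seg[-1][2])
        # Keep only segments that both start and end inside a port area.
        if origin is not None and destination is not None:
            result.append((origin, destination, seg))
    return result
```

Segments that begin or end at sea, for example a stop at an anchorage outside any port polygon, are discarded by the final filter.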

Given the nature of fishing vessels, trajectories generated during fishing operations are not appropriate for calculating the importance of movement between ports. In this study, as shown in Figure 7, the fishing segments of fishing vessel trajectories were filtered out during the extraction of trajectory data between ports, and only the segments corresponding to transit were retained.

5.3. Importance Calculation Using Modified TF-IDF

The movement importance of each grid cell during port-to-port travel is calculated by applying the TF-IDF equation to the trajectory data between ports and the grid cells. TF-IDF is a statistical method that calculates the importance of each word $t$ within a document $d$ (1). TF refers to how frequently a word appears in the document (2), and IDF refers to how rarely a word appears in other documents (3). That is, words that appear frequently in one document but rarely in others are considered more important.

$$\mathrm{TF\text{-}IDF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{IDF}(t), \tag{1}$$

$$\mathrm{TF}(t, d) = \frac{f(t, d)}{\sum_{w' \in d} f(w', d)}, \tag{2}$$

$$\mathrm{IDF}(t) = \log \frac{N}{df(t)}. \tag{3}$$

Here, $f(t, d)$ is the number of occurrences of word $t$ in document $d$, $N$ is the total number of documents, and $df(t)$ is the number of documents that contain $t$.

After the ports of origin and destination were designated, the movement importance between grid cells during port-to-port travel was calculated by substituting the trajectory data between ports for $d$ and the grid numbers between ports for $t$. However, to reflect frequency rather than rarity across other trajectories, we employed DF (4), the inverse of IDF.

$$\mathrm{DF}(t) = \frac{1}{\log \frac{N}{df(t)}} \tag{4}$$

Consequently, the movement importance between grid cells was calculated for 12,039 origin-destination pairs using the TF-DF equation (5), a variant of the TF-IDF technique.

$$\mathrm{TF\text{-}DF}(t, d) = \mathrm{TF}(t, d) \times \mathrm{DF}(t) \tag{5}$$
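A minimal sketch of the TF-DF computation for one origin-destination pair is given below, treating each trajectory as a document and each grid cell identifier as a word. The function name and data layout are assumptions of this sketch. One deviation is labeled explicitly: Eq. (4) is undefined for a cell that appears in every trajectory, because the logarithm is then zero, so the sketch uses a smoothed $\log\frac{N+1}{df(t)}$ instead. The text does not say how this case is handled, and the smoothing is not part of Eq. (4).

```python
import math
from collections import Counter


def tf_df(trajectories):
    """Compute TF-DF scores per trajectory.

    trajectories : list of trajectories, each a list of grid cell ids
    Returns a list with one {cell: score} dict per trajectory.
    """
    n = len(trajectories)

    # df(t): number of trajectories that pass through cell t at least once.
    df = Counter()
    for traj in trajectories:
        df.update(set(traj))

    scores = []
    for traj in trajectories:
        counts = Counter(traj)
        total = sum(counts.values())
        row = {}
        for cell, f in counts.items():
            tf = f / total                      # Eq. (2)
            # ASSUMPTION: (n + 1) smoothing keeps the log positive when df == n.
            idf = math.log((n + 1) / df[cell])  # smoothed form of Eq. (3)
            row[cell] = tf / idf                # Eq. (5): TF x (1 / IDF)
        scores.append(row)
    return scores
```

With three toy trajectories `[[1, 2, 3], [1, 2, 4], [1, 5, 6]]`, cell 1 (visited by all three) scores higher in the first trajectory than cell 2 (visited by two), which in turn scores higher than cell 3 (visited by one). This is the intended reversal of ordinary TF-IDF.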

5.4. Reflecting Pathfinding Movement Weight

The calculated inter-grid movement importance is reflected in a weight map and used as the movement weight when pathfinding between specific origins and destinations. Conventional weight maps assign movement weights between each grid cell based only on Euclidean distance values. The grid cells with higher importance are prioritized during pathfinding by reducing the movement weight by the inter-grid movement importance (6).

$$D_{W}(t_{1}, t_{2}) = D_{E}(t_{1}, t_{2}) - \mathrm{TF\text{-}DF}(t_{2}) \tag{6}$$

So that the importance has a meaningful effect on the weights, the Euclidean distance of 100 m was normalized to a value of one, and a scaling factor was applied to the movement importance between grid cells. The scaling factor was empirically selected as 1000 to balance the magnitudes of the Euclidean distance and the TF-DF score.

Furthermore, the minimum value for the travel weight was constrained to 0.001 to prevent negative values (7). Consequently, the 32-directional movement weights, reflecting the importance of movement between grid cells, range from 0.001 to √13.

$$D_{W}(t_{1}, t_{2}) = \max\bigl(0.001,\; D_{E}(t_{1}, t_{2}) - \lambda \times \mathrm{TF\text{-}DF}(t_{2})\bigr), \quad \lambda = 1000 \tag{7}$$
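Eq. (7) translates directly into code. In the sketch below the function and parameter names are assumptions; distances are in cell units (100 m = 1), and `importance_t2` stands for the TF-DF score of the destination cell $t_2$.

```python
import math

LAMBDA = 1000.0     # scaling factor from Eq. (7)
MIN_WEIGHT = 0.001  # floor from Eq. (7), prevents negative weights


def movement_weight(d_row, d_col, importance_t2):
    """Weight of a move by (d_row, d_col) cells into a cell with the given TF-DF score."""
    d_e = math.hypot(d_row, d_col)  # Euclidean distance in cell units
    return max(MIN_WEIGHT, d_e - LAMBDA * importance_t2)
```

A move into a cell with zero importance keeps its Euclidean weight, so the longest 32-direction move costs √13. A unit move into a cell with a score of 0.0004 costs about 0.6, and any score large enough to drive the difference below zero is clamped to 0.001.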

To reflect the differences in route usage characteristics by ship size and the navigable area based on tide height conditions, five weight maps were calculated for each ship size group for a single origin-destination pair as shown in Figure 8. Subsequently, each weight map was divided into maximum and minimum tide height conditions as shown in Figure 9, resulting in 10 maps with refined weights. To calculate the importance of movement between grid cells for each weight map, only the trajectories matching the ship size group and tide height condition of that map were used.

6. Results of Identifying Customary Routes

To verify the effectiveness of the weighted map reflecting trajectory-based movement importance, a pathfinding algorithm is executed between ports to infer customary routes. In this work, we applied the A* algorithm, which is known for its excellent computational speed. The validity of the proposed customary route inference method was verified by measuring the similarity between the derived customary routes and the trajectory data between ports. In addition, we verified whether the proposed method degraded pathfinding speed.
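For reference, a generic A* search over a weighted graph can be sketched as below. It is not the implementation used in the study. The `neighbors` and `heuristic` callables are assumptions: `neighbors(cell)` yields `(next_cell, weight)` pairs, for instance the 32-direction moves with their importance-adjusted weights, and `heuristic(cell)` must never overestimate the remaining cost. The text does not state which heuristic was used. Because the adjusted weights can fall as low as 0.001, plain Euclidean distance to the goal is no longer a guaranteed lower bound, so the usage example passes a zero heuristic, which reduces A* to Dijkstra's algorithm.

```python
import heapq
import math


def a_star(start, goal, neighbors, heuristic):
    """Return (path, cost) from start to goal, or (None, inf) if unreachable.

    neighbors : callable cell -> iterable of (next_cell, weight)
    heuristic : callable cell -> lower bound on the remaining cost to goal
    """
    open_heap = [(heuristic(start), 0.0, start)]  # entries are (f, g, cell)
    best = {start: 0.0}
    parent = {start: None}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1], g
        if g > best[cell]:
            continue  # stale heap entry; a cheaper route to this cell was found
        for nxt, w in neighbors(cell):
            ng = g + w
            if ng < best.get(nxt, math.inf):
                best[nxt] = ng
                parent[nxt] = cell
                heapq.heappush(open_heap, (ng + heuristic(nxt), ng, nxt))
    return None, math.inf


# Toy graph: the move a -> c has been made cheap by a high importance score.
graph = {"a": [("b", 1.0), ("c", 0.001)], "b": [("d", 1.0)], "c": [("d", 1.0)], "d": []}
path, cost = a_star("a", "d", lambda n: graph[n], lambda n: 0.0)
```

In the toy graph the search prefers the route through `c`, the node reached by the low-weight move, which is the mechanism by which high-importance cells attract the inferred route.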

6.1. Measurement of Cosine Similarity Between Customary Routes and Port-to-Port Trajectories

The cosine similarity between the derived customary route and the trajectory data between ports was measured to quantitatively evaluate the validity of the proposed customary route inference method (8). For the similarity evaluation of the inferred routes, sea areas with more than 100 available port-to-port trajectory records were selected. Among the extracted port-to-port trajectory data, 90% were used as training data to construct the weighted map, while the remaining 10% were used as independent test data (vector $A$) for similarity measurement.

$$\text{cosine similarity}(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} \tag{8}$$

Cosine similarity was measured by substituting the trajectory between ports into vector $A$ and the predicted customary route into vector $B$ (9). In this formulation, $V_{tr}$ and $V_{pr}$ denote grid-based binary vectors representing the spatial footprint of the trajectory data and the predicted customary route, respectively.

$$\cos\text{sim}(V_{tr}, V_{pr}) = \frac{V_{tr} \cdot V_{pr}}{\|V_{tr}\| \, \|V_{pr}\|}, \quad V_{tr} = \text{trajectory vector}, \quad V_{pr} = \text{predicted customary route vector} \tag{9}$$
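For binary grid vectors, the dot product counts the cells shared by both footprints and each norm is the square root of the number of occupied cells, so Eq. (9) can be evaluated on sets of cell identifiers without building full-length vectors. The sketch below assumes that representation; the function name is hypothetical.

```python
import math


def cosine_similarity_cells(trajectory_cells, route_cells):
    """Cosine similarity of two binary grid footprints given as iterables of cell ids."""
    a, b = set(trajectory_cells), set(route_cells)
    if not a or not b:
        return 0.0  # convention of this sketch for an empty footprint
    # Binary vectors: A.B = |A & B|, ||A|| = sqrt(|A|), ||B|| = sqrt(|B|)
    return len(a & b) / math.sqrt(len(a) * len(b))
```

Two footprints of four cells each that share two cells give a similarity of 2 / √(4 × 4) = 0.5. Repeated visits to the same cell do not change the result, which is why the measure is insensitive to sampling density.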

The measurement results indicated that the cosine similarity between the customary route and the trajectory was approximately 0.8, as listed in Table 16. This confirms that the proposed customary route inference method largely reflects the actual trajectory-based operational characteristics. Specifically, the cosine similarity was higher when there was sufficient trajectory data between ports and the deviation between trajectories was small. As visualized in Figure 10, the inferred customary routes are shown to follow the underlying port-to-port trajectory data.

Because the similarity evaluation is based on grid inclusion rather than point-to-point correspondence, the proposed method is robust to differences in trajectory length and sampling density. The grid-based representation may introduce a spatial deviation on the order of the grid resolution (approximately 100 m). However, in this study, the similarity evaluation focuses on spatial consistency at the route level rather than precise point-to-point positional accuracy.

6.2. Comparison of Pathfinding Speed

Even if a valid customary route is inferred, a decrease in route calculation speed can limit the practical applications of a computational method in realistic service. Therefore, the pathfinding time required to execute the proposed method was compared with that of an existing distance-based weight map method. For evaluation, the most widely used A* and Dijkstra algorithms were applied, as listed in Table 17.

Based on the comparison results, the A* algorithm demonstrated superior computational speed compared with the Dijkstra algorithm. Within the same algorithm, switching from the distance-based weight map to the importance-based weight map decreased the speed in some cases and increased it in others. The slowdowns are attributed to the movement weights being distributed more diversely, which causes the pathfinding algorithm to explore more candidate grid cells and thus increases the search time for shorter routes. In contrast, when the route was long, the grid cells with higher importance were selected first, which reduced unnecessary search space and shortened the search time. Consequently, for the A* algorithm, which has excellent pathfinding speed, the time difference between distance-based route inference and customary route inference was not significant.

6.3. Comparison Between Low and High Tide

We verified the effect of the segmented weight map by comparing the inferred customary routes between low and high tide conditions. In this study, we targeted island areas strongly affected by tides and major container ports, and compared inference results for time windows around the minimum and maximum tidal levels. As shown in Figure 11, different routes are inferred between low and high tide, which confirms the validity of applying condition-specific weight maps.

7. Discussion

In this study, we have proposed a method to infer the customary route of a ship using positional data collected every 3 s and validated its effectiveness. However, several concerns must be addressed before applying it to real systems such as e-Navigation services.

7.1. Requirements When Applying the Service

The customary route inference method proposed in this study assumes sea areas where port-to-port trajectory data are available. In areas without sufficient trajectory data, distance-based routes relying on Euclidean distance are provided instead. Therefore, to enable the practical deployment of the proposed method, a database for port areas must be established, and ship trajectories should be continuously collected to create an environment in which port-to-port trajectory data can be efficiently extracted. In this study, positional data points collected over a one-year period from vessels using the Korean e-Navigation service were processed to construct port-to-port trajectory data. Accordingly, the validity of the proposed method was evaluated only in sea areas where more than 100 port-to-port trajectories were available. If trajectory data collected over a longer period are utilized, it would be possible to analyze the impact of trajectory length and density on inferred customary routes and to construct weighted maps that reflect additional factors, such as seasonal operational characteristics. Moreover, in areas where shipping routes frequently change owing to environmental factors such as shallow waters, there is a need to establish an auxiliary system that can collect and reflect real-time environmental information.

7.2. Compliance with Laws and Regulations

The purpose of this study was to infer customary routes based on the operational trajectories of vessels. Although obstacle data are reflected, the route is inferred based on the actual trajectories of vessels; therefore, trajectories that do not comply with regulations may be included. In particular, the obligation to comply with traffic separation schemes (TSS) does not apply for small fishing vessels given their relatively independent operations. Therefore, the weight map derived from their trajectories cannot guarantee compliance with TSS (Figure 12). To infer routes that comply with regulations, compliance with regulations should be reviewed in the process of constructing trajectory data between ports, and preprocessing procedures must be applied to remove noncompliant routes.

7.3. Convention on the International Regulations for Preventing Collisions at Sea (COLREGs) and Operational Constraints

The route inference method proposed in this study aims to analyze and estimate customary vessel movement patterns based on historical trajectory data and is not intended to directly implement real-time collision avoidance or the enforcement of navigational regulations prescribed by the COLREGs [31]. In real-world maritime operations, navigational safety rules such as COLREGs must always take precedence over efficiency or route optimality [32,33]. Accordingly, the routes derived in this study are intended to serve as strategic-level baseline routes, rather than as direct substitutes for rule-based or real-time navigational decision-making systems.

8. Conclusions

In this study, we have proposed a method to infer customary routes that reflect operational characteristics between ports using periodically collected ship position data. To achieve this, the research area was divided into grid cells, and a weight map of navigable grid cells was developed as input for the pathfinding algorithm. In addition, port area data were compiled to extract trajectory data between ports from vessel static data and LTE-M positional data. The importance of movement was then calculated for each of several ship size classes and tide height conditions during navigation based on the compiled data. Our results show that customary routes could be determined effectively by reflecting this information in a weight map.

To verify the efficacy of the proposed approach, we applied the method to infer routes using the A* algorithm and measured the cosine similarity with trajectory data between actual ports. Consequently, high cosine similarity was observed in areas with sufficient trajectory data between ports, which confirms that the weight map of movement importance across grid cells reflected the operational characteristics of actual vessels accurately. Furthermore, there was no significant decrease in computational speed during the process of inferring customary routes compared to an existing state-of-the-art method relying on a distance-based weight map.

This study differs from previous research in that the proposed method quantifies operational importance using actual ship positional data and incorporates it into route inference. Our results are significant as an empirical foundation for further research on techniques that rely on e-Navigation and MASS systems to improve the accuracy of route planning. Future investigations should refine the inference of customary routes based on seasons and weather conditions and consider long-term trajectory data as well as various marine environmental factors. The proposed approach should then be expanded into a real-time route recommendation service to allow for verification in actual sea areas.

The movement-importance-based weighted map proposed in this study can be utilized in combination with real-time navigational decision-making or collision avoidance algorithms. In practical maritime operations, rule-based safety constraints—such as TSS, regulated areas, and collision-risk situations—must be given priority over route optimality. Future research should focus on integrating COLREGs and navigational regulations as hard constraints within the pathfinding process, thereby extending the proposed approach toward a safer and more practical route recommendation framework.

Author Contributions

Conceptualization, D.-J.C. and J.-R.C.; Data curation, S.S.; Formal analysis, S.S.; Funding acquisition, D.-J.C.; Investigation, S.S., J.-R.C. and J.-H.B.; Methodology, S.S.; Project administration, D.-J.C.; Software, S.S.; Supervision, D.-J.C.; Validation, D.-J.C. and J.-H.B.; Visualization, S.S.; Writing—original draft, S.S.; Writing—review and editing, D.-J.C., J.-R.C., J.-R.J. and J.-H.B. All authors have read and agreed to the published version of the manuscript.

Funding

This study is the result of research conducted in 2025 with funding from the Ministry of Oceans and Fisheries and support from the Korea Institute of Marine Science & Technology Promotion (20210645, Development of Maritime Data Science Technology based on the Korean e-Navigation Service).

Data Availability Statement

The datasets presented in this article are not readily available because the data are confidential and provided by the Ministry of Oceans and Fisheries.

Conflicts of Interest

Authors Seung Sim, Jun-Rae Cho, and Jae-Ryong Jung were employed by the company Suresoft Technologies Inc. The remaining authors declare that the research was conducted in the absence of any additional commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

TF-IDF	Term frequency–inverse document frequency
AIS	Automatic identification system
LTE-M	Long-term evolution for maritime
MASS	Maritime autonomous surface ships
MRN	Maritime resource name
MMSI	Maritime mobile service identity
ECS	Electronic chart system
SOG	Speed over ground
COG	Course over ground
UKC	Under keel clearance
LOA	Length overall
GT	Gross tonnage
TSS	Traffic separation schemes

References

1. International Maritime Organization (IMO). e-Navigation Strategy Implementation Plan—Update 1 (MSC.1/Circ.1595); IMO: London, UK, 2018; Available online: https://wwwcdn.imo.org/localresources/en/OurWork/Safety/Documents/enavigation/MSC.1-Circ.1595%20-%20E-Navigation%20Strategy%20Implementation%20Plan%20-%20Update%201%20(Secretariat)%20(2).pdf (accessed on 12 April 2025).
2. Ministry of Oceans and Fisheries (MOF). e-Navigation Service. Available online: https://e-navigation.mof.go.kr/ (accessed on 12 April 2025).
3. International Maritime Organization (IMO). e-Navigation. Available online: https://www.imo.org/en/OurWork/Safety/Pages/eNavigation.aspx (accessed on 12 April 2025).
4. International Telecommunication Union (ITU). Technical Characteristics for an Automatic Identification System Using Time-Division Multiple Access in the VHF Maritime Mobile Band (Recommendation ITU-R M.1371-5); ITU, 2014; Available online: https://www.itu.int/dms_pubrec/itu-r/rec/m/r-rec-m.1371-5-201402-i!!pdf-e.pdf (accessed on 12 April 2025).
5. International Maritime Organization (IMO). Autonomous Shipping. Available online: https://www.imo.org/en/MediaCentre/HotTopics/Pages/Autonomous-shipping.aspx (accessed on 12 April 2025).
6. Choi, W.-J.; Lee, J.-S. A balanced path-following approach to course change and original course convergence for autonomous vessels. J. Mar. Sci. Eng. 2024, 12, 1831.
7. Hart, P.E.; Nilsson, N.J.; Raphael, B. A formal basis for the heuristic determination of minimum cost paths. IEEE Trans. Syst. Sci. Cybern. 1968, 4, 100–107.
8. Dijkstra, E.W. A note on two problems in connection with graphs. Numer. Math. 1959, 1, 269–271.
9. Harati-Mokhtari, A.; Wall, A.; Brooks, P.; Wang, J. Automatic Identification System (AIS): Data reliability and human error implications. J. Navig. 2007, 60, 373–389.
10. Gao, D.; Zhu, Y.; Zhang, J.; He, Y.; Yan, K.; Yan, B. A novel MP-LSTM method for ship trajectory prediction based on AIS data. Ocean Eng. 2021, 228, 108956.
11. Sun, Y.; Chen, X.Q.; Jun, L.; Zhao, J.S.; Hu, Q.Y.; Fang, X.H.; Yan, Y. Ship trajectory cleansing and prediction with historical AIS data using an ensemble ANN framework. Int. J. Innov. Comput. Inf. Control 2021, 17, 443.
12. Liu, J.; Shi, G.; Zhu, K. Vessel trajectory prediction model based on AIS sensor data and adaptive chaos differential evolution support vector regression. Appl. Sci. 2019, 9, 2983.
13. Zaman, U.; Khan, J.; Lee, E.; Balobaid, A.S.; Aburasain, R.Y.; Kim, K. Deep learning innovations in South Korean maritime navigation. PLoS ONE 2024, 19, e0310385.
14. Liu, H.; Shan, Q.; Cao, Y.; Xu, Q. Global path planning of unmanned surface vehicle in complex sea areas. J. Mar. Sci. Eng. 2024, 12, 1324.
15. Li, Y.; Yang, F.; Zhang, X.; Yu, D.; Yang, X. Improved D* Lite algorithm for ship route planning. J. Mar. Sci. Eng. 2024, 12, 1554.
16. Gu, Q.; Zhen, R.; Liu, J.; Li, C. An improved RRT algorithm based on prior AIS information. Ocean Eng. 2023, 279, 114595.
17. Charalambopoulos, N.; Xidias, E.; Nearchou, A. Efficient ship weather routing using probabilistic roadmaps. Ocean Eng. 2023, 273, 114031.
18. Zhai, Y.; Cui, J.; Meng, F.; Xie, H.; Hou, C.; Li, B. Ship path planning based on sparse A* algorithm. J. Mar. Sci. Appl. 2025, 24, 238–248.
19. Liu, Z.; Cui, J.; Meng, F.; Xie, H.; Dan, Y.; Li, B. Intelligent ship route planning based on adaptive step size informed-RRT*. J. Mar. Sci. Appl. 2025, 24, 829–839.
20. He, Y.K.; Zhang, D.; Zhang, J.F.; Zhang, M.Y.; Li, T.W. Ship route planning using historical trajectories derived from AIS data. TransNav 2019, 13, 69–76.
21. Yuan, X.; Wang, J.; Zhao, G.; Wang, H. Comprehensive study on optimizing inland waterway vessel routes using AIS data. J. Mar. Sci. Eng. 2024, 12, 1775.
22. Huang, I.-L.; Lee, M.-C.; Chang, L.; Huang, J.-C. Development and application of an advanced AIS-based ship trajectory extraction framework for maritime traffic analysis. J. Mar. Sci. Eng. 2024, 12, 1672.
23. Oh, M.-J.; Roh, M.-I.; Park, S.-W.; Chun, D.-H.; Son, M.-J.; Lee, J.-Y. Operational analysis of container ships by using maritime big data. J. Mar. Sci. Eng. 2021, 9, 438.
24. Freeman, L.C. Centrality in social networks: Conceptual clarification. Soc. Netw. 1979, 1, 215–239.
25. Brin, S.; Page, L. The anatomy of a large-scale hypertextual web search engine. Comput. Netw. ISDN Syst. 1998, 30, 107–117.
26. Salton, G.; Buckley, C. Term-weighting approaches in automatic text retrieval. Inf. Process. Manag. 1988, 24, 513–523.
27. Naver Map. Available online: https://map.naver.com (accessed on 23 March 2023).
28. Google Maps. Available online: https://www.google.com/maps (accessed on 23 March 2023).
29. Korea Research Institute of Ships and Ocean Engineering (KRISO). Algorithm Specification for the Optimal Safe Route Service (SV30); KeN-DDS-SV30-003; Version 2.0; KRISO: Daejeon, Republic of Korea, 2020.
30. Lee, W.-H.; Yoo, W.-C.; Choi, G.-H.; Ham, S.-H.; Kim, T.-W. Determination of optimal ship route in coastal sea considering sea state and under keel clearance. J. Soc. Nav. Archit. Korea 2019, 56, 480–487.
31. International Maritime Organization (IMO). Convention on the International Regulations for Preventing Collisions at Sea (COLREGs); IMO: London, UK, 1972; Available online: https://www.imo.org/en/about/conventions/pages/colreg.aspx (accessed on 19 December 2025).
32. Tam, C.; Bucknall, R.; Greig, A. Review of collision avoidance and path planning methods for ships. Proc. Inst. Mech. Eng. Part M 2009, 223, 13–25.
33. Johansen, T.A.; Perez, T.; Cristofaro, A. Ship collision avoidance and COLREGs compliance using maneuvering policies. IEEE Trans. Intell. Transp. Syst. 2016, 17, 3407–3422.

Figure 1. Port area labeling.

Figure 2. Obstacle data. (a) Incheon; (b) Geoje; (c) Mokpo; and (d) Yeosu.

Figure 3. Correction of navigable areas for small vessels. (a) Obstacle data near Yeosu and small vessel LTE-M positional data; (b) Corrected obstacle data near Yeosu; (c) Obstacle data in the Gogunsan Islands and small vessel LTE-M positional data; and (d) Corrected obstacle data in the Gogunsan Islands.

Figure 4. Binarized grid cells. (a) Obstacle grid near Pohang; (b) Binarization near Pohang; (c) Obstacle grid near Pyeongtaek; and (d) Binarization near Pyeongtaek.

Figure 5. 32-direction weight map. (a) Without obstacles; (b) With obstacles.

Figure 6. Trajectory data between ports. (a) Sogyeongdo Pier—Yeosu Sinbuk Port; (b) Wando Port—Jeju Port; (c) Misu Port—Tongyeong West Port; and (d) Gwangyang Port—Pohang Shinhang (new port).

Figure 7. Refinement of fishing vessel operation trajectories. (a) Original fishing vessel trajectory and (b) Refined fishing vessel trajectory.

Figure 8. Weight map of Incheon Port to Pyeongtaek Port by ship size. (a) Vessels under 20 m; (b) Vessels 21–40 m.

Figure 9. Weight map for Masan Port to Pyeongtaek Port by tide height. (a) Maximum tide height; (b) Minimum tide height.

Figure 10. Results of inferring customary routes. (a) Gamdo Harbor—Sinwol Harbor; (b) Seogwi Port—Jeju Port; (c) Jeonjangpo Harbor—Songdo Harbor; and (d) Jindo Port—Sangchujah Port.

Figure 11. Inferred customary routes by tidal condition. (a) Geomun Port, Geoje—Dong Port, Tongyeong; (b) Uldo Port—Deokjeokdo Port; (c) Busan Port—Pohang Shinhang (new port); and (d) Heoryukdo Pier—Ocheon Port.

Figure 12. Trajectories and navigation routes of small vessels.

Table 1. Transmission rates of AIS data for Class A vessels.

Ship Status and Speed	Reporting Interval
Ships anchored or moored and not moving faster than 3 knots	3 min
Ships anchored or moored and moving faster than 3 knots	10 s
Ships with speeds ranging from 0 to 14 knots	10 s
Ships with speeds ranging from 0 to 14 knots currently changing course	3 1/3 s
Ships with speeds ranging from 14 to 23 knots	6 s
Ships with speeds ranging from 14 to 23 knots currently changing course	2 s
Ship speed > 23 knots	2 s
Ship speed > 23 knots and changing course	2 s
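
The rules in Table 1 can be written as a small lookup, which makes the irregularity of AIS reporting easy to compare with a fixed 3 s interval. The function name and the handling of the exact boundary speeds (14 and 23 knots are assigned to the lower band) are assumptions, since the table does not state which band includes the boundary.

```python
def ais_class_a_interval_s(speed_kn, changing_course=False, anchored=False):
    """Reporting interval in seconds following Table 1; hypothetical helper."""
    if anchored:
        # Anchored or moored: 3 min unless moving faster than 3 knots.
        return 180.0 if speed_kn <= 3 else 10.0
    if speed_kn <= 14:
        return 10.0 / 3.0 if changing_course else 10.0
    if speed_kn <= 23:
        return 2.0 if changing_course else 6.0
    # Above 23 knots the interval is 2 s with or without a course change.
    return 2.0
```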

Table 2. Summary of representative studies on route calculation algorithms.

Reference	Algorithm Used	Main Features	Performance Evaluation
Liu et al., 2024 [14]	A*	Route planning considering depth and obstacle information	Route safety, suitability of avoidance path
Li et al., 2024 [15]	Improved D* Lite	Enhanced version of D* Lite with improved computational efficiency and avoidance path quality	Computation time, path quality
Gu et al., 2023 [16]	Improved RRT	RRT planning using AIS/DP-based advance information	Path length, number of nodes, computation time
Charalambopoulos et al., 2023 [17]	PRM	PRM for vessel route navigation using weather and oceanographic information	sailing time, fuel consumption, weather hazards
Zhai et al., 2025 [18]	Sparse A*	Reducing the search space with sparse A* using only important points as nodes	Number of nodes, computation time, path quality
Liu et al., 2025 [19]	Informed-RRT*	Informed-RRT* adjusts step size according to environment complexity	Path length, number of sampling nodes, computation time
He et al., 2019 [20]	Dijkstra + ACO	Graph-based route optimization after AIS turning point clustering	Path length, avoidance of hazardous waters, safety

Table 3. Static Data from Vessels.

Table	Field	Description
Ship Detail	MRN	This is used as a unique identifier and address for data transmission and reception in the Korean e-Navigation service and as a foreign key to match with other tables.
	Vessel type code	The vessel type can be identified by matching this code with the Common Code table.
	Tonnage	Gross Tonnage of the vessel.
	Length	Length Overall of the vessel.
	Width	Width of the vessel.
	Draft	Draft of the vessel.
Common Code	Code identifier	This code is used as a foreign key to link with the Ship Detail table.
Common Code	Code name	This indicates the classification types of vessels, such as passenger ships, fishing boats, and cargo ships.
Ship Master	MRN	This is used as a foreign key to link to the Ship Detail table.
	MMSI	The Maritime Mobile Service Identity is used as a unique identification number for ships.
	Vessel usage status	This indicates whether the ship is operational.
Router Master	MRN	This is used as a foreign key to link to the Ship Detail table.
Router Master	Router usage status	This indicates whether or not to use a router for service communication.
ECS Master	MRN	This is used as a foreign key to link to the Ship Detail table.
ECS Master	ECS usage status	This indicates whether or not the service terminal is used.

Table 4. LTE-M Positional Data.

Column	Data Type	Description
Message Send Date Time	string	Timestamp when the position message was sent (YYYYMMDDHHmmssSSS)
Message Source ID	string	MRN of the message source (unique vessel identifier)
SOG	float	Vessel speed over ground in knots
COG	float	Vessel course over ground in degrees (0–359°)
Latitude	float	Latitude of vessel position (WGS84)
Longitude	float	Longitude of vessel position (WGS84)
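
As a minimal sketch of handling the timestamp format in Table 4, the following parses a `YYYYMMDDHHmmssSSS` string and measures the gap between two consecutive messages. The function name and the sample timestamps are hypothetical, and the result is a naive `datetime` because the time zone is not stated in the table.

```python
from datetime import datetime

def parse_send_time(stamp):
    """Parse a 17-character YYYYMMDDHHmmssSSS string; hypothetical helper."""
    if len(stamp) != 17 or not stamp.isdigit():
        raise ValueError(f"unexpected timestamp: {stamp!r}")
    base = datetime.strptime(stamp[:14], "%Y%m%d%H%M%S")
    # The last three digits are milliseconds.
    return base.replace(microsecond=int(stamp[14:]) * 1000)

# Made-up sample values spaced one reporting period apart.
t0 = parse_send_time("20230101120000000")
t1 = parse_send_time("20230101120003000")
gap_s = (t1 - t0).total_seconds()
```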

Table 5. Port area data.

Column	Data Type	Description
Port Identifier	integer	Unique numeric identifier for the port or pier
Port Name	string	Name of the port or pier
Geometry	polygon	Geospatial polygon representing the spatial extent of the port or pier

Table 6. Object data from electronic charts.

Category	Features
Navigation Aids	Beacon cardinal, Beacon isolated danger, Beacon lateral, Beacon special purpose general, Daymark, Pile, Pylon bridge support
Anchorage & Mooring	Anchorage area, Floating dock, Mooring warping facility, Pontoon
Buoys	Buoy artificial, Buoy cardinal, Buoy installation, Buoy isolated danger, Buoy lateral, Buoy safe water, Buoy special purpose general
Bridges & Structures	Bridge, Span fixed, Span opening, Conveyor, Pipeline overhead, Shoreline construction
Fishing & Facilities	Fishing facility, Offshore platform, Offshore production area, Oil barrier, Hulk
Traffic Management	Traffic separation zone

Table 7. Coastline data.

Column	Data Type	Description
City/County/District Group	integer	Administrative region code for city, county, or district group
Island Group	integer	Group identifier for associated islands
Source Date	integer	Year of the source data (2022)
Geometry	line string	Geospatial line string representing the coastline (WGS84)

Table 8. Fishing ground data.

Column	Data Type	Description
Geometric Identifier	integer	Unique numeric identifier for the geometric feature
Category Code	integer	Code representing the category or type of the fishery area
Revision Year	integer	Year when the data or boundaries were last revised (2022)
Area	float	Area size of the fishery
Longitude	float	Longitude of a representative point in the fishery area (WGS84)
Latitude	float	Latitude of a representative point in the fishery area (WGS84)
Geometry	polygon	Geospatial polygon defining the spatial extent of the fishery area

Table 9. Aquaculture farm data.

Column	Data Type	Description
Object Number	integer	Unique numeric identifier for the aquaculture site
Electronic Navigational Chart Level	integer	Chart scale or level classification according to electronic navigational charts
Source Date	string	Date when the source data was recorded (YYYYMMDD)
Geometry	polygon	Geospatial polygon representing the boundary of the aquaculture site

Table 10. Depth data.

Column	Data Type	Description
Origin Latitude	float	Latitude of the data starting point (WGS84)
Origin Longitude	float	Longitude of the data starting point (WGS84)
Spacing Latitudinal	float	Grid spacing in the latitudinal direction, in degrees (0.001)
Spacing Longitude	float	Grid spacing in the longitudinal direction, in degrees (0.001)
Depth	float	Measured water depth at the location (meters; negative = below sea level)

Table 11. Tidal data.

Column	Data Type	Description
Station Name	string	Unique name of the observation station
Latitude	float	Latitude of observation station
Longitude	float	Longitude of observation station
Record Time	string	Observation time (YYYYMMDDHHmm)
Tide Type	string	MAX for highest tide, MIN for lowest tide
Tide Height	integer	Tide height (in centimeters)

Table 12. Setting the navigable water depth by ship size group.

LOA (Length Overall)	Draft	UKC (Under-Keel Clearance)	Navigable Water Depth (Draft + UKC)
1–20 m	1 m	0.3 m	1.3 m
21–40 m	3 m	0.9 m	3.9 m
41–80 m	5 m	1.5 m	6.5 m
81–200 m	7 m	2.1 m	9.1 m
>200 m	9 m	2.7 m	11.7 m
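
In every row of Table 12 the UKC equals 30% of the draft, so the navigable water depth can be sketched as a one-line calculation. The function names and the `ukc_ratio` parameter are assumptions inferred from the table values, and the navigability check uses the sign convention of Table 10 (negative depth values lie below sea level). Tide height is not included in this sketch.

```python
def navigable_depth_m(draft_m, ukc_ratio=0.3):
    """Draft plus under-keel clearance; ukc_ratio is inferred from Table 12."""
    return draft_m + draft_m * ukc_ratio

def is_navigable(depth_m, draft_m):
    """Check a grid cell; depth_m is negative below sea level (Table 10)."""
    return -depth_m >= navigable_depth_m(draft_m)
```

For example, a vessel with a 3 m draft needs about 3.9 m of water, so a cell with a depth value of -4.0 passes the check and a cell with -3.0 does not.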

Table 13. Rules for assigning ship code.

Prefix of Ship Code	Ship Length Overall	Second Segment of the Ship Code	Ship Type
A	1–20 m	Axx	Passenger ship
B	21–40 m	Bxx	Fishing
C	41–80 m	Cxx	Cargo ship
D	81–200 m	Dxx	Official vessel
E	>200 m	Exx	Miscellaneous
		Fxx	Oil tanker
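
The first two letters of a ship code follow directly from Table 13 and can be sketched as below. The function and constant names are hypothetical, the last size class is assumed to cover lengths above 200 m, and the remaining characters of the code (the two-digit subtype and the trailing identifier seen in Table 14) are not generated because their rules are not given here.

```python
# Upper LOA limit in meters for each size class, per Table 13.
SIZE_CLASSES = [(20, "A"), (40, "B"), (80, "C"), (200, "D")]

# First letter of the second code segment, per Table 13.
TYPE_LETTERS = {
    "Passenger ship": "A",
    "Fishing": "B",
    "Cargo ship": "C",
    "Official vessel": "D",
    "Miscellaneous": "E",
    "Oil tanker": "F",
}

def ship_code_prefix(loa_m, ship_type):
    """Return the size letter followed by the type letter; hypothetical helper."""
    # Anything longer than the last listed limit falls into class E (assumed).
    size = next((letter for limit, letter in SIZE_CLASSES if loa_m <= limit), "E")
    return size + TYPE_LETTERS[ship_type]
```

Applied to rows of Table 14, a 53 m passenger ship yields `CA` and a 15 m fishing vessel yields `AB`, which matches the leading letters of `CA010000` and `AB01008e`.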

Table 14. Example ship table.

SHIP_CODE	MRN	MMSI	GT	LOA	Width	Draft	Ship Type
CA010000	(HASH)	(HASH)	388	53	11	3	Passenger ship
AB01008e	(HASH)	(HASH)	10	15	5	1	Fishing
AB020313	(HASH)	(HASH)	9	15	4	1	Fishing (Drift Net)
CA01001c	(HASH)	(HASH)	494	44	12	4	Passenger ship
CA050038	(HASH)	(HASH)	225	44	9	5	Passenger ship (Car Ferry)

Table 15. Processed positional data.

Column	Definition
szMsgSendDT	Message Send Date Time
SHIP_CODE	Size-and-type-based vessel identifier
dSOG	Speed over ground
dCOG	Course over ground
dLat	Latitude
dLon	Longitude

Table 16. Results of cosine similarity measurement.

Origin	Destination	LOA	Cosine Similarity
Gamdo Harbor, Yeosu	Sinwol Harbor, Yeosu	1–20 m	0.7602
Seogwi Port	Jeju Port	21–40 m	0.8644
Jeonjangpo Harbor, Sinan	Songdo Harbor, Sinan	1–20 m	0.8954
Jindo Port, Jindo	Sangchujah Port, Jindo	1–20 m	0.8897
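
As a rough illustration of the metric in Table 16, the sketch below compares an inferred route with a trajectory by counting how often each grid cell is visited and taking the cosine of the two count vectors. How routes and trajectories are vectorized in this study is not stated here, so the visit-count representation, the function name, and the sample cells are assumptions.

```python
import math
from collections import Counter

def cosine_similarity(cells_a, cells_b):
    """Cosine similarity of two cell sequences via visit-count vectors."""
    a, b = Counter(cells_a), Counter(cells_b)
    # Only cells present in both sequences contribute to the dot product.
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# Made-up route and trajectory that differ in one of four cells.
route = [(0, 0), (0, 1), (0, 2), (0, 3)]
track = [(0, 0), (0, 1), (1, 2), (0, 3)]
similarity = cosine_similarity(route, track)
```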

Table 17. Comparison of pathfinding times.

Origin	Destination	A*, Distance Based (s)	A*, Movement Importance Based (s)	Dijkstra, Distance Based (s)	Dijkstra, Movement Importance Based (s)
Gamdo Harbor	Sinwol Harbor	0.73	1.41	9.10	17.34
Seogwi Port	Jeju Port	2.03	5.29	149.64	86.34
Jeonjangpo Harbor	Songdo Harbor	0.57	0.34	2.45	15.59
Jindo Port	Sangchujah Port	0.62	1.23	197.46	38.06


