The clipped loss function can be defined as follows. To prevent drastic changes in the policy, the probability ratio $r_t(\theta)$ is constrained not to deviate beyond the range $[1-\epsilon,\ 1+\epsilon]$:
$$ L_t^{\mathrm{CLIP}}(\theta) = \hat{\mathbb{E}}_t\left[ \min\left( r_t(\theta)\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta),\ 1-\epsilon,\ 1+\epsilon\right)\hat{A}_t \right) \right] $$
where $L_t^{\mathrm{CLIP}}(\theta)$ is the clipped loss function, $\hat{A}_t$ is the advantage function evaluated at time $t$, $\epsilon$ is the clipping threshold, and $\mathrm{clip}(r_t(\theta), 1-\epsilon, 1+\epsilon)$ restricts $r_t(\theta)$ to lie within the interval $[1-\epsilon,\ 1+\epsilon]$.
The overall loss function is composed of three components. First, the policy loss serves as the core objective in PPO, incorporating a clipping mechanism to ensure stability during policy updates. Second, the value loss enhances the accuracy of the state-value function by penalizing deviations from the predicted returns. Lastly, the entropy bonus promotes sufficient exploration by encouraging policy stochasticity and preserving action diversity. The total loss function employed in PPO is defined as follows:
$$ L_t^{\mathrm{PPO}}(\theta) = \hat{\mathbb{E}}_t\left[ L_t^{\mathrm{CLIP}}(\theta) - c_1 L_t^{\mathrm{VF}}(\theta) + c_2 S[\pi_\theta](s_t) \right] $$
where $L_t^{\mathrm{CLIP}}(\theta)$ is the clipped policy objective function, $L_t^{\mathrm{VF}}(\theta)$ is the value estimation loss, $S[\pi_\theta](s_t)$ is the entropy-based exploration bonus, and $c_1$ and $c_2$ are hyperparameters that weight the contributions of the value function loss and the entropy regularization term in the overall objective.
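For concreteness, a minimal PyTorch-style sketch of this combined objective is given below; the tensor names (ratio, advantages, values, returns, entropy) and the default coefficient values are illustrative assumptions rather than the implementation used in this work.

    import torch

    def ppo_total_loss(ratio, advantages, values, returns, entropy,
                       clip_eps=0.2, c1=0.5, c2=0.01):
        """Clipped surrogate + value loss + entropy bonus (illustrative sketch)."""
        # Clipped policy objective: constrain the probability ratio to [1 - eps, 1 + eps]
        unclipped = ratio * advantages
        clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
        policy_loss = -torch.min(unclipped, clipped).mean()

        # Value loss: mean squared error between predicted state values and returns
        value_loss = torch.nn.functional.mse_loss(values, returns)

        # Entropy bonus: keeps the policy stochastic to encourage exploration
        entropy_bonus = entropy.mean()

        # Total loss minimized by the optimizer
        return policy_loss + c1 * value_loss - c2 * entropy_bonus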
Fuel Consumption Minimization Model—PPO
A reinforcement learning framework using the PPO algorithm is applied to derive fuel-efficient ship routing strategies. The state space is a six-dimensional vector comprising cumulative distance, vessel speed, wind speed, current speed, wave height, and wave period. The action space is discrete: each action combines a position and a speed, with speed discretized on a uniform grid between 10 and 19 knots. At each decision step, the agent observes the current state and selects a position and speed combination.
In position selection, the only valid positions are those within two grid points to the left or right (a lateral window) of the previously selected point. The reward function is defined as the negative of the fuel consumption predicted by a transformer-based fuel consumption prediction model, so that the agent learns routing policies that lead to lower fuel consumption.
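A small Python sketch of this lateral-window constraint is given below; the window width of two grid points follows the description above, but the function name and grid indexing are assumptions.

    import numpy as np

    def valid_lateral_positions(last_index: int, num_lateral: int, window: int = 2):
        """Indices of grid positions reachable from the last selected point.

        Only positions within +/- `window` grid points (here, two) of the
        previously chosen lateral index are considered valid actions.
        """
        low = max(0, last_index - window)
        high = min(num_lateral - 1, last_index + window)
        return np.arange(low, high + 1)

    # Example: with 7 lateral grid points and the last choice at index 5,
    # the agent may move to indices 3, 4, 5, 6.
    print(valid_lateral_positions(last_index=5, num_lateral=7))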
The ship fuel consumption minimization problem via waypoint optimization can be mathematically formulated as a Markov Decision Process (MDP), enabling the application of reinforcement learning algorithms to derive optimal routing policies. The MDP is defined by a set of states $s_t$, actions $a_t$, rewards $r_t$, a state transition function $P(s_{t+1} \mid s_t, a_t)$, and a termination condition $T$. Based on this formulation, the agent learns optimal routing and speed control policies. In the reinforcement learning environment, the state $s_t$ at time step $t$ is composed of six elements, capturing key dynamic and environmental factors affecting ship routing: the distance between the current and previous waypoints ($d_t$), the currently selected vessel speed, i.e., speed over ground (SOG) ($v_t$), wind speed ($w_t$), ocean current speed ($c_t$), wave height ($h_t$), and wave period ($p_t$). The state vector is formally expressed as follows:
$$ s_t = \left[ d_t,\ v_t,\ w_t,\ c_t,\ h_t,\ p_t \right] $$
In reinforcement learning, the state space $S$ varies dynamically depending on the batch size $B$, and the space of observation vectors can be defined as follows:
$$ S \in \mathbb{R}^{B \times 6} $$
where $B$ is the batch size.
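As an illustrative sketch only, a batch of such six-dimensional state vectors could be assembled as follows; the feature values and array names are placeholders.

    import numpy as np

    def make_state(d_t, v_t, w_t, c_t, h_t, p_t):
        """Six-dimensional state: distance, speed, wind, current, wave height, wave period."""
        return np.array([d_t, v_t, w_t, c_t, h_t, p_t], dtype=np.float32)

    # A batch of B observations forms a (B, 6) array, matching the batched state space.
    batch = np.stack([
        make_state(12.4, 14.0, 6.2, 0.8, 1.5, 7.1),
        make_state(11.9, 15.0, 5.7, 0.6, 1.2, 6.8),
    ])
    print(batch.shape)  # (2, 6)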
The agent’s action $a_t$ at time step $t$ is represented as a combination of two components: the index of the selected waypoint $l_t$ and the vessel speed $v_t$. The action is formally expressed as follows:
$$ a_t = \left( l_t,\ v_t \right) $$
The action space $A$ is defined as the Cartesian product of the discrete set of waypoint indices and the discretized set of vessel speeds. Formally, it can be written as follows:
$$ A = \{1,\ 2,\ \ldots,\ L\} \times \{v_{\min},\ \ldots,\ v_{\max}\} $$
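A brief sketch of enumerating this Cartesian product is shown below, assuming a 1 kn speed grid from 10 to 19 kn and a placeholder number of candidate waypoints per step.

    from itertools import product

    NUM_WAYPOINT_CANDIDATES = 5          # assumed number of candidate waypoints per step
    SPEED_GRID_KN = list(range(10, 20))  # assumed uniform 1-knot grid from 10 to 19 knots

    # Discrete action space: every (waypoint index, speed) combination
    ACTIONS = list(product(range(NUM_WAYPOINT_CANDIDATES), SPEED_GRID_KN))

    print(len(ACTIONS))   # 5 candidates x 10 speeds = 50 discrete actions
    print(ACTIONS[0])     # (0, 10)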
At each time step $t$, the agent selects a combination of waypoint index $l_t$ and vessel speed $v_t$, represented as the action $a_t$. The reward function $r_t$ is designed to encourage the agent to minimize fuel consumption. Specifically, the fuel consumption $F_t$ at time $t$ is predicted by a fuel consumption prediction model $f_{\mathrm{fuel}}(x_t)$, where $x_t$ denotes the feature vector at the current time step. The fuel consumption is computed as follows:
$$ F_t = f_{\mathrm{fuel}}(x_t) $$
The reward $r_t$ is then defined as the negative value of the predicted fuel consumption, such that lower fuel consumption results in higher rewards. Formally, the reward is expressed as follows:
$$ r_t = -F_t $$
This reward formulation drives the agent to learn a policy that reduces the overall fuel consumption.
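A minimal sketch of this reward computation is given below; fuel_model stands in for the transformer-based fuel consumption predictor, and its predict interface is an assumption for illustration.

    import numpy as np

    def compute_reward(fuel_model, x_t: np.ndarray) -> float:
        """Reward is the negative of the predicted fuel consumption F_t = f_fuel(x_t)."""
        # fuel_model.predict is an assumed interface returning the per-step fuel estimate
        fuel_t = float(fuel_model.predict(x_t[None, :]))
        return -fuel_t  # lower consumption -> higher reward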
The state transition function $P(s_{t+1} \mid s_t, a_t)$ defines the transition to a new state $s_{t+1}$ resulting from the agent’s action $a_t$ taken at time step $t$, given the current state $s_t$. The state transition is represented as follows:
$$ s_{t+1} = f_{\mathrm{state}}(s_t, a_t) $$
where $f_{\mathrm{state}}$ denotes the state transition function, $s_t$ is the current state, and $a_t$ is the agent’s action.
The state transition process involves computing the distance between the previous and current waypoints, as well as the corresponding fuel consumption, to generate the next observation state $s_{t+1}$.
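One possible realization of this transition, sketched below, computes the great-circle distance to the newly selected waypoint and assembles the next observation; the haversine helper and the weather-lookup interface are illustrative assumptions.

    import math

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance in kilometres between two (lat, lon) points."""
        r = 6371.0
        phi1, phi2 = math.radians(lat1), math.radians(lat2)
        dphi = math.radians(lat2 - lat1)
        dlmb = math.radians(lon2 - lon1)
        a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
        return 2 * r * math.asin(math.sqrt(a))

    def transition(prev_wp, next_wp, v_t, weather):
        """f_state: build s_{t+1} from the chosen waypoint, speed, and local conditions."""
        d_next = haversine_km(*prev_wp, *next_wp)
        w, c, h, p = weather  # wind speed, current speed, wave height, wave period at next_wp
        return [d_next, v_t, w, c, h, p]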
The termination condition $E_t$ defines the criteria for ending an episode. The episode terminates when the reinforcement learning agent reaches a predefined maximum number of time steps or the final waypoint. The termination condition is formally expressed as follows:
$$ E_t = \mathbb{1}\left[\, t \geq T_{\max}\ \ \lor\ \ W_t = W_{\mathrm{final}} \,\right] $$
where $T_{\max}$ is the total number of time steps, $W_t$ is the number of waypoints evaluated up to time $t$, and $W_{\mathrm{final}}$ is the target point, i.e., the final waypoint.
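A compact sketch of this termination test, assuming the episode ends at either the step budget or the final waypoint, is:

    def is_terminal(t: int, w_t: int, t_max: int, w_final: int) -> bool:
        """Episode ends when the step budget T_max is exhausted or the final waypoint is reached."""
        return t >= t_max or w_t >= w_final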
By leveraging the defined state space, action space, reward function, state transition dynamics, and termination condition, the waypoint optimization problem is formally modeled as a Markov Decision Process (MDP). This formulation enables the agent to learn a policy that selects optimal waypoints $l_t$ and vessel speeds $v_t$ at each decision step, with the objective of minimizing overall fuel consumption.
The reward function graph for the PPO model designed to minimize fuel consumption converges, as illustrated in Figure 4.
Waypoint Number Optimization Method—PPO
The PPO algorithm optimizes waypoint location and count by learning terrain-aware placement through a depth-based reward system. The state space includes latitude, longitude, and water depth at the current location. The agent performs three actions: selecting a location, adjusting speed, and adding or deleting waypoints. The reward function considers both depth and waypoint characteristics.
Based on the vessel’s maximum draft, shallow water is defined as under 50 m and deep water as over 200 m. Deleting waypoints in deep water yields a positive reward, while doing so in shallow water results in a penalty. To maintain a reasonable waypoint count, strong penalties are applied if the number falls below 3 or exceeds 100. The allowed distance between waypoints ranges from 10 km to 200 km, with penalties beyond this range. This structure leads the agent to prefer sparse waypoint placement in deep waters and denser placement in coastal or shallow areas, mimicking experienced mariners’ navigation strategies. This reduces unnecessary course changes, enhances efficiency, and simplifies routing. The state space is defined as follows:
$$ S_t = \left[ \mathrm{lat}_t,\ \mathrm{lon}_t,\ d_t \right] $$
where $\mathrm{lat}_t$ and $\mathrm{lon}_t$ are the latitude and longitude coordinates observed at time $t$, and $d_t$ is the bathymetric depth at time $t$.
In reinforcement learning, the state space S varies dynamically depending on the batch size B, and the space of observation vectors can be defined as Equation (15).
The agent’s action $A_t$ at time step $t$ is composed of two components, as follows:
$$ A_t = \left( l_t,\ a_t \right) $$
The first component, $l_t$, represents the index of the waypoint selected by the agent at time step $t$. It is chosen from a discrete set of available waypoints, where $l_t \in \{1, 2, \ldots, L\}$. The second component, $a_t$, represents the type of modification applied to the waypoint set. This is a discrete action where $a_t \in \{0, 1, 2\}$, with the following meanings:
$a_t = 0$: maintain the current waypoint set;
$a_t = 1$: add a new waypoint;
$a_t = 2$: delete an existing waypoint.
This action structure allows the agent not only to select navigation waypoints but also to dynamically adjust the waypoint set during training, thereby enabling more flexible route optimization.
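The sketch below illustrates how such an action could be applied to a working list of waypoints; the function, constants, and minimum-size guard are assumptions for illustration.

    MAINTAIN, ADD, DELETE = 0, 1, 2

    def apply_waypoint_action(waypoints, l_t, a_t, new_waypoint=None):
        """Apply (l_t, a_t): keep the set, insert after index l_t, or delete the waypoint at l_t."""
        waypoints = list(waypoints)          # work on a copy
        if a_t == ADD and new_waypoint is not None:
            waypoints.insert(l_t + 1, new_waypoint)
        elif a_t == DELETE and len(waypoints) > 2:
            waypoints.pop(l_t)               # never delete below the start/end pair
        return waypoints                     # a_t == MAINTAIN leaves the set unchanged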
The action space $A$ consists of combinations of possible waypoint indices and discrete operations (maintaining, adding, or deleting a waypoint). It can be defined as follows:
$$ A = \{1,\ 2,\ \ldots,\ L\} \times \{0,\ 1,\ 2\} $$
The reward function $R_t$, which evaluates the agent’s action $A_t$, is constructed based on three key criteria. The first criterion is a depth-based reward: adding waypoints in deeper waters (i.e., open ocean) is rewarded, whereas adding waypoints in shallow waters (i.e., coastal regions) incurs a penalty. The second criterion is a waypoint-count-based reward: a penalty is applied when the number of waypoints exceeds a certain threshold, discouraging excessively dense routing. The third criterion is a distance-based reward: a penalty is imposed if the distance between consecutive waypoints becomes too small, encouraging spatial efficiency in the planned route. The reward function is formally defined as follows:
$$ R_t = r_{\mathrm{depth}} + r_{\mathrm{distance}} + r_{\mathrm{waypoint}} $$
where $r_{\mathrm{depth}}$ is the depth-based reward, $r_{\mathrm{distance}}$ is the distance-based reward between waypoints, and $r_{\mathrm{waypoint}}$ is the reward based on the number of waypoints.
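A hedged sketch combining the three reward terms is given below; the depth, count, and spacing thresholds follow the values stated earlier in this section, while the reward and penalty magnitudes are placeholders.

    DELETE = 2  # action code for deleting a waypoint, as defined above

    def waypoint_reward(depth_m, num_waypoints, spacing_km, action):
        """R_t = r_depth + r_waypoint + r_distance (illustrative magnitudes)."""
        r_depth = 0.0
        if action == DELETE:
            # Deleting in deep water (> 200 m) is encouraged; in shallow water (< 50 m) it is penalized.
            r_depth = 1.0 if depth_m > 200 else (-1.0 if depth_m < 50 else 0.0)

        # Keep the waypoint count within a reasonable range (3 to 100).
        r_waypoint = -5.0 if (num_waypoints < 3 or num_waypoints > 100) else 0.0

        # Keep the spacing between consecutive waypoints within 10-200 km.
        r_distance = -1.0 if (spacing_km < 10 or spacing_km > 200) else 0.0

        return r_depth + r_waypoint + r_distance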
The state transition function $f_{\mathrm{state}}$ maps the current state and action to a new state $S_{t+1}$, representing the environment’s evolution when the agent executes action $A_t$ at time $t$. The state transition function $T$ is defined as follows:
$$ S_{t+1} = f_{\mathrm{state}}(S_t, A_t) $$
In the state transition process, the new observation state is generated based on the current waypoint positions, the addition or deletion of waypoints, and the distances between consecutive waypoints. The termination condition E defines the criteria for ending an episode. An episode terminates either when the reinforcement learning agent reaches the predefined maximum time step or when all waypoints have been visited. The termination condition E is defined as Equation (21).
The episode terminates when the agent reaches the destination point of the route, at which point learning is halted and a new route is initiated for training.
The reward function graph for the PPO-based model designed to optimize the number of waypoints converges, as illustrated in Figure 5.