Development of a Reinforcement Learning-Based Ship Voyage Planning Optimization Method Applying Machine Learning-Based Berth Dwell-Time Prediction as a Time Constraint
Abstract
1. Introduction
1.1. Background
1.2. Related Works
1.2.1. Fuel Consumption Prediction
1.2.2. JIT Arrival and Port Operations
1.2.3. Optimal Ship Routing and Deep Reinforcement Learning
1.3. Research Gap
2. Data Materials
2.1. Data Frame
2.1.1. Port Operation Data
2.1.2. Voyage Measurement Data
2.1.3. Weather and Geographical Data
3. Research Method
3.1. Research Approach
3.2. Design Variables
3.3. Dwell-Time Prediction Model
3.3.1. Input and Output Definition
3.3.2. Data Refinement and Construction of the Training Dataset
3.3.3. Model Structure
3.4. Fuel Consumption Prediction Model
3.4.1. Input and Output Definition
3.4.2. Data Preprocessing and Sliding Window Construction
3.4.3. Transformer Model
3.4.4. Optimizing Transformer Model
3.5. Route Optimization Module
3.5.1. MDP Formulation
3.5.2. State and Action Space Design
3.5.3. Reward Function
Fuel-Cost Term
Time Penalty Term
Safety Penalty Term
- Fuel efficiency: the fuel-cost term Ft encourages the agent to minimize predicted fuel consumption under the current environmental and navigational conditions.
- Schedule adherence (RTA/JIT compliance): the time-penalty term drives the policy to reduce early or late arrival deviations, supporting just-in-time operation.
- Navigational safety: the safety-penalty term St penalizes entry into shallow waters and high-wave regions, guiding the agent toward risk-averse routing behavior.
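The three reward components above can be combined into a single weighted scalar reward. The sketch below is illustrative only: the weights `w_f`, `w_t`, `w_s` and the depth/wave thresholds are assumptions, not the paper's actual values, and the safety term is modeled as a flat penalty for entering an unsafe cell.

```python
def reward(fuel_pred_kg, eta_dev_h, depth_m, wave_h_m,
           w_f=1.0, w_t=0.5, w_s=2.0,
           min_depth_m=25.0, max_wave_m=4.0):
    """Composite step reward: fuel-cost, time-penalty, and safety-penalty terms.

    All weights and thresholds are illustrative assumptions.
    """
    fuel_term = -w_f * fuel_pred_kg          # penalize predicted fuel use on this step
    time_term = -w_t * abs(eta_dev_h)        # penalize ETA deviation, early or late
    unsafe = (depth_m < min_depth_m) or (wave_h_m > max_wave_m)
    safety_term = -w_s if unsafe else 0.0    # flat penalty for shallow or high-wave cells
    return fuel_term + time_term + safety_term
```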
Weight Setting
3.5.4. Deep Q-Network Structure
3.6. Post-Processing
3.6.1. Speed Adjustment for Just-in-Time Arrival
Computation of ETA Deviation
- Δt > 0: the vessel is predicted to arrive late → increase speed;
- Δt < 0: the vessel is predicted to arrive early → decrease speed.
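The sign rule above can be sketched as a simple speed nudge; the step size and the lower speed bound are illustrative assumptions, while the upper bound matches the vessel's maximum speed from the specification table.

```python
def adjust_speed(v_kn, delta_t_h, step_kn=0.5, v_min=8.0, v_max=22.4):
    """Nudge commanded speed toward on-time arrival.

    delta_t_h > 0: predicted late arrival  -> increase speed.
    delta_t_h < 0: predicted early arrival -> decrease speed.
    Step size and v_min are illustrative assumptions.
    """
    if delta_t_h > 0:
        v_kn += step_kn
    elif delta_t_h < 0:
        v_kn -= step_kn
    return min(max(v_kn, v_min), v_max)   # clamp to feasible speed range
```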
Selection of Speed-Adjustment Segments
Fuel Re-Evaluation After Speed Adjustment
3.6.2. Path Simplification of Grid-Based Routes (Douglas–Peucker)
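The Douglas–Peucker step named above removes grid waypoints that lie within a tolerance of the chord between route endpoints. The following is a generic textbook implementation, not the paper's code; the tolerance `eps` is in the same units as the coordinates.

```python
import math

def perp_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def douglas_peucker(points, eps):
    """Simplify a polyline, keeping points farther than eps from the chord."""
    if len(points) < 3:
        return points
    d_max, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = perp_dist(points[i], points[0], points[-1])
        if d > d_max:
            d_max, idx = d, i
    if d_max > eps:                     # keep the farthest point and recurse
        left = douglas_peucker(points[:idx + 1], eps)
        right = douglas_peucker(points[idx:], eps)
        return left[:-1] + right
    return [points[0], points[-1]]      # all interior points within tolerance
```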
4. Results
4.1. Dwell-Time Prediction Model Results
4.2. Fuel Consumption Prediction Model Results
4.3. Route Optimization Model Results
4.3.1. Experimental Setup
4.3.2. Case 1: Voyage Route from Gwangyang Port to Busan Port
Experimental Setup
Training Performance and Optimized Route
Fuel and Emissions Reduction Results
4.3.3. Case 2: Voyage Route from Vladivostok Port to Busan Port
Experimental Setup
Training Performance and Optimized Route
Fuel and Emissions Reduction Results
4.3.4. Case 3: Voyage Route from Ningbo Port to Busan Port
Experimental Setup
Training Performance and Optimized Route
ETA-RTA Coordination Process
4.4. Comparative Analysis with Existing Studies
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- International Maritime Organization (IMO). Revised IMO GHG Strategy 2023; IMO: London, UK, 2023; Available online: https://www.imo.org/en/ourwork/environment/pages/2023-imo-strategy-on-reduction-of-ghg-emissions-from-ships.aspx (accessed on 23 December 2025).
- International Council on Clean Transportation (ICCT). Aligning the IMO’s Greenhouse Gas Fuel Standard with Its GHG Strategy and the Paris Agreement; ICCT: Washington, DC, USA, 2024; Available online: https://theicct.org/aligning-the-imos-greenhouse-gas-fuel-standard-with-its-ghg-strategy-and-the-paris-agreement-jan24/ (accessed on 23 December 2025).
- Market.us. Autonomous Ships Market Report; Market.us: San Francisco, CA, USA, 2024; Available online: https://market.us/report/autonomous-ships-market/ (accessed on 23 December 2025).
- International Maritime Organization (IMO). Lowering Containership Emissions Through Just-in-Time Arrivals; IMO: London, UK, 2023; Available online: https://www.imo.org/en/mediacentre/pages/whatsnew-1718.aspx (accessed on 23 December 2025).
- IMO–GreenVoyage2050; Low Carbon GIA. Emissions Reduction Potential in Global Container Shipping; International Maritime Organization: London, UK, 2022. [Google Scholar]
- Uyanik, T.; Arslanoğlu, Y.; Kalenderli, O. Ship Fuel Consumption Prediction with Machine Learning. In Proceedings of the 4th International Mediterranean Science and Engineering Congress (IMSEC 2019), Alanya, Antalya, Türkiye, 25–27 April 2019; pp. 757–759. [Google Scholar]
- Zhang, M.; Tsoulakos, N.; Kujala, P.; Hirdaris, S. A Deep Learning Method for the Prediction of Ship Fuel Consumption in Real Operational Conditions. Eng. Appl. Artif. Intell. 2024, 130, 107425. [Google Scholar] [CrossRef]
- Zhang, D.; Song, Y.; Gao, Y.; Shen, Z.; Li, L.; Yau, A. Research on Ship Engine Fuel Consumption Prediction Algorithm Based on Adaptive Optimization Generative Network. J. Mar. Sci. Eng. 2025, 13, 1140. [Google Scholar] [CrossRef]
- Su, J.; Kim, H.; Park, J. Fuel Consumption Prediction and Optimization Model for Pure Car/Truck Transport Ships. J. Mar. Sci. Eng. 2023, 11, 1231. [Google Scholar] [CrossRef]
- Digital Container Shipping Association. Standards for a Just-in-Time Port Call, Version 1.0. 2020. Available online: https://dcsa.org/standards/just-in-time-port-call (accessed on 23 December 2025).
- Senss, P.; Canbulat, O.; Uzun, D.; Gunbeyaz, S.; Turan, O. Just in Time Vessel Arrival System for Dry Bulk Carriers. J. Shipp. Trade 2023, 8, 12. [Google Scholar] [CrossRef]
- Yan, S.; Tian, W.; Lin, B.; Meng, B.; Larsson, S.; Tian, J. Enhancing Ship Energy Efficiency through Just-In-Time Arrival: A Comprehensive Review. Ocean Eng. 2025, 340, 122246. [Google Scholar] [CrossRef]
- Iris, Ç.; Pacino, D.; Ropke, S.; Larsen, A. Integrated Berth Allocation and Quay Crane Assignment Problem: Set Partitioning Models and Computational Results. Transp. Res. Part E Logist. Transp. Rev. 2015, 81, 75–97. [Google Scholar] [CrossRef]
- Venturini, G.; Iris, Ç.; Kontovas, C.A.; Larsen, A. The multi-port berth allocation problem with speed optimization and emission considerations. Transp. Res. Part D Transp. Environ. 2017, 54, 142–159. [Google Scholar] [CrossRef]
- Golias, M.M.; Saharidis, G.K.; Boile, M.; Theofanis, S.; Ierapetritou, M.G. The berth allocation problem: Optimizing vessel arrival time. Marit. Econ. Logist. 2009, 11, 358–377. [Google Scholar] [CrossRef]
- Zis, T.P.V.; Psaraftis, H.N.; Ding, L. Ship Weather Routing: A Taxonomy and Survey. Ocean Eng. 2020, 209, 107697. [Google Scholar] [CrossRef]
- Wei, S.; Zhou, P. Development of a 3D Dynamic Programming Method for Weather Routing. TransNav Int. J. Mar. Navig. Saf. Sea Transp. 2012, 6, 79–85. [Google Scholar]
- Kytariolou, M.; Themelis, N.; Papakonstantinou, V. Ship Routing Optimisation Based on Forecasted Weather Data and Considering Safety Criteria. J. Navig. 2023, 75, 1310–1331. [Google Scholar] [CrossRef]
- Jeong, S.; Kim, T. Generating a path-search graph based on ship-trajectory data: Route search via dynamic programming for autonomous ships. Ocean Eng. 2023, 283, 114503. [Google Scholar] [CrossRef]
- Chen, Y.; Tian, W.; Mao, W. Strategies to improve the isochrone algorithm for ship voyage optimisation. Ships Offshore Struct. 2024, 19, 2137–2149. [Google Scholar] [CrossRef]
- Mannarini, G.; Salinas, M.L.; Carelli, L.; Petacco, N.; Orović, J. VISIR-2: Ship weather routing in Python. Geosci. Model Dev. 2024, 17, 4355–4382. [Google Scholar] [CrossRef]
- Guo, S.; Zhang, X.; Du, Y.; Zheng, Y.; Cao, Z. Path planning of coastal ships based on optimized DQN reward function. J. Mar. Sci. Eng. 2021, 9, 210. [Google Scholar] [CrossRef]
- Moradi, M.H.; Brutsche, M.; Wenig, M.; Wagner, U.; Koch, T. Marine route optimization using reinforcement learning approach to reduce fuel consumption and consequently minimize CO2 emissions. Ocean Eng. 2022, 259, 111882. [Google Scholar] [CrossRef]
- Lee, H.-T.; Kim, M.-K. Optimal path planning for a ship in coastal waters with deep Q network. Ocean Eng. 2024, 307, 118193. [Google Scholar] [CrossRef]
- Latinopoulos, C.; Zavvos, E.; Kaklis, D.; Leemen, V.; Halatsis, A. Marine voyage optimization and weather routing with deep reinforcement learning. J. Mar. Sci. Eng. 2025, 13, 902. [Google Scholar] [CrossRef]
- Shin, G.-H.; Yang, H. Deep Reinforcement Learning for Integrated Vessel Path Planning with Safe Anchorage Allocation. Brodogradnja 2025, 76, 76305. [Google Scholar] [CrossRef]
- Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533. [Google Scholar] [CrossRef]
- van Hasselt, H.; Guez, A.; Silver, D. Deep Reinforcement Learning With Double Q-Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, USA, 12–17 February 2016; Volume 30. [Google Scholar] [CrossRef]
- Wang, Z.; Schaul, T.; Hessel, M.; van Hasselt, H.; Lanctot, M.; de Freitas, N. Dueling Network Architectures for Deep Reinforcement Learning. In Proceedings of the 33rd International Conference on Machine Learning (ICML 2016), New York, NY, USA, 19–24 June 2016. [Google Scholar] [CrossRef]
- Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized Experience Replay. In Proceedings of the International Conference on Learning Representations (ICLR 2016), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar] [CrossRef]
- Hessel, M.; Modayil, J.; van Hasselt, H.; Schaul, T.; Ostrovski, G.; Dabney, W.; Horgan, D.; Piot, B.; Azar, M.G.; Silver, D. Rainbow: Combining Improvements in Deep Reinforcement Learning. In Proceedings of the AAAI Conference on Artificial Intelligence 2018, New Orleans, LA, USA, 2–7 February 2018; Volume 32. [Google Scholar] [CrossRef]
- Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal Policy Optimization Algorithms. arXiv 2017, arXiv:1707.06347. [Google Scholar] [CrossRef]
- Haarnoja, T.; Zhou, A.; Abbeel, P.; Levine, S. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor. In Proceedings of the 35th International Conference on Machine Learning (ICML 2018), Stockholm, Sweden, 10–15 July 2018. [Google Scholar] [CrossRef]
- KPI Depot. Port Stay Duration. Available online: https://kpidepot.com/kpi/port-stay-duration (accessed on 23 December 2025).
- Northern Corridor Transit and Transport Coordination Authority. Monthly Port Community Charter Report: February 2018; Northern Corridor Transport Observatory: Mombasa, Kenya, 2018. [Google Scholar]
- Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-Generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631. [Google Scholar] [CrossRef]
| Item | Unit | Specification |
|---|---|---|
| Capacity | TEU | 23,964 |
| Length | Meter | 399.9 |
| Width | Meter | 61.0 |
| Depth | Meter | 33.2 |
| Max Speed | Knots | 22.4 |
| Item | Unit | Minimum | Maximum | Average |
|---|---|---|---|---|
| Latitude | Degree | −24.86 | 20.73 | −2.31 |
| Longitude | Degree | 48.72 | 121.19 | 86.12 |
| Speed over ground | Knot | 1.42 | 16.72 | 14.60 |
| Time | Day | 0 | 14.01 | 7 |
| Power | MW | 0.01 | 20.80 | 16.58 |
| Time Index | Latitude | Longitude | Wind u | Wind v |
|---|---|---|---|---|
| 6 May 2023 23:00 | 28 | 121 | −1.12 | −0.92 |
| 6 May 2023 23:00 | 28 | 121.25 | −3.10 | −4.73 |
| 6 May 2023 23:00 | 28 | 121.5 | −3.04 | −8.42 |
| 7 May 2023 00:00 | 28 | 121 | −2.56 | −0.66 |
| 7 May 2023 00:00 | 28 | 121.25 | −4.50 | −4.20 |
| 7 May 2023 00:00 | 28 | 121.5 | −4.88 | −7.62 |
| Time Index | Latitude | Longitude | Wave Direction | Wave Period |
|---|---|---|---|---|
| 6 May 2023 23:00 | 28 | 121 | 71 | 4.85 |
| 6 May 2023 23:00 | 28 | 121.5 | 34 | 4.88 |
| 6 May 2023 23:00 | 28 | 122 | 14 | 5.88 |
| 7 May 2023 00:00 | 28 | 121 | 68 | 4.90 |
| 7 May 2023 00:00 | 28 | 121.5 | 36 | 4.78 |
| 7 May 2023 00:00 | 28 | 122 | 15 | 5.95 |
| Category | Variable | Description | Unit | Type |
|---|---|---|---|---|
| Navigation data | Speed over ground | Vessel’s instantaneous ground speed | knot | Input |
| | Latitude | Geographical position (north–south) | degree | Input |
| | Longitude | Geographical position (east–west) | degree | Input |
| Marine environmental data | Water depth | Bathymetric depth below vessel position | m | Input |
| | Wind speed | Magnitude of wind at vessel location | m/s | Input |
| | Wind direction | Direction of wind vector | degree | Input |
| | Wave height | Significant wave height | m | Input |
| | Wave direction | Propagation direction of waves | degree | Input |
| | Wave period | Peak wave period | s | Input |
| Target variable | Fuel consumption (yt) | Fuel consumption per unit time | kg/h | Output |
| Hyperparameter | Search Range |
|---|---|
| Model Dimension | [48, 128] |
| Head Number | [2, 6] |
| Feedforward Dimension | [64, 192] |
| Dropout Ratio | [0.001, 0.1] |
| Layer Number | 3, 4, 5, 6 |
| Batch Size | 8, 16, 32 |
| Hyperparameter | Optimized Value |
|---|---|
| Model Dimension | 72 |
| Head Number | 2 |
| Feedforward Dimension | 88 |
| Dropout Ratio | 0.05 |
| Layer Number | 5 |
| Batch Size | 16 |
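The reference list cites Optuna for hyperparameter optimization; as a self-contained stand-in, the same search space can be explored with a plain random search. The helper names below (`sample`, `random_search`) and the trial budget are illustrative, not the paper's actual code, and the `evaluate` callback is a hypothetical stand-in for training the Transformer and returning validation RMSE.

```python
import random

# Search space mirroring the hyperparameter search-range table above.
SPACE = {
    "d_model": (48, 128),        # integer range, sampled uniformly
    "n_heads": (2, 6),
    "d_ff": (64, 192),
    "dropout": (0.001, 0.1),     # float range, sampled uniformly
    "n_layers": [3, 4, 5, 6],    # categorical choices
    "batch_size": [8, 16, 32],
}

def sample(rng):
    """Draw one hyperparameter configuration from SPACE."""
    cfg = {}
    for name, spec in SPACE.items():
        if isinstance(spec, tuple):
            lo, hi = spec
            cfg[name] = rng.uniform(lo, hi) if isinstance(lo, float) else rng.randint(lo, hi)
        else:
            cfg[name] = rng.choice(spec)
    return cfg

def random_search(evaluate, n_trials=20, seed=0):
    """Return (best_cfg, best_score), minimizing evaluate(cfg)."""
    rng = random.Random(seed)
    best_cfg, best = None, float("inf")
    for _ in range(n_trials):
        cfg = sample(rng)
        score = evaluate(cfg)            # e.g., validation RMSE of a trained model
        if score < best:
            best_cfg, best = cfg, score
    return best_cfg, best
```

In practice, Optuna's `study.optimize` plays the role of `random_search` here, with smarter samplers and pruning of unpromising trials.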
| Model | R2 Score (%) | RMSE | MAPE (%) |
|---|---|---|---|
| Transformer | 98.96 | 48.83 | 1.28 |
| LSTM | 98.69 | 54.88 | 1.93 |
| LightGBM | 98.60 | 56.75 | 1.72 |
| Study | Task | Baseline | Key Result |
|---|---|---|---|
| Moradi [23] | Weather routing | GC@20 kn | Fuel −6.64% (DDPG); −1.07% (DQN) |
| Lee & Kim [24] | Coastal planning | Passage plan | Dist −1.77% |
| Latinopoulos [25] | Weather routing | Dist. base/Tabu | Fuel up to −12%; DDPG −4% vs. DDQN |
| This study | RTA-constrained | AIS + dwell→RTA | Fuel/CO2 −35.85% avg.; Dist −22.09% avg. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Park, Y.; Kim, S.; Eom, J.; Kim, S. Development of a Reinforcement Learning-Based Ship Voyage Planning Optimization Method Applying Machine Learning-Based Berth Dwell-Time Prediction as a Time Constraint. J. Mar. Sci. Eng. 2026, 14, 43. https://doi.org/10.3390/jmse14010043