Deep Reinforcement Learning-Based Energy Management for Liquid Hydrogen-Fueled Hybrid Electric Ship Propulsion System

This study proposed a deep reinforcement learning-based energy management strategy (DRL-EMS) for a hybrid electric ship propulsion system (HSPS) that integrates a liquid hydrogen (LH2) fuel gas supply system (FGSS), a proton-exchange membrane fuel cell (PEMFC) system, and a lithium-ion battery system. The study analyzed the optimized performance of the DRL-EMS and the resulting operational strategy of the LH2-HSPS. To train the proposed DRL-EMS, a reward function was defined based on fuel consumption and the degradation of the power sources during operation. Fuel consumption for ship propulsion was estimated including the balance of plant (BOP) power of the LH2 FGSS and PEMFC system. DRL-EMS demonstrated superior global and real-time optimality compared with the benchmark algorithms, namely dynamic programming (DP)-based and sequential quadratic programming (SQP)-based EMS. For various operation cases not used in training, DRL-EMS resulted in 0.7% to 9.2% higher operating expenditure (OPEX) than DP-EMS. Additionally, DRL-EMS was trained to operate the PEMFC system within its maximum efficiency range for 60% of the total operation time. Different hydrogen fuel costs did not affect the optimized operational strategy, although the OPEX depended on the hydrogen fuel cost. Different battery system capacities did not considerably change the OPEX.


Introduction
After the International Maritime Organization (IMO) announced its initial strategy for reducing greenhouse gas (GHG) emissions, it has continuously strengthened regulations to reduce GHG emissions from ships [1]. The IMO has set a target to reduce GHG emissions from maritime transport by 50% by 2050 compared with 2008 levels. The European Commission has predicted that, without additional GHG reduction measures, the shipping industry's share of GHG emissions could rise to 17% by 2050 [2]. Furthermore, the IMO applies the Energy Efficiency Design Index (EEDI) to newly constructed ships to promote GHG emission reduction measures at the design stage [3].
One effective way to achieve the IMO's GHG emission reduction goals is to adopt alternative fuels and hybrid propulsion systems. According to E.A. Bouman et al. (2017), alternative fuels have a potential for up to an 80% reduction in CO2 emissions, while hybrid propulsion systems have a reduction potential of over 15% [4]. Among these options, hydrogen is considered a fuel that can ultimately achieve zero GHG emissions and can be used for both coastal and ocean-going ships. Furthermore, with continuous advances in fuel cell and battery technologies and the electrification of ship energy systems, research projects are actively underway to operate hybrid electric propulsion systems (e.g., hydrogen fuel cell + battery) in ships [5].
The ZEMSHIP project aimed to develop and realize the first hydrogen-powered passenger ship with a capacity of over 100 persons. Its electric motor consumes 100 kW of electric power, generated by a proton-exchange membrane fuel cell (PEMFC) and integrated batteries. The first boat developed by this project, FCS Alsterwasser, has been operating on the Alster in Hamburg since 2008 [5]. HySeas III is a project aimed at developing and demonstrating the use of fuel cells to power a Roll-on/Roll-off/Passenger (RoPax) ferry operating in the Orkney Islands, off the coast of Scotland. The ferry uses a hybrid propulsion system consisting of 6 × 100 kW PEMFCs and 768 kWh of batteries, allowing it to run on fuel cells when conditions are optimal and switch to battery power when necessary [6,7]. The FLAGSHIPS project aims to take zero-emission waterborne transport to an entirely new level by deploying two commercially operated hydrogen fuel cell vessels by 2023. The demo vessels include the world's first commercial cargo transport vessel operating on hydrogen, plying the river Seine in Paris [8]. The HFC MARINE project aims to use hydrogen and fuel cells for marine applications. Its first phase is intended to design a solution for demonstration onboard the new modular ferry design by Odense Maritime Technology. The project explored the feasibility of using fuel cells in marine environments, focusing on hydrogen safety and certification, fuel cell cooling, air compression, installation integration, and cost of ownership [5].
The hybrid electric ship propulsion system (HSPS), which combines two or more power sources, offers excellent fuel economy and is an effective solution for reducing GHG emissions. However, controlling multiple power sources for efficient operation is more complex than controlling conventional ship propulsion systems. As a result, research on energy management strategies (EMS) for the effective control of hybrid propulsion systems is actively being conducted across various applications, including vehicles, aircraft, and ships [9][10][11][12][13][14][15][16][17]. S. Antonopoulos et al. (2021) presented an energy management framework for hybrid power plants in ships based on model predictive control (MPC) and evaluated its performance [9]. C. Musardo et al. (2005) proposed an EMS based on the adaptive equivalent consumption minimization strategy (A-ECMS) for hybrid electric vehicles (HEVs) and introduced a method for estimating equivalence factors for driving cycles [12]. G. Du et al. (2020) proposed an energy management algorithm for HEVs using newly introduced reinforcement learning (Dyna-H) and deep reinforcement learning (AMSGrad) algorithms and reported fast training and high optimal control performance [13]. K. Deng et al. (2022) introduced an EMS for hybrid railway vehicles that considers PEMFC degradation and validated its performance using real measured data in a stochastic training environment [16].
Algorithms for the EMS of hybrid power systems can be broadly categorized into rule-based and optimization-based approaches [18]. Rule-based EMS has the advantages of easy real-time control and simple control procedures. However, it relies heavily on the experience of system designers and operators, does not guarantee optimal operating points across various operating profiles, and often requires parameter tuning. In contrast, optimization-based EMS can derive optimal operating strategies for the target system using online or offline optimization algorithms and delivers excellent energy management performance across various operating profiles. Optimization-based EMS employing methods such as dynamic programming (DP), Pontryagin's minimum principle (PMP), or heuristic global optimization algorithms can solve energy management problems optimally, which makes it widely used as a benchmark for analyzing the performance of other algorithms. However, global optimization algorithms, including DP, demand significant computational resources and are difficult to adapt to unknown operating conditions. PMP, in turn, is unsuitable for online optimal control because of the complexity of computing the Hamiltonian function; it is better suited to offline optimal control.
To overcome the limitations of conventional offline optimization, research on optimization-based EMS that can allocate the output of the target system in real time (referred to as online EMS) is actively underway [19]. Online EMS can be implemented using various methodologies, such as model predictive control, reinforcement learning (RL), the equivalent consumption minimization strategy, and stochastic dynamic programming. Among these, RL-based online EMS can achieve performance similar to that of global optimization-based EMS through agent training, has low computational costs when the trained agent is used in actual operation, and can effectively handle high-order models or problems owing to its model-free characteristics. For these reasons, many studies have applied RL-based online EMS to energy management problems in hybrid power systems [11,13,16]. However, despite the strengthening of emission regulations and the consideration of various alternative fuels and power sources in the maritime industry, research on EMS for HSPS remains insufficient.
Meanwhile, most research on online EMS for HSPS thus far has focused on propulsion systems using diesel, LNG, or gaseous hydrogen as the main fuel [9,10,17]. Among these, hydrogen is a promising zero-carbon ship fuel for the future. However, as ship capacity increases or bunkering intervals are extended, the fuel tanks needed to store gaseous hydrogen become very large. In contrast, storing hydrogen as a liquid and vaporizing it for use is expected to reduce fuel tank volume, because liquid hydrogen (LH2) has a higher volumetric energy density than gaseous hydrogen (approximately twice that of 700 bar gaseous hydrogen) [20,21]. Furthermore, because individual high-pressure gaseous hydrogen tanks are no larger than LH2 fuel tanks, the low volumetric energy density of compressed hydrogen means that the number of tanks, valves, and associated equipment must increase significantly.
LH2 is stored at an extremely low saturation temperature of around 20 K at atmospheric pressure. It therefore requires a fuel gas supply system (FGSS) to match the supply conditions of the fuel cells [22], and additional balance of plant (BOP) power must be supplied to the LH2 FGSS and PEMFC systems. In other words, the power sources must provide the BOP power in addition to the power demand. This additional power can be sourced from either the propulsion system or other onboard power plants. Thus, to apply online EMS to an LH2-HSPS, the supply of the BOP power needed to produce the required power should be included in the energy management problem. However, existing EMS proposals for HSPS have considered only cost functions related to the propulsion power demand and the degradation of fuel cells and batteries, without the BOP power of the FGSS and power sources. Therefore, research is needed on EMS for systems that use LH2 as fuel, considering the BOP power of the LH2 FGSS and the PEMFC system.
Therefore, this study proposes an EMS for LH2-HSPS using deep reinforcement learning. An EMS that considers both power demand and BOP power is constructed based on models of the LH2 FGSS, PEMFC, and battery systems that constitute the LH2-HSPS, and its energy management performance is compared with that of conventional optimization algorithms, namely DP and sequential quadratic programming (SQP). Furthermore, the operation strategy optimized by the proposed DRL-EMS is assessed through a sensitivity analysis of key parameters and changes in the operating profiles that affect the EMS. This research contributes an EMS that is applicable to LH2-HSPS and considers the BOP power of the target system, together with an analysis of its performance. It is expected to provide meaningful insights into the energy management of LH2-based hybrid power systems in various industries. The rest of this study is organized as follows: Section 2 describes the models of the LH2 FGSS, PEMFC, and battery systems; Section 3 presents the energy management methodology; Section 4 presents the results and discussion; and Section 5 presents the conclusions of this study.

Model Description
Description of Target Ship
A Platform Supply Vessel (PSV) is a ship designed to support the transportation, installation, operation, and maintenance of offshore installations. PSVs perform various tasks in offshore environments and are equipped with a dynamic positioning system that controls the vessel's position and heading for safe and stable operation. When dynamic positioning is used to control the vessel in real time, the required power of the vessel can vary significantly. In ships such as PSVs, where the power demand varies significantly over time, online EMS outperforms rule-based EMS, which relies on predefined rules for power distribution. Furthermore, integrating into the LH2-HSPS a battery system that can be charged and discharged at desired times allows the PEMFC system to operate more efficiently when the power demand varies significantly [10].
Therefore, a 2 MW-class PSV is selected as the target ship for applying DRL-EMS. The power demand of a PSV depends on its operational mode, which includes laden voyage, dynamic positioning operation, partial load voyage, and standby. While many studies are ongoing to predict the required power under varying environmental conditions, this study assumes a general required power profile of PSVs based on the existing literature [23][24][25]. This power profile is used as the reference profile for DRL, DP, and SQP (Figure 1a). Additionally, to assess the online performance of DRL-EMS on unknown power profiles not used during training, three additional power profiles are considered, as shown in Figure 1b-d.

Liquid Hydrogen Fuel Gas Supply System
The LH2 FGSS vaporizes the stored LH2 and supplies fuel that meets the pressure and temperature conditions required by the PEMFC system. The system consists of a fuel tank for storing LH2, a pump for transferring LH2, an ethylene glycol/water (GW) mixture system for supplying thermal energy, valves, and controllers. When the fuel tank volume is small or transient states in the FGSS are infrequent, a pressure build-up unit, which pressurizes the tank to a set pressure using an external heat source, can be installed instead of the LH2 pump; this reduces the risk of hydrogen leaks and removes the need for redundant units [22]. However, since the outputs of the LH2 FGSS and PEMFC system of the PSV fluctuate significantly, a pump-type FGSS is assumed to ensure a stable fuel supply.
In this study, the LH2 FGSS is simulated using Aspen HYSYS to calculate the changes in hydrogen fuel flow rate, pressure, and temperature caused by fluctuations in the output of the PEMFC system during operation. Figure 2 shows the LH2 FGSS implemented in Aspen HYSYS, and Table 1 lists the key design specifications of each piece of equipment.

The modified Benedict-Webb-Rubin (MBWR) and Peng-Robinson equations of state are used for the hydrogen and GW streams, respectively, and the hydrogen stream is assumed to consist of 99.8% para-hydrogen and 0.2% ortho-hydrogen on a mole basis. Key equipment for the dynamic simulation is sized for the maximum H2 flow rate at the maximum output of the PEMFC system. In particular, the volume of the LH2 fuel tank is set to 100 m3, based on the fuel quantity required when the PEMFC system, the main power source, generates all required power during operation, in accordance with the IGF Code [26]. The LH2 vaporizer is simulated as a shell-and-tube heat exchanger. The governing equations for each piece of equipment are given in Equations (1)-(8) [27,28].
Fuel tank: $E_{f,tank}$: internal energy of the fluid stored in the tank; $h_{f,in,i}$: enthalpy of fluid $i$ entering the tank; $h_{f,out,j}$: enthalpy of fluid $j$ exiting the tank
Heat exchanger: $Q_{HX}$: heat flow rate between the shell and tube sides; $U$: overall heat transfer coefficient; $A$: heat transfer area; $\Delta T_{LM}$: logarithmic mean temperature difference
Heater: $P_{heater}$: power consumption of the heater; $\eta_{heater}$: efficiency of the heater
$k$: pressure drop coefficient
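The vaporizer and heater quantities in the nomenclature above can be illustrated with a short calculation. The sketch below assumes the standard shell-and-tube relation $Q_{HX} = U A \Delta T_{LM}$ and, purely for illustration, that the electric heater must resupply a given heat duty at efficiency $\eta_{heater}$ (under this assumption, that heater power would count toward the FGSS BOP power). The function names are hypothetical, and no values from Table 1 are used.

```python
import math


def lmtd(dt_a_k, dt_b_k):
    """Logarithmic mean temperature difference between the two ends of an exchanger (K)."""
    if dt_a_k <= 0 or dt_b_k <= 0:
        raise ValueError("terminal temperature differences must be positive")
    if math.isclose(dt_a_k, dt_b_k):
        return dt_a_k  # limit of the LMTD when both ends have the same difference
    return (dt_a_k - dt_b_k) / math.log(dt_a_k / dt_b_k)


def vaporizer_duty_w(u_w_per_m2k, area_m2, dt_a_k, dt_b_k):
    """Heat flow between shell and tube sides: Q_HX = U * A * dT_LM (W)."""
    return u_w_per_m2k * area_m2 * lmtd(dt_a_k, dt_b_k)


def heater_power_w(heat_duty_w, eta_heater):
    """Electric power needed to deliver heat_duty_w at heater efficiency eta_heater (W)."""
    return heat_duty_w / eta_heater
```

For example, a hypothetical counter-current case with GW cooling from 313 K to 293 K and hydrogen warming from 25 K to 290 K would use terminal differences of 23 K and 268 K in `vaporizer_duty_w`.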

Description of Power Sources
The PEMFC stack is modeled electrochemically to simulate its voltage, current, and temperature, and the model also calculates the additional power required for the BOP. After individual cells are modeled using Equations (9)-(13) [29,30], the cell models are connected to simulate the 2 MW-class PEMFC system, a schematic diagram of which is shown in Figure 3. The current flowing through the PEMFC stack is calculated from the supplied hydrogen flow rate and is used to determine the stack voltage and, thus, the system output. The BOP power consumption is determined by the power consumption of the air compressor, H2 compressor, GW radiator, and air fan. All calculations are performed using Simulink/Simscape and Aspen HYSYS, and the developed PEMFC stack model was validated against the polarization curve of NEDSTACK's FCS 10-XXL product [31]. Note that the output of a PEMFC system built from multiple stacks and its BOP power can vary slightly depending on the system configuration and the detailed specifications of each piece of equipment. In this study, the flow rate, pressure, and temperature of the hydrogen supplied to the multiple stacks were assumed to be identical, and the BOP power was calculated for the entire PEMFC system.
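The current-from-flow step described above follows Faraday's law, and the nomenclature of Equations (9)-(13) indicates that the cell voltage is built from the Nernst voltage minus activation, ohmic, and concentration losses. The sketch below is a minimal, hypothetical version of that chain. It assumes 100% hydrogen utilization, a Tafel-type activation term $\theta_{act}\ln(j_{cell}/j_0)$, an ohmic term $i_{cell}R_{ohm}$, and a logarithmic concentration term with an assumed limiting current density; all parameter values are placeholders, not the calibrated FCS 10-XXL values.

```python
import math

FARADAY = 96485.0  # Faraday constant, C/mol
M_H2 = 2.016e-3    # molar mass of H2, kg/mol


def stack_current_a(m_dot_h2_kg_s, n_cells):
    """Faraday's law with 100 % utilization (assumed): the stack consumes
    n_cells * I / (2F) mol/s of H2, so I = 2F * n_dot / n_cells."""
    n_dot_h2 = m_dot_h2_kg_s / M_H2
    return 2.0 * FARADAY * n_dot_h2 / n_cells


def cell_voltage_v(i_cell_a, area_cm2, v_nernst=1.20, theta_act=0.03, j0=1e-4,
                   r_ohm=3.6e-4, b_conc=0.05, j_lim=1.5):
    """V_cell = V_nernst - V_act - V_ohm - V_conc with placeholder parameters."""
    j_cell = i_cell_a / area_cm2  # current density, A/cm2
    if j_cell >= j_lim:
        raise ValueError("current density at or above the assumed limiting value")
    v_act = theta_act * math.log(max(j_cell, j0) / j0)  # zero below the reference density
    v_ohm = i_cell_a * r_ohm
    v_conc = -b_conc * math.log(1.0 - j_cell / j_lim)
    return v_nernst - v_act - v_ohm - v_conc


def stack_power_w(m_dot_h2_kg_s, n_cells, area_cm2):
    """Gross stack power (before BOP) for a given hydrogen supply."""
    i = stack_current_a(m_dot_h2_kg_s, n_cells)
    return n_cells * cell_voltage_v(i, area_cm2) * i
```

In the full model, the BOP consumers listed above would then be subtracted from this gross stack power.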
$V_{cell}$: cell voltage; $V_{nernst}$: Nernst voltage; $V_{act}$: activation loss; $V_{ohm}$: ohmic loss; $V_{conc}$: concentration loss; $\theta_{act}$: coefficient of activation loss; $j_{cell}$: current density; $j_0$: reference current density; $i_{cell}$: current; $R_{ohm}$: electric resistance
PEMFC stacks installed in ships or other mobility applications have a relatively short lifetime compared with stationary applications. Rapid output changes and sustained operation at very high or very low output accelerate degradation, increasing replacement costs over the system's lifespan. Therefore, the decreasing lifespan of PEMFC stacks during operation must be considered in energy management problems. P. Pei et al. (2008) experimentally investigated the effects of load-changing cycles, start/stop cycles, idling time, and high-power load conditions on the lifespan of automotive PEMFCs and proposed a degradation model based on simple arithmetic equations [32]. Y. Liu et al. (2020) examined a rule-learning-based EMS for fuel cell hybrid vehicles using this degradation model and reported an effective reduction in hydrogen consumption and an extended PEMFC stack lifespan [14]. Similarly, in this study, this degradation model is used to calculate the equivalent degradation cost of the PEMFC system with Equation (14) and the parameters in Table 2.
The charging and discharging of the battery system are simulated by connecting cell models based on the equivalent circuit model (ECM) in series and parallel. As with the PEMFC system, a heat management system using GW as the thermal medium is modeled using Simulink/Simscape. Water cooling is a heat management method commonly used in batteries to dissipate the heat generated during charging and discharging. In this method, a liquid coolant, typically water or GW, is circulated through a series of channels and tubes embedded within the battery pack or attached to its exterior surface. Once heated by the batteries, the coolant is circulated to a radiator, where it is cooled by air or another coolant. The cooled coolant then returns to the battery, where it absorbs heat, and the cycle repeats. Water-cooling systems offer several advantages over other heat management systems, such as air cooling or passive cooling. They dissipate heat more efficiently, which allows the battery to operate at higher power levels for longer periods. Additionally, water-cooling systems can be designed to be more compact and lightweight than other types, which is particularly important in applications where space and weight are limited, such as ship propulsion.
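To show how a Pei-type degradation model, as described earlier in this section, can be turned into a per-step cost for the EMS, the sketch below adds three contributions for one time step: load changes, idling, and high-power operation. It is a hypothetical illustration, not Equation (14): load changes are counted as $|\Delta P|/P_{max}$ equivalent cycles, start/stop cycles are ignored, the 40 kW and 1800 kW thresholds are borrowed from the low-load and high-load limits cited in Section 4, and the rates, acceleration coefficient, end-of-life threshold, and replacement cost are parameters the caller must supply (Table 2 holds the study's values).

```python
def fc_degradation_pct(p_prev_kw, p_kw, dt_h, *, rate_load_change, rate_idle,
                       rate_high, k_p=1.0, p_max_kw=2000.0,
                       idle_kw=40.0, high_kw=1800.0):
    """Voltage-degradation increment (%) of the PEMFC system over one step.

    rate_load_change: % per equivalent full load-change cycle (|dP| / P_max)
    rate_idle, rate_high: % per hour spent below idle_kw / above high_kw
    k_p: acceleration coefficient applied to the sum
    """
    load_change = rate_load_change * abs(p_kw - p_prev_kw) / p_max_kw
    idle = rate_idle * dt_h if p_kw < idle_kw else 0.0
    high = rate_high * dt_h if p_kw > high_kw else 0.0
    return k_p * (load_change + idle + high)


def fc_degradation_cost(deg_pct, replacement_cost, eol_pct):
    """Equivalent cost: share of the end-of-life degradation budget used, times replacement cost."""
    return deg_pct / eol_pct * replacement_cost
```

The threshold tests make the per-step cost jump when the output crosses 40 kW or 1800 kW, which is the kind of discontinuity that gradient-based optimizers handle poorly.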
The ECM used is a four-parameter model consisting of one voltage source, two resistors, and one capacitor. Each parameter is expressed as a two-dimensional lookup table of the state of charge (SOC) and temperature of the battery cell, based on the results of Huria et al. (2012) [33]. The heat management system of the battery system, which uses GW as the thermal medium, was modeled using Equations (15)-(17).
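The four-parameter ECM described above (a voltage source, a series resistor, and one resistor-capacitor pair) with SOC- and temperature-dependent parameters can be sketched as follows. This is a hypothetical illustration: parameters are read from 2-D tables by bilinear interpolation clamped at the table edges, the RC branch is advanced with the exact solution for a current held constant over the step, SOC is updated by coulomb counting, and any table values used with it are placeholders rather than the Huria et al. data.

```python
import math
from bisect import bisect_right


def _bracket(grid, v):
    """Return (lower index, fractional position) of v in an ascending grid, clamped to its ends."""
    v = min(max(v, grid[0]), grid[-1])
    k = min(max(bisect_right(grid, v) - 1, 0), len(grid) - 2)
    return k, (v - grid[k]) / (grid[k + 1] - grid[k])


def lookup2d(soc_grid, temp_grid, table, soc, temp_k):
    """Bilinear interpolation of table[i][j] defined at (soc_grid[i], temp_grid[j])."""
    i, tx = _bracket(soc_grid, soc)
    j, ty = _bracket(temp_grid, temp_k)
    return ((1 - tx) * (1 - ty) * table[i][j] + tx * (1 - ty) * table[i + 1][j]
            + (1 - tx) * ty * table[i][j + 1] + tx * ty * table[i + 1][j + 1])


def ecm_step(soc, v_rc, current_a, dt_s, capacity_ah, ocv, r0, r1, c1):
    """Advance the R0 + (R1 || C1) cell model by dt_s; positive current = discharge.

    Returns (soc, v_rc, terminal voltage) at the end of the step, using the
    open-circuit voltage ocv supplied for this step.
    """
    decay = math.exp(-dt_s / (r1 * c1))
    v_rc = v_rc * decay + r1 * (1.0 - decay) * current_a  # exact for constant current
    soc = soc - current_a * dt_s / (3600.0 * capacity_ah)  # coulomb counting
    v_term = ocv - current_a * r0 - v_rc
    return soc, v_rc, v_term
```

In use, `ocv`, `r0`, `r1`, and `c1` would each be obtained from `lookup2d` at the current SOC and cell temperature before every call to `ecm_step`.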
$f_{avg}$: friction factor at conditions averaged between the inlet and outlet; $Re_{avg}$: Reynolds number at conditions averaged between the inlet and outlet; $Pr_{avg}$: Prandtl number at conditions averaged between the inlet and outlet
$$f = \left[-1.8\log_{10}\left(\frac{6.9}{Re} + \ldots\right)\right]^{-2}$$
Lithium iron phosphate (LiFePO4, LFP)-based battery cells have the disadvantage of relatively low gravimetric energy density. However, they are relatively inexpensive because they do not use expensive materials such as cobalt and nickel. They also have a long lifespan when the maximum C-rate is not high. Moreover, they are widely used in large-scale applications such as ships and the space industry because of their low risk of explosion or fire [34,35]. Therefore, in this study, the target battery system is assumed to use LFP-based cells.
J. Wang et al. (2011) experimentally investigated capacity fade in graphite-LFP cells by varying cell temperature, depth of discharge, and C-rate. They found that at low C-rates, capacity fade was significantly affected by time and temperature, whereas at high C-rates, the effect of the C-rate became more pronounced. Based on the experimental results, they generalized a power-law equation for capacity fade [35]. As with the PEMFC system, the following degradation model based on these results is used to account for the degradation of the battery cells in the energy management problem, as shown in Equation (18) with the parameters in Table 3.
$A_h$: Ah-throughput; $E_{loss,bat}$: capacity loss; $C_{rate}$: C-rate
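The power-law capacity fade relation of Wang et al. can be turned into a per-step battery cost in several ways. The sketch below assumes the generalized form $E_{loss,bat} = B\exp\left(-\frac{E_a - \alpha C_{rate}}{RT}\right)A_h^{z}$ with $E_a = 31{,}700$ J/mol, $\alpha = 370.3$, and $z = 0.55$, as commonly quoted from that work. It leaves the pre-exponential factor $B$ to the caller and charges each step with the share of the Ah-throughput-to-end-of-life that the step consumes. The 20% end-of-life threshold, the function names, and the costing rule are assumptions; Equation (18) and Table 3 define the model actually used in this study.

```python
import math

R_GAS = 8.314  # universal gas constant, J/(mol K)


def capacity_loss_pct(ah_throughput, c_rate, temp_k, b, ea=31700.0, alpha=370.3, z=0.55):
    """Capacity loss (%) after ah_throughput Ah per cell at a given C-rate and temperature."""
    return b * math.exp(-(ea - alpha * c_rate) / (R_GAS * temp_k)) * ah_throughput ** z


def ah_to_eol(c_rate, temp_k, b, eol_loss_pct=20.0, ea=31700.0, alpha=370.3, z=0.55):
    """Ah-throughput at which the capacity loss reaches eol_loss_pct (inverse of the power law)."""
    pre = b * math.exp(-(ea - alpha * c_rate) / (R_GAS * temp_k))
    return (eol_loss_pct / pre) ** (1.0 / z)


def bat_degradation_cost(cell_current_a, dt_h, c_rate, temp_k, b, replacement_cost,
                         eol_loss_pct=20.0):
    """Equivalent battery degradation cost of one step (hypothetical costing rule)."""
    ah_step = abs(cell_current_a) * dt_h
    return ah_step / ah_to_eol(c_rate, temp_k, b, eol_loss_pct) * replacement_cost
```

Because the exponent grows with the C-rate, the same Ah-throughput costs more at high C-rates, which pushes an EMS trained on this cost toward gentler battery use.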

System Efficiency
Using the models of the LH2 FGSS and PEMFC system, the system efficiency of the target system is estimated as a function of the PEMFC system output using Equation (19), which accounts for hydrogen consumption including the BOP power. The calculated system efficiency is then used to estimate hydrogen consumption in the energy management problem. Because the BOP power of the battery system is small compared with that of the LH2 FGSS and PEMFC system, it is assumed to be supplied by the battery system itself.
$$\eta_{system}(P_{FC}) = \frac{P_{FC,total}(P_{FC}) - P_{FGSS,BOP}(P_{FC}) - P_{FC,BOP}(P_{FC})}{\dot{m}_{H_2}(P_{FC})\, LHV_{H_2}} \tag{19}$$
$\eta_{system}$: system efficiency; $P_{FC}$: output power of the PEMFC system excluding BOP power; $P_{FC,total}$: gross output power of the PEMFC system; $P_{FGSS,BOP}$: BOP power of the LH2 FGSS; $P_{FC,BOP}$: BOP power of the PEMFC system; $\dot{m}_{H_2}$: required hydrogen mass flow rate; $LHV_{H_2}$: lower heating value of hydrogen
Figure 4 depicts the system efficiency of the LH2-HSPS calculated with Equation (19) and the required hydrogen mass flow rate as a function of the output fraction relative to the maximum PEMFC system output of 2 MW. The maximum efficiency of the LH2-HSPS is approximately 59%, occurring at output fractions of 10% to 20%. Additionally, BOP power reduces the system efficiency of the LH2-HSPS by approximately 7%, which results in a difference of 17.5 kg/h in the required hydrogen mass flow rate at maximum output power.
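Equation (19) and the hydrogen flow it implies can be evaluated directly. The sketch below assumes that the efficiency is referred to the hydrogen lower heating value, taken here as approximately 120 MJ/kg; the function names are hypothetical, and in practice the BOP powers would come from the FGSS and PEMFC models rather than being entered by hand.

```python
LHV_H2_J_PER_KG = 120.0e6  # approximate lower heating value of hydrogen (assumed)


def system_efficiency(p_fc_total_w, p_fgss_bop_w, p_fc_bop_w, m_dot_h2_kg_s):
    """Net output after FGSS and PEMFC BOP, divided by the hydrogen LHV input."""
    return (p_fc_total_w - p_fgss_bop_w - p_fc_bop_w) / (m_dot_h2_kg_s * LHV_H2_J_PER_KG)


def h2_flow_kg_per_h(p_fc_net_w, eta_system):
    """Hydrogen mass flow (kg/h) needed to deliver p_fc_net_w at system efficiency eta_system."""
    return p_fc_net_w / (eta_system * LHV_H2_J_PER_KG) * 3600.0
```

With these definitions, a lower system efficiency at the same net output translates directly into a higher hydrogen flow, which is how the BOP penalty appears as a difference in required hydrogen mass flow rate.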

Methodology of Energy Management
The energy management problem of the LH2-HSPS addressed in this study takes the output of the PEMFC system as the control variable. As mentioned earlier, all BOP power for the LH2 FGSS and the PEMFC system is assumed to be generated by the PEMFC system. The reward function (for DRL-EMS) or objective function (for DP-EMS and SQP-EMS) of this problem considers the operating expenditure (OPEX), consisting of hydrogen consumption and the degradation of the PEMFC and battery systems. Constraints are imposed on the state of charge (SOC) of the battery system and the power demand of the PSV. The problem is summarized in Equations (20)-(28), and the detailed parameters for solving it are listed in Table 4.
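Because Equations (20)-(28) are not reproduced here, the following is only a hypothetical sketch of how such a problem is commonly posed as one step of a reinforcement-learning environment. The PEMFC output is the action, the battery covers the remaining demand (power balance), SOC is updated assuming a lossless battery, the reward is the negative step OPEX (hydrogen cost plus PEMFC and battery degradation costs), and SOC-limit violations are penalized. The SOC bounds, penalty weight, and all names are assumptions; Table 4 holds the study's parameters.

```python
def ems_step(soc, p_fc_kw, p_demand_kw, dt_h, bat_capacity_kwh, h2_kg_per_h,
             h2_price_per_kg, fc_deg_cost, bat_deg_cost,
             soc_min=0.2, soc_max=0.8, soc_penalty=1000.0):
    """One transition of a hypothetical LH2-HSPS environment.

    h2_kg_per_h: callable mapping net PEMFC output (kW) to hydrogen flow (kg/h),
    e.g. built from the system-efficiency curve.
    Returns (next SOC, reward, step OPEX).
    """
    p_bat_kw = p_demand_kw - p_fc_kw                     # power balance; + = discharge
    soc_next = soc - p_bat_kw * dt_h / bat_capacity_kwh  # lossless battery (assumed)
    opex = h2_kg_per_h(p_fc_kw) * dt_h * h2_price_per_kg + fc_deg_cost + bat_deg_cost
    violation = max(0.0, soc_min - soc_next) + max(0.0, soc_next - soc_max)
    reward = -(opex + soc_penalty * violation)
    return soc_next, reward, opex
```

Returning the OPEX separately from the reward keeps the economic result comparable across DRL-EMS, DP-EMS, and SQP-EMS, since the penalty term only shapes training.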

Deep Reinforcement Learning
The deep Q-network (DQN) algorithm used in this study is based on Q-learning, which is widely used in reinforcement learning. DQN effectively trains agents with high-dimensional or large state and action spaces by using a neural network to approximate the Q-value of each state-action pair given by the Q-function, typically defined as in Equation (29) [39]. The Q-value computed by the Q-function represents the expected return (i.e., cumulative discounted reward) obtained by taking action a in state s. In Q-learning, training occurs through interaction with the environment, and the Q-values of all actions in all states are continuously updated. Once training is complete, the agent can choose the optimal action in each state.
$$Q_{\pi}(s,a) = \mathbb{E}_{\pi}\left[G_t \mid S_t = s, A_t = a\right], \qquad G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \tag{29}$$
$Q_{\pi}$: state-action value function under policy $\pi$; $G_t$: return after time $t$; $S_t$: state at time $t$; $A_t$: action at time $t$; $R_{t+1}$: reward at time $t+1$; $\gamma$: discount factor
One feature of the DQN algorithm is the use of separate prediction and target networks. During training, the prediction network is updated continuously, whereas the target network is updated less frequently. At each training step, the target network provides the target Q-values for the loss function defined in Equation (30). By providing stable target Q-values, the target network also mitigates the overestimation of Q-values approximated by the neural network. During each episode, the neural network is trained on samples drawn randomly from the experience pool: the gradient of the loss function is computed, and the parameters of the prediction network are updated by gradient descent.
$$L(\theta) = \mathbb{E}\left[\left(R_{t+1} + \gamma \max_{a'} Q(S_{t+1}, a'; \theta^{-}) - Q(S_t, A_t; \theta)\right)^{2}\right] \tag{30}$$
where $\theta$ and $\theta^{-}$ are the parameters of the prediction and target networks, respectively. The loss function measures how accurately the current prediction network approximates the target action value. Training proceeds by continuously updating the prediction network and periodically updating the target network. Figure 5 presents an overview of the DQN algorithm, and the detailed hyperparameters are listed in Table 5.
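The DQN elements described above (an experience pool, separate prediction and target parameters, the TD target used in the loss of Equation (30), and periodic target updates) are sketched below. To keep the example dependency-free and short, a linear Q-function stands in for the neural network, so this illustrates the update structure rather than a full deep network; the class and function names are hypothetical, and Table 5 holds the study's actual hyperparameters.

```python
import random
from collections import deque


class LinearQ:
    """Q(s, a) = w[a] . phi(s): a linear stand-in for the DQN neural network."""

    def __init__(self, n_features, n_actions):
        self.w = [[0.0] * n_features for _ in range(n_actions)]

    def q_values(self, phi):
        return [sum(wi * xi for wi, xi in zip(w_a, phi)) for w_a in self.w]

    def copy_from(self, other):
        """Hard target update: copy the prediction parameters into this network."""
        self.w = [row[:] for row in other.w]


class ReplayBuffer:
    """Experience pool of (phi, action, reward, phi_next, done) transitions."""

    def __init__(self, capacity, seed=0):
        self.data = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, transition):
        self.data.append(transition)

    def sample(self, batch_size):
        return self.rng.sample(list(self.data), min(batch_size, len(self.data)))


def epsilon_greedy(q, phi, epsilon, rng):
    """Random action with probability epsilon, otherwise the greedy action."""
    qs = q.q_values(phi)
    if rng.random() < epsilon:
        return rng.randrange(len(qs))
    return max(range(len(qs)), key=lambda a: qs[a])


def dqn_update(pred, target, batch, gamma, lr):
    """One gradient-descent step on the mean squared TD error.

    The TD target r + gamma * max_a' Q_target(s', a') uses the target network;
    gradients are accumulated over the minibatch before the weights change.
    """
    grads = [[0.0] * len(row) for row in pred.w]
    loss = 0.0
    for phi, a, r, phi_next, done in batch:
        y = r if done else r + gamma * max(target.q_values(phi_next))
        err = pred.q_values(phi)[a] - y
        loss += err ** 2
        for i, x in enumerate(phi):
            grads[a][i] += 2.0 * err * x / len(batch)
    for a, row in enumerate(grads):
        for i, g in enumerate(row):
            pred.w[a][i] -= lr * g
    return loss / len(batch)

# Typical loop (environment not shown): a = epsilon_greedy(pred, phi, eps, rng);
# apply a; buffer.push((phi, a, r, phi_next, done));
# dqn_update(pred, target, buffer.sample(batch_size), gamma, lr);
# and every C steps call target.copy_from(pred).
```

Keeping the target parameters frozen between syncs is what makes the regression target in `dqn_update` stationary for a while, which is the stabilizing role the text attributes to the target network.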

Benchmark Algorithms
To verify the optimality of the PEMFC system output determined by DRL-EMS and of the resulting total OPEX, the results are compared with those of the DP algorithm applied to the same energy management problem. DP is widely used for multistage optimal control problems, including energy management in hybrid propulsion systems. The dynamic model considered in this study evolves over time; following the principle of optimality, the DP algorithm computes the optimal cost-to-go function at all time and state nodes through a backward calculation and then obtains the optimal control results through a forward calculation [40].
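The backward and forward calculations of DP can be sketched on a toy version of the power-split problem. This is a minimal sketch under stated assumptions, not the study's DP-EMS: the battery is treated as a lossless energy store on a 100 kWh grid, time steps are 1 h, PEMFC power is chosen from discrete 100 kW levels, and the stage cost is a hypothetical convex hydrogen-cost function; degradation terms and the parameters of Table 4 are not modeled. The names `dp_ems` and `h2_cost` are illustrative.

```python
from math import inf


def dp_ems(demand, fc_levels, e_levels, e_init, stage_cost):
    """Backward DP over a battery-energy grid, then a forward rollout.

    demand [kW], fc_levels [kW] and e_levels [kWh] are integers on a common
    100-unit grid with 1-h steps, so every transition lands on a grid node.
    """
    T = len(demand)
    idx = {e: i for i, e in enumerate(e_levels)}
    cost_to_go = [[inf] * len(e_levels) for _ in range(T)]
    # Terminal condition (charge-sustaining): end with at least the initial energy.
    cost_to_go.append([0.0 if e >= e_init else inf for e in e_levels])
    best_action = [[None] * len(e_levels) for _ in range(T)]

    # Backward calculation of the optimal cost-to-go at every (time, state) node.
    for t in range(T - 1, -1, -1):
        for i, e in enumerate(e_levels):
            for p_fc in fc_levels:
                j = idx.get(e + p_fc - demand[t])  # battery covers the remainder
                if j is None:  # would leave the allowed energy window
                    continue
                cost = stage_cost(p_fc) + cost_to_go[t + 1][j]
                if cost < cost_to_go[t][i]:
                    cost_to_go[t][i], best_action[t][i] = cost, p_fc

    # Forward calculation: follow the stored optimal decisions from e_init.
    e, plan = e_init, []
    for t in range(T):
        p_fc = best_action[t][idx[e]]
        plan.append(p_fc)
        e += p_fc - demand[t]
    return plan, cost_to_go[0][idx[e_init]]


def h2_cost(p_fc):
    """Hypothetical convex cost: efficiency falls as load rises."""
    return 0.06 * p_fc + 1e-4 * p_fc ** 2


demand = [200, 600, 400, 200]          # kW, hypothetical profile
fc_levels = list(range(0, 700, 100))   # 0 ... 600 kW
e_levels = list(range(0, 500, 100))    # 0 ... 400 kWh
plan, total_cost = dp_ems(demand, fc_levels, e_levels, 200, h2_cost)
```

Because the toy cost is convex, DP levels the PEMFC load (a mix of 300 kW and 400 kW steps) and lets the battery absorb the peaks. The three nested loops also show why the cost grows with the product of time steps, state nodes, and control levels.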
As mentioned in Section 1, the DP algorithm is well suited to global optimization, but its computational complexity grows rapidly with the number of state and control variables, which makes it unsuitable for online EMS applications. Therefore, to assess the online energy management performance of DRL-EMS, an online EMS based on the SQP algorithm and the equivalent consumption minimization strategy (ECMS), referred to as SQP-EMS, is additionally developed [41,42].
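The instantaneous optimization that an ECMS-based controller solves at each time step can be sketched as follows. This sketch assumes a fixed equivalence factor `s`, a hypothetical linear decline of PEMFC system efficiency with load, and a 2000 kW PEMFC rating. It replaces the SQP solver with a golden-section search over the single decision variable, which is adequate here only because the toy cost is convex in the PEMFC power. The SQP-EMS formulation of [41,42] may differ; a constrained multivariable version could instead use an SQP-type solver such as SciPy's `scipy.optimize.minimize` with `method="SLSQP"`.

```python
import math

LHV_H2_KWH_PER_KG = 33.33  # lower heating value of hydrogen, about 120 MJ/kg
P_FC_MAX_KW = 2000.0       # assumed PEMFC system rating (hypothetical)


def fc_system_efficiency(p_fc):
    """Hypothetical efficiency curve: 55% at zero load, falling linearly with load."""
    return 0.55 - 0.15 * p_fc / P_FC_MAX_KW


def equivalent_h2_rate(p_fc, p_dem, s):
    """ECMS cost [kg/h]: PEMFC hydrogen use plus s-weighted battery energy."""
    h2_fc = p_fc / (fc_system_efficiency(p_fc) * LHV_H2_KWH_PER_KG)
    h2_batt_equiv = s * (p_dem - p_fc) / LHV_H2_KWH_PER_KG  # negative when charging
    return h2_fc + h2_batt_equiv


def golden_section_min(f, lo, hi, tol=1e-6):
    """Minimise a unimodal function on [lo, hi] (stand-in for the SQP solver)."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    while b - a > tol:
        if f(c) < f(d):   # minimum lies in [a, d]
            b, d = d, c
            c = b - inv_phi * (b - a)
        else:             # minimum lies in [c, b]
            a, c = c, d
            d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


def ecms_split(p_dem, s):
    """Return (PEMFC power, battery power) minimising the equivalent H2 rate."""
    p_fc = golden_section_min(lambda p: equivalent_h2_rate(p, p_dem, s), 0.0, P_FC_MAX_KW)
    return p_fc, p_dem - p_fc


p_fc, p_batt = ecms_split(p_dem=420.0, s=2.0)
```

With these assumptions the optimum has a closed form, p_fc = (0.55 - sqrt(0.55/s)) / (0.15/2000), about 341 kW for s = 2, with the battery covering the rest of the 420 kW demand. A larger `s` makes battery energy "more expensive" and shifts load onto the PEMFC; because the split depends only on the current demand, such a controller cannot anticipate future power requirements.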

Results and Discussion
Before analyzing the optimal operational strategy applied to LH2-HSPS by DRL-EMS, the optimization results of DRL-EMS are compared with those of DP-EMS and SQP-EMS to evaluate the performance of each algorithm, as shown in Table 6. DRL-EMS and SQP-EMS resulted in 0.5% and 17.5% higher OPEX than DP-EMS, respectively. The performance gap between these two algorithms was attributed mainly to the equivalent degradation cost of the PEMFC system. The degradation rate calculated by the model exhibits discontinuities at low-load (<40 kW) and high-load (>1800 kW) operation, which the gradient-based SQP-EMS could not adequately account for. DRL-EMS, in contrast, yielded OPEX nearly identical to that of DP-EMS; the slightly lower OPEX of DP-EMS is attributed to its more effective utilization of the battery system. Figure 6 shows the PEMFC system output and SOC profiles calculated with each EMS. Consistent with the OPEX comparison, the SOC profiles show that DP-EMS utilizes the battery system most effectively, whereas SQP-EMS appears to underutilize the installed battery system when future power demand is uncertain. Figure 7 shows a histogram and the cumulative percentage of the PEMFC system power output, counted in 30 min intervals, obtained with DRL-EMS for the reference operating profile. Because the LHV-based efficiency of LH2-HSPS decreases as the PEMFC system output increases, the optimized PEMFC output stays below the average required power (~420 kW) for about 60% of the operating time. Moreover, the system efficiency curve (Figure 4) shows that the system efficiency is highest when the PEMFC system output is 20% or less of its maximum value, and the average required power corresponds to 21% of that maximum. In other words, DRL-EMS has learned to operate the PEMFC system within its maximum-efficiency range as much as possible.
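The statistics read from Figure 7 (a histogram, its cumulative percentage, and the share of time below the ~420 kW average demand) can be computed from a sampled power series as in the sketch below. The ten 30 min samples are hypothetical values, not data from the study, and the 2000 kW maximum PEMFC output is an assumption inferred from the statement that ~420 kW corresponds to 21% of the maximum.

```python
from collections import Counter


def histogram_with_cumulative(power_kw, bin_kw):
    """Return (bin start, count, cumulative %) for each non-empty bin."""
    counts = Counter(int(p // bin_kw) * bin_kw for p in power_kw)
    rows, running = [], 0
    for start in sorted(counts):
        running += counts[start]
        rows.append((start, counts[start], 100.0 * running / len(power_kw)))
    return rows


def share_below(power_kw, threshold_kw):
    """Fraction of samples strictly below a power threshold."""
    return sum(p < threshold_kw for p in power_kw) / len(power_kw)


# Hypothetical PEMFC output, one value per 30 min interval.
pemfc_kw = [300, 350, 380, 400, 410, 450, 600, 900, 390, 360]
rows = histogram_with_cumulative(pemfc_kw, bin_kw=100)
below_avg = share_below(pemfc_kw, threshold_kw=420)
avg_fraction_of_max = 420 / 2000  # assumed 2000 kW PEMFC maximum
```

For these made-up samples, 70% of the intervals lie below 420 kW; the same two functions applied to the DRL-EMS output series would yield the ~60% share reported above.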
The preceding results confirm that approximately 90% of OPEX is incurred by hydrogen fuel consumption, so reducing hydrogen consumption is essential for the efficient operation of LH2-HSPS. Figure 8 shows the cumulative hydrogen consumption when the same DRL-EMS distributes power based on the PEMFC stack efficiency instead of the LH2-HSPS system efficiency. With BOP power considered, DRL-EMS calculated a total fuel consumption of 4603 kg; without BOP power, a total of 4074 kg of fuel was consumed, approximately 11.5% less than in the system efficiency-based calculation. Because ships have limited space for equipment relative to their capacity, the appropriate size of each piece of equipment must be determined in the design phase. When LH2 is used as fuel without a separate power plant to supply the BOP power, the PEMFC system must supply the BOP power in addition to the propulsion power. In this case, the difference of about 11.5% in fuel consumption per operation directly affects the required volume of the fuel tank. Therefore, the fuel tank volume of future LH2-powered ships should be determined based on a thorough review of the LH2-HSPS system efficiency.
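The 11.5% figure and its consequence for tank sizing can be checked with simple arithmetic on the two totals reported for Figure 8. The liquid hydrogen density of about 70.8 kg/m³ (saturated liquid at its normal boiling point) is a property value added here for illustration; the resulting volume covers only the extra liquid, without the ullage, heel, or boil-off margins a real tank design would add.

```python
H2_WITH_BOP_KG = 4603.0     # DRL-EMS, LH2-HSPS system-efficiency basis (Figure 8)
H2_WITHOUT_BOP_KG = 4074.0  # same DRL-EMS, PEMFC stack-efficiency basis (Figure 8)
LH2_DENSITY_KG_M3 = 70.8    # approximate density of liquid hydrogen at ~20 K, 1 atm

extra_h2_kg = H2_WITH_BOP_KG - H2_WITHOUT_BOP_KG          # fuel attributable to BOP power
relative_underestimate = extra_h2_kg / H2_WITH_BOP_KG     # share missed without BOP
extra_liquid_volume_m3 = extra_h2_kg / LH2_DENSITY_KG_M3  # per reference operation

print(f"extra H2: {extra_h2_kg:.0f} kg ({relative_underestimate:.1%}), "
      f"extra liquid volume: {extra_liquid_volume_m3:.2f} m^3")
# prints: extra H2: 529 kg (11.5%), extra liquid volume: 7.47 m^3
```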
To further analyze the optimal energy management performance of DRL-EMS, two sensitivity analyses are conducted. First, Figure 9 shows the energy management results for different hydrogen fuel costs, the parameter with the most significant impact on the OPEX of LH2-HSPS. Training is performed for unit hydrogen fuel costs of 2, 4, and 6 USD/kg. The results show that OPEX varies linearly with the hydrogen fuel price relative to the reference case. In addition, the average power generated by the PEMFC system is nearly identical in all cases. This implies that the hydrogen fuel price does not determine the operational strategy of LH2-HSPS, and that the change in OPEX is attributable to the hydrogen fuel price itself rather than to a change in the operating mode.
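The linear relationship follows directly if the dispatch, and therefore the hydrogen mass consumed, does not change with price: OPEX is then an affine function of the unit hydrogen cost whose slope equals the hydrogen mass. The sketch below assumes that the 4603 kg reference-case consumption stays fixed at every price (as the nearly identical average PEMFC power suggests) and uses a hypothetical placeholder for the degradation-related costs; it is not the study's OPEX model.

```python
H2_MASS_KG = 4603.0       # reference-case DRL-EMS consumption, assumed price-independent
DEGRADATION_USD = 2000.0  # hypothetical placeholder for PEMFC and battery degradation costs


def opex_usd(h2_price_usd_per_kg, h2_mass_kg=H2_MASS_KG, degradation_usd=DEGRADATION_USD):
    """OPEX for a fixed dispatch: fuel cost plus equivalent degradation cost."""
    return h2_price_usd_per_kg * h2_mass_kg + degradation_usd


prices = [2.0, 4.0, 6.0]  # USD/kg, the values used in the sensitivity analysis
costs = [opex_usd(c) for c in prices]
slopes = [(c1 - c0) / (p1 - p0)
          for (p0, c0), (p1, c1) in zip(zip(prices, costs), zip(prices[1:], costs[1:]))]
```

Every slope equals the hydrogen mass. Had the optimizer shifted load between the PEMFC and the battery as the price changed, the consumed mass would itself depend on price and the slopes would no longer be constant.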
Furthermore, Figure 10 shows a sensitivity analysis of DRL-EMS performance with respect to battery system capacity. Because the equivalent degradation cost of the battery system accounts for only a small portion of OPEX, the OPEX of LH2-HSPS does not change significantly for any of the investigated battery system capacities; the maximum difference is approximately 2.2% relative to the reference case (i.e., a capacity of 2000 kWh). Under the same charging and discharging conditions, increasing the battery system capacity raises the system cost but lowers the C-rate, so these two effects trade off against each other in the equivalent degradation cost. Consequently, battery capacity did not have a significant impact on the overall OPEX.
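The trade-off can be illustrated with a deliberately simple, hypothetical degradation model: cost equals the battery system price times the fraction of life consumed, where cycle fade per kWh of throughput grows with C-rate and a calendar-type fade does not depend on throughput. None of the parameter values below come from Table 3; they are chosen only to make the opposing trends visible, and the study's degradation model may be structured differently.

```python
def c_rate(p_batt_kw, capacity_kwh):
    """C-rate in 1/h: battery power relative to rated energy capacity."""
    return abs(p_batt_kw) / capacity_kwh


def equiv_degradation_cost(capacity_kwh, throughput_kwh, p_batt_kw,
                           unit_cost_usd_per_kwh=300.0, k_cycle=1e-5,
                           alpha=1.0, calendar_fade=5e-5):
    """Hypothetical toy model: system cost x fraction of battery life consumed."""
    system_cost = unit_cost_usd_per_kwh * capacity_kwh       # rises with capacity
    stress = 1.0 + alpha * c_rate(p_batt_kw, capacity_kwh)   # falls with capacity
    cycle_fade = k_cycle * (throughput_kwh / capacity_kwh) * stress
    return system_cost * (cycle_fade + calendar_fade)


# Same charging/discharging duty (throughput and power) for every capacity.
costs = {cap: equiv_degradation_cost(cap, throughput_kwh=20000, p_batt_kw=1000)
         for cap in (1000, 2000, 3000, 4000)}
for cap, cost in costs.items():
    print(cap, round(cost, 2))
```

In this toy the cost is U-shaped in capacity and varies by only about 12% across a fourfold capacity range, a small swing that becomes smaller still once diluted by the hydrogen-dominated OPEX.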
Finally, because various operation modes can occur during a vessel's operation, the performance of DRL-EMS is evaluated on three additional operation profiles that were not used in training. Depending on the case, OPEX is approximately 0.7% to 9.2% higher than that of DP-EMS (Figure 11). Case 2, which performed similarly to DP-EMS, has a power demand distribution that closely resembles that of the reference case. In contrast, Cases 3 and 4, whose OPEX differed more from that of DP-EMS, have power demand distributions distinct from that of the reference case (Figure 12). This indicates that the performance of DRL-EMS can degrade when the operation differs significantly from the power demand variations used in its training. Nevertheless, even for these power demands not used in training, the OPEX of DRL-EMS remained within 9.2% of DP-EMS, whereas SQP-EMS was 17.5% above DP-EMS even for the reference case (Table 6). Another advantage of DRL-EMS is that an agent trained under various operating conditions can be used directly in actual operations. By continuously updating the neural networks with data obtained from equipment installed on real vessels and conducting ongoing training, DRL-EMS is expected to provide effective energy management for diverse operations.
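Figure 12 compares the demand distributions visually. One hypothetical way to quantify how far a test profile lies from the training (reference) profile is the 1-D Wasserstein-1 (earth mover's) distance between the empirical demand samples, shown below alongside the relative OPEX gap used to compare DRL-EMS with DP-EMS. The demand values and OPEX numbers are made up for illustration, and the study does not report this metric.

```python
def wasserstein_1(a, b):
    """1-D earth mover's distance between two equal-size empirical samples."""
    if len(a) != len(b):
        raise ValueError("this sorted-sample shortcut needs equal-size samples")
    return sum(abs(x - y) for x, y in zip(sorted(a), sorted(b))) / len(a)


def relative_gap(opex, opex_reference):
    """Relative OPEX difference, e.g. DRL-EMS versus DP-EMS."""
    return (opex - opex_reference) / opex_reference


# Hypothetical power demand samples [kW].
reference = [380, 400, 410, 420, 430, 450]
similar_case = [370, 405, 415, 420, 440, 450]
distinct_case = [150, 200, 900, 1200, 1500, 1800]

d_similar = wasserstein_1(reference, similar_case)    # small: close to training data
d_distinct = wasserstein_1(reference, distinct_case)  # large: far from training data
gap = relative_gap(109.2, 100.0)                      # hypothetical OPEX values
```

For equal-size samples on a line, matching sorted values gives the exact Wasserstein-1 distance, so no optimization library is needed. A distance of this kind could serve as a simple trigger for the ongoing retraining described above.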

Conclusions
This study proposed a deep reinforcement learning-based energy management strategy (DRL-EMS) that can be applied to a liquid hydrogen-powered hybrid electric ship propulsion system (LH2-HSPS) and compared its performance with EMSs based on dynamic programming (DP-EMS) and sequential quadratic programming (SQP-EMS). The study also investigated the optimal operation strategy for LH2-HSPS. LH2-HSPS was modeled to calculate the optimal operating expenditure (OPEX), considering the BOP power for the LH2 FGSS and the PEMFC system within LH2-HSPS. The reward function of the energy management problem consists of hydrogen consumption, degradation of the PEMFC and battery systems, and equivalent consumption of the battery system. DRL-EMS achieved OPEX within 0.5% of the global optimum obtained by DP-EMS and outperformed the real-time SQP-EMS. Additional performance analysis was

Figure 1. Power demand for propulsion of a 2 MW-class PSV for (a) Reference Case, (b) Case 2, (c) Case 3, and (d) Case 4.

$m_{f,tank}$: Mass of fluid stored in the tank; $\dot{m}_{f,in,i}$: Mass flow rate of fluid $i$ entering the tank; $\dot{m}_{f,out,j}$: Mass flow rate of fluid $j$ exiting the tank; $P_{pump}$: Power consumption of the pump; $p_{f,out}$: Outlet pressure of fluid; $p_{f,in}$: Inlet pressure of fluid; $\rho_f$: Density of fluid; $\eta_{pump}$: Efficiency of the pump

Heat Exchanger

$$\frac{d\left(m_{f,shell}\, h_{f,shell,out}\right)}{dt} = \dot{m}_{f,shell}\left(h_{f,shell,in} - h_{f,shell,out}\right) + \dot{Q}_{HX} \qquad (4)$$

$m_{f,shell}$: Mass of accumulated fluid in shell side; $\dot{m}_{f,shell}$: Mass flow rate of fluid in shell side; $h_{f,shell,in}$: Enthalpy of entering fluid in shell side; $h_{f,shell,out}$: Enthalpy of exiting fluid in shell side; $m_{f,tube}$: Mass of accumulated fluid in tube side; $\dot{m}_{f,tube}$: Mass flow rate of fluid in tube side; $h_{f,tube,in}$: Enthalpy of entering fluid in tube side; $h_{f,tube,out}$: Enthalpy of exiting fluid in tube side

Figure 3. Schematic diagram of the PEMFC system.

Roughness of tube; $D$: Diameter of tube

Figure 4. System efficiency and required mass flow rate of hydrogen for operation of the LH2-HSPS with different fractions of PEMFC output.

Figure 6. (a) Output power of the PEMFC system and (b) SOC profiles for the reference case with each energy management algorithm.

J. Mar. Sci. Eng. 2023, 11, x FOR PEER REVIEW 15 of 21

Figure 7. Histogram and cumulative percent plot of output power of the PEMFC system for the reference case.

Figure 8. Cumulative hydrogen consumption with and without consideration of BOP power for the liquid hydrogen fuel gas supply system and PEMFC system.

Figure 9. Energy management results with different hydrogen fuel costs for the reference case.

Figure 10. Energy management results with different capacities of the battery system for the reference case.

Figure 11. Energy management results for Cases 1 to 4 with dynamic programming and deep reinforcement learning algorithms.

Figure 12. Histogram and density function of power demand for each operation case.

Table 1. Specifications of each piece of equipment for the LH2 FGSS.

Table 3. Parameters of the battery system and the degradation model.

Table 4. Parameters for the energy management problem.

Table 5. Hyperparameters of the deep Q-network.

Table 6. Comparison of optimized operating expenditure with different algorithms.