Prognostics and Health Management for the Optimization of Marine Hybrid Energy Systems

Decarbonization of marine transport is a key global issue: the carbon emissions of international shipping are projected to increase by 23%, to 1090 million tonnes, by 2035 relative to 2015 levels. Optimizing the energy system of these vessels, especially the propulsion system, is a complex multi-objective challenge involving economical maintenance, environmental metrics, and energy demand requirements. In this paper, data from instrumented vessels on the River Thames in London are integrated and analyzed through automatic, multi-objective global optimization to create an optimal hybrid propulsion configuration for a hybrid vessel. These data include environmental emissions, power demands, journey patterns, and operational variance due to the captain(s) and loading (passenger numbers). We propose and analyze a number of computational techniques for monitoring and remaining useful life (RUL) estimation of individual energy assets, as well as for modeling and optimizing the energy use scenarios of a hybrid-powered vessel. Our multi-objective optimization addresses emissions, asset health, and power performance. We show that, irrespective of the battery pack used, our Relevance Vector Machine (RVM) algorithm achieves over 92% accuracy in RUL predictions. A k-nearest neighbors (KNN) algorithm is proposed for prognostics of the state of charge (SOC) of back-up lead-acid batteries; the classifier achieved an average accuracy of 95.5% in three-fold cross-validation. Using operational data from the vessel, optimal autonomous propulsion strategies that combine battery and diesel engine use are modeled. The experimental results show that fuel savings of 70% to 80% can be achieved when the diesel engine is operated up to 350 kW.
Our methodology demonstrates the feasibility of combining artificial intelligence (AI) methods with real-world data to decarbonize and optimize green technologies for maritime propulsion.


Introduction
Global marine transport is a vital contributor to the economy, supporting the transfer of goods and people throughout the world. As a result, marine transport is a significant and growing consumer of energy and fuel. According to a report by the Organization for Economic Co-operation and Development (OECD), carbon emissions from global shipping are projected to reach approximately 1090 million tonnes by 2035, a 23% increase over 2015 [1]. The main driver behind this growth is the rise in international trade. Emissions fell during the global Covid-19 lockdown; however, they are expected to rebound once restrictions are removed [2]. The combination of climate change and the forecasted global recession further increases the demand for economical decarbonization solutions that can be applied to existing fleets. […] analysis. By not using real-time data to inform these schedules, operators incur both technical and commercial risks, such as incorrect or incomplete maintenance, damage induced through intervention, impacts of new load patterns, unnecessary non-productive time, and early replacement of healthy components [21]. Consequently, hybrid vessels require more advanced asset health management systems that take into account the dynamics of these complex systems. Current literature on intelligent asset health management tends to focus on prognostics of single-type assets [22][23][24][25] or on using simulated data to test the reliability of health management models [26,27]. A gap exists in the literature regarding the complex dynamic relationships between multiple assets that simultaneously respond to the needs of vessels in real-world environments. Optimizing sub-systems at run time while meeting operational objectives under dynamic loading and ambient conditions represents a significant challenge.
Detailed optimization that balances the usage and optimal performance characteristics of the diesel engine and the lithium-ion battery can positively affect the lifespan of both [28]. It is therefore crucial to identify the conditions under which diesel engines operate sub-optimally and inefficiently. In doing so, hybrid vessel operators can switch between, or simultaneously combine, diesel and electric propulsion power to maximize fuel efficiency and reduce vessel emissions.
In the context of the current state of the art, our research addresses several key knowledge gaps. First, using operational and other test data, we create prognostic models for state of health estimation of individual energy subsystems aboard a hybrid-powered vessel, specifically Li-ion and lead-acid batteries. Second, we optimize diesel engine performance in the presence of lithium-ion battery assets as part of a hybrid propulsion system. Third, using multi-objective optimization that includes asset health, energy performance, and environmental metrics, we examine a number of control methods that can be used to operate an autonomous energy management system. These findings capture the inter-dependencies and variances of real-world operations and overcome the limitations of simulated marine vessel performance data.
The remainder of this paper is organized as follows. Section 2 presents an overview of prognostics and health management (PHM) and of research into the propulsion of marine vessels. Section 3 describes our optimization approach and the data sources used in this study, while Section 4 presents our methodology and results for forecasting the individual assets within a hybrid energy system, namely the Li-ion battery, the lead-acid battery, and the diesel engine. Section 5 describes the whole-system optimization of the hybrid energy system and demonstrates the effect of autonomous energy system management for a hybrid propulsion system under different operating scenarios. Finally, Section 6 summarizes the key observations and recommendations from our findings, as well as suggestions for future work.

Prognostics and Propulsion Research in Marine Systems
Within this section, we provide a brief overview of prognostic health management and propulsion research with reference to components and sub-systems relevant to hybrid marine vessels.

Prognostics and Health Management
Remaining useful life (RUL) is the length of time an asset (e.g., a component, sub-system, or system) is likely to operate before it requires repair or replacement. RUL estimation for components, sub-systems, and systems traditionally relies on analyzing data that contain signatures of failure modes, collated over a system's life cycle. Such data can be scarce and difficult to obtain, either because companies are reluctant to release them or because assets are replaced early. Reliability-based RUL estimation relies on understanding the failure characteristics of the asset and determines what can be done to manage the consequences of failure [29,30].
A common approach when failure data are available is to fit a Weibull distribution to them. However, as shown by Turner [31], this approach has significant limitations, such as (1) a lack of reliable data, (2) the influence of multiple failure mechanisms in an engineering system, and (3) a component failure distribution that does not match the Weibull distribution. Statistical analysis often ignores the failure mechanism or assumes a single one, which is not the case in many complex engineering systems, and it results in fixed-time replacement of components. In comparison, a prognostic approach aims to predict failure from state estimation of the individual component, using physics-of-failure, data-driven, or fusion degradation models [32].
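As a minimal illustration of the Weibull approach discussed above (not part of this study's method), the sketch below fits a two-parameter Weibull distribution to complete, uncensored failure times by maximum likelihood. It assumes a single failure mechanism and no censoring, which are exactly the simplifications Turner [31] criticizes. The function name `fit_weibull_mle` and the synthetic failure times are hypothetical.

```python
import math
import random


def fit_weibull_mle(times, lo=0.05, hi=50.0, tol=1e-10):
    """Two-parameter Weibull MLE (shape k, scale lam) for complete failure times."""
    s = max(times)
    y = [x / s for x in times]          # rescale to (0, 1] so y**k cannot overflow
    logs = [math.log(v) for v in y]
    mean_log = sum(logs) / len(logs)

    def score(k):
        # Profile-likelihood equation for the shape; its unique root is the MLE of k.
        yk = [v ** k for v in y]
        return sum(a * b for a, b in zip(yk, logs)) / sum(yk) - 1.0 / k - mean_log

    # score(k) increases monotonically in k, so bisection on [lo, hi] finds the root.
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if score(mid) > 0:
            hi = mid
        else:
            lo = mid
    k = 0.5 * (lo + hi)
    lam = s * (sum(v ** k for v in y) / len(y)) ** (1.0 / k)
    return k, lam


# Usage with synthetic failure times (scale 1000 h, shape 2.0).
random.seed(1)
failures = [random.weibullvariate(1000.0, 2.0) for _ in range(5000)]
shape, scale = fit_weibull_mle(failures)
print(f"shape={shape:.2f}, scale={scale:.0f} h")
```

The resulting single distribution applies one fixed replacement age to every unit, which illustrates why the text contrasts it with component-level prognostics.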
Prognostics and Health Management (PHM) is a highly multidisciplinary field that aims to forecast asset failures ahead of time in order to enable intelligent health management whilst the system remains in operation [33]. The main purpose of a prognostic model is to detect system state deviation from a healthy baseline state by understanding whether the engineering system is behaving within nominal operational bounds [34].
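The idea of detecting deviation from a healthy baseline can be sketched, in its simplest form, as a nominal band learned from healthy-state data. This is a hypothetical illustration: the mean ± k·std band, the choice k = 3, and the coolant-temperature readings are assumptions, not the models used in this paper.

```python
import statistics


def fit_baseline(healthy_readings):
    """Learn a nominal band (mean, sample std) from readings taken while the asset is known to be healthy."""
    return statistics.mean(healthy_readings), statistics.stdev(healthy_readings)


def flag_deviations(readings, baseline, k=3.0):
    """Return the indices of readings outside mean +/- k * std (k = 3 is a common but arbitrary choice)."""
    mean, std = baseline
    return [i for i, x in enumerate(readings) if abs(x - mean) > k * std]


# Hypothetical coolant temperatures (deg C): a healthy baseline, then new readings.
baseline = fit_baseline([80.0, 81.0, 79.0, 80.5, 79.5])
print(flag_deviations([80.2, 79.8, 85.0, 80.1], baseline))  # -> [2]
```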
There are three main approaches to prognostics. The first is a statistical, reliability-based PHM approach, which assumes that usage and environmental conditions have no effect and that knowledge of the failure mechanism is not required [35]. The second is a physical, model-based approach that mimics component or system deterioration via nominal life models; this can include degradation or damage accumulation models based on failure mechanisms such as wear, fatigue, corrosion, and contamination [14]. The third, the data-driven approach, applies machine learning and statistical pattern recognition to data collected at the system, subsystem, or component level [32,36]. In practice, a prognostics architecture can rely on an individual method or a combination of the three; the choice depends on the available data, existing models, and physical knowledge of the system. Although PHM methods have been used with great success across numerous industries on machinery, electronics, avionic systems, and vehicles, challenges persist in creating accurate models from imperfect real-world operational data and in producing predictions within time-critical operational timescales. First, real-world operational data are collected on different vessels, with different settings, and in different contexts, so existing data may not be suitable for our analysis without full details of the data collection process. Second, data related to failure modes and faults must be sufficient for training models; in some cases, such as standby assets, it may be difficult to obtain sufficient hazard- or failure-related operational data [37]. The representativeness of the data is also a concern, as the data collected may not cover all operating conditions; as a result, prognostics based on these data may lead to false positive predictions [38].
In practice, collecting representative data can be time-consuming, and users may be reluctant to adopt PHM applications because of this delay. In addition, because modeling even single assets is complex, system-of-systems-level prognostics is still at an early stage of research and development.

Propulsion and Health Management of Marine Vessels
In Reference [39], the marine transport optimization problem focuses on the ship's voyage, where emissions, cruising distance, and cruising speed constraints are fulfilled in multiple stages. However, this multi-stage optimization does not level the power demand on the main generators and thereby forgoes operational benefits such as reduced fuel consumption [40]. In Reference [41], an integrated optimal control system is proposed for a full-electric propulsion ship. The optimization is formulated under multiple technical constraints to ensure safety while optimizing fuel consumption and limiting greenhouse gas emissions.
Literature on hybrid vessel control management is often based on the Equivalent Consumption Minimization Strategy (ECMS). In ECMS, power management is formulated as a control optimization problem in which the fuel consumption of the diesel engine and the equivalent fuel consumption of the batteries are minimized simultaneously, accounting for the fact that batteries need to be recharged. Load is shared between the power assets to minimize cost [8]. ECMS has been proposed by various researchers in trial studies on marine vessels. Grimmelius et al. [42] applied ECMS to a tug as a test case; however, their model did not include the diesel engine as a power source. Vu et al. [43] also used ECMS to determine an optimal schedule for engine and battery operation. The authors offer a nonlinear optimization approach to minimize a cost function and show that, relative to a rule-based controller, ECMS yields a 9% improvement in the combined cost function. However, that study lacked real operational data and therefore used a simulated tugboat model, and the authors did not explicitly model battery degradation or engine health; the optimization thus overlooks vital techno-economic assessments of a real vessel. A trial study using ECMS was also conducted on the hybrid ferry MV Hallaig, which demonstrated a 24% fuel saving when the onboard batteries were charged overnight [44]. However, this prediction is based on a predefined power configuration rather than real data, and the impact on environmental metrics was not discussed.
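The ECMS idea can be made concrete with a small grid search. For one time step, the sketch below chooses the battery power that minimizes engine fuel rate plus the battery's equivalent fuel rate, `s * P_batt / Q_LHV`. Everything except the general ECMS structure is an assumption for illustration: the fuel-map coefficients, the equivalence factor `s`, the 300 kW battery limit, and the function names. Only the 900 kW engine rating comes from Table 1, and 42.7 kJ/g is an approximate diesel lower heating value.

```python
Q_LHV_KJ_PER_G = 42.7  # approximate lower heating value of diesel (about 42.7 MJ/kg = 42.7 kJ/g)


def engine_fuel_rate(p_eng_kw):
    """Hypothetical fuel map in g/s: idle offset + linear + quadratic term; zero when the engine is off."""
    if p_eng_kw <= 0.0:
        return 0.0
    return 2.0 + 0.055 * p_eng_kw + 2e-5 * p_eng_kw ** 2


def ecms_split(p_demand_kw, s=3.0, p_eng_max=900.0, p_batt_max=300.0, step=1.0):
    """Grid-search the battery power that minimizes fuel + equivalent battery fuel.

    Positive p_batt discharges the battery; negative p_batt charges it from the engine.
    s is the equivalence factor converting electrical energy into equivalent fuel.
    Returns (cost_g_per_s, p_eng_kw, p_batt_kw).
    """
    best = None
    p_batt = -p_batt_max
    while p_batt <= p_batt_max + 1e-9:
        p_eng = p_demand_kw - p_batt
        if 0.0 <= p_eng <= p_eng_max:
            cost = engine_fuel_rate(p_eng) + s * p_batt / Q_LHV_KJ_PER_G
            if best is None or cost < best[0]:
                best = (cost, p_eng, p_batt)
        p_batt += step
    return best


# A low equivalence factor makes battery energy "cheap"; a higher one shifts load to the engine.
print(ecms_split(600.0, s=1.0))  # battery at its 300 kW limit
print(ecms_split(600.0, s=3.0))  # engine near its marginal-cost balance point (about 381 kW)
```

In published ECMS controllers, `s` is usually tuned or adapted to the battery state of charge. A fixed `s` is used here only to make the trade-off visible.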
In addition to ECMS, heuristic control strategies are also reported in the literature. Geertsma et al. [8] summarized that, in heuristic control strategies, logical rules determine whether the battery or the diesel engine is used, and batteries serve discrete, distinct operating modes. For example, Sciberras and Norman [45] proposed a strategy in which a multi-objective optimization problem is reduced to a single-objective one through a weighted vector approach. Their control strategy uses batteries alone at low speeds and both diesel engine and electric propulsion at high speeds, and they demonstrated reduced fuel consumption. However, their solution ultimately relies on non-technical, experience-based decisions made by vessel operators.
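A heuristic controller of the kind described above reduces to a few explicit rules. The sketch below is hypothetical. It implements only the "battery alone at low speed, diesel plus battery at high speed" pattern, with an added state-of-charge guard. The speed threshold, SOC limit, battery rating, and load share are illustrative assumptions, not values from [45].

```python
def rule_based_split(speed_kn, demand_kw, soc,
                     low_speed_kn=8.0, soc_min=0.3,
                     batt_max_kw=300.0, batt_share=0.3):
    """Return (engine_kw, battery_kw) from simple, hypothetical if-then rules."""
    if soc <= soc_min:
        return demand_kw, 0.0                      # protect the battery: diesel only
    if speed_kn <= low_speed_kn and demand_kw <= batt_max_kw:
        return 0.0, demand_kw                      # low speed: electric only
    p_batt = min(batt_max_kw, batt_share * demand_kw)
    return demand_kw - p_batt, p_batt              # high speed: share the load


print(rule_based_split(5.0, 120.0, soc=0.8))   # -> (0.0, 120.0)
print(rule_based_split(20.0, 800.0, soc=0.8))  # diesel carries most of the load
```

Unlike ECMS, nothing here is optimized. The thresholds encode operator experience, which is the limitation noted above.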
According to [46], mature and well-developed approaches to the health management of marine vessels are rare. A typical vessel health monitoring process involves multi-layer, oral, and periodic communication from vessel crew technicians to captains and then to fleet managers. The fleet manager then draws up a scheduled maintenance plan to solve any technical problems that may hinder the operation of the vessel. Such maintenance operations face considerable risk of human error and suffer from drawbacks such as indirect communication between fleet managers and engineers. Furthermore, maintenance decisions made after this multi-layer communication are not based on real-time data. Because of these limitations, the current scheduled maintenance model for vessel health management can result in incorrect maintenance and repairs, damage to vessels, and substantial costs to asset owners. It was reported that, between 2015 and 2017, incorrect maintenance and/or repairs accounted for nearly 30% of vessel damage, at an average cost of $544,167 [47]. Recently, some studies have explored the design and optimization of health management systems for marine vessels. For example, a wireless sensor network approach was proposed [46] to gather and transmit data for condition monitoring of vessel engines. Reference [48] addresses how to define a prognostics and health management development process that can decide sensor selection criteria to support fault detection and isolation on a typical marine diesel engine. However, the authors focused only on its fuel oil delivery system, not on the diesel engine as a whole or on the inter-dependencies across the vessel's sub-systems. Within the aforementioned studies on propulsion and health management in hybrid vessels, several gaps remain in the literature.
Firstly, in propulsion energy management, the lack of real-time data, and consequently the use of simulated data, prevents real-time feedback on propulsion power asset performance and state of health. This also limits the ability to optimize vessel operations for emissions and fuel consumption. Moreover, current literature on health management does not capture the inter-dependencies of critical sub-systems and their aggregated impact on system-level optimization.

Optimization Approach
In this paper, we focus on optimizing three key performance indicators (KPIs) using operational data from a vessel: (i) efficiency in energy performance, (ii) reduced emissions and fuel consumption, and (iii) prediction of asset lifetime. The multi-objective optimization supports the design of an autonomous energy system based on real operational data using artificial intelligence (AI) technology. Our research on hybrid vessel PHM explored key assets on board (the lithium-ion battery, lead-acid battery, and diesel engine) using different machine learning and statistical approaches. We applied the Relevance Vector Machine (RVM) for supervised learning of lithium-ion battery lifetime prediction, using battery life-cycle test data to train, test, and evaluate our RVM algorithm for lithium-ion battery RUL prediction [15]. For lead-acid batteries on hybrid vessels, we designed an experiment, collated life-cycle data using electric pulse testing, and trained a k-nearest neighbors (KNN) classifier. Real-time operational data from a vessel on the River Thames were collated to explore diesel engine operating characteristics and optimization. The data originated from a Cyclone clipper operated by Thames Clippers and fitted with an MTU 10V 2000 M72 diesel engine, as summarized in Table 1.
Our power optimization method follows an energy management strategy that weighs energy demand against the energy that the engine and battery combination can supply, subject to constraints derived from the KPIs above. Figure 1 shows the optimization model and how environmental factors, energy performance, and asset health (battery remaining useful life) are integrated in the system. The subsystem work is detailed in Sections 4 and 5.
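A minimal sketch of such an energy balance is a greedy rule. It caps the engine at 350 kW, the operating limit quoted in the abstract, and the battery covers any excess demand while its state of charge stays above a floor. This is not the paper's optimizer. The function `dispatch`, the battery size and rating, the SOC limits, and the lossless, non-recharging battery are all simplifying assumptions.

```python
def dispatch(demand_kw, dt_s=1.0, engine_cap_kw=350.0, batt_kwh=500.0,
             soc0=0.9, soc_min=0.2, batt_max_kw=400.0):
    """Greedy split: engine up to engine_cap_kw, battery covers the excess while SOC allows.

    Assumes a lossless battery and no recharging; if the battery cannot cover the excess,
    the engine runs above the cap. Returns (engine_kw, battery_kw, soc_after_step) lists.
    """
    hours = dt_s / 3600.0
    soc = soc0
    engine, battery, socs = [], [], []
    for p in demand_kw:
        excess = max(0.0, p - engine_cap_kw)
        # Battery power is limited by its rating and by the energy left above soc_min.
        avail_kw = max(0.0, (soc - soc_min) * batt_kwh / hours)
        p_batt = min(excess, batt_max_kw, avail_kw)
        p_eng = p - p_batt
        soc -= p_batt * hours / batt_kwh
        engine.append(p_eng)
        battery.append(p_batt)
        socs.append(soc)
    return engine, battery, socs


# Hourly steps: the battery shaves demand above 350 kW until it reaches soc_min.
eng, batt, soc = dispatch([200.0, 500.0, 500.0, 500.0], dt_s=3600.0)
print(eng, batt, [round(x, 2) for x in soc])
```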

Marine Vessel Data Gathering
We collected operational data on one of Collins River Enterprises Limited's vessels (a Cyclone clipper) operating on the River Thames on a return route between North Greenwich and London Bridge. Our engine modeling and power optimization work is based on two datasets collected from two runs, on 07.12.16 and 29.06.16, which are labeled accordingly. Although the route is exactly the same, energy consumption differs considerably between the two datasets because of differences in operator (captain) behavior, which influences speed, acceleration, and maneuvering. This allows us to test our algorithms under different operating conditions and highlights the performance variability due to human operation.
Operational data are crucial for accurate engine modeling and hybrid energy system management. Such data capture the operational profile of an in-service vessel, in particular the power and energy usage of its diesel engine under various modes of movement, including maneuvering, cruising, docking, and abrupt acceleration or braking. Operational data thus give us a dynamic view of how the diesel engines are affected by various operational factors and help us develop the most appropriate and relevant energy management system. Figure 2 shows the vessel used in this study, which is 38.04 m long and 9.40 m wide. The vessel can carry 158 people seated inside, 38 seated outside, and 24 standing. The propulsion system of the Cyclone clipper consists of two MTU (Germany) 10V 2000 M72 diesel engines. Each engine weighs more than 3 tonnes, has a displacement of 22.3 L (two-stage twin turbo), and can produce 900 kW (Table 1). The test consisted of gathering data through the MTU control unit (of one of the two engines) while the vessel performed a normal run on the Thames River route (11.8 miles). Twelve key engine parameters (such as engine power, engine speed, and fuel consumption) were collected at a sampling interval of 0.1 s. In addition, a Testo 350 emission analyzer (http://www.testo350.com/) was used to acquire engine emission data (NOx, CO, HC, etc.) during the run at a sampling interval of 1 s.
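Because the engine parameters (0.1 s) and the emission readings (1 s) are sampled at different intervals, they must be aligned before joint analysis. One simple option, sketched below, averages the engine samples into 1 s bins and joins the two streams on the whole-second timestamp. Binning by mean, the function names, and the sample values are assumptions for illustration.

```python
import math
from collections import defaultdict


def resample_to_1s(times_s, values):
    """Average samples into 1 s bins keyed by whole second (small epsilon absorbs float error like 0.1 * 30)."""
    bins = defaultdict(list)
    for t, v in zip(times_s, values):
        bins[math.floor(t + 1e-9)].append(v)
    return {sec: sum(vs) / len(vs) for sec, vs in sorted(bins.items())}


def join_on_second(engine_1s, emissions_1s):
    """Pair engine and emission values that share the same 1 s timestamp."""
    common = sorted(engine_1s.keys() & emissions_1s.keys())
    return {sec: (engine_1s[sec], emissions_1s[sec]) for sec in common}


# Hypothetical 0.1 s engine power samples (kW) and 1 s NOx readings (ppm).
t_engine = [i * 0.1 for i in range(30)]
power = [float(i) for i in range(30)]
nox_1s = {0: 50.0, 1: 52.0, 3: 55.0}
print(join_on_second(resample_to_1s(t_engine, power), nox_1s))
# -> {0: (4.5, 50.0), 1: (14.5, 52.0)}
```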

Methodology and Results for Data Analysis of Individual Energy Assets
This section presents our work on different assets of hybrid vessels, using different machine learning and statistical approaches.

Prognostics Methods
Although lithium-ion batteries are rechargeable, irreversible chemical reactions cause battery cells to age, so battery capacity decreases over time. In most practical applications, a lithium-ion battery is deemed unreliable when its capacity fades to less than 70% of its rated capacity [22]. Lithium-ion batteries are important on-board assets, so their state of health needs to be monitored closely to ensure that maintenance or replacement is carried out in time, preventing failure and potential economic losses. As discussed in Section 2, many prognostic models use data-driven techniques, feeding data into machine learning models to predict RUL. We follow this data-driven approach and use RVM for supervised learning.

Relevance Vector Regression Background
Fundamentally, the Relevance Vector Machine is a Bayesian treatment of the Support Vector Machine (SVM) [49]. The Bayesian treatment leads to probabilistic predictions and, unlike the SVM, allows arbitrary kernel functions to be used [50]. We use an iterative expectation maximization (EM) algorithm for RVM training, which avoids the unnecessary and lengthy step of explicitly computing and optimizing the hyper-parameters of the RVM. This section presents the technical principles of our RVM approach in more detail.
Formally, input vectors $\{\mathbf{x}_n\}_{n=1}^{N}$ and corresponding targets $\{t_n\}_{n=1}^{N}$ form the training dataset for the RVM, and the output of the RVM model is:
$$y(\mathbf{x}; \mathbf{w}) = \sum_{i=1}^{N} w_i K(\mathbf{x}, \mathbf{x}_i) + w_0,$$
where $K(\mathbf{x}, \mathbf{x}_i)$ is the kernel function, $w_i$ are the model weights (with bias $w_0$), and $N$ is the number of samples in the dataset. Following the standard probabilistic formulation, the targets are assumed to be samples from this model with additive noise:
$$t_n = y(\mathbf{x}_n; \mathbf{w}) + \epsilon_n,$$
where $\epsilon_n$ is normally distributed noise with zero mean and variance $\sigma^2$.
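In addition, assuming the targets $t_n$ are independent, the likelihood of the entire dataset can be written as:
$$p(\mathbf{t} \mid \mathbf{w}, \sigma^2) = (2\pi\sigma^2)^{-N/2} \exp\left(-\frac{\lVert \mathbf{t} - \boldsymbol{\Phi}\mathbf{w} \rVert^2}{2\sigma^2}\right),$$
where $\boldsymbol{\Phi}$ is the $N \times (N+1)$ design matrix whose $n$-th row is $[1, K(\mathbf{x}_n, \mathbf{x}_1), \ldots, K(\mathbf{x}_n, \mathbf{x}_N)]$. In the maximum likelihood estimation of $\mathbf{w}$ and $\sigma^2$, the number of parameters equals the number of training examples, which could lead to over-fitting. The RVM takes a Bayesian perspective on this problem and introduces additional constraints on the weights $\mathbf{w}$. Following Tipping [49], a zero-mean Gaussian prior is placed over $\mathbf{w}$:
$$p(\mathbf{w} \mid \boldsymbol{\alpha}) = \prod_{i=0}^{N} \mathcal{N}(w_i \mid 0, \alpha_i^{-1}),$$
where $\boldsymbol{\alpha}$ is the vector of $N + 1$ hyper-parameters, each associated independently with one weight. To complete the hierarchical Bayesian model, we assume that the hyper-priors over the scale parameters $\boldsymbol{\alpha}$ and $\sigma^2$ follow Gamma distributions. Applying Bayes' rule, the posterior distribution over the weights is Gaussian:
$$p(\mathbf{w} \mid \mathbf{t}, \boldsymbol{\alpha}, \sigma^2) = \mathcal{N}(\mathbf{w} \mid \boldsymbol{\mu}, \boldsymbol{\Sigma}),$$
where the posterior covariance and mean are, respectively,
$$\boldsymbol{\Sigma} = \left(\sigma^{-2}\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi} + \mathbf{A}\right)^{-1}, \qquad \boldsymbol{\mu} = \sigma^{-2}\boldsymbol{\Sigma}\boldsymbol{\Phi}^{\top}\mathbf{t},$$
with $\mathbf{A} = \operatorname{diag}(\alpha_0, \ldots, \alpha_N)$. In relevance vector regression, many of the hyper-parameters $\alpha_i$ tend to infinity during training, so the posterior distributions of the corresponding weights become sharply peaked at zero. The basis vectors associated with the remaining non-zero weights are called relevance vectors (RVs). Sparse Bayesian learning therefore reduces to a hyper-parameter optimization problem whose aim is to find the most suitable posterior distribution.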
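For fixed hyper-parameters, the posterior covariance and mean above take only a few lines of NumPy. The Gaussian (RBF) kernel, its width, and the function names are assumptions for illustration; this section does not state which kernel the authors used.

```python
import numpy as np


def rbf_design_matrix(X, centers, width):
    """Phi[n, 0] = 1 for the bias w_0; Phi[n, i] = exp(-||x_n - c_i||^2 / width^2) (Gaussian kernel assumed)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-d2 / width ** 2)])


def rvm_posterior(Phi, t, alpha, sigma2):
    """Posterior N(mu, Sigma) over weights for fixed hyper-parameters alpha and noise variance sigma2."""
    Sigma = np.linalg.inv(Phi.T @ Phi / sigma2 + np.diag(alpha))
    mu = Sigma @ Phi.T @ t / sigma2
    return mu, Sigma


# Usage: 5 one-dimensional inputs give an N x (N + 1) design matrix.
X = np.linspace(0.0, 1.0, 5).reshape(-1, 1)
Phi = rbf_design_matrix(X, X, width=0.5)
t = np.sin(2 * np.pi * X[:, 0])
mu, Sigma = rvm_posterior(Phi, t, alpha=np.ones(Phi.shape[1]), sigma2=0.01)
print(Phi.shape, mu.shape, Sigma.shape)  # (5, 6) (6,) (6, 6)
```

With a very weak prior (all $\alpha_i \to 0$) and a full-rank design matrix, $\boldsymbol{\mu}$ approaches the least-squares solution. Large $\alpha_i$ instead pin the corresponding weights near zero.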

RVM Training Algorithm
The training algorithm is critical for implementing the RVM model, since RVM training amounts to optimizing the hyper-parameters. There are three main RVM training algorithms: sequential sparse Bayesian learning [51], MacKay iterative learning [49], and the Expectation-Maximization (EM) iterative learning algorithm [52]. In RVM training, the hyper-parameters $\boldsymbol{\alpha}$ need to be computed and optimized; however, as the training dataset grows, the values of $\boldsymbol{\alpha}$ can approach infinity. In this case, the matrix that must be inverted to obtain $\boldsymbol{\Sigma}$ becomes ill-conditioned, so the relevance vectors cannot be computed reliably. Computational efficiency also decreases when the dataset is large. In this paper, we use the EM algorithm for RVM training to overcome these issues. Based on EM iteration, the sparse Bayesian algorithm can obtain the weights directly, so we avoid computing the hyper-parameters explicitly and instead use EM to estimate the posterior distribution. The algorithm consists of an E-step (calculating the expectation) and an M-step (maximization). The steps taken to train the sparse Bayesian model with the EM algorithm are as follows:
1. Initialization: initialize the weights $\mathbf{w}_1$ and the noise variance $(\sigma^2)_1$.
2. Expectation step: use $\mathbf{w}_k$ and $(\sigma^2)_k$ from iteration $k$ to estimate the weights $\mathbf{w}_{k+1}$ of the next iteration and $E(\mathbf{w}\mathbf{w}^{\top})$, using the posterior covariance $\boldsymbol{\Sigma}$, where $\mathbf{w}_k$ and $\mathbf{w}_{k+1}$ are the weights at iterations $k$ and $k + 1$.
3. Maximization step: use $\mathbf{w}_{k+1}$ obtained in the expectation step to calculate the noise variance $(\sigma^2)_{k+1}$; this update involves $\mathrm{tr}[\cdot]$, the trace of a matrix.
4. Convergence check: the iteration ends if $\lVert \mathbf{w}_{k+1} - \mathbf{w}_k \rVert < \delta$, where $\delta$ is a small empirical threshold. Otherwise, return to the expectation step and start a new iteration.
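The four steps can be sketched using the standard EM updates for Tipping's sparse Bayesian model. The E-step computes the posterior mean and covariance, which give the diagonal of $E(\mathbf{w}\mathbf{w}^{\top})$. The M-step sets $\alpha_i = 1/E(w_i^2)$ and re-estimates the noise variance using the trace term $\mathrm{tr}[\boldsymbol{\Sigma}\boldsymbol{\Phi}^{\top}\boldsymbol{\Phi}]$. This is a textbook variant assumed for illustration, not necessarily the authors' exact formulation. The Gaussian kernel and its width, the initial values, the threshold δ (`tol`), and the cap on α are all assumptions.

```python
import numpy as np


def rbf_design_matrix(X, centers, width):
    """Bias column plus Gaussian-kernel columns (kernel choice assumed)."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.hstack([np.ones((X.shape[0], 1)), np.exp(-d2 / width ** 2)])


def rvm_em(Phi, t, max_iter=500, tol=1e-6, alpha_max=1e12):
    N, M = Phi.shape
    # 1. Initialization: weak prior on every weight and a guess for the noise variance.
    alpha = np.ones(M)
    sigma2 = 0.1 * np.var(t) + 1e-6
    w = np.zeros(M)
    PtP, Ptt = Phi.T @ Phi, Phi.T @ t
    for _ in range(max_iter):
        # 2. E-step: posterior covariance and mean of w given the current alpha and sigma2.
        Sigma = np.linalg.inv(PtP / sigma2 + np.diag(alpha))
        w_new = Sigma @ Ptt / sigma2
        Eww_diag = np.diag(Sigma) + w_new ** 2          # diagonal of E(w w^T)
        # 3. M-step: re-estimate alpha and the noise variance (trace term included).
        alpha = np.minimum(1.0 / Eww_diag, alpha_max)   # cap keeps the matrix inverse finite
        resid = t - Phi @ w_new
        sigma2 = (resid @ resid + np.trace(Sigma @ PtP)) / N
        # 4. Convergence check ||w_{k+1} - w_k|| < delta.
        converged = np.linalg.norm(w_new - w) < tol
        w = w_new
        if converged:
            break
    return w, alpha, sigma2


# Usage on noisy samples of sin(x)/x.
rng = np.random.default_rng(0)
X = np.linspace(-10.0, 10.0, 50).reshape(-1, 1)
y_true = np.sinc(X[:, 0] / np.pi)                       # np.sinc(z) = sin(pi z) / (pi z)
t = y_true + 0.05 * rng.standard_normal(50)
Phi = rbf_design_matrix(X, X, width=2.0)
w, alpha, sigma2 = rvm_em(Phi, t)
rmse = np.sqrt(np.mean((Phi @ w - y_true) ** 2))
print(f"RMSE vs. noise-free curve: {rmse:.3f}, relevance vectors: {np.sum(alpha < 1e6)}")
```

Here α is updated implicitly from the posterior at every iteration rather than optimized in a separate loop, which matches the intent described in the text. Basis functions whose α reaches the cap are effectively pruned.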

Battery Data Source and Aging Test
A key constraint in battery analysis is that long-term battery failure data, as well as data from operating commercial vessels, are highly restricted and difficult to obtain, because collecting them requires substantial amounts of data over a long period of time. In this prognostic experiment, we use the aging test dataset from [53]. Life cycle tests were conducted on 34 lithium-ion battery packs (each containing 4 batteries) under various experimental conditions. In our study, one charge-discharge process is treated as one cycle, the unit in which battery remaining useful life is measured, and the charge-discharge process was repeated throughout the test. The batteries were first charged at a constant current (CC) of 1.5 A. When the battery voltage reached 4.2 V, charging switched to constant voltage (CV) mode, and charging stopped when the charge current decreased to 20 mA. Discharge was then carried out in constant-current mode at 2 A and stopped when the battery voltage fell from 4.2 V to a cut-off discharge voltage. The tests were performed at two ambient temperatures (25 °C and 4 °C), with cut-off discharge voltages of 2.7 V, 2.5 V, 2.2 V, and 2.5 V for the respective battery packs. In each cycle, battery voltage, current, and temperature were recorded continuously during charging and discharging at varying sampling rates. The capacity of the test batteries was also measured with a Coulomb meter after each cycle.
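A Coulomb meter measures capacity by integrating current over time. The sketch below reproduces that computation from logged current samples using the trapezoidal rule. The function name and the 10 s constant-current log are hypothetical.

```python
def coulomb_count_ah(times_s, currents_a):
    """Charge passed, in Ah, by trapezoidal integration of current (A) over time (s)."""
    q_as = 0.0
    for k in range(1, len(times_s)):
        q_as += 0.5 * (currents_a[k - 1] + currents_a[k]) * (times_s[k] - times_s[k - 1])
    return q_as / 3600.0


# Hypothetical log: 2 A constant-current discharge sampled every 10 s for one hour.
times = list(range(0, 3601, 10))
capacity_ah = coulomb_count_ah(times, [2.0] * len(times))
print(capacity_ah)  # -> 2.0
```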

Battery Feature Selection and RUL Prediction
Capacity fade is the most direct indicator of battery degradation, and obtaining accurate, real-time capacity values is critical for battery SOH evaluation and prognostics [22]. When a Li-ion battery's capacity has faded by more than 30% of its rated capacity, the battery is no longer considered reliable; this is the End-of-Life (EOL) of the battery [53]. Unfortunately, obtaining real-time battery capacity data is challenging, since internal state variables are not accessible with normal sensing technologies [54]. However, battery capacity is related to several physical features that are easy to measure. Battery prognostics can therefore be achieved through capacity estimation and RUL prediction using indirect indicators from historical battery operational data.
In this work, five features are extracted from the raw aging dataset for RVM model training. Previous research indicates that capacity fade occurs in both the CC and CV charging processes, and that the time needed for CC and CV charging can be used for capacity estimation [55,56]. Accordingly, the first feature is the time taken for the battery voltage to rise from a starting voltage to the cut-off voltage in the CC charging phase ($T_{CC}$). The second feature is extracted from the CV charging phase: the time elapsed from the beginning of CV charging until the current drops to the cut-off value of 20 mA ($T_{CV}$). Similarly, the third feature is extracted from the discharge phase: the time elapsed between a starting voltage and the cut-off voltage ($T_D$). The time intervals are measured from an intermediate voltage, set in the middle of the CC charging phase, to the cut-off voltage, rather than from the absolute beginning of the charging or discharging steps. With this setting, our features mimic those available in real-world application scenarios such as hybrid vessels, where batteries are often only partially charged or discharged during operation.
Previous research [57,58] shows that the surface temperature of a battery is indicative of its thermal behavior, which affects its capacity and resistance. The last two features are therefore the battery surface temperatures during the charge ($Temp(C)$) and discharge ($Temp(D)$) processes; specifically, we use the average temperature over each charge and discharge process. Compared with battery capacity values, these five features are relatively easy to measure and collect.
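The five features can be sketched as threshold crossings and averages on one cycle's time series. The helper names and the sample cycle below are hypothetical. The sketch assumes voltage rises monotonically during CC charging and falls during discharge, and that each threshold is actually reached; otherwise a `ValueError` is raised.

```python
def _first(seq, pred, start=0):
    """Index of the first element at or after `start` satisfying pred; raises if none."""
    for k in range(start, len(seq)):
        if pred(seq[k]):
            return k
    raise ValueError("threshold never reached in this cycle")


def charge_features(t, v, i, v_start, v_max=4.2, i_cut=0.02):
    """T_CC: v_start -> v_max during CC charging; T_CV: CV start -> current <= i_cut (20 mA)."""
    a = _first(v, lambda x: x >= v_start)
    b = _first(v, lambda x: x >= v_max, a)
    c = _first(i, lambda x: x <= i_cut, b)
    return t[b] - t[a], t[c] - t[b]


def discharge_feature(t, v, v_start, v_cut):
    """T_D: time for the voltage to fall from v_start to the cut-off voltage v_cut."""
    a = _first(v, lambda x: x <= v_start)
    return t[_first(v, lambda x: x <= v_cut, a)] - t[a]


def mean_temperature(temps):
    """Temp(C) or Temp(D): average surface temperature over one charge or discharge process."""
    return sum(temps) / len(temps)


# Hypothetical 10 s samples of one charge and one discharge.
t_c = [0, 10, 20, 30, 40, 50, 60]
v_c = [3.8, 3.9, 4.0, 4.1, 4.2, 4.2, 4.2]
i_c = [1.5, 1.5, 1.5, 1.5, 0.5, 0.1, 0.01]
print(charge_features(t_c, v_c, i_c, v_start=3.9))                               # -> (30, 20)
print(discharge_feature([0, 10, 20, 30], [4.1, 3.8, 3.2, 2.6], 3.8, v_cut=2.7))  # -> 20
```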
Because the features and target data have different numeric ranges, the feature data extracted from each cycle test were normalized. For each battery pack, we use normalized feature data from a continuous sequence of cycle tests as training vectors $V = [T_{CC}, T_{CV}, T_D, Temp(C), Temp(D)]$, with the corresponding capacity $C$ as the target for RVM model training. The RVM model obtained with the EM iteration algorithm described in Section 4.3 is then used to estimate the capacity $C$ for new feature vectors $V$, and these capacity estimates are used for battery RUL prediction. In this study, battery RUL is defined as the number of cycles remaining until the battery reaches its End-of-Life (EOL) capacity. In the prediction process, once the estimated capacity reaches the battery's EOL, prediction stops, and the predicted RUL is compared with the actual RUL to evaluate model accuracy. The detailed workflow is shown in Figure 3.
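The normalization and the RUL definition can be sketched as follows. Min-max scaling is an assumption, since the text does not say which normalization was used. The EOL threshold of 70% of rated capacity follows the 30% fade criterion stated above. The same RUL function applies whether the capacities are measured or RVM-estimated, and the trajectory below is hypothetical.

```python
def min_max_normalize(column):
    """Scale one feature column to [0, 1]; assumes the column is not constant."""
    lo, hi = min(column), max(column)
    return [(x - lo) / (hi - lo) for x in column]


def rul_from_capacity(capacities, start_cycle, rated_capacity, eol_fraction=0.7):
    """Cycles from start_cycle until capacity first falls to the EOL threshold (30% fade).

    `capacities` is indexed by cycle number; returns None if EOL is not reached.
    """
    threshold = eol_fraction * rated_capacity
    for cycle in range(start_cycle, len(capacities)):
        if capacities[cycle] <= threshold:
            return cycle - start_cycle
    return None


# Hypothetical capacity trajectory (Ah) for a 2.0 Ah cell.
caps = [2.0, 1.9, 1.8, 1.6, 1.5, 1.35, 1.2]
print(min_max_normalize([1.0, 2.0, 3.0]))                          # -> [0.0, 0.5, 1.0]
print(rul_from_capacity(caps, start_cycle=1, rated_capacity=2.0))  # -> 4
```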

Algorithm Evaluation and Results
To measure the error of the predicted battery RUL, we define the absolute error $AE = |R - \hat{R}|$ and the relative error $RE = |R - \hat{R}| / R$, where $R$ is the actual RUL and $\hat{R}$ is the predicted RUL. First, we trained our RVM model for battery No. 5 on its cycle test data, selecting the 40th, 60th, and 80th cycles as starting points for prediction. The RUL prediction results for battery No. 5 are shown in Table 2. The predicted RUL errors are less than 10 cycles for all starting points, and the true RUL lies within the 90% confidence interval. Figure 4 plots the actual data and the predicted point estimates for battery No. 5.
The left-hand graph shows the prediction starting at the 60th cycle, and the right-hand graph shows the prediction starting at the 80th cycle. The results demonstrate that the model performs well in forecasting battery RUL; in particular, it returns more accurate RUL predictions when the starting cycle is later (cycle 80).
We also applied the method to other batteries to assess how well it generalizes. Table 3 shows the results for batteries No. 6, No. 7, and No. 18, which resemble those for battery No. 5. Across all four battery packs and all three starting points (the 40th, 60th, and 80th cycles), the predicted RUL lies within 10 cycles of the true RUL, and RE stays within 8.5%. These AE and RE values indicate that the RVM method performs well in long-term prediction of battery RUL, particularly when sufficient data is available for training. As expected, the later the starting cycle (e.g., cycle 80), the more accurate the RUL prediction, because more data is available for training the model.
These results show the potential of ML-based prognostics, using state-of-the-art supervised learning methods, for predicting the state of health of batteries installed in hybrid vessels.

Lead-Acid Battery Prognostics
As with lithium-ion batteries, capacity degradation is also a concern for lead-acid batteries. However, state-of-health monitoring for lead-acid batteries has received little attention in both commercial and academic spheres. To the best of our knowledge, the closest that commercial lead-acid battery monitoring has come is so-called "Ohmic" testing, in which a current is drawn from the battery cell in one or more pulses, the voltage response of the cell is recorded, and these data are used to compute its internal resistance [59].
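The resistance computation in such an Ohmic test reduces to Ohm's law applied to the voltage step caused by a current step. The sketch below is illustrative only: the function names and the sample readings are hypothetical, and commercial testers may filter and average the response differently.

```python
def internal_resistance(v_rest, v_loaded, i_rest, i_loaded):
    """Ohmic estimate R = dV / dI (volts and amps in, ohms out) for one current pulse."""
    delta_i = i_loaded - i_rest
    if delta_i == 0:
        raise ValueError("the current pulse must change the current")
    return (v_rest - v_loaded) / delta_i


def mean_internal_resistance(pulses):
    """Average the estimate over several pulses given as (v_rest, v_loaded, i_rest, i_loaded)."""
    values = [internal_resistance(*p) for p in pulses]
    return sum(values) / len(values)
```

A 50 A pulse that pulls a hypothetical cell from 12.70 V to 12.45 V gives 0.25 V / 50 A = 5 mΩ.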

Experiment Setup and Data Collection
To date, voltage response data for lead-acid batteries are rarely available. In this paper, we conduct a "DC" pulse test to obtain lead-acid battery data. In this test, a constant current is drawn from the battery cell in a single, short-duration pulse, and the test is repeated at different levels of state of charge. We measure the voltage response at the battery cell terminals and record the voltage and current. A typical voltage response over time takes the form shown in Figure 5.
The aim of this test is to predict the state of charge (SOC) of lead-acid batteries from the voltage drop duration. Since batteries with different SOC respond differently to the DC pulses, feeding the duration data into a classification model enables us to predict the actual SOC of the batteries.
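One way to turn a recorded pulse into a duration feature is sketched below. The definition is an assumption made for illustration (the time for which the terminal voltage stays more than a chosen threshold below its pre-pulse value, with uniform sampling), and `voltage_drop_duration` is a hypothetical helper rather than the processing used to build Table 4.

```python
import numpy as np


def voltage_drop_duration(t_s, v, drop_threshold_v):
    """Seconds for which v stays more than drop_threshold_v below its first sample.

    Assumes uniformly sampled data and that the first sample is taken before the pulse.
    """
    t_s = np.asarray(t_s, dtype=float)
    v = np.asarray(v, dtype=float)
    dt = t_s[1] - t_s[0]                      # uniform sampling interval
    dropped = v < v[0] - drop_threshold_v     # samples inside the voltage drop
    return float(dropped.sum() * dt)
```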
As illustrated in Figure 6, the test equipment includes the test battery, a programmable load, a programmable battery charger, custom-designed electronic signal conditioning, custom control system software, and a recording oscilloscope. The test program has the following structure: 1. Charge the battery for 10 h to ensure that it is sufficiently recharged between tests. An example of the collected data is shown in Table 4.

Data Classification and Results
The k-nearest neighbors classifier: The k-nearest neighbors (KNN) algorithm is a non-parametric method for classification and regression [60]. KNN is a form of instance-based learning: the function is only approximated locally, and all computation is deferred until classification. In both classification and regression, KNN can weight the neighbors so that nearer neighbors contribute more to the average than distant ones; a common weighting scheme assigns each neighbor a weight of 1/d, where d is its distance from the query point. The neighbors are taken from a set of objects whose class (for KNN classification) or property value (for KNN regression) is known. Although no explicit training step is required, this set of objects can be considered the training set of the KNN algorithm.
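The difference between plain majority voting and 1/d weighting can be seen in a small sketch. This is a generic KNN classifier written for illustration, with a hypothetical `knn_predict` function and toy one-dimensional data; ties are broken in favour of the label first seen among the nearest neighbors.

```python
import numpy as np
from collections import defaultdict


def knn_predict(X_train, y_train, x_query, k=3, weighted=False, eps=1e-12):
    """Classify x_query by an (optionally 1/d-weighted) vote of its k nearest neighbors."""
    X_train = np.asarray(X_train, dtype=float)
    dist = np.linalg.norm(X_train - np.asarray(x_query, dtype=float), axis=1)
    nearest = np.argsort(dist)[:k]
    votes = defaultdict(float)
    for i in nearest:
        # Unweighted: one vote per neighbor; weighted: a vote of 1/d, so closer neighbors count more.
        votes[y_train[i]] += 1.0 / (dist[i] + eps) if weighted else 1.0
    return max(votes, key=votes.get)
```

With the training points 0.0, 0.1, 0.2 (label "a") and 1.0, 1.1 (label "b"), a query at 0.95 with `k=5` is classified as "a" by majority vote (three of five neighbors), but as "b" with 1/d weighting, because the two "b" neighbors are much closer.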
Training data and classification: In our data collection process, we conducted 60 cycle tests on a brand-new lead-acid battery designed and manufactured for a hybrid vessel. Each test produces a matrix of voltage responses to the DC pulse over a short time period, and each of the nine column vectors of this matrix is labeled with the corresponding SOC. In the classification phase, an unlabeled vector is classified by assigning it the label that is most frequent among the k training samples nearest to it, which places the vector in an SOC category. The model was trained using 20 test datasets (180 labeled vectors), and the remaining 40 test datasets (360 unlabeled vectors) were used to evaluate it. The classifier achieved an average accuracy of 95.5% in a 3-fold cross validation. This allows us to monitor the SOC of the standby battery using data from a simple DC pulse test.
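The evaluation can be sketched as a grouped k-fold loop. The code below illustrates the procedure under stated assumptions and is not the authors' pipeline: folds are built from whole pulse tests so that the nine vectors of one test never appear in both the training and the evaluation set, ties in the vote go to the smallest label, and the hypothetical flag `train_on_one_fold` mirrors the split described above (train on one fold of 20 tests, evaluate on the other 40). The data in the test are synthetic.

```python
import numpy as np


def knn_classify(X_train, y_train, X_query, k=3):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    dist = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(dist, axis=1)[:, :k]
    preds = []
    for idx in nearest:
        labels, counts = np.unique(y_train[idx], return_counts=True)
        preds.append(labels[np.argmax(counts)])  # ties go to the smallest label
    return np.array(preds)


def grouped_k_fold_accuracy(X, y, groups, n_folds=3, k=3, train_on_one_fold=False, seed=0):
    """Mean accuracy over folds built from whole tests (groups).

    Standard CV trains on n_folds - 1 folds and evaluates on the held-out fold;
    train_on_one_fold=True instead trains on one fold and evaluates on the rest.
    """
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(np.unique(groups)), n_folds)
    scores = []
    for fold in folds:
        in_fold = np.isin(groups, fold)
        train = in_fold if train_on_one_fold else ~in_fold
        pred = knn_classify(X[train], y[train], X[~train], k)
        scores.append(np.mean(pred == y[~train]))
    return float(np.mean(scores))
```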

Diesel Engine Modeling
The first step in designing a hybrid diesel/electric power unit is to model the operating envelope of the diesel engine. As previously mentioned, the static optimization method translates electric power demand into an equivalent fuel power demand for the engine. To minimize fuel consumption, the diesel engine must therefore operate along its ideal operating curve [8,61-64].
For this analysis, both the power-to-engine-speed and torque-to-engine-speed curves were estimated from the engine specifications as 2nd order polynomials, and fuel consumption as a function of power and engine speed was estimated using a 3rd order polynomial. NOx emissions as a function of power and engine speed were estimated using a polynomial function previously developed at Newcastle University [63].
This section covers (1) diesel engine model optimization for fuel economy with a view towards minimizing emissions, in particular NOx emissions; (2) a short introduction to diesel engine emissions; and (3) their effect on the optimal energy function when minimizing emissions.

Diesel Engine Operation and Optimization
In a hybrid power setting with electric transmission, the gearbox can be removed and the engine used as a generator. Minimum fuel consumption is then achieved when the engine operates on its most efficient envelope curve.
To enable performance monitoring and to develop an advisory system for engine operation along the torque-speed/power-speed curve, engine fuel consumption must be expressed as a function of power and speed. In this paper, we approximate engine fuel consumption with a 3rd order polynomial with 9 coefficients, calculated from 25 engine data points derived from the engine manufacturer's Performance Diagram. This function estimates fuel consumption at different torque-speed pairs and enables a straightforward calculation of the locus of minimum fuel consumption at a given power.
The optimization of the diesel engine operating point is based on this polynomial, which provides an estimate of engine fuel consumption when the engine is operated in generator mode.
Performance mapping is traditionally based on a minimum of 50 data points [64]; because data were scarce, the fuel map derived in this model achieves an accuracy of 80%. Test data should not be confused with experimental data: test data cannot be used to model engine performance functions, but they can be used to determine how the engine is used on board the vessel throughout the various runs performed. The polynomial used has the form
$$z = f(x, y) = A + Bx + Cy + Dx^2 + Exy + Fy^2 + Gx^3 + Hx^2y + Ixy^2 \qquad (11)$$
where z is the fuel consumption in g/kWh, x the engine speed in rpm, and y the power in kW. The derived coefficient values are tabulated in Table 5. Equation (11) is plotted as a 3-dimensional surface in Figure 7 and as a contour map in Figure 8.
When the power level is constant, as in generator-mode operation, and the engine speed and torque take the values that minimize fuel consumption, we consider the engine to be at its optimum operating point. On the contour map of Figure 8, this point lies at the trough. As engine power changes from 0% to 100%, i.e., from 100 kW (power at idle speed) to 900 kW (the maximum power, reached at 2250 rpm), the optimum operating points form the line shown in red in Figure 8, so the engine must operate on this red line to achieve optimal fuel consumption. In the present application, however, the engine will be operated at approximately 40% of its maximum power. This is likely to be sub-optimal for the engine as a standalone unit, but it is the optimum operating range for the hybrid system, bearing in mind that the battery also supplies part of the power required for vessel movement.
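Both steps, fitting the nine coefficients of Equation (11) by least squares and then tracing the minimum-fuel locus, can be sketched as below. This is an illustration under assumptions rather than the procedure used to produce Table 5: it uses NumPy's least-squares solver with column scaling for numerical conditioning, a speed grid from a hypothetical 800 rpm up to 2250 rpm, and it does not enforce the engine's maximum-torque envelope. The coefficients in the test are synthetic toy values, not those of Table 5.

```python
import numpy as np


def design_matrix(x, y):
    """Nine columns matching the terms A..I of Equation (11)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.column_stack(
        [np.ones_like(x), x, y, x**2, x * y, y**2, x**3, x**2 * y, x * y**2]
    )


def fit_fuel_map(speed_rpm, power_kw, fuel_g_per_kwh):
    """Least-squares coefficients A..I; columns are scaled to improve conditioning."""
    M = design_matrix(speed_rpm, power_kw)
    scale = np.linalg.norm(M, axis=0)
    z = np.asarray(fuel_g_per_kwh, dtype=float)
    c_scaled, *_ = np.linalg.lstsq(M / scale, z, rcond=None)
    return c_scaled / scale


def fuel_map(coeffs, speed_rpm, power_kw):
    """Evaluate z = f(x, y) in g/kWh for engine speed x (rpm) and power y (kW)."""
    return design_matrix(np.atleast_1d(speed_rpm), np.atleast_1d(power_kw)) @ coeffs


def min_fuel_locus(coeffs, powers_kw, speed_range_rpm=(800.0, 2250.0), n_grid=1451):
    """For each power level, the grid speed that minimizes specific fuel consumption."""
    speeds = np.linspace(speed_range_rpm[0], speed_range_rpm[1], n_grid)
    locus = []
    for p in powers_kw:
        z = fuel_map(coeffs, speeds, np.full_like(speeds, float(p)))
        i = int(np.argmin(z))
        locus.append((float(p), float(speeds[i]), float(z[i])))
    return locus
```

At a fixed power, minimizing specific consumption in g/kWh is equivalent to minimizing the fuel flow rate, so the returned speeds trace the red line of Figure 8 for a given set of coefficients.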

Diesel Engine Emissions
Reducing emissions, particularly of carbon dioxide, is currently a global challenge. Carbon dioxide (CO₂) is considered the prime pollutant responsible for global warming. The CO₂ emissions of a diesel engine are directly related to its fuel consumption; hence, electrical hybridization of a traditional vehicle is a good way to reduce CO₂ emissions, and because CO₂ is directly proportional to the fuel consumed, the present design will be able to achieve low CO₂ emissions. However, reducing fuel consumption does not guarantee that all engine emissions are minimized: the optimal control that minimizes fuel consumption can lead to operating points with relatively high pollutant emissions, such as NOx. NOx, a generic term for nitric oxide (NO) and nitrogen dioxide (NO₂), is the most harmful to human health of all the emission gases. Thus, a second objective for the optimization algorithm is to meet pollutant emission standards.
To illustrate this, consider the test run data plotted as NOx emissions versus exhaust temperature and as NOx emissions versus engine power/engine torque, as shown in Figures 9 and 10. Because NOx gases are harmful to humans, reducing them is essential when operating in central London. It is worth noting that NOx is the only pollutant that does not depend solely on fuel consumption; its output quantity is determined by torque and fuel, as well as combustion temperature.
From Figure 11, it is apparent that NOx emissions do not depend fully on engine power. Exhaust temperature has a greater impact on NOx emissions, with lower exhaust temperatures associated with lower NOx emissions (exhaust temperature is used here because the datasets contain no direct measurement of the combustion temperature inside the engine). Because exhaust temperature depends on how long the engine operates at particular power settings (of engine torque and speed), the optimization problem becomes more complex, and its solution will in turn affect battery remaining useful life: the more the battery is used, the less time is spent in diesel operating mode. However, one must decide whether the battery will feature a plug-in option (for shore-to-ship charging) or be charged only by the diesel engine, as the latter will most likely generate relatively high NOx emissions.
In addition, it has been demonstrated that diesel mechanical propulsion is likely to lead to high NOx emissions due to high cylinder temperature (and consequently high exhaust temperature) [63]. In the proposed hybrid concept, the diesel generator runs closer to its design point, at which it typically produces less NOx or requires less fuel. Furthermore, a diesel generator runs at rated speed, whereas a mechanical propulsion engine spends much of its time at part load (lower efficiency), producing more NOx because the longer operating time results in higher exhaust temperatures, which in turn lengthen the NOx formation time. This is confirmed in Figure 12, where the NOx function is plotted against power and exhaust temperature.
As noted before, we use the polynomial method developed in [63] to calculate NOx emissions, with the form given in Equation (12), where z denotes NOx emissions (g/kWh), x the percentage of maximum power, and y the percentage of maximum engine speed. The coefficient values are tabulated in Table 6.

Whole-System Optimization of Power Use and Emissions for Hybrid Vessels
Having examined analytical methods for the individual energy assets found aboard a hybrid vessel, in this section we examine the whole-system optimization of the vessel's power system.
The method employed in this paper follows an energy management strategy that weighs energy demand against the energy that the engine/battery combination can supply. To realize the full potential of a hybrid power vessel, the power management function should minimize fuel consumption and emissions [65] while maximizing the lifetime of key assets. However, minimizing fuel consumption does not necessarily minimize emissions; for example, NOx may increase with combustion temperature regardless of actual fuel consumption. Power management functions proposed in the literature fall mainly into three categories: (1) intelligent techniques that employ control rules, fuzzy logic, or neural networks to estimate power demand and develop control algorithms; (2) static optimization, in which the electric power drawn from the battery pack is converted into an equivalent amount of fossil fuel power, and the optimization scheme then decides the optimal usage of the two power sources using a steady-state efficiency map; and (3) dynamic optimization, in which the optimization is carried out over a time horizon rather than at a single instant in time, although this can be computationally intensive for practical applications [65].
In this paper, the static optimization method was employed, first, because its point-wise nature fits the test run data and, second, because of its potential to minimize fuel consumption and emissions at the same time. Two clear benefits of such an optimization are (1) evaluation and possible extension of the remaining useful life of the asset and (2) reduction of the environmental footprint and fuel consumption of the vessel.
This section introduces our approach to the difficulties of integrating a hybrid vessel power system. We take power as the common fixed parameter of the system: the propulsion power requirement is assumed to remain the same if the current diesel power system is replaced with a hybrid system, and the vessel itself (hull shape and mass, and air resistance) is assumed to be unchanged.
Diesel engines produce torque over a range of engine speeds, many of which are sub-optimal with respect to emissions and fuel consumption, whereas electric motors produce a steady (almost linear) torque curve across a wide range of operating speeds (rpm). This presents an arbitrage-style opportunity for a hybrid control system to balance the power demand of the system against the peak power demand placed on the diesel engine.

Energy Usage
The main requirements of the system were assessed using an energy balance approach. The requirements for the system have two main threads, namely energy and power; because the two quantities are interchangeable, the power data can be used as a common thread to draw conclusions. The model represents the power usage between two time points $t_1$ and $t_2$ as a function $P(t) = f(P, t_1, t_2)$, as given in Equation (13). The incremental energy is calculated for each time interval, and the total energy used over a time period is given by the sum in Equation (14).
Using this method, the energy requirements of the two empirical datasets were determined to be 175 kWh and 343 kWh.
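The summation can be illustrated with a short sketch. Because Equations (13) and (14) are not reproduced here, the code assumes a uniformly sampled power series and a rectangle-rule sum of P·Δt; it also separates the energy of negative-power samples, which corresponds to the potential regenerative energy discussed for the Engine Only scenario. The function name and the sample values are hypothetical.

```python
import numpy as np


def energy_summary(power_kw, dt_s):
    """Return (energy used, regenerative energy) in kWh for uniformly sampled power.

    Energy in each interval is approximated as P_i * dt (rectangle rule).
    Positive samples are propulsion demand; negative samples are potential regeneration.
    """
    p = np.asarray(power_kw, dtype=float)
    hours = dt_s / 3600.0
    used_kwh = p[p > 0].sum() * hours
    regen_kwh = -p[p < 0].sum() * hours
    return float(used_kwh), float(regen_kwh)
```

For example, four half-hour samples of 100, 200, -50, and 300 kW give 300 kWh used and 25 kWh of potential regeneration.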

Battery Model
The battery model makes a decision based on the requested power output, which the skipper requests via the ship's throttle. The model takes a positive or negative power value, decides whether the battery has capacity available to supply or absorb the energy demand, and then communicates this decision to the engine controller (the workflow is detailed in Figure 13), which uses this information to decide how much energy to request from the engine. Let $P_d$ be the power demand, $P_{batt}$ the net power reported by the battery, and $P_{net}$ the power to be supplied from the battery, as defined in Equation (15).
Note that the convention used here is that P < 0 is a charging (battery absorbing) power flow and P > 0 is a discharging (battery supplying) power flow, with P = 0 as the special case of zero energy transfer.
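A single decision step of such a battery model, using this sign convention, might look like the sketch below. It is hypothetical: losses, power limits, and ramp rates are ignored, and whatever the battery cannot supply or absorb (the remainder `p_request_kw - p_batt`) is left for the engine controller.

```python
def battery_step(p_request_kw, energy_kwh, capacity_kwh, dt_h):
    """Battery power for one step (P > 0 discharge, P < 0 charge) and the updated stored energy.

    The battery meets as much of the request as its stored energy (when discharging)
    or its free capacity (when charging) allows over the step of dt_h hours.
    """
    if p_request_kw > 0:      # discharge request: limited by stored energy
        p_batt = min(p_request_kw, energy_kwh / dt_h)
    elif p_request_kw < 0:    # charge request: limited by remaining headroom
        p_batt = max(p_request_kw, -(capacity_kwh - energy_kwh) / dt_h)
    else:                     # zero energy transfer
        p_batt = 0.0
    return p_batt, energy_kwh - p_batt * dt_h
```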

Modeling Assumptions
It is worth noting that, in developing our model, we make several assumptions when approximating the physics of the problem. These would require further consideration in the development and implementation of a final system. 1. The model does not account for the latency of power output from the engine and battery; power demand at any level is assumed to be provided instantaneously. 2. According to the data sheets, the battery can supply the maximum power demand observed in the data (750 kW); thus, we did not introduce any maximum power constraint on the battery. If the battery specification changes, additional constraints will be needed to keep the battery within its rated working limits. 3. In the absence of information on the power converter and electric motor controller for the system, we have so far assumed that the constraints they impose will not affect the operational algorithms.
To make recommendations about the form of system to be used for a Hybrid Fusion Energy System, we examine a number of cases using the modeling script developed in this work. The cases are described below: Engine Only: This is our baseline case, taken from the empirical data, which contextualizes the other simulations; its results are used to characterize the improvement or deterioration of system performance in each of the other cases.
Battery Only: In this case, we assess the performance of a system operating solely on battery power, i.e., a purely electric-propulsion vessel. This is clearly a "best-case" scenario from the perspective of local emissions, and the battery prognostic methods developed in Section 4.1 also apply directly to purely electric vessels. However, large-scale adoption of purely battery-powered ships remains elusive in the short term, mainly for economic reasons: batteries with the power rating and energy density required to run an exclusively electric vessel remain expensive, so most practical attention has focused on hybrid ships. Nevertheless, this case is a useful benchmark.
Micro-cycling: In this scenario, both engine and battery are used according to the rules described in Section 5.2. We set a pre-defined engine power set-point; any demand above this set-point is supplied by the propulsion battery, and when power demand is below the set-point, the excess engine power is used to charge the battery until it is full.
Full-cycling: In this case, the rule is similar to the battery only case, except that we place a constraint on the system during battery charge and discharge: the system keeps using the battery until the battery pack is fully discharged, after which the battery only absorbs excess power from the engine until the pack is fully charged. The system continues in this manner, ensuring that the battery only undergoes full charge cycles.
Notably, the two cycling (micro and full) scenarios are both integrated hybrid models, meaning the battery works in conjunction with an engine at a chosen engine power set-point.
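The two cycling rules can be compared with a toy dispatch loop. This is a hypothetical sketch, not the modeling script used in this work: it assumes lossless storage, no ramp limits, and that in full-cycling mode the battery covers demand above the set-point and otherwise idles until it is empty, then only absorbs surplus engine power until it is full. The engine covers whatever the battery does not.

```python
def dispatch(demand_kw, setpoint_kw, capacity_kwh, dt_h=1.0, mode="micro", initial_kwh=None):
    """Toy set-point dispatch; returns (engine_kw, stored_kwh) per step.

    Battery power > 0 means discharging, < 0 charging (lossless, no ramp limits).
    mode="micro": the battery may switch between charging and discharging every step.
    mode="full": the battery discharges until empty, then only charges until full.
    """
    energy = capacity_kwh if initial_kwh is None else initial_kwh
    discharging = True  # full-cycling state; ignored in micro-cycling
    engine_kw, stored_kwh = [], []
    for demand in demand_kw:
        request = demand - setpoint_kw  # > 0: deficit for the battery; < 0: surplus to absorb
        if mode == "full":
            if discharging and energy <= 0.0:
                discharging = False  # battery empty: switch to charging
            elif not discharging and energy >= capacity_kwh:
                discharging = True  # battery full: switch to discharging
            if (request > 0) != discharging:
                request = 0.0  # request opposes the current state: battery idles
        if request > 0:
            p_batt = min(request, energy / dt_h)  # limited by stored energy
        else:
            p_batt = max(request, -(capacity_kwh - energy) / dt_h)  # limited by headroom
        energy -= p_batt * dt_h
        engine_kw.append(demand - p_batt)  # engine covers the remainder
        stored_kwh.append(energy)
    return engine_kw, stored_kwh
```

With a 100 kW set-point, a 100 kWh battery, and demands of 150, 50, 150, and 150 kW, micro-cycling holds the engine at 100 kW throughout, whereas full-cycling lets the battery idle during the low-demand step and then forces the engine to supply the full 150 kW once the battery is empty.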

Engine Only Scenario
A power demand profile was created for the hybrid vessels from two different data collection trials, summarized in Table 7. As the empirical data were recorded from a vessel powered solely by a diesel engine, this case is taken as the ground truth. The table shows the total energy used $\sum E$, the mean power demand $\bar{P}$, the maximum power demand $P_{max}$, and the total available regenerative energy. (Negative power values were observed in the data, and the corresponding energy was summed to give the potential regenerative energy available; it represents a small percentage of the total energy, 1.5% and 0.8%.) The distribution of the power demand is shown in the histograms in Figure 14. Power is used in a somewhat on/off fashion, with little demand between maximum and zero power. This suggests that there will be time for a battery to absorb charge at high current and then many opportunities to provide power at high current.
The raw power demand data for both trials are plotted in Figure 15 for comparison. The 29.06.16 dataset contains more peaks than the 07.12.16 dataset, although both trials followed the same route. The power demand therefore varies over the same route, although its distribution remains similar (see Figure 14). This represents a 32% difference in total power demand between the two trials, likely due to differences in usage and piloting style between the two captains steering the vessel. Nevertheless, we consider this a positive, as it allows us to test our algorithms under different operating conditions.

Battery Only Scenario
The battery only scenario was simulated by setting both the battery storage capacity and the initial storage state to 800 kWh, with the engine power set to 0 kW. The results are shown in Figure 16.
The average power demand from the datasets can be used to calculate the battery endurance (run time on a full charge) under these conditions. For the recommended battery specification of 800 kWh usable capacity, the endurance is $t = C / \bar{P}$, where $C$ is the usable battery capacity and $\bar{P}$ the mean power demand, giving 3.5 h (29.06.16, $\bar{P}$ = 228 kW) or 5.8 h (07.12.16, $\bar{P}$ = 137 kW).

Micro-Cycling Scenario
One of the key assumptions behind our model is that the power rise and fall times of the battery are negligible. The algorithm therefore assumes that the battery can provide any power magnitude required of it at any moment, regardless of the change in power flow needed to achieve this. Formally, the model assumes $dP/dt \in (-\infty, \infty)$, whereas in the real world $dP/dt \in [-\alpha, \beta]$, where $\alpha$ and $\beta$ are empirically measured limits on the rates at which the battery's power output/input can fall and rise. Crucially, the same assumption is made for the diesel engine; a full model would have to account for the lag in the engine's power increase to improve the power matching of the system. An overview of the behavior emerging from this regime can be seen in the plots in Figure 17.
Allowing the battery to micro-cycle gives the greatest flexibility in system performance: the battery can quickly provide peaks of power when required while the diesel engine continues generating power at its maximum-efficiency operating point. This reduces the emissions associated with changes in power, because an engine emits more while changing operating points, as it is forced through inefficient ranges to meet the power demand placed on it. Similarly, when power demand is low, the engine can be kept at its optimum set-point by using any excess power to recharge the battery system.
In comparison with the full cycling case in Section 5.6, this system gives a better balance of asset usage.
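Relaxing the instantaneous-response assumption amounts to clipping each step change in power to the interval [-α·Δt, β·Δt]. The sketch below is illustrative only; `rate_limit` and the rates used in the test are hypothetical, and in a full model the gap between requested and delivered power would have to be covered by the other power source.

```python
def rate_limit(request_kw, dt_s, fall_kw_per_s, rise_kw_per_s):
    """Delivered power when dP/dt is limited to [-fall, +rise] (alpha = fall, beta = rise)."""
    delivered = [float(request_kw[0])]  # assume the first request is met exactly
    for p in request_kw[1:]:
        step = p - delivered[-1]
        # Clip the change over one time step to the allowed fall/rise.
        step = min(max(step, -fall_kw_per_s * dt_s), rise_kw_per_s * dt_s)
        delivered.append(delivered[-1] + step)
    return delivered
```

A request stepping from 0 to 100 kW and back with a 40 kW/s rise limit and a 60 kW/s fall limit is delivered as 0, 40, 80, and 20 kW: the output lags on the way up and overshoots the request on the way down.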

Full Battery Cycling Scenario
This case requires the engine to provide high power outputs frequently, up to and including full power. This is caused by the restriction that the battery cannot switch between charging and discharging on the fly: while the battery is in the charging state, it spends much of its time inactive because the vessel's high power demand leaves insufficient excess energy to charge it, and the engine is left operating outside its optimal operating envelope.
A positive feature of this system is that battery cycles can be quantified easily, because each switch of charging state represents a half cycle of the battery (assuming that a battery cycle is defined as the battery going full → empty → full). The behavior of this regime can be seen in the battery state shown in Figure 18, where the battery capacity was set to 10 kWh purely for illustrative purposes. The net engine power shows that the engine is at times required to provide the full power demand of the system.
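The half-cycle counting rule translates into a count of state switches. The sketch assumes that the battery state is logged per time step as "charge" or "discharge" (a hypothetical format).

```python
def count_full_cycles(states):
    """Full cycles from a sequence of 'charge'/'discharge' states.

    Each switch between states is half a cycle (full -> empty -> full is one cycle).
    """
    switches = sum(1 for prev, cur in zip(states, states[1:]) if prev != cur)
    return switches / 2
```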

Results
The complexity of the hybrid system raises a growing number of design choices for a diesel generator/battery combination. Optimizing the operational profile is essential to the design of the energy management function, in order to minimize fuel use, emissions, and battery degradation due to micro-cycling.
In this section, we analyze the potential of adding a battery in parallel to the existing diesel engine to explore possible fuel savings. The suggested retrofit (i.e., keeping the existing diesel engine specification) is by no means an efficient solution; however, it highlights the benefits and possible cost savings of replacing the existing diesel engine with a completely new hybrid system architecture. The results of the analysis are given in Table 8. Our preliminary results show that fuel savings of 70% to 80% can be achieved. This is because, when power demand is low (for example, during cruising), the diesel engine is operated at up to 350 kW, the maximum power output that ensures optimal operation, and surplus energy is stored in the battery whenever power demand is below 350 kW. When power demand exceeds 350 kW, energy from the battery is used; in practice, such demand would be seen, for example, during docking maneuvers.

Engine/Battery Pair Matching
Matching engine power and battery capacity is clearly crucial in real-world operation. For example, a single run of the vessel (29.06.16) required 343 kWh. This energy is considered fundamental, as it results from the prime moving force of the vessel throughout the run. If there is no plug-in charging for the vessel, then over repeated runs all propulsion energy must ultimately come from the engine, so the engine must be sized such that $P_{eng} \geq \bar{P}$, the mean power demand over the run. From a practical perspective, the engine should be slightly oversized to allow an operational margin. Our simulation results suggest that an engine size of 300 kW and a usable battery capacity of 200 kWh would suffice for a functioning hybrid power system in our case. The plotted results from this simulation are shown in Figure 17.

Challenge and Future Work
Limitations and costs of adding a new hybrid system could relate to retrofitting it on "conventional" vessels. Challenges in implementing a hybrid propulsion system to save fuel could include interoperability and the need for larger battery installations than conventional propulsion systems require. It remains an open question whether, and how, the component parts and control systems of a hybrid vessel could be designed to fit the needs of vessel operators [66]. In addition, a hybrid system could require large or distributed battery installations that reduce cargo space. Furthermore, there would be installation costs and potentially further investment in vessel redesign and customization to accommodate the space and weight distribution of the batteries.
With respect to future research, areas of investigation that could further optimize system performance include: 1. Dynamic engine operating set-points, which would allow the control system to change the engine set-points in order to further reduce output emissions. 2. A multi-regime system, which might change the operation of the hybrid system according to the operation of the vessel; for example, a docking mode could switch the system to run on 100% battery during the docking stages of a journey. 3. Data-driven models that are more practical for live battery prognostics; our battery prognostic model is currently being geared towards outputting live RUL predictions from dynamic operational data, especially partial charge and discharge data.

Conclusions
The marine sector faces global pressure to reduce its CO₂ and NOx emissions and to increase fuel efficiency, which has led to a surge of interest in hybrid vessels. However, as this work shows, a successful transition to hybrid vessels across the marine sector requires an understanding of the operational, environmental, and economic risks and benefits associated with these systems.
The methodology proposed in this paper addresses limitations in the literature relating to operational prognostics for key assets, system-level optimization of energy performance, and the integration of environmental, energy demand, and asset health key performance indicators. We used a state-of-the-art supervised learning algorithm to predict lithium-ion battery RUL with relative errors below 8.5% for multiple test battery packs, demonstrating the potential of such an algorithm for on-board battery prognostics. We also conducted life cycle tests on a sample of lead-acid batteries used as stand-by power sources on a hybrid vessel. The results showed that a k-nearest neighbors classifier can achieve an average accuracy of 95.5% in predicting the SOC of lead-acid batteries.
Another key contribution of our work is the design and validation of an autonomous hybrid marine propulsion strategy. Incorporating operational data and automatic multi-objective global optimization for performance (energy), asset health, and environmental metrics, we demonstrated that a fuel saving of 70% to 80% can be achieved by running the diesel engine when the power demand is below 350 kW, storing surplus energy in the battery, and drawing on the battery when the power demand exceeds 350 kW.
Based on our findings, we make the following recommendations to vessel operators. First, it would be valuable to use the Internet of Things (IoT) to capture operational data on vessel performance; a cost-effective hybrid propulsion system requires a multi-asset optimization architecture, such as the one developed in this paper. Second, a cloud platform could add value to the monitoring of hybrid vessel performance trends. Future research will involve determining set-points for dynamic engine operation for emission control, and using machine learning algorithms to learn and optimize vessel operations during different stages of a cruise journey, such as docking, for automated control decision-making.