Adaptive Online Sequential Extreme Learning Machine with Kernels for Online Ship Power Prediction

Abstract: With the deep penetration of renewable energy into shipboard power systems, the uncertainty of its output power and the variability of sea conditions pose severe challenges to the control of the shipboard integrated power system. In order to provide additional accurate signals to the power control system and eliminate the influence of uncertain factors, this study proposes an adaptive kernel-based online sequential extreme learning machine to accurately predict shipboard electric power fluctuation online. Three adaptive factors are introduced, which control the kernel function scale adaptively to ensure the accuracy and speed of the algorithm. Electric power fluctuation data from a real ship under two different sea conditions are used to verify the effectiveness of the algorithm. The simulation results demonstrate that, for ship power fluctuation prediction, the proposed method can not only meet the speed requirements of a real-time control system, but also provide accurate prediction results. Future work will study the combination of the proposed prediction algorithm with traditional power system controllers, such as generator controllers. The prediction algorithm can replace the traditional compensator and provide additional control signals for the coordinated power control of the ship power system, so as to eliminate the problems caused by power uncertainty, such as frequent power fluctuation.


Background and Motivation
With the electrification of ship power systems and the penetration of renewable energy, the integrated power system (IPS) incorporating renewable generation, such as photovoltaic (PV) power generation, has become an important development direction of ship electrification [1,2].
In order to provide stable power quality for the shipboard IPS, it is of great importance to ensure power balance and stability between the supply side and the demand side of the shipboard power system [3,4]. In practice, however, due to uncertain factors such as wind, temperature, waves and weather, the output power of the PV system and the ship load demand are volatile, which makes it difficult to stabilize the power quality of the system. It is therefore essential to mitigate the influence of uncertainty on the power system.
Several studies have addressed the impact of uncertainty on the power system. Much of the research on improving power system control focuses on building more accurate non-linear models of the power system [5]. In [6], a graphical user interface (GUI) is developed in which fault alarms appear on a real-time status monitor whenever a fault occurs in the actual PV plant. Other researchers use intelligent algorithms as control approaches, such as fuzzy logic [7], artificial neural networks [8,9] and optimisation algorithms [10]. Because of the instability of renewable energy and the real-time change of load, rapid and random power fluctuations make signal estimation more difficult, which significantly affects the power quality control achieved by these methods. Predicting power fluctuation to provide an accurate reference for ship power system controllers, and thereby eliminate uncertainty, therefore becomes significant.

Literature Review
In [11], a short-term nonparametric probabilistic method for PV power prediction is presented. Based on the long short-term memory (LSTM) recurrent neural network (RNN), Ref. [12] solves the short-term load power prediction problem for individual residential households. The back propagation neural network (BPNN) is regarded as a classical method for forecasting [13]. The accuracy of these prediction methods is high. However, their complex network structures make it difficult to predict the real-time fluctuation of a power system. This paper therefore aims to design an algorithm that can predict the power fluctuation of the integrated power system of a ship and lay a foundation for the control of the power system.
Huang et al. proposed a machine learning algorithm for single-hidden-layer feedforward neural networks, the extreme learning machine (ELM) [14]. Owing to its fast operation and wide applicability, ELM has been successfully used in many applications [15,16], and many variants have been developed. Online sequential ELM (OS-ELM) [17] and incremental ELM (I-ELM) [18] are proposed for online sequential data. An ensemble forecasting model based on the extreme learning machine algorithm (En-ELM) is designed in [19] to predict ultra-short-term power fluctuations, which serve as an extra signal for automatic generation control. However, that prediction algorithm works offline. Although it can avoid the randomness of ELM and improve prediction accuracy and stability, its network structure is complex and always requires a lot of calculation. Drawing on the support vector machine (SVM), an online sequential learning algorithm for regularized ELM (OS-RELM) is presented in [20], and a regularized OS-ELM with an adaptive regulation factor is proposed for time-varying nonlinear systems in [21]. An outlier-robust OS-ELM based on hybrid decomposition is developed for wind speed forecasting in [22].
In [23], a sequential ELM using grey prediction is proposed for the online ship roll motion prediction problem. To limit the scale of online prediction training, a forgetting mechanism is introduced into OS-ELM [24]; on the basis of [24], the researchers in [25] introduce a forgetting factor, which further increases the prediction accuracy of the algorithm. The above methods ensure that the algorithm can realize online sequence prediction. However, online prediction based on OS-ELM has difficulty capturing the characteristics and trend of online sequence data accurately, so the prediction error is large.
In order to reduce the prediction error, OS-ELM with kernel function (KOS-ELM) [26] is proposed and has been developed further. In [27], a KOS-ELM is proposed for non-stationary time series prediction. However, as online prediction proceeds, the scale of the kernel function in the algorithm keeps expanding, which increases the computational complexity. Although the prediction accuracy is guaranteed, the longer prediction time cannot meet the requirements of online prediction. Based on [28], an online sequential extreme learning machine with reduced kernel function (OS-RKELM) is developed, which can reduce the growth of the kernel function during online prediction [29]. However, OS-RKELM cannot be applied to online prediction of time series, because its random selection of data disturbs the time correlation of the data.

Main Work and Paper Organization
Based on the above research, this paper proposes an adaptive KOS-ELM (AKOS-ELM) that improves prediction speed while retaining enough accuracy to predict online power fluctuations and give the power system control an accurate reference. Adaptive factors are introduced into KOS-ELM to ensure the prediction accuracy and speed up the prediction algorithm. By comparing similarity and prediction error, the scale of the kernel function in the online prediction process is limited to speed up the operation, and the time correlation between data is kept to ensure the prediction accuracy. To verify the effectiveness of the proposed algorithm, this paper tests it on power fluctuation data recorded by ships sailing in different sea conditions. The prediction performance is compared with OS-ELM, KOS-ELM, En-ELM and LSTM.
The remainder of this paper is organised as follows. In Section 2, ELM, KELM, OS-ELM and KOS-ELM are introduced together with the derivations on which they are founded. Section 3 presents the proposed AKOS-ELM algorithm and explains the adaptive factors and how they are calculated. The simulation of online sequence prediction based on AKOS-ELM is demonstrated in Section 4, where the performance of the algorithm is discussed. Conclusions and future work are given in Section 5.

Brief Review of ELM Methods
This section presents an overview of the ELM and its KELM and OS-ELM extensions. It provides the necessary background for the development of the adaptive improvement in the next section.

The ELM
ELM was proposed as a feedforward neural network with a single hidden layer [14]. In ELM, only the number of hidden neurons is predefined; the parameters of the hidden neurons need not be fine-tuned and are assigned randomly. Consider a training set $\{(x_i, t_i)\}_{i=1}^{N}$ with $x_i \in \mathbb{R}^{d}$ and $t_i \in \mathbb{R}^{m}$, where $N$ is the number of data, $d$ is the input dimension and $m$ is the number of output nodes. The output function of ELM is given as
$$f(x) = \sum_{i=1}^{L} \beta_i \, h(\Omega_i \cdot x + b_i) \qquad (1)$$
where $h(\cdot)$ is the activation function, $L$ is the number of hidden nodes, $\Omega_i$ is the input weight vector of the $i$th additive node, $b_i$ is the bias of the $i$th additive node and $\beta = [\beta_1, \ldots, \beta_L]^{T}$ is the output weight matrix, which can be obtained through the Least Squares Estimate (LSE) method as given in Equation (2):
$$\beta = H^{\dagger} T \qquad (2)$$
where $T$ is the target matrix and $H^{\dagger}$ is the Moore-Penrose generalized inverse of the matrix $H$. Here $H$ denotes the output matrix of the hidden layer, which maps the data from the $d$-dimensional input space to the $L$-dimensional hidden-layer feature space.
The original ELM can also be solved as a constrained L2-regularized optimization problem [23]. The solution for $\beta$ is then obtained through the Karush-Kuhn-Tucker (KKT) conditions.
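As an illustration of Equations (1) and (2), the following is a minimal sketch of basic ELM training with NumPy. The function names `elm_fit` and `elm_predict`, the sigmoid activation, the uniform range [−1, 1] for the random parameters and the toy sine data are assumptions made for the example; they are not taken from the experiments in this paper.

```python
import numpy as np

def elm_fit(X, T, n_hidden, rng):
    """Basic ELM: random hidden layer, least-squares output weights."""
    d = X.shape[1]
    # Input weights and biases are drawn at random and never tuned (assumed range).
    W = rng.uniform(-1.0, 1.0, size=(d, n_hidden))
    b = rng.uniform(-1.0, 1.0, size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))   # hidden-layer output matrix, N x L
    beta = np.linalg.pinv(H) @ T             # beta = H^dagger T, as in Equation (2)
    return W, b, beta

def elm_predict(X, W, b, beta):
    """Evaluate Equation (1) for every row of X."""
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(200, 1))    # toy inputs, not ship data
T = np.sin(3.0 * X)                          # toy targets
W, b, beta = elm_fit(X, T, n_hidden=30, rng=rng)
train_rmse = float(np.sqrt(np.mean((elm_predict(X, W, b, beta) - T) ** 2)))
baseline_rmse = float(np.sqrt(np.mean(T ** 2)))   # error of always predicting zero
```

The only trained quantity is `beta`; the hidden layer is fixed after random initialization, which is what makes ELM training a single linear least-squares solve.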

The KELM
In [29], connections with SVM and other kernel methods become apparent when, instead of an explicit feature vector $h(x)$, which need not be known, a kernel function is defined by applying Mercer's conditions to ELM [15,16]:
$$K(x_i, x_j) = h(x_i) \cdot h(x_j) \qquad (5)$$
The expression in Equation (1) can then be represented in kernel-based form. In [28], an RKELM method is presented that uses a reduced kernel matrix instead of the full kernel matrix to build the model. The SLFN with kernel function $K(\cdot, \cdot)$ and $L$ support vectors is then written in terms of the reduced kernel matrix $\Omega_{N \times L} = K(X, X_L)$ and the output weight matrix $\beta = [\beta_1, \beta_2, \ldots, \beta_L]$.
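The kernel-based form can be sketched as follows. This uses the regularized solution commonly given in the ELM literature, in which the coefficients solve $(I/C + K)\alpha = T$; the regularization symbol `C`, the kernel width `gamma`, the function names and the toy data are assumptions for illustration and are not confirmed by the text above.

```python
import numpy as np

def gaussian_kernel(A, B, gamma):
    """K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def kelm_fit(X, T, C, gamma):
    """Solve (I/C + K) alpha = T; no explicit hidden layer is needed."""
    K = gaussian_kernel(X, X, gamma)
    return np.linalg.solve(np.eye(len(X)) / C + K, T)

def kelm_predict(X_new, X_train, alpha, gamma):
    """Prediction is a kernel expansion over all stored training samples."""
    return gaussian_kernel(X_new, X_train, gamma) @ alpha

rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(100, 1))    # toy inputs, not ship data
T = np.sin(3.0 * X)
alpha = kelm_fit(X, T, C=1e3, gamma=5.0)
fit_rmse = float(np.sqrt(np.mean((kelm_predict(X, X, alpha, gamma=5.0) - T) ** 2)))
```

Note that `kelm_predict` needs every training sample, so the cost of the model grows with the amount of data kept. The reduced kernel matrix $K(X, X_L)$ of RKELM addresses this by keeping only $L$ support vectors.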

The OS-ELM and KOS-ELM
The online sequential version of the ELM, OS-ELM, is well established and widely used [17]. OS-ELM sequentially updates the inverse matrix $(H^{T}H)^{-1}$ and the output weights $\beta$ in Equation (2) using the Woodbury formula.
Suppose that the data set is presented in successive chunks $S_j$. Given an initial training chunk $S_0$ containing $N_0$ instances, the initial output weight $\beta^{(0)}$ is the minimizer of the training error on $S_0$. When both chunks $S_0$ and $S_1$ are considered, the updated output weight $\beta^{(1)}$ is obtained in the same way and can be expressed as a function of $\beta^{(0)}$ by substituting Equations (11) and (12) into Equation (10).
In general, the $(k+1)$th chunk of data arrives as $S_{k+1} = \{(x_i, t_i)\}_{i=(\sum_{j=0}^{k} N_j)+1}^{\sum_{j=0}^{k+1} N_j}$, $k \ge 0$, where $N_{k+1}$ denotes the number of observations in the $(k+1)$th chunk. Generalizing the previous arguments gives the update of $\beta^{(k+1)}$ in Equation (14). Using the Woodbury formula, $K_{k+1}^{-1}$ can be expressed as a function of $K_{k}^{-1}$; letting $P_{k+1} = K_{k+1}^{-1}$, Equation (14) for updating $\beta^{(k+1)}$ can be rewritten in terms of $P_{k+1}$. Substituting the kernel function expression into this recursion yields the kernel-based recursive formula for $\beta^{(k)}$, which is the KOS-ELM algorithm [26].
In [26,27], experimental studies show that KOS-ELM gives better test performance than SVM/LS-SVM/PSVM and OS-ELM. However, as new chunks of data are received, the kernel matrix keeps growing, which leads to rapid growth of the training time complexity over the whole online learning process. To cope with online data, a reduced KOS-ELM is proposed in [29], which randomly selects a certain amount of data from each chunk to replace the original data set for training and thus provides a reduced-scale OS-ELM with kernel.
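The recursion can be made concrete with the standard OS-ELM update as it is usually stated in the literature [17], namely $P_{k+1} = P_k - P_k H_{k+1}^{T}(I + H_{k+1} P_k H_{k+1}^{T})^{-1} H_{k+1} P_k$ and $\beta^{(k+1)} = \beta^{(k)} + P_{k+1} H_{k+1}^{T}(T_{k+1} - H_{k+1}\beta^{(k)})$. The sketch below assumes this unregularized form; the function names and the random stand-in data are hypothetical. It checks that processing the data chunk by chunk reproduces the batch least-squares solution.

```python
import numpy as np

def os_elm_init(H0, T0):
    """Initial phase: P_0 = (H_0^T H_0)^{-1}, beta_0 = P_0 H_0^T T_0."""
    P = np.linalg.inv(H0.T @ H0)
    return P, P @ H0.T @ T0

def os_elm_update(P, beta, H, T):
    """Fold one new chunk (H, T) into P and beta via the Woodbury identity."""
    S = np.eye(H.shape[0]) + H @ P @ H.T       # small system, one row per new sample
    P = P - P @ H.T @ np.linalg.solve(S, H @ P)
    beta = beta + P @ H.T @ (T - H @ beta)
    return P, beta

rng = np.random.default_rng(2)
H_all = rng.normal(size=(60, 5))    # stand-in hidden-layer outputs: 60 samples, L = 5
T_all = rng.normal(size=(60, 1))

P, beta = os_elm_init(H_all[:20], T_all[:20])
for start in range(20, 60, 10):     # four further chunks of 10 samples each
    P, beta = os_elm_update(P, beta, H_all[start:start + 10], T_all[start:start + 10])

beta_batch = np.linalg.lstsq(H_all, T_all, rcond=None)[0]
max_gap = float(np.max(np.abs(beta - beta_batch)))
```

Each update inverts only a matrix whose size equals the chunk length, so the cost per chunk does not depend on how much data has already been seen. In the kernel-based variant this advantage is lost, because the kernel matrix itself grows with every retained sample.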

The Proposed AKOS-ELM Algorithm
During the online learning process, KOS-ELM and OS-RKELM treat all samples equally. For online time series, however, new and old data differ in their ability to reflect the overall characteristics and trend. Considering the differences and timeliness of the samples, an adaptive KOS-ELM is proposed on the basis of KOS-ELM to maintain the size of the kernel matrix while ensuring prediction accuracy and calculation speed.

The AKOS-ELM
Learning starts from the initial training chunk $S_0$; based on Equation (7), the initial output weight matrix $\beta^{(0)}$ is obtained. When another chunk of data $S_1$ arrives, where $N_1$ is the number of data samples in the new chunk, the output matrix of the hidden layer is obtained and the corresponding output weight becomes $\beta^{(1)}$.
Generally speaking, new data describe the current trend better. When the old and new data are similar, or contribute similarly to reflecting the characteristics of the data, the size of the kernel matrix can be maintained and the calculation speed ensured by introducing an adaptive factor $\mu$ that adjusts the weights of the new and old chunks of data. The validity period of any sample chunk is determined by the memory factor $n$. The output weight $\beta^{(1)}$ is therefore expressed with the factor $\mu$, and with an appropriate substitution Equation (19) can be transformed into a more compact form.
In general, consider the arrival of the $(k+1)$th chunk $S_{k+1}$, where $k \ge 0$ and $N_{k+1}$ denotes the number of observations in the $(k+1)$th chunk. While the number of stored chunks has not reached the memory factor, $k \le n$, $Z_{k+1}$ and $\beta^{(k+1)}$ are written in the same way. Applying the Woodbury formula gives the update formula for $Z_{k+1}^{-1}$; letting $G_{k+1} = Z_{k+1}^{-1}$, the equations for updating $\beta^{(k+1)}$ follow. When the number of stored chunks reaches the limit, $k \ge n$, the factor $\varepsilon$ decides whether the next arriving chunk is kept and the oldest chunk is discarded.
Here, assume that $S_k$ is acquired and the $(k-u)$th chunk of data $S_{k-u}$ is discarded. The expression in Equation (18) is then rewritten for the retained chunks. When the $(k+1)$th data chunk $S_{k+1}$ is received, the output weight $\beta^{(k+1)}$ is obtained accordingly, and Equation (24) can be rewritten in recursive form.

The Adaptive Factors
In the previous part, two adaptive factors, $n$ and $\mu$, are used to improve the adaptability of the algorithm. The factor $n$ maintains the size of the online training data set and the size of the kernel function for the proposed algorithm, while the adaptive factor $\mu$ adjusts the weights of the old and new chunks in the online process. For the proposed algorithm to work well, both factors must be able to update themselves. The role of the adaptive factors in the algorithm is shown in Figure 1. Two elements are introduced to make $\mu$ adaptive, $\lambda$ and $E$. The element $\lambda$ denotes the time weight in the current kernel function matrix ($0 < \lambda < 1$): the newer the data chunk, the larger the weight. The other element, $E$, reflects the prediction errors of the training instances on the previous model. The expression of the factor $\mu$ can then be written as an exponential function of $\lambda$ and $E$, given in Equation (28).
Since $\lambda_k$ ranges from 0 to 1 and $E_k > 0$, the exponent satisfies $-\lambda_k E_k < 0$. The right-hand side of Equation (28) therefore lies between 0 and 1, which ensures that the adaptive factor $\mu$ ranges from 0 to 1.
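The range argument can be checked numerically. The sketch below assumes that Equation (28) has the form $\mu_k = \exp(-\lambda_k E_k)$, which is consistent with the description above but is not confirmed by the text; the function name `adaptive_weight` and the sample values are hypothetical.

```python
import math

def adaptive_weight(time_weight, prediction_error):
    """Assumed form of Equation (28): mu = exp(-lambda_k * E_k)."""
    if not 0.0 < time_weight < 1.0:
        raise ValueError("time weight must lie strictly between 0 and 1")
    if prediction_error <= 0.0:
        raise ValueError("prediction error must be positive")
    # The exponent is negative, so the result always falls in (0, 1).
    return math.exp(-time_weight * prediction_error)

mu_small_error = adaptive_weight(0.5, 0.1)   # previous model still fits well
mu_large_error = adaptive_weight(0.5, 2.0)   # previous model fits poorly
```

Under this assumed form, a larger prediction error of the previous model gives a smaller $\mu$, so the weighting reacts more strongly when the old model no longer describes the data.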
Similarly, the value of $n$ is related to two elements: the maximum and minimum scale of the kernel function, denoted $n_{\max}$ and $n_{\min}$. The values of the two elements can be obtained by solving the following optimization problem before the online prediction algorithm begins:
$$\min_{n_{\max},\, n_{\min}} \left\| Y(n_{\max}, n_{\min}) - T \right\| \qquad (29)$$
where $Y$ denotes the prediction result of the current training chunk and $T$ is the corresponding target.
To maintain the size of the kernel function and update the data used for training in the kernel function, an adaptive factor $\varepsilon$ is employed to determine whether to retain or replace data sets by calculating the similarity between old and new data. Similarity is measured by the Euclidean distance $D_k$ between the $k$th sample chunk and the $(k+1)$th sample chunk.
When $D_k \le \varepsilon$, the new chunk is considered similar to the current data. Because the new data better reflect the current trend, the new sample chunk is retained and the oldest one is removed to maintain the current kernel size. When $D_k > \varepsilon$, the new data chunk brings new data characteristics, so it is retained directly until the kernel reaches the maximum scale $n_{\max}$; after that, the oldest data chunk is removed as in the previous case. In general, $\varepsilon$ can be given by a normalized empirical value; in this paper $\varepsilon = 0.5$.
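The retain-or-replace rule can be sketched as a sliding window over chunks. This is a simplified illustration under stated assumptions: chunks are plain lists of equal length, the distance is taken to the most recently stored chunk, and the lower bound $n_{\min}$ is ignored. The names `chunk_distance` and `update_window` and the sample values are hypothetical.

```python
import math
from collections import deque

def chunk_distance(a, b):
    """Euclidean distance between two equally sized sample chunks."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def update_window(window, new_chunk, eps, n_max):
    """Apply the retain-or-replace rule to the chunks behind the kernel matrix."""
    similar = chunk_distance(window[-1], new_chunk) <= eps
    if similar or len(window) >= n_max:
        window.popleft()          # drop the oldest chunk so the kernel does not grow
    window.append(new_chunk)      # the newest chunk is always kept
    return window

window = deque([[0.0, 0.1], [0.1, 0.2]])
update_window(window, [0.15, 0.25], eps=0.5, n_max=3)   # similar: size stays at 2
size_after_similar = len(window)
update_window(window, [0.9, -0.8], eps=0.5, n_max=3)    # dissimilar: size grows to 3
size_after_new = len(window)
update_window(window, [-0.9, 0.9], eps=0.5, n_max=3)    # dissimilar but full: oldest dropped
```

The window only grows when a chunk carries new characteristics, and never beyond `n_max`, which is what bounds the kernel matrix size during online prediction.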

Setup
The shipboard load electric power data used in this paper were obtained during the sea trials of a 146 M Multi-Purpose Offshore Carrier; Table 1 gives some important information about this ship. The shipboard power system operates at 690 V, 60 Hz and consists of four main generating units and a 690 V distribution board, which is divided into two sections by the bus coupler switch. The data used in this paper are normalized or proportionally transformed, which preserves the trend of the original data. With the uncertainty caused by varying working condition demand, random waves and wind, as stated above, PV system output power is highly volatile, and the ship load electric power (mainly propulsion load power) also changes randomly to maintain dynamic performance such as ship speed and bus voltage. Therefore, real measurement data of shipboard electric power are employed to test the prediction performance of the proposed algorithm. The training set contains 3000 samples and the testing set contains 500 samples.
The experimental studies were conducted using MATLAB 2016b (MathWorks, MA, USA) on a PC with a 3.20 GHz CPU, 16 GB of memory and Windows 10 (64 bit). To verify the performance of AKOS-ELM on the time series problem, the algorithm is tested in different sea conditions. Performance is compared on two criteria: root mean square error (RMSE) and algorithm execution time. The input and output of a regression problem need to be normalized or proportionally transformed, and therefore the input and output in this paper are normalized to the range [−1, 1].
In order to further demonstrate the performance of the algorithm, this paper uses the LSTM, En-ELM, OS-ELM and KOS-ELM prediction methods as a comparison group. The weights and biases of OS-ELM are randomly chosen from [−1, 1]. The LSTM has 2 hidden layers, with the number of neurons and epochs set to 50 and 120, respectively. The number of sub-ELMs is set to 50 for En-ELM. The kernel function is the Gaussian kernel and the additive activation function is the sigmoid function.
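The two preprocessing and evaluation steps named above, scaling to [−1, 1] and RMSE, can be sketched as follows. The min-max form of the scaling is an assumption, since the text does not specify the exact transformation; the function names and the sample readings are made up and are not taken from the sea-trial data.

```python
import math

def scale_to_unit_range(values):
    """Linearly map values onto [-1, 1]; returns scaled values and (min, max)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant series")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values], (lo, hi)

def rmse(predicted, actual):
    """Root mean square error between two equally long sequences."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))

power_kw = [120.0, 180.0, 150.0, 240.0]       # made-up readings, not the ship data
scaled, (lo, hi) = scale_to_unit_range(power_kw)
error = rmse([0.0, 0.0], [0.3, -0.4])
```

Because all series are scaled to the same range before prediction, RMSE values obtained under different sea conditions are expressed in the same normalized units.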

Result
Two benchmark test data sets are studied here, as shown in Figure 2; each data set used for training and testing in this section contains 3500 data samples. In order to show the power fluctuation directly when comparing the test results with the real values, the power value is normalized to the range [−1, 1]. Level-3 and Level-5 denote the two sea condition levels. As mentioned before, the first 3000 samples of each data set shown in Figure 2 are used for initialization training, and the last 500 samples are used to test the prediction ability of the algorithm.

Scenario 1
Figures 3 and 4 show, for the one-by-one learning mode, the prediction results of KOS-ELM, AKOS-ELM, OS-ELM, En-ELM and LSTM compared with the test data set, followed by a comparison of the prediction errors of the three methods. It can be seen from Figures 3 and 4 that KOS-ELM has the lowest prediction error. However, the gap in prediction error between KOS-ELM and AKOS-ELM is small: KOS-ELM is only about 18% better than AKOS-ELM. This is because the kernel function of KOS-ELM contains the characteristics of all the data from beginning to end, so the prediction model obtained by this method better reflects the characteristics and trend of the real data. In the online prediction process, AKOS-ELM removes part of the data to ensure the speed of the algorithm, so it cannot reach the same prediction accuracy as KOS-ELM. However, the KOS-ELM method is time-consuming: as shown in Table 2, its execution time is about 22 times more than that of AKOS-ELM, because the scale of the kernel function keeps increasing during online prediction, which increases the computational complexity. The execution times of OS-ELM and AKOS-ELM are close, with OS-ELM taking about 40% of the time of AKOS-ELM. This is because the adaptive factors of AKOS-ELM limit the scale of the kernel function in the online prediction process. However, OS-ELM cannot capture the data features well: the prediction accuracy of AKOS-ELM is 30% higher than that of OS-ELM. AKOS-ELM thus performs better than OS-ELM while still meeting the execution time requirements.
When only the prediction accuracy is compared, the accuracy of LSTM lies between that of KOS-ELM and AKOS-ELM and is 18% better than AKOS-ELM, while the accuracy of En-ELM is not as good as that of AKOS-ELM. An LSTM network trained offline cannot adapt in real time to changing sea conditions and to the power change of the ship's flexible load, so LSTM needs to retrain its network parameters continuously. With too many hidden layers, the LSTM training time is too long; with fewer hidden layers, the prediction accuracy is low. The same is true of En-ELM: too many sub-ELMs make the training time too long, and too few sub-ELMs reduce the prediction accuracy. In this test, the average training time of LSTM is about 5 s and that of En-ELM is about 4 s.

Scenario 2
In order to further verify the performance of the algorithm on different data sets, the Level-5 sea condition data set shown in Figure 2 is used. The following test results are obtained.
It can be seen from Figures 5 and 6 and Table 3 that the test results of the two data sets are qualitatively consistent. In this data set, because the sea condition is worse than before, the power fluctuation is larger and the characteristics and trend of the data are more difficult to capture, resulting in a greater prediction error than before. Due to the increased uncertainty in the data, the kernel function of the proposed algorithm is kept at the maximum scale during online prediction, which makes the algorithm more time-consuming than before. The growth of uncertainty has little effect on KOS-ELM, however, because it does not control the scale of the kernel function. The gap in prediction error between KOS-ELM and AKOS-ELM is still small: KOS-ELM is only about 17.7% better than AKOS-ELM, while its execution time is about 18 times more than that of AKOS-ELM, as shown in Table 3. OS-ELM has a lower execution time under this sea condition, about half that of AKOS-ELM; however, the prediction accuracy of AKOS-ELM is 50% higher than that of OS-ELM. In this scenario, the average training time of LSTM is about 5.5 s and that of En-ELM is about 4.3 s. LSTM has better prediction accuracy, 14% higher than AKOS-ELM, while AKOS-ELM is 17% better than En-ELM. Despite the change of sea conditions, the proposed algorithm still maintains a stable advantage over the other prediction algorithms. In summary, the comparison results on the two data sets indicate that AKOS-ELM is less time-consuming without sacrificing much accuracy, and that its comprehensive performance is better than that of the other algorithms.

Conclusions and Future Work
With the deep penetration of renewable energy into ship power systems, it is of great significance to predict power fluctuation and provide an accurate reference for ship power system controllers to eliminate uncertainty. In this paper, a fast and accurate online sequential kernel-based adaptive extreme learning machine, AKOS-ELM, is proposed for sequences with time-dependent characteristics. Three adaptive factors are introduced into the algorithm to ensure the speed and accuracy of the prediction algorithm. The algorithm is tested comprehensively on real ship power fluctuations under two sea conditions and compared with several prediction algorithms.
The test results show that, compared with other online prediction algorithms, AKOS-ELM has better overall performance: it provides higher prediction accuracy while remaining fast. The test results also show that, compared with traditional offline-trained prediction algorithms, AKOS-ELM can still provide high prediction accuracy under the same number of epochs, and its speed allows it to meet the needs of a real-time control system and to be applied more easily to power system control. Based on the above results, for time-dependent series with an online prediction demand, such as ship power fluctuation prediction, the algorithm can not only meet the speed requirements of a real-time control system, but also provide accurate prediction results. Future work will study the combination of the proposed prediction algorithm with traditional power system controllers, such as generator controllers. The prediction algorithm can replace the traditional compensator and provide additional control signals for the coordinated power control of the ship power system, so as to eliminate the problems caused by power uncertainty, such as frequent power fluctuation.