Symmetry | Article | Open Access

17 June 2019

Cholesky Factorization Based Online Sequential Extreme Learning Machines with Persistent Regularization and Forgetting Factor

School of Computer Science and Engineering, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.

Abstract

The online sequential extreme learning machine with persistent regularization and forgetting factor (OSELM-PRFF) can avoid the potential singularity or ill-posed problems of online sequential regularized extreme learning machines with forgetting factors (FR-OSELM), and is particularly suitable for modelling in non-stationary environments. However, existing algorithms for OSELM-PRFF are either time-consuming or unstable in certain paradigms or parameter setups. This paper presents a novel algorithm for OSELM-PRFF, named “Cholesky factorization based” OSELM-PRFF (CF-OSELM-PRFF), which recurrently constructs a linear equation for the extreme learning machine and efficiently solves the equation via Cholesky factorization during every cycle. CF-OSELM-PRFF handles the timeliness of samples through a forgetting factor, and the regularization term in its cost function works persistently. CF-OSELM-PRFF can learn data one-by-one or chunk-by-chunk with a fixed or varying chunk size. Detailed performance comparisons between CF-OSELM-PRFF and relevant approaches are carried out on several regression problems. The numerical simulation results show that CF-OSELM-PRFF achieves higher computational efficiency than its counterparts and yields stable predictions.

1. Introduction

Single hidden-layer feedforward neural networks (SLFN) can approximate any function and form decision boundaries with arbitrary shapes if the activation function is chosen properly [,,]. To train SLFN quickly, Huang et al. proposed a learning algorithm called “Extreme Learning Machine” (ELM), which randomly assigns the hidden node parameters and then determines the output weights by the Moore–Penrose generalized inverse [,,]. ELM has been successfully applied to many real-world applications, such as retinal vessel segmentation [], wind speed forecasting [,], water network management [], path-tracking of autonomous mobile robots [], modelling of drying processes [], bearing fault diagnosis [], cybersecurity defense frameworks [], crop classification [], and energy disaggregation []. In recent years, ELM has been extended to multilayer ELMs, which play an important role in the deep learning domain [,,,,,,].
The original ELM is a batch learning algorithm; all samples must be available before ELM trains the SLFN. Whenever new data arrive, ELM has to gather the old and new data together and retrain the SLFN to incorporate the new information. This is a very time-consuming process, and is even computationally infeasible in some applications where frequent and fast training, or even real-time training, is required. Moreover, hardware systems cannot provide enough memory to store an ever-increasing amount of training data. To deal with problems with sequential data, Liang et al. proposed an online sequential ELM (OS-ELM) to learn data one-by-one or chunk-by-chunk with a fixed or varying chunk size []. OS-ELM can be implemented in common programming languages and run on universal computing platforms. Moreover, in order to execute OS-ELM quickly, Frances-Villora et al. developed an FPGA-based implementation of a tailored OS-ELM algorithm [], which assumes a one-by-one training strategy. OS-ELM has been successfully adopted in some applications, but it still has some drawbacks. Firstly, OS-ELM may encounter ill-conditioning problems, resulting in fluctuating generalization performance of the SLFN, if the number of hidden nodes L in the SLFN is not set appropriately [,,]. Secondly, OS-ELM does not take the timeliness of samples into account, so it cannot be directly employed in time-varying or nonstationary environments.
As a variant of ELM, Regularized ELM (RELM) [,], which is equivalent to the constrained optimization-based ELM [] mathematically, can achieve better generalization performance than ELM, can greatly reduce the randomness effect in ELM [,], and is less sensitive to L. Furthermore, several online sequential RELMs have been developed by researchers. Huynh and Won proposed ReOS-ELM []. Despite the widespread application of ReOS-ELM, it does not consider the timeliness of samples. To take this into account, Zhang et al. and Du et al. separately designed online sequential RELM with a forgetting factor, viz., SF-ELM [] and RFOS-ELM []; Guo and Xu referred to them as FR-OSELM [,]. After stating the real optimization cost function in FR-OSELM and theoretically analyzing FR-OSELM, Guo and Xu pointed out that the regularization term in the cost function of FR-OSELM will be forgotten and tends to zero as time passes; thus, FR-OSELM will probably run into ill-conditioning problems and become unstable after a long period. Incidentally, a similar or the same optimization cost function, or recursive solution in which the regularization term wanes gradually with time, is still utilized in [,,].
Recently, online sequential extreme learning machines with persistent regularization and forgetting factors (OSELM-PRFF) were put forward [,,]; these can avoid the potential singularity or ill-posed problem of FR-OSELM. Moreover, two kinds of recursive calculation schemes for OSELM-PRFF have been developed. One is FP-ELM, which directly calculates the precise inverse of a matrix during every update of the model []; the other includes FGR-OSELM [] and AFGR-OSELM [], which recursively compute an approximate inverse of the involved matrix to reduce the computational burden. These online sequential RELMs have been applied successfully in several examples. However, although the recursive calculation of the approximate inverse matrix in FGR-OSELM and AFGR-OSELM enhances efficiency, it may cause FGR-OSELM and AFGR-OSELM to be unreliable in certain paradigms or parameter setups. Additionally, the direct calculation of the precise inverse matrix makes FP-ELM inefficient.
The reliability and time efficiency of online learning algorithms are two important indexes in general. In real-time applications, such as stock forecasting, modelling of controlled objects and signal processing, the computational efficiency of the online training algorithm for SLFN is a crucial factor. Here, a new online sequential extreme learning machine with persistent regularization and forgetting factor using Cholesky factorization (CF-OSELM-PRFF) is presented. This paper analyzes and proves the symmetry and positive definiteness of the coefficient matrix of the linear equations of OSELM-PRFF. The presented method decomposes the coefficient matrix in Cholesky form during every model-updating period, transforms the linear equations into two linear equations with lower and upper triangular coefficient matrices, respectively, and applies forward substitution and backward substitution to solve them. The computational efficiency and prediction accuracy of CF-OSELM-PRFF are evaluated on process identification, classical time series prediction, and real electricity load forecasting tasks. The numerical experiments indicate that CF-OSELM-PRFF runs faster than several other representative methods, and can provide accurate predictions.
The rest of this paper is organized as follows. Section 2 gives a brief review of RELM, FR-OSELM and the existing OSELM-PRFF. Section 3 proposes CF-OSELM-PRFF. Performance evaluation is conducted in Section 4. Finally, conclusions are given in Section 5.

3. Proposed CF-OSELM-PRFF

FP-ELM is a stable online sequential training algorithm for SLFN which takes the timeliness of samples into consideration and can circumvent the potential phenomenon of data saturation. However, the calculation of the inverse of λI+Kk in Equation (15) is time-consuming in every work period. FGR-OSELM calculates an approximate inverse recursively by Equations (18) and (19), which saves time but may render the algorithm unstable. In order to speed up FP-ELM, this work proposes an approach to quickly solve for βk using a Cholesky decomposition. The complete algorithm is termed CF-OSELM-PRFF and is described in the sequel.
Let
$$Q_0 = H_0^T Y_0,$$
$$Q_k = \mu_k Q_{k-1} + H_k^T Y_k.$$
Then, Equation (11) can be rewritten as
$$(\lambda I + K_k)\,\beta_k = Q_k.$$
Proposition 1.
The matrix λI+Kk is a symmetric positive definite matrix.
Proof. Symmetry. Apparently, K0 is symmetric. Assume Kk-1 is symmetric; then
$$K_k^T = \left(\mu_k K_{k-1} + H_k^T H_k\right)^T = \mu_k K_{k-1}^T + H_k^T H_k = K_k.$$
According to mathematical induction, for any k, Kk is symmetric. As a result, λI+Kk is symmetric.
Positive definiteness. For any ζ=[ζ1, ζ2, …, ζL]T≠0, it holds that
$$\zeta^T K_0 \zeta = \zeta^T H_0^T H_0 \zeta = (H_0\zeta)^T (H_0\zeta) = \left(\sum_{i=1}^{L}\zeta_i\,G(a_i,b_i,x_1)\right)^2 + \left(\sum_{i=1}^{L}\zeta_i\,G(a_i,b_i,x_2)\right)^2 + \cdots + \left(\sum_{i=1}^{L}\zeta_i\,G(a_i,b_i,x_{N_0})\right)^2 \ge 0.$$
Suppose Kk-1 is positive semi-definite, that is,
$$\zeta^T K_{k-1}\,\zeta \ge 0,$$
then,
$$\zeta^T \mu_k K_{k-1}\,\zeta \ge 0.$$
Similar to Equation (25), it holds that
$$\zeta^T H_k^T H_k\,\zeta \ge 0,$$
Then,
$$\zeta^T K_k \zeta = \zeta^T\left(\mu_k K_{k-1} + H_k^T H_k\right)\zeta = \zeta^T \mu_k K_{k-1}\zeta + \zeta^T H_k^T H_k \zeta \ge 0.$$
Additionally,
$$\zeta^T \lambda I \zeta = \lambda \sum_{i=1}^{L}\zeta_i^2 > 0.$$
In conclusion, λI+Kk is a symmetric positive definite matrix. □
Denote B = λI+Kk; then B can be uniquely factorized into Cholesky form, i.e., $B = U^T U$, where U is an upper triangular matrix. U can be calculated by the following formulas []:
$$u_{ii} = \left(b_{ii} - \sum_{d=1}^{i-1} u_{di}^2\right)^{1/2} = \left(\lambda + K_k(i,i) - \sum_{d=1}^{i-1} u_{di}^2\right)^{1/2}, \quad i = 1, \ldots, L,$$
$$u_{ij} = \left(b_{ij} - \sum_{d=1}^{i-1} u_{di}\,u_{dj}\right)\Big/ u_{ii} = \left(K_k(i,j) - \sum_{d=1}^{i-1} u_{di}\,u_{dj}\right)\Big/ u_{ii}, \quad j = i+1, \ldots, L.$$
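For illustration, a minimal NumPy sketch of Equations (31) and (32) might look as follows; this is not the authors' implementation (the original experiments were run in MATLAB), and the function name is an assumption.

```python
# Minimal sketch of Equations (31)-(32): computing the upper-triangular Cholesky
# factor U of B = lambda*I + K_k, column by column.
import numpy as np

def cholesky_upper(B):
    """Return upper-triangular U with B = U^T U, assuming B is symmetric positive definite."""
    L_dim = B.shape[0]
    U = np.zeros_like(B, dtype=float)
    for i in range(L_dim):
        # Equation (31): diagonal entry u_ii
        U[i, i] = np.sqrt(B[i, i] - np.dot(U[:i, i], U[:i, i]))
        for j in range(i + 1, L_dim):
            # Equation (32): off-diagonal entries u_ij, j > i
            U[i, j] = (B[i, j] - np.dot(U[:i, i], U[:i, j])) / U[i, i]
    return U
```

For a symmetric positive definite B, the result coincides with numpy.linalg.cholesky(B).T.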
Equation (23) can be solved by the following two equations:
$$U^T P = Q_k,$$
$$U \beta_k = P.$$
Denote Qk = [q1, q2, …, qL]T. Using forward substitution and back substitution, the solution to Equation (23), viz., the coefficient vector βk in Equation (1), can be obtained as follows:
$$p_i = \left(q_i - \sum_{d=1}^{i-1} u_{di}\,p_d\right)\Big/ u_{ii}, \quad i = 1, \ldots, L,$$
$$\beta_{k,i} = \left(p_i - \sum_{d=i+1}^{L} u_{id}\,\beta_{k,d}\right)\Big/ u_{ii}, \quad i = L, \ldots, 1.$$
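The two triangular systems above can be solved directly; the following sketch (assumed names, single-output case so that Qk is a vector, not the authors' code) implements Equations (35) and (36).

```python
# Sketch of Equations (35)-(36): forward substitution for U^T P = Q_k,
# then back substitution for U beta_k = P.
import numpy as np

def solve_cholesky(U, Q):
    """Solve (U^T U) beta = Q given the upper-triangular Cholesky factor U."""
    L_dim = U.shape[0]
    P = np.zeros(L_dim)
    beta = np.zeros(L_dim)
    for i in range(L_dim):
        # Equation (35): forward substitution with the lower-triangular U^T
        P[i] = (Q[i] - np.dot(U[:i, i], P[:i])) / U[i, i]
    for i in range(L_dim - 1, -1, -1):
        # Equation (36): back substitution with the upper-triangular U
        beta[i] = (P[i] - np.dot(U[i, i + 1:], beta[i + 1:])) / U[i, i]
    return beta
```

Once the factor U is available, these two substitutions cost only O(L^2) operations per update.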
The CF-OSELM-PRFF algorithm can be summarized as follows.
Step 1: Preparation.
(1) Choose the hidden output function G(a, b, x) of the SLFN and the number of hidden nodes L; determine λ, μ.
(2) Randomly assign hidden parameters (ai, bi), i=1,2,…,L.
Step 2: Initialization.
(1) Acquire the initial data chunk S0 (N0 ≥ 2).
(2) Calculate H0, Y0.
(3) Calculate K0 by Equation (12), calculate Q0 by Equation (21).
(4) Calculate Cholesky factor U of K0 by Equations (31) and (32).
(5) Calculate β0 by Equations (35) and (36).
(6) For the input x, the predicted output value is $\hat{y} = h(x)\,\beta_0$.
Step 3: Online modeling and prediction, i.e., repeat the following substeps.
(1) Acquire the kth (k ≥ 1) data chunk Sk
(2) Calculate Hk, Yk.
(3) Calculate Kk by Equation (14), calculate Qk by Equation (22).
(4) Calculate U of Kk by Equations (31) and (32).
(5) Calculate βk by Equations (35) and (36).
(6) For the input x, the predicted output value is $\hat{y} = h(x)\,\beta_k$.
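To make the recursion concrete, the following Python sketch implements Steps 2 and 3, using library routines (scipy.linalg) in place of the explicit formulas (31)-(36). The sigmoid hidden layer follows Section 4, but the placeholder random data and all variable names are illustrative assumptions, not the authors' MATLAB code.

```python
# Illustrative sketch of the CF-OSELM-PRFF recursion (Steps 2-3).
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
L, n_in, lam, mu = 25, 3, 1e-3, 0.99          # hidden nodes, inputs, lambda, forgetting factor
A = rng.uniform(-1, 1, (L, n_in))              # random input weights a_i
b = rng.uniform(-1, 1, L)                      # random biases b_i

def hidden(X):
    """Hidden-layer output matrix H with sigmoid nodes G(a, b, x)."""
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

def solve_beta(K, Q):
    """Solve (lambda*I + K) beta = Q via a Cholesky factorization."""
    U = cholesky(lam * np.eye(L) + K, lower=False)       # B = U^T U
    P = solve_triangular(U, Q, trans='T', lower=False)    # U^T P = Q
    return solve_triangular(U, P, lower=False)             # U beta = P

# Step 2: initialization with the first chunk (X0, Y0); placeholder data only.
X0, Y0 = rng.uniform(-1, 1, (50, n_in)), rng.uniform(-1, 1, (50, 1))
H0 = hidden(X0)
K, Q = H0.T @ H0, H0.T @ Y0                    # Equations (12) and (21)
beta = solve_beta(K, Q)

# Step 3: online updates, chunk by chunk (placeholder chunks).
for Xk, Yk in [(rng.uniform(-1, 1, (10, n_in)), rng.uniform(-1, 1, (10, 1))) for _ in range(5)]:
    Hk = hidden(Xk)
    K = mu * K + Hk.T @ Hk                     # Equation (14)
    Q = mu * Q + Hk.T @ Yk                     # Equation (22)
    beta = solve_beta(K, Q)
    y_hat = hidden(Xk) @ beta                  # predictions for the current inputs
```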

4. Experimental Results and Analysis

In this section, the performance of the presented CF-OSELM-PRFF is verified on a time-varying nonlinear process identification task, two chaotic time series, and one electricity demand prediction problem. These simulations evaluate the computational complexity (or running time) and accuracy of CF-OSELM-PRFF by comparison with FP-ELM, FGR-OSELM, AFGR-OSELM and FOS-MELM []. FOS-MELM is an online sequential multiple hidden layers extreme learning machine with a forgetting mechanism, which was recently proposed by Xiao et al. To make the results of FOS-MELM more stable, a regularization term is introduced into its solving process according to [].
For these online algorithms, the common regularization parameter λ is set to 0.001, and the forgetting factor is set to μ = 0.99. For AFGR-OSELM, the adaptive forgetting factor is tuned in the interval [0.8, 0.999] with an initial value of 0.995, and the other specific parameters are set according to [].
The output of a hidden node with respect to the input x of the SLFN in Equation (1) is set as the sigmoid function, i.e., G(a, b, x)=1/(1+exp(-(a · x +b))); the components of a, i.e., the input weights, and the bias b are randomly chosen from the range [-1,1]. In particular, the hyperbolic tangent function G(a, b, x)=(1-exp(-(a · x+b)))/(1+exp(-(a · x +b))) is selected as the activation function in FOS-MELM.
For FOS-MELM, every training data chunk contains only one sample, and each sample remains valid for s time units; the parameter s is set as s = N0.
In order to observe performance of these approaches under various situations, the number L of hidden nodes of SLFN is set as 25, 50, 100, 150, 200, and the corresponding number N0 of initial training samples is assigned to 50, 100, 200, 200, 200, respectively.
The root mean square error (RMSE) of prediction is regarded as measurement index of model accuracy.
$$\mathrm{RMSE} = \sqrt{\frac{1}{t_2 - t_1 + 1}\sum_{i=t_1}^{t_2}\left(\hat{y}_i - y_i\right)^2}$$
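For completeness, the RMSE above can be computed with a short helper such as the following sketch (illustrative only; the function name and 0-based indexing are assumptions).

```python
# Sketch of the RMSE over prediction steps t1..t2 inclusive.
import numpy as np

def rmse(y_hat, y, t1, t2):
    err = np.asarray(y_hat[t1:t2 + 1]) - np.asarray(y[t1:t2 + 1])
    return np.sqrt(np.mean(err ** 2))
```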
The relative efficiencies of CF-OSELM-PRFF to its counterparts are measured by speedup ratios. The speedups of CF-OSELM-PRFF to the other related methods are defined as
$$\mathrm{speedup}_1 = \frac{\text{total running time of FP-ELM}}{\text{total running time of CF-OSELM-PRFF}}$$
$$\mathrm{speedup}_2 = \frac{\text{total running time of FGR-OSELM}}{\text{total running time of CF-OSELM-PRFF}}$$
$$\mathrm{speedup}_3 = \frac{\text{total running time of AFGR-OSELM}}{\text{total running time of CF-OSELM-PRFF}}$$
$$\mathrm{speedup}_4 = \frac{\text{total running time of FOS-MELM}}{\text{total running time of CF-OSELM-PRFF}}$$
All the performance assessments were carried out in MATLAB R2010b 32-bit environment running on Windows 7 32-bit with Intel Core i3-3220 3.3 GHz CPU and 4 GB RAM.

4.1. Time-varying Nonlinear Process Identification

The identified unknown system is a modified version of the one addressed in []; by changing the constant and the coefficients of variables, the time-varying system is expressed as follows:
$$y(k+1) = \begin{cases} \dfrac{y(k)}{1 + y(k)^2} + u(k)^3, & k \le 100,\\[2mm] \dfrac{2\,y(k)}{2 + y(k)^2} + u(k)^3, & 100 < k \le 300,\\[2mm] \dfrac{y(k)}{1 + 2\,y(k)^2} + 2\,u(k)^3, & 300 < k. \end{cases}$$
The system (42) can be expressed as follows:
$$y(k) = f(x(k)),$$
where f(x) is a nonlinear function, x(k) is the regression input data vector
$$x(k) = \left[\,y(k-1),\, y(k-2), \ldots, y(k-n_y);\; u(k-n_d),\, u(k-n_d-1), \ldots, u(k-n_u)\,\right],$$
with ny, nd and nu being model structure parameters; they are set as ny = 3, nd = 1 and nu = 2 here. When the SLFN is applied to approximate the unknown function f, (x(k), y(k)) is the learning sample (xk, yk) of the SLFN.
Denote k0 = N0+max (ny, nu) − nd, k1 = k0 + 500. The system input is set as follows:
$$u(k) = \begin{cases} \mathrm{rand}() - 0.5, & k \le k_0,\\ \sin\!\left(2\pi (k - k_0)/120\right), & k_0 < k \le k_1,\\ \sin\!\left(2\pi (k - k_1)/50\right), & k_1 < k, \end{cases}$$
where rand() generates random numbers uniformly distributed in the interval (0,1).
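For concreteness, the following sketch generates the process output and the regression samples; the zero initial condition, variable names, and number of steps are assumptions made for illustration, not details taken from the paper.

```python
# Illustrative generation of training data for the time-varying process above
# with the piecewise input u(k); N0, ny, nd, nu follow the text.
import numpy as np

rng = np.random.default_rng(1)
N0, ny, nd, nu = 50, 3, 1, 2
k0 = N0 + max(ny, nu) - nd
k1 = k0 + 500
steps = 3000

def u_input(k):
    if k <= k0:
        return rng.random() - 0.5                     # rand() - 0.5
    if k <= k1:
        return np.sin(2 * np.pi * (k - k0) / 120)
    return np.sin(2 * np.pi * (k - k1) / 50)

u = np.array([u_input(k) for k in range(steps + 1)])
y = np.zeros(steps + 1)                               # assumed zero initial condition
for k in range(steps):
    if k <= 100:
        y[k + 1] = y[k] / (1 + y[k] ** 2) + u[k] ** 3
    elif k <= 300:
        y[k + 1] = 2 * y[k] / (2 + y[k] ** 2) + u[k] ** 3
    else:
        y[k + 1] = y[k] / (1 + 2 * y[k] ** 2) + 2 * u[k] ** 3

# Learning samples (x(k), y(k)): regressors y(k-1), y(k-2), y(k-3), u(k-1), u(k-2)
X = np.array([[y[k - 1], y[k - 2], y[k - 3], u[k - 1], u[k - 2]] for k in range(ny, steps + 1)])
T = y[ny:steps + 1]
```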
The simulations are carried out for different numbers of steps and hidden nodes. Efficiency comparisons between CF-OSELM-PRFF and FP-ELM, FGR-OSELM, AFGR-OSELM, together with FOS-MELM, are listed in Table 1. Due to the randomness of the parameters a, b and of u(k) during the initial stage (k ≤ k0), along with the intrinsic uncertainty of the computing environment, the simulation results inevitably vary. Consequently, for each case, every running time is a mean over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is marked in bold.
Table 1. Efficiency comparison between CF-OSELM-PRFF and its counterparts on identification of process (42) with input (45).
Table 1 shows that, with the same number of hidden nodes and the same number of simulation prediction steps, CF-OSELM-PRFF statistically costs the least time among the five approaches; therefore, CF-OSELM-PRFF has an obvious speed advantage over FP-ELM. Moreover, as the number of hidden nodes increases, the speedup tends to become larger. FOS-MELM trains a three-hidden-layer feedforward neural network and consists of complex calculation steps; thus, it costs the most time.
Table 2 displays a prediction RMSE comparison of CF-OSELM-PRFF to its counterparts. Every RMSE is also an average over 5 independent trials of each algorithm performing the set number of steps. In each set of results, the best one is marked in bold. From Table 2, it can be seen that there is no apparent difference in predictive accuracy among FP-ELM, FOS-MELM and CF-OSELM-PRFF; in some setups, FGR-OSELM and AFGR-OSELM can provide satisfactory, or even the best, forecasts, but they cannot work optimally in certain cases. When the number of simulation prediction steps is set to 2500 or 3000, FGR-OSELM sometimes produces bad or low-accuracy predictions in the later stages of the simulation process, and therefore the RMSE of FGR-OSELM becomes very large. Additionally, when the number of hidden nodes is set to 100, 150 or 200 and the number of simulation prediction steps is set to 1000 or more, AFGR-OSELM usually runs unstably and produces a very large RMSE. The reason is that, in the recursive formulas of FGR-OSELM and AFGR-OSELM, the approximate calculation of the inverse of a related matrix yields errors which may propagate and accumulate; as a result, the algorithm is apt to produce unreliable results.
Table 2. Prediction RMSE comparison of CF-OSELM-PRFF to its counterparts on identification of process (42) with input (45).
To intuitively observe and compare the accuracy and stability of these online sequential RELMs, fix L = 25, N0 = 50, and simulation prediction steps = 3000; execute FGR-OSELM repeatedly until a certain unstable predictive scenario occurs, plot the corresponding prediction error (predicted value minus real value) curves in Figure 1a, and save the current a, b values and the initial u(k) (k ≤ k0) signal, which are referred to as the adverse a, b values and initial u(k) signal. Subsequently, execute FP-ELM, CF-OSELM-PRFF and FOS-MELM with the adverse a, b values and initial u(k) signal, respectively, and plot the prediction error curves of the three approaches in Figure 1b, c and d. Clearly, Figure 1a shows that the prediction errors of FGR-OSELM become very large and completely meaningless when the prediction step exceeds a certain limit. There are a few large peaks at certain instants which reveal the instability of FGR-OSELM arising from its recursive approximate calculation. In order to explicitly exhibit the variation of the prediction error of FGR-OSELM, only the partial results within the first 2453 steps are presented. Additionally, Figure 1b, c and d show that the prediction effect of CF-OSELM-PRFF is similar to that of FP-ELM and FOS-MELM; in other words, they possess the same ideal predictive performance.
Figure 1. Prediction error curves of relevant approaches with an adverse parameters setting of FGR-OSELM on identification of process (42): (a) Prediction error curve of FGR-OSELM; (b) That of FP-ELM; (c) That of CF-OSELM-PRFF; (d) That of FOS-MELM.
Set L = 100, N0 = 200, and simulation prediction steps = 3000; execute AFGR-OSELM repeatedly until a certain unstable predictive scenario occurs, plot the corresponding prediction error curves in Figure 2a, and save the current a, b values and the initial u(k) (k ≤ k0) signal, i.e., the adverse a, b values and initial u(k) signal. Then, execute FP-ELM, CF-OSELM-PRFF and FOS-MELM with the adverse a, b values and initial u(k) signal, respectively, and plot the prediction error curves of the three approaches in Figure 2b, c and d. Clearly, Figure 2a shows that the prediction errors of AFGR-OSELM become larger at the 872nd step, reaching −67.7839. Actually, at the 878th step, the prediction error of AFGR-OSELM suddenly reaches −25286.1257. Excessively large prediction errors are not marked in Figure 2a; thus, only the partial results prior to the 878th step are presented. Additionally, Figure 2b and c show that CF-OSELM-PRFF possesses the same excellent predictive performance as FP-ELM. Figure 2d seems to show that FOS-MELM is slightly better than CF-OSELM-PRFF, but there is only a small difference between their RMSEs.
Figure 2. Prediction error curves of relevant approaches with an adverse parameters setting of AFGR-OSELM on identification of process (42): (a) Prediction error curve of AFGR-OSELM; (b) That of FP-ELM; (c) That of CF-OSELM-PRFF; (d) That of FOS-MELM.
The above experiments indicate that for the parameter settings with which FGR-OSELM or AFGR-OSELM produces larger prediction errors, CF-OSELM-PRFF, FP-ELM and FOS-MELM can run stably and provide satisfactory prediction.

4.2. Lorenz Time Series Prediction

The Lorenz time series is a three-dimensional dynamical system that exhibits chaotic flow, which is described by the following equations [,]:
$$\frac{dx(t)}{dt} = \sigma\,[\,y(t) - x(t)\,], \qquad \frac{dy(t)}{dt} = r\,x(t) - y(t) - x(t)\,z(t), \qquad \frac{dz(t)}{dt} = x(t)\,y(t) - b\,z(t),$$
where x(t), y(t), and z(t) are the values of time series at time t. A typical choice for the parameter values is σ = 10, r = 28, and b = 8/3.
In this example, observations of the continuous x(t), y(t), and z(t) at a sequence of sampling time points are of concern. Hence, the function ode45, a standard solver for ordinary differential equations in MATLAB, is applied to generate sampled values of x(t), y(t), and z(t). The routine implements an adaptive-step Runge–Kutta (4,5) method for efficient computation. The initial state is set as x(0) = 2, y(0) = 3, and z(0) = 4.
Let Ts denote the sampling period. At each sampling time point kTs, ([x(kTs-Ts), x(kTs-2Ts), ..., x(kTs-nyTs)], x(kTs)) is a training sample for the ELM. In one-step-ahead time series prediction, the common way is to use xk+1 = [x(kTs), x(kTs-Ts), ..., x(kTs-(ny-1)Ts)] to calculate the predicted value of x(kTs+Ts), namely, $\hat{x}(kT_s+T_s)$. In the simulation, the sampling period is set as Ts = 0.02, the embedding dimension is chosen as ny = 3, and the x-coordinate of the Lorenz time series is considered for prediction.
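A Python analogue of this sampling procedure is sketched below; it uses scipy's solve_ivp (an adaptive Runge-Kutta 4(5) solver) in place of MATLAB's ode45, and the tolerances and sample count are assumptions made for illustration.

```python
# Sketch: sample the Lorenz x-coordinate and build one-step-ahead training samples.
import numpy as np
from scipy.integrate import solve_ivp

sigma, r, b = 10.0, 28.0, 8.0 / 3.0
Ts, n_samples, ny = 0.02, 3500, 3

def lorenz(t, s):
    x, y, z = s
    return [sigma * (y - x), r * x - y - x * z, x * y - b * z]

t_eval = np.arange(n_samples) * Ts
sol = solve_ivp(lorenz, (0.0, t_eval[-1]), [2.0, 3.0, 4.0],
                t_eval=t_eval, rtol=1e-9, atol=1e-9)
x = sol.y[0]                                  # sampled x(kTs)

# Inputs [x(kTs), x(kTs-Ts), x(kTs-2Ts)], target x(kTs+Ts)
X = np.array([x[k - ny + 1:k + 1][::-1] for k in range(ny - 1, n_samples - 1)])
T = x[ny:]
```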
In order to verify the computational efficiency of CF-OSELM-PRFF compared to FP-ELM, FGR-OSELM, AFGR-OSELM and FOS-MELM, the running times of these algorithms and speedups of CF-OSELM-PRFF to the other algorithms are tabulated in Table 3. Every running time is a mean over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. As seen in Table 3, CF-OSELM-PRFF clearly outperforms the other algorithms in terms of speed.
Table 3. Efficiency comparison between CF-OSELM-PRFF and its counterparts on prediction of time series (46).
Table 4 shows the prediction RMSE of five methods. Every RMSE is also an average value over 5 independent trials. In each set of results, the best one is marked in bold. As shown in Table 4, on the whole, the prediction behaviors of these methods in this simulation are basically similar to those in the first simulation. When the number of simulation prediction steps is set to 3000, FGR-OSELM occasionally produces poor predictions in the later stage of the simulation process; thus, RMSE of FGR-OSELM becomes larger. Unexpectedly, in many cases, AFGR-OSELM cannot provide reasonable predictions.
Table 4. RMSE comparison of CF-OSELM-PRFF to its counterparts on prediction of time series (46).
Analogously, an intuitive comparison of the prediction results of these algorithms is also made. Fix L = 200, N0 = 200, and simulation prediction steps = 3000; run FGR-OSELM repeatedly until a certain unstable predictive scenario appears, plot the corresponding prediction error curves in Figure 3a, and save the current a, b values, i.e., the adverse a, b values. Subsequently, execute AFGR-OSELM, FP-ELM, CF-OSELM-PRFF and FOS-MELM with the adverse a, b values, respectively; their prediction error curves are shown in Figure 3b, c, d and e.
Figure 3. Prediction error curves of relevant approaches with an adverse parameters setting of FGR-OSELM on time series (46): (a) Prediction error curve of FGR-OSELM; (b) That of AFGR-OSELM; (c) That of FP-ELM; (d) That of CF-OSELM-PRFF; (e) That of FOS-MELM.
Figure 3a shows that the prediction errors of FGR-OSELM become very large when the prediction step exceeds a certain limit. In Figure 3b, only the curve from the 450th step to the 540th step is plotted, because after the 540th step AFGR-OSELM produces excessively large prediction errors at many time points. The recursive approximate calculations of FGR-OSELM and AFGR-OSELM result in instability in certain settings. Figure 3c and d show that the prediction error curve of CF-OSELM-PRFF is extremely similar to that of FP-ELM. Comparing Figure 3d and e, it can be found that the prediction result of CF-OSELM-PRFF is slightly better than that of FOS-MELM.
Although normalizing the time series values, i.e., the input data of the SLFN, into the interval [0,1] or [−1,1] can significantly improve the stability of FGR-OSELM and AFGR-OSELM, it is difficult to obtain the maximum and minimum values of the input data in some practical online modelling scenarios. Thus, normalization of the input data is sometimes infeasible. In this example, CF-OSELM-PRFF and FP-ELM can train the SLFN and provide satisfactory results using raw data; they are less susceptible to the range of the input data than FGR-OSELM and AFGR-OSELM.

4.3. Rössler Time Series Prediction

The Rössler system is one of the most famous chaotic systems, though it is an artificial system designed solely with the purpose of creating a model for a strange attractor [,,]. The Rössler time series is generated from the following differential equations:
$$\frac{dx(t)}{dt} = -y(t) - z(t), \qquad \frac{dy(t)}{dt} = x(t) + d\,y(t), \qquad \frac{dz(t)}{dt} = e + z(t)\,\left(x(t) - f\right),$$
where x(t), y(t), and z(t) are the values of time series at time t. d, e, f are the control parameters and they are set as d = 0.15, e = 0.2 and f = 10.
The way to generate sampled values of x(t), y(t), and z(t) is the same as that described in the previous example. The initial condition is set as x(0) = 0.05, y(0) = 0.05, and z(0) = 0.05. The sampling interval is set as Ts = 0.02, the embedding dimension is chosen as ny = 3, and the x-coordinate of the Rössler system is considered for prediction.
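Under the same assumptions as the Lorenz sketch above (Python solve_ivp in place of ode45, assumed tolerances), the Rössler samples can be generated analogously.

```python
# Sketch: sample the Rössler x-coordinate with the stated parameters and initial state.
import numpy as np
from scipy.integrate import solve_ivp

d, e, f = 0.15, 0.2, 10.0
Ts, n_samples = 0.02, 3500

def rossler(t, s):
    x, y, z = s
    return [-y - z, x + d * y, e + z * (x - f)]

t_eval = np.arange(n_samples) * Ts
sol = solve_ivp(rossler, (0.0, t_eval[-1]), [0.05, 0.05, 0.05],
                t_eval=t_eval, rtol=1e-9, atol=1e-9)
x = sol.y[0]    # sampled x(kTs), used to build one-step-ahead samples as before
```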
In this simulation, the experimental design is the same as that in the previous simulations. The running times of these algorithms and speedups of CF-OSELM-PRFF to other algorithms are recorded in Table 5. Every running time is an average over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. From Table 5, it is clear to see that CF-OSELM-PRFF is superior to other methods in terms of efficiency.
Table 5. Efficiency comparison between CF-OSELM-PRFF and its counterparts on prediction of time series (47).
Table 6 shows the prediction RMSE of the five methods. Every RMSE is an average over 5 independent trials. In each set of results, the best one is marked in bold. Different from the previous two experiments, here the chance that FGR-OSELM behaves unstably in the later segment of the simulation process is smaller; such instability is observed only when the number of simulation prediction steps is set to 3000. Moreover, AFGR-OSELM behaves stably with a larger probability; if AFGR-OSELM is run for only 5 successive times, unreasonable prediction results rarely appear. Thus, its performance has not been investigated in the following contrastive demonstration. Additionally, in many cases, FOS-MELM achieves the best results; as the number of nodes increases, FOS-MELM yields higher accuracy than CF-OSELM-PRFF at the expense of requiring more time.
Table 6. RMSE comparison of CF-OSELM-PRFF to its counterparts on prediction of time series (47).
Accordingly, for the case of L = 25, N0 = 50 and simulation prediction steps = 3000, FGR-OSELM, CF-OSELM-PRFF, FP-ELM and FOS-MELM are run with the same adverse a, b values, respectively, and their prediction error curves are plotted in Figure 4. Figure 4a shows that FGR-OSELM works well at first but fails afterwards. Figure 4b and c show that FP-ELM and CF-OSELM-PRFF provide almost the same good prediction results. Contrasting Figure 4c and d, it can be found that the error range of CF-OSELM-PRFF is smaller than that of FOS-MELM; in other words, the former yields better forecasts than the latter in the late stage.
Figure 4. Prediction error curves of relevant approaches with an adverse parameters setting of FGR-OSELM on time series (47): (a) Prediction error curve of FGR-OSELM; (b) That of FP-ELM; (c) That of CF-OSELM-PRFF; (d) That of FOS-MELM.

4.4. Experiment on Real Data Set

Electricity load forecasting plays an important part in the strategic management of electric power systems. Here, an electricity demand time series (EDTS) [] is utilized to test the performance of these online algorithms; the EDTS consists of a sequence of 15-minute averaged values of power demand. The first 3500 values of EDTS are shown in Figure 5.
Figure 5. EDTS.
Before training the model, the data are normalized into [−1,1] by Equation (48); after forecasting, the predicted values are denormalized by Equation (49).
$$y(k) \leftarrow \frac{2\,\left(y(k) - \min(y)\right)}{\max(y) - \min(y)} - 1,$$
$$\hat{y}(k) \leftarrow \frac{\hat{y}(k) + 1}{2}\,\left(\max(y) - \min(y)\right) + \min(y).$$
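A small sketch of the normalization (48) and denormalization (49) steps is given below; the function names are assumptions, and it presumes that the minimum and maximum of the series are available.

```python
# Sketch of min-max scaling to [-1, 1] and its inverse, matching Equations (48)-(49).
import numpy as np

def normalize(y, y_min, y_max):
    return 2.0 * (y - y_min) / (y_max - y_min) - 1.0

def denormalize(y_hat, y_min, y_max):
    return (y_hat + 1.0) / 2.0 * (y_max - y_min) + y_min
```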
In this example, the experiment is designed like the previous ones. The running times of these algorithms and the speedups of CF-OSELM-PRFF relative to its counterparts are recorded in Table 7. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. From Table 7, it is clear that CF-OSELM-PRFF runs faster than the other methods.
Table 7. Efficiency comparison between CF-OSELM-PRFF and its counterparts on prediction of EDTS.
Table 8 shows the prediction RMSE of the five methods. In each set of results, the best one is marked in bold. CF-OSELM-PRFF produces nearly the same level of accuracy as FP-ELM, and statistically higher accuracy than FOS-MELM. FGR-OSELM runs unstably in only one case, but AFGR-OSELM does so in many cases. In addition, different from the previous example, FOS-MELM did not achieve the best results here; CF-OSELM-PRFF yields higher accuracy than FOS-MELM in many cases.
Table 8. RMSE comparison of CF-OSELM-PRFF to its counterparts on prediction of EDTS.

4.5. Discussion

The above experiments show that CF-OSELM-PRFF has greater time efficiency than several other related approaches; the speedup ratio of CF-OSELM-PRFF relative to the other approaches is mainly influenced by the number of hidden nodes of the SLFN. CF-OSELM-PRFF achieves a speedup of around 1.4 to 2.0 over FP-ELM. This speedup can facilitate its use in real applications, and even in real-time applications.
CF-OSELM-PRFF can provide the same predictive accuracy as FP-ELM, and better stability than FGR-OSELM and AFGR-OSELM. Additionally, the experiments also show that there is not an obvious difference between the predictive accuracy of CF-OSELM-PRFF and that of FOS-MELM. In the third simulation, FOS-MELM outperformed CF-OSELM-PRFF statistically, whereas, CF-OSELM-PRFF surpassed FOS-MELM in the next one.
CF-OSELM-PRFF can learn arriving data one-by-one or chunk-by-chunk without the need for storing the training samples accumulated thus far; it is suitable for storage capacity-constrained computing devices.
In the above experiments, CF-OSELM-PRFF adopted a fixed forgetting factor to reduce the contribution of old samples; in fact, it could incorporate variable forgetting factor techniques, such as that reported in [].

5. Conclusions

Regularization plays an important role in RELM, but in the cost function or recursive solution of many online sequential RELMs, the regularization effect will decay gradually over time. Fortunately, FP-ELM, FGR-OSELM and AFGR-OSELM can maintain persistent regularization effect throughout the whole learning process. They share the same cost function but employ different solving processes.
This paper makes full use of the symmetry and positive definiteness of the coefficient matrix of the linear equations of OSELM-PRFF, and factorizes the matrix in Cholesky form to solve the equations in every prediction step. On this basis, a new solving method for OSELM-PRFF, i.e., CF-OSELM-PRFF, is developed. The proposed method is fast and reliable; it is very appropriate for fast, and even real-time, modelling of time-varying nonlinear systems.
The regularization term in CF-OSELM-PRFF does not decay over time, but a constant regularization parameter makes CF-OSELM-PRFF deficient in terms of adaptability. Therefore, it would be worthwhile to design a highly efficient method to adjust the regularization parameter.

Author Contributions

X.Z. and X.K. conceived and developed the algorithm; X.K. designed the experiments; X.Z. performed the experiments and analyzed the results; X.Z. and X.K. wrote the manuscript.

Funding

The work is supported by the Hunan Provincial Science and Technology Foundation of China (2011FJ6033); the National Natural Science Foundation of China (No. 61502540); National Science Foundation of Hunan Province (No. 2019JJ40406).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Park, J.; Sandberg, I.W. Universal approximation using radial basis function networks. Neural Comput. 1991, 3, 246–257. [Google Scholar] [CrossRef] [PubMed]
  2. Huang, G.B.; Chen, Y.Q.; Babri, H.A. Classification ability of single hidden layer feedforward neural networks. IEEE Trans. Neural Netw. 2000, 11, 799–801. [Google Scholar] [CrossRef] [PubMed]
  3. Ferrari, S.; Stengel, R.F. Smooth function approximation using neural networks. IEEE Trans. Neural Netw. 2005, 16, 24–38. [Google Scholar] [CrossRef] [PubMed]
  4. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: a new learning scheme of feedforward neural networks. In Proceedings of the international joint conference on neural networks, Budapest, Hungary, 25–29 July 2004. [Google Scholar]
  5. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  6. Wu, Y.; Liu, D.; Jiang, H. Length-changeable incremental extreme learning machine. J. Comput. Sci. Technol. 2017, 32, 630–643. [Google Scholar] [CrossRef]
  7. Zhu, C.; Zou, B.; Zhao, R.; Cui, J.; Duan, X.; Chen, Z.; Liang, Y. Retinal vessel segmentation in colour fundus images using Extreme Learning Machine. Comput. Med. Imag. Gr. 2017, 55, 68–77. [Google Scholar] [CrossRef] [PubMed]
  8. Liu, H.; Tian, H.Q.; Li, Y.F. Four wind speed multi-step forecasting models using extreme learning machines and signal decomposing algorithms. Energy Convers. Manag. 2015, 100, 16–22. [Google Scholar] [CrossRef]
  9. Mi, X.W.; Liu, H.; Li, Y.F. Wind speed forecasting method using wavelet, extreme learning machine and outlier correction algorithm. Energy Convers. Manag. 2017, 151, 709–722. [Google Scholar] [CrossRef]
  10. Sattar, A.M.A.; Ertuğrul, Ö. F.; Gharabaghi, B.; McBean, E.A.; Cao, J. Extreme learning machine model for water network management. Neural Comput. Appl. 2019, 31, 157–169. [Google Scholar] [CrossRef]
  11. Yang, Y.; Lin, X.; Miao, Z.; Yuan, X.; Wang, Y. Predictive Control Strategy Based on Extreme Learning Machine for Path-Tracking of Autonomous Mobile Robot. Intell. Auto. Soft Comput. 2015, 21, 1–19. [Google Scholar] [CrossRef]
  12. Salmeron, J.L.; Ruiz-Celma, A. Elliot and Symmetric Elliot Extreme Learning Machines for Gaussian Noisy Industrial Thermal Modelling. Energies 2019, 12, 90. [Google Scholar] [CrossRef]
  13. Rodriguez, N.; Alvarez, P.; Barba, L.; Cabrera-Guerrero, G. Combining Multi-Scale Wavelet Entropy and Kernelized Classification for Bearing Multi-Fault Diagnosis. Entropy 2019, 21, 152. [Google Scholar] [CrossRef]
  14. Demertzis, K.; Tziritas, N.; Kikiras, P.; Sanchez, S.L.; Iliadis, L. The Next Generation Cognitive Security Operations Center: Adaptive Analytic Lambda Architecture for Efficient Defense against Adversarial Attacks. Big Data Cogn. Comput. 2019, 3, 6. [Google Scholar] [CrossRef]
  15. Sonobe, R. Parcel-Based Crop Classification Using Multi-Temporal TerraSAR-X Dual Polarimetric Data. Remote Sens. 2019, 11, 1148. [Google Scholar] [CrossRef]
  16. Salerno, V.M.; Rabbeni, G. An Extreme Learning Machine Approach to Effective Energy Disaggregation. Electronics 2018, 7, 235. [Google Scholar] [CrossRef]
  17. Kasun, L.L.C.; Zhou, H.; Huang, G.B.; Vong, C.M. Representational learning with ELMs for big data. IEEE Intell. Syst. 2013, 286, 31–34. [Google Scholar]
  18. Ding, S.; Zhang, N.; Xu, X.; Guo, L.; Zhang, J. Deep Extreme Learning Machine and Its Application in EEG Classification. Math. Probl. Eng. 2015. [Google Scholar] [CrossRef]
  19. Yang, Y.; Wu, Q.M.J. Multilayer extreme learning machine with subnetwork nodes for representation learning. IEEE Trans. Cybern. 2016, 46, 2570–2583. [Google Scholar] [CrossRef] [PubMed]
  20. Xiao, D.; Li, B.; Mao, Y. A Multiple Hidden Layers Extreme Learning Machine Method and Its Application. Math. Probl. Eng. 2017. [Google Scholar] [CrossRef]
  21. Xiao, D.; Li, B.; Zhang, S. An online sequential multiple hidden layers extreme learning machine method with forgetting mechanism. Chemom. Intell. Lab. Syst. 2018, 176, 126–133. [Google Scholar] [CrossRef]
  22. Yang, Y.; Wu, Q.M.J.; Wang, Y. Autoencoder with invertible functions for dimension reduction and image reconstruction. IEEE Trans. Syst. Man Cybern. Syst. 2018, 48, 1065–1079. [Google Scholar] [CrossRef]
  23. Yang, J.; Sun, W.; Liu, N.; Chen, Y.; Wang, Y.; Han, S. A Novel Multimodal Biometrics Recognition Model Based on Stacked ELM and CCA Methods. Symmetry 2018, 10, 96. [Google Scholar] [CrossRef]
  24. Liang, N.Y.; Huang, G.B.; Saratchandran, P.; Sundararajan, N. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. 2006, 17, 1411–1423. [Google Scholar] [CrossRef] [PubMed]
  25. Frances-Villora, J.V.; Rosado-Muñoz, A.; Bataller-Mompean, M.; Barrios-Aviles, J.; Guerrero-Martinez, J.F. Moving Learning Machine towards Fast Real-Time Applications: A High-Speed FPGA-Based Implementation of the OS-ELM Training Algorithm. Electronics 2018, 7, 308. [Google Scholar] [CrossRef]
  26. Huynh, H.T.; Won, Y. Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks. Patt. Recognit. Lett. 2011, 32, 1930–1935. [Google Scholar] [CrossRef]
  27. Guo, W.; Xu, T. Online sequential extreme learning machine with generalized regularization and forgetting mechanism. Control Decis. 2017, 32, 247–254. [Google Scholar]
  28. Guo, W.; Xu, T.; Tang, K.; Yu, J.; Chen, S. Online Sequential Extreme Learning Machine with Generalized Regularization and Adaptive Forgetting Factor for Time-Varying System Prediction. Math. Probl. Eng. 2018. [Google Scholar] [CrossRef]
  29. Deng, W.Y.; Zheng, Q.H.; Chen, L. Regularized extreme learning machine. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009. [Google Scholar]
  30. Ding, S.; Ma, G.; Shi, Z. A Rough RBF Neural Network Based on Weighted Regularized Extreme Learning Machine. Neural Process. Lett. 2014, 40, 245–260. [Google Scholar] [CrossRef]
  31. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2012, 42, 513–529. [Google Scholar] [CrossRef]
  32. Er, M.J.; Shao, Z.; Wang, N. A study on the randomness reduction effect of extreme learning machine with ridge regression. In Proceedings of the Advances in Neural Networks—ISNN 2013, 10th International Symposium on Neural Networks, Dalian, China, 4–6 July 2013. [Google Scholar]
  33. Shao, Z.; Er, M.J.; Wang, N. An effective semi-cross-validation model selection method for extreme learning machine with ridge regression. Neurocomputing 2015, 151, 933–942. [Google Scholar] [CrossRef]
  34. Zhang, X.; Wang, H.L. Selective forgetting extreme learning machine and its application to time series prediction. Acta Phys. Sinica 2011. [Google Scholar] [CrossRef]
  35. Du, Z.; Li, X.; Zheng, Z.; Zhang, G.; Mao, Q. Extreme learning machine based on regularization and forgetting factor and its application in fault prediction. Chinese J. Instrum. 2015, 36, 1546–1553. [Google Scholar]
  36. Zhang, H.; Zhang, S.; Yin, Y. Online Sequential ELM Algorithm with Forgetting Factor for Real Applications. Neurocomputing 2017, 261, 144–152. [Google Scholar] [CrossRef]
  37. Li, Y.; Zhang, S.; Yin, Y.; Xiao, W.; Zhang, J. A Novel Online Sequential Extreme Learning Machine for Gas Utilization Ratio Prediction in Blast Furnaces. Sensors 2017, 17, 1847. [Google Scholar] [CrossRef] [PubMed]
  38. Wu, Z.; Tang, H.; He, S.; Gao, J.; Chen, X.; To, S.; Li, Y.; Yang, Z. Fast dynamic hysteresis modeling using a regularized online sequential extreme learning machine with forgetting property. Int. J. Adv. Manuf. Technol. 2018, 94, 3473–3484. [Google Scholar] [CrossRef]
  39. Liu, D.; Wu, Y.; Jiang, H. FP-ELM: An online sequential learning algorithm for dealing with concept drift. Neurocomputing 2016, 207, 322–334. [Google Scholar] [CrossRef]
  40. Martin, R.S.; Peters, G.; Wilkinson, J.H. Symmetric decomposition of a positive definite matrix. Num. Math. 1965, 7, 362–383. [Google Scholar] [CrossRef]
  41. Narendra, K.; Parthasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1990, 1, 4–27. [Google Scholar] [CrossRef] [PubMed]
  42. Lorenz, E.N. Deterministic nonperiodic flows. J. Atmos. Sci. 1963, 20, 130–141. [Google Scholar] [CrossRef]
  43. Meng, Q.F.; Peng, Y.H.; Sun, J. The improved local linear prediction of chaotic time series. Chin. Phys. 2007, 16, 3220–3225. [Google Scholar]
  44. Rössler, O.E. An Equation for Continuous Chaos. Phys. Lett. A. 1976, 57, 397–398. [Google Scholar] [CrossRef]
  45. Peitgen, H.O.; Jürgens, H.; Saupe, D. Chaos and Fractals New Frontiers of Science, 2nd ed.; Springer: New York, NY, USA, 2004; pp. 636–646. [Google Scholar]
  46. Li, D.; Han, M.; Wang, J. Chaotic time series prediction based on a novel robust echo state network. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 787–799. [Google Scholar] [CrossRef] [PubMed]
  47. Applications of Machine Learning Group. Available online: https://research.cs.aalto.fi/aml/datasets.shtml (accessed on 17 May 2019).
