Symmetry | Article | Open Access

17 June 2019

Cholesky Factorization Based Online Sequential Extreme Learning Machines with Persistent Regularization and Forgetting Factor

School of Computer Science and Engineering, Central South University, Changsha 410083, China
* Author to whom correspondence should be addressed.

Abstract

The online sequential extreme learning machine with persistent regularization and forgetting factor (OSELM-PRFF) can avoid the potential singularity or ill-posed problems of online sequential regularized extreme learning machines with forgetting factors (FR-OSELM), and is particularly suitable for modelling in non-stationary environments. However, existing algorithms for OSELM-PRFF are either time-consuming or unstable in certain paradigms or parameter setups. This paper presents a novel algorithm for OSELM-PRFF, named “Cholesky factorization based” OSELM-PRFF (CF-OSELM-PRFF), which recurrently constructs a linear equation for the extreme learning machine and efficiently solves the equation via Cholesky factorization during every cycle. CF-OSELM-PRFF handles the timeliness of samples through a forgetting factor, and the regularization term in its cost function works persistently. CF-OSELM-PRFF can learn data one-by-one or chunk-by-chunk with a fixed or varying chunk size. Detailed performance comparisons between CF-OSELM-PRFF and relevant approaches are carried out on several regression problems. The numerical simulation results show that CF-OSELM-PRFF achieves higher computational efficiency than its counterparts and yields stable predictions.

1. Introduction

Single hidden-layer feedforward neural networks (SLFN) can approximate any function and form decision boundaries with arbitrary shapes if the activation function is chosen properly [,,]. To train SLFN quickly, Huang et al. proposed a learning algorithm called “Extreme Learning Machine” (ELM), which randomly assigns the hidden node parameters and then determines the output weights by the Moore–Penrose generalized inverse [,,]. ELM has been successfully applied to many real-world applications, such as retinal vessel segmentation [], wind speed forecasting [,], water network management [], path-tracking of autonomous mobile robots [], modelling of drying processes [], bearing fault diagnosis [], cybersecurity defense frameworks [], crop classification [], and energy disaggregation []. In recent years, ELM has been extended to multilayer ELMs, which play an important role in the deep learning domain [,,,,,,].
The original ELM is a batch learning algorithm; all samples must be available before ELM trains the SLFN. Whenever new data arrive, ELM has to gather the old and new data together and retrain the SLFN to incorporate the new information. This is a very time-consuming process, and is even computationally infeasible in some applications where frequent and fast training, or even real-time training, is required. Moreover, hardware systems cannot provide enough memory to store an ever-increasing amount of training data. To deal with problems with sequential data, Liang et al. proposed an online sequential ELM (OS-ELM) to learn data one-by-one or chunk-by-chunk with a fixed or varying chunk size []. OS-ELM can be implemented in common programming languages and run on universal computing platforms. Moreover, in order to execute OS-ELM quickly, Frances-Villora et al. developed an FPGA-based implementation of a tailored OS-ELM algorithm [], which assumes a one-by-one training strategy. OS-ELM has been successfully adopted in some applications, but it still has some drawbacks. Firstly, OS-ELM may encounter ill-conditioning problems, resulting in fluctuating generalization performance of the SLFN, if the number of hidden nodes L in the SLFN is not set appropriately [,,]. Secondly, OS-ELM does not take the timeliness of samples into account, so it cannot be directly employed in time-varying or nonstationary environments.
As a variant of ELM, Regularized ELM (RELM) [,], which is equivalent to the constrained optimization-based ELM [] mathematically, can achieve better generalization performance than ELM, can greatly reduce the randomness effect in ELM [,], and is less sensitive to L. Furthermore, several online sequential RELMs have been developed by researchers. Huynh and Won proposed ReOS-ELM []. Despite the widespread application of ReOS-ELM, it does not consider the timeliness of samples. To take this into account, Zhang et al. and Du et al. separately designed online sequential RELM with a forgetting factor, viz., SF-ELM [] and RFOS-ELM []; Guo and Xu referred to them as FR-OSELM [,]. After stating the real optimization cost function in FR-OSELM and theoretically analyzing FR-OSELM, Guo and Xu pointed out that the regularization term in the cost function of FR-OSELM will be forgotten and tends to zero as time passes; thus, FR-OSELM will probably run into ill-conditioning problems and become unstable after a long period. Incidentally, a similar or the same optimization cost function, or recursive solution in which the regularization term wanes gradually with time, is still utilized in [,,].
Recently, online sequential extreme learning machines with persistent regularization and forgetting factors (OSELM-PRFF) were put forward [,,]; these can avoid the potential singularity or ill-posed problem of FR-OSELM. Moreover, two kinds of recursive calculation schemes for OSELM-PRFF have been developed. One is FP-ELM, which directly calculates the precise inverse of a matrix during every update of the model []; the other includes FGR-OSELM [] and AFGR-OSELM [], which recursively compute an approximate inverse of the involved matrix to reduce the computational burden. These online sequential RELMs have been applied successfully in several examples. However, although the recursive calculation of the approximate inverse matrix in FGR-OSELM and AFGR-OSELM enhances efficiency, it may cause FGR-OSELM and AFGR-OSELM to be unreliable in certain paradigms or parameter setups. Additionally, the direct calculation of the precise inverse matrix makes FP-ELM inefficient.
The reliability and time efficiency of online learning algorithms are two important indexes in general. In real-time applications, such as stock forecasting, modelling of controlled objects and signal processing, the computational efficiency of the online training algorithm for SLFN is a crucial factor. Here, a new online sequential extreme learning machine with persistent regularization and forgetting factor using Cholesky factorization (CF-OSELM-PRFF) is presented. This paper analyzes and proves the symmetry and positive definiteness of the coefficient matrix of the linear equations of OSELM-PRFF. The presented method decomposes the coefficient matrix in Cholesky form during every model-updating period, transforms the linear equations into two linear equations with lower and upper triangular coefficient matrices, respectively, and applies forward substitution and backward substitution to solve them. The computational efficiency and prediction accuracy of CF-OSELM-PRFF are evaluated on process identification, classical time series prediction, and real electricity load forecasting tasks. The numerical experiments indicate that CF-OSELM-PRFF runs faster than several other representative methods, and can provide accurate predictions.
The rest of this paper is organized as follows. Section 2 gives a brief review of RELM, FR-OSELM and the existing OSELM-PRFF. Section 3 proposes CF-OSELM-PRFF. Performance evaluation is conducted in Section 4. Finally, conclusions are given in Section 5.

3. Proposed CF-OSELM-PRFF

FP-ELM is a stable online sequential training algorithm for SLFN which takes the timeliness of samples into consideration and can circumvent the potential phenomenon of data saturation. However, the calculation of the inverse of λI+Kk in Equation (15) is time-consuming in every work period. FGR-OSELM calculates an approximate inverse recursively by Equations (18) and (19), which saves time but may render the algorithm unstable. In order to speed up FP-ELM, this work proposes an approach to quickly solve for βk using a Cholesky decomposition. The complete algorithm is termed CF-OSELM-PRFF and is described in the sequel.
Let
$$Q_0 = H_0^T Y_0,$$
$$Q_k = \mu_k Q_{k-1} + H_k^T Y_k.$$
Then, Equation (11) can be rewritten as
$$(\lambda I + K_k)\,\beta_k = Q_k.$$
Proposition 1.
The matrix λI+Kk is a symmetric positive definite matrix.
Proof. Symmetry. Apparently, K0 is symmetric. Assume Kk-1 is symmetric; then
$$K_k^T = \left(\mu_k K_{k-1} + H_k^T H_k\right)^T = \mu_k K_{k-1}^T + H_k^T H_k = K_k.$$
According to mathematical induction, for any k, Kk is symmetric. As a result, λI+Kk is symmetric.
Positive definiteness. For any ζ=[ζ1, ζ2, …, ζL]T≠0, it holds that
$$\zeta^T K_0 \zeta = \zeta^T H_0^T H_0 \zeta = (H_0\zeta)^T (H_0\zeta) = \left(\sum_{i=1}^{L}\zeta_i\,G(a_i,b_i,x_1)\right)^2 + \left(\sum_{i=1}^{L}\zeta_i\,G(a_i,b_i,x_2)\right)^2 + \cdots + \left(\sum_{i=1}^{L}\zeta_i\,G(a_i,b_i,x_{N_0})\right)^2 \ge 0.$$
Suppose Kk-1 is positive semi-definite, that is,
$$\zeta^T K_{k-1}\,\zeta \ge 0,$$
then,
$$\zeta^T \mu_k K_{k-1}\,\zeta \ge 0.$$
Similar to Equation (25), it holds that
$$\zeta^T H_k^T H_k\,\zeta \ge 0,$$
Then,
$$\zeta^T K_k \zeta = \zeta^T\left(\mu_k K_{k-1} + H_k^T H_k\right)\zeta = \zeta^T \mu_k K_{k-1}\zeta + \zeta^T H_k^T H_k \zeta \ge 0.$$
Additionally,
$$\zeta^T \lambda I \zeta = \lambda \sum_{i=1}^{L}\zeta_i^2 > 0.$$
In conclusion, λI+Kk is a symmetric positive definite matrix. □
Denote B = λI+Kk; then B can be uniquely factorized into Cholesky form, i.e., $B = U^T U$, where U is an upper triangular matrix. U can be calculated by the following formulas []:
$$u_{ii} = \left(b_{ii} - \sum_{d=1}^{i-1} u_{di}^2\right)^{1/2} = \left(\lambda + K_k(i,i) - \sum_{d=1}^{i-1} u_{di}^2\right)^{1/2}, \quad i = 1, \ldots, L,$$
$$u_{ij} = \left(b_{ij} - \sum_{d=1}^{i-1} u_{di}\,u_{dj}\right)\Big/ u_{ii} = \left(K_k(i,j) - \sum_{d=1}^{i-1} u_{di}\,u_{dj}\right)\Big/ u_{ii}, \quad j = i+1, \ldots, L.$$
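For illustration, a minimal NumPy sketch of Equations (31) and (32) might look as follows; this is not the authors' implementation (the original experiments were run in MATLAB), and the function name is an assumption.

```python
# Minimal sketch of Equations (31)-(32): computing the upper-triangular Cholesky
# factor U of B = lambda*I + K_k, column by column.
import numpy as np

def cholesky_upper(B):
    """Return upper-triangular U with B = U^T U, assuming B is symmetric positive definite."""
    L_dim = B.shape[0]
    U = np.zeros_like(B, dtype=float)
    for i in range(L_dim):
        # Equation (31): diagonal entry u_ii
        U[i, i] = np.sqrt(B[i, i] - np.dot(U[:i, i], U[:i, i]))
        for j in range(i + 1, L_dim):
            # Equation (32): off-diagonal entries u_ij, j > i
            U[i, j] = (B[i, j] - np.dot(U[:i, i], U[:i, j])) / U[i, i]
    return U
```

For a symmetric positive definite B, the result coincides with numpy.linalg.cholesky(B).T.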
Equation (23) can be solved by the following two equations:
$$U^T P = Q_k,$$
$$U \beta_k = P.$$
Denote Qk = [q1, q2, …, qL]T. Using forward substitution and back substitution, the solution to Equation (23), viz., the coefficient vector βk in Equation (1), can be obtained as follows:
$$p_i = \left(q_i - \sum_{d=1}^{i-1} u_{di}\,p_d\right)\Big/ u_{ii}, \quad i = 1, \ldots, L,$$
$$\beta_{k,i} = \left(p_i - \sum_{d=i+1}^{L} u_{id}\,\beta_{k,d}\right)\Big/ u_{ii}, \quad i = L, \ldots, 1.$$
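The two triangular systems above can be solved directly; the following sketch (assumed names, single-output case so that Qk is a vector, not the authors' code) implements Equations (35) and (36).

```python
# Sketch of Equations (35)-(36): forward substitution for U^T P = Q_k,
# then back substitution for U beta_k = P.
import numpy as np

def solve_cholesky(U, Q):
    """Solve (U^T U) beta = Q given the upper-triangular Cholesky factor U."""
    L_dim = U.shape[0]
    P = np.zeros(L_dim)
    beta = np.zeros(L_dim)
    for i in range(L_dim):
        # Equation (35): forward substitution with the lower-triangular U^T
        P[i] = (Q[i] - np.dot(U[:i, i], P[:i])) / U[i, i]
    for i in range(L_dim - 1, -1, -1):
        # Equation (36): back substitution with the upper-triangular U
        beta[i] = (P[i] - np.dot(U[i, i + 1:], beta[i + 1:])) / U[i, i]
    return beta
```

Once the factor U is available, these two substitutions cost only O(L^2) operations per update.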
The CF-OSELM-PRFF algorithm can be summarized as follows.
Step 1: Preparation.
(1) Choose the hidden output function G(a, b, x) of the SLFN and the number of hidden nodes L; determine λ, μ.
(2) Randomly assign hidden parameters (ai, bi), i=1,2,…,L.
Step 2: Initialization.
(1) Acquire the initial data chunk S0 (N0 ≥ 2).
(2) Calculate H0, Y0.
(3) Calculate K0 by Equation (12), calculate Q0 by Equation (21).
(4) Calculate Cholesky factor U of K0 by Equations (31) and (32).
(5) Calculate β0 by Equations (35) and (36).
(6) For the input x, the predicted output value is $\hat{y} = h(x)\,\beta_0$.
Step 3: Online modeling and prediction, i.e., repeat the following substeps.
(1) Acquire the kth (k ≥ 1) data chunk Sk
(2) Calculate Hk, Yk.
(3) Calculate Kk by Equation (14), calculate Qk by Equation (22).
(4) Calculate U of Kk by Equations (31) and (32).
(5) Calculate βk by Equations (35) and (36).
(6) For the input x, the predicted output value is $\hat{y} = h(x)\,\beta_k$.
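To make the recursion concrete, the following Python sketch implements Steps 2 and 3, using library routines (scipy.linalg) in place of the explicit formulas (31)-(36). The sigmoid hidden layer follows Section 4, but the placeholder random data and all variable names are illustrative assumptions, not the authors' MATLAB code.

```python
# Illustrative sketch of the CF-OSELM-PRFF recursion (Steps 2-3).
import numpy as np
from scipy.linalg import cholesky, solve_triangular

rng = np.random.default_rng(0)
L, n_in, lam, mu = 25, 3, 1e-3, 0.99          # hidden nodes, inputs, lambda, forgetting factor
A = rng.uniform(-1, 1, (L, n_in))              # random input weights a_i
b = rng.uniform(-1, 1, L)                      # random biases b_i

def hidden(X):
    """Hidden-layer output matrix H with sigmoid nodes G(a, b, x)."""
    return 1.0 / (1.0 + np.exp(-(X @ A.T + b)))

def solve_beta(K, Q):
    """Solve (lambda*I + K) beta = Q via a Cholesky factorization."""
    U = cholesky(lam * np.eye(L) + K, lower=False)       # B = U^T U
    P = solve_triangular(U, Q, trans='T', lower=False)    # U^T P = Q
    return solve_triangular(U, P, lower=False)             # U beta = P

# Step 2: initialization with the first chunk (X0, Y0); placeholder data only.
X0, Y0 = rng.uniform(-1, 1, (50, n_in)), rng.uniform(-1, 1, (50, 1))
H0 = hidden(X0)
K, Q = H0.T @ H0, H0.T @ Y0                    # Equations (12) and (21)
beta = solve_beta(K, Q)

# Step 3: online updates, chunk by chunk (placeholder chunks).
for Xk, Yk in [(rng.uniform(-1, 1, (10, n_in)), rng.uniform(-1, 1, (10, 1))) for _ in range(5)]:
    Hk = hidden(Xk)
    K = mu * K + Hk.T @ Hk                     # Equation (14)
    Q = mu * Q + Hk.T @ Yk                     # Equation (22)
    beta = solve_beta(K, Q)
    y_hat = hidden(Xk) @ beta                  # predictions for the current inputs
```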

4. Experimental Results and Analysis

In this section, the performance of the presented CF-OSELM-PRFF is verified on a time-varying nonlinear process identification task, two chaotic time series, and one electricity demand prediction problem. These simulations evaluate the computational complexity (or running time) and accuracy of CF-OSELM-PRFF by comparison with FP-ELM, FGR-OSELM, AFGR-OSELM and FOS-MELM []. FOS-MELM is an online sequential multiple hidden layers extreme learning machine with a forgetting mechanism, which was recently proposed by Xiao et al. To make the results of FOS-MELM more stable, a regularization term is introduced into its solving process according to [].
For these online algorithms, the common regularization parameter λ is set to 0.001, and the forgetting factor is set to μ = 0.99. For AFGR-OSELM, the adaptive forgetting factor is tuned in the interval [0.8, 0.999] with an initial value of 0.995, and the other specific parameters are set according to [].
The output of a hidden node with respect to the input x of the SLFN in Equation (1) is set as the sigmoid function, i.e., G(a, b, x)=1/(1+exp(-(a · x +b))); the components of a, i.e., the input weights, and the bias b are randomly chosen from the range [-1,1]. In particular, the hyperbolic tangent function G(a, b, x)=(1-exp(-(a · x+b)))/(1+exp(-(a · x +b))) is selected as the activation function in FOS-MELM.
For FOS-MELM, every training data chunk contains only one sample, and each sample remains valid for s time units; the parameter s is set as s = N0.
In order to observe performance of these approaches under various situations, the number L of hidden nodes of SLFN is set as 25, 50, 100, 150, 200, and the corresponding number N0 of initial training samples is assigned to 50, 100, 200, 200, 200, respectively.
The root mean square error (RMSE) of prediction is regarded as measurement index of model accuracy.
$$\mathrm{RMSE} = \sqrt{\frac{1}{t_2 - t_1 + 1}\sum_{i=t_1}^{t_2}\left(\hat{y}_i - y_i\right)^2}$$
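For completeness, the RMSE above can be computed with a short helper such as the following sketch (illustrative only; the function name and 0-based indexing are assumptions).

```python
# Sketch of the RMSE over prediction steps t1..t2 inclusive.
import numpy as np

def rmse(y_hat, y, t1, t2):
    err = np.asarray(y_hat[t1:t2 + 1]) - np.asarray(y[t1:t2 + 1])
    return np.sqrt(np.mean(err ** 2))
```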
The relative efficiencies of CF-OSELM-PRFF to its counterparts are measured by speedup ratios. The speedups of CF-OSELM-PRFF to the other related methods are defined as
$$\mathrm{speedup}_1 = \frac{\text{total running time of FP-ELM}}{\text{total running time of CF-OSELM-PRFF}}$$
$$\mathrm{speedup}_2 = \frac{\text{total running time of FGR-OSELM}}{\text{total running time of CF-OSELM-PRFF}}$$
$$\mathrm{speedup}_3 = \frac{\text{total running time of AFGR-OSELM}}{\text{total running time of CF-OSELM-PRFF}}$$
$$\mathrm{speedup}_4 = \frac{\text{total running time of FOS-MELM}}{\text{total running time of CF-OSELM-PRFF}}$$
All the performance assessments were carried out in MATLAB R2010b 32-bit environment running on Windows 7 32-bit with Intel Core i3-3220 3.3 GHz CPU and 4 GB RAM.

4.1. Time-varying Nonlinear Process Identification

The identified unknown system is a modified version of the one addressed in []; by changing the constant and the coefficients of variables, the time-varying system is expressed as follows:
$$y(k+1) = \begin{cases} \dfrac{y(k)}{1 + y(k)^2} + u(k)^3, & k \le 100,\\[2mm] \dfrac{2\,y(k)}{2 + y(k)^2} + u(k)^3, & 100 < k \le 300,\\[2mm] \dfrac{y(k)}{1 + 2\,y(k)^2} + 2\,u(k)^3, & 300 < k. \end{cases}$$
The system (42) can be expressed as follows:
$$y(k) = f(x(k)),$$
where f(x) is a nonlinear function, x(k) is the regression input data vector
$$x(k) = \left[\,y(k-1),\, y(k-2), \ldots, y(k-n_y);\; u(k-n_d),\, u(k-n_d-1), \ldots, u(k-n_u)\,\right],$$
with ny, nd and nu being model structure parameters; they are set as ny = 3, nd = 1 and nu = 2 here. When the SLFN is applied to approximate the unknown function f, (x(k), y(k)) is the learning sample (xk, yk) of the SLFN.
Denote k0 = N0+max (ny, nu) − nd, k1 = k0 + 500. The system input is set as follows:
$$u(k) = \begin{cases} \mathrm{rand}() - 0.5, & k \le k_0,\\ \sin\!\left(2\pi (k - k_0)/120\right), & k_0 < k \le k_1,\\ \sin\!\left(2\pi (k - k_1)/50\right), & k_1 < k, \end{cases}$$
where rand() generates random numbers uniformly distributed in the interval (0,1).
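For concreteness, the following sketch generates the process output and the regression samples; the zero initial condition, variable names, and number of steps are assumptions made for illustration, not details taken from the paper.

```python
# Illustrative generation of training data for the time-varying process above
# with the piecewise input u(k); N0, ny, nd, nu follow the text.
import numpy as np

rng = np.random.default_rng(1)
N0, ny, nd, nu = 50, 3, 1, 2
k0 = N0 + max(ny, nu) - nd
k1 = k0 + 500
steps = 3000

def u_input(k):
    if k <= k0:
        return rng.random() - 0.5                     # rand() - 0.5
    if k <= k1:
        return np.sin(2 * np.pi * (k - k0) / 120)
    return np.sin(2 * np.pi * (k - k1) / 50)

u = np.array([u_input(k) for k in range(steps + 1)])
y = np.zeros(steps + 1)                               # assumed zero initial condition
for k in range(steps):
    if k <= 100:
        y[k + 1] = y[k] / (1 + y[k] ** 2) + u[k] ** 3
    elif k <= 300:
        y[k + 1] = 2 * y[k] / (2 + y[k] ** 2) + u[k] ** 3
    else:
        y[k + 1] = y[k] / (1 + 2 * y[k] ** 2) + 2 * u[k] ** 3

# Learning samples (x(k), y(k)): regressors y(k-1), y(k-2), y(k-3), u(k-1), u(k-2)
X = np.array([[y[k - 1], y[k - 2], y[k - 3], u[k - 1], u[k - 2]] for k in range(ny, steps + 1)])
T = y[ny:steps + 1]
```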
The simulations are carried out for different numbers of steps and hidden nodes. Efficiency comparisons between CF-OSELM-PRFF and FP-ELM, FGR-OSELM, AFGR-OSELM, together with FOS-MELM, are listed in Table 1. Due to the randomness of the parameters a, b and of u(k) during the initial stage (k ≤ k0), along with the intrinsic uncertainty of the computing environment, the simulation results inevitably vary. Consequently, for each case, every running time is a mean over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is marked in bold.
Table 1. Efficiency comparison between CF-OSELM-PRFF and its counterparts on identification of process (42) with input (45).
Table 1 shows that, with the same number of hidden nodes and the same number of simulation prediction steps, CF-OSELM-PRFF statistically costs the least time among the five approaches; therefore, CF-OSELM-PRFF has an obvious speed advantage over FP-ELM. Moreover, as the number of hidden nodes increases, the speedup tends to become larger. FOS-MELM trains a three-hidden-layer feedforward neural network and consists of complex calculation steps; thus, it costs the most time.
Table 2 displays a prediction RMSE comparison of CF-OSELM-PRFF to its counterparts. Every RMSE is also an average over 5 independent trials of each algorithm performing the set number of steps. In each set of results, the best one is marked in bold. From Table 2, it can be seen that there is no apparent difference in predictive accuracy among FP-ELM, FOS-MELM and CF-OSELM-PRFF; in some setups, FGR-OSELM and AFGR-OSELM can provide satisfactory, or even the best, forecasts, but they cannot work optimally in certain cases. When the number of simulation prediction steps is set to 2500 or 3000, FGR-OSELM sometimes produces bad or low-accuracy predictions in the later stages of the simulation process, and therefore the RMSE of FGR-OSELM becomes very large. Additionally, when the number of hidden nodes is set to 100, 150 or 200 and the number of simulation prediction steps is set to 1000 or more, AFGR-OSELM usually runs unstably and produces a very large RMSE. The reason is that, in the recursive formulas of FGR-OSELM and AFGR-OSELM, the approximate calculation of the inverse of a related matrix yields errors which may propagate and accumulate; as a result, the algorithm is apt to produce unreliable results.
Table 2. Prediction RMSE comparison of CF-OSELM-PRFF to its counterparts on identification of process (42) with input (45).
To intuitively observe and compare the accuracy and stability of these online sequential RELMs, fix L = 25, N0 = 50, and simulation prediction steps = 3000; execute FGR-OSELM repeatedly until a certain unstable predictive scenario occurs, plot the corresponding prediction error (predicted value minus real value) curves in Figure 1a, and save the current a, b values and the initial u(k) (k ≤ k0) signal, which are referred to as the adverse a, b values and initial u(k) signal. Subsequently, execute FP-ELM, CF-OSELM-PRFF and FOS-MELM with the adverse a, b values and initial u(k) signal, respectively, and plot the prediction error curves of the three approaches in Figure 1b, c and d. Clearly, Figure 1a shows that the prediction errors of FGR-OSELM become very large and completely meaningless when the prediction step exceeds a certain limit. There are a few large peaks at certain instants which reveal the instability of FGR-OSELM arising from its recursive approximate calculation. In order to explicitly exhibit the variation of the prediction error of FGR-OSELM, only the partial results within the first 2453 steps are presented. Additionally, Figure 1b, c and d show that the prediction effect of CF-OSELM-PRFF is similar to that of FP-ELM and FOS-MELM; in other words, they possess the same ideal predictive performance.
Figure 1. Prediction error curves of relevant approaches with an adverse parameters setting of FGR-OSELM on identification of process (42): (a) Prediction error curve of FGR-OSELM; (b) That of FP-ELM; (c) That of CF-OSELM-PRFF; (d) That of FOS-MELM.
Set L = 100, N0 = 200, and simulation prediction steps = 3000; execute AFGR-OSELM repeatedly until a certain unstable predictive scenario occurs, plot the corresponding prediction error curves in Figure 2a, and save the current a, b values and the initial u(k) (k ≤ k0) signal, i.e., the adverse a, b values and initial u(k) signal. Then, execute FP-ELM, CF-OSELM-PRFF and FOS-MELM with the adverse a, b values and initial u(k) signal, respectively, and plot the prediction error curves of the three approaches in Figure 2b, c and d. Clearly, Figure 2a shows that the prediction errors of AFGR-OSELM become larger at the 872nd step, reaching −67.7839. Actually, at the 878th step, the prediction error of AFGR-OSELM suddenly reaches −25286.1257. Excessively large prediction errors are not marked in Figure 2a; thus, only the partial results prior to the 878th step are presented. Additionally, Figure 2b and c show that CF-OSELM-PRFF possesses the same excellent predictive performance as FP-ELM. Figure 2d seems to show that FOS-MELM is slightly better than CF-OSELM-PRFF, but there is only a small difference between their RMSEs.
Figure 2. Prediction error curves of relevant approaches with an adverse parameters setting of AFGR-OSELM on identification of process (42): (a) Prediction error curve of AFGR-OSELM; (b) That of FP-ELM; (c) That of CF-OSELM-PRFF; (d) That of FOS-MELM.
The above experiments indicate that for the parameter settings with which FGR-OSELM or AFGR-OSELM produces larger prediction errors, CF-OSELM-PRFF, FP-ELM and FOS-MELM can run stably and provide satisfactory prediction.

4.2. Lorenz Time Series Prediction

The Lorenz time series is a three-dimensional dynamical system that exhibits chaotic flow, which is described by the following equations [,]:
$$\frac{dx(t)}{dt} = \sigma\,[\,y(t) - x(t)\,], \qquad \frac{dy(t)}{dt} = r\,x(t) - y(t) - x(t)\,z(t), \qquad \frac{dz(t)}{dt} = x(t)\,y(t) - b\,z(t),$$
where x(t), y(t), and z(t) are the values of time series at time t. A typical choice for the parameter values is σ = 10, r = 28, and b = 8/3.
In this example, observations of the continuous x(t), y(t), and z(t) at a sequence of sampling time points are of concern. Hence, the function ode45, a standard solver for ordinary differential equations in MATLAB, is applied to generate sampled values of x(t), y(t), and z(t). The routine implements an adaptive-step Runge–Kutta (4,5) method for efficient computation. The initial state is set as x(0) = 2, y(0) = 3, and z(0) = 4.
Let Ts denote the sampling period. At each sampling time point kTs, ([x(kTs-Ts), x(kTs-2Ts), ..., x(kTs-nyTs)], x(kTs)) is a training sample for the ELM. In one-step-ahead time series prediction, the common way is to use xk+1 = [x(kTs), x(kTs-Ts), ..., x(kTs-(ny-1)Ts)] to calculate the predicted value of x(kTs+Ts), namely, $\hat{x}(kT_s+T_s)$. In the simulation, the sampling period is set as Ts = 0.02, the embedding dimension is chosen as ny = 3, and the x-coordinate of the Lorenz time series is considered for prediction.
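A Python analogue of this sampling procedure is sketched below; it uses scipy's solve_ivp (an adaptive Runge-Kutta 4(5) solver) in place of MATLAB's ode45, and the tolerances and sample count are assumptions made for illustration.

```python
# Sketch: sample the Lorenz x-coordinate and build one-step-ahead training samples.
import numpy as np
from scipy.integrate import solve_ivp

sigma, r, b = 10.0, 28.0, 8.0 / 3.0
Ts, n_samples, ny = 0.02, 3500, 3

def lorenz(t, s):
    x, y, z = s
    return [sigma * (y - x), r * x - y - x * z, x * y - b * z]

t_eval = np.arange(n_samples) * Ts
sol = solve_ivp(lorenz, (0.0, t_eval[-1]), [2.0, 3.0, 4.0],
                t_eval=t_eval, rtol=1e-9, atol=1e-9)
x = sol.y[0]                                  # sampled x(kTs)

# Inputs [x(kTs), x(kTs-Ts), x(kTs-2Ts)], target x(kTs+Ts)
X = np.array([x[k - ny + 1:k + 1][::-1] for k in range(ny - 1, n_samples - 1)])
T = x[ny:]
```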
In order to verify the computational efficiency of CF-OSELM-PRFF compared to FP-ELM, FGR-OSELM, AFGR-OSELM and FOS-MELM, the running times of these algorithms and speedups of CF-OSELM-PRFF to the other algorithms are tabulated in Table 3. Every running time is a mean over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. As seen in Table 3, CF-OSELM-PRFF clearly outperforms the other algorithms in terms of speed.
Table 3. Efficiency comparison between CF-OSELM-PRFF and its counterparts on prediction of time series (46).
Table 4 shows the prediction RMSE of five methods. Every RMSE is also an average value over 5 independent trials. In each set of results, the best one is marked in bold. As shown in Table 4, on the whole, the prediction behaviors of these methods in this simulation are basically similar to those in the first simulation. When the number of simulation prediction steps is set to 3000, FGR-OSELM occasionally produces poor predictions in the later stage of the simulation process; thus, RMSE of FGR-OSELM becomes larger. Unexpectedly, in many cases, AFGR-OSELM cannot provide reasonable predictions.
Table 4. RMSE comparison of CF-OSELM-PRFF to its counterparts on prediction of time series (46).
Analogously, an intuitive comparison of the prediction results of these algorithms is also made. Fix L = 200, N0 = 200, and simulation prediction steps = 3000; run FGR-OSELM repeatedly until a certain unstable predictive scenario appears, plot the corresponding prediction error curves in Figure 3a, and save the current a, b values, i.e., the adverse a, b values. Subsequently, execute AFGR-OSELM, FP-ELM, CF-OSELM-PRFF and FOS-MELM with the adverse a, b values, respectively; their prediction error curves are shown in Figure 3b, c, d and e.
Figure 3. Prediction error curves of relevant approaches with an adverse parameters setting of FGR-OSELM on time series (46): (a) Prediction error curve of FGR-OSELM; (b) That of AFGR-OSELM; (c) That of FP-ELM; (d) That of CF-OSELM-PRFF; (e) That of FOS-MELM.
Figure 3a shows that the prediction errors of FGR-OSELM become very large when the prediction step exceeds a certain limit. In Figure 3b, only the curve from the 450th step to the 540th step is plotted, because after the 540th step AFGR-OSELM produces excessively large prediction errors at many time points. The recursive approximate calculations of FGR-OSELM and AFGR-OSELM result in instability in certain settings. Figure 3c and d show that the prediction error curve of CF-OSELM-PRFF is extremely similar to that of FP-ELM. Comparing Figure 3d and e, it can be found that the prediction result of CF-OSELM-PRFF is slightly better than that of FOS-MELM.
Although normalizing the time series values, i.e., the input data of the SLFN, into the interval [0,1] or [−1,1] can significantly improve the stability of FGR-OSELM and AFGR-OSELM, it is difficult to obtain the maximum and minimum values of the input data in some practical online modelling scenarios. Thus, normalization of the input data is sometimes infeasible. In this example, CF-OSELM-PRFF and FP-ELM can train the SLFN and provide satisfactory results using raw data; they are less susceptible to the range of the input data than FGR-OSELM and AFGR-OSELM.

4.3. Rössler Time Series Prediction

The Rössler system is one of the most famous chaotic systems, though it is an artificial system designed solely with the purpose of creating a model for a strange attractor [,,]. The Rössler time series is generated from the following differential equations:
$$\frac{dx(t)}{dt} = -y(t) - z(t), \qquad \frac{dy(t)}{dt} = x(t) + d\,y(t), \qquad \frac{dz(t)}{dt} = e + z(t)\,\left(x(t) - f\right),$$
where x(t), y(t), and z(t) are the values of time series at time t. d, e, f are the control parameters and they are set as d = 0.15, e = 0.2 and f = 10.
The way to generate sampled values of x(t), y(t), and z(t) is the same as that described in the previous example. The initial condition is set as x(0) = 0.05, y(0) = 0.05, and z(0) = 0.05. The sampling interval is set as Ts = 0.02, the embedding dimension is chosen as ny = 3, and the x-coordinate of the Rössler system is considered for prediction.
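Under the same assumptions as the Lorenz sketch above (Python solve_ivp in place of ode45, assumed tolerances), the Rössler samples can be generated analogously.

```python
# Sketch: sample the Rössler x-coordinate with the stated parameters and initial state.
import numpy as np
from scipy.integrate import solve_ivp

d, e, f = 0.15, 0.2, 10.0
Ts, n_samples = 0.02, 3500

def rossler(t, s):
    x, y, z = s
    return [-y - z, x + d * y, e + z * (x - f)]

t_eval = np.arange(n_samples) * Ts
sol = solve_ivp(rossler, (0.0, t_eval[-1]), [0.05, 0.05, 0.05],
                t_eval=t_eval, rtol=1e-9, atol=1e-9)
x = sol.y[0]    # sampled x(kTs), used to build one-step-ahead samples as before
```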
In this simulation, the experimental design is the same as that in the previous simulations. The running times of these algorithms and speedups of CF-OSELM-PRFF to other algorithms are recorded in Table 5. Every running time is an average over 5 independent trials. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. From Table 5, it is clear to see that CF-OSELM-PRFF is superior to other methods in terms of efficiency.
Table 5. Efficiency comparison between CF-OSELM-PRFF and its counterparts on prediction of time series (47).
Table 6 shows the prediction RMSE of the five methods. Every RMSE is an average over 5 independent trials. In each set of results, the best one is marked in bold. Different from the previous two experiments, here the chance that FGR-OSELM behaves unstably in the later segment of the simulation process is smaller; such instability is observed only when the number of simulation prediction steps is set to 3000. Moreover, AFGR-OSELM behaves stably with a larger probability; if AFGR-OSELM is run for only 5 successive times, unreasonable prediction results rarely appear. Thus, its performance has not been investigated in the following contrastive demonstration. Additionally, in many cases, FOS-MELM achieves the best results; as the number of nodes increases, FOS-MELM yields higher accuracy than CF-OSELM-PRFF at the expense of requiring more time.
Table 6. RMSE comparison of CF-OSELM-PRFF to its counterparts on prediction of time series (47).
Accordingly, for the case of L = 25, N0 = 50 and simulation prediction steps = 3000, FGR-OSELM, CF-OSELM-PRFF, FP-ELM and FOS-MELM are run with the same adverse a, b values, respectively, and their prediction error curves are plotted in Figure 4. Figure 4a shows that FGR-OSELM works well at first but fails afterwards. Figure 4b and c show that FP-ELM and CF-OSELM-PRFF provide almost the same good prediction results. Contrasting Figure 4c and d, it can be found that the error range of CF-OSELM-PRFF is smaller than that of FOS-MELM; in other words, the former yields better forecasts than the latter in the late stage.
Figure 4. Prediction error curves of relevant approaches with an adverse parameters setting of FGR-OSELM on time series (47): (a) Prediction error curve of FGR-OSELM; (b) That of FP-ELM; (c) That of CF-OSELM-PRFF; (d) That of FOS-MELM.

4.4. Experiment on Real Data Set

Electricity load forecasting plays an important part in the strategic management of electric power systems. Here, an electricity demand time series (EDTS) [] is utilized to test the performance of these online algorithms; the EDTS consists of a sequence of 15-minute averaged values of power demand. The first 3500 values of EDTS are shown in Figure 5.
Figure 5. EDTS.
Before training the model, the data are normalized into [−1,1] by Equation (48); after forecasting, the predicted values are denormalized by Equation (49).
$$y(k) \leftarrow \frac{2\,\left(y(k) - \min(y)\right)}{\max(y) - \min(y)} - 1,$$
$$\hat{y}(k) \leftarrow \frac{\hat{y}(k) + 1}{2}\,\left(\max(y) - \min(y)\right) + \min(y).$$
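A small sketch of the normalization (48) and denormalization (49) steps is given below; the function names are assumptions, and it presumes that the minimum and maximum of the series are available.

```python
# Sketch of min-max scaling to [-1, 1] and its inverse, matching Equations (48)-(49).
import numpy as np

def normalize(y, y_min, y_max):
    return 2.0 * (y - y_min) / (y_max - y_min) - 1.0

def denormalize(y_hat, y_min, y_max):
    return (y_hat + 1.0) / 2.0 * (y_max - y_min) + y_min
```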
In this example, the experiment is designed like the previous ones. The running times of these algorithms and the speedups of CF-OSELM-PRFF relative to its counterparts are recorded in Table 7. In each set of experiments with the same number of simulation steps and nodes, the best result is formatted in bold. From Table 7, it is clear that CF-OSELM-PRFF runs faster than the other methods.
Table 7. Efficiency comparison between CF-OSELM-PRFF and its counterparts on prediction of EDTS.
Table 8 shows the prediction RMSE of the five methods. In each set of results, the best one is marked in bold. CF-OSELM-PRFF produces nearly the same level of accuracy as FP-ELM, and statistically higher accuracy than FOS-MELM. FGR-OSELM runs unstably in only one case, but AFGR-OSELM does so in many cases. In addition, different from the previous example, FOS-MELM did not achieve the best results here; CF-OSELM-PRFF yields higher accuracy than FOS-MELM in many cases.
Table 8. RMSE comparison of CF-OSELM-PRFF to its counterparts on prediction of EDTS.

4.5. Discussion

The above experiments show that CF-OSELM-PRFF has greater time efficiency than several other related approaches; the speedup ratio of CF-OSELM-PRFF relative to the other approaches is mainly influenced by the number of hidden nodes of the SLFN. CF-OSELM-PRFF achieves a speedup of around 1.4 to 2.0 over FP-ELM. This speedup can facilitate its use in real applications, and even in real-time applications.
CF-OSELM-PRFF can provide the same predictive accuracy as FP-ELM, and better stability than FGR-OSELM and AFGR-OSELM. Additionally, the experiments also show that there is not an obvious difference between the predictive accuracy of CF-OSELM-PRFF and that of FOS-MELM. In the third simulation, FOS-MELM outperformed CF-OSELM-PRFF statistically, whereas, CF-OSELM-PRFF surpassed FOS-MELM in the next one.
CF-OSELM-PRFF can learn arriving data one-by-one or chunk-by-chunk without the need for storing the training samples accumulated thus far; it is suitable for storage capacity-constrained computing devices.
In the above experiments, CF-OSELM-PRFF adopted a fixed forgetting factor to reduce the contribution of old samples; in fact, it could incorporate variable forgetting factor techniques, such as that reported in [].

5. Conclusions

Regularization plays an important role in RELM, but in the cost function or recursive solution of many online sequential RELMs, the regularization effect will decay gradually over time. Fortunately, FP-ELM, FGR-OSELM and AFGR-OSELM can maintain persistent regularization effect throughout the whole learning process. They share the same cost function but employ different solving processes.
This paper makes full use of the symmetry and positive definiteness of the coefficient matrix of the linear equations of OSELM-PRFF, and factorizes the matrix in Cholesky form to solve the equations in every prediction step. On this basis, a new solving method for OSELM-PRFF, i.e., CF-OSELM-PRFF, is developed. The proposed method is fast and reliable; it is very appropriate for fast, and even real-time, modelling of time-varying nonlinear systems.
The regularization term in CF-OSELM-PRFF does not decay over time, but a constant regularization parameter makes CF-OSELM-PRFF deficient in terms of adaptability. Therefore, it would be worthwhile to design a highly efficient method to adjust the regularization parameter.

Author Contributions

X.Z. and X.K. conceived and developed the algorithm; X.K. designed the experiments; X.Z. performed the experiments and analyzed the results; X.Z. and X.K. wrote the manuscript.

Funding

The work is supported by the Hunan Provincial Science and Technology Foundation of China (2011FJ6033); the National Natural Science Foundation of China (No. 61502540); National Science Foundation of Hunan Province (No. 2019JJ40406).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Park, J.; Sandberg, I.W. Universal approximation using radial basis function networks. Neural Comput. 1991, 3, 246–257. [Google Scholar] [CrossRef] [PubMed]
  2. Huang, G.B.; Chen, Y.Q.; Babri, H.A. Classification ability of single hidden layer feedforward neural networks. IEEE Trans. Neural Netw. 2000, 11, 799–801. [Google Scholar] [CrossRef] [PubMed]
  3. Ferrari, S.; Stengel, R.F. Smooth function approximation using neural networks. IEEE Trans. Neural Netw. 2005, 16, 24–38. [Google Scholar] [CrossRef] [PubMed]
  4. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: a new learning scheme of feedforward neural networks. In Proceedings of the international joint conference on neural networks, Budapest, Hungary, 25–29 July 2004. [Google Scholar]
  5. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: theory and applications. Neurocomputing 2006, 70, 489–501. [Google Scholar] [CrossRef]
  6. Wu, Y.; Liu, D.; Jiang, H. Length-changeable incremental extreme learning machine. J. Comput. Sci. Technol. 2017, 32, 630–643. [Google Scholar] [CrossRef]
  7. Zhu, C.; Zou, B.; Zhao, R.; Cui, J.; Duan, X.; Chen, Z.; Liang, Y. Retinal vessel segmentation in colour fundus images using Extreme Learning Machine. Comput. Med. Imag. Gr. 2017, 55, 68–77. [Google Scholar] [CrossRef] [PubMed]
  8. Liu, H.; Tian, H.Q.; Li, Y.F. Four wind speed multi-step forecasting models using extreme learning machines and signal decomposing algorithms. Energy Convers. Manag. 2015, 100, 16–22. [Google Scholar] [CrossRef]
  9. Mi, X.W.; Liu, H.; Li, Y.F. Wind speed forecasting method using wavelet, extreme learning machine and outlier correction algorithm. Energy Convers. Manag. 2017, 151, 709–722. [Google Scholar] [CrossRef]
  10. Sattar, A.M.A.; Ertuğrul, Ö. F.; Gharabaghi, B.; McBean, E.A.; Cao, J. Extreme learning machine model for water network management. Neural Comput. Appl. 2019, 31, 157–169. [Google Scholar] [CrossRef]
  11. Yang, Y.; Lin, X.; Miao, Z.; Yuan, X.; Wang, Y. Predictive Control Strategy Based on Extreme Learning Machine for Path-Tracking of Autonomous Mobile Robot. Intell. Auto. Soft Comput. 2015, 21, 1–19. [Google Scholar] [CrossRef]
  12. Salmeron, J.L.; Ruiz-Celma, A. Elliot and Symmetric Elliot Extreme Learning Machines for Gaussian Noisy Industrial Thermal Modelling. Energies 2019, 12, 90. [Google Scholar] [CrossRef]
  13. Rodriguez, N.; Alvarez, P.; Barba, L.; Cabrera-Guerrero, G. Combining Multi-Scale Wavelet Entropy and Kernelized Classification for Bearing Multi-Fault Diagnosis. Entropy 2019, 21, 152. [Google Scholar] [CrossRef]
  14. Demertzis, K.; Tziritas, N.; Kikiras, P.; Sanchez, S.L.; Iliadis, L. The Next Generation Cognitive Security Operations Center: Adaptive Analytic Lambda Architecture for Efficient Defense against Adversarial Attacks. Big Data Cogn. Comput. 2019, 3, 6. [Google Scholar] [CrossRef]
  15. Sonobe, R. Parcel-Based Crop Classification Using Multi-Temporal TerraSAR-X Dual Polarimetric Data. Remote Sens. 2019, 11, 1148. [Google Scholar] [CrossRef]
  16. Salerno, V.M.; Rabbeni, G. An Extreme Learning Machine Approach to Effective Energy Disaggregation. Electronics 2018, 7, 235. [Google Scholar] [CrossRef]
  17. Kasun, L.L.C.; Zhou, H.; Huang, G.B.; Vong, C.M. Representational learning with ELMs for big data. IEEE Intell. Syst. 2013, 286, 31–34. [Google Scholar]
  18. Ding, S.; Zhang, N.; Xu, X.; Guo, L.; Zhang, J. Deep Extreme Learning Machine and Its Application in EEG Classification. Math. Probl. Eng. 2015. [Google Scholar] [CrossRef]
  19. Yang, Y.; Wu, Q.M.J. Multilayer extreme learning machine with subnetwork nodes for representation learning. IEEE Trans. Cybern. 2016, 46, 2570–2583. [Google Scholar] [CrossRef] [PubMed]
  20. Xiao, D.; Li, B.; Mao, Y. A Multiple Hidden Layers Extreme Learning Machine Method and Its Application. Math. Probl. Eng. 2017. [Google Scholar] [CrossRef]
  21. Xiao, D.; Li, B.; Zhang, S. An online sequential multiple hidden layers extreme learning machine method with forgetting mechanism. Chemom. Intell. Lab. Syst. 2018, 176, 126–133. [Google Scholar] [CrossRef]
  22. Yang, Y.; Wu, Q.M.J.; Wang, Y. Autoencoder with invertible functions for dimension reduction and image reconstruction. IEEE Trans. Syst. Man Cybern. Syst. 2018, 48, 1065–1079. [Google Scholar] [CrossRef]
  23. Yang, J.; Sun, W.; Liu, N.; Chen, Y.; Wang, Y.; Han, S. A Novel Multimodal Biometrics Recognition Model Based on Stacked ELM and CCA Methods. Symmetry 2018, 10, 96. [Google Scholar] [CrossRef]
  24. Liang, N.Y.; Huang, G.B.; Saratchandran, P.; Sundararajan, N. A fast and accurate online sequential learning algorithm for feedforward networks. IEEE Trans. Neural Netw. 2006, 17, 1411–1423. [Google Scholar] [CrossRef] [PubMed]
  25. Frances-Villora, J.V.; Rosado-Muñoz, A.; Bataller-Mompean, M.; Barrios-Aviles, J.; Guerrero-Martinez, J.F. Moving Learning Machine towards Fast Real-Time Applications: A High-Speed FPGA-Based Implementation of the OS-ELM Training Algorithm. Electronics 2018, 7, 308. [Google Scholar] [CrossRef]
  26. Huynh, H.T.; Won, Y. Regularized online sequential learning algorithm for single-hidden layer feedforward neural networks. Patt. Recognit. Lett. 2011, 32, 1930–1935. [Google Scholar] [CrossRef]
  27. Guo, W.; Xu, T. Online sequential extreme learning machine with generalized regularization and forgetting mechanism. Control Decis. 2017, 32, 247–254. [Google Scholar]
  28. Guo, W.; Xu, T.; Tang, K.; Yu, J.; Chen, S. Online Sequential Extreme Learning Machine with Generalized Regularization and Adaptive Forgetting Factor for Time-Varying System Prediction. Math. Probl. Eng. 2018. [Google Scholar] [CrossRef]
  29. Deng, W.Y.; Zheng, Q.H.; Chen, L. Regularized extreme learning machine. In Proceedings of the IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009. [Google Scholar]
  30. Ding, S.; Ma, G.; Shi, Z. A Rough RBF Neural Network Based on Weighted Regularized Extreme Learning Machine. Neural Process. Lett. 2014, 40, 245–260. [Google Scholar] [CrossRef]
  31. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme learning machine for regression and multiclass classification. IEEE Trans. Syst. Man Cybern. Part B Cybern. 2012, 42, 513–529. [Google Scholar] [CrossRef]
  32. Er, M.J.; Shao, Z.; Wang, N. A study on the randomness reduction effect of extreme learning machine with ridge regression. In Proceedings of the Advances in Neural Networks—ISNN 2013, 10th International Symposium on Neural Networks, Dalian, China, 4–6 July 2013. [Google Scholar]
  33. Shao, Z.; Er, M.J.; Wang, N. An effective semi-cross-validation model selection method for extreme learning machine with ridge regression. Neurocomputing 2015, 151, 933–942. [Google Scholar] [CrossRef]
  34. Zhang, X.; Wang, H.L. Selective forgetting extreme learning machine and its application to time series prediction. Acta Phys. Sinica 2011. [Google Scholar] [CrossRef]
  35. Du, Z.; Li, X.; Zheng, Z.; Zhang, G.; Mao, Q. Extreme learning machine based on regularization and forgetting factor and its application in fault prediction. Chinese J. Instrum. 2015, 36, 1546–1553. [Google Scholar]
  36. Zhang, H.; Zhang, S.; Yin, Y. Online Sequential ELM Algorithm with Forgetting Factor for Real Applications. Neurocomputing 2017, 261, 144–152. [Google Scholar] [CrossRef]
  37. Li, Y.; Zhang, S.; Yin, Y.; Xiao, W.; Zhang, J. A Novel Online Sequential Extreme Learning Machine for Gas Utilization Ratio Prediction in Blast Furnaces. Sensors 2017, 17, 1847. [Google Scholar] [CrossRef] [PubMed]
  38. Wu, Z.; Tang, H.; He, S.; Gao, J.; Chen, X.; To, S.; Li, Y.; Yang, Z. Fast dynamic hysteresis modeling using a regularized online sequential extreme learning machine with forgetting property. Int. J. Adv. Manuf. Technol. 2018, 94, 3473–3484. [Google Scholar] [CrossRef]
  39. Liu, D.; Wu, Y.; Jiang, H. FP-ELM: An online sequential learning algorithm for dealing with concept drift. Neurocomputing 2016, 207, 322–334. [Google Scholar] [CrossRef]
  40. Martin, R.S.; Peters, G.; Wilkinson, J.H. Symmetric decomposition of a positive definite matrix. Num. Math. 1965, 7, 362–383. [Google Scholar] [CrossRef]
  41. Narendra, K.; Parthasarathy, K. Identification and control of dynamical systems using neural networks. IEEE Trans. Neural Netw. 1990, 1, 4–27. [Google Scholar] [CrossRef] [PubMed]
  42. Lorenz, E.N. Deterministic nonperiodic flows. J. Atmos. Sci. 1963, 20, 130–141. [Google Scholar] [CrossRef]
  43. Meng, Q.F.; Peng, Y.H.; Sun, J. The improved local linear prediction of chaotic time series. Chin. Phys. 2007, 16, 3220–3225. [Google Scholar]
  44. Rössler, O.E. An Equation for Continuous Chaos. Phys. Lett. A. 1976, 57, 397–398. [Google Scholar] [CrossRef]
  45. Peitgen, H.O.; Jürgens, H.; Saupe, D. Chaos and Fractals New Frontiers of Science, 2nd ed.; Springer: New York, NY, USA, 2004; pp. 636–646. [Google Scholar]
  46. Li, D.; Han, M.; Wang, J. Chaotic time series prediction based on a novel robust echo state network. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 787–799. [Google Scholar] [CrossRef] [PubMed]
  47. Applications of Machine Learning Group. Available online: https://research.cs.aalto.fi/aml/datasets.shtml (accessed on 17 May 2019).
