# L_{1}-Norm Robust Regularized Extreme Learning Machine with Asymmetric C-Loss for Regression


## Abstract

An L_{1}-norm robust regularized extreme learning machine with asymmetric C-loss (L_{1}-ACELM) is presented to handle the overfitting problem. The proposed algorithm benefits from the L_{1} norm and replaces the squared loss function with the AC-loss function. L_{1}-ACELM can generate a more compact network with fewer hidden nodes and reduce the impact of noise. To evaluate the effectiveness of the proposed algorithm on noisy datasets, different levels of noise are added in the numerical experiments. The results on different types of artificial and benchmark datasets demonstrate that L_{1}-ACELM achieves better generalization performance than other state-of-the-art algorithms, especially when noise exists in the datasets.

## 1. Introduction

The L_{2} norm is sensitive to outliers. To reduce the influence of outliers, Rong et al. proposed the pruned extreme learning machine (P-ELM) [9], which can remove irrelevant hidden nodes. However, P-ELM is only applicable to classification problems. To further address the regression problem, the optimally pruned extreme learning machine (OP-ELM) [10] was proposed. In OP-ELM, the L_{1} norm is used to remove irrelevant output nodes and select the corresponding hidden nodes, and the weights of the selected hidden nodes are then calculated with the least squares method. Given that the L_{1} norm is robust to outliers, it has been used in various algorithms to improve generalization performance [11,12]. Balasundaram et al. [13] proposed the L_{1}-norm extreme learning machine, which produces sparse models such that the decision function can be determined with fewer hidden layer nodes. Generally speaking, RELM is composed of empirical risk and structural risk. Structural risk can effectively avoid overfitting, while the empirical risk is determined by the loss function. Traditional RELMs use the squared loss function, which is symmetric and unbounded. Symmetry prevents the model from accounting for the distribution characteristics of the training samples, while unboundedness makes the model sensitive to noise and outliers. In real life, data distributions are unbalanced, and noise is generally introduced during data collection. Therefore, choosing an appropriate loss function to construct the model is particularly important.
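The contrast between the unbounded squared loss and a bounded, asymmetric correntropy-type loss can be illustrated numerically. The functions below are illustrative sketches: `c_loss` is the standard correntropy-induced loss, and `asymmetric_c_loss` applies an expectile-style weight `tau` in the spirit of, but not identical to, the AC-loss defined later in this paper.

```python
import math

# The squared loss grows without bound as the residual grows, so a single
# outlier can dominate training. A correntropy-type loss saturates, which
# limits the influence of large residuals. The tau-weighted variant below
# treats positive and negative residuals differently (illustrative only,
# not the paper's exact AC-loss formula).

def squared_loss(r):
    return r ** 2

def c_loss(r, sigma=1.0):
    # Correntropy-induced loss: bounded in [0, 1].
    return 1.0 - math.exp(-r ** 2 / (2.0 * sigma ** 2))

def asymmetric_c_loss(r, sigma=1.0, tau=0.7):
    # Weight positive and negative residuals differently,
    # so unbalanced noise is penalized unevenly.
    w = tau if r >= 0 else 1.0 - tau
    return w * c_loss(r, sigma)
```

For a residual of 100, the squared loss contributes 10,000 to the objective while the bounded loss contributes at most 1, which is the intuition behind the robustness claims above.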

In this paper, a new asymmetric C-loss (AC-loss) function is constructed, and a robust regularized extreme learning machine with the AC-loss function and the L_{1}-norm (called L_{1}-ACELM) is then developed. The main contributions of this paper are as follows:

- (1)
- Based on the expectile penalty and the correntropy-induced loss function, a new loss function (AC-loss) is developed. The AC-loss retains important properties of the C-loss, such as non-convexity and boundedness. In addition, the AC-loss is asymmetric, which allows it to handle unbalanced noise.
- (2)
- A novel approach, the L_{1}-norm robust regularized extreme learning machine with asymmetric C-loss (L_{1}-ACELM), is proposed by applying the AC-loss function and the L_{1}-norm in the objective function of ELM to enhance robustness to outliers.
- (3)
- The half-quadratic optimization algorithm is applied to solve the non-convex L_{1}-ACELM model, and the convergence of the algorithm is analyzed.

The rest of this paper is organized as follows. Section 2 briefly reviews the extreme learning machine, the correntropy-induced loss, and half-quadratic optimization. Section 3 introduces the AC-loss function and presents the L_{1}-ACELM model; the half-quadratic optimization algorithm is then used to solve L_{1}-ACELM, and the convergence of the algorithm is analyzed. The experimental results for the artificial and benchmark datasets are presented in Section 4. Section 5 summarizes the main conclusions and further study.

## 2. Related Work

#### 2.1. Extreme Learning Machine (ELM)

#### 2.2. Correntropy-Induced Loss (C-Loss)

#### 2.3. Half-Quadratic Optimization

## 3. Main Contributions

#### 3.1. Asymmetric C-Loss Function (AC-Loss)

#### 3.2. L_{1}-ACELM

The L_{2} norm of the structural risk in RELM is replaced with the L_{1} norm. Therefore, we propose a new robust ELM (called L_{1}-ACELM):

#### 3.3. Solving Method

Because the L_{1} norm exists in the objective function, the proximal gradient descent (PGD) algorithm is applied to solve the optimization problem in Equation (28). The objective function $J\left({\beta}^{t+1}\right)$ can be written as

Algorithm 1. Half-quadratic optimization for L_{1}-ACELM

Input: the training dataset $T={\{({x}_{i},{y}_{i})\}}_{i=1}^{N}$, the number of hidden layer nodes $L$, the activation function $h\left(x\right)$, the regularization parameter $\gamma $, the maximum number of iterations ${t}_{\mathrm{max}}$, the window width $\sigma $, a small tolerance $\rho $, and the parameter $\tau $.
Output: the output weight vector $\beta $.
Step 1. Randomly generate the input weights ${\alpha}_{i}$ and hidden layer biases ${b}_{i}$ for the $L$ hidden nodes.
Step 2. Calculate the hidden layer output matrix $H\left(x\right)$.
Step 3. Compute $\beta $ by Equation (7).
Step 4. Let ${\beta}^{0}=\beta $ and ${\beta}^{1}=\beta $; set $t=1$.
Step 5. While $t<{t}_{\mathrm{max}}$ do:
calculate ${v}_{i}^{t+1}$ by Equation (26);
update ${\beta}^{t+1}$ using Equation (35);
compute $J\left({\beta}^{t+1}\right)$ by Equation (29);
update $t:=t+1$;
if $\left|J\left({\beta}^{t}\right)-J\left({\beta}^{t-1}\right)\right|<\rho $, stop.
End while
Step 6. Output the result given by $\beta ={\beta}^{t-1}$.
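A minimal numerical sketch of Algorithm 1's alternating structure is given below. Equations (7), (26), (29), and (35) are not reproduced in this excerpt, so the auxiliary-variable update (Gaussian weights `v`) and the beta update (one proximal gradient step with L1 soft-thresholding) are illustrative stand-ins under those assumptions, not the paper's exact formulas.

```python
import numpy as np

def soft_threshold(z, thr):
    """Proximal operator of thr * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(z) * np.maximum(np.abs(z) - thr, 0.0)

def half_quadratic_l1(H, y, gamma=1.0, sigma=1.0, step=None,
                      rho=1e-6, t_max=100):
    """Alternate between half-quadratic auxiliary weights v and a
    proximal-gradient update of beta for a correntropy-type loss
    plus an L1 penalty (a sketch of Algorithm 1's structure)."""
    beta = np.linalg.pinv(H) @ y                     # Step 3: plain ELM solution
    if step is None:
        step = 1.0 / np.linalg.norm(H, 2) ** 2       # safe PGD step size
    J_prev = np.inf
    for t in range(t_max):
        r = y - H @ beta
        v = np.exp(-r ** 2 / (2 * sigma ** 2))       # HQ auxiliary weights
        grad = -H.T @ (v * r)                        # gradient of weighted fit term
        beta = soft_threshold(beta - step * grad, step * gamma)
        J = np.sum(1 - v) + gamma * np.sum(np.abs(beta))  # surrogate objective
        if abs(J - J_prev) < rho:                    # Step 5 stopping rule
            break
        J_prev = J
    return beta

# Usage: fit a random-feature model to noisy sinc data.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sinc(2 * x / np.pi).ravel() + rng.normal(0, 0.1, 200)
W = rng.normal(size=(1, 50)); b = rng.normal(size=50)
H = np.tanh(x @ W + b)                               # random hidden layer
beta = half_quadratic_l1(H, y, gamma=1e-3, sigma=1.0)
```

The soft-thresholding step is what produces the sparsity in the output weights that the paper attributes to the L_{1} penalty: components of beta whose gradient step stays below `step * gamma` are set exactly to zero.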

#### 3.4. Convergence Analysis

**Proposition 1.**

**Proof.**

## 4. Experiments

#### 4.1. Experimental Setup

To evaluate the performance of the proposed L_{1}-ACELM algorithm, we performed numerical simulations on two artificial datasets and ten standard benchmark datasets. To show the effectiveness of L_{1}-ACELM compared to traditional algorithms, including the extreme learning machine (ELM), the regularized ELM (RELM), and the C-loss-based ELM (CELM), several experiments were performed. All experiments were implemented in Matlab 2016a on a PC with an Intel(R) Core(TM) i5-7200U processor (2.70 GHz) and 4 GB RAM.

To quantitatively evaluate the L_{1}-ACELM algorithm, the following regression evaluation metrics are used:

- (1)
- the root mean square error (RMSE);
- (2)
- the mean absolute error (MAE);
- (3)
- the ratio of the sum of squared errors (SSE) to the sum of squared deviations of the samples (SST), denoted SSE/SST;
- (4)
- the ratio of the interpretable sum of squared deviations (SSR) to SST, denoted SSR/SST.

Smaller RMSE, MAE, and SSE/SST values and a larger SSR/SST value indicate better regression performance.
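The four metrics can be computed directly from predicted and target outputs; a straightforward implementation:

```python
import numpy as np

# RMSE, MAE, SSE/SST, and SSR/SST written out explicitly, following the
# usual definitions (SST is the total sum of squares about the target
# mean, SSR the explained sum of squares of the predictions).

def regression_metrics(y_true, y_pred):
    y_true = np.asarray(y_true, float)
    y_pred = np.asarray(y_pred, float)
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    sse = np.sum(err ** 2)                          # sum of squared errors
    sst = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
    ssr = np.sum((y_pred - y_true.mean()) ** 2)     # explained sum of squares
    return {"RMSE": rmse, "MAE": mae,
            "SSE/SST": sse / sst, "SSR/SST": ssr / sst}

m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.2, 3.8])
```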

For RELM, CELM, and L_{1}-ACELM, the optimal value of the regularization parameter $\gamma $ is selected from the set {2^{−50}, 2^{−49}, …, 2^{49}, 2^{50}}. For CELM and L_{1}-ACELM, the window width $\sigma $ is selected from the set {2^{−2}, 2^{−1}, 2^{0}, 2^{1}, 2^{2}}. For L_{1}-ACELM, the parameter $\tau $ is selected from the set {0.1, 0.2, …, 0.9}.
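The parameter selection above amounts to a grid search. A sketch follows, with the model fit abstracted behind a hypothetical `score` callable that returns a validation RMSE for one $(\gamma, \sigma, \tau)$ triple:

```python
import itertools, math

# Candidate grids from the text: gamma in {2^-50, ..., 2^50},
# sigma in {2^-2, ..., 2^2}, tau in {0.1, ..., 0.9}.
gammas = [2.0 ** k for k in range(-50, 51)]
sigmas = [2.0 ** k for k in range(-2, 3)]
taus = [round(0.1 * k, 1) for k in range(1, 10)]

def grid_search(score):
    """Return the (gamma, sigma, tau) triple minimizing the given score."""
    best = (math.inf, None)
    for g, s, t in itertools.product(gammas, sigmas, taus):
        best = min(best, (score(g, s, t), (g, s, t)))
    return best[1]

# Toy score with a known optimum at (2^5, 2^0, 0.3), to exercise the search:
params = grid_search(lambda g, s, t:
                     abs(math.log2(g) - 5) + abs(math.log2(s)) + abs(t - 0.3))
```

In practice each `score` evaluation would train the model once, so the 101 × 5 × 9 grid dominates the cost of the experiments.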

#### 4.2. Performance on Artificial Datasets

To verify the robustness of L_{1}-ACELM, two artificial datasets were generated, each consisting of 2000 data points and corrupted with six different types of noise. Table 1 shows the specific forms of the two artificial datasets and the different types of noise. ${\lambda}_{i}\sim N\left(0,{s}^{2}\right)$ indicates that ${\lambda}_{i}$ follows a normal distribution with mean zero and variance ${s}^{2}$, ${\lambda}_{i}\sim U\left(a,b\right)$ means that ${\lambda}_{i}$ follows a uniform distribution on the interval $[a,b]$, and ${\lambda}_{i}\sim T\left(c\right)$ indicates that ${\lambda}_{i}$ follows a t-distribution with $c$ degrees of freedom.
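Generating the sinc dataset of Table 1 with its six noise models can be sketched as follows. The type D uniform range is taken as $U(-0.5, 0.5)$, an assumed range, since the extracted table prints a degenerate interval:

```python
import numpy as np

# Sinc artificial dataset: y = sin(2x)/(2x) + lambda, with the noise
# models of Table 1 (Gaussian, uniform, and Student's t).

def make_sinc_dataset(n=2000, noise="A", seed=0):
    rng = np.random.default_rng(seed)
    x = rng.uniform(-3, 3, n)
    clean = np.where(x == 0, 1.0, np.sin(2 * x) / (2 * x))
    lam = {
        "A": lambda: rng.normal(0, 0.15, n),        # N(0, 0.15^2)
        "B": lambda: rng.normal(0, 0.5, n),         # N(0, 0.5^2)
        "C": lambda: rng.uniform(-0.15, 0.15, n),   # U(-0.15, 0.15)
        "D": lambda: rng.uniform(-0.5, 0.5, n),     # U(-0.5, 0.5), assumed
        "E": lambda: rng.standard_t(5, n),          # t-distribution, 5 dof
        "F": lambda: rng.standard_t(10, n),         # t-distribution, 10 dof
    }[noise]()
    return x, clean + lam

x, y = make_sinc_dataset(noise="C")
```

The heavy-tailed t-distributed types E and F are the ones that most clearly separate bounded from unbounded losses, since they occasionally produce very large residuals.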

The fitting results of ELM, RELM, CELM, and L_{1}-ACELM on the two artificial datasets with noise are shown in Figure 4 and Figure 5. The fitting curve of L_{1}-ACELM is the closest to the real function curve among the four algorithms. In Table 2, the best test results are shown in bold.

Table 2 shows that L_{1}-ACELM exhibits better performance than the other three algorithms in most cases on the two artificial datasets with different types of noise. It is evident that L_{1}-ACELM has smaller RMSE, MAE, and SSE/SST values and a larger SSR/SST value, which indicates that L_{1}-ACELM is more robust to noise. For example, for the sinc function, the proposed algorithm outperforms the other algorithms for every noise type except type F. Moreover, L_{1}-ACELM shows better generalization performance on unbalanced noisy data. In conclusion, L_{1}-ACELM is more stable in noisy environments.

#### 4.3. Performance on Benchmark Datasets

To further verify the performance of L_{1}-ACELM, experiments were performed on ten UCI datasets [30] with different levels of noise: noise-free datasets, datasets with 5% noise, and datasets with 10% noise. Noise was added only to the target output values of the training datasets. A dataset with 5% noise means that the noisy data constitute 5% of the training dataset. The noisy values are randomly drawn from the interval $[0,d]$, where $d$ is the average of the target output values of the training dataset.
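The noise-injection scheme described above can be sketched as follows. Whether a drawn value replaces the original target or is added to it is not fully specified in the text, so this sketch replaces it:

```python
import numpy as np

# Corrupt a chosen fraction of training targets with values drawn
# uniformly from [0, d], where d is the mean training target.

def inject_noise(y_train, fraction, seed=0):
    rng = np.random.default_rng(seed)
    y = np.array(y_train, dtype=float)               # copy, leave input intact
    n_noisy = int(round(fraction * len(y)))
    idx = rng.choice(len(y), size=n_noisy, replace=False)
    d = y.mean()
    y[idx] = rng.uniform(0.0, d, size=n_noisy)       # replace selected targets
    return y

y_noisy = inject_noise(np.linspace(1, 10, 100), fraction=0.05)
```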

To assess the performance of L_{1}-ACELM, its RMSE, MAE, SSE/SST, and SSR/SST values were compared with those of ELM, RELM, and CELM. The evaluation metrics and the rank of each algorithm under the different noise environments are listed in Table 4, Table 5 and Table 6, and the best test results are shown in bold. From Table 4 to Table 6, it is observed that the performance of each algorithm decreases as the noise level increases. However, compared to the other algorithms, the performance of L_{1}-ACELM is still the best in most cases. From Table 4, it can be concluded that L_{1}-ACELM performs best on nine of the ten datasets in terms of the RMSE and SSR/SST values; for the MAE and SSE/SST values, L_{1}-ACELM exhibits the best performance on all ten datasets. Table 5 shows that after adding 5% noise, the performance of each algorithm decreases; according to the RMSE values, the proposed algorithm performs best on eight of the ten datasets, and for the MAE, SSE/SST, and SSR/SST values, L_{1}-ACELM performs best on nine datasets. Under 10% noise, for the RMSE, MAE, and SSR/SST values, it exhibits superior performance in nine cases, and for the SSE/SST values, it performs best on all ten datasets.

For ELM, RELM, CELM, and L_{1}-ACELM, ${F}_{F}>{F}_{\alpha}$ holds for the results in Table 10. Therefore, the hypothesis that all the algorithms perform equally is rejected. To further contrast the differences between paired algorithms, the Nemenyi test [32] is used as a post hoc test.

- (1)
- Noise-free environment. For the RMSE and SSR/SST indices, the performance of L_{1}-ACELM is significantly better than that of ELM $\left(4-1.1=2.9>1.4832\right)$. For the MAE index, the performance of L_{1}-ACELM is better than that of ELM $\left(4-1.0=3.0>1.4832\right)$ and RELM $\left(2.6-1.0=1.6>1.4832\right)$. There is no significant difference between L_{1}-ACELM and CELM.
- (2)
- 5% noise environment. For the RMSE index, the performance of L_{1}-ACELM is better than that of ELM $\left(3.7-1.0=2.7>1.4832\right)$, RELM $\left(2.6-1.0=1.6>1.4832\right)$, and CELM $\left(2.5-1.0=1.5>1.4832\right)$. For the MAE and SSE/SST indices, the performance of L_{1}-ACELM is better than that of ELM ($3.7-1.1=2.6>1.4832$, $3.8-1.1=2.7>1.4832$) and RELM ($2.7-1.1=1.6>1.4832$, $2.8-1.1=1.7>1.4832$). For the SSR/SST index, the performance of L_{1}-ACELM is better than that of ELM $\left(3.7-1.15=2.55>1.4832\right)$ and CELM $\left(2.65-1.15=1.5>1.4832\right)$.
- (3)
- 10% noise environment. Similarly, for the RMSE, MAE, and SSE/SST indices, the performance of L_{1}-ACELM is better than that of ELM, RELM, and CELM. For the SSR/SST index, the performance of L_{1}-ACELM is better than that of ELM and RELM.
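These statistics can be reproduced from the average ranks. With $k=4$ algorithms and $N=10$ datasets, the Friedman chi-square, the Iman–Davenport ${F}_{F}$ statistic, and the Nemenyi critical difference $CD={q}_{\alpha}\sqrt{k\left(k+1\right)/6N}$ (with ${q}_{0.05}=2.569$ for $k=4$) recover the values reported above:

```python
import math

def nemenyi_cd(k, n, q_alpha=2.569):
    """Nemenyi critical difference for k algorithms over n datasets."""
    return q_alpha * math.sqrt(k * (k + 1) / (6.0 * n))

def friedman_stats(avg_ranks, n):
    """Friedman chi-square and Iman-Davenport F_F from average ranks."""
    k = len(avg_ranks)
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(r ** 2 for r in avg_ranks) - k * (k + 1) ** 2 / 4.0)
    f_f = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, f_f

cd = nemenyi_cd(k=4, n=10)                           # the 1.4832 used above
# Noise-free RMSE average ranks from Table 7: ELM, RELM, CELM, L1-ACELM.
chi2, f_f = friedman_stats([4, 2.5, 2.4, 1.1], 10)
```

Plugging in the noise-free RMSE ranks yields the chi-square of 25.32 and ${F}_{F}$ of 48.69 reported in Table 10, confirming the rejection of the equal-performance hypothesis.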

## 5. Conclusions

In this paper, the asymmetric C-loss function and the L_{1} norm are introduced into the regularized extreme learning machine, and an improved robust regularized extreme learning machine is proposed for regression. Owing to the non-convexity of the AC-loss function, L_{1}-ACELM is difficult to solve directly, so the half-quadratic optimization algorithm is applied to address the non-convex optimization problem. To verify the effectiveness of L_{1}-ACELM, experiments were conducted on artificial datasets and benchmark datasets with different types of noise. The results demonstrate the significant advantages of L_{1}-ACELM in generalization performance and robustness, especially when the distribution of noise and outliers in the data is asymmetric.

The half-quadratic optimization algorithm is used to solve L_{1}-ACELM in this paper. Since it is an iterative process, the training speed is limited. In the future, we will study faster methods for solving this optimization problem.

## Author Contributions

## Funding

## Data Availability Statement

## Conflicts of Interest

## References

1. Ding, S.; Su, C.; Yu, J. An optimizing BP neural network algorithm based on genetic algorithm. Artif. Intell. Rev. **2011**, 36, 153–162.
2. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: A new learning scheme of feedforward neural networks. In Proceedings of the 2004 IEEE International Joint Conference on Neural Networks (IEEE Cat. No. 04CH37541), Budapest, Hungary, 25–29 July 2004; pp. 985–990.
3. Huang, G.B.; Zhu, Q.Y.; Siew, C.K. Extreme learning machine: Theory and applications. Neurocomputing **2006**, 70, 489–501.
4. Silva, B.L.; Inaba, F.K.; Evandro, O.T.; Ciarelli, P.M. Outlier robust extreme machine learning for multi-target regression. Expert Syst. Appl. **2020**, 140, 112877.
5. Li, Y.; Wang, Y.; Chen, Z.; Zou, R. Bayesian robust multi-extreme learning machine. Knowl.-Based Syst. **2020**, 210, 106468.
6. Liu, X.; Ge, Q.; Chen, X.; Li, J.; Chen, Y. Extreme learning machine for multivariate reservoir characterization. J. Pet. Sci. Eng. **2021**, 205, 108869.
7. Catoni, O. Challenging the empirical mean and empirical variance: A deviation study. Annales de l'IHP Probabilités et Statistiques **2012**, 48, 1148–1185.
8. Deng, W.; Zheng, Q.; Chen, L. Regularized extreme learning machine. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Data Mining, Nashville, TN, USA, 30 March–2 April 2009; pp. 389–395.
9. Rong, H.J.; Ong, Y.S.; Tan, A.H.; Zhu, Z. A fast pruned-extreme learning machine for classification problem. Neurocomputing **2008**, 72, 359–366.
10. Miche, Y.; Sorjamaa, A.; Bas, P.; Simula, O.; Jutten, C.; Lendasse, A. OP-ELM: Optimally pruned extreme learning machine. IEEE Trans. Neural Netw. **2009**, 21, 158–162.
11. Ye, Q.; Yang, J.; Liu, F.; Zhao, C.; Ye, N.; Yin, T. L1-norm distance linear discriminant analysis based on an effective iterative algorithm. IEEE Trans. Circuits Syst. Video Technol. **2016**, 28, 114–129.
12. Li, C.N.; Shao, Y.H.; Deng, N.Y. Robust L1-norm non-parallel proximal support vector machine. Optimization **2016**, 65, 169–183.
13. Balasundaram, S.; Gupta, D. 1-Norm extreme learning machine for regression and multiclass classification using Newton method. Neurocomputing **2014**, 128, 4–14.
14. Dong, H.; Yang, L. Kernel-based regression via a novel robust loss function and iteratively reweighted least squares. Knowl. Inf. Syst. **2021**, 63, 1149–1172.
15. Dong, H.; Yang, L. Training robust support vector regression machines for more general noise. J. Intell. Fuzzy Syst. **2020**, 39, 2881–2892.
16. Farooq, M.; Steinwart, I. An SVM-like approach for expectile regression. Comput. Stat. Data Anal. **2017**, 109, 159–181.
17. Razzak, I.; Zafar, K.; Imran, M.; Xu, G. Randomized nonlinear one-class support vector machines with bounded loss function to detect outliers for large scale IoT data. Future Gener. Comput. Syst. **2020**, 112, 715–723.
18. Gupta, D.; Hazarika, B.B.; Berlin, M. Robust regularized extreme learning machine with asymmetric Huber loss function. Neural Comput. Appl. **2020**, 32, 12971–12998.
19. Ren, Z.; Yang, L. Correntropy-based robust extreme learning machine for classification. Neurocomputing **2018**, 313, 74–84.
20. Ma, Y.; Zhang, Q.; Li, D.; Tian, Y. LINEX support vector machine for large-scale classification. IEEE Access **2019**, 7, 70319–70331.
21. Singh, A.; Pokharel, R.; Principe, J. The C-loss function for pattern classification. Pattern Recognit. **2014**, 47, 441–453.
22. Zhou, R.; Liu, X.; Yu, M.; Huang, K. Properties of risk measures of generalized entropy in portfolio selection. Entropy **2017**, 19, 657.
23. Ren, L.R.; Gao, Y.L.; Liu, J.X.; Shang, J.; Zheng, C.H. Correntropy induced loss based sparse robust graph regularized extreme learning machine for cancer classification. BMC Bioinform. **2020**, 21, 1–22.
24. Zhao, Y.P.; Tan, J.F.; Wang, J.J.; Yang, Z. C-loss based extreme learning machine for estimating power of small-scale turbojet engine. Aerosp. Sci. Technol. **2019**, 89, 407–419.
25. He, Y.; Wang, F.; Li, Y.; Qin, J.; Chen, B. Robust matrix completion via maximum correntropy criterion and half-quadratic optimization. IEEE Trans. Signal Process. **2019**, 68, 181–195.
26. Ren, Z.; Yang, L. Robust extreme learning machines with different loss functions. Neural Process. Lett. **2019**, 49, 1543–1565.
27. Chen, L.; Paul, H.; Qu, H.; Zhao, J.; Sun, X. Correntropy-based robust multilayer extreme learning machines. Pattern Recognit. **2018**, 84, 357–370.
28. Huang, G.; Huang, G.B.; Song, S.; You, K. Trends in extreme learning machines: A review. Neural Netw. **2015**, 61, 32–48.
29. Robini, M.C.; Yang, F.; Zhu, Y. Inexact half-quadratic optimization for linear inverse problems. SIAM J. Imaging Sci. **2018**, 11, 1078–1133.
30. Blake, C.L.; Merz, C.J. UCI Repository of Machine Learning Databases; Department of Information and Computer Sciences, University of California, Irvine: Irvine, CA, USA, 1998. Available online: http://www.ics.uci.edu/~mlearn/MLRepository.html (accessed on 15 June 2022).
31. Demšar, J. Statistical comparisons of classifiers over multiple data sets. J. Mach. Learn. Res. **2006**, 7, 1–30.
32. Benavoli, A.; Corani, G.; Mangili, F. Should we really use post-hoc tests based on mean-ranks? J. Mach. Learn. Res. **2016**, 17, 152–161.

Artificial Dataset | Function Definition | Types of Noise |
---|---|---|
Sinc function | ${y}_{i}=\mathrm{sin}c\left(2{x}_{i}\right)=\frac{\mathrm{sin}\left(2{x}_{i}\right)}{2{x}_{i}}+{\lambda}_{i}$ | Type A: $x\in \left[-3,3\right]$, ${\lambda}_{i}\sim N\left(0,{0.15}^{2}\right)$; Type B: $x\in \left[-3,3\right]$, ${\lambda}_{i}\sim N\left(0,{0.5}^{2}\right)$; Type C: $x\in \left[-3,3\right]$, ${\lambda}_{i}\sim U\left(-0.15,0.15\right)$; Type D: $x\in \left[-3,3\right]$, ${\lambda}_{i}\sim U\left(-0.5,0.5\right)$; Type E: $x\in \left[-3,3\right]$, ${\lambda}_{i}\sim T\left(5\right)$; Type F: $x\in \left[-3,3\right]$, ${\lambda}_{i}\sim T\left(10\right)$ |
Self-defining function | ${y}_{i}={e}^{{x}_{i}^{2}\mathrm{sin}c\left(0.3\pi {x}_{i}\right)}+{\lambda}_{i}$ | Same six noise types as for the sinc function |

Dataset | Noise | Algorithm | $(\mathit{\gamma},\mathit{\sigma},\mathit{\tau})$ | RMSE | MAE | SSE/SST | SSR/SST |
---|---|---|---|---|---|---|---|

Sinc function | Type A | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{20}, /, /)(2 ^{10}, 2^{−2}, /)(2 ^{−23}, 2^{−2}, 0.7) | 0.2429 0.2341 0.2345 0.2109 | 0.1957 0.1942 0.1949 0.1690 | 0.6206 0.5768 0.5785 0.4680 | 0.3808 0.4263 0.4256 0.5359 |

Type B | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{2}, /, /)(2 ^{−19}, 2^{−2}, /)(2 ^{5}, 2^{−2}, 0.3) | 0.5288 0.5270 0.5286 0.5221 | 0.4199 0.4186 0.4199 0.4143 | 0.9064 0.9004 0.9060 0.8838 | 0.0988 0.1004 0.0991 0.1246 | |

Type C | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{−42}, /, /)(2 ^{10}, 2^{−2}, /)(2 ^{39}, 2^{−2}, 0.7) | 0.1923 0.2019 0.1922 0.1595 | 0.1581 0.1677 0.1582 0.1309 | 0.4332 0.4776 0.4325 0.2978 | 0.5701 0.5233 0.5705 0.7023 | |

Type D | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{12}, /, /)(2 ^{−38}, 2^{−2}, /)(2 ^{−4}, 2^{−2}, 0.3) | 0.3262 0.3246 0.3223 0.3199 | 0.2715 0.2709 0.2695 0.2678 | 0.6963 0.6890 0.6828 0.6706 | 0.7633 0.7578 0.7664 0.8571 | |

Type E | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{12}, /, /)(2 ^{−12}, 2^{−2}, /)(2 ^{−12}, 2^{−2}, 0.2) | 0.1737 0.1766 0.1725 0.1349 | 0.1406 0.1441 0.1398 0.1175 | 0.2369 0.2451 0.2338 0.1431 | 0.7633 0.7578 0.7664 0.8571 | |

Type F | ELM RELM CELM L_{1}−ACELM | (/, /, /) (2 ^{−2}, /, /)(2 ^{−1}, 2^{−2}, /)(2 ^{−3}, 2^{−2}, 0.1) | 0.1885 0.1746 0.1757 0.1753 | 0.1422 0.1412 0.1413 0.1416 | 0.2715 0.2328 0.2359 0.2346 | 0.7298 0.7681 0.7651 0.7663 | |

Type A | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{−8}, /, /)(2 ^{−7}, 2^{−2}, /)(2 ^{−10}, 2^{−2}, 0.5) | 0.1572 0.1569 0.1565 0.1560 | 0.1304 0.1301 0.1294 0.1241 | 0.0908 0.0893 0.0888 0.0800 | 0.9105 0.9120 0.9127 0.9211 | |

Type B | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{26}, /, /)(2 ^{15}, 2^{−2}, /)(2 ^{−16}, 2^{−2}, 0.2) | 0.4905 0.4862 0.4858 0.4849 | 0.3843 0.3850 0.3838 0.3795 | 0.4761 0.4766 0.4759 0.4641 | 0.5251 0.5249 0.5252 0.5369 | |

Type C | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{25}, /, /)(2 ^{17}, 2^{−2}, /)(2 ^{37}, 2^{−2}, 0.2) | 0.0937 0.0950 0.0936 0.0934 | 0.0794 0.0803 0.0792 0.0791 | 0.0288 0.0296 0.0287 0.0286 | 0.9714 0.9706 0.9715 0.9716 | |

Type D | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{15}, /, /)(2 ^{−34}, 2^{−2}, /)(2 ^{22}, 2^{−2}, 0.7) | 0.3009 0.3006 0.2948 0.2929 | 0.2622 0.2614 0.2555 0.2534 | 0.2471 0.2466 0.2373 0.2342 | 0.7534 0.7539 0.7634 0.7665 | |

Type E | ELM RELM CELM L _{1}−ACELM | (/, /, /) (2 ^{−26}, /, /)(2 ^{2}, 2^{−2}, /)(2 ^{44}, 2^{−2}, 0.4) | 0.0434 0.0426 0.0425 0.0415 | 0.0372 0.0367 0.0363 0.0335 | 0.0074 0.0071 0.0071 0.0068 | 0.9929 0.9932 0.9932 0.9935 | |

Self−defining function | Type F | ELM RELM CELM L_{1}−ACELM | (/, /, /) (2 ^{5}, /, /)(2 ^{12}, 2^{−2}, /)(2 ^{20}, 2^{−2}, 0.3) | 0.0498 0.0761 0.0481 0.0513 | 0.0425 0.0586 0.0408 0.0372 | 0.0098 0.0230 0.0092 0.0104 | 0.9912 0.9779 0.9920 0.9908 |

Dataset | Number of Training Data | Number of Testing Data | Number of Features |
---|---|---|---|

Boston Housing | 404 | 102 | 13 |

Air Quality | 7485 | 1872 | 12 |

AutoMPG | 313 | 79 | 7 |

Triazines | 148 | 38 | 60 |

Bodyfat | 201 | 51 | 14 |

Pyrim | 59 | 15 | 27 |

Servo | 133 | 34 | 4 |

Bike Sharing | 584 | 147 | 13 |

Balloon | 1600 | 401 | 1 |

NO_{2} | 400 | 100 | 7 |

Dataset | Algorithm | $(\mathit{\gamma},\mathit{\sigma},\mathit{\tau})$ | RMSE | MAE | SSE/SST | SSR/SST |
---|---|---|---|---|---|---|

Boston Housing | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−16}, /, /)(2 ^{−31}, 2^{−2}, /)(2 ^{−24}, 2^{−2}, 0.4) | 4.4449(4) 4.1636(3) 4.1511(2) 4.0435(1) | 3.1736(4) 2.9660(2) 2.9847(3) 2.9236(1) | 0.2438(4) 0.2068(3) 0.2067(2) 0.1965(1) | 0.7682(4) 0.7998(3) 0.8002(2) 0.8097(1) |

Air Quality | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−32}, /, /)(2 ^{−37}, 2^{−2}, /)(2 ^{−36}, 2^{−2}, 0.4) | 8.3167(4) 7.4516(1) 7.5140(3) 7.4574(2) | 6.5439(4) 5.7812(3) 5.7604(2) 5.7383(1) | 0.0297(4) 0.0215(2.5) 0.0215(2.5) 0.0212(1) | 0.9705(4) 0.9786(2) 0.9785(3) 0.9788(1) |

AutoMPG | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−57}, /, /)(2 ^{−43}, 2^{−2}, /)(2 ^{−32}, 2^{−2}, 0.5) | 2.8296(4) 2.6859(3) 2.6590(2) 2.5914(1) | 2.0956(4) 1.9632(3) 1.9582(2) 1.8949(1) | 0.1352(4) 0.1205(3) 0.1202(2) 0.1143(1) | 0.8710(4) 0.8845(2) 0.8840(3) 0.8907(1) |

Triazines | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−49}, /, /)(2 ^{−19}, 2^{−2}, /)(2 ^{−31}, 2^{−2}, 0.5) | 0.0664(4) 0.0557(3) 0.0529(2) 0.0490(1) | 0.0465(4) 0.0410(3) 0.0393(2) 0.0365(1) | 0.0816(4) 0.0545(3) 0.0526(2) 0.0416(1) | 0.9283(4) 0.9547(3) 0.9573(2) 0.9645(1) |

Bodyfat | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−10}, /, /)(2 ^{−6}, 2^{−2}, /)(2 ^{−16}, 2^{−2}, 0.1) | 1.3123(4) 1.1374(3) 1.1352(2) 1.0036(1) | 0.7449(4) 0.6904(3) 0.6858(2) 0.5936(1) | 0.0298(4) 0.0233(2) 0.0234(3) 0.0189(1) | 0.9732(4) 0.9794(2) 0.9787(3) 0.9820(1) |

Pyrim | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−1}, /, /)(2 ^{−20}, 2^{−2}, /)(2 ^{−10}, 2^{−2}, 0.1) | 0.1085(4) 0.0759(2) 0.0800(3) 0.0728(1) | 0.0688(4) 0.0548(2) 0.0552(3) 0.0502(1) | 0.6897(4) 0.3535(2) 0.3839(3) 0.2956(1) | 0.6143(4) 0.8034(2) 0.7718(3) 0.8284(1) |

Servo | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−40}, /, /)(2 ^{−41}, 2^{−2}, /)(2 ^{−46}, 2^{−2}, 0.4) | 0.7367(4) 0.6769(3) 0.6733(2) 0.6593(1) | 0.5220(4) 0.4750(3) 0.4730(2) 0.4491(1) | 0.2826(4) 0.2075(3) 0.2061(2) 0.1917(1) | 0.7874(4) 0.8148(3) 0.8214(2) 0.8270(1) |

Bike Sharing | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−10}, /, /)(2 ^{−16}, 2^{−2}, /)(2 ^{−9}, 2^{−2}, 0.2) | 287.615(4) 236.107(2) 241.917(3) 217.385(1) | 206.507(4) 178.976(2) 180.856(3) 160.747(1) | 0.0230(4) 0.0157(2) 0.0161(3) 0.0130(1) | 0.9773(4) 0.9851(2) 0.9844(3) 0.9873(1) |

Balloon | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−29}, /, /)(2 ^{−25}, 2^{−2}, /)(2 ^{−24}, 2^{−2}, 0.9) | 0.0850(4) 0.0796(3) 0.0782(2) 0.0773(1) | 0.0543(4) 0.0528(3) 0.0527(2) 0.0525(1) | 0.3452(4) 0.2991(3) 0.2806(2) 0.2790(1) | 0.7026(4) 0.7147(3) 0.7335(1) 0.7304(2) |

NO_{2} | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−9}, /, /)(2 ^{−15}, 2^{−2}, /)(2 ^{−17}, 2^{−2}, 0.2) | 0.5272(4) 0.5154(2) 0.5161(3) 0.5132(1) | 0.4128(4) 0.4034(2) 0.4047(3) 0.4028(1) | 0.5157(4) 0.4844(2) 0.4910(3) 0.4823(1) | 0.5060(4) 0.5298(2) 0.5271(3) 0.5338(1) |

Dataset | Algorithm | $(\mathit{\gamma},\mathit{\sigma},\mathit{\tau})$ | RMSE | MAE | SSE/SST | SSR/SST |
---|---|---|---|---|---|---|

Boston Housing | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−17}, /, /)(2 ^{−6}, 2^{−2}, /)(2 ^{−5}, 2^{−2}, 0.5) | 6.5817(4) 6.2972(3) 6.2155(2) 6.1256(1) | 4.1292(4) 3.9095(3) 3.8937(2) 3.8185(1) | 0.4196(4) 0.3835(3) 0.3756(2) 0.3675(1) | 0.5962(4) 0.6327(3) 0.6407(2) 0.6478(1) |

Air Quality | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−32}, /, /)(2 ^{−39}, 2^{−2}, /)(2 ^{−39}, 2^{−2}, 0.8) | 12.0381(4) 11.6199(2) 11.6303(3) 11.5540(1) | 7.5222(4) 7.1866(3) 7.1554(2) 7.1145(1) | 0.0531(4) 0.0496(2) 0.0499(3) 0.0489(1) | 0.9471(4) 0.9504(2) 0.9501(3) 0.9511(1) |

AutoMPG | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−21}, /, /)(2 ^{−28}, 2^{−2}, /)(2 ^{−30}, 2^{−2}, 0.9) | 5.6949(4) 5.5923(2) 5.6502(3) 5.4775(1) | 3.2315(4) 3.1677(3) 3.1189(2) 3.0347(1) | 0.4024(4) 0.3919(3) 0.3915(2) 0.3688(1) | 0.6204(4) 0.6337(2) 0.6299(3) 0.6558(1) |

Triazines | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−16}, /, /)(2 ^{−39}, 2^{−2}, /)(2 ^{−22}, 2^{−2}, 0.5) | 0.0937(4) 0.0790(3) 0.0779(2) 0.0725(1) | 0.0618(4) 0.0549(3) 0.0515(2) 0.0489(1) | 0.1510(4) 0.1031(3) 0.0989(2) 0.0834(1) | 0.8719(4) 0.9199(3) 0.9172(2) 0.9273(1) |

Bodyfat | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−16}, /, /)(2 ^{−36}, 2^{−2}, /)(2 ^{−11}, 2^{−2}, 0.6) | 4.1325(4) 3.9255(3) 3.8868(2) 3.7288(1) | 2.0890(4) 2.0575(3) 2.0413(2) 1.9119(1) | 0.2414(4) 0.2115(3) 0.2095(2) 0.1986(1) | 0.7783(4) 0.8027(2) 0.8078(3) 0.8149(1) |

Pyrim | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−12}, /, /)(2 ^{−3}, 2^{−2}, /)(2 ^{−13}, 2^{−2}, 0.8) | 0.1019(4) 0.0825(2) 0.0871(3) 0.0743(1) | 0.0722(4) 0.0591(2) 0.0609(3) 0.0562(1) | 0.6711(4) 0.4008(2) 0.4435(3) 0.3720(1) | 0.6685(4) 0.7537(2) 0.7153(3) 0.7762(1) |

Servo | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−46}, /, /)(2 ^{−42}, 2^{−2}, /)(2 ^{−49}, 2^{−2}, 0.7) | 0.8424(4) 0.7753(3) 0.7598(1) 0.7724(2) | 0.5868(4) 0.5473(3) 0.5252(1) 0.5299(2) | 0.3224(4) 0.2794(3) 0.2763(1) 0.2983(2) | 0.7235(4) 0.7742(3) 0.7752(2) 0.7778(1) |

Bike Sharing | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−1}, /, /)(2 ^{−9}, 2^{−2}, /)(2 ^{−6}, 2^{−2}, 0.9) | 1130.04(4) 1093.85(2) 1094.35(3) 1085.27(1) | 497.051(4) 453.720(2) 461.094(3) 441.646(1) | 0.2730(4) 0.2556(3) 0.2545(2) 0.2526(1) | 0.7352(4) 0.7505(3) 0.7523(1.5) 0.7523(1.5) |

Balloon | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−16}, /, /)(2 ^{−9}, 2^{−2}, /)(2 ^{−5}, 2^{−2}, 0.9) | 0.0874(4) 0.0850(3) 0.0799(2) 0.0782(1) | 0.0546(3) 0.0544(2) 0.0549(4) 0.0536(1) | 0.3815(4) 0.3444(3) 0.3086(2) 0.2704(1) | 0.6794(4) 0.7170(2) 0.7135(3) 0.7368(1) |

NO_{2} | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−31}, /, /)(2 ^{−19}, 2^{−2}, /)(2 ^{−19}, 2^{−2}, 0.5) | 0.9489(1) 0.9698(3) 0.9737(4) 0.9611(2) | 0.5767(2) 0.5781(3) 0.5856(4) 0.5708(1) | 0.7594(2) 0.7754(3) 0.7844(4) 0.7515(1) | 0.2803(1) 0.2692(3) 0.2644(4) 0.2790(2) |

Dataset | Algorithm | $(\mathit{\gamma},\mathit{\sigma},\mathit{\tau})$ | RMSE | MAE | SSE/SST | SSR/SST |
---|---|---|---|---|---|---|

Boston Housing | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−30}, /, /)(2 ^{−36}, 2^{−2}, /)(2 ^{−48}, 2^{−2}, 0.9) | 8.6315(4) 8.2456(3) 8.2437(2) 8.1718(1) | 5.1524(4) 5.1512(3) 4.9250(2) 4.8090(1) | 0.5873(4) 0.5177(3) 0.5151(2) 0.5123(1) | 0.4557(4) 0.4999(3) 0.5006(2) 0.5074(1) |

Air Quality | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−39}, /, /)(2 ^{−45}, 2^{−2}, /)(2 ^{−4}, 2^{−2}, 0.6) | 14.7386(4) 14.5651(3) 14.5412(2) 14.4355(1) | 8.8277(4) 8.4928(3) 8.4737(2) 8.4236(1) | 0.0778(4) 0.0759(3) 0.0754(2) 0.0748(1) | 0.9223(4) 0.9241(3) 0.9246(2) 0.9253(1) |

AutoMPG | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−28}, /, /)(2 ^{−27}, 2^{−2}, /)(2 ^{−39}, 2^{−2}, 0.1) | 7.0139(3) 7.0729(4) 6.9306(2) 6.9151(1) | 4.0307(2) 4.0592(3) 4.0792(4) 3.9845(1) | 0.5218(3) 0.5278(4) 0.5147(2) 0.5032(1) | 0.5009(4) 0.5068(3) 0.5183(1) 0.5169(2) |

Triazines | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−37}, /, /)(2 ^{−21}, 2^{−2}, /)(2 ^{−29}, 2^{−2}, 0.6) | 0.1166(4) 0.1068(2) 0.1074(3) 0.0963(1) | 0.0776(4) 0.0703(2) 0.0705(3) 0.0638(1) | 0.2077(4) 0.1693(2) 0.1729(3) 0.1378(1) | 0.8116(4) 0.8536(2) 0.8501(3) 0.8815(1) |

Bodyfat | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−23}, /, /)(2 ^{−22}, 2^{−2}, /)(2 ^{−8}, 2^{−2}, 0.4) | 6.5116(3) 6.5075(2) 6.5343(4) 6.3088(1) | 3.4749(2) 3.4977(3) 3.5697(4) 3.4931(1) | 0.4184(4) 0.4094(2) 0.4119(3) 0.3743(1) | 0.6129(4) 0.6180(3) 0.6182(2) 0.6515(1) |

Pyrim | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−23}, /, /)(2 ^{−10}, 2^{−2}, /)(2 ^{−24}, 2^{−2}, 0.5) | 0.1263(4) 0.1136(2) 0.1137(3) 0.1010(1) | 0.0903(4) 0.0804(2) 0.0812(3) 0.0717(1) | 0.9389(4) 0.7002(2) 0.7098(3) 0.4848(1) | 0.5540(4) 0.6048(3) 0.6515(2) 0.7080(1) |

Servo | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−34}, /, /)(2 ^{−39}, 2^{−2}, /)(2 ^{−45}, 2^{−2}, 0.9) | 0.8648(4) 0.8253(3) 0.8025(2) 0.7486(1) | 0.6291(3) 0.6889(4) 0.5487(2) 0.5332(1) | 0.3719(4) 0.2863(3) 0.2788(2) 0.2412(1) | 0.7042(4) 0.7633(2) 0.7557(3) 0.7960(1) |

Bike Sharing | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−39}, /, /)(2 ^{−42}, 2^{−2}, /)(2 ^{−49}, 2^{−2}, 0.1) | 1614.52(4) 1587.01(3) 1582.54(2) 1562.74(1) | 755.097(4) 716.147(2) 718.328(3) 714.710(1) | 0.4224(4) 0.4052(3) 0.4012(2) 0.3952(1) | 0.5926(4) 0.6055(3) 0.6089(2) 0.6194(1) |

Balloon | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−34}, /, /)(2 ^{−39}, 2^{−2}, /)(2 ^{−42}, 2^{−2}, 0.5) | 0.0785(1) 0.0807(4) 0.0793(3) 0.0788(2) | 0.0547(3) 0.0549(4) 0.0545(2) 0.0544(1) | 0.2749(2) 0.2871(3) 0.2931(4) 0.2682(1) | 0.7321(2) 0.7206(3) 0.7127(4) 0.7398(1) |

NO_{2} | ELM RELM CELM L _{1}-ACELM | (/, /, /) (2 ^{−16}, /, /)(2 ^{−27}, 2^{−2}, /)(2 ^{−23}, 2^{−2}, 0.2) | 1.2576(4) 1.2718(2) 1.2478(3) 1.2408(1) | 0.7013(1) 0.7259(4) 0.7164(3) 0.7080(2) | 0.8752(3) 0.8908(4) 0.8639(2) 0.8566(1) | 0.1643(4) 0.1663(3) 0.1770(2) 0.1882(1) |

Algorithm | RMSE | MAE | SSE/SST | SSR/SST |
---|---|---|---|---|

ELM | 4 | 4 | 4 | 4 |

RELM | 2.5 | 2.6 | 2.55 | 2.4 |

CELM | 2.4 | 2.4 | 2.45 | 2.5 |

L_{1}-ACELM | 1.1 | 1.0 | 1.0 | 1.1 |

Algorithm | RMSE | MAE | SSE/SST | SSR/SST |
---|---|---|---|---|

ELM | 3.7 | 3.7 | 3.8 | 3.7 |

RELM | 2.6 | 2.7 | 2.8 | 2.5 |

CELM | 2.5 | 2.5 | 2.3 | 2.65 |

L_{1}-ACELM | 1.0 | 1.1 | 1.1 | 1.15 |

Algorithm | RMSE | MAE | SSE/SST | SSR/SST |
---|---|---|---|---|

ELM | 3.5 | 3.1 | 3.6 | 3.8 |

RELM | 2.8 | 3.0 | 2.9 | 2.8 |

CELM | 2.6 | 2.8 | 2.5 | 2.3 |

L_{1}-ACELM | 1.1 | 1.1 | 1.0 | 1.1 |

Ratio of Noise | ${\mathit{\chi}}_{\mathit{F}}^{\mathbf{2}}$ (RMSE) | ${\mathit{\chi}}_{\mathit{F}}^{\mathbf{2}}$ (MAE) | ${\mathit{\chi}}_{\mathit{F}}^{\mathbf{2}}$ (SSE/SST) | ${\mathit{\chi}}_{\mathit{F}}^{\mathbf{2}}$ (SSR/SST) | ${\mathit{F}}_{\mathit{F}}$ (RMSE) | ${\mathit{F}}_{\mathit{F}}$ (MAE) | ${\mathit{F}}_{\mathit{F}}$ (SSE/SST) | ${\mathit{F}}_{\mathit{F}}$ (SSR/SST) | CD |
---|---|---|---|---|---|---|---|---|---|
Noise-free | 25.32 | 27.12 | 27.03 | 25.32 | 48.69 | 84.75 | 81.91 | 48.69 | 1.4832 |
5% noise | 16.20 | 20.64 | 22.68 | 19.71 | 10.57 | 19.81 | 27.89 | 17.24 | 1.4832 |
10% noise | 18.36 | 15.96 | 21.72 | 22.68 | 14.20 | 10.23 | 23.61 | 27.89 | 1.4832 |

Disclaimer/Publisher's Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Wu, Q.; Wang, F.; An, Y.; Li, K.
L_{1}-Norm Robust Regularized Extreme Learning Machine with Asymmetric C-Loss for Regression. *Axioms* **2023**, *12*, 204.
https://doi.org/10.3390/axioms12020204
