# Simultaneous Feature Selection and Classification for Data-Adaptive Kernel-Penalized SVM


## Abstract


## 1. Introduction

## 2. Notation and Framework

#### 2.1. Geometric Interpretation of SVM Kernels

**Lemma 1.** Suppose $K(\mathit{x},\mathit{z})$ is a reproducing kernel function, and $\mathit{s}\left(\mathit{x}\right)$ is the corresponding feature mapping in the support vector machine. Then, Equation (9) holds.
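Although Equation (9) is not reproduced here, lemmas of this type rest on the kernel trick: the geometry of the mapped points $\mathit{s}(\mathit{x})$ is fully determined by $K$ alone, via the standard identity $\|s(x)-s(z)\|^{2}=K(x,x)-2K(x,z)+K(z,z)$. A minimal sketch with a Gaussian RBF kernel (an assumed example, not necessarily the kernel used in the paper):

```python
import numpy as np

def rbf_kernel(x, z, gamma=1.0):
    """Gaussian RBF kernel, a standard reproducing kernel."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    return np.exp(-gamma * np.sum((x - z) ** 2))

def feature_dist_sq(x, z, kernel=rbf_kernel):
    """Squared distance between s(x) and s(z) in the feature space,
    computed without forming the mapping explicitly:
    ||s(x) - s(z)||^2 = K(x,x) - 2 K(x,z) + K(z,z)."""
    return kernel(x, x) - 2.0 * kernel(x, z) + kernel(z, z)
```

For the RBF kernel, $K(x,x)=1$, so the induced squared distance always lies in $[0,2)$, which is the kind of geometric fact data-adaptive kernel transformations exploit.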

#### 2.2. Penalized SVM

## 3. Methodology of Data-Adaptive Kernel-Penalized SVM

#### 3.1. Kernel-Based Parameters

#### 3.2. Data-Adaptive Kernel Functions

**Theorem 1.**

#### 3.3. Data-Adaptive Kernel-Penalized SVM

- A1. The penalty function ${p}_{{\lambda}_{n}}\left(x\right)$ is symmetric, non-decreasing, and concave for $x\in [0,\infty )$, with a continuous first-order derivative ${p}_{{\lambda}_{n}}^{\prime}\left(x\right)$ on ${R}^{+}$ and ${p}_{{\lambda}_{n}}\left(0\right)=0$.
- A2. There exists $a>1$ such that $\underset{x\to 0+}{\mathrm{lim}}{p}_{{\lambda}_{n}}^{\prime}\left(x\right)={\lambda}_{n}$, ${p}_{{\lambda}_{n}}^{\prime}\left(x\right)\ge {\lambda}_{n}-x/a$ for $0<x<a{\lambda}_{n}$, and ${p}_{{\lambda}_{n}}^{\prime}\left(x\right)=0$ for $x\ge a{\lambda}_{n}$.
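Assumptions A1 and A2 can be verified directly for concrete penalties. As an illustration (with arbitrary values of $\lambda_n$ and $a$; these are not from the paper), the MCP penalty listed below has derivative ${p}_{{\lambda}_{n}}^{\prime}(x)=({\lambda}_{n}-x/a)_{+}$ on $[0,\infty)$, which attains the lower bound in A2 with equality:

```python
import numpy as np

# Illustrative values; A2 requires a > 1. lam_n plays the role of lambda_n.
lam_n, a = 0.8, 3.0

def mcp_deriv(x):
    """Derivative of the MCP penalty on [0, inf): (lam_n - x/a)_+ ."""
    x = np.asarray(x, dtype=float)
    return np.maximum(lam_n - x / a, 0.0)

x = np.linspace(1e-8, 2 * a * lam_n, 1001)
limit_ok = np.isclose(mcp_deriv(1e-12), lam_n)            # lim_{x->0+} p'(x) = lam_n
lower_ok = np.all(mcp_deriv(x) >= lam_n - x / a - 1e-12)  # p'(x) >= lam_n - x/a
tail_ok = np.all(mcp_deriv(x[x >= a * lam_n]) == 0.0)     # p'(x) = 0 for x >= a*lam_n
```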

- SCAD: the smoothly clipped absolute deviation penalty [14], applied component-wise to $\mathbf{w}$,$$\begin{array}{ccc}\hfill {p}_{\lambda}\left(|w|\right)& =& \lambda \left|w\right|I(0\le \left|w\right|<\lambda )+\frac{a\lambda \left|w\right|-({w}^{2}+{\lambda}^{2})/2}{a-1}I(\lambda \le |w|\le a\lambda )\hfill \\ & +& \frac{(a+1){\lambda}^{2}}{2}I\left(|w|>a\lambda \right)\phantom{\rule{3.33333pt}{0ex}}\text{for some}\phantom{\rule{3.33333pt}{0ex}}a>2.\hfill \end{array}$$
- MCP: the minimax concave penalty [28],$${p}_{\lambda}\left(|w|\right)=\lambda \left(|w|-\frac{{w}^{2}}{2a\lambda}\right)I(0\le \left|w\right|<a\lambda )+\frac{a{\lambda}^{2}}{2}I\left(|w|\ge a\lambda \right)\phantom{\rule{1.em}{0ex}}\text{for some}\phantom{\rule{3.33333pt}{0ex}}a>1.$$
- ${L}_{0}$-norm smooth approximation: the ${L}_{0}$ norm ${\parallel \mathbf{w}\parallel}_{0}=\left|\{i:{\mathrm{w}}_{i}\ne 0\}\right|$ of [11]. Unlike the ${L}_{p}$ norms with $p>0$, the ${L}_{0}$ norm is not a true norm, because it is not absolutely homogeneous, and it is not smooth. A smooth concave approximation of the ${L}_{0}$ norm is therefore used, giving the penalty$${p}_{\lambda}\left(|\mathbf{w}|\right)={\mathbf{1}}^{T}\left(\mathbf{1}-\mathrm{exp}(-\lambda \left|\mathbf{w}\right|)\right)\approx {\parallel \mathbf{w}\parallel}_{0},$$
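The three penalties above are straightforward to evaluate numerically. The following sketch (NumPy, with illustrative default values for $a$; it is not code from the paper) implements them component-wise:

```python
import numpy as np

def scad_penalty(w, lam, a=3.7):
    """SCAD penalty of Fan and Li [14], applied elementwise; requires a > 2."""
    w = np.abs(np.asarray(w, dtype=float))
    return np.where(w < lam, lam * w,
           np.where(w <= a * lam,
                    (a * lam * w - (w**2 + lam**2) / 2) / (a - 1),
                    (a + 1) * lam**2 / 2))

def mcp_penalty(w, lam, a=3.0):
    """Minimax concave penalty of Zhang [28], elementwise; requires a > 1."""
    w = np.abs(np.asarray(w, dtype=float))
    return np.where(w < a * lam, lam * (w - w**2 / (2 * a * lam)), a * lam**2 / 2)

def l0_approx_penalty(w, lam):
    """Smooth concave approximation of the L0 norm: sum_j (1 - exp(-lam*|w_j|))."""
    w = np.abs(np.asarray(w, dtype=float))
    return np.sum(1.0 - np.exp(-lam * w))
```

All three are continuous in $|w|$ and flatten out for large $|w|$, which is what yields nearly unbiased estimates of large coefficients; for example, SCAD equals $\lambda|w|$ near zero and is constant beyond $a\lambda$.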

**Remark 1.**

**Remark 2.**

**Remark 3.**

#### 3.4. An Algorithm to Solve Data-Adaptive Kernel-Penalized SVM

#### 3.5. The Oracle Property

- C1. The densities of $\mathbf{Z}$ given $Y=1$ and $Y=-1$ are continuous with common support in ${R}^{q}$, where $\mathbf{Z}$ denotes the truly relevant predictors.
- C2. $E({Z}_{j}^{2})<\infty $ for $1\le j\le q$; that is, the second-order moments of all active predictors are finite.
- C3. The true parameter ${\mathit{\beta}}_{0}$ is a non-zero and unique vector.
- C4. $q=O\left({n}^{c}\right)$ for some $0\le c<1/2$; namely, ${lim}_{n\to \infty}q/{n}^{c}<\infty $.
- C5. The eigenvalues of ${n}^{-1}{\left[{\mathbf{X}}^{\odot 2}\right]}^{T}{\mathbf{X}}^{\odot 2}$ are finite, where $\mathbf{X}$ is the input matrix and ${(\cdot)}^{\odot 2}$ denotes the component-wise square.
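Condition C5 can be checked empirically for a given input matrix. A minimal sketch, using a hypothetical simulated $\mathbf{X}$ in place of real data:

```python
import numpy as np

# Hypothetical data; X stands in for the paper's input matrix.
rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))

X_sq = X ** 2                        # component-wise square X^{(.)2}
M = X_sq.T @ X_sq / n                # n^{-1} [X^{(.)2}]^T X^{(.)2}
eigvals = np.linalg.eigvalsh(M)      # M is symmetric, so eigvalsh applies
finite = bool(np.all(np.isfinite(eigvals)))
```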

**Theorem 2.**

## 4. Numerical Studies

#### 4.1. Simulation Study

#### 4.2. A Real Data Example

## 5. Concluding Remarks

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A. Proof of Lemmas and Theorems

#### Appendix A.1. Proof of Lemma 1

**Proof.**

#### Appendix A.2. Proof of Theorem 1

**Proof.**

#### Appendix A.3. Proof of Theorem 2: The Oracle Properties in Data-Adaptive Kernel-Penalized SVM

**Proof.**

## References

1. Blum, A.L.; Langley, P. Selection of relevant features and examples in machine learning. *Artif. Intell.* **1997**, *97*, 245–271.
2. Zhang, L.; Hu, H.; Zhang, D. A credit risk assessment model based on SVM for small and medium enterprises in supply chain finance. *Financ. Innov.* **2015**, *1*, 14.
3. Khokhar, S.; Zin, A.A.B.M.; Mokhtar, A.S.B.; Pesaran, M. A comprehensive overview on signal processing and artificial intelligence techniques applications in classification of power quality disturbances. *Renew. Sustain. Energy Rev.* **2015**, *51*, 1650–1663.
4. Vapnik, V.N. Statistical Learning Theory; Wiley: New York, NY, USA, 1998; Volume 1.
5. Rodger, J.A. Discovery of medical Big Data analytics: Improving the prediction of traumatic brain injury survival rates by data mining Patient Informatics Processing Software Hybrid Hadoop Hive. *Inform. Med. Unlocked* **2015**, *1*, 17–26.
6. Maldonado, S.; Weber, R. A wrapper method for feature selection using support vector machines. *Inf. Sci.* **2009**, *179*, 2208–2217.
7. Friedman, J.; Hastie, T.; Tibshirani, R. The Elements of Statistical Learning; Springer: Berlin, Germany, 2001; Volume 1.
8. Zhang, X.; Wu, Y.; Wang, L.; Li, R. Variable selection for support vector machines in moderately high dimensions. *J. R. Stat. Soc. Ser. B (Stat. Methodol.)* **2016**, *78*, 53–76.
9. Guyon, I.; Weston, J.; Barnhill, S.; Vapnik, V. Gene selection for cancer classification using support vector machines. *Mach. Learn.* **2002**, *46*, 389–422.
10. Zou, H. An improved 1-norm SVM for simultaneous classification and variable selection. *AISTATS* **2007**, *2*, 675–681.
11. Maldonado, S.; Weber, R.; Basak, J. Simultaneous feature selection and classification using kernel-penalized support vector machines. *Inf. Sci.* **2011**, *181*, 115–128.
12. Pehro, D.; Stork, D. Pattern Classification; Wiley Interscience: Hoboken, NJ, USA, 2001.
13. Guyon, I.; Elisseeff, A. An introduction to variable and feature selection. *J. Mach. Learn. Res.* **2003**, *3*, 1157–1182.
14. Fan, J.; Li, R. Variable selection via nonconcave penalized likelihood and its oracle properties. *J. Am. Stat. Assoc.* **2001**, *96*, 1348–1360.
15. Bradley, P.S.; Mangasarian, O.L. Feature selection via concave minimization and support vector machines. *ICML* **1998**, *98*, 82–90.
16. Fumera, G.; Roli, F. Support vector machines with embedded reject option. In Pattern Recognition with Support Vector Machines; Springer: New York, NY, USA, 2002; pp. 68–82.
17. Zhu, J.; Rosset, S.; Hastie, T.; Tibshirani, R. 1-norm support vector machines. *NIPS* **2003**, *15*, 49–56.
18. Wang, L.; Zhu, J.; Zou, H. The doubly regularized support vector machine. *Stat. Sin.* **2006**, *12*, 589–615.
19. Wang, L.; Zhu, J.; Zou, H. Hybrid huberized support vector machines for microarray classification and gene selection. *Bioinformatics* **2008**, *24*, 412–419.
20. Zou, H.; Yuan, M. The F∞-norm support vector machine. *Stat. Sin.* **2008**, *18*, 379–398.
21. Park, C.; Kim, K.R.; Myung, R.; Koo, J.Y. Oracle properties of SCAD-penalized support vector machine. *J. Stat. Plan. Inference* **2012**, *142*, 2257–2270.
22. Wu, G.; Chang, E.Y. Adaptive feature-space conformal transformation for imbalanced-data learning. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), Washington, DC, USA, 21–24 August 2003; pp. 816–823.
23. Williams, P.; Li, S.; Feng, J.; Wu, S. Scaling the kernel function to improve performance of the support vector machine. In Advances in Neural Networks–ISNN 2005; Springer: Cham, Switzerland, 2005; pp. 831–836.
24. Maratea, A.; Petrosino, A.; Manzo, M. Adjusted F-measure and kernel scaling for imbalanced data learning. *Inf. Sci.* **2014**, *257*, 331–341.
25. Boser, B.E.; Guyon, I.M.; Vapnik, V.N. A training algorithm for optimal margin classifiers. In Proceedings of the Fifth Annual Workshop on Computational Learning Theory, Pittsburgh, PA, USA, 27–29 July 1992; pp. 144–152.
26. Amari, S.-I.; Wu, S. Improving support vector machine classifiers by modifying kernel functions. *Neural Netw.* **1999**, *12*, 783–789.
27. Lin, Y. Support vector machines and the Bayes rule in classification. *Data Min. Knowl. Discov.* **2002**, *6*, 259–275.
28. Zhang, C.-H. Nearly unbiased variable selection under minimax concave penalty. *Ann. Stat.* **2010**, *38*, 894–942.
29. Wu, S.; Amari, S.-I. Conformal transformation of kernel functions: A data-dependent way to improve support vector machine classifiers. *Neural Process. Lett.* **2002**, *15*, 59–67.
30. Zhu, J.; Rosset, S.; Tibshirani, R.; Hastie, T.J. 1-norm support vector machines. In Advances in Neural Information Processing Systems; The MIT Press: New York, NY, USA, 2004; pp. 49–56.
31. Mazumder, R.; Friedman, J.H.; Hastie, T. SparseNet: Coordinate descent with nonconvex penalties. *J. Am. Stat. Assoc.* **2011**, *106*, 1125–1138.
32. Chen, J.; Chen, Z. Extended Bayesian information criteria for model selection with large model spaces. *Biometrika* **2008**, *95*, 759–771.
33. Claeskens, G.; Croux, C.; Kerckhoven, J.V. An information criterion for variable selection in support vector machines. *J. Mach. Learn. Res.* **2008**, *9*, 541–558.
34. Blake, C.L.; Merz, C.J. UCI Repository of Machine Learning Databases; Department of Information and Computer Science, University of California: Irvine, CA, USA, 1998; Volume 55.
35. Mangasarian, O.L.; Street, W.N.; Wolberg, W.H. Breast cancer diagnosis and prognosis via linear programming. *Oper. Res.* **1995**, *43*, 570–577.

| Method | Proportion | p | Relevant | Irrelevant | True% | Test Error% |
|---|---|---|---|---|---|---|
| DA-SCAD-SVM | c = 0.50 | 50 | 5.00(0.00) | 0.88(0.16) | 96 | 8.16(0.2) |
| | | 100 | 5.00(0.00) | 0.91(0.14) | 96 | 8.72(0.2) |
| | c = 0.75 | 50 | 4.96(0.01) | 0.92(0.23) | 94 | 9.23(0.3) |
| | | 100 | 4.95(0.01) | 0.95(0.27) | 94 | 9.85(0.3) |
| | c = 0.90 | 50 | 4.91(0.03) | 1.10(0.39) | 91 | 10.55(0.4) |
| | | 100 | 4.90(0.03) | 1.09(0.41) | 91 | 10.93(0.4) |
| DA-MCP-SVM | c = 0.50 | 50 | 5.00(0.00) | 0.12(0.01) | 98 | 7.20(0.2) |
| | | 100 | 5.00(0.00) | 0.13(0.01) | 98 | 7.38(0.2) |
| | c = 0.75 | 50 | 4.98(0.01) | 0.26(0.03) | 96 | 8.44(0.2) |
| | | 100 | 4.98(0.01) | 0.28(0.03) | 96 | 8.90(0.2) |
| | c = 0.90 | 50 | 4.95(0.02) | 0.42(0.04) | 92 | 9.20(0.3) |
| | | 100 | 4.94(0.02) | 0.45(0.04) | 92 | 9.65(0.3) |
| DA-L0-SVM | c = 0.50 | 50 | 5.00(0.00) | 0.36(0.02) | 97 | 7.81(0.2) |
| | | 100 | 5.00(0.00) | 0.39(0.02) | 97 | 7.86(0.2) |
| | c = 0.75 | 50 | 4.97(0.01) | 0.47(0.03) | 95 | 8.02(0.2) |
| | | 100 | 4.96(0.01) | 0.51(0.03) | 95 | 8.10(0.2) |
| | c = 0.90 | 50 | 4.92(0.02) | 0.68(0.04) | 91 | 9.70(0.3) |
| | | 100 | 4.92(0.02) | 0.65(0.04) | 90 | 9.82(0.3) |
| SCAD-SVM | c = 0.50 | 50 | 4.92(0.02) | 1.92(0.18) | 96 | 8.23(0.2) |
| | | 100 | 4.91(0.02) | 1.99(0.17) | 96 | 8.66(0.2) |
| | c = 0.75 | 50 | 4.83(0.03) | 2.01(0.31) | 91 | 10.19(0.4) |
| | | 100 | 4.78(0.04) | 2.13(0.36) | 91 | 10.87(0.4) |
| | c = 0.90 | 50 | 4.76(0.04) | 3.35(0.41) | 88 | 12.15(0.5) |
| | | 100 | 4.74(0.04) | 3.40(0.43) | 87 | 12.36(0.5) |
| MCP-SVM | c = 0.50 | 50 | 5.00(0.00) | 0.27(0.02) | 98 | 7.32(0.2) |
| | | 100 | 5.00(0.00) | 0.29(0.02) | 98 | 7.41(0.2) |
| | c = 0.75 | 50 | 4.92(0.01) | 0.43(0.03) | 93 | 8.96(0.2) |
| | | 100 | 4.91(0.01) | 0.47(0.03) | 93 | 9.29(0.3) |
| | c = 0.90 | 50 | 4.85(0.03) | 0.88(0.05) | 89 | 10.63(0.4) |
| | | 100 | 4.84(0.03) | 0.91(0.05) | 89 | 11.79(0.4) |
| ${L}_{1}$-SVM | c = 0.50 | 50 | 4.86(0.05) | 31.08(1.52) | 10 | 16.67(0.5) |
| | | 100 | 4.71(0.06) | 42.98(2.13) | 4 | 19.33(0.6) |
| | c = 0.75 | 50 | 4.62(0.07) | 35.71(1.67) | 3 | 19.18(0.6) |
| | | 100 | 4.45(0.08) | 46.29(2.20) | 0 | 22.00(0.8) |
| | c = 0.90 | 50 | 4.33(0.10) | 39.53(2.02) | 1 | 22.61(0.8) |
| | | 100 | 4.02(0.10) | 59.01(2.54) | 0 | 25.98(1.0) |
| Adapt ${L}_{1}$-SVM | c = 0.50 | 50 | 4.38(0.07) | 13.62(0.90) | 23 | 16.28(0.5) |
| | | 100 | 4.01(0.10) | 13.10(0.86) | 5 | 20.23(0.5) |
| | c = 0.75 | 50 | 4.13(0.09) | 15.18(1.05) | 8 | 18.71(0.5) |
| | | 100 | 3.91(0.10) | 14.92(1.03) | 0 | 22.33(0.6) |
| | c = 0.90 | 50 | 3.87(0.10) | 16.99(1.22) | 2 | 20.02(0.6) |
| | | 100 | 3.81(0.13) | 16.87(1.21) | 0 | 25.01(0.7) |
| ${L}_{0}$-SVM | c = 0.50 | 50 | 4.85(0.02) | 2.87(0.66) | 62 | 12.16(0.5) |
| | | 100 | 4.78(0.04) | 2.93(0.49) | 54 | 14.16(0.4) |
| | c = 0.75 | 50 | 4.61(0.04) | 4.11(0.23) | 55 | 13.88(0.4) |
| | | 100 | 4.37(0.08) | 4.23(0.56) | 43 | 15.73(0.4) |
| | c = 0.90 | 50 | 4.33(0.07) | 6.28(0.77) | 41 | 16.68(0.5) |
| | | 100 | 4.03(0.10) | 6.79(0.78) | 25 | 17.02(0.5) |

| Method | Proportion | p | Relevant | Irrelevant | True% | Test Error% |
|---|---|---|---|---|---|---|
| DA-SCAD-SVM | c = 0.50 | 200 | 5.00(0.00) | 0.58(0.11) | 98 | 7.76(0.2) |
| | | 400 | 5.00(0.00) | 0.72(0.13) | 98 | 8.13(0.2) |
| | c = 0.75 | 200 | 4.98(0.01) | 0.67(0.12) | 96 | 8.76(0.3) |
| | | 400 | 4.98(0.01) | 0.71(0.13) | 96 | 9.12(0.3) |
| | c = 0.90 | 200 | 4.95(0.02) | 0.81(0.17) | 93 | 9.14(0.3) |
| | | 400 | 4.94(0.02) | 0.77(0.16) | 93 | 9.93(0.3) |
| DA-MCP-SVM | c = 0.50 | 200 | 5.00(0.00) | 0.05(0.01) | 98 | 6.28(0.2) |
| | | 400 | 5.00(0.00) | 0.06(0.01) | 98 | 6.91(0.2) |
| | c = 0.75 | 200 | 4.98(0.01) | 0.12(0.04) | 97 | 7.45(0.2) |
| | | 400 | 4.98(0.01) | 0.11(0.04) | 97 | 7.93(0.2) |
| | c = 0.90 | 200 | 4.95(0.02) | 0.18(0.05) | 94 | 8.60(0.2) |
| | | 400 | 4.94(0.02) | 0.19(0.05) | 94 | 9.11(0.3) |
| DA-L0-SVM | c = 0.50 | 200 | 5.00(0.00) | 0.26(0.01) | 98 | 7.02(0.2) |
| | | 400 | 5.00(0.00) | 0.28(0.01) | 98 | 7.12(0.2) |
| | c = 0.75 | 200 | 4.98(0.01) | 0.33(0.08) | 96 | 7.88(0.2) |
| | | 400 | 4.98(0.01) | 0.36(0.08) | 97 | 8.02(0.2) |
| | c = 0.90 | 200 | 4.95(0.02) | 0.44(0.10) | 93 | 9.15(0.2) |
| | | 400 | 4.94(0.02) | 0.49(0.10) | 93 | 9.54(0.3) |
| SCAD-SVM | c = 0.50 | 200 | 4.96(0.01) | 1.52(0.15) | 96 | 8.01(0.2) |
| | | 400 | 4.96(0.01) | 1.76(0.16) | 96 | 8.36(0.2) |
| | c = 0.75 | 200 | 4.88(0.03) | 1.77(0.16) | 92 | 9.59(0.3) |
| | | 400 | 4.82(0.04) | 1.98(0.18) | 92 | 10.27(0.4) |
| | c = 0.90 | 200 | 4.82(0.04) | 2.89(0.36) | 90 | 11.32(0.5) |
| | | 400 | 4.77(0.04) | 3.11(0.40) | 89 | 11.87(0.4) |
| MCP-SVM | c = 0.50 | 200 | 5.00(0.00) | 0.27(0.02) | 98 | 7.32(0.2) |
| | | 400 | 5.00(0.00) | 0.29(0.02) | 98 | 7.41(0.2) |
| | c = 0.75 | 200 | 4.92(0.01) | 0.43(0.03) | 93 | 8.96(0.2) |
| | | 400 | 4.91(0.01) | 0.47(0.03) | 93 | 9.29(0.3) |
| | c = 0.90 | 200 | 4.85(0.03) | 0.88(0.05) | 89 | 10.63(0.4) |
| | | 400 | 4.84(0.03) | 0.91(0.05) | 89 | 11.79(0.4) |
| ${L}_{1}$-SVM | c = 0.50 | 200 | 4.88(0.04) | 25.08(1.22) | 15 | 14.91(0.4) |
| | | 400 | 4.79(0.06) | 28.66(1.56) | 8 | 17.76(0.5) |
| | c = 0.75 | 200 | 4.65(0.07) | 28.12(1.54) | 5 | 16.53(0.5) |
| | | 400 | 4.45(0.08) | 31.67(1.53) | 1 | 20.35(0.7) |
| | c = 0.90 | 200 | 4.43(0.09) | 33.53(1.61) | 0 | 19.53(0.6) |
| | | 400 | 4.11(0.09) | 40.27(2.08) | 0 | 23.16(0.9) |
| Adapt ${L}_{1}$-SVM | c = 0.50 | 200 | 4.49(0.08) | 11.28(0.90) | 35 | 13.28(0.5) |
| | | 400 | 4.25(0.09) | 13.10(0.86) | 16 | 16.55(0.6) |
| | c = 0.75 | 200 | 4.25(0.09) | 13.65(1.05) | 17 | 15.97(0.5) |
| | | 400 | 4.12(0.09) | 14.16(1.03) | 6 | 18.46(0.6) |
| | c = 0.90 | 200 | 3.87(0.10) | 14.85(1.22) | 5 | 18.98(0.6) |
| | | 400 | 4.01(0.10) | 15.26(1.21) | 1 | 21.98(0.7) |
| ${L}_{0}$-SVM | c = 0.50 | 200 | 4.88(0.02) | 2.42(0.66) | 77 | 11.42(0.5) |
| | | 400 | 4.82(0.02) | 2.65(0.23) | 60 | 12.91(0.5) |
| | c = 0.75 | 200 | 4.73(0.04) | 3.69(0.30) | 65 | 12.51(0.5) |
| | | 400 | 4.49(0.06) | 3.82(0.23) | 48 | 13.80(0.5) |
| | c = 0.90 | 200 | 4.46(0.06) | 5.52(0.63) | 47 | 15.23(0.5) |
| | | 400 | 4.33(0.07) | 6.18(0.76) | 29 | 16.45(0.6) |

**Table 3.** Classification outcome on the Wisconsin Breast Cancer data set. Margins are provided in brackets.

| Methods | # of Features | Prediction Error (%) |
|---|---|---|
| DA-SCAD-SVM | 6(0.8) | 9.6(0.3) |
| DA-MCP-SVM | 5(0.2) | 9.4(0.2) |
| DA-L0-SVM | 5(0.4) | 9.6(0.2) |
| SCAD-SVM | 7(0.8) | 10.9(0.3) |
| MCP-SVM | 6(0.2) | 13.2(0.2) |
| ${L}_{0}$-norm Approximation SVM | 12(1.3) | 15.2(0.2) |
| Adapt ${L}_{1}$-norm SVM | 14.50(2.4) | 17(1.5) |


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Liu, X.; Zhao, B.; He, W. Simultaneous Feature Selection and Classification for Data-Adaptive Kernel-Penalized SVM. *Mathematics* **2020**, *8*, 1846. https://doi.org/10.3390/math8101846