
Overfitting Reduction of Text Classification Based on AdaBELM

by Xiaoyue Feng 1, Yanchun Liang 1,2,3, Xiaohu Shi 1,2, Dong Xu 1,3, Xu Wang 1 and Renchu Guan 1,2,*
1 Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun 130012, China
2 Zhuhai Laboratory of Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Zhuhai College of Jilin University, Zhuhai 519041, China
3 Department of Electrical Engineering and Computer Science, and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO 65211, USA
* Author to whom correspondence should be addressed.
Entropy 2017, 19(7), 330; https://doi.org/10.3390/e19070330
Received: 2 May 2017 / Revised: 26 June 2017 / Accepted: 29 June 2017 / Published: 6 July 2017
(This article belongs to the Special Issue Information Theory in Machine Learning and Data Science)
Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from this issue when facing high-dimensional sparse data, e.g., in text classification. One common issue is that the extent of overfitting is not well quantified. In this paper, we propose a quantitative measure of overfitting, referred to as the rate of overfitting (RO), and a novel model, named AdaBELM, to reduce overfitting. With RO, the overfitting problem can be quantitatively measured and identified. The newly proposed model can achieve high performance on multi-class text classification. To evaluate the generalizability of the new model, we designed experiments based on three datasets, i.e., the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real application data, respectively. Experimental results demonstrate that AdaBELM can reduce overfitting and outperform the classical ELM, decision tree, random forests, and AdaBoost on all three text-classification datasets; for example, it achieves 62.2% higher accuracy than ELM. Therefore, the proposed model has good generalizability.
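The abstract describes boosting with ELM base learners. As a minimal sketch only (not the authors' implementation), the following shows the core ELM idea the model builds on: a random hidden layer followed by an analytic least-squares solution for the output weights. The `rate_of_overfitting` helper is purely illustrative of one natural relative-gap formulation; the paper's exact RO definition is given in its Methods section, not reproduced here.

```python
import numpy as np

def train_elm(X, y, n_hidden=50, rng=None):
    """Minimal extreme learning machine (ELM) sketch:
    random hidden layer, output weights solved by least squares."""
    rng = np.random.default_rng(rng)
    n_classes = y.max() + 1
    W = rng.standard_normal((X.shape[1], n_hidden))  # random input weights (never trained)
    b = rng.standard_normal(n_hidden)                # random hidden biases
    H = np.tanh(X @ W + b)                           # hidden-layer feature map
    T = np.eye(n_classes)[y]                         # one-hot targets
    beta = np.linalg.pinv(H) @ T                     # analytic output weights
    return W, b, beta

def predict_elm(model, X):
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)

def rate_of_overfitting(train_acc, test_acc):
    # Hypothetical illustration: overfitting as the relative gap
    # between training and test accuracy (NOT the paper's definition).
    return (train_acc - test_acc) / train_acc

# Toy usage: two well-separated Gaussian clusters.
data_rng = np.random.default_rng(0)
X = np.vstack([data_rng.normal(-2.0, 1.0, (50, 2)),
               data_rng.normal(2.0, 1.0, (50, 2))])
y = np.repeat([0, 1], 50)
acc = (predict_elm(train_elm(X, y, n_hidden=20, rng=1), X) == y).mean()
```

Because only the output weights are fit, an ELM trains very fast, but the fixed random hidden layer can overfit sparse high-dimensional inputs; boosting such learners, as AdaBELM does with AdaBoost-style reweighting, is one way to mitigate this.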
Keywords: machine learning; overfitting; AdaBoost; feedforward neural network; extreme learning machine
MDPI and ACS Style

Feng, X.; Liang, Y.; Shi, X.; Xu, D.; Wang, X.; Guan, R. Overfitting Reduction of Text Classification Based on AdaBELM. Entropy 2017, 19, 330.
