Overfitting Reduction of Text Classification Based on AdaBELM
Abstract
Overfitting is an important problem in machine learning. Several algorithms, such as the extreme learning machine (ELM), suffer from it when facing high-dimensional sparse data, e.g., in text classification. A related problem is that the extent of overfitting is usually not quantified. In this paper, we propose a quantitative measure of overfitting, referred to as the rate of overfitting (RO), and a novel model, named AdaBELM, to reduce overfitting. With RO, overfitting can be quantitatively measured and identified. The newly proposed model achieves high performance on multi-class text classification. To evaluate the generalizability of the new model, we conducted experiments on three datasets, the 20 Newsgroups, Reuters-21578, and BioMed corpora, which represent balanced, unbalanced, and real-application data, respectively. Experimental results demonstrate that AdaBELM reduces overfitting and outperforms classical ELM, decision trees, random forests, and AdaBoost on all three text-classification datasets; for example, it can achieve 62.2% higher accuracy than ELM. Therefore, the proposed model generalizes well.
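To illustrate why a plain ELM tends to overfit high-dimensional sparse inputs, here is a minimal sketch of the standard single-hidden-layer ELM (random, untrained hidden weights; output weights solved by the Moore-Penrose pseudoinverse). This is not AdaBELM, and the printed train-minus-test accuracy gap is only a simple proxy; it is not the paper's RO measure, whose definition is not given in this abstract. The synthetic "documents", the hypothetical linear labeller `V`, and all sizes (2000 features, 2% density, 100 training samples, 200 hidden nodes) are assumptions chosen for illustration.

```python
import numpy as np

def make_sparse_data(n, d, density, rng):
    # Hypothetical stand-in for sparse term-weight vectors: ~density of entries are nonzero.
    mask = rng.random((n, d)) < density
    return mask * rng.random((n, d))

def elm_fit(X, Y, n_hidden, rng):
    # Basic ELM: hidden weights W and biases b are random and never trained.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # sigmoid hidden-layer output
    beta = np.linalg.pinv(H) @ Y            # least-squares output weights
    return W, b, beta

def elm_predict(model, X):
    W, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return np.argmax(H @ beta, axis=1)      # class with the largest output score

def accuracy(model, X, y):
    return float(np.mean(elm_predict(model, X) == y))

rng = np.random.default_rng(0)
n_features, n_classes = 2000, 3
V = rng.standard_normal((n_features, n_classes))  # hypothetical "true" linear labeller
X_train = make_sparse_data(100, n_features, 0.02, rng)
X_test = make_sparse_data(1000, n_features, 0.02, rng)
y_train = np.argmax(X_train @ V, axis=1)
y_test = np.argmax(X_test @ V, axis=1)
Y_train = np.eye(n_classes)[y_train]  # one-hot targets for the least-squares fit

model = elm_fit(X_train, Y_train, n_hidden=200, rng=rng)
train_acc = accuracy(model, X_train, y_train)
test_acc = accuracy(model, X_test, y_test)
print(f"train={train_acc:.2f} test={test_acc:.2f} gap={train_acc - test_acc:.2f}")
```

Because the number of hidden nodes (200) exceeds the number of training samples (100), the pseudoinverse solution can reproduce the training labels essentially exactly, while accuracy on held-out samples is much lower; that gap is the kind of overfitting the abstract attributes to ELM on sparse text features.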
Cite This Article
Feng, X.; Liang, Y.; Shi, X.; Xu, D.; Wang, X.; Guan, R. Overfitting Reduction of Text Classification Based on AdaBELM. Entropy 2017, 19, 330.
Feng X, Liang Y, Shi X, Xu D, Wang X, Guan R. Overfitting Reduction of Text Classification Based on AdaBELM. Entropy. 2017; 19(7):330.
Feng, Xiaoyue; Liang, Yanchun; Shi, Xiaohu; Xu, Dong; Wang, Xu; Guan, Renchu. 2017. "Overfitting Reduction of Text Classification Based on AdaBELM." Entropy 19, no. 7: 330.
Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.