Article

Mining the Frequent Patterns of Named Entities for Long Document Classification

1 College of Systems Engineering, National University of Defense Technology, Changsha 410073, China
2 Beijing Institute of Computer Technology and Application, Beijing 100854, China
3 Data Intelligence System Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2022, 12(5), 2544; https://doi.org/10.3390/app12052544
Submission received: 14 January 2022 / Revised: 18 February 2022 / Accepted: 24 February 2022 / Published: 28 February 2022
(This article belongs to the Topic Machine and Deep Learning)

Abstract

A large amount of information is now stored as text, and numerous text mining techniques have been developed for applications such as event detection, news topic classification, public opinion detection, and sentiment analysis. Although significant progress has been made in short text classification, document-level text classification requires further exploration. Long documents typically contain irrelevant, noisy information that obscures the indicative features and limits the interpretability of classification results. To alleviate this problem, we present MIPELD (mining the frequent pattern of a named entity for long document classification), a model for long document classification that mines the frequent patterns of named entities and uses them as features. The discovered patterns allow semantic generalization across documents and provide clues for verifying the classification results. Experiments on several datasets yielded good accuracy and macro-F1 values, meeting the requirements for practical application. Further analysis validated the effectiveness of MIPELD in mining interpretable information for text classification.
Keywords: long document classification; key feature mining; Naive Bayesian
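
As a rough illustration of the idea summarized in the abstract, the sketch below treats each document as a set of already-extracted named entities, keeps the entity sets that co-occur in enough documents, and feeds those patterns to a small Bernoulli Naive Bayes classifier. It is not the authors' implementation of MIPELD: the function names, the brute-force co-occurrence counting, the `min_support` and `max_size` parameters, and the toy entity lists are all assumptions made for this example, and the named entity recognition step is assumed to have happened upstream.

```python
import math
from collections import Counter
from itertools import combinations


def frequent_patterns(entity_docs, min_support=2, max_size=2):
    """Return entity sets (frozensets) found in at least min_support documents.

    Hypothetical helper: brute-force counting of entity combinations,
    chosen for brevity rather than efficiency.
    """
    counts = Counter()
    for entities in entity_docs:
        unique = sorted(set(entities))
        for size in range(1, max_size + 1):
            for combo in combinations(unique, size):
                counts[frozenset(combo)] += 1
    return {p for p, c in counts.items() if c >= min_support}


def featurize(entities, patterns):
    """Return the mined patterns that are fully contained in one document."""
    present = set(entities)
    return {p for p in patterns if p <= present}


def train_nb(entity_docs, labels, patterns):
    """Count, per class, how many documents contain each pattern."""
    class_docs = Counter(labels)
    pattern_counts = {c: Counter() for c in class_docs}
    for entities, label in zip(entity_docs, labels):
        pattern_counts[label].update(featurize(entities, patterns))
    return class_docs, pattern_counts


def predict(entities, patterns, model):
    """Bernoulli Naive Bayes prediction with add-one (Laplace) smoothing."""
    class_docs, pattern_counts = model
    total = sum(class_docs.values())
    active = featurize(entities, patterns)
    best, best_score = None, -math.inf
    for c, n in class_docs.items():
        score = math.log(n / total)  # class prior
        for p in patterns:
            prob = (pattern_counts[c][p] + 1) / (n + 2)
            score += math.log(prob if p in active else 1 - prob)
        if score > best_score:
            best, best_score = c, score
    return best


# Toy entity lists (invented for illustration, not from the paper's datasets).
docs = [
    ["NASA", "SpaceX", "Mars"],
    ["NASA", "Mars", "ESA"],
    ["FIFA", "Messi", "Argentina"],
    ["FIFA", "Argentina", "Brazil"],
]
labels = ["space", "space", "sport", "sport"]

patterns = frequent_patterns(docs, min_support=2)
model = train_nb(docs, labels, patterns)
prediction = predict(["NASA", "Mars", "Boeing"], patterns, model)
print(prediction)
print(featurize(["NASA", "Mars", "Boeing"], patterns))
```

The set returned by `featurize` is what makes a prediction inspectable in this sketch: it lists the entity patterns that matched the document, which loosely mirrors the abstract's point that discovered patterns provide clues for verifying the result.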

Share and Cite

MDPI and ACS Style

Wang, B.; Qi, R.; Gao, J.; Zhang, J.; Yuan, X.; Ke, W. Mining the Frequent Patterns of Named Entities for Long Document Classification. Appl. Sci. 2022, 12, 2544. https://doi.org/10.3390/app12052544

Chicago/Turabian Style

Wang, Bohan, Rui Qi, Jinhua Gao, Jianwei Zhang, Xiaoguang Yuan, and Wenjun Ke. 2022. "Mining the Frequent Patterns of Named Entities for Long Document Classification" Applied Sciences 12, no. 5: 2544. https://doi.org/10.3390/app12052544