Multi-Class Document Classification Using Lexical Ontology-Based Deep Learning
Abstract
1. Introduction
- Evaluation of the effect of WordNet-ontology-based feature dimension reduction on classification performance for imbalanced, multi-class (seven class labels) data.
- Comparison of the performance of machine learning and deep learning classification algorithms when used on a multi-class imbalanced dataset with different imbalance ratios.
- Using WordNet lexicographer files (rather than a domain ontology) for feature dimension reduction, combined with several word embedding methods, improves the performance of some classical machine learning classifiers.
- The highest performance was achieved by the hybrid model that combines WordNet lexicography-based dimension reduction with the BERT algorithm; our experiments show that lexicography-based feature dimension reduction increased classification success.
2. Literature Review
2.1. Document Classification
2.2. Feature Dimension Reduction Using WordNet Ontology
2.3. Binary and Multi-Class Classification
2.4. Statistical Analysis of Document Classification
3. Methodology
3.1. WordNet Ontology
3.2. Word Embedding Methods
3.2.1. Bag of Words (BoW)
3.2.2. TF-IDF
3.2.3. Word2Vec
3.2.4. Doc2Vec
3.3. Classification Methods
3.3.1. Random Forest (RF)
- Random samples are drawn from the input dataset.
- A decision tree is constructed for each sample, and each tree produces its own prediction.
- In a classification problem, each tree's prediction counts as a vote for a class (see the sketch after this list).
- The class that receives the most votes is the final result.
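This bagging-and-voting procedure maps directly onto standard library implementations. Below is a minimal, illustrative sketch using scikit-learn's RandomForestClassifier; the synthetic data and the number of trees are placeholders, not values from this study (the only RF parameter reported later is random_state = 0).

```python
# Minimal Random Forest sketch (scikit-learn); data and tree count are illustrative.
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Placeholder data standing in for document feature vectors (e.g., BoW or Doc2Vec).
X, y = make_classification(n_samples=500, n_features=50, n_classes=3,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Each tree is trained on a bootstrap sample; the predicted class is the majority vote.
rf = RandomForestClassifier(n_estimators=100, random_state=0)
rf.fit(X_train, y_train)
print("Accuracy:", rf.score(X_test, y_test))
```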
3.3.2. Support Vector Machine (SVM)
3.3.3. Multi-Layer Perceptron (MLP)
3.3.4. Bidirectional Encoder Representations from Transformers (BERT)
4. Experimental Design
4.1. Application Design
4.2. Dataset
4.3. Data Preprocessing
- Punctuation marks were removed.
- HTML tags were removed.
- Numeric expressions were removed.
- All words were converted to lowercase.
- Stop words were removed.
- Word spellings were corrected.
- Lemmatization was applied.
- Stop words formed after lemmatization were removed (a minimal sketch of the full pipeline follows this list).
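The following is a minimal sketch of this preprocessing pipeline using NLTK; the function name and the regular expressions are illustrative assumptions, and the spell-correction step is omitted because no specific tool is named here.

```python
# Illustrative preprocessing pipeline: HTML/punctuation/number removal, lowercasing,
# stop-word removal, and lemmatization (spell correction omitted in this sketch).
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOP_WORDS = set(stopwords.words("english"))
lemmatizer = WordNetLemmatizer()

def preprocess(text: str) -> str:
    text = re.sub(r"<[^>]+>", " ", text)        # remove HTML tags
    text = re.sub(r"[^a-zA-Z\s]", " ", text)    # remove punctuation and numeric expressions
    tokens = text.lower().split()               # lowercase and tokenize
    tokens = [t for t in tokens if t not in STOP_WORDS]
    tokens = [lemmatizer.lemmatize(t) for t in tokens]
    # Stop words that only emerge after lemmatization are removed in a second pass.
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return " ".join(tokens)

print(preprocess("Iranian authorities on Saturday executed <b>journalist</b> Ruhollah Zam in 2017."))
```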
4.4. Ontology-Based Feature Dimension Reduction
5. Experiment and Results
5.1. Proposed Model
5.2. Machine Learning Classifiers
5.3. Deep Learning Classifier
6. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Word Embedding Method | Advantages | Disadvantages
---|---|---
BoW | Easy to implement. Does not require extensive training data, so it can be used to create an initial draft model before moving to more sophisticated word embeddings. | As the size of the vocabulary increases, the size of the BoW vector representation grows accordingly.
TF-IDF | Simple to use: computationally efficient, cost effective to run, and provides a clear basis for similarity calculations. | Cannot carry semantic meaning and disregards word order.
Word2Vec | A computationally efficient method for generating word embeddings. Produces embeddings that capture semantic relationships between words. Compresses the high-dimensional word space into a lower-dimensional vector space. Flexible and can be trained on different datasets. Can be used for language modeling, i.e., predicting the likelihood of a sequence of words in a corpus. | Handles polysemy (a single word with multiple meanings) poorly. Requires a large training corpus and may not perform well on words absent from the training data. The learned word vectors are hard to interpret. Although generally efficient, training on large datasets or with complex models can still require significant computing resources.
Doc2Vec | Generates vector representations that capture semantic relationships between entire documents, which is useful for document classification, clustering, and similarity search. Handles variable-length documents, unlike traditional bag-of-words models. Can incorporate the context of a document, including surrounding documents and other relevant information, into its vector representation. Computationally efficient and usable on large datasets. | More complex than traditional bag-of-words models and harder to implement and interpret. Can require significant computing resources when trained on large datasets or with complex models. Embedding quality depends on the quality and size of the training data. The document vectors are hard to interpret, which makes it difficult to explain why certain documents are more similar to each other than others.
Classification Method | Advantages | Disadvantages
---|---|---
RF | Mitigates the overfitting of individual decision trees, improving accuracy. Versatile: usable for both classification and regression. Works with both categorical and continuous values. Can handle missing values without imputation or extra pre-processing. | Requires substantial computational power and resources. Training can be slow because a large number of decision trees are combined. The ensemble of trees also hurts interpretability and makes it hard to judge the significance of each variable.
SVM | Particularly effective when there is a clear separation between classes. Tends to be effective in high-dimensional spaces, including when the number of dimensions exceeds the number of samples. Relatively memory-efficient. | Not well suited to large datasets because of its higher computational complexity and memory requirements. Performs poorly on noisy datasets and when the number of features per data point exceeds the number of training samples.
MLP | Capable of addressing complex nonlinear problems. Handles large amounts of input data effectively. Makes quick predictions after training. Can achieve comparable accuracy even with smaller sample sizes. | Contains a very large number of parameters because the layers are fully connected.
BERT | Generates "contextualized" word embeddings/vectors, which is its biggest advantage. | Computationally intensive at inference time, so using it in production at scale can become costly.
Class Number | Category (Class Label) | Total Number of Documents | Training Set | Test Set |
---|---|---|---|---|
1 | Automobile | 256 | 190 | 66 |
2 | Entertainment | 998 | 697 | 301 |
3 | Politics | 546 | 380 | 166 |
4 | Science | 389 | 277 | 112 |
5 | Sports | 856 | 594 | 262 |
6 | Technology | 751 | 514 | 237 |
7 | World | 1021 | 719 | 302 |
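The per-class counts above imply a roughly 70/30 train/test split and an imbalance ratio of about 4:1 between the largest class (World) and the smallest (Automobile). The snippet below is a hedged sketch of how such a stratified split and the imbalance ratio can be computed; the 0.3 test fraction is inferred from the table rather than stated explicitly.

```python
# Sketch: class counts, imbalance ratio, and a stratified ~70/30 split (fraction inferred).
from collections import Counter
from sklearn.model_selection import train_test_split

# Placeholder labels standing in for the seven news categories.
labels = (["Automobile"] * 256 + ["Entertainment"] * 998 + ["Politics"] * 546 +
          ["Science"] * 389 + ["Sports"] * 856 + ["Technology"] * 751 + ["World"] * 1021)
texts = [f"document {i}" for i in range(len(labels))]   # dummy documents

counts = Counter(labels)
imbalance_ratio = max(counts.values()) / min(counts.values())
print(counts, "imbalance ratio:", round(imbalance_ratio, 2))   # 1021 / 256 = 3.99

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.3, stratify=labels, random_state=0)
print(len(y_train), "training /", len(y_test), "test documents")
```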
File Number | Name | Contents |
---|---|---|
13 | noun.food | nouns denoting foods and drinks |
15 | noun.location | nouns denoting spatial position |
18 | noun.person | nouns denoting people |
21 | noun.possession | nouns denoting possession and transfer of possession |
23 | noun.quantity | nouns denoting quantities and units of measure |
25 | noun.shape | nouns denoting two- and three-dimensional shapes |
28 | noun.time | nouns denoting time and temporal relations |
37 | verb.emotion | verbs of feeling |
40 | verb.possession | verbs of buying, selling, owning |
41 | verb.social | verbs of political and social activities and events |
43 | verb.weather | verbs of raining, snowing, thawing, thundering |
Original Data | After Preprocessing | After WordNet |
---|---|---|
Iranian authorities on Saturday executed journalist Ruhollah Zam over his online work that helped inspire nationwide economic protests in 2017. A court had sentenced Zam to death in June after he was found guilty of “corruption on earth”, one of the country’s most serious offences. Zam had been living in exile in France but was arrested in October last year. | iranian authority saturday executed journalist roll online work helped inspire nationwide economic protest court sentenced death june found guilty corruption earth one country serious offence living exile france arrested october last year | person authority time social person roll online work social emotion nationwide economic protest court sentenced death time possession guilty corruption earth quantity country serious offence living person location arrested time time time |
Tokyo Stock Exchange (TSE) President and CEO Koichiro Miyahara will step down to accept responsibility over a system failure last month that resulted in the first all-day stoppage of trading since the exchange switched to all-electronic trading in 1999. Akira Kiyota, the Group CEO of Japan Exchange Group that runs the TSE, will temporarily take over Miyahara’s role. | tokyo stock exchange president co cairo micah ara step accept responsibility system failure last month resulted first day stoppage trading since exchange switched electronic trading akita toyota group co japan exchange group run temporarily take micah ara role | location possession exchange person co location person ara step accept responsibility system failure time time resulted first time stoppage trading since exchange switched electronic trading akita toyota group co location exchange group run temporarily possession person ara role |
Mick Schumacher, son of seven-time world champion Michael Schumacher, will be racing for Haas in the next Formula One season. The 21-year-old German signed a multi-year agreement and will partner Russian Nikita Mazepin. “The prospect of being on the Formula One grid next year makes me incredibly happy…I’m simply speechless”, said Mick. He is currently leading the Formula Two championship. | mick schumacher son seven time world champion michael schumacher racing haas next formula one season year old german signed multi year agreement partner russian nikita maze prospect formula one grid next year make incredibly happy simply speechless said mick currently leading formula two championship | person schumacher person quantity time world champion person schumacher racing haas next formula quantity time time time person signed multi time agreement person person nikita maze prospect formula quantity grid next time make incredibly happy simply speechless said person currently leading formula quantity championship |
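In the "After WordNet" column, each lemma that belongs to one of the selected lexicographer files (see the table of files above) is replaced by the file's category name (e.g., noun.person becomes person, noun.time becomes time), while other words are kept unchanged. The following is a minimal sketch of this mapping with NLTK's WordNet interface; taking each word's first synset is an assumption made here for illustration.

```python
# Sketch: replace words with WordNet lexicographer-file categories (selected files only).
# Choosing each word's first synset is an assumption for illustration.
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

# Categories drawn from the selected lexicographer files listed in the table above.
SELECTED = {"noun.food", "noun.location", "noun.person", "noun.possession",
            "noun.quantity", "noun.shape", "noun.time",
            "verb.emotion", "verb.possession", "verb.social", "verb.weather"}

def to_lex_category(word: str) -> str:
    synsets = wn.synsets(word)
    if synsets and synsets[0].lexname() in SELECTED:
        return synsets[0].lexname().split(".")[1]   # e.g. "noun.person" -> "person"
    return word                                     # keep words outside the selected files

sentence = "iranian authority saturday executed journalist"
print(" ".join(to_lex_category(w) for w in sentence.split()))
```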
(a) Parameters for BoW

Parameter | Parameter Value
---|---
Max Features | 500
Min df | 5
Max df | 0.7

(b) Parameters for TF-IDF

Parameter | Parameter Value
---|---
Max Features | 1000
Min df | 5
Max df | 0.7

(c) Parameters for Word2Vec

Parameter | Parameter Value
---|---
Training Algorithm | skip-gram
Window | 5
Min Count | 5
Size | 200
Workers | 100
Epoch | 100

(d) Parameters for Doc2Vec

Parameter | Parameter Value
---|---
Training Algorithm | PV-DM
Vector Size | 200
Window | 8
Workers | 100
Epoch | 25

(e) Parameters for RF

Parameter | Parameter Value
---|---
Random State | 0

(f) Parameters for SVM

Parameter | Parameter Value
---|---
Max Iter | 15,000
Kernel | Linear
Gamma | Auto

(g) Parameters for MLP

Parameter | Parameter Value
---|---
Solver | lbfgs
Max Iter | 50
Hidden Layer Sizes | 50, 50, 50
Activation | ReLU
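For reference, the sketch below shows how the parameter values in (a)–(g) would typically be passed to scikit-learn and gensim constructors. This is an illustrative mapping under the assumption that these libraries were used; any parameter not listed above is left at its library default, and the dummy corpus exists only to make the snippet runnable.

```python
# Illustrative mapping of the reported parameters onto scikit-learn / gensim constructors.
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from gensim.models import Word2Vec, Doc2Vec

bow = CountVectorizer(max_features=500, min_df=5, max_df=0.7)            # (a)
tfidf = TfidfVectorizer(max_features=1000, min_df=5, max_df=0.7)         # (b)

corpus_tokens = [["placeholder", "tokens"] * 10] * 10                    # dummy tokenized corpus
w2v = Word2Vec(sentences=corpus_tokens, sg=1, window=5, min_count=5,     # (c) skip-gram
               vector_size=200, workers=100, epochs=100)
d2v = Doc2Vec(dm=1, vector_size=200, window=8, workers=100, epochs=25)   # (d) PV-DM

rf = RandomForestClassifier(random_state=0)                              # (e)
svm = SVC(max_iter=15000, kernel="linear", gamma="auto")                 # (f)
mlp = MLPClassifier(solver="lbfgs", max_iter=50,                         # (g)
                    hidden_layer_sizes=(50, 50, 50), activation="relu")
```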
Method | Precision | Recall | F1-Score | Accuracy |
---|---|---|---|---|
BoW + RF | 91.29% | 90.53% | 90.87% | 91.35% |
TF-IDF + RF | 90.66% | 90.33% | 90.43% | 91.35% |
Word2Vec + RF | 90.57% | 88.34% | 89.32% | 91.14% |
Doc2Vec + RF | 92.67% | 91.41% | 91.96% | 92.39% |
WordNet + BoW + RF | 90.40% | 89.78% | 90.04% | 90.73% |
WordNet + TF-IDF + RF | 92.19% | 91.70% | 91.94% | 91.90% |
WordNet + Word2Vec + RF | 92.60% | 90.27% | 91.33% | 91.77% |
WordNet + Doc2Vec + RF | 92.55% | 92.79% | 92.62% | 93.01% |
Method | Precision | Recall | F1-Score | Accuracy |
---|---|---|---|---|
BoW + SVM | 89.36% | 90.42% | 89.70% | 90.45% |
TF-IDF + SVM | 89.95% | 90.09% | 89.91% | 91.00% |
Word2Vec + SVM | 86.96% | 87.36% | 87.07% | 88.03% |
Doc2Vec + SVM | 91.46% | 91.52% | 91.41% | 92.18% |
WordNet + BoW + SVM | 91.76% | 91.10% | 91.38% | 91.90% |
WordNet + TF-IDF + SVM | 90.04% | 90.94% | 90.37% | 90.66% |
WordNet + Word2Vec + SVM | 87.20% | 86.39% | 86.68% | 87.89% |
WordNet + Doc2Vec + SVM | 91.70% | 92.05% | 91.85% | 92.25% |
Method | Precision | Recall | F1-Score | Accuracy |
---|---|---|---|---|
BoW + MLP | 89.54% | 89.37% | 89.43% | 90.38% |
TF-IDF + MLP | 83.72% | 82.64% | 82.87% | 84.99% |
Word2Vec + MLP | 75.91% | 73.89% | 74.63% | 81.32% |
Doc2Vec + MLP | 90.69% | 89.48% | 90.00% | 91.07% |
WordNet + BoW + MLP | 90.35% | 90.78% | 90.51% | 91.07% |
WordNet + TF-IDF + MLP | 87.60% | 87.30% | 87.41% | 89.14% |
WordNet + Word2Vec + MLP | 79.41% | 75.75% | 77.03% | 81.88% |
WordNet + Doc2Vec + MLP | 91.84% | 90.73% | 91.21% | 91.63% |
Parameter | Parameter Value |
---|---|
Optimizer | Adam |
Learning Rate | 1e-5 |
Epsilon | 1e-8 |
Max Length | 256 |
Batch Size | 4 |
Epochs | 3–5 |
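A hedged sketch of how these fine-tuning hyperparameters map onto a Hugging Face Transformers setup is shown below. The checkpoint name (bert-base-uncased), the seven-label classification head, AdamW as the concrete Adam-style optimizer, and the single dummy training step are assumptions for illustration, not the authors' exact training code.

```python
# Illustrative fine-tuning setup for the reported hyperparameters (Hugging Face Transformers).
import torch
from transformers import BertTokenizer, BertForSequenceClassification

MAX_LENGTH = 256
BATCH_SIZE = 4
EPOCHS = 4            # the table reports 3-5 epochs
NUM_LABELS = 7        # seven news categories

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased",
                                                       num_labels=NUM_LABELS)
# Adam-style optimizer with learning rate 1e-5 and epsilon 1e-8, as in the table above.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5, eps=1e-8)

texts, labels = ["placeholder news text"], [0]          # dummy mini-batch
batch = tokenizer(texts, padding="max_length", truncation=True,
                  max_length=MAX_LENGTH, return_tensors="pt")
outputs = model(**batch, labels=torch.tensor(labels))   # forward pass with loss
outputs.loss.backward()                                 # one illustrative training step
optimizer.step()
```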
Method | Precision | Recall | F1-Score | Accuracy |
---|---|---|---|---|
BERT | 92.49% | 91.58% | 91.94% | 92.32% |
WordNet + BERT | 94.31% | 92.99% | 93.60% | 93.77% |
DistilBERT | 90.51% | 92.57% | 91.34% | 91.60% |
WordNet + DistilBERT | 92.71% | 92.47% | 92.56% | 92.50% |
Share and Cite
Yelmen, I.; Gunes, A.; Zontul, M. Multi-Class Document Classification Using Lexical Ontology-Based Deep Learning. Appl. Sci. 2023, 13, 6139. https://doi.org/10.3390/app13106139