This is an early access version; the complete PDF, HTML, and XML versions will be available soon.
Article

Comparative Analysis of Natural Language Processing Techniques in the Classification of Press Articles

Department of Microelectronics and Computer Science, Lodz University of Technology, ul. Wólczańska 221, 93-005 Łódź, Poland
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(17), 9559; https://doi.org/10.3390/app15179559 (registering DOI)
Submission received: 30 July 2025 / Revised: 27 August 2025 / Accepted: 28 August 2025 / Published: 30 August 2025
(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications—2nd Edition)

Abstract

The study undertook a comprehensive review and comparative analysis of natural language processing techniques for news article classification, with a particular focus on Java language libraries. The dataset comprised more than 200,000 news metadata records sourced from The Huffington Post. Both traditional algorithms based on mathematical statistics and deep learning models were evaluated. The libraries chosen for testing were Apache OpenNLP, Stanford CoreNLP, Waikato Weka, and the Hugging Face ecosystem with the PyTorch backend. The efficacy of the trained models in predicting specific topics was evaluated, and diverse methods for feature extraction and for the analysis of word-vector representations were explored. The study considered hardware resource management, implementation simplicity, training time, and the detection quality of the resulting models, and it examined a range of techniques for attribute selection, feature filtering, vector representation, and the handling of imbalanced datasets. Advanced techniques for word selection and named entity recognition were employed. The study compared different models and configurations in terms of their performance and the resources they consumed. Furthermore, it addressed the difficulties encountered when processing lengthy texts with transformer neural networks and presented potential solutions such as sequence truncation and segment analysis. The elevated computational cost inherent to Java-based languages may present challenges in machine learning tasks. The OpenNLP model achieved 84% accuracy, Weka and CoreNLP attained 86% and 88%, respectively, and DistilBERT emerged as the top performer, with an accuracy of 92%. Deep learning models demonstrated superior performance, training time, and ease of implementation compared with conventional statistical algorithms.
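The abstract names two remedies for texts that exceed a transformer's input limit (for example, 512 tokens for DistilBERT): sequence truncation and segment analysis. The sketch below contrasts them under stated assumptions; it is not the authors' implementation. The helpers `truncate`, `segment`, and `classify_long_text` are hypothetical, whitespace splitting stands in for a real subword tokenizer, and `toy_scores` is a hypothetical keyword-based scorer standing in for a trained classifier such as DistilBERT. Averaging per-segment class scores is one aggregation choice among several (max pooling or majority voting are common alternatives).

```python
from typing import Callable, List, Sequence

def truncate(tokens: Sequence[str], max_len: int) -> List[str]:
    """Sequence truncation: keep only the first max_len tokens, discarding the rest."""
    return list(tokens[:max_len])

def segment(tokens: Sequence[str], max_len: int, stride: int) -> List[List[str]]:
    """Segment analysis: split tokens into windows of at most max_len tokens,
    each window starting `stride` tokens after the previous one
    (stride < max_len gives overlapping windows)."""
    if max_len <= 0 or stride <= 0 or stride > max_len:
        raise ValueError("require 0 < stride <= max_len so no token is skipped")
    windows: List[List[str]] = []
    start = 0
    while True:
        windows.append(list(tokens[start:start + max_len]))
        if start + max_len >= len(tokens):  # last window reached the end of the text
            break
        start += stride
    return windows

def classify_long_text(
    tokens: Sequence[str],
    max_len: int,
    stride: int,
    score_fn: Callable[[List[str]], List[float]],
) -> List[float]:
    """Score every segment separately, then average the per-class scores."""
    scores = [score_fn(w) for w in segment(tokens, max_len, stride)]
    n_classes = len(scores[0])
    return [sum(s[c] for s in scores) / len(scores) for c in range(n_classes)]

# Hypothetical two-class scorer: share of keyword hits per class
# (index 0 = POLITICS, index 1 = SPORTS); a stand-in for a real model.
POLITICS = {"senate", "vote", "election"}
SPORTS = {"match", "goal", "league"}

def toy_scores(window: List[str]) -> List[float]:
    hits = [sum(t in POLITICS for t in window), sum(t in SPORTS for t in window)]
    total = sum(hits)
    return [h / total for h in hits] if total else [0.5, 0.5]

if __name__ == "__main__":
    text = "the senate vote ended before the league match and the final goal"
    tokens = text.split()
    # Truncation only sees the opening tokens, which mention politics.
    print("truncation:", toy_scores(truncate(tokens, max_len=4)))
    # Segment analysis sees the whole text and shifts the balance towards sports.
    print("segments:  ", classify_long_text(tokens, max_len=4, stride=4, score_fn=toy_scores))
```

In this toy run, truncation labels the text from its first four tokens alone, whereas segment analysis scores all three windows and the averaged scores favour the second class. The trade-off is cost: segment analysis requires one forward pass per window, which matters for the resource comparisons the study reports.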
Keywords: press article; natural language processing; news classification; artificial intelligence; machine learning; deep learning; transformer model; naive Bayes; DistilBERT

Share and Cite

MDPI and ACS Style

Piasta, K.; Kotas, R. Comparative Analysis of Natural Language Processing Techniques in the Classification of Press Articles. Appl. Sci. 2025, 15, 9559. https://doi.org/10.3390/app15179559

AMA Style

Piasta K, Kotas R. Comparative Analysis of Natural Language Processing Techniques in the Classification of Press Articles. Applied Sciences. 2025; 15(17):9559. https://doi.org/10.3390/app15179559

Chicago/Turabian Style

Piasta, Kacper, and Rafał Kotas. 2025. "Comparative Analysis of Natural Language Processing Techniques in the Classification of Press Articles" Applied Sciences 15, no. 17: 9559. https://doi.org/10.3390/app15179559

APA Style

Piasta, K., & Kotas, R. (2025). Comparative Analysis of Natural Language Processing Techniques in the Classification of Press Articles. Applied Sciences, 15(17), 9559. https://doi.org/10.3390/app15179559

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
