Natural Language Processing Applications in Big Data

Special Issue Editors


Guest Editor
Department of Computer Science, University of Sheffield, Sheffield S10 2TN, UK
Interests: natural language processing; machine learning; computational media analysis

Guest Editor
School of Information Science and Technology, Qingdao University of Science and Technology, Qingdao 260061, China
Interests: social computing; misinformation detection; text representation learning

Guest Editor
School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK
Interests: applying linguistic knowledge bases to fundamental NLP models, including LLMs; NLP/LLM applications in healthcare and digital wellbeing; neuro-cognitive NLP and its application in affective analysis and misinformation detection based on text and multimodal data

Special Issue Information

Dear Colleagues,

Recent developments in natural language processing (NLP), especially the application of large language models (LLMs), demonstrate a monumental shift in the ability of NLP to analyse big data. However, there is still a significant gap between theoretical advances in NLP and their practical real-world application. This Special Issue targets the practical application of NLP in different disciplines and examines how NLP enhances data analysis, decision making, and productivity across various sectors (such as finance, healthcare, and marketing) by automating and improving processes.

The aim of this Special Issue is to highlight the impact of NLP on data analysis across disciplines and address the critical challenges of big data, such as computational efficiency and cost, explainability, low-resource language applications, and sustainable development that meets the growing needs of industry.

Relevant topics for this Special Issue include, but are not limited to, the following areas:

  • Computational social science and cultural analytics;
  • Dialogue and interactive systems;
  • Efficient/low-resource methods for NLP;
  • Ethics, bias, and fairness;
  • Finance NLP;
  • Generation;
  • Healthcare NLP;
  • Information extraction;
  • Information retrieval and text mining;
  • Interpretability and analysis of models for NLP;
  • Legal NLP;
  • Linguistic theories, cognitive modelling, and psycholinguistics;
  • Machine learning for NLP;
  • Machine translation;
  • Multilinguality and language diversity;
  • Multimodality and language grounding to vision, robotics, and beyond;
  • Question answering;
  • Resources and evaluation;
  • Sentiment analysis, stylistic analysis, and argument mining;
  • Speech recognition, text-to-speech conversion, and spoken language understanding;
  • Summarization.

Dr. Xingyi Song
Dr. Ye Jiang
Dr. Yunfei Long
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Big Data and Cognitive Computing is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language processing
  • low-resource languages
  • NLP applications
  • interpretability
  • large language model
  • sentiment analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (1 paper)


Research

21 pages, 4729 KiB  
Article
Enhancing Hierarchical Classification in Tree-Based Models Using Level-Wise Entropy Adjustment
by Olga Narushynska, Anastasiya Doroshenko, Vasyl Teslyuk, Volodymyr Antoniv and Maksym Arzubov
Big Data Cogn. Comput. 2025, 9(3), 65; https://doi.org/10.3390/bdcc9030065 - 11 Mar 2025
Abstract
Hierarchical classification, which organizes items into structured categories and subcategories, has emerged as a powerful solution for handling large and complex datasets. However, traditional flat classification approaches often overlook the hierarchical dependencies between classes, leading to suboptimal predictions and limited interpretability. This paper addresses these challenges by proposing a novel integration of tree-based models with hierarchical-aware split criteria through adjusted entropy calculations. The proposed method calculates entropy at multiple hierarchical levels, ensuring that the model respects the taxonomic structure during training. This approach aligns statistical optimization with class semantic relationships, enabling more accurate and coherent predictions. Experiments conducted on real-world datasets structured according to the GS1 Global Product Classification (GPC) system demonstrate the effectiveness of our method. The proposed model was applied using tree-based ensemble methods combined with the newly developed hierarchy-aware metric Penalized Information Gain (PIG). PIG was implemented with level-wise entropy adjustments, assigning greater weight to higher hierarchical levels to maintain the taxonomic structure. The model was trained and evaluated on two real-world datasets based on the GS1 Global Product Classification (GPC) system. The final dataset included approximately 30,000 product descriptions spanning four hierarchical levels. An 80-20 train–test split was used, with model hyperparameters optimized through 5-fold cross-validation and Bayesian search. The experimental results showed a 12.7% improvement in classification accuracy at the lowest hierarchy level compared to traditional flat classification methods, with significant gains in datasets featuring highly imbalanced class distributions and deep hierarchies. The proposed approach also increased the F1 score by 12.6%. 
Despite these promising results, challenges remain in scaling the model for very large datasets and handling classes with limited training samples. Future research will focus on integrating neural networks with hierarchy-aware metrics, enhancing data augmentation to address class imbalance, and developing real-time classification systems for practical use in industries such as retail, logistics, and healthcare.
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
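
The abstract describes a split criterion that computes entropy at several hierarchy levels and gives more weight to the higher ones. The exact definition of Penalized Information Gain is not given on this page, so the following is only a minimal sketch of the general idea, not the authors' formula. The function names, the label encoding (a tuple giving the path from the top level down), and the weights are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Sequence, Tuple

# Hypothetical label encoding: a path through the hierarchy, top level first,
# e.g. ("Food", "Dairy").
Label = Tuple[str, ...]


def entropy(items: Sequence) -> float:
    """Shannon entropy (in bits) of the empirical distribution of items."""
    counts = Counter(items)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def level_wise_entropy(labels: Sequence[Label], weights: Sequence[float]) -> float:
    """Weighted sum of entropies, one term per hierarchy level.

    At depth d, the labels are truncated to their first d + 1 components.
    The weights are assumed values; larger weights on early levels make
    mistakes near the top of the hierarchy count for more.
    """
    return sum(
        w * entropy([label[: depth + 1] for label in labels])
        for depth, w in enumerate(weights)
    )


def hierarchical_gain(
    parent: Sequence[Label],
    left: Sequence[Label],
    right: Sequence[Label],
    weights: Sequence[float],
) -> float:
    """Information-gain-style score of a binary split using level-wise entropy."""
    n = len(parent)
    children = (len(left) / n) * level_wise_entropy(left, weights) + (
        len(right) / n
    ) * level_wise_entropy(right, weights)
    return level_wise_entropy(parent, weights) - children


# Toy data: four items on a two-level hierarchy (illustrative only).
labels = [("Food", "Dairy"), ("Food", "Dairy"), ("Food", "Bakery"), ("Toys", "Games")]
weights = (0.7, 0.3)  # assumed: top level weighted more heavily

# Split A separates the top-level classes; split B mixes them.
gain_a = hierarchical_gain(labels, labels[:3], labels[3:], weights)
gain_b = hierarchical_gain(labels, [labels[0], labels[3]], [labels[1], labels[2]], weights)
print(gain_a > gain_b)
```

With these assumed weights, a split that cleanly separates top-level categories scores higher than one that mixes them, which is the behaviour the abstract attributes to hierarchy-aware criteria.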