Recent Trends and Advances in the Natural Language Processing

A special issue of Mathematics (ISSN 2227-7390). This special issue belongs to the section "E1: Mathematics and Computer Science".

Deadline for manuscript submissions: closed (15 October 2024) | Viewed by 16514

Special Issue Editors


Guest Editor
Computer Science Department, Xi’an Jiaotong University, Xi’an 710049, China
Interests: natural language processing; text mining; semantic web

Guest Editor
Computer Science Department, School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
Interests: natural language processing; text mining; computer vision

Guest Editor
Computer Science Department, Xi’an Jiaotong University, Xi’an 710049, China
Interests: machine learning; statistical learning theory; information theory

Special Issue Information

Dear Colleagues,

Natural language processing (NLP) is one of the most exciting areas of artificial intelligence. The rapid growth of social media and digital articles creates significant challenges in analyzing vast amounts of user data to generate insights. Furthermore, interactive automation systems such as chatbots are still unable to fully replace humans because they lack an understanding of semantics and context. As unstructured data grow, NLP techniques are evolving to better capture the nuances, context, and ambiguity of human language, and novel technologies have been developed to meet the varied requirements of intelligent NLP systems. This Special Issue offers a timely collection of original contributions intended to benefit researchers and practitioners working on new trends and applications in natural language processing. It focuses on the use and exploration of new technologies (see the keywords below) for NLP-related tasks, including (but not limited to) information extraction, information retrieval, sentiment analysis, machine translation, text summarization, and dialogue systems.

Prof. Dr. Chen Li
Prof. Dr. Jun Liu
Dr. Tieliang Gong
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com after registering and logging in to the website. Once registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Mathematics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • language model
  • few-shot learning for NLP
  • reinforcement learning for NLP
  • lifelong learning for NLP
  • graph learning for NLP
  • multilingual NLP
  • multimodal NLP
  • intelligent applications with NLP

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the MDPI website.

Published Papers (7 papers)


Research


25 pages, 628 KiB  
Article
Knowledge-Enhanced Transformer Graph Summarization (KETGS): Integrating Entity and Discourse Relations for Advanced Extractive Text Summarization
by Aytuğ Onan and Hesham Alhumyani
Mathematics 2024, 12(23), 3638; https://doi.org/10.3390/math12233638 - 21 Nov 2024
Viewed by 1100
Abstract
The rapid proliferation of textual data across multiple sectors demands more sophisticated and efficient techniques for summarizing extensive texts. Extractive text summarization, which selects key sentences from a document, provides an essential method for handling extensive information. Conventional methods often fail to capture deep semantic links within texts, resulting in summaries that lack cohesion and depth; this paper therefore introduces a novel framework called Knowledge-Enhanced Transformer Graph Summarization (KETGS). Leveraging the strengths of both transformer models and Graph Neural Networks, KETGS develops a detailed graph representation of documents, embedding linguistic units from words to key entities. This structured graph is then navigated via a Transformer-Guided Graph Neural Network (TG-GNN), dynamically enhancing node features with structural connections and transformer-driven attention mechanisms. The framework adopts a Maximum Marginal Relevance (MMR) strategy for selecting sentences. Our evaluations show that KETGS outperforms other leading extractive summarization models, delivering summaries that are more relevant, cohesive, and concise, thus better preserving the essence and structure of the original texts.
(This article belongs to the Special Issue Recent Trends and Advances in the Natural Language Processing)
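
The framework described above selects sentences with a Maximum Marginal Relevance (MMR) criterion on top of its graph-derived sentence representations. As a point of reference, the following is a minimal sketch of MMR selection over precomputed sentence embeddings; the random embeddings, the `lambda_weight` trade-off, and `k` are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def cosine(a, b):
    # Cosine similarity between two 1-D vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def mmr_select(sentence_vecs, doc_vec, k=3, lambda_weight=0.7):
    """Greedy Maximum Marginal Relevance: pick k sentences that are relevant
    to the document while penalising redundancy with already chosen ones."""
    selected, candidates = [], list(range(len(sentence_vecs)))
    while candidates and len(selected) < k:
        def score(i):
            relevance = cosine(sentence_vecs[i], doc_vec)
            redundancy = max((cosine(sentence_vecs[i], sentence_vecs[j])
                              for j in selected), default=0.0)
            return lambda_weight * relevance - (1 - lambda_weight) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Toy usage with random vectors standing in for sentence embeddings; in KETGS
# these would come from the transformer-guided graph neural network instead.
rng = np.random.default_rng(0)
sents = rng.normal(size=(8, 16))
print(mmr_select(sents, doc_vec=sents.mean(axis=0), k=3))
```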

19 pages, 17342 KiB  
Article
TTG-Text: A Graph-Based Text Representation Framework Enhanced by Typical Testors for Improved Classification
by Carlos Sánchez-Antonio, José E. Valdez-Rodríguez and Hiram Calvo
Mathematics 2024, 12(22), 3576; https://doi.org/10.3390/math12223576 - 15 Nov 2024
Viewed by 865
Abstract
Recent advancements in graph-based text representation, particularly with embedding models and transformers such as BERT, have shown significant potential for enhancing natural language processing (NLP) tasks. However, challenges related to data sparsity and limited interpretability remain, especially when working with small or imbalanced datasets. This paper introduces TTG-Text, a novel framework that strengthens graph-based text representation by integrating typical testors, a symbolic feature selection technique that refines feature importance while reducing dimensionality. Unlike traditional TF-IDF weighting, TTG-Text leverages typical testors to enhance feature relevance within text graphs, resulting in improved model interpretability and performance, particularly for smaller datasets. Our evaluation on a text classification task using a graph convolutional network (GCN) demonstrates that TTG-Text achieves a 95% accuracy rate, surpassing conventional methods and BERT with fewer required training epochs. By combining symbolic algorithms with graph-based models, this hybrid approach offers a more interpretable, efficient, and high-performing solution for complex NLP tasks.
(This article belongs to the Special Issue Recent Trends and Advances in the Natural Language Processing)
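
TTG-Text replaces TF-IDF weighting with typical testors, i.e. irreducible feature subsets that still distinguish every pair of objects from different classes in a Boolean comparison matrix. The brute-force sketch below only illustrates that definition on a tiny matrix; how the comparison matrix is built from text and how the resulting testors weight nodes in the text graph follow the paper and are not reproduced here.

```python
from itertools import combinations
import numpy as np

def is_testor(bin_matrix, cols):
    # cols is a testor if, restricted to those columns, every row of the
    # Boolean comparison matrix still contains a 1, i.e. the selected
    # features still distinguish every pair of objects from different classes.
    if not cols:
        return False
    return bool(np.all(bin_matrix[:, list(cols)].any(axis=1)))

def typical_testors(bin_matrix):
    """Brute-force enumeration of typical (irreducible) testors.
    Exponential in the number of features; meant only to illustrate the
    definition on a tiny matrix, not the algorithm used in TTG-Text."""
    n_features = bin_matrix.shape[1]
    found = []
    for size in range(1, n_features + 1):
        for cols in combinations(range(n_features), size):
            if not is_testor(bin_matrix, cols):
                continue
            # Typical: removing any single column breaks the testor property.
            if all(not is_testor(bin_matrix, tuple(c for c in cols if c != r))
                   for r in cols):
                found.append(cols)
    return found

# Toy comparison matrix: rows compare pairs of objects from different classes,
# columns are features; a 1 means the feature tells that pair apart.
bm = np.array([[1, 0, 1, 0],
               [0, 1, 1, 0],
               [1, 1, 0, 1]])
print(typical_testors(bm))  # [(0, 1), (0, 2), (1, 2), (2, 3)]
```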

16 pages, 1046 KiB  
Article
Unified Training for Cross-Lingual Abstractive Summarization by Aligning Parallel Machine Translation Pairs
by Shaohuan Cheng, Wenyu Chen, Yujia Tang, Mingsheng Fu and Hong Qu
Mathematics 2024, 12(13), 2107; https://doi.org/10.3390/math12132107 - 4 Jul 2024
Cited by 1 | Viewed by 1063
Abstract
Cross-lingual summarization (CLS) is essential for enhancing global communication by facilitating efficient information exchange across different languages. However, owing to the scarcity of CLS data, recent studies have employed multi-task frameworks to combine parallel monolingual summaries. These methods often use independent decoders or models with non-shared parameters because of the mismatch in output languages, which limits the transfer of knowledge between CLS and its parallel data. To address this issue, we propose a unified training method for CLS that combines parallel machine translation (MT) pairs with CLS pairs, jointly training them within a single model. This design ensures consistent input and output languages and promotes knowledge sharing between the two tasks. To further enhance the model’s capability to focus on key information, we introduce two additional loss terms to align the hidden representations and probability distributions between the parallel MT and CLS pairs. Experimental results demonstrate that our method outperforms competitive methods in both full-dataset and low-resource scenarios on two benchmark datasets, Zh2EnSum and En2ZhSum.
(This article belongs to the Special Issue Recent Trends and Advances in the Natural Language Processing)
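
The two auxiliary loss terms mentioned in the abstract align hidden representations and output distributions between a CLS pair and its parallel MT pair. Below is a hedged PyTorch sketch of one plausible formulation (mean-pooled MSE for the hidden states, KL divergence for the distributions); the pooling, distance functions, and weighting coefficients are assumptions rather than the authors' exact losses.

```python
import torch
import torch.nn.functional as F

def alignment_losses(cls_hidden, mt_hidden, cls_logits, mt_logits,
                     alpha=1.0, beta=1.0):
    """Auxiliary terms of the kind described in the abstract:
    - hidden-representation alignment between a CLS pair and its parallel MT
      pair (mean-pooled states, mean-squared error);
    - distribution alignment between the two decoders' outputs (KL divergence
      over the shared target-language vocabulary).
    Shapes: hidden states (batch, seq_len, dim); logits (batch, seq_len, vocab).
    """
    rep_cls = cls_hidden.mean(dim=1)                 # (batch, dim)
    rep_mt = mt_hidden.mean(dim=1)                   # (batch, dim)
    hidden_loss = F.mse_loss(rep_cls, rep_mt)

    log_p_cls = F.log_softmax(cls_logits, dim=-1)
    p_mt = F.softmax(mt_logits, dim=-1)
    dist_loss = F.kl_div(log_p_cls, p_mt, reduction="batchmean")

    return alpha * hidden_loss + beta * dist_loss

# Toy usage with random tensors standing in for encoder/decoder outputs.
b, t, d, v = 2, 5, 16, 100
loss = alignment_losses(torch.randn(b, t, d), torch.randn(b, t, d),
                        torch.randn(b, t, v), torch.randn(b, t, v))
print(loss.item())
```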

26 pages, 2339 KiB  
Article
Switching Self-Attention Text Classification Model with Innovative Reverse Positional Encoding for Right-to-Left Languages: A Focus on Arabic Dialects
by Laith H. Baniata and Sangwoo Kang
Mathematics 2024, 12(6), 865; https://doi.org/10.3390/math12060865 - 15 Mar 2024
Cited by 4 | Viewed by 1981
Abstract
Transformer models have emerged as frontrunners in the field of natural language processing, primarily due to their adept use of self-attention mechanisms to grasp the semantic linkages between words in sequences. Despite their strengths, these models often face challenges in single-task learning scenarios, particularly when it comes to delivering top-notch performance and crafting strong latent feature representations. This challenge is more pronounced in the context of smaller datasets and is particularly acute for under-resourced languages such as Arabic. In light of these challenges, this study introduces a novel methodology for text classification of Arabic texts. This method harnesses the newly developed Reverse Positional Encoding (RPE) technique. It adopts an inductive-transfer learning (ITL) framework combined with a switching self-attention shared encoder, thereby increasing the model’s adaptability and improving its sentence representation accuracy. The integration of Mixture of Experts (MoE) and RPE techniques empowers the model to process longer sequences more effectively. This enhancement is notably beneficial for Arabic text classification, adeptly supporting both the intricate five-point and the simpler ternary classification tasks. The empirical evidence points to its outstanding performance, achieving accuracy rates of 87.20% on the HARD dataset, 72.17% on the BRAD dataset, and 86.89% on the LABR dataset.
(This article belongs to the Special Issue Recent Trends and Advances in the Natural Language Processing)
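
The abstract's Reverse Positional Encoding (RPE) suggests indexing positions from the end of the sequence so that tokens are numbered in right-to-left reading order. The sketch below implements that reading with a standard sinusoidal encoding; it is an assumption about the construction, not the authors' code.

```python
import numpy as np

def sinusoidal_pe(positions, d_model):
    # Standard sinusoidal positional encoding evaluated at arbitrary positions.
    pe = np.zeros((len(positions), d_model))
    div = np.exp(np.arange(0, d_model, 2) * (-np.log(10000.0) / d_model))
    pe[:, 0::2] = np.sin(positions[:, None] * div)
    pe[:, 1::2] = np.cos(positions[:, None] * div)
    return pe

def reverse_positional_encoding(seq_len, d_model):
    """Positions counted from the end of the sequence (seq_len-1, ..., 0), so
    the right-most token, read first in a right-to-left language, gets index 0."""
    positions = np.arange(seq_len - 1, -1, -1, dtype=np.float64)
    return sinusoidal_pe(positions, d_model)

# The last row (right-most token) matches the standard encoding of position 0.
pe = reverse_positional_encoding(seq_len=6, d_model=8)
print(pe.shape)                                                    # (6, 8)
print(np.allclose(pe[-1], sinusoidal_pe(np.array([0.0]), 8)[0]))   # True
```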

20 pages, 897 KiB  
Article
Finite State Automata on Multi-Word Units for Efficient Text-Mining
by Alberto Postiglione
Mathematics 2024, 12(4), 506; https://doi.org/10.3390/math12040506 - 6 Feb 2024
Cited by 4 | Viewed by 2040
Abstract
Text mining is crucial for analyzing unstructured and semi-structured textual documents. This paper introduces a fast and precise text mining method based on a finite automaton to extract knowledge domains. Unlike simple words, multi-word units (such as credit card) are emphasized for their efficiency in identifying specific semantic areas due to their predominantly monosemic nature, their limited number and their distinctiveness. The method focuses on identifying multi-word units within terminological ontologies, where each multi-word unit is associated with a sub-domain of ontology knowledge. The algorithm, designed to handle the challenges posed by very long multi-word units composed of a variable number of simple words, integrates user-selected ontologies into a single finite automaton during a fast pre-processing step. At runtime, the automaton reads input text character by character, efficiently locating multi-word units even if they overlap. This approach is efficient for both short and long documents, requiring no prior training. Ontologies can be updated without additional computational costs. An early system prototype, tested on 100 short and medium-length documents, recognized the knowledge domains for the vast majority of texts (over 90%) analyzed. The authors suggest that this method could be a valuable semantic-based knowledge domain extraction technique in unstructured documents.
(This article belongs to the Special Issue Recent Trends and Advances in the Natural Language Processing)
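
The method compiles the multi-word units of the selected ontologies into a single automaton and then reads the input text character by character, reporting matches (including overlapping ones) together with their knowledge domains. The sketch below shows the idea with an Aho-Corasick-style trie with failure links; the toy ontology entries are invented, and the paper's actual automaton construction may differ.

```python
from collections import deque

class MultiWordMatcher:
    """Aho-Corasick-style automaton over characters: all multi-word units are
    compiled into one trie with failure links, then the input text is read
    character by character, reporting every (possibly overlapping) match
    together with its knowledge domain."""

    def __init__(self, units_by_domain):
        self.goto = [{}]    # per-state character transitions
        self.fail = [0]     # failure links
        self.out = [[]]     # (unit, domain) outputs per state
        for domain, units in units_by_domain.items():
            for unit in units:
                self._add(unit.lower(), domain)
        self._build_failure_links()

    def _add(self, unit, domain):
        state = 0
        for ch in unit:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append([])
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state].append((unit, domain))

    def _build_failure_links(self):
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                self.out[nxt] += self.out[self.fail[nxt]]

    def scan(self, text):
        state, hits = 0, []
        for i, ch in enumerate(text.lower()):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for unit, domain in self.out[state]:
                hits.append((i - len(unit) + 1, unit, domain))
        return hits

# Toy "ontologies" mapping multi-word units to knowledge domains.
matcher = MultiWordMatcher({
    "finance": ["credit card", "interest rate"],
    "medicine": ["blood pressure"],
})
print(matcher.scan("The credit card interest rate affects blood pressure."))
```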

20 pages, 1045 KiB  
Article
Transformer Text Classification Model for Arabic Dialects That Utilizes Inductive Transfer
by Laith H. Baniata and Sangwoo Kang
Mathematics 2023, 11(24), 4960; https://doi.org/10.3390/math11244960 - 14 Dec 2023
Cited by 9 | Viewed by 2502
Abstract
In the realm of the five-category classification endeavor, there has been limited exploration of applied techniques for classifying Arabic text. These methods have primarily leaned on single-task learning, incorporating manually crafted features that lack robust sentence representations. Recently, the Transformer paradigm has emerged as a highly promising alternative. However, when these models are trained using single-task learning, they often face challenges in achieving outstanding performance and generating robust latent feature representations, especially when dealing with small datasets. This issue is particularly pronounced in the context of the Arabic dialect, which has a scarcity of available resources. Given these constraints, this study introduces an innovative approach to dissecting sentiment in Arabic text. This approach combines Inductive Transfer (INT) with the Transformer paradigm to augment the adaptability of the model and refine the representation of sentences. By employing self-attention (SE-A) and feed-forward sub-layers as a shared Transformer encoder for both the five-category and three-category Arabic text classification tasks, this proposed model adeptly discerns sentiment in Arabic dialect sentences. The empirical findings underscore the commendable performance of the proposed model, as demonstrated in assessments of the Hotel Arabic-Reviews Dataset, the Book Reviews Arabic Dataset, and the LABR dataset.
(This article belongs to the Special Issue Recent Trends and Advances in the Natural Language Processing)
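
The model described above shares one Transformer encoder between the five-category and three-category classification tasks, each with its own output head. A minimal PyTorch sketch of that multi-task arrangement follows; the layer sizes, mean pooling, and the omission of positional encoding and task-switching details are simplifying assumptions rather than the authors' configuration.

```python
import torch
import torch.nn as nn

class SharedEncoderClassifier(nn.Module):
    """One Transformer encoder shared by two tasks (five-class and three-class
    sentiment), each with its own classification head, in the spirit of
    inductive-transfer / multi-task training. Positional encoding and the
    paper's task-switching details are omitted for brevity."""

    def __init__(self, vocab_size=30000, d_model=128, n_heads=4, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads,
                                           dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head_5way = nn.Linear(d_model, 5)   # five-point task
        self.head_3way = nn.Linear(d_model, 3)   # ternary task

    def forward(self, token_ids, task):
        h = self.encoder(self.embed(token_ids))  # (batch, seq, d_model)
        pooled = h.mean(dim=1)                   # simple mean pooling
        return self.head_5way(pooled) if task == "5way" else self.head_3way(pooled)

# Toy usage: batches from either task pass through the same shared encoder.
model = SharedEncoderClassifier()
batch = torch.randint(0, 30000, (4, 20))
print(model(batch, task="5way").shape)  # torch.Size([4, 5])
print(model(batch, task="3way").shape)  # torch.Size([4, 3])
```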

Review


32 pages, 730 KiB  
Review
Event-Centric Temporal Knowledge Graph Construction: A Survey
by Timotej Knez and Slavko Žitnik
Mathematics 2023, 11(23), 4852; https://doi.org/10.3390/math11234852 - 2 Dec 2023
Cited by 7 | Viewed by 5697
Abstract
Textual documents serve as representations of discussions on a variety of subjects. These discussions can vary in length and may encompass a range of events or factual information. Present trends in constructing knowledge bases primarily emphasize fact-based common-sense reasoning, often overlooking the temporal dimension of events. Given the widespread presence of time-related information, addressing this temporal aspect could potentially enhance the quality of common-sense reasoning within existing knowledge graphs. In this comprehensive survey, we aim to identify and evaluate the key tasks involved in constructing temporal knowledge graphs centered around events. These tasks can be categorized into three main components: (a) event extraction, (b) the extraction of temporal relationships and attributes, and (c) the creation of event-based knowledge graphs and timelines. Our systematic review focuses on the examination of available datasets and language technologies for addressing these tasks. An in-depth comparison of various approaches reveals that the most promising results are achieved by employing state-of-the-art models leveraging large pre-trained language models. Despite the existence of multiple datasets, a noticeable gap exists in the availability of annotated data that could facilitate the development of comprehensive end-to-end models. Drawing insights from our findings, we engage in a discussion and propose four future directions for research in this domain. These directions encompass (a) the integration of pre-existing knowledge, (b) the development of end-to-end systems for constructing event-centric knowledge graphs, (c) the enhancement of knowledge graphs with event-centric information, and (d) the prediction of absolute temporal attributes.
(This article belongs to the Special Issue Recent Trends and Advances in the Natural Language Processing)
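
The survey organizes construction into (a) event extraction, (b) temporal relation and attribute extraction, and (c) assembly of event-centric graphs and timelines. The small data-structure sketch below shows one way the target representation of step (c) could be modelled; all field names are illustrative assumptions, not a schema from the survey.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Event:
    """A node in an event-centric temporal knowledge graph (illustrative fields)."""
    event_id: str
    trigger: str                                             # trigger text, e.g. "signed"
    arguments: Dict[str, str] = field(default_factory=dict)  # role -> entity
    start: Optional[str] = None                              # absolute temporal attributes, if known
    end: Optional[str] = None

@dataclass
class TemporalRelation:
    """A typed temporal edge between two events, e.g. BEFORE, AFTER, OVERLAPS."""
    source: str
    target: str
    relation: str

# A two-event toy graph of the kind an end-to-end pipeline
# (event extraction -> temporal linking -> graph assembly) would produce.
events = [
    Event("e1", "signed", {"agent": "Company A", "theme": "contract"}, start="2023-05-01"),
    Event("e2", "delivered", {"agent": "Company A", "theme": "goods"}),
]
relations = [TemporalRelation("e1", "e2", "BEFORE")]
print(events[0], relations[0], sep="\n")
```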
