Search Results (11)

Search Parameters:
Keywords = named entity disambiguation

16 pages, 12177 KiB  
Article
An Advanced Natural Language Processing Framework for Arabic Named Entity Recognition: A Novel Approach to Handling Morphological Richness and Nested Entities
by Saleh Albahli
Appl. Sci. 2025, 15(6), 3073; https://doi.org/10.3390/app15063073 - 12 Mar 2025
Cited by 1 | Viewed by 1024
Abstract
Named Entity Recognition (NER) is a fundamental task in Natural Language Processing (NLP) that supports applications such as information retrieval, sentiment analysis, and text summarization. While substantial progress has been made in NER for widely studied languages like English, Arabic presents unique challenges due to its morphological richness, orthographic ambiguity, and the frequent occurrence of nested and overlapping entities. This paper introduces a novel Arabic NER framework that addresses these complexities through architectural innovations. The proposed model incorporates a Hybrid Feature Fusion Layer, which integrates external lexical features using a cross-attention mechanism and a Gated Lexical Unit (GLU) to filter noise, while a Compound Span Representation Layer employs Rotary Positional Encoding (RoPE) and Bidirectional GRUs to enhance the detection of complex entity structures. Additionally, an Enhanced Multi-Label Classification Layer improves the disambiguation of overlapping spans and assigns multiple entity types where applicable. The model is evaluated on three benchmark datasets—ANERcorp, ACE 2005, and a custom biomedical dataset—achieving an F1-score of 93.0% on ANERcorp and 89.6% on ACE 2005, significantly outperforming state-of-the-art methods. A case study further highlights the model's real-world applicability in handling compound and nested entities with high confidence. By establishing a new benchmark for Arabic NER, this work provides a robust foundation for advancing NLP research in morphologically rich languages.
(This article belongs to the Special Issue Techniques and Applications of Natural Language Processing)
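
To make the layered design concrete, the following is a minimal PyTorch sketch of the fusion-then-span-classification flow described above. The module names, dimensions, and the omission of RoPE are our illustrative assumptions, not the authors' implementation.

```python
# A minimal sketch of the layered design the abstract describes: cross-attention
# fusion with a gate, BiGRU span encoding, and multi-label span classification.
# All dimensions and module names are illustrative assumptions; RoPE is omitted.
import torch
import torch.nn as nn

class HybridFeatureFusion(nn.Module):
    """Fuses token states with external lexical features, then gates the result."""
    def __init__(self, dim: int):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        self.gate = nn.Linear(2 * dim, dim)  # gated lexical unit (GLU-style)

    def forward(self, tokens, lexical):
        attended, _ = self.cross_attn(tokens, lexical, lexical)
        g = torch.sigmoid(self.gate(torch.cat([tokens, attended], dim=-1)))
        return tokens + g * attended  # the gate filters noisy lexical signal

class SpanClassifier(nn.Module):
    """BiGRU span encoder followed by a multi-label (sigmoid) entity classifier."""
    def __init__(self, dim: int, num_types: int):
        super().__init__()
        self.fusion = HybridFeatureFusion(dim)
        self.bigru = nn.GRU(dim, dim // 2, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * dim, num_types)

    def forward(self, tokens, lexical, spans):
        h, _ = self.bigru(self.fusion(tokens, lexical))
        # represent each (start, end) span by its boundary states
        reps = torch.stack([torch.cat([h[0, s], h[0, e]]) for s, e in spans])
        return torch.sigmoid(self.head(reps))  # multi-label: nested spans may overlap

model = SpanClassifier(dim=64, num_types=4)
tokens, lexical = torch.randn(1, 12, 64), torch.randn(1, 12, 64)
print(model(tokens, lexical, spans=[(0, 2), (0, 5)]).shape)  # torch.Size([2, 4])
```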

9 pages, 1925 KiB  
Proceeding Paper
A New Approach for Carrying Out Sentiment Analysis of Social Media Comments Using Natural Language Processing
by Mritunjay Ranjan, Sanjay Tiwari, Arif Md Sattar and Nisha S. Tatkar
Eng. Proc. 2023, 59(1), 181; https://doi.org/10.3390/engproc2023059181 - 17 Jan 2024
Cited by 5 | Viewed by 6465
Abstract
Business and science are using sentiment analysis to extract and assess subjective information from the web, social media, and other sources using NLP, computational linguistics, text analysis, image processing, audio processing, and video processing. It models polarity, attitudes, and urgency from positive, negative, or neutral inputs. Unstructured data make emotion assessment difficult. Unstructured consumer data allow businesses to market, engage, and connect with consumers on social media. Text data are instantly assessed for user sentiment. Opinion mining identifies a text's positive, negative, or neutral opinions, attitudes, views, emotions, and sentiments. Text analytics uses machine learning to evaluate "unstructured" natural language text data. These data can help firms make money and decisions. Sentiment analysis shows how individuals feel about things, services, organizations, people, events, themes, and qualities. It is used in reviews, forums, blogs, social media, and other articles. Data-driven (DD) methods learn complex semantic representations of texts without feature engineering. Data-driven sentiment analysis is three-tiered: document-level analysis determines the polarity and sentiment of a whole document, aspect-based analysis assesses document segments for emotion and polarity, and word-level analysis recognizes word polarity and labels sentiments as positive, negative, or neutral. Our method captures sentiments from text comments. The syntactic layer encompasses sentence-level normalisation, identification of ambiguities at paragraph boundaries, part-of-speech (POS) tagging, text chunking, and lemmatization. Pragmatics covers personality recognition, sarcasm detection, metaphor comprehension, aspect extraction, and polarity detection; semantics covers word sense disambiguation, concept extraction, named entity recognition, anaphora resolution, and subjectivity detection.
(This article belongs to the Proceedings of Eng. Proc., 2023, RAiSE-2023)
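
As a rough illustration of the syntactic layer enumerated above (normalisation, POS tagging, chunking, lemmatization), here is a sketch built on NLTK. The library choice, the noun-phrase grammar, and the resource names (valid for NLTK 3.8 and earlier) are our assumptions, not the paper's pipeline.

```python
# A minimal sketch of the syntactic layer, assuming NLTK as the toolkit.
import nltk
from nltk.stem import WordNetLemmatizer

# one-time resource downloads (NLTK <= 3.8 resource names assumed)
for pkg in ("punkt", "averaged_perceptron_tagger", "wordnet"):
    nltk.download(pkg, quiet=True)

chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")  # shallow noun-phrase chunks
lemmatizer = WordNetLemmatizer()

def syntactic_layer(comment: str):
    results = []
    for sent in nltk.sent_tokenize(comment.strip()):    # sentence-level normalisation
        tokens = nltk.word_tokenize(sent)
        tagged = nltk.pos_tag(tokens)                   # POS tagging
        results.append({
            "pos": tagged,
            "chunks": chunker.parse(tagged),            # text chunking
            "lemmas": [lemmatizer.lemmatize(t.lower()) for t in tokens],
        })
    return results

print(syntactic_layer("The new phones are amazing. Battery life disappointed me.")[0]["lemmas"])
```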

20 pages, 676 KiB  
Article
Enhancement of Question Answering System Accuracy via Transfer Learning and BERT
by Kai Duan, Shiyu Du, Yiming Zhang, Yanru Lin, Hongzhuo Wu and Quan Zhang
Appl. Sci. 2022, 12(22), 11522; https://doi.org/10.3390/app122211522 - 13 Nov 2022
Cited by 5 | Viewed by 3563
Abstract
Entity linking and predicate matching are two core tasks in Chinese Knowledge Base Question Answering (CKBQA). Compared with English entity linking, Chinese entity linking is considerably more complicated, making accurate linking difficult. Meanwhile, strengthening the correlation between entities and predicates is key to the accuracy of a question answering system. We therefore put forward BAT-KBQA, a Knowledge Base Question Answering framework based on feature-enhanced Bidirectional Encoder Representations from Transformers (BERT) and transfer learning. It first performs a Named Entity Recognition (NER) task suited to Chinese datasets, using transfer learning and a Bidirectional Long Short-Term Memory-Conditional Random Field (BiLSTM-CRF) model. We utilize a BERT-CNN (Convolutional Neural Network) model for disambiguation between the question and the candidate entities; based on the resulting set of entities and predicates, a BERT-Softmax model with answer-entity predicate features is introduced for predicate matching. The final answer is determined by integrating the entity and predicate scores. Experimental results indicate that our model considerably enhances the overall performance of Knowledge Base Question Answering (KBQA) and has the potential to generalize. On the dataset supplied by the NLPCC-ICCPOL2016 KBQA task, it achieves a mean F1 score of 87.74%, outperforming BB-KBQA.
(This article belongs to the Topic Recent Advances in Data Mining)
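
The final answer-selection step can be pictured as a simple score fusion. The linear weighting below is our assumption; the paper states only that entity and predicate scores are integrated.

```python
# A toy sketch of the answer-selection step: entity-linking and
# predicate-matching scores are combined per candidate answer. The linear
# weighting (alpha) is our illustrative assumption.
def select_answer(candidates, alpha: float = 0.5):
    """candidates: list of (answer, entity_score, predicate_score) triples."""
    scored = [(ans, alpha * es + (1 - alpha) * ps) for ans, es, ps in candidates]
    return max(scored, key=lambda pair: pair[1])

candidates = [
    ("Beijing", 0.92, 0.71),            # strong entity link, weaker predicate match
    ("Peking University", 0.64, 0.88),  # the reverse
]
print(select_answer(candidates))  # ('Beijing', ~0.815)
```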

12 pages, 1074 KiB  
Article
Entity Linking Method for Chinese Short Text Based on Siamese-Like Network
by Yang Zhang, Jin Liu, Bo Huang and Bei Chen
Information 2022, 13(8), 397; https://doi.org/10.3390/info13080397 - 22 Aug 2022
Cited by 6 | Viewed by 2693
Abstract
Entity linking plays a fundamental role in knowledge engineering and data mining and is the basis of various downstream applications such as content analysis, relationship extraction, and question answering. Most existing entity linking models rely on sufficient context for disambiguation and do not work well for concise, sparse short texts. In addition, most methods use pre-trained models to directly calculate the similarity between the entity text to be disambiguated and the candidate entity text, without digging deeper into the relationship between them. This article proposes an entity linking method for Chinese short texts based on Siamese-like networks to address these shortcomings. In the entity disambiguation task, the Siamese-like network deeply parses the semantic relationships in the text and makes full use of the feature information of the entity text to be disambiguated, capturing interdependent features within sentences through an attention mechanism in order to identify the most critical elements of the entity description. Experiments on the CCKS2019 dataset show that the method reaches an F1 value of 87.29%, an increase of 11.02% over the baseline method, validating the superiority of the model.
(This article belongs to the Special Issue Intelligence Computing and Systems)
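
The following compact PyTorch sketch shows the general shape of a Siamese-like matcher with attention pooling: a weight-shared encoder scores the mention context against a candidate description. Dimensions and the cosine scoring head are illustrative assumptions, not the paper's exact architecture.

```python
# A compact sketch of a Siamese-like matcher for entity disambiguation, under
# our own assumptions about dimensions and the scoring head.
import torch
import torch.nn as nn

class SiameseMatcher(nn.Module):
    def __init__(self, vocab: int = 5000, dim: int = 64):
        super().__init__()
        self.embed = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim // 2, bidirectional=True, batch_first=True)
        self.attn = nn.Linear(dim, 1)  # attention weights over tokens

    def encode(self, ids):
        h, _ = self.encoder(self.embed(ids))              # (B, T, dim)
        w = torch.softmax(self.attn(h), dim=1)            # focus on key tokens
        return (w * h).sum(dim=1)                         # attention-pooled vector

    def forward(self, mention_ids, candidate_ids):
        # shared weights: both sides pass through the same encoder
        m, c = self.encode(mention_ids), self.encode(candidate_ids)
        return torch.cosine_similarity(m, c)              # higher = better match

model = SiameseMatcher()
mention = torch.randint(0, 5000, (1, 20))    # tokenized short text around the mention
candidate = torch.randint(0, 5000, (1, 40))  # tokenized candidate entity description
print(model(mention, candidate))             # similarity score in [-1, 1]
```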

15 pages, 1209 KiB  
Article
Computationally Efficient Context-Free Named Entity Disambiguation with Wikipedia
by Michael Angelos Simos and Christos Makris
Information 2022, 13(8), 367; https://doi.org/10.3390/info13080367 - 2 Aug 2022
Cited by 4 | Viewed by 3059
Abstract
The induction of the semantics of unstructured text corpora is a crucial task for modern natural language processing and artificial intelligence applications. The Named Entity Disambiguation task comprises the extraction of Named Entities and their linking to an appropriate representation from a concept ontology based on the available information. This work introduces novel methodologies, leveraging domain knowledge extraction from Wikipedia in a simple yet highly effective approach. In addition, we introduce a fuzzy logic model with a strong focus on computational efficiency. We also present a new measure, decisive in both methods for the entity linking selection and the quantification of the confidence of the produced entity links, namely the relative commonness measure. The experimental results of our approach on established datasets revealed state-of-the-art accuracy and run-time performance in the domain of fast, context-free Wikification, by relying on an offline pre-processing stage on the corpus of Wikipedia. The methods introduced can be leveraged as stand-alone NED methodologies, propitious for applications on mobile devices, or in the context of vastly reducing the complexity of deep neural network approaches as a first context-free layer.
(This article belongs to the Special Issue Knowledge Management and Digital Humanities)
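
A toy example of commonness statistics derived from Wikipedia anchor counts follows. The relative commonness measure is the paper's contribution; the ratio-to-runner-up reading below is one plausible interpretation under our own assumptions, with made-up counts.

```python
# Commonness from Wikipedia anchor statistics, with invented counts. The
# relative_commonness function is our plausible reading, not the paper's
# exact definition.
anchor_counts = {  # mention -> {candidate article: times the anchor links there}
    "java": {"Java_(programming_language)": 8200, "Java_(island)": 1900, "Java_coffee": 400},
}

def commonness(mention: str, entity: str) -> float:
    counts = anchor_counts[mention]
    return counts[entity] / sum(counts.values())

def relative_commonness(mention: str) -> tuple[str, float]:
    ranked = sorted(anchor_counts[mention].items(), key=lambda kv: kv[1], reverse=True)
    top, second = ranked[0], ranked[1]
    return top[0], top[1] / second[1]  # confidence of the context-free link

print(commonness("java", "Java_(programming_language)"))  # ~0.78
print(relative_commonness("java"))  # ('Java_(programming_language)', ~4.3)
```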

25 pages, 1776 KiB  
Article
Linking Entities from Text to Hundreds of RDF Datasets for Enabling Large Scale Entity Enrichment
by Michalis Mountantonakis and Yannis Tzitzikas
Knowledge 2022, 2(1), 1-25; https://doi.org/10.3390/knowledge2010001 - 24 Dec 2021
Viewed by 4076
Abstract
There is a rapid increase in approaches that take a text as input and perform named entity recognition (or extraction) to link the recognized entities to RDF Knowledge Bases (or datasets). In this way, it is feasible to retrieve more information for these entities, which can be of primary importance for several tasks, e.g., facilitating manual annotation, hyperlink creation, content enrichment, and improving data veracity. However, current approaches link the extracted entities to only one or a few knowledge bases; it is therefore not feasible to retrieve the URIs and facts of each recognized entity from multiple datasets, or to discover the most relevant datasets for one or more extracted entities. To enable this functionality, we introduce a research prototype, called LODsyndesisIE, which exploits three widely used Named Entity Recognition and Disambiguation tools (DBpedia Spotlight, WAT, and Stanford CoreNLP) for recognizing the entities of a given text. Afterwards, it links these entities to the LODsyndesis knowledge base, which offers data enrichment and discovery services for millions of entities over hundreds of RDF datasets. We describe all the steps of LODsyndesisIE and explain how to exploit its services through its online application and its REST API. For the evaluation, we use three collections of texts: (i) to compare the effectiveness of combining different Named Entity Recognition tools, (ii) to measure the gain in enrichment from linking the extracted entities to LODsyndesis instead of a single or a few RDF datasets, and (iii) to evaluate the efficiency of LODsyndesisIE.
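
The tool-combination idea can be sketched as a simple majority vote over the entities proposed by several NER systems, as below. The three wrapper functions are hypothetical placeholders standing in for calls to DBpedia Spotlight, WAT, and Stanford CoreNLP; their real APIs differ.

```python
# A minimal sketch of merging entities from several NER systems by majority
# vote. The wrappers are hypothetical stand-ins, not the tools' real APIs.
from collections import Counter

def spotlight_entities(text):  # hypothetical wrapper for DBpedia Spotlight
    return {"Aristotle", "Stagira"}

def wat_entities(text):        # hypothetical wrapper for WAT
    return {"Aristotle"}

def corenlp_entities(text):    # hypothetical wrapper for Stanford CoreNLP
    return {"Aristotle", "Stagira", "Plato"}

def recognize(text, min_votes: int = 2):
    votes = Counter()
    for tool in (spotlight_entities, wat_entities, corenlp_entities):
        votes.update(tool(text))
    return {e for e, v in votes.items() if v >= min_votes}

print(recognize("Aristotle was born in Stagira and studied under Plato."))
# {'Aristotle', 'Stagira'}
```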

13 pages, 1632 KiB  
Article
Improving the Performance of Vietnamese–Korean Neural Machine Translation with Contextual Embedding
by Van-Hai Vu, Quang-Phuoc Nguyen, Ebipatei Victoria Tunyan and Cheol-Young Ock
Appl. Sci. 2021, 11(23), 11119; https://doi.org/10.3390/app112311119 - 23 Nov 2021
Cited by 4 | Viewed by 3140
Abstract
With the recent evolution of deep learning, machine translation (MT) models and systems are being steadily improved. However, research on MT in low-resource languages such as Vietnamese and Korean is still very limited. In recent years, a state-of-the-art context-based embedding model introduced by Google, bidirectional encoder representations from transformers (BERT), has begun to appear in neural MT (NMT) models in different ways to enhance the accuracy of MT systems. The BERT model for Vietnamese has been developed and has significantly improved natural language processing (NLP) tasks such as part-of-speech (POS) tagging, named-entity recognition, dependency parsing, and natural language inference. Our research applied the Vietnamese BERT model to provide POS tagging and morphological analysis (MA) for the Vietnamese sentences, and word-sense disambiguation (WSD) for the Korean sentences, in our Vietnamese–Korean bilingual corpus. In the Vietnamese–Korean NMT system, with contextual embedding, the BERT model for Vietnamese is concurrently connected to both the encoder layers and decoder layers in the NMT model. Experimental results assessed through the BLEU, METEOR, and TER metrics show that contextual embedding significantly improves the quality of Vietnamese–Korean NMT.
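
As a rough sketch of the contextual-embedding idea, the snippet below projects frozen BERT states for a source sentence into an NMT encoder. We use a generic multilingual BERT via Hugging Face transformers as a stand-in and feed only the encoder side; the paper wires a Vietnamese BERT into both encoder and decoder layers.

```python
# Frozen BERT states projected into an NMT encoder: a sketch under our own
# assumptions (generic multilingual BERT, encoder side only, invented sizes).
import torch
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
bert = AutoModel.from_pretrained("bert-base-multilingual-cased").eval()

class ContextualEncoder(nn.Module):
    def __init__(self, bert_dim: int = 768, nmt_dim: int = 512):
        super().__init__()
        self.project = nn.Linear(bert_dim, nmt_dim)
        layer = nn.TransformerEncoderLayer(nmt_dim, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, contextual_states):
        return self.encoder(self.project(contextual_states))

with torch.no_grad():
    batch = tokenizer("Hôm nay trời đẹp.", return_tensors="pt")
    states = bert(**batch).last_hidden_state     # contextual source embeddings

encoder = ContextualEncoder()
print(encoder(states).shape)  # (1, seq_len, 512) -> would feed the NMT decoder
```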

17 pages, 2161 KiB  
Article
NERWS: Towards Improving Information Retrieval of Digital Library Management System Using Named Entity Recognition and Word Sense
by Ahmed Aliwy, Ayad Abbas and Ahmed Alkhayyat
Big Data Cogn. Comput. 2021, 5(4), 59; https://doi.org/10.3390/bdcc5040059 - 28 Oct 2021
Cited by 10 | Viewed by 5853
Abstract
An information retrieval (IR) system is the core of many applications, including digital library management systems (DLMS). An IR-based DLMS depends on either the title with keywords or the content treated as symbolic strings, ignoring the meaning of the content or what it indicates. Many researchers have tried to improve IR systems using either named entity recognition (NER) or word meaning (word sense), each implemented for a specific language; however, NER and word sense disambiguation have not been tested together to study the behavior of an IR system in the presence of both techniques. This paper aims to improve the IR system used by a DLMS by adding NER and word sense disambiguation (WSD) together for the English and Arabic languages. For NER, a voting technique was used among three completely different classifiers: rule-based, conditional random field (CRF), and bidirectional LSTM-CNN. For WSD, an examples-based method was implemented, for the first time, for the English language. For the IR system, a vector space model (VSM) was used, tested on Arabic and English samples from the library of the University of Kufa. The overall results show that precision, recall, and F-measure increased from 70.9%, 74.2%, and 72.5% to 89.7%, 91.5%, and 90.6% for the English language, and from 66.3%, 69.7%, and 68.0% to 89.3%, 87.1%, and 88.2% for the Arabic language.
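
The NER voting step can be illustrated with a per-token majority vote over three label sequences, as in the toy sketch below; the tie-breaking rule (fall back to the CRF) is our arbitrary choice, since the paper does not specify one.

```python
# Per-token majority vote over three NER classifiers' BIO labels.
from collections import Counter

def vote(rule_labels, crf_labels, bilstm_labels):
    voted = []
    for labels in zip(rule_labels, crf_labels, bilstm_labels):
        ranked = Counter(labels).most_common()
        if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
            voted.append(labels[1])   # three-way tie -> trust the CRF (our choice)
        else:
            voted.append(ranked[0][0])
    return voted

print(vote(["B-ORG", "I-ORG", "O", "O"],
           ["B-LOC", "I-ORG", "O", "O"],
           ["B-ORG", "I-ORG", "B-LOC", "O"]))
# -> ['B-ORG', 'I-ORG', 'O', 'O']
```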

14 pages, 789 KiB  
Article
The Integration of Linguistic and Geospatial Features Using Global Context Embedding for Automated Text Geocoding
by Zheren Yan, Can Yang, Lei Hu, Jing Zhao, Liangcun Jiang and Jianya Gong
ISPRS Int. J. Geo-Inf. 2021, 10(9), 572; https://doi.org/10.3390/ijgi10090572 - 24 Aug 2021
Cited by 10 | Viewed by 5214
Abstract
Geocoding is an essential procedure in geographical information retrieval to associate place names with coordinates. Due to the inherent ambiguity of place names in natural language and the scarcity of place names in textual data, it is widely recognized that geocoding is challenging. Recent advances in deep learning have promoted the use of neural networks to improve geocoding performance. However, most existing approaches consider only the local context, e.g., neighboring words in a sentence, as opposed to the global context, e.g., the topic of the document. A lack of global information may severely impact the robustness of the model. To fill this research gap, this paper proposes a novel global context embedding approach that generates linguistic and geospatial features through topic embedding and location embedding, respectively. A deep neural network called LGGeoCoder, which integrates local and global features, is developed to solve geocoding as a classification problem. Experiments on a Wikipedia place name dataset demonstrate that LGGeoCoder achieves competitive performance compared with state-of-the-art models. Furthermore, the effect of introducing global linguistic and geospatial features to alleviate the ambiguity and scarcity problems in geocoding is discussed.
(This article belongs to the Special Issue GIS Software and Engineering for Big Data)
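
A schematic sketch of integrating local context with global topic and location embeddings before classifying over candidate places follows. All dimensions are invented for illustration; this is not LGGeoCoder itself.

```python
# Local word features plus global topic/location embeddings, classified over
# candidate places: a schematic sketch with invented dimensions.
import torch
import torch.nn as nn

class GlobalContextGeocoder(nn.Module):
    def __init__(self, word_dim=64, topic_dim=32, loc_dim=32, num_places=500):
        super().__init__()
        self.local = nn.LSTM(word_dim, word_dim, batch_first=True)
        self.classify = nn.Linear(word_dim + topic_dim + loc_dim, num_places)

    def forward(self, words, topic_emb, loc_emb):
        _, (h, _) = self.local(words)                              # local: neighboring words
        features = torch.cat([h[-1], topic_emb, loc_emb], dim=-1)  # + global context
        return self.classify(features)                             # logits over candidate places

model = GlobalContextGeocoder()
logits = model(torch.randn(1, 30, 64), torch.randn(1, 32), torch.randn(1, 32))
print(logits.argmax(dim=-1))  # index of the predicted place
```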

17 pages, 1460 KiB  
Article
OTNEL: A Distributed Online Deep Learning Semantic Annotation Methodology
by Christos Makris and Michael Angelos Simos
Big Data Cogn. Comput. 2020, 4(4), 31; https://doi.org/10.3390/bdcc4040031 - 29 Oct 2020
Cited by 6 | Viewed by 6463
Abstract
Semantic representation of unstructured text is crucial in modern artificial intelligence and information retrieval applications. The process of extracting semantic information from an unstructured text fragment to a corresponding representation in a concept ontology is known as named entity disambiguation. In this work, we introduce a distributed, supervised deep learning methodology employing a long short-term memory-based architecture for entity linking with Wikipedia. In the context of a frequently changing online world, we introduce and study the domain of online-training named entity disambiguation, featuring on-the-fly adaptation to underlying knowledge changes. Our novel methodology evaluates polysemous anchor mentions with sense compatibility based on thematic segmentation of the Wikipedia knowledge graph representation. We aim at both robust performance and high entity-linking accuracy. The introduced modeling process efficiently addresses the conceptualization, formalization, and computational challenges of the online-training entity-linking task. The online-training concept lends itself to wider adoption, as it is considerably beneficial for targeted-topic and online global-context consensus approaches to entity disambiguation.
(This article belongs to the Special Issue Semantic Web Technology and Recommender Systems)
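
The online-training idea can be reduced to a loop that takes one gradient step per freshly observed example instead of retraining offline, as in this toy sketch; the model and data shapes are illustrative assumptions.

```python
# Online training for an entity linker: one incremental update per incoming
# (mention, sense) pair. The toy model and shapes are our assumptions.
import torch
import torch.nn as nn

embed = nn.EmbeddingBag(10_000, 64)   # mention-context encoder (toy)
head = nn.Linear(64, 300)             # scores over candidate senses
opt = torch.optim.Adam(list(embed.parameters()) + list(head.parameters()), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

def online_update(token_ids: torch.Tensor, gold_sense: int):
    """One gradient step as a freshly observed example streams in."""
    logits = head(embed(token_ids))
    loss = loss_fn(logits, torch.tensor([gold_sense]))
    opt.zero_grad()
    loss.backward()
    opt.step()                        # knowledge shifts are absorbed on the fly
    return loss.item()

stream = [(torch.randint(0, 10_000, (1, 12)), 42), (torch.randint(0, 10_000, (1, 9)), 7)]
for ids, sense in stream:
    print(online_update(ids, sense))
```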

19 pages, 1699 KiB  
Article
Adaptive Geoparsing Method for Toponym Recognition and Resolution in Unstructured Text
by Edwin Aldana-Bobadilla, Alejandro Molina-Villegas, Ivan Lopez-Arevalo, Shanel Reyes-Palacios, Victor Muñiz-Sanchez and Jean Arreola-Trapala
Remote Sens. 2020, 12(18), 3041; https://doi.org/10.3390/rs12183041 - 17 Sep 2020
Cited by 22 | Viewed by 5271
Abstract
The automatic extraction of geospatial information is an important aspect of data mining. Computer systems capable of discovering geographic information from natural language involve a complex process called geoparsing, which includes two important tasks: geographic entity recognition and toponym resolution. The first task can be approached with machine learning, training a model to recognize sequences of characters (words) corresponding to geographic entities. The second consists of assigning such entities their most likely coordinates, a process that frequently involves resolving referential ambiguities. In this paper, we propose an extensible geoparsing approach that includes geographic entity recognition based on a neural network model and disambiguation based on what we call dynamic context disambiguation. Once place names are recognized in an input text, they are resolved using a grammar in which a set of rules specifies how ambiguities are resolved, much as a person would, considering the context. The result is an assignment of the most likely geographic properties to the recognized places. We propose an assessment measure based on a ranking of closeness between the predicted and actual locations of a place name; on this measure, our method outperforms OpenStreetMap Nominatim. We include further measures to assess place-name recognition and the prediction of what we call geographic levels (the administrative jurisdiction of places).
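
One simple form of context-based toponym resolution picks, among a place name's candidate coordinates, the candidate closest to places already resolved in the surrounding text. The distance heuristic below is ours for illustration and is not the paper's rule grammar.

```python
# A plain distance heuristic for toponym resolution: choose the candidate
# nearest the centroid of already-resolved context places.
from math import asin, cos, radians, sin, sqrt

def haversine_km(a, b):
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371 * asin(sqrt(h))

def resolve(candidates, context_coords):
    centroid = (sum(c[0] for c in context_coords) / len(context_coords),
                sum(c[1] for c in context_coords) / len(context_coords))
    return min(candidates, key=lambda c: haversine_km(c, centroid))

guadalajara = [(20.67, -103.35), (40.63, -3.16)]  # Mexico vs. Spain
context = [(19.43, -99.13), (21.88, -102.29)]     # Mexico City, Aguascalientes
print(resolve(guadalajara, context))              # (20.67, -103.35)
```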
