
391 Results Found

  • Article
  • Open Access
27 Citations
10,506 Views
14 Pages

Part-of-Speech (POS) tagging is one of the most important tasks in the field of natural language processing (NLP). POS tagging for a word depends not only on the word itself but also on its position, its surrounding words, and their POS tags. POS tag...

  • Article
  • Open Access
3 Citations
2,864 Views
13 Pages

23 February 2023

Various research approaches have attempted to solve the length difference problem between the surface form and the base form of words in the Korean morphological analysis and part-of-speech (POS) tagging task. The compound POS tagging method is a pop...

  • Article
  • Open Access
26 Citations
7,449 Views
13 Pages

Bidirectional Long Short-Term Memory Network with a Conditional Random Field Layer for Uyghur Part-Of-Speech Tagging

  • Maihemuti Maimaiti,
  • Aishan Wumaier,
  • Kahaerjiang Abiderexiti and
  • Tuergen Yibulayin

30 November 2017

Uyghur is an agglutinative and a morphologically rich language; natural language processing tasks in Uyghur can be a challenge. Word morphology is important in Uyghur part-of-speech (POS) tagging. However, POS tagging performance suffers from error p...

  • Article
  • Open Access
624 Views
16 Pages

Chinese Text Readability Assessment Based on the Integration of Visualized Part-of-Speech Information with Linguistic Features

  • Chi-Yi Hsieh,
  • Jing-Yan Lin,
  • Chi-Wen Hsieh,
  • Bo-Yuan Huang,
  • Yi-Chi Huang and
  • Yu-Xiang Chen

9 December 2025

The assessment of Chinese text readability plays a significant role in Chinese language education. Due to the intrinsic differences between alphabetic languages and Chinese character representations, the readability assessment becomes more challengin...

  • Article
  • Open Access
866 Views
14 Pages

25 March 2025

Linguistic tasks such as Part-of-Speech (PoS) tagging can be tedious, but are crucial for the development of Natural Language Processing (NLP) tools. Games With A Purpose (GWAPs) aim to reduce the monotony of the task for native speakers and non-expe...

  • Article
  • Open Access
7 Citations
3,139 Views
20 Pages

21 April 2022

Chinese Medical Named Entity Recognition (Chinese-MNER) aims to identify potential entities and their categories from the unstructured Chinese medical text. Existing methods for this task mainly incorporate the dictionary knowledge on the basis of tr...

  • Article
  • Open Access
3 Citations
2,172 Views
13 Pages

Does Part of Speech Have an Influence on Cyberbullying Detection?

  • Jingxiu Huang,
  • Ruofei Ding,
  • Yunxiang Zheng,
  • Xiaomin Wu,
  • Shumin Chen and
  • Xiunan Jin

21 December 2023

With the development of the Internet, the issue of cyberbullying on social media has gained significant attention. Cyberbullying is often expressed in text. Methods of identifying such text via machine learning have been growing, most of which rely o...

  • Article
  • Open Access
3 Citations
2,430 Views
18 Pages

Separate Syntax and Semantics: Part-of-Speech-Guided Transformer for Image Captioning

  • Dong Wang,
  • Bing Liu,
  • Yong Zhou,
  • Mingming Liu,
  • Peng Liu and
  • Rui Yao

22 November 2022

Transformer-based image captioning models have recently achieved remarkable performance by using new fully attentive paradigms. However, existing models generally follow the conventional language model of predicting the next word conditioned on the v...

  • Article
  • Open Access
4 Citations
2,519 Views
11 Pages

Medical Named Entity Recognition Fusing Part-of-Speech and Stroke Features

  • Fen Yi,
  • Hong Liu,
  • You Wang,
  • Sheng Wu,
  • Cheng Sun,
  • Peng Feng and
  • Jin Zhang

2 August 2023

It is highly significant from a research standpoint and a valuable practice to identify diseases, symptoms, drugs, examinations, and other medical entities in medical text data to support knowledge maps, question and answer systems, and other downstr...

  • Article
  • Open Access
7 Citations
2,600 Views
37 Pages

23 April 2024

To address the cost and efficiency issues of manually analysing requirement dependency in requirements engineering, a requirement dependency extraction method based on part-of-speech features and an improved stacking ensemble learning model (P-Stacki...

  • Article
  • Open Access
2 Citations
1,335 Views
20 Pages

14 March 2025

Natural language processing (NLP) has numerous applications and has been extensively developed in deep learning. In recent years, language models such as Transformer, BERT, and GPT have frequently been the foundation for related research. However, re...

  • Article
  • Open Access
2 Citations
4,196 Views
17 Pages

4 October 2020

In this work, we propose a novel priors-based attention neural network (PANN) for image captioning, which aims at incorporating two kinds of priors, i.e., the probabilities being mentioned for local region proposals (PBM priors) and part-of-speech cl...

  • Article
  • Open Access
3 Citations
2,406 Views
17 Pages

Part-of-Speech Tags Guide Low-Resource Machine Translation

  • Zaokere Kadeer,
  • Nian Yi and
  • Aishan Wumaier

10 August 2023

Neural machine translation models are guided by loss function to select source sentence features and generate results close to human annotation. When the data resources are abundant, neural machine translation models can focus on the features used to...

  • Article
  • Open Access
2 Citations
3,121 Views
11 Pages

11 September 2021

Many natural language processing architectures are greatly affected by seemingly small design decisions, such as batching and curriculum learning (how the training data are ordered during training). In order to better understand the impact of these d...

  • Article
  • Open Access
7 Citations
6,239 Views
21 Pages

Improving Basic Natural Language Processing Tools for the Ainu Language

  • Karol Nowakowski,
  • Michal Ptaszynski,
  • Fumito Masui and
  • Yoshio Momouchi

24 October 2019

Ainu is a critically endangered language spoken by the native inhabitants of northern Japan. This paper describes our research aimed at the development of technology for automatic processing of text in Ainu. In particular, we improved the existing to...

  • Article
  • Open Access
4 Citations
4,047 Views
12 Pages

Developing Core Technologies for Resource-Scarce Nguni Languages

  • Jakobus S. du Toit and
  • Martin J. Puttkammer

14 December 2021

The creation of linguistic resources is crucial to the continued growth of research and development efforts in the field of natural language processing, especially for resource-scarce languages. In this paper, we describe the curation and annotation...

  • Article
  • Open Access
27 Citations
8,426 Views
18 Pages

5 December 2018

The statistical machine translation for the Arabic language integrates external linguistic resources such as part-of-speech tags. The current research presents a Bidirectional Long Short-Term Memory (Bi-LSTM)—Conditional Random Fields (CRF) seg...

  • Article
  • Open Access
8 Citations
4,650 Views
13 Pages

Developing a POS Tagged Corpus of Urdu Tweets

  • Amber Baig,
  • Mutee U Rahman,
  • Hameedullah Kazi and
  • Ahsanullah Baloch

7 November 2020

Processing of social media text like tweets is challenging for traditional Natural Language Processing (NLP) tools developed for well-edited text due to the noisy nature of such text. However, demand for tools and resources to correctly process such...

  • Data Descriptor
  • Open Access
3 Citations
3,734 Views
10 Pages

Introducing DeReKoGram: A Novel Frequency Dataset with Lemma and Part-of-Speech Information for German

  • Sascha Wolfer,
  • Alexander Koplenig,
  • Marc Kupietz and
  • Carolin Müller-Spitzer

10 November 2023

We introduce DeReKoGram, a novel frequency dataset containing lemma and part-of-speech (POS) information for 1-, 2-, and 3-grams from the German Reference Corpus. The dataset contains information based on a corpus of 43.2 billion tokens and is divide...

  • Article
  • Open Access
6 Citations
4,162 Views
12 Pages

Towards the Construction of a Gold Standard Biomedical Corpus for the Romanian Language

  • Maria Mitrofan,
  • Verginica Barbu Mititelu and
  • Grigorina Mitrofan

23 November 2018

Gold standard corpora (GSCs) are essential for the supervised training and evaluation of systems that perform natural language processing (NLP) tasks. Currently, most of the resources used in biomedical NLP tasks are mainly in English. Little effort...

  • Article
  • Open Access
1 Citation
3,670 Views
19 Pages

19 September 2024

Semantic ontologies have been widely utilized as crucial tools within natural language processing, underpinning applications such as knowledge extraction, question answering, machine translation, text comprehension, information retrieval, and text su...

  • Article
  • Open Access
5 Citations
4,020 Views
13 Pages

Improving the Performance of Vietnamese–Korean Neural Machine Translation with Contextual Embedding

  • Van-Hai Vu,
  • Quang-Phuoc Nguyen,
  • Ebipatei Victoria Tunyan and
  • Cheol-Young Ock

23 November 2021

With the recent evolution of deep learning, machine translation (MT) models and systems are being steadily improved. However, research on MT in low-resource languages such as Vietnamese and Korean is still very limited. In recent years, a state-of-th...

  • Article
  • Open Access
2 Citations
4,277 Views
15 Pages

10 June 2022

Text vectorization is the basic work of natural language processing tasks. High-quality vector representation with rich feature information can guarantee the quality of entity recognition and other downstream tasks in the field of traditional Chinese...

  • Article
  • Open Access
10 Citations
5,733 Views
13 Pages

A Comparative Study of Arabic Part of Speech Taggers Using Literary Text Samples from Saudi Novels

  • Reyadh Alluhaibi,
  • Tareq Alfraidi,
  • Mohammad A. R. Abdeen and
  • Ahmed Yatimi

15 December 2021

Part of Speech (POS) tagging is one of the most common techniques used in natural language processing (NLP) applications and corpus linguistics. Various POS tagging tools have been developed for Arabic. These taggers differ in several aspects, such a...

  • Article
  • Open Access
12 Citations
10,280 Views
32 Pages

23 November 2023

This study presents a new dataset for fake news analysis and detection, namely, the PolitiFact-Oslo Corpus. The corpus contains samples of both fake and real news in English, collected from the fact-checking website PolitiFact.com. It grew out of a n...

  • Article
  • Open Access
887 Views
24 Pages

18 September 2025

Part-of-speech (POS) tagging in low-resource, morphologically rich languages (LRLs/MRLs) remains challenging due to extensive affixation, high out-of-vocabulary (OOV) rates, and pervasive polysemy. We propose MRL-POS, a unified Transformer-CRF framew...

  • Article
  • Open Access
522 Views
24 Pages

Hybrid Methods for Automatic Collocation Extraction in Building a Learners’ Dictionary of Italian

  • Damiano Perri,
  • Osvaldo Gervasi,
  • Sergio Tasso,
  • Stefania Spina,
  • Irene Fioravanti,
  • Fabio Zanda and
  • Luciana Forti

12 December 2025

The automatic construction of learners’ dictionaries requires robust methods for identifying non-literal word combinations, or collocations, which represent a significant challenge for second-language (L2) learners. This paper addresses the cri...

  • Article
  • Open Access
5 Citations
3,234 Views
14 Pages

Grammar-Supervised End-to-End Speech Recognition with Part-of-Speech Tagging and Dependency Parsing

  • Genshun Wan,
  • Tingzhi Mao,
  • Jingxuan Zhang,
  • Hang Chen,
  • Jianqing Gao and
  • Zhongfu Ye

27 March 2023

For most automatic speech recognition systems, many unacceptable hypothesis errors still make the recognition results absurd and difficult to understand. In this paper, we introduce the grammar information to improve the performance of the grammatica...

  • Article
  • Open Access
7 Citations
2,628 Views
17 Pages

Parallel Bidirectionally Pretrained Taggers as Feature Generators

  • Ranka Stanković,
  • Mihailo Škorić and
  • Branislava Šandrih Todorović

16 May 2022

In a setting where multiple automatic annotation approaches coexist and advance separately but none completely solve a specific problem, the key might be in their combination and integration. This paper outlines a scalable architecture for Part-of-Sp...

  • Article
  • Open Access
1 Citation
2,600 Views
24 Pages

10 March 2021

Software clones are code fragments with similar or nearly similar functionality or structures. These clones are introduced in a project either accidentally or deliberately during software development or maintenance process. The presence of clones pos...

  • Article
  • Open Access
3 Citations
3,982 Views
20 Pages

29 January 2023

This study investigates the distributions of word classes in English speeches made in the European Parliament and their German (written) translations and simultaneous interpretations. For comparison, a sample of original German speeches and a selecti...

  • Article
  • Open Access
5 Citations
3,247 Views
12 Pages

Enhancing Communication Reliability from the Semantic Level under Low SNR

  • Yueling Liu,
  • Yichi Zhang,
  • Peng Luo,
  • Shengteng Jiang,
  • Kuo Cao,
  • Haitao Zhao and
  • Jibo Wei

In the low signal-to-noise ratio region, a large number of bit errors occur, and it may exceed the channel error correction capability of the receiver. Traditional communication system may use the technology of automatic repeat-request to deal with t...

  • Article
  • Open Access
8 Citations
2,889 Views
19 Pages

Quantifying Urban Linguistic Diversity Related to Rainfall and Flood across China with Social Media Data

  • Jiale Qian,
  • Yunyan Du,
  • Fuyuan Liang,
  • Jiawei Yi,
  • Nan Wang,
  • Wenna Tu,
  • Sheng Huang,
  • Tao Pei and
  • Ting Ma

Understanding the public’s diverse linguistic expressions about rainfall and flood provides a basis for flood disaster studies and enhances linguistic and cultural awareness. However, existing research tends to overlook linguistic complexity, p...

  • Article
  • Open Access
3 Citations
3,468 Views
13 Pages

Deep Learning-Based End-to-End Language Development Screening for Children Using Linguistic Knowledge

  • Byoung-Doo Oh,
  • Yoon-Kyoung Lee,
  • Jong-Dae Kim,
  • Chan-Young Park and
  • Yu-Seop Kim

6 May 2022

Language development is inextricably linked to the development of fundamental human abilities. A language problem can result from abnormal language development in childhood, which has a severe impact on other elements of life. As a result, early trea...

  • Article
  • Open Access
3 Citations
2,536 Views
27 Pages

Exploring the Cognitive Neural Basis of Factuality in Abstractive Text Summarization Models: Interpretable Insights from EEG Signals

  • Zhejun Zhang,
  • Yingqi Zhu,
  • Yubo Zheng,
  • Yingying Luo,
  • Hengyi Shao,
  • Shaoting Guo,
  • Liang Dong,
  • Lin Zhang and
  • Lei Li

19 January 2024

(1) Background: Information overload challenges decision-making in the Industry 4.0 era. While Natural Language Processing (NLP), especially Automatic Text Summarization (ATS), offers solutions, issues with factual accuracy persist. This research bri...

  • Article
  • Open Access
72 Citations
12,765 Views
22 Pages

10 October 2020

In the derived approach, an analysis is performed on Twitter data for World Cup soccer 2014 held in Brazil to detect the sentiment of the people throughout the world using machine learning techniques. By filtering and analyzing the data using natural...

  • Article
  • Open Access
25 Citations
5,911 Views
20 Pages

Detecting Word-Based Algorithmically Generated Domains Using Semantic Analysis

  • Luhui Yang,
  • Jiangtao Zhai,
  • Weiwei Liu,
  • Xiaopeng Ji,
  • Huiwen Bai,
  • Guangjie Liu and
  • Yuewei Dai

2 February 2019

In highly sophisticated network attacks, command-and-control (C&C) servers always use domain generation algorithms (DGAs) to dynamically produce several candidate domains instead of static hard-coded lists of IP addresses or domain names. Disting...

  • Article
  • Open Access
23 Citations
2,956 Views
22 Pages

10 January 2024

Aspect-based sentiment analysis is a fine-grained task where the key goal is to predict sentiment polarities of one or more aspects in a given sentence. Currently, graph neural network models built upon dependency trees are widely employed for aspect...

  • Article
  • Open Access
2 Citations
2,294 Views
17 Pages

26 January 2024

The purpose of this paper is to address the extraction of entities and relationships from unstructured Chinese text, with a particular emphasis on the challenges of Named Entity Recognition (NER) and Relation Extraction (RE). This will be achieved by...

  • Article
  • Open Access
3 Citations
4,782 Views
17 Pages

10 August 2018

A microblog is a new type of social media for information publishing, acquiring, and spreading. Finding the significant topics of a microblog is necessary for popularity tracing and public opinion following. This paper puts forward a method to detect...

  • Article
  • Open Access
2 Citations
2,847 Views
33 Pages

23 May 2024

Wide adoption of social media has caused an explosion of information stored online, with the majority of that information containing subjective, opinionated, and emotional content produced daily by users. The field of emotion analysis has helped effe...

  • Article
  • Open Access
5 Citations
2,376 Views
20 Pages

Natural Language Processing in Knowledge-Based Support for Operator Assistance

  • Fatemeh Besharati Moghaddam,
  • Angel J. Lopez,
  • Stijn De Vuyst and
  • Sidharta Gautama

26 March 2024

Manufacturing industry faces increasing complexity in the performance of assembly tasks due to escalating demand for complex products with a greater number of variations. Operators require robust assistance systems to enhance productivity, efficiency...

  • Article
  • Open Access
5 Citations
3,809 Views
15 Pages

This study analyzes the relationship between the degrees of resemblance and distances between dialects based on several dialectological atlases. This analysis investigates various correspondence data with respect to total valid data in setting refere...

  • Article
  • Open Access
29 Citations
8,114 Views
10 Pages

Due to the constantly evolving social media and different types of sources of information, we are facing different fake news and different types of misinformation. Currently, we are working on a project to identify applicable methods for identifying...

  • Data Descriptor
  • Open Access
9 Citations
6,195 Views
8 Pages

Visual Lip Reading Dataset in Turkish

  • Ali Berkol,
  • Talya Tümer-Sivri,
  • Nergis Pervan-Akman,
  • Melike Çolak and
  • Hamit Erdem

5 January 2023

The promised dataset was obtained from daily Turkish words and phrases pronounced by various people in videos posted on YouTube. The purpose of compiling the dataset was to provide a method for the detection of the spoken word by recognizing patterns...

  • Article
  • Open Access
4 Citations
3,605 Views
16 Pages

Lexical Category and Downstep in Japanese

  • Manami Hirayama,
  • Hyun Kyung Hwang and
  • Takaomi Kato

29 January 2022

In pursuing the mapping between syntax and phonology/prosody, little attention has been paid to the kinds of syntactic information that can affect prosody. In this paper, we explore Japanese downstep, a process in phrasal phonology. What syntactic in...

  • Article
  • Open Access
597 Views
27 Pages

Computational Infrastructure for Modern Greek: From Grammar to Ontology

  • Nikoletta E. Samaridi,
  • Nikitas N. Karanikolas,
  • Evangelos C. Papakitsos and
  • Christos Skourlas

19 November 2025

This study presents a comprehensive NLP infrastructure for Modern Greek that bridges grammatical analysis and ontological representation, integrating linguistic theory, algorithmic modeling, and semantic structuring within a unified computational fra...

  • Article
  • Open Access
3 Citations
4,524 Views
9 Pages

Similar Text Fragments Extraction for Identifying Common Wikipedia Communities

  • Svitlana Petrasova,
  • Nina Khairova,
  • Włodzimierz Lewoniewski,
  • Orken Mamyrbayev and
  • Kuralay Mukhsina

13 December 2018

Similar text fragments extraction from weakly formalized data is the task of natural language processing and intelligent data analysis and is used for solving the problem of automatic identification of connected knowledge fields. In order to search s...

  • Review
  • Open Access
4,806 Views
32 Pages

15 November 2025

Arabic natural language processing (NLP) has garnered significant attention in recent years due to the growing demand for automated text and Arabic-based intelligent systems, in addition to digital transformation in the Arab world. However, the uniqu...

  • Article
  • Open Access
239 Views
24 Pages

12 January 2026

In the context of the BiblIndex project, which is an online index of biblical textual reuses by the Church Fathers, intrabiblical intertextuality must be considered to better understand the underlying basis of the Church Fathers’ thought. This...
