Computational Linguistics and Natural Language Processing

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Processes".

Deadline for manuscript submissions: closed (31 December 2023) | Viewed by 27083

Special Issue Editor


Prof. Dr. Peter Revesz
Guest Editor
Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE 68588, USA
Interests: databases; computational linguistics; bioinformatics; geoinformatics

Special Issue Information

Dear Colleagues, 

This Special Issue presents extended versions of selected papers from the 2021 2nd International Conference on Computational Linguistics and Natural Language Processing (CLNLP 2021). CLNLP 2021 aims to bring together leading academic scientists, researchers, and research scholars to exchange and share their experiences and research results in the field of computational linguistics and natural language processing. With the theme “Computational Linguistics and Natural Language Processing”, CLNLP 2021 aspires to keep pace with advances in a constantly changing field. Leading researchers and industry experts from around the globe will present their latest studies through papers and oral presentations. Authors of invited papers should be aware that the final submitted manuscript must contain at least 50% new content and no more than 30% text copied from the proceedings paper.

Prof. Dr. Peter Revesz
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • computational semantics
  • data science in language processing
  • discourse processing
  • document analysis
  • graphical methods
  • information about space and time in language models and processing
  • information extraction and database linking
  • information extraction and text mining
  • information retrieval
  • machine learning of language
  • social media

Published Papers (12 papers)


Research

18 pages, 506 KiB  
Article
Morphosyntactic Annotation in Literary Stylometry
by Robert Gorman
Information 2024, 15(4), 211; https://doi.org/10.3390/info15040211 - 09 Apr 2024
Viewed by 406
Abstract
This article investigates the stylometric usefulness of morphosyntactic annotation. Focusing on the style of literary texts, it argues that including morphosyntactic annotation in analyses of style has at least two important advantages: (1) maintaining a topic agnostic approach and (2) providing input variables that are interpretable in traditional grammatical terms. This study demonstrates how widely available Universal Dependency parsers can generate useful morphological and syntactic data for texts in a range of languages. These data can serve as the basis for input features that are strongly informative about the style of individual novels, as indicated by accuracy in classification tests. The interpretability of such features is demonstrated by a discussion of the weakness of an “authorial” signal as opposed to the clear distinction among individual works.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

15 pages, 312 KiB  
Article
A Survey on Using Linguistic Markers for Diagnosing Neuropsychiatric Disorders with Artificial Intelligence
by Ioana-Raluca Zaman and Stefan Trausan-Matu
Information 2024, 15(3), 123; https://doi.org/10.3390/info15030123 - 22 Feb 2024
Viewed by 1173
Abstract
Neuropsychiatric disorders affect individuals' lives in cognitive, emotional, and behavioral respects, impact their quality of life, and can even lead to death. Outside the medical area, these diseases have also started to be the subject of investigation in the field of Artificial Intelligence, especially Natural Language Processing (NLP) and Computer Vision. The usage of NLP techniques to understand medical symptoms eases the process of identifying and learning more about language-related aspects of neuropsychiatric conditions, leading to better diagnosis and treatment options. This survey shows the evolution of the detection of linguistic markers specific to a series of neuropsychiatric disorders and symptoms. For each disease or symptom, the article presents a medical description, specific linguistic markers, the results obtained using markers, and datasets. Furthermore, this paper offers a critical analysis of the work undertaken to date and suggests potential directions for future research in the field.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

13 pages, 18095 KiB  
Article
Minoan Cryptanalysis: Computational Approaches to Deciphering Linear A and Assessing Its Connections with Language Families from the Mediterranean and the Black Sea Areas
by Aaradh Nepal and Francesco Perono Cacciafoco
Information 2024, 15(2), 73; https://doi.org/10.3390/info15020073 - 25 Jan 2024
Viewed by 1280
Abstract
During the Bronze Age, the inhabitants of regions of Crete, mainland Greece, and Cyprus inscribed their languages using, among other scripts, a writing system called Linear A. These symbols, mainly characterized by combinations of lines, have, since their discovery, remained a mystery. Not only is the corpus very small, but it is challenging to link Minoan, the language behind Linear A, to any known language. Most decipherment attempts involve using the phonetic values of Linear B, a grammatological offspring of Linear A, to ‘read’ Linear A. However, this yields meaningless words. Recently, novel approaches to deciphering the script have emerged which involve a computational component. In this paper, two such approaches are combined to account for the biases involved in provisionally assigning Linear B phonetic values to Linear A and to shed more light on the possible connections of Linear A with other scripts and languages from the region. Additionally, the limitations inherent in such approaches are discussed. Firstly, a feature-based similarity measure is used to compare Linear A with the Carian Alphabet and the Cypriot Syllabary. A few Linear A symbols are matched with symbols from the Carian Alphabet and the Cypriot Syllabary. Finally, using the derived phonetic values, Linear A is compared with Ancient Egyptian, Luwian, Hittite, Proto-Celtic, and Uralic using a consonantal approach. Some possible word matches are identified from each language.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)
16 pages, 289 KiB  
Article
Agile Logical Semantics for Natural Languages
by Vincenzo Manca
Information 2024, 15(1), 64; https://doi.org/10.3390/info15010064 - 21 Jan 2024
Viewed by 1235
Abstract
This paper presents an agile method of logical semantics based on high-order Predicate Logic. An operator of predicate abstraction is introduced that provides a simple mechanism for logical aggregation of predicates and for logical typing. Monadic high-order logic is the natural environment in which predicate abstraction expresses the semantics of typical linguistic structures. Many examples of logical representations of natural language sentences are provided. Future extensions and possible applications in the interaction with chatbots are briefly discussed as well.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)
41 pages, 9365 KiB  
Article
Computing the Sound–Sense Harmony: A Case Study of William Shakespeare’s Sonnets and Francis Webb’s Most Popular Poems
by Rodolfo Delmonte
Information 2023, 14(10), 576; https://doi.org/10.3390/info14100576 - 20 Oct 2023
Viewed by 1751
Abstract
Poetic devices implicitly work towards inducing the reader to associate intended and expressed meaning with the sounds of the poem. In turn, sounds may be organized a priori into categories and assigned presumed meaning as suggested by traditional literary studies. To compute the degree of harmony and disharmony, I have automatically extracted the sound grids of all the sonnets by William Shakespeare and have combined them with the themes expressed by their contents. In a first experiment, sounds have been associated with lexically and semantically based sentiment analysis, obtaining 80% agreement. In a second experiment, sentiment analysis has been replaced by Appraisal Theory, thus obtaining a more fine-grained interpretation that combines disharmony with irony. The computation for Francis Webb is based on his 100 most popular poems and combines automatic semantically and lexically based sentiment analysis with sound grids. The results produce visual maps that clearly separate poems into three clusters: negative harmony, positive harmony and disharmony, where the latter instantiates the poet's need to encompass the opposites in a desperate attempt to reconcile them. Shakespeare and Webb have been chosen to prove the applicability of the method proposed in general contexts of poetry, exhibiting the widest possible gap at all linguistic and poetic levels.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

33 pages, 4671 KiB  
Article
A Benchmark Dataset to Distinguish Human-Written and Machine-Generated Scientific Papers
by Mohamed Hesham Ibrahim Abdalla, Simon Malberg, Daryna Dementieva, Edoardo Mosca and Georg Groh
Information 2023, 14(10), 522; https://doi.org/10.3390/info14100522 - 26 Sep 2023
Cited by 1 | Viewed by 2086
Abstract
As generative NLP can now produce content nearly indistinguishable from human writing, it is becoming difficult to identify genuine research contributions in academic writing and scientific publications. Moreover, information in machine-generated text can be factually wrong or even entirely fabricated. In this work, we introduce a novel benchmark dataset containing human-written and machine-generated scientific papers from SCIgen, GPT-2, GPT-3, ChatGPT, and Galactica, as well as papers co-created by humans and ChatGPT. We also experiment with several types of classifiers, both linguistic-based and transformer-based, for detecting the authorship of scientific text. A strong focus is put on generalization capabilities and explainability to highlight the strengths and weaknesses of these detectors. Our work makes an important step towards creating more robust methods for distinguishing between human-written and machine-generated scientific papers, ultimately ensuring the integrity of scientific literature.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

29 pages, 1330 KiB  
Article
Analyzing Sentiments Regarding ChatGPT Using Novel BERT: A Machine Learning Approach
by Sudheesh R, Muhammad Mujahid, Furqan Rustam, Rahman Shafique, Venkata Chunduri, Mónica Gracia Villar, Julién Brito Ballester, Isabel de la Torre Diez and Imran Ashraf
Information 2023, 14(9), 474; https://doi.org/10.3390/info14090474 - 25 Aug 2023
Cited by 3 | Viewed by 5283
Abstract
Chatbots are AI-powered programs designed to replicate human conversation. They are capable of performing a wide range of tasks, including answering questions, offering directions, controlling smart home thermostats, and playing music, among other functions. ChatGPT is a popular AI-based chatbot that generates meaningful responses to queries, aiding people in learning. While some individuals support ChatGPT, others view it as a disruptive tool in the field of education. Discussions about this tool can be found across different social media platforms. Analyzing the sentiment of such social media data, which comprises people’s opinions, is crucial for assessing public sentiment regarding the success and shortcomings of such tools. This study performs a sentiment analysis and topic modeling on ChatGPT-based tweets. ChatGPT-based tweets are tweets that the authors extracted from Twitter using ChatGPT hashtags, in which users share their reviews and opinions about ChatGPT. The Latent Dirichlet Allocation (LDA) approach is employed to identify the most frequently discussed topics in relation to ChatGPT tweets. For the sentiment analysis, a deep transformer-based Bidirectional Encoder Representations from Transformers (BERT) model with three dense layers of neural networks is proposed. Additionally, machine and deep learning models with fine-tuned parameters are utilized for a comparative analysis. Experimental results demonstrate the superior performance of the proposed BERT model, achieving an accuracy of 96.49%.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

16 pages, 396 KiB  
Article
D0L-System Inference from a Single Sequence with a Genetic Algorithm
by Mateusz Łabędzki and Olgierd Unold
Information 2023, 14(6), 343; https://doi.org/10.3390/info14060343 - 16 Jun 2023
Viewed by 979
Abstract
In this paper, we propose a new method for image-based grammatical inference of deterministic, context-free L-systems (D0L systems) from a single sequence. This approach is characterized by first parsing an input image into a sequence of symbols and then, using a genetic algorithm, attempting to infer a grammar that can generate this sequence. This technique has been tested using our test suite and compared to similar algorithms, showing promising results, including solving the problem for systems with more rules than in existing approaches. The tests show that it performs better than similar heuristic methods and can handle the same cases as arithmetic algorithms.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

15 pages, 8355 KiB  
Article
Decipherment Challenges Due to Tamga and Letter Mix-Ups in an Old Hungarian Runic Inscription from the Altai Mountains
by Peter Z. Revesz
Information 2022, 13(9), 422; https://doi.org/10.3390/info13090422 - 07 Sep 2022
Cited by 1 | Viewed by 1961
Abstract
An Old Hungarian Runic inscription from the Altai Mountains with 40 signs has posed some special challenges for decipherment due to several letter mix-ups and the use of a tamga sign, which is the first reported use of a tamga within this type of script. This paper gives a complete and correct translation and draws some lessons that can be learned about decipherment. It introduces sign similarity matrices as a method of detecting accidental misspellings and shows that sign similarity matrices can be efficiently computed. It also explains the importance of simultaneously achieving the three criteria for a valid decipherment: correct signs, syntax, and semantics.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

17 pages, 1700 KiB  
Article
Linguistic Profiling of Text Genres: An Exploration of Fictional vs. Non-Fictional Texts
by Akshay Mendhakar
Information 2022, 13(8), 357; https://doi.org/10.3390/info13080357 - 26 Jul 2022
Cited by 2 | Viewed by 4390
Abstract
Texts are composed for multiple audiences and for numerous purposes. Each form of text follows a set of guidelines and structure to serve the purpose of writing. A common way of grouping texts is into text types. Describing these text types in terms of their linguistic characteristics is called ‘linguistic profiling of texts’. In this paper, we highlight the linguistic features that characterize a text type. The findings of the present study highlight the importance of parts of speech distribution and tenses as the most important microscopic linguistic characteristics of the text. Additionally, we demonstrate the importance of other linguistic characteristics of texts and their relative importance (top 25th, 50th and 75th percentile) in linguistic profiling. The results are discussed with the use case of genre and subgenre classifications with classification accuracies of 89 and 73 percentile, respectively.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

9 pages, 8714 KiB  
Article
A Proposed Translation of an Altai Mountain Inscription Presumed to Be from the 7th Century BC
by Peter Z. Revesz and Géza Varga
Information 2022, 13(5), 243; https://doi.org/10.3390/info13050243 - 10 May 2022
Cited by 1 | Viewed by 2732
Abstract
The purpose of this study is to examine an Old Hungarian inscription that was recently found in the Altai Mountains and was claimed to be over 2600 years old, which would make it the oldest extant example of the Old Hungarian script. A careful observation of the Altai script and a comparison with other Old Hungarian inscriptions were made, during which several errors were discovered in the interpretation of the Old Hungarian signs. After correcting for these errors that were apparently introduced by mixing up the inscription with underlying engravings of animal images, a new sequence of Old Hungarian signs was obtained and translated into a new text. The context of the text indicates that the inscription is considerably more recent and is unlikely to be earlier than the 19th century.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

Review

25 pages, 1910 KiB  
Review
A Literature Survey on Word Sense Disambiguation for the Hindi Language
by Vinto Gujjar, Neeru Mago, Raj Kumari, Shrikant Patel, Nalini Chintalapudi and Gopi Battineni
Information 2023, 14(9), 495; https://doi.org/10.3390/info14090495 - 07 Sep 2023
Cited by 3 | Viewed by 1475
Abstract
Word sense disambiguation (WSD) is a process used to determine the most appropriate meaning of a word in a given contextual framework, particularly when the word is ambiguous. While WSD has been extensively studied for English, it remains a challenging problem for resource-scarce languages such as Hindi. Therefore, it is crucial to address ambiguity in Hindi to effectively and efficiently utilize it on the web for various applications such as machine translation, information retrieval, etc. The rich linguistic structure of Hindi, characterized by complex morphological variations and syntactic nuances, presents unique challenges in accurately determining the intended sense of a word within a given context. This review paper presents an overview of different approaches employed to resolve the ambiguity of Hindi words, including supervised, unsupervised, and knowledge-based methods. Additionally, the paper discusses applications, identifies open problems, presents conclusions, and suggests future research directions.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)