Information

Research

18 pages, 506 KiB

Open Access Article

Morphosyntactic Annotation in Literary Stylometry

by Robert Gorman

Information 2024, 15(4), 211; https://doi.org/10.3390/info15040211 - 09 Apr 2024

Viewed by 406

Abstract

This article investigates the stylometric usefulness of morphosyntactic annotation. Focusing on the style of literary texts, it argues that including morphosyntactic annotation in analyses of style has at least two important advantages: (1) maintaining a topic-agnostic approach and (2) providing input variables that are interpretable in traditional grammatical terms. This study demonstrates how widely available Universal Dependency parsers can generate useful morphological and syntactic data for texts in a range of languages. These data can serve as the basis for input features that are strongly informative about the style of individual novels, as indicated by accuracy in classification tests. The interpretability of such features is demonstrated by a discussion of the weakness of an “authorial” signal as opposed to the clear distinction among individual works.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

15 pages, 312 KiB

Open Access Article

A Survey on Using Linguistic Markers for Diagnosing Neuropsychiatric Disorders with Artificial Intelligence

by Ioana-Raluca Zaman and Stefan Trausan-Matu

Information 2024, 15(3), 123; https://doi.org/10.3390/info15030123 - 22 Feb 2024

Viewed by 1173

Abstract

Neuropsychiatric disorders affect the cognitive, emotional, and behavioral aspects of individuals' lives, reduce their quality of life, and can even lead to death. Outside the medical area, these diseases have also started to be the subject of investigation in the field of Artificial Intelligence, especially in Natural Language Processing (NLP) and Computer Vision. The use of NLP techniques to understand medical symptoms eases the process of identifying and learning more about language-related aspects of neuropsychiatric conditions, leading to better diagnosis and treatment options. This survey shows the evolution of the detection of linguistic markers specific to a series of neuropsychiatric disorders and symptoms. For each disease or symptom, the article presents a medical description, specific linguistic markers, the results obtained using markers, and datasets. Furthermore, this paper offers a critical analysis of the work undertaken to date and suggests potential directions for future research in the field.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

13 pages, 18095 KiB

Open Access Article

Minoan Cryptanalysis: Computational Approaches to Deciphering Linear A and Assessing Its Connections with Language Families from the Mediterranean and the Black Sea Areas

by Aaradh Nepal and Francesco Perono Cacciafoco

Information 2024, 15(2), 73; https://doi.org/10.3390/info15020073 - 25 Jan 2024

Viewed by 1280

Abstract

During the Bronze Age, the inhabitants of regions of Crete, mainland Greece, and Cyprus inscribed their languages using, among other scripts, a writing system called Linear A. These symbols, mainly characterized by combinations of lines, have, since their discovery, remained a mystery. Not only is the corpus very small, but it is challenging to link Minoan, the language behind Linear A, to any known language. Most decipherment attempts involve using the phonetic values of Linear B, a grammatological offspring of Linear A, to ‘read’ Linear A. However, this yields meaningless words. Recently, novel approaches to deciphering the script have emerged which involve a computational component. In this paper, two such approaches are combined to account for the biases involved in provisionally assigning Linear B phonetic values to Linear A and to shed more light on the possible connections of Linear A with other scripts and languages from the region. Additionally, the limitations inherent in such approaches are discussed. Firstly, a feature-based similarity measure is used to compare Linear A with the Carian Alphabet and the Cypriot Syllabary. A few Linear A symbols are matched with symbols from the Carian Alphabet and the Cypriot Syllabary. Finally, using the derived phonetic values, Linear A is compared with Ancient Egyptian, Luwian, Hittite, Proto-Celtic, and Uralic using a consonantal approach. Some possible word matches are identified from each language.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

16 pages, 289 KiB

Open Access Article

Agile Logical Semantics for Natural Languages

by Vincenzo Manca

Information 2024, 15(1), 64; https://doi.org/10.3390/info15010064 - 21 Jan 2024

Viewed by 1235

Abstract

This paper presents an agile method of logical semantics based on high-order Predicate Logic. An operator of predicate abstraction is introduced that provides a simple mechanism for logical aggregation of predicates and for logical typing. Monadic high-order logic is the natural environment in which predicate abstraction expresses the semantics of typical linguistic structures. Many examples of logical representations of natural language sentences are provided. Future extensions and possible applications in the interaction with chatbots are briefly discussed as well.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

41 pages, 9365 KiB

Open Access Article

Computing the Sound–Sense Harmony: A Case Study of William Shakespeare’s Sonnets and Francis Webb’s Most Popular Poems

by Rodolfo Delmonte

Information 2023, 14(10), 576; https://doi.org/10.3390/info14100576 - 20 Oct 2023

Viewed by 1751

Abstract

Poetic devices implicitly work towards inducing the reader to associate intended and expressed meaning with the sounds of the poem. In turn, sounds may be organized a priori into categories and assigned presumed meaning as suggested by traditional literary studies. To compute the degree of harmony and disharmony, I have automatically extracted the sound grids of all the sonnets by William Shakespeare and have combined them with the themes expressed by their contents. In a first experiment, sounds have been associated with lexically and semantically based sentiment analysis, obtaining 80% agreement. In a second experiment, sentiment analysis has been replaced by Appraisal Theory, thus obtaining a more fine-grained interpretation that combines disharmony with irony. The computation for Francis Webb is based on his 100 most popular poems and combines automatic semantically and lexically based sentiment analysis with sound grids. The results produce visual maps that clearly separate poems into three clusters: negative harmony, positive harmony, and disharmony, where the latter instantiates the poet's need to encompass the opposites in a desperate attempt to reconcile them. Shakespeare and Webb have been chosen to prove the applicability of the proposed method in general contexts of poetry, as they exhibit the widest possible gap at all linguistic and poetic levels.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

33 pages, 4671 KiB

Open Access Article

A Benchmark Dataset to Distinguish Human-Written and Machine-Generated Scientific Papers

by Mohamed Hesham Ibrahim Abdalla, Simon Malberg, Daryna Dementieva, Edoardo Mosca and Georg Groh

Information 2023, 14(10), 522; https://doi.org/10.3390/info14100522 - 26 Sep 2023

Cited by 1 | Viewed by 2086

Abstract

As generative NLP can now produce content nearly indistinguishable from human writing, it is becoming difficult to identify genuine research contributions in academic writing and scientific publications. Moreover, information in machine-generated text can be factually wrong or even entirely fabricated. In this work, we introduce a novel benchmark dataset containing human-written and machine-generated scientific papers from SCIgen, GPT-2, GPT-3, ChatGPT, and Galactica, as well as papers co-created by humans and ChatGPT. We also experiment with several types of classifiers—linguistic-based and transformer-based—for detecting the authorship of scientific text. A strong focus is put on generalization capabilities and explainability to highlight the strengths and weaknesses of these detectors. Our work makes an important step towards creating more robust methods for distinguishing between human-written and machine-generated scientific papers, ultimately ensuring the integrity of scientific literature.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

29 pages, 1330 KiB

Open Access Article

Analyzing Sentiments Regarding ChatGPT Using Novel BERT: A Machine Learning Approach

by Sudheesh R, Muhammad Mujahid, Furqan Rustam, Rahman Shafique, Venkata Chunduri, Mónica Gracia Villar, Julién Brito Ballester, Isabel de la Torre Diez and Imran Ashraf

Information 2023, 14(9), 474; https://doi.org/10.3390/info14090474 - 25 Aug 2023

Cited by 3 | Viewed by 5283

Abstract

Chatbots are AI-powered programs designed to replicate human conversation. They are capable of performing a wide range of tasks, including answering questions, offering directions, controlling smart home thermostats, and playing music, among other functions. ChatGPT is a popular AI-based chatbot that generates meaningful responses to queries, aiding people in learning. While some individuals support ChatGPT, others view it as a disruptive tool in the field of education. Discussions about this tool can be found across different social media platforms. Analyzing the sentiment of such social media data, which comprises people’s opinions, is crucial for assessing public sentiment regarding the success and shortcomings of such tools. This study performs a sentiment analysis and topic modeling on ChatGPT-based tweets. ChatGPT-based tweets are tweets that the authors extracted from Twitter using ChatGPT hashtags, in which users share their reviews and opinions about ChatGPT. The Latent Dirichlet Allocation (LDA) approach is employed to identify the most frequently discussed topics in relation to ChatGPT tweets. For the sentiment analysis, a deep transformer-based Bidirectional Encoder Representations from Transformers (BERT) model with three dense layers of neural networks is proposed. Additionally, machine and deep learning models with fine-tuned parameters are utilized for a comparative analysis. Experimental results demonstrate the superior performance of the proposed BERT model, achieving an accuracy of 96.49%.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

16 pages, 396 KiB

Open Access Article

D0L-System Inference from a Single Sequence with a Genetic Algorithm

by Mateusz Łabędzki and Olgierd Unold

Information 2023, 14(6), 343; https://doi.org/10.3390/info14060343 - 16 Jun 2023

Viewed by 979

Abstract

In this paper, we propose a new method for image-based grammatical inference of deterministic, context-free L-systems (D0L systems) from a single sequence. This approach is characterized by first parsing an input image into a sequence of symbols and then, using a genetic algorithm, attempting to infer a grammar that can generate this sequence. This technique has been tested using our test suite and compared to similar algorithms, showing promising results, including solving the problem for systems with more rules than in existing approaches. The tests show that it performs better than similar heuristic methods and can handle the same cases as arithmetic algorithms.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

15 pages, 8355 KiB

Open Access Article

Decipherment Challenges Due to Tamga and Letter Mix-Ups in an Old Hungarian Runic Inscription from the Altai Mountains

by Peter Z. Revesz

Information 2022, 13(9), 422; https://doi.org/10.3390/info13090422 - 07 Sep 2022

Cited by 1 | Viewed by 1961

Abstract

An Old Hungarian Runic inscription from the Altai Mountains with 40 signs has posed some special challenges for decipherment due to several letter mix-ups and the use of a tamga sign, which is the first reported use of a tamga within this type of script. This paper gives a complete and correct translation and draws some lessons that can be learned about decipherment. It introduces sign similarity matrices as a method of detecting accidental misspellings and shows that sign similarity matrices can be efficiently computed. It also explains the importance of simultaneously achieving the three criteria for a valid decipherment: correct signs, syntax, and semantics.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

17 pages, 1700 KiB

Open Access Article

Linguistic Profiling of Text Genres: An Exploration of Fictional vs. Non-Fictional Texts

by Akshay Mendhakar

Information 2022, 13(8), 357; https://doi.org/10.3390/info13080357 - 26 Jul 2022

Cited by 2 | Viewed by 4390

Abstract

Texts are composed for multiple audiences and for numerous purposes. Each form of text follows a set of guidelines and a structure that serve the purpose of writing. A common way of grouping texts is into text types. Describing these text types in terms of their linguistic characteristics is called ‘linguistic profiling of texts’. In this paper, we highlight the linguistic features that characterize a text type. The findings of the present study identify parts-of-speech distribution and tenses as the most important microscopic linguistic characteristics of the text. Additionally, we demonstrate the importance of other linguistic characteristics of texts and their relative importance (top 25th, 50th and 75th percentile) in linguistic profiling. The results are discussed with the use case of genre and subgenre classifications, with classification accuracies of 89% and 73%, respectively.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

9 pages, 8714 KiB

Open Access Article

A Proposed Translation of an Altai Mountain Inscription Presumed to Be from the 7th Century BC

by Peter Z. Revesz and Géza Varga

Information 2022, 13(5), 243; https://doi.org/10.3390/info13050243 - 10 May 2022

Cited by 1 | Viewed by 2732

Abstract

The purpose of this study is to examine an Old Hungarian inscription that was recently found in the Altai Mountains and was claimed to be over 2600 years old, which would make it the oldest extant example of the Old Hungarian script. A careful observation of the Altai script and a comparison with other Old Hungarian inscriptions were made, during which several errors were discovered in the interpretation of the Old Hungarian signs. After correcting for these errors, which were apparently introduced by mixing up the inscription with underlying engravings of animal images, a new sequence of Old Hungarian signs was obtained and translated into a new text. The context of the text indicates that the inscription is considerably more recent and is unlikely to be earlier than the 19th century.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

Review

25 pages, 1910 KiB

Open Access Review

A Literature Survey on Word Sense Disambiguation for the Hindi Language

by Vinto Gujjar, Neeru Mago, Raj Kumari, Shrikant Patel, Nalini Chintalapudi and Gopi Battineni

Information 2023, 14(9), 495; https://doi.org/10.3390/info14090495 - 07 Sep 2023

Cited by 3 | Viewed by 1475

Abstract

Word sense disambiguation (WSD) is a process used to determine the most appropriate meaning of a word in a given contextual framework, particularly when the word is ambiguous. While WSD has been extensively studied for English, it remains a challenging problem for resource-scarce languages such as Hindi. Therefore, it is crucial to address ambiguity in Hindi to effectively and efficiently utilize it on the web for various applications such as machine translation and information retrieval. The rich linguistic structure of Hindi, characterized by complex morphological variations and syntactic nuances, presents unique challenges in accurately determining the intended sense of a word within a given context. This review paper presents an overview of different approaches employed to resolve the ambiguity of Hindi words, including supervised, unsupervised, and knowledge-based methods. Additionally, the paper discusses applications, identifies open problems, presents conclusions, and suggests future research directions.

(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)
