Search Results (13)

Search Parameters:
Keywords = stylometric features

25 pages, 1887 KB  
Article
Does All or Nothing Always Work Best? In Search of Advantageous Representation of Attributes
by Urszula Stańczyk and Grzegorz Baron
Appl. Sci. 2026, 16(6), 2679; https://doi.org/10.3390/app16062679 - 11 Mar 2026
Abstract
Discretisation is a processing step often included in preliminary data preparation. Typically, when the input features have continuous domains and discrete forms are needed, all of them are translated into a categorical type at the same time, before data mining takes place. However, proceeding this way is not always the most advantageous for performance. The paper presents results from research in which the discretisation transformations were carried out sequentially, forward over the variables, with the selection based both on their values and on the importance of the attributes estimated by the constructed rankings. The experiments were executed on datasets from the area of stylometric analysis of texts, an application domain focused on recognising authorship based on individual characteristics of writing style. For the selected data mining techniques, performance was studied in the context of the transformed features. The observed trends indicate that, along with an enhanced understanding of the nature of the data, partial discretisation of feature sets can bring higher accuracy than transformation of the entire input domain, showing the merits of the described research methodology.
(This article belongs to the Section Computing and Artificial Intelligence)
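The ranking-driven partial discretisation described in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the equal-width binning scheme, the function name, and the idea of passing a precomputed attribute ranking are all assumptions.

```python
import numpy as np

def discretise_top_ranked(X, ranking, k, n_bins=3):
    """Discretise only the k top-ranked feature columns of X.

    X       -- 2D array, samples x features, continuous values
    ranking -- feature indices ordered from most to least important
    k       -- how many top-ranked features to discretise
    Columns selected by the ranking are replaced with equal-width bin
    indices (0 .. n_bins-1); all other columns keep their continuous
    values, yielding a partially discretised dataset.
    """
    Xd = X.astype(float).copy()
    for col in ranking[:k]:
        lo, hi = Xd[:, col].min(), Xd[:, col].max()
        # interior edges of n_bins equal-width intervals over [lo, hi]
        edges = np.linspace(lo, hi, n_bins + 1)[1:-1]
        Xd[:, col] = np.digitize(Xd[:, col], edges)
    return Xd
```

A classifier could then be retrained after each increment of k to locate the degree of partial discretisation that performs best.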

24 pages, 1783 KB  
Article
A Hybrid Human-Centric Framework for Discriminating Engine-like from Human-like Chess Play: A Proof-of-Concept Study
by Zura Kevanishvili and Maksim Iavich
Appl. Syst. Innov. 2026, 9(1), 11; https://doi.org/10.3390/asi9010011 - 26 Dec 2025
Abstract
The rapid growth of online chess has intensified the challenge of distinguishing engine-assisted from authentic human play, exposing the limitations of existing approaches that rely solely on deterministic evaluation metrics. This study introduces a proof-of-concept hybrid framework for discriminating between engine-like and human-like chess play patterns, integrating Stockfish’s deterministic evaluations with stylometric behavioral features derived from the Maia engine. Key metrics include Centipawn Loss (CPL), Mismatch Move Match Probability (MMMP), and a novel Curvature-Based Stability (ΔS) indicator. These features were incorporated into a convolutional neural network (CNN) classifier and evaluated on a controlled benchmark dataset of 1000 games, where ‘suspicious’ gameplay was algorithmically generated to simulate engine-optimal patterns, while ‘clean’ play was modeled using Maia’s human-like predictions. Results demonstrate the framework’s ability to discriminate between these behavioral archetypes, with the hybrid model achieving a macro F1-score of 0.93, significantly outperforming the Stockfish-only baseline (F1 = 0.87), as validated by McNemar’s test (p = 0.0153). Feature ablation confirmed that Maia-derived features reduced false negatives and improved recall, while ΔS enhanced robustness. This work establishes a methodological foundation for behavioral pattern discrimination in chess, demonstrating the value of combining deterministic and human-centric modeling. Beyond chess, the approach offers a template for behavioral anomaly analysis in cybersecurity, education, and other decision-based domains, with real-world validation on adjudicated misconduct cases identified as the essential next step.
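Of the metrics named in the abstract above, Centipawn Loss is simple enough to sketch directly. This follows the common CPL convention (evaluation drop per move, floored at zero), not the paper's exact Stockfish pipeline; the function name and the idea of supplying two precomputed evaluation lists are assumptions.

```python
def average_centipawn_loss(best_evals, played_evals):
    """Average Centipawn Loss (CPL) over a game.

    best_evals   -- engine evaluation (centipawns, from the side to
                    move) of the engine's best move in each position
    played_evals -- evaluation of the move actually played
    Each move's loss is the evaluation drop, floored at zero so that a
    move the engine rates above its own first line never counts as a
    negative loss.
    """
    losses = [max(best - played, 0)
              for best, played in zip(best_evals, played_evals)]
    return sum(losses) / len(losses)
```

Consistently near-zero CPL over many moves is the kind of engine-like signal the deterministic baseline in the paper relies on.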

15 pages, 1374 KB  
Article
Stylometric Analysis of Sustainable Central Bank Communications: Revealing Authorial Signatures in Monetary Policy Statements
by Hakan Emekci and İbrahim Özkan
Sustainability 2025, 17(20), 8979; https://doi.org/10.3390/su17208979 - 10 Oct 2025
Abstract
Sustainable economic development requires transparent and consistent institutional communication from monetary authorities to maintain long-term financial stability and public trust. This study investigates the latent authorial structure and stylistic heterogeneity of central bank communications by applying stylometric analysis and unsupervised machine learning to official announcements of the Central Bank of the Republic of Turkey (CBRT). Using a dataset of 557 press releases from 2006 to 2017, we extract a range of linguistic features at both sentence and document levels—including sentence length, punctuation density, word length, and type–token ratios. These features are reduced using Principal Component Analysis (PCA) and clustered via Hierarchical Clustering on Principal Components (HCPC), revealing three distinct authorial groups within the CBRT’s communications. The robustness of these clusters is validated using multidimensional scaling (MDS) on character-level and word-level n-gram distances. The analysis finds consistent stylistic differences between clusters, with implications for authorship attribution, tone variation, and communication strategy. Notably, sentiment analysis indicates that one authorial cluster tends to exhibit more negative tonal features, suggesting potential bias or divergence in internal communication style. These findings challenge the conventional assumption of institutional homogeneity and highlight the presence of distinct communicative voices within the central bank. Furthermore, the results suggest that stylistic variation—though often subtle—may convey unintended policy signals to markets, especially in contexts where linguistic shifts are closely scrutinized. This research contributes to the emerging intersection of natural language processing, monetary economics, and institutional transparency. It demonstrates the efficacy of stylometric techniques in revealing the hidden structure of policy discourse and suggests that linguistic analytics can offer valuable insights into the internal dynamics, credibility, and effectiveness of monetary authorities. These findings contribute to sustainable financial governance by demonstrating how AI-driven analysis can enhance institutional transparency, promote consistent policy communication, and support long-term economic stability—key pillars of sustainable development.
(This article belongs to the Special Issue Public Policy and Economic Analysis in Sustainability Transitions)
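The document-level features listed in the abstract above (sentence length, punctuation density, type–token ratio) can be sketched in a few lines. This is a deliberately simplified illustration: sentence splitting on ./!/?, the word regex, and the function name are assumptions, not the authors' preprocessing.

```python
import re

def stylometric_features(text):
    """Document-level stylometric features of the kind the abstract
    lists: mean sentence length (in words), punctuation density
    (punctuation marks per character), and type-token ratio."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text.lower())
    punct = re.findall(r"[.,;:!?]", text)
    return {
        "mean_sentence_len": len(words) / max(len(sentences), 1),
        "punct_density": len(punct) / max(len(text), 1),
        "type_token_ratio": len(set(words)) / max(len(words), 1),
    }
```

Vectors of such features, computed per press release, are the sort of input one would then reduce with PCA and cluster with HCPC as the study describes.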

16 pages, 1051 KB  
Article
Kafka’s Literary Style: A Mixed-Method Approach
by Carsten Strathausen, Wenyi Shang and Andrei Kazakov
Humanities 2025, 14(3), 61; https://doi.org/10.3390/h14030061 - 12 Mar 2025
Abstract
In this essay, we examine how the polyvalence of meaning in Kafka’s texts is engineered both semantically (on the narrative level) and syntactically (on the linguistic level), and we ask whether a computational approach can shed new light on the long-standing debate about the major characteristics of Kafka’s literary style. A mixed-method approach means that we seek out points of connection that interlink traditional humanist (i.e., interpretative) and computational (i.e., quantitative) methods of investigation. Following the introduction, the second section of our article provides a critical overview of the existing scholarship from both a humanist and a computational perspective. We argue that the main methodological difference between traditional humanist and AI-enhanced computational studies of Kafka’s literary style lies not in the use of statistics but in the new interpretative possibilities enabled by AI methods to explore stylistic features beyond the scope of human comprehension. In the third and fourth sections of our article, we will introduce our own stylometric approach to Kafka, detail our methods, and interpret our findings. Rather than focusing on training an AI model capable of accurately attributing authorship to Kafka, we examine whether AI could help us detect significant stylistic differences between the writing Kafka himself published during his lifetime (Kafka Core) and his posthumous writings edited and published by Max Brod.
(This article belongs to the Special Issue Franz Kafka in the Age of Artificial Intelligence)

37 pages, 2517 KB  
Article
Multitask Learning for Authenticity and Authorship Detection
by Gurunameh Singh Chhatwal and Jiashu Zhao
Electronics 2025, 14(6), 1113; https://doi.org/10.3390/electronics14061113 - 12 Mar 2025
Abstract
Traditionally, detecting misinformation (real vs. fake) and authorship (human vs. AI) have been addressed as separate classification tasks, leaving a critical gap in real-world scenarios where these challenges increasingly overlap. Motivated by this need, we introduce a unified framework—the Shared–Private Synergy Model (SPSM)—that tackles both authenticity and authorship classification under one umbrella. Our approach is tested on a novel multi-label dataset and evaluated through an exhaustive suite of methods, including traditional machine learning, stylometric feature analysis, and pretrained large language model-based classifiers. Notably, the proposed SPSM architecture incorporates multitask learning, shared–private layers, and hierarchical dependencies, achieving state-of-the-art results with over 96% accuracy for authenticity (real vs. fake) and 98% for authorship (human vs. AI). Beyond its superior performance, our approach is interpretable: stylometric analyses reveal how factors like sentence complexity and entity usage can differentiate between fake news and AI-generated text. Meanwhile, LLM-based classifiers show moderate success. Comprehensive ablation studies further highlight the impact of task-specific architectural enhancements such as shared layers and balanced task losses on boosting classification performance. Our findings underscore the effectiveness of synergistic PLM architectures for tackling complex classification tasks while offering insights into linguistic and structural markers of authenticity and attribution. This study provides a strong foundation for future research, including multimodal detection, cross-lingual expansion, and the development of lightweight, deployable models to combat misinformation in the evolving digital landscape and smart society.

30 pages, 1001 KB  
Article
Genre Classification of Books in Russian with Stylometric Features: A Case Study
by Natalia Vanetik, Margarita Tiamanova, Genady Kogan and Marina Litvak
Information 2024, 15(6), 340; https://doi.org/10.3390/info15060340 - 7 Jun 2024
Abstract
Within the literary domain, genres function as fundamental organizing concepts that provide readers, publishers, and academics with a unified framework. Genres are discrete categories that are distinguished by common stylistic, thematic, and structural components. They facilitate the categorization process and improve our understanding of a wide range of literary expressions. In this paper, we introduce a new dataset for genre classification of Russian books, covering 11 literary genres. We also perform dataset evaluation for the tasks of binary and multi-class genre identification. Through extensive experimentation and analysis, we explore the effectiveness of different text representations, including stylometric features, in genre classification. Our findings clarify the challenges present in classifying Russian literature by genre, revealing insights into the performance of different models across various genres. Furthermore, we address several research questions regarding the difficulty of multi-class classification compared to binary classification, and the impact of stylometric features on classification accuracy.
(This article belongs to the Special Issue Text Mining: Challenges, Algorithms, Tools and Applications)

32 pages, 2235 KB  
Article
Importance of Characteristic Features and Their Form for Data Exploration
by Urszula Stańczyk, Beata Zielosko and Grzegorz Baron
Entropy 2024, 26(5), 404; https://doi.org/10.3390/e26050404 - 6 May 2024
Abstract
The nature of the input features is one of the key factors indicating what kind of tools, methods, or approaches can be used in a knowledge discovery process. Depending on the characteristics of the available attributes, some techniques could lead to unsatisfactory performance or even may not proceed at all without additional preprocessing steps. The types of variables and their domains affect performance. Any changes to their form can influence it as well, or even enable some learners. On the other hand, the relevance of features for a task constitutes another element with a noticeable impact on data exploration. The importance of attributes can be estimated through the application of mechanisms belonging to the feature selection and reduction area, such as rankings. In the described research framework, the data form was conditioned on relevance by the proposed procedure of gradual discretisation controlled by a ranking of attributes. Supervised and unsupervised discretisation methods were applied to datasets from the stylometric domain and the task of binary authorship attribution. For the selected classifiers, extensive tests were performed, and they indicated many cases of enhanced prediction for partially discretised datasets.

18 pages, 506 KB  
Article
Morphosyntactic Annotation in Literary Stylometry
by Robert Gorman
Information 2024, 15(4), 211; https://doi.org/10.3390/info15040211 - 9 Apr 2024
Abstract
This article investigates the stylometric usefulness of morphosyntactic annotation. Focusing on the style of literary texts, it argues that including morphosyntactic annotation in analyses of style has at least two important advantages: (1) maintaining a topic-agnostic approach and (2) providing input variables that are interpretable in traditional grammatical terms. This study demonstrates how widely available Universal Dependency parsers can generate useful morphological and syntactic data for texts in a range of languages. These data can serve as the basis for input features that are strongly informative about the style of individual novels, as indicated by accuracy in classification tests. The interpretability of such features is demonstrated by a discussion of the weakness of an “authorial” signal as opposed to the clear distinction among individual works.
(This article belongs to the Special Issue Computational Linguistics and Natural Language Processing)

16 pages, 6689 KB  
Article
The Question of Studying Information Entropy in Poetic Texts
by Olga Kozhemyakina, Vladimir Barakhnin, Natalia Shashok and Elina Kozhemyakina
Appl. Sci. 2023, 13(20), 11247; https://doi.org/10.3390/app132011247 - 13 Oct 2023
Abstract
One of the approaches to quantitative text analysis is to represent a given text in the form of a time series, which can be followed by an information entropy study for different text representations, such as “symbolic entropy”, “phonetic entropy” and “emotional entropy” of various orders. Studying authors’ styles based on such entropic characteristics of their works seems to be a promising area in the field of information analysis. In this work, the calculations of entropy values of the first, second and third order for the corpus of poems by A.S. Pushkin and other poets from the Golden Age of Russian Poetry were carried out. The values of “symbolic entropy”, “phonetic entropy” and “emotional entropy” and their mathematical expectations and variances were calculated for given corpora using the software application that automatically extracts statistical information, which is potentially applicable to tasks that identify features of the author’s style. The statistical data extracted could become the basis of the stylometric classification of authors by entropy characteristics.
(This article belongs to the Special Issue Natural Language Processing: Theory, Methods and Applications)
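First- and higher-order "symbolic entropy" of the kind the abstract above describes can be sketched as the Shannon entropy of the character n-gram distribution of a text. This mirrors the general idea only; the authors' alphabet, normalisation, and exact order conventions are not specified here, and the function name is an assumption.

```python
import math
from collections import Counter

def symbolic_entropy(text, order=1):
    """Shannon entropy (in bits) of the distribution of character
    n-grams of length `order` in the text. order=1 gives first-order
    symbolic entropy over single characters, order=2 over bigrams,
    and so on."""
    grams = [text[i:i + order] for i in range(len(text) - order + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values())
```

Means and variances of such values over an author's poems are the kind of statistics the study proposes as a basis for stylometric classification.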

22 pages, 5373 KB  
Article
Secret Key Distillation with Speech Input and Deep Neural Network-Controlled Privacy Amplification
by Jelica Radomirović, Milan Milosavljević, Zoran Banjac and Miloš Jovanović
Mathematics 2023, 11(6), 1524; https://doi.org/10.3390/math11061524 - 21 Mar 2023
Abstract
We propose a new high-speed secret key distillation system via public discussion based on the common randomness contained in the speech signal of the protocol participants. The proposed system consists of subsystems for quantization, advantage distillation, information reconciliation, an estimator for predicting conditional Rényi entropy, and universal hashing. The parameters of the system are optimized in order to achieve the maximum key distillation rate. By introducing a deep neural block for the prediction of conditional Rényi entropy, the lengths of the distilled secret keys are adaptively determined. The optimized system gives a key rate of over 11% and negligible information leakage to the eavesdropper, while NIST tests show the high cryptographic quality of produced secret keys. For a sampling rate of 16 kHz and quantization of input speech signals with 16 bits per sample, the system provides secret keys at a rate of 28 kb/s. This speed opens the possibility of wider application of this technology in the field of contemporary information security.

24 pages, 5111 KB  
Article
Post-Authorship Attribution Using Regularized Deep Neural Network
by Abiodun Modupe, Turgay Celik, Vukosi Marivate and Oludayo O. Olugbara
Appl. Sci. 2022, 12(15), 7518; https://doi.org/10.3390/app12157518 - 26 Jul 2022
Abstract
Post-authorship attribution is a scientific process of using stylometric features to identify the genuine writer of an online text snippet such as an email, blog, forum post, or chat log. It has useful applications in manifold domains, for instance, in a verification process to proactively detect misogynistic, misandrist, xenophobic, and abusive posts on the internet or social networks. The process assumes that texts can be characterized by sequences of words that agglutinate the functional and content lyrics of a writer. However, defining an appropriate characterization of text to capture the unique writing style of an author is a complex endeavor in the discipline of computational linguistics. Moreover, posts are typically short texts with obfuscating vocabularies that might impact the accuracy of authorship attribution. The vocabularies include idioms, onomatopoeias, homophones, phonemes, synonyms, acronyms, anaphora, and polysemy. The method of the regularized deep neural network (RDNN) is introduced in this paper to circumvent the intrinsic challenges of post-authorship attribution. It is based on a convolutional neural network, bidirectional long short-term memory encoder, and distributed highway network. The neural network was used to extract lexical stylometric features that are fed into the bidirectional encoder to extract a syntactic feature-vector representation. The feature vector was then supplied as input to the distributed highway network for regularization to minimize the network-generalization error. The regularized feature vector was ultimately passed to the bidirectional decoder to learn the writing style of an author. The feature-classification layer consists of a fully connected network and a SoftMax function to make the prediction. The RDNN method was tested against thirteen state-of-the-art methods using four benchmark experimental datasets to validate its performance. Experimental results have demonstrated the effectiveness of the method when compared to the existing state-of-the-art methods on three datasets while producing comparable results on one dataset.
(This article belongs to the Special Issue Application of Machine Learning in Text Mining)

18 pages, 8411 KB  
Review
Stylometry and Numerals Usage: Benford’s Law and Beyond
by Andrei V. Zenkov
Stats 2021, 4(4), 1051-1068; https://doi.org/10.3390/stats4040060 - 14 Dec 2021
Abstract
We suggest two approaches to the statistical analysis of texts, both based on the study of numerals occurrence in literary texts. The first approach is related to Benford’s Law and the analysis of the frequency distribution of various leading digits of numerals contained in the text. In coherent literary texts, the share of the leading digit 1 is even larger than prescribed by Benford’s Law and can reach 50 percent. The frequencies of occurrence of the digit 1, as well as, to a lesser extent, the digits 2 and 3, are usually a characteristic feature of the author’s style, manifested in all (sufficiently long) literary texts of any author. This approach is convenient for testing whether a group of texts has common authorship: the latter is dubious if the frequency distributions are sufficiently different. The second approach is the extension of the first one and requires the study of the frequency distribution of numerals themselves (not their leading digits). The approach yields non-trivial information about the author, stylistic and genre peculiarities of the texts and is suited for advanced stylometric analysis. The proposed approaches are illustrated by examples of computer analysis of the literary texts in English and Russian.
(This article belongs to the Special Issue Benford's Law(s) and Applications)
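The first approach in the abstract above, comparing leading-digit frequencies of a text's numerals against the Benford expectation log10(1 + 1/d), can be sketched as follows. The numeral-extraction regex and the function name are illustrative assumptions; the authors' corpus preprocessing is not reproduced.

```python
import math
import re
from collections import Counter

def leading_digit_shares(text):
    """Observed share of each leading digit 1-9 among the numerals in
    a text, alongside the Benford expectation log10(1 + 1/d). The
    abstract notes that in literary texts the share of digit 1
    typically exceeds the Benford value."""
    digits = [m.group()[0] for m in re.finditer(r"[1-9]\d*", text)]
    counts = Counter(digits)
    total = sum(counts.values()) or 1
    observed = {str(d): counts.get(str(d), 0) / total for d in range(1, 10)}
    benford = {str(d): math.log10(1 + 1 / d) for d in range(1, 10)}
    return observed, benford
```

Large divergence between the observed distributions of two text groups is the signal the authors use to cast doubt on common authorship.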

18 pages, 847 KB  
Article
Language-Independent Fake News Detection: English, Portuguese, and Spanish Mutual Features
by Hugo Queiroz Abonizio, Janaina Ignacio de Morais, Gabriel Marques Tavares and Sylvio Barbon Junior
Future Internet 2020, 12(5), 87; https://doi.org/10.3390/fi12050087 - 11 May 2020
Abstract
Online Social Media (OSM) have been substantially transforming the process of spreading news, improving its speed, and reducing barriers toward reaching out to a broad audience. However, OSM are very limited in providing mechanisms to check the credibility of news propagated through their structure. The majority of studies on automatic fake news detection are restricted to English documents, with few works evaluating other languages, and none comparing language-independent characteristics. Moreover, the spreading of deceptive news tends to be a worldwide problem; therefore, this work evaluates textual features that are not tied to a specific language when describing textual data for detecting news. Corpora of news written in American English, Brazilian Portuguese, and Spanish were explored to study complexity, stylometric, and psychological text features. The extracted features support the detection of fake, legitimate, and satirical news. We compared four machine learning algorithms (k-Nearest Neighbors (k-NN), Support Vector Machine (SVM), Random Forest (RF), and Extreme Gradient Boosting (XGB)) to induce the detection model. Results show our proposed language-independent features are successful in describing fake, satirical, and legitimate news across three different languages, with an average detection accuracy of 85.3% with RF.
(This article belongs to the Special Issue Social Web, New Media, Algorithms and Power)
