Search Results (4)

Search Parameters:
Keywords = authorship marker

31 pages, 5323 KiB  
Article
Learning the Style via Mixed SN-Grams: An Evaluation in Authorship Attribution
by Juan Pablo Francisco Posadas-Durán, Germán Ríos-Toledo, Erick Velázquez-Lozada, J. A. de Jesús Osuna-Coutiño, Madaín Pérez-Patricio and Fernando Pech May
AI 2025, 6(5), 104; https://doi.org/10.3390/ai6050104 - 20 May 2025
Viewed by 1003
Abstract
This study addresses the problem of authorship attribution with a novel method that models writing style using subtrees of dependency parse trees. The method exploits the syntactic information of sentences through mixed syntactic n-grams (mixed sn-grams) and comprises an algorithm that generates mixed sn-grams by integrating words, POS tags, and dependency relation tags. The mixed sn-grams serve as style markers that feed machine learning methods such as an SVM. A comparative analysis evaluated the proposed mixed sn-grams against homogeneous sn-grams on the PAN-CLEF 2012 and CCAT50 datasets. Experiments with PAN-CLEF 2012 showed the potential of mixed sn-grams to model writing style by outperforming homogeneous sn-grams. Experiments with CCAT50 likewise showed that training with mixed sn-grams improves accuracy over homogeneous sn-grams, with the POS-Word category yielding the best result. These results suggest that mixed sn-grams constitute effective stylistic markers for building a reliable writing style model that machine learning algorithms can learn.
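As a rough illustration of the idea (a minimal sketch, not the authors' implementation), the snippet below extracts mixed sn-grams from a toy, pre-parsed sentence by walking head-to-dependent paths of the dependency tree and realizing each position as either the word, its POS tag, or its dependency relation. Counts of such sn-grams could then be vectorized and passed to an SVM, e.g. scikit-learn's LinearSVC, matching the classifier named in the abstract.

```python
# Minimal sketch of mixed syntactic n-grams (illustrative, not the paper's code):
# n-grams taken along head-to-dependent paths of a dependency tree, where each
# position may be realized as the word, its POS tag, or its dependency relation.
from itertools import product

# Toy dependency parse: (word, pos, deprel, head_index); head -1 marks the root.
SENT = [
    ("She",    "PRON", "nsubj", 1),
    ("writes", "VERB", "root", -1),
    ("short",  "ADJ",  "amod",  3),
    ("essays", "NOUN", "obj",   1),
]

def paths(sent, n):
    """Yield all head-to-dependent paths of length n in the dependency tree."""
    children = {i: [] for i in range(len(sent))}
    for i, (_, _, _, head) in enumerate(sent):
        if head >= 0:
            children[head].append(i)
    def walk(path):
        if len(path) == n:
            yield tuple(path)
            return
        for child in children[path[-1]]:
            yield from walk(path + [child])
    for i in range(len(sent)):
        yield from walk([i])

def mixed_sngrams(sent, n, mix=("word", "pos")):
    """Realize each path under every assignment of the chosen element types."""
    field = {"word": 0, "pos": 1, "deprel": 2}
    for path in paths(sent, n):
        for combo in product(mix, repeat=n):
            yield tuple(sent[i][field[t]] for i, t in zip(path, combo))

# e.g. ('writes', 'PRON'), ('VERB', 'She'), ... for the head->dependent edges
print(sorted(set(mixed_sngrams(SENT, 2, mix=("word", "pos")))))
```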

37 pages, 2517 KiB  
Article
Multitask Learning for Authenticity and Authorship Detection
by Gurunameh Singh Chhatwal and Jiashu Zhao
Electronics 2025, 14(6), 1113; https://doi.org/10.3390/electronics14061113 - 12 Mar 2025
Cited by 1 | Viewed by 1105
Abstract
Traditionally, detecting misinformation (real vs. fake) and authorship (human vs. AI) have been addressed as separate classification tasks, leaving a critical gap in real-world scenarios where these challenges increasingly overlap. Motivated by this need, we introduce a unified framework—the Shared–Private Synergy Model (SPSM)—that tackles both authenticity and authorship classification under one umbrella. Our approach is tested on a novel multi-label dataset and evaluated through an exhaustive suite of methods, including traditional machine learning, stylometric feature analysis, and pretrained large language model-based classifiers. Notably, the proposed SPSM architecture incorporates multitask learning, shared–private layers, and hierarchical dependencies, achieving state-of-the-art results with over 96% accuracy for authenticity (real vs. fake) and 98% for authorship (human vs. AI). Beyond its superior performance, our approach is interpretable: stylometric analyses reveal how factors like sentence complexity and entity usage can differentiate between fake news and AI-generated text. Meanwhile, LLM-based classifiers show moderate success. Comprehensive ablation studies further highlight the impact of task-specific architectural enhancements such as shared layers and balanced task losses on boosting classification performance. Our findings underscore the effectiveness of synergistic PLM architectures for tackling complex classification tasks while offering insights into linguistic and structural markers of authenticity and attribution. This study provides a strong foundation for future research, including multimodal detection, cross-lingual expansion, and the development of lightweight, deployable models to combat misinformation in the evolving digital landscape and smart society.
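The shared–private pattern can be sketched in a few lines of PyTorch (an illustration of the general idea under assumed input shapes, not the paper's SPSM code): a shared encoder feeds both tasks, each task keeps a private encoder, and the two cross-entropy losses are combined with balancing weights.

```python
# Minimal shared-private multitask sketch (illustrative; shapes are assumptions).
import torch
import torch.nn as nn

class SharedPrivateModel(nn.Module):
    def __init__(self, in_dim=768, hidden=256):
        super().__init__()
        # One encoder shared across tasks, plus one private encoder per task.
        self.shared = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.private_auth = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.private_real = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.head_auth = nn.Linear(hidden * 2, 2)  # human vs. AI
        self.head_real = nn.Linear(hidden * 2, 2)  # real vs. fake

    def forward(self, x):
        s = self.shared(x)
        logits_auth = self.head_auth(torch.cat([s, self.private_auth(x)], -1))
        logits_real = self.head_real(torch.cat([s, self.private_real(x)], -1))
        return logits_auth, logits_real

model = SharedPrivateModel()
x = torch.randn(8, 768)                 # e.g. pooled PLM embeddings (assumed)
y_auth = torch.randint(0, 2, (8,))
y_real = torch.randint(0, 2, (8,))
la, lr = model(x)
loss = 0.5 * nn.functional.cross_entropy(la, y_auth) \
     + 0.5 * nn.functional.cross_entropy(lr, y_real)  # balanced task losses
loss.backward()
```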

28 pages, 1581 KiB  
Article
Authorship Attribution in Less-Resourced Languages: A Hybrid Transformer Approach for Romanian
by Melania Nitu and Mihai Dascalu
Appl. Sci. 2024, 14(7), 2700; https://doi.org/10.3390/app14072700 - 23 Mar 2024
Cited by 1 | Viewed by 2431
Abstract
Authorship attribution for less-resourced languages like Romanian, characterized by the scarcity of large annotated datasets and the limited number of available NLP tools, poses unique challenges. This study focuses on a hybrid Transformer that combines handcrafted linguistic features, ranging from surface indices like word frequencies to syntax, semantics, and discourse markers, with contextualized embeddings from a Romanian BERT encoder. The methodology involves extracting contextualized representations from a pre-trained Romanian BERT model and concatenating them with linguistic features, selected by their Kruskal–Wallis mean rank, to create a hybrid input vector for a classification layer. We compare this approach with a baseline ensemble of seven machine learning classifiers for authorship attribution employing majority soft voting. We conduct studies on both long texts (full texts) and short texts (paragraphs), with 19 authors and a subset of 10. Our hybrid Transformer outperforms existing methods, achieving an F1 score of 0.87 on the full dataset of the 19-author set (an 11% improvement) and an F1 score of 0.95 on the 10-author subset (a 10% increase over previous research). We conduct a linguistic analysis leveraging textual complexity indices and employ McNemar and Cochran’s Q statistical tests to evaluate the performance evolution across the three best models, while highlighting patterns in misclassifications. Our research contributes to diversifying methodologies for effective authorship attribution in resource-constrained linguistic environments. Furthermore, we publicly release the full dataset and the codebase associated with this study to encourage further exploration and development in this field.
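The hybrid input construction can be sketched as follows (synthetic data and assumed shapes for illustration, not the authors' released code): handcrafted features are filtered with a Kruskal–Wallis test across author classes and concatenated with pooled BERT embeddings before classification.

```python
# Minimal sketch of the hybrid feature vector (illustrative; data is synthetic).
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(0)
n_docs, n_feats, emb_dim, n_authors = 60, 20, 768, 3
X_ling = rng.normal(size=(n_docs, n_feats))   # handcrafted linguistic indices
X_bert = rng.normal(size=(n_docs, emb_dim))   # e.g. pooled BERT output (assumed)
y = rng.integers(0, n_authors, size=n_docs)   # author labels

# Keep only features whose distribution differs across authors (p < 0.05).
keep = [j for j in range(n_feats)
        if kruskal(*(X_ling[y == a, j] for a in range(n_authors))).pvalue < 0.05]

# Hybrid input vector for the classification layer: embeddings + kept features.
X_hybrid = np.concatenate([X_bert, X_ling[:, keep]], axis=1)
print(X_hybrid.shape)
```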

18 pages, 886 KiB  
Article
Retraction Notices: Who Authored Them?
by Shaoxiong (Brian) Xu and Guangwei Hu
Publications 2018, 6(1), 2; https://doi.org/10.3390/publications6010002 - 3 Jan 2018
Cited by 17 | Viewed by 11061
Abstract
Unlike other academic publications whose authorship is eagerly claimed, the provenance of retraction notices (RNs) is often obscured, presumably because the retraction of published research is associated with undesirable behavior and thus carries negative consequences for the individuals involved. This ambiguity of authorship, however, has serious ethical ramifications and creates methodological problems for research on RNs that requires clear authorship attribution. This article reports a study conducted to identify RN textual features that can be used to disambiguate obscured authorship, to ascertain the extent of authorship evasion in RNs from two disciplinary clusters, and to determine whether the disciplines varied in the distributions of different types of RN authorship. Drawing on a corpus of 370 RNs archived in the Web of Science for the hard discipline of Cell Biology and the soft disciplines of Business, Finance, and Management, the study identified 25 types of textual markers that can be used to disambiguate authorship. It revealed that only 25.68% of the RNs could be unambiguously attributed to authors of the retracted articles, alone or jointly, and that authorship could not be determined for 28.92% of the RNs. Furthermore, the study found marked disciplinary differences across the categories of RN authorship. These results point to the need for more explicit editorial requirements on RN authorship and their strict enforcement.
(This article belongs to the Special Issue Scientific Ethics)
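A marker-based disambiguation step might look like the sketch below; note that the regex patterns here are invented for illustration, whereas the paper derives its 25 marker types empirically from the corpus.

```python
# Hypothetical sketch of marker-based RN authorship disambiguation.
# The patterns below are illustrative inventions, not the paper's markers.
import re

MARKERS = [
    (r"\bwe(, the authors,)? (wish to )?retract\b",              "authors"),
    (r"\bthe authors? (has|have) requested\b",                   "authors"),
    (r"\bthe editor(s)?(-in-chief)? (is|are|has|have) retract",  "editor"),
    (r"\bthe publisher (is retracting|retracts)\b",              "publisher"),
]

def classify_rn(text):
    """Attribute an RN to whichever parties its textual markers point at."""
    hits = {label for pattern, label in MARKERS
            if re.search(pattern, text, re.IGNORECASE)}
    if not hits:
        return "undetermined"
    return "joint" if len(hits) > 1 else hits.pop()

print(classify_rn("We, the authors, wish to retract this article."))  # authors
```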
