Search Results (9)

Search Parameters:
Keywords = human vs. AI text generation

19 pages, 19843 KB  
Article
Distinguishing Human- and AI-Generated Image Descriptions Using CLIP Similarity and Transformer-Based Classification
by Daniela Onita, Matei-Vasile Căpîlnaș and Adriana Baciu (Birlutiu)
Mathematics 2025, 13(19), 3228; https://doi.org/10.3390/math13193228 - 9 Oct 2025
Abstract
Recent advances in vision-language models such as BLIP-2 have made AI-generated image descriptions increasingly fluent and difficult to distinguish from human-authored texts. This paper investigates whether such differences can still be reliably detected by introducing a novel bilingual dataset of English and Romanian captions. The English subset was derived from the T4SA dataset, while AI-generated captions were produced with BLIP-2 and translated into Romanian using MarianMT; human-written Romanian captions were collected via manual annotation. We analyze the problem from two perspectives: (i) semantic alignment, using CLIP similarity, and (ii) supervised classification with both traditional and transformer-based models. Our results show that BERT achieves over 95% cross-validation accuracy (F1 = 0.95, ROC AUC = 0.99) in distinguishing AI from human texts, while simpler classifiers such as Logistic Regression also reach competitive scores (F1 ≈ 0.88). Beyond classification, semantic and linguistic analyses reveal systematic cross-lingual differences: English captions are significantly longer and more verbose, whereas Romanian texts—often more concise—exhibit higher alignment with visual content. Romanian was chosen as a representative low-resource language, where studying such differences provides insights into multilingual AI detection and challenges in vision-language modeling. These findings emphasize the novelty of our contribution: a publicly available bilingual dataset and the first systematic comparison of human vs. AI-generated captions in both high- and low-resource languages.
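The CLIP-based semantic alignment this abstract describes boils down to cosine similarity between an image embedding and a caption embedding. A minimal stdlib sketch of that scoring step, using short toy vectors as stand-ins for real CLIP embeddings (which are typically 512-dimensional):

```python
import math

def cosine_similarity(a, b):
    # CLIP-style alignment score: cosine of the angle between an
    # image embedding and a text embedding.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings; the values are illustrative only.
image_emb = [0.2, 0.7, 0.1, 0.5]
human_caption_emb = [0.25, 0.65, 0.15, 0.45]
ai_caption_emb = [0.9, 0.1, 0.4, 0.0]

# A caption better aligned with the image scores higher.
print(cosine_similarity(image_emb, human_caption_emb) >
      cosine_similarity(image_emb, ai_caption_emb))
```

In the paper's setting, a real pipeline would obtain the embeddings from a pretrained CLIP model rather than hand-written vectors; the comparison logic stays the same.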

16 pages, 1471 KB  
Article
Leveraging Explainable AI for LLM Text Attribution: Differentiating Human-Written and Multiple LLM-Generated Text
by Ayat A. Najjar, Huthaifa I. Ashqar, Omar Darwish and Eman Hammad
Information 2025, 16(9), 767; https://doi.org/10.3390/info16090767 - 4 Sep 2025
Abstract
The development of generative AI Large Language Models (LLMs) has raised the alarm regarding the identification of content produced by generative AI vs. humans. Issues arise, for example, when students rely heavily on such tools in a manner that can hinder the development of their writing or coding skills; plagiarism raises further concerns. This study aims to support efforts to detect and identify textual content generated using LLM tools. We hypothesize that LLM-generated text is detectable by machine learning (ML) and investigate ML models that can recognize and differentiate between texts generated by humans and multiple LLM tools. We used a dataset of student-written text in comparison with LLM-written text. We leveraged several ML and Deep Learning (DL) algorithms, such as Random Forest (RF) and Recurrent Neural Networks (RNNs), and utilized Explainable Artificial Intelligence (XAI) to understand the important features in attribution. Our method is divided into (1) binary classification to differentiate between human-written and AI-generated text and (2) multi-class classification to differentiate between human-written text and text generated by five different LLM tools (ChatGPT, LLaMA, Google Bard, Claude, and Perplexity). Results show high accuracy in both binary and multi-class classification. Our model achieved an accuracy of 98.5%, outperforming GPTZero (78.3%). Notably, GPTZero was unable to recognize about 4.2% of the observations, whereas our model classified the complete test dataset. XAI results showed that understanding feature importance across different classes enables detailed author/source profiles, aiding attribution and supporting plagiarism detection by highlighting unique stylistic and structural elements, thereby ensuring robust verification of content originality.
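The feature-importance idea behind the XAI analysis can be illustrated without any ML library: a crude attribution signal is simply the gap between the two classes' mean per-document frequency of a stylometric marker. This is a stand-in for the authors' actual XAI method, and the documents and features below are invented examples:

```python
def feature_importance(human_docs, ai_docs, features):
    # Crude stylometric importance: how differently the two classes
    # use each marker, measured as the absolute gap between the
    # classes' mean per-document counts.
    def mean_count(docs, feat):
        return sum(d.count(feat) for d in docs) / len(docs)
    return {f: abs(mean_count(human_docs, f) - mean_count(ai_docs, f))
            for f in features}

# Invented toy corpora, not the paper's dataset.
human = ["i think this is fine honestly", "well i guess it works"]
ai = ["furthermore the results demonstrate", "moreover the analysis demonstrates"]
scores = feature_importance(human, ai, ["i ", "furthermore", "moreover"])
```

A real pipeline would use trained-model attributions (e.g., per-feature importances from the Random Forest), but the interpretation step is the same: features with large gaps are the ones that profile an author or source.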
(This article belongs to the Special Issue Generative AI Transformations in Industrial and Societal Applications)

35 pages, 3420 KB  
Systematic Review
Effectiveness and Adherence of Standalone Digital Tobacco Cessation Modalities: A Systematic Review of Systematic Reviews
by Maria Pia Di Palo, Federica Di Spirito, Marina Garofano, Rosaria Del Sorbo, Mario Caggiano, Francesco Giordano, Marianna Bartolomeo, Colomba Pessolano, Massimo Giordano, Massimo Amato and Alessia Bramanti
Healthcare 2025, 13(17), 2125; https://doi.org/10.3390/healthcare13172125 - 26 Aug 2025
Abstract
Background: The World Health Organization has defined specific recommendations about digital tobacco cessation modalities as a self-management tool or as an adjunct to other support for adults. Objectives: The present umbrella review primarily aimed to assess the long-term (≥6 months) effectiveness and adherence of the different standalone digital tobacco cessation modalities (mobile text messaging, smartphone apps, Internet-based websites and programs, AI-based), administered individually or in combination; secondarily, it aimed to assess the effect on smokers’ health. Methods: The present study (PROSPERO number: CRD42024601824) followed the PRISMA guidelines. The included studies were qualitatively synthesized and evaluated through the AMSTAR-2 tool. Results: Forty-five systematic reviews were included, encompassing 164,010 adult daily smokers of combustible tobacco. At 6 months, highly interactive or human-centered digital tools showed higher effectiveness (biochemically verified continuous abstinence rates (CARs) were 11.48% for smartphone apps and 11.76% for video/telephone counseling). In contrast, at 12 months, simpler, less interactive tools demonstrated higher effectiveness (self-reported CARs were 24.38% for mobile text messaging and 18.98% for Internet-based tools). Adherence rates were generally high, particularly with human-centered digital tools, amounting to 94.12% at 6 months and 64.08% at 12 months. Compared with individually administered digital tobacco cessation modalities, combined ones registered slightly higher effectiveness at 12 months (self-reported CARs were 13.12% vs. 13.94%) and adherence (62.36% vs. 63.70%), potentially attributable to their multi-component nature and longer durations. Conclusions: Clinicians should prioritize combined digital tobacco cessation interventions that incorporate human-centered engagement initially, alongside simpler, sustained digital support to enhance long-term effectiveness and adherence. Future research should explore long-term medical and oral health benefits to assess the impact on overall health and well-being.

13 pages, 894 KB  
Article
Enhancing and Not Replacing Clinical Expertise: Improving Named-Entity Recognition in Colonoscopy Reports Through Mixed Real–Synthetic Training Sources
by Andrei-Constantin Ioanovici, Andrei-Marian Feier, Marius-Ștefan Mărușteri, Alina-Dia Trâmbițaș-Miron and Daniela-Ecaterina Dobru
J. Pers. Med. 2025, 15(8), 334; https://doi.org/10.3390/jpm15080334 - 30 Jul 2025
Abstract
Background/Objectives: In routine practice, colonoscopy findings are saved as unstructured free text, limiting secondary use. Accurate named-entity recognition (NER) is essential to unlock these descriptions for quality monitoring, personalized medicine, and research. We compared NER models trained on real, synthetic, and mixed data to determine whether privacy-preserving synthetic reports can boost clinical information extraction. Methods: Three Spark NLP biLSTM-CRF models were trained on (i) 100 manually annotated Romanian colonoscopy reports (ModelR), (ii) 100 prompt-generated synthetic reports (ModelS), and (iii) a 1:1 mix (ModelM). Performance was tested on 40 unseen reports (20 real, 20 synthetic) for seven entities. Micro-averaged precision, recall, and F1-score values were computed; McNemar tests with Bonferroni correction assessed pairwise differences. Results: ModelM outperformed the single-source models (precision 0.95, recall 0.93, F1 0.94) and was significantly superior to ModelR (F1 0.70) and ModelS (F1 0.64; p < 0.001 for both). ModelR maintained high accuracy on real text (F1 = 0.90), but its accuracy fell when tested on synthetic data (0.47); the reverse was observed for ModelS (F1 = 0.99 synthetic, 0.33 real). McNemar χ2 statistics (64.6 for ModelM vs. ModelR; 147.0 for ModelM vs. ModelS) greatly exceeded the Bonferroni-adjusted significance threshold (α = 0.0167), confirming that the observed performance gains were unlikely to be due to chance. Conclusions: Synthetic colonoscopy descriptions are a valuable complement to, but not a substitute for, real annotations: AI assists human experts rather than replacing them. Training on a balanced mix of real and synthetic data can yield robust, generalizable NER models able to structure free-text colonoscopy reports, supporting large-scale, privacy-preserving colorectal cancer surveillance and personalized follow-up.
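The McNemar statistic used to compare model pairs depends only on the discordant predictions, i.e. the items one model labels correctly and the other does not. A minimal sketch of the uncorrected one-degree-of-freedom form; the counts below are hypothetical, not the paper's data:

```python
def mcnemar_chi2(b, c):
    # b: items model A got right and model B got wrong;
    # c: the reverse. Concordant items cancel out of the test.
    # Uncorrected form; some implementations add a continuity correction.
    return (b - c) ** 2 / (b + c)

# Hypothetical discordant-pair counts for two NER models.
chi2 = mcnemar_chi2(90, 10)
print(chi2)  # 64.0
```

A χ2 this large, far above the critical value at the paper's Bonferroni-adjusted α = 0.0167, is what lets the authors conclude the models' error patterns genuinely differ.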
(This article belongs to the Special Issue Clinical Updates on Personalized Upper Gastrointestinal Endoscopy)

37 pages, 2517 KB  
Article
Multitask Learning for Authenticity and Authorship Detection
by Gurunameh Singh Chhatwal and Jiashu Zhao
Electronics 2025, 14(6), 1113; https://doi.org/10.3390/electronics14061113 - 12 Mar 2025
Abstract
Traditionally, detecting misinformation (real vs. fake) and authorship (human vs. AI) have been addressed as separate classification tasks, leaving a critical gap in real-world scenarios where these challenges increasingly overlap. Motivated by this need, we introduce a unified framework—the Shared–Private Synergy Model (SPSM)—that tackles both authenticity and authorship classification under one umbrella. Our approach is tested on a novel multi-label dataset and evaluated through an exhaustive suite of methods, including traditional machine learning, stylometric feature analysis, and pretrained large language model-based classifiers. Notably, the proposed SPSM architecture incorporates multitask learning, shared–private layers, and hierarchical dependencies, achieving state-of-the-art results with over 96% accuracy for authenticity (real vs. fake) and 98% for authorship (human vs. AI). Beyond its superior performance, our approach is interpretable: stylometric analyses reveal how factors like sentence complexity and entity usage can differentiate between fake news and AI-generated text. Meanwhile, LLM-based classifiers show moderate success. Comprehensive ablation studies further highlight the impact of task-specific architectural enhancements such as shared layers and balanced task losses on boosting classification performance. Our findings underscore the effectiveness of synergistic PLM architectures for tackling complex classification tasks while offering insights into linguistic and structural markers of authenticity and attribution. This study provides a strong foundation for future research, including multimodal detection, cross-lingual expansion, and the development of lightweight, deployable models to combat misinformation in the evolving digital landscape and smart society.
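The shared-private idea can be sketched in miniature: one shared layer feeds both task heads, and each head keeps its own private weights. This is an illustrative toy forward pass with made-up weights, not the trained SPSM architecture:

```python
def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def shared_private_forward(x, shared_w, auth_w, attr_w):
    # One shared (ReLU) layer is reused by both tasks; each task then
    # applies its own private projection on top of it.
    shared = [max(0.0, dot(w, x)) for w in shared_w]
    authenticity_logit = dot(auth_w, shared)   # real vs. fake head
    authorship_logit = dot(attr_w, shared)     # human vs. AI head
    return authenticity_logit, authorship_logit

# Hypothetical toy weights; a real model would learn these jointly,
# with the shared layer receiving gradients from both task losses.
logits = shared_private_forward([1.0, 2.0],
                                shared_w=[[1.0, 0.0], [0.0, 1.0]],
                                auth_w=[1.0, 1.0],
                                attr_w=[1.0, -1.0])
```

The design point the ablations support is that the shared layer lets each task benefit from features learned for the other, while the private heads keep task-specific signals from interfering.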

21 pages, 5031 KB  
Article
A Comparative Study of Vision Language Models for Italian Cultural Heritage
by Chiara Vitaloni, Dasara Shullani and Daniele Baracchi
Heritage 2025, 8(3), 95; https://doi.org/10.3390/heritage8030095 - 2 Mar 2025
Abstract
Human communication has long relied on visual media for interaction and is facilitated by electronic devices that access visual data. Traditionally, this exchange was unidirectional, constrained to text-based queries. However, advancements in human–computer interaction have introduced technologies like reverse image search and large language models (LLMs), enabling both textual and visual queries. These innovations are particularly valuable in Cultural Heritage applications, such as connecting tourists with point-of-interest recognition systems during city visits. This paper investigates the use of various Vision Language Models (VLMs) for Cultural Heritage visual question answering, including Bing’s search engine with GPT-4 and open models such as Qwen2-VL and Pixtral. Twenty Italian landmarks were selected for the study, including the Colosseum, Milan Cathedral, and Michelangelo’s David. For each landmark, two images were chosen: one from Wikipedia and another from a scientific database or private collection. These images were input into each VLM with textual queries regarding their content. We studied the quality of the responses in terms of their completeness, assessing the impact of various levels of detail in the queries. Additionally, we explored the effect of language (English vs. Italian) on the models’ ability to provide accurate answers. Our findings indicate that larger models, such as Qwen2-VL and Bing+ChatGPT-4, which are trained on multilingual datasets, perform better in both English and Italian. Iconic landmarks like the Colosseum and Florence’s Duomo are easily recognized, and providing context (e.g., the city) improves identification accuracy. Surprisingly, the Wikimedia dataset did not perform as expected, with varying results across models. Open models like Qwen2-VL, which can run on consumer workstations, showed performance similar to larger models. While the algorithms demonstrated strong results, they also generated occasional hallucinated responses, highlighting the need for ongoing refinement of AI systems for Cultural Heritage applications.
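The study's finding that adding context (e.g., the city) improves identification can be captured in the query-construction step. A sketch of a payload builder; the field names and prompt wording are purely illustrative, not any specific VLM API's schema:

```python
def build_vlm_query(image_path, city_hint=None, language="en"):
    # Hypothetical request payload for a vision-language model.
    # Keys ("image", "prompt") are illustrative placeholders.
    question = {
        "en": "What monument is shown in this image?",
        "it": "Quale monumento è mostrato in questa immagine?",
    }[language]
    if city_hint:
        # Per the paper, supplying the city as context improves accuracy.
        question += f" It was photographed in {city_hint}."
    return {"image": image_path, "prompt": question}

query = build_vlm_query("colosseum.jpg", city_hint="Rome", language="en")
```

Supporting both English and Italian prompts mirrors the paper's cross-language comparison, where multilingual models answered well in either language.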
(This article belongs to the Special Issue AI and the Future of Cultural Heritage)

19 pages, 1770 KB  
Article
Application of Conversational AI Models in Decision Making for Clinical Periodontology: Analysis and Predictive Modeling
by Albert Camlet, Aida Kusiak and Dariusz Świetlik
AI 2025, 6(1), 3; https://doi.org/10.3390/ai6010003 - 2 Jan 2025
Abstract
(1) Background: Language represents a crucial human ability, enabling communication and collaboration. ChatGPT is an AI chatbot utilizing the GPT (Generative Pretrained Transformer) language model architecture, enabling the generation of human-like text. The aim of the research was to assess the effectiveness of ChatGPT-3.5 and the latest version, ChatGPT-4, in responding to questions posed within the scope of a periodontology specialization exam. (2) Methods: Two certification examinations in periodontology, available in both English and Polish, comprised 120 multiple-choice questions, each in a single-best-answer format; the questions were additionally assigned to five types according to the subject covered. These exams were used to evaluate the performance of ChatGPT-3.5 and ChatGPT-4. Logistic regression models were used to estimate the odds of a correct answer with respect to question type, exam session, AI model, and difficulty index. (3) Results: The percentages of correct answers obtained by ChatGPT-3.5 and ChatGPT-4 in the Spring 2023 session in Polish and English were 40.3% vs. 55.5% and 45.4% vs. 68.9%, respectively. The periodontology specialty examination test accuracy of ChatGPT-4 was significantly better than that of ChatGPT-3.5 for both sessions (p < 0.05). In the spring session, ChatGPT-4 was significantly more effective in English (p = 0.0325), whereas no statistically significant difference between languages was found for ChatGPT-3.5. For both ChatGPT-3.5 and ChatGPT-4, incorrect responses showed notably lower difficulty index values during the Spring 2023 session in English and Polish (p < 0.05). (4) Conclusions: ChatGPT-4 exceeded the 60% threshold and passed the examination in the Spring 2023 session in the English version. In general, ChatGPT-4 performed better than ChatGPT-3.5, achieving significantly better results in the Spring 2023 test in both the Polish and English versions.
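The logistic-regression models here estimate how predictors (question type, session, AI model, difficulty index) change the odds of a correct answer. A fitted coefficient β is a change in log-odds, so exponentiating it gives the odds ratio. A minimal illustration with a hypothetical coefficient, not a value from the paper:

```python
import math

def odds_ratio(beta):
    # A logistic-regression coefficient is the change in log-odds of
    # a correct answer per unit change in the predictor; exp(beta)
    # converts it to a multiplicative odds ratio.
    return math.exp(beta)

# Hypothetical coefficient for "model = ChatGPT-4 vs. ChatGPT-3.5":
# an OR above 1 would mean ChatGPT-4 has higher odds of answering correctly.
print(round(odds_ratio(0.98), 2))  # 2.66
```

An odds ratio of 1 (β = 0) means the predictor has no effect, which is the null hypothesis the per-predictor significance tests evaluate.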

13 pages, 724 KB  
Article
The Effects of Assumed AI vs. Human Authorship on the Perception of a GPT-Generated Text
by Angelica Lermann Henestrosa and Joachim Kimmerle
Journal. Media 2024, 5(3), 1085-1097; https://doi.org/10.3390/journalmedia5030069 - 20 Aug 2024
Abstract
Artificial Intelligence (AI) has demonstrated its ability to undertake writing tasks, including automated journalism. Prior studies suggest no differences between human and AI authors regarding perceived message credibility. However, research on people’s perceptions of AI authorship on complex topics is lacking. In a between-groups experiment (N = 734), we examined the effect of labeled authorship on credibility perceptions of a GPT-written science journalism article. The results of an equivalence test showed that labeling a text as AI-written vs. human-written reduced perceived message credibility (d = 0.36). Moreover, AI authorship decreased perceived source credibility (d = 0.24), anthropomorphism (d = 0.67), and intelligence (d = 0.41). The findings are discussed against the backdrop of a growing availability of AI-generated content and a greater awareness of AI authorship.
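The d values reported above are Cohen's d effect sizes: the difference between the two groups' mean ratings, scaled by the pooled standard deviation. A stdlib sketch of that computation; the rating samples in the usage line are invented, not the study's data:

```python
import math
import statistics

def cohens_d(group_a, group_b):
    # Standardized mean difference: (mean_a - mean_b) / pooled SD.
    # This is the effect-size measure behind the d values above.
    na, nb = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((na - 1) * var_a + (nb - 1) * var_b)
                          / (na + nb - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd

# Invented credibility ratings for two labeling conditions.
d = cohens_d([5, 6, 5, 7], [4, 5, 4, 5])
```

By common convention, d ≈ 0.2 is a small effect and d ≈ 0.5 a medium one, which puts the reported credibility effects in the small-to-medium range.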

1 page, 145 KB  
Abstract
Machine Learning for Dissimulating Reality
by Andrea Giussani
Proceedings 2021, 77(1), 17; https://doi.org/10.3390/proceedings2021077017 - 27 Apr 2021
Abstract
In the last decade, advances in statistical modeling and computer science have boosted the production of machine-generated content in different fields: from language to image generation, the quality of the generated outputs is remarkably high, sometimes better than what a human produces. Modern technological advances such as OpenAI’s GPT-2 (and recently GPT-3) permit automated systems to dramatically alter reality with synthetic outputs, so that humans are not able to distinguish the real copy from its synthetic counterpart. One example is an article entirely written by GPT-2, but many others exist. In the field of computer vision, Nvidia’s Generative Adversarial Network, commonly known as StyleGAN (Karras et al. 2018), has become the de facto reference point for the production of huge numbers of fake human face portraits; additionally, recent algorithms have been developed to create both musical scores and mathematical formulas. This presentation aims to bring participants up to date on the state of the art in this field: we cover both GANs and language modeling with recent applications. The novelty here is that we apply a transformer-based machine learning technique, namely RoBERTa (Liu et al. 2019), to the detection of human-produced versus machine-produced text in the context of fake news detection. RoBERTa is a recent algorithm based on the well-known Bidirectional Encoder Representations from Transformers algorithm, BERT (Devlin et al. 2018): a bidirectional transformer for natural language processing, developed by Google and pre-trained over a huge amount of unlabeled textual data to learn embeddings. We then use these representations as input to our classifier to detect real vs. machine-produced text. The application is demonstrated in the presentation.
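The "embeddings as classifier input" step can be illustrated with the simplest possible downstream classifier, nearest centroid, standing in for whatever trained classifier the presentation uses. The vectors below are toy 3-d stand-ins for 768-d RoBERTa embeddings:

```python
def classify_by_centroid(embedding, human_centroid, machine_centroid):
    # Stand-in for the RoBERTa pipeline: a text is labeled by which
    # class centroid its embedding is nearer to. The real system feeds
    # the embeddings to a trained classifier instead.
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return ("human" if sq_dist(embedding, human_centroid)
            < sq_dist(embedding, machine_centroid) else "machine")

# Toy embeddings; values are illustrative only.
label = classify_by_centroid([0.9, 0.1, 0.2],
                             human_centroid=[1.0, 0.0, 0.3],
                             machine_centroid=[0.0, 1.0, 0.8])
print(label)  # prints "human"
```

The design point is separation of concerns: the pretrained transformer supplies general-purpose text representations, and only the small classifier on top needs task-specific (real vs. machine-produced) training data.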
(This article belongs to the Proceedings of Global Safety Evaluation (GSE) Network Workshop)