Review

Snake Oil or Panacea? How to Misuse AI in Scientific Inquiries of the Human Mind

by René Schlegelmilch 1,* and Lenard Dome 2,3

1 Department of Psychology, Faculty of Human and Health Sciences, University of Bremen, 28359 Bremen, Germany
2 Department of Psychiatry and Psychotherapy, Faculty of Medicine, University of Tübingen, 72074 Tübingen, Germany
3 German Center for Mental Health (DZPG), Partner Site Tübingen, 72074 Tübingen, Germany
* Author to whom correspondence should be addressed.
Behav. Sci. 2026, 16(2), 219; https://doi.org/10.3390/bs16020219
Submission received: 31 October 2025 / Revised: 15 December 2025 / Accepted: 22 January 2026 / Published: 3 February 2026
(This article belongs to the Special Issue Advanced Studies in Human-Centred AI)

Abstract

Large language models (LLMs) are increasingly used to predict human behavior from plain-text descriptions of experimental tasks, ranging from judging disease severity to making consequential medical decisions. While these methods promise quick insights without complex psychological theories, we reveal a critical flaw: they often latch onto accidental patterns in the data that seem predictive but collapse when faced with novel experimental conditions. Testing across multiple behavioral studies, we show that these models can generate wildly inaccurate predictions, sometimes even reversing true relationships, when applied beyond their training context. Standard validation techniques miss this flaw, creating false confidence in their reliability. We introduce a simple diagnostic tool to spot these failures and urge researchers to prioritize theoretical grounding over statistical convenience. Without this, LLM-driven behavioral predictions risk being scientifically meaningless, despite impressive initial results.
Keywords: large language models; embedding-based regression; out-of-distribution generalization; model adequacy; extrapolation failure
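
The abstract's central claim, that standard validation can certify a spuriously successful embedding-based regression, is easy to demonstrate in miniature. The sketch below is a hypothetical illustration, not the authors' actual pipeline: it fabricates "embeddings" in which some dimensions behave like condition-specific wording markers, then contrasts ordinary K-fold cross-validation (which mixes conditions across folds) with a leave-one-condition-out split that forces extrapolation to unseen conditions. All variable names, dimensions, and effect sizes here are assumptions made for illustration.

```python
# Illustrative simulation (assumed setup, not the paper's data or code):
# does an embedding-based regression survive evaluation on a condition
# it has never seen?
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, LeaveOneGroupOut, cross_val_score

rng = np.random.default_rng(0)
n_conditions, n_per = 6, 50
condition = np.repeat(np.arange(n_conditions), n_per)
n = condition.size

# True signal: one stimulus feature with a modest effect on behavior.
stimulus = rng.normal(size=n)

# Spurious structure: embedding dimensions acting like one-hot markers
# of each condition's wording (condition-specific phrasing in the prompt).
markers = np.eye(n_conditions)[condition]

# Behavior: small true effect plus a large, arbitrary per-condition offset.
offsets = rng.normal(scale=2.0, size=n_conditions)
y = 0.5 * stimulus + offsets[condition] + rng.normal(scale=0.5, size=n)

X = np.column_stack([stimulus, markers])
model = Ridge(alpha=1.0)

# Shuffled K-fold mixes conditions across folds: the model can memorize
# the condition offsets via the marker dimensions.
kfold = cross_val_score(model, X, y, cv=KFold(5, shuffle=True, random_state=0))

# Leave-one-condition-out holds back every trial of one condition,
# so the memorized markers carry no information about the test fold.
logo = cross_val_score(model, X, y, cv=LeaveOneGroupOut(), groups=condition)

print(f"random K-fold R^2:           {kfold.mean():.2f}")
print(f"leave-one-condition-out R^2: {logo.mean():.2f}")
```

Run as written, the shuffled K-fold score looks excellent because the model has memorized per-condition offsets, while the leave-one-condition-out score collapses, typically below zero. Splitting by experimental condition rather than by row is the minimal diagnostic in the spirit of the one the paper proposes: if performance survives only random splits, the model has learned the dataset, not the behavior.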

