Linguistic Markers in At-Risk Mental States Using Natural Language Processing: A Systematic Review
Highlights
- NLP markers, particularly reduced semantic coherence, lower syntactic complexity, and diminished referential cohesion, significantly differentiate individuals with at-risk mental states (ARMS) from healthy controls.
- Automated linguistic indicators can predict the transition to psychosis with high accuracy levels (ranging from 79% to 100%).
- NLP serves as an objective, non-invasive, and cost-effective tool that complements traditional semi-structured interviews in clinical settings.
- The integration of automated speech analysis into early detection protocols could enhance prognostic specificity, enabling more timely and targeted preventive interventions for individuals at clinical high risk.
Abstract
1. Introduction
2. Materials and Methods
2.1. Search Strategy
2.2. Eligibility Criteria
2.3. Data Extraction
2.4. Quality Assessment and Risk of Bias
3. Results
3.1. Study Selection
3.2. Methodological Quality and Risk of Bias Assessment of Included Studies
3.3. Study Characteristics
3.4. Linguistic Markers as Predictors of Psychosis
3.5. Additional Language Measures Beyond NLP in ARMS Research
4. Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| ARMS | Individuals with At-Risk Mental States |
| NLP | Natural Language Processing |
| PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses |
| FEP | First-Episode Psychosis |
| CHR | Clinical High Risk for Psychosis |
| DUP | Duration of Untreated Psychosis |
| UHR | Ultra High Risk |
Appendix A
| Language Variables Analyzed | Definition | Examples |
|---|---|---|
| Reduced referential cohesion | Referential cohesion involves the use of a pronoun, demonstrative, definite article or comparative to refer to people or objects previously mentioned. In a loosely cohesive reference, the speaker uses one of these forms to refer to a person or object that has not been previously mentioned, which can confuse the listener. The listener is likewise confused if the speaker makes an ambiguous reference, using a referring expression that could apply to more than one person or object [37]. | Therapist: “What did John do?” Patient: “John went to the park. John took him. John played with his dog.” This answer shows reduced referential cohesion: the name “John” is repeated instead of being replaced by a pronoun, and “him” refers to something (the dog) that has not yet been mentioned. A cohesive answer would be: “John went to the park with his dog and played with him.” |
| Reduced semantic coherence | Logical organization of the meaning of discourse through interrelated linguistic structures. A lack of semantic coherence makes it difficult to understand the discourse and to integrate the meaning of its sentences [38]. | Therapist: “What are you going to eat today?” A lack of semantic coherence would be revealed by a response such as: “Macaroons. This afternoon I will go shopping for tobacco. I think my brother called me yesterday.” In this response, the topic of conversation changes abruptly, moving from answering the question to unrelated matters, with no logical continuity in the discourse. |
| Reduced syntactic complexity | Syntactic complexity is defined as the degree of elaboration and diversity in the grammatical structures used in the discourse, reflected in the subordination, coordination and use of modifying elements that enrich the sentences [39]. The use of pronouns, determiners, conjunctions, adverbs, verbs, etc., reflects the level of syntactic complexity of a discourse or narrative. | Therapist: “Why did the children get wet?” Patient: “They left without an umbrella.” This answer shows a reduced level of syntactic complexity compared with an answer such as the following: “Although it was raining heavily, the children, who had been waiting all day, went out to the park to play, carrying raincoats and boots, but forgot the umbrella.” |
| Poverty of speech/content | The measures of speech poverty (decrease in the quantity of speech) and content poverty (decrease in the quality of thought) have been unified under the concept of poverty of speech/content; measures that refer to sentence length or semantic density are also included in this category. | |
| Speech poverty | There is a decrease in the amount of spontaneous speech, with the answers being brief, not very fluent, fragmentary, vague and not elaborated. It is rare for additional information to be provided that has not been specifically asked. The patient may not even speak if not asked and answer only in monosyllables (yes, no, etc.), and some questions may remain unanswered [40]. | Therapist: “What did you do yesterday?” Patient: “I was home.” Therapist: “And what else did you do at home?” Patient: “Nothing.” |
| Poverty of content | There is a decrease in the quality of thought. The language is adequate in quantity (verbal fluency is preserved), and the answers are sufficiently long, but they provide little information. Language tends to be vague, repetitive, imprecise, abstract and stereotyped. One can speak fluently without giving the appropriate information to answer the question asked [9]. | Therapist: “How have you been feeling lately?” Patient: “I’ve been… not bad. Everything is… well, only… Nothing is the same as before. That’s all… like this. As always.” |
| Tangentiality | A disorder of the course of thought characterized by an inability to associate goal-directed thoughts. The patient responds obliquely to what is asked and loses the thread of the conversation. There is a lack of relationship between the question and the answer given. The patient gets lost in ramblings: an answer related to the general theme is given, but it does not answer the question asked, and the final goal is not reached [40]. | Therapist: “How are you feeling today?” A tangential response would be: “Well, the weather is changing a lot lately. It’s cold in the mornings and then hot in the afternoons. Climate change is a real problem, and I think it affects animals as well. Yesterday I saw a documentary about that…” |
| Repetition (perseverative thinking) | The same answer is repeated in response to different questions; the patient is practically unable to change the answers. Words, phrases or ideas are repeated out of context. The patient continuously dwells on the same concepts and gives persistent answers even when new questions or stimuli appear [40]. | Therapist: “What did you do this morning?” Patient: “I went to the market. There were people in the market, many people. The market is always full. In the market they sell fruits. I like the market. The market is close to my house. Do you like the market?” |
| Circumstantiality | A disorder of the course of thought characterized by language with excessive and redundant information. There is difficulty in selecting ideas, so that the essential cannot be distinguished from the accessory, and a loss of the ability to direct thought towards a goal. Excessive, unnecessary and irrelevant details are incorporated, with multiple paragraphs and clarifying comments, and with difficulty in arriving at the final idea [40]. | Therapist: “Did you go to the grocery store today?” Patient: “Yes, but first I had to look for the keys, which I always leave on the table, although sometimes I put them in the drawer, because I also keep other things there such as invoices, important papers, and once I found a blue pen there… Oh yes, I went to the supermarket.” |
| Derailment | A language disorder characterized by the interruption of the logical connection between ideas and the general sense of direction of thought. Constant sliding from one topic to another. Individual sentences can be clear and meaningful, although there is a lack of an adequate connection between phrases or ideas. The resulting language may be non-cohesive, and the final content of the speech may not be related to the question asked at the beginning [40]. | Therapist: “What did you do today?” Patient: “Today I had toast for breakfast, because I like it a lot. The cows give milk, but I think the grass they eat is green. Yesterday I saw a red car that was going very fast. The color red has always reminded me of traffic lights… have you ever seen a broken traffic light?” |
| Literal use of metaphors | A blurred boundary between the symbolic and literal meaning of metaphors has been identified in ARMS subjects [28]. | “It’s like seeing through the eyes of a body I’ve bought or a robot,” “my thoughts are on autopilot, as if my true thoughts have been put aside,” “it’s like everything is organized, like in a theater.” |
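
The definition of semantic coherence in the table above can be made concrete with a minimal sketch. The studies summarized in this review derive sentence vectors from LSA, word2vec or BERT-type models; the example below instead uses plain bag-of-words counts as a stand-in, so that it runs without external libraries. The function names (`sentence_vector`, `cosine`, `first_order_coherence`) and the "coherent" sample sentences are illustrative assumptions, not taken from any included study; the "incoherent" sample reuses the example response from the table.

```python
import math
import re
from collections import Counter


def sentence_vector(sentence):
    """Bag-of-words count vector; a crude stand-in for LSA or embedding vectors."""
    return Counter(re.findall(r"[a-z']+", sentence.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def first_order_coherence(sentences):
    """Mean cosine similarity between each sentence and the one that follows it."""
    vectors = [sentence_vector(s) for s in sentences]
    sims = [cosine(a, b) for a, b in zip(vectors, vectors[1:])]
    return sum(sims) / len(sims) if sims else 0.0


# Hypothetical on-topic answer (illustrative only).
coherent = [
    "I will cook pasta for lunch today.",
    "The pasta needs tomato sauce for lunch.",
    "After lunch I will wash the pasta pot.",
]
# Example response with reduced semantic coherence, from the table above.
incoherent = [
    "Macaroons.",
    "This afternoon I will go shopping for tobacco.",
    "I think my brother called me yesterday.",
]

coherent_score = first_order_coherence(coherent)
incoherent_score = first_order_coherence(incoherent)
print(f"coherent: {coherent_score:.2f}  incoherent: {incoherent_score:.2f}")
```

With word counts, the only overlap in the incoherent sample comes from the function word "I", which is one reason real pipelines rely on semantic vector spaces rather than raw word overlap.
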
| Reference | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Overall Quality |
|---|---|---|---|---|---|---|---|---|---|
| Bedi (2015) [15] | Y | Y | Y | Y | U | U | Y | Y | Moderate/High |
| Corcoran (2018) [19] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Gupta (2018) [24] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Rezaii (2019) [20] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Haas (2020) [25] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Spencer (2021) [14] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Morgan (2021) [16] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Bilgrami (2022) [26] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Baklund (2023) [27] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Srivastava (2023) [23] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Nettekoven (2023) [28] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Dalal (2025) [29] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Kizilay (2024) [30] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Mota (2025) [31] | Y | Y | Y | Y | Y | Y | Y | Y | High |
| Kim-Dufor (2025) [32] | Y | Y | Y | Y | U | U | Y | Y | Moderate/High |
References
1. Schmidt, S.J.; Schultze-Lutter, F.; Schimmelmann, B.G.; Maric, N.P.; Salokangas, R.K.R.; Riecher-Rössler, A.; van der Gaag, M.; Meneghelli, A.; Nordentoft, M.; Marshall, M.; et al. EPA Guidance on the Early Intervention in Clinical High Risk States of Psychoses. Eur. Psychiatry 2015, 30, 388–404.
2. Fusar-Poli, P.; Rocchetti, M.; Sardella, A.; Avila, A.; Brandizzi, M.; Caverzasi, E.; Politi, P.; Ruhrmann, S.; McGuire, P. Disorder, Not Just State of Risk: Meta-Analysis of Functioning and Quality of Life in People at High Risk of Psychosis. Br. J. Psychiatry 2015, 207, 198–206.
3. Olivares, R.I.; Figueroa, B.A. Análisis de Alteraciones Del Discurso En Estados Mentales de Alto Riesgo de Psicosis (EMAR): Una Revisión Sistemática. Rev. Chil. Neuropsiquiatr. 2021, 59, 343–360.
4. Bird, V.; Premkumar, P.; Kendall, T.; Whittington, C.; Mitchell, J.; Kuipers, E. Early Intervention Services, Cognitive–Behavioural Therapy and Family Intervention in Early Psychosis: Systematic Review. Br. J. Psychiatry 2010, 197, 350–356.
5. Larsen, T.K.; Melle, I.; Auestad, B.; Haahr, U.; Joa, I.; Johannessen, J.O.; Opjordsmoen, S.; Rund, B.R.; Rossberg, J.I.; Simonsen, E.; et al. Early Detection of Psychosis: Positive Effects on 5-Year Outcome. Psychol. Med. 2011, 41, 1461–1469.
6. Killackey, E.; Yung, A.R. Effectiveness of Early Intervention in Psychosis. Curr. Opin. Psychiatry 2007, 20, 121–125.
7. Rinaldi, M.; Killackey, E.; Smith, J.; Shepherd, G.; Singh, S.P.; Craig, T. First Episode Psychosis and Employment: A Review. Int. Rev. Psychiatry 2010, 22, 148–162.
8. Salazar de Pablo, G.; Guinart, D.; Armendariz, A.; Aymerich, C.; Catalan, A.; Alameda, L.; Rogdaki, M.; Martinez Baringo, E.; Soler-Vidal, J.; Oliver, D.; et al. Duration of Untreated Psychosis and Outcomes in First-Episode Psychosis: Systematic Review and Meta-Analysis of Early Detection and Intervention Strategies. Schizophr. Bull. 2024, 50, 771–783.
9. Yung, A.; Phillips, L.; McGorry, P.D. Treating Schizophrenia in the Prodromal Phase; CRC Press: Boca Raton, FL, USA, 2004; ISBN 9781135428983.
10. Miller, T.J.; McGlashan, T.H.; Rosen, J.L.; Cadenhead, K.; Ventura, J.; McFarlane, W.; Perkins, D.O.; Pearlson, G.D.; Woods, S.W. Prodromal Assessment With the Structured Interview for Prodromal Syndromes and the Scale of Prodromal Symptoms: Predictive Validity, Interrater Reliability, and Training to Reliability. Schizophr. Bull. 2003, 29, 703–715.
11. Thompson, A.; Marwaha, S.; Broome, M.R. At-Risk Mental State for Psychosis: Identification and Current Treatment Approaches. BJPsych Adv. 2016, 22, 186–193.
12. Yung, A.R.; Pan Yuen, H.; McGorry, P.D.; Phillips, L.J.; Kelly, D.; Dell’Olio, M.; Francey, S.M.; Cosgrave, E.M.; Killackey, E.; et al. Mapping the Onset of Psychosis: The Comprehensive Assessment of At-Risk Mental States. Aust. New Zealand J. Psychiatry 2005, 39, 964–971.
13. Fusar-Poli, P.; Bonoldi, I.; Yung, A.R.; Borgwardt, S.; Kempton, M.J.; Valmaggia, L.; Barale, F.; Caverzasi, E.; McGuire, P. Predicting Psychosis: Meta-Analysis of Transition Outcomes in Individuals at High Clinical Risk. Arch. Gen. Psychiatry 2012, 69, 220.
14. Spencer, T.J.; Thompson, B.; Oliver, D.; Diederen, K.; Demjaha, A.; Weinstein, S.; Morgan, S.E.; Day, F.; Valmaggia, L.; Rutigliano, G.; et al. Lower Speech Connectedness Linked to Incidence of Psychosis in People at Clinical High Risk. Schizophr. Res. 2021, 228, 493–501.
15. Bedi, G.; Carrillo, F.; Cecchi, G.A.; Slezak, D.F.; Sigman, M.; Mota, N.B.; Ribeiro, S.; Javitt, D.C.; Copelli, M.; Corcoran, C.M. Automated Analysis of Free Speech Predicts Psychosis Onset in High-Risk Youths. Schizophrenia 2015, 1, 15030.
16. Morgan, S.E.; Diederen, K.; Vértes, P.E.; Ip, S.H.Y.; Wang, B.; Thompson, B.; Demjaha, A.; De Micheli, A.; Oliver, D.; Liakata, M.; et al. Natural Language Processing Markers in First Episode Psychosis and People at Clinical High-Risk. Transl. Psychiatry 2021, 11, 630.
17. Tang, S.X.; Kriz, R.; Cho, S.; Park, S.J.; Harowitz, J.; Gur, R.E.; Bhati, M.T.; Wolf, D.H.; Sedoc, J.; Liberman, M.Y. Natural Language Processing Methods Are Sensitive to Sub-Clinical Linguistic Differences in Schizophrenia Spectrum Disorders. NPJ Schizophr. 2021, 7, 25.
18. Arslan, B.; Kizilay, E.; Verim, B.; Demirlek, C.; Demir, M.; Cesim, E.; Eyuboglu, M.S.; Ozbek, S.U.; Sut, E.; Yalincetin, B.; et al. Computational Analysis of Linguistic Features in Speech Samples of First-Episode Bipolar Disorder and Psychosis. J. Affect. Disord. 2024, 363, 340–347.
19. Corcoran, C.M.; Carrillo, F.; Fernández-Slezak, D.; Bedi, G.; Klim, C.; Javitt, D.C.; Bearden, C.E.; Cecchi, G.A. Prediction of Psychosis across Protocols and Risk Cohorts Using Automated Language Analysis. World Psychiatry 2018, 17, 67–75.
20. Rezaii, N.; Walker, E.; Wolff, P. A Machine Learning Approach to Predicting Psychosis Using Semantic Density and Latent Content Analysis. NPJ Schizophr. 2019, 5, 1–12.
21. Page, M.J.; McKenzie, J.E.; Bossuyt, P.M.; Boutron, I.; Hoffmann, T.C.; Mulrow, C.D.; Shamseer, L.; Tetzlaff, J.M.; Akl, E.A.; Brennan, S.E.; et al. The PRISMA 2020 Statement: An Updated Guideline for Reporting Systematic Reviews. BMJ 2021, 372, n71.
22. Ouzzani, M.; Hammady, H.; Fedorowicz, Z.; Elmagarmid, A. Rayyan-a Web and Mobile App for Systematic Reviews. Syst. Rev. 2016, 5, 210.
23. Srivastava, A.; Selloni, A.; Bilgrami, Z.R.; Sarac, C.; McGowan, A.; Cotter, M.; Bayer, J.; Spark, J.; Krcmar, M.; Formica, M.; et al. Differential Expression of Anomalous Self-Experiences in Spontaneous Speech in Clinical High-Risk and Early-Course Psychosis Quantified by Natural Language Processing. Biol. Psychiatry Cogn. Neurosci. Neuroimaging 2023, 8, 1005–1012.
24. Gupta, T.; Hespos, S.J.; Horton, W.S.; Mittal, V.A. Automated Analysis of Written Narratives Reveals Abnormalities in Referential Cohesion in Youth at Ultra High Risk for Psychosis. Schizophr. Res. 2018, 192, 82–88.
25. Haas, S.S.; Doucet, G.E.; Garg, S.; Herrera, S.N.; Sarac, C.; Bilgrami, Z.R.; Shaik, R.B.; Corcoran, C.M. Linking Language Features to Clinical Symptoms and Multimodal Imaging in Individuals at Clinical High Risk for Psychosis. Eur. Psychiatry 2020, 63, e72.
26. Bilgrami, Z.R.; Sarac, C.; Srivastava, A.; Herrera, S.N.; Azis, M.; Haas, S.S.; Shaik, R.B.; Parvaz, M.A.; Mittal, V.A.; Cecchi, G.; et al. Construct Validity for Computational Linguistic Metrics in Individuals at Clinical Risk for Psychosis: Associations with Clinical Ratings. Schizophr. Res. 2022, 245, 90–96.
27. Baklund, L.; Røssberg, J.I.; Møller, P. Linguistic Markers and Basic Self-Disturbances among Adolescents at Risk of Psychosis. A Qualitative Study. EClinicalMedicine 2023, 55, 101733.
28. Nettekoven, C.R.; Diederen, K.; Giles, O.; Duncan, H.; Stenson, I.; Olah, J.; Gibbs-Dean, T.; Collier, N.; Vértes, P.E.; Spencer, T.J.; et al. Semantic Speech Networks Linked to Formal Thought Disorder in Early Psychosis. Schizophr. Bull. 2023, 49, S142–S152.
29. Dalal, T.C.; Liang, L.; Silva, A.M.; Mackinley, M.; Voppel, A.; Palaniyappan, L. Speech Based Natural Language Profile before, during and after the Onset of Psychosis: A Cluster Analysis. Acta Psychiatr. Scand. 2025, 151, 332–347.
30. Kizilay, E.; Arslan, B.; Verim, B.; Demirlek, C.; Demir, M.; Cesim, E.; Eyuboglu, M.S.; Uzman Ozbek, S.; Sut, E.; Yalincetin, B.; et al. Automated Linguistic Analysis in Youth at Clinical High Risk for Psychosis. Schizophr. Res. 2024, 274, 121–128.
31. Mota, N.B.; Ribeiro, M.; Malcorra, B.; Argolo, F.; Lopes-Rocha, A.C.; Ara, A.; Gondim, J.M.; Cecchi, G.; Loch, A.A.; Corcoran, C.M. Attenuated Symptoms Are Associated with Connectedness and Emotional Expression in Narratives Based on Emotional Pictures in a Brazilian Clinical High-Risk Cohort. Psychiatry Res. 2025, 348, 116469.
32. Kim-Dufor, D.-H.; Walter, M.; Krebs, M.-O.; Haralambous, Y.; Lenca, P.; Lemey, C. Deeper Insight into Speech Characteristics of Patients at Ultra-High Risk Using Classification and Explainability Models. Front. Psychiatry 2025, 16, 1595197.
33. García-Molina, J.T.; Downey, M.; Méndez, E.; Figueroa-Barra, A. Diagnostic and Transition Accuracy of Natural Language Processing in High Risk for Psychosis Individuals: A Systematic Review. Asian J. Psychiatry 2025, 112, 104695.
34. Argolo, F.; Magnavita, G.; Mota, N.B.; Ziebold, C.; Mabunda, D.; Pan, P.M.; Zugman, A.; Gadelha, A.; Corcoran, C.; Bressan, R.A. Lowering Costs for Large-Scale Screening in Psychosis: A Systematic Review and Meta-Analysis of Performance and Value of Information for Speech-Based Psychiatric Evaluation. Braz. J. Psychiatry 2020, 42, 673–686.
35. Eyigoz, E.; Mathur, S.; Santamaria, M.; Cecchi, G.; Naylor, M. Linguistic Markers Predict Onset of Alzheimer’s Disease. EClinicalMedicine 2020, 28, 100583.
36. García, A.M.; Carrillo, F.; Orozco-Arroyave, J.R.; Trujillo, N.; Vargas Bonilla, J.F.; Fittipaldi, S.; Adolfi, F.; Nöth, E.; Sigman, M.; Fernández Slezak, D.; et al. How Language Flows When Movements Don’t: An Automated Analysis of Spontaneous Discourse in Parkinson’s Disease. Brain Lang. 2016, 162, 19–28.
37. Bearden, C.E.; Wu, K.N.; Caplan, R.; Cannon, T.D. Thought Disorder and Communication Deviance as Predictors of Outcome in Youth at Clinical High Risk for Psychosis. J. Am. Acad. Child Adolesc. Psychiatry 2011, 50, 669–680.
38. Figueroa-Barra, A.; Del Aguila, D.; Cerda, M.; Gaspar, P.A.; Terissi, L.D.; Durán, M.; Valderrama, C. Automatic Language Analysis Identifies and Predicts Schizophrenia in First-Episode of Psychosis. Schizophrenia 2022, 8, 53.
39. Ortega, L. Syntactic Complexity Measures and Their Relationship to L2 Proficiency: A Research Synthesis of College-level L2 Writing. Appl. Linguist. 2003, 24, 492–518.
40. Vilarrasa, A.B. Vallejo. Introducción a la Psicopatología y la Psiquiatría; Elsevier: Amsterdam, The Netherlands, 2025; ISBN 9788491138303.

| Databases | Search Expression |
|---|---|
| Medline | (“ultra high risk”[Title/Abstract] OR “clinical high risk”[Title/Abstract] OR CHR[Title/Abstract] OR UHR[Title/Abstract] OR “risk for psychosis”[Title/Abstract] OR “at-risk mental state”[Title/Abstract] OR “at-risk for psychosis”[Title/Abstract]) AND (“language analysis”[Title/Abstract] OR “language markers”[Title/Abstract] OR “linguistic markers”[Title/Abstract] OR “natural language processing”[Title/Abstract]) |
| Scopus | (TITLE-ABS-KEY(“ultra high risk”) OR TITLE-ABS-KEY(“clinical high risk”) OR TITLE-ABS-KEY(CHR) OR TITLE-ABS-KEY(UHR) OR TITLE-ABS-KEY(“risk for psychosis”) OR TITLE-ABS-KEY(“at-risk mental state”) OR TITLE-ABS-KEY(“at-risk for psychosis”)) AND (TITLE-ABS-KEY(“language analysis”) OR TITLE-ABS-KEY(“language markers”) OR TITLE-ABS-KEY(“linguistic markers”) OR TITLE-ABS-KEY(“natural language processing”)) |
| PsycInfo | (tiab(“ultra high risk”) OR tiab(“clinical high risk”) OR tiab(CHR) OR tiab(UHR) OR tiab(“risk for psychosis”) OR tiab(“at-risk mental state”) OR tiab(“at-risk for psychosis”)) AND (tiab(“language analysis”) OR tiab(“language markers”) OR tiab(“linguistic markers”) OR tiab(“natural language processing”)) |
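
The three search expressions above share one structure: a block of population terms joined by OR, combined by AND with a block of language-analysis terms, each term restricted to title and abstract. As a hedged illustration of that structure, the sketch below rebuilds the Medline expression from two term lists. The helper name `or_block` is hypothetical, and the sketch uses straight quotation marks rather than the typographic ones shown in the table.

```python
def or_block(terms, field_tag):
    """Join terms with OR, appending the same field tag to each one."""
    return "(" + " OR ".join(f"{term}{field_tag}" for term in terms) + ")"


# Population terms and language-analysis terms, copied from the table above.
population_terms = [
    '"ultra high risk"', '"clinical high risk"', "CHR", "UHR",
    '"risk for psychosis"', '"at-risk mental state"', '"at-risk for psychosis"',
]
language_terms = [
    '"language analysis"', '"language markers"',
    '"linguistic markers"', '"natural language processing"',
]

medline_query = (
    or_block(population_terms, "[Title/Abstract]")
    + " AND "
    + or_block(language_terms, "[Title/Abstract]")
)
print(medline_query)
```

Keeping the term lists separate from the field syntax makes it easier to check that every database received the same terms, since only the wrapping differs between Medline, Scopus and PsycInfo.
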
| Reference | Groups and N | ARMS Diagnostic Instrument | Language Collection Method | Language Analysis Technique | Language Variables Analyzed | Main Results |
|---|---|---|---|---|---|---|
| Bedi (2015) [15] | ARMS = 34 | SIPS/SOPS | Open narrative interviews. | Natural Language Toolkit (NLTK). Latent Semantic Analysis (LSA). | · Semantic coherence. · Syntactic complexity: use of determiners, poverty of speech/content. | · Semantic coherence, normalized use of determiners, and poverty of speech/content predicted psychosis with 100% accuracy. |
| Corcoran (2018) [19] | ARMS = 93 CG = 21 FEP = 16 | SIPS/SOPS | “Story Game”. Open narrative interviews. | Natural Language Toolkit (NLTK). Latent Semantic Analysis (LSA). Part-of-speech (POS) tagging with the Penn Treebank tag set. Machine learning classifier. | · Semantic coherence. · Syntactic complexity: use of possessive pronouns. | · Semantic coherence: ARMS < CG. · Semantic coherence variance: ARMS > CG. · Possessive pronouns: ARMS < CG. · These features predicted psychosis with 79% accuracy. |
| Gupta (2018) [24] | ARMS = 41 CG = 43 | SIPS/SOPS | Written narrative description of the Boston Cookie Theft picture. | Coh-Metrix 3.0. | · Referential cohesion. | · Referential cohesion: ARMS < CG. |
| Rezaii (2019) [20] | ARMS = 40 | SIPS/SOPS | SIPS/SOPS interview recordings. | Stanford PCFG parser. WordNetLemmatizer module of the Natural Language Toolkit (NLTK). | · Poverty of speech/content. · Syntactic complexity: use of determiners. · Probability of using the words “voice”, “sound”, “song” or “loud”. | · Poverty of speech/content: ARMS+ > ARMS−. · Use of words about voices and sounds: ARMS+ > ARMS−. · These features predicted psychosis with 90% accuracy. |
| Haas (2020) [25] | ARMS = 46 CG = 22 | SIPS/SOPS | Open narrative interviews. | Natural Language Toolkit (NLTK). Latent Semantic Analysis (LSA). | · Poverty of speech/content. · Semantic coherence. · Syntactic complexity. | · Syntactic complexity: correlates with negative symptoms and seems sensitive to prodromal symptoms in ARMS individuals. · Speech/content poverty, semantic coherence, and syntactic complexity: correlated with measures of brain structure and functional connectivity in ARMS individuals. |
| Spencer (2021) [14] | ARMS = 24 FEP = 16 CG = 13 | CAARMS | Thematic Apperception Test (TAT). | Speech Graph Software. | · Referential cohesion: graph-based analysis of connectivity of speech. | · Referential cohesion: ARMS+ < ARMS−. |
| Morgan (2021) [16] | ARMS = 25 FEP = 16 CG = 13 | CAARMS | Thematic Apperception Test (TAT). Discourse Comprehension Test (DCT). Interview on any topic. | NLP-based measures. Word embeddings (word2vec trained on Google News) and the SIF sentence-embedding model. Latent Semantic Analysis (LSA). Cosine similarity calculated between all possible sentence pairs. Pre-trained coreference resolution model. Speech Graph Software. | · Semantic coherence. · Tangentiality. · Relationship between the discourse and the theme of the DCT. · Repetition. · Number of ambiguous pronouns. · Referential cohesion: graph-based connectivity of speech. | · Semantic coherence: ARMS < CG. |
| Bilgrami (2022) [26] | ARMS = 60 CG = 27 | SIPS/SOPS | Open qualitative interviews. Thought, Language and Communication Assessment Scale (TLC). | Natural Language Toolkit (NLTK). Part-of-speech (POS) tagging. Bidirectional Encoder Representations from Transformers (BERT). | · FTA measures: positive thought disorder elements (tangentiality, circumstantiality, derailment) and negative thought disorder elements (poverty of speech/content). · NLP measures: semantic coherence, syntactic complexity (poverty of speech/content and use of determiner pronouns). | · Semantic coherence: ARMS < CG. Correlated with TLC positive thought disorder (tangentiality, circumstantiality, and derailment). · Syntactic complexity (poverty of speech/content and use of determiner pronouns): ARMS < CG. Correlated with TLC negative thought disorder (poverty of speech/content). |
| Baklund (2023) [27] | ARMS and BSD = 30 | PQ16 SIPS/SOPS | Semi-structured interviews based on the Examination of Anomalous Self-Experience (EASE). | An adapted form of Interpretative Phenomenological Analysis (IPA) based on a linguistic conceptual framework and the concept of basic self-disturbance (BSD). | · Distinctive and prominent words related to language. · Irregular use of prepositions related to place and location. · Personal pronouns. · Use of conjunctions and metaphors. · Idiosyncratic use of adjectives and perceptual modalities. | · BSD linguistic markers: appear qualitatively different from the linguistic markers analyzed in studies with ARMS individuals. |
| Srivastava (2023) [23] | ARMS = 167 FEP = 89 CG = 170 | SIPS/SOPS CAARMS | Open qualitative interviews. IPASE. | Natural Language Toolkit (NLTK). Sentence-BERT (S-BERT) sentence encoder. | · Semantic similarity between natural speech and the anomalous self-experiences assessed by the IPASE. | · Semantic similarity between natural speech and IPASE (anomalous self-experiences): ARMS > FEP > CG. |
| Nettekoven (2023) [28] | ARMS = 24 FEP = 16 CG = 13 | CAARMS | Thematic Apperception Test (TAT). Discourse Comprehension Test (DCT). | Speech networks generated from transcripts in Python 2.0 (netts). Syntactic speech graphs. | · Connected components of the netts-generated semantic speech network. | · Semantic networks of speech: CG > ARMS > FEP. |
| Dalal (2025) [29] | ARMS = 18 PSY = 18 FEP = 72 CG = 39 | SIPS | Thought and Language Index (TLI). | Cluster analysis with the R package NbClust. | · Choice of words or lexical variables. · Structure of utterances or syntax variables. · Semantic cohesion. | Three-cluster solution: · The largest cluster, with a typical linguistic profile, included most CG and the majority of ARMS and PSY. · A cluster with high semantic similarity in word choices, fewer perceptual words, lower cohesion and less analytical structure mostly contained FEP. · The last cluster, with more perceptual but fewer cognitive/emotional word classes, simpler syntactic structure, and a lack of sufficient reference to prior information, contained more PSY. |
| Kizilay (2024) [30] | ARMS = 62 CG = 45 | SIPS/SOPS | Thematic Apperception Test (TAT). | Natural Language Toolkit (NLTK). Sentence-BERT (S-BERT) sentence encoder. Machine learning classification: Forest classifier with Python scikit-learn. | · Semantic coherence. · Image and text similarity. · Tangentiality. · Generic characteristics and specific parts of speech (POS). | · Semantic coherence and use of adjectives: ARMS > CG. · Similarity between image and text: ARMS < CG. · Adverbs, conjunctions and first-person pronouns: ARMS < CG. · A machine learning model based on NLP features discriminated ARMS from CG individuals with 79.6% accuracy. |
| Mota (2025) [31] | ARMS = 42 CG = 29 | SIPS, the Prodromal Questionnaire, the PCA scale | The Happy Thoughts protocol. | Speech Graph Software. Linguistic Inquiry and Word Count (LIWC). | · Largest connected component. · Largest strongly connected component. · Proportion of positive and negative emotional words. | · Narratives with a higher proportion of “incongruent” negative emotional words: ARMS > CG. |
| Kim-Dufor (2025) [32] | ARMS = 45 CG = 15 FEP = 8 | CAARMS | Open narrative interviews. | Latent semantic analysis (LSA). Supervised machine learning model XGBoost. | · Lexical richness, diversity, density. · Syntactic complexity. · Semantic coherence. · Speech fluency. | Intersubjective LSA minimum: CG < FEP, ARMS < FEP. Subjective LSA minimum: ARMS < FEP. |
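
Several studies in the table above report graph-based connectedness measures such as the largest strongly connected component. The sketch below illustrates the general idea under simplifying assumptions: each distinct word is a node, each pair of consecutive words is a directed edge, and the measure is the size of the largest set of words that can all reach one another. It does not reproduce the Speech Graph Software or netts; the function names and sample sentences are hypothetical, and published analyses may add steps (for example, normalizing for the amount of speech) that are omitted here.

```python
import re
from collections import defaultdict


def build_word_graph(text):
    """Directed graph: one node per distinct word, one edge per consecutive word pair."""
    words = re.findall(r"[a-z']+", text.lower())
    graph = defaultdict(set)
    for word in words:
        graph[word]  # ensure every word is a node, even without outgoing edges
    for src, dst in zip(words, words[1:]):
        graph[src].add(dst)
    return graph


def largest_strongly_connected_component(graph):
    """Size of the largest node set in which every node reaches every other (Kosaraju)."""
    # Pass 1: record nodes in order of depth-first completion (iterative DFS).
    visited, order = set(), []
    for start in graph:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    break
            else:
                order.append(node)
                stack.pop()

    # Pass 2: explore the reversed graph in reverse completion order.
    reverse = defaultdict(set)
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].add(src)

    assigned, largest = set(), 0
    for start in reversed(order):
        if start in assigned:
            continue
        assigned.add(start)
        size, stack = 0, [start]
        while stack:
            node = stack.pop()
            size += 1
            for prev in reverse[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    stack.append(prev)
        largest = max(largest, size)
    return largest


# Hypothetical samples: one returns to earlier words, the other never does.
repetitive_graph = build_word_graph(
    "I went to the market and the market was full and I went home"
)
linear_graph = build_word_graph("Yesterday my brother called me")

repetitive_lsc = largest_strongly_connected_component(repetitive_graph)
linear_lsc = largest_strongly_connected_component(linear_graph)
print(repetitive_lsc, linear_lsc)
```

In this toy representation, a narrative that loops back to earlier words forms a large strongly connected component, whereas a sequence with no recurrence has components of size one.
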
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Zhang, Y.; Carrió, A.; Sevilla-Llewellyn-Jones, J.; Gutiérrez, E.; Calvo, A.; Navarro, J.-B.; Barajas, A. Linguistic Markers in At-Risk Mental States Using Natural Language Processing: A Systematic Review. Healthcare 2026, 14, 999. https://doi.org/10.3390/healthcare14080999

