Search Results (115)

Search Parameters:
Keywords = speaker identification

18 pages, 720 KB  
Article
The Effect of Second Language Immersion Experience on the Perception of VOT by Saudi Arabic Learners of English
by Wafaa Alshangiti
Languages 2026, 11(5), 81; https://doi.org/10.3390/languages11050081 - 22 Apr 2026
Abstract
Increased experience with a second language (L2) can affect one’s speech perception and production. Some studies have suggested that experience does not affect the production of English bilabial stops by Arabic speakers. They produce the English bilabial stops /p/ and /b/ as the Arabic /b/, which differs in VOT. However, the effect of English experience on the perception of English bilabial stops remains underinvestigated. This study examines the effect of L2 immersion experience on the perception of the English stops /p/–/b/ to investigate whether the lack of /p/ in Arabic can affect the perception of the /p/–/b/ contrast and whether L2 experience shifts the category boundary toward that of native speakers. Sixty-six participants, comprising two groups of Arabic speakers with differing L2 experience and a control group of native English speakers, completed identification and discrimination tasks using the /p/–/b/ VOT continuum. The regression analysis showed that listeners with more L2 experience (i.e., ≥3 years in the UK) had a category boundary closer to that of native listeners than those with less L2 experience. However, category discrimination accuracy did not differ significantly between the Arabic groups. The results highlight the importance of L2 immersion experience in altering VOT perceptual strategies, which can help in designing future training studies that focus on VOT perception as an L2 phonetic cue. Full article
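The boundary-estimation step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `category_boundary` and the optimizer settings are hypothetical. It fits a logistic identification function to the proportion of /p/ responses along the VOT continuum and returns the 50% crossover point in milliseconds.

```python
import numpy as np

def category_boundary(vot_ms, prop_p, lr=0.1, steps=5000):
    """Fit a logistic identification function P('/p/') = sigmoid(b0 + b1*z)
    on standardized VOT values and return the 50% crossover in ms."""
    x = np.asarray(vot_ms, dtype=float)
    y = np.asarray(prop_p, dtype=float)
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd                      # standardize for stable training
    b0 = b1 = 0.0
    for _ in range(steps):                 # plain gradient descent
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * z)))
        g = p - y
        b0 -= lr * g.mean()
        b1 -= lr * (g * z).mean()
    return mu - sd * b0 / b1               # boundary where P = 0.5
```

Each listener group's estimated boundary can then be compared against the native-speaker group's mean boundary.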

32 pages, 1293 KB  
Article
Early Detection of Re-Identification Risk in Multi-Turn Dialogues via Entity-Aware Evidence Accumulation
by Yeongseop Lee, Seungun Park and Yunsik Son
Appl. Sci. 2026, 16(8), 3680; https://doi.org/10.3390/app16083680 - 9 Apr 2026
Viewed by 356
Abstract
In multi-turn conversational AI, individually innocuous personally identifiable information (PII) fragments disclosed across successive turns can accumulate into a re-identification risk that no single utterance reveals on its own. Existing PII detectors operate on isolated utterances and therefore cannot track this cross-turn evidence build-up. We propose a stateful middleware guardrail whose core design principle is speaker-attributed entity isolation: every extracted PII fragment is attributed to its originating conversational participant, and evidence is accumulated in entity-isolated subgraphs that prevent cross-entity contamination. The system signals re-identification onset t_pred at the earliest turn where combination-based rules grounded in the uniqueness literature are satisfied. On a 184-record template-synthetic evaluation corpus, the gated NER configuration leads on primary timeliness (OW@5 = 73.4%, MAE = 1.357 turns); the full system achieves OW@5 = 70.7% with MAE = 2.442 turns as an alternative operating mode for ambiguity-sensitive disclosure patterns. We further evaluate behavior on a 300-record mutation stress set, test RULE_B on the ABCD external corpus, and supplement RULE_A evaluation with both a proxy-labeled transfer analysis on PersonaChat and a manual annotation study on 151 Switchboard dialogues. The reported results should be interpreted as an initial empirical reference point rather than a sufficient endpoint for autonomous runtime enforcement. Full article
(This article belongs to the Special Issue Advances in Intelligent Systems—2nd edition)
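The entity-isolated accumulation idea can be illustrated with a toy sketch. All names here are hypothetical, and the combination rule is a made-up stand-in for the paper's actual rules (RULE_A/RULE_B), which are grounded in the uniqueness literature:

```python
# Illustrative combination rule: a zip code, birth date, and gender
# together act as a quasi-identifier (a stand-in, not the paper's rule).
QUASI_ID_RULE = {"zip_code", "birth_date", "gender"}

def earliest_risk_turn(turns):
    """turns: list of (speaker, set_of_pii_types), one entry per turn.
    PII is accumulated per speaker in isolated sets, so fragments from
    different participants never combine; returns the first turn index
    (1-based) at which one speaker's set satisfies the rule, else None."""
    evidence = {}
    for t, (speaker, pii) in enumerate(turns, start=1):
        acc = evidence.setdefault(speaker, set())
        acc |= pii
        if QUASI_ID_RULE <= acc:   # rule satisfied -> signal onset here
            return t
    return None
```

Note the isolation property: pooling fragments from speakers A and B could trigger the rule earlier, which the per-entity subgraphs are specifically designed to prevent.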

23 pages, 5532 KB  
Article
Perception and Production of the Aspiration Contrast in Mandarin Retroflex Affricates [tʂ] and [tʂʰ] by Adult Spanish Speakers Learning Mandarin Chinese: An Exploratory Study
by Guilherme Galhoz Maria Roque and Quanzhen Zhang
Languages 2026, 11(4), 69; https://doi.org/10.3390/languages11040069 - 2 Apr 2026
Viewed by 362
Abstract
This exploratory study examines the perception and production of the aspiration contrast in Mandarin voiceless retroflex affricates zh [tʂ] and ch [tʂʰ] by ten adult Spanish speakers (three Peruvian, seven Chilean) at Nanjing University. Participants completed a perception identification task and a production reading task using the same set of 128 syllables. Voice Onset Time (VOT) measurements from the production task were converted to binary classifications for cross-modality comparison. Perception accuracy was moderately high (zh [tʂ]: 84.43%; ch [tʂʰ]: 82.39%), whilst production accuracy was substantially lower (zh [tʂ]: 32.61%; ch [tʂʰ]: 19.15% within native VOT ranges). Participants maintained the aspiration contrast (zh [tʂ] = 58 ms, ch [tʂʰ] = 125 ms) but consistently underproduced VOT compared to native speakers (zh [tʂ] = 67 ms, ch [tʂʰ] = 164 ms). Perception patterns align with Category Goodness (CG) assimilation within PAM-L2: both Mandarin sounds map to Spanish [tʃ] but with different goodness-of-fit, enabling moderate discrimination. Production follows SLM-r predictions, with learners developing a Composite L1–L2 Category that maintains the aspiration contrast but fails to establish new phonetic categories. The small sample size (n = 10) precluded robust statistical testing of individual differences. The perception–production asymmetry supports independent modality development in L2 phonetic acquisition. Full article

31 pages, 8029 KB  
Article
A Novel Fluorescence-Triggered Auditory Feedback Photosensor for Precision Lymph Node Mapping
by Kicheol Yoon, Hyunjun Son, Hari Kang, Sangyun Lee, Tae-Hyeon Lee, Won-Suk Lee and Kwang Gi Kim
Sensors 2026, 26(6), 1745; https://doi.org/10.3390/s26061745 - 10 Mar 2026
Viewed by 387
Abstract
Background: In cancer surgery, resection of the primary tumor and regional lymph nodes (LNs) is critical. Adequate LN examination is essential to detect metastasis, which determines the cancer stage. Fluorescence emission allows for visual differentiation and rapid monitoring of LNs. Methods: Cancer tissue is stained with a fluorescent dye (indocyanine green, ICG) to identify LNs. Fluorescence is induced from the stained LNs using LED light, and a photosensor coupled with a speaker detects the fluorescence signal and triggers an audible alarm. Filters are applied to prevent false alarms. Results: Upon LN detection, an alarm is emitted from the speaker, and the results are recorded using the LED and a digital multimeter (DMM). In clinical trials, ICG is injected to induce LN fluorescence staining, followed by LED irradiation to induce the fluorescent wavelength and verify LN imaging. Discussion: In clinical trials, ICG stains both LNs and blood vessels, which may lead to false positives. To address this limitation, artificial intelligence algorithms can be trained to specifically identify LNs. Conclusions: Detection of fluorescence wavelengths via photosensors allows for rapid identification of LNs, confirmed through an audible alarm, thereby reducing surgical time. This method shows potential for broad application in cancer surgery. Full article
(This article belongs to the Collection Biomedical Imaging and Sensing)

25 pages, 2358 KB  
Article
Near-Merger and Contextual Sensitivity in the Perception of /n-l/ in Sichuan Mandarin
by Minghao Zheng, Allen Shamsi and Ratree Wayland
Brain Sci. 2026, 16(2), 155; https://doi.org/10.3390/brainsci16020155 - 29 Jan 2026
Viewed by 422
Abstract
Background/Objectives: Sichuan Mandarin is often described as exhibiting overlap or merger between word-initial /n/ and /l/, but perceptual sensitivity across phonetic contexts remains underexplored. This study examines whether perception of the /n-l/ contrast varies by vowel context and listener experience. Methods: Thirty-two Sichuan Mandarin listeners completed categorical identification and same–different AX discrimination tasks using seven-step /n/ → /l/ continua derived from native-speaker productions in /i/ and /a/ contexts. Sensitivity, response bias, accuracy, and response times were analyzed alongside individual differences. Acoustic properties of the stimuli were quantified using spectral and amplitude-based measures. Results: Listeners showed overall reduced sensitivity to the /n-l/ contrast, with substantially stronger perceptual differentiation in /i/ than in /a/ contexts. Bias patterns were comparable across contexts, indicating sensitivity-driven effects. Acoustic analyses showed more robust cue structure in the /i/ continuum. Age, education, and Standard Mandarin experience modulated response efficiency but did not eliminate the vowel asymmetry. Conclusions: Results support a context-dependent near-merger of /n/ and /l/, shaped by acoustic cue availability and experience-based cue exploitation. Full article
(This article belongs to the Special Issue Language Perception and Processing)
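The sensitivity and response-bias measures named in the Methods are presumably the standard signal-detection indices for same–different responses; a minimal sketch (function name hypothetical):

```python
from statistics import NormalDist

def dprime_and_bias(hit_rate, fa_rate):
    """Signal-detection sensitivity d' = z(H) - z(FA) and criterion
    c = -(z(H) + z(FA)) / 2 from hit and false-alarm rates. Rates of
    exactly 0 or 1 should be corrected before calling (e.g. 1/(2N))."""
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, criterion
```

Computed per vowel context, these separate genuine perceptual differentiation (d′) from a mere tendency to answer "same" or "different" (c), which is how the paper can attribute the /i/ vs. /a/ asymmetry to sensitivity rather than bias.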

15 pages, 1022 KB  
Article
The Influence of Contextual Predictability on Word Segmentation in Chinese Reading: An Eye-Tracking Study
by Mengchuan Song, Wenxin Zhang, Yashu Cao and Jingxin Wang
Behav. Sci. 2026, 16(2), 185; https://doi.org/10.3390/bs16020185 - 27 Jan 2026
Viewed by 428
Abstract
Word segmentation is a fundamental component of lexical processing, and Chinese reading—lacking inter-word spacing—requires readers to identify word boundaries based on prior experience. Previous studies have shown that contextual predictability facilitates lexical identification in Chinese reading; however, its influence on word segmentation remains unclear. This study used eye-tracking to examine the relationship between contextual predictability and readers’ segmentation preferences during Chinese sentence reading. Overlapping ambiguous three-character strings (e.g., 花生长) were used as the region of interest (ROI), and a 2 (segmentation type: AB-C (e.g., 花生/长) vs. A-BC (e.g., 花/生长)) × 2 (contextual predictability: high vs. low) within-subjects design was adopted. A total of 76 native Chinese speakers completed the task. The results showed that contextual predictability had a significant effect on skipping probability: Highly predictable target character strings were skipped more often than low-predictability ones. However, contextual predictability did not influence any other eye-movement measure. In contrast, segmentation type produced consistent effects across all measures, with shorter reading times for AB-C than for A-BC, indicating a stable preference for two-character segmentation. More importantly, no interaction emerged between contextual predictability and segmentation type, and Bayesian model comparison further supported this conclusion. These findings suggest that Chinese reading involves a robust preference for AB-C segmentation and that contextual predictability and word segmentation operate as independent processes, with predictability exerting minimal influence on word segmentation during reading. This result supports the Chinese Reading Model (CRM). Full article
(This article belongs to the Section Developmental Psychology)

16 pages, 5040 KB  
Article
Phonetic Training and Talker Variability in the Perception of Spanish Stop Consonants
by Iván Andreu Rascón
Languages 2026, 11(1), 1; https://doi.org/10.3390/languages11010001 - 23 Dec 2025
Viewed by 974
Abstract
This study examined how variability in phonetic training input (high vs. low) influences the perception and acquisition of Spanish stop consonants by English-speaking beginners. A total of 128 participants completed 20 online identification sessions targeting /p, t, k, b, d, g/. In the high-variability condition (HVPT), learners heard tokens from six speakers, and in the low-variability condition (LVPT), all input came from a single speaker. Training followed an interleaved-talker design with immediate feedback, and perceptual learning was evaluated using a Bayesian hierarchical logistic regression analysis. Results showed improvement across sessions for both groups, with identification accuracy reaching ceiling by the end of the training sessions. Differences between HVPT and LVPT were small: LVPT showed steeper categorization trajectories in some cases due to slightly lower baselines, but neither condition yielded a measurable advantage. The pattern observed suggests that for boundary-shift contrasts such as Spanish stops, perceptual improvements are driven primarily by input quantity rather than variability. This interpretation aligns with input-based models of L2 speech learning (SLM-r, L2LP) and underscores the role of repeated exposure in restructuring phonological categories. Full article
(This article belongs to the Special Issue The Impacts of Phonetically Variable Input on Language Learning)

11 pages, 216 KB  
Article
RNN-Based F0 Estimation Method with Attention Mechanism
by Ales Jandera, Martin Muzelak and Tomas Skovranek
Information 2025, 16(12), 1089; https://doi.org/10.3390/info16121089 - 7 Dec 2025
Cited by 2 | Viewed by 706
Abstract
Fundamental frequency estimation, also known as F0 estimation, is a crucial task in speech processing and analysis, with significant applications in areas such as speech recognition, speaker identification, and emotion detection. Traditional algorithms, while effective, often encounter challenges in real-time environments due to computational limitations. Recent advances in deep learning, especially in the use of recurrent neural networks (RNNs), have opened new opportunities for enhancing F0 estimation accuracy and efficiency. This paper introduces a novel RNN-based F0 estimation method with an attention mechanism and evaluates its performance against selected state-of-the-art F0 estimation approaches, including standard baseline methods, as well as neural-network-based regression and classification models. By integrating attention mechanisms, the model eliminates the necessity for post-processing steps and enables a more efficient seq2scal estimation process. While the self-attention mechanism used in Transformers captures all pairwise temporal dependencies at a quadratic computational cost, the proposed method’s implementation of the attention mechanism enables it to selectively focus on the most relevant acoustic cues for F0 prediction, enhancing robustness without increasing the model’s complexity. Experimental results using the LibriSpeech and Common Voice datasets demonstrate superior computational efficiency of the proposed method compared to current state-of-the-art RNN-based seq2seq models, while maintaining comparable estimation accuracy. Furthermore, the proposed method achieves the lowest computational complexity among all compared models while maintaining high accuracy, making it suitable for low-latency, resource-limited deployments and competitive even with standard baseline methods such as pYIN or CREPE. Finally, its performance in terms of RMSE and FLOPs demonstrates the potential of attention mechanisms and sequence modelling in achieving high accuracy alongside lightweight F0 estimation suitable for modern speech processing applications, in line with the growing trend towards deploying intelligent systems on resource-constrained devices. Full article
(This article belongs to the Special Issue Signal Processing and Machine Learning, 2nd Edition)
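The attention-pooled "seq2scal" idea—collapsing a sequence of RNN hidden states into one context vector from which a single scalar F0 value is regressed—can be sketched as follows. This is an illustrative NumPy reconstruction under assumed additive-attention parameters, not the paper's architecture:

```python
import numpy as np

def attention_pool(h, w, v):
    """Additive attention over RNN hidden states h (T x D):
    score_t = v . tanh(W h_t), weights = softmax(scores); returns the
    weighted context vector and the weights. A linear head on the
    context would then yield the scalar F0 estimate ('seq2scal')."""
    scores = np.tanh(h @ w.T) @ v          # (T,)
    e = np.exp(scores - scores.max())      # numerically stable softmax
    a = e / e.sum()
    return a @ h, a                        # context (D,), weights (T,)
```

Unlike Transformer self-attention, which scores all T×T frame pairs, this pooling is linear in the number of frames, consistent with the complexity argument made in the abstract.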

20 pages, 1485 KB  
Article
Prevalence of Neurocognitive Disorders in the Elderly Quechua Population Using the Q-RUDAS
by Jonathan Zegarra-Valdivia, Ruth Diana Mamani Quispe, José Chinoapaza Turpo, Carmen Paredes-Manrique, Marco Malaga, Oscar Mamani-Benito, Rosa Montesinos, Nilton Custodio and Giuseppe Tosto
Brain Sci. 2025, 15(12), 1307; https://doi.org/10.3390/brainsci15121307 - 4 Dec 2025
Viewed by 918
Abstract
Background: The Rowland Universal Dementia Assessment Scale (RUDAS) is a validated cognitive screening tool for illiterate and low-educated individuals, adaptable across languages and cultures. In Peru, we adapted it for Quechua speakers (Q-RUDAS) to assess cognitive status in older adults. Objective: We aimed to estimate the prevalence of neurocognitive disorders—mild cognitive impairment (MCI) and dementia—among Quechua-speaking older adults in one of the most socially vulnerable districts of Peru using the Quechua version of the Rowland Universal Dementia Assessment Scale (Q-RUDAS), a brief cognitive screening tool validated in Peru. Methods: We studied 511 participants from Puno, a region in the southern Peruvian Andes (mean age 65.04 ± 6.73 years; 80.4% females), collecting sociodemographic data and Q-RUDAS scores. After excluding 18 individuals with medical conditions that could affect cognitive performance, such as neurological, psychiatric, or cerebrovascular disorders, 493 completed the test. Results: All Q-RUDAS items were well understood, although over 50% of participants struggled with visuospatial construction. The mean Q-RUDAS score was 26.01 ± 2.71. Of the participants, 446 (90.5%) scored within normal ranges (26.67 ± 1.92), 41 (8.3%) were classified as having mild cognitive impairment (MCI) (21.49 ± 1.92), and 6 (1.2%) as having dementia (17.00 ± 2.71) based on established Q-RUDAS cut-offs. Urban participants scored higher. The prevalence of MCI and dementia combined was 9.52%. Conclusions: The Q-RUDAS is a culturally sensitive tool that can support the identification of cognitive impairment in Indigenous populations. Our findings highlight the need for further cross-validation studies to refine diagnostic accuracy in Quechua-speaking populations. Full article
(This article belongs to the Section Neurodegenerative Diseases)

16 pages, 1427 KB  
Article
Acoustic Vector Sensor–Based Speaker Diarization Using Sound Intensity Analysis for Two-Speaker Dialogues
by Grzegorz Szwoch, Józef Kotus and Szymon Zaporowski
Appl. Sci. 2025, 15(23), 12780; https://doi.org/10.3390/app152312780 - 3 Dec 2025
Viewed by 2418
Abstract
Speaker diarization is a key component of automatic speech recognition (ASR) systems, particularly in interview scenarios where speech segments must be assigned to individual speakers. This study presents a diarization algorithm based on sound intensity analysis using an Acoustic Vector Sensor (AVS). The algorithm determines the azimuth of each speaker, defines directional beams, and detects speaker activity by analyzing intensity distributions within each beam, enabling identification of both single and overlapping speech segments. A dedicated dataset of interview recordings involving five speakers was created for evaluation. Performance was assessed using the Diarization Error Rate (DER) metric and compared with the state-of-the-art Pyannote.audio system. The proposed AVS-based method achieved a lower DER (0.112) than Pyannote (0.213) when overlapping speech was excluded, and a DER of 0.187 with overlapping speech included, demonstrating improved diarization accuracy and better handling of overlapping speech. The algorithm does not require training, operates independently of speaker-specific features, and can be adapted to various acoustic conditions. The results confirm that AVS-based diarization provides a robust and interpretable alternative to neural approaches, particularly suitable for structured two-speaker dialogues such as physician–patient or interviewer–interviewee scenarios. Full article
(This article belongs to the Special Issue Advances in Audio Signal Processing)
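The DER metric used here is conventionally the sum of missed speech, false alarm, and speaker-confusion time over total reference speech time. A frame-level sketch, assuming hypothesis labels are already mapped to reference speakers and ignoring the usual forgiveness collar:

```python
def der(ref, hyp):
    """Frame-level Diarization Error Rate for two aligned label sequences
    (one speaker id per frame, None = non-speech):
    DER = (missed speech + false alarm + speaker confusion) / ref speech."""
    miss = fa = confusion = speech = 0
    for r, h in zip(ref, hyp):
        if r is not None:
            speech += 1
            if h is None:
                miss += 1      # reference speech, no hypothesis speaker
            elif h != r:
                confusion += 1 # wrong speaker assigned
        elif h is not None:
            fa += 1            # hypothesis speech during non-speech
    return (miss + fa + confusion) / speech
```

Full evaluation toolkits additionally search for the optimal reference-to-hypothesis speaker mapping before scoring; this sketch omits that step for clarity.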

16 pages, 1176 KB  
Article
Hearing Tones, Missing Boundaries: Cross-Level Selective Transfer of Prosodic Boundaries Among Chinese–English Learners
by Lan Fang, Zilong Li, Keke Yu, John W. Schwieter and Ruiming Wang
Behav. Sci. 2025, 15(12), 1605; https://doi.org/10.3390/bs15121605 - 21 Nov 2025
Viewed by 551
Abstract
Second language (L2) learners often struggle to process prosodic boundaries, which are essential for speech comprehension. This study investigated the nature of these difficulties and how first language (L1) cue-weighting strategies transfer to L2 processing among Chinese (Mandarin)–English learners. The rising pitch that cues English phrase boundaries acoustically overlaps with functionally distinct Chinese lexical tones. Through two experiments comparing Chinese–English learners and native English speakers, we assessed sensitivity across lexical constituent, phrase, and sentence boundaries and manipulated acoustic cues (pause, lengthening, pitch) to estimate their perceptual weights during phrase-boundary identification. L2 learners showed reduced discrimination sensitivity only at the phrase level, performing comparably to native speakers at lexical constituent and sentence boundaries. For phrase boundaries, learners over-relied on pitch and under-relied on pre-boundary lengthening compared to native speakers, though both groups weighted pauses strongly. This selective deficit implicates the transfer of L1 cue-weighting strategies more than a global knowledge deficit. Our findings support a dynamic transfer model in which L1 sensitivity to lexical tone transfers to L2 phrase perception, elevating the weight of pitch. While learners show partial adaptation, these results refine the Cue-Weighting Transfer Hypothesis by demonstrating that L2 prosodic acquisition involves both integrated L1 transfer and L2-driven reweighting strategies. Full article

20 pages, 6646 KB  
Article
Machine Unlearning for Speaker-Agnostic Detection of Gender-Based Violence Condition in Speech
by Emma Reyner-Fuentes, Esther Rituerto-González and Carmen Peláez-Moreno
Appl. Sci. 2025, 15(22), 12270; https://doi.org/10.3390/app152212270 - 19 Nov 2025
Viewed by 1003
Abstract
Gender-based violence is a pervasive social and public health issue that severely impacts women’s mental health, often leading to conditions such as anxiety, depression, post-traumatic stress disorder, and substance abuse. The co-occurrence of these mental health conditions can therefore point to someone being a victim of gender-based violence. While speech-based artificial intelligence tools appear as a promising solution for mental health screening, their performance often deteriorates when encountering speech from previously unseen speakers, a sign that speaker traits may be confounding factors. This study introduces a speaker-agnostic approach to detecting the gender-based violence victim condition—defined as self-identified survivors who exhibit pre-clinical PTSD symptom levels—from speech, aiming to develop robust artificial intelligence models capable of generalizing across speakers. By employing domain-adversarial training, we reduce the influence of speaker identity on model predictions, and we achieve a 26.95% relative reduction in speaker identification accuracy while improving gender-based violence victim condition classification accuracy by 6.37% (relative). These results suggest that our models effectively capture paralinguistic biomarkers linked to the gender-based violence victim condition, rather than speaker-specific traits. Additionally, the model’s predictions show moderate correlation with pre-clinical post-traumatic stress disorder symptoms, supporting the relevance of speech as a non-invasive tool for mental health monitoring. This work lays the foundation for ethical, privacy-preserving artificial intelligence systems to support clinical screening of gender-based violence survivors. Full article
(This article belongs to the Section Applied Biosciences and Bioengineering)
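Domain-adversarial training of the kind described typically relies on a gradient reversal layer between the shared feature extractor and the speaker classifier. A minimal, framework-agnostic sketch (function names hypothetical; real implementations express this as a custom autograd op):

```python
import numpy as np

def grl_forward(x):
    """Identity in the forward pass: features reach the speaker
    classifier unchanged."""
    return x

def grl_backward(speaker_grad, lam=1.0):
    """Reverse (and scale by lam) the speaker-classifier gradient before
    it flows into the shared feature extractor, so the features are
    pushed to *hurt* speaker identification while the main-task
    (victim-condition) gradient passes through normally."""
    return -lam * np.asarray(speaker_grad)
```

The net effect is the trade-off the abstract reports: speaker identification accuracy drops while condition classification improves, because the shared representation sheds speaker-specific cues.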

25 pages, 2538 KB  
Article
Fic2Bot: A Scalable Framework for Persona-Driven Chatbot Generation from Fiction
by Sua Kang, Chaelim Lee, Subin Jung and Minsu Lee
Electronics 2025, 14(19), 3859; https://doi.org/10.3390/electronics14193859 - 29 Sep 2025
Viewed by 1777
Abstract
This paper presents Fic2Bot, an end-to-end framework that automatically transforms raw novel text into in-character chatbots by combining scene-level retrieval with persona profiling. Unlike conventional RAG-based systems that emphasize factual accuracy but neglect stylistic coherence, Fic2Bot ensures both factual grounding and consistent persona expression without any manual intervention. The framework integrates (1) Major Entity Identification (MEI) for robust coreference resolution, (2) scene-structured retrieval for precise contextual grounding, and (3) stylistic and sentiment profiling to capture linguistic and emotional traits of each character. Experiments conducted on novels from diverse genres show that Fic2Bot achieves robust entity resolution, more relevant retrieval, highly accurate speaker attribution, and stronger persona consistency in multi-turn dialogues. These results highlight Fic2Bot as a scalable and domain-agnostic framework for persona-driven chatbot generation, with potential applications in interactive roleplaying, language and literary studies, and entertainment. Full article
(This article belongs to the Special Issue Feature Papers in Artificial Intelligence)

20 pages, 776 KB  
Article
Who Speaks to Whom? An LLM-Based Social Network Analysis of Tragic Plays
by Aura Cristina Udrea, Stefan Ruseti, Laurentiu-Marian Neagu, Ovio Olaru, Andrei Terian and Mihai Dascalu
Electronics 2025, 14(19), 3847; https://doi.org/10.3390/electronics14193847 - 28 Sep 2025
Cited by 1 | Viewed by 1124
Abstract
The study of dramatic plays has long relied on qualitative methods to analyze character interactions, with little attention to the structural patterns of communication involved. Our approach bridges NLP and literary studies, enabling scalable, data-driven analysis of interaction patterns and power structures in drama. We propose a novel method to supplement addressee identification in tragedies using Large Language Models (LLMs). Unlike conventional Social Network Analysis (SNA) approaches, which often diminish dialogue dynamics by relying on co-occurrence or adjacency heuristics, our LLM-based method accurately records directed speech acts, joint addresses, and listener interactions. In a preliminary evaluation of an annotated multilingual dataset of 14 scenes from nine plays in four languages, our top-performing LLM (i.e., Llama3.3-70B) achieved an F1-score of 88.75% (P = 94.81%, R = 84.72%), an exact match of 77.31%, and an 86.97% partial match with human annotations, where partial match indicates any overlap between predicted and annotated receiver lists. Through automatic extraction of speaker–addressee relations, our method provides preliminary evidence for the potential scalability of SNA for literary analyses, as well as insights into power relations, influence, and isolation of characters in tragedies, which we further visualize by rendering social network graphs. Full article
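The exact- and partial-match criteria described for receiver lists can be computed with a small helper (illustrative sketch; the function name is hypothetical):

```python
def addressee_match(pred, gold):
    """pred, gold: lists of receiver lists, one per utterance.
    Exact match: identical addressee sets; partial match: any overlap
    between the predicted and annotated receivers. Returns the two
    fractions over all utterances."""
    exact = sum(set(p) == set(g) for p, g in zip(pred, gold))
    partial = sum(bool(set(p) & set(g)) for p, g in zip(pred, gold))
    n = len(gold)
    return exact / n, partial / n
```

Every exact match is also a partial match, so the partial-match rate (86.97% here) always bounds the exact-match rate (77.31%) from above.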

22 pages, 2431 KB  
Article
Perceptual Plasticity in Bilinguals: Language Dominance Reshapes Acoustic Cue Weightings
by Annie Tremblay and Hyoju Kim
Brain Sci. 2025, 15(10), 1053; https://doi.org/10.3390/brainsci15101053 - 27 Sep 2025
Viewed by 1214
Abstract
Background/Objectives: Speech perception is shaped by language experience, with listeners learning to selectively attend to acoustic cues that are informative in their language. This study investigates how language dominance, a proxy for long-term language experience, modulates cue weighting in highly proficient Spanish–English bilinguals’ perception of English lexical stress. Methods: We tested 39 bilinguals with varying dominance profiles and 40 monolingual English speakers in a stress identification task using auditory stimuli that independently manipulated vowel quality, pitch, and duration. Results: Bayesian logistic regression models revealed that, compared to monolinguals, bilinguals relied less on vowel quality and more on pitch and duration, mirroring cue distributions in Spanish versus English. Critically, cue weighting within the bilingual group varied systematically with language dominance: English-dominant bilinguals patterned more like monolingual English listeners, showing increased reliance on vowel quality and decreased reliance on pitch and duration, whereas Spanish-dominant bilinguals retained a cue weighting that was more Spanish-like. Conclusions: These results support experience-based models of speech perception and provide behavioral evidence that bilinguals’ perceptual attention to acoustic cues remains flexible and dynamically responsive to long-term input. These results are in line with a neurobiological account of speech perception in which attentional and representational mechanisms adapt to changes in the input. Full article
(This article belongs to the Special Issue Language Perception and Processing)
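The cue-weighting idea—how strongly each acoustic cue (vowel quality, pitch, duration) drives stress identification—can be approximated by a plain logistic regression whose normalized coefficient magnitudes serve as relative cue weights. This is a rough stand-in for the Bayesian hierarchical models the study actually fits, and the function name is hypothetical:

```python
import numpy as np

def cue_weights(X, y, lr=0.1, steps=5000):
    """X: trials x cues (e.g. vowel quality, pitch, duration); y: 0/1
    stress responses. Fits logistic regression on standardized cues via
    gradient descent and returns |coefficient| shares (summing to 1)
    as relative cue weights."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # put cues on one scale
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    mag = np.abs(w)
    return mag / mag.sum()
```

Fitting this per listener group would yield comparable weight profiles, e.g. a larger vowel-quality share for English-dominant listeners and larger pitch/duration shares for Spanish-dominant ones, mirroring the pattern the abstract reports.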
