Search Results (115)

Search Parameters:
Keywords = speaker identification

18 pages, 720 KB  
Article
The Effect of Second Language Immersion Experience on the Perception of VOT by Saudi Arabic Learners of English
by Wafaa Alshangiti
Languages 2026, 11(5), 81; https://doi.org/10.3390/languages11050081 - 22 Apr 2026
Abstract
Increased experience with a second language (L2) can affect one’s speech perception and production. Some studies have suggested that experience does not affect the production of English bilabial stops by Arabic speakers. They produce the English bilabial stops /p/ and /b/ as the Arabic /b/, which differs in VOT. However, the effect of English experience on the perception of English bilabial stops remains underinvestigated. This study examines the effect of L2 immersion experience on the perception of the English stops /p/–/b/ to investigate whether the lack of /p/ in Arabic can affect the perception of the /p/–/b/ contrast and whether L2 experience shifts the category boundary toward that of native speakers. Sixty-six participants, comprising two groups of Arabic speakers with differing L2 experience and a control group of native English speakers, completed identification and discrimination tasks using the /p/–/b/ VOT continuum. The regression analysis showed that listeners with more L2 experience (i.e., ≥3 years in the UK) had a category boundary closer to that of native listeners than those with less L2 experience. However, category discrimination accuracy did not differ significantly between the Arabic groups. The results highlight the importance of L2 immersion experience in altering VOT perceptual strategies, which can help in designing future training studies that focus on VOT perception as an L2 phonetic cue. Full article
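The boundary-estimation step described above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name `category_boundary` and the optimizer settings are hypothetical. It fits a logistic identification function to the proportion of /p/ responses along the VOT continuum and returns the 50% crossover point in milliseconds.

```python
import numpy as np

def category_boundary(vot_ms, prop_p, lr=0.1, steps=5000):
    """Fit a logistic identification function P('/p/') = sigmoid(b0 + b1*z)
    on standardized VOT values and return the 50% crossover in ms."""
    x = np.asarray(vot_ms, dtype=float)
    y = np.asarray(prop_p, dtype=float)
    mu, sd = x.mean(), x.std()
    z = (x - mu) / sd                      # standardize for stable training
    b0 = b1 = 0.0
    for _ in range(steps):                 # plain gradient descent
        p = 1.0 / (1.0 + np.exp(-(b0 + b1 * z)))
        g = p - y
        b0 -= lr * g.mean()
        b1 -= lr * (g * z).mean()
    return mu - sd * b0 / b1               # boundary where P = 0.5
```

Each listener group's estimated boundary can then be compared against the native-speaker group's mean boundary.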

32 pages, 1293 KB  
Article
Early Detection of Re-Identification Risk in Multi-Turn Dialogues via Entity-Aware Evidence Accumulation
by Yeongseop Lee, Seungun Park and Yunsik Son
Appl. Sci. 2026, 16(8), 3680; https://doi.org/10.3390/app16083680 - 9 Apr 2026
Viewed by 356
Abstract
In multi-turn conversational AI, individually innocuous personally identifiable information (PII) fragments disclosed across successive turns can accumulate into a re-identification risk that no single utterance reveals on its own. Existing PII detectors operate on isolated utterances and therefore cannot track this cross-turn evidence build-up. We propose a stateful middleware guardrail whose core design principle is speaker-attributed entity isolation: every extracted PII fragment is attributed to its originating conversational participant, and evidence is accumulated in entity-isolated subgraphs that prevent cross-entity contamination. The system signals re-identification onset t_pred at the earliest turn where combination-based rules grounded in the uniqueness literature are satisfied. On a 184-record template-synthetic evaluation corpus, the gated NER configuration leads on primary timeliness (OW@5 = 73.4%, MAE = 1.357 turns); the full system achieves OW@5 = 70.7% with MAE = 2.442 turns as an alternative operating mode for ambiguity-sensitive disclosure patterns. We further evaluate behavior on a 300-record mutation stress set, test RULE_B on the ABCD external corpus, and supplement RULE_A evaluation with both a proxy-labeled transfer analysis on PersonaChat and a manual annotation study on 151 Switchboard dialogues. The reported results should be interpreted as an initial empirical reference point rather than a sufficient endpoint for autonomous runtime enforcement. Full article
(This article belongs to the Special Issue Advances in Intelligent Systems—2nd edition)
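The entity-isolated accumulation idea can be illustrated with a toy sketch. All names here are hypothetical, and the combination rule is a made-up stand-in for the paper's actual rules (RULE_A/RULE_B), which are grounded in the uniqueness literature:

```python
# Illustrative combination rule: a zip code, birth date, and gender
# together act as a quasi-identifier (a stand-in, not the paper's rule).
QUASI_ID_RULE = {"zip_code", "birth_date", "gender"}

def earliest_risk_turn(turns):
    """turns: list of (speaker, set_of_pii_types), one entry per turn.
    PII is accumulated per speaker in isolated sets, so fragments from
    different participants never combine; returns the first turn index
    (1-based) at which one speaker's set satisfies the rule, else None."""
    evidence = {}
    for t, (speaker, pii) in enumerate(turns, start=1):
        acc = evidence.setdefault(speaker, set())
        acc |= pii
        if QUASI_ID_RULE <= acc:   # rule satisfied -> signal onset here
            return t
    return None
```

Note the isolation property: pooling fragments from speakers A and B could trigger the rule earlier, which the per-entity subgraphs are specifically designed to prevent.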

23 pages, 5532 KB  
Article
Perception and Production of the Aspiration Contrast in Mandarin Retroflex Affricates [tʂ] and [tʂʰ] by Adult Spanish Speakers Learning Mandarin Chinese: An Exploratory Study
by Guilherme Galhoz Maria Roque and Quanzhen Zhang
Languages 2026, 11(4), 69; https://doi.org/10.3390/languages11040069 - 2 Apr 2026
Viewed by 362
Abstract
This exploratory study examines the perception and production of the aspiration contrast in Mandarin voiceless retroflex affricates zh [tʂ] and ch [tʂʰ] by ten adult Spanish speakers (three Peruvian, seven Chilean) at Nanjing University. Participants completed a perception identification task and a production reading task using the same set of 128 syllables. Voice Onset Time (VOT) measurements from the production task were converted to binary classifications for cross-modality comparison. Perception accuracy was moderately high (zh [tʂ]: 84.43%; ch [tʂʰ]: 82.39%), whilst production accuracy was substantially lower (zh [tʂ]: 32.61%; ch [tʂʰ]: 19.15% within native VOT ranges). Participants maintained the aspiration contrast (zh [tʂ] = 58 ms, ch [tʂʰ] = 125 ms) but consistently underproduced VOT compared to native speakers (zh [tʂ] = 67 ms, ch [tʂʰ] = 164 ms). Perception patterns align with Category Goodness (CG) assimilation within PAM-L2: both Mandarin sounds map to Spanish [tʃ] but with different goodness-of-fit, enabling moderate discrimination. Production follows SLM-r predictions, with learners developing a Composite L1–L2 Category that maintains the aspiration contrast but fails to establish new phonetic categories. The small sample size (n = 10) precluded robust statistical testing of individual differences. The perception–production asymmetry supports independent modality development in L2 phonetic acquisition. Full article

31 pages, 8029 KB  
Article
A Novel Fluorescence-Triggered Auditory Feedback Photosensor for Precision Lymph Node Mapping
by Kicheol Yoon, Hyunjun Son, Hari Kang, Sangyun Lee, Tae-Hyeon Lee, Won-Suk Lee and Kwang Gi Kim
Sensors 2026, 26(6), 1745; https://doi.org/10.3390/s26061745 - 10 Mar 2026
Viewed by 387
Abstract
Background: In cancer surgery, resection of the primary tumor and regional lymph nodes (LNs) is critical. Adequate LN examination is essential to detect metastasis, which determines the cancer stage. Fluorescence emission allows for visual differentiation and rapid monitoring of LNs. Methods: Cancer tissue is stained with a fluorescent dye (indocyanine green, ICG) to identify LNs. Fluorescence is induced from the stained LNs using LED light, and a photosensor coupled with a speaker detects the fluorescence signal and triggers an audible alarm. Filters are applied to prevent false alarms. Results: Upon LN detection, an alarm is emitted from the speaker, and the results are recorded using the LED and a digital multimeter (DMM). In clinical trials, ICG is injected to induce LN fluorescence staining, followed by LED irradiation to induce the fluorescent wavelength and verify LN imaging. Discussion: In clinical trials, ICG stains both LNs and blood vessels, which may lead to false positives. To address this limitation, artificial intelligence algorithms can be trained to specifically identify LNs. Conclusions: Detection of fluorescence wavelengths via photosensors allows for rapid identification of LNs, confirmed through an audible alarm, thereby reducing surgical time. This method shows potential for broad application in cancer surgery. Full article
(This article belongs to the Collection Biomedical Imaging and Sensing)

25 pages, 2358 KB  
Article
Near-Merger and Contextual Sensitivity in the Perception of /n-l/ in Sichuan Mandarin
by Minghao Zheng, Allen Shamsi and Ratree Wayland
Brain Sci. 2026, 16(2), 155; https://doi.org/10.3390/brainsci16020155 - 29 Jan 2026
Viewed by 422
Abstract
Background/Objectives: Sichuan Mandarin is often described as exhibiting overlap or merger between word-initial /n/ and /l/, but perceptual sensitivity across phonetic contexts remains underexplored. This study examines whether perception of the /n-l/ contrast varies by vowel context and listener experience. Methods: Thirty-two Sichuan Mandarin listeners completed categorical identification and same–different AX discrimination tasks using seven-step /n/ → /l/ continua derived from native-speaker productions in /i/ and /a/ contexts. Sensitivity, response bias, accuracy, and response times were analyzed alongside individual differences. Acoustic properties of the stimuli were quantified using spectral and amplitude-based measures. Results: Listeners showed overall reduced sensitivity to the /n-l/ contrast, with substantially stronger perceptual differentiation in /i/ than in /a/ contexts. Bias patterns were comparable across contexts, indicating sensitivity-driven effects. Acoustic analyses showed more robust cue structure in the /i/ continuum. Age, education, and Standard Mandarin experience modulated response efficiency but did not eliminate the vowel asymmetry. Conclusions: Results support a context-dependent near-merger of /n/ and /l/, shaped by acoustic cue availability and experience-based cue exploitation. Full article
(This article belongs to the Special Issue Language Perception and Processing)
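The sensitivity and response-bias measures named in the Methods are presumably the standard signal-detection indices for same–different responses; a minimal sketch (function name hypothetical):

```python
from statistics import NormalDist

def dprime_and_bias(hit_rate, fa_rate):
    """Signal-detection sensitivity d' = z(H) - z(FA) and criterion
    c = -(z(H) + z(FA)) / 2 from hit and false-alarm rates. Rates of
    exactly 0 or 1 should be corrected before calling (e.g. 1/(2N))."""
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)
    criterion = -(z(hit_rate) + z(fa_rate)) / 2
    return d_prime, criterion
```

Computed per vowel context, these separate genuine perceptual differentiation (d′) from a mere tendency to answer "same" or "different" (c), which is how the paper can attribute the /i/ vs. /a/ asymmetry to sensitivity rather than bias.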

15 pages, 1022 KB  
Article
The Influence of Contextual Predictability on Word Segmentation in Chinese Reading: An Eye-Tracking Study
by Mengchuan Song, Wenxin Zhang, Yashu Cao and Jingxin Wang
Behav. Sci. 2026, 16(2), 185; https://doi.org/10.3390/bs16020185 - 27 Jan 2026
Viewed by 428
Abstract
Word segmentation is a fundamental component of lexical processing, and Chinese reading—lacking inter-word spacing—requires readers to identify word boundaries based on prior experience. Previous studies have shown that contextual predictability facilitates lexical identification in Chinese reading; however, its influence on word segmentation remains unclear. This study used eye-tracking to examine the relationship between contextual predictability and readers’ segmentation preferences during Chinese sentence reading. Overlapping ambiguous three-character strings (e.g., 花生长) were used as the region of interest (ROI), and a 2 (segmentation type: AB-C (e.g., 花生/长) vs. A-BC (e.g., 花/生长)) × 2 (contextual predictability: high vs. low) within-subjects design was adopted. A total of 76 native Chinese speakers completed the task. The results showed that contextual predictability had a significant effect on skipping probability: Highly predictable target character strings were skipped more often than low-predictability ones. However, contextual predictability did not influence any other eye-movement measure. In contrast, segmentation type produced consistent effects across all measures, with shorter reading times for AB-C than for A-BC, indicating a stable preference for two-character segmentation. More importantly, no interaction emerged between contextual predictability and segmentation type, and Bayesian model comparison further supported this conclusion. These findings suggest that Chinese reading involves a robust preference for AB-C segmentation and that contextual predictability and word segmentation operate as independent processes, with predictability exerting minimal influence on word segmentation during reading. This result supports the Chinese Reading Model (CRM). Full article
(This article belongs to the Section Developmental Psychology)

16 pages, 5040 KB  
Article
Phonetic Training and Talker Variability in the Perception of Spanish Stop Consonants
by Iván Andreu Rascón
Languages 2026, 11(1), 1; https://doi.org/10.3390/languages11010001 - 23 Dec 2025
Viewed by 974
Abstract
This study examined how variability in phonetic training input (high vs. low) influences the perception and acquisition of Spanish stop consonants by English-speaking beginners. A total of 128 participants completed 20 online identification sessions targeting /p, t, k, b, d, g/. In the high-variability condition (HVPT), learners heard tokens from six speakers, and in the low-variability condition (LVPT), all input came from a single speaker. Training followed an interleaved-talker design with immediate feedback, and perceptual learning was evaluated using a Bayesian hierarchical logistic regression analysis. Results showed improvement across sessions for both groups, with identification accuracy reaching ceiling by the end of the training sessions. Differences between HVPT and LVPT were small: LVPT showed steeper categorization trajectories in some cases due to slightly lower baselines, but neither condition yielded a measurable advantage. The pattern observed suggests that for boundary-shift contrasts such as Spanish stops, perceptual improvements are driven primarily by input quantity rather than variability. This interpretation aligns with input-based models of L2 speech learning (SLM-r, L2LP) and underscores the role of repeated exposure in restructuring phonological categories. Full article
(This article belongs to the Special Issue The Impacts of Phonetically Variable Input on Language Learning)

11 pages, 216 KB  
Article
RNN-Based F0 Estimation Method with Attention Mechanism
by Ales Jandera, Martin Muzelak and Tomas Skovranek
Information 2025, 16(12), 1089; https://doi.org/10.3390/info16121089 - 7 Dec 2025
Cited by 2 | Viewed by 706
Abstract
Fundamental frequency estimation, also known as F0 estimation, is a crucial task in speech processing and analysis, with significant applications in areas such as speech recognition, speaker identification, and emotion detection. Traditional algorithms, while effective, often encounter challenges in real-time environments due to computational limitations. Recent advances in deep learning, especially in the use of recurrent neural networks (RNNs), have opened new opportunities for enhancing F0 estimation accuracy and efficiency. This paper introduces a novel RNN-based F0 estimation method with an attention mechanism and evaluates its performance against selected state-of-the-art F0 estimation approaches, including standard baseline methods, as well as neural-network-based regression and classification models. By integrating attention mechanisms, the model eliminates the necessity for post-processing steps and enables a more efficient seq2scal estimation process. While the self-attention mechanism used in Transformers captures all pairwise temporal dependencies at a quadratic computational cost, the proposed method’s implementation of the attention mechanism enables it to selectively focus on the most relevant acoustic cues for F0 prediction, enhancing robustness without increasing the model’s complexity. Experimental results using the LibriSpeech and Common Voice datasets demonstrate superior computational efficiency of the proposed method compared to current state-of-the-art RNN-based seq2seq models, while maintaining comparable estimation accuracy. Furthermore, the proposed method achieves the lowest computational complexity among all compared models while maintaining high accuracy, making it suitable for low-latency, resource-limited deployments and competitive even with standard baseline methods such as pYIN or CREPE. Finally, its performance in terms of RMSE and FLOPs demonstrates the potential of attention mechanisms and sequence modelling in achieving high accuracy alongside lightweight F0 estimation suitable for modern speech processing applications, in line with the growing trend towards deploying intelligent systems on resource-constrained devices. Full article
(This article belongs to the Special Issue Signal Processing and Machine Learning, 2nd Edition)
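The attention-pooled "seq2scal" idea—collapsing a sequence of RNN hidden states into one context vector from which a single scalar F0 value is regressed—can be sketched as follows. This is an illustrative NumPy reconstruction under assumed additive-attention parameters, not the paper's architecture:

```python
import numpy as np

def attention_pool(h, w, v):
    """Additive attention over RNN hidden states h (T x D):
    score_t = v . tanh(W h_t), weights = softmax(scores); returns the
    weighted context vector and the weights. A linear head on the
    context would then yield the scalar F0 estimate ('seq2scal')."""
    scores = np.tanh(h @ w.T) @ v          # (T,)
    e = np.exp(scores - scores.max())      # numerically stable softmax
    a = e / e.sum()
    return a @ h, a                        # context (D,), weights (T,)
```

Unlike Transformer self-attention, which scores all T×T frame pairs, this pooling is linear in the number of frames, consistent with the complexity argument made in the abstract.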

20 pages, 1485 KB  
Article
Prevalence of Neurocognitive Disorders in the Elderly Quechua Population Using the Q-RUDAS
by Jonathan Zegarra-Valdivia, Ruth Diana Mamani Quispe, José Chinoapaza Turpo, Carmen Paredes-Manrique, Marco Malaga, Oscar Mamani-Benito, Rosa Montesinos, Nilton Custodio and Giuseppe Tosto
Brain Sci. 2025, 15(12), 1307; https://doi.org/10.3390/brainsci15121307 - 4 Dec 2025
Viewed by 918
Abstract
Background: The Rowland Universal Dementia Assessment Scale (RUDAS) is a validated cognitive screening tool for illiterate and low-educated individuals, adaptable across languages and cultures. In Peru, we adapted it for Quechua speakers (Q-RUDAS) to assess cognitive status in older adults. Objective: We aimed to estimate the prevalence of neurocognitive disorders—mild cognitive impairment (MCI) and dementia—among Quechua-speaking older adults in one of the most socially vulnerable districts of Peru using the Quechua version of the Rowland Universal Dementia Assessment Scale (Q-RUDAS), a brief cognitive screening tool validated in Peru. Methods: We studied 511 participants from Puno, a region in the southern Peruvian Andes (mean age 65.04 ± 6.73 years; 80.4% females), collecting sociodemographic data and Q-RUDAS scores. After excluding 18 individuals with medical conditions that could affect cognitive performance, such as neurological, psychiatric, or cerebrovascular disorders, 493 completed the test. Results: All Q-RUDAS items were well understood, although over 50% of participants struggled with visuospatial construction. The mean Q-RUDAS score was 26.01 ± 2.71. Of the participants, 446 (90.5%) scored within normal ranges (26.67 ± 1.92), 41 (8.3%) were classified as having mild cognitive impairment (MCI) (21.49 ± 1.92), and 6 (1.2%) as having dementia (17.00 ± 2.71) based on established Q-RUDAS cut-offs. Urban participants scored higher. The prevalence of MCI and dementia combined was 9.52%. Conclusions: The Q-RUDAS is a culturally sensitive tool that can support the identification of cognitive impairment in Indigenous populations. Our findings highlight the need for further cross-validation studies to refine diagnostic accuracy in Quechua-speaking populations. Full article
(This article belongs to the Section Neurodegenerative Diseases)

16 pages, 1427 KB  
Article
Acoustic Vector Sensor–Based Speaker Diarization Using Sound Intensity Analysis for Two-Speaker Dialogues
by Grzegorz Szwoch, Józef Kotus and Szymon Zaporowski
Appl. Sci. 2025, 15(23), 12780; https://doi.org/10.3390/app152312780 - 3 Dec 2025
Viewed by 2418
Abstract
Speaker diarization is a key component of automatic speech recognition (ASR) systems, particularly in interview scenarios where speech segments must be assigned to individual speakers. This study presents a diarization algorithm based on sound intensity analysis using an Acoustic Vector Sensor (AVS). The algorithm determines the azimuth of each speaker, defines directional beams, and detects speaker activity by analyzing intensity distributions within each beam, enabling identification of both single and overlapping speech segments. A dedicated dataset of interview recordings involving five speakers was created for evaluation. Performance was assessed using the Diarization Error Rate (DER) metric and compared with the state-of-the-art Pyannote.audio system. The proposed AVS-based method achieved a lower DER (0.112) than Pyannote (0.213) when overlapping speech was excluded, and a DER of 0.187 with overlapping speech included, demonstrating improved diarization accuracy and better handling of overlapping speech. The algorithm does not require training, operates independently of speaker-specific features, and can be adapted to various acoustic conditions. The results confirm that AVS-based diarization provides a robust and interpretable alternative to neural approaches, particularly suitable for structured two-speaker dialogues such as physician–patient or interviewer–interviewee scenarios. Full article
(This article belongs to the Special Issue Advances in Audio Signal Processing)
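The DER metric used here is conventionally the sum of missed speech, false alarm, and speaker-confusion time over total reference speech time. A frame-level sketch, assuming hypothesis labels are already mapped to reference speakers and ignoring the usual forgiveness collar:

```python
def der(ref, hyp):
    """Frame-level Diarization Error Rate for two aligned label sequences
    (one speaker id per frame, None = non-speech):
    DER = (missed speech + false alarm + speaker confusion) / ref speech."""
    miss = fa = confusion = speech = 0
    for r, h in zip(ref, hyp):
        if r is not None:
            speech += 1
            if h is None:
                miss += 1      # reference speech, no hypothesis speaker
            elif h != r:
                confusion += 1 # wrong speaker assigned
        elif h is not None:
            fa += 1            # hypothesis speech during non-speech
    return (miss + fa + confusion) / speech
```

Full evaluation toolkits additionally search for the optimal reference-to-hypothesis speaker mapping before scoring; this sketch omits that step for clarity.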

16 pages, 1176 KB  
Article
Hearing Tones, Missing Boundaries: Cross-Level Selective Transfer of Prosodic Boundaries Among Chinese–English Learners
by Lan Fang, Zilong Li, Keke Yu, John W. Schwieter and Ruiming Wang
Behav. Sci. 2025, 15(12), 1605; https://doi.org/10.3390/bs15121605 - 21 Nov 2025
Viewed by 551
Abstract
Second language (L2) learners often struggle to process prosodic boundaries, which are essential for speech comprehension. This study investigated the nature of these difficulties and how first language (L1) cue-weighting strategies transfer to L2 processing among Chinese (Mandarin)–English learners. The rising pitch that cues English phrase boundaries acoustically overlaps with functionally distinct Chinese lexical tones. Through two experiments comparing Chinese–English learners and native English speakers, we assessed sensitivity across lexical constituent, phrase, and sentence boundaries and manipulated acoustic cues (pause, lengthening, pitch) to estimate their perceptual weights during phrase-boundary identification. L2 learners showed reduced discrimination sensitivity only at the phrase level, performing comparably to native speakers at lexical constituent and sentence boundaries. For phrase boundaries, learners over-relied on pitch and under-relied on pre-boundary lengthening compared to native speakers, though both groups weighted pauses strongly. This selective deficit implicates the transfer of L1 cue-weighting strategies more than a global knowledge deficit. Our findings support a dynamic transfer model in which L1 sensitivity to lexical tone transfers to L2 phrase perception, elevating the weight of pitch. While learners show partial adaptation, these results refine the Cue-Weighting Transfer Hypothesis by demonstrating that L2 prosodic acquisition involves both integrated L1 transfer and L2-driven reweighting strategies. Full article

20 pages, 6646 KB  
Article
Machine Unlearning for Speaker-Agnostic Detection of Gender-Based Violence Condition in Speech
by Emma Reyner-Fuentes, Esther Rituerto-González and Carmen Peláez-Moreno
Appl. Sci. 2025, 15(22), 12270; https://doi.org/10.3390/app152212270 - 19 Nov 2025
Viewed by 1003
Abstract
Gender-based violence is a pervasive social and public health issue that severely impacts women’s mental health, often leading to conditions such as anxiety, depression, post-traumatic stress disorder, and substance abuse. The co-occurrence of these mental health conditions can therefore point to someone being a victim of gender-based violence. While speech-based artificial intelligence tools appear as a promising solution for mental health screening, their performance often deteriorates when encountering speech from previously unseen speakers, a sign that speaker traits may be confounding factors. This study introduces a speaker-agnostic approach to detecting the gender-based violence victim condition—defined as self-identified survivors who exhibit pre-clinical PTSD symptom levels—from speech, aiming to develop robust artificial intelligence models capable of generalizing across speakers. By employing domain-adversarial training, we reduce the influence of speaker identity on model predictions, and we achieve a 26.95% relative reduction in speaker identification accuracy while improving gender-based violence victim condition classification accuracy by 6.37% (relative). These results suggest that our models effectively capture paralinguistic biomarkers linked to the gender-based violence victim condition, rather than speaker-specific traits. Additionally, the model’s predictions show moderate correlation with pre-clinical post-traumatic stress disorder symptoms, supporting the relevance of speech as a non-invasive tool for mental health monitoring. This work lays the foundation for ethical, privacy-preserving artificial intelligence systems to support clinical screening of gender-based violence survivors. Full article
(This article belongs to the Section Applied Biosciences and Bioengineering)
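Domain-adversarial training of the kind described typically relies on a gradient reversal layer between the shared feature extractor and the speaker classifier. A minimal, framework-agnostic sketch (function names hypothetical; real implementations express this as a custom autograd op):

```python
import numpy as np

def grl_forward(x):
    """Identity in the forward pass: features reach the speaker
    classifier unchanged."""
    return x

def grl_backward(speaker_grad, lam=1.0):
    """Reverse (and scale by lam) the speaker-classifier gradient before
    it flows into the shared feature extractor, so the features are
    pushed to *hurt* speaker identification while the main-task
    (victim-condition) gradient passes through normally."""
    return -lam * np.asarray(speaker_grad)
```

The net effect is the trade-off the abstract reports: speaker identification accuracy drops while condition classification improves, because the shared representation sheds speaker-specific cues.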

25 pages, 2538 KB  
Article
Fic2Bot: A Scalable Framework for Persona-Driven Chatbot Generation from Fiction
by Sua Kang, Chaelim Lee, Subin Jung and Minsu Lee
Electronics 2025, 14(19), 3859; https://doi.org/10.3390/electronics14193859 - 29 Sep 2025
Viewed by 1777
Abstract
This paper presents Fic2Bot, an end-to-end framework that automatically transforms raw novel text into in-character chatbots by combining scene-level retrieval with persona profiling. Unlike conventional RAG-based systems that emphasize factual accuracy but neglect stylistic coherence, Fic2Bot ensures both factual grounding and consistent persona expression without any manual intervention. The framework integrates (1) Major Entity Identification (MEI) for robust coreference resolution, (2) scene-structured retrieval for precise contextual grounding, and (3) stylistic and sentiment profiling to capture linguistic and emotional traits of each character. Experiments conducted on novels from diverse genres show that Fic2Bot achieves robust entity resolution, more relevant retrieval, highly accurate speaker attribution, and stronger persona consistency in multi-turn dialogues. These results highlight Fic2Bot as a scalable and domain-agnostic framework for persona-driven chatbot generation, with potential applications in interactive roleplaying, language and literary studies, and entertainment. Full article
(This article belongs to the Special Issue Feature Papers in Artificial Intelligence)

20 pages, 776 KB  
Article
Who Speaks to Whom? An LLM-Based Social Network Analysis of Tragic Plays
by Aura Cristina Udrea, Stefan Ruseti, Laurentiu-Marian Neagu, Ovio Olaru, Andrei Terian and Mihai Dascalu
Electronics 2025, 14(19), 3847; https://doi.org/10.3390/electronics14193847 - 28 Sep 2025
Cited by 1 | Viewed by 1124
Abstract
The study of dramatic plays has long relied on qualitative methods to analyze character interactions, with little attention to the structural patterns of communication involved. Our approach bridges NLP and literary studies, enabling scalable, data-driven analysis of interaction patterns and power structures in drama. We propose a novel method to supplement addressee identification in tragedies using Large Language Models (LLMs). Unlike conventional Social Network Analysis (SNA) approaches, which often diminish dialogue dynamics by relying on co-occurrence or adjacency heuristics, our LLM-based method accurately records directed speech acts, joint addresses, and listener interactions. In a preliminary evaluation of an annotated multilingual dataset of 14 scenes from nine plays in four languages, our top-performing LLM (i.e., Llama3.3-70B) achieved an F1-score of 88.75% (P = 94.81%, R = 84.72%), an exact match of 77.31%, and an 86.97% partial match with human annotations, where partial match indicates any overlap between predicted and annotated receiver lists. Through automatic extraction of speaker–addressee relations, our method provides preliminary evidence for the potential scalability of SNA for literary analyses, as well as insights into power relations, influence, and isolation of characters in tragedies, which we further visualize by rendering social network graphs. Full article
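The exact- and partial-match criteria described for receiver lists can be computed with a small helper (illustrative sketch; the function name is hypothetical):

```python
def addressee_match(pred, gold):
    """pred, gold: lists of receiver lists, one per utterance.
    Exact match: identical addressee sets; partial match: any overlap
    between the predicted and annotated receivers. Returns the two
    fractions over all utterances."""
    exact = sum(set(p) == set(g) for p, g in zip(pred, gold))
    partial = sum(bool(set(p) & set(g)) for p, g in zip(pred, gold))
    n = len(gold)
    return exact / n, partial / n
```

Every exact match is also a partial match, so the partial-match rate (86.97% here) always bounds the exact-match rate (77.31%) from above.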

22 pages, 2431 KB  
Article
Perceptual Plasticity in Bilinguals: Language Dominance Reshapes Acoustic Cue Weightings
by Annie Tremblay and Hyoju Kim
Brain Sci. 2025, 15(10), 1053; https://doi.org/10.3390/brainsci15101053 - 27 Sep 2025
Viewed by 1214
Abstract
Background/Objectives: Speech perception is shaped by language experience, with listeners learning to selectively attend to acoustic cues that are informative in their language. This study investigates how language dominance, a proxy for long-term language experience, modulates cue weighting in highly proficient Spanish–English bilinguals’ perception of English lexical stress. Methods: We tested 39 bilinguals with varying dominance profiles and 40 monolingual English speakers in a stress identification task using auditory stimuli that independently manipulated vowel quality, pitch, and duration. Results: Bayesian logistic regression models revealed that, compared to monolinguals, bilinguals relied less on vowel quality and more on pitch and duration, mirroring cue distributions in Spanish versus English. Critically, cue weighting within the bilingual group varied systematically with language dominance: English-dominant bilinguals patterned more like monolingual English listeners, showing increased reliance on vowel quality and decreased reliance on pitch and duration, whereas Spanish-dominant bilinguals retained a cue weighting that was more Spanish-like. Conclusions: These results support experience-based models of speech perception and provide behavioral evidence that bilinguals’ perceptual attention to acoustic cues remains flexible and dynamically responsive to long-term input. These results are in line with a neurobiological account of speech perception in which attentional and representational mechanisms adapt to changes in the input. Full article
(This article belongs to the Special Issue Language Perception and Processing)
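The cue-weighting idea—how strongly each acoustic cue (vowel quality, pitch, duration) drives stress identification—can be approximated by a plain logistic regression whose normalized coefficient magnitudes serve as relative cue weights. This is a rough stand-in for the Bayesian hierarchical models the study actually fits, and the function name is hypothetical:

```python
import numpy as np

def cue_weights(X, y, lr=0.1, steps=5000):
    """X: trials x cues (e.g. vowel quality, pitch, duration); y: 0/1
    stress responses. Fits logistic regression on standardized cues via
    gradient descent and returns |coefficient| shares (summing to 1)
    as relative cue weights."""
    X = np.asarray(X, dtype=float)
    X = (X - X.mean(axis=0)) / X.std(axis=0)   # put cues on one scale
    y = np.asarray(y, dtype=float)
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        g = p - y
        w -= lr * (X.T @ g) / len(y)
        b -= lr * g.mean()
    mag = np.abs(w)
    return mag / mag.sum()
```

Fitting this per listener group would yield comparable weight profiles, e.g. a larger vowel-quality share for English-dominant listeners and larger pitch/duration shares for Spanish-dominant ones, mirroring the pattern the abstract reports.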
