Search Results (66)

Search Parameters:
Keywords = sentence segmentation

31 pages, 1901 KiB  
Article
The Impact of Color Cues on Word Segmentation by L2 Chinese Readers: Evidence from Eye Movements
by Lin Li, Yaning Ji, Jingxin Wang and Kevin B. Paterson
Behav. Sci. 2025, 15(7), 904; https://doi.org/10.3390/bs15070904 - 3 Jul 2025
Viewed by 239
Abstract
Chinese lacks explicit word boundary markers, creating frequent temporary segmental ambiguities where character sequences permit multiple plausible lexical analyses. Skilled native (L1) Chinese readers resolve these ambiguities efficiently. However, mechanisms underlying word segmentation in second language (L2) Chinese reading remain poorly understood. Our study investigated: (1) whether L2 readers experience greater difficulty processing temporary segmental ambiguities compared to L1 readers, and (2) whether visual boundary cues can facilitate ambiguity resolution in L2 reading. We measured the eye movements of 102 skilled L1 and 60 high-proficiency L2 readers for sentences containing temporarily ambiguous three-character incremental words (e.g., “音乐剧” [musical]), where the initial two characters (“音乐” [music]) also form a valid word. Sentences were presented using either neutral mono-color displays providing no segmentation cues, or color-coded displays marking word boundaries. The color-coded displays employed either uniform coloring to promote resolution of the segmental ambiguity or contrasting colors for the two-character embedded word versus the final character to induce a segmental misanalysis. The L2 group read more slowly than the L1 group, employing a cautious character-by-character reading strategy. Both groups nevertheless appeared to process the segmental ambiguity effectively, suggesting shared segmentation strategies. The L1 readers showed little sensitivity to visual boundary cues, with little evidence that this influenced their ambiguity processing. By comparison, L2 readers showed greater sensitivity to these cues, with some indication that they affected ambiguity processing. The overall sentence-level effects of color coding word boundaries were nevertheless modest for both groups, suggesting little influence of visual boundary cues on overall reading fluency for either L1 or L2 readers. Full article

15 pages, 847 KiB  
Data Descriptor
Mixtec–Spanish Parallel Text Dataset for Language Technology Development
by Hermilo Santiago-Benito, Diana-Margarita Córdova-Esparza, Juan Terven, Noé-Alejandro Castro-Sánchez, Teresa García-Ramirez, Julio-Alejandro Romero-González and José M. Álvarez-Alvarado
Data 2025, 10(7), 94; https://doi.org/10.3390/data10070094 - 21 Jun 2025
Viewed by 225
Abstract
This article introduces a freely available Spanish–Mixtec parallel corpus designed to foster natural language processing (NLP) development for an indigenous language that remains digitally low-resourced. The dataset, comprising 14,587 sentence pairs, covers Mixtec variants from Guerrero (Tlacoachistlahuaca, Northern Guerrero, and Xochapa) and Oaxaca (Western Coast, Southern Lowland, Santa María Yosoyúa, Central, Lower Cañada, Western Central, San Antonio Huitepec, Upper Western, and Southwestern Central). Texts are classified into four main domains as follows: education, law, health, and religion. To compile these data, we conducted a two-phase collection process as follows: first, an online search of government portals, religious organizations, and Mixtec language blogs; and second, an on-site retrieval of physical texts from the library of the Autonomous University of Querétaro. Scanning and optical character recognition were then performed to digitize physical materials, followed by manual correction to fix character misreadings and remove duplicates or irrelevant segments. We conducted a preliminary evaluation of the collected data to validate its usability in automatic translation systems. From Spanish to Mixtec, a fine-tuned GPT-4o-mini model yielded a BLEU score of 0.22 and a TER score of 122.86, while two fine-tuned open source models mBART-50 and M2M-100 yielded BLEU scores of 4.2 and 2.63 and TER scores of 98.99 and 104.87, respectively. All code demonstrating data usage, along with the final corpus itself, is publicly accessible via GitHub and Figshare. We anticipate that this resource will enable further research into machine translation, speech recognition, and other NLP applications while contributing to the broader goal of preserving and revitalizing the Mixtec language. Full article
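
As a rough illustration of the evaluation reported above, the sketch below scores translation hypotheses against references with corpus-level BLEU and TER using the sacrebleu package; the example sentences are invented placeholders, not drawn from the released corpus.

```python
# Minimal sketch: corpus-level BLEU and TER for machine-translation output,
# assuming the sacrebleu package. The sentences are invented placeholders.
import sacrebleu
from sacrebleu.metrics import TER

hypotheses = ["system translation of sentence one",
              "system translation of sentence two"]
references = ["reference translation of sentence one",
              "reference for sentence two"]

# sacrebleu expects one list of hypotheses and a list of reference streams.
bleu = sacrebleu.corpus_bleu(hypotheses, [references])
ter = TER().corpus_score(hypotheses, [references])

print(f"BLEU: {bleu.score:.2f}")
print(f"TER:  {ter.score:.2f}")
```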

37 pages, 3049 KiB  
Article
English-Arabic Hybrid Semantic Text Chunking Based on Fine-Tuning BERT
by Mai Alammar, Khalil El Hindi and Hend Al-Khalifa
Computation 2025, 13(6), 151; https://doi.org/10.3390/computation13060151 - 16 Jun 2025
Cited by 1 | Viewed by 628
Abstract
Semantic text chunking refers to segmenting text into semantically coherent chunks, i.e., sets of statements that are semantically related. Semantic chunking is an essential pre-processing step in various NLP tasks, e.g., document summarization, sentiment analysis, and question answering. In this paper, we propose a hybrid, two-step semantic text chunking method that combines unsupervised semantic chunking, based on the similarities between sentence embeddings, with pre-trained language models (PLMs), especially BERT fine-tuned on the semantic textual similarity (STS) task, to provide flexible and effective semantic text chunking. We evaluated the proposed method in English and Arabic. To the best of our knowledge, no Arabic dataset exists for assessing semantic text chunking at this level, so we created AraWiki50k, inspired by an existing English dataset, to evaluate our proposed method. Our experiments showed that exploiting BERT fine-tuned on STS improves results over unsupervised semantic chunking by an average of 7.4 in the Pk metric and 11.19 in the WindowDiff metric on four English evaluation datasets, and by 0.12 in Pk and 2.29 in WindowDiff on the Arabic dataset. Full article
(This article belongs to the Section Computational Social Science)
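
The unsupervised stage described above can be sketched briefly: embed each sentence, place a chunk boundary wherever the similarity between adjacent sentence embeddings drops below a threshold, and score against a gold segmentation with NLTK's Pk and WindowDiff. The embedding model, threshold, and toy sentences below are stand-ins, not the paper's fine-tuned BERT setup.

```python
# Sketch of threshold-based semantic chunking plus Pk / WindowDiff scoring.
# Model name, threshold, and example text are illustrative only.
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity
from nltk.metrics.segmentation import pk, windowdiff

def boundary_string(sentences, model, threshold=0.35):
    """Return a '0'/'1' string where '1' marks a boundary after sentence i."""
    emb = model.encode(sentences)
    marks = []
    for i in range(len(sentences) - 1):
        sim = cosine_similarity([emb[i]], [emb[i + 1]])[0][0]
        marks.append("1" if sim < threshold else "0")
    return "".join(marks)

model = SentenceTransformer("all-MiniLM-L6-v2")
sentences = ["The festival opens on Friday.", "Tickets sold out in hours.",
             "Meanwhile, the new tax law passed.", "It takes effect in July."]
gold = "010"                      # one boundary, after the second sentence
pred = boundary_string(sentences, model)

k = 2                             # evaluation window size for this toy example
print("Pk        :", pk(gold, pred, k=k))
print("WindowDiff:", windowdiff(gold, pred, k))
```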

28 pages, 5257 KiB  
Article
Comparative Evaluation of Sequential Neural Network (GRU, LSTM, Transformer) Within Siamese Networks for Enhanced Job–Candidate Matching in Applied Recruitment Systems
by Mateusz Łępicki, Tomasz Latkowski, Izabella Antoniuk, Michał Bukowski, Bartosz Świderski, Grzegorz Baranik, Bogusz Nowak, Robert Zakowicz, Łukasz Dobrakowski, Bogdan Act and Jarosław Kurek
Appl. Sci. 2025, 15(11), 5988; https://doi.org/10.3390/app15115988 - 26 May 2025
Viewed by 650
Abstract
Job–candidate matching is pivotal in recruitment, yet traditional manual or keyword-based methods can be laborious and prone to missing qualified candidates. In this study, we introduce the first Siamese framework that systematically contrasts GRU, LSTM, and Transformer sequential heads on top of a multilingual Sentence Transformer backbone, which is trained end-to-end with triplet loss on real-world recruitment data. This combination captures both long-range dependencies across document segments and global semantics, representing a substantial advance over approaches that rely solely on static embeddings. We compare the three heads using ranking metrics such as Top-K accuracy and Mean Reciprocal Rank (MRR). The Transformer-based model yields the best overall performance, with an MRR of 0.979 and a Top-100 accuracy of 87.20% on the test set. Visualization of learned embeddings (t-SNE) shows that self-attention more effectively clusters matching texts and separates them from irrelevant ones. These findings underscore the potential of combining multilingual base embeddings with specialized sequential layers to reduce manual screening efforts and improve recruitment efficiency. Full article
(This article belongs to the Special Issue Innovations in Artificial Neural Network Applications)
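
A minimal sketch of the general pattern behind this setup, assuming the sentence-transformers and PyTorch libraries: a multilingual embedding backbone produces segment vectors, a small Transformer head pools them into a document vector, and training uses triplet loss over (job, matching candidate, non-matching candidate) triples. Model names, dimensions, and the toy documents are illustrative, not the authors' configuration.

```python
# Sketch: Siamese ranking with a multilingual sentence-embedding backbone and
# a Transformer head trained with triplet loss. All settings are placeholders.
import torch
import torch.nn as nn
from sentence_transformers import SentenceTransformer

class SiameseHead(nn.Module):
    def __init__(self, dim, heads=4, layers=2):
        super().__init__()
        enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                               batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=layers)

    def forward(self, segment_embeddings):          # (batch, segments, dim)
        return self.encoder(segment_embeddings).mean(dim=1)   # pooled document vector

backbone = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")
head = SiameseHead(dim=backbone.get_sentence_embedding_dimension())
triplet_loss = nn.TripletMarginLoss(margin=1.0)

def embed(segments):                                # list of document segments
    vecs = torch.tensor(backbone.encode(segments)).unsqueeze(0)
    return head(vecs)

job = embed(["Senior Python developer", "5+ years of backend experience"])
pos = embed(["Python engineer, 7 years", "Django and FastAPI projects"])
neg = embed(["Forklift operator", "Warehouse logistics certificate"])
loss = triplet_loss(job, pos, neg)                  # anchor, positive, negative
loss.backward()                                     # updates only the head here
```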

12 pages, 517 KiB  
Article
Preliminary Investigation of a Novel Measure of Speech Recognition in Noise
by Linda Thibodeau, Emma Freeman, Kristin Kronenberger, Emily Suarez, Hyun-Woong Kim, Shuang Qi and Yune Sang Lee
Audiol. Res. 2025, 15(3), 59; https://doi.org/10.3390/audiolres15030059 - 13 May 2025
Viewed by 664
Abstract
Background/Objectives: Previous research has shown that listeners may use acoustic cues for speech processing that are perceived during brief segments in the noise when there is an optimal signal-to-noise ratio (SNR). This “glimpsing” effect requires higher cognitive skills than the speech tasks used in typical audiometric evaluations. Purpose: The aim of this study was to investigate the use of an online test of speech processing in noise in listeners with typical hearing sensitivity (TH, defined as thresholds ≤ 25 dB HL) who were asked to determine the gender of the subject in sentences that were presented in increasing levels of continuous and interrupted noise. Methods: This was a repeated-measures design with three factors (SNR, noise type, and syntactic complexity). Study Sample: Participants with self-reported TH (N = 153, ages 18–39 years, mean age = 20.7 years) who passed an online hearing screening were invited to complete an online questionnaire. Data Collection and Analysis: Participants completed a sentence recognition task under four SNRs (−6, −9, −12, and −15 dB), two syntactic complexity settings (subjective-relative and objective-relative center-embedded), and two noise types (interrupted and continuous). They were asked to listen to 64 sentences through their own headphones/earphones that were presented in an online format at a user-selected comfortable listening level. Their task was to identify the gender of the person performing the action in each sentence. Results: Significant main effects of all three factors as well as the SNR by noise-type two-way interaction were identified (p < 0.05). This interaction indicated that the effect of SNR on sentence comprehension was more pronounced in the continuous noise compared to the interrupted noise condition. Conclusions: Listeners with self-reported TH benefited from the glimpsing effect in the interrupted noise even under low SNRs (i.e., −15 dB). The evaluation of glimpsing may be a sensitive measure of auditory processing beyond the traditional word recognition used in clinical evaluations in persons who report hearing challenges and may hold promise for the development of auditory training programs. Full article

24 pages, 5923 KiB  
Article
Using AI to Ensure Reliable Supply Chains: Legal Relation Extraction for Sustainable and Transparent Contract Automation
by Bajeela Aejas, Abdelhak Belhi and Abdelaziz Bouras
Sustainability 2025, 17(9), 4215; https://doi.org/10.3390/su17094215 - 7 May 2025
Viewed by 583
Abstract
Efficient contract management is essential for ensuring sustainable and reliable supply chains; yet, traditional methods remain manual, error-prone, and inefficient, leading to delays, financial risks, and compliance challenges. AI and blockchain technology offer a transformative alternative, enabling the establishment of automated, transparent, and self-executing smart contracts that enhance efficiency and sustainability. As part of AI-driven smart contract automation, we previously implemented contractual clause extraction using question answering (QA) and named entity recognition (NER). This paper presents the next step in the information extraction process, relation extraction (RE), which aims to identify relationships between key legal entities and convert them into structured business rules for smart contract execution. To address RE in legal contracts, we present a novel hierarchical transformer model that captures sentence- and document-level dependencies. It incorporates global and segment-based attention mechanisms to extract complex legal relationships spanning multiple sentences. Given the scarcity of publicly available contractual datasets, we also introduce the contractual relation extraction (ContRE) dataset, specifically curated to support relation extraction tasks in legal contracts, that we use to evaluate the proposed model. Together, these contributions enable the structured automation of legal rules from unstructured contract text, advancing the development of AI-powered smart contracts. Full article
(This article belongs to the Special Issue Emerging IoT and Blockchain Technologies for Sustainability)
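
A hedged sketch of the hierarchical pattern the abstract describes: encode each sentence with BERT, contextualize the sentence vectors with a document-level self-attention layer, and classify the relation between two entity-bearing sentences. This is not the authors' model or the ContRE dataset; the architecture details and toy contract text are illustrative only.

```python
# Sketch: sentence-level BERT encoding + document-level attention + a relation
# classifier over a pair of sentences. Details are illustrative placeholders.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class HierRelationExtractor(nn.Module):
    def __init__(self, encoder_name="bert-base-uncased", num_relations=5):
        super().__init__()
        self.tok = AutoTokenizer.from_pretrained(encoder_name)
        self.sent_encoder = AutoModel.from_pretrained(encoder_name)
        dim = self.sent_encoder.config.hidden_size
        self.doc_attention = nn.MultiheadAttention(dim, num_heads=8,
                                                   batch_first=True)
        self.classifier = nn.Linear(2 * dim, num_relations)

    def forward(self, sentences, head_idx, tail_idx):
        batch = self.tok(sentences, padding=True, truncation=True,
                         return_tensors="pt")
        cls = self.sent_encoder(**batch).last_hidden_state[:, 0]   # (S, dim)
        ctx, _ = self.doc_attention(cls.unsqueeze(0), cls.unsqueeze(0),
                                    cls.unsqueeze(0))               # global attention
        pair = torch.cat([ctx[0, head_idx], ctx[0, tail_idx]], dim=-1)
        return self.classifier(pair)                                # relation logits

model = HierRelationExtractor()
contract = ["The Supplier shall deliver the goods within 30 days.",
            "Late delivery incurs a penalty of 2% per week.",
            "The Buyer shall remit payment within 60 days of delivery."]
logits = model(contract, head_idx=0, tail_idx=1)    # relation between sentences 1 and 2
```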

21 pages, 3806 KiB  
Article
Research on the Method of Air Traffic Control Instruction Keyword Extraction Based on the Roberta-Attention-BiLSTM-CRF Model
by Sheng Chen, Weijun Pan, Yidi Wang, Shenhao Chen and Xuan Wang
Aerospace 2025, 12(5), 376; https://doi.org/10.3390/aerospace12050376 - 27 Apr 2025
Viewed by 438
Abstract
In recent years, with the increasing complexity of air traffic management and the rapid development of automation technology, efficiently and accurately extracting key information from large volumes of air traffic control (ATC) instructions has become essential for ensuring flight safety and improving the efficiency of air traffic control. However, this task is challenging due to the specialized terminology involved and the high real-time requirements for data collection and processing. While existing keyword extraction methods have made some progress, most of them still perform unsatisfactorily on ATC instruction data due to issues such as data irregularities and the lack of domain-specific knowledge. To address these challenges, this paper proposes a Roberta-Attention-BiLSTM-CRF (RABC) model for keyword extraction from ATC instructions. The RABC model introduces an attention mechanism specifically designed to extract keywords from multi-segment ATC instruction texts. Moreover, the BiLSTM component enhances the model’s ability to capture detailed semantic information within individual sentences during the keyword extraction process. Finally, by integrating a Conditional Random Field (CRF), the model can predict and output multiple keywords in the correct sequence. Experimental results on an ATC instruction dataset demonstrate that the RABC model achieves an accuracy of 89.5% in keyword extraction and a sequence match accuracy of 91.3%, outperforming other models across multiple evaluation metrics. These results validate the effectiveness of the proposed model in extracting keywords from ATC instruction data and demonstrate its potential for advancing automation in air traffic control. Full article
(This article belongs to the Section Air Traffic and Transportation)
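
A sketch of this architecture family, assuming Hugging Face transformers and the third-party pytorch-crf package: a RoBERTa encoder, a self-attention layer, a BiLSTM, and a CRF output layer. The checkpoint name, tag count, and hyperparameters are placeholders, not the paper's settings.

```python
# Sketch: RoBERTa + attention + BiLSTM + CRF sequence labeller for keyword
# extraction, assuming pytorch-crf (pip install pytorch-crf). Placeholders only.
import torch
import torch.nn as nn
from transformers import AutoModel
from torchcrf import CRF

class RobertaAttnBiLSTMCRF(nn.Module):
    def __init__(self, encoder_name="hfl/chinese-roberta-wwm-ext",
                 num_tags=7, lstm_hidden=256):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(encoder_name)
        dim = self.encoder.config.hidden_size
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.bilstm = nn.LSTM(dim, lstm_hidden, batch_first=True,
                              bidirectional=True)
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)
        self.crf = CRF(num_tags, batch_first=True)

    def forward(self, input_ids, attention_mask, tags=None):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        attended, _ = self.attn(hidden, hidden, hidden,
                                key_padding_mask=~attention_mask.bool())
        lstm_out, _ = self.bilstm(attended)
        scores = self.emissions(lstm_out)
        mask = attention_mask.bool()
        if tags is not None:                        # training: negative log-likelihood
            return -self.crf(scores, tags, mask=mask, reduction="mean")
        return self.crf.decode(scores, mask=mask)   # inference: best tag sequences
```

At inference time, decode() returns the highest-scoring tag sequence for each instruction, from which the keywords can be read off in order.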

22 pages, 3497 KiB  
Article
CPS-LSTM: Privacy-Sensitive Entity Adaptive Recognition Model for Power Systems
by Hao Zhang, Jing Wang, Xuanyuan Wang, Xuhui Lü, Zhenzhi Guan, Zhenghua Cai and Hua Zhang
Energies 2025, 18(8), 2013; https://doi.org/10.3390/en18082013 - 14 Apr 2025
Viewed by 245
Abstract
With the widespread application of Android devices in the energy sector, an increasing number of applications rely on SDKs to access privacy-sensitive data, such as device identifiers, location information, energy consumption, and user behavior. However, these data are often stored in different formats and naming conventions, which poses challenges for consistent extraction and identification. Traditional taint analysis methods are inefficient in identifying these entities, hindering the realization of accurate identification. To address this issue, we first propose a high-quality data construction method based on privacy protocols, which includes sentence segmentation, compression encoding, and entity annotation. We then introduce CPS-LSTM (Character-level Privacy-sensitive Entity Adaptive Recognition Model), which enhances the recognition capability of privacy-sensitive entities in mixed Chinese and English text through character-level embedding and word vector fusion. The model features a streamlined architecture, accelerating convergence and enabling parallel sentence processing. Our experimental results demonstrate that CPS-LSTM significantly outperforms the baseline methods in terms of accuracy and recall. The accuracy of CPS-LSTM is 0.09 higher than Lattice LSTM, 0.14 higher than WC-LSTM, and 0.05 higher than FLAT. In terms of recall, CPS-LSTM is 0.07 higher than Lattice LSTM, 0.12 higher than WC-LSTM, and 0.02 higher than FLAT. Full article
(This article belongs to the Section F1: Electrical Power System)
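
A minimal sketch of the character-and-word fusion idea described above: each character's embedding is concatenated with the vector of the word (or lexicon match) containing it before a BiLSTM tagger. Vocabulary sizes, dimensions, and the random inputs are placeholders, not the CPS-LSTM configuration.

```python
# Sketch: character-level embeddings fused with word vectors ahead of a
# BiLSTM tagger. All sizes and inputs are illustrative placeholders.
import torch
import torch.nn as nn

class CharWordFusionTagger(nn.Module):
    def __init__(self, char_vocab=6000, word_vocab=30000,
                 char_dim=64, word_dim=128, hidden=256, num_tags=9):
        super().__init__()
        self.char_emb = nn.Embedding(char_vocab, char_dim)
        self.word_emb = nn.Embedding(word_vocab, word_dim)
        self.bilstm = nn.LSTM(char_dim + word_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, num_tags)

    def forward(self, char_ids, word_ids):
        # char_ids / word_ids: (batch, seq_len); each character position is
        # paired with the id of the word or lexicon entry that contains it.
        fused = torch.cat([self.char_emb(char_ids),
                           self.word_emb(word_ids)], dim=-1)
        states, _ = self.bilstm(fused)
        return self.out(states)                     # per-character tag logits

tagger = CharWordFusionTagger()
chars = torch.randint(0, 6000, (2, 20))             # two toy sequences of 20 characters
words = torch.randint(0, 30000, (2, 20))
logits = tagger(chars, words)                        # shape (2, 20, num_tags)
```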

24 pages, 689 KiB  
Article
Topic Classification of Interviews on Emergency Remote Teaching
by Spyridon Tzimiris, Stefanos Nikiforos, Maria Nefeli Nikiforos, Despoina Mouratidis and Katia Lida Kermanidis
Information 2025, 16(4), 253; https://doi.org/10.3390/info16040253 - 21 Mar 2025
Viewed by 613
Abstract
This study explores the application of transformer-based language models for automated Topic Classification in qualitative datasets from interviews conducted in Modern Greek. The interviews captured the views of parents, teachers, and school directors regarding Emergency Remote Teaching. Identifying key themes in this kind of interview is crucial for informed decision-making in educational policies. Each dataset was segmented into sentences and labeled with one out of four topics. The dataset was imbalanced, presenting additional complexity for the classification task. The GreekBERT model was fine-tuned for Topic Classification, with preprocessing including accent stripping, lowercasing, and tokenization. The findings revealed GreekBERT’s effectiveness in achieving balanced performance across all themes, outperforming conventional machine learning models. The highest evaluation metric achieved was a macro-F1-score of 0.76, averaged across all classes, highlighting the effectiveness of the proposed approach. This study contributes the following: (i) datasets capturing diverse educational community perspectives in Modern Greek, (ii) a comparative evaluation of conventional ML models versus transformer-based models, (iii) an investigation of how domain-specific language enhances the performance and accuracy of Topic Classification models, showcasing their effectiveness in specialized datasets and the benefits of fine-tuned GreekBERT for such tasks, and (iv) capturing the complexities of ERT through an empirical investigation of the relationships between extracted topics and relevant variables. These contributions offer reliable, scalable solutions for policymakers, enabling data-driven educational policies to address challenges in remote learning and enhance decision-making based on comprehensive qualitative evidence. Full article
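
A hedged sketch of the preprocessing and fine-tuning steps described above, using the publicly released GreekBERT checkpoint with Hugging Face transformers; the training arguments and the single example sentence are placeholders, not the study's configuration.

```python
# Sketch: accent stripping + lowercasing, then fine-tuning GreekBERT for
# four-way topic classification. Hyperparameters are placeholders.
import unicodedata
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

def strip_accents_lower(text):
    """Remove diacritics and lowercase, e.g. 'Μάθημα' -> 'μαθημα'."""
    nfd = unicodedata.normalize("NFD", text)
    return "".join(c for c in nfd if unicodedata.category(c) != "Mn").lower()

checkpoint = "nlpaueb/bert-base-greek-uncased-v1"    # public GreekBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=4)

class SentenceTopicDataset(torch.utils.data.Dataset):
    def __init__(self, sentences, labels):
        enc = tokenizer([strip_accents_lower(s) for s in sentences],
                        truncation=True, padding=True)
        self.items = [{**{k: torch.tensor(v[i]) for k, v in enc.items()},
                       "labels": torch.tensor(labels[i])}
                      for i in range(len(labels))]
    def __len__(self): return len(self.items)
    def __getitem__(self, i): return self.items[i]

train_ds = SentenceTopicDataset(["Η πλατφόρμα δεν λειτουργούσε σωστά."], [2])  # toy sample
args = TrainingArguments(output_dir="greekbert-topics", num_train_epochs=3,
                         per_device_train_batch_size=16)
Trainer(model=model, args=args, train_dataset=train_ds).train()
```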

22 pages, 1390 KiB  
Article
Emotion-Aware Embedding Fusion in Large Language Models (Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4) for Intelligent Response Generation
by Abdur Rasool, Muhammad Irfan Shahzad, Hafsa Aslam, Vincent Chan and Muhammad Ali Arshad
AI 2025, 6(3), 56; https://doi.org/10.3390/ai6030056 - 13 Mar 2025
Cited by 6 | Viewed by 2823
Abstract
Empathetic and coherent responses are critical in automated chatbot-facilitated psychotherapy. This study addresses the challenge of enhancing the emotional and contextual understanding of large language models (LLMs) in psychiatric applications. We introduce Emotion-Aware Embedding Fusion, a novel framework integrating hierarchical fusion and attention mechanisms to prioritize semantic and emotional features in therapy transcripts. Our approach combines multiple emotion lexicons, including NRC Emotion Lexicon, VADER, WordNet, and SentiWordNet, with state-of-the-art LLMs such as Flan-T5, Llama 2, DeepSeek-R1, and ChatGPT 4. Therapy session transcripts, comprising over 2000 samples, are segmented into hierarchical levels (word, sentence, and session) using neural networks, while hierarchical fusion combines these features with pooling techniques to refine emotional representations. Attention mechanisms, including multi-head self-attention and cross-attention, further prioritize emotional and contextual features, enabling the temporal modeling of emotional shifts across sessions. The processed embeddings, computed using BERT, GPT-3, and RoBERTa, are stored in the Facebook AI similarity search vector database, which enables efficient similarity search and clustering across dense vector spaces. Upon user queries, relevant segments are retrieved and provided as context to LLMs, enhancing their ability to generate empathetic and contextually relevant responses. The proposed framework is evaluated across multiple practical use cases to demonstrate real-world applicability, including AI-driven therapy chatbots. The system can be integrated into existing mental health platforms to generate personalized responses based on retrieved therapy session data. The experimental results show that our framework enhances empathy, coherence, informativeness, and fluency, surpassing baseline models while improving LLMs’ emotional intelligence and contextual adaptability for psychotherapy. Full article
(This article belongs to the Special Issue Multimodal Artificial Intelligence in Healthcare)
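
The retrieval step can be sketched with FAISS and an off-the-shelf sentence encoder: index transcript segments, retrieve the nearest ones for a query, and prepend them to the LLM prompt. The index type, embedding model, and example segments are assumptions; the paper's full emotion-aware fusion pipeline is not reproduced here.

```python
# Sketch: FAISS index over transcript segments + nearest-neighbour retrieval
# to build an LLM prompt. Index type, model, and texts are placeholders.
import faiss
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")
segments = ["I feel overwhelmed at work lately.",
            "Sleep has improved since we started the breathing exercises.",
            "I argued with my sister again this weekend."]

vectors = encoder.encode(segments, normalize_embeddings=True)
index = faiss.IndexFlatIP(vectors.shape[1])          # inner product == cosine here
index.add(np.asarray(vectors, dtype="float32"))

query = "The user mentions conflict with a family member."
q_vec = encoder.encode([query], normalize_embeddings=True)
scores, ids = index.search(np.asarray(q_vec, dtype="float32"), 2)

context = "\n".join(segments[i] for i in ids[0])
prompt = f"Relevant session excerpts:\n{context}\n\nRespond empathetically to: {query}"
print(prompt)                                        # passed to the chosen LLM
```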

23 pages, 696 KiB  
Article
KG-EGV: A Framework for Question Answering with Integrated Knowledge Graphs and Large Language Models
by Kun Hou, Jingyuan Li, Yingying Liu, Shiqi Sun, Haoliang Zhang and Haiyang Jiang
Electronics 2024, 13(23), 4835; https://doi.org/10.3390/electronics13234835 - 7 Dec 2024
Cited by 2 | Viewed by 2271
Abstract
Despite the remarkable progress of large language models (LLMs) in understanding and generating unstructured text, their application in structured data domains and their multi-role capabilities remain underexplored. In particular, utilizing LLMs to perform complex reasoning tasks on knowledge graphs (KGs) is still an emerging area with limited research. To address this gap, we propose KG-EGV, a versatile framework leveraging LLMs to perform KG-based tasks. KG-EGV consists of four core steps: sentence segmentation, graph retrieval, EGV, and backward updating, each designed to segment sentences, retrieve relevant KG components, and derive logical conclusions. EGV, a novel integrated framework for LLM inference, enables comprehensive reasoning beyond retrieval by synthesizing diverse evidence, which is often unattainable via retrieval alone due to noise or hallucinations. The framework incorporates six key stages: generation expansion, expansion evaluation, document re-ranking, re-ranking evaluation, answer generation, and answer verification. Within this framework, LLMs take on various roles, such as generator, re-ranker, evaluator, and verifier, collaboratively enhancing answer precision and logical coherence. By combining the strengths of retrieval-based and generation-based evidence, KG-EGV achieves greater flexibility and accuracy in evidence gathering and answer formulation. Extensive experiments on widely used benchmarks, including FactKG, MetaQA, NQ, WebQ, and TriviaQA, demonstrate that KG-EGV achieves state-of-the-art performance in answer accuracy and evidence quality, showcasing its potential to advance QA research and applications. Full article

24 pages, 2553 KiB  
Article
Effects of Language Proficiency on Selective Attention Patterns at Segmenting Boundaries in English Audio Sentences
by Yunhao Mei, Fei Chen and Xiaoxiang Chen
Brain Sci. 2024, 14(12), 1204; https://doi.org/10.3390/brainsci14121204 - 28 Nov 2024
Viewed by 839
Abstract
Background/Objectives: Normative perceptual segmentation facilitates event perception, comprehension, and memory. Given that native English listeners’ normative perceptual segmentation of English speech streams coexists with a highly selective attention pattern at segmentation boundaries, it is significant to test whether Chinese learners of English have a different attention pattern at boundaries, thereby checking whether they perform a normative segmentation. Methods: Thirty Chinese learners of English with relatively higher language proficiency (CLH) and 26 with relatively lower language proficiency (CLL) listened to a series of English audio sentences. Meanwhile, they were asked to press the key whenever a phonetic probe “ba” occurred. Response time to “ba” reflects the attention where “ba” is located at segmentation boundaries. Results: The results revealed that, (1) relative to native English listeners’ highly selective attention pattern, the CLH group showed a relatively selective attention pattern, while the CLL group displayed a non-selective attention pattern. (2) Both the CLH group and natives had better recognition memory than the CLL group. (3) Both the CLH group and natives’ attention at segmentation boundaries was not correlated with their memory for sentences, while the CLL group’s attention at boundaries was correlated with memory. Conclusions: These findings suggest that (1) Chinese learners of English did not perform a normative segmentation, which shows the effect of English proficiency on perceptual segmentation; (2) English proficiency has a superior effect on memory for sentences, while perceptual segmentation would come next to support memory by providing structure for memory construction if English proficiency is not high; (3) a comparison of attention patterns between Chinese learners and natives can provide a reference for potential intervention to rectify non-natives, thereby improving their perception of English speech streams. Full article
(This article belongs to the Section Behavioral Neuroscience)

25 pages, 2085 KiB  
Article
How Much Does the Dynamic F0 Curve Affect the Expression of Emotion in Utterances?
by Tae-Jin Yoon
Appl. Sci. 2024, 14(23), 10972; https://doi.org/10.3390/app142310972 - 26 Nov 2024
Viewed by 1065
Abstract
The modulation of vocal elements, such as pitch, loudness, and duration, plays a crucial role in conveying both linguistic information and the speaker’s emotional state. While acoustic features like fundamental frequency (F0) variability have been widely studied in emotional speech analysis, accurately classifying emotion remains challenging due to the complex and dynamic nature of vocal expressions. Traditional analytical methods often oversimplify these dynamics, potentially overlooking intricate patterns indicative of specific emotions. This study examines the influences of emotion and temporal variation on dynamic F0 contours in the analytical framework, utilizing a dataset valuable for its diverse emotional expressions. However, the analysis is constrained by the limited variety of sentences employed, which may affect the generalizability of the findings to broader linguistic contexts. We utilized the Ryerson Audio-Visual Database of Emotional Speech and Song (RAVDESS), focusing on eight distinct emotional states performed by 24 professional actors. Sonorant segments were extracted, and F0 measurements were converted into semitones relative to a 100 Hz baseline to standardize pitch variations. By employing Generalized Additive Mixed Models (GAMMs), we modeled non-linear trajectories of F0 contours over time, accounting for fixed effects (emotions) and random effects (individual speaker variability). Our analysis revealed that incorporating emotion-specific, non-linear time effects and individual speaker differences significantly improved the model’s explanatory power, ultimately explaining up to 66.5% of the variance in the F0. The inclusion of random smooths for time within speakers captured individual temporal modulation patterns, providing a more accurate representation of emotional speech dynamics. The results demonstrate that dynamic modeling of F0 contours using GAMMs enhances the accuracy of emotion classification in speech. This approach captures the nuanced pitch patterns associated with different emotions and accounts for individual variability among speakers. The findings contribute to a deeper understanding of the vocal expression of emotions and offer valuable insights for advancing speech emotion recognition systems. Full article
(This article belongs to the Special Issue Advances and Applications of Audio and Speech Signal Processing)
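
The semitone normalization mentioned above is a simple log transform, st = 12 * log2(F0 / 100); a brief sketch follows (the GAMM fitting itself, typically done with R's mgcv, is not shown). The F0 values are invented for illustration.

```python
# Sketch: convert F0 values (Hz) to semitones relative to a 100 Hz baseline.
import numpy as np

def hz_to_semitones(f0_hz, baseline_hz=100.0):
    f0_hz = np.asarray(f0_hz, dtype=float)
    return 12.0 * np.log2(f0_hz / baseline_hz)

f0_track = [110.0, 145.0, 200.0, 95.0]       # toy F0 samples from sonorant segments
print(hz_to_semitones(f0_track))             # approx. [1.65, 6.43, 12.0, -0.89]
```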

17 pages, 919 KiB  
Article
A Helium Speech Correction Method Based on Generative Adversarial Networks
by Hongjun Li, Yuxiang Chen, Hongwei Ji and Shibing Zhang
Big Data Cogn. Comput. 2024, 8(11), 158; https://doi.org/10.3390/bdcc8110158 - 15 Nov 2024
Cited by 1 | Viewed by 937
Abstract
The distortion of helium speech caused by helium-oxygen gas mixtures significantly impacts the safety and communication efficiency of saturation divers. Although existing correction methods have shown some effectiveness in improving the intelligibility of helium speech, challenges remain in enhancing clarity and high-pitch correction. To address the issue of degraded speech quality post-correction, a novel helium speech correction method based on generative adversarial networks (GANs) is proposed. Firstly, a new helium speech dataset is introduced, which includes isolated words and continuous speech in both Chinese and English. By training and testing on both isolated words and continuous passages, the correction capability of the model can be accurately evaluated. Secondly, a new evaluation system for helium speech correction is proposed, which partially fills the gap in current helium speech evaluation metrics. This system uses comprehensive similarity to evaluate the similarity of keywords at the sentence level, thus assessing the correction results of helium speech from both word and sentence dimensions. Lastly, a GAN-based helium speech correction method is designed. This method solves the problems of pitch period distortion and formant shift in helium speech by introducing an adaptive speech segmentation algorithm and a fusion loss function and significantly improves the clarity and intelligibility of corrected helium speech. The experimental results show that the corrected helium speech is improved in clarity and intelligibility, which shows its practical value and application potential. Full article
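
A minimal sketch of a fused generator objective of the kind used in GAN-based speech correction: an adversarial term plus a weighted L1 reconstruction term on the corrected spectrogram. This shows only the generic pattern; the paper's adaptive segmentation algorithm and exact fusion loss are not reproduced, and the weights, networks, and shapes are placeholders.

```python
# Sketch: generator loss fusing an adversarial term with an L1 reconstruction
# term on corrected spectrograms. Weights and tensor shapes are placeholders.
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()
rec_criterion = nn.L1Loss()
lambda_rec = 100.0                                   # reconstruction weight (placeholder)

def generator_loss(disc_fake_logits, corrected_spec, clean_spec):
    real_labels = torch.ones_like(disc_fake_logits)
    adv = adv_criterion(disc_fake_logits, real_labels)   # push outputs toward "real"
    rec = rec_criterion(corrected_spec, clean_spec)      # stay close to clean speech
    return adv + lambda_rec * rec

# toy shapes: a batch of 4 corrected/clean magnitude spectrograms
fake_logits = torch.randn(4, 1)
corrected = torch.rand(4, 257, 128)
clean = torch.rand(4, 257, 128)
print(generator_loss(fake_logits, corrected, clean))
```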

15 pages, 2908 KiB  
Study Protocol
Exploring Natural Language Processing through an Exemplar Using YouTube
by Joohyun Chung, Sangmin Song and Heesook Son
Int. J. Environ. Res. Public Health 2024, 21(10), 1357; https://doi.org/10.3390/ijerph21101357 - 15 Oct 2024
Viewed by 1647
Abstract
There has been a growing emphasis on data across various health-related fields, not just in nursing research, due to the increasing volume of unstructured data in electronic health records (EHRs). Natural Language Processing (NLP) provides a solution by transforming this unstructured data into structured formats, thereby facilitating valuable insights. This methodology paper explores the application of NLP in nursing, using an exemplar case study that analyzes YouTube data to investigate social phenomena among adults living alone. The methodology involves five steps: accessing data through YouTube’s API, data cleaning, preprocessing (tokenization, sentence segmentation, linguistic normalization), sentiment analysis using Python, and topic modeling. This study serves as a comprehensive guide for integrating NLP into nursing research, supplemented with digital content demonstrating each step. For successful implementation, nursing researchers must grasp the fundamental concepts and processes of NLP. The potential of NLP in nursing is significant, particularly in utilizing unstructured textual data from nursing documentation and social media. Its benefits include streamlining nursing documentation, enhancing patient communication, and improving data analysis. Full article
(This article belongs to the Section Health Care Sciences)
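
A short sketch of the preprocessing and sentiment steps listed above (sentence segmentation, tokenization, sentiment scoring) using NLTK and VADER; the sample comment is invented, and the YouTube API access and topic-modeling steps are omitted.

```python
# Sketch: sentence segmentation, tokenization, and VADER sentiment scoring for
# a scraped comment. The comment text is invented; API access is omitted.
import nltk
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("punkt")
nltk.download("vader_lexicon")

comment = ("Living alone got easier once I built a routine. "
           "Still, weekends can feel pretty isolating.")

sentences = sent_tokenize(comment)               # sentence segmentation
analyzer = SentimentIntensityAnalyzer()
for sent in sentences:
    tokens = word_tokenize(sent.lower())         # tokenization + lowercasing
    scores = analyzer.polarity_scores(sent)      # e.g. {'neg': ..., 'compound': ...}
    print(tokens, scores["compound"])
```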
