Search Results (159)

Search Parameters:
Keywords = speech intelligibility enhancement

11 pages, 552 KB  
Article
Association Between Shift Work and Auditory–Cognitive Processing in Middle-Aged Healthcare Workers
by Margarida Roque, Tatiana Marques and Margarida Serrano
Audiol. Res. 2025, 15(6), 145; https://doi.org/10.3390/audiolres15060145 (registering DOI) - 25 Oct 2025
Abstract
Background/Objectives: Shift work in healthcare professionals affects performance in high cognitive processing, especially in complex environments. However, the beneficial effects that working in complex environments may have on auditory–cognitive processing remain unknown. These professionals face increased challenges in decision-making due to factors such as noise exposure and sleep disturbances, which may lead to the development of enhanced auditory–cognitive resources. This study aims to investigate the associations between shift work and auditory–cognitive processing in middle-aged healthcare workers. Methods: Thirty middle-aged healthcare workers were equally allocated to a shift worker (SW) or a fixed-schedule worker (FSW) group. Performance on a cognitive test, and in pure-tone audiometry, speech in quiet and noise, and listening effort were used to explore whether correlations were specific to shift work. Results: Exploratory analyses indicated that shift workers tended to perform better in visuospatial/executive function, memory recall, memory index, orientation, and total MoCA score domains compared to fixed-schedule workers. In the SW group, hearing thresholds correlated with memory recall and memory index. In the FSW group, hearing thresholds correlated with orientation, memory index, and total MoCA score, while listening effort correlated with naming, and speech intelligibility in quiet correlated with total MoCA scores. Conclusions: These exploratory findings suggest that shift work may be linked to distinct auditory–cognitive patterns, with potential compensatory mechanisms in visuospatial/executive functions and memory among middle-aged healthcare workers. Larger, longitudinal studies are warranted to confirm whether these patterns reflect true adaptive mechanisms. Full article
(This article belongs to the Special Issue The Aging Ear)
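The group-wise correlation analysis summarised in this abstract can be pictured in a few lines of Python; the column names, file, and the choice of Spearman correlation are assumptions for illustration, not the authors' dataset or statistical procedure.

```python
# Sketch of a per-group correlation check (hypothetical columns, not the study data).
import pandas as pd
from scipy.stats import spearmanr

df = pd.read_csv("workers.csv")  # assumed file: one row per participant

for group in ("SW", "FSW"):                          # shift vs. fixed-schedule workers
    sub = df[df["group"] == group]
    rho, p = spearmanr(sub["pta_threshold_db"],      # hypothetical hearing-threshold column
                       sub["moca_total"])            # hypothetical total MoCA score column
    print(f"{group}: rho={rho:.2f}, p={p:.3f}")
```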
40 pages, 9185 KB  
Article
Tongan Speech Recognition Based on Layer-Wise Fine-Tuning Transfer Learning and Lexicon Parameter Enhancement
by Junhao Geng, Dongyao Jia, Ziqi Li, Zihao He, Nengkai Wu, Weijia Zhang and Rongtao Cui
Appl. Sci. 2025, 15(21), 11412; https://doi.org/10.3390/app152111412 (registering DOI) - 24 Oct 2025
Abstract
Speech recognition, as a key driver of artificial intelligence and global communication, has advanced rapidly in major languages, while studies on low-resource languages remain limited. Tongan, a representative Polynesian language, carries significant cultural value. However, Tongan speech recognition faces three main challenges: data scarcity, limited adaptability of transfer learning, and weak dictionary modeling. This study proposes improvements in adaptive transfer learning and NBPE-based dictionary modeling to address these issues. An adaptive transfer learning strategy with layer-wise unfreezing and dynamic learning rate adjustment is introduced, enabling effective adaptation of pretrained models to the target language while improving accuracy and efficiency. In addition, the MEA-AGA is developed by combining the Mind Evolutionary Algorithm (MEA) with the Adaptive Genetic Algorithm (AGA) to optimize the number of byte-pair encoding (NBPE) parameters, thereby enhancing recognition accuracy and speed. The collected Tongan speech data were expanded and preprocessed, after which the experiments were conducted on an NVIDIA RTX 4070 GPU (16 GB) using CUDA 11.8 under the Ubuntu 18.04 operating system. Experimental results show that the proposed method achieved a word error rate (WER) of 26.18% and a word-per-second (WPS) rate of 68, demonstrating clear advantages over baseline methods and confirming its effectiveness for low-resource language applications. Although the proposed approach demonstrates promising performance, this study is still limited by the relatively small corpus size and the early stage of research exploration. Future work will focus on expanding the dataset, refining adaptive transfer strategies, and enhancing cross-lingual generalization to further improve the robustness and scalability of the model. Full article
(This article belongs to the Special Issue Techniques and Applications of Natural Language Processing)
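As a rough sketch of the layer-wise unfreezing with dynamic learning-rate adjustment described in this abstract (the MEA-AGA and NBPE components are not reproduced), the PyTorch fragment below freezes a pretrained encoder and unfreezes its top layers with depth-scaled learning rates; the layer list and hyperparameters are assumptions.

```python
# Illustrative layer-wise unfreezing with per-layer learning rates; not the authors' pipeline.
import torch

def build_optimizer(encoder_layers, n_unfrozen, base_lr=1e-4, decay=0.5):
    """Freeze all layers, then unfreeze the top `n_unfrozen` with learning
    rates that shrink for layers closer to the input."""
    param_groups = []
    n = len(encoder_layers)
    for i, layer in enumerate(encoder_layers):
        trainable = i >= n - n_unfrozen
        for p in layer.parameters():
            p.requires_grad = trainable
        if trainable:
            depth_from_top = n - 1 - i
            param_groups.append({"params": layer.parameters(),
                                 "lr": base_lr * (decay ** depth_from_top)})
    return torch.optim.Adam(param_groups)

# Typical schedule: enlarge n_unfrozen every few epochs and rebuild the optimizer,
# e.g. optimizer = build_optimizer(model.encoder.layers, n_unfrozen=2)  # hypothetical model
```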
7 pages, 1456 KB  
Proceeding Paper
Towards a More Natural Urdu: A Comprehensive Approach to Text-to-Speech and Voice Cloning
by Muhammad Ramiz Saud, Muhammad Romail Imran and Raja Hashim Ali
Eng. Proc. 2025, 87(1), 112; https://doi.org/10.3390/engproc2025087112 - 20 Oct 2025
Viewed by 144
Abstract
This paper introduces a comprehensive approach to building natural-sounding Urdu Text-to-Speech (TTS) and voice cloning systems, addressing the lack of computational resources for Urdu. We developed a large-scale dataset of over 100 h of Urdu speech, carefully cleaned and phonetically aligned through an automated transcription pipeline to preserve linguistic accuracy. The dataset was then used to fine-tune Tacotron2, a neural network model originally trained for English, with modifications tailored to Urdu’s phonological and morphological features. To further enhance naturalness, we integrated voice cloning techniques that capture regional accents and produce personalized speech outputs. Model performance was evaluated through mean opinion score (MOS), word error rate (WER), and speaker similarity, showing substantial improvements compared to previous Urdu systems. The results demonstrate clear progress toward natural and intelligible Urdu speech synthesis, while also revealing challenges such as handling dialectal variation and preventing model overfitting. This work contributes an essential resource and methodology for advancing Urdu natural language processing (NLP), with promising applications in education, accessibility, entertainment, and assistive technologies. Full article
(This article belongs to the Proceedings of The 5th International Electronic Conference on Applied Sciences)
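The word error rate (WER) reported here is the standard edit-distance metric; a minimal reference implementation, independent of the authors' toolkit, looks like this:

```python
# Standard WER: word-level edit distance divided by reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

print(wer("yeh aik misal hai", "yeh misal hai"))  # 0.25 (one deletion over four words)
```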
23 pages, 1934 KB  
Article
INTU-AI: Digitalization of Police Interrogation Supported by Artificial Intelligence
by José Pinto Garcia, Carlos Grilo, Patrício Domingues and Rolando Miragaia
Appl. Sci. 2025, 15(19), 10781; https://doi.org/10.3390/app151910781 - 7 Oct 2025
Viewed by 499
Abstract
Traditional police interrogation processes remain largely time-consuming and reliant on substantial human effort for both analysis and documentation. Intuition Artificial Intelligence (INTU-AI) is a Windows application designed to digitalize the administrative workflow associated with police interrogations, while enhancing procedural efficiency through the integration of AI-driven emotion recognition models. The system employs a multimodal approach that captures and analyzes emotional states using three primary vectors: Facial Expression Recognition (FER), Speech Emotion Recognition (SER), and Text-based Emotion Analysis (TEA). This triangulated methodology aims to identify emotional inconsistencies and detect potential suppression or concealment of affective responses by interviewees. INTU-AI serves as a decision-support tool rather than a replacement for human judgment. By automating bureaucratic tasks, it allows investigators to focus on critical aspects of the interrogation process. The system was validated in practical training sessions with inspectors and with a 12-question questionnaire. The results indicate a strong acceptance of the system in terms of its usability, existing functionalities, practical utility of the program, user experience, and open-ended qualitative responses. Full article
(This article belongs to the Special Issue Digital Transformation in Information Systems)
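A toy late-fusion of the three emotion channels named above (FER, SER, TEA) with a simple cross-channel inconsistency flag might look like the sketch below; the emotion set, weights, and threshold are invented for illustration and are not the INTU-AI implementation.

```python
# Illustrative late-fusion of facial, speech, and text emotion scores with a disagreement flag.
import numpy as np

EMOTIONS = ["neutral", "anger", "fear", "sadness", "joy"]

def fuse(fer, ser, tea, weights=(0.4, 0.3, 0.3), divergence_threshold=0.5):
    """Average per-channel probabilities and flag cross-channel disagreement."""
    stacked = np.vstack([fer, ser, tea])
    fused = np.average(stacked, axis=0, weights=weights)
    top = int(fused.argmax())
    # Disagreement: gap between the channels that most and least support the top emotion.
    divergence = stacked[:, top].max() - stacked[:, top].min()
    return EMOTIONS[top], fused, divergence > divergence_threshold

label, probs, inconsistent = fuse(
    np.array([0.10, 0.60, 0.10, 0.10, 0.10]),   # face suggests anger
    np.array([0.70, 0.05, 0.10, 0.10, 0.05]),   # voice sounds neutral
    np.array([0.65, 0.10, 0.10, 0.10, 0.05]),   # wording is neutral -> flag raised
)
```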
15 pages, 666 KB  
Review
Transforming Speech-Language Pathology with AI: Opportunities, Challenges, and Ethical Guidelines
by Georgios P. Georgiou
Healthcare 2025, 13(19), 2460; https://doi.org/10.3390/healthcare13192460 - 28 Sep 2025
Viewed by 1038
Abstract
Artificial intelligence (AI) is transforming the diagnosis, treatment, and management of speech-language disorders through advances in speech recognition, natural language processing, automated assessments, and personalized intervention. These tools have the potential to enhance clinical decision-making, improve diagnostic accuracy, and increase access to services for individuals with speech and language disorders, particularly in underserved populations. Despite this progress, adoption is challenged by data bias, lack of transparency, and limited integration into clinical workflows. To realize the potential of AI in communication sciences, both technical development and ethical safeguards are required. This paper outlines core applications, emerging opportunities, and major challenges in applying AI to speech-language pathology and proposes ethical principles for its responsible use. Full article
16 pages, 773 KB  
Article
Evaluating Parenting Stress and Identifying Influential Factors in Caregivers of Deaf and Hard-of-Hearing Children
by Yuan Chen, Xiaoli Shen and Chengao Lyu
Audiol. Res. 2025, 15(5), 120; https://doi.org/10.3390/audiolres15050120 - 20 Sep 2025
Viewed by 351
Abstract
Parenting stress significantly affects caregivers of deaf and hard-of-hearing (DHH) children, influenced by unique challenges and stressors. Background/Objectives: This study aims to develop the Chinese Family Stress Scale (CFSS) and to identify the stressors and contributing factors to elevated stress levels. Methods: The study involved 257 caregivers of DHH children aged 0–12 years old. The CFSS was used to assess parenting stress in caregivers of DHH children, with its reliability and validity evaluated. Factors such as speech intelligibility, oral language use, self-compassion, and social support were examined for their impact on parenting stress. Results: Key stressors included financial issues, discipline, education concerns, medical care, and safety. Elevated parenting stress was significantly associated with poor speech intelligibility of the child, inadequate oral language use, negative aspects of self-compassion, and insufficient social support. The CFSS showed good reliability and validity in measuring parenting stress among caregivers of DHH children. Conclusions: The CFSS is an effective tool for assessing parenting stress in caregivers of DHH children. Interventions to reduce parenting stress can focus on improving children’s communication skills, enhancing caregiver self-compassion, and bolstering social support networks. Full article
(This article belongs to the Section Hearing)
17 pages, 8430 KB  
Article
Robust Audio–Visual Speaker Localization in Noisy Aircraft Cabins for Inflight Medical Assistance
by Qiwu Qin and Yian Zhu
Sensors 2025, 25(18), 5827; https://doi.org/10.3390/s25185827 - 18 Sep 2025
Viewed by 516
Abstract
Active Speaker Localization (ASL) involves identifying both who is speaking and where they are speaking from within audiovisual content. This capability is crucial in constrained and acoustically challenging environments, such as aircraft cabins during in-flight medical emergencies. In this paper, we propose a novel end-to-end Cross-Modal Audio–Visual Fusion Network (CMAVFN) designed specifically for ASL under real-world aviation conditions, which are characterized by engine noise, dynamic lighting, occlusions from seats or oxygen masks, and frequent speaker turnover. Our model directly processes raw video frames and multi-channel ambient audio, eliminating the need for intermediate face detection pipelines. It anchors spatially resolved visual features with directional audio cues using a cross-modal attention mechanism. To enhance spatiotemporal reasoning, we introduce a dual-branch localization decoder and a cross-modal auxiliary supervision loss. Extensive experiments on public datasets (AVA-ActiveSpeaker, EasyCom) and our domain-specific AirCabin-ASL benchmark demonstrate that CMAVFN achieves robust speaker localization in noisy, occluded, and multi-speaker aviation scenarios. This framework offers a practical foundation for speech-driven interaction systems in aircraft cabins, enabling applications such as real-time crew assistance, voice-based medical documentation, and intelligent in-flight health monitoring. Full article
(This article belongs to the Special Issue Advanced Biomedical Imaging and Signal Processing)
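The cross-modal attention step described above can be illustrated with a bare-bones block in which visual patch features attend to audio features; the dimensions, layer choices, and query/key assignment are assumptions, not the CMAVFN architecture.

```python
# Minimal cross-modal attention sketch: visual tokens query directional audio tokens.
import torch
import torch.nn as nn

class CrossModalAttention(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, visual_tokens, audio_tokens):
        # visual_tokens: (batch, n_patches, dim); audio_tokens: (batch, n_frames, dim)
        fused, _ = self.attn(query=visual_tokens, key=audio_tokens, value=audio_tokens)
        return self.norm(visual_tokens + fused)       # residual connection

block = CrossModalAttention()
out = block(torch.randn(2, 196, 256), torch.randn(2, 50, 256))  # -> (2, 196, 256)
```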
22 pages, 4234 KB  
Article
Speaker Recognition Based on the Combination of SincNet and Neuro-Fuzzy for Intelligent Home Service Robots
by Seo-Hyun Kim, Tae-Wan Kim and Keun-Chang Kwak
Electronics 2025, 14(18), 3581; https://doi.org/10.3390/electronics14183581 - 9 Sep 2025
Viewed by 588
Abstract
Speaker recognition has become a critical component of human–robot interaction (HRI), enabling personalized services based on user identity, as the demand for home service robots increases. In contrast to conventional speech recognition tasks, recognition in home service robot environments is affected by varying speaker–robot distances and background noises, which can significantly reduce accuracy. Traditional approaches rely on hand-crafted features such as mel-frequency cepstral coefficients (MFCCs), which may lose essential speaker-specific information during extraction. To address this, we propose a novel speaker recognition technique for intelligent robots that combines SincNet-based raw waveform processing with an adaptive neuro-fuzzy inference system (ANFIS). SincNet extracts relevant frequency features by learning low- and high-cutoff frequencies in its convolutional filters, reducing parameter complexity while retaining discriminative power. To improve interpretability and handle non-linearity, ANFIS is used as the classifier, leveraging fuzzy rules generated by fuzzy c-means (FCM) clustering. The model is evaluated on a custom dataset collected in a realistic home environment with background noise, including TV sounds and mechanical noise from robot motion. Our results show that the proposed model outperforms existing CNN, CNN-ANFIS, and SincNet models in terms of accuracy. This approach offers robust performance and enhanced model transparency, making it well-suited for intelligent home robot systems. Full article
(This article belongs to the Special Issue Control and Design of Intelligent Robots)
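A compressed sketch of a SincNet-style convolution, in which each filter is a windowed sinc parameterised only by learnable low and high cutoff frequencies, is shown below; it is illustrative rather than the reference SincNet implementation, and the ANFIS classifier is omitted.

```python
# Simplified SincNet-style band-pass convolution over raw waveforms.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SincConv(nn.Module):
    def __init__(self, n_filters=40, kernel_size=251, sample_rate=16000):
        super().__init__()
        self.kernel_size = kernel_size
        low = torch.linspace(30, sample_rate / 2 - 200, n_filters)
        self.low_hz = nn.Parameter(low)                                   # learnable low cutoffs
        self.band_hz = nn.Parameter(torch.full((n_filters,), 100.0))      # learnable bandwidths
        self.register_buffer(
            "t", torch.arange(-(kernel_size // 2), kernel_size // 2 + 1) / sample_rate)
        self.register_buffer("window", torch.hamming_window(kernel_size))

    def forward(self, x):                                                 # x: (batch, 1, samples)
        low = self.low_hz.abs()
        high = low + self.band_hz.abs()
        # band-pass = difference of two low-pass (sinc) filters, then windowed
        def lp(fc):
            return 2 * fc.unsqueeze(1) * torch.sinc(2 * fc.unsqueeze(1) * self.t)
        filters = (lp(high) - lp(low)) * self.window
        return F.conv1d(x, filters.unsqueeze(1), padding=self.kernel_size // 2)

out = SincConv()(torch.randn(2, 1, 16000))                                # -> (2, 40, 16000)
```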
24 pages, 3568 KB  
Article
Employing AI for Better Access to Justice: An Automatic Text-to-Video Linking Tool for UK Supreme Court Hearings
by Hadeel Saadany, Constantin Orăsan, Catherine Breslin, Mikolaj Barczentewicz and Sophie Walker
Appl. Sci. 2025, 15(16), 9205; https://doi.org/10.3390/app15169205 - 21 Aug 2025
Viewed by 1157
Abstract
The increasing adoption of artificial intelligence across domains presents new opportunities to enhance access to justice. In this paper, we introduce a human-centric AI tool that utilises advances in Automatic Speech Recognition (ASR) and Large Language Models (LLMs) to facilitate semantic linking between written UK Supreme Court (SC) judgements and their corresponding hearing videos. The motivation stems from the critical role UK SC hearings play in shaping landmark legal decisions, which often span several hours and remain difficult to navigate manually. Our approach involves two key components: (1) a customised ASR system fine-tuned on 139 h of manually edited SC hearing transcripts and legal documents and (2) a semantic linking module powered by GPT-based text embeddings adapted to the legal domain. The ASR system addresses domain-specific transcription challenges by incorporating a custom language model and legal phrase extraction techniques. The semantic linking module uses fine-tuned embeddings to match judgement paragraphs with relevant spans in the hearing transcripts. Quantitative evaluation shows that our customised ASR system improves transcription accuracy by 9% compared to generic ASR baselines. Furthermore, our adapted GPT embeddings achieve an F1 score of 0.85 in classifying relevant links between judgement text and hearing transcript segments. These results demonstrate the effectiveness of our system in streamlining access to critical legal information and supporting legal professionals in interpreting complex judicial decisions. Full article
(This article belongs to the Special Issue Computational Linguistics: From Text to Speech Technologies)
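The semantic-linking step, matching judgement paragraphs to hearing-transcript segments by embedding similarity, can be sketched as follows; the encoder and threshold are placeholders rather than the paper's fine-tuned GPT embeddings.

```python
# Embedding-based linking of judgement paragraphs to transcript segments via cosine similarity.
import numpy as np
from sentence_transformers import SentenceTransformer

encoder = SentenceTransformer("all-MiniLM-L6-v2")     # stand-in embedding model

def link(judgement_paragraphs, transcript_segments, threshold=0.6):
    p = encoder.encode(judgement_paragraphs, normalize_embeddings=True)
    s = encoder.encode(transcript_segments, normalize_embeddings=True)
    sims = p @ s.T                                    # cosine similarity matrix
    links = []
    for i, row in enumerate(sims):
        j = int(np.argmax(row))
        if row[j] >= threshold:
            links.append((i, j, float(row[j])))       # paragraph i <-> segment j
    return links
```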
25 pages, 1734 KB  
Article
A Multimodal Affective Interaction Architecture Integrating BERT-Based Semantic Understanding and VITS-Based Emotional Speech Synthesis
by Yanhong Yuan, Shuangsheng Duo, Xuming Tong and Yapeng Wang
Algorithms 2025, 18(8), 513; https://doi.org/10.3390/a18080513 - 14 Aug 2025
Viewed by 1085
Abstract
Addressing the issues of coarse emotional representation, low cross-modal alignment efficiency, and insufficient real-time response capabilities in current human–computer emotional language interaction, this paper proposes an affective interaction framework integrating BERT-based semantic understanding with VITS-based speech synthesis. The framework aims to enhance the naturalness, expressiveness, and response efficiency of human–computer emotional interaction. By introducing a modular layered design, a six-dimensional emotional space, a gated attention mechanism, and a dynamic model scheduling strategy, the system overcomes challenges such as limited emotional representation, modality misalignment, and high-latency responses. Experimental results demonstrate that the framework achieves superior performance in speech synthesis quality (MOS: 4.35), emotion recognition accuracy (91.6%), and response latency (<1.2 s), outperforming baseline models like Tacotron2 and FastSpeech2. Through model lightweighting, GPU parallel inference, and load balancing optimization, the system validates its robustness and generalizability across English and Chinese corpora in cross-linguistic tests. The modular architecture and dynamic scheduling ensure scalability and efficiency, enabling a more humanized and immersive interaction experience in typical application scenarios such as psychological companionship, intelligent education, and high-concurrency customer service. This study provides an effective technical pathway for developing the next generation of personalized and immersive affective intelligent interaction systems. Full article
(This article belongs to the Section Algorithms for Multidisciplinary Applications)
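A toy version of the gated fusion of semantic and emotional features mentioned above: a BERT sentence embedding and a six-dimensional emotion vector are projected into a shared space and mixed through a learned sigmoid gate. All sizes are assumptions, not the paper's configuration.

```python
# Gated fusion of a text embedding with a 6-D emotion vector (illustrative sizes).
import torch
import torch.nn as nn

class GatedEmotionFusion(nn.Module):
    def __init__(self, text_dim=768, emo_dim=6, out_dim=256):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, out_dim)
        self.emo_proj = nn.Linear(emo_dim, out_dim)
        self.gate = nn.Sequential(nn.Linear(out_dim * 2, out_dim), nn.Sigmoid())

    def forward(self, text_emb, emotion_vec):
        t, e = self.text_proj(text_emb), self.emo_proj(emotion_vec)
        g = self.gate(torch.cat([t, e], dim=-1))      # per-dimension gate in [0, 1]
        return g * t + (1 - g) * e                    # conditioning vector for the TTS stage

fused = GatedEmotionFusion()(torch.randn(4, 768), torch.rand(4, 6))  # -> (4, 256)
```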
21 pages, 794 KB  
Article
A Study on the Application of Large Language Models Based on LoRA Fine-Tuning and Difficult-Sample Adaptation for Online Violence Recognition
by Zhengguang Gao, Shenjia Jing and Lihong Zhang
Symmetry 2025, 17(8), 1310; https://doi.org/10.3390/sym17081310 - 13 Aug 2025
Viewed by 1324
Abstract
This study introduces the concept of symmetry as a fundamental theoretical perspective for understanding the linguistic structure of cyberbullying texts. It posits that such texts often exhibit symmetry breaking between surface-level language forms and underlying semantic intent. This structural-semantic asymmetry increases the complexity of the recognition task and places higher demands on the semantic modeling capabilities of detection systems. With the rapid growth of social media, the covert and harmful nature of cyberbullying speech has become increasingly prominent, posing serious challenges to public opinion management and public safety. While mainstream approaches to cyberbullying detection—typically based on traditional deep learning models or pre-trained language models—have achieved some progress, they still struggle with low accuracy, poor generalization, and weak interpretability when handling implicit, semantically complex, or borderline expressions. To address these challenges, this paper proposes a cyberbullying detection method that combines LoRA-based fine-tuning with Small-Scale Hard-Sample Adaptive Training (S-HAT), leveraging a large language model framework based on Meta-Llama-3-8B-Instruct. The method employs prompt-based techniques to identify inference failures and integrates model-generated reasoning paths for lightweight fine-tuning. This enhances the model’s ability to capture and represent semantic asymmetry in cyberbullying texts. Experiments conducted on the ToxiCN dataset demonstrate that the S-HAT approach achieves a precision of 84.1% using only 24 hard samples—significantly outperforming baseline models such as BERT and RoBERTa. The proposed method not only improves recognition accuracy but also enhances model interpretability and deployment efficiency, offering a practical and intelligent solution for cyberbullying mitigation. Full article
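The LoRA fine-tuning referenced here injects trainable low-rank matrices alongside frozen pretrained weights; a minimal single-layer sketch (not the authors' Meta-Llama-3-8B-Instruct setup) is:

```python
# Minimal LoRA-style adapter on one linear layer: base weights frozen, A/B trainable.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank=8, alpha=16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False                   # frozen pretrained weights
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(4096, 4096))
out = layer(torch.randn(2, 4096))                     # only A and B receive gradients
```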
21 pages, 1344 KB  
Article
Research on Intelligent Extraction Method of Influencing Factors of Loess Landslide Geological Disasters Based on Soft-Lexicon and GloVe
by Lutong Huang, Yueqin Zhu, Yingfei Li, Tianxiao Yan, Yu Xiao, Dongqi Wei, Ziyao Xing and Jian Li
Appl. Sci. 2025, 15(16), 8879; https://doi.org/10.3390/app15168879 - 12 Aug 2025
Viewed by 359
Abstract
Loess landslide disasters are influenced by a multitude of factors, including slope conditions, triggering mechanisms, and spatial attributes. Extracting these factors from unstructured geological texts is challenging due to nested entities, semantic ambiguity, and rare domain-specific terms. This study proposes a joint extraction framework guided by a domain ontology that categorizes six types of loess landslide influencing factors, including spatial relationships. The ontology facilitates conceptual classification and semi-automatic nested entity annotation, enabling the construction of a high-quality corpus with eight tag types. The model integrates a Soft-Lexicon mechanism that enhances character-level GloVe embeddings with explicit lexical features, including domain terms, part-of-speech tags, and word boundary indicators derived from a domain-specific lexicon. The resulting hybrid character-level representations are then fed into a BiLSTM-CRF architecture to jointly extract entities, attributes, and multi-level spatial and causal relationships. Extracted results are structured using a content-knowledge model to build a spatially enriched knowledge graph, supporting semantic queries and intelligent reasoning. Experimental results demonstrate improved performance over baseline methods, showcasing the framework’s effectiveness in geohazard information extraction and disaster risk analysis. Full article
(This article belongs to the Special Issue Applications of Big Data and Artificial Intelligence in Geoscience)
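The Soft-Lexicon idea of enriching character-level embeddings with explicit lexical features before a BiLSTM encoder can be sketched as below; the feature layout, sizes, and tag count are invented for illustration, and the CRF layer and ontology-driven annotation are omitted.

```python
# Character embeddings concatenated with lexical features, fed to a BiLSTM emission head.
import torch
import torch.nn as nn

class SoftLexiconEncoder(nn.Module):
    def __init__(self, vocab_size, char_dim=100, lex_dim=12, hidden=128, n_tags=17):
        super().__init__()
        self.char_emb = nn.Embedding(vocab_size, char_dim)   # e.g. GloVe-initialised
        self.bilstm = nn.LSTM(char_dim + lex_dim, hidden,
                              batch_first=True, bidirectional=True)
        self.tag_head = nn.Linear(2 * hidden, n_tags)        # emission scores for a CRF

    def forward(self, char_ids, lex_feats):
        # char_ids: (batch, seq); lex_feats: (batch, seq, lex_dim) — domain-term flags,
        # POS one-hots, and word-boundary indicators from the domain lexicon
        x = torch.cat([self.char_emb(char_ids), lex_feats], dim=-1)
        h, _ = self.bilstm(x)
        return self.tag_head(h)

enc = SoftLexiconEncoder(vocab_size=5000)
emissions = enc(torch.randint(0, 5000, (2, 40)), torch.rand(2, 40, 12))  # -> (2, 40, 17)
```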
23 pages, 888 KB  
Article
Explainable Deep Learning Model for ChatGPT-Rephrased Fake Review Detection Using DistilBERT
by Rania A. AlQadi, Shereen A. Taie, Amira M. Idrees and Esraa Elhariri
Big Data Cogn. Comput. 2025, 9(8), 205; https://doi.org/10.3390/bdcc9080205 - 11 Aug 2025
Viewed by 1224
Abstract
Customers heavily depend on reviews for product information. Fake reviews may influence the perception of product quality, making online reviews less effective. ChatGPT’s (GPT-3.5 and GPT-4) ability to generate human-like reviews and responses to inquiries across several disciplines has increased recently. This leads to an increase in the number of reviewers and applications using ChatGPT to create fake reviews. Consequently, the detection of fake reviews generated or rephrased by ChatGPT has become essential. This paper proposes a new approach that distinguishes ChatGPT-rephrased reviews, considered fake, from real ones, utilizing a balanced dataset to analyze the sentiment and linguistic patterns that characterize both reviews. The proposed model further leverages Explainable Artificial Intelligence (XAI) techniques, including Local Interpretable Model-agnostic Explanations (LIME) and Shapley Additive Explanations (SHAP) for deeper insights into the model’s predictions and the classification logic. The proposed model performs a pre-processing phase that includes part-of-speech (POS) tagging, word lemmatization, tokenization, and then fine-tuned Transformer-based Machine Learning (ML) model DistilBERT for predictions. The obtained experimental results indicate that the proposed fine-tuned DistilBERT, utilizing the constructed balanced dataset along with a pre-processing phase, outperforms other state-of-the-art methods for detecting ChatGPT-rephrased reviews, achieving an accuracy of 97.25% and F1-score of 97.56%. The use of LIME and SHAP techniques not only enhanced the model’s interpretability, but also offered valuable insights into the key factors that affect the differentiation of genuine reviews from ChatGPT-rephrased ones. According to XAI, ChatGPT’s writing style is polite, uses grammatical structure, lacks specific descriptions and information in reviews, uses fancy words, is impersonal, and has deficiencies in emotional expression. These findings emphasize the effectiveness and reliability of the proposed approach. Full article
(This article belongs to the Special Issue Natural Language Processing Applications in Big Data)
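A bare-bones DistilBERT sequence-classification setup of the kind described above (genuine vs. ChatGPT-rephrased) is shown below; the checkpoint, labels, and example text are placeholders, and the paper's pre-processing and fine-tuning steps are not reproduced.

```python
# DistilBERT binary classifier skeleton; fine-tuning on labelled reviews would precede real use.
import torch
from transformers import DistilBertTokenizerFast, DistilBertForSequenceClassification

tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
model = DistilBertForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)          # 0 = genuine, 1 = rephrased

batch = tokenizer(["Great phone, battery lasts two days."],
                  truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
pred = int(logits.argmax(dim=-1))                     # predicted class index
```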
26 pages, 514 KB  
Article
Improving Voice Spoofing Detection Through Extensive Analysis of Multicepstral Feature Reduction
by Leonardo Mendes de Souza, Rodrigo Capobianco Guido, Rodrigo Colnago Contreras, Monique Simplicio Viana and Marcelo Adriano dos Santos Bongarti
Sensors 2025, 25(15), 4821; https://doi.org/10.3390/s25154821 - 5 Aug 2025
Viewed by 1408
Abstract
Voice biometric systems play a critical role in numerous security applications, including electronic device authentication, banking transaction verification, and confidential communications. Despite their widespread utility, these systems are increasingly targeted by sophisticated spoofing attacks that leverage advanced artificial intelligence techniques to generate realistic synthetic speech. Addressing the vulnerabilities inherent to voice-based authentication systems has thus become both urgent and essential. This study proposes a novel experimental analysis that extensively explores various dimensionality reduction strategies in conjunction with supervised machine learning models to effectively identify spoofed voice signals. Our framework involves extracting multicepstral features followed by the application of diverse dimensionality reduction methods, such as Principal Component Analysis (PCA), Truncated Singular Value Decomposition (SVD), statistical feature selection (ANOVA F-value, Mutual Information), Recursive Feature Elimination (RFE), regularization-based LASSO selection, Random Forest feature importance, and Permutation Importance techniques. Empirical evaluation using the ASVSpoof 2017 v2.0 dataset measures the classification performance with the Equal Error Rate (EER) metric, achieving values of approximately 10%. Our comparative analysis demonstrates significant performance gains when dimensionality reduction methods are applied, underscoring their value in enhancing the security and effectiveness of voice biometric verification systems against emerging spoofing threats. Full article
(This article belongs to the Special Issue Sensors and Machine-Learning Based Signal Processing)
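One concrete instance of the feature-reduction pipeline evaluated above, PCA followed by an SVM with an EER read off the ROC curve, can be sketched with scikit-learn; the data below is synthetic, not ASVspoof 2017 v2.0, and the other reduction methods slot in the same way.

```python
# Cepstral features -> PCA -> SVM, with an equal error rate (EER) estimate from the ROC curve.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import roc_curve
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 120))                       # stand-in multicepstral features
y = rng.integers(0, 2, size=400)                      # 0 = genuine, 1 = spoofed

clf = make_pipeline(StandardScaler(), PCA(n_components=30), SVC(probability=True))
clf.fit(X[:300], y[:300])
scores = clf.predict_proba(X[300:])[:, 1]

fpr, tpr, _ = roc_curve(y[300:], scores)
fnr = 1 - tpr
eer = fpr[np.argmin(np.abs(fpr - fnr))]               # point where FPR and FNR cross
```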
17 pages, 8512 KB  
Article
Interactive Holographic Display System Based on Emotional Adaptability and CCNN-PCG
by Yu Zhao, Zhong Xu, Ting-Yu Zhang, Meng Xie, Bing Han and Ye Liu
Electronics 2025, 14(15), 2981; https://doi.org/10.3390/electronics14152981 - 26 Jul 2025
Viewed by 859
Abstract
Against the backdrop of the rapid advancement of intelligent speech interaction and holographic display technologies, this paper introduces an interactive holographic display system. This paper applies 2D-to-3D technology to acquisition work and uses a Complex-valued Convolutional Neural Network Point Cloud Gridding (CCNN-PCG) algorithm to generate a computer-generated hologram (CGH) with depth information for application in point cloud data. During digital human hologram building, 2D-to-3D conversion yields high-precision point cloud data. The system uses ChatGLM for natural language processing and emotion-adaptive responses, enabling multi-turn voice dialogs and text-driven model generation. The CCNN-PCG algorithm reduces computational complexity and improves display quality. Simulations and experiments show that CCNN-PCG enhances reconstruction quality and speeds up computation by over 2.2 times. This research provides a theoretical framework and practical technology for holographic interactive systems, applicable in virtual assistants, educational displays, and other fields. Full article
(This article belongs to the Special Issue Artificial Intelligence, Computer Vision and 3D Display)
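For context, the classical point-source method that point-cloud CGH builds on sums a spherical wave per point; the CCNN-PCG network replaces this brute-force accumulation with a learned mapping. The parameters below are arbitrary illustration values, not the paper's configuration.

```python
# Classical point-source CGH accumulation over a tiny point cloud (baseline, not CCNN-PCG).
import numpy as np

wavelength = 532e-9                     # green laser, metres
pitch = 8e-6                            # hologram pixel pitch, metres
H = W = 256
ys, xs = np.mgrid[:H, :W]
plane_x, plane_y = (xs - W / 2) * pitch, (ys - H / 2) * pitch

points = np.array([[0.0, 0.0, 0.10, 1.0],            # x, y, z (m), amplitude
                   [2e-4, -1e-4, 0.12, 0.8]])

field = np.zeros((H, W), dtype=complex)
for px, py, pz, amp in points:
    r = np.sqrt((plane_x - px) ** 2 + (plane_y - py) ** 2 + pz ** 2)
    field += amp / r * np.exp(1j * 2 * np.pi * r / wavelength)

hologram_phase = np.angle(field)        # phase-only CGH to display on an SLM
```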