Search Results (451)

Search Parameters:
Keywords = audio-integrated

22 pages, 6359 KiB  
Article
Development and Testing of an AI-Based Specific Sound Detection System Integrated on a Fixed-Wing VTOL UAV
by Gabriel-Petre Badea, Mădălin Dombrovschi, Tiberius-Florian Frigioescu, Maria Căldărar and Daniel-Eugeniu Crunteanu
Acoustics 2025, 7(3), 48; https://doi.org/10.3390/acoustics7030048 - 30 Jul 2025
Abstract
This study presents the development and validation of an AI-based system for detecting chainsaw sounds, integrated into a fixed-wing VTOL UAV. The system employs a convolutional neural network trained on log-mel spectrograms derived from four sound classes: chainsaw, music, electric drill, and human voices. Initial validation was performed through ground testing. Acoustic data acquisition is optimized during cruise flight, when wing-mounted motors are shut down and the rear motor operates at 40–60% capacity, significantly reducing noise interference. To address residual motor noise, a preprocessing module was developed using reference recordings obtained in an anechoic chamber. Two configurations were tested to capture the motor’s acoustic profile by changing the UAV’s orientation relative to the fixed microphone. The embedded system processes incoming audio in real time, enabling low-latency classification without data transmission. Field experiments confirmed the model’s high precision and robustness under varying flight and environmental conditions. Results validate the feasibility of real-time, onboard acoustic event detection using spectrogram-based deep learning on UAV platforms, and support its applicability for scalable aerial monitoring tasks.
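
As a concrete illustration of the pipeline this abstract describes, the sketch below feeds a log-mel spectrogram front end into a small CNN classifier over the paper’s four sound classes. It is a minimal sketch only: the layer sizes, sample rate, and mel parameters are assumptions, not values taken from the article.

```python
import torch
import torch.nn as nn
import torchaudio

# Assumed front end: log-mel spectrogram (sample rate and n_mels are illustrative).
mel = torchaudio.transforms.MelSpectrogram(sample_rate=22050, n_fft=1024,
                                           hop_length=512, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()

class SoundCNN(nn.Module):
    """Small CNN over log-mel patches; four classes as in the paper
    (chainsaw, music, electric drill, human voice)."""
    def __init__(self, n_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, wave):                # wave: (batch, samples)
        x = to_db(mel(wave)).unsqueeze(1)   # (batch, 1, n_mels, frames)
        return self.classifier(self.features(x).flatten(1))

logits = SoundCNN()(torch.randn(2, 22050))  # two 1 s clips -> (2, 4)
```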

23 pages, 2710 KiB  
Article
Non-Semantic Multimodal Fusion for Predicting Segment Access Frequency in Lecture Archives
by Ruozhu Sheng, Jinghong Li and Shinobu Hasegawa
Educ. Sci. 2025, 15(8), 978; https://doi.org/10.3390/educsci15080978 - 30 Jul 2025
Abstract
This study proposes a non-semantic multimodal approach to predict segment access frequency (SAF) in lecture archives. Such archives, widely used as supplementary resources in modern education, often consist of long, unedited recordings that are difficult to navigate and review efficiently. The predicted SAF, an indicator of student viewing behavior, serves as a practical proxy for student engagement. The increasing volume of recorded material renders manual editing and annotation impractical, making the automatic identification of high-SAF segments crucial for improving accessibility and supporting targeted content review. The approach focuses on lecture archives from a real-world blended learning context, characterized by resource constraints such as no specialized hardware and limited student numbers. The model integrates multimodal features from the instructor’s actions (via OpenPose and optical flow), audio spectrograms, and slide page progression—a selection of features that makes the approach applicable regardless of lecture language. The model was evaluated on 665 labeled one-minute segments from one such course. Experiments show that the best-performing model achieves a Pearson correlation of 0.5143 in 7-fold cross-validation and 61.05% average accuracy in a downstream three-class classification task. These results demonstrate the system’s capacity to enhance lecture archives by automatically identifying key segments, which aids students in efficient, targeted review and provides instructors with valuable data for pedagogical feedback.
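
The fusion step the abstract outlines can be pictured as a small late-fusion regressor over the three non-semantic feature streams. The sketch below is hypothetical: the feature dimensions and regressor shape are assumptions, not details reported in the paper.

```python
import torch
import torch.nn as nn

class SAFRegressor(nn.Module):
    """Illustrative late-fusion regressor: per-segment pose/optical-flow,
    audio-spectrogram, and slide-progression feature vectors are
    concatenated and mapped to a segment-access-frequency score.
    All dimensions are assumptions, not values from the paper."""
    def __init__(self, pose_dim=64, audio_dim=128, slide_dim=8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(pose_dim + audio_dim + slide_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, pose, audio, slide):
        return self.mlp(torch.cat([pose, audio, slide], dim=-1)).squeeze(-1)

saf = SAFRegressor()(torch.randn(4, 64), torch.randn(4, 128), torch.randn(4, 8))
```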

21 pages, 5817 KiB  
Article
UN15: An Urban Noise Dataset Coupled with Time–Frequency Attention for Environmental Sound Classification
by Yu Shen, Ge Cao, Huan-Yu Dong, Bo Dong and Chang-Myung Lee
Appl. Sci. 2025, 15(15), 8413; https://doi.org/10.3390/app15158413 - 29 Jul 2025
Abstract
With the increasing severity of urban noise pollution, its detrimental impact on public health has garnered growing attention. However, accurate identification and classification of noise sources in complex urban acoustic environments remain major technical challenges for achieving refined noise management. To address this issue, this study presents two key contributions. First, we construct a new urban noise classification dataset, namely the urban noise 15-category dataset (UN15), which consists of 1620 audio clips from 15 representative categories, including traffic, construction, crowd activity, and commercial noise, recorded from diverse real-world urban scenes. Second, we propose a novel deep neural network architecture based on a residual network and integrated with a time–frequency attention mechanism, referred to as residual network with temporal–frequency attention (ResNet-TF). Extensive experiments conducted on the UN15 dataset demonstrate that ResNet-TF outperforms several mainstream baseline models in both classification accuracy and robustness. These results not only verify the effectiveness of the proposed attention mechanism but also establish the UN15 dataset as a valuable benchmark for future research in urban noise classification.
(This article belongs to the Section Acoustics and Vibrations)
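
The abstract names a time–frequency attention mechanism without giving its internals. One plausible reading, axis-wise pooling followed by learned rescaling of a residual block’s feature map, is sketched below; every design choice here is an assumption.

```python
import torch
import torch.nn as nn

class TimeFrequencyAttention(nn.Module):
    """Sketch of one time-frequency attention variant: pool the feature
    map along each axis, learn per-frequency and per-time gates, and
    rescale the map. The paper's exact design may differ."""
    def __init__(self, channels):
        super().__init__()
        self.freq_fc = nn.Conv2d(channels, channels, 1)
        self.time_fc = nn.Conv2d(channels, channels, 1)

    def forward(self, x):                                  # x: (B, C, freq, time)
        f = torch.sigmoid(self.freq_fc(x.mean(dim=3, keepdim=True)))  # (B, C, F, 1)
        t = torch.sigmoid(self.time_fc(x.mean(dim=2, keepdim=True)))  # (B, C, 1, T)
        return x * f * t                                   # gated feature map

out = TimeFrequencyAttention(16)(torch.randn(2, 16, 64, 100))
```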

19 pages, 290 KiB  
Article
Artificial Intelligence in Primary Care: Support or Additional Burden on Physicians’ Healthcare Work?—A Qualitative Study
by Stefanie Mache, Monika Bernburg, Annika Würtenberger and David A. Groneberg
Clin. Pract. 2025, 15(8), 138; https://doi.org/10.3390/clinpract15080138 - 25 Jul 2025
Abstract
Background: Artificial intelligence (AI) is being increasingly promoted as a means to enhance diagnostic accuracy, to streamline workflows, and to improve overall care quality in primary care. However, empirical evidence on how primary care physicians (PCPs) perceive, engage with, and emotionally respond to AI technologies in everyday clinical settings remains limited. Concerns persist regarding AI’s usability, transparency, and potential impact on professional identity, workload, and the physician–patient relationship. Methods: This qualitative study investigated the lived experiences and perceptions of 28 PCPs practicing in diverse outpatient settings across Germany. Participants were purposively sampled to ensure variation in age, practice characteristics, and digital proficiency. Data were collected through in-depth, semi-structured interviews, which were audio-recorded, transcribed verbatim, and subjected to rigorous thematic analysis employing Mayring’s qualitative content analysis framework. Results: Participants demonstrated a fundamentally ambivalent stance toward AI integration in primary care. Perceived advantages included enhanced diagnostic support, relief from administrative burdens, and facilitation of preventive care. Conversely, physicians reported concerns about workflow disruption due to excessive system prompts, lack of algorithmic transparency, increased cognitive and emotional strain, and perceived threats to clinical autonomy and accountability. The implications for the physician–patient relationship were seen as double-edged: while some believed AI could foster trust through transparent use, others feared depersonalization of care. Crucial prerequisites for successful implementation included transparent and explainable systems, structured training opportunities, clinician involvement in design processes, and seamless integration into clinical routines. Conclusions: Primary care physicians’ engagement with AI is marked by cautious optimism, shaped by both perceived utility and significant concerns. Effective and ethically sound implementation requires co-design approaches that embed clinical expertise, ensure algorithmic transparency, and align AI applications with the realities of primary care workflows. Moreover, foundational AI literacy should be incorporated into undergraduate health professional curricula to equip future clinicians with the competencies necessary for responsible and confident use. These strategies are essential to safeguard professional integrity, support clinician well-being, and maintain the humanistic core of primary care.

35 pages, 5195 KiB  
Article
A Multimodal AI Framework for Automated Multiclass Lung Disease Diagnosis from Respiratory Sounds with Simulated Biomarker Fusion and Personalized Medication Recommendation
by Abdullah, Zulaikha Fatima, Jawad Abdullah, José Luis Oropeza Rodríguez and Grigori Sidorov
Int. J. Mol. Sci. 2025, 26(15), 7135; https://doi.org/10.3390/ijms26157135 - 24 Jul 2025
Abstract
Respiratory diseases represent a persistent global health challenge, underscoring the need for intelligent, accurate, and personalized diagnostic and therapeutic systems. Existing methods frequently suffer from limitations in diagnostic precision, lack of individualized treatment, and constrained adaptability to complex clinical scenarios. To address these challenges, our study introduces a modular AI-powered framework that integrates an audio-based disease classification model with simulated molecular biomarker profiles to evaluate the feasibility of future multimodal diagnostic extensions, alongside a synthetic-data-driven prescription recommendation engine. The disease classification model analyzes respiratory sound recordings and accurately distinguishes among eight clinical classes: bronchiectasis, pneumonia, upper respiratory tract infection (URTI), lower respiratory tract infection (LRTI), asthma, chronic obstructive pulmonary disease (COPD), bronchiolitis, and healthy respiratory state. The proposed model achieved a classification accuracy of 99.99% on a holdout test set, including 94.2% accuracy on pediatric samples. In parallel, the prescription module provides individualized treatment recommendations comprising drug, dosage, and frequency, trained on a carefully constructed synthetic dataset designed to emulate real-world prescribing logic. The model achieved over 99% accuracy in medication prediction tasks, outperforming baseline models reported in prior research. Minimal misclassification in the confusion matrix and strong clinician agreement on 200 prescriptions (Cohen’s κ = 0.91 [0.87–0.94] for drug selection, 0.78 [0.74–0.81] for dosage, 0.96 [0.93–0.98] for frequency) further affirm the system’s reliability. Adjusted clinician disagreement rates were 2.7% (drug), 6.4% (dosage), and 1.5% (frequency). SHAP analysis identified age and smoking as key predictors, enhancing model explainability. Dosage accuracy was 91.3%, and most disagreements occurred in renal-impaired and pediatric cases. However, our study is presented strictly as a proof-of-concept. The use of synthetic data and the absence of access to real patient records constitute key limitations. A trialed clinical deployment was conducted under a controlled environment with a positive rate of satisfaction from experts and users, but the proposed system must undergo extensive validation with de-identified electronic medical records (EMRs) and regulatory scrutiny before it can be considered for practical application. Nonetheless, the findings offer a promising foundation for the future development of clinically viable AI-assisted respiratory care tools.
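
The clinician-agreement figures use Cohen’s κ, which scikit-learn computes directly for any pair of label sequences. The drug names below are toy stand-ins, not data from the study.

```python
from sklearn.metrics import cohen_kappa_score

# Toy agreement check in the spirit of the reported evaluation: kappa
# between the system's drug choices and a clinician's, on invented labels.
system_rx    = ["amoxicillin", "salbutamol", "azithromycin", "salbutamol"]
clinician_rx = ["amoxicillin", "salbutamol", "azithromycin", "prednisone"]
print(cohen_kappa_score(system_rx, clinician_rx))  # ~0.67 here; 1.0 = perfect
```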

18 pages, 697 KiB  
Review
Lip-Reading: Advances and Unresolved Questions in a Key Communication Skill
by Martina Battista, Francesca Collesei, Eva Orzan, Marta Fantoni and Davide Bottari
Audiol. Res. 2025, 15(4), 89; https://doi.org/10.3390/audiolres15040089 - 21 Jul 2025
Abstract
Lip-reading, i.e., the ability to recognize speech using only visual cues, plays a fundamental role in audio-visual speech processing, intelligibility, and comprehension. This capacity is integral to language development and functioning; it emerges in early development, and it slowly evolves. By linking psycholinguistics, psychophysics, and neurophysiology, the present narrative review explores the development and significance of lip-reading across different stages of life, highlighting its role in human communication in both typical and atypical development, e.g., in the presence of hearing or language impairments. We examined how relying on lip-reading becomes crucial when communication occurs in noisy environments and, on the contrary, the impacts that visual barriers can have on speech perception. Finally, this review highlights individual differences and the role of cultural and social contexts for a better understanding of the visual counterpart of speech.

21 pages, 1689 KiB  
Article
Exploring LLM Embedding Potential for Dementia Detection Using Audio Transcripts
by Brandon Alejandro Llaca-Sánchez, Luis Roberto García-Noguez, Marco Antonio Aceves-Fernández, Andras Takacs and Saúl Tovar-Arriaga
Eng 2025, 6(7), 163; https://doi.org/10.3390/eng6070163 - 17 Jul 2025
Abstract
Dementia is a neurodegenerative disorder characterized by progressive cognitive impairment that significantly affects daily living. Early detection of Alzheimer’s disease—the most common form of dementia—remains essential for prompt intervention and treatment, yet clinical diagnosis often requires extensive and resource-intensive procedures. This article explores the effectiveness of automated Natural Language Processing (NLP) methods for identifying Alzheimer’s indicators from audio transcriptions of the Cookie Theft picture description task in the PittCorpus dementia database. Five NLP approaches were compared: a classical Tf–Idf statistical representation and embeddings derived from large language models (GloVe, BERT, Gemma-2B, and Linq-Embed-Mistral), each integrated with a logistic regression classifier. Transcriptions were carefully preprocessed to preserve linguistically relevant features such as repetitions, self-corrections, and pauses. To compare the performance of the five approaches, a stratified 5-fold cross-validation was conducted; the best results were obtained with BERT embeddings (84.73% accuracy) closely followed by the simpler Tf–Idf approach (83.73% accuracy) and the state-of-the-art model Linq-Embed-Mistral (83.54% accuracy), while Gemma-2B and GloVe embeddings yielded slightly lower performances (80.91% and 78.11% accuracy, respectively). Contrary to initial expectations—that richer semantic and contextual embeddings would substantially outperform simpler frequency-based methods—the competitive accuracy of Tf–Idf suggests that the choice and frequency of the words used might be more important than semantic or contextual information in Alzheimer’s detection. This work represents an effort toward implementing user-friendly software capable of offering an initial indicator of Alzheimer’s risk, potentially reducing the need for an in-person clinical visit.
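
The study’s strong Tf–Idf baseline maps almost directly onto scikit-learn: a Tf–Idf vectorizer feeding a logistic regression classifier, scored with stratified cross-validation. The transcription snippets below are invented stand-ins (the study uses PittCorpus transcriptions and 5 folds).

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Toy stand-ins for preprocessed picture-description transcriptions,
# with disfluencies preserved; 1 = dementia, 0 = control.
texts  = ["the boy is um is taking cookies", "a boy takes cookies from the jar",
          "the the water is uh running over", "the sink is overflowing"]
labels = [1, 0, 1, 0]

pipe = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=2, shuffle=True, random_state=0)  # paper uses 5 folds
print(cross_val_score(pipe, texts, labels, cv=cv, scoring="accuracy").mean())
```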

18 pages, 1150 KiB  
Article
Navigating by Design: Effects of Individual Differences and Navigation Modality on Spatial Memory Acquisition
by Xianyun Liu, Yanan Zhang and Baihu Sun
Behav. Sci. 2025, 15(7), 959; https://doi.org/10.3390/bs15070959 - 15 Jul 2025
Abstract
Spatial memory is a critical component of spatial cognition, particularly in unfamiliar environments. As navigation systems become integral to daily life, understanding how individuals with varying spatial abilities respond to different navigation modes is increasingly important. This study employed a virtual driving environment to examine how participants with varying spatial abilities (good or poor) performed under three navigation modes, namely visual, audio, and combined audio–visual navigation modes. A total of 78 participants were divided into two groups, good sense of direction (G-SOD) and poor sense of direction (P-SOD), according to their Santa Barbara Sense of Direction (SBSOD) scores. They were randomly assigned to one of the three navigation modes (visual, audio, audio–visual). Participants followed navigation cues and simulated driving behavior to the end point twice during the learning phase, then completed a route retracing task, a scene recognition task, and an order recognition task. Significant main effects were found for both SOD group and navigation mode, with no interaction. G-SOD participants outperformed P-SOD participants in the route retracing task. The audio navigation mode led to better performance in tasks involving complex spatial decisions, such as navigating turn intersections and recognizing scene order. Scene recognition accuracy did not significantly differ across SOD groups or navigation modes. These findings suggest that audio navigation may reduce visual distraction and support more effective spatial encoding, and that individual spatial abilities influence navigation performance independently of guidance type. These findings highlight the importance of aligning navigation modalities with users’ cognitive profiles and support the development of adaptive navigation systems that accommodate individual differences in spatial ability.
(This article belongs to the Section Cognition)

20 pages, 5700 KiB  
Article
Multimodal Personality Recognition Using Self-Attention-Based Fusion of Audio, Visual, and Text Features
by Hyeonuk Bhin and Jongsuk Choi
Electronics 2025, 14(14), 2837; https://doi.org/10.3390/electronics14142837 - 15 Jul 2025
Abstract
Personality is a fundamental psychological trait that exerts a long-term influence on human behavior patterns and social interactions. Automatic personality recognition (APR) has exhibited increasing importance across various domains, including Human–Robot Interaction (HRI), personalized services, and psychological assessments. In this study, we propose a multimodal personality recognition model that classifies the Big Five personality traits by extracting features from three heterogeneous sources: audio processed using Wav2Vec2, video represented as Skeleton Landmark time series, and text encoded through Bidirectional Encoder Representations from Transformers (BERT) and Doc2Vec embeddings. Each modality is handled through an independent Self-Attention block that highlights salient temporal information, and these representations are then summarized and integrated using a late fusion approach to effectively reflect both the inter-modal complementarity and cross-modal interactions. Compared to traditional recurrent neural network (RNN)-based multimodal models and unimodal classifiers, the proposed model achieves an improvement of up to 12 percent in the F1-score. It also maintains a high prediction accuracy and robustness under limited input conditions. Furthermore, a visualization based on t-distributed Stochastic Neighbor Embedding (t-SNE) demonstrates clear distributional separation across the personality classes, enhancing the interpretability of the model and providing insights into the structural characteristics of its latent representations. To support real-time deployment, a lightweight thread-based processing architecture is implemented, ensuring computational efficiency. By leveraging deep learning-based feature extraction and the Self-Attention mechanism, we present a novel personality recognition framework that balances performance with interpretability. The proposed approach establishes a strong foundation for practical applications in HRI, counseling, education, and other interactive systems that require personalized adaptation.
(This article belongs to the Special Issue Explainable Machine Learning and Data Mining)
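
A compact sketch of the described design, one independent self-attention block per modality followed by temporal pooling and late fusion by concatenation into a Big Five head, might look as follows. The feature dimensions and the single-logit-per-trait head are illustrative assumptions.

```python
import torch
import torch.nn as nn

class LateFusionAPR(nn.Module):
    """Per-modality self-attention with late fusion. Assumed dims:
    Wav2Vec2 audio (768), skeleton-landmark video (132), BERT text (768)."""
    def __init__(self, dims=(768, 132, 768), d_model=128, n_traits=5):
        super().__init__()
        self.proj = nn.ModuleList([nn.Linear(d, d_model) for d in dims])
        self.attn = nn.ModuleList([
            nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
            for _ in dims])
        self.head = nn.Linear(d_model * len(dims), n_traits)

    def forward(self, modalities):               # list of (B, T_i, dim_i)
        pooled = []
        for x, proj, attn in zip(modalities, self.proj, self.attn):
            h = proj(x)
            h, _ = attn(h, h, h)                 # self-attention within modality
            pooled.append(h.mean(dim=1))         # temporal summary
        return self.head(torch.cat(pooled, dim=-1))  # Big Five logits

audio = torch.randn(2, 50, 768)
video = torch.randn(2, 30, 132)
text  = torch.randn(2, 10, 768)
logits = LateFusionAPR()([audio, video, text])   # (2, 5)
```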

16 pages, 2365 KiB  
Article
Fast Inference End-to-End Speech Synthesis with Style Diffusion
by Hui Sun, Jiye Song and Yi Jiang
Electronics 2025, 14(14), 2829; https://doi.org/10.3390/electronics14142829 - 15 Jul 2025
Abstract
In recent years, deep learning-based end-to-end Text-To-Speech (TTS) models have made significant progress in enhancing speech naturalness and fluency. However, existing Variational Inference Text-to-Speech (VITS) models still face challenges such as insufficient pitch modeling, inadequate contextual dependency capture, and low inference efficiency in the decoder. To address these issues, this paper proposes an improved TTS framework named Q-VITS. Q-VITS incorporates Rotary Position Embedding (RoPE) into the text encoder to enhance long-sequence modeling, adopts a frame-level prior modeling strategy to optimize one-to-many mappings, and designs a style extractor based on a diffusion model for controllable style rendering. Additionally, the proposed decoder, ConfoGAN, integrates explicit F0 modeling, Pseudo-Quadrature Mirror Filter (PQMF) multi-band synthesis, and a Conformer structure. The experimental results demonstrate that Q-VITS outperforms the baseline VITS in terms of speech quality, pitch accuracy, and inference efficiency in both subjective Mean Opinion Score (MOS) and objective Mel-Cepstral Distortion (MCD) and Root Mean Square Error (RMSE) evaluations on a single-speaker dataset, achieving performance close to ground-truth audio. These improvements provide an effective solution for efficient and controllable speech synthesis.
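
RoPE itself is standard enough to sketch. The rotate-half variant below is a generic formulation and not necessarily the exact one used in Q-VITS.

```python
import torch

def rotary_position_embedding(x, base=10000.0):
    """Rotate-half RoPE applied to a (batch, seq, dim) tensor, dim even:
    pairs of channels are rotated by position-dependent angles."""
    b, n, d = x.shape
    half = d // 2
    inv_freq = base ** (-torch.arange(half, dtype=torch.float32) / half)
    angles = torch.arange(n, dtype=torch.float32)[:, None] * inv_freq[None, :]
    cos, sin = angles.cos(), angles.sin()        # each (n, half)
    x1, x2 = x[..., :half], x[..., half:]
    return torch.cat([x1 * cos - x2 * sin,
                      x1 * sin + x2 * cos], dim=-1)

encoded = rotary_position_embedding(torch.randn(2, 100, 64))
```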

22 pages, 3768 KiB  
Article
MWB_Analyzer: An Automated Embedded System for Real-Time Quantitative Analysis of Morphine Withdrawal Behaviors in Rodents
by Moran Zhang, Qianqian Li, Shunhang Li, Binxian Sun, Zhuli Wu, Jinxuan Liu, Xingchao Geng and Fangyi Chen
Toxics 2025, 13(7), 586; https://doi.org/10.3390/toxics13070586 - 14 Jul 2025
Abstract
Background/Objectives: Substance use disorders, particularly opioid addiction, continue to pose a major global health and toxicological challenge. Morphine dependence represents a significant problem in both clinical practice and preclinical research, particularly in modeling the pharmacodynamics of withdrawal. Rodent models remain indispensable for investigating the neurotoxicological effects of chronic opioid exposure and withdrawal. However, conventional behavioral assessments rely on manual observation, limiting objectivity, reproducibility, and scalability—critical constraints in modern drug toxicity evaluation. This study introduces MWB_Analyzer, an automated and high-throughput system designed to quantitatively and objectively assess morphine withdrawal behaviors in rats. The goal is to enhance toxicological assessments of CNS-active substances through robust, scalable behavioral phenotyping. Methods: MWB_Analyzer integrates optimized multi-angle video capture, real-time signal processing, and machine learning-driven behavioral classification. An improved YOLO-based architecture was developed for the accurate detection and categorization of withdrawal-associated behaviors in video frames, while a parallel pipeline processed audio signals. The system incorporates behavior-specific duration thresholds to isolate pharmacologically and toxicologically relevant behavioral events. Experimental animals were assigned to high-dose, low-dose, and control groups. Withdrawal was induced and monitored under standardized toxicological protocols. Results: MWB_Analyzer achieved over 95% reduction in redundant frame processing, markedly improving computational efficiency. It demonstrated high classification accuracy: >94% for video-based behaviors (93% on edge devices) and >92% for audio-based events. The use of behavioral thresholds enabled sensitive differentiation between dosage groups, revealing clear dose–response relationships and supporting its application in neuropharmacological and neurotoxicological profiling. Conclusions: MWB_Analyzer offers a robust, reproducible, and objective platform for the automated evaluation of opioid withdrawal syndromes in rodent models. It enhances throughput, precision, and standardization in addiction research. Importantly, this tool supports toxicological investigations of CNS drug effects, preclinical pharmacokinetic and pharmacodynamic evaluations, drug safety profiling, and regulatory assessment of novel opioid and CNS-active therapeutics.
(This article belongs to the Section Drugs Toxicity)
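
The behavior-specific duration thresholds can be illustrated with a small post-processing routine that collapses per-frame classifier output into events and discards those that end too quickly. The frame rate, threshold, and behavior label below are illustrative only.

```python
from itertools import groupby

def filter_events(frame_labels, fps=30, min_seconds=0.5):
    """Collapse per-frame predictions into (behavior, start_s, end_s) events,
    keeping only runs that last past a duration threshold."""
    events, frame = [], 0
    for label, run in groupby(frame_labels):
        n = len(list(run))
        if label != "none" and n / fps >= min_seconds:
            events.append((label, frame / fps, (frame + n) / fps))
        frame += n
    return events

labels = ["none"] * 30 + ["wet_dog_shake"] * 20 + ["none"] * 10
print(filter_events(labels))  # [('wet_dog_shake', 1.0, 1.666...)]
```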

13 pages, 5432 KiB  
Communication
CSAMT-Driven Feasibility Assessment of Beishan Underground Research Laboratory
by Zhiguo An, Qingyun Di, Changmin Fu and Zhongxing Wang
Sensors 2025, 25(14), 4282; https://doi.org/10.3390/s25144282 - 9 Jul 2025
Abstract
The safe disposal of high-level radioactive waste (HLW) is imperative for sustaining China’s rapidly expanding nuclear power sector, with deep geological repositories requiring rigorous site evaluation via underground research laboratories (URLs). This study presents a controlled-source audio-frequency magnetotellurics (CSAMT) survey at the Xinchang site in China’s Beishan area, a region dominated by high-resistivity metamorphic rocks. To overcome electrical data acquisition challenges in such resistive terrains, salt-saturated water was applied to transmitting and receiving electrodes to enhance grounding efficiency. Using excitation frequencies of 9600 Hz to 1 Hz, the survey achieved a 1000 m investigation depth. Data processing incorporated static effect removal via low-pass filtering and smoothness-constrained 2D inversion. The results showed strong consistency between observed and modeled data, validating inversion reliability. Borehole correlations identified a 600-m-thick intact rock mass, confirming favorable geological conditions for URL construction. The study demonstrates CSAMT’s efficacy in characterizing HLW repository sites in high-resistivity environments, providing critical geophysical insights for China’s HLW disposal program. These findings advance site evaluation methodologies for deep geological repositories, though integrated multidisciplinary assessments remain essential for comprehensive site validation. This work underscores the feasibility of the Xinchang site while establishing a technical framework that is applicable to analogous challenging terrains globally.
(This article belongs to the Section Remote Sensors)

17 pages, 5876 KiB  
Article
Optimization of Knitted Strain Sensor Structures for a Real-Time Korean Sign Language Translation Glove System
by Youn-Hee Kim and You-Kyung Oh
Sensors 2025, 25(14), 4270; https://doi.org/10.3390/s25144270 - 9 Jul 2025
Abstract
Herein, an integrated system is developed based on knitted strain sensors for real-time translation of sign language into text and audible speech. To investigate how the structural characteristics of the knit affect the electrical performance, the position of the conductive yarn and the presence or absence of elastic yarn are set as experimental variables, and five distinct sensors are manufactured. A comprehensive analysis of the electrical and mechanical performance, including sensitivity, responsiveness, reliability, and repeatability, reveals that the sensor with a plain-plated-knit structure, no elastic yarn included, and the conductive yarn positioned uniformly on the back exhibits the best performance, with a gauge factor (GF) of 88. The sensor exhibited a response time of less than 0.1 s at 50 cycles per minute (cpm), demonstrating that it detects and responds promptly to finger joint bending movements. Moreover, it exhibits stable repeatability and reliability across various angles and speeds, confirming its optimization for sign language recognition applications. Based on this design, an integrated textile-based system is developed by incorporating the sensor, interconnections, snap connectors, and a microcontroller unit (MCU) with built-in Bluetooth Low Energy (BLE) technology into the knitted glove. The complete system successfully recognized 12 Korean Sign Language (KSL) gestures in real time and output them as both text and audio through a dedicated application, achieving a high recognition accuracy of 98.67%. Thus, the present study quantitatively elucidates the structure–performance relationship of a knitted sensor and proposes a wearable system that accounts for real-world usage environments, thereby demonstrating the commercialization potential of the technology.
(This article belongs to the Section Wearables)
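
The reported gauge factor follows the standard definition GF = (ΔR/R0)/ε, the relative resistance change per unit strain. The resistance values below are invented purely to show the arithmetic behind GF = 88.

```python
def gauge_factor(r0, r, strain):
    """GF = (delta_R / R0) / strain: relative resistance change per unit strain."""
    return ((r - r0) / r0) / strain

# Illustrative numbers only: a 10 kOhm sensor rising to 18.8 kOhm at 1% strain
# gives GF = 88, the value reported for the best knit structure.
print(gauge_factor(10_000, 18_800, 0.01))  # 88.0
```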

17 pages, 1063 KiB  
Article
In More Than Words: Ecopoetic Hybrids with Visual and Musical Arts
by Lynn Keller
Humanities 2025, 14(7), 145; https://doi.org/10.3390/h14070145 - 8 Jul 2025
Abstract
While poetry has long relied on musical and visual elements for its communicative power, numerous contemporary poets are drawing so dramatically on the resources of the visual arts and on elements of musical scoring that their poems become inter-arts hybrids. The interdisciplinary character of environmental writing and its attachment to material conditions of planetary life particularly invite the use of visual and/or audio technologies as documentation or as prompts toward multisensory attention that may shift readers’ perceptions of the more-than-human world. This essay examines four recent works of ecopoetry from the US to explore some of the diverse ways in which, by integrating into volumes of poetry their own visual and musical art, poets are expanding the environmental imagination and enhancing their environmental messaging. The visual and musical elements, I argue, offer fresh perceptual lenses that help break down cognitive habits bolstering separations of Western humans from more-than-human realms or dampening awareness of social and cultural norms that foster environmental degradation and violations of environmental justice. The multi-modal works discussed are Jennifer Scappettone’s The Republic of Exit 43, JJJJJerome Ellis’s Aster of Ceremonies, Danielle Vogel’s Edges & Fray, and Jonathan Skinner’s “Blackbird Stanzas.”
(This article belongs to the Special Issue Hybridity and Border Crossings in Contemporary North American Poetry)

23 pages, 1945 KiB  
Article
Spectro-Image Analysis with Vision Graph Neural Networks and Contrastive Learning for Parkinson’s Disease Detection
by Nuwan Madusanka, Hadi Sedigh Malekroodi, H. M. K. K. M. B. Herath, Chaminda Hewage, Myunggi Yi and Byeong-Il Lee
J. Imaging 2025, 11(7), 220; https://doi.org/10.3390/jimaging11070220 - 2 Jul 2025
Abstract
This study presents a novel framework that integrates Vision Graph Neural Networks (ViGs) with supervised contrastive learning for enhanced spectro-temporal image analysis of speech signals in Parkinson’s disease (PD) detection. The approach introduces a frequency band decomposition strategy that transforms raw audio into three complementary spectral representations, capturing distinct PD-specific characteristics across low-frequency (0–2 kHz), mid-frequency (2–6 kHz), and high-frequency (6 kHz+) bands. The framework processes mel multi-band spectro-temporal representations through a ViG architecture that models complex graph-based relationships between spectral and temporal components, trained using a supervised contrastive objective that learns discriminative representations distinguishing PD-affected from healthy speech patterns. Comprehensive experimental validation on multi-institutional datasets from Italy, Colombia, and Spain demonstrates that the proposed ViG-contrastive framework achieves superior classification performance, with the ViG-M-GELU architecture achieving 91.78% test accuracy. The integration of graph neural networks with contrastive learning enables effective learning from limited labeled data while capturing complex spectro-temporal relationships that traditional Convolutional Neural Network (CNN) approaches miss, representing a promising direction for developing more accurate and clinically viable speech-based diagnostic tools for PD.
(This article belongs to the Section Medical Imaging)
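
The three-band decomposition is concrete enough to sketch with standard filters. The filter order and sample rate below are assumptions; each band would then be rendered as a mel spectro-temporal image for the ViG, a step omitted here.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def band_decompose(wave, fs=16000):
    """Split speech into the three bands named in the abstract:
    low (0-2 kHz), mid (2-6 kHz), high (6 kHz+)."""
    low  = sosfiltfilt(butter(4, 2000, "lowpass", fs=fs, output="sos"), wave)
    mid  = sosfiltfilt(butter(4, [2000, 6000], "bandpass", fs=fs, output="sos"), wave)
    high = sosfiltfilt(butter(4, 6000, "highpass", fs=fs, output="sos"), wave)
    return low, mid, high

bands = band_decompose(np.random.randn(16000))  # one second of toy audio
```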
