Search Results (888)

Search Parameters:
Keywords = audio effects

18 pages, 7321 KiB  
Article
Fault Diagnosis of Wind Turbine Gearbox Based on Mel Spectrogram and Improved ResNeXt50 Model
by Xiaojuan Zhang, Feixiang Jia and Yayu Chen
Appl. Sci. 2025, 15(15), 8563; https://doi.org/10.3390/app15158563 (registering DOI) - 1 Aug 2025
Abstract
In response to the complex and variable loads on wind turbine gearbox bearings under working conditions, and the limited amount of sound data that makes fault identification difficult, this study focuses on sound signals and proposes an intelligent diagnostic method based on deep learning. Adding the CBAM module to ResNeXt enhances the model’s attention to important features, and combining it with the ArcLoss loss function makes the model learn more discriminative features, strengthening its generalization ability. We used a fine-tuning transfer learning strategy, transferring pre-trained model parameters to the CBAM-ResNeXt50-ArcLoss model and training it on Mel spectrograms extracted from the sound signals to extract and classify audio features of the wind turbine gearbox. Experimental validation of the proposed method on collected sound signals showed its effectiveness and superiority. Compared to the CNN, ResNet50, ResNeXt50, and CBAM-ResNet50 methods, the CBAM-ResNeXt50-ArcLoss model achieved improvements of 13.3, 3.6, 2.4, and 1.3, respectively. Through comparison with classical algorithms, we demonstrated that the method proposed in this study exhibits better diagnostic capability in classifying wind turbine gearbox sound signals. Full article
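
As a rough illustration of the Mel-spectrogram front end described above, the sketch below converts an audio clip into a log-Mel spectrogram with librosa; the file name and parameter values (sample rate, FFT size, 128 Mel bands) are illustrative assumptions, not the paper's settings.

```python
# Sketch: extract a log-Mel spectrogram from a gearbox sound recording.
# File path and parameter values are illustrative, not taken from the paper.
import librosa
import numpy as np

def log_mel_spectrogram(path, sr=16000, n_fft=1024, hop_length=256, n_mels=128):
    y, sr = librosa.load(path, sr=sr, mono=True)           # load and resample the audio
    mel = librosa.feature.melspectrogram(
        y=y, sr=sr, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    return librosa.power_to_db(mel, ref=np.max)             # convert power to a dB scale

# Example: spec = log_mel_spectrogram("gearbox_segment.wav")  ->  shape (128, n_frames)
```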

21 pages, 5062 KiB  
Article
Forest Management Effects on Breeding Bird Communities in Apennine Beech Stands
by Guglielmo Londi, Francesco Parisi, Elia Vangi, Giovanni D’Amico and Davide Travaglini
Ecologies 2025, 6(3), 54; https://doi.org/10.3390/ecologies6030054 (registering DOI) - 1 Aug 2025
Abstract
Beech forests in the Italian peninsula are actively managed and they also support a high level of biodiversity. Hence, biodiversity conservation can be synergistic with timber production and carbon sequestration, enhancing the overall economic benefits of forest management. This study aimed to evaluate the effect of forest management regimes on bird communities in the Italian Peninsula during 2022 through audio recordings. We studied the structure, composition, and specialization of the breeding bird community in four managed beech stands (three even-aged beech stands aged 20, 60, and 100 years old, managed by a uniform shelterwood system; one uneven-aged stand, managed by a single-tree selection system) and one uneven-aged, unmanaged beech stand in the northern Apennines (Tuscany region, Italy). Between April and June 2022, data were collected through four 1-hour audio recording sessions per site, analyzing 5 min sequences. The unmanaged stand hosted a richer (a higher number of species, p < 0.001) and more specialized (a higher number of cavity-nesting species, p < 0.001; higher Woodland Bird Community Index (WBCI) values, p < 0.001; and eight characteristic species, including at least four highly specialized ones) bird community, compared to all the managed forests; moreover, the latter were homogeneous (similar to each other). Our study suggests that the unmanaged beech forests should be a priority option for conservation, while in terms of the managed beech forests, greater attention should be paid to defining the thresholds for snags, deadwood, and large trees to be retained to enhance their biodiversity value. Studies in additional sites, conducted over more years and including multi-taxon communities, are recommended for a deeper understanding and generalizable results. Full article

15 pages, 415 KiB  
Article
Enhancing MusicGen with Prompt Tuning
by Hohyeon Shin, Jeonghyeon Im and Yunsick Sung
Appl. Sci. 2025, 15(15), 8504; https://doi.org/10.3390/app15158504 (registering DOI) - 31 Jul 2025
Abstract
Generative AI has been gaining attention across various creative domains. In particular, MusicGen stands out as a representative approach capable of generating music based on text or audio inputs. However, it has limitations in producing high-quality outputs for specific genres and fully reflecting user intentions. This paper proposes a prompt tuning technique that effectively adjusts the output quality of MusicGen without modifying its original parameters and optimizes its ability to generate music tailored to specific genres and styles. Experiments were conducted to compare the performance of the traditional MusicGen with the proposed method and evaluate the quality of generated music using the Contrastive Language-Audio Pretraining (CLAP) and Kullback–Leibler Divergence (KLD) scoring approaches. The results demonstrated that the proposed method significantly improved the output quality and musical coherence, particularly for specific genres and styles. Compared with the traditional model, the CLAP score was increased by 0.1270, and the KLD score was increased by 0.00403 on average. The effectiveness of prompt tuning in optimizing the performance of MusicGen validated the proposed method and highlighted its potential for advancing generative AI-based music generation tools. Full article
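
The prompt tuning idea, adjusting output quality without touching the base model's parameters, can be sketched as a set of learnable soft-prompt vectors prepended to the frozen model's text-conditioning embeddings. The PyTorch module below is a generic illustration under that assumption and does not use MusicGen's actual interface.

```python
# Sketch of soft prompt tuning: learnable prompt vectors are prepended to the
# text-conditioning embeddings while the base generator's parameters stay frozen.
# Generic PyTorch; names and sizes are illustrative, not MusicGen's API.
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    def __init__(self, n_prompt_tokens: int, embed_dim: int):
        super().__init__()
        # Only these vectors are trained; the pretrained generator is frozen.
        self.prompt = nn.Parameter(torch.randn(n_prompt_tokens, embed_dim) * 0.02)

    def forward(self, text_embeddings: torch.Tensor) -> torch.Tensor:
        # text_embeddings: (batch, seq_len, embed_dim)
        batch = text_embeddings.size(0)
        prompt = self.prompt.unsqueeze(0).expand(batch, -1, -1)
        return torch.cat([prompt, text_embeddings], dim=1)

# During training only SoftPrompt.parameters() are handed to the optimizer,
# so the pretrained weights remain untouched.
```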

21 pages, 1681 KiB  
Article
Cross-Modal Complementarity Learning for Fish Feeding Intensity Recognition via Audio–Visual Fusion
by Jian Li, Yanan Wei, Wenkai Ma and Tan Wang
Animals 2025, 15(15), 2245; https://doi.org/10.3390/ani15152245 - 31 Jul 2025
Viewed by 48
Abstract
Accurate evaluation of fish feeding intensity is crucial for optimizing aquaculture efficiency and the healthy growth of fish. Previous methods mainly rely on single-modal approaches (e.g., audio or visual). However, the complex underwater environment makes single-modal monitoring methods face significant challenges: visual systems are severely affected by water turbidity, lighting conditions, and fish occlusion, while acoustic systems suffer from background noise. Although existing studies have attempted to combine acoustic and visual information, most adopt simple feature-level fusion strategies, which fail to fully explore the complementary advantages of the two modalities under different environmental conditions and lack dynamic evaluation mechanisms for modal reliability. To address these problems, we propose the Adaptive Cross-modal Attention Fusion Network (ACAF-Net), a cross-modal complementarity learning framework with a two-stage attention fusion mechanism: (1) a cross-modal enhancement stage that enriches individual representations through Low-rank Bilinear Pooling and learnable fusion weights; (2) an adaptive attention fusion stage that dynamically weights acoustic and visual features based on complementarity and environmental reliability. Our framework incorporates dimension alignment strategies and attention mechanisms to capture temporal–spatial complementarity between acoustic feeding signals and visual behavioral patterns. Extensive experiments demonstrate superior performance compared to single-modal and conventional fusion approaches, with 6.4% accuracy improvement. The results validate the effectiveness of exploiting cross-modal complementarity for underwater behavioral analysis and establish a foundation for intelligent aquaculture monitoring systems. Full article
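
A minimal sketch of adaptive audio-visual weighting in the spirit of the fusion stage described above: a small gating network predicts per-sample modality weights and mixes the two feature vectors. The dimensions, layer sizes, and four-class output are assumptions for illustration, not the paper's ACAF-Net.

```python
# Sketch of adaptive audio-visual fusion: a gating network predicts a per-sample
# reliability weight for each modality and mixes the two feature vectors.
import torch
import torch.nn as nn

class AdaptiveFusion(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(2 * dim, dim), nn.ReLU(),
                                  nn.Linear(dim, 2))
        self.classifier = nn.Linear(dim, 4)   # e.g., 4 feeding-intensity levels (assumed)

    def forward(self, audio_feat, visual_feat):
        # audio_feat, visual_feat: (batch, dim), already dimension-aligned
        weights = torch.softmax(
            self.gate(torch.cat([audio_feat, visual_feat], dim=-1)), dim=-1
        )
        fused = weights[:, 0:1] * audio_feat + weights[:, 1:2] * visual_feat
        return self.classifier(fused)
```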

14 pages, 1974 KiB  
Article
Effect of Transducer Burn-In on Subjective and Objective Parameters of Loudspeakers
by Tomasz Kopciński, Bartłomiej Kruk and Jan Kucharczyk
Appl. Sci. 2025, 15(15), 8425; https://doi.org/10.3390/app15158425 - 29 Jul 2025
Viewed by 199
Abstract
Speaker burn-in is a controversial practice in the audio world, based on the belief that new devices reach optimal performance only after a certain period of use. Supporters claim it improves component flexibility, reduces initial distortion, and enhances sound quality—especially in the low-frequency range. Critics, however, emphasize the lack of scientific evidence for audible changes and point to the placebo effect in subjective listening tests. They argue that modern manufacturing and strict quality control minimize differences between new and “burned-in” devices. This study cites a standard describing a preliminary burn-in procedure, specifying the exact conditions and duration required. Objective tests revealed slight changes in speaker impedance and amplitude response after burn-in, but these differences are inaudible to the average listener. Notably, significant variation was observed between speakers of the same series, attributed to production line tolerances rather than use-related changes. The study also explored aging processes in speaker materials to better understand potential long-term effects. However, subjective listening tests showed that listeners rated the sound consistently across all test cases, regardless of whether the speaker had undergone burn-in. Overall, while minor physical changes may occur, their audible impact is negligible, especially for non-expert users. Full article
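
For readers who want to reproduce the objective part of such a comparison, one possible sketch is to estimate the amplitude response of the same excitation recorded before and after burn-in and inspect the dB difference; the Welch parameters and sample rate below are placeholders, not the study's measurement setup.

```python
# Sketch: compare a loudspeaker's amplitude response before and after burn-in by
# estimating the spectrum of a recorded test signal and taking the dB difference.
import numpy as np
from scipy.signal import welch

def amplitude_response_db(x, fs):
    f, pxx = welch(x, fs=fs, nperseg=4096)      # power spectral density estimate
    return f, 10 * np.log10(pxx + 1e-12)         # convert to dB

# before, after: recordings of the same excitation (numpy arrays, placeholder data)
# f, resp_before = amplitude_response_db(before, fs=48000)
# _, resp_after  = amplitude_response_db(after,  fs=48000)
# delta_db = resp_after - resp_before            # sub-dB differences are generally inaudible
```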

21 pages, 5817 KiB  
Article
UN15: An Urban Noise Dataset Coupled with Time–Frequency Attention for Environmental Sound Classification
by Yu Shen, Ge Cao, Huan-Yu Dong, Bo Dong and Chang-Myung Lee
Appl. Sci. 2025, 15(15), 8413; https://doi.org/10.3390/app15158413 - 29 Jul 2025
Viewed by 88
Abstract
With the increasing severity of urban noise pollution, its detrimental impact on public health has garnered growing attention. However, accurate identification and classification of noise sources in complex urban acoustic environments remain major technical challenges for achieving refined noise management. To address this issue, this study presents two key contributions. First, we construct a new urban noise classification dataset, namely the urban noise 15-category dataset (UN15), which consists of 1620 audio clips from 15 representative categories, including traffic, construction, crowd activity, and commercial noise, recorded from diverse real-world urban scenes. Second, we propose a novel deep neural network architecture based on a residual network and integrated with a time–frequency attention mechanism, referred to as residual network with temporal–frequency attention (ResNet-TF). Extensive experiments conducted on the UN15 dataset demonstrate that ResNet-TF outperforms several mainstream baseline models in both classification accuracy and robustness. These results not only verify the effectiveness of the proposed attention mechanism but also establish the UN15 dataset as a valuable benchmark for future research in urban noise classification. Full article
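
A time-frequency attention block of the general kind described above can be sketched as two attention masks, one pooled over time and one over frequency, that reweight a spectrogram feature map. The module below is a generic illustration, not the exact ResNet-TF block from the paper.

```python
# Sketch of time-frequency attention: the feature map of a spectrogram
# (batch, channels, freq, time) is reweighted by masks pooled along the
# frequency and time axes separately.
import torch
import torch.nn as nn

class TimeFreqAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.freq_att = nn.Conv2d(channels, channels, kernel_size=1)
        self.time_att = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):                                  # x: (B, C, F, T)
        freq_ctx = x.mean(dim=3, keepdim=True)             # pool over time -> (B, C, F, 1)
        time_ctx = x.mean(dim=2, keepdim=True)             # pool over freq -> (B, C, 1, T)
        freq_mask = torch.sigmoid(self.freq_att(freq_ctx))
        time_mask = torch.sigmoid(self.time_att(time_ctx))
        return x * freq_mask * time_mask                   # broadcast reweighting
```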
(This article belongs to the Section Acoustics and Vibrations)

19 pages, 290 KiB  
Article
Artificial Intelligence in Primary Care: Support or Additional Burden on Physicians’ Healthcare Work?—A Qualitative Study
by Stefanie Mache, Monika Bernburg, Annika Würtenberger and David A. Groneberg
Clin. Pract. 2025, 15(8), 138; https://doi.org/10.3390/clinpract15080138 - 25 Jul 2025
Viewed by 156
Abstract
Background: Artificial intelligence (AI) is being increasingly promoted as a means to enhance diagnostic accuracy, to streamline workflows, and to improve overall care quality in primary care. However, empirical evidence on how primary care physicians (PCPs) perceive, engage with, and emotionally respond to AI technologies in everyday clinical settings remains limited. Concerns persist regarding AI’s usability, transparency, and potential impact on professional identity, workload, and the physician–patient relationship. Methods: This qualitative study investigated the lived experiences and perceptions of 28 PCPs practicing in diverse outpatient settings across Germany. Participants were purposively sampled to ensure variation in age, practice characteristics, and digital proficiency. Data were collected through in-depth, semi-structured interviews, which were audio-recorded, transcribed verbatim, and subjected to rigorous thematic analysis employing Mayring’s qualitative content analysis framework. Results: Participants demonstrated a fundamentally ambivalent stance toward AI integration in primary care. Perceived advantages included enhanced diagnostic support, relief from administrative burdens, and facilitation of preventive care. Conversely, physicians reported concerns about workflow disruption due to excessive system prompts, lack of algorithmic transparency, increased cognitive and emotional strain, and perceived threats to clinical autonomy and accountability. The implications for the physician–patient relationship were seen as double-edged: while some believed AI could foster trust through transparent use, others feared depersonalization of care. Crucial prerequisites for successful implementation included transparent and explainable systems, structured training opportunities, clinician involvement in design processes, and seamless integration into clinical routines. Conclusions: Primary care physicians’ engagement with AI is marked by cautious optimism, shaped by both perceived utility and significant concerns. Effective and ethically sound implementation requires co-design approaches that embed clinical expertise, ensure algorithmic transparency, and align AI applications with the realities of primary care workflows. Moreover, foundational AI literacy should be incorporated into undergraduate health professional curricula to equip future clinicians with the competencies necessary for responsible and confident use. These strategies are essential to safeguard professional integrity, support clinician well-being, and maintain the humanistic core of primary care. Full article
19 pages, 3365 KiB  
Article
Robust Federated Learning Against Data Poisoning Attacks: Prevention and Detection of Attacked Nodes
by Pretom Roy Ovi and Aryya Gangopadhyay
Electronics 2025, 14(15), 2970; https://doi.org/10.3390/electronics14152970 - 25 Jul 2025
Viewed by 248
Abstract
Federated learning (FL) enables collaborative model building among a large number of participants without sharing sensitive data with the central server. Because of its distributed nature, FL has limited control over local data and the corresponding training process. Therefore, it is susceptible to data poisoning attacks, where malicious workers use manipulated training data to train the model. Furthermore, attackers on the worker side can easily initiate such attacks by swapping the labels of training instances, adding noise to training instances, or injecting out-of-distribution instances into the local data. Local workers under such attacks carry incorrect information to the server, poison the global model, and cause misclassifications. Preventing and detecting such data poisoning attacks is therefore crucial for building a robust federated training framework. To address this, we propose a prevention strategy in federated learning, namely confident federated learning, to protect workers from such data poisoning attacks. Our prevention strategy first validates the label quality of local training samples by characterizing and identifying label errors in the local training data, and then excludes the detected mislabeled samples from local training. To this end, we evaluated the proposed approach in both the image and audio domains, and the experimental results validated the robustness of confident federated learning in preventing data poisoning attacks. The proposed method can detect mislabeled training samples with over 85% accuracy and exclude them from the training set to prevent data poisoning attacks on the local workers. However, the prevention strategy can only counter an attack locally up to a certain percentage of poisonous samples; beyond that percentage it may no longer be effective, and detection of the attacked workers is needed. So, in addition to the prevention strategy, we propose a novel detection strategy in the federated learning framework to identify the malicious workers under attack. We create a class-wise cluster representation for every participating worker by utilizing the neuron activation maps of the local models and analyze the resulting clusters to filter out the workers under attack before model aggregation. We experimentally demonstrated the efficacy of the proposed detection strategy in detecting workers affected by data poisoning attacks, along with the attack types, e.g., label flipping or dirty labeling. In addition, our experimental results show that the global model could not converge even after a large number of training rounds in the presence of malicious workers, whereas after detecting the malicious workers with the proposed method and discarding them from model aggregation, the global model achieved convergence within very few training rounds. Furthermore, the proposed approach stays robust under different data distributions and model sizes and does not require prior knowledge about the number of attackers in the system. Full article
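
The label-quality check at the heart of the prevention strategy can be sketched in the style of confident learning: flag a sample when the model's out-of-sample probability for its given label falls below that class's average self-confidence. The thresholding rule below is illustrative, not the paper's exact criterion.

```python
# Sketch of a confident-learning-style label check: a sample is flagged as likely
# mislabeled when the model's out-of-fold probability for its assigned label is
# below that class's mean self-confidence.
import numpy as np

def find_suspect_labels(pred_probs: np.ndarray, labels: np.ndarray) -> np.ndarray:
    # pred_probs: (n_samples, n_classes) out-of-fold predicted probabilities
    # labels:     (n_samples,) integer labels assigned by the local worker
    n_classes = pred_probs.shape[1]
    self_conf = pred_probs[np.arange(len(labels)), labels]
    # per-class threshold: mean confidence the model assigns to that class's own samples
    thresholds = np.array([self_conf[labels == c].mean() for c in range(n_classes)])
    return np.where(self_conf < thresholds[labels])[0]     # indices of suspect samples

# The flagged indices would be excluded from local training before model updates are sent.
```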

21 pages, 2789 KiB  
Article
BIM-Based Adversarial Attacks Against Speech Deepfake Detectors
by Wendy Edda Wang, Davide Salvi, Viola Negroni, Daniele Ugo Leonzio, Paolo Bestagini and Stefano Tubaro
Electronics 2025, 14(15), 2967; https://doi.org/10.3390/electronics14152967 - 24 Jul 2025
Viewed by 216
Abstract
Automatic Speaker Verification (ASV) systems are increasingly employed to secure access to services and facilities. However, recent advances in speech deepfake generation pose serious threats to their reliability. Modern speech synthesis models can convincingly imitate a target speaker’s voice and generate realistic synthetic audio, potentially enabling unauthorized access through ASV systems. To counter these threats, forensic detectors have been developed to distinguish between real and fake speech. Although these models achieve strong performance, their deep learning nature makes them susceptible to adversarial attacks, i.e., carefully crafted, imperceptible perturbations in the audio signal that make the model unable to classify correctly. In this paper, we explore adversarial attacks targeting speech deepfake detectors. Specifically, we analyze the effectiveness of Basic Iterative Method (BIM) attacks applied in both time and frequency domains under white- and black-box conditions. Additionally, we propose an ensemble-based attack strategy designed to simultaneously target multiple detection models. This approach generates adversarial examples with balanced effectiveness across the ensemble, enhancing transferability to unseen models. Our experimental results show that, although crafting universally transferable attacks remains challenging, it is possible to fool state-of-the-art detectors using minimal, imperceptible perturbations, highlighting the need for more robust defenses in speech deepfake detection. Full article
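
The Basic Iterative Method itself is straightforward to sketch: repeated signed-gradient steps on the waveform, clipped to a small L-infinity ball so the perturbation stays imperceptible. The detector, loss, and step sizes below are placeholders.

```python
# Sketch of a time-domain BIM attack: gradient-ascent steps on the detector's loss
# for the true label, clipped to an eps-ball around the original audio.
import torch

def bim_attack(model, x, true_label, eps=0.001, alpha=0.0002, steps=10):
    # x: (batch, samples) waveform; true_label: (batch,) class indices (e.g., 0=real, 1=fake)
    x_adv = x.clone().detach()
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = loss_fn(model(x_adv), true_label)
        grad = torch.autograd.grad(loss, x_adv)[0]
        with torch.no_grad():
            x_adv = x_adv + alpha * grad.sign()            # ascend the loss so the detector errs
            x_adv = torch.clamp(x_adv, x - eps, x + eps)   # stay imperceptibly close to x
            x_adv = torch.clamp(x_adv, -1.0, 1.0)          # keep a valid waveform range
    return x_adv.detach()
```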

21 pages, 9522 KiB  
Article
Deep Edge IoT for Acoustic Detection of Queenless Beehives
by Christos Sad, Dimitrios Kampelopoulos, Ioannis Sofianidis, Dimitrios Kanelis, Spyridon Nikolaidis, Chrysoula Tananaki and Kostas Siozios
Electronics 2025, 14(15), 2959; https://doi.org/10.3390/electronics14152959 - 24 Jul 2025
Viewed by 292
Abstract
Honey bees play a vital role in ecosystem stability, and the need to monitor colony health has driven the development of IoT-based systems in beekeeping, with recent studies exploring both empirical and machine learning approaches to detect and analyze key hive conditions. In this study, we present an IoT-based system that leverages sensors to record and analyze the acoustic signals produced within a beehive. The captured audio data is transmitted to the cloud, where it is converted into mel-spectrogram representations for analysis. We explore multiple data pre-processing strategies and machine learning (ML) models, assessing their effectiveness in classifying queenless states. To evaluate model generalization, we apply transfer learning (TL) techniques across datasets collected from different hives. Additionally, we implement the feature extraction process and deploy the pre-trained ML model on a deep edge IoT device (Arduino Zero). We examine both memory consumption and execution time. The results indicate that the selected feature extraction method and ML model, which were identified through extensive experimentation, are sufficiently lightweight to operate within the device’s memory constraints. Furthermore, the execution time confirms the feasibility of real-time queenless state detection in edge-based applications. Full article
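
One way to keep the feature extraction light enough for a deep-edge device is to precompute the Mel filterbank offline and leave only an FFT and one small matrix multiply per frame on the device. The sketch below illustrates that split with assumed sizes (32 Mel bands, 512-point FFT); it is not the study's actual configuration.

```python
# Sketch of an edge-friendly feature pipeline: the Mel filterbank is built offline
# (here with librosa) and stored as a constant matrix, so the device only needs an
# FFT and one matrix multiply per audio frame.
import numpy as np
import librosa

SR, N_FFT, N_MELS = 16000, 512, 32
MEL_FB = librosa.filters.mel(sr=SR, n_fft=N_FFT, n_mels=N_MELS)   # (32, 257) constant

def frame_features(frame: np.ndarray) -> np.ndarray:
    # frame: N_FFT audio samples captured on the device
    spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame)))) ** 2
    return np.log(MEL_FB @ spectrum + 1e-9)                       # 32 log-Mel values per frame
```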
(This article belongs to the Special Issue Modern Circuits and Systems Technologies (MOCAST 2024))

20 pages, 4310 KiB  
Article
Training Rarámuri Criollo Cattle to Virtual Fencing in a Chaparral Rangeland
by Sara E. Campa Madrid, Andres R. Perea, Micah Funk, Maximiliano J. Spetter, Mehmet Bakir, Jeremy Walker, Rick E. Estell, Brandon Smythe, Sergio Soto-Navarro, Sheri A. Spiegal, Brandon T. Bestelmeyer and Santiago A. Utsumi
Animals 2025, 15(15), 2178; https://doi.org/10.3390/ani15152178 - 24 Jul 2025
Viewed by 484
Abstract
Virtual fencing (VF) offers a promising alternative to conventional or electrified fences for managing livestock grazing distribution. This study evaluated the behavioral responses of 25 Rarámuri Criollo cows fitted with Nofence® collars in Pine Valley, CA, USA. The VF system was deployed in chaparral rangeland pastures. The study included a 14-day training phase followed by an 18-day testing phase. The collar-recorded variables, including audio warnings and electric pulses, animal movement, and daily typical behavior patterns of cows classified into a High or Low virtual fence response group, were compared using repeated-measure analyses with mixed models. During training, High-response cows (i.e., resistant responders) received more audio warnings and electric pulses, while Low-response cows (i.e., active responders) had fewer audio warnings and electric pulses, explored smaller areas, and exhibited lower mobility. Despite these differences, both groups showed a time-dependent decrease in the pulse-to-warning ratio, indicating increased reliance on audio cues and reduced need for electrical stimulation to achieve similar containment rates. In the testing phase, both groups maintained high containment with minimal reinforcement. The study found that Rarámuri Criollo cows can effectively adapt to virtual fencing technology, achieving over 99% containment rate while displaying typical diurnal patterns for grazing, resting, or traveling behavior. These findings support the technical feasibility of using virtual fencing in chaparral rangelands and underscore the importance of accounting for individual behavioral variability in behavior-based containment systems. Full article

13 pages, 1305 KiB  
Article
Fine-Tuning BirdNET for the Automatic Ecoacoustic Monitoring of Bird Species in the Italian Alpine Forests
by Giacomo Schiavo, Alessia Portaccio and Alberto Testolin
Information 2025, 16(8), 628; https://doi.org/10.3390/info16080628 - 23 Jul 2025
Viewed by 253
Abstract
The ongoing decline in global biodiversity constitutes a critical challenge for environmental science, necessitating the prompt development of effective monitoring frameworks and conservation protocols to safeguard the structure and function of natural ecosystems. Recent progress in ecoacoustic monitoring, supported by advances in artificial intelligence, might finally offer scalable tools for systematic biodiversity assessment. In this study, we evaluate the performance of BirdNET, a state-of-the-art deep learning model for avian sound recognition, in the context of selected bird species characteristic of the Italian Alpine region. To this end, we assemble a comprehensive, manually annotated audio dataset targeting key regional species, and we investigate a variety of strategies for model adaptation, including fine-tuning with data augmentation techniques to enhance recognition under challenging recording conditions. As a baseline, we also develop and evaluate a simple Convolutional Neural Network (CNN) trained exclusively on our domain-specific dataset. Our findings indicate that BirdNET performance can be greatly improved by fine-tuning the pre-trained network with data collected within the specific regional soundscape, outperforming both the original BirdNET and the baseline CNN by a significant margin. These findings underscore the importance of environmental adaptation and data variability for the development of automated ecoacoustic monitoring devices while highlighting the potential of deep learning methods in supporting conservation efforts and informing soundscape management in protected areas. Full article
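
Typical waveform augmentations used to increase variability during fine-tuning (random gain, small time shifts, added noise at a random SNR) can be sketched as below; the parameter ranges are assumptions, not the settings used in the study.

```python
# Sketch of simple waveform augmentations for fine-tuning data: random gain,
# a small circular time shift, and white noise added at a random SNR.
import numpy as np

def augment(y: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    y = y * rng.uniform(0.7, 1.3)                               # random gain
    y = np.roll(y, rng.integers(-len(y) // 10, len(y) // 10))   # small time shift
    snr_db = rng.uniform(5, 30)                                 # target signal-to-noise ratio
    noise = rng.normal(0, 1, size=y.shape)
    noise *= np.sqrt((y ** 2).mean() / (10 ** (snr_db / 10)) / (noise ** 2).mean() + 1e-12)
    return y + noise
```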
(This article belongs to the Special Issue Signal Processing Based on Machine Learning Techniques)

24 pages, 8344 KiB  
Article
Research and Implementation of Travel Aids for Blind and Visually Impaired People
by Jun Xu, Shilong Xu, Mingyu Ma, Jing Ma and Chuanlong Li
Sensors 2025, 25(14), 4518; https://doi.org/10.3390/s25144518 - 21 Jul 2025
Viewed by 308
Abstract
Blind and visually impaired (BVI) people face significant challenges in perception, navigation, and safety during travel. Existing infrastructure (e.g., blind lanes) and traditional aids (e.g., walking sticks, basic audio feedback) provide limited flexibility and interactivity for complex environments. To solve this problem, we propose a real-time travel assistance system based on deep learning. The hardware comprises an NVIDIA Jetson Nano controller, an Intel D435i depth camera for environmental sensing, and SG90 servo motors for feedback. To address embedded device computational constraints, we developed a lightweight object detection and segmentation algorithm. Key innovations include a multi-scale attention feature extraction backbone, a dual-stream fusion module incorporating the Mamba architecture, and adaptive context-aware detection/segmentation heads. This design ensures high computational efficiency and real-time performance. The system workflow is as follows: (1) the D435i captures real-time environmental data; (2) the processor analyzes this data, converting obstacle distances and path deviations into electrical signals; (3) servo motors deliver vibratory feedback for guidance and alerts. Preliminary tests confirm that the system can effectively detect obstacles and correct path deviations in real time, suggesting its potential to assist BVI users. However, as this is a work in progress, comprehensive field trials with BVI participants are required to fully validate its efficacy. Full article
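
As an illustration of the feedback step, a hypothetical mapping from obstacle distance to vibration intensity might look like the sketch below; the distance thresholds and the linear ramp are assumptions, not the system's actual firmware.

```python
# Sketch: map an obstacle distance from the depth camera to a vibration duty cycle
# for the servo feedback. Thresholds and the linear ramp are hypothetical.
def vibration_duty(distance_m: float, min_dist: float = 0.4, max_dist: float = 3.0) -> float:
    """Return a duty cycle in [0, 1]: stronger vibration for closer obstacles."""
    if distance_m >= max_dist:
        return 0.0                      # nothing within range: no alert
    if distance_m <= min_dist:
        return 1.0                      # imminent obstacle: maximum vibration
    return (max_dist - distance_m) / (max_dist - min_dist)

# e.g., vibration_duty(1.7) -> ~0.5, which the controller would turn into a PWM drive signal
```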
(This article belongs to the Section Intelligent Sensors)

21 pages, 1689 KiB  
Article
Exploring LLM Embedding Potential for Dementia Detection Using Audio Transcripts
by Brandon Alejandro Llaca-Sánchez, Luis Roberto García-Noguez, Marco Antonio Aceves-Fernández, Andras Takacs and Saúl Tovar-Arriaga
Eng 2025, 6(7), 163; https://doi.org/10.3390/eng6070163 - 17 Jul 2025
Viewed by 280
Abstract
Dementia is a neurodegenerative disorder characterized by progressive cognitive impairment that significantly affects daily living. Early detection of Alzheimer’s disease—the most common form of dementia—remains essential for prompt intervention and treatment, yet clinical diagnosis often requires extensive and resource-intensive procedures. This article explores the effectiveness of automated Natural Language Processing (NLP) methods for identifying Alzheimer’s indicators from audio transcriptions of the Cookie Theft picture description task in the PittCorpus dementia database. Five NLP approaches were compared: a classical Tf–Idf statistical representation and embeddings derived from large language models (GloVe, BERT, Gemma-2B, and Linq-Embed-Mistral), each integrated with a logistic regression classifier. Transcriptions were carefully preprocessed to preserve linguistically relevant features such as repetitions, self-corrections, and pauses. To compare the performance of the five approaches, a stratified 5-fold cross-validation was conducted; the best results were obtained with BERT embeddings (84.73% accuracy) closely followed by the simpler Tf–Idf approach (83.73% accuracy) and the state-of-the-art model Linq-Embed-Mistral (83.54% accuracy), while Gemma-2B and GloVe embeddings yielded slightly lower performances (80.91% and 78.11% accuracy, respectively). Contrary to initial expectations—that richer semantic and contextual embeddings would substantially outperform simpler frequency-based methods—the competitive accuracy of Tf–Idf suggests that the choice and frequency of the words used might be more important than semantic or contextual information in Alzheimer’s detection. This work represents an effort toward implementing user-friendly software capable of offering an initial indicator of Alzheimer’s risk, potentially reducing the need for an in-person clinical visit. Full article
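
The simplest of the five pipelines, Tf-Idf features with a logistic regression classifier under stratified 5-fold cross-validation, can be sketched with scikit-learn as below; `transcripts` and `labels` stand in for the preprocessed PittCorpus data, and the vectorizer settings are assumptions.

```python
# Sketch of the Tf-Idf baseline: Tf-Idf features, a logistic regression classifier,
# and stratified 5-fold cross-validated accuracy.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

pipeline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=2),
                         LogisticRegression(max_iter=1000))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# transcripts: list of transcription strings; labels: 0 = control, 1 = dementia
# scores = cross_val_score(pipeline, transcripts, labels, cv=cv, scoring="accuracy")
# print(scores.mean())
```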

18 pages, 1150 KiB  
Article
Navigating by Design: Effects of Individual Differences and Navigation Modality on Spatial Memory Acquisition
by Xianyun Liu, Yanan Zhang and Baihu Sun
Behav. Sci. 2025, 15(7), 959; https://doi.org/10.3390/bs15070959 - 15 Jul 2025
Viewed by 272
Abstract
Spatial memory is a critical component of spatial cognition, particularly in unfamiliar environments. As navigation systems become integral to daily life, understanding how individuals with varying spatial abilities respond to different navigation modes is increasingly important. This study employed a virtual driving environment to examine how participants with good or poor spatial abilities performed under three navigation modes: visual, audio, and combined audio–visual. A total of 78 participants were divided into two groups, good sense of direction (G-SOD) and poor sense of direction (P-SOD), according to their Santa Barbara Sense of Direction (SBSOD) scores, and were randomly assigned to one of the three navigation modes. Participants followed navigation cues while driving the simulated route to the end point twice during the learning phase, then completed a route retracing task, a scene recognition task, and an order recognition task. Significant main effects were found for both SOD group and navigation mode, with no interaction. G-SOD participants outperformed P-SOD participants in the route retracing task. The audio navigation mode led to better performance in tasks involving complex spatial decisions, such as turns at intersections and order recognition. Scene recognition accuracy did not differ significantly across SOD groups or navigation modes. These findings suggest that audio navigation may reduce visual distraction and support more effective spatial encoding, and that individual spatial abilities influence navigation performance independently of guidance type. They also highlight the importance of aligning navigation modalities with users’ cognitive profiles and support the development of adaptive navigation systems that accommodate individual differences in spatial ability. Full article
(This article belongs to the Section Cognition)
