Search Results (1,526)

Search Parameters:
Keywords = efficient spatial attention

26 pages, 7650 KB  
Article
ACD-DETR: Adaptive Cross-Scale Detection Transformer for Small Object Detection in UAV Imagery
by Yang Tong, Hui Ye, Jishen Yang and Xiulong Yang
Sensors 2025, 25(17), 5556; https://doi.org/10.3390/s25175556 - 5 Sep 2025
Abstract
Small object detection in UAV imagery remains challenging due to complex aerial perspectives and the presence of dense, small targets with blurred boundaries. To address these challenges, we propose ACD-DETR, an adaptive end-to-end Transformer detector tailored for UAV-based small object detection. The framework introduces three core modules: the Multi-Scale Edge-Enhanced Feature Fusion Module (MSEFM) to preserve fine-grained details; the Omni-Grained Boundary Calibrator (OG-BC) for boundary-aware semantic fusion; and the Dynamic Position Bias Attention-based Intra-scale Feature Interaction (DPB-AIFI) to enhance spatial reasoning. Furthermore, we introduce ACD-DETR-SBA+, a fusion-enhanced variant that removes OG-BC and DPB-AIFI while deploying densely connected Semantic–Boundary Aggregation (SBA) modules to intensify boundary–semantic fusion. This design sacrifices computational efficiency in exchange for higher detection precision, making it suitable for resource-rich deployment scenarios. On the VisDrone2019 dataset, ACD-DETR achieves 50.9% mAP@0.5, outperforming the RT-DETR-R18 baseline by 3.6 percentage points, while reducing parameters by 18.5%. ACD-DETR-SBA+ further improves accuracy to 52.0% mAP@0.5, demonstrating the benefit of SBA-based fusion. Extensive experiments on the VisDrone2019 and DOTA datasets demonstrate that ACD-DETR achieves a state-of-the-art trade-off between accuracy and efficiency, while ACD-DETR-SBA+ achieves further performance improvements at higher computational cost. Ablation studies and visual analyses validate the effectiveness of the proposed modules and design strategies.
(This article belongs to the Section Remote Sensors)
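
For readers unfamiliar with dynamic position bias, the core idea behind a DPB-style attention block can be sketched in a few lines of PyTorch: a small MLP maps relative token offsets to per-head biases added to the attention logits. This is a hypothetical minimal re-implementation under assumed dimensions, not the authors' module:

```python
import torch
import torch.nn as nn

class DynamicPositionBiasAttention(nn.Module):
    """Self-attention with a dynamic position bias (DPB): a small MLP maps
    relative (dy, dx) offsets to per-head biases added to the attention
    logits. Hypothetical re-implementation sketch, not the paper's code."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # MLP producing one bias per head from a 2-D relative offset.
        self.dpb = nn.Sequential(
            nn.Linear(2, 64), nn.ReLU(inplace=True), nn.Linear(64, num_heads))

    def forward(self, x: torch.Tensor, h: int, w: int) -> torch.Tensor:
        # x: (B, N, C) with N == h * w tokens from one feature scale.
        b, n, c = x.shape
        qkv = self.qkv(x).reshape(b, n, 3, self.num_heads, c // self.num_heads)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)           # each (B, heads, N, d)

        ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
        coords = torch.stack([ys, xs], -1).reshape(-1, 2).float()  # (N, 2)
        rel = coords[:, None, :] - coords[None, :, :]              # (N, N, 2)
        bias = self.dpb(rel).permute(2, 0, 1)                      # (heads, N, N)

        attn = (q @ k.transpose(-2, -1)) * self.scale + bias
        out = (attn.softmax(-1) @ v).transpose(1, 2).reshape(b, n, c)
        return self.proj(out)

# Smoke test: 16x16 feature map, 256 channels.
attn = DynamicPositionBiasAttention(256)
print(attn(torch.randn(2, 16 * 16, 256), h=16, w=16).shape)
# torch.Size([2, 256, 256])
```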

19 pages, 11406 KB  
Article
A Pool Drowning Detection Model Based on Improved YOLO
by Wenhui Zhang, Lu Chen and Jianchun Shi
Sensors 2025, 25(17), 5552; https://doi.org/10.3390/s25175552 - 5 Sep 2025
Abstract
Drowning constitutes the leading cause of injury-related fatalities among adolescents. In swimming pool environments, traditional manual surveillance exhibits limitations, while existing technologies suffer from poor adaptability of wearable devices. Vision models based on YOLO still face challenges in edge deployment efficiency, robustness in complex water conditions, and multi-scale object detection. To address these issues, we propose YOLO11-LiB, a drowning object detection model based on YOLO11n, featuring three key enhancements. First, we design the Lightweight Feature Extraction Module (LGCBlock), which integrates the Lightweight Attention Encoding Block (LAE) and effectively combines Ghost Convolution (GhostConv) with dynamic convolution (DynamicConv). This optimizes the downsampling structure and the C3k2 module in the YOLO11n backbone network, significantly reducing model parameters and computational complexity. Second, we introduce the Cross-Channel Position-aware Spatial Attention Inverted Residual with Spatial–Channel Separate Attention module (C2PSAiSCSA) into the backbone. This module embeds the Spatial–Channel Separate Attention (SCSA) mechanism within the Inverted Residual Mobile Block (iRMB) framework, enabling more comprehensive and efficient feature extraction. Finally, we redesign the neck structure as the Bidirectional Feature Fusion Network (BiFF-Net), which integrates the Bidirectional Feature Pyramid Network (BiFPN) and Frequency-Aware Feature Fusion (FreqFusion). The enhanced YOLO11-LiB model was validated against mainstream algorithms through comparative experiments, and ablation studies were conducted. Experimental results demonstrate that YOLO11-LiB achieves a drowning class mean average precision (DmAP50) of 94.1%, with merely 2.02 M parameters and a model size of 4.25 MB. This represents an effective balance between accuracy and efficiency, providing a high-performance solution for real-time drowning detection in swimming pool scenarios.
(This article belongs to the Section Intelligent Sensors)
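
GhostConv, one of the building blocks the LGCBlock combines, is a well-known lightweight substitute for a standard convolution: a primary convolution produces half the output channels, and a cheap depthwise convolution "ghosts" the other half. A minimal PyTorch sketch of the generic idea (channel split and kernel sizes are assumptions, not the paper's exact configuration):

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost convolution: a primary conv produces half the output channels,
    and a cheap depthwise conv generates the other half, roughly halving the
    cost of a standard conv. Generic sketch, not the paper's exact block."""

    def __init__(self, c_in: int, c_out: int, k: int = 1, s: int = 1):
        super().__init__()
        c_hidden = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, s, k // 2, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU(inplace=True))
        self.cheap = nn.Sequential(  # depthwise 5x5: one filter per channel
            nn.Conv2d(c_hidden, c_hidden, 5, 1, 2, groups=c_hidden, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU(inplace=True))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

x = torch.randn(1, 64, 80, 80)
print(GhostConv(64, 128, k=3)(x).shape)  # torch.Size([1, 128, 80, 80])
```
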
18 pages, 2778 KB  
Article
YOLO-MARS for Infrared Target Detection: Towards near Space
by Bohan Liu, Yeteng Han, Pengxi Liu, Sha Luo, Jie Li, Tao Zhang and Wennan Cui
Sensors 2025, 25(17), 5538; https://doi.org/10.3390/s25175538 - 5 Sep 2025
Abstract
In response to problems such as large target scale variations, strong background noise, and blurred features caused by low contrast in infrared target detection in near space environments, this paper proposes an efficient detection model, YOLO-MARS, based on YOLOv8. The model introduces a Space-to-Depth (SPD) convolution module into the backbone, which preserves the detailed features of smaller targets through downsampling without information loss, alleviating the feature loss caused by traditional downsampling. The Grouped Multi-Head Self-Attention (GMHSA) module is added after the backbone's SPPF module to improve cross-scale global modeling of target-area feature responses while suppressing interference from complex thermal-noise backgrounds. In addition, a Light Adaptive Spatial Feature Fusion (LASFF) detection head is designed to mitigate the scale sensitivity of infrared targets (especially smaller ones) in the feature pyramid. It uses a shared weighting mechanism to achieve adaptive fusion of multi-scale features, reducing computational complexity while improving target localization and classification accuracy. To address the extreme scarcity of near space data, we integrated 284 near space images with the HIT-UAV dataset through physical equivalence analysis (atmospheric transmittance, contrast, and signal-to-noise ratio) to construct the NS-HIT dataset. The experimental results show that, compared to YOLOv8, YOLO-MARS increases mAP@0.5 by 5.4% while the number of parameters grows by only 10%. YOLO-MARS significantly improves detection accuracy while respecting model-complexity constraints, providing an efficient and reliable solution for near space infrared target detection.
(This article belongs to the Section Sensing and Imaging)
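
The Space-to-Depth operation the model relies on can be illustrated compactly: each 2x2 spatial block is folded into the channel axis, so downsampling discards no pixels. A generic sketch (the fusing convolution's shape is an assumption):

```python
import torch
import torch.nn as nn

class SPDConv(nn.Module):
    """Space-to-Depth downsampling: rearrange each 2x2 spatial block into the
    channel axis (lossless, unlike strided conv or pooling), then fuse with a
    stride-1 conv. Illustrative sketch of the SPD idea, not the paper's code."""

    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.fuse = nn.Conv2d(4 * c_in, c_out, kernel_size=3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Stack the four 2x2 sub-grids along channels: (B,C,H,W) -> (B,4C,H/2,W/2)
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                       x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.fuse(x)

x = torch.randn(1, 32, 64, 64)
print(SPDConv(32, 64)(x).shape)  # torch.Size([1, 64, 32, 32])
```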

24 pages, 32270 KB  
Article
Spectral Channel Mixing Transformer with Spectral-Center Attention for Hyperspectral Image Classification
by Zhenming Sun, Hui Liu, Ning Chen, Haina Yang, Jia Li, Chang Liu and Xiaoping Pei
Remote Sens. 2025, 17(17), 3100; https://doi.org/10.3390/rs17173100 - 5 Sep 2025
Abstract
In recent years, research on hyperspectral image (HSI) classification has focused on the innovative integration of deep learning and Transformer architectures to enhance classification performance through multi-scale feature extraction, attention mechanism optimization, and spectral–spatial collaborative modeling. However, the Transformer's high computational complexity and large parameter count create a scaling bottleneck on long-sequence tasks, requiring joint optimization of algorithm and hardware. To better handle this issue, our paper proposes a method that integrates RWKV linear attention with the Transformer through a novel TC-Former framework, combining TimeMixFormer and HyperMixFormer architectures. Specifically, TimeMixFormer reduces computational complexity through time-decay weights and a gating design, significantly improving the processing efficiency of long sequences. HyperMixFormer employs a gated WKV mechanism and dynamic channel weighting, combined with Mish activation and time-shift operations, to reduce computational overhead while achieving efficient cross-channel interaction, significantly enhancing the discriminative representation of spectral features. The pivotal characteristic of the proposed method lies in its innovative integration of linear attention mechanisms, which enhance HSI classification accuracy while achieving lower computational complexity. Evaluation experiments on three public hyperspectral datasets confirm that this framework outperforms previous state-of-the-art algorithms in classification accuracy.
(This article belongs to the Section Remote Sensing Image Processing)
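
The gated WKV mechanism referenced above comes from the RWKV family; its key property is a per-channel exponential time decay that replaces quadratic attention with an O(T) recurrence. A deliberately naive sketch of the standard WKV recurrence (not the TC-Former implementation):

```python
import torch

def wkv_linear_attention(k, v, w, u):
    """Minimal RWKV-style WKV recurrence: a per-channel exponential time decay
    w replaces quadratic attention, so cost is O(T) in sequence length T.
    Naive loop for clarity; real kernels vectorize this. Conceptual sketch only.

    k, v: (B, T, C) keys/values; w: (C,) decay >= 0; u: (C,) bonus for the
    current step.
    """
    b, t, c = k.shape
    num = torch.zeros(b, c)   # running weighted sum of values
    den = torch.zeros(b, c)   # running normalizer
    out = []
    for i in range(t):
        ki = k[:, i].exp()
        # Current token gets an extra 'bonus' weight exp(u); the past state
        # contributes with decay applied at the update below.
        cur = (u + k[:, i]).exp()
        out.append((num + cur * v[:, i]) / (den + cur))
        num = (-w).exp() * num + ki * v[:, i]
        den = (-w).exp() * den + ki
    return torch.stack(out, dim=1)  # (B, T, C)

k, v = torch.randn(2, 128, 64), torch.randn(2, 128, 64)
w, u = torch.rand(64), torch.zeros(64)
print(wkv_linear_attention(k, v, w, u).shape)  # torch.Size([2, 128, 64])
```
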
30 pages, 15053 KB  
Article
Comparative Analysis of Spatial Distribution and Mechanism Differences Between Public Electric Vehicle Charging Stations and Traditional Gas Stations: A Case Study from Wenzhou, China
by Jingmin Pan, Aoyang Li, Bo Tang, Fei Wang, Chao Chen, Wangyu Wu and Bingcai Wei
Sustainability 2025, 17(17), 8009; https://doi.org/10.3390/su17178009 - 5 Sep 2025
Abstract
With the impact of fossil energy on the climate and the development of energy technologies, new energy vehicles (NEVs), represented by electric cars, have received increasing attention. The rapid proliferation of public charging infrastructure for NEVs has concurrently influenced traditional petrol station networks, creating measurable disparities in their spatial distributions that warrant systematic investigation. This research examines Wenzhou City, China, as a representative case, employing multi-source Point of Interest (POI) data and spatial analysis models to analyse differences in spatial layout, accessibility, service equity, and underlying driving mechanisms between public electric vehicle charging stations (EV) and traditional gas stations (GS). The findings reveal that public electric vehicle charging stations exhibit a pronounced “single-centre concentration with weak multi-centre linkage” spatial configuration, heavily reliant on the dual-core drivers of population density and economic activity. This results in marked, cliff-like declines in service accessibility in peripheral areas and a relatively low spatial equity index. In contrast, traditional gas stations demonstrate a “core-axis linkage” diffusion pattern with strong coupling to urban road networks, showing gradient attenuation in service coverage efficiency along transportation arteries, fewer suburban service gaps, and more gradual accessibility reductions. Location entropy analysis further indicates that charging station deployment shows significant capital-oriented tendencies, with certain areas exhibiting a paradoxical “excess facilities” phenomenon, while gas station distribution aligns more closely with road network topology and transportation demand dynamics. Furthermore, public charging stations occupy a more complex and diverse range of land use types, while traditional gas stations depend strongly on industrial land. This research elucidates the spatial distribution patterns of emerging and legacy energy infrastructure in the study region, providing empirical evidence for optimising energy infrastructure allocation and facilitating coordinated transportation system transitions. The findings also offer practical insights for the construction of energy supply facilities within urban development frameworks, with substantial reference value for sustainable urban spatial governance.
(This article belongs to the Special Issue Sustainable and Resilient Regional Development: A Spatial Perspective)
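
As a rough illustration of what a location-entropy-style analysis computes, the sketch below uses a simple location quotient on toy counts; the paper's exact formulation and data are not reproduced here:

```python
import pandas as pd

def location_quotient(df: pd.DataFrame) -> pd.Series:
    """Location quotient per district: the district's share of charging
    stations divided by its share of a reference total (here, gas stations).
    LQ > 1 flags relative over-supply of chargers. Toy data for illustration;
    the paper's entropy formulation may differ."""
    ev_share = df["ev_stations"] / df["ev_stations"].sum()
    gs_share = df["gas_stations"] / df["gas_stations"].sum()
    return ev_share / gs_share

districts = pd.DataFrame(
    {"ev_stations": [120, 30, 10], "gas_stations": [60, 40, 25]},
    index=["urban core", "suburb", "periphery"])
print(location_quotient(districts).round(2))
# urban core    1.56  <- "excess facilities" signal in the core
# suburb        0.59
# periphery     0.31
```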

18 pages, 1641 KB  
Article
PigStressNet: A Real-Time Lightweight Vision System for On-Farm Heat Stress Monitoring via Attention-Guided Feature Refinement
by Shuai Cao, Fang Li, Xiaonan Luo, Jiacheng Ni and Linsong Li
Sensors 2025, 25(17), 5534; https://doi.org/10.3390/s25175534 - 5 Sep 2025
Abstract
Heat stress severely impacts pig welfare and farm productivity. However, existing methods lack the capability to detect subtle physiological cues (e.g., skin erythema) in complex farm environments while maintaining real-time efficiency. This paper proposes PigStressNet, a novel lightweight detector designed for accurate and efficient heat stress recognition. Our approach integrates four key innovations: (1) a Normalization-based Attention Module (NAM) integrated into the backbone network enhances sensitivity to localized features critical for heat stress, such as posture and skin erythema; (2) a Rectangular Self-Calibration Module (RCM) in the neck network improves spatial feature reconstruction, particularly for occluded pigs; (3) an MBConv-optimized detection head (MBHead) reduces computational cost in the head by 72.3%; (4) the MPDIoU loss function enhances bounding box regression accuracy in scenarios with overlapping pigs. We constructed the first fine-grained dataset specifically annotated for pig heat stress (comprising 710 images across 5 classes: standing, eating, sitting, lying, and stress), uniquely fusing posture (lying) and physiological traits (skin erythema). Experiments demonstrate state-of-the-art performance: PigStressNet achieves 0.979 mAP for heat stress detection while requiring 15.9% lower computation (5.3 GFLOPs) and 11.7% fewer parameters compared to the baseline YOLOv12-n model. The system achieves real-time inference on embedded devices, offering a viable solution for intelligent livestock management.
(This article belongs to the Section Sensing and Imaging)
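
The NAM idea is compact enough to sketch: the absolute BatchNorm scale factors act as channel-importance weights, so attention costs almost no extra parameters. A minimal channel-branch sketch (not the authors' code):

```python
import torch
import torch.nn as nn

class NAMChannelAttention(nn.Module):
    """Normalization-based Attention Module (channel branch): reuse the
    BatchNorm scale factors gamma as channel-importance weights. Sketch of
    the NAM idea, not the paper's exact module."""

    def __init__(self, channels: int):
        super().__init__()
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        residual = x
        x = self.bn(x)
        # Normalize |gamma| so channel weights sum to 1, then gate the features.
        weight = self.bn.weight.abs() / self.bn.weight.abs().sum()
        x = x * weight.view(1, -1, 1, 1)
        return torch.sigmoid(x) * residual

x = torch.randn(2, 64, 32, 32)
print(NAMChannelAttention(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```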

21 pages, 4483 KB  
Article
A Lightweight Instance Segmentation Model for Simultaneous Detection of Citrus Fruit Ripeness and Red Scale (Aonidiella aurantii) Pest Damage
by İlker Ünal and Osman Eceoğlu
Appl. Sci. 2025, 15(17), 9742; https://doi.org/10.3390/app15179742 - 4 Sep 2025
Abstract
Early detection of pest damage and accurate assessment of fruit ripeness are essential for improving the quality, productivity, and sustainability of citrus production. Moreover, precisely assessing ripeness is crucial for establishing the optimal harvest time, preserving fruit quality, and enhancing yield. The simultaneous and precise early detection of pest damage and assessment of fruit ripeness greatly enhance the efficacy of contemporary agricultural decision support systems. This study presents a lightweight deep learning model based on an optimized YOLO12n-Seg architecture for the simultaneous detection of ripeness stages (unripe and fully ripe) and pest damage caused by Red Scale (Aonidiella aurantii). The model is based on an improved version of YOLO12n-Seg, in which the backbone and head layers were retained but the neck was modified with a GhostConv block to reduce parameter count and improve computational efficiency. Additionally, a Global Attention Mechanism (GAM) was incorporated to strengthen the model's focus on target-relevant features and reduce background noise. These modifications improved both the model's ability to gather accurate spatial information across multiple dimensions and its focus on specific target object areas via the attention mechanism. Experimental results demonstrated high accuracy on test data, with mAP@0.5 = 0.980, mAP@0.95 = 0.960, precision = 0.961, and recall = 0.943, all achieved with only 2.7 million parameters and a training time of 2 h and 42 min. The model offers a reliable and efficient solution for real-time, integrated pest detection and fruit classification in precision agriculture.
(This article belongs to the Section Agricultural Science and Technology)
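
A Global Attention Mechanism of the kind described applies a channel sub-module and then a spatial sub-module over the full tensor. A minimal PyTorch sketch under an assumed reduction ratio:

```python
import torch
import torch.nn as nn

class GAM(nn.Module):
    """Global Attention Mechanism sketch: a channel sub-module (MLP applied
    per position across channels) followed by a spatial sub-module (7x7
    convs). Illustrative re-implementation; the paper's variant may differ."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = channels // reduction
        self.channel_mlp = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels))
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, hidden, 7, padding=3),
            nn.BatchNorm2d(hidden), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 7, padding=3),
            nn.BatchNorm2d(channels))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Channel attention: run each position's channel vector through the MLP.
        att = self.channel_mlp(x.permute(0, 2, 3, 1)).permute(0, 3, 1, 2)
        x = x * torch.sigmoid(att)
        # Spatial attention: 7x7 convs produce a per-pixel, per-channel gate.
        return x * torch.sigmoid(self.spatial(x))

x = torch.randn(1, 64, 40, 40)
print(GAM(64)(x).shape)  # torch.Size([1, 64, 40, 40])
```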

25 pages, 6014 KB  
Article
Enhancing Instance Segmentation in Agriculture: An Optimized YOLOv8 Solution
by Qiaolong Wang, Dongshun Chen, Wenfei Feng, Liang Sun and Gaohong Yu
Sensors 2025, 25(17), 5506; https://doi.org/10.3390/s25175506 - 4 Sep 2025
Abstract
To address the limitations of traditional segmentation algorithms in processing complex agricultural scenes, this paper proposes an improved YOLOv8n-seg model. Building upon the original three detection layers, we introduce a dedicated layer for small object detection, which significantly enhances the detection accuracy of small targets (e.g., people) after processing images through fourfold downsampling. In the neck network, we replace the C2f module with our proposed C2f_CPCA module, which incorporates a channel prior attention mechanism (CPCA). This mechanism dynamically adjusts attention weights across channels and spatial dimensions to effectively capture relationships between different spatial scales, thereby improving feature extraction and recognition capabilities while maintaining low computational complexity. Finally, we propose a C3RFEM module based on the RFEM architecture and integrate it into the main network. This module combines dilated convolutions and weighted layers to enhance feature extraction capabilities across different receptive field ranges. Experimental results demonstrated that the improved model achieved 1.4% and 4.0% increases in precision and recall rates on private datasets, respectively, with mAP@0.5 and mAP@0.5:0.95 metrics improved by 3.0% and 3.5%, respectively. In comparative evaluations with instance segmentation algorithms such as the YOLOv5 series, YOLOv7, YOLOv8n, YOLOv9t, YOLOv10n, YOLOv10s, Mask R-CNN, and Mask2Former, our model achieved an optimal balance between computational efficiency and detection performance. This demonstrates its potential for the research and development of small intelligent precision operation technology and equipment.
(This article belongs to the Section Smart Agriculture)
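
The channel-prior idea in CPCA-style attention can be sketched as channel gating first (the "prior"), followed by cheap depthwise strip convolutions that build the spatial map. An illustrative approximation, not the paper's C2f_CPCA module:

```python
import torch
import torch.nn as nn

class CPCA(nn.Module):
    """Channel-prior attention sketch: channel weights come first, then
    depthwise strip convolutions (1xk and kx1) build a cheap spatial map
    capturing relations across scales. Illustrative only."""

    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        hidden = channels // reduction
        self.channel = nn.Sequential(           # squeeze-and-excite style prior
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, hidden, 1), nn.ReLU(inplace=True),
            nn.Conv2d(hidden, channels, 1), nn.Sigmoid())
        self.strip_h = nn.Conv2d(channels, channels, (1, 7),
                                 padding=(0, 3), groups=channels)
        self.strip_v = nn.Conv2d(channels, channels, (7, 1),
                                 padding=(3, 0), groups=channels)
        self.mix = nn.Conv2d(channels, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x * self.channel(x)                       # channel prior
        spatial = self.mix(self.strip_v(self.strip_h(x)))
        return x * torch.sigmoid(spatial)             # spatial refinement

x = torch.randn(1, 128, 40, 40)
print(CPCA(128)(x).shape)  # torch.Size([1, 128, 40, 40])
```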

21 pages, 8753 KB  
Article
PowerStrand-YOLO: A High-Voltage Transmission Conductor Defect Detection Method for UAV Aerial Imagery
by Zhenrong Deng, Jun Li, Junjie Huang, Shuaizheng Jiang, Qiuying Wu and Rui Yang
Mathematics 2025, 13(17), 2859; https://doi.org/10.3390/math13172859 - 4 Sep 2025
Abstract
Broken or loose strands in high-voltage transmission conductors constitute critical defects that jeopardize grid reliability. Unmanned aerial vehicle (UAV) inspection has become indispensable for their timely discovery; however, conventional detectors falter in the face of cluttered backgrounds and the conductors’ diminutive pixel footprint, yielding sub-optimal accuracy and throughput. To overcome these limitations, we present PowerStrand-YOLO—an enhanced YOLOv8 derivative tailored for UAV imagery. The method is trained on a purpose-built dataset and integrates three technical contributions. (1) A C2f_DCNv4 module is introduced to strengthen multi-scale feature extraction. (2) An EMA attention mechanism is embedded to suppress background interference and emphasize defect-relevant cues. (3) The original loss function is superseded by Shape-IoU, compelling the network to attend closely to the geometric contours and spatial layout of strand anomalies. Extensive experiments demonstrate 95.4% precision, 96.2% recall, and 250 FPS. Relative to the baseline YOLOv8, PowerStrand-YOLO improves precision by 3% and recall by 6.8% while accelerating inference. Moreover, it also demonstrates competitive performance on the VisDrone2019 dataset. These results establish the improved framework as a more accurate and efficient solution for UAV-based inspection of power transmission lines.
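
Shape-IoU augments the IoU objective with distance and shape terms weighted by the target box's own geometry. A simplified sketch of such a loss (constants and the scale term are assumptions):

```python
import torch

def shape_iou_loss(pred, target, scale: float = 1.0):
    """Shape-IoU-style loss sketch: IoU plus a center-distance term and a
    shape term, both weighted by the *target* box's own aspect, so the ground
    truth's geometry steers the penalty. Boxes are (x1, y1, x2, y2).
    Simplified for illustration; treat as an approximation."""
    # Intersection / union.
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # Target-shape weights: wider targets weight x-offsets more, etc.
    wt = target[:, 2] - target[:, 0]
    ht = target[:, 3] - target[:, 1]
    ww = 2 * wt.pow(scale) / (wt.pow(scale) + ht.pow(scale))
    hh = 2 * ht.pow(scale) / (wt.pow(scale) + ht.pow(scale))

    # Normalized center distance inside the smallest enclosing box.
    c_lt = torch.min(pred[:, :2], target[:, :2])
    c_rb = torch.max(pred[:, 2:], target[:, 2:])
    diag = (c_rb - c_lt).pow(2).sum(dim=1) + 1e-7
    dc = (pred[:, :2] + pred[:, 2:]) / 2 - (target[:, :2] + target[:, 2:]) / 2
    dist = (hh * dc[:, 0].pow(2) + ww * dc[:, 1].pow(2)) / diag

    # Shape mismatch between predicted and target width/height.
    wp = pred[:, 2] - pred[:, 0]
    hp = pred[:, 3] - pred[:, 1]
    ow = hh * (wp - wt).abs() / torch.max(wp, wt)
    oh = ww * (hp - ht).abs() / torch.max(hp, ht)
    shape = (1 - (-ow).exp()).pow(4) + (1 - (-oh).exp()).pow(4)

    return (1 - iou + dist + 0.5 * shape).mean()

pred = torch.tensor([[10., 10., 50., 40.]])
target = torch.tensor([[12., 12., 48., 42.]])
print(shape_iou_loss(pred, target))  # small positive scalar
```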

15 pages, 1292 KB  
Article
Lightweight Semantic Segmentation for AGV Navigation: An Enhanced ESPNet-C with Dual Attention Mechanisms
by Jianqi Shu, Xiang Yan, Wen Liu, Haifeng Gong, Jingtai Zhu and Mengdie Yang
Electronics 2025, 14(17), 3524; https://doi.org/10.3390/electronics14173524 - 3 Sep 2025
Abstract
Efficient navigation of Automated Guided Vehicles (AGVs) in dynamic warehouse environments requires real-time and accurate path segmentation algorithms. However, traditional semantic segmentation models suffer from excessive parameters and high computational costs, limiting their deployment on resource-constrained embedded platforms. A lightweight image segmentation algorithm is proposed, built on an improved ESPNet-C architecture, combining Spatial Group-wise Enhance (SGE) and Efficient Channel Attention (ECA) with a dual-branch upsampling decoder. On our custom warehouse dataset, the model attains 90.5% mIoU with 0.425 M parameters and runs at ~160 FPS, reducing parameter count by a factor of 116–136 and computational cost by 70–92% compared with DeepLabV3+. The proposed model improves boundary coherence by 22% under uneven lighting and achieves 90.2% mIoU on the public BDD100K benchmark, demonstrating strong generalization beyond warehouse data. These results highlight its suitability as a real-time visual perception module for AGV navigation in resource-constrained environments and offer practical guidance for designing lightweight semantic segmentation models for embedded applications.
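
Of the two attention blocks used here, ECA is the simplest to illustrate: a global average pool followed by a 1-D convolution across channels, adding only a handful of parameters, which fits the model's lightweight budget. A standard sketch:

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: global average pooling followed by a 1-D
    convolution across the channel descriptor, avoiding the dimensionality
    reduction (and extra parameters) of an SE-style MLP."""

    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # (B, C, H, W) -> (B, C) descriptor -> (B, 1, C) for the 1-D conv.
        y = x.mean(dim=(2, 3)).unsqueeze(1)
        y = torch.sigmoid(self.conv(y)).squeeze(1)    # (B, C) channel gates
        return x * y.view(x.size(0), -1, 1, 1)

x = torch.randn(2, 32, 56, 56)
print(ECA()(x).shape)  # torch.Size([2, 32, 56, 56])
```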

21 pages, 4900 KB  
Article
RingFormer-Seg: A Scalable and Context-Preserving Vision Transformer Framework for Semantic Segmentation of Ultra-High-Resolution Remote Sensing Imagery
by Zhan Zhang, Daoyu Shu, Guihe Gu, Wenkai Hu, Ru Wang, Xiaoling Chen and Bingnan Yang
Remote Sens. 2025, 17(17), 3064; https://doi.org/10.3390/rs17173064 - 3 Sep 2025
Abstract
Semantic segmentation of ultra-high-resolution remote sensing (UHR-RS) imagery plays a critical role in land use and land cover analysis, yet it remains computationally intensive due to the enormous input size and high spatial complexity. Existing studies have commonly employed strategies such as patch-wise processing, multi-scale model architectures, lightweight networks, and representation sparsification to reduce resource demands, but they have often struggled to maintain long-range contextual awareness and scalability for inputs of arbitrary size. To address this, we propose RingFormer-Seg, a scalable Vision Transformer framework that enables long-range context learning through multi-device parallelism in UHR-RS image segmentation. RingFormer-Seg decomposes the input into spatial subregions and processes them through a distributed three-stage pipeline. First, the Saliency-Aware Token Filter (STF) selects informative tokens to reduce redundancy. Next, the Efficient Local Context Module (ELCM) enhances intra-region features via memory-efficient attention. Finally, the Cross-Device Context Router (CDCR) exchanges token-level information across devices to capture global dependencies. Fine-grained detail is preserved through the residual integration of unselected tokens, and a hierarchical decoder generates high-resolution segmentation outputs. We conducted extensive experiments on three benchmarks covering UHR-RS images from 2048 × 2048 to 8192 × 8192 pixels. Results show that our framework achieves top segmentation accuracy while significantly improving computational efficiency across the DeepGlobe, Wuhan, and Guangdong datasets. RingFormer-Seg offers a versatile solution for UHR-RS image segmentation and demonstrates potential for practical deployment in nationwide land cover mapping, supporting informed decision-making in land resource management, environmental policy planning, and sustainable development.
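
The token-filtering stage can be approximated in a few lines: score tokens, keep the top-k for the expensive attention path, and retain indices so unselected tokens can be merged back residually. The norm-based scoring below is an assumption, not the authors' STF criterion:

```python
import torch

def saliency_token_filter(tokens, keep_ratio: float = 0.25):
    """Token-filtering sketch in the spirit of an STF stage: score each token
    (here simply by feature-norm saliency, an assumption), keep the top-k for
    the expensive attention path, and return indices so unselected tokens can
    be residually re-inserted later."""
    b, n, c = tokens.shape
    k = max(1, int(n * keep_ratio))
    scores = tokens.norm(dim=-1)                      # (B, N) saliency proxy
    top = scores.topk(k, dim=1).indices               # (B, k)
    idx = top.unsqueeze(-1).expand(-1, -1, c)
    return tokens.gather(1, idx), top                 # (B, k, C), indices

def reinsert(tokens, processed, top):
    """Write processed tokens back at their original positions (residual merge)."""
    idx = top.unsqueeze(-1).expand(-1, -1, tokens.size(-1))
    return tokens.clone().scatter(1, idx, processed)

x = torch.randn(2, 1024, 96)                          # tokens from one subregion
sel, top = saliency_token_filter(x)
print(sel.shape)                                      # torch.Size([2, 256, 96])
merged = reinsert(x, sel * 2.0, top)                  # stand-in for attention output
print(merged.shape)                                   # torch.Size([2, 1024, 96])
```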

25 pages, 4433 KB  
Article
Mathematical Analysis and Performance Evaluation of CBAM-DenseNet121 for Speech Emotion Recognition Using the CREMA-D Dataset
by Zineddine Sarhani Kahhoul, Nadjiba Terki, Ilyes Benaissa, Khaled Aldwoah, E. I. Hassan, Osman Osman and Djamel Eddine Boukhari
Appl. Sci. 2025, 15(17), 9692; https://doi.org/10.3390/app15179692 - 3 Sep 2025
Abstract
Emotion recognition from speech is essential for human–computer interaction (HCI) and affective computing, with applications in virtual assistants, healthcare, and education. Although deep learning has brought significant advances in Automatic Speech Emotion Recognition (ASER), the task remains challenging given variation across speakers, subtle emotional expressions, and environmental noise. Practical deployment in this context depends on a robust, fast, and scalable recognition system. This work introduces a new framework combining DenseNet121, fine-tuned for the crowd-sourced emotional multimodal actors dataset (CREMA-D), with the convolutional block attention module (CBAM). While DenseNet121's effective feature propagation captures rich, hierarchical patterns in the speech data, CBAM improves the model's focus on emotionally significant elements by applying both spatial and channel-wise attention. Furthermore, an advanced preprocessing pipeline, including log-Mel spectrogram transformation and normalization, enhances the input spectrograms and strengthens resistance to environmental noise. The proposed model demonstrates superior performance. To ensure a robust evaluation under class imbalance, we report an Unweighted Average Recall (UAR) of 71.01% and an F1 score of 71.25%, alongside a test accuracy of 71.26% and a precision of 71.30%. These results establish the model as a promising solution for real-world speech emotion detection, highlighting its strong generalization, computational efficiency, and focus on emotion-specific features compared to recent work. The improvements demonstrate practical flexibility, enabling the integration of established image recognition techniques and substantial adaptability across application contexts.
(This article belongs to the Section Computing and Artificial Intelligence)
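
CBAM itself is a standard, well-documented block: sequential channel attention (a shared MLP over average- and max-pooled descriptors) followed by spatial attention (a 7x7 convolution over pooled channel maps). A minimal PyTorch sketch of the canonical formulation; its placement on DenseNet121 features here is only indicative:

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Convolutional Block Attention Module: channel attention then spatial
    attention, as in the original CBAM paper."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        # Channel attention from both average- and max-pooled descriptors.
        ca = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) +
                           self.mlp(x.amax(dim=(2, 3))))
        x = x * ca.view(b, c, 1, 1)
        # Spatial attention from channel-wise average and max maps.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(dim=1, keepdim=True), x.amax(dim=1, keepdim=True)], dim=1)))
        return x * sa

# e.g. applied to a DenseNet121 feature map computed from a log-Mel input
x = torch.randn(2, 1024, 7, 7)
print(CBAM(1024)(x).shape)  # torch.Size([2, 1024, 7, 7])
```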

25 pages, 3060 KB  
Article
Asymmetric Object Recognition Process for Miners’ Safety Based on Improved YOLOv10 Technology
by Diana Novak, Yuriy Kozhubaev, Vyacheslav Potekhin, Haodong Cheng and Roman Ershov
Symmetry 2025, 17(9), 1435; https://doi.org/10.3390/sym17091435 - 3 Sep 2025
Abstract
Coal remains a vital energy resource and plays a key role in national development. Ensuring the safety of underground mining personnel is essential, and intelligent algorithms are increasingly used to detect miners in surveillance footage. However, complex underground environments—characterised by poor lighting, occlusions, irregular postures, and reflective gear—make accurate detection difficult. This study proposes improvements to the YOLOv10-N object detection model for miner detection. Using 37,463 annotated images from real mining environments, we propose three main enhancements: a Coordinate Attention (CA) mechanism to highlight important spatial features, a Dynamic Head (DyHead) module to improve multi-scale feature fusion, and the Efficient IoU (EIOU) loss function to enhance bounding box regression and speed up convergence. While CA, DyHead, and EIOU are established methods, their synergistic integration for asymmetric miner detection (e.g., occluded limbs, uneven lighting) presents a novel application-specific optimisation. Experimental results confirm that the enhanced model significantly outperforms the original. It achieves 92.69% accuracy, 87.53% recall, and an average accuracy of 89.9%, with a practical detection effect of 68.24%. These findings show that the proposed method improves both accuracy and robustness in challenging mining conditions while maintaining processing efficiency.
(This article belongs to the Special Issue Symmetry Applied in Computer Vision, Automation, and Robotics)
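
Coordinate Attention factorizes pooling along height and width separately, so the resulting gates retain positional information along each axis, which is what helps localize partially occluded targets. A standard sketch (reduction ratio assumed):

```python
import torch
import torch.nn as nn

class CoordinateAttention(nn.Module):
    """Coordinate Attention sketch: pool along H and W separately, encode the
    concatenated descriptors with a shared conv, then split into per-axis
    gates. Details may differ from the paper's configuration."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        hidden = max(8, channels // reduction)
        self.shared = nn.Sequential(
            nn.Conv2d(channels, hidden, 1), nn.BatchNorm2d(hidden),
            nn.ReLU(inplace=True))
        self.to_h = nn.Conv2d(hidden, channels, 1)
        self.to_w = nn.Conv2d(hidden, channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        pool_h = x.mean(dim=3, keepdim=True)                       # (B, C, H, 1)
        pool_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)   # (B, C, W, 1)
        y = self.shared(torch.cat([pool_h, pool_w], dim=2))        # (B, hid, H+W, 1)
        yh, yw = torch.split(y, [h, w], dim=2)
        gate_h = torch.sigmoid(self.to_h(yh))                      # (B, C, H, 1)
        gate_w = torch.sigmoid(self.to_w(yw)).permute(0, 1, 3, 2)  # (B, C, 1, W)
        return x * gate_h * gate_w

x = torch.randn(1, 64, 48, 48)
print(CoordinateAttention(64)(x).shape)  # torch.Size([1, 64, 48, 48])
```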

24 pages, 3866 KB  
Article
Improved Heterogeneous Spatiotemporal Graph Network Model for Traffic Flow Prediction at Highway Toll Stations
by Yaofang Zhang, Jian Chen, Fafu Chen and Jianjie Gao
Sustainability 2025, 17(17), 7905; https://doi.org/10.3390/su17177905 - 2 Sep 2025
Abstract
This study aims to guide highway management and services in a more efficient and intelligent direction, and provides data support for achieving sustainable development goals. Forecasting traffic flow at highway stations is the cornerstone of spatiotemporal analysis and is vital for effective highway management and control. Despite considerable advancements in data-driven traffic flow prediction, the majority of existing models fail to differentiate between directions. Specifically, entrance flow prediction supports dynamic route guidance, dissemination of real-time traffic conditions, and optimal entrance selection suggestions, while exit flow prediction is instrumental for congestion and accident alerts, as well as road network optimization decisions. In light of these needs, this study introduces an enhanced heterogeneous spatiotemporal graph network model tailored to predicting highway station traffic flow. To accurately capture the dynamic impact of upstream toll stations on the target station's flow, we devise an influence probability matrix. This matrix, in conjunction with the covariance matrix across toll stations, updated graph structure data, and integrated external weather conditions, allows the attention mechanism to assign varied combination weights to the target toll station from temporal, spatial, and external standpoints, thereby improving prediction accuracy. We conducted a case study using traffic flow data from the Chengdu-Chengyu station on the Sichuan Highway to gauge the efficacy of the proposed model. The experimental outcomes indicate that our model surpasses the baseline models on performance metrics. This study provides valuable insights for highway management and control and for reducing traffic congestion, and it highlights the value of data-driven approaches for reducing transportation-related carbon emissions, improving resource allocation at toll plazas, and promoting sustainable highway transportation systems.
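
How an influence probability matrix might bias spatial attention weights can be illustrated with toy NumPy code; the construction below (row-normalized origin-destination counts, log-bias folded into a softmax) is an assumption about the general mechanism, not the paper's model:

```python
import numpy as np

def influence_probability_matrix(trip_counts: np.ndarray) -> np.ndarray:
    """Row-normalize an origin-destination trip-count matrix into P[i, j] =
    probability that a vehicle entering at station i exits at station j — a
    simple stand-in for an influence matrix (construction assumed)."""
    row_sums = trip_counts.sum(axis=1, keepdims=True)
    return trip_counts / np.maximum(row_sums, 1)

def biased_spatial_attention(scores: np.ndarray, p: np.ndarray) -> np.ndarray:
    """Combine learned attention logits with log-influence so upstream stations
    that actually feed the target receive larger spatial weights."""
    logits = scores + np.log(p + 1e-6)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)           # row-stochastic weights

trips = np.array([[0, 120, 30], [80, 0, 60], [20, 50, 0]], dtype=float)
p = influence_probability_matrix(trips)
attn = biased_spatial_attention(np.zeros((3, 3)), p)
print(attn.round(2))  # rows sum to 1; mass follows the influence matrix
```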

16 pages, 2827 KB  
Article
A Dual-Modality CNN Approach for RSS-Based Indoor Positioning Using Spatial and Frequency Fingerprints
by Xiangchen Lai, Yunzhi Luo and Yong Jia
Sensors 2025, 25(17), 5408; https://doi.org/10.3390/s25175408 - 2 Sep 2025
Abstract
Indoor positioning systems based on received signal strength (RSS) achieve indoor positioning by leveraging the position-related features inherent in spatial RSS fingerprint images. Their positioning accuracy and robustness are directly influenced by the quality of fingerprint features. However, the inherent spatial low-resolution characteristic of spatial RSS fingerprint images makes it challenging to effectively extract subtle fingerprint features. To address this issue, this paper proposes an RSS-based indoor positioning method that combines enhanced spatial frequency fingerprint representation with fusion learning. First, bicubic interpolation is applied to improve image resolution and reveal finer spatial details. Then, a 2D fast Fourier transform (2D FFT) converts the enhanced spatial images into frequency domain representations to supplement spectral features. These spatial and frequency fingerprints are used as dual-modality inputs for a parallel convolutional neural network (CNN) model with efficient multi-scale attention (EMA) modules. The model extracts modality-specific features and fuses them to generate enriched representations. Each modality—spatial, frequency, and fused—is passed through a dedicated fully connected network to predict 3D coordinates. A coordinate optimization strategy is introduced to select the two most reliable outputs for each axis (x, y, z), and their average is used as the final estimate. Experiments on seven public datasets show that the proposed method significantly improves positioning accuracy, reducing the mean positioning error by up to 47.1% and root mean square error (RMSE) by up to 54.4% compared with traditional and advanced time–frequency methods.
(This article belongs to the Section Navigation and Positioning)
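
The dual-modality input construction described above (bicubic upsampling plus a 2-D FFT view) is straightforward to sketch; the grid size and scale factor below are assumptions:

```python
import torch
import torch.nn.functional as F

def make_dual_modality_fingerprint(rss_grid: torch.Tensor, scale: int = 4):
    """Build the two inputs the paper describes: (1) a bicubically upsampled
    spatial RSS fingerprint image, and (2) its 2-D FFT log-magnitude spectrum
    as a frequency-domain view. Grid size and scale factor are assumptions."""
    # rss_grid: (B, 1, H, W) raw low-resolution RSS fingerprint image.
    spatial = F.interpolate(rss_grid, scale_factor=scale,
                            mode="bicubic", align_corners=False)
    # 2-D FFT; keep the log-magnitude, centered with fftshift for CNN input.
    spec = torch.fft.fftshift(torch.fft.fft2(spatial), dim=(-2, -1))
    frequency = torch.log1p(spec.abs())
    return spatial, frequency   # two (B, 1, sH, sW) modality tensors

rss = torch.randn(8, 1, 8, 8)          # e.g. 8x8 grid of AP signal strengths
spatial, frequency = make_dual_modality_fingerprint(rss)
print(spatial.shape, frequency.shape)  # torch.Size([8, 1, 32, 32]) x 2
```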