Search Results (196)

Search Parameters:
Keywords = environmental occlusion

28 pages, 9378 KiB  
Article
A Semantic Segmentation-Based GNSS Signal Occlusion Detection and Optimization Method
by Zhe Yue, Chenchen Sun, Xuerong Zhang, Chengkai Tang, Yuting Gao and Kezhao Li
Remote Sens. 2025, 17(15), 2725; https://doi.org/10.3390/rs17152725 - 6 Aug 2025
Abstract
Existing research fails to effectively address the problem of increased GNSS positioning errors caused by non-line-of-sight (NLOS) and line-of-sight (LOS) signal attenuation due to obstructions such as buildings and trees in complex urban environments. To address this issue, we approach the problem from an environmental perception perspective and propose a semantic segmentation-based GNSS signal occlusion detection and optimization method. The approach distinguishes between building and tree occlusions and adjusts signal weights accordingly to enhance positioning accuracy. First, a fisheye camera captures environmental imagery above the vehicle, which is then processed using deep learning to segment sky, tree, and building regions. Subsequently, satellite projections are mapped onto the segmented sky image to classify signal occlusions. Then, based on the type of obstruction, a dynamic weight optimization model is constructed to adjust the contribution of each satellite in the positioning solution, thereby enhancing the positioning accuracy of vehicle navigation in urban environments. Finally, we construct a vehicle-mounted navigation system for experimentation. The experimental results demonstrate that the proposed method enhances accuracy by 16% and 10% compared to the existing GNSS/INS/Canny and GNSS/INS/Flood Fill methods, respectively, confirming its effectiveness in complex urban environments.
(This article belongs to the Special Issue GNSS and Multi-Sensor Integrated Precise Positioning and Applications)
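
As a loose illustration of the satellite-projection step described above, the sketch below maps a satellite's azimuth/elevation onto a segmented fisheye sky image and assigns a measurement weight by occlusion class. The equidistant projection model, class labels, and weight values are illustrative assumptions, not the paper's calibrated parameters.

```python
import numpy as np

# Class labels assumed for the segmented fisheye image.
SKY, TREE, BUILDING = 0, 1, 2

# Example occlusion weights (illustrative values, not the paper's):
# clear-sky LOS signals keep full weight; foliage-attenuated signals are
# down-weighted; building-blocked (likely NLOS) signals are nearly excluded.
OCCLUSION_WEIGHT = {SKY: 1.0, TREE: 0.4, BUILDING: 0.05}

def project_satellite(az_deg, el_deg, img_size):
    """Map satellite azimuth/elevation to fisheye pixel coordinates,
    assuming an upward-facing camera with an equidistant projection."""
    cx = cy = img_size / 2.0
    r_max = img_size / 2.0                      # image radius = horizon
    r = r_max * (90.0 - el_deg) / 90.0          # zenith maps to the centre
    az = np.radians(az_deg)
    u = int(cx + r * np.sin(az))                # east = +x
    v = int(cy - r * np.cos(az))                # north = -y
    return u, v

def satellite_weight(seg, az_deg, el_deg):
    """Look up the segmentation class at the satellite's projected
    position and return the corresponding measurement weight."""
    u, v = project_satellite(az_deg, el_deg, seg.shape[0])
    u = np.clip(u, 0, seg.shape[1] - 1)
    v = np.clip(v, 0, seg.shape[0] - 1)
    return OCCLUSION_WEIGHT[int(seg[v, u])]

# Toy segmented image: sky everywhere, a "building" band in the east.
seg = np.zeros((512, 512), dtype=np.uint8)
seg[:, 384:] = BUILDING
print(satellite_weight(seg, az_deg=0.0, el_deg=45.0))   # northern sky -> 1.0
print(satellite_weight(seg, az_deg=90.0, el_deg=10.0))  # low east -> 0.05
```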

27 pages, 4680 KiB  
Article
Gecko-Inspired Robots for Underground Cable Inspection: Improved YOLOv8 for Automated Defect Detection
by Dehai Guan and Barmak Honarvar Shakibaei Asli
Electronics 2025, 14(15), 3142; https://doi.org/10.3390/electronics14153142 - 6 Aug 2025
Abstract
To enable intelligent inspection of underground cable systems, this study presents a gecko-inspired quadruped robot that integrates multi-degree-of-freedom motion with a deep learning-based visual detection system. Inspired by the gecko’s flexible spine and leg structure, the robot exhibits strong adaptability to confined and uneven tunnel environments. The motion system is modeled using the standard Denavit–Hartenberg (D–H) method, with both forward and inverse kinematics derived analytically. A zero-impact foot trajectory is employed to achieve stable gait planning. For defect detection, the robot incorporates a binocular vision module and an enhanced YOLOv8 framework. The key improvements include a lightweight feature fusion structure (SlimNeck), a multidimensional coordinate attention (MCA) mechanism, and a refined MPDIoU loss function, which collectively improve the detection accuracy of subtle defects such as insulation aging, micro-cracks, and surface contamination. A variety of data augmentation techniques—such as brightness adjustment, Gaussian noise, and occlusion simulation—are applied to enhance robustness under complex lighting and environmental conditions. The experimental results validate the effectiveness of the proposed system in both kinematic control and vision-based defect recognition. This work demonstrates the potential of integrating bio-inspired mechanical design with intelligent visual perception to support practical, efficient cable inspection in confined underground environments.
(This article belongs to the Special Issue Robotics: From Technologies to Applications)
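
The kinematic modeling the abstract mentions follows the standard Denavit–Hartenberg convention, which chains one homogeneous transform per joint. A minimal sketch, with placeholder link geometry rather than the robot's actual parameters:

```python
import numpy as np

def dh_transform(theta, d, a, alpha):
    """Standard Denavit-Hartenberg homogeneous transform for one joint."""
    ct, st = np.cos(theta), np.sin(theta)
    ca, sa = np.cos(alpha), np.sin(alpha)
    return np.array([
        [ct, -st * ca,  st * sa, a * ct],
        [st,  ct * ca, -ct * sa, a * st],
        [0.0,      sa,       ca,      d],
        [0.0,     0.0,      0.0,    1.0],
    ])

def forward_kinematics(joint_angles, dh_params):
    """Chain the per-joint transforms; returns the foot position in the
    leg's base frame."""
    T = np.eye(4)
    for theta, (d, a, alpha) in zip(joint_angles, dh_params):
        T = T @ dh_transform(theta, d, a, alpha)
    return T[:3, 3]

# Illustrative 3-DOF leg: hip yaw, hip pitch, knee pitch.
# (d, a, alpha) per link: placeholder geometry, not the robot's real values.
leg_dh = [(0.0, 0.04, np.pi / 2), (0.0, 0.10, 0.0), (0.0, 0.10, 0.0)]
foot = forward_kinematics([0.0, -0.6, 1.2], leg_dh)
print(foot)  # foot position (x, y, z) in metres
```

Inverse kinematics, derived analytically in the paper, would invert this chain for a desired foot position.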
18 pages, 11340 KiB  
Article
CLSANet: Cognitive Learning-Based Self-Adaptive Feature Fusion for Multimodal Visual Object Detection
by Han Peng, Qionglin Liu, Riqing Ruan, Shuaiqi Yuan and Qin Li
Electronics 2025, 14(15), 3082; https://doi.org/10.3390/electronics14153082 - 1 Aug 2025
Abstract
Multimodal object detection leverages the complementary characteristics of visible (RGB) and infrared (IR) imagery, making it well-suited for challenging scenarios such as low illumination, occlusion, and complex backgrounds. However, most existing fusion-based methods rely on static or heuristic strategies, limiting their adaptability to dynamic environments. To address this limitation, we propose CLSANet, a cognitive learning-based self-adaptive network that enhances detection performance by dynamically selecting and integrating modality-specific features. CLSANet consists of three key modules: (1) a Dominant Modality Identification Module that selects the most informative modality based on global scene analysis; (2) a Modality Enhancement Module that disentangles and strengthens shared and modality-specific representations; and (3) a Self-Adaptive Fusion Module that adjusts fusion weights spatially according to local scene complexity. Compared to existing methods, CLSANet achieves state-of-the-art detection performance with significantly fewer parameters and lower computational cost. Ablation studies further demonstrate the individual effectiveness of each module under different environmental conditions, particularly in low-light and occluded scenes. CLSANet offers a compact, interpretable, and practical solution for multimodal object detection in resource-constrained settings.
(This article belongs to the Special Issue Digital Intelligence Technology and Applications)
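
The idea of spatially adaptive fusion weights can be sketched as a per-pixel softmax gate over stacked RGB/IR features. The linear gate below stands in for CLSANet's learned modules and is purely illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def adaptive_fusion(rgb_feat, ir_feat, w_gate):
    """Fuse RGB and IR feature maps with per-pixel weights predicted from
    the concatenated features (a linear gate stands in for the learned
    module here)."""
    # rgb_feat, ir_feat: (H, W, C); w_gate: (2C, 2) linear gate weights.
    stacked = np.concatenate([rgb_feat, ir_feat], axis=-1)   # (H, W, 2C)
    logits = stacked @ w_gate                                # (H, W, 2)
    weights = softmax(logits, axis=-1)                       # per-pixel gate
    return weights[..., :1] * rgb_feat + weights[..., 1:] * ir_feat

rng = np.random.default_rng(0)
rgb = rng.normal(size=(8, 8, 16))
ir = rng.normal(size=(8, 8, 16))
gate = rng.normal(scale=0.1, size=(32, 2))
print(adaptive_fusion(rgb, ir, gate).shape)  # (8, 8, 16)
```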

21 pages, 1681 KiB  
Article
Cross-Modal Complementarity Learning for Fish Feeding Intensity Recognition via Audio–Visual Fusion
by Jian Li, Yanan Wei, Wenkai Ma and Tan Wang
Animals 2025, 15(15), 2245; https://doi.org/10.3390/ani15152245 - 31 Jul 2025
Abstract
Accurate evaluation of fish feeding intensity is crucial for optimizing aquaculture efficiency and ensuring the healthy growth of fish. Previous methods mainly rely on single-modal approaches (e.g., audio or visual). However, the complex underwater environment poses significant challenges for single-modal monitoring: visual systems are severely affected by water turbidity, lighting conditions, and fish occlusion, while acoustic systems suffer from background noise. Although existing studies have attempted to combine acoustic and visual information, most adopt simple feature-level fusion strategies, which fail to fully explore the complementary advantages of the two modalities under different environmental conditions and lack dynamic evaluation mechanisms for modal reliability. To address these problems, we propose the Adaptive Cross-modal Attention Fusion Network (ACAF-Net), a cross-modal complementarity learning framework with a two-stage attention fusion mechanism: (1) a cross-modal enhancement stage that enriches individual representations through Low-rank Bilinear Pooling and learnable fusion weights; (2) an adaptive attention fusion stage that dynamically weights acoustic and visual features based on complementarity and environmental reliability. Our framework incorporates dimension alignment strategies and attention mechanisms to capture the temporal–spatial complementarity between acoustic feeding signals and visual behavioral patterns. Extensive experiments demonstrate superior performance compared to single-modal and conventional fusion approaches, with a 6.4% accuracy improvement. The results validate the effectiveness of exploiting cross-modal complementarity for underwater behavioral analysis and establish a foundation for intelligent aquaculture monitoring systems.
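
A rough sketch of the two-stage idea: low-rank bilinear pooling to cross-enhance each modality, then a reliability-based blend. The feature-energy weighting below is a stand-in assumption for the paper's learned attention:

```python
import numpy as np

def low_rank_bilinear(x, y, U, V, P):
    """Low-rank bilinear pooling: project both modalities to a shared
    rank-r space, combine multiplicatively, then project back."""
    return P @ ((U @ x) * (V @ y))

def fuse(audio, visual, params, alpha=None):
    """Stage 1: enrich each modality with the other via low-rank bilinear
    pooling. Stage 2: blend with a reliability weight alpha in [0, 1]
    (a feature-energy heuristic stands in for the learned attention)."""
    U, V, P = params
    a_enh = audio + low_rank_bilinear(audio, visual, U, V, P)
    v_enh = visual + low_rank_bilinear(visual, audio, U, V, P)
    if alpha is None:
        ea, ev = np.linalg.norm(a_enh), np.linalg.norm(v_enh)
        alpha = ea / (ea + ev)        # crude per-modality reliability proxy
    return alpha * a_enh + (1.0 - alpha) * v_enh

rng = np.random.default_rng(1)
d, r = 64, 8
params = (rng.normal(scale=0.1, size=(r, d)),
          rng.normal(scale=0.1, size=(r, d)),
          rng.normal(scale=0.1, size=(d, r)))
print(fuse(rng.normal(size=d), rng.normal(size=d), params).shape)  # (64,)
```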

20 pages, 16450 KiB  
Article
A Smart Textile-Based Tactile Sensing System for Multi-Channel Sign Language Recognition
by Keran Chen, Longnan Li, Qinyao Peng, Mengyuan He, Liyun Ma, Xinxin Li and Zhenyu Lu
Sensors 2025, 25(15), 4602; https://doi.org/10.3390/s25154602 - 25 Jul 2025
Abstract
Sign language recognition plays a crucial role in enabling communication for deaf individuals, yet current methods face limitations such as sensitivity to lighting conditions, occlusions, and a lack of adaptability in diverse environments. This study presents a wearable multi-channel tactile sensing system based on smart textiles, designed to capture subtle wrist and finger motions for static sign language recognition. The system leverages triboelectric yarns sewn into gloves and sleeves to construct a skin-conformal tactile sensor array capable of detecting biomechanical interactions through contact and deformation. Unlike vision-based approaches, the proposed sensor platform operates independently of environmental lighting or occlusions, offering reliable performance in diverse conditions. Experimental validation on American Sign Language letter gestures demonstrates that the system achieves high signal clarity after customized filtering, leading to a classification accuracy of 94.66%. The results show effective recognition of complex gestures, highlighting the system’s potential for broader applications in human-computer interaction.
(This article belongs to the Special Issue Advanced Tactile Sensors: Design and Applications)
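
The "customized filtering" step is not specified in the abstract; a generic band-pass filter over the triboelectric channels, sketched below with illustrative cut-offs, shows the kind of preprocessing involved:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def clean_channels(signals, fs, low=1.0, high=20.0, order=4):
    """Band-pass filter each tactile channel to suppress baseline drift
    and high-frequency noise (cut-offs are illustrative, not the paper's
    customized filter)."""
    sos = butter(order, [low, high], btype="band", fs=fs, output="sos")
    return np.stack([sosfiltfilt(sos, ch) for ch in signals])

# Toy data: 12 triboelectric channels, 2 s at 500 Hz, drift + noise.
fs = 500
t = np.linspace(0, 2, 1000)
rng = np.random.default_rng(2)
raw = np.stack([np.sin(2 * np.pi * 3 * t)          # gesture-band signal
                + 0.5 * t                           # baseline drift
                + 0.2 * rng.normal(size=t.size)     # sensor noise
                for _ in range(12)])
print(clean_channels(raw, fs).shape)  # (12, 1000)
```

The filtered channel array would then feed the gesture classifier.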

28 pages, 8982 KiB  
Article
Decision-Level Multi-Sensor Fusion to Improve Limitations of Single-Camera-Based CNN Classification in Precision Farming: Application in Weed Detection
by Md. Nazmuzzaman Khan, Adibuzzaman Rahi, Mohammad Al Hasan and Sohel Anwar
Computation 2025, 13(7), 174; https://doi.org/10.3390/computation13070174 - 18 Jul 2025
Abstract
The United States leads the world in corn production and consumption, with an estimated value of USD 50 billion per year. There is a pressing need for novel and efficient techniques that enhance the identification and eradication of weeds in a manner that is both environmentally sustainable and economically advantageous. Weed classification for autonomous agricultural robots is a challenging task for a single-camera-based system due to noise, vibration, and occlusion. To address this issue, in this paper we present a multi-camera system with decision-level sensor fusion to overcome the limitations of a single-camera-based system. This study utilizes a convolutional neural network (CNN) pre-trained on the ImageNet dataset, which was subsequently re-trained on a limited weed dataset to classify three distinct weed species: Xanthium strumarium (Common Cocklebur), Amaranthus retroflexus (Redroot Pigweed), and Ambrosia trifida (Giant Ragweed). These weed species are frequently encountered within corn fields. The test results showed that the re-trained VGG16 with a transfer-learning-based classifier exhibited acceptable accuracy (99% training, 97% validation, 94% testing) and an inference time suitable for real-time classification from a video feed. However, the accuracy of CNN-based classification from a single camera's video feed was found to deteriorate due to noise, vibration, and partial occlusion of weeds, and is not always sufficient for the spray system of an agricultural robot (AgBot). To improve classification accuracy and overcome the shortcomings of single-sensor CNN classification, an improved Dempster–Shafer (DS)-based decision-level multi-sensor fusion algorithm was developed and implemented. The proposed algorithm improves on CNN-based weed classification when the weed is partially occluded. It can also detect a faulty sensor within an array of sensors and improves overall classification accuracy by penalizing the evidence from the faulty sensor. Overall, the proposed fusion algorithm showed robust results in challenging scenarios, overcoming the limitations of a single-sensor-based system.
(This article belongs to the Special Issue Moving Object Detection Using Computational Methods and Modeling)
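
Dempster's rule of combination, the core of the DS fusion the abstract describes, combines per-camera mass functions and renormalizes away conflicting evidence. A minimal sketch with masses over singleton weed classes plus the ignorance set (class names and mass values are illustrative):

```python
def dempster_combine(m1, m2, classes):
    """Dempster's rule of combination for masses over singleton classes
    plus the universal set 'THETA' (ignorance)."""
    combined = {c: 0.0 for c in classes + ["THETA"]}
    conflict = 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            if a == b:
                combined[a] += ma * mb          # agreeing evidence
            elif a == "THETA":
                combined[b] += ma * mb          # ignorance defers to b
            elif b == "THETA":
                combined[a] += ma * mb          # ignorance defers to a
            else:
                conflict += ma * mb             # incompatible singletons
    k = 1.0 - conflict                          # normalization constant
    return {c: v / k for c, v in combined.items()}

classes = ["cocklebur", "pigweed", "ragweed"]
# Camera 1 is confident; camera 2 is noisier (mass on THETA = ignorance).
cam1 = {"cocklebur": 0.7, "pigweed": 0.2, "THETA": 0.1}
cam2 = {"cocklebur": 0.4, "ragweed": 0.3, "THETA": 0.3}
print(dempster_combine(cam1, cam2, classes))
# cocklebur mass rises to ~0.82: agreement outweighs the conflict.
```

The paper's faulty-sensor handling would further discount (penalize) the masses contributed by an inconsistent camera before combination.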

22 pages, 5363 KiB  
Article
Accurate Extraction of Rural Residential Buildings in Alpine Mountainous Areas by Combining Shadow Processing with FF-SwinT
by Guize Luan, Jinxuan Luo, Zuyu Gao and Fei Zhao
Remote Sens. 2025, 17(14), 2463; https://doi.org/10.3390/rs17142463 - 16 Jul 2025
Abstract
Precise extraction of rural settlements in alpine regions is critical for geographic data production, rural development, and spatial optimization. However, existing deep learning models are hindered by insufficient datasets and suboptimal algorithm structures, resulting in blurred boundaries and inadequate extraction accuracy. Therefore, this study uses high-resolution unmanned aerial vehicle (UAV) remote sensing images to construct a specialized dataset for the extraction of rural settlements in alpine mountainous areas, while introducing an innovative shadow mitigation technique that integrates multiple spectral characteristics. This methodology effectively addresses the challenges posed by intense shadows in settlements and the environmental occlusions common in mountainous terrain analysis. Based on comparative experiments with existing deep learning models, the Swin Transformer was selected as the baseline. Building upon this, the Feature Fusion Swin Transformer (FF-SwinT) model was constructed by optimizing the data processing, loss function, and multi-view feature fusion. Finally, we rigorously evaluated it through ablation studies, generalization tests, and large-scale image application experiments. The results show that FF-SwinT improves on the traditional Swin Transformer across multiple indicators, and its recognition results have clear edges and strong integrity. These results suggest that FF-SwinT establishes a novel framework for rural settlement extraction in alpine mountain regions, which is of great significance for regional spatial optimization and development policy formulation.
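
The paper's multi-spectral shadow mitigation is not detailed in the abstract; the sketch below uses one common heuristic, a saturation-value ratio in HSV space, to flag and brighten shadowed pixels, purely as an illustration of the general approach:

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv

def shadow_mask(rgb, thresh=0.2):
    """Flag shadowed pixels with a simple spectral ratio: shadows tend to
    keep saturation but lose value (brightness). The (S - V)/(S + V)
    index and threshold are one common heuristic, not the paper's exact
    multi-feature method."""
    hsv = rgb_to_hsv(rgb.astype(np.float64))
    s, v = hsv[..., 1], hsv[..., 2]
    index = (s - v) / (s + v + 1e-6)
    return index > thresh

def deshadow(rgb, mask, gain=1.8):
    """Crude mitigation: brighten the flagged pixels (illustrative only)."""
    out = rgb.astype(np.float64)
    out[mask] = np.clip(out[mask] * gain, 0.0, 1.0)
    return out

rng = np.random.default_rng(3)
img = rng.uniform(0.4, 0.9, size=(64, 64, 3))
img[:32] *= 0.25                        # darken the top half ("shadow")
m = shadow_mask(img)
print(m[:32].mean(), m[32:].mean())     # shadow region flags far more pixels
restored = deshadow(img, m)
```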

21 pages, 2469 KiB  
Article
Robust Low-Overlap Point Cloud Registration via Displacement-Corrected Geometric Consistency for Enhanced 3D Sensing
by Xin Wang and Qingguang Li
Sensors 2025, 25(14), 4332; https://doi.org/10.3390/s25144332 - 11 Jul 2025
Abstract
Accurate alignment of 3D point clouds acquired by ubiquitous sensors such as LiDAR and depth cameras is critical for enhancing perception capabilities in robotics, autonomous navigation, and environmental reconstruction. However, low-overlap scenarios—common due to limited sensor field-of-view or occlusions—severely degrade registration robustness and sensing reliability. To address this challenge, this paper proposes a novel geometric consistency optimization and rectification deep learning network named GeoCORNet. By synergistically designing a geometric consistency enhancement module, a bidirectional cross-attention mechanism, a predictive displacement rectification strategy, and joint optimization of overlap loss with displacement loss, GeoCORNet significantly improves registration accuracy and robustness in complex scenarios. The Attentive Cross-Consistency module of GeoCORNet integrates distance and angular consistency constraints with bidirectional cross-attention to suppress noise from non-overlapping regions while reinforcing geometric coherence in overlapping areas. The predictive displacement rectification strategy dynamically rectifies erroneous correspondences through predicted 3D displacements instead of discarding them, maximizing the utility of sparse sensor data. Furthermore, a novel displacement loss function was developed to effectively constrain the geometric distribution of corrected point-pairs. Experimental results demonstrate that our method outperformed existing approaches in terms of registration recall, rotation error, and robustness under low-overlap conditions. These advances establish a new paradigm for robust 3D sensing in real-world applications where partial sensor data is prevalent.
(This article belongs to the Section Sensing and Imaging)
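
The displacement-rectification idea, correcting doubtful correspondences by a predicted 3D offset instead of discarding them, can be sketched as follows; here oracle displacements and a plain residual-norm loss stand in for the network's predictions and the paper's exact loss form:

```python
import numpy as np

def rectify_correspondences(src, tgt, pred_disp):
    """Instead of discarding doubtful matches, shift each target point by
    a predicted 3D displacement so the pair becomes geometrically
    consistent (displacements here are given, not predicted by a net)."""
    return src, tgt + pred_disp

def displacement_loss(src, tgt_corrected, R, t):
    """Penalize the residual between transformed source points and the
    corrected targets, one plausible form of a displacement loss."""
    residual = (src @ R.T + t) - tgt_corrected
    return np.mean(np.linalg.norm(residual, axis=1))

rng = np.random.default_rng(4)
R, t = np.eye(3), np.array([0.5, 0.0, 0.0])      # ground-truth motion
src = rng.normal(size=(100, 3))
noise = rng.normal(scale=0.2, size=(100, 3))     # corrupted correspondences
tgt = src @ R.T + t + noise
src_r, tgt_r = rectify_correspondences(src, tgt, -noise)  # oracle offsets
print(displacement_loss(src, tgt, R, t))    # noisy pairs: large loss
print(displacement_loss(src, tgt_r, R, t))  # rectified pairs: ~0
```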

29 pages, 16466 KiB  
Article
DMF-YOLO: Dynamic Multi-Scale Feature Fusion Network-Driven Small Target Detection in UAV Aerial Images
by Xiaojia Yan, Shiyan Sun, Huimin Zhu, Qingping Hu, Wenjian Ying and Yinglei Li
Remote Sens. 2025, 17(14), 2385; https://doi.org/10.3390/rs17142385 - 10 Jul 2025
Abstract
Target detection in UAV aerial images has found increasingly widespread applications in emergency rescue, maritime monitoring, and environmental surveillance. However, traditional detection models suffer significant performance degradation due to challenges including substantial scale variations, high proportions of small targets, and dense occlusions in UAV-captured images. To address these issues, this paper proposes DMF-YOLO, a high-precision detection network based on YOLOv10 improvements. First, we design Dynamic Dilated Snake Convolution (DDSConv) to adaptively adjust the receptive field and dilation rate of convolution kernels, enhancing local feature extraction for small targets with weak textures. Second, we construct a Multi-scale Feature Aggregation Module (MFAM) that integrates dual-branch spatial attention mechanisms to achieve efficient cross-layer feature fusion, mitigating information conflicts between shallow details and deep semantics. Finally, we propose an Expanded Window-based Bounding Box Regression Loss Function (EW-BBRLF), which optimizes localization accuracy through dynamic auxiliary bounding boxes, effectively reducing missed detections of small targets. Experiments on the VisDrone2019 and HIT-UAV datasets demonstrate that DMF-YOLOv10 achieves 50.1% and 81.4% mAP50, respectively, significantly outperforming the baseline YOLOv10s by 27.1% and 2.6%, with parameter increases limited to 24.4% and 11.9%. The method exhibits superior robustness in dense scenarios, complex backgrounds, and long-range target detection. This approach provides an efficient solution for UAV real-time perception tasks and offers novel insights for multi-scale object detection algorithm design.
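
The EW-BBRLF itself is not specified in the abstract; the sketch below illustrates the auxiliary-box idea with a plain IoU loss averaged over concentrically enlarged boxes, so that small, barely overlapping targets still produce a non-saturated signal. The scales and loss form are assumptions:

```python
import numpy as np

def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    lt = np.maximum(a[:2], b[:2])
    rb = np.minimum(a[2:], b[2:])
    wh = np.clip(rb - lt, 0.0, None)
    inter = wh[0] * wh[1]
    area = lambda x: (x[2] - x[0]) * (x[3] - x[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def expand(box, scale):
    """Auxiliary box: same centre, side lengths scaled by `scale`."""
    cx, cy = (box[0] + box[2]) / 2, (box[1] + box[3]) / 2
    w, h = (box[2] - box[0]) * scale, (box[3] - box[1]) * scale
    return np.array([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2])

def expanded_window_loss(pred, target, scales=(1.0, 2.0, 3.0)):
    """Average (1 - IoU) over the original boxes plus enlarged auxiliary
    boxes: small targets that barely overlap still yield a useful signal
    through the expanded windows."""
    return float(np.mean([1.0 - iou(expand(pred, s), expand(target, s))
                          for s in scales]))

pred = np.array([10.0, 10.0, 14.0, 14.0])     # small predicted box
target = np.array([15.0, 10.0, 19.0, 14.0])   # disjoint ground truth
print(expanded_window_loss(pred, target))     # ~0.79, still informative
print(1.0 - iou(pred, target))                # plain IoU loss saturates at 1
```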

23 pages, 15159 KiB  
Article
TBFH: A Total-Building-Focused Hybrid Dataset for Remote Sensing Image Building Detection
by Lin Yi, Feng Wang, Guangyao Zhou, Niangang Jiao, Minglin He, Jingxing Zhu and Hongjian You
Remote Sens. 2025, 17(13), 2316; https://doi.org/10.3390/rs17132316 - 6 Jul 2025
Abstract
Building extraction plays a crucial role in a variety of applications, including urban planning, high-precision 3D reconstruction, and environmental monitoring. In particular, the accurate detection of tall buildings is essential for reliable modeling and analysis. However, most existing building-detection methods are primarily trained on datasets dominated by low-rise structures, resulting in degraded performance when applied to complex urban scenes with high-rise buildings and severe occlusions. To address this limitation, we propose TBFH (Total-Building-Focused Hybrid), a novel dataset specifically designed for building detection in remote sensing imagery. TBFH comprises a diverse collection of tall buildings across various urban environments and is integrated with the publicly available WHU Building dataset to enable joint training. This hybrid strategy aims to enhance model robustness and generalization across varying urban morphologies. We also propose the KTC metric to quantitatively evaluate the structural integrity and shape fidelity of building segmentation results. We evaluated the effectiveness of TBFH on multiple state-of-the-art models, including UNet, UNetFormer, ABCNet, BANet, FCN, DeepLabV3, MANet, SegFormer, and DynamicVis. Our comparative experiments conducted on the Tall Building dataset, the WHU dataset, and TBFH demonstrated that models trained with TBFH significantly outperformed those trained on individual datasets, showing notable improvements in IoU, F1, and KTC scores as well as in the accuracy of building shape delineation. These findings underscore the critical importance of incorporating tall-building-focused data to improve both detection accuracy and generalization performance.

22 pages, 6123 KiB  
Article
Real-Time Proprioceptive Sensing Enhanced Switching Model Predictive Control for Quadruped Robot Under Uncertain Environment
by Sanket Lokhande, Yajie Bao, Peng Cheng, Dan Shen, Genshe Chen and Hao Xu
Electronics 2025, 14(13), 2681; https://doi.org/10.3390/electronics14132681 - 2 Jul 2025
Abstract
Quadruped robots have shown significant potential in disaster relief applications, where they have to navigate complex terrains for search and rescue or reconnaissance operations. However, their deployment is hindered by limited adaptability in highly uncertain environments, especially when relying solely on vision-based sensors like cameras or LiDAR, which are susceptible to occlusions, poor lighting, and environmental interference. To address these limitations, this paper proposes a novel sensor-enhanced hierarchical switching model predictive control (MPC) framework that integrates proprioceptive sensing with a bi-level hybrid dynamic model. Unlike existing methods that either rely on handcrafted controllers or deep learning-based control pipelines, our approach introduces three core innovations: (1) a situation-aware, bi-level hybrid dynamic modeling strategy that hierarchically combines single-body rigid dynamics with distributed multi-body dynamics for modeling agility and scalability; (2) a three-layer hybrid control framework, including a terrain-aware switching MPC layer, a distributed torque controller, and a fast PD control loop for enhanced robustness during contact transitions; and (3) a multi-IMU-based proprioceptive feedback mechanism for terrain classification and adaptive gait control under sensor-occluded or GPS-denied environments. Together, these components form a unified and computationally efficient control scheme that addresses practical challenges such as limited onboard processing, unstructured terrain, and environmental uncertainty. A series of experimental results demonstrate that the proposed method outperforms existing vision- and learning-based controllers in terms of stability, adaptability, and control efficiency during high-speed locomotion over irregular terrain.
(This article belongs to the Special Issue Smart Robotics and Autonomous Systems)
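
The top switching layer can be sketched as terrain classification from multi-IMU statistics driving a choice of MPC configuration; the variance thresholds and mode parameters below are illustrative, not the paper's:

```python
import numpy as np

def classify_terrain(imu_accel, flat_thresh=0.5, rough_thresh=2.0):
    """Classify terrain from the variance of vertical acceleration across
    a window of multi-IMU samples (thresholds are illustrative)."""
    roughness = float(np.var(imu_accel[..., 2]))
    if roughness < flat_thresh:
        return "flat"
    return "rough" if roughness < rough_thresh else "extreme"

# Per-terrain controller settings: gait frequency (Hz) and MPC horizon.
CONTROLLER_MODES = {
    "flat":    {"gait_hz": 2.5, "horizon": 10},
    "rough":   {"gait_hz": 1.5, "horizon": 20},
    "extreme": {"gait_hz": 0.8, "horizon": 30},
}

def switching_controller(imu_window):
    """Top layer of a switching scheme: pick the MPC configuration that
    matches the sensed terrain; the torque and PD layers would run below."""
    terrain = classify_terrain(imu_window)
    return terrain, CONTROLLER_MODES[terrain]

rng = np.random.default_rng(5)
calm = rng.normal(scale=0.3, size=(4, 100, 3))    # 4 IMUs, 100 samples, xyz
bumpy = rng.normal(scale=1.2, size=(4, 100, 3))
print(switching_controller(calm))    # ('flat', ...)
print(switching_controller(bumpy))   # ('rough', ...)
```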

25 pages, 2723 KiB  
Article
A Human-Centric, Uncertainty-Aware Event-Fused AI Network for Robust Face Recognition in Adverse Conditions
by Akmalbek Abdusalomov, Sabina Umirzakova, Elbek Boymatov, Dilnoza Zaripova, Shukhrat Kamalov, Zavqiddin Temirov, Wonjun Jeong, Hyoungsun Choi and Taeg Keun Whangbo
Appl. Sci. 2025, 15(13), 7381; https://doi.org/10.3390/app15137381 - 30 Jun 2025
Abstract
Face recognition systems often falter when deployed in uncontrolled settings, grappling with low light, unexpected occlusions, motion blur, and the degradation of sensor signals. Most contemporary algorithms chase raw accuracy yet overlook the pragmatic need for uncertainty estimation and multispectral reasoning rolled into a single framework. This study introduces HUE-Net—a Human-centric, Uncertainty-aware, Event-fused Network—designed specifically to thrive under severe environmental stress. HUE-Net marries the visible RGB band with near-infrared (NIR) imagery and high-temporal-event data through an early-fusion pipeline, proven more responsive than serial approaches. A custom hybrid backbone that couples convolutional networks with transformers keeps the model nimble enough for edge devices. Central to the architecture is the perturbed multi-branch variational module, which distills probabilistic identity embeddings while delivering calibrated confidence scores. Complementing this, an Adaptive Spectral Attention mechanism dynamically reweights each stream to amplify the most reliable facial features in real time. Unlike previous efforts that compartmentalize uncertainty handling, spectral blending, or computational thrift, HUE-Net unites all three in a lightweight package. Benchmarks on the IJB-C and N-SpectralFace datasets illustrate that the system not only secures state-of-the-art accuracy but also exhibits unmatched spectral robustness and reliable probability calibration. The results indicate that HUE-Net is well-positioned for forensic missions and humanitarian scenarios where trustworthy identification cannot be deferred.
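
The Adaptive Spectral Attention idea, reweighting the RGB, NIR, and event streams by per-stream reliability scores, can be sketched with a tiny linear scoring head and a softmax; the scoring head below is an assumption standing in for the learned mechanism:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def spectral_attention(streams, score_w):
    """Reweight the RGB, NIR, and event feature vectors: each stream gets
    a scalar score from a small linear head, and a softmax over the
    scores decides how much each spectrum contributes to the fused
    embedding."""
    feats = np.stack(streams)                  # (3, D)
    scores = feats @ score_w                   # (3,) per-stream scores
    weights = softmax(scores)                  # reliability distribution
    return weights, (weights[:, None] * feats).sum(axis=0)

rng = np.random.default_rng(6)
D = 128
rgb, nir, event = (rng.normal(size=D) for _ in range(3))
weights, fused = spectral_attention([rgb, nir, event],
                                    rng.normal(scale=0.1, size=D))
print(weights, fused.shape)   # three weights summing to 1, (128,)
```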

21 pages, 15478 KiB  
Review
Small Object Detection in Traffic Scenes for Mobile Robots: Challenges, Strategies, and Future Directions
by Zhe Wei, Yurong Zou, Haibo Xu and Sen Wang
Electronics 2025, 14(13), 2614; https://doi.org/10.3390/electronics14132614 - 28 Jun 2025
Abstract
Small object detection in traffic scenes presents unique challenges for mobile robots operating under constrained computational resources and highly dynamic environments. Unlike general object detection, small targets often suffer from low resolution, weak semantic cues, and frequent occlusion, especially in complex outdoor scenarios. This study systematically analyses the challenges, technical advances, and deployment strategies for small object detection tailored to mobile robotic platforms. We categorise existing approaches into three main strategies: feature enhancement (e.g., multi-scale fusion, attention mechanisms), network architecture optimisation (e.g., lightweight backbones, anchor-free heads), and data-driven techniques (e.g., augmentation, simulation, transfer learning). Furthermore, we examine deployment techniques on embedded devices such as Jetson Nano and Raspberry Pi, and we highlight multi-modal sensor fusion using Light Detection and Ranging (LiDAR), cameras, and Inertial Measurement Units (IMUs) for enhanced environmental perception. A comparative study of public datasets and evaluation metrics is provided to identify current limitations in real-world benchmarking. Finally, we discuss future directions, including robust detection under extreme conditions and human-in-the-loop incremental learning frameworks. This research aims to offer a comprehensive technical reference for researchers and practitioners developing small object detection systems for real-world robotic applications.
(This article belongs to the Special Issue New Trends in Computer Vision and Image Processing)

26 pages, 3494 KiB  
Article
A Hyper-Attentive Multimodal Transformer for Real-Time and Robust Facial Expression Recognition
by Zarnigor Tagmatova, Sabina Umirzakova, Alpamis Kutlimuratov, Akmalbek Abdusalomov and Young Im Cho
Appl. Sci. 2025, 15(13), 7100; https://doi.org/10.3390/app15137100 - 24 Jun 2025
Abstract
Facial expression recognition (FER) plays a critical role in affective computing, enabling machines to interpret human emotions through facial cues. While recent deep learning models have achieved progress, many still fail under real-world conditions such as occlusion, lighting variation, and subtle expressions. In this work, we propose FERONet, a novel hyper-attentive multimodal transformer architecture tailored for robust and real-time FER. FERONet integrates a triple-attention mechanism (spatial, channel, and cross-patch), a hierarchical transformer with token merging for computational efficiency, and a temporal cross-attention decoder to model emotional dynamics in video sequences. The model fuses RGB, optical flow, and depth/landmark inputs, enhancing resilience to environmental variation. Experimental evaluations across five standard FER datasets—FER-2013, RAF-DB, CK+, BU-3DFE, and AFEW—show that FERONet achieves superior recognition accuracy (up to 97.3%) and real-time inference speeds (<16 ms per frame), outperforming prior state-of-the-art models. The results confirm the model’s suitability for deployment in applications such as intelligent tutoring, driver monitoring, and clinical emotion assessment.
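
Token merging, which FERONet uses for efficiency, shortens the token sequence a transformer must process; the simplest variant, averaging adjacent token pairs as sketched below, conveys the idea (learned similarity-based matching is a refinement of the same mechanism):

```python
import numpy as np

def merge_tokens(tokens):
    """Halve the token sequence by averaging adjacent pairs, the simplest
    form of token merging; each merged token summarizes two originals, so
    later attention layers cost roughly half as much."""
    n, d = tokens.shape
    if n % 2:                        # keep an odd trailing token as-is
        tail, tokens = tokens[-1:], tokens[:-1]
    else:
        tail = np.empty((0, d))
    merged = tokens.reshape(-1, 2, d).mean(axis=1)
    return np.concatenate([merged, tail])

tokens = np.arange(12, dtype=float).reshape(6, 2)   # 6 tokens, dim 2
print(merge_tokens(tokens).shape)                   # (3, 2)
```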

17 pages, 5935 KiB  
Technical Note
Merging Various Types of Remote Sensing Data and Social Participation GIS with AI to Map the Objects Affected by Light Occlusion
by Yen-Chun Lin, Teng-To Yu, Yu-En Yang, Jo-Chi Lin, Guang-Wen Lien and Shyh-Chin Lan
Remote Sens. 2025, 17(13), 2131; https://doi.org/10.3390/rs17132131 - 21 Jun 2025
Abstract
This study proposes a practical integration of an existing deep learning model (YOLOv9-E) and social participation GIS, using multi-source remote sensing data to identify asbestos-containing materials on the sides of buildings affected by light occlusion. These objects are often undetectable by traditional vertical or oblique photogrammetry, yet their precise localization is essential for effective removal planning. By leveraging the mobility and responsiveness of citizen investigators, we conducted fine-grained surveys in community spaces that were often inaccessible using conventional methods. The YOLOv9-E model demonstrated robustness on mobile-captured images, enriched with geolocation and orientation metadata, which improved the association between detections and specific buildings. By comparing results from Google Street View and field-based social imagery, we highlight the complementary strengths of both sources. Rather than introducing new algorithms, this study focuses on an applied integration framework to improve detection coverage, spatial precision, and participatory monitoring for environmental risk management. The dataset comprised 20,889 images, with 98% used for training and validation and 2% reserved for independent testing. The YOLOv9-E model achieved an mAP50 of 0.81 and an F1-score of 0.85 on the test set.
