Search Results (825)

Search Parameters:
Keywords = indoor imaging

22 pages, 1641 KB  
Article
PGRF: Physics-Guided Rectified Flow for Low-Light RAW Image Enhancement
by Juntai Zeng and Qingyun Yang
J. Imaging 2025, 11(11), 393; https://doi.org/10.3390/jimaging11110393 - 6 Nov 2025
Viewed by 380
Abstract
Enhancing RAW images acquired under low-light conditions remains a fundamental yet challenging problem in computational photography and image signal processing. Recent deep learning-based approaches have shifted from real paired datasets toward synthetic data generation, where sensor noise is typically simulated through physical modeling. However, most existing methods primarily account for additive noise, neglect multiplicative noise components, and rely on global calibration procedures that fail to capture pixel-level manufacturing variability. Consequently, these methods struggle to faithfully reproduce the complex statistics of real sensor noise. To overcome these limitations, this paper introduces a physically grounded composite noise model that jointly incorporates additive and multiplicative noise components. We further propose a per-pixel noise simulation and calibration strategy, which estimates and synthesizes noise individually for each pixel. This physics-based calibration not only circumvents the constraints of global noise modeling but also captures spatial noise variations arising from microscopic CMOS sensor fabrication differences. Inspired by the recent success of rectified-flow methods in image generation, we integrate our physics-based noise synthesis into a rectified-flow generative framework and present PGRF (Physics-Guided Rectified Flow): a physics-guided rectified-flow framework for low-light RAW image enhancement. PGRF leverages the expressive capacity of rectified flows to model complex data distributions, while physical guidance constrains the generation process toward the desired clean image manifold. To evaluate our method, we constructed the LLID, a dedicated indoor low-light RAW benchmark captured using the Sony A7S II camera. Extensive experiments demonstrate that the proposed framework achieves substantial improvements over state-of-the-art methods in low-light RAW image enhancement.
(This article belongs to the Section Image and Video Processing)
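For readers less familiar with composite sensor-noise models, a minimal numpy sketch of the additive-plus-multiplicative synthesis the abstract describes might look as follows. The per-pixel gain map stands in for PRNU-style fabrication variability; all parameter values are illustrative assumptions, not the authors' calibrated ones.

```python
import numpy as np

def synthesize_noisy_raw(clean_electrons, prnu_sigma=0.01, read_sigma=2.0, seed=0):
    """Toy additive + multiplicative noise synthesis (illustrative values).

    clean_electrons : 2D array, clean signal in electrons.
    prnu_sigma      : std of the per-pixel multiplicative gain map (PRNU).
    read_sigma      : std of the additive Gaussian read noise, in electrons.
    """
    rng = np.random.default_rng(seed)
    # Per-pixel gain map: models pixel-to-pixel fabrication variability,
    # i.e. the multiplicative component many pipelines omit.
    gain = 1.0 + prnu_sigma * rng.standard_normal(clean_electrons.shape)
    # Signal-dependent shot noise: Poisson on the gained signal.
    shot = rng.poisson(np.clip(clean_electrons * gain, 0, None)).astype(np.float64)
    # Additive, signal-independent read noise.
    return shot + read_sigma * rng.standard_normal(clean_electrons.shape)

noisy = synthesize_noisy_raw(np.full((4, 4), 50.0))
```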

19 pages, 11860 KB  
Article
Indoor Object Measurement Through a Redundancy and Comparison Method
by Pedro Faria, Tomás Simões, Tiago Marques and Peter D. Finn
Sensors 2025, 25(21), 6744; https://doi.org/10.3390/s25216744 - 4 Nov 2025
Viewed by 388
Abstract
Accurate object detection and measurement within indoor environments—particularly unfurnished or minimalistic spaces—pose unique challenges for conventional computer vision methods. Previous research has been limited to small objects that can be fully detected by applications such as YOLO, or to outdoor environments where reference elements are more abundant. However, in indoor scenarios with limited detectable references—such as walls that exceed the camera’s field of view—current models exhibit difficulties in producing complete detections and accurate distance estimates. This paper introduces a geometry-driven, redundancy-based framework that leverages proportional laws and architectural heuristics to enhance the measurement accuracy of walls and spatial divisions using standard smartphone cameras. The model was trained on 204 labeled indoor images over 25 training iterations (500 epochs) with augmentation, achieving a mean average precision (mAP@50) of 0.995, precision of 0.995, and recall of 0.992, confirming convergence and generalisation. Applying the redundancy correction method reduced distance deviation errors to approximately 10%, corresponding to a mean absolute error below 2% in the use case. Unlike depth-sensing systems, the proposed solution requires no specialised hardware and operates fully on 2D visual input, allowing on-device and offline use. The framework provides a scalable, low-cost alternative for accurate spatial measurement and demonstrates the feasibility of camera-based geometry correction in real-world indoor settings. Future developments may integrate the proposed redundancy correction with emerging multimodal models such as SpatialLM to extend precision toward full-room spatial reasoning in applications including construction, real estate evaluation, energy auditing, and seismic assessment.
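The redundancy-and-comparison idea can be illustrated with a toy sketch: derive the same unknown dimension from several known reference proportions and reconcile the estimates. Reference names and values below are hypothetical, not taken from the paper.

```python
import numpy as np

# Each reference gives an independent metres-per-pixel scale:
# scale = known real dimension / its measured pixel length.
references = {                        # illustrative values only
    "door_height":   (2.00, 410.0),   # (metres, pixels)
    "door_width":    (0.90, 185.0),
    "outlet_height": (0.30, 63.0),
}
wall_pixels = 1660.0                  # measured pixel width of the wall

estimates = np.array([real / px * wall_pixels for real, px in references.values()])
# Redundancy-and-comparison step: drop estimates far from the median,
# then average the consistent survivors.
median = np.median(estimates)
kept = estimates[np.abs(estimates - median) <= 0.1 * median]
print(f"wall width ~ {kept.mean():.2f} m from {len(kept)} consistent references")
```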

22 pages, 2340 KB  
Article
Efficient Dual-Domain Collaborative Enhancement Method for Low-Light Images in Architectural Scenes
by Jing Pu, Wei Shi, Dong Luo, Guofei Zhang, Zhixun Xie, Wanying Liu and Bincan Liu
Infrastructures 2025, 10(11), 289; https://doi.org/10.3390/infrastructures10110289 - 31 Oct 2025
Viewed by 187
Abstract
Low-light image enhancement in architectural scenes presents a considerable challenge for computer vision applications in construction engineering. Images captured in architectural settings during nighttime or under inadequate illumination often suffer from noise interference, low-light blurring, and obscured structural features. Although low-light image enhancement and deblurring are intrinsically linked when emphasizing architectural defects, conventional image restoration methods generally treat these tasks as separate entities. This paper introduces an efficient and robust Frequency-Space Recovery Network (FSRNet), specifically designed for low-light image enhancement in architectural contexts, tailored to the unique characteristics of such scenes. The encoder utilizes a Feature Refinement Feedforward Network (FRFN) to achieve precise enhancement of defect features while dynamically mitigating background redundancy. Coupled with a Frequency Response Module, it modifies the amplitude spectrum to amplify high-frequency components of defects and ensure balanced global illumination. The decoder utilizes InceptionDWConv2d modules to capture multi-directional and multi-scale features of cracks. When combined with a gating mechanism, it dynamically suppresses noise, restores the spatial continuity of defects, and eliminates blurring. This method also reduces computational costs in terms of parameters and MAC operations. To assess the effectiveness of the proposed approach in architectural contexts, this paper conducts a comprehensive study using low-light defect images from indoor concrete walls as a representative case. Experimental results indicate that FSRNet not only achieves state-of-the-art PSNR performance of 27.58 dB but also enhances the mAP of the downstream YOLOv8 detection model by 7.1%, while utilizing only 3.75 M parameters and 8.8 GMACs. These findings fully validate the superiority and practicality of the proposed method for low-light image enhancement tasks in architectural settings.
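As a rough illustration of the idea behind modifying the amplitude spectrum to amplify high-frequency defect detail, a minimal sketch (not the authors' module) is:

```python
import numpy as np

def amplify_high_freq(img, boost=1.5, radius_frac=0.1):
    """Toy amplitude-spectrum modification: keep the phase, scale the
    amplitude of frequencies outside a low-frequency disc by `boost`."""
    F = np.fft.fftshift(np.fft.fft2(img))
    amp, phase = np.abs(F), np.angle(F)
    h, w = img.shape
    yy, xx = np.mgrid[0:h, 0:w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    # Frequencies beyond the cutoff disc carry edges and fine defect detail.
    amp = np.where(dist > radius_frac * min(h, w), amp * boost, amp)
    out = np.fft.ifft2(np.fft.ifftshift(amp * np.exp(1j * phase)))
    return np.real(out)

enhanced = amplify_high_freq(np.random.rand(64, 64))
```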

22 pages, 6682 KB  
Article
Multimodal Fire Salient Object Detection for Unregistered Data in Real-World Scenarios
by Ning Sun, Jianmeng Zhou, Kai Hu, Chen Wei, Zihao Wang and Lipeng Song
Fire 2025, 8(11), 415; https://doi.org/10.3390/fire8110415 - 26 Oct 2025
Viewed by 840
Abstract
In real-world fire scenarios, complex lighting conditions and smoke interference significantly challenge the accuracy and robustness of traditional fire detection systems. Fusion of complementary modalities, such as visible light (RGB) and infrared (IR), is essential to enhance detection robustness. However, spatial shifts and geometric distortions occur in multi-modal image pairs collected by multi-source sensors due to installation deviations and inconsistent intrinsic parameters. Existing multi-modal fire detection frameworks typically depend on pre-registered data, which struggles to handle modal misalignment in practical deployment. To overcome this limitation, we propose an end-to-end multi-modal Fire Salient Object Detection framework capable of dynamically fusing cross-modal features without pre-registration. Specifically, the Channel Cross-enhancement Module (CCM) facilitates semantic interaction across modalities in salient regions, suppressing noise from spatial misalignment. The Deformable Alignment Module (DAM) achieves adaptive correction of geometric deviations through cascaded deformation compensation and dynamic offset learning. For validation, we constructed an unregistered indoor fire dataset (Indoor-Fire) covering common fire scenarios. Generalizability was further evaluated on an outdoor dataset (RGB-T Wildfire). To fully validate the effectiveness of the method in complex building fire scenarios, we conducted experiments using the Fire in Historic Buildings dataset. Experimental results demonstrate that the F1-score reaches 83% on both datasets, with the IoU maintained above 70%. Notably, while maintaining high accuracy, the number of parameters (91.91 M) is only 28.1% of the second-best SACNet (327 M). This method provides a robust solution for unaligned or weakly aligned modal fusion caused by sensor differences and is highly suitable for deployment in intelligent firefighting systems.
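A hedged sketch of what cross-modal channel interaction of this kind can look like, with each modality's channel statistics gating the other; this assumes a generic squeeze-and-excitation layout for illustration, not the paper's exact CCM.

```python
import torch
import torch.nn as nn

class ChannelCrossEnhancement(nn.Module):
    """Generic cross-modal channel gating (assumed layout, not the paper's)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        def gate():
            # Squeeze-and-excitation style: pool -> bottleneck -> sigmoid gate.
            return nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
                nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        self.from_ir, self.from_rgb = gate(), gate()

    def forward(self, rgb, ir):
        # Each stream is modulated by the other's channel descriptor, so
        # salient channels agree across modalities without requiring
        # pixel-level registration.
        return rgb * self.from_ir(ir), ir * self.from_rgb(rgb)

rgb, ir = torch.rand(1, 32, 64, 64), torch.rand(1, 32, 64, 64)
rgb_out, ir_out = ChannelCrossEnhancement(32)(rgb, ir)
```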

16 pages, 6596 KB  
Article
Enhanced Reality Exercise System Designed for People with Limited Mobility
by Ahmet Özkurt, Tolga Olcay and Taner Akkan
Appl. Sci. 2025, 15(20), 11146; https://doi.org/10.3390/app152011146 - 17 Oct 2025
Viewed by 295
Abstract
People with limited mobility experience disadvantages when participating in outdoor activities such as cycling, which can lead to negative consequences. This study proposes a technology-assisted indoor physical cycling activity for people with limited mobility. The aim is to use enhanced reality (ER) technology, based on virtual reality, to exercise in the person’s own indoor environment. In this system, real track and speed information is captured by a 360-degree camera, GPS, and gyroscope sensors and relayed, with real-time interaction, to the mechanical system of the electromechanical bike. The pedal force system of the exercise bike is driven using the incline information, and data from the bike’s speed sensor and head movements are transferred in real time to the track image on the user’s head-up display, creating a realistic experience. With this system, it is possible to maintain an experience close to real cycling through human–computer interaction with hardware and software integration. Thus, people with limited mobility can use this system to improve their quality of life by performing indoor physical activities with an experience close to reality.
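A toy sketch of the incline-to-pedal-resistance mapping such a system needs; the coefficients are illustrative assumptions, and the paper's actual control scheme is not reproduced here.

```python
import math

def pedal_resistance(incline_deg, rider_mass_kg=80.0, base_friction_n=15.0):
    """Toy mapping from recorded track incline to brake resistance:
    gravity component along the slope plus a constant rolling term."""
    slope_force = rider_mass_kg * 9.81 * math.sin(math.radians(incline_deg))
    return max(base_friction_n + slope_force, 0.0)  # never drive the pedals

for grade in (0.0, 3.0, 6.0):
    print(f"{grade:.0f} deg -> {pedal_resistance(grade):.1f} N")
```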

15 pages, 2159 KB  
Article
Benchmarking Lightweight YOLO Object Detectors for Real-Time Hygiene Compliance Monitoring
by Leen Alashrafi, Raghad Badawood, Hana Almagrabi, Mayda Alrige, Fatemah Alharbi and Omaima Almatrafi
Sensors 2025, 25(19), 6140; https://doi.org/10.3390/s25196140 - 4 Oct 2025
Viewed by 1223
Abstract
Ensuring hygiene compliance in regulated environments—such as food processing facilities, hospitals, and public indoor spaces—requires reliable detection of personal protective equipment (PPE) usage, including gloves, face masks, and hairnets. Manual inspection is labor-intensive and unsuitable for continuous, real-time enforcement. This study benchmarks three lightweight object detection models—YOLOv8n, YOLOv10n, and YOLOv12n—for automated PPE compliance monitoring using a large curated dataset of over 31,000 annotated images. The dataset spans seven classes representing both compliant and non-compliant conditions: glove, no_glove, mask, no_mask, incorrect_mask, hairnet, and no_hairnet. All evaluations were conducted using both detection accuracy metrics (mAP@50, mAP@50–95, precision, recall) and deployment-relevant efficiency metrics (inference speed, model size, GFLOPs). Among the three models, YOLOv10n achieved the highest mAP@50 (85.7%) while maintaining competitive efficiency, indicating strong suitability for resource-constrained IoT-integrated deployments. YOLOv8n provided the highest localization accuracy at stricter thresholds (mAP@50–95), while YOLOv12n favored ultra-lightweight operation at the cost of reduced accuracy. The results provide practical guidance for selecting nano-scale detection models in real-time hygiene compliance systems and contribute a reproducible, deployment-aware evaluation framework for computer vision in hygiene-critical settings.
(This article belongs to the Section Internet of Things)
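With the ultralytics package, this kind of nano-model comparison reduces to a short validation loop. The weight file names follow recent ultralytics releases and the dataset config `ppe.yaml` is a hypothetical placeholder; adjust both to your install.

```python
from ultralytics import YOLO  # assumes the ultralytics package is installed

# Weight names as used by recent ultralytics releases (assumption).
for weights in ("yolov8n.pt", "yolov10n.pt", "yolo12n.pt"):
    model = YOLO(weights)
    metrics = model.val(data="ppe.yaml")  # hypothetical 7-class PPE dataset config
    print(weights,
          f"mAP@50={metrics.box.map50:.3f}",
          f"mAP@50-95={metrics.box.map:.3f}")
```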

20 pages, 74841 KB  
Article
Autonomous Concrete Crack Monitoring Using a Mobile Robot with a 2-DoF Manipulator and Stereo Vision Sensors
by Seola Yang, Daeik Jang, Jonghyeok Kim and Haemin Jeon
Sensors 2025, 25(19), 6121; https://doi.org/10.3390/s25196121 - 3 Oct 2025
Cited by 1 | Viewed by 861
Abstract
Crack monitoring in concrete structures is essential to maintaining structural integrity. Therefore, this paper proposes a mobile ground robot equipped with a 2-DoF manipulator and stereo vision sensors for autonomous crack monitoring and mapping. To facilitate crack detection over large areas, a 2-DoF motorized manipulator providing linear and rotational motions, with a stereo vision sensor mounted on the end effector, was deployed. In combination with a manual rotation plate, this configuration enhances accessibility and expands the field of view for crack monitoring. Another stereo vision sensor, mounted at the front of the robot, was used to acquire point cloud data of the surrounding environment, enabling tasks such as SLAM (simultaneous localization and mapping), path planning and following, and obstacle avoidance. Cracks are detected and segmented using the deep learning algorithms YOLO (You Only Look Once) v6-s and SFNet (Semantic Flow Network), respectively. To enhance the performance of crack segmentation, synthetic image generation and preprocessing techniques, including cropping and scaling, were applied. The dimensions of cracks are calculated using point clouds filtered with the median absolute deviation method. To validate the performance of the proposed crack-monitoring and mapping method with the robot system, indoor experimental tests were performed. The experimental results confirmed that, in cases of divided imaging, the crack propagation direction was predicted, enabling robotic manipulation and division-point calculation. Subsequently, total crack length and width were calculated by combining reconstructed 3D point clouds from multiple frames, with a maximum relative error of 1%.
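The median-absolute-deviation filtering step mentioned above is a standard robust outlier rejection; a minimal sketch keyed on a point cloud's depth column:

```python
import numpy as np

def mad_filter(points, k=3.0):
    """Median-absolute-deviation outlier rejection before measuring
    geometry from a point cloud.

    points : (N, 3) array; filtering here keys on the z (depth) column.
    """
    z = points[:, 2]
    med = np.median(z)
    mad = np.median(np.abs(z - med)) + 1e-12
    # 0.6745 rescales MAD so it is comparable to a Gaussian sigma.
    keep = np.abs(z - med) / (mad / 0.6745) < k
    return points[keep]

cloud = np.random.rand(1000, 3)
clean = mad_filter(cloud)
```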

14 pages, 3021 KB  
Article
An Experimental Investigation into the Influence of Colored Lighting on Perceived Spatial Impressions
by Heejin Lee and Eunsil Lee
Buildings 2025, 15(19), 3511; https://doi.org/10.3390/buildings15193511 - 28 Sep 2025
Viewed by 636
Abstract
The present study investigates the psychological impact of lighting color on spatial impressions within indoor settings, drawing on Mehrabian and Russell’s PAD model. The purpose of this study is to explore potential variations in spatial impressions, encompassing affectivity, tranquility, and thermality, across six different lighting colors (i.e., red, green, blue, yellow, orange, and purple). A controlled laboratory experiment was conducted with 101 participants, utilizing a color-changing LED lighting fixture to expose participants to actual lighting conditions rather than simulated images. The findings revealed significant differences in spatial impressions among the six lighting colors, indicating that the choice of lighting color shapes how people perceive a space. Blue lighting elicited the most favorable affective responses, while red lighting was perceived most negatively. Although purple lighting yielded the highest tranquility mean, it was not statistically different from other cool hues and was also associated with sleepiness and dullness. By incorporating secondary colors and employing real-time lighting exposure, this study offers a novel contribution to existing research on color and lighting.
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

38 pages, 10032 KB  
Article
Closed and Structural Optimization for 3D Line Segment Extraction in Building Point Clouds
by Ruoming Zhai, Xianquan Han, Peng Wan, Jianzhou Li, Yifeng He and Bangning Ding
Remote Sens. 2025, 17(18), 3234; https://doi.org/10.3390/rs17183234 - 18 Sep 2025
Viewed by 565
Abstract
The extraction of architectural structural line features can simplify the 3D spatial representation of built environments, reduce the storage and processing burden of large-scale point clouds, and provide essential geometric primitives for downstream modeling tasks. However, existing 3D line extraction methods suffer from incomplete and fragmented contours, with missing or misaligned intersections. To overcome these limitations, this study proposes a patch-level framework for 3D line extraction and structural optimization from building point clouds. The proposed method first partitions point clouds into planar patches and establishes local image planes for each patch, enabling a structured 2D representation of unstructured 3D data. Then, graph-cut segmentation is proposed to extract compact boundary contours, which are vectorized into closed lines and back-projected into 3D space to form the initial line segments. To improve geometric consistency, regularized geometric constraints, including adjacency, collinearity, and orthogonality constraints, are further designed to merge homogeneous segments, refine topology, and strengthen structural outlines. Finally, we evaluated the approach on three indoor building environments and four outdoor scenes, and experimental results show that it reduces noise and redundancy while significantly improving the completeness, closure, and alignment of 3D line features in various complex architectural structures.
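The first step, turning an unstructured planar patch into a structured 2D representation, can be sketched as follows; this is a generic local-frame construction, and the paper's exact image-plane setup may differ.

```python
import numpy as np

def project_to_patch_plane(points, normal):
    """Build a local 2D frame on a planar patch and project its points,
    giving the structured representation on which boundary extraction runs."""
    n = normal / np.linalg.norm(normal)
    # Any vector not parallel to n seeds an in-plane orthonormal basis (u, v).
    seed = np.array([1.0, 0.0, 0.0]) if abs(n[0]) < 0.9 else np.array([0.0, 1.0, 0.0])
    u = np.cross(n, seed); u /= np.linalg.norm(u)
    v = np.cross(n, u)
    centred = points - points.mean(axis=0)
    return np.stack([centred @ u, centred @ v], axis=1)  # (N, 2) plane coords

coords_2d = project_to_patch_plane(np.random.rand(100, 3), np.array([0.0, 0.0, 1.0]))
```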

13 pages, 2763 KB  
Article
Structural Deflection Measurement with a Single Smartphone Using a New Scale Factor Calibration Method
by Long Tian, Yangxiang Yuan, Liping Yu and Xinyue Zhang
Infrastructures 2025, 10(9), 238; https://doi.org/10.3390/infrastructures10090238 - 10 Sep 2025
Viewed by 560
Abstract
This study proposes a novel structural deflection measurement method using a single smartphone with an innovative scale factor (SF) calibration technique, eliminating reliance on laser rangefinders and industrial cameras. Conventional off-axis digital image correlation (DIC) techniques require laser rangefinders to measure discrete points for SF calculation, suffering from high hardware costs and sunlight-induced ranging failures. The proposed approach replaces physical ranging by deriving SF through geometric relationships of known structural dimensions (e.g., bridge length/width) within the measured plane. A key innovation lies in developing a versatile SF calibration framework adaptable to varying numbers of reference dimensions: a non-optimized calculation integrates smartphone gyroscope-measured 3D angles when only one dimension is available; a local optimization model with angular parameters enhances accuracy for 2–3 known dimensions; and a global optimization model employing spatial constraints achieves precise SF resolution with ≥4 reference dimensions. Indoor experiments demonstrated sub-0.05 m ranging accuracy and deflection errors below 0.30 mm. Field validations on Beijing Subway Line 13’s bridge successfully captured dynamic load-induced deformations, confirming outdoor applicability. This smartphone-based method reduces costs compared to traditional setups while overcoming sunlight interference, establishing a hardware-adaptive solution for vision-based structural health monitoring.
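In its simplest single-reference form, the scale-factor bookkeeping reduces to a few lines. The numbers below are illustrative, and the paper's gyroscope-angle corrections and optimization models for multiple references are omitted.

```python
def deflection_mm(pixel_disp, known_len_m, known_len_px):
    """Minimal off-axis DIC bookkeeping: the scale factor (SF) comes from a
    known structural dimension instead of a laser rangefinder."""
    sf = known_len_m / known_len_px   # metres per pixel in the measured plane
    return pixel_disp * sf * 1000.0   # deflection in millimetres

# e.g. a 12 m girder spanning 3840 px gives SF ~= 3.1 mm/px (illustrative):
print(deflection_mm(pixel_disp=0.08, known_len_m=12.0, known_len_px=3840.0))
```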

29 pages, 1761 KB  
Article
5G High-Precision Positioning in GNSS-Denied Environments Using a Positional Encoding-Enhanced Deep Residual Network
by Jin-Man Shen, Hua-Min Chen, Hui Li, Shaofu Lin and Shoufeng Wang
Sensors 2025, 25(17), 5578; https://doi.org/10.3390/s25175578 - 6 Sep 2025
Viewed by 1905
Abstract
With the widespread deployment of 5G technology, high-precision positioning in global navigation satellite system (GNSS)-denied environments is a critical yet challenging task for emerging 5G applications, enabling enhanced spatial resolution, real-time data acquisition, and more accurate geolocation services. Traditional methods relying on single-source measurements like received signal strength information (RSSI) or time of arrival (TOA) often fail in complex multipath conditions. To address this, the positional encoding multi-scale residual network (PE-MSRN) is proposed, a novel deep learning framework that enhances positioning accuracy by deeply mining spatial information from 5G channel state information (CSI). By designing spatial sampling with multigranular data and utilizing multi-source information in 5G CSI, a dataset covering a variety of positioning scenarios is proposed. The core of PE-MSRN is a multi-scale residual network (MSRN) augmented by a positional encoding (PE) mechanism. The positional encoding transforms raw angle of arrival (AOA) data into rich spatial features, which are then mapped into a 2D image, allowing the MSRN to effectively capture both fine-grained local patterns and large-scale spatial dependencies. Subsequently, the PE-MSRN algorithm that integrates ResNet residual networks and multi-scale feature extraction mechanisms is designed and compared with the baseline convolutional neural network (CNN) and other comparison methods. Extensive evaluations across various simulated scenarios, including indoor autonomous driving and smart factory tool tracking, demonstrate the superiority of our approach. Notably, PE-MSRN achieves a positioning accuracy of up to 20 cm, significantly outperforming baseline CNNs and other neural network algorithms in both accuracy and convergence speed, particularly under real measurement conditions with higher SNR and fine-grained grid division. Our work provides a robust and effective solution for developing high-fidelity 5G positioning systems.
(This article belongs to the Section Navigation and Positioning)
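A minimal sketch of a sinusoidal positional encoding applied to a raw AOA scalar, the kind of transform the abstract describes; the frequency ladder and output dimensionality here are assumptions, not the paper's settings.

```python
import numpy as np

def encode_aoa(aoa_deg, dims=16):
    """Turn one angle-of-arrival scalar into a multi-frequency feature
    vector that a CNN can consume (illustrative frequency choice)."""
    angle = np.deg2rad(aoa_deg)
    freqs = 2.0 ** np.arange(dims // 2)   # geometric frequency ladder
    return np.concatenate([np.sin(freqs * angle), np.cos(freqs * angle)])

features = encode_aoa(37.5)   # shape (16,)
```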

13 pages, 4367 KB  
Article
Non-Destructive Characterization of Drywall Moisture Content Using Terahertz Time-Domain Spectroscopy
by Habeeb Foluso Adeagbo and Binbin Yang
Sensors 2025, 25(17), 5576; https://doi.org/10.3390/s25175576 - 6 Sep 2025
Viewed by 1410
Abstract
Despite its wide acceptance, one of the most critical limitations of Terahertz wave technology is its high sensitivity to moisture. This limitation can, in turn, be exploited for use in moisture detection applications. This work presents a quantitative, non-invasive characterization of moisture content in standard gypsum drywall using Terahertz Time-Domain Spectroscopy (THz-TDS). With an increase in the moisture content of the drywall sample, experimental results indicated an increase in the dielectric properties such as the refractive index, permittivity, absorption coefficient, extinction coefficient, and dissipation factor. The demonstrated sensitivity to moisture establishes THz-TDS as a powerful tool for structural monitoring, hidden defect detection, and electromagnetic modeling of real-world building environments. Beyond material diagnostics, these findings have broader implications for THz indoor propagation studies, especially for emerging sub-THz and low THz communication technologies in 5G/6G and THz imaging of objects hidden behind the wall.
(This article belongs to the Section Fault Diagnosis & Sensors)
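As a pointer to how such dielectric properties are extracted, the first-order THz-TDS refractive-index estimate from the measured pulse delay is a textbook relation (the authors' full extraction pipeline, including absorption, is not reproduced):

```python
C = 299_792_458.0  # speed of light, m/s

def refractive_index(delay_s, thickness_m):
    """A sample of thickness d that delays the THz pulse by dt relative to
    the reference path has n ~= 1 + c*dt/d, ignoring multiple reflections."""
    return 1.0 + C * delay_s / thickness_m

# e.g. a 12.7 mm drywall panel delaying the pulse by 25 ps (illustrative):
print(refractive_index(25e-12, 12.7e-3))   # ~1.59
```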

16 pages, 2827 KB  
Article
A Dual-Modality CNN Approach for RSS-Based Indoor Positioning Using Spatial and Frequency Fingerprints
by Xiangchen Lai, Yunzhi Luo and Yong Jia
Sensors 2025, 25(17), 5408; https://doi.org/10.3390/s25175408 - 2 Sep 2025
Viewed by 594
Abstract
Indoor positioning systems based on received signal strength (RSS) achieve indoor positioning by leveraging the position-related features inherent in spatial RSS fingerprint images. Their positioning accuracy and robustness are directly influenced by the quality of fingerprint features. However, the inherent spatial low-resolution characteristic of spatial RSS fingerprint images makes it challenging to effectively extract subtle fingerprint features. To address this issue, this paper proposes an RSS-based indoor positioning method that combines enhanced spatial frequency fingerprint representation with fusion learning. First, bicubic interpolation is applied to improve image resolution and reveal finer spatial details. Then, a 2D fast Fourier transform (2D FFT) converts the enhanced spatial images into frequency domain representations to supplement spectral features. These spatial and frequency fingerprints are used as dual-modality inputs for a parallel convolutional neural network (CNN) model with efficient multi-scale attention (EMA) modules. The model extracts modality-specific features and fuses them to generate enriched representations. Each modality—spatial, frequency, and fused—is passed through a dedicated fully connected network to predict 3D coordinates. A coordinate optimization strategy is introduced to select the two most reliable outputs for each axis (x, y, z), and their average is used as the final estimate. Experiments on seven public datasets show that the proposed method significantly improves positioning accuracy, reducing the mean positioning error by up to 47.1% and root mean square error (RMSE) by up to 54.4% compared with traditional and advanced time–frequency methods.
(This article belongs to the Section Navigation and Positioning)
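The two input modalities can be sketched in a few lines of numpy/scipy; cubic-spline zoom stands in for bicubic interpolation here, and this is not the authors' pipeline.

```python
import numpy as np
from scipy.ndimage import zoom

def dual_modality_inputs(rss_image, scale=4):
    """Build the two fingerprint modalities the abstract describes:
    an upsampled spatial image and its 2D-FFT magnitude spectrum."""
    spatial = zoom(rss_image, scale, order=3)   # order=3: cubic spline
    freq = np.abs(np.fft.fftshift(np.fft.fft2(spatial)))
    freq = np.log1p(freq)                       # compress dynamic range
    return spatial, freq

sp, fr = dual_modality_inputs(np.random.rand(8, 8))   # toy 8x8 RSS grid
```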

17 pages, 5431 KB  
Article
Localization Meets Uncertainty: Uncertainty-Aware Multi-Modal Localization
by Hye-Min Won, Jieun Lee and Jiyong Oh
Technologies 2025, 13(9), 386; https://doi.org/10.3390/technologies13090386 - 1 Sep 2025
Viewed by 917
Abstract
Reliable localization is critical for robot navigation in complex indoor environments. In this paper, we propose an uncertainty-aware localization method that enhances the reliability of localization outputs without modifying the prediction model itself. This study introduces a percentile-based rejection strategy that filters out unreliable 3-degree-of-freedom pose predictions based on aleatoric and epistemic uncertainties the network estimates. We apply this approach to a multi-modal end-to-end localization that fuses RGB images and 2D LiDAR data, and we evaluate it across three real-world datasets collected using a commercialized serving robot. Experimental results show that applying stricter uncertainty thresholds consistently improves pose accuracy. Specifically, the mean position error, calculated as the average Euclidean distance between the predicted and ground-truth (x, y) coordinates, is reduced by 41.0%, 56.7%, and 69.4%, and the mean orientation error, representing the average angular deviation between the predicted and ground-truth yaw angles, is reduced by 55.6%, 65.7%, and 73.3%, when percentile thresholds of 90%, 80%, and 70% are applied, respectively. Furthermore, the rejection strategy effectively removes extreme outliers, resulting in better alignment with ground truth trajectories. To the best of our knowledge, this is the first study to quantitatively demonstrate the benefits of percentile-based uncertainty rejection in multi-modal and end-to-end localization tasks. Our approach provides a practical means to enhance the reliability and accuracy of localization systems in real-world deployments.
(This article belongs to the Special Issue AI Robotics Technologies and Their Applications)
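The percentile-based rejection itself is compact; a minimal numpy sketch, assuming a single combined uncertainty score per pose prediction:

```python
import numpy as np

def percentile_reject(poses, uncertainty, keep_pct=80):
    """Keep only the keep_pct% most certain predictions: find the
    uncertainty value at that percentile, drop everything above it."""
    threshold = np.percentile(uncertainty, keep_pct)
    mask = uncertainty <= threshold
    return poses[mask], mask

poses = np.random.rand(1000, 3)   # (x, y, yaw) predictions
u = np.random.rand(1000)          # combined aleatoric + epistemic score
kept, mask = percentile_reject(poses, u, keep_pct=70)
```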

19 pages, 2591 KB  
Article
A Comprehensive Hybrid Approach for Indoor Scene Recognition Combining CNNs and Text-Based Features
by Taner Uckan, Cengiz Aslan and Cengiz Hark
Sensors 2025, 25(17), 5350; https://doi.org/10.3390/s25175350 - 29 Aug 2025
Viewed by 855
Abstract
Indoor scene recognition is a computer vision task that identifies various indoor environments, such as offices, libraries, kitchens, and restaurants. This research area is particularly significant for applications in robotics, security, and assistance for individuals with disabilities, as it enables the categorization of spaces and the provision of contextual information. Convolutional Neural Networks (CNNs) are commonly employed in this field. While CNNs perform well in outdoor scene recognition by focusing on global features such as mountains and skies, they often struggle with indoor scenes, where local features like furniture and objects are more critical. In this study, the “MIT 67 Indoor Scene” dataset is used to extract and combine features from both a CNN and a text-based model utilizing object recognition outputs, resulting in a two-channel hybrid model. The experimental results demonstrate that this hybrid approach, which integrates natural language processing and image processing techniques, improves the test accuracy of the image processing model by 8.3%, achieving a notable success rate. Furthermore, this study offers contributions to new application areas in remote sensing, particularly in indoor scene understanding and indoor mapping.
(This article belongs to the Section Sensor Networks)
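At its core, a two-channel hybrid of this kind is feature concatenation followed by a classifier head; a toy sketch with hypothetical feature dimensions (the paper's actual channel sizes are not specified here):

```python
import torch

# Hypothetical precomputed features for one image:
cnn_feat = torch.rand(1, 512)   # global CNN embedding of the scene
text_feat = torch.rand(1, 128)  # embedding of detected-object labels,
                                # e.g. "desk, monitor, bookshelf"

fused = torch.cat([cnn_feat, text_feat], dim=1)   # two-channel hybrid input
logits = torch.nn.Linear(640, 67)(fused)          # 67 MIT Indoor classes
```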
