Search Results (103)

Search Parameters:
Keywords = fisheye image

18 pages, 3240 KB  
Article
A Waist-Mounted Interface for Mobile Viewpoint-Height Transformation Affecting Spatial Perception
by Jun Aoki, Hideki Kadone and Kenji Suzuki
Sensors 2026, 26(2), 372; https://doi.org/10.3390/s26020372 - 6 Jan 2026
Viewed by 254
Abstract
Visual information shapes spatial perception and body representation in human augmentation. However, the perceptual consequences of viewpoint-height changes produced by sensor–display geometry are not well understood. To address this gap, we developed an interface that maps a waist-mounted stereo fisheye camera to an eye-level viewpoint on a head-mounted display in real time. Geometric and timing calibration kept latency low enough to preserve a sense of agency and enable stable untethered walking. In a within-subject study comparing head- and waist-level viewpoints, participants approached adjustable gaps, rated passability confidence (1–7), and attempted passage when confident. We also recorded walking speed and assessed post-task body representation using a questionnaire. High gaps were judged passable and low gaps were not, irrespective of viewpoint. At the middle gap, confidence decreased with a head-level viewpoint and increased with a waist-level viewpoint, and walking speed decreased when a waist-level viewpoint was combined with a chest-height gap, consistent with added caution near the decision boundary. Body image reports most often indicated a lowered head position relative to the torso, consistent with visually driven rescaling rather than morphological change. These findings show that a waist-mounted interface for mobile viewpoint-height transformation can reliably shift spatial perception.
(This article belongs to the Special Issue Sensors and Wearables for AR/VR Applications)
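For readers unfamiliar with the remapping step such an interface performs, the sketch below builds a fisheye-to-pinhole lookup table under an assumed equidistant fisheye model (r = f·θ). The function name, focal lengths, and file path are illustrative; the paper's actual viewpoint-height transformation additionally relies on stereo depth and the authors' calibration.

```python
import numpy as np
import cv2

def fisheye_to_pinhole_map(w_out, h_out, f_pin, f_fish, cx_f, cy_f):
    """Build a cv2.remap lookup from an equidistant fisheye image to a
    pinhole view. Equidistant model: image radius r = f_fish * theta,
    where theta is the angle between the incoming ray and the optical axis."""
    u, v = np.meshgrid(np.arange(w_out) - w_out / 2.0,
                       np.arange(h_out) - h_out / 2.0)
    z = np.full_like(u, f_pin)                 # pinhole ray through each pixel
    theta = np.arccos(z / np.sqrt(u * u + v * v + z * z))
    phi = np.arctan2(v, u)                     # azimuth around the optical axis
    r = f_fish * theta                         # equidistant fisheye radius
    return ((cx_f + r * np.cos(phi)).astype(np.float32),
            (cy_f + r * np.sin(phi)).astype(np.float32))

# Usage on one frame (path and focal lengths are placeholders):
# fish = cv2.imread("fisheye_frame.png")
# mx, my = fisheye_to_pinhole_map(640, 480, 400.0, 300.0,
#                                 fish.shape[1] / 2, fish.shape[0] / 2)
# view = cv2.remap(fish, mx, my, cv2.INTER_LINEAR)
```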

20 pages, 8493 KB  
Article
Low-Cost Panoramic Photogrammetry: A Case Study on Flat Textures and Poor Lighting Conditions
by Ondrej Benko, Marek Fraštia, Marián Marčiš and Adrián Filip
Geomatics 2026, 6(1), 2; https://doi.org/10.3390/geomatics6010002 - 3 Jan 2026
Viewed by 227
Abstract
The article addresses the issue of panoramic photogrammetry for the reconstruction of interior spaces. Such environments often present challenges for photogrammetric scanning, including poor lighting conditions and surfaces with variable texture. In this case study, we reconstruct the interior spaces of the historical house of Samuel Mikovíni, which exhibits these unfavorable conditions. The 3D reconstruction of interior spaces is performed using the Ricoh Theta Z1 spherical camera (Ricoh Company, Ltd.; Tokyo, Japan) in six variants, each employing a different number of images and a different camera network. Scale is introduced into the reconstructions based on significant dimensions measured with a measuring tape. A comparison is carried out against a point cloud obtained from terrestrial laser scanning, and difference point clouds are generated for each variant. Based on the results, reconstructions produced from a reduced number of spherical images can serve as a basic source for simple documentation with accuracy up to 0.15 m. When the number of spherical images is increased and images from different height levels are included, the reconstruction accuracy improves markedly, achieving positional accuracy of up to 0.05 m, even in areas affected by poor lighting conditions or low-texture surfaces. The results confirm that for interior reconstruction, a higher number of images not only increases the density of the reconstructed point cloud but also enhances its positional accuracy.
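The scaling step described in the abstract, fixing an unscaled reconstruction to tape-measured dimensions, reduces to a single ratio applied to the point cloud. A minimal sketch assuming one reference edge; in practice several measured dimensions would likely be combined (e.g., by averaging the resulting scale factors):

```python
import numpy as np

def apply_tape_scale(points, model_p, model_q, measured_len_m):
    """Scale a reconstruction so one known edge matches a tape measurement.

    points:         (N, 3) point cloud in arbitrary model units
    model_p/q:      endpoints of the measured edge, in the same model units
    measured_len_m: tape-measured length of that edge, in metres
    """
    model_len = np.linalg.norm(np.asarray(model_q, float) - np.asarray(model_p, float))
    scale = measured_len_m / model_len
    return points * scale, scale

# Synthetic cloud whose reference edge spans 2.0 model units, measured as 1.5 m.
pts = np.random.rand(100, 3) * 10
scaled, s = apply_tape_scale(pts, [0, 0, 0], [2, 0, 0], 1.5)
print(f"scale factor: {s:.3f}")  # 0.750
```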

31 pages, 6944 KB  
Article
Prompt-Based and Transformer-Based Models Evaluation for Semantic Segmentation of Crowdsourced Urban Imagery Under Projection and Geometric Symmetry Variations
by Sina Rezaei, Aida Yousefi and Hossein Arefi
Symmetry 2026, 18(1), 68; https://doi.org/10.3390/sym18010068 - 31 Dec 2025
Viewed by 334
Abstract
Semantic segmentation of crowdsourced street-level imagery plays a critical role in urban analytics by enabling pixel-wise understanding of urban scenes for applications such as walkability scoring, environmental comfort evaluation, and urban planning, where robustness to geometric transformations and projection-induced symmetry variations is essential. This study presents a comparative evaluation of two primary families of semantic segmentation models: transformer-based models (SegFormer and Mask2Former) and prompt-based models (CLIPSeg, LangSAM, and SAM+CLIP). The evaluation is conducted on images with varying geometric properties, including normal perspective, fisheye distortion, and panoramic format, representing different forms of projection symmetry and symmetry-breaking transformations, using data from Google Street View and Mapillary. Each model is evaluated on a unified benchmark with pixel-level annotations for key urban classes, including road, building, sky, vegetation, and additional elements grouped under the “Other” class. Segmentation performance is assessed through metric-based, statistical, and visual evaluations, with mean Intersection over Union (mIoU) and pixel accuracy serving as the primary metrics. Results show that LangSAM demonstrates strong robustness across different image formats, with mIoU scores of 64.48% on fisheye images, 85.78% on normal perspective images, and 96.07% on panoramic images, indicating strong semantic consistency under projection-induced symmetry variations. Among transformer-based models, SegFormer proves to be the most reliable, attaining the highest accuracy on fisheye and normal perspective images among all models, with mIoU scores of 72.21%, 94.92%, and 75.13% on fisheye, normal, and panoramic imagery, respectively. LangSAM not only demonstrates robustness across different projection geometries but also delivers the lowest segmentation error, consistently identifying the correct class for corresponding objects. In contrast, CLIPSeg remains the weakest prompt-based model, with mIoU scores of 77.60% on normal images and 59.33% on panoramic images, and a substantial drop to 59.33% on fisheye imagery, reflecting sensitivity to projection-related symmetry distortions.
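The two headline metrics in this evaluation, mIoU and pixel accuracy, are quick to compute from a confusion matrix. A minimal NumPy version; the class count and toy labels below are illustrative:

```python
import numpy as np

def miou_and_pixel_accuracy(pred, gt, n_classes):
    """Mean IoU and pixel accuracy from per-pixel class labels."""
    conf = np.zeros((n_classes, n_classes), dtype=np.int64)
    # conf[i, j] counts pixels with ground truth i predicted as j.
    np.add.at(conf, (gt.ravel(), pred.ravel()), 1)
    tp = np.diag(conf).astype(float)
    union = conf.sum(0) + conf.sum(1) - tp
    iou = np.where(union > 0, tp / np.maximum(union, 1), np.nan)
    return np.nanmean(iou), tp.sum() / conf.sum()

# Toy example with 3 classes (e.g., road, building, sky).
gt = np.array([[0, 0, 1], [2, 2, 1]])
pred = np.array([[0, 1, 1], [2, 2, 2]])
miou, acc = miou_and_pixel_accuracy(pred, gt, 3)
print(f"mIoU={miou:.3f}, pixel accuracy={acc:.3f}")  # mIoU=0.500, accuracy=0.667
```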

29 pages, 11999 KB  
Article
Pixel-Wise Sky-Obstacle Segmentation in Fisheye Imagery Using Deep Learning and Gradient Boosting
by Némo Bouillon and Vincent Boitier
J. Imaging 2025, 11(12), 446; https://doi.org/10.3390/jimaging11120446 - 12 Dec 2025
Viewed by 505
Abstract
Accurate sky–obstacle segmentation in hemispherical fisheye imagery is essential for solar irradiance forecasting, photovoltaic system design, and environmental monitoring. However, existing methods often rely on expensive all-sky imagers and region-specific training data, produce coarse sky–obstacle boundaries, and ignore the optical properties of fisheye lenses. We propose a low-cost segmentation framework designed for fisheye imagery that combines synthetic data generation, lens-aware augmentation, and a hybrid deep-learning pipeline. Synthetic fisheye training images are created from publicly available street-view panoramas to cover diverse environments without dedicated hardware, and lens-aware augmentations model fisheye projection and photometric effects to improve robustness across devices. On this dataset, we train a convolutional neural network (CNN) and refine its output with gradient-boosted decision trees (GBDT) to sharpen sky–obstacle boundaries. The method is evaluated on real fisheye images captured with smartphones and low-cost clip-on lenses across multiple sites, achieving an Intersection over Union (IoU) of 96.63% and an F1 score of 98.29%, along with high boundary accuracy. An additional evaluation on an external panoramic baseline dataset confirms strong cross-dataset generalization. Together, these results show that the proposed framework enables accurate, low-cost, and widely deployable hemispherical sky segmentation for practical solar and environmental imaging applications.
(This article belongs to the Section AI in Imaging)
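The hybrid refinement idea, feeding CNN probabilities plus cheap per-pixel cues into gradient-boosted trees, can be sketched with scikit-learn. Everything below (the feature choice, shapes, and stand-in labels) is an illustrative assumption, not the paper's pipeline:

```python
import numpy as np
from sklearn.ensemble import HistGradientBoostingClassifier

def pixel_features(cnn_prob, rgb):
    """Per-pixel feature table: CNN sky probability plus photometric cues."""
    blue_ratio = rgb[..., 2] / (rgb.sum(-1) + 1e-6)
    brightness = rgb.mean(-1) / 255.0
    return np.stack([cnn_prob, blue_ratio, brightness], -1).reshape(-1, 3)

rng = np.random.default_rng(0)
cnn_prob = rng.random((64, 64))                  # stand-in CNN sky probability
rgb = rng.integers(0, 256, (64, 64, 3)).astype(float)
labels = (cnn_prob > 0.5).astype(int).ravel()    # stand-in ground-truth mask

# The booster sharpens the coarse CNN output, class by class, pixel by pixel.
gbdt = HistGradientBoostingClassifier(max_iter=50)
gbdt.fit(pixel_features(cnn_prob, rgb), labels)
refined = gbdt.predict_proba(pixel_features(cnn_prob, rgb))[:, 1].reshape(64, 64)
```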

17 pages, 1939 KB  
Article
Artificial Intelligence-Assisted Monitoring of Water Usage for Cooling Cows on a Dairy Farm
by Fernando Valle, Kelly Anklam and Dörte Döpfer
Animals 2025, 15(23), 3470; https://doi.org/10.3390/ani15233470 - 2 Dec 2025
Viewed by 453
Abstract
High-yielding lactating cows generate considerable internal heat, making thermoregulation challenging in warm conditions. Traditionally, sprinkler systems have cooled dairy cows by spraying water droplets onto their skin to aid heat dissipation, especially when used with fans. This study explores the benefits of AI-assisted monitoring of water usage for cooling dairy cows, aiming to optimize water consumption and enhance sustainability. An object detection model, trained with 200 random images from a fisheye security camera installed above pens of dairy cows on a dairy farm, was used to detect the presence or absence of cows in headgate sections to guide water sprinkler activity. The model detected cows' presence or absence in headgates with an accuracy of 0.924, and its implementation has the potential to save up to 75 percent of the water used annually for cooling cows. Additionally, the model can detect cows' behavior patterns regarding location in the pens depending on the occurrence of heat stress. The implementation of AI-powered detection systems on dairy farms has been shown to enhance sustainability and significantly reduce expenses by curbing the excessive use of water.
(This article belongs to the Section Animal System and Management)
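The control logic the abstract implies is simple: run a section's sprinkler only while a cow is detected there. A minimal sketch; the section layout, detection format, and confidence threshold are illustrative assumptions:

```python
def sprinkler_commands(detections, n_sections, min_conf=0.5):
    """Turn each headgate section's sprinkler on only if a cow is detected there.

    detections: iterable of (section_index, confidence) pairs from the detector.
    Returns one on/off flag per section.
    """
    occupied = {s for s, conf in detections if conf >= min_conf}
    return [s in occupied for s in range(n_sections)]

# Two of four sections occupied, so only those sprinklers run; gating water
# to occupied sections is where the reported savings come from.
print(sprinkler_commands([(0, 0.93), (2, 0.81), (3, 0.31)], 4))
# [True, False, True, False]
```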

17 pages, 2325 KB  
Article
Stabilizing and Optimizing of Automatic Leaf Area Index Estimation in Temporal Forest
by Junghee Lee, Nanghyun Cho, Woohyeok Kim, Jungho Im and Kyungmin Kim
Forests 2025, 16(11), 1691; https://doi.org/10.3390/f16111691 - 6 Nov 2025
Viewed by 510
Abstract
Under climate change, the importance of ecosystem monitoring has been repeatedly emphasized over the past decades. Leaf Area Index (LAI), a key ecosystem variable linking the atmosphere and rhizosphere, has been widely studied through various LAI measurement methods. As satellite-based LAI products continue to advance, the demand for extensive and periodic in situ LAI observations has also increased. In this study, we evaluated combinations of binarization techniques and temporal filtering to reduce variability in an automatic in situ LAI observation network using fisheye lens imagery, which was established by the National Institute of Forest Science (NIFoS). Compared with widely used methods such as Otsu thresholding and K-means clustering, the deep learning (DL) method showed more stable LAI time series under field conditions. Under different illumination conditions, mean LAI values fluctuated significantly, from 0.89 to 3.15, depending on image acquisition time. Furthermore, sixteen temporal filtering methods were tested to identify a reasonable range of LAI values, with optimal post-processing strategies suggested: a seven-day moving average for maximum LAI (difference range among filtering methods: −6.1 to −1.5) and a three-day moving average excluding rainy days for minimum LAI (difference range among filtering methods: 0 to 0.9). This study highlights uncertainties in canopy classification methods, the effects of acquisition timing and lighting, and the necessity of outlier filtering in automatic LAI networks. Despite these challenges, the need for automated LAI observation systems is growing, particularly in complex and fragmented forests such as those found in South Korea.
(This article belongs to the Section Forest Ecology and Management)
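The two suggested post-processing strategies are one-liners in pandas. A sketch with synthetic data; the column names, dates, and rain frequency are illustrative:

```python
import numpy as np
import pandas as pd

# Daily LAI series with noise and a rain flag.
days = pd.date_range("2024-06-01", periods=30, freq="D")
df = pd.DataFrame({
    "lai": 3.0 + 0.4 * np.random.randn(30),
    "rainy": np.random.rand(30) < 0.2,
}, index=days)

# Seven-day moving average for the maximum-LAI estimate.
lai_max = df["lai"].rolling(7, min_periods=1).mean()

# Three-day moving average excluding rainy days for the minimum-LAI estimate.
lai_min = df.loc[~df["rainy"], "lai"].rolling(3, min_periods=1).mean()
```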

17 pages, 490 KB  
Article
Knowledge-Guided Symbolic Regression for Interpretable Camera Calibration
by Rui Pimentel de Figueiredo
J. Imaging 2025, 11(11), 389; https://doi.org/10.3390/jimaging11110389 - 2 Nov 2025
Viewed by 785
Abstract
Calibrating cameras accurately requires the identification of projection and distortion models that effectively account for lens-specific deviations. Conventional formulations, like the pinhole model or radial–tangential corrections, often struggle to represent the asymmetric and nonlinear distortions encountered in complex environments such as autonomous navigation, robotics, and immersive imaging. Although neural methods offer greater adaptability, they demand extensive training data, are computationally intensive, and often lack transparency. This work introduces a symbolic model discovery framework guided by physical knowledge, where symbolic regression and genetic programming (GP) are used in tandem to identify calibration models tailored to specific optical behaviors. The approach incorporates a broad class of known distortion models, including Brown–Conrady, Mei–Rives, Kannala–Brandt, and double-sphere, as modular components, while remaining extensible to any predefined or domain-specific formulation. Embedding these models directly into the symbolic search process constrains the solution space, enabling efficient parameter fitting and robust model selection without overfitting. Through empirical evaluation across a variety of lens types, including fisheye, omnidirectional, catadioptric, and traditional cameras, we show that our method produces results on par with or surpassing those of established calibration techniques. The outcome is a flexible, interpretable, and resource-efficient alternative suitable for deployment scenarios where calibration data are scarce or computational resources are constrained.
(This article belongs to the Special Issue Celebrating the 10th Anniversary of the Journal of Imaging)
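Of the modular components the abstract names, Brown–Conrady is the most compact to write out. The sketch below applies its standard radial-tangential form to normalized image coordinates; the coefficient values are placeholders, and a symbolic-regression search of the kind described would treat blocks like this as candidate building blocks whose parameters are fitted to calibration correspondences:

```python
import numpy as np

def brown_conrady(xy, k1, k2, k3, p1, p2):
    """Apply Brown-Conrady radial-tangential distortion to normalized
    image coordinates xy of shape (..., 2)."""
    x, y = xy[..., 0], xy[..., 1]
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    y_d = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return np.stack([x_d, y_d], axis=-1)

pts = np.array([[0.1, 0.2], [-0.3, 0.05]])
print(brown_conrady(pts, k1=-0.2, k2=0.05, k3=0.0, p1=1e-3, p2=-5e-4))
```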

20 pages, 3686 KB  
Article
Comparative Analysis of Correction Methods for Multi-Camera 3D Image Processing System and Its Application Design in Safety Improvement on Hot-Working Production Line
by Joanna Gąbka
Appl. Sci. 2025, 15(16), 9136; https://doi.org/10.3390/app15169136 - 19 Aug 2025
Viewed by 1027
Abstract
The paper presents the results of research focused on configuring a system for stereoscopic view capturing and processing. The system is being developed for use in staff training scenarios based on Virtual Reality (VR), where high-quality, distortion-free imagery is essential. This research addresses key challenges in image distortion, including the fish-eye effect and other aberrations. In addition, it considers the computational and bandwidth efficiency required for effective and economical streaming and real-time display of recorded content. Measurements and calculations were performed using a selected set of cameras, adapters, and lenses, chosen based on predefined criteria. A comparative analysis was conducted between the nearest-neighbour linear interpolation method and a third-order polynomial interpolation (ABCD polynomial). These methods were tested and evaluated using three different computational approaches, each aimed at optimizing data processing efficiency critical for real-time image correction. Images captured during real-time video transmission, processed using the developed correction techniques, are presented. In the final sections, the paper describes the configuration of an innovative VR-based training system incorporating an edge computing device. A case study involving a factory producing wheel rims is also presented to demonstrate the practical application of the system.
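A rough sketch of the kind of correction being compared: a third-order ("ABCD"-style) radial polynomial defines the remap, and OpenCV's interpolation flags stand in for the cheap-versus-smooth sampling trade-off the paper quantifies. The coefficients and the normalization are illustrative placeholders, not the paper's values:

```python
import numpy as np
import cv2

def radial_maps(w, h, a, b, c, d):
    """Remap grids scaling each pixel's radius by a cubic polynomial."""
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    x, y = u - w / 2, v - h / 2
    rn = np.sqrt(x * x + y * y)
    rn /= rn.max() + 1e-6                       # normalized radius in [0, 1]
    scale = a * rn**3 + b * rn**2 + c * rn + d  # third-order "ABCD" polynomial
    return ((x * scale + w / 2).astype(np.float32),
            (y * scale + h / 2).astype(np.float32))

img = np.random.randint(0, 255, (480, 640, 3), dtype=np.uint8)
mx, my = radial_maps(640, 480, a=0.1, b=-0.05, c=0.0, d=1.0)
fast = cv2.remap(img, mx, my, cv2.INTER_NEAREST)   # cheapest, blockier output
smooth = cv2.remap(img, mx, my, cv2.INTER_CUBIC)   # costlier, smoother output
```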

27 pages, 5515 KB  
Article
Optimizing Multi-Camera Mobile Mapping Systems with Pose Graph and Feature-Based Approaches
by Ahmad El-Alailyi, Luca Morelli, Paweł Trybała, Francesco Fassi and Fabio Remondino
Remote Sens. 2025, 17(16), 2810; https://doi.org/10.3390/rs17162810 - 13 Aug 2025
Cited by 1 | Viewed by 2893
Abstract
Multi-camera Visual Simultaneous Localization and Mapping (V-SLAM) increases spatial coverage through multi-view image streams, improving localization accuracy and reducing data acquisition time. Despite its speed and general robustness, V-SLAM often struggles to achieve the precise camera poses necessary for accurate 3D reconstruction, especially in complex environments. This study introduces two novel multi-camera optimization methods to enhance pose accuracy, reduce drift, and ensure loop closures. These methods refine multi-camera V-SLAM outputs within existing frameworks and are evaluated in two configurations: (1) multiple independent stereo V-SLAM instances operating on separate camera pairs; and (2) multi-view odometry processing all camera streams simultaneously. The proposed optimizations include (1) a multi-view feature-based optimization that integrates V-SLAM poses with rigid inter-camera constraints and bundle adjustment; and (2) a multi-camera pose graph optimization that fuses multiple trajectories using relative pose constraints and robust noise models. Validation is conducted through two complex 3D surveys using the ATOM-ANT3D multi-camera fisheye mobile mapping system. Results demonstrate survey-grade accuracy comparable to traditional photogrammetry, with reduced computational time, advancing toward near real-time 3D mapping of challenging environments.
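To make the pose-graph idea concrete, here is a minimal SE(2) version: poses are (x, y, theta) nodes, each edge stores a relative-pose measurement, and a least-squares solver distributes the loop-closure error. This is a 2D toy under assumed edge values, without the robust noise models or 3D parameterization the paper uses:

```python
import numpy as np
from scipy.optimize import least_squares

def relative_pose(pa, pb):
    """Pose of pb expressed in the frame of pa; poses are (x, y, theta)."""
    dx, dy = pb[0] - pa[0], pb[1] - pa[1]
    c, s = np.cos(pa[2]), np.sin(pa[2])
    return np.array([c * dx + s * dy, -s * dx + c * dy, pb[2] - pa[2]])

def residuals(flat, edges):
    poses = flat.reshape(-1, 3)
    res = [relative_pose(poses[i], poses[j]) - np.asarray(m)
           for i, j, m in edges]
    res.append(poses[0])          # gauge constraint: pin the first pose
    return np.concatenate(res)

# Two odometry edges plus one noisy loop closure (node 2 back to node 0).
edges = [(0, 1, [1.0, 0.0, 0.0]),
         (1, 2, [1.0, 0.0, np.pi / 2]),
         (2, 0, [0.05, 1.95, -np.pi / 2])]
sol = least_squares(residuals, np.zeros(9), args=(edges,))
print(sol.x.reshape(-1, 3).round(2))   # optimized (x, y, theta) per node
```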

19 pages, 9147 KB  
Article
Evaluating Forest Canopy Structures and Leaf Area Index Using a Five-Band Depth Image Sensor
by Geilebagan, Takafumi Tanaka, Takashi Gomi, Ayumi Kotani, Genya Nakaoki, Xinwei Wang and Shodai Inokoshi
Forests 2025, 16(8), 1294; https://doi.org/10.3390/f16081294 - 8 Aug 2025
Viewed by 1366
Abstract
The objective of the study was to develop and validate a ground-based method using a depth image sensor equipped with depth, visible red, green, blue (RGB), and near-infrared bands to measure the leaf area index (LAI) based on the relative illuminance of foliage only. The method was applied in an Itajii chinkapin (Castanopsis sieboldii (Makino) Hatus. ex T.Yamaz. & Mashiba) forest in Aichi Prefecture, Japan, and validated by comparing estimates with conventional methods (LAI-2200 and fisheye photography). To apply the five-band sensor to actual forests, a methodology is proposed for matching the color camera and near-infrared camera at the pixel level, along with a method for widening the exposure range through multi-step camera exposure. Based on these advancements, the RGB color band, near-infrared band, and depth band are converted into several physical properties. Employing these properties, each pixel of the canopy image is classified into upper foliage, lower foliage, sky, and non-assimilated parts (stems and branches). Subsequently, the LAI is calculated using the gap-fraction method, which is based on the relative illuminance of the foliage. In comparison with existing indirect LAI estimations, this technique enabled the distinction between upper and lower canopy layers and the exclusion of non-assimilated parts. The findings indicate that the plant area index (PAI) ranged from 2.23 to 3.68 m2 m−2, representing an increase of 33% to 34% compared to the LAI calculated after excluding non-assimilating parts. The findings of this study underscore the necessity of distinguishing non-assimilated components in the estimation of LAI. The PAI estimates derived from the depth image sensor exhibited moderate to strong agreement with the LAI-2200, depending on the canopy ring (R2 = 0.48–0.98), thereby substantiating the reliability of the system's performance. The developed approaches also permit the evaluation of the distributions of leaves and branches at various heights from the ground surface to the top of the canopy. The novel LAI measurement method developed in this study has the potential to provide precise, reliable foundational data to support research in ecology and hydrology related to complex tree structures.
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)
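The gap-fraction method mentioned in the abstract rests on the Beer-Lambert relation P(θ) = exp(−G·LAI/cos θ), inverted per zenith-angle ring. A minimal sketch; the projection coefficient and the example numbers are illustrative, not the paper's:

```python
import numpy as np

def lai_from_gap_fraction(gap_fraction, theta_deg, g=0.5):
    """Invert P(theta) = exp(-g * LAI / cos(theta)) for one zenith ring.

    g is the leaf projection function (0.5 corresponds to a spherical
    leaf-angle distribution, a common default assumption).
    """
    theta = np.radians(theta_deg)
    return -np.cos(theta) * np.log(gap_fraction) / g

# A ring at 30 degrees zenith where 20% of classified pixels are sky.
print(f"LAI ~ {lai_from_gap_fraction(0.2, 30.0):.2f}")  # ~2.79
```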

28 pages, 9378 KB  
Article
A Semantic Segmentation-Based GNSS Signal Occlusion Detection and Optimization Method
by Zhe Yue, Chenchen Sun, Xuerong Zhang, Chengkai Tang, Yuting Gao and Kezhao Li
Remote Sens. 2025, 17(15), 2725; https://doi.org/10.3390/rs17152725 - 6 Aug 2025
Viewed by 3057
Abstract
Existing research fails to effectively address the problem of increased GNSS positioning errors caused by non-line-of-sight (NLOS) and line-of-sight (LOS) signal attenuation due to obstructions such as buildings and trees in complex urban environments. To address this issue, we adopt an environmental perception perspective and propose a semantic segmentation-based GNSS signal occlusion detection and optimization method. The approach distinguishes between building and tree occlusions and adjusts signal weights accordingly to enhance positioning accuracy. First, a fisheye camera captures environmental imagery above the vehicle, which is then processed using deep learning to segment sky, tree, and building regions. Subsequently, satellite projections are mapped onto the segmented sky image to classify signal occlusions. Then, based on the type of obstruction, a dynamic weight optimization model is constructed to adjust the contribution of each satellite in the positioning solution, thereby enhancing the positioning accuracy of vehicle navigation in urban environments. Finally, we construct a vehicle-mounted navigation system for experimentation. The experimental results demonstrate that the proposed method enhances accuracy by 16% and 10% compared to the existing GNSS/INS/Canny and GNSS/INS/Flood Fill methods, respectively, confirming its effectiveness in complex urban environments.
(This article belongs to the Special Issue GNSS and Multi-Sensor Integrated Precise Positioning and Applications)
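The projection-and-lookup step, mapping a satellite's azimuth/elevation into the upward-looking fisheye segmentation and assigning a weight by occlusion class, can be sketched compactly. The equidistant camera model and the weight table are illustrative assumptions standing in for the paper's dynamic weight model:

```python
import numpy as np

SKY, TREE, BUILDING = 0, 1, 2

def satellite_pixel(az_deg, el_deg, cx, cy, f_fish):
    """Project a satellite into an upward-looking equidistant fisheye image:
    the image radius grows linearly with the zenith angle (90 - elevation)."""
    zenith = np.radians(90.0 - el_deg)
    az = np.radians(az_deg)                     # azimuth measured from north
    r = f_fish * zenith
    return int(round(cx + r * np.sin(az))), int(round(cy - r * np.cos(az)))

def satellite_weight(seg, az_deg, el_deg, cx, cy, f_fish):
    """Down-weight a satellite by the segmentation class its projection hits."""
    u, v = satellite_pixel(az_deg, el_deg, cx, cy, f_fish)
    return {SKY: 1.0, TREE: 0.5, BUILDING: 0.1}[int(seg[v, u])]

seg = np.full((400, 400), SKY)
seg[:, 300:] = BUILDING                         # a building blocks the east side
print(satellite_weight(seg, az_deg=90, el_deg=30, cx=200, cy=200, f_fish=120))
# 0.1 -> this satellite's contribution is strongly down-weighted
```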

33 pages, 10063 KB  
Article
Wide-Angle Image Distortion Correction and Embedded Stitching System Design Based on Swin Transformer
by Shiwen Lai, Zuling Cheng, Wencui Zhang and Maowei Chen
Appl. Sci. 2025, 15(14), 7714; https://doi.org/10.3390/app15147714 - 9 Jul 2025
Cited by 2 | Viewed by 1513
Abstract
Wide-angle images often suffer from severe radial distortion, compromising geometric accuracy and challenging image correction and real-time stitching, especially in resource-constrained embedded environments. To address this, the study proposes a wide-angle image correction and stitching framework based on a Swin Transformer, optimized for lightweight deployment on edge devices. The model integrates multi-scale feature extraction, Thin Plate Spline (TPS) control point prediction, and optical flow-guided constraints, balancing correction accuracy and computational efficiency. Experiments on synthetic and real-world datasets show that the method outperforms mainstream algorithms, with PSNR gains of 3.28 dB and 2.18 dB on wide-angle and fisheye images, respectively, while maintaining real-time performance. To validate practical applicability, the model is deployed on a Jetson TX2 NX device, and a real-time dual-camera stitching system is built using C++ and DeepStream. The system achieves 15 FPS at 1400 × 1400 resolution, with a correction latency of 56 ms and stitching latency of 15 ms, demonstrating efficient hardware utilization and stable performance. This study presents a deployable, scalable, and edge-compatible solution for wide-angle image correction and real-time stitching, offering practical value for applications such as smart surveillance, autonomous driving, and industrial inspection.
(This article belongs to the Special Issue Latest Research on Computer Vision and Image Processing)
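The reported correction gains are in PSNR, which is straightforward to compute between a reference and a corrected image. A minimal sketch with synthetic data:

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB between two same-sized images."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)

ref = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
noisy = np.clip(ref + np.random.normal(0, 5, ref.shape), 0, 255).astype(np.uint8)
print(f"PSNR: {psnr(ref, noisy):.2f} dB")
```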

23 pages, 14051 KB  
Article
A Novel Method for Water Surface Debris Detection Based on YOLOV8 with Polarization Interference Suppression
by Yi Chen, Honghui Lin, Lin Xiao, Maolin Zhang and Pingjun Zhang
Photonics 2025, 12(6), 620; https://doi.org/10.3390/photonics12060620 - 18 Jun 2025
Cited by 1 | Viewed by 1349
Abstract
Aquatic floating debris detection is a key technological foundation for ecological monitoring and integrated water environment management. It holds substantial scientific and practical value in applications such as pollution source tracing, floating debris control, and maritime navigation safety. However, this field faces ongoing challenges due to water surface polarization. Reflections of polarized light produce intense glare, resulting in localized overexposure, detail loss, and geometric distortion in captured images. These optical artifacts severely impair the performance of conventional detection algorithms, increasing both false positives and missed detections. To overcome these imaging challenges in complex aquatic environments, we propose a novel YOLOv8-based detection framework with integrated polarized light suppression mechanisms. The framework consists of four key components: a fisheye distortion correction module, a polarization feature processing layer, a customized residual network with Squeeze-and-Excitation (SE) attention, and a cascaded pipeline for super-resolution reconstruction and deblurring. Additionally, we developed the PSF-IMG dataset (Polarized Surface Floats), which includes common floating debris types such as plastic bottles, bags, and foam boards. Extensive experiments demonstrate the network's robustness in suppressing polarization artifacts and enhancing feature stability under dynamic optical conditions.
(This article belongs to the Special Issue Advancements in Optical Measurement Techniques and Applications)
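Of the four components named, the SE attention block is standard and compact enough to show. A PyTorch sketch of the textbook Squeeze-and-Excitation module; the reduction ratio is an assumed default, and the paper's customized residual network wraps blocks like this around its convolutions:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation channel attention."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)          # squeeze: global context
        self.fc = nn.Sequential(                     # excitation: channel gates
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                                 # reweight feature maps

feats = torch.randn(2, 64, 32, 32)
print(SEBlock(64)(feats).shape)  # torch.Size([2, 64, 32, 32])
```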

29 pages, 19553 KB  
Article
Let’s Go Bananas: Beyond Bounding Box Representations for Fisheye Camera-Based Object Detection in Autonomous Driving
by Senthil Yogamani, Ganesh Sistu, Patrick Denny and Jane Courtney
Sensors 2025, 25(12), 3735; https://doi.org/10.3390/s25123735 - 14 Jun 2025
Viewed by 1865
Abstract
Object detection is a mature problem in autonomous driving, with pedestrian detection being one of the first commercially deployed algorithms. It has been extensively studied in the literature. However, object detection is relatively less explored for fisheye cameras used for surround-view near-field sensing. The standard bounding-box representation fails in fisheye cameras due to heavy radial distortion, particularly in the periphery. In this paper, a generic object detection framework is implemented using the base YOLO (You Only Look Once) detector to systematically explore various object representations using the public WoodScape dataset. First, we implement basic representations, namely the standard bounding box, the oriented bounding box, and the ellipse. Secondly, we implement a generic polygon and propose a novel curvature-adaptive polygon, which obtains an improvement of 3 mAP (mean average precision) points. A polygon is expensive to annotate and complex to use in downstream tasks; thus, it is not practical to use in real-world applications. However, we utilize it to demonstrate that the accuracy gap between the polygon and the bounding box representation is very high due to strong distortion in fisheye cameras. This motivates the design of a distortion-aware optimal bounding-box representation for fisheye images, where objects tend to appear banana-shaped near the periphery. We derive a novel representation called a curved box and improve it further by leveraging vanishing-point constraints. The proposed curved box representations outperform the bounding box by 3 mAP points and the oriented bounding box by 1.6 mAP points. In addition, a camera geometry tensor is formulated to provide adaptation to non-linear fisheye camera distortion characteristics and improves the performance by a further 1.4 mAP points.
(This article belongs to the Special Issue Design, Communication, and Control of Autonomous Vehicle Systems)
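To give a feel for why curved representations matter, the toy below rasterizes a "curved box" as a polygon and measures its IoU against the plain axis-aligned box. The quadratic-offset parameterization is purely an illustrative assumption; the paper derives its curvature from the fisheye projection and vanishing-point constraints:

```python
import numpy as np
from shapely.geometry import Polygon

def curved_box(cx, cy, w, h, bend, n=20):
    """Toy 'curved box': both horizontal edges follow the same quadratic arc,
    so the box bends like a banana while keeping its area."""
    t = np.linspace(-1.0, 1.0, n)
    arc = bend * (1.0 - t**2)                      # 0 at the ends, peak mid-box
    top = [(cx + ti * w / 2, cy - h / 2 + a) for ti, a in zip(t, arc)]
    bottom = [(cx + ti * w / 2, cy + h / 2 + a)
              for ti, a in zip(t[::-1], arc[::-1])]
    return Polygon(top + bottom)

# The IoU against the plain bounding box shows how much area they disagree on,
# which is the gap a distortion-aware representation is meant to close.
curved = curved_box(100, 100, 80, 40, bend=15)
axis_aligned = Polygon([(60, 80), (140, 80), (140, 120), (60, 120)])
iou = curved.intersection(axis_aligned).area / curved.union(axis_aligned).area
print(f"IoU(curved, axis-aligned) = {iou:.2f}")
```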

42 pages, 47882 KB  
Article
Product Engagement Detection Using Multi-Camera 3D Skeleton Reconstruction and Gaze Estimation
by Matus Tanonwong, Yu Zhu, Naoya Chiba and Koichi Hashimoto
Sensors 2025, 25(10), 3031; https://doi.org/10.3390/s25103031 - 11 May 2025
Viewed by 1857
Abstract
Product engagement detection in retail environments is critical for understanding customer preferences through nonverbal cues such as gaze and hand movements. This study presents a system leveraging a 360-degree top-view fisheye camera combined with two perspective cameras, the only sensors required for deployment, effectively capturing subtle interactions even under occlusion or distant camera setups. Unlike conventional image-based gaze estimation methods that are sensitive to background variations and require capturing a person's full appearance, raising privacy concerns, our approach utilizes a novel Transformer-based encoder operating directly on 3D skeletal keypoints. This innovation significantly reduces privacy risks by avoiding personal appearance data and benefits from ongoing advancements in accurate skeleton estimation techniques. Experimental evaluation in a simulated retail environment demonstrates that our method effectively identifies critical gaze-object and hand-object interactions, reliably detecting customer engagement prior to product selection. Despite yielding slightly higher mean angular errors in gaze estimation compared to a recent image-based method, the Transformer-based model achieves comparable performance in gaze-object detection. Its robustness, generalizability, and inherent privacy preservation make it particularly suitable for deployment in practical retail scenarios such as convenience stores, supermarkets, and shopping malls, highlighting its superiority in real-world applicability.
(This article belongs to the Special Issue Feature Papers in Sensing and Imaging 2025&2026)
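The privacy argument hinges on the model consuming only skeletal keypoints, never pixels of the person. A minimal PyTorch sketch of a Transformer encoder over per-joint tokens; the joint count, layer sizes, and unit-vector output head are assumptions, not the paper's architecture:

```python
import torch
import torch.nn as nn

class SkeletonGazeEncoder(nn.Module):
    """Map 3D skeleton keypoints to a unit gaze-direction vector."""
    def __init__(self, n_joints=17, d_model=64):
        super().__init__()
        self.embed = nn.Linear(3, d_model)           # one token per 3D joint
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(d_model, 3)            # 3D gaze vector

    def forward(self, joints):                       # (batch, n_joints, 3)
        tokens = self.encoder(self.embed(joints))
        gaze = self.head(tokens.mean(dim=1))         # pool over joints
        return nn.functional.normalize(gaze, dim=-1) # unit gaze direction

model = SkeletonGazeEncoder()
print(model(torch.randn(8, 17, 3)).shape)  # torch.Size([8, 3])
```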
