Search Results (217)

Search Parameters:
Keywords = three-dimensional computer vision

23 pages, 3485 KB  
Article
MSGS-SLAM: Monocular Semantic Gaussian Splatting SLAM
by Mingkai Yang, Shuyu Ge and Fei Wang
Symmetry 2025, 17(9), 1576; https://doi.org/10.3390/sym17091576 - 20 Sep 2025
Viewed by 954
Abstract
With the iterative evolution of SLAM (Simultaneous Localization and Mapping) technology in the robotics domain, SLAM based on three-dimensional Gaussian distribution models has emerged as the state-of-the-art approach. This research proposes MSGS-SLAM (Monocular Semantic Gaussian Splatting SLAM), a novel system that integrates monocular vision with three-dimensional Gaussian distribution models within a semantic SLAM framework. Our approach exploits the inherent spherical symmetry of isotropic Gaussian distributions, enabling symmetric optimization processes that maintain computational efficiency while preserving geometric consistency. Current mainstream three-dimensional Gaussian semantic SLAM systems typically rely on depth sensors for map reconstruction and semantic segmentation, which significantly increases hardware costs and limits deployment in diverse scenarios. To overcome this limitation, this research introduces a depth estimation proxy framework based on Metric3D-V2, which effectively addresses the inherent deficiency of monocular vision systems in depth information acquisition. Additionally, our method leverages architectural symmetries in indoor environments to enhance semantic understanding through symmetric feature matching. The system thus achieves robust and efficient semantic feature integration and optimization without dedicated depth sensors, substantially reducing hardware dependency and expanding the application scope of three-dimensional Gaussian semantic SLAM. Furthermore, this research proposes a keyframe selection algorithm based on semantic guidance and proxy depth collaborative mechanisms, which effectively suppresses pose drift accumulated during long-term operation and thereby achieves robust global loop closure correction. Through systematic evaluation on multiple standard datasets, MSGS-SLAM achieves performance comparable to existing three-dimensional Gaussian semantic SLAM systems across key metrics including ATE RMSE, PSNR, and mIoU. Full article
(This article belongs to the Section Engineering and Materials)
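The monocular depth proxy is the pivotal idea in the abstract above. A minimal sketch of how a dense metric depth map (a stand-in for Metric3D-V2 output; the intrinsics and the depth-scaled radius rule are illustrative assumptions, not the authors' code) can seed isotropic Gaussian centers:

```python
import numpy as np

def backproject_depth(depth, K):
    """Back-project a dense depth map to 3D points (camera frame).

    depth : (H, W) metric depth, e.g. from a monocular estimator
            such as Metric3D-V2 (stand-in here).
    K     : (3, 3) pinhole intrinsics.
    Returns (H*W, 3) points usable as initial Gaussian centers.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    rays = pix @ np.linalg.inv(K).T          # unit-depth rays
    return rays * depth.reshape(-1, 1)       # scale by proxy depth

K = np.array([[525.0, 0, 319.5], [0, 525.0, 239.5], [0, 0, 1.0]])
depth = np.full((480, 640), 2.0)             # dummy proxy depth (m)
centers = backproject_depth(depth, K)
sigmas = 0.01 * depth.reshape(-1)            # assumed depth-scaled radius
```

Because the Gaussians are isotropic, each needs only a center and a scalar radius, which is the spherical symmetry the abstract exploits.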

22 pages, 13502 KB  
Article
AI Test Modeling for Computer Vision System—A Case Study
by Jerry Gao and Radhika Agarwal
Computers 2025, 14(9), 396; https://doi.org/10.3390/computers14090396 - 18 Sep 2025
Viewed by 728
Abstract
This paper presents an intelligent AI test modeling framework for computer vision systems, focused on image-based systems. A three-dimensional (3D) model using decision tables enables model-based function testing, automated test data generation, and comprehensive coverage analysis. A case study using the Seek by iNaturalist application demonstrates the framework’s applicability to real-world CV tasks, effectively identifying species and non-species under varying image conditions such as distance, blur, brightness, and grayscale. This study contributes a structured methodology that advances our academic understanding of model-based CV testing while offering practical tools for improving the robustness and reliability of AI-driven vision applications. Full article
(This article belongs to the Special Issue Advanced Image Processing and Computer Vision (2nd Edition))
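As a rough illustration of decision-table-driven test generation (the dimensions and levels below are assumptions based on the image conditions named in the abstract, not the paper's actual tables):

```python
from itertools import product

# Hypothetical decision-table dimensions for an image-classification CV app.
conditions = {
    "distance":   ["near", "mid", "far"],
    "blur":       ["none", "mild", "heavy"],
    "brightness": ["low", "normal", "high"],
    "grayscale":  [False, True],
}

def generate_test_cases(table):
    """Enumerate the full cross-product of condition levels.

    Each row of the decision table becomes one executable test case;
    coverage analysis then checks which rows have been exercised.
    """
    keys = list(table)
    for combo in product(*table.values()):
        yield dict(zip(keys, combo))

cases = list(generate_test_cases(conditions))
print(len(cases))  # 3 * 3 * 3 * 2 = 54 model-based test cases
```

Pairwise or constrained sampling can replace the full cross-product when the enumeration grows too large; the decision-table rows remain the unit of coverage either way.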

24 pages, 1501 KB  
Review
Artificial Intelligence and Digital Tools Across the Hepato-Pancreato-Biliary Surgical Pathway: A Systematic Review
by Andreas Efstathiou, Evgenia Charitaki, Charikleia Triantopoulou and Spiros Delis
J. Clin. Med. 2025, 14(18), 6501; https://doi.org/10.3390/jcm14186501 - 15 Sep 2025
Viewed by 759
Abstract
Background: Hepato-pancreato-biliary (HPB) surgery involves operations that depend heavily on precise imaging, careful planning, and intraoperative decision-making. The rapid emergence of artificial intelligence (AI) and digital tools has assisted in these domains. Methods: We performed a PRISMA-guided systematic review (searches through June 2025) of AI/digital technologies applied to HPB surgical care, including machine learning, deep learning, radiomics, augmented/mixed reality, and computer vision. Eligible studies had to address imaging interpretation, preoperative planning, intraoperative guidance, or outcome prediction. Results: In total, 38 studies met the inclusion criteria. AI-based imaging models showed high diagnostic performance for lesion detection and classification (commonly AUC ~0.80–0.98). Machine learning risk models frequently exceeded traditional scores for predicting postoperative complications (e.g., pancreatic fistula). AI-assisted three-dimensional visual reconstructions enhanced anatomical understanding for preoperative planning, while augmented- and mixed-reality systems enabled real-time intraoperative navigation in pilot series. Computer-vision systems recognized critical intraoperative landmarks (e.g., the critical view of safety) and detected hazards such as bleeding in near real time. Most included studies were retrospective, single-center, or feasibility designs, with limited external validation. Conclusions: AI and digital tools show promising results across the HPB pathway, from preoperative diagnostics to intraoperative safety and guidance. The evidence to date supports technical feasibility and suggests clinical benefit, but routine adoption and firmer conclusions should await prospective, multicenter validation and consistent reporting. With continued refinement, multidisciplinary collaboration, demonstrated cost-effectiveness, and attention to ethics and implementation, these technologies could improve the precision, safety, and outcomes of HPB surgery. Full article

25 pages, 1596 KB  
Review
A Survey of 3D Reconstruction: The Evolution from Multi-View Geometry to NeRF and 3DGS
by Shuai Liu, Mengmeng Yang, Tingyan Xing and Ran Yang
Sensors 2025, 25(18), 5748; https://doi.org/10.3390/s25185748 - 15 Sep 2025
Viewed by 2380
Abstract
Three-dimensional (3D) reconstruction is not only a core technology in computer vision and graphics, but also a key force driving the flourishing development of cutting-edge applications such as virtual reality (VR), augmented reality (AR), autonomous driving, and digital earth. With the rise of novel view synthesis technologies such as Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS), 3D reconstruction faces unprecedented development opportunities. This article introduces the basic principles of traditional 3D reconstruction methods, including Structure from Motion (SfM) and Multi-View Stereo (MVS), and analyzes their limitations in complex scenes and dynamic environments. Focusing on NeRF-related implicit 3D scene reconstruction techniques, it explores the advantages and challenges of using deep neural networks to learn and render high-quality 3D scenes from limited viewpoints. Based on the principles and characteristics of 3DGS-related technologies that have emerged in recent years, the latest progress and innovations in rendering quality, rendering efficiency, sparse-view input support, and dynamic 3D reconstruction are analyzed. Finally, the main challenges and opportunities facing current 3D reconstruction and novel view synthesis are discussed in depth, along with possible future technological breakthroughs and development directions. This article aims to provide a comprehensive perspective for researchers applying 3D reconstruction in fields such as digital twins and smart cities, while opening up new ideas and paths for future technological innovation and widespread application. Full article
(This article belongs to the Section Sensing and Imaging)
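For reference, the core volume-rendering rule behind NeRF-style implicit reconstruction, in its standard discretized form (general background, not a formula specific to this survey): a ray r is sampled at N points with densities σ_i, colors c_i, and sample spacings δ_i, and the pixel color is

```latex
\hat{C}(\mathbf{r}) = \sum_{i=1}^{N} T_i \left(1 - e^{-\sigma_i \delta_i}\right) \mathbf{c}_i,
\qquad
T_i = \exp\!\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right).
```

3DGS trades this per-ray sampling for depth-sorted alpha blending of rasterized Gaussians, which is the source of its real-time rendering speed.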

25 pages, 21209 KB  
Article
Hyperspectral Image Classification Using a Spectral-Cube Gated Harmony Network
by Nana Li, Wentao Shen and Qiuwen Zhang
Electronics 2025, 14(17), 3553; https://doi.org/10.3390/electronics14173553 - 6 Sep 2025
Viewed by 546
Abstract
In recent years, hybrid models that integrate Convolutional Neural Networks (CNNs) with Vision Transformers (ViTs) have achieved significant improvements in hyperspectral image classification (HSIC). Nevertheless, their complex architectures often lead to computational redundancy and inefficient feature fusion, and they particularly struggle to balance global modeling with local detail extraction in high-dimensional spectral data. To solve these issues, this paper proposes a Spectral-Cube Gated Harmony Network (SCGHN) that achieves efficient spectral–spatial joint feature modeling through a dynamic gating mechanism and a hierarchical feature decoupling strategy. This paper makes three primary contributions. Firstly, we design a Spectral Cooperative Parallel Convolution (SCPC) module that combines dynamic gating in the spectral dimension with spatially deformable convolution. This module adopts a dual-path parallel architecture that adaptively enhances key bands and captures local textures, thereby significantly improving feature discriminability at mixed ground-object boundaries. Secondly, we propose a Dual-Gated Fusion (DGF) module that achieves cross-scale contextual complementarity through group convolution and lightweight attention, enhancing hierarchical semantic representations at significantly lower computational complexity. Finally, through the coordinated design of 3D convolution and lightweight classification decision blocks, we construct an end-to-end lightweight framework that effectively alleviates the structural redundancy of traditional hybrid models. Extensive experiments on three standard hyperspectral datasets show that SCGHN requires fewer parameters and exhibits lower computational complexity than existing HSIC methods. Full article
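A minimal sketch of the dual-gated fusion idea (a generic gated blend of two feature branches; the module layout, shapes, and group count are assumptions, not the SCGHN definition):

```python
import torch
import torch.nn as nn

class DualGatedFusion(nn.Module):
    """Blend two feature branches with learned, per-pixel gates.

    A 1x1 group convolution produces a gate map; the output is a
    convex combination of the spectral and spatial branches.
    """
    def __init__(self, channels: int, groups: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(2 * channels, channels, kernel_size=1, groups=groups),
            nn.Sigmoid(),
        )

    def forward(self, spectral: torch.Tensor, spatial: torch.Tensor):
        g = self.gate(torch.cat([spectral, spatial], dim=1))
        return g * spectral + (1 - g) * spatial  # gated blend of branches

fuse = DualGatedFusion(channels=64)
a, b = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
out = fuse(a, b)  # (2, 64, 32, 32)
```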

30 pages, 5669 KB  
Article
Vision and 2D LiDAR Fusion-Based Navigation Line Extraction for Autonomous Agricultural Robots in Dense Pomegranate Orchards
by Zhikang Shi, Ziwen Bai, Kechuan Yi, Baijing Qiu, Xiaoya Dong, Qingqing Wang, Chunxia Jiang, Xinwei Zhang and Xin Huang
Sensors 2025, 25(17), 5432; https://doi.org/10.3390/s25175432 - 2 Sep 2025
Cited by 1 | Viewed by 944
Abstract
To address the insufficient accuracy of traditional single-sensor navigation methods in dense planting environments of pomegranate orchards, this paper proposes a vision and LiDAR fusion-based navigation line extraction method for orchard environments. The proposed method integrates a YOLOv8-ResCBAM trunk detection model, a reverse ray projection fusion algorithm, and geometric constraint-based navigation line fitting techniques. The object detection model enables high-precision real-time detection of pomegranate tree trunks. A reverse ray projection algorithm is proposed to convert pixel coordinates from visual detection into three-dimensional rays and compute their intersections with LiDAR scanning planes, achieving effective association between visual and LiDAR data. Finally, geometric constraints are introduced to improve the RANSAC algorithm for navigation line fitting, combined with Kalman filtering techniques to reduce navigation line fluctuations. Field experiments demonstrate that the proposed fusion-based navigation method improves navigation accuracy over single-sensor methods and semantic-segmentation methods, reducing the average lateral error to 5.2 cm, yielding an average lateral error RMS of 6.6 cm, and achieving a navigation success rate of 95.4%. These results validate the effectiveness of the vision and 2D LiDAR fusion-based approach in complex orchard environments and provide a viable route toward autonomous navigation for orchard robots. Full article
(This article belongs to the Section Sensors and Robotics)
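The reverse ray projection step can be sketched as follows (pinhole model; the intrinsics, extrinsics, and a LiDAR scan plane at z = 0 in the LiDAR frame are assumptions for illustration, not the paper's calibration):

```python
import numpy as np

def pixel_to_lidar_point(u, v, K, R_cl, t_cl):
    """Cast a ray through pixel (u, v) and intersect the LiDAR scan plane.

    K          : (3, 3) camera intrinsics.
    R_cl, t_cl : rotation/translation taking camera-frame directions and
                 the camera center into the LiDAR frame (from calibration).
    The 2D LiDAR sweeps the plane z = 0 in its own frame.
    """
    d_cam = np.linalg.inv(K) @ np.array([u, v, 1.0])  # ray dir, camera frame
    d = R_cl @ d_cam                                  # direction, LiDAR frame
    o = t_cl                                          # camera center, LiDAR frame
    if abs(d[2]) < 1e-9:
        return None                                   # ray parallel to scan plane
    s = -o[2] / d[2]                                  # solve o_z + s * d_z = 0
    if s <= 0:
        return None                                   # intersection behind camera
    return o + s * d                                  # 3D point on the scan plane

K = np.array([[900.0, 0, 640.0], [0, 900.0, 360.0], [0, 0, 1.0]])
R_cl, t_cl = np.eye(3), np.array([0.0, 0.0, -0.10])   # dummy extrinsics
print(pixel_to_lidar_point(640, 500, K, R_cl, t_cl))
```

Each trunk detection thus contributes one plane intersection, and those points are what the constrained RANSAC line fit consumes.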

23 pages, 28830 KB  
Article
Micro-Expression-Based Facial Analysis for Automated Pain Recognition in Dairy Cattle: An Early-Stage Evaluation
by Shuqiang Zhang, Kashfia Sailunaz and Suresh Neethirajan
AI 2025, 6(9), 199; https://doi.org/10.3390/ai6090199 - 22 Aug 2025
Viewed by 1066
Abstract
Timely, objective pain recognition in dairy cattle is essential for welfare assurance, productivity, and ethical husbandry, yet remains elusive because evolutionary pressure renders bovine distress signals brief and inconspicuous. Without verbal self-reporting, cows suppress overt cues, so automated vision is indispensable for on-farm triage. Although earlier systems tracked whole-body posture or static grimace scales, frame-level detection of facial micro-expressions has not been fully explored in livestock. We translate micro-expression analytics from automotive driver monitoring to the barn, linking modern computer vision with veterinary ethology. Our two-stage pipeline first detects faces and 30 landmarks using a custom You Only Look Once (YOLO) version 8-Pose network, achieving a 96.9% mean average precision (mAP) at an Intersection over Union (IoU) threshold of 0.50 for detection and 83.8% Object Keypoint Similarity (OKS) for keypoint placement. Cropped eye, ear, and muzzle patches are encoded using a pretrained MobileNetV2, generating 3840-dimensional descriptors that capture millisecond muscle twitches. Sequences of five consecutive frames are fed into a 128-unit Long Short-Term Memory (LSTM) classifier that outputs pain probabilities. On a held-out validation set of 1700 frames, the system records 99.65% accuracy and an F1-score of 0.997, with only three false positives and three false negatives. Tested on 14 unseen barn videos, it attains 64.3% clip-level accuracy (i.e., overall accuracy for the whole video clip) and 83% precision for the pain class, using a hybrid aggregation rule that combines a 30% mean probability threshold with micro-burst counting to temper false alarms. As an early exploration from our proof-of-concept study on a subset of our custom dairy farm datasets, these results show that micro-expression mining can deliver scalable, non-invasive pain surveillance across variations in illumination, camera angle, background, and individual morphology. Future work will explore attention-based temporal pooling, curriculum learning for variable window lengths, domain-adaptive fine-tuning, and multimodal fusion with accelerometry on the complete datasets to elevate performance toward clinical deployment. Full article
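The clip-level aggregation rule can be sketched roughly as follows (the 30% mean threshold is from the abstract; the burst length, burst probability, and burst count are hypothetical parameters):

```python
import numpy as np

def classify_clip(frame_probs, mean_thresh=0.30, burst_len=3,
                  burst_prob=0.8, min_bursts=1):
    """Hybrid clip-level pain decision from per-frame probabilities.

    A clip is flagged as 'pain' only if the mean probability clears a
    threshold AND enough short runs ('micro-bursts') of consecutive
    high-probability frames are present, tempering one-off false alarms.
    """
    p = np.asarray(frame_probs)
    high = p >= burst_prob
    bursts, run = 0, 0
    for h in high:
        run = run + 1 if h else 0
        if run == burst_len:          # count each run once, at length burst_len
            bursts += 1
    return p.mean() >= mean_thresh and bursts >= min_bursts

probs = [0.1, 0.2, 0.9, 0.95, 0.85, 0.3, 0.2, 0.1]
print(classify_clip(probs))  # mean 0.45 >= 0.30 and one 3-frame burst -> True
```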

31 pages, 2542 KB  
Article
ECR-MobileNet: An Imbalanced Largemouth Bass Parameter Prediction Model with Adaptive Contrastive Regression and Dependency-Graph Pruning
by Hao Peng, Cheng Ouyang, Lin Yang, Jingtao Deng, Mingyu Tan, Yahui Luo, Wenwu Hu, Pin Jiang and Yi Wang
Animals 2025, 15(16), 2443; https://doi.org/10.3390/ani15162443 - 20 Aug 2025
Viewed by 568
Abstract
The precise, non-destructive monitoring of fish length and weight is a core technology for advancing intelligent aquaculture. However, this field faces dual challenges: traditional contact-based measurements induce stress and yield loss. In addition, existing computer vision methods are hindered by prediction biases from imbalanced data and the deployment bottleneck of balancing high accuracy with model lightweighting. This study aims to overcome these challenges by developing an efficient and robust deep learning framework. We propose ECR-MobileNet, a lightweight framework built on MobileNetV3-Small. It features three key innovations: an efficient channel attention (ECA) module to enhance feature discriminability, an original adaptive multi-scale contrastive regression (AMCR) loss function that extends contrastive learning to multi-dimensional regression for length and weight simultaneously to mitigate data imbalance, and a dependency-graph-based (DepGraph) structured pruning technique that synergistically optimizes model size and performance. On our multi-scene largemouth bass dataset, the pruned ECR-MobileNet-P model comprehensively outperformed 14 mainstream benchmarks. It achieved an R2 of 0.9784 and a root mean square error (RMSE) of 0.4296 cm for length prediction, as well as an R2 of 0.9740 and an RMSE of 0.0202 kg for weight prediction. The model’s parameter count is only 0.52 M, with a computational load of 0.07 giga floating-point operations per second (GFLOPs) and a CPU latency of 10.19 ms, achieving Pareto optimality. This study provides an edge-deployable solution for stress-free biometric monitoring in aquaculture and establishes an innovative methodological paradigm for imbalanced regression and task-oriented model compression. Full article
(This article belongs to the Section Aquatic Animals)
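The efficient channel attention (ECA) module named above follows a well-documented design (Wang et al., CVPR 2020); a standard sketch, with the kernel size fixed rather than adapted to the channel count, is:

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: per-channel gates from a cheap 1D conv.

    Global average pooling summarizes each channel; a k-tap 1D convolution
    across the channel axis models local cross-channel interaction without
    the dimensionality reduction used by SE blocks.
    """
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) -> channel descriptor (B, 1, C)
        y = x.mean(dim=(2, 3)).unsqueeze(1)
        gate = torch.sigmoid(self.conv(y)).squeeze(1)  # (B, C)
        return x * gate[:, :, None, None]              # rescale channels

feat = torch.randn(2, 64, 32, 32)
out = ECA()(feat)  # same shape, channel-reweighted
```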

21 pages, 2712 KB  
Review
The State of the Art and Potentialities of UAV-Based 3D Measurement Solutions in the Monitoring and Fault Diagnosis of Quasi-Brittle Structures
by Mohammad Hajjar, Emanuele Zappa and Gabriella Bolzon
Sensors 2025, 25(16), 5134; https://doi.org/10.3390/s25165134 - 19 Aug 2025
Viewed by 1036
Abstract
The structural health monitoring (SHM) of existing infrastructure and heritage buildings is essential for their preservation and safety. This review focuses on modern three-dimensional (3D) measurement techniques, particularly those that enable assessment of the structural response to environmental actions and operational conditions. The emphasis is on the detection of fractures and the identification of crack geometry. While traditional monitoring systems—such as pendula, callipers, and strain gauges—have been widely used in massive, quasi-brittle structures like dams and masonry buildings, advances in non-contact and computer-vision-based methods increasingly offer flexible and efficient alternatives. The integration of drone-mounted systems facilitates access to challenging inspection zones, enabling the acquisition of quantitative data from full-field surface measurements. Among the reviewed techniques, digital image correlation (DIC) stands out for its superior displacement accuracy, while photogrammetry and time-of-flight (ToF) technologies offer greater operational flexibility but require additional processing to extract displacement data. The collected information contributes to the calibration of digital twins, supporting predictive simulations and real-time anomaly detection. Emerging tools based on machine learning and digital technologies further enhance damage detection capabilities and inform retrofitting strategies. Overall, vision-based methods show strong potential for outdoor SHM applications, though practical constraints such as drone payload and calibration requirements must be carefully managed. Full article
(This article belongs to the Special Issue Feature Review Papers in Fault Diagnosis & Sensors)
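As a toy illustration of the displacement-tracking principle behind DIC (real DIC adds subpixel shape functions and interpolation; this is plain normalized cross-correlation with OpenCV, not a DIC implementation):

```python
import cv2
import numpy as np

def track_subset(ref, cur, top_left, size=31):
    """Track one square subset from a reference to a current frame.

    Returns the integer-pixel displacement of the subset that maximizes
    normalized cross-correlation; DIC refines this to subpixel accuracy.
    """
    y, x = top_left
    subset = ref[y:y + size, x:x + size]
    score = cv2.matchTemplate(cur, subset, cv2.TM_CCOEFF_NORMED)
    best = np.unravel_index(np.argmax(score), score.shape)
    return best[1] - x, best[0] - y   # (dx, dy) in pixels

ref = np.random.randint(0, 255, (240, 320), np.uint8)  # dummy speckle image
cur = np.roll(ref, shift=(2, 3), axis=(0, 1))          # shift down 2, right 3
print(track_subset(ref, cur, top_left=(100, 100)))     # -> (3, 2)
```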

21 pages, 4909 KB  
Article
Rapid 3D Camera Calibration for Large-Scale Structural Monitoring
by Fabio Bottalico, Nicholas A. Valente, Christopher Niezrecki, Kshitij Jerath, Yan Luo and Alessandro Sabato
Remote Sens. 2025, 17(15), 2720; https://doi.org/10.3390/rs17152720 - 6 Aug 2025
Cited by 1 | Viewed by 958
Abstract
Computer vision techniques such as three-dimensional digital image correlation (3D-DIC) and three-dimensional point tracking (3D-PT) have demonstrated broad applicability for monitoring the condition of large-scale engineering systems by reconstructing and tracking dynamic point clouds corresponding to the surface of a structure. Accurate stereophotogrammetry requires the stereo cameras to be calibrated to determine their intrinsic and extrinsic parameters by capturing multiple images of a calibration object. This image-based approach becomes cumbersome and time-consuming as the size of the tested object increases. To streamline calibration and make it scale-insensitive, a multi-sensor system embedding inertial measurement units and a laser sensor is developed to compute the extrinsic parameters of the stereo cameras. In this research, the accuracy of the proposed sensor-based calibration method for stereophotogrammetry is validated experimentally and compared with traditional approaches. Tests conducted at various scales reveal that the proposed sensor-based calibration enables reconstruction of both static and dynamic point clouds, measuring displacements with an accuracy higher than 95% relative to traditional image-based calibration, while being up to an order of magnitude faster and easier to deploy. The approach has broad applications for static, dynamic, and deformation measurements and could transform how large-scale structural health monitoring is performed. Full article
(This article belongs to the Special Issue New Perspectives on 3D Point Cloud (Third Edition))
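A rough sketch of the sensor-based extrinsic idea (the frame conventions and the use of a single laser-measured baseline along an assumed axis are illustrative assumptions, not the authors' algorithm):

```python
import numpy as np

def stereo_extrinsics(R1_wc, R2_wc, baseline, axis=np.array([1.0, 0, 0])):
    """Extrinsics of camera 2 w.r.t. camera 1 without a calibration target.

    R1_wc, R2_wc : world-to-camera rotations of each camera, e.g. from
                   the IMUs rigidly attached to the cameras.
    baseline     : inter-camera distance, e.g. from the laser sensor.
    axis         : assumed direction of the baseline in camera-1 coords.
    """
    R = R2_wc @ R1_wc.T                 # relative rotation cam1 -> cam2
    t = -R @ (baseline * axis)          # camera-1 origin seen from camera 2
    return R, t

# Dummy IMU readings: camera 2 yawed 5 degrees relative to camera 1.
a = np.deg2rad(5.0)
R1 = np.eye(3)
R2 = np.array([[np.cos(a), 0, np.sin(a)],
               [0, 1, 0],
               [-np.sin(a), 0, np.cos(a)]])
R, t = stereo_extrinsics(R1, R2, baseline=1.5)
```

The appeal is that nothing here scales with the measurement volume: the IMUs and laser replace the ever-larger calibration target that image-based calibration would demand.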

31 pages, 11269 KB  
Review
Advancements in Semantic Segmentation of 3D Point Clouds for Scene Understanding Using Deep Learning
by Hafsa Benallal, Nadine Abdallah Saab, Hamid Tairi, Ayman Alfalou and Jamal Riffi
Technologies 2025, 13(8), 322; https://doi.org/10.3390/technologies13080322 - 30 Jul 2025
Viewed by 3797
Abstract
Three-dimensional semantic segmentation is a fundamental problem in computer vision with a wide range of applications in autonomous driving, robotics, and urban scene understanding. The task involves assigning semantic labels to each point in a 3D point cloud, a data representation that is inherently unstructured, irregular, and spatially sparse. In recent years, deep learning has become the dominant framework for addressing this task, leading to a broad variety of models and techniques designed to tackle the unique challenges posed by 3D data. This survey presents a comprehensive overview of deep learning methods for 3D semantic segmentation. We organize the literature into a taxonomy that distinguishes between supervised and unsupervised approaches. Supervised methods are further classified into point-based, projection-based, voxel-based, and hybrid architectures, while unsupervised methods include self-supervised learning strategies, generative models, and implicit representation techniques. In addition to presenting and categorizing these approaches, we provide a comparative analysis of their performance on widely used benchmark datasets, discuss key challenges such as generalization, model transferability, and computational efficiency, and examine the limitations of current datasets. The survey concludes by identifying potential directions for future research in this rapidly evolving field. Full article
(This article belongs to the Section Information and Communication Technologies)
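To make the survey's "point-based" category concrete, here is a toy PointNet-style shared MLP (a generic illustration of per-point feature learning, not a model from the survey):

```python
import torch
import torch.nn as nn

class PerPointMLP(nn.Module):
    """PointNet-style per-point segmentation head.

    The same MLP is applied to every point independently (a size-1 conv
    over the point axis), handling unordered input; a global max-pooled
    feature adds scene context before per-point labeling.
    """
    def __init__(self, num_classes: int, in_dim: int = 3, width: int = 64):
        super().__init__()
        self.local = nn.Sequential(
            nn.Conv1d(in_dim, width, 1), nn.ReLU(),
            nn.Conv1d(width, width, 1), nn.ReLU(),
        )
        self.head = nn.Conv1d(2 * width, num_classes, 1)

    def forward(self, pts: torch.Tensor) -> torch.Tensor:
        # pts: (B, 3, N) xyz coordinates
        f = self.local(pts)                                  # (B, W, N)
        g = f.max(dim=2, keepdim=True).values.expand_as(f)   # global context
        return self.head(torch.cat([f, g], dim=1))           # (B, classes, N)

logits = PerPointMLP(num_classes=13)(torch.randn(2, 3, 1024))
```

Projection-based and voxel-based methods in the taxonomy differ mainly in what they feed such a network: rendered 2D views or a regular 3D grid instead of raw points.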

13 pages, 4474 KB  
Article
Imaging on the Edge: Mapping Object Corners and Edges with Stereo X-Ray Tomography
by Zhenduo Shang and Thomas Blumensath
Tomography 2025, 11(8), 84; https://doi.org/10.3390/tomography11080084 - 29 Jul 2025
Viewed by 483
Abstract
Background/Objectives: X-ray computed tomography (XCT) is a powerful tool for volumetric imaging, where three-dimensional (3D) images are generated from a large number of individual X-ray projection images. However, collecting the required number of low-noise projection images is time-consuming, limiting its applicability to scenarios requiring high temporal resolution, such as the study of dynamic processes. Inspired by stereo vision, we previously developed stereo X-ray imaging methods that operate with only two X-ray projections, enabling the 3D reconstruction of point and line fiducial markers at significantly faster temporal resolutions. Methods: Building on our prior work, this paper demonstrates the use of stereo X-ray techniques for 3D reconstruction of sharp object corners, eliminating the need for internal fiducial markers. This is particularly relevant for deformation measurement of manufactured components under load. Additionally, we explore model training using synthetic data when annotated real data is unavailable. Results: We show that the proposed method can reliably reconstruct sharp corners in 3D using only two X-ray projections. The results confirm the method’s applicability to real-world stereo X-ray images without relying on annotated real training datasets. Conclusions: Our approach enables stereo X-ray 3D reconstruction using synthetic training data that mimics key characteristics of real data, thereby expanding the method’s applicability in scenarios with limited training resources. Full article
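The reconstruction step rests on standard two-view triangulation; a minimal DLT sketch (generic epipolar geometry, not the paper's learned corner matcher):

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one matched point from two views.

    P1, P2 : (3, 4) projection matrices of the two X-ray views.
    x1, x2 : (u, v) pixel coordinates of the same corner in each view.
    Returns the 3D point minimizing the algebraic reprojection error.
    """
    A = np.stack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]              # dehomogenize

# Dummy views: identity camera and one translated 10 units along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-10.0], [0.0], [0.0]])])
X = np.array([1.0, 2.0, 5.0, 1.0])
x1 = (P1 @ X)[:2] / (P1 @ X)[2]
x2 = (P2 @ X)[:2] / (P2 @ X)[2]
print(triangulate(P1, P2, x1, x2))   # -> [1. 2. 5.]
```

The hard part the paper addresses is upstream of this step: reliably detecting and matching the same sharp corner in both projections without fiducial markers.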

16 pages, 14336 KB  
Article
Three-Dimensional Binary Marker: A Novel Underwater Marker Applicable for Long-Term Deployment Scenarios
by Alaaeddine Chaarani, Patryk Cieslak, Joan Esteba, Ivan Eichhardt and Pere Ridao
J. Mar. Sci. Eng. 2025, 13(8), 1442; https://doi.org/10.3390/jmse13081442 - 28 Jul 2025
Viewed by 611
Abstract
Traditional 2D optical markers degrade quickly in underwater applications due to sediment accumulation and marine biofouling, becoming undetectable within weeks. This paper presents the Three-Dimensional Binary Marker, a novel passive fiducial marker designed for long-term underwater deployment. It addresses this limitation of 2D markers through a 3D design that enhances resilience and maintains contrast for computer vision detection over extended periods. The proposed solution has been validated through simulation, water tank testing, and a five-month sea trial; at each stage, the marker was evaluated on detections per visible frame and on detection distance. The design demonstrated superior performance compared to standard 2D markers. The Three-Dimensional Binary Marker is compatible with widely used fiducial marker families, such as ArUco and AprilTag, allowing quick adoption by users, and it is produced by additive manufacturing, offering a low-cost and scalable solution for underwater localization tasks. The proposed marker extended the usable deployment time of fiducial markers from a few days to sixty days, with a detection range of up to seven meters, providing robustness and reliability. Although survivability and detection range depend on marker size, the design remains a valuable innovation for Autonomous Underwater Vehicles, as well as for inspection, maintenance, and monitoring tasks in marine robotics and offshore infrastructure applications. Full article
(This article belongs to the Section Ocean Engineering)
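Because the marker keeps ArUco/AprilTag compatibility, detection can reuse stock tooling once the marker face is imaged; a minimal OpenCV sketch (the dictionary choice, placeholder path, and the OpenCV ≥4.7 ArucoDetector API are assumptions about the user's setup):

```python
import cv2

# A stock dictionary and detector apply, since the 3D marker encodes a
# standard ArUco bit pattern.
aruco_dict = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_4X4_50)
detector = cv2.aruco.ArucoDetector(aruco_dict, cv2.aruco.DetectorParameters())

# Placeholder path for a frame from the underwater camera.
frame = cv2.imread("frame.png", cv2.IMREAD_GRAYSCALE)
if frame is not None:
    corners, ids, rejected = detector.detectMarkers(frame)
    if ids is not None:
        print("detected marker ids:", ids.ravel())
```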

15 pages, 1193 KB  
Article
Enhanced Brain Stroke Lesion Segmentation in MRI Using a 2.5D Transformer Backbone U-Net Model
by Mahsa Karimzadeh, Hadi Seyedarabi, Ata Jodeiri and Reza Afrouzian
Brain Sci. 2025, 15(8), 778; https://doi.org/10.3390/brainsci15080778 - 22 Jul 2025
Viewed by 1282
Abstract
Background/Objectives: Accurate segmentation of brain stroke lesions from MRI images is a critical task in medical image analysis that is essential for timely diagnosis and treatment planning. Methods: This paper presents a novel approach for segmenting brain stroke lesions using a deep learning model based on the U-Net architecture. We enhanced the traditional U-Net by integrating a transformer-based backbone, specifically the Mix Vision Transformer (MiT), and compared its performance against other commonly used backbones such as ResNet and EfficientNet. Additionally, we implemented a 2.5D method, which leverages 2D networks to process three-dimensional data slices, effectively balancing the rich spatial context of 3D methods with the simplicity of 2D methods. The 2.5D approach captures inter-slice dependencies, leading to improved lesion delineation without the computational complexity of full 3D models. Utilizing the ISLES 2015 dataset, which includes MRI images and corresponding lesion masks for 20 patients, we conducted our experiments with 4-fold cross-validation to ensure robustness and reliability. To evaluate the effectiveness of our method, we conducted comparative experiments with several state-of-the-art (SOTA) segmentation models, including CNN-based U-Net, nnU-Net, TransUNet, and SwinUNet. Results: Our proposed model outperformed all competing methods in terms of Dice coefficient and Intersection over Union (IoU), demonstrating its robustness and superiority. Our extensive experiments demonstrate that the proposed U-Net with the MiT backbone, combined with 2.5D data preparation, achieves superior performance, specifically Dice and IoU scores of 0.8153 ± 0.0101 and 0.7835 ± 0.0079, respectively, outperforming other backbone configurations. Conclusions: These results indicate that the integration of transformer-based backbones and 2.5D techniques offers a significant advancement in the accurate segmentation of brain stroke lesions, paving the way for more reliable and efficient diagnostic tools in clinical settings. Full article
(This article belongs to the Section Neural Engineering, Neuroergonomics and Neurorobotics)
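The 2.5D input scheme is simple to sketch (a five-slice channel stack is an assumption consistent with common 2.5D practice, not necessarily the paper's exact configuration):

```python
import numpy as np

def make_25d_stacks(volume, k=2):
    """Turn a 3D volume into per-slice 2.5D training samples.

    For each axial slice i, the 2k+1 neighboring slices are stacked as
    input channels, so a 2D network sees inter-slice context without
    the cost of full 3D convolutions. Edge slices are clamped.
    """
    D, H, W = volume.shape
    samples = []
    for i in range(D):
        idx = np.clip(np.arange(i - k, i + k + 1), 0, D - 1)
        samples.append(volume[idx])        # (2k+1, H, W) channel stack
    return np.stack(samples)               # (D, 2k+1, H, W)

vol = np.random.rand(20, 128, 128).astype(np.float32)  # dummy MRI volume
x = make_25d_stacks(vol)                   # (20, 5, 128, 128) for a 2D U-Net
```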

29 pages, 3338 KB  
Article
AprilTags in Unity: A Local Alternative to Shared Spatial Anchors for Synergistic Shared Space Applications Involving Extended Reality and the Internet of Things
by Amitabh Mishra and Kevin Foster Carff
Sensors 2025, 25(14), 4408; https://doi.org/10.3390/s25144408 - 15 Jul 2025
Viewed by 1616
Abstract
Creating shared spaces is a key part of making extended reality (XR) and Internet of Things (IoT) technology more interactive and collaborative. Currently, one commercial system that stands out in achieving this end involves spatial anchors. Due to the cloud-based nature of these anchors, they can introduce connectivity and privacy issues for projects that need to be isolated from the internet. This research explores and creates a different approach that does not require internet connectivity. This work involves the creation of an AprilTags-based calibration system as a local solution for creating shared XR spaces and investigates its performance. AprilTags are simple, scannable markers that, through computer vision algorithms, help XR devices determine position and rotation in three-dimensional space. This allows multiple users to occupy the same virtual space and the same real-world space simultaneously. Our tests in XR showed that this method is accurate and works well for synchronizing multiple users. This approach could make shared XR experiences faster, more private, and easier to use without depending on cloud-based calibration systems. Full article
(This article belongs to the Special Issue Feature Papers in the Internet of Things Section 2025)
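Recovering a device pose from one detected tag reduces to a PnP problem; a minimal OpenCV sketch (the tag size, intrinsics, and corner ordering are assumptions, not this paper's Unity pipeline):

```python
import cv2
import numpy as np

TAG_SIZE = 0.16  # assumed tag edge length in meters

# Tag corners in the tag's own frame (z = 0 plane), in the order the
# IPPE_SQUARE solver expects.
obj_pts = np.array([
    [-TAG_SIZE / 2,  TAG_SIZE / 2, 0],
    [ TAG_SIZE / 2,  TAG_SIZE / 2, 0],
    [ TAG_SIZE / 2, -TAG_SIZE / 2, 0],
    [-TAG_SIZE / 2, -TAG_SIZE / 2, 0],
], dtype=np.float64)

def tag_pose(img_pts, K, dist):
    """Pose of the tag in the camera frame from its four pixel corners."""
    ok, rvec, tvec = cv2.solvePnP(obj_pts, img_pts, K, dist,
                                  flags=cv2.SOLVEPNP_IPPE_SQUARE)
    if not ok:
        raise RuntimeError("PnP failed")
    R, _ = cv2.Rodrigues(rvec)      # rotation matrix from the rotation vector
    return R, tvec

# Inverting (R, tvec) gives the camera pose in the tag frame: the tag is
# the common origin every headset can agree on without a cloud anchor.
```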
