Search Results (115)

Search Parameters:
Keywords = point cloud video

22 pages, 6385 KB  
Article
Targetless Calibration of Wide-Baseline and Wide-Angle Surround-View Fisheye Cameras Using Cylindrical Projection Model
by Gee Hoon Lee and Soon-Yong Park
Sensors 2026, 26(12), 3622; https://doi.org/10.3390/s26123622 - 6 Jun 2026
Viewed by 264
Abstract
We propose a novel targetless extrinsic calibration method for wide-baseline and wide-angle fisheye cameras, which are mounted on a driving vehicle for surround view monitoring. Sequences of image frames from three fisheye cameras are obtained, and the object instance and depth around the vehicle are used for calibration. Thus, the proposed method can be applied to online vehicle camera calibration. Fisheye images are first transformed into the cylindrical coordinate system by considering the panoramic formation of the cameras. Then, the state-of-the-art object detection and monocular depth estimation models are applied to the cylindrical images. Vehicle instances matched across different views are reconstructed into 3D point clouds, and their depths are scaled by employing the pose geometry of the front camera. The per-point depths and global scale are then jointly optimized to achieve accurate cross-view alignment and extrinsic calibration. Experiments on both real-world and synthetic video datasets show that the proposed method achieves higher accuracy than COLMAP and DUSt3R under challenging conditions such as wide baselines and low frame rates, without requiring an artificial calibration target. Full article
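
The cylindrical transform mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `ray_to_cylindrical`, the axis convention (x right, y down, z forward), and the scale parameter `f` are assumptions made only for illustration.

```python
import math

def ray_to_cylindrical(x, y, z, f=1.0):
    """Map a 3D viewing ray onto a unit cylinder whose axis is the y axis.

    Assumed convention: x right, y down, z forward.
    Returns (u, v), where u = f * azimuth and v = f * height on the cylinder.
    """
    radius = math.hypot(x, z)  # distance of the ray direction from the cylinder axis
    if radius == 0.0:
        raise ValueError("ray is parallel to the cylinder axis")
    theta = math.atan2(x, z)   # azimuth, zero when looking straight ahead
    height = y / radius        # height at which the ray meets the unit cylinder
    return f * theta, f * height
```

Under this convention, horizontal position in the cylindrical image depends only on azimuth, which is one reason such a projection suits cameras arranged around a vehicle.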

23 pages, 11145 KB  
Article
DiffLiGS: Diffusion-Guided LiDAR-Enhanced 3D Gaussian Splatting
by Shucheng Gong, Hong Xie, Jiang Song, Longze Zhu and Hongping Zhang
ISPRS Int. J. Geo-Inf. 2026, 15(4), 140; https://doi.org/10.3390/ijgi15040140 - 24 Mar 2026
Viewed by 1988
Abstract
Multi-view 3D reconstruction is essential for smart cities, supporting applications such as urban planning and autonomous navigation. While traditional reconstruction pipelines and recent neural implicit methods, such as NeRF, achieve high visual fidelity, they often struggle with geometric accuracy and sparse-view scenarios. To address this challenge, we present DiffLiGS, a novel multi-modal 3D reconstruction framework that integrates LiDAR point clouds and LiDAR-guided diffusion-based priors into the 3D Gaussian Splatting (3DGS) pipeline, enabling high-fidelity and geometrically accurate models. Our method first densifies sparse LiDAR depths using a diffusion model and refines them through multi-view geometric constraints, producing dense LiDAR depth maps that provide robust supervision for 3DGS optimization. Leveraging these dense depth maps, we guide a Stable Video Diffusion model to synthesize novel view images, which are incorporated into training to enhance reconstruction completeness and visual realism. By jointly fusing rich appearance cues from multi-view images with precise LiDAR-derived geometry and diffusion priors, DiffLiGS achieves unified, geometry-aware 3D scene representations. Our extensive experiments demonstrate that our approach significantly improves both geometric accuracy and rendering quality compared to existing 3D reconstruction methods, enabling real-time, high-precision modeling of complex urban environments. Full article

19 pages, 7295 KB  
Article
Video Identifying and Eraser: Use Multi-Task Cascaded Convolutional Neural Network to Enhance Safety in a Text-to-Video Diffusion Model
by Shuang Lin, Ranran Zhou and Yong Wang
Appl. Sci. 2026, 16(6), 2995; https://doi.org/10.3390/app16062995 - 20 Mar 2026
Viewed by 456
Abstract
Current security solutions predominantly rely on cloud-based implementations, often neglecting computational resource constraints and operational efficiency. While contemporary methodologies typically require additional training, the few that operate without retraining frequently yield suboptimal performance. To address these limitations, this work leverages a pre-trained MTCNN architecture to detect faces of copyright-protected individuals. We construct a facial landmark database comprising five critical fiducial points, which serves as a supplementary module integrated into the stable diffusion framework, enabling real-time security filtering for synthesized video content. The proposed system utilizes MTCNN models pre-trained in the cloud to build a repository of copyrighted facial signatures, generating a geometric parameter database of facial landmarks. This database, coupled with a parallel verification unit, functions as a plugin within the standard Stable Diffusion pipeline. By leveraging Stable Diffusion’s native decoder, we decode stochastic frames from the U-Net latent representations and perform real-time comparative analysis to identify potential copyright violations in generated video sequences. Upon detecting an infringement, an on-screen display (OSD) alert notifies the user and immediately halts the text-to-video (T2V) generation process. Experimental evaluations demonstrate that our framework effectively mitigates the resource constraints and latency issues inherent in edge deployment scenarios of prior security implementations. Leveraging MTCNN’s proven robustness and extensive edge compatibility for facial recognition, the proposed detection and obfuscation plugin integrates seamlessly with Stable Diffusion while preserving generation quality. Full article
(This article belongs to the Special Issue Applied Multimodal AI: Methods and Applications Across Domains)

35 pages, 10558 KB  
Article
Cave of Altamira (Spain): UAV-Based SLAM Mapping, Digital Twin and Segmentation-Driven Crack Detection for Preventive Conservation in Paleolithic Rock-Art Environments
by Jorge Angás, Manuel Bea, Carlos Valladares, Cristian Iranzo, Gonzalo Ruiz, Pilar Fatás, Carmen de las Heras, Miguel Ángel Sánchez-Carro, Viola Bruschi, Alfredo Prada and Lucía M. Díaz-González
Drones 2026, 10(1), 73; https://doi.org/10.3390/drones10010073 - 22 Jan 2026
Cited by 4 | Viewed by 1697
Abstract
The Cave of Altamira (Spain), a UNESCO World Heritage site, contains one of the most fragile and inaccessible Paleolithic rock-art environments in Europe, where geomatics documentation is constrained not only by severe spatial, lighting and safety limitations but also by conservation-driven restrictions on time, access and operational procedures. This study applies a confined-space UAV equipped with LiDAR-based SLAM navigation to document and assess the stability of the vertical rock wall leading to “La Hoya” Hall, a structurally sensitive sector of the cave. Twelve autonomous and assisted flights were conducted, generating dense LiDAR point clouds and video sequences processed through videogrammetry to produce high-resolution 3D meshes. A Mask R-CNN deep learning model was trained on manually segmented images to explore automated crack detection under variable illumination and viewing conditions. The results reveal active fractures, overhanging blocks and sediment accumulations located on inaccessible ledges, demonstrating the capacity of UAV-SLAM workflows to overcome the limitations of traditional surveys in confined subterranean environments. All datasets were integrated into the DiGHER digital twin platform, enabling traceable storage, multitemporal comparison, and collaborative annotation. Overall, the study demonstrates the feasibility of combining UAV-based SLAM mapping, videogrammetry and deep learning segmentation as a reproducible baseline workflow to inform preventive conservation and future multitemporal monitoring in Paleolithic caves and similarly constrained cultural heritage contexts. Full article
(This article belongs to the Topic 3D Documentation of Natural and Cultural Heritage)

22 pages, 9357 KB  
Article
Intelligent Evaluation of Rice Resistance to White-Backed Planthopper (Sogatella furcifera) Based on 3D Point Clouds and Deep Learning
by Yuxi Zhao, Huilai Zhang, Wei Zeng, Litu Liu, Qing Li, Zhiyong Li and Chunxian Jiang
Agriculture 2026, 16(2), 215; https://doi.org/10.3390/agriculture16020215 - 14 Jan 2026
Viewed by 521
Abstract
Accurate assessment of rice resistance to Sogatella furcifera (Horváth) is essential for breeding insect-resistant cultivars. Traditional assessment methods rely on manual scoring of damage severity, which is subjective and inefficient. To overcome these limitations, this study proposes an automated resistance evaluation approach based on multi-view 3D reconstruction and deep learning–based point cloud segmentation. Multi-view videos of rice materials with different resistance levels were collected over time and processed using Structure from Motion (SfM) and Multi-View Stereo (MVS) to reconstruct high-quality 3D point clouds. A well-annotated “3D Rice WBPH Damage” dataset comprising 174 samples (15 rice materials, three replicates each, 45 pots) was established, where each sample corresponds to a reconstructed 3D point cloud from a video sequence. A comparative study of various point cloud semantic segmentation models, including PointNet, PointNet++, ShellNet, and PointCNN, revealed that the PointNet++ (MSG) model, which employs a Multi-Scale Grouping strategy, demonstrated the best performance in segmenting complex damage symptoms. To further accurately quantify the severity of damage, an adaptive point cloud dimensionality reduction method was proposed, which effectively mitigates the interference of leaf shrinkage on damage assessment. Experimental results demonstrated a strong correlation (R² = 0.95) between automated and manual evaluations, achieving accuracies of 86.67% and 93.33% at the sample and material levels, respectively. This work provides an objective, efficient, and scalable solution for evaluating rice resistance to S. furcifera, offering promising applications in crop resistance breeding. Full article
(This article belongs to the Section Crop Protection, Diseases, Pests and Weeds)

20 pages, 5947 KB  
Article
A Knowledge Graph-Guided and Multimodal Data Fusion-Driven Rapid Modeling Method for Digital Twin Scenes: A Case Study of Bridge Tower Construction
by Yongtao Zhang, Yongwei Wang, Zhihao Guo, Jun Zhu, Fanxu Huang, Hao Zhu, Yuan Chen and Yajian Kang
ISPRS Int. J. Geo-Inf. 2026, 15(1), 27; https://doi.org/10.3390/ijgi15010027 - 6 Jan 2026
Viewed by 1493
Abstract
Establishing digital twin scenes facilitates the understanding of geospatial phenomena, representing a significant research focus for GIS scientists and engineers. However, current research on digital twin scene modeling relies on manual intervention or the overlay of static models, resulting in low modeling efficiency and poor standardization. To address these challenges, this paper proposes a knowledge graph-guided and multimodal data fusion-driven rapid modeling method for digital twin scenes, using bridge tower construction as an illustrative example. We first constructed a knowledge graph linking the three domains of “event-object-data” in bridge tower construction. Guided by this graph, we designed a knowledge graph-guided multimodal data association and fusion algorithm. Then, a rapid modeling method for bridge tower construction scenes based on dynamic data was established. Finally, a prototype system was developed, and a case study area was selected for analysis. Experimental results show that the knowledge graph we built clearly captures all elements and their relationships in bridge tower construction scenes. Our method enables precise fusion of five types of multimodal data: BIM, DEM, images, videos, and point clouds. It improves spatial registration accuracy by 21.83%, increases temporal fusion efficiency by 65.6%, and reduces feature fusion error rates by 70.9%. Local updates of the 3D geographic scene take less than 30 ms, supporting millisecond-level digital twin modeling. This provides a practical reference for building geographic digital twin scenes. Full article
(This article belongs to the Special Issue Knowledge-Guided Map Representation and Understanding)

26 pages, 12124 KB  
Article
MF-GCN: Multimodal Information Fusion Using Incremental Graph Convolutional Network for Ship Behavior Anomaly Detection
by Ruixin Ma, Jinhao Zhang, Weizhi Nie, Naiming Ge, Hao Wen and Aoxiang Liu
J. Mar. Sci. Eng. 2026, 14(1), 87; https://doi.org/10.3390/jmse14010087 - 1 Jan 2026
Viewed by 906
Abstract
Ship behavior anomaly detection is critical for intelligent perception and early warning in complex inland waterways, where single-source sensing (e.g., AIS-only or vision-only) is often fragile under occlusion, illumination variation, and signal noise. This study proposes MF-GCN, a multimodal (heterogeneous) information fusion framework based on an Incremental Graph Convolutional Network (IGCN) to detect and warn of anomalous ship behaviors by jointly modeling AIS, video imagery, LiDAR point clouds, and water level signals. We first extract modality-specific features and enforce temporal–spatial consistency via timestamp and geo-referencing alignment, then construct an evolving graph in which nodes represent multimodal features and edges encode temporal dependency and semantic similarity. MF-GCN integrates a Semantic Clustering-based GCN (S-GCN) to inject historical semantic context and an Attentive Fusion-based GCN (A-GCN) to learn dynamic cross-modal correlations using multi-head attention. Experiments on our constructed real-world datasets demonstrate that MF-GCN achieves accuracies of 93.8%, 93.8%, and 93.3% with F1-scores of 93.6%, 93.6%, and 93.3% for ship deviation warning, bridge-crossing warning, and inter-ship collision warning, respectively, consistently outperforming representative baselines. These results verify the effectiveness of the proposed method for robust multimodal anomaly detection and early warning in inland-waterway scenarios. Full article
(This article belongs to the Special Issue Emerging Computational Methods in Intelligent Marine Vehicles)

17 pages, 3550 KB  
Article
Edge Intelligence-Based Rail Transit Equipment Inspection System
by Lijia Tian, Hongli Zhao, Li Zhu, Hailin Jiang and Xinjun Gao
Sensors 2026, 26(1), 236; https://doi.org/10.3390/s26010236 - 30 Dec 2025
Cited by 1 | Viewed by 1111
Abstract
The safe operation of rail transit systems relies heavily on the efficient and reliable maintenance of their equipment, as any malfunction or abnormal operation may pose serious risks to transportation safety. Traditional manual inspection methods are often characterized by high costs, low efficiency, and susceptibility to human error. To address these limitations, this paper presents a rail transit equipment inspection system based on Edge Intelligence (EI) and 5G technology. The proposed system adopts a cloud–edge–end collaborative architecture that integrates Computer Vision (CV) techniques to automate inspection tasks; specifically, a fine-tuned YOLOv8 model is employed for object detection of personnel and equipment, while a ResNet-18 network is utilized for equipment status classification. By implementing an ETSI MEC-compliant framework on edge servers (NVIDIA Jetson AGX Orin), the system enhances data processing efficiency and network performance, while further strengthening security through the use of a 5G private network that isolates critical infrastructure data from the public internet, and improving robustness via distributed edge nodes that eliminate single points of failure. The proposed solution has been deployed and evaluated in real-world scenarios on Beijing Metro Line 6. Experimental results demonstrate that the YOLOv8 model achieves a mean Average Precision (mAP@0.5) of 92.7% ± 0.4% for equipment detection, and the ResNet-18 classifier attains 95.8% ± 0.3% accuracy in distinguishing normal and abnormal statuses. Compared with a cloud-centric architecture, the EI-based system reduces the average end-to-end latency for anomaly detection tasks by 45% (28.5 ms vs. 52.1 ms) and significantly lowers daily bandwidth consumption by approximately 98.1% (from 40.0 GB to 0.76 GB) through an event-triggered evidence upload strategy involving images and short video clips, highlighting its superior real-time performance, security, robustness, and bandwidth efficiency. Full article
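
The latency and bandwidth percentages quoted in the abstract can be reproduced from the raw figures it gives. The snippet below is only a worked arithmetic check; the helper name `relative_reduction` is a hypothetical name chosen for illustration.

```python
def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline

# Figures taken from the abstract above.
latency_cut = relative_reduction(52.1, 28.5)    # ms: cloud-centric vs. edge-based
bandwidth_cut = relative_reduction(40.0, 0.76)  # GB per day: before vs. after

print(f"latency reduction:   {latency_cut:.1%}")    # 45.3%
print(f"bandwidth reduction: {bandwidth_cut:.1%}")  # 98.1%
```

Both values agree with the rounded 45% and approximately 98.1% reported in the abstract.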

15 pages, 1308 KB  
Article
Evolution of Convolutional and Recurrent Artificial Neural Networks in the Context of BIM: Deep Insight and New Tool, Bimetria
by Andrzej Szymon Borkowski, Łukasz Kochański and Konrad Rukat
Infrastructures 2026, 11(1), 6; https://doi.org/10.3390/infrastructures11010006 - 22 Dec 2025
Cited by 1 | Viewed by 989
Abstract
This paper discusses the evolution of convolutional (CNN) and recurrent (RNN) artificial neural networks in applications for Building Information Modeling (BIM). The paper outlines the milestones reached in the last two decades. The article organizes the current state of knowledge and technology in terms of three aspects: (1) computer visualization coupled with BIM models (detection, segmentation, and quality verification in images, videos, and point clouds), (2) sequence and time series modeling (prediction of costs, energy, work progress, risk), and (3) integration of deep learning results with the semantics and topology of Industry Foundation Classes (IFC) models. The paper identifies the most used architectures, typical data pipelines (synthetic data from BIM models, transfer learning, mapping results to IFC elements) and practical limitations: lack of standardized benchmarks, high annotation costs, a domain gap between synthetic and real data, and discontinuous interoperability. We indicate directions for development: combining CNN/RNN with graph models and transformers for wider use of synthetic data and semi-/supervised learning, as well as explainability methods that increase trust in AECOO (Architecture, Engineering, Construction, Owners & Operators) processes. A practical case study presents a new application, Bimetria, which uses a hybrid CNN/OCR (Optical Character Recognition) solution to generate 3D models with estimates based on two-dimensional drawings. A deep review shows that although the importance of attention-based and graph-based architectures is growing, CNNs and RNNs remain an important part of the BIM process, especially in engineering tasks, where, in our experience and in the Bimetria case study, mature convolutional architectures offer a good balance between accuracy, stability and low latency. The paper also raises some fundamental questions to which we are still seeking answers. Thus, the article not only presents the innovative new Bimetria tool but also aims to stimulate discussion about the dynamic development of AI (Artificial Intelligence) in BIM. Full article
(This article belongs to the Special Issue Modern Digital Technologies for the Built Environment of the Future)

29 pages, 31164 KB  
Article
Geometric Condition Assessment of Traffic Signs Leveraging Sequential Video-Log Images and Point-Cloud Data
by Yiming Jiang, Yuchun Huang, Shuang Li, Jun Liu and He Yang
Remote Sens. 2025, 17(24), 4061; https://doi.org/10.3390/rs17244061 - 18 Dec 2025
Viewed by 765
Abstract
Traffic signs exposed to long-term outdoor conditions frequently exhibit deformation, inclination, or other forms of physical damage, highlighting the need for timely and reliable anomaly assessment to support road safety management. While point-cloud data provide accurate three-dimensional geometric information, their sparse distribution and lack of appearance cues make traffic sign extraction challenging in complex environments. High-resolution sequential video-log images captured from multiple viewpoints offer complementary advantages by providing rich color and texture information. In this study, we propose an integrated traffic sign detection and assessment framework that combines video-log images and mobile-mapping point clouds to enhance both accuracy and robustness. A dedicated YOLO-SIGN network is developed to perform precise detection and multi-view association of traffic signs across sequential images. Guided by these detections, a frustum-based point-cloud extraction strategy with seed-point density growing is introduced to efficiently isolate traffic sign panels and supporting poles. The extracted structures are then used for geometric parameterization and damage assessment, including inclination, deformation, and rotation. Experiments on 35 simulated scenes and nine real-world road scenarios demonstrate that the proposed method can reliably extract and evaluate traffic sign conditions in diverse environments. Furthermore, the YOLO-SIGN network achieves a localization precision of 91.16% and a classification mAP of 84.64%, outperforming YOLOv10s by 1.7% and 8.7%, respectively, while maintaining a reduced number of parameters. These results confirm the effectiveness and practical value of the proposed framework for large-scale traffic sign monitoring. Full article
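
The frustum-based extraction step can be pictured with a minimal sketch: points are projected through a pinhole model and kept when they land inside a detected 2D box. This is not the paper's method in full (the seed-point density growing is not shown), and the function name, the camera-frame convention (z forward), and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions for illustration.

```python
def points_in_frustum(points, bbox, fx, fy, cx, cy):
    """Keep 3D points (camera frame, z forward) that project inside a 2D box.

    bbox is (u_min, v_min, u_max, v_max) in pixels; fx, fy, cx, cy are
    pinhole intrinsics. All names here are illustrative.
    """
    u_min, v_min, u_max, v_max = bbox
    kept = []
    for x, y, z in points:
        if z <= 0:  # point lies behind the camera, so it cannot be in the frustum
            continue
        u = fx * x / z + cx  # pinhole projection to pixel coordinates
        v = fy * y / z + cy
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append((x, y, z))
    return kept
```

In practice the point cloud would first be transformed from the mapping frame into the camera frame using the calibrated extrinsics, which this sketch takes as already done.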

15 pages, 1414 KB  
Article
Gait Cycle Duration Analysis in Lower Limb Amputees Using an IoT-Based Photonic Wearable Sensor: A Preliminary Proof-of-Concept Study
by Bruna Alves, Alessandro Fantoni, José Pedro Matos, João Costa and Manuela Vieira
Sensors 2025, 25(23), 7148; https://doi.org/10.3390/s25237148 - 23 Nov 2025
Viewed by 1246
Abstract
This study represents a preliminary proof of concept intended to demonstrate the feasibility of using a single-point LiDAR sensor for wearable gait analysis. The study presents a low-cost wearable sensor system that integrates a single-point LiDAR module and IoT connectivity to assess Gait Cycle Duration (GCD) and gait symmetry in real time. The device is positioned on the medial side of the calf to detect the contralateral limb crossing—used as a proxy for mid-stance—enabling the computation of GCD for both limbs and the derivation of the Symmetry Ratio and Symmetry Index. This was conducted under simulated walking at three cadences (slow, normal and fast). GCD estimated by the sensor was compared against the visual annotation with Kinovea®, showing reasonable agreement, with most cycle-wise relative differences below approximately 13% and both methods capturing similar symmetry trends. The wearable system operated reliably across different speeds, with an estimated materials cost of under 100 € and wireless data streaming to a cloud dashboard for real-time visualization. Although the validation is preliminary and limited to a single healthy participant and a video-based reference, the results support the feasibility of a photonic, IoT-based approach for portable and objective gait assessment, motivating future studies with larger and clinical cohorts and gold-standard references to quantify accuracy, repeatability and clinical utility. Full article
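
The abstract does not state how the Symmetry Ratio and Symmetry Index are computed, so the sketch below uses definitions that are common in the gait literature and should be read as assumptions: the ratio of the two limbs' cycle durations, and the absolute difference normalized by their mean, in percent. It also assumes that each event timestamp marks the same gait phase of one limb, so that consecutive events are one cycle apart. All function names are hypothetical.

```python
def cycle_durations(event_times):
    """Gait cycle durations as differences between consecutive event timestamps (s)."""
    return [t1 - t0 for t0, t1 in zip(event_times, event_times[1:])]

def symmetry_ratio(gcd_a, gcd_b):
    """Assumed definition: ratio of the two limbs' mean cycle durations."""
    return gcd_a / gcd_b

def symmetry_index(gcd_a, gcd_b):
    """Assumed definition: absolute difference over the mean, in percent."""
    return 100.0 * abs(gcd_a - gcd_b) / (0.5 * (gcd_a + gcd_b))
```

With these definitions, perfectly symmetric gait gives a ratio of 1 and an index of 0%.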

21 pages, 7707 KB  
Article
Tomato Growth Monitoring and Phenological Analysis Using Deep Learning-Based Instance Segmentation and 3D Point Cloud Reconstruction
by Warut Timprae, Tatsuki Sagawa, Stefan Baar, Satoshi Kondo, Yoshifumi Okada, Kazuhiko Sato, Poltak Sandro Rumahorbo, Yan Lyu, Kyuki Shibuya, Yoshiki Gama, Yoshiki Hatanaka and Shinya Watanabe
Sustainability 2025, 17(22), 10120; https://doi.org/10.3390/su172210120 - 12 Nov 2025
Cited by 3 | Viewed by 1366
Abstract
Accurate and nondestructive monitoring of tomato growth is essential for large-scale greenhouse production; however, it remains challenging for small-fruited cultivars such as cherry tomatoes. Traditional 2D image analysis often fails to capture precise morphological traits, limiting its usefulness in growth modeling and yield estimation. This study proposes an automated phenotyping framework that integrates deep learning-based instance segmentation with high-resolution 3D point cloud reconstruction and ellipsoid fitting to estimate fruit size and ripeness from daily video recordings. These techniques enable accurate camera pose estimation and dense geometric reconstruction (via SfM and MVS), while Nerfacto enhances surface continuity and photorealistic fidelity, resulting in highly precise and visually consistent 3D representations. The reconstructed models are followed by CIELAB color analysis and logistic curve fitting to characterize the growth dynamics. When applied to real greenhouse conditions, the method achieved an average size estimation error of 8.01% compared to manual caliper measurements. During summer, the maximum growth rates (gmax) of size and ripeness were 24.14% and 95.24% higher than in winter, respectively. Seasonal analysis revealed that winter-grown tomatoes matured approximately 10 days later than summer-grown fruits, highlighting environmental influences on phenological development. By enabling precise, noninvasive tracking of size and ripeness progression, this approach is a novel tool for smart and sustainable agriculture. Full article
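
The abstract mentions logistic curve fitting and a maximum growth rate but does not give the parameterization. As a hedged sketch, the standard three-parameter logistic curve is shown below, for which the steepest slope occurs at the midpoint and equals k * r / 4. Whether the paper defines gmax this way is an assumption, and the names `logistic` and `max_growth_rate` are illustrative.

```python
import math

def logistic(t, k, r, t0):
    """Standard logistic curve: plateau k, rate r, midpoint t0."""
    return k / (1.0 + math.exp(-r * (t - t0)))

def max_growth_rate(k, r):
    """Slope of the logistic curve at its midpoint t0, where growth is fastest."""
    return k * r / 4.0
```

For example, a curve with plateau 100 and rate 0.5 per day reaches half its plateau at the midpoint and grows there at 12.5 units per day.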
(This article belongs to the Special Issue Green Technology and Biological Approaches to Sustainable Agriculture)

20 pages, 2797 KB  
Article
Seed 3D Phenotyping Across Multiple Crops Using 3D Gaussian Splatting
by Jun Gao, Chao Zhu, Junguo Hu, Fei Deng, Zhaoxin Xu and Xiaomin Wang
Agriculture 2025, 15(22), 2329; https://doi.org/10.3390/agriculture15222329 - 8 Nov 2025
Viewed by 3122
Abstract
This study introduces a versatile seed 3D reconstruction method that is applicable to multiple crops, including maize, wheat, and rice, and designed to overcome the inefficiency and subjectivity of manual measurements and the high costs of laser-based phenotyping. A panoramic video of the seed is captured and processed through frame sampling to extract multi-view images. Structure-from-Motion (SFM) is employed for sparse reconstruction and camera pose estimation, while 3D Gaussian Splatting (3DGS) is utilized for high-fidelity dense reconstruction, generating detailed point cloud models. The subsequent point cloud preprocessing, filtering, and segmentation enable the extraction of key phenotypic parameters, including length, width, height, surface area, and volume. The experimental evaluations demonstrated a high measurement accuracy, with coefficients of determination (R²) for length, width, and height reaching 0.9361, 0.8889, and 0.946, respectively. Moreover, the reconstructed models exhibit superior image quality, with peak signal-to-noise ratio (PSNR) values consistently ranging from 35 to 37 dB, underscoring the robustness of 3DGS in preserving fine structural details. Compared to conventional multi-view stereo (MVS) techniques, the proposed method can achieve significantly improved reconstruction accuracy and visual fidelity. The key outcomes of this study confirm that the 3DGS-based pipeline provides a highly accurate, efficient, and scalable solution for digital phenotyping, establishing a robust foundation for its application across diverse crop species. Full article
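
For readers unfamiliar with the PSNR figures quoted above, the metric can be computed as in the sketch below. It uses the standard definition, 10 * log10(MAX^2 / MSE); the flat-list image representation and the 8-bit default `max_value` are simplifying assumptions, and the paper's exact evaluation protocol is not specified in the abstract.

```python
import math

def psnr(reference, rendered, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel sequences."""
    if len(reference) != len(rendered) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixel values.
    mse = sum((a - b) ** 2 for a, b in zip(reference, rendered)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher values mean the rendered view is closer to the reference image.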
(This article belongs to the Section Artificial Intelligence and Digital Agriculture)

35 pages, 20479 KB  
Article
Comprehensive Forensic Tool for Crime Scene and Traffic Accident 3D Reconstruction
by Alejandra Ospina-Bohórquez, Esteban Ruiz de Oña, Roy Yali, Emmanouil Patsiouras, Katerina Margariti and Diego González-Aguilera
Algorithms 2025, 18(11), 707; https://doi.org/10.3390/a18110707 - 7 Nov 2025
Cited by 1 | Viewed by 3448
Abstract
This article presents a comprehensive forensic tool for crime scene and traffic accident investigations that integrates advanced 3D reconstruction with semantic and dynamic analyses. The tool facilitates the accurate documentation and preservation of crime scenes through photogrammetric techniques, producing detailed 3D models from images or video captured under specified protocols. The system includes modules for semantic analysis, enabling object detection and classification in 3D point clouds and 2D images. By employing machine learning methods such as the Random Forest model for point cloud classification and the YOLOv8 architecture for object detection, the tool enhances the accuracy and reliability of forensic analysis. Furthermore, a dynamic analysis module supports ballistic trajectory calculations for crime scene investigations and vehicle impact speed estimation using the Equivalent Barrier Speed (EBS) model for traffic accidents. These capabilities are integrated into a single, user-friendly platform that improves on existing forensic tools, which often focus on a single task and require specialist expertise. The tool provides a robust, accessible solution for law enforcement agencies, enabling more efficient and precise forensic investigations across different scenarios.
(This article belongs to the Special Issue Modern Algorithms for Image Processing and Computer Vision)

21 pages, 5023 KB  
Article
Robust 3D Target Detection Based on LiDAR and Camera Fusion
by Miao Jin, Bing Lu, Gang Liu, Yinglong Diao, Xiwen Chen and Gaoning Nie
Electronics 2025, 14(21), 4186; https://doi.org/10.3390/electronics14214186 - 27 Oct 2025
Cited by 2 | Viewed by 1959
Abstract
Autonomous driving relies on multimodal sensors to acquire environmental information that supports decision making and control. While significant progress has been made in 3D object detection regarding point cloud processing and multi-sensor fusion, existing methods still suffer from shortcomings that severely restrict the practical performance of perception systems: sparse point clouds of foreground targets, fusion instability caused by fluctuating sensor data quality, and inadequate modeling of cross-frame temporal consistency in video streams. To address these issues, this paper proposes a multimodal video stream 3D object detection framework based on reliability evaluation. Specifically, it estimates the reliability of each modality by evaluating the Region of Interest (RoI) features of the cameras and LiDARs, and adaptively adjusts their contribution ratios in the fusion process accordingly. Additionally, a target-level semantic soft matching graph is constructed within the RoI. Combined with spatial self-attention and temporal cross-attention mechanisms, the spatio-temporal correlations between consecutive frames are exploited to achieve feature completion and enhancement. Verification on the nuScenes dataset shows that the proposed algorithm achieves 67.3% mAP and 70.6% NDS, the two core metrics, outperforming existing mainstream 3D object detection algorithms. Ablation experiments confirm that each module contributes to overall performance, and the algorithm exhibits better robustness and generalization in dynamic, complex scenarios.
