Search Results (101)

Search Parameters:
Keywords = Semi-Global Matching

28 pages, 32809 KB  
Article
LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
by Boya Wang, Shuo Wang, Yibin Han, Linfeng Xu and Dong Ye
Remote Sens. 2025, 17(19), 3349; https://doi.org/10.3390/rs17193349 - 1 Oct 2025
Abstract
We present a (Light)weight (S)atellite–(A)erial feature (M)atching framework (LiteSAM) for robust UAV absolute visual localization (AVL) in GPS-denied environments. Existing satellite–aerial matching methods struggle with large appearance variations, texture-scarce regions, and limited efficiency for real-time UAV applications. LiteSAM integrates three key components to address these issues. First, efficient multi-scale feature extraction optimizes representation, reducing inference latency for edge devices. Second, a Token Aggregation–Interaction Transformer (TAIFormer) with a convolutional token mixer (CTM) models inter- and intra-image correlations, enabling robust global–local feature fusion. Third, a MinGRU-based dynamic subpixel refinement module adaptively learns spatial offsets, enhancing subpixel-level matching accuracy and cross-scenario generalization. Experiments show that LiteSAM achieves competitive performance across multiple datasets. On UAV-VisLoc, LiteSAM attains an RMSE@30 of 17.86 m, outperforming state-of-the-art semi-dense methods such as EfficientLoFTR. Its optimized variant, LiteSAM (opt., without dual softmax), delivers inference times of 61.98 ms on standard GPUs and 497.49 ms on NVIDIA Jetson AGX Orin, which are 22.9% and 19.8% faster than EfficientLoFTR (opt.), respectively. With 6.31M parameters, 2.4× fewer than EfficientLoFTR's 15.05M, LiteSAM is well suited to edge deployment. Extensive evaluations on natural image matching and downstream vision tasks confirm its superior accuracy and efficiency for general feature matching.
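The "opt." variant above drops the dual-softmax step, so it is worth seeing what that step does. Below is a minimal NumPy sketch of dual-softmax mutual-nearest-neighbor matching in the style popularized by LoFTR-class matchers; the temperature and threshold values are illustrative assumptions, not LiteSAM's actual parameters.

```python
import numpy as np

def dual_softmax_match(sim, temperature=0.1, threshold=0.2):
    """Turn a similarity matrix into mutual-nearest-neighbor matches.

    sim: (N, M) similarity scores between features of two images.
    Returns a list of (i, j, confidence) match triples.
    """
    s = sim / temperature
    # Softmax over rows and over columns, then combine elementwise.
    p_rows = np.exp(s - s.max(axis=1, keepdims=True))
    p_rows /= p_rows.sum(axis=1, keepdims=True)
    p_cols = np.exp(s - s.max(axis=0, keepdims=True))
    p_cols /= p_cols.sum(axis=0, keepdims=True)
    conf = p_rows * p_cols
    # Keep mutual nearest neighbors above the confidence threshold.
    matches = []
    for i in range(conf.shape[0]):
        j = int(conf[i].argmax())
        if conf[:, j].argmax() == i and conf[i, j] > threshold:
            matches.append((i, j, float(conf[i, j])))
    return matches

# Toy similarity: feature 0 of image A corresponds to feature 1 of
# image B, and feature 1 of A to feature 0 of B.
sim = np.array([[0.1, 0.9, 0.0],
                [0.8, 0.1, 0.1]])
matches = dual_softmax_match(sim)
```

Skipping this step (as in the "opt." variant) saves the two softmax passes over the full score matrix, which is where part of the reported speedup plausibly comes from.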
25 pages, 17492 KB  
Article
Temporal and Spatial Upscaling with PlanetScope Data: Predicting Relative Canopy Dieback in the Piñon-Juniper Woodlands of Utah
by Elliot S. Shayle and Dirk Zeuss
Remote Sens. 2025, 17(19), 3323; https://doi.org/10.3390/rs17193323 - 28 Sep 2025
Abstract
Drought-induced forest mortality threatens biodiversity globally, particularly in arid and semi-arid woodlands. The continual development of remote sensing approaches enables enhanced monitoring of forest health. Herein, we investigate the ability of a limited ground-truthed canopy dieback dataset and the satellite-image-derived Normalised Difference Vegetation Index (NDVI) to support inferences about forest health as the temporal and spatial extent of the data grows beyond the original collection. We used ground-truthed observations of relative canopy mortality from the Pinus edulis–Juniperus osteosperma woodlands of southeastern Utah, United States of America, collected after the 2017–2018 drought, together with PlanetScope satellite imagery. Through assessing different modelling approaches, we found that NDVI is significantly associated with sitewide mean canopy dieback, with beta regression being the optimal modelling framework due to the bounded nature of the response variable, relative canopy dieback. Model performance was further improved by incorporating the proportion of J. osteosperma as an interaction term, matching reports of species-specific differential dieback. A time-series analysis revealed that NDVI retained its predictive power for our whole testing period, four years after the initial ground-truthing, thus enabling retrospective inference of defoliation and regreening. A spatial random forest model trained on our ground-truthed observations accurately predicted dieback across the broader landscape. These findings demonstrate that modest field campaigns combined with high-resolution satellite data can generate reliable, scalable insights into forest health, offering a cost-effective method for monitoring drought-impacted ecosystems under climate change.
(This article belongs to the Section Forest Remote Sensing)
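The analysis above hinges on NDVI computed from red and near-infrared reflectance. As a minimal sketch (the band arrays and reflectance values below are invented for illustration, not PlanetScope data):

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)

# Toy 2x2 reflectance patches: healthy canopy has high NIR, low red,
# so its NDVI is close to 1; dieback pushes NDVI toward 0.
nir = np.array([[0.5, 0.4], [0.3, 0.2]])
red = np.array([[0.1, 0.1], [0.2, 0.2]])
v = ndvi(nir, red)
```

The bounded (0, 1) response variable mentioned above, relative canopy dieback, is what motivates beta regression rather than ordinary least squares on these NDVI predictors.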
24 pages, 1822 KB  
Article
A Trinocular System for Pedestrian Localization by Combining Template Matching with Geometric Constraint Optimization
by Jinjing Zhao, Sen Huang, Yancheng Li, Jingjing Xu and Shengyong Xu
Sensors 2025, 25(19), 5970; https://doi.org/10.3390/s25195970 - 25 Sep 2025
Abstract
Pedestrian localization is a fundamental sensing task for intelligent outdoor systems. To overcome the limitations of accuracy and efficiency in conventional binocular approaches, this study introduces a trinocular stereo vision framework that integrates template matching with geometric constraint optimization. The system employs a trinocular camera configuration arranged in an equilateral triangle, which enables complementary perspectives beyond a standard horizontal baseline. Based on this setup, an initial depth estimate is obtained through multi-scale template matching on the primary binocular pair. The additional vertical viewpoint is then incorporated by enforcing three-view geometric consistency, yielding refined and more reliable depth estimates. We evaluate the method on a custom outdoor trinocular dataset. Experimental results demonstrate that the proposed approach achieves a mean absolute error of 0.435 m with an average processing time of 3.13 ms per target. This performance surpasses both the binocular Semi-Global Block Matching (0.536 m) and RAFT-Stereo (0.623 m for the standard model and 0.621 m for the real-time model without fine-tuning). When combined with the YOLOv8-s detector, the system can localize pedestrians in 7.52 ms per frame, maintaining real-time operation (> 30 Hz) for up to nine individuals, with a total end-to-end latency of approximately 32.56 ms.
(This article belongs to the Section Navigation and Positioning)
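The initial depth estimate above comes from template matching on the primary binocular pair. A minimal single-scale sketch using zero-normalised cross-correlation follows; the paper's exact matching cost and multi-scale scheme are not specified here, so this is an illustrative brute-force version with synthetic data.

```python
import numpy as np

def match_template_ncc(image, template):
    """Slide the template over the image and return the (row, col) of the
    best zero-normalised cross-correlation score, plus that score."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    tn = np.linalg.norm(t)
    best, best_pos = -np.inf, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            p = image[y:y + th, x:x + tw]
            p = p - p.mean()
            denom = np.linalg.norm(p) * tn
            score = (p * t).sum() / denom if denom > 0 else -1.0
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best

# Synthetic search: the template is an exact crop of the image,
# so the best match should land at the crop's top-left corner.
rng = np.random.default_rng(1)
image = rng.normal(size=(30, 40))
template = image[12:18, 20:28].copy()
(y, x), score = match_template_ncc(image, template)
```

In a stereo pair, restricting the search to the same scanline (epipolar constraint) turns this 2D search into the 1D disparity search used by the methods throughout this page.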
48 pages, 18119 KB  
Article
Dense Matching with Low Computational Complexity for Disparity Estimation in the Radargrammetric Approach of SAR Intensity Images
by Hamid Jannati, Mohammad Javad Valadan Zoej, Ebrahim Ghaderpour and Paolo Mazzanti
Remote Sens. 2025, 17(15), 2693; https://doi.org/10.3390/rs17152693 - 3 Aug 2025
Viewed by 543
Abstract
Synthetic Aperture Radar (SAR) images and optical imagery have high potential for extracting digital elevation models (DEMs). The two main approaches for deriving elevation models from SAR data are interferometry (InSAR) and radargrammetry. Adapted from photogrammetric principles, radargrammetry relies on disparity model estimation as its core component. Matching strategies in radargrammetry typically follow local, global, or semi-global methodologies. Local methods, while having higher accuracy, especially in low-texture SAR images, require larger kernel sizes, leading to quadratic computational complexity. Conversely, global and semi-global models produce more consistent and higher-quality disparity maps but are computationally more intensive than local methods with small kernels and require more memory (RAM). In this study, inspired by the advantages of local matching algorithms, a computationally efficient and novel model is proposed for extracting corresponding pixels in SAR-intensity stereo images. To enhance accuracy, the proposed two-stage algorithm operates without an image pyramid structure. Notably, unlike traditional local and global models, the computational complexity of the proposed approach remains stable as the input size or kernel dimensions increase while memory consumption stays low. Compared to a pyramid-based local normalized cross-correlation (NCC) algorithm and adaptive semi-global matching (SGM) models, the proposed method maintains good accuracy comparable to adaptive SGM while reducing processing time by up to 50% relative to pyramid SGM and achieving a 35-fold speedup over the local NCC algorithm with an optimal kernel size. Validated on a Sentinel-1 stereo pair with a 10 m ground-pixel size, the proposed algorithm yields a DEM with an average accuracy of 34.1 m.
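The claim that computational cost stays stable as kernel size grows is the kind of property typically obtained with summed-area tables (integral images), which return any window sum in O(1) regardless of window size. A sketch of that building block, not the authors' actual algorithm:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading row and column of zeros, so that
    ii[y, x] is the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def window_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] in four lookups, independent of h and w."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(12, dtype=float).reshape(3, 4)
ii = integral_image(img)
s = window_sum(ii, 1, 1, 2, 2)   # sum of the 2x2 block img[1:3, 1:3]
```

With per-pixel matching costs precomputed per disparity, box-filtered cost aggregation over any kernel size then costs the same four lookups per pixel, which is one standard way to break the quadratic dependence on kernel size that the abstract attributes to plain local matching.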
24 pages, 15100 KB  
Article
Sugarcane Feed Volume Detection in Stacked Scenarios Based on Improved YOLO-ASM
by Xiao Lai and Guanglong Fu
Agriculture 2025, 15(13), 1428; https://doi.org/10.3390/agriculture15131428 - 2 Jul 2025
Viewed by 404
Abstract
Improper regulation of sugarcane feed volume can lead to harvester inefficiency or clogging. Accurate recognition of feed volume is therefore critical. However, visual recognition is challenging due to sugarcane stacking during feeding. To address this, we propose YOLO-ASM (YOLO Accurate Stereo Matching), a novel detection method. At the target detection level, we integrate a Convolutional Block Attention Module (CBAM) into the YOLOv5s backbone network. This significantly reduces missed detections and low-confidence predictions in dense stacking scenarios, improving detection speed by 28.04% and increasing mean average precision (mAP) by 5.31%. At the stereo matching level, we enhance the SGBM (Semi-Global Block Matching) algorithm through improved cost calculation and cost aggregation, resulting in Opti-SGBM (Optimized SGBM). This double-cost fusion approach strengthens texture feature extraction in stacked sugarcane, effectively reducing noise in the generated depth maps. The optimized algorithm yields depth maps with smaller errors relative to the original images, significantly improving depth accuracy. Experimental results demonstrate that the fused YOLO-ASM algorithm reduces sugarcane volume error rates across feed volumes of one to six by 3.45%, 3.23%, 6.48%, 5.86%, 9.32%, and 11.09%, respectively, compared to the original stereo matching algorithm. It also accelerates feed volume detection by approximately 100%, providing a high-precision solution for anti-clogging control in sugarcane harvester conveyor systems.
(This article belongs to the Section Agricultural Technology)
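Several entries on this page build on Semi-Global (Block) Matching, which aggregates per-pixel matching costs along scanlines with a small penalty P1 for disparity changes of ±1 and a larger penalty P2 for bigger jumps. A single-direction NumPy sketch of the aggregation recurrence (the toy cost volume and penalty values are illustrative, not from any of these papers):

```python
import numpy as np

def sgm_aggregate_1d(cost, P1=2.0, P2=8.0):
    """Single-direction SGM cost aggregation along a scanline:
    L(x, d) = C(x, d) - min_k L(x-1, k)
            + min(L(x-1, d),
                  L(x-1, d-1) + P1, L(x-1, d+1) + P1,
                  min_k L(x-1, k) + P2)
    """
    W, D = cost.shape
    L = np.zeros((W, D))
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        m = prev.min()
        for d in range(D):
            cands = [prev[d], m + P2]
            if d > 0:
                cands.append(prev[d - 1] + P1)
            if d < D - 1:
                cands.append(prev[d + 1] + P1)
            L[x, d] = cost[x, d] + min(cands) - m
    return L

# Toy (width x ndisp) cost volume: true disparity is 1 everywhere, but the
# raw cost at x = 2 weakly prefers disparity 0 (an outlier pixel).
cost = np.array([[5.0, 0.0, 5.0],
                 [5.0, 0.0, 5.0],
                 [1.0, 2.0, 5.0],
                 [5.0, 0.0, 5.0]])
disp_raw = cost.argmin(axis=1)
disp_sgm = sgm_aggregate_1d(cost).argmin(axis=1)
```

Winner-take-all on the raw cost keeps the outlier, while the aggregated cost smooths it away; full SGM sums such aggregations over 4 to 16 scanline directions before taking the per-pixel minimum.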
12 pages, 3508 KB  
Article
Improvement of the Cross-Scale Multi-Feature Stereo Matching Algorithm
by Nan Chen, Dongri Shan and Peng Zhang
Appl. Sci. 2025, 15(11), 5837; https://doi.org/10.3390/app15115837 - 22 May 2025
Viewed by 605
Abstract
With the continuous advancement of industrialization and intelligentization, stereo-vision-based measurement technology for large-scale components has become a prominent research focus. To address weak-textured regions in large-scale component images and reduce mismatches in stereo matching, we propose a cross-scale multi-feature stereo matching algorithm. In the cost-computation stage, the sum of absolute differences (SAD), census, and modified census cost aggregation are employed as cost-calculation methods. During the cost-aggregation phase, cross-scale theory is introduced to fuse multi-scale cost volumes using distinct aggregation parameters through a cross-scale framework. Experimental results on both benchmark and real-world datasets demonstrate that the enhanced algorithm achieves an average mismatch rate of 12.25%, exhibiting superior robustness compared to conventional census transform and semi-global matching (SGM) algorithms.
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
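The census transform used in the cost-computation stage encodes each pixel as a bit string of neighbour-versus-centre comparisons; the matching cost is then the Hamming distance between codes, which makes it robust to monotonic brightness changes. A minimal 3×3 sketch (border handling via wrap-around is a simplification for brevity):

```python
import numpy as np

def census_transform(img):
    """3x3 census transform: an 8-bit code per pixel, with each bit set
    when the corresponding neighbour is darker than the centre pixel."""
    out = np.zeros(img.shape, dtype=np.uint8)
    bit = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            out |= (shifted < img).astype(np.uint8) << bit
            bit += 1
    return out

def hamming_cost(code_a, code_b):
    """Matching cost = number of differing census bits per pixel."""
    x = np.bitwise_xor(code_a, code_b)
    return np.unpackbits(x[..., None], axis=-1).sum(axis=-1)

img = np.array([[1, 2, 3],
                [4, 9, 6],
                [7, 8, 9]], dtype=np.uint8)
codes = census_transform(img)
# The code only depends on orderings, so a uniform brightness shift
# leaves the census cost at zero everywhere.
cost_same = hamming_cost(codes, census_transform(img + 10))
```

This invariance to radiometric offsets is why census costs are commonly fused with intensity-based costs such as SAD, as in the cross-scale algorithm above.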
27 pages, 49665 KB  
Article
ETQ-Matcher: Efficient Quadtree-Attention-Guided Transformer for Detector-Free Aerial–Ground Image Matching
by Chuan Xu, Beikang Wang, Zhiwei Ye and Liye Mei
Remote Sens. 2025, 17(7), 1300; https://doi.org/10.3390/rs17071300 - 5 Apr 2025
Viewed by 1082
Abstract
UAV aerial–ground feature matching is used for remote sensing applications, such as urban mapping, disaster management, and surveillance. However, current semi-dense detectors are sparse and inadequate for comprehensively addressing problems like scale variations from inherent viewpoint differences, occlusions, illumination changes, and repeated textures. To address these issues, we propose an efficient quadtree-attention-guided transformer (ETQ-Matcher) based on efficient LoFTR, which integrates a multi-layer transformer with channel attention (MTCA) to capture global features. Specifically, to tackle various complex urban building scenarios, we propose quadtree-attention feature fusion (QAFF), which implements alternating self- and cross-attention operations to capture global image context and establish correlations between image pairs. We collect 12 pairs of UAV remote sensing images using drones and handheld devices, and we further utilize representative multi-source remote sensing images along with the MegaDepth dataset to demonstrate its strong generalization ability. We compare ETQ-Matcher to classic algorithms, and our experimental results demonstrate its superior performance in challenging aerial–ground urban scenes and multi-source remote sensing scenarios.
14 pages, 8718 KB  
Technical Note
A Novel Bias-Adjusted Estimator Based on Synthetic Confusion Matrix (BAESCM) for Subregion Area Estimation
by Bo Zhang, Xuehong Chen, Xihong Cui and Miaogen Shen
Remote Sens. 2025, 17(7), 1145; https://doi.org/10.3390/rs17071145 - 24 Mar 2025
Viewed by 527
Abstract
Accurate area estimation of specific land cover/use types in administrative or natural units is crucial for various applications. However, land cover areas derived directly from remote sensing classification maps via pixel counting often exhibit non-negligible bias. Thus, various design-based area estimators (e.g., the bias-adjusted estimator, the model-assisted difference estimator, and the model-assisted ratio estimator derived from a confusion matrix), which combine the information of ground truth samples and the classification map, have been applied to provide more accurate area estimates and uncertainty inference. These estimators work well for estimating areas in a region with sufficient ground truth samples, but they encounter challenges when estimating areas in multiple subregions where the samples within each subregion are limited. To overcome this limitation, we propose a novel Bias-Adjusted Estimator based on the Synthetic Confusion Matrix (BAESCM) for estimating land cover areas in subregions by downscaling the global sample information to the subregion scale. First, several clusters are generated from remote sensing data through the K-means method (with the number of clusters being much smaller than the number of subregions). Then, the cluster confusion matrix is estimated from the samples in each cluster. Assuming that the classification error distribution within each cluster remains consistent across different subregions, the confusion matrix of a subregion can be synthesized as a weighted sum of the cluster confusion matrices, with weights given by the cluster abundances in the subregion. Finally, the classification bias at the subregion scale can be estimated from the synthetic confusion matrix, and the area counted from the classification map is corrected accordingly. Moreover, we introduce a semi-empirical method for inferring the confidence intervals of the estimated areas, considering both the sampling variance due to sampling randomness and the downscaling variance due to the heterogeneity in classification error distribution within a cluster. We tested our method through simulated experiments for county-level area estimation of soybean crops in Nebraska, USA. The results show that the root mean square errors (RMSEs) of the subregion area estimates using BAESCM are reduced by 21–64% compared to estimates based on pixel counting from the classification map. Additionally, the true coverages of the confidence intervals estimated by our method approximately matched their nominal coverages. Compared with traditional design-based estimators, the proposed BAESCM achieves better estimation accuracy of subregion areas when the sample size is limited. Therefore, the proposed method is particularly recommended for studies of subregion land cover areas in the case of inadequate ground truth samples.
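The core bookkeeping described above can be sketched as follows: cluster confusion matrices (expressed as area proportions, rows = mapped class, columns = reference class) are combined with cluster-abundance weights, and the column sums of the synthesized matrix give bias-adjusted class proportions. All numbers below are invented for illustration and are not from the paper.

```python
import numpy as np

def synthetic_confusion(cluster_cms, weights):
    """Synthesize a subregion confusion matrix as the weighted sum of
    cluster confusion matrices; weights are the cluster abundances in
    the subregion and should sum to 1."""
    cms = np.asarray(cluster_cms, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (w[:, None, None] * cms).sum(axis=0)

def bias_adjusted_area(cm, total_area):
    """Column sums estimate the true proportion of each reference class."""
    return cm.sum(axis=0) * total_area

# Two clusters, two classes (0 = other, 1 = soybean); rows mapped, cols true.
cm_clusters = [
    [[0.70, 0.05],    # cluster A: soybean omission exceeds commission
     [0.02, 0.23]],
    [[0.50, 0.10],    # cluster B: more overall confusion
     [0.06, 0.34]],
]
weights = [0.5, 0.5]          # assumed cluster abundances in the subregion
cm = synthetic_confusion(cm_clusters, weights)
areas = bias_adjusted_area(cm, total_area=100.0)   # e.g. km^2
mapped_soy = cm[1].sum() * 100.0                   # pixel-counting estimate
```

Here omission of soybean (off-diagonal mass in column 1) exceeds commission (off-diagonal mass in row 1), so the adjusted soybean area comes out above the pixel-counting figure, which is exactly the bias the estimator is meant to correct.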
14 pages, 3344 KB  
Article
Robot-Based Procedure for 3D Reconstruction of Abdominal Organs Using the Iterative Closest Point and Pose Graph Algorithms
by Birthe Göbel, Jonas Huurdeman, Alexander Reiterer and Knut Möller
J. Imaging 2025, 11(2), 44; https://doi.org/10.3390/jimaging11020044 - 5 Feb 2025
Cited by 1 | Viewed by 1550
Abstract
Image-based 3D reconstruction enables robot-assisted interventions and image-guided navigation, which are emerging technologies in laparoscopy. When a robotic arm guides a laparoscope for image acquisition, hand–eye calibration is required to know the transformation between the camera and the robot flange. The calibration procedure is complex and must be repeated after each intervention (when the laparoscope is dismounted for cleaning); in the field, surgeons and their assistants cannot be expected to do so. Thus, our approach is a procedure for robot-based multi-view 3D reconstruction without hand–eye calibration, using pose optimization algorithms instead. In this work, a robotic arm and a stereo laparoscope form the experimental setup. The procedure includes the Semi-Global Matching stereo algorithm from OpenCV for depth measurement, along with the multiscale color iterative closest point algorithm and the multiway registration algorithm using a pose graph, both from Open3D (v0.19), for pose optimization. The procedure is evaluated quantitatively and qualitatively on ex vivo organs. The results are a low root mean squared error (1.1–3.37 mm) and dense point clouds. The proposed procedure leads to a plausible 3D model, and there is no need for complex hand–eye calibration, as this step can be compensated for by pose optimization algorithms.
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
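At the heart of each ICP iteration is the closed-form least-squares rigid transform between matched point sets (the Kabsch/Umeyama solution). Open3D computes this internally, but the step can be sketched in NumPy as follows, with synthetic points and an illustrative rotation:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form least-squares rotation R and translation t such that
    dst ~= R @ src + t (the per-iteration core of point-to-point ICP)."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic cloud transformed by a known rotation about z and translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
dst = src @ R_true.T + t_true
R, t = best_rigid_transform(src, dst)
```

ICP alternates this solve with nearest-neighbour correspondence search; the pose graph mentioned in the abstract then distributes the residual drift across all pairwise registrations instead of letting it accumulate along the camera trajectory.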
27 pages, 3367 KB  
Article
Binocular Video-Based Automatic Pixel-Level Crack Detection and Quantification Using Deep Convolutional Neural Networks for Concrete Structures
by Liqu Liu, Bo Shen, Shuchen Huang, Runlin Liu, Weizhang Liao, Bin Wang and Shuo Diao
Buildings 2025, 15(2), 258; https://doi.org/10.3390/buildings15020258 - 17 Jan 2025
Cited by 5 | Viewed by 1341
Abstract
Crack detection and quantification play crucial roles in assessing the condition of concrete structures. Herein, a novel real-time crack detection and quantification method that leverages binocular vision and a lightweight deep learning model is proposed. The proposed method comprises four modules: a lightweight classification algorithm, a high-precision segmentation algorithm, a semi-global block matching (SGBM) algorithm, and a crack quantification technique. Based on the crack segmentation results, a framework is developed for quantitative analysis of the major geometric parameters, including crack length, crack width, and crack angle of orientation at the pixel level. Results indicate that, by incorporating channel attention and spatial attention mechanisms in the MBConv module, the detection accuracy of the improved EfficientNetV2 increased by 1.6% compared with the original EfficientNetV2. Moreover, the proposed quantification method achieves low quantification errors of 2%, 4.5%, and 4% for the crack length, width, and angle of orientation, respectively. The proposed method can contribute to crack detection and quantification in practical use by being deployed on smart devices.
(This article belongs to the Special Issue Seismic Performance and Durability of Engineering Structures)
20 pages, 4856 KB  
Article
Enhancing the Ground Truth Disparity by MAP Estimation for Developing a Neural-Net Based Stereoscopic Camera
by Hanbit Gil, Sehyun Ryu and Sungmin Woo
Sensors 2024, 24(23), 7761; https://doi.org/10.3390/s24237761 - 4 Dec 2024
Viewed by 1871
Abstract
This paper presents a novel method to enhance ground truth disparity maps generated by Semi-Global Matching (SGM) using Maximum a Posteriori (MAP) estimation. SGM, while not producing visually appealing outputs like neural networks, offers high disparity accuracy in valid regions and avoids the generalization issues often encountered with neural network-based disparity estimation. However, SGM struggles with occlusions and textureless areas, leading to invalid disparity values. Our approach, though relatively simple, mitigates these issues by interpolating invalid pixels using surrounding disparity information and Bayesian inference, improving both the visual quality of disparity maps and their usability for training neural network-based commercial depth-sensing devices. Experimental results validate that our enhanced disparity maps preserve SGM's accuracy in valid regions while improving the overall performance of neural networks on both synthetic and real-world datasets. This method provides a robust framework for advanced stereoscopic camera systems, particularly in autonomous applications.
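As a greatly simplified stand-in for the Bayesian interpolation described above, invalid SGM pixels can be filled from the median of their valid neighbours; the paper's MAP machinery is more principled, so treat this only as an illustration of the underlying idea:

```python
import numpy as np

def fill_invalid(disp, invalid=-1, max_iters=10):
    """Iteratively replace invalid disparities with the median of their
    valid 8-neighbours (a crude stand-in for MAP-based interpolation)."""
    d = disp.astype(float).copy()
    H, W = d.shape
    for _ in range(max_iters):
        bad = np.argwhere(d == invalid)
        if bad.size == 0:
            break
        for y, x in bad:
            vals = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if (dy or dx) and 0 <= yy < H and 0 <= xx < W \
                            and d[yy, xx] != invalid:
                        vals.append(d[yy, xx])
            if vals:
                d[y, x] = np.median(vals)
    return d

# One invalid pixel (value -1) surrounded by a consistent disparity of 10.
disp = np.array([[10, 10, 10],
                 [10, -1, 10],
                 [10, 10, 10]], dtype=float)
filled = fill_invalid(disp)
```

A prior over local disparity smoothness, as in the paper, additionally weighs how plausible each candidate fill is rather than taking a plain median.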
21 pages, 7841 KB  
Article
Research on a Method for Measuring the Pile Height of Materials in Agricultural Product Transport Vehicles Based on Binocular Vision
by Wang Qian, Pengyong Wang, Hongjie Wang, Shuqin Wu, Yang Hao, Xiaoou Zhang, Xinyu Wang, Wenyan Sun, Haijie Guo and Xin Guo
Sensors 2024, 24(22), 7204; https://doi.org/10.3390/s24227204 - 11 Nov 2024
Cited by 1 | Viewed by 1205
Abstract
The advancement of unloading technology in combine harvesting is crucial for the intelligent development of agricultural machinery. Accurately measuring material pile height in transport vehicles is essential, as uneven accumulation can lead to spillage and voids, reducing loading efficiency. Relying solely on manual observation to measure stack height can decrease harvesting efficiency and pose safety risks due to driver distraction. This research applies binocular vision to agricultural harvesting, proposing a novel method that uses a stereo matching algorithm to measure material pile height during harvesting. By comparing distance measurements taken in both empty and loaded states, the method determines stack height, and a linear regression model processes the stack height data to enhance measurement accuracy. A binocular vision system was established, applying Zhang's calibration method on the MATLAB (R2019a) platform to correct camera parameters, achieving a calibration error of 0.15 pixels. The study implemented block matching (BM) and semi-global block matching (SGBM) algorithms using the OpenCV (4.8.1) library on the PyCharm (2020.3.5) platform for stereo matching, generating disparity and pseudo-color maps. Three-dimensional coordinates of key points on the piled material were calculated to measure distances from the vehicle container bottom and the material surface to the binocular camera, allowing the material pile height to be calculated; a linear regression model was then applied to correct the data. The results indicate that, by employing binocular stereo vision and stereo matching algorithms followed by linear regression, this method can accurately calculate material pile height. The average relative error was 3.70% for the BM algorithm and 3.35% for the SGBM algorithm, both within the acceptable precision range. While the SGBM algorithm was, on average, 46 ms slower than the BM algorithm, both maintained errors under 7% and computation times under 100 ms, meeting the real-time measurement requirements for combine harvesting. In practical operations, this method can effectively measure material pile height in transport vehicles. The choice of matching algorithm should consider container size, material properties, and the balance between measurement time, accuracy, and disparity map completeness. This approach aids manual adjustment of machinery posture and provides data support for future autonomous master–slave collaborative operations in combine harvesting.
(This article belongs to the Special Issue AI, IoT and Smart Sensors for Precision Agriculture)
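The measurement described above reduces to two pinhole-stereo depth readings (Z = f·B/d, with focal length f in pixels, baseline B, and disparity d) and a fitted linear correction. A sketch with invented rig parameters and synthetic calibration pairs; none of these numbers come from the paper:

```python
import numpy as np

def stereo_depth(disparity_px, focal_px, baseline_m):
    """Pinhole stereo depth: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Assumed rig parameters (illustrative only).
F, B = 1200.0, 0.2
z_empty = stereo_depth(60.0, F, B)     # camera to container bottom
z_loaded = stereo_depth(120.0, F, B)   # camera to material surface
pile_height = z_empty - z_loaded

# Linear correction fitted from paired (measured, ground-truth) heights,
# mirroring the paper's regression step; these pairs are synthetic.
measured = np.array([0.5, 1.0, 1.5, 2.0])
truth = measured * 0.96 + 0.03
slope, intercept = np.polyfit(measured, truth, 1)
corrected = slope * pile_height + intercept
```

Because depth is inversely proportional to disparity, the same pixel of disparity error costs more accuracy at long range, which is one reason a fitted correction on top of the raw stereo heights pays off.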
25 pages, 42422 KB  
Article
Conceptualization and First Realization Steps for a Multi-Camera System to Capture Tree Streamlining in Wind
by Frederik O. Kammel and Alexander Reiterer
Forests 2024, 15(11), 1846; https://doi.org/10.3390/f15111846 - 22 Oct 2024
Viewed by 1039
Abstract
Forests and trees provide a variety of essential ecosystem services. Maintaining them is becoming increasingly important, as global and regional climate change is already leading to major changes in the structure and composition of forests. To minimize storm damage, the tree and stand characteristics on which storm damage risk depends must be known. Previous work in this field has relied on tree-pulling tests and on targets attached to selected branches. The latter fail, however, because the mass of such targets is high compared to the mass of the branches, so the targets significantly influence the tree's response, and because they cannot capture dynamic wind loads. We therefore installed a multi-camera system consisting of nine cameras mounted on four masts surrounding a tree. With these cameras acquiring images at a rate of 10 Hz, we use photogrammetry and a semi-automatic feature-matching workflow to deduce a 3D model of the tree crown over time. Together with motion sensors mounted on the tree and tree-pulling tests, we intend to learn more about the wind-induced response of all dominant aerial tree parts, including the crown, under real wind conditions, as well as damping processes in tree motion.
(This article belongs to the Section Natural Hazards and Risk Management)
20 pages, 54021 KB  
Article
Point of Interest Recognition and Tracking in Aerial Video during Live Cycling Broadcasts
by Jelle Vanhaeverbeke, Robbe Decorte, Maarten Slembrouck, Sofie Van Hoecke and Steven Verstockt
Appl. Sci. 2024, 14(20), 9246; https://doi.org/10.3390/app14209246 - 11 Oct 2024
Viewed by 1376
Abstract
Road cycling races, such as the Tour de France, captivate millions of viewers globally, combining competitive sportsmanship with the promotion of regional landmarks. Traditionally, points of interest (POIs) are highlighted during broadcasts using manually created static overlays, a process that is both outdated and labor-intensive. This paper presents a novel, fully automated methodology for detecting and tracking POIs in live helicopter video streams, aiming to streamline the visualization workflow and enhance viewer engagement. Our approach integrates a saliency and Segment Anything-based technique to propose potential POI regions, which are then recognized using a keypoint matching method that requires only a few reference images. This system supports both automatic and semi-automatic operations, allowing video editors to intervene when necessary, thereby balancing automation with manual control. The proposed pipeline demonstrated high effectiveness, achieving over 75% precision and recall in POI detection, and offers two tracking solutions: a traditional MedianFlow tracker and an advanced SAM 2 tracker. While the former provides speed and simplicity, the latter delivers superior segmentation tracking, albeit with higher computational demands. Our findings suggest that this methodology significantly reduces manual workload and opens new possibilities for interactive visualizations, enhancing the live viewing experience of cycling races.
38 pages, 98377 KB  
Article
FaSS-MVS: Fast Multi-View Stereo with Surface-Aware Semi-Global Matching from UAV-Borne Monocular Imagery
by Boitumelo Ruf, Martin Weinmann and Stefan Hinz
Sensors 2024, 24(19), 6397; https://doi.org/10.3390/s24196397 - 2 Oct 2024
Viewed by 1555
Abstract
With FaSS-MVS, we present a fast, surface-aware semi-global optimization approach for multi-view stereo that allows for rapid depth and normal map estimation from monocular aerial video data captured by unmanned aerial vehicles (UAVs). The data estimated by FaSS-MVS, in turn, facilitate online 3D mapping, meaning that a 3D map of the scene is generated immediately and incrementally as the image data are acquired or received. FaSS-MVS uses a hierarchical processing scheme in which depth and normal data, as well as corresponding confidence scores, are estimated in a coarse-to-fine manner, allowing efficient processing of the large scene depths inherent in oblique images acquired by UAVs flying at low altitudes. The actual depth estimation uses a plane-sweep algorithm for dense multi-image matching to produce depth hypotheses, from which the final depth map is extracted by means of a surface-aware semi-global optimization that reduces the fronto-parallel bias of Semi-Global Matching (SGM). Given the estimated depth map, pixel-wise surface normal information is then computed by reprojecting the depth map into a point cloud and computing the normal vectors within a confined local neighborhood. In a thorough quantitative and ablative study, we show that the accuracy of the 3D information computed by FaSS-MVS is close to that of state-of-the-art offline multi-view stereo approaches, with an error less than an order of magnitude higher than that of COLMAP. At the same time, the average runtime of FaSS-MVS for estimating a single depth and normal map is less than 14% of that of COLMAP, allowing us to perform online and incremental processing of full HD images at 1–2 Hz.
(This article belongs to the Special Issue Advances on UAV-Based Sensing and Imaging)