Search Results (101)

Search Parameters:
Keywords = Semi-Global Matching

28 pages, 32809 KB  
Article
LiteSAM: Lightweight and Robust Feature Matching for Satellite and Aerial Imagery
by Boya Wang, Shuo Wang, Yibin Han, Linfeng Xu and Dong Ye
Remote Sens. 2025, 17(19), 3349; https://doi.org/10.3390/rs17193349 - 1 Oct 2025
Abstract
We present a (Light)weight (S)atellite–(A)erial feature (M)atching framework (LiteSAM) for robust UAV absolute visual localization (AVL) in GPS-denied environments. Existing satellite–aerial matching methods struggle with large appearance variations, texture-scarce regions, and limited efficiency for real-time UAV applications. LiteSAM integrates three key components to address these issues. First, efficient multi-scale feature extraction optimizes representation, reducing inference latency for edge devices. Second, a Token Aggregation–Interaction Transformer (TAIFormer) with a convolutional token mixer (CTM) models inter- and intra-image correlations, enabling robust global–local feature fusion. Third, a MinGRU-based dynamic subpixel refinement module adaptively learns spatial offsets, enhancing subpixel-level matching accuracy and cross-scenario generalization. Experiments show that LiteSAM achieves competitive performance across multiple datasets. On UAV-VisLoc, LiteSAM attains an RMSE@30 of 17.86 m, outperforming state-of-the-art semi-dense methods such as EfficientLoFTR. Its optimized variant, LiteSAM (opt., without dual softmax), delivers inference times of 61.98 ms on standard GPUs and 497.49 ms on NVIDIA Jetson AGX Orin, which are 22.9% and 19.8% faster than EfficientLoFTR (opt.), respectively. With 6.31M parameters, 2.4× fewer than EfficientLoFTR's 15.05M, LiteSAM is well suited to edge deployment. Extensive evaluations on natural image matching and downstream vision tasks confirm its superior accuracy and efficiency for general feature matching.
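The "opt." variant above drops the dual-softmax step, so it is worth seeing what that step does. Below is a minimal NumPy sketch of dual-softmax mutual-nearest-neighbor matching in the style popularized by LoFTR-class matchers; the temperature and threshold values are illustrative assumptions, not LiteSAM's actual parameters.

```python
import numpy as np

def dual_softmax_match(sim, temperature=0.1, threshold=0.2):
    """Turn a similarity matrix into mutual-nearest-neighbor matches.

    sim: (N, M) similarity scores between features of two images.
    Returns a list of (i, j, confidence) match triples.
    """
    s = sim / temperature
    # Softmax over rows and over columns, then combine elementwise.
    p_rows = np.exp(s - s.max(axis=1, keepdims=True))
    p_rows /= p_rows.sum(axis=1, keepdims=True)
    p_cols = np.exp(s - s.max(axis=0, keepdims=True))
    p_cols /= p_cols.sum(axis=0, keepdims=True)
    conf = p_rows * p_cols
    # Keep mutual nearest neighbors above the confidence threshold.
    matches = []
    for i in range(conf.shape[0]):
        j = int(conf[i].argmax())
        if conf[:, j].argmax() == i and conf[i, j] > threshold:
            matches.append((i, j, float(conf[i, j])))
    return matches

# Toy similarity: feature 0 of image A corresponds to feature 1 of
# image B, and feature 1 of A to feature 0 of B.
sim = np.array([[0.1, 0.9, 0.0],
                [0.8, 0.1, 0.1]])
matches = dual_softmax_match(sim)
```

Skipping this step (as in the "opt." variant) saves the two softmax passes over the full score matrix, which is where part of the reported speedup plausibly comes from.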
25 pages, 17492 KB  
Article
Temporal and Spatial Upscaling with PlanetScope Data: Predicting Relative Canopy Dieback in the Piñon-Juniper Woodlands of Utah
by Elliot S. Shayle and Dirk Zeuss
Remote Sens. 2025, 17(19), 3323; https://doi.org/10.3390/rs17193323 - 28 Sep 2025
Abstract
Drought-induced forest mortality threatens biodiversity globally, particularly in arid and semi-arid woodlands. The continual development of remote sensing approaches enables enhanced monitoring of forest health. Herein, we investigate the ability of a limited ground-truthed canopy dieback dataset and the satellite-image-derived Normalised Difference Vegetation Index (NDVI) to support inferences about forest health as the temporal and spatial extent of the data grows beyond the original collection. We used ground-truthed observations of relative canopy mortality from the Pinus edulis–Juniperus osteosperma woodlands of southeastern Utah, United States of America, collected after the 2017–2018 drought, together with PlanetScope satellite imagery. Through assessing different modelling approaches, we found that NDVI is significantly associated with sitewide mean canopy dieback, with beta regression being the optimal modelling framework due to the bounded nature of the response variable, relative canopy dieback. Model performance was further improved by incorporating the proportion of J. osteosperma as an interaction term, matching reports of species-specific differential dieback. A time-series analysis revealed that NDVI retained its predictive power for our whole testing period, four years after the initial ground-truthing, thus enabling retrospective inference of defoliation and regreening. A spatial random forest model trained on our ground-truthed observations accurately predicted dieback across the broader landscape. These findings demonstrate that modest field campaigns combined with high-resolution satellite data can generate reliable, scalable insights into forest health, offering a cost-effective method for monitoring drought-impacted ecosystems under climate change.
(This article belongs to the Section Forest Remote Sensing)
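The analysis above hinges on NDVI computed from red and near-infrared reflectance. As a minimal sketch (the band arrays and reflectance values below are invented for illustration, not PlanetScope data):

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalised Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)

# Toy 2x2 reflectance patches: healthy canopy has high NIR, low red,
# so its NDVI is close to 1; dieback pushes NDVI toward 0.
nir = np.array([[0.5, 0.4], [0.3, 0.2]])
red = np.array([[0.1, 0.1], [0.2, 0.2]])
v = ndvi(nir, red)
```

The bounded (0, 1) response variable mentioned above, relative canopy dieback, is what motivates beta regression rather than ordinary least squares on these NDVI predictors.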
24 pages, 1822 KB  
Article
A Trinocular System for Pedestrian Localization by Combining Template Matching with Geometric Constraint Optimization
by Jinjing Zhao, Sen Huang, Yancheng Li, Jingjing Xu and Shengyong Xu
Sensors 2025, 25(19), 5970; https://doi.org/10.3390/s25195970 - 25 Sep 2025
Abstract
Pedestrian localization is a fundamental sensing task for intelligent outdoor systems. To overcome the limitations of accuracy and efficiency in conventional binocular approaches, this study introduces a trinocular stereo vision framework that integrates template matching with geometric constraint optimization. The system employs a trinocular camera configuration arranged in an equilateral triangle, which enables complementary perspectives beyond a standard horizontal baseline. Based on this setup, an initial depth estimate is obtained through multi-scale template matching on the primary binocular pair. The additional vertical viewpoint is then incorporated by enforcing three-view geometric consistency, yielding refined and more reliable depth estimates. We evaluate the method on a custom outdoor trinocular dataset. Experimental results demonstrate that the proposed approach achieves a mean absolute error of 0.435 m with an average processing time of 3.13 ms per target. This performance surpasses both the binocular Semi-Global Block Matching (0.536 m) and RAFT-Stereo (0.623 m for the standard model and 0.621 m for the real-time model without fine-tuning). When combined with the YOLOv8-s detector, the system can localize pedestrians in 7.52 ms per frame, maintaining real-time operation (> 30 Hz) for up to nine individuals, with a total end-to-end latency of approximately 32.56 ms.
(This article belongs to the Section Navigation and Positioning)
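The initial depth estimate above comes from template matching on the primary binocular pair. A minimal single-scale sketch using zero-normalised cross-correlation follows; the paper's exact matching cost and multi-scale scheme are not specified here, so this is an illustrative brute-force version with synthetic data.

```python
import numpy as np

def match_template_ncc(image, template):
    """Slide the template over the image and return the (row, col) of the
    best zero-normalised cross-correlation score, plus that score."""
    ih, iw = image.shape
    th, tw = template.shape
    t = template - template.mean()
    tn = np.linalg.norm(t)
    best, best_pos = -np.inf, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            p = image[y:y + th, x:x + tw]
            p = p - p.mean()
            denom = np.linalg.norm(p) * tn
            score = (p * t).sum() / denom if denom > 0 else -1.0
            if score > best:
                best, best_pos = score, (y, x)
    return best_pos, best

# Synthetic search: the template is an exact crop of the image,
# so the best match should land at the crop's top-left corner.
rng = np.random.default_rng(1)
image = rng.normal(size=(30, 40))
template = image[12:18, 20:28].copy()
(y, x), score = match_template_ncc(image, template)
```

In a stereo pair, restricting the search to the same scanline (epipolar constraint) turns this 2D search into the 1D disparity search used by the methods throughout this page.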
48 pages, 18119 KB  
Article
Dense Matching with Low Computational Complexity for Disparity Estimation in the Radargrammetric Approach of SAR Intensity Images
by Hamid Jannati, Mohammad Javad Valadan Zoej, Ebrahim Ghaderpour and Paolo Mazzanti
Remote Sens. 2025, 17(15), 2693; https://doi.org/10.3390/rs17152693 - 3 Aug 2025
Viewed by 543
Abstract
Synthetic Aperture Radar (SAR) images and optical imagery have high potential for extracting digital elevation models (DEMs). The two main approaches for deriving elevation models from SAR data are interferometry (InSAR) and radargrammetry. Adapted from photogrammetric principles, radargrammetry relies on disparity model estimation as its core component. Matching strategies in radargrammetry typically follow local, global, or semi-global methodologies. Local methods, while having higher accuracy, especially in low-texture SAR images, require larger kernel sizes, leading to quadratic computational complexity. Conversely, global and semi-global models produce more consistent and higher-quality disparity maps but are computationally more intensive than local methods with small kernels and require more memory (RAM). In this study, inspired by the advantages of local matching algorithms, a computationally efficient and novel model is proposed for extracting corresponding pixels in SAR-intensity stereo images. To enhance accuracy, the proposed two-stage algorithm operates without an image pyramid structure. Notably, unlike traditional local and global models, the computational complexity of the proposed approach remains stable as the input size or kernel dimensions increase while memory consumption stays low. Compared to a pyramid-based local normalized cross-correlation (NCC) algorithm and adaptive semi-global matching (SGM) models, the proposed method maintains good accuracy comparable to adaptive SGM while reducing processing time by up to 50% relative to pyramid SGM and achieving a 35-fold speedup over the local NCC algorithm with an optimal kernel size. Validated on a Sentinel-1 stereo pair with a 10 m ground-pixel size, the proposed algorithm yields a DEM with an average accuracy of 34.1 m.
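The claim that computational cost stays stable as kernel size grows is the kind of property typically obtained with summed-area tables (integral images), which return any window sum in O(1) regardless of window size. A sketch of that building block, not the authors' actual algorithm:

```python
import numpy as np

def integral_image(img):
    """Summed-area table with a leading row and column of zeros, so that
    ii[y, x] is the sum of img[:y, :x]."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def window_sum(ii, y, x, h, w):
    """Sum of img[y:y+h, x:x+w] in four lookups, independent of h and w."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]

img = np.arange(12, dtype=float).reshape(3, 4)
ii = integral_image(img)
s = window_sum(ii, 1, 1, 2, 2)   # sum of the 2x2 block img[1:3, 1:3]
```

With per-pixel matching costs precomputed per disparity, box-filtered cost aggregation over any kernel size then costs the same four lookups per pixel, which is one standard way to break the quadratic dependence on kernel size that the abstract attributes to plain local matching.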
24 pages, 15100 KB  
Article
Sugarcane Feed Volume Detection in Stacked Scenarios Based on Improved YOLO-ASM
by Xiao Lai and Guanglong Fu
Agriculture 2025, 15(13), 1428; https://doi.org/10.3390/agriculture15131428 - 2 Jul 2025
Viewed by 404
Abstract
Improper regulation of sugarcane feed volume can lead to harvester inefficiency or clogging. Accurate recognition of feed volume is therefore critical. However, visual recognition is challenging due to sugarcane stacking during feeding. To address this, we propose YOLO-ASM (YOLO Accurate Stereo Matching), a novel detection method. At the target detection level, we integrate a Convolutional Block Attention Module (CBAM) into the YOLOv5s backbone network. This significantly reduces missed detections and low-confidence predictions in dense stacking scenarios, improving detection speed by 28.04% and increasing mean average precision (mAP) by 5.31%. At the stereo matching level, we enhance the SGBM (Semi-Global Block Matching) algorithm through improved cost calculation and cost aggregation, resulting in Opti-SGBM (Optimized SGBM). This double-cost fusion approach strengthens texture feature extraction in stacked sugarcane, effectively reducing noise in the generated depth maps. The optimized algorithm yields depth maps with smaller errors relative to the original images, significantly improving depth accuracy. Experimental results demonstrate that the fused YOLO-ASM algorithm reduces sugarcane volume error rates across feed volumes of one to six by 3.45%, 3.23%, 6.48%, 5.86%, 9.32%, and 11.09%, respectively, compared to the original stereo matching algorithm. It also accelerates feed volume detection by approximately 100%, providing a high-precision solution for anti-clogging control in sugarcane harvester conveyor systems.
(This article belongs to the Section Agricultural Technology)
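Several entries on this page build on Semi-Global (Block) Matching, which aggregates per-pixel matching costs along scanlines with a small penalty P1 for disparity changes of ±1 and a larger penalty P2 for bigger jumps. A single-direction NumPy sketch of the aggregation recurrence (the toy cost volume and penalty values are illustrative, not from any of these papers):

```python
import numpy as np

def sgm_aggregate_1d(cost, P1=2.0, P2=8.0):
    """Single-direction SGM cost aggregation along a scanline:
    L(x, d) = C(x, d) - min_k L(x-1, k)
            + min(L(x-1, d),
                  L(x-1, d-1) + P1, L(x-1, d+1) + P1,
                  min_k L(x-1, k) + P2)
    """
    W, D = cost.shape
    L = np.zeros((W, D))
    L[0] = cost[0]
    for x in range(1, W):
        prev = L[x - 1]
        m = prev.min()
        for d in range(D):
            cands = [prev[d], m + P2]
            if d > 0:
                cands.append(prev[d - 1] + P1)
            if d < D - 1:
                cands.append(prev[d + 1] + P1)
            L[x, d] = cost[x, d] + min(cands) - m
    return L

# Toy (width x ndisp) cost volume: true disparity is 1 everywhere, but the
# raw cost at x = 2 weakly prefers disparity 0 (an outlier pixel).
cost = np.array([[5.0, 0.0, 5.0],
                 [5.0, 0.0, 5.0],
                 [1.0, 2.0, 5.0],
                 [5.0, 0.0, 5.0]])
disp_raw = cost.argmin(axis=1)
disp_sgm = sgm_aggregate_1d(cost).argmin(axis=1)
```

Winner-take-all on the raw cost keeps the outlier, while the aggregated cost smooths it away; full SGM sums such aggregations over 4 to 16 scanline directions before taking the per-pixel minimum.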
12 pages, 3508 KB  
Article
Improvement of the Cross-Scale Multi-Feature Stereo Matching Algorithm
by Nan Chen, Dongri Shan and Peng Zhang
Appl. Sci. 2025, 15(11), 5837; https://doi.org/10.3390/app15115837 - 22 May 2025
Viewed by 605
Abstract
With the continuous advancement of industrialization and intelligentization, stereo-vision-based measurement technology for large-scale components has become a prominent research focus. To address weak-textured regions in large-scale component images and reduce mismatches in stereo matching, we propose a cross-scale multi-feature stereo matching algorithm. In the cost-computation stage, the sum of absolute differences (SAD), census, and modified census cost aggregation are employed as cost-calculation methods. During the cost-aggregation phase, cross-scale theory is introduced to fuse multi-scale cost volumes using distinct aggregation parameters through a cross-scale framework. Experimental results on both benchmark and real-world datasets demonstrate that the enhanced algorithm achieves an average mismatch rate of 12.25%, exhibiting superior robustness compared to conventional census transform and semi-global matching (SGM) algorithms.
(This article belongs to the Special Issue Advances in Computer Vision and Digital Image Processing)
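The census transform used in the cost-computation stage encodes each pixel as a bit string of neighbour-versus-centre comparisons; the matching cost is then the Hamming distance between codes, which makes it robust to monotonic brightness changes. A minimal 3×3 sketch (border handling via wrap-around is a simplification for brevity):

```python
import numpy as np

def census_transform(img):
    """3x3 census transform: an 8-bit code per pixel, with each bit set
    when the corresponding neighbour is darker than the centre pixel."""
    out = np.zeros(img.shape, dtype=np.uint8)
    bit = 0
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            shifted = np.roll(np.roll(img, dy, axis=0), dx, axis=1)
            out |= (shifted < img).astype(np.uint8) << bit
            bit += 1
    return out

def hamming_cost(code_a, code_b):
    """Matching cost = number of differing census bits per pixel."""
    x = np.bitwise_xor(code_a, code_b)
    return np.unpackbits(x[..., None], axis=-1).sum(axis=-1)

img = np.array([[1, 2, 3],
                [4, 9, 6],
                [7, 8, 9]], dtype=np.uint8)
codes = census_transform(img)
# The code only depends on orderings, so a uniform brightness shift
# leaves the census cost at zero everywhere.
cost_same = hamming_cost(codes, census_transform(img + 10))
```

This invariance to radiometric offsets is why census costs are commonly fused with intensity-based costs such as SAD, as in the cross-scale algorithm above.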
27 pages, 49665 KB  
Article
ETQ-Matcher: Efficient Quadtree-Attention-Guided Transformer for Detector-Free Aerial–Ground Image Matching
by Chuan Xu, Beikang Wang, Zhiwei Ye and Liye Mei
Remote Sens. 2025, 17(7), 1300; https://doi.org/10.3390/rs17071300 - 5 Apr 2025
Viewed by 1082
Abstract
UAV aerial–ground feature matching is used for remote sensing applications, such as urban mapping, disaster management, and surveillance. However, current semi-dense detectors are sparse and inadequate for comprehensively addressing problems like scale variations from inherent viewpoint differences, occlusions, illumination changes, and repeated textures. To address these issues, we propose an efficient quadtree-attention-guided transformer (ETQ-Matcher) based on efficient LoFTR, which integrates a multi-layer transformer with channel attention (MTCA) to capture global features. Specifically, to tackle various complex urban building scenarios, we propose quadtree-attention feature fusion (QAFF), which implements alternating self- and cross-attention operations to capture global image context and establish correlations between image pairs. We collect 12 pairs of UAV remote sensing images using drones and handheld devices, and we further utilize representative multi-source remote sensing images along with the MegaDepth dataset to demonstrate its strong generalization ability. We compare ETQ-Matcher to classic algorithms, and our experimental results demonstrate its superior performance in challenging aerial–ground urban scenes and multi-source remote sensing scenarios.
14 pages, 8718 KB  
Technical Note
A Novel Bias-Adjusted Estimator Based on Synthetic Confusion Matrix (BAESCM) for Subregion Area Estimation
by Bo Zhang, Xuehong Chen, Xihong Cui and Miaogen Shen
Remote Sens. 2025, 17(7), 1145; https://doi.org/10.3390/rs17071145 - 24 Mar 2025
Viewed by 527
Abstract
Accurate area estimation of specific land cover/use types in administrative or natural units is crucial for various applications. However, land cover areas derived directly from remote sensing classification maps via pixel counting often exhibit non-negligible bias. Thus, various design-based area estimators (e.g., the bias-adjusted estimator, the model-assisted difference estimator, and the model-assisted ratio estimator derived from a confusion matrix), which combine the information of ground truth samples and the classification map, have been applied to provide more accurate area estimates and uncertainty inference. These estimators work well for estimating areas in a region with sufficient ground truth samples, but they encounter challenges when estimating areas in multiple subregions where the samples within each subregion are limited. To overcome this limitation, we propose a novel Bias-Adjusted Estimator based on the Synthetic Confusion Matrix (BAESCM) for estimating land cover areas in subregions by downscaling the global sample information to the subregion scale. First, several clusters are generated from remote sensing data through the K-means method (with the number of clusters being much smaller than the number of subregions). Then, the cluster confusion matrix is estimated from the samples in each cluster. Assuming that the classification error distribution within each cluster remains consistent across different subregions, the confusion matrix of a subregion can be synthesized as a weighted sum of the cluster confusion matrices, with weights given by the cluster abundances in the subregion. Finally, the classification bias at the subregion scale can be estimated from the synthetic confusion matrix, and the area counted from the classification map is corrected accordingly. Moreover, we introduce a semi-empirical method for inferring the confidence intervals of the estimated areas, considering both the sampling variance due to sampling randomness and the downscaling variance due to the heterogeneity in classification error distribution within a cluster. We tested our method through simulated experiments for county-level area estimation of soybean crops in Nebraska, USA. The results show that the root mean square errors (RMSEs) of the subregion area estimates using BAESCM are reduced by 21–64% compared to estimates based on pixel counting from the classification map. Additionally, the true coverages of the confidence intervals estimated by our method approximately matched their nominal coverages. Compared with traditional design-based estimators, the proposed BAESCM achieves better estimation accuracy of subregion areas when the sample size is limited. Therefore, the proposed method is particularly recommended for studies of subregion land cover areas in the case of inadequate ground truth samples.
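The core bookkeeping described above can be sketched as follows: cluster confusion matrices (expressed as area proportions, rows = mapped class, columns = reference class) are combined with cluster-abundance weights, and the column sums of the synthesized matrix give bias-adjusted class proportions. All numbers below are invented for illustration and are not from the paper.

```python
import numpy as np

def synthetic_confusion(cluster_cms, weights):
    """Synthesize a subregion confusion matrix as the weighted sum of
    cluster confusion matrices; weights are the cluster abundances in
    the subregion and should sum to 1."""
    cms = np.asarray(cluster_cms, dtype=float)
    w = np.asarray(weights, dtype=float)
    return (w[:, None, None] * cms).sum(axis=0)

def bias_adjusted_area(cm, total_area):
    """Column sums estimate the true proportion of each reference class."""
    return cm.sum(axis=0) * total_area

# Two clusters, two classes (0 = other, 1 = soybean); rows mapped, cols true.
cm_clusters = [
    [[0.70, 0.05],    # cluster A: soybean omission exceeds commission
     [0.02, 0.23]],
    [[0.50, 0.10],    # cluster B: more overall confusion
     [0.06, 0.34]],
]
weights = [0.5, 0.5]          # assumed cluster abundances in the subregion
cm = synthetic_confusion(cm_clusters, weights)
areas = bias_adjusted_area(cm, total_area=100.0)   # e.g. km^2
mapped_soy = cm[1].sum() * 100.0                   # pixel-counting estimate
```

Here omission of soybean (off-diagonal mass in column 1) exceeds commission (off-diagonal mass in row 1), so the adjusted soybean area comes out above the pixel-counting figure, which is exactly the bias the estimator is meant to correct.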
14 pages, 3344 KB  
Article
Robot-Based Procedure for 3D Reconstruction of Abdominal Organs Using the Iterative Closest Point and Pose Graph Algorithms
by Birthe Göbel, Jonas Huurdeman, Alexander Reiterer and Knut Möller
J. Imaging 2025, 11(2), 44; https://doi.org/10.3390/jimaging11020044 - 5 Feb 2025
Cited by 1 | Viewed by 1550
Abstract
Image-based 3D reconstruction enables robot-assisted interventions and image-guided navigation, which are emerging technologies in laparoscopy. When a robotic arm guides a laparoscope for image acquisition, hand–eye calibration is required to know the transformation between the camera and the robot flange. The calibration procedure is complex and must be repeated after each intervention (when the laparoscope is dismounted for cleaning); in the field, surgeons and their assistants cannot be expected to do so. Thus, our approach is a procedure for robot-based multi-view 3D reconstruction without hand–eye calibration, using pose optimization algorithms instead. In this work, a robotic arm and a stereo laparoscope form the experimental setup. The procedure includes the Semi-Global Matching stereo algorithm from OpenCV for depth measurement, along with the multiscale color iterative closest point algorithm and the multiway registration algorithm using a pose graph, both from Open3D (v0.19), for pose optimization. The procedure is evaluated quantitatively and qualitatively on ex vivo organs. The results are a low root mean squared error (1.1–3.37 mm) and dense point clouds. The proposed procedure leads to a plausible 3D model, and there is no need for complex hand–eye calibration, as this step can be compensated for by pose optimization algorithms.
(This article belongs to the Special Issue Geometry Reconstruction from Images (2nd Edition))
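At the heart of each ICP iteration is the closed-form least-squares rigid transform between matched point sets (the Kabsch/Umeyama solution). Open3D computes this internally, but the step can be sketched in NumPy as follows, with synthetic points and an illustrative rotation:

```python
import numpy as np

def best_rigid_transform(src, dst):
    """Closed-form least-squares rotation R and translation t such that
    dst ~= R @ src + t (the per-iteration core of point-to-point ICP)."""
    src_c = src.mean(axis=0)
    dst_c = dst.mean(axis=0)
    H = (src - src_c).T @ (dst - dst_c)       # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                  # guard against reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst_c - R @ src_c
    return R, t

# Synthetic cloud transformed by a known rotation about z and translation.
rng = np.random.default_rng(0)
src = rng.normal(size=(50, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
dst = src @ R_true.T + t_true
R, t = best_rigid_transform(src, dst)
```

ICP alternates this solve with nearest-neighbour correspondence search; the pose graph mentioned in the abstract then distributes the residual drift across all pairwise registrations instead of letting it accumulate along the camera trajectory.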
27 pages, 3367 KB  
Article
Binocular Video-Based Automatic Pixel-Level Crack Detection and Quantification Using Deep Convolutional Neural Networks for Concrete Structures
by Liqu Liu, Bo Shen, Shuchen Huang, Runlin Liu, Weizhang Liao, Bin Wang and Shuo Diao
Buildings 2025, 15(2), 258; https://doi.org/10.3390/buildings15020258 - 17 Jan 2025
Cited by 5 | Viewed by 1341
Abstract
Crack detection and quantification play crucial roles in assessing the condition of concrete structures. Herein, a novel real-time crack detection and quantification method that leverages binocular vision and a lightweight deep learning model is proposed. The proposed method comprises four modules: a lightweight classification algorithm, a high-precision segmentation algorithm, a semi-global block matching (SGBM) algorithm, and a crack quantification technique. Based on the crack segmentation results, a framework is developed for quantitative analysis of the major geometric parameters, including crack length, crack width, and crack angle of orientation at the pixel level. Results indicate that, by incorporating channel attention and spatial attention mechanisms in the MBConv module, the detection accuracy of the improved EfficientNetV2 increased by 1.6% compared with the original EfficientNetV2. Moreover, the proposed quantification method achieves low quantification errors of 2%, 4.5%, and 4% for the crack length, width, and angle of orientation, respectively. The proposed method can contribute to crack detection and quantification in practical use by being deployed on smart devices.
(This article belongs to the Special Issue Seismic Performance and Durability of Engineering Structures)
20 pages, 4856 KB  
Article
Enhancing the Ground Truth Disparity by MAP Estimation for Developing a Neural-Net Based Stereoscopic Camera
by Hanbit Gil, Sehyun Ryu and Sungmin Woo
Sensors 2024, 24(23), 7761; https://doi.org/10.3390/s24237761 - 4 Dec 2024
Viewed by 1871
Abstract
This paper presents a novel method to enhance ground truth disparity maps generated by Semi-Global Matching (SGM) using Maximum a Posteriori (MAP) estimation. SGM, while not producing visually appealing outputs like neural networks, offers high disparity accuracy in valid regions and avoids the generalization issues often encountered with neural network-based disparity estimation. However, SGM struggles with occlusions and textureless areas, leading to invalid disparity values. Our approach, though relatively simple, mitigates these issues by interpolating invalid pixels using surrounding disparity information and Bayesian inference, improving both the visual quality of disparity maps and their usability for training neural network-based commercial depth-sensing devices. Experimental results validate that our enhanced disparity maps preserve SGM's accuracy in valid regions while improving the overall performance of neural networks on both synthetic and real-world datasets. This method provides a robust framework for advanced stereoscopic camera systems, particularly in autonomous applications.
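As a greatly simplified stand-in for the Bayesian interpolation described above, invalid SGM pixels can be filled from the median of their valid neighbours; the paper's MAP machinery is more principled, so treat this only as an illustration of the underlying idea:

```python
import numpy as np

def fill_invalid(disp, invalid=-1, max_iters=10):
    """Iteratively replace invalid disparities with the median of their
    valid 8-neighbours (a crude stand-in for MAP-based interpolation)."""
    d = disp.astype(float).copy()
    H, W = d.shape
    for _ in range(max_iters):
        bad = np.argwhere(d == invalid)
        if bad.size == 0:
            break
        for y, x in bad:
            vals = []
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy, xx = y + dy, x + dx
                    if (dy or dx) and 0 <= yy < H and 0 <= xx < W \
                            and d[yy, xx] != invalid:
                        vals.append(d[yy, xx])
            if vals:
                d[y, x] = np.median(vals)
    return d

# One invalid pixel (value -1) surrounded by a consistent disparity of 10.
disp = np.array([[10, 10, 10],
                 [10, -1, 10],
                 [10, 10, 10]], dtype=float)
filled = fill_invalid(disp)
```

A prior over local disparity smoothness, as in the paper, additionally weighs how plausible each candidate fill is rather than taking a plain median.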
21 pages, 7841 KB  
Article
Research on a Method for Measuring the Pile Height of Materials in Agricultural Product Transport Vehicles Based on Binocular Vision
by Wang Qian, Pengyong Wang, Hongjie Wang, Shuqin Wu, Yang Hao, Xiaoou Zhang, Xinyu Wang, Wenyan Sun, Haijie Guo and Xin Guo
Sensors 2024, 24(22), 7204; https://doi.org/10.3390/s24227204 - 11 Nov 2024
Cited by 1 | Viewed by 1205
Abstract
The advancement of unloading technology in combine harvesting is crucial for the intelligent development of agricultural machinery. Accurately measuring material pile height in transport vehicles is essential, as uneven accumulation can lead to spillage and voids, reducing loading efficiency. Relying solely on manual observation to measure stack height can decrease harvesting efficiency and pose safety risks due to driver distraction. This research applies binocular vision to agricultural harvesting, proposing a novel method that uses a stereo matching algorithm to measure material pile height during harvesting. By comparing distance measurements taken in both empty and loaded states, the method determines stack height, and a linear regression model processes the stack height data to enhance measurement accuracy. A binocular vision system was established, applying Zhang's calibration method on the MATLAB (R2019a) platform to correct camera parameters, achieving a calibration error of 0.15 pixels. The study implemented block matching (BM) and semi-global block matching (SGBM) algorithms using the OpenCV (4.8.1) library on the PyCharm (2020.3.5) platform for stereo matching, generating disparity and pseudo-color maps. Three-dimensional coordinates of key points on the piled material were calculated to measure distances from the vehicle container bottom and the material surface to the binocular camera, allowing the material pile height to be calculated; a linear regression model was then applied to correct the data. The results indicate that, by employing binocular stereo vision and stereo matching algorithms followed by linear regression, this method can accurately calculate material pile height. The average relative error was 3.70% for the BM algorithm and 3.35% for the SGBM algorithm, both within the acceptable precision range. While the SGBM algorithm was, on average, 46 ms slower than the BM algorithm, both maintained errors under 7% and computation times under 100 ms, meeting the real-time measurement requirements for combine harvesting. In practical operations, this method can effectively measure material pile height in transport vehicles. The choice of matching algorithm should consider container size, material properties, and the balance between measurement time, accuracy, and disparity map completeness. This approach aids manual adjustment of machinery posture and provides data support for future autonomous master–slave collaborative operations in combine harvesting.
(This article belongs to the Special Issue AI, IoT and Smart Sensors for Precision Agriculture)
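The measurement described above reduces to two pinhole-stereo depth readings (Z = f·B/d, with focal length f in pixels, baseline B, and disparity d) and a fitted linear correction. A sketch with invented rig parameters and synthetic calibration pairs; none of these numbers come from the paper:

```python
import numpy as np

def stereo_depth(disparity_px, focal_px, baseline_m):
    """Pinhole stereo depth: Z = f * B / d."""
    return focal_px * baseline_m / disparity_px

# Assumed rig parameters (illustrative only).
F, B = 1200.0, 0.2
z_empty = stereo_depth(60.0, F, B)     # camera to container bottom
z_loaded = stereo_depth(120.0, F, B)   # camera to material surface
pile_height = z_empty - z_loaded

# Linear correction fitted from paired (measured, ground-truth) heights,
# mirroring the paper's regression step; these pairs are synthetic.
measured = np.array([0.5, 1.0, 1.5, 2.0])
truth = measured * 0.96 + 0.03
slope, intercept = np.polyfit(measured, truth, 1)
corrected = slope * pile_height + intercept
```

Because depth is inversely proportional to disparity, the same pixel of disparity error costs more accuracy at long range, which is one reason a fitted correction on top of the raw stereo heights pays off.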
25 pages, 42422 KB  
Article
Conceptualization and First Realization Steps for a Multi-Camera System to Capture Tree Streamlining in Wind
by Frederik O. Kammel and Alexander Reiterer
Forests 2024, 15(11), 1846; https://doi.org/10.3390/f15111846 - 22 Oct 2024
Viewed by 1039
Abstract
Forests and trees provide a variety of essential ecosystem services. Maintaining them is becoming increasingly important, as global and regional climate change is already leading to major changes in the structure and composition of forests. To minimize storm damage, the tree and stand characteristics on which storm damage risk depends must be known. Previous work in this field has relied on tree-pulling tests and on targets attached to selected branches. The latter fail, however, because the mass of such targets is high compared to the mass of the branches, so the targets significantly influence the tree's response, and because they cannot capture dynamic wind loads. We therefore installed a multi-camera system consisting of nine cameras mounted on four masts surrounding a tree. With these cameras acquiring images at a rate of 10 Hz, we use photogrammetry and a semi-automatic feature-matching workflow to deduce a 3D model of the tree crown over time. Together with motion sensors mounted on the tree and tree-pulling tests, we intend to learn more about the wind-induced response of all dominant aerial tree parts, including the crown, under real wind conditions, as well as damping processes in tree motion.
(This article belongs to the Section Natural Hazards and Risk Management)
20 pages, 54021 KB  
Article
Point of Interest Recognition and Tracking in Aerial Video during Live Cycling Broadcasts
by Jelle Vanhaeverbeke, Robbe Decorte, Maarten Slembrouck, Sofie Van Hoecke and Steven Verstockt
Appl. Sci. 2024, 14(20), 9246; https://doi.org/10.3390/app14209246 - 11 Oct 2024
Viewed by 1376
Abstract
Road cycling races, such as the Tour de France, captivate millions of viewers globally, combining competitive sportsmanship with the promotion of regional landmarks. Traditionally, points of interest (POIs) are highlighted during broadcasts using manually created static overlays, a process that is both outdated and labor-intensive. This paper presents a novel, fully automated methodology for detecting and tracking POIs in live helicopter video streams, aiming to streamline the visualization workflow and enhance viewer engagement. Our approach integrates a saliency and Segment Anything-based technique to propose potential POI regions, which are then recognized using a keypoint matching method that requires only a few reference images. This system supports both automatic and semi-automatic operations, allowing video editors to intervene when necessary, thereby balancing automation with manual control. The proposed pipeline demonstrated high effectiveness, achieving over 75% precision and recall in POI detection, and offers two tracking solutions: a traditional MedianFlow tracker and an advanced SAM 2 tracker. While the former provides speed and simplicity, the latter delivers superior segmentation tracking, albeit with higher computational demands. Our findings suggest that this methodology significantly reduces manual workload and opens new possibilities for interactive visualizations, enhancing the live viewing experience of cycling races.
38 pages, 98377 KB  
Article
FaSS-MVS: Fast Multi-View Stereo with Surface-Aware Semi-Global Matching from UAV-Borne Monocular Imagery
by Boitumelo Ruf, Martin Weinmann and Stefan Hinz
Sensors 2024, 24(19), 6397; https://doi.org/10.3390/s24196397 - 2 Oct 2024
Viewed by 1555
Abstract
With FaSS-MVS, we present a fast, surface-aware semi-global optimization approach for multi-view stereo that allows for rapid depth and normal map estimation from monocular aerial video data captured by unmanned aerial vehicles (UAVs). The data estimated by FaSS-MVS, in turn, facilitate online 3D mapping, meaning that a 3D map of the scene is generated immediately and incrementally as the image data are acquired or received. FaSS-MVS uses a hierarchical processing scheme in which depth and normal data, as well as corresponding confidence scores, are estimated in a coarse-to-fine manner, allowing efficient processing of the large scene depths inherent in oblique images acquired by UAVs flying at low altitudes. The actual depth estimation uses a plane-sweep algorithm for dense multi-image matching to produce depth hypotheses, from which the final depth map is extracted by means of a surface-aware semi-global optimization that reduces the fronto-parallel bias of Semi-Global Matching (SGM). Given the estimated depth map, pixel-wise surface normal information is then computed by reprojecting the depth map into a point cloud and computing the normal vectors within a confined local neighborhood. In a thorough quantitative and ablative study, we show that the accuracy of the 3D information computed by FaSS-MVS is close to that of state-of-the-art offline multi-view stereo approaches, with an error less than an order of magnitude higher than that of COLMAP. At the same time, the average runtime of FaSS-MVS for estimating a single depth and normal map is less than 14% of that of COLMAP, allowing us to perform online and incremental processing of full HD images at 1–2 Hz.
(This article belongs to the Special Issue Advances on UAV-Based Sensing and Imaging)