Search Results (311)

Search Parameters:
Keywords = UAV image transformation

18 pages, 2930 KiB  
Article
Eye in the Sky for Sub-Tidal Seagrass Mapping: Leveraging Unsupervised Domain Adaptation with SegFormer for Multi-Source and Multi-Resolution Aerial Imagery
by Satish Pawar, Aris Thomasberger, Stefan Hein Bengtson, Malte Pedersen and Karen Timmermann
Remote Sens. 2025, 17(14), 2518; https://doi.org/10.3390/rs17142518 - 19 Jul 2025
Viewed by 191
Abstract
The accurate and large-scale mapping of seagrass meadows is essential, as these meadows form primary habitats for marine organisms and large sinks for blue carbon. Image data available for mapping these habitats are often scarce or are acquired through multiple surveys and instruments, resulting in images of varying spatial and spectral characteristics. This study presents an unsupervised domain adaptation (UDA) strategy that combines histogram matching with the transformer-based SegFormer model to address these challenges. Unoccupied aerial vehicle (UAV)-derived imagery (3-cm resolution) was used for training, while orthophotos from airplane surveys (12.5-cm resolution) served as the target domain. The method was evaluated across three Danish estuaries (Horsens Fjord, Skive Fjord, and Lovns Broad) using one-to-one, leave-one-out, and all-to-one histogram matching strategies. The highest performance was observed at Skive Fjord, achieving an F1-score/IoU of 0.52/0.48 for the leave-one-out test, corresponding to 68% of the performance of a benchmark model trained on both domains. These results demonstrate the potential of this lightweight UDA approach to generalize across spatial, temporal, and resolution domains, enabling the cost-effective and scalable mapping of submerged vegetation in data-scarce environments. This study also sheds light on contrast as a significant property of target domains that impacts image segmentation. Full article
(This article belongs to the Special Issue High-Resolution Remote Sensing Image Processing and Applications)
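
The histogram-matching step at the core of this UDA strategy can be sketched with scikit-image. The snippet below is a minimal illustration, not the authors' code: the file names are hypothetical, and whether matching runs source-to-target or target-to-source is a design detail of the paper that is not reproduced here.

```python
# Illustrative sketch (not the authors' implementation): align the radiometry of
# one domain's tile to the other's reference histogram before segmentation.
import numpy as np
from skimage import io
from skimage.exposure import match_histograms

ortho_tile = io.imread("airplane_target_tile.png")  # hypothetical 12.5-cm target-domain tile
uav_tile = io.imread("uav_source_tile.png")         # hypothetical 3-cm source-domain tile

# Match the target-domain tile to the training-domain reference, channel by channel.
matched = match_histograms(ortho_tile, uav_tile, channel_axis=-1)

# The matched tile would then be passed to the SegFormer-style segmentation model.
print(matched.shape, matched.dtype)
```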

22 pages, 5363 KiB  
Article
Accurate Extraction of Rural Residential Buildings in Alpine Mountainous Areas by Combining Shadow Processing with FF-SwinT
by Guize Luan, Jinxuan Luo, Zuyu Gao and Fei Zhao
Remote Sens. 2025, 17(14), 2463; https://doi.org/10.3390/rs17142463 - 16 Jul 2025
Viewed by 226
Abstract
Precise extraction of rural settlements in alpine regions is critical for geographic data production, rural development, and spatial optimization. However, existing deep learning models are hindered by insufficient datasets and suboptimal algorithm structures, resulting in blurred boundaries and inadequate extraction accuracy. Therefore, this study uses high-resolution unmanned aerial vehicle (UAV) remote sensing images to construct a specialized dataset for the extraction of rural settlements in alpine mountainous areas, while introducing an innovative shadow mitigation technique that integrates multiple spectral characteristics. This methodology effectively addresses the challenges posed by intense shadows in settlements and environmental occlusions common in mountainous terrain analysis. Based on comparative experiments with existing deep learning models, the Swin Transformer was selected as the baseline model. Building upon this, the Feature Fusion Swin Transformer (FF-SwinT) model was constructed by optimizing the data processing, loss function, and multi-view feature fusion. Finally, we rigorously evaluated it through ablation studies, generalization tests, and large-scale image application experiments. The results show that FF-SwinT improves on many indicators compared with the traditional Swin Transformer, and its recognition results have clear edges and strong integrity. These results suggest that the FF-SwinT establishes a novel framework for rural settlement extraction in alpine mountain regions, which is of great significance for regional spatial optimization and development policy formulation. Full article
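
The abstract does not spell out the spectral shadow-mitigation rule, so the sketch below uses a generic HSV-based shadow mask with morphological cleanup as a stand-in; the index and threshold are assumptions, not the FF-SwinT pipeline.

```python
# Generic shadow-mask sketch (assumed heuristic, not the paper's method):
# dark, relatively saturated pixels are flagged as shadow and the mask is cleaned.
import cv2
import numpy as np

img = cv2.imread("uav_village_tile.jpg")                  # hypothetical UAV RGB tile
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.float32) / 255.0
h, s, v = cv2.split(hsv)

# Normalized difference between saturation and value: high where shadows fall.
shadow_index = (s - v) / (s + v + 1e-6)
mask = (shadow_index > 0.1).astype(np.uint8) * 255        # threshold is an assumption

# Morphological opening/closing removes speckle and fills small gaps.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)

cv2.imwrite("shadow_mask.png", mask)
```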

23 pages, 10698 KiB  
Article
Unmanned Aerial Vehicle-Based RGB Imaging and Lightweight Deep Learning for Downy Mildew Detection in Kimchi Cabbage
by Yang Lyu, Xiongzhe Han, Pingan Wang, Jae-Yeong Shin and Min-Woong Ju
Remote Sens. 2025, 17(14), 2388; https://doi.org/10.3390/rs17142388 - 10 Jul 2025
Viewed by 318
Abstract
Downy mildew is a highly destructive fungal disease that significantly reduces both the yield and quality of kimchi cabbage. Conventional detection methods rely on manual scouting, which is labor-intensive and prone to subjectivity. This study proposes an automated detection approach using RGB imagery acquired by an unmanned aerial vehicle (UAV), integrated with lightweight deep learning models for leaf-level identification of downy mildew. To improve disease feature extraction, Simple Linear Iterative Clustering (SLIC) segmentation was applied to the images. Among the evaluated models, Vision Transformer (ViT)-based architectures outperformed Convolutional Neural Network (CNN)-based models in terms of classification accuracy and generalization capability. For late-stage disease detection, DeiT-Tiny recorded the highest test accuracy (0.948) and macro F1-score (0.913), while MobileViT-S achieved the highest diseased recall (0.931). In early-stage detection, TinyViT-5M achieved the highest test accuracy (0.970) and macro F1-score (0.918); however, all models demonstrated reduced diseased recall under early-stage conditions, with DeiT-Tiny achieving the highest recall at 0.774. These findings underscore the challenges of identifying early symptoms using RGB imagery. Based on the classification results, prescription maps were generated to facilitate variable-rate pesticide application. Overall, this study demonstrates the potential of UAV-based RGB imaging for precision agriculture, while highlighting the importance of integrating multispectral data and utilizing domain adaptation techniques to enhance early-stage disease detection. Full article
(This article belongs to the Special Issue Advances in Remote Sensing for Crop Monitoring and Food Security)
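
The SLIC superpixel stage can be illustrated with scikit-image; the segment count, compactness, and the downstream classifier hook below are assumptions rather than the study's settings.

```python
# Illustrative SLIC superpixel step (parameters are assumptions, not the paper's values).
from skimage import io
from skimage.measure import regionprops
from skimage.segmentation import slic

rgb = io.imread("uav_cabbage_plot.jpg")        # hypothetical UAV RGB image
segments = slic(rgb, n_segments=800, compactness=10, start_label=1)

# Each superpixel could then be cropped and passed to a lightweight ViT/CNN classifier
# (DeiT-Tiny, MobileViT-S, TinyViT-5M in the study) for diseased/healthy labels.
for region in regionprops(segments):
    minr, minc, maxr, maxc = region.bbox
    patch = rgb[minr:maxr, minc:maxc]
    # classify(patch)  # placeholder for the leaf-level classifier
```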

19 pages, 6293 KiB  
Article
Restoring Anomalous Water Surface in DOM Product of UAV Remote Sensing Using Local Image Replacement
by Chunjie Wang, Ti Zhang, Liang Tao and Jiayuan Lin
Sensors 2025, 25(13), 4225; https://doi.org/10.3390/s25134225 - 7 Jul 2025
Viewed by 335
Abstract
In the production of a digital orthophoto map (DOM) from unmanned aerial vehicle (UAV)-acquired overlapping images, anomalies such as texture stretching or data holes frequently occur in water areas due to the lack of significant textural features. These anomalies seriously affect the visual quality and data integrity of the resulting DOMs. In this study, we attempted to eliminate the water surface anomalies in an example DOM by replacing the entire water area with an intact one clipped from a single UAV image. The water surface extent and boundary in that image were first precisely extracted using a multisource seed-filling algorithm and a contour-finding algorithm. Next, tie points were selected from the boundaries of the normal and anomalous water surfaces and used to spatially align them via an affine plane coordinate transformation. Finally, the normal water surface was overlaid onto the DOM to replace the corresponding anomalous water surface. The restored water area had a good visual effect in terms of spectral consistency, and the texture transition with the surrounding environment was also sufficiently natural. According to the standard deviations and mean values of RGB pixels, the quality of the restored DOM was greatly improved in comparison with the original one. These results demonstrated that the proposed method performs well in restoring abnormal water surfaces in a DOM, especially for scenarios where the water surface area is relatively small and can be contained in a single UAV image. Full article
(This article belongs to the Special Issue Remote Sensing and UAV Technologies for Environmental Monitoring)
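
The tie-point alignment and replacement step maps naturally onto OpenCV's affine estimation; the sketch below is illustrative only, with hypothetical tie points and file names standing in for the seed-filling and contour outputs.

```python
# Illustrative sketch (not the authors' code): estimate an affine transform from
# boundary tie points and warp the clean water patch into the DOM's pixel frame.
import cv2
import numpy as np

# Hypothetical tie points: (x, y) pixels on the normal water boundary (single UAV
# image) and the corresponding points on the anomalous boundary in the DOM.
src_pts = np.array([[120, 340], [560, 310], [610, 720], [150, 760]], dtype=np.float32)
dst_pts = np.array([[98, 355], [548, 330], [590, 742], [130, 778]], dtype=np.float32)

A, inliers = cv2.estimateAffine2D(src_pts, dst_pts, method=cv2.RANSAC)

patch = cv2.imread("normal_water_patch.png")   # clean water clipped from one UAV image
dom = cv2.imread("dom_with_anomaly.png")       # DOM with stretched/holed water area
h, w = dom.shape[:2]
warped = cv2.warpAffine(patch, A, (w, h))

# Overlay only where the warped patch has content; this crude mask stands in for
# the precise water-surface boundary obtained from the seed-filling / contour step.
mask = warped.sum(axis=2) > 0
dom[mask] = warped[mask]
cv2.imwrite("dom_restored.png", dom)
```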

33 pages, 15773 KiB  
Article
Surface Change and Stability Analysis in Open-Pit Mines Using UAV Photogrammetric Data and Geospatial Analysis
by Abdurahman Yasin Yiğit and Halil İbrahim Şenol
Drones 2025, 9(7), 472; https://doi.org/10.3390/drones9070472 - 2 Jul 2025
Cited by 1 | Viewed by 528
Abstract
Significant morphological transformations resulting from open-pit mining activities pose major problems for site safety and slope stability. This study investigates an active marble quarry in Dinar, Türkiye, by combining geospatial analysis with photogrammetry based on unmanned aerial vehicles (UAVs). High-resolution images acquired in 2024 and 2025 were combined with dense point clouds produced by Structure from Motion (SfM) methods. Iterative Closest Point (ICP) registration (RMSE = 2.09 cm) and Multiscale Model-to-Model Cloud Comparison (M3C2) analysis were used to quantify the surface changes. The study found a volumetric increase of 7744.04 m3 in the dump zones accompanied by an excavation loss of 8359.72 m3, yielding a net difference of 615.68 m3. Surface risk factors were evaluated holistically using a variety of morphometric criteria, covering surface variation (homogeneity), roughness (unevenness or texture), verticality, planarity, and linearity. Thresholds of surface variation > 0.20, roughness > 0.15, and verticality > 0.25 were used to identify zones of increased instability. UAV-derived point cloud modeling and GIS-based spatial analysis were integrated to show that morphological anomalies are spatially correlated with possible failure zones. Full article
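
A minimal Open3D sketch of the registration and change-measurement idea is given below; it uses point-to-point ICP and a plain cloud-to-cloud distance as a stand-in for M3C2, with assumed file names and thresholds.

```python
# Illustrative Open3D sketch (not the authors' workflow): register the 2025 cloud
# to the 2024 cloud with ICP, then inspect cloud-to-cloud distances.
import numpy as np
import open3d as o3d

cloud_2024 = o3d.io.read_point_cloud("quarry_2024.ply")   # hypothetical file names
cloud_2025 = o3d.io.read_point_cloud("quarry_2025.ply")

threshold = 0.10  # 10 cm correspondence distance (assumed)
result = o3d.pipelines.registration.registration_icp(
    cloud_2025, cloud_2024, threshold, np.eye(4),
    o3d.pipelines.registration.TransformationEstimationPointToPoint())
cloud_2025.transform(result.transformation)

# Per-point distances from the registered 2025 cloud to the 2024 reference cloud.
dists = np.asarray(cloud_2025.compute_point_cloud_distance(cloud_2024))
print("fitness:", result.fitness, "RMSE:", result.inlier_rmse,
      "mean C2C distance (m):", dists.mean())
```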

32 pages, 5287 KiB  
Article
UniHSFormer X for Hyperspectral Crop Classification with Prototype-Routed Semantic Structuring
by Zhen Du, Senhao Liu, Yao Liao, Yuanyuan Tang, Yanwen Liu, Huimin Xing, Zhijie Zhang and Donghui Zhang
Agriculture 2025, 15(13), 1427; https://doi.org/10.3390/agriculture15131427 - 2 Jul 2025
Viewed by 310
Abstract
Hyperspectral imaging (HSI) plays a pivotal role in modern agriculture by capturing fine-grained spectral signatures that support crop classification, health assessment, and land-use monitoring. However, the transition from raw spectral data to reliable semantic understanding remains challenging—particularly under fragmented planting patterns, spectral ambiguity, and spatial heterogeneity. To address these limitations, we propose UniHSFormer-X, a unified transformer-based framework that reconstructs agricultural semantics through prototype-guided token routing and hierarchical context modeling. Unlike conventional models that treat spectral–spatial features uniformly, UniHSFormer-X dynamically modulates information flow based on class-aware affinities, enabling precise delineation of field boundaries and robust recognition of spectrally entangled crop types. Evaluated on three UAV-based benchmarks—WHU-Hi-LongKou, HanChuan, and HongHu—the model achieves up to 99.80% overall accuracy and 99.28% average accuracy, outperforming state-of-the-art CNN, ViT, and hybrid architectures across both structured and heterogeneous agricultural scenarios. Ablation studies further reveal the critical role of semantic routing and prototype projection in stabilizing model behavior, while parameter surface analysis demonstrates consistent generalization across diverse configurations. Beyond high performance, UniHSFormer-X offers a semantically interpretable architecture that adapts to the spatial logic and compositional nuance of agricultural imagery, representing a forward step toward robust and scalable crop classification. Full article
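
Prototype-guided token routing can be read, loosely, as tokens being re-weighted by their affinity to learnable class prototypes; the PyTorch sketch below is one such interpretation, not the UniHSFormer-X implementation.

```python
# Loose sketch of prototype-guided token routing (an interpretation of the idea,
# not UniHSFormer-X): tokens receive context from class prototypes in proportion
# to their class-aware affinities.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PrototypeRouter(nn.Module):
    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.prototypes = nn.Parameter(torch.randn(num_classes, dim) * 0.02)
        self.proj = nn.Linear(dim, dim)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, n_tokens, dim)
        affinity = tokens @ self.prototypes.t() / tokens.shape[-1] ** 0.5
        routing = F.softmax(affinity, dim=-1)       # class-aware routing weights
        prototype_ctx = routing @ self.prototypes   # prototype context per token
        return tokens + self.proj(prototype_ctx)    # residual modulation

x = torch.randn(2, 196, 64)                         # dummy spectral-spatial tokens
routed = PrototypeRouter(dim=64, num_classes=9)(x)
print(routed.shape)                                 # torch.Size([2, 196, 64])
```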

24 pages, 41032 KiB  
Article
Multi-Parameter Water Quality Inversion in Heterogeneous Inland Waters Using UAV-Based Hyperspectral Data and Deep Learning Methods
by Hongran Li, Nuo Wang, Zixuan Du, Deyu Huang, Mengjie Shi, Zhaoman Zhong and Dongqing Yuan
Remote Sens. 2025, 17(13), 2191; https://doi.org/10.3390/rs17132191 - 25 Jun 2025
Viewed by 300
Abstract
Water quality monitoring is crucial for ecological protection and water resource management. However, traditional monitoring methods suffer from limitations in temporal, spatial, and spectral resolution, which constrain the effective evaluation of urban rivers and multi-scale aquatic systems. To address challenges such as ecological heterogeneity, multi-scale complexity, and data noise, this paper proposes a deep learning framework, TL-Net, based on unmanned aerial vehicle (UAV) hyperspectral imagery, to estimate four water quality parameters, namely total nitrogen (TN), dissolved oxygen (DO), total suspended solids (TSS), and chlorophyll a (Chla), and to produce their spatial distribution maps. This framework integrates Transformer and long short-term memory (LSTM) networks, introduces a cross-temporal attention mechanism to enhance feature correlation, and incorporates an adaptive feature fusion module for dynamically weighted integration of local and global information. The experimental results demonstrate that TL-Net markedly outperforms conventional machine learning approaches, delivering consistently high predictive accuracy across all evaluated water quality parameters. Specifically, the model achieves an R2 of 0.9938 for TN, a mean absolute error (MAE) of 0.0728 for DO, a root mean square error (RMSE) of 0.3881 for TSS, and a mean absolute percentage error (MAPE) as low as 0.2568% for Chla. A spatial analysis reveals significant heterogeneity in water quality distribution across the study area, with natural water bodies exhibiting relatively uniform conditions, while TN and TSS concentrations are substantially elevated in aquaculture areas. Overall, TL-Net significantly improves multi-parameter water quality prediction, captures fine-scale spatial variability, and offers a robust and scalable solution for inland aquatic ecosystem monitoring. Full article
(This article belongs to the Section Environmental Remote Sensing)
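
A toy PyTorch sketch of a Transformer-plus-LSTM regressor over per-pixel spectra is shown below; the layer sizes, token construction, and simple concatenation fusion are assumptions and omit TL-Net's cross-temporal attention and adaptive fusion modules.

```python
# Minimal sketch of a Transformer + LSTM regressor over per-pixel spectra
# (a loose reading of TL-Net's building blocks, not the published architecture).
import torch
import torch.nn as nn

class SpectralRegressor(nn.Module):
    def __init__(self, n_bands: int, d_model: int = 64, n_params: int = 4):
        super().__init__()
        self.embed = nn.Linear(1, d_model)              # one token per spectral band
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.transformer = nn.TransformerEncoder(enc_layer, num_layers=2)
        self.lstm = nn.LSTM(d_model, d_model, batch_first=True)
        self.head = nn.Linear(2 * d_model, n_params)    # TN, DO, TSS, Chla

    def forward(self, spectra: torch.Tensor) -> torch.Tensor:
        # spectra: (batch, n_bands)
        tokens = self.embed(spectra.unsqueeze(-1))      # (batch, n_bands, d_model)
        t_feat = self.transformer(tokens).mean(dim=1)   # global spectral context
        _, (h_n, _) = self.lstm(tokens)                 # sequential band context
        fused = torch.cat([t_feat, h_n[-1]], dim=-1)    # simple concatenation fusion
        return self.head(fused)

pred = SpectralRegressor(n_bands=120)(torch.randn(8, 120))
print(pred.shape)                                       # torch.Size([8, 4])
```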

28 pages, 11793 KiB  
Article
Unsupervised Multimodal UAV Image Registration via Style Transfer and Cascade Network
by Xiaoye Bi, Rongkai Qie, Chengyang Tao, Zhaoxiang Zhang and Yuelei Xu
Remote Sens. 2025, 17(13), 2160; https://doi.org/10.3390/rs17132160 - 24 Jun 2025
Cited by 1 | Viewed by 346
Abstract
Cross-modal image registration for unmanned aerial vehicle (UAV) platforms presents significant challenges due to large-scale deformations, distinct imaging mechanisms, and pronounced modality discrepancies. This paper proposes a novel multi-scale cascaded registration network based on style transfer that achieves superior performance: up to 67% reduction in mean squared error (from 0.0106 to 0.0068), 9.27% enhancement in normalized cross-correlation, 26% improvement in local normalized cross-correlation, and 8% increase in mutual information compared to state-of-the-art methods. The architecture integrates a cross-modal style transfer network (CSTNet) that transforms visible images into pseudo-infrared representations to unify modality characteristics, and a multi-scale cascaded registration network (MCRNet) that performs progressive spatial alignment across multiple resolution scales using diffeomorphic deformation modeling to ensure smooth and invertible transformations. A self-supervised learning paradigm based on image reconstruction eliminates reliance on manually annotated data while maintaining registration accuracy through synthetic deformation generation. Extensive experiments on the LLVIP dataset demonstrate the method’s robustness under challenging conditions involving large-scale transformations, with ablation studies confirming that style transfer contributes 28% MSE improvement and diffeomorphic registration prevents 10.6% performance degradation. The proposed approach provides a robust solution for cross-modal image registration in dynamic UAV environments, offering significant implications for downstream applications such as target detection, tracking, and surveillance. Full article
(This article belongs to the Special Issue Advances in Deep Learning Approaches: UAV Data Analysis)
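
The normalized cross-correlation metric cited in the abstract can be computed as follows; this helper is illustrative and not tied to the CSTNet/MCRNet code.

```python
# Illustrative NCC helper for judging registration quality between a fixed image
# and a warped image (not part of the paper's codebase).
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Zero-mean normalized cross-correlation between two same-sized images."""
    a = a.astype(np.float64).ravel()
    b = b.astype(np.float64).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b) + 1e-12
    return float(np.dot(a, b) / denom)

fixed = np.random.rand(256, 256)           # stand-ins for infrared / warped visible images
warped = fixed + 0.05 * np.random.randn(256, 256)
print(f"NCC = {ncc(fixed, warped):.4f}")
```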

23 pages, 6358 KiB  
Article
Optimization of Sorghum Spike Recognition Algorithm and Yield Estimation
by Mengyao Han, Jian Gao, Cuiqing Wu, Qingliang Cui, Xiangyang Yuan and Shujin Qiu
Agronomy 2025, 15(7), 1526; https://doi.org/10.3390/agronomy15071526 - 23 Jun 2025
Viewed by 308
Abstract
In the natural field environment, the high planting density of sorghum and severe occlusion among spikes substantially increase the difficulty of sorghum spike recognition, resulting in frequent false positives and false negatives. Target detection models suited to this environment require high computational power, making real-time detection of sorghum spikes on mobile devices difficult. This study proposes a detection-tracking scheme based on improved YOLOv8s-GOLD-LSKA with optimized DeepSort, aiming to enhance yield estimation accuracy in complex agricultural field scenarios. By integrating the GOLD module’s dual-branch multi-scale feature fusion and the LSKA attention mechanism, a lightweight detection model is developed. The improved DeepSort algorithm enhances tracking robustness in occlusion scenarios by optimizing the confidence threshold filtering (0.46), frame-skipping count, and cascading matching strategy (n = 3, max_age = 40). Combined with the five-point sampling method, the average dry weight of sorghum spikes (0.12 kg) was used to enable rapid yield estimation. The results demonstrate that the improved model achieved a mAP of 85.86% (a 6.63% increase over the original YOLOv8), an F1 score of 81.19%, and a model size reduced to 7.48 MB, with a detection speed of 0.0168 s per frame. The optimized tracking system attained a MOTA of 67.96% and ran at 42 FPS. Image- and video-based yield estimation accuracies reached 89–96% and 75–93%, respectively, with single-frame latency as low as 0.047 s. By optimizing the full detection–tracking–yield pipeline, this solution addresses missed detections of small objects, ID switches under occlusion, and real-time processing in complex scenarios. Its lightweight, high-efficiency design is well suited for deployment on UAVs and mobile terminals, providing robust technical support for intelligent sorghum monitoring and precision agriculture management, and thereby playing a crucial role in driving agricultural digital transformation. Full article
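
The final yield-estimation arithmetic reduces to spike counts times average spike weight, scaled by area; in the sketch below, all numbers other than the 0.12 kg average dry weight quoted in the abstract are hypothetical.

```python
# Back-of-the-envelope yield estimate from tracked spike counts (illustrative only;
# the counts and areas are assumptions, not the study's measurements).
tracked_spikes = 1240        # unique spike IDs from the detector/tracker over a transect
avg_dry_weight_kg = 0.12     # average dry weight per spike reported in the abstract
sampled_area_m2 = 150.0      # hypothetical area covered by the UAV video transect
plot_area_m2 = 10_000.0      # one hectare

yield_per_m2 = tracked_spikes * avg_dry_weight_kg / sampled_area_m2
estimated_yield_kg = yield_per_m2 * plot_area_m2
print(f"estimated yield: {estimated_yield_kg:.0f} kg/ha")
```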

19 pages, 8609 KiB  
Article
A Microwave Vision-Enhanced Environmental Perception Method for the Visual Navigation of UAVs
by Rui Li, Dewei Wu, Peiran Li, Chenhao Zhao, Jingyi Zhang and Jing He
Remote Sens. 2025, 17(12), 2107; https://doi.org/10.3390/rs17122107 - 19 Jun 2025
Viewed by 299
Abstract
Visual navigation technology holds significant potential for applications involving unmanned aerial vehicles (UAVs). However, the inherent spectral limitations of optical-dependent navigation systems prove particularly inadequate for high-altitude long-endurance (HALE) UAV operations, as they are fundamentally constrained in maintaining reliable environment perception under conditions of fluctuating illumination and persistent cloud cover. To address this challenge, this paper introduces microwave vision to assist optical vision for environmental measurement and proposes a novel microwave vision-enhanced environmental perception method. In particular, the richness of perceived environmental information can be enhanced by SAR and optical image fusion processing in the case of sufficient light and clear weather. In order to simultaneously mitigate inherent SAR speckle noise and address existing fusion algorithms’ inadequate consideration of UAV navigation-specific environmental perception requirements, this paper designs a SAR Target-Augmented Fusion (STAF) algorithm based on the target detection of SAR images. On the basis of image preprocessing, this algorithm utilizes constant false alarm rate (CFAR) detection along with morphological operations to extract critical target information from SAR images. Subsequently, the intensity–hue–saturation (IHS) transform is employed to integrate this extracted information into the optical image. The experimental results show that the proposed microwave vision-enhanced environmental perception method effectively utilizes microwave vision to shape target information perception in the electromagnetic spectrum and enhance the information content of environmental measurement results. The unique information extracted by the STAF algorithm from SAR images can effectively enhance the optical images while retaining their main attributes. This method can effectively enhance the environmental measurement robustness and information acquisition ability of the visual navigation system. Full article
(This article belongs to the Section Remote Sensing Image Processing)
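
The two named ingredients, CFAR detection on SAR amplitude and IHS-style injection into the optical image, can be sketched generically as below; the cell-averaging CFAR parameters and the use of HSV in place of a strict IHS transform are assumptions, not the STAF algorithm.

```python
# Generic sketch of CFAR detection on SAR amplitude followed by brightening the
# detected targets in the optical image's intensity channel (HSV used as a simple
# stand-in for the IHS transform; thresholds are assumptions).
import numpy as np
from scipy.ndimage import uniform_filter
from skimage.color import hsv2rgb, rgb2hsv

def ca_cfar(sar: np.ndarray, guard: int = 2, train: int = 8, scale: float = 3.0):
    """Flag pixels whose amplitude exceeds scale x the local clutter mean."""
    win = 2 * (guard + train) + 1
    inner = 2 * guard + 1
    total = uniform_filter(sar, win) * win**2          # sum over full window
    guard_sum = uniform_filter(sar, inner) * inner**2  # sum over guard region
    clutter_mean = (total - guard_sum) / (win**2 - inner**2)
    return sar > scale * clutter_mean

sar = np.abs(np.random.randn(512, 512)) + 5.0 * (np.random.rand(512, 512) > 0.999)
optical = np.random.rand(512, 512, 3)                  # stand-in co-registered optical image

targets = ca_cfar(sar)
hsv = rgb2hsv(optical)
hsv[..., 2] = np.clip(hsv[..., 2] + 0.5 * targets, 0, 1)   # inject targets into intensity
fused = hsv2rgb(hsv)
print("detected pixels:", int(targets.sum()), "fused shape:", fused.shape)
```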

23 pages, 1208 KiB  
Article
UCrack-DA: A Multi-Scale Unsupervised Domain Adaptation Method for Surface Crack Segmentation
by Fei Deng, Shaohui Yang, Bin Wang, Xiujun Dong and Siyuan Tian
Remote Sens. 2025, 17(12), 2101; https://doi.org/10.3390/rs17122101 - 19 Jun 2025
Viewed by 472
Abstract
Surface cracks serve as early warning signals for potential geological hazards, and their precise segmentation is crucial for disaster risk assessment. Due to differences in acquisition conditions and the diversity of crack morphology, scale, and surface texture, there is a significant domain shift between different crack datasets, necessitating transfer training. However, in real work areas, the sparse distribution of cracks results in a limited number of samples, and the difficulty of crack annotation makes it highly inefficient to use a high proportion of annotated samples for transfer training to predict the remaining samples. Domain adaptation methods can achieve transfer training without relying on manual annotation, but traditional domain adaptation methods struggle to effectively address the characteristics of cracks. To address this issue, we propose an unsupervised domain adaptation method for crack segmentation. By employing a hierarchical adversarial mechanism and a prediction entropy minimization constraint, we extract domain-invariant features in a multi-scale feature space and sharpen decision boundaries. Additionally, by integrating a Mix-Transformer encoder, a multi-scale dilated attention module, and a mixed convolutional attention decoder, we effectively solve the challenges of cross-domain data distribution differences and complex scene crack segmentation. Experimental results show that UCrack-DA achieves superior performance compared to existing methods on both the Roboflow-Crack and UAV-Crack datasets, with significant improvements in metrics such as mIoU, mPA, and Accuracy. In UAV images captured in field scenarios, the model demonstrates excellent segmentation accuracy for multi-scale and multi-morphology cracks, validating its practical application value in geological hazard monitoring. Full article
(This article belongs to the Section AI Remote Sensing)
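
The prediction-entropy-minimization constraint is a standard formulation; a minimal PyTorch version is shown below, without the weighting or the adversarial components of UCrack-DA.

```python
# Minimal entropy-minimization term for unlabeled target-domain predictions
# (standard formulation; its weighting within UCrack-DA is not reproduced here).
import torch
import torch.nn.functional as F

def entropy_loss(logits: torch.Tensor) -> torch.Tensor:
    """Mean per-pixel entropy of the softmax prediction; minimizing it on
    unlabeled target images encourages sharper decision boundaries."""
    p = F.softmax(logits, dim=1)           # (batch, classes, H, W)
    log_p = F.log_softmax(logits, dim=1)
    return -(p * log_p).sum(dim=1).mean()

target_logits = torch.randn(2, 2, 128, 128, requires_grad=True)  # crack / background
loss = entropy_loss(target_logits)
loss.backward()
```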

19 pages, 6772 KiB  
Article
A Cross-Mamba Interaction Network for UAV-to-Satellite Geolocalization
by Lingyun Tian, Qiang Shen, Yang Gao, Simiao Wang, Yunan Liu and Zilong Deng
Drones 2025, 9(6), 427; https://doi.org/10.3390/drones9060427 - 12 Jun 2025
Viewed by 946
Abstract
The geolocalization of unmanned aerial vehicles (UAVs) in satellite-denied environments has emerged as a key research focus. Recent advancements in this area have been largely driven by learning-based frameworks that utilize convolutional neural networks (CNNs) and Transformers. However, both CNNs and Transformers face challenges in capturing global feature dependencies due to their restricted receptive fields. Inspired by state-space models (SSMs), which have demonstrated efficacy in modeling long sequences, we propose a pure Mamba-based method called the Cross-Mamba Interaction Network (CMIN) for UAV geolocalization. CMIN consists of three key components: feature extraction, information interaction, and feature fusion. It leverages Mamba’s strengths in global information modeling to effectively capture feature correlations between UAV and satellite images over a larger receptive field. For feature extraction, we design a Siamese Feature Extraction Module (SFEM) based on two basic vision Mamba blocks, enabling the model to capture the correlation between UAV and satellite image features. In terms of information interaction, we introduce a Local Cross-Attention Module (LCAM) to fuse cross-Mamba features, providing a solution for feature matching via deep learning. By aggregating features from various layers of SFEMs, we generate heatmaps for the satellite image that help determine the UAV’s geographical coordinates. Additionally, we propose a Center Masking strategy for data augmentation, which promotes the model’s ability to learn richer contextual information from UAV images. Experimental results on benchmark datasets show that our method achieves state-of-the-art performance. Ablation studies further validate the effectiveness of each component of CMIN. Full article
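
The Center Masking augmentation can be approximated as zeroing a central square of the UAV query image; the mask ratio below is an assumption, not the paper's setting.

```python
# Simple take on the Center Masking augmentation described in the abstract
# (mask ratio is assumed; not the authors' implementation).
import numpy as np

def center_mask(image: np.ndarray, ratio: float = 0.3) -> np.ndarray:
    """Return a copy of `image` with a centered square of side ratio*min(H, W) zeroed."""
    out = image.copy()
    h, w = image.shape[:2]
    side = int(min(h, w) * ratio)
    top, left = (h - side) // 2, (w - side) // 2
    out[top:top + side, left:left + side] = 0
    return out

uav = np.random.randint(0, 256, (256, 256, 3), dtype=np.uint8)  # stand-in UAV query image
augmented = center_mask(uav)
```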

26 pages, 7731 KiB  
Article
Semantic HBIM for Heritage Conservation: A Methodology for Mapping Deterioration and Structural Deformation in Historic Envelopes
by Enrique Nieto-Julián, María Dolores Robador, Juan Moyano and Silvana Bruno
Buildings 2025, 15(12), 1990; https://doi.org/10.3390/buildings15121990 - 10 Jun 2025
Viewed by 469
Abstract
The conservation and intervention of heritage structures require a flexible, interdisciplinary environment capable of managing data throughout the building’s life cycle. Historic building information modeling (HBIM) has emerged as an effective tool for supporting these processes. Originally conceived for parametric construction modeling, BIM can also integrate historical transformations, aiding in maintenance and preservation. Historic buildings often feature complex geometries and visible material traces of time, requiring detailed analysis. This research proposes a methodology for documenting and assessing the envelope of historic buildings by locating, classifying, and recording transformations, deterioration, and structural deformations. The approach is based on semantic segmentation and classification using data from terrestrial laser scanning (TLS) and unmanned aerial vehicles (UAVs), applied to the Palace of Miguel de Mañara—an iconic 17th-century building in Seville. Archival images were integrated into the HBIM model to identify previous restoration interventions and assess current deterioration. The methodology included geometric characterization, material mapping, semantic segmentation, diagnostic input, and temporal analysis. The results validated a process for detecting pathological cracks in masonry facades, providing a collaborative HBIM framework enriched with expert-validated data to support repair decisions and guide conservation efforts. Full article

25 pages, 6066 KiB  
Article
FD2-YOLO: A Frequency-Domain Dual-Stream Network Based on YOLO for Crack Detection
by Junwen Zhu, Jinbao Sheng and Qian Cai
Sensors 2025, 25(11), 3427; https://doi.org/10.3390/s25113427 - 29 May 2025
Viewed by 655
Abstract
Crack detection in cement infrastructure is imperative to ensure its structural integrity and public safety. However, most existing methods use multi-scale and attention mechanisms to improve on a single backbone, and this single backbone network is often ineffective in detecting slender or variable cracks in complex scenarios. We propose a novel network, FD2-YOLO, based on frequency-domain dual-stream YOLO, for accurate and efficient detection of cement cracks. Firstly, the model employs a dual backbone architecture, integrating edge and texture features in the frequency domain with semantic features in the spatial domain, to enhance the extraction of crack-related features. Furthermore, the Dynamic Inter-Domain Feature Fusion module (DIFF) is introduced, which uses large-kernel deep convolution and Hadamard products to enable the adaptive fusion of features from different domains, thus addressing the problem of difficult feature fusion due to domain differences. Finally, the DIA-Head module has been proposed, which dynamically focuses on the texture and geometric deformation features of cracks by introducing the Deformable Interactive Attention Module (DIA Module) in Decoupled Head and utilizing its Deformable Interactive Attention. Extensive experiments on the RDD2022 dataset demonstrate that FD2-YOLO achieves state-of-the-art performance. Compared with existing YOLO-based models, it improves mAP50 by 1.3%, mAP50-95 by 1.1%, recall by 1.8%, and precision by 0.5%, validating its effectiveness in real-world object detection scenarios. In addition, evaluation on the UAV-PDD2023 dataset further confirms the robustness and generalization of our approach, where FD2-YOLO achieves a mAP50 of 67.9%, mAP50-95 of 35.9%, recall of 61.2%, and precision of 75.9%, consistently outperforming existing lightweight and Transformer-based detectors under more complex aerial imaging conditions. Full article
(This article belongs to the Section Physical Sensors)
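
One plausible way to feed a frequency-domain edge/texture stream is a Fourier high-pass residual; the sketch below is a generic assumption, not the FD2-YOLO backbone.

```python
# Generic frequency-domain stream sketch (an assumption about how such a branch
# could be fed, not the paper's implementation): high-pass filter in the Fourier
# domain and return the magnitude of the remaining detail.
import numpy as np

def highpass_frequency_map(gray: np.ndarray, cutoff: float = 0.05) -> np.ndarray:
    """Suppress low frequencies below `cutoff` (fraction of the image size) and invert."""
    f = np.fft.fftshift(np.fft.fft2(gray))
    h, w = gray.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)
    f[dist < cutoff * min(h, w)] = 0               # zero the low-frequency core
    detail = np.fft.ifft2(np.fft.ifftshift(f))
    return np.abs(detail).astype(np.float32)

gray = np.random.rand(512, 512).astype(np.float32)  # stand-in grayscale crack image
freq_stream = highpass_frequency_map(gray)
print(freq_stream.shape, freq_stream.dtype)
```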

18 pages, 9335 KiB  
Article
Image Matching Algorithm for Transmission Towers Based on CLAHE and Improved RANSAC
by Ruihua Chen, Pan Yao, Shuo Wang, Chuanlong Lyu and Yuge Xu
Designs 2025, 9(3), 67; https://doi.org/10.3390/designs9030067 - 29 May 2025
Viewed by 940
Abstract
To address the lack of robustness against illumination and blurring variations in aerial images of transmission towers, an improved image matching algorithm for aerial images is proposed. The proposed algorithm consists of two main components: an enhanced AKAZE algorithm and an improved three-stage feature matching strategy, which are used for feature point detection and feature matching, respectively. First, the improved AKAZE enhances image contrast using Contrast-Limited Adaptive Histogram Equalization (CLAHE), which highlights target features and improves robustness against environmental interference. Subsequently, the original AKAZE algorithm is employed to detect feature points and construct binary descriptors. Building upon this, an improved three-stage feature matching strategy is proposed to estimate the geometric transformation between image pairs. Specifically, the strategy begins with initial feature matching using the nearest neighbor ratio (NNR) method, followed by outlier rejection via the Grid-based Motion Statistics (GMS) algorithm. Finally, an improved Random Sample Consensus (RANSAC) algorithm computes the transformation matrix, further enhancing matching efficiency. Experimental results demonstrate that the proposed method exceeds the original AKAZE algorithm’s matching accuracy by 4∼15% on different image sets while achieving faster matching speeds. Under real-world conditions with UAV-captured aerial images of transmission towers, the proposed algorithm achieves over 95% matching accuracy, which is higher than that of the other algorithms evaluated. Our proposed algorithm enables fast and accurate matching of transmission tower aerial images. Full article
(This article belongs to the Section Electrical Engineering Design)
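
The main stages (CLAHE enhancement, AKAZE features, nearest-neighbor-ratio matching, RANSAC geometry) map directly onto OpenCV; the sketch below omits the GMS filtering stage and the paper's RANSAC modifications, and uses hypothetical image files.

```python
# OpenCV sketch of the pipeline's main stages (illustrative, not the paper's code):
# CLAHE, AKAZE detection, ratio-test matching, and RANSAC homography estimation.
import cv2
import numpy as np

img1 = cv2.imread("tower_view1.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical aerial pair
img2 = cv2.imread("tower_view2.jpg", cv2.IMREAD_GRAYSCALE)

clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
img1, img2 = clahe.apply(img1), clahe.apply(img2)

akaze = cv2.AKAZE_create()
kp1, des1 = akaze.detectAndCompute(img1, None)
kp2, des2 = akaze.detectAndCompute(img2, None)

# Nearest-neighbor-ratio test on binary descriptors (0.7 ratio is an assumption).
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.7 * n.distance]

src = np.float32([kp1[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
print(f"{int(inlier_mask.sum())} inlier matches of {len(good)}")
```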
