Search Results (88)

Search Parameters:
Keywords = urban scene imagery

22 pages, 6201 KiB  
Article
SOAM Block: A Scale–Orientation-Aware Module for Efficient Object Detection in Remote Sensing Imagery
by Yi Chen, Zhidong Wang, Zhipeng Xiong, Yufeng Zhang and Xinqi Xu
Symmetry 2025, 17(8), 1251; https://doi.org/10.3390/sym17081251 - 6 Aug 2025
Abstract
Object detection in remote sensing imagery is critical for environmental monitoring, urban planning, and land resource management. However, the task remains challenging due to significant scale variations, arbitrary object orientations, and complex background clutter. To address these issues, we propose a novel scale–orientation-aware module (SOAM Block) that jointly models object scale and directional features while exploiting the geometric symmetry inherent in many remote sensing targets. The SOAM Block is constructed upon a lightweight and efficient Adaptive Multi-Scale (AMS) Module, which utilizes a symmetric arrangement of parallel depth-wise convolutional branches with varied kernel sizes to extract fine-grained multi-scale features without dilation, thereby preserving local context and enhancing scale adaptability. In addition, a Strip-based Context Attention (SCA) mechanism is introduced to model long-range spatial dependencies, leveraging horizontal and vertical 1D strip convolutions in a directionally symmetric fashion. This design captures spatial correlations between distant regions and reinforces semantic consistency in cluttered scenes. Importantly, this work is the first to explicitly analyze the coupling between object scale and orientation in remote sensing imagery, and the proposed method addresses the limitations of fixed receptive fields in capturing the symmetric directional cues of large-scale objects. Extensive experiments are conducted on two widely used benchmarks—DOTA and HRSC2016—both of which exhibit significant scale variations and orientation diversity. Results demonstrate that our approach achieves superior detection accuracy with fewer parameters and lower computational overhead compared to state-of-the-art methods. The proposed SOAM Block thus offers a robust, scalable, and symmetry-aware solution for high-precision object detection in complex aerial scenes.
(This article belongs to the Section Computer)
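
Only the abstract is available here, but it describes the two ingredients concretely enough for a minimal PyTorch sketch. Everything below is illustrative: the kernel sizes, strip length, sigmoid gating, and residual connection are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class AMSModule(nn.Module):
    """Adaptive Multi-Scale module: symmetric parallel depth-wise
    branches with different kernel sizes and no dilation (kernel
    sizes here are assumed, not taken from the paper)."""
    def __init__(self, channels, kernel_sizes=(3, 5, 7)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, k, padding=k // 2, groups=channels)
            for k in kernel_sizes
        )
        self.fuse = nn.Conv2d(channels * len(kernel_sizes), channels, 1)

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

class StripContextAttention(nn.Module):
    """Strip-based Context Attention: directionally symmetric 1-D strip
    convolutions (horizontal + vertical) feeding a spatial gate."""
    def __init__(self, channels, strip=9):
        super().__init__()
        self.h = nn.Conv2d(channels, channels, (1, strip),
                           padding=(0, strip // 2), groups=channels)
        self.v = nn.Conv2d(channels, channels, (strip, 1),
                           padding=(strip // 2, 0), groups=channels)
        self.gate = nn.Sequential(nn.Conv2d(channels, channels, 1), nn.Sigmoid())

    def forward(self, x):
        return x * self.gate(self.h(x) + self.v(x))

class SOAMBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.ams = AMSModule(channels)
        self.sca = StripContextAttention(channels)

    def forward(self, x):
        return self.sca(self.ams(x)) + x  # residual connection (assumed)

y = SOAMBlock(64)(torch.randn(1, 64, 128, 128))  # shape preserved: (1, 64, 128, 128)
```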

23 pages, 7371 KiB  
Article
A Novel Method for Estimating Building Height from Baidu Panoramic Street View Images
by Shibo Ge, Jiping Liu, Xianghong Che, Yong Wang and Haosheng Huang
ISPRS Int. J. Geo-Inf. 2025, 14(8), 297; https://doi.org/10.3390/ijgi14080297 - 30 Jul 2025
Abstract
Building height information plays an important role in many urban applications, such as urban planning, disaster management, and environmental studies. With the rapid development of real-scene maps, street view images are becoming a new data source for building height estimation, given their easy collection and low cost. However, existing studies on building height estimation primarily utilize remote sensing images, with little exploration of height estimation from street view images. In this study, we proposed a deep learning-based method for estimating the height of a single building in Baidu panoramic street view imagery. Firstly, the Segment Anything Model was used to extract the region-of-interest image and location features of individual buildings from the panorama. Subsequently, a cross-view matching algorithm was proposed that combines the Baidu panorama with building footprint data carrying height information to generate building height samples. Finally, a Two-Branch feature fusion model (TBFF) was constructed to combine building location features and visual features, enabling accurate height estimation for individual buildings. The experimental results showed that the TBFF model had the best performance, with an RMSE of 5.69 m, an MAE of 3.97 m, and a MAPE of 0.11. Compared with two state-of-the-art methods, the TBFF model exhibited robustness and higher accuracy: the Random Forest model had an RMSE of 11.83 m, an MAE of 4.76 m, and a MAPE of 0.32, and the Pano2Geo model had an RMSE of 10.51 m, an MAE of 6.52 m, and a MAPE of 0.22. The ablation analysis demonstrated that fusing building location and visual features can improve the accuracy of height estimation by 14.98% to 69.99%. Moreover, the accuracy of the proposed method meets the LOD1-level 3D modeling requirements defined by the OGC (height error ≤ 5 m), which can provide data support for urban research.
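
As a rough illustration of the two-branch fusion idea, here is a minimal PyTorch sketch: a visual branch encodes the building's region-of-interest image, a small MLP encodes its location features, and the concatenated vector regresses height. The ResNet-18 backbone, the 4-dimensional location vector, and all layer widths are assumptions rather than the paper's design.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TBFFSketch(nn.Module):
    """Two-branch feature fusion for single-building height regression."""
    def __init__(self, loc_dim=4):
        super().__init__()
        backbone = models.resnet18(weights=None)  # visual branch (assumed backbone)
        backbone.fc = nn.Identity()               # expose the 512-d feature
        self.visual = backbone
        self.location = nn.Sequential(            # location branch (assumed MLP)
            nn.Linear(loc_dim, 64), nn.ReLU(), nn.Linear(64, 64)
        )
        self.head = nn.Sequential(                # fused regression head
            nn.Linear(512 + 64, 128), nn.ReLU(), nn.Linear(128, 1)
        )

    def forward(self, roi_image, loc_feats):
        fused = torch.cat([self.visual(roi_image), self.location(loc_feats)], dim=1)
        return self.head(fused)                   # predicted height in metres

model = TBFFSketch()
h = model(torch.randn(2, 3, 224, 224), torch.randn(2, 4))  # -> shape (2, 1)
```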

22 pages, 3494 KiB  
Article
Parcel Segmentation Method Combined YOLOV5s and Segment Anything Model Using Remote Sensing Image
by Xiaoqin Wu, Dacheng Wang, Caihong Ma, Yi Zeng, Yongze Lv, Xianmiao Huang and Jiandong Wang
Land 2025, 14(7), 1429; https://doi.org/10.3390/land14071429 - 8 Jul 2025
Abstract
Accurate land parcel segmentation in remote sensing imagery is critical for applications such as land use analysis, agricultural monitoring, and urban planning. However, existing methods often underperform in complex scenes due to small-object segmentation challenges, blurred boundaries, and background interference, which are often aggravated by sensor resolution and atmospheric variation. To address these limitations, we propose a dual-stage framework that combines an enhanced YOLOv5s detector with the Segment Anything Model (SAM) to improve segmentation accuracy and robustness. The improved YOLOv5s module integrates Efficient Channel Attention (ECA) and BiFPN to boost feature extraction and small-object recognition, while Soft-NMS is used to reduce missed detections. The SAM module receives bounding-box prompts from YOLOv5s and incorporates morphological refinement and mask stability scoring for improved boundary continuity and mask quality. A composite Focal-Dice loss is applied to mitigate class imbalance. In addition to the publicly available CCF BDCI dataset, we constructed a new WuJiang dataset to evaluate cross-domain performance. Experimental results demonstrate that our method achieves an IoU of 89.8% and a precision of 90.2%, outperforming baseline models and showing strong generalizability across diverse remote sensing conditions.
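
The detect-then-prompt pipeline can be sketched with the public ultralytics and segment-anything packages. The checkpoint names below are placeholders, and the paper's ECA/BiFPN modifications, Soft-NMS, morphological refinement, and stability scoring are all omitted; this shows only the box-prompting hand-off between the two stages.

```python
import numpy as np
from ultralytics import YOLO                       # detection stage
from segment_anything import sam_model_registry, SamPredictor  # segmentation stage

detector = YOLO("yolov5s_parcels.pt")              # hypothetical fine-tuned weights
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b.pth")  # placeholder path
predictor = SamPredictor(sam)

def segment_parcels(image: np.ndarray) -> list[np.ndarray]:
    """Detect parcel boxes with YOLO, then prompt SAM with each box."""
    boxes = detector(image)[0].boxes.xyxy.cpu().numpy()  # (N, 4) xyxy boxes
    predictor.set_image(image)                           # RGB HxWx3 uint8
    masks = []
    for box in boxes:
        m, _, _ = predictor.predict(box=box, multimask_output=False)
        masks.append(m[0])                               # binary HxW mask
    return masks
```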

23 pages, 15159 KiB  
Article
TBFH: A Total-Building-Focused Hybrid Dataset for Remote Sensing Image Building Detection
by Lin Yi, Feng Wang, Guangyao Zhou, Niangang Jiao, Minglin He, Jingxing Zhu and Hongjian You
Remote Sens. 2025, 17(13), 2316; https://doi.org/10.3390/rs17132316 - 6 Jul 2025
Abstract
Building extraction plays a crucial role in a variety of applications, including urban planning, high-precision 3D reconstruction, and environmental monitoring. In particular, the accurate detection of tall buildings is essential for reliable modeling and analysis. However, most existing building-detection methods are primarily trained on datasets dominated by low-rise structures, resulting in degraded performance when applied to complex urban scenes with high-rise buildings and severe occlusions. To address this limitation, we propose TBFH (Total-Building-Focused Hybrid), a novel dataset specifically designed for building detection in remote sensing imagery. TBFH comprises a diverse collection of tall buildings across various urban environments and is integrated with the publicly available WHU Building dataset to enable joint training. This hybrid strategy aims to enhance model robustness and generalization across varying urban morphologies. We also propose the KTC metric to quantitatively evaluate the structural integrity and shape fidelity of building segmentation results. We evaluated the effectiveness of TBFH on multiple state-of-the-art models, including UNet, UNetFormer, ABCNet, BANet, FCN, DeepLabV3, MANet, SegFormer, and DynamicVis. Our comparative experiments on the Tall Building dataset, the WHU dataset, and TBFH demonstrated that models trained with TBFH significantly outperformed those trained on the individual datasets, showing notable improvements in IoU, F1, and KTC scores as well as in the accuracy of building shape delineation. These findings underscore the critical importance of incorporating tall-building-focused data to improve both detection accuracy and generalization performance.
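
The joint-training strategy amounts to drawing batches from the union of the two datasets. A minimal PyTorch sketch follows; SegFolder is a hypothetical stand-in for the TBFH and WHU readers, whose actual formats are not shown in the abstract.

```python
import torch
from torch.utils.data import Dataset, ConcatDataset, DataLoader

class SegFolder(Dataset):
    """Hypothetical (image, mask) dataset standing in for TBFH/WHU loaders."""
    def __init__(self, pairs):
        self.pairs = pairs                        # list of (image, mask) tensors
    def __len__(self):
        return len(self.pairs)
    def __getitem__(self, i):
        return self.pairs[i]

# Synthetic placeholders for the two sources.
tbfh = SegFolder([(torch.randn(3, 256, 256), torch.zeros(256, 256).long())
                  for _ in range(4)])
whu = SegFolder([(torch.randn(3, 256, 256), torch.zeros(256, 256).long())
                 for _ in range(4)])

hybrid = ConcatDataset([tbfh, whu])               # the hybrid-training idea
loader = DataLoader(hybrid, batch_size=2, shuffle=True)
```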

24 pages, 27167 KiB  
Article
ICT-Net: A Framework for Multi-Domain Cross-View Geo-Localization with Multi-Source Remote Sensing Fusion
by Min Wu, Sirui Xu, Ziwei Wang, Jin Dong, Gong Cheng, Xinlong Yu and Yang Liu
Remote Sens. 2025, 17(12), 1988; https://doi.org/10.3390/rs17121988 - 9 Jun 2025
Abstract
Traditional geo-localization methods for cross-view imagery based on a single neural network primarily rely on polar coordinate transformations and suffer from limited global correlation modeling capabilities. To address these fundamental challenges of weak feature correlation and poor scene adaptation, we present a novel framework termed ICT-Net (Integrated CNN-Transformer Network) that synergistically combines convolutional neural networks with Transformer architectures. Our approach harnesses the complementary strengths of CNNs in capturing local geometric details and of Transformers in establishing long-range dependencies, enabling comprehensive joint perception of both local and global visual patterns. Furthermore, capitalizing on the Transformer’s flexible input processing mechanism, we develop an attention-guided non-uniform cropping strategy that dynamically eliminates redundant image patches with minimal impact on localization accuracy, thereby achieving enhanced computational efficiency. To facilitate practical deployment, we propose a deep embedding clustering algorithm optimized for rapid parsing of geo-localization information. Extensive experiments demonstrate that ICT-Net establishes new state-of-the-art localization accuracy on the CVUSA benchmark, achieving a top-1 recall rate improvement of 8.6% over previous methods. Additional validation on a challenging real-world dataset collected at Beihang University (BUAA) further confirms the framework’s effectiveness and practical applicability in complex urban environments, in particular showing 23% higher robustness to vegetation variations.
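
The attention-guided non-uniform cropping can be pictured as token pruning: rank patch tokens by an attention score and keep only the highest-scoring ones, which a Transformer tolerates because it accepts variable-length input. The sketch below is one reading of that idea; the keep ratio and the use of per-patch attention scores (e.g., CLS-to-patch weights) are assumptions.

```python
import torch

def prune_tokens(tokens: torch.Tensor, attn: torch.Tensor,
                 keep_ratio: float = 0.7) -> torch.Tensor:
    """Keep the patch tokens receiving the highest attention mass.

    tokens: (B, N, D) patch embeddings
    attn:   (B, N) attention score per patch
    """
    k = max(1, int(tokens.size(1) * keep_ratio))
    idx = attn.topk(k, dim=1).indices                        # (B, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, tokens.size(-1))  # (B, k, D)
    return tokens.gather(1, idx)                             # (B, k, D)

pruned = prune_tokens(torch.randn(2, 196, 768), torch.rand(2, 196))  # (2, 137, 768)
```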

27 pages, 34596 KiB  
Article
Evolution Method of Built Environment Spatial Quality in Historic Districts Based on Spatiotemporal Street View: A Case Study of Tianjin Wudadao
by Lujin Hu, Yu Liu and Bing Yu
Buildings 2025, 15(11), 1953; https://doi.org/10.3390/buildings15111953 - 4 Jun 2025
Abstract
With the accelerating pace of urbanization, historic districts are increasingly confronted with the dual challenge of coordinating heritage preservation and sustainable development. This study proposes an intelligent evaluation framework that integrates spatiotemporal street view imagery, affective perception modeling, and scene recognition to reveal the evolutionary dynamics of built environment spatial quality in historic districts. Empirical analysis based on multi-temporal data (2013–2020) from the Wudadao Historic District in Tianjin demonstrates that spatial quality is shaped by a complex interplay of factors, including planning and preservation policies, landscape greening, pedestrian-oriented design, infrastructure adequacy, and equitable resource allocation. These findings validate the framework’s effectiveness as a tool for monitoring urban sustainability. Moreover, it provides actionable insights for the development of resilient, equitable, and culturally vibrant built environments, effectively bridging the gap between technological innovation and sustainable governance in the context of historic districts.

22 pages, 12284 KiB  
Article
EcoDetect-YOLOv2: A High-Performance Model for Multi-Scale Waste Detection in Complex Surveillance Environments
by Jing Su, Ruihan Chen, Mingzhi Li, Shenlin Liu, Guobao Xu and Zanhong Zheng
Sensors 2025, 25(11), 3451; https://doi.org/10.3390/s25113451 - 30 May 2025
Cited by 1
Abstract
Conventional waste monitoring relies heavily on manual inspection, while most detection models are trained on close-range, simplified datasets, limiting their applicability to real-world surveillance. Even with surveillance imagery, challenges such as cluttered backgrounds, scale variation, and small object sizes often lead to missed detections and reduced robustness. To address these challenges, this study introduces EcoDetect-YOLOv2, a lightweight and high-efficiency object detection model developed using the Intricate Environment Waste Exposure Detection (IEWED) dataset. Building upon the YOLOv8s architecture, EcoDetect-YOLOv2 incorporates a P2 detection layer to enhance sensitivity to small objects. The integration of an efficient multi-scale attention (EMA) mechanism prior to the P2 head further improves the model’s capacity to detect small-scale targets, while bolstering robustness against cluttered backgrounds and environmental noise as well as generalizability across scale variations. In the feature fusion stage, a Dynamic Upsampling Module (Dysample) replaces traditional nearest-neighbor upsampling to yield higher-quality feature maps, thereby facilitating improved discrimination of overlapping and degraded waste objects. To reduce computational overhead and inference latency without sacrificing detection accuracy, Ghost Convolution (GhostConv) replaces conventional convolution layers within the neck. Building on this, a GhostResBottleneck structure is proposed, along with a novel ResGhostCSP module—designed via a one-shot aggregation strategy—to replace the original C2f module. Experiments on the IEWED dataset, which features multi-object, multi-class, and highly complex real-world scenes, demonstrate that EcoDetect-YOLOv2 outperforms the baseline YOLOv8s by 1.0%, 4.6%, 4.8%, and 3.1% in precision, recall, mAP50, and mAP50:95, respectively, while reducing the parameter count by 19.3%. These results highlight the model’s effectiveness in real-time, multi-object waste detection, providing a scalable and efficient tool for automated urban and digital governance.
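
Of the listed components, GhostConv has a well-known published form (from GhostNet), so a sketch is reasonably safe: a primary convolution produces half of the output channels and a cheap depth-wise convolution synthesizes the remaining "ghost" features before concatenation. The kernel sizes and activation below are assumptions.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    """Ghost Convolution: primary 1x1 conv makes half the channels, a
    depth-wise 'cheap' conv makes the rest, halves are concatenated."""
    def __init__(self, c_in, c_out, k=1, cheap_k=5):
        super().__init__()
        c_half = c_out // 2
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_half), nn.SiLU())
        self.cheap = nn.Sequential(
            nn.Conv2d(c_half, c_half, cheap_k, padding=cheap_k // 2,
                      groups=c_half, bias=False),        # depth-wise, cheap
            nn.BatchNorm2d(c_half), nn.SiLU())

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

out = GhostConv(64, 128)(torch.randn(1, 64, 40, 40))  # -> (1, 128, 40, 40)
```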

17 pages, 7128 KiB  
Article
Application of Deep Learning on Global Spaceborne Radar and Multispectral Imagery for the Estimation of Urban Surface Height Distribution
by Vivaldi Rinaldi and Masoud Ghandehari
Remote Sens. 2025, 17(7), 1297; https://doi.org/10.3390/rs17071297 - 5 Apr 2025
Abstract
Digital Surface Models (DSMs) have a wide range of applications, including the spatial and temporal analysis of human habitation. Traditionally, DSMs are generated by rasterizing Light Detection and Ranging (LiDAR) point clouds. While LiDAR provides high-resolution detail, acquiring the required data is logistically challenging and costly, leading to limited spatial coverage and temporal frequency. Satellite imagery, such as Synthetic Aperture Radar (SAR), carries information on surface height variations within the reflected signal. Transforming satellite imagery into a global DSM is challenging but would be of great value if those challenges were overcome. This study explores the application of a U-Net architecture to generate DSMs by coupling Sentinel-1 SAR and Sentinel-2 optical imagery. The model is trained on surface height data from multiple U.S. cities to produce a normalized DSM (NDSM), and we assess its ability to generalize to cities outside the training dataset. The analysis shows that the model performs moderately well on unseen test cities, but its accuracy there remains well below its accuracy on the training cities. Further examination, through the comparison of height distributions and cross-sectional analysis, reveals that estimation bias is influenced by the input image resolution and the presence of geometric distortion in the SAR image. These findings highlight the need for refined preprocessing techniques as well as advanced training approaches and model architectures that can better handle the complexities of urban landscapes encoded in satellite imagery.
(This article belongs to the Section AI Remote Sensing)
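
The SAR-plus-optical coupling can be illustrated as early fusion: stack Sentinel-1 and Sentinel-2 channels and regress a one-channel NDSM with a U-Net. The channel counts (VV/VH plus four optical bands), the network width, and the L1 loss below are assumptions; TinyUNet is only a stand-in for the paper's architecture.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """Minimal encoder-decoder stand-in for the paper's U-Net."""
    def __init__(self, c_in=6, c_out=1):
        super().__init__()
        self.enc1 = nn.Sequential(nn.Conv2d(c_in, 32, 3, padding=1), nn.ReLU())
        self.enc2 = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(32, 64, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, c_out, 1))

    def forward(self, x):
        e1 = self.enc1(x)                  # full-resolution features
        e2 = self.enc2(e1)                 # half-resolution features
        return self.dec(torch.cat([self.up(e2), e1], dim=1))  # skip connection

# Early fusion: stack Sentinel-1 (VV, VH) with four Sentinel-2 bands.
x = torch.cat([torch.randn(1, 2, 256, 256), torch.randn(1, 4, 256, 256)], dim=1)
ndsm = TinyUNet()(x)                                   # (1, 1, 256, 256) height map
loss = nn.L1Loss()(ndsm, torch.randn(1, 1, 256, 256))  # regression against truth
```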

20 pages, 4782 KiB  
Article
Self-Supervised Learning with Trilateral Redundancy Reduction for Urban Functional Zone Identification Using Street-View Imagery
by Kun Zhao, Juan Li, Shuai Xie, Lijian Zhou, Wenbin He and Xiaolin Chen
Sensors 2025, 25(5), 1504; https://doi.org/10.3390/s25051504 - 28 Feb 2025
Cited by 1
Abstract
In recent years, the use of street-view images for urban analysis has received much attention. Despite the abundance of raw data, existing supervised learning methods rely heavily on large-scale, high-quality labels. Faced with the challenge of label scarcity in urban scene classification tasks, an innovative self-supervised learning framework, Trilateral Redundancy Reduction (Tri-ReD), is proposed. In this framework, a more restrictive loss, “trilateral loss”, is proposed: by compelling the embeddings of positive samples to be highly correlated, it guides the pre-trained model to learn more essential representations without semantic labels. Furthermore, a novel data augmentation strategy, tri-branch mutually exclusive augmentation (Tri-MExA), is proposed, with the aim of reducing the uncertainty introduced by traditional random augmentation methods. As a model pre-training method, the Tri-ReD framework is architecture-agnostic, performing effectively on both CNNs and ViTs, which makes it adaptable to a wide variety of downstream tasks. In this paper, 116,491 unlabeled street-view images were used to pre-train models with Tri-ReD to obtain general ground-level representations of urban scenes. These pre-trained models were then fine-tuned using supervised data with semantic labels (17,600 images from BIC_GSV and 12,871 from BEAUTY) for the final classification task. Experimental results demonstrate that the proposed self-supervised pre-training method outperformed direct supervised learning approaches for urban functional zone identification by 19% on average. It also surpassed models pre-trained on ImageNet by around 11%, achieving state-of-the-art (SOTA) results for self-supervised pre-training.
(This article belongs to the Section Remote Sensors)
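
The abstract does not give the trilateral loss in closed form. One plausible reading, sketched below, extends Barlow-Twins-style redundancy reduction to all three pairs of branch embeddings so that every positive pair is forced to be highly correlated; the paper's actual formulation may differ.

```python
import torch

def barlow_term(za: torch.Tensor, zb: torch.Tensor, lam: float = 5e-3) -> torch.Tensor:
    """Redundancy-reduction term between two embedding batches (B, D),
    in the style of Barlow Twins."""
    za = (za - za.mean(0)) / (za.std(0) + 1e-6)   # standardize per dimension
    zb = (zb - zb.mean(0)) / (zb.std(0) + 1e-6)
    c = (za.T @ zb) / za.size(0)                  # cross-correlation matrix (D, D)
    on = ((torch.diagonal(c) - 1) ** 2).sum()     # pull diagonal toward 1
    off = (c ** 2).sum() - (torch.diagonal(c) ** 2).sum()  # suppress off-diagonal
    return on + lam * off

def trilateral_loss(z1, z2, z3):
    """Assumed 'trilateral' objective: the pairwise term over all three
    pairs of the three branches' embeddings."""
    return barlow_term(z1, z2) + barlow_term(z2, z3) + barlow_term(z1, z3)

views = [torch.randn(32, 128) for _ in range(3)]  # three augmented views' embeddings
loss = trilateral_loss(*views)
```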

20 pages, 6841 KiB  
Article
Analysis of the Spatial Distributions and Mechanisms Influencing Abandoned Farmland Based on High-Resolution Satellite Imagery
by Wei Su, Yueming Hu, Fangyan Xue, Xiaoping Fu, Hao Yang, Hui Dai and Lu Wang
Land 2025, 14(3), 501; https://doi.org/10.3390/land14030501 - 28 Feb 2025
Abstract
Due to the rapid expansion of urban areas, the aging of agricultural labor, and the loss of the rural workforce, some regions in China have experienced farmland abandonment. Remote sensing technology allows for the rapid and accurate extraction of abandoned farmland, which is of great significance for research on land-use change, food security, and ecological and environmental conservation. This research takes Qiaotou Town in Chengmai County, Hainan Province, as the study area. Using four scenes of high-resolution satellite imagery, digital elevation models, and other relevant data, the random forest classification method was applied to extract abandoned farmland and analyze its spatial distribution characteristics, and the accuracy of the results was verified. Based on these findings, the study examines the influence of four factors—irrigation conditions, slope, accessibility, and proximity to residential areas—on farmland abandonment and proposes corresponding governance policies. The results indicate that the accuracy of abandoned farmland extraction using high-resolution satellite imagery is 93.29%. Seasonal farmland abandonment is more prevalent than perennial abandonment in the study area. Among the influencing factors, the abandonment rate decreases with increasing distance from road buffer zones, increases with greater distance from water systems, and decreases with increasing distance from residential areas. Most of the abandoned farmland is located in areas with gentler slopes, which have a relatively smaller impact on farmland abandonment. This study provides valuable references for the extraction of abandoned farmland and for analyzing the abandonment mechanisms in the study area, supporting agricultural economic development and the implementation of rural revitalization strategies.
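
The classification step itself is standard random forest work. A minimal scikit-learn sketch with synthetic data follows; the eight per-sample features stand in for the kind of inputs the abstract implies (spectral bands, DEM-derived slope, and similar covariates).

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic per-pixel (or per-object) feature vectors and labels.
rng = np.random.default_rng(0)
X = rng.random((1000, 8))                    # 8 assumed features per sample
y = rng.integers(0, 2, 1000)                 # 1 = abandoned, 0 = cultivated

rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X[:800], y[:800])                     # train on 80% of samples
print("held-out accuracy:", rf.score(X[800:], y[800:]))
```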

22 pages, 25824 KiB  
Article
NoctuDroneNet: Real-Time Semantic Segmentation of Nighttime UAV Imagery in Complex Environments
by Ruokun Qu, Jintao Tan, Yelu Liu, Chenglong Li and Hui Jiang
Drones 2025, 9(2), 97; https://doi.org/10.3390/drones9020097 - 27 Jan 2025
Abstract
Nighttime semantic segmentation represents a challenging frontier in computer vision, made particularly difficult by severe low-light conditions, pronounced noise, and complex illumination patterns. These challenges intensify in Unmanned Aerial Vehicle (UAV) imagery, where varying camera angles and altitudes compound the difficulty. In this paper, we introduce NoctuDroneNet (Nocturnal UAV Drone Network), a real-time segmentation model tailored specifically to nighttime UAV scenarios. Our approach integrates convolution-based global reasoning with training-only semantic alignment modules to effectively handle diverse and extreme nighttime conditions. We construct a new dataset, NUI-Night, focusing on low-illumination UAV scenes, to rigorously evaluate performance under conditions rarely represented in standard benchmarks. Beyond NUI-Night, we assess NoctuDroneNet on the Varied Drone Dataset (VDD), a normal-illumination UAV dataset, demonstrating the model’s robustness and adaptability to varying flight domains despite the lack of large-scale low-light UAV benchmarks. Furthermore, evaluations on the Night-City dataset confirm its scalability and applicability to complex nighttime urban environments. NoctuDroneNet achieves state-of-the-art performance on NUI-Night, surpassing strong real-time baselines in both segmentation accuracy and speed. Qualitative analyses highlight its resilience to under- and over-exposure and its ability to detect small objects, underscoring its potential for real-world applications such as UAV emergency landings under minimal illumination.

20 pages, 3724 KiB  
Article
Unveiling Urban River Visual Features Through Immersive Virtual Reality: Analyzing Youth Perceptions with UAV Panoramic Imagery
by Yunlei Shou, Zexin Lei, Jiaying Li and Junjie Luo
ISPRS Int. J. Geo-Inf. 2024, 13(11), 402; https://doi.org/10.3390/ijgi13110402 - 7 Nov 2024
Cited by 2
Abstract
The visual evaluation and characteristic analysis of urban rivers are pivotal for advancing our understanding of urban waterscapes and their surrounding environments. Unmanned aerial vehicles (UAVs) offer significant advantages over traditional satellite remote sensing, including flexible aerial surveying, diverse perspectives, and high-resolution imagery. This study centers on the Haihe River, South Canal, and North Canal in Tianjin, China, employing UAVs to capture continuous panoramic image data. Through immersive virtual reality (VR) technology, visual evaluations of these panoramic images were obtained from a cohort of young participants. These evaluations encompassed assessments of scenic beauty, color richness, vitality, and historical sense. Subsequently, computer vision techniques were used to quantitatively analyze the proportions of various landscape elements (e.g., trees, grass, buildings) within the images. Clustering analysis of the visual evaluation results and the semantic segmentation outcomes from different study points facilitated the effective identification and grouping of river visual features. The findings reveal significant differences in scenic beauty, color richness, and vitality among the Haihe River, South Canal, and North Canal, whereas the South and North Canals exhibited a limited sense of history. Six landscape elements—water bodies, buildings, trees, and others—comprised over 90% of the images, forming the primary visual characteristics of the three rivers. Nonetheless, the uneven spatial distribution of these elements resulted in notable variations in the visual features of the rivers. This study demonstrates that the visual feature analysis method based on UAV panoramic images can achieve a quantitative evaluation of multi-scene urban 3D landscapes, thereby providing a robust scientific foundation for the optimization of urban river environments.
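
The quantitative side, element proportions from segmentation masks followed by clustering of study points, is straightforward to sketch. The class count, the feature layout (proportions plus mean VR ratings), and k=3 below are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans

def element_proportions(seg_mask: np.ndarray, n_classes: int) -> np.ndarray:
    """Share of each landscape element (water, buildings, trees, ...)
    in one panoramic image's semantic-segmentation mask."""
    counts = np.bincount(seg_mask.ravel(), minlength=n_classes)
    return counts / counts.sum()

rng = np.random.default_rng(0)
# One row per study point: 6 element proportions + 4 mean VR ratings
# (beauty, color richness, vitality, historical sense). Synthetic data.
props = np.stack([element_proportions(rng.integers(0, 6, (512, 1024)), 6)
                  for _ in range(30)])
ratings = rng.random((30, 4))
features = np.hstack([props, ratings])           # (30, 10) point descriptors

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(features)
```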

17 pages, 5076 KiB  
Article
A Scene–Object–Economy Framework for Identifying and Validating Urban–Rural Fringe Using Multisource Geospatial Big Data
by Ganmin Yin, Ying Feng, Yanxiao Jiang and Yi Bao
Appl. Sci. 2024, 14(22), 10191; https://doi.org/10.3390/app142210191 - 6 Nov 2024
Abstract
Rapid urbanization has led to the emergence of urban–rural fringes, complex transitional zones that challenge traditional urban–rural dichotomies. While these areas play a crucial role in urban development, their precise identification remains a significant challenge. Existing methods often rely on single-dimensional metrics or administrative boundaries, failing to capture the multi-faceted nature of these zones. This study introduces a novel “Scene–Object–Economy” (SOE) framework to address these limitations and enhance the precision of urban–rural fringe identification. Our approach integrates multisource geospatial big data, including remote sensing imagery, nightlight data, buildings, and Points of Interest (POI), leveraging machine learning techniques. The SOE framework constructs features along three dimensions: scene (image features), object (buildings), and economy (POIs). This multidimensional methodology allows for a more comprehensive and nuanced mapping of urban–rural fringes, overcoming the constraints of traditional methods. The study demonstrates the effectiveness of the SOE framework in accurately delineating urban–rural fringes through multidimensional validation. Our results reveal distinct spatial patterns and characteristics of these transitional zones, providing valuable insights for urban planners and policymakers. Furthermore, the integration of dynamic population data as a separate layer of analysis offers a unique perspective on population distribution patterns within the identified fringes. This research contributes a more robust and objective approach to urban–rural fringe identification, laying the groundwork for improved urban management and sustainable development strategies. The SOE framework presents a promising tool for future studies in urban spatial analysis and planning.

20 pages, 1584 KiB  
Article
Hyperspectral Image Classification Algorithm for Forest Analysis Based on a Group-Sensitive Selective Perceptual Transformer
by Shaoliang Shi, Xuyang Li, Xiangsuo Fan and Qi Li
Appl. Sci. 2024, 14(20), 9553; https://doi.org/10.3390/app14209553 - 19 Oct 2024
Abstract
Substantial advancements have been achieved in hyperspectral image (HSI) classification through contemporary deep learning techniques. Nevertheless, the incorporation of an excessive number of irrelevant tokens in large-scale remote sensing data results in inefficient long-range modeling. To overcome this hurdle, this study introduces the Group-Sensitive Selective Perception Transformer (GSAT) framework, which builds upon the Vision Transformer (ViT) to enhance HSI classification outcomes. The innovation of the GSAT architecture is evident in several key aspects. Firstly, the GSAT incorporates a group-sensitive Pixel Group Mapping (PGM) module, which organizes pixels into distinct groups. This allows the global self-attention mechanism to function within these groupings, effectively capturing local interdependencies within spectral channels. This grouping tactic not only boosts the model’s spatial awareness but also lessens computational complexity, enhancing overall efficiency. Secondly, the GSAT addresses the detrimental effects of superfluous tokens on model efficacy by introducing the Sensitivity Selection Framework (SSF) module. This module selectively identifies the most pertinent tokens for classification, thereby minimizing distractions from extraneous information and bolstering the model’s representational strength. Furthermore, the SSF refines local representation through multi-scale feature selection, enabling the model to more effectively encapsulate feature data across various scales. Additionally, the GSAT architecture adeptly represents both global and local features of HSI data by merging global self-attention with local feature extraction. This integration strategy not only elevates classification precision but also enhances the model’s versatility in navigating complex scenes, particularly in urban mapping scenarios, where it significantly outclasses previous deep learning methods. The GSAT architecture thus rectifies the inefficiencies of traditional deep learning approaches in processing extensive remote sensing imagery and markedly enhances HSI classification through its group-sensitive and selective perception mechanisms. Empirical testing on six standard HSI datasets confirms the superior performance of the proposed GSAT method.
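
The grouping idea can be illustrated by running self-attention within fixed-size token groups rather than over the full sequence, which is where the cost saving comes from. The contiguous-chunk grouping below is a stand-in for the PGM module, whose actual pixel-to-group mapping is not specified in the abstract.

```python
import torch
import torch.nn as nn

class GroupSelfAttention(nn.Module):
    """Sketch of group-wise self-attention: tokens are partitioned into
    groups and attention runs within each group independently."""
    def __init__(self, dim=64, heads=4, group_size=16):
        super().__init__()
        self.group_size = group_size
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens):            # (B, N, D), N divisible by group_size
        b, n, d = tokens.shape
        g = tokens.reshape(b * n // self.group_size, self.group_size, d)
        out, _ = self.attn(g, g, g)       # attention restricted to each group
        return out.reshape(b, n, d)

y = GroupSelfAttention()(torch.randn(2, 64, 64))  # -> (2, 64, 64)
```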

25 pages, 6736 KiB  
Article
LFIR-YOLO: Lightweight Model for Infrared Vehicle and Pedestrian Detection
by Quan Wang, Fengyuan Liu, Yi Cao, Farhan Ullah and Muxiong Zhou
Sensors 2024, 24(20), 6609; https://doi.org/10.3390/s24206609 - 14 Oct 2024
Cited by 5
Abstract
The complexity of urban road scenes at night and the inadequacy of visible-light imaging in such conditions pose significant challenges. To address the insufficient color information and texture detail and the low spatial resolution of infrared imagery, we propose an enhanced infrared detection model called LFIR-YOLO, built upon the YOLOv8 architecture. The primary goal is to improve the accuracy of infrared target detection in nighttime traffic scenarios while meeting practical deployment requirements. First, to address challenges such as limited contrast and occlusion noise in infrared images, the C2f module in the high-level backbone network is augmented with a Dilation-wise Residual (DWR) module, incorporating multi-scale infrared contextual information to enhance feature extraction capabilities. Second, at the neck of the network, a Content-guided Attention (CGA) mechanism is applied to fuse features and re-modulate both initial and advanced features, catering to the low signal-to-noise ratio and sparse detail features characteristic of infrared images. Third, a shared convolution strategy is employed in the detection head, replacing the decoupled-head strategy and utilizing shared Detail Enhancement Convolution (DEConv) and Group Norm (GN) operations to achieve lightweight yet precise improvements. Finally, two loss functions, PIoU v2 and Adaptive Threshold Focal Loss (ATFL), are integrated into the model to better decouple infrared targets from the background and to enhance convergence speed. Experimental results on the FLIR and multispectral datasets show that the proposed LFIR-YOLO model achieves improvements in detection accuracy of 4.3% and 2.6%, respectively, compared to the YOLOv8 model. Furthermore, the model reduces parameters and computational complexity by 15.5% and 34%, respectively, enhancing its suitability for real-time deployment on resource-constrained edge devices.
(This article belongs to the Section Sensing and Imaging)
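
The shared-convolution head can be sketched as a single GroupNorm conv stack reused across all pyramid levels, which is where the parameter savings come from; GN keeps normalization statistics stable across the differing scales. DEConv is approximated by a plain 3x3 convolution here, and all channel widths are assumptions.

```python
import torch
import torch.nn as nn

class SharedDetectHead(nn.Module):
    """Sketch of a shared detection head: one conv stack with GroupNorm
    applied to every pyramid level instead of separate decoupled heads."""
    def __init__(self, ch=128, num_outputs=64):
        super().__init__()
        self.shared = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.GroupNorm(16, ch), nn.SiLU(),
            nn.Conv2d(ch, ch, 3, padding=1), nn.GroupNorm(16, ch), nn.SiLU())
        self.pred = nn.Conv2d(ch, num_outputs, 1)  # per-level prediction map

    def forward(self, pyramid):           # list of (B, ch, Hi, Wi) feature maps
        return [self.pred(self.shared(p)) for p in pyramid]

feats = [torch.randn(1, 128, s, s) for s in (80, 40, 20)]  # three FPN levels
outs = SharedDetectHead()(feats)          # one output map per level
```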
