MDPI - Publisher of Open Access Journals

28 pages, 1607 KiB

Open Access Article

Self-Supervised Keypoint Learning for the Geometric Analysis of Road-Marking Templates

by Chayanon Sub-r-pa and Rung-Ching Chen

Algorithms 2025, 18(7), 379; https://doi.org/10.3390/a18070379 - 23 Jun 2025

Viewed by 279

Robust visual perception and geometric alignment are crucial for intelligent automation in various domains, such as industrial processes and infrastructure monitoring. Accurately aligning structured visual elements, such as floor markings or road-marking templates, is essential for tasks like automated guidance, verification, and condition assessment. However, traditional feature-based methods struggle with templates that feature simple geometries and lack rich textures, making reliable feature matching and alignment difficult, even under controlled conditions. To address this, we propose GeoTemplateKPNet, a novel self-supervised deep-learning framework, built upon Convolutional Neural Networks (CNNs), designed to learn robust, geometrically consistent keypoints specifically in synthetic template images. The model is trained exclusively in a synthetic template dataset by enforcing equivariance to geometric transformations and utilizing self-supervised losses, including inside mask loss, peakiness loss, repulsion loss, and keypoint-driven image reprojection loss, thereby eliminating the need for manual keypoint annotations. We evaluate the method in a synthetic template test set, using metrics such as a keypoint-matching comparison, the Inside Mask Rate (IMR), and the Alignment Reconstruction Error (ARE). The results demonstrate that GeoTemplateKPNet successfully learns to predict meaningful keypoints on template structures, enabling accurate alignment between templates and their transformed counterparts. Ablation studies reveal that the number of keypoints (K) impacts the performance, with K = 3 providing the most suitable balance for the overall alignment accuracy, although the performance varies across different template geometries. GeoTemplateKPNet offers a foundational self-supervised solution for the robust geometric analysis of templates, which is crucial for downstream alignment tasks and applications. Full article

(This article belongs to the Special Issue Data-Driven Intelligent Modeling and Optimization Algorithms for Industrial Processes: 2nd Edition)
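
The abstract names the Inside Mask Rate (IMR) without defining it. One plausible reading is the fraction of predicted keypoints that land on template (foreground) pixels. The sketch below illustrates that reading only; the function name `inside_mask_rate`, the `mask[y][x]` layout, and the (x, y) pixel convention are assumptions, not taken from the paper.

```python
def inside_mask_rate(keypoints, mask):
    """Fraction of (x, y) keypoints that fall on nonzero mask pixels.

    Assumed conventions (hypothetical): mask is a list of rows indexed as
    mask[y][x], and keypoints are given in pixel coordinates.
    """
    if not keypoints:
        return 0.0
    height, width = len(mask), len(mask[0])
    inside = 0
    for x, y in keypoints:
        col, row = int(round(x)), int(round(y))
        # Keypoints outside the image count as misses.
        if 0 <= row < height and 0 <= col < width and mask[row][col]:
            inside += 1
    return inside / len(keypoints)


# Toy 4x4 template mask with a 2x2 foreground square (made-up data).
mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
keypoints = [(1, 1), (2, 2), (3, 0)]  # K = 3 keypoints, one off the template
print(inside_mask_rate(keypoints, mask))
```

Under this reading, a rate of 1.0 would mean every predicted keypoint sits on the template structure.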

29 pages, 138770 KiB

Open Access Article

Regional-Scale Detection of Palms Using VHR Satellite Imagery and Deep Learning in the Guyanese Rainforest

by Matthew J. Drouillard and Anthony R. Cummings

Remote Sens. 2024, 16(24), 4642; https://doi.org/10.3390/rs16244642 - 11 Dec 2024

Cited by 1 | Viewed by 1066

Abstract

Arecaceae (palms) play a crucial role for native communities and wildlife in the Amazon region. This study presents a first-of-its-kind regional-scale spatial cataloging of palms using remotely sensed data for the country of Guyana. Using very high-resolution satellite images from the GeoEye-1 and WorldView-2 sensor platforms, which collectively cover an area of 985 km², a total of 472,753 individual palm crowns are detected with F1 scores of 0.76 and 0.79, respectively, using a convolutional neural network (CNN) instance segmentation model. An example of CNN model transference between images is presented, emphasizing the limitation and practical application of this approach. A method is presented to optimize precision and recall using the confidence of the detection features; this results in a decrease of 45% and 31% in false positive detections, with a moderate increase in false negative detections. The sensitivity of the CNN model to the size of the training set is evaluated, showing that comparable metrics could be achieved with approximately 50% of the samples used in this study. Finally, the diameter of the palm crown is calculated based on the polygon identified by mask detection, resulting in an average of 7.83 m, a standard deviation of 1.05 m, and a range of {4.62, 13.90} m for the GeoEye-1 image. Similarly, for the WorldView-2 image, the average diameter is 8.08 m, with a standard deviation of 0.70 m and a range of {4.82, 15.80} m. Full article

(This article belongs to the Special Issue Deep Learning Techniques Applied in Remote Sensing)
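
The abstract describes using detection confidence to trade false positives against false negatives. A minimal sketch of that idea follows, assuming each detection has already been labeled as a true or false positive; the function name, the threshold values, and the toy data are hypothetical and do not reproduce the study's method or results.

```python
def precision_recall_f1(detections, num_ground_truth, threshold):
    """Score detections whose confidence is at least `threshold`.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of real objects in the evaluated area.
    """
    kept = [is_tp for conf, is_tp in detections if conf >= threshold]
    true_pos = sum(kept)
    precision = true_pos / len(kept) if kept else 0.0
    recall = true_pos / num_ground_truth if num_ground_truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up detections: (confidence, matched a real crown?)
detections = [
    (0.95, True), (0.90, True), (0.70, False),
    (0.60, True), (0.40, False), (0.30, False),
]
for threshold in (0.0, 0.65):
    print(threshold, precision_recall_f1(detections, 4, threshold))
```

Raising the threshold drops low-confidence detections, so precision tends to rise while recall falls; the operating point is a judgment call for the application.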

17 pages, 32002 KiB

Open Access Article

Automated Shoreline Segmentation in Satellite Imagery Using USV Measurements

by Antoni Jaszcz, Marta Włodarczyk-Sielicka, Andrzej Stateczny, Dawid Połap and Ilona Garczyńska

Remote Sens. 2024, 16(23), 4457; https://doi.org/10.3390/rs16234457 - 27 Nov 2024

Cited by 4 | Viewed by 1657

Abstract

Generating aerial shoreline segmentation masks can be a daunting task, often requiring manual labeling or correction. This is further problematic because neural segmentation models require decent and abundant data for training, requiring even more manpower to automate the process. In this paper, we propose utilizing Unmanned Surface Vehicles (USVs) in an automated shoreline segmentation system on satellite imagery. The remotely controlled vessel first collects above- and underwater shoreline information using light detection and ranging (LiDAR) and multibeam echosounder (MBES) measuring instruments, resulting in a geo-referenced 3D point cloud. After cleaning and processing these data, the system integrates the projected map with an aerial image of the region. Based on the height values of the mapped points, the image is segmented. Finally, post-processing methods and the k-NN algorithm are introduced, resulting in a complete binary shoreline segmentation mask. The obtained data were used for training U-Net-type segmentation models with pre-trained backbones. The InceptionV3-based model achieved an accuracy of 96% and a dice coefficient score of 93%, demonstrating the effectiveness of the proposed system as a source of data acquisition for training deep neural networks. Full article

(This article belongs to the Section Environmental Remote Sensing)
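
The accuracy and Dice coefficient reported in the abstract are standard mask-comparison metrics. The sketch below shows how they are commonly computed for binary masks; the flat-list mask representation and the toy values are assumptions for illustration, not the authors' evaluation code.

```python
def dice_coefficient(pred, target):
    """Dice = 2 * |A and B| / (|A| + |B|) for flat binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in target if t)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


def pixel_accuracy(pred, target):
    """Share of pixels where prediction and target agree."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    if not pred:
        return 1.0
    matches = sum(1 for p, t in zip(pred, target) if bool(p) == bool(t))
    return matches / len(pred)


# Made-up 6-pixel masks: 1 = land/shoreline class, 0 = water.
pred = [1, 1, 0, 0, 1, 0]
target = [1, 0, 0, 0, 1, 1]
print(dice_coefficient(pred, target), pixel_accuracy(pred, target))
```

Dice ignores true negatives, which is why it can sit below pixel accuracy when the background class dominates.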

9 pages, 1242 KiB

Open Access Entry

Geomasking to Safeguard Geoprivacy in Geospatial Health Data

by Jue Wang

Encyclopedia 2024, 4(4), 1581-1589; https://doi.org/10.3390/encyclopedia4040103 - 21 Oct 2024

Viewed by 1772

Definition

Geomasking is a set of techniques that introduces noise or intentional errors into geospatial data to minimize the risk of identifying exact location information related to individuals while preserving the utility of the data to a controlled extent. It protects the geoprivacy of the data contributor and mitigates potential harm from data breaches while promoting safer data sharing. The development of digital health technologies and the extensive use of individual geospatial data in health studies have raised concerns about geoprivacy. The individual tracking data and health information, if accessed by unauthorized parties, may lead to privacy invasions, criminal activities, and discrimination. These risks underscore the importance of robust protective measures in the collection, management, and sharing of sensitive data. Geomasking techniques have been developed to safeguard geoprivacy in geospatial health data, addressing the risks and challenges associated with data sharing. This entry paper discusses the importance of geoprivacy in geospatial health data and introduces various kinds of geomasking methods and their applications in balancing the protection of individual privacy with the need for data sharing to ensure scientific reproducibility, highlighting the urgent need for more effective geomasking techniques and their applications. Full article

(This article belongs to the Section Mathematics & Computer Science)
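
To make the idea of introducing controlled noise concrete, the sketch below shows one widely used variant, often called donut masking, in which each point is moved by a random distance between a minimum and a maximum radius. The entry does not say which methods it covers in detail, so this variant, the function name `donut_mask`, and the use of planar coordinates in metres are assumptions chosen for illustration.

```python
import math
import random


def donut_mask(x, y, r_min, r_max, rng=random):
    """Displace a planar point (metres) by a random offset.

    The displacement distance lies in [r_min, r_max], so the masked point
    is never closer than r_min to the true location (privacy) and never
    farther than r_max (utility).
    """
    if not 0 <= r_min <= r_max:
        raise ValueError("require 0 <= r_min <= r_max")
    angle = rng.uniform(0.0, 2.0 * math.pi)
    # Sample the squared radius uniformly so points are spread evenly
    # over the ring's area rather than clustering near the inner edge.
    radius = math.sqrt(rng.uniform(r_min ** 2, r_max ** 2))
    return x + radius * math.cos(angle), y + radius * math.sin(angle)


rng = random.Random(42)  # fixed seed only to make the demo repeatable
masked = donut_mask(1000.0, 2000.0, 50.0, 200.0, rng)
print(masked)
```

The two radii express the trade-off the entry describes: a larger `r_min` strengthens privacy, while a smaller `r_max` preserves more spatial utility.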

19 pages, 5199 KiB

Open Access Article

Geometry-Aware Enhanced Mutual-Supervised Point Elimination with Overlapping Mask Contrastive Learning for Partitial Point Cloud Registration

by Yue Dai, Shuilin Wang, Chunfeng Shao, Heng Zhang and Fucang Jia

Electronics 2024, 13(20), 4074; https://doi.org/10.3390/electronics13204074 - 16 Oct 2024

Viewed by 1252

Abstract

Point cloud registration is one of the fundamental tasks in computer vision, but faces challenges under low overlap conditions. Recent approaches use transformers and overlapping masks to improve perception, but mask learning only considers Euclidean distances between features, ignores mismatches caused by fuzzy geometric structures, and is often computationally inefficient. To address these issues, we introduce a novel matching framework. Firstly, we fuse adaptive graph convolution with PPF features to obtain rich feature perception. Subsequently, we construct a PGT framework that uses GeoTransformer and combines it with location information encoding to enhance the geometry perception between source and target clouds. In addition, we improve the visibility of overlapping regions through information exchange and the AIS module, aiming at subsequent keypoint extraction, preserving points with distinct geometrical structures while suppressing the influence of non-overlapping regions to improve computational efficiency. Finally, the mask is refined through contrast learning to preserve geometric and distance similarity, which helps to compute the transformation parameters more accurately. We have conducted comprehensive experiments on synthetic and real-world scene datasets, demonstrating superior registration performance compared to recent deep learning methods. Our approach shows remarkable improvements of 68.21% in $R_{RMSE}$ and 76.31% in $t_{RMSE}$ on synthetic data, while also excelling in real-world scenarios with enhancements of 76.46% in $R_{RMSE}$ and 45.16% in $t_{RMSE}$. Full article

37 pages, 6394 KiB

Open Access Article

Insights into the Effects of Tile Size and Tile Overlap Levels on Semantic Segmentation Models Trained for Road Surface Area Extraction from Aerial Orthophotography

by Calimanut-Ionut Cira, Miguel-Ángel Manso-Callejo, Ramon Alcarria, Teresa Iturrioz and José-Juan Arranz-Justel

Remote Sens. 2024, 16(16), 2954; https://doi.org/10.3390/rs16162954 - 12 Aug 2024

Cited by 2 | Viewed by 2722

Abstract

Studies addressing the supervised extraction of geospatial elements from aerial imagery with semantic segmentation operations (including road surface areas) commonly feature tile sizes varying from 256 × 256 pixels to 1024 × 1024 pixels with no overlap. Relevant geo-computing works in the field often comment on prediction errors that could be attributed to the effect of tile size (number of pixels or the amount of information in the processed image) or to the overlap levels between adjacent image tiles (caused by the absence of continuity information near the borders). This study provides further insights into the impact of tile overlaps and tile sizes on the performance of deep learning (DL) models trained for road extraction. In this work, three semantic segmentation architectures were trained on data from the SROADEX dataset (orthoimages and their binary road masks) that contains approximately 700 million pixels of the positive “Road” class for the road surface area extraction task. First, a statistical analysis is conducted on the performance metrics achieved on unseen testing data featuring around 18 million pixels of the positive class. The goal of this analysis was to study the difference in mean performance and the main and interaction effects of the fixed factors on the dependent variables. The statistical tests proved that the impact on performance was significant for the main effects and for the two-way interaction between tile size and tile overlap and between tile size and DL architecture, at a level of significance of 0.05. We provide further insights and trends in the predictions of the extensive qualitative analysis carried out with the predictions of the best models at each tile size. The results indicate that training the DL models on larger tile sizes with a small percentage of overlap delivers better road representations and that testing different combinations of model and tile sizes can help achieve a better extraction performance. Full article

(This article belongs to the Special Issue Advances in Remote Sensing and Digital Twin Technologies for Transportation Infrastructure)
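
The two factors studied here, tile size and tile overlap, determine where tiles start along each image axis. The sketch below computes those start offsets for one axis; the function name `tile_origins`, the fractional-overlap convention, and the example values are assumptions, and the study's own tiling procedure may differ.

```python
def tile_origins(length, tile_size, overlap):
    """Start offsets of tiles along one axis.

    overlap is the fraction of a tile shared with its neighbour, in [0, 1).
    The last tile is shifted back so it ends flush with the image border.
    """
    if tile_size <= 0 or not 0 <= overlap < 1:
        raise ValueError("need tile_size > 0 and 0 <= overlap < 1")
    if length <= tile_size:
        return [0]  # image smaller than one tile: caller must pad
    stride = max(1, int(round(tile_size * (1 - overlap))))
    origins = list(range(0, length - tile_size + 1, stride))
    if origins[-1] != length - tile_size:
        origins.append(length - tile_size)
    return origins


# Hypothetical 1024-pixel axis cut into 512-pixel tiles.
print(tile_origins(1024, 512, 0.0))    # no overlap
print(tile_origins(1024, 512, 0.125))  # 12.5% overlap
```

Applying the function to both axes and taking the Cartesian product of the offsets gives the top-left corner of every tile.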

21 pages, 31109 KiB

Open Access Article

InstLane Dataset and Geometry-Aware Network for Instance Segmentation of Lane Line Detection

by Qimin Cheng, Jiajun Ling, Yunfei Yang, Kaiji Liu, Huanying Li and Xiao Huang

Remote Sens. 2024, 16(15), 2751; https://doi.org/10.3390/rs16152751 - 28 Jul 2024

Cited by 1 | Viewed by 1262

Abstract

Despite impressive progress, obtaining appropriate data for instance-level lane segmentation remains a significant challenge. This limitation hinders the refinement of granular lane-related applications such as lane line crossing surveillance, pavement maintenance, and management. To address this gap, we introduce a benchmark for lane instance segmentation called InstLane. To the best of our knowledge, InstLane constitutes the first publicly accessible instance-level segmentation standard for lane line detection. The complexity of InstLane emanates from the fact that the original data are procured using cameras mounted laterally, as opposed to traditional front-mounted sensors. InstLane encapsulates a range of challenging scenarios, enhancing the generalization and robustness of the lane line instance segmentation algorithms. In addition, we propose GeoLaneNet, a real-time, geometry-aware lane instance segmentation network. Within GeoLaneNet, we design a finer localization of lane proto-instances based on geometric features to counteract the prevalent omission or multiple detections in dense lane scenarios resulting from non-maximum suppression (NMS). Furthermore, we present a scheme that employs a larger receptive field to achieve profound perceptual lane structural learning, thereby improving detection accuracy. We introduce an architecture based on partial feature transformation to expedite the detection process. Comprehensive experiments on InstLane demonstrate that GeoLaneNet can achieve up to twice the speed of current state-of-the-art methods, reaching 139 FPS on an RTX3090 and a mask AP of 73.55%, with a permissible trade-off in AP, while maintaining comparable accuracy. These results underscore the effectiveness, robustness, and efficiency of GeoLaneNet in autonomous driving. Full article

(This article belongs to the Special Issue Advances in Remote Sensing of Solving Challenges in Autonomous Driving and Safety Analysis)

21 pages, 3492 KiB

Open Access Article

A Question and Answering Service of Typhoon Disasters Based on the T5 Large Language Model

by Yongqi Xia, Yi Huang, Qianqian Qiu, Xueying Zhang, Lizhi Miao and Yixiang Chen

ISPRS Int. J. Geo-Inf. 2024, 13(5), 165; https://doi.org/10.3390/ijgi13050165 - 14 May 2024

Cited by 10 | Viewed by 3184

Abstract

A typhoon disaster is a common meteorological disaster that seriously impacts natural ecology, social economy, and even human sustainable development. It is crucial to access the typhoon disaster information, and the corresponding disaster prevention and reduction strategies. However, traditional question and answering (Q&A) methods exhibit shortcomings like low information retrieval efficiency and poor interactivity. This makes it difficult to satisfy users’ demands for obtaining accurate information. Consequently, this work proposes a typhoon disaster knowledge Q&A approach based on LLM (T5). This method integrates two technical paradigms of domain fine-tuning and retrieval-augmented generation (RAG) to optimize user interaction experience and improve the precision of disaster information retrieval. The process specifically includes the following steps. First, this study selects information about typhoon disasters from open-source databases, such as Baidu Encyclopedia and Wikipedia. Utilizing techniques such as slicing and masked language modeling, we generate a training set and 2204 Q&A pairs specifically focused on typhoon disaster knowledge. Second, we continuously pretrain the T5 model using the training set. This process involves encoding typhoon knowledge as parameters in the neural network’s weights and fine-tuning the pretrained model with Q&A pairs to adapt the T5 model for downstream Q&A tasks. Third, when responding to user queries, we retrieve passages from external knowledge bases semantically similar to the queries to enhance the prompts. This action further improves the response quality of the fine-tuned model. Finally, we evaluate the constructed typhoon agent (Typhoon-T5) using different similarity-matching approaches. Furthermore, the method proposed in this work lays the foundation for the cross-integration of large language models with disaster information. It is expected to promote the further development of GeoAI. Full article

(This article belongs to the Special Issue Innovative GIS Models and Approaches for Large Environmental and Urban Applications in the Age of AI)
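
The third step in the abstract, retrieving passages similar to the query and using them to enhance the prompt, can be sketched in a few lines. The abstract does not specify the retriever, so the bag-of-words cosine similarity below is a stand-in, and the function names, prompt layout, and sample passages are all hypothetical.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, passages, top_k=1):
    """Return the top_k passages most similar to the query."""
    query_vec = Counter(query.lower().split())
    ranked = sorted(
        passages,
        key=lambda p: cosine_similarity(query_vec, Counter(p.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages, top_k=1):
    """Prepend retrieved context to the question (assumed prompt format)."""
    context = "\n".join(retrieve(query, passages, top_k))
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


# Made-up knowledge base entries.
passages = [
    "typhoons form over warm tropical ocean water",
    "evacuate low-lying coastal areas before typhoon landfall",
    "rice is harvested in autumn",
]
print(build_prompt("how do typhoons form", passages))
```

A production system would more likely rank passages with dense sentence embeddings, but the control flow of retrieving, concatenating, and then generating stays the same.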

12 pages, 7133 KiB

Open Access Communication

Deterministic Global 3D Fractal Cloud Model for Synthetic Scene Generation

by Aaron M. Schinder, Shannon R. Young, Bryan J. Steward, Michael Dexter, Andrew Kondrath, Stephen Hinton and Ricardo Davila

Remote Sens. 2024, 16(9), 1622; https://doi.org/10.3390/rs16091622 - 30 Apr 2024

Cited by 2 | Viewed by 1953

Abstract

This paper describes the creation of a fast, deterministic, 3D fractal cloud renderer for the AFIT Sensor and Scene Emulation Tool (ASSET). The renderer generates 3D clouds by ray marching through a volume and sampling the level-set of a fractal function. The fractal function is distorted by a displacement map, which is generated using horizontal wind data from a Global Forecast System (GFS) weather file. The vertical windspeed and relative humidity are used to mask the creation of clouds to match realistic large-scale weather patterns over the Earth. Small-scale detail is provided by the fractal functions which are tuned to match natural cloud shapes. This model is intended to run quickly, and it can run in about 700 ms per cloud type. This model generates clouds that appear to match large-scale satellite imagery, and it reproduces natural small-scale shapes. This should enable future versions of ASSET to generate scenarios where the same scene is consistently viewed from both GEO and LEO satellites from multiple perspectives. Full article

(This article belongs to the Section Atmospheric Remote Sensing)

19 pages, 25201 KiB

Open Access Technical Note

Disparity Refinement for Stereo Matching of High-Resolution Remote Sensing Images Based on GIS Data

by Xuanqi Wang, Liting Jiang, Feng Wang, Hongjian You and Yuming Xiang

Remote Sens. 2024, 16(3), 487; https://doi.org/10.3390/rs16030487 - 26 Jan 2024

Cited by 5 | Viewed by 2476

Abstract

With the emergence of the Smart City concept, the rapid advancement of urban three-dimensional (3D) reconstruction becomes imperative. While current developments in the field of 3D reconstruction have enabled the generation of 3D products such as Digital Surface Models (DSM), challenges persist in accurately reconstructing shadows, handling occlusions, and addressing low-texture areas in very-high-resolution remote sensing images. These challenges often lead to difficulties in calculating satisfactory disparity maps using existing stereo matching methods, thereby reducing the accuracy of 3D reconstruction. This issue is particularly pronounced in urban scenes, which contain numerous super high-rise and densely distributed buildings, resulting in large disparity values and occluded regions in stereo image pairs, and further leading to a large number of mismatched points in the obtained disparity map. In response to these challenges, this paper proposes a method to refine the disparity in urban scenes based on open-source GIS data. First, we register the GIS data with the epipolar-rectified images since there always exists unignorable geolocation errors between them. Specifically, buildings with different heights present different offsets in GIS data registering; thus, we perform multi-modal matching for each building and merge them into the final building mask. Subsequently, a two-layer optimization process is applied to the initial disparity map based on the building mask, encompassing both global and local optimization. Finally, we perform a post-correction on the building facades to obtain the final refined disparity map that can be employed for high-precision 3D reconstruction. Experimental results on SuperView-1, GaoFen-7, and GeoEye satellite images show that the proposed method has the ability to correct the occluded and mismatched areas in the initial disparity map generated by both hand-crafted and deep-learning stereo matching methods. The DSM generated by the refined disparity reduces the average height error from 2.2 m to 1.6 m, which demonstrates superior performance compared with other disparity refinement methods. Furthermore, the proposed method is able to improve the integrity of the target structure and present steeper building facades and complete roofs, which are conducive to subsequent 3D model generation. Full article

(This article belongs to the Special Issue 3D Information Recovery and 2D Image Processing for Remotely Sensed Optical Images II)

31 pages, 45264 KiB

Open Access Editor’s Choice Review

Porous Material (Titanium Gas Diffusion Layer) in Proton Exchange Membrane Fuel Cell/Electrolyzer: Fabrication Methods & GeoDict: A Critical Review

by Javid Hussain, Dae-Kyeom Kim, Sangmin Park, Muhammad-Waqas Khalid, Sayed-Sajid Hussain, Bin Lee, Myungsuk Song and Taek-Soo Kim

Materials 2023, 16(13), 4515; https://doi.org/10.3390/ma16134515 - 21 Jun 2023

Cited by 11 | Viewed by 4696

Abstract

Proton exchange membrane fuel cell (PEMFC) is a renewable energy source rapidly approaching commercial viability. The performance is significantly affected by the transfer of fluid, charges, and heat; gas diffusion layer (GDL) is primarily concerned with the consistent transfer of these components, which are heavily influenced by the material and design. High-efficiency GDL must have excellent thermal conductivity, electrical conductivity, permeability, corrosion resistance, and high mechanical characteristics. The first step in creating a high-performance GDL is selecting the appropriate material. Therefore, titanium is a suitable substitute for steel or carbon due to its high strength-to-weight and superior corrosion resistance. The second crucial parameter is the fabrication method that governs all the properties. This review seeks to comprehend numerous fabrication methods such as tape casting, 3D printing, freeze casting, phase separation technique, and lithography, along with the porosity controller in each process such as partial sintering, input design, ice structure, pore agent, etching time, and mask width. Moreover, other GDL properties are being studied, including microstructure and morphology. In the future, GeoDict simulation is highly recommended for optimizing various GDL properties, as it is frequently used for other porous materials. The approach can save time and energy compared to intensive experimental work. Full article

(This article belongs to the Special Issue Design, Synthesis and Characterization of Novel Porous Materials)

30 pages, 6551 KiB

Open Access Perspective

We Have Eaten the Rivers: The Past, Present, and Unsustainable Future of Hydroelectricity in Vietnam

by Gerard Sasges and Alan D. Ziegler

Sustainability 2023, 15(11), 8969; https://doi.org/10.3390/su15118969 - 1 Jun 2023

Cited by 11 | Viewed by 6121

Abstract

Vietnam has one of the most intensively energy-exploited riverscapes in Asia with at least 720 hydropower facilities of various capacities currently in operation or in some stage of construction. These facilities represent about 26 GW of installed capacity. This degree of domestic exploitation is often overshadowed by the geopolitically contested manipulation of the waters of the international Mekong River. In contrast, the utilization of Vietnam’s hydropower resources has unfolded gradually and largely unremarked for more than half a century. This perspective argues that the harnessing of rivers and streams for electricity generation is the result of not only the country’s abundant hydrologic resources, but also its history, culture, and (geo)politics. The paper traces the processes that have produced this high level of river exploitation, its ambiguous history, and the uncertain future of hydropower in Vietnam in the context of sustainability. Further, the renewed interest in dam-building in recent years is part of a “theater of decarbonization” that masks the operation of powerful domestic and international lobbies with an interest in “heavy engineering” projects that will do little to meet the nation’s rapidly growing electricity needs but will likely incur detrimental ecological and sociological impacts. The paper ends by positing that rather than forging ahead with the construction of additional small hydropower facilities, a more ecologically and socially equitable policy could instead critically examine the sustainability of existing capabilities, resolve the factors limiting the development of other renewable sources of energy, and face the fundamental challenge of curbing energy use. Full article

(This article belongs to the Special Issue Tropical Rivers and Wetlands: Impacts, Hazards, Conservation, and Management)

22 pages, 3954 KiB

Open Access Article

Co-Visual Pattern-Augmented Generative Transformer Learning for Automobile Geo-Localization

by Jianwei Zhao, Qiang Zhai, Pengbo Zhao, Rui Huang and Hong Cheng

Remote Sens. 2023, 15(9), 2221; https://doi.org/10.3390/rs15092221 - 22 Apr 2023

Cited by 11 | Viewed by 2810

Abstract

Geolocation is a fundamental component of route planning and navigation for unmanned vehicles, but GNSS-based geolocation fails under denial-of-service conditions. Cross-view geo-localization (CVGL), which aims to estimate the geographic location of the ground-level camera by matching against enormous geo-tagged aerial (e.g., satellite) images, has received a lot of attention but remains extremely challenging due to the drastic appearance differences across aerial–ground views. In existing methods, global representations of different views are extracted primarily using Siamese-like architectures, but their interactive benefits are seldom taken into account. In this paper, we present a novel approach using cross-view knowledge generative techniques in combination with transformers, namely mutual generative transformer learning (MGTL), for CVGL. Specifically, by taking the initial representations produced by the backbone network, MGTL develops two separate generative sub-modules—one for aerial-aware knowledge generation from ground-view semantics and vice versa—and fully exploits the entirely mutual benefits through the attention mechanism. Moreover, to better capture the co-visual relationships between aerial and ground views, we introduce a cascaded attention masking algorithm to further boost accuracy. Extensive experiments on challenging public benchmarks, i.e., CVACT and CVUSA, demonstrate the effectiveness of the proposed method, which sets new records compared with the existing state-of-the-art models. Our code will be available upon acceptance. Full article

(This article belongs to the Special Issue Information Extraction, Processing and Analysis Methods for Remote Sensing Multi-Modal Information Navigation Applications)

23 pages, 10870 KiB

Open AccessFeature PaperArticle

Automated Rice Phenology Stage Mapping Using UAV Images and Deep Learning

by Xiangyu Lu, Jun Zhou, Rui Yang, Zhiyan Yan, Yiyuan Lin, Jie Jiao and Fei Liu

Drones 2023, 7(2), 83; https://doi.org/10.3390/drones7020083 - 25 Jan 2023

Cited by 16 | Viewed by 4727

Abstract

Accurate monitoring of rice phenology is critical for crop management, cultivar breeding, and yield estimation. Previous research on phenology detection relied on time-series data, orthomosaics, and manually plotted regions, which are difficult to automate. This study presented a novel approach for extracting and mapping phenological traits directly from the unmanned aerial vehicle (UAV) photograph sequence. First, a multi-stage rice field segmentation dataset containing four growth stages and 2600 images, namely PaddySeg, was built. Moreover, an efficient Ghost Bilateral Network (GBiNet) was proposed to generate trait masks. To locate the trait of each pixel, we introduced direct geo-locating (DGL) and incremental sparse sampling (ISS) techniques to eliminate redundant computation. According to the results on PaddySeg, the proposed GBiNet, with a 91.50% mean Intersection-over-Union (mIoU) and a speed of 41 frames per second (FPS), outperformed the baseline model, BiSeNetV2 (90.95%, 36 FPS), while the fastest variant, GBiNet_t, reached 62 FPS, about 1.7 times as fast as the baseline. Additionally, the measured average DGL deviation was less than 1% of the relative height. Finally, the mapping of rice phenology was achieved by interpolation on trait value–location pairs. The proposed approach demonstrated great potential for automatic rice phenology stage surveying and mapping.
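For readers unfamiliar with the mIoU metric quoted in this abstract, the following is a minimal sketch of how mean Intersection-over-Union is commonly computed for a segmentation result. It is not the authors' evaluation code: the function name `mean_iou`, the flat-list input format, and the choice to skip classes absent from both prediction and ground truth are assumptions made for illustration.

```python
from typing import Sequence


def mean_iou(pred: Sequence[int], target: Sequence[int], num_classes: int) -> float:
    """Mean IoU over classes for two equally long sequences of class labels.

    Hypothetical helper for illustration; each element is the class index of
    one pixel (a 2-D mask can be flattened row by row before calling this).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")

    ious = []
    for c in range(num_classes):
        # Pixels labelled c in both masks.
        intersection = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        # Pixels labelled c in at least one mask.
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        # Assumption: a class missing from both masks does not count.
        if union > 0:
            ious.append(intersection / union)

    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    predicted = [0, 0, 1, 1]
    ground_truth = [0, 1, 1, 1]
    # Class 0: IoU = 1/2, class 1: IoU = 2/3, so the mean is 7/12.
    print(f"mIoU = {mean_iou(predicted, ground_truth, num_classes=2):.4f}")
```

Averaging per-class IoU, rather than pooling all pixels, keeps small classes from being dominated by large ones. Whether the paper averages per image or over the whole dataset is not stated in the abstract.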

(This article belongs to the Special Issue UAS in Smart Agriculture)

28 pages, 15345 KiB

Open AccessArticle

GeoBERT: Pre-Training Geospatial Representation Learning on Point-of-Interest

by Yunfan Gao, Yun Xiong, Siqi Wang and Haofen Wang

Appl. Sci. 2022, 12(24), 12942; https://doi.org/10.3390/app122412942 - 16 Dec 2022

Cited by 17 | Viewed by 6053

Abstract

Thanks to the development of geographic information technology, geospatial representation learning based on POIs (Points-of-Interest) has gained widespread attention in the past few years. POIs are an important indicator of urban socioeconomic activities and are widely used to extract geospatial information. However, previous studies often focus on a specific area, such as a city or a district, and are designed only for particular tasks, such as land-use classification. On the other hand, large-scale pre-trained models (PTMs) have recently achieved impressive success and become a milestone in artificial intelligence (AI). Against this background, this study proposes the first large-scale pre-training geospatial representation learning model, called GeoBERT. First, we collect about 17 million POIs in 30 cities across China to construct pre-training corpora, with 313 POI types as the tokens and the level-7 Geohash grids as the basic units. Second, we pre-train GeoBERT to learn grid embeddings in a self-supervised manner by masking POI types and then predicting them. Third, under the paradigm of “pre-training + fine-tuning”, we design five practical downstream tasks. Experiments show that, with just one additional output layer fine-tuned, GeoBERT outperforms previous NLP methods (Word2vec, GloVe) used in geospatial representation learning by 9.21% on average in F1-score for classification tasks, such as store site recommendation and working/living area prediction. For regression tasks, such as POI number prediction, house price prediction, and passenger flow prediction, GeoBERT demonstrates greater performance improvements. The experimental results show that pre-training on large-scale POI data can significantly improve the ability to extract geospatial information. In the discussion section, we provide a detailed analysis of what GeoBERT has learned from the perspective of attention mechanisms.
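To make the "level-7 Geohash grids" mentioned in this abstract more concrete, here is a minimal sketch of the standard public Geohash encoding, which maps a latitude/longitude pair to a grid-cell string. It is not taken from the GeoBERT code: the function name `geohash_encode` is hypothetical, and reading "level-7" as a 7-character Geohash is an assumption.

```python
# Standard Geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat: float, lon: float, precision: int = 7) -> str:
    """Encode a coordinate as a Geohash string of `precision` characters.

    Hypothetical helper for illustration. Assumption: a "level-7" grid
    corresponds to precision=7.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0          # accumulates the current 5-bit group
    bit_count = 0
    use_lon = True    # bits alternate, starting with longitude

    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:
            # Every 5 bits become one base-32 character.
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0

    return "".join(chars)


if __name__ == "__main__":
    # All POIs whose coordinates produce the same 7-character string
    # fall into the same grid cell.
    print(geohash_encode(57.64911, 10.40744))
```

Under this reading, each grid cell can be treated as a "document" whose "tokens" are the types of the POIs it contains, which matches the abstract's description of POI types as tokens and grids as basic units.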

(This article belongs to the Special Issue Natural Language Processing (NLP) and Applications)

Search Results (42)
