Advanced Application of Artificial Intelligence and Machine Vision in Remote Sensing (Third Edition)

A special issue of Remote Sensing (ISSN 2072-4292). This special issue belongs to the section "AI Remote Sensing".

Deadline for manuscript submissions: 15 June 2025 | Viewed by 27,804

Special Issue Editor


Dr. Hossein M. Rizeei
Guest Editor
1. Faculty of Engineering and IT, University of Technology Sydney, Ultimo, NSW, Australia
2. McGregor Coxall Australia Pty Ltd., Sydney, NSW, Australia
Interests: machine learning; geospatial 3D analysis; geospatial database querying; web GIS; airborne/spaceborne image processing; feature extraction; time-series analysis for forecast modelling; domain adaptation in various environmental applications

Special Issue Information

Dear Colleagues,

We are excited to introduce a new Special Issue, building upon the success of our previous endeavor, “Advanced Application of Artificial Intelligence and Machine Vision in Remote Sensing”. This edition delves deeper into the confluence of cutting-edge technologies, with a particular focus on drone-based and LiDAR-based image processing and on the integration of artificial intelligence (AI) into urban planning.

Over the last decade, AI and machine learning (ML) techniques have significantly impacted image processing and spatial analysis across various applications. AI has empowered us to unlock the true potential of imagery data, employing tailored algorithms for tasks such as classification, regression, clustering, spatial correlation modeling, and more. Deep neural networks, commonly known as deep learning, stand out as powerful tools within this domain, performing functions like pattern recognition, feature detection, trend prediction, instance segmentation, semantic segmentation, and image classification within neural network frameworks.

Traditionally, structured remotely sensed data often required painstaking manual labelling for training models, a subjective and non-transferable process. To address these challenges, "machine vision" (MV) has emerged as a holistic solution, streamlining the workflow from image acquisition to knowledge extraction. MV leverages AI technology to minimise computation time and maximise replicable accuracy, encompassing software products and hardware architectures such as CPUs, GPU/FPGA combinations, parallel processing, and computer vision techniques.

In this Special Issue, we invite scholarly manuscripts proposing frameworks that combine machine vision with state-of-the-art AI techniques and geospatial information systems to automate the processing of remotely sensed imagery from diverse sources, including drones, LiDAR, radar, SAR, and multispectral sensors. The primary objective is to achieve higher precision in a range of spatial applications, from urban planning to environmental studies, weather and climate analysis, the energy sector, natural resource management, landscape assessment, and geo-hazard monitoring.

As this Special Issue takes shape, we anticipate groundbreaking contributions that will reshape urban planning and related domains. These endeavors, enriched by drone-based and LiDAR-based imaging alongside innovative AI-driven image processing, will guide us towards a more intelligent, efficient, and sustainable future.

Dr. Hossein M. Rizeei
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Remote Sensing is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2700 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • artificial intelligence (AI)
  • machine vision (MV)
  • machine learning (ML)
  • geospatial information systems (GIS)
  • optimisation
  • spatial framework
  • deep learning (DL)

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.


Published Papers (15 papers)


Research

18 pages, 19260 KiB  
Article
Refraction-Aware Structure from Motion for Airborne Bathymetry
by Alexandros Makris, Vassilis C. Nicodemou, Evangelos Alevizos, Iason Oikonomidis, Dimitrios D. Alexakis and Anastasios Roussos
Remote Sens. 2024, 16(22), 4253; https://doi.org/10.3390/rs16224253 - 15 Nov 2024
Viewed by 414
Abstract
In this work, we introduce the first pipeline that combines a refraction-aware structure from motion (SfM) method with a deep learning model specifically designed for airborne bathymetry. We accurately estimate the 3D positions of the submerged points by integrating refraction geometry within the SfM optimization problem. This way, no refraction correction is required as post-processing. Experiments with simulated data that approach real-world capturing conditions demonstrate that SfM with refraction correction is extremely accurate, with submillimeter errors. We integrate our refraction-aware SfM within a deep learning framework that also takes into account radiometric information, developing a combined spectral and geometry-based approach, with further improvements in accuracy and robustness to different seafloor types, both textured and textureless. We conducted experiments with real-world data at two locations in the southern Mediterranean Sea, with varying seafloor types, which demonstrate the benefits of refraction correction for the deep learning framework. We made our refraction-aware SfM open source, providing researchers in airborne bathymetry with a practical tool to apply SfM in shallow-water areas.
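
The geometric core of the method, bending each air-side camera ray at the water surface before triangulating, can be illustrated with a short sketch. This is a minimal, hedged illustration of two-media ray geometry only, not the paper's bundle-adjustment formulation; the flat surface at z = 0 and a constant seawater index of 1.34 are assumptions:

```python
import numpy as np

N_WATER = 1.34  # assumed constant refractive index of seawater

def refract(d, n=np.array([0.0, 0.0, 1.0]), eta=1.0 / N_WATER):
    """Bend unit direction d at a surface with unit normal n (vector form of Snell's law)."""
    cos_i = -np.dot(n, d)                 # incidence cosine (ray travels downward)
    sin2_t = eta**2 * (1.0 - cos_i**2)
    cos_t = np.sqrt(1.0 - sin2_t)         # air -> water, so no total internal reflection
    return eta * d + (eta * cos_i - cos_t) * n

def underwater_point(cam_center, d, water_z=0.0, travel=5.0):
    """Trace an air ray to the flat surface z = water_z, bend it, advance `travel` metres."""
    d = d / np.linalg.norm(d)
    t = (water_z - cam_center[2]) / d[2]  # parametric intersection with the surface plane
    hit = cam_center + t * d
    return hit + travel * refract(d)

# Example: a drone 30 m above the surface, looking slightly off-nadir.
p = underwater_point(np.array([0.0, 0.0, 30.0]), np.array([0.1, 0.0, -1.0]))
```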

26 pages, 5895 KiB  
Article
SeFi-CD: A Semantic First Change Detection Paradigm That Can Detect Any Change You Want
by Ling Zhao, Zhenyang Huang, Yipeng Wang, Chengli Peng, Jun Gan, Haifeng Li and Chao Hu
Remote Sens. 2024, 16(21), 4109; https://doi.org/10.3390/rs16214109 - 3 Nov 2024
Viewed by 957
Abstract
The existing change detection (CD) methods can be summarized as the visual-first change detection (ViFi-CD) paradigm, which first extracts change features from visual differences and then assigns them specific semantic information. However, CD is essentially dependent on change regions of interest (CRoIs), meaning that the CD results are directly determined by changes in the semantics of interest, making its primary factor the semantics of interest rather than visual differences. The ViFi-CD paradigm can only assign specific semantics of interest to specific change features extracted from visual differences, leading to the inevitable omission of potential CRoIs and the inability to adapt to different CRoI CD tasks. In other words, changes in other CRoIs cannot be detected by a ViFi-CD method without retraining the model or significantly modifying it. This paper introduces a new CD paradigm, semantic-first CD (SeFi-CD). The core idea of SeFi-CD is to first perceive the dynamic semantics of interest and then visually search for change features related to those semantics. Based on the SeFi-CD paradigm, we designed Anything You Want Change Detection (AUWCD). Experiments on public datasets demonstrate that AUWCD outperforms current state-of-the-art CD methods, achieving an average F1 score 5.01% higher than that of these advanced supervised baselines on the SECOND dataset, with a maximum increase of 13.17%. The proposed SeFi-CD offers a novel CD perspective and approach.
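
As a toy illustration of the semantic-first idea: once any open-vocabulary segmenter has produced masks for the prompted semantics at both epochs, the change map is simply where those semantics appear or disappear. The `segment` call below is a hypothetical stand-in for such a model, not a component of AUWCD:

```python
import numpy as np

def semantic_first_change(mask_t1: np.ndarray, mask_t2: np.ndarray) -> np.ndarray:
    """Change map for one semantic prompt: pixels where the class appears or disappears."""
    return np.logical_xor(mask_t1, mask_t2)

# mask_t1 = segment(img_t1, prompt="building")   # `segment` = any open-vocabulary
# mask_t2 = segment(img_t2, prompt="building")   #  segmentation model (hypothetical)
# change = semantic_first_change(mask_t1, mask_t2)
```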

21 pages, 2923 KiB  
Article
Multi-Scale Classification and Contrastive Regularization: Weakly Supervised Large-Scale 3D Point Cloud Semantic Segmentation
by Jingyi Wang, Jingyang He, Yu Liu, Chen Chen, Maojun Zhang and Hanlin Tan
Remote Sens. 2024, 16(17), 3319; https://doi.org/10.3390/rs16173319 - 7 Sep 2024
Viewed by 945
Abstract
With the proliferation of large-scale 3D point cloud datasets, the high cost of per-point annotation has spurred the development of weakly supervised semantic segmentation methods. Current popular research mainly focuses on single-scale classification, which fails to address the significant feature scale differences between background and objects in large scenes. Therefore, we propose MCCR (Multi-scale Classification and Contrastive Regularization), an end-to-end semantic segmentation framework for large-scale 3D scenes under weak supervision. MCCR first aggregates features and applies random downsampling to the input data. Then, it captures the local features of a random point based on multi-layer features and the input coordinates. These features are then fed into the network to obtain the initial and final prediction results, and MCCR iteratively trains the model using strategies such as contrastive learning. Notably, MCCR combines multi-scale classification with contrastive regularization to fully exploit multi-scale features and weakly labeled information. We investigate both point-level and local contrastive regularization to leverage point cloud augmentation and local semantic information and introduce a Decoupling Layer to guide the loss optimization in different spaces. Results on three popular large-scale datasets, S3DIS, SemanticKITTI, and SensatUrban, demonstrate that our model achieves state-of-the-art (SOTA) performance on large-scale outdoor datasets with only 0.1% of points labeled for supervision, while maintaining strong performance on indoor datasets.
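
A common form of point-level contrastive regularization, sketched here generically rather than as MCCR's exact loss, is an InfoNCE objective that pulls corresponding points from two augmented views of a cloud together and pushes all other points apart:

```python
import torch
import torch.nn.functional as F

def point_infonce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """InfoNCE over matched point embeddings.
    z1, z2: (N, D) features of the same N points under two different augmentations."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                   # (N, N) cosine similarities
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)      # positives sit on the diagonal

loss = point_infonce(torch.randn(512, 64), torch.randn(512, 64))
```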

26 pages, 5826 KiB  
Article
An Efficient Task Implementation Modeling Framework with Multi-Stage Feature Selection and AutoML: A Case Study in Forest Fire Risk Prediction
by Ye Su, Longlong Zhao, Hongzhong Li, Xiaoli Li, Jinsong Chen and Yuankai Ge
Remote Sens. 2024, 16(17), 3190; https://doi.org/10.3390/rs16173190 - 29 Aug 2024
Viewed by 833
Abstract
As data science advances, automated machine learning (AutoML) gains attention for lowering barriers, saving time, and enhancing efficiency. However, with increasing data dimensionality, AutoML struggles with large-scale feature sets. Effective feature selection is crucial for efficient AutoML in multi-task applications. This study proposes an efficient modeling framework combining a multi-stage feature selection (MSFS) algorithm and AutoSklearn, a robust and efficient AutoML framework, to address high-dimensional data challenges. The MSFS algorithm includes three stages: mutual information gain (MIG), recursive feature elimination with cross-validation (RFECV), and a voting aggregation mechanism, ensuring comprehensive consideration of feature correlation, importance, and stability. Based on multi-source and time series remote sensing data, this study pioneers the application of AutoSklearn for forest fire risk prediction. Using this case study, we compare MSFS with five other feature selection (FS) algorithms, including three single FS algorithms and two hybrid FS algorithms. Results show that MSFS selects half of the original features (12/24), effectively handling collinearity (eliminating 11 out of 13 collinear feature groups) and increasing AutoSklearn's success rate by 15%, outperforming two FS algorithms with the same number of features by 7% and 5%. Among the six FS algorithms and the no-FS baseline, MSFS demonstrates the highest prediction performance and stability with minimal variance (0.09%) across five evaluation metrics. MSFS efficiently filters redundant features, enhancing AutoSklearn's operational efficiency and generalization ability in high-dimensional tasks. The MSFS–AutoSklearn framework significantly improves AutoML's production efficiency and prediction accuracy, facilitating the efficient implementation of various real-world tasks and the wider application of AutoML.
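
The three MSFS stages map naturally onto standard scikit-learn components. The sketch below is a hedged reconstruction under assumptions (the random-forest estimator, fold count, scoring metric, and vote scheme are illustrative choices, not the paper's settings):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV, mutual_info_classif

def msfs_like(X, y, n_keep=12, mi_keep=18, n_runs=5):
    """Stage 1: rank by mutual information; stage 2: RFECV on the survivors;
    stage 3: vote across differently seeded runs and keep the most stable features."""
    mi = mutual_info_classif(X, y, random_state=0)
    stage1 = np.argsort(mi)[::-1][:mi_keep]          # top-ranked features by MI
    votes = np.zeros(mi_keep)
    for seed in range(n_runs):
        rfecv = RFECV(RandomForestClassifier(random_state=seed), cv=5, scoring="f1")
        rfecv.fit(X[:, stage1], y)                   # assumes a binary fire-risk label
        votes += rfecv.support_.astype(int)          # one vote per retained feature
    return stage1[np.argsort(votes)[::-1][:n_keep]]  # indices into the original X
```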

20 pages, 2725 KiB  
Article
Soil Organic Carbon Estimation via Remote Sensing and Machine Learning Techniques: Global Topic Modeling and Research Trend Exploration
by Tong Li, Lizhen Cui, Yu Wu, Timothy I. McLaren, Anquan Xia, Rajiv Pandey, Hongdou Liu, Weijin Wang, Zhihong Xu, Xiufang Song, Ram C. Dalal and Yash P. Dang
Remote Sens. 2024, 16(17), 3168; https://doi.org/10.3390/rs16173168 - 27 Aug 2024
Cited by 2 | Viewed by 2653
Abstract
Understanding and monitoring soil organic carbon (SOC) stocks is crucial for ecosystem carbon cycling, services, and addressing global environmental challenges. This study employs the BERTopic model and bibliometric trend analysis to comprehensively analyze global SOC estimates. BERTopic, a topic modeling technique based on BERT (Bidirectional Encoder Representations from Transformers), integrates recent advances in natural language processing. The research analyzed 1761 papers on SOC and remote sensing (RS), in addition to 490 related papers on machine learning (ML) techniques. BERTopic modeling identified nine research themes for SOC estimation using RS, emphasizing spectral prediction models, carbon cycle dynamics, and agricultural impacts on SOC. In contrast, for the literature on RS and ML, it identified five thematic clusters: spatial forestry analysis, hyperspectral soil analysis, agricultural deep learning, multitemporal imaging of farmland SOC, and RS platforms (Sentinel-2 and synthetic aperture radar, SAR). From 1991 to 2023, research on SOC estimation using RS and ML has evolved from basic mapping to topics like carbon sequestration and modeling with Sentinel-2A and big data. In summary, this study traces the historical growth and thematic evolution of SOC research, identifying synergies between RS and ML and focusing on SOC estimation with advanced ML techniques. These findings are critical to global ecosystem SOC assessments and environmental policy formulation.
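
BERTopic is an open-source Python library, so the topic-discovery step can be approximated in a few lines. The corpus variable and parameter values below are placeholders for illustration, not the study's configuration:

```python
from bertopic import BERTopic

abstracts = [...]  # placeholder: the 1761 SOC/RS abstracts are not reproduced here

topic_model = BERTopic(language="english", min_topic_size=15)  # assumed settings
topics, probs = topic_model.fit_transform(abstracts)
print(topic_model.get_topic_info().head(10))  # one row per discovered research theme
```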

26 pages, 75294 KiB  
Article
SOD-YOLO: Small-Object-Detection Algorithm Based on Improved YOLOv8 for UAV Images
by Yangang Li, Qi Li, Jie Pan, Ying Zhou, Hongliang Zhu, Hongwei Wei and Chong Liu
Remote Sens. 2024, 16(16), 3057; https://doi.org/10.3390/rs16163057 - 20 Aug 2024
Cited by 1 | Viewed by 4132
Abstract
The rapid development of unmanned aerial vehicle (UAV) technology has contributed to the increasing sophistication of UAV-based object-detection systems, which are now extensively utilized in civilian and military sectors. However, object detection from UAV images faces numerous challenges, including significant variations in object size, changing spatial configurations, and cluttered backgrounds with multiple interfering elements. To address these challenges, we propose SOD-YOLO, an innovative model based on YOLOv8, to detect small objects in UAV images. The model integrates the receptive field convolutional block attention module (RFCBAM) in the backbone network to perform downsampling, improving feature extraction efficiency and mitigating the spatial information sparsity caused by downsampling. Additionally, we developed a novel neck architecture called the balanced spatial and semantic information fusion pyramid network (BSSI-FPN), designed for multi-scale feature fusion. The BSSI-FPN effectively balances spatial and semantic information across feature maps using three primary strategies: fully utilizing large-scale features, increasing the frequency of multi-scale feature fusion, and implementing dynamic upsampling. The experimental results on the VisDrone2019 dataset demonstrate that SOD-YOLO-s improves the mAP50 indicator by 3% compared to YOLOv8s while reducing the number of parameters and computational complexity by 84.2% and 30%, respectively. Compared to YOLOv8l, SOD-YOLO-l improves the mAP50 indicator by 7.7% and reduces the number of parameters by 59.6%. Compared to other existing methods, SOD-YOLO-l achieves the highest detection accuracy, demonstrating the superiority of the proposed method.
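
The RFCBAM itself is specific to the paper, but its attention ingredient follows the familiar channel-plus-spatial (CBAM-style) pattern. The PyTorch sketch below shows only that generic pattern, as an assumed simplification of the published module:

```python
import torch
import torch.nn as nn

class ChannelSpatialAttention(nn.Module):
    """CBAM-style attention: channel gating from pooled statistics, then a spatial gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca                                   # re-weight channels
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa                                # re-weight spatial positions

y = ChannelSpatialAttention(64)(torch.randn(1, 64, 80, 80))  # shape preserved
```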

20 pages, 7768 KiB  
Article
SimNFND: A Forward-Looking Sonar Denoising Model Trained on Simulated Noise-Free and Noisy Data
by Taihong Yang, Tao Zhang and Yiqing Yao
Remote Sens. 2024, 16(15), 2815; https://doi.org/10.3390/rs16152815 - 31 Jul 2024
Viewed by 923
Abstract
Given the propagation characteristics of sound waves and the complexity of the underwater environment, denoising forward-looking sonar (FLS) image data presents a formidable challenge. Existing studies often add noise to sonar images and then explore methods for its removal. This approach neglects the inherent complex noise in sonar images, resulting in inaccurate evaluations of traditional denoising methods and poor learning of noise characteristics by deep learning models. To address the lack of high-quality data for FLS denoising model training, we propose a simulation algorithm for forward-looking sonar data based on RGBD data. By utilizing rendering techniques and noise simulation algorithms, high-quality noise-free and noisy sonar data can be rapidly generated from existing RGBD data. Based on these data, we optimize the loss function and training process of the FLS denoising model, achieving significant improvements in noise removal and feature preservation compared to other methods. Finally, this paper performs both qualitative and quantitative analyses of the algorithm's performance using real and simulated sonar data. Compared to the latest FLS denoising models based on traditional methods and deep learning techniques, our method demonstrates significant advantages in denoising capability. All inference results for the Marine Debris Dataset (MDD) have been made open source, facilitating subsequent research and comparison.
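
The paper's simulator is purpose-built for FLS imagery; as a generic stand-in for the noise-injection step, multiplicative speckle plus additive sensor noise over a rendered noise-free intensity image conveys the idea (the Rayleigh model and the noise levels are assumptions):

```python
import numpy as np

def simulate_sonar_noise(clean: np.ndarray, scale: float = 1.0, seed: int = 0) -> np.ndarray:
    """Add multiplicative Rayleigh speckle and additive Gaussian sensor noise.
    `clean` is a rendered, noise-free intensity image with values in [0, 1]."""
    rng = np.random.default_rng(seed)
    speckle = rng.rayleigh(scale, size=clean.shape) / (scale * np.sqrt(np.pi / 2))  # mean ~ 1
    noisy = clean * speckle + rng.normal(0.0, 0.02, size=clean.shape)
    return np.clip(noisy, 0.0, 1.0)

noisy = simulate_sonar_noise(np.random.rand(64, 96))  # yields a (noisy, clean) training pair
```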

22 pages, 20661 KiB  
Article
Automated Flood Prediction along Railway Tracks Using Remotely Sensed Data and Traditional Flood Models
by Abdul-Rashid Zakaria, Thomas Oommen and Pasi Lautala
Remote Sens. 2024, 16(13), 2332; https://doi.org/10.3390/rs16132332 - 26 Jun 2024
Viewed by 1667
Abstract
Ground hazards are a significant problem in the global economy, costing millions of dollars in damage each year. Railroad tracks are vulnerable to ground hazards like flooding since they traverse multiple terrains with complex environmental factors and diverse human developments. Traditionally, flood-hazard assessments are generated using models like the Hydrological Engineering Center–River Analysis System (HEC-RAS). However, these maps are typically created for design flood events (10-, 50-, 100-, and 500-year) and are not available for specific storm events, as they are not designed for individual flood predictions. Remotely sensed methods, on the other hand, offer precise flood extents only during the flooding itself, so the actual extents cannot be determined beforehand. Railroad agencies need daily flood extent maps before rainfall events to manage and plan for the parts of the railroad network that will be impacted. A new approach combines traditional flood-modeling layers with remotely sensed flood model outputs, such as flood maps created using the Google Earth Engine, and applies machine-learning tools to flood prediction and extent mapping. This allows the flood extent to be determined daily for each rainfall event from rainfall forecasts; flooding extents are thus modeled before the actual flood, allowing railroad managers to plan for flood events pre-emptively. Two approaches were used: support vector machines and deep neural networks. Both were fine-tuned using grid-search cross-validation; the deep neural network was chosen as the best model since it was computationally less expensive to train and had fewer type II errors (false negatives), which were the priorities for the flood modeling, making it suitable for an automated system covering the entire railway corridor. The best deep neural network was then deployed and used to assess the extent of flooding for two floods, in 2020 and 2022. The results indicate that the model accurately approximates the actual flooding extent and can predict flooding daily using rainfall forecasts.
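
Both candidate models were tuned with grid-search cross-validation. A hedged scikit-learn sketch of that step is below, on synthetic stand-in data; the parameter grid and the use of recall as a proxy for minimizing false negatives (type II errors) are assumptions:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for per-location flood predictors (rainfall, terrain, HEC-RAS layers, ...)
X, y = make_classification(n_samples=2000, n_features=12, weights=[0.8], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

grid = GridSearchCV(
    MLPClassifier(max_iter=1000, random_state=0),
    param_grid={"hidden_layer_sizes": [(64,), (64, 32)], "alpha": [1e-4, 1e-3]},
    scoring="recall",  # recall penalizes the false negatives the authors prioritized
    cv=5,
)
grid.fit(X_tr, y_tr)
print(grid.best_params_, grid.score(X_te, y_te))
```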

18 pages, 6891 KiB  
Article
Enhancing Machine Learning Performance in Estimating CDOM Absorption Coefficient via Data Resampling
by Jinuk Kim, Jin Hwi Kim, Wonjin Jang, JongCheol Pyo, Hyuk Lee, Seohyun Byeon, Hankyu Lee, Yongeun Park and Seongjoon Kim
Remote Sens. 2024, 16(13), 2313; https://doi.org/10.3390/rs16132313 - 25 Jun 2024
Cited by 2 | Viewed by 766
Abstract
Chromophoric dissolved organic matter (CDOM) is a mixture of various types of organic matter and a useful parameter for monitoring complex inland surface waters. Remote sensing has been widely utilized to detect CDOM in various studies; however, in many cases, the dataset is relatively imbalanced in a single region. To address these concerns, data were acquired from hyperspectral images, field reflection spectra, and field monitoring data, and the imbalance problem was solved using a synthetic minority oversampling technique (SMOTE). Using the on-site reflectance ratios of the hyperspectral images, the input variables Rrs (452/497), Rrs (497/580), Rrs (497/618), and Rrs (684/618), which had the highest correlation with the CDOM absorption coefficient aCDOM (355), were extracted. Random forest and light gradient boosting machine algorithms were applied to create a CDOM prediction algorithm via machine learning, and, to apply SMOTE, low-concentration and high-concentration CDOM datasets were separated at 5 m⁻¹. The training and testing datasets were split at a 75%:25% ratio within the low- and high-concentration groups, and SMOTE was applied to generate synthetic data based on the training dataset, a sub-dataset of the original dataset. Datasets using SMOTE resulted in an overall improvement in algorithmic accuracy during the training and test steps. The random forest model was selected as the optimal model for CDOM prediction. In the best-case scenario of the random forest model, the SMOTE algorithm showed superior performance, with testing R², mean absolute error (MAE), and root mean square error (RMSE) values of 0.838, 0.566, and 0.777 m⁻¹, respectively, compared to the original algorithm's test values of 0.722, 0.493, and 0.802 m⁻¹. This study is anticipated to resolve imbalance problems using SMOTE when predicting remote sensing-based CDOM and to produce a machine learning model with improved, reliable performance.
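
The oversampling step uses the standard SMOTE implementation from the imbalanced-learn package. The minimal sketch below mirrors the described protocol of splitting first and oversampling only the training subset; the synthetic data and parameter values are placeholders:

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Synthetic stand-in: class 1 = "high CDOM" (aCDOM(355) >= 5 m^-1), deliberately rare.
X, y = make_classification(n_samples=400, n_features=4, weights=[0.9], random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, stratify=y, random_state=1)

# Oversample the training subset only, so synthetic points never leak into the test set.
X_res, y_res = SMOTE(random_state=42, k_neighbors=5).fit_resample(X_tr, y_tr)
print(dict(zip(*np.unique(y_res, return_counts=True))))  # classes are now balanced
```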

28 pages, 26836 KiB  
Article
Effective Training and Inference Strategies for Point Classification in LiDAR Scenes
by Mariona Carós, Ariadna Just, Santi Seguí and Jordi Vitrià
Remote Sens. 2024, 16(12), 2153; https://doi.org/10.3390/rs16122153 - 13 Jun 2024
Viewed by 1161
Abstract
Light Detection and Ranging systems serve as robust tools for creating three-dimensional representations of the Earth's surface. These representations are known as point clouds. Point cloud scene segmentation is essential in a range of applications aimed at understanding the environment, such as infrastructure planning and monitoring. However, automating this process can present notable challenges due to variable point density across scenes, ambiguous object shapes, and substantial class imbalances. Consequently, manual intervention remains prevalent in point classification, allowing researchers to address these complexities. In this work, we study the elements contributing to the automatic semantic segmentation process with deep learning, conducting empirical evaluations on a dataset self-captured by a hybrid airborne laser scanning sensor combined with two nadir cameras (RGB and near-infrared) over 247 km² of terrain characterized by hilly topography, urban areas, and dense forest cover. Our findings emphasize the importance of employing appropriate training and inference strategies to achieve accurate classification of data points across all categories. The proposed methodology not only facilitates the segmentation of point clouds of varying size but also yields a significant performance improvement compared to preceding methodologies, achieving a mIoU of 94.24% on our self-captured dataset.
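
The headline metric, mean intersection over union, is worth restating precisely. A small sketch of how mIoU is typically computed from flattened label arrays (not the authors' evaluation code):

```python
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, n_classes: int) -> float:
    """mIoU via a confusion matrix; `pred` and `gt` are flat integer label arrays."""
    cm = np.bincount(n_classes * gt + pred, minlength=n_classes**2)
    cm = cm.reshape(n_classes, n_classes).astype(float)
    inter = np.diag(cm)                       # per-class true positives
    union = cm.sum(0) + cm.sum(1) - inter     # predicted + actual - overlap
    return float((inter[union > 0] / union[union > 0]).mean())

print(mean_iou(np.array([0, 1, 1, 2]), np.array([0, 1, 2, 2]), 3))  # 0.666...
```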

19 pages, 44215 KiB  
Article
Algal Bed Region Segmentation Based on a ViT Adapter Using Aerial Images for Estimating CO2 Absorption Capacity
by Guang Li, Ren Togo, Keisuke Maeda, Akinori Sako, Isao Yamauchi, Tetsuya Hayakawa, Shigeyuki Nakamae, Takahiro Ogawa and Miki Haseyama
Remote Sens. 2024, 16(10), 1742; https://doi.org/10.3390/rs16101742 - 14 May 2024
Viewed by 950
Abstract
In this study, we propose a novel method for algal bed region segmentation using aerial images. Accurately determining the carbon dioxide absorption capacity of coastal algae requires measurements of algal bed regions; however, conventional manual measurement methods are resource-intensive and time-consuming, which hinders the advancement of the field. To solve these problems, our method adopts an advanced semantic segmentation model, the ViT-Adapter, and adapts it to aerial images for algal bed region segmentation. Our method demonstrates high accuracy in identifying algal bed regions in an aerial image dataset collected from Hokkaido, Japan. The experimental results for five different ecological regions show that the mean intersection over union (mIoU) and mean F-score of our method on the validation set reach 0.787 and 0.870, the IoU and F-score for the background region are 0.957 and 0.978, and the IoU and F-score for the algal bed region are 0.616 and 0.762, respectively. In particular, the ratio of the mean recognized area to the manually annotated ground-truth area is 0.861. Our study contributes to the advancement of blue carbon assessment by introducing a novel semantic segmentation-based method for identifying algal bed regions using aerial images.
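
For binary masks, the reported IoU and F-score pairs are linked by the identity F = 2·IoU/(1 + IoU), which the paper's numbers obey (e.g., 0.616 → 0.762 for the algal bed region). A minimal sketch of both metrics:

```python
import numpy as np

def iou_fscore(pred: np.ndarray, gt: np.ndarray) -> tuple[float, float]:
    """IoU and F-score (Dice) for one binary class; masks must not both be empty."""
    inter = np.logical_and(pred, gt).sum()
    iou = inter / np.logical_or(pred, gt).sum()
    f = 2 * inter / (pred.sum() + gt.sum())   # equivalently 2*IoU / (1 + IoU)
    return float(iou), float(f)

print(round(2 * 0.616 / (1 + 0.616), 3))  # 0.762, matching the reported pair
```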

18 pages, 3629 KiB  
Article
RS-LLaVA: A Large Vision-Language Model for Joint Captioning and Question Answering in Remote Sensing Imagery
by Yakoub Bazi, Laila Bashmal, Mohamad Mahmoud Al Rahhal, Riccardo Ricci and Farid Melgani
Remote Sens. 2024, 16(9), 1477; https://doi.org/10.3390/rs16091477 - 23 Apr 2024
Cited by 5 | Viewed by 4834
Abstract
In this paper, we delve into the innovative application of large language models (LLMs) and their extension, large vision-language models (LVLMs), in the field of remote sensing (RS) image analysis. We particularly emphasize their multi-tasking potential with a focus on image captioning and visual question answering (VQA). Specifically, we introduce an improved version of the Large Language and Vision Assistant Model (LLaVA), specifically adapted for RS imagery through a low-rank adaptation approach. To evaluate the model performance, we create the RS-instructions dataset, a comprehensive benchmark dataset that integrates four diverse single-task datasets related to captioning and VQA. The experimental results confirm the model's effectiveness, marking a step forward toward the development of efficient multi-task models for RS image analysis.
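
Low-rank adaptation of an LLM is commonly done with the Hugging Face PEFT library. The sketch below shows that generic pattern on a small public model; the base checkpoint, rank, and target modules are stand-ins, not RS-LLaVA's actual configuration:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Small public LM standing in for the LLaVA language backbone.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
cfg = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                 target_modules=["q_proj", "v_proj"])  # attention projections
model = get_peft_model(base, cfg)
model.print_trainable_parameters()  # only the low-rank adapter matrices train
```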

19 pages, 5673 KiB  
Article
M-SKSNet: Multi-Scale Spatial Kernel Selection for Image Segmentation of Damaged Road Markings
by Junwei Wang, Xiaohan Liao, Yong Wang, Xiangqiang Zeng, Xiang Ren, Huanyin Yue and Wenqiu Qu
Remote Sens. 2024, 16(9), 1476; https://doi.org/10.3390/rs16091476 - 23 Apr 2024
Cited by 2 | Viewed by 1128
Abstract
It is a challenging task to accurately segment damaged road markings from images, mainly due to their fragmented, dense, small-scale, and blurry nature. This study proposes a multi-scale spatial kernel selection net named M-SKSNet, a novel model that integrates a transformer and a multi-dilated large kernel convolutional neural network (MLKC) block to address these issues. By integrating multiple scales of information, the model can extract high-quality and semantically rich features while generating damage-specific representations. This is achieved by leveraging both the local and global contexts, as well as self-attention mechanisms. The performance of M-SKSNet is evaluated both quantitatively and qualitatively, and the results show that M-SKSNet achieved the highest improvement, in F1 by 3.77% and in IoU by 4.6%, when compared to existing models. Additionally, the effectiveness of M-SKSNet in accurately extracting damaged road markings from images in various complex scenarios (including city roads and highways) is demonstrated. Furthermore, M-SKSNet is found to outperform existing alternatives in terms of both robustness and accuracy.
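
The multi-dilation idea of covering several receptive fields at once can be sketched with parallel dilated 3×3 convolutions fused by a 1×1 convolution. This is a generic simplification in the spirit of an MLKC block, not the published architecture:

```python
import torch
import torch.nn as nn

class MultiDilatedConv(nn.Module):
    """Parallel dilated 3x3 branches, concatenated and fused by a 1x1 convolution."""
    def __init__(self, channels: int, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, padding=d, dilation=d) for d in dilations
        )
        self.fuse = nn.Conv2d(channels * len(dilations), channels, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # padding == dilation keeps every branch the same spatial size as the input
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = MultiDilatedConv(64)(torch.randn(1, 64, 128, 128))  # shape preserved
```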

22 pages, 35245 KiB  
Article
DCEF2-YOLO: Aerial Detection YOLO with Deformable Convolution–Efficient Feature Fusion for Small Target Detection
by Yeonha Shin, Heesub Shin, Jaewoo Ok, Minyoung Back, Jaehyuk Youn and Sungho Kim
Remote Sens. 2024, 16(6), 1071; https://doi.org/10.3390/rs16061071 - 18 Mar 2024
Cited by 2 | Viewed by 2235
Abstract
Deep learning technology for real-time small object detection in aerial images can be used in various industrial environments such as real-time traffic surveillance and military reconnaissance. However, detecting small objects with few pixels and low resolution remains a challenging problem that requires performance improvement. To improve the performance of small object detection, we propose DCEF2-YOLO. Our proposed method enables efficient real-time small object detection by using a deformable convolution (DFConv) module and an efficient feature fusion structure to maximize the use of the internal feature information of objects. DFConv preserves small object information by preventing object information from mixing with the background. The optimized feature fusion structure produces high-quality feature maps for efficient real-time small object detection while maximizing the use of limited information. Additionally, modifying the input data processing stage and reducing the detection layers to suit small object detection also contributes to performance improvement. When compared to the latest YOLO-based models (such as DCN-YOLO and YOLOv7), DCEF2-YOLO outperforms them, with mAP gains of +6.1% on the DOTA-v1.0 test set, +0.3% on the NWPU VHR-10 test set, and +1.5% on the VEDAI512 test set. Furthermore, it has a fast processing speed of 120.48 FPS on an RTX 3090 for 512 × 512 images, making it suitable for real-time small object detection tasks.
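
Deformable convolution is available out of the box in torchvision, so the DFConv ingredient can be demonstrated in a few lines; the channel counts and the plain convolutional offset predictor are illustrative choices:

```python
import torch
from torch import nn
from torchvision.ops import DeformConv2d

deform = DeformConv2d(in_channels=64, out_channels=64, kernel_size=3, padding=1)
offset_net = nn.Conv2d(64, 2 * 3 * 3, kernel_size=3, padding=1)  # (dx, dy) per kernel tap

x = torch.randn(1, 64, 80, 80)
y = deform(x, offset_net(x))  # sampling locations bend toward informative pixels
print(y.shape)                # torch.Size([1, 64, 80, 80])
```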

15 pages, 22774 KiB  
Article
Object Detection in Remote Sensing Images Based on Adaptive Multi-Scale Feature Fusion Method
by Chun Liu, Sixuan Zhang, Mengjie Hu and Qing Song
Remote Sens. 2024, 16(5), 907; https://doi.org/10.3390/rs16050907 - 4 Mar 2024
Cited by 11 | Viewed by 2577
Abstract
Multi-scale object detection is critical for analyzing remote sensing images. Traditional feature pyramid networks, which are aimed at accommodating objects of varying sizes through multi-level feature extraction, face significant challenges due to the diverse scale variations present in remote sensing images. This situation often forces single-level features to span a broad spectrum of object sizes, complicating accurate localization and classification. To tackle these challenges, this paper proposes an innovative algorithm that incorporates an adaptive multi-scale feature enhancement and fusion module (ASEM), which enhances remote sensing image object detection through sophisticated multi-scale feature fusion. Our method begins by employing a feature pyramid to gather coarse multi-scale features. Subsequently, it integrates a fine-grained feature extraction module at each level, utilizing atrous convolutions with varied dilation rates to refine multi-scale features, which markedly improves the information capture from widely varied object scales. Furthermore, an adaptive enhancement module is applied to the features of each level by employing an attention mechanism for feature fusion. This strategy concentrates on the features of critical scales, which significantly enhances the capture of essential feature information. Compared with the baseline method, namely Rotated Faster R-CNN, our method achieved an mAP of 74.21% (+0.81%) on the DOTA-v1.0 dataset and an mAP of 84.90% (+9.2%) on the HRSC2016 dataset. These results validated the effectiveness and practicality of our method and demonstrated its significant application value in multi-scale remote sensing object detection tasks.
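
ASEM itself is more elaborate, but its final step, attention-weighted fusion across pyramid levels, can be sketched generically in PyTorch; the gating layout below is an assumption, not the paper's module:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveFusion(nn.Module):
    """Resize pyramid levels to a shared size, then fuse with learned per-level weights."""
    def __init__(self, channels: int, n_levels: int):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(channels * n_levels, n_levels, 1))

    def forward(self, feats):  # feats: list of (B, C, Hi, Wi), finest level first
        size = feats[0].shape[-2:]
        feats = [F.interpolate(f, size=size, mode="bilinear", align_corners=False)
                 for f in feats]
        w = torch.softmax(self.gate(torch.cat(feats, dim=1)), dim=1)  # (B, L, 1, 1)
        return sum(w[:, i:i + 1] * f for i, f in enumerate(feats))

out = AttentiveFusion(64, 3)([torch.randn(1, 64, s, s) for s in (64, 32, 16)])
```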
