Search Results (17)

Search Parameters:
Keywords = UperNet

21 pages, 3901 KB  
Article
Research on CTSA-DeepLabV3+ Urban Green Space Classification Model Based on GF-2 Images
by Ruotong Li, Jian Zhao and Yanguo Fan
Sensors 2025, 25(13), 3862; https://doi.org/10.3390/s25133862 - 21 Jun 2025
Viewed by 711
Abstract
As an important part of urban ecosystems, urban green spaces play a key role in ecological environmental protection and urban spatial structure optimization. However, due to the complex morphology and high fragmentation of urban green spaces, effectively distinguishing green space types in high spatial resolution images remains challenging. To address this problem, a Contextual Transformer and Squeeze Aggregated Excitation Enhanced DeepLabV3+ (CTSA-DeepLabV3+) model was proposed for urban green space classification based on Gaofen-2 (GF-2) satellite images. A Contextual Transformer (CoT) module was added to the decoder of the model to enhance its global context modeling capability, and the SENetv2 attention mechanism was employed to improve its key feature capture ability. The experimental results showed that the overall classification accuracy of the CTSA-DeepLabV3+ model is 96.21%, and the mean intersection over union, precision, recall, and F1-score reach 89.22%, 92.56%, 90.12%, and 91.23%, respectively, outperforming DeepLabV3+, Fully Convolutional Networks (FCNs), U-Net, the Pyramid Scene Parsing Network (PSPNet), UperNet-Swin Transformer, and other mainstream models. The model exhibits higher accuracy and provides an efficient reference for the intelligent interpretation of urban green space from high-resolution remote sensing images.
(This article belongs to the Section Remote Sensors)
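
The SENetv2 attention credited above builds on squeeze-and-excitation channel recalibration. As a rough illustration of that underlying mechanism, here is a minimal PyTorch sketch of a classic SE block; the module name, reduction ratio, and decoder placement are assumptions, not the authors' implementation.

```python
# Hedged sketch: classic squeeze-and-excitation channel recalibration, the
# mechanism that SENetv2-style attention extends. Names and sizes are
# illustrative, not the CTSA-DeepLabV3+ code.
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: global spatial context
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # excitation: per-channel gates in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w                         # re-weight feature channels

feats = torch.randn(2, 256, 64, 64)          # e.g., decoder features
print(SqueezeExcitation(256)(feats).shape)   # torch.Size([2, 256, 64, 64])
```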

29 pages, 9654 KB  
Article
Construction of Multi-Scale Fusion Attention Unified Perceptual Parsing Networks for Semantic Segmentation of Mangrove Remote Sensing Images
by Xin Wang, Yu Zhang, Wenquan Xu, Hanxi Wang, Jingye Cai, Qin Qin, Qin Wang and Jing Zeng
Appl. Sci. 2025, 15(2), 976; https://doi.org/10.3390/app15020976 - 20 Jan 2025
Viewed by 1103
Abstract
Mangrove forests play a crucial role in coastal ecosystem protection and carbon sequestration processes. However, monitoring remains challenging due to the forests’ complex spatial distribution characteristics. This study addresses three key challenges in mangrove monitoring: limited high-quality datasets, the complex spatial characteristics of mangrove distribution, and technical difficulties in high-resolution image processing. To address these challenges, we present two main contributions. (1) Using multi-source high-resolution satellite imagery from China’s new generation of Earth observation satellites, we constructed the Mangrove Semantic Segmentation Dataset of Beihai, Guangxi (MSSDBG); (2) We propose a novel Multi-scale Fusion Attention Unified Perceptual Network (MFA-UperNet) for precise mangrove segmentation. This network integrates Cascade Pyramid Fusion Modules, a Multi-scale Selective Kernel Attention Module, and an Auxiliary Edge Neck to process the unique characteristics of mangrove remote sensing images, particularly addressing issues of scale variation, complex backgrounds, and boundary accuracy. The experimental results demonstrate that our approach achieved a mean Intersection over Union (mIoU) of 94.54% and a mean Pixel Accuracy (mPA) of 97.14% on the MSSDBG dataset, significantly outperforming existing methods. This study provides valuable tools and methods for monitoring and protecting mangrove ecosystems, contributing to the preservation of these critical coastal environments.
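
The Auxiliary Edge Neck implies extra supervision on mangrove boundaries. The sketch below shows one plausible way to derive a binary edge target from a label mask with a morphological gradient; it is an assumption for illustration, not the MFA-UperNet implementation.

```python
# Hedged sketch: deriving a boundary target from a label mask, one plausible
# training signal for an "auxiliary edge neck". An assumption for illustration,
# not the paper's code.
import torch
import torch.nn.functional as F

def mask_to_edges(mask: torch.Tensor, radius: int = 1) -> torch.Tensor:
    """mask: (B, H, W) integer labels -> (B, H, W) binary boundary map."""
    m = mask.float().unsqueeze(1)
    k = 2 * radius + 1
    dilated = F.max_pool2d(m, k, stride=1, padding=radius)   # morphological dilation
    eroded = -F.max_pool2d(-m, k, stride=1, padding=radius)  # morphological erosion
    return (dilated != eroded).squeeze(1).float()            # gradient = boundary band

labels = torch.zeros(1, 64, 64, dtype=torch.long)
labels[0, 20:40, 20:40] = 1                 # a square "mangrove" patch (illustrative)
print(mask_to_edges(labels).sum().item())   # count of boundary pixels around the square
```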

21 pages, 12271 KB  
Article
Detection of Marine Oil Spill from PlanetScope Images Using CNN and Transformer Models
by Jonggu Kang, Chansu Yang, Jonghyuk Yi and Yangwon Lee
J. Mar. Sci. Eng. 2024, 12(11), 2095; https://doi.org/10.3390/jmse12112095 - 19 Nov 2024
Cited by 5 | Viewed by 2135
Abstract
The contamination of marine ecosystems by oil spills poses a significant threat to the marine environment, necessitating prompt and effective measures to mitigate the associated damage. Satellites offer a spatial and temporal advantage over aircraft and unmanned aerial vehicles (UAVs) in oil spill detection due to their wide-area monitoring capabilities. While oil spill detection has traditionally relied on synthetic aperture radar (SAR) images, the combined use of optical satellite sensors alongside SAR can significantly enhance monitoring capabilities, providing improved spatial and temporal coverage. The advent of deep learning methodologies, particularly convolutional neural networks (CNNs) and Transformer models, has generated considerable interest in their potential for oil spill detection. In this study, we conducted a comprehensive and objective comparison to evaluate the suitability of CNN and Transformer models for marine oil spill detection. High-resolution optical satellite images were used to optimize DeepLabV3+, a widely utilized CNN model; Swin-UPerNet, a representative Transformer model; and Mask2Former, which employs a Transformer-based architecture for both encoding and decoding. Cross-validation yielded mean Intersection over Union (mIoU) scores of 0.740, 0.840, and 0.804 for DeepLabV3+, Swin-UPerNet, and Mask2Former, respectively, indicating their potential for detecting oil spills in the ocean. Additionally, we performed a histogram analysis on the predicted oil spill pixels, which allowed us to classify the types of oil. These findings highlight the considerable promise of Swin Transformer models for oil spill detection in the context of future marine disaster monitoring.
(This article belongs to the Special Issue Remote Sensing Applications in Marine Environmental Monitoring)
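
The ranking above rests on mean Intersection over Union. For reference, a minimal NumPy sketch of per-class IoU averaged into mIoU; the binary water/oil labels in the demo are illustrative, not the paper's evaluation code.

```python
# Hedged sketch: generic per-class IoU averaged into mIoU, the metric used to
# compare the three models. Not the paper's evaluation pipeline.
import numpy as np

def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, gt == c).sum()
        union = np.logical_or(pred == c, gt == c).sum()
        if union > 0:                   # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.random.randint(0, 2, (512, 512))  # 0 = water, 1 = oil (illustrative)
gt = np.random.randint(0, 2, (512, 512))
print(f"mIoU = {mean_iou(pred, gt, 2):.3f}")
```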

14 pages, 35441 KB  
Article
Smart Ship Draft Reading by Dual-Flow Deep Learning Architecture and Multispectral Information
by Bo Zhang, Jiangyun Li, Haicheng Tang and Xi Liu
Sensors 2024, 24(17), 5580; https://doi.org/10.3390/s24175580 - 28 Aug 2024
Cited by 3 | Viewed by 1919
Abstract
In maritime transportation, a ship’s draft survey serves as a primary method for weighing bulk cargo. The accuracy of the ship’s draft reading determines the fairness of bulk cargo transactions. Human visual draft reading methods face issues such as safety concerns, high labor costs, and subjective interpretation. Therefore, image processing methods have been used to automate draft reading. However, due to the limited spectral characteristics of RGB images, existing image processing methods are susceptible to water surface interference, such as reflections. To solve this issue, we obtained and annotated 524 multispectral images of ship drafts as the research dataset, marking the first application that integrates NIR information with RGB images for automatic draft reading. Additionally, a dual-branch backbone named BIF is proposed to extract and combine spectral information from RGB and NIR images. The backbone network can be combined with existing segmentation and detection heads to perform waterline segmentation and draft detection. By replacing the original ResNet-50 backbone of YOLOv8, we achieved a mAP of 99.2% in the draft detection task. Similarly, combining UPerNet with our dual-branch backbone improved the mIoU of the waterline segmentation task from 98.9% to 99.3%. The draft reading error is less than ±0.01 m, confirming the efficacy of our method for automatic draft reading tasks.
(This article belongs to the Special Issue Sensor-Fusion-Based Deep Interpretable Networks)
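
The BIF backbone pairs an RGB branch with an NIR branch and merges their features. Below is a minimal sketch of that general dual-branch, per-stage fusion pattern; the layer sizes and the 1×1-convolution fusion are assumptions, not the paper's BIF.

```python
# Hedged sketch: a dual-branch backbone encoding RGB and NIR separately and
# fusing per stage, the general pattern described above. Sizes and the 1x1
# fusion are illustrative assumptions, not the BIF implementation.
import torch
import torch.nn as nn

def stage(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, 3, stride=2, padding=1),
                         nn.BatchNorm2d(cout), nn.ReLU(inplace=True))

class DualBranchBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.rgb = nn.ModuleList([stage(3, 64), stage(64, 128)])
        self.nir = nn.ModuleList([stage(1, 64), stage(64, 128)])
        self.fuse = nn.ModuleList([nn.Conv2d(128, 64, 1), nn.Conv2d(256, 128, 1)])

    def forward(self, rgb, nir):
        feats = []
        for srgb, snir, f in zip(self.rgb, self.nir, self.fuse):
            rgb, nir = srgb(rgb), snir(nir)
            feats.append(f(torch.cat([rgb, nir], dim=1)))  # per-stage spectral fusion
        return feats   # multi-scale maps for a segmentation or detection head

out = DualBranchBackbone()(torch.randn(1, 3, 256, 256), torch.randn(1, 1, 256, 256))
print([f.shape for f in out])
```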

27 pages, 18900 KB  
Article
Refined Intelligent Landslide Identification Based on Multi-Source Information Fusion
by Xiao Wang, Di Wang, Chenghao Liu, Mengmeng Zhang, Luting Xu, Tiegang Sun, Weile Li, Sizhi Cheng and Jianhui Dong
Remote Sens. 2024, 16(17), 3119; https://doi.org/10.3390/rs16173119 - 23 Aug 2024
Cited by 3 | Viewed by 1254
Abstract
Landslides are most severe in the mountainous regions of southwestern China. Landslide identification provides the foundation for disaster prevention operations, but methods that use multi-source data and deep learning to improve the efficiency and accuracy of landslide identification in complex environments remain both a research focus and a difficult problem. In this study, we address these problems and construct a landslide identification model based on the shifted window (Swin) transformer. We chose Ya’an, which has complex terrain and experiences frequent landslides, as the study area. Our model, which fuses features from different remote sensing data sources and introduces a loss function that better learns the boundary information of the target, is compared with the pyramid scene parsing network (PSPNet), the unified perceptual parsing network (UPerNet), and DeepLabV3+ in order to explore the learning potential of the model and to test its resilience on an open-source landslide database. The results show that, on the Ya’an landslide database, the Swin Transformer-based model improves overall accuracy over the benchmark networks (UPerNet, PSPNet, and DeepLabV3+) by 1.7%, 2.1%, and 1.5%, respectively; F1-score by 14.5%, 16.2%, and 12.4%; and intersection over union (IoU) by 16.9%, 18.5%, and 14.6%. The performance of the optimized model is excellent.
(This article belongs to the Special Issue Application of Remote Sensing Approaches in Geohazard Risk)
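
The abstract mentions a loss that better learns target boundary information without specifying it. One common recipe, sketched below purely as an assumption, weights per-pixel cross-entropy by proximity to the nearest label boundary.

```python
# Hedged sketch: boundary-weighted cross-entropy, a generic way to emphasize
# "boundary information of the target". The distance-transform weighting is an
# assumption, not necessarily the loss used in the paper.
import numpy as np
import torch
import torch.nn.functional as F
from scipy.ndimage import distance_transform_edt

def boundary_weights(mask: np.ndarray, sigma: float = 5.0) -> np.ndarray:
    """Per-pixel weights that peak at label boundaries and decay with distance."""
    edges = np.zeros_like(mask, dtype=bool)
    edges[:-1, :] |= mask[:-1, :] != mask[1:, :]
    edges[:, :-1] |= mask[:, :-1] != mask[:, 1:]
    dist = distance_transform_edt(~edges)          # distance to nearest boundary
    return 1.0 + 4.0 * np.exp(-(dist ** 2) / (2 * sigma ** 2))

mask = np.zeros((64, 64), dtype=np.int64)
mask[20:40, 20:40] = 1                              # a toy "landslide" region
w = torch.from_numpy(boundary_weights(mask)).float()
logits = torch.randn(1, 2, 64, 64)
target = torch.from_numpy(mask).unsqueeze(0)
pix_ce = F.cross_entropy(logits, target, reduction="none")  # per-pixel CE, (1, H, W)
print((w * pix_ce).mean().item())                   # boundary pixels weigh more
```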

20 pages, 13039 KB  
Article
A Strategy for Neighboring Pixel Collaboration in Landslide Susceptibility Prediction
by Xiao Wang, Di Wang, Mengmeng Zhang, Xiaochuan Song, Luting Xu, Tiegang Sun, Weile Li, Sizhi Cheng and Jianhui Dong
Remote Sens. 2024, 16(12), 2206; https://doi.org/10.3390/rs16122206 - 18 Jun 2024
Viewed by 1201
Abstract
Landslide susceptibility prediction usually involves the comprehensive analysis of terrain and other factors that may be distributed with spatial patterns. Without considering the spatial correlation and mutual influence between pixels, conventional prediction methods often focus only on information from individual pixels. To address this issue, the present study proposes a new strategy for neighboring pixel collaboration based on the Unified Perceptual Parsing Network (UPerNet), the Vision Transformer (ViT), and Vision Graph Neural Networks (ViG). This strategy efficiently utilizes the strengths of deep learning in feature extraction, sequence modeling, and graph data processing. By considering the information from neighboring pixels, this strategy can more accurately identify susceptible areas and reduce misidentification and omissions. The experimental results suggest that the proposed strategy can predict landslide susceptibility zoning more accurately. These predictions can identify flat areas such as rivers and distinguish between areas with high and very high landslide susceptibility. Such refined zoning outcomes are significant for landslide prevention and mitigation and can help decision-makers formulate targeted response measures.
(This article belongs to the Special Issue Application of Remote Sensing Approaches in Geohazard Risk)

20 pages, 4796 KB  
Article
ABNet: An Aggregated Backbone Network Architecture for Fine Landcover Classification
by Bo Si, Zhennan Wang, Zhoulu Yu and Ke Wang
Remote Sens. 2024, 16(10), 1725; https://doi.org/10.3390/rs16101725 - 13 May 2024
Cited by 2 | Viewed by 1958
Abstract
High-precision landcover classification is a fundamental prerequisite for resource and environmental monitoring and land-use surveys. Very high spatial resolution remote sensing images carry intricate spatial information and texture features that increase the divergence between features within the same category, amplifying the complexity of landcover classification. Consequently, semantic segmentation models built on deep backbone networks have become the workhorses of landcover classification owing to their strong feature representation. However, the performance of a single backbone network fluctuates across scenarios and datasets, so constructing or selecting an appropriate backbone for a given classification task remains a persistent challenge. To raise classification performance and bolster the generalization of semantic segmentation models, we propose a novel semantic segmentation architecture, named the aggregated backbone network (ABNet), for fine landcover classification. ABNet aggregates three prevailing backbone networks with significant structural differences (ResNet, HRNet, and VoVNet) using a same-stage fusion approach, and combines them with the DeepLabv3+ head after integrating the convolutional block attention module (CBAM). This aggregation harmonizes the distinct-scale features extracted by the three backbones, enriching the model’s spatial contextual comprehension and expanding its receptive field, thereby enabling more effective semantic feature extraction across stages. The CBAM primarily adjusts channels and curtails redundant information within the aggregated feature layers. Ablation experiments demonstrate an improvement of no less than 3% in mean intersection over union (mIoU) for ABNet on both the LoveDA and GID15 datasets compared with single-backbone models. Furthermore, against seven classical or state-of-the-art models (UNet, FPN, PSPNet, DANet, CBNet, CCNet, and UPerNet), ABNet shows excellent segmentation performance on both datasets, underscoring the efficiency and robust generalization of the proposed approach.
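
ABNet applies CBAM to the aggregated features; below is a compact sketch of the standard CBAM (channel attention followed by spatial attention) on a concatenated three-backbone stage. The sizes and integration point are illustrative assumptions, not the authors' exact wiring.

```python
# Hedged sketch: a standard convolutional block attention module (CBAM)
# applied to same-stage concatenated backbone features. Illustrative sizes;
# not ABNet's exact integration.
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(channels, channels // reduction),
                                 nn.ReLU(inplace=True),
                                 nn.Linear(channels // reduction, channels))
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        b, c, _, _ = x.shape
        # channel attention from global average- and max-pooled descriptors
        gate = torch.sigmoid(self.mlp(x.mean(dim=(2, 3))) + self.mlp(x.amax(dim=(2, 3))))
        x = x * gate.view(b, c, 1, 1)
        # spatial attention from channel-wise average and max maps
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

fused = torch.cat([torch.randn(1, 64, 32, 32)] * 3, dim=1)  # three backbones, one stage
print(CBAM(192)(fused).shape)
```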

19 pages, 5842 KB  
Article
Changes in the Water Area of an Inland River Terminal Lake (Taitma Lake) Driven by Climate Change and Human Activities, 2017–2022
by Feng Zi, Yong Wang, Shanlong Lu, Harrison Odion Ikhumhen, Chun Fang, Xinru Li, Nan Wang and Xinya Kuang
Remote Sens. 2024, 16(10), 1703; https://doi.org/10.3390/rs16101703 - 10 May 2024
Cited by 1 | Viewed by 3340
Abstract
Using a dataset capturing the seasonal and annual water body distribution of the lower Qarqan River in the Taitma Lake area from 2017 to 2022, combined with meteorological and hydraulic engineering data, we determined the spatial and temporal change patterns of the Taitma Lake water area. Analyses were conducted using PlanetScope (PS) satellite images and a deep learning model. The results revealed the following: ① Deep learning-based water body extraction is significantly more accurate than the conventional water body index approach; with an accuracy of up to 96.0%, UPerNet provided the most effective extraction results among the three convolutional neural networks (U-Net, DeeplabV3+, and UPerNet) used for semantic segmentation. ② Between 2017 and 2022, Taitma Lake’s water area decreased rapidly, with the water distribution shifting more along the east–west direction than the north–south. The shifts between 2017 and 2020 and between 2020 and 2022 were both clearly discernible, with the latter stage (2020–2022) more pronounced than the former (2017–2020). ③ Human activity has been the primary influence on Taitma Lake’s changing water area over the last six years. These findings provide a valuable scientific basis for water resource allocation that balances the development of water resources in the middle and upper reaches of the Tarim and Qarqan Rivers with the ecological protection of downstream Taitma Lake.

12 pages, 2733 KB  
Article
A Study of Kale Recognition Based on Semantic Segmentation
by Huarui Wu, Wang Guo, Chang Liu and Xiang Sun
Agronomy 2024, 14(5), 894; https://doi.org/10.3390/agronomy14050894 - 25 Apr 2024
Cited by 2 | Viewed by 1250
Abstract
The kale crop is an important bulk vegetable, and automatic segmentation to recognize kale is fundamental for effective field management. However, complex backgrounds and texture-rich edge details make fine segmentation of kale difficult. To this end, we constructed a kale dataset in a real field scenario and proposed a UperNet semantic segmentation model with a Swin transformer as the backbone network, improving the model according to the growth characteristics of kale. Firstly, a channel attention module (CAM) is introduced into the Swin transformer module to improve the representation ability of the network and enhance the extraction of kale outer leaf and leaf bulb information; secondly, the extraction accuracy of kale target edges is improved in the decoding part by designing an attention refinement module (ARM); lastly, the uneven class distribution is addressed by modifying the optimizer and the loss function. The experimental results show that the improved model has excellent feature extraction performance: the mean intersection over union (mIoU) of kale segmentation reaches 91.2% and the mean pixel accuracy (mPA) reaches 95.2%, which are 2.1 and 4.7 percentage points higher than the original UperNet model, respectively, effectively improving the segmentation and recognition of kale.
(This article belongs to the Special Issue Effects of Integrated Environment Management on Crop Photosynthesis)
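
Part of the reported gain comes from modifying the loss for uneven class distribution. One common option, sketched below as an assumption rather than the authors' exact choice, is median-frequency class weighting for cross-entropy.

```python
# Hedged sketch: median-frequency class weights for cross-entropy, a common
# counter to uneven class distribution (background vs. outer leaves vs. bulbs).
# An illustrative recipe, not necessarily the paper's exact modification.
import numpy as np
import torch
import torch.nn as nn

def median_frequency_weights(masks: np.ndarray, num_classes: int) -> torch.Tensor:
    freq = np.array([(masks == c).mean() for c in range(num_classes)])
    weights = np.median(freq) / np.maximum(freq, 1e-8)  # rare classes weigh > 1
    return torch.tensor(weights, dtype=torch.float32)

masks = np.random.choice(3, size=(10, 128, 128), p=[0.8, 0.15, 0.05])  # toy labels
w = median_frequency_weights(masks, num_classes=3)
criterion = nn.CrossEntropyLoss(weight=w)   # plug into training as usual
print(w)
```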

24 pages, 14284 KB  
Article
Mask2Former with Improved Query for Semantic Segmentation in Remote-Sensing Images
by Shichen Guo, Qi Yang, Shiming Xiang, Shuwen Wang and Xuezhi Wang
Mathematics 2024, 12(5), 765; https://doi.org/10.3390/math12050765 - 4 Mar 2024
Cited by 9 | Viewed by 6965
Abstract
Semantic segmentation of remote sensing (RS) images is vital in various practical applications, including urban construction planning, natural disaster monitoring, and land resources investigation. However, RS images are captured by airplanes or satellites at high altitudes and long distances, so ground objects of the same category are scattered across various corners of the image. Moreover, objects of very different sizes appear simultaneously in RS images: some objects occupy a large area in urban scenes, while others cover only small regions. These two universal situations pose significant challenges to high-quality segmentation of RS images. Based on these observations, this paper proposes a Mask2Former with an improved query (IQ2Former) for this task. The fundamental motivation behind IQ2Former is to enhance the capability of Mask2Former’s queries by exploiting the characteristics of RS images. First, we propose the Query Scenario Module (QSM), which learns and groups the queries from feature maps, allowing the selection of distinct scenarios such as urban and rural areas, building clusters, and parking lots. Second, we design the Query Position Module (QPM), which assigns image position information to each query without increasing the number of parameters, thereby enhancing the model’s sensitivity to small targets in complex scenarios. Finally, we propose the Query Attention Module (QAM), which leverages query attention to extract valuable features from the preceding queries. Positioned between the duplicated transformer decoder layers, QAM ensures comprehensive utilization of the supervisory information and exploitation of fine-grained details. Architecturally, the QSM, QPM, and QAM are assembled into an end-to-end model to achieve high-quality semantic segmentation. In comparison with classical and state-of-the-art models (FCN, PSPNet, DeepLabV3+, OCRNet, UPerNet, MaskFormer, Mask2Former), IQ2Former demonstrates exceptional performance across three challenging public remote-sensing datasets, achieving 83.59 mIoU on the Vaihingen dataset, 87.89 mIoU on the Potsdam dataset, and 56.31 mIoU on the LoveDA dataset. Additionally, overall accuracy, ablation experiments, and visualized segmentation results all confirm the validity of IQ2Former.
(This article belongs to the Special Issue Advanced Research in Data-Centric AI)
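
QPM injects position information into the queries without adding parameters. The canonical parameter-free device is a sinusoidal encoding; the sketch below illustrates that idea only and is not the QPM itself.

```python
# Hedged sketch: parameter-free sinusoidal position information added to a set
# of queries, the kind of mechanism a zero-parameter "query position module"
# could use. An illustration of the idea, not IQ2Former's QPM.
import math
import torch

def sinusoidal_encoding(num_pos: int, dim: int) -> torch.Tensor:
    pos = torch.arange(num_pos, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / dim))
    enc = torch.zeros(num_pos, dim)
    enc[:, 0::2] = torch.sin(pos * div)   # even dims: sine
    enc[:, 1::2] = torch.cos(pos * div)   # odd dims: cosine
    return enc

queries = torch.randn(100, 256)                    # 100 queries of width 256
queries = queries + sinusoidal_encoding(100, 256)  # inject position, zero new params
print(queries.shape)
```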

28 pages, 1784 KB  
Article
Real-Time Detection of Unrecognized Objects in Logistics Warehouses Using Semantic Segmentation
by Serban Vasile Carata, Marian Ghenescu and Roxana Mihaescu
Mathematics 2023, 11(11), 2445; https://doi.org/10.3390/math11112445 - 25 May 2023
Cited by 3 | Viewed by 3133
Abstract
Pallet detection and tracking using computer vision is challenging due to the complexity of the object and its contents, lighting conditions, background clutter, and occlusions in industrial areas. Using semantic segmentation, this paper aims to detect pallets in a logistics warehouse. The proposed method examines changes in the semantic segmentation map from one frame to the next, taking into account the position and stationary behavior of newly introduced objects in the scene. The results indicate that the proposed method can detect pallets despite the complexity of the object and its contents. This demonstrates the utility of semantic segmentation for detecting unrecognized objects in real-world scenarios where a precise definition of the class cannot be given.
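
The sketch below illustrates the frame-differencing idea described above: pixels segmented consistently across recent frames but absent at a baseline frame are flagged as a newly introduced, stationary object. The persistence window and toy masks are illustrative assumptions.

```python
# Hedged sketch: flagging newly introduced, stationary segments by comparing
# per-frame semantic masks. Window size and demo masks are illustrative, not
# the paper's parameters.
import numpy as np

def new_stationary_pixels(history, current, window=5):
    """history: list of past binary 'pallet' masks; current: current-frame mask."""
    if len(history) < window:
        return np.zeros_like(current, dtype=bool)
    recent = history[-window:]
    stable = current & np.logical_and.reduce(recent)  # present in every recent frame
    return stable & ~history[0]                       # but absent at the baseline frame

baseline = np.zeros((64, 64), dtype=bool)
later = baseline.copy()
later[10:30, 10:30] = True                  # a pallet appears and stays put
history = [baseline] + [later] * 5
print(new_stationary_pixels(history, later).sum())   # 400 newly stationary pixels
```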

14 pages, 5425 KB  
Article
Swin-UperNet: A Semantic Segmentation Model for Mangroves and Spartina alterniflora Loisel Based on UperNet
by Zhenhua Wang, Jing Li, Zhilian Tan, Xiangfeng Liu and Mingjie Li
Electronics 2023, 12(5), 1111; https://doi.org/10.3390/electronics12051111 - 24 Feb 2023
Cited by 15 | Viewed by 5722
Abstract
As an ecosystem in transition from land to sea, mangroves play a vital role in wind and wave protection and biodiversity maintenance. However, the invasion of Spartina alterniflora Loisel seriously damages the mangrove wetland ecosystem. To protect mangroves scientifically and dynamically, a semantic segmentation model for mangroves and Spartina alterniflora Loisel, Swin-UperNet, was proposed based on UperNet. In the proposed Swin-UperNet model, a data concatenation module was proposed to make full use of the multispectral information of remote sensing images, the backbone network was replaced with a Swin transformer to improve the feature extraction capability, and a boundary optimization module was designed to refine the rough segmentation results. Additionally, a linear combination of cross-entropy loss and Lovász-Softmax loss was taken as the loss function of Swin-UperNet, which addresses the problem of unbalanced sample distribution. Taking GF-1 and GF-6 images as the experimental data, the performance of Swin-UperNet was compared against that of other segmentation models in terms of pixel accuracy (PA), mean intersection over union (mIoU), and frames per second (FPS), including PSPNet, PSANet, DeepLabv3, DANet, FCN, OCRNet, and DeepLabv3+. The results showed that Swin-UperNet achieved the best PA of 98.87% and mIoU of 90.0%, and its efficiency was higher than that of most models. In conclusion, Swin-UperNet is an efficient and accurate model for segmenting mangroves and Spartina alterniflora Loisel simultaneously, which will provide a scientific basis for Spartina alterniflora Loisel monitoring and for mangrove resource conservation and management.
(This article belongs to the Special Issue Applications of Deep Neural Network for Smart City)
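
The loss above is a linear combination of cross-entropy and Lovász-Softmax. The sketch follows the standard published Lovász-Softmax formulation; the equal 0.5/0.5 mixing weight is an assumption, as the paper's coefficients are not given here.

```python
# Hedged sketch: cross-entropy plus Lovász-Softmax, as described above. The
# Lovász part follows the standard formulation; alpha = 0.5 is an assumption.
import torch
import torch.nn.functional as F

def lovasz_grad(gt_sorted: torch.Tensor) -> torch.Tensor:
    gts = gt_sorted.sum()
    inter = gts - gt_sorted.cumsum(0)
    union = gts + (1 - gt_sorted).cumsum(0)
    jaccard = 1.0 - inter / union
    jaccard[1:] = jaccard[1:] - jaccard[:-1]  # gradient of the Lovász extension
    return jaccard

def lovasz_softmax(probs: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
    """probs: (N, C) softmax outputs over flattened pixels; labels: (N,)."""
    losses = []
    for c in range(probs.size(1)):
        fg = (labels == c).float()
        if fg.sum() == 0:
            continue                          # class absent from this batch
        errors = (fg - probs[:, c]).abs()
        errors_sorted, perm = torch.sort(errors, descending=True)
        losses.append(torch.dot(errors_sorted, lovasz_grad(fg[perm])))
    return torch.stack(losses).mean()

def combined_loss(logits, labels, alpha=0.5):
    c = logits.shape[1]
    flat_logits = logits.permute(0, 2, 3, 1).reshape(-1, c)
    flat_labels = labels.reshape(-1)
    ce = F.cross_entropy(flat_logits, flat_labels)
    lv = lovasz_softmax(flat_logits.softmax(dim=1), flat_labels)
    return alpha * ce + (1 - alpha) * lv

print(combined_loss(torch.randn(2, 3, 32, 32), torch.randint(0, 3, (2, 32, 32))))
```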

25 pages, 9068 KB  
Article
Large-Scale Date Palm Tree Segmentation from Multiscale UAV-Based and Aerial Images Using Deep Vision Transformers
by Mohamed Barakat A. Gibril, Helmi Zulhaidi Mohd Shafri, Rami Al-Ruzouq, Abdallah Shanableh, Faten Nahas and Saeed Al Mansoori
Drones 2023, 7(2), 93; https://doi.org/10.3390/drones7020093 - 29 Jan 2023
Cited by 26 | Viewed by 5954
Abstract
The reliable and efficient large-scale mapping of date palm trees from remotely sensed data is crucial for developing palm tree inventories, continuous monitoring, vulnerability assessments, environmental control, and long-term management. Given the increasing availability of UAV images with limited spectral information, the high intra-class variance of date palm trees, the variations in the spatial resolutions of the data, and the differences in image contexts and backgrounds, accurate mapping of date palm trees from very-high spatial resolution (VHSR) images can be challenging. This study aimed to investigate the reliability and the efficiency of various deep vision transformers in extracting date palm trees from multiscale and multisource VHSR images. Numerous vision transformers, including the Segformer, the Segmenter, the UperNet-Swin transformer, and the dense prediction transformer, with various levels of model complexity, were evaluated. The models were developed and evaluated using a set of comprehensive UAV-based and aerial images. The generalizability and the transferability of the deep vision transformers were evaluated and compared with various convolutional neural network-based (CNN) semantic segmentation models (including DeepLabV3+, PSPNet, FCN-ResNet-50, and DANet). The results of the examined deep vision transformers were generally comparable to those of several CNN-based models. The investigated deep vision transformers achieved satisfactory results in mapping date palm trees from the UAV images, with an mIoU ranging from 85% to 86.3% and an mF-score ranging from 91.62% to 92.44%. Among the evaluated models, the Segformer generated the highest segmentation results on the UAV-based and the multiscale testing datasets. The Segformer model, followed by the UperNet-Swin transformer, outperformed all of the evaluated CNN-based models in the multiscale testing dataset and in the additional unseen UAV testing dataset. In addition to delivering remarkable results in mapping date palm trees from versatile VHSR images, the Segformer model was among those with a small number of parameters and relatively low computing costs. Collectively, deep vision transformers could be used efficiently in developing and updating inventories of date palms and other tree species.

10 pages, 437 KB  
Article
Relevancy between Objects Based on Common Sense for Semantic Segmentation
by Jun Zhou, Xing Bai and Qin Zhang
Appl. Sci. 2022, 12(24), 12711; https://doi.org/10.3390/app122412711 - 11 Dec 2022
Cited by 1 | Viewed by 2132
Abstract
Research on image classification sparked the latest deep-learning boom, and many downstream tasks, including semantic segmentation, benefit from it. The state-of-the-art semantic segmentation models are all based on deep learning, and they sometimes make semantic mistakes. In a semantic segmentation dataset with a small number of categories, images are often collected from a single scene, and there is a close semantic connection between any two categories. In a dataset collected from multiple scenes, however, two categories may be irrelevant to each other. The probability that objects of one category appear next to objects of another category differs across category pairs, and this observation is the basis of this paper. Semantic segmentation methods must solve the two problems of localization and classification; this paper is dedicated to correcting classifications that are clearly contrary to reality. Specifically, we first calculate the relevancy between different class pairs. Then, based on this knowledge, we infer the category of a connected component from its relationships with the surrounding connected components and correct obviously wrong classifications made by a deep learning semantic segmentation model. Several well-performing deep learning models are evaluated on two challenging public datasets for semantic image segmentation. Our proposed method improves the performance of UPerNet, OCRNet, and SETR from 40.7%, 43%, and 48.64% to 42.07%, 44.09%, and 49.09% mean IoU on the ADE20K validation set, and the performance of PSPNet, DeepLabV3, and OCRNet from 37.26%, 37.3%, and 39.5% to 38.93%, 38.95%, and 40.63% mean IoU on the COCO-Stuff dataset, which shows the effectiveness of the method.
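
The method's two steps lend themselves to a short sketch: estimate class-pair relevancy from adjacency counts in training masks, then relabel a connected component whose class is implausible next to all of its neighbors. The threshold and majority-vote relabeling below are illustrative assumptions, not the paper's exact procedure.

```python
# Hedged sketch: relevancy from class adjacency counts, then correction of
# implausible connected components. Threshold and neighbor vote are
# illustrative, not the paper's exact procedure.
import numpy as np
from scipy import ndimage

def relevancy_matrix(masks, num_classes):
    """Count how often class pairs touch horizontally/vertically in training masks."""
    R = np.zeros((num_classes, num_classes))
    for m in masks:
        for a, b in [(m[:, :-1], m[:, 1:]), (m[:-1, :], m[1:, :])]:
            np.add.at(R, (a.ravel(), b.ravel()), 1)
    R = R + R.T
    return R / R.sum(axis=1, keepdims=True)   # row-normalized relevancy

def correct_prediction(pred, R, threshold=0.01):
    out = pred.copy()
    for c in np.unique(pred):
        comps, n = ndimage.label(pred == c)
        for i in range(1, n + 1):
            ring = ndimage.binary_dilation(comps == i) & (comps != i)
            neigh = pred[ring]                 # classes surrounding this component
            if neigh.size and R[c, neigh].max() < threshold:
                out[comps == i] = np.bincount(neigh).argmax()  # dominant neighbor
    return out

masks = [np.random.randint(0, 4, (32, 32)) for _ in range(3)]  # toy training masks
pred = np.random.randint(0, 4, (32, 32))
print(correct_prediction(pred, relevancy_matrix(masks, 4)).shape)
```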

24 pages, 7621 KB  
Article
HFENet: Hierarchical Feature Extraction Network for Accurate Landcover Classification
by Di Wang, Ronghao Yang, Hanhu Liu, Haiqing He, Junxiang Tan, Shaoda Li, Yichun Qiao, Kangqi Tang and Xiao Wang
Remote Sens. 2022, 14(17), 4244; https://doi.org/10.3390/rs14174244 - 28 Aug 2022
Cited by 14 | Viewed by 3556
Abstract
Landcover classification is an important application in remote sensing, but it is always a challenge to distinguish different features with similar characteristics or large-scale differences. Some deep learning networks, such as UperNet, PSPNet, and DANet, use pyramid pooling and attention mechanisms to improve their multi-scale feature extraction abilities. However, because they neglect the low-level features contained in the underlying network and the information differences between feature maps, it is difficult for them to identify small-scale objects. Thus, we propose a novel image segmentation network, named HFENet, for mining multi-level semantic information. Like UperNet, HFENet adopts a top-down architecture with horizontal connections, but it includes two improved modules, the HFE and the MFF. According to the characteristics of different levels of semantic information, the HFE module reconstructs the feature extraction part by introducing an attention mechanism and a pyramid pooling module to fully mine semantic information. With the help of a channel attention mechanism, the MFF module up-samples and re-weights the feature maps to fuse them and enhance the expressive ability of multi-scale features. Ablation studies and comparative experiments between HFENet and seven state-of-the-art models (U-Net, DeepLabv3+, PSPNet, FCN, UperNet, DANet, and SegNet) are conducted on a self-labeled GF-2 remote sensing image dataset (MZData) and two open datasets, landcover.ai and the WHU building dataset. The results show that HFENet outperforms the other models on all three datasets under six evaluation metrics (mIoU, FWIoU, PA, mP, mRecall, and mF1), improving mIoU by 7.41–10.60% on MZData, 1.17–11.57% on the WHU building dataset, and 0.93–4.31% on landcover.ai. HFENet performs better in the task of refining the semantic segmentation of remote sensing images.
(This article belongs to the Special Issue Advances in Deep Learning Based 3D Scene Understanding from LiDAR)
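
HFENet keeps UperNet's top-down layout with horizontal (lateral) connections. Below is that layout in its generic FPN form; channel sizes are illustrative, and the HFE/MFF refinements are deliberately not reproduced.

```python
# Hedged sketch: a generic top-down pathway with lateral connections, the
# layout HFENet shares with UperNet. Channel sizes are illustrative; the HFE
# and MFF modules are not reproduced here.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownFusion(nn.Module):
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList([nn.Conv2d(c, out_channels, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):                   # backbone stages, fine -> coarse
        lat = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(lat) - 2, -1, -1):   # coarse semantics flow downward
            lat[i] = lat[i] + F.interpolate(lat[i + 1], size=lat[i].shape[-2:],
                                            mode="bilinear", align_corners=False)
        return [s(x) for s, x in zip(self.smooth, lat)]

feats = [torch.randn(1, c, 64 // 2 ** i, 64 // 2 ** i)
         for i, c in enumerate((256, 512, 1024, 2048))]
print([f.shape for f in TopDownFusion()(feats)])
```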