Search Results (21)

Search Parameters:
Keywords = generative mask sub-network

16 pages, 3751 KiB  
Article
Improved Face Image Super-Resolution Model Based on Generative Adversarial Network
by Qingyu Liu, Yeguo Sun, Lei Chen and Lei Liu
J. Imaging 2025, 11(5), 163; https://doi.org/10.3390/jimaging11050163 - 19 May 2025
Viewed by 639
Abstract
Image super-resolution (SR) models based on the generative adversarial network (GAN) face challenges such as unnatural facial detail restoration and local blurring. This paper proposes an improved GAN-based model to address these issues. First, a Multi-scale Hybrid Attention Residual Block (MHARB) is designed, which dynamically enhances feature representation in critical face regions through dual-branch convolution and channel-spatial attention. Second, an Edge-guided Enhancement Block (EEB) is introduced, generating adaptive detail residuals by combining edge masks and channel attention to accurately recover high-frequency textures. Furthermore, a multi-scale discriminator with a weighted sub-discriminator loss is developed to balance global structural and local detail generation quality. Additionally, a phase-wise training strategy with dynamic adjustment of the learning rate (Lr) and loss function weights is implemented to improve the realism of super-resolved face images. Experiments on the CelebA-HQ dataset demonstrate that the proposed model achieves a PSNR of 23.35 dB, an SSIM of 0.7424, and an LPIPS of 24.86, outperforming classical models and delivering superior visual quality in high-frequency regions. Notably, this model also surpasses the SwinIR model (PSNR: 23.28 dB → 23.35 dB, SSIM: 0.7340 → 0.7424, and LPIPS: 30.48 → 24.86), validating the effectiveness of the improved model and the training strategy in preserving facial details.
(This article belongs to the Section AI in Imaging)
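
The MHARB described in this abstract combines dual-branch convolution with channel-spatial attention inside a residual block. Below is a minimal PyTorch sketch of that general idea; the class name, kernel sizes, and reduction ratio are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HybridAttentionResBlock(nn.Module):
    """Residual block with dual-branch convolution followed by channel and
    spatial attention. Kernel sizes and reduction ratio are assumptions."""
    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.branch3 = nn.Conv2d(channels, channels, 3, padding=1)   # fine branch
        self.branch5 = nn.Conv2d(channels, channels, 5, padding=2)   # coarse branch
        self.fuse = nn.Conv2d(2 * channels, channels, 1)
        # channel attention, squeeze-and-excitation style
        self.ca = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1), nn.ReLU(),
            nn.Conv2d(channels // reduction, channels, 1), nn.Sigmoid())
        # spatial attention over pooled channel statistics
        self.sa = nn.Sequential(nn.Conv2d(2, 1, 7, padding=3), nn.Sigmoid())

    def forward(self, x):
        f = self.fuse(torch.cat([self.branch3(x), self.branch5(x)], dim=1))
        f = f * self.ca(f)                                   # channel attention
        stats = torch.cat([f.mean(1, keepdim=True),
                           f.amax(1, keepdim=True)], dim=1)  # avg + max maps
        f = f * self.sa(stats)                               # spatial attention
        return x + f                                         # residual connection
```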

35 pages, 19516 KiB  
Article
DoubleNet: A Method for Generating Navigation Lines of Unstructured Soil Roads in a Vineyard Based on CNN and Transformer
by Xuezhi Cui, Licheng Zhu, Bo Zhao, Ruixue Wang, Zhenhao Han, Kunlei Lu, Xuguang Feng, Jipeng Ni and Xiaoyi Cui
Agronomy 2025, 15(3), 544; https://doi.org/10.3390/agronomy15030544 - 23 Feb 2025
Viewed by 634
Abstract
Navigating unstructured roads in vineyards with weak satellite signals presents significant challenges for robotic systems. This research introduces DoubleNet, an innovative deep-learning model designed to generate navigation lines for such conditions. To improve the model's ability to extract image features, DoubleNet incorporates several key innovations: a unique multi-head self-attention mechanism (Fused-MHSA), a modified activation function (SA-GELU), and a specialized operation block (DNBLK). Building on these components, DoubleNet is structured as an encoder–decoder network that includes two parallel subnetworks: one dedicated to processing 2D feature maps and the other focused on 1D tensors. These subnetworks interact through two feature fusion networks, which operate in both the encoder and decoder stages, facilitating a more integrated feature extraction process. Additionally, we utilized a specially annotated dataset comprising fused RGB-and-mask images, each marked with five navigation points to enhance the accuracy of point localization. As a result of these innovations, DoubleNet achieves a remarkable 95.75% percentage of correct key points (PCK) and operates at 71.16 FPS on our dataset, outperforming several well-known key point detection algorithms overall. DoubleNet demonstrates strong potential as a competitive solution for generating effective navigation routes for robots operating in vineyards with unstructured roads.
(This article belongs to the Special Issue Advanced Machine Learning in Agriculture)
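
DoubleNet is evaluated with the percentage of correct key points (PCK). Below is a small NumPy sketch of the standard PCK computation; the pixel threshold and the synthetic data are assumptions, since the abstract does not state the normalization used.

```python
import numpy as np

def pck(pred, gt, threshold=10.0):
    """Percentage of correct keypoints: a prediction counts as correct when
    it lies within `threshold` pixels of the ground truth.
    pred, gt: arrays of shape (n_images, n_points, 2) in pixel coords."""
    dists = np.linalg.norm(pred - gt, axis=-1)   # (n_images, n_points)
    return float((dists <= threshold).mean())

# e.g. five annotated navigation points per image, as in the paper
gt = np.random.rand(8, 5, 2) * 640
pred = gt + np.random.randn(8, 5, 2) * 3.0
print(f"PCK: {pck(pred, gt):.4f}")
```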

20 pages, 28589 KiB  
Article
An Adaptive Semantic Segmentation Network for Adversarial Learning Domain Based on Low-Light Enhancement and Decoupled Generation
by Meng Wang, Zhuoran Zhang and Haipeng Liu
Appl. Sci. 2024, 14(8), 3295; https://doi.org/10.3390/app14083295 - 13 Apr 2024
Cited by 3 | Viewed by 1790
Abstract
Nighttime semantic segmentation suffers significant mask degradation due to issues such as low contrast, fuzzy imaging, and low-quality annotation. In this paper, we introduce a domain adaptive approach for nighttime semantic segmentation that overcomes the reliance on low-light image annotations to transfer the source domain model to the target domain. On the front end, a low-light image enhancement sub-network combining lightweight deep learning with mapping-curve iteration is adopted to enhance nighttime foreground contrast. In the segmentation network, body generation and edge preservation branches are implemented to generate consistent representations within the same semantic region. Additionally, a pixel weighting strategy is embedded to increase the prediction accuracy for small targets. During training, a discriminator is implemented to distinguish features between the source and target domains, thereby guiding the segmentation network in adversarial transfer learning. The proposed approach's effectiveness is verified through testing on Dark Zurich, Nighttime Driving, and CityScapes, including evaluations of mIoU, PSNR, and SSIM, which confirm that our approach surpasses existing baselines in segmentation scenarios.
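
The adversarial transfer described above trains a discriminator to separate source-domain from target-domain features while the segmentation network learns to fool it. Below is a minimal PyTorch sketch of that alternating objective; the feature width, discriminator shape, and loss weight are assumptions.

```python
import torch
import torch.nn as nn

# Patch-level domain discriminator over segmentation features
# (the 256-channel feature width is an assumption).
disc = nn.Sequential(
    nn.Conv2d(256, 64, 3, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(64, 1, 3, padding=1))
bce = nn.BCEWithLogitsLoss()

def adversarial_step(feat_src, feat_tgt, opt_seg, opt_disc, lam=0.01):
    """One alternating update: train disc to separate domains, then train the
    segmenter so target features look source-like. The supervised
    segmentation loss on source images would be added to g_loss in practice."""
    opt_disc.zero_grad()
    d_src, d_tgt = disc(feat_src.detach()), disc(feat_tgt.detach())
    d_loss = (bce(d_src, torch.ones_like(d_src)) +
              bce(d_tgt, torch.zeros_like(d_tgt)))
    d_loss.backward()
    opt_disc.step()

    opt_seg.zero_grad()
    d_tgt = disc(feat_tgt)                 # no detach: grads reach the segmenter
    g_loss = lam * bce(d_tgt, torch.ones_like(d_tgt))
    g_loss.backward()
    opt_seg.step()
    return d_loss.item(), g_loss.item()
```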

22 pages, 7154 KiB  
Article
Temporal Stability of Grassland Soil Moisture Utilising Sentinel-2 Satellites and Sparse Ground-Based Sensor Networks
by Rumia Basu, Eve Daly, Colin Brown, Asaf Shnel and Patrick Tuohy
Remote Sens. 2024, 16(2), 220; https://doi.org/10.3390/rs16020220 - 5 Jan 2024
Cited by 5 | Viewed by 3013
Abstract
Soil moisture is important for understanding climate, water resources, water storage, and land use management. This study used Sentinel-2 (S-2) satellite optical data to retrieve surface soil moisture at a 10 m scale on grassland sites with low hydraulic conductivity soil in a climate dominated by heavy rainfall. Soil moisture was estimated after modifying the Optical Trapezoidal Model to account for mixed land cover in such conditions. The method uses data from a short-wave infra-red band, which is sensitive to soil moisture, and four vegetation indices from optical bands, which are sensitive to overlying vegetation. Scatter plots of these data from multiple, infrequent satellite passes are used to define the range of surface moisture conditions. The saturated and dry edges are clearly non-linear, regardless of the choice of vegetation index. Land cover masks are used to generate scatter plots from data only over grassland sites. The Enhanced Vegetation Index demonstrated advantages over other vegetation indices for surface moisture estimation over the entire range of grassland conditions. In poorly drained soils, the time lag between satellite surface moisture retrievals and in situ sensor soil moisture at depth must be part of the validation process. This was achieved by combining an approximate solution to the Richards' equation, along with measurements of saturated and residual moisture from soil samples, to optimise the correlations between measurements from satellites and sensors at a 15 cm depth. Time lags of 2–4 days resulted in a reduction of the root mean square errors between volumetric soil moisture predicted from S-2 data and that measured by in situ sensors, from ~0.1 m³/m³ to <0.06 m³/m³. The surface moisture results for two grassland sites were analysed using statistical concepts based upon the temporal stability of soil water content, an ideal framework for the intermittent Sentinel-2 data in conditions of persistent cloud cover. The analysis could discriminate between different natural drainages and surface soil textures in grassland areas and could identify sub-surface artificial drainage channels. The techniques are transferable for land-use and agricultural management in diverse environmental conditions without the need for extensive and expensive in situ sensor networks.
(This article belongs to the Special Issue Remote Sensing for Soil Moisture and Vegetation Parameters Retrieval)
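
The validation step above optimizes a time lag between satellite surface-moisture retrievals and 15 cm sensor readings. Below is a simple NumPy sketch of choosing the lag that minimizes RMSE on equal-length daily series; the search range is an assumption.

```python
import numpy as np

def best_lag(sat, sensor, max_lag_days=7):
    """Pick the lag (in days) minimizing RMSE between satellite surface
    moisture and in situ moisture at 15 cm depth.
    sat, sensor: equal-length 1D NumPy arrays of daily values."""
    n = len(sat)
    rmses = [np.sqrt(np.mean((sat[:n - lag] - sensor[lag:]) ** 2))
             for lag in range(max_lag_days + 1)]
    lag = int(np.argmin(rmses))
    return lag, rmses[lag]
```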

16 pages, 1092 KiB  
Article
Depth Information Precise Completion-GAN: A Precisely Guided Method for Completing Ill Regions in Depth Maps
by Ren Qian, Wenfeng Qiu, Wenbang Yang, Jianhua Li, Yun Wu, Renyang Feng, Xinan Wang and Yong Zhao
Remote Sens. 2023, 15(14), 3686; https://doi.org/10.3390/rs15143686 - 24 Jul 2023
Viewed by 1665
Abstract
In the depth map obtained through binocular stereo matching, there are many ill regions caused by factors such as lighting or occlusion. These ill regions cannot be recovered accurately because the information required for matching is missing, and since GAN-based completion models generate random results, they cannot complete the depth map precisely; the depth map must therefore be completed in accordance with reality. To address this issue, this paper proposes a depth information precise completion GAN (DIPC-GAN) that effectively uses the Guid layer normalization (GuidLN) module to guide the model toward precise completion by utilizing depth edges. GuidLN flexibly adjusts the weights of the guiding conditions based on intermediate results, allowing modules to incorporate the guiding information accurately and effectively. The model employs multiscale discriminators to discriminate results of different resolutions at different generator stages, enhancing the generator's grasp of overall image and detail information. Additionally, this paper proposes Attention-ResBlock, which enables all ResBlocks in each task module of the GAN-based multitask model to focus on their own task by sharing a mask. Even when the ill regions are large, the model can effectively complete the missing details in these regions, and the multiscale discriminator enhances the generator's robustness. Finally, the proposed task-specific residual module can effectively focus the different subnetworks of a multitask model on their respective tasks. The model has shown good repair results on datasets including artificial, real, and remote sensing images. The final experimental results showed that the model's REL and RMSE decreased by 9.3% and 9.7%, respectively, compared to RDFGan.
(This article belongs to the Special Issue Computer Vision and Image Processing in Remote Sensing)
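
The GuidLN module described above injects guidance (depth edges) into normalization. As a rough illustration only, the sketch below uses a SPADE-style modulated normalization whose scale and shift are predicted from a guidance map; the paper's actual GuidLN formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedNorm(nn.Module):
    """Normalization whose scale/shift come from a guidance map (e.g. depth
    edges): a SPADE-style stand-in for GuidLN, not the paper's exact module."""
    def __init__(self, channels, guide_channels=1):
        super().__init__()
        self.norm = nn.GroupNorm(1, channels, affine=False)   # layer-norm-like
        self.gamma = nn.Conv2d(guide_channels, channels, 3, padding=1)
        self.beta = nn.Conv2d(guide_channels, channels, 3, padding=1)

    def forward(self, x, guide):
        # resize the guidance (edge map) to the feature resolution
        guide = F.interpolate(guide, size=x.shape[-2:], mode="nearest")
        return self.norm(x) * (1 + self.gamma(guide)) + self.beta(guide)
```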

22 pages, 3954 KiB  
Article
Co-Visual Pattern-Augmented Generative Transformer Learning for Automobile Geo-Localization
by Jianwei Zhao, Qiang Zhai, Pengbo Zhao, Rui Huang and Hong Cheng
Remote Sens. 2023, 15(9), 2221; https://doi.org/10.3390/rs15092221 - 22 Apr 2023
Cited by 9 | Viewed by 2778
Abstract
Geolocation is a fundamental component of route planning and navigation for unmanned vehicles, but GNSS-based geolocation fails under denial-of-service conditions. Cross-view geo-localization (CVGL), which aims to estimate the geographic location of a ground-level camera by matching it against an enormous set of geo-tagged aerial (e.g., satellite) images, has received a lot of attention but remains extremely challenging due to the drastic appearance differences across aerial–ground views. In existing methods, global representations of different views are extracted primarily using Siamese-like architectures, but their interactive benefits are seldom taken into account. In this paper, we present a novel approach using cross-view knowledge generative techniques in combination with transformers, namely mutual generative transformer learning (MGTL), for CVGL. Specifically, taking the initial representations produced by the backbone network, MGTL develops two separate generative sub-modules, one for aerial-aware knowledge generation from ground-view semantics and vice versa, and fully exploits the mutual benefits through the attention mechanism. Moreover, to better capture the co-visual relationships between aerial and ground views, we introduce a cascaded attention masking algorithm to further boost accuracy. Extensive experiments on challenging public benchmarks, i.e., CVACT and CVUSA, demonstrate the effectiveness of the proposed method, which sets new records compared with the existing state-of-the-art models. Our code will be available upon acceptance.
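
The mutual generation between views described above can be pictured as two-way cross-attention: ground tokens query aerial tokens and vice versa. Below is a hedged PyTorch sketch; token dimensions, head count, and the residual wiring are assumptions.

```python
import torch.nn as nn

class MutualCrossAttention(nn.Module):
    """Two-way cross-attention: ground tokens attend to aerial tokens and
    vice versa. A sketch of the mutual-generation idea; dimensions assumed."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.g2a = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.a2g = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, ground, aerial):          # (B, N, dim) token sequences
        aerial_aware, _ = self.g2a(ground, aerial, aerial)  # ground queries aerial
        ground_aware, _ = self.a2g(aerial, ground, ground)  # aerial queries ground
        return ground + aerial_aware, aerial + ground_aware # residual updates
```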

18 pages, 39769 KiB  
Article
De-Aliasing and Accelerated Sparse Magnetic Resonance Image Reconstruction Using Fully Dense CNN with Attention Gates
by Md. Biddut Hossain, Ki-Chul Kwon, Shariar Md Imtiaz, Oh-Seung Nam, Seok-Hee Jeon and Nam Kim
Bioengineering 2023, 10(1), 22; https://doi.org/10.3390/bioengineering10010022 - 22 Dec 2022
Cited by 12 | Viewed by 3504
Abstract
When sparsely sampled data are used to accelerate magnetic resonance imaging (MRI), conventional reconstruction approaches produce significant artifacts that obscure the content of the image. To remove aliasing artifacts, we propose an advanced convolutional neural network (CNN) called fully dense attention CNN (FDA-CNN). We updated the Unet model with fully dense connectivity and an attention mechanism for MRI reconstruction. The main benefit of FDA-CNN is that an attention gate in each decoder layer strengthens the learning process by focusing on the relevant image features and provides better generalization of the network by reducing irrelevant activations. Moreover, densely interconnected convolutional layers reuse the feature maps and prevent the vanishing gradient problem. Additionally, we implement a new, proficient under-sampling pattern in the phase direction that takes low and high frequencies from the k-space both randomly and non-randomly. The performance of FDA-CNN was evaluated quantitatively and qualitatively with three different sub-sampling masks and datasets. Compared with five current deep-learning-based and two compressed sensing MRI reconstruction techniques, the proposed method performed better, reconstructing smoother and brighter images. Furthermore, FDA-CNN improved the mean PSNR by 2 dB, SSIM by 0.35, and VIFP by 0.37 compared with Unet for an acceleration factor of 5.
(This article belongs to the Special Issue AI in MRI: Frontiers and Applications)
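
The under-sampling pattern described above keeps central (low-frequency) k-space lines deterministically and draws the remaining lines at random along the phase direction. Below is a NumPy sketch of such a 1D mask; the center fraction and acceleration bookkeeping are assumptions.

```python
import numpy as np

def phase_mask(n_phase=256, center_frac=0.08, accel=5, seed=0):
    """1D mask along the phase-encode direction: the k-space centre is kept
    deterministically, and remaining lines are drawn at random to hit an
    overall acceleration of `accel`. Fractions are illustrative."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_phase, dtype=bool)
    c = int(n_phase * center_frac)
    lo = n_phase // 2 - c // 2
    mask[lo:lo + c] = True                       # low frequencies, non-random
    n_random = max(n_phase // accel - c, 0)      # remaining sampling budget
    free = np.flatnonzero(~mask)
    mask[rng.choice(free, size=n_random, replace=False)] = True
    return mask
```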

15 pages, 3005 KiB  
Article
Guided Random Mask: Adaptively Regularizing Deep Neural Networks for Medical Image Analysis by Potential Lesions
by Xiaorui Yu, Shuqi Wang and Junjie Hu
Appl. Sci. 2022, 12(18), 9099; https://doi.org/10.3390/app12189099 - 9 Sep 2022
Cited by 4 | Viewed by 2642
Abstract
Data augmentation is a critical regularization method that contributes to numerous state-of-the-art results achieved by deep neural networks (DNNs). Visual interpretation methods demonstrate that DNNs behave like object detectors, focusing on the discriminative regions in the input image. Many studies have also discovered that DNNs correctly identify the lesions in the input, which is confirmed in the current work. However, for medical images containing complicated lesions, we observe that DNNs focus on the most prominent abnormalities while neglecting sub-clinical characteristics that may also aid diagnosis. We speculate this bias may hamper the generalization ability of DNNs, potentially causing false predictions. Based on this consideration, a simple yet effective data augmentation method called guided random mask (GRM) is proposed to discover lesions with different characteristics. Visual interpretation of the inference result is used as guidance to generate random-sized masks, forcing the DNNs to learn both the prominent and the subtle lesions. One notable difference between GRM and conventional data augmentation methods is its association with the training phase of DNNs. The parameters of vanilla augmentation methods are independent of the training phase, which may limit their effectiveness when the scale and appearance of regions of interest vary. In contrast, the effectiveness of the proposed GRM method evolves with the training of DNNs, adaptively regularizing the DNNs to alleviate the over-fitting problem. Moreover, GRM is a parameter-free augmentation method that can be incorporated into DNNs without modifying the architecture. GRM is empirically verified on multiple datasets with different modalities, including optical coherence tomography, X-ray, and color fundus images. Quantitative experimental results show that the proposed GRM method achieves higher classification accuracy than the commonly used augmentation methods in multiple networks. Visualization analysis also demonstrates that GRM can better localize lesions than the vanilla network.
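
GRM uses the network's own visual interpretation to decide where to mask. The sketch below illustrates the general idea with a CAM-like saliency map and a random-sized occlusion box; the box-size range and placement policy are assumptions, not the paper's exact procedure.

```python
import torch

def guided_random_mask(img, saliency, max_frac=0.3):
    """Zero out a random-sized box centred near the saliency peak, forcing
    the network to use subtler cues elsewhere.
    img: (C, H, W); saliency: (H, W), e.g. from class activation mapping."""
    H, W = saliency.shape
    cy, cx = divmod(int(saliency.argmax()), W)   # most discriminative point
    h = int(torch.randint(8, max(int(H * max_frac), 9), (1,)))
    w = int(torch.randint(8, max(int(W * max_frac), 9), (1,)))
    y0, x0 = max(cy - h // 2, 0), max(cx - w // 2, 0)
    out = img.clone()
    out[:, y0:y0 + h, x0:x0 + w] = 0             # occlude the prominent lesion
    return out
```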

24 pages, 7794 KiB  
Article
Mask-Point: Automatic 3D Surface Defects Detection Network for Fiber-Reinforced Resin Matrix Composites
by Helin Li, Bin Lin, Chen Zhang, Liang Xu, Tianyi Sui, Yang Wang, Xinquan Hao, Deyu Lou and Hongyu Li
Polymers 2022, 14(16), 3390; https://doi.org/10.3390/polym14163390 - 19 Aug 2022
Cited by 9 | Viewed by 3260
Abstract
Surface defects of fiber-reinforced resin matrix composites (FRRMCs) adversely affect their appearance and performance. To accurately and efficiently detect the three-dimensional (3D) surface defects of FRRMCs, a novel lightweight, two-stage semantic segmentation network, i.e., Mask-Point, is proposed. Stage 1 of Mask-Point consists of multi-head 3D region proposal extractors (RPEs), generating several 3D regions of interest (ROIs). Stage 2 is the 3D aggregation stage, composed of a shared classifier, a shared filter, and non-maximum suppression (NMS). The two stages work together to detect the surface defects. To evaluate the performance of Mask-Point, a new 3D surface defects dataset of FRRMCs containing about 120 million points is produced. Training and test experiments show that the accuracy and the mean intersection over union (mIoU) increase as the number of different 3D RPEs in Stage 1 increases, but inference becomes slower. The best accuracy, mIoU, and inference speed of the Mask-Point model reach 0.9997, 0.9402, and 320,000 points/s, respectively. Moreover, comparison experiments show that Mask-Point offers the best segmentation performance among several typical 3D semantic segmentation networks; its mIoU is about 30% ahead of the next-best network, PointNet. In addition, a distributed surface defects detection system based on Mask-Point is developed. The system is applied to scan real FRRMC products and detect their surface defects, and it achieves the best detection performance in competition with skilled human workers. These experiments demonstrate that the proposed Mask-Point can accurately and efficiently detect 3D surface defects of FRRMCs and provides a new potential solution for the 3D surface defect detection of other, similar materials.
(This article belongs to the Special Issue Development in Fiber-Reinforced Polymer Composites)
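
Mask-Point is scored with accuracy and mIoU over per-point labels. For reference, here is a NumPy sketch of the standard per-point mean intersection over union.

```python
import numpy as np

def point_miou(pred, gt, n_classes):
    """Mean intersection over union for per-point class labels.
    pred, gt: integer arrays of shape (n_points,)."""
    ious = []
    for c in range(n_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union:                       # skip classes absent from both
            ious.append(inter / union)
    return float(np.mean(ious))
```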

18 pages, 37056 KiB  
Article
An Optical Image Encryption Method Using Hopfield Neural Network
by Xitong Xu and Shengbo Chen
Entropy 2022, 24(4), 521; https://doi.org/10.3390/e24040521 - 7 Apr 2022
Cited by 14 | Viewed by 2555
Abstract
Aiming to solve the problem of vital information security as well as the application of neural networks in optical encryption systems, we propose an optical image encryption method using the Hopfield neural network. The algorithm uses a fuzzy single neuronal dynamic system and a chaotic Hopfield neural network for chaotic sequence generation and then obtains chaotic random phase masks. Initially, the original images are decomposed into sub-signals through wavelet packet transform, and the sub-signals are divided into two layers by adaptive classification after scrambling. Double random-phase encoding in the 4f system and the Fresnel domain is implemented on the two layers, respectively. Different transforms are applied to the sub-signals according to their standard deviation to guarantee the security of local information. Meanwhile, parameters such as wavelength and diffraction distance are treated as additional keys, which enhances the overall security. Then, the inverse wavelet packet transform is applied to reconstruct the image, and a second scrambling is implemented. To handle and manage the parameters used in the scheme, a public key cryptosystem is applied. Finally, experiments and security analysis are presented to demonstrate the feasibility and robustness of the proposed scheme.
(This article belongs to the Special Issue Computational Imaging and Image Encryption with Entropy)
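
The double random-phase encoding mentioned above multiplies the signal by one phase mask in the spatial plane and another in the Fourier plane. The sketch below is the textbook DRPE with a logistic-map chaotic phase standing in for the paper's Hopfield-generated sequence.

```python
import numpy as np

def chaotic_phase(shape, x0=0.37, r=3.99):
    """Random phase mask from a logistic map (a stand-in for the paper's
    Hopfield-generated chaotic sequence)."""
    n = int(np.prod(shape))
    seq = np.empty(n)
    x = x0
    for i in range(n):
        x = r * x * (1 - x)              # logistic-map iteration
        seq[i] = x
    return np.exp(2j * np.pi * seq.reshape(shape))

def drpe_encrypt(img, m1, m2):
    """Classic double random-phase encoding: spatial-plane mask, FFT,
    Fourier-plane mask, inverse FFT."""
    return np.fft.ifft2(np.fft.fft2(img * m1) * m2)

img = np.random.rand(64, 64)
m1, m2 = chaotic_phase(img.shape, 0.37), chaotic_phase(img.shape, 0.71)
cipher = drpe_encrypt(img, m1, m2)
```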

23 pages, 1926 KiB  
Article
GenU-Net++: An Automatic Intracranial Brain Tumors Segmentation Algorithm on 3D Image Series with High Performance
by Yan Zhang, Xi Liu, Shiyun Wa, Yutong Liu, Jiali Kang and Chunli Lv
Symmetry 2021, 13(12), 2395; https://doi.org/10.3390/sym13122395 - 12 Dec 2021
Cited by 20 | Viewed by 3395
Abstract
Automatic segmentation of intracranial brain tumors in three-dimensional (3D) image series is critical in screening and diagnosing related diseases. However, intracranial brain tumor images present various challenges: (1) Multiple brain tumor categories hold particular pathological features. (2) Locating and discerning brain tumors from other non-brain regions is a thorny issue due to their complicated structure. (3) Traditional segmentation requires a noticeable difference in the brightness of the target of interest relative to the background. (4) Brain tumor magnetic resonance images (MRI) have blurred boundaries, similar gray values, and low image contrast. (5) Image information details would be dropped while suppressing noise. Existing methods and algorithms do not perform satisfactorily in overcoming these obstacles, and most of them share inadequate accuracy in brain tumor segmentation. Considering that the image segmentation task is a symmetric process in which downsampling and upsampling are performed sequentially, this paper proposes a segmentation algorithm based on U-Net++, aiming to address the aforementioned problems. This paper uses the BraTS 2018 dataset, which contains MR images of 245 patients. We propose the generative mask sub-network, which can generate feature maps. This paper also uses the bicubic interpolation method for upsampling to obtain segmentation results different from those of U-Net++. Subsequently, pixel-weighted fusion is adopted to fuse the two segmentation results, thereby improving the robustness and segmentation performance of the model. At the same time, we propose an auto-pruning mechanism based on the architectural features of U-Net++ itself. This mechanism deactivates a sub-network by zeroing its input and automatically prunes GenU-Net++ during the inference process, increasing the inference speed and improving the network performance by preventing overfitting. Our algorithm's PA, MIoU, P, and R are tested on the validation dataset, reaching 0.9737, 0.9745, 0.9646, and 0.9527, respectively. The experimental results demonstrate that the proposed model outperforms the comparison models. Additionally, we encapsulate the model and develop a corresponding application for the macOS platform to make the model further applicable.
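
The pixel-weighted fusion step above blends the two segmentation results. Below is a hedged PyTorch sketch assuming per-pixel weights proportional to each branch's confidence; the paper's actual weighting scheme is not specified in the abstract.

```python
import torch

def pixel_weighted_fusion(p1, p2):
    """Fuse two softmax probability maps with per-pixel weights proportional
    to each branch's confidence. p1, p2: (B, C, H, W)."""
    w1 = p1.amax(dim=1, keepdim=True)            # branch-1 confidence per pixel
    w2 = p2.amax(dim=1, keepdim=True)            # branch-2 confidence per pixel
    return (w1 * p1 + w2 * p2) / (w1 + w2 + 1e-8)
```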

18 pages, 13944 KiB  
Article
A Coarse-to-Fine Contour Optimization Network for Extracting Building Instances from High-Resolution Remote Sensing Imagery
by Fang Fang, Kaishun Wu, Yuanyuan Liu, Shengwen Li, Bo Wan, Yanling Chen and Daoyuan Zheng
Remote Sens. 2021, 13(19), 3814; https://doi.org/10.3390/rs13193814 - 23 Sep 2021
Cited by 17 | Viewed by 3877
Abstract
Building instance extraction is an essential task for surveying and mapping. Challenges still exist in extracting building instances from high-resolution remote sensing imagery, mainly because of complex structures, variety of scales, and interconnected buildings. This study proposes a coarse-to-fine contour optimization network to improve the performance of building instance extraction. Specifically, the network contains two special sub-networks: an attention-based feature pyramid sub-network (AFPN) and a coarse-to-fine contour sub-network. The former introduces channel attention into each layer of the original feature pyramid network (FPN) to improve the identification of small buildings, and the latter is designed to accurately extract building contours via two cascaded contour-optimization learning stages. Furthermore, the whole network is jointly optimized by multiple losses, that is, a contour loss, a classification loss, a box regression loss, and a general mask loss. Experimental results on three challenging building extraction datasets demonstrated that the proposed method outperformed state-of-the-art methods in the accuracy and quality of building contours.
(This article belongs to the Topic High-Resolution Earth Observation Systems, Technologies, and Applications)
(This article belongs to the Section AI Remote Sensing)
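
The joint optimization described above sums the four named losses. A trivial sketch follows; the weights are placeholders, not the paper's values.

```python
def total_loss(l_contour, l_cls, l_box, l_mask, w=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four losses named in the abstract.
    The weights are illustrative placeholders."""
    return w[0] * l_contour + w[1] * l_cls + w[2] * l_box + w[3] * l_mask

# usage: loss = total_loss(contour_l, cls_l, box_l, mask_l); loss.backward()
```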

21 pages, 3902 KiB  
Article
A Deep Learning Approach to an Enhanced Building Footprint and Road Detection in High-Resolution Satellite Imagery
by Christian Ayala, Rubén Sesma, Carlos Aranda and Mikel Galar
Remote Sens. 2021, 13(16), 3135; https://doi.org/10.3390/rs13163135 - 7 Aug 2021
Cited by 38 | Viewed by 8484
Abstract
The detection of building footprints and road networks has many useful applications, including the monitoring of urban development and real-time navigation. Taking into account that a great deal of human attention is required by these remote sensing tasks, a lot of effort has been made to automate them. However, the vast majority of the approaches rely on very high-resolution satellite imagery (<2.5 m) whose costs are not yet affordable for maintaining up-to-date maps. Working with the limited spatial resolution provided by high-resolution satellite imagery such as Sentinel-1 and Sentinel-2 (10 m) makes it hard to detect buildings and roads, since these labels may coexist within the same pixel. This paper focuses on this problem and presents a novel methodology capable of detecting buildings and roads with sub-pixel width by increasing the resolution of the output masks. This methodology consists of fusing Sentinel-1 and Sentinel-2 data (at 10 m) together with OpenStreetMap to train deep learning models for building and road detection at 2.5 m. This becomes possible thanks to the usage of OpenStreetMap vector data, which can be rasterized to any desired resolution. Accordingly, a few simple yet effective modifications of the U-Net architecture are proposed to not only semantically segment the input image, but also to learn how to enhance the resolution of the output masks. As a result, generated mappings quadruple the input spatial resolution, closing the gap between satellite and aerial imagery for building and road detection. To properly evaluate the generalization capabilities of the proposed methodology, a dataset composed of 44 cities across Spain has been considered and divided into training and testing cities. Both quantitative and qualitative results show that high-resolution satellite imagery can be used for sub-pixel width building and road detection following the proper methodology.
(This article belongs to the Special Issue Computer Vision and Deep Learning for Remote Sensing Applications)
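
The architectural modification above makes the network emit masks at four times the input resolution (10 m inputs, 2.5 m outputs). Below is a PyTorch sketch of such an upsampling head appended to a decoder; channel counts and layer choices are assumptions.

```python
import torch.nn as nn

class SuperResHead(nn.Module):
    """Head that lifts decoder features to 4x the input spatial resolution,
    so that 10 m Sentinel inputs yield 2.5 m output masks."""
    def __init__(self, in_ch=64, n_classes=2):
        super().__init__()
        self.up = nn.Sequential(
            nn.ConvTranspose2d(in_ch, 32, 2, stride=2), nn.ReLU(),  # 10 m -> 5 m
            nn.ConvTranspose2d(32, 16, 2, stride=2), nn.ReLU(),     # 5 m -> 2.5 m
            nn.Conv2d(16, n_classes, 1))                            # class logits

    def forward(self, x):
        return self.up(x)
```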

20 pages, 4441 KiB  
Article
Part-Aware Mask-Guided Attention for Thorax Disease Classification
by Ruihua Zhang, Fan Yang, Yan Luo, Jianyi Liu, Jinbin Li and Cong Wang
Entropy 2021, 23(6), 653; https://doi.org/10.3390/e23060653 - 23 May 2021
Cited by 5 | Viewed by 2810
Abstract
Thorax disease classification is a challenging task due to complex pathologies and subtle texture changes, among other factors. It has been extensively studied for years, largely because of its wide application in computer-aided diagnosis. Most existing methods directly learn global feature representations from whole Chest X-ray (CXR) images, without considering in depth the richer visual cues lying around informative local regions. Thus, these methods often produce sub-optimal thorax disease classification performance because they ignore the very informative pathological changes around organs. In this paper, we propose a novel Part-Aware Mask-Guided Attention Network (PMGAN) that learns complementary global and local feature representations from the all-organ region and multiple single-organ regions simultaneously for thorax disease classification. Specifically, multiple innovative soft attention modules are designed to progressively guide feature learning toward the globally informative regions of the whole CXR image. A mask-guided attention module is designed to further search for informative regions and visual cues within the all-organ or single-organ images, where attention is elegantly regularized by automatically generated organ masks without introducing extra computation during the inference stage. In addition, a multi-task learning strategy is designed, which effectively maximizes the learning of complementary local and global representations. The proposed PMGAN has been evaluated on the ChestX-ray14 dataset, and the experimental results demonstrate its superior thorax disease classification performance against the state-of-the-art methods.
(This article belongs to the Section Signal and Data Analysis)
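
The mask-guided attention module above regularizes attention with automatically generated organ masks during training only. As one plausible reading, the sketch below penalizes disagreement between an attention map and the organ mask; the actual regularization used by PMGAN may differ.

```python
import torch
import torch.nn.functional as F

def mask_guided_attention_loss(attn, organ_mask):
    """Penalize disagreement between a sigmoid attention map and an
    automatically generated organ mask; applied at training time only,
    so inference incurs no extra computation.
    attn, organ_mask: (B, 1, H, W)."""
    return F.binary_cross_entropy(attn.clamp(1e-6, 1 - 1e-6), organ_mask.float())
```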

18 pages, 12494 KiB  
Technical Note
Application of Denoising CNN for Noise Suppression and Weak Signal Extraction of Lunar Penetrating Radar Data
by Haoqiu Zhou, Xuan Feng, Zejun Dong, Cai Liu and Wenjing Liang
Remote Sens. 2021, 13(4), 779; https://doi.org/10.3390/rs13040779 - 20 Feb 2021
Cited by 27 | Viewed by 3788
Abstract
As one of the main payloads mounted on the Yutu-2 rover of the Chang'E-4 probe, lunar penetrating radar (LPR) aims to map the subsurface structure in the Von Kármán crater. The field LPR data are generally masked by large quantities of clutter and noise. To address the noise interference, dozens of filtering methods have been applied to LPR data. However, these methods have their limitations, so noise suppression remains a tough issue worth studying. In this article, the denoising convolutional neural network (CNN) framework is applied to noise suppression and weak signal extraction for 500 MHz LPR data. The results verify that the low-frequency clutters embedded in the LPR data mainly came from the instrument system of the Yutu-2 rover. Moreover, compared with the classic band-pass filter and the mean filter, the CNN filter performs better when dealing with noise interference and weak signal extraction; compared with Kirchhoff migration, it can provide an original high-quality radargram with diffraction information. Based on the high-quality radargram provided by the CNN filter, the subsurface sandwich structure is revealed and the weak signals from three sub-layers within the paleo-regolith are extracted.
(This article belongs to the Special Issue Lunar Remote Sensing and Applications)
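
The denoising CNN framework applied above to 500 MHz LPR data can be illustrated with a DnCNN-style residual denoiser, which predicts the noise component and subtracts it. Depth and width below are assumptions; the paper's exact architecture is not reproduced.

```python
import torch.nn as nn

class DenoisingCNN(nn.Module):
    """DnCNN-style network: predicts the noise in a radargram and subtracts
    it, leaving the weak reflections intact."""
    def __init__(self, depth=8, width=64):
        super().__init__()
        layers = [nn.Conv2d(1, width, 3, padding=1), nn.ReLU()]
        for _ in range(depth - 2):
            layers += [nn.Conv2d(width, width, 3, padding=1),
                       nn.BatchNorm2d(width), nn.ReLU()]
        layers += [nn.Conv2d(width, 1, 3, padding=1)]
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x - self.body(x)       # residual learning: output = input - noise
```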
