Search Results (27)

Search Parameters:
Keywords = visible-to-infrared image translation

36 pages, 26652 KiB  
Article
Low-Light Image Enhancement for Driving Condition Recognition Through Multi-Band Images Fusion and Translation
by Dong-Min Son and Sung-Hak Lee
Mathematics 2025, 13(9), 1418; https://doi.org/10.3390/math13091418 - 25 Apr 2025
Viewed by 526
Abstract
When objects are obscured by shadows or dim surroundings, image quality is improved by fusing near-infrared and visible-light images. At night, when visible and NIR lights are insufficient, long-wave infrared (LWIR) imaging can be utilized, necessitating the attachment of a visible-light sensor to an LWIR camera to simultaneously capture both LWIR and visible-light images. This camera configuration enables the acquisition of infrared images at various wavelengths depending on the time of day. To effectively fuse clear visible regions from the visible-light spectrum with those from the LWIR spectrum, a multi-band fusion method is proposed. The proposed fusion process subsequently combines detailed information from infrared and visible-light images, enhancing object visibility. Additionally, this process compensates for color differences in visible-light images, resulting in a natural and visually consistent output. The fused images are further enhanced using a night-to-day image translation module, which improves overall brightness and reduces noise. This night-to-day translation module is a trained CycleGAN-based module that adjusts object brightness in nighttime images to levels comparable to daytime images. The effectiveness and superiority of the proposed method are validated using image quality metrics. The proposed method significantly contributes to image enhancement, achieving the best average scores compared to other methods, with a BRISQUE of 30.426 and a PIQE of 22.186. This study improves the accuracy of human and object recognition in CCTV systems and provides a potential image-processing tool for autonomous vehicles. Full article
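
The abstract above does not give the fusion rules themselves, so the following is a minimal base/detail fusion sketch of the general idea, assuming registered, normalized single-channel visible and infrared inputs; it is not the paper's method.

```python
# Minimal sketch of multi-band base/detail fusion (not the paper's exact rules):
# split each aligned, normalized band into a low-frequency base and a high-frequency
# detail layer, keep the visible base, and take the stronger detail response.
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse_bands(visible: np.ndarray, infrared: np.ndarray, sigma: float = 5.0) -> np.ndarray:
    """visible, infrared: float32 luminance maps in [0, 1], already registered."""
    base_v, base_i = gaussian_filter(visible, sigma), gaussian_filter(infrared, sigma)
    detail_v, detail_i = visible - base_v, infrared - base_i
    # Per-pixel: keep whichever band carries the larger detail magnitude.
    detail_f = np.where(np.abs(detail_v) >= np.abs(detail_i), detail_v, detail_i)
    return np.clip(base_v + detail_f, 0.0, 1.0)
```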

22 pages, 52708 KiB  
Article
CSMR: A Multi-Modal Registered Dataset for Complex Scenarios
by Chenrui Li, Kun Gao, Zibo Hu, Zhijia Yang, Mingfeng Cai, Haobo Cheng and Zhenyu Zhu
Remote Sens. 2025, 17(5), 844; https://doi.org/10.3390/rs17050844 - 27 Feb 2025
Viewed by 981
Abstract
Complex scenarios pose challenges to tasks in computer vision, including image fusion, object detection, and image-to-image translation. On the one hand, complex scenarios involve fluctuating weather or lighting conditions, where even images of the same scenarios appear to be different. On the other hand, the large amount of textural detail in the given images introduces considerable interference that can conceal the useful information contained in them. An effective solution to these problems is to use the complementary details present in multi-modal images, such as visible-light and infrared images. Visible-light images contain rich textural information while infrared images contain information about the temperature. In this study, we propose a multi-modal registered dataset for complex scenarios under various environmental conditions, targeting security surveillance and the monitoring of low-slow-small targets. Our dataset contains 30,819 images, where the targets are labeled as three classes of “person”, “car”, and “drone” using Yolo format bounding boxes. We compared our dataset with those used in the literature for computer vision-related tasks, including image fusion, object detection, and image-to-image translation. The results showed that introducing complementary information through image fusion can compensate for missing details in the original images, and we also revealed the limitations of visual tasks in single-modal images with complex scenarios. Full article
(This article belongs to the Special Issue Recent Advances in Infrared Target Detection)
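
Since the CSMR targets are annotated with YOLO-format bounding boxes, a small parsing sketch may help; the class-index order and file layout below are assumptions for illustration, not taken from the dataset documentation.

```python
# Sketch of reading a YOLO-format label file (class cx cy w h, all normalized)
# and converting it to pixel-space boxes. The class-index order is an assumption.
from pathlib import Path

CLASSES = ["person", "car", "drone"]  # assumed index order

def load_yolo_labels(label_path: str, img_w: int, img_h: int):
    boxes = []
    for line in Path(label_path).read_text().splitlines():
        if not line.strip():
            continue
        cls, cx, cy, w, h = line.split()
        cx, cy, w, h = (float(v) for v in (cx, cy, w, h))
        x1 = (cx - w / 2) * img_w
        y1 = (cy - h / 2) * img_h
        boxes.append((CLASSES[int(cls)], x1, y1, x1 + w * img_w, y1 + h * img_h))
    return boxes
```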

24 pages, 1129 KiB  
Article
Infrared Image Generation Based on Visual State Space and Contrastive Learning
by Bing Li, Decao Ma, Fang He, Zhili Zhang, Daqiao Zhang and Shaopeng Li
Remote Sens. 2024, 16(20), 3817; https://doi.org/10.3390/rs16203817 - 14 Oct 2024
Cited by 1 | Viewed by 2091
Abstract
The preparation of infrared reference images is of great significance for improving the accuracy and precision of infrared imaging guidance. However, collecting infrared data on-site is difficult and time-consuming. Fortunately, the infrared images can be obtained from the corresponding visible-light images to enrich the infrared data. To this end, this present work proposes an image translation algorithm that converts visible-light images to infrared images. This algorithm, named V2IGAN, is founded on the visual state space attention module and multi-scale feature contrastive learning loss. Firstly, we introduce a visual state space attention module designed to sharpen the generative network’s focus on critical regions within visible-light images. This enhancement not only improves feature extraction but also bolsters the generator’s capacity to accurately model features, ultimately enhancing the quality of generated images. Furthermore, the method incorporates a multi-scale feature contrastive learning loss function, which serves to bolster the robustness of the model and refine the detail of the generated images. Experimental results show that the V2IGAN method outperforms existing typical infrared image generation techniques in both subjective visual assessments and objective metric evaluations. This suggests that the V2IGAN method is adept at enhancing the feature representation in images, refining the details of the generated infrared images, and yielding reliable, high-quality results. Full article
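
A hedged sketch of the multi-scale feature contrastive loss the abstract refers to, written as a generic patch-wise InfoNCE formulation; the feature extractor, sampling scheme, and temperature are placeholders rather than the V2IGAN implementation.

```python
# Generic patch-wise InfoNCE contrastive loss averaged over several feature scales.
import torch
import torch.nn.functional as F

def patch_nce(feat_src: torch.Tensor, feat_gen: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """feat_src, feat_gen: (N, C) feature vectors sampled at the same N locations."""
    src = F.normalize(feat_src, dim=1)
    gen = F.normalize(feat_gen, dim=1)
    logits = gen @ src.t() / tau            # (N, N): diagonal entries are positives
    targets = torch.arange(src.size(0), device=src.device)
    return F.cross_entropy(logits, targets)

def multi_scale_nce(feats_src: list, feats_gen: list) -> torch.Tensor:
    # Average the contrastive loss over features taken from several scales.
    return torch.stack([patch_nce(s, g) for s, g in zip(feats_src, feats_gen)]).mean()
```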

20 pages, 28541 KiB  
Article
IFSrNet: Multi-Scale IFS Feature-Guided Registration Network Using Multispectral Image-to-Image Translation
by Bowei Chen, Li Chen, Umara Khalid and Shuai Zhang
Electronics 2024, 13(12), 2240; https://doi.org/10.3390/electronics13122240 - 7 Jun 2024
Cited by 4 | Viewed by 1204
Abstract
Multispectral image registration is the process of aligning the spatial regions of two images with different distributions. One of the main challenges it faces is to resolve the severe inconsistencies between the reference and target images. This paper presents a novel multispectral image registration network, Multi-scale Intuitionistic Fuzzy Set Feature-guided Registration Network (IFSrNet), to address multispectral image registration. IFSrNet generates pseudo-infrared images from visible images using Cycle Generative Adversarial Network (CycleGAN), which is equipped with a multi-head attention module. An end-to-end registration network encodes the input multispectral images with intuitionistic fuzzification, which employs an improved feature descriptor—Intuitionistic Fuzzy Set–Scale-Invariant Feature Transform (IFS-SIFT)—to guide its operation. The results of the image registration will be presented in a direct output. For this task we have also designed specialised loss functions. The results of the experiment demonstrate that IFSrNet outperforms existing registration methods in the Visible–IR dataset. IFSrNet has the potential to be employed as a novel image-to-image translation paradigm. Full article
(This article belongs to the Topic Computer Vision and Image Processing, 2nd Edition)
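
As a rough illustration of the intuitionistic fuzzification step mentioned above, the sketch below uses a standard membership/non-membership/hesitation construction with a Sugeno-type complement; this is an assumed textbook form, not the IFSrNet encoder.

```python
# Minimal sketch of intuitionistic fuzzification of a grayscale image: a membership
# map from normalized intensity, a Sugeno-type non-membership, and the hesitation
# degree. One common textbook construction, assumed here for illustration.
import numpy as np

def intuitionistic_fuzzify(img: np.ndarray, lam: float = 0.5):
    """img: float array; returns (membership, non_membership, hesitation)."""
    mu = (img - img.min()) / (img.max() - img.min() + 1e-8)   # membership in [0, 1]
    nu = (1.0 - mu) / (1.0 + lam * mu)                        # Sugeno-type complement
    pi = 1.0 - mu - nu                                        # hesitation degree
    return mu, nu, pi
```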

29 pages, 2492 KiB  
Review
Emerging Technologies for Remote Sensing of Floating and Submerged Plastic Litter
by Lonneke Goddijn-Murphy, Victor Martínez-Vicente, Heidi M. Dierssen, Valentina Raimondi, Erio Gandini, Robert Foster and Ved Chirayath
Remote Sens. 2024, 16(10), 1770; https://doi.org/10.3390/rs16101770 - 16 May 2024
Cited by 16 | Viewed by 5995
Abstract
Most advances in the remote sensing of floating marine plastic litter have been made using passive remote-sensing techniques in the visible (VIS) to short-wave-infrared (SWIR) parts of the electromagnetic spectrum based on the spectral absorption features of plastic surfaces. In this paper, we present developments of new and emerging remote-sensing technologies of marine plastic litter such as passive techniques: fluid lensing, multi-angle polarimetry, and thermal infrared sensing (TIS); and active techniques: light detection and ranging (LiDAR), multispectral imaging detection and active reflectance (MiDAR), and radio detection and ranging (RADAR). Our review of the detection capabilities and limitations of the different sensing technologies shows that each has their own weaknesses and strengths, and that there is not one single sensing technique that applies to all kinds of marine litter under every different condition in the aquatic environment. Rather, we should focus on the synergy between different technologies to detect marine plastic litter and potentially the use of proxies to estimate its presence. Therefore, in addition to further developing remote-sensing techniques, more research is needed in the composition of marine litter and the relationships between marine plastic litter and their proxies. In this paper, we propose a common vocabulary to help the community to translate concepts among different disciplines and techniques. Full article
(This article belongs to the Section Environmental Remote Sensing)

21 pages, 7693 KiB  
Article
The Potential of Diffusion-Based Near-Infrared Image Colorization
by Ayk Borstelmann, Timm Haucke and Volker Steinhage
Sensors 2024, 24(5), 1565; https://doi.org/10.3390/s24051565 - 28 Feb 2024
Cited by 1 | Viewed by 2705
Abstract
Camera traps, an invaluable tool for biodiversity monitoring, capture wildlife activities day and night. In low-light conditions, near-infrared (NIR) imaging is commonly employed to capture images without disturbing animals. However, the reflection properties of NIR light differ from those of visible light in terms of chrominance and luminance, creating a notable gap in human perception. Thus, the objective is to enrich near-infrared images with colors, thereby bridging this domain gap. Conventional colorization techniques are ineffective due to the difference between NIR and visible light. Moreover, regular supervised learning methods cannot be applied because paired training data are rare. Solutions to such unpaired image-to-image translation problems currently commonly involve generative adversarial networks (GANs), but recently, diffusion models gained attention for their superior performance in various tasks. In response to this, we present a novel framework utilizing diffusion models for the colorization of NIR images. This framework allows efficient implementation of various methods for colorizing NIR images. We show NIR colorization is primarily controlled by the translation of the near-infrared intensities to those of visible light. The experimental evaluation of three implementations with increasing complexity shows that even a simple implementation inspired by visible-near-infrared (VIS-NIR) fusion rivals GANs. Moreover, we show that the third implementation is capable of outperforming GANs. With our study, we introduce an intersection field joining the research areas of diffusion models, NIR colorization, and VIS-NIR fusion. Full article
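
The conditioning idea described above can be sketched as follows, with the denoiser network and noise schedule left as placeholders; this is a generic DDPM-style training step that concatenates the clean NIR image with the noisy color channels, not the authors' framework.

```python
# Sketch of NIR-conditioned diffusion training: the denoiser sees the noisy color
# channels concatenated with the clean NIR image at every step and predicts the noise.
import torch
import torch.nn.functional as F

def training_step(denoiser, color: torch.Tensor, nir: torch.Tensor,
                  alphas_cumprod: torch.Tensor) -> torch.Tensor:
    """color: (B,3,H,W) target, nir: (B,1,H,W) condition, alphas_cumprod: (T,)."""
    b = color.size(0)
    t = torch.randint(0, alphas_cumprod.numel(), (b,), device=color.device)
    a_bar = alphas_cumprod[t].view(b, 1, 1, 1)
    noise = torch.randn_like(color)
    noisy = a_bar.sqrt() * color + (1 - a_bar).sqrt() * noise   # forward diffusion
    pred = denoiser(torch.cat([noisy, nir], dim=1), t)          # condition on NIR
    return F.mse_loss(pred, noise)                              # predict the noise
```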

25 pages, 12563 KiB  
Article
Nighttime Thermal Infrared Image Translation Integrating Visible Images
by Shihao Yang, Min Sun, Xiayin Lou, Hanjun Yang and Dong Liu
Remote Sens. 2024, 16(4), 666; https://doi.org/10.3390/rs16040666 - 13 Feb 2024
Cited by 3 | Viewed by 2692
Abstract
Nighttime Thermal InfraRed (NTIR) image colorization, also known as the translation of NTIR images into Daytime Color Visible (DCV) images, can facilitate human and intelligent system perception of nighttime scenes under weak lighting conditions. End-to-end neural networks have been used to learn the mapping relationship between the temperature and color domains and to translate single-channel NTIR images into three-channel DCV images. However, without constraints this mapping is an ill-posed problem with multiple solutions, resulting in blurred edges, color disorder, and semantic errors. To solve this problem, a two-step NTIR2DCV method is proposed: firstly, Nighttime Color Visible (NCV) images are fused with NTIR images using an Illumination-Aware, Multilevel Decomposition Latent Low-Rank Representation (IA-MDLatLRR) method, which accounts for differences in illumination conditions during fusion and adjusts the MDLatLRR fusion strategy accordingly to suppress the adverse effects of nighttime lights; secondly, the Nighttime Fused (NF) image is translated into a DCV image using a HyperDimensional Computing Generative Adversarial Network (HDC-GAN), which ensures feature-level semantic consistency between the source image (NF image) and the translated image (DCV image) without creating semantic label maps. Extensive comparative experiments and evaluation metric values show that the proposed algorithms outperform other State-Of-The-Art (SOTA) image fusion and translation methods: for example, FID and KID decreased by 14.1 and 18.9, respectively. Full article
(This article belongs to the Section Remote Sensing Image Processing)
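
The IA-MDLatLRR decomposition cannot be reconstructed from the abstract, but the illumination-aware weighting idea can be sketched: down-weight the visible contribution where strong nighttime lights dominate. All weights and thresholds below are illustrative assumptions, not the paper's fusion strategy.

```python
# Sketch of illumination-aware fusion weighting: estimate a smooth local
# illumination map from the nighttime visible image and reduce its contribution
# around bright light sources so thermal detail is preserved there.
import numpy as np
from scipy.ndimage import gaussian_filter

def illumination_aware_fuse(ncv: np.ndarray, ntir: np.ndarray,
                            sigma: float = 15.0, k: float = 8.0,
                            thresh: float = 0.6) -> np.ndarray:
    """ncv, ntir: registered float maps in [0, 1] (visible luminance, thermal)."""
    illum = gaussian_filter(ncv, sigma)                    # smooth local illumination map
    w_vis = 1.0 / (1.0 + np.exp(k * (illum - thresh)))     # down-weight glare/light sources
    return np.clip(w_vis * ncv + (1.0 - w_vis) * ntir, 0.0, 1.0)
```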

19 pages, 10488 KiB  
Article
VQ-InfraTrans: A Unified Framework for RGB-IR Translation with Hybrid Transformer
by Qiyang Sun, Xia Wang, Changda Yan and Xin Zhang
Remote Sens. 2023, 15(24), 5661; https://doi.org/10.3390/rs15245661 - 7 Dec 2023
Cited by 1 | Viewed by 4095
Abstract
Infrared (IR) images containing rich spectral information are essential in many fields. Most RGB-IR transfer work currently relies on conditional generative models to learn and train IR images for specific devices and scenes. However, these models only establish an empirical mapping relationship between RGB and IR images in a single dataset, which cannot achieve the multi-scene and multi-band (0.7–3 μm and 8–15 μm) transfer task. To address this challenge, we propose VQ-InfraTrans, a comprehensive framework for transferring images from the visible spectrum to the infrared spectrum. Our framework incorporates a multi-mode approach to RGB-IR image transferring, encompassing both unconditional and conditional transfers, achieving diverse and flexible image transformations. Instead of training individual models for each specific condition or dataset, we propose a two-stage transfer framework that integrates diverse requirements into a unified model that utilizes a composite encoder–decoder based on VQ-GAN, and a multi-path transformer to translate multi-modal images from RGB to infrared. To address the issue of significant errors in transferring specific targets due to their radiance, we have developed a hybrid editing module to precisely map spectral transfer information for specific local targets. The qualitative and quantitative comparisons conducted in this work reveal substantial enhancements compared to prior algorithms, as the objective evaluation metric SSIM (structural similarity index) was improved by 2.24% and the PSNR (peak signal-to-noise ratio) was improved by 2.71%. Full article
(This article belongs to the Special Issue Computer Vision and Image Processing in Remote Sensing)
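
A minimal sketch of the vector-quantization step at the heart of a VQ-GAN-style encoder-decoder, with codebook size and feature dimensions as placeholders; it shows only the nearest-codebook lookup and straight-through gradient, not the full VQ-InfraTrans pipeline.

```python
# Each encoder feature vector is replaced by its nearest codebook entry; the
# straight-through estimator lets gradients flow back to the encoder.
import torch

def vector_quantize(z: torch.Tensor, codebook: torch.Tensor):
    """z: (B, C, H, W) encoder output; codebook: (K, C). Returns quantized z and indices."""
    b, c, h, w = z.shape
    flat = z.permute(0, 2, 3, 1).reshape(-1, c)                  # (B*H*W, C)
    dists = torch.cdist(flat, codebook)                          # (B*H*W, K)
    idx = dists.argmin(dim=1)
    z_q = codebook[idx].reshape(b, h, w, c).permute(0, 3, 1, 2)
    z_q = z + (z_q - z).detach()                                 # straight-through gradient
    return z_q, idx.reshape(b, h, w)
```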

18 pages, 17294 KiB  
Article
Spectral Patterns of Pixels and Objects of the Forest Phytophysiognomies in the Anauá National Forest, Roraima State, Brazil
by Tiago Monteiro Condé, Niro Higuchi, Adriano José Nogueira Lima, Moacir Alberto Assis Campos, Jackelin Dias Condé, André Camargo de Oliveira and Dirceu Lucio Carneiro de Miranda
Ecologies 2023, 4(4), 686-703; https://doi.org/10.3390/ecologies4040045 - 28 Oct 2023
Cited by 1 | Viewed by 1567
Abstract
Forest phytophysiognomies have specific spatial patterns that can be mapped or translated into spectral patterns of vegetation. Regions of spectral similarity can be classified by reference to color, tonality or intensity of brightness, reflectance, texture, size, shape, neighborhood influence, etc. We evaluated the power of accuracy of supervised classification algorithms via per-pixel (maximum likelihood) and geographic object-based image analysis (GEOBIA) for distinguishing spectral patterns of the vegetation in the northern Brazilian Amazon. A total of 280 training samples (70%) and 120 validation samples (30%) of each of the 11 vegetation cover and land-use classes (N = 4400) were classified based on differences in their visible (RGB), near-infrared (NIR), and medium infrared (SWIR 1 or MIR) Landsat 8 (OLI) bands. Classification by pixels achieved a greater accuracy (Kappa = 0.75%) than GEOBIA (Kappa = 0.72%). GEOBIA, however, offers a greater plasticity and the possibility of calibrating the spectral rules associated with vegetation indices and spatial parameters. We conclude that both methods enabled precision spectral separations (0.45–1.65 μm), contributing to the distinctions between forest phytophysiognomies and land uses—strategic factors in the planning and management of natural resources in protected areas in the Amazon region. Full article
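
A compact sketch of per-pixel maximum-likelihood classification on a Landsat-style band stack, assuming one Gaussian per class; band order, class handling, and regularization are illustrative rather than the study's exact configuration.

```python
# Fit one multivariate Gaussian per class from training spectra, then assign each
# pixel to the class with the highest log-likelihood.
import numpy as np
from scipy.stats import multivariate_normal

def fit_classes(train_pixels: dict) -> dict:
    """train_pixels: {class_name: (N_c, n_bands) array of training spectra}."""
    return {c: (x.mean(axis=0), np.cov(x, rowvar=False)) for c, x in train_pixels.items()}

def classify(image: np.ndarray, models: dict) -> np.ndarray:
    """image: (H, W, n_bands). Returns an (H, W) array of class indices."""
    h, w, b = image.shape
    pixels = image.reshape(-1, b)
    scores = np.stack([multivariate_normal(mean=m, cov=s, allow_singular=True).logpdf(pixels)
                       for m, s in models.values()], axis=1)
    return scores.argmax(axis=1).reshape(h, w)
```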

17 pages, 5450 KiB  
Review
Near-Infrared-II Fluorophores for In Vivo Multichannel Biosensing
by Feng Ren, Tuanwei Li, Tingfeng Yao, Guangcun Chen, Chunyan Li and Qiangbin Wang
Chemosensors 2023, 11(8), 433; https://doi.org/10.3390/chemosensors11080433 - 4 Aug 2023
Cited by 4 | Viewed by 2500
Abstract
The pathological process involves a range of intrinsic biochemical markers. The detection of multiple biological parameters is imperative for providing precise diagnostic information on diseases. In vivo multichannel fluorescence biosensing facilitates the acquisition of biochemical information at different levels, such as tissue, cellular, and molecular, with rapid feedback, high sensitivity, and high spatiotemporal resolution. Notably, fluorescence imaging in the near-infrared-II (NIR-II) window (950–1700 nm) promises deeper optical penetration depth and diminished interferential autofluorescence compared with imaging in the visible (400–700 nm) and near-infrared-I (NIR-I, 700–950 nm) regions, making it a promising option for in vivo multichannel biosensing toward clinical practice. Furthermore, the use of advanced NIR-II fluorophores supports the development of biosensing with spectra-domain, lifetime-domain, and fluorescence-lifetime modes. This review summarizes the versatile designs and functions of NIR-II fluorophores for in vivo multichannel biosensing in various scenarios, including biological process monitoring, cellular tracking, and pathological analysis. Additionally, the review briefly discusses desirable traits required for the clinical translation of NIR-II fluorophores such as safety, long-wavelength emission, and clear components. Full article

22 pages, 7673 KiB  
Article
Enhanced Night-to-Day Image Conversion Using CycleGAN-Based Base-Detail Paired Training
by Dong-Min Son, Hyuk-Ju Kwon and Sung-Hak Lee
Mathematics 2023, 11(14), 3102; https://doi.org/10.3390/math11143102 - 13 Jul 2023
Cited by 16 | Viewed by 5620
Abstract
Numerous studies are underway to enhance the identification of surroundings in nighttime environments. These studies explore methods such as utilizing infrared images to improve night image visibility or converting night images into day-like representations for enhanced visibility. This research presents a technique focused on converting the road conditions depicted in night images to resemble daytime scenes. To facilitate this, a paired dataset is created by augmenting limited day and night image data using CycleGAN. The model is trained using both original night images and single-scale luminance transform (SLAT) day images to enhance the level of detail in the converted daytime images. However, the generated daytime images may exhibit sharpness and noise issues. To address these concerns, an image processing approach, inspired by the Stevens effect and local blurring, which align with visual characteristics, is employed to reduce noise and enhance image details. Consequently, this study contributes to improving the visibility of night images by means of day image conversion and subsequent image processing. The proposed night-to-day image translation in this study has a processing time of 0.81 s, including image processing, which is less than one second. Therefore, it is considered valuable as a module for daytime image translation. Additionally, the image quality assessment metric, BRISQUE, yielded a score of 19.8, indicating better performance compared to conventional methods. The outcomes of this research hold potential applications in fields such as CCTV surveillance systems and self-driving cars. Full article
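
SLAT itself is not defined in the abstract, so the sketch below substitutes a single-scale Retinex-style luminance transform as a hypothetical stand-in to indicate the kind of detail-emphasizing preprocessing used to build the paired training targets.

```python
# Hypothetical stand-in for the single-scale luminance transform: divide out a
# smooth illumination estimate in log space to emphasize local detail.
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(luma: np.ndarray, sigma: float = 80.0) -> np.ndarray:
    """luma: float luminance in (0, 1]; returns a detail-emphasizing transform of it."""
    eps = 1e-6
    r = np.log(luma + eps) - np.log(gaussian_filter(luma, sigma) + eps)
    return (r - r.min()) / (r.max() - r.min() + eps)    # rescale to [0, 1] for training
```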

23 pages, 12660 KiB  
Article
Eddies in the Arctic Ocean Revealed from MODIS Optical Imagery
by Evgeny A. Morozov and Igor E. Kozlov
Remote Sens. 2023, 15(6), 1608; https://doi.org/10.3390/rs15061608 - 15 Mar 2023
Cited by 3 | Viewed by 2703
Abstract
Here we investigate properties of ocean eddies in the key Arctic region of the northern Greenland Sea and the Fram Strait using visible and infrared Moderate Resolution Imaging Spectroradiometer (MODIS) Aqua data acquired from April to September in 2007 and 2018–2020. We infer eddy properties using visual identification and automated processing of their signatures in sea surface temperature (SST) and chlorophyll-a (chl-a) maps, and their gradients. Altogether, 450 (721) eddies were identified in SST (chl-a) data. Their radii span from 2 to 40 km (mean value 12 km). Most eddies are elliptical with a mean aspect ratio (eccentricity) of their axes equal 0.77 (0.64). Cyclones are smaller than anticyclones and prevail in both data sources. Cyclones tend to be more prevalent over shallow shelves, and anticyclones over deep water regions. Peak eddy activity is registered in June, while chl-a data also possess a second peak in April. In SST, the highest eddy probability is found along the East Greenland Current in the Nordbukta region at 76–78°N and along the West Spitsbergen Current at 78–80°N. In chl-a, most of them are observed in the central Fram Strait. The overall number of eddies with a positive chl-a anomaly, dominated by cyclones, is larger (62%) than that with a negative one (~38%). The number of eddies with positive and negative SST anomalies is nearly equal. Eddy translation velocities are 0.9–9.6 km/day (mean value 4.2 km/day). Despite frequent cloud and ice cover, MODIS data is a rich source of information on eddy generation hot-spots, their spatial properties, dynamics and associated SST and chl-a anomalies in the Arctic Ocean. Full article
(This article belongs to the Special Issue Remote Sensing of Polar Ocean, Sea Ice and Atmosphere Dynamics)
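
A small sketch of the gradient screening step implied by "and their gradients": compute the gradient magnitude of an SST or chl-a field and flag strong frontal pixels. The quantile threshold is an arbitrary illustration, not the study's criterion.

```python
# Gradient-magnitude screening of a 2-D SST or chlorophyll-a map for eddy signatures.
import numpy as np

def gradient_magnitude(field: np.ndarray) -> np.ndarray:
    """field: 2-D SST or log(chl-a) map, NaN over cloud/ice."""
    gy, gx = np.gradient(field)
    return np.hypot(gx, gy)

def frontal_mask(field: np.ndarray, quantile: float = 0.95) -> np.ndarray:
    g = gradient_magnitude(field)
    return g > np.nanquantile(g, quantile)   # flag the strongest gradients
```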

11 pages, 3708 KiB  
Article
MWIRGAN: Unsupervised Visible-to-MWIR Image Translation with Generative Adversarial Network
by Mohammad Shahab Uddin, Chiman Kwan and Jiang Li
Electronics 2023, 12(4), 1039; https://doi.org/10.3390/electronics12041039 - 20 Feb 2023
Cited by 10 | Viewed by 3568
Abstract
Unsupervised image-to-image translation techniques have been used in many applications, including visible-to-Long-Wave Infrared (visible-to-LWIR) image translation, but very few papers have explored visible-to-Mid-Wave Infrared (visible-to-MWIR) image translation. In this paper, we investigated unsupervised visible-to-MWIR image translation using generative adversarial networks (GANs). We proposed a new model named MWIRGAN for visible-to-MWIR image translation in a fully unsupervised manner. We utilized a perceptual loss to leverage shape identification and location changes of the objects in the translation. The experimental results showed that MWIRGAN was capable of visible-to-MWIR image translation while preserving the object’s shape with proper enhancement in the translated images and outperformed several competing state-of-the-art models. In addition, we customized the proposed model to convert game-engine-generated (a commercial software) images to MWIR images. The quantitative results showed that our proposed method could effectively generate MWIR images from game-engine-generated images, greatly benefiting MWIR data augmentation. Full article
(This article belongs to the Special Issue Feature Papers in Circuit and Signal Processing)
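
A hedged sketch of a VGG-based perceptual loss of the kind used to preserve object shape and location during translation; the layer cut-off and loss weighting are placeholders, not the MWIRGAN configuration.

```python
# Compare generated and reference images in the feature space of a frozen,
# pretrained VGG network instead of pixel space.
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

class PerceptualLoss(torch.nn.Module):
    def __init__(self, layer: int = 16):
        super().__init__()
        self.features = vgg16(weights=VGG16_Weights.IMAGENET1K_V1).features[:layer].eval()
        for p in self.features.parameters():
            p.requires_grad_(False)

    def forward(self, fake: torch.Tensor, real: torch.Tensor) -> torch.Tensor:
        # fake, real: (B, 3, H, W), normalized as VGG expects.
        return F.l1_loss(self.features(fake), self.features(real))
```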

14 pages, 2850 KiB  
Technical Note
An Unpaired Thermal Infrared Image Translation Method Using GMA-CycleGAN
by Shihao Yang, Min Sun, Xiayin Lou, Hanjun Yang and Hang Zhou
Remote Sens. 2023, 15(3), 663; https://doi.org/10.3390/rs15030663 - 22 Jan 2023
Cited by 18 | Viewed by 4901
Abstract
Automatically translating chromaticity-free thermal infrared (TIR) images into realistic color visible (CV) images is of great significance for autonomous vehicles, emergency rescue, robot navigation, nighttime video surveillance, and many other fields. Most recent designs use end-to-end neural networks to translate TIR directly to CV; however, compared to these networks, TIR has low contrast and an unclear texture for CV translation. Thus, directly translating the TIR temperature value of only one channel to the RGB color value of three channels without adding additional constraints or semantic information does not handle the one-to-three mapping problem between different domains in a good way, causing the translated CV images not only to have blurred edges but also color confusion. As for the methodology of the work, considering that in the translation from TIR to CV the most important process is to map information from the temperature domain into the color domain, an improved CycleGAN (GMA-CycleGAN) is proposed in this work in order to translate TIR images to grayscale visible (GV) images. Although the two domains have different properties, the numerical mapping is one-to-one, which reduces the color confusion caused by one-to-three mapping when translating TIR to CV. Then, a GV-CV translation network is applied to obtain CV images. Since the process of decomposing GV images into CV images is carried out in the same domain, edge blurring can be avoided. To enhance the boundary gradient between the object (pedestrian and vehicle) and the background, a mask attention module based on the TIR temperature mask and the CV semantic mask is designed without increasing the network parameters, and it is added to the feature encoding and decoding convolution layers of the CycleGAN generator. Moreover, a perceptual loss term is applied to the original CycleGAN loss function to bring the translated images closer to the real images regarding the space feature. In order to verify the effectiveness of the proposed method, the FLIR dataset is used for experiments, and the obtained results show that, compared to the state-of-the-art model, the subjective quality of the translated CV images obtained by the proposed method is better, as the objective evaluation metric FID (Fréchet inception distance) is reduced by 2.42 and the PSNR (peak signal-to-noise ratio) is improved by 1.43. Full article
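
The parameter-free mask-attention idea can be sketched as a simple re-weighting of feature maps by a resized object mask; shapes and the gain factor below are illustrative assumptions, not the GMA-CycleGAN module.

```python
# Resize a binary object mask (e.g. from the TIR temperature mask or a semantic
# mask) to the feature resolution and boost responses inside the masked objects,
# adding no trainable parameters.
import torch
import torch.nn.functional as F

def mask_attention(feat: torch.Tensor, mask: torch.Tensor, gain: float = 1.0) -> torch.Tensor:
    """feat: (B, C, H, W) features; mask: (B, 1, H0, W0) in [0, 1]."""
    m = F.interpolate(mask, size=feat.shape[-2:], mode="bilinear", align_corners=False)
    return feat * (1.0 + gain * m)
```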

27 pages, 3739 KiB  
Article
Visual Navigation Algorithm for Night Landing of Fixed-Wing Unmanned Aerial Vehicle
by Zhaoyang Wang, Dan Zhao and Yunfeng Cao
Aerospace 2022, 9(10), 615; https://doi.org/10.3390/aerospace9100615 - 17 Oct 2022
Cited by 15 | Viewed by 3444
Abstract
In recent years, visual navigation has been considered an effective mechanism for achieving autonomous landing of Unmanned Aerial Vehicles (UAVs). Nevertheless, owing to the limitations of visual cameras, the effectiveness of visual algorithms is significantly constrained by lighting conditions. Therefore, a novel vision-based navigation scheme is proposed for the night-time autonomous landing of a fixed-wing UAV. Firstly, because low-light images make the runway difficult to detect, a visible and infrared image fusion strategy is adopted. Objective functions relating the fused image to the visible image and to the infrared image are established; the fusion problem is then cast as the optimization of this objective, and the fused image is obtained by gradient descent. Secondly, to improve runway detection from the enhanced image, a detection algorithm based on an improved Faster region-based convolutional neural network (Faster R-CNN) is proposed: the runway ground-truth boxes in the dataset are statistically analyzed, and the size and number of anchors are redesigned to suit the runway detection background. Finally, a relative attitude and position estimation method for the UAV with respect to the landing runway is proposed: new coordinate reference systems are established, and six landing parameters (three attitude angles and three positions) are calculated by Orthogonal Iteration (OI). Simulation results reveal that the proposed algorithm achieves a 1.85% improvement in AP for runway detection, while the reprojection errors of rotation and translation for pose estimation are 0.675 and 0.581%, respectively. Full article
(This article belongs to the Special Issue Vision-Based UAV Navigation)
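
A minimal sketch of fusing visible and infrared frames by gradient descent on a simple objective (stay close to the visible intensities while matching the infrared gradients); the paper's actual objective functions are not reproduced here, and the weights and step size are illustrative.

```python
# Gradient descent on 0.5*||f - vis||^2 + 0.5*lam*||grad f - grad ir||^2 (continuous-form
# gradient, approximated with finite differences).
import numpy as np

def grads(img: np.ndarray):
    gy, gx = np.gradient(img)
    return gx, gy

def fuse_by_descent(vis: np.ndarray, ir: np.ndarray, lam: float = 2.0,
                    lr: float = 0.2, iters: int = 200) -> np.ndarray:
    """vis, ir: registered float images in [0, 1]."""
    f = vis.copy()
    ir_gx, ir_gy = grads(ir)
    for _ in range(iters):
        f_gx, f_gy = grads(f)
        # Divergence of the gradient mismatch, then the full descent direction.
        div = np.gradient(f_gx - ir_gx)[1] + np.gradient(f_gy - ir_gy)[0]
        step = (f - vis) - lam * div
        f = np.clip(f - lr * step, 0.0, 1.0)
    return f
```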
