Search Results (74)

Search Parameters:
Keywords = video super resolution

22 pages, 7778 KiB  
Article
Gas Leak Detection and Leakage Rate Identification in Underground Utility Tunnels Using a Convolutional Recurrent Neural Network
by Ziyang Jiang, Canghai Zhang, Zhao Xu and Wenbin Song
Appl. Sci. 2025, 15(14), 8022; https://doi.org/10.3390/app15148022 - 18 Jul 2025
Viewed by 302
Abstract
An underground utility tunnel (UUT) is essential for the efficient use of urban underground space. However, current maintenance systems rely on patrol personnel and professional equipment. This study explores industrial detection methods for identifying and monitoring natural gas leaks in UUTs. Data were acquired via infrared thermal imaging gas experiments and a dataset was established. To address the low resolution of existing imaging devices, video super-resolution (VSR) was used to improve data quality. Based on a convolutional recurrent neural network (CRNN), image features were extracted at each moment and the time-series data were modeled, enabling automatic classification of the leakage rate and, from it, a risk-level classification mechanism. The experimental results show that classification accuracy was highest, reaching 0.98, when the sliding window size was set to 10 frames. This method improves early-warning precision and response efficiency, offering practical technical support for UUT maintenance management.
(This article belongs to the Special Issue Applications of Artificial Intelligence in Industrial Engineering)
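
As a rough illustration of the pipeline described above, the sketch below pairs a per-frame convolutional feature extractor with a GRU over a 10-frame sliding window. The layer widths and the number of leakage-rate classes are hypothetical stand-ins, not the paper's architecture.

```python
import torch
import torch.nn as nn

class CRNN(nn.Module):
    def __init__(self, num_classes=4, feat_dim=128, hidden=64):  # class count is assumed
        super().__init__()
        # Per-frame convolutional feature extractor (hypothetical depth/widths).
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # Recurrent model over the sliding window of per-frame features.
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, clip):                   # clip: (B, T, 1, H, W)
        b, t = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1))   # (B*T, feat_dim)
        _, h = self.rnn(feats.view(b, t, -1))  # h: (layers, B, hidden)
        return self.head(h[-1])                # leakage-rate class logits

logits = CRNN()(torch.randn(2, 10, 1, 64, 64))  # 10-frame window, as in the abstract
```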

17 pages, 7786 KiB  
Article
Video Coding Based on Ladder Subband Recovery and ResGroup Module
by Libo Wei, Aolin Zhang, Lei Liu, Jun Wang and Shuai Wang
Entropy 2025, 27(7), 734; https://doi.org/10.3390/e27070734 - 8 Jul 2025
Viewed by 341
Abstract
With the rapid development of video encoding technology in the field of computer vision, the demand for tasks such as video frame reconstruction, denoising, and super-resolution has been continuously increasing. However, traditional video encoding methods typically focus on extracting spatial or temporal domain information, often facing challenges of insufficient accuracy and information loss when reconstructing high-frequency details, edges, and textures of images. To address this issue, this paper proposes an innovative LadderConv framework, which combines discrete wavelet transform (DWT) with spatial and channel attention mechanisms. By progressively recovering wavelet subbands, it effectively enhances the video frame encoding quality. Specifically, the LadderConv framework adopts a stepwise recovery approach for wavelet subbands, first processing high-frequency detail subbands with relatively less information, then enhancing the interaction between these subbands, and ultimately synthesizing a high-quality reconstructed image through inverse wavelet transform. Moreover, the framework introduces spatial and channel attention mechanisms, which further strengthen the focus on key regions and channel features, leading to notable improvements in detail restoration and image reconstruction accuracy. To optimize the performance of the LadderConv framework, particularly in detail recovery and high-frequency information extraction tasks, this paper designs an innovative ResGroup module. By using multi-layer convolution operations along with feature map compression and recovery, the ResGroup module enhances the network's expressive capability and effectively reduces computational complexity. The ResGroup module captures multi-level features from low level to high level and retains rich feature information through residual connections, thus improving the overall reconstruction performance of the model. In experiments, the combination of the LadderConv framework and the ResGroup module demonstrates superior performance in video frame reconstruction tasks, particularly in recovering high-frequency information, image clarity, and detail representation.
(This article belongs to the Special Issue Rethinking Representation Learning in the Age of Large Models)
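
The stepwise subband recovery can be illustrated with a plain Haar DWT via PyWavelets. The learned enhancement and attention modules of the paper are replaced with an identity placeholder, so this is a structural sketch only.

```python
import numpy as np
import pywt

def enhance(subband):
    # Placeholder for the learned per-subband enhancement (hypothetical).
    return subband

frame = np.random.rand(256, 256).astype(np.float32)  # one luma frame

# Decompose into the low-frequency approximation and three detail subbands.
ll, (lh, hl, hh) = pywt.dwt2(frame, 'haar')

# "Ladder" order: refine the sparse high-frequency detail subbands first,
# then the approximation, before synthesis via the inverse transform.
lh, hl, hh = enhance(lh), enhance(hl), enhance(hh)
ll = enhance(ll)
recon = pywt.idwt2((ll, (lh, hl, hh)), 'haar')
assert recon.shape == frame.shape
```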

20 pages, 2149 KiB  
Article
Accelerating Facial Image Super-Resolution via Sparse Momentum and Encoder State Reuse
by Kerang Cao, Na Bao, Shuai Zheng, Ye Liu and Xing Wang
Electronics 2025, 14(13), 2616; https://doi.org/10.3390/electronics14132616 - 28 Jun 2025
Viewed by 417
Abstract
Single image super-resolution (SISR) aims to reconstruct high-quality images from low-resolution inputs, a persistent challenge in computer vision with critical applications in medical imaging, satellite imagery, and video enhancement. Traditional diffusion model-based (DM-based) methods, while effective in restoring fine details, suffer from computational inefficiency due to their iterative denoising process. To address this, we introduce the Sparse Momentum-based Faster Diffusion Model (SMFDM), designed for rapid and high-fidelity super-resolution. SMFDM integrates a novel encoder state reuse mechanism that selectively omits non-critical time steps during the denoising phase, significantly reducing computational redundancy. Additionally, the model employs a sparse momentum mechanism, enabling robust representation capabilities while utilizing only a fraction of the original model weights. Experiments demonstrate that SMFDM achieves an impressive 71.04% acceleration in the diffusion process, requiring only 15% of the original weights, while maintaining high-quality outputs with effective preservation of image details and textures. Our work highlights the potential of combining sparse learning and efficient sampling strategies to enhance the practical applicability of diffusion models for super-resolution tasks.
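
One way to picture encoder state reuse is to run the U-Net encoder only on key denoising steps and reuse the cached states in between. The refresh schedule and the update rule below are stand-ins; SMFDM's actual criterion for selecting non-critical steps is not shown.

```python
import torch

def sample(encoder, decoder, x, timesteps, refresh_every=4):
    cached = None
    for i, t in enumerate(timesteps):
        if cached is None or i % refresh_every == 0:
            cached = encoder(x, t)        # full encoder pass on key steps only
        eps = decoder(x, t, cached)       # reuse cached encoder states elsewhere
        x = x - 0.01 * eps                # stand-in for the real DDPM/DDIM update
    return x

# Dummy callables so the sketch runs end to end.
enc = lambda x, t: x.mean(dim=(2, 3))
dec = lambda x, t, h: torch.zeros_like(x)
out = sample(enc, dec, torch.randn(1, 3, 32, 32), range(50))
```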

28 pages, 5387 KiB  
Article
A Deep Learning Framework of Super Resolution for License Plate Recognition in Surveillance System
by Pei-Fen Tsai, Jia-Yin Shiu and Shyan-Ming Yuan
Mathematics 2025, 13(10), 1673; https://doi.org/10.3390/math13101673 - 20 May 2025
Viewed by 1407
Abstract
Recognizing low-resolution license plates from real-world scenes remains a challenging task. While deep learning-based super-resolution methods have been widely applied, most existing datasets rely on artificially degraded images, and common quality metrics poorly correlate with OCR accuracy. We construct a new paired low- and high-resolution license plate dataset from dashcam videos and propose a specialized super-resolution framework for license plate recognition. Only low-resolution images with OCR accuracy ≥5 are used to ensure sufficient feature information for effective perceptual learning. We analyze existing loss functions and introduce two novel perceptual losses—one CNN-based and one Transformer-based. Our approach improves recognition performance, achieving an average OCR accuracy of 85.14%.
(This article belongs to the Section E1: Mathematics and Computer Science)
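
A CNN-based perceptual loss in the spirit of the loss analysis above can be sketched with VGG-16 features as the perceptual space. The paper's own CNN- and Transformer-based losses are defined differently; treat this as a generic baseline.

```python
import torch
import torch.nn.functional as F
from torchvision.models import vgg16, VGG16_Weights

feats = vgg16(weights=VGG16_Weights.DEFAULT).features[:16].eval()
for p in feats.parameters():
    p.requires_grad_(False)

def perceptual_loss(sr, hr):
    # Distance between deep feature maps of the super-resolved and HR plates.
    return F.l1_loss(feats(sr), feats(hr))

loss = perceptual_loss(torch.rand(1, 3, 48, 96), torch.rand(1, 3, 48, 96))
```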

13 pages, 3165 KiB  
Article
Self-Supervised Infrared Video Super-Resolution Based on Deformable Convolution
by Jian Chen, Yan Zhao, Mo Chen, Yuwei Wang and Xin Ye
Electronics 2025, 14(10), 1995; https://doi.org/10.3390/electronics14101995 - 14 May 2025
Viewed by 476
Abstract
Infrared video often suffers from low resolution, which makes target detection and recognition difficult. Super-resolution (SR) is an effective technology for enhancing the resolution of infrared video. However, existing SR methods for infrared imagery are essentially single-image SR, which limits performance by ignoring the strong inter-frame correlation in video. We propose a self-supervised SR method for infrared video that estimates the blur kernel and generates paired data from the raw low-resolution infrared video itself, without requiring additional high-resolution videos for supervision. Furthermore, to overcome the limitations of optical flow prediction in handling complex motion, a deformable convolutional network is introduced to adaptively learn motion information and capture more accurate, tiny motion changes between adjacent frames in an infrared video. Experimental results show that the proposed method achieves outstanding restoration performance in both visual quality and quantitative metrics.
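
A minimal sketch of deformable-convolution alignment between adjacent frames, assuming torchvision's DeformConv2d with offsets predicted from the concatenated neighbor and reference features, a common pattern in VSR; the paper's exact network is not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AlignBlock(nn.Module):
    def __init__(self, c=32, k=3):
        super().__init__()
        # Predict a (2 * k * k)-channel offset field from both frames' features.
        self.offset = nn.Conv2d(2 * c, 2 * k * k, 3, padding=1)
        self.dconv = DeformConv2d(c, c, k, padding=k // 2)

    def forward(self, neighbor, reference):
        off = self.offset(torch.cat([neighbor, reference], dim=1))
        return self.dconv(neighbor, off)  # neighbor features warped toward reference

aligned = AlignBlock()(torch.randn(1, 32, 64, 64), torch.randn(1, 32, 64, 64))
```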

29 pages, 2763 KiB  
Review
A Review of Computer Vision Technology for Football Videos
by Fucheng Zheng, Duaa Zuhair Al-Hamid, Peter Han Joo Chong, Cheng Yang and Xue Jun Li
Information 2025, 16(5), 355; https://doi.org/10.3390/info16050355 - 28 Apr 2025
Viewed by 1521
Abstract
In the era of digital advancement, the integration of Deep Learning (DL) algorithms is revolutionizing performance monitoring in football. Because monitoring devices are restricted during games to prevent unfair advantages, coaches must analyze players' movements and performance visually. As a result, Computer Vision (CV) technology has emerged as a vital non-contact tool for performance analysis, offering numerous opportunities to enhance the clarity, accuracy, and intelligence of sports event observations. However, existing CV studies in football face critical challenges, including low-resolution imagery of distant players and balls, severe occlusion in crowded scenes, motion blur during rapid movements, and the lack of large-scale annotated datasets tailored for dynamic football scenarios. This review fills this gap by comprehensively analyzing advancements in CV in four key areas: player/ball detection and tracking, motion prediction, tactical analysis, and event detection in football. By exploring these areas, this review offers valuable insights for future research on using CV technology to improve sports performance. Future directions should prioritize super-resolution techniques to enhance video quality and improve small-object detection, collaborative efforts to build diverse and richly annotated datasets, and the integration of contextual game information (e.g., score differentials and time remaining) to improve predictive models. The in-depth analysis of current State-Of-The-Art (SOTA) CV techniques provides researchers with a detailed reference for developing robust and intelligent CV systems in football.
(This article belongs to the Special Issue AI-Based Image Processing and Computer Vision)

19 pages, 2647 KiB  
Article
FDI-VSR: Video Super-Resolution Through Frequency-Domain Integration and Dynamic Offset Estimation
by Donghun Lim and Janghoon Choi
Sensors 2025, 25(8), 2402; https://doi.org/10.3390/s25082402 - 10 Apr 2025
Cited by 1 | Viewed by 909
Abstract
The increasing adoption of high-resolution imaging sensors across various fields has led to a growing demand for techniques to enhance video quality. Video super-resolution (VSR) addresses this need by reconstructing high-resolution videos from lower-resolution inputs; however, directly applying single-image super-resolution (SISR) methods to video sequences neglects temporal information, resulting in inconsistent and unnatural outputs. In this paper, we propose FDI-VSR, a novel framework that integrates spatiotemporal dynamics and frequency-domain analysis into conventional SISR models without extensive modifications. We introduce two key modules: the Spatiotemporal Feature Extraction Module (STFEM), which employs dynamic offset estimation, spatial alignment, and multi-stage temporal aggregation using residual channel attention blocks (RCABs); and the Frequency–Spatial Integration Module (FSIM), which transforms deep features into the frequency domain to effectively capture global context beyond the limited receptive field of standard convolutions. Extensive experiments on the Vid4, SPMCs, REDS4, and UDM10 benchmarks, supported by detailed ablation studies, demonstrate that FDI-VSR not only surpasses conventional VSR methods but also achieves competitive results compared to recent state-of-the-art methods, with improvements of up to 0.82 dB in PSNR on the SPMCs benchmark and notable reductions in visual artifacts, all while maintaining lower computational complexity and faster inference.
(This article belongs to the Section Sensing and Imaging)
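
Capturing global context in the frequency domain can be sketched as an FFT of the deep features, a learned per-frequency modulation, and an inverse FFT. FSIM's real structure is more elaborate; the filter below is a placeholder.

```python
import torch
import torch.nn as nn

class FreqSpatial(nn.Module):
    def __init__(self, c=64, h=32, w=32):
        super().__init__()
        # Learnable complex filter over the half-spectrum (hypothetical shape).
        self.weight = nn.Parameter(torch.ones(c, h, w // 2 + 1, dtype=torch.cfloat))

    def forward(self, x):                        # x: (B, C, H, W) deep features
        spec = torch.fft.rfft2(x, norm='ortho')  # global receptive field in one step
        spec = spec * self.weight                # frequency-domain modulation
        return torch.fft.irfft2(spec, s=x.shape[-2:], norm='ortho') + x

y = FreqSpatial()(torch.randn(2, 64, 32, 32))
```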

14 pages, 4638 KiB  
Article
LightVSR: A Lightweight Video Super-Resolution Model with Multi-Scale Feature Aggregation
by Guanglun Huang, Nachuan Li, Jianming Liu, Minghe Zhang, Li Zhang and Jun Li
Appl. Sci. 2025, 15(3), 1506; https://doi.org/10.3390/app15031506 - 1 Feb 2025
Cited by 1 | Viewed by 2013
Abstract
Video super-resolution aims to generate high-resolution video sequences with realistic details from existing low-resolution video sequences. However, most existing video super-resolution models require substantial computational power and are not suitable for resource-constrained devices such as smartphones and tablets. In this paper, we propose a lightweight video super-resolution (LightVSR) model that employs a novel feature aggregation module to enhance video quality by efficiently reconstructing high-resolution frames from compressed low-resolution inputs. LightVSR integrates several novel mechanisms, including head-tail convolution, cross-layer shortcut connections, and multi-input attention, to enhance computational efficiency while guaranteeing video super-resolution performance. Extensive experiments show that LightVSR achieves a frame rate of 28.57 FPS and a PSNR of 39.25 dB on the UDM10 dataset and 36.91 dB on the Vimeo-90k dataset, validating its efficiency and effectiveness.

13 pages, 22601 KiB  
Article
Lightweight Reference-Based Video Super-Resolution Using Deformable Convolution
by Tomo Miyazaki, Zirui Guo and Shinichiro Omachi
Information 2024, 15(11), 718; https://doi.org/10.3390/info15110718 - 8 Nov 2024
Viewed by 1680
Abstract
Super-resolution is a technique for generating a high-resolution image or video from a low-resolution counterpart by predicting natural and realistic texture information. It has various applications, such as medical image analysis, surveillance, and remote sensing. However, traditional single-image super-resolution methods can produce blurry results. Reference-based super-resolution methods have been proposed to recover detailed information accurately: in addition to the low-resolution input image, a high-resolution image is used as a reference, and high-resolution textures are transferred from it to produce visually pleasing results. However, this requires texture alignment between the low-resolution and reference images, which generally costs considerable time and memory. This paper proposes a lightweight reference-based video super-resolution method using deformable convolution, making reference-based super-resolution a technology that can be used even in environments with limited computational resources. To verify its effectiveness, we conducted experiments comparing the proposed method with baseline methods in runtime and memory usage, in addition to accuracy. The results showed that the proposed method restored a high-quality super-resolved image from a very low-resolution input in 0.0138 s using two NVIDIA RTX 2080 GPUs, much faster than the representative method.
(This article belongs to the Special Issue Deep Learning for Image, Video and Signal Processing)

15 pages, 1240 KiB  
Article
Position-Guided Multi-Head Alignment and Fusion for Video Super-Resolution
by Yanbo Gao, Xun Cai, Shuai Li, Jiajing Chai and Chuankun Li
Electronics 2024, 13(22), 4372; https://doi.org/10.3390/electronics13224372 - 7 Nov 2024
Cited by 1 | Viewed by 1019
Abstract
Video super-resolution (VSR), which takes advantage of multiple low-resolution (LR) video frames to reconstruct the corresponding high-resolution (HR) frames, has attracted increasing interest. To upsample an LR frame (the reference frame), VSR methods usually first align multiple neighboring frames (the supporting frames) to it in order to provide more relevant information. Existing VSR methods usually employ deformable convolution for this alignment, where the whole supporting frame is aligned to the reference frame without a specific target and without supervision. The aligned features are thus not explicitly learned to provide HR frame information and cannot fully exploit the supporting frames. To address this problem, we propose a novel video super-resolution framework with Position-Guided Multi-Head Alignment, termed PGMH-A, which explicitly aligns the supporting frames to different spatial positions of the HR frame (denoted as different heads). It injects explicit position information to obtain multi-head-aligned features of the supporting frames that better formulate the HR frame. PGMH-A can be trained individually or end-to-end with the ground-truth HR frames. Moreover, a Position-Guided Multi-Head Fusion, termed PGMH-F, is developed based on the attention mechanism to further fuse spatial-temporal information across temporal supporting frames, across the multiple heads corresponding to different spatial positions of an HR frame, and across channels. Together, the proposed Position-Guided Multi-Head Alignment and Fusion (PGMH-AF) provides VSR with better local details and temporal coherence. Experimental results demonstrate that the proposed method outperforms state-of-the-art VSR networks. Ablation studies verify the effectiveness of the proposed modules.
(This article belongs to the Special Issue Challenges and Applications in Multimedia and Visual Computing)
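
A speculative reading of the multi-head idea: each head targets one sub-pixel position of the HR grid (four heads for 2x upscaling) and the heads are composed with PixelShuffle. Plain convolutions stand in for the paper's alignment modules, so this sketch is illustrative only.

```python
import torch
import torch.nn as nn

class MultiHeadAlign(nn.Module):
    def __init__(self, c=32, scale=2):
        super().__init__()
        # One stand-in "alignment" head per sub-pixel position of the HR frame.
        self.heads = nn.ModuleList(
            [nn.Conv2d(2 * c, c, 3, padding=1) for _ in range(scale * scale)]
        )
        self.shuffle = nn.PixelShuffle(scale)

    def forward(self, support, reference):
        x = torch.cat([support, reference], dim=1)
        per_head = torch.cat([h(x) for h in self.heads], dim=1)  # (B, 4c, H, W)
        return self.shuffle(per_head)                            # (B, c, 2H, 2W)

hr_feat = MultiHeadAlign()(torch.randn(1, 32, 24, 24), torch.randn(1, 32, 24, 24))
```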

15 pages, 6862 KiB  
Article
Detection and Tracking of Low-Frame-Rate Water Surface Dynamic Multi-Target Based on the YOLOv7-DeepSORT Fusion Algorithm
by Xingcheng Han, Shiwen Fu and Junxuan Han
J. Mar. Sci. Eng. 2024, 12(9), 1528; https://doi.org/10.3390/jmse12091528 - 3 Sep 2024
Cited by 4 | Viewed by 1398
Abstract
This study addresses a tracking problem in which cruising ships or submarines sailing near the water surface are captured at low frame rates or with frames missing from the video, so that tracked targets move a large distance between frames, degrading tracking accuracy and efficiency. We propose a water-surface dynamic multi-target tracking algorithm based on the fusion of YOLOv7 and DeepSORT. The algorithm first introduces a super-resolution reconstruction network, which removes the interference of clouds and waves to improve the quality of tracking-target images and clarify target characteristics. Then, the shuffle attention module is introduced into YOLOv7 to enhance feature extraction for the target in the recognition network. Finally, Euclidean distance matching replaces IOU distance matching in the cascade matching of the DeepSORT algorithm to improve tracking accuracy. Simulation results showed a good tracking effect: the improved YOLOv7 model gains 9.4% in mAP50-95, and the DeepSORT tracking network improves tracking accuracy by 13.1% compared with SORT.
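
The matching change described above can be sketched as a linear assignment over a Euclidean center-distance cost matrix in place of an IOU cost. The gating threshold is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_by_center_distance(tracks, detections, max_dist=50.0):
    # tracks, detections: arrays of boxes (x, y, w, h); cost = center distance.
    t_centers = tracks[:, :2] + tracks[:, 2:] / 2
    d_centers = detections[:, :2] + detections[:, 2:] / 2
    cost = np.linalg.norm(t_centers[:, None] - d_centers[None, :], axis=2)
    rows, cols = linear_sum_assignment(cost)
    # Keep only pairs closer than the gating threshold.
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] <= max_dist]

pairs = match_by_center_distance(
    np.array([[10.0, 10.0, 20.0, 20.0]]), np.array([[14.0, 12.0, 20.0, 20.0]])
)
```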

24 pages, 1883 KiB  
Review
Applications of GANs to Aid Target Detection in SAR Operations: A Systematic Literature Review
by Vinícius Correa, Peter Funk, Nils Sundelius, Rickard Sohlberg and Alexandre Ramos
Drones 2024, 8(9), 448; https://doi.org/10.3390/drones8090448 - 31 Aug 2024
Cited by 4 | Viewed by 2964
Abstract
Research on unmanned aerial vehicles (UAVs) for search and rescue (SAR) missions is widespread due to their cost-effectiveness and the security and flexibility they add to operations. However, a significant challenge arises from sensor quality, terrain variability, noise, and the sizes of targets in the images and videos they capture. Generative adversarial networks (GANs), introduced by Ian Goodfellow, and their variants can offer excellent solutions for improving sensor output quality with respect to super-resolution, noise removal, and other image processing issues. To identify new insights and guidance on how to apply GANs to detect living beings in SAR operations, a PRISMA-oriented systematic literature review was conducted on primary studies that explore the use of GANs for edge or object detection in images captured by drones. The results document how GAN algorithms are used for image enhancement ahead of object detection, along with the metrics employed for tool validation, and provide guidance on how to apply or modify them to aid target identification during search stages.
(This article belongs to the Special Issue UAV Detection, Classification, and Tracking)

34 pages, 4902 KiB  
Review
A Survey on Visual Mamba
by Hanwei Zhang, Ying Zhu, Dan Wang, Lijun Zhang, Tianxiang Chen, Ziyang Wang and Zi Ye
Appl. Sci. 2024, 14(13), 5683; https://doi.org/10.3390/app14135683 - 28 Jun 2024
Cited by 54 | Viewed by 13885
Abstract
State space models (SSMs) with selection mechanisms and hardware-aware architectures, namely Mamba, have recently shown significant potential in long-sequence modeling. Since the complexity of the transformer's self-attention mechanism grows quadratically with image size, bringing ever-increasing computational demands, researchers are currently exploring how to adapt Mamba for computer vision tasks. This paper is the first comprehensive survey aiming to provide an in-depth analysis of Mamba models in the domain of computer vision. It begins by exploring the foundational concepts contributing to Mamba's success, including the SSM framework, selection mechanisms, and hardware-aware design. We then review vision Mamba models, categorizing them into foundational models and those enhanced with techniques such as convolution, recurrence, and attention. Furthermore, we investigate the widespread applications of Mamba in vision tasks, including its use as a backbone at various levels of vision processing: general visual tasks, medical visual tasks (e.g., 2D/3D segmentation, classification, and image registration), and remote sensing visual tasks. In particular, we introduce general visual tasks at two levels: high/mid-level vision (e.g., object detection, segmentation, and video classification) and low-level vision (e.g., image super-resolution, image restoration, and visual generation). We hope this endeavor will spark additional interest within the community to address current challenges and further apply Mamba models in computer vision.
(This article belongs to the Special Issue Application of Artificial Intelligence in Visual Processing)
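
A toy version of the SSM recurrence underlying Mamba-style models: a discretized linear state update scanned over the sequence, linear in sequence length. Input-dependent selection of B, C, and the step size (the "selection mechanism") is omitted for brevity.

```python
import torch

def ssm_scan(x, A, B, C, dt=0.1):
    # x: (T, d_in); A: (n, n); B: (n, d_in); C: (d_out, n)
    Ad = torch.matrix_exp(A * dt)   # zero-order-hold discretization of A
    Bd = dt * B                     # simple Euler discretization of B
    h = torch.zeros(A.shape[0])
    ys = []
    for xt in x:                    # the scan: one state update per token
        h = Ad @ h + Bd @ xt
        ys.append(C @ h)
    return torch.stack(ys)

y = ssm_scan(torch.randn(16, 4), -torch.eye(8), torch.randn(8, 4), torch.randn(2, 8))
```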

17 pages, 4537 KiB  
Article
Video Multi-Scale-Based End-to-End Rate Control in Deep Contextual Video Compression
by Lili Wei, Zhenglong Yang, Hua Zhang, Xinyu Liu, Weihao Deng and Youchao Zhang
Appl. Sci. 2024, 14(13), 5573; https://doi.org/10.3390/app14135573 - 26 Jun 2024
Cited by 1 | Viewed by 1518
Abstract
In recent years, video data have grown in size, creating enormous transmission pressure. Rate control plays an important role in stabilizing video stream transmission by balancing the rate and distortion of video compression. To achieve high-quality video over low-bandwidth transmission, video multi-scale-based end-to-end rate control is proposed. First, to reduce the video data, the original video is downsampled with multi-scale bicubic filtering and used as the input. Then, the end-to-end rate control model is implemented: fully exploiting temporal coding correlation, a two-branch residual-based network and a two-branch regression-based network are designed to obtain the optimal bit rate ratio and Lagrange multiplier λ for rate control. To restore high-resolution video, a hybrid efficient distillation SISR network (HEDS-Net) is designed to build low-resolution-to-high-resolution feature dependencies, in which a multi-branch distillation network, a lightweight attention LCA block, and an upsampling network transmit deeply extracted frame features, enhance feature expression, and improve image detail restoration, respectively. The experimental results show that the PSNR and SSIM BD-rates of the proposed multi-scale-based end-to-end rate control are −1.24% and −0.50%, respectively, with 1.82% rate control accuracy.
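
The trade-off such rate control optimizes can be illustrated with the classic Lagrangian cost J = D + λ·R: the multiplier λ steers the balance between spent bits and residual distortion. The candidate numbers below are illustrative only.

```python
# Score each candidate coding option with J = D + lambda * R and pick the best.
candidates = [
    {"rate_kbps": 800, "distortion_mse": 12.0},
    {"rate_kbps": 1200, "distortion_mse": 7.5},
    {"rate_kbps": 2000, "distortion_mse": 5.1},
]

def best_option(options, lam):
    return min(options, key=lambda o: o["distortion_mse"] + lam * o["rate_kbps"])

print(best_option(candidates, lam=0.001))  # small lambda favors quality
print(best_option(candidates, lam=0.01))   # large lambda favors bit savings
```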

24 pages, 96595 KiB  
Article
Modified ESRGAN with Uformer for Video Satellite Imagery Super-Resolution
by Kinga Karwowska and Damian Wierzbicki
Remote Sens. 2024, 16(11), 1926; https://doi.org/10.3390/rs16111926 - 27 May 2024
Cited by 2 | Viewed by 2149
Abstract
In recent years, a growing number of sensors providing imagery with constantly increasing spatial resolution have been placed in orbit. Contemporary Very-High-Resolution Satellites (VHRS) can record images with a spatial resolution of less than 0.30 m, but until now such scenes have been acquired statically. The dynamic acquisition of video satellite imagery has been available for only a few years and has multiple applications in remote sensing. However, despite the possibility it offers to detect dynamic targets, its main limitation is the degraded spatial resolution that results from imaging in video mode, along with a significant influence of lossy compression. This article presents a methodology that employs Generative Adversarial Networks (GANs): a modified ESRGAN architecture is used for the spatial resolution enhancement of video satellite images, with the GAN generator extended by the Uformer model, which is responsible for a significant improvement in the quality of the estimated SR images. This substantially improves the ability to recognize and detect objects. The solution was tested on the Jilin-1 dataset and achieves the best results in both the global and local assessment of the image (mean SSIM and PSNR over the test data of 0.98 and 38.32 dB, respectively). Additionally, although it employs artificial neural networks, the proposed solution does not require high computational capacity and can be run on workstations without graphics processors.
(This article belongs to the Section Remote Sensing Image Processing)
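
The two global metrics reported above can be computed with scikit-image as follows; sr and hr stand in for a super-resolved frame and its reference.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

hr = np.random.rand(256, 256).astype(np.float32)
sr = np.clip(hr + np.random.normal(0, 0.01, hr.shape), 0, 1).astype(np.float32)

psnr = peak_signal_noise_ratio(hr, sr, data_range=1.0)
ssim = structural_similarity(hr, sr, data_range=1.0)
print(f"PSNR = {psnr:.2f} dB, SSIM = {ssim:.4f}")
```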
