Article

SupGAN: A General Super-Resolution GAN-Promoting Training Method

by Tao Wu, Shuo Xiong, Qiuhang Chen, Huaizheng Liu, Weijun Cao and Haoran Tuo
1 School of Software Engineering, Huazhong University of Science and Technology, Wuhan 430074, China
2 School of Journalism and Information Communication, Huazhong University of Science and Technology, Wuhan 430074, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(17), 9231; https://doi.org/10.3390/app15179231
Submission received: 7 May 2025 / Revised: 17 July 2025 / Accepted: 29 July 2025 / Published: 22 August 2025
(This article belongs to the Special Issue Collaborative Learning and Optimization Theory and Its Applications)

Abstract

Image super-resolution (SR) methods based on Generative Adversarial Networks (GANs) have achieved impressive visual results. However, the weights of the loss functions in these methods are usually set manually to fixed values, which cannot fully adapt to different datasets and tasks and may degrade the perceptual quality of the SR images. To address this issue and further improve visual quality, we propose a perception-driven SupGAN, which improves the generator and loss function of GAN-based image super-resolution models. The generator adopts multi-scale feature extraction and fusion to restore SR images with diverse and fine textures. We design a network-training method based on the proportion of high-frequency information in images (BHFTM), which uses the proportion of high-frequency information obtained through the Canny operator to set the weights of the loss function. In addition, we employ the four-patch method to better simulate the degradation of complex real-world scenarios. We extensively test our method and compare it with recent SR methods (BSRGAN, Real-ESRGAN, RealSR, SwinIR, LDL, etc.) on different types of datasets (OST300, 2020track1, RealWorld38, BSDS100, etc.) with a scaling factor of ×4. The results show that the NIQE metric improves and that SupGAN generates more natural and fine textures while suppressing unpleasant artifacts.

1. Introduction

The purpose of single-image super-resolution (SISR) is to convert a low-resolution (LR) image into a high-resolution (HR) one, enhancing the resolution and quality of the image so that it is clearer and richer in detail. Traditionally, converting an LR image into an HR image is challenging, and the main methods are interpolation-based [1,2] and reconstruction-based [3,4,5]. Currently, SISR methods typically employ deep learning-based models, such as Convolutional Neural Networks (CNNs). These learning-based methods are trained to extract features from the low-resolution image and map them to the HR image, which are then used to reconstruct the final output.
Since Generative Adversarial Networks (GANs) [6] were introduced in the field of super-resolution, GANs have been widely used in this field and have shown great potential. SRGAN [7] employs SRResNet as the generator within its GAN framework and introduces a perceptual loss that blends content loss with adversarial loss to train the network. ESRGAN [8] improves the adversarial loss using a relativistic GAN [9] on the basis of SRGAN and enhances the perceptual loss by using features before activation. Real-ESRGAN [10,11] also uses perceptual loss, adversarial loss, and content loss as loss functions.
Most super-resolution models use composite loss functions composed of multiple loss terms, whose weights are usually set manually and fixed during training. However, real training samples differ in their proportion of high-frequency information. For GAN-based super-resolution methods, the constant loss weights mean that all training samples are treated equally. For images with a higher proportion of low-frequency information, that low-frequency content can sometimes push training in the wrong direction in terms of perception. This can lead to problems such as insufficient recovery of detail by the generator and a limited ability of the discriminator to distinguish SR images from HR images in terms of detail; it may also cause high-frequency detail to be lost in SR images generated from inputs rich in detail and high-frequency content, thereby compromising visual fidelity.
In other words, the weights of the corresponding loss functions should differ for training samples with different proportions of high-frequency information. Images with a higher proportion of high-frequency information usually contain more detailed information and deserve more attention to the perceptual loss during super-resolution, while images with more low-frequency information (such as sky backgrounds) require more focus on pixel loss. Based on this, we propose Support GAN (SupGAN), and the contributions of this paper are as follows:
  • We design a multi-scale feature-extraction block in the generator network. It features a five-level perceptual hierarchy that captures multi-scale features from LR images while suppressing artifacts.
  • We propose a network-training strategy that adjusts learning priorities based on the proportion of high-frequency information in the training images (BHFTM), which includes an edge detection module, a weight-setting module, and a loss-construction module, to dynamically adjust the weights of loss functions in the super-resolution model according to the proportion of high-frequency information in the training samples.
  • We employ the four-patch method to better simulate the degradation of complex real-world scenarios.
  • We demonstrate that SupGAN produces SR images with better recovery of high-frequency details and improved perceptual metrics while reducing the occurrence of artifacts, resulting in SR images that better meet human perceptual needs (see Figure 1).

2. Related Work

  • Single image super-resolution:   Over the past few years, deep learning-based approaches have achieved remarkable progress in the field of single-image super-resolution (SISR) [6,12,13]. SRCNN [14] was the first to introduce deep learning into super-resolution, significantly outperforming traditional methods. Dong et al. later proposed FSRCNN [15], which improved on SRCNN by offering a faster and more real-time solution. ESPCN [16] introduced a subpixel convolution layer that extracts features directly from low-resolution input, thus avoiding convolutions in high-resolution images and reducing computational cost. Generative Adversarial Networks (GANs) have also been widely adopted in SISR tasks, utilizing adversarial training between a generator and a discriminator to produce more realistic high-resolution outputs [7,8,10]. More recently, researchers have begun exploring the powerful modeling capabilities of transformers [17] in computer vision. For example, Yang et al. present TTSR [18], which formulates LR and HR images as queries and keys within the transformer architecture to promote joint feature learning between LR and HR representations.
  • Perception-driven approaches:   Image super-resolution methods that optimize for peak signal-to-noise ratio (PSNR) often produce overly smooth high-resolution images that lack fine high-frequency details and fail to align with human perceptual quality. To address this limitation, perception-driven approaches have gained significant attention in recent years. RCAN [19] enhances perceptual quality by employing residual blocks along with channel attention mechanisms within its super-resolution architecture. To capture multi-scale information and improve feature discrimination, RFB-ESRGAN [20] integrates a Receptive Field Block (RFB) [21] into its design, achieving top performance in the NTIRE 2020 Perceptual Extreme Super-Resolution Challenge.
  • Loss function:   Traditional super-resolution methods commonly use mean squared error (MSE) as the loss function [14]. While effective for optimizing pixel-wise accuracy, MSE tends to produce overly smooth high-resolution outputs that lack high-frequency details. Although the use of L1 loss has shown improvements over MSE [22], it still fails to adequately capture the perceptual quality of the generated images. To address this issue, Johnson et al. introduced a perceptual loss function [23], based on the concept of perceptual similarity [24]. This loss evaluates the perceptual difference between the reconstructed and ground-truth images by computing feature distances within a pre-trained deep neural network. SRGAN [7] introduced adversarial loss, employing a discriminator network to guide the generator towards producing images that lie on the natural image manifold. To further enhance texture realism, Sajjadi et al. proposed a texture matching loss [25], which compares the texture statistics between generated and real images. Contextual loss [26] was also proposed to better capture structural and content-level similarities between images. Since a single loss function often cannot fully account for the diverse aspects of image quality in super-resolution tasks, many modern approaches adopt a combination of multiple loss terms. A commonly used composite loss includes L1 loss, perceptual loss, and adversarial loss [7,8,10].
  • Degradation models:   Deep learning-based super-resolution methods often rely on classical degradation models [27,28], which typically consist of a sequence of operations, including blurring, downsampling, and noise addition. However, the performance of these methods can degrade significantly when real-world degradation deviates from these assumed models. To handle this situation, some approaches use implicit degradation modeling, learning abstract representations of degradation types in the feature space rather than explicitly modeling them in the pixel space [29,30,31]. BSRGAN [32] proposes a more practical and diverse degradation model by randomly shuffling different degradation operations, including blur, downsampling, and noise, while Real-ESRGAN [10] further improves on this by introducing a higher-order degradation process designed to more accurately reflect the complexity of real-world image corruptions.
  • Evaluation metrics:   Super-resolution methods based on deep convolutional neural networks are typically evaluated using two categories of metrics: distortion-based metrics (e.g., PSNR, SSIM, IFC, VIF [33,34,35]) and perceptual quality metrics. The latter includes human subjective assessments and no-reference image quality indicators such as the Ma score [36], NIQE [37], BRISQUE [38], and the perceptual index (PI) [39,40]. Yochai et al. [39] highlighted the inherent trade-off between distortion accuracy and perceptual quality, where methods achieving high perceptual quality often score lower on traditional distortion metrics such as PSNR and SSIM. In this work, we adopt NIQE as our primary evaluation metric. As a no-reference metric, NIQE assesses the perceptual quality of reconstructed images without requiring access to ground truth images [41].

3. Proposed Methods

To enhance perceptual quality and reduce artifacts in single-image super-resolution (SISR), we introduce Support GAN (SupGAN). This section details three key components: the generator architecture with its multi-scale feature-extraction block (MFEB), enhancements to high-order degradation modeling through an improved four-patch strategy, and the balanced high-frequency training method (BHFTM) for network optimization.

3.1. Generator Architecture

Studies demonstrate that convolutional filters with varying kernel sizes acquire distinct feature extraction capabilities, where larger kernels capture broader structural patterns leading to performance gains [14]. Our MFEB module employs an inception-inspired design [42], with five parallel convolutional pathways for hierarchical feature extraction from low-resolution inputs, as illustrated in Figure 2.
The pathway configuration is as follows:
(1) Micro-feature pathway (k1-n64-s1): preserves fine details through minimal receptive fields.
(2) Fine-feature pathway (k1-n32-s1 → k3-n64-s1): combines small kernels for subtle pattern retention.
(3) Mid-range pathway (k1-n32-s1 → k3-n48-s1 → k3-n64-s1): employs medium kernels for intermediate features.
(4) Broad-context pathway (k1-n32-s1 → k3-n40-s1 → k3-n48-s1 → k5-n64-s1): utilizes larger kernels for extensive feature capture.
(5) Global-context pathway (k1-n32-s1 → k3-n40-s1 → k3-n48-s1 → k5-n56-s1 → k5-n64-s1): integrates multiple kernel sizes for comprehensive analysis.
All pathway outputs are processed through SiLU activation functions before undergoing pixel-shuffle operations to generate multi-scale feature maps $F_1$–$F_3$. The complete transformation is mathematically represented as

$$F_i = \mathrm{SiLU}\big(\mathrm{Conv}_{s_i}(x_{\mathrm{subs}})\big), \quad i \in \{1, 2, 3\}.$$
Finally, these features are mixed to obtain a richer representation of features.
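For concreteness, the following is a minimal PyTorch sketch of the five parallel pathways and their fusion. The 1 × 1 fusion convolution, the input/output channel counts, and the placement of SiLU inside each pathway are illustrative assumptions; the pixel-shuffle stage that produces the multi-scale maps $F_1$–$F_3$ is omitted.

```python
# Minimal sketch of a multi-scale feature-extraction block (MFEB) with the five
# parallel pathways described above. Details not stated in the text (fusion layer,
# channel widths of the input) are assumptions for illustration only.
import torch
import torch.nn as nn


def conv(in_ch, out_ch, k):
    # "k{K}-n{N}-s1" block: K x K convolution, N output channels, stride 1, SiLU activation.
    return nn.Sequential(nn.Conv2d(in_ch, out_ch, k, stride=1, padding=k // 2), nn.SiLU())


class MFEB(nn.Module):
    def __init__(self, in_ch=64, out_ch=64):
        super().__init__()
        # (1) micro-feature pathway: k1-n64
        self.p1 = conv(in_ch, 64, 1)
        # (2) fine-feature pathway: k1-n32 -> k3-n64
        self.p2 = nn.Sequential(conv(in_ch, 32, 1), conv(32, 64, 3))
        # (3) mid-range pathway: k1-n32 -> k3-n48 -> k3-n64
        self.p3 = nn.Sequential(conv(in_ch, 32, 1), conv(32, 48, 3), conv(48, 64, 3))
        # (4) broad-context pathway: k1-n32 -> k3-n40 -> k3-n48 -> k5-n64
        self.p4 = nn.Sequential(conv(in_ch, 32, 1), conv(32, 40, 3), conv(40, 48, 3), conv(48, 64, 5))
        # (5) global-context pathway: k1-n32 -> k3-n40 -> k3-n48 -> k5-n56 -> k5-n64
        self.p5 = nn.Sequential(conv(in_ch, 32, 1), conv(32, 40, 3), conv(40, 48, 3),
                                conv(48, 56, 5), conv(56, 64, 5))
        # Assumed fusion: concatenate the five 64-channel outputs and mix with a 1x1 conv.
        self.fuse = nn.Conv2d(5 * 64, out_ch, 1)

    def forward(self, x):
        feats = [p(x) for p in (self.p1, self.p2, self.p3, self.p4, self.p5)]
        return self.fuse(torch.cat(feats, dim=1))


if __name__ == "__main__":
    y = MFEB()(torch.randn(1, 64, 32, 32))
    print(y.shape)  # torch.Size([1, 64, 32, 32])
```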

3.2. Four-Patch Degradation Model

The four-patch method is proposed to improve the downsampling operation in the Real-ESRGAN high-order degradation modeling process. The motivation is that the resize stage uses a single interpolation method, chosen from bicubic, area, and bilinear interpolation. However, in the field of super-resolution (SR), no single interpolation method can be declared the best: each one represents the distribution of a specific type of low-resolution (LR) image in the real world and cannot represent all types of LR images. Existing SR approaches typically apply a single interpolation method to an image when synthesizing degraded data and then train the network on it, which deviates significantly from real-world scenarios.
Therefore, during the resize process, a single image should be downsampled with multiple interpolation algorithms. Specifically, the image is divided into four patches, and adjacent patches are processed with different interpolation methods. The distribution of interpolation methods is shown in Figure 3.
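The sketch below illustrates this four-patch resize step, assuming a 2 × 2 split and an alternating assignment of OpenCV interpolation modes to neighboring patches; the exact assignment used in the paper (Figure 3) may differ.

```python
# Illustrative four-patch downsampling: split the image into a 2x2 grid and resize
# adjacent patches with different interpolation modes, so that one LR image mixes
# several degradation styles. Patch/mode assignment here is an assumption.
import cv2
import numpy as np


def four_patch_resize(img: np.ndarray, scale: float = 0.25) -> np.ndarray:
    h, w = img.shape[:2]
    h2, w2 = h // 2, w // 2
    patches = [img[:h2, :w2], img[:h2, w2:], img[h2:, :w2], img[h2:, w2:]]
    # Adjacent patches use different interpolation methods (bicubic, area, bilinear).
    modes = [cv2.INTER_CUBIC, cv2.INTER_AREA, cv2.INTER_LINEAR, cv2.INTER_CUBIC]
    resized = [cv2.resize(p, None, fx=scale, fy=scale, interpolation=m)
               for p, m in zip(patches, modes)]
    # Stitch the four downsampled patches back into one LR image.
    top = np.concatenate(resized[:2], axis=1)
    bottom = np.concatenate(resized[2:], axis=1)
    return np.concatenate([top, bottom], axis=0)
```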

3.3. Network Training Method

The composition of our method is illustrated in Figure 4. It includes an edge detection module, a weight setting module, and a loss construction module.
  • Edge detection module:   To differentiate the proportion of high-frequency information in images, we use edge detection algorithms to preprocess the images, converting digital images into images composed of lines and curves, providing shape and boundary information of objects. We compared several commonly used edge detection algorithms and used the Canny operator to detect image edges. The detection effect of the Canny operator is shown in Figure 5.
  • The Canny operator [43] is an edge detection algorithm based on Gaussian filtering and non-maximum suppression. It has good edge detection performance and is suitable for detecting fine-edge parts. Compared with other first-order differential operators (such as Roberts operator [44], Sobel operator [45], and Prewitt operator [46]), the Canny operator has stronger noise reduction ability.
  • Weight-setting module:   Image samples with a lower proportion of high-frequency content carry less effective information, and during training these samples should be guided to produce high-definition images with similarly limited information. To achieve this, the contribution of the perceptual loss and adversarial loss should be reduced and the role of the L1 loss emphasized. Therefore, the L1 loss weight is increased for samples with a lower proportion of high-frequency content and decreased for those with a higher proportion. The main process of the BHFTM, which adjusts the weight of the L1 loss for each sample based on the proportion of high-frequency information obtained by the edge detection module, is described in Algorithm 1.
  • Loss-construction module:   The generator’s loss function combines the pixel loss $L_1$, perceptual loss $L_{Percep}$, and adversarial loss $L_{GAN}$ into a single unified formula. The overall loss function of the generator can be expressed as

    $$L_G = \alpha L_1 + \beta L_{Percep} + \gamma L_{GAN} = \alpha\,\mathbb{E}_{x_i}\lVert G(x_i) - y \rVert_1 + \beta\,\mathbb{E}_{x_i}\lVert \mathrm{VGG}(G(x_i)) - \mathrm{VGG}(y) \rVert_1 + \gamma\big[ -\mathbb{E}_{x_{hr}}\log\big(1 - D(x_{hr}, x_{sr})\big) - \mathbb{E}_{x_{sr}}\log\big(D(x_{sr}, x_{hr})\big) \big],$$

    where $L_1$ is the 1-norm distance between the recovered image $G(x_i)$ and the HR image $y$, evaluating the average pixel-wise approximation of $I_{SR}$ to $I_{HR}$. $L_{Percep}$ is obtained by introducing a fine-tuned VGG19 network to compute the 1-norm distance between high-level features of the recovered image $G(x_i)$ and of $y$, assessing the perceptual similarity between $I_{SR}$ and $I_{HR}$ from a human perspective. The $L_{GAN}$ term leverages the discriminator’s perceptual discrimination capability to differentiate between super-resolved and high-resolution images, thereby enhancing the reconstruction of sharp edges and fine texture details. The loss weights $\alpha$, $\beta$, and $\gamma$ are determined by the weight-setting module:

    $$\alpha = \begin{cases} 0, & \text{if } p > 0.2 \\ (1 - p) \cdot 10, & \text{otherwise,} \end{cases} \qquad \beta = 1.0, \qquad \gamma = 0.1.$$
Algorithm 1 Edge detection and weight setting.
Input: GT image set $\mathcal{X}$.
Output: Weights of the L1 loss, perceptual loss, and adversarial loss: $\alpha$, $\beta$, $\gamma$.
1: for all $I_{GT} \in \mathcal{X}$ do
2:     generate the proportion of high-frequency information $p$ from $I_{GT}$ through the edge detection module
3:     set $\alpha$, $\beta$, $\gamma$ through the weight-setting module according to $p$:
4:     if $p > 0.2$ then
5:         $\alpha = 0$, $\beta = 1.0$, $\gamma = 0.1$
6:     else
7:         $\alpha = (1 - p) \cdot 10$, $\beta = 1.0$, $\gamma = 0.1$
8:     end if
9: end for
10: return $\alpha$, $\beta$, $\gamma$
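A minimal sketch of Algorithm 1 follows, assuming that $p$ is measured as the fraction of pixels marked as edges by the Canny operator; the Canny thresholds (100, 200) are illustrative assumptions rather than values from the paper. The returned weights would then scale the loss terms as $L_G = \alpha L_1 + \beta L_{Percep} + \gamma L_{GAN}$, matching the loss-construction module above.

```python
# Sketch of the BHFTM edge-detection and weight-setting steps. The definition of p
# (fraction of Canny edge pixels) and the thresholds below are assumptions.
import cv2
import numpy as np


def high_freq_proportion(gt_bgr: np.ndarray, low: int = 100, high: int = 200) -> float:
    gray = cv2.cvtColor(gt_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)            # edge map: 255 on edges, 0 elsewhere
    return float(np.count_nonzero(edges)) / edges.size


def bhftm_weights(p: float):
    # Weight-setting module: samples rich in high-frequency content (p > 0.2) drop the
    # L1 term; flatter samples get a larger L1 weight of (1 - p) * 10.
    alpha = 0.0 if p > 0.2 else (1.0 - p) * 10.0
    beta, gamma = 1.0, 0.1
    return alpha, beta, gamma


# Example: per-sample weights for one ground-truth patch.
# alpha, beta, gamma = bhftm_weights(high_freq_proportion(cv2.imread("gt.png")))
# loss_G = alpha * l1 + beta * l_percep + gamma * l_gan
```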

4. Experiments

4.1. Training Details

In our experimental setup, we utilize a ×4 scaling ratio to generate low-resolution (LR) images from their high-resolution (HR) counterparts. The HR images are initially cropped into 400 × 400 patches, which are subsequently degraded through our enhanced, four-patch-augmented high-order degradation approach [10] to produce LR images at 100 × 100 resolution. The HR and LR patches used for training are further cropped to 256 × 256 and 64 × 64, respectively. For model training, we employ a distributed batch configuration of 12 × 2, indicating 12 samples per GPU across two parallel GPUs.
The training protocol consists of two distinct phases: generator pre-training followed by adversarial training. During the initial phase, the generator undergoes exclusive optimization using L1 loss with a learning rate of 2 × 10 4 over 400,000 iterations. This pre-trained generator then serves as the foundation for subsequent adversarial training, where it collaborates with the discriminator under a composite objective function incorporating L1 loss, perceptual loss, and adversarial loss. The BHFTM algorithm dynamically balances these loss components throughout training. Both networks maintain a constant learning rate of 1 × 10 4 throughout the 280,000-iteration adversarial phase. This staged approach yields two benefits: (1) it prevents generator convergence to inferior local optima, producing superior visual results, and (2) it refines the discriminator’s capacity to discern textural details, thereby elevating the quality of super-resolved images during adversarial training.
Optimization is performed using the Adam algorithm [47] with momentum coefficients β 1 = 0.9 and β 2 = 0.99 . The generator and discriminator undergo alternating updates until model convergence. All implementations were developed using the PyTorch framework and were executed on NVIDIA GeForce RTX 3090 GPU hardware.
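As a compact illustration of the schedule above, the sketch below sets up the optimizers with the stated learning rates and momentum coefficients; the generator and discriminator are trivial placeholders rather than the SupGAN networks, and the data and loss plumbing is omitted.

```python
# Sketch of the two-phase optimizer configuration described above; placeholders only.
import torch
import torch.nn as nn

generator = nn.Conv2d(3, 3, 3, padding=1)        # placeholder for the SupGAN generator
discriminator = nn.Conv2d(3, 1, 3, padding=1)    # placeholder for the discriminator

# Phase 1: generator pre-training with L1 loss only (400k iterations, lr = 2e-4).
pre_optim = torch.optim.Adam(generator.parameters(), lr=2e-4, betas=(0.9, 0.99))

# Phase 2: adversarial training (280k iterations, fixed lr = 1e-4 for both networks),
# with the composite L1 + perceptual + adversarial loss weighted by the BHFTM.
g_optim = torch.optim.Adam(generator.parameters(), lr=1e-4, betas=(0.9, 0.99))
d_optim = torch.optim.Adam(discriminator.parameters(), lr=1e-4, betas=(0.9, 0.99))
```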

4.2. Data

We trained our model on the DIV2K [48], Flickr2K [13], and OutdoorSceneTraining (OST) [49] datasets, whose abundant high-resolution images with complex textures facilitate the synthesis of more natural super-resolved results [8]. Experimental validation was conducted on several widely recognized benchmarks, such as OST300 [49], BSDS100 and BSDS200 [50], 2020track1 [51], Urban100 [52], and RealWorld38 [31], ensuring a thorough comparison with existing methods.

4.3. Qualitative Results

To assess the effectiveness of our approach, we compared SupGAN with several leading SR techniques, including Real-ESRGAN, BSRGAN, PDM_SR, SwinIR, LDL, and RealSR, using the Natural Image Quality Evaluator (NIQE) as the primary evaluation metric. Table 1 summarizes their performance across six benchmark datasets, where a lower NIQE score indicates superior perceptual quality.
Visual comparisons, along with corresponding NIQE measurements, are provided in Figure 6 and Table 1. The results demonstrate that SupGAN consistently generates sharper and more natural reconstructions with fewer artifacts compared to existing methods. For example, in reconstructing fine textures such as animal fur (see 0934) and architectural details (see 157036), SupGAN produces more realistic outputs than BSRGAN and SwinIR, which often exhibit distorted textures and noise. Meanwhile, RealSR and Real-ESRGAN+ struggle to recover high-frequency details effectively.
Furthermore, SupGAN enhances edge clarity and structural sharpness (see 0842), whereas competing methods either oversmooth details (Real-ESRGAN+ and SwinIR) or fail to restore sufficient high-frequency components (PDM_SR). Notably, while other GAN-based approaches like LDL and RealSR occasionally introduce unnatural artifacts, SupGAN suppresses such distortions, yielding cleaner and more visually coherent results (see 0928).
Although SupGAN does not achieve the lowest NIQE scores in all cases, we emphasize that human perceptual quality should take precedence in SR evaluation, as current quantitative metrics do not fully capture real-world visual fidelity. Our method prioritizes naturalness and detail preservation, aligning better with human judgment than metrics alone.

4.4. Ablation Study

To validate the effectiveness of the BHFTM and four-patch methods, we applied them to the Real-ESRGAN and BSRGAN models and compared the results with the original approaches. Representative quantitative results (NIQE) are presented in Table 2. It can be observed that the Real-ESRGAN and BSRGAN models augmented with the BHFTM and four-patch methods show significant improvements in terms of the NIQE metric, validating the effectiveness of both methods.

4.5. Efficiency Analysis

Table 3 shows the inference speeds of the original models, the loss-GAN with BHFTM, and the complete Sup-GAN. It can be seen that the introduction of these modules has a negligible impact on speed. Figure 7 and Figure 8 show the differences in visual effects between the models.

5. Conclusions

We propose SupGAN, which achieves superior perceptual quality, as demonstrated by extensive NIQE evaluations. The SupGAN model uses a multi-scale feature-extraction and fusion generator for a more comprehensive perception of image features. The four-patch method better simulates the complex degradation found in reality. SupGAN produces more natural and detailed textures while suppressing unpleasant artifacts. In addition, it dynamically adjusts the weights of the loss function based on the proportion of high- and low-frequency information in the image.

Author Contributions

T.W.: Project administration, Funding; S.X.: Writing—review & editing, Funding; Q.C.: Writing—original draft; H.L.: Software, Writing—review & editing; W.C.: Validation, Writing—review & editing; H.T.: Resources, Writing—original draft. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Tencent Co.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This research was funded by Tencent, and the code-related technologies belong to Tencent. They cannot be disclosed without the company's permission.

Acknowledgments

We would like to thank Liang Yu and Jiajun Liu, who took over the work of Huaizheng Liu and Weijun Cao, respectively, and contributed to revising the manuscript, thus meeting the criteria for authorship through their significant contributions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Zhang, L.; Wu, X. An edge-guided image interpolation algorithm via directional filtering and data fusion. IEEE Trans. Image Process. 2006, 15, 2226–2238. [Google Scholar] [CrossRef] [PubMed]
  2. Duchon, C.E. Lanczos filtering in one and two dimensions. J. Appl. Meteorol. Climatol. 1979, 18, 1016–1022. [Google Scholar] [CrossRef]
  3. Dai, S.; Han, M.; Xu, W.; Wu, Y.; Gong, Y.; Katsaggelos, A.K. Softcuts: A soft edge smoothness prior for color image super-resolution. IEEE Trans. Image Process. 2009, 18, 969–981. [Google Scholar] [CrossRef] [PubMed]
  4. Sun, J.; Xu, Z.; Shum, H.-Y. Image super-resolution using gradient profile prior. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8. [Google Scholar]
  5. Yan, Q.; Xu, Y.; Yang, X.; Nguyen, T.Q. Single image super-resolution based on gradient profile sharpness. IEEE Trans. Image Process. 2015, 24, 3187–3202. [Google Scholar] [PubMed]
  6. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 2672–2680. [Google Scholar]
  7. Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
  8. Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Loy, C.C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision Workshops, Munich, Germany, 8–14 September 2018; pp. 63–79. [Google Scholar]
  9. Jolicoeur-Martineau, A. The relativistic discriminator: A key element missing from standard gan. arXiv 2018, arXiv:1807.00734. [Google Scholar] [CrossRef]
  10. Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-ESRGAN: Training Real-World Blind Super-Resolution with Pure Synthetic Data. In Proceedings of the ICCVW, Montreal, BC, Canada, 11–17 October 2021; pp. 1905–1914. [Google Scholar]
  11. Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
  12. Lai, W.-S.; Huang, J.-B.; Ahuja, N.; Yang, M. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 5835–5843. [Google Scholar]
  13. Timofte, R.; Agustsson, E.; Van Gool, L.; Yang, M.H.; Zhang, L.; Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M.; et al. Ntire 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 1110–1121. [Google Scholar]
  14. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the ECCV, Zurich, Switzerland, 6–12 September 2014; pp. 184–199. [Google Scholar]
  15. Dong, C.; Loy, C.C.; Tang, X. Accelerating the Super-Resolution Convolutional Neural Network. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 391–407. [Google Scholar]
  16. Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar]
  17. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar]
  18. Yang, F.; Yang, H.; Fu, J.; Lu, H.; Guo, B. Learning texture transformer network for image super-resolution. In Proceedings of the CVPR, Seattle, WA, USA, 13–19 June 2020; pp. 5791–5800. [Google Scholar]
  19. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar]
  20. Shang, T.; Dai, Q.; Zhu, S.; Yang, T.; Guo, Y. Perceptual Extreme Super Resolution Network with Receptive Field Block. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Seattle, WA, USA, 14–19 June 2020; pp. 1778–1787. [Google Scholar]
  21. Liu, S.; Huang, D. Receptive field block net for accurate and fast object detection. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 385–400. [Google Scholar]
  22. Zhao, H.; Gallo, O.; Frosio, I.; Kautz, J. Loss functions for neural networks for image processing. arXiv 2015, arXiv:1511.08861. [Google Scholar]
  23. Johnson, J.; Alahi, A.; Li, F.-F. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision; Springer International Publishing: Cham, Switzerland, 2016; pp. 694–711. [Google Scholar]
  24. Bruna, J.; Sprechmann, P.; LeCun, Y. Super-resolution with deep convolutional sufficient statistics. arXiv 2015, arXiv:1511.05666. [Google Scholar]
  25. Sajjadi, M.S.M.; Scholkopf, B.; Hirsch, M. EnhanceNet: Single image super-resolution through automated texture synthesis. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 4501–4510. [Google Scholar]
  26. Mechrez, R.; Talmi, I.; Zelnik-Manor, L. The Contextual Loss for Image Transformation with Non-Aligned Data. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 800–815. [Google Scholar]
  27. Elad, M.; Feuer, A. Restoration of a single super-resolution image from several blurred, noisy, and undersampled measured images. IEEE Trans. Image Process. 1997, 6, 1646–1658. [Google Scholar] [CrossRef] [PubMed]
  28. Liu, C.; Sun, D. On bayesian adaptive video super resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 36, 346–360. [Google Scholar] [CrossRef] [PubMed]
  29. Yuan, Y.; Liu, S.; Zhang, J.; Zhang, Y.; Dong, C.; Lin, L. Unsupervised image super-resolution using cycle-in-cycle generative adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  30. Fritsche, M.; Gu, S.; Timofte, R. Frequency separation for real-world super-resolution. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision Workshop (ICCVW), Seoul, Republic of Korea, 27–28 October 2019; pp. 3599–3608. [Google Scholar]
  31. Wang, L.; Wang, Y.; Dong, X.; Xu, Q.; Yang, J.; An, W.; Guo, Y. Unsupervised Degradation Representation Learning for Blind Super-Resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021; pp. 10581–10590. [Google Scholar]
  32. Zhang, K.; Liang, J.; Gool, L.V.; Timofte, R. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, BC, Canada, 11–17 October 2021; pp. 4771–4780. [Google Scholar]
  33. Sheikh, H.R.; Bovik, A.C.; Veciana, G.D. An information fidelity criterion for image quality assessment using natural scene statistics. IEEE Trans. Image Process. 2005, 14, 2117–2128. [Google Scholar] [CrossRef] [PubMed]
  34. Sheikh, H.R.; Bovik, A.C. Image information and visual quality. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, Montreal, QC, Canada, 17–21 May 2004; pp. 709–712. [Google Scholar]
  35. Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef] [PubMed]
  36. Ma, C.; Yang, C.Y.; Yang, X.; Yang, M.H. Learning a no-reference quality metric for single-image super-resolution. Comput. Vis. Image Underst. 2017, 158, 1–16. [Google Scholar] [CrossRef]
  37. Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212. [Google Scholar] [CrossRef]
  38. Mittal, A.; Moorthy, A.K.; Bovik, A.C. No-reference image quality assessment in the spatial domain. IEEE Trans. Image Process. 2012, 21, 4695–4708. [Google Scholar] [CrossRef] [PubMed]
  39. Blau, Y.; Michaeli, T. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6228–6237. [Google Scholar]
  40. Vasu, S.; Madam, N.T.; Rajagopalan, A.N. Analyzing Perception-Distortion Tradeoff Using Enhanced Perceptual Super-Resolution Network. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; pp. 114–131. [Google Scholar]
  41. Blau, Y.; Mechrez, R.; Timofte, R.; Michaeli, T.; Zelnik-Manor, L. The 2018 PIRM Challenge on Perceptual Image Super-Resolution. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018; pp. 334–355. [Google Scholar]
  42. Christian, S.; Wei, L.; Yangqing, J.; Pierre, S.; Scott, R.; Dragomir, A.; Dumitru, E.; Vincent, V.; Andrew, R. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  43. Canny, J. A Computational Approach to Edge Detection. IEEE Trans. Pattern Anal. Mach. Intell. 1986, PAMI-8, 679–698. [Google Scholar]
  44. Roberts, L.G. Machine Perception of Three-Dimensional Solids. Optical and Electro-Optical Information Processing. Doctoral Dissertation, Massachusetts Institute of Technology, Cambridge, MA, USA, 1963. [Google Scholar]
  45. Sobel, I. An isotropic 3 × 3 image gradient operator. In Machine Vision for Three-Dimensional Scenes; Freeman, H., Ed.; Academic Press: San Diego, CA, USA, 1990; pp. 376–379. [Google Scholar]
  46. Prewitt, J.M. Object enhancement and extraction. Pict. Process. Psychopictorics 1970, 10, 15–19. [Google Scholar]
  47. Kingma, D.P.; Ba, J.L. Adam: A method for stochastic optimization. In Proceedings of the ICLR, San Diego, CA, USA, 7–9 May 2015; pp. 1–15. [Google Scholar]
  48. Agustsson, E.; Timofte, R. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 1122–1131. [Google Scholar]
  49. Wang, X.; Yu, K.; Dong, C.; Loy, C.C. Recovering realistic texture in image super-resolution by deep spatial feature transform. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 606–615. [Google Scholar]
  50. Martin, D.; Fowlkes, C.; Tal, D.; Malik, J. A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics. In Proceedings of the Eighth IEEE International Conference on Computer Vision, Vancouver, BC, Canada, 7–14 July 2001. [Google Scholar]
  51. Lugmayr, A.; Danelljan, M.; Timofte, R. Ntire 2020 challenge on real-world image super-resolution: Methods and results. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 494–495. [Google Scholar]
  52. Huang, J.B.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
Figure 1. Comparisons of visual quality among BSRGAN, Real-ESRGAN+, and SupGAN on real-life images with × 4 upscaling. The SupGAN can generate naturally finer textures and remove or alleviate annoying artifacts in real-world images. Zoom-in for the best view.
Figure 2. Design of an MFEB in the generator network.
Figure 3. Design of four-patch degradation model.
Figure 4. Design of BHFTM.
Figure 5. The detection effect of the Canny operator.
Figure 6. Qualitative results of SupGAN. SupGAN produces more subtle textures and clearer structures, animal textures, and building structures, and also fewer unpleasant artifacts. Zoom in for best view.
Figure 7. Compared to the original Real-ESRGAN, both Sup-Real-ESRGAN and Loss-Real-ESRGAN restored clearer door frame structures in the first image and eliminated unnecessary distorted horizontal stripes in the second image. The door-frame tones generated by Sup-Real-ESRGAN are softer, closer to the style of the original image, and the door-frame structure is clearer.
Figure 8. Both Sup-BSRGAN and Loss-BSRGAN, compared to the original BSRGAN, eliminate unnecessary cracks in the first row image and restore clearer animal fur textures in the second row image. The animal fur colors generated by Sup-BSRGAN are lighter than those generated by Loss-BSRGAN, more accurately reflecting the white fur of snow leopards in the real world.
Table 1. NIQE scores on diverse testing datasets: the lower, the better. Red, green, and blue (in the original) indicate the 1st-, 2nd-, and 3rd-best NIQE results among models on each dataset row. The calculation method of NIQE is derived from the basic SR package of PyTorch 1.11.0+cu113.

Dataset     | Bicubic | BSRGAN | PDM_SR | SwinIR | LDL   | RealSR | Real-ESRGAN+ | SupGAN
OST300      | 7.600   | 3.309  | 4.319  | 2.921  | 2.817 | 3.521  | 2.806        | 2.633
RealWorld38 | 7.701   | 4.677  | 4.939  | 4.452  | 4.376 | 4.606  | 4.538        | 4.302
BSDS100     | 7.548   | 3.865  | 4.785  | 3.623  | 3.788 | 3.773  | 3.902        | 3.257
BSDS200     | 7.576   | 4.081  | 5.029  | 3.833  | 3.910 | 3.968  | 4.124        | 3.536
urban100    | 7.235   | 4.258  | 4.984  | 4.080  | 3.637 | 4.024  | 4.193        | 3.820
2020track1  | 7.596   | 3.783  | 4.101  | 3.622  | 3.181 | 3.790  | 3.820        | 3.372
Table 2. NIQE scores on diverse testing datasets: the lower, the better. Red (in the original) indicates the best NIQE result among models on each dataset row. The calculation method of NIQE is derived from the basic SR package of PyTorch 1.11.0+cu113.

Dataset     | Real-ESRGAN+: Origin | Real-ESRGAN+: Origin+Sup | SupGAN: Origin | SupGAN: Origin+Sup | BSRGAN: Origin | BSRGAN: Origin+Sup
OST300      | 2.806 | 2.685 | 2.637 | 2.633 | 3.309 | 3.151
2020track1  | 3.82  | 3.631 | 3.489 | 3.372 | 3.783 | 3.650
RealWorld38 | 4.538 | 4.386 | 4.339 | 4.302 | 4.677 | 4.201
BSDS100     | 3.899 | 3.412 | 3.302 | 3.257 | 3.863 | 3.745
BSDS200     | 4.123 | 3.72  | 3.583 | 3.536 | 4.076 | 3.841
urban100    | 4.193 | 3.913 | 3.985 | 3.820 | 4.258 | 3.948
Table 3. Comparison of inference speeds for the original model, loss-GAN with BHFTM, and Sup-GAN, indicating minimal overhead introduced by the added modules.

Dataset     | Real-ESRGAN | Loss-Real-ESRGAN | Sup-Real-ESRGAN | BSRGAN    | Loss-BSRGAN | Sup-BSRGAN
OST300      | 11 m 24 s   | 11 m 51 s        | 11 m 17 s       | 10 m 10 s | 10 m 56 s   | 10 m 54 s
2020track1  | 1 m 52 s    | 1 m 42 s         | 2 m 10 s        | 2 m 4 s   | 2 m 11 s    | 2 m 14 s
RealWorld38 | 22 s        | 23 s             | 24 s            | 17 s      | 13 s        | 13 s
DIV2K 100   | 2 m         | 2 m 32 s         | 2 m 34 s        | 2 m 11 s  | 2 m 17 s    | 2 m 15 s
BSDS100     | 1 m 45 s    | 2 m 2 s          | 2 m             | 1 m 51 s  | 2 m 4 s     | 2 m 7 s
BSDS200     | 3 m 28 s    | 3 m 6 s          | 3 m 5 s         | 3 m 55 s  | 4 m 8 s     | 4 m 10 s
Urban100    | 8 m 53 s    | 8 m 25 s         | 8 m 54 s        | 11 m 8 s  | 11 m 10 s   | 11 m 12 s

