Multi-Scale Adaptive Modulation Network for Efficient Image Super-Resolution
Abstract
1. Introduction
- We propose a novel multi-scale adaptive modulation layer (MAML) that employs multi-scale decomposition and variance-based modulation to effectively extract multi-scale global structural information.
- We design a lightweight local detail extraction layer (LDEL) to capture fine local details, complemented by Swin Transformer Layers (STLs) to efficiently model long-range dependencies.
- Through comprehensive quantitative and qualitative evaluations on five benchmark datasets, we demonstrate that our method achieves a favorable trade-off between computational complexity and reconstruction quality.
2. Related Works
2.1. CNN-Based Super-Resolution
2.2. ViT-Based Super-Resolution
2.3. Lightweight and Efficient Image Super-Resolution
3. Proposed Method
3.1. Multi-Scale Adaptive Modulation Layer
3.2. Local Detail Extraction Layer
3.3. Swin Transformer Layer
3.4. Multi-Scale Adaptive Modulation Block
4. Experimental Results
4.1. Datasets and Implementation Details
4.2. Comparisons with State-of-the-Art Methods
4.3. Model Analysis
- Feature Modulation. The MAML incorporates a feature modulation mechanism that adaptively re-weights features. Ablation results show that removing this operation (“w/o FM”) degrades PSNR by 0.02 dB on Set5 and 0.03 dB on Manga109 relative to the baseline model.
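To make the operation concrete, here is a minimal sketch of element-wise feature modulation: features are scaled by a sigmoid-squashed gating signal. This is a generic illustration, not the paper's exact MAML formulation, and the array shapes are arbitrary assumptions.

```python
import numpy as np

def feature_modulation(features: np.ndarray, gate: np.ndarray) -> np.ndarray:
    """Element-wise feature modulation: scale each feature response by a
    gating signal of the same shape, squashed into (0, 1) by a sigmoid."""
    return features * (1.0 / (1.0 + np.exp(-gate)))

# Toy example: a (C, H, W) feature tensor and a matching gate.
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
g = rng.standard_normal((4, 8, 8))
y = feature_modulation(x, g)
assert y.shape == x.shape
```

Because the gate lies in (0, 1), modulation can only attenuate responses here; a learned variant would produce the gate from the features themselves.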
- Multi-scale representation. To evaluate the effectiveness of multi-scale features in the proposed MAMN, we construct two variants: “w/o MC”, which replaces the multi-scale depth-wise convolution with a single-scale depth-wise convolution for spatial feature extraction, and “w/o Down”, which removes the downsampling operation. As shown in Table 4, multi-scale features yield a PSNR improvement of 0.04 dB on the Manga109 dataset, and the “w/o Down” result confirms that incorporating downsampling improves PSNR. These results demonstrate that multi-scale feature extraction enhances the model’s ability to capture information at different levels of detail, thereby improving SR reconstruction. Furthermore, we employ adaptive max pooling to construct the multi-scale representations; compared with adaptive average pooling and nearest interpolation, adaptive max pooling identifies salient features more effectively and thus contributes to better reconstruction quality.
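The multi-scale construction can be sketched roughly as follows, assuming block-divisible sizes; `adaptive_max_pool` and `nearest_upsample` are simplified stand-ins for PyTorch's `AdaptiveMaxPool2d` and nearest-neighbor interpolation, and the fusion step is illustrative rather than the paper's exact pipeline.

```python
import numpy as np

def adaptive_max_pool(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Max-pool a (C, H, W) array to (C, out_h, out_w); assumes H, W divisible."""
    c, h, w = x.shape
    return x.reshape(c, out_h, h // out_h, out_w, w // out_w).max(axis=(2, 4))

def nearest_upsample(x: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Nearest-neighbor upsampling of a (C, H, W) array by integer factors."""
    c, h, w = x.shape
    return x.repeat(out_h // h, axis=1).repeat(out_w // w, axis=2)

# Build a two-scale representation and fuse it back at full resolution.
x = np.arange(2 * 8 * 8, dtype=float).reshape(2, 8, 8)
low = adaptive_max_pool(x, 4, 4)         # coarse scale; salient responses survive
fused = x + nearest_upsample(low, 8, 8)  # simple multi-scale combination
assert fused.shape == (2, 8, 8)
```

Max pooling (rather than average pooling) keeps the strongest response in each block, which is the "salient feature" behavior the ablation favors.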
- Variance modulation. To strengthen the capture of non-local information, the proposed MAMN incorporates variance modulation within the MAML branch. We conduct an ablation study that removes this operation to evaluate its contribution. As summarized in Table 4, removing variance modulation reduces PSNR by 0.06 dB on both the Set5 and Manga109 datasets. Furthermore, replacing variance modulation with a standard self-attention mechanism improves performance slightly but sharply increases the parameter count and computational complexity, by 58.6% and 81%, respectively. These findings confirm that variance modulation plays a critical role in improving the representational capacity of the model at low cost.
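The idea behind variance modulation can be illustrated with a minimal sketch: each channel is re-weighted by a bounded function of its spatial variance, so flat (low-information) channels are suppressed. This is an assumption-laden simplification of the idea, not the paper's exact formulation.

```python
import numpy as np

def variance_modulation(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Modulate a (C, H, W) feature map by its per-channel spatial variance.
    Channels with larger variance (stronger structure) are emphasized."""
    var = x.var(axis=(1, 2), keepdims=True)  # sigma^2 over H x W, per channel
    weight = var / (var + eps)               # bounded in [0, 1)
    return x * weight

# One flat channel and one structured channel.
x = np.stack([np.zeros((4, 4)), np.linspace(0, 1, 16).reshape(4, 4)])
y = variance_modulation(x)
# The zero-variance channel is suppressed; the structured one is kept.
assert np.allclose(y[0], 0.0)
```

Unlike self-attention, this statistic costs only a reduction over the spatial dimensions, which is consistent with the large parameter/FLOPs gap reported in the ablation.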
- Feature aggregation. To evaluate the effectiveness of feature aggregation, we construct an ablation model denoted “w/o FA”, in which the convolutional layer that integrates multi-scale features along the channel dimension is removed. Experimental results show that incorporating feature aggregation improves PSNR by 0.06 dB on the Set5 dataset and 0.07 dB on the Manga109 dataset. These results demonstrate the essential role of multi-scale feature aggregation in improving reconstruction performance.
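Channel-wise aggregation of multi-scale features with a 1×1 convolution amounts to a per-pixel linear map over concatenated channels. A minimal sketch under that interpretation (the weight matrix here is random, standing in for a learned kernel):

```python
import numpy as np

def aggregate(features: list, weight: np.ndarray) -> np.ndarray:
    """Fuse multi-scale features along the channel dimension with a 1x1
    convolution, i.e. a linear map over concatenated channels at each pixel."""
    x = np.concatenate(features, axis=0)   # (C_total, H, W)
    c, h, w = x.shape
    return (weight @ x.reshape(c, h * w)).reshape(-1, h, w)

rng = np.random.default_rng(0)
f1 = rng.standard_normal((8, 16, 16))      # scale-1 features
f2 = rng.standard_normal((8, 16, 16))      # scale-2 features (already upsampled)
w = rng.standard_normal((8, 16))           # 16 input channels -> 8 output channels
out = aggregate([f1, f2], w)
assert out.shape == (8, 16, 16)
```

Removing this step ("w/o FA") leaves the per-scale branches unmixed, which matches the PSNR drop reported above.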
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Nomenclature
| Abbreviation | Definition | Abbreviation | Definition |
|---|---|---|---|
| CNNs | Convolutional neural networks | SA | Self-attention |
| MAMN | Multi-scale adaptive modulation network | LR | Low-resolution |
| MAMB | Multi-scale adaptive modulation block | HR | High-resolution |
| MAML | Multi-scale adaptive modulation layer | ViT | Vision Transformer |
| LDEL | Local detail extraction layer | STL | Swin Transformer layer |
| SISR | Single-image super-resolution | σ² | Variance |
| | Adaptive max pooling downsampling | N | Total pixels |
| | Nearest interpolation upsampling | | GELU activation |
| MSA | Multi-head self-attention | MLP | Multi-layer perceptron |
| PSNR | Peak signal-to-noise ratio | MSE | Mean squared error |
| SSIM | Structural similarity index measure | | Maximum pixel value |
| GAN | Generative adversarial network | | |
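The PSNR values reported throughout follow the standard definition PSNR = 10·log₁₀(MAX²/MSE), where MAX is the maximum pixel value. A minimal reference implementation (shapes and values are illustrative):

```python
import numpy as np

def psnr(ref: np.ndarray, test: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(MAX^2 / MSE)."""
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

ref = np.full((8, 8), 128.0)
noisy = ref + 1.0  # MSE = 1, so PSNR = 10 * log10(255^2)
print(round(psnr(ref, noisy), 2))  # → 48.13
```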
References
- Khaledyan, D.; Amirany, A.; Jafari, K.; Moaiyeri, M.H.; Zargari Khuzani, A.; Mashhadi, N. Low-Cost Implementation of Bilinear and Bicubic Image Interpolation for Real-Time Image Super-Resolution. In Proceedings of the GHTC, Online, 29 October–1 November 2020; pp. 1–5. [Google Scholar] [CrossRef]
- Ruangsang, W.; Aramvith, S. Efficient super-resolution algorithm using overlapping bicubic interpolation. In Proceedings of the GCCE, Nagoya, Japan, 24–27 October 2017; pp. 1–2. [Google Scholar] [CrossRef]
- Dai, S.; Han, M.; Xu, W.; Wu, Y.; Gong, Y. Soft Edge Smoothness Prior for Alpha Channel Super Resolution. In Proceedings of the CVPR, Minneapolis, MN, USA, 23–28 June 2007; pp. 1–8. [Google Scholar] [CrossRef]
- Timofte, R.; De Smet, V.; Van Gool, L. A+: Adjusted Anchored Neighborhood Regression for Fast Super-Resolution. In Proceedings of the ACCV, Singapore, 1–5 November 2014; pp. 111–126. [Google Scholar] [CrossRef]
- Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image Super-Resolution Via Sparse Representation. IEEE Trans. Image Process. 2010, 19, 2861–2873. [Google Scholar] [CrossRef] [PubMed]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the ECCV, Zurich, Switzerland, 6–12 September 2014; pp. 184–199. [Google Scholar] [CrossRef]
- Ahn, N.; Kang, B.; Sohn, K. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 252–268. [Google Scholar] [CrossRef]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced deep residual networks for single image super-resolution. In Proceedings of the CVPRW, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar] [CrossRef]
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the ECCV, Munich, Germany, 8–14 September 2018; pp. 286–301. [Google Scholar] [CrossRef]
- Sun, L.; Dong, J.; Tang, J.; Pan, J. Spatially-adaptive feature modulation for efficient image super-resolution. In Proceedings of the ICCV, Paris, France, 2–6 October 2023; pp. 13190–13199. [Google Scholar] [CrossRef]
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin Transformer. In Proceedings of the ICCVW, Montreal, QC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar] [CrossRef]
- Zhou, Y.; Li, Z.; Guo, C.L.; Bai, S.; Cheng, M.M.; Hou, Q. SRFormer: Permuted Self-Attention for Single Image Super-Resolution. In Proceedings of the ICCV, Paris, France, 2–6 October 2023; pp. 12734–12745. [Google Scholar] [CrossRef]
- Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the ECCV, Amsterdam, The Netherlands, 11–14 October 2016; pp. 391–407. [Google Scholar] [CrossRef]
- Zhang, X.; Zeng, H.; Zhang, L. Edge-oriented Convolution Block for Real-time Super Resolution on Mobile Devices. In Proceedings of the ACMM, Virtual Event, China, 20–24 October 2021; pp. 4034–4043. [Google Scholar] [CrossRef]
- Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. In Proceedings of the CVPR, Nashville, TN, USA, 19–25 June 2021; pp. 12299–12310. [Google Scholar] [CrossRef]
- Zamir, S.W.; Arora, A.; Khan, S.; Hayat, M.; Khan, F.S.; Yang, M.H. Restormer: Efficient transformer for high-resolution image restoration. In Proceedings of the CVPR, New Orleans, LA, USA, 18–24 June 2022; pp. 5728–5739. [Google Scholar] [CrossRef]
- Chen, Z.; Zhang, Y.; Gu, J.; Kong, L.; Yang, X.; Yu, F. Dual aggregation transformer for image super-resolution. In Proceedings of the ICCV, Paris, France, 2–6 October 2023; pp. 12278–12287. [Google Scholar] [CrossRef]
- Dong, J.; Pan, J.; Yang, Z.; Tang, J. Multi-scale residual low-pass filter network for image deblurring. In Proceedings of the ICCV, Paris, France, 2–6 October 2023; pp. 12311–12320. [Google Scholar] [CrossRef]
- Park, N.; Kim, S. How Do Vision Transformers Work? In Proceedings of the ICLR, Virtual, 25–29 April 2022. [Google Scholar] [CrossRef]
- Matsui, Y.; Ito, K.; Aramaki, Y.; Fujimoto, A.; Ogawa, T.; Yamasaki, T.; Aizawa, K. Sketch-based manga retrieval using manga109 dataset. Multimed. Tools Appl. 2017, 76, 21811–21838. [Google Scholar] [CrossRef]
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-Time Single Image and Video Super-Resolution Using an Efficient Sub-Pixel Convolutional Neural Network. In Proceedings of the CVPR, Las Vegas, NV, USA, 27–30 June 2016; pp. 1874–1883. [Google Scholar] [CrossRef]
- Kim, J.; Lee, J.; Lee, K. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the CVPR, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar] [CrossRef]
- Ledig, C.; Theis, L.; Huszar, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network. In Proceedings of the CVPR, Honolulu, HI, USA, 21–26 July 2017; pp. 105–114. [Google Scholar] [CrossRef]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Chen, C.L. ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. In Proceedings of the ECCVW, Munich, Germany, 8–14 September 2018; pp. 63–79. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. In Proceedings of the NeurIPS, Red Hook, NY, USA, 4–9 December 2017; pp. 5998–6008. [Google Scholar] [CrossRef]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proceedings of the ICLR, Vienna, Austria, 3–7 May 2021. [Google Scholar] [CrossRef]
- Zhang, X.; Zeng, H.; Guo, S.; Zhang, L. Efficient long-range attention network for image super-resolution. In Proceedings of the ECCV, Tel Aviv, Israel, 23–27 October 2022; pp. 649–667. [Google Scholar] [CrossRef]
- Chen, X.; Wang, X.; Zhou, J.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the CVPR, Vancouver, BC, Canada, 18–22 June 2023; pp. 22367–22377. [Google Scholar] [CrossRef]
- Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the ACMM, Nice, France, 21–25 October 2019; pp. 2024–2032. [Google Scholar] [CrossRef]
- Luo, X.; Xie, Y.; Zhang, Y.; Qu, Y.; Li, C.; Fu, Y. Latticenet: Towards lightweight image super-resolution with lattice block. In Proceedings of the ECCV, Glasgow, UK, 23–28 August 2020; pp. 272–289. [Google Scholar] [CrossRef]
- Peng, C.; Shu, P.; Huang, X.; Fu, Z.; Li, X. LCRCA: Image super-resolution using lightweight concatenated residual channel attention networks. Appl. Intell. 2022, 52, 10045–10059. [Google Scholar] [CrossRef]
- Sun, L.; Pan, J.; Tang, J. Shufflemixer: An efficient convnet for image super-resolution. In Proceedings of the NeurIPS, New Orleans, LA, USA, 28 November–9 December 2022; pp. 17314–17326. [Google Scholar] [CrossRef]
- Li, Z.; Liu, Y.; Chen, X.; Cai, H.; Gu, J.; Qiao, Y.; Dong, C. Blueprint separable residual network for efficient image super-resolution. In Proceedings of the CVPRW, New Orleans, LA, USA, 19–20 June 2022; pp. 833–843. [Google Scholar] [CrossRef]
- Fang, J.; Lin, H.; Chen, X.; Zeng, K. A hybrid network of cnn and transformer for lightweight image super-resolution. In Proceedings of the CVPRW, New Orleans, LA, USA, 19–20 June 2022; pp. 1102–1111. [Google Scholar] [CrossRef]
- Tian, C.; Zhang, X.; Wang, T.; Zhang, Y.; Zhu, Q.; Lin, C.-W. A Heterogeneous Dynamic Convolutional Neural Network for Image Super-resolution. Image Video Process. 2024. [Google Scholar] [CrossRef]
- Zheng, M.; Sun, L.; Dong, J.; Pan, J. SMFANet: A Lightweight Self-Modulation Feature Aggregation Network for Efficient Image Super-Resolution. In Proceedings of the ECCV, Milan, Italy, 29 September–4 October 2024; pp. 359–375. [Google Scholar] [CrossRef]
- Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the CVPR, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar] [CrossRef]
- Bevilacqua, M.; Roumy, A.; Guillemot, C.; Morel, M.l.A. Low-Complexity Single Image Super-Resolution Based on Nonnegative Neighbor Embedding. In Proceedings of the BMVC, Surrey, UK, 3–7 September 2012; pp. 1–10. [Google Scholar] [CrossRef]
- Zeyde, R.; Elad, M.; Protter, M. On Single Image Scale-Up Using Sparse-Representations. In Proceedings of the Curves and Surfaces, Avignon, France, 24–30 June 2010; pp. 711–730. [Google Scholar] [CrossRef]
- Arbeláez, P.; Maire, M.; Fowlkes, C.; Malik, J. Contour Detection and Hierarchical Image Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2010, 33, 898–916. [Google Scholar] [CrossRef]
- Huang, J.; Singh, A.; Ahuja, N. Single image super-resolution from transformed self-exemplars. In Proceedings of the CVPR, Boston, MA, USA, 7–12 June 2015; pp. 5197–5206. [Google Scholar] [CrossRef]
- Korhonen, J.; You, J. Peak signal-to-noise ratio revisited: Is simple beautiful? In Proceedings of the QoMEX, Melbourne, Australia, 5–7 July 2012; pp. 37–38. [Google Scholar] [CrossRef]
- Zhou, W.; Bovik, A.; Sheikh, H.; Simoncelli, E. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612. [Google Scholar] [CrossRef]
- Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the ICLR, San Diego, CA, USA, 7–9 May 2015. [Google Scholar] [CrossRef]
- Loshchilov, I.; Hutter, F. SGDR: Stochastic Gradient Descent with Warm Restarts. In Proceedings of the ICLR, Toulon, France, 24–26 April 2017. [Google Scholar] [CrossRef]
- Zhao, H.; Kong, X.; He, J.; Qiao, Y.; Dong, C. Efficient image super-resolution using pixel attention. In Proceedings of the ECCVW, Glasgow, UK, 23–28 August 2020; pp. 56–72. [Google Scholar] [CrossRef]
- Zhang, K.; Zuo, W.; Zhang, L. Deep Plug-and-Play Super-Resolution for Arbitrary Blur Kernels. In Proceedings of the CVPR, Long Beach, CA, USA, 15–20 June 2019; pp. 1671–1681. [Google Scholar] [CrossRef]
- Gao, G.; Li, W.; Li, J.; Wu, F.; Lu, H.; Yu, Y. Feature distillation interaction weighting network for lightweight image super-resolution. In Proceedings of the AAAI, Virtual, 22 February–1 March 2022; pp. 661–669. [Google Scholar] [CrossRef]
- Li, F.; Cong, R.; Wu, J.; Bai, H.; Wang, M.; Zhao, Y. SRConvNet: A Transformer-Style ConvNet for Lightweight Image Super-Resolution. Int. J. Comput. Vis. 2025, 133, 173–189. [Google Scholar] [CrossRef]
- Chen, Z.; Zhang, Y.; Gu, J.; Kong, L.; Yang, X. Recursive Generalization Transformer for Image Super-Resolution. In Proceedings of the ICLR, Vienna, Austria, 7–11 May 2024. [Google Scholar] [CrossRef]
- Gu, J.; Dong, C. Interpreting super-resolution networks with local attribution maps. In Proceedings of the CVPR, Nashville, TN, USA, 19–25 June 2021; pp. 9195–9204. [Google Scholar] [CrossRef]
| Scale | Model | Params [K] | FLOPs [G] | Set5 | Set14 | BSD100 | Urban100 | Manga109 |
|---|---|---|---|---|---|---|---|---|
| ×2 | Bicubic [1] | - | 0.029 | 33.66/0.9299 | 30.24/0.8688 | 29.56/0.8431 | 26.88/0.8403 | 30.80/0.9339 |
| ×2 | FSRCNN [13] | 12 | 6 | 36.99/0.9564 | 32.70/0.9093 | 31.50/0.8905 | 29.91/0.9020 | 36.44/0.9708 |
| ×2 | EDSR_baseline [8] | 1370 | 316 | 37.97/0.9605 | 33.61/0.9174 | 32.14/0.8993 | 31.98/0.9271 | 38.53/0.9769 |
| ×2 | CARN [7] | 1592 | 229 | 37.82/0.9601 | 33.59/0.9173 | 32.09/0.8985 | 31.96/0.9264 | 38.36/0.9765 |
| ×2 | IMDN [29] | 694 | 161 | 37.99/0.9605 | 33.67/0.9176 | 32.17/0.8994 | 32.17/0.9283 | 38.86/0.9773 |
| ×2 | PAN [47] | 261 | 71 | 37.99/0.9605 | 33.63/0.9179 | 32.16/0.8997 | 32.02/0.9272 | 38.68/0.9773 |
| ×2 | DPSR [48] | 1296 | 350 | 37.84/0.9601 | 33.55/0.9170 | 32.13/0.8993 | 31.91/0.9264 | 38.19/0.9764 |
| ×2 | LatticeNet [30] | 756 | 170 | 38.15/0.9610 | 33.78/0.9193 | 32.25/0.9005 | 32.43/0.9302 | - |
| ×2 | LCRCA [31] | 813 | 186 | 38.13/0.9610 | 33.69/0.9184 | 32.22/0.8999 | 32.36/0.9299 | - |
| ×2 | ShuffleMixer_base [32] | 394 | 91 | 38.00/0.9606 | 33.67/0.9179 | 32.15/0.8995 | 31.90/0.9256 | 38.80/0.9774 |
| ×2 | HNCT [34] | 360 | 82 | 38.08/0.9609 | 33.65/0.9184 | 32.23/0.9003 | 32.22/0.9296 | 38.87/0.9775 |
| ×2 | FDIWN [49] | 629 | 112 | 38.07/0.9608 | 33.75/0.9201 | 32.23/0.9003 | 32.40/0.9305 | 38.85/0.9774 |
| ×2 | HDSRNet [35] | 1820 | 291 | 37.94/0.9604 | 33.57/0.9169 | 32.13/0.8989 | 32.00/0.9266 | 38.30/0.9765 |
| ×2 | MAMN (Ours) | 302 | 80 | 38.12/0.9610 | 33.81/0.9194 | 32.28/0.9009 | 32.36/0.9302 | 39.21/0.9782 |
| ×3 | Bicubic [1] | - | 0.029 | 30.39/0.8682 | 27.55/0.7742 | 27.21/0.7385 | 24.46/0.7349 | 26.95/0.8556 |
| ×3 | FSRCNN [13] | 12 | 5 | 33.01/0.9142 | 29.53/0.8261 | 28.50/0.7890 | 26.40/0.8073 | 31.04/0.9217 |
| ×3 | EDSR_baseline [8] | 1555 | 160 | 34.37/0.9271 | 30.30/0.8416 | 29.08/0.8051 | 28.14/0.8525 | 33.44/0.9439 |
| ×3 | CARN [7] | 1592 | 119 | 34.33/0.9267 | 30.31/0.8414 | 29.06/0.8041 | 28.07/0.8500 | 33.50/0.9440 |
| ×3 | IMDN [29] | 703 | 76 | 34.36/0.9270 | 30.33/0.8415 | 29.09/0.8044 | 28.17/0.8519 | 33.60/0.9444 |
| ×3 | PAN [47] | 261 | 39 | 34.41/0.9272 | 30.37/0.8421 | 29.10/0.8049 | 28.10/0.8509 | 33.57/0.9447 |
| ×3 | DPSR [48] | 1296 | 194 | 34.36/0.9271 | 30.27/0.8417 | 29.09/0.8053 | 28.08/0.8512 | 33.30/0.9435 |
| ×3 | LatticeNet [30] | 765 | 76 | 34.53/0.9281 | 30.39/0.8424 | 29.15/0.8059 | 28.33/0.8538 | - |
| ×3 | LCRCA [31] | 822 | 84 | 34.51/0.9280 | 30.44/0.8432 | 29.15/0.8060 | 28.37/0.8558 | - |
| ×3 | ShuffleMixer_base [32] | 415 | 43 | 34.40/0.9272 | 30.37/0.8422 | 29.11/0.8051 | 28.08/0.8497 | 33.68/0.9447 |
| ×3 | HNCT [34] | 360 | 38 | 34.47/0.9278 | 30.44/0.8442 | 29.16/0.8072 | 28.29/0.8560 | 33.81/0.9461 |
| ×3 | FDIWN [49] | 645 | 52 | 34.52/0.9281 | 30.42/0.8438 | 29.14/0.8065 | 28.36/0.8567 | 33.77/0.9456 |
| ×3 | HDSRNet [35] | 2000 | 149 | 34.32/0.9268 | 30.28/0.8409 | 29.05/0.8041 | 28.01/0.8490 | 33.29/0.9431 |
| ×3 | MAMN (Ours) | 307 | 36 | 34.55/0.9284 | 30.55/0.8459 | 29.22/0.8082 | 28.43/0.8570 | 34.20/0.9478 |
| ×4 | Bicubic [1] | - | 0.029 | 28.42/0.8104 | 26.00/0.7027 | 25.96/0.6675 | 23.14/0.6577 | 24.89/0.7866 |
| ×4 | FSRCNN [13] | 12 | 5 | 30.73/0.8695 | 27.73/0.7592 | 27.00/0.7158 | 24.68/0.7313 | 28.00/0.8649 |
| ×4 | EDSR_baseline [8] | 1518 | 114 | 32.09/0.8938 | 28.58/0.7813 | 27.57/0.7358 | 26.03/0.7848 | 30.35/0.9067 |
| ×4 | CARN [7] | 1592 | 91 | 32.15/0.8948 | 28.61/0.7814 | 26.08/0.7845 | 26.07/0.7844 | 30.47/0.9084 |
| ×4 | IMDN [29] | 715 | 41 | 32.21/0.8948 | 28.58/0.7811 | 27.56/0.7353 | 26.04/0.7838 | 30.45/0.9075 |
| ×4 | PAN [47] | 261 | 22 | 32.13/0.8948 | 28.61/0.7822 | 27.60/0.7365 | 26.11/0.7854 | 30.51/0.9095 |
| ×4 | DPSR [48] | 1333 | 148 | 32.21/0.8956 | 28.68/0.7837 | 27.59/0.7365 | 26.15/0.7872 | 30.54/0.9097 |
| ×4 | LatticeNet [30] | 777 | 44 | 32.30/0.8962 | 28.68/0.7830 | 27.62/0.7367 | 26.25/0.7873 | - |
| ×4 | LCRCA [31] | 834 | 48 | 32.33/0.8963 | 28.68/0.7822 | 27.62/0.7357 | 26.23/0.7882 | - |
| ×4 | ShuffleMixer_base [32] | 411 | 28 | 32.21/0.8953 | 28.66/0.7827 | 27.62/0.7368 | 26.08/0.7835 | 30.65/0.9093 |
| ×4 | HNCT [34] | 370 | 22 | 32.30/0.8960 | 28.68/0.7833 | 27.64/0.7388 | 26.20/0.7900 | 30.70/0.9114 |
| ×4 | FDIWN [49] | 664 | 28 | 32.23/0.8955 | 28.66/0.7829 | 27.62/0.7380 | 26.27/0.7919 | 30.63/0.9098 |
| ×4 | HDSRNet [35] | 1970 | 108 | 32.14/0.8940 | 28.55/0.7804 | 27.56/0.7350 | 26.01/0.7832 | 30.36/0.9067 |
| ×4 | MAMN (Ours) | 314 | 21 | 32.35/0.8968 | 28.81/0.7856 | 27.70/0.7398 | 26.39/0.7929 | 31.04/0.9137 |
| Scale | Model | Params [M] | FLOPs [G] | Set5 | Set14 | BSD100 | Urban100 | Manga109 |
|---|---|---|---|---|---|---|---|---|
| ×2 | SAFMN_c36n8 [10] | 0.23 | 52 | 38.00/0.9605 | 33.59/0.9176 | 32.15/0.8994 | 31.85/0.9255 | 38.69/0.9771 |
| ×2 | SMFANet [36] | 0.19 | 41 | 38.08/0.9607 | 33.65/0.9185 | 32.22/0.9002 | 32.20/0.9282 | 39.11/0.9779 |
| ×2 | SRConvNet [50] | 0.39 | 74 | 38.00/0.9605 | 33.58/0.9186 | 32.16/0.8995 | 32.05/0.9272 | 38.87/0.9774 |
| ×2 | SwinIR [11] | 11.75 | 2301 | 38.42/0.9623 | 34.46/0.9250 | 32.53/0.9041 | 33.81/0.9427 | 39.92/0.9797 |
| ×2 | HAT [28] | 20.71 | 5554 | 38.63/0.9630 | 34.86/0.9274 | 32.62/0.9053 | 34.45/0.9466 | 40.26/0.9809 |
| ×2 | RGT [51] | 10.10 | 2255 | 38.59/0.9628 | 34.83/0.9271 | 32.62/0.9050 | 34.47/0.9467 | 40.34/0.9808 |
| ×2 | MAMN (Ours) | 0.30 | 80 | 38.12/0.9610 | 33.81/0.9194 | 32.28/0.9009 | 32.36/0.9302 | 39.21/0.9782 |
| ×3 | SAFMN_c36n8 [10] | 0.23 | 23 | 34.35/0.9268 | 30.34/0.8417 | 29.08/0.8048 | 27.94/0.8473 | 33.57/0.9437 |
| ×3 | SMFANet [36] | 0.19 | 19 | 34.42/0.9274 | 30.41/0.8430 | 29.16/0.8065 | 28.22/0.8523 | 33.96/0.9460 |
| ×3 | SRConvNet [50] | 0.39 | 33 | 34.40/0.9272 | 30.30/0.8416 | 29.07/0.8047 | 28.04/0.8500 | 33.56/0.9443 |
| ×3 | SwinIR [11] | 11.94 | 1026 | 34.97/0.9318 | 30.93/0.8534 | 29.46/0.8145 | 29.75/0.8826 | 35.12/0.9537 |
| ×3 | HAT [28] | 20.85 | 2499 | 35.07/0.9329 | 31.08/0.8555 | 29.54/0.8167 | 30.23/0.8896 | 35.53/0.9552 |
| ×3 | RGT [51] | 10.24 | 1015 | 35.15/0.9329 | 31.13/0.8550 | 29.55/0.8165 | 30.28/0.8899 | 35.55/0.9553 |
| ×3 | MAMN (Ours) | 0.31 | 36 | 34.55/0.9284 | 30.55/0.8459 | 29.22/0.8082 | 28.43/0.8570 | 34.20/0.9478 |
| ×4 | SAFMN_c36n8 [10] | 0.24 | 14 | 32.18/0.8948 | 28.60/0.7813 | 27.58/0.7360 | 25.97/0.7809 | 30.43/0.9063 |
| ×4 | SMFANet [36] | 0.20 | 11 | 32.25/0.8956 | 28.71/0.7833 | 27.64/0.7377 | 26.18/0.7862 | 30.82/0.9104 |
| ×4 | SRConvNet [50] | 0.38 | 22 | 32.18/0.8951 | 28.61/0.7818 | 27.57/0.7359 | 26.06/0.7845 | 30.35/0.9075 |
| ×4 | SwinIR [11] | 11.90 | 834 | 32.92/0.9044 | 29.09/0.7950 | 27.92/0.7489 | 27.45/0.8254 | 32.03/0.9260 |
| ×4 | HAT [28] | 20.82 | 1458 | 33.04/0.9056 | 29.23/0.7973 | 28.00/0.7517 | 27.97/0.8368 | 32.48/0.9292 |
| ×4 | RGT [51] | 10.20 | 592 | 33.12/0.9060 | 29.23/0.7972 | 28.00/0.7513 | 27.98/0.8369 | 32.50/0.9291 |
| ×4 | MAMN (Ours) | 0.31 | 21 | 32.35/0.8968 | 28.81/0.7856 | 27.70/0.7398 | 26.39/0.7929 | 31.04/0.9137 |
| Methods | DPSR [48] | LatticeNet [30] | HNCT [34] | ShuffleMixer [32] | SMFANet [36] | SRConvNet [50] | MAMN (Ours) |
|---|---|---|---|---|---|---|---|
| #Avg.Time [s] | 0.169 | 0.120 | 0.083 | 0.211 | 0.034 | 0.062 | 0.066 |
Table 4. Ablation study of MAMN (×4); entries report PSNR/SSIM changes relative to the baseline on Set5 and Manga109.

| Ablation | Variant | Params [K] | FLOPs [G] | Set5 | Manga109 |
|---|---|---|---|---|---|
| Baseline | - | 314 | 21 | 32.35/0.8968 | 31.04/0.9137 |
| Main module | MAML→None | 276 | 19 | ↓0.08/↓0.0011 | ↓0.17/↓0.0018 |
| Main module | LDEL→None | 240 | 17 | ↓0.13/↓0.0021 | ↓0.36/↓0.0043 |
| Main module | STL→None | 119 | 6 | ↓0.34/↓0.0038 | ↓0.74/↓0.0079 |
| MAML | w/o FA | 303 | 20 | ↓0.06/↓0.0008 | ↓0.07/↓0.0009 |
| MAML | w/o FM | 314 | 21 | ↓0.02/↓0.0004 | ↓0.03/↓0.0003 |
| MAML | w/o MC | 308 | 21 | ↓0.02/↑0.0001 | ↓0.04/↓0.0006 |
| MAML | w/o Down | 314 | 21 | ↓0.01/↓0.0002 | ↓0.03/↓0.0004 |
| MAML | AdaptiveMaxPool→AdaptiveAvgPool | 314 | 21 | ↓0.10/↓0.0014 | ↓0.02/↓0.0002 |
| MAML | AdaptiveMaxPool→Nearest interpolation | 314 | 21 | ↓0.07/↓0.0008 | ↓0.12/↓0.0015 |
| MAML | w/o VM | 314 | 21 | ↓0.06/↓0.0009 | ↓0.06/↓0.0007 |
| MAML | VM→Self-attention | 498 | 38 | ↑0.05/↑0.0004 | ↑0.07/↑0.0005 |
| Loss | w/o FFTLoss | 314 | 21 | ↓0.04/↓0.0005 | ↓0.01/↓0.0002 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, Z.; Zhang, G.; Tian, J.; Qi, R. Multi-Scale Adaptive Modulation Network for Efficient Image Super-Resolution. Electronics 2025, 14, 4404. https://doi.org/10.3390/electronics14224404