DSConv+LR: A Minimalist Lightweight Network for Image Super-Resolution
Abstract
1. Introduction
- Depthwise separable convolution (DSConv)—for drastic parameter reduction.
- Lightweight channel attention (HAT)—to potentially compensate for information loss.
- Local residual connection (LR)—to improve gradient flow.
- A minimalist yet highly effective lightweight SR network derived from VDSR, along with a clear statement of its limitations.
- A rigorous ablation study that isolates the contributions of depthwise separable convolution, channel attention, and local residual connections.
- An honest negative result showing that a widely used lightweight attention module does not improve performance in this ultra-lightweight setting, supported by controlled experiments (increased training, reduction ratio r = 2, and standard convolution baseline).
- A strong emphasis on local residual connections as a simple, parameter-free tool for enhancing lightweight SR models, while acknowledging that such simplifications come at a cost.
2. Related Work
2.1. Deep SR Networks
2.2. Lightweight SR Networks
2.3. Attention Mechanisms in SR
3. Proposed Method
3.1. Baseline: VDSR
3.2. Building Blocks
- DSConv block: a single depthwise separable convolution followed by ReLU, as defined in Equation (1):

  $$y = \mathrm{ReLU}(\mathrm{DSConv}(x)) \tag{1}$$

  A depthwise separable convolution factorises a standard convolution into two separate operations. First, a depthwise convolution applies a single k × k filter per input channel. Second, a pointwise (1 × 1) convolution combines the outputs across channels. This reduces the parameter count from C_in × C_out × k² to C_in × k² + C_in × C_out (bias terms excluded). For our setting (C_in = C_out = 64, k = 3), a standard convolution has 64 × 64 × 9 = 36,864 parameters, whereas a depthwise separable convolution has 64 × 9 + 64 × 64 = 576 + 4096 = 4672 parameters, a reduction of about 87%. Despite this efficiency, the decoupling of spatial filtering from channel mixing can reduce representational power.
- HAT block: DSConv + ReLU + a lightweight channel attention module (HAT), given by Equation (2):

  $$y = \mathrm{HAT}(\mathrm{ReLU}(\mathrm{DSConv}(x))) \tag{2}$$

  HAT uses global average pooling (GAP) to squeeze spatial information, then two linear layers with a reduction ratio r = 4 followed by a Sigmoid activation to produce channel-wise weights. The weights are multiplied element-wise with the feature map to recalibrate channels. The structure is: GAP → a fully connected layer reducing the dimension from 64 to 16 → ReLU → a fully connected layer expanding back to 64 → Sigmoid. This module adds about two thousand parameters per block: the two linear layers have 64 × 16 + 16 × 64 = 2048 weights, so the total per block becomes 4672 + 2048 = 6720 parameters. The intention is to compensate for the loss of cross-channel information caused by depthwise separable convolution.
- LR block (proposed): DSConv + ReLU + a local residual connection that adds the input of the block to its output, as shown in Equation (3). No attention is used.

  $$y = \mathrm{ReLU}(\mathrm{DSConv}(x)) + x \tag{3}$$
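
The per-block arithmetic in the list above can be checked with a few lines of Python. This is a minimal sketch that counts weights only (no bias terms, matching the figures quoted in the text); the function names are illustrative and are not taken from the authors' code.

```python
def standard_conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a standard k x k convolution (bias excluded)."""
    return c_in * c_out * k * k


def dsconv_params(c_in: int, c_out: int, k: int) -> int:
    """Weights of a depthwise separable convolution (bias excluded)."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


def channel_attention_params(channels: int, r: int) -> int:
    """Weights of the two linear layers, channels -> channels/r -> channels."""
    hidden = channels // r
    return channels * hidden + hidden * channels


C, K, R = 64, 3, 4  # channel width, kernel size, reduction ratio from the text

std = standard_conv_params(C, C, K)
ds = dsconv_params(C, C, K)
att = channel_attention_params(C, R)
reduction = 1 - ds / std

print(f"standard conv: {std}, DSConv: {ds}, attention: {att}, HAT block: {ds + att}")
print(f"DSConv saves {reduction:.1%} of the standard convolution's weights")
```

With C = 64, k = 3, and r = 4 this reproduces the 36,864, 4672, 2048, and 6720 figures and a reduction of roughly 87%.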
3.3. Network Architecture
3.4. Training Details
4. Experiments
4.1. Datasets and Preprocessing
4.2. Implementation
- Framework: PyTorch 2.0.
- GPU: NVIDIA RTX4090D.
- Batch size: 64.
- Number of blocks: N = 10 for all lightweight models.
- Training epochs: 80.
4.3. Ablation Study
- VDSR (baseline): the original 20-layer standard convolution network, without any of our modifications. It serves as the performance upper bound in terms of PSNR. It also has the largest parameter count (665,921).
- DSConv: replaces the 18 intermediate standard convolutional layers with N = 10 DSConv blocks (no attention, no local residual). This model has only 49,217 parameters. Its PSNR drops to 35.01 dB, a loss of 0.32 dB compared to VDSR.
- DSConv+HAT: adds the lightweight channel attention module to each DSConv block. This increases parameters to 69,697. Surprisingly, the PSNR remains 35.01 dB—identical to DSConv. This shows that HAT does not help in this ultra-lightweight setting.
- Efficient Hybrid Attention Super-Resolution (Eff-HASR): adds local residual connections to the HAT blocks (i.e., DSConv + HAT + local residual). Its parameter count is still 69,697. PSNR improves to 35.23 dB. This +0.22 dB gain over DSConv+HAT is entirely due to the local residual connection.
- DSConv+LR (ours): removes HAT but keeps the local residual connection. It uses only 49,217 parameters (the same as DSConv) and achieves a PSNR of 35.21 dB. Compared to DSConv, this is a +0.20 dB gain with no extra parameters. Compared to Eff-HASR, it is 0.02 dB lower (35.21 vs. 35.23 dB), which is within the reported ±0.04 dB spread, but uses about 29% fewer parameters (49,217 vs. 69,697).
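
The parameter totals quoted in this list can be reproduced from the layer layout. The sketch below is a consistency check under assumptions that the text does not state explicitly: single-channel input and output images, 64 feature channels, 3 × 3 kernels, a bias term on every convolution, and bias-free linear layers inside HAT. All function names are illustrative.

```python
C, K = 64, 3  # feature channels and kernel size (Section 3.2)


def conv(c_in: int, c_out: int, k: int = K) -> int:
    """Standard convolution: weights plus one bias per output channel (assumed)."""
    return c_in * c_out * k * k + c_out


def dsconv(c: int = C, k: int = K) -> int:
    """Depthwise and pointwise convolutions, each with bias (assumed)."""
    depthwise = c * k * k + c
    pointwise = c * c + c
    return depthwise + pointwise


def hat(c: int = C, r: int = 4) -> int:
    """Two linear layers c -> c/r -> c, assumed to carry no bias."""
    return 2 * c * (c // r)


def head_and_tail() -> int:
    """First (1 -> C) and last (C -> 1) convolutions, assuming 1-channel images."""
    return conv(1, C) + conv(C, 1)


vdsr = head_and_tail() + 18 * conv(C, C)             # 20-layer baseline
dsconv_net = head_and_tail() + 10 * dsconv()         # N = 10 DSConv blocks
dsconv_hat_net = head_and_tail() + 10 * (dsconv() + hat())
saving = 1 - dsconv_net / dsconv_hat_net

print(vdsr, dsconv_net, dsconv_hat_net)
print(f"DSConv+LR vs. Eff-HASR: {saving:.1%} fewer parameters")
```

Under these assumptions the three totals come out as 665,921, 49,217, and 69,697, matching the reported values. DSConv+LR and Eff-HASR share the totals of DSConv and DSConv+HAT respectively, because the residual addition has no parameters.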
4.4. Comparison with State-of-the-Art Lightweight Methods
4.5. Extension to ×4 Super-Resolution
4.6. Qualitative Results
5. Discussion
5.1. Why Did Channel Attention Not Help?
5.2. The Importance of Local Residual Connections
5.3. Fast Convergence
5.4. Limitations of the Lightweight Design
- It cannot fully match VDSR’s PSNR (0.12 dB lower on Set5).
- It lags behind larger lightweight networks such as CARN and IMDN by about 2.0–2.1 dB on Set5 (0.8–1.6 dB on Set14, BSD100, and Urban100), but uses 9–14× fewer parameters.
- Compared to FSRCNN (34.07 dB), DSConv+LR achieves 35.21 dB with fewer parameters (49.2 K vs. 58.0 K) and better perceptual quality (perceptual loss 0.2556 vs. 0.295).
- The model is not state-of-the-art; it is intended for extreme resource-constrained scenarios where memory and computation are the primary constraints.
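
The gaps and ratios quoted in this list can be recomputed from the Set5 figures in the comparison table. A small sketch, with the numbers copied from that table and variable names chosen for illustration:

```python
# (parameters in thousands, Set5 PSNR in dB), copied from the comparison table
results = {
    "VDSR": (665.9, 35.33),
    "DSConv+LR": (49.2, 35.21),
    "FSRCNN": (58.0, 34.07),
    "CARN": (456.8, 37.25),
    "IMDN": (694.0, 37.35),
}

ours_params, ours_psnr = results["DSConv+LR"]

comparison = {}
for name, (params, psnr) in results.items():
    if name == "DSConv+LR":
        continue
    psnr_gap = round(psnr - ours_psnr, 2)         # positive: the other model scores higher
    param_ratio = round(params / ours_params, 1)  # how many times larger the other model is
    comparison[name] = (psnr_gap, param_ratio)
    print(f"{name:8s} PSNR gap {psnr_gap:+.2f} dB, {param_ratio:.1f}x parameters")
```

On Set5 this gives gaps of 2.04 dB (CARN) and 2.14 dB (IMDN) at roughly 9.3× and 14.1× the parameter count of DSConv+LR, and a 0.12 dB gap to VDSR.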
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Image super-resolution using deep convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307.
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1646–1654.
- Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 8–16 October 2016; Springer: Cham, Switzerland, 2016; pp. 391–407.
- Shi, W.; Caballero, J.; Huszár, F.; Totz, J.; Aitken, A.P.; Bishop, R.; Rueckert, D.; Wang, Z. Real-time single image and video super-resolution using an efficient sub-pixel convolutional neural network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1874–1883.
- Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 252–268.
- Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1637–1645.
- Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 624–632.
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861.
- Zhang, L.; Li, H.; Liu, X.; Niu, J.; Wu, J. MobileSR: Efficient Convolutional Neural Network for Super-resolution. In Proceedings of the 2020 IEEE Global Communications Conference (GLOBECOM), Taipei, Taiwan, 7–11 December 2020; pp. 1–6.
- Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia (ACM MM), Nice, France, 21–25 October 2019; ACM: New York, NY, USA, 2019; pp. 2024–2032.
- Cao, W.; Lei, X.; Shi, J.; Liang, W.; Liu, J.; Bai, Z. HASN: Hybrid attention separable network for efficient image super-resolution. Vis. Comput. 2025, 41, 3423–3435.
- Zhao, C.; Dong, G.; Zhang, S.; Tan, Z.; Basu, A. Frequency regularization: Reducing information redundancy in convolutional neural networks. IEEE Access 2023, 11, 106793–106802.
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Springer: Cham, Switzerland, 2018; pp. 286–301.
- Wang, Q.; Wu, B.; Zhu, P.; Li, P.; Zuo, W.; Hu, Q. ECA-Net: Efficient channel attention for deep convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 14–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 11534–11542.
- Zhang, Y.; Tian, Y.; Kong, Y.; Zhong, B.; Fu, Y. Residual dense network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2472–2481.




| Model | Params (K) | PSNR (dB) | Note |
|---|---|---|---|
| VDSR (baseline) | 665.9 | 35.33 ± 0.05 | |
| DSConv (no LR) | 49.2 | 35.01 ± 0.04 | |
| DSConv+HAT (r = 4) | 69.7 | 35.01 ± 0.05 | No gain |
| DSConv+HAT (r = 2) | 71.3 | 35.01 ± 0.04 | Still no gain |
| DSConv+HAT (160 epochs) | 69.7 | 35.02 ± 0.04 | Longer training not helpful |
| Eff-HASR (DSConv+HAT+LR) | 69.7 | 35.23 ± 0.04 | Gain from LR, not HAT |
| DSConv+LR (ours) | 49.2 | 35.21 ± 0.04 | Best efficiency |

| Model | Params (K) | PSNR (dB) | SSIM | Source |
|---|---|---|---|---|
| VDSR | 665.9 | 35.33 | 0.9359 | Our reimpl. |
| DSConv+LR (ours) | 49.2 | 35.21 | 0.9369 | Our reimpl. |
| FSRCNN (reimpl.) | 58.0 | 34.07 | 0.9105 | Our reimpl. |
| CARN (reimpl.) | 456.8 | 37.25 | 0.9580 | Our reimpl. |
| IMDN (reimpl.) | 694.0 | 37.35 | 0.9587 | Our reimpl. |

| Model | Set5 | Set14 | BSD100 | Urban100 |
|---|---|---|---|---|
| VDSR | 35.33/0.9359 | 33.42/0.8868 | 33.08/0.8676 | 32.30/0.8566 |
| DSConv+LR (ours) | 35.21/0.9369 | 33.41/0.8899 | 33.10/0.8722 | 32.34/0.8609 |
| FSRCNN | 34.07/0.9105 | 32.73/0.8640 | 32.65/0.8495 | 31.76/0.8288 |
| CARN (reimpl.) | 37.25/0.9580 | 34.72/0.9138 | 33.93/0.8971 | 33.86/0.9143 |
| IMDN (reimpl.) | 37.35/0.9587 | 34.69/0.9129 | 33.95/0.8972 | 33.91/0.9134 |

| Dataset | Set5 | Set14 | BSD100 | Urban100 |
|---|---|---|---|---|
| PSNR (dB) | 32.26 | 31.48 | 31.45 | 30.92 |
| SSIM | 0.8191 | 0.7209 | 0.6864 | 0.6759 |

| Experiment | Model | Params (K) | PSNR (dB) |
|---|---|---|---|
| Baseline | DSConv+LR | 49.2 | 35.21 |
| Original HAT | DSConv+HAT (r = 4) | 69.7 | 35.01 |
| (1) Longer training | DSConv+HAT (160 epochs) | 69.7 | 35.02 |
| (2) Less aggressive reduction | DSConv+HAT (r = 2) | 71.3 | 35.01 |
| (3) HAT on standard conv | Conv+HAT (no DSConv) | 665.9 | 35.45 (baseline 35.40) |

| N | DSConv (without LR) PSNR (dB) | DSConv+LR (with LR) PSNR (dB) | Gain (dB) |
|---|---|---|---|
| 5 | 34.89 | 35.23 | +0.34 |
| 10 | 35.01 | 35.21 | +0.20 |
| 15 | 34.92 | 35.21 | +0.29 |
| 20 | 34.92 | 35.23 | +0.31 |

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Hu, Q.; Tian, J.; Jiang, G.; Xue, S.; Wang, J. DSConv+LR: A Minimalist Lightweight Network for Image Super-Resolution. Electronics 2026, 15, 2637. https://doi.org/10.3390/electronics15122637

