Infrared Image Super-Resolution Network Utilizing the Enhanced Transformer and U-Net
Abstract
1. Introduction
- We propose a novel infrared image SR reconstruction model, SwinAIR, which introduces the Residual Swin Transformer and Average Pooling Block into its deep feature extraction module. This design extracts both low- and high-frequency infrared features and fuses them effectively. Comparisons with existing methods show that SwinAIR delivers superior SR reconstruction of infrared images.
- We combine SwinAIR with U-Net [16] to construct SwinAIR-GAN, which further addresses SR reconstruction of real infrared images: SwinAIR serves as the generator and U-Net as the discriminator. Comparisons with similar methods show that SwinAIR-GAN generates infrared images with better visual quality and more realistic textures and details.
- We incorporate spectral normalization, dropout, and an artifact discrimination loss to suppress artifacts when restoring real infrared images and to enhance the generalization ability of SwinAIR-GAN. We also expand the degradation space of the degradation model to emulate the degradation of real infrared images more accurately. Together, these improvements yield infrared images whose textures and details closely match real-world images while avoiding over-smoothing.
- We build an infrared data acquisition system that simultaneously captures LR and HR infrared images of the same scene, addressing the lack of reference images in the real infrared image SR reconstruction task.
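The expanded degradation model mentioned above synthesizes LR training inputs from HR images by chaining blur, downsampling, and noise. As an illustration only — the kernel size, sigma, scale factor, and noise level below are assumptions for the sketch, not the paper's settings — a minimal NumPy version of such a pipeline:

```python
import numpy as np

def gaussian_kernel(size=7, sigma=1.5):
    """Normalized 1D Gaussian kernel (size/sigma are illustrative)."""
    ax = np.arange(size) - size // 2
    k = np.exp(-(ax ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, size=7, sigma=1.5):
    """Separable Gaussian blur with edge padding (shape-preserving)."""
    k = gaussian_kernel(size, sigma)
    pad = size // 2
    out = np.pad(img, pad, mode="edge")
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, out)
    return out

def degrade(hr, scale=2, noise_sigma=5.0, rng=None):
    """Blur -> strided downsample -> additive Gaussian noise, clipped to 8-bit range."""
    rng = np.random.default_rng(0) if rng is None else rng
    lr = blur(hr)[::scale, ::scale]
    lr = lr + rng.normal(0.0, noise_sigma, lr.shape)
    return np.clip(lr, 0.0, 255.0)
```

Real degradation pipelines (e.g., BSRGAN [13], Real-ESRGAN [14]) randomize kernel shapes, noise types, and operation order over a much larger space; this sketch only shows the basic blur–downsample–noise structure.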
2. Related Works
2.1. Traditional Infrared Image Super-Resolution Reconstruction Methods
2.2. Transformer-Based Image Super-Resolution Reconstruction Methods
2.3. GAN-Based Image Super-Resolution Reconstruction Methods
3. Proposed Method
3.1. SwinAIR
3.1.1. Overall Structure
3.1.2. Deep Feature Extraction Module
3.1.3. Loss Function
3.2. SwinAIR-GAN
3.2.1. Generator Network
3.2.2. Discriminator Network
3.2.3. Loss Function
3.2.4. Degradation Model
4. Experimental Details and Results
4.1. Datasets and Metrics
4.2. Model and Training Settings
4.3. SwinAIR
4.3.1. Comparisons with the State-of-the-Art Methods
4.3.2. Ablation Study
4.4. SwinAIR-GAN
4.4.1. Comparisons with the State-of-the-Art Methods
4.4.2. Ablation Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Henn, K.A.; Peduzzi, A. Surface Heat Monitoring with High-Resolution UAV Thermal Imaging: Assessing Accuracy and Applications in Urban Environments. Remote Sens. 2024, 16, 930. [Google Scholar] [CrossRef]
- Chen, X.; Letu, H.; Shang, H.; Ri, X.; Tang, C.; Ji, D.; Shi, C.; Teng, Y. Rainfall Area Identification Algorithm Based on Himawari-8 Satellite Data and Analysis of its Spatiotemporal Characteristics. Remote Sens. 2024, 16, 747. [Google Scholar] [CrossRef]
- Cheng, L.; He, Y.; Mao, Y.; Liu, Z.; Dang, X.; Dong, Y.; Wu, L. Personnel Detection in Dark Aquatic Environments Based on Infrared Thermal Imaging Technology and an Improved YOLOv5s Model. Sensors 2024, 24, 3321. [Google Scholar] [CrossRef]
- Calvin, W.M.; Littlefield, E.F.; Kratt, C. Remote sensing of geothermal-related minerals for resource exploration in Nevada. Geothermics 2015, 53, 517–526. [Google Scholar] [CrossRef]
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In Proceedings of the Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; Proceedings, Part IV 13. Springer: Berlin/Heidelberg, Germany, 2014; pp. 184–199. [Google Scholar]
- Kim, J.; Lee, J.K.; Lee, K.M. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654. [Google Scholar]
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144. [Google Scholar]
- Liu, Z.; Lin, Y.; Cao, Y.; Hu, H.; Wei, Y.; Zhang, Z.; Lin, S.; Guo, B. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 10012–10022. [Google Scholar]
- Chen, H.; Wang, Y.; Guo, T.; Xu, C.; Deng, Y.; Liu, Z.; Ma, S.; Xu, C.; Xu, C.; Gao, W. Pre-trained image processing transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 12299–12310. [Google Scholar]
- Liang, J.; Cao, J.; Sun, G.; Zhang, K.; Van Gool, L.; Timofte, R. SwinIR: Image restoration using Swin transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 1833–1844. [Google Scholar]
- Zhang, D.; Huang, F.; Liu, S.; Wang, X.; Jin, Z. SwinFIR: Revisiting the SwinIR with fast Fourier convolution and improved training for image super-resolution. arXiv 2022, arXiv:2208.11247. [Google Scholar]
- Chen, X.; Wang, X.; Zhou, J.; Qiao, Y.; Dong, C. Activating more pixels in image super-resolution transformer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 22367–22377. [Google Scholar]
- Zhang, K.; Liang, J.; Van Gool, L.; Timofte, R. Designing a practical degradation model for deep blind image super-resolution. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 4791–4800. [Google Scholar]
- Wang, X.; Xie, L.; Dong, C.; Shan, Y. Real-ESRGAN: Training real-world blind super-resolution with pure synthetic data. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 1905–1914. [Google Scholar]
- Lin, M.; Chen, Q.; Yan, S. Network in network. arXiv 2013, arXiv:1312.4400. [Google Scholar]
- Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional networks for biomedical image segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III 18. Springer: Berlin/Heidelberg, Germany, 2015; pp. 234–241. [Google Scholar]
- Wang, J.; Ralph, J.F.; Goulermas, J.Y. An analysis of a robust super resolution algorithm for infrared imaging. In Proceedings of the 6th International Symposium on Image and Signal Processing and Analysis, Salzburg, Austria, 16–18 September 2009; pp. 158–163. [Google Scholar]
- Choi, K.; Kim, C.; Kang, M.H.; Ra, J.B. Resolution improvement of infrared images using visible image information. IEEE Signal Process. Lett. 2011, 18, 611–614. [Google Scholar] [CrossRef]
- Mao, Y.; Wang, Y.; Zhou, J.; Jia, H. An infrared image super-resolution reconstruction method based on compressive sensing. Infrared Phys. Technol. 2016, 76, 735–739. [Google Scholar] [CrossRef]
- Deng, C.Z.; Tian, W.; Chen, P.; Wang, S.Q.; Zhu, H.S.; Hu, S.F. Infrared image super-resolution via locality-constrained group sparse model. Acta Phys. Sin. 2014, 63, 044202. [Google Scholar] [CrossRef]
- Yang, X.; Wu, W.; Hua, H.; Liu, K. Infrared image recovery from visible image by using multi-scale and multi-view sparse representation. In Proceedings of the 2015 11th International Conference on Signal-Image Technology & Internet-Based Systems (SITIS), Bangkok, Thailand, 23–27 November 2015; pp. 554–559. [Google Scholar]
- Yang, X.; Wu, W.; Liu, K.; Zhou, K.; Yan, B. Fast multisensor infrared image super-resolution scheme with multiple regression models. J. Syst. Archit. 2016, 64, 11–25. [Google Scholar] [CrossRef]
- Song, P.; Deng, X.; Mota, J.F.; Deligiannis, N.; Dragotti, P.L.; Rodrigues, M.R. Multimodal image super-resolution via joint sparse representations induced by coupled dictionaries. IEEE Trans. Comput. Imaging 2019, 6, 57–72. [Google Scholar] [CrossRef]
- Yao, T.; Luo, Y.; Hu, J.; Xie, H.; Hu, Q. Infrared image super-resolution via discriminative dictionary and deep residual network. Infrared Phys. Technol. 2020, 107, 103314. [Google Scholar] [CrossRef]
- Wang, Y.; Wang, L.; Liu, B.; Zhao, H. Research on blind super-resolution technology for infrared images of power equipment based on compressed sensing theory. Sensors 2021, 21, 4109. [Google Scholar] [CrossRef] [PubMed]
- Alonso-Fernandez, F.; Farrugia, R.A.; Bigun, J. Iris super-resolution using iterative neighbor embedding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 153–161. [Google Scholar]
- Ahmadi, S.; Burgholzer, P.; Jung, P.; Caire, G.; Ziegler, M. Super resolution laser line scanning thermography. Opt. Lasers Eng. 2020, 134, 106279. [Google Scholar] [CrossRef]
- Wang, Y.; Zhang, J.; Wang, L. Compressed Sensing Super-Resolution Method for Improving the Accuracy of Infrared Diagnosis of Power Equipment. Appl. Sci. 2022, 12, 4046. [Google Scholar] [CrossRef]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30, 5998–6008. [Google Scholar]
- Dosovitskiy, A.; Beyer, L.; Kolesnikov, A.; Weissenborn, D.; Zhai, X.; Unterthiner, T.; Dehghani, M.; Minderer, M.; Heigold, G.; Gelly, S.; et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv 2020, arXiv:2010.11929. [Google Scholar]
- Lin, X.; Sun, S.; Huang, W.; Sheng, B.; Li, P.; Feng, D.D. EAPT: Efficient attention pyramid transformer for image processing. IEEE Trans. Multimed. 2021, 25, 50–61. [Google Scholar] [CrossRef]
- Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 213–229. [Google Scholar]
- Ma, C.; Zhuo, L.; Li, J.; Zhang, Y.; Zhang, J. Cascade transformer decoder based occluded pedestrian detection with dynamic deformable convolution and gaussian projection channel attention mechanism. IEEE Trans. Multimed. 2023, 25, 1529–1537. [Google Scholar] [CrossRef]
- Arnab, A.; Dehghani, M.; Heigold, G.; Sun, C.; Lučić, M.; Schmid, C. ViViT: A video vision transformer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Montreal, QC, Canada, 11–17 October 2021; pp. 6836–6846. [Google Scholar]
- Junayed, M.S.; Islam, M.B. Consistent video inpainting using axial attention-based style transformer. IEEE Trans. Multimed. 2022, 25, 7494–7504. [Google Scholar] [CrossRef]
- Cao, Y.; Li, L.; Liu, B.; Zhou, W.; Li, Z.; Ni, W. CFMB-T: A cross-frequency multi-branch transformer for low-quality infrared remote sensing image super-resolution. Infrared Phys. Technol. 2023, 133, 104861. [Google Scholar] [CrossRef]
- Yi, S.; Li, L.; Liu, X.; Li, J.; Chen, L. HCTIRdeblur: A hybrid convolution-transformer network for single infrared image deblurring. Infrared Phys. Technol. 2023, 131, 104640. [Google Scholar] [CrossRef]
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680. [Google Scholar]
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690. [Google Scholar]
- Yan, B.; Bare, B.; Ma, C.; Li, K.; Tan, W. Deep objective quality assessment driven single image super-resolution. IEEE Trans. Multimed. 2019, 21, 2957–2971. [Google Scholar] [CrossRef]
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV) Workshops, Munich, Germany, 8–14 September 2018. [Google Scholar]
- Shang, T.; Dai, Q.; Zhu, S.; Yang, T.; Guo, Y. Perceptual extreme super-resolution network with receptive field block. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 440–441. [Google Scholar]
- Liu, S.; Yang, Y.; Li, Q.; Feng, H.; Xu, Z.; Chen, Y.; Liu, L. Infrared image super resolution using GAN with infrared image prior. In Proceedings of the 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 19–21 July 2019; pp. 1004–1009. [Google Scholar]
- Huang, Y.; Jiang, Z.; Wang, Q.; Jiang, Q.; Pang, G. Infrared image super-resolution via heterogeneous convolutional WGAN. In Proceedings of the PRICAI 2021: Trends in Artificial Intelligence: 18th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2021, Hanoi, Vietnam, 8–12 November 2021; Proceedings, Part II 18. Springer: Berlin/Heidelberg, Germany, 2021; pp. 461–472. [Google Scholar]
- Huang, Y.; Jiang, Z.; Lan, R.; Zhang, S.; Pi, K. Infrared image super-resolution via transfer learning and PSRGAN. IEEE Signal Process. Lett. 2021, 28, 982–986. [Google Scholar] [CrossRef]
- Liu, Q.M.; Jia, R.S.; Liu, Y.B.; Sun, H.B.; Yu, J.Z.; Sun, H.M. Infrared image super-resolution reconstruction by using generative adversarial network with an attention mechanism. Appl. Intell. 2021, 51, 2018–2030. [Google Scholar] [CrossRef]
- Lee, I.H.; Chung, W.Y.; Park, C.G. Style transformation super-resolution GAN for extremely small infrared target image. Pattern Recognit. Lett. 2023, 174, 1–9. [Google Scholar] [CrossRef]
- Kong, X.; Liu, X.; Gu, J.; Qiao, Y.; Dong, C. Reflash dropout in image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 6002–6012. [Google Scholar]
- Si, C.; Yu, W.; Zhou, P.; Zhou, Y.; Wang, X.; Yan, S. Inception transformer. Adv. Neural Inf. Process. Syst. 2022, 35, 23495–23509. [Google Scholar]
- Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv 2018, arXiv:1802.05957. [Google Scholar]
- Liang, J.; Zeng, H.; Zhang, L. Details or artifacts: A locally discriminative learning approach to realistic image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 5657–5666. [Google Scholar]
- Dierickx, B.; Meynants, G. Missing pixel correction algorithm for image sensors. In Advanced Focal Plane Arrays and Electronic Cameras II; SPIE: Bellingham, WA, USA, 1998; Volume 3410, pp. 200–203. [Google Scholar]
- Zhang, K.; Gool, L.V.; Timofte, R. Deep unfolding network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 3217–3226. [Google Scholar]
- Ji, X.; Cao, Y.; Tai, Y.; Wang, C.; Li, J.; Huang, F. Real-world super-resolution via kernel estimation and noise injection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 13–19 June 2020; pp. 466–467. [Google Scholar]
- Timofte, R.; Agustsson, E.; Van Gool, L.; Yang, M.H.; Zhang, L. Ntire 2017 challenge on single image super-resolution: Methods and results. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 114–125. [Google Scholar]
- González, A.; Fang, Z.; Socarras, Y.; Serrat, J.; Vázquez, D.; Xu, J.; López, A.M. Pedestrian detection at day/night time with visible and FIR cameras: A comparison. Sensors 2016, 16, 820. [Google Scholar] [CrossRef]
- Portmann, J.; Lynen, S.; Chli, M.; Siegwart, R. People detection and tracking from aerial thermal views. In Proceedings of the 2014 IEEE International Conference on Robotics and Automation (ICRA), Hong Kong, China, 31 May–7 June 2014; pp. 1794–1800. [Google Scholar]
- Iray-384 Image Database. Available online: http://openai.iraytek.com/apply/Universal_video.html/ (accessed on 21 May 2024).
- Iray-Ship Image Database. Available online: http://openai.raytrontek.com/apply/Sea_shipping.html/ (accessed on 21 May 2024).
- Iray-Aerial Photography Image Database. Available online: http://openai.iraytek.com/apply/Aerial_mancar.html/ (accessed on 21 May 2024).
- Iray-Security Image Database. Available online: http://openai.iraytek.com/apply/Infrared_security.html/ (accessed on 21 May 2024).
- Li, Z.; Yang, J.; Liu, Z.; Yang, X.; Jeon, G.; Wu, W. Feedback network for image super-resolution. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 3867–3876. [Google Scholar]
- Zhang, K.; Zuo, W.; Zhang, L. Learning a single convolutional super-resolution network for multiple degradations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3262–3271. [Google Scholar]
- Sajjadi, M.S.; Scholkopf, B.; Hirsch, M. Enhancenet: Single image super-resolution through automated texture synthesis. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4491–4500. [Google Scholar]
- Liang, S.; Song, K.; Zhao, W.; Li, S.; Yan, Y. DASR: Dual-Attention Transformer for infrared image super-resolution. Infrared Phys. Technol. 2023, 133, 104837. [Google Scholar] [CrossRef]
- Wei, W.; Sun, Y.; Zhang, L.; Nie, J.; Zhang, Y. Boosting one-shot spectral super-resolution using transfer learning. IEEE Trans. Comput. Imaging 2020, 6, 1459–1470. [Google Scholar] [CrossRef]
- Zhang, B.; Ma, M.; Wang, M.; Hong, D.; Yu, L.; Wang, J.; Gong, P.; Huang, X. Enhanced resolution of FY4 remote sensing visible spectrum images utilizing super-resolution and transfer learning techniques. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2022, 15, 7391–7399. [Google Scholar] [CrossRef]
Name | Input Channels | Output Channels | Kernel Size | Stride |
---|---|---|---|---|
Conv-1 | 3 | 64 | 3 | 1 |
Conv-2 | 64 | 128 | 3 | 2 |
Conv-3 | 128 | 256 | 3 | 2 |
Conv-4 | 256 | 512 | 3 | 2 |
Conv-5 | 512 | 256 | 3 | 1 |
Conv-6 | 256 | 128 | 3 | 1 |
Conv-7 | 128 | 64 | 3 | 1 |
Conv-8 | 64 | 64 | 3 | 1 |
Conv-9 | 64 | 64 | 3 | 1 |
Conv-10 | 64 | 1 | 3 | 1 |
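The convolution stack above (presumably the U-Net discriminator; spectral normalization and skip connections are omitted here) permits a quick back-of-envelope cost check: a conv layer costs roughly H_out · W_out · C_in · C_out · k² multiply–accumulates. The sketch below treats the layers as a plain sequential stack with "same" padding and ignores any up/down-sampling other than conv stride, so it is a rough illustration of the formula, not the network's true cost:

```python
# Each tuple mirrors a row of the table: (in_channels, out_channels, kernel, stride).
LAYERS = [
    (3, 64, 3, 1), (64, 128, 3, 2), (128, 256, 3, 2), (256, 512, 3, 2),
    (512, 256, 3, 1), (256, 128, 3, 1), (128, 64, 3, 1),
    (64, 64, 3, 1), (64, 64, 3, 1), (64, 1, 3, 1),
]

def conv_macs(height, width, layers):
    """Multiply-accumulate count for a sequential conv stack ('same' padding assumed)."""
    total = 0
    for cin, cout, k, s in layers:
        height, width = height // s, width // s   # output spatial size after stride
        total += height * width * cin * cout * k * k
    return total

print(f"{conv_macs(128, 128, LAYERS) / 1e9:.2f} GMACs for a 128x128 input")
```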
Camera | Model | Resolution | Focal Length | Horizontal Field of View | Vertical Field of View |
---|---|---|---|---|---|
Iray | M3384012Y01312X | 384 × 288 | 13 mm | 20° | 15° |
Jing Lin Chengdu | RTD6122C | 640 × 512 | 18 mm | 14.8° | 10.8° |
Each cell reports PSNR (dB)/SSIM.

Scale | Method | Params | CVC14 | FLIR | Iray-384 | Iray-Security | Iray-Ship | Iray-Aerial Photography |
---|---|---|---|---|---|---|---|---|
×2 | Bicubic | - | 39.57/0.9570 | 32.23/0.7633 | 29.47/0.8980 | 33.09/0.9114 | 34.42/0.9287 | 32.37/0.9263 |
×2 | SRCNN [5] | 0.1M | 39.93/0.9597 | 32.60/0.7834 | 31.06/0.9205 | 34.21/0.9291 | 35.17/0.9417 | 34.36/0.9480 |
×2 | EDSR [7] | 43.0M | 42.53/0.9666 | 33.17/0.7930 | 32.16/0.9314 | 35.56/0.9383 | 36.51/0.9492 | 35.71/0.9586 |
×2 | SRMD [63] | 15.1M | 43.62/0.9726 | 34.40/0.8193 | 33.14/0.9363 | 36.36/0.9439 | 37.67/0.9539 | 36.74/0.9618 |
×2 | SRMDNF [63] | 15.1M | 43.77/0.9730 | 34.45/0.8202 | 33.32/0.9380 | 36.62/0.9454 | 37.82/0.9548 | 36.92/0.9630 |
×2 | SRFBN [62] | 3.5M | 43.96/0.9736 | 34.53/0.8214 | 33.66/0.9409 | 37.25/0.9484 | 38.10/0.9560 | 37.21/0.9643 |
×2 | SwinIR [10] | 11.5M | 44.02/0.9737 | 34.53/0.8208 | 33.77/0.9418 | 37.22/0.9484 | 37.98/0.9563 | 37.30/0.9648 |
×2 | HAT [12] | 20.8M | 42.02/0.9701 | 33.10/0.7964 | 32.35/0.9320 | 36.64/0.9468 | 36.67/0.9504 | 36.60/0.9610 |
×2 | SwinAIR (Ours) | 9.2M | 43.97/0.9737 | 34.55/0.8221 | 33.81/0.9420 | 37.40/0.9488 | 38.10/0.9568 | 37.34/0.9648 |
×3 | Bicubic | - | 36.68/0.9274 | 30.43/0.6947 | 27.03/0.8324 | 30.82/0.8575 | 31.68/0.8807 | 29.32/0.8596 |
×3 | SRCNN [5] | 0.1M | 37.59/0.9352 | 30.92/0.7172 | 28.10/0.8601 | 31.72/0.8803 | 32.51/0.8982 | 30.76/0.8911 |
×3 | EDSR [7] | 43.0M | 39.29/0.9444 | 31.45/0.7286 | 28.85/0.8774 | 32.57/0.8943 | 33.68/0.9090 | 31.78/0.9110 |
×3 | SRMD [63] | 15.3M | 40.30/0.9530 | 32.63/0.7626 | 29.97/0.8840 | 33.61/0.9043 | 34.79/0.9157 | 32.95/0.9167 |
×3 | SRMDNF [63] | 15.3M | 40.43/0.9535 | 32.69/0.7637 | 30.06/0.8859 | 33.70/0.9054 | 34.87/0.9167 | 33.04/0.9181 |
×3 | SRFBN [62] | 3.5M | 40.81/0.9559 | 32.84/0.7662 | 30.38/0.8925 | 34.06/0.9099 | 35.18/0.9206 | 33.36/0.9228 |
×3 | SwinIR [10] | 11.5M | 40.87/0.9562 | 32.84/0.7669 | 30.42/0.8932 | 33.97/0.9100 | 35.15/0.9203 | 33.37/0.9230 |
×3 | HAT [12] | 20.8M | 39.94/0.9442 | 31.59/0.7285 | 29.50/0.8779 | 32.91/0.8974 | 33.94/0.9103 | 32.45/0.9146 |
×3 | SwinAIR (Ours) | 9.3M | 40.85/0.9561 | 32.85/0.7673 | 30.49/0.8939 | 34.15/0.9108 | 35.18/0.9212 | 33.38/0.9233 |
×4 | Bicubic | - | 34.34/0.8985 | 29.19/0.6432 | 25.71/0.7817 | 29.83/0.8213 | 30.22/0.8463 | 27.70/0.8060 |
×4 | SRCNN [5] | 0.1M | 35.09/0.9049 | 29.62/0.6645 | 26.50/0.8069 | 30.36/0.8409 | 30.84/0.8620 | 28.76/0.8358 |
×4 | EDSR [7] | 43.0M | 36.84/0.9212 | 30.18/0.6813 | 27.14/0.8280 | 31.10/0.8592 | 31.20/0.8754 | 29.74/0.8642 |
×4 | SRMD [63] | 15.5M | 37.87/0.9322 | 31.38/0.7199 | 28.29/0.8372 | 32.20/0.8725 | 33.20/0.8844 | 30.89/0.8725 |
×4 | SRMDNF [63] | 15.5M | 37.98/0.9333 | 31.40/0.7210 | 28.36/0.8405 | 32.34/0.8750 | 33.21/0.8859 | 30.99/0.8755 |
×4 | SRFBN [62] | 3.5M | 38.34/0.9367 | 31.57/0.7249 | 28.63/0.8483 | 32.55/0.8792 | 33.46/0.8905 | 31.24/0.8812 |
×4 | SwinIR [10] | 11.5M | 38.38/0.9372 | 31.54/0.7251 | 28.58/0.8482 | 32.52/0.8794 | 33.46/0.8907 | 31.22/0.8815 |
×4 | HAT [12] | 20.8M | 37.39/0.9173 | 30.02/0.6986 | 27.47/0.8283 | 31.49/0.8635 | 32.01/0.8811 | 30.61/0.8720 |
×4 | SwinAIR (Ours) | 9.3M | 38.52/0.9380 | 31.64/0.7272 | 28.78/0.8512 | 32.66/0.8811 | 33.51/0.8917 | 31.33/0.8835 |
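PSNR, used throughout the comparisons above, follows directly from the mean squared error between the reconstruction and the ground truth; a minimal reference implementation (assuming 8-bit images, i.e., peak value 255):

```python
import numpy as np

def psnr(ref, test, peak=255.0):
    """Peak signal-to-noise ratio in dB (peak=255 assumes 8-bit images)."""
    mse = np.mean((ref.astype(np.float64) - test.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")   # identical images
    return 10.0 * np.log10(peak * peak / mse)
```

SSIM additionally compares local luminance, contrast, and structure statistics and is typically computed with a library implementation such as scikit-image's `structural_similarity`.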
Method | FLOPs (G) | Inference Speed (ms) |
---|---|---|
EDSR [7] | 205.83 | 10 |
SRFBN [62] | 530.93 | 22 |
SwinIR [10] | 49.28 | 32 |
HAT [12] | 82.04 | 77 |
SwinAIR (Ours) | 37.51 | 42 |
Number of Branches | Avgpool + Linear | Maxpool + Linear | PSNR (dB) | SSIM |
---|---|---|---|---|
1 | ✗ | ✓ | 38.19 | 0.9361 |
1 | ✓ | ✗ | 38.20 | 0.9364 |
2 | ✓ | ✗ | 38.52 | 0.9380 |
2 | ✗ | ✓ | 38.51 | 0.9379 |
2 | ✓ | ✓ | 37.96 | 0.9326 |
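The "Avgpool + Linear" branch ablated above follows the Inception Transformer-style low-frequency path [49]: the feature map is average-pooled, passed through a linear (pointwise) layer, and upsampled back to the original resolution. The shapes, pooling window, and nearest-neighbor upsampling below are illustrative assumptions, not the paper's exact configuration:

```python
import numpy as np

def low_freq_branch(x, w, pool=2):
    """x: (H, W, C) feature map; w: (C, C) linear weight.
    Average-pool downsample -> pointwise linear -> nearest-neighbor upsample."""
    h, wid, c = x.shape
    # non-overlapping pool x pool average pooling
    pooled = x.reshape(h // pool, pool, wid // pool, pool, c).mean(axis=(1, 3))
    proj = pooled @ w                 # linear layer applied at each position
    # nearest-neighbor upsample back to (H, W, C)
    return proj.repeat(pool, axis=0).repeat(pool, axis=1)
```

Replacing `.mean(axis=(1, 3))` with `.max(axis=(1, 3))` gives the "Maxpool + Linear" variant compared in the ablation.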
Each cell reports PSNR (dB)/SSIM.

Method | Self-Built Dataset (BI) | Self-Built Dataset (BD) | Iray-384 (BI) | Iray-384 (BD) | ASL-TID (BI) | ASL-TID (BD) |
---|---|---|---|---|---|---|
BSRGAN [13] | 31.98/0.8706 | 31.62/0.8663 | 25.58/0.7669 | 25.47/0.7773 | 33.26/0.8750 | 32.87/0.8703 |
Real-ESRGAN [14] | 30.94/0.8630 | 30.25/0.8513 | 25.10/0.7681 | 24.66/0.7703 | 32.66/0.8787 | 32.06/0.8704 |
SwinIR-GAN [10] | 29.91/0.8234 | 29.67/0.8210 | 24.64/0.7548 | 24.59/0.7674 | 32.87/0.8768 | 32.53/0.8725 |
SwinAIR-GAN (Ours) | 33.47/0.9106 | 32.93/0.9043 | 25.89/0.7841 | 25.86/0.8037 | 33.82/0.9059 | 33.30/0.8992 |
Each cell reports NIQE/PI (lower is better).

Method | Self-Built Dataset | Iray-384 | ASL-TID |
---|---|---|---|
BSRGAN [13] | 6.3307/5.8951 | 4.3722/4.4544 | 5.7031/5.1755 |
Real-ESRGAN [14] | 6.2912/6.0504 | 4.4617/4.3883 | 6.1816/5.6404 |
SwinIR-GAN [10] | 6.4183/6.0457 | 4.0101/4.0049 | 5.2140/4.9301 |
SwinAIR-GAN (Ours) | 6.0089/5.6015 | 3.9406/3.9722 | 5.4822/4.9886 |
Dropout | Unknown Degradation (NIQE/PI) 1 | BI Degradation (PSNR (dB)/SSIM) 2 | BD Degradation (PSNR (dB)/SSIM) 2 |
---|---|---|---|
✗ | 6.3053/5.7188 | 32.46/0.8904 | 32.22/0.8924 |
✓ | 6.0089/5.6015 | 33.47/0.9106 | 32.93/0.9043 |
Dropout | Unknown Degradation (NIQE/PI) 1 | BI Degradation (PSNR (dB)/SSIM) 2 | BD Degradation (PSNR (dB)/SSIM) 2 |
---|---|---|---|
✗ | 6.6458/6.0230 | 32.67/0.8969 | 32.37/0.8934 |
✓ | 6.0089/5.6015 | 33.47/0.9106 | 32.93/0.9043 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Huang, F.; Li, Y.; Ye, X.; Wu, J. Infrared Image Super-Resolution Network Utilizing the Enhanced Transformer and U-Net. Sensors 2024, 24, 4686. https://doi.org/10.3390/s24144686