Article

Infrared Image Super-Resolution via Progressive Compact Distillation Network

Kefeng Fan, Kai Hong and Fei Li
1 Information Technology Research Center, China Electronics Standardization Institute, Beijing 100007, China
2 Department of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin 541004, China
3 PengCheng Lab., Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Electronics 2021, 10(24), 3107; https://doi.org/10.3390/electronics10243107
Submission received: 28 October 2021 / Revised: 6 December 2021 / Accepted: 7 December 2021 / Published: 14 December 2021
(This article belongs to the Special Issue Multimedia Processing: Challenges and Prospects)

Abstract: Deep convolutional neural networks are capable of achieving remarkable performance in single-image super-resolution (SISR). However, infrared images are difficult to obtain in quantity, and heavy network architectures trained on such insufficient data are confronted by excessive parameters and computational complexity. To address these issues, we propose a lightweight progressive compact distillation network (PCDN) with a transfer learning strategy to achieve infrared image super-resolution reconstruction from only a few samples. We design a progressive feature residual distillation (PFRD) block to efficiently refine hierarchical features, and parallel dilation convolutions are utilized to expand the PFRD's receptive field, thereby maximizing the characterization power of marginal features while minimizing the network parameters. Moreover, a bil-global connection mechanism and a difference calculation algorithm between two adjacent PFRDs are proposed to accelerate network convergence and extract high-frequency information, respectively. Furthermore, we introduce transfer learning to fine-tune the network weights with few-shot infrared images and obtain the infrared image mapping information. Experimental results demonstrate the effectiveness and superiority of the proposed framework for infrared image super-resolution at low computational load. Notably, our PCDN outperforms existing methods on two public datasets for both ×2 and ×4 scaling factors with fewer than 240 k parameters, proving its efficient and excellent reconstruction performance.

1. Introduction

Infrared imaging plays a significant role in military target detection, video surveillance [1], biomedical science [2], and other areas [3,4,5]. However, infrared images generally suffer from low contrast and blurred details. To tackle this problem, many super-resolution (SR) methods have been proposed. Single-image super-resolution (SISR) reconstructs a clear high-resolution (HR) image from a single blurry low-resolution (LR) frame by learning a mapping function or an imaging model constrained by prior information. Dong et al. [6] pioneered the use of a three-layer convolutional network for SR to learn the nonlinear LR–HR mapping. Subsequently, deeper SR approaches received significant attention. A 20-layer model named VDSR was proposed by Kim et al. [7], inspired by the residual network [8] and the VGG model [9]. Furthermore, larger models such as EDSR [10] and RCAN [11], with 69-layer and 400-layer networks, respectively, have achieved state-of-the-art performance.
Although visible image SR techniques have made tremendous progress, with deeper network structures possessing better nonlinear expression ability and producing superior image quality, a few limitations hinder their adoption in the infrared image domain. Firstly, these SR networks incur a huge computational cost, require large memory resources, and are therefore unsuitable for real-world infrared detector devices. Designing fast, accurate, and lightweight infrared image SR models that account for infrared visual perception is a promising way to mitigate this contradiction. Secondly, training such deep network models requires massive datasets of different scenes or targets as sample support, whereas assembling infrared image datasets is hindered by expensive optical sensors and restricted acquisition conditions. The main alternative is to utilize transfer learning to address insufficient training data in the infrared domain.
As lightweight networks, DRCN [12] and DRRN [13] utilized recursive networks to decrease redundant parameters. CARN-M [14] introduced an efficient and lightweight cascaded residual network with group convolution for mobile devices; however, this was at the expense of a serious loss of PSNR with the stacking of group convolution. Hui et al. [15] designed the information multi-distillation network (IMDN), which divides the preceding features into two segments, whereby one part is maintained and the other is further processed. MSAR-Net [16] proposed a multiscale attention residual block for feature refinement and an up–down sampling projection block for edge refinement, further improving the efficiency of extracting high-frequency information.
For limited training datasets in the target field, transfer learning [17] can effectively apply knowledge learned in a previous field to a novel domain. Shahin et al. [18] designed the lightweight DCRN with a transfer deep learning approach to estimate the building orientation angle in high-resolution satellite images. Zhang et al. [19] improved classification accuracy and enhanced the robustness of their model using a multistep training strategy applied to different datasets. Taking advantage of transfer learning, our method is first trained on visible images and then employs few-shot infrared image samples in a second stage to update the network weights.
In this paper, we design a progressive compact distillation network (PCDN) integrated with few-shot transfer learning, using easily available visible images and few-shot infrared image samples. The goal of the proposed PCDN framework is to efficiently restore SR infrared images in real time. The core progressive feature residual distillation (PFRD) block combines asymmetric convolution with parallel multi-receptive-field dilation convolution, which not only maximizes the characterization power of marginal features but also minimizes the network parameters. For adjacent PFRDs, the proposed PCDN investigates a difference calculation algorithm concentrating on high-frequency information to provide structure priors. Moreover, we propose the bil-global connection mechanism to significantly lower the complexity of the network and accelerate convergence. Secondly, PCDN applies transfer learning to reuse the features trained on a visible image dataset in the infrared domain. Specifically, the PCDN is first trained on visible images to create an efficient ensemble mapping relationship that enhances the restoration performance. However, it is not ideal to employ this trained model directly to super-resolve infrared images, since the mapping weights are obtained from visible rather than infrared images. Consequently, we fine-tune the network weights based on the pretrained model using a small number of infrared images in the second step. Compared with existing transfer learning methods, the PCDN employs only 55 infrared images for the fine-tuning stage and achieves excellent performance. We discuss the experimental details in the ablation experiments and evaluate our method on two public infrared datasets to compare its performance with existing SR methods.
This paper is organized as follows: Section 2 reviews the state-of-the-art methods in the field of SR and transfer learning. Our proposed method is elaborated in Section 3. Results and a discussion are presented in Section 4 and Section 5, respectively.

2. Related Work

2.1. Lightweight CNNs for SR

Current CNN-based SR models pursue optimal performance regardless of the negative effects caused by deeper networks or larger filter sizes. For instance, the EDSR of Lim et al. [10] stacked a 69-layer network with a sequence of residual blocks, and the RCAN designed by Zhang et al. [11] even exceeded 400 layers through residual-in-residual (RIR) structures to achieve state-of-the-art performance. However, such networks with excessive parameters, deeper structures, and larger kernel sizes incur huge computation, limiting real-world applications. Hence, numerous researchers have designed lightweight strategies targeting the upsampling position, the global and local structure, and the internal convolution operation.
Compared with the originally proposed SRCNN [6], FSRCNN [20] replaced the pre-upsampling strategy, which extracts features from bicubic-interpolated LR images, with a post-upsampling strategy that directly takes the LR image as input, thereby greatly lowering the computational complexity; this strategy has since been adopted in almost all SISR tasks.
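To make the contrast concrete, the following minimal sketch compares the two placements of the upsampling step; the module names, channel counts, and the ×4 scale are illustrative assumptions, not taken from the cited models:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

scale = 4

# Pre-upsampling (SRCNN-style): interpolate first, then run all convolutions at HR size.
class PreUpsamplingSR(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3, 3, padding=1))

    def forward(self, lr):
        x = F.interpolate(lr, scale_factor=scale, mode="bicubic", align_corners=False)
        return self.body(x)          # every convolution operates on the large image

# Post-upsampling (FSRCNN-style): convolve at LR size, upsample only at the end.
class PostUpsamplingSR(nn.Module):
    def __init__(self, channels=64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(channels, 3 * scale ** 2, 3, padding=1))
        self.shuffle = nn.PixelShuffle(scale)  # rearranges channels into spatial resolution

    def forward(self, lr):
        return self.shuffle(self.body(lr))    # convolutions stay at the cheap LR size
```

Because the post-upsampling body operates on the LR grid, its cost per layer is roughly scale² times lower than in the pre-upsampling variant.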
The global and local structure strategy focuses on transforming the connections among modules to enhance the sharing and utilization of information. The DRCN proposed by Kim et al. [12] was the first SR model to work in a recursive manner, obtaining more high-frequency feature spaces from the target pixels. CARN [14] implements a cascading mechanism at both the local and the global level to aggregate features from multiple layers. Hui et al. [15] designed the information multi-distillation network (IMDN), which better purifies each processed feature by explicitly splitting the preceding features into two segments, whereby one part is maintained and the other is further refined, streamlining the network parameters and boosting the reconstruction performance. RTAN [21] introduces a lightweight residual triplet attention module to obtain the cross-dimensional attention weights of the features, and LW-AWSRN [22] proposes a local fusion block consisting of adaptive weighted residual units and a local residual fusion unit to remove the redundant scale branch.
From the perspective of the internal convolution operation, several researchers have made great progress. InceptionV3 [23] factorized an n × n convolution into a 1 × n convolution followed by an n × 1 convolution, reducing the parameters by nearly 33% while keeping an identical receptive field. CARN-M [14] introduced group convolution into the cascaded residual network. Group convolution splits the filter channels sliding over the input, and depth-wise convolution [24], the extreme case of group convolution, is used to minimize the parameters. Nevertheless, as stacked depth-wise convolution layers accumulate, the representative ability is drastically weakened because information fusion across channel groups is hindered. Dilation convolution [25], with the same parameters as vanilla convolution, expands the receptive field by inserting holes into the filter window and generates abundant marginal features when multiscale dilation convolutions are assembled.
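Both observations can be verified with a quick parameter count; this is a small sketch, and the channel count is arbitrary:

```python
import torch.nn as nn

def n_params(m):
    return sum(p.numel() for p in m.parameters())

c = 64  # illustrative channel count

standard = nn.Conv2d(c, c, kernel_size=3, padding=1)
factorized = nn.Sequential(                        # 1x3 followed by 3x1: same receptive field
    nn.Conv2d(c, c, kernel_size=(1, 3), padding=(0, 1)),
    nn.Conv2d(c, c, kernel_size=(3, 1), padding=(1, 0)))
dilated = nn.Conv2d(c, c, kernel_size=3, padding=2, dilation=2)   # 5x5 receptive field

print(n_params(standard))    # 36928: 64*64*9 weights + 64 biases
print(n_params(factorized))  # 24704: roughly one third fewer weights than the 3x3
print(n_params(dilated))     # 36928: same parameters as the 3x3, but a larger receptive field
```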

2.2. Infrared Image SR and Transfer Learning

WGAN [26] employs a heterogeneous kernel-based residual block that enables multiscale extraction of infrared image features by combining convolutional kernels of different sizes. MRIDN [27] introduces effective multi-receptive-field convolution groups based on a fixed channel split to extract features of infrared images. Although these models reconstruct high-quality infrared images, they all rely on large training datasets. However, in certain domains it is difficult to assemble sample sets due to expensive equipment or restricted natural circumstances. Hence, transfer learning has been adopted in many methods across various image processing fields. For instance, Moran et al. [28] analyzed medical images through a pretrained network and successfully recovered digital periapical radiographs. Chen et al. [29] exploited different transfer learning implementations, i.e., pretrained deep learning models combined with a support vector machine (SVM) and fine-tuning, to detect unfavorable driving states. In the infrared image domain, PSRGAN [30] employs a multistage transfer learning strategy utilizing visible images and 100 infrared images to boost the restoration performance of infrared images. In this paper, the proposed PCDN employs only 55 infrared images to fine-tune the pretrained network and performs better than existing SR approaches.

3. Method

3.1. Overall Architecture of the Network

The overall framework of the proposed network, shown in Figure 1, is composed of four modules: (1) a first feature extraction (FFE) convolution for collecting robust features, (2) a stack of progressive feature residual distillation (PFRD) modules for generating the nonlinear mapping relationship, (3) difference calculation (DC) between PFRDs for exploring high-frequency information, and (4) a reconstruction module with upsampling pixel attention (U-PA) blocks.
Firstly, we set the original LR image $I_{LR}$ as the input of our network. The FFE is implemented using only a 3 × 3 convolution followed by a ReLU activation function to map $I_{LR}$ from the image space to the feature space, expressed as follows:
$x_0 = h_{FFE}(I_{LR}),$
where $h_{FFE}$ denotes the 3 × 3 convolution operation, and $x_0$ is the extracted shallow feature.
Then, several PFRDs follow for deep nonlinear feature mapping to generate powerful feature representations. We denote the proposed PFRD as $f_{PFRD}(\cdot)$, given by
$x_{pf}^{n} = f_{PFRD}^{n}\left(f_{PFRD}^{n-1}\left(\cdots f_{PFRD}^{0}\left(x_{0}\right)\right)\right),$
where $x_{pf}^{n}$ is the output feature map of the n-th PFRD module. Then, the latter PFRD block subtracts its former counterpart to provide structure priors, which can be formulated as
$x_{dc}^{n-1} = x_{pf}^{n} - x_{pf}^{n-1},$
where $x_{dc}^{i}$ denotes the difference between two adjacent PFRD blocks. After passing through the last PFRD and DC, we concatenate all generated $x_{pf}^{n}$ and $x_{dc}^{i}$ and feed them into the reconstruction module. This process is defined below:
$I_{SR} = f_{rec}\left(Concat\left(x_{pf}^{1}, \ldots, x_{pf}^{n}, x_{dc}^{1}, \ldots, x_{dc}^{n-1}, x_{0}\right)\right) + f_{bil}\left(I_{LR}\right),$
where $f_{rec}$ represents the U-PA reconstruction function, inspired by PAN [31], and $Concat(\cdot)$ denotes concatenation along the channel dimension. In addition, we add the bil-global connection, which accumulates the shallow feature $x_0$ and a global bilinear interpolation connection $f_{bil}$ from the LR input. $I_{SR}$ is the final output of the network.
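The following condensed PyTorch sketch mirrors Equations (1)–(4). The PFRD block is a placeholder here, and the channel count, the number of blocks, and the exact way the bil-global branch enters the output are illustrative assumptions rather than the published configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class PFRD(nn.Module):
    """Placeholder for the progressive feature residual distillation block (Section 3.2)."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1),
                                  nn.ReLU(inplace=True))

    def forward(self, x):
        return self.body(x)


class PCDNSketch(nn.Module):
    def __init__(self, channels=48, num_blocks=4, scale=2):
        super().__init__()
        self.scale = scale
        # Eq. (1): first feature extraction, a single 3x3 convolution followed by ReLU
        self.ffe = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(inplace=True))
        # Eq. (2): stacked PFRD blocks for progressive deep feature mapping
        self.pfrds = nn.ModuleList(PFRD(channels) for _ in range(num_blocks))
        # Eq. (4): reconstruction (U-PA simplified to one convolution + pixel shuffle)
        fused = channels * 2 * num_blocks        # n outputs, n-1 differences, and x_0
        self.rec = nn.Sequential(nn.Conv2d(fused, 3 * scale ** 2, 3, padding=1),
                                 nn.PixelShuffle(scale))

    def forward(self, lr):
        x0 = self.ffe(lr)
        outs, x = [], x0
        for pfrd in self.pfrds:
            x = pfrd(x)
            outs.append(x)
        # Eq. (3): difference calculation between adjacent PFRD outputs
        diffs = [outs[i] - outs[i - 1] for i in range(1, len(outs))]
        sr = self.rec(torch.cat(outs + diffs + [x0], dim=1))
        # bil-global connection: bilinear upsampling of the LR input plus the x_0 skip above
        return sr + F.interpolate(lr, scale_factor=self.scale,
                                  mode="bilinear", align_corners=False)


sr = PCDNSketch()(torch.randn(1, 3, 60, 60))   # -> torch.Size([1, 3, 120, 120])
```

Concatenating every intermediate output and difference keeps all distilled information available to the reconstruction head, while the two global connections carry low-frequency content directly to the output.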

3.2. Progressive Feature Residual Distillation

As depicted in Figure 1, our progressive feature residual distillation (PFRD) module is composed of several asymmetric residual blocks (ARBs) aimed at extracting deeper features, two parallel asymmetric convolutions with different dilation factors for expanding the receptive field, a few 1 × 1 convolutions on the distillation pipeline used to reduce the number of channels, and subsequent processing of the concatenated features.
Compared with the vanilla residual block, the ARB halves the two feature-extraction convolutions to compress the number of parameters. Afterward, the remaining convolution is decoupled into consecutive 1 × 3 and 3 × 1 convolutions to further diminish the parameters. As a consequence, the ARB can leverage residual learning through stacked residual connections with an extremely modest number of parameters.
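A minimal sketch of such an asymmetric residual block is shown below; the activation placement follows the first line of the equation in this subsection, and the channel count is an illustrative assumption:

```python
import torch
import torch.nn as nn


class ARB(nn.Module):
    """Asymmetric residual block: the single 3x3 convolution of a slimmed residual block
    decoupled into a 1x3 convolution followed by a 3x1 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.conv1x3 = nn.Conv2d(channels, channels, (1, 3), padding=(0, 1))
        self.conv3x1 = nn.Conv2d(channels, channels, (3, 1), padding=(1, 0))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual connection around the decoupled convolution pair
        return self.act(self.conv3x1(self.conv1x3(x)) + x)


y = ARB(48)(torch.randn(1, 48, 32, 32))   # shape preserved: (1, 48, 32, 32)
```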
The PFRD first divides the preceding feature into two paths: a main refinement path and a branch distillation path. The bottom main path adopts three serial ARBs to better purify features, while the upper branch path preserves coarse feature information using a 1 × 1 convolution filter behind each ARB. Because each ARB contains only a single (decoupled) 3 × 3 convolution, the spatial context it captures is insufficient; hence, two parallel dilation convolutions are appended at the end of the main distillation branch to increase the receptive field and extract more marginal contextual information. This procedure can be described as follows:
$F(x) = ReLU\left(Conv_{3\times1}\left(Conv_{1\times3}(x)\right) + x\right),$
$f_{refined}^{1},\; f_{coarse}^{1} = F(f_{in}),\; D(f_{in}),$
$f_{refined}^{2},\; f_{coarse}^{2} = F(f_{refined}^{1}),\; D(f_{coarse}^{1}),$
$f_{refined}^{3},\; f_{coarse}^{3} = F(f_{refined}^{2}),\; D(f_{coarse}^{2}),$
$f_{refined}^{4\_1},\; f_{refined}^{4\_2} = F_{Dil\_2}(f_{refined}^{3}),\; F_{Dil\_3}(f_{coarse}^{3}),$
$f_{out} = Concat\left(f_{coarse}^{1}, f_{coarse}^{2}, f_{coarse}^{3}, f_{refined}^{4\_1}, f_{refined}^{4\_2}\right),$
where $F$ and $D$ represent the ARB and the 1 × 1 convolution filter, respectively, while $f_{refined}^{i}$ and $f_{coarse}^{i}$ denote the outputs of the i-th ARB block and the i-th distillation layer. $F_{Dil\_2}$ and $F_{Dil\_3}$ represent ARB blocks whose convolution filters have dilation factors of 2 and 3, respectively, and $f_{out}$ is the aggregated output of all refined and distilled features.
After concatenating all features in the PFRD, we exploit the channel shuffle mechanism, inspired by ShuffleNet [32], to strengthen the correlation and share the features among the concatenated channels, followed by a 1 × 1 convolution aimed at reducing the dimension of the features. Furthermore, we adopt ESA_D, a variant of ESA [33] that replaces the convolution group with dilation convolution and applies depth-wise convolution to the input, to refine the key information of the features with negligible parameters.
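A condensed sketch of the PFRD block follows. The distilled channel width, the shuffle group count, and the omission of the ESA_D attention are assumptions made for illustration, not the published configuration:

```python
import torch
import torch.nn as nn


class ARB(nn.Module):
    """Asymmetric residual block (see above), with an optional dilation factor."""
    def __init__(self, channels, dilation=1):
        super().__init__()
        self.conv1x3 = nn.Conv2d(channels, channels, (1, 3),
                                 padding=(0, dilation), dilation=(1, dilation))
        self.conv3x1 = nn.Conv2d(channels, channels, (3, 1),
                                 padding=(dilation, 0), dilation=(dilation, 1))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.conv3x1(self.conv1x3(x)) + x)


def channel_shuffle(x, groups):
    # Interleave channels across groups so that concatenated branches exchange information.
    n, c, h, w = x.size()
    x = x.view(n, groups, c // groups, h, w).transpose(1, 2).contiguous()
    return x.view(n, c, h, w)


class PFRDSketch(nn.Module):
    def __init__(self, channels=48, distilled=24):
        super().__init__()
        self.refine = nn.ModuleList(ARB(channels) for _ in range(3))        # main refinement path
        self.distill = nn.ModuleList([nn.Conv2d(channels, distilled, 1)] +  # branch distillation path
                                     [nn.Conv2d(distilled, distilled, 1) for _ in range(2)])
        self.tail_d2 = ARB(channels, dilation=2)   # parallel dilated tails (multi-receptive field)
        self.tail_d3 = ARB(distilled, dilation=3)
        self.fuse = nn.Conv2d(4 * distilled + channels, channels, 1)        # 1x1 fusion after shuffle
        # the ESA_D attention (dilated ESA variant) is omitted in this sketch

    def forward(self, x):
        refined, coarse, feats = x, x, []
        for arb, dist in zip(self.refine, self.distill):
            refined, coarse = arb(refined), dist(coarse)
            feats.append(coarse)                    # keep the distilled (coarse) features
        feats += [self.tail_d2(refined), self.tail_d3(coarse)]
        return self.fuse(channel_shuffle(torch.cat(feats, dim=1), groups=2))
```

The block is channel-preserving, so it can be stacked directly as in the overall architecture sketch of Section 3.1.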

3.3. Difference Calculation and Bil-Global Connection Mechanism

After obtaining refined features inside each PFRD, we further exploit the feature correlations of adjacent PFRDs through difference calculation. As shown in Figure 1, the latter PFRD block carries more high-level information than its former counterpart, because additional convolutional layers extract more refined and more representative features. Hence, subtracting $x_{pf}^{n-1}$ from $x_{pf}^{n}$ yields high-frequency information and provides structure priors for the subsequent reconstruction. On the other hand, we design a bil-global connection mechanism to stabilize the network and boost reconstruction performance. This mechanism implements two global connections: a bilinear interpolation of the LR input image and a skip connection from the extracted shallow feature. The backpropagation signal can pass directly through these two global connections, making our network easier to train. The overall operations of the difference calculation and bil-global connection mechanism are illustrated in Equations (3) and (4), as well as Figure 1.

3.4. Loss Function

We choose the L1 loss function for the visible training stage and fine-tune the network with the L2 loss function at the transfer learning stage. Given a training set $\{ I_{HR}^{i}, I_{LR}^{i} \}_{i=1}^{N}$ of LR–HR pairs, the goal of PCDN is to minimize the following loss functions:
$L_{1}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \left\| H_{PCDN}(I_{LR}^{i}) - I_{HR}^{i} \right\|_{1},$
$L_{2}(\theta) = \frac{1}{N}\sum_{i=1}^{N} \left\| H_{PCDN}(I_{LR}^{i}) - I_{HR}^{i} \right\|_{2}^{2},$
where $\theta$ indicates the training parameters of the PCDN, $H_{PCDN}$ represents our proposed model, and $I_{LR}^{i}$ and $I_{HR}^{i}$ represent an LR image and its corresponding ground-truth HR image in the respective dataset.
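Translated directly into code, the two stage-specific losses can be sketched as follows:

```python
import torch

def l1_loss(sr, hr):
    # visible-image pretraining stage: mean absolute error
    return torch.mean(torch.abs(sr - hr))

def l2_loss(sr, hr):
    # infrared transfer-learning stage: mean squared error
    return torch.mean((sr - hr) ** 2)
```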

4. Results

4.1. Implementation Details

In our proposed method, we used the popular DIV2K dataset [34] as the visible image dataset to train our models. The DIV2K dataset contains 800 high-quality RGB training images. For infrared images, we adopted the CVC-09 FIR Sequence Pedestrian dataset [35], from which we extracted 55 images as the infrared training dataset, named IR55. We employed two test datasets, denoted result-A and result-C, each consisting of 22 images obtained by fusing infrared and visible light images using the approaches proposed in [36] and [37], respectively. We used the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) to measure the quality of the reconstructed images. All values were computed on the Y channel of YCbCr after conversion from RGB.
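For reference, a minimal sketch of PSNR evaluation on the Y channel is given below, assuming the ITU-R BT.601 coefficients used by MATLAB's rgb2ycbcr; SSIM is computed analogously on the same channel:

```python
import numpy as np


def rgb_to_y(img):
    # img: H x W x 3 array with values in [0, 255]; BT.601 luma as in MATLAB's rgb2ycbcr
    r, g, b = img[..., 0], img[..., 1], img[..., 2]
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


def psnr_y(sr, hr, max_val=255.0):
    mse = np.mean((rgb_to_y(sr.astype(np.float64)) - rgb_to_y(hr.astype(np.float64))) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)
```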
We generated the training LR images by downsampling the HR images with scaling factors of ×2 and ×4 using bicubic interpolation in MATLAB. We trained the model on the visible DIV2K dataset in the first stage and fine-tuned it on IR55 in the transfer learning stage. At both stages, we used 128 × 128 RGB input patches from the LR images and trained our model with the ADAM optimizer, setting β1 = 0.9, β2 = 0.999, and ε = 10^−8. The learning rate was initialized to 5 × 10^−4 and halved every 2 × 10^5 minibatch updates in the first stage, whereas it was set to 5 × 10^−5 and halved every 20 minibatch updates in the second stage. We set the batch size to 32 at both stages and trained for 400 epochs at the visible stage for full training. However, since the pretrained network tends to converge quickly, the number of epochs was set to 50 at the transfer learning stage. The networks were implemented in the PyTorch framework with NVIDIA TITAN X (Pascal) GPUs.
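The two-stage schedule can be summarized in the following sketch; the data loaders and the model are placeholders, and only the optimizer settings, learning rates, decay intervals, and epoch counts are taken from the text:

```python
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR


def train_stage(model, loader, loss_fn, lr, halve_every, epochs, device="cuda"):
    optimizer = Adam(model.parameters(), lr=lr, betas=(0.9, 0.999), eps=1e-8)
    scheduler = StepLR(optimizer, step_size=halve_every, gamma=0.5)  # halve the learning rate
    model.to(device).train()
    for _ in range(epochs):
        for lr_img, hr_img in loader:
            lr_img, hr_img = lr_img.to(device), hr_img.to(device)
            optimizer.zero_grad()
            loss = loss_fn(model(lr_img), hr_img)
            loss.backward()
            optimizer.step()
            scheduler.step()     # the decay interval is counted in minibatch updates, as in the text
    return model


# Stage 1: pretrain on DIV2K with the L1 loss (400 epochs, lr 5e-4 halved every 2e5 updates).
# model = train_stage(model, div2k_loader, l1_loss, lr=5e-4, halve_every=200000, epochs=400)
# Stage 2: fine-tune on IR55 with the L2 loss (50 epochs, lr 5e-5 halved every 20 updates).
# model = train_stage(model, ir55_loader, l2_loss, lr=5e-5, halve_every=20, epochs=50)
```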

4.2. Quantitative and Qualitative Results

We compared our approach quantitatively with popular SR methods including SRMD [38], DPSR [39], EDSR [10], SPSR [40], IMDN [15], and PSRGAN [30]. The PSNR and SSIM values are presented in Table 1. PCDN achieved the best performance and outperformed these methods by a large margin for both the ×2 and ×4 scaling factors on both test datasets. It is worth noting that the performance of PSRGAN was inferior to that of our network. Firstly, PCDN uses only 55 infrared images for transfer learning while PSRGAN used 100, which means that the former requires less sample space. Secondly, PCDN has fewer parameters (227.403 K) than PSRGAN (312.7 K). In addition, in terms of reconstruction performance, the proposed PCDN outperformed PSRGAN on both result-A and result-C at all scales, exceeding it by 1.76 dB in PSNR and 0.0342 in SSIM for result-C ×4. In conclusion, our PCDN achieved better infrared image reconstruction performance with fewer parameters and less infrared sample space, as is clear from Table 1, highlighting the superiority of our network.
We evaluated the visual results of different models to further illustrate the effectiveness of our method. Visual comparisons at scales ×2 and ×4 on result-A and result-C are depicted in Figure 2. It can be seen from the zoomed-in part of the eaves in image_18 of result-A ×4 that PCDN reconstructs more realistic textures. For image_7 in result-C ×2, the soft and smooth stripes and line patterns were recovered better by PCDN than by the other methods. Overall, the capability of the proposed network to synthesize greater detail and more accurate structures for infrared image super-resolution was demonstrated.

5. Discussion

5.1. Model Analysis

Among the typical factors used to assess a lightweight model, the essential measure is the number of network parameters. Therefore, we visualize a cost-effectiveness analysis of PSNR versus model size on the result-C ×4 dataset in Figure 3. Our network is extremely lightweight, with only 227.403 K parameters for ×2 and 235.063 K for ×4. In addition, we present a tradeoff analysis between performance and Multi-Adds for result-C ×2 in Figure 4. The Multi-Adds of PCDN are only 14.64 G and 22.02 G for ×2 and ×4, respectively, which is much lower than for the other methods. It can be observed that our PCDN achieves the best tradeoff between reconstruction performance and model size compared with existing models.

5.2. Ablation Experiments

To demonstrate the effectiveness of transfer learning, we compared the results of training on the DIV2K visible dataset alone with those obtained after IR55 transfer learning. We trained for 400 epochs in these ablation experiments. The experimental results in Table 2 show that transfer learning improved the performance of the PCDN: for result-A ×2, the PSNR increased by 0.09 dB, and for result-C ×2, the SSIM increased by 0.0006. This shows that fine-tuning the pretrained network through transfer learning improves the performance of the model without introducing additional parameters.
Furthermore, we investigated the implementation of dilation convolution in the PFRD. We compared the impact on network performance of using dilation convolution versus common convolution at the end of the distillation feature extraction. Figure 5 shows the training process in terms of the PSNR calculated with 800 DIV2K images as the training dataset and 100 images as the validation dataset. It can be observed that the PCDN with dilation convolution performs better. Moreover, the PSNR and SSIM on the infrared test datasets result-A and result-C are presented in Table 3.
Given the superiority of dilation convolution in expanding the receptive field, we further explored whether the reconstruction performance would improve if all convolutions in the PFRD were replaced by dilation convolution. However, this caused performance to plummet, possibly because the holes inserted between effective filter weights when introducing the dilation factor cause feature information to be lost, especially in the lower layers of the network, which carry more important details.

6. Conclusions

In this paper, we presented the progressive compact distillation network (PCDN), which incorporates a lightweight information distillation architecture and a transfer learning strategy. We designed the progressive feature residual distillation block (PFRD) containing asymmetric residual blocks, parallel asymmetric convolutions with different dilation factors for feature extraction, and a subsequent processing module with adaptive recalibration of the fusion feature weights. In terms of structure design, we utilized the difference calculation between adjacent PFRDs to obtain high-frequency information and the bil-global connection mechanism to keep the network stable. Moreover, we used a transfer learning strategy to fine-tune the PCDN with only 55 infrared images based on the network pretrained on visible image datasets. Comprehensive experiments illustrated that our extremely lightweight method achieves optimal infrared image super-resolution performance with few-shot infrared image samples. In the future, we will further explore our approach to achieve better infrared image reconstruction with zero-data learning in the transfer learning domain.

Author Contributions

Conceptualization, K.F., K.H. and F.L.; methodology, K.F. and K.H.; formal analysis, F.L.; writing—original draft preparation K.H. and F.L.; supervision, K.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China (No. 2019YFB1405503), the 2019 Public Service Platform of the Industrial Technology Foundation of MIIT (No. 2019-00893-1-1), and the 2019 Public Service Platform of the Industrial Technology Foundation of MIIT (No. 2019-00895-2-1).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ma, X.; Chau, L.P.; Yap, K.H.; Ping, G. Convolutional three-stream network fusion for driver fatigue detection from infrared videos. In Proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 26–29 May 2019. [Google Scholar]
  2. Guo, K.; Zhai, S.; Liu, Y.; Liu, B.; Yang, H. Development of upper limb rehabilitation training control system based on path planning. In Proceedings of the 2019 International Conference on Image and Video Processing, and Artificial Intelligence, Shanghai, China, 27 November 2019. [Google Scholar]
  3. Shakeel, P.M.; Tobely, T.E.E.; Al-feel, H.; Manogaran, G.; Baskar, S. Neural network based brain tumor detection using wireless infrared imaging sensor. IEEE Access 2019, 7, 5577–5588. [Google Scholar] [CrossRef]
  4. Sakudo, A. Near-infrared spectroscopy for medical applications: Current status and future perspectives. Clin. Chim. Acta. 2016, 455, 181–188. [Google Scholar] [CrossRef] [PubMed]
  5. Rupali, M.; Vishal, V.; Michael, W.; Carlos, B.R.; David, M. Imaging and feature selection using GA-FDA algorithm for the classification of mid-infrared biomedical images. Microsc. Microanal. 2016, 22, 1008–1009. [Google Scholar]
  6. Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a Deep Convolutional Network for Image Super-Resolution. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 295–307. [Google Scholar] [CrossRef] [PubMed]
  7. Kim, J.; Lee, J.K.; Lee, K.M. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  8. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  9. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  10. Lim, B.; Son, S.; Kim, H.; Nah, S.; Lee, K.M. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  11. Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018. [Google Scholar]
  12. Kim, J.; Lee, J.K.; Lee, K.M. Deeply-recursive convolutional network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  13. Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017. [Google Scholar]
  14. Ahn, N.; Kang, B.; Sohn, K.A. Fast, accurate, and lightweight super-resolution with cascading residual network. In Proceedings of the European Conference on Computer Vision Workshop (ECCVW), Munich, Germany, 8 September 2018. [Google Scholar]
  15. Hui, Z.; Gao, X.; Yang, Y.; Wang, X. Lightweight image super-resolution with information multi-distillation network. In Proceedings of the 27th ACM International Conference on Multimedia, Nice, France, 21–25 October 2019. [Google Scholar]
  16. Mehta, N.; Murala, S. MSAR-Net: Multi-scale Attention based Light-Weight Image Super-Resolution. Pattern Recogn. Lett. 2021, 151, 215–221. [Google Scholar] [CrossRef]
  17. Shao, L.; Zhu, F.; Li, X. Transfer Learning for Visual Categorization: A Survey. IEEE Trans. Neural Netw. Learn. Syst. 2015, 26, 1019–1034. [Google Scholar]
  18. Shahin, A.; Almotairi, S. DCRN: An Optimized Deep Convolutional Regression Network for Building Orientation Angle Estimation in High-Resolution Satellite Images. Electronics 2021, 10, 2970. [Google Scholar] [CrossRef]
  19. Zhang, Q.; Yang, Q.; Zhang, X.; Bao, Q.; Su, J.; Liu, X. Waste image classification based on transfer learning and convolutional neural network. Waste Manag. 2021, 135, 150–157. [Google Scholar] [CrossRef] [PubMed]
  20. Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In Proceedings of the European Conference on Computer Vision (ECCV), Amsterdam, The Netherlands, 11–14 October 2016; pp. 391–407. [Google Scholar]
  21. Huang, F.; Wang, Z.; Wu, J.; Shen, Y.; Chen, L. Residual Triplet Attention Network for Single-Image Super-Resolution. Electronics 2021, 10, 2072. [Google Scholar] [CrossRef]
  22. Li, Z.; Wang, C.; Wang, J.; Ying, S.; Shi, J. Lightweight adaptive weighted network for single image super-resolution. Comput. Vis. Image Und. 2021, 211, 103254. [Google Scholar] [CrossRef]
  23. Szegedy, C.; Vanhoucke, V.; Ioffe, S.; Shlens, J.; Wojna, Z. Rethinking the inception architecture for computer vision. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016. [Google Scholar]
  24. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018. [Google Scholar]
  25. Yu, F.; Koltun, V. Multi-Scale Context Aggregation by Dilated Convolutions. In Proceedings of the International Conference on Learning Representations (ICLR), San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
  26. Huang, Y.; Jiang, Z.; Wang, Q.; Jiang, Q.; Pang, G. Pacific Rim International Conference on Artificial Intelligence; Springer: Cham, Switzerland, 2021. [Google Scholar]
  27. Wu, J.; Cheng, L.; Chen, M.; Wang, T. Super-resolution infrared imaging via multi-receptive field information distillation network. Opt. Laser. Eng. 2021, 145, 106681. [Google Scholar] [CrossRef]
  28. Moran, M.B.H.; Faria, M.D.B.; Giraldi, G.A.; Bastos, L.F.; Conci, A. Using super-resolution generative adversarial network models and transfer learning to obtain high resolution digital periapical radiographs. Comput. Biol. Med. 2021, 129, 104139. [Google Scholar] [CrossRef] [PubMed]
  29. Chen, J.; Wang, H.; Wang, S.; He, E.; Zhang, T.; Wang, L. Convolutional neural network with transfer learning approach for detection of unfavorable driving state using phase coherence image. Expert Syst. Appl. 2022, 187, 116016. [Google Scholar] [CrossRef]
  30. Huang, Y.; Jiang, Z.; Lan, R.; Zhang, S.; Pi, K. Infrared Image Super-Resolution via Transfer Learning and PSRGAN. IEEE Signal Process. Lett. 2021, 28, 982–986. [Google Scholar] [CrossRef]
  31. Zhao, H.; Kong, X.; He, J.; Qiao, Y.; Dong, C. Efficient image super-resolution using pixel attention. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020. [Google Scholar]
  32. Zhang, X.; Zhou, X.; Lin, M.; Sun, J. Shufflenet: An extremely efficient convolutional neural network for mobile devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 6848–6856. [Google Scholar]
  33. Liu, J.; Zhang, W.; Tang, Y.; Tang, J.; Wu, G. Residual feature aggregation network for image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13 June 2020; pp. 2356–2365. [Google Scholar]
  34. Timofte, R.; Agustsson, E.; Gool, L.V. Ntire 2017 challenge on single image super-resolution: Dataset and study. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshop (CVPRW), Honolulu, HI, USA, 21–26 July 2017; pp. 126–135. [Google Scholar]
  35. Socarrás, Y.; Ramos, S.; Vázquez, D.; López, A.M.; Gevers, T. Adapting pedestrian detection from synthetic to far infrared images. In Proceedings of the IEEE International Conference on Computer Vision Workshops (ICCVW), Sydney, Australia, 1–8 December 2013. [Google Scholar]
  36. Liu, Y.; Chen, X.; Cheng, J.; Peng, H. Infrared and visible image fusion with convolutional neural networks. Int. J. Wavelets Multi. 2018, 16, 1850018. [Google Scholar] [CrossRef]
  37. Zhang, Y.; Zhang, L.; Bai, X.; Zhang, L. Infrared and visual image fusion through infrared feature extraction and visual information preservation. Infrared Phys. Technol. 2017, 83, 227–237. [Google Scholar] [CrossRef]
  38. Zhang, K.; Zuo, W.; Zhang, L. Learning a single convolutional super-resolution network for multiple degradations. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 3262–3271. [Google Scholar]
  39. Zhang, K.; Zuo, W.; Zhang, L. Deep plug-and-play super-resolution for arbitrary blur kernels. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1671–1681. [Google Scholar]
  40. Ma, C.; Rao, Y.; Cheng, Y.; Chen, C.; Lu, J.; Zhou, J. Structure-preserving super resolution with gradient guidance. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 7769–7778. [Google Scholar]
Figure 1. The complete architecture of the proposed network.
Figure 2. Visual results of PCDN compared with other methods using fused_18 from result-A ×4 (top) and fused_21 from result-C ×2 (below).
Figure 3. Cost-efficient comparison between the proposed PCDN and other methods for result-C ×4.
Figure 4. Tradeoff between performance and Multi-Adds for result-C ×2.
Figure 5. Convergence analysis on PFRD with common convolution and dilation convolution at the end of distillation feature extraction.
Table 1. Average PSNR/SSIM for scale factors of 2 and 4 on datasets result-A and result-C.

Test Dataset | Scale | SRMD [38] | DPSR [39] | EDSR [10] | SPSR [40] | IMDN [15] | PSRGAN [30] | Ours
result-A | ×4 | 11.90/0.2860 | 32.45/0.8181 | 27.89/0.7076 | 32.46/0.7839 | 32.56/0.8239 | 33.13/0.8282 | 34.95/0.8647
result-A | ×2 | 23.21/0.4193 | 37.01/0.9303 | 31.94/0.8106 | 36.41/0.8961 | 37.16/0.9343 | 37.48/0.9229 | 39.31/0.9441
result-C | ×4 | 12.52/0.4193 | 33.12/0.8377 | 28.41/0.7332 | 33.14/0.8056 | 33.20/0.8435 | 33.86/0.8466 | 35.62/0.8808
result-C | ×2 | 22.92/0.7605 | 37.91/0.9425 | 32.46/0.8294 | 37.23/0.9129 | 38.06/0.9465 | 38.52/0.9363 | 40.29/0.9558
Table 2. Evaluation of performance (PSNR/SSIM) when training using the DIV2K visible dataset alone and with IR55 transfer learning.

Training Datasets | result-A ×2 | result-A ×4 | result-C ×2 | result-C ×4
DIV2K only | 39.2224/0.9447 | 34.9491/0.8645 | 40.2422/0.9552 | 35.5849/0.8805
DIV2K + IR55 | 39.3126/0.9441 | 34.9547/0.8647 | 40.2940/0.9558 | 35.6215/0.8808
Table 3. Investigations of common convolution and dilation convolution at the end of distillation feature extraction for two test datasets (PSNR/SSIM).

Block Composition | Result-A ×2 | Result-A ×4 | Result-C ×2 | Result-C ×4
PFRD_Common | 39.16/0.9427 | 34.88/0.8634 | 40.13/0.9540 | 35.50/0.8799
PFRD_Dilation | 39.31/0.9441 | 34.95/0.8647 | 40.29/0.9558 | 35.62/0.8808
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
