Image Super-Resolution Algorithm Based on an Improved Sparse Autoencoder

Owing to the limited resolution of imaging systems and the influence of scene changes and other factors, sometimes only low-resolution images can be acquired, which cannot satisfy the requirements of practical applications. To improve the quality of low-resolution images, a novel super-resolution algorithm based on an improved sparse autoencoder is proposed. Firstly, in the training set preprocessing stage, the high- and low-resolution image training sets are constructed, respectively, by using the high-frequency information of the training samples as the characterization, and then the zero-phase component analysis whitening technique is utilized to decorrelate the resulting joint training set and reduce its redundancy. Secondly, a constructed sparse regularization term is added to the cost function of the traditional sparse autoencoder to further strengthen the sparseness constraint on the hidden layer. Finally, in the dictionary learning stage, the improved sparse autoencoder is adopted to achieve unsupervised dictionary learning and to improve the accuracy and stability of the dictionary. Experimental results show that the proposed algorithm outperforms existing algorithms in terms of both subjective visual perception and objective evaluation indices, including the peak signal-to-noise ratio and the structural similarity measure.


Introduction
In remote sensing, medical, military, and other fields, the acquisition of high-resolution (HR) images is of great significance. Image super-resolution (SR) is a technique that uses signal processing approaches to enhance the spatial resolution of an image. Its key idea is to add additional information to the process of image reconstruction to compensate for the loss of detail caused by image degradation, so that a clear HR image can be reconstructed from a low-resolution (LR) image [1]. The SR algorithm based on dictionary learning exploits the fact that natural images have a sparse representation under a specific dictionary. It applies a dictionary learning method to construct dictionaries that represent image patches sparsely, and the additional information obtained in this way improves the quality of the reconstructed image [2].
The purpose of dictionary learning is to decompose the data matrix into a dictionary matrix and a representation matrix, so it is also known as "matrix factorization". In the late 1990s, dictionary learning began to be applied in vision [3] and information retrieval [4]. At present, dictionary learning is widely used to solve inverse problems in image processing, such as image denoising [5], image inpainting [6], color image restoration [7], inverse halftoning [8], and even medical image reconstruction [9,10].
Dictionary learning methods can be divided into two categories: mathematical transformation-based methods and learning-based methods. The wavelet transform (WT) dictionary and the overcomplete discrete cosine transform (DCT) dictionary belong to the mathematical transformation-based methods. Dattatray et al. [11] and Dabbaghchian et al. [12] learned face image samples by using WT and DCT, respectively, and applied the resulting mathematical transformation-based dictionaries to face recognition. Although a mathematical transformation-based dictionary is simple and easy to implement when representing a signal sparsely, its representation of the signal is fixed and not adaptive. In contrast, a learning-based dictionary has a relatively strong adaptive ability and can better adapt to different image data. The method of optimal directions (MOD) proposed by Engan et al. [13] is the pioneering learning-based dictionary method; its dictionary update approach is simple, but its convergence is very slow. Aharon et al. [14] proposed the K-SVD algorithm, which is the most popular dictionary learning method. Given a set of training signals, the algorithm learns the dictionary under a strict sparsity condition so that each signal has the best representation. Moreover, the K-SVD algorithm converges faster than the MOD algorithm. Mairal et al. [15] proposed an online dictionary learning algorithm that trains quickly and is suitable for processing special signals, such as video signals and voice signals. With the development of machine learning, unsupervised learning models, such as neural networks and deep learning models, provide new ideas for dictionary learning. In [16], a dictionary learning method was proposed that uses models including deep belief networks and a stacked autoencoder.
We apply a sparse autoencoder (SAE) to the SR algorithm and propose two image SR algorithms. The main contributions of this paper are summarized as follows:
1. A novel training set preprocessing method is proposed. By regarding the high-frequency information of the image as the characterization, we construct the HR and LR image training sets with different methods, and then apply the zero-phase component analysis (ZCA) whitening method to reduce the redundancy of the joint training set and improve the learning efficiency of the SAE.
2. An improved SAE (ISAE) is proposed to boost the accuracy and stability of the dictionary. A new sparse regularization term related to the hidden layer is introduced into the cost function of the traditional SAE to further strengthen the sparseness constraint on the hidden layer, so that as many hidden units as possible have an average activation close to zero.
3. The SR algorithm based on the SAE (SRSAE) and the SR algorithm based on the ISAE (SRISAE) are proposed. The SAE is employed to achieve unsupervised dictionary learning, and the SRSAE is constructed by applying this unsupervised dictionary learning method to the SR algorithm based on sparse representation. By replacing the SAE with the ISAE, the SRISAE is obtained using the same procedure.
The remainder of this paper is organized as follows. Section 2 introduces the related works. Section 3 presents the basic theory of the image SR algorithm based on dictionary learning. Section 4 describes the proposed algorithm, including the training set preprocessing method, the unsupervised dictionary learning model based on the ISAE, and the overall flow of our algorithm. In Section 5, experimental results are shown to verify the effectiveness of our algorithm. Section 6 concludes the paper.

Related Works
Dictionary learning can achieve a better sparse representation and more discriminative information through the custom design of the dictionary, which improves the quality of the reconstructed image. In recent years, the SR algorithm based on dictionary learning has attracted the attention of a large number of scholars and has become one of the most important research directions in single-image SR.
Yang et al. [17] regarded an image library consisting of a large number of HR images as training samples, and generated the corresponding LR training samples by down-sampling the HR images. Then, the joint dictionary training algorithm was used to train on the HR and LR images so that the sparse representation coefficients of the LR image patches were similar to those of the corresponding HR image patches. Consequently, the HR image patches could be generated approximately from the sparse representation coefficients of the LR image patches and the HR dictionary. Although the algorithm can obtain sufficient additional information to restore some high-frequency details, the accuracy and stability of the additional information cannot be guaranteed when the training image library cannot provide image patches similar to the image to be reconstructed. Zeyde et al. [18] improved Yang's algorithm [17] by applying the K-SVD approach and the pseudo-inverse approach to train the LR dictionary and the HR dictionary, respectively. Compared with Yang's algorithm, this algorithm improves the quality of the reconstructed image and reduces image artifacts. To avoid the need for a large number of training samples and to obtain more accurate prior knowledge, Jing et al. [19] proposed an SR algorithm based on multi-task dictionary learning, which learned a multiple-examples-aided redundant dictionary from different classes of samples classified by the K-Means approach to provide a more suitable dictionary for the reconstruction of each sample. The algorithm not only reduces the computational complexity caused by a large dictionary, but also has good reconstruction performance. In [20], an SR algorithm based on the K-SVD method and semi-coupled dictionary learning was proposed to address the time consumption of dictionary learning. The K-SVD algorithm was applied to train the dictionary pair in the semi-coupled dictionary learning model, which not only reduces the dictionary learning time, but also improves the quality of the reconstructed image. Zhang et al. [21] proposed a single-image SR algorithm based on label consistency K-SVD (LC-KSVD). The algorithm introduced a new label consistency constraint called "discriminative sparse code error" into the K-SVD objective function, which gave the learned dictionary both good representation ability and good discrimination ability. Accordingly, the reconstruction performance and the robustness of this algorithm are better than those of the K-SVD algorithm.
For unsupervised dictionary learning with neural networks, Zhang et al. [22] learned a feature dictionary from a large number of unlabeled remote sensing images by using the SAE, and retrieved remote sensing images through the learned dictionary and a convolutional neural network. The algorithm effectively improves the speed and accuracy of remote sensing image retrieval.
At present, research on applying dictionary learning based on a neural network model to image SR is still relatively rare. Inspired by [22], an improved SAE is proposed for unsupervised dictionary learning to enhance the accuracy and stability of the dictionary, and it is applied to the SR algorithm to improve the quality of the reconstructed images.

Image SR Algorithm Based on Dictionary Learning
We define $X \in \mathbb{R}^N$ as the HR image, $Y \in \mathbb{R}^M$ as the LR image, $L$ as the down-sampling operator, and $n$ as the additive white noise. Then, the degradation model from the HR image to the LR image can be defined as

$$Y = LX + n. \tag{1}$$

Assuming that there is an overcomplete HR dictionary $D_h$ and an LR dictionary $D_l$, and that the HR and LR images have the same sparse representation coefficients [2], the HR image $X$ can be reconstructed by combining the HR dictionary $D_h$ with the sparse representation coefficients $\alpha$,

$$X = D_h \alpha. \tag{2}$$

Since the LR image $Y$ is known, its sparse coefficients $\alpha$ can be solved by combining the LR dictionary $D_l$ with the LR image $Y$. The model of the solution is as follows,

$$\min_{\alpha} \|\alpha\|_0 \quad \text{s.t.} \quad \|Y - D_l \alpha\|_2^2 \le \varepsilon, \tag{3}$$

where $\varepsilon$ is a small error tolerance. In general, the optimization of Equation (3) is an NP-hard problem. Supposing that the sparse representation coefficients $\alpha$ are sparse enough, solving the $l_0$-norm minimization problem can be replaced by solving the $l_1$-norm minimization problem [23]. Then, the Lagrange multiplier is used for equivalent conversion to obtain the following sparse coding function,

$$\min_{\alpha} \|Y - D_l \alpha\|_2^2 + \lambda \|\alpha\|_1, \tag{4}$$

where $\lambda$ is a parameter used to balance the sparsity of the solution and the fidelity to the LR image $Y$.
The sparse coefficients $\alpha$ can be obtained by solving Equation (4), and then the HR image can be reconstructed via Equation (2).
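The sparse coding and reconstruction steps above can be illustrated with a minimal sketch. It solves the $l_1$-regularized problem with the iterative shrinkage-thresholding algorithm (ISTA), which is a simple stand-in chosen here for brevity and is not the solver used in this paper. The function names, the random dictionaries, and the value of `lam` are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Element-wise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(D_l, y, a, lam):
    """Value of ||D_l a - y||_2^2 + lam * ||a||_1."""
    return np.sum((D_l @ a - y) ** 2) + lam * np.sum(np.abs(a))

def ista_sparse_code(D_l, y, lam=0.1, n_iter=500):
    """Approximately minimize ||D_l a - y||_2^2 + lam * ||a||_1 with ISTA."""
    # Lipschitz constant of the gradient of the quadratic term.
    lipschitz = 2.0 * np.linalg.norm(D_l, 2) ** 2
    a = np.zeros(D_l.shape[1])
    for _ in range(n_iter):
        grad = 2.0 * D_l.T @ (D_l @ a - y)
        a = soft_threshold(a - grad / lipschitz, lam / lipschitz)
    return a

# Toy example with random (hypothetical) dictionaries sharing one set of atoms.
rng = np.random.default_rng(0)
D_l = rng.standard_normal((20, 50))
D_l /= np.linalg.norm(D_l, axis=0)        # unit-norm LR atoms
D_h = rng.standard_normal((30, 50))       # HR atoms paired with the LR atoms
alpha_true = np.zeros(50)
alpha_true[[3, 17, 41]] = [1.5, -2.0, 1.0]
y = D_l @ alpha_true                      # LR feature vector

alpha = ista_sparse_code(D_l, y, lam=0.05)
x_hr = D_h @ alpha                        # HR estimate, X = D_h * alpha
```

ISTA with step size $1/\text{lipschitz}$ never increases the objective, which makes it a convenient reference implementation even though faster solvers exist.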

Training Set Preprocessing Method
The sample images used to construct the training sets consist of 91 HR images taken from [2]. Let $P_h$ represent the HR images. Then, the corresponding LR images $P_l$ are obtained by down-sampling these HR images using the degradation model in Equation (1), and the corresponding middle images $P_m$, which have the same size as the HR images, are obtained by up-sampling these LR images with Bicubic interpolation.
Construct the HR training set. To learn the relationship between the HR patches and their corresponding LR patches in edges and textures, the middle images are subtracted from the HR images to remove the low-frequency information, that is, the difference images $e_h$ are obtained via $e_h = P_h - P_m$. Then, the HR training set $Z_h$ is obtained by performing feature extraction on the difference images $e_h$.
Construct the LR training set. To extract the local characterization corresponding to their high-frequency information, the middle images $P_m$, which are enlarged versions of the LR images, are filtered with $r$ high-pass filters, that is, $\{R_i * P_m\}$, $i = 1, 2, \ldots, r$ (where the symbol $*$ indicates a convolution operation). These high-pass filters can be gradient filters or Laplacian filters. Then, feature extraction is performed on the filtered images to obtain the LR training set $Z_l$. Because filtering the middle images with $r$ high-pass filters increases the dimension of $Z_l$, the sparse principal component analysis (SPCA) [24] algorithm is employed to reduce the dimension of $Z_l$ and thus the computational complexity of dictionary learning. The SPCA algorithm, which is based on the PCA algorithm, introduces a new constraint term to find sparse principal components that can be represented by a linear combination of the smallest set of most representative variables. In this way, it not only reduces the time of dimensionality reduction, but also obtains more accurate principal components that are easier to interpret and analyze. After dimensionality reduction, the resulting LR training set is again denoted by $Z_l$.
In summary, we obtain the joint training set $Z = [Z_h, Z_l]$ by combining the HR training set $Z_h$ with the LR training set $Z_l$.
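The construction of the two training sets can be sketched as follows. This is an illustration under several assumptions, not the exact procedure of this paper: the four filters are first- and second-order gradient filters of the kind commonly used in sparse-coding SR, patches are extracted densely with a hypothetical `extract_patches` helper, the middle image is passed in as an argument rather than computed by Bicubic interpolation, and the SPCA dimensionality reduction is omitted.

```python
import numpy as np

def filter_along_axis(img, kernel, axis):
    """Convolve every column (axis=0) or row (axis=1) with a 1-D kernel."""
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, img)

def extract_patches(img, size=5, step=1):
    """Collect all size x size patches; each patch becomes one column."""
    rows, cols = img.shape
    patches = [img[r:r + size, c:c + size].ravel()
               for r in range(0, rows - size + 1, step)
               for c in range(0, cols - size + 1, step)]
    return np.array(patches).T

def build_training_sets(P_h, P_m, size=5, step=1):
    """Return (Z_h, Z_l) for one HR image P_h and its middle image P_m."""
    # HR features: patches of the difference image e_h = P_h - P_m.
    e_h = P_h - P_m
    Z_h = extract_patches(e_h, size, step)

    # LR features: patches of the middle image filtered by r = 4 high-pass
    # filters (assumed here: horizontal/vertical 1st and 2nd order gradients).
    f1 = np.array([-1.0, 0.0, 1.0])
    f3 = np.array([1.0, 0.0, -2.0, 0.0, 1.0])
    filtered = [filter_along_axis(P_m, f1, axis=1),
                filter_along_axis(P_m, f1, axis=0),
                filter_along_axis(P_m, f3, axis=1),
                filter_along_axis(P_m, f3, axis=0)]
    Z_l = np.vstack([extract_patches(f, size, step) for f in filtered])
    return Z_h, Z_l

# Toy usage: block-average down-sampling by 3 and pixel replication stand in
# for the degradation model and Bicubic interpolation.
rng = np.random.default_rng(0)
P_h = rng.random((24, 24))
P_l = P_h.reshape(8, 3, 8, 3).mean(axis=(1, 3))
P_m = np.kron(P_l, np.ones((3, 3)))
Z_h, Z_l = build_training_sets(P_h, P_m)
Z = np.vstack([Z_h, Z_l])   # joint training set, one sample per column
```

With 5 × 5 patches, each HR sample has 25 entries and each LR sample has 100 entries before dimensionality reduction, which is why a reduction step is applied to the LR features.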
Zero-phase component analysis (ZCA) whitening. Because adjacent pixels in an image are strongly correlated, the ZCA whitening technique [25] is adopted to eliminate the redundancy of the joint training set $Z$. Through ZCA whitening, the correlation between the features of each image patch in the training set is reduced, and the features of all the image patches have the same variance. Let $z_i$ denote a sample of the joint training set $Z$. The main process of ZCA whitening is as follows.
Firstly, calculate the eigenvector matrix $U$ by decomposing the covariance matrix of the joint training set $Z$ with singular value decomposition (SVD). The matrix $U$ is orthogonal and satisfies $UU^T = U^TU = I$.
Secondly, rotate the features according to $Z_{rot} = U^T Z$.
Thirdly, apply the PCA whitening approach to the rotated features so that each feature has unit variance, that is, $z_{PCAwhite,i} = z_{rot,i} / \sqrt{\lambda_i}$, where $\lambda_i$ is the corresponding diagonal element of the covariance matrix of $z_{rot}$.
Finally, left-multiply $z_{PCAwhite,i}$ by the matrix $U$ to obtain the ZCA-whitened features $s_i = U z_{PCAwhite,i}$, where $s_i \in S$ and $S = \{s_1, s_2, \ldots, s_{m+n}\}$ is the joint training set processed by ZCA whitening. In the ZCA whitening stage, the data dimension is maintained and not reduced further. In addition, since the input samples of the SAE must be scaled to $[0, 1]$, the training set $S$ needs to be normalized.
The proposed training set preprocessing method not only reduces the computational complexity of the SAE and thus the training time, but also reduces the correlation between the features, which lays the foundation for dictionary learning. The framework of the proposed training set preprocessing method is illustrated in Figure 1.
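The ZCA whitening steps described above can be sketched in a few lines of NumPy. The mean subtraction and the small constant `eps` added to the eigenvalues are common practical safeguards assumed here; they are not stated in the text. The function names and the min-max rescaling to $[0, 1]$ are likewise illustrative choices.

```python
import numpy as np

def zca_whiten(Z, eps=1e-5):
    """ZCA-whiten Z (features x samples); the output has the same shape."""
    Zc = Z - Z.mean(axis=1, keepdims=True)        # assumed: zero-mean features
    cov = Zc @ Zc.T / Zc.shape[1]                 # covariance matrix
    U, lam, _ = np.linalg.svd(cov)                # eigenvectors and eigenvalues
    Z_rot = U.T @ Zc                              # rotate the features
    Z_pca = Z_rot / np.sqrt(lam + eps)[:, None]   # PCA whitening: unit variance
    return U @ Z_pca                              # rotate back: ZCA whitening

def rescale_01(S):
    """Min-max normalization of the whole training set to [0, 1]."""
    return (S - S.min()) / (S.max() - S.min())

# Toy usage on correlated random features.
rng = np.random.default_rng(0)
mixing = np.triu(np.ones((6, 6)))
Z = mixing @ rng.standard_normal((6, 5000))
S = rescale_01(zca_whiten(Z))
```

Because the final rotation by $U$ is orthogonal, the whitened data keeps an identity covariance while staying as close as possible to the original feature axes, which is the property that distinguishes ZCA whitening from PCA whitening.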

Unsupervised Dictionary Learning Model Based on ISAE
The traditional dictionary matrix can be seen as consisting of multiple atoms, where each column of the matrix corresponds to an atom. In [16], unsupervised dictionary learning is performed using a deep neural network, and each column of the dictionary is treated as the connection between the input layer and the representation layer. Thus, the updated connection weights are equivalent to the learned dictionary. The relationship between dictionary learning and the neural network representation is shown in Figure 2. In Figure 2, X stands for the data, D represents a basis, which is also called a 'dictionary' and whose columns are called 'atoms', and Z indicates the representation of X.

In this paper, the SAE is employed to achieve unsupervised dictionary learning. The SAE is a traditional feedforward neural network consisting of an input layer, a hidden layer, and an output layer. In this model, the number of hidden units is greater than that of the input units, and its structure is illustrated in Figure 3. We choose the SAE for dictionary learning for two main reasons. On the one hand, the SAE can automatically learn sparser and more compact data characteristics from unlabeled data under the condition that the output is approximately equal to the original input. On the other hand, the number of hidden units is equivalent to the dictionary dimension, so an SAE with far more hidden units than input units guarantees that the learned dictionary is overcomplete.
The SAE consists of an encoder and a decoder. The encoder maps the input vector $x$ to the hidden layer $y$ by means of a nonlinear mapping function,

$$y = \sigma(W_1 x + b_1),$$

where $x \in [0, 1]$, $y \in [0, 1]$, $W_1$ is the weight matrix from the input layer to the hidden layer, $b_1$ is the bias vector of the input layer, $\theta_1 = \{W_1, b_1\}$, and $\sigma(\cdot)$ is the activation function. The decoder is responsible for mapping the hidden layer $y$ to the output layer $z$. The output layer has the same number of units as the input layer, and the mapping relationship is as follows,

$$z = \sigma(W_2 y + b_2),$$

where $z \in [0, 1]$, $W_2$ is the weight matrix from the hidden layer to the output layer, whose value is the transpose of $W_1$, $b_2$ is the bias vector of the hidden layer, and $\theta_2 = \{W_2, b_2\}$. The parameters can be expressed as $\theta = \{\theta_1, \theta_2\}$ by merging $\theta_1$ and $\theta_2$.
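The encoder and decoder mappings can be written as a short forward pass. This is a minimal sketch assuming tied weights ($W_2 = W_1^T$, as stated above) and a sigmoid activation; the function names, layer sizes, and the optional `linear_decoder` switch are illustrative.

```python
import numpy as np

def sigmoid(t):
    """Logistic function, mapping real inputs to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-t))

def sae_forward(x, W1, b1, b2, linear_decoder=False):
    """Forward pass of a single-hidden-layer autoencoder with tied weights.

    x  : input vector, shape (d,)
    W1 : encoder weights, shape (k, d); the decoder uses W2 = W1.T
    b1 : encoder bias, shape (k,)
    b2 : decoder bias, shape (d,)
    """
    y = sigmoid(W1 @ x + b1)              # encoder: hidden representation
    pre = W1.T @ y + b2                   # decoder pre-activation
    z = pre if linear_decoder else sigmoid(pre)
    return y, z

# Toy usage: 8 input units and 32 hidden units (overcomplete hidden layer).
rng = np.random.default_rng(0)
W1 = 0.1 * rng.standard_normal((32, 8))
x = rng.random(8)
y, z = sae_forward(x, W1, np.zeros(32), np.zeros(8))
```

Choosing more hidden units than input units, as in the toy usage, mirrors the overcomplete setting that the dictionary interpretation relies on.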
The SAE minimizes the reconstruction error between input and output by adjusting the parameter $\theta$. In general, the mean squared error (MSE) is used as its cost function, and a weight decay term is added to the cost function to reduce the magnitude of the weights and prevent overfitting. Moreover, to ensure that the hidden units are inactive most of the time, a regularization term that constrains the sparsity of the hidden layer is added to the cost function. Assuming that the input data is $S = \{s_1, \ldots, s_m, s_{m+1}, \ldots, s_{m+n}\}$ (where the data from $s_1$ to $s_m$ belongs to the HR training set, and the data from $s_{m+1}$ to $s_{m+n}$ belongs to the LR training set), and its output data is $H = \{h_1, h_2, \ldots, h_{m+n}\}$, the cost function of the traditional SAE can be expressed as

$$J_{SAE}(\theta) = \frac{1}{m+n} \sum_{i=1}^{m+n} \frac{1}{2} \| h_i - s_i \|_2^2 + \frac{\lambda}{2} \sum_{l=1}^{N_l - 1} \sum_{i=1}^{S_l} \sum_{j=1}^{S_{l+1}} \left( W_{ji}^{l} \right)^2 + \beta \sum_{j=1}^{S_2} KL(\rho \,\|\, \hat{\rho}_j),$$

where $m$ and $n$ are the numbers of samples in the HR and LR training sets, respectively, $s_i \in S$ is the input data, $h_i \in H$ is the output data, $N_l$ is the number of layers, $S_l$ is the number of units in layer $l$, $\hat{\rho}_j$ is the average activation of the hidden unit $j$, $\rho$ is the expected activation, whose value is set close to 0, and $\lambda$ and $\beta$ are the regularization parameters. In this paper, the Kullback-Leibler (KL) divergence is utilized to penalize $\hat{\rho}_j$ for deviating significantly from $\rho$, and its expression is as follows,

$$KL(\rho \,\|\, \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}_j}.$$

Combining the SR theory based on dictionary learning with the SAE model, a more accurate dictionary can be generated as long as the sparsity of the hidden layer can be further improved. Consequently, to ensure that as many hidden units as possible have an average activation close to zero, the $l_1$ norm is adopted in this paper to strengthen the sparseness constraint on the hidden layer, and the cost function of the ISAE can be expressed as

$$J_{ISAE}(\theta) = J_{SAE}(\theta) + \gamma \| A \|_1,$$

where $\gamma$ is a regularization parameter used to adjust the constructed sparse regularization term, and $A$ is the activation matrix of all the hidden units, whose entries are given by

$$a_j^{l} = \sigma\left( \sum_{i=1}^{S_{l-1}} W_{ji}^{l-1} a_i^{l-1} + b_j^{l-1} \right),$$

where $a_j^{l}$ is the activation value of unit $j$ in layer $l$, $W_{ji}^{l-1}$ is the weight associated with the connection between unit $i$ in layer $l-1$ and unit $j$ in layer $l$, and $b_j^{l-1}$ is the bias associated with unit $j$ in layer $l$.
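To make the role of each term concrete, the following sketch evaluates the traditional SAE cost and the cost with the additional $l_1$ penalty on the hidden activations for one batch. It is an illustration under assumptions: the $l_1$ term is taken as the plain entry-wise sum of absolute activations (its exact normalization is an assumption), a linear decoder is used, and the default hyperparameters follow the values reported in the experimental settings. The function names are hypothetical.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def kl_divergence(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat)."""
    return (rho * np.log(rho / rho_hat)
            + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat)))

def isae_cost(S, W1, b1, W2, b2, lam=0.001, beta=6.0, gamma=8.0, rho=0.035):
    """Return the SAE cost and the l1-augmented (ISAE-style) cost.

    S : training data, shape (d, number_of_samples), one sample per column.
    """
    n_samples = S.shape[1]
    A = sigmoid(W1 @ S + b1[:, None])          # hidden activations
    H = W2 @ A + b2[:, None]                   # linear decoder output

    mse = 0.5 * np.sum((H - S) ** 2) / n_samples
    weight_decay = 0.5 * lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    rho_hat = A.mean(axis=1)                   # average activation per unit
    kl_term = beta * np.sum(kl_divergence(rho, rho_hat))
    l1_term = gamma * np.sum(np.abs(A))        # assumed entry-wise l1 norm

    sae = mse + weight_decay + kl_term
    return {"sae": sae, "isae": sae + l1_term, "l1": l1_term}

# Toy usage with tied weights W2 = W1.T.
rng = np.random.default_rng(0)
S = rng.random((8, 50))
W1 = 0.1 * rng.standard_normal((16, 8))
costs = isae_cost(S, W1, np.zeros(16), W1.T, np.zeros(8))
```

The KL term only constrains the average activation of each unit over the batch, whereas the $l_1$ term penalizes every individual activation, which is why the two penalties are complementary.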
The selection of the activation function. The Sigmoid function scales the input data to $(0, 1)$, which satisfies the requirement of the SAE. In addition, data passed through the Sigmoid function do not easily diverge during propagation, and its derivative is simple to calculate. Hence, we select the Sigmoid function as the activation function in the encoding stage, and its expression is as follows:

$$\sigma(t) = \frac{1}{1 + e^{-t}}.$$

Although the Sigmoid function can improve the performance of the SAE to a certain extent, the SAE has the inherent drawback that its input data must be scaled to $[0, 1]$. To solve the problem of data scaling, in the decoding stage we use a linear decoder, that is, $\sigma_d(t) = t$; accordingly, the residuals can be calculated more accurately, which improves the accuracy of the dictionary [26].
To minimize the improved cost function, the gradient descent (GD) method [27] is adopted to update the weights and the bias vectors, and then the connection weights $W_1$ from the input layer to the hidden layer can be obtained. According to the relationship between dictionary learning and neural network representation shown in Figure 2, the learned dictionary in our algorithm is equivalent to the transpose of $W_1$, that is, $W_1^T$. Consequently, the dictionary is expressed as

$$D = W_1^T = \{w_1, w_2, \ldots, w_{m+n}\},$$

where $w_i = [w_{1,i}, w_{2,i}, \ldots, w_{k,i}]$, $k$ is the dictionary dimension, $i = 1, 2, \ldots, m+n$, and the HR dictionary and LR dictionary can be written as $D_h = \{w_1, w_2, \ldots, w_m\}$ and $D_l = \{w_{m+1}, w_{m+2}, \ldots, w_{m+n}\}$, respectively. So, the dictionary pair obtained by applying the ISAE can be expressed as $D = (D_h, D_l)$.
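The extraction of the dictionary pair from the learned weights can be sketched as below. The sketch reads the indices of the HR and LR parts as the feature dimensions of the joint input vector, so that the first `dim_h` rows of $W_1^T$ form the HR dictionary and the remaining rows form the LR dictionary; this reading, the function name `split_dictionary`, and the parameter `dim_h` are assumptions made for illustration.

```python
import numpy as np

def split_dictionary(W1, dim_h):
    """Split the learned weights into an HR and an LR dictionary.

    W1    : encoder weights, shape (k, dim_h + dim_l), k hidden units.
    dim_h : number of input features that belong to the HR part.
    Returns (D_h, D_l) with shapes (dim_h, k) and (dim_l, k); column j of
    both dictionaries is the atom associated with hidden unit j.
    """
    D = W1.T                       # dictionary: one atom per column
    return D[:dim_h, :], D[dim_h:, :]

# Toy usage: 7 input features (4 HR + 3 LR) and 5 hidden units.
W1 = np.arange(35, dtype=float).reshape(5, 7)
D_h, D_l = split_dictionary(W1, dim_h=4)
```

Because both dictionaries share the same columns, a sparse code computed against `D_l` can be applied directly to `D_h`, which is the coupling that the reconstruction step depends on.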

The Overall Flow of the Proposed Algorithm
The overall flow of the proposed SR algorithm is summarized in Algorithm 1.
Input: an LR image $Y$ to be reconstructed, the HR sample images $P_h$ for dictionary learning.
Step 1: obtain the LR images $P_l$ by down-sampling the HR images $P_h$, and then obtain the middle images $P_m$ of the same size as the HR images $P_h$ by up-sampling the LR images $P_l$ with Bicubic interpolation.
Step 2: obtain the HR and LR joint training set $S$ by preprocessing the HR images $P_h$, the LR images $P_l$, and the middle images $P_m$ with the proposed training set preprocessing method.
Step 3: generate the HR dictionary $D_h$ and the LR dictionary $D_l$ by training the ISAE on the joint training set $S$.
Step 4: calculate the sparse representation coefficients $\alpha$ of the LR image $Y$ under the learned LR dictionary $D_l$ by using the feature-sign search (FSS) algorithm [28].
Step 5: reconstruct the initial HR image $X$ via $X = D_h \alpha$.
Step 6: obtain the final reconstructed HR image $X^{*}$ by compensating $X$ with the global error compensation model based on the weighted guided filter [29].
Output: HR image $X^{*}$.

Experiments
To verify the effectiveness of the proposed algorithm, a series of simulation experiments were carried out. The experiments were implemented in MATLAB 2014a on a 64-bit Windows operating system running on an Intel(R) Core(TM) i7-7700K CPU @ 4.20 GHz with 16 GB of memory. The performance of the SR algorithms is evaluated subjectively and objectively. In the subjective evaluation, details such as the edges and textures of the reconstructed images are analyzed. In the objective evaluation, we calculate two indices, the peak signal-to-noise ratio (PSNR) [30] and the structural similarity measure (SSIM) [31], based on the reconstructed images and the original reference HR images. The higher the PSNR value, the better the quality of the reconstructed image and the better the performance of the corresponding SR algorithm. The closer the SSIM value is to 1, the more similar the reconstructed image is to the original image and the better the performance of the corresponding SR algorithm. In our experiments, the maximum PSNR or SSIM is highlighted in bold type. The PSNR and the SSIM are calculated as follows,

$$PSNR = 10 \log_{10} \frac{255^2}{\frac{1}{MN} \sum_{i=1}^{M} \sum_{j=1}^{N} \left( \hat{I}(i,j) - I(i,j) \right)^2},$$

$$SSIM = \frac{\left( 2 \mu_{\hat{I}} \mu_{I} + C_1 \right) \left( 2 \sigma_{\hat{I} I} + C_2 \right)}{\left( \mu_{\hat{I}}^2 + \mu_{I}^2 + C_1 \right) \left( \sigma_{\hat{I}}^2 + \sigma_{I}^2 + C_2 \right)},$$

where $\hat{I}$ is the reconstructed HR image, $I$ is the original HR image, $M$ and $N$ are the numbers of rows and columns of the HR image, respectively, $\mu_{\hat{I}}$ and $\mu_{I}$ are the means of $\hat{I}$ and $I$, respectively, $\sigma_{\hat{I}}^2$ and $\sigma_{I}^2$ are the variances of $\hat{I}$ and $I$, respectively, $\sigma_{\hat{I} I}$ is the covariance of $\hat{I}$ and $I$, and $C_1$ and $C_2$ are constants.
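The two objective indices can be computed with a short NumPy sketch. It assumes 8-bit images with a peak value of 255 and evaluates the SSIM from global image statistics in a single window, which is a simplification of the locally windowed SSIM normally reported; the constants $C_1 = (0.01 \cdot 255)^2$ and $C_2 = (0.03 \cdot 255)^2$ are the conventional choices and are assumptions here. The function names are illustrative.

```python
import numpy as np

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB between a reference and a reconstruction."""
    ref = np.asarray(ref, dtype=np.float64)
    rec = np.asarray(rec, dtype=np.float64)
    mse = np.mean((ref - rec) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(peak ** 2 / mse)

def ssim_global(ref, rec, peak=255.0, k1=0.01, k2=0.03):
    """SSIM computed from global statistics (single window, simplified)."""
    x = np.asarray(ref, dtype=np.float64)
    y = np.asarray(rec, dtype=np.float64)
    c1, c2 = (k1 * peak) ** 2, (k2 * peak) ** 2
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    numerator = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    denominator = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator

# Toy usage: compare a random reference image with a noisy copy.
rng = np.random.default_rng(0)
reference = rng.integers(0, 256, size=(32, 32)).astype(np.float64)
noisy = reference + rng.normal(0.0, 5.0, size=reference.shape)
scores = (psnr(reference, noisy), ssim_global(reference, noisy))
```

For reported results, a windowed SSIM implementation is preferable, since the global variant ignores local structure and generally yields different values.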

Samples and Settings
In the dictionary learning stage, the training samples used in the experiments are taken from the training set in [2], and include natural images such as landscapes, people, and buildings. Some of the samples are shown in Figure 4. To ensure the objectivity of the experiments, the test images are selected from three image sets: Set5 [32], Set14 [33], and B100, where B100 includes 100 images selected from BSDS300 [33]. To quantitatively evaluate the quality of the reconstructed images, these test images are regarded as the HR reference images, and the LR images to be reconstructed are obtained by down-sampling these HR images. The sampling factor $s$ is set to 3. In the training set preprocessing stage, we set $r = 4$, that is, four high-pass filters $f_1$, $f_2$, $f_3$, and $f_4 = f_3^T$ are used. In the dictionary learning stage, the parameters of the cost function are set as follows: $\lambda = 0.001$, $\beta = 6$, $\gamma = 8$, $\rho = 0.035$. In the process of image reconstruction, the image patch size is set to 5 × 5.

Influence of the Number of Hidden Units on the Reconstructed Images
To examine the influence of the number of hidden units on the performance of the proposed algorithm, the dictionary dimension is set to 256, 512, 1024, and 2048 in turn. In this way, the optimal number of hidden units can be determined. In this experiment, the images in Set5 are selected as the test images.
Figure 5 shows the reconstructed results of Butterfly obtained with dictionaries of different numbers of hidden units, for subjective visual comparison. To make the comparison easier, we enlarge a region with rich details, which is highlighted with a yellow rectangle in the corresponding image. It can be seen from Figure 5 that, when the number of hidden units is 256, the reconstructed image is still blurry and there are obvious jagged artifacts at the edges. When the number of hidden units is 512 and 1024, the textures and edges of the reconstructed images gradually improve, the artifacts become fewer, and the reconstructed images become clearer. However, when the number reaches 2048, there is no significant improvement in the quality of the reconstructed image, but more time is spent on dictionary learning. Table 1 lists the PSNR and SSIM values of the reconstructed images obtained with dictionaries of different numbers of hidden units for Set5. From Table 1, we can see that the PSNR and SSIM values of the reconstructed images in Set5 gradually increase as the number of hidden units increases from 256 to 1024. However, when the number of hidden units increases to 2048, the PSNR and SSIM values decrease for most of the images in Set5. Therefore, combining the results in Figure 5 and Table 1, we take 1024 as the optimal number of hidden units, that is, the dictionary dimension is set to 1024.
In this experiment, the test images are derived from Set5, Set14, and BSD100, and the number of hidden units is set to 1024. The purpose of this experiment is to verify that it is effective to apply the SAE or the ISAE for dictionary learning. Consequently, we compare Bicubic interpolation with the proposed SRSAE and SRISAE algorithms.
Figure 6 shows the reconstructed results of Woman with these three SR algorithms from subjective visual perception. We can see that the reconstructed images obtained by the SRSAE and SRISAE algorithms contain significantly more details than the one obtained by the Bicubic algorithm. Moreover, compared with the SRSAE algorithm, many artifacts are reduced in the reconstructed images obtained by SRISAE; for instance, the woman's face is clearer. Table 2 lists the average PSNR and SSIM values of these three algorithms for the test image sets Set5, Set14, and BSD100. It can be seen from Table 2 that the average PSNR and SSIM values of the SRSAE algorithm are much larger than those of the Bicubic algorithm and that the performance of the SRISAE algorithm is better than that of the SRSAE algorithm.

Analyze the Performance of Different SR Algorithms on Image Sets
To further verify the performance of the proposed SRISAE algorithm, it is compared with eight SR algorithms: Super Resolution with L1 Regression (L1SR) [2]; Single Image Super Resolution (SISR) [18]; Anchored Neighborhood Regression (ANR), Neighbor Embedding with Least Squares (NE + LS), Neighbor Embedding with Non-Negative Least Squares (NE + NNLS), and Neighbor Embedding with Locally Linear Embedding (NE + LLE), all described in [34]; Adjusted Anchored Neighborhood Regression (A+) (16 atoms) [35]; and improved Super Resolution based on Sparse representation (ISPSR) [29]. The eight test images are selected from Set5 and Set14 in this experiment.
Figures 7a,b and 8a,b show the two HR test images Lena and Bird and their corresponding detail images, respectively. Figures 7c-k and 8c-k, respectively, illustrate the reconstructed results of the detail regions of the brim of Lena's hat and the Bird's head with different SR algorithms from subjective visual perception. It can be seen from Figures 7 and 8 that although the L1SR algorithm restores some of the details, there are obvious patch effects in its reconstructed images, such as the face in Figure 7c and part of the yellow feather in Figure 8c. With the SISR algorithm, the edge sharpening effect is obvious, but some artificial details appear in the reconstructed images, such as the edge of the hat in Figure 7d. The SR algorithms corresponding to Figures 7e-j and 8e-j achieve good reconstructed results, but they introduce too many artificial details while restoring more details, for example at the brim of Lena's hat and at the junction of the bottom of the Bird's mouth and feather. The proposed SRISAE algorithm is superior to the other eight SR algorithms. It restores more details without introducing too many artificial details, and the areas of the brim of Lena's hat in Figure 7k and the junction of the bottom of the Bird's mouth and feather in Figure 8k are closer to the original images. Table 3 shows the PSNR and SSIM values of the reconstructed images with different SR algorithms for these test images. It can be seen from Table 3 that the PSNR and SSIM values of the SRISAE algorithm are generally the best, which indicates that the proposed SRISAE algorithm outperforms the other eight SR algorithms mentioned above.

To analyze the computing time of these SR algorithms, all experiments are performed on the same hardware platform mentioned above, and the computing time of the different SR algorithms is shown in Table 4. From Table 4, we can see that although the proposed SRISAE algorithm is not the fastest, its computing time is at the same level as that of SISR, ANR, NE + LS, NE + NNLS, NE + LLE, and A+ (16 atoms), and its reconstruction performance is superior to that of these algorithms. It can be seen from Tables 3 and 4 that, although SRISAE gains only slightly over ISPSR in terms of the PSNR and SSIM indices, it requires significantly less computing time. Through a comprehensive analysis, we can see that the proposed SRISAE algorithm not only has the best reconstruction performance, but also has good reconstruction efficiency.
Table 5 lists the average PSNR and SSIM values of the reconstructed images with the different SR algorithms mentioned above for B100. It can be seen from Table 5 that the averages of these two evaluation indices are optimal for the proposed algorithm, indicating that the performance of our algorithm is better than that of the comparative SR algorithms.

Analyze the Performance of Different SR Algorithms on Real Medical Images
This experiment tests the performance of different SR algorithms on poor-quality medical images. The training data and the test data are derived from the public data set The Cancer Imaging Archive (TCIA) [36], and the test images are real LR medical images that have been randomly selected.
Figures 9 and 10 illustrate the reconstructed images of two real lung cancer images with different SR algorithms. Among them, Figures 9j and 10j are the reconstructed images of the proposed SRISAE algorithm. Compared with the real LR images to be reconstructed, shown in Figures 9a and 10a, the reconstructed images of the SRISAE algorithm are clearer, and many details in the edges, textures, and structures are restored.
Since there is no HR reference image, some classic no-reference image quality evaluation indices are used to evaluate the reconstructed images of different SR algorithms objectively. The indices are Variance, Mean Gradient, Entropy, Brenner, and Energy [37,38]; the higher the value is, the better the reconstruction performance. The maximum values of these indices are highlighted in bold type. Table 6 shows the averages of these indices over 76 reconstructed medical images, and we can see that the proposed SRISAE algorithm is slightly better than the other eight SR algorithms.
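The five no-reference indices named above can be sketched as follows. This is only an illustration under stated assumptions: the formulas are commonly used textbook forms (histogram entropy over 256 gray levels, forward differences for the gradients, a two-pixel horizontal difference for Brenner), the normalizations used in [37,38] may differ, and the function name no_reference_indices is hypothetical.

```python
import numpy as np


def no_reference_indices(image):
    """Common sharpness/information indices for a 2-D grayscale image.

    Assumes pixel values in [0, 255]; larger values suggest a sharper,
    more detailed image. Definitions are assumptions, not taken from
    the paper's references.
    """
    img = np.asarray(image, dtype=np.float64)
    if img.ndim != 2:
        raise ValueError("expected a 2-D grayscale image")

    # Forward differences on the common (H-1) x (W-1) grid.
    dx = img[:-1, 1:] - img[:-1, :-1]   # horizontal neighbour difference
    dy = img[1:, :-1] - img[:-1, :-1]   # vertical neighbour difference

    # Shannon entropy (bits) of the 256-bin gray-level histogram.
    levels = np.clip(np.rint(img), 0, 255).astype(np.int64).ravel()
    hist = np.bincount(levels, minlength=256)
    p = hist[hist > 0] / hist.sum()
    entropy = -np.sum(p * np.log2(p))

    # Brenner focus measure: squared difference two pixels apart.
    brenner_diff = img[:, 2:] - img[:, :-2]

    return {
        "variance": float(np.var(img)),
        "mean_gradient": float(np.mean(np.sqrt((dx ** 2 + dy ** 2) / 2.0))),
        "entropy": float(entropy),
        "brenner": float(np.sum(brenner_diff ** 2)),
        "energy": float(np.sum(dx ** 2 + dy ** 2)),
    }
```

Averaging each dictionary entry over all reconstructed test images would give one row of a comparison such as Table 6, provided every algorithm is evaluated on images of the same size, since Brenner and Energy are sums rather than means in this sketch.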

Conclusions
To make the input data more effective and enhance the training efficiency of the SAE, we propose a new training set preprocessing method that uses different approaches to construct the HR and LR training sets and employs the ZCA whitening technique to decorrelate the joint training set and reduce its redundancy. The SAE is applied to the SR algorithm based on sparse representation, and to further enhance the sparsity of the hidden layer, a constructed sparse regularization term is added to the cost function of the traditional SAE. Then, a novel unsupervised dictionary learning algorithm based on the ISAE is proposed to improve the accuracy and stability of the dictionary. Comparisons are made with several SR algorithms, including L1SR, SISR, ANR, NE + LS, NE + NNLS, NE + LLE, A+ (16 atoms), and ISPSR. Experimental results demonstrate that the proposed SRISAE algorithm achieves a significant improvement in terms of both quantitative and qualitative measurements.
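For readers unfamiliar with the decorrelation step summarized above, ZCA whitening can be sketched as follows. This is a generic textbook formulation, not the authors' code: the row-per-sample layout, the regularizer eps, and the function name zca_whiten are assumptions made for illustration.

```python
import numpy as np


def zca_whiten(X, eps=1e-5):
    """ZCA-whiten data X of shape (n_samples, n_features).

    Returns the whitened data, the feature mean, and the symmetric
    whitening matrix W, so that new samples can be transformed as
    (x - mean) @ W. eps guards against division by tiny eigenvalues.
    """
    X = np.asarray(X, dtype=np.float64)
    mean = X.mean(axis=0)
    Xc = X - mean                          # centre each feature
    cov = Xc.T @ Xc / Xc.shape[0]          # sample covariance
    eigvals, eigvecs = np.linalg.eigh(cov)  # cov is symmetric
    # Rotate to the eigenbasis, rescale to unit variance, rotate back.
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return Xc @ W, mean, W
```

Rotating back to the original axes is what distinguishes ZCA from PCA whitening: the whitened patches stay as close as possible to the input patches while their features become uncorrelated with approximately unit variance.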

Figure 1. Framework of the proposed training set preprocessing method. HR: high resolution; LR: low resolution; ZCA: zero-phase component analysis.

Figure 2. Relationship between dictionary learning and neural network representation. (a) Dictionary learning; (b) Neural network representation.

r = 4, that is, four high-pass filters are used, f1 = [-1, 0, 1] [...]. In the process of dictionary learning, the parameters related to the cost function are set as follows: [...]. In the process of image reconstruction, the image patch size is set to 5 × 5.

Figure 4. Some training samples for dictionary learning.

Figure 10. Comparison of the reconstructed images with different SR algorithms for medical image 2.

Table 1. Comparison of the peak signal-to-noise ratio (PSNR, in dB) and the structural similarity measure (SSIM) of the reconstructed images using the dictionaries with different numbers of hidden units for Set5 (PSNR/SSIM).

Table 2. Comparison of the average PSNR (dB) and SSIM values of the reconstructed images with the three super-resolution (SR) algorithms for Set5, Set14, and BSD100 (PSNR/SSIM). SRSAE: SR algorithm based on the SAE; SRISAE: SR algorithm based on the ISAE.

Table 3. Comparison of the PSNR and the SSIM of the reconstructed images with different SR algorithms (PSNR/SSIM).

Table 4. Comparison of computing time of different SR algorithms (s).

Table 5. Comparison of the average PSNR and SSIM values of the reconstructed images with different SR algorithms for B100 (PSNR/SSIM).

Table 6. Comparison of the average of no-reference image quality evaluation indices of the reconstructed images with different SR algorithms.