Gradient-Guided Convolutional Neural Network for MRI Image Super-Resolution

Super-resolution (SR) technology is essential for improving image quality in magnetic resonance imaging (MRI). The main challenge of MRI SR is to reconstruct the high-frequency details of a high-resolution (HR) image from a low-resolution (LR) image. To address this challenge, we develop a gradient-guided convolutional neural network that improves the reconstruction accuracy of high-frequency image details. A gradient prior is fully exploited to supply information about high-frequency details during the super-resolution process, leading to a more accurate reconstructed image. Experimental results on public MRI databases demonstrate that the gradient-guided convolutional neural network outperforms published state-of-the-art approaches.


Introduction
Accurate surgical analysis and clinical diagnosis require high-resolution images that visualize brain structure and function. One of the most compelling methods of visualizing brain structure is magnetic resonance imaging (MRI). However, high-resolution MR images are hard to obtain in practice. Routine brain MR images are usually acquired with thicker slices and at lower quality to reduce scanning costs and acquisition time, which hampers further medical analysis. For decades, super-resolution techniques have been studied to improve the resolution of LR MRI images, aiming to recover important information about anatomical structures to facilitate clinical diagnosis [1][2][3][4][5][6][7][8][9][10].
Earlier methods, such as interpolation-based methods, suffer from over-smoothing artifacts and tend to blur image textures and edges [1]. To tackle these issues, iterative reconstruction-based techniques attempt to recover high-frequency image details by introducing image priors as regularization terms [2,5,7,9,10], which enforce predefined constraints on the reconstructed image. However, these reconstruction-based methods are time-consuming because they reconstruct the image iteratively, generating a sequence of intermediate results.
Recently, machine learning techniques have attracted considerable attention in MRI SR. Learning-based SR methods assume that MRI super-resolution can be performed in a supervised setting, and they estimate the mapping from the LR space to the HR space from additional labeled examples [11][12][13].
Most recently, deep convolutional neural networks (CNNs) have driven important advances in computer vision [14]. Deep neural networks have become popular in biomedical tasks such as image classification [15,16] and image reconstruction [11,13]. Trained on large datasets, CNN-based super-resolution approaches have achieved significant advances over traditional learning-based methods for natural image super-resolution [17][18][19][20][21]. Inspired by this success in natural image SR, several CNN-based variants have been developed to improve MRI super-resolution [12][22][23][24][25][26]. An appealing feature of CNN-based approaches is that, once well trained, they perform super-resolution much faster than traditional reconstruction-based approaches.
To ensure data consistency, CNN-based approaches adopt the mean squared error (MSE) between the reconstructed image and its ground-truth image as the loss function. However, the pixel-wise MSE fails to enhance high-frequency image details (edges, corners or textures) and leads to blurry images. Figure 1 presents MRI super-resolution results of different methods. Figure 1b,c shows the results of CNN super-resolution models trained by minimizing the MSE; both tend to blur the image details.
Figure 1. Results of different super-resolution methods on real data from NAMIC with an upscaling factor of 4: (a) ground truth; (b) SCSR [13], which tends to blur high-frequency details; (c) the residual network (ResNet); and (d) our method (DGGRN), which is guided by gradient information during image reconstruction. The gradient-guided method produces more realistic results (zoom in for a better view).
For MRI images, high-frequency details such as the edge structures of sulci, gyri and the cortex substantially affect the detection of suspicious structures, the classification of malformations, and diagnosis. Thus, many studies have considered high-frequency information in MRI super-resolution [8][27][28][29][30]. For example, the interpolation approach of [29] improves the accuracy of edge reconstruction by introducing contrast guidance. With multi-contrast MRI, missing MR image details can be partly recovered from the information in reference MRI data [13,31]. However, these methods either constrain the optimization with additional regularization terms or feed supplementary information as part of the input, leaving the forward super-resolution process to reconstruct the high-resolution image blindly. A flexible model that embeds useful image priors into a CNN for MRI super-resolution is still missing. We argue that the image gradient, which indicates the positions (regions) that correspond to edges, textures or smooth areas, is beneficial for recovering high-frequency image details. By incorporating gradient guidance into the feed-forward network, the network can recover more high-frequency details.
Our main contributions to MR image super-resolution are summarized as follows:
1. We design a gradient-guided residual network for single-contrast MRI super-resolution. The proposed network exploits the mutual relation between super-resolution and image gradient priors, and thus uses image gradient information for super-resolution explicitly.
2. With a suitable model, the image gradient is exploited for MR image super-resolution to supply clues about high-frequency details. Under the guidance of the gradient, the forward super-resolution process reconstructs the HR image explicitly, leading to a more accurate HR image.
3. Experimental results on three public databases show that the gradient-guided CNN outperforms conventional feed-forward CNN architectures in MRI super-resolution.
The proposed approach provides a flexible way of employing image priors in CNN-based super-resolution.

Related Works
Let $x$ denote the HR image and $y$ the observed LR image. The degradation can be formulated as
$$y = DHx + n,$$
where $D$, $H$ and $n$ denote the downsampling operator, the blurring kernel and the additive noise, respectively. The MRI super-resolution image $\hat{x}$ can then be estimated by
$$\hat{x} = \arg\min_{x} \|DHx - y\|_2^2 + R(x),$$
where the data-fidelity term is defined by the squared L2 norm $\|\cdot\|_2^2$ and $R(x)$ is the regularization term. The main difficulty of single-image SR is that the problem is ill-posed: because high-frequency information is lost during degradation, many different HR images $x$ can produce the same LR image $y$.
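As a toy illustration of this ill-posedness (not part of the proposed method), the Python sketch below applies a degradation of the form y = DHx with the noise term set to zero, using a hypothetical 2-tap box blur as H and decimation by 2 as D, to two different 1D signals. Both yield the same LR observation, so y alone cannot reveal which HR signal produced it.

```python
def blur_box2(x):
    """Hypothetical H: average each sample with its right neighbour.

    The last sample is replicated at the border.
    """
    padded = x + [x[-1]]
    return [(padded[i] + padded[i + 1]) / 2 for i in range(len(x))]


def decimate(x, factor=2):
    """Hypothetical D: keep every `factor`-th sample."""
    return x[::factor]


def degrade_1d(x):
    """Noise-free toy degradation y = D H x."""
    return decimate(blur_box2(x))


x_smooth = [1.0, 1.0, 1.0, 1.0]  # flat HR signal
x_edges = [0.0, 2.0, 2.0, 0.0]   # HR signal with two edges
print(degrade_1d(x_smooth))      # [1.0, 1.0]
print(degrade_1d(x_edges))       # [1.0, 1.0]
```

The edges in `x_edges` disappear entirely in the LR observation, which is exactly the information a prior (such as the gradient prior discussed below) has to supply.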

CNN-Based MRI Super-Resolution
A CNN-based SR approach aims to learn an end-to-end mapping $F$ from the low-resolution image $y$ to the high-resolution image $x$. $F$ is decomposed into a sequence of convolutional layers, each followed by a rectified linear unit (ReLU). The $l$th convolutional layer convolves its input with filters of size $f_l \times f_l$, and its output is a set of feature maps:
$$y_l = \max(0, W_l * y_{l-1} + B_l),$$
where $W_l$ denotes the convolutional weights and $B_l$ the biases of the $l$th layer, $*$ denotes convolution, $y_{l-1}$ is the output of the $(l-1)$th layer, and $y_l$ is the output of the $l$th layer. The input $y_0$ is the LR image $y$.
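A minimal NumPy sketch of one such layer follows. It assumes a channel-first (C, H, W) layout and zero padding that preserves the spatial size, and, as in most deep-learning frameworks, it implements `*` as cross-correlation rather than a flipped-kernel convolution. The function name `conv_relu_layer` is illustrative only.

```python
import numpy as np


def conv_relu_layer(y_prev, W, B):
    """Compute y_l = max(0, W_l * y_{l-1} + B_l) for one layer.

    y_prev: (C_in, H, W) feature maps from layer l-1
    W:      (C_out, C_in, f, f) filters with odd f
    B:      (C_out,) biases
    """
    c_out, _, f, _ = W.shape
    pad = f // 2
    _, height, width = y_prev.shape
    # Zero padding keeps the output the same spatial size as the input.
    padded = np.pad(y_prev, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, height, width))
    for i in range(f):
        for j in range(f):
            # Contribution of filter tap (i, j), summed over input channels.
            window = padded[:, i:i + height, j:j + width]
            out += np.einsum("oc,chw->ohw", W[:, :, i, j], window)
    out += B[:, None, None]
    return np.maximum(out, 0.0)  # ReLU


# Usage: a 3x3 identity filter with zero bias passes the input through,
# and the ReLU clips the negative responses.
y0 = np.array([[[1.0, -2.0], [3.0, -4.0]]])
W1 = np.zeros((1, 1, 3, 3))
W1[0, 0, 1, 1] = 1.0
y1 = conv_relu_layer(y0, W1, np.zeros(1))
```

Stacking several calls, each feeding its output to the next, gives the layered mapping F described above.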
In summary, CNN-based approaches learn a mapping function $\hat{x} = F(y; \Theta)$ parameterized by $\Theta$, which contains all weights $W_l$ and biases $B_l$. To estimate $\Theta$, the mean squared error (MSE) between the reconstructed image and the ground-truth image is often used as the loss function, $L = \|x - F(y; \Theta)\|_2^2$. Denoting a T2-weighted (T2w) low-resolution MRI image by $y_{T2}$, CNN-based MRI super-resolution learns an end-to-end mapping $\hat{x}_{T2} = F(y_{T2}; \Theta)$ from labeled data. The objective of the network is to generate a T2w high-resolution image $\hat{x}_{T2}$ whose quality is close to that of the ground-truth MRI image $x_{T2}$.
The residual network structure [32] has been widely adopted in CNN-based image super-resolution [18,21]. As illustrated in Figure 2a, the conventional residual block (Resblock) has two convolutional layers and a shortcut connection; its output is the sum of the block input and the output of the convolutional layers.
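The Resblock can be sketched in a few lines of NumPy. In this hedged sketch the two convolutions are passed in as shape-preserving callables, so any convolution implementation can be plugged in; the ReLU between them is an assumption (it is common in SR residual networks but is not stated in the text).

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def resblock(x, conv1, conv2):
    """Conventional residual block: two convolutions plus an identity shortcut.

    conv1 and conv2 map (C, H, W) arrays to arrays of the same shape.
    """
    z = conv2(relu(conv1(x)))  # residual branch
    return x + z               # shortcut: add the block input back


# Usage: single-channel 1x1 convolutions reduce to scalings.
x = np.array([[[1.0, -1.0]]])
out = resblock(x, lambda t: 2.0 * t, lambda t: -t)
```

When both convolutions output zeros, the block reduces to the identity, which is what makes deep stacks of Resblocks easy to optimize.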

High-Frequency Details Recovery
Most CNN-based models adopt MSE as the loss function. Because MSE treats every pixel equally and is minimized by averaging over plausible solutions, it tends to produce over-smoothed results. Recovering the missing high-frequency details is therefore the key objective of image SR.
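The averaging effect can be made concrete with a small Python example. Suppose, hypothetically, that an LR input is equally consistent with two HR patches whose edge lies in different places. A prediction that commits to one sharp edge then has a higher expected MSE than the blurry average of the two candidates, so an MSE-trained network is pushed towards the blurry answer.

```python
import numpy as np


def mse(x, x_hat):
    """Pixel-wise mean squared error between ground truth and estimate."""
    return float(np.mean((np.asarray(x) - np.asarray(x_hat)) ** 2))


# Hypothetical ambiguity: two equally likely HR patches for the same LR input.
candidates = [np.array([0.0, 1.0]), np.array([1.0, 0.0])]


def expected_mse(prediction):
    """Average MSE of a prediction over the equally likely candidates."""
    return sum(mse(t, prediction) for t in candidates) / len(candidates)


sharp = candidates[0]                 # commits to one edge position
blurry = np.mean(candidates, axis=0)  # average of the candidates: [0.5, 0.5]
print(expected_mse(sharp))   # 0.5
print(expected_mse(blurry))  # 0.25
```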
To address the over-smoothing issue, the gradient prior has been widely applied in reconstruction-based [4,27,30] and CNN-based MRI SR methods [33][34][35]. The image gradient indicates the positions and magnitudes of high-frequency image components, which is important for improving super-resolution accuracy. Two approaches are commonly used to embed the image gradient prior into CNNs:
1. The image gradient is employed as a regularization term in the loss function. In a correctly restored image, the edges and textures (which are related to the image gradients) should be accurate, so a regularization term induced by this additional source of information helps recover high-frequency details. The loss becomes $L = L_{MSE} + L_G$, where
$$L_G = \|G(x) - G(F(y; \Theta))\|_2^2,$$
in which $G(\cdot)$ denotes the gradient detector and $G(x)$ is the gradient magnitude of image $x$.
2. The image gradient is incorporated into the SR process by concatenating the gradient map with the input LR image $y$ as a joint input $[y, G(y)]$ of the network, so that the mapping function becomes $\hat{x} = F(y, G(y); \Theta)$.
Both approaches implicitly assume that the input or the loss function is where gradient information should be incorporated. However, the positions of high-frequency details are not explicitly exploited during image reconstruction: in conventional CNN-based SR, the intermediate layers attempt to restore the image blindly.
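Both approaches can be sketched in NumPy. Two assumptions are made here: the gradient detector G is approximated with central differences (`np.gradient`) rather than any particular operator, and L_G is taken to be the squared L2 distance between the gradient maps of the ground truth and the reconstruction. The helper names are hypothetical.

```python
import numpy as np


def gradient_magnitude(img):
    """Stand-in gradient detector G(.): central differences via np.gradient."""
    gy, gx = np.gradient(img.astype(float))
    return np.sqrt(gx ** 2 + gy ** 2)


def gradient_regularized_loss(x, x_hat):
    """Approach 1: L = L_MSE + L_G, with L_G = ||G(x) - G(x_hat)||_2^2 (assumed)."""
    l_mse = np.sum((x - x_hat) ** 2)
    l_g = np.sum((gradient_magnitude(x) - gradient_magnitude(x_hat)) ** 2)
    return float(l_mse + l_g)


def joint_input(y):
    """Approach 2: stack [y, G(y)] as a two-channel (2, H, W) network input."""
    return np.stack([y, gradient_magnitude(y)], axis=0)


# Usage: a sharp step edge and a flat estimate with the same mean intensity.
x = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))
flat = np.full_like(x, x.mean())
```

Because the flat estimate has no gradient at all, `gradient_regularized_loss(x, flat)` exceeds its plain squared error; the extra term penalizes exactly the missing edge.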
To encourage the network to focus on high-frequency image details, we design a gradient-guided Resblock, illustrated in Figure 2b. It is based on the Resblock, but the output of its intermediate layer is modulated by gradient information.

Proposed Methods
We develop a deep gradient-guided residual network (DGGRN) based on two observations: (1) CNN-based SR methods [12,13] have achieved significant performance gains in MRI super-resolution; and (2) gradient features of the LR image facilitate the recovery of high-frequency details in the HR image [4,28,30,34,36]. Figure 3 illustrates the overall architecture of DGGRN, which consists of two subnets: one models gradient information and the other performs super-resolution. The input of DGGRN is an LR T2w image $y_{T2}$, and the output is a high-resolution image $\hat{x}_{T2}$.

Gradient Modeling (GM) Subnet
Given the gradient magnitude map $g_{T2}$ of the LR T2w image, the GM subnet aims to locate high-frequency image details selectively and to help the SR subnet discriminate between smooth areas and non-smooth areas rich in fine textures. To fully exploit the gradient information, the GM subnet is designed as a shallow, densely connected convolutional network that is optimized end-to-end with the SR subnet.
As shown in Figure 3, the gradient detector computes the gradients of the LR T2w image in the y- and x-directions to obtain the gradient magnitude map, which is then normalized to [0, 1] before being fed into the GM subnet. Specifically, we use the Sobel operator as the gradient detector.
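A NumPy sketch of this preprocessing step is given below. The Sobel kernels are standard; the edge-replicated border handling and the min-max normalization to [0, 1] are assumptions, since the text states only that the map is normalized to that range.

```python
import numpy as np

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
SOBEL_Y = SOBEL_X.T  # responds to changes along the y-direction


def filter3x3(img, kernel):
    """3x3 cross-correlation with edge-replicated borders (border mode assumed)."""
    p = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w))
    for i in range(3):
        for j in range(3):
            out += kernel[i, j] * p[i:i + h, j:j + w]
    return out


def sobel_gradient_map(img, eps=1e-12):
    """Sobel magnitude sqrt(Gx^2 + Gy^2), min-max normalized to [0, 1]."""
    gx = filter3x3(img, SOBEL_X)
    gy = filter3x3(img, SOBEL_Y)
    mag = np.hypot(gx, gy)
    span = mag.max() - mag.min()
    if span < eps:  # flat image: no edges anywhere
        return np.zeros_like(mag)
    return (mag - mag.min()) / span


# Usage: a vertical step edge lights up only the two columns next to it.
step = np.tile([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], (5, 1))
g = sobel_gradient_map(step)
```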
After normalization, $g_{T2}$ is fed into the GM subnet to produce the gradient guidance, a set of feature maps $G_{T2}$:
$$G_{T2} = F_{GM}(g_{T2}; \Theta_{GM}),$$
where $F_{GM}$ is the learned mapping function with parameters $\Theta_{GM}$. As expected, the GM subnet acts as a feature selector that identifies and locates high-frequency image details. We use the sigmoid function as the activation of the final convolutional layer (red box in Figure 3), so the output of the GM subnet is a set of feature maps with values in [0, 1] that indicate which kind of image region (smooth or high-frequency) each pixel belongs to. This output is broadcast to the SR subnet to guide the SR process.
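The GM subnet can be sketched as follows under clearly hypothetical choices: four 3 × 3 convolutional layers (the count given under Implementation Details), DenseNet-style connectivity in which every layer sees the concatenation of g_T2 and all earlier outputs, a growth rate of 8 channels, ReLU in the hidden layers, and 64 sigmoid output channels so that G_T2 can be multiplied element-wise with 64-channel SR feature maps. The weights are random, so the output only demonstrates the shapes and the (0, 1) range.

```python
import numpy as np


def conv3x3(x, w, b):
    """'Same' 3x3 cross-correlation; x: (C_in, H, W), w: (C_out, C_in, 3, 3)."""
    _, h, wd = x.shape
    p = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((w.shape[0], h, wd))
    for i in range(3):
        for j in range(3):
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], p[:, i:i + h, j:j + wd])
    return out + b[:, None, None]


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_gm_params(n_layers=4, growth=8, out_channels=64, seed=0):
    """Random (w, b) pairs; layer k receives 1 + k * growth input channels."""
    rng = np.random.default_rng(seed)
    params, c_in = [], 1
    for k in range(n_layers):
        c_out = out_channels if k == n_layers - 1 else growth
        params.append((rng.normal(0.0, 0.1, (c_out, c_in, 3, 3)), np.zeros(c_out)))
        c_in += c_out
    return params


def gm_subnet(g, params):
    """Hypothetical dense GM subnet: g (1, H, W) -> G_T2 (C_out, H, W) in (0, 1)."""
    feats = [g]
    for w, b in params[:-1]:
        # Dense connectivity: each layer sees g and every earlier output.
        feats.append(np.maximum(conv3x3(np.concatenate(feats, axis=0), w, b), 0.0))
    w_last, b_last = params[-1]
    return sigmoid(conv3x3(np.concatenate(feats, axis=0), w_last, b_last))


g_t2 = np.random.default_rng(1).random((1, 6, 6))
G_t2 = gm_subnet(g_t2, init_gm_params())
```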

Super-Resolution Subnet
The SR subnet reconstructs the HR image conditioned on the output $G_{T2}$ of the GM subnet. Its input therefore has two parts: the LR T2w image $y_{T2}$ and the GM subnet output $G_{T2}$.

Gradient-Guided Resblock
The SR subnet consists of several gradient-guided residual blocks. We propose an effective block, the gradient-guided Resblock, which modulates the convolutional results according to $G_{T2}$. As illustrated in Figure 2, compared with the conventional Resblock, the convolutional result of the gradient-guided Resblock is modulated by the output of the GM subnet.
Let $Z$ denote the output of the convolutional operation. The result of the gradient-guided Resblock is the element-wise product of $Z$ and the gradient condition $G_{T2}$:
$$\tilde{Z} = Z \otimes G_{T2},$$
where $\otimes$ denotes element-wise multiplication. In this way, the learned GM subnet influences the outputs by spatially reweighting each intermediate feature map of the SR subnet.
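A minimal NumPy sketch of the modulation follows. As in a plain Resblock, the convolutions are shape-preserving callables; the ReLU between them and the placement of the identity shortcut after the modulation are assumptions, since the exact wiring is defined by Figure 2b and may differ.

```python
import numpy as np


def gradient_guided_resblock(x, g, conv1, conv2):
    """Resblock whose convolutional result Z is modulated by the guidance G_T2.

    x: (C, H, W) features; g: GM-subnet output in [0, 1] with the same shape
    as x (or broadcastable to it).
    """
    z = conv2(np.maximum(conv1(x), 0.0))  # Z: convolutional result
    return x + z * g                       # Z ⊗ G_T2, then identity shortcut (assumed)


x = np.array([[[1.0, -1.0]]])
conv1 = lambda t: 2.0 * t  # single-channel 1x1 convolutions act as scalings
conv2 = lambda t: -t
```

Where G_T2 is close to 0 (smooth regions) the block passes its input through almost unchanged; where it is close to 1 (edges and textures) the full residual correction is applied.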

Reconstruction Block
In our network, features are upscaled by a sub-pixel convolutional layer [19]. The upscaling is performed in the last part of the network, so that most computations take place in the LR space; this reduces the computational cost while preserving the model capacity [19]. The final HR image $\hat{x}_{T2}$ is reconstructed as
$$\hat{x}_{T2} = F_{SR}(y_{T2}, G_{T2}; \Theta_{SR}),$$
where $F_{SR}$ is the learned mapping of the SR subnet with parameters $\Theta_{SR}$.
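Sub-pixel convolution [19] ends with a periodic shuffling step that turns C·r² LR feature maps into C maps that are r times larger in each dimension. A NumPy sketch of that rearrangement for channel-first arrays follows; the channel ordering shown is one common convention and is an assumption, since the text does not specify it (TensorFlow's `tf.nn.depth_to_space`, for instance, operates on channel-last tensors).

```python
import numpy as np


def pixel_shuffle(features, r):
    """Rearrange (C*r*r, H, W) -> (C, H*r, W*r) for sub-pixel convolution.

    Output pixel (c, h*r + i, w*r + j) is taken from input channel
    c*r*r + i*r + j at position (h, w). The preceding convolution (not shown)
    must emit C*r*r channels computed in LR space.
    """
    c_rr, h, w = features.shape
    if c_rr % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    c = c_rr // (r * r)
    x = features.reshape(c, r, r, h, w)  # axes: (c, i, j, h, w)
    x = x.transpose(0, 3, 1, 4, 2)       # axes: (c, h, i, w, j)
    return x.reshape(c, h * r, w * r)


# Usage: four 1x1 maps become one 2x2 map for an upscaling factor of 2.
hr = pixel_shuffle(np.arange(4.0).reshape(4, 1, 1), 2)
```

Because all convolutions run before this step, their cost scales with the LR grid rather than the r² times larger HR grid.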

Datasets
We performed experiments on three public databases. The BrainWeb dataset [37] (http://www.bic.mni.mcgill.ca/BrainWeb/) is a publicly available MRI dataset that includes simulated images of normal brains and of brains with multiple sclerosis. It contains a set of realistic MRI volumes produced by an MRI simulator. The voxel size of the synthetic brain MRI is 1 × 1 × 1 mm³ and the volume size is 181 × 217 × 181 voxels.
The IXI dataset (https://brain-development.org/ixi-dataset/) consists of real MRI data collected at three hospitals in London. We used the MRI images from Guy's Hospital. The voxel size is 1 × 1 × 1 mm³ and the volume size is 264 × 255 × 186 voxels.
All 3D MRI data were split into 2D image sequences along the transverse, sagittal, and coronal planes. The obtained 2D data were all normalized to [0, 1].
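The slicing and normalization step can be sketched as follows. Per-slice min-max scaling is an assumption (the text says only that the 2D data were normalized to [0, 1]), and because the mapping from array axes to the transverse, sagittal and coronal planes depends on each file's orientation, the sketch keys the slices by axis index only.

```python
import numpy as np


def normalize01(img, eps=1e-12):
    """Min-max scale one 2D slice to [0, 1]; a constant slice maps to zeros."""
    img = img.astype(float)
    span = img.max() - img.min()
    return np.zeros_like(img) if span < eps else (img - img.min()) / span


def slices_three_planes(volume):
    """Split a 3D volume into normalized 2D slices along each of its three axes."""
    return {
        axis: [normalize01(np.take(volume, k, axis=axis))
               for k in range(volume.shape[axis])]
        for axis in range(3)
    }


vol = np.random.default_rng(0).random((4, 5, 6))
planes = slices_three_planes(vol)
```

For a 181 × 217 × 181 BrainWeb volume, this yields 181 + 217 + 181 = 579 slices.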

Implementation Details
Our experiments were divided into two groups. In the first group, experiments were conducted on BrainWeb and NAMIC, and the training and testing sets were built as in [13] for a fair comparison. In the second group, experiments were conducted on the IXI dataset, for which we built our own training and testing sets and trained DGGRN from scratch.
Training set: LR images were generated as follows:
1. The original image x was convolved with a 3 × 3 Gaussian kernel with a standard deviation of 1.
2. The blurred image was down-sampled by factors of 2, 3 and 4, respectively.
The same degradation was applied in [9,13]; it simulates the generation of an LR MRI image in the spatial domain.
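The two degradation steps can be sketched in NumPy. Two details are assumptions not given in the text: reflect padding at the image border, and down-sampling by plain decimation (keeping every factor-th pixel). Because the Gaussian kernel is symmetric, cross-correlation and convolution coincide here.

```python
import numpy as np


def gaussian_kernel3(sigma=1.0):
    """3x3 Gaussian kernel with standard deviation sigma, normalized to sum to 1."""
    ax = np.array([-1.0, 0.0, 1.0])
    k = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def degrade_2d(x, factor, sigma=1.0):
    """Step 1: blur with the 3x3 Gaussian; step 2: keep every `factor`-th pixel."""
    k = gaussian_kernel3(sigma)
    h, w = x.shape
    p = np.pad(x, 1, mode="reflect")  # border handling is an assumption
    blurred = sum(k[i, j] * p[i:i + h, j:j + w] for i in range(3) for j in range(3))
    return blurred[::factor, ::factor]  # decimation is an assumption


hr = np.random.default_rng(0).random((12, 12))
lr = {s: degrade_2d(hr, s) for s in (2, 3, 4)}
```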
For BrainWeb and NAMIC, we built the training sets from the same data as in [13]. For the IXI dataset, 300 2D T2w images from 10 subjects were used for network training. By flipping and rotation, 16 augmented images were generated from each training image. In addition to these affine augmentations, we extended the training set via elastic deformation [38].
Test set: For testing, we randomly selected samples from subjects not included in the training data. Specifically, we selected 56 samples from IXI subjects. The test sets of BrainWeb and NAMIC were produced as in [13].
Network structure: The SR subnet is composed of eight gradient-guided Resblocks, each consisting of two convolutional layers. The GM subnet consists of four densely connected convolutional layers. Following CNN-based SR studies [18,21], each convolutional layer uses 64 filters of size 3 × 3. The weights were initialized with the Xavier method, and the biases were initialized to zero.
Training details: Our network was implemented in TensorFlow [39]. To minimize overhead and make full use of the GPU memory, the batch size was set to 64, and training was stopped after 65 epochs, when no further improvement was observed. The 12 × 12 LR T2w patches and their gradient magnitude maps were fed into the network as inputs. According to Pham et al. [11], the Adam method [40] provides faster convergence and better reconstruction results than SGD. We therefore trained the network with Adam using $\beta_1 = 0.9$ and $\beta_2 = 0.999$. The initial learning rate was $1 \times 10^{-4}$ and was decreased by 10% every 20 epochs.
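The optimizer settings can be sketched in plain Python. The step schedule reads "decreased by 10%" as multiplying the rate by 0.9 and counts epochs from 0; both readings are assumptions. The Adam update for a single scalar parameter uses β₁ = 0.9 and β₂ = 0.999 from the text and ε = 1e-8, the default suggested in the original Adam paper (not stated above).

```python
import math


def learning_rate(epoch, base_lr=1e-4, decay=0.9, step=20):
    """Step schedule: multiply the rate by `decay` every `step` epochs (epoch from 0)."""
    return base_lr * decay ** (epoch // step)


def adam_step(theta, grad, state, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update of a scalar parameter; `state` holds m, v and step count t."""
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad       # first moment
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2  # second moment
    m_hat = state["m"] / (1 - beta1 ** state["t"])             # bias correction
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return theta - lr * m_hat / (math.sqrt(v_hat) + eps)


state = {"m": 0.0, "v": 0.0, "t": 0}
theta = adam_step(0.0, 2.0, state, learning_rate(0))
```

With bias correction, the first step moves the parameter by almost exactly the learning rate regardless of the gradient's scale, one reason Adam is less sensitive to gradient magnitude than plain SGD.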
Hardware: All experiments were conducted on a PC with a 2.1 GHz Intel Xeon E5-2620 CPU and an NVIDIA Titan X GPU (12 GB memory). All compared approaches were run on the same machine.

Comparison with State-of-the-Art Methods
The proposed approach was compared with three conventional methods: bicubic interpolation, low-rank and total-variation regularization [9], and non-local up-sampling [2]. We also compared our results with two CNN-based MRI SR methods: the single-contrast super-resolution CNN (SCSR) [13] and the residual-learning network ReCNN [11]. SCSR was proposed for multi-contrast MRI super-resolution; we used the output of its single-contrast subnet for comparison. ReCNN, the most recent of the compared CNN-based methods, is designed for 3D MRI super-resolution; to compare with it, we followed the baseline network of [11] and trained the corresponding 2D CNN from scratch.
To evaluate the effect of gradient guidance in DGGRN, we also trained a residual network (ResNet) with eight Resblocks from scratch on the same training data. Experiments were performed on BrainWeb and NAMIC. Peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) were used as quantitative measures. Higher PSNR values indicate that the reconstruction is more faithful to the ground-truth image, while higher SSIM values indicate that image structures are preserved more accurately. MATLAB functions were used for the evaluation.
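For reference, PSNR for images normalized to [0, 1] reduces to 10·log10(1/MSE). The short NumPy sketch below shows that definition; it is not the MATLAB routine used for the reported numbers, and it does not cover SSIM, whose windowed statistics depend on implementation details.

```python
import numpy as np


def psnr(x, x_hat, data_range=1.0):
    """PSNR in dB: 10 * log10(data_range^2 / MSE); data_range = 1 for [0, 1] images."""
    err = np.mean((np.asarray(x, float) - np.asarray(x_hat, float)) ** 2)
    if err == 0:
        return float("inf")
    return float(10.0 * np.log10(data_range ** 2 / err))
```

Because PSNR is logarithmic in MSE, a gain of 1.6 dB corresponds to an MSE about 31% lower (10^(-0.16) ≈ 0.69).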
The quantitative results of the different methods are reported in Table 1. Compared with SCSR, our method yields PSNR values that are higher by approximately 1.6 dB on BrainWeb, 0.6 dB on NAMIC and 0.4 dB on IXI, so the largest gain is obtained on BrainWeb. Our method also outperforms ReCNN, a CNN with residual learning, on all three test sets. SSIM reflects perceived structural similarity, and the SSIM values of DGGRN are higher than those of the other methods on the real data of NAMIC and IXI. These results show that embedding image gradients into CNN models yields non-trivial gains.
Figure 4 presents the images reconstructed by DGGRN and the other methods with an upscaling factor of 4 on the three test sets. The bicubic method tends to produce blurry images with unexpected artifacts. SCSR and ResNet produce more visually appealing results than the bicubic method, but they lose subtle details in some local regions. DGGRN restores edges and textured areas more accurately and recovers more informative structural details than these methods.

Benefits of Gradient-Guided Resblock
In the proposed network, the SR subnet and the GM subnet are trained jointly. Figure 5 shows several feature maps of the last convolutional layer of the GM subnet. These feature maps are diverse enough to represent the high-frequency details in T2w MRI, which supports the validity of the gradient-guided strategy; guided by them, edge and texture areas can be reconstructed explicitly. In DGGRN, the SR subnet consists of eight gradient-guided Resblocks. The ResNet and the proposed network share the same initial parameters and hyperparameters, such as the learning rate and the number of epochs, and Table 1 reports their quantitative results. Figure 4 presents three examples of real images with an upscaling factor of 4: facilitated by the GM subnet, the proposed method outperforms ResNet in recovering sharp edges and tiny textures.

Performance and Training Epochs
We investigated how the performance of DGGRN and ResNet evolves over the training epochs. In our work, the parameters of both networks were estimated by minimizing the loss function with Adam. Figure 6 depicts the convergence curves of DGGRN and ResNet with an upscaling factor of 2 on the three test sets. DGGRN reaches a plateau within 20 epochs, after which its results stay close to their best. Notably, DGGRN achieves a higher PSNR than ResNet on all three test sets throughout all 65 epochs. These observations indicate that, with Adam optimization, DGGRN is superior in both performance and convergence speed.

Parameters and Performance
Filter size: Small filters such as 3 × 3 are a popular choice in current CNN-based image super-resolution. With limited computational resources, CNN-based SR favors deeper networks with small filters over wider networks with large filters [18,21]. Following these studies, we set the filter size to 3 × 3.
Network size: One of the issues in training deeper networks is overfitting. We investigated the relationship between performance and network size starting from the baseline DGGRN (the SR subnet with eight gradient-guided Resblocks), increasing the number of blocks from 2 to 12. Table 2 reports the PSNRs of ResNet and the proposed network for various network sizes. For DGGRN, the PSNR values improve progressively as the number of gradient-guided Resblocks rises from 2 to 12. In contrast, a sharp drop is observed when more Resblocks are stacked in ResNet; the same observation was reported by Zeng et al. [13]. This might be because the GM subnet acts as a selector that drops out selected inputs of the next layer, which helps prevent overfitting.
The number of filters: SR performance can benefit from a reasonable number of filters, so an appropriate number of filters K in each convolutional layer must be selected. We trained DGGRN with different values of K; one training epoch took 33, 45 and 70 s for K = 32, 64 and 128, respectively. Table 3 shows how the number of filters affects performance, where $\mathrm{DGGRN}_K$ denotes DGGRN with K filters. When K increases from 32 to 64, the average PSNR over the three datasets improves by about 0.56 dB. Compared with $\mathrm{DGGRN}_{64}$, $\mathrm{DGGRN}_{128}$ achieves only a small improvement (0.06 dB) in PSNR, while the training time per epoch rises from 45 s to 70 s and the number of parameters grows considerably (by up to a factor of four for a 3 × 3 layer whose input and output both have K channels). Thus, we set K = 64 as a trade-off between reconstructed image quality and computational complexity.
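The growth in parameters can be checked with a short calculation: a 3 × 3 convolution with K input and K output channels has 9K² weights plus K biases, so doubling K roughly quadruples its parameter count. The hypothetical helper below counts a single layer only; layers with a fixed number of input channels (such as the first layer) grow only linearly in K, so the whole-network ratio depends on the exact architecture.

```python
def conv_params(k_in, k_out, f=3, bias=True):
    """Parameters of one f x f convolutional layer: weights plus optional biases."""
    return f * f * k_in * k_out + (k_out if bias else 0)


for k in (32, 64, 128):
    print(f"K={k}: {conv_params(k, k)} parameters per K-to-K 3x3 layer")
# K=32: 9248 parameters per K-to-K 3x3 layer
# K=64: 36928 parameters per K-to-K 3x3 layer
# K=128: 147584 parameters per K-to-K 3x3 layer
```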

Conclusions
This paper proposes a deep gradient-guided residual network for MRI super-resolution. The gradient subnetwork operates as a feature selector that locates high-frequency details and enhances the corresponding features. By broadcasting spatial information about high-frequency regions to the SR subnet, high-frequency MRI details can be recovered explicitly, so our method avoids reconstructing HR images blindly. The joint recovery by the gradient modeling and super-resolution subnets leads to more accurate detail recovery. Experiments on synthetic and real brain MRI data demonstrate that DGGRN reconstructs HR images with more faithful high-frequency details than other methods. In future work, we will explore applications of gradient information in 3D brain MRI reconstruction. Moreover, the gradient prior could be generalized to other image priors, such as brain segmentation maps, textures and other features that describe the image.