Multiple Optimizations-Based ESRFBN Super-Resolution Network Algorithm for MR Images

Abstract: Magnetic resonance (MR) images can detect small pathological tissue with a size of 3–5 image pixels at an early stage, which is of great significance in the localization of pathological lesions and the diagnosis of disease. High-resolution MR images can provide clearer structural details and help doctors to analyze and diagnose the disease correctly. In this paper, MR super-resolution based on the multiple optimizations-based Enhanced Super Resolution Feedback Network (ESRFBN) is proposed. The method realizes network optimization from the three perspectives of network structure, data characteristics and heterogeneous network integration. Firstly, a super-resolution network structure based on multi-slice input optimization is proposed to make full use of the structural similarity between samples. Secondly, aiming at the problem that the L1 or L2 loss function is based on a per-pixel comparison of differences, without considering human visual perception, an optimization method of multiple loss function cascade is proposed, which combines the L1 loss function, which retains the color and brightness characteristics, with the MS-SSIM loss function, which better retains the contrast characteristics of the high-frequency region, so that the depth model has better characterization performance. Thirdly, in view of the problem that large deep learning networks find it difficult to balance model complexity and training difficulty, a heterogeneous network fusion method is proposed. For multiple independent deep super-resolution networks, the output of each single network is integrated through an additional fusion layer, which broadens the width of the network and can effectively improve the mapping and characterization capabilities of high- and low-resolution features.
The experimental results on two super-resolution scales and on MR image datasets of four human body parts show that the proposed large-sample space learning super-resolution method effectively improves the super-resolution performance.


Introduction
MR images can detect small pathological tissue at an early stage, which is of great significance in the localization of pathological lesions and the diagnosis of disease. As high-resolution MR images can provide clearer structural details, which help doctors to analyze and diagnose the disease correctly, it is desirable to obtain high-resolution MR images in a stronger magnetic field [1]. However, the acquisition of high-resolution MR images requires a stronger magnetic field and a longer radiation scanning time. In general, MR imaging of one organ region lasts about 6-12 min [2]. In this paper, high resolution refers to the resolution of an image of one organ region; for example, for the same size of organ region, the low resolution is 512 × 512 and the high resolution is 1024 × 1024. When the imaging process takes a longer time, the movement of the human body during the process introduces more noise [3]. The improvement of the resolution of MR images therefore comes at the cost of a doubling of the imaging time, which greatly reduces the patient's experience. Among earlier super-resolution approaches, sparse-representation methods obtain low-resolution and high-resolution image blocks and their corresponding dictionaries through the joint training of low-resolution and high-resolution image block dictionaries. The sparse representation of the low-resolution image blocks and the high-resolution over-complete dictionary can then work together to reconstruct the high-resolution image blocks, and the reconstructed blocks are finally connected to obtain the complete high-resolution image.
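The coupled-dictionary reconstruction described above can be sketched in a toy numpy example. The dictionary sizes and the least-squares coding step below are illustrative simplifications (real methods solve an L1-regularised sparse coding problem); none of the names come from the paper:

```python
import numpy as np

# Toy sketch of coupled-dictionary super-resolution: the sparse code
# alpha inferred from a low-resolution patch and the low-resolution
# dictionary D_l is reused with the high-resolution dictionary D_h.
rng = np.random.default_rng(0)
n_atoms = 8
D_l = rng.standard_normal((16, n_atoms))   # low-res dictionary (4x4 patches)
D_h = rng.standard_normal((64, n_atoms))   # high-res dictionary (8x8 patches)

# Ground-truth sparse code with 2 active atoms.
alpha_true = np.zeros(n_atoms)
alpha_true[[1, 5]] = [1.5, -0.7]

patch_lr = D_l @ alpha_true                # observed low-res patch

# Sparse coding step, simplified here to a least-squares fit.
alpha, *_ = np.linalg.lstsq(D_l, patch_lr, rcond=None)

patch_hr = D_h @ alpha                     # reconstructed high-res patch
assert patch_hr.shape == (64,)
```

Because the toy patch lies exactly in the span of the dictionary, the least-squares fit recovers the sparse code; in practice the coding step is regularised and only approximate.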
IJDL [17]: Aiming at the problem that joint cascade dictionary training considers only the error of the joint image block pairs and not the individual reconstruction errors of the high- and low-resolution dictionaries, Improved Joint Dictionary Learning (IJDL), a dictionary training method based on the independent calculation of the high- and low-resolution dictionary reconstruction errors, is proposed. This method optimizes the objective function, takes the individual reconstruction errors of the high- and low-resolution dictionaries into consideration, and discards the traditional cascade calculation method, effectively reducing the reconstruction error and improving the reconstruction accuracy of the image.
RDN [11]: The Residual Dense Network (RDN) applies residual dense blocks. It can not only read the state of the previous residual dense block through a continuous memory mechanism, but also make full use of all the layers within a block through local dense connections, adaptively retaining the accumulated features through Local Feature Fusion (LFF). Global residual learning is used to combine the shallow features with the deep features and make full use of the hierarchical features of the original low-resolution image.
EDSR [18]: Enhanced Deep Residual Networks for Single Image Super-Resolution (EDSR) draws on the residual learning mechanism of the ResNet network. The input is divided into two paths after a layer of convolution: one path passes through N residual blocks for further convolution, and the other leads directly to the intersection for weighted summation; the result is then output after up-sampling and convolution.
SRGAN [12]: SRGAN is a generative adversarial network. It adds a discriminator network and two corresponding losses on the basis of SRResNet, and trains the two networks in an alternating manner.
ESRGAN [19]: Compared with SRGAN, ESRGAN adds the perceptual loss and uses the Residual-in-Residual Dense Block (RRDB) module to replace the original residual block module, so that ESRGAN can obtain more realistic and natural textures.
SRFBN [14]: SRFBN designs a feedback module, using the feedback mechanism to improve the effect of super-resolution. The advantage of the feedback is that no additional parameters are added, and multiple feedback iterations are equivalent to deepening the network, refining the generated image. Although other networks have adopted similar recurrent structures, those networks are unable to let the front layers obtain useful information from the back layers.
ESRFBN [20]: In ESRFBN, the numbers of feature maps and of group convolutions are increased on the basis of the SRFBN network: the number of feature maps is changed from 32 to 64, and the number of group convolutions is increased from 3 to 6.
However, these deep learning networks are mainly designed for natural image super-resolution tasks, not for medical or MR images. In this regard, medical image researchers have also paid attention to the progress of deep super-resolution and transferred deep learning technology to MR images, and a number of deep-learning-based MR image super-resolution methods have emerged. For example, a Progressive Sub-band Residual learning Super-Resolution Network (PSR-SRN) is proposed in [21], which contains two parallel progressive learning streams. One stream passes through the sub-band residual learning unit to detect the missing high-frequency residuals, and the other stream focuses on the reconstruction of refined MR images. These two learning streams complement each other and learn the complex mapping between high- and low-resolution MR images. In [22], researchers proposed a new hybrid network, which improves the quality of MR images by increasing the width of the network. The hybrid block combines a multi-path structure and the variation dense block, which can extract rich features from low-resolution images. In previous work [23], the researchers combined meta-learning technology with generative adversarial networks (GANs) to achieve super-resolution of MR images at any scale with high fidelity, and meta-learning technology was also used in [24].
Generative adversarial networks (GANs) are a class of deep learning models and one of the most promising methods for unsupervised learning on complex distributions in recent years. The model produces fairly good output through the mutual game learning of two modules in the framework: a generative model and a discriminative model. GANs are most commonly used in image generation tasks such as super-resolution and semantic segmentation. The super-resolution scale is the scaling factor; the scaling operation adjusts the size of the image according to the given factor. In this paper, data sets with two super-resolution scales of ×2 and ×4 are generated.
However, current deep learning technology still has three limitations when applied to MR image super-resolution. (1) The characteristics of the MR image sequence are not fully considered. The imaging method of MR images is different from that of natural images: in the process of MR imaging, a series of images with similar structures is obtained. Therefore, compared with natural images, MR sequence images can provide richer structural information and make the network more robust to noise. (2) At present, most deep networks adopt the L1 or L2 loss function, which are based on a pixel-by-pixel comparison without considering human visual perception. At the same time, they easily fall into a local optimal solution, and it is difficult to obtain the optimal effect [20].
(3) The capability of feature representation of deep learning networks gradually increases with the deepening of the network. However, with the huge network structure, the training difficulty of the network also increases exponentially, which makes it difficult to achieve a good balance between model complexity and training difficulty.

Algorithm Architecture
Super-resolution methods based on deep learning have been successively applied to MR images. However, these methods simply use deep learning techniques to process MR image super-resolution tasks without fully considering the difference between natural images and medical images. Deep learning technology still has limitations in MR images' super-resolution. In this paper, corresponding optimization methods are proposed for the three limitations of the application of current deep super-resolution networks in MR images.
Aiming at the problem that the characteristics of the MR image sequence are not fully considered, this section proposes a multi-slice input strategy to provide more structural information for the network. The MR images form a sequence, and adjacent slices have a similar structure, which can provide two-point-five-dimensional (2.5D) information for the network. This achieves better results than two-dimensional (2D) input, with fewer parameters and higher efficiency than three-dimensional (3D) convolution [25].
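The multi-slice input strategy can be sketched as follows, assuming adjacent slices are stacked along the channel axis to form a 2.5D input; the function name and the boundary handling (clamping indices at the volume edges) are illustrative, not taken from the paper:

```python
import numpy as np

def make_25d_input(volume, center, n_adjacent=1):
    """Stack a slice with its neighbours along the channel axis.

    volume: (depth, H, W) MR volume. Indices are clamped at the
    boundary, so the first and last slices reuse their neighbour.
    """
    depth = volume.shape[0]
    idx = [min(max(center + d, 0), depth - 1)
           for d in range(-n_adjacent, n_adjacent + 1)]
    return np.stack([volume[i] for i in idx], axis=0)  # (2n+1, H, W)

vol = np.arange(5 * 4 * 4, dtype=np.float32).reshape(5, 4, 4)
x = make_25d_input(vol, center=2, n_adjacent=1)
assert x.shape == (3, 4, 4)
```

The network's first convolution then simply takes 2n+1 input channels instead of 1, which is the sense in which 2.5D input is cheaper than full 3D convolution.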
Aiming at the problem that the L1 or L2 loss function is based on a pixel-by-pixel comparison without considering human visual perception, where the L1 loss function is the least absolute deviation (LAD) loss and the L2 loss function is the least square error (LSE) loss, we propose a method of multi-loss-function cascade optimization based on an analysis of the commonly used loss functions. By combining the ability of the L1 loss function to retain color and brightness with the ability of the MS-SSIM loss function to better retain the contrast of the high-frequency region, the depth model achieves better characterization performance [26].
Aiming at the problem that it is difficult for large deep learning networks to balance model complexity and training difficulty, this paper proposes a method based on heterogeneous network fusion [27]. Heterogeneous network fusion uses common techniques to interconnect super-resolution networks of different structures. Different structures have different MR super-resolution abilities, and fusion improves super-resolution performance by exploiting the complementary advantages of these heterogeneous networks. For multiple independent deep super-resolution networks, the output of each single network is integrated through an additional fusion layer, or two independent networks are cascaded. This method essentially broadens the width and depth of the network, and can effectively improve the mapping and characterization capabilities of high- and low-resolution features, so as to obtain a better MR image super-resolution effect.
In order to solve the three limitations of deep super-resolution networks on MR images simultaneously, this paper proposes a multi-optimized ESRFBN super-resolution network algorithm for MR images based on the optimization methods of the previous three paragraphs. The algorithm architecture is shown in Figure 1. The three optimization methods are applied to the ESRFBN network at the same time, so that the multi-optimized ESRFBN network can get a better MR images' super-resolution effect.


Cascade Optimization of Loss Function Based on Joint PSNR-SSIM
In order to improve the PSNR and SSIM indexes of MR super-resolution, a cascade optimization method based on the joint PSNR-SSIM loss function is proposed in this section. SSIM as a loss function is not particularly sensitive to uniform deviations; this leads to changes in brightness or color, which usually become duller. However, SSIM can preserve contrast and performs better than other loss functions in the high-frequency region. On the other hand, the L1 loss function can keep the brightness and color unchanged. Therefore, in order to obtain the best characteristics of the two loss functions, this section combines them into a new loss function, named the Mix loss function: L_Mix = α · L_MS-SSIM + (1 − α) · L_l1. The cascade optimization calculation steps of the loss function based on joint PSNR-SSIM are as follows.
Step 1. Calculate L_MS-SSIM. SSIM/MS-SSIM is an index that measures the similarity of two images, mainly from three aspects: brightness, contrast and structure. Therefore, the SSIM index can better characterize the structural information of the image. As a loss function, it can better retain the high-frequency information of the image, so that the super-resolution image carries more detailed information, which is useful for MR images in clinical diagnosis. Its expression is as follows:
SSIM(y, ŷ) = ((2 µ_y µ_ŷ + c1)(2 σ_yŷ + c2)) / ((µ_y² + µ_ŷ² + c1)(σ²_y + σ²_ŷ + c2)),
where y is the original high-resolution image; ŷ is the high-resolution image output by the deep network; µ_y is the mean of the image y; µ_ŷ is the mean of the image ŷ; σ²_y is the variance of the image y; σ²_ŷ is the variance of the image ŷ; σ_yŷ is the covariance of the images y and ŷ; and c1 and c2 are two constants that prevent the denominator from being 0.
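The SSIM expression above can be sketched with global image statistics; this is a simplification for illustration (practical SSIM computes the means and variances with a local window, and the constants below follow the common convention for 8-bit images, which the paper does not specify):

```python
import numpy as np

def ssim_global(y, y_hat, c1=(0.01 * 255) ** 2, c2=(0.03 * 255) ** 2):
    # Global-statistics form of SSIM: means, variances and the
    # covariance of the reference y and the network output y_hat.
    mu_y, mu_h = y.mean(), y_hat.mean()
    var_y, var_h = y.var(), y_hat.var()
    cov = ((y - mu_y) * (y_hat - mu_h)).mean()
    return ((2 * mu_y * mu_h + c1) * (2 * cov + c2)) / \
           ((mu_y ** 2 + mu_h ** 2 + c1) * (var_y + var_h + c2))

img = np.random.default_rng(1).uniform(0, 255, (32, 32))
assert np.isclose(ssim_global(img, img), 1.0)  # identical images score 1
```

For identical images the numerator and denominator coincide, so the index reaches its maximum of 1, matching the stated 0-1 range.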
Appl. Sci. 2021, 11, 8150
The index range of SSIM is 0-1; the larger the SSIM value, the higher the similarity of the two images. Therefore, SSIM as a loss function can be expressed as the following formula [20]: L_SSIM = 1 − SSIM(y, ŷ).
In order to better transmit the error, the SSIM formula is deformed, and the dependence of the mean and standard deviation on the pixel p is ignored here. The mean and standard deviation are calculated using a Gaussian filter G_σG with standard deviation σ_G.
At the same time, the SSIM loss function can be re-characterized as L_SSIM(P) = 1 − SSIM(p̃), where p̃ represents the central pixel of the patch P. Even if the network learns weights that maximize the SSIM value of the central pixel, the learned kernel is applied to all pixels in the image. Therefore, it is necessary to calculate the derivative at p̃ with respect to the other pixels q. Analogous to SSIM as a loss function, MS-SSIM as a loss function can be characterized as L_MS-SSIM(P) = 1 − MS-SSIM(p̃).
Step 2. Calculate L_l1. Both the L1 and L2 loss functions can keep the brightness and color unchanged, but the L2 loss function is more sensitive to abnormal points; thus, the effect of the model is strongly affected by the data quality.
The mechanism of the L1 loss function on the image is based on comparing the differences per pixel and then taking the absolute value. In general, it minimizes the sum of the absolute differences between the target value and the estimated value. Its expression is L_l1 = (1/(W H)) ∑_i ∑_j |y(i, j) − ŷ(i, j)|, where y(i, j) represents the pixel value of the original high-resolution image at position (i, j); ŷ(i, j) represents the pixel value of the network output image at position (i, j); and W and H represent the width and height of the image, respectively [20].
Compared with the L1 loss function, the L2 loss function magnifies the gap between the maximum error and the minimum error, and the L2 loss function is more sensitive to abnormal points. When the quality of the training data is poor, the L2 loss function will affect the model effect, resulting in inaccurate super-resolution effects, which will have a serious impact on medical diagnosis based on MR images. However, if only the L1 loss function is used, it is easy to fall into a local optimal solution.
Step 3. Set α. The initial value of α in the formula is set to 0.84. In this section, α is halved at the 90th, 120th and 190th epochs, respectively, which is conducive to the rapid convergence of the network.
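The three steps above can be sketched as follows. The weighted-sum form of the Mix loss and the externally supplied MS-SSIM value are assumptions for illustration; only the initial value 0.84 and the halving epochs come from the text:

```python
import numpy as np

def l1_loss(y, y_hat):
    # Mean absolute per-pixel difference (L1 / LAD loss).
    return np.abs(y - y_hat).mean()

def mix_loss(y, y_hat, alpha, ms_ssim):
    # Cascaded loss: alpha weights the (1 - MS-SSIM) term against L1.
    # ms_ssim is assumed to be computed by an external routine.
    return alpha * (1.0 - ms_ssim) + (1.0 - alpha) * l1_loss(y, y_hat)

def alpha_at(epoch, alpha0=0.84, milestones=(90, 120, 190)):
    # Halve alpha at each milestone epoch, as described in Step 3.
    return alpha0 * 0.5 ** sum(epoch >= m for m in milestones)

assert alpha_at(0) == 0.84
assert abs(alpha_at(100) - 0.42) < 1e-12   # after the 90th epoch
```

A perfect reconstruction (y_hat equal to y and MS-SSIM equal to 1) drives the Mix loss to zero, as both terms vanish.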

ESRFBN Network Improvement Based on Contextual Network Fusion
The ESRFBN network improvement based on contextual network fusion is shown in Figure 2. The input low-resolution MR images are fed into two channels with two network architectures. In the first channel, the images pass through 3 × 3 convolution, 1 × 1 convolution, deconvolution and up-sampling operations; through these computing units, a high-resolution image is produced. In the second channel, a high-resolution image is obtained through the second network. The two independent networks thus predict high-resolution images respectively, and a convolution-based fusion layer fuses the high-resolution images output by the independent networks and outputs the final super-resolution image. The expression is as follows [20].

ŷ = W_j ∗ [ŷ_1, ŷ_2] + b_j, where ∗ denotes convolution and [ŷ_1, ŷ_2] is the concatenation of the outputs of the two networks. Among them, W_j and b_j are the parameters of the fusion layer being constructed. The weight of the fusion layer can be learned by fine-tuning the entire network. In this section, the parameters of the two independent networks are frozen in order to learn the parameters of the fusion layer. A 3 × 3 convolution layer is used as the fusion layer, and its number of learnable parameters is 3 × (3 × 3 + 1) = 30. In the training of the improved ESRFBN network based on contextual network fusion, the weights of the individual networks are frozen, and the fusion layer is randomly initialized from a zero-mean Gaussian distribution with a standard deviation of 0.001.
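The fusion layer can be roughly illustrated in numpy: the outputs of the two frozen networks are stacked and passed through a single 3 × 3 convolution initialized from a zero-mean Gaussian with standard deviation 0.001. Single-channel (grayscale) images, two input channels and a plain zero-padded convolution are simplifying assumptions of this sketch, not details from the paper:

```python
import numpy as np

def conv3x3(x, w, b):
    # 'Same' 2D convolution for a (C, H, W) input with zero padding.
    c_in, h, wdt = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.full((h, wdt), b)
    for c in range(c_in):
        for i in range(3):
            for j in range(3):
                out += w[c, i, j] * xp[c, i:i + h, j:j + wdt]
    return out

rng = np.random.default_rng(0)
sr1 = rng.uniform(0, 1, (64, 64))     # output of network 1 (frozen)
sr2 = rng.uniform(0, 1, (64, 64))     # output of network 2 (frozen)
stacked = np.stack([sr1, sr2])        # fusion-layer input, 2 channels

# Zero-mean Gaussian initialisation with std 0.001, as in the text.
w = rng.normal(0.0, 0.001, (2, 3, 3))
b = 0.0
fused = conv3x3(stacked, w, b)        # only these weights are trained
assert fused.shape == (64, 64)
```

Only w and b would receive gradient updates during fine-tuning, which is what keeps the fusion step cheap relative to retraining either base network.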

Algorithm Flow
In this section, the steps of the ESRFBN super-resolution network algorithm for MR images based on multiple optimizations are mainly divided into network training and model testing. The specific steps are given in Appendix A.

Discussion
Firstly, with a large number of training samples, super-resolution performance improves as the super-resolution network becomes deeper; accordingly, the number of parameters of the deep super-resolution network becomes larger. Secondly, with a similar number of parameters and training samples, deep networks of different structures have different super-resolution abilities, and these differences in performance stem from the different super-resolution mechanisms. We therefore fuse networks of different structures, so that the fusion network can extract more of the features of the MR images and a higher super-resolution performance is obtained. In this fusion network, the independent deep super-resolution networks are fused, so the size of the networks is not decreased, and the numbers of parameters of the original super-resolution networks and of the fusion network are similar. Balancing performance and complexity, at the same network size, the fusion network achieves higher performance than the original deep networks. For multiple independent deep super-resolution networks, the output of each single network is integrated through an additional fusion layer, which broadens the width of the network and can effectively improve the mapping and characterization capabilities of high- and low-resolution features.

Experiment 1
In this section, loss function experiments with L1, L2, SSIM, MS-SSIM and L1+MS-SSIM are performed on the bicubic down-sampling data set and the K-space truncated data set. The specific experimental results are shown in Tables 1-4. When ESRFBN is trained with the cascading loss function L1+MS-SSIM, the effect is optimal on both the bicubic down-sampling data set and the K-space truncated data set, which indicates the effectiveness of the cascading loss function optimization.
SSIM, as a loss function, is not particularly sensitive to uniform deviations, which causes changes in brightness or color that usually become duller. However, SSIM can preserve contrast and is better than other loss functions in the high-frequency region. On the other hand, the L1 loss function can keep the brightness and color unchanged. Therefore, the cascaded loss function combines the characteristics of the two well, so that the super-resolution effect of the trained network is better. As can be seen from Figure 3, compared with the SSIM and MS-SSIM loss functions, the results of L1 and L2 only improve the pixels around the contour details, while for the SSIM and MS-SSIM loss functions, the feature details of the high-resolution images after super-resolution are more prominent. However, the L1+MS-SSIM loss function retains both pixel gray-level information and contour feature information, and the image after super-resolution is the most similar to the original image. This also indicates that the ESRFBN super-resolution network based on L1+MS-SSIM cascading loss function optimization can effectively improve the super-resolution effect of MR images.
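The K-space truncation used to generate one of the two low-resolution data sets can be illustrated as follows. This is a hypothetical simulation (the paper does not specify the exact truncation protocol): the high-frequency periphery of k-space is discarded and the central region is inverse-transformed:

```python
import numpy as np

def kspace_truncate(hr, scale=2):
    """Simulate a low-resolution MR acquisition by keeping only the
    central 1/scale portion of k-space (illustrative protocol)."""
    h, w = hr.shape
    k = np.fft.fftshift(np.fft.fft2(hr))          # centre the spectrum
    ch, cw = h // (2 * scale), w // (2 * scale)
    k_low = k[h // 2 - ch:h // 2 + ch, w // 2 - cw:w // 2 + cw]
    lr = np.fft.ifft2(np.fft.ifftshift(k_low)).real / scale ** 2
    return lr

hr = np.random.default_rng(2).uniform(0, 1, (64, 64))
lr = kspace_truncate(hr, scale=2)
assert lr.shape == (32, 32)
```

Unlike bicubic down-sampling, which blurs in the image domain, this degradation removes high-frequency k-space content directly, which is closer to how shorter MR scans trade resolution for time.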

Experiment 2: Analysis of the Convergence of Each Loss Function Index during the Training Process
In order to enable the network to converge better during the training process and avoid falling into the local optimal solution, the cascaded loss function L1+MS-SSIM uses the parameter α to balance the PSNR index and the SSIM index. As can be seen from the training convergence curve of the ESRFBN network in Figure 4, it begins to converge before 80 epochs. Uncertainty in the calculation of the loss function is increased by adjusting the attenuation of the learning rate and of the parameter α, so as to avoid falling into the local optimal solution. The attenuation of parameter α is shown in Figure 4.

In this section, in order to better reflect the impact of the attenuation of parameter α on the training process, the difference between the evaluation indexes of the cascaded loss function (Mix) and those of the other loss functions during training is used for characterization. Figure 5 shows the curve of the number of iterations against the difference in PSNR during the training of each loss function, and Figure 6 shows the corresponding curve for the difference in SSIM. It can be seen from the figures that at epoch 90 and epoch 120, the difference between the Mix loss function and the other loss functions vibrates noticeably. This is because, when parameter α in the Mix loss function is attenuated, the loss function curve oscillates, resulting in lower PSNR and SSIM values calculated by the model. However, the difference stabilizes after rising in the subsequent process. This shows that ESRFBN has a strong convergence ability. On the other hand, it shows that during the training process, the learning rate attenuation strategy can prevent the network from falling into the local optimal solution, so that the final network model has a better super-resolution effect.
In order to improve the PSNR and SSIM indexes of MR super-resolution, this section proposes a method based on PSNR-SSIM joint loss function cascade optimization. To verify the effectiveness of the proposed method, four experiments were carried out on the data sets generated by the two modes. Experiment 1 proves that the L1+MS-SSIM cascaded loss function obtains the optimal super-resolution effect, which proves the effectiveness of the cascading loss function optimization. This is mainly because the cascading loss function combines the advantages of the L1 loss function, which keeps brightness and color unchanged, and the MS-SSIM loss function, which retains contrast and outperforms other loss functions in the high-frequency region, so that the cascading loss function has the best characteristics of both. The cascaded loss function optimization enables the ESRFBN network to obtain a better super-resolution effect on MR images.
Experiment 2 shows that parameter attenuation causes the difference between the Mix loss function and the other loss functions to oscillate markedly, which prevents the network from falling into a local optimum and gives the resulting model a better super-resolution effect. It also shows that the ESRFBN network has strong convergence ability, allowing the deep network to converge quickly during training and reducing the difficulty of learning.
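A step-decay schedule of the kind described above can be sketched as follows; the decay factor, interval, and initial values are illustrative assumptions, not the paper's hyperparameters:

```python
def step_decay(base, epoch, drop=0.5, every=30):
    # Step decay: multiply the base value by `drop` every `every` epochs.
    # (drop=0.5, every=30 are assumed values for illustration.)
    return base * (drop ** (epoch // every))

# The same schedule can attenuate either the learning rate or the
# Mix-loss weight alpha during training:
lr_at_90 = step_decay(1e-4, 90)      # learning rate after 90 epochs
alpha_at_90 = step_decay(0.84, 90)   # attenuated alpha after 90 epochs
```

A drop every 30 epochs would produce exactly the kind of visible shifts around epochs 90 and 120 that Figures 5 and 6 describe.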
The data set used in this section includes MR image data of the neck, head, breast and knee. The sizes of the training, validation and test sets are: neck: 400, 50, 50; head: 500, 50, 50; breast: 1800, 200, 200; knee: 800, 100, 100. Data sets with two super-resolution scales, ×2 and ×4, are generated. The model is implemented in PyTorch 1.6.0 and trained on dual Nvidia GeForce GTX 1080 Ti graphics cards.
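Generating the ×2 and ×4 pairs requires a degradation step that maps each high-resolution slice to a low-resolution input. The paper does not specify the kernel; the sketch below uses simple average pooling as an assumed stand-in (bicubic downsampling is another common choice):

```python
import numpy as np

def downsample(hr, scale):
    # Average-pool downsampling: an assumed stand-in for the degradation
    # used to generate x2 / x4 low-resolution training pairs.
    h, w = hr.shape
    return hr[:h - h % scale, :w - w % scale].reshape(
        h // scale, scale, w // scale, scale).mean(axis=(1, 3))
```

Applied with `scale=2` and `scale=4` to every slice, this yields the two super-resolution data sets described above.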
The methods compared in this section and the experimental results of each method are shown in Tables 5 and 6. The tables show that the method proposed in this paper obtains the best results on all four human body parts and at both super-resolution scales, with a large improvement in the evaluation indexes.
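The PSNR index reported in Tables 5 and 6 is a standard quantity and can be computed directly; a minimal NumPy implementation (assuming 8-bit images, hence `max_val=255`) is:

```python
import numpy as np

def psnr(sr, hr, max_val=255.0):
    # Peak signal-to-noise ratio in dB; higher is better.
    mse = np.mean((sr.astype(np.float64) - hr.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')  # identical images
    return 10.0 * np.log10(max_val**2 / mse)
```

SSIM is computed analogously with local windows (e.g. `skimage.metrics.structural_similarity` in practice).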

Conclusions
This paper addresses three limitations of applying deep learning networks to the super-resolution of MR images: the characteristics of MR image sequences are not fully exploited, the optimal solution of the loss function is difficult to obtain, and model complexity is difficult to balance against training difficulty. Firstly, seven deep learning networks with good super-resolution performance were evaluated at two super-resolution scales on four human body parts, namely the neck, breast, knee and head; the quantitative results show that the ESRFBN network is the most suitable for the MR image super-resolution task. Secondly, in view of the three limitations above, this paper proposes a multiple optimization-based ESRFBN super-resolution network algorithm for MR images that integrates the three optimization methods. Quantitative experimental results show that the multiple optimization-based ESRFBN super-resolution network outperforms super-resolution algorithms based on interpolation, reconstruction and dictionary learning, as well as other deep learning-based super-resolution algorithms, and can greatly improve the super-resolution effect of MR images, proving that the proposed method is effective.

Institutional Review Board Statement: Ethical review and approval were not applicable for this study, because the data used in the experiments are open source.

Data Availability Statement:
We have not used specific data from other sources for the simulations of the results. The two popular MRI data sets used in this paper, the fastMRI Dataset and the IXI Dataset, are free to download from https://fastmri.org/ and http://www.brain-development.org/ (accessed on 31 November 2020).

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
Algorithm A1 Multiple optimization ESRFBN super-resolution network training process.
Input: Multi-slice low-resolution image set X: (x_{t−1}, x_t, x_{t+1})_k; multi-slice high-resolution image set Y: (y_{t−1}, y_t, y_{t+1})_k; deep model parameters θ_1, θ_2, θ_3.
Output: Evaluation values PSNR, SSIM.
1: Load the model parameters.
2: Image super-resolution: for i = 1 to k do
   (1) Each independent network outputs its respective super-resolution result.
   (2) The fusion layer fuses the output results of each network into Y_i^3.
3: Compute psnr(Y_i^3, Y_i) and ssim(Y_i^3, Y_i).
4: Output the values of PSNR and SSIM.
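The fusion layer of step 2(2) can be sketched in NumPy as a weighted sum over the outputs of the independent networks; uniform weights are an assumption here (a trainable 1×1 convolution over the stacked outputs would serve the same role in the full model):

```python
import numpy as np

def fuse(outputs, weights=None):
    # Fusion layer: weighted sum of K independent network outputs (K, H, W).
    outs = np.stack(outputs)
    if weights is None:
        # Assumed default: uniform averaging over the K networks.
        weights = np.full(len(outputs), 1.0 / len(outputs))
    w = np.asarray(weights, dtype=float).reshape(-1, 1, 1)
    return (w * outs).sum(axis=0)
```

Broadening the network in this way lets each sub-network contribute its own high/low-resolution feature mapping, with the fusion weights arbitrating between them.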