# Benchmarking MRI Reconstruction Neural Networks on Large Public Datasets


*Keywords:* image reconstruction; neural networks; deep learning; fastMRI; OASIS; MRI


CEA/NeuroSpin, Bât 145, F-91191 Gif-sur-Yvette, France

Inria Saclay Ile-de-France, Parietal team, Univ. Paris-Saclay, 91120 Palaiseau, France

AIM, CEA, CNRS, Université Paris-Saclay, Université Paris Diderot, Sorbonne Paris Cité, F-91191 Gif-sur-Yvette, France

Author to whom correspondence should be addressed.

Received: 8 February 2020 / Revised: 23 February 2020 / Accepted: 26 February 2020 / Published: 6 March 2020

(This article belongs to the Special Issue Signal Processing and Machine Learning for Biomedical Data)

Deep learning is starting to offer promising results for reconstruction in Magnetic Resonance Imaging (MRI). Many networks are being developed, but comparisons remain difficult because the studies use different frameworks, the networks are not properly re-trained, and the datasets differ from one comparison to the next. The recent release of a public dataset of raw k-space data, fastMRI, encouraged us to write a consistent benchmark of several deep neural networks for MR image reconstruction. This paper presents the results of this benchmark, allowing the networks to be compared, and links to the open-source Keras implementation of all these networks. The main finding of this benchmark is that it is beneficial to perform more iterations between the image and the measurement spaces than to have a deeper per-space network.

A short version of this work has been accepted to the 17th International Symposium on Biomedical Imaging (ISBI 2020), 3–7 April 2020, Iowa City, IA, USA [1]. Magnetic Resonance Imaging (MRI) is an imaging modality used to probe the soft tissues of the human body. As it is non-invasive and non-ionizing (contrary to X-rays, for example), its popularity has grown over the years, tripling, for example, between 1997 and 2006 according to the authors of [2]. This is attributed in part to technical improvements such as higher-field magnets (3 Teslas instead of 1.5), parallel imaging [3], or compressed sensing MRI [4] (CS-MRI). These improvements allow for better image quality and shorter acquisitions.

There is, however, still room for improvement. Indeed, an MRI scan may last up to 90 min according to the NHS website [5], making it impractical for some people, because one needs to lie still for this long period. Typically, babies or people suffering from Parkinson’s disease or claustrophobia cannot stay that long in a scanner without undergoing general anesthesia, which is a heavy process, making the overall exam less accessible. To extend accessibility to more people, we should, therefore, either increase the robustness to motion artifacts, or reduce the acquisition time at the same image quality. On top of that, we should also reduce the reconstruction time at the same image quality, to increase MRI scanner throughput and reduce the total exam time. Indeed, the reconstructed image might show motion artifacts, in which case the whole acquisition needs to be re-done [6]. In other cases, based on the first images seen, the physician may decide to prescribe complementary pulse sequences to clarify the image-based diagnosis.

When working in the framework of CS-MRI, the classical methods generally involve solving a convex non-smooth optimization problem. This problem often involves a data-fitting term and a regularization term reflecting our prior on the data. The need for regularization comes from the fact that the problem is ill-posed since the sampling in the Fourier space, called k-space, is under the Nyquist–Shannon limit. However, these classical reconstruction methods exhibit two shortcomings.

- They are usually iterative, involving the computation of transforms on large data, and therefore take a long time (2 min for a $512\times 512$ slice with 500 µm in-plane resolution [7], on a machine with 8 cores).
- The regularization is usually not perfectly suited to MRI data (it is indeed very difficult to come up with a prior that perfectly reflects MR images).

This is where learning, and in particular deep learning, comes into play. The promise is that it will solve both of the aforementioned problems.

- Because they are implemented efficiently on GPU and do not use an iterative algorithm, the deep learning algorithms run extremely fast.
- If they have enough capacity, they can learn a better prior of the MR images from the training set.

One of the first neural networks to gain attention for its use in MRI reconstruction was AUTOMAP [8]. This network did not exploit any problem-specific property, except the fact that the output was supposed to be an image. More recent works [9,10,11] have drawn inspiration from existing classical methods in order to leverage problem-specific properties, as well as expertise gained in the field. However, these networks have not been compared against each other on a large dataset containing complex-valued raw data.

A recently published dataset, fastMRI [12], enables this comparison, although it has yet to be carried out and requires implementing the different networks in the same framework to allow for a fairer comparison in terms of, for example, runtime.

Our contribution is exactly this, that is:

- Benchmark different neural networks for MRI reconstruction on two datasets: the fastMRI dataset, containing raw complex-valued knee data, and the OASIS dataset [13] containing DICOM real-valued brain data.
- Provide reproducible code and the networks’ weights (https://github.com/zaccharieramzi/fastmri-reproducible-benchmark), using Keras [14] with a TensorFlow backend [15].

While our work focuses on the reconstruction of classical MRI modalities, note that other works have applied deep learning to other modalities like MR fingerprinting [16] or diffusion MRI [17]. The networks studied here could be applied to those modalities, but would not benefit from some invariants of the problem, especially in the fourth (contrast-related) dimension introduced.

In this section, we briefly discuss other works presenting benchmarks of several different reconstruction neural networks.

In [18], the authors benchmark their (adversarial-training-based) algorithms against classical methods and against the Cascade-net (which they call Deep Cascade) [11] and ADMM-net (which they call DeepADMM) [19]. They train and evaluate the networks quantitatively on two datasets, selecting each time 100 images for training and 100 images for testing:

- The IXI database (http://brain-development.org/ixi-dataset/) (brains),
- The Data Science Bowl challenge (https://www.kaggle.com/c/second-annual-data-science-bowl/data) (chests).

While both these datasets provide enough samples for a trustworthy estimate of the networks’ performance, they are composed not of raw complex-valued data, but of real-valued DICOM data. Still, in [18], the authors do evaluate their algorithms on a raw complex-valued dataset (http://mridata.org/list?project=Stanford%20Fullysampled%203D%20FSE%20Knees), but it only features 20 acquisitions, and therefore the comparison is only done qualitatively.

In [10], they benchmark their algorithm against classical methods. They train and evaluate their network on three different datasets:

- The brain real-valued data set provided by the Alzheimer’s Disease Neuroimaging Initiative (ADNI) [20],
- Two proprietary datasets with raw complex-valued brain data.

Again, the only public dataset they use features real-valued data. It is also worth noting that their code cannot be found online.

In this section, we first introduce what we call the classical models for reconstruction in CS-MRI. The models we chose to discuss are in no way an exhaustive list of all the models that can be used without learning for reconstruction in MRI (think of LORAKS [21], for example, just to name this one), but they allow us to justify how the subsequent neural networks are built. These models are only briefly introduced.

In anatomical MRI, the image is encoded as its Fourier transform, and the data acquisition is segmented in time into multiple shots or trajectories. This does not take possible gradient errors or $B_0$-field inhomogeneities into account. Because each Fourier coefficient trajectory takes time to acquire, the time separating two consecutive shots, namely the repetition time (TR), being potentially rather long, the idea of CS-MRI is to acquire fewer of them. We, therefore, have the following idealized inverse problem in the case of single-coil CS-MRI:
$$\mathit{y}={\mathit{F}}_{\mathsf{\Omega}}\mathit{x}$$

where **y** is the acquired Fourier coefficients, also called the k-space data, $\mathsf{\Omega}$ is the sub-sampling pattern or mask, ${\mathit{F}}_{\mathsf{\Omega}}$ is the non-uniform Fourier transform (or masked Fourier transform in the case of Cartesian under-sampling), and **x** is the real anatomical image. Here, we will only deal with Cartesian under-sampling, in which case ${\mathit{F}}_{\mathsf{\Omega}}={M}_{\mathsf{\Omega}}\mathit{F}$, where ${M}_{\mathsf{\Omega}}$ is a mask, and **F** is the classical Fourier transform. This model is also valid for 3D (volumewise) imaging, but in the following, we only consider 2D (slicewise) imaging.

The first (although unsatisfactory) model that can be used to reconstruct an MR image from an under-sampled k-space is to simply apply the inverse Fourier transform with the unknown Fourier coefficients replaced by zeros. This method is called zero-filled reconstruction, and we have:

$${\widehat{\mathit{x}}}_{zf}={\mathit{F}}^{-1}\mathit{y}$$
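As an illustration, the forward model of Equation (1) and the zero-filled reconstruction of Equation (2) can be simulated in a few lines of NumPy; the synthetic image, mask layout, and variable names below are ours, not part of any of the benchmarked implementations:

```python
import numpy as np

rng = np.random.default_rng(0)
# A synthetic complex "image" standing in for x; real data would come from fastMRI.
x = rng.standard_normal((64, 64)) + 1j * rng.standard_normal((64, 64))

# Cartesian mask on the phase-encoding direction: a fully sampled central
# (low-frequency) band plus one line out of four elsewhere.
mask = np.zeros(64, dtype=bool)
mask[::4] = True
mask[28:36] = True

kspace = np.fft.fftshift(np.fft.fft2(x))   # F x, with low frequencies at the center
y = kspace * mask[:, None]                 # y = M_Omega F x: unsampled lines are zeroed

x_zf = np.fft.ifft2(np.fft.ifftshift(y))   # zero-filled reconstruction, Equation (2)
```

The zero-filled image is exactly consistent with the measured lines, but the missing lines translate into aliasing artifacts in the image domain.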

The second model we want to introduce makes use of the fact that MR images can be represented in a wavelet basis with only a few non-zero coefficients [4] according to the sparsity principle. The reconstruction is, therefore, done by solving the following optimization problem:
$${\widehat{\mathit{x}}}_{wav}=\underset{\mathit{x}\phantom{\rule{0.166667em}{0ex}}\in {\mathbb{C}}^{n\times n}}{\mathrm{arg}\phantom{\rule{0.166667em}{0ex}}\mathrm{min}}\frac{1}{2}\parallel \mathit{y}-{\mathit{F}}_{\mathsf{\Omega}}{\mathit{x}\parallel}_{2}^{2}+\lambda {\parallel \mathsf{\Psi}\mathit{x}\parallel}_{1}$$

where the notations are the same as in Equation (1), $\lambda $ is a hyper-parameter to be tuned, and $\mathsf{\Psi}$ is a chosen wavelet transform. This problem can be solved iteratively using a primal-dual optimization algorithm like Condat-Vù [22] or the Primal-Dual Hybrid Gradient (PDHG) [23] (also known as the Chambolle-Pock algorithm) or, if the wavelet transform is invertible (i.e., non-redundant or decimated), using a proximal algorithm like the Fast Iterative Shrinkage-Thresholding Algorithm (FISTA) [24] or the Proximal Optimal Gradient Method (POGM) [25]. Since the problem is convex, all these algorithms converge to the same solution, only at different speeds.
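To make the resolution of this problem concrete, here is a minimal FISTA sketch in NumPy where, for brevity, the wavelet transform $\mathsf{\Psi}$ is replaced by the identity (i.e., sparsity is assumed directly in the image domain); this simplification and all names are ours, not the benchmarked code:

```python
import numpy as np

def soft_thresh(z, t):
    """Complex soft-thresholding, the proximity operator of t * ||.||_1."""
    mag = np.abs(z)
    return z * np.maximum(1 - t / np.maximum(mag, 1e-12), 0)

def fista(y, mask, lam=1e-3, n_iter=50):
    """FISTA sketch for min_x 0.5 ||y - M F x||_2^2 + lam ||x||_1,
    i.e., Problem (3) with the wavelet transform Psi replaced by the identity.
    With the orthonormal FFT, the gradient of the data-fitting term is
    F^{-1}(M F x - y) and its Lipschitz constant is 1 (unit step size)."""
    F = lambda im: np.fft.fft2(im, norm="ortho")
    Finv = lambda ks: np.fft.ifft2(ks, norm="ortho")
    x = Finv(y)          # zero-filled initial guess
    z, t = x.copy(), 1.0
    for _ in range(n_iter):
        grad = Finv(mask * F(z) - y)        # gradient step on the smooth term
        x_new = soft_thresh(z - grad, lam)  # proximal step on the l1 term
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = x_new + (t - 1) / t_new * (x_new - x)  # Nesterov momentum step
        x, t = x_new, t_new
    return x
```

With a real wavelet transform, the soft-thresholding would be applied to the wavelet coefficients $\mathsf{\Psi}\mathit{x}$ rather than to the image itself.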

The last model we choose to introduce is the dictionary learning model [26,27]. Its assumption is that the patches an MR image is composed of can be expressed sparsely in a corresponding dictionary. This dictionary can be learned per-image, leading to the following optimization problem:
$$\begin{array}{cccc}\hfill {\widehat{\mathit{x}}}_{dl}=& \underset{\mathit{x},D,{\left\{{\alpha}_{ij}\right\}}_{(i,j)\in I}}{\mathrm{arg}\phantom{\rule{0.166667em}{0ex}}\mathrm{min}}\hfill & \hfill \phantom{\rule{1.em}{0ex}}& \frac{1}{2}\parallel \mathit{y}-{\mathit{F}}_{\mathsf{\Omega}}{\mathit{x}\parallel}_{2}^{2}+\lambda \sum _{(i,j)\in I}{\parallel {R}_{ij}\mathit{x}-D{\alpha}_{ij}\parallel}_{2}^{2}\hfill \\ \hfill \phantom{\rule{1.em}{0ex}}& \mathrm{subject}\phantom{\rule{4.pt}{0ex}}\mathrm{to}\hfill & \hfill \phantom{\rule{1.em}{0ex}}& \forall (i,j)\in I,\phantom{\rule{4.pt}{0ex}}\parallel {\alpha}_{ij}{\parallel}_{0}\le {T}_{0}\hfill \end{array}$$

where the notations are the same as in Equation (3), I is the fixed set of patch locations, D is the dictionary, $\lambda $ and ${T}_{0}$ are hyper-parameters to be set, and ${R}_{ij}$ is the linear operator extracting the patch at location $(i,j)$. This problem is solved in two steps:

- The dictionary learning step, where both the dictionary D and the sparse codes ${\alpha}_{ij}$ are updated alternatively.
- The reconstruction step, where x is updated. Since this subproblem is quadratic, it admits an analytical solution, which amounts to averaging patches and then performing a data consistency in which the sampled frequencies are replaced in the patch-average result.
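The reconstruction step can be sketched as follows, assuming (for brevity, and contrary to the original overlapping formulation) non-overlapping patches, so that the patch averaging reduces to tiling; the denoised patches $D{\alpha}_{ij}$ are taken as given, and all names are illustrative:

```python
import numpy as np

def patch_average(patches, shape, patch_size):
    """Place the denoised patches back into an image. With non-overlapping
    patches (an illustrative simplification), the averaging reduces to tiling;
    overlapping patches would be summed and divided by their multiplicity."""
    img = np.zeros(shape, dtype=patches.dtype)
    p, idx = patch_size, 0
    for i in range(0, shape[0], p):
        for j in range(0, shape[1], p):
            img[i:i + p, j:j + p] = patches[idx]
            idx += 1
    return img

def data_consistency(img, y, mask):
    """Replace the sampled frequencies of the patch-averaged image by the
    measured ones, which is the closed-form solution of the quadratic
    subproblem in the noiseless case."""
    kspace = np.fft.fft2(img, norm="ortho")
    kspace = np.where(mask, y, kspace)
    return np.fft.ifft2(kspace, norm="ortho")
```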

The neural networks introduced here are all derived in a certain way from the classical models introduced before.

What we term single-domain networks are networks that act only in the k-space or only in the image (direct) space. They make use of the fact that we have a pseudo-inverse as in Equation (2). They usually use a U-net-like [28] architecture. This network was originally built for image segmentation but has since been used for a wide variety of image-to-image tasks, mainly as a strong baseline. In [29], a U-net is applied to the under-sampled k-space measurements before the inverse Fourier transform is performed. In [30], a U-net is applied to the zero-filled reconstruction, and its output is corrected with a data consistency step (where sampled values are replaced in the k-space). The network we implemented was, however, vanilla, without this extra data-consistency step. Our implementation also only features the following cascade of numbers of filters: $16,32,64,128$. The original U-net is illustrated in Figure 1, where the number of filters used in each layer is four times what we used.

The second class of networks we introduce, we term cross-domain networks. The key intuitive idea is that they correct the data in both the k-space and the image space alternatively, using the Fourier transform to go from one space to the other. They are derived from the optimization algorithms used to solve the optimization problems introduced before, using the idea of “unrolling” introduced in [31]. An illustration of this class of networks is presented in Figure 2.
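The alternating structure of this class of networks can be sketched as follows, with the trainable sub-networks stubbed out by identity placeholders (an assumption for illustration; in an actual cross-domain network they are CNNs):

```python
import numpy as np

def cross_domain_sketch(y, mask, n_iter=10):
    """Skeleton of an unrolled cross-domain network: alternate corrections in
    the k-space and in the image space, linked by (inverse) Fourier
    transforms, with a data consistency step on the sampled lines."""
    kspace_net = lambda k: k   # placeholder for the learned k-space correction
    image_net = lambda im: im  # placeholder for the learned image correction
    kspace = y.copy()
    for _ in range(n_iter):
        kspace = kspace_net(kspace)
        kspace = np.where(mask, y, kspace)          # data consistency
        image = image_net(np.fft.ifft2(kspace, norm="ortho"))
        kspace = np.fft.fft2(image, norm="ortho")
    return np.abs(image)                            # magnitude output
```

With identity sub-networks this degenerates to the zero-filled reconstruction; the networks' expressiveness comes entirely from replacing the placeholders with learned corrections.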

Because these networks work directly on the input data (and not on a preliminary reconstruction of it), they need to handle complex-valued data. In particular, the classical deep learning frameworks (TensorFlow and PyTorch) do not feature complex convolutions off-the-shelf. The way convolution is performed in the original papers is, therefore, to concatenate the real and imaginary parts of the image (respectively, the k-space), making it a two-channel image, perform the series of convolutions, and have the two-channel output transformed back into a complex image (respectively, k-space).
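This real/imaginary channel handling can be sketched as a pair of helper functions (the names are ours):

```python
import numpy as np

def to_channels(z):
    """Stack real and imaginary parts as two channels: (H, W) -> (H, W, 2),
    so that real-valued convolutions can be applied."""
    return np.stack([z.real, z.imag], axis=-1)

def from_channels(c):
    """Recombine the two channels into a complex array: (H, W, 2) -> (H, W)."""
    return c[..., 0] + 1j * c[..., 1]
```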

The **Cascade-net** [11] is based on the dictionary learning optimization Problem (4). The idea is to replace the dictionary learning step by convolutional neural networks and still keep the data consistency step in the k-space. The optimization algorithm is then unrolled to allow us to perform back-propagation. The authors of [11] show that we can perform back-propagation through the data consistency step (which is linear) and derive the corresponding Jacobian. The parameters used here for the implementation are the same as those in the original paper, except the number of filters, which was decreased from 64 to 48 to fit on a single GPU. This network is illustrated in Figure 3.

The **KIKI-net** [10] is an extension of the Cascade-net where they additionally perform convolutions after the data consistency step in the k-space. The parameters used here for the implementation are the same as those in the original paper. This network is illustrated in Figure 4.

The **Primal-Dual-net (PD-net)**, introduced in [9] and applied to MRI in [32], is based on the wavelet-based denoising Problem (3) and, in particular, on the resolution of the corresponding optimization problem with the PDHG [23] algorithm. Here, the algorithm is unrolled, and the proximity operators (present in the general case of PDHG) are replaced by convolutional neural networks. For our implementation, for a fairer comparison with the Cascade-net and the U-net, we used a ReLU non-linearity instead of a PReLU [33]. This network is illustrated in Figure 5.

The training was done with the same parameters for all the networks. The optimizer used was Adam [34], with a learning rate of ${10}^{-3}$ and the default parameters of Keras (${\beta}_{1}=0.9$ and ${\beta}_{2}=0.999$, the exponential decay rates for the moment estimates). The gradient norm was clipped to one to avoid exploding gradient problems [35]. The batch size was one (i.e., one slice) for every network except the U-net, where the whole volume was used for each step. For all networks, to maximize the efficiency of the training, the slices were selected among the eight innermost slices of the volumes, because the outer slices do not have much signal. No early stopping or learning rate schedule was used (except for KIKI-net, to allow for a stable training, where we used the learning rate schedule proposed by the authors in the supporting information of [10]). The number of epochs was 300 for all networks trained end-to-end. For the iterative training of the KIKI-net, the total number of epochs was 200 (50 per sub-training). Batch normalization was not used; however, in order to have the networks learn more efficiently, the input data was scaled. Both the k-space and the image were multiplied by ${10}^{6}$ for fastMRI and by ${10}^{2}$ for OASIS, because the k-space measurements had values of mean ${10}^{-7}$ (looking separately at the real and imaginary parts) for fastMRI and of mean ${10}^{-3}$ for OASIS. Without this scaling operation, the training proved impossible with bias in the convolutions, and very inefficient without bias in the convolutions.

The under-sampling was done retrospectively using a Cartesian mask described in the data set paper [12] and an acceleration factor of four (i.e., only 25% of the k-space was kept). It contains a fully-sampled region in the lower frequencies, and randomly selects phase encoding lines in the higher frequencies.
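A mask of this kind can be sketched as follows; the center fraction value and the function signature are illustrative assumptions, not the exact fastMRI implementation:

```python
import numpy as np

def cartesian_mask(n_lines, accel=4, center_fraction=0.08, seed=0):
    """Sketch of a fastMRI-style 1D Cartesian mask: a fully sampled
    low-frequency band plus randomly selected phase-encoding lines, for a
    total of roughly n_lines / accel sampled lines."""
    rng = np.random.default_rng(seed)
    mask = np.zeros(n_lines, dtype=bool)
    # Fully sampled central (low-frequency) band.
    n_center = int(round(center_fraction * n_lines))
    center_start = (n_lines - n_center) // 2
    mask[center_start:center_start + n_center] = True
    # Random high-frequency lines to reach the target acceleration factor.
    n_remaining = n_lines // accel - n_center
    candidates = np.flatnonzero(~mask)
    mask[rng.choice(candidates, size=n_remaining, replace=False)] = True
    return mask
```

For an acceleration factor of four on 320 phase-encoding lines, the mask keeps 80 lines in total.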

It is to be noted that different under-sampling strategies exist in CS-MRI. Some of them are listed in [36], for example, spiral or radial. These strategies allow for a higher image quality at the same acceleration factor, or the same image quality at a higher acceleration factor. Typically, the spiral under-sampling scheme was designed to allow fast coronary imaging [37,38]. These under-sampling strategies must take into account kinematic constraints (both physical and safety-based) but should also have variable density [36]. Recent works even try to optimize the under-sampling strategy under these kinematic constraints [39]. Others have tried to learn the under-sampling strategy in a supervised way. In [40], the under-sampling strategy is learned with a greedy optimization. In [41], a gradient descent optimization is used. Some approaches [42,43,44] even try to jointly learn the optimal under-sampling strategy along with the reconstruction.

The data used for this benchmark is the emulated single-coil k-space data of the fastMRI dataset [12], along with the corresponding ground truth images. The acquisition was done with a 15-channel phased array coil, using a Cartesian 2D Turbo Spin Echo (TSE) sequence. The pulse sequences were proton-density weighted, half with fat suppression, half without, some at 3.0 Teslas (T), others at 1.5 T. The sequence parameters were as follows: echo train length 4, matrix size $320\times 320$, in-plane resolution 0.5 × 0.5 mm, slice thickness 3 mm, no gap between slices. In total, there are 973 volumes (34,742 slices) in the training subset and 199 volumes (7135 slices) in the validation subset.

Since the k-spaces are of different sizes, resulting in images of different sizes, the outputs of the cross-domain networks were cropped to a central $320\times 320$ region. For the U-net, the input of the network was cropped.
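The central cropping can be sketched as (function name and default are ours):

```python
import numpy as np

def center_crop(img, crop=(320, 320)):
    """Extract the central crop[0] x crop[1] region of a 2D array."""
    h, w = img.shape
    top = (h - crop[0]) // 2
    left = (w - crop[1]) // 2
    return img[top:top + crop[0], left:left + crop[1]]
```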

The Open Access Series of Imaging Studies (OASIS) brain database [13] includes MRI scans of 1068 participants, yielding 2168 MR sessions. Of these 2168, we select only the 2164 sessions which feature T1-weighted sequences. Of these, 1878 were acquired at 3.0 T, 236 at 1.5 T, and the field strength of the remaining 50 is undisclosed. The slice size is mostly $256\times 256$, sometimes $240\times 256$ (rarely some other size). The number of slices per scan is mostly 176, sometimes 160 (rarely smaller).

The data was then separated into a training and a validation set. The split was participant-based, that is, a participant cannot have a scan in both sets. The split was 90% for the training set and 10% for the validation set. We further reduced the data to make it comparable to fastMRI: 1000 scans randomly selected for the training subset and 200 scans randomly selected for the validation subset.
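A participant-based split can be sketched as follows (the data structure mapping scan ids to participant ids, and the function name, are assumptions for illustration):

```python
import random

def participant_split(scans, val_fraction=0.1, seed=0):
    """Split scans into train/validation sets so that all scans of a given
    participant land in the same set. `scans` maps scan id -> participant id."""
    participants = sorted(set(scans.values()))
    random.Random(seed).shuffle(participants)
    n_val = max(1, int(len(participants) * val_fraction))
    val_participants = set(participants[:n_val])
    train = [s for s, p in scans.items() if p not in val_participants]
    val = [s for s, p in scans.items() if p in val_participants]
    return train, val
```

Splitting by participant rather than by scan avoids leaking a participant's anatomy from the training set into the validation set.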

Contrary to fastMRI, the OASIS data is available only in magnitude and is, therefore, only real-valued. The k-space is computed as the Fourier transform of the magnitude image.

The metrics we used to benchmark the different networks are the following:

- The Peak Signal-to-Noise Ratio (PSNR);
- The Structural SIMilarity index (SSIM) [45];
- The number of trainable parameters in the network;
- The runtime in seconds of the neural network on a single volume.

The PSNR is computed as follows, on whole magnitude volumes:

$$PSNR(x,\widehat{x})=10{log}_{10}\left(\frac{max{\left(x\right)}^{2}}{\frac{1}{n}{\sum}_{i,j,k}{({x}_{i,j,k}-{\widehat{x}}_{i,j,k})}^{2}}\right)$$

where x is the ground truth volume, $\widehat{x}$ is the predicted volume (magnitude image), and n is the total number of points in the ground truth volume (the same as in the predicted volume). Since this metric compares very local differences, it does not necessarily reflect the global visual comparison of the images. The SSIM was introduced in [45] exactly to take more structural differences or similarities between images into account. It is computed as in the original paper, per slice, then averaged over the volume (the range, however, is computed volume-wise):

$$SSIM(x,\widehat{x})=\frac{(2{\mu}_{x}{\mu}_{\widehat{x}}+{c}_{1})(2{\sigma}_{x}{\sigma}_{\widehat{x}}+{c}_{2})(co{v}_{x\widehat{x}}+{c}_{3})}{({\mu}_{x}^{2}+{\mu}_{\widehat{x}}^{2}+{c}_{1})({\sigma}_{x}^{2}+{\sigma}_{\widehat{x}}^{2}+{c}_{2})({\sigma}_{x}{\sigma}_{\widehat{x}}+{c}_{3})}$$

where x is the ground truth slice, $\widehat{x}$ is the predicted slice, ${\mu}_{i}$ is the mean of i, ${\sigma}_{i}^{2}$ is the variance of i, $co{v}_{ij}$ is the covariance between i and j, ${c}_{1}={\left({k}_{1}L\right)}^{2}$, ${c}_{2}={\left({k}_{2}L\right)}^{2}$, ${c}_{3}=\frac{{c}_{2}}{2}$, L is the range of the values of the data (given, because computed over the whole volume), ${k}_{1}=0.01$, and ${k}_{2}=0.03$.
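The PSNR formula translates directly into NumPy (a sketch; the actual evaluation code may differ in details such as the peak value convention):

```python
import numpy as np

def psnr(x, x_hat):
    """PSNR on whole magnitude volumes: peak value taken from the ground
    truth volume, mean squared error over all voxels."""
    mse = np.mean(np.abs(x - x_hat) ** 2)
    return 10 * np.log10(x.max() ** 2 / mse)
```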

While the two aforementioned metrics control the reconstruction quality, it is important to note that this is not the only factor to take into account when designing reconstruction techniques. Because the reconstruction has to happen fast enough for the MR physician to decide whether to re-conduct the exam or not, it is important for the proposed technique to have a reasonable reconstruction speed. For real-time MRI applications or dynamic MRI (e.g., cardiac imaging), it is even more important (for example, in the context of monitoring surgical operations [46]). The runtimes were measured on a computer equipped with a single GPU Quadro P5000 with 16 GB of RAM.

Concurrently, the number of parameters has to stay relatively low to allow the implementation on the different machines with potentially limited memory, which will probably need to have multiple models (for different contrasts, different organs, or different undersampling schemes including different acceleration factors).

The quantitative results in Table 1, Table 2, Table 3 and Table 4 show that the PD-net [9] outperforms its competitors in terms of image quality metrics while also having the smallest number of trainable parameters. It is slightly slower than the Cascade-net [11] though, which can be explained by its higher number of iterations, therefore involving more costly Fourier transform (inverse or direct) operations. These results hold true on the two data sets, fastMRI [12] and OASIS [13]. The only exception is that KIKI-net [10] is slightly better than the U-net [28] on the OASIS data set, but still far from the best performers. We can also note that the standard deviation of the image quality metrics is much higher on the fastMRI data set than on the OASIS data set. This higher standard deviation is explained by the fact that the two contrasts present in the fastMRI dataset, Proton Density with and without Fat Suppression (PD/PDFS), have widely different image metric values. The standard deviations when we compute the metrics for each contrast separately are more in line with the OASIS ones. The range of the image quality metrics is also much higher in the OASIS results.

The qualitative results shown in Figure 6 and Figure 7 confirm the quantitative ones on the image quality aspect. The PD-net [9] is much better at conserving the high-frequency parts of the original image, as can be seen when looking at the reconstruction error, which is quite flat over the whole image.

In this work, we only considered one scheme of under-sampling. However, it should be interesting to see if the performance obtained on one type of under-sampling generalizes to other types of under-sampling, especially if we do a re-gridding step for non-Cartesian under-sampling schemes. On that specific point, the extension of the networks towards non-Cartesian sampling schemes is not easy because the data consistency cannot be performed in the same way, and the measurement space is no longer similar to an image (except if we re-grid). In a recent work [47], some of the authors of the Cascade-net [11] propose a way to extend their approach to the non-Cartesian case, using a re-gridding step. The PD-net [9] also has a straightforward implementation for the non-Cartesian case even without re-gridding, in what is called the learned Primal. In this case, the network in the k-space is just computing the difference (residual) between the current k-space measurements and the initial k-space measurements. Therefore, there are no parameters to learn, which alleviates the problem of how to learn them.

We also only considered a single-coil acquisition setting. As parallel imaging is primarily used in CS-MRI to allow higher image quality [3], it is important to see how these networks will behave in the multi-coil setting. The difficult part in the extension of these works to the multi-coil setting will be to understand how to best involve the sensitivity maps (or even not involve them [48]).

Regarding the networks themselves, the results seem to suggest that for cross-domain networks, the trade-off between a high number of iterations and a richer correction in a given domain (by having deeper networks) is in favor of having more iterations (i.e., alternating more between domains). It is, however, unclear how to best tackle the reconstruction in the k-space, since convolutional networks make a shift-invariance hypothesis, which does not hold in the Fourier space, where the coefficients corresponding to the high frequencies should probably not be treated in the same way as the low frequencies. This leaves room for improvement in the near future.

Conceptualization, Z.R. and P.C.; methodology, Z.R.; software, Z.R.; validation, Z.R., P.C. and J.-L.S.; formal analysis, Z.R.; investigation, Z.R.; resources, P.C.; data curation, Z.R.; writing—original draft preparation, Z.R.; writing—review and editing, P.C. and J.-L.S.; visualization, Z.R.; supervision, P.C. and J.-L.S.; project administration, P.C. and J.-L.S.; funding acquisition, P.C. and J.-L.S. All authors have read and agree to the published version of the manuscript.

This research was funded by the Cross-Disciplinary Program on Numerical Simulation (SILICOSMIC project) of CEA, the French Alternative Energies and Atomic Energy Commission.

We want to thank Jonas Adler, Justin Haldar, and Jo Schlemper for the very useful and kind remarks and answers they gave to our questions about their work.

The authors declare no conflict of interest.

The following abbreviations are used in this manuscript:

| Abbreviation | Meaning |
| --- | --- |
| MRI | Magnetic Resonance Imaging |
| CS-MRI | Compressed Sensing MRI |
| GPU | Graphical Processing Unit |
| ReLU | Rectified Linear Unit |
| PReLU | Parametrized ReLU |
| PSNR | Peak Signal-to-Noise Ratio |
| SSIM | Structural SIMilarity index |

- Ramzi, Z.; Ciuciu, P.; Starck, J.-L. Benchmarking Deep Nets MRI Reconstruction Models on the FastMRI Publicly Available Dataset. In Proceedings of the ISBI 2020—International Symposium on Biomedical Imaging, Iowa City, IA, USA, 3–7 April 2020.
- Smith-Bindman, R.; Miglioretti, D.L.; Larson, E.B. Rising use of diagnostic medical imaging in a large integrated health system. Health Aff. 2008, 27, 1491–1502.
- Roemer, P.B.; Edelstein, W.A.; Hayes, C.E.; Souza, S.P.; Mueller, O.M. The NMR phased array. Magn. Reson. Med. 1990, 16, 192–225.
- Lustig, M.; Donoho, D.; Pauly, J.M. Sparse MRI: The Application of Compressed Sensing for Rapid MR Imaging. Magn. Reson. Med. 2007.
- NHS. NHS website. Available online: https://www.nhs.uk/conditions/mri-scan/what-happens/ (accessed on 4 March 2020).
- AIM Specialty Health. Clinical Appropriateness Guidelines: Advanced Imaging. 2017. Available online: https://www.aimspecialtyhealth.com/PDF/Guidelines/2017/Sept05/AIM_Guidelines.pdf (accessed on 4 March 2020).
- Ramzi, Z.; Ciuciu, P.; Starck, J.-L. Benchmarking proximal methods acceleration enhancements for CS-acquired MR image analysis reconstruction. In Proceedings of the SPARS 2019—Signal Processing with Adaptive Sparse Structured Representations Workshop, Toulouse, France, 1–4 July 2019.
- Zhu, B.; Liu, J.Z.; Cauley, S.F.; Rosen, B.R.; Rosen, M.S. Image reconstruction by domain-transform manifold learning. Nature 2018, 555, 487–492.
- Adler, J.; Öktem, O. Learned Primal-Dual Reconstruction. IEEE Trans. Med. Imaging 2018, 37, 1322–1332.
- Eo, T.; Jun, Y.; Kim, T.; Jang, J.; Lee, H.J.; Hwang, D. KIKI-net: Cross-domain convolutional neural networks for reconstructing undersampled magnetic resonance images. Magn. Reson. Med. 2018, 80, 2188–2201.
- Schlemper, J.; Caballero, J.; Hajnal, J.V.; Price, A.; Rueckert, D. A Deep Cascade of Convolutional Neural Networks for MR Image Reconstruction. IEEE Trans. Med. Imaging 2018, 37, 491–503.
- Zbontar, J.; Knoll, F.; Sriram, A.; Muckley, M.J.; Bruno, M.; Defazio, A.; Parente, M.; Geras, K.J.; Katsnelson, J.; Chandarana, H.; et al. fastMRI: An Open Dataset and Benchmarks for Accelerated MRI. arXiv 2018, arXiv:1811.08839.
- LaMontagne, P.J.; Keefe, S.; Lauren, W.; Xiong, C.; Grant, E.A.; Moulder, K.L.; Morris, J.C.; Benzinger, T.L.; Marcus, D.S. OASIS-3: Longitudinal neuroimaging, clinical, and cognitive dataset for normal aging and Alzheimer’s disease. Alzheimer’s Dementia J. Alzheimer’s Assoc. 2018, 14, P1097.
- Chollet, F. Keras. 2015. Available online: https://keras.io (accessed on 4 March 2020).
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: tensorflow.org (accessed on 4 March 2020).
- Virtue, P.; Stella, X.Y.; Lustig, M. Better than real: Complex-valued neural nets for MRI fingerprinting. In Proceedings of the 2017 IEEE International Conference on Image Processing (ICIP), Beijing, China, 17–20 September 2017; pp. 3953–3957.
- Aggarwal, H.K.; Mani, M.P.; Jacob, M. Multi-Shot Sensitivity-Encoded Diffusion MRI Using Model-Based Deep Learning (MoDL-MUSSELS). In Proceedings of the 2019 IEEE 16th International Symposium on Biomedical Imaging (ISBI 2019), Venice, Italy, 8–11 April 2019; pp. 1541–1544.
- Minh Quan, T.; Nguyen-Duc, T.; Jeong, W.K. Compressed Sensing MRI Reconstruction using a Generative Adversarial Network with a Cyclic Loss. IEEE Trans. Med. Imaging 2018, 37, 1488–1497.
- Yang, Y.; Sun, J.; Li, H.; Xu, Z. Deep ADMM-net for compressive sensing MRI. In Proceedings of the Neural Information Processing Systems, Barcelona, Spain, 5–10 December 2016; pp. 10–18.
- Petersen, R.C.; Aisen, P.; Beckett, L.A.; Donohue, M.; Gamst, A.; Harvey, D.J.; Jack, C.; Jagust, W.; Shaw, L.; Toga, A.; et al. Alzheimer’s disease neuroimaging initiative (ADNI): Clinical characterization. Neurology 2010, 74, 201–209.
- Haldar, J.P. Low-Rank Modeling of Local k-Space Neighborhoods (LORAKS) for Constrained MRI. IEEE Trans. Med. Imaging 2014, 33, 668–681.
- Condat, L. A Primal–Dual Splitting Method for Convex Optimization Involving Lipschitzian, Proximable and Linear Composite Terms. J. Optim. Theory Appl.
**2013**, 158, 460–479. [Google Scholar] [CrossRef] - Chambolle, A.; Pock, T. A First-Order Primal-Dual Algorithm for Convex Problems with Applications to Imaging. J. Math. Imaging Vis.
**2011**, 40, 120–145. [Google Scholar] [CrossRef] - Beck, A.; Teboulle, M. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM J. Imaging Sci.
**2009**, 2, 183–202. [Google Scholar] [CrossRef] - Kim, D.; Fessler, J.A. Adaptive restart of the optimized gradient method for convex optimization. J. Optim. Theory Appl.
**2018**, 178, 240–263. [Google Scholar] [CrossRef] - Ravishankar, S.; Bresler, Y. Magnetic Resonance Image Reconstruction from Highly Undersampled K-Space Data Using Dictionary Learning. IEEE Trans. Med. Imaging
**2011**, 30, 1028–1041. [Google Scholar] [CrossRef] - Caballero, J.; Price, A.N.; Rueckert, D.; Hajnal, J.V. Dictionary learning and time sparsity for dynamic MR data reconstruction. IEEE Trans. Med. Imaging
**2014**, 33, 979–994. [Google Scholar] [CrossRef] - Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. arXiv
**2015**, arXiv:1505.04597v1. [Google Scholar] - Han, Y.; Sunwoo, L.; Chul Ye, J. k-Space Deep Learning for Accelerated MRI. arXiv
**2019**, arXiv:1805.03779v2. [Google Scholar] [CrossRef] - Min Hyun, C.; Pyung Kim, H.; Min Lee, S.; Lee, S.; Keun Seo, J. Deep learning for undersampled MRI reconstruction. arXiv
**2019**, arXiv:1709.02576v3. [Google Scholar] - Gregor, K.; LeCun, Y. Learning Fast Approximations of Sparse Coding. In Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel, 13–15 June 2010. [Google Scholar]
- Cheng, J.; Wang, H.; Ying, L.; Liang, D. Model Learning: Primal Dual Networks for Fast MR imaging. arXiv
**2019**, arXiv:1908.02426. [Google Scholar] - He, K.; Zhang, X.; Ren, S.; Sun, J. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. arXiv
**2015**, arXiv:1502.01852. [Google Scholar] - Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv
**2014**, arXiv:1412.6980. [Google Scholar] - Pascanu, R.; Mikolov, T.; Bengio, Y. On the difficulty of training Recurrent Neural Networks. arXiv
**2012**, arXiv:1211.5063. [Google Scholar] - Chauffert, N.; Ciuciu, P.; Kahn, J.; Weiss, P. Variable density sampling with continuous trajectories. SIAM J. Imaging Sci.
**2014**, 7, 1962–1992. [Google Scholar] [CrossRef] - Irarrazabal, P.; Nishimura, D.G. Fast Three Dimensional Magnetic Resonance Imaging. Magn. Reson. Med.
**1995**, 33, 656–662. [Google Scholar] [CrossRef] - Meyer, C.H.; Hu, B.S.; Nishimura, D.G.; Macovski, A. Fast Spiral Coronary Artery Imaging. Magn. Reson. Med.
**1992**, 28, 202–213. [Google Scholar] [CrossRef] - Lazarus, C.; Weiss, P.; Chauffert, N.; Mauconduit, F.; El Gueddari, L.; Destrieux, C.; Zemmoura, I.; Vignaud, A.; Ciuciu, P. SPARKLING: Variable-density k-space filling curves for accelerated T2*-weighted MRI. Magn. Reson. Med.
**2018**, 1–19. [Google Scholar] [CrossRef] - Sanchez, T.; Gözcü, B.; van Heeswijk, R.B.; Eftekhari, A.; Ilıcak, E.; Çukur, T.; Cevher, V. Scalable learning-based sampling optimization for compressive dynamic MRI. arXiv
**2019**, arXiv:1902.00386. [Google Scholar] - Sherry, F.; Benning, M.; Reyes, J.C.D.l.; Graves, M.J.; Maierhofer, G.; Williams, G.; Schönlieb, C.B.; Ehrhardt, M.J. Learning the sampling pattern for MRI. arXiv
**2019**, arXiv:1906.08754. [Google Scholar] - Aggarwal, H.K.; Jacob, M. J-MoDL: Joint Model-Based Deep Learning for Optimized Sampling and Reconstruction. arXiv
**2019**, arXiv:1911.02945. [Google Scholar] - Wu, Y.; Rosca, M.; Lillicrap, T. Deep compressed sensing. arXiv
**2019**, arXiv:1905.06723. [Google Scholar] - Weiss, T.; Senouf, O.; Vedula, S.; Michailovich, O.; Zibulevsky, M.; Bronstein, A. PILOT: Physics-Informed Learned Optimal Trajectories for Accelerated MRI. arXiv
**2019**, arXiv:1909.05773. [Google Scholar] - Wang, Z.; Bovik, A.C.; Rahim Sheikh, H.; Simoncelli, E.P. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Trans. Image Process.
**2004**, 13, 600–612. [Google Scholar] [CrossRef] - Horvath, K.A.; Li, M.; Mazilu, D.; Guttman, M.A.; McVeigh, E.R. Real-time magnetic resonance imaging guidance for cardiovascular procedures. In Seminars in Thoracic and Cardiovascular Surgery; Elsevier: New York, NY, USA, 2007; Volume 19, pp. 330–335. [Google Scholar]
- Schlemper, J.; Sadegh, S.; Salehi, M.; Kundu, P.; Lazarus, C.; Dyvorne, H.; Rueckert, D.; Sofka, M. Nonuniform Variational Network: Deep Learning for Accelerated Nonuniform MR Image Reconstruction. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Shenzhen, China, 13–17 October 2019. [Google Scholar]
- El Gueddari, L.; Ciuciu, P.; Chouzenoux, E.; Vignaud, A.; Pesquet, J.C. Calibrationless OSCAR-based image reconstruction in compressed sensing parallel MRI. In Proceedings of the 16th IEEE International Symposium on Biomedical Imaging, Venice, Italy, 8–11 April 2019; pp. 1532–1536. [Google Scholar]
- Dragotti, P.L.; Dong, H.; Yang, G.; Guo, Y.; Firmin, D.; Slabaugh, G.; Yu, S.; Keegan, J.; Ye, X.; Liu, F.; et al. DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed Sensing MRI Reconstruction. IEEE Trans. Med. Imaging
**2017**, 37, 1310–1321. [Google Scholar] [CrossRef]

Network | PSNR-mean (std) (dB) | SSIM-mean (std) | # params | Runtime (s) |
---|---|---|---|---|
Zero-filled | 29.61 (5.28) | 0.657 (0.23) | 0 | 0.68 |
KIKI-net | 31.38 (3.02) | 0.712 (0.13) | 1.25M | 8.22 |
U-net | 31.78 (6.53) | 0.720 (0.25) | 482k | 0.61 |
Cascade-net | 31.97 (6.95) | 0.719 (0.27) | 425k | 3.58 |
PD-net | 32.15 (6.90) | 0.729 (0.26) | 318k | 5.55 |

Network | PSNR-mean (std) (dB) | SSIM-mean (std) | # params | Runtime (s) |
---|---|---|---|---|
Zero-filled | 28.44 (2.62) | 0.578 (0.095) | 0 | 0.41 |
KIKI-net | 29.57 (2.64) | 0.6271 (0.10) | 1.25M | 8.88 |
Cascade-net | 29.88 (2.82) | 0.6251 (0.11) | 425k | 3.57 |
U-net | 29.89 (2.74) | 0.6334 (0.10) | 482k | 1.34 |
PD-net | 30.06 (2.82) | 0.6394 (0.10) | 318k | 5.38 |

Network | PSNR-mean (std) (dB) | SSIM-mean (std) | # params | Runtime (s) |
---|---|---|---|---|
Zero-filled | 30.63 (2.1) | 0.727 (0.087) | 0 | 0.52 |
KIKI-net | 32.86 (2.4) | 0.797 (0.082) | 1.25M | 11.83 |
U-net | 33.64 (2.6) | 0.807 (0.084) | 482k | 1.07 |
Cascade-net | 33.98 (2.7) | 0.811 (0.086) | 425k | 4.22 |
PD-net | 34.2 (2.7) | 0.818 (0.084) | 318k | 6.08 |

Network | PSNR-mean (std) (dB) | SSIM-mean (std) | # params | Runtime (s) |
---|---|---|---|---|
Zero-filled | 26.11 (1.45) | 0.672 (0.0307) | 0 | 0.165 |
U-net | 29.8 (1.39) | 0.847 (0.0398) | 482k | 1.202 |
KIKI-net | 30.08 (1.43) | 0.853 (0.0336) | 1.25M | 3.567 |
Cascade-net | 32.0 (1.731) | 0.887 (0.0327) | 425k | 2.234 |
PD-net | 33.22 (1.912) | 0.910 (0.0358) | 318k | 2.758 |
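The PSNR figures reported in the tables above follow the standard definition, 10 log10(data_range² / MSE). As a minimal sketch (not the paper's evaluation code; the helper name `psnr` and the toy arrays are illustrative), the metric can be computed with NumPy as follows:

```python
import numpy as np

def psnr(gt, pred, data_range=None):
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range^2 / MSE)."""
    gt = np.asarray(gt, dtype=np.float64)
    pred = np.asarray(pred, dtype=np.float64)
    if data_range is None:
        # Default to the dynamic range of the reference image
        data_range = gt.max() - gt.min()
    mse = np.mean((gt - pred) ** 2)
    return 10.0 * np.log10(data_range ** 2 / mse)

# Toy check: a uniform error of 0.1 on an image with dynamic range 1
# gives MSE = 0.01, hence PSNR = 10 * log10(1 / 0.01) = 20 dB.
gt = np.array([[0.0, 1.0], [0.0, 1.0]])
pred = gt + 0.1
print(round(psnr(gt, pred), 2))  # 20.0
```

The per-network means and standard deviations in the tables are then obtained by averaging this quantity over all reconstructed slices of the test set.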

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).