1. Introduction
Magnetic resonance imaging (MRI) is a crucial medical imaging technology: it is non-invasive and non-ionizing and provides highly detailed, accurate images of tissues in their natural, living state, which is vital for disease diagnosis and medical research. As an indispensable instrument in both diagnostic medicine and clinical studies, MRI plays an essential role [1,2].
Although MRI offers superior diagnostic capabilities, its lengthy imaging times, compared to other modalities, restrict patient throughput. This challenge has spurred innovations aimed at speeding up the MRI process, with the shared objective of significantly reducing scan duration while maintaining image quality [3,4]. Accelerating data acquisition during MRI scans is a major focus within the MRI and clinical application community. Typically, scanning one sequence of MR images can take 30 min or more, depending on the body part being scanned, which is considerably longer than most other imaging techniques. Moreover, certain groups, such as infants, elderly individuals, and patients with serious diseases who cannot control their body movements, may find it difficult to remain still for the duration of the scan. Prolonged scanning can lead to patient discomfort and may introduce motion artifacts that compromise the quality of the MR images, reducing diagnostic accuracy. Consequently, reducing MRI scan times is crucial for enhancing image quality and patient experience.
MRI scan time is largely determined by the number of phase encoding steps in the frequency domain (k-space); common methods to accelerate the process reduce these steps by skipping phase encoding lines and sampling only partial k-space data. However, this approach can lead to aliasing artifacts because the undersampling violates the Nyquist criterion [5]. MRI reconstruction involves generating high-quality, artifact-free MR images from undersampled k-space data, which are then used for diagnostic and clinical purposes. The challenge lies in solving an inverse problem: recovering an image from partially sampled and noisy k-space data. Compressed sensing (CS) [6] MRI reconstruction and parallel imaging [3,7,8] are effective techniques that address this inverse problem, speeding up MRI scans and reducing artifacts. By allowing undersampling and having the ability to reconstruct high-quality MRI images from undersampled data, CS significantly reduces scan time while offering images that are often comparable to those obtained from fully sampled data.
Traditional MRI reconstruction techniques suffer from several limitations and challenges. While faster MRI scans are desirable to reduce patient discomfort and improve throughput, they entail reduced sampling, which can compromise the integrity of the reconstructed images. In addition to undersampling, certain techniques may be susceptible to noise and other artifacts that can contaminate the images [9,10]. Challenges pertaining to the reconstruction of multi-contrast images have also been discussed in [11,12]. Multi-contrast images are often considered to provide richer and more useful information, as they combine different image modalities and MRI sequences. However, reconstructing these images can be computationally expensive with traditional methods and may involve managing large and complex datasets. Recent studies have proposed numerous deep learning models to address these challenges.
Deep learning has seen extensive applications in image processing tasks [13,14,15,16,17] because of its ability to efficiently manage multi-scale data and learn hierarchical structures effectively, both of which are essential for precise image reconstruction and enhancement. Convolutional neural networks (CNNs) are also extensively utilized in MRI reconstruction due to their proficiency in handling the complex patterns and noise inherent in MRI data [18,19,20,21,22,23,24,25,26,27]. By learning from large datasets, deep learning algorithms can improve the accuracy and speed of reconstructing high-quality images, thus significantly enhancing the diagnostic capabilities of MRI technology. Optimization-inspired deep learning models are particularly well suited to MRI reconstruction: they are designed to iteratively refine their outputs, allowing them to efficiently handle the complex inverse problems associated with MRI reconstruction. Furthermore, deep learning models can effectively incorporate prior knowledge, such as sparsity or smoothness constraints, which is essential for reconstructing high-quality images from undersampled data. Owing to the flexibility available during model design, it is possible to design models that enforce the physics of MRI acquisition, such as the Fourier encoding of spatial information, ensuring that the reconstructed images are consistent with the underlying data.
In recent years, optimization-based algorithm unrolling networks have gained significant attention in the field of MRI reconstruction [27,28,29,30,31,32,33,34,35,36,37]. These algorithms are inspired by classical optimization techniques and are designed to address the unique challenges posed by MRI data. One notable development in this area is the introduction of learnable optimization algorithms (LOAs). LOAs enhance the interpretability of deep learning models by incorporating MR physics, thereby improving both model performance and training efficiency, and their convergence properties can support fast convergence of the reconstruction process and speed up model training [38,39]. This paper explores several key approaches within this framework, including gradient descent and proximal gradient descent algorithm-inspired networks, variational networks, iterative shrinkage-thresholding algorithm (ISTA) networks, and alternating direction method of multipliers (ADMM)-inspired networks. These methods leverage iterative optimization techniques to refine MRI reconstructions, effectively reducing artifacts and enhancing image clarity. Additionally, the integration of diffusion models, such as score-based diffusion models and domain-conditioned diffusion modeling, represents a novel approach that combines deep learning with diffusion processes to tackle undersampling issues more robustly. LOAs, as a subset of physics-driven machine learning methods [40], explicitly incorporate known physics-based forward imaging models into deep learning architectures, ensuring consistency with the k-space measurements during reconstruction. Collectively, these methods provide a robust framework for improving the speed and accuracy of MRI scans, advancing both the theory and application of machine learning in medical imaging. The integration of these sophisticated deep learning techniques with traditional optimization algorithms offers the dual advantage of enhancing diagnostic capability while significantly reducing scan times. By reviewing how these optimization methods are used in conjunction with novel deep learning techniques, we aim to shed light on the capabilities of state-of-the-art MRI reconstruction techniques and the scope for future work in this direction.
This paper is organized as follows:
Section 1 introduces the importance of MRI reconstruction and LOA methods.
Section 2 presents the compressed sensing (CS)-based MRI reconstruction model.
Section 3 provides a detailed overview of various optimization algorithms utilizing deep learning techniques.
Section 4 discusses the current issues and limitations of learnable optimization models.
Section 5 concludes the paper by summarizing the key findings and implications of the study.
2. MRI Reconstruction Model
Parallel imaging methods, such as generalized auto-calibrating partially parallel acquisition (GRAPPA) [4] and ESPIRiT [41], are k-space techniques that focus on manipulating or reconstructing the k-space data before converting them into the image domain using an inverse Fourier transform [42]. These methods utilize coil-by-coil auto-calibration to achieve accurate reconstruction.

On the other hand, compressed sensing (CS) exploits the sparsity of MR images in a specific transform domain (e.g., wavelet or total variation) to reconstruct images. For CS to be effective, it requires incoherent sampling, which spreads the aliasing artifacts in the image domain in a way that makes them easier to remove. CS is primarily applied in the image domain, removing aliasing artifacts by solving a system of equations that relates the image to be reconstructed and the partial k-space data through coil sensitivities. An example of this approach is sensitivity encoding (SENSE) [3].
This paper focuses on CS-based methods and different algorithms for solving the system equations derived from them. The formulation for the MRI reconstruction problem in CS-based methods is described by a regularized variational model as follows:
$$\min_{x}\ \tfrac{1}{2}\,\|Ax - b\|_2^2 + \lambda\, R(x), \qquad (1)$$

where $x \in \mathbb{C}^n$ is the MR image to be reconstructed, consisting of $n$ pixels, and $b \in \mathbb{C}^m$ denotes the corresponding undersampled measurement data in k-space. The data fidelity term $\tfrac12\|Ax - b\|_2^2$ enforces physical consistency between the reconstructed image $x$ and the partial data $b$ measured in k-space. The choice of the regularization operator $R$ plays a critical role in enforcing sparsity or low-rank constraints to stabilize the reconstruction from undersampled k-space data. A common choice is the $\ell_1$ norm, which promotes sparse solutions by penalizing the sum of the absolute values of coefficients in a transformed domain. Alternatively, a low-rank or non-convex regularization may be used depending on the application: low-rank regularization leverages the fact that MR images exhibit low-rank structure in certain matrix forms, while non-convex penalties provide stronger sparsity-promoting properties, closer to the $\ell_0$ norm. The downside is that non-convex regularization problems are harder to solve. A practical example of the advantage of enforcing sparsity in a clinical setting is that, by enforcing sparsity in the wavelet domain, CS-based methods have been found to accelerate MRI acquisitions by 4 to 6 times while maintaining image quality [3].
The regularization operator $R$ enforces sparsity or low-rank constraints on the MRI data, incorporating prior knowledge to guide the reconstruction and prevent overfitting of the data fidelity term. It is important to note that MR images are generally not sparse in their original spatial domain. Therefore, to effectively apply CS-based methods, the image must be transformed into a domain where it exhibits sparsity, such as the wavelet or Fourier domain. The regularization term typically enforces sparsity in this transformed domain, allowing for accurate image reconstruction from undersampled data.
The weight parameter $\lambda$ balances the data fidelity term and the regularization term. The measurement data are typically expressed as $b = Ax + \varepsilon$, with $\varepsilon$ representing the noise encountered during acquisition. The forward measurement encoding matrix $A$ utilized in parallel imaging is defined by

$$A = P F S, \qquad (2)$$

where $S = (S_1, \dots, S_c)$ refers to the sensitivity maps of the $c$ different coils, $F$ represents the 2D discrete Fourier transform, and $P$ is the binary undersampling mask that captures the $m$ sampled data points according to the undersampling pattern.
Figure 1 shows the image reconstruction diagram.
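To make the roles of $S$, $F$, and $P$ concrete, the following minimal NumPy sketch applies the forward operator $A = PFS$ and its adjoint $A^H$; the array shapes, the random sensitivity maps, and the Cartesian line-skipping mask are illustrative assumptions rather than a prescription from any particular method.

```python
import numpy as np

def forward_op(x, smaps, mask):
    """A x = P F S x: coil-weight, 2D FFT, then undersample."""
    coil_imgs = smaps * x[None, ...]                  # S x, shape (c, H, W)
    kspace = np.fft.fft2(coil_imgs, norm="ortho")     # F S x
    return mask[None, ...] * kspace                   # P F S x

def adjoint_op(y, smaps, mask):
    """A^H y: zero-fill, inverse FFT, coil-combine with conjugate maps."""
    coil_imgs = np.fft.ifft2(mask[None, ...] * y, norm="ortho")
    return np.sum(np.conj(smaps) * coil_imgs, axis=0)

# toy example: 8 coils, 128x128 image, keep 1 of every 4 phase-encoding lines
H = W = 128; c = 8
x = np.random.randn(H, W) + 1j * np.random.randn(H, W)
smaps = np.random.randn(c, H, W) + 1j * np.random.randn(c, H, W)
mask = np.zeros((H, W)); mask[::4, :] = 1             # undersampling pattern
b = forward_op(x, smaps, mask)                        # partial k-space data
```

The adjoint pair shown here is the building block used by every unrolled method discussed below, since the data-consistency gradient is $A^H(Ax - b)$.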
Optimization-based reconstruction methods encompass a variety of techniques for solving complex problems such as (1). These include gradient descent methods like steepest descent [43], proximal methods such as ADMM for non-smooth optimization, interior point methods for constrained problems, and Newton-type methods for faster convergence. Other approaches include iterative shrinkage-thresholding algorithms for sparse reconstruction, coordinate descent for large-scale problems, stochastic methods like SGD for machine learning applications, and primal-dual methods that optimize in both spaces simultaneously. Each method offers unique advantages, making it suitable for different types of reconstruction problems based on factors such as problem structure, size, and computational requirements.
3. Optimization-Based Network Unrolling Algorithms for MRI Reconstruction
Deep learning-based models can leverage large datasets to further improve reconstruction performance compared with traditional methods, and they have therefore seen successful applications in clinical fields [18,19,20,21,22,23,24,44,45]. Most existing deep learning-based methods employ end-to-end neural networks that either map partial k-space data directly to reconstructed images [46,47,48,49,50], or map partial k-space data to an estimated fully sampled k-space, such as RAKI [51] and Grappa-Net [52]. By incorporating optimization algorithms into the end-to-end training, both the acquisition scanning time and the image reconstruction time can be drastically reduced while reconstructing high-quality images from undersampled k-space data, allowing faster imaging without compromising the MRI's diagnostic accuracy [51,52,53]. To improve the interpretability of the relation between the topology of the deep model and the reconstruction results, an emerging class of deep learning-based methods known as learnable optimization algorithms (LOAs) has attracted much attention, e.g., [27,28,29,30,31,32,33,34,35,36,37,54,55]. LOAs map existing optimization algorithms to structured networks in which each phase of the network corresponds to one iteration of an optimization algorithm.
The architectures of these networks are modeled after iterative optimization algorithms. They retain the data fidelity term, which describes image formation based on well-established physical principles that are already known and do not need to be relearned. Instead of using manually designed and overly simplified regularization as in classical reconstruction methods, these networks employ deep neural networks for regularization. Typically, these reconstruction networks consist of a few phases, each mimicking one iteration of a traditional optimization-based reconstruction algorithm. The manually designed regularization terms in classical methods are replaced by layers of CNNs, whose parameters are learned during offline training.
For instance, ADMM-Net [54], ISTA-Net+ [56], and the cascade network [19] are applied to single-coil MRI reconstruction, where the encoding matrix reduces to $A = PF$ since the sensitivity map $S$ is the identity. Variational networks (VNs) [18] introduced the gradient descent method with given (pre-calculated) sensitivities $S$. MoDL [57] proposed a recursive network by unrolling the conjugate gradient algorithm with a weight-sharing strategy. Blind-PMRI-Net [58] designed three network blocks to alternately update the multi-channel images, the sensitivity maps, and the reconstructed MR image using an iterative algorithm based on half-quadratic splitting. VS-Net [59] derived a variable splitting optimization method. However, existing methods still lack accurate coil sensitivity maps and proper regularization for the parallel imaging problem. Adler et al. [60] proposed a reconstruction network that unrolls a primal-dual algorithm in which the proximal operators are learnable.
3.1. Gradient Descent Algorithm-Inspired Network
3.1.1. Variational Network
The variational network (VN) [18] solves model (1) using gradient descent:

$$x^{(t+1)} = x^{(t)} - \alpha_t \Big( A^H\big(Ax^{(t)} - b\big) + \nabla R\big(x^{(t)}\big) \Big). \qquad (3)$$

This model was applied to multi-coil MRI reconstruction. The regularization term was defined by the field-of-experts model

$$R(x) = \sum_{i=1}^{k} \big\langle \Phi_i(K_i x),\, \mathbf{1} \big\rangle,$$

where each convolution operator $K_i$ is applied to the MRI data, the functions $\Phi_i$ are non-linear potential functions composed of scalar activation functions, and the inner product with the vector of ones $\mathbf{1}$ sums the filter responses. The sensitivity maps are pre-calculated and used in $A$. The VN algorithm unrolls step (3), with the gradient of the regularizer parameterized by the learnable convolution kernels $K_i$ together with the non-linear activation functions $\Phi_i'$:

$$x^{(t+1)} = x^{(t)} - \alpha_t\, A^H\big(Ax^{(t)} - b\big) - \sum_{i=1}^{k} K_i^\top\, \Phi_i'\big(K_i x^{(t)}\big). \qquad (4)$$
Figure 2 shows the iterative process of the reconstruction algorithm of VN.
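The following PyTorch sketch illustrates one unrolled VN-style phase under simplifying assumptions: the image is stored as a two-channel (real/imaginary) tensor, $A$ and $A^H$ are supplied as callables, and a small CNN stands in for the learned field-of-experts gradient; none of these choices comes from the original VN implementation.

```python
import torch
import torch.nn as nn

class VNPhase(nn.Module):
    """One unrolled phase: x <- x - a * A^H(Ax - b) - grad_R(x)."""
    def __init__(self):
        super().__init__()
        self.step = nn.Parameter(torch.tensor(0.1))    # learnable step size a_t
        # small CNN standing in for the learned regularizer gradient
        self.reg_grad = nn.Sequential(
            nn.Conv2d(2, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 2, 3, padding=1))

    def forward(self, x, b, A, AH):
        dc_grad = AH(A(x) - b)                          # data-consistency gradient
        return x - self.step * dc_grad - self.reg_grad(x)
```

Stacking T such phases, each with its own parameters, yields the unrolled network that is trained end-to-end against fully sampled references.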
Owing to the numerous advantages of CNNs, they can function as implicit regularizers, replacing traditional techniques such as explicit norm-based regularization or dropout layers. CNNs excel at feature extraction and enable weight sharing, which reduces the number of parameters compared to fully connected layers, making the model less susceptible to overfitting. Additionally, the hierarchical learning process of CNNs imposes a structured learning approach, naturally limiting the network's complexity and providing an implicit regularization effect. Learning regularization terms from training data is becoming a popular trend for solving inverse problems, and some methods have been developed via hybrid domain-specific learning models. For instance, E2E-VarNet [61] performs iterative optimization steps in the k-space domain and uses CNNs to learn the gradient of the regularization term in the image domain within each cascade:

$$k^{(t+1)} = k^{(t)} - \eta_t\, M\big(k^{(t)} - \tilde{k}\big) + F\,\mathcal{E}\Big(\mathrm{CNN}\big(\mathcal{R}\big(F^{-1} k^{(t)}\big)\big)\Big),$$

where $k^{(t)}$ is the current k-space estimate, $\tilde{k}$ is the measured partial k-space data, $M$ is the sampling mask, CNN is a convolutional network applied to the complex image, and $\mathcal{E}$ and $\mathcal{R}$ are the expand and reduce operators that take care of the coil sensitivity maps, respectively. Recurrent VarNet [62] utilizes recurrent neural networks (RNNs) to learn the image-refinement model, with the coil sensitivity maps estimated and refined during the training phase. It has strong performance in refining image quality and sensitivity maps, particularly in multi-coil MRI, and is also effective on long-term sequence data, but it is computationally intensive. Additionally, other variations of VNs [61,63] have been developed. E2E-VarNet [61] employs hybrid domain learning, combining k-space and image domain approaches through iterative optimization steps using gradient descent. While this method delivers excellent performance in reconstructing undersampled MRI data in both domains, it comes with a complex training process that demands substantial computational resources, and the high computational requirements for training and deployment remain a significant challenge. These works demonstrate the potential of combining variational methods with deep learning for solving complex inverse problems in medical imaging.
3.1.2. Denoising Model-Based Regularizations
A group of optimization models employs denoising model-based regularizations [57,64,65]. These models support the reconstruction of high-quality images by iteratively refining the image while reducing noise and preserving important image details; they balance the reconstruction between fitting the observed data and adhering to learned priors about noise and artifact patterns. The model-based deep learning (MoDL) [57] framework incorporates a CNN-based regularization prior within a model-based reconstruction scheme. It unrolls an alternating recursive algorithm to solve a variational model whose learnable regularization term is designed to estimate the noise and aliasing artifacts:

$$\min_x\ \|Ax - b\|_2^2 + \lambda\,\big\|x - \mathcal{D}_w(x)\big\|_2^2. \qquad (6)$$

In Equation (6), the non-linear denoising operator $\mathcal{D}_w$, parameterized by the learnable variables $w$, is trained to eliminate noise and artifacts from the image. The regularization term ensures that the reconstructed image $x$ closely approximates the denoised version provided by $\mathcal{D}_w$. This regularized optimization model is solved using the normal equations, iterating the following steps:

$$x_{k+1} = \big(A^H A + \lambda I\big)^{-1}\big(A^H b + \lambda z_k\big), \qquad (7a)$$
$$z_{k+1} = \mathcal{D}_w\big(x_{k+1}\big). \qquad (7b)$$

When dealing with single-channel MRI data, step (7a) has an analytical solution that can be expressed as a data consistency (DC) layer. For multi-coil MRI data, (7a) is solved using a conjugate gradient optimization algorithm.

Figure 3 shows the iterative process of the reconstruction algorithm of MoDL.
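A minimal sketch of the MoDL alternation in (7a) and (7b) is given below, assuming real-valued (two-channel) image tensors, callables for $A$ and $A^H$, and an arbitrary denoiser module; the plain conjugate gradient routine, the value of $\lambda$, and the number of unrolls are illustrative.

```python
import torch

def conjugate_gradient(AHA, rhs, x0, n_iter=10):
    """Solve (A^H A + lam I) x = rhs with plain CG on real-valued tensors."""
    x = x0.clone()
    r = rhs - AHA(x)
    p, rs_old = r.clone(), torch.sum(r * r)
    for _ in range(n_iter):
        Ap = AHA(p)
        alpha = rs_old / torch.sum(p * Ap)
        x, r = x + alpha * p, r - alpha * Ap
        rs_new = torch.sum(r * r)
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x

def modl_recon(b, A, AH, denoiser, lam=0.05, n_unrolls=8):
    x = AH(b)                                          # zero-filled initialization
    for _ in range(n_unrolls):
        z = denoiser(x)                                # denoising step, Eq. (7b)
        rhs = AH(b) + lam * z                          # normal equations, Eq. (7a)
        x = conjugate_gradient(lambda v: AH(A(v)) + lam * v, rhs, x)
    return x
```

Note that the same denoiser weights are reused across unrolls, which is the weight-sharing strategy MoDL uses to keep the parameter count low.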
3.2. Proximal Gradient Descent Algorithm-Inspired Networks
Solving inverse problems using proximal gradient descent has been largely explored and successfully applied in medical imaging reconstruction [27,39,66,67,68,69,70,71,72,73,74,75,76].
Applying a proximal gradient descent algorithm to approximate a (local) minimizer of (1) is an iterative process: the first step is a gradient descent step that enforces data consistency, and the second step applies a proximal operator to obtain the updated image. The proximal gradient descent algorithm iterates the following steps:

$$r^{(t)} = x^{(t)} - \rho_t\, A^H\big(Ax^{(t)} - b\big), \qquad (9a)$$
$$x^{(t+1)} = \mathrm{prox}_{\rho_t R}\big(r^{(t)}\big), \qquad (9b)$$

where $\rho_t$ is the step size and $\mathrm{prox}_{\rho R}$ is the proximal operator of $R$ defined by

$$\mathrm{prox}_{\rho R}(r) = \arg\min_{x}\ \tfrac{1}{2}\,\|x - r\|_2^2 + \rho\, R(x). \qquad (10)$$

The gradient update step (9a) is straightforward to compute and fully utilizes the relationship between the partial k-space data $b$ and the image $x$ to be reconstructed, as derived from MRI physics. The second step involves the proximal operation for the regularization $R$, which is equivalent to finding the maximum a posteriori solution of a Gaussian denoising problem at a noise level determined by $\rho$ [77,78]. Thus, the proximal operator can be interpreted as a Gaussian denoiser. However, because the proximal operator of $R$ in the objective function (1) does not admit a closed-form solution, a CNN can be used as a substitute. Constructing the network with residual learning [66,71,76] avoids the vanishing gradient problem. This approach allows the CNN to effectively approximate the proximal operator and facilitate the optimization process.
Mardani et al. [66] introduced a recurrent neural network (RNN) architecture enhanced by residual learning to learn the proximal operator more effectively. This learnable proximal mapping effectively functions as a denoiser, progressively eliminating aliasing artifacts from the input image.
The step size plays a crucial role in determining the convergence and performance of the proximal gradient descent algorithm. LOAs usually apply a learnable step size, which adds a layer of adaptability to the optimization process, allowing the algorithm to adjust dynamically to the specific characteristics of the data and the model at each iteration. Traditional fixed step sizes may be either too small, leading to slow convergence, or too large, potentially causing the algorithm to overshoot and oscillate. In contrast, a learnable step size can adjust itself based on the gradient's magnitude and direction, promoting more stable and faster convergence. By fine-tuning the step size during training, the algorithm can navigate the loss landscape more efficiently, avoiding regions where a fixed step size might struggle. Since the performance of the proximal gradient descent algorithm in MRI reconstruction is closely tied to how well it minimizes the objective function, the flexibility of a learnable step size helps strike a balance between convergence speed and reconstruction accuracy, leading to better overall performance. It also allows the model to adapt to different noise levels and data inconsistencies, which are common in MRI tasks, further improving the robustness of the reconstruction.
3.2.1. Iterative Shrinkage-Thresholding Algorithm (ISTA) Network
ISTA-Net+ [56] formulates the regularizer as the $\ell_1$ norm of a non-linear transform $\mathcal{F}$, i.e., $R(x) = \lambda\,\|\mathcal{F}(x)\|_1$. The proximal gradient descent updates (9) become:

$$r^{(t)} = x^{(t)} - \rho\, A^H\big(Ax^{(t)} - b\big), \qquad (11a)$$
$$x^{(t+1)} = \arg\min_x\ \tfrac12\,\|x - r^{(t)}\|_2^2 + \lambda\,\|\mathcal{F}(x)\|_1. \qquad (11b)$$

The proximal step (11b) can be parameterized as an implicit residual update step due to the lack of a closed-form solution:

$$x^{(t+1)} = r^{(t)} + \mathcal{G}\big(r^{(t)}\big), \qquad (12)$$

where $\mathcal{G}$ is a deep neural network with residual learning that approximates the proximal point. Using the mean value theorem, ISTA-Net+ derives an approximation theorem,

$$\big\|\mathcal{F}(x) - \mathcal{F}(r)\big\|_2^2 \approx \alpha\,\|x - r\|_2^2, \qquad (13)$$

with $\alpha$ a scalar determined by $\mathcal{F}$. Thus, the proximal update step (11b) can be written in the transform domain as

$$x^{(t+1)} = \arg\min_x\ \tfrac12\,\big\|\mathcal{F}(x) - \mathcal{F}(r^{(t)})\big\|_2^2 + \alpha\lambda\,\|\mathcal{F}(x)\|_1.$$

Assuming $\mathcal{F}$ is orthogonal and invertible, ISTA-Net+ provides the following closed-form solution:

$$x^{(t+1)} = \tilde{\mathcal{F}}\Big(\mathrm{soft}\big(\mathcal{F}(r^{(t)}),\ \theta\big)\Big), \qquad (14)$$

where $\tilde{\mathcal{F}}$ is the inverse transform of $\mathcal{F}$ and $\mathrm{soft}(\cdot, \theta)$ is the soft shrinkage operator with threshold vector $\theta = \alpha\lambda$. Thus, Equation (12) reduces to the explicit form given in (14), which we summarize together with (11a) in the following scheme:

$$r^{(t)} = x^{(t)} - \rho\, A^H\big(Ax^{(t)} - b\big), \qquad (15a)$$
$$x^{(t+1)} = \tilde{\mathcal{F}}\Big(\mathrm{soft}\big(\mathcal{F}(r^{(t)}),\ \theta\big)\Big). \qquad (15b)$$

The deep network $\tilde{\mathcal{F}}$ uses a structure symmetric to $\mathcal{F}$ and is trained separately to enhance the capacity of the network. The initial input $x^{(0)}$ is set to the zero-filled reconstruction $A^H b$.
The loss function was designed in two parts. The first part is the discrepancy loss

$$\mathcal{L}_{\mathrm{disc}} = \sum_{i=1}^{N_d} \big\|x_i^{(T)} - x_i^*\big\|_2^2,$$

which measures the squared discrepancy between the reference image $x_i^*$ and the reconstructed image $x_i^{(T)}$ from the last iteration, summed over the $N_d$ training samples. The second part of the loss function enforces the consistency of $\tilde{\mathcal{F}}$ and $\mathcal{F}$:

$$\mathcal{L}_{\mathrm{cons}} = \sum_{i=1}^{N_d} \big\|\tilde{\mathcal{F}}\big(\mathcal{F}(x_i)\big) - x_i\big\|_2^2.$$

This loss aims to ensure that $\tilde{\mathcal{F}} \circ \mathcal{F}$ approximates an identity mapping. The training process minimizes the total loss

$$\mathcal{L} = \mathcal{L}_{\mathrm{disc}} + \gamma\,\mathcal{L}_{\mathrm{cons}},$$

where $\gamma$ is a balancing parameter.
Figure 4 shows the iterative process of the reconstruction algorithm of ISTA-Net+.
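A hedged sketch of one ISTA-Net+-style phase, combining the gradient step (11a) with the explicit update (14), is shown below; the small convolutional transforms standing in for $\mathcal{F}$ and its symmetric counterpart $\tilde{\mathcal{F}}$, and the initial values of the learnable step size and threshold, are illustrative choices rather than the published architecture.

```python
import torch
import torch.nn as nn

def soft_threshold(v, theta):
    """Soft shrinkage: sign(v) * max(|v| - theta, 0)."""
    return torch.sign(v) * torch.clamp(torch.abs(v) - theta, min=0.0)

class ISTAPhase(nn.Module):
    """One phase: r = x - rho*A^H(Ax - b), then x = F~(soft(F(r), theta))."""
    def __init__(self, ch=2):
        super().__init__()
        self.rho = nn.Parameter(torch.tensor(0.5))     # learnable step size
        self.theta = nn.Parameter(torch.tensor(0.01))  # learnable threshold
        self.F = nn.Sequential(                        # learned sparsifying transform
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, 3, padding=1))
        self.F_tilde = nn.Sequential(                  # symmetric inverse transform
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1))

    def forward(self, x, b, A, AH):
        r = x - self.rho * AH(A(x) - b)                # gradient step (11a)
        return self.F_tilde(soft_threshold(self.F(r), self.theta))  # update (14)
```

During training, the consistency penalty on $\tilde{\mathcal{F}}(\mathcal{F}(x)) - x$ would be added to the loss so that the two transforms behave approximately as inverses.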
The derivation of the approximation theorem in ISTA-Net+ relies on several critical assumptions, including the orthogonality and invertibility of the non-linear transform $\mathcal{F}$. The orthogonality assumption means that the transform satisfies $\mathcal{F}^\top \mathcal{F} = I$, where $I$ is the identity matrix. The invertibility assumption ensures that there exists a unique inverse transform, so that no information is lost and the original signal can be completely reconstructed after transformation. The final assumption is that $\mathcal{F}$ satisfies the conditions necessary for applying the mean value theorem. In practical terms, the orthogonality assumption allows for straightforward updates leading to a closed-form solution; invertibility is an important requirement ensuring that the transformation does not discard critical information; and the mean value theorem provides a convenient way to approximate the relationship between the transformed variables.
3.2.2. Parallel MRI Network
Parallel MRI networks [71] leverage residual learning to learn the proximal mapping and tackle model (9), bypassing the requirement for pre-calculated coil sensitivity maps in the encoding matrix (2). Similar to the model for joint reconstruction and synthesis [70], parallel MRI networks pose the MRI reconstruction as a bi-level optimization problem. The variable $x = (x_1, \dots, x_c)$ denotes the multi-coil MRI data scanned from $c$ coil elements, with $x_i$ corresponding to the $i$-th coil for $i = 1, \dots, c$. The study constructs a model that incorporates dual regularization terms applied to both the image space and the k-space. The channel-combination operator aims to learn a combination of the multi-coil MRI data that integrates the prior information among multiple channels; the image-domain regularizer then extracts information from the channel-integrated image, while a second regularizer obtains prior information from the k-space data. The upper-level optimization (20a) is the network training process, whose loss function is defined as the discrepancy between the learned reconstruction and the ground truth. The lower-level optimization (20b) is solved by a redefined iterative algorithm consisting of the updates (22a)-(22c).
The proximal operator can be understood as a Gaussian denoiser. Nevertheless, the proximal operator in the objective function (22b) lacks a closed-form solution, necessitating the use of a CNN as a substitute. This network is designed as a residual learning network with one component in the image domain and one in the k-space domain, and the updates (22a)-(22c) are implemented in the learned scheme (23a)-(23c). The image-domain CNN utilizes the channel-integration operator and operates with shared weights across iterations, effectively learning spatial features; however, it may erroneously enhance oscillatory artifacts as real features. In the k-space denoising step (23c), the k-space network focuses on the low-frequency data, helping to remove high-frequency artifacts and restore image structure. Alternating between (23b) and (23c) in their respective domains balances their strengths and weaknesses, improving overall performance.
Figure 5 shows the iterative process of the reconstruction algorithm of the parallel MRI network.
The construction of the iterative algorithm (23) is inspired by cross-domain reconstruction methods [79,80,81,82], which aim to improve the quality and speed of MRI image reconstruction by leveraging information from multiple domains, typically the image domain and the k-space (frequency) domain. This approach integrates data and knowledge across different domains to enhance the accuracy and efficiency of reconstructing high-quality images from undersampled MRI data.
Learning sensitivity maps also represents a related group of reconstruction techniques [61,83,84,85,86]. Deep J-Sense builds on the joint SENSE model, which solves for both the image and the coil sensitivity maps simultaneously during MRI reconstruction. This work incorporates unrolled alternating optimization to iteratively refine both the magnetization (image) kernel and the sensitivity maps. The optimization problem can be written as

$$\min_{x,\, s}\ \tfrac12\,\big\|A(x, s) - b\big\|_2^2 + \lambda_x R_x(x) + \lambda_s R_s(s),$$

where $R_x$ and $R_s$ are regularization terms for the image and the sensitivity maps, respectively, and $\lambda_x$ and $\lambda_s$ are regularization parameters. This optimization problem is solved iteratively by alternating between updating the image $x$ and the sensitivity maps $s$: a sensitivity-map update step alternates with an image update step, and end-to-end deep neural networks are applied to refine the sensitivity maps and the image, respectively.
The parallel MRI network architecture has also been generalized to quantitative MRI (qMRI) reconstruction problems under a self-supervised learning framework. The next subsection introduces a similar learnable optimization algorithm for the qMRI reconstruction network.
3.2.3. Self-Supervised Approaches for Quantitative MRI Reconstruction
Recent advancements in quantitative MRI (qMRI) reconstruction have seen the incorporation of self-supervised learning techniques, which have proven effective in reconstructing quantitative maps from undersampled k-space MRI data. Among these, the RELAX-MORE [76] algorithm introduced a self-supervised learning framework that optimizes reconstruction by leveraging the underlying physics of MRI signal acquisition. RELAX-MORE unrolls a proximal gradient algorithm for qMRI reconstruction. Its loss function minimizes the discrepancy between the reconstructed, retrospectively undersampled k-space data and the acquired undersampled k-space data. Once thoroughly trained, the model can be adapted to other testing data through transfer learning.
The qMRI reconstruction model aims to reconstruct the quantitative parameters $\theta$, and this problem can be formulated as a bi-level optimization model in which the upper-level problem (27a) optimizes the learnable parameters for network training, while the lower-level problem (27b) optimizes the quantitative MR parameters. The model $M$ represents the MR signal function that maps the set of quantitative parameters $\theta$ to the MRI data; the loss function in (27a) is addressed through a self-supervised learning network, from which the parameter estimate is derived. RELAX-MORE uses $T_1$ mapping obtained through the variable flip angle (vFA) method [87] as an example. The MR signal model $M$ is described by the following spoiled gradient-echo equation:

$$S(\alpha_i) = M_0\,\frac{\sin(\alpha_i)\,\big(1 - e^{-T_R/T_1}\big)}{1 - \cos(\alpha_i)\, e^{-T_R/T_1}}, \qquad (28)$$

where $\alpha_i$ represents the flip angle for $i = 1, \dots, N$, with $N$ the total number of flip angles acquired, $T_R$ is the repetition time, and $T_1$ and $M_0$ are the spin-lattice relaxation time map and the proton density map, respectively. Therefore, the parameter set needed for reconstruction is $\theta = \{T_1, M_0\}$.
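For concreteness, the following NumPy sketch evaluates the spoiled gradient-echo vFA signal model (28) on toy $T_1$ and $M_0$ maps; the repetition time, flip angles, and map values are illustrative assumptions.

```python
import numpy as np

def vfa_signal(m0, t1, flip_angles_deg, tr=0.005):
    """vFA signal model: S_i = M0 sin(a_i)(1 - E1) / (1 - cos(a_i) E1),
    with E1 = exp(-TR/T1); returns one signal image per flip angle."""
    e1 = np.exp(-tr / t1)                              # E1 map, shape (H, W)
    signals = []
    for a in np.deg2rad(flip_angles_deg):
        signals.append(m0 * np.sin(a) * (1 - e1) / (1 - np.cos(a) * e1))
    return np.stack(signals)                           # shape (N, H, W)

# toy maps: T1 in seconds, proton density in arbitrary units
t1 = np.full((64, 64), 1.2)
m0 = np.ones((64, 64))
s = vfa_signal(m0, t1, flip_angles_deg=[2, 5, 10, 15])
```

In the unrolled reconstruction, this forward signal model (followed by the encoding operator) replaces $Ax$ in the data-fidelity gradient, so the optimization variables are the parameter maps themselves.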
Similar to the parallel MRI network [71], RELAX-MORE employs a proximal gradient descent algorithm to address the lower-level problem (27b), with a residual network structure designed to learn the proximal mapping. Below is the unrolled learnable Algorithm 1 for solving (27b):
Algorithm 1 Learnable proximal gradient descent algorithm
Input: initial estimate of the quantitative parameters.
1: for t = 1 to T do
2:   for n = 1 to N do
3:     gradient descent update on the data-fidelity term
4:     proximal update computed by the learnable residual network
5:   end for
6: end for
Output: the reconstructed quantitative parameter maps and the corresponding images.
Step 4 implements the residual network structure to learn the proximal operator associated with the regularization. The learnable operators have a symmetric network structure, and the soft thresholding operator uses a learnable threshold parameter.
3.3. Alternating Direction Method of Multipliers (ADMM) Algorithm-Inspired Networks
ADMM introduces an auxiliary variable $z$ to solve the following constrained reformulation of the problem:

$$\min_{x,\, z}\ \tfrac12\,\|Ax - b\|_2^2 + \lambda\, R(z) \quad \text{subject to} \quad z = Dx,$$

where we can consider $D$ to be a gradient operator that reinforces the sparsity of the MRI data, as in the total variation norm.

The first step is to form the augmented Lagrangian for the given problem. The augmented Lagrangian combines the objective function with a penalty term for the constraint violation and a Lagrange multiplier:

$$\mathcal{L}_{\rho}(x, z, u) = \tfrac12\,\|Ax - b\|_2^2 + \lambda\, R(z) + \langle u,\, Dx - z \rangle + \tfrac{\rho}{2}\,\|Dx - z\|_2^2,$$

where $u$ is the Lagrange multiplier and $\rho$ is a penalty parameter.

The ADMM algorithm solves the above problem by alternating among the following three subproblems:

$$x^{k+1} = \arg\min_x\ \mathcal{L}_{\rho}\big(x, z^{k}, u^{k}\big), \qquad (31a)$$
$$z^{k+1} = \arg\min_z\ \mathcal{L}_{\rho}\big(x^{k+1}, z, u^{k}\big), \qquad (31b)$$
$$u^{k+1} = u^{k} + \rho\,\big(Dx^{k+1} - z^{k+1}\big). \qquad (31c)$$

We can obtain the closed-form solutions for each subproblem: the $x$-subproblem reduces to the normal equations $\big(A^H A + \rho D^H D\big)x = A^H b + D^H\big(\rho z^k - u^k\big)$ (32a), the $z$-subproblem is the proximal operator of $R$ evaluated at $Dx^{k+1} + u^k/\rho$ (32b), and the multiplier update (32c) is as in (31c). If the regularizer is the $\ell_1$ norm, $R(z) = \|z\|_1$, then (32b) reduces to the soft shrinkage operation

$$z^{k+1} = \mathrm{soft}\big(Dx^{k+1} + u^{k}/\rho,\ \lambda/\rho\big),$$

with the soft shrinkage threshold $\lambda/\rho$.
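The three ADMM subproblems (32a)-(32c) for an $\ell_1$ regularizer can be sketched as below; the $x$-subproblem solver is left abstract (in practice a CG or FFT-based solve), and the parameter values are illustrative.

```python
import numpy as np

def soft(v, tau):
    """Soft shrinkage: sign(v) * max(|v| - tau, 0)."""
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def admm_l1(AH_b, D, DT, solve_x, lam=0.01, rho=1.0, n_iter=50):
    """ADMM for min_x 0.5||Ax - b||^2 + lam||Dx||_1 via the splitting z = Dx.
    `solve_x(rhs)` must solve the x-subproblem (A^H A + rho D^T D) x = rhs."""
    x = AH_b.copy()
    z = D(x)
    u = np.zeros_like(z)                     # scaled dual variable (multiplier / rho)
    for _ in range(n_iter):
        x = solve_x(AH_b + rho * DT(z - u))  # x-update (32a)
        Dx = D(x)
        z = soft(Dx + u, lam / rho)          # z-update: soft shrinkage (32b)
        u = u + Dx - z                       # multiplier update (32c)
    return x
```

The scaled dual form used here folds the multiplier into $u = \text{(multiplier)}/\rho$, which keeps the shrinkage threshold at the familiar $\lambda/\rho$.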
In ADMM, tuning the penalty and regularization parameters plays a crucial role in the algorithm's performance, particularly the regularization weight $\lambda$ and the augmented Lagrangian penalty parameter $\rho$. These parameters significantly influence the convergence and stability of the algorithm. A well-chosen $\rho$ can accelerate convergence by appropriately weighting the constraint in the problem. However, if $\rho$ is too large, the algorithm may become unstable or oscillate between iterations, since $\rho$ controls the weight of the augmented Lagrangian term and thus how strictly the constraint is enforced; conversely, if $\rho$ is too small, convergence can be excessively slow. Tuning these parameters typically relies on heuristic methods or cross-validation. One approach to mitigating the challenges of parameter selection is to adaptively update $\rho$ during the iterations based on the observed primal and dual residuals. This strategy can help balance the trade-off between convergence speed and stability, but it adds complexity to the algorithm.
Gradient descent is simple and widely used but it suffers from slow convergence, particularly in ill-conditioned problems. ADMM can converge faster for certain problem classes, especially when the objective function can be split into simpler subproblems. Proximal gradient descent extends gradient descent by incorporating proximal operators, making it suitable for optimization problems with non-smooth terms. ADMM can be seen as a more general approach that also leverages proximal operators but in a way that allows for better decomposition of the problem.
ADMM-Net
ADMM-Net [54] reformulates these three steps through an augmented Lagrangian method. This approach leverages a cell-based architecture to optimize neural network operations for MRI image reconstruction. The network is structured into several layers, each corresponding to a specific operation in the ADMM optimization process. The gradient operator $D$ is parameterized as a deep neural network, and the scalar penalty and balancing parameters are learnable, trained and updated through the unrolled ADMM iterations:
The Reconstruction layer (33a) uses a combination of Fourier and penalized transformations to reconstruct images from undersampled k-space data, incorporating learnable penalty parameters and filter matrices. The Convolution layer
applies a convolution operation, transforming the reconstructed image to enhance feature representation, using distinct, learnable filter matrices to increase the network’s capacity. The Non-linear Transform layer (33b) replaces traditional regularization functions with a learnable piecewise linear function, allowing for more flexible and data-driven transformations that go beyond simple thresholding. Finally, the Multiplier Update layer (33c) updates the Lagrangian multipliers, essential for integrating constraints into the learning process, with learnable parameters to adaptively refine the model’s accuracy. Each layer’s output is methodically fed into the next, ensuring a coherent flow that mimics the iterative ADMM process, thus systematically refining the image reconstruction quality with each pass through the network.
Figure 6 shows the iterative process of the reconstruction algorithm of ADMM-Net.
3.4. Primal-Dual Hybrid Gradient (PDHG) Algorithm-Inspired Networks
Several networks [60,88] have been developed that are inspired by the PDHG algorithm. PDHG can be used to solve model (1) by iterating the following steps:

$$y^{(t+1)} = \mathrm{prox}_{\sigma H^{*}}\big(y^{(t)} + \sigma A\,\bar{x}^{(t)}\big),$$
$$x^{(t+1)} = \mathrm{prox}_{\tau R}\big(x^{(t)} - \tau A^{H} y^{(t+1)}\big),$$
$$\bar{x}^{(t+1)} = x^{(t+1)} + \theta\,\big(x^{(t+1)} - x^{(t)}\big),$$

where $H$ is the data fidelity function defined as $H(Ax) = \tfrac12\,\|Ax - b\|_2^2$ in model (1) and $H^{*}$ is its convex conjugate. In the learned primal-dual model [60], the traditional proximal operators are replaced with learned parametric operators. These operators are not necessarily proximal but are instead learned from training data, aiming to act similarly to denoising operators, such as block matching 3D (BM3D). The proximal operators can be parameterized as deep networks $\Gamma_\theta$ and $\Lambda_\phi$ for the primal and dual updates, respectively. PD-Net [88] iterates the following two steps:

$$y^{(t+1)} = \Lambda_\phi\big(y^{(t)},\ A x^{(t)},\ b\big),$$
$$x^{(t+1)} = \Gamma_\theta\big(x^{(t)},\ A^{H} y^{(t+1)}\big).$$
The key innovation here is that these operators—both for the primal and dual variables—are parameterized and optimized during training, allowing the model to learn optimal operation strategies directly from the data. The learned primal-dual model operates under a fixed number of iterations, which serves as a stopping criterion. This approach ensures that the computation time remains predictable and manageable, which is beneficial for time-sensitive applications. The algorithm maintains its structure but becomes more adaptive to specific data characteristics through the learning process, potentially enhancing reconstruction quality over traditional methods.
Figure 7 shows the iterative process of the reconstruction algorithm of PD-Net.
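A minimal sketch of a PD-Net-style learned primal-dual loop is given below; the use of a single convolution per learned operator and a two-channel complex representation are simplifying assumptions, whereas the published networks use deeper blocks.

```python
import torch
import torch.nn as nn

class PDNet(nn.Module):
    """Learned primal-dual sketch: learned operators replace both proximals."""
    def __init__(self, n_iter=10):
        super().__init__()
        self.n_iter = n_iter
        # dual operator sees (y, A x, b): 3 tensors x 2 channels = 6 in-channels
        self.dual_ops = nn.ModuleList(
            [nn.Conv2d(6, 2, 3, padding=1) for _ in range(n_iter)])
        # primal operator sees (x, A^H y): 2 tensors x 2 channels = 4 in-channels
        self.primal_ops = nn.ModuleList(
            [nn.Conv2d(4, 2, 3, padding=1) for _ in range(n_iter)])

    def forward(self, b, A, AH, x0):
        x, y = x0, torch.zeros_like(b)
        for dual_op, primal_op in zip(self.dual_ops, self.primal_ops):
            y = y + dual_op(torch.cat([y, A(x), b], dim=1))     # dual update
            x = x + primal_op(torch.cat([x, AH(y)], dim=1))     # primal update
        return x
```

Because each iteration has its own operator, the network runs for a fixed number of iterations, which keeps the computation time predictable, as noted above.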
3.5. Diffusion Models Meet Gradient Descent for MRI Reconstruction
A notable development for MRI reconstruction using diffusion models is the emergence of denoising diffusion probabilistic models (DDPMs) [89,90,91,92]. In DDPMs, the forward diffusion process systematically introduces noise into the input data, incrementally increasing the noise level until the data becomes pure Gaussian noise; this progressively distorts the original data distribution. Conversely, the reverse diffusion process, or denoising process, aims to reconstruct the original data structure from this noise-altered distribution. DDPMs employ a Markov chain mechanism to transition from the noise-modified distribution back to the original data distribution via learned Gaussian transitions. The learnable Gaussian noise can be parameterized by a U-Net architecture containing transformer/attention layers [93] at each diffusion step. Transformer models have demonstrated promising performance in aggregating global information and can be effectively utilized for image denoising tasks.
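The forward noising and noise-matching objective at the heart of DDPM training can be sketched in a few lines; the denoising network `eps_model` and the cumulative noise schedule `alphas_cumprod` are assumed to be supplied by the user.

```python
import torch

def ddpm_training_step(x0, eps_model, alphas_cumprod):
    """One DDPM training step: add noise at a random timestep, then regress it.
    `eps_model(x_t, t)` is the denoising U-Net predicting the added noise."""
    bsz = x0.shape[0]
    t = torch.randint(0, len(alphas_cumprod), (bsz,), device=x0.device)
    a_bar = alphas_cumprod[t].view(bsz, 1, 1, 1)       # cumulative schedule at t
    eps = torch.randn_like(x0)                         # Gaussian noise to learn
    x_t = torch.sqrt(a_bar) * x0 + torch.sqrt(1 - a_bar) * eps  # forward process
    return torch.mean((eps_model(x_t, t) - eps) ** 2)  # noise-matching loss
```

This is the discrepancy between the estimated and actually added noise discussed later in the loss-function comparison (Section 4.1).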
DDPMs represent an innovative class of generative models renowned for their ability to master complex data distributions and achieve high-quality sample generation without relying on adversarial training methods. Their adoption in MRI reconstruction has been met with growing enthusiasm due to their robustness, particularly in handling distribution shifts. Recent studies exploring DDPM-based MRI reconstruction [89,90,91,92] demonstrate how these models can generate noisy MR images that are progressively denoised through iterative learning at each diffusion step, either unconditionally or conditionally. This approach has shown promise in enhancing MRI workflows by speeding up the imaging process, improving patient comfort, and boosting clinical throughput. Moreover, the model in [90] has proven exceptionally robust, producing high-quality images even when faced with data that deviates from the training set (distribution shifts) [94], accommodating various patient anatomies and conditions, and thus enhancing the accuracy and reliability of diagnostic imaging.
3.5.1. Score-Based Diffusion Model
Chung et al. [89] presented an innovative framework that applies score-based diffusion models to solve inverse imaging problems. The core technique involves training a continuous, time-dependent score function using denoising score matching. The score function of the data distribution $p(x)$ is defined as the gradient of the log density with respect to the input data, $\nabla_x \log p(x)$, and is estimated by a time-conditional deep neural network $s_\theta(x, t)$. The score model is trained by minimizing the following loss function on the magnitude images:

$$\min_\theta\ \mathbb{E}_{t,\, x(0),\, x(t)}\Big[\lambda(t)\,\big\| s_\theta\big(x(t), t\big) - \nabla_{x(t)} \log p\big(x(t) \mid x(0)\big) \big\|_2^2 \Big],$$

where $\lambda(t)$ is a positive weighting function.
During inference, the model alternates between a numerical stochastic differential equation (SDE) solver and a data consistency step to reconstruct images. The method is agnostic to the subsampling pattern, enabling its application across various sampling schemes and body parts not included in the training data. Chung et al. [89] proposed the following Algorithm 2 based on the predictor-corrector (PC) sampling algorithm [95]. For $i = N-1, \dots, 0$, the predictor performs one reverse-diffusion step, $x_i = x_{i+1} + \big(\sigma_{i+1}^2 - \sigma_i^2\big)\, s_\theta\big(x_{i+1}, \sigma_{i+1}\big) + \sqrt{\sigma_{i+1}^2 - \sigma_i^2}\; z$ with $z \sim \mathcal{N}(0, I)$, and the corrector performs Langevin dynamics, $x_i \leftarrow x_i + \epsilon_i\, s_\theta\big(x_i, \sigma_i\big) + \sqrt{2\epsilon_i}\; z$, with step size $\epsilon_i$. Incorporating a gradient descent step to enforce data consistency after the predictor and the corrector, we obtain the following Algorithm 2:
Algorithm 2 Score-based sampling for MRI reconstruction [89]
Input: learned score function, step sizes, noise schedule, and MRI encoding matrix; initialize from pure Gaussian noise.
1: for i = N − 1 to 0 do
2:   predictor: one reverse-diffusion update using the learned score
3:   data consistency: gradient descent step on the k-space measurements
4:   for j = 1 to M do
5:     corrector: one Langevin update using the learned score
6:     data consistency: gradient descent step on the k-space measurements
7:   end for
8: end for
Output: the reconstructed image.
The above Algorithm 2 can be varied into two other algorithms. One is a parallel implementation over each coil image for parallel MRI reconstruction. The other considers the correlation among the multiple coil images and eliminates the calculation of sensitivity maps; the final magnitude image is then obtained using the sum-of-root-sum-of-squares of each coil. The results outperform conventional deep learning methods, including U-Net [96], DuDoRNet [97], and E2E-VarNet [61], which require complex k-space data.
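A hedged sketch of predictor-corrector sampling interleaved with gradient-descent data-consistency steps is shown below, in the spirit of Algorithm 2; the VE-SDE-style updates, the unit data-consistency step size, and the schedules are illustrative assumptions.

```python
import torch

def pc_sample_with_dc(score, b, A, AH, sigmas, n_corrector=1, eps=1e-5):
    """Predictor-corrector sampling with a data-consistency step after each
    update. `sigmas` is a decreasing 1-D tensor of noise levels."""
    x = sigmas[0] * torch.randn_like(AH(b))            # start from pure noise
    for i in range(len(sigmas) - 1):
        s_cur, s_next = sigmas[i], sigmas[i + 1]
        # predictor: one reverse-diffusion step
        z = torch.randn_like(x)
        x = x + (s_cur**2 - s_next**2) * score(x, s_cur)
        x = x + (s_cur**2 - s_next**2).sqrt() * z
        x = x - AH(A(x) - b)                           # data consistency (GD step)
        for _ in range(n_corrector):                   # corrector: Langevin steps
            z = torch.randn_like(x)
            x = x + eps * score(x, s_cur) + (2 * eps) ** 0.5 * z
            x = x - AH(A(x) - b)                       # data consistency
    return x
```

The alternation between generative updates and physics-based data-consistency projections is what keeps the sampled image tied to the measured k-space.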
3.5.2. Domain-Conditioned Diffusion Modeling
Domain-conditioned diffusion modeling (DiMo) [73] and quantitative DiMo were developed for application to both accelerated multi-coil MRI and quantitative MRI (qMRI), using diffusion models conditioned on the native data domain rather than the image domain. The method incorporates gradient descent optimization within the diffusion steps to improve feature learning and denoising effectiveness. The training and sampling algorithms for MRI reconstruction are illustrated in Algorithms 3 and 4.
Algorithm 3 Training process of static DiMo
Input: fully scanned k-space, undersampling mask, partially scanned k-space, and coil sensitivities; initialize the diffusion sample.
1: forward diffusion step at a randomly drawn timestep
2: data consistency (DC) step
3: for n = 1 to N do
4:   gradient descent (GD) refinement step
5: end for
6: take a gradient descent update step on the network parameters; repeat until convergence
Output: the trained denoising network.
Algorithm 4 Sampling process of static DiMo
Input: undersampling mask, partially scanned k-space, and coil sensitivities.
1: for t = T to 1 do
2:   draw Gaussian noise if t > 1; otherwise set the noise term to zero
3:   reverse diffusion (denoising) update
4:   data consistency (DC) step
5:   for n = 1 to N do
6:     gradient descent (GD) refinement step
7:   end for
8: end for
Output: the reconstructed k-space and image.
In the training and sampling algorithms, the data consistency (DC) step is used to enforce the physical consistency between the partial k-space data and the reconstructed images. The gradient descent (GD) step is then applied iteratively within the diffusion steps to further refine the k-space data. The matrix involved contains only ones, so GD here solves the optimization problem (1) without the regularization term.
Static DiMo performed a qualitative comparison with both the image-domain diffusion model presented by Chung et al. [89] and the k-space-domain diffusion model MC-DDPM [98], and it demonstrated robust performance in reconstruction quality and noise reduction. Quantitative DiMo reconstructs the quantitative parameter maps from the partial k-space data. The MR signal model defined in (28) maps the MR parameter maps to the static MRI, adding one more function inside the reconstruction model. The MR parameter maps are denoted as $\theta = (\theta_1, \dots, \theta_N)$, where $\theta_i$ indicates each MR parameter and $N$ is the total number of MR parameters to be estimated. The training and sampling diffusion model for quantitative DiMo follows the same steps as static DiMo, except that the signal model must be inverted when calculating the quantitative maps from the updated k-space. Quantitative DiMo showed the least error compared to other methods [99,100,101]. This is likely achieved by integrating the unrolled gradient descent algorithm with the diffusion denoising network, prioritizing noise suppression without compromising the fidelity and clarity of the underlying tissue structure.
3.6. Bi-Level Optimization Model for Multi-Task Learning
In recent years, a large body of work has introduced customized variational models for multi-task learning using bi-level optimization, for example, the joint task of reconstruction and multi-contrast synthesis [70] and a meta-learning model for MRI reconstruction [39].
Consider a clinical situation where a patient with suspected multiple sclerosis (MS) undergoes an MRI scan. The clinician requires both T1-weighted and T2-weighted images to assess different tissue characteristics—T1 for detailed anatomical structures and T2 for detecting lesions with high water content, commonly associated with MS. Typically, reconstructing these sequences separately is time-consuming due to the high-resolution requirements. Concurrently, there is a need to synthesize a FLAIR (Fluid-Attenuated Inversion Recovery) image, which is crucial for suppressing cerebrospinal fluid signals and enhancing the visibility of lesions. Given the urgency of diagnosis and the need for comprehensive imaging, a joint reconstruction and synthesis model can be employed. This model not only reconstructs the T1 and T2 sequences simultaneously, thereby reducing overall scan and processing time, but also synthesizes the FLAIR image directly from the acquired data. By leveraging the complementary information between the T1 and T2 sequences, the model ensures that the synthesized FLAIR image is consistent and reliable, providing the clinician with a complete set of images for accurate diagnosis without the need for additional scans.
A provable learnable optimization algorithm [70] was introduced for joint MRI reconstruction and synthesis. Consider the partial k-space data $b_1, b_2$ of the source modalities (e.g., T1 and T2) obtained in the measurement domain. The goal is to (i) reconstruct the corresponding images $x_1, x_2$ and (ii) synthesize the image $x_3$ of the missing modality (e.g., FLAIR) without having its k-space data. The designed optimization model combines three terms. The first term ensures the fidelity of the reconstructed images $x_1, x_2$ to their partial k-space data $b_1, b_2$. The second term regularizes the images using modality-specific feature extraction operators. The third term enforces consistency between the synthesized image $x_3$ and the correlation relationship learned from the reconstructed images. To synthesize the image $x_3$ using $x_1$ and $x_2$, a feature-fusion operator is employed, which learns the mapping from the features of $x_1$ and $x_2$ to the image $x_3$.
The forward learnable optimization algorithm is presented in Algorithm 5. In step 3, the algorithm performs a gradient descent update with a step size found via line search while keeping the smoothing parameter fixed. In step 4, the reduction of the smoothing parameter ensures that any subsequence meeting the reduction criterion has an accumulation point that is a Clarke stationary point of the problem.
Algorithm 5 Learnable descent algorithm for joint MRI reconstruction and synthesis
1: Input: initial estimate, step size range, initial smoothing parameter, maximum number of iterations T, and tolerance.
2: for t = 0, 1, ..., T − 1 do
3:   gradient descent update, with the step size determined by a line search until a sufficient-descent condition holds
4:   if the reduction criterion is met, reduce the smoothing parameter; otherwise, keep it unchanged
5:   if the tolerance is reached, terminate and go to step 6
6: end for and output the final iterates.
Algorithm 5 is the forward MRI reconstruction algorithm. Let $\Theta$ denote the collection of all learnable parameters and let $\lambda$ denote a parameter that balances the reconstruction part and the image synthesis part. The backward network training algorithm is designed to solve a bilevel optimization problem over $\Theta$ and $\lambda$.
The following Algorithm 6 was proposed for training the model for joint reconstruction and synthesis.
Algorithm 6 Mini-batch alternating direction penalty algorithm
1: Input: training data, validation data, and tolerance; initialize the network parameters, the balancing parameter, and the penalty weights.
2: while the outer tolerance is not met do
3:   Sample a training batch and a validation batch.
4:   while the inner tolerance is not met do
5:     for the inner loop do
6:       Update the network parameters.
7:     end for
8:     Update the balancing parameter.
9:   end while and update the penalty weights.
10: end while and output the learned parameters.
The training Algorithm 6 updates the network parameters $\Theta$ and the balancing parameter $\lambda$ by minimizing the loss function (40) on both the validation and training data sets for each task.
Figure 8 shows the overall network architecture of iterating forward optimization Algorithm 5 with backward training Algorithm 6.
4. Discussion
4.1. Evaluation Metrics and Loss Functions
To quantitatively compare the performance of MRI reconstruction algorithms, evaluation metrics are essential. These metrics need to be standardized in order to provide a consistent comparison between algorithms proposed in different studies and to provide insight into their effectiveness. In this paper, we introduce a few evaluation metrics used in previous studies. A common evaluation metric used across multiple studies is the root mean squared error (RMSE), which is the square root of the average squared difference between the predicted and actual images. The RMSE between the reconstruction $x$ and the ground truth $x^*$ is defined as

$$\mathrm{RMSE}(x, x^*) = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \big|x_i - x_i^*\big|^2}.$$
Studies have also discussed the peak signal-to-noise ratio (PSNR), which expresses the ratio between the maximum possible pixel value and the power of the noise; higher PSNR values indicate better reconstruction quality. Typically, a PSNR above 30 dB is considered acceptable for reconstructions. The PSNR is defined as

$$\mathrm{PSNR}(x, x^*) = 10 \log_{10}\!\left(\frac{\max\big(|x^*|\big)^2}{\tfrac{1}{N}\,\|x - x^*\|_2^2}\right),$$

where $N$ is the total number of pixels in the magnitude of the ground truth.
Finally, the structural similarity index (SSIM) evaluates the quality of the model predictions by comparing luminance, contrast, and structure between the reconstructed and original images. A higher SSIM indicates closer structural similarity; the SSIM ranges between −1 and 1, with 1 indicating perfect structural similarity. The following equation calculates the SSIM between the reconstruction $x$ and the reference $x^*$:

$$\mathrm{SSIM}(x, x^*) = \frac{\big(2\mu_x \mu_{x^*} + c_1\big)\big(2\sigma_{x x^*} + c_2\big)}{\big(\mu_x^2 + \mu_{x^*}^2 + c_1\big)\big(\sigma_x^2 + \sigma_{x^*}^2 + c_2\big)},$$

where $\mu_x, \mu_{x^*}$ are local means of pixel intensity, $\sigma_x, \sigma_{x^*}$ denote the standard deviations, and $\sigma_{x x^*}$ is the covariance between $x$ and $x^*$. The constants $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ avoid a zero denominator, where $L$ is the largest pixel value of the magnitude images.
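The three metrics can be computed directly from magnitude images, as in the NumPy sketch below; for simplicity, the SSIM here is evaluated over a single global window, whereas practical implementations average it over local windows.

```python
import numpy as np

def rmse(x, ref):
    return np.sqrt(np.mean(np.abs(x - ref) ** 2))

def psnr(x, ref):
    mse = np.mean(np.abs(x - ref) ** 2)
    peak = np.max(np.abs(ref))                         # largest magnitude pixel
    return 10 * np.log10(peak ** 2 / mse)

def ssim_global(x, ref, k1=0.01, k2=0.03):
    """Single-window SSIM on real-valued magnitude images."""
    L = np.max(np.abs(ref))
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mu_x, mu_r = x.mean(), ref.mean()
    var_x, var_r = x.var(), ref.var()
    cov = np.mean((x - mu_x) * (ref - mu_r))
    return ((2 * mu_x * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_x ** 2 + mu_r ** 2 + c1) * (var_x + var_r + c2))
```

Reporting all three together is common practice, since PSNR is sensitive to pixel-wise error while SSIM better reflects perceived structural fidelity.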
Loss functions play a crucial role in training the networks and optimizing model performance. Studies proposing various networks have also introduced novel loss functions that may enhance image reconstruction. We list the loss functions used in several key representative methods. Most LOAs are supervised learning methods that require ground truth in the loss function; RELAX-MORE [76] is a subject-specific self-supervised learning method that does not require fully sampled k-space data. Table 1 compares the different loss functions.
It is worth noting that the goal of diffusion models is to learn the Gaussian noise added in each diffusion step: the network learns to remove the noise added in the forward process and to match the target distribution. The loss function therefore measures the discrepancy between the noise estimated at each step and the noise actually added during training.
4.2. Comparing Learnable Optimization Algorithms (LOAs) with Traditional Optimization Methods
LOAs have emerged as an effective alternative to traditional optimization methods. Despite their advantages, LOAs are computationally expensive, especially during the training phase, and they need a substantial amount of training data to generalize well across optimization tasks. Furthermore, owing to the requirement for large-scale training, the training process can also be quite slow, especially for high-dimensional problems. Regarding predictive performance and accuracy, LOAs often outperform traditional optimization methods on complex non-convex problems with irregular landscapes. Focusing on MRI reconstruction, LOAs have shown impressive results in recovering high-quality MRI images from undersampled k-space data. Additionally, studies have demonstrated that LOAs can better handle noise and artifacts, which are common in clinical MRI scans. Another advantage is that, once trained, LOAs can reconstruct MRI images much more quickly than traditional methods, which makes them attractive for real-time applications. However, as indicated earlier, this inference-time efficiency comes at the cost of increased training time.
4.3. Strengths, Weakness, and Performance of the Reviewed Algorithms in Different Scenarios
In this paper, several algorithms focusing on MRI reconstruction have been discussed. To fully understand the limitations of existing algorithms and the need for future research, it is vital to perform a comparative analysis of the algorithms covered in this paper. Table 2 provides a detailed comparison of several well-known LOA-inspired methods.
4.4. Selection of Acquisition Parameters
The selection of acquisition parameters is a critical aspect of MRI reconstruction, influencing the efficiency and accuracy of deep learning models. The architecture of deep reconstruction networks and efficient numerical methods play a pivotal role in this selection. In LOAs, one must determine the appropriate number of iterations $T$ and the initial step size for gradient descent to ensure that the reconstruction network converges to a local optimum of problem (1). Convergence to a local optimum is essential for producing high-quality reconstructed images. The required number of iterations and the step size depend on the specific application task and on whether the step size is learnable or fixed in the gradient descent-based reconstruction algorithm. Proper tuning of these parameters is crucial for optimizing the performance of the reconstruction network.
4.5. Theoretical Convergence and Practical Considerations
While unrolling-based deep learning methods are derived from numerical algorithms with convergence guarantees, these guarantees do not always extend to the unrolled methods due to their dynamic nature and the direct replacement of functions by neural networks. Theoretical convergence is compromised, and only a few works have analyzed the convergence behavior of unrolling-based methods in theory. Notable studies include [38,39,70], which provide insights into the theoretical convergence of these methods. For example, reference [70] proved that if the iterates satisfy the stopping criterion, then the generated subsequence has at least one accumulation point, and every accumulation point is a Clarke stationary point of problem (1). Understanding the convergence properties of unrolled networks is crucial for ensuring the reliability and robustness of MRI reconstruction algorithms. Future research should focus on establishing stronger theoretical foundations and convergence guarantees for unrolling-based deep learning methods. This includes developing new theoretical frameworks that can account for the dynamic and adaptive nature of these models, as well as creating more rigorous validation protocols.
A significant application of deep learning in clinical MRI is its use in accelerating image acquisition, making it possible to acquire images up to 10 times faster than conventional methods without compromising diagnostic quality [24,94,103]. Artifact reduction is a critical challenge in MRI, where motion artifacts or metal implants can significantly degrade image quality. Deep learning models have shown scalability across different MRI modalities and anatomical regions, which is crucial for their widespread adoption in clinical practice. The ability to handle large datasets and perform real-time processing is also vital for integrating these models into routine workflows. For deep learning models to be truly effective in clinical settings, they must generalize well across diverse patient populations, imaging protocols, and MRI scanners. This requires robust training on large, heterogeneous datasets and careful validation across multiple sites and clinical conditions.
4.6. Limitations of the Existing Deep Learning Approaches
Despite the advancements in reconstructing MRI images through deep learning methods, several practical challenges remain. Most deep learning approaches focus on designing end-to-end networks that are independent of the intrinsic physical characteristics of MRI, leading to sub-optimal performance. Deep learning methods are also often criticized for their lack of mathematical interpretability, being seen as "black boxes". Acquiring and processing the large, high-quality datasets needed to train deep learning models can be difficult, especially when dealing with diverse patient populations and varying imaging conditions, and networks trained on scarce data are prone to overfitting. Additionally, both the training and inference of deep learning models may require substantial computational resources, which may act as a barrier for certain medical institutions; there is hence often a trade-off between cost and time. Ensuring that a model generalizes well across different MRI scans and clinical settings is also essential for the widespread adoption of deep learning techniques. Finally, technologies used in clinical settings may need to be validated, transparent, and fully interpretable in order to ensure that clinicians trust the decision-making capabilities of the algorithms. Future studies may focus on addressing the above-mentioned challenges, which would accelerate the adoption of deep learning methods and advance the field of medical imaging.
4.7. Computational Burden, Memory Consumption, and Inference Time
Deep learning-based MRI reconstruction methods, particularly those involving unrolled optimization algorithms, demand significant computational resources. The training process involves substantial GPU memory consumption to store intermediate results and their corresponding gradients. This high memory requirement, coupled with potentially long training times, arises from the need to repeatedly apply the forward and adjoint operators during training.
Diffusion models, in particular, require long inference times due to the pre-scheduled denoising steps involved in the sampling process, as the toy sampling loop below illustrates. Additionally, training diffusion models is time-consuming and memory-intensive, largely because of the self-attention modules incorporated in the denoising network at each diffusion step.
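To make the inference cost concrete, here is a minimal, hypothetical DDPM-style reverse loop; `eps_model` (a noise predictor) and the `betas` schedule are placeholders, not any specific published model. The key point is that one network evaluation is required per scheduled step, so a typical 1000-step schedule costs 1000 forward passes, versus a single pass for a feed-forward reconstruction network.

```python
import torch

@torch.no_grad()
def ddpm_sample(eps_model, shape, betas):
    """Toy DDPM ancestral sampling loop: one eps_model call per step."""
    alphas = 1.0 - betas
    abar = torch.cumprod(alphas, dim=0)  # cumulative product of alphas
    x = torch.randn(shape)               # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps = eps_model(x, t)            # predict the noise at step t
        # posterior mean of x_{t-1} given x_t and the predicted noise
        x = (x - (1 - alphas[t]) / torch.sqrt(1 - abar[t]) * eps) \
            / torch.sqrt(alphas[t])
        if t > 0:                        # add scheduled noise except at t=0
            x += torch.sqrt(betas[t]) * torch.randn_like(x)
    return x
```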
LOAs such as ISTA-Net and PD-Net unroll iterative optimization procedures, with each phase of the network trained independently without parameter sharing. As the number of unrolled iterations increases, so do the computational and memory burdens, making these methods more resource-intensive. Network design must therefore weigh limitations such as long inference times, model complexity, memory consumption, and computational burden against the advantages of these methods. Addressing these challenges will be key to improving the feasibility and efficiency of deep-learning approaches to MRI reconstruction.
To address these challenges, techniques such as pruning, quantization, and knowledge distillation can reduce model size and memory footprint without significantly sacrificing performance. Gradient checkpointing, in which only a subset of activations is stored during the forward pass and the rest are recomputed during the backward pass, can significantly reduce memory usage during training (see the sketch below). Splitting the training process across multiple GPUs, or across different nodes in a cluster, can further manage memory constraints by distributing the workload. Advances in hardware acceleration, such as specialized AI chips and tensor processing units (TPUs), could also enhance the performance of deep learning-based MRI reconstruction.
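As an illustration, the following PyTorch sketch applies gradient checkpointing to a toy unrolled network. The `Phase` and `UnrolledNet` modules and the simplified image-space fidelity term are assumptions made for the example, not a published architecture; a real MRI model would use the forward and adjoint Fourier operators in the fidelity gradient.

```python
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint

class Phase(nn.Module):
    """One unrolled iteration: fidelity gradient step + learned regularizer."""
    def __init__(self, ch=2):  # 2 channels: real and imaginary parts
        super().__init__()
        self.reg = nn.Sequential(
            nn.Conv2d(ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, ch, 3, padding=1))
        self.step = nn.Parameter(torch.tensor(0.1))  # learned step size

    def forward(self, x, x0):
        # simplified image-space fidelity gradient (x - x0) for illustration
        return x - self.step * (x - x0) - self.reg(x)

class UnrolledNet(nn.Module):
    """Unrolled network without parameter sharing across phases."""
    def __init__(self, n_phases=10):
        super().__init__()
        self.phases = nn.ModuleList(Phase() for _ in range(n_phases))

    def forward(self, x0):
        x = x0
        for phase in self.phases:
            # recompute this phase's activations in the backward pass
            # instead of storing them, trading compute for memory
            x = checkpoint(phase, x, x0, use_reentrant=False)
        return x
```

Checkpointing trades roughly one extra forward pass per phase for not storing intermediate activations, which is often a worthwhile exchange for deeply unrolled networks.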
4.8. Other Related Reconstruction Methods
Federated learning (FL) for MRI reconstruction [104,105,106] is an innovative approach that enables multiple institutions to collaboratively train a deep-learning model without sharing patient data, thus preserving privacy. This method is particularly valuable in medical imaging, where data privacy and security are paramount and where institutions may hold diverse datasets collected under different conditions, such as varying sensors, disease types, and acquisition protocols. For example, the FL-MR framework [105] trains local models at multiple institutions: each local model computes a reconstruction loss and updates its parameters by gradient descent, the updated parameters are sent to a central server where they are averaged to update the global model, and the updated global model is redistributed to the local institutions for further training, iterating until convergence. A minimal sketch of one such communication round follows.
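The sketch below shows a FedAvg-style round under stated assumptions: equal-weight averaging across institutions and institution-private PyTorch data loaders. The function and variable names are illustrative, not the actual FL-MR code.

```python
import copy
import torch

def federated_round(global_model, local_loaders, local_epochs=1, lr=1e-4):
    """One FedAvg-style communication round for reconstruction models."""
    local_states = []
    for loader in local_loaders:            # one loader per institution
        local = copy.deepcopy(global_model)  # start from the global weights
        opt = torch.optim.Adam(local.parameters(), lr=lr)
        for _ in range(local_epochs):
            for undersampled, target in loader:  # institution-private data
                opt.zero_grad()
                loss = torch.nn.functional.l1_loss(local(undersampled), target)
                loss.backward()
                opt.step()
        local_states.append(local.state_dict())

    # average parameters across institutions (equal weights assumed)
    avg = {k: torch.stack([s[k].float() for s in local_states]).mean(0)
           for k in local_states[0]}
    global_model.load_state_dict(avg)        # redistribute this model next
    return global_model
```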
Plug-and-play (PnP) methods [78,107,108,109] integrate state-of-the-art denoising algorithms as priors in the image reconstruction process. The PnP approach decouples image modeling (denoising) from forward modeling (data acquisition), which is particularly advantageous in MRI, where the forward model can vary significantly between scans. For example, proximal-based PnP methods [78] leverage the ADMM framework to separate the regularization term (which encodes prior knowledge of the image) from the data-fidelity term; in these methods, a denoising algorithm replaces the proximal operator that would ordinarily appear in the optimization. Gradient-based PnP methods [108] instead combine gradient-descent-type algorithms, such as the fast iterative shrinkage-thresholding algorithm (FISTA), with denoisers: rather than solving the proximal update exactly, a gradient step handles the data-fidelity term, followed by a denoising step that acts as the regularizer (a loop of this kind is sketched below). This category emphasizes computational efficiency, since gradient steps are generally cheaper than proximal updates. The consensus equilibrium (CE) framework [110,111,112,113] provides a theoretical understanding of PnP methods: it interprets the denoiser used in a PnP algorithm as the solution of an equilibrium equation rather than as the exact minimizer of a cost function, which helps address convergence questions, particularly when the denoiser does not correspond to any known regularizer.
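As a concrete illustration, here is a minimal gradient-based PnP loop in NumPy, assuming a single-coil Cartesian model with sampling mask M and unitary FFT; the identity `denoiser` is a placeholder for a trained CNN or a classical denoiser, and the setup data are synthetic.

```python
import numpy as np

def pnp_gradient_descent(y, forward, adjoint, denoiser, step=1.0, iters=50):
    """Gradient-based PnP: a gradient step on the data-fidelity term
    0.5 * ||A x - y||^2, followed by a plug-in denoising step that plays
    the role of the regularizer."""
    x = adjoint(y)  # zero-filled initialization
    for _ in range(iters):
        grad = adjoint(forward(x) - y)  # gradient of the fidelity term
        x = denoiser(x - step * grad)   # denoiser acts as the prior
    return x

# toy single-coil Cartesian setup with ~30% sampled k-space
mask = np.random.rand(256, 256) < 0.3
forward = lambda x: mask * np.fft.fft2(x, norm="ortho")
adjoint = lambda k: np.fft.ifft2(mask * k, norm="ortho").real
denoiser = lambda x: x                       # swap in a trained CNN here
y = forward(np.random.rand(256, 256))        # synthetic measurements
x_hat = pnp_gradient_descent(y, forward, adjoint, denoiser)
```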
Blind-PMRI-Net [58] alternates between updating the image and the coil sensitivity maps to solve multi-channel MRI problems. It employs a half-quadratic splitting approach, resulting in a complex network design that requires careful balancing of the two updates to maintain stability. The method is particularly effective in multi-coil scenarios where sensitivity maps are not known a priori, making it well suited to complex imaging tasks, although its intricate structure can make optimization challenging. A structural sketch of the alternation appears below.
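The NumPy sketch below shows only the alternating structure, taking gradient steps on the multi-coil fidelity term with respect to the image and then the maps; it is not the actual Blind-PMRI-Net update, which uses learned half-quadratic-splitting subproblems.

```python
import numpy as np

def alternating_recon(y, mask, S0, iters=20, inner=5, lr=0.5):
    """Alternate gradient updates on 0.5 * sum_c ||M F(S_c x) - y_c||^2:
    y is (C, H, W) undersampled multi-coil k-space, mask is (H, W),
    S0 is an initial (C, H, W) sensitivity-map estimate."""
    F = lambda z: np.fft.fft2(z, norm="ortho")    # FFT over last two axes
    Fh = lambda z: np.fft.ifft2(z, norm="ortho")  # adjoint (inverse) FFT
    S = S0.copy()
    x = np.sum(np.conj(S) * Fh(mask * y), axis=0)  # coil-combined init
    for _ in range(iters):
        for _ in range(inner):                     # image step, maps fixed
            r = mask * F(S * x) - mask * y
            x -= lr * np.sum(np.conj(S) * Fh(r), axis=0)
        for _ in range(inner):                     # map step, image fixed
            r = mask * F(S * x) - mask * y
            S -= lr * np.conj(x) * Fh(r)
    return x, S
```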
VS-Net [59] utilizes variable-splitting optimization to manage complex multi-coil data. While it performs well in parallel imaging tasks, its robustness is limited by its handling of sensitivity maps, making it less adaptable to certain coil configurations; in particular, its sensitivity to inaccuracies in the estimated maps can reduce its effectiveness in some scenarios despite otherwise good performance.
4.9. Future Directions and Research Opportunities
Future research should aim to address the limitations of current models and explore new avenues for enhancement. Emerging AI techniques offer promising directions. Reinforcement learning (RL), for instance, could optimize acquisition parameters dynamically during the scan, leading to more efficient data collection and potentially shorter scan times, thereby making MRI procedures more responsive to specific patient needs and scanning conditions. Self-supervised learning also holds great promise for improving the efficacy, quality, and accuracy of MRI reconstruction: it can exploit structure in MRI data, such as the physical constraints of the imaging process, to generate useful training signals without large quantities of ground-truth data (one such signal is sketched below). This can catalyze the development of novel models and reduce researchers' dependence on expensive, time-consuming data labeling. Future work may develop more sophisticated self-supervised methods that leverage domain-specific knowledge to enhance model performance.
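As one concrete possibility, the sketch below follows the SSDU-style idea of splitting the acquired k-space locations into a network-input set and a held-out loss set, so the training target comes from the measurements themselves rather than from fully sampled ground truth; the function name and split ratio are illustrative.

```python
import numpy as np

def self_supervised_split(kspace, mask, holdout_frac=0.4, rng=None):
    """Partition acquired k-space locations: the train split is fed to the
    network, the held-out split serves as the loss target."""
    rng = rng or np.random.default_rng()
    acquired = np.argwhere(mask)                     # sampled locations
    held = acquired[rng.random(len(acquired)) < holdout_frac]
    loss_mask = np.zeros_like(mask)
    loss_mask[tuple(held.T)] = True                  # held-out locations
    train_mask = mask & ~loss_mask                   # network-input locations
    return kspace * train_mask, train_mask, kspace * loss_mask, loss_mask
```

During training, the network reconstructs from the train split, and the loss compares its re-predicted k-space only at the held-out locations, so no fully sampled reference is required.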
In addition to these algorithmic advances, personalized medicine approaches, in which models are tailored to individual patient characteristics, could provide significant benefits. Such models can leverage patient-specific data to enhance the accuracy and reliability of MRI reconstructions, leading to more precise diagnoses and personalized treatment plans.
Moreover, exploring hybrid models that combine multiple algorithms may offer a more comprehensive solution to the challenges of MRI reconstruction. These hybrid models can integrate the strengths of different techniques, such as combining the robustness of classical optimization with the adaptability of deep learning. Collaborative efforts between researchers, clinicians, and industry partners will be essential for advancing the field and translating research innovations into clinical practice. Ensuring that these models are user-friendly and seamlessly integrated into existing clinical workflows will be critical for their successful implementation.