1. Unrolled Networks for Linear Inverse Problems
In many imaging applications, ranging from magnetic resonance imaging (MRI) and computed tomography (CT) to seismic imaging and nuclear magnetic resonance (NMR), the measurement process can be modeled in the following way:

y = A x + w.

In the above equation, y ∈ R^m represents the collected measurements and x ∈ R^n denotes the vectorized image that we aim to capture. The matrix A ∈ R^{m×n} represents the forward operator or measurement matrix of the imaging system, which is typically known exactly or with some small error. Finally, w ∈ R^m represents the measurement noise, which is not known, but some information about its statistical properties (such as the approximate shape of the distribution) may be available.
Recovering x from the measurement y has been extensively studied, especially in the last 15 years after the emergence of the field of compressed sensing [1,2,3,4]. In particular, between 2005 and 2015, many successful algorithms were proposed to solve this problem, including Denoising-based AMP [5,6], Plug-and-Play Priors [7], compression-based recovery [8,9], and Regularization by Denoising [10].
Inspired by the successful application of neural networks, many researchers have started exploring the application of neural networks to solve linear inverse problems [11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38]. The original networks proposed for this goal were deep networks that combined convolutional and fully connected layers [12,39]. The idea was that we feed y (or a back-projection such as A^T y) to a network and then expect the network to eventually return an estimate of x.
While these methods performed reasonably well in some applications, in many cases they underperformed more classical, non-neural-network-based algorithms while also requiring computationally demanding training. Some of the challenges that these networks face are as follows:
Size of the measurement matrix: The measurement matrix A has m × n entries. Even if the image is relatively small, m and n can each be in the tens of thousands, which means that the measurement matrix may have more than 1 billion elements. Consequently, an effective deep learning-based recovery algorithm may need to memorize the elements or learn the structural properties of A to be able to reconstruct x from y. This means that the neural network itself should ideally have many more parameters. Not only is the training of such networks computationally very demanding, but also, within the computational limits of the work that has been conducted so far, end-to-end networks have not been very successful.
Changes in the measurement matrix: Another issue that is faced by such large networks is that usually one needs to redesign a network and train a model specific to each measurement matrix. Each network often suffers from poor generalizability to even small changes in the matrix entries.
Forward model inconsistency: It should also be noted that these end-to-end neural networks do not solve the inverse problem in the mathematical sense; they learn approximate mappings without guaranteeing consistency with the forward model y = A x + w, as demonstrated in CT imaging applications [40].
To address the issues faced by such deep and complex networks in solving inverse problems, and inspired by iterative algorithms for solving convex and non-convex problems, a category of networks known as unrolled networks has emerged [41,42]. To understand the motivation behind these unrolled networks, we consider the hypothetical situation in which all images of interest belong to a set C. Under this assumption, one way to recover the image x from the measurements y is to find

x̂ = argmin_{u ∈ C} ||y − A u||_2^2.

One method to solve this optimization problem is via projected gradient descent (PGD), which uses the following iterative steps:

x̂_{i+1} = P_C( x̂_i + μ A^T (y − A x̂_i) ),

where x̂_i is the estimate of x in iteration i, μ is the step size (learning rate), and P_C denotes the projection onto the set C.
Figure 1 shows a diagram of the projected gradient descent algorithm.
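To make the iteration concrete, below is a minimal NumPy sketch of PGD under the assumptions above; the function name pgd and the placeholder project_onto_C (standing in for the unknown projection P_C) are illustrative, and the back-projection initialization and fixed step size are arbitrary choices rather than details from the paper.

```python
# Minimal sketch of projected gradient descent (PGD) for y = A x + w.
# `project_onto_C` is a stand-in for the unknown projection P_C onto the
# image set C (e.g., a denoiser).
import numpy as np

def pgd(y, A, project_onto_C, step_size=1.0, num_iters=30):
    """Iterate x_{i+1} = P_C(x_i + step_size * A^T (y - A x_i))."""
    x = A.T @ y                                          # back-projection initialization
    for _ in range(num_iters):
        grad_step = x + step_size * (A.T @ (y - A @ x))  # gradient step on ||y - A x||^2 / 2
        x = project_onto_C(grad_step)                    # projection onto the image set C
    return x

# Example usage with the identity map as the "projector" (plain gradient descent):
# m, n = 500, 1024
# A = np.random.randn(m, n) / np.sqrt(m)
# x_true = np.random.randn(n)
# y = A @ x_true
# x_hat = pgd(y, A, project_onto_C=lambda z: z, step_size=0.5, num_iters=100)
```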
One of the challenges in using PGD for linear inverse problems is that the set C is unknown, and hence P_C is also not known. For example, C can represent all natural images of a certain size and P_C a projection onto that space. Researchers have considered ideas such as using state-of-the-art image compression algorithms and image denoising algorithms for the projection step [6,7,8,9,10,43], and have more recently adopted neural networks as a promising approach [11,18,44]. In these formulations, all the projectors in Figure 1 are replaced with neural networks (usually, the same neural network architecture can be used at different steps). There are several analytical and computational benefits to such an approach:
We do not require a heuristically pre-designed denoiser or compression algorithm to act as the projector P_C. Instead, we can use a training set to learn the projectors and optimize their performance, which enables the algorithm to potentially achieve better reconstructions.
Although projected gradient descent analytically employs the same projection operator at every iteration, once we replace the projections with neural networks, we do not need to impose the constraint that all the networks share the same learned parameters. In fact, allowing this extra freedom can enable us to train the networks more efficiently and, at the same time, improve performance.
Using neural networks enables greater flexibility and can integrate with a wide range of iterative optimization algorithms. For instance, although the above formulation of the unrolled network follows from projected gradient descent, one can also design unrolled networks using a wide range of options, including heavy-ball methods and message passing algorithms.
The above formulation combining projected gradient descent with neural networks belongs to a family called deep unrolled networks or deep unfolded networks, a class of neural network architectures that integrate iterative model-based optimization algorithms with data-driven deep learning approaches [29,42,45]. The central idea is to “unroll” an iterative optimization algorithm a given number of times, where each iteration replaces the traditional mapping or projection operator with a neural network. The parameters of this network are typically learned end to end. By enabling the parameters to be learnable within this framework, unrolled networks combine the interpretability and convergence properties of traditional algorithms with the adaptivity and performance of deep learning models. Due to their efficacy, these networks have been widely used in solving linear inverse problems.
The unrolled network can also be constructed by replacing the projected gradient descent algorithm with various other iterative optimization algorithms. Researchers have explored incorporating deep learning-based projectors into a range of iterative methods, including the Alternating Direction Method of Multipliers (ADMM-Net) [11], the Iterative Soft-Thresholding Algorithm (ISTA-Net) [17], Nesterov’s accelerated first-order method [46,47], and approximate message passing (AMP-Net) [48]. Many of these alternatives offer faster convergence for convex optimization problems, raising the prospect of reducing the number of neural network projectors required and thereby lowering the computational complexity of both training and deploying these networks.
The remainder of this paper is organized as follows.
Section 2 discusses the challenges in designing unrolled networks.
Section 3 introduces our Deep Memory Unrolled Network (DeMUN), which generalizes existing unrolled algorithms.
Section 4 presents our four main hypotheses with supporting experiments on loss functions, residual connections, and network complexity.
Section 5 demonstrates robustness across different measurement matrices, noise levels, and image resolutions.
Section 6 concludes with practical guidelines for designing effective unrolled networks.
2. Challenges in Using Unrolled Networks
As discussed above, the flexibility of unrolled networks has established them as a powerful tool for solving imaging inverse problems. However, applying these architectures to specific inverse problems presents significant challenges to users. These difficulties stem primarily from two factors: (i) the multitude of design choices and (ii) the need for robustness to changes in the noise level, measurement matrix, and image resolution. We clarify these issues and present our approach to addressing them below.
2.1. Design Choices
The first challenge lies in the numerous design decisions that users must make when employing unrolled networks. We list some main choices below:
Optimization Algorithm. In training any unrolled network, the user must decide which iterative optimization algorithm to unroll. The choices include projected gradient descent, heavy-ball methods (such as Nesterov’s accelerated first-order method), approximate message passing (AMP), and the Alternating Direction Method of Multipliers (ADMM), among others. Unrolled networks based on different optimization algorithms may lead to drastically varying performance on the task at hand.
Loss Function. For any unrolled optimization algorithm, given any observation y, one produces a sequence of T projections x̂_1, x̂_2, …, x̂_T (see Figure 1 for an illustration). To train the model, the convention is to define the loss function with respect to the final projection x̂_T using the squared ℓ2 loss ||x̂_T − x||_2^2, since this is usually the quantity returned by the network. However, given the non-convexity of the cost function that is used during training, there is no guarantee that this loss function is optimal for the generalization error. For example, one could use a loss function that incorporates one or more estimates from intermediate stages, such as x̂_1, …, x̂_{T−1}, to potentially achieve better training that provides an improved estimate of x. As will be shown in our simulations, the choice of the loss function has a major impact on the performance of the learned networks. Various papers have considered a wide range of loss functions for training different networks. We categorize them broadly below; a short code sketch of these loss variants appears after this list of design choices.
– Last-Layer Loss. Consider the notation used in Figure 1. The last-layer loss evaluates the performance of the network using the following loss function:

L_last = ||x̂_T − x||_2^2.

The last-layer loss is the most popular loss function that has been used in applications. The main argument for using this loss is that, since we only care about the final estimate x̂_T, which is used as our final reconstruction, we should consider the error of the last estimate.
– Weighted Intermediate Loss. While the loss function above seems reasonable, some works in related fields have proposed using an intermediate loss function instead [49,50]. We define the general version of the weighted intermediate loss function as follows:

L_γ = Σ_{t=1}^{T} γ^{T−t} ||x̂_t − x||_2^2,

where γ ∈ (0, 1]. One argument that motivates the use of such a loss function is that, if the predicted image after each projection is closer to the ground truth x, then it will help the subsequent steps to reach better solutions. The weighted intermediate loss tries to find the right balance among the accuracies of the estimates at different iterations [49]. In addition, we make the following observations:
∗ When γ = 1, the losses from different layers of the unrolled network are weighted equally; we refer to this unweighted intermediate loss as L_int. This means that our emphasis on the performance of the last layer is “weakened.” However, this is not necessarily undesirable. As we will show in our simulations, improving the estimates of the intermediate steps also helps to improve the recovery quality of x̂_T.
∗ As we decrease the value of γ, the loss function L_γ approaches the last-layer loss L_last. The choice of γ therefore enables us to interpolate between the two cases.
– Skip L-Layer Intermediate Loss. Another loss function that we investigate is what we call the skip L-layer intermediate loss. This loss is similar to the loss used in Inception networks for image classification [51]. Let L be a factor of T. Then, the skip L-layer loss is given by

L_skip-L = Σ_{k=1}^{T/L} ||x̂_{kL} − x||_2^2.

For instance, if L = 3, the skip 3-layer intermediate loss evaluates the sum of the mean-squared errors between x and the projections x̂_3, x̂_6, …, x̂_T. By ranging L from 1 to T, one can again interpolate between the two loss functions L_int and L_last.
Number of Unrolled Steps. Practitioners also have to decide on the number of steps T to unroll for any optimization algorithm. Increasing T often comes with additional computational burden and may also lead to overfitting. A proper choice of T ensures that network training is not prohibitively expensive while still achieving a desirable level of performance.
Complexity of the Neural Network. Similar to the above, the choice of the neural network used as the projector also has a significant impact on the performance of the network. The options include the number of layers or depth of the network, the activation function to use, whether or not to include residual connections, etc. If the projector has too little capacity, the unrolled network may have poor recovery performance. However, if the projector has excessive capacity, the network may become computationally expensive to train and prone to overfitting.
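As a concrete illustration of the loss families introduced above, the following PyTorch sketch computes the last-layer, weighted intermediate, and skip-L losses from a list of intermediate estimates; the function names, the use of mean-squared error, and the γ^(T−t) weighting convention reflect our reading of the definitions and are not code from the paper.

```python
# Sketch of the three loss families above, assuming the unrolled network
# returns the list of intermediate estimates x_hats = [x_1, ..., x_T].
import torch
import torch.nn.functional as F

def last_layer_loss(x_hats, x_true):
    # L_last: penalize only the final estimate x_T.
    return F.mse_loss(x_hats[-1], x_true)

def weighted_intermediate_loss(x_hats, x_true, gamma=1.0):
    # L_gamma: gamma = 1 gives the unweighted intermediate loss L_int;
    # as gamma -> 0 the loss approaches the last-layer loss L_last.
    T = len(x_hats)
    return sum(gamma ** (T - t) * F.mse_loss(x_t, x_true)
               for t, x_t in enumerate(x_hats, start=1))

def skip_L_loss(x_hats, x_true, L=3):
    # Skip L-layer loss: supervise every L-th estimate x_L, x_{2L}, ..., x_T.
    return sum(F.mse_loss(x_hats[t - 1], x_true)
               for t in range(L, len(x_hats) + 1, L))
```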
It is important to note that, after making all the design choices, users are required to conduct time-consuming, computationally demanding, and costly simulations to train the network. Consequently, users may only have the opportunity to explore a limited number of options before settling on their preferred architecture.
2.2. Robustness and Scaling
When designing unrolled networks for inverse problems, it is common to aim for robustness across a range of settings beyond the specific conditions for which the algorithm was originally designed. While an algorithm may be tailored for a particular signal type, image resolution, number of observations, or noise level, it is desirable for the network to maintain effectiveness across different settings as well.
As a simple motivating example, consider the case in which a new imaging device is acquired that operates with a different observation matrix. If the previously designed unrolled network adapts poorly to the new matrix, one would be required to revisit the entire design process and determine a new set of choices for the current setting. Therefore, ideally, one would like to have a single network structure that works well across a wide range of applications.
2.3. Our Approach for Designing Unrolled Networks
As discussed above, one faces an abundance of design choices before training and deploying an unrolled network. However, testing the performance of all the possible enumerations of these choices across a wide range of applications and datasets is computationally demanding and combinatorially prohibitive. This hinders practitioners from applying the optimal unrolled network in their problem-specific applications. To offer a more systematic way for designing such networks, we adopt the following high-level approach:
We present the Deep Memory Unrolled Network (DeMUN), where each step of the network leverages the gradient from all previous iterations. These networks encompass various existing models as special cases. The DeMUN lets the data decide on the optimal choice of algorithm to be unrolled and improves recovery performance.
We present several hypotheses regarding important design choices that underlie the design of unrolled networks, and we test them using extensive simulations. These hypotheses allow users to avoid exploring the multitude of design choices that they have to face in practice.
These two steps allow users to bypass many design choices, such as selecting an optimization algorithm or loss function, thus simplifying the process of designing unrolled networks. We test the robustness of our hypotheses with respect to the changes in the measurement matrices and noise in the system. These robustness results suggest that the simplified design approach presented in this paper can be applied to a much wider range of systems than those specifically studied here.
3. Deep Memory Unrolled Network (DeMUN)
As discussed previously, one of the initial decisions users face when designing an unrolled network is selecting the optimization algorithm to unroll. Various optimization algorithms, including gradient descent, heavy-ball methods, and approximate message passing, have been incorporated into unrolled networks. We introduce the Deep Memory Unrolled Network (DeMUN), which encompasses many of these algorithms as special cases. At the i-th iteration of the DeMUN, the update of the estimate is given by

x̂_{i+1} = P_i( x̂_i + Σ_{j=0}^{i} α_{i,j} A^T (y − A x̂_j) ),   (6)

for i = 0, 1, …, T − 1, where x̂_0 is the initial estimate, the α_{i,j} are learnable coefficients, and P_i denotes the neural network projector at step i. In other words, while calculating x̂_{i+1}, the network uses not only the gradient calculated at the current step but also leverages all the gradients calculated at previous steps.
By using different choices for the coefficients α_{i,j} at each iteration, one can recover a large class of algorithms, including gradient descent, heavy-ball methods, and approximate message passing. As shown in Equation (6) and illustrated in Figure 2, we can rearrange the vector x̂_i and the gradients A^T(y − A x̂_j), j = 0, …, i, as images and view the weighted combination as one-by-one convolutions over these images. Our simulation results reported later show that DeMUNs with trainable α_{i,j} offer greater flexibility and better performance compared to fixed instances such as gradient descent or Nesterov’s method.
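The following PyTorch sketch shows one possible realization of a single DeMUN step under our reading of the update rule: the current estimate and the stored gradients are stacked as channels and mixed by a learnable 1×1 convolution before the projector is applied. The class DeMUNStep and its interface are hypothetical, and the 1×1 convolution also learns a coefficient for the current estimate, slightly generalizing the weighted sum in (6).

```python
# One possible realization of a single DeMUN step (illustrative only).
import torch
import torch.nn as nn

class DeMUNStep(nn.Module):
    def __init__(self, step_index, projector):
        super().__init__()
        # Mix the current estimate plus (step_index + 1) gradient images into one channel.
        self.mix = nn.Conv2d(step_index + 2, 1, kernel_size=1, bias=False)
        self.projector = projector   # e.g., a DnCNN acting as P_i

    def forward(self, x_i, gradients):
        # x_i: (B, 1, H, W); gradients: list of (B, 1, H, W) tensors A^T(y - A x_j),
        # j = 0..i, computed outside this module from the measurement model.
        stacked = torch.cat([x_i] + gradients, dim=1)    # (B, step_index + 2, H, W)
        return self.projector(self.mix(stacked))         # next estimate x_{i+1}
```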
4. Our Four Main Hypotheses
4.1. Simulation Setup
Our goal in this section is to (1) show the effectiveness of the DeMUN by comparing its performance against different unrolled algorithms and (2) explore the impact of specific design choices. We conduct extensive ablation studies where we fix all but one design choice at each step and explore the performance of unrolled algorithms under different options for this choice. Based on these studies, we have developed several hypotheses aimed at simplifying the design of unrolled networks. We will outline these hypotheses and present simulation results that support them.
For all simulations below, we report results for four different sampling rates m/n of the measurement matrix A: 10%, 20%, 30%, and 40%. In Section 4, each entry of the measurement matrix is drawn i.i.d. from a zero-mean Gaussian distribution. While we will discuss the impact of the resolution on the performance of the algorithms, in the initial simulations, all training images share the same fixed resolution, and vectorizing the images determines the signal dimension n. We primarily consider networks with up to T = 30 unrolled steps, with additional comparisons at smaller numbers of projection steps where illustrative. For all results below, we report the Peak Signal-to-Noise Ratio (PSNR) and the Structural Similarity Index Measure (SSIM) for the networks trained under the aforementioned sampling rates and numbers of projection steps on a test set of 2500 images. More details on data collection and processing, training of unrolled networks, and evaluation are deferred to Appendix A.
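For concreteness, here is a minimal sketch of how such a Gaussian measurement operator could be constructed; the function name make_gaussian_operator and the 1/√m scaling of the entries are our own illustrative choices rather than details taken from the paper.

```python
# Minimal sketch of the Gaussian measurement setup: a dense i.i.d. zero-mean
# Gaussian matrix whose number of rows is set by the sampling rate m/n.
import numpy as np

def make_gaussian_operator(n, sampling_rate, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    m = int(round(sampling_rate * n))
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

# Example: grayscale images of side length `side`, vectorized to n = side * side.
# side = 64
# A = make_gaussian_operator(side * side, sampling_rate=0.3)
# y = A @ image.reshape(-1)   # noiseless measurements of one image
```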
In our simulations, we adopt the general DnCNN architecture outlined by Zhang et al. [52] as our neural network projector. A DnCNN architecture with L intermediate layers consists of an input layer with 64 filters of size 3 × 3 followed by a ReLU activation function to map the input image to 64 channels (the input has a single channel since we assume the images are grayscale), L intermediate layers each consisting of 64 filters of size 3 × 3 followed by batch normalization and ReLU, and a final reconstruction layer with a single 3 × 3 filter to map back to the single-channel output.
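For reference, a minimal PyTorch sketch of a projector with this DnCNN layout is shown below; the padding and other hyperparameters are illustrative rather than taken from the paper.

```python
# Minimal sketch of a DnCNN-style projector: grayscale input, 64 channels,
# 3x3 filters, BatchNorm + ReLU in the intermediate layers.
import torch.nn as nn

def make_dncnn(num_intermediate_layers=5, channels=64):
    layers = [nn.Conv2d(1, channels, kernel_size=3, padding=1), nn.ReLU(inplace=True)]
    for _ in range(num_intermediate_layers):
        layers += [nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                   nn.BatchNorm2d(channels),
                   nn.ReLU(inplace=True)]
    layers.append(nn.Conv2d(channels, 1, kernel_size=3, padding=1))  # reconstruction layer
    return nn.Sequential(*layers)
```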
4.2. Overview of Our Simplifying Hypotheses
As previously described, we begin with four hypotheses, each of which contributes to improving the performance of unrolled networks, enhancing training practices, and simplifying the design process by reducing the number of decisions practitioners need to make. These hypotheses are based on extensive simulations and are reported below.
Hypothesis 1. Unrolled networks trained with the unweighted intermediate loss L_int uniformly outperform their counterparts trained with the last-layer loss L_last. Moreover, compared with the other unrolled algorithms we tested, i.e., PGD, AMP, and Nesterov, DeMUNs offer superior recovery performance.
Although we are primarily concerned with the quality of the final reconstruction x̂_T, we find that using the loss function L_int during training yields better recovery performance than focusing solely on the last layer. This improvement may be attributed to the smoother optimization landscape provided by the intermediate loss, which guides the network more effectively towards better minima. We present our empirical evidence for this hypothesis in Section 4.3. With the advantage of using an unweighted intermediate loss function established, we next explore the impact of incorporating residual connections into unrolled networks.
Hypothesis 2. DeMUNs trained using residual connections and the loss function L_int uniformly improve recovery performance compared to those trained without residual connections.
Residual connections are known to alleviate issues such as vanishing gradients and to facilitate the training of deeper networks by allowing gradients to propagate more effectively through the intermediate layers [53,54]. Specifically, we modify each projection step in our unrolled network to take the residual form z ↦ z + P_i(z), where z denotes the input to the projection step. In verifying Hypothesis 2, we continue to use L_int (see the definition of the intermediate loss in (4)). This ensures that any observed improvements can be directly attributed to the addition of residual connections rather than to changes in the loss function. We present our empirical evidence for this hypothesis in Section 4.4. Having confirmed that both the use of an unweighted intermediate loss and the inclusion of residual connections improve recovery performance, we further investigate the sensitivity of our network to the specific shape of the loss function.
Hypothesis 3. For training DeMUNs, there is no significant difference among (1) the unweighted intermediate loss L_int, (2) weighted intermediate losses L_γ with γ close to 1, and (3) skip-L losses with small L. Furthermore, heavily down-weighted intermediate losses (small γ) and skip-L losses with large L perform worse than L_int.
Hypothesis 4. When we vary the number of layers, L, in the DnCNN from 5 to 15, the performance of DeMUNs remains largely unchanged, indicating that the number of layers has a negligible impact on its performance. However, increasing L from 3 to 5 provides a noticeable improvement in performance.
Confirming these hypotheses provides a set of practical recommendations for designing unrolled networks that are both effective and robust across various settings.
4.3. Impact of Intermediate Loss
In this section, we aim to validate Hypothesis 1, which posits that deep unrolled networks trained with the unweighted intermediate loss function L_int uniformly outperform their counterparts trained with the last-layer loss L_last. We consider four unrolled algorithms: DeMUN, PGD, Nesterov’s method, and AMP.
For all unrolled algorithms, we consider the case in which all the projection steps are cast as direct projections of the form z ↦ P_i(z) (i.e., without residual connections), and we compare the performance obtained with the last-layer loss against that obtained with the unweighted intermediate loss.
Figure 3 presents an example of a DnCNN architecture with L intermediate layers.
Improved Performance with Intermediate Loss:
By analyzing the tables and graphs, we conclude that, across all four unrolled algorithms, training with the intermediate loss function L_int consistently yields higher PSNR values compared to training with the last-layer loss L_last.
Superiority of Deep Memory Unrolled Network: Compared with the other algorithms that we have unrolled, i.e., PGD, Nesterov, and AMP, the DeMUN achieves the highest PSNR values when trained with the intermediate loss, confirming our hypothesis. (However, we see that this is not always the case when using the last-layer loss. A possible explanation is that our memory networks contain many parameters (especially with many projection steps) and may become stuck at a local minimum when trained with the last-layer loss. In contrast, when adopting the intermediate loss function, the network needs to optimize its projection performance across all projection steps to minimize the loss. As a result, it may find better solutions, especially for the parameters involved in the earlier layers.) This is to be expected, as DeMUNs encompass the other unrolled networks as special cases. During training, the data effectively determines which algorithm should be unrolled.
According to these observations, the intermediate loss may provide several benefits:
Avoiding Poor Local Minima: Focusing solely on the output of the final layer may lead the network to suboptimal solutions (due to non-convexity). In comparison, the intermediate loss encourages the network to make meaningful progress at each step, which potentially reduces the risk of becoming stuck in poor local minima.
More Information during Backpropagation: By including losses from all intermediate steps, the network receives more gradient information during autodifferentiation, which may be helpful in learning better representations and weights.
These empirical results strongly support our first hypothesis that incorporating information from all intermediate steps creates a more effective learning mechanism for the network.
4.4. Impact of Residual Connections
Having verified that training with the intermediate loss function improves the recovery performance of unrolled networks, we now examine the effect of incorporating residual connections of the form z ↦ z + P_i(z) into unrolled networks, as stated in Hypothesis 2, while fixing the choice of the unweighted intermediate loss function L_int. For comparison, in addition to the Deep Memory Unrolled Network, we include the results for unrolled networks based on PGD under the same conditions.
Consistent Performance Improvement: Including residual connections consistently improves the PSNR across all sampling rates and number of projection steps for both the deep memory- and PGD-based unrolled networks.
Superior Performance of Deep Memory Network: While both networks benefit from residual connections, the Deep Memory Unrolled Networks maintain superior performance over projected gradient descent in all scenarios.
These empirical results strongly support Hypothesis 2 that incorporating residual connections into the Deep Memory Unrolled Network further improves its performance on top of training with the unweighted intermediate loss function. The consistent improvement across different sampling rates and projection steps potentially highlights the value of residual connections in unrolled network architectures.
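As a concrete illustration of the residual projection step z ↦ z + P(z) examined in this section, the following wrapper (our own construction, not code from the paper) turns an arbitrary projector network into its residual counterpart.

```python
# Residual projection step: the projector predicts a correction that is
# added back to its input, z -> z + P(z).
import torch.nn as nn

class ResidualProjector(nn.Module):
    def __init__(self, projector):
        super().__init__()
        self.projector = projector        # e.g., the DnCNN sketched earlier

    def forward(self, z):
        return z + self.projector(z)      # residual connection around the projector
```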
4.5. Sensitivity to Other Loss Functions
Having identified that using an unweighted intermediate loss function and incorporating residual connections in Deep Memory Unrolled Networks offer superior performance, we now explore the sensitivity of our network to variations in the loss function, as raised in Hypothesis 3. Specifically, we want to see whether different weighting schemes in the intermediate loss function or the use of a skip-L layer loss significantly impact the recovery performance. We consider weighted intermediate losses L_γ for several values of γ between 0 and 1, as well as skip-L losses for several values of L. We also include results for Deep Memory Unrolled Networks trained using L_int with residual connections for comparison.
From the simulation results presented in
Table 3, we are able to observe the following:
Minimal Impact for Larger Values of γ: When γ is not too small, the recovery performance remains relatively consistent, with negligible differences in the PSNR values.
Degradation with Small γ: For small values of γ, there is a noticeable decrease in reconstruction quality. This decline may be attributed to the exponential down-weighting of the initial layers, which causes the network to focus excessively on the later iterations, potentially leading to suboptimal convergence.
We see that, as long as the intermediate outputs receive sufficient emphasis during training, the network can output high-quality reconstructions. The decline in performance for smaller values of γ underscores the importance of adequately supervising the reconstructions of the intermediate layers to guide the network toward the desired recovery.
4.6. Impact of the Complexity of the Projection Step
In this section, we examine Hypothesis 4 by changing the number of intermediate layers L of the DnCNN architecture. We assume that there is no additive measurement noise and consider depths ranging from L = 3 to 15 layers. Our results are shown below.
Increasing the number of layers from 5 to 15 results in negligible changes in the performance of DeMUNs, regardless of the number of projection steps. By comparing Table 4, Table 5 and Table 6, we observe that the number of projections has a significantly greater impact on performance than the number of layers within each projection.
By comparing the results for L = 3 and L = 5, we conclude that reducing the depth too drastically may impair the network’s ability to learn complex features, as convolutional neural networks rely on multiple layers to capture hierarchical representations [55].
We acknowledge that these conclusions may not necessarily extend to other projector architectures that do not rely on deep convolutional layers. Nevertheless, we believe this observation generalizes to other types of architectures when their capacity diminishes beyond a certain threshold, although we defer further investigation to future work. We address extensions to other types of measurement matrices in
Section 5.3.
5. Robustness of DeMUNs
In Section 4, we established, through extensive simulations, the superior performance of DeMUNs trained with the unweighted intermediate loss L_int and residual connections. The aim of this section is to assess the robustness of this configuration under various conditions. Specifically, we examine our network’s performance under changes in the measurement matrix, the presence of additive noise, variations in input image resolution, and changes in projector capacity. These aspects represent the primary variables that practitioners must consider when deploying unrolled networks in real-world scenarios. Our extensive experiments demonstrate the adequacy and generalizability of our design choices. In the simulations presented in the following sections, we fix the image resolution to the value used in Section 4 whenever the resolution is not specified.
5.1. Robustness to the Sampling Matrix
We first investigate our network’s performance under different sampling matrix structures. In addition to the Gaussian random matrices used previously, we consider a Discrete Cosine Transform (DCT) matrix of the form A = S F, where S is an undersampling matrix that selects a subset of rows and F represents the 2D-DCT. The number of hidden layers of each projector (DnCNN) is kept fixed; additional implementation details can be found in Appendix A. There are a few points that we would like to clarify here:
Table 7 demonstrates that our network maintains good performance when considering DCT-type measurement matrices as well. The network effectively adapts to the DCT matrices, achieving comparable or better PSNR values than under the Gaussian forward model. This suggests that the design choices made on the basis of our simulations with Gaussian forward models also offer good performance for other types of matrices.
The performance improvement DeMUNs gain from additional projection steps on DCT forward models is typically smaller than the improvement achieved with additional projections on Gaussian matrices. Since there are no signs of overfitting in terms of recovery performance, we believe that the user does not need to worry about the number of projection steps when designing the network.
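As an illustration of the DCT-type forward operator A = S F described above, the following sketch builds matrix-free forward and adjoint maps; the use of SciPy's dctn/idctn and the uniformly random selection of transform coefficients are our assumptions, not details from the paper.

```python
# Sketch of a DCT-type forward operator A = S F as matrix-free maps:
# F applies the 2D-DCT and S keeps a subset of the transform coefficients.
import numpy as np
from scipy.fft import dctn, idctn

def make_dct_operator(side, sampling_rate, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    n = side * side
    keep = rng.choice(n, size=int(round(sampling_rate * n)), replace=False)

    def forward(x_img):                      # A x: x_img is a (side, side) image
        coeffs = dctn(x_img, norm="ortho")   # F: orthonormal 2D-DCT
        return coeffs.reshape(-1)[keep]      # S: keep a subset of coefficients

    def adjoint(y):                          # A^T y: zero-fill, then inverse 2D-DCT
        coeffs = np.zeros(n)
        coeffs[keep] = y
        return idctn(coeffs.reshape(side, side), norm="ortho")

    return forward, adjoint
```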
5.2. Robustness to Additive Noise
Next, we introduce additive noise and obtain measurements of the form y = A x + w, where the entries of w are i.i.d. zero-mean Gaussian with standard deviation σ. We want to see whether our design choices still offer good performance in the presence of additive noise. The primary objective of this section is to demonstrate that the PSNR of DeMUN reconstructions decreases gradually as the noise level increases and that overfitting does not occur as the number of projections increases.
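For concreteness, a minimal sketch of generating such noisy measurements is shown below; parameterizing the noise directly by its standard deviation σ is our assumption about how the noise level was specified.

```python
# Minimal sketch of generating noisy measurements y = A x + w with i.i.d.
# zero-mean Gaussian noise of standard deviation sigma.
import numpy as np

def noisy_measurements(A, x, sigma, rng=None):
    rng = np.random.default_rng() if rng is None else rng
    y_clean = A @ x
    return y_clean + rng.normal(0.0, sigma, size=y_clean.shape)
```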
Despite additive measurement noise predictably lowering the recovery PSNR, its impact on performance is relatively controlled. In particular, as the noise level increases, the PSNR degrades at a rate significantly slower than the decrease in the input SNR. This suggests that the network effectively suppresses the measurement noise.
As the noise level grows, the marginal benefit of additional projection steps diminishes. In other words, fewer projection steps often suffice to achieve comparable reconstruction quality. As mentioned before, and as is clear from Table 8, increasing the number of projections still does not hurt the reconstruction performance of the network. Hence, in scenarios where the noise level is not known, practitioners may choose a number of projections that works well in the noiseless setting and use it in noisy settings as well.
5.3. Robustness of Hypothesis 4 to Sampling Matrix and Additive Noise
The main goal of this section is to evaluate the robustness of Hypothesis 4 in response to changes in the measurement matrix and measurement noise. We first assume that there is no additive noise and consider depths ranging from L = 3 to 15 layers. We then evaluate the performance of DeMUNs on the DCT-type matrices described in Section 5.1.
As evident from
Table 10, increasing
L from 5 to 15 does not provide a noticeable improvement for DCT-type matrices. One could also argue that, in most cases for DCT-type matrices, the performance gain from increasing
L from 3 to 5 is marginal.
Next, we study the accuracy of Hypothesis 4 when additive noise is present in the measurements. Here, we consider three noise levels and test several DnCNN depths of up to L = 10. The results are presented in Table 11 and Table 12.
These results strongly suggest that, even in the presence of additive noise, increasing L does not offer substantial gain in the performance of DeMUNs. Given that the improvement in recovery performance is marginal when increasing the projector capacity, this suggests that simple architectures like DnCNN with very few convolutional layers may be sufficient for practical applications where measurement noise is present, offering potential computational savings without significant performance degradation.
5.4. Robustness to Image Resolution
Finally, we assess the DeMUN’s performance across different image resolutions. We test three different image resolutions, fixing the type of measurement matrix and removing measurement noise. There are two main questions we aim to address here: (1) Do we need more or fewer projections as we increase the resolution? (2) How should we set the number of layers L in the projector as we increase or decrease the resolution? As before, we first fix the number of intermediate layers of each projector.
We observe from
Table 13 that, as the image resolution increases, the network’s recovery performance generally improves. This is possibly due to the presence of more information in higher-resolution images, which helps the network in learning more detailed structural properties.
6. Conclusions
In this paper, we conducted a comprehensive empirical study on the design choices for unrolled networks in solving linear inverse problems. As our first step, we introduced the Deep Memory Unrolled Network (DeMUN), which leverages the history of all gradients and generalizes a wide range of existing unrolled networks. This approach was designed to (1) allow the data to decide on the optimal choice of algorithm to be unrolled and (2) improve recovery performance. A byproduct of our choice is that users do not need to decide which algorithm they need to unroll.
Figure 7 presents examples of recovered images under the DCT measurement matrix with 30 projections across different sampling rates.
Through extensive simulations, we demonstrated that training the DeMUN with an unweighted intermediate loss function and incorporating residual connections represents the best existing practice (among the ones studied in this paper) for optimizing these networks. This approach delivers superior performance compared to existing unrolled algorithms, highlighting its effectiveness and versatility.
We also presented experiments that exhibit the robustness of our design choices to a wide range of conditions, including different measurement matrices, additive noise levels, and image resolutions. Hence, our results offer practical guidelines and rules of thumb for selecting the loss function for training, structuring the unrolled network, determining the required number of projections, and deciding on the appropriate number of layers. These insights simplify the design and optimization of such networks for a wide range of applications, and we expect them to serve as a useful reference for researchers and practitioners in designing effective unrolled networks for linear inverse problems across various settings.