Ambiguity in Solving Imaging Inverse Problems with Deep-Learning-Based Operators

In recent years, large convolutional neural networks have been widely used as tools for image deblurring because of their ability to restore images very precisely. It is well known that image deblurring is mathematically modeled as an ill-posed inverse problem, whose solution is difficult to approximate when noise affects the data. Indeed, one limitation of neural networks for deblurring is their sensitivity to noise and other perturbations, which can lead to instability and poor reconstructions. In addition, networks trained end-to-end do not necessarily take into account the numerical formulation of the underlying imaging problem. In this paper, we propose some strategies to improve the stability of deep-learning-based deblurring without losing too much accuracy. First, we suggest a very small neural architecture, which reduces the training time, satisfying a green AI need, and does not excessively amplify noise in the computed image. Second, we introduce a unified framework in which a pre-processing step compensates for the lack of stability of the subsequent neural-network-based step. Two different pre-processors are presented: the former implements a strong parameter-free denoiser, and the latter is a variational-model-based regularized formulation of the latent imaging problem. This framework is also formally characterized by mathematical analysis. Numerical experiments are performed to verify the accuracy and stability of the proposed approaches for image deblurring when unknown or unquantified noise is present; the results confirm that they improve the network stability with respect to noise. In particular, the model-based framework represents the most reliable trade-off between visual precision and robustness.


Introduction
Image restoration is a discipline within the field of image processing that focuses on removing or reducing distortions and artifacts from images. This topic is of interest in a wide range of applications, including medical imaging, satellite and aerial imaging, and digital photography. In this last case, blur is quite frequent and several factors can cause it. For example, Gaussian blur is caused by the diffraction of light passing through a lens and is more prevalent in images captured with low-aperture lenses or in situations where the depth of field is shallow, whereas motion blur is due to handheld camera movements or to low lighting conditions and slow shutter speeds [1,2,3]. Noise also seriously affects images; it is usually introduced by the acquisition system.
Researchers have developed a number of algorithms for reducing blur and noise, and image restoration is a very active field of research where new methods are continuously being proposed. Such methodologies can be classified into two main categories: model-based and learning-based. Model-based techniques assume that the degradation process is known and mathematically described as an inverse problem [4]. Learning-based methods learn a map between the degraded and clean images during the training phase and use it to deblur new corrupted images [5].

Model-based mathematical formulation.
In model-based approaches, denoting by $X$ the compact and locally connected subset of $\mathbb{R}^n$ of the ground truth sharp images $x_{gt}$, the relation between $x_{gt} \in X$ and its blurred and noisy observation $y^\delta$ is formulated as:
$$y^\delta = K x_{gt} + e, \qquad (P)$$
where $K$ is the known blurring operator and $e$ represents noise on the image. We can say that, with very high probability, $\|e\| \le \delta$. In this setting, the goal of model-based image deblurring methods is to compute a sharp and unobstructed image $x$ given $y^\delta$ and $K$, by solving the linear inverse problem. When noise is present, problem (P) is typically reformulated as an optimization problem, where a data-fit measure, namely $F$, is minimized. Since the blurring operator $K$ is known to be severely ill-conditioned, a regularization term $R$ is added to the data-fidelity term $F$ to avoid noise propagation. The resulting optimization problem is formulated as:
$$\min_{x \in \mathbb{R}^n} F(Kx, y^\delta) + \lambda R(x), \qquad (1)$$
where $\lambda > 0$ is the regularization parameter. This optimization problem can be solved using different iterative methods, depending on the specific choice for $F$ and $R$ [6,1,7]. We remark that $F$ is set as the least-squares function in case of Gaussian noise, whereas the regularization function $R$ can be tuned by the users according to the imaging properties they desire to enforce. Recently, plug-and-play techniques plug a denoiser, usually a neural network, into an iterative procedure to solve the minimization problem [8,9,10]. The value of $\lambda$ can also be selected by automatic routines, image by image [11,12]. These features make model-based approaches mathematically explainable, flexible, and robust. However, a disadvantage is that the final result strongly depends on a set of parameters that are difficult to set up properly.
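As a purely illustrative sketch (not part of the experiments of this paper), the following Python snippet shows why regularization is needed on a toy one-dimensional instance of (P). The blur matrix, the test signal, the noise level and the value of λ are all hypothetical choices, and we assume F to be the squared Euclidean data misfit and R the squared Euclidean norm (Tikhonov regularization), which is only one of the possible choices for F and R.

```python
import numpy as np

def gaussian_blur_matrix(n, sigma):
    """Dense n-by-n matrix applying a 1D Gaussian blur (rows sum to 1)."""
    idx = np.arange(n)
    K = np.exp(-((idx[:, None] - idx[None, :]) ** 2) / (2.0 * sigma**2))
    return K / K.sum(axis=1, keepdims=True)

def tikhonov_solve(K, y, lam):
    """Minimizer of ||K x - y||^2 + lam * ||x||^2 via the normal equations."""
    n = K.shape[1]
    return np.linalg.solve(K.T @ K + lam * np.eye(n), K.T @ y)

rng = np.random.default_rng(0)
n = 64
K = gaussian_blur_matrix(n, sigma=1.5)   # hypothetical blur operator
x_gt = np.zeros(n)
x_gt[20:40] = 1.0                        # piecewise-constant "sharp" signal
e = 0.01 * rng.standard_normal(n)        # additive Gaussian noise
y_delta = K @ x_gt + e                   # datum as in problem (P)

x_naive = np.linalg.solve(K, y_delta)    # unregularized inversion
x_tik = tikhonov_solve(K, y_delta, lam=1e-3)

err_naive = np.linalg.norm(x_naive - x_gt)
err_tik = np.linalg.norm(x_tik - x_gt)
print(f"naive error: {err_naive:.3e}, Tikhonov error: {err_tik:.3e}")
```

Under these assumptions, the unregularized inversion amplifies the noise through the small singular values of K, whereas the regularized solution stays much closer to the ground truth, at the price of some residual smoothing.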
Deep learning-based formulation.
In the last decade, deep learning algorithms have emerged as good alternatives to model-based approaches. Disregarding any mathematical blurring operator, convolutional neural networks (NNs) can be trained to identify patterns characterizing blur on images, so that they can learn several kinds of blur and adapt to each specific imaging task. Large and complex convolutional neural networks, such as UNet, have been proposed to achieve high levels of accuracy by automatically tuning and defining their inner filters and proper transformations for blur reduction, without needing any parameter setting [13,14,15,16]. Indeed, the possibility of processing large amounts of data in parallel makes networks highly efficient for image processing tasks and likely to play a key role in the development of new and more advanced techniques in the future.
However, challenges and limitations in using neural networks are known in the literature. Firstly, it is difficult to understand and precisely interpret how they make decisions and predictions, as they act as unexplainable black boxes mapping the input image $y^\delta$ directly towards $x_{gt}$. Secondly, neural networks are prone to overfitting, which occurs when they become too specialized for the training samples and perform poorly on new, unseen images. Lastly, the high performance of neural networks is typically evaluated only in the so-called in-domain case, i.e. the test procedure is performed on images sharing exactly the same corruption as the training samples; hence the impact of unquantified perturbations (such as noise) has not been widely studied yet. In other words, the robustness of NN-based image deblurring with respect to unknown noise is not guaranteed [17,18,19,20].

Contributions of the article.
Motivated by the poor stability but high accuracy of NN-based approaches in solving inverse imaging problems such as deblurring, this paper proposes strategies to improve stability while maintaining good accuracy, acting similarly to regularization functions in the model-based approach. Based on a result showing a trade-off between stability and accuracy, we propose to use a very small neural network in place of the UNet, which is less accurate but much more stable than larger networks. Since it has only a few parameters to identify, it consumes relatively little time and energy, thus meeting the green AI principles. Moreover, we propose two new NN-based schemes, embedding a pre-processing step to counter the network instability when solving deblurring problems as in (P). The first scheme, denoted as FiNN, applies a model-free low-pass filter to the datum before passing it as input to the NN. This is a good approach whenever unknown noise is present, because it does not need any model information or parameter tuning. The second scheme, called Stabilized Neural Network (StNN), exploits an estimation of the noise statistics and the mathematical modeling of both the noise and the image corruption process. Figure 1 shows a sketch of the proposed frameworks, whose robustness is evaluated from a theoretical perspective and tested on an image data set.
Structure of the article.
The work is organized as follows. In Section 2, we formulate the NN-based action as an image reconstructor for problem (P). In Section 3 we describe our experimental set-up and motivate our work with some experiments; we then state our proposals and derive their main properties in Section 4. Finally, in Section 5 we report the results of some experiments to test the methods and empirically validate the theoretical analysis, before concluding with final remarks in Section 6.

Solving imaging inverse problems with Deep Learning based operators
As stated in (P), image restoration is mathematically modeled as an inverse problem which derives from the discretization of a Fredholm integral equation. Such problems are ill-posed, and the noise on the data is amplified in the numerically computed solution of $y^\delta = K x_{gt} + e$. A rigorous theoretical analysis of the solution of such problems with variational techniques, which can be formulated as in equation (1), has been performed both in the continuous and in the discrete setting, and regularization techniques have been proposed to limit the noise spread in the solution [21,1].
To the best of our knowledge, a similar analysis for deep-learning-based algorithms is not present in the literature, and it remains unclear how these algorithms behave in the presence of noise on the data. In this paper we use some of the mathematical tools defined and proved in [20], and we propose some techniques to limit noise spread. More details about the mathematical framework, in a more general setting, can be found in [20].
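In the following, unless otherwise stated, we consider the Euclidean norm as vector norm. We first formalize the concept of reconstructor associated to (P) with the following definition.
Definition 2.1. A reconstructor associated to (P) is a function $\psi : \mathbb{R}^n \to \mathbb{R}^n$ mapping the datum $y^\delta$ to an approximation of $x_{gt}$. The associated reconstructing error is
$$E_\psi(x_{gt}, y^\delta) := \|\psi(y^\delta) - x_{gt}\|. \qquad (2)$$
Definition 2.2. We quantify the accuracy of the reconstructor $\psi$ by defining the measure $\eta > 0$ as:
$$\eta = \sup_{x_{gt} \in X} \|\psi(K x_{gt}) - x_{gt}\|.$$
We say that $\psi$ is $\eta^{-1}$-accurate [21].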
We now consider a neural network as a particular reconstructor.
We now analyze the performance of NN-based reconstructors when noise is added to their input.
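Definition 2.4. Given $\delta \ge 0$, the $\delta$-stability constant $C^\delta_{\psi_\theta}$ of an $\eta^{-1}$-accurate reconstructor $\psi_\theta$ is defined as:
$$C^\delta_{\psi_\theta} = \sup_{x_{gt} \in X,\ \|e\| \le \delta} \frac{\|\psi_\theta(K x_{gt} + e) - x_{gt}\| - \eta}{\|e\|}.$$
From Definition 2.4 we observe that the stability constant amplifies the noise in the data:
$$\|\psi_\theta(y^0 + e) - x_{gt}\| \le \eta + C^\delta_{\psi_\theta} \|e\| \quad \text{for all } x_{gt} \in X \text{ and all } e \text{ with } \|e\| \le \delta,$$
with $y^0 = K x_{gt}$ the noiseless datum. We can therefore give the following definition: a reconstructor $\psi_\theta$ is said to be $\delta$-stable if $C^\delta_{\psi_\theta} < 1$. The next theorem states an important relation between the stability constant and the accuracy of a neural network as a solver of an inverse problem.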
Theorem 2.1. Let $\psi_\theta : \mathbb{R}^n \to \mathbb{R}^n$ be an $\eta^{-1}$-accurate reconstructor. Then, for any $x_{gt} \in X$ and for any $\delta > 0$, where $K^\dagger$ is the Moore-Penrose pseudo-inverse of $K$.
For the proof see [20].
We emphasize that, even if neural networks used as reconstructors do not use any information on the operator K, the stability of ψ θ is related to the pseudo-inverse of that operator.

Experimental setting
Here we describe our particular setting using neural networks as reconstructors for a deblurring application.
The UNet and NAFNet architectures are complex, multi-scale networks with a similar overall structure but very different behavior. As shown in Figure 2, both UNet and NAFNet are multi-resolution networks, where at each resolution level $i = 1, \dots, L$ the input is processed by a sequence of blocks $B_1, \dots, B_{n_i}$ and then downsampled. After $L - 1$ downsampling steps, the image is sequentially upsampled back to the original shape through a sequence of blocks, symmetric to the downsampling phase. At each resolution level $i = 1, \dots, L$, the corresponding image in the downsampling phase is concatenated to the first block in the upsampling phase, to preserve information through the network. Moreover, a skip connection is added between the input and the output layer of the model to simplify the training, as described in [24]. The left-hand side of Figure 2 shows that the difference between UNet and NAFNet lies in the structure of each block. In particular, the blocks in UNet are simple residual convolutional layers, defined as a concatenation of convolutions, ReLU activations, batch normalizations and a skip connection. Each block in NAFNet, on the other hand, is far more complex, containing a long sequence of gates, convolutional layers and normalization layers. The key property of NAFNet, as described in [23], is that no activation function is used in the blocks: activations are replaced by non-linear gates, which improves expressivity and training efficiency.
The 3-layer Single-Scale Network (3L-SSNet) is a very simple model defined, as suggested by its name, by just three convolutional layers, each of them composed of a linear filter followed by a ReLU activation function and a BatchNormalization layer. Since by construction the network works on single-scale images (the input is never downsampled to a lower resolution, unlike what is common in image processing), the kernel size is crucial to increase the receptive field of the model. For this reason, we considered a 3L-SSNet with width [128, 128, 128] and kernel size [9 × 9, 5 × 5, 3 × 3], respectively.
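The role of the kernel sizes can be made concrete with a short computation. The following Python sketch evaluates the receptive field of a stack of convolutions, assuming unit stride and no dilation (the text only states that the input is never downsampled, so both are our assumptions); the function name is illustrative.

```python
def receptive_field(kernel_sizes, strides=None):
    """Receptive field (in pixels, per axis) of a stack of convolutions."""
    strides = strides or [1] * len(kernel_sizes)
    rf, jump = 1, 1
    for k, s in zip(kernel_sizes, strides):
        rf += (k - 1) * jump   # each layer widens the field by (k - 1) input steps
        jump *= s              # spacing between adjacent outputs, in input pixels
    return rf

print(receptive_field([9, 5, 3]))  # 15
print(receptive_field([3, 3, 3]))  # 7
```

Under these assumptions, each output pixel of a network with kernel sizes 9, 5 and 3 depends on a 15 × 15 neighborhood of the input, whereas three 3 × 3 filters would only cover a 7 × 7 neighborhood.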

Data set
As a data set for our experiments we choose the widely used GoPro data set [25], which is composed of a large number of photographic images acquired with a GoPro camera. All the images have been cropped into non-overlapping 256 × 256 patches, converted to grayscale and normalized into [0,1]. We synthesize the blurring of each image according to (P) by considering a Gaussian corrupting effect, implemented with the 11 × 11 Gaussian kernel $G$ defined as with variance $\sigma_G = 1.3$. The kernel is visualized in Figure 3, together with one of the GoPro images and its blurred counterpart.
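The synthesis of the blurred data can be sketched as follows. This is a minimal NumPy illustration and not the code used for the experiments: we assume that $\sigma_G$ enters the exponent as a standard deviation, that the kernel is normalized to unit sum, and that zero boundary conditions are used; a random array stands in for a normalized GoPro patch. The function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(size=11, sigma=1.3):
    """size-by-size Gaussian kernel, normalized so that its entries sum to 1."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2.0 * sigma**2))
    return g / g.sum()

def blur(image, kernel):
    """Apply the kernel to a 2D image with zero padding (same output shape)."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image, dtype=float)
    # Sum of shifted copies; the kernel is symmetric, so this is a convolution.
    for i in range(kh):
        for j in range(kw):
            out += kernel[i, j] * padded[i:i + image.shape[0], j:j + image.shape[1]]
    return out

rng = np.random.default_rng(0)
x_gt = rng.random((256, 256))   # stand-in for a grayscale patch in [0, 1]
G = gaussian_kernel()
y0 = blur(x_gt, G)              # noiseless datum K x_gt
```

Adding a noise realization to `y0` then gives the datum $y^\delta$ of problem (P).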

Neural networks training and testing
To train a neural network for deblurring, the set of available images has been split into train and test subsets, with $N_D = 2503$ and $N_T = 1111$ images respectively. Then we consider a set $\mathcal{D} = \{(y^\delta_i, x^{gt}_i);\ y^\delta_i = K x^{gt}_i + e_i,\ \|e_i\| \le \delta\}_{i=1}^{N_D}$, for a given $\delta \ge 0$. Since we set a Mean Squared Error (MSE) loss function, a NN-based reconstructor is uniquely defined as the solution of:
$$\min_{\psi_\theta \in \mathcal{F}^A_\theta} \frac{1}{N_D} \sum_{i=1}^{N_D} \|\psi_\theta(y^\delta_i) - x^{gt}_i\|_2^2.$$
Each network has been trained for 50 epochs with the Adam optimizer, with $\beta_1 = 0.9$, $\beta_2 = 0.9$ and a learning rate of $10^{-3}$. We focus on the next two experiments.
Experiment A. In this experiment we train the neural networks on images corrupted only by blur ($\delta = 0$). To check the accuracy of the networks, defined as in Section 2, we test on noise-free images (in-domain tests). Then, to verify Theorem 2.1, we consider test images with added Gaussian noise with $\sigma = 0.025$ (out-of-domain tests).

Experiment B.
A common practice for enforcing network stability is noise injection [26], which consists of training a network on inputs with added noise. In particular, we have added a noise vector $e \sim \mathcal{N}(0, \sigma^2 I)$, with $\sigma = 0.025$. To test the stability of the proposed frameworks with respect to noise, we test with a higher noise level than the one used in training.
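The noise model used for injection can be sketched in a few lines of NumPy. The snippet below is only an illustration: the constant array stands in for a blurred 256 × 256 patch, the function name is hypothetical, and the higher test level 0.075 is taken from the caption of Figure 7.

```python
import numpy as np

def add_gaussian_noise(y0, sigma, rng):
    """Return y0 + e with e ~ N(0, sigma^2 I), together with the norm of e."""
    e = sigma * rng.standard_normal(y0.shape)
    return y0 + e, np.linalg.norm(e)

rng = np.random.default_rng(0)
y0 = np.full((256, 256), 0.5)                               # stand-in for a blurred patch
y_train, norm_train = add_gaussian_noise(y0, 0.025, rng)    # training noise level
y_test, norm_test = add_gaussian_noise(y0, 0.075, rng)      # a higher test noise level
print(norm_train, norm_test)
```

For an image with $n$ pixels the norm of such a realization concentrates around $\sigma \sqrt{n}$, i.e. around $256\,\sigma$ for a 256 × 256 patch, which links the standard deviation $\sigma$ to a bound of the form $\|e\| \le \delta$.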

Robustness of the end-to-end NN approach
Preliminary results obtained from experiment A are shown in Figure 4. The first row displays the reconstructions obtained from in-domain tests, where we can appreciate the accuracy of all three considered architectures. The second row shows the results obtained from out-of-domain tests, where the noise on the input data strongly corrupts the solution of the ill-posed inverse problem computed by UNet and NAFNet. Confirming what is stated by Theorem 2.1, the best result is obtained with the very light 3L-SSNet, which is the only one able to handle the noise.

Improving noise-robustness in deep learning based reconstructors
As observed in Section 3, merely using a neural network to solve an inverse problem is an unstable routine. To enforce the robustness of $\psi_\theta$ reconstructors, we propose to modify the deep-learning-based approach by introducing a suitable operator, defined in the following as a stabilizer, into the reconstruction process.
Definition 4.1. A continuous function $\phi : \mathbb{R}^n \to \mathbb{R}^n$ is called a $\delta$-stabilizer for a neural network reconstructor In this case, the reconstructor $\bar\psi_\theta = \psi_\theta \circ \phi$ is said to be $\delta$-stabilized. The smallest constant $L^\delta_\phi$ for which the definition holds is the stability constant $C^\delta_\phi$ of $\phi$.
Intuitively, applying a pre-processing $\phi$ with $L^\delta_\phi < 1$ reduces the perturbation of the input data, by converting a noise of amplitude bounded by $\delta$ into a corruption with norm bounded by $\delta L^\delta_\phi$. This intuition has been made rigorous in [20], Proposition 4.2, where a relationship between the stability constant of the stabilized reconstructor $\bar\psi_\theta$ and the stability constant of $\psi_\theta$ has been proved. In particular, if $\bar\psi_\theta = \psi_\theta \circ \phi$ is a $\delta$-stabilized reconstructor and $L^\delta_{\psi_\theta}$, $L^\delta_\phi$ are the local Lipschitz constants of $\psi_\theta$ and $\phi$, respectively, then:
$$C^\delta_{\bar\psi_\theta} \le L^\delta_{\psi_\theta} L^\delta_\phi.$$
As a consequence, if $L^\delta_\phi < 1$, then the stability constant of $\bar\psi_\theta$ is smaller than the Lipschitz constant of $\psi_\theta$, which implies that $\bar\psi_\theta$ is more stable to input perturbations.
We underline that the δ-stabilizers ϕ are effective if they preserve the characteristics and the details of the input image y δ .In this paper we focus on the two following proposals of δ-stabilizers ϕ.

Stabilized Neural Network (StNN) based on the imaging model
If the blurring operator $K$ is known, it can be exploited to derive a $\delta$-stabilizer function $\phi$. We argue that information on $K$ contributes to improving the reconstruction accuracy. Specifically, we consider an iterative algorithm, converging to the solution of (1), represented by the scheme:
$$x^{(k)} = T_k(x^{(k-1)}; y^\delta), \quad k = 1, 2, \dots,$$
where $T_k$ is the action of the $k$-th iteration of the algorithm. Given a positive integer $M \in \mathbb{N}$ and a fixed starting iterate $x^{(0)}$, let us define the $\delta$-stabilizer:
$$\phi_M(y^\delta) := x^{(M)}.$$
By definition, $\phi_M$ maps a corrupted image $y^\delta$ to the solution computed by the iterative solver in $M$ iterations.
Setting as objective function in (1) the Tikhonov-regularized least-squares function: the authors in [20] showed that it is possible to choose $M$ such that $L^\delta_{\phi_M} < 1$. Hence, given $\delta$ and $\mathcal{F}^A_\theta$, it is always possible to use $\phi_M$ as a pre-processing step, stabilizing $\psi_\theta$. We refer to $\bar\psi_\theta = \gamma_\theta \circ \phi_M$ as Stabilized Neural Network (StNN). In the numerical experiments presented in Section 5, we use the Conjugate Gradient Least Squares (CGLS) iterative method [11] for the solution of (15).
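To fix ideas, a pre-processing step of this kind can be sketched in NumPy as follows. This is an illustrative sketch and not the implementation used in the experiments: we assume the Tikhonov objective in the standard form $\frac{1}{2}\|Kx - y^\delta\|^2 + \frac{\lambda}{2}\|x\|^2$, we pass the operator and its transpose as the hypothetical callables `K` and `KT`, and the values of `lam` and `M` are arbitrary.

```python
import numpy as np

def cgls_tikhonov(K, KT, y, lam, M, x0=None):
    """M iterations of CGLS on (K^T K + lam I) x = K^T y, started from x0."""
    x = np.zeros_like(KT(y)) if x0 is None else x0.astype(float).copy()
    r = y - K(x)                  # data residual
    s = KT(r) - lam * x           # negative gradient of the objective
    p = s.copy()                  # first search direction
    gamma = float(np.vdot(s, s))
    for _ in range(M):
        if gamma == 0.0:          # already at the minimizer
            break
        q = K(p)
        alpha = gamma / (float(np.vdot(q, q)) + lam * float(np.vdot(p, p)))
        x = x + alpha * p
        r = r - alpha * q
        s = KT(r) - lam * x
        gamma_new = float(np.vdot(s, s))
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x

# Toy check on a small dense matrix (hypothetical data).
rng = np.random.default_rng(0)
A = rng.standard_normal((8, 6))
b = rng.standard_normal(8)
x_cg = cgls_tikhonov(lambda v: A @ v, lambda v: A.T @ v, b, lam=0.1, M=12)
x_ref = np.linalg.solve(A.T @ A + 0.1 * np.eye(6), A.T @ b)
print(np.allclose(x_cg, x_ref))
```

In this sketch the stabilizer $\phi_M$ corresponds to calling `cgls_tikhonov` with a fixed `M` and a fixed starting iterate, and the StNN output would be obtained by feeding the result to the trained network.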

Filtered Neural Network (FiNN)
The intuition that a pre-processing step should reduce the noise present in the input data naturally leads to our second proposal, implemented by a Gaussian denoising filter. The Gaussian filter is a low-pass filter that reduces the impact of noise on the high frequencies [27]. Thus, the resulting pre-processed image is a low-frequency version of $y^\delta$ and the neural network $\psi_\theta \in \mathcal{F}^A_\theta$ has to recover the high frequencies corresponding to the image details. Let $\phi_G$ represent the operator that applies the Gaussian filter to the input. We will refer to the reconstructor $\bar\psi_\theta = \psi_\theta \circ \phi_G$ as Filtered Neural Network (FiNN).
Note that, even though FiNN is employed to reduce the impact of the noise and consequently to stabilize the network solution, its $L^\delta_\phi$ constant is not smaller than one. In fact, for any $e \in \mathbb{R}^n$ with $\|e\| \le \delta$, it holds:
$$\phi_G(K x_{gt} + e) = \phi_G(K x_{gt}) + \phi_G(e)$$
as a consequence of the linearity of $\phi_G$.
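The behaviour of a linear low-pass filter on the noise can be checked numerically. The following NumPy sketch uses a hypothetical separable Gaussian filter with reflective boundary handling; the filter size and standard deviation are arbitrary choices of ours and are not the ones used in the experiments.

```python
import numpy as np

def gaussian_filter(image, sigma=1.0, size=5):
    """Separable Gaussian low-pass filter with reflective boundary handling."""
    half = size // 2
    ax = np.arange(-half, half + 1)
    g = np.exp(-ax**2 / (2.0 * sigma**2))
    g /= g.sum()
    padded = np.pad(image, half, mode="reflect")
    # Filter along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(np.convolve, 1, padded, g, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, g, mode="valid")

rng = np.random.default_rng(0)
y0 = rng.random((64, 64))                      # stand-in for a blurred image
e = 0.025 * rng.standard_normal((64, 64))      # white Gaussian noise

lhs = gaussian_filter(y0 + e)
rhs = gaussian_filter(y0) + gaussian_filter(e)
print(np.allclose(lhs, rhs))                   # linearity of the filter

ratio_noise = np.linalg.norm(gaussian_filter(e)) / np.linalg.norm(e)
ones = np.ones((64, 64))
ratio_const = np.linalg.norm(gaussian_filter(ones)) / np.linalg.norm(ones)
print(ratio_noise, ratio_const)
```

In this sketch the white noise realization is attenuated, while a constant perturbation passes through the normalized filter unchanged, which is consistent with a worst-case constant that is not smaller than one.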

Results
In this Section we present the results obtained in our deblurring experiments described in Section 3. To evaluate and compare the deblurred images, we use visual inspection on a selected test image and exploit the Structural Similarity index (SSIM) [28] on the test set.

Results of experiments A
We show and comment on the results obtained in experiment A, described in Section 3.3. We remark that the aim of these tests is to measure the accuracy of the three considered neural reconstructors and of the stabilizers proposed in Section 4, and to verify their sensitivity to noise in the input data; in short, to assess how these reconstructors handle the ill-posedness of the imaging inverse problem.
To this purpose, we visually compare the reconstructions of a single test image by the UNet and 3L-SSNet in Figure 5. The first row (which replicates some of the images of Figure 4) shows the results of the deep-learning-based reconstructors, where the out-of-domain images are clearly damaged by the noise. The FiNN and, particularly, the StNN stabilizer drastically reduce noise, producing accurate results even for out-of-domain tests.
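In order to analyze the accuracy and stability of our proposals, we compute the empirical accuracy $\hat\eta^{-1}$ and the empirical stability constant $\hat C^\delta_\psi$, respectively defined as:
$$\hat\eta^{-1} = \Big( \sup_{x \in S_T} \|\psi(Kx) - x\| \Big)^{-1}$$
and
$$\hat C^\delta_\psi = \sup_{x \in S_T} \frac{\|\psi(Kx + e) - x\| - \hat\eta}{\|e\|}, \qquad (18)$$
where $S_T \subseteq X$ is the test set and $e$ is a noise realization from $\mathcal{N}(0, \sigma^2 I)$ with $\|e\|_2 \le \delta$ (different for each datum $x \in S_T$).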
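These two empirical quantities can be estimated with a few lines of NumPy. The sketch below assumes that they are the test-set counterparts of Definitions 2.2 and 2.4, i.e. a supremum over $S_T$ of the noiseless reconstruction error and of the ratio between the excess error and $\|e\|$; `psi` and `K` are hypothetical callables standing for the reconstructor and the blurring operator.

```python
import numpy as np

def empirical_constants(psi, K, test_images, sigma, rng):
    """Empirical eta and empirical stability constant of the reconstructor psi."""
    # Worst-case reconstruction error on noiseless data.
    eta_hat = max(np.linalg.norm(psi(K(x)) - x) for x in test_images)
    ratios = []
    for x in test_images:
        e = sigma * rng.standard_normal(x.shape)     # one noise draw per image
        err = np.linalg.norm(psi(K(x) + e) - x)
        ratios.append((err - eta_hat) / np.linalg.norm(e))
    return eta_hat, max(ratios)

rng = np.random.default_rng(0)
images = [rng.random((8, 8)) for _ in range(3)]
identity = lambda v: v

# With K = I, the identity reconstructor is exact but passes all the noise through.
eta_id, C_id = empirical_constants(identity, identity, images, 0.025, rng)
# The zero reconstructor ignores the noise but is very inaccurate.
eta_0, C_0 = empirical_constants(lambda y: np.zeros_like(y), identity, images, 0.025, rng)
print(eta_id, C_id, eta_0, C_0)
```

The two toy reconstructors illustrate the extremes of the trade-off: the first has zero noiseless error and a stability constant equal to one, the second has a stability constant equal to zero at the price of a large $\hat\eta$.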
The computed values are reported in Table 1. Focusing on the estimated accuracies, the results confirm that NN is the most accurate method, followed by NAFNet and 3L-SSNet, as expected. As a consequence of Theorem 2.1, the values of the stability constant $\hat C^\delta_\psi$ are in reverse order: the most accurate is the least stable (notice the very high value of $\hat C^\delta_\psi$ for NN!). By applying the stabilizers, the accuracy is slightly lower but the stability is greatly improved (in most cases the constant is less than one), confirming the efficacy of the proposed solutions in handling noise while maintaining good image quality. In particular, StNN is a stable reconstructor independently of the architecture.
To analyse the stability over the test set with respect to noise, we have plotted in Figure 6, for each test image, $E_\psi(x_{gt}, y^\delta) - \eta$ vs. $\|e\|$, where the reconstruction error is defined in (2). Green and red dots mark the experiments with stability constant less and greater than one, respectively, and the blue dashed line is the bisector. We notice that the values reported in Table 1 for the empirical stability constant, computed as a supremum (see Equation (18)), are not outliers but are representative of the results of the whole test set.

Results of experiment B
In this experiment we used noise injection in the neural networks training, as described in Section 3.3. This quite common strategy reduces the accuracy of the networks but improves their stability with respect to noise. However, we show that the reconstructions are not totally satisfactory when we test on out-of-domain images, i.e. when the input images are affected by noise of a different intensity with respect to training.
Figure 7 displays the reconstructions obtained by testing with both in-domain (on the left) and out-of-domain (on the right) images. Even if the NN reconstructions (column 4) are not as damaged by noise as in experiment A (see Figure 4), noise artifacts are clearly visible, especially for UNet and NAFNet. Both proposed stabilizers act efficiently and remove most of the noise. We observe that the restorations obtained with FiNN are smoother but also more blurred than the ones computed by StNN.
An overview of the tests is displayed by the boxplots of the SSIM values sketched in Figure 8. The light blue, orange and green boxes represent the results obtained with NN, FiNN and StNN methods, respectively. They confirm that the neural networks performance worsens with noisy data (see the different positions of light blue boxes from the left to the right column), whereas the proposed frameworks including FiNN and StNN are far more stable.

Analysis with noise varying on the test set
Finally, we have analysed the performance of the methods when the input image $y^\delta$ is corrupted by noise $e$ drawn from $\mathcal{N}(0, \sigma^2 I)$, with $\sigma$ varying. In Figure 9 we plot, for one image in the test set, the absolute error between the reconstruction and the true image vs. the noise standard deviation $\sigma$. The upper row shows the results from experiment A (we remark that in this experiment we trained the networks on noise-free data). The NN error (blue line) is out of range for very small values of $\sigma$ for both UNet and NAFNet, whereas the 3L-SSNet is far more stable. In all cases, the orange and green lines show that FiNN and StNN improve the reconstruction error. In particular, StNN performs best in all these tests.
Concerning experiment B (in the lower row of the figure), it is very interesting to notice that when the noise is smaller than the training one (corresponding to $\sigma = 0.025$) the NN methods are the best performing for all the considered architectures. When $\sigma \simeq 0.05$ the behaviour changes and the stabilized methods are more accurate.

Conclusions
Starting from the consideration that the most popular neural networks used for image deblurring, such as the family of convolutional UNets, are very accurate but unstable with respect to noise in the test images, we have proposed two different approaches to gain stability without losing too much accuracy. The first one is a very light neural architecture, called 3L-SSNet, and the second one stabilizes the deep learning framework by introducing a pre-processing step. Numerical results on the GoPro data set have demonstrated the efficiency and robustness of the proposed approaches under several settings, encompassing in-domain and out-of-domain testing scenarios. The 3L-SSNet outperforms UNet and NAFNet in every test where the noise on the test images exceeds the noise on the training set, combining the desired characteristics of execution speed (in a green AI perspective) and high stability. The FiNN proposal increases the stability of the NN-based restoration (the values of its SSIM do not change remarkably across the experiments), but the restored images appear too smooth and some small details are lost. The StNN proposal, exploiting a model-based formulation of the underlying imaging process, achieves the highest SSIM values in the most challenging out-of-domain cases, confirming its great theory-grounded potential. It represents, indeed, a good compromise between stability and accuracy. We finally remark that the proposed approach can be easily extended to other imaging applications modeled as inverse problems, such as super-resolution, denoising, or tomography, where neural networks learning the map from the input to the ground truth image cannot efficiently handle noise in the input data. This work represents one step further in shedding light on the black-box essence of NN-based image processing.

Figure 1: A graphical draft highlighting the introduction of pre-processing steps Fi and St defining the proposed frameworks FiNN and StNN, respectively.

Figure 2: A diagram representing the UNet and NAFNet architectures.

Figure 3: From left to right: ground truth clean image, blurring kernel, blurred corrupted image.

Figure 4: Results from experiment A with the three considered neural networks. Upper row: reconstruction from noise-free data. Lower row: reconstruction from noisy data (δ = 0.025).

Figure 7: Results from the experiment B. On the left, tests with images with the same noise as in the training (δ = 0.025).On the right, tests on images with higher noise (δ = 0.075).

Figure 8: Boxplots for the SSIM values in experiment B. The light blue, orange and green boxplots represent the results computed by NN, FiNN and StNN, respectively.

Table 1: Estimated accuracy and stability constants for experiment A on out-of-domain test (input images corrupted by noise with δ = 2.56).