Interaction between Model Based Signal and Image Processing, Machine Learning and Artificial Intelligence

Signal and image processing have always been among the main tools in many areas, and in particular in Medical and Biomedical applications. Nowadays, there is a great number of toolboxes, general purpose and very specialized, in which classical techniques are implemented and can be used: all the transformation based methods (Fourier, Wavelets, ...) as well as model based and iterative regularization methods. Statistical methods have also shown their success in some areas when parametric models are available. Bayesian inference based methods have had great success, in particular when the data are noisy, uncertain, incomplete (missing values) or contain outliers, and where there is a need to quantify uncertainties. In some applications, nowadays, we have more and more data. To use these “Big Data” to extract more knowledge, Machine Learning and Artificial Intelligence tools have shown success and have become mandatory. However, even if these methods have shown success in many domains of Machine Learning such as classification and clustering, their use in real scientific problems is limited. The main reasons are twofold: first, the users of these tools cannot explain why they are successful when they are and why they fail when they are not. The second is that, in general, these tools cannot quantify the remaining uncertainties. Model based and Bayesian inference approaches have been very successful in linear inverse problems. However, adjusting the hyper-parameters is complex and the computational cost is high. Convolutional Neural Networks (CNN) and Deep Learning (DL) tools can be useful for pushing these limits further. On the other side, the model based methods can be helpful for the selection of the structure of CNN and DL, which is crucial in ML success. In this work, I first provide an overview and then a survey of the aforementioned methods and explore the possible interactions between them.

1. Introduction

Nowadays, there are a great number of general purpose and very specialized toolboxes in which classical and advanced signal and image processing techniques are implemented and can be used. Among them, we can mention all the transformation based methods (Fourier, Hilbert, Wavelets, Radon, Abel, and many more) as well as all the model based and iterative regularization methods. Statistical methods have also shown their success in some areas when parametric models are available.

Bayesian inference based methods have had great success, in particular when the data are noisy, uncertain, incomplete (missing values) or contain outliers, and where there is a need to account for and quantify the remaining uncertainties.

These tools fall broadly into two categories. In the first category, the main idea is to use the different ways the signals and images can be represented: in time, frequency, space, spatial frequency, time-frequency, wavelets, etc.

The model based methods are related to the notions of forward models and the inverse problems approach. The following figure shows the main idea:

[Figure: a physical model of some brain characteristic f is linked by the forward problem to the observed image; the inverse problem goes from the observed image back to f.]
In the same way, given the forward model H and the data g, the estimation of the unknown sources f can be done either via a deterministic method or a probabilistic one. One of the deterministic methods is the generalized inversion: $\hat{f} = H^\dagger(g)$. A more general method is regularization [3].
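As a minimal numerical sketch of the generalized inversion (NumPy; the ill-conditioned forward model H below is a hypothetical example), the pseudo-inverse amplifies the noise on ill-conditioned problems, which is precisely what motivates regularization:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 40
# Hypothetical ill-conditioned forward model H (fast-decaying singular values):
U, _ = np.linalg.qr(rng.standard_normal((m, m)))
V, _ = np.linalg.qr(rng.standard_normal((m, m)))
H = U @ np.diag(10.0 ** -np.linspace(0, 6, m)) @ V.T

f_true = rng.standard_normal(m)
g = H @ f_true + 1e-4 * rng.standard_normal(m)   # noisy data g = H f + eps

f_gi = np.linalg.pinv(H) @ g                     # generalized inversion H^+ g
print(np.linalg.norm(f_gi - f_true))             # large: the noise is amplified
```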
As we will see later, the only probabilistic method which can be used efficiently for inverse problems is the Bayesian approach.
Let us consider the linear inverse problem $g = H f + \epsilon$. The basic idea in regularization is to define a regularization criterion
$$ J(f) = \|g - H f\|^2 + \lambda R(f) $$
and optimize it to obtain the solution $\hat{f} = \arg\min_f J(f)$ [4]. The first main issue in such a regularization method is the choice of the regularizer $R(f)$; the most common examples are the quadratic (Tikhonov) regularizer $R(f) = \|D f\|_2^2$ and the sparsity enforcing $R(f) = \|D f\|_1$ (Total Variation when $D$ is a gradient operator). The second main issue in regularization is the choice of an appropriate optimization algorithm. Mainly, depending on the type of the criterion, we have:
• R( f ) non-quadratic, but convex and differentiable: here too, gradient based and Conjugate Gradient (CG) methods can be used, but there is also a great number of convex optimization algorithms.

• R( f ) convex but non-differentiable: here, the notion of sub-gradient is used.

Specific cases are:

• L2 or quadratic: in this case we have an analytic solution: $\hat{f} = (H^t H + \lambda D^t D)^{-1} H^t g$. However, in practice this analytic solution is not usable in high-dimensional problems. In general, as the gradient can be evaluated analytically, gradient based algorithms are used.

• L1 (TV), convex but not differentiable at zero: the algorithms in this case use the notions of the Fenchel conjugate, the dual problem, the sub-gradient, the proximal operator, etc. (a numerical sketch of the L2 and L1 cases follows this list).

• Variable splitting and Augmented Lagrangian methods.
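As announced above, here is a minimal numerical sketch of the two specific cases (NumPy; all sizes, values and names are hypothetical): the closed-form L2 solution, and the L1 case solved with ISTA, one standard proximal algorithm that alternates gradient steps with soft-thresholding (the proximal operator of the L1 norm):

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 100, 80                          # data size n, number of unknowns m
H = rng.standard_normal((n, m))         # hypothetical forward model
f_true = np.zeros(m); f_true[::10] = 1.0  # sparse ground truth
g = H @ f_true + 0.01 * rng.standard_normal(n)
lam = 0.1

# (1) L2 / quadratic regularizer R(f) = ||D f||^2 with D = I:
D = np.eye(m)
f_l2 = np.linalg.solve(H.T @ H + lam * D.T @ D, H.T @ g)

# (2) L1 regularizer via ISTA: f <- prox_{lam*t*||.||_1}(f - t H^t(Hf - g))
t = 1.0 / np.linalg.norm(H, 2) ** 2     # step size 1 / ||H||^2
f_l1 = np.zeros(m)
for _ in range(500):
    grad = H.T @ (H @ f_l1 - g)
    z = f_l1 - t * grad
    f_l1 = np.sign(z) * np.maximum(np.abs(z) - lam * t, 0.0)  # soft-threshold

print("L2 error:", np.linalg.norm(f_l2 - f_true))
print("L1 error:", np.linalg.norm(f_l1 - f_true))
```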

The simple case of the Bayes rule is
$$ p(f|g) = \frac{p(g|f)\, p(f)}{p(g)}. $$
When there are some hyper-parameters $\theta$ which also have to be estimated, we have
$$ p(f, \theta|g) = \frac{p(g|f, \theta_1)\, p(f|\theta_2)\, p(\theta)}{p(g)}. $$
From that joint posterior distribution, we may also obtain the marginals
$$ p(f|g) = \int p(f, \theta|g)\, \mathrm{d}\theta \quad \text{and} \quad p(\theta|g) = \int p(f, \theta|g)\, \mathrm{d}f. $$
To be more specific, let us consider the case of linear inverse problems $g = H f + \epsilon$. Then, assuming Gaussian noise, we have $p(g|f, v_\epsilon) = \mathcal{N}(g|H f, v_\epsilon I)$. Assuming a Gaussian prior $p(f|v_f) = \mathcal{N}(f|0, v_f I)$, we see that the posterior is also Gaussian, and the MAP and Posterior Mean (PM) estimates become the same and can be computed as the minimizer of
$$ J(f) = \frac{1}{2 v_\epsilon} \|g - H f\|^2 + \frac{1}{2 v_f} \|f\|^2. $$
In summary, we have $\hat{f} = (H^t H + \lambda I)^{-1} H^t g$ with $\lambda = v_\epsilon / v_f$. For the case where the hyper-parameters $v_\epsilon$ and $v_f$ are unknown (unsupervised case), we can assign them conjugate priors and derive their posterior laws, where the expressions for $\hat{\alpha}_\epsilon, \hat{\beta}_\epsilon, \hat{\alpha}_f, \hat{\beta}_f$ can be found in [8].
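A minimal numerical sketch of this supervised Gaussian case (NumPy; sizes and variances are hypothetical), computing the posterior mean together with the posterior covariance that quantifies the remaining uncertainties:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 40
H = rng.standard_normal((n, m))             # hypothetical forward model
f_true = rng.standard_normal(m)
v_eps, v_f = 0.01, 1.0                      # noise and prior variances
g = H @ f_true + np.sqrt(v_eps) * rng.standard_normal(n)

# Posterior covariance and mean of the linear Gaussian model:
#   Sigma = (H^t H / v_eps + I / v_f)^{-1},  f_hat = Sigma H^t g / v_eps
Sigma = np.linalg.inv(H.T @ H / v_eps + np.eye(m) / v_f)
f_hat = Sigma @ H.T @ g / v_eps             # MAP = Posterior Mean (PM)
post_std = np.sqrt(np.diag(Sigma))          # remaining uncertainty per component
```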

The joint posterior can be written as
$$ p(f, v_\epsilon, v_f | g) \propto p(g|f, v_\epsilon)\, p(f|v_f)\, p(v_\epsilon)\, p(v_f). $$
From this expression, we have different possibilities:

• JMAP: alternate optimization of the joint posterior with respect to $f$, $v_\epsilon$ and $v_f$.

• Gibbs sampling MCMC: sample alternately from the conditionals $p(f | v_\epsilon, v_f, g)$, $p(v_\epsilon | f, g)$ and $p(v_f | f)$.

• Variational Bayesian Approximation (VBA): approximate $p(f, v_\epsilon, v_f | g)$ by a separable distribution $q_1(f)\, q_2(v_\epsilon)\, q_3(v_f)$.

The questions now are: can we join any of these steps? Can we go directly from the image to the decision? For the first one, the Bayesian approach can provide a solution: the main tool here is to introduce a hidden variable which can represent the segmentation. A solution is to introduce a classification hidden variable $z$ with $z_j \in \{1, 2, \cdots, K\}$. Then we have, in summary:

• p(g| f , z) does not depend on z, so it can be written as p(g| f ).

• We may choose a Markovian Potts model for p(z) to obtain more compact homogeneous regions [8,9].

This scheme can be extended to consider the estimation of the hyper-parameters too.
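As a minimal sketch of the JMAP option in the unsupervised Gaussian case (NumPy; the Inverse Gamma conjugate hyper-priors and the mode-update formulas below are a standard variant, the exact expressions being those of [8]):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 50, 40
H = rng.standard_normal((n, m))          # hypothetical forward model
g = H @ rng.standard_normal(m) + 0.1 * rng.standard_normal(n)

# Hypothetical Inverse Gamma hyper-prior parameters (weakly informative):
a_eps = b_eps = a_f = b_f = 1e-3
v_eps, v_f = 1.0, 1.0
f = np.zeros(m)
for _ in range(30):                      # JMAP: alternate the three updates
    lam = v_eps / v_f
    f = np.linalg.solve(H.T @ H + lam * np.eye(m), H.T @ g)   # argmax over f
    r = g - H @ f
    v_eps = (b_eps + 0.5 * r @ r) / (a_eps + n / 2 + 1)  # IG posterior mode
    v_f = (b_f + 0.5 * f @ f) / (a_f + m / 2 + 1)        # IG posterior mode
```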

• Possibility of estimating hyper-parameters via JMAP or VBA

• Natural ways to take account of and quantify the remaining uncertainties.

The main idea in Machine Learning is to learn from a great number of data pairs $(g_i, d_i),\ i = 1, \cdots, N$.

Interaction between Model based and Machine Learning tools

To show the possibilities of the interaction between classical and machine learning methods, let us consider a few examples. The first one is the case of linear inverse problems with quadratic regularization, or the Bayesian approach with Gaussian priors. The solution has an analytic expression
$$ \hat{f} = (H^t H + \lambda D^t D)^{-1} H^t g, $$
which can be presented schematically as one linear layer applied to the data g. The second example is the denoising problem $g = f + \epsilon$ with an L1 regularizer, or equivalently, the MAP estimator with a double exponential prior, where the solution can be obtained by a convolution followed by a thresholding [20,21]. The third example is the Joint Reconstruction and Segmentation that was presented in the previous sections.
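To illustrate the machine-learning side of the first example (NumPy; all data and sizes are hypothetical), the linear reconstruction operator can be learned directly from example pairs instead of being formed analytically, which is exactly the one-layer linear network view:

```python
import numpy as np

rng = np.random.default_rng(4)
n, m, N = 30, 20, 2000
H = rng.standard_normal((n, m))                 # hypothetical forward model
F = rng.standard_normal((m, N))                 # training "ground truths" f_i
G = H @ F + 0.05 * rng.standard_normal((n, N))  # corresponding data g_i

# Learn a linear map B with B g_i ~ f_i (least squares over the pairs);
# B plays the role of the analytic operator (H^t H + lam D^t D)^{-1} H^t:
B = F @ np.linalg.pinv(G)

g_test = H @ rng.standard_normal(m) + 0.05 * rng.standard_normal(n)
f_hat = B @ g_test                              # one linear "layer" applied to g
```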

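A minimal sketch of the second example (NumPy; the smoothing kernel and the threshold below are placeholders, the exact filter and thresholding construction being given in [20,21]): a convolution followed by a soft-thresholding, which mirrors the convolution-plus-nonlinearity structure of a CNN layer:

```python
import numpy as np

rng = np.random.default_rng(3)
f_true = np.repeat(rng.standard_normal(8), 32)   # piecewise-constant signal
g = f_true + 0.2 * rng.standard_normal(f_true.size)  # noisy observation g = f + eps

kernel = np.ones(5) / 5.0                        # hypothetical smoothing filter
smoothed = np.convolve(g, kernel, mode="same")   # convolution step
lam = 0.1
f_hat = np.sign(smoothed) * np.maximum(np.abs(smoothed) - lam, 0.0)  # thresholding
```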
If we present the different steps of reconstruction, segmentation and parameter estimation, we can