Compensation of Modeling Errors for the Aeroacoustic Inverse Problem with Tools from Deep Learning

Abstract: In the field of aeroacoustic source imaging, one seeks to reconstruct acoustic source powers from microphone array measurements. For most setups, one cannot expect a perfect reconstruction. The main effects that contribute to this reconstruction error are data noise and modeling errors. While the data noise is accounted for in most advanced reconstruction methods, e.g., by a proper regularization strategy, the modeling error is usually neglected. This article proposes an approach that extends regularized inverse methods with a mechanism that takes the modeling error into account. The presented algorithmic framework utilizes the representation of the Fast Iterative Shrinkage Thresholding Algorithm (FISTA) by a neural network and uses standard gradient schemes from the field of deep learning. It is directly applicable to a single measurement, i.e., a prior training phase on previously generated data is not required. The capabilities of the method are illustrated by several numerical examples.


Introduction
Reconstruction of acoustic sources from aeroacoustic measurements has been an active field of research for many decades [1][2][3][4][5]. As aeroacoustic measurements are usually conducted in environments that are subject to random processes (such as wind tunnels), reconstruction procedures are usually based on cross spectral data, i.e., correlations in the frequency domain. There exist various source power reconstruction methods for correlation data such as Beamforming (see, e.g., [6]), CLEAN-SC [7], DAMAS [8] or Covariance Matrix Fitting also known as CMF (see, e.g., [9,10]). In this article, we will present an approach based on a regularized version of CMF.
Usually, source power estimations can only provide an approximation of the ground truth source powers, and there are two main causes for this reconstruction error. Firstly, data noise, which can be treated by suitable regularization. Secondly, modeling errors, i.e., usage of a physical sound propagation model that does not exactly match the conditions of the measurement environment. In this article, we present an approach that extends regularized CMF with additional degrees of freedom (DOFs) in order to take modeling errors into account. Here, we will only consider additional DOFs for the phase of the sound propagation matrix, as this is much more affected by modeling errors than the amplitudes. The proposed method is based on the Fast Iterative Shrinkage Thresholding Algorithm (FISTA), originally proposed by Beck & Teboulle for ℓ1-regularized least squares [11], which is an accelerated version of the iterative shrinkage-thresholding algorithm (ISTA) (see, e.g., [12,13]). Several variants of this algorithm have also been applied in the field of aeroacoustics [14][15][16][17]. Moreover, the principle of unrolling is used here. Unrolling means that a fixed number of (F)ISTA iterations is represented by a neural network, where the number of layers matches the number of (F)ISTA iterations (see, e.g., [18]).
As we employ tools from machine learning, in particular deep learning, we would like to relate our approach to other data-driven methods in inverse problems and, in particular, acoustic source imaging. On the one hand, there exist purely data-driven approaches that fully learn the source power reconstruction [19][20][21][22]. Our approach is a hybrid one, i.e., it combines model-based methods with principles from machine learning. Other hybrid approaches in inverse problems are, for example, learned regularizers [23,24] or learned operator correction [25]. The concept of unrolling, which can also be seen as a hybrid approach, has been employed for Learned ISTA (LISTA) [26,27] and Trainable ISTA (TISTA) [28,29], which both optimize or learn some parameters of the ISTA algorithm. The latter has recently been applied to acoustic source imaging [30]. The main principle for the compensation of the modeling error effects used in this work is motivated by ideas from Regularized Total Least Squares [31,32] and the Deep Inverse Prior [33]. The regularized cost function that measures the data misfit between modeled and measured data is optimized not only with respect to the source powers but additionally with respect to sound propagation parameters. In contrast to most machine learning frameworks, this approach does not include a prior training period on previously simulated or measured training data. Hence, it can be applied to a single measurement. We would like to emphasize that in this article, we focus on the presentation of the main ideas and a proof of concept to estimate the capabilities of the method. Hence, this is only the first step towards an applicability for real measurement scenarios.

Problem Modeling
Let $c$ denote the speed of sound, $u \in \mathbb{R}^3$ a constant convection field, and $m = u/c$ the Mach vector. Moreover, we consider subsonic convection, i.e., it is assumed that $\|m\|_2 < 1$.
The standard sound propagation model that is employed in experimental aeroacoustics is the convected Helmholtz equation, which describes time harmonic sound propagation within the convection field $u$:
$$(ik + m \cdot \nabla)^2 p - \Delta p = s. \qquad (1)$$
In Equation (1), $k = \omega/c$ denotes the wavenumber, where $\omega = 2\pi f$ is the angular frequency and $f$ the frequency. Further, $p$ denotes the complex pressure field and $s$ a generic source. Note that the time factor convention $e^{+i\omega t}$ is used. The free field Green's function for the convected Helmholtz equation is given by
$$g(x, y) = \frac{e^{-i\omega\,\tau(x,y)}}{4\pi \|x - y\|_m}, \qquad \tau(x, y) = \frac{-m \cdot (x - y) + \|x - y\|_m}{c\,(1 - \|m\|_2^2)}, \qquad (2)$$
where the Mach norm is defined by
$$\|r\|_m = \sqrt{(m \cdot r)^2 + (1 - \|m\|_2^2)\,\|r\|_2^2}.$$
Although the formulation (2) ignores any geometric features of the measurement setup (e.g., presence of walls), the free field Green's function is widely used for source power reconstruction in experimental aeroacoustics as it provides robust results.

The discrete forward model is set up as follows. The map domain, i.e., the region where sources are assumed, is discretized by $N$ focus positions $y_1, \ldots, y_N$, and the positions of the array microphones are denoted by $x_1, \ldots, x_M$. The pressure signal at the array and the source signal at the focus positions are given by
$$p(\omega) = \big(p(x_1, \omega), \ldots, p(x_M, \omega)\big)^T, \qquad s(\omega) = \big(s(y_1, \omega), \ldots, s(y_N, \omega)\big)^T.$$
As the measurement environment is subject to random processes, both the pressure signal $p(\omega) \in \mathbb{C}^M$ and the source signal $s(\omega) \in \mathbb{C}^N$ are considered as random variables with zero mean. The propagation matrix $G \in \mathbb{C}^{M \times N}$, defined by
$$G_{ij} = g(x_i, y_j), \qquad (3)$$
determines the linear relation between source and array signal, $p = G s$. Taking the correlation matrix of $p$, we obtain
$$C = \mathbb{E}\big[p\,p^H\big] = G\,\mathbb{E}\big[s\,s^H\big]\,G^H,$$
where $^H$ denotes the Hermitian transpose. The matrix $C$ is usually denoted as cross spectral matrix (CSM). A standard assumption in experimental acoustics, which will also be employed here, is the assumption of spatially uncorrelated sources. This means that the correlation matrix of the source signal is given by a diagonal matrix, i.e.,
$$\mathbb{E}\big[s\,s^H\big] = \mathrm{diag}(q), \qquad q \in \mathbb{R}^N.$$
The vector $q$ will be denoted as source power vector in the following.
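As a concrete illustration of the model above, the following sketch assembles the propagation matrix from the free field Green's function (Equation (2)) and forms the CSM of spatially uncorrelated sources. This is a minimal sketch under the stated time convention e^{+iωt}; the function names (`greens_matrix`, `csm`) are our own illustrative choices, not from the article.

```python
import numpy as np

def greens_matrix(mics, focus, omega, c=343.0, mach=np.zeros(3)):
    """Free-field Green's function matrix G[i, j] = g(x_i, y_j) for the
    convected Helmholtz equation (time convention e^{+i omega t})."""
    r = mics[:, None, :] - focus[None, :, :]            # (M, N, 3) difference vectors
    mr = np.tensordot(r, mach, axes=([2], [0]))         # (M, N) projections m . (x - y)
    beta2 = 1.0 - np.dot(mach, mach)                    # 1 - ||m||^2 (positive: subsonic)
    r_m = np.sqrt(mr**2 + beta2 * np.sum(r**2, axis=2)) # Mach norm ||x - y||_m
    tau = (-mr + r_m) / (c * beta2)                     # travel times tau(x, y)
    return np.exp(-1j * omega * tau) / (4.0 * np.pi * r_m)

def csm(G, q):
    """CSM of uncorrelated sources with power vector q: C = G diag(q) G^H."""
    return (G * q) @ G.conj().T
```

Without convection (m = 0), the Mach norm reduces to the Euclidean distance and the familiar free-field monopole amplitude 1/(4π‖x − y‖) is recovered.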
Finally, the discrete inverse problem can be stated as follows: given a CSM $C \in \mathbb{C}^{M \times M}$, find a source power vector $q \in \mathbb{R}^N$ such that
$$G\,\mathrm{diag}(q)\,G^H = C.$$
For the further analysis, we introduce the discrete forward operator
$$A : \mathbb{R}^N \to \mathbb{C}^{M \times M}, \qquad A(q) = G\,\mathrm{diag}(q)\,G^H, \qquad (4)$$
and its adjoint operator
$$A^* : \mathbb{C}^{M \times M} \to \mathbb{R}^N, \qquad \big(A^*(C)\big)_j = g_j^H\, C\, g_j, \qquad (5)$$
where $g_j$ denotes the $j$-th column of $G$.
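The forward operator and its adjoint can be realized without ever forming diag(q) explicitly; the following sketch (names `forward`/`adjoint` are ours) scales the columns of G and evaluates the quadratic forms g_j^H C g_j in one einsum:

```python
import numpy as np

def forward(G, q):
    """Forward operator: A(q) = G diag(q) G^H, via column scaling of G."""
    return (G * q) @ G.conj().T

def adjoint(G, C):
    """Adjoint operator: (A* C)_j = g_j^H C g_j (real for Hermitian C)."""
    return np.real(np.einsum('ij,ik,kj->j', np.conj(G), C, G))
```

The two routines satisfy the defining adjoint relation ⟨A(q), C⟩_F = ⟨q, A*(C)⟩ for Hermitian C, which is a convenient correctness check.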

Source Power Reconstruction with FISTA
For real experimental data, only a noisy approximation $C^{obs}$ of the true CSM $C$ is available. Therefore, an appropriate source power reconstruction technique is the following regularized version of Cross Spectral Matrix Fitting (CMF) [9,10]:
$$\hat{q}^{(\alpha_1, \alpha_2)} = \operatorname*{argmin}_{q \ge 0} \; \tfrac{1}{2}\,\big\|A(q) - C^{obs}\big\|_F^2 + \mathcal{R}(q), \qquad (6)$$
where the penalty functional $\mathcal{R}$ is given by
$$\mathcal{R}(q) = \alpha_1 \|q\|_1 + \tfrac{\alpha_2}{2}\,\|q\|_2^2,$$
and $\alpha_1, \alpha_2 \ge 0$ denote the regularization parameters. The ℓ1 penalty promotes a sparse source power vector, which is a realistic assumption for many aeroacoustic measurements. The ℓ2 penalty ensures that the minimization problem has a unique solution. The objective function in Equation (6) can be efficiently minimized using the framework of the generalized Fast Iterative Shrinkage Thresholding Algorithm (FISTA) (p. 291, [34]). The principle of the FISTA algorithm is to repeat an alternating application of the following two operations:
• A gradient step with respect to the first summand of the objective function;
• Application of the proximal mapping (see Definition 6.1, p. 129, [34]) of the regularization part.
An implementation of this generic method for the specific problem (6) is given in Algorithm 1.
We conclude this section with some remarks on the implementation.
• The most expensive step in Algorithm 1 is the evaluation of the operators $A$ and $A^*$, which corresponds to lines 7 and 8. By exploiting the multiplicative structure (see Equations (4) and (5)), this can be implemented efficiently (see [35]).
• To ensure convergence, the step size $\tau$ must satisfy $\tau \le 1/\|A\|^2$, where the upper bound may be estimated, e.g., by the power method (see p. 239, [36]).
• As the observed CSM $C^{obs}$ is Hermitian, the upper diagonal part can be neglected. Therefore, $S$ denotes the index set of the lower triangular part of the CSM. The operation $\mathrm{tril}_S(\cdot)$ sets all entries to zero that do not belong to $S$. Moreover, the principle of diagonal removal can be easily incorporated by defining $S$ as the lower triangular indices without the diagonal.
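The remarks above translate into code roughly as follows. This is a sketch of Algorithm 1 for problem (6) under our own naming, with the step size bound estimated by the power method as suggested; the tril_S masking is omitted for brevity, i.e., the full CSM is fitted.

```python
import numpy as np

def forward_op(G, q):
    return (G * q) @ G.conj().T                      # A(q) = G diag(q) G^H

def adjoint_op(G, C):
    return np.real(np.einsum('ij,ik,kj->j', np.conj(G), C, G))  # (A* C)_j = g_j^H C g_j

def prox_R(w, tau, a1, a2):
    """Proximal map of R(q) = a1 ||q||_1 + (a2/2) ||q||_2^2 restricted to q >= 0."""
    return np.maximum(w - tau * a1, 0.0) / (1.0 + tau * a2)

def fista_cmf(G, C_obs, a1, a2, n_iter=100):
    """Sketch of Algorithm 1: FISTA applied to the regularized CMF problem (6)."""
    N = G.shape[1]
    # Estimate ||A||^2 (the Lipschitz constant of the data-misfit gradient)
    # by the power method on A* A, then pick a safe step size tau.
    z = np.ones(N)
    for _ in range(50):
        z = adjoint_op(G, forward_op(G, z))
        z /= np.linalg.norm(z)
    L = float(z @ adjoint_op(G, forward_op(G, z)))
    tau = 0.9 / L
    q = q_old = np.zeros(N)
    t = 1.0
    for _ in range(n_iter):
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        v = q + (t - 1.0) / t_new * (q - q_old)              # momentum extrapolation
        grad = adjoint_op(G, forward_op(G, v) - C_obs)       # gradient of the misfit
        q, q_old, t = prox_R(v - tau * grad, tau, a1, a2), q, t_new
    return q
```

On exact synthetic data, the iterates approach a nonnegative source power vector whose modeled CSM matches the observation closely.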

Optimization of Phase Modeling Parameters
For most measurement scenarios, the free-field sound propagation matrix $G$ (Equations (2) and (3)) is only an approximation of the true sound propagation. In this section, we will derive a framework that adds more degrees of freedom to the reconstruction process in order to compensate for this modeling error. Note that the propagation matrix $G$ can be expressed as
$$G = R \odot e^{i\Phi},$$
where $R, \Phi \in \mathbb{R}^{M \times N}$ are the amplitude and phase matrices, respectively, $\odot$ denotes the component-wise or Hadamard product, and the exponential also operates component-wise.
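The decomposition can be read off componentwise, e.g., with the following two-line sketch (our naming):

```python
import numpy as np

def split_amplitude_phase(G):
    """Componentwise decomposition G = R * exp(i * Phi)."""
    return np.abs(G), np.angle(G)

def recombine(R, Phi):
    """Hadamard product of the amplitudes with the componentwise phase factor."""
    return R * np.exp(1j * Phi)
```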
For the entire article, we assume that the amplitude matrix $R$ is sufficiently well known but the phase matrix $\Phi$ is perturbed by a modeling error. This is justified because, in most measurement scenarios, the modeling of the amplitudes is much more accurate than the modeling of the phase.

Unrolled FISTA
Let $R$ be given and fixed; then for each phase matrix $\Phi$, we can define the corresponding Tikhonov minimizer
$$\hat{q}(\Phi) = \operatorname*{argmin}_{q \ge 0} \; \tfrac{1}{2}\,\big\|A_\Phi(q) - C^{obs}\big\|_F^2 + \mathcal{R}(q),$$
where $A_\Phi$ denotes the forward operator (4) with propagation matrix $G = R \odot e^{i\Phi}$. As outlined in the last section, this minimizer can be efficiently approximated by $n_{iter}$ steps of the FISTA algorithm (see Algorithm 1). Moreover, the FISTA algorithm with a fixed number of iterations can be regarded as the application of a neural network $F$ with $n_{iter}$ layers. The general principle of representing iterative optimization algorithms as neural networks is usually referred to as algorithm unrolling. For an extensive review on the principle of algorithm unrolling, we refer to the review article [18]. The specific case that is employed here (unrolling the FISTA algorithm) will be described in detail below.
For the computation of one FISTA iteration, one needs the pair of the last two iterates, denoted by $(q, q')$ in the following. Lines 6–8 in Algorithm 1 perform the accelerated gradient step, and the computation of each line can be represented by a corresponding function $\mathrm{ag}_1$, $\mathrm{ag}_2$, $\mathrm{ag}_3$. The entire accelerated gradient step can be represented by a concatenation of the functions defined above:
$$\mathrm{ags}\big((q, q'), R, \Phi, \tau, \beta, C^{obs}\big) = \Big(\mathrm{ag}_3\big(\mathrm{ag}_1(q, q', \beta),\; \mathrm{ag}_2(v, R, \Phi),\; R, \Phi, \tau, C^{obs}\big),\; q\Big). \qquad (8)$$
Note that the function ags (8) is linear with respect to its first argument. In the context of neural networks, line 9 of Algorithm 1 can be interpreted as the application of a nonlinear activation function. For a pair of vectors $(w, q)$, we therefore define the activation function of the neural network as
$$\mathrm{proxAc}\big((w, q), \tau, \alpha_1, \alpha_2\big) = \big(\mathrm{prox}_{\tau \mathcal{R}}(w),\; q\big). \qquad (9)$$
Note that the activation function proxAc (9) applies the nonlinear proximal mapping on the first component $w$ and simply keeps the second component $q$, as the previous two iterates are needed for the next FISTA iteration.
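The pair dynamics of (8) and (9) can be sketched as follows, with the three sub-steps written out explicitly. The function bodies are our reading of lines 6–8 of Algorithm 1 (momentum extrapolation, residual, gradient step), not code from the article:

```python
import numpy as np

def ags(pair, A, A_adj, tau, beta, C_obs):
    """Accelerated gradient step (8) on the iterate pair (q, q_prev)."""
    q, q_prev = pair
    v = q + beta * (q - q_prev)       # ag1: momentum extrapolation (line 6)
    r = A(v) - C_obs                  # ag2: data residual (line 7)
    w = v - tau * A_adj(r)            # ag3: gradient step (line 8)
    return w, q                       # keep q as the new "previous" iterate

def proxAc(pair, tau, a1, a2):
    """Activation (9): prox on the first component, pass-through on the second."""
    w, q = pair
    return np.maximum(w - tau * a1, 0.0) / (1.0 + tau * a2), q
```

With identity operators in place of A and A*, a single step is easy to verify by hand, which makes the pair bookkeeping transparent.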
With the propagation function ags (8) and the activation function proxAc (9), we can interpret the FISTA algorithm as a feed-forward neural network $F$ with the following characteristics:
• The set of trainable parameters of $F$ are the entries of the phase matrix $\Phi$. These are shared across all layers. Note that the split into trainable and non-trainable parameters can be varied, but we will restrict our investigations to the case where $\Phi$ is trainable.
• The starting value $q^{(0)}$ is the data input of the neural network and is transformed to the pair $(q^{(0)}, q^{(0)})$ before the first layer.
• The network consists of $n_{iter}$ layers, where each layer represents one FISTA iteration. In each layer, the following two operations are applied to the pair $(q, q')$:
  1. The current pair is propagated forward by linear operations, encoded in ags (8), which essentially depend on the trainable parameters $\Phi$ and the non-trainable parameters $R, C^{obs}, \tau, \beta_n, \alpha_1, \alpha_2$.
  2. After the propagation step, the activation function proxAc (9) is applied.
• After the last layer, the output pair $(q^{(n_{iter})}, q^{(n_{iter}-1)})$ is transformed to the final output $q^{(n_{iter})}$.
A sketch of the framework of the neural network F is presented in Figure 1.

Constrained Residual Minimization
In a scenario where the phase of the propagation is subject to a modeling error, this will affect the accuracy of the Tikhonov minimizer $\hat{q}(\Phi)$. To account for this modeling error in the minimization process, we consider the following constrained problem:
$$\min_{\Phi} \; \tfrac{1}{2}\,\big\|A_\Phi(q) - C^{obs}\big\|_F^2 \quad \text{subject to} \quad q = \hat{q}(\Phi).$$
As discussed above, the constraint can be approximated by the unrolled network,
$$\hat{q}(\Phi) \approx F_\Phi\big(q^{(0)}\big). \qquad (10)$$
Using the approximation (10) and replacing the constraint, we get the following unconstrained problem:
$$\min_{\Phi} \; J(\Phi), \qquad J(\Phi) = \big\|A_\Phi\big(F_\Phi(q^{(0)})\big) - C^{obs}\big\|_F^2. \qquad (11)$$
Note that we also omitted the factor $\frac{1}{2}$ in front of the residual as it does not affect minimizers. The problem (11) is a minimization problem with respect to the trainable neural network parameters $\Phi$. Hence, the loss function in (11) can be minimized by standard tools (i.e., gradient descent schemes) for the training of neural networks. A generic gradient descent scheme $\mathcal{G}$ receives the current network parameters, the gradient of the cost function with respect to the network parameters, the learning rate, and potentially other parameters as input, and returns the updated network parameters. Typical examples of such schemes $\mathcal{G}$ are standard gradient descent, gradient descent with momentum, or ADAM [37]. The whole procedure to minimize $J$ with a generic gradient descent optimizer $\mathcal{G}$ is summarized in Algorithm 2.
Algorithm 2: Gradient descent minimization scheme for the solution of (11).
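A compact end-to-end sketch of this procedure is given below. The unrolled network is a plain NumPy forward pass; since the article relies on TensorFlow's automatic differentiation, which we do not reproduce here, the gradient of the loss (11) with respect to Φ is approximated by finite differences purely for illustration (far too slow for realistic problem sizes). All names are our own.

```python
import numpy as np

def forward_A(R, Phi, q):
    G = R * np.exp(1j * Phi)                     # G = R ⊙ exp(i Phi)
    return (G * q) @ G.conj().T                  # A_Phi(q)

def adjoint_A(R, Phi, C):
    G = R * np.exp(1j * Phi)
    return np.real(np.einsum('ij,ik,kj->j', np.conj(G), C, G))

def unrolled_fista(R, Phi, C_obs, tau, a1, a2, n_layers):
    """Forward pass of the network F_Phi: n_layers FISTA iterations from q = 0."""
    q = q_old = np.zeros(R.shape[1])
    t = 1.0
    for _ in range(n_layers):
        t_new = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t * t))
        v = q + (t - 1.0) / t_new * (q - q_old)
        w = v - tau * adjoint_A(R, Phi, forward_A(R, Phi, v) - C_obs)
        q, q_old, t = np.maximum(w - tau * a1, 0.0) / (1.0 + tau * a2), q, t_new
    return q

def loss_J(R, Phi, C_obs, tau, a1, a2, n_layers):
    """Cost function (11): squared residual of the unrolled reconstruction."""
    q = unrolled_fista(R, Phi, C_obs, tau, a1, a2, n_layers)
    return np.linalg.norm(forward_A(R, Phi, q) - C_obs) ** 2

def optimize_phase(R, Phi0, C_obs, tau, a1, a2, n_layers, lr, n_grad, h=1e-6):
    """Algorithm 2 with plain gradient descent as the generic scheme G."""
    Phi = Phi0.copy()
    for _ in range(n_grad):
        J0 = loss_J(R, Phi, C_obs, tau, a1, a2, n_layers)
        grad = np.zeros_like(Phi)
        for idx in np.ndindex(*Phi.shape):       # finite-difference gradient
            Phi_h = Phi.copy()
            Phi_h[idx] += h
            grad[idx] = (loss_J(R, Phi_h, C_obs, tau, a1, a2, n_layers) - J0) / h
        Phi = Phi - lr * grad
    return Phi
```

Swapping the finite-difference loop for autodiff (e.g., `tf.GradientTape` over the same forward pass) recovers the efficient implementation route described in the article.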

Numerical Examples
To examine the performance of the designed problem (11), we consider a simple numerical example with four monopole sources. This setup has also been used in previous publications for synthetic benchmark computations [38]. The microphone array has an aperture of $d = 1$. The exact correlation matrix, propagation matrix and source power vector are denoted by $C^{exact} \in \mathbb{C}^{M \times M}$, $G^{exact} \in \mathbb{C}^{M \times N}$ and $q^\dagger \in \mathbb{R}^N$. In order to also generate noisy data $C^{obs} \in \mathbb{C}^{M \times M}$, we draw $n_{samp} = 1000$ pressure samples according to
$$p^{(j)} = G^{exact}\big(\sqrt{q^\dagger} \odot \varepsilon^{(j)}\big) + \delta\,\eta^{(j)}, \qquad j = 1, \ldots, n_{samp}. \qquad (12)$$
In (12), $\odot$ denotes pointwise multiplication, and the vectors $\eta^{(j)} \in \mathbb{C}^M$, $\varepsilon^{(j)} \in \mathbb{C}^N$ are sampled independently from vector-valued, standard complex normal random variables $\eta$ and $\varepsilon$. The level of the additive noise $\delta$ is chosen such that, for $n_{samp} \to \infty$, the average relative perturbation of the diagonal of the correlation matrix is approximately 5%. Noisy data are then obtained by averaging over the samples,
$$C^{obs} = \frac{1}{n_{samp}} \sum_{j=1}^{n_{samp}} p^{(j)}\,\big(p^{(j)}\big)^H.$$
The neural network $F_\Phi$ is set up using the fixed source input $q^{(0)} = (0, \ldots, 0)^T$, $n_{iter} = 100$ layers (i.e., FISTA iterations) and regularization parameters $\alpha_1 = 10^{-5}$, $\alpha_2 = 10^{-7}$ for all examples. Figure 2 shows the exact source powers and the reconstructed source powers for He = 16 using 100 FISTA iterations and the exact propagation matrix $G^{exact}$. Reconstruction with the correct propagation model leads to a reconstruction error of 7.49.
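The snapshot model (12) and the sample CSM can be generated along the following lines (a sketch with our own naming; `complex_normal` is normalized so that E[εε^H] = I):

```python
import numpy as np

def complex_normal(rng, shape):
    """Standard complex normal samples: E[z] = 0, E[z conj(z)] = 1."""
    return (rng.standard_normal(shape) + 1j * rng.standard_normal(shape)) / np.sqrt(2.0)

def sample_csm(G, q, delta, n_samp, rng):
    """Draw n_samp snapshots p_j = G (sqrt(q) * eps_j) + delta * eta_j
    and average the rank-one outer products into the observed CSM."""
    M, N = G.shape
    eps = complex_normal(rng, (n_samp, N))       # source excitations
    eta = complex_normal(rng, (n_samp, M))       # additive data noise
    P = (eps * np.sqrt(q)) @ G.T + delta * eta   # rows are the snapshots p_j
    return P.T @ P.conj() / n_samp               # (1/n) sum_j p_j p_j^H
```

For delta = 0 and large n_samp, the sample CSM converges to G diag(q) G^H, so the generator can be validated against the exact forward model.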

Systematic Modeling Error
As a first example, we consider a systematic phase modeling error by slightly perturbing the Mach vector by 5% from $m = (0.2, 0, 0)^T$ to $m = (0.19, 0, 0)^T$. The resulting erroneous phase matrix is denoted by $\Phi^{syst}$. The residual optimization (11) is done using Python and TensorFlow [39], where the used hyperparameter values are summarized in Table 1. Figure 3a shows the result of the FISTA reconstruction with the systematically perturbed propagation matrix $R \odot e^{i\Phi^{syst}}$, and Figure 3b the result of the residual minimization (see (11)) with initial network parameters $\Phi^{syst}$. We observe that the perturbed flow magnitude leads to an error in the source location. As the source grid is relatively coarse, the main peak of each source is distributed over two source grid points. This effect of the systematic model error cannot be compensated by the neural network method, i.e., both methods produce a reconstruction error of the same order of magnitude.

Random Modeling Error
Secondly, we consider an example with a random perturbation of the phase of the propagation matrix. Such deviations from the true propagation model may occur, e.g., due to small measurement errors of the microphone positions. Recall the travel times $\tau_{ij} = \tau(x_i, y_j)$ between microphone and source positions (see Equation (2)). By means of the average travel time $\bar{\tau} = \frac{1}{MN}\sum_{i,j} \tau_{ij}$, the perturbed phase matrix is defined as
$$\Phi^{rand} = \Phi^{exact} + \sigma\,\omega\bar{\tau}\,E,$$
where $E$ is a matrix-valued standard Gaussian random variable and $\sigma$ denotes the noise power. The noise power level $\sigma$ is chosen such that the perturbation of the propagation matrix has the same order of magnitude for both noise categories (systematic and random). Figure 4 shows the reconstruction results for an exemplary realization of the randomly perturbed phase matrix $\Phi^{rand}$ along with the result of the neural network residual minimization (see (11)). In contrast to the systematic error case, the setup is now able to strongly improve the reconstruction result. The final error is of the same order of magnitude as for the FISTA reconstruction with the correct propagation matrix. As this example yields promising results, we consider a small parameter study for three Helmholtz numbers He ∈ {8, 16, 32} and $n_{avg} = 100$ random noise realizations each. To evaluate the capabilities of the neural network residual minimization, several statistical measures will be considered. Firstly, the $j$-th perturbed phase realization after the $n$-th gradient step is denoted by $\Phi(j, n)$ for $j = 1, \ldots, n_{avg}$; $n = 0, \ldots, n_{grad}$.
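The random perturbation can be sketched as follows. For small σ, an additive Gaussian phase perturbation Φ + σE induces a relative Frobenius perturbation of G = R ⊙ e^{iΦ} of approximately σ itself (since |e^{i(φ+σe)} − e^{iφ}| = 2|sin(σe/2)| ≈ σ|e|), which makes σ a convenient knob for matching the systematic case; the scaling by the average travel time used in the article is omitted in this simplified sketch.

```python
import numpy as np

def perturb_phase(Phi, sigma, rng):
    """Phi_rand = Phi + sigma * E with E a standard Gaussian matrix."""
    return Phi + sigma * rng.standard_normal(Phi.shape)
```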
Refer to Equation (11) for the definition of the cost function $J$. The average cost over all noise realizations after $n$ gradient descent steps and the reference cost for the reconstruction with the correct propagation matrix are given by
$$\mathrm{cost}(n) = \frac{1}{n_{avg}} \sum_{j=1}^{n_{avg}} J\big(\Phi(j, n)\big), \qquad \mathrm{cost}^{ref} = J\big(\Phi^{exact}\big).$$
As a measure of variation among the ensemble, we consider the average positive and negative deviations $\mathrm{cost}^+_{dev}(n)$ and $\mathrm{cost}^-_{dev}(n)$ of the cost values from the mean for each gradient descent step. Similarly, $\mathrm{err}(n)$ denotes the average reconstruction error of the source power vectors over all noise realizations after $n$ gradient descent steps in (11), with ensemble deviations $\mathrm{err}^\pm_{dev}(n)$. Figure 5 shows the previously introduced statistical measures for each gradient descent step and for Helmholtz numbers He ∈ {8, 16, 32}. For all cases, the mean cost and mean error reach the optimal level of the reference cost and error (dotted line), respectively, after 50–100 gradient descent steps. This shows that, for the chosen parameter setup, the method is able to recover solutions of the same error level, starting with an erroneous propagation matrix, as source power reconstruction with the correct propagation matrix. Moreover, the method seems to be robust with respect to additional gradient descent iterations. After the optimal cost and error level are reached, the values stagnate there.
Even though this is still a rather simple synthetic example, the robustness and accuracy of the results with respect to the reconstruction of $q$ are remarkable. The setup considers $M = 64$ microphones, i.e., $M(M+1)/2 = 2080$ correlation datapoints, and $N = 441$ focus points. This leads to $M \cdot N = 28{,}224$ degrees of freedom (DOFs) for the phase perturbation within the residual minimization. Hence, even for this scenario, problem (11) is heavily underdetermined. Therefore, one should not expect to recover the correct phase matrix $\Phi^{exact}$, at least not in such a setup with many more DOFs than datapoints.

Figure 5. Cost and error graphs for several Helmholtz numbers. The colored area indicates the intervals $\mathrm{cost}(n) \pm 2\,\mathrm{cost}^\pm_{dev}(n)$ and $\mathrm{err}(n) \pm 2\,\mathrm{err}^\pm_{dev}(n)$.

Conclusions
In this article, we suggested a framework that accounts for modeling errors in the aeroacoustic source power reconstruction problem. We presented an approach that extends acoustic source power reconstruction based on the FISTA algorithm with additional degrees of freedom (DOFs) that allow a variation of the modeled sound propagation. We restricted our investigations to a variation of the phase of the propagation matrix, as this is usually much more affected by the modeling error than the amplitude. Our framework uses the unrolling principle, which represents the FISTA optimization by a neural network. The actual optimization with respect to the phase parameters can then be accomplished by standard gradient descent schemes from deep learning. In principle, the neural network representation is not explicitly needed to define the proposed method. However, the great advantage of such a representation is that it leads to a straightforward and efficient implementation using the automatic differentiation abilities of deep learning software packages.
Our results should be seen as a proof of concept for the suggested algorithmic framework. Certainly, this is still work in progress and several more steps are needed to make it usable for experimental data.
The numerical examples show that this approach has the potential to improve source power reconstruction results that are subject to modeling error effects. For random phase perturbations, the algorithm yields very good results in the chosen setup. However, this was a rather friendly setup, where the number of DOFs was approximately 10 times larger than the number of correlation datapoints. For realistic measurement scenarios, this relation becomes even worse. Therefore, in order to move to an application on real experimental datasets, one has to reduce the number of DOFs in the method. Here, we used a brute force strategy where each phase parameter could be varied independently. Alternatively, one may choose a physically motivated parametric phase perturbation model with P parameters, where P ≪ M · N. The examples with the systematic phase perturbation show that the brute force approach is not well suited if the true perturbation is based on far fewer DOFs, in this case only one. Further research on this algorithmic approach should examine the performance of the presented framework using parametric perturbations based on the variation of physical parameters such as microphone positions, angle of attack, free stream velocity or speed of sound.