A Physics-Informed Neural Network Approach for Nearfield Acoustic Holography

In this manuscript, we describe a novel methodology for nearfield acoustic holography (NAH). The proposed technique is based on convolutional neural networks, with autoencoder architecture, to reconstruct the pressure and velocity fields on the surface of the vibrating structure using the sampled pressure soundfield on the holographic plane as input. The loss function used for training the network is based on a combination of two components. The first component is the error in the reconstructed velocity. The second component is the error between the sound pressure on the holographic plane and its estimate obtained from forward propagating the pressure and velocity fields on the structure through the Kirchhoff–Helmholtz integral; thus, bringing some knowledge about the physics of the process under study into the estimation algorithm. Due to the explicit presence of the Kirchhoff–Helmholtz integral in the loss function, we name the proposed technique the Kirchhoff–Helmholtz-based convolutional neural network, KHCNN. KHCNN has been tested on two large datasets of rectangular plates and violin shells. Results show that it attains very good accuracy, with a gain in the NMSE of the estimated velocity field that can top 10 dB, with respect to state-of-the-art techniques. The same trend is observed if the normalized cross correlation is used as a metric.


Introduction
Nearfield acoustic holography (NAH) [1,2] is an interesting acoustic-based technique for the contactless analysis of vibrating structures, such as plates and shells. NAH represents an appealing alternative to vibrational analysis carried out with accelerometric sensors when, for example, the structure under analysis is particularly fragile or the deployment of accelerometers is not feasible. Contactless analysis is also preferred when lightweight objects are considered, since no additional mass needs to be added. Differently from contactless optical techniques, e.g., laser Doppler vibrometer (LDV), NAH can be employed with objects made of reflective materials.
NAH estimates the velocity field of a vibrating structure starting from acoustic measurements acquired in its proximity. The sound pressure is typically captured by a microphone array deployed on a plane, known as the holographic plane. The holographic plane is close to the vibrating surface in order to measure the evanescent waves, which are confined in the proximity of the structure [2]. To estimate the velocity field of the source from the pressure on the holographic plane, NAH relies on the inversion of the well-known Kirchhoff-Helmholtz (KH) integral [2,3]. As a matter of fact, the KH equation relates the normal velocity of a vibrating surface to the acoustic pressure generated by the vibration. Sometimes NAH is also cast as a sound field reconstruction problem [4] for applications in the field of source characterization [5] and sound field navigation [6][7][8]. The inversion of the KH integral, targeted by NAH, is known to be a highly ill-conditioned prior information about the physical problem. Indeed, SRCNN acts as an image mapping between the input and output learned from the specific training set without considering any physical information.
In this manuscript, we propose a novel approach to solve the NAH problem. The goal of this work is twofold. On the one hand, we combined the advantages of deep learning solutions, in particular CNNs, with prior knowledge coming from the physical model that governs NAH, i.e., the KH integral. On the other hand, we built an architecture able to estimate both the magnitude and the phase information of the normal velocity field on the vibrating surface.
The proposed model consists of two main blocks. The first takes the form of a CNN with one input, the hologram pressure field, and two outputs, i.e., the pressure and velocity fields on the vibrating surface. The second block then propagates the two network outputs with the Kirchhoff-Helmholtz model in order to provide an estimate of the pressure at the hologram. For this reason, we called the devised architecture Kirchhoff-Helmholtz-based CNN (KHCNN).
We focused on the velocity analysis on rectangular plates and violin top plates starting from a corrupted version of acoustic pressures at the hologram plane with additive noise. Moreover, in order to consider scenarios compatible with experimental measurements, we performed NAH starting from a low number of pressure points at the hologram, namely the number of microphone sensors.
The proposed method is validated by comparing the predicted vibrational fields with the ground truth and with the estimates given by CESM [20]. Moreover, we compared its performance with the fully data-driven approach of SRCNN-based NAH presented in [31]. Simulation results confirm the effectiveness of the proposed KHCNN approach to NAH. In particular, the presence of the physical model block improves the velocity accuracy of the network estimates.
It is worth noticing that the dataset that we proposed does not present a Gaussian distribution of the data [32]. Interestingly, the devised approach is able to accurately model this variability of the dataset.
The paper is structured as follows. Section 2 presents the data model of the problems. In Section 3, the mathematical formulation adopted and the overall methodology are introduced. The description of the proposed KHCNN along with the training procedure is reported in Section 4. Section 5 presents the generation of the simulated datasets. The validation results and the comparison with the state-of-the-art approaches are reported in Section 6. A discussion of the available experiments is presented in Section 7. Finally, Section 8 draws some final conclusions.

Data Model of the Mechanical Vibration
The characterization of a vibrating structure requires the knowledge of its structural dynamic properties. An essential piece of information is represented by the modes of vibration. Modes, also called eigenmodes, are natural patterns of deformation that occur in objects during vibrations. They are associated with the modal frequencies or eigenfrequencies. Indeed, at these specific frequencies, the structural vibrations produce a stationary wave, the so-called mode shape. These vibrational patterns are characterized by nodes and anti-nodes. The former are points where no displacement of the structure is observed, whereas at the latter, maximum deformation occurs.
It is worth noticing that modes are inherent properties of a structure; thus, they do not depend on external forces or loads acting on the structure. Modes depend only on the object geometry, the material properties (i.e., mass, stiffness, damping properties) and the boundary conditions (BCs) applied to the structure (i.e., simply supported, free or clamped BCs) [33].
Conversely, when a structure is excited by external forces, its vibration results in the Operational Deflection Shape (ODS) [34]. In particular, the ODS represents a combination of modes giving a general description of the harmonic evolution of the displacement over the surface. Unlike mode shapes, the ODS depends on the excitation point, the load applied to the structure, and the frequency content of the excitation signal [34].

Data Model of the Acoustic Behavior
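The pressure radiated by the points $\mathbf{s}$ belonging to a vibrating surface $S$ and measured at a point $\mathbf{r}$ is predicted by the Kirchhoff-Helmholtz (KH) integral [2], i.e.,
\[ \alpha(\mathbf{r})\, p(\mathbf{r}, \omega) = \int_S \left( p(\mathbf{s}, \omega)\, \frac{\partial}{\partial n} g_\omega(\mathbf{r}, \mathbf{s}) - g_\omega(\mathbf{r}, \mathbf{s})\, \frac{\partial}{\partial n} p(\mathbf{s}, \omega) \right) dS, \tag{1} \]
where $\omega$ is the angular frequency of vibration, $p(\cdot, \omega)$ is the pressure field, $n$ is the outward normal vector and $g_\omega(\mathbf{r}, \mathbf{s})$ is the free-field Green's function from $\mathbf{s}$ to $\mathbf{r}$, namely
\[ g_\omega(\mathbf{r}, \mathbf{s}) = \frac{1}{4\pi} \frac{e^{-j \frac{\omega}{c} \lVert \mathbf{r} - \mathbf{s} \rVert}}{\lVert \mathbf{r} - \mathbf{s} \rVert}, \tag{2} \]
with $c$ the sound speed in the air and $j$ the imaginary unit. Notice that the computation of (1) depends on the parameter $\alpha(\mathbf{r})$, which is determined by the position of the radiation point $\mathbf{r}$:
\[ \alpha(\mathbf{r}) = \begin{cases} 1, & \mathbf{r} \text{ outside } S, \\ 1/2, & \mathbf{r} \text{ on } S, \\ 0, & \mathbf{r} \text{ inside } S. \end{cases} \tag{3} \]
Moreover, (1) has to satisfy the Sommerfeld condition, which gives a boundary condition at infinity [2,35], namely
\[ \lim_{\lVert \mathbf{r} \rVert \to \infty} \lVert \mathbf{r} \rVert \left( \frac{\partial}{\partial \lVert \mathbf{r} \rVert} p(\mathbf{r}, \omega) + j \frac{\omega}{c}\, p(\mathbf{r}, \omega) \right) = 0. \tag{4} \]
Euler's equation [2] defines a fundamental relation between the pressure and the normal velocity and writes
\[ j \omega \rho_0\, v_n(\mathbf{s}, \omega) = -\frac{\partial}{\partial n} p(\mathbf{s}, \omega), \tag{5} \]
where $\rho_0$ is the mass density of the medium, which for the air medium at 20°C is $\rho_0 \approx 1.225\ \mathrm{kg \cdot m^{-3}}$, and $v_n(\mathbf{s}, \omega)$ is the normal velocity at the point $\mathbf{s}$. By substituting (5) in (1), we can derive a different formulation of the KH integral equation for the exterior radiation problem of a vibrating structure:
\[ \alpha(\mathbf{r})\, p(\mathbf{r}, \omega) = \int_S \left( p(\mathbf{s}, \omega)\, \frac{\partial}{\partial n} g_\omega(\mathbf{r}, \mathbf{s}) + j \omega \rho_0\, v_n(\mathbf{s}, \omega)\, g_\omega(\mathbf{r}, \mathbf{s}) \right) dS. \tag{6} \]
Thanks to this formulation of the KH integral, we can compute the pressure radiated by a vibrating source starting from the knowledge of the pressure and the normal velocity fields on the object's surface.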

Notation in Nearfield Acoustic Holography
In NAH, the soundfield on the holographic plane H is acquired through a microphone array near the object. In a Cartesian coordinate system, a general setup for performing NAH is shown in Figure 1, where H is horizontal and its z coordinate is z H .
In solid media, the vibrating structure radiates sound at frequencies higher than the cutoff frequency [2]. At frequencies lower than the cutoff, the soundfield decays exponentially with z, generating the evanescent waves. For this reason, the near-field condition [2] is an essential requirement in NAH for capturing all the velocity field components. In the NAH context, we are interested in solving the inverse propagation problem. In practice, this boils down to the inversion of (6), i.e.,
\[ \hat{v}_n(\mathbf{s}, \omega) = F^{-1}\big( p(\mathbf{r}, \omega) \big), \quad \mathbf{r} \in H, \tag{7} \]
where F is a discrete estimator that approximates the soundfield on the hologram plane. However, the inverse propagation problem (7) is highly ill-conditioned, thus requiring a regularization procedure. In this work, F −1 takes the form of a CNN. From the input pressure field at the hologram, the devised network is able to estimate the velocity field on the vibrating surface, thus avoiding explicit matrix inversions.

Problem Formulation
In this manuscript, we present a novel approach to NAH that combines the advantages of deep learning [23] with the physical model of acoustic propagation (6). The underlying physical model allows us to enrich the recent data-driven NAH approaches in [24,31] with an estimate of the complex velocity field on the object's surface (i.e., both magnitude and phase information) to better characterize the vibrational behavior of the source.
Let us now consider a sampled version of the radiated pressure field acquired through a uniform planar microphone array placed on the horizontal holographic plane H. Hence, the hologram pressure field in matrix form is
\[ \mathbf{P}_H(\omega) = \big[ p(\mathbf{r}_{m_1 m_2}, \omega) \big] \in \mathbb{C}^{M_1 \times M_2}, \tag{8} \]
where the microphones are located at $\mathbf{r}_{m_1 m_2}$ with $m_1 = 1, \dots, M_1$ and $m_2 = 1, \dots, M_2$. $M_1$ and $M_2$ are the number of points in the array along the y and x axes, respectively. Similarly, we can define a sampled version of the normal velocity field and of the pressure field on the object's surface in matrix form as
\[ \mathbf{V}(\omega) = \big[ v_n(\mathbf{s}_{n_1 n_2}, \omega) \big] \odot \mathbf{B}, \tag{9} \]
\[ \mathbf{P}_S(\omega) = \big[ p(\mathbf{s}_{n_1 n_2}, \omega) \big] \odot \mathbf{B}, \tag{10} \]
where $\odot$ is the Hadamard product and $\mathcal{M}$ is a rectangular mesh grid on the source, with samples at $\mathbf{s}_{n_1 n_2}$, $n_1 = 1, \dots, N_1$ and $n_2 = 1, \dots, N_2$, such that it entirely contains the geometry of the vibrating surface S. In order to take into account the shape of the object, $\mathbf{B}$ is a binary mask, which selects the points of the mesh grid belonging to the target surface. In particular, $b_{n_1 n_2} = 1$ if $\mathbf{s}_{n_1 n_2}$ lies on the surface, and 0 otherwise.
With the above definitions at hand, we can write the discretized Kirchhoff-Helmholtz Equation (6) in matrix form as
\[ \hat{\mathbf{p}}_H = \mathbf{G}_p^{H}\, \mathbf{p}_S + j \omega \rho_0\, \mathbf{G}_v^{H}\, \mathbf{v}, \tag{11} \]
where $(\cdot)^H$ represents the Hermitian transpose operator, and $\mathbf{p}_H \in \mathbb{C}^{M \times 1}$ and $\mathbf{p}_S, \mathbf{v} \in \mathbb{C}^{N \times 1}$ are the column vector forms of $\mathbf{P}_H$, $\mathbf{P}_S$ and $\mathbf{V}$, respectively. Likewise, $\mathbf{G}_v \in \mathbb{C}^{N \times M}$ is the matrix of Green's functions relating the N points on the surface with the M points on the hologram and $\mathbf{G}_p = \frac{\partial}{\partial n} \mathbf{G}_v$. Notice that in (11), the number of points M can be different from N. In typical NAH experimental scenarios M < N, since only a limited number of microphone sensors is available with respect to the desired velocity resolution. Moreover, the discrete estimator F represents an estimate of the real pressure field, with accuracy determined by the number of adopted discrete points.
Figure 2 shows the two-block approach proposed in this paper to combine data-driven and model-based solutions of the NAH problem.
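As an illustration of the discretized forward model, the following NumPy sketch propagates a surface pressure and a normal velocity to a set of hologram points. It is not the authors' implementation: the function name `kh_forward`, the uniform element area `dS`, the plate size, and the values of `c` and `rho0` are assumptions, as is the sign convention (time dependence $e^{j\omega t}$, so that the velocity term enters with $+j\omega\rho_0$). The Green's matrices are stored directly with shape M × N rather than as Hermitian transposes.

```python
import numpy as np

def kh_forward(p_s, v, src, mic, normal, omega, dS, c=343.0, rho0=1.225):
    """Approximate the hologram pressure from surface pressure and normal velocity.

    p_s, v : (N,) complex surface pressure and normal velocity
    src    : (N, 3) surface points s; mic : (M, 3) hologram points r
    normal : (3,) unit normal pointing from the surface into the air
    dS     : area of one surface element (uniform grid assumed)
    """
    k = omega / c
    d = mic[:, None, :] - src[None, :, :]            # (M, N, 3): vectors r - s
    R = np.linalg.norm(d, axis=-1)                   # (M, N) distances
    g = np.exp(-1j * k * R) / (4 * np.pi * R)        # free-field Green's function
    dg_dn = (1j * k + 1 / R) * g * (d @ normal) / R  # derivative along the normal at s
    return (dg_dn @ p_s + 1j * omega * rho0 * (g @ v)) * dS

# Hypothetical setup: 0.4 m x 0.2 m plate at z = 0, 8 x 8 microphones at z_H = 0.03 m.
ys, xs = np.meshgrid(np.linspace(0, 0.2, 16), np.linspace(0, 0.4, 64), indexing="ij")
src = np.stack([xs.ravel(), ys.ravel(), np.zeros(xs.size)], axis=1)
ym, xm = np.meshgrid(np.linspace(0, 0.2, 8), np.linspace(0, 0.4, 8), indexing="ij")
mic = np.stack([xm.ravel(), ym.ravel(), np.full(xm.size, 0.03)], axis=1)

v = np.ones(1024, dtype=complex)      # piston-like velocity, for illustration only
p_s = np.zeros(1024, dtype=complex)   # surface pressure set to zero for simplicity
p_h = kh_forward(p_s, v, src, mic, np.array([0.0, 0.0, 1.0]), 2 * np.pi * 500.0, dS=1e-4)
```

Here N = 1024 surface points map to M = 64 hologram points, mirroring the M < N scenario discussed above.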
Inspired by the recent CNN-NAH architectures presented in [24,31], where deep learning solutions have proved able to extract a powerful feature representation to regularize the inverse NAH problem, we employed a DNN to infer the back propagation relation. In particular, from the input pressure field acquired at the hologram plane P H (ω), the first block reconstructs the complex fields V(ω) and P S (ω) on the object's surface.
On the other hand, physical information provides prior knowledge that is useful to regularize the ill-posed inverse problem, as shown for example in ESM-based NAH techniques. For this reason, we fed a second block with the two DNN outputs in order to apply a mathematical model of the forward acoustic propagation. This also requires knowing the Green's functions relating the reconstruction points in S with the measurement locations on H and the frequency at which the object is vibrating. Thanks to this second block, we can obtain the estimate of P H (ω), compare it with the input pressure, and tune the performance of the DNN.

Network Description
In this section, we describe in detail the model sketched in Figure 2 along with the definition of the input and output data and the DNN architecture.

Kirchhoff-Helmholtz-Based Convolutional Neural Network
The estimated pressure field P H (ω) is computed using the discretized version of the KH equation defined in (11). In addition to P S (ω) and V(ω), this mathematical model requires knowing the Green function (2) between s and r for all the surface and hologram grid points pairs.
The DNN model adopted to solve the back propagation problem is inspired by the architecture of the renowned U-Net [36]. This architecture consists of three main components: the contraction, the bottleneck, and the expansion sections. Nevertheless, we modified such architecture in order to have two different outputs from the CNN, i.e., the pressure P S (ω) and the velocity V(ω). Therefore, the proposed model consists of one encoder E and two decoders D 1 and D 2 .
In order to apply the KH propagation model from the network outputs, the CNN has to reconstruct both output fields in complex domain. Therefore, the input and output data of the devised network are arranged in tensors with two channels containing the real and imaginary parts of the complex fields, respectively, thus preserving magnitude and phase information.
For these reasons, we refer to the devised model as Kirchhoff-Helmholtz-based convolutional neural network (KHCNN) and the overall scheme is depicted in Figure 3.
Notice that in Figure 3, real and imaginary parts are stacked to emphasize the fact that they are arranged in two channels. This way, real and imaginary parts are not treated as separate signals, but a feature sharing between the real and imaginary parts during the training process of the network is achieved.


Input/Output Data
The network input is the pressure field acquired at the hologram plane P H (ω) arranged in a tensor of M 1 × M 2 × 2. In particular, M = M 1 × M 2 is the number of points used to sample the hologram pressure and the two channels of the last dimension contain the real and imaginary parts of the complex field.
On the other hand, the two outputs of the CNN coming from decoders D 1 and D 2 are the pressure P S (ω) and the velocity V(ω) on the vibrating surface, respectively. Both outputs are arranged in a tensor with dimensions N 1 × N 2 × 2 in order to estimate the real and imaginary parts of the complex fields in the N = N 1 × N 2 points on the object's surface.
It is worth noticing that the value ranges of P H (ω) and V(ω) are different. This is due not only to the different physical quantities and the elevation of the hologram pressure, but it also depends on the geometry and boundary conditions of the vibrating source. Therefore, the input pressure and the output velocity datasets have been normalized with respect to their maximum absolute value; thus, collecting images with magnitude in [0, 1]. Notice that this operation does not affect the phase information, which remains unaltered, but only the real and imaginary part of the complex fields, which now are in [−1, 1].
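The normalization and the two-channel arrangement can be illustrated with a short NumPy sketch. The helper names `normalize` and `to_channels` and the random stand-in field are hypothetical and are not taken from the released code.

```python
import numpy as np

def to_channels(field):
    """Stack real and imaginary parts of a complex (rows, cols) field into (rows, cols, 2)."""
    return np.stack([field.real, field.imag], axis=-1)

def normalize(field):
    """Divide a complex field by its maximum magnitude; the phase is left untouched."""
    return field / np.max(np.abs(field))

rng = np.random.default_rng(0)
# Stand-in for an 8 x 8 hologram pressure field (random values, for illustration only).
P_H = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))

P_H_norm = normalize(P_H)   # magnitude now in [0, 1]
x = to_channels(P_H_norm)   # real-valued tensor of shape (8, 8, 2), entries in [-1, 1]
```

Because the field is divided by a positive real scalar, the phase of every sample is preserved, as stated above.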
Differently from [24], where the pressure on the surface P S (ω) was considered as an implicit latent variable, here we consider it as explicit latent variable. Hence, we let KHCNN estimate P S (ω) from the evaluation of the KH propagation model on P H (ω).

CNN Structure
Here we describe the layers of the CNN architecture. We decided to compare two different networks that differ in the input dimensions and the number of parameters.
The first CNN aims to estimate V(ω) (9) starting from the input pressure P H (ω) (8), having the same spatial resolution as the output, i.e., the same dimension of M 1 × M 2 = N 1 × N 2 = 16 × 64. This architecture is similar to the one proposed in [24] and it is used here only as a benchmark, since in practical NAH scenarios it is infeasible to measure the hologram pressure in 1024 points due to wiring and spacing problems. For this reason, inspired by the SRCNN approach in [31], we considered another architecture that produces the velocity estimate in 1024 points, but starting from 64 points of input pressure acquired on a grid with dimension 8 × 8.
For the sake of simplicity, here we describe only the CNN architecture with M = 64 points at the input. Indeed, the benchmark model with the input pressure in M = 1024 points presents the same architecture with only an adaptation on the dimensionality.
The proposed encoder E consists of a series of four downsampling blocks. Each block includes two consecutive layers of 2D convolutions with filter size 3 × 3 and with a rectified linear unit (ReLU) activation function [37]. Moreover, batch normalization and 2 × 2 max pooling operations are applied after each downsampling block.
From the bottleneck embedding, the expansion section is achieved by two parallel decoders, D 1 and D 2 , with the same structure. Each upsampling step of both decoders consists of a Conv2DTranspose layer [38] with stride 2 × 2 followed by two convolutions with ReLU functions and batch normalization. Moreover, skip connections [39] between each downsampling block of the encoder E and the corresponding upsampling layers of D 1 and D 2 are used to enable the reuse of the encoded features. The desired output dimensions are reached with a super resolution section consisting of two additional upsampling blocks with asymmetric strides 1 × 2 and a final layer with linear activation function.
As a consequence, we obtain a double symmetric structure with one shared encoder and two parallel decoders.

Training Procedure
The CNN model is built to extract an estimate of the velocity V(ω) and pressure P S (ω) on the vibrating surface starting from the input pressure P H (ω). Moreover, the KH model computes an estimate of P H (ω) from the network outputs. Notice that the quality of the network estimate can be assessed by the accuracy of both the soundfield on the hologram (input of the network) and of the velocity field (output of the network), since the former is estimated through the KH discretized operator from the latter. Therefore, we define the following mean square error (MSE) loss function: where Re(·) and Im(·) are operators that take the real and imaginary part of the complex field, respectively. Moreover, the pressure and velocity fields in (12) are represented as column vectors and without the dependence on ω for the sake of simplicity.
The network is implemented (https://github.com/polimi-ispl/nah-khcnn, accessed on 20 November 2021) in Python using Keras [40] and trained with the Adam optimizer with the default parameters presented in [41]. We decreased the learning rate by a factor of 0.2 when the loss reached a plateau. Moreover, we applied the early stopping regularization technique to prevent overfitting: training was stopped after 20 epochs in which no improvement of the validation loss was observed.
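A two-term loss of the kind described above, with one term on the velocity and one on the re-propagated hologram pressure, can be sketched in NumPy as follows. This is only one plausible form: the equal weighting of the two terms, the averaging over samples, and the function name `khcnn_loss` are assumptions and may differ from the loss actually used for training.

```python
import numpy as np

def khcnn_loss(v_hat, v, p_h_hat, p_h):
    """Sum of MSEs on real and imaginary parts of velocity and hologram pressure.

    v_hat, v     : (N,) complex estimated and true surface velocity
    p_h_hat, p_h : (M,) complex re-propagated and measured hologram pressure
    Equal weights for the two terms are an assumption of this sketch.
    """
    def mse(a, b):
        return np.mean((a - b) ** 2)

    velocity_term = mse(v_hat.real, v.real) + mse(v_hat.imag, v.imag)
    pressure_term = mse(p_h_hat.real, p_h.real) + mse(p_h_hat.imag, p_h.imag)
    return velocity_term + pressure_term

# Toy check: a unit real offset on the velocity alone gives a loss of 1.
v = np.array([1 + 1j, 2 - 1j, -0.5j])
p_h = np.array([0.3 + 0.1j, -0.2j])
loss = khcnn_loss(v + 1.0, v, p_h, p_h)
```

The pressure term is what ties the network outputs to the physical model, since the estimated hologram pressure is obtained by propagating the estimated surface fields through the discretized KH operator.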

Dataset Generation
We evaluated the proposed approach using two different vibrating structures: aluminum rectangular plates and violin top plates made of Sitka spruce [42]. In the former case, the vibrating objects are planar and the material is isotropic, whereas the latter is a complex 3D orthotropic structure that exhibits different mechanical properties along the three perpendicular directions of the wood (L, longitudinal; R, radial; T, tangential).
We varied the dimensions and the BCs of the aluminum rectangular plate to build an extensive dataset, whereas in the violin plate dataset, the outline of the plate was modified according to [43].

COMSOL Simulation
Simulations are based on the finite element method (FEM) [44] using COMSOL Multiphysics ® software to compute a numerical approximation of the sound pressure radiated and the velocity generated by the vibrating structure.
Both for the rectangular plate and for the violin top plate simulations, two steps have been applied. The first step involves a mechanical study in order to retrieve the eigenfrequencies ω̄ of each item in the dataset (Eigenfrequency study). In the second step, a suitable acoustic pressure study in the frequency domain has been accomplished (Pressure Acoustics, Frequency Domain study).
More specifically, in the acoustic simulation we emulated multiple shaker test setups by applying an external sinusoidal load at a fixed point on the structure with carrier frequency equal to each eigenfrequency computed in the previous mechanical study. Then, we retrieved the radiated sound pressure and the normal velocity associated with that specific vibrational input. In particular, we selected ω̄ ∈ [0, ω MAX ], where ω MAX is defined such that ω MAX /(2π) = 2000 Hz. Moreover, in order to validate the devised methodology in scenarios compatible with experimental NAH setups, we evaluated the holographic pressure at the elevation z H close to 3 cm.
In order to have accurate estimations of the complex fields, the discretization process consisted of second-order polynomial interpolation. In particular, the mesh elements were built in order to have at least five second-order elements for each wavelength, i.e., h MAX = λ 0 /5, where λ 0 is the wavelength corresponding to the maximum frequency considered of 2000 Hz.
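As a worked example of this meshing rule, the maximum element size follows directly from the wavelength at 2000 Hz. The speed of sound c = 343 m/s is an assumption, since the value used in the simulations is not stated here.

```python
# Maximum mesh element size from the five-elements-per-wavelength rule.
c = 343.0        # assumed speed of sound in air, m/s (not stated in the text)
f_max = 2000.0   # maximum frequency considered, Hz

wavelength = c / f_max    # lambda_0 = 0.1715 m
h_max = wavelength / 5    # h_MAX = 0.0343 m, i.e., about 3.4 cm
```

Under this assumption, the elements are no larger than about 3.4 cm, comparable to the hologram elevation of about 3 cm.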
Finally, we sampled the synthesized data with a cubic interpolation to yield the discrete estimates of the P H (ω) and V(ω) datasets. As far as V(ω) is concerned, it is sampled on rectangular grids with dimensions L x and L y in N = 1024 points. The grid dimensions change according to the specific object shape and size, such that the vibrating structure is entirely contained in the rectangular grid. Therefore, the sampling steps x̄ s and ȳ s of the normal velocity are computed from the grid dimensions L x and L y , with N 1 = 16 and N 2 = 64.
As for the acoustic soundfield, an example of the 3D pressure radiation resulting from COMSOL Multiphysics ® is depicted in Figure 4. From the computed acoustic simulation, we retrieved P H (ω) at the hologram plane H placed at z H . We sampled the hologram pressure in a uniform rectangular grid with the same dimensions L x and L y used for V(ω). We defined the pressure sampling steps in order to collect two versions of P H (ω) with different spatial resolutions to validate the two proposed KHCNN architectures described in Section 4. In particular, we collected P H (ω) in M = 1024 points arranged in M 1 × M 2 = 16 × 64 and also in M = 64 points arranged in a grid of 8 × 8.
Notice that, when P H (ω) is sampled in M = 1024 points, the input and output spatial resolutions are the same. Conversely, when P H (ω) is sampled with M = 64, we have fewer pressure points than velocity ones.
We modeled 672 different rectangular plates, with dimensions comparable to the body of small bowed-string instruments. In particular, with length L For each plate, we analyzed with COMSOL Multiphysics ® the mechanical behavior for three different boundary conditions (BCs), i.e., simply supported, clamped, and free edges. To avoid exciting the plates on nodal lines and analyze as many different operational deflection shapes (ODS) as possible, we excited the plate with simply supported and clamped BCs at x = L x /5 and y = L y /4 locations, while for free BC, the excitation point was in the center of the plate.
We collected a dataset of 15,570 pairs of P H (ω) and V(ω). In particular, we obtained 8707 instances for the free BC, while 2752 and 4111 correspond to clamped and simply supported BCs, respectively [45].
As for the violin plate dataset, we simulated 1568 different synthetic violin top plates with variable outline. Authors in [43] described 20 different parameters, which enable the complete definition of a violin top plate geometry, i.e., shape and dimensions. In [43], shape parameters are sampled from Gaussian distributions centered around the nominal value of a reference violin (based on a Stradivarius instrument). Thanks to this approach, we used different violin-like geometries with parametric outlines in order to ease a generalization on the 3D shapes. In COMSOL Multiphysics ® software, we modeled the radiated pressure and velocity data by exciting the center position of each violin top plate with free BC, which yields a total of 72,523 instances in the dataset.

Data Augmentation and Additive Noise
The pairs of P H (ω) and V(ω) in the training set need to be equally represented for the simply supported, clamped and free BCs. This is especially true for the rectangular plate case, where we want the network to infer the different vibrational behaviors for the three BCs.
Nevertheless, the datasets resulting from the COMSOL Multiphysics ® simulation of rectangular plates are not well balanced. Due to the different dimensions considered, a larger number of items corresponding to modes at low frequencies can be found in the dataset with respect to the high frequency ones. Moreover, plates with clamped BC are characterized by modes with higher eigenfrequencies than free and simply supported ones.
For these reasons, we set up an analysis on the mode occurrences. Modes that are underrepresented in the dataset undergo a data augmentation step in order to have a balanced training set.
The mode occurrence analysis is based on the computation of a correlation matrix between all of the mode shapes present in the dataset. For all modes that are underrepresented, a replication is accomplished, so that a homogeneous distribution of the vibrational patterns is obtained.
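One way to realize such a correlation-based balancing is sketched below in NumPy. It only illustrates the idea: the similarity threshold of 0.9, the replication rule, and the function name `balance_by_correlation` are assumptions and are not taken from the paper.

```python
import numpy as np

def balance_by_correlation(shapes, threshold=0.9):
    """Return replicated item indices and per-item occurrence counts.

    shapes    : (D, ...) array of D mode shapes (real or complex)
    threshold : hypothetical similarity level above which two shapes count as the same mode
    """
    X = shapes.reshape(len(shapes), -1)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    C = np.abs(X @ X.conj().T)                   # normalized correlation matrix, D x D
    occurrences = (C >= threshold).sum(axis=1)   # similar shapes, including the item itself
    target = occurrences.max()
    repeats = np.ceil(target / occurrences).astype(int)
    return np.repeat(np.arange(len(shapes)), repeats), occurrences

# Toy dataset: pattern a appears three times, pattern b only once.
a = np.array([1.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 1.0, 0.0, 0.0])
indices, occurrences = balance_by_correlation(np.stack([a, a, a, b]))
# indices -> [0, 1, 2, 3, 3, 3]: the rare pattern is replicated three times
```

After replication, both patterns appear three times in the toy training set.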
Moreover, the collected P H (ω) of violin and rectangular plates has been corrupted with additive white Gaussian noise in order to model the effect of measurement noise in the pressure sensors. This operation is accomplished for both 1024 and 64 sampled points data at the hologram.
The additive noise applied to each pressure item in the datasets is such that the signal-to-noise ratio (SNR) is selected from a uniform distribution in the interval [10, 60] dB. Table 1 reports the number D of P H (ω) and V(ω) fields for the rectangular and violin datasets.
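The noise corruption can be sketched in NumPy as follows. The use of circular complex noise, with the power split equally between real and imaginary parts, and the helper name `add_awgn` are assumptions of this sketch.

```python
import numpy as np

def add_awgn(field, snr_db, rng):
    """Add complex white Gaussian noise to a complex field at the requested SNR (dB)."""
    signal_power = np.mean(np.abs(field) ** 2)
    noise_power = signal_power / 10 ** (snr_db / 10)
    noise = rng.standard_normal(field.shape) + 1j * rng.standard_normal(field.shape)
    noise *= np.sqrt(noise_power / 2)   # split the power between real and imaginary parts
    return field + noise

rng = np.random.default_rng(0)
# Stand-in for an 8 x 8 hologram pressure field (random values, for illustration only).
P_H = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))

snr_db = rng.uniform(10.0, 60.0)   # one SNR per item, drawn uniformly in [10, 60] dB
P_H_noisy = add_awgn(P_H, snr_db, rng)
```

Drawing a different SNR for each item exposes the network to both heavily and lightly corrupted inputs during training.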

Validation and Results
In this section, we describe the experiments conducted and we discuss the related results.

Metrics
Two metrics are used for assessing the performance of KHCNN. They both test the deviation of the estimated velocity field V̂(ω) with respect to the ground truth V(ω) as computed from the COMSOL Multiphysics ® model.
For notational simplicity, in the rest of this section, we omit ω, but the dependence is implicit in the metric definition and in the complex field notations.
The normalized cross correlation (NCC) is a metric widely adopted in the NAH context that assesses the similarity between the prediction and the ground truth, and it is defined as
\[ \mathrm{NCC} = \frac{\lvert \hat{\mathbf{v}}^{H} \mathbf{v} \rvert}{\lVert \hat{\mathbf{v}} \rVert_2\, \lVert \mathbf{v} \rVert_2}, \]
where the complex velocity fields are considered as column vectors. Notice that NCC is in [0, 1] and it is equal to one if the two velocity fields match perfectly. The second metric used to evaluate the accuracy is the normalized mean square error (NMSE), and it is defined in dB as
\[ \mathrm{NMSE} = 10 \log_{10} \left( \frac{\mathbf{e}^{H} \mathbf{e}}{\mathbf{v}^{H} \mathbf{v}} \right), \]
where $\mathbf{e} = \hat{\mathbf{v}} - \mathbf{v}$ is the column vector error between the prediction and the true value. It is worth noticing that NMSE emphasizes scaling and bias errors between $\hat{\mathbf{v}}$ and $\mathbf{v}$, which are not captured by NCC.
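Both metrics can be computed with a few lines of NumPy. The sketch below assumes the usual definitions, i.e., NCC as the magnitude of the normalized inner product between estimate and ground truth, and NMSE as the error energy over the ground-truth energy in dB. The function names are hypothetical.

```python
import numpy as np

def ncc(v_hat, v):
    """Normalized cross correlation between two complex fields, in [0, 1]."""
    return np.abs(np.vdot(v_hat, v)) / (np.linalg.norm(v_hat) * np.linalg.norm(v))

def nmse_db(v_hat, v):
    """Normalized mean square error in dB (the more negative, the better)."""
    e = (v_hat - v).ravel()
    return 10 * np.log10(np.vdot(e, e).real / np.vdot(v, v).real)

rng = np.random.default_rng(0)
v = rng.standard_normal(1024) + 1j * rng.standard_normal(1024)  # stand-in ground truth
v_scaled = 0.5 * v                                              # estimate with a scaling error

ncc_value = ncc(v_scaled, v)        # scaling is invisible to NCC
nmse_value = nmse_db(v_scaled, v)   # 10 * log10(0.25), about -6.02 dB
```

The example shows why the two metrics complement each other: a pure scaling error leaves NCC at its maximum, while NMSE penalizes it.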

Validation
In this section, we evaluate the reconstruction capabilities of KHCNN with different boundary conditions (BCs). Moreover, we compare the results of the reconstruction when M = 1024 or M = 64 points are used on the holographic plane. For the ease of the reader, we will use a superscript to emphasize the number of input points of the hologram pressure, e.g., P (M=64) H . The test set consists of 1557 pressure fields of rectangular plates. In the following, the resulting NMSE and NCC are shown in octave frequency bands, from the analysis of all the modes in the considered band. In particular, median values, standard deviation, and quartile distributions of NMSE and NCC in a band are depicted with box plots [46].
It is important to notice that, in the lower frequency bands, mainly free BCs are present, due to the different distribution of the eigenfrequencies along the frequency axis for the three BCs. In particular, a larger number of mode shapes occur at high frequencies. This produces a bias in the computation of the arithmetic mean of the metrics with respect to the entire test set. For this reason, in order to correctly understand the overall network performance, it is more insightful to consider the median value.
Figure 5 shows the metrics with M = 64 points of hologram pressure subdivided for the three boundary conditions of the test set. By inspecting Figure 5, we can notice that a more accurate reconstruction is obtained for simply supported and clamped plates. A possible interpretation of this result is that, with free BC, mode shapes are less predictable than with clamped and simply supported ones. Nevertheless, KHCNN is able to recognize from the low spatial resolution input P (M=64) H the different BCs applied to the vibrating source, thus producing an accurate V̂ in N = 1024 points. In particular, the estimates reached an NMSE < −10 dB for the whole test set and an NCC steadily above 0.95.
In Figure 6, we compare NMSE and NCC between the networks with P (M=1024) H and P (M=64) H at the input. Notice that, in this case, the plates in the test set have not been subdivided with respect to the BCs. From Figure 6 it is possible to observe that for both metrics the network based on an input of 1024 points offers an advantage with respect to the 64 points one, as one would expect.
It must be noticed that the KHCNN accuracy presents the same trend for M = 1024 and M = 64 for all frequency bands. In particular, the median NMSE value (Figure 6a   . Therefore, KHCNN is able to perform a denoising operation at the hologram pressure. This is a general behavior obtained for all KHCNN reconstructions of the test set and it is more visible for the estimates that start with low SNR values of input pressures.

Comparison with the Baseline
To validate the devised methodology with respect to state-of-the-art approaches, we compared the estimates obtained by KHCNN with a model-based and a fully data-driven approach adopted in the context of NAH. In particular, we compared the rectangular plate reconstructions computed with the devised KHCNN and the CESM presented in [20]. Furthermore, the velocity reconstructions on the violin top plates were compared with the SRCNN-NAH approach presented in [31].
Similarly to [20], here, we solved the optimization problem of CESM with the CVX solver [47]. The equivalent point sources are distributed on a uniform planar grid of 25 × 25 points located at the height z eq = −5 cm behind the surface of the plate. Moreover, the grid of equivalent point sources is 1 cm larger, with respect to each edge of the rectangular plate under study. Figure 8 shows In general, the devised KHCNN outperformed the CESM approach in the whole frequency range both for NMSE and for NCC. In particular, the median NCC value for the KHCNN reconstructions is 99.53%, while for CESM it is equal to 76.58%. Moreover, the NMSE median value decrease from −19.65 dB for KHCNN to −3.46 dB for CESM.
By inspecting the reconstructions, we noticed that KHCNN is more robust than CESM to the presence of noise in the input pressure. Moreover, although CESM obtains good accuracy for the simply supported and clamped BCs, its overall performance is lowered by the reconstructions with free BC. This is confirmed by the median NCC values of CESM, which are 89.8%, 79%, and 74.3% for the simply supported, clamped, and free BCs, respectively.
An example of KHCNN and CESM estimates is shown in Figure 9. Both techniques reconstructed the surface normal velocity of the rectangular plate with simply supported BC at 1371 Hz from P_H^(M=64). Nevertheless, V_KHCNN is more accurate than V_CESM both in magnitude and in phase. This is also confirmed by the metrics. In particular, the NCC values for CESM and KHCNN are 87.39% and 99.56%, respectively, while the NMSE is −6.19 dB for the CESM estimate and −20.47 dB for the KHCNN one.
Furthermore, in order to validate the proposed KHCNN on objects different from rectangular plates, we analyzed the velocity reconstructions on violin top plates, and we compared the results of KHCNN with the ones obtained with the SRCNN architecture proposed in [31]. To this end, we trained the two systems with pairs of P_H^(M=64) and V of violin top plates generated with COMSOL Multiphysics ® software. Figure 10 shows the comparison of NMSE and NCC for 7266 pressure fields of violin top plates. Notice that, since SRCNN reconstructs only the magnitude of the velocity field, the metrics are computed on the absolute value of the velocity estimates coming from KHCNN. In general, NMSE and NCC follow the same trend for both architectures, with a more accurate reconstruction in the lower frequency bands than in the higher ones. Nevertheless, KHCNN achieves higher accuracy, with a median NMSE value of −17.10 dB and a median NCC of 99.24%. The reconstructions with SRCNN, instead, reached median NMSE and NCC values of −9.31 dB and 95.58%, respectively.
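For reference, the two metrics used throughout the comparison can be sketched as below. The formulas are the commonly used definitions of NMSE (in dB) and NCC for complex fields and are assumed here to correspond to those adopted in the manuscript; the function names are illustrative. The magnitude-only evaluation adopted for the comparison with SRCNN is obtained by passing the absolute values of the fields.

```python
import numpy as np

def nmse_db(v_hat, v):
    """Normalized mean square error in dB between estimate v_hat and reference v."""
    v_hat, v = np.ravel(v_hat), np.ravel(v)
    return 10.0 * np.log10(np.sum(np.abs(v_hat - v) ** 2) / np.sum(np.abs(v) ** 2))

def ncc(v_hat, v):
    """Normalized cross correlation in [0, 1]; multiply by 100 for a percentage."""
    v_hat, v = np.ravel(v_hat), np.ravel(v)
    return np.abs(np.vdot(v_hat, v)) / (np.linalg.norm(v_hat) * np.linalg.norm(v))

# Toy example: a 32 x 32 complex field and an estimate with a 10% amplitude error.
rng = np.random.default_rng(0)
v_ref = rng.standard_normal((32, 32)) + 1j * rng.standard_normal((32, 32))
v_est = 1.1 * v_ref

nmse_complex = nmse_db(v_est, v_ref)                     # complex fields
ncc_complex = ncc(v_est, v_ref)
nmse_magnitude = nmse_db(np.abs(v_est), np.abs(v_ref))   # magnitude only
```

With these definitions, a pure scaling error leaves the NCC at 1 while the NMSE reflects the amplitude mismatch, which is why the two metrics are reported together.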
Two examples of the velocity magnitude estimates can be seen in Figure 11. The devised architecture is able to obtain a more accurate and smoother velocity pattern than SRCNN. As a matter of fact, in the second reconstruction example (at 948 Hz), the NMSE values of SRCNN and KHCNN are −9.17 dB and −16.07 dB, respectively. Likewise, the NCC is 94.19% for SRCNN and 98.82% for KHCNN. Moreover, in addition to the more accurate magnitude reconstructions, KHCNN is able to estimate the phase information of the desired V. Figure 12 shows an example of the KHCNN reconstruction, i.e., complex velocity and hologram pressure fields, for a violin top plate vibrating at 1829 Hz.

Discussion
From the analysis of the results displayed in the previous section, KHCNN shows accurate reconstructions with different shapes and mechanical properties of vibrating sources, improving the performance with respect to sparsity-based NAH [20] and recent DNN [31] solutions.
Interestingly, KHCNN is able to retain accurate estimates when the number of input points is reduced to one-sixteenth of the output spatial resolution. Hence, KHCNN effectively achieves the super-resolution of velocity fields as in [31], while improving the overall accuracy of the reconstruction. In particular, KHCNN with M = 64 input pressure points of the rectangular plate test set produces the complex velocity field in N = 1024 points with a median NMSE value stable around −25 dB for the first four octave bands, i.e., up to 707 Hz.
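The band limits quoted in this discussion are consistent with standard base-2 octave bands, whose edges are f_c/√2 and f_c·√2. The short sketch below makes the arithmetic explicit; the assumption that the first band is centered at 62.5 Hz is introduced here for illustration and is not stated in this section.

```python
def octave_band_edges(first_center=62.5, n_bands=4):
    """Return (lower, upper) edges in Hz of consecutive base-2 octave bands."""
    edges = []
    for i in range(n_bands):
        fc = first_center * 2 ** i          # centers: 62.5, 125, 250, 500 Hz
        edges.append((fc / 2 ** 0.5, fc * 2 ** 0.5))
    return edges

bands = octave_band_edges()
upper_third = bands[2][1]    # upper edge of the band centered at 250 Hz
upper_fourth = bands[3][1]   # upper edge of the band centered at 500 Hz
```

Under this assumption, the fourth band ends at about 707 Hz and the third at about 353.6 Hz, matching the limits mentioned in the text.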
With respect to the CESM approach [20], KHCNN shows improved results in terms of both NMSE and NCC. As a matter of fact, KHCNN is able to produce more accurate estimates of the complex velocity field, i.e., considering both magnitude and phase. The accuracy of CESM greatly degrades when free BCs are adopted. KHCNN, instead, limits the performance reduction, obtaining comparable results for all three BCs under analysis. It is worth noticing that the CESM approach is based on an approximation of the acoustic propagation model. The velocity field, indeed, is estimated from a sparse set of point-like sources, which equivalently describe the soundfield at the holographic plane. Conversely, KHCNN employs a neural network to estimate the velocity field and takes advantage of the actual propagation model represented by the Kirchhoff-Helmholtz equation.
Results show that the desired velocity field can also be computed on the surface of complex shapes that present arching, such as violin top plates. Here, KHCNN is compared with SRCNN [31], a recently developed DNN-based NAH technique. The improvement attained by KHCNN is especially noticeable in the lower frequency bands, where the difference between the statistical distributions of the metrics is substantial. In particular, KHCNN reduces the NMSE values by around 10 dB with respect to SRCNN up to 353 Hz. In the highest frequency range, the difference between KHCNN and SRCNN is not significantly reduced.
The main difference between the two architectures lies in the propagation model introduced in KHCNN. As a matter of fact, SRCNN learns a direct mapping between the magnitude of the input pressure and the magnitude of the surface velocity. Although effective, this approach does not account for possible deviations in the soundfield generated by the estimated velocity. Differently from the end-to-end approach of SRCNN, KHCNN explicitly considers the complex velocity field (real and imaginary parts) and the propagation through the Kirchhoff-Helmholtz equation. As a result, the estimated velocity fields improve considerably, allowing us to obtain a more accurate reconstruction of the violin mode shapes.
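The role of the propagation model in the training objective can be illustrated with a minimal sketch. It assumes that the Kirchhoff-Helmholtz integral has been discretized into two matrices, here called A_p and A_v, that map the surface pressure and the surface normal velocity to the hologram points; these names, the mean-square form of the two terms, and the weight between them are hypothetical choices made for illustration and do not reproduce the exact loss of the manuscript.

```python
import numpy as np

def kh_loss(v_hat, p_hat, v_true, p_holo, A_p, A_v, weight=1.0):
    """Velocity error plus hologram-pressure error after forward propagation.

    v_hat, p_hat : estimated surface velocity and pressure (N complex values)
    v_true       : ground-truth surface velocity (N complex values)
    p_holo       : pressure sampled on the holographic plane (M complex values)
    A_p, A_v     : assumed M x N discretized Kirchhoff-Helmholtz operators
    """
    p_prop = A_p @ p_hat + A_v @ v_hat               # forward-propagated pressure
    loss_v = np.mean(np.abs(v_hat - v_true) ** 2)    # data-driven term
    loss_p = np.mean(np.abs(p_prop - p_holo) ** 2)   # physics-informed term
    return loss_v + weight * loss_p

# Toy check with random operators (placeholders, not physical kernels).
rng = np.random.default_rng(0)
N, M = 16, 4
A_p = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
A_v = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
v = rng.standard_normal(N) + 1j * rng.standard_normal(N)
p = rng.standard_normal(N) + 1j * rng.standard_normal(N)
p_holo = A_p @ p + A_v @ v

loss_exact = kh_loss(v, p, v, p_holo, A_p, A_v)
loss_perturbed = kh_loss(v + 0.1, p, v, p_holo, A_p, A_v)
```

A magnitude-only mapping has no counterpart of the second term, since propagating the estimate back to the hologram requires the complex fields.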
Lastly, we can observe that KHCNN performed consistently on both datasets under analysis, despite the different shapes, BCs, and material properties of the objects. As a matter of fact, in the case of rectangular plates with 64 points at the input, the median values of NMSE and NCC are −19.65 dB and 99.53%, respectively. Similarly, for the violin plate dataset, we obtained a median NMSE value of −16.83 dB and a median NCC of 99.2%, both computed with the complex fields. Hence, although the violin plates present diverse shapes and material properties, the difference in the median values is limited to 2.82 dB and 0.33% for the NMSE and NCC, respectively. The proposed CNN approach aims, therefore, to provide a promising tool for a wide variety of NAH applications.

Conclusions
In this manuscript, we introduced a novel technique for NAH. The devised architecture, called KHCNN, combines the learning capability of CNNs with the physical information given by the Kirchhoff-Helmholtz forward propagation model. In particular, the CNN is trained to provide an estimate of the velocity field of the source starting from the acquired acoustic pressure. The prediction is then refined by propagating the estimated velocity through the Kirchhoff-Helmholtz equation and comparing the resulting acoustic pressure with the input data.
The proposed KHCNN was validated with two different datasets: isotropic rectangular plates and orthotropic violin top plates. The velocity ground truth on the vibrating structures and the complex pressure field at the holographic plane were generated for each structure using the finite element method with COMSOL Multiphysics ® software. We varied the dimensions and the boundary conditions of the vibrating plates to favor the generalization of the method. Moreover, the synthesized pressures were corrupted with additive white noise at different SNRs in order to simulate sensor noise.
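The corruption of the synthesized pressures can be sketched as follows. The sketch assumes circularly symmetric complex Gaussian noise and an SNR defined with respect to the average power of the pressure field; both are assumptions made here, since the exact noise model is not detailed in this section, and the function name is illustrative.

```python
import numpy as np

def add_white_noise(p, snr_db, rng=None):
    """Add complex white Gaussian noise to the pressure field p at the given SNR (dB)."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.asarray(p, dtype=complex)
    signal_power = np.mean(np.abs(p) ** 2)
    noise_power = signal_power / 10.0 ** (snr_db / 10.0)
    # Split the noise power equally between real and imaginary parts.
    noise = rng.standard_normal(p.shape) + 1j * rng.standard_normal(p.shape)
    return p + np.sqrt(noise_power / 2.0) * noise

# Toy check on a long unit-amplitude signal: the measured SNR should be close to 20 dB.
rng = np.random.default_rng(0)
p_clean = np.exp(1j * np.linspace(0.0, 50.0, 100_000))
p_noisy = add_white_noise(p_clean, snr_db=20.0, rng=rng)
measured_snr = 10.0 * np.log10(
    np.mean(np.abs(p_clean) ** 2) / np.mean(np.abs(p_noisy - p_clean) ** 2)
)
```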
Results show that KHCNN is able to estimate the desired complex velocity field on vibrating objects, starting from a low-spatial-resolution sampling of the radiated soundfield. We obtained accurate reconstructions of both the magnitude and the phase of the vibrational field. In particular, KHCNN reached a median NMSE value under −16 dB and a median NCC value above 99% for both the rectangular plate dataset and the violin plate one. Moreover, the explicit inclusion of the forward propagation model in KHCNN enables further verification of the network estimates by comparing the pressure reconstruction at the hologram.
Furthermore, we assessed the network accuracy with respect to recent NAH approaches available in the literature. We compared the rectangular plate estimates of KHCNN with CESM and the magnitude of violin top plate reconstructions with the fully data-driven approach of SRCNN. In both cases, the KHCNN results outperformed the considered approaches in terms of normalized mean square error and normalized cross correlation for the whole frequency range considered.
Future work will be devoted to the application of KHCNN to experimental measurements, following two main directions. On one side, we aim at training the architecture using simulations while testing the system in the field on real data, acquired also in the presence of reverberation. This will allow us to avoid expensive and time-consuming measurement campaigns, exploiting the flexibility of simulations for building extensive datasets with variable characteristics offline. On the other side, we expect to extend the KHCNN approach to work with a wide variety of objects without explicitly retraining the network. For both points, we foresee the application of domain adaptation and transfer learning strategies in order to tune the network with different data.
Funding: This research was not funded by a research program, but the authors trained and tested the proposed algorithm using a GPU donated through the NVIDIA GPU grant program.
Institutional Review Board Statement: Not applicable.

Conflicts of Interest: The authors declare no conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript: