Complex-Valued Pix2pix—Deep Neural Network for Nonlinear Electromagnetic Inverse Scattering

Nonlinear electromagnetic inverse scattering is an imaging technique offering quantitative reconstruction and high resolution. Compared with conventional tomography, it accounts for the more realistic interaction between the internal structure of the scene and the electromagnetic waves. However, open issues and challenges remain because of its inherent strong nonlinearity, ill-posedness and computational cost. To overcome these shortcomings, we apply an image translation network, named Complex-Valued Pix2pix, to the electromagnetic inverse scattering problem. Complex-Valued Pix2pix consists of two parts: a Generator and a Discriminator. The Generator employs a multi-layer complex-valued convolutional neural network, while the Discriminator computes the maximum likelihoods between the original value and the reconstructed value separately for the two parts of the complex data: the real part and the imaginary part. The results show that Complex-Valued Pix2pix can learn the mapping from the initial contrast to the real contrast in microwave imaging models. Moreover, owing to the introduction of the discriminator, Complex-Valued Pix2pix can capture more features of nonlinearity than a traditional Convolutional Neural Network (CNN) through confrontation (adversarial) training. Therefore, without considering the time cost of training, Complex-Valued Pix2pix may be a more effective way to solve inverse scattering problems than other deep learning methods. The main improvement of this work lies in the realization of a Generative Adversarial Network (GAN) for the electromagnetic inverse scattering problem, adding a discriminator to the traditional CNN method to optimize network training. It has the prospect of outperforming conventional methods in terms of both image quality and computational efficiency.


Introduction
As an accurate and non-destructive measurement modality for imaging, nonlinear electromagnetic inverse scattering is widely used in science, engineering, military and medical fields [1][2][3][4][5]. Compared with conventional tomography methods [6][7][8][9][10], nonlinear electromagnetic inverse scattering can account for the multiple scattering of electromagnetic wave fields inside the object [3][4][5][11], so that the internal structure of the scene can be "seen" in a quantitative way. A large number of algorithms have been proposed and developed over the past few decades to solve the electromagnetic inverse scattering problem. They can be divided into two categories: (a) deterministic optimization methods, including the Distorted Born Iterative Method (DBIM) [12,13], Subspace-based Optimization (SOM) [14][15][16][17][18] and Contrast Source Inversion (CSI) [19][20][21], and (b) stochastic methods [22][23][24], such as Particle Swarm Optimization (PSO) algorithms. In recent years, with the wide study and rapid development of compressive sensing theory, some inverse scattering methods have been developed to address the problem of Synthetic Aperture Radar (SAR) imaging [25][26][27][28]. Although it has been verified that these methods can provide satisfactory results for objects of intermediate size and contrast, owing to the limitation of computational
• (Stage I) An initial guess of the contrast
• (Stage IIc) Obtain a better contrast estimation through a custom deep learning network.
In this paper, an initial guess of the contrast is obtained by the back-propagation method. We will demonstrate that the CVP2P network can reconstruct the targets with higher accuracy and efficiency than other methods. The input data of CVP2P are the back-propagation results of Stage I, and the input labels are the real contrasts of the corresponding models.
The remainder of this paper is organized as follows. Section 2 states the problem and explains the final estimation goal of the electromagnetic inverse scattering problem. Section 3 describes the two stages of the above method and compares it with related schemes. Section 4 describes the implementation details of the network, including the loss function, the network structure and the training of the network. In Section 5, simulations and experiments are conducted to verify the performance of CVP2P. We used the MNIST dataset to train and test the CVP2P network [51]. We also built a microwave imaging system to provide experimental data for testing the generalization ability of the algorithm. Finally, the paper is concluded and analyzed in Section 6.

Problem Statement
A two-dimensional scalar electromagnetic field is considered in this paper. As shown in Figure 1, the incident plane wave $E_{z,in}$ of TM polarization irradiates the target region Σ, where the subscript z denotes the z component of the electromagnetic wave. $N_S$ receivers at distance R from the origin are uniformly distributed on D and used to receive the scattered field data. The scattered field received by the receivers is described by Equations (1) and (2), in which r and r′ ∈ Σ represent the field point and the source point, respectively, and dS is an area element on Σ. $E_z(r)$ represents the total field, and $\chi(r') = \varepsilon_r(r') - 1 - i\,\sigma(r')/(\varepsilon_0 \omega)$ gives the quantitative relationship between the contrast of the object $\chi(r')$ and the relative permittivity $\varepsilon_r(r')$. $\sigma(r')$, $\varepsilon_0$ and ω are the conductivity, vacuum permittivity and angular frequency, respectively.
$H_0^{(1)}(\cdot)$ denotes the zero-order Hankel function of the first kind. The scattered field is measured with $N_S$ receivers per illumination, with a total of $N_I$ illuminations in a single experiment. In order to carry out numerical experiments, we solve the discretized version of Equations (1) and (2) by partitioning Σ into a K × K square grid using the method of moments. Meanwhile, the product of the contrast χ and the internal field E(r) at any point r in Σ is defined as the contrast source w, as shown in Equation (5). Combined with Equations (1) and (2), this yields the associated discretized forms, in which $e_j^n \in \mathbb{C}^{K^2 \times 1}$ and $e_{in,j}^n \in \mathbb{C}^{K^2 \times 1}$ denote the total internal and incident fields for the j-th illumination, respectively, and n is the number of iterations. $w_j^n \in \mathbb{C}^{K^2 \times 1}$ denotes the contrast source and $e_{sca,j}^n \in \mathbb{C}^{N_S \times 1}$ refers to the scattered field; $e_j^n$, $e_{in,j}^n$ and $e_{sca,j}^n$ are the discretized versions of $E_z(r)$, $E_{z,in}(r)$ and $E_{z,sca}(r)$, respectively. $\chi^n \in \mathbb{C}^{K^2 \times 1}$ refers to the contrast in the imaging domain Σ. $G_S \in \mathbb{C}^{K^2 \times K^2}$ and $G_M \in \mathbb{C}^{N_S \times K^2}$ represent the state matrix and the measurement matrix, respectively.
The inverse scattering problem is to estimate the contrast χ from the known scattered field $e_{sca}$.
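To make the roles of the state matrix, the measurement matrix and the contrast source concrete, the following is a minimal sketch of the discretized forward problem for a single illumination. It assumes the usual contrast-source forms (total field = incident field + G_S w, scattered field = G_M w, w = χ ⊙ e); these forms, the function name `forward_scattering` and the random placeholder matrices are illustrative assumptions, not the Green's-function matrices used in the paper.

```python
import numpy as np

def forward_scattering(G_S, G_M, chi, e_in):
    """Solve the discretized forward problem for one illumination.

    Assumed forms (standard contrast-source notation, not copied from the paper):
        e     = e_in + G_S @ w   (state equation)
        e_sca = G_M @ w          (data equation)
        w     = chi * e          (contrast source, element-wise product)
    """
    n = chi.size
    # Eliminating w gives (I - G_S diag(chi)) e = e_in.
    # G_S * chi[np.newaxis, :] scales column k of G_S by chi[k], i.e. G_S @ diag(chi).
    A = np.eye(n, dtype=complex) - G_S * chi[np.newaxis, :]
    e = np.linalg.solve(A, e_in)   # total field inside the imaging domain
    w = chi * e                    # contrast source
    e_sca = G_M @ w                # scattered field at the receivers
    return e, w, e_sca

# Placeholder data: random matrices stand in for the true Green's-function matrices.
rng = np.random.default_rng(0)
K2, N_S = 16, 8                    # K^2 grid cells, N_S receivers (toy sizes)
G_S = 0.05 * (rng.standard_normal((K2, K2)) + 1j * rng.standard_normal((K2, K2)))
G_M = rng.standard_normal((N_S, K2)) + 1j * rng.standard_normal((N_S, K2))
chi = 2.0 * np.ones(K2, dtype=complex)   # contrast of a lossless object with eps_r = 3
e_in = rng.standard_normal(K2) + 1j * rng.standard_normal(K2)

e, w, e_sca = forward_scattering(G_S, G_M, chi, e_in)
```

The inverse problem runs this map backwards: only `e_sca` (and the known `e_in`, `G_S`, `G_M`) is available, and `chi` must be estimated. Because `e` itself depends on `chi`, the map from `chi` to `e_sca` is nonlinear.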

Motivation
The equations relating the scattered field to the object contrast, such as Equations (3)-(5), are nonlinear and ill-conditioned; therefore, the system admits infinitely many solutions when the inverse problem is solved, and it is difficult to choose a meaningful one. This defect is especially pronounced for high-contrast objects or high-frequency scenes.
Although it is possible to learn the mapping from the scattered field $e_{sca}$ to the contrast χ directly, we can also pre-process the scattered data $e_{sca}$ using a non-iterative inversion algorithm before training the network. Accordingly, we propose a two-stage strategy based on the back-propagation method, which is a computationally simple and effective solution for the highly nonlinear inverse scattering problem. Next, we explain the individual stages of this deep learning scheme.

Initial Guess (Stage-I)
Similar to the method adopted by CSI, we use the back-propagation (BP) algorithm to determine the initial value of the contrast source, as given in Equation (6), where $w_j^0$ represents the initial value of the contrast source, $G_M^*$ is the adjoint matrix of the measurement matrix $G_M$, $\|\cdot\|_{\Sigma}^2$ represents the squared 2-norm in the target area Σ, and $\|\cdot\|_{\partial D}^2$ represents the squared 2-norm on the measurement boundary ∂D.
According to the state equation of Equation (2) and the initial value of the contrast source of Equation (6), the initial value of the total field $e_j^0$ can be obtained.
Thus, the initial value of the contrast $\chi^0$ can be computed by Equation (8).
Combining Equations (6) to (8), the initial contrast $\chi^0$ can be prepared as the input of the subsequent CVP2P network.
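The three steps of Stage I can be sketched as follows. This is a minimal illustration that assumes the usual CSI-style expressions: the back-propagated contrast source is $G_M^* e_{sca}$ scaled by $\|G_M^* e_{sca}\|_{\Sigma}^2 / \|G_M G_M^* e_{sca}\|_{\partial D}^2$, the total field follows from the state equation, and the contrast is the least-squares fit of $w = \chi \odot e$ over all illuminations. The function name `bp_initial_guess` and the array layout (one column per illumination) are hypothetical choices for this sketch.

```python
import numpy as np

def bp_initial_guess(G_S, G_M, e_in, e_sca):
    """Back-propagation initial guess (assumed CSI-style form).

    G_S   : (K2, K2)  state matrix
    G_M   : (N_S, K2) measurement matrix
    e_in  : (K2, N_I) incident fields, one column per illumination
    e_sca : (N_S, N_I) measured scattered fields
    Returns chi0 (K2,), w0 (K2, N_I), e0 (K2, N_I).
    """
    back = G_M.conj().T @ e_sca                      # G_M^* e_sca
    num = np.sum(np.abs(back) ** 2, axis=0)          # squared norm in the target area
    den = np.sum(np.abs(G_M @ back) ** 2, axis=0)    # squared norm on the boundary
    w0 = back * (num / den)                          # initial contrast source
    e0 = e_in + G_S @ w0                             # initial total field (state equation)
    # Least-squares contrast over all illuminations: minimizes sum_j ||chi*e0_j - w0_j||^2
    chi0 = np.sum(w0 * e0.conj(), axis=1) / np.sum(np.abs(e0) ** 2, axis=1)
    return chi0, w0, e0

# Toy example with random placeholder matrices (not real Green's-function matrices).
rng = np.random.default_rng(1)
K2, N_S, N_I = 16, 8, 4
cplx = lambda *s: rng.standard_normal(s) + 1j * rng.standard_normal(s)
G_S, G_M = 0.05 * cplx(K2, K2), cplx(N_S, K2)
e_in, e_sca = cplx(K2, N_I), cplx(N_S, N_I)

chi0, w0, e0 = bp_initial_guess(G_S, G_M, e_in, e_sca)
```

The scaling factor is the one that minimizes the data misfit $\|e_{sca} - G_M w\|$ along the back-propagated direction, which is why no iteration is needed; the resulting `chi0` is a blurred estimate that the network is then trained to refine.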

Comparison with Related Schemes
A popular deep learning method for dealing with inverse problems is the dominant current scheme (DCS) [49]. In the learning process of DCS, the contrasts χ obtained for each incidence are put into different input channels of the CNN, and each corresponding output channel is the true contrast χ of the domain Σ. Consequently, there are N pairs of input and output channels in DCS, in contrast to the back-propagation scheme (BPS), which has only one pair of input and output channels. Filling the results from different incidences into different channels of the input images helps DCS capture more nonlinear features. In CVP2P, the discriminator, acting as a loss function, has a natural advantage in capturing more inner nonlinear features because of the compromise reached between the generator and the discriminator. Thus, CVP2P probably has advantages similar to those of DCS in capturing nonlinear features, obtained through greater depth and the confrontation strategy rather than through more channels.
As a related inversion procedure based on deep learning, the contrast source network (CS-Net) has recently been proposed to solve the inverse scattering problem [48]. CS-Net trains the network to learn the noise subspace components of the contrast source, so as to obtain an estimate of the total contrast source. Its final output is still obtained by the iterative algorithm of CS, so the time-consuming iteration is not eliminated. CVP2P, in contrast, replaces the iterations with the deep learning network, which is the main difference between the two.

Structure and Core Idea of CVP2P
The CVP2P is a kind of Generative Adversarial Network (GAN) [52] with a structure similar to that of pix2pix. Whereas the traditional pix2pix learns the mapping from an input picture to an output picture, CVP2P learns the mapping from input complex data to output complex data.
CVP2P mainly consists of two parts: a generator and a discriminator. The role of the generator is to try to fool the discriminator by generating a contrast that is as accurate as possible, while the discriminator needs to distinguish as well as possible between the real contrast and the contrast generated by the generator. Through confrontation training, both continuously optimize their own networks until a balance point is reached, at which the contrasts generated by the generator are very close to the real samples. In the end, we obtain a generator that produces the desired result. The main advantage of CVP2P is therefore that the update information of the generator (G) comes from the discriminator (D) rather than directly from the data samples. For example, if the generator is given the goal of "learning the mapping between input data and output data", the discriminator steers the generator toward this goal through the confrontation training between them.
The generator of the traditional image translation network (pix2pix) adopts a U-net structure and is composed of convolution and deconvolution layers. The discriminator adopts the "PatchGAN" architecture and is composed of convolution layers. The traditional pix2pix network is difficult to apply directly to the electromagnetic inverse scattering problem because it cannot handle complex-valued data. To overcome this difficulty, we make the following improvements. First, the generator of CVP2P adopts a multilayer complex-valued Convolutional Neural Network (cCNN), which computes the complex-valued convolution and applies the activation function to the real and imaginary parts separately. Different filter sizes are used in different cCNN layers to capture features at different spatial scales. Second, the discriminator is divided into two parts: a real part and an imaginary part. Each discriminator is a small traditional CNN that adopts the "PatchGAN" architecture. The complex data generated by the generator are sent to the corresponding discriminator for judgment.
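A complex-valued convolution of the kind used in a cCNN layer can be built from four real-valued convolutions, since (A + iB) * (x + iy) = (A * x − B * y) + i(B * x + A * y). The sketch below is a minimal single-channel illustration in NumPy, assuming zero padding so that the output keeps the input size and a plain ReLU applied to the real and imaginary parts separately; the helper names are hypothetical and this is not the authors' TensorFlow implementation.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_same(x, k):
    """Zero-padded, same-size 2-D cross-correlation (CNN convention), odd-sized kernel."""
    ph, pw = k.shape[0] // 2, k.shape[1] // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))         # constant zero padding
    windows = sliding_window_view(xp, k.shape)   # shape (H, W, kh, kw)
    return np.einsum("hwij,ij->hw", windows, k)

def complex_conv2d(x_re, x_im, k_re, k_im):
    """Complex convolution assembled from four real convolutions."""
    out_re = conv2d_same(x_re, k_re) - conv2d_same(x_im, k_im)
    out_im = conv2d_same(x_re, k_im) + conv2d_same(x_im, k_re)
    return out_re, out_im

def split_relu(z_re, z_im):
    """Activation applied to the real and imaginary parts separately."""
    return np.maximum(z_re, 0.0), np.maximum(z_im, 0.0)

# Toy input and filter (random placeholders).
rng = np.random.default_rng(2)
x = rng.standard_normal((6, 7)) + 1j * rng.standard_normal((6, 7))
k = rng.standard_normal((3, 3)) + 1j * rng.standard_normal((3, 3))

y_re, y_im = complex_conv2d(x.real, x.imag, k.real, k.imag)
a_re, a_im = split_relu(y_re, y_im)
```

Splitting the operation this way lets a framework that only supports real tensors carry the real and imaginary parts as two separate feature maps while still respecting complex multiplication.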

CVP2P Loss Function
The loss function of the CVP2P is inspired by the traditional pix2pix. It combines the cGAN loss with the L1 distance, both of which are calculated according to the rules of complex arithmetic. The cGAN loss and the L1 distance are expressed by Equations (11) and (12), respectively, where G tries to generate the estimate of the contrast, and $D_{r/i}$ represents the real (r) or imaginary (i) part of the discriminator, which aims to distinguish between the ground truth $y_{r/i}$ and the generated estimate of the contrast $G(x_{r/i})$. $L_{complex}(G)$ is the complex-valued L1 distance. The final loss function of CVP2P combines the two terms, with λ controlling the relative importance of the two loss functions.
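As an illustration of how such a combined objective can be evaluated, the sketch below computes a pix2pix-style generator loss with two discriminators, one for the real part and one for the imaginary part. Several details are assumptions made for this sketch rather than the paper's definitions: the adversarial term uses the common non-saturating binary cross-entropy form, the complex L1 distance is taken as the sum of the mean absolute errors of the real and imaginary parts, and λ = 100 is only the default of the original pix2pix.

```python
import numpy as np

EPS = 1e-7  # clipping constant to keep the logarithms finite

def bce(prob, target):
    """Mean binary cross-entropy between discriminator outputs and a constant target."""
    p = np.clip(prob, EPS, 1.0 - EPS)
    return float(np.mean(-(target * np.log(p) + (1.0 - target) * np.log(1.0 - p))))

def discriminator_loss(d_real, d_fake):
    """Loss of one discriminator (real-part or imaginary-part branch)."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)

def complex_l1(g_out, y):
    """Assumed complex L1 distance: mean |Re diff| + mean |Im diff|."""
    diff = g_out - y
    return float(np.mean(np.abs(diff.real)) + np.mean(np.abs(diff.imag)))

def generator_loss(d_fake_re, d_fake_im, g_out, y, lam=100.0):
    """Adversarial terms from both discriminators plus lam times the L1 term."""
    adversarial = bce(d_fake_re, 1.0) + bce(d_fake_im, 1.0)
    return adversarial + lam * complex_l1(g_out, y)

# Toy values: a ground-truth contrast patch and an imperfect generator output.
rng = np.random.default_rng(3)
y = rng.uniform(0, 2, (4, 4)) - 1j * rng.uniform(0, 1, (4, 4))
g_out = y + 0.05 * (rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4)))
d_fake_re = np.full((2, 2), 0.4)   # patch-wise discriminator outputs (placeholders)
d_fake_im = np.full((2, 2), 0.6)
loss_g = generator_loss(d_fake_re, d_fake_im, g_out, y)
```

A large λ makes the generator stay close to the ground truth pixel by pixel, while the adversarial terms penalize outputs that either discriminator can still tell apart from real contrasts.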

CVP2P: Network Training
The structural details of the CVP2P for nonlinear inverse scattering are described in Figure 2. The input data of the CVP2P come from the BP algorithm. The training procedure for the CVP2P is as follows:
(1) In the first step, the initial contrasts are divided into the real part and the imaginary part as the input of the generator. Both parts are then convolved with the corresponding filters according to Equation (10) to obtain a set of feature matrices. Note that the output of the cCNN has the same size as its input. In other words, the size of the feature matrix remains constant throughout the training process.
(2) In the second step, these feature matrices pass through a nonlinear activation function to obtain a sparse outcome. The result is then used as the input of the next layer, and the above operation is repeated. Generally speaking, it is assumed that the relative permittivity is not smaller than 1 and that the conductivity is non-negative. Therefore, the real part of the contrast is non-negative and the imaginary part of the contrast is non-positive. If the ReLU activation function is used, it should be applied to the complex conjugate of the contrast.
(3) In the third step, the output of the final cCNN layer is sent to the corresponding discriminator for discrimination.
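The remark about applying ReLU to the complex conjugate in step (2) can be illustrated with a few lines of NumPy. This is one possible reading, written as a hypothetical helper `conjugate_relu`: the value is conjugated, ReLU is applied to the real and imaginary parts separately, and the result is conjugated back, so that the output keeps a non-negative real part and a non-positive imaginary part, matching the sign convention of the contrast.

```python
import numpy as np

def conjugate_relu(z):
    """Apply ReLU to the real and imaginary parts of conj(z), then conjugate back.

    The output has Re >= 0 and Im <= 0, the physically admissible signs
    for a contrast with eps_r >= 1 and conductivity >= 0.
    """
    zc = np.conj(z)
    activated = np.maximum(zc.real, 0.0) + 1j * np.maximum(zc.imag, 0.0)
    return np.conj(activated)

z = np.array([2.0 - 1.0j,    # admissible value: passes through unchanged
              -1.0 + 0.5j,   # both parts have the wrong sign: set to zero
              0.5 + 0.3j])   # wrong-sign imaginary part: only the real part survives
out = conjugate_relu(z)
```

Applying a plain ReLU to the imaginary part without conjugation would instead zero out every admissible (negative) imaginary value, which is why the conjugate is used.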
Generators with different numbers of convolution layers were tested. The experimental results show that nine convolution layers are sufficient to achieve the desired image quality. If necessary, more convolution layers can be added to enrich the nonlinearity of the network. However, this increases the complexity of the network, which requires additional training cost and raises the likelihood of overfitting. Since the main role of the discriminator is to train the generator, two convolution layers are sufficient to obtain the ideal generator in the cases we consider.

Numerical and Experimental Results
The performance of CVP2P is assessed through both simulations and experiments. For comparison, we also test the Multiplicative Regularization Contrast Source Inversion (MR-CSI) method. In the simulations, both methods use measured data generated with the Green's integral equations, Equations (1) and (2).

Training and Testing over MNIST Dataset
The MNIST handwritten digit dataset is used to evaluate CVP2P. As a common handwritten digit dataset in the field of deep learning, MNIST is widely adopted to train and test networks. When the CVP2P method is applied to non-destructive testing, the permittivity of a foreign object generally has a simple distribution in the shape of a circle, an ellipse or a stripe, and the relative permittivity values are concentrated in a certain range. Thus, we use simplified samples, such as the MNIST handwritten digit dataset, as the training set to learn the characteristics needed for foreign object detection, because they have similar shape or distribution characteristics. For simplicity, we use binary handwritten digit sets for training, to test constant-contrast objects with different shapes. Referring to Figure 1, the imaging region Σ is a square of size $5.6\lambda_0 \times 5.6\lambda_0$ ($\lambda_0$ = 7.5 cm is the effective wavelength in vacuum). For the numerical simulation, the imaging region Σ is composed of 110 × 110 uniform sub-squares. Thirty-two transmitting antennas are uniformly distributed on the circular region D containing the imaging region Σ, whose radius is $R = 10\lambda_0$. Meanwhile, 32 receiving antennas are used to collect the scattered electric field of the probed scene. The relative permittivity $\varepsilon_r$ of the digit-like objects is equal to 3 in this full-wave electromagnetic simulation [53]. In addition, in order to test the ability of the network, we add 10% random white noise to the scattered field data used for testing. Note that we train CVP2P only in the noiseless case and test the network with noise-added data. From the MNIST dataset, we randomly select 7000 images as the samples' contrasts. By computing the full-wave solution of Maxwell's equations, the multiple-input multiple-output electromagnetic responses are obtained. Afterwards, 7000 BP results are generated as initial contrasts. These data are used as the input of CVP2P, while the 7000 samples' contrasts are taken as the input labels and the expected output of CVP2P. The 7000 data pairs are randomly split into two groups: 6000 for network training and 1000 for network testing.
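The simulation setup described above can be written down in a few lines, which also makes the derived quantities (cell size, antenna positions) explicit. The sketch below is illustrative only: the noise definition (noise whose 2-norm is 10% of the 2-norm of the field) is an assumption, since the text does not state how the 10% level is measured, and the variable and function names are hypothetical.

```python
import numpy as np

lambda0 = 0.075                     # effective wavelength in vacuum, in metres (7.5 cm)
side = 5.6 * lambda0                # side of the imaging region, 0.42 m
n_cells = 110                       # sub-squares per side
cell = side / n_cells               # cell size, about 3.8 mm
centers = (np.arange(n_cells) + 0.5) * cell - side / 2   # cell-centre coordinates

n_ant = 32                          # transmitters (and receivers)
R = 10 * lambda0                    # radius of the measurement circle, 0.75 m
angles = 2 * np.pi * np.arange(n_ant) / n_ant
antennas = R * np.stack([np.cos(angles), np.sin(angles)], axis=1)   # (32, 2) positions

def add_white_noise(e_sca, level=0.10, rng=None):
    """Add complex white noise whose 2-norm is `level` times that of e_sca (assumed definition)."""
    rng = np.random.default_rng() if rng is None else rng
    noise = rng.standard_normal(e_sca.shape) + 1j * rng.standard_normal(e_sca.shape)
    noise *= level * np.linalg.norm(e_sca) / np.linalg.norm(noise)
    return e_sca + noise

# Random 6000/1000 split of the 7000 (BP result, true contrast) pairs.
rng = np.random.default_rng(4)
indices = rng.permutation(7000)
train_idx, test_idx = indices[:6000], indices[6000:]

# Example: corrupt a placeholder 32 x 32 scattered-field matrix for testing.
e_sca = rng.standard_normal((n_ant, n_ant)) + 1j * rng.standard_normal((n_ant, n_ant))
e_noisy = add_white_noise(e_sca, 0.10, rng)
```

With these numbers each cell is roughly λ0/20 wide, a fine enough discretization for a method-of-moments forward solver at this contrast.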
CVP2P was trained with the ADAM optimization method [54] for 12 epochs. The learning rates are set to 0.0002. The filters are initialized randomly. All computations are performed on a small-scale server with 128 GB of memory, an Intel Xeon E5-1620v2 central processing unit and an NVIDIA GeForce GTX 1080Ti. We implemented and trained the CVP2P using the TensorFlow library [55], and the MR-CSI algorithm is implemented in MATLAB 2018. Each iteration (including the forward and backward pass) takes about 1.2 s, and the complete training takes about 4 h. Figure 3a shows the ground truths of the simulated MNIST handwritten digits for the nonlinear inverse problem. Figure 3b,c show the images obtained by BP and by MR-CSI with 1000 iterations, respectively. These results clearly indicate that neither BP nor MR-CSI can provide acceptable results in high-contrast cases. The corresponding results calculated by CVP2P with cCNNs of 3, 6 and 9 layers are shown in Figure 3d(d-1,d-2,d-3), respectively. The results illustrate that more of the nonlinear features in the inverse scattering problem can be learned by CVP2P.
In order to compare the impact of different methods on imaging quality, the Peak Signal-to-Noise Ratio (PSNR) and the Correlation Coefficient (CC) are used as quantitative metrics to assess image quality. For the CVP2P method, the results of the cCNN with nine layers are selected to evaluate the image quality, because it can be seen intuitively that the reconstruction of the nine-layer cCNN is better in all the cases we consider. The formulas for CC and PSNR are as follows:
$$\mathrm{CC} = \frac{\mathrm{Cov}(X, Y)}{\sqrt{D(X)\,D(Y)}}, \qquad \mathrm{PSNR} = 10 \log_{10}\left(\frac{\mathrm{MAX}_I^2}{\mathrm{MSE}}\right)$$
where X is the real part of the reconstruction, Y is the real part of the original model, and D(·) and Cov(·) represent the variance operator and the covariance operator, respectively. The maximum possible pixel value is represented by $\mathrm{MAX}_I$. MSE is the Mean Square Error between the original image and the reconstructed image. Tables 1 and 2 show the corresponding PSNR and CC of the different methods, respectively. As can be seen from these tables, the values of both metrics for CVP2P are much higher than those of the traditional methods. For both PSNR and CC, higher values mean better image quality.
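Both metrics are straightforward to compute from the definitions given in the text: CC is the covariance of X and Y divided by the square root of the product of their variances, and PSNR is ten times the base-10 logarithm of MAX_I squared over the MSE. The following minimal sketch assumes real-valued image arrays and, when no maximum is supplied, takes MAX_I as the largest value of the reference image; that default and the function names are choices made for this illustration.

```python
import numpy as np

def correlation_coefficient(x, y):
    """CC = Cov(X, Y) / sqrt(D(X) * D(Y)) for two real-valued images."""
    x = np.asarray(x, dtype=float).ravel()
    y = np.asarray(y, dtype=float).ravel()
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return float(cov / np.sqrt(x.var() * y.var()))

def psnr(x, y, max_i=None):
    """PSNR = 10 * log10(MAX_I^2 / MSE), in dB; y is the reference image."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    mse = np.mean((x - y) ** 2)
    if mse == 0:
        return float("inf")          # identical images
    if max_i is None:
        max_i = y.max()              # assumed default: maximum of the reference
    return float(10.0 * np.log10(max_i ** 2 / mse))

# Toy example: a binary reference and a reconstruction with a constant offset of 0.1.
reference = np.array([[0.0, 1.0], [1.0, 0.0]])
reconstruction = reference + 0.1
cc_value = correlation_coefficient(reconstruction, reference)
psnr_value = psnr(reconstruction, reference)
```

In this toy case the constant offset leaves CC at 1 while the PSNR is finite (MSE = 0.01 gives 20 dB), which shows that the two metrics respond to different kinds of error: CC is insensitive to a uniform bias, whereas PSNR is not.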
We note that in this case the well-trained CVP2P, BP and MR-CSI algorithms take about 1 s, 8 s and 10 min to reconstruct an image, respectively. The computation time of the CVP2P is much shorter than that of the traditional method. Accordingly, it can be concluded that the CVP2P is significantly better than the MR-CSI method in terms of both image quality and computation time in this high-contrast case. Moreover, we also consider architectures with more cCNN layers, which can learn more of the multiple-scattering behavior and thereby improve the imaging quality. In the inverse scattering problem, we usually hope that the reconstruction result is consistent with the ground truth. However, the PSNR value does not always agree with the subjective judgment of the human eye: even when the PSNR value is high, the reconstruction may be unsatisfactory. Because PSNR performs poorly in predicting subjective image quality, CC is used as the only indicator of imaging quality in the later cases.

Testing over Letter Targets with Trained Networks
We carry out another set of numerical simulations to further verify the performance of the method. In this test, the MNIST dataset is still used as the training set of CVP2P, while the test objects have the shape of English letters and their relative permittivity is set to 3. All other parameters are the same as in Example 1.
Table 2. CC results for the reconstructions in Figure 3.
Figure 4 shows the reconstruction results of the different inverse scattering methods, with the ground truths displayed in the first row. The imaging results of BP, MR-CSI and CVP2P are shown in the second, third and fourth rows, respectively. We use CC to compare the image quality of the reconstructions of all three methods, as shown in Table 3. Moreover, reconstruction with the trained CVP2P takes less than 1 s, while the MR-CSI method takes about 10 min with 1000 iterations. The BP algorithm takes 8 s because of its low computational complexity. Because the probed object has relatively high contrast, the MR-CSI method is unable to produce satisfactory reconstruction results. Therefore, the CVP2P performs significantly better than BP and MR-CSI in terms of both imaging quality and time.

From the above discussion, we can conclude that although the network is trained only on the MNIST dataset, the trained CVP2P still yields satisfactory reconstruction results for different types of objects. This indicates that the CVP2P can learn a generalizable mapping between the input and the ground truth in a similar electromagnetic inverse scattering scenario, regardless of the shapes of the scatterers. The CC of the CVP2P method is clearly much higher than that of the BP and MR-CSI methods. In other words, the CVP2P can learn more accurate features of the nonlinear imaging models.

Tests with Lossy Scatterers
We further verify the versatility of the CVP2P method by reconstructing lossy scatterers. All other parameters are the same as in Example 1, except for the complex-valued contrast.
The true profiles of three ground truths are shown in the first two columns of Figure 5. In the training set of Example 3, the real and imaginary parts of the relative permittivity are in the ranges 1-3 and 0-1, respectively. The results reconstructed by the CVP2P are also displayed in Figure 5, and it is seen that this scheme achieves acceptable results for lossy scatterers. We use CC to compare the image quality of the reconstructed real and imaginary parts, as shown in Table 4.

Testing Pre-Trained Networks by Experimental Data
To gain a deeper understanding of CVP2P, a homemade imaging measurement system is used to obtain experimental antenna-array data for verifying generalizability. We first built a multi-antenna measurement system to provide the experimental data. A photograph of the experimental system is shown in Figure 6.
The system works at 3-5 GHz with 24 balanced Vivaldi antennas, which are evenly placed on a cylinder with a radius of 22.5 cm. Each Vivaldi antenna is 7 cm long and 7.3 cm wide. The maximum imaging domain, D, consists of a circle of radius 17 cm located at the center of the chamber. If a square domain is used, the maximum size is a length of 18 cm. In practice, we use an imaging domain, D, with 10 cm sides. A vector network analyzer (KC901V) is connected to the antennas via an Agilent coaxial matrix switch for transmitting and receiving signals, which provides port isolation of greater than 100 dB over the frequency range of interest. A host computer is connected to the vector network analyzer via USB to collect the data. One antenna is used as the transmitting antenna and the other 23 antennas are used as receiving antennas, so that 1 × 23 transmission measurements $S_{a,b}$ are obtained. Another antenna is then selected as the transmitting antenna and the same operation is repeated. Each data set has 24 × 23 transmission measurements $S_{a,b}$ (the reflection measurements, $S_{a,a}$, are excluded from these data). The target is a square wooden block with a side length of 5 cm. For the numerical simulations, the imaging region Σ is a square of size 0.1 m × 0.1 m, which is evenly divided into 64 × 64 sub-squares. We use the CVP2P trained on the MNIST dataset to test the experimental data. Figure 7a shows the target (ground truth), where the yellow object is a square wooden block with a relative permittivity of 2. Figure 7b-d shows the results of the BP, CVP2P and MR-CSI methods at the working frequency of 4.4 GHz, respectively. Although the experimental data differ greatly from the simulated data of the MNIST dataset, the results of the CVP2P are satisfactory and superior to those of MR-CSI in terms of image quality and computational efficiency. It should be noted that MR-CSI takes 10 min and 1000 iterations to produce these results. The computational time of the CVP2P is less than 1 s, which is much shorter than that of MR-CSI.
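The bookkeeping of the experimental data set can be checked with a short sketch: 24 antennas, each transmitting in turn while the other 23 receive, give 24 × 23 = 552 transmission measurements. The snippet also derives the free-space wavelength at 4.4 GHz and the cell size of the 64 × 64 grid. The equal angular spacing starting at angle zero and all variable names are assumptions made for illustration.

```python
import numpy as np

n_ant = 24
radius = 0.225                                  # antenna circle radius in metres (22.5 cm)
angles = 2 * np.pi * np.arange(n_ant) / n_ant   # assumed equal spacing starting at angle 0
positions = radius * np.stack([np.cos(angles), np.sin(angles)], axis=1)

# Transmission measurements S_{a,b}: transmitter a, receiver b, with a != b
# (reflection terms S_{a,a} are excluded).
pairs = [(a, b) for a in range(n_ant) for b in range(n_ant) if a != b]

c0 = 299_792_458.0                 # speed of light in vacuum, m/s
freq = 4.4e9                       # working frequency, Hz
wavelength = c0 / freq             # about 6.8 cm in free space
cell = 0.1 / 64                    # cell size of the 0.1 m x 0.1 m, 64 x 64 grid
cells_per_wavelength = wavelength / cell
```

The grid is therefore sampled at roughly 44 cells per free-space wavelength, considerably finer than in the MNIST simulations, where each cell is about one twentieth of a wavelength.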
The CC of the reconstructed image produced by CVP2P is as high as 0.9625, which shows that the generalization ability of the network is relatively strong, although the imaging result still has artifacts and rough boundaries.

Conclusions
In this paper, we establish a deep learning framework that can be applied to inverse scattering problems. Further, we demonstrate that our method is able to reconstruct objects with high contrast and achieve acceptable outcomes. The CVP2P can produce more accurate contrast images by learning, as illustrated by our quantitative and comparative studies in simulations and experiments. Since the CVP2P is a non-iterative method, it can greatly reduce the computational cost and is well suited to handling large-scale inverse scattering problems. Compared with traditional methods such as MR-CSI, CVP2P achieves better results in both image quality and computational efficiency.
However, as with deep learning in general, the lack of interpretability is a major issue for our proposed scheme. As shown by the results described above, although the CVP2P is significantly superior to the other inverse scattering methods tested, the mapping learned by CVP2P is still not well understood. This leads to uncertainty as to how the CVP2P is able to estimate the contrast of the ground truth from the initial contrast. We must add, however, that many deep learning schemes share this problem.
Funding: This research was supported in part by the Fundamental Research Funds for the Central Universities, grant number 20CX05021A, and the Qingdao Source Innovation Program, grant number 19-6-2-60-cg. We also acknowledge their support in carrying out the study.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

Abbreviations
The following abbreviations are used in this manuscript:

CNN        Convolutional Neural Network
GAN        Generative Adversarial Network
Pix2pix    Image-to-Image Translation with Conditional Adversarial Nets