Application of Deep Neural Network to the Reconstruction of Two ‐ Phase Material Imaging by Capacitively Coupled Electrical Resistance Tomography

: A convolutional neural network (CNN) ‐ based image reconstruction algorithm for two ‐ phase material imaging is presented and verified with experimental data from a capacitively coupled electrical resistance tomography (CCERT) sensor. As a contactless version of electrical resistance tomography (ERT), CCERT has advantages such as no invasion, low cost, no radiation, and rapid response for two ‐ phase material imaging. Besides that, CCERT avoids contact error of ERT by imaging from outside of the pipe. Forward modeling was implemented based on the practical circular array sensor, and the inverse image reconstruction was realized by a CNN ‐ based supervised learning algorithm, as well as the well ‐ known total variation (TV) regularization algorithm for comparison. The 2D, monochrome, 2500 ‐ pixel image was divided into 625 clusters, and each cluster was used individually to train its own CNN to solve the 16 classes classification problem. Inherent regularization for the assumption of binary materials enabled us to use a classification algorithm with CNN. The iterative TV regularization algorithm achieved a close state of the two ‐ phase material reconstruction by its sparsity ‐ based assumption. The supervised learning algorithm established the mathematical model that mapped the simulated resistance measurement to the pixel patterns of the clusters. The training process was carried out only using simulated measurement data, but simulated and experimental tests were both conducted to investigate the feasibility of applying a multi ‐ layer CNN for CCERT imaging. The performance of the CNN algorithm on the simulated data is demonstrated, and the comparison between the results created by the TV ‐ based algorithm and the proposed CNN algorithm with the real ‐ world data is also provided.


Introduction
Electrical impedance tomography (EIT) has been studied and widely applied in medical imaging and process tomography since it was introduced in the 1980s [1][2][3][4][5]. The conductivity distribution within the target region, such as areas of the human body or the contents of a pipeline or vessel, can be revealed based on the impedance measurements via electrodes placed on the boundary of the region [6]. Compared with other imaging protocols, EIT has the advantages of producing images with high temporal resolutions while having a relatively low cost, no radiation, no invasion, rapid response, and simplicity for application [6,7]. In late 1980s, when EIT was introduced to the process tomography field, electrical resistance tomography (ERT), a particular case of EIT, was proposed [8,9]. Compared with EIT, it has similar imaging processes, except that the LARS, and elastic net methods, and they used a set of trained subsystems to generate the value of each pixel of the image in parallel [33].
In this work, a multi-layer feedforward CNN was established to achieve image reconstruction for CCERT industrial application. During training, the 2D monochrome 2500-pixel image was divided into 625 clusters, and then the proposed CNN was trained separately for each pixel cluster of the image to achieve feature extraction and classification. A supervised learning algorithm built a mathematical model for the cluster to map the input resistance to the output pixel pattern. With the 12-electrode circular CCERT system, the proposed multi-layer CNN model was examined by both simulation and experiment data. In addition, the reconstructed images obtained with the CNN method were compared with the images produced by a traditional reconstruction algorithm: a TV algorithm.

System Configuration and Data Acquisition Principle
For the CCERT system, data were collected via the boundary-placed electrodes. This research studied the performance of a circular electrode sensor, where 12 electrodes were evenly spaced and attached to the outside of the sensing area with an angle of 25°, as shown in Figure 1a. The size of one electrode was 150 mm × 24 mm, with the inner and outer diameters of the sensing area being 106 mm and 110 mm, respectively. During the measurement process, a 3.3 V AC voltage with 500 kHz was applied as the excitation signal. For each independent measurement, only two electrodes were selected as the exciting and detecting electrode pair, where the AC voltage was injected into the excitation electrode and the current was detected via the detection electrode, and the remaining electrodes were kept at floating potentials at the same time. The equivalent detection circuit can be simplified as in Figure 1b, in which and express the coupling capacitances and represents the impedance of the sensing area. Only the resistance part was involved in the CCERT system, and it could be calculated from the applied voltage and the real part of the detected current based on Ohm's law. In a complete measurement cycle, electrode 1 was first selected as the excitation electrode, and electrode 2 to electrode 12 were successively selected as the detection electrode. The whole process continued until electrodes 11 and 12 constituted an electrode pair. For the same sample, the detected resistance between a certain electrode pair remained the same no matter which acted as the excitation electrode or the detection electrode. Therefore, in each measurement cycle, the total number of independent measurements was = = 66, where n is the number of electrodes.

Conventional Forward Modeling and Image Reconstruction Algorithm of CCERT
Conventional CCERT is the technique that enables the reconstruction of the internal conductivity distribution from the boundary resistance measurements with the sensitivity matrix and reconstruction algorithm. The imaging process has two essential stages: one is the forward modeling, and the other is image reconstruction, often termed as the inverse problem [34]. During the test, the time difference (TD) method was adopted to obtain the resistance projection (P), where P equaled the difference of the resistances at different times: one with a homogeneous conductive background and the other with detected samples added into the background [35]. Tap water with a conductivity of σ = 0.018 S/m was taken as the background medium.
In the forward problem, the boundary equations were obtained based on the known conductivity distribution within the target region. Two assumptions are made in the forward modeling process. The first assumption is that the electromagnetic field can be regarded as a quasi-static electric field, since the detected area is much smaller than the wavelength of the excitation signal under the commonly applied frequencies [18]. The second one is that the fringe effect caused by the finite electrode length can be neglected in order to simplify the modeling process [18]. Therefore, based on Maxwell's equations, the forward problem at the under-radio frequency within the sensing area Ω can be written as [11] ∇ , , where , , , , and , are the conductivity, permittivity, and electrical potential distribution of the sensing area, respectively, is the angular frequency of the excitation signal ( 2 , where is the excitation frequency), and ∇ represents the gradient operator. Then, the boundary conditions can be derived as where V is the amplitude of the excitation voltage, ⃗ represents the normal unit vector pointing out of the boundary. , , and are the indexes of the excitation electrode, the detection electrode, and the remaining floating electrodes, respectively, and Γ , Γ , and Γ are the spatial locations of the corresponding electrodes. Then, the sensitivity matrix (S), which reveals the relationship between the resistance projection (P) and conductivity distribution (G), can be determined based on the simulation [12]. During the forward simulation, a critical process is to mesh the sensing region and the system model into a finite number of elements. In this work, the discretization process is conducted by COMSOL Multiphysics. The simulation process is carried out by MATLAB R2020b, MathWorks.Inc, USA, as well as COMSOL Multiphysics. The excitation AC voltage is simulated as a 500 kHz frequency and 1 V amplitude signal. After injecting the AC voltage signal to the electrode, the current measurement on the detection electrode can be represented as where is the current measurement ( 1,2, … ,66 ) and is the measured current density of the electrode pair m and n. Then, the corresponding resistance measurement between the electrode pair can be written as 1 With the whole measurement data, the sensitivity matrix of CCERT is where M is the total number of measurements, N is the total number of meshing elements, is the sensitivity matrix associated with the measurement and element, and and are the current and resistance measurement when the imaging region is at a background state, respectively, where the conductivity of all elements equals to . When the conductivity of the element changes from to while the remaining elements still have conductivity, the current and resistance measurements then become and . After calculating the sensitivity matrix, the image reconstruction process can be conducted. For simplicity, the approximated linear relationship between P (change in resistance measured data), S, and G (change in electrical conductivity) can be expressed as (7) The inverse problem cannot be solved directly by multiplying P and the inverse of S to obtain G, given the following reasons. First, the solution is under-determined since there are more variables than equations [11]. Secondly, G is very sensitive to the perturbations of P [11]. Additionally, CCERT is a type of soft field tomography, which means the actual sensitivity matrix changes with the conductivity distribution [11]. Therefore, proper image reconstruction algorithms are needed in order to solve the inverse reconstruction problem.
For circular CCERT, linear back projection (LBP) was adopted first due to its advantages of simplicity and rapidity, but the image quality was limited. Therefore, an algorithm which combined LBP with a K-means clustering method was proposed to improve the image quality [36]. In 2014, a new hybrid algorithm which adopted Tikhonov regularization as the initial guess and took the simultaneous iterative reconstruction technique (SIRT) for standard iterations was proposed [12]. In 2017, the method consisting of a combination of the Levenberg-Marquardt (L-M) method and the simultaneous algebraic reconstruction technique (SART) was put forward. This method applied L-M for the initial guess and SART for final reconstruction [37]. Recently, the total variation (TV) algorithm with split Bregman iterations was used for CCERT reconstruction [15].
A simple image reconstruction can be performed using LBP: An iterative TV algorithm is an effective method for recovering and reconstructing piecewise constant signals. It is a deterministic technique that safeguards discontinuities in image processing tasks, so it is well suited for this two-phase imaging.
An anisotropic TV regularization term is expressed by Equation (9): where represents a finite difference approximation of the spatial image gradient. An isotropic version of the TV function is given by Equation (10) and was used in this work: where is the error threshold and is the regularization parameter. The higher the regularization (smoothing) parameter gets, the more impact the regularization will have on the solutions and, consequently, the more details will be lost from the image. Indeed, with the increase of , the contrast of the image becomes lower, and the boundaries within the object become smoother. After carefully choosing the regularization parameter, we optimized the image by deleting the artifacts. A more detailed description of the proposed TV method for CCERT can be seen in [15]. To be able to compare this method with the binary CNN algorithm, the TV-reconstructed images were the thresholds for the binary images.

CNN-Based Image Reconstruction CCERT
The supervised learning algorithm is one kind of machine learning algorithm. As task-driven learning, it aims to find a mathematical model for mapping the inputs and their correct outputs through a backpropagation (BP) learning algorithm. It is commonly applied in various classification problems, including image classification, fraud detection, and diagnostics, as well as regression problems including risk assessments, score prediction, and market forecasting.
In this research, a CNN-based supervised learning algorithm was adopted for image reconstruction which established a mathematical model of mapping the input of 66 resistance measurements to the desired output pixel pattern [38]. The resulting image was meshed into a 50 × 50 pixel grid, and the pixels were equally spaced. These 2500 pixels were sorted first by row and then by column. As such, in the first column and from the first row to the last row, the pixels were numbered from 1 to 50. Then, in the second column and from the first row to the last row, the pixels were numbered from 51 to 100. Following the same rule, the pixels in the last column from the first row to the last row were numbered from 2451 to 2500. If a single CNN were used to image the entire 2500pixel image, there would be 2 pixel distribution classes for the CNN to classify, which would be almost impossible for training. The problem was solved by dividing the 50 × 50 pixel image into a 25 × 25 pixel image with non-overlapping clusters, with each cluster representing a 2 × 2 pixel block. Since the space of each pixel point on the image was the same, the space of the clusters was also the same among each other. The conversions between pixels and clusters can be viewed in Figure 2. The clusters were also sorted first by row and then by column. Thus, taking cluster 1 as an example, it corresponds to the area of pixel 1, 2, 51, and 52. After completing the transformation, a distinct CNN could be applied for each cluster, and the classification became feasible since there were 2 16 pixel patterns within one cluster. Their labeling and matrix expressions are displayed in Table 1. As the proposed CNN model was designed for two-phase material application, the result could be represented as the binary image, where 0 and 1 mean the background and inclusion.
Then, the image reconstruction could be realized by the conversion process via the 625 CNN models, as shown in Figure 3. The 625 CNN results were converted into 625 2 × 2 binary matrices, based on Table 1, and the conversion between cluster patterns and the final pixel image took the reverse of the conversion from pixel to cluster, as explained in Figure 2, to form the final 50 × 50 pixel image. The development of each CNN followed the general procedure of the deep learning method as shown in Figure 4, which mainly included accessing data, constructing network architecture, setting training options, and conducting training, along with hand-tunings to achieve a fitting model.  Simulation data were generated based on the precalculated sensitivity matrix (S) and labeled for each CNN based on the cluster's pixel pattern. A total of 10,000 cases were generated for the network training, containing 5000 single-inclusion cases, 2500 doubleinclusion cases, and 2500 triple-inclusion cases. All the inclusions were in the quasicircular shape, with diameters from a 10 pixel-length to a 20 pixel-length placed on all locations of the image. Random noise was added to the simulation based on the standard deviation value of the background measurement for network training. Each set of 66 resistances were scaled to [0 1] to avoid gradient vanishing and converted into an 11 × 6 matrix. The structure of the matrix could be any combination of a size of 66, such as 11 × 6, 6 × 11, or 2 × 33. The final result would be the same no matter what matrix structure was used.
The CNN layers were constructed with the aid of the deep network designer application MATLAB R2020b, MathWorks.Inc, USA. After hand-tunings, the 625 CNNs adopted the same 19-layer architecture to realize feature extraction and classification, and the network architecture is displayed in Figure 5. In this work, hand-tuning of the hyperparameters included the following: (1) tuning the hyperparameters related to the network structure, such as the number of hidden layers and units and the activation function, and (2) tuning the hyperparameters related to the training algorithm, such as the optimizer, initial learning rate, number of epochs, and batch size. For different cases, the hand-tuning was different, but the trade-off needed to be considered alongside the training to avoid underfitting or overfitting cases. Convolution layers functioned as feature extractors by executing convolution operations between the receptive fields of the input and the kernels. An activation function-the rectified linear unit (ReLU)introduced nonlinearity to the network via ReLU x max x, 0 . Max pooling performed nonlinear downsampling on each feature map by taking the max value of the feature block to reduce computation while keeping essential information and providing invariance to the local translation. Batch normalization improved the stability, performance, and speed of the network. The fully connected (FC) layer flattened the 3D features into a 1D vector for classification, and the softmax layer calculated the probability of the input data belonging to each class. The distribution of 16 pattern classes was unbalanced. After randomly sampling the different cases, Table 2 shows the 16 classes' distribution for all sampled cases. Though the number appearing for each class may have varied with the added noise, class 1 and class 16 accounted for the majority of the possibilities. Thus, the focal loss layer was critical, since it was applied as the output layer to deal with the data imbalance between classes. The details of the CNN layers and parameters are given in Table 3.   1  45  34  82  102  2  0  1  0  3  3  0  1  0  3  4  0  1  0  3  5  0  1  0  3  6  0  0  0  0  7  0  0  0  0  8  3  3  5  3  9  3  3  5  3  10  3  0  5  3  11  3  0  5  3  12  1  1  1  4  13  1  1  1  4  14  1  1  1  4  15  1  1  1  4  16 564 577 519 483 The training was carried out on each CNN separately through the BP algorithm and Adam optimizer in order to find the most suitable weights and bias for the model, which could result in minimal prediction cross-entropy loss. The simulation dataset was randomly divided into training data, validation data, and test data at a ratio of 80%:10%:10%, respectively. The 'initial learning rate' was set as 1 × 10 −5 , 'MaxEpochs' was 20, 'MiniBatchSize' was 50, the validation frequency was 20, and the rest of the configuration parameters were set to the default values. The optimization process went through a maximum of 3380 iterations before reaching the final convergence. For these 625 CNN networks, the minimum validation accuracy after training was 85.6% and the average validation accuracy was 94.4%. The number of clusters with a validation accuracy above 90% was 536, accounting for 85.7% of all clusters. Figure 6 displays the training progress plot generated by MATLAB R2020b, MathWorks.Inc, USA, for the 313 cluster, which was the hardest one to reconstruct as it was located in the center of the sensing area. Due to the characteristic of the soft field, the sensitivity in the center area was lower than that near the sensor. In Figure 7b, the validations and test accuracies of the other clusters are also shown. The selected demonstration clusters were positioned at the midline of the vertical axis with the same space. Figure 7a shows the position of the selected cluster with red squares, and the corresponding clusters are the 63rd, 188th, 313th , 438th, and 563rd clusters. From the results, it can be seen that the cluster near the sensor area would have better CNN performance. In Figure 6, the deep blue curve and black curve in the top image represent the training accuracy and validation accuracy, respectively, and the orange curve and black curve in the bottom image represent the training loss and validation loss, respectively. With the growing training iterations, the accuracy curves increased gradually, achieving over 85.6% accuracy after training, while the loss curves decreased. Based on the tendency of the curves, it could be regarded as a good-fitting network. Besides that, the accuracy of the test dataset for the 313 cluster reached 85.93%, verifying the generalization ability of the model. When assessing the performance of the CNN, if the dataset used to train the network was unbalanced, the trained network may have been underfitted. Therefore, metrics other than the accuracy were needed to assist in the analysis, such as the confusion matrix (including recall and precision values) and the receiver operating characteristic (ROC) curve. In our design, since we trained the distinct CNN for each cluster instead of individual pixels, and these 16 classes of the same cluster would not have the same occurring probability, thus there were variabilities in the recall and precision values of different classes for the same cluster. Classes 1 and 16 had the highest occurring probability; therefore, the precision and recall values of 625 CNNs for these two classes were relatively stable and high, and the metrics showed that the value was smaller when the cluster was at the center area while larger when the cluster was close to the sensor. Figure 8 shows the plot of the precision and recall values of 625 CNN networks for class 1 and class 16. For class 16, the minimum precision and recall values of the 625 CNNs were 86% and 96%, respectively. For class 1, though a few networks underperformed, in aggregate, 85.9% of the networks achieved more than a 75% precision value, and 79.6% of the networks achieved more than a 75% recall value. For the other classes, the performance of the network varied among clusters, and the values of these metrics were low, mostly falling below 20%. Although such results may have introduced errors in boundary reconstruction, it was necessary to train the network with all 16 classes. By training the network with more different cases, the performance of the 625 CNNs for classification could be improved, thus providing the images with more accurate boundaries. Compared with other simple networks such as the shallow neural network, our proposed 625 multi-layer deep neural network performed better. The 625 CNNs possessed an average accuracy of 94.4% and had high precision and recall values for class 1 and class 16, which were the two most important classes. Taking the 63rd, 188th, and 313th clusters as examples, the test accuracy of the 625 CNNs was 96.22%, 89.34%, and 85.93%, while the accuracy of the shallow network with the same depth (150) was 93.7%, 84%, and 83.7%, respectively. Moreover, the recall and precision values of the 625 CNNs for class 1 and class 16 were much higher than those of the shallow network, implying that the shallow network has a higher probability of misclassifying class 1 or class 16 than other classes, which will affect the image results. Judging from the complete simulation and experimental image results shown in the following sections, our network was able to correctly determine the location and size of the inclusions, and the performance of the system met our design goals.
The 625 CNN models were saved separately and applied for simulation reconstruction and experimental reconstruction. The image reconstruction accuracy was analyzed quantitatively by calculating the structural similarity (SSIM), the mean squared error (MSE), and the peak signal-to-noise ratio (PSNR) between the reconstructed image and the referencing image. The SSIM, MSE, and PSNR are all metrics used to assess image quality [39,40]. The MSE is the average energy of the difference between the current image and the referencing image, while the PSNR is the ratio between the energy of the peak image value and the mean energy of the noise. The calculations of these two methods are both based on the error between the corresponding pixel points. Suppose that there are two images: the current image and the referencing image . The total number of pixels is for both images, and the pixel value belonging to them is and , respectively.
Therefore, the calculating algorithm of the MSE and PSNR can be expressed as where is the maximum pixel value of the current image. The less distorted image should have a higher PSNR value but a lower MSE value.
The SSIM is an index showing the similarity between two images. Different from the MSE and PSNR, the SSIM evaluates the quality of an image with a region of pixels instead of the individual pixel points, and thus it conforms to the human visual system. It calculates the similarity between the images in terms of luminance, contrast, and structure. The formulation of SSIM is where is the average of , is the average of , 2 and 2 are the variance of and , respectively, is the covariance of and , and 1 and 2 are the variables that stabilize the division. The value range of the SSIM is 0-1, and the image with better quality should have a higher SSIM value.

Simulation Reconstruction Results
In all simulation and experiment results, binary images were used, and they were cut into circular shape to match the shape of the sensing system. The reconstruction results obtained via traditional TV algorithm were also given. For the accuracy analyses of the TV and CNN results, we used the input simulation image as the reference ('True' image). To give a better comparison between the two methods, the term 'TV-CNN' was also given. This took the TV result as the reference and thus showed the difference between the CNN results and the reference. Table 4 gives the results of 9 simulation cases, in which cases 1-3 and cases 4-6 contained single inclusions with diameters of 16 pixels and 14 pixels, respectively, cases 7-8 were for double inclusions with diameters of 16 pixels and 14 pixels, and case 9 included three inclusions with diameters of 16 pixels, 14 pixels, and 12 pixels. For a better comparison, the initial pixel images recovered by the CNN were converted from binary images to RGB images with our MATLAB R2020b, MathWorks.Inc, USA, drawing function. Since some noise was added to the simulated measured data during the training process, the noise in the data translated to artifacts in the image domain. In the real experiments, we had the true 0 and 1 situations representing the conducting and nonconducting materials, where any value in between was ignored. From the above table, we can see that the 625 CNN models could effectively reveal the number, size, and position of the simulated inclusions, with an average SSIM of 0.8658, average MSE of 0.0203, and average PSNR of 18.0856. With the increasing number of inclusions, the SSIM dropped and the MSE increased, while it could still reach an SSIM of over 0.7 and an MSE of less than 0.05 for three-sample detection. The consistency between the TV results and CNN results verifies the reliability of the CNN models and provides feasibility for experimental reconstruction.

Experimental Reconstruction Results
Experimental data was collected from the CCERT system as shown in Figure 9a,b, which included an insulating pipe, a 12-electrode circular array sensor, 12 excitation and detection units, a signal control and processing unit, and a microcomputer. Plastic rods with diameters of 34.5 mm, 29.5 mm, and 26.5 mm were utilized as detected samples, which approximately matched the simulated inclusions with diameters of 16 pixels, 14 pixels, and 12 pixels, respectively. Their distributions also corresponded to the examined simulation cases in Table 4, so we took the same simulation image as the true image for each case. The TD method was adopted to eliminate background effects. Like in the simulated training data, each set of 66 experimental resistances were scaled to [0 1] and converted into an 11 × 6 matrix before being put into the models. The experimental reconstruction results are demonstrated in Table 5.  Comparing Tables 4 and 5, for each case, the SNR value by the CNN for experimental reconstruction was lower than that of the simulation reconstruction due to the random noise and interference during measurements. Besides that, the effect of scaling also amplified the differences. Taking case 1 as an example, Figure 10 plots the 66 scaled resistance measurement data of the simulation and experimental tests. Both reasons led to a decrease of the SSIM and increase of the MSE in practical reconstruction. Even so, Table 5 shows that the CNN can be well applied for real data to reveal the relative size and position of the plastic rods, with an average SSIM of 0.7846, average MSE of 0.0408, and average PSNR of 14.3733, which indicates that our networks did well in terms of noise tolerance. The average SSIM, MSE, and PSNR for the TV method were 0.7947, 0.0436, and 14.1732, respectively.   In all nine experimental cases, six cases had higher SSIM values, lower MSE values, and higher PSNR values with the CNN method than those with the TV algorithm, which demonstrates the improvement in image reconstruction accuracy for the CCERT system by the multi-CNN approach and the feasibility of applying deep learning for two-phase material imaging by CCERT. What is more, the typical calculation time to reconstruct the image with the 625 DL models was around 1 min. Though the time of producing one image with a CNN was longer than that with the TV algorithm (several seconds) at the current time, the improvement of GPUs in the future can accelerate the reconstruction process to provide real-time imaging.

Conclusions
This research studied the feasibility of a CNN-based reconstruction algorithm for a circular CCERT system. CCERT has the same advantages as the traditional ERT system, including simplicity, no invasions, no radiation, rapid response, and a low cost. Additionally, CCERT avoids contact errors by inserting an insulation layer between the conductive mediums and electrodes. Additionally, CCERT could achieve a higher image quality due to the extended frequency range. The forward model was simulated based on the Maxwell equations and FEM method, and the image reconstruction was realized by a deep learning approach. A CNN was adopted as the network architecture due to its superior ability to extract features from the input data, and thus its suitability to use CNNs for classification tasks. Each 2500-pixel image was divided into 625 clusters so that a CNN could be applied on each cluster to solve the distinct multi-class classification problems. Each CNN took in data and mapped them into a label representing the pixel distribution. The CNN models were achieved by accessing data, constructing layers, setting training options, and conducting training. The training of each CNN was carried out separately to pursue a fitting model for each cluster. After tunings, the 625 models could achieve satisfying training accuracies, and they were then applied for the reconstruction of entire images. Both the simulation images and practical measurement images achieved acceptable results, which confirmed the practicability of applying multiple CNNs for image reconstruction in circular CCERT. The training with the simulated data and successful tests conducted with experimental data are very promising; the results allow greater depth of computer-based optimization of the CCERT system. In this study, the CNN approach was compared with one of the state-of-the-art total variation algorithms and provided similar performance. The TV algorithm still needed thresholding of the final image, which was not always straightforward, while the CNN was directly producing binary images. In this work, we considered nine scenarios to test whether the proposed CNN was capable of imaging with high quality. It is worth noticing that there were good performances shown by the state-of-the-art traditional imaging methods, such as TV algorithm, as well as both the shallow and deep neural networks. In future work, as more scenarios are considered to train the system, such as the case where inclusions contact each other, the performance of the system will become better. In theory, the proposed method should handle such nonlinearity, but it needs to be compared with a nonlinear traditional algorithm.