Holographic Microwave Image Classification Using a Convolutional Neural Network

Holographic microwave imaging (HMI) has been proposed for early breast cancer diagnosis. Automatically classifying benign and malignant tumors in microwave images is challenging. Convolutional neural networks (CNNs) have demonstrated excellent image classification and tumor detection performance. This study investigates the feasibility of using a CNN architecture to identify and classify HMI images. A modified AlexNet with transfer learning was investigated to automatically identify, classify, and quantify four- and five-class HMI breast image sets. Various pre-trained networks, including ResNet18, GoogLeNet, ResNet101, VGG19, ResNet50, DenseNet201, SqueezeNet, Inception v3, AlexNet, and Inception-ResNet-v2, were used to benchmark the proposed network. The proposed network achieved high classification accuracy with a small training dataset (966 images) and fast training times.


Introduction
Breast cancer is the leading cause of female cancer deaths [1]. Previous studies showed that early breast cancer detection methods combined with suitable treatment could improve survival rates significantly [2]. X-ray mammography is the current gold-standard imaging tool for diagnosing breast cancer, but it exposes patients to harmful radiation and is unsuitable for dense breasts [3]. Microwave imaging has been proposed as one of the most promising breast imaging tools [4]. Researchers have extensively investigated microwave imaging in many aspects, including measurement of the microwave dielectric properties of breast tissues [5,6], image algorithms [7,8], numerical models [9,10], data acquisition systems [11][12][13], microwave antennas [14][15][16], clinical trials [17,18], image enhancement and improvement methods [19][20][21], and image classification [22][23][24]. If microwave images contain specific qualitative and quantitative indicators, this may help characterize benign and malignant tumors and predict disease. However, this task is challenging because it spans several disciplines, including microwave science, medical imaging, machine learning, and computer vision.
Over the past two decades, deep learning has attracted increasing attention and has achieved excellent performance in medical image classification and disease detection [25,26]. For example, Chen et al. employed the biclustering mining method on ultrasound images to identify breast lesions with accuracy, sensitivity, and specificity of 96.1, 96.7, and 95.7%, respectively [27]. However, the image datasets were too small to support generalization. Le et al. applied a deep neural network to enhance microwave images [28]. Khoshdel et al. investigated the feasibility of using a 3D U-Net architecture to improve microwave breast images [29]. Rana et al. investigated machine learning for breast lesion detection using microwave radar imaging [22]. Mojabi et al. applied convolutional neural networks (CNNs) to microwave and ultrasound images for uncertainty quantification and breast tissue classification [24]. However, obtaining large microwave image datasets for training networks is challenging.
The weighted input and activation of the lth layer are

z^l = W^l * x^(l-1) + b^l, a^l = σ(z^l),

where l denotes the lth layer and * is the convolution operation. W^l, b^l, and z^l denote the weight matrix, bias matrix, and weighted input of the lth layer, and σ is the nonlinear activation function. When l = 2, x^(l-1) = x^1 is the image matrix whose elements are pixel values. When l > 2, x^(l-1) is the feature-map matrix a^(l-1) extracted from the (l - 1)th layer, i.e., x^(l-1) = a^(l-1) = σ(z^(l-1)). Let L be the output layer, so a^L is the final output vector. Nonlinear activation functions are employed from the second layer to the last layer. The cost function is

E = (1/2n) Σ_n Σ_{k=1}^{N} (t_k^L - a_k^L)^2,

where n is the number of training samples and N is the number of neurons in the output layer, corresponding to the N classes. t_k^L is the target value of the kth neuron of the output layer and a_k^L is the actual output value of the kth neuron of the output layer. The output-layer error can be defined as

δ^L = ∂E/∂z^L = (a^L - t^L) ⊙ σ'(z^L),

where ∂(·) denotes the partial derivative operation and ⊙ is the Hadamard (element-wise) product. For l = {L - 1, L - 2, . . . , 2}, the error propagates backwards as

δ^l = ((W^(l+1))^T δ^(l+1)) ⊙ σ'(z^l).

The partial derivatives of E with respect to W^l and b^l can be calculated as

∂E/∂W^l = δ^l (x^(l-1))^T, ∂E/∂b^l = δ^l,

and the parameter changes can be computed by

W^l ← W^l - η ∂E/∂W^l, b^l ← b^l - η ∂E/∂b^l,

where η denotes the learning rate.
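The forward pass, error backpropagation, and gradient updates described above can be sketched with a tiny fully connected network. This is an illustrative stand-in for the convolutional layers (a sigmoid activation and a squared-error cost are used here; all sizes, seeds, and values are hypothetical, not the paper's settings):

```python
import numpy as np

def sigma(z):
    return 1.0 / (1.0 + np.exp(-z))      # sigmoid activation

def sigma_prime(z):
    s = sigma(z)
    return s * (1.0 - s)

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 1))          # input vector (stand-in for an image)
t = np.array([[1.0], [0.0]])              # one-hot target, N = 2 classes
W2 = rng.standard_normal((3, 4)) * 0.5; b2 = np.zeros((3, 1))
W3 = rng.standard_normal((2, 3)) * 0.5; b3 = np.zeros((2, 1))
eta = 0.5                                 # learning rate

losses = []
for _ in range(200):
    # Forward: z^l = W^l a^(l-1) + b^l, a^l = sigma(z^l)
    z2 = W2 @ x1 + b2; a2 = sigma(z2)
    z3 = W3 @ a2 + b3; a3 = sigma(z3)
    losses.append(0.5 * float(np.sum((t - a3) ** 2)))
    # Backward: delta^L = (a^L - t) elementwise sigma'(z^L), then propagate
    d3 = (a3 - t) * sigma_prime(z3)
    d2 = (W3.T @ d3) * sigma_prime(z2)
    # Updates: W^l <- W^l - eta * delta^l (a^(l-1))^T
    W3 -= eta * d3 @ a2.T; b3 -= eta * d3
    W2 -= eta * d2 @ x1.T; b2 -= eta * d2
```

Running the loop drives the cost down on this single example, mirroring the gradient-descent update rule given above.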
The ResNet architecture reduces training error as the number of network layers grows [36]. Adding a shortcut identity connection to the primary network unit is the key to the ResNet architecture:

H(X) = F(X) + X,

where H(X) is the desired mapping and F(X) is the residual mapping.
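The identity shortcut H(X) = F(X) + X can be sketched as a minimal residual block (the layer shapes, weights, and ReLU placement here are illustrative assumptions, not the exact ResNet configuration):

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def residual_block(x, W1, W2):
    """H(x) = F(x) + x: the block only has to learn the residual F(x)."""
    f = W2 @ relu(W1 @ x)   # residual mapping F(x)
    return relu(f + x)      # identity shortcut added before the final activation

rng = np.random.default_rng(1)
x = rng.standard_normal(8)
W1 = rng.standard_normal((8, 8)) * 0.1
W2 = rng.standard_normal((8, 8)) * 0.1
y = residual_block(x, W1, W2)
```

Note that when the residual weights are zero, the block reduces to (the activation of) the identity, which is what makes very deep stacks easy to optimize.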

Datasets
As shown in Table 1, publicly available MRI-derived breast phantoms from nine human subjects were used to develop realistic breast models by converting pixel values in MRI images to complex-valued permittivity [37,38]. Figure 1 shows a sample (breast 9) of the 12 phantoms, with the real and imaginary parts of the relative complex-valued permittivity. Figure 2 shows the real and imaginary parts of all 12 breast phantoms (panels a-x give, for each of breasts 1-12 in turn, the real and imaginary parts of the relative complex-valued permittivity). The HMI method was applied to generate HMI breast image datasets using the developed realistic numerical microwave breast models. The numerical model simulated a sphere-shaped inclusion as a tumor (radius of 5 or 10 mm).
This study used two datasets to train and test the CNN networks (see Table 2). Dataset 1 consists of the real part of the HMI breast images, and dataset 2 consists of the imaginary part. According to [37], the dataset in this study includes five classes of HMI images (12 phantoms): fatty, dense, heterogeneously dense, very dense, and breasts containing tumors. Class V was defined by the presence of tumors, and three Class V models were investigated in this study (see Table 1). An original HMI image contains different types of tissues of different sizes and cannot be used directly for classification. We therefore applied an image segmentation method to partition each original HMI image into sub-images. Each sub-image is a 227 × 227 pixel RGB image. The segmentation step changes the representation into one that is more meaningful and easier to analyze, while matching the input scale required by AlexNet. Image segmentation also makes the content of each sub-image more uniform, which is suitable for classification and facilitates the final determination of the percentage of each tissue type. In addition, to preserve the authenticity of the features extracted from the training dataset, image augmentation techniques such as rotation and height or width shifts were not used, ensuring the integrity of the original images.
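The partitioning of each HMI image into 227 × 227 sub-images can be sketched as simple non-overlapping tiling. This is a minimal sketch of the idea (the tiling stride and the handling of edge regions are assumptions; the paper does not specify them):

```python
import numpy as np

def tile_image(img, size=227):
    """Partition an H x W x 3 image into non-overlapping size x size sub-images.
    Edge regions smaller than `size` are simply discarded in this sketch."""
    h, w, _ = img.shape
    tiles = []
    for r in range(0, h - size + 1, size):
        for c in range(0, w - size + 1, size):
            tiles.append(img[r:r + size, c:c + size, :])
    return tiles

demo = np.zeros((500, 700, 3), dtype=np.uint8)  # stand-in for an HMI image
subs = tile_image(demo)                          # 2 rows x 3 columns of tiles
```

Each tile is already at AlexNet's 227 × 227 × 3 input size, so no resizing is needed afterwards.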

Image Labeling
Both datasets 1 and 2 were classified into five classes (see Figure 2 and Table 2). The fatty breast (Class I) consists of skin, muscle, and fat tissue. The dense breast (Class II) consists of skin, muscle, fat, and dense tissue (which has higher dielectric properties than fatty tissue). The heterogeneously dense breast (Class III) consists of skin, muscle, fat tissue, and heterogeneously dense tissue. The very dense breast (Class IV) consists of skin, muscle, fat, dense tissue (which has higher dielectric properties than fat), and very dense tissue (which has higher dielectric properties than fat and dense tissues). The breast containing tumors (Class V) consists of skin, muscle, fat, heterogeneously dense tissue, and two tumors.
The created HMI images were used to illustrate the application behavior of the trained network, so their sub-images were not labeled in advance. Different numbers of sub-images from each class were selected for manual labeling and then used for training and testing the proposed network. The training and testing datasets were completely independent to ensure the reliability and stability of the proposed method.
For each dataset, 70% of the total images were used to train the proposed network, 20% were used to validate the network, and 10% were used to test the network. All breast image datasets were resized to 227 × 227 × 3 pixels. The training image dataset was used to tune the network parameters using a gradient-based method. The testing image dataset was used to generate predictions. Table 2 shows the parameters used for training the networks.
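The 70/20/10 partition above can be sketched as a shuffled split (the shuffling, rounding, and seed are assumptions of this sketch; the paper does not describe how the split was randomized):

```python
import random

def split_dataset(items, train=0.7, val=0.2, seed=42):
    """Shuffle and split items into 70% train / 20% validation / 10% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n = len(items)
    n_train = int(n * train)
    n_val = int(n * val)
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])

# Illustrative: 966 image indices, as in the training set size reported later
train_set, val_set, test_set = split_dataset(range(966))
```

Splitting by index before any augmentation or labeling keeps the training and testing sets fully independent, as the text requires.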

Network Architecture

Modified AlexNet
AlexNet is one of the most popular CNN architectures due to its strong performance in image classification. Thus, this study applied a modified AlexNet with transfer learning (see Table 3) to HMI images to improve image classification accuracy. Table 3 shows the structure of the modified AlexNet with transfer learning. The first convolution layer of the network takes the input datasets and passes them through convolution filters; the input image is therefore resized to 227 × 227 × 3 pixels, corresponding to the width, height, and three color channels (the depth) of the input image. The last convolutional layer implements the image reconstruction process, aggregating the high-resolution patch-wise representations to produce the output image. The cross-entropy loss function is used to reduce errors. Batch normalization is performed before each activation function to mitigate overfitting. The ReLU layer provides faster and more efficient training by mapping negative values to zero and retaining positive values. The max pooling layer simplifies the output and reduces the resolution, reducing the number of parameters to learn. The fully connected layer combines all features to classify the images into the five classes. The SoftMax function normalizes the output of the fully connected layer. As shown in Table 3, the last three layers of AlexNet were replaced via transfer learning to avoid overfitting. The proposed AlexNet network consists of a pre-trained network and a transferred network. The parameters in the pre-trained network were trained on the publicly available ImageNet dataset and could therefore be adapted to extract features from the HMI image dataset. The parameters in the transferred network represent only a small part of the proposed AlexNet network, so a small training dataset can meet the requirements of transfer learning.
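The SoftMax normalization and cross-entropy loss at the head of the network can be sketched as follows (the five logit values are made up for illustration):

```python
import numpy as np

def softmax(z):
    """Normalize fully connected outputs into class probabilities."""
    e = np.exp(z - z.max())        # subtract the max for numerical stability
    return e / e.sum()

def cross_entropy(p, target_idx):
    """Cross-entropy loss against a one-hot target at index target_idx."""
    return -np.log(p[target_idx])

logits = np.array([2.0, 1.0, 0.1, -1.0, 0.5])  # hypothetical five-class output
p = softmax(logits)                             # probabilities summing to 1
loss = cross_entropy(p, 0)                      # loss if the true class is 0
```

The predicted class is the index of the largest probability, and the loss shrinks toward zero as that probability approaches 1.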

Data Analysis and Image Processing
MATLAB R2020a with the Deep Learning Toolbox was used for data analysis and image processing. The proposed network was developed on a laptop (ThinkPad P53) with an Intel i7-8700K CPU (2.60 GHz) and 256 GB of RAM. Stochastic gradient descent with momentum (SGDM) was selected to train the transferred part of AlexNet.
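SGDM augments plain gradient descent with a velocity term that accumulates past gradients. A minimal sketch on a toy quadratic objective (the learning rate, momentum coefficient, and objective here are illustrative, not the paper's training settings):

```python
def sgdm_step(w, grad, velocity, eta=0.01, momentum=0.9):
    """One stochastic-gradient-descent-with-momentum (SGDM) update."""
    velocity = momentum * velocity - eta * grad
    return w + velocity, velocity

# Minimize f(w) = w^2 (gradient 2w), starting from w = 5.0
w, v = 5.0, 0.0
for _ in range(300):
    w, v = sgdm_step(w, 2.0 * w, v)
```

The momentum term damps oscillations and speeds up progress along consistent gradient directions, which is why SGDM is a common default for fine-tuning transferred layers.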

Performance Metrics
The overall performance of the proposed architecture is evaluated from the confusion matrix, which contains True Positives (TP), False Positives (FP), False Negatives (FN), and True Negatives (TN). The AlexNet architecture was evaluated on the testing dataset using performance metrics including precision and accuracy. Precision quantifies the exactness of a model and represents the proportion of images predicted as tumor images that are actually tumor images [39]:

Precision = TP / (TP + FP) (8)

where TP refers to images correctly classified as breast tumor images and FP represents normal images mistakenly classified as breast tumor images. Accuracy evaluates the correctness of a model and is the ratio of the number of images accurately classified to the total number of testing images:

Accuracy = (TP + TN) / (TP + TN + FP + FN) (9)

where TN refers to correctly classified normal images. Figure 3a shows the training progress of the proposed network using dataset 1 and the SGDM method, including the classification accuracy and cross-entropy loss for each epoch of training and validation. At 50 epochs, the highest classification accuracy of both training and validation was 100%, and the lowest cross-entropy loss of both training and validation was 0. The training time was 11 min 13 s for the 966 training images of dataset 1. Figure 3b displays the training progress of the modified AlexNet with transfer learning using dataset 2 and the SGDM method. At 50 epochs, the highest classification accuracy of both training and validation was 100%, and the lowest cross-entropy loss of both training and validation was 0. The training time was 10 min 55 s for the 966 training images of dataset 2.
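The precision and accuracy metrics defined above can be computed directly from predicted and true labels (the six-image example below is made up for illustration, with Class V treated as the positive "tumor" class):

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN for one positive class."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return tp, fp, fn, tn

y_true = ["V", "V", "I", "I", "V", "I"]   # hypothetical true classes
y_pred = ["V", "I", "I", "I", "V", "V"]   # hypothetical predictions
tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive="V")
precision = tp / (tp + fp)                   # Eq. (8)
accuracy = (tp + tn) / (tp + tn + fp + fn)   # Eq. (9)
```

Repeating the count with each class in turn as the positive class yields the per-class accuracy and sensitivity values reported in the Results.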

Results
As shown in Figure 4a, the performance of the proposed network was evaluated using the confusion matrix on testing images from dataset 1. The horizontal rows (actual classes) and vertical columns (predicted classes) give the classification accuracy and sensitivity of the proposed network, respectively. For example, in the first row, 16 Class I images were in the testing dataset, and 16 images (100%) were classified accurately. The classification accuracy of Classes I, II, III, IV, and V was 100, 100, 100, 91.7, and 67.7%, respectively. In the first column, 16 testing images were predicted as Class I, of which 16 (100%) were classified accurately. The sensitivity of Classes I, II, III, IV, and V was 100, 78.3, 97.7, 100, and 100%, respectively.
Figure 4b shows the performance of the modified AlexNet with transfer learning on testing images from dataset 2. In the first row, 16 Class I images were in the testing dataset, and 16 images (100%) were classified accurately. The proposed network obtained a classification accuracy of 100% for all of Classes I, II, III, IV, and V. In the first column, 16 testing images were predicted as Class I, of which 16 (100%) were classified accurately. The proposed network obtained a sensitivity of 100% for all of Classes I, II, III, IV, and V.
Table 4 presents the prediction results of dataset 2 using several deep learning networks. MobileNet-v2 obtained the highest accuracy among the pre-trained networks (96.84%), with a training time of 28 min 38 s. AlexNet used the shortest training time (3 min 4 s) with relatively low accuracy (79.89%); Inception-ResNet-v2 obtained the lowest accuracy (79.34%) with a long training time (106 min 48 s); and DenseNet201 used the longest training time (132 min 25 s) with relatively high accuracy (96.01%). The modified AlexNet with transfer learning achieved higher classification accuracy than the other deep learning networks, making it suitable for classifying HMI images.

Architecture                             | Accuracy | Training Time
MobileNet-v2                             | 96.84%   | 28 min 38 s
AlexNet                                  | 79.89%   | 3 min 4 s
Inception-ResNet-v2                      | 79.34%   | 106 min 48 s
DenseNet201                              | 96.01%   | 132 min 25 s
Modified AlexNet with transfer learning  | 100%     | 10 min 55 s

Discussion
In this study, five classes of breast phantoms were developed using the method presented in [37]. The initial HMI breast images were created using the HMI method detailed in [33]. The initial images were analyzed and processed using the proposed CNN architecture. The proposed architecture offered higher classification accuracy and sensitivity for image dataset 2 (imaginary-part HMI images; see Figure 4b) than for image dataset 1 (real-part HMI images; see Figure 4a). For image dataset 1, the modified AlexNet with transfer learning offers higher classification accuracy for Classes I-III (100%) than for Classes IV (91.7%) and V (67.7%), and higher sensitivity for Classes I, IV, and V (100%) than for Classes II (78.3%) and III (97.7%). However, no significant difference in classification accuracy and sensitivity was observed for dataset 2. Figure 4 demonstrates that the image datasets affect the classification accuracy and sensitivity of the modified AlexNet with transfer learning.
Sixteen randomly selected testing examples of image dataset 1 are shown in Figure 5b, and sixteen randomly selected testing examples of image dataset 2 are shown in Figure 6b. Although a classification accuracy of 100% was obtained for these examples of image dataset 1 (see Figure 5b), this does not mean that the classification accuracy of dataset 1 is 100% overall; for example, classification accuracy rates of 91.7% and 67.7% were obtained for Classes IV and V, respectively (see Figure 4a). Conversely, although the proposed CNN architecture achieved 100% accuracy and sensitivity in classifying dataset 2 (see Figure 4b), the reported classification scores of some testing examples are below 100% (96.36-99.96%; see Figure 6b). This may be caused by numerical rounding in the MATLAB computations.
Compared with several popular deep learning networks (see Table 4), the modified AlexNet with transfer learning has clear advantages in classification accuracy and training time. For example, it obtained higher accuracy (100% vs. 96.84%) and required a shorter training time (10 min 55 s vs. 28 min 38 s) than MobileNet-v2 for classifying image dataset 2. The experimental results demonstrated that the modified AlexNet with transfer learning could identify, classify, and quantify HMI images with high accuracy and sensitivity in a reasonable training time. Several factors may affect the test results, including image preprocessing, the proportion of images used for training, the total size of the image datasets, and numerical errors in the MATLAB computations.

Conclusions
In this study, a CNN architecture was introduced for analyzing HMI images. A modified AlexNet with transfer learning was developed to identify, classify, and quantify five classes of HMI images (fatty, dense, heterogeneously dense, very dense, and breasts containing tumors). Several experiments were conducted to validate the performance of the proposed network, and various popular deep learning networks, including AlexNet, were compared against it. The results demonstrated that the proposed network could automatically identify and classify HMI images more accurately (100%) than the other deep learning networks. In conclusion, the proposed network has the potential to become an effective tool for analyzing HMI images using small training datasets, which offers promising applications in the microwave breast imaging field.
Funding: This research was funded by the International Science and Technology Cooperation Project of the Shenzhen Science and Technology Commission (GJHZ20200731095804014).

Institutional Review Board Statement: Not applicable.
Data Availability Statement: Data and code are available from the corresponding authors upon reasonable request.

Conflicts of Interest:
The author declares no conflict of interest.