Article

Breast Ultrasound Image Synthesis using Deep Convolutional Generative Adversarial Networks

1 Department of Diagnostic Radiology, Tokyo Medical and Dental University Hospital, 1-5-45 Yushima, Bunkyo-ku, Tokyo 113-8501, Japan
2 Department of Radiology, Dokkyo Medical University Hospital, 880 Kitakobayashi, Mibu, Shimotsuga-gun, Tochigi 321-0293, Japan
3 Department of Surgery, Breast Surgery, Tokyo Medical and Dental University Hospital, 1-5-45 Yushima, Bunkyo-ku, Tokyo 113-8501, Japan
* Author to whom correspondence should be addressed.
Diagnostics 2019, 9(4), 176; https://doi.org/10.3390/diagnostics9040176
Submission received: 9 October 2019 / Revised: 30 October 2019 / Accepted: 5 November 2019 / Published: 6 November 2019
(This article belongs to the Special Issue Multimodality Breast Imaging)

Abstract
Deep convolutional generative adversarial networks (DCGANs) are newly developed tools for generating synthesized images. To determine the clinical utility of synthesized images, we generated breast ultrasound images and assessed their quality and clinical value. After retrospectively collecting 528 images of 144 benign masses and 529 images of 216 malignant breast masses, synthesized images were generated using a DCGAN with 50, 100, 200, 500, and 1000 epochs. The synthesized (n = 20 per epoch setting) and original (n = 40) images were evaluated by two radiologists, who scored them for overall quality, definition of anatomic structures, and visualization of the masses on a five-point scale. They also scored the possibility of each image being an original. Although there was no significant difference between the images synthesized with 1000 and 500 epochs, the latter were evaluated as being of higher quality than all other synthesized images. Moreover, 2.5%, 0%, 12.5%, 37.5%, and 22.5% of the images synthesized with 50, 100, 200, 500, and 1000 epochs, respectively, were misinterpreted as or judged indistinguishable from original images, and 14% of the original images were judged indistinguishable from the synthesized images. Interobserver agreement was very good (|r| = 0.708–0.825, p < 0.001). Therefore, a DCGAN can generate high-quality, realistic synthesized breast ultrasound images that are indistinguishable from original images.

1. Introduction

With the recent development of deep learning technology, the use of deep learning methods for medical image synthesis has increased dramatically [1,2,3]. One of the most interesting breakthroughs in the field of deep learning is the advent of generative adversarial networks (GANs), effective machine learning frameworks for training unsupervised generative models. A GAN is a special type of neural network model in which two networks are trained simultaneously: one focuses on image generation and the other on discrimination [4]. GANs can be applied to many medical tasks requiring image reconstruction, segmentation, detection, classification, or cross-modality synthesis [5]. A deep convolutional GAN (DCGAN) is a direct extension of the GAN that uses convolutional layers in the discriminator and transpose-convolutional layers in the generator. DCGANs can reportedly generate high-quality medical images [6]. When diagnostic images are used in publications or lectures, patient consent is required and patient privacy must be preserved; moreover, collecting numerous diagnostic images can be time-consuming [5]. These problems may be overcome if synthetic images with the same characteristics as real images can be generated using a DCGAN. However, reports on GAN-based medical image synthesis have to date not included clinical evaluations of the generated images, and to the best of our knowledge, there are no reports of breast ultrasound image synthesis using a DCGAN.
To this end, the purpose of this study was to use a DCGAN to generate breast ultrasound images and evaluate their clinical value.

2. Materials and Methods

2.1. Patients

Our medical ethics committee (Tokyo Medical and Dental University Hospital Ethics Committee) approved this retrospective study on 20 September 2019 and waived the requirement for written informed consent from patients. The inclusion criteria were as follows: (a) patients who underwent breast ultrasound examination at our hospital between January 2010 and December 2017 and presented with breast masses and (b) patients whose masses were diagnosed as benign or malignant by histopathological examination or >2 years of follow-up. The exclusion criteria were as follows: (a) patients who received hormone therapy, chemotherapy, or radiation therapy and (b) patients who were <20 years of age. A breast radiologist (Tomoyuki Fujioka) and a medical student (Mizuki Kimura) randomly selected and extracted a maximum of six different cross-sectional images per mass and a maximum of two masses per patient. Ultrasound images with strong artifacts were excluded.
Five radiologists with 4–20 years of experience performed the ultrasound examinations using an Aplio XG scanner with a PLT-805AT 8.0-MHz linear probe (Toshiba Medical Systems, Tochigi, Japan), an Aplio 500 scanner with a PLT-805AT 8.0-MHz linear probe (Toshiba Medical Systems, Tochigi, Japan), or an EUB-7500 scanner with an EUP-L54MA 9.75-MHz linear probe (Hitachi Medical Systems, Tokyo, Japan). The radiologists acquired static images in the vertical and horizontal planes and measured the maximum diameter of the masses.

2.2. Data Set

In this study, all solid and cystic masses, including simple cysts, were evaluated. We used the same set of data that were employed in our previous work [7]. Ultrasound DICOM images were converted to JPEG figures using the viewing software TFS-01 (Toshiba Medical Systems, Tochigi, Japan) and trimmed to include the chest wall with Microsoft Paint (Microsoft, Redmond, WA, USA) for analysis.
Table 1 summarizes the details of patient numbers, masses, images, age, and maximum mass diameter. We extracted a maximum of six cross-sectional images per mass and two masses per patient. For image synthesis, we used a total of 1057 images of 360 masses in 355 patients (528 images of 144 benign masses in 141 patients and 529 images of 216 malignant masses in 214 patients). Table 2 presents the results of histopathological tests of the masses.

2.3. Image Synthesis

Image synthesis was performed on a DEEPstation DK-1000 workstation (UEI, Tokyo, Japan) containing a GeForce GTX 1080 graphics processing unit (NVIDIA, CA, USA), a Core i7-8700 central processing unit (Intel, CA, USA), and the graphical user interface-based deep learning tool Deep Analyzer (GHELIA, Tokyo, Japan). Images were constructed using a DCGAN [6]. The DCGAN discriminator was made up of strided convolution layers, batch norm layers, and LeakyReLU activations. The generator comprised transpose-convolutional layers, batch norm layers, and ReLU activations. The strided transpose-convolutional layers allowed the latent vector to be transformed into a volume with the same shape as an image. The parameters for the generator and discriminator were the same as those reported in the study by Radford et al. [6]: optimizer = Adam (lr = 0.0002, β1 = 0.5, β2 = 0.999, eps = 1e−8). The images were input and output at a pixel size of 256 × 256. After building the models, we generated 20 images each with 50, 100, 200, 500, and 1000 epochs. We also randomly selected 40 original images. Figure 1 shows five examples of the synthetic images at each epoch setting and of the original breast ultrasound images.
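The upsampling path of a DCGAN generator can be illustrated with simple shape arithmetic. The sketch below shows how strided transpose convolutions grow a latent vector into a 256 × 256 output; the kernel size, stride, and padding values are assumptions taken from the standard DCGAN recipe of Radford et al. [6], not parameters reported for this study.

```python
# Spatial-size arithmetic for a DCGAN generator that upsamples a latent
# vector to a 256 x 256 image with strided transpose convolutions.
# Assumed layer settings (standard DCGAN, not from the paper):
# first layer k=4, s=1, p=0, then repeated k=4, s=2, p=1 layers.

def transpose_conv_out(size: int, kernel: int, stride: int, pad: int) -> int:
    """Output spatial size of a 2D transpose convolution (square input)."""
    return (size - 1) * stride - 2 * pad + kernel

def generator_sizes(n_upsample_layers: int) -> list:
    sizes = [1]  # latent vector treated as a 1 x 1 feature map
    # project the latent vector to a 4 x 4 volume
    sizes.append(transpose_conv_out(sizes[-1], kernel=4, stride=1, pad=0))
    # each stride-2 transpose convolution doubles the spatial size
    for _ in range(n_upsample_layers):
        sizes.append(transpose_conv_out(sizes[-1], kernel=4, stride=2, pad=1))
    return sizes

print(generator_sizes(6))  # [1, 4, 8, 16, 32, 64, 128, 256]
```

Six stride-2 layers after the initial projection are enough to reach the 256 × 256 output resolution used here.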

2.4. Radiologist Readout

In the present study, two breast radiologists [reader 1 (Mio Mori), who had 6 years of experience in breast imaging, and reader 2 (Tomoyuki Fujioka), who had 10 years of experience in breast imaging] assessed the ultrasound images. The radiologists subjectively evaluated the 20 generated images at each of 50, 100, 200, 500, and 1000 epochs and the 40 original images on a scale of 1–5 (1 = excellent, 2 = good, 3 = normal, 4 = poor, and 5 = very poor) for overall image quality, definition of anatomic structures, and visualization of the masses. The quality of the original breast ultrasound images at our facility was used as the standard for a score of 1. The readers also scored the possibility of each image being original on a scale of 1–5 (1 = 100%, 2 = 75%, 3 = 50%, 4 = 25%, and 5 = 0%).

2.5. Statistical Analysis

All statistical analyses were performed using the EZR software package version 1.31 (Saitama Medical Center, Jichi Medical University, Saitama, Japan) [8].
In this work, the data are presented as the mean and standard deviation. Mann–Whitney U-tests were used for the analysis of characteristics, including patient age, maximum mass diameter, and image quality (overall quality of images, definition of anatomic structures, and visualization of the masses) using the mean five-point assessment scores given by readers 1 and 2. The interobserver agreement of scores given by readers 1 and 2 was assessed using Spearman’s coefficient of correlation. A p value of <0.05 was considered to be statistically significant.

3. Results

Malignant masses were larger than benign masses, and patients with malignant masses were significantly older than those with benign masses (Table 1).
Table 3 and Table 4 summarize the results of the evaluation and comparison of image quality for the synthetic and original images. In the five-point assessment of overall image quality, definition of anatomic structures, and visualization of the masses, the scores for the original images were lower than those for the synthesized images (p < 0.001). Although there was no significant difference between images synthesized with 1000 and 500 epochs (p = 0.725–1.000), the latter were evaluated as being of better quality than all other synthesized images. Images synthesized with 500 epochs were significantly different from those synthesized with 50, 100, or 200 epochs in terms of overall quality (p < 0.016) and visualization of the masses (p < 0.006). Moreover, images synthesized with 500 epochs were significantly different from those synthesized with 50 or 100 epochs in terms of the definition of anatomic structures (p < 0.001). Interobserver agreement was very good for overall quality (|r| = 0.825, p < 0.001), definition of anatomic structures (|r| = 0.784, p < 0.001), and visualization of the masses (|r| = 0.708, p < 0.001). In the evaluation of the possibility of images being originals, 2.5%, 0%, 12.5%, 37.5%, and 22.5% of the images synthesized with 50, 100, 200, 500, and 1000 epochs, respectively, were misinterpreted as original images (possibility of being original scored as 100% or 75%) or were indistinguishable from the original images (possibility scored as 50%). Moreover, 14.0% of the original images were indistinguishable from the synthesized images (possibility of being synthesized scored as 50%) (Table 5).

4. Discussion

In this study, we used a DCGAN to synthesize breast ultrasound images and asked two experienced breast radiologists to evaluate those images from a clinical perspective. Our results indicate that high-quality breast ultrasound images that are indistinguishable from the original images can be generated using the DCGAN.
The concept of adversarial training is relatively new, and great progress has recently been made in this area of research [4]. In our study, the fully connected layer was used as a building block for a GAN, which was later replaced by the complete convolutional downsampling/upsampling layer for the DCGAN [6]. The DCGAN showed better training stability and generated high-quality medical images. GANs have attracted much attention since their development, and the number of reports involving their use has been rising each year [5].
Previously, Beers et al. showed realistic medical images in two different domains, namely retinal fundus photographs showing retinopathy associated with prematurity and two-dimensional magnetic resonance images of gliomas using progressive growing of GANs, which can create photorealistic images at high resolutions [9]. Additionally, Frid-Adar et al. reported that the accuracy of classifying liver lesions was increased using synthetic images generated by a DCGAN from a limited dataset of computed tomography images of 182 liver lesions (including cysts, metastases, and hemangiomas) as a training set for the convolutional neural network [10]. Synthetic images of lung nodules on computed tomography scans, chest infections on X-rays, and breast tissues on mammography using GANs have also been reported [11,12,13]. However, reports of the use of GAN for medical images synthesis have not demonstrated any associated clinical evaluations.
In this study, we generated synthetic images with 50, 100, 200, 500, and 1000 epochs and demonstrated that as the number of learning repetitions increased, the quality of the final image increased. Although there was no significant difference between the images synthesized with 1000 and 500 epochs, the latter showed better quality than all other synthesized images. Therefore, to generate a high-quality synthetic image, multiple epochs are necessary. However, if the number of learning repetitions is very high, overfitting may occur and the accuracy of the generated images may decrease. In this study, we adopted the parameters for the generator and discriminator from the study by Radford et al. [6] and generated high-quality synthetic breast ultrasound images.
In reference to the synthetic images derived using 500 epochs, 37.5% of the images were misinterpreted as being original or were indistinguishable from the original images and, interestingly, 14.0% of the original images were indistinguishable from the synthesized images. This finding demonstrates that the quality of the realistic and phenotypically diverse synthetic images was such that they perplexed the readers so that they could not confidently assert which images were original.
As research on synthetic image generation using GANs progresses, we may be able to generate realistic data that can serve as training material for radiologists and deep learning networks, resolving problems such as the requirement for patient consent when publishing images and the time and effort required to collect images [5]. However, a rationale for accepting images generated by a DCGAN as diagnostic images has not yet been provided. Synthesized data remain virtual to date, and it is therefore risky to draw clinical or scientific conclusions from such data.
This study has several limitations. First, the retrospective study was conducted at a single institution using three ultrasound systems from two companies. Therefore, extensive, multicenter studies are warranted to validate our findings. Second, not all lesions were diagnosed by cytological or histological examination. Third, we performed this study using images that were converted to 256 × 256 pixel JPEG images. This image processing step might result in a loss of information and influence the performance of the generating models. Fourth, the synthetic images were generated using data from a mixture of benign and malignant masses. It may be possible to improve the reliability of the synthetic images by collecting data from more cases, generating classified ultrasound images representing different pathological cases, and comparing original data and patient information. Fifth, the readers subjectively evaluated the generated and original images; therefore, bias in the evaluation could not be completely removed. Finally, we examined only ultrasound images of breast masses; we have not fully verified whether the method can be applied to images of normal breast tissue and non-mass breast lesions.

5. Conclusions

DCGANs can generate high-quality and realistic breast ultrasound images that are indistinguishable from original images.

Author Contributions

Conceptualization—T.F., M.M.; methodology—K.K.; formal analysis—Y.K. (Yuka Kikuchi), L.K.; investigation—M.A., G.O., T.N.; supervision—Y.K. (Yoshio Kitazume), U.T.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Yasaka, K.; Akai, H.; Kunimatsu, A.; Kiryu, S.; Abe, O. Deep learning with convolutional neural network in radiology. Jpn J. Radiol. 2018, 36, 257–272.
  2. Chartrand, G.; Cheng, P.M.; Vorontsov, E.; Drozdzal, M.; Turcotte, S.; Pal, C.J.; Kadoury, S.; Tang, A. Deep learning: A primer for radiologists. Radiographics 2017, 37, 2113–2131.
  3. Suzuki, K. Overview of deep learning in medical imaging. Radiol. Phys. Technol. 2017, 10, 257–273.
  4. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014.
  5. Yi, X.; Walia, E.; Babyn, P. Generative adversarial network in medical imaging: A review. Med. Image Anal. 2019, 58, 101552.
  6. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434.
  7. Fujioka, T.; Kubota, K.; Mori, M.; Kikuchi, Y.; Katsuta, L.; Kasahara, M.; Oda, G.; Ishiba, T.; Nakagawa, T.; Tateishi, U. Distinction between benign and malignant breast masses at breast ultrasound using deep learning method with convolutional neural network. Jpn J. Radiol. 2019, 37, 466–472.
  8. Kanda, Y. Investigation of the freely available easy-to-use software ‘EZR’ for medical statistics. Bone Marrow Transplant. 2013, 48, 452–458.
  9. Beers, A.; Brown, J.; Chang, K.; Campbell, J.P.; Ostmo, S.; Chiang, M.F.; Kalpathy-Cramer, J. High-resolution medical image synthesis using progressively grown generative adversarial networks. arXiv 2018, arXiv:1805.03144.
  10. Frid-Adar, M.; Klang, E.; Amitai, M.; Goldberger, J.; Greenspan, H. GAN-based data augmentation for improved liver lesion classification. arXiv 2018, arXiv:1803.01229.
  11. Chuquicusma, M.J.; Hussein, S.; Burt, J.; Bagci, U. How to Fool Radiologists with Generative Adversarial Networks? A Visual Turing Test for Lung Cancer Diagnosis. In Proceedings of the 2018 IEEE 15th International Symposium on Biomedical Imaging, Washington, DC, USA, 4–7 April 2018; pp. 240–244.
  12. Salehinejad, H.; Valaee, S.; Dowdell, T.; Colak, E.; Barfett, J. Generalization of Deep Neural Networks for Chest Pathology Classification in X-rays using Generative Adversarial Networks. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing, Calgary, AB, Canada, 15–20 April 2018; pp. 990–994.
  13. Korkinof, D.; Rijken, T.; O’Neill, M.; Yearsley, J.; Harvey, H.; Glocker, B. High-resolution mammogram synthesis using progressive generative adversarial networks. arXiv 2018, arXiv:1807.03401.
Figure 1. Examples of five synthetic images generated with 50 (A), 100 (B), 200 (C), 500 (D), and 1000 (E) epochs and the original images (F).
Table 1. Characteristics of patients and their masses.

| | Benign | Malignant | p |
|---|---|---|---|
| Patients (n) | 141 | 214 | |
| Masses (n) | 144 | 216 | |
| Images (n) | 528 | 529 | |
| Age, mean (y) | 48.8 ± 12.1 | 60.3 ± 12.6 | <0.001 |
| Age, range (y) | 21–84 | 27–84 | |
| Maximum diameter, mean (mm) | 13.5 ± 8.1 | 17.1 ± 7.9 | <0.001 |
| Maximum diameter, range (mm) | 4–50 | 5–41 | |

Comparison was performed using the Mann–Whitney U-test.
Table 2. Histopathology of masses.

| Benign (n = 144) | n | Malignant (n = 216) | n |
|---|---|---|---|
| Fibroadenoma | 47 | Ductal Carcinoma in Situ | 17 |
| Mastopathy | 19 | Invasive Ductal Cancer | 168 |
| Intraductal Papilloma | 17 | Invasive Lobular Carcinoma | 9 |
| Phyllodes Tumor (Benign) | 2 | Mucinous Carcinoma | 8 |
| Fibrous Disease | 1 | Apocrine Carcinoma | 7 |
| Lactating Adenoma | 1 | Invasive Micropapillary Carcinoma | 2 |
| Abscess | 1 | Malignant Lymphoma | 1 |
| Adenosis | 1 | Medullary Carcinoma | 1 |
| Pseudoangiomatous Stromal Hyperplasia | 1 | Adenoid Cystic Carcinoma | 1 |
| Radial Scar/Complex Sclerosing Lesion | 1 | Phyllodes Tumor (Malignant) | 1 |
| No Malignancy | 5 | Adenomyoepithelioma with Carcinoma | 1 |
| Not Known (Diagnosed by Follow-up) | 48 | | |
Table 3. Five-point assessment score of synthetic and original images.

| | Overall Quality R1 | Overall Quality R2 | Overall Quality R1&2 | Anatomic Structures R1 | Anatomic Structures R2 | Anatomic Structures R1&2 | Mass Visualization R1 | Mass Visualization R2 | Mass Visualization R1&2 |
|---|---|---|---|---|---|---|---|---|---|
| 50 epochs | 4.85 ± 0.37 | 4.25 ± 0.72 | 4.55 ± 0.48 | 4.85 ± 0.37 | 4.20 ± 0.89 | 4.53 ± 0.53 | 4.30 ± 0.57 | 4.05 ± 0.69 | 4.18 ± 0.54 |
| 100 epochs | 4.55 ± 0.51 | 4.05 ± 0.51 | 4.30 ± 0.41 | 4.80 ± 0.41 | 4.00 ± 0.65 | 4.40 ± 0.38 | 4.15 ± 0.49 | 3.50 ± 0.69 | 3.83 ± 0.49 |
| 200 epochs | 4.10 ± 0.64 | 3.00 ± 0.92 | 3.55 ± 0.63 | 4.05 ± 0.60 | 2.95 ± 0.83 | 3.50 ± 0.54 | 4.25 ± 0.85 | 2.85 ± 0.88 | 3.55 ± 0.71 |
| 500 epochs | 3.45 ± 0.83 | 2.35 ± 0.59 | 2.90 ± 0.55 | 3.50 ± 0.83 | 2.55 ± 0.60 | 3.03 ± 0.60 | 3.45 ± 0.83 | 2.10 ± 0.45 | 2.78 ± 0.41 |
| 1000 epochs | 3.70 ± 0.92 | 2.65 ± 0.93 | 3.18 ± 0.73 | 4.00 ± 0.86 | 2.80 ± 0.70 | 3.40 ± 0.64 | 3.40 ± 0.94 | 2.15 ± 0.99 | 2.78 ± 0.85 |
| Real | 1.48 ± 0.60 | 1.20 ± 0.41 | 1.34 ± 0.40 | 1.45 ± 0.64 | 1.38 ± 0.59 | 1.41 ± 0.44 | 1.55 ± 0.75 | 1.35 ± 0.62 | 1.45 ± 0.53 |

R1: Reader 1; R2: Reader 2; R1&2: average of Readers 1 and 2.
Table 4. Comparison of quality of synthetic and original images.

| Comparison (epochs) | Overall Quality of Images (p) | Definition of Anatomic Structures (p) | Visualization of the Masses (p) |
|---|---|---|---|
| 50 vs. 100 | 0.470 | 1.000 | 0.426 |
| 50 vs. 200 | <0.001 | <0.001 | 0.100 |
| 50 vs. 500 | <0.001 | <0.001 | <0.001 |
| 50 vs. 1000 | <0.001 | <0.001 | <0.001 |
| 50 vs. Real | <0.001 | <0.001 | <0.001 |
| 100 vs. 200 | <0.001 | <0.001 | 1.000 |
| 100 vs. 500 | <0.001 | <0.001 | <0.001 |
| 100 vs. 1000 | <0.001 | <0.001 | <0.001 |
| 100 vs. Real | <0.001 | <0.001 | <0.001 |
| 200 vs. 500 | 0.016 | 0.095 | 0.016 |
| 200 vs. 1000 | 0.897 | 1.000 | 0.071 |
| 200 vs. Real | <0.001 | <0.001 | <0.001 |
| 500 vs. 1000 | 1.000 | 0.725 | 1.000 |
| 500 vs. Real | <0.001 | <0.001 | <0.001 |
| 1000 vs. Real | <0.001 | <0.001 | <0.001 |

Average score of Readers 1 and 2; Mann–Whitney U-test was performed.
Table 5. Evaluation of the possibility of images being original.

| Score | 50 epochs R1 | 50 epochs R2 | 100 epochs R1 | 100 epochs R2 | 200 epochs R1 | 200 epochs R2 | 500 epochs R1 | 500 epochs R2 | 1000 epochs R1 | 1000 epochs R2 | Real R1 | Real R2 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 (100%) | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 18 | 25 |
| 2 (75%) | 0 | 0 | 0 | 0 | 0 | 2 | 1 | 2 | 1 | 1 | 13 | 13 |
| 3 (50%) | 0 | 1 | 0 | 0 | 1 | 2 | 2 | 10 | 2 | 5 | 9 | 2 |
| 4 (25%) | 0 | 4 | 0 | 8 | 3 | 15 | 10 | 8 | 5 | 11 | 0 | 0 |
| 5 (0%) | 20 | 15 | 20 | 12 | 16 | 1 | 7 | 0 | 12 | 3 | 0 | 0 |
| Indistinguishable images (%) | 0% | 5% | 0% | 0% | 5% | 20% | 15% | 60% | 15% | 30% | 23% | 5% |

Average of R1 and R2 (indistinguishable images): 2.5% (50 epochs), 0% (100 epochs), 12.5% (200 epochs), 37.5% (500 epochs), 22.5% (1000 epochs), and 14% (Real).

Share and Cite

MDPI and ACS Style

Fujioka, T.; Mori, M.; Kubota, K.; Kikuchi, Y.; Katsuta, L.; Adachi, M.; Oda, G.; Nakagawa, T.; Kitazume, Y.; Tateishi, U. Breast Ultrasound Image Synthesis using Deep Convolutional Generative Adversarial Networks. Diagnostics 2019, 9, 176. https://doi.org/10.3390/diagnostics9040176
