Article

Skeleton Segmentation on Bone Scintigraphy for BSI Computation

1 Department of Biomedical Imaging and Radiological Science, China Medical University, Taichung 404, Taiwan
2 Department of Nuclear Medicine, Feng Yuan Hospital Ministry of Health and Welfare, Taichung 420, Taiwan
3 Center of Augmented Intelligence in Healthcare, China Medical University Hospital, Taichung 404, Taiwan
* Author to whom correspondence should be addressed.
Diagnostics 2023, 13(13), 2302; https://doi.org/10.3390/diagnostics13132302
Submission received: 19 June 2023 / Revised: 2 July 2023 / Accepted: 5 July 2023 / Published: 6 July 2023
(This article belongs to the Section Machine Learning and Artificial Intelligence in Diagnostics)

Abstract

Bone Scan Index (BSI) is an image biomarker for quantifying the bone metastasis of cancers. To compute the BSI, not only the hotspots (metastases) but also the bones must be segmented. Most related research focuses on binary classification of bone scintigraphy, i.e., whether metastasis is present or not; few studies address pixel-wise segmentation. This study compares three advanced convolutional neural network (CNN) based models for bone segmentation on an in-house dataset. The best model is Mask R-CNN, which reaches a precision, sensitivity, and F1-score of 0.93, 0.87, and 0.90 for prostate cancer patients and 0.92, 0.86, and 0.88 for breast cancer patients, respectively. These results are averages over 10-fold cross-validation, which supports the reliability of the method for clinical bone segmentation.

1. Introduction

Bone is the most common target site of metastatic cancer, especially in the advanced and later phases of cancer progression; breast, prostate, and lung cancers show the highest incidence rates of bone metastasis [1]. Bone metastases can severely impact patients’ daily activities and quality of life because of severe pain and associated major complications. The protracted clinical course of bone metastasis poses significant challenges to treatment. According to a 2022 report based on the Taiwan National Health Insurance Research Database [2], prostate cancer ranked sixth among the leading causes of cancer death in Taiwanese men, and breast cancer ranked second among the leading causes of cancer death in Taiwanese women. Current diagnostic techniques for bone metastasis include bone scintigraphy (BS), X-ray imaging, computed tomography (CT), and magnetic resonance imaging (MRI); BS is the most cost-effective early screening method and can detect bone metastasis 3 to 6 months earlier than CT or X-ray [3].
Bone metastasis typically affects the central skeletal system and the proximal regions of the upper and lower limbs. The central skeletal system contains red bone marrow, whose physiological characteristics may contribute to the formation of bone metastases [4]. Physicians often perform a whole-body bone scan (WBBS) to diagnose the presence of bone metastasis. Tc-99m MDP is the radiopharmaceutical injected into the patient’s vein; it enters bone cells and deposits with the mineral components within about four hours. Consequently, Tc-99m MDP tends to accumulate in areas of active bone formation in the affected region, producing localized increased radiopharmaceutical activity that appears as a “hot spot” on BS and allows physicians to identify bone metastasis [5]. However, BS interpretation can be ambiguous owing to bone injury, arthritis, and degenerative changes, which cause interpretation challenges. Inexperienced clinical physicians may struggle to make accurate judgments or may even misinterpret images.
The bone scan index (BSI) is an imaging biomarker used to quantify the extent of bone metastasis in cancers [6]. The BSI is calculated as the ratio of “the area of bone lesions indicating bone metastasis” to “the area of the regions with a high incidence of bone metastasis” [7,8,9], as shown in Figure 1. With the aid of artificial intelligence, machine learning, and big data, BSI calculation has become more objective, accurate, and diagnostically efficient. The most attractive application of the BSI is monitoring treatment and prognosis, which provides significant clinical value. Armstrong et al. from Duke University introduced the automated bone scan index (aBSI) as an objective imaging parameter [10] that can evaluate the prognosis of metastatic castration-resistant prostate cancer (mCRPC) patients undergoing systemic treatment in clinical trials. In [11,12], manual and automated BSI measurements were highly correlated (ρ = 0.80), and automated BSI scoring demonstrated reproducibility, eliminating the subjectivity of clinical judgment while retaining the same clinical significance as manual BSI scoring. Furthermore, some studies confirmed the utility of the aBSI in mCRPC patients [13,14,15], while others have begun to explore its application and refinement in other tumors [16].
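For illustration only, the area ratio in Figure 1 can be computed directly from binary masks. The following minimal Python sketch is not the authors’ implementation; the function name and the assumption that both inputs are same-sized binary masks are ours:

import numpy as np

def bone_scan_index(lesion_mask: np.ndarray, skeleton_mask: np.ndarray) -> float:
    """Pixel-area BSI sketch: lesion area divided by the area of the
    high-incidence skeletal regions (both inputs are binary masks)."""
    lesion = lesion_mask.astype(bool)
    skeleton = skeleton_mask.astype(bool)
    lesion_px = np.count_nonzero(lesion & skeleton)  # count only lesion pixels inside the skeletal regions
    skeleton_px = np.count_nonzero(skeleton)
    return lesion_px / skeleton_px if skeleton_px else 0.0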
Generally, computer-assisted diagnosis (CAD) systems that use machine learning or neural network (NN) frameworks to calculate the BSI on WBBS images can be divided into two parts: lesion segmentation and skeleton segmentation, which provide the numerator and denominator of the BSI value, respectively [17,18,19,20]. Recently, numerous studies [21,22] and related patents [23,24] on lesion segmentation with NN frameworks have been published. However, the performance of pixel-wise lesion segmentation has not been thoroughly and rigorously investigated. Similarly, research on skeleton segmentation using deep learning and NN models is scarce [20,25]: although [20] mentions a skeleton segmentation approach, it lacks comparison with other NN models, and although [25] compared its performance with U-Net, it remained confined to traditional semantic segmentation architectures. Thus, skeleton segmentation with NNs remains insufficiently explored. This paper applies different NN models to skeleton segmentation on WBBS images and investigates their results. Additionally, we have built a website platform for online skeleton segmentation of WBBS images (Appendix A), which provides skeleton segmentation data for further evaluation of the BSI.

2. Materials and Methods

2.1. Materials

In this retrospective study in collaboration with the Department of Nuclear Medicine at China Medical University Hospital, 196 WBBS images of patients with prostate cancer were collected. Among the 196 patients, 110 patients had bone metastasis, and 86 patients had no evidence of bone metastasis. We also collected 163 WBBS images of patients with breast cancer. All of them had bone metastasis. The study was approved by the Institutional Review Board (IRB) and the Hospital Research Ethics Committee (CMUH106-REC2-130) of China Medical University.
The radiopharmaceutical used for WBBS was Tc-99m MDP, and imaging was performed 4 h after intravenous injection. A gamma camera (Millennium MG, Infinia Hawkeye 4, or Discovery NM/CT 670 system; GE Healthcare, Waukesha, WI, USA) was used for planar bone scanning, with a low-energy high-resolution or general-purpose collimator, a matrix size of 1024 × 256, a photon energy centered on the 140 keV peak, and a symmetric 20% energy window. The collected bone scan images were in DICOM format, with a spatial resolution of 1024 × 512 pixels (composed of the anterior-posterior (AP) and posterior-anterior (PA) views), and the intensity of each pixel was stored as a 2-byte unsigned integer (uint16). The images were preprocessed using the dedicated GE Xeleris workstation (GE Medical Systems, Haifa, Israel; version 3.1) before being uploaded to PACS.
A standard WBBS image contains two views: anterior and posterior. The original DICOM images were first converted to PNG format after removing any identifiable information. Following the approach described in [22], pre-processing was performed by normalizing the image size and intensity. Afterwards, the anterior and posterior views were cropped and combined into a single image of size 950 × 512, without any scaling or geometric transformation, as shown in Figure 2.
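As a rough illustration of this pre-processing chain, the sketch below reads a WBBS DICOM file, normalizes the intensity, and crops it to 950 × 512; pydicom, the max-based normalization, and the crop offsets are our assumptions, since the paper does not specify them:

import numpy as np
import pydicom

def load_and_preprocess(dicom_path: str) -> np.ndarray:
    """Read a 1024 x 512 WBBS (AP and PA views side by side), normalize the
    intensity to [0, 255], and crop the rows to obtain a 950 x 512 image."""
    ds = pydicom.dcmread(dicom_path)
    img = ds.pixel_array.astype(np.float32)      # uint16 counts -> float
    if img.max() > 0:
        img = 255.0 * img / img.max()            # assumed intensity normalization scheme
    return img[:950, :].astype(np.uint8)         # crop offsets are assumptions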

2.2. Region Definition

To identify the skeletal regions where bone metastases occur most frequently, we consulted two experienced nuclear medicine physicians and established labeling standards, which required the approval of both board-certified nuclear medicine physicians. The regions are the skull, spine, chest (including ribs, scapula, and clavicle), humerus (proximal end to midshaft of the humerus), femurs (proximal end to midshaft of the femurs), and pelvis.
The position of the humerus on the images varies considerably, as shown in Figure 2. Unlike the femurs, which form a single category, we divide the humerus into four categories: the left and right humerus in the anterior and posterior views. The reason for this is addressed in the Discussion. Furthermore, Tc-99m MDP undergoes renal excretion, which can make the kidneys appear as high-signal areas; in some situations, the kidney can be misclassified as metastasis. To alleviate this problem, we created an extra kidney category to exclude this ambiguity.
In summary, there are ten categories in total (Figure 3): the skull, spine, chest (including ribs, scapula, and clavicle), anterior right humerus (AR), anterior left humerus (AL), posterior right humerus (PR), posterior left humerus (PL), femurs (proximal end to midshaft of the femurs), pelvis, and kidney.

2.3. Neural Network Architectures

Three different neural network architectures were tested: Mask R-CNN [26], Double U-Net [27], and Deeplabv3 plus [28]. We used similar hyperparameters across the three models to compare their performance.
The Mask R-CNN architecture shown in Figure 4 comprises four main parts: the backbone, the region proposal network (RPN), RoIAlign, and the head. We used ResNet-50 as the backbone. The hyperparameters were a learning rate of 0.005, a batch size of 4, and 100 epochs.
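The paper does not name the software framework; as one possible realization, torchvision’s off-the-shelf Mask R-CNN with a ResNet-50 FPN backbone can be configured with the hyperparameters above (the SGD momentum and the data loader are our assumptions):

import torch
import torchvision

# Sketch only: 10 skeletal/kidney categories plus background = 11 classes.
model = torchvision.models.detection.maskrcnn_resnet50_fpn(weights=None, num_classes=11)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)  # lr from the paper; momentum assumed

# Training step sketch: `loader` yields (images, targets) with boxes, labels, and masks.
# for images, targets in loader:                 # batch size 4, 100 epochs in the paper
#     loss_dict = model(images, targets)         # dict of RPN, box, and mask losses
#     loss = sum(loss_dict.values())
#     optimizer.zero_grad(); loss.backward(); optimizer.step()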
The Double U-Net architecture shown in Figure 5 comprises two sub-networks with dilated convolutions, spatial pyramid pooling, and an SE block. It was originally designed for binary classification; here we modified it for multi-class classification by changing the output layer of Network 1 to a SoftMax activation function. The hyperparameters were a learning rate of 0.0005, a batch size of 4, and either 200 epochs (without data augmentation) or 20 epochs (with data augmentation).
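A minimal sketch of such an output-layer change, written here in PyTorch purely for illustration (the original Double U-Net implementation is not ours, and the feature width of 64 is an assumption), replaces the binary output with an 11-channel SoftMax head:

import torch.nn as nn

NUM_CLASSES = 11  # 10 skeletal/kidney categories + background

# The original Double U-Net ends in a Sigmoid for binary masks; for multi-class
# segmentation, the head becomes a 1x1 convolution with one channel per class
# followed by SoftMax over the channel axis.
multiclass_head = nn.Sequential(
    nn.Conv2d(in_channels=64, out_channels=NUM_CLASSES, kernel_size=1),
    nn.Softmax(dim=1),
)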
The Deeplabv3 plus architecture shown in Figure 6 includes an encoder, a decoder, dilated convolutions, and depth-wise separable convolutions. We used ResNet-50 as the encoder backbone. The hyperparameters were a learning rate of 0.0005, a batch size of 4, and 200 epochs.
The learning rate is a hyperparameter used in many machine learning algorithms, particularly in gradient-based optimization; it determines the step size at which the model updates its parameters during training. The choice of learning rate depends on the specific problem, and every model typically has its own suggested value. In this study, we chose learning rates that balance accuracy against training speed: Mask R-CNN uses a learning rate of 0.005, while Double U-Net and DeeplabV3 plus use a learning rate of 0.0005.

2.4. Image Pre-Processing

The input matrix size for Mask R-CNN was 950 × 512. For Double U-Net and Deeplabv3 plus, the input matrix size was adjusted to 960 × 512 by zero-padding, owing to their architectural restrictions. The labels were saved in PNG format with integer values ranging from 0 to 10.
Augmentation included rotations (−3°, 0°, 3°; step 1°), scaling (0.9, 1, 1.1; step 0.1), and brightness adjustments (0.8, 0.93, 1.06, 1.19, 1.32, 1.45, 1.58, and 1.7 times). The augmented images had the same matrix size as the original images, yielding a 63-fold increase in training data. Augmentation was applied only to the training set.
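The sketch below shows one way to enumerate such an augmentation grid; the parameter values follow the paper, while the interpolation order and the crop/pad scheme used to keep the original matrix size are our assumptions:

from itertools import product

import numpy as np
from scipy import ndimage

ROTATIONS = (-3, 0, 3)      # degrees
SCALES = (0.9, 1.0, 1.1)
BRIGHTNESS = (0.8, 0.93, 1.06, 1.19, 1.32, 1.45, 1.58, 1.7)

def augment(image: np.ndarray):
    """Yield augmented copies of a training image with the original matrix size."""
    image = image.astype(np.float32)
    h, w = image.shape
    for angle, scale, gain in product(ROTATIONS, SCALES, BRIGHTNESS):
        out = ndimage.rotate(image, angle, reshape=False, order=1)
        out = ndimage.zoom(out, scale, order=1)
        canvas = np.zeros((h, w), dtype=np.float32)          # pad or crop back to h x w
        ch, cw = min(h, out.shape[0]), min(w, out.shape[1])
        canvas[:ch, :cw] = out[:ch, :cw]
        yield np.clip(canvas * gain, 0, 255)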

2.5. Evaluations

In this study, the terms true positive (TP), false positive (FP), true negative (TN), and false negative (FN) were defined in pixel scale. The evaluation metrics used in the experiment were precision (Equation (1)) and sensitivity (Equation (2)), and the overall model evaluation was based on the F1 score (Equation (3)).
Precision = True positive/(True positive + False positive), (1)
Sensitivity = True positive/(True positive + False negative), (2)
F1 score = 2 × (Precision × Sensitivity)/(Precision + Sensitivity). (3)
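Because the metrics are defined pixel-wise, they can be evaluated per category directly from the predicted and ground-truth label maps, as in the following sketch (the function name and inputs are illustrative):

import numpy as np

def pixel_metrics(pred: np.ndarray, gt: np.ndarray, label: int):
    """Pixel-wise precision, sensitivity, and F1 score for one category, per Equations (1)-(3)."""
    p, g = (pred == label), (gt == label)
    tp = np.count_nonzero(p & g)
    fp = np.count_nonzero(p & ~g)
    fn = np.count_nonzero(~p & g)
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * sensitivity / (precision + sensitivity) if precision + sensitivity else 0.0
    return precision, sensitivity, f1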

3. Results

3.1. 10-Fold Cross-Validation

In this study, the three models were validated with 10-fold cross-validation. The two datasets comprised 196 prostate cancer and 163 breast cancer WBBS images, respectively. The ratio of training, validation, and test data was 8:1:1. The main goal of this experiment was to compare the performance differences among the networks and to evaluate the impact of prostate and breast cancer WBBS images on network training. The hyperparameters used in the experiment are listed in Table 1; the results are reported in Table 2 and Table 3 and compared in Table 4. Qualitative bone segmentation results are shown in Figure 7 and Figure 8.
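The paper does not publish its splitting code; a generic sketch using scikit-learn’s KFold, with a validation subset carved out of each training fold to approximate the 8:1:1 ratio, could look as follows (the random seeds are assumptions):

import numpy as np
from sklearn.model_selection import KFold

indices = np.arange(196)                                   # e.g., the prostate cancer WBBS images
kfold = KFold(n_splits=10, shuffle=True, random_state=0)   # seed is an assumption
for fold, (train_val_idx, test_idx) in enumerate(kfold.split(indices)):
    rng = np.random.default_rng(fold)
    train_val_idx = rng.permutation(train_val_idx)
    n_val = len(indices) // 10                             # roughly one tenth for validation
    val_idx, train_idx = train_val_idx[:n_val], train_val_idx[n_val:]
    # train on train_idx, tune on val_idx, and report metrics on test_idx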

3.2. 10-Fold Cross-Validation with Data Augmentation

After the above experiments, we chose Double U-Net for further investigation because it slightly outperformed the others in F1-score. We then fine-tuned the number of epochs to trade off training time against performance and to see what best performance we could reach. The training images for prostate cancer and breast cancer were augmented 63-fold using the rotation, scaling, and brightness adjustments described in the Methods. The hyperparameters are listed in Table 5, and the quantitative results of the 10-fold cross-validation are given in Table 6.

4. Discussion

This study utilized Mask R-CNN, Double U-Net, and DeeplabV3 plus for skeleton segmentation on prostate cancer and breast cancer WBBS images. The quantitative results were investigated via 10-fold cross-validation. Based on the quantitative findings, Mask R-CNN exhibited higher precision than Double U-Net by 2.03% in the prostate cancer dataset and 1.84% in the breast cancer dataset, and higher precision than DeeplabV3 plus by 3.23% in the prostate dataset and 2.31% in the breast dataset. On the other hand, Double U-Net (90.70% and 88.86%) demonstrated higher sensitivity than Mask R-CNN (87.02% and 85.51%) and DeeplabV3 plus (88.64% and 85.71%). This indicated that Mask R-CNN had fewer false positives (FP) during prediction, while Double U-Net had fewer false negatives (FN).
To better understand these results, we visualized the predictions, where white color represented TP, green color represented FP and red color represented FN (as shown in Figure 9 and Figure 10). Mask R-CNN’s predictions shifted inward slightly compared to the ground truth (GT), resulting in more FN in the edge regions and only a few FP. Double U-Net’s predictions aligned well with the GT along the edges, resulting in slightly fewer FN but more FP. DeeplabV3 plus exhibited irregularities along the edges compared to the other two models, leading to noticeable erroneous FP and an overall increase in FP.
These findings shed light on the performance of different models for skeleton segmentation, emphasizing the trade-off between FP and FN. Further improvements can be explored to address the limitations observed, particularly in the case of DeeplabV3 plus, to enhance its stability and accuracy.
Further investigation of the Mask R-CNN results revealed an increase in false negatives (FN) when predicting smaller categories, such as the humerus and kidneys, as shown in Figure 11a. This result could be attributed to the following reasons:
First, insufficient brightness in the WBBS image may hinder feature detection. The brightness of WBBS images depends on the counts collected by the scintillation crystal, which can be influenced by factors such as patient thickness and radiopharmaceutical activity. When the received counts are insufficient, resulting in inadequate image brightness, deep neural network models may struggle to make accurate judgments or may even make errors. Adjusting the image brightness and retesting can help alleviate this situation, as shown in Figure 11b.
Second, abnormal patient positioning in the WBBS image can cause another issue. In a few instances, patient positioning deviates to some extent from standard clinical positions. This deviation creates challenges for CNN prediction, as shown in Figure 12. The degree of deviation is closely related to the patient’s clinical condition and is difficult to avoid entirely in clinical practice. While previous studies might manually exclude misleading images to prevent such occurrences, this study aimed to maintain a dataset that reflects real clinical scenarios; therefore, we did not exclude any cases. To enhance the network’s ability to predict WBBS images with unusual positioning, future work could employ hard negative mining techniques to improve the model’s generalization capability.
Third, the model’s insensitivity to the features of small objects in WBBS images could also decrease performance. Quantitative results indicated relatively low precision for categories corresponding to smaller objects, such as the upper limbs, femurs, and kidneys. This suggests that Mask R-CNN faces certain difficulties in segmenting smaller regions.
These findings highlighted specific challenges encountered during the skeleton segmentation process, particularly related to image brightness, abnormal patient positioning, and the segmentation of smaller objects. Addressing these challenges could improve the performance of the Mask R-CNN model.
On the other hand, we observed that DeeplabV3 plus and Double U-Net tended to mix categories, resulting in unstable performance. Double U-Net and DeeplabV3 plus did not exhibit the missing-category issue observed in Mask R-CNN, but they experienced problems such as category confusion and masks appearing in unintended areas, with DeeplabV3 plus being particularly affected. The issue of category confusion during prediction in semantic segmentation architectures was not explicitly mentioned in [20,25]; however, we did observe this problem in our experiments. Figure 13a shows an incorrect segmentation in the knee area of a Double U-Net result, while Figure 13b depicts category confusion in the upper limbs and head in a DeeplabV3 plus result.
This behavior stems from the different network architectures. Mask R-CNN uses parallel branch networks to determine categories independently and to select the appropriate mask for each region of interest (ROI). Consequently, different ROIs can be distinguished independently, and masks can be treated as separate entities. In contrast, traditional fully convolutional network (FCN) architectures perform category and mask prediction simultaneously, leading to competition between categories and masks. Additionally, because of the one-category-per-mask design, FCN-based methods cannot treat ROIs independently. Another critical factor is the use of a Sigmoid activation function and an average binary cross-entropy loss in the mask branch, which mitigates the adverse effects of cross-category competition encountered in traditional FCN methods. This design yields excellent instance segmentation results and avoids category overlap or confusion. From our experiments, Mask R-CNN proved more suitable for skeleton segmentation in WBBS images than the other two architectures.
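The difference can be illustrated with two toy loss computations (shapes and values are illustrative only): an FCN-style head couples all categories through a per-pixel SoftMax cross-entropy, whereas the Mask R-CNN mask head scores each class with an independent Sigmoid and computes binary cross-entropy only on the ground-truth class channel:

import torch
import torch.nn.functional as F

# FCN-style head: one logit map per class; classes compete through the per-pixel SoftMax.
fcn_logits = torch.randn(1, 11, 960, 512)
fcn_target = torch.zeros(1, 960, 512, dtype=torch.long)    # dummy ground-truth label map
fcn_loss = F.cross_entropy(fcn_logits, fcn_target)

# Mask R-CNN mask head: per-ROI, per-class 28 x 28 mask logits; only the ground-truth
# class channel contributes, so categories do not compete within the mask branch.
roi_mask_logits = torch.randn(11, 28, 28)
gt_class = 3
gt_mask = torch.randint(0, 2, (28, 28)).float()
mask_loss = F.binary_cross_entropy_with_logits(roi_mask_logits[gt_class], gt_mask)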
From the experiments shown in Table 2, Table 3 and Table 4, one might think that the models’ performances are close to each other and that there might not be a statistically significant difference. However, it is crucial to consider the context of image segmentation in deep learning: precision and sensitivity are calculated pixel-wise, so even a small difference in percentage points can have a significant impact.
In the experiments involving data augmentation, we observed only a slight performance improvement. Because the model already performed reasonably well without augmentation, adding it led to marginal gains. According to the related literature [29], incorporating data augmentation helps reduce overfitting at higher learning rates, allowing the model to be trained for more epochs without sacrificing accuracy. Further experiments are warranted to explore the impact of data augmentation in more depth.
The limitations of this study are the scarcity of original data and the homogeneity of its source. In the future, it is desirable to establish collaborations with other medical centers to acquire cross-center data, thereby improving the performance and generalization ability of the models. Additionally, we investigated only three relatively common network architectures; exploring newer architectures, such as transformer-based networks, would be an attractive research direction. Different nuclear medicine imaging modalities, such as planar imaging and SPECT, produce different images, and it would be worth investigating whether these differences lead to heterogeneity in model predictions. This is an area for future exploration.

5. Conclusions

In this study, we investigated three CNN models for bone segmentation of WBBS images. We found that only one model, Mask R-CNN, was suitable for this goal; Double U-Net and Deeplabv3 plus suffered from ‘category confusion’, a mistake a human reader would never make. We examined model performance on a pixel-wise scale. The best performance achieved with Mask R-CNN was a precision, sensitivity, and F1-score of 0.93, 0.87, and 0.90 for the prostate cancer dataset and 0.92, 0.86, and 0.88 for the breast cancer dataset, averaged over 10-fold cross-validation.

Author Contributions

Conceptualization, D.-C.C.; methodology, D.-C.C.; software, P.-N.Y. and Y.-Y.C.; validation, P.-N.Y. and Y.-Y.C.; formal analysis D.-C.C.; investigation, Y.-C.L.; resources, Y.-C.L. and D.-C.C.; data curation, P.-N.Y. and Y.-Y.C.; writing—original draft preparation, P.-N.Y.; writing—review and editing, D.-C.C.; visualization, D.-C.C.; supervision, D.-C.C.; project administration, D.-C.C.; funding acquisition, D.-C.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Science and Technology Council (NSTC), Taiwan, grant number MOST 111-2314-B-039-040.

Institutional Review Board Statement

The study was approved by the Institutional Review Board (IRB) and the Hospital Research Ethics Committee (CMUH106-REC2-130, approved on 27 September 2017) of China Medical University.

Informed Consent Statement

Patient consent was waived by the IRB because this is a retrospective study and only de-identified images were used.

Data Availability Statement

Not applicable.

Acknowledgments

We thank National Center for High-performance Computing (NCHC) for providing computational and storage resources.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

Based on the research results described above, we established a skeleton segmentation website equipped with the deep learning framework, enabling clinical physicians to use its functions online to assist in calculating the BSI and conducting clinical diagnoses, thereby achieving the purpose of this research. The website allows clinical physicians to upload images, perform simple post-processing on them within the website, and finally run the deep learning model for skeleton segmentation. The public IP address of the website is 140.128.65.129, and login credentials are required (username: wbbsweb, password: wbbswebpass).

References

1. Coleman, R. Metastatic bone disease: Clinical features, pathophysiology, and treatment strategies. Cancer Treat. Rev. 2001, 27, 165–176.
2. National Health Insurance Research Database. Available online: https://www.mohw.gov.tw/cp-16-70314-1.html (accessed on 12 May 2022).
3. O’Sullivan, G.J.; Carty, F.L.; Cronin, C.G. Imaging of bone metastasis: An update. World J. Radiol. 2015, 7, 202–211.
4. Coleman, R.E. Clinical features of metastatic bone disease and risk of skeletal morbidity. Clin. Cancer Res. 2006, 12, 6243–6249.
5. Brenner, A.I.; Koshy, J.; Morey, J.; Lin, C.; DiPoce, J. The bone scan. Semin. Nucl. Med. 2012, 42, 11–26.
6. Imbriaco, M.; Larson, S.M.; Yeung, H.W.; Mawlawi, O.R.; Erdi, Y.; Venkatraman, E.S.; Scher, H.I. A new parameter for measuring metastatic bone involvement by prostate cancer: The Bone Scan Index. Clin. Cancer Res. 1998, 4, 1765–1772.
7. Dennis, E.R.; Jia, X.; Mezheritskiy, I.S.; Stephenson, R.D.; Schoder, H.; Fox, J.J.; Heller, G.; Scher, H.I.; Larson, S.M.; Morris, M.J. Bone scan index: A quantitative treatment response biomarker for castration-resistant metastatic prostate cancer. J. Clin. Oncol. 2012, 30, 519–524.
8. Anand, A.; Morris, M.J.; Kaboteh, R.; Båth, L.; Sadik, M.; Gjertsson, P.; Lomsky, M.; Edenbrandt, L.; Minarik, D.; Bjartell, A. Analytic validation of the automated bone scan index as an imaging biomarker to standardize quantitative changes in bone scans of patients with metastatic prostate cancer. J. Nucl. Med. 2016, 57, 41–45.
9. Nakajima, K.; Edenbrandt, L.; Mizokami, A. Bone scan index: A new biomarker of bone metastasis in patients with prostate cancer. Int. J. Urol. 2017, 24, 668–673.
10. Armstrong, A.J.; Nordle, O.; Morris, M. Assessing the Prognostic Value of the Automated Bone Scan Index for Prostate Cancer—Reply. JAMA Oncol. 2019, 5, 270–271.
11. Ulmert, D.; Kaboteh, R.; Fox, J.J.; Savage, C.; Evans, M.J.; Lilja, H.; Abrahamsson, P.A.; Björk, T.; Gerdtsson, A.; Bjartell, A.; et al. A novel automated platform for quantifying the extent of skeletal tumour involvement in prostate cancer patients using the Bone Scan Index. Eur. Urol. 2012, 62, 78–84.
12. Reza, M.; Kaboteh, R.; Sadik, M.; Bjartell, A.; Wollmer, P.; Trägårdh, E. A prospective study to evaluate the intra-individual reproducibility of bone scans for quantitative assessment in patients with metastatic prostate cancer. BMC Med. Imaging 2018, 18, 8.
13. Armstrong, A.J.; Anand, A.; Edenbrandt, L.; Bondesson, E.; Bjartell, A.; Widmark, A.; Sternberg, C.N.; Pili, R.; Tuvesson, H.; Nordle, O. Phase 3 assessment of the automated bone scan index as a prognostic imaging biomarker of overall survival in men with metastatic castration-resistant prostate cancer: A secondary analysis of a randomized clinical trial. JAMA Oncol. 2018, 4, 944–951.
14. Anand, A.; Morris, M.J.; Kaboteh, R.; Reza, M.; Trägårdh, E.; Matsunaga, N.; Edenbrandt, L.; Bjartell, A.; Larson, S.M.; Minarik, D. A preanalytic validation study of automated bone scan index: Effect on accuracy and reproducibility due to the procedural variabilities in bone scan image acquisition. J. Nucl. Med. 2016, 57, 1865–1871.
15. Reza, M.; Wirth, M.; Tammela, T.; Cicalese, V.; Veiga, F.G.; Mulders, P.; Miller, K.; Tubaro, A.; Debruyne, F.; Patel, A.; et al. Automated bone scan index as an imaging biomarker to predict overall survival in the Zometa European Study/SPCG11. Eur. Urol. Oncol. 2021, 4, 49–55.
16. Wuestemann, J.; Hupfeld, S.; Kupitz, D.; Genseke, P.; Schenke, S.; Pech, M.; Kreissl, M.C.; Grosser, O.S. Analysis of bone scans in various tumor entities using a deep-learning-based artificial neural network algorithm—Evaluation of diagnostic performance. Cancers 2020, 12, 2654.
17. Yoshida, A.; Higashiyama, S.; Kawabe, J. Assessment of software for semi-automatically calculating the bone scan index on bone scintigraphy scans. Clin. Imaging 2021, 78, 14–18.
18. Koizumi, M.; Wagatsuma, K.; Miyaji, N.; Murata, T.; Miwa, K.; Takiguchi, T.; Makino, T.; Koyama, M. Evaluation of a computer-assisted diagnosis system, BONENAVI version 2, for bone scintigraphy in cancer patients in a routine clinical setting. Ann. Nucl. Med. 2015, 29, 138–148.
19. Koizumi, M.; Miyaji, N.; Murata, T.; Motegi, K.; Miwa, K.; Koyama, M.; Terauchi, T.; Wagatsuma, K.; Kawakami, K.; Richter, J. Evaluation of a revised version of the computer-assisted diagnosis system, BONENAVI version 2.1.7, for bone scintigraphy in cancer patients. Ann. Nucl. Med. 2015, 29, 659–665.
20. Shimizu, A.; Wakabayashi, H.; Kanamori, T.; Saito, A.; Nishikawa, K.; Daisaki, H.; Higashiyama, S.; Kawabe, J. Correction to: Automated measurement of bone scan index from a whole-body bone scintigram. Int. J. Comput.-Assist. Radiol. Surg. 2020, 15, 401.
21. Cheng, D.C.; Liu, C.C.; Hsieh, T.C.; Yen, K.Y.; Kao, C.H. Bone metastasis detection in the chest and pelvis from a whole-body bone scan using deep learning and a small dataset. Electronics 2021, 10, 1201.
22. Cheng, D.C.; Hsieh, T.C.; Yen, K.Y.; Kao, C.H. Lesion-based bone metastasis detection in chest bone scintigraphy images of prostate cancer patients using pre-train, negative mining, and deep learning. Diagnostics 2021, 11, 518.
23. Cheng, D.C.; Liu, C.C.; Kao, C.H.; Hsieh, T.C. System of Deep Learning Neural Network in Prostate Cancer Bone Metastasis Identification Based on Whole Body Bone Scan Images. U.S. Patent US11488303B2, 1 November 2022.
24. Brown, M.S. Computer-Aided Bone Scan Assessment with Automated Lesion Detection and Quantitative Assessment of Bone Disease Burden Changes. U.S. Patent US20140105471, 7 April 2015.
25. Huang, K.B.; Huang, S.G.; Chen, G.J.; Li, X.; Li, S.; Liang, Y.; Gao, Y. An end-to-end multi-task system of automatic lesion detection and anatomical localization in whole-body bone scintigraphy by deep learning. Bioinformatics 2023, 39, btac753.
26. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. arXiv 2018, arXiv:1703.06870v3.
27. Jha, D.; Riegler, M.A.; Johansen, D.; Halvorsen, P.; Johansen, H.D. Double U-Net: A Deep Convolutional Neural Network for Medical Image Segmentation. arXiv 2020, arXiv:2006.04868.
28. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. arXiv 2018, arXiv:1802.02611v3.
29. Bhuse, P.; Singh, B.; Raut, P. Effect of data augmentation on the accuracy of convolutional neural networks. In Information and Communication Technology for Competitive Strategies (ICTCS 2020): ICT Applications and Social Interfaces; Springer: Berlin/Heidelberg, Germany, 2020; pp. 337–348.
Figure 1. (a) represents a WBBS, (b) depicts the skeleton regions with a high incidence of bone metastasis, and (c) indicates the areas where bone metastasis is present. The BSI in (a) corresponds to the area ratio of (c) to (b).
Figure 2. Two WBBS, (a) has bone metastasis and (b) has no metastasis.
Figure 3. (a) shows the original image, while (b) displays the ground truth of the bone metastasis-prone regions and 10 (+1 background) categories with different colors.
Figure 4. The architecture of Mask R-CNN having multi-class classification.
Figure 5. The architecture of Double U-Net is comprised of two sub-networks. To enable multi-class classification, we modified the output of Network 1 and the input of Network 2.
Figure 6. The architecture of the DeeplabV3 plus using ResNet-50 as the backbone.
Figure 7. The qualitative results of three models on prostate cancer WBBS images: (a) Mask R-CNN, (b) Double U-Net, and (c) DeeplabV3 plus.
Figure 8. The qualitative results of three models on breast cancer WBBS images: (a) Mask R-CNN, (b) Double U-Net, and (c) DeeplabV3 plus.
Figure 9. Qualitative comparisons on three models: (a) Mask R-CNN, (b) Double U-Net, (c) DeeplabV3 plus. White: TP, red: FP, and green: FN.
Figure 10. Qualitative comparisons on three models: (a) Mask R-CNN, (b) Double U-Net, (c) DeeplabV3 plus.
Figure 11. (a) Original test segmentation result with missing right humerus in the frontal view. (b) Segmentation result after adjusting the brightness to 2.5 times and retest.
Figure 12. A segmentation result showed the absence of the frontal and dorsal left femur due to abnormal patient position, while this abnormal position was rare and did not exist in the training dataset.
Figure 13. (a) Segmentation result of Double U-Net with a segmentation error in the distal part of the leg. (b) Segmentation result of Deeplabv3 plus showing category confusion in the upper limbs and head region.
Table 1. Hyperparameters used for the 10-fold cross-validation experiments with each neural network.

Hyperparameter | Mask R-CNN | Double U-Net | DeeplabV3 Plus
Learning Rate | 0.005 | 0.0005 | 0.0005
Batch Size | 4 | 4 | 4
Epochs | 100 | 200 | 200
Table 2. Comparison of 10-fold cross-validation results on the prostate cancer WBBS image dataset (Precision / Sensitivity, %).

Category | Mask R-CNN | Double U-Net | DeeplabV3 Plus
Skull | 97.22 / 94.43 | 96.05 / 96.13 | 95.34 / 95.91
Spine | 93.90 / 88.62 | 91.16 / 91.30 | 89.94 / 89.79
Chest | 95.33 / 93.58 | 94.83 / 94.52 | 93.61 / 93.87
AR_humerus | 91.82 / 84.80 | 89.65 / 90.18 | 87.42 / 87.88
AL_humerus | 92.46 / 85.30 | 89.76 / 90.12 | 87.94 / 89.02
PR_humerus | 91.72 / 84.41 | 88.68 / 89.55 | 85.77 / 87.25
PL_humerus | 89.94 / 82.01 | 87.89 / 88.78 | 87.50 / 83.64
Pelvis | 92.32 / 88.26 | 90.76 / 90.83 | 90.99 / 87.84
Femurs | 88.40 / 81.75 | 86.08 / 84.85 | 85.59 / 82.60
Kidney | 86.13 / 79.23 | 82.45 / 82.73 | 80.15 / 81.87
Average | 91.93 / 86.24 | 89.73 / 89.90 | 88.43 / 87.97
Average (w/o kidney) | 92.57 / 87.02 | 90.54 / 90.70 | 89.34 / 88.64
The F1 scores are 89.71, 90.62, and 88.99 for Mask R-CNN, Double U-Net, and DeeplabV3 Plus, respectively. Double U-Net has the best F1-score.
Table 3. Comparison of 10-fold cross-validation results on the breast cancer WBBS image dataset (Precision / Sensitivity, %).

Category | Mask R-CNN | Double U-Net | DeeplabV3 Plus
Skull | 97.24 / 94.23 | 96.18 / 95.88 | 95.91 / 93.24
Spine | 93.20 / 88.61 | 91.15 / 90.68 | 90.56 / 87.76
Chest | 95.17 / 93.48 | 94.10 / 94.32 | 92.78 / 93.40
AR_humerus | 89.67 / 80.23 | 87.21 / 86.01 | 85.88 / 81.90
AL_humerus | 89.07 / 81.20 | 86.44 / 84.97 | 87.15 / 80.26
PR_humerus | 89.65 / 82.10 | 87.58 / 86.46 | 85.41 / 83.66
PL_humerus | 88.28 / 80.08 | 87.34 / 86.39 | 86.62 / 81.92
Pelvis | 92.22 / 88.27 | 90.86 / 90.24 | 91.34 / 87.54
Femurs | 89.95 / 81.39 | 87.05 / 84.83 | 88.06 / 81.71
Kidney | 87.21 / 80.71 | 84.37 / 83.74 | 83.91 / 77.59
Average | 91.17 / 85.03 | 89.23 / 88.35 | 88.76 / 84.90
Average (w/o kidney) | 91.61 / 85.51 | 89.77 / 88.86 | 89.30 / 85.71
The F1-scores are 88.45, 89.31, and 87.47 for Mask R-CNN, Double U-Net, and DeeplabV3 Plus, respectively. Double U-Net has the best F1-score.
Table 4. Comparison of 10-fold cross-validation results on the two image datasets (Pre. / Sen. / F1-Score, %).

Database | Mask R-CNN | Double U-Net | DeeplabV3 Plus
Prostate cancer | 92.57 / 87.02 / 89.71 | 90.54 / 90.70 / 90.62 | 89.34 / 88.64 / 88.99
Breast cancer | 91.61 / 85.51 / 88.45 | 89.77 / 88.86 / 89.31 | 89.30 / 85.71 / 87.47
Pre. = Precision, Sen. = Sensitivity.
Table 5. Hyperparameters for training Double U-Net.

Hyperparameter | Double U-Net
Learning Rate | 0.0005
Batch Size | 4
Epochs | 20
Table 6. 10-fold cross-validation results for Double U-Net with data augmentation (Precision / Sensitivity, %).

Fold Number | Prostate | Breast
1 | 86.67 / 96.05 | 83.95 / 94.84
2 | 87.01 / 94.92 | 86.18 / 95.26
3 | 91.22 / 91.33 | 81.14 / 96.05
4 | 93.01 / 91.37 | 81.87 / 96.32
5 | 85.69 / 94.85 | 84.35 / 96.18
6 | 94.18 / 89.28 | 96.23 / 76.73
7 | 96.10 / 86.81 | 95.64 / 85.26
8 | 93.43 / 88.31 | 95.37 / 84.49
9 | 92.99 / 87.74 | 95.57 / 85.51
10 | 93.89 / 88.12 | 94.97 / 89.19
Average | 91.42 / 90.88 | 89.53 / 89.98
The F1 scores are 91.15 and 89.75, respectively.

Share and Cite

MDPI and ACS Style

Yu, P.-N.; Lai, Y.-C.; Chen, Y.-Y.; Cheng, D.-C. Skeleton Segmentation on Bone Scintigraphy for BSI Computation. Diagnostics 2023, 13, 2302. https://doi.org/10.3390/diagnostics13132302

AMA Style

Yu P-N, Lai Y-C, Chen Y-Y, Cheng D-C. Skeleton Segmentation on Bone Scintigraphy for BSI Computation. Diagnostics. 2023; 13(13):2302. https://doi.org/10.3390/diagnostics13132302

Chicago/Turabian Style

Yu, Po-Nien, Yung-Chi Lai, Yi-You Chen, and Da-Chuan Cheng. 2023. "Skeleton Segmentation on Bone Scintigraphy for BSI Computation" Diagnostics 13, no. 13: 2302. https://doi.org/10.3390/diagnostics13132302

