Impact of Denoising on Deep-Learning-Based Automatic Segmentation Framework for Breast Cancer Radiotherapy Planning

Simple Summary: We investigated the contouring data of organs at risk from 40 patients with breast cancer who underwent radiotherapy. The performance of denoising-based auto-segmentation was compared with manual segmentation and conventional deep-learning-based auto-segmentation without denoising. Denoising-based auto-segmentation achieved superior segmentation accuracy on the liver compared with AccuContour™-based auto-segmentation. This denoising-based auto-segmentation method could provide more precise contour delineation of the liver and reduce the clinical workload.

Abstract: Objective: This study aimed to investigate the segmentation accuracy of organs at risk (OARs) when denoised computed tomography (CT) images are used as input data for a deep-learning-based auto-segmentation framework. Methods: We used non-contrast enhanced planning CT scans from 40 patients with breast cancer. The heart, lungs, esophagus, spinal cord, and liver were manually delineated by two experienced radiation oncologists, each blinded to the other's contours. The CT images were denoised to increase the signal difference between structures of interest and unwanted noise in non-contrast CT, and the denoised images were used as input data for the AccuContour™ segmentation software. The accuracy of the segmentation was assessed using the Dice similarity coefficient (DSC), and the results were compared with those of conventional deep-learning-based auto-segmentation without denoising. Results: The average DSC outcomes were higher than 0.80 for all OARs except for the esophagus. AccuContour™-based and denoising-based auto-segmentation demonstrated comparable performance for the lungs and spinal cord but showed limited performance for the esophagus. For the liver, the improvement in DSC with denoising-based auto-segmentation was small but statistically significant compared with AccuContour™-based auto-segmentation (p < 0.05). Conclusions: Denoising-based auto-segmentation demonstrated satisfactory performance in automatic liver segmentation from non-contrast enhanced CT scans. Further external validation studies with larger cohorts are needed to verify the usefulness of denoising-based auto-segmentation.


Introduction
In radiotherapy planning, organs at risk (OARs) are manually delineated and carefully reviewed by physicians based on computed tomography (CT) scans. Accurate contouring of OARs is essential for precise radiotherapy.

Data and Delineation
Ethical approval for this study was obtained from the Institutional Review Board (IRB) of Yonsei University Health System, Gangnam Severance Hospital (Approval No.: 3-2021-0276). All methods were performed in accordance with the relevant guidelines and regulations. Due to the retrospective nature of this study, informed consent was waived by the IRB of Gangnam Severance Hospital. We used non-contrast planning CT scans of female patients with breast cancer who underwent modified radical mastectomy or breast-conserving surgery and received postoperative radiotherapy between 2019 and 2020 [37]. Forty patients were randomly chosen. The median age was 49 years (range, 30-77 years), and the median body mass index was 22 kg/m² (range, 17-32 kg/m²). There were 22 patients with left breast cancer and 18 patients with right breast cancer. No patient had previously undergone surgery on the lungs, heart, esophagus, spine, or upper abdominal organs at the time of the non-contrast planning CT. The CT images were acquired on a Siemens Sensation Open scanner (Siemens, Forchheim, Germany) with a tube voltage of 120 kVp and a slice thickness of 3 mm. Scans were conducted with tube current modulation, an adaptive method in which the current changes as the gantry rotates. We obtained 81-123 slices per patient. All patients were scanned in the supine position with a customized arm support using a breast board. In this study, the OARs included the heart, right and left lungs, esophagus, spinal cord, and liver. The contours were manually delineated by two experienced radiation oncologists, each of whom was blinded to the other's delineations.

Deep-Learning-Based Auto-Segmentation
Recently, various deep-learning-based auto-segmentation methods have been developed to assist with image segmentation tasks. Satisfactory organ segmentation results have been reported [1], and some commercial products have been implemented in clinics for CT-based automatic segmentation. In this study, the commercially available deep-learning contouring software AccuContour™ (Manteia Medical Technologies Co. Ltd., Xiamen, China) was used to generate the OAR contours required for treatment planning. It automatically segments OARs in the head and neck, thorax, abdomen, and pelvis for both males and females. AccuContour™ is based on a U-net model [38] pre-trained by the vendor. U-net is a fully convolutional network (FCN)-based model with an end-to-end scheme proposed for image segmentation [38][39][40]. In U-net, the path that captures the overall context of the image and the path that provides accurate localization are configured symmetrically. U-net extends the up-sampling and skip-connection concepts of FCNs, and with data augmentation it has demonstrated superior performance in several image segmentation problems using only a small amount of training data.
The data used to train the model were collected from multiple centers, and data cleaning was performed. Initial contours generated by the deep-learning model were corrected by postprocessing with graph-based models. The accuracy was further improved by combining local and global information from the image and the initial segmentation results. With these procedures, the contouring workload was reduced from hours to less than a minute for each patient. This segmentation technique was applied to delineate the OARs.

Anisotropic Total Variation Denoiser-Based Auto-Segmentation
An anisotropic total variation (ATV) denoiser [36] was applied to the CT images to increase the intensity difference between salient features and unwanted noise by incorporating the conduction coefficient used in the anisotropic diffusion filter [41]. Minimizing the ATV objective function preserves edges with high contrast relative to their surroundings and smooths noisy voxels with low contrast [42].
The ATV objective function, R(V), can be expressed as follows:
$$R(V) = \sum_{j} w_j \, D(V_j) \qquad (1)$$
where w_j is the anisotropic penalty, which assigns different weights to neighbors at the same distance, and D(V_j) is the discrete gradient transform with backward difference at the jth indexed value of the CT image. Index j identifies the voxel element in the CT image, and V(x,y) is the voxel element at the 2D position (x, y). Similarly, N_j represents the set of neighbors of the jth voxel element. We only considered the four first-order neighbors in this study. Empirically, the most meaningful results were obtained with the parameter δ set to the value at 80% of the cumulative distribution function (CDF) of the gradient computed at each voxel of a CT image. The ATV objective function in Equation (1) was minimized using the steepest gradient descent method with an adaptive step size. It is expressed as follows.
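The quantities described above can be illustrated with a short Python sketch. It is not the authors' implementation: the gradient-magnitude form used for D(V_j), the exponential form of the weight w_j (modeled on the conduction coefficient of anisotropic diffusion), and all function names are assumptions made for illustration. Only the backward difference, the four first-order neighbors, and the 80% point of the gradient CDF for δ are taken from the text.

```python
import numpy as np


def backward_gradient_magnitude(v):
    """D(V_j): gradient magnitude from backward differences (assumed form)."""
    v = np.asarray(v, dtype=float)
    dx = np.zeros_like(v)
    dy = np.zeros_like(v)
    dx[:, 1:] = v[:, 1:] - v[:, :-1]  # V(x, y) - V(x-1, y), columns are x
    dy[1:, :] = v[1:, :] - v[:-1, :]  # V(x, y) - V(x, y-1), rows are y
    return np.sqrt(dx ** 2 + dy ** 2)


def delta_from_gradient_cdf(v, fraction=0.8):
    """delta: gradient magnitude at the given fraction of the gradient CDF."""
    return float(np.quantile(backward_gradient_magnitude(v), fraction))


def anisotropic_weights(v, delta):
    """w_j averaged over the four first-order neighbors.

    ASSUMPTION: each neighbor contributes exp(-((V_j - V_m) / delta)^2),
    so large intensity jumps (edges) receive a small penalty weight.
    """
    if delta <= 0:
        raise ValueError("delta must be positive")
    v = np.asarray(v, dtype=float)
    p = np.pad(v, 1, mode="edge")  # replicate borders so every voxel has 4 neighbors
    center = p[1:-1, 1:-1]
    neighbors = (p[:-2, 1:-1], p[2:, 1:-1], p[1:-1, :-2], p[1:-1, 2:])
    w = np.zeros_like(v)
    for nb in neighbors:
        w += np.exp(-((center - nb) / delta) ** 2)
    return w / len(neighbors)


def atv_objective(v, delta):
    """R(V) = sum_j w_j * D(V_j)."""
    return float(np.sum(anisotropic_weights(v, delta) * backward_gradient_magnitude(v)))
```

With these assumed weights, a voxel on a sharp boundary contributes less to R(V) than its raw gradient would suggest, which is one way to obtain the edge-preserving behavior described above.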
where λ is an adaptive parameter that reduces the degree of smoothing as the iterations progress [27,31]. The square root of all voxel elements updated in each step is used to change λ gradually to smaller values as the number of iterations increases. A scaling parameter γ was used to escape local minima caused by sudden changes. This value starts at 1.0 and is multiplied by a constant (0.8) whenever R(V) in the current iteration step is greater than that in the preceding step. ∇R(V_j) is the gradient of the objective function R(V) at the jth indexed pixel [32]. The root-sum-square of the gradient calculated at all the pixels, |∇R(V)|, is required for the normalized gradient calculation [32]. The number of iterations is a tuning parameter of the gradient descent optimizer; in this study, it was set to 20. The parameters of the denoising method were determined by manual adjustment. The pseudo-code of the ATV denoiser is presented in Appendix A.
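The step-size logic described here can be sketched in Python as a normalized steepest descent loop. This is a generic illustration under stated assumptions, not the authors' code: the function name and arguments are hypothetical, the objective and gradient are supplied by the caller, the sketch subtracts the normalized gradient (the usual descent convention), and it omits the additional update of λ from the voxel changes. The shrink factor of 0.8, the scale that starts at 1.0, and the 20 iterations follow the text and Appendix A.

```python
import numpy as np


def normalized_descent(objective, gradient, v0, lam=1.0, n_iter=20, shrink=0.8):
    """Normalized steepest descent with a step that shrinks when R(V) increases."""
    v = np.asarray(v0, dtype=float).copy()
    scale = 1.0                      # starts at 1.0, multiplied by `shrink` on failure
    previous = objective(v)
    history = [previous]
    for _ in range(n_iter):
        g = gradient(v)
        norm = np.sqrt(np.sum(g ** 2))   # root-sum-square over all voxels
        if norm == 0.0:
            break                        # stationary point reached
        candidate = v - lam * g / norm
        current = objective(candidate)
        # Reduce the step while the objective is worse than in the preceding step.
        while current > previous and lam > 1e-12:
            scale *= shrink
            lam *= scale
            candidate = v - lam * g / norm
            current = objective(candidate)
        if current > previous:
            break                        # no acceptable step found
        v, previous = candidate, current
        history.append(previous)
    return v, history
```

Because every accepted step is required not to increase the objective, the recorded history is non-increasing, which makes the loop easy to check on a simple quadratic before it is applied to an image.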
The proposed image processing pipeline includes three steps. The first step is denoising: the ATV denoiser was applied to each image set to smooth noisy pixels while preserving the intensity of the edges. In the second step, the denoised CT images were used as input data for the AccuContour™ segmentation module based on the U-net model pre-trained by the vendor. The third step is contour generation: six auto-generated contour sets (heart, left lung, right lung, esophagus, spinal cord, and liver) were produced for each CT image set by the deep-learning-based auto-segmentation framework (Figure 1).

Quantitative Analysis
The noise power spectrum (NPS) was calculated using open-source software (imQuest, Duke University, Durham, NC, USA), which implements the methods described in AAPM TG233 [43,44], to assess the image quality characteristics with and without the ATV denoiser. The in-plane noise was evaluated using the two-dimensional NPS. For CT images, the NPS can be determined in structures with a homogeneous area. In this study, the liver was selected for NPS calculations, and overall frequencies were compared using 1D profiles.
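For readers unfamiliar with the NPS, the following Python sketch shows the standard ensemble estimate of a 2D NPS from square ROIs and its reduction to a 1D radial profile. It does not reproduce imQuest: the function names, the mean-only detrending, and the number of radial bins are assumptions chosen for illustration.

```python
import numpy as np


def nps_2d(rois, pixel_mm):
    """Ensemble 2D NPS (HU^2 mm^2) from ROIs of shape (n_roi, ny, nx)."""
    rois = np.asarray(rois, dtype=float)
    _, ny, nx = rois.shape
    # Subtract each ROI's mean so only the noise fluctuation remains.
    noise = rois - rois.mean(axis=(1, 2), keepdims=True)
    power = np.abs(np.fft.fft2(noise, axes=(1, 2))) ** 2
    nps = power.mean(axis=0) * (pixel_mm * pixel_mm) / (nx * ny)
    fx = np.fft.fftfreq(nx, d=pixel_mm)  # spatial frequencies in mm^-1
    fy = np.fft.fftfreq(ny, d=pixel_mm)
    return nps, fx, fy


def radial_profile(nps, fx, fy, n_bins=16):
    """Average the 2D NPS in rings of constant radial frequency."""
    fr = np.sqrt(fx[np.newaxis, :] ** 2 + fy[:, np.newaxis] ** 2)
    edges = np.linspace(0.0, fr.max(), n_bins + 1)
    idx = np.clip(np.digitize(fr.ravel(), edges) - 1, 0, n_bins - 1)
    sums = np.bincount(idx, weights=nps.ravel(), minlength=n_bins)
    counts = np.bincount(idx, minlength=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    valid = counts > 0
    return centers[valid], sums[valid] / counts[valid]
```

The peak frequency of a profile can then be read as the bin center at which the profile is largest, and integrating the 2D NPS over frequency recovers the noise variance of the ROIs, which is a convenient sanity check.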
The manual contours drawn by the two radiation oncologists were considered the ground truth in this study, against which the AccuContour™-based and denoising-based auto-segmentations were compared. The accuracy of both auto-segmentations was evaluated quantitatively using the Dice similarity coefficient (DSC), which measures the overlap of two volumes according to the following equation:
$$\mathrm{DSC} = \frac{2\,|A \cap B|}{|A| + |B|}$$
where A is the manual segmentation volume, and B is the auto-segmentation volume (AccuContour™ or denoising). The DSC ranges from 0 to 1, where 1 indicates complete overlap. We considered a DSC of 0.80 an acceptable match [45]. The Wilcoxon matched-pairs signed-rank test was used to evaluate differences in the DSC results, and statistical significance was defined as p < 0.05.
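As a concrete illustration of the DSC, the following Python sketch computes it for two binary masks on the same voxel grid. The function name and the convention of returning 1.0 when both masks are empty are assumptions made for this example.

```python
import numpy as np


def dice_similarity(mask_a, mask_b):
    """DSC = 2 |A ∩ B| / (|A| + |B|) for two binary masks of equal shape."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("masks must share the same voxel grid")
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # assumed convention: two empty masks agree completely
    return 2.0 * np.logical_and(a, b).sum() / total
```

For example, a manual contour of 8 voxels and an auto-contour of 12 voxels that contains it overlap in 8 voxels, giving 2 × 8 / (8 + 12) = 0.80, which is exactly the acceptance threshold used in this study.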

Results
Cancers 2022, 14, x 6 of 18
Figure 2 shows the NPS curves without and with the ATV denoiser using planning CT images from 40 patients with breast cancer. Three square ROIs were placed at different positions in the liver area with uniform magnitude, as shown in Figure 2a. The ROIs were extended to five adjacent consecutive slices contained within the liver area. The average NPS peak frequency was obtained as 0.127 mm⁻¹ without the denoiser and 0.035 mm⁻¹ for the ATV denoiser. The NPS peaks ranged from 209 to 957 HU² mm² without the denoiser and 66 to 481 HU² mm² for the ATV denoiser. As such, the NPS peak was on average lower with the ATV denoiser than without the denoiser. The peak spatial frequency values of the NPS for the ATV denoiser shifted to lower spatial frequencies in comparison to no denoiser. Numerically, the NPS average spatial frequencies were obtained as 0.142 mm⁻¹ for the ATV denoiser and 0.295 mm⁻¹ for no denoiser. Images with the ATV denoiser were smoothed out with a lower noise amplitude, as indicated by the average frequencies of the NPS curves, resulting in a monotonous texture.
The results of DSC versus the manual contours from radiation oncologists 1 and 2 are shown in Tables 1 and 2, respectively. The average DSC outcomes were higher than 0.80 in all OARs, except for the esophagus. The AccuContour™-based and denoising-based auto-segmentations of the esophagus were below acceptable standards. In a comparison of AccuContour™-based and denoising-based auto-segmentation, the differences were not statistically significant for the lungs, esophagus, or spinal cord (p > 0.05). The denoising-based auto-segmentations achieved superior segmentation accuracy on the liver and inferior segmentation accuracy on the heart compared with AccuContour™-based auto-segmentations (p < 0.05).

Discussion
In this study, we compared the auto-contouring results for five organs obtained using the commercial deep-learning contouring program AccuContour™ with those obtained when an anisotropic total variation denoiser was applied to the CT images before segmentation. Both the AccuContour™-based and denoising-based auto-segmentations yielded acceptable accuracy for the contours of the heart, lungs, spinal cord, and liver. However, both techniques yielded limited performance for the esophagus.
Deep-learning algorithms based on convolutional neural networks, including AccuContour™, have yielded satisfactory performance for the automatic segmentation of OARs. However, some parts of the automatic liver segmentation in non-contrast CT required manual correction to be clinically acceptable (Figure 3). In non-contrast CT images, the boundaries between the liver and adjacent organs can be difficult to delineate owing to the low soft-tissue contrast between the liver and its surroundings. In this study, denoising-based auto-segmentation of the liver showed a statistically significant improvement in the DSC compared with AccuContour™-based auto-segmentation. An ATV denoiser could enhance the image quality of CT by suppressing noise, and this may lead to improved segmentation boundaries. Figure 3 shows that some parts surrounding the gall bladder, pancreas, duodenum, large vessel, or kidney are included in the AccuContour™-based auto-segmentation of the liver. In contrast, denoising-based auto-segmentation delineated the liver more accurately by distinguishing it from the surrounding organs. These results indicate that, for the liver, the performance of denoising-based auto-segmentation is superior to that of AccuContour™-based auto-segmentation. All deep-learning-based auto-segmentations should be carefully reviewed and approved by radiation oncologists before use in a treatment plan. In some CT slices, major or minor errors of the deep-learning-based auto-contours are present, and correction is required. Denoising-based auto-segmentation might convert the "major or minor errors" of conventional deep-learning-based auto-segmentation to "minor errors" (a small amount of editing needed) or "no correction". Denoising-based auto-segmentation could therefore be a practical tool for reducing the clinical workload of radiotherapy planning.
The esophagus is one of the most challenging OARs in thoracic organ auto-segmentation. In this study, the performance of AccuContour™-based and denoising-based auto-segmentation was below a satisfactory level for the esophagus. Previous studies have reported that the DSCs of deep-learning-based auto-segmentation do not exceed 0.8 for the esophagus [40,46-49]. Due to the absence of a consistent intensity contrast between the esophagus and neighboring tissues in non-contrast CT images, the boundaries between the esophagus and surrounding soft tissues are not well-defined. Figure 4 shows that some parts of the surrounding pulmonary vessel were included in the AccuContour™-based and denoising-based auto-segmentations of the esophagus. In addition, the appearance of the esophagus varies depending on whether it is filled with air. Figure 5 shows that the air-filled regions of the esophagus were not included in the AccuContour™-based and denoising-based auto-segmentations. The segmentation results for the esophagus obtained from denoising-based auto-segmentation may therefore still be inaccurate and unsatisfactory.
It has been demonstrated that auto-segmentations for the heart and lungs yield higher DSCs, with an average of over 0.9 [45,48-51]. This study also showed that the average DSC outcomes of the heart and lungs are higher than 0.9. High-contrast edges and distinct structural boundaries of the heart and lungs were detected easily in both the AccuContour™-based and denoising-based auto-segmentations. Therefore, AccuContour™-based or denoising-based auto-segmentation for the heart and lungs can be used without major adjustments. However, the denoising-based auto-segmentations achieved inferior segmentation accuracy on the heart compared with the AccuContour™-based auto-segmentations. The denoising-based auto-segmentation volumes are slightly larger than the manual contours (Figure 6A,D) or smaller than the manual contours (Figure 6B,C). Denoising-based auto-segmentation did not improve the accuracy of auto-segmentation of the heart.
This study had several limitations. Although 40 patients were randomly selected, selection bias in terms of the CT samples may be present. Since the results of this study were generated using only one proprietary software package and one CT scanner, this may also have affected our results.
Further, the contouring bias of the physicians may have affected our results. Therefore, further external validation studies involving multiple experts, hospitals, and a larger sample size are needed to overcome these limitations. Moreover, denoising-based auto-segmentation should be evaluated using software other than AccuContour™.
However, in several deep-learning-based automatic segmentation studies, a single experienced radiation oncologist delineated the organ at risk or the clinical target volume [10,11,14,50]. In this study, two radiation oncologists delineated the organ at risk because it was thought that inter-observer variability may exist. The results of the Dice similarity coefficient of the five organs at risk from radiation oncologist 1 and 2 were consistent, so the results of this study are expected to be relatively reliable.

Conclusions
CT images are subject to noise that can blur the boundaries between adjacent organs, potentially limiting contrast. Reducing the noise level in CT images can improve the visualization of structures, which may increase the accuracy of image segmentation.
By combining the denoising algorithm with deep-learning-based auto-segmentation, the denoising-based auto-segmentation results of the liver from non-contrast CT scans were slightly superior to those of commercial conventional deep-learning-based auto-segmentation and had greater similarity to the ground truth. This denoising-based auto-segmentation could provide a more precise contour delineation of the liver, thus reducing the clinical workload. The results of this study require validation through further studies with a larger sample size that also evaluate denoising-based auto-segmentation using software other than AccuContour™.

Institutional Review Board Statement: Ethical approval for this study was obtained from the Institutional Review Board of Yonsei University Health System, Gangnam Severance Hospital.

Informed Consent Statement:
The need for patient consent was waived due to the retrospective nature of the study design.
Data Availability Statement: All data generated or analyzed during this study are included in the article.

Conflicts of Interest:
The authors declare no conflict of interest.

Appendix A
For n ← 1 to all images do
    For j ← 1 to all voxels do
        Calculate D(V_j) using Equation (2)
        Create gradient CDF histogram using D(V_j)
    End For
    δ ← Value at 80% of gradient CDF
End For
For n ← 1 to all images do
    R(V) ← 0, r ← 1, r_red ← 0.8
    For j ← 1 to all voxels do
        w_j ← 0
        For m ← 1 to N_j do
            [expression missing]
        End For
        Calculate D(V_j) using Equation (2)
        R(V_j) ← w_j D(V_j)
        R(V) ← R(V) + R(V_j)
    End For
    For t ← 1 to 20 do
        |∇R(V)| ← 0
        For j ← 1 to all voxels do
            [expression missing]
        End For
        |∇R(V)| ← √|∇R(V)|
        For j ← 1 to all voxels do
            V_j ← V_j + λ∇R(V_j)/|∇R(V)|
            Calculate D(V_j) using Equation (2)
        End For
        Calculate R(V) using w_j and D(V_j)
        While R(V_j) > R(V) do
            r ← r × r_red
            λ ← λ × r
            For j ← 1 to all voxels do
                Calculate D(V_j) using Equation (2)