# Deep Learning for Automated Elective Lymph Node Level Segmentation for Head and Neck Cancer Radiotherapy



## Simple Summary

## Abstract

## 1. Introduction

## 2. Materials and Methods

#### 2.1. Data Acquisition

#### 2.2. Pre-Processing

The CT images and label maps were resampled to a common isotropic voxel spacing (mm³) by 3rd-order and nearest-neighbour interpolation, respectively. This spacing was chosen to minimize image interpolations, whilst making sure the network’s filters were of equal size in each orthogonal plane for all patients.
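As an illustrative sketch (not the authors' code), the resampling step could be implemented with `scipy.ndimage.zoom`, using a 3rd-order spline for the CT and nearest-neighbour interpolation (order 0) for the label maps; the spacing values below are placeholders:

```python
import numpy as np
from scipy.ndimage import zoom

def resample(volume, spacing, new_spacing, order):
    """Resample a 3D volume from `spacing` to `new_spacing` (mm per axis)."""
    # zoom factor per axis = old spacing / new spacing
    factors = [s / n for s, n in zip(spacing, new_spacing)]
    # order=3: 3rd-order spline (CT images); order=0: nearest-neighbour (labels)
    return zoom(volume, factors, order=order)

ct = np.random.rand(4, 4, 4)
labels = np.random.randint(0, 6, (4, 4, 4))
ct_iso = resample(ct, spacing=(2.0, 2.0, 2.0), new_spacing=(1.0, 1.0, 1.0), order=3)
labels_iso = resample(labels, spacing=(2.0, 2.0, 2.0), new_spacing=(1.0, 1.0, 1.0), order=0)
```

Nearest-neighbour interpolation for the labels guarantees that no new (non-integer) class values are introduced by the resampling.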

#### 2.3. Experimental Outline

#### 2.4. Model Training

Models were trained on an Intel® Core™ i9-9900KF CPU @ 3.6 GHz, using the GPU version of TensorFlow (Version 2.2.0) with CUDA 10.1 and Python (Version 3.8.10). The TensorBoard (Version 2.2.2) callback was used for tracking the training and validation scores, whilst only the best model in terms of DSC was saved. The models were trained using the Adam optimizer [20] with the standard values in Keras: an initial learning rate of 0.001, β₁ = 0.9, β₂ = 0.999 and ε = 1 × 10⁻⁷. To reduce the divergence of the model weights at later stages of training, an exponential learning rate decay scheduler was used to decrease the learning rate by 5% with every epoch, down to a minimum of 0.0001. Dropout was switched off at test time. All models were trained using 5-fold cross-validation, with a train/test split of 48/12 cases in every fold. To minimize training variation, we used ensemble learning [9,21,22,23], where the highest cumulative in-class segmentation probability of 5 sequentially trained networks decided the final segmentation map. The training and evaluation times were recorded.
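The decay schedule described above (5% reduction per epoch, floored at 0.0001) can be sketched as a plain function of the epoch index; in Keras it would typically be wrapped in a `tf.keras.callbacks.LearningRateScheduler` callback (the function name below is ours, not the authors'):

```python
def lr_schedule(epoch, initial_lr=1e-3, decay=0.95, min_lr=1e-4):
    # exponential decay: the learning rate shrinks by 5% every epoch,
    # clipped at the minimum of 1e-4 to avoid vanishing updates
    return max(initial_lr * decay ** epoch, min_lr)

# usage sketch:
# tf.keras.callbacks.LearningRateScheduler(lambda epoch, lr: lr_schedule(epoch))
```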

#### 2.4.1. UNet

UNet was trained on patches of 64³ voxels in volume that were known to contain the combined structure of LN levels I–V for every patient on each side. Binary and multi-class dice loss functions were used for optimization. The multi-class DSC loss was defined as the weighted sum of individual foreground class losses (Equation (1)):

$$\mathrm{DSC}_{\mathrm{loss}} = \sum_{m=1}^{M} w_m\,\mathrm{DL}_m \quad (1)$$

where w_m are the class weights that are calculated using Python’s scikit-learn module [29], m ranges from 1 to M and denotes class indices, where M is the number of classes. DL_m is the DSC loss of class m, defined as 1 minus the DSC score (Equation (2)):

$$\mathrm{DL}_m = 1 - \frac{2\,|A_m \cap B_m|}{|A_m| + |B_m|} \quad (2)$$

where A_m and B_m denote the predicted and manual reference binary sets of class m, respectively. In the case of binary segmentation, DSC_loss reduces to the latter loss function. For patches that contain a limited number of foreground voxels, DSC_loss becomes ill-defined (the denominator in DL_m is not constrained to values larger than 0). To ameliorate this, we used a Gaussian sampling method, where the mean and standard deviation of the x, y and z coordinates were calculated from the centre of mass of the combined, binary structure of LN levels I–V of all patients. Subsequently, we used a truncated normal distribution to sample patches, such that they were constrained to lie entirely within the region of interest. The weights were initialized using the standard initialization method in Keras (Glorot uniform initialization). The models were optimized for 100 epochs. However, it should be noted that the use of an epoch in a patch-based setting is arbitrary, because patch sampling is performed at random, and thus a different sub-set of all data is seen by the network in each epoch. The number of training pairs seen by the network per epoch was set to 4096, which corresponded to roughly 34 training patches per side per patient.

#### 2.4.2. Multi-View

#### 2.4.3. Data Augmentation

The default window level center (C_C) [width (C_W)] was 0 [700], as was previously used for lymph structure segmentation [9]. If contrast adaptation was applied, an alternative window level center and width were sampled from normal distributions, with μ_C = 0; σ_C = 3% × 700 and μ_W = 700; σ_W = 3% × 700, respectively.
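Under the stated distributions (μ_C = 0, μ_W = 700, σ = 3% × 700), the contrast augmentation can be sketched as below; `sample_window` and `apply_window` are our illustrative names, not the authors' implementation:

```python
import numpy as np

def sample_window(rng, mu_c=0.0, mu_w=700.0, sigma=0.03 * 700.0):
    # draw an alternative window centre and width from normal distributions
    return rng.normal(mu_c, sigma), rng.normal(mu_w, sigma)

def apply_window(ct, center=0.0, width=700.0):
    # clip intensities to [center - width/2, center + width/2], rescale to [0, 1]
    lo, hi = center - width / 2.0, center + width / 2.0
    return (np.clip(ct, lo, hi) - lo) / (hi - lo)

rng = np.random.default_rng(0)
c, w = sample_window(rng)
windowed = apply_window(np.array([-1000.0, 0.0, 1000.0]))
```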

#### 2.5. Post-Processing

#### 2.6. Evaluation and Statistical Analysis

#### 2.7. Independent Validation

## 3. Results

## 4. Discussion

## 5. Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Institutional Review Board Statement

## Informed Consent Statement

## Data Availability Statement

## Conflicts of Interest

## References

1. Van der Veen, J.; Gulyban, A.; Nuyts, S. Interobserver variability in delineation of target volumes in head and neck cancer. Radiother. Oncol. **2019**, 137, 9–15.
2. Grégoire, V.; Ang, K.; Budach, W.; Grau, C.; Hamoir, M.; Langendijk, J.A.; Lee, A.; Le, Q.-T.; Maingon, P.; Nutting, C.; et al. Delineation of the primary tumour Clinical Target Volumes (CTV-P) in laryngeal, hypopharyngeal, oropharyngeal and oral cavity squamous cell carcinoma: AIRO, CACA, DAHANCA, EORTC, GEORCC, GORTEC, HKNPCSG, HNCIG, IAG-KHT, LPRHHT, NCIC CTG, NCRI, NRG Oncology. Int. J. Radiat. Oncol. **2019**, 104, 677–684.
3. Van Rooij, W.; Dahele, M.; Brandao, H.R.; Delaney, A.R.; Slotman, B.J.; Verbakel, W.F.A.R. Deep Learning-Based Delineation of Head and Neck Organs at Risk: Geometric and Dosimetric Evaluation. Int. J. Radiat. Oncol. **2019**, 104, 677–684.
4. Zhu, W.; Huang, Y.; Zeng, L.; Chen, X.; Liu, Y.; Qian, Z.; Du, N.; Fan, W.; Xie, X. AnatomyNet: Deep learning for fast and fully automated whole-volume segmentation of head and neck anatomy. Med. Phys. **2018**, 46, 576–589.
5. Wang, W.; Wang, Q.; Jia, M.; Wang, Z.; Yang, C.; Zhang, D.; Wen, S.; Hou, D.; Liu, N.; Wang, P. Deep Learning-Augmented Head and Neck Organs at Risk Segmentation From CT Volumes. Front. Phys. **2021**, 9, 1–11.
6. Kawahara, D.; Tsuneda, M.; Ozawa, S.; Okamoto, H.; Nakamura, M.; Nishio, T.; Saito, A.; Nagata, Y. Stepwise deep neural network (stepwise-net) for head and neck auto-segmentation on CT images. Comput. Biol. Med. **2022**, 143, 105295.
7. Ali, R.; Hardie, R.C.; Narayanan, B.N.; Kebede, T.M. IMNets: Deep Learning Using an Incremental Modular Network Synthesis Approach for Medical Imaging Applications. Appl. Sci. **2022**, 12, 5500.
8. Men, K.; Chen, X.; Zhang, Y.; Zhang, T.; Dai, J.; Yi, J.; Li, Y. Deep deconvolutional neural network for target segmentation of nasopharyngeal cancer in planning computed tomography images. Front. Oncol. **2017**, 7, 315.
9. Cardenas, C.E.; Beadle, B.M.; Garden, A.S.; Skinner, H.D.; Yang, J.; Rhee, D.J.; McCarroll, R.E.; Netherton, T.J.; Gay, S.S.; Zhang, L. Generating High-Quality Lymph Node Clinical Target Volumes for Head and Neck Cancer Radiation Therapy Using a Fully Automated Deep Learning-Based Approach. Int. J. Radiat. Oncol. **2021**, 109, 801–812.
10. Chen, A.; Deeley, M.A.; Niermann, K.J.; Moretti, L.; Dawant, B.M. Combining registration and active shape models for the automatic segmentation of the lymph node regions in head and neck CT images. Med. Phys. **2010**, 37, 6338–6346.
11. Stapleford, L.J.; Lawson, J.D.; Perkins, C.; Edelman, S.; Davis, L.; McDonald, M.W.; Waller, A.; Schreibmann, E.; Fox, T. Evaluation of Automatic Atlas-Based Lymph Node Segmentation for Head-and-Neck Cancer. Int. J. Radiat. Oncol. **2010**, 77, 959–966.
12. Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Lecture Notes in Computer Science, Proceedings of the Medical Image Computing and Computer-Assisted Intervention 2015, Munich, Germany, 5–9 October 2015; Springer: Cham, Switzerland, 2015.
13. Weissmann, T.; Huang, Y.; Fischer, S.; Roesch, J. Deep Learning for automatic head and neck lymph node level delineation. Int. J. Radiat. Oncol. Biol. Phys. **2022**, 1–17. Available online: https://arxiv.org/abs/2208.13224 (accessed on 1 October 2022).
14. Van der Veen, J.; Willems, S.; Bollen, H.; Maes, F.; Nuyts, S. Deep learning for elective neck delineation: More consistent and time efficient. Radiother. Oncol. **2020**, 153, 180–188.
15. Birenbaum, A.; Greenspan, H. Multi-view longitudinal CNN for multiple sclerosis lesion segmentation. Eng. Appl. Artif. Intell. **2017**, 65, 111–118.
16. Strijbis, V.I.J.; de Bloeme, C.M.; Jansen, R.W.; Kebiri, H.; Nguyen, H.-G.; de Jong, M.C.; Moll, A.C.; Bach-Cuadra, M.; de Graaf, P.; Steenwijk, M.D. Multi-view convolutional neural networks for automated ocular structure and tumor segmentation in retinoblastoma. Sci. Rep. **2021**, 11, 14590.
17. Roth, H.R.; Lu, L.; Seff, A.; Cherry, K.M.; Hoffman, J.; Wang, S.; Liu, J.; Turkbey, E.; Summers, R.M. A New 2.5D Representation for Lymph Node Detection Using Random Sets of Deep Convolutional Neural Network Observations. In Lecture Notes in Computer Science, Proceedings of the Medical Image Computing and Computer-Assisted Intervention (MICCAI), Boston, MA, USA, 14–18 September 2014; Springer: Cham, Switzerland, 2014; Volume 17, pp. 520–527.
18. Schouten, J.P.; Noteboom, S.; Martens, R.M.; Mes, S.W.; Leemans, C.R.; de Graaf, P.; Steenwijk, M.D. Automatic segmentation of head and neck primary tumors on MRI using a multi-view CNN. Cancer Imaging **2022**, 22, 8.
19. Aslani, S.; Dayan, M.; Storelli, L.; Filippi, M.; Murino, V.; Rocca, M.A.; Sona, D. Multi-branch convolutional neural network for multiple sclerosis lesion segmentation. NeuroImage **2019**, 196, 1–15.
20. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the 3rd International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
21. Ren, J.; Eriksen, J.G.; Nijkamp, J.; Korreman, S.S. Comparing different CT, PET and MRI multi-modality image combinations for deep learning-based head and neck tumor segmentation. Acta Oncol. **2021**, 60, 1399–1406.
22. Van Rooij, W.; Dahele, M.; Nijhuis, H.; Slotman, B.J.; Verbakel, W.F.A.R. OC-0346: Strategies to improve deep learning-based salivary gland segmentation. Radiat. Oncol. **2020**, 15, 272.
23. Van Rooij, W.; Verbakel, W.F.; Slotman, B.J.; Dahele, M. Using Spatial Probability Maps to Highlight Potential Inaccuracies in Deep Learning-Based Contours: Facilitating Online Adaptive Radiation Therapy. Adv. Radiat. Oncol. **2021**, 6, 100658.
24. He, K.; Zhang, X.; Ren, S.; Sun, J. Identity Mappings in Deep Residual Networks. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2016.
25. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016.
26. Milletari, F.; Navab, N.; Ahmadi, S.-A. V-Net: Fully convolutional neural networks for volumetric medical image segmentation. In Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), Stanford, CA, USA, 25–28 October 2016.
27. Wu, D.; Kim, K.; Li, Q. Computationally efficient deep neural network for computed tomography image reconstruction. Med. Phys. **2019**, 46, 4763–4776.
28. Bouman, P.M.; Strijbis, V.I.J.; Jonkman, L.E.; Hulst, H.E.; Geurts, J.J.G.; Steenwijk, M.D. Artificial double inversion recovery images for (juxta)cortical lesion visualization in multiple sclerosis. Mult. Scler. J. **2022**.
29. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. **2011**, 12, 2825–2830.
30. D’Agostino, B.B. An omnibus test of normality for moderate and large size samples. Biometrika **1971**, 58, 341–348.
31. Koo, T.K.; Li, M.Y. A Guideline of Selecting and Reporting Intraclass Correlation Coefficients for Reliability Research. J. Chiropr. Med. **2016**, 15, 155–163.
32. Wack, D.S.; Dwyer, M.G.; Bergsland, N.; Di Perri, C.; Ranza, L.; Hussein, S.; Ramasamy, D.; Poloni, G.; Zivadinov, R. Improved assessment of multiple sclerosis lesion segmentation agreement via detection and outline error estimates. BMC Med. Imaging **2012**, 12, 17.
33. Grégoire, V.; Ang, K.; Budach, W.; Grau, C.; Hamoir, M.; Langendijk, J.A.; Lee, A.; Quynh-Thu, L.; Maingon, P.; Nutting, C.; et al. Delineation of the neck node levels for head and neck tumors: A 2013 update. DAHANCA, EORTC, HKNPCSG, NCIC CTG, NCRI, RTOG, TROG consensus guidelines. Radiother. Oncol. **2014**, 110, 172–181.
34. Nogues, I.; Lu, L.; Wang, X.; Roth, H.; Bertasius, G.; Lay, N.; Shi, J.; Tsehay, Y.; Summers, R.M. Automatic lymph node cluster segmentation using holistically-nested neural networks and structured optimization in CT images. In Lecture Notes in Computer Science, Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2016, Athens, Greece, 17–21 October 2016; Springer: Cham, Switzerland, 2016.

**Figure 1.** Schematic overview of the experimental outline. UNet (blue boxes) and MV (red boxes) were used to make three model configurations. In the first configuration, a patch-based UNet segments the background and LN levels I–V directly from the planning CT. In the second configuration, MV classifies the background and LN level I–V voxels from within a preconstructed mask (cyan). In UNet+MV, a patch-based UNet first segments the combined structure of LN levels I–V, which is subsequently used as a mask (cyan) for MV to classify positive voxels into individual levels I–V. The details of both models are given in Figure 2. Abbreviations: MV: multi-view; CT: computed tomography. (Also shown in Figure S2.)

**Figure 2.** Schematic overview of the UNet (blue box) and MV (red box) networks. UNet consists of an encoder (**left**) and a decoder (**right**) pathway that generates binary segmentation maps from 64³ voxel patches sampled from planning CTs. MV uses three multi-view branches that build up to each anatomical plane within a scale block, the output of which is concatenated and used as the input for the multi-scale branched architecture. The thickness of the convolutional blocks corresponds with the number of filters used. The number of output classes (M) was six for UNet in the UNet-only configuration and two for UNet in the UNet+MV configuration. M was six for MV in the MV-only configuration and five in the UNet+MV configuration. Abbreviations: MV: multi-view; ch: number of channels; BN: batch normalization; ReLU: rectified linear unit; f: number of output filters; M: number of output classes; K: convolution kernel size; S: convolution stride; p: dropout fraction; CT: computed tomography.

**Figure 3.**Example segmentations selected from the first (Q1), second (Q2) and third (Q3) quartile in terms of DSC averaged over individual LN levels I–V. The filled region is the manual reference. The solid, dashed and dotted lines correspond to the predictions of the model configurations of UNet, MV and UNet+MV, respectively. LN levels I–V are indicated in pink, blue, green, red and yellow, respectively. The low average DSC in Q1 was in part attributed to an error in the manual reference level III–IV transition. Abbreviations: DSC: dice similarity coefficient; LN: lymph node.

**Figure 4.**Predicted and manual reference volumes for all structures. Abbreviations: ICC: intra-class correlation (two-way mixed, single measures, consistency).

**Figure 5.**Spatial performances of UNet, MV and UNet+MV model configurations for DSC, HD and MSD measures. Statistical significance marking of the MV configuration was omitted because differences between MV and other model configurations were always significant. Structures for which differences between UNet and UNet+MV were statistically significant are denoted by significance bars. *: p < 0.05; **: p < 0.01; ***: p < 0.001; ****: p < 0.0001; Abbreviations: DSC: dice similarity coefficient; MV: multi-view; HD: Hausdorff distance; MSD: mean surface distance.
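For reference, the DSC and HD measures reported in this figure can be computed from a pair of binary masks as sketched below, using `scipy.spatial.distance.directed_hausdorff`. This is an illustrative voxel-unit sketch, not the authors' evaluation code (the paper's distances are presumably scaled to mm, and the MSD is omitted here for brevity):

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def dsc(a, b):
    # Dice similarity coefficient between two boolean masks
    return 2.0 * np.sum(a & b) / (np.sum(a) + np.sum(b))

def hausdorff(a, b):
    # symmetric Hausdorff distance between the voxel coordinates of two masks
    pa, pb = np.argwhere(a), np.argwhere(b)
    return max(directed_hausdorff(pa, pb)[0], directed_hausdorff(pb, pa)[0])

# two 4x4 squares shifted by one row: 12 of 16 voxels overlap
a = np.zeros((8, 8), dtype=bool); a[2:6, 2:6] = True
b = np.zeros((8, 8), dtype=bool); b[3:7, 2:6] = True
```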

**Figure 6.**Examples from the worst-performing quartile samples in terms of DSC averaged over individual LN levels I–V. The filled region is the manual reference. The solid, dashed and dotted lines correspond to the predictions of the UNet, MV and UNet+MV model configurations, respectively. LN levels I–V are indicated in pink, blue, green, red and yellow, respectively. Arrows indicate specific locations of interest. Abbreviations: DSC: dice similarity coefficient; LN: lymph node.

**Figure 7.**UNet and UNet+MV spatial model performances in the independent test. Structures for which differences between model configurations were statistically significant are denoted by significance bars. ***: p < 0.001; ****: p < 0.0001; Abbreviations: DSC: Dice similarity coefficient; MV: multi-view.

**Table 1.** The reported values denote the range of median DSCs produced by five individual models and the ensemble model combinations of the UNet, MV and UNet+MV configurations after post-processing. Ensemble results that showed higher spatial agreement than the most accurate individual model are denoted in bold. Ensembles increased result consistency and typically outperformed any of the standalone models for all configurations. Abbreviations: DSC: dice similarity coefficient; MV: multi-view; Ind.: individual; Ens.: ensemble; LN: lymph node.

| Structure | UNet Ind. (CV) | UNet Ens. (CV) | MV Ind. (CV) | MV Ens. (CV) | UNet+MV Ind. (CV) | UNet+MV Ens. (CV) | UNet Ens. (Test) | UNet+MV Ens. (Test) |
|---|---|---|---|---|---|---|---|---|
| LN I–V | [0.850–0.852] | **0.857** | [0.692–0.706] | **0.708** | [0.860–0.862] | **0.867** | 0.846 | 0.865 |
| LN I | [0.849–0.855] | **0.860** | [0.682–0.695] | **0.700** | [0.851–0.856] | **0.857** | 0.856 | 0.852 |
| LN II | [0.827–0.834] | **0.840** | [0.702–0.720] | **0.726** | [0.856–0.858] | **0.862** | 0.824 | 0.850 |
| LN III | [0.771–0.781] | 0.781 | [0.628–0.653] | **0.656** | [0.802–0.812] | 0.810 | 0.755 | 0.825 |
| LN IV | [0.714–0.746] | **0.748** | [0.559–0.585] | 0.583 | [0.757–0.764] | 0.764 | 0.743 | 0.724 |
| LN V | [0.738–0.751] | **0.754** | [0.572–0.604] | **0.610** | [0.753–0.761] | **0.763** | 0.697 | 0.707 |
| PI–PV | [0.897–0.898] | **0.899** | [0.779–0.788] | **0.798** | [0.899–0.900] | **0.908** | 0.892 | 0.904 |
| PII–PIV | [0.887–0.891] | **0.892** | [0.768–0.782] | **0.788** | [0.899–0.900] | **0.902** | 0.893 | 0.892 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Strijbis, V.I.J.; Dahele, M.; Gurney-Champion, O.J.; Blom, G.J.; Vergeer, M.R.; Slotman, B.J.; Verbakel, W.F.A.R. Deep Learning for Automated Elective Lymph Node Level Segmentation for Head and Neck Cancer Radiotherapy. *Cancers* **2022**, *14*, 5501.
https://doi.org/10.3390/cancers14225501
