A Novel Tool for Supervised Segmentation Using 3D Slicer

Abstract: The rather impressive extension library of the medical image-processing platform 3D Slicer lacks a wide range of machine-learning toolboxes. The authors have developed such a toolbox that incorporates commonly used machine-learning libraries. The extension uses a simple graphical user interface that allows the user to preprocess data, train a classifier, and use that classifier in common medical image-classification tasks, such as tumor staging or various anatomical segmentations, without a deeper knowledge of the inner workings of the classifiers. A series of experiments was carried out to showcase the capabilities of the extension and to quantify the symmetry between the physical characteristics of pathological tissues and the parameters of a classifying model. These experiments also include an analysis of the impact of training vector size and feature selection on the sensitivity and specificity of all included classifiers. The results indicate that training vector size can be minimized for all classifiers. Using the data from the Brain Tumor Segmentation Challenge, Random Forest appears to have the widest range of parameters that produce sufficiently accurate segmentations, while the optimal training parameters of Support Vector Machines are concentrated in a narrow feature space.


Introduction
3D Slicer [1] is a free open-source platform for medical image visualization and processing. Its main functionality comes from the Extension Library, which consists of various modules that allow specific analyses of the input data, such as filtering, artefact suppression, or surface reconstruction. There is a lack of machine-learning extensions except for the open-source DeepInfer [2]. This deep-learning deployment kit uses 3D convolutional neural networks to detect and localize the target tissue. The development team demonstrated the use of this kit on the prostate segmentation problem for image-guided therapy. Researchers and practitioners are able to select a publicly available task-oriented network through the module without the need to design or train it.
To enable the use of other machine-learning techniques, we developed the Supervised Segmentation Toolbox as an extensible machine-learning platform for 3D Slicer. Currently, Support Vector Machine (SVM) and Random Forest (RF) classifiers are included. These classifiers are well-researched and often used in image-processing tasks, as demonstrated in References [3-8].
SVMs [9] train by maximizing the distance between marginal samples (also referred to as support vectors) and a discriminative hyperplane. This is done by maximizing f in the equation:

$$f(\alpha_1, \ldots, \alpha_n) = \sum_{i} \alpha_i - \frac{1}{2} \sum_{i} \sum_{j} \alpha_i \alpha_j y_i y_j \, \vec{x}_i \cdot \vec{x}_j \tag{1}$$

where y is defined as +1 for a class A sample and -1 for a class B sample, α is a Lagrangian multiplier, and x is the feature vector of the individual sample. On real data, this is often too strict because of noisy samples that might cross their class boundary. This is solved by using a combination of techniques known as the kernel trick and soft margining. The kernel trick uses a kernel function φ to remap the original feature space to a higher-dimensional one by replacing the dot product in Equation (1) with φ(x_i) · φ(x_j). This allows linear separation of the data as required by the SVM. An example of the kernel function is the radial basis function used in this study. Incorporating a soft margin allows some samples to cross their class boundary. Examples of such soft-margin SVMs are the C-SVM [10] and N-SVM [11]. The C parameter of the C-SVM modifies the influence of each support vector on the final discriminatory hyperplane. The larger the C, the closer the soft-margin C-SVM is to a hard-margin SVM. The N parameter of the N-SVM defines a minimum number of support vectors and, consequently, an upper bound of the guaranteed maximum percentage of misclassifications. Further modifications to the SVM can also be made, such as using fuzzy-data points for the training dataset, as demonstrated in References [5,6].
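The role of the radial basis kernel can be illustrated with a few lines of Python. This is only a sketch under an assumption: it uses the common parameterization exp(-γ‖x_i − x_j‖²), and the function name `rbf_kernel` and the two voxel feature vectors are hypothetical, not taken from the extension.

```python
import math

def rbf_kernel(x_i, x_j, gamma):
    """Radial basis function kernel, assumed form: exp(-gamma * ||x_i - x_j||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)

# Two hypothetical voxel feature vectors (for example, two image intensities each).
voxel_a = [0.2, 0.7]
voxel_b = [0.5, 0.3]

# A small gamma gives a wide (high-variance) kernel: the voxels look similar.
similarity_wide = rbf_kernel(voxel_a, voxel_b, gamma=0.1)

# A large gamma gives a narrow kernel: the same voxels look almost unrelated.
similarity_narrow = rbf_kernel(voxel_a, voxel_b, gamma=100.0)

print(similarity_wide, similarity_narrow)
```

Under this assumed form, the kernel value replaces the dot product in the training objective, so γ directly controls how far the influence of a single training sample reaches in feature space.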
RF uses a voting system on the results of the individual decision trees. Each decision tree is created on the basis of a bootstrapped sample of the training data [12].
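The two ingredients named here, bootstrapping and voting, can be sketched as follows. This is a toy illustration, not the Shark-ml implementation: the one-feature threshold "tree" (`train_stump`) is a deliberately simplified stand-in for a real decision tree, and all names and data are hypothetical.

```python
import random
from collections import Counter

def bootstrap_sample(samples, labels, rng):
    """Draw len(samples) training pairs with replacement."""
    indices = [rng.randrange(len(samples)) for _ in samples]
    return [samples[i] for i in indices], [labels[i] for i in indices]

def train_stump(samples, labels):
    """Toy one-feature 'tree': threshold at the midpoint of the two class means."""
    tumor = [s[0] for s, l in zip(samples, labels) if l == 1]
    healthy = [s[0] for s, l in zip(samples, labels) if l == 0]
    if not tumor or not healthy:
        # Degenerate bootstrap containing a single class: always vote for it.
        constant = 1 if tumor else 0
        return lambda sample: constant
    tumor_mean = sum(tumor) / len(tumor)
    healthy_mean = sum(healthy) / len(healthy)
    threshold = (tumor_mean + healthy_mean) / 2
    tumor_is_brighter = tumor_mean > healthy_mean
    return lambda sample: int((sample[0] > threshold) == tumor_is_brighter)

def forest_predict(trees, sample):
    """Majority vote over the predictions of the individual trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

rng = random.Random(0)
samples = [[0.1], [0.2], [0.3], [0.7], [0.8], [0.9]]  # hypothetical intensities
labels = [0, 0, 0, 1, 1, 1]                           # 1 = tumor, 0 = healthy
trees = [train_stump(*bootstrap_sample(samples, labels, rng)) for _ in range(25)]
print(forest_predict(trees, [0.95]), forest_predict(trees, [0.05]))
```

An odd number of trees is used so that a binary vote cannot tie.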

Materials and Methods
The data used in this study describe Low Grade Glioma (LGG) obtained by MRI. The dataset consists of four images of one patient (LG_0001, Figure 1): T1- and T2-weighted, contrast-enhanced T1C, and Fluid Attenuated Inversion Recovery (FL). These data are part of the training set featured in the Brain Tumor Segmentation (BraTS) 2012 Challenge.
Symmetry 2018, 10, 627
The free and open-source Supervised Segmentation Toolbox extension [13] of 3D Slicer was used throughout this study. The extension allows the user to train a range of classifiers using labeled data. The user is also able to perform a grid search to select the optimal parameters for the classifier. To achieve this, the extension uses either an already available function or a cross-validation algorithm developed by the author of the extension, depending on the classifier library used. Currently, N-SVM and C-SVM from the dlib library [14] and C-SVM and Random Forest from the Shark-ml library [15] are incorporated. The extension takes care of the parallelizable parts of the training and classification subtasks, thus significantly reducing computation times. A selection of preprocessing algorithms is also a part of the extension. This allows for artefact correction or feature extraction. The extension workflow is depicted in Figure 2.
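The cross-validation step used during the grid search can be pictured with a minimal sketch. It is not the extension's own algorithm, whose details are not given here; `k_fold_indices` is a hypothetical helper that simply splits sample indices into k contiguous folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    # Distribute the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size

# Example: ten labeled samples split for fivefold cross-validation.
folds = list(k_fold_indices(n_samples=10, k=5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample is used for testing exactly once, and the metric of interest is averaged over the k folds. In practice the samples would usually be shuffled first, which this sketch omits.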

Results and Discussion
A series of tests was performed in order to provide a sense of the speed and accuracy of the provided classifiers. Sensitivity and specificity metrics were used to evaluate the results. A classifier that had a larger sum of specificity and sensitivity (or of their respective means, when cross-validation was used) was considered a better classifier. During the first test run, each type of classifier was trained and evaluated using the single-patient image set. Optimal training parameters of the classifiers were obtained using a grid-search approach. The results are presented in Figures 3-5. The γ parameter is common to both SVM classifiers and influences the variance of the radial basis kernel. Assuming the usual kernel form exp(-γ‖x_i − x_j‖²), a small γ corresponds to a high-variance kernel under which more data points look similar, which helps to prevent overfitting. Using the aforementioned dataset, the results indicate a relative insensitivity of the classification accuracy to this parameter. For the given dataset, C values of the C-SVM larger than 1 seem optimal. The best results of the N-SVM classifier are obtained with N around 10% or lower combined with a high-variance radial basis function. Optimal RF training parameters were: a small node size of under 5, a number of trees higher than 800, no out-of-bag samples, and no random attributes.
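The selection criterion described above, the largest sum of sensitivity and specificity over a parameter grid, can be sketched in a few lines. The scoring function `toy_evaluate` and the grid values are hypothetical stand-ins; in a real run, the evaluation would train and test a classifier for each parameter combination.

```python
from itertools import product

def sensitivity_specificity(predicted, reference):
    """Compute (sensitivity, specificity) for binary labels (1 = tumor)."""
    pairs = list(zip(predicted, reference))
    tp = sum(1 for p, r in pairs if p == 1 and r == 1)
    tn = sum(1 for p, r in pairs if p == 0 and r == 0)
    fp = sum(1 for p, r in pairs if p == 1 and r == 0)
    fn = sum(1 for p, r in pairs if p == 0 and r == 1)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity

def grid_search(evaluate, parameter_grid):
    """Return the parameters with the largest sensitivity + specificity."""
    names = list(parameter_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(parameter_grid[name] for name in names)):
        params = dict(zip(names, values))
        sensitivity, specificity = evaluate(params)
        score = sensitivity + specificity
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

def toy_evaluate(params):
    """Hypothetical stand-in for training and testing one classifier."""
    return 1 - abs(params["C"] - 10) / 100, 1 - abs(params["gamma"] - 0.01)

grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(toy_evaluate, grid)
print(best_params, best_score)
```

Summing the two metrics weights them equally; a task that penalizes missed tumor voxels more heavily might weight sensitivity higher instead.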

The second test run consisted of using a different number of slices around the center of the tumor to reveal the impact of the size of the training set on the specificity and sensitivity of all classifiers (Figures 6 and 7). The results indicate that reducing the number of unique training samples has a negligible effect on the subsequent classification accuracy. RF shows a slightly larger improvement in classification accuracy when using a larger training vector. Using a reduced training dataset shortens the training process and might result in a simpler classifier, which is easier to interpret and has shorter classification computation times. The classification time is a limiting factor of using these methods in real-time applications.
The effect of different image types on classifier accuracy was examined in the last test run (Figures 8 and 9). Slice 88 of the LG_0001 images was used as a source of training samples. Sufficient sensitivity and specificity were obtained by only using T1- and T2-weighted images. Furthermore, all classifiers benefited from the addition of a postcontrast T1-weighted image. The RF classifier achieved the best overall results with the use of FL, postcontrast T1-weighted, and T2-weighted images.
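Testing the effect of image types amounts to training on every subset of the available images. A minimal sketch of that enumeration is given below; the helper names and the example intensities are hypothetical, and only the four image-type labels come from the text.

```python
from itertools import combinations

IMAGE_TYPES = ["T1", "T2", "T1C", "FL"]

def image_type_combinations(image_types):
    """Return every non-empty subset of image types, smallest subsets first."""
    return [
        combo
        for size in range(1, len(image_types) + 1)
        for combo in combinations(image_types, size)
    ]

def feature_vector(voxel_intensities, combo):
    """Build a per-voxel feature vector from the chosen image types only."""
    return [voxel_intensities[name] for name in combo]

all_combos = image_type_combinations(IMAGE_TYPES)
print(len(all_combos), "combinations")

# Hypothetical intensities of one voxel in the four co-registered images.
voxel = {"T1": 0.41, "T2": 0.77, "T1C": 0.52, "FL": 0.63}
print(feature_vector(voxel, ("T1C", "T2")))
```

With four image types there are 15 non-empty subsets, so an exhaustive comparison remains cheap compared with the parameter grid search.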
The following standardized procedure was designed in order to compare classifier performance. Training samples were extracted from the whole volume of the unmodified T1C and T2 images. Then, sensitivity and specificity were obtained using fivefold cross-validation. The best-performing parameters and results are reported in Table 1. Segmentations are shown in Figure 10. Classification results can be further improved by using preprocessed data instead of raw data, and by means of postprocessing to remove the outlying voxels and inlying holes, as demonstrated in Reference [16].
Lastly, the performance of the RF classifier trained on all tumor cores of the 20 real high-grade glioma volumes using the 3D Slicer extension was compared to similar studies performed on the BraTS dataset. The values were obtained as a mean of fivefold cross-validation. This comparison is shown in Table 2. The other DICE values are from Reference [17].
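For readers unfamiliar with the DICE value used in this comparison, the following sketch computes it for two binary masks and averages it over folds. The masks are hypothetical flattened voxel lists, and the convention of returning 1.0 for two empty masks is an assumption of this sketch.

```python
def dice_coefficient(predicted, reference):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) of two binary masks."""
    intersection = sum(1 for p, r in zip(predicted, reference) if p and r)
    total = sum(1 for p in predicted if p) + sum(1 for r in reference if r)
    # Assumed convention: two empty masks agree perfectly.
    return 2.0 * intersection / total if total else 1.0

def mean_dice(fold_results):
    """Average the Dice value over (predicted, reference) pairs, one per fold."""
    scores = [dice_coefficient(p, r) for p, r in fold_results]
    return sum(scores) / len(scores)

# Hypothetical flattened masks from two cross-validation folds.
fold_results = [
    ([1, 1, 0, 0], [1, 0, 0, 0]),
    ([0, 1, 1, 0], [0, 1, 1, 0]),
]
print(mean_dice(fold_results))
```

Unlike specificity, the Dice value ignores true negatives, which makes it less sensitive to the large healthy background in brain volumes.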
To further expand the usability of the Supervised Segmentation Toolbox extension, the means to combine the results of different classifiers were added. Currently, logical AND and OR and a majority voting system are implemented. The addition of a Multiple Classifier System (MCS) is currently being considered. A review of the advantages of MCS is provided by Wozniak et al. [18]. Termenon and Graña [19] used a two-stage MCS where the second classifier was trained on low-confidence data obtained by training and analysis of the first classifier. In the future, the authors expect to implement additional classifiers as well. Adding a Relevance Vector Machine (RVM), for example, might bring an improvement over SVM [20].
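The three combination rules mentioned above are simple voxel-wise operations. The sketch below shows one possible formulation on flattened binary masks; the function names and the three example masks are hypothetical and do not reflect the extension's internal API.

```python
def combine_and(masks):
    """Voxel is tumor only if every classifier labels it as tumor."""
    return [int(all(votes)) for votes in zip(*masks)]

def combine_or(masks):
    """Voxel is tumor if at least one classifier labels it as tumor."""
    return [int(any(votes)) for votes in zip(*masks)]

def combine_majority(masks):
    """Voxel is tumor if more than half of the classifiers label it as tumor."""
    return [int(2 * sum(votes) > len(masks)) for votes in zip(*masks)]

# Hypothetical flattened binary segmentations from three classifiers.
nsvm_mask = [1, 1, 0, 0, 1]
csvm_mask = [1, 0, 0, 1, 1]
rf_mask = [1, 1, 0, 0, 0]
masks = [nsvm_mask, csvm_mask, rf_mask]

print(combine_and(masks))
print(combine_or(masks))
print(combine_majority(masks))
```

AND favors specificity and OR favors sensitivity, while majority voting sits between the two; with an even number of classifiers, this formulation resolves ties toward the background label.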
Table 2. RF classifier comparison with similar studies. The classifier was trained using all 20 of the real high-grade glioma volumes, and the DICE value is a mean of fivefold cross-validation.

Conclusions
The Supervised Segmentation Toolbox extension was presented as an addition to the 3D Slicer extension library. This extension allows the user to train and use three types of classifiers, with more to be added in the future. The usability of the extension was demonstrated on a brain-tumor segmentation use case. The effects of the training parameters of all classifiers on the final sensitivity and specificity of the classification were considered to provide an insight into usable parameter selection for future studies. A low γ in combination with softer margin terms resulted in a better-performing classifier for both SVM classifiers. This might be largely due to a limited training sample, and a broader dataset should be analyzed in order to generalize the results. The RF classifier performed best using no added randomization, a relatively large tree count, and a small node size. The possibility of reducing training vector size in order to reduce model complexity and decrease classification time was verified. A 20-fold increase in the number of unique training samples resulted, at best, in a 2% increase in specificity. All combinations of input images were considered as a training input for all classifiers, and the significance of adding more types of images was discussed. A combination of T1C and T2 images performed sufficiently for all classifiers. The addition of the FL image brought a slight improvement in sensitivity. Lastly, the best-performing parameter combinations were listed and the corresponding results were compared. The RF classifier had the largest sensitivity and the worst specificity, whereas C-SVM showed the opposite behavior. The significance of these two metrics largely depends on the type of task for which the classifiers are used. All sensitivity and specificity data were obtained directly using the 3D Slicer extension.

Figure 5. Random Forest classifier sensitivity and specificity using different parameters. Left to right: different node size, number of trees, out-of-bag (OOB) samples, and number of random attributes.

Figure 6. (a) N-SVM and (b) C-SVM sensitivity and specificity using different training vector sizes.


Figure 7. RF-classifier sensitivity and specificity using different training vector sizes.


Table 1. Classifier comparison and best-performing parameters.
