Mathematical Modelling of Ground Truth Image for 3D Microscopic Objects Using Cascade of Convolutional Neural Networks Optimized with Parameters' Combinations Generators

Abstract: Mathematical modelling to compute ground truth from 3D images is an area of research that can strongly benefit from machine learning methods. Deep neural networks (DNNs) are state-of-the-art methods designed for solving these kinds of difficulties. Convolutional neural networks (CNNs), as one class of DNNs, can meet the special requirements of quantitative analysis, especially when image segmentation is needed. This article presents a system that uses a cascade of CNNs with symmetric blocks of layers in chain, dedicated to 3D image segmentation from microscopic images of 3D nuclei. The system is designed through eight experiments that differ in the following aspects: the number of training slices and 3D samples for training, the usage of pre-trained CNNs, and the number of slices and 3D samples for validation. CNN parameters are optimized using linear, brute force and random combinatorics, followed by voter and median operations. Data augmentation techniques such as reflection, translation and rotation are used in order to produce a sufficient training set for the CNNs. Optimal CNN parameters are reached by defining 11 standard and two proposed metrics. Finally, benchmarking demonstrates that the CNNs improve segmentation accuracy and reliability and increase annotation accuracy, confirming the relevance of CNNs for generating high-throughput mathematical ground truth 3D images.


Introduction
Deep neural networks (DNNs), along with the general concept of deep learning (DL), brought a revolution to the area of biomedical engineering, especially medical image processing and segmentation. DL can be described as hierarchical learning based on learning characteristics of data. These models are inspired by the flow and processing of information in the human nervous system. CNNs are a well-known form of DL architecture. When large training datasets are available, DNNs can achieve satisfying results. When real data are not sufficient, augmented data can be generated by data augmentation techniques such as reflection, translation and rotation. Using DNNs for image segmentation in computed tomography (CT) [1,2], magnetic resonance (MR) [3−5] or X-ray [6,7] images has become standard, while promising results are being obtained with DL in microscopy [8−13] and electron microscopy [14−18]. Furthermore, DNNs have been successfully implemented for nucleus segmentation [19−26].
One key point in this DL process is the automatic or semi-automatic calculation of mathematical ground truth (mathematical GT) for datasets, and several alternatives from various fields have been proposed. A comparison between fine and coarse GT datasets of traffic conditions, as well as perspectives on using DNNs to prepare coarse GT datasets, is shown in [27]. The CNN called "V-net", associated with data augmentation, is successfully used for volumetric 3D image segmentation [28]. V-net achieves better results than the solutions presented in the PROMISE 2012 challenge dataset. The CNN "U-net", with a training strategy that also relies on data augmentation, is presented in [29]. U-net outperformed all other solutions at the International Symposium on Biomedical Imaging (ISBI) cell tracking challenge in the phase contrast and DIC categories. A Deep Convolutional Neural Network (DCNN) is presented in [30], with focus on automating the training process using tools such as CellProfiler [31] and data augmentation for the creation of the training dataset. In [32], the authors demonstrate multi-class semantic segmentation of CT and MRI images without specific GT labels, but from a different dataset with the same anatomy. Still, this kind of research does not seem to be common. In addition, the majority of state-of-the-art articles tend to optimize a single DNN.
Here, machine learning methods and techniques are applied in order to develop a robust and objective approach for a given problem: generating a GT for an Arabidopsis thaliana nucleus dataset. The first part of this research [33] uses multiple algorithms, levels and operators to build a model based on traditional unsupervised segmentation methods. After various evaluations, our work shows that the best results on the reduced Arabidopsis thaliana dataset are obtained by Generic Ground Truth Image Approach 7 (GGTI AP7) [34]. GGTI AP7 involves knowledge-based optimization with voter and median operators and is implemented with seven unsupervised segmentation algorithms: Adaptive K-means [35,36], Fuzzy c-means [37−39], KGB (Kernel Graph Cut) [40,41], Multi Modal [42,43], OTSU [44,45], SRM (Statistical Region Merging) [46,47] and APC (Affinity Propagation Clustering) [48−50].
Although GGTI AP7 gives good results, it also has some flaws. Additional techniques do improve the overall segmentation results, but unsupervised methods can behave in unexpected ways in some cases. For example, when using GGTI AP7, difficulties are encountered when segmenting slices at the beginning and at the end of a 3D sample/image stack (where the nucleus is hardly visible or not present). Furthermore, classical algorithms cannot perform semantic segmentation with named output labels, meaning they cannot specifically categorize the objects they segment. To overcome these difficulties, this paper focuses on supervised learning techniques, CNNs, semantic segmentation, operators and various optimizations. It offers a DL approach with a cascade of CNNs (multiple CNNs in sequence) with chained symmetric blocks of layers for semi-automatic mathematical GT generation for the Arabidopsis thaliana dataset. It also demonstrates that the generated CNN architectures can be used as pre-trained CNNs and as a further improvement of the generic GT generator. Furthermore, the tools and methods developed in this paper can be used for CNN development and the generation of mathematical GT for other classes and types of datasets, as illustrated in this paper.

Dataset Description and Preparation
Images from the public OMERO repository at Florida State University, https://omero.bio.fsu.edu/webclient/userdata/?experimenter=-1, (Project: "2015_Poulet et al_Bioinformatics") are used. The dataset used in this paper is the same as in [51] and is composed of 77 Arabidopsis thaliana nuclei, divided into 38 wild-type nuclei and 39 nuclei from the crwn1 crwn2 mutant. Microscopic observations are performed using a Leica Microsystems MAAF DM 16000B [52]. Samples are saved in TIFF format and may include a number of 2D multipage slices per sample; thus, the dataset provides a full 3D image stack sample collection. The training process of CNNs requires input images of the same size. As the Arabidopsis thaliana dataset contains samples of variable sizes, appropriate resizing is performed. All samples are smaller than 110 x 110 pixels and were resized to 40 x 40 pixels in order to make the training process faster.
The algorithms proposed in this paper process 2D images/slices only, so they cannot be directly applied to 3D stack samples. It is necessary to decompose 3D stack samples into n x 2D slices and then compose the slices back into 3D stack samples after processing (Figure 1). As mentioned, GGTI AP7 has drawbacks when segmenting slices at the beginning and at the end of a 3D stack sample, so segmenting and comparing 3D stack samples with all slices would not be useful. Instead, six 3D stack samples are selected, three from each of the wild-type and mutant classes, with ten slices from the middle part of each 3D stack sample. This way it was possible to obtain reliable results comparable with GGTI AP7.
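The decompose/compose round trip described above can be sketched as follows. This is an illustrative Python/NumPy sketch (the original implementation is in MATLAB), and the function names are ours, not the paper's:

```python
import numpy as np

def decompose(stack):
    """Split a 3D stack (n_slices x H x W) into a list of 2D slices."""
    return [stack[i] for i in range(stack.shape[0])]

def compose(slices):
    """Reassemble processed 2D slices back into a 3D stack."""
    return np.stack(slices, axis=0)

# Round trip on a hypothetical 10-slice 40 x 40 stack
stack = np.zeros((10, 40, 40), dtype=np.uint8)
slices = decompose(stack)          # process each 2D slice here
restacked = compose(slices)
assert restacked.shape == stack.shape
```

In practice each 2D slice would be segmented between the two calls, and the segmented slices restacked for 3D evaluation.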
In order to generate a training dataset for the CNN training process and to evaluate the proposed algorithms, some form of GT labelling is necessary. This is done by manual segmentation of the selected dataset. To make the process of manual segmentation more objective and realistic, three team members who are experts in the field made individual manual segmentations for each slice. To summarize, each expert made 3 manual segmentations for 10 slices of 6 different 3D stack samples (Supplementary Table S1). Individual slices for sample 6 are shown in Figure 2a, and a 3D view of the same sample is shown in Figure 2b. The subsequent step is to process the 3 manual segmentations of every slice through the voter and median operators (explained in Section 2.2.1). That is a total of 60 voter and 60 median slice segments for each expert. The corresponding voter and median slices are then processed through a global voter and median operator, resulting in voter and median segments for every slice/3D stack sample. The final voter and median segments of the nucleus are called MGTI (Manual Ground Truth Image). In this study, only voter segments are used for further processing and evaluation. MGTI voter slices for sample 6 are shown in Figure 3a, and a 3D view of the same sample is shown in Figure 3b.

Convolutional Neural Networks (CNNs)
CNNs are trained with large amounts of data and can be used in different fields for different purposes, such as classification or semantic segmentation. They contain a large number of layers, most of which are convolutional layers. Our proposed CNN architecture uses 2 chained symmetric blocks of convolutional-batch normalization-ReLU (Rectified Linear Unit) layers, with a max pooling layer between them. The basic architecture of the CNN generated in this paper is shown in Figure 4, although the GUI tools allow the CNN architecture to be defined further, e.g., by adding additional convolutional-batch normalization-ReLU blocks of layers in chain. The input layer is the first layer in the CNN and primarily determines the input size for the dataset. The input layer size is defined through the GUI and changes depending on the dataset; both 40 x 40 pixel and 200 x 200 pixel images are used, as will be noted along the experiments. The convolutional layer performs a convolution operation over the layer's input as defined by the filter size. This operation reduces the input size, thus enabling faster computing throughout the whole algorithm. Besides reducing the input size, the convolutional layer produces multiple filters as output.
For the first two convolutional layers and the transposed convolutional layer, the filter number is double that of the previous convolutional layer. The batch normalization layer optimizes input values for further calculations while making no changes to the input size. The ReLU layer applies the rectified linear function to its input, also without changing the input size. The batch normalization and ReLU layers have no explicit parameters. The max pooling layer reduces the layer's input size by applying a moving window of the given filter size over the whole input, forwarding only the maximum value from that window. The filter size used in the max pooling layer is 2 x 2 pixels, with a stride of 2 x 2 pixels. In order to make the transposed convolutional layer functional, its input size must be calculated correctly. That is why the two convolutional layers preceding the transposed layer must have proper output dimensions. To achieve that, padding must be applied to the input matrices of the convolutional layers. The formula for the padding of convolutional layers is:

padding = k pixels, for filter size 2k + 1, k ∈ ℤ; k x (k − 1) pixels, for filter size 2k, k ∈ ℤ (1)

Some of the previously described layers reduce the size of the filters. The transposed convolutional layer applies a transposed convolution operation on the filters, computing filters of the initial size as output. This layer has a filter size of 4 x 4 pixels and a stride of 2 x 2 pixels. A cropping of 1 x 1 pixel is applied due to the previously added paddings. The fully connected layer serves as a connection between every pixel and the previously calculated filters. This convolutional layer has the default stride of 1 x 1 pixel and outputs two filters, as there are two labels: background and foreground (nucleus). The softmax layer calculates the probability of each pixel belonging to one of the two defined labels. The pixel classification layer produces the final semantic segmentation, having a pixel-labelled output of the same size as the initial input.
The softmax and pixel classification layers have no explicit parameters. Each CNN is trained for 300 epochs. Other parameter configurations, along with other details, are explained in the next section.
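The size-preserving ("same") padding rule of Equation (1) can be made concrete with a short sketch. For a stride-1 convolution, the total padding is filter size minus 1: an odd filter 2k + 1 pads symmetrically with k pixels per side, while an even filter 2k pads asymmetrically with k and k − 1 pixels. This is an illustrative Python helper (our own naming, not part of the paper's MATLAB code), written under that reading of the formula:

```python
def same_padding(filter_size):
    """Per-side (left, right) padding so a stride-1 convolution
    preserves the input size.

    Total padding is filter_size - 1: an odd filter (2k + 1) splits it
    evenly as (k, k); an even filter (2k) splits it as (k, k - 1).
    """
    k = filter_size // 2
    if filter_size % 2:           # filter size 2k + 1
        return (k, k)
    return (k, k - 1)             # filter size 2k

assert same_padding(3) == (1, 1)  # odd filter, symmetric padding
assert same_padding(4) == (2, 1)  # even filter, asymmetric padding
```

With this padding, the two convolutional layers preceding the transposed layer keep the dimensions the 2 x 2 max pooling and 4 x 4 transposed convolution expect.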

Knowledge Based (KB) Module
The KB module consists of several components: parameters' combinatorics optimization and the voter and median operators.
Parameters' combinations optimization is implemented through different combinatorics of CNN parameter values. The purpose of this combinatorics operator is to thoroughly explore the domain parameter space and to generate and train multiple CNNs with different parameter values using one of the combinatorics methods, in order to achieve more reliable results. We propose 3 parameters' combinatorics optimization methods: the linear, brute force and random parameter methods.
The CNN parameters included in the combinatorics operator are: number of filters, learning rate, filter size, mini-batch size and momentum. After thorough evaluation, we concluded that these parameters have the greatest significance for CNN performance while keeping the training time of the CNNs within reasonable limits. The parameters' value domains are also determined through evaluation. As for the other CNN parameters, we determined their values from the tests and training processes for which we achieved the best results. Default values are used wherever not explicitly stated otherwise.
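The three combinatorics methods can be sketched as simple generators of parameter combinations. This is an illustrative Python sketch (the original implementation is in MATLAB); the parameter domains shown are hypothetical examples, since the actual domains were determined empirically, and "linear" is read here as pairing the i-th value of every domain:

```python
import itertools
import random

# Hypothetical parameter domains (the actual domains were tuned empirically)
domains = {
    "num_filters":   [16, 32, 64],
    "learning_rate": [0.1, 0.01, 0.001],
    "mini_batch":    [4, 8, 16],
}

def linear_combinations(domains):
    """Linear method: the i-th value of every domain forms one combination."""
    n = min(len(v) for v in domains.values())
    return [{k: v[i] for k, v in domains.items()} for i in range(n)]

def brute_force_combinations(domains):
    """Brute force method: the Cartesian product of all domains."""
    keys = list(domains)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*domains.values())]

def random_combinations(domains, n, seed=0):
    """Random parameter method: n independent random draws."""
    rng = random.Random(seed)
    return [{k: rng.choice(v) for k, v in domains.items()} for _ in range(n)]

assert len(linear_combinations(domains)) == 3
assert len(brute_force_combinations(domains)) == 27
```

One CNN would then be generated and trained per combination, which is why the brute force method dominates training time while the linear and random methods keep it bounded.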

Linear method (LM)
The linear method presumes the generating and training of CNNs as follows:
- Choose n combinations of CNN parameters, with linearly selected values from their domains.
- Generate and train n sets of CNNs for each selected combination.
Pseudocode for LM is defined as Pseudocode 1.

Voter operator
Once the CNNs have been generated and trained, each of them has its own semantic segmentation output. With the voter operator, it is possible to calculate a single segmentation from these CNNs.
The formula for the voter operator is:

voter(i,j) = 1 if Σ_{k=1..n} s_k(i,j) > n/2, and 0 otherwise,

where i, j represent the row and column of a pixel in the semantic segmentation image, s_k is the k-th CNN semantic segmentation and n represents the number of CNN semantic segmentations. The values 1 and 0 correspond to the foreground and background labels of the nucleus segmentation.
In order for the voter operator to function properly, the number of CNNs has to be odd. The voter operator is shown in Figure 5 (for demonstration purposes, segmentation images have 2 x 4 dimensions). The voter operator is calculated for the linear, brute force and random parameter methods separately, but can also be combined into another voter segmentation.
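The pixel-wise majority vote can be sketched in a few lines. This is an illustrative Python/NumPy sketch (the original implementation is in MATLAB), with function and variable names of our own choosing:

```python
import numpy as np

def voter(segmentations):
    """Pixel-wise majority vote over an odd number of binary segmentations.

    A pixel is foreground (1) when more than n/2 of the n input
    segmentations label it foreground.
    """
    segs = np.stack(segmentations, axis=0)
    n = segs.shape[0]
    assert n % 2 == 1, "voter requires an odd number of segmentations"
    return (segs.sum(axis=0) > n // 2).astype(np.uint8)

# Three hypothetical 2 x 4 CNN outputs, as in Figure 5
s1 = np.array([[1, 1, 0, 0], [0, 0, 0, 1]])
s2 = np.array([[1, 0, 0, 0], [0, 1, 0, 1]])
s3 = np.array([[1, 1, 1, 0], [0, 0, 0, 0]])
v = voter([s1, s2, s3])
assert v.tolist() == [[1, 1, 0, 0], [0, 0, 0, 1]]
```

Requiring an odd n guarantees that no pixel can receive an exact tie.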

Median operator
Similar to the voter operator, the inputs to the median operator are the segmentation images from the generated CNNs. All segmentation images are sorted in ascending order, using the number of pixels with the foreground label as the sorting key. Due to the odd number of segmentation slices, the middle element can be taken as the resulting image. The formula for the median operator is:

median = s_{⌊n/2⌋+1},

where n represents the number of segmentations (in this case the number of CNNs), s_x is the x-th segmentation in the previously sorted order, and ⌊x⌋ (floor of x) is the biggest integer not bigger than x. A demonstration of the median operator is shown in Figure 6.
Figure 6. Graphical demonstration of the median operator. Suppose that there are five generated CNNs: (a) each CNN makes a semantic segmentation for a slice, with grey pixels being the background label and white pixels being the foreground (nucleus) label; (b) after sorting the resulting segmentation slices, the median slice can be selected.
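The median operator amounts to a sort plus an index. An illustrative Python/NumPy sketch (function names are ours; the original implementation is in MATLAB), noting that the 1-based index ⌊n/2⌋+1 corresponds to the 0-based index n // 2:

```python
import numpy as np

def median_operator(segmentations):
    """Sort binary segmentations by foreground pixel count (ascending)
    and return the middle one; the number of inputs is assumed odd."""
    ordered = sorted(segmentations, key=lambda s: int(np.sum(s)))
    n = len(ordered)
    # 1-based index floor(n/2) + 1 == 0-based index n // 2
    return ordered[n // 2]

# Three hypothetical 2 x 2 segmentations with foreground counts 4, 1, 4
segs = [np.full((2, 2), v, dtype=np.uint8) for v in (1, 0, 1)]
segs[1][0, 0] = 1
m = median_operator(segs)
assert int(m.sum()) == 4   # sorted counts are 1, 4, 4; middle one has 4
```

Unlike the voter, the median operator returns one of the original segmentations unchanged rather than a pixel-wise blend.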

Cascade of CNNs
In order to fully describe and understand the model proposed in this paper, it is necessary to describe the cascade of CNNs more closely. The cascade of CNNs does not present a new type of CNN. Instead, the outputs of the previously described CNNs generated with each combinatorics method are used sequentially by applying the median and voter operators on them. This approach is simple and superior to the potentially poor performance of a single CNN, because it is able to compensate for the flaws of one CNN and optimize the results using the proposed operators over multiple CNNs. In this way, the model of the cascade of CNNs is defined through the mathematical operators used within the algorithms themselves: the voter, median and combinatorics operators. In addition to the mathematical operators, the proposed model uses statistical evaluation functions (described in Section 2.2).
It is also possible to define performance parameters of the cascade of CNNs (as with the CNN structure parameters, it is understood that unlisted parameters take default suggested values).

Data Augmentation
CNN training requires a great amount of data in order to produce the best and most usable results and CNNs. Many publicly available pre-trained CNNs [53] used thousands of pictures for training. To this aim, we generate artificial samples based on the real dataset. The ImageDataAugmenter [54] MATLAB library, which incorporates data augmentation techniques such as reflection, translation and rotation, is used for data augmentation.
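The three transformations named above can be illustrated with a minimal NumPy sketch (the actual pipeline uses MATLAB's ImageDataAugmenter; this is not its API). The translation here is a circular shift as a simple stand-in for a true translation, and the rotation is restricted to multiples of 90° so no interpolation is needed:

```python
import numpy as np

def augment(image, rng):
    """Produce one augmented copy of a 2D image using reflection,
    translation (circular shift) and rotation (multiples of 90 deg)."""
    out = image
    if rng.random() < 0.5:                      # horizontal reflection
        out = np.fliplr(out)
    shift = rng.integers(-3, 4, size=2)         # small random translation
    out = np.roll(out, tuple(shift), axis=(0, 1))
    out = np.rot90(out, k=int(rng.integers(0, 4)))  # random 90 deg rotation
    return out

rng = np.random.default_rng(42)
img = np.arange(16).reshape(4, 4)
aug = augment(img, rng)
assert aug.shape == img.shape
# These transformations only rearrange pixels, never change their values:
assert sorted(aug.ravel()) == sorted(img.ravel())
```

Each raw/MGTI training pair would be passed through the same random transformation so that the image and its labels stay aligned.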

Metrics Evaluation Module
In order to make the result evaluation more robust and reliable, we use 13 metrics. It is necessary to define the basic terminology for the metrics [55]. Let the ground truth image (GTI) be defined as the referenced slice, and the segmented image (SI) as the segmented slice calculated from the semantic segmentation of a CNN. The white value represents the foreground (nucleus) label, and the black value represents the background label. If a GTI white (+) pixel maps to an SI white (+) pixel, that is a TP. If a GTI white (+) pixel maps to an SI black (−) pixel, that is an FN. If a GTI black (−) pixel maps to an SI white (+) pixel, that is an FP. If a GTI black (−) pixel maps to an SI black (−) pixel, that is a TN.
In addition, the T, P, Q and N parameters are defined from these pixel counts. It is then possible to define the formulas for the metrics, e.g., Accuracy = (TP + TN)/(TP + TN + FP + FN) (8) and Sensitivity = TP/(TP + FN) (10) [56], as well as AUC. Monitoring a single metric can lead to wrong conclusions: even if a metric shows good results, this does not mean that the results are reliable or objective. The last two metrics, AllMetricsConcurent and AllMetricsConcurentQuadratic, are original metrics introduced to overcome these issues.
These metrics include all other metrics in their calculation, in order to achieve a more reliable and objective evaluation of the results. This approach compensates for metrics that show outlier results compared to the others. The metric AllMetricsConcurentQuadratic can be considered the most severe, because it uses a formula that the research team ranked as giving the most objective and robust evaluation results.
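The pixel-level confusion counts and a few of the standard metrics built on them can be sketched as follows. This is an illustrative Python/NumPy sketch using the textbook definitions of Accuracy, Sensitivity, Dice and Jaccard (the paper's MATLAB implementation and its two proposed composite metrics are not reproduced here):

```python
import numpy as np

def confusion_counts(gti, si):
    """Pixel-wise TP/FN/FP/TN between a ground truth image (GTI) and a
    segmented image (SI); 1 = foreground (nucleus), 0 = background."""
    gti, si = np.asarray(gti, bool), np.asarray(si, bool)
    tp = int(np.sum(gti & si))
    fn = int(np.sum(gti & ~si))
    fp = int(np.sum(~gti & si))
    tn = int(np.sum(~gti & ~si))
    return tp, fn, fp, tn

def accuracy(tp, fn, fp, tn):
    return (tp + tn) / (tp + fn + fp + tn)

def sensitivity(tp, fn, fp, tn):
    return tp / (tp + fn)

def dice(tp, fn, fp, tn):
    return 2 * tp / (2 * tp + fp + fn)

def jaccard(tp, fn, fp, tn):
    return tp / (tp + fp + fn)

gti = np.array([[1, 1, 0, 0]])
si  = np.array([[1, 0, 1, 0]])
tp, fn, fp, tn = confusion_counts(gti, si)
assert (tp, fn, fp, tn) == (1, 1, 1, 1)
assert accuracy(tp, fn, fp, tn) == 0.5
```

A composite metric in the spirit of AllMetricsConcurent would then aggregate all individual metric values into a single score so that no single outlier metric dominates the evaluation.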

Model Based on CNNs
A generic diagram of the complete process is shown in Figure 8. The previously described dataset is divided into two parts: the part for manual segmentation and the part for validation (Module 5). Module 5 produces two types of GT (GT from GGTI AP7 and MGTI) that are used as referenced segmentations.
Foreground and background labels are specified in Module 1. After manual segmentation, the training set is passed to the deep learning module for the generic ground truth image generator (GGTI DL), Module 3. Different variants of this module will be explained in later paragraphs. After the DL generator, the mathematical GT for specific slices is calculated, and the slices are passed to the metrics evaluation module, along with the raw images and the mathematical GT from Module 2. The metrics results are later evaluated and analysed. Variant A uses only one 3D stack sample for training/validation, while variant B uses multiple 3D stack samples.
The different variants contained in Module 3 are shown in Table 1. A generic diagram of variants 1A−2B is shown in Figure 9. These variants have the same processing steps but differ in their training blocks. Variant 1A operates with one 3D stack sample only. After Module 1 finishes with manual segmentation, a single raw and MGTI slice (with labels) is passed to the KB module as the training set for the CNNs. The result of the KB module is k CNNs. A validation raw slice from the same 3D stack sample is processed through the trained CNNs, resulting in k segmented slices. The segmented slices are processed by the voter and median operators, creating the final result. The voter and median segmented slices are passed to Module 4, along with the validation segmented slice, for evaluation.
GGTI DL variant 1B has the same steps as variant 1A, but differs in the data used for training and validation. A single MGTI slice (with labels) from multiple 3D stack samples is used for training the CNNs. The validation slice is from a 3D stack sample that was not included in training. Variant 2A operates with one 3D stack sample only, but uses multiple slices for training the CNNs. After Module 1 finishes with manual segmentation, multiple raw and MGTI slices (with labels) are passed to the KB module as the training set for the CNNs. The result of the KB module is k CNNs. A validation raw slice from the same 3D stack sample is processed through the trained CNNs, resulting in k segmented slices. The segmented slices are processed by the voter and median operators, creating the final result. The voter and median segmented slices are then passed to Module 4, along with the validation segmented slice, for evaluation.
GGTI DL variant 2B has the same steps as variant 2A, but differs in the data used for training and validation. Multiple MGTI slices (with labels) from multiple 3D stack samples are used for training the CNNs. The validation slice is from a 3D stack sample that was not included in training.

GGTI DL Variant 3A and 3B
GGTI DL variant 3A differs in its steps compared to the previous variants. It uses pre-trained CNNs from variant 2A. Thus, the structure parameters of the CNNs in the KB module cannot be defined, but the performance parameters can. These pre-trained CNNs are additionally trained with new MGTI slices from the same 3D stack sample. A diagram for variants 3A and 3B is shown in Figure 10. GGTI DL variant 3B has the same steps as variant 3A, but differs in the data used for training and validation. Multiple MGTI slices (with labels) from multiple 3D stack samples are used for additional training of the pre-trained CNNs from variant 2B. The validation slice is from a 3D stack sample that was not included in either the pre-training or the additional training.
The training blocks (training sets) for variants 1A−3B are shown in Figure 11.

Software Implementation
The result of our research is complete GUI support for all of the previously explained processes. All code and GUI tools are written in MATLAB [62]. The software framework is shown in Figure 12, and the global GUI in Figure 13a. A specific GUI has been developed for Module 1. The Module 1 GUI offers a variety of options, including segmentation options such as global/local threshold, freehand drawing and flood fill. It also offers three manual segmentations with voter/median operators, as well as external loading of segmented slices, export to TIFF/MAT file types, etc. The MGTI GUI tool is shown in Figure 13b. Module 2 is the GUI tool for GGTI AP7 and has been further upgraded since the last publication; its GUI is shown in Figure 13c. The GUI tool for Module 3 (GGTI DL variants 1A, 1B, 2A and 2B) is shown in Figure 13d. It contains elements for configuring CNNs with performance, structural and data augmentation parameters, exporting/importing CNNs, segmenting slices with the generated CNNs, specifying sample resizing and cropping options, saving segmented slices as TIFF/MAT data types, other help tools, etc. The main feature of Module 3 is that it enables iterative training of CNNs: it is possible to generate CNNs for each method more than once, while the CNN counters continue to add CNNs to the environment. The GUI tool for Module 3 (GGTI DL variants 3A and 3B) is shown in Figure 14a. It supports functionalities such as loading pre-trained CNNs, configuring CNNs with performance and data augmentation parameters, exporting/importing CNNs, segmenting slices with the generated CNNs and saving them in TIFF/MAT data types, other help tools, etc. Like the GUIs for variants 1A, 1B, 2A and 2B, it enables iterative additional training of the pre-trained CNNs. The Module 4 (metrics evaluation) GUI tool is shown in Figure 14b,c. It offers a variety of metrics and options, such as calculating 2D and iterative 3D metrics, saving results to MAT files, etc.
The application also includes a GUI for simplifying 3D segmentation and evaluation. It provides import of CNNs, along with batch 3D DL segmentation and 3D metrics evaluation. These features are also available through the previously mentioned GUIs, but this GUI has been developed to provide an all-in-one segmentation and evaluation process. It is shown in Figure 14d.

Results
Experiments 1−6 map onto the models of variants 1A, 1B, 2A, 2B, 3A and 3B. The CNN voter and median segments of the generated CNNs for LM, BFM and RPM are compared to both the MGTI and GGTI AP7 segments; MGTI and GGTI AP7 represent the reference points in these comparisons. That makes a total of four comparisons. In the rest of the paper, all metrics will be addressed by their acronyms: AccuracyAdjustedRandIndex (AARI), AUC (AUC), BoundaryHammingDistance (BHD), DiceCoefficient (DC), FowlkesMallowIndex (FMI), JaccardCoefficient (JC), Precision (P), Rand Index (RI) and so on, with the two proposed metrics abbreviated as AMC and AMCQ. The integral diagram of Tables 2−5 is shown in Figure 15, that of Tables 6−9 in Figure 16, and that of Tables 10−13 in Figure 17.

Experiment 7: Benchmarking-GGTI DL 3D Segmentation
In order to benchmark this novel GGTI DL approach, it is necessary to compare the proposed algorithm to other algorithms. Six manually segmented and prepared Arabidopsis thaliana 3D samples are used for segmentation and evaluation. The other two algorithms are GGTI AP7 and NucleusJ [63], an ImageJ/Fiji [64] plugin.
For GGTI DL segmentation, the CNNs generated in the 3B BFM variant are chosen for 3D segmentation. Both voter and median segments are compared.

Experiment 8: Generalization
In order to show the generalization of the proposed approach, it is necessary to compare the proposed algorithms, both GGTI DL and GGTI AP7, on other microscopic nuclei datasets. The algorithms are evaluated on two datasets: BBBC039 (nuclei of U2OS cells in a chemical screen) [65] and BBBC035 (simulated nuclei of HL60 cells stained with Hoechst) [66]. Preprocessing of the datasets is done by applying the imadjust() MATLAB function to increase contrast. Samples of these datasets are shown in Figure 21. Both datasets contain GT, so no manual segmentation is required. Considering that these are 2D datasets, an alternative approach is applied and a 3D dataset is simulated. First, five slices (ordered alphabetically, A−Z descending) of the corresponding dataset are concatenated in order to make one 3D sample for evaluation. That sample is compared to the corresponding 3D GT sample and the evaluation is performed using the 13 metrics.
For GGTI DL, the CNNs generated from BFM in variant 3B are used and three variants are made:
- Variant 1G: semantic segmentation with the pre-trained CNNs used in experiment 7.
- For variant 2G, the CNNs are additionally trained with samples from each of the datasets independently, meaning that the training and testing process did not involve mixing the two datasets.
Considering that the sample sizes in both datasets are over 500 x 500 pixels, adjustments are necessary in order to conduct a valid training process, because the pre-trained CNNs are trained on a 40 x 40 pixel input size. To manage these differences, random 40 x 40 pixel window cropping of the training set samples is used.
The training set for both datasets consists of the five successive samples found next to the five testing samples (ordered alphabetically, A−Z descending). The results are shown in Tables 34 and 35.
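The random 40 x 40 window cropping used to adapt the large samples to the pre-trained input size can be sketched as follows. This is an illustrative Python/NumPy sketch (the actual implementation is in MATLAB; function and parameter names are ours):

```python
import numpy as np

def random_crop(image, size=40, rng=None):
    """Crop a random size x size window from a 2D image, used to adapt
    samples larger than 500 x 500 pixels to the 40 x 40 input size of
    the pre-trained CNNs."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    top = int(rng.integers(0, h - size + 1))
    left = int(rng.integers(0, w - size + 1))
    return image[top:top + size, left:left + size]

rng = np.random.default_rng(0)
sample = np.zeros((512, 512), dtype=np.uint8)
crop = random_crop(sample, 40, rng)
assert crop.shape == (40, 40)
```

For training, the same window offsets would be applied to the raw image and its GT labels so that each cropped pair stays aligned.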

Comparative Analysis and Discussion
As already mentioned, the AMCQ metric is evaluated as the most objective, and thus all discussion and analysis are conducted in relation to that metric. An aggregation of experiments 1−6 is shown in Figure 22. The best result of experiment 1 (variant 1A) is given in Table 2, while experiment 2 (variant 1B) has its best result for the LM method, CNN median compared to MGTI, with an AMCQ value of 0.97628 (Table 6). Segmented nucleus slices for these experiments are shown in Figure 23. The best result of experiment 3 (variant 2A) is given in Table 10, while experiment 4 (variant 2B) has its best result for the LM method, CNN median compared to MGTI, with an AMCQ value of 0.97879 (Table 14). Segmented nucleus slices for these experiments are shown in Figure 24. Evaluating experiments 1−6, the overall AMCQ comparison between voter and MGTI shows results over 0.94142 for all experiments, for all three methods used (Figure 22a−c). This demonstrates that the proposed concept gives good results, regardless of what and how much data is used for training. Nevertheless, data augmentation and artificial data are an important factor.
In addition, analysing experiments 1−6, the comparison between the CNN voter and MGTI shows better results than the comparison between the CNN voter and GGTI AP7. This indicates that the GGTI DL approaches produce results different from GGTI AP7, but are still reliable and useful. Experiments 1 and 2 show high performance in the comparison between the CNN voter and MGTI, above 0.94. Even with a minimal training set, it seems possible to achieve acceptable results and thus reduce human intervention even further. Experiment 6 shows over 0.96547 for the AMCQ metric, for all results. The larger number of training slices (pre-training + additional training) can explain these results.

Benchmarking
An aggregation of experiment 7 (benchmarking) is shown in Figure 26. The 3D segmentation results for experiment 7, sample 4 can be seen in Figure 28.

Generalization
An aggregation of experiment 8 (generalization) is shown in Figures 29 and 30. Comparing the voter and median segments of the CNNs shows that the CNN voter gives better results than the CNN median throughout all experiments (Figures 22, 26 and 29). This suggests that a larger number of CNNs could provide better results for the median, because of the larger number of output segmentations, and in that way compensate for its lower performance.
The overall results, including the individual, benchmarking and generalization segmentations, and the fact that segmentation labels are available as the end result of the DL process, are optimistic.

Conclusions
This research encompasses the development of a complete system for the mathematical modelling of GT 3D images. The system modules include GUIs for training/importing/exporting CNNs, iterative generation, semantic segmentation and manual segmentation, as well as a performance evaluation module. The approach detailed in this research proves to be more robust than classical algorithms, as it is able to segment specific slices and produce labelled output data. Training iterations finished with a maximum of 35 CNNs per method, but it is possible to choose an arbitrary number of CNNs. With minimal changes to the source code, it is possible to adjust the system to make semantic segmentations for datasets with more than two label classes. The tools presented in this research, associated with appropriate dataset preprocessing, enable the processing of any dataset with two labels. Finally, the system gives better benchmarking and generalization results when compared to other algorithms in the majority of our experiments (Tables 29−31, 33−36).
Future work will involve the expansion of the system by including more parameters in the combinatorics operator to achieve even better results. Training a greater number of CNNs for all methods on more powerful computing devices would probably further increase the reliability of the system. The final stage could involve the modelling of the generic GT for Arabidopsis thaliana. We conclude that DL is a powerful tool from a scalability perspective: (a) application to many classes of double-labelled nuclei, (b) application to multi-label cells pertaining to molecular domains, and (c) generating trained networks for (a) and (b) that could be re-trained with new molecular domains. Thus, our "laboratory scale" deep learning model will move forward by incorporating new molecular domains of the cell.