Computational Complexity Reduction of Neural Networks for Brain Tumor Image Segmentation by Introducing Fermi–Dirac Correction Functions

Nowadays, deep learning methods with high structural complexity and flexibility inevitably lean on the computational capability of the hardware. A platform with high-performance GPUs and large amounts of memory can support neural networks with large numbers of layers and kernels. However, naively pursuing high-cost hardware can hamper the technical development of deep learning methods. In this article, we therefore establish a new preprocessing method to reduce the computational complexity of neural networks. Inspired by the band theory of solids in physics, we isomorphically map the image space onto a noninteracting physical system and treat the image voxels as particle-like clusters. We then reconstruct the Fermi–Dirac distribution as a correction function that normalizes the voxel intensity and filters out insignificant cluster components. The filtered clusters can thereby delineate the morphological heterogeneity of the image voxels. We used the BraTS 2019 datasets and the dimension fusion U-net for the algorithmic validation, and the proposed Fermi–Dirac correction function exhibited performance comparable to the other employed preprocessing methods. Compared with the conventional z-score normalization function and the Gamma correction function, the proposed algorithm saves at least 38% of the computational time cost on a low-cost hardware architecture. Even though the global histogram equalization correction function has the lowest computational time among the employed correction functions, the proposed Fermi–Dirac correction function exhibits better image augmentation and segmentation capabilities.


Introduction
Deep learning methods at present play an indispensable role in the field of computer vision. The relevant fruitful achievements also facilitate and reshape the synergy between clinical diagnosis and computerized assistance. The scope of application covers the demands of computer-guided pathological inspection [1][2][3][4], brain neural circuit mapping and tracking [1,[5][6][7][8][9], specific tissue detection in image-based datasets [5,[10][11][12][13], and other clinical applications. Among these applications, neural network (NN)-based recognition methods that are capable of detecting life-threatening abnormalities from image-based datasets especially attract the attention of both scientific and engineering participants [2,11,[14][15][16][17] and are gradually replacing conventional approaches. Within these techniques, the emergence of fully convolutional neural networks (FCNs) has attracted considerable attention, and FCN-based methods have further elevated the performance of convolutional neural networks (CNNs) in modern medical image recognition and segmentation [15,[18][19][20]. In clinical practice, on the other hand, medical image datasets are often sparse, so technical development in these fields is led by NN models suitable for small-size datasets. Thus, starting from the FCN, through V-net [15], U-net [21][22][23], and auxiliary feature extraction blocks [24][25][26], to the attention U-net [27,28] and the dimension fusion U-net (D-Unet) [29], these methods have gradually become the developmental core of medical image analysis. Meanwhile, due to the extremely high mortality of malignant brain tumors [1,13], investigations of their modalities have also shaped the mainstream techniques of brain tumor image segmentation [1,30] and the procedures of image-guided surgery.
However, in the pursuit of model prediction performance, continually increasing model depths and kernel numbers becomes a burden on the hardware architecture from the viewpoints of power control and computational complexity. For instance, the GPU memory consumption and the computational complexity are both proportional to the size and number of convolutional kernels. The large numbers of parameters in training procedures also challenge the memory sizes of GPUs, the design of algorithmic schemes, and the power requirements [14]. Additionally, the labor intervention and time cost in the data labeling stage obstruct the progress of technical development. Thus, the development of deep learning methods inevitably steps into configurations of high-cost hardware and high computational complexity [14,16]. The combination of contour-based models and statistical-learning-based methods offers alleviating approaches by utilizing prior information [3,[31][32][33][34] to reduce the high cost incurred by deep learning schemes. For instance, statistical models of random walks are used for tracking and segmenting thin and elongated image components by assigning seed points at the ends of these components [35]. Region-based contour models combined with statistical classification processes have been utilized to solve the problems of intensity inhomogeneity occurring within image blocks [36][37][38] by arranging curve functions on the blocks of interest. Nevertheless, the manual intervention required by these combinatorial approaches causes other problems in artificially predefining labels of seeded image components [14,[39][40][41]. Physics-based methods offer another track for dealing with these ongoing problems. The data density functional theory (DDFT) successfully extracts intensity features from brain tumor image datasets by estimating the morphological heterogeneity in energy spaces [42,43], thereby removing undesired manual intervention. However, an architecture that bridges deep learning procedures and DDFT has not yet appeared. Therefore, inspired by the merits of the physical framework of DDFT and also considering the advantages of deep learning methods, we introduced the Fermi-Dirac distribution as a correction function in the following procedures. By employing the theoretical scheme of DDFT, we mapped the three-dimensional image space onto a noninteracting physical system and treated each image voxel as a physical particle. Under this condition, the Fermi-Dirac distribution was used to analyze the particle behavior and classify the particle clusters according to their intrinsic energies. The particle clusters thus formed a specific morphological structure according to their energy properties in the physical system. The estimates were then mapped back to the image space, and the image voxels were decomposed by considering their morphological heterogeneity. In this article, we compared the performance of the Fermi-Dirac correction function in U-net-based architectures. The experimental results validated the superiority of the proposed Fermi-Dirac correction function for the purpose of computational complexity reduction.

Datasets
For fair comparisons, we employed the open datasets from the multimodal brain tumor segmentation challenge (BraTS) 2019 [44,45] to test the proposed algorithmic schemes. The dataset comprises four modalities: T1, T1-CE (T1-weighted contrast-enhanced), T2-weighted, and T2-FLAIR (fluid-attenuated inversion recovery). The datasets were skull-stripped, interpolated to the same resolution, and labeled by experienced neuro-radiologists. There are four label types for the tumor anatomic structures, comprising the enhancing tumor (ET), the peritumoral edema, the necrotic and nonenhancing tumor core (TC), and the background. There are 300 testing sets and 35 validation sets, with 155 image slices in each individual set. Each image is 240 pixels in both height and width, and all of the images are in axial view.
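As a concrete illustration of the dataset layout described above, the following minimal sketch loads the four modalities of one BraTS case and stacks them into a single multi-channel volume. It is only our assumed reading of the usual BraTS file organization; the nibabel reader and the file-name pattern are assumptions, not part of the original work, so local copies may need adjusted paths.

```python
import numpy as np
import nibabel as nib  # common reader for the NIfTI files distributed with BraTS


def load_brats_case(case_dir, case_id):
    """Stack the four MRI modalities of one BraTS case into (H, W, slices, 4).

    The file-name pattern below follows the usual BraTS convention
    (e.g. <case_id>_t1.nii.gz) and may need adjustment for local copies.
    """
    modalities = ["t1", "t1ce", "t2", "flair"]
    volumes = []
    for m in modalities:
        img = nib.load(f"{case_dir}/{case_id}_{m}.nii.gz")
        volumes.append(np.asarray(img.get_fdata(), dtype=np.float32))
    return np.stack(volumes, axis=-1)  # e.g. 240 x 240 x 155 x 4


# volume = load_brats_case("./BraTS2019/HGG/case_001", "case_001")
```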

Theoretical Scheme of Fermi-Dirac Correction Function
According to the theoretical framework of DDFT, an arbitrary high-dimensional image space can be isomorphically mapped into a pseudo-physical space with the same dimensionality once the data length is fixed [42]. Image voxel components are then correspondingly mapped as particle-like clusters. Thus, there exists a bijective relation between the voxel intensity distribution and the Fermi hypersurface function. In other words, the voxel intensity can be mapped as a corresponding energy value within the pseudo-physical space. The voxel intensity distribution becomes a function of the Fermi hypersurface function under this theoretical framework, and vice versa. As the Fermi hypersurface function has an adequate theoretical definition in the pseudo-physical space [42], we can safely introduce appropriate physical properties to deal with the situations encountered. For instance, the Hamiltonian curves and the Lagrangian density functional can be exploited to measure the most probable cluster numbers and the cluster boundaries, respectively [42,46]. Thus, we can delineate the morphological structure of images by applying these energy properties under the framework of DDFT. For image segmentation, we introduced a specific particle distribution in terms of energy properties, the Fermi-Dirac distribution, to analyze and pre-decompose the image clusters according to their morphological structures.
Under the DDFT framework, the pseudo-physical space is spanned as a noninteracting (or weakly interacting) system, so we can establish the required conditions by considering each cluster to be a subsystem and each particle (mapped from a voxel in the image space) to be in a single-particle state. Additionally, as the locations and the corresponding intensity distribution of voxels within an image space are all fixed, there is no interaction between voxels, and the space can be treated as a stationary, "frozen" system. In other words, the clusters do not exchange particles or energy with their surroundings in the noninteracting system; thus, we can impose the additional condition of zero temperature on this system. The probability of observing n particles in a single-particle state can then be estimated from the perspective of thermodynamic properties:

P(n) = e^(−n(ε − ε_F)/ε_b) / Z, (1)

where the parameters ε, ε_F, and ε_b are the energy of the single-particle state, the Fermi energy at zero temperature, and the correction energy for dimensional balance, respectively. The symbol Z represents the partition function of the subsystem of interest:

Z = Σ_n e^(−n(ε − ε_F)/ε_b). (2)

The grand potential G_V can then be defined by designating the particles as pseudo-fermions or pseudo-bosons, respectively:

G_V^FD = −ε_b ln(1 + e^(−(ε − ε_F)/ε_b)), G_V^BE = ε_b ln(1 − e^(−(ε − ε_F)/ε_b)). (3)

The corresponding particle distribution functions for each particle type can then be expressed formally as:

n̄_FD = 1 / (e^((ε − ε_F)/ε_b) + 1), n̄_BE = 1 / (e^((ε − ε_F)/ε_b) − 1). (4)

These expressions are the well-known Fermi-Dirac (FD) distribution and Bose-Einstein (BE) distribution, respectively, and Equation (4) lists their representations in our proposed pseudo-physical system. It should be emphasized that n takes the values 0 and 1 for pseudo-fermions and 0, 1, 2, . . . , ∞ for pseudo-bosons. Theoretically, the maximum of n for each particle type (fermions or bosons) represents the maximum number of particles that a single-particle state can contain.
We found an interesting phenomenon by simultaneously comparing the theoretical form of the FD distribution with those of the sigmoid function and the z-score normalization. The coarse form of the FD distribution is exactly the sigmoid function, so the pseudo-particle distribution is squeezed into the range of [0, 1], as expected. On the other hand, as the mathematical kernel of the FD distribution, (ε − ε_F)/ε_b, also has the same form as the z-score normalization, we found a route to properly determine the pseudo-energy parameters by comparing their respective parameters. Because the voxel intensity in the image space and the energy in the pseudo-physical space are bijectively related, we correspondingly assigned the parameters ε, ε_F, and ε_b to be the intensity value distribution of each image, the global mean intensity of each testing set, and the negative global standard deviation of intensity of each testing set, respectively. Thus, the FD distribution exhibits its own mathematical merits for data processing by fusing the characteristics of the sigmoid function and the z-score normalization. Mathematically, we can clarify the difference between the FD and the BE distributions by examining their behavior under extreme conditions. When the value of ε − ε_F in the kernel approaches zero, i.e., when the intensity approaches its mean value in the subsystem, the value of the FD distribution reaches 0.5, whereas that of the BE distribution approaches infinity. Thus, the BE distribution intrinsically reveals behavior opposite to that of the sigmoid function. When the kernel is much larger than 1, both the FD and BE distributions have the same mathematical form, n̄ ≈ e^(−(ε − ε_F)/ε_b), which is the well-known Maxwell-Boltzmann (MB) distribution. Figure 1 illustrates the mathematical behavior of the mentioned distributions for different values of the negative global standard deviation ε_b.
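To make the limiting behavior described above concrete, the following short sketch (our illustration, not part of the original pipeline) evaluates the FD, BE, and MB forms of Equation (4) on the same kernel z = (ε − ε_F)/ε_b and reproduces the three regimes: the FD value tending to 0.5 near the mean intensity, the BE value diverging there, and both collapsing onto the MB form for large kernels.

```python
import numpy as np


def occupation(z, kind="FD"):
    """Mean occupation number as a function of the kernel z = (eps - eps_F) / eps_b."""
    if kind == "FD":   # Fermi-Dirac: bounded in (0, 1), sigmoid-like
        return 1.0 / (np.exp(z) + 1.0)
    if kind == "BE":   # Bose-Einstein: diverges as z -> 0+
        return 1.0 / (np.exp(z) - 1.0)
    if kind == "MB":   # Maxwell-Boltzmann limit, valid when z >> 1
        return np.exp(-z)
    raise ValueError(f"unknown distribution kind: {kind}")


z = np.array([0.01, 0.5, 1.0, 3.0, 6.0])
print(occupation(z, "FD"))  # starts near 0.5 at z ~ 0 and decays toward 0
print(occupation(z, "BE"))  # ~100 at z = 0.01, i.e., blows up near the mean intensity
print(occupation(z, "MB"))  # FD and BE both approach this curve once z >> 1
```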
As bosons are allowed to share the same energy value, this would lead to the consequence that all voxels are classified into the same cluster and become indistinguishable. On the other hand, the intrinsic mathematical form of the BE distribution also causes it to diverge when the voxel intensity approaches its mean value; these inconvenient properties make the BE distribution hardly usable as a correction function for data normalization and data filtering. A similar situation occurs for the MB distribution. Additionally, it is technically impractical to estimate the mean intensity of each subsystem because of the undesired computational complexity; thus, we utilized the global mean intensity of the whole system to overcome this issue. Therefore, considering both the theoretical and the technical merits, the FD distribution is more suitable as a correction function in the data preprocessing procedures.
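Since ε is the voxel intensity, ε_F the global mean intensity, and ε_b the negative global standard deviation, the FD correction reduces to a sigmoid applied to the z-score of each voxel. The sketch below is a minimal reading of that recipe, computing the statistics per volume rather than per testing set as an assumption for self-containedness; it is not the authors' released implementation.

```python
import numpy as np


def fermi_dirac_correction(volume, eps_f=None, eps_b=None):
    """Fermi-Dirac (FD) correction: n = 1 / (exp((eps - eps_F) / eps_b) + 1).

    eps   : voxel intensity (the input volume),
    eps_F : global mean intensity (per testing set in the paper; per volume here),
    eps_b : minus the global standard deviation of intensity.
    With these choices the correction equals sigmoid(z-score), so intensities are
    squeezed into (0, 1) and low-intensity, insignificant components are suppressed.
    """
    volume = volume.astype(np.float64)
    if eps_f is None:
        eps_f = volume.mean()
    if eps_b is None:
        eps_b = -volume.std()
    kernel = (volume - eps_f) / eps_b
    return 1.0 / (np.exp(kernel) + 1.0)


# corrected = fermi_dirac_correction(volume)  # volume loaded as in the Datasets sketch
```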

Experimental Framework
To test the performance of the proposed FD correction function, we employed three conventional preprocessing methods, namely a z-score normalization function, a Gamma correction function, and a three-dimensional (3D) global histogram equalization function [47], for the performance comparison, and used a null preprocess as the baseline. As mentioned above, the mathematical structure of the FD correction function consists of a coarse skeleton and a kernel embedded in the skeleton. Theoretically, the skeleton is a sigmoid function, and the kernel has the same form as the z-score normalization function. Thus, we adopted the z-score normalization function as one of the preprocessing methods for a fair performance comparison and then investigated the differences between these functions. On the other hand, we also adopted the Gamma correction function because its performance is adjustable by assigning an appropriate parameter, which allows us to compare the performance of conventional supervised correction functions with that of the proposed FD correction function. Determining the optimal Gamma parameter relies directly on a large number of experiments. We drew the Gamma parameter from the set {0.2, 0.4, 0.6, 0.8, 1.0, 1.2, 1.5, 2} and adopted the value of 0.6 because it offered the best image segmentation performance. Finally, we compared the image augmentation capability of the global histogram equalization function with that of our functions.
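For reference, a straightforward sketch of the three baseline corrections used in the comparison is given below: z-score normalization, Gamma correction with γ = 0.6, and a global histogram equalization applied over the full 3D volume. The exact implementations used in the experiments were not published with this text, so these are assumed, minimal versions.

```python
import numpy as np


def zscore(volume):
    """Standard z-score normalization over the whole volume."""
    return (volume - volume.mean()) / (volume.std() + 1e-8)


def gamma_correction(volume, gamma=0.6):
    """Gamma correction after rescaling intensities to [0, 1]."""
    v = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    return v ** gamma


def global_hist_eq_3d(volume, n_bins=256):
    """Global histogram equalization applied to the full 3D volume at once."""
    v = (volume - volume.min()) / (volume.max() - volume.min() + 1e-8)
    hist, bin_edges = np.histogram(v.ravel(), bins=n_bins, range=(0.0, 1.0))
    cdf = hist.cumsum().astype(np.float64)
    cdf /= cdf[-1]  # normalized cumulative distribution used as the intensity mapping
    return np.interp(v.ravel(), bin_edges[:-1], cdf).reshape(v.shape)
```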
Thus, five preprocessing scenarios were employed for the performance comparison under a specific neural network model. To this end, we utilized a D-Unet-based structure for the brain tumor image segmentation; Figure 2 illustrates the relevant framework. D-Unet is built on the conventional U-net structure, and its structure is therefore suitable for dealing with small-size datasets. Furthermore, a delicate functional component named the dimension-transform-block [29] is employed in the encoding procedure of D-Unet, and all of the extracted two- and three-dimensional features are fused through these blocks. As illustrated in Figure 2, two dimension-transform-blocks (indicated as the red columns) were used in the proposed D-Unet scheme. Meanwhile, to reduce the undesired computational cost, only the fused features and the two-dimensional features are fed into the decoding procedure. Thus, the D-Unet structure can simultaneously preserve the significant high-dimensional features and offer an acceptable computational cost.

Under the framework of the employed D-Unet structure, the images were scaled down to 160 pixels in both height and width, and the input of each image comprised four channels: T1, T1-CE, T2-weighted, and T2-FLAIR. The adopted numbers of filters are indicated above each layer, and only one channel, the whole tumor (WT), is output for the performance comparison. The output activation layer was a sigmoid function. The number of epochs and the batch size were set to 30 and 32, respectively, for all preprocessing scenarios. The optimizer and the loss function were Adam and the three-dimensional soft dice loss function [48,49], respectively. The analytic form of the soft dice loss function is [15,[48][49][50]:

DL = 1 − (2 Σ_{i=1}^{N} p_i g_i + ϵ) / (Σ_{i=1}^{N} p_i + Σ_{i=1}^{N} g_i + ϵ),

where N is the total data length and the smoothing term ϵ = 10^(−5) was adopted to prevent DL from diverging. The parameters p_i ∈ {0, 1} and g_i ∈ {0, 1} are the binary predicted segmentation and the ground truth labeling, respectively. Ref. [50] offers the source code of the soft dice loss. The initial learning rate of all scenarios was set to 0.00015. The relevant model parameters and functions are collected and listed in Table 1.

Figure 2. The framework of the D-Unet-based structure for the brain tumor image segmentation. The original image size is 240 pixels, and we rescaled the input size to 160 pixels to match the requirements of the adopted neural network models. The third dimension of 160 × 160 × 4 used in the 2D U-net represents the four image types, including T1, T1-weighted contrast-enhanced (T1-CE), T2-weighted, and T2-fluid-attenuated inversion recovery (FLAIR), and likewise for the 3D feature extraction procedure. The fourth dimension of 160 × 160 × 4 × 1 in the 3D feature extraction procedure represents the number of trials. The dimension-transform-blocks were used to fuse the two- and three-dimensional features in the encoding procedure, and only the fused and two-dimensional features were used for the information decoding. Thus, these procedures offer a trade-off between high-dimensional information and computational complexity.
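The soft dice loss written above can be implemented compactly. Ref. [50] provides the source code actually used; the NumPy version below is only our illustrative rewrite with the same smoothing term ϵ = 10⁻⁵ (a training implementation would operate on framework tensors rather than NumPy arrays).

```python
import numpy as np


def soft_dice_loss(pred, truth, smooth=1e-5):
    """Soft dice loss: DL = 1 - (2*sum(p*g) + eps) / (sum(p) + sum(g) + eps).

    pred  : predicted segmentation (probabilities or binary labels), any shape
    truth : binary ground-truth labels, same shape as pred
    """
    p = np.asarray(pred, dtype=np.float64).ravel()
    g = np.asarray(truth, dtype=np.float64).ravel()
    intersection = np.sum(p * g)
    return 1.0 - (2.0 * intersection + smooth) / (np.sum(p) + np.sum(g) + smooth)


# A perfect prediction yields a loss of ~0:
# soft_dice_loss(np.array([1., 0., 1.]), np.array([1., 0., 1.]))  # -> ~0.0
```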

Results
All of the experiments were executed on hardware comprising an i9-9980XE CPU @ 3 GHz with 18 cores and one NVIDIA GeForce RTX 2080 Ti GPU. Figure 3 illustrates the visualized comparison between the employed preprocessing methods and the proposed FD correction function. To theoretically inspect the capability of the FD correction function, two values of ε_b were used to construct the kernel. The corresponding values for the FD1 and FD2 correction functions were ε_b = −1 and ε_b = the negative global standard deviation of intensity of each testing set (denoted −σ), respectively. In other words, the FD2 correction function is the standard FD correction function. To clarify the main difference between the FD1 and FD2 correction functions, their analytical forms are:

n̄_FD1 = 1 / (e^(−(ε − ε_F)) + 1), n̄_FD2 = 1 / (e^(−(ε − ε_F)/σ) + 1).

Compared with the results of the other preprocessing methods, both of the proposed FD-type correction functions can remove the insignificant components of the brain tissue images, especially in the cases of the T2 and T2-FLAIR images. Meanwhile, the FD2 correction function can further enhance the filtered image components, as does the 3D global histogram equalization function. The z-score normalization and the Gamma correction functions exhibit similar preprocessing results.

The performance comparison of all preprocessing scenarios is listed in Table 2. The dice score of WT without any preprocessing is the lowest, as expected, and we therefore used its corresponding computational time as the baseline. The dice scores of all the employed preprocessing methods are similar, with the FD-type correction functions presenting slightly higher scores. Even though the dice score of the z-score normalization function reaches a good level, its computational time is much higher than that of the baseline, possibly because the execution efficiency of this method was not optimized. On the other hand, the Gamma correction function also exhibits good performance in both the dice score and the corresponding computational time; however, it took a large number of experiments and labor intervention to choose the optimized Gamma parameter, which in our experiments was 0.6. Although the dice score of the 3D global histogram equalization function is the lowest among all employed methods, its computational time reveals superior performance.

Table 3 presents a confusion-matrix comparison of accuracy, sensitivity (recall), and precision between the proposed FD-type correction functions and the conventional correction methods. Each indicator is defined as follows [51,52]:

Accuracy = (TP + TN) / (TP + TN + FP + FN),
Recall (Sensitivity) = TP / (TP + FN),
Precision = TP / (TP + FP),

where the factors TP, TN, FP, and FN are the true positives, true negatives, false positives, and false negatives, respectively. The total summation of these four factors is around 812,535 with a small residual error. The total tumor count, defined as TP + FN, is about 21,191.

We then further compared the performance of the 3D global histogram equalization function and the proposed FD-type correction functions, with the tumor labels categorized in detail into WT, TC, and ET; the output of the D-Unet accordingly became three channels. Tables 4 and 5 respectively list the performance comparison and the corresponding confusion-matrix table. We preserved all of the model parameters for a fair comparison but replaced the output layer with a softmax activation function for multi-label detection.
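As a small worked example of the indicators defined above, the snippet below computes accuracy, recall, and precision from confusion-matrix counts. The counts themselves are hypothetical values chosen only so that TP + FN matches the reported total of about 21,191 and the four factors sum to about 812,535; they are not the measured results of Table 3.

```python
def confusion_metrics(tp, tn, fp, fn):
    """Accuracy, recall (sensitivity), and precision from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    recall = tp / (tp + fn)        # sensitivity: fraction of tumor voxels detected
    precision = tp / (tp + fp)     # fraction of predicted tumor voxels that are tumor
    return accuracy, recall, precision


# Hypothetical counts: tp + fn = 21,191 and tp + tn + fp + fn = 812,535.
print(confusion_metrics(tp=19_000, tn=788_000, fp=3_344, fn=2_191))
# -> (~0.993, ~0.897, ~0.850)
```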
We also presented the corresponding dice scores of each label for the training stage and the validation stage. As expected, the dice scores of the validation stage are much lower than those of the training stage. All computational times in this experiment are comparable, but the required time costs all increased because of the multi-label detection. In this experiment, all of the dice scores of the FD-type correction functions are higher than those of the 3D global histogram equalization function. We can deduce the reasons from the visualized results illustrated in Figure 4.

All types of image sets were processed with the 3D global histogram equalization function and the proposed FD-type correction functions, and the preprocessed images are illustrated in Figure 4a. The image inclinations result from the augmentation with affine transformations. The corresponding results of the brain tumor image segmentation are illustrated in Figure 4b. As tumor recognition of the ET type is typically hard, the corresponding dice scores are quite low for all employed preprocessing methods, as listed in Table 3, and the visualized results of the ET type in Figure 4b reflect this fact. The image structures of the ET type are fine and intricate compared with those of TC and WT, which makes ET tumor image recognition difficult. It should also be emphasized that the segmented results of the TC and WT types are similar for all methods employed in the experiments. The FD2 correction function shows better performance in recognizing ET tumor edges than the other two methods. Therefore, the results validate that the FD2 correction function, i.e., the FD correction function with a z-score normalization kernel, exhibits superior performance in both computational time reduction and brain tumor image segmentation compared with the other conventional preprocessing methods.

Discussion and Conclusions
We theoretically reconstructed the Fermi-Dirac distribution function as a correction function for brain tumor image segmentation. The skeleton and the kernel of the proposed Fermi-Dirac correction function are a sigmoid function and a z-score normalization function, respectively. Thus, the Fermi-Dirac correction function inherits the merits of these functions and is suitable for simultaneously normalizing the intensity and filtering out insignificant image components. We also explained the mathematical inconvenience of the Bose-Einstein and Maxwell-Boltzmann distribution functions in image processing.
We then compared the performance of the proposed correction function with other conventional preprocessing methods, namely the standard z-score normalization, the Gamma correction, and the three-dimensional global histogram equalization. We also changed the normalization factor of the kernel to inspect the mathematical properties of the proposed Fermi-Dirac correction functions. The experimental results validated that the proposed correction function achieves a lower computational cost than the conventional and supervised methods. The proposed correction function also suppresses insignificant image components and enhances the filtered image segments. As the experimental results showed that the three-dimensional global histogram equalization offers computational capability similar to that of the proposed correction method, we subjected these methods to a further, more complete comparison. The experimental results validate the superiority of the proposed Fermi-Dirac correction function.
In the further performance comparison, the proposed Fermi-Dirac correction function exhibited superior computational and recognition capability for brain tumor image segmentation with the multimodal label types of ET, TC, and WT. Therefore, the proposed Fermi-Dirac correction function can not only reduce the computational complexity but also reinforce image component recognition and segmentation.

Data Availability Statement: Publicly available datasets were analyzed in this study. These data can be found here: https://www.med.upenn.edu/cbica/brats2019/data.html.