Narrow Band Active Contour Attention Model for Medical Segmentation

Medical image segmentation is one of the most challenging tasks in medical image analysis and has been widely developed for many clinical applications. While deep learning-based approaches have achieved impressive performance in semantic segmentation, they are limited to pixel-wise settings and struggle with imbalanced-class data and weak-boundary object segmentation in medical images. In this paper, we tackle these limitations by developing a new two-branch deep network architecture that takes both higher level features and lower level features into account. The first branch extracts higher level features as region information using a common encoder-decoder network structure such as Unet or FCN, whereas the second branch focuses on lower level features as support information around the boundary and operates in parallel to the first branch. Our key contribution is the second branch, named the Narrow Band Active Contour (NB-AC) attention model, which treats the object contour as a hyperplane and all data inside a narrow band as support information that influences the position and orientation of the hyperplane. Our proposed NB-AC attention model combines the contour length with a region energy defined over a fixed-width band around the curve or surface. The proposed network loss contains two fitting terms: (i) a higher level feature (i.e., region) fitting term from the first branch; and (ii) a lower level feature (i.e., contour) fitting term from the second branch, which includes (ii1) the length of the object contour and (ii2) a regional energy functional formed by the homogeneity criterion of both the inner band and the outer band neighboring the evolving curve or surface. The proposed NB-AC loss can be incorporated into both 2D and 3D deep network architectures. The proposed network has been evaluated on several challenging medical image datasets, including DRIVE, iSeg17, MRBrainS18 and Brats18.
The experimental results show that the proposed NB-AC loss outperforms other mainstream loss functions (Cross Entropy, Dice and Focal) on two common segmentation frameworks, Unet and FCN. Our 3D network, built upon the proposed NB-AC loss and the 3DUnet framework, achieved state-of-the-art results on multiple volumetric datasets.


Introduction
Medical image segmentation has been widely studied and developed for refinement of clinical analysis and application [1][2][3][4][5][6]. Most deep learning (DL)-based segmentation networks make use of common loss functions, e.g., Cross-Entropy (CE), Dice [6], and the more recent Focal [7] loss. These losses are based on summations over the segmentation regions and are restricted to pixel-wise settings. Beyond their pixel-wise sensitivity, these losses are unfavorable to small structures, do not take geometrical information into account, and are limited when facing imbalanced-class data and weak-boundary objects. Furthermore, these losses operate on higher level features (region information), and none of them is intentionally designed for lower level features such as edges/boundaries, which play an important role in medical imaging.
Our observations on medical images are as follows: (i) Boundary information plays a significant role in many medical analysis tasks, such as shape-based cancer analysis and size-based volume measurement. (ii) Medical images contain weak boundaries, which make segmentation much more challenging due to low intensity contrast between tissues and intensity inhomogeneity. For example, during the myelination and maturation process of the infant brain, the intensity distributions of gray matter (GM) and white matter (WM) overlap considerably; thus, the boundary between GM and WM is very weak, which makes segmentation difficult. (iii) Imbalanced-class data occur naturally in medical image segmentation. These two challenges, imbalanced-class data and weak-boundary objects, are visualized in Figure 1 and demonstrated in Figure 2. Figure 2 illustrates the imbalanced-class problem in medical images through the statistical class distribution of four different datasets; in each dataset, the number of samples varies considerably across classes.
Over the past few years, many efforts [1,8,9] have been made to segment medical objects under multiple challenges, such as weak-boundary objects, small objects, imbalanced data and limited annotated data. Among these approaches, active contour (AC) methods are powerful tools thanks to their ability to adapt their geometry and incorporate prior knowledge about the structure of interest. Level Set (LS) [10], an implementation of AC using energy functional minimization [11], has been shown to overcome the limitations of purely gradient-based models, especially on data suffering from noise and lack of contrast, such as weak-boundary objects. Besides weak-boundary objects, the imbalanced-data problem in medical image segmentation has lately received serious attention [9,12], as has the problem of small object detection/segmentation [7].
In [12], a boundary loss was proposed as a distance metric on the space of contours (or shapes) rather than regions; that is, the objective function is defined as a distance between two contours. Furthermore, the boundary loss [12] is computed from distances between individual pixels on the contour, which is time-consuming, especially when applied to volumetric data. Different from the boundary loss [12], which measures the distance between the predicted boundary and the ground-truth one, our proposed NB-AC loss treats the object contour as a hyperplane, and all data inside a narrow band serve as support information that influences the position and orientation of the hyperplane. Our NB-AC loss uses an attention mechanism that focuses on the contour length together with the region energy over a fixed-width band around the curve or surface. Unlike [3], which works in the 2D domain and applies its energy loss over the entire spatial domain, our energy loss is applied on the narrow band around the boundary, and our NB-AC works in both 2D and 3D domains. Unlike LAC-DA [1], which employs a discriminant analysis classifier to split PET tissues into three categories (background, lesion and border-line regions) and processes a PET scan slice by slice in the 2D domain, our approach is a 2D/3D Unet-like model with an energy loss function that takes the inter-slice information of the volumetric data into consideration. Unlike other 2D AC approaches, e.g., [5], which utilizes density-oriented BIRCH, Ref. [4], which uses AC evolution based on a fuzzy clustering algorithm, and Ref. [6], which employs kernel fuzzy C-means to improve AC performance in medical image segmentation, our proposed NB-AC handles both 2D still images and 3D volumetric data. Recently, Ref. [1] utilized LS [10] in a deep learning framework to improve segmentation performance on medical images.
However, the two energy terms corresponding to the inside energy and the outside energy are computed under the assumption that the mean values inside and outside the contour are constants set to 1 and 0, respectively. Furthermore, Ref. [1] applied LS [10] over the entire image domain. Different from [1], our proposed network makes use of LS as an attention gate on a narrow band around the contour. In addition, the mean values inside and outside the contour in our framework are computed from the deep feature map of the network.
To address the above problems, we make use of the advantages of LS [10] and propose a two-branch deep network that explicitly takes into account both higher level features, i.e., an object region in the first branch, and lower level features, i.e., a contour (object shape) and the narrow band around the contour in the second branch. The first branch is designed as a classical CNN, i.e., an encoder-decoder network structure, whereas the second branch is built as a narrow band active contour (NB-AC) attention model that operates in parallel to the first branch. The proposed loss for our NB-AC attention model contains two fitting terms: (i) the length of the contour; and (ii) the narrow band energy formed by the homogeneity criterion in both the inner band and the outer band neighboring the evolving curve or surface, as illustrated in Figure 4. The higher level features from the first branch are connected to the lower level features in the second branch through our proposed transitional gates, and both branches are trained in an end-to-end architecture. Thus, our loss function not only pays attention to region information but also focuses on support information on both sides of the boundary within a narrow band. In the proposed network, we consider the object contour as a hyperplane, whereas the information in the inner and outer bands plays the role of a supporter that influences the position and direction of the hyperplane. The key features of our architecture are summarized as follows:
• Tackle the weak boundary object segmentation problem: the proposed NB-AC attention model is designed as an edge extractor and makes use of the narrow band principle, which has proven its efficiency in the evolution of level sets [13]. Furthermore, the proposed NB-AC loss is defined under an active contour energy minimization [10], which has been proven to be useful for weak object segmentation.
• Address the imbalanced-class data problem: instead of taking into account all pixels in an image domain and assigning a label to every single pixel, the NB-AC attention model focuses on a subset of supportive pixels located within the narrow band defined by the inner band and the outer band. By ignoring all pixels outside the narrow band, the proposed NB-AC attention model acts as an under-sampling approach to the imbalanced-class data problem. Among under-sampling solutions, which remove samples from the majority class to compensate for the imbalanced distribution between classes, our proposed NB-AC attention model helps answer an important question: "which samples should be removed/kept?".
• Propose a new type of transitional gate that allows the higher level features to interact with the lower level features in an end-to-end framework.
To the best of our knowledge, this is one of the first works that takes both the imbalanced-class data problem and weak-boundary object segmentation into account, by not only integrating the length of the boundary but also minimizing the energy of the inner and outer bands around the curve or surface. We perform the evaluation with both 2D and 3D networks on various challenging medical datasets: DRIVE [14] (retinal vessel segmentation), iSeg [15] (infant brain segmentation), MRBrainS [16] (adult brain segmentation) and Brats [17] (brain tumor segmentation).

Active Contour (AC)
Active Contour (AC), or Deformable Models, based on variational models and partial differential equations (PDEs), can be considered as one of the most widely used approaches in medical image segmentation. There are two main approaches in AC: snakes and Level Set (LS). Snakes explicitly move predefined snake points based on an energy minimization scheme, while LS approaches move contours implicitly as a particular level of a function.
Among the many AC-based approaches for image segmentation proposed in the last few decades, LS methods [2,10,18,19] have demonstrated promising performance under various conditions, e.g., resolution, illumination, shape, noise and occlusions. LS-based or implicit AC models provide more flexibility and convenience for the implementation of AC; thus, they have been used in a variety of image processing and computer vision tasks. The basic idea of the implicit AC is to represent the initial curve C implicitly within a higher dimensional function, called the level set function Φ(x, y) : Ω → R, where Ω denotes the entire image plane. AC is widely applied in image segmentation due to its ability to handle topological changes automatically. In the AC framework with LS implementation, the contour evolution is equivalent to the evolution of the LS function, and the boundary C is represented by the zero LS Φ = 0, with the inside of C where Φ > 0 and the outside where Φ < 0:
$$C = \{(x, y) \in \Omega : \Phi(x, y) = 0\}, \quad \mathrm{inside}(C) = \{(x, y) \in \Omega : \Phi(x, y) > 0\}, \quad \mathrm{outside}(C) = \{(x, y) \in \Omega : \Phi(x, y) < 0\}$$
One of the most popular region-based AC models was proposed by Chan-Vese (CV) [10]. The CV model segments an image into two regions, each with a distinct mean pixel intensity, by minimizing an energy functional. It starts with an initial level set Φ_0 and a given image I, and the updating process is performed via gradient descent on an energy function defined on the difference of image features, such as color and texture, between foreground and background. The fitting (energy) terms of the CV model are the inside contour energy E_1, the outside contour energy E_2, the length of the contour Length(C) and the area inside the contour Area(C), as in Equation (2):
$$E_{CV}(c_1, c_2, C) = \mu\,\mathrm{Length}(C) + \nu\,\mathrm{Area}(C) + \lambda_1 \underbrace{\int_{\mathrm{inside}(C)} |I(x, y) - c_1|^2\,dx\,dy}_{E_1} + \lambda_2 \underbrace{\int_{\mathrm{outside}(C)} |I(x, y) - c_2|^2\,dx\,dy}_{E_2} \qquad (2)$$
The energy terms E_1 and E_2 search for uniformity of a desired feature within each subset, whereas Length(C) and Area(C) are regularization terms; μ, ν ≥ 0 and λ_1, λ_2 > 0 are weighting parameters. Here, c_1 and c_2 are the average intensities inside and outside the contour C, respectively.
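As a minimal sketch of how Equation (2) behaves on a discrete image (plain Python, assuming a binary mask `inside` that marks the pixels where Φ > 0), the snippet below computes c_1, c_2, E_1, E_2 and Area(C), and approximates Length(C) by counting 4-neighbour pixel pairs with different labels. The function names and default weights (μ = 1, ν = 0, λ_1 = λ_2 = 1) are illustrative choices, not values from [10].

```python
def region_means(image, inside):
    """Return (c1, c2): mean intensity inside (Phi > 0) and outside the contour."""
    ins = [v for irow, mrow in zip(image, inside) for v, m in zip(irow, mrow) if m]
    out = [v for irow, mrow in zip(image, inside) for v, m in zip(irow, mrow) if not m]
    c1 = sum(ins) / len(ins) if ins else 0.0
    c2 = sum(out) / len(out) if out else 0.0
    return c1, c2


def discrete_length(inside):
    """Approximate Length(C) by counting 4-neighbour pairs with different labels."""
    h, w = len(inside), len(inside[0])
    count = 0
    for y in range(h):
        for x in range(w):
            if x + 1 < w and inside[y][x] != inside[y][x + 1]:
                count += 1
            if y + 1 < h and inside[y][x] != inside[y + 1][x]:
                count += 1
    return count


def chan_vese_energy(image, inside, mu=1.0, nu=0.0, lam1=1.0, lam2=1.0):
    """Discrete Equation (2): mu*Length + nu*Area + lam1*E1 + lam2*E2."""
    c1, c2 = region_means(image, inside)
    e1 = e2 = 0.0
    area = 0
    for irow, mrow in zip(image, inside):
        for v, m in zip(irow, mrow):
            if m:
                e1 += (v - c1) ** 2  # inside energy E1
                area += 1
            else:
                e2 += (v - c2) ** 2  # outside energy E2
    return mu * discrete_length(inside) + nu * area + lam1 * e1 + lam2 * e2


img = [[0, 0, 0, 0],
       [0, 1, 1, 0],
       [0, 1, 1, 0],
       [0, 0, 0, 0]]
good = [[v == 1 for v in row] for row in img]                    # contour on the true boundary
bad = [[(y, x) == (1, 1) for x in range(4)] for y in range(4)]  # contour around one pixel only
print(chan_vese_energy(img, good, mu=0.1))  # 0.8: only the length term remains
print(chan_vese_energy(img, bad, mu=0.1))   # ~2.8: E2 > 0 because bright pixels fall outside
```

A contour on the true boundary leaves only the length term, whereas a contour that misses part of the object pays a fitting penalty in E_2.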

Class Imbalance
Class imbalance has been studied thoroughly over previous decades using either traditional (non-DL) machine learning models or advanced DL techniques. Anand et al. [20] were the first to explore the effects of class imbalance on backpropagation in a shallow neural network. The authors showed that, with imbalanced data, the majority class usually dominates the network gradient: the error of the majority class is quickly reduced while the error of the minority class increases. Previous DL works addressing class imbalance can be divided into three groups: (i) data-level, (ii) algorithm-level and (iii) hybrid-level methods. Data-level methods alter the training data distribution by either adding samples to the minority class or removing samples from the majority class to compensate for the imbalanced distribution between the classes. There are three approaches in this category: (i) under-sampling examples from the majority class [21]; (ii) over-sampling examples from the minority class [22]; and (iii) dynamic sampling [23]. In the context of deep feature representation learning, data-level methods may either (i) introduce large amounts of duplicated samples when performing over-sampling, which slows down training and risks over-fitting, or (ii) discard valuable examples that are important for discrimination when performing under-sampling. Because of these disadvantages, algorithm-level methods instead modify the deep learning algorithm itself, e.g., by designing a better class-balanced loss. There are two main groups of DL-based algorithm-level methods: (i) the first group proposes loss functions that reduce the influence of imbalanced data, such as mean false error (MFE) and mean squared false error (MSFE) [24], focal loss [7] and rectification loss [25]; (ii) the second group focuses on cost sensitivity, with methods including cost-sensitive deep learning (CoSen CNN) [26], cost-sensitive deep belief network with differential evolution (CSDBN-DE) [27] and long-tail loss [28]. To learn more discriminative deep representations of imbalanced image data, Ref. [29] proposed a hybrid method named Large Margin Local Embedding (LMLE), which takes advantage of both data-level and algorithm-level methods. However, this method has a number of fundamental drawbacks, including disjoint feature learning, quintuplet construction updates and classification optimization. Later, Ref. [30] introduced Deep Over-Sampling (DOS), which incorporates over-sampling into the deep feature space produced by DL. Our proposed loss belongs to the second group, DL-based algorithm-level methods.
The existing works on the imbalanced-class data problem are summarized in the diagram shown in Figure 5.

Loss Function
To train a Deep Neural Network (DNN), the loss function, also known as the cost function, plays a significant role. The loss function measures the average (expected) divergence between the output of the network (P) and the actual function (T) being approximated over the entire domain of the input, sized m × n. Let i index the pixels of the image, whose spatial domain contains N = m × n pixels, and let c denote the label of each of the C classes. Herein, we briefly review some common loss functions.
Cross Entropy (CE) Loss: proposed by [33], CE is a widely used pixel-wise distance for evaluating classification and segmentation models. In the CE loss function, the output of the softmax layer (P) is evaluated against the ground truth (T). For binary segmentation, the CE loss takes the Binary CE form:
$$L_{CE} = -\frac{1}{N}\sum_{i=1}^{N}\left[T_i \log P_i + (1 - T_i)\log(1 - P_i)\right]$$
The standard CE loss has well-known drawbacks for highly unbalanced problems. It performs well on large training sets with balanced classes; however, on unbalanced data, it typically leads to unstable training and to decision boundaries biased towards the majority classes. To deal with the imbalanced-data problem, two variants of the standard CE loss, the Weighted CE (WCE) loss and the Balanced CE (BCE) loss, assign weights to the different classes.
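The following plain-Python sketch shows the Binary CE above and one common way to write a Weighted CE. Using a single weight per class (`w_pos`, `w_neg`) is an assumption, and the clamping constant `eps` is added only to avoid log(0).

```python
import math


def binary_cross_entropy(p, t, eps=1e-7):
    """Mean Binary CE over N pixels; p = foreground probabilities, t = labels in {0, 1}."""
    total = 0.0
    for pi, ti in zip(p, t):
        pi = min(max(pi, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= ti * math.log(pi) + (1 - ti) * math.log(1 - pi)
    return total / len(p)


def weighted_binary_cross_entropy(p, t, w_pos, w_neg=1.0, eps=1e-7):
    """Weighted CE sketch: foreground terms scaled by w_pos, background terms by w_neg."""
    total = 0.0
    for pi, ti in zip(p, t):
        pi = min(max(pi, eps), 1.0 - eps)
        total -= w_pos * ti * math.log(pi) + w_neg * (1 - ti) * math.log(1 - pi)
    return total / len(p)


# One foreground pixel among ten; the prediction is "mostly background" everywhere.
t = [1] + [0] * 9
p = [0.1] * 10
print(binary_cross_entropy(p, t))
print(weighted_binary_cross_entropy(p, t, w_pos=9.0))  # the missed foreground pixel now dominates
```

With one foreground pixel among ten, predicting background everywhere still yields a small unweighted CE; up-weighting the foreground makes that missed pixel dominate the loss.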
Dice loss: proposed by [6], it measures the degree of overlap between the reference and the segmentation. Dice loss derives from the Dice score, which is used to evaluate segmentation performance. In its common binary form, it is defined as:
$$L_{Dice} = 1 - \frac{2\sum_{i=1}^{N} P_i T_i}{\sum_{i=1}^{N} P_i + \sum_{i=1}^{N} T_i}$$
Although Dice loss improves over CE loss and has been successful in image segmentation, it is still a pixel-wise loss and shares similar limitations. In particular, Dice loss may struggle with very small structures [34] and weak object boundaries, as misclassifying a few pixels can lead to a large decrease of the coefficient.
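A minimal soft-Dice sketch (assuming probabilities `p` and binary labels `t` as flat lists, plus a small smoothing constant `eps` that is our addition) illustrates the sensitivity to small structures mentioned above: a single missed pixel costs far more on a 2-pixel object than on a 100-pixel one.

```python
def soft_dice_loss(p, t, eps=1e-7):
    """1 - 2*sum(p*t) / (sum(p) + sum(t)); p: probabilities, t: binary labels."""
    inter = sum(pi * ti for pi, ti in zip(p, t))
    return 1.0 - (2.0 * inter + eps) / (sum(p) + sum(t) + eps)


# One missed pixel on a 2-pixel structure vs. on a 100-pixel structure.
small_t = [1, 1] + [0] * 98
small_p = [1, 0] + [0] * 98
large_t = [1] * 100
large_p = [1] * 99 + [0]
print(soft_dice_loss(small_p, small_t))  # about 0.333
print(soft_dice_loss(large_p, large_t))  # about 0.005
```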
Focal Loss: proposed by [7], Focal loss is a modified version of CE loss that balances easy and hard samples:
$$L_{FL} = -\frac{1}{N}\sum_{i=1}^{N}(1 - p_{t,i})^{\gamma}\log p_{t,i}, \qquad p_{t,i} = \begin{cases} P_i & \text{if } T_i = 1 \\ 1 - P_i & \text{otherwise} \end{cases}$$
In Focal loss, the loss for confidently and correctly classified labels is scaled down, so that the network focuses more on incorrect and low-confidence labels than on increasing its confidence in already correct ones. The loss focuses more on less accurate labels than the logarithmic loss whenever γ > 0; with γ = 0, it reduces to CE.
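A sketch of the binary focal loss following the p_t notation above. The optional class weight `alpha` comes from the α-balanced variant in [7] and is off by default here; `eps` is an added clamping constant. Setting γ = 0 recovers CE, while γ = 2 shrinks the loss of an easy example (p_t = 0.9) much more than that of a hard one (p_t = 0.3).

```python
import math


def binary_focal_loss(p, t, gamma=2.0, alpha=None, eps=1e-7):
    """Mean of -a_t * (1 - p_t)**gamma * log(p_t) over pixels."""
    total = 0.0
    for pi, ti in zip(p, t):
        pi = min(max(pi, eps), 1.0 - eps)
        p_t = pi if ti == 1 else 1.0 - pi  # probability assigned to the true class
        a_t = 1.0 if alpha is None else (alpha if ti == 1 else 1.0 - alpha)
        total -= a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(p)


easy, hard = [0.9], [0.3]  # predicted foreground probabilities for a foreground pixel
for g in (0.0, 2.0):
    print(g, binary_focal_loss(easy, [1], gamma=g), binary_focal_loss(hard, [1], gamma=g))
```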
Offset Loss: recently, Le et al. [9] proposed the Offset Curve (OsC) loss to address weak-boundary object segmentation. The OsC loss network takes into account the higher feature level, i.e., the region inside the contour; the intermediate feature level, i.e., offset curves around the contour; and the lower feature level, i.e., the contour itself. The OsC loss consists of three main fitting terms, L_1, L_2 and L_3, corresponding to the higher feature loss, the intermediate feature loss and the lower feature loss, respectively. The first term focuses on pixel-wise segmentation, the second acts as an attention model that pays attention to the area around the boundaries (the offset curves), and the third plays the role of a regularization term that takes the length of the boundaries into account.
In these terms, T_{o,c} is a binary indicator (0 or 1) of whether class label c is the correct classification for observation o, and φ is obtained by applying the signed distance function (SDF) [13] to P. The proposed two-branch network improves upon our previous work on the OsC loss [9].

Our Proposed Two-Branch Network
Our proposed network contains two branches. The first branch focuses on higher level feature representation (i.e., region), whereas the second branch targets lower level feature representation (i.e., contour). The first branch is built upon region information, whereas the second branch is built upon the narrow band energy and the length of the contour. The entire network architecture is shown in Figure 6.

Higher Level Feature Branch
The first branch of the network is a standard segmentation CNN, which can be any encoder-decoder network such as Unet [3] or FCN [35]. Unet [3] has been widely used as an end-to-end encoder-decoder framework for semantic segmentation with high-precision results. One of its most important building blocks is the skip connection, which forwards feature maps from the down-sampling path to the up-sampling path in order to localize high-resolution features. Fully convolutional networks (FCN) [35] also consist of two paths: a down-sampling path and an up-sampling path. The down-sampling path increases the receptive field via convolution and pooling layers. In the up-sampling path, the intermediate features are up-sampled to the input resolution by bilinear operators. Both Unet and FCN are chosen as network backbones in our experiments. More formally, for a region segmentation of K classes, the first branch outputs a categorical distribution, and the loss is the categorical cross-entropy:
$$L_{region} = -\sum_{o}\sum_{c=1}^{K} y_{o,c}\,\log p_{o,c}$$
where y_{o,c} is a binary indicator (0 or 1) of whether class label c is the correct classification for observation o, and p_{o,c} is the predicted probability that observation o belongs to class c.

Transitional Gate
In semantic segmentation, the object region and the object contour are closely related; thus, we present a transitional gate that transfers information from the first branch to the second branch. The transitional gate acts as a filter that extracts lower level features and removes irrelevant information from higher level features. Let us denote the output feature representation of the first branch as F_H. The outputs of the NB-AC attention model in the second branch are denoted F_L^C and F_L^N, corresponding to the contour feature map and the narrow-band feature map, respectively.
The contour feature map F_L^C is obtained by applying the edge extraction operator χ to the higher level feature map F_H, and the narrow-band feature map F_L^N is obtained by applying the parallel curves operator ζ to F_L^C. In our experiments, χ and ζ are chosen as the gradient operator and the dilation operator, respectively. Our NB-AC loss can be flexibly incorporated into both 2D and 3D frameworks. In 2D frameworks, the gradient operator χ is a 3 × 3 convolutional layer, and the dilation operator ζ uses a B × B window, where B is the width of the narrow band. In 3D frameworks, χ is a 3 × 3 × 3 convolutional layer, and ζ uses a B × B × B window.
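To make the two operators concrete, here is a plain-Python sketch on a binary mask. It is a stand-in rather than the network layers: χ is approximated by marking pixels that have a differently labelled 4-neighbour (instead of a learned 3 × 3 convolution), and ζ is a B × B binary dilation assuming an odd band width B. The helper names are hypothetical.

```python
def edge_map(mask):
    """chi (stand-in): 1 where a pixel has a 4-neighbour with a different label."""
    h, w = len(mask), len(mask[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] != mask[y][x]:
                    edges[y][x] = 1
                    break
    return edges


def dilate(binary, band_width):
    """zeta (stand-in): band_width x band_width binary dilation, odd band_width assumed."""
    r = band_width // 2
    h, w = len(binary), len(binary[0])
    return [[int(any(binary[yy][xx]
                     for yy in range(max(0, y - r), min(h, y + r + 1))
                     for xx in range(max(0, x - r), min(w, x + r + 1))))
             for x in range(w)]
            for y in range(h)]


def narrow_band(mask, band_width):
    """F = zeta(chi(mask)): the narrow band around the object boundary."""
    return dilate(edge_map(mask), band_width)


mask = [[int(2 <= y <= 4 and 2 <= x <= 4) for x in range(7)] for y in range(7)]
print(sum(map(sum, edge_map(mask))))        # 20 pixels on both sides of the contour
print(sum(map(sum, narrow_band(mask, 3))))  # 45: the 3 x 3 dilation widens the band
```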

Lower Level Feature Branch
Our proposed NB-AC attention model in the second branch is motivated by the minimization problem of the CV model [10] (Section 2.1). The CV model efficiently finds a boundary (object contour) by automatically partitioning an image into two regions based on the global minimization of an active contour energy. The level set function Φ splits the image domain Ω into an inner region Ω_I = {Φ > 0}, an outer region Ω_O = {Φ < 0} and the contour {Φ = 0}. However, the CV model makes strong assumptions on the intensity distributions and the homogeneity criterion, which are usually expressed over the regions inside and outside of the contour. Instead of dealing with the entire domain Ω defined by the evolving curve, we only consider the narrow band B_in ∪ B_out ∪ C, which is formed by the inner band domain B_in and the outer band domain B_out on the two sides of the curve C, together with the curve C itself (C is represented by Φ = 0), as depicted in Figure 7a. The NB-AC loss of the second branch is defined in Equation (13), in which the first term defines the smoothness, equivalent to the length of the contour; the second term defines the inner band energy; and the last term defines the outer band energy. Here, p is the predicted feature map. By applying the transitional gate (Section 3.2), Equation (13) can be rewritten over the domain Ω, where b_in and b_out are the intensity descriptors of B_in and B_out, respectively:
$$b_{in} = \frac{\int_{\Omega} p(x, y)\,F^{y}_{\zeta\chi}(x, y)\,dx\,dy}{\int_{\Omega} F^{y}_{\zeta\chi}(x, y)\,dx\,dy}, \qquad b_{out} = \frac{\int_{\Omega} p(x, y)\,\bigl(1 - F^{y}_{\zeta\chi}(x, y)\bigr)\,dx\,dy}{\int_{\Omega} \bigl(1 - F^{y}_{\zeta\chi}(x, y)\bigr)\,dx\,dy} \qquad (15)$$
Here, F^y_{ζχ} is the narrow band of the ground truth y, computed by first applying the gradient operator χ to extract the gradient and then applying the dilation operator ζ to obtain the narrow band, namely, F^y_{ζχ} = ζ(χ(y)). Our proposed NB-AC loss gains flexibility from the narrow band principle, which does not impose a strict homogeneity condition.
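In discrete form, the integrals in Equation (15) become sums over pixels. The sketch below assumes the band map F = ζ(χ(y)) is already available as a binary array (computed, e.g., by a gradient followed by a dilation) and adds a small `eps` to guard against empty regions. `band_descriptors` is a hypothetical helper name.

```python
def band_descriptors(p, band, eps=1e-8):
    """Discrete Equation (15): (b_in, b_out) from prediction p and band map F = zeta(chi(y))."""
    num_in = den_in = num_out = den_out = 0.0
    for prow, frow in zip(p, band):
        for pv, fv in zip(prow, frow):
            num_in += pv * fv          # p weighted by the band indicator F
            den_in += fv
            num_out += pv * (1 - fv)   # p weighted by the complement (1 - F)
            den_out += 1 - fv
    return num_in / (den_in + eps), num_out / (den_out + eps)


p = [[0.2, 0.8],
     [0.6, 0.0]]
band = [[1, 1],
        [0, 0]]
print(band_descriptors(p, band))  # approximately (0.5, 0.3)
```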
The theory of our proposed NB-AC attention model comes from parallel curves, also known as "offset curves" [36,37]. As shown in Figure 7b, the curve C_{B1} or C_{B2} (C_B in general) is called a parallel curve of C if its position vector I_B satisfies:
$$C : \Omega \to \mathbb{R}^2, \quad z \mapsto I(z) = [x(z), y(z)], \qquad I_B(z) = I(z) + B\,n(z) \qquad (16)$$
where x and y are continuously differentiable with respect to the parameter z, Ω = [0, 1], B is the amount of translation, and n is the inward unit normal of C. An important property resulting from Equation (16) is that the velocity vector of a parallel curve depends on the curvature of C: the velocity vector of C_B is a function of the velocity vector of C, its curvature κ and its normal. Setting n′(z) = −κ(z) I′(z), we have:
$$I_B'(z) = I'(z) + B\,n'(z) = \bigl(1 - B\,\kappa(z)\bigr)\,I'(z) \qquad (17)$$
Applying Equation (17) to the curves in Figure 7a, we obtain the length element (i.e., the speed) of the outer parallel curve C_{+B}, l_{+B} = ||I′(z) + B n′(z)||, and the length element of the inner parallel curve C_{−B}, l_{−B} = ||I′(z) − B n′(z)||. Based on this offset curve theory, the inner band B_in and the outer band B_out (Figure 7a) are bounded by the parallel curves C_{−B} and C_{+B}.
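The length relation behind Equation (17) can be checked numerically on a circle of radius r, whose curvature is κ = 1/r. This sketch samples I(z) and the parallel curve I(z) + B n(z) with the inward unit normal n and compares polygon lengths; it assumes |B| < r so that the parallel curve does not degenerate. Positive B moves along n (inward), negative B outward.

```python
import math


def closed_length(points):
    """Perimeter of the closed polygon through the (x, y) points."""
    n = len(points)
    return sum(math.dist(points[i], points[(i + 1) % n]) for i in range(n))


def circle_parallel_length(radius, b, samples=4000):
    """Length of I_B(z) = I(z) + b * n(z) for a circle I, with n the inward unit normal."""
    pts = []
    for k in range(samples):
        z = 2.0 * math.pi * k / samples
        cx, cy = math.cos(z), math.sin(z)
        x, y = radius * cx, radius * cy   # I(z)
        nx, ny = -cx, -cy                 # inward unit normal n(z)
        pts.append((x + b * nx, y + b * ny))
    return closed_length(pts)


r, B = 10.0, 2.0
base = circle_parallel_length(r, 0.0)
print(circle_parallel_length(r, B) / base)   # ~0.8 = 1 - B * kappa with kappa = 1/r
print(circle_parallel_length(r, -B) / base)  # ~1.2 = 1 + B * kappa
```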
In our proposed network architecture, the second branch only focuses on the information around the contour and on the contour itself, i.e., B_in ∪ B_out ∪ C as in Figure 7a. This addresses not only the weak-boundary object segmentation problem but also the imbalanced-data problem. In image segmentation, each pixel is considered as a data sample that needs to be classified. The second branch can be seen as an under-sampling approach: all data samples inside C_{−B} and outside C_{+B} (i.e., not in the narrow band) are ignored, and only the data samples within the narrow band B_in ∪ B_out ∪ C are kept for prediction. One can think of the contour C as a hyperplane and of the data samples inside the narrow band as support vectors that influence the position and orientation of the hyperplane.
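The under-sampling effect can be illustrated by counting classes before and after restricting attention to the band. For brevity, the band here is approximated as the pixels whose B × B window contains both labels (a simplification of ζ(χ(y)), not the exact operator used in the network), and the helper names are hypothetical.

```python
def mixed_window_band(mask, band_width):
    """Pixels whose band_width x band_width window (odd width assumed) contains both labels."""
    r = band_width // 2
    h, w = len(mask), len(mask[0])
    band = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            labels = {mask[yy][xx]
                      for yy in range(max(0, y - r), min(h, y + r + 1))
                      for xx in range(max(0, x - r), min(w, x + r + 1))}
            band[y][x] = int(len(labels) == 2)
    return band


def class_counts(mask, keep=None):
    """(foreground, background) counts, optionally restricted to pixels where keep == 1."""
    fg = bg = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if keep is not None and not keep[y][x]:
                continue  # sample discarded by the under-sampling
            if v:
                fg += 1
            else:
                bg += 1
    return fg, bg


# 32 x 32 image with an 8 x 8 object (rows and columns 12..19).
mask = [[int(12 <= y < 20 and 12 <= x < 20) for x in range(32)] for y in range(32)]
print(class_counts(mask))                              # (64, 960): about 1:15
print(class_counts(mask, mixed_window_band(mask, 3)))  # (28, 36): close to balanced
```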

Network Architecture
The architecture of our proposed two-branch network is illustrated in Figure 6, where we choose the Unet framework for demonstration. The first branch is designed as a standard encoder-decoder segmentation network. The second branch is composed of residual blocks interleaved with transitional gates (Section 3.2), which ensure that the second branch only processes boundary-relevant information (edge and narrow band). Our proposed network is designed as an end-to-end framework. The losses from both branches are combined as a weighted sum of the first-branch (region) loss and the second-branch (NB-AC) loss, where the hyper-parameters λ_1 and λ_2 control the weighting between the two losses and are set to λ_1 = λ_2 = 0.5 in our experiments.
In this work, we use the 2D Unet [3] and 2D FCN [35] architectures as our base segmentation frameworks to evaluate the proposed NB-AC loss on 2D inputs. Furthermore, we use 3D Unet [4] to evaluate the proposed NB-AC loss on 3D inputs. In Unet, feature maps from the down-sampling path are forwarded to the up-sampling path through skip connections. Each layer in the down-sampling path consists of two 3 × 3 convolution layers (3 × 3 × 3 in 3D Unet), one batch normalization (BN) layer, one rectified linear unit (ReLU) and one max pooling layer. In the up-sampling path, bilinear interpolation is used to up-sample the feature maps. In the FCN framework, we choose FCN-32, which produces the segmentation map from conv1, conv3 and conv7 using bilinear interpolation. In the down-sampling path, each FCN layer has the same design as a layer in the 2D Unet.

Experiments and Conclusions
In this section, we evaluate the proposed NB-AC loss with different network architectures, such as Unet [3], FCN [35], 3DUnet [4]. Our performance is compared against other common loss functions, i.e., Dice, CE, Focal on the baseline frameworks Unet [3], FCN [35] and compared against other state-of-the-art networks on 3DUnet [4].

Metrics
Our proposed NB-AC is evaluated on the following common metrics.
Dice Score: the algorithm generates a prediction P, which is the segmentation of a tumor region from a modality, with P ∈ {0, 1} for each of the three tumor regions. The corresponding experts' consensus truth T ∈ {0, 1} is obtained from the ground truth images for each of the regions. The Dice score is calculated as:
$$\mathrm{Dice}(P, T) = \frac{|P_1 \wedge T_1|}{(|P_1| + |T_1|)/2}$$
where ∧ is the logical AND operator, | · | is the size of a set (i.e., the number of voxels belonging to it), and P_1 and T_1 represent the sets of voxels where P = 1 and T = 1, respectively. The Dice score normalizes the number of true positives by the average size of the two segmented areas. It is identical to the F1 score (the harmonic mean of precision and recall) and can be transformed monotonically to the Jaccard score.
Intersection-Over-Union (IoU): one of the most commonly used metrics in semantic segmentation, IoU measures the overlap between two bounding boxes or masks as IoU(P, T) = |P_1 ∧ T_1| / |P_1 ∨ T_1|, where ∨ is the logical OR operator.
Precision and Recall: precision is the ratio of the correctly segmented volume to the total segmented volume. Recall (also referred to as sensitivity) is the ratio of the correctly segmented volume to the ground-truth volume. Precision considers only the volume that has been segmented and therefore does not account for the under-segmented volume; recall, on the other hand, does not account for the over-segmented volume. The Precision and Recall metrics are defined as follows:
$$\mathrm{Precision} = \frac{|P_1 \wedge T_1|}{|P_1|}, \qquad \mathrm{Recall} = \frac{|P_1 \wedge T_1|}{|T_1|}$$
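The four metrics can be computed directly from the set definitions above. In this sketch the masks are flattened binary lists, and returning 0 for an empty denominator is our convention (other conventions, such as reporting 1 when both masks are empty, are also used).

```python
def _ratio(num, den):
    """Safe division; 0.0 when the denominator is empty (a convention, not a standard)."""
    return num / den if den else 0.0


def overlap_metrics(pred, truth):
    """Dice, IoU, precision and recall for flattened binary masks P and T."""
    p1 = {i for i, v in enumerate(pred) if v}   # voxels where P = 1
    t1 = {i for i, v in enumerate(truth) if v}  # voxels where T = 1
    tp = len(p1 & t1)                           # |P1 AND T1|
    return {
        "dice": _ratio(tp, (len(p1) + len(t1)) / 2),
        "iou": _ratio(tp, len(p1 | t1)),
        "precision": _ratio(tp, len(p1)),
        "recall": _ratio(tp, len(t1)),
    }


m = overlap_metrics([1, 1, 1, 0, 0], [0, 1, 1, 1, 0])
print(m)
```

The example also lets one check the identities stated above: Dice equals the F1 score of precision and recall, and Dice = 2·IoU/(1 + IoU), a monotonic transform of the Jaccard (IoU) score.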

Dataset
We use four common medical datasets, including 2D and 3D images, in our experiments:
DRIVE: The Digital Retinal Images for Vessel Extraction (DRIVE) dataset [14] contains 40 color fundus photographs, each of size 565 × 584. The dataset is divided into 20 images for training and validation and 20 images for testing. To reduce overfitting and computational complexity, our model is trained on 19,000 small patches of size 224 × 224, randomly extracted from the 20 training images.
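A minimal sketch of the random patch extraction described for DRIVE. The sampling scheme is an assumption (the text only states that patches are randomly extracted): each patch picks a training image uniformly at random and a top-left corner such that the 224 × 224 crop fits inside the image, reading 565 × 584 as width × height. `sample_patch_coords` is a hypothetical helper.

```python
import random


def sample_patch_coords(num_patches, num_images, height, width, patch, seed=0):
    """Return (image_index, top, left) for random patch x patch crops that fit in the image."""
    rng = random.Random(seed)
    coords = []
    for _ in range(num_patches):
        idx = rng.randrange(num_images)        # which training image
        top = rng.randint(0, height - patch)   # randint includes both end points
        left = rng.randint(0, width - patch)
        coords.append((idx, top, left))
    return coords


coords = sample_patch_coords(19000, 20, height=584, width=565, patch=224)
print(len(coords), coords[0])
```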
iSeg: The iSeg17 dataset [15] consists of 10 subjects with ground-truth labels for training and 13 subjects without ground-truth labels for testing. Each subject includes T1 and T2 images with a size of 144 × 192 × 256 and an image resolution of 1 × 1 × 1 mm³. In iSeg, there are three classes: white matter (WM), gray matter (GM) and cerebrospinal fluid (CSF).
MRBrainS: The MRBrainS18 dataset [16] contains 7 subjects for training and validation and 23 subjects for testing. For each subject, three modalities are available: T1-weighted, T1-weighted inversion recovery and T2-FLAIR, with an image size of 48 × 240 × 240. Each subject was manually segmented into either 3 or 8 classes by the challenge organizers.
Brats: The Brats18 database [17] contains 210 HGG scans and 75 LGG scans. For each scan, four modalities are available, i.e., T1, T1C, T2 and Flair. Each image is registered to a common space, resampled to an isotropic 1 × 1 × 1 mm³ resolution by the organizers, and has a dimension of 240 × 240 × 155. In Brats18, there are three tumor classes: whole tumor (WT), tumor core (TC) and enhancing tumor (ET).

Experiment Setting
On 2D images, to train our NB-AC loss on 2D networks (FCN [35], UNet [3]), we define the input as N × C × H × W, where N is the batch size, C is the number of input modalities and H, W are the height and width of the 2D image. For DRIVE, iSeg17, MRBrainS18 and Brats18, we choose the input as 8 × 1 × 64 × 64, 4 × 2 × 128 × 128, 4 × 3 × 224 × 224 and 4 × 4 × 224 × 224, respectively. We employed the Adam optimizer with a learning rate of 1 × 10⁻² and a weight decay of 1 × 10⁻⁴. On 3D volumes, our 3D architecture is built upon 3D-Unet [4] and the input is defined as N × C × H × W × D, where N is the batch size, C is the number of input modalities and H, W, D are the height, width and depth of the volume patch in the sagittal, coronal and axial planes. For Brats18, MRBrainS18 and iSeg17, we choose the input as 1 × 4 × 128 × 128 × 128, 4 × 3 × 48 × 96 × 96 and 2 × 2 × 64 × 64 × 64, respectively. Our 3D Unet makes use of instance normalization [38] and Leaky ReLU, and is trained until convergence using the Adam optimizer with a learning rate of 2 × 10⁻⁴. We implemented our network using PyTorch 1.3.0. The experiments are conducted using an Intel CPU and an RTX GPU.
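As an illustration of this setting, the following PyTorch sketch shows a 3D convolutional block with instance normalization and Leaky ReLU, together with the two Adam configurations quoted above. The block layout (two 3×3×3 convolutions per stage), the channel width of 16, the Leaky ReLU slope of 0.01 and the class name `ConvBlock3D` are assumptions, since the full 3D-Unet configuration is not given; the 1×1 `nn.Conv2d` only stands in for a 2D FCN/Unet.

```python
import torch
import torch.nn as nn

class ConvBlock3D(nn.Module):
    """Hypothetical 3D-Unet stage: two 3x3x3 convs, each with InstanceNorm and Leaky ReLU."""

    def __init__(self, in_ch: int, out_ch: int, negative_slope: float = 0.01):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch, affine=True),
            nn.LeakyReLU(negative_slope, inplace=True),
            nn.Conv3d(out_ch, out_ch, kernel_size=3, padding=1),
            nn.InstanceNorm3d(out_ch, affine=True),
            nn.LeakyReLU(negative_slope, inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.body(x)

# Brats18 input in the text is N x C x H x W x D = 1 x 4 x 128 x 128 x 128;
# a 32^3 patch is used here so the example runs quickly on a CPU.
block = ConvBlock3D(in_ch=4, out_ch=16)
x = torch.randn(1, 4, 32, 32, 32)
y = block(x)
print(tuple(y.shape))  # (1, 16, 32, 32, 32)

# 3D network: Adam with learning rate 2e-4.
opt3d = torch.optim.Adam(block.parameters(), lr=2e-4)

# 2D networks: Adam with learning rate 1e-2 and weight decay 1e-4 (placeholder model).
head2d = nn.Conv2d(4, 4, kernel_size=1)
opt2d = torch.optim.Adam(head2d.parameters(), lr=1e-2, weight_decay=1e-4)
```

Padding of 1 on each 3×3×3 convolution preserves the spatial size, so only the channel count changes within a stage.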

Results and Comparison
For quantitative assessment of the segmentation, the proposed model is evaluated with several metrics: Dice score (DSC), Intersection over Union (IoU), Sensitivity (Recall) and Precision (Pre).
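A minimal NumPy sketch of DSC and IoU for one class follows, with a helper that averages one-vs-rest DSC over several labels (e.g. WM, GM and CSF in iSeg17). The function names and the `eps` guard are illustrative assumptions. For a 6-pixel prediction that covers a 4-pixel object, DSC = 0.8 and IoU = 2/3, consistent with the identity DSC = 2·IoU / (1 + IoU).

```python
import numpy as np

def dice_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8):
    """Dice score (DSC) and IoU for one class, given binary masks of equal shape."""
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dsc = 2.0 * inter / (pred.sum() + gt.sum() + eps)
    iou = inter / (union + eps)
    return float(dsc), float(iou)

def mean_dice(pred_labels, gt_labels, classes):
    """Average one-vs-rest DSC over the given class labels."""
    return float(np.mean([dice_iou(pred_labels == c, gt_labels == c)[0] for c in classes]))

gt = np.zeros((4, 4), dtype=bool); gt[1:3, 1:3] = True
pred = np.zeros((4, 4), dtype=bool); pred[1:3, 1:4] = True
dsc, iou = dice_iou(pred, gt)
print(round(dsc, 3), round(iou, 3))  # 0.8 0.667
```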
The performance of our proposed NB-AC loss is evaluated on both the FCN [35] and Unet [3] architectures for 2D input and on 3D-Unet [4] for 3D input. Comparisons between our proposed loss and other common loss functions (CE, Dice, Focal) on the challenging DRIVE, MRBrainS18, Brats18 and iSeg17 datasets are given in Tables 1-4.
Table 1. Comparison of our proposed NB-AC loss against other losses CE [33], Dice [6], Focal [7] and OsC [9] on the DRIVE dataset with two network backbones, 2D-FCN [35] and 2D-Unet [3]. The best performance is shown in bold.
Table 3. Comparison of our proposed NB-AC loss against other losses CE [33], Dice [6], Focal [7] and OsC [9] on the BRATS 2018 dataset with two network backbones, 2D-FCN [35] and 2D-Unet [3]. The best performance is shown in bold.
Table 4. Comparison of our proposed NB-AC loss against other losses CE [33], Dice [6], Focal [7] and OsC [9] on the iSeg 2017 dataset with two network backbones, 2D-FCN [35] and 2D-Unet [3]. The best performance is shown in bold.
The proposed NB-AC loss function outperforms the other common losses under both the UNet and FCN frameworks. Taking the DSC gain over the widely used CE loss as an example, our loss improves DSC by 3.19%, 1.39%, 2.08% and 0.44% on DRIVE, MRBrainS18, Brats18 and iSeg17, respectively, with the 2D-Unet framework, and by 4.52%, 0.91%, 1.33% and 0.88% with the FCN framework. Figures 8-11 visualize the comparison between our proposed NB-AC loss and other loss functions, including Dice, Focal (FC) and Cross Entropy (CE), on the Unet framework. These images are randomly selected from the test sets of DRIVE, MRBrainS 2018, BRATS 2018 and iSeg 2017. As shown in Figure 1, medical images often have poor contrast, and the boundaries between objects are unclear and weak.
Take the iSeg dataset as an example: owing to the myelination and maturation process of the infant brain, the boundaries between tissue classes in iSeg are very weak, which makes segmentation difficult. The segmentation results from different loss functions are visualized in Figure 11 (top), with specific differences highlighted in colored boxes. Because infant brain MR images (iSeg17 dataset) have extremely low contrast between tissues, the results obtained with traditional loss functions (such as CE, Dice and Focal loss) contain many topological errors, i.e., large and complex handles or holes, for example on the WM surface in Figure 11 (bottom), which gives an enlarged view of the white matter surface of an infant brain. As indicated by the red arrows, the proposed NB-AC loss function produces fewer topological errors (holes and handles) than the existing loss functions. Beyond the 2D view in Figure 11, the 3D view of the entire white matter surface in Figure 12 likewise shows fewer topological errors with the NB-AC loss, as indicated by the red arrows.

Losses
In Figure 8, a vessel with a weak boundary is highlighted in colored boxes. Within these boxes, the vessel appears with poor contrast in the original image and its ground truth is very thin. Unlike the other loss functions, which fail to capture such structures, the proposed NB-AC loss handles weak object boundaries well. Beyond weak object boundaries, Figures 9 and 10 address imbalanced-class data, showing the middle slice of each image/volume from the MRBrainS 2018 and BRATS 2018 datasets. In each figure, the colored boxes highlight areas corresponding to small classes and weak-boundary objects (especially the object boundaries). Compared against the other loss functions, our NB-AC loss obtains the result closest to the ground truth for both weak-boundary objects and small objects. Overall, with the same network backbone, the proposed NB-AC loss improves segmentation performance over the common segmentation losses; taking CE as an example, NB-AC improved the segmentation accuracy regardless of the backbone network (2D-FCN, 2D-Unet or 3D-Unet).
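The two contour ingredients of the NB-AC loss, a contour-length term and a homogeneity energy over fixed-width inner and outer bands around the evolving curve, can be made concrete with a small hypothetical NumPy sketch in the style of Chan-Vese-type active-contour losses. This is not the NB-AC formulation itself: the band width `r`, the weights `mu` and `lam`, the constants `c_in = 1` and `c_out = 0`, and the 0.5 threshold that defines the current contour are all assumptions. A trainable version would express the same terms in PyTorch so that gradients flow through `u`.

```python
import numpy as np

def dilate(mask: np.ndarray, r: int) -> np.ndarray:
    """Binary dilation with a (2r+1) x (2r+1) square, built from shifted copies."""
    h, w = mask.shape
    padded = np.pad(mask, r)  # pads the boolean mask with False
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= padded[dy:dy + h, dx:dx + w]
    return out

def erode(mask: np.ndarray, r: int) -> np.ndarray:
    """Binary erosion as the complement of the dilated complement."""
    return ~dilate(~mask, r)

def narrow_band_energy(u, v, r=2, mu=1.0, lam=1.0, eps=1e-8):
    """Hypothetical length + narrow-band region energy for one 2D slice.

    u: soft foreground prediction in [0, 1]; v: binary ground truth (float).
    Returns (total, length_term, region_term).
    """
    # Length term: total variation of u via forward finite differences.
    d_row = u[1:, :-1] - u[:-1, :-1]
    d_col = u[:-1, 1:] - u[:-1, :-1]
    length = float(np.sqrt(d_row ** 2 + d_col ** 2 + eps).sum())

    # Fixed-width bands of r pixels on each side of the thresholded contour.
    inside = u > 0.5
    inner = inside & ~erode(inside, r)
    outer = dilate(inside, r) & ~inside

    # Homogeneity: the inner band should match c_in, the outer band c_out.
    c_in, c_out = 1.0, 0.0
    e_in = (u * (v - c_in) ** 2)[inner].sum() / max(int(inner.sum()), 1)
    e_out = ((1 - u) * (v - c_out) ** 2)[outer].sum() / max(int(outer.sum()), 1)
    region = float(e_in + e_out)
    return mu * length + lam * region, length, region

v = np.zeros((32, 32)); v[8:24, 8:24] = 1.0
_, len_ok, region_ok = narrow_band_energy(v.copy(), v)           # perfect prediction
_, _, region_bad = narrow_band_energy(np.roll(v, 3, axis=1), v)  # shifted by 3 px
print(region_ok, region_bad > 0)  # 0.0 True
```

A perfect prediction gives zero region energy, while a prediction shifted by three pixels places ground-truth background in the inner band and ground-truth foreground in the outer band, so the region term becomes positive. In this sketch, restricting the region term to the two bands concentrates the penalty near the contour rather than over the whole image.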
Figure 11 (bottom) enlarges the white matter surface of an infant brain from the regions highlighted in blue boxes in Figure 11 (top), and the entire white matter surface obtained with each loss function is shown in Figure 12 for a more detailed visualization. Table 5 shows the comparison against other state-of-the-art methods on three volumetric datasets. Our performance is comparable to [39] on MRBrainS18, while it outperforms [40,41] on Brats18 and iSeg17 with a similar network architecture setup.

Figure 11. Top: Comparison of our proposed NB-AC loss against other loss functions on the iSeg17 dataset (panels: Original, GT, Dice, FC, CE, NB-AC), with colored boxes highlighting specific differences. Bottom: A closer look, with topological errors indicated by red arrows.

Figure 12. Visualization of the white matter surface obtained with different loss functions (CE, Dice, FC, NB-AC) on the iSeg17 dataset, where differences in topology are indicated by red arrows.
Table 5. Comparison of our proposed NB-AC loss on both 2D-Unet [3] and 3D-Unet [4] against other state-of-the-art methods on medical datasets with Dice score (DSC).

Conclusions
In this paper, we presented a novel two-branch deep neural network with a narrow band active contour (NB-AC) attention model in the second branch. The proposed network aims to address the problems of imbalanced-class data and weak-boundary object segmentation. It takes into account both higher level features, i.e., the region in the first branch, and lower level features, i.e., the contour and narrow band in the second branch. Information from the first branch is transferred to the second branch through our proposed transitional gate, and both branches are processed in parallel within an end-to-end framework. The experiments have demonstrated that our two-branch network with the NB-AC loss function performs significantly better than commonly used loss functions, e.g., CE, Dice, Focal and OsC, across the 2D-FCN, 2D-Unet and 3D-Unet backbones. They have also shown that combining the NB-AC loss with a 3D-Unet architecture achieves state-of-the-art performance on multiple volumetric datasets. We believe that this approach can be successfully applied to other segmentation tasks in both medical imaging and computer vision.