A Semantic Segmentation Method with Emphasis on the Edges for Automatic Vessel Wall Analysis

Abstract: To develop a precise semantic segmentation method with an emphasis on the edges for automated segmentation of the arterial vessel wall and plaque based on a convolutional neural network (CNN), in order to facilitate the quantitative assessment of plaque in patients with ischemic stroke. MR vessel wall images from a total of 124 subjects were used to train, validate, and test the model using deep learning. We propose an end-to-end architecture that emphasizes edge information, the Edge Vessel Segmentation Network (EVSegNet), for automated segmentation of the arterial vessel wall. EVSegNet consists of two workflows: one achieves fine, multiscale segmentation by combining Dense Upsampling Convolution (DUC) and Hybrid Dilated Convolution (HDC) modules with different dilation rates; the other exploits edge information and is fused with the first workflow to produce the final vessel wall segmentation. The proposed network demonstrates robust segmentation of the vessel wall, achieving a Dice (%) of 87.5 on the test dataset, compared with 81.0 for the traditional U-net and intermediate scores for other U-net-based models. The results suggest that the proposed edge-emphasizing segmentation method effectively improves segmentation accuracy and will facilitate the quantitative assessment of atherosclerosis.


Introduction
Magnetic resonance (MR) vessel wall imaging (VWI) has been shown to visualize plaques and to enable quantitative analysis of their morphologic and signal characteristics [1]. One important potential application of quantitative morphologic and signal measurements of the arterial vessel wall and plaques is to monitor intracranial atherosclerotic disease progression and regression during medical management, which can greatly assist a comprehensive study of atherosclerosis. Quantitative morphologic measurements require segmentation of the arterial vessel wall [2]. However, manual segmentation can only be performed slice by slice on 2D images, which is inefficient and costly and depends heavily on expert knowledge and experience. A fast and precise computer-aided automatic segmentation method is therefore needed.
Most previous studies were based on semi-automatic methods. Cheng et al. proposed a Hough transform-based boundary identification method to automatically detect the carotid artery boundary in MR images; the method can process a tremendous number of images and all results are repeatable [3]. Luo et al. presented a modified traditional level set method with a double adaptive threshold (DATLS) to semi-automatically segment the carotid artery on TOF-MRA images, which performed well on images with weak boundaries and outperformed traditional level sets in terms of computing time, robustness, and accuracy [4]. Sakellarios et al. utilized active contours and

Network Architecture
We propose the Edge Vessel Segmentation Network (EVSegNet), an edge-highlighting convolutional network for arterial vessel wall and plaque segmentation. The architecture of EVSegNet is illustrated in Figure 1. It consists of a regular stream and an edge stream. The regular stream has an encoder-decoder architecture that processes texture information. In the encoder, four downsampling convolutional blocks are followed by a bottleneck block; overall, the downsampling blocks reduce the spatial dimension to 1/16 of the input size to extract high-level semantic information.
In the decoder, we introduce Dense Upsampling Convolution (DUC) in place of traditional bilinear interpolation upsampling to predict segmentations from the features extracted by the encoder. Motivated by Fang et al. [14], the bottleneck block is a hybrid dilated convolution (HDC) module composed of three stacked HDC blocks, which expand the receptive field without increasing the number of weights in the filter. Each HDC block consists of three convolution layers with kernel size 3 and dilation rates r = 1, 2, 3, respectively, three batch normalization layers, and three ReLU activation functions. The edge stream is composed of a sequence of residual blocks, each followed by an edge-gated layer; the bottleneck output is fed into the edge stream as the input of its first residual block. At each resolution, the upsampled feature map from the decoder and the corresponding downsampled feature map from the encoder of the regular stream are fused and combined with the output of the previous residual block to form the inputs of the edge-gated layer. The edge ground truth is generated with a Canny filter. Finally, the output of the edge stream is concatenated with the output of the regular stream. The two streams share a dual loss function in addition to their dedicated loss layers.

Dense Upsampling Convolution (DUC) Module
Traditional upsampling methods rely on bilinear interpolation. Because bilinear interpolation has no learnable parameters, it tends to miss essential low-level information that common upsampling operations cannot recover. To address this, we introduce the DUC module, which captures and compensates for the fine-detail information commonly lost in bilinear interpolation, generating a dense pixel-wise prediction map and thus the final prediction.
As shown in Figure 2, we perform DUC on the h × w × c feature map output by the encoder, where the input image has size H × W × C. With a downsampling factor d, h = H/d and w = W/d. We then apply a 1 × 1 convolutional layer to change the channel number c to d² × L, obtaining a feature map of dimension h × w × (d² × L), where L is the number of segmentation classes. This feature map is subsequently reshaped to H × W × L (here, L = 1 and d = 2). The main idea of DUC is to divide the label map into d² subparts, each equal in size to the input feature map, with the length equal to the width. DUC thus compensates, through the channel dimension, for the detail lost along the length and width of the feature map.
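The channel-to-space rearrangement described above can be made concrete with a small NumPy sketch. This is not the paper's implementation (the preceding 1 × 1 convolution is omitted, and `duc_reshape` is a hypothetical helper name); it only illustrates how d² channels at each spatial position fill a d × d patch of the full-resolution map, mirroring the paper's setting of d = 2 and L = 1:

```python
import numpy as np

def duc_reshape(feat, d):
    """Rearrange an (h, w, d*d*L) feature map into an (h*d, w*d, L) map.

    This is the channel-to-space step of Dense Upsampling Convolution:
    each group of d*d channels at one spatial position fills a d x d
    patch of the full-resolution prediction.
    """
    h, w, c = feat.shape
    L = c // (d * d)
    x = feat.reshape(h, w, d, d, L)     # split channels into a d x d sub-grid
    x = x.transpose(0, 2, 1, 3, 4)      # interleave sub-grids into space
    return x.reshape(h * d, w * d, L)

# With d = 2 and L = 1 (as in the paper), a 2 x 2 x 4 map becomes 4 x 4 x 1:
feat = np.arange(16).reshape(2, 2, 4)
out = duc_reshape(feat, d=2)
print(out.shape)  # (4, 4, 1)
```

The same rearrangement is what frameworks expose as a "pixel shuffle" operation; the learnable part of DUC lies entirely in the 1 × 1 convolution that produces the d² × L channels.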

Hybrid Dilated Convolution (HDC) Module
Recent research has demonstrated that dilated convolutions are a promising tool for image analysis tasks [15,16]. Replacing pooling operations with dilated convolution maintains the receptive field without extra parameters. However, a dilated convolution framework with a fixed dilation rate may cause "gridding artifacts": a large dilation rate may benefit large regions but is disadvantageous for small areas. As can be seen in Figure 3, with dilation rate r = 2 and kernel size k = 3, not all pixels in the receptive field are involved in the computation, leaving holes that the kernel does not cover. The information at these holes is lost, and the holes are exactly the places in the image where fine details reside. The larger the dilation rate, the fewer pixels take part in the computation; for instance, with k = 3 and r = 2, only 9 of 25 pixels are involved in the convolution. Inspired by Wang et al. [17], we therefore adopt the HDC block, built by stacking dilated convolution kernels with different rates. As shown in Figure 4, instead of using the same dilation rate throughout, HDC employs a different dilation rate for each dilated convolution layer. This effectively solves the information loss and precision reduction caused by the holes, because the series of convolutions covers the area completely, without holes, while still expanding the receptive field. Figure 5 depicts the detail of the HDC block: three convolution layers with kernel size 3 and dilation rates r = 1, 2, 3, respectively, three batch normalization layers, and three ReLU activations.
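The gridding effect can be checked numerically. The following toy sketch (a 1-D simplification; `covered_offsets` is a hypothetical helper, not from the paper) enumerates which input offsets a stack of dilated 3-tap convolutions can reach: a fixed rate of 2 leaves holes at the odd offsets, while the hybrid rates 1, 2, 3 cover the receptive field completely:

```python
def covered_offsets(rates, k=3):
    """Input offsets (1-D) reachable by stacking dilated convolutions.

    A layer with kernel size k and dilation r reaches offsets
    {-r*(k//2), ..., 0, ..., +r*(k//2)} relative to its input; stacking
    layers sums one tap offset per layer. Holes in the returned set are
    the "gridding artifacts" described in the text.
    """
    half = k // 2
    offsets = {0}
    for r in rates:
        taps = [r * t for t in range(-half, half + 1)]
        offsets = {o + t for o in offsets for t in taps}
    return sorted(offsets)

fixed = covered_offsets([2, 2, 2])   # fixed rate 2: only even offsets
hybrid = covered_offsets([1, 2, 3])  # hybrid rates 1, 2, 3: gap-free
print(fixed)   # [-6, -4, -2, 0, 2, 4, 6]
print(hybrid)  # [-6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6]
```

Both stacks span the same receptive field (±6), but only the hybrid-rate stack touches every pixel inside it, which is the motivation for choosing r = 1, 2, 3 within each HDC block.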

Edge-Gated Layer
Gated convolution addresses the limitation of traditional convolution, which treats all pixels as equally important: the gated convolution layer (GCL) can filter out irrelevant information and focus on salient regions through a selection mechanism. Motivated by this, we introduce a gated layer into our work. Each edge-gated layer receives the outputs of the previous gated layer together with the corresponding-resolution feature maps of the main stream. The GCL enables collaboration between the two streams: the regular stream excels at extracting high-level semantic information, and the GCL uses each resolution's feature map from the regular stream to deactivate irrelevant edge features. In this work, the edge ground truth was generated by applying a Canny filter [18,19] to the ground truth masks of the original vessel wall images.
Let i denote the index of the convolutional layers, i ∈ {0, 1, . . . , j}, and let r_i and e_i represent the inputs from resolution i of the regular stream and the edge stream, respectively. We obtain an attention map α_i ∈ ℝ^(H×W) by feeding r_i and e_i into a 1 × 1 convolutional layer C_1×1, concatenating the results, and passing them through a rectified linear unit (ReLU) [20] activation function.
The GCL operates on the edge (shape) stream: e_i is multiplied pixel-wise by the attention map α_i and the result is passed through a residual block, so the gated feature is computed as ê_i = e_i ⊙ α_i. In our work, there are as many edge-gated layers as there are resolutions in the regular stream.
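The gating step can be sketched with a toy NumPy example. This is an illustration, not the paper's implementation: the 1 × 1 convolution over the concatenated maps reduces, per pixel, to a weighted sum of the two features, so the weights `w_r`, `w_e`, and `b` below stand in for learned parameters (hypothetical values), and the trailing residual block is omitted:

```python
import numpy as np

def edge_gate(r_feat, e_feat, w_r=0.5, w_e=0.5, b=0.0):
    """Toy sketch of an edge-gated layer (GCL).

    alpha = ReLU(w_r * r + w_e * e + b) plays the role of the attention
    map alpha_i; the edge-stream feature e is then gated pixel-wise,
    e_hat = e * alpha, as in the text.
    """
    alpha = np.maximum(0.0, w_r * r_feat + w_e * e_feat + b)  # attention map
    return e_feat * alpha                                      # gated edge feature

r = np.array([[0.2, 0.9], [0.0, 1.0]])  # regular-stream activations (toy)
e = np.array([[0.1, 0.8], [0.5, 1.0]])  # edge-stream activations (toy)
gated = edge_gate(r, e)
print(gated.shape)  # (2, 2)
```

Pixels where both streams respond strongly keep their edge activation; pixels the regular stream considers irrelevant are suppressed, which is the "deactivation" behavior described above.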

Loss Function
Semantic loss: The vessel wall accounts for only 0.45-30.4% of the total image area, so there is an imbalance between the foreground and background of our vessel wall images; the background constitutes the easily classified samples and the foreground the difficult ones. The focal loss proposed by Lin et al. [21] evolves from the cross-entropy (CE) loss [22] to address this category imbalance, and we adopt it as the semantic loss. The focal loss is defined as FL(p_t) = −α_t (1 − p_t)^γ log(p_t). Compared with CE, the focal loss introduces a modulating factor (1 − p_t)^γ whose purpose is to reduce the weight of the background so that the model focuses more on foreground samples during training: when p_t >> 0.5, the contribution of the background to the loss is down-weighted. The hyperparameter γ tunes the weight of different samples; as γ increases, fewer easily classified samples contribute to the training loss, and when γ = 0 the focal loss reduces to the α-weighted BCE loss. The parameter α balances easy and difficult samples. We use the focal loss with α = 0.25 and γ = 0.5.
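The focal loss above can be sketched in NumPy for the binary case as follows (a minimal illustration with the paper's α = 0.25 and γ = 0.5, not the training implementation):

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=0.5):
    """Binary focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted foreground probabilities; y: binary labels.
    gamma = 0 recovers the alpha-weighted BCE loss.
    """
    p = np.clip(p, 1e-7, 1 - 1e-7)
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))

p = np.array([0.9, 0.2, 0.7])  # toy predictions
y = np.array([1, 0, 1])        # toy labels
# Well-classified pixels (p_t near 1) are down-weighted by (1 - p_t)^gamma.
print(focal_loss(p, y))
```

With a dominant, easily classified background, the modulating factor shrinks its per-pixel contribution, leaving the thin vessel wall foreground to drive the gradient.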
Edge loss: We supervise the edge stream with an auxiliary loss term that encourages correct edge prediction, combining the focal loss with the BCE loss: L_edge = λ_1 L_focal + λ_2 L_BCE, where λ_1 and λ_2 are hyper-parameters. Dual loss: The dual loss couples the two streams, where y_pre,j denotes the prediction result of the regular stream and c represents the segmentation class. Because the argmax function is non-differentiable, we use the Gumbel softmax for back-propagation, approximating the argmax operator as ∂ argmax_t P(y_t) / ∂η_i ≈ ∇_η_i [ e^((log P(y_t) + g_t)/τ) / Σ_j e^((log P(y_j) + g_j)/τ) ], where τ is a temperature hyper-parameter and g_j is a sample drawn from the Gumbel distribution.
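The Gumbel softmax relaxation can be illustrated with a small NumPy sketch (toy class probabilities, not the paper's code): adding Gumbel noise to the log-probabilities and applying a low-temperature softmax yields a differentiable surrogate that approaches one-hot argmax as τ → 0:

```python
import numpy as np

def gumbel_softmax(logits, tau, rng):
    """Differentiable surrogate for argmax over class log-probabilities.

    Adds Gumbel(0, 1) noise g_j to each logit and applies a softmax with
    temperature tau; lower tau sharpens the output toward one-hot.
    """
    g = rng.gumbel(size=logits.shape)  # g_j ~ Gumbel(0, 1)
    z = (logits + g) / tau
    z = z - z.max()                    # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.log(np.array([0.7, 0.2, 0.1]))  # toy class probabilities P(y_j)
soft = gumbel_softmax(logits, tau=0.1, rng=rng)
print(soft)  # sums to 1; sharpens toward argmax as tau -> 0
```

Because every operation is a smooth function of the logits, gradients flow through this surrogate where they could not flow through a hard argmax.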

Image Preprocessing
The acquired 3D MR vessel wall images were transferred to a dedicated plaque analysis software (uWS PlaqueTool, United Imaging Healthcare Co., Ltd., Shanghai, China) for image preprocessing. We obtained contrast-enhanced MR angiography (CE-MRA), which shows the morphological structure of the blood vessels clearly, to extract the vessel centerline. Through the registration relationship, we then mapped the CE-MRA centerline onto the MR VWI to obtain the centerline of the MR vessel wall images.
Following the MR VWI centerline, we applied curved multi-planar reformation (MPR) to reconstruct the entire vessel: the curved vessel wall was straightened and displayed in a two-dimensional plane to obtain cross-sectional 2D images. This whole process was completed automatically by the system. The outer wall contours were then manually delineated on the cross-sectional 2D images by five radiologists, each with more than six years of experience.

Experimental Settings
We split the 124 patients (13,962 slices) into 75 patients for training (8377 slices), 37 for validation (4189 slices), and 12 for testing (1396 slices). To avoid overfitting, the training dataset was expanded fivefold through rotation, translation, and padding. Kaiming initialization was used, and the parametric rectified linear unit (PReLU) was selected as the activation function. The network was optimized with SGD using a momentum of 0.9, a weight decay of 0.0001, a batch size of 32, 20 epochs, and an initial learning rate of 3 × 10^−5. The segmentation network was implemented in PyTorch on a computer with an Intel i7-10700 CPU @ 2.9 GHz and an Nvidia GeForce RTX 2080 Ti 11 GB GPU.
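For reference, a single SGD update with the momentum and weight decay settings above can be written out explicitly. This NumPy toy (made-up weights and gradients) only shows the update rule; the actual training uses PyTorch's built-in optimizer:

```python
import numpy as np

def sgd_step(w, g, v, lr=3e-5, momentum=0.9, weight_decay=1e-4):
    """One SGD update with momentum 0.9, weight decay 1e-4, lr 3e-5.

    Weight decay is folded into the gradient as an L2 term; the velocity
    v accumulates a momentum-damped history of past gradients.
    """
    g = g + weight_decay * w   # L2 weight decay
    v = momentum * v - lr * g  # velocity update
    return w + v, v

w = np.array([1.0, -2.0])  # toy parameters
v = np.zeros_like(w)       # initial velocity
g = np.array([0.5, -0.5])  # toy gradient
w, v = sgd_step(w, g, v)
print(w)
```

Each parameter moves opposite its (decayed) gradient, and on later steps the momentum term carries 90% of the previous velocity forward, smoothing the optimization trajectory.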

Evaluation and Metrics
We employed five commonly used metrics to quantitatively compare the different methods for vessel wall segmentation: the Dice coefficient (denoted Dice, %), recall, precision, accuracy, and mean intersection over union (mIoU).
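The overlap metrics can be computed for binary masks as in this NumPy sketch (illustrative masks; mIoU averages the IoU over classes, which reduces to a single foreground IoU here):

```python
import numpy as np

def dice_and_iou(pred, gt):
    """Dice coefficient and IoU for binary masks.

    Dice = 2|P ∩ G| / (|P| + |G|);  IoU = |P ∩ G| / |P ∪ G|.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    dice = 2.0 * inter / (pred.sum() + gt.sum())
    return dice, inter / union

pred = np.array([[1, 1], [0, 0]])  # toy prediction
gt   = np.array([[1, 0], [0, 0]])  # toy ground truth
d, iou = dice_and_iou(pred, gt)
print(d, iou)  # 0.666..., 0.5
```

Dice and IoU are monotonically related (Dice = 2·IoU / (1 + IoU)), but Dice weights the intersection more heavily, which is why both are commonly reported together.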

Ablation Analysis of Our EVSegNet
To justify the effectiveness of the principal components of our network, i.e., DUC, HDC, and the edge stream in the proposed EVSegNet, we used Dice, recall, precision, and accuracy as evaluation metrics. On the testing dataset, the U-net was compared with variants that successively add Resnet, DUC, HDC, and the edge stream. Figure 6 shows segmentation results produced by the proposed EVSegNet on the test dataset, together with the corresponding original image, mask, binary mask, and edge map; edge maps are generated online by applying the Canny filter. As can be seen in Figure 6, our examples include multiscale vessel walls and low-contrast images.

Results
The segmentation results with the different backbone architectures, U-net (None), U-net (Res), U-net (DUC), U-net (DUC + HDC), and EVSegNet, are summarized in Table 1, which compares the four metrics across the different components on our collected dataset. The baseline U-net achieved Dice, accuracy, recall, and precision scores of 81.0, 96.5, 78.6, and 88.4, respectively, for the vessel wall. Performance kept improving as the Resnet, DUC, HDC, and edge stream modules were successively added; as a result, our final network includes DUC, HDC, and the edge stream. Compared with the other backbones, EVSegNet has higher Dice and accuracy, the most stable recall, and strong precision. On the test dataset, EVSegNet achieved the best performance with a Dice of 87.5 for the vessel wall, followed by U-net (DUC + HDC), U-net (DUC), U-net (Res), and U-net (None), with Dice scores of 86.5, 85.6, 84.5, and 81.0, respectively. Figure 7 shows violin plots of the segmentation performance of the different methods: wider regions of a violin represent more aggregated values than narrower zones, and the values for EVSegNet aggregate at higher scores than those of the other methods.
Ablation study for the DUC block: Prior work has shown that DUC leads to better performance, suggesting that adding the DUC module might likewise improve our network. The proposed EVSegNet employs the DUC block in place of traditional bilinear interpolation to preserve learning capability during upsampling. The segmentation results in Table 1 show that adopting dense upsampling convolution improves our model: for outer vessel wall segmentation, the Dice, accuracy, recall, and precision scores all increase from the baseline values of 81.0, 96.5, 78.6, and 88.4 (see Table 1).
Figure 6. Illustration of segmentation results on the VWI dataset. The first column shows the original image, the second the mask, the third the binary mask, the fourth the edge maps generated with the Canny filter, and the last column the corresponding prediction results.
Figure 8 shows how the Dice curves of the different models change over 11 epochs on the test dataset. As can be seen, the HDC model converges faster than the traditional CNN model in terms of Dice and achieves better performance, validating that HDC contributes to model performance and accelerates convergence.
Furthermore, Figure 9 gives a few examples for visual comparison, showing segmentation results from 12 patients: the first column is the original image and the second the corresponding binary mask for reference, followed by the vessel wall segmentations predicted by U-net (None), U-net (Res), U-net (DUC), U-net (DUC + HDC), and EVSegNet. The corresponding Dice and mIoU results are given in Table 2. As the second and fifth rows of Figure 9 show, all backbones perform well on large vessel walls. In the first, third, and fourth rows, however, the performance of U-net is significantly lower than that of the other backbones, suffering from unsatisfactory accuracy on small vessel walls: U-net cannot discriminate precisely between foreground and background, which leaves the predicted edge non-closed. Adding the HDC module improves performance on small vessel walls, indicating that the HDC module yields better and more stable performance across multiscale vessel walls. In addition, in the first and third rows the boundaries have such low contrast with the neighboring tissues that they are essentially invisible to the naked eye; even so, EVSegNet locates the vessel wall region and produces the cleanest and most accurate segmentations, superior to the other backbones. Ablation study for the edge stream: Motivated by Takikawa et al. [23], the proposed EVSegNet adds an edge stream that supplies edge knowledge before segmentation. Table 1 also shows the effect of the edge stream, which boosts the performance of MR vessel wall image segmentation.
Compared with U-net + DUC + HDC, the Dice, accuracy, recall, and precision scores increase from 86.5, 97.4, 86.1, and 91.3 to 87.5, 97.6, 86.8, and 91.5, respectively. These experimental results show that the edge stream improves the model noticeably and that our method outperforms the other methods.

Discussion
We evaluated EVSegNet through an ablation study, successively adding the Resnet, DUC, HDC, and edge stream blocks. For outer vessel wall contour segmentation, as can be seen from the MR vessel wall image segmentation results in the first, third, and fourth rows of Figure 9 and in Table 2, EVSegNet achieved superior quantitative results compared with the other methods for small-sized vessel walls.
Closer observation suggests that adding the edge stream as an auxiliary output is vital for improving the accuracy of the vessel wall segmentation task. Using an edge stream sub-network before segmentation is beneficial, as it improves the shape accuracy of the resulting segmentation. Notably, our edge stream requires no additional annotation, since the edge information can be generated from the ground truth segmentation masks. Adding the edge module as prior knowledge therefore further improves segmentation accuracy by producing refined boundaries: it improves segmentation quality around the edges, yielding better overall segmentation performance, and our results are the most consistent with the ground truths.