Attention-Based UNet Deep Learning Model for Plaque Segmentation in Carotid Ultrasound for Stroke Risk Stratification: An Artificial Intelligence Paradigm

Stroke and cardiovascular diseases (CVD) significantly affect the world population. The early detection of such events may prevent the burden of death and costly surgery. Conventional methods are neither automated nor clinically accurate. Artificial Intelligence-based methods of automatically detecting and predicting the severity of CVD and stroke in their early stages are therefore of prime importance. This study proposes an attention-channel-based UNet deep learning (DL) model that identifies carotid plaques in internal carotid artery (ICA) and common carotid artery (CCA) images. Our experiments consist of 970 ICA images from the UK, 379 CCA images from diabetic Japanese patients, and 300 CCA images from post-menopausal women from Hong Kong. We combined both CCA datasets to form an integrated database of 679 images. A rotation transformation technique was applied to the 679 CCA images, doubling the database for the experiments. The cross-validation K5 (80% training : 20% testing) protocol was applied for accuracy determination. The results of the Attention-UNet model are benchmarked against the UNet, UNet++, and UNet3P models. Visual plaque segmentation showed improvement in the Attention-UNet results compared to the other three models. The correlation coefficient (CC) value for Attention-UNet is 0.96, compared to 0.93, 0.96, and 0.92 for the UNet, UNet++, and UNet3P models. Similarly, the AUC value for Attention-UNet is 0.97, compared to 0.964, 0.966, and 0.965 for the other models. In conclusion, the Attention-UNet model is beneficial in segmenting very bright and fuzzy plaque images that are hard to diagnose using other methods. Further, we present a multi-ethnic, multi-center, racial-bias-free study of stroke risk assessment.


Introduction
In recent decades, stroke and other cardiovascular diseases (CVD) have emerged as leading causes of death across the globe. In developing countries such as India, stroke has become epidemic [1][2][3][4] in recent years, with 105 to 152 cases per 100,000 of the population [5].
In developed countries such as the USA, 795,000 people experience a stroke, and approximately 240,000 suffer a transient ischemic stroke, yearly [6]. Still, any modification to the basic encoder-decoder arms or skip connections of the fundamental UNet architecture may fall under the hybrid deep learning (HDL) category. In keeping with the spirit of HDL, Jain et al. [73] were the first to propose HDL models for ICA plaque segmentation. The authors showed that HDL models fuse two solo deep learning (SDL) models and combine the feature extraction capabilities of both. They also recently proposed faster, smaller, low-parameter HDL models [74]. Another study by Jain et al. [73] applied the same HDL models to CCA plaque segmentation. Thus, HDL models are gaining importance due to their better feature extraction capability.
Attention channel maps were recently introduced in UNet-based deep learning [75,76]; however, they have not been tested in the carotid ultrasound framework. Further, no study has yet benchmarked attention-based UNet against other models for a carotid ultrasound framework. Therefore, we hypothesize that attention channel maps, when added to the skip connections of UNet architectures, will improve the performance visually and quantitatively. Figure 1 shows the global system diagram of plaque segmentation using the attention-based UNet DL paradigm (named AtheroEdge™ 3.0, AtheroPoint LLP, Roseville, CA, USA).
The layout of this study is as follows: Section 2 presents database selection, preparation, and baseline characteristics. Section 3 presents the architectures of all UNet models used in this study. Methodology and experiments are presented in Section 4. The results are presented in Section 5. Section 6 presents the performance evaluation, Section 7 presents the discussion, and the paper concludes in Section 8.

Database Selection, Preparation, and Baseline Characteristics
We considered multi-institution, multi-ethnic databases for our study. Hence, our experiments are free from data selection bias. We considered three databases in this research work, DB1: ICA database from the United Kingdom; DB2: CCA database from Japan; DB3: CCA database from Hong Kong. Detailed descriptions of the databases are provided here.

DB1: UK ICA Database
A database of 99 B-mode ultrasound videos was acquired from Imperial College London, UK. The database consisted of datasets from 47 male and 52 female patients with a mean age of 75.04 ± 9.96 years. Each US video was converted into still images, and from each patient's image pool we selected 10 images at an interval of 10 frames. After a retrospective analysis of the US video scans, we found that two scans were fuzzy and noisy; therefore, we removed these two scans from the database. Finally, we obtained 970 images of moderate-to-high-risk plaque.

DB2: Japanese Diabetic CCA Database
DB2 was collected from Toho University, Japan. It consisted of 379 CCA images of 190 diabetic patients (147 male, 43 female) with a mean age of 68.78 ± 10.88 years. An experienced sonographer performed all the scans. This was a retrospective study, and the institutional review board approved the ethics. Detailed baseline characteristics were recorded at the time of diagnosis and are presented in Table 1 based on the measured plaque area.

DB3: Hong Kong CCA Database
We collected the DB3 database from 50 Chinese women aged between 54 and 67 years. We selected six images from each patient; thus, the database constituted 300 images. The patients provided their written consent before the start of the experiment. All patients were post-menopausal women, and 28 of them were identified as suffering from other diseases: one diabetic, three hypertensive, seven hypercholesterolemic, fifteen both hypertensive and hypercholesterolemic, and two with all abnormalities. The remaining patients had normal BP, cholesterol, and blood glucose levels in a fasting state.

Data Preparation and Augmentation Technique
Data preparation and augmentation play a vital role in DL-based systems. First, we removed the non-relevant information from the ultrasound scans, such as patient ID (name), age, date, and machine model, by cropping to the grayscale area only; this is a popular and standardized image-processing step [45,77]. Secondly, we resized all images of DB1, DB2, and DB3 to a common size of 224 × 224, compatible with the first layer of the DL models. DB2 and DB3 are from different institutions and countries; merging them therefore provides our experiments with a multi-ethnic, multi-institutional database. Accordingly, we combined the CCA images of DB2 and DB3 into a single folder, DB23, giving a new database of 379 + 300 = 679 images. We also applied a data augmentation technique to the combined CCA DB23 to enhance the number of images in the experiments: a rotation transform [−15° to +15°] on all images. Finally, we obtained a database DB2A, where A stands for augmentation, of 679 × 2 = 1358 images.
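The augmentation step can be sketched as follows. This is a minimal illustration, not the study's exact implementation: the rotation here uses nearest-neighbour resampling, the helper names are ours, and the same random angle is applied to each image and its mask so the pixel labels stay aligned.

```python
import numpy as np

def rotate_image(img, angle_deg):
    """Rotate a 2-D image about its centre using nearest-neighbour resampling."""
    theta = np.deg2rad(angle_deg)
    h, w = img.shape
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for each output pixel, find its source coordinate.
    xsrc = np.cos(theta) * (xs - cx) + np.sin(theta) * (ys - cy) + cx
    ysrc = -np.sin(theta) * (xs - cx) + np.cos(theta) * (ys - cy) + cy
    xi = np.clip(np.round(xsrc).astype(int), 0, w - 1)
    yi = np.clip(np.round(ysrc).astype(int), 0, h - 1)
    out = img[yi, xi]
    # Pixels whose source fell outside the original frame are set to zero.
    valid = (xsrc >= 0) & (xsrc <= w - 1) & (ysrc >= 0) & (ysrc <= h - 1)
    return np.where(valid, out, 0)

def augment_rotation(images, masks, seed=0):
    """Double the database: add one randomly rotated copy of each image/mask pair."""
    rng = np.random.default_rng(seed)
    out_imgs, out_msks = list(images), list(masks)
    for img, msk in zip(images, masks):
        angle = rng.uniform(-15.0, 15.0)           # rotation in [-15°, +15°]
        out_imgs.append(rotate_image(img, angle))
        out_msks.append(rotate_image(msk, angle))  # mask gets the same rotation
    return out_imgs, out_msks
```

Applying `augment_rotation` to the 679-image DB23 yields the 1358-image DB2A described above.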

Binary Mask Preparation for Supervised Learning
This work falls under supervised-learning-based semantic segmentation of atherosclerotic plaque from carotid ultrasound images; thus, the models require pixel-label information for the training phase. A team of experienced sonographers and cardiologists identified the LI and MA layers of the carotid artery and delineated the plaque, and another sonographer verified the tracings for errors. The delineated LI and MA closed borders (contours) in the grayscale images were converted into binary masks using a MATLAB-based program. These binary masks were also resized to 224 × 224, as with the grayscale images.
Inter- and intra-observer analyses always require a radiologist and can become expensive. Further, while inter- and intra-observer studies were not an integral part of this pilot design, our observations show that such analyses lead to variations of between 1% and 5% [78][79][80][81][82][83][84][85][86][87][88]; such ranges are normal and meet the FDA 510(k) regulations. We intend to integrate this practice in future studies.

Basic UNet Model
The number of trainable parameters in each convolutional layer is given by Equation (1):

P = (m × n × d + b) × k,  (1)

where m is the height of the convolutional filter, n is the width of the convolutional filter, d is the number of filters in the previous layer, b is the bias, and k is the number of filters in the current layer. The convolutional filter size is 3 × 3, and the default bias value (b) is 1 for each stage. Thus, after each encoder stage, the numbers of trainable parameters were 38,720, 221,440, 885,248, and 3,539,968, respectively. A bottleneck layer is present after the encoder stages, which carries the most delicate features extracted by the encoder; this bottleneck layer holds 14,157,824 trainable parameters. The right-hand part of the UNet architecture is called the decoder arm. Each decoder stage starts with an up-convolutional layer, followed by concatenation, convolutional, and ReLU layers.
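The parameter-count formula described above can be checked numerically. Assuming two 3 × 3 convolutional layers per encoder stage and a three-channel input (which is what the quoted first-stage count implies), the per-stage totals reproduce the figures above:

```python
def conv_params(m, n, d, k, b=1):
    """Trainable parameters of one convolutional layer: (m*n*d + b) * k."""
    return (m * n * d + b) * k

# Encoder stage 1: 3-channel input -> 64 filters -> 64 filters
stage1 = conv_params(3, 3, 3, 64) + conv_params(3, 3, 64, 64)
# Encoder stage 2: 64 -> 128 -> 128 filters
stage2 = conv_params(3, 3, 64, 128) + conv_params(3, 3, 128, 128)
print(stage1, stage2)  # 38720 221440
```

These match the 38,720 and 221,440 parameters quoted for the first two encoder stages.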
During top-down feature extraction in the encoder, vital semantic features are lost in the extraction process. Thus, these features are added to the corresponding decoder stage at the concatenation layer via a skip connection (shown in Figure 2 by dotted lines). After adding the features from the encoder stage, each decoder stage carries 9,176,576, 2,294,528, 573,824, and 143,552 parameters, respectively. Finally, the fully connected and softmax layers comprise 1154 and six parameters, respectively. Thus, the UNet architecture contains a total of 31,032,840 trainable parameters. The UNet model is trained with grayscale images of size 224 × 224 and their corresponding binary masks. The model uses a sparse categorical cross-entropy loss function, given by Equation (2), and an ADAM optimizer to reduce the loss.

Unet++ Architecture
The UNet++ network is a modified version of the basic UNet architecture. In the basic UNet model, the skip connection carries features directly from the encoder stage and concatenates them to the decoder stage. The UNet++ model, however, follows a dense skip-connection path in which features pass through intermediate blocks. As shown in the UNet++ architecture in Figure 3, the first, second, third, and fourth stages have 3, 2, 1, and 0 intermediate convolutional blocks, respectively. These intermediate blocks are connected to all previous blocks at the same level through concatenation layers. The encoder arm has the same training parameters as the UNet model; however, in the decoder stages, the numbers of training parameters change due to the additions from the intermediate stages. Table 2 shows the number of training parameters in different parts of the UNet architectures.

Unet3P
UNet3P is another variant of the basic UNet architecture. In the previous UNet and UNet++ models, the skip connections carry features from an encoder stage to the decoder stage at the same scale. However, multi-scale features exist at the different encoder stages, and these models lack connections that merge features across scales. The concept of adding multi-scale features to the UNet model therefore gives rise to UNet+++, or UNet3P. This model merges same-scale and lower-scale features from the encoder with the higher-scale features from the decoder. As can be seen from Figure 4, the same-scale features from encoder stage 1, the higher-scale features from decoder stages 2, 3, and 4, and the bottleneck layer are added to decoder stage 1 via skip connections. Similarly, the lower-scale features from encoder stage 1, the same-scale features from encoder stage 2, and the higher-scale features from decoder stages 3 and 4, as well as the bottleneck layer, are concatenated to decoder stage 2 via skip connections. The lower-scale features from encoder stages 1 and 2, the same-scale features from encoder stage 3, and the higher-scale features from decoder stage 4 and the bottleneck layer are added to decoder stage 3. Finally, the lower-scale features from encoder stages 1, 2, and 3, the same-scale features from encoder stage 4, and the higher-scale features from the bottleneck layer are concatenated with decoder stage 4. The complete architecture of UNet3P is shown in Figure 4.

Attention-based Unet Model
The concept of the attention mechanism was proposed earlier by Bahdanau et al. [89] and Luong et al. [90]. The same concept was later integrated with the UNet by Oktay et al. [75] for segmentation of the pancreas. We utilized the attention-UNet for plaque segmentation because atherosclerotic plaque has a very fuzzy nature, which in many cases is challenging to segment using the other UNet models. Figure 5 below shows an attention block used in place of a skip connection of the UNet model.
The attention gate has two inputs and one output. One input is the "input features" signal 'x' from the same encoder level, and the second is the gating features signal 'g' from one decoder level lower. Both inputs inherently carry features. The input feature 'x' comes from a shallow network level; therefore, it contains spatial feature information. The gating signal derives from the decoder level, which is deeper than the input signal 'x'; thus, the gating signal provides a better feature representation. Further, the input signal is downsampled by a stride of 2 to make it compatible with the gating signal. The downsampled signal 'x' is then combined with the gating signal 'g' and passed through the rectified linear unit (ReLU) activation function. The ReLU is a non-linear activation function that removes the negative values from its input, i.e., it passes only the positive values through.
Further, both the 'x' and 'g' signals are passed through a 1 × 1 convolutional operation to acquire the weights for the combined weight signal. The combined weight signal passes through a Sigmoid activation function. The Sigmoid is an S-shaped non-linear curve defined by Sig(x) = 1/(1 + exp(−x)); owing to its steep central slope, it squashes its input into the range between 0 and 1. Finally, the combined weight signal is upsampled to the same scale as the input signal 'x' and multiplied by it element-wise. We can understand the above attention mechanism with the example provided in Figure 6, which shows the first attention gate at the skip connection between the first encoder and decoder stages. The input features 'x' of size 224 × 224 × 64 originate from the first encoder stage, and the gating signal 'g' of size 112 × 112 × 128 comes from one decoder level lower. The input signal 'x' is downsampled by the green block (1 × 1 convolution with #filters = the number of gating-signal filters, i.e., 128) to the size 112 × 112 × 128. The gating signal also passes through the blue box (1 × 1 convolution, #filters = 128), so it acquires the same shape as 'x', i.e., 112 × 112 × 128. Both signals are added and applied to the ReLU activation function, which suppresses the negative values in the combined signal. Further, the combined signal is passed through the psi (Ψ) block, a 1 × 1 convolutional block, to acquire the weights from the combined signal. These weights are the attention-gate weights, generated by resampling the shallow and deep features from the encoder and decoder stages. The weights are upsampled to size 224 × 224 by the upsampler and multiplied element-wise with the input signal 'x' of the same size. Figure 7 shows the complete attention-UNet model with four encoder and decoder stages. In this architecture, four attention blocks are used in place of the skip connections between the encoder and decoder stages.
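The gating arithmetic can be sketched in plain NumPy. This is a toy version under stated assumptions: the 1 × 1 convolutions become matrix multiplications over the channel axis, the stride-2 downsampling and the upsampling are done by slicing and repeating, and the weight names (Wx, Wg, psi) are illustrative, not from the paper.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(x, g, Wx, Wg, psi):
    """x: (H, W, Cx) encoder features; g: (H/2, W/2, Cg) gating features.
    Wx: (Cx, Ci), Wg: (Cg, Ci), psi: (Ci, 1) stand in for 1x1-conv weights."""
    x_dn = x[::2, ::2] @ Wx               # strided 1x1 conv: downsample + project
    g_pr = g @ Wg                         # 1x1 conv on the gating signal
    q = relu(x_dn + g_pr)                 # additive attention
    alpha = sigmoid(q @ psi)              # (H/2, W/2, 1) coefficients in (0, 1)
    alpha_up = alpha.repeat(2, axis=0).repeat(2, axis=1)  # back to x's scale
    return x * alpha_up                   # re-weighted encoder features
```

Because every attention coefficient lies in (0, 1), the gate can only attenuate encoder features, never amplify them; the network learns where to suppress irrelevant background.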

Methodology and Experiments
All six models were trained using the raw images and binary masks of the ICA DB1 and CCA DB2A databases described in Section 2. The experimental setup and its key steps are described below.

Hyperparameter Selection and Optimization
The choice of hyperparameters follows standard, well-established practice. The main parameters were (1) the number of layers, (2) the CV protocol, (3) the number of epochs, (4) the learning rate, (5) the batch size, and (6) the filter size. The number of layers is shown in the block diagram of each DL model, and Table 2 shows the total number of parameters in the major parts of the DL architectures. The experiments used 100 epochs, which we optimized after many experiments with the database; at this number of epochs, no significant change in the loss function is observed, and the loss value converges. We selected batch sizes of 8 for the UNet, UNet3P, Squeeze-UNet, and attention-UNet models, and 4 for the UNet++ and Fractal-UNet models. Further, we used a standard learning rate of 10⁻⁴, a convolutional filter size of 3 × 3, and a bias value of 1, with 'same' padding.

Sparse Categorical Cross-Entropy Loss Function
We used a sparse categorical cross-entropy loss function to minimize the training loss. For an image of N pixels, the loss is given by Equation (2):

L(w) = −(1/N) Σ_{i=1..N} log(ŷ_i),  (2)

where w refers to the model parameters, e.g., the weights of the neural network; N is the total number of pixels in an image; y_i is the true (actual) label of pixel i; and ŷ_i is the predicted probability assigned to that true label.
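A minimal per-pixel version of this loss, with integer labels and per-class probabilities (the small epsilon guarding log(0) is an implementation detail, not from the paper):

```python
import numpy as np

def sparse_categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """y_true: (N,) integer class labels; y_pred: (N, C) class probabilities."""
    n = y_true.shape[0]
    # For each pixel, pick the predicted probability of its true class.
    p_true = y_pred[np.arange(n), y_true]
    return -np.mean(np.log(p_true + eps))
```

"Sparse" refers only to the label encoding: the targets are class indices rather than one-hot vectors, which suits binary segmentation masks directly.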

K5 Cross-Validation
We implemented the commonly used K5 cross-validation [91] method for training and testing. In this protocol, the complete database is divided into 80% training and 20% testing data. In each fold, the training set is used to train the system and generate offline weights, and the test set is used to validate the system and generate the segmentation parameters. The 20% test set is then returned to the main dataset, a different 20% of the images is held out for testing, and the remaining 80% is used for training. This training and testing experiment is repeated five times; thus, five offline models are generated, one corresponding to each test dataset. Each test dataset is then used with its corresponding offline weights to generate the test results. Thus, every image is tested exactly once.
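The fold rotation described above can be sketched as follows (a simple contiguous split for illustration; the study's exact partitioning may differ, e.g., shuffled or patient-stratified):

```python
def k5_folds(n_images, k=5):
    """Yield (train_idx, test_idx) pairs so every image is tested exactly once."""
    indices = list(range(n_images))
    fold_size = n_images // k
    for i in range(k):
        # The last fold absorbs the remainder when n_images is not divisible by k.
        stop = (i + 1) * fold_size if i < k - 1 else n_images
        test = indices[i * fold_size:stop]
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        yield train, test
```

For the 970-image ICA database, each fold trains on 776 images (80%) and tests on 194 (20%).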
Since the datasets were relatively small, and our past experience with AI protocols showed strong validation results [92], we considered the K5 protocol adequate. Once the test result of each image is generated, we obtain the arithmetic mean over all images using Equation (3):

X = (1/N) Σ_{i=1..N} x_i,  (3)

where x_i is the extracted feature of image 'i', and X is the arithmetic mean over the N images.

Results
The segmentation performance of the six models on the ICA DB1 and CCA DB2A databases is shown in Tables 3 and 4, respectively. We computed the accuracy, sensitivity, specificity, precision, Matthews correlation coefficient (MCC), dice similarity coefficient (DSC), and Jaccard index (JI). These indices are calculated by comparing the estimated binary masks generated by all UNet models against the ground truth (GT) masks. The attention-based UNet showed mean ± SD values of 98.58 ± 0.59. The visual results of the UNet, UNet++, UNet3P, Fractal-UNet, Squeeze-UNet, and attention-UNet models are shown in Figure 8. The top row represents the binary masks of all databases. The second row shows overlays of the GT masks over the raw grayscale images in green. The third row shows the overlay of the difference between the estimated and GT masks on the raw grayscale images. Similarly, the fourth, fifth, sixth, seventh, and eighth rows show the difference between the estimated and GT masks on the raw grayscale images for the UNet++, UNet3P, Fractal-UNet, Squeeze-UNet, and attention-UNet models, respectively. The red color indicates the estimated mask, and the green color represents the difference between the two masks. The attention blocks modify the deep features by applying the attention weights; this can be seen visually in some critical images shown in Figure 9, which are not successfully segmented by the other models.
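The DSC and JI reported in Tables 3 and 4 are computed from binary mask overlap; a minimal version:

```python
import numpy as np

def dice_jaccard(gt_mask, pred_mask):
    """Dice similarity coefficient and Jaccard index between two binary masks."""
    gt, pred = gt_mask.astype(bool), pred_mask.astype(bool)
    intersection = np.logical_and(gt, pred).sum()
    union = np.logical_or(gt, pred).sum()
    dsc = 2.0 * intersection / (gt.sum() + pred.sum())
    ji = intersection / union
    return dsc, ji
```

The two indices are monotonically related (DSC = 2·JI/(1 + JI)), so they rank the models identically; both are reported here because both are conventional in segmentation studies.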

Performance Evaluation
The results of all models show almost equal segmentation indices. Hence, we performed additional performance evaluation tests to validate our experiments. We conducted a series of performance tests on the ICA DB1 and CCA DB2A databases: regression analysis, receiver operating characteristics (ROC) analysis, paired t-tests, and Bland-Altman plot analysis.

Regression Analysis
A regression analysis is a powerful statistical tool for analyzing the relation between two quantities. It generates a correlation coefficient (CC) between the two variables, which takes values between −1 and 1. A CC value close to 1 indicates a very high correlation, whereas a CC close to 0 indicates very little correlation between the two quantities. We used the ground truth plaque area (GTPA) and the plaque area (PA) predicted by each model for the regression analysis. Using ICA DB1, we obtained CC values between the GTPA and the predicted PA for all models; these are summarized in Table 5. The regression analysis curves are shown in Figures 10 and 11. From the analysis of all CC values, it is clear that the attention-based UNet outperforms the other models.
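The CC used here is the Pearson correlation coefficient between the two area measurements; a sketch (helper name ours):

```python
import numpy as np

def plaque_area_cc(gt_area, pred_area):
    """Pearson correlation coefficient between GT and predicted plaque areas."""
    return float(np.corrcoef(gt_area, pred_area)[0, 1])
```

Pearson's coefficient can in principle also be negative for anti-correlated quantities; in practice, segmentation models that track the GT area at all produce positive values, which is why the text discusses the 0-to-1 range.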

Receiver Operating Characteristics
A receiver operating characteristics (ROC) analysis is another tool for assessing classification performance. We cross-examined the literature and found a plaque area threshold of 40 mm² used by researchers to separate low- and high-risk plaque. Using this threshold, we generated GT labels of '1' and '0' for GTPA > 40 mm² and GTPA < 40 mm², respectively. Using the predicted area as the test variable and the GT labels as the classification variable, we plotted the ROC curves for the UNet, UNet++, UNet3P, Fractal-UNet, Squeeze-UNet, and attention-based UNet models for the ICA and CCA databases. These ROC curves are shown in Figures 12 and 13, along with the area under the ROC curve (AUC) and p-values. The AUC values are compared in Table 5. Again, it is clear from the AUC values and the ROC curves that the attention-based UNet model outperforms the other UNet models.
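Using the 40 mm² threshold, the AUC can also be computed without plotting, via the rank (Mann-Whitney) formulation; a sketch with an illustrative helper name:

```python
import numpy as np

def auc_from_areas(gt_area, pred_area, threshold=40.0):
    """AUC of predicted plaque area against GT labels (high risk: GTPA > threshold)."""
    labels = np.asarray(gt_area) > threshold
    pos = np.asarray(pred_area)[labels]
    neg = np.asarray(pred_area)[~labels]
    # Probability that a random high-risk case outranks a random low-risk case,
    # counting ties as half a win.
    wins = (pos[:, None] > neg[None, :]).sum() + 0.5 * (pos[:, None] == neg[None, :]).sum()
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means the predicted areas perfectly separate the two risk classes; 0.5 means no discrimination.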

Paired-t-Test Analysis
A paired t-test is commonly used in biostatistics to analyze the mean difference between two measurements; it requires paired measurements of the same subject. In our case, we have the GTPA and the predicted PA for the same arteries (both CCA and ICA). The test thus analyzes whether the mean difference between each PA pair is zero. The distributions of the GTPA and the predicted PA are shown using box-and-whisker plots in Figures 14 and 15 for the ICA and CCA databases, respectively. The paired t-test results, such as the mean ± SD, the standard error of the mean, the mean difference, Student's t-value, and the p-values, are shown in Tables 6 and 7 for the ICA and CCA databases, respectively.
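The statistics reported in Tables 6 and 7 follow directly from the per-image differences; a sketch (the p-value would additionally require the t distribution with n − 1 degrees of freedom):

```python
import numpy as np

def paired_t_test(gt_area, pred_area):
    """Mean difference, standard error of the mean, and Student's t for paired data."""
    d = np.asarray(pred_area, float) - np.asarray(gt_area, float)
    n = d.size
    mean_diff = d.mean()
    sem = d.std(ddof=1) / np.sqrt(n)   # standard error of the mean difference
    return mean_diff, sem, mean_diff / sem
```

A t-value near zero (large p-value) indicates no systematic difference between the GT and predicted plaque areas.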

Bland-Altman's Plot
Biostatistical analyses frequently use a Bland-Altman (BA) plot, or simply a difference plot, when two different methods or instruments measure the same parameter. The BA plot shows the bias between the mean differences of the two methods. It also offers a 95% agreement interval (confidence interval), within which the differences between the second and first methods fall. In our experiments, the plaque area is the quantity measured manually by the expert sonographer (the GTPA) and predicted by all UNet models. Thus, the difference between the GTPA and the predicted PA is plotted on the Y-axis, and the mean of both quantities is plotted on the X-axis. The BA plots between the GTPA and the predicted PA for all UNet models using the ICA and CCA databases are shown in Figures 16 and 17, respectively.
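The bias and 95% limits of agreement plotted in Figures 16 and 17 reduce to three numbers per model; a sketch (helper name ours, using the conventional 1.96 × SD limits):

```python
import numpy as np

def bland_altman_stats(gt_area, pred_area):
    """Bias and 95% limits of agreement between GT and predicted plaque areas."""
    gt = np.asarray(gt_area, float)
    pred = np.asarray(pred_area, float)
    diff = pred - gt                   # Y-axis of the BA plot
    mean = (pred + gt) / 2.0           # X-axis of the BA plot
    bias = diff.mean()
    sd = diff.std(ddof=1)
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd
    return bias, (lower, upper), mean
```

A small bias with narrow limits of agreement indicates that the model can substitute for the manual measurement.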

Discussion
The current research work is a novel application of attention-based UNet models for carotid plaque segmentation. We hypothesized that many of the images would be too fuzzy and bright to identify the plaque constituents. Thus, the attention-based UNet model [75,76] enhances the deep features and provides better segmentation. The model also shows excellent segmentation performance for low-to-moderate CCA and moderate-to-high ICA plaque images. Further, we compared the attention-based model against the UNet, UNet++, and UNet3P models on a multicenter, multi-ethnic database. We made an effort to present a bias-free stroke risk assessment system.

Bias in Medical Imaging Models
Deep learning-based clinical models have recently been gaining significant attention. These models use clinical data available from hospitals and medical research centers, processed using established or new algorithms. Such AI-based clinical decision support systems have shown promising results in diagnosis. However, the biases in these systems are not reported in many studies [92,93]. These models may suffer from data selection bias (data from a single source), observer bias, data-labelling bias, data source bias (data device selection bias), validation bias, racial bias (multi-ethnic data selection), measurement bias, bias due to variabilities in the datasets [94], and many other types of bias which may affect the clinical results in some way. These biases must be discussed in detail and considered while designing clinical decision support systems. Recently, methods have been developed to compute biases, which can be extended to UNet-based systems [95][96][97].

Supervised and Unsupervised Learning Based DL Models
The current algorithm uses binary masks of the ICA and CCA images for model training. To train the models successfully, error-free mask preparation is a must; faulty masks may result in false recognition by the model. Binary mask preparation is tedious and time-consuming, and expert sonographers are required to accomplish this task. The current study involves hundreds of images due to the unavailability of larger datasets. However, in a scenario where many thousands of images are available for analysis, generating such masks is nearly impossible; supervised learning therefore loses its significance in big data analysis. Unsupervised models, which require no binary masks or label information for training, may replace it in such scenarios. However, such unsupervised DL models have not gained much attention in medical image processing until now, and significant scope is available in this area. We may see many unsupervised DL models in the near future.

Benchmarking
We have presented an attention-channel-based UNet model for atherosclerotic plaque segmentation. Many efforts have been made in ICA and CCA plaque area segmentation; still, these methods are imperfect in one way or another. Previous methods suffer from some of the biases discussed in the section above, resulting in poor segmentation of images from other databases. Zhou et al. [66] presented the UNet++ model for ICA and CCA plaque segmentation from multi-ethnic databases. However, they trained their model with only 33, 33, and 34 images and tested it on 44 images; such a small test set does not provide proper justification in a clinical setting. Further, they did not benchmark their system against any other established system. In another study, Jain et al. presented hybrid deep learning models for ICA plaque segmentation [68]. They proposed the SegNet-UNet and SegNet-UNet+ HDL models for plaque area segmentation. Although their model used only one segment of the artery, i.e., the ICA, they enhanced the image database by applying the rotation-transform augmentation technique. Thus, their models also suffer from some biases, such as data selection bias, source bias, and racial bias. Nevertheless, their models are a milestone in HDL model studies.
In another study, the same team of researchers, Jain et al., used the above HDL models for plaque segmentation from multi-ethnic CCA databases [78]. They used two databases for the experiments, one from Japan and another from Hong Kong, and performed some unseen-data experiments, thereby attempting to avoid data selection and racial bias. However, they did not validate those experiments against any established system; therefore, their models suffer from validation bias. Later, they attempted to avoid the validation bias by comparing their SDL and HDL models against a commercially available state-of-the-art plaque segmentation system, AtheroEdge 2.0, developed by AtheroPoint LLC, CA, USA [73]. Their HDL model shows a plaque area error of 8 mm², compared to 9.9 mm² for the SDL model and 9.6 mm² for AtheroEdge 2.0, over 90% of the image database.

The proposed method attempts to overcome the previous biases through intensive exercises. By using databases covering both the ICA and CCA sections, we attempted to avoid data selection bias. We also managed racial bias by incorporating multi-ethnic, multicenter ICA and CCA databases: DB1, DB2, and DB3 are from the UK, Japan, and Hong Kong, respectively. Further, we validated our Attention-UNet model against the previous UNet, UNet++, and UNet3P models to avoid model selection and validation bias. Table 8 summarizes the comparisons of the present study with some benchmark studies. The present work shows a bias-free study of plaque segmentation from ICA and CCA images drawn from multi-ethnic, multicenter databases. We presented a powerful attention mechanism that modifies the shallow and deep features of the carotid plaque images and can capture plaque areas that other models do not detect. We compared the attention-based UNet segmentation results with the models used in previous studies, such as UNet, UNet++, and UNet3P, and the results are comparable or superior to those models.
Moreover, the visual results show a promising improvement in many images. Further, we validated our experiments using a series of performance evaluation tests such as regression analysis, ROC, Bland-Altman plots, and paired-t-tests. The results of such performance tests are also comparable or superior to other models.
Further, we attempted to close the gap in data selection and racial biases from previous studies by using multicenter, multi-ethnic, and augmented databases. We believe that these biases are not yet fully overcome, and there is still scope for improvement, which can be attempted in future studies, as attempted here [98]. Further, the attention mechanism can be employed in other variants of the UNet to observe its effect on other HDLs, along with the integration of advanced image processing methods [99] with UNet.
Since the trained models are large, one can adopt weight-pruning techniques using algorithms such as genetic algorithms and whale optimization [100,101]. Moreover, the UNet-based segmentation method can be used for plaque tissue characterization due to its vital feature extraction and modification capability [71,102,103]. Carotid segmented lesions and plaque need to be correlated with a coronary SYNTAX score as part of clinical validation [104]. Segmented plaque analysis can be extended to different clinical groups, such as rheumatology patients, to understand cardiovascular risk [105]. Finally, since coronary plaque has been observed in COVID-19 patients [106], one can extend the UNet-based solution for plaque segmentation and measurement in carotid scans of COVID-19 patients.

Conclusions
This work presents a novel concept of the attention mechanism incorporated with UNet as an attention-based UNet model. The attention-based UNet model successfully demonstrated plaque segmentation in complex images with fuzzy and bright plaque. The results of the attention-UNet models were benchmarked against UNet, UNet++, and UNet3P models. The CC value of the attention-based UNet model for the CCA database was 0.96, compared to 0.93, 0.96, and 0.92 for UNet, UNet++, and UNet3P. The AUC value for attention-based UNet was 0.97, compared to 0.964, 0.966, and 0.965 for the other models. The attention gate weight modifies the shallow and deep features to identify the complex plaque images; therefore, the attention mechanism is vital in plaque feature extraction and tissue characterization. The system can be adopted in clinical settings for cardiovascular disease risk stratification.