Classification of Breast Cancer Lesions in Ultrasound Images by Using Attention Layer and Loss Ensembles in Deep Convolutional Neural Networks

Reliable classification of benign and malignant lesions in breast ultrasound images can provide an effective and relatively low-cost method for the early diagnosis of breast cancer. The accuracy of the diagnosis is, however, highly dependent on the quality of the ultrasound systems and the experience of the users (radiologists). Advances in deep convolutional neural networks have provided solutions for the efficient analysis of breast ultrasound images. In this study, we propose a new framework for the classification of breast cancer lesions that uses an attention module in a modified VGG16 architecture. We also propose a new ensembled loss function, the combination of binary cross-entropy and the logarithm of the hyperbolic cosine loss, to reduce the discrepancy between the classified lesions and their labels. The networks were initialized with weights pretrained on ImageNet and subsequently fine-tuned with ultrasound datasets. The proposed model outperformed other modified VGG16 architectures with an accuracy of 93%, and its results are competitive with other state-of-the-art frameworks for the classification of breast cancer lesions. We employed transfer learning with the pre-trained VGG16 architecture and trained different CNN models to predict benign or malignant lesions in breast ultrasound images. Our experimental results show that the choice of loss function is highly important in the classification task and that adding an attention block improves the performance of our model.


Introduction
Breast cancer is the second leading cause of cancer death in women [1,2]. Different imaging modalities, such as mammography, ultrasound and magnetic resonance imaging, have been used for diagnosing breast tumours. Whilst mammography has been proven to be a useful technique for diagnosing breast cancer, leading to reduced mortality [3], its sensitivity is limited in dense breast tissue. Breast density has been established as an independent risk factor for breast cancer [4][5][6]. Women with heterogeneously dense and extremely dense breast tissue have relatively higher risks of developing breast cancer, by factors of 1.2 and 2.1 respectively, compared with average women [7]. The accuracy of diagnosing simple benign cysts in breast ultrasound images has been reported to be 96-100%, so they do not require further evaluation [8]. In a meta-analysis of 29 studies, various adjunct screening methods were studied to assess the limitations of breast cancer screening modalities, and ultrasound demonstrated an increase in cancer detection of 40% [9].
Computer-aided diagnosis (CAD) systems are extensively used for the detection and classification of tumours in breast ultrasound images. Statistical methods [10] have been predominantly used to analyse features extracted from lesion shape, margin, homogeneity and posterior acoustic attenuation. However, identifying the shape and margin of lesions is difficult in ultrasound images [11]. Machine learning techniques have also been extensively deployed to analyse and classify lesions based on handcrafted features consisting of morphological and texture features of tumours [12,13]. However, the extraction of features was still highly dependent on the radiologist's experience. The difficulty of handcrafting features has led to the development of newer algorithms that can learn features automatically from data, such as deep learning algorithms, which are particularly strong tools for extracting non-linear features from data. Deep learning models are surprisingly promising in the classification of ultrasound images, in which pattern recognition is not easily hand-engineered [14].
With the rapid growth of deep learning-based methods in the last few years, a further step towards efficiently integrating local and global features and exploiting localised information [15] was the use of attention mechanisms. Attention has been used in computer vision tasks such as detection [16,17], segmentation [18] and classification [19], and it improves model performance by focusing on the features that are most relevant to the given task. To the best of our knowledge, attention modules have been widely used in medical image segmentation but not in classification. In this study we used the attention gate module [15] in a modified VGG16 architecture with a new loss function to improve performance in ultrasound breast lesion classification.

Methods
The proposed framework in this study is inspired by [15], where the authors introduced attention gates. We used one attention layer in a modified VGG16 architecture, in which the attention mechanism was applied to layer 13, just before pooling, and to the max-pooling layer 18, which closes the last convolutional block of the VGG16 feature extraction module [20]. The framework is illustrated in Figure 1. Most of the malignant lesions were infiltrating ductal carcinomas (IDCs), whereas the majority of the benign lesions were fibroadenomas. The sizes of the malignant lesions ranged from 0.5 to 9.0 cm (mean ± SD: 2.1 ± 1.2 cm), whereas the sizes of the benign lesions ranged from 0.3 to 5.0 cm (mean ± SD: 1.4 ± 1.0 cm).

Pre-processing
All ultrasound images were acquired using the Aixplorer ultrasound system (SuperSonic Imagine, Aix en Provence, France) with a 15-4 MHz linear transducer probe. Two radiologists specialized in breast imaging performed the scanning and were blinded to the histological diagnosis results. All images in the UMMC dataset were in JPEG format at a resolution of 1400 x 1050 pixels. The average image size in dataset B was 760 x 570 pixels, and each image presented one or more lesions. In our experiments, the images were resampled to 128 x 128 pixels with a 75-15-10 train-test-validation split. Image normalization was applied to all of the images in the datasets to create a consistent dynamic range across each dataset. Figure 2 illustrates samples of benign and malignant lesions in breast ultrasound images.
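The normalization and splitting steps can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: it assumes min-max scaling to [0, 1] (the text does not state which normalization was used) and a shuffled split with a fixed seed. The helper names `min_max_normalize` and `split_dataset` are hypothetical, and resampling to 128 x 128 pixels is left to an imaging library.

```python
import random

def min_max_normalize(image):
    """Scale the pixel values of a 2-D list to [0, 1] (assumed normalization)."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    span = (hi - lo) or 1  # avoid division by zero on constant images
    return [[(p - lo) / span for p in row] for row in image]

def split_dataset(items, train=0.75, test=0.15, seed=0):
    """Shuffle and split into train/test/validation (75-15-10 by default)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_test = round(len(items) * test)
    return (items[:n_train],
            items[n_train:n_train + n_test],
            items[n_train + n_test:])

# Usage with stand-in data: 200 dummy image identifiers.
train_set, test_set, val_set = split_dataset(range(200))
print(len(train_set), len(test_set), len(val_set))  # 150 30 20
```

Fixing the seed keeps the three subsets reproducible across runs, which matters when several models are compared on the same split.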

Attention Module
At the deep levels of the convolutional layers, the network acquires the richest possible feature representation. Yet spatial information may be lost in the high-level output maps with cascaded convolutions [21]; alternatively, dense predictions are made on a particular region of interest (ROI), an approach that leads to redundancy in the low-level features extracted by all models within the cascade [22]. We used a soft attention gate (AG) to deal with these issues. In the AG, the input feature map is multiplied element-wise by the attention coefficients to highlight the salient features (Figure 3).
ReLU and sigmoid, as σ1 and σ2 respectively, are used to transform the intermediate maps when calculating the attention coefficients. The attention coefficients identify the important regions of the image and prune the features so that only the activations relevant to the specific task are kept. The output maps at each scale are upsampled and then concatenated with the pruned features. At this stage, 1x1x1 convolutions and non-linear activations are applied to each output map, and the high-dimensional feature representation is then supervised with the CE-logcosh loss.
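The gating computation described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: it follows the additive attention-gate formulation of [15], treats each 1x1 convolution as a per-pixel linear map, and assumes the gating signal has already been resampled to the spatial size of the input feature map. The function name `attention_gate` and all weights are hypothetical.

```python
import math

def relu(v):
    return max(0.0, v)

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def dot(w, v):
    return sum(wi * vi for wi, vi in zip(w, v))

def attention_gate(x, g, W_x, W_g, psi, b_int=0.0, b_psi=0.0):
    """Soft additive attention gate on feature maps stored as [H][W][C] lists.

    x   : input feature map, shape H x W x C_x
    g   : gating signal already resampled to H x W, shape H x W x C_g
    W_x : F x C_x weights of a 1x1 convolution on x
    W_g : F x C_g weights of a 1x1 convolution on g
    psi : length-F weights of the 1x1 convolution producing one channel
    Returns (gated feature map, attention coefficients).
    """
    gated, alphas = [], []
    for x_row, g_row in zip(x, g):
        out_row, a_row = [], []
        for x_pix, g_pix in zip(x_row, g_row):
            # sigma_1 (ReLU) applied to the summed 1x1 projections
            inter = [relu(dot(wx, x_pix) + dot(wg, g_pix) + b_int)
                     for wx, wg in zip(W_x, W_g)]
            # sigma_2 (sigmoid) yields a coefficient in (0, 1)
            alpha = sigmoid(dot(psi, inter) + b_psi)
            a_row.append(alpha)
            # element-wise multiplication of the input features by alpha
            out_row.append([alpha * v for v in x_pix])
        gated.append(out_row)
        alphas.append(a_row)
    return gated, alphas

# Toy example: a 1 x 2 map with 2 input channels, 1 gating channel, F = 2.
x = [[[1.0, 2.0], [0.5, -1.0]]]
g = [[[1.0], [-3.0]]]
W_x = [[1.0, 0.0], [0.0, 1.0]]
W_g = [[1.0], [1.0]]
psi = [1.0, 1.0]
gated, alphas = attention_gate(x, g, W_x, W_g, psi)
print(alphas)  # first pixel close to 1, second pixel exactly 0.5
```

In the toy example the second pixel receives a strongly negative gating signal, so its intermediate activations are zeroed by the ReLU and its features are scaled by 0.5, while the first pixel passes almost unchanged.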

Cross entropy - Log hyperbolic cosine (CE-LogCosh) Loss
Given the importance of the loss function to the learning algorithm, and with the aim of obtaining a better learning system, this study draws on ensemble methods [23] to develop an ensemble loss function. We combined two loss functions, cross entropy [24] and log hyperbolic cosine [25], to boost the learning process and achieve better performance. The cross-entropy loss compares the distribution of the predictions with that of the true labels and is defined as:
$$L_{CE}(y, \hat{y}) = -\sum_{i} \left[ y_i \log \hat{y}_i + (1 - y_i) \log (1 - \hat{y}_i) \right]$$
The log-cosh loss function is the logarithm of the hyperbolic cosine of the prediction error:
$$L_{LCH}(y, \hat{y}) = \sum_{i} \log \left( \cosh (\hat{y}_i - y_i) \right)$$
where $y_i$ is the label and $\hat{y}_i$ is the predicted label of sample $i$. The proposed ensembled loss function is:
$$L_{CE\text{-}LCH}(y, \hat{y}) = \alpha L_{CE}(y, \hat{y}) + \beta L_{LCH}(y, \hat{y})$$
In the CE-logcosh loss function, α and β are parameters that can be tuned to shift the emphasis towards the cross-entropy or the logcosh loss. In this study we set both α and β to 0.5, as this setting gave the best performance.
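A small numerical sketch of the combined loss may help make the weighting concrete. It assumes the standard binary cross-entropy and a linear combination weighted by α and β, and it averages over samples rather than summing, which only rescales the value. The function names, the clipping constant `eps` and the example labels and probabilities are illustrative, not taken from the study.

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy; predictions are clipped for stability."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)

def log_cosh(y_true, y_pred):
    """Mean of log(cosh(prediction error))."""
    return sum(math.log(math.cosh(p - y))
               for y, p in zip(y_true, y_pred)) / len(y_true)

def ce_logcosh(y_true, y_pred, alpha=0.5, beta=0.5):
    """Weighted combination alpha * CE + beta * log-cosh (assumed form)."""
    return (alpha * binary_cross_entropy(y_true, y_pred)
            + beta * log_cosh(y_true, y_pred))

labels = [1, 0, 1, 0]          # illustrative coding: 1 = malignant, 0 = benign
probs = [0.9, 0.2, 0.6, 0.4]   # illustrative predicted probabilities
print(ce_logcosh(labels, probs))
```

Setting `alpha=1.0, beta=0.0` recovers plain cross-entropy, so the same function can be used to compare the single losses with the ensemble.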

Network Architecture
In this study, we used the convolutional layers of VGG16 to extract features from the datasets.
Figure 4 is a schematic of the proposed network architecture, in which a pre-trained VGG16 was used for fine-tuning and feature extraction. The feature maps of layers 13 and 18 were then used in the attention block, and the output was fed to modified fully connected (FC) [26] layers for the classification of malignant and benign lesions. We proposed a new model based on attention gating and a new loss function to enhance classification performance for breast ultrasound images. The "dropout" strategy [27] was also used to avoid overfitting.
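To see what the two feature maps look like for a 128 x 128 input, the spatial sizes can be traced through the VGG16 convolutional base. The sketch below assumes that the layer indices 13 and 18 follow the ordering of the Keras VGG16 convolutional base with the input layer at index 0; the text does not confirm which framework's numbering is meant, and the helper name `feature_map_shapes` is hypothetical.

```python
# (layer name, kind, output channels) in the order of the Keras VGG16
# convolutional base, with index 0 being the input layer (assumed numbering).
VGG16_BASE = [
    ("input", "input", 3),
    ("block1_conv1", "conv", 64), ("block1_conv2", "conv", 64),
    ("block1_pool", "pool", 64),
    ("block2_conv1", "conv", 128), ("block2_conv2", "conv", 128),
    ("block2_pool", "pool", 128),
    ("block3_conv1", "conv", 256), ("block3_conv2", "conv", 256),
    ("block3_conv3", "conv", 256), ("block3_pool", "pool", 256),
    ("block4_conv1", "conv", 512), ("block4_conv2", "conv", 512),
    ("block4_conv3", "conv", 512), ("block4_pool", "pool", 512),
    ("block5_conv1", "conv", 512), ("block5_conv2", "conv", 512),
    ("block5_conv3", "conv", 512), ("block5_pool", "pool", 512),
]

def feature_map_shapes(input_size=128):
    """Return (name, height, width, channels) for every layer index."""
    shapes, size = [], input_size
    for name, kind, channels in VGG16_BASE:
        if kind == "pool":
            size //= 2  # 2x2 max pooling with stride 2 halves each side
        # 3x3 convolutions with "same" padding keep the spatial size
        shapes.append((name, size, size, channels))
    return shapes

shapes = feature_map_shapes(128)
print(13, shapes[13])  # 13 ('block4_conv3', 16, 16, 512)
print(18, shapes[18])  # 18 ('block5_pool', 4, 4, 512)
```

Under this numbering the two maps differ in resolution by a factor of four per side, so one of them has to be resampled before they can be combined in the attention block.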

Evaluation
Classification performance of the models in this study was measured by sensitivity, specificity, accuracy, precision, F1 score and Matthews Correlation Coefficient [28], which were obtained from confusion matrix entries. A confusion matrix illustrates the relation between the classification outcomes and the predicted classes. The level of classification performance is calculated from the numbers of correctly and incorrectly classified samples in each class. Accuracy is computed from the total number of correct predictions, defined as:
$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
Sensitivity is the proportion of positives that are identified correctly, defined as:
$$\mathrm{Sensitivity} = \frac{TP}{TP + FN}$$
Specificity is the proportion of negatives that are correctly predicted, defined as:
$$\mathrm{Specificity} = \frac{TN}{TN + FP}$$
Precision, or positive predictive value, is the ratio of correctly predicted positive observations to total predicted positive observations, defined as:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
F1 score is the harmonic mean of precision and sensitivity (recall), calculated as:
$$F_1 = \frac{2 \cdot \mathrm{Precision} \cdot \mathrm{Sensitivity}}{\mathrm{Precision} + \mathrm{Sensitivity}}$$
Matthews Correlation Coefficient (MCC) is the correlation coefficient between the observed and predicted classifications, defined as:
$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}}$$
where True Positive (TP) and True Negative (TN) denote the numbers of correct predictions, and False Positive (FP) and False Negative (FN) the numbers of incorrect predictions.
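The metrics listed above can be computed directly from the four confusion-matrix counts. The following sketch uses the standard textbook definitions; the function name `classification_metrics` is hypothetical, returning 0 for an undefined ratio is a convention chosen here, and the counts in the usage example are made up for illustration and are not results from this study.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        return num / den if den else 0.0  # convention: 0 when undefined

    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, tp + tn + fp + fn)
    f1 = ratio(2 * precision * sensitivity, precision + sensitivity)
    mcc = ratio(tp * tn - fp * fn,
                math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "mcc": mcc,
    }

# Made-up counts for illustration only (not results from this study).
m = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Unlike accuracy, MCC uses all four counts in both numerator and denominator, which makes it informative when the benign and malignant classes are unbalanced.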

Results
We evaluated our proposed model on the classification of ultrasound breast lesions as benign or malignant. Correct classification of benign and malignant lesions is a particularly difficult task because of the variety in lesion shape and the poor contrast of ultrasound breast images. We compared our model and the standard VGG16 with different losses in terms of classification performance.
Table 1 and Figure 5 show that our proposed model with CE-logcosh outperformed the other classification models in terms of accuracy, sensitivity, specificity, precision, F1 score and MCC.

Discussion
In this paper, modified VGG16 architectures were compared in order to achieve higher performance in the classification of benign and malignant breast tumours. The modifications included an additional attention block, different dense layers and ensembled loss functions. One of the improvements in the CNN models was the use of ensembled loss functions. During training with gradient-based optimization, the weight of each loss function was tuned; the weights were parametrized by α and β to control the emphasis. To the best of our knowledge, logcosh loss works mostly like L2 at small values and like L1 at large values, and it is usually used in regression or reconstruction tasks [25]. In this study we combined logcosh loss with binary cross-entropy to improve classification accuracy. As can be seen in Table 1, the ensemble of both losses improved classification performance.
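The L2-like and L1-like behaviour of the logcosh loss can be checked numerically. The sketch below compares log(cosh(x)) with x²/2, its leading Taylor term for small errors, and with |x| - log 2, its limit for large errors; the sample error values are arbitrary.

```python
import math

def log_cosh(x):
    return math.log(math.cosh(x))

for x in (0.01, 0.1, 5.0, 20.0):
    quadratic = 0.5 * x * x          # L2-like approximation for small |x|
    linear = abs(x) - math.log(2.0)  # L1-like approximation for large |x|
    print(f"{x:6.2f}  logcosh={log_cosh(x):.6f}  "
          f"x^2/2={quadratic:.6f}  |x|-log2={linear:.6f}")
```

For the small errors the quadratic column matches the loss closely, and for the large errors the linear column does, which is why the loss is less sensitive to occasional large errors than a purely quadratic penalty.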
In addition, by using the attention block, relevant spatial information is identified from low-level feature maps and propagated to the classification stage. This spatial information is otherwise lost when the large feature maps obtained after the convolutional layers of the CNN are reduced to smaller feature dimensions. The attention block was therefore proposed to compute the contribution of each feature.
In our study, out of all the models, the attention VGG16 with logcosh loss demonstrated the highest accuracy and precision. Additionally, the proposed deep convolutional neural network architecture does not need prior expert knowledge or image segmentation, so it will be more convenient in CAD and suitable for future clinical diagnosis. In summary, we proposed the attention VGG16 classifier as a potential architecture for classifying breast cancer ultrasound images. That said, we suggest that this model be tested further on a larger dataset to improve the robustness of the architecture. We also suggest that VGG16 be combined with machine learning classifiers as potential architectures in clinical studies. In future studies, the deep convolutional neural network architecture should be evaluated on larger image datasets with various tumour subtypes so that it can be adapted to multi-class classification, as the classification of breast lesion subtypes is of greater clinical impact [33,34].

Conclusion
In this study, we analysed several computer-aided diagnosis models for the classification of benign and malignant lesions on the UMMC breast ultrasound image dataset. We employed transfer learning with the pre-trained VGG16 architecture. Different CNN models were trained to predict benign or malignant lesions in breast ultrasound images. Our experimental results demonstrated that the choice of loss function is highly important in the classification task and that adding an attention block improves the performance of our model. Our proposed model, with features extracted from VGG16 and a fully connected network with only 10 neurons, achieved the best classification performance, with a precision of 92% and an accuracy of 93%. With this framework, the evaluation tests show that the combination of loss functions can provide suitable information to construct a more accurate prediction model than the other models compared. In the future, other deep neural network models will be tested on a larger dataset of ultrasound images with the aim of further increasing accuracy.

Figure 1. The overall framework of this study.

Figure 2. Samples of benign (a) and malignant (b) lesions in breast ultrasound images.

Table 1. Comparison of the VGG16 and attention-VGG16 models with different loss functions in the classification of benign and malignant lesions.

Table 2 presents some state-of-the-art deep learning models for lesion classification in breast ultrasound images. It is notable that the performance of our proposed model is comparable to that of these models.

Table 2. State-of-the-art deep learning models in breast ultrasound lesion classification.