Identification of Breast Malignancy by Marker-Controlled Watershed Transformation and Hybrid Feature Set for Healthcare

Breast cancer is a highly prevalent disease in females that may lead to mortality in severe cases. Mortality can be reduced if breast cancer is diagnosed at an early stage. The focus of this study is to detect breast malignancy through computer-aided diagnosis (CADx). In the first phase of this work, the Hilbert transform is employed to reconstruct B-mode images from the raw data, followed by marker-controlled watershed transformation to segment the lesion. Methods based only on texture analysis are quite sensitive to speckle noise and other artifacts; therefore, a hybrid feature set is developed after the extraction of shape-based and texture features from the breast lesion. A decision tree, k-nearest neighbor (KNN), and an ensemble decision tree model via random under-sampling with boosting (RUSBoost) are utilized to segregate the cancerous lesions from the benign ones. The proposed technique is tested on OASBUD (Open Access Series of Breast Ultrasonic Data) and breast ultrasound (BUS) images collected at Baheya Hospital, Egypt (BHE). The OASBUD dataset contains raw ultrasound data obtained from 100 patients, comprising 52 malignant and 48 benign lesions. The dataset collected at BHE contains 210 malignant and 437 benign images. The proposed system achieved a promising accuracy of 97% with a confidence interval (CI) of 91.48% to 99.38% for OASBUD and 96.6% accuracy with a CI of 94.90% to 97.86% for the BHE dataset using the ensemble method.


Introduction
Breast cancer (BC) is one of the leading causes of death among women worldwide [1]. According to reports, one out of twelve females may be affected by BC. The World Health Organization (WHO) documented that approximately 508,000 women died from this disease in 2011 [2]. Nevertheless, it is possible to defeat breast cancer if it is diagnosed at the initial stage. Mammography is considered an appropriate method for breast cancer diagnosis; however, BUS is employed as a supportive tool to assess mammographic findings, profound masses, and inconclusive mammograms, and to guide biopsies [3]. BUS is also the main procedure prescribed to assess breast ailments in females, especially below the age of 30, owing to its low cost and easy access, and it can differentiate lesion types in this group more accurately than other imaging procedures. However, proper diagnosis by BUS requires trained and experienced radiologists to identify dense and cystic breast lesions. The radiologist's report is based on visual inspection of BUS images only, so it is difficult to diagnose a lesion as malignant or benign [4,5]. Furthermore, BUS is an operator-dependent modality [5] that comprises artifacts (e.g., speckle noise), which degrade the image quality. CADx systems using machine learning approaches can be developed for the diagnosis of breast anomalies and tumor classification. A CADx system assists radiologists in the evaluation of breast abnormalities with reliability and accuracy [6,7]. Furthermore, CADx may avoid extra effort during an examination and the diagnostic errors that physicians make due to fatigue and workload.
A CADx framework usually comprises four important steps: preprocessing, segmentation, feature extraction, and feature classification [8,9]. The efficacy of a CADx framework depends on the employed features, which are mostly designed using expert knowledge.
Several researchers have proposed CADx systems to detect breast abnormalities using BUS images. Image segmentation is an essential phase in the CADx framework. Different classical approaches are used for BUS segmentation, such as thresholding, region growing, and watershed [10]. Thresholding is a simple and fast segmentation method: the threshold value is constant over the whole image in global thresholding, whereas in local thresholding the value varies with local features. Global thresholding does not perform well on noisy and low-contrast images [10]. Shan et al. [11] employed an automatic seed-point selection technique for the region-growing method; thresholding was utilized to create a group of candidate regions, and the tumor region was then determined based on local features, region size, and location. Watershed is a robust technique and yields more accurate segmentation than region growing and thresholding. However, the main issue faced in watershed is the selection of markers; therefore, Gómez et al. [12] employed internal and external markers using the Beucher gradient [13] to prevent over-segmentation.
The efficacy of a CADx framework mostly depends upon an appropriate feature set. A variety of features are obtained from BUS images and classified through machine learning. The most widely employed features are typically separated into two major groups: texture-based and shape-based [6]. Numerous texture features are utilized for classifying ultrasound breast tumors. For example, Yang et al. [14] performed texture analysis of BUS images using gray-scale invariant features via the ranklet transform and used a support vector machine (SVM) to isolate the malignant lesions from the benign ones. Shi et al. [15] employed fractal features, textural features from spatial gray-level dependence (SGLD) matrices, and histogram-based features; stepwise regression was applied to choose an optimal subset of features, and fuzzy SVM (FSVM), artificial neural network (ANN), and SVM classifiers were compared, with FSVM claimed to produce the best results. Lo et al. [16] recommended a system extracting grey level co-occurrence matrix (GLCM) features from ranklet-transformed BUS images and illustrated that the result is significant for clinical use, but GLCM is sensitive to speckle noise and other artifacts in BUS images. Cai et al. [17] suggested the phase congruency-based binary pattern (PCBP), which merges the phase congruency (PC) method with the local binary pattern (LBP); the features were classified through an SVM with the radial basis function (RBF) kernel, demonstrating the robustness of the method.
Shape-based features have very effectively differentiated various breast lesions in several studies [18,19], which showed that these features are appropriate for breast tumor detection. The objective of shape-based features is to measure the contour and shape characteristics of breast lesions. Typically, ill-defined and irregular boundaries are observed in malignant tumors [9], so the goal of shape-based features is to evaluate the lesion margin and shape. However, the efficacy of shape-based features relies on the US scanner, the particular view of the lesion, the preprocessing technique, and the segmentation algorithm [20].
Deep learning has been applied in image processing to solve challenging tasks, but it needs further improvement to remove all bottlenecks, and there are cases where conventional methods with hybrid features have demonstrated the capability of a better solution [21]. Recently, deep learning and convolutional neural network (CNN)-based methodologies have been employed for benign or malignant lesion recognition [22,23], but the computational complexity of these methods is a major barrier in clinical applications [24,25]. To overcome this barrier, researchers have been considering various ways to lessen the time and cost associated with deep learning applications. The advent of deep learning may also open several ways of combining conventional methods to overcome the various challenges deep learning brings (e.g., time, accuracy, computing power, and the quantity and characteristics of inputs) [21].
In CADx systems, the main focus is the automatic detection and classification of breast lesions. Both texture features [14-16] and shape-based features [18-20] have been employed to identify breast cancer. However, it is still a difficult job to choose appropriate features for finding cancer at its early stage [19]. The goal of our research is to enhance classification accuracy, and specifically:
a. To develop an automated CADx system to detect breast cancer accurately.
b. To introduce the marker-controlled watershed transformation for efficient segmentation.
c. To extract a hybrid feature set incorporating both shape-based and texture features to describe lesions in detail and to overcome the limitations of texture-based methods for BUS images.
To give an understanding of the proposed methodology, the rest of this article is structured as follows. Section 2 illustrates the framework of the proposed method and the implementation procedure, including a description of the employed datasets, the segmentation technique, and the feature extraction method. Section 3 reports the experimental results, and their discussion is presented in Section 4. Section 5 concludes the work with future directions.

Figure 1 illustrates the proposed CADx system for quantitative ultrasound breast images. The proposed technique comprises four phases. The first phase is preprocessing, which builds ultrasound B-mode images from the raw ultrasonic data of the OASBUD dataset using the Hilbert transform. The second phase is segmentation, which extracts the lesion from the background tissues and is performed through a robust technique called marker-controlled watershed transformation. The third phase is feature extraction, which is significant for lesion categorization. Many successful features have been employed in the literature; however, some are simply invalid for BUS images. For example, tumors are mostly dark and difficult to find in BUS images, and compact detail is required for detection; therefore, a hybrid approach combining shape and texture features is proposed. Finally, the proposed system is validated through different classifiers such as ensemble, decision tree, and KNN to differentiate between benign and malignant tumors.

Image Database
In this research work, a database called OASBUD [26] and BUS images collected at BHE [27] are used. The OASBUD dataset includes raw ultrasound data obtained from 100 patients with 52 malignant and 48 benign breast lesions. For every breast lesion, two perpendicular scans (transverse and longitudinal) were performed. A region of interest (ROI) was outlined by an expert radiologist to specify the lesion part in each scan. The OASBUD dataset comprises the fields shown in Table 1 [26].

Preprocessing
As the OASBUD dataset comprises raw data, we have used the method proposed earlier [26] to rebuild B-mode images from the raw data. First, the envelope of the ultrasonic echoes is computed through the Hilbert transform and then log-compressed to a 50 dB dynamic range, reconstructing the B-mode (brightness) image: a two-dimensional image composed of bright dots signifying the ultrasound echoes. Figure 2 displays an image of a tumor rebuilt from RF (radio-frequency) data.
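The envelope-detection and log-compression steps above can be sketched in Python. This is a minimal illustration on synthetic RF data; the axis convention (depth along axis 0) and the final rescaling for display are assumptions on our part, not the exact OASBUD reconstruction script.

```python
import numpy as np
from scipy.signal import hilbert

def rf_to_bmode(rf, dynamic_range_db=50.0):
    """Reconstruct a B-mode image from raw RF ultrasound data.

    Envelope detection via the Hilbert transform (applied per scan line
    along the axial/depth axis), followed by log compression to the
    given dynamic range, here 50 dB as in the text.
    """
    envelope = np.abs(hilbert(rf, axis=0))        # analytic-signal magnitude
    envelope /= envelope.max()                    # normalize to [0, 1]
    bmode_db = 20.0 * np.log10(envelope + 1e-12)  # convert to decibels
    bmode_db = np.clip(bmode_db, -dynamic_range_db, 0.0)
    # rescale to [0, 1] for display
    return (bmode_db + dynamic_range_db) / dynamic_range_db

# Toy RF data: a decaying sinusoid replicated over 64 scan lines.
depth = np.arange(2048)[:, None]
rf = (np.sin(2 * np.pi * 5e6 * depth / 50e6)
      * np.exp(-depth / 800.0) * np.ones((1, 64)))
img = rf_to_bmode(rf)
```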

Image Database
In this research work, a database called OASBUD [26] and BUS images collected at BHE [27] are used. The OASBUD dataset includes raw ultrasound data obtained from 100 patients with 52 malignant breast lesions and 48 benign lesions. For every breast lesion, two perpendicular scans (transverse and longitudinal) were performed. Region of interest (ROI) was outlined by an expert radiologist to specify the lesion part for each scan. The OASBUD dataset comprised of different fields as shown in Table 1 [26].

Preprocessing
As the OASBUD dataset comprises raw data, we have used the method proposed earlier [26] to rebuild B-mode images from the raw data. First, the envelope of ultrasonic echoes is computed through the Hilbert transform and then log-compressed to 50 dB dynamic range that reconstructs the B-mode (brightness) image, which is a two-dimensional image comprised of bright dots signifying the ultrasound echoes. Figure 2 displays an image of a tumor rebuilt from RF (radio-frequency) data.

Segmentation
Image segmentation is a crucial stage that is key to obtaining effective outcomes in medical imaging. The segmentation step can dispose of the subjectivity of human-drawn boundaries [28]. Basically, segmentation divides an image into homogeneous regions to precisely detect the contours of objects such as breast lesions. Assorted techniques have been proposed for image segmentation, but none of them produces results of high quality for all image types (for example, mammography, positron emission tomography (PET), computed tomography (CT), magnetic resonance imaging (MRI), and ultrasound) [29]. Thus, there is no single segmentation method that is universally accepted for all kinds of procedures. In the proposed method, marker-controlled watershed transformation [30] has been adopted for tumor segmentation because it achieves reliable performance on quantitative BUS images having speckle noise, low contrast, and weakly defined boundaries [12].

Watershed Transformation
Watershed transformation is a region-based method [31]. The idea of the watershed is based on visualizing an image as a topographic surface. Figure 3 illustrates the topographic view of an ultrasound image in which the bright portions occupy "high" altitudes and the dark portions "low" altitudes. Every local minimum of the image includes a hole through which water is supplied into the various catchment basins. The catchment basins fill with water starting from the bottom (i.e., the minima of lowest intensity), and the level continues to rise until the water attains the highest peak of the topography. Dams are built to avoid the joining of water originating from two or more local minima. Consequently, the topography is divided into regions, called catchment basins, parted by dams known as watershed lines or watersheds. The number of objects produced by the segmentation depends upon the number of local minima occurring in the image; thus, the existence of several local minima within one object creates the problem of over-segmentation.

Over-segmentation is the main drawback of watershed transformation: it creates many regions owing to the existence of spurious minima. To avoid this dilemma, automatic markers are introduced and placed inside the required regions. This process restricts the flow of water to only the basins associated with each marker. The method is referred to as marker-controlled watershed transformation, and it is a powerful technique for breast lesion segmentation [12,31].
The marker image is a binary image, also named the marker function, comprising either single or large marker regions with true logical values. Each marker specifies a particular position inside the segmentation function [12] that imposes a region to be a global minimum of the topography through the minima imposition method [33]. Therefore, this technique eliminates all inappropriate minima that are not associated with the marked areas. The minima imposition is a morphological operator [33]; more detail about it can be found in [12]. Finally, the potential lesion margins are acquired by computing the watershed transformation of the minima-imposed image, as shown in Figure 4.
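The marker-controlled scheme can be sketched with scikit-image. The gradient operator (a Sobel filter standing in for the Beucher gradient), the threshold values, and the synthetic test image below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np
from skimage.filters import sobel
from skimage.segmentation import watershed

def marker_controlled_watershed(image, internal_thresh, external_thresh):
    """Marker-controlled watershed sketch.

    Pixels darker than internal_thresh form the internal (lesion) marker,
    pixels brighter than external_thresh the external (background) marker;
    flooding the gradient image from these markers alone avoids the
    over-segmentation caused by spurious minima.
    """
    gradient = sobel(image)                 # stand-in for the Beucher gradient
    markers = np.zeros(image.shape, dtype=int)
    markers[image < internal_thresh] = 1    # internal marker: inside the lesion
    markers[image > external_thresh] = 2    # external marker: background tissue
    labels = watershed(gradient, markers)   # water flows only from the markers
    return labels == 1                      # binary lesion mask

# Synthetic test image: a dark circular "lesion" on a bright,
# speckled background (thresholds are illustrative assumptions).
rng = np.random.default_rng(0)
img = 0.8 + 0.05 * rng.standard_normal((128, 128))
yy, xx = np.mgrid[:128, :128]
img[(yy - 64) ** 2 + (xx - 64) ** 2 < 20 ** 2] = 0.2
mask = marker_controlled_watershed(img, internal_thresh=0.4, external_thresh=0.6)
```

The recovered mask is approximately the dark disk; in practice the markers would be derived automatically from the B-mode image rather than from fixed thresholds.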

Feature Extraction
After the segmentation of the breast lesion from the background, attributes or characteristics are exploited to recognize a lesion as benign or malignant. Feature extraction is a basic step to obtain the lesion properties that differentiate one lesion from another. An efficient feature extraction method can properly extract the features from the segmented image to facilitate and simplify the task of the classifiers so that more precise results can be achieved. Good features should have uniqueness, integrity, agility, abstractness, and invariance under the geometric structure [34]. It is essential to choose appropriate features and perform their correct assessment for the detection of malignancy [35]. Two types of image features are frequently used: texture features and shape features. In this research work, GLCM and shape-based features are extracted.


Grey Level Co-Occurrence Matrix (GLCM) Features
It is a popular technique to represent textural features [36] because it enhances the image details and provides a good interpretation. GLCM is based on second-order statistics, counting the co-occurrence of pairs of gray levels in an image; it is a tabulation of how often the various combinations of gray levels occur. GLCM extracts features in two steps: formation of the co-occurrence matrix and calculation of the texture features. GLCM computes the association between two neighboring pixel values through a displacement d (distance to the next neighbor, usually equal to one) and angles θ = 0°, 45°, 90°, and 135°, representing the horizontal, diagonal, vertical, and anti-diagonal orientations, respectively, as shown in Figure 5. Different statistical features are then extracted from the co-occurrence matrix, including energy (f1), contrast (f2), correlation (f3), variance (f4), sum average (f5), inverse difference moment (f6), entropy (f7), sum variance (f8), sum entropy (f9), difference entropy (f10), difference variance (f11), information measure of correlation 1 (f12), and information measure of correlation 2 (f13) [36]; dissimilarity (f14), autocorrelation (f15), maximum probability (f16), cluster shade (f17), and cluster prominence (f18) [37]; and inverse difference moment normalized (f19) and inverse difference normalized (f20) [38], as illustrated in Table 2.

Table 2. Formulas of the GLCM features (f1-f20), where Pg represents the number of different intensity levels, and αx, αy and βx, βy are the means and standard deviations of Gx and Gy.

Shape-Based Features
Visual characteristics of lesions are known as shape attributes or features, such as a triangular or circular shape, the diameter of the boundary, or the perimeter of the object border. Shape attributes can be partitioned into two groups: one based on the object boundary and the other based on region features. Various shape features can be computed, including the area, which is the number of pixels of the breast lesion, and the perimeter of the tumor, obtained by counting the boundary pixels around the lesion. Usually, lesion features such as area, perimeter, minor axis, and major axis cannot be employed independently for lesion classification, since such features are affected by the size of the lesion. In addition, scale-invariant features such as elongation, compactness, rectangularity, solidity, roundness, eccentricity, and convexity can be obtained and employed. The formulas used to extract the shape features are documented in Table 3 [39].

Table 3. Formulas of shape-based features.

Elongation = Minor Axis / Major Axis: used to measure the object length relative to its width.
Compactness = Perimeter^2 / Area: shows the level of irregularity of the lesion contour.
Solidity = Area / Convex_Area: used to measure the density of the lesion.
Rectangularity = Area / (Major Axis × Minor Axis): explains the resemblance of the lesion shape to a rectangular shape.
Roundness = 4π × Area / Perimeter^2: the ratio between the lesion area and the area of a circle with the same perimeter.
Convexity = Convex_Perimeter / Perimeter: the perimeter ratio between the convex hull of the lesion and the lesion itself.
Eccentricity = Focal Distance / Major Axis: the proportion of the distance between the ellipse foci to the major axis length.
Illustrations of some features, such as area, major axis, and minor axis, are shown in Figure 6 for clarity. GLCM uses the arrangement of grey levels and their positions; its statistical characteristics allow faster identification of several varieties of lesions, whereas shape-based features contain all significant properties of an object in a compact descriptor. Employing only one feature type limits the descriptive power in terms of classification performance, while their combination constructs a distinct feature descriptor presenting an optimal representation of the lesion. Thus, the texture features and the shape-based features documented in Table 4 are merged to develop a hybrid feature set. The hybrid feature set is the concatenation of the GLCM and shape-based features, providing a total of 27 features.
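The hybrid descriptor is a plain concatenation of the two groups; a minimal sketch with hypothetical placeholder vectors (not features from a real image):

```python
import numpy as np

# Hypothetical per-lesion vectors: 20 GLCM values and 7 shape values.
glcm_vec = np.arange(20, dtype=float)
shape_vec = np.arange(7, dtype=float)

# The hybrid feature set concatenates the two groups into one descriptor.
hybrid = np.concatenate([glcm_vec, shape_vec])
```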

Classification
After extraction of the hybrid feature set, classification is performed to segregate the benign lesions from the malignant ones. Well-known classifiers, namely KNN, decision tree (DT), and an ensemble classifier, were applied with different parameters to attain promising accuracy, as analyzed below. We analyzed longitudinal and transverse views separately and applied 10-fold cross-validation [40] to each view. In the 10-fold cross-validation technique, the classifier is trained on the union of nine folds and tested on the remaining fold; this process continues until the last iteration, and the overall classification accuracy is computed over all folds. Binary classification generates four possible outcomes: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). TP occurs when a malignant lesion is recognized by the model as malignant, whereas TN arises when a benign tumor is recognized as benign. On the other side, FP occurs when a benign tumor is recognized as malignant, whereas FN arises when a malignant tumor is recognized as benign, as presented in Figure 7.
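This evaluation protocol can be sketched with scikit-learn on synthetic data. Boosted decision trees (AdaBoost) stand in for RUSBoost here, since the actual RUSBoost implementation, which combines boosting with random under-sampling, ships with the separate imbalanced-learn package; the data below are hypothetical, not the OASBUD or BHE features.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the 27-dimensional hybrid feature set,
# with a near-balanced benign/malignant split as in OASBUD.
X, y = make_classification(n_samples=200, n_features=27, n_informative=10,
                           weights=[0.48, 0.52], random_state=0)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
models = {
    "knn": KNeighborsClassifier(n_neighbors=5),
    "tree": DecisionTreeClassifier(random_state=0),
    # Boosted decision trees as a stand-in for the RUSBoost ensemble.
    "ensemble": AdaBoostClassifier(n_estimators=100, random_state=0),
}
# Mean 10-fold cross-validated accuracy per classifier.
acc = {name: cross_val_score(m, X, y, cv=cv).mean() for name, m in models.items()}
```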

The performance of the proposed model is evaluated through standard metrics comprising accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
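These five metrics follow directly from the four confusion-matrix counts; a minimal sketch with hypothetical counts chosen only for illustration:

```python
def diagnostic_metrics(tp, tn, fp, fn):
    """Standard diagnostic measures from the four confusion-matrix counts."""
    return {
        "accuracy":    (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "specificity": tn / (tn + fp),  # true-negative rate
        "ppv":         tp / (tp + fp),  # positive predictive value
        "npv":         tn / (tn + fn),  # negative predictive value
    }

# Hypothetical counts for a 100-lesion test set.
m = diagnostic_metrics(tp=50, tn=46, fp=2, fn=2)
```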

Results
In this study, we have used the OASBUD dataset [26]. This dataset involves 52 malignant and 48 benign images. Each lesion of the OASBUD dataset consists of two scans, one longitudinal and the other transverse. A second dataset of BUS images collected at BHE [27], including 210 malignant and 437 benign lesions, is also employed. The segmented part of the breast image exposes the cancer-affected portion; this malignant region must be extracted from the background tissues. The watershed transformation algorithm based on the marker function and minima imposition is used for the segmentation of the BUS images. The hybrid feature set of GLCM and shape-based features was used to assess our model for correct classification of breast lesions. The outcomes are concisely reviewed based on the classification performance, and the corresponding results for the longitudinal and transverse scans of the OASBUD dataset are given in Tables 5 and 6, respectively, while the results of the BUS images collected at BHE are reported in Table 7. As shown in Tables 5-7, all diagnostic measures are estimates and are presented with confidence intervals [41]. The proposed system using hybrid features with the ensemble classifier achieves 97% accuracy with a CI of 91.48% to 99.38% for the longitudinal scan of the OASBUD dataset, as presented in Table 5, which illustrates the impact of hybrid features on the correct classification of lesions. Furthermore, the high specificity (97.87% with CI of 88.71% to 99.95%) and sensitivity (96.23% with CI of 87.02% to 99.54%) indicate that the hybridization of GLCM and shape-based features successfully yields a more distinct judgment between the two classes, benign and malignant. In Table 6, an accuracy of 95% with CI of 88.72% to 98.36% is observed for the transverse scan.
Similarly, Table 7 shows the highest accuracy (96.6% with CI of 94.90% to 97.86%) for BUS images collected at BHE. Furthermore, high values of specificity (97.70% with CI of 95.81% to 98.89%) and NPV (97.25% with CI of 95.34% to 98.40%) are observed.
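The reported intervals are consistent with exact (Clopper-Pearson) binomial confidence intervals; assuming that is the method behind [41], the sketch below reproduces the 91.48% to 99.38% interval quoted for 97 correct classifications out of the 100 OASBUD lesions:

```python
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided binomial CI for k successes in n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

lo, hi = clopper_pearson(97, 100)  # 97% accuracy on 100 lesions
print(f"95% CI: {100*lo:.2f}% to {100*hi:.2f}%")
# → 95% CI: 91.48% to 99.38% (the interval reported in Table 5)
```

The Table 6 interval matches as well: `clopper_pearson(95, 100)` gives 88.72% to 98.36%.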
As shown in Tables 5-7, the ensemble classifier offered superior results compared to the others. The prime reason is that the ensemble algorithm combines multiple learners to attain greater discriminative power than any single classifier [42]. The ensemble of decision trees with RUSBoost performs well because the generalization error remains limited as the number of grown trees increases. RUSBoost employs a mix of random under-sampling and boosting to improve performance [43]. Ensemble classifiers perform better; however, they consume more time than the decision tree and KNN.
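The RUSBoost idea, random under-sampling of the majority class combined with boosted decision trees, can be approximated with scikit-learn primitives. This is an illustrative sketch of the concept, not the implementation used in the study; the synthetic data, class sizes, and parameters are all assumptions:

```python
import numpy as np
from sklearn.ensemble import AdaBoostClassifier

rng = np.random.default_rng(0)

# Synthetic, imbalanced toy data: 200 "benign" vs 40 "malignant" samples.
X = np.vstack([rng.normal(0.0, 1.0, size=(200, 5)),
               rng.normal(2.0, 1.0, size=(40, 5))])
y = np.array([0] * 200 + [1] * 40)

# RUS step: keep all minority samples plus an equally sized
# random subset of the majority class.
maj_idx = rng.choice(np.where(y == 0)[0], size=(y == 1).sum(), replace=False)
keep = np.concatenate([maj_idx, np.where(y == 1)[0]])
X_bal, y_bal = X[keep], y[keep]

# Boost step: boosted decision stumps on the balanced sample.
clf = AdaBoostClassifier(n_estimators=50, random_state=0).fit(X_bal, y_bal)
acc = clf.score(X, y)  # accuracy on the full, imbalanced set
```

Note that the full RUSBoost algorithm repeats the under-sampling inside every boosting round; the one-shot resampling here only sketches the idea.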
It should be noted that the results also depend upon the nature of the domain, the dataset size, and the selection of features; certain classifiers perform better on some types of applications or data than others. The overall classification accuracies of the employed classifiers, that is, ensemble, KNN, and decision tree, are 97%, 94%, and 88% for the longitudinal scan and 95%, 93%, and 85% for the transverse scan of the OASBUD dataset, respectively, as shown in Figure 8. Similarly, the classification accuracies observed for BUS images collected at BHE using the ensemble, decision tree, and KNN are 96.6%, 95.83%, and 95.36%, respectively. These results confirm that the ensemble decision tree model with RUSBoost offers the best accuracy among all the employed classifiers.

Discussion
BUS images encompass speckle noise and other artifacts. Artifacts in BUS images arise from the characteristics of ultrasound itself and fall into four major classes: degraded images, missing structures, mis-registered locations, and falsely perceived objects [44]. Such artifacts may lead to unnecessary clinical intervention, and they also reduce the efficacy of a CADx framework that relies on texture-based feature extraction. Some texture extraction techniques, such as LBP and GLCM, are very sensitive to these factors; therefore, texture analysis alone does not accurately portray the lesions in BUS images, whereas shape-based features describe the properties of a lesion in a compact manner. In this paper, a hybrid feature set is employed to produce more robust features that accurately characterize breast lesions and enhance the performance of the CADx system. Furthermore, ensemble classifiers combine individual classifiers, for example decision trees, to boost predictive capability. Decision trees are particularly suitable for the ensemble method because they are fast to train. In an ensemble, multiple learners are trained, weighted, and then merged to obtain a better result than any individual classifier. This method adopts the 'wisdom of the crowd' idea, in which numerous estimates are gathered and evaluated before a final decision is made. The most widely used approach for building ensembles is boosting, a common technique for enhancing the performance of a weak learner, for example a decision tree [43].
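The GLCM texture descriptors discussed above can be computed in a few lines of NumPy. This minimal sketch builds a co-occurrence matrix for a single horizontal offset and derives three common Haralick-style statistics; the quantization level and offset are illustrative choices, not the paper's exact configuration:

```python
import numpy as np

def glcm_features(img, levels=8):
    """GLCM for the horizontal offset (0, 1) plus contrast, energy,
    and homogeneity derived from it."""
    # Quantize intensities into `levels` bins (max value maps below `levels`).
    q = (img.astype(float) * levels / (img.max() + 1)).astype(int)
    glcm = np.zeros((levels, levels))
    for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
        glcm[a, b] += 1              # count horizontally adjacent pairs
    glcm /= glcm.sum()               # normalize to joint probabilities
    i, j = np.indices(glcm.shape)
    return {
        "contrast":    np.sum(glcm * (i - j) ** 2),
        "energy":      np.sum(glcm ** 2),
        "homogeneity": np.sum(glcm / (1.0 + np.abs(i - j))),
    }
```

A perfectly uniform patch gives contrast 0 with energy and homogeneity 1, while a patch of alternating dark and bright columns gives a large contrast, which is exactly why speckle noise perturbs these descriptors and motivates the hybrid feature set.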
In this paper, we concentrated on enhancing the efficacy of the CADx system. In the research community, several techniques and methodologies have been proposed to accurately separate benign lesions from malignant ones. A comparative analysis is provided in Table 8, presenting the performance benchmark of the proposed system. Nugroho et al. [45] proposed active contours without edges for segmentation, performed texture and geometry analysis for feature extraction, and achieved 91.3% accuracy using SVM. Moon et al. [46] employed fuzzy c-means clustering for segmentation, carried out feature analysis using echogenicity and morphology, and achieved 92.50% sensitivity for malignant lesions using binary logistic regression. B. Singh et al. [47] performed shape-based analysis and achieved 84.6% accuracy using an ANN. The proposed CADx system delivers better performance than [45][46][47] in terms of accuracy and sensitivity, owing to better segmentation, hybrid feature extraction capturing the shape and size of lesions, and the ensemble classification method.

Conclusions and Future Directions
For an effective diagnosis of breast lesions using ultrasonography images, expert radiologists mostly search for distinguishing attributes of the lesion, which can be characterized as notable discrepancies from normal patterns. Such expertise is difficult to transfer to an automatic system due to inconsistencies in breast tissue, i.e., the presence of speckle noise and weakly defined edges. To deal with such scenarios, a hybrid feature vector of texture and shape-based features is used to capture all significant and discriminative properties of a lesion. Furthermore, lesion segmentation is performed through marker-controlled watershed transformation to avoid over-segmentation. The significance of the proposed model is evaluated on the OASBUD dataset and BUS images collected at BHE. This work has shown that an ensemble model combining multiple learners achieves 97% accuracy, improving the performance of the CADx system.
A limitation of this study is that no feature selection process was employed, since it might compromise the accuracy of the critical task of lesion detection. Our future research will concentrate on the categorization of ultrasonography images according to multiclass BI-RADS levels.