*Algorithms* **2010**, *3*(1), 44-62; https://doi.org/10.3390/a3010044

Article

Breast Cancer Detection with Gabor Features from Digital Mammograms

Alcorn State University, Alcorn State, MS 39096, USA

Received: 28 October 2009; in revised form: 14 January 2010 / Accepted: 14 January 2010 / Published: 19 January 2010

## Abstract

A new breast cancer detection algorithm, named the “Gabor Cancer Detection” (GCD) algorithm, utilizing Gabor features is proposed. The GCD algorithm involves three major steps: preprocessing, segmentation (generating alarm segments), and classification (reducing false alarms). In preprocessing, a digital mammogram is down-sampled, quantized, denoised and enhanced. Nonlinear diffusion is used for noise suppression. In segmentation, a band-pass filter, termed the “Circular Gaussian Filter” (CGF), is formed by rotating a 1-D Gaussian filter (off center) in frequency space. A CGF can be uniquely characterized by specifying a central frequency and a frequency band. A mass or calcification is a space-occupying lesion and usually appears as a bright region on a mammogram. The alarm segments (suspected masses or calcifications) can be extracted using a threshold that is adaptively determined from the histogram analysis of the CGF-filtered mammogram. In classification, a Gabor filter bank is formed with five bands by four orientations (horizontal, vertical, 45 and 135 degrees) in Fourier frequency domain. For each mammographic image, twenty Gabor-filtered images are produced. A set of edge histogram descriptors (EHD) is then extracted from the 20 Gabor images for classification. An EHD signature is computed with four orientations of Gabor images along each band, and the five EHD signatures are then joined together to form an EHD feature vector of 20 dimensions. With the EHD features, the fuzzy C-means clustering technique and k-nearest neighbor (KNN) classifier are used to reduce the number of false alarms. The experimental results on the DDSM database (University of South Florida) show the promise of the GCD algorithm in breast cancer detection: it achieved TP (true positive rate) = 90% at FPI (false positives per image) = 1.21 in mass detection, and TP = 93% at FPI = 1.19 in calcification detection.

Keywords: breast cancer; computer-aided detection (CAD); Gabor filter; edge histogram descriptor (EHD); mammography screening; mass detection; calcification detection

## 1. Introduction

The lifetime risk of developing breast cancer among American women is 12.7% (one in eight, http://www.cancer.gov/cancertopics/factsheet/Detection/probability-breast-cancer, from the National Cancer Institute), exceeded only by lung cancer. Mammography is considered the most effective technology presently available for breast cancer screening. With digital or digitized mammograms, computer-aided detection (CAD) has proven to be a very useful tool for radiologists. A successful CAD system will eventually facilitate the development of a computer-aided diagnosis (CADx) system that may greatly reduce radiologists’ workload and benefit patients. There are three types of breast lesions according to the ACR BI-RADS® lexicon: mass, calcification and architectural distortion. In this paper, all discussions focus on mass detection and calcification detection, which are briefly reviewed as follows.

A mass is defined as a space-occupying lesion seen in at least two different projections [1]. Masses are described by their shape (Round, Oval, Lobulated, Irregular) and margin characteristics (Circumscribed, Microlobulated, Obscured, Ill-Defined, Spiculated). On mammograms, mass areas usually appear brighter than healthy tissues. However, the patterns of mass lesions are hard to define with simple features such as intensities or gradients because of huge variations among individuals. For example, masses are quite difficult to recognize in dense breasts. Therefore, many advanced features have been proposed in the literature to identify mass lesions from screening mammograms. In general, neighborhood or region texture features are generated by jointly considering differences of orientations and correlation of scales. Kegelmeyer et al. [2] developed a method to detect spiculated masses using a set of five features for each pixel. They used the standard deviation of a local edge orientation histogram (i.e., analysis of local oriented edges, ALOE) and a subset of Laws’ texture features (of four dimensions). To address the problem of varying mass size, Liu et al. [3] proposed a multi-resolution algorithm using the discrete wavelet transform, based on Kegelmeyer et al.’s work. Matsubara et al. [4] presented an adaptive thresholding technique for the detection of masses. Qian et al. [5] developed a multi-resolution and multi-orientation wavelet transform for the detection of masses and spiculation analysis. They observed that traditional wavelet transforms cannot extract directional information, which is crucial for a spiculation detection task.

Calcifications are small calcium deposits that form in the breast as a result of benign or malignant processes. Mammographically, they appear as bright white spots of various sizes and shapes. The important characteristics of calcifications are their size, shape or morphology, number, and distribution. Malignant calcifications tend to be numerous, clustered, small, varying in size and shape, angular, irregularly shaped, and branching in orientation [6,7]. On the other hand, calcifications associated with benign diseases are generally larger, more rounded, smaller in number, more diffusely distributed, and more homogeneous in size and shape. Yu et al. [8] used a wavelet filter for the detection of microcalcifications, and a Markov random field (MRF) model to obtain textural features from the neighborhood of every detected calcification. The MRF-based textural features and three statistical textural features (the mean, variance, and a measure of edge density), were used to reject false positives. Soltanian-Zadeh et al. [9] compared four groups of features: multi-wavelet-based features, wavelet-based features, Haralick’s texture features [10], and shape features. The microcalcifications were first segmented using an automated method, and then the forementioned features were extracted. Within each group, a feature-selection procedure based on genetic algorithms was employed to identify the most-suitable features for use with a k-nearest-neighbor classification scheme. It was observed that the multi-wavelet features yielded the best performance (evaluated with the area under the ROC curve), followed by the shape features, in distinguishing malignant microcalcifications from the benign categories. Bhangale et al. [11] and Rogova et al. [12] used a set of Gabor filters to process mammograms. By changing the center frequencies of Gabor filters, this method could transform the original images into different scales and orientation spaces. 
The filtered images are divided into small non-overlapping blocks. For each block, the mean and standard deviation of the intensities are calculated and a feature vector is formed. Bhangale et al. [11] used a k-means clustering classifier to reduce false positives.

In this paper, we aim at general cancer detection for both masses and calcifications regardless of their sizes, shapes or margins, and present a new CAD algorithm, named the “Gabor cancer detection” (GCD) algorithm. Specifically, the alarm segments are generated by using a set of band-pass filters, termed as “Circular Gaussian Filter” (CGF). The number of false alarms is reduced by using the Gabor features. The proposed GCD algorithm is validated with a subset of DDSM database (University of South Florida) [13]. The rest of this paper is organized as follows. The GCD algorithm is fully described in Section 2. The experiments and discussions are given in Section 3. Finally, conclusions are made in Section 4.

## 2. GCD Algorithm

As shown in Figure 1, the GCD algorithm has three main steps: preprocessing, segmentation (for generating alarms), and classification (for reducing false alarms). They are elaborated in the following three subsections.

#### 2.1. Mammogram Preprocessing

The goal of preprocessing is to prepare for the next two steps, segmentation and classification. Down-sampling, quantization, ROI (region of interest) extraction, denoising and enhancement are done in the preprocessing step.

To reduce the computational load without losing much sensitivity, original mammograms are down-sampled by a factor of 4 in each dimension (i.e., the new image size is reduced to 1/16 of its original size) and quantized down to 8 bits per pixel (256 gray levels). The digitized mammograms in the DDSM database are of high resolution and high fidelity. For example, the mammograms scanned by LumiSys in the DDSM database have 12 bits per pixel and 50 microns per pixel, and a typical mammogram has a resolution of 6,000 by 4,000 pixels.
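The down-sampling and quantization step can be sketched as follows. This is only an illustration under stated assumptions: the paper does not say how the down-sampling is performed, so block averaging is assumed here, and the 12-bit to 8-bit quantization is assumed to be a plain bit shift. The function name `downsample_and_quantize` is hypothetical.

```python
def downsample_and_quantize(image, factor=4, in_bits=12, out_bits=8):
    """Down-sample a 2-D list of integer gray levels and reduce its bit depth.

    Assumptions (not stated in the paper): each factor-by-factor block is
    replaced by its integer mean, and quantization drops the low-order bits.
    """
    shift = in_bits - out_bits
    rows = len(image) // factor
    cols = len(image[0]) // factor
    result = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            block = [image[r * factor + i][c * factor + j]
                     for i in range(factor) for j in range(factor)]
            mean = sum(block) // len(block)   # integer mean of the block
            out_row.append(mean >> shift)     # e.g. 12-bit value -> 8-bit value
        result.append(out_row)
    return result


# An 8 x 8 image at the 12-bit maximum becomes a 2 x 2 image at the 8-bit maximum.
small = downsample_and_quantize([[4095] * 8 for _ in range(8)])
```

With a factor of 4 in each dimension, the number of pixels drops to 1/16, which matches the size reduction quoted above.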

The region of interest on a mammographic image (breast ROI) is extracted to reduce the processing time (by ignoring the dark areas). First, a mammogram (**I**_{S}) is mapped onto a special target image (**I**_{M}) with Equation (1) by specifying (**μ**_{T}, **σ**_{T}) = (128, 0.75). Then the mapped image (**I**_{M}) is binarized with a threshold (**μ**_{T}). Next, the largest 8-connected object is considered as the binary mask of breast ROI. Finally, morphological closing operations can fill holes inside the ROI:

$$I_{M}=(I_{S}-\mu_{S})\cdot \frac{\sigma_{T}}{\sigma_{S}}+\mu_{T} \tag{1}$$

where **I**_{M} is the mapped (target) image, **I**_{S} is the source (original) mammographic image; **μ** and **σ** denote mean and standard deviation, respectively; the subscripts ‘S’ and ‘T’ refer to the source (original) and target (mapped) images.

**Figure 1.** The flow chart of GCD algorithm: Three steps, preprocessing, segmentation and classification, are shown in three columns, respectively. “Save/Read the Preprocessed image” (within dashed rectangles) may be omitted in a continuous process. P_{C} is the cancerous probability of the alarm segment being analyzed.
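A minimal sketch of the ROI extraction described above (the mapping of Equation (1), binarization at μ_T, and selection of the largest 8-connected object) is given below. It assumes a non-constant image stored as a 2-D list, uses the population standard deviation, and omits the final morphological closing; the function names are hypothetical and the code is not the authors' implementation.

```python
from collections import deque
import statistics


def map_to_target(image, mu_t=128.0, sigma_t=0.75):
    """Linear mapping of Equation (1): shift and scale to the target mean and std."""
    pixels = [p for row in image for p in row]
    mu_s = statistics.fmean(pixels)
    sigma_s = statistics.pstdev(pixels)  # assumes the image is not constant
    return [[(p - mu_s) * sigma_t / sigma_s + mu_t for p in row] for row in image]


def largest_component(mask):
    """Return the pixel set of the largest 8-connected object in a boolean mask."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    best = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            seen[r][c] = True
            queue, component = deque([(r, c)]), []
            while queue:  # breadth-first flood fill over the 8 neighbours
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] and not seen[ny][nx]):
                            seen[ny][nx] = True
                            queue.append((ny, nx))
            if len(component) > len(best):
                best = component
    return set(best)


def breast_roi(image, mu_t=128.0, sigma_t=0.75):
    """Map, binarize at mu_t, and keep the largest 8-connected object."""
    mapped = map_to_target(image, mu_t, sigma_t)
    mask = [[value > mu_t for value in row] for row in mapped]
    return largest_component(mask)
```

Because the mapping is linear with a positive slope, thresholding the mapped image at μ_T selects the same pixels as thresholding the source image at its own mean μ_S.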

Nonlinear diffusion is performed to suppress noise while retaining edges. Nonlinear diffusion methods have proven to be powerful for denoising and smoothing image intensities while retaining and enhancing edges. Such an image smoothing process can be summarized as a successive coarsening of any given image while certain structures in that image are retained on a fine scale. Nonlinear diffusion is closely connected to a specific kind of multiscale analysis referred to as scale-space [14,15], and was first used for image smoothing with simultaneous edge enhancement [16]. In addition, Barash et al. [17] have proven that nonlinear diffusion is equivalent to adaptive smoothing. Basically, diffusion is a PDE (partial differential equation) method that involves two operators, smoothing and gradient, in 2D image space. The diffusion process smooths the regions with lower gradients, whereas it stops smoothing at region boundaries with higher gradients. In other words, the diffused result is a nonlinear function of local gradients. Weickert et al. [18] presented a semi-implicit scheme with an “additive operator splitting” (AOS) implementation for nonlinear diffusion filtering, which is stable for all time steps (t >> 0.25) and guarantees equal treatment of all coordinate axes. The AOS scheme is at least ten times more efficient than the widely used explicit schemes (with limited time step, t ≤ 0.25). In our experiments, the AOS scheme is implemented for mammogram diffusion, with parameters specified empirically to achieve a good balance between noise removal and detail retention.
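To illustrate how nonlinear diffusion smooths flat regions but stops at strong edges, the sketch below uses the simple explicit scheme with a Perona-Malik-type diffusivity. It is not the AOS scheme used in the paper: the explicit scheme is chosen only because it is short, and it is restricted to small time steps (tau ≤ 0.25), as noted above. The parameter names `kappa` and `tau` and their default values are assumptions for illustration.

```python
def diffuse(image, iterations=10, kappa=10.0, tau=0.2):
    """Explicit nonlinear diffusion on a 2-D list of gray levels.

    kappa is the contrast parameter: differences much larger than kappa are
    treated as edges and barely smoothed. tau is the time step (<= 0.25).
    """
    rows, cols = len(image), len(image[0])
    current = [[float(p) for p in row] for row in image]
    for _ in range(iterations):
        updated = [row[:] for row in current]
        for r in range(rows):
            for c in range(cols):
                flux = 0.0
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr = min(max(r + dr, 0), rows - 1)  # replicate the border
                    cc = min(max(c + dc, 0), cols - 1)
                    diff = current[rr][cc] - current[r][c]
                    g = 1.0 / (1.0 + (diff / kappa) ** 2)  # small across edges
                    flux += g * diff
                updated[r][c] = current[r][c] + tau * flux
        current = updated
    return current


# A step edge (0 | 100) with one noisy pixel on the dark side.
noisy = [[0.0] * 4 + [100.0] * 4 for _ in range(6)]
noisy[2][1] = 5.0
smoothed = diffuse(noisy)
```

In this toy example the isolated noisy pixel is flattened toward its neighbours, while the step between the dark and bright halves is left almost intact, because the diffusivity `g` is close to zero for large differences.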

Image enhancement is intended to benefit the CAD algorithm, not necessarily to favor visual inspection, and is achieved with a “thresholded histogram matching”. Histogram matching (also referred to as histogram specification) is usually used to enhance an image when histogram equalization fails [19]. Given the shape of the histogram that we want the enhanced image to have, histogram matching can generate a processed image that has the specified histogram. In particular, by specifying the histogram of a Gaussian distribution and by designating a threshold (T_{HM}), histogram matching is employed to enhance mammograms. During histogram matching, any pixel whose intensity value is less than T_{HM} is kept unchanged, as if it were protected from intensity (gray level) changes. Such a thresholded histogram matching retains the black background of medical images, which is useful for the subsequent alarm generation.

#### 2.2. Segmentation by Circular Gaussian Filter

Segmentation is one of the key steps in a successful breast CAD algorithm. Without segmentation, features must be extracted over the entire mammogram, which is computationally expensive and usually results in poor classification. The goal of segmentation is to find all suspicious regions, which should contain as many cancers (masses or calcifications) as possible; the false positives will later be excluded with a trained classifier using additional features extracted from the suspicious segments. We propose to detect mass or calcification regions using a set of band-pass filters formed by rotating a 1-D Gaussian filter (off center) in frequency space, termed the “Circular Gaussian Filter” (CGF; refer to Equation (2) and Figure 2). A CGF can be uniquely characterized by specifying a central frequency (f) and a frequency band (σ). A mass or calcification is a space-occupying lesion and usually appears as a bright region on a mammogram. On the mammograms filtered with a set of CGFs, the highlighted regions correspond to mass or calcification segments. Consequently, the suspicious mass or calcification segments can be extracted using a threshold adaptively determined from histogram analysis. Typically, the CGF parameters of (f = [12 24 48], σ = [6 12 24]) produce promising segmentation results.

In Fourier frequency domain, a Circular Gaussian Filter (CGF) is defined as follows:

$$CGF(u,v)=\frac{1}{2\pi \sigma^{2}}e^{-\frac{u_{1}^{2}+v_{1}^{2}}{2\sigma^{2}}} \tag{2}$$

where:

$$\begin{aligned}
u_{1}&=(u-f\cos\theta)\cos\theta-(v-f\sin\theta)\sin\theta\\
v_{1}&=(u-f\cos\theta)\sin\theta+(v-f\sin\theta)\cos\theta,\quad \theta=0\sim 2\pi
\end{aligned} \tag{3}$$

and f specifies a central frequency and σ defines a frequency band.

**Figure 2.** Illustration of CGF with f = 12 and σ = 6 (only the central part of CGF is presented): (a) CGF in frequency domain; (b) CGF in spatial domain; (c) A central slice of (a); (d) A central slice of (b).
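The CGF transfer function can be sketched as follows, under one stated assumption: "rotating" the off-center Gaussian over θ = 0 to 2π is read here as keeping the largest Gaussian value over all θ. Since u₁² + v₁² is the squared distance from (u, v) to the point (f cos θ, f sin θ), that maximum depends only on the distance between the radius √(u² + v²) and f, which gives a ring-shaped band-pass filter. The function names are hypothetical; `cgf_by_rotation` is a brute-force check of the closed form.

```python
import math


def cgf(u, v, f, sigma):
    """Ring-shaped CGF: a Gaussian of width sigma centred on the radius f."""
    radial_offset = math.hypot(u, v) - f
    return (math.exp(-radial_offset ** 2 / (2.0 * sigma ** 2))
            / (2.0 * math.pi * sigma ** 2))


def cgf_by_rotation(u, v, f, sigma, steps=3600):
    """Brute force: maximum over theta of the off-centre Gaussian of Equation (2)."""
    best = 0.0
    for i in range(steps):
        theta = 2.0 * math.pi * i / steps
        du = u - f * math.cos(theta)
        dv = v - f * math.sin(theta)
        u1 = du * math.cos(theta) - dv * math.sin(theta)
        v1 = du * math.sin(theta) + dv * math.cos(theta)
        best = max(best, math.exp(-(u1 ** 2 + v1 ** 2) / (2.0 * sigma ** 2)))
    return best / (2.0 * math.pi * sigma ** 2)


# The three (f, sigma) pairs quoted in the text.
CGF_PARAMETERS = list(zip([12, 24, 48], [6, 12, 24]))
```

Filtering a mammogram with such a CGF would multiply its centered 2-D Fourier spectrum by `cgf(u, v, f, sigma)` at every frequency sample and transform back; that step is left out here to keep the sketch dependency-free.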

Alarm pixels and alarm segments are generated with the following procedures:

(1) Alarm pixels are produced by thresholding three CGF-filtered images (I_{Fm}) pixel-by-pixel. The alarm threshold (T_{Am}) is determined by histogram analyses. For each of three CGF-filtered images (I_{Fm}, m = 1, 2, 3), initialize a corresponding alarm image, I_{Am}, with zero pixel values, and then:

- (a) Compute the histogram and accumulated histogram: H_{Fm} and AH_{Fm}.
- (b) Find the locations of peaks in H_{Fm} by using histogram gradient changes (of sign pattern [+ + -]): {LP_{1}, LP_{2}, … LP_{q}}; this set is assumed to be in order from the lowest (LP_{1}) to the highest (LP_{q}) gray level.
- (c) Choose the candidates of alarm threshold: T_{k} = {LP_{i} | when (the selected alarm area) < (10% entire breast ROI area); i = 1 ~ q}, k = p, p+1, …, q (2 ≤ p ≤ q). Use AH_{Fm} to calculate the selected alarm area.
- (d) Let the alarm threshold be one of {T_{k}; k = p ~ q}, i.e., T_{Am} = T_{l}, p ≤ l ≤ q, such that $\left|LP_{l}-LP_{l-1}\right|$ is the maximum among {$\left|LP_{k}-LP_{k-1}\right|$; k = p ~ q}.
- (e) Mark a pixel at (x, y) as a candidate of alarm pixel if I_{Fm}(x, y) > T_{Am} by assigning I_{Am}(x, y) = 4 – m, where m = 1, 2, 3.
- (f) A pixel at (x, y) is considered as an alarm pixel if $\sum_{m=1}^{3}I_{Am}(x,y)\ge 4$.

(2) Alarm segments are aggregated from alarm pixels with morphological and geometric process as follows:

- (a) Use morphological opening or filling to break segments or fill holes.
- (b) Enumerate all 4-connection segments.
- (c) Remove small alarm segments whose area is less than 9 pixels.
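Steps (1)(a) to (1)(f) above can be sketched as follows. Several details are assumptions made for this illustration and are not stated in the paper: the histogram is taken over ROI pixels only, a peak is the bin reached after two consecutive positive gradients and followed by a negative one, and the "selected alarm area" for a candidate threshold is the number of pixels strictly brighter than it. The function names are hypothetical.

```python
def find_peaks(hist):
    """Step (b): bins where the histogram gradient shows the pattern [+ + -]."""
    grad = [hist[i + 1] - hist[i] for i in range(len(hist) - 1)]
    return [i for i in range(2, len(grad))
            if grad[i - 2] > 0 and grad[i - 1] > 0 and grad[i] < 0]


def alarm_threshold(hist, max_fraction=0.10):
    """Steps (a), (c), (d): pick the peak that keeps < 10% of the ROI area and
    lies farthest from the previous peak. Returns None if no peak qualifies."""
    total = sum(hist)
    accumulated, running = [], 0
    for count in hist:  # accumulated histogram AH
        running += count
        accumulated.append(running)
    peaks = find_peaks(hist)
    best_level, best_gap = None, -1
    for idx in range(1, len(peaks)):  # p >= 2, so a previous peak exists
        level = peaks[idx]
        area_above = total - accumulated[level]
        gap = peaks[idx] - peaks[idx - 1]
        if area_above < max_fraction * total and gap > best_gap:
            best_level, best_gap = level, gap
    return best_level


def is_alarm_pixel(above_threshold):
    """Steps (e), (f): above_threshold[m-1] tells whether I_Fm(x, y) > T_Am.
    Filter m contributes 4 - m; the pixel is an alarm if the sum is >= 4."""
    score = sum(4 - m for m, flag in enumerate(above_threshold, start=1) if flag)
    return score >= 4


# Toy 16-bin histogram with peaks at bins 3, 7 and 12.
toy_hist = [0, 5, 40, 80, 40, 20, 30, 50, 20, 5, 2, 4, 6, 3, 1, 0]
```

The weights 3, 2, 1 in step (e) mean that a pixel is kept only when the first filter (m = 1) fires together with at least one other filter; the second and third filters alone reach a sum of 3 and are rejected.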

The overlapping area between alarm segments and overlays (ground truths) can be easily calculated (refer to Equation (9c)), which is an important measure of segmentation performance.

#### 2.3. Classification with Gabor Features

Gabor filters have been used in many applications, such as texture segmentation, target detection, edge detection, retina identification, image coding and image representation [20]. Gabor filters have received considerable attention because the characteristics of certain cells in the visual cortex of some mammals can be approximated by these filters. Further, biological research suggests that the primary visual cortex performs a similar orientational and Fourier space decomposition [21], so they seem to be sensible for a technical vision system. In addition, these 2D band-pass filters have been shown to possess optimal localization properties in both the spatial and frequency domains and are thus well suited for extracting edges or features of an image lying in a specific frequency range and orientation.

A Gabor filter can be viewed as a sinusoidal plane of particular frequency and orientation, modulated by a Gaussian envelope. It can be written as:

$$g(x,y)=e^{-\frac{1}{2}\left[\frac{x^{2}}{\sigma_{x}^{2}}+\frac{y^{2}}{\sigma_{y}^{2}}\right]}e^{-j2\pi (u_{0}x+v_{0}y)} \tag{4}$$

In Fourier frequency domain, the filter’s response consists of two 2D Gaussian functions (due to the conjugate symmetry on the spectrum):

$$G(u,v)=G_{1}+G_{2}=e^{-\frac{1}{2}\left[\frac{u_{1}^{2}}{\sigma_{u}^{2}}+\frac{v_{1}^{2}}{\sigma_{v}^{2}}\right]}+e^{-\frac{1}{2}\left[\frac{u_{2}^{2}}{\sigma_{u}^{2}}+\frac{v_{2}^{2}}{\sigma_{v}^{2}}\right]} \tag{5}$$

where σ_{u} = 1/(2πσ_{x}) and σ_{v} = 1/(2πσ_{y}) are the standard deviations along two orthogonal directions (which determine the width of the Gaussian envelope along the x- and y-axes in spatial domain), and the origin of the Fourier transform is assumed to have been centered. The intermediate variables are defined as follows:

$$\begin{aligned}
u_{1}&=(u-f\cos\theta)\cos\theta+(v-f\sin\theta)\sin\theta\\
v_{1}&=-(u-f\cos\theta)\sin\theta+(v-f\sin\theta)\cos\theta\\
u_{2}&=(u+f\cos\theta)\cos\theta+(v+f\sin\theta)\sin\theta\\
v_{2}&=-(u+f\cos\theta)\sin\theta+(v+f\sin\theta)\cos\theta
\end{aligned} \tag{6}$$

where f determines the central frequency of the pass band in orientation θ. Of course, we have $\theta=\tan^{-1}(v_{0}/u_{0})$ and $f=\sqrt{u_{0}^{2}+v_{0}^{2}}$, where (u_{0}, v_{0}) is the center of one Gaussian function in Equation (5).

From each mammogram, a total of 20 Gabor filtered images (I_{Gmn}, m = 1~5, n = 1~4, in spatial domain) are produced with 20 Gabor filters distributed along five bands (located from low to high frequencies) by four orientations (vertical, 45°, horizontal, and 135°). Four Gabor filters along four orientations at Band 2 are illustrated in Figure 3, where only the central parts of four filters are displayed. The full size of a Gabor filter actually matches the image size being processed. Keep in mind that there is a 90° directional difference between spatial domain and frequency domain. One sample of Gabor filtered images (of Case “3039_Left”, refer to Figure 10) with the Gabor filter bank at 5 bands and 4 orientations is demonstrated in Figure 4.

**Figure 3.** Four Gabor filters along four orientations at Band 2. (a) Gabor filters in frequency domain, where f = 12, σ_{u} = σ_{v} = 6, θ = 0°, 45°, 90°, 135°. Only the central parts of four filters are displayed here. (b) Gabor filters in spatial domain. Note that there is a 90° directional difference between spatial domain and frequency domain.
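A minimal sketch of the frequency-domain Gabor response and of the 20-filter bank is given below. It evaluates the two Gaussians G₁ and G₂ with the standard rotation of the frequency coordinates by θ, and it builds the bank from the band and orientation values reported in Section 3 (f = [6 12 24 48 80], σ_u = σ_v = [3 6 12 24 30], θ = 0°, 45°, 90°, 135°). The function name `gabor_response` and the tuple layout of `GABOR_BANK` are hypothetical.

```python
import math


def gabor_response(u, v, f, theta_deg, sigma_u, sigma_v):
    """Sum of two Gaussians centred at +/-(f cos(theta), f sin(theta))."""
    theta = math.radians(theta_deg)
    c, s = math.cos(theta), math.sin(theta)
    total = 0.0
    for sign in (1.0, -1.0):  # sign = +1 gives G1, sign = -1 gives G2
        du = u - sign * f * c
        dv = v - sign * f * s
        u_rot = du * c + dv * s    # coordinate along the orientation theta
        v_rot = -du * s + dv * c   # coordinate across the orientation theta
        total += math.exp(-0.5 * ((u_rot / sigma_u) ** 2 + (v_rot / sigma_v) ** 2))
    return total


# Five bands by four orientations: (f, sigma, theta in degrees).
GABOR_BANK = [(f, sigma, theta)
              for f, sigma in zip([6, 12, 24, 48, 80], [3, 6, 12, 24, 30])
              for theta in (0, 45, 90, 135)]
```

Each filtered image I_{Gmn} would then be obtained by multiplying the centered spectrum of the preprocessed mammogram by `gabor_response` evaluated on the frequency grid and applying the inverse Fourier transform; the response is symmetric about the origin, so the filtered image is real-valued.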

For each alarm segment found in Section 2.2, a set of edge histogram descriptors are computed with its 20 counterparts lying in 20 Gabor filtered images (I_{Gmn}, m = 1~5, n = 1~4), which will be used as features for classification. After clustering the EHD features with fuzzy C-means clustering method, a k-nearest neighbor (KNN) classifier is used to reduce the number of false alarms.

The edge histogram descriptor (EHD) [22,23] was initially proposed for MPEG-7 to express the local edge distribution in an image. The histogram generated in EHD denotes the local (within the alarm segment) frequency of four different types of edges, namely vertical (90°), horizontal (0°), 45° diagonal, and 135° diagonal edges, at each band (refer to Figure 3 and Figure 4). Specifically, for a particular alarm segment at each band (each row in Figure 4), the vertical histogram frequency (within the alarm segment) is the number of pixels of maximal intensity values in the vertical edge-extracted image (left-most column in Figure 4) compared with the pixel values in other three directional (horizontal, 45° diagonal and 135° diagonal) edge-extracted images (columns 2, 3, 4 in Figure 4). The other three directional frequencies can be calculated in the same way and a four-dimensional EHD signature can be formed by combining four directional frequencies together. The EHD features representing an alarm segment are obtained by joining 5-band EHD signatures together, which can be formulated as follows:

$$EHD(m,n)=\frac{\text{Number of maximal intensity pixels at direction } n}{\text{Alarm segment area}},\quad m=1,2,3,4,5;\; n=1,2,3,4 \tag{7}$$

The EHD calculation is equivalent to counting the numbers of maximal intensity pixels at each orientation along all bands. For example, if the vertical frequency of an EHD signature at band 1 is the largest (i.e., the highest bar in a histogram plot), then vertical edges dominate band 1. Such an EHD feature (of an alarm segment) reflects both directional edge information and frequency scale information (from low to high frequency). The EHD features are statistical features that are stable and reliable regardless of the absolute intensity values.
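The EHD computation for one alarm segment can be sketched as follows. The input layout is an assumption made for this example: `responses[m][n]` is the list of Gabor-filtered values at band m and orientation n for the pixels inside the segment, listed in the same pixel order for every filter. Ties between orientations are resolved in favour of the first orientation, which is also an assumption.

```python
def ehd_features(responses):
    """Return the EHD vector: one 4-bin orientation histogram per band.

    responses[m][n][p] is the response of band m, orientation n, at pixel p
    of the alarm segment. Each bin is the fraction of segment pixels at which
    that orientation has the largest response within the band.
    """
    features = []
    for band in responses:
        n_pixels = len(band[0])
        counts = [0] * len(band)
        for p in range(n_pixels):
            values = [orientation[p] for orientation in band]
            counts[values.index(max(values))] += 1  # winning orientation
        features.extend(count / n_pixels for count in counts)
    return features


# One band, four orientations, four pixels: each orientation wins exactly once.
toy_band = [[9, 1, 1, 1],
            [0, 8, 1, 2],
            [0, 0, 7, 3],
            [0, 0, 0, 4]]
toy_features = ehd_features([toy_band] * 5)  # five identical bands -> 20 values
```

Because each bin is a fraction of the segment area, the four bins of a band sum to one, and the feature does not depend on the absolute intensity scale of the filtered images.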

**Figure 4.** Gabor-filtered images (I_{Gmn}, Case “3039_Left” in Figure 10, calcification present) with Gabor filter bank at five bands (along five rows) and four orientations (across four columns) for calcification detection. The four columns are Gabor filtered images corresponding to four orientations (vertical, 45°, horizontal and 135°).

The most representative EHD features are mainly selected by using the overlapping ratio (see Equation (9c)) and clustered with a Fuzzy C-means (FCM) [24] clustering method. The FCM is a data clustering technique wherein each data point belongs to a cluster to some degree that is specified by a membership grade. FCM starts with an initial guess (most likely incorrect) for the cluster centers, which are intended to mark the mean location of each cluster. FCM assigns every data point a membership grade for each cluster. By iteratively updating the cluster centers and the membership grades for each data point, FCM iteratively moves the cluster centers to the right location within a data set. This iteration is based on minimizing an objective function that represents the distance from any given data point to a cluster center weighted by that data point’s membership grade.
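The alternating update of membership grades and cluster centers described above can be sketched as follows. This is the textbook FCM iteration with fuzziness exponent 2, not the authors' code: the initial centers are passed in explicitly so that the example is reproducible, and a fixed number of iterations replaces a convergence test. The function name `fuzzy_c_means` is hypothetical.

```python
import math


def fuzzy_c_means(points, centers, fuzziness=2.0, iterations=50):
    """Return (centers, memberships) after a fixed number of FCM iterations.

    memberships[i][j] is the grade of point i in cluster j, computed in the
    last iteration; each row sums to one.
    """
    exponent = 2.0 / (fuzziness - 1.0)
    memberships = []
    for _ in range(iterations):
        # Update membership grades from the current centers.
        memberships = []
        for x in points:
            dists = [max(math.dist(x, c), 1e-12) for c in centers]
            memberships.append(
                [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists])
        # Update each center as the membership-weighted mean of all points.
        new_centers = []
        for j in range(len(centers)):
            weights = [row[j] ** fuzziness for row in memberships]
            total = sum(weights)
            new_centers.append([
                sum(w * x[d] for w, x in zip(weights, points)) / total
                for d in range(len(points[0]))])
        centers = new_centers
    return centers, memberships


toy_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
toy_centers, toy_memberships = fuzzy_c_means(toy_points, [[1, 1], [9, 9]])
```

In the paper's setting, the points would be the 20-dimensional EHD feature vectors of one class, and the resulting R cluster centers per class would serve as the reference patterns for the KNN step.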

Once a certain number of clusters (say R clusters for each of two classes, cancerous vs. healthy) are formed by the FCM algorithm, the k-nearest neighbors (KNN) can be found from R × 2 clusters by using Euclidean distance (between the analyzing feature and the clustered features). The probability, P_{C}, of a given alarm segment to be a malignant cancer is calculated by:

$$P_{C}=k_{C}/k,\quad k_{C}\le k \text{ and } k_{C}+k_{H}=k \tag{8}$$

where k_{C} and k_{H} are the numbers of nearest clusters (to the analyzing features) that belong to cancerous class and healthy class, respectively. By specifying a threshold, T_{P}, a given alarm segment is classified as “cancer” (mass or calcification) when P_{C} > T_{P}; otherwise “Healthy” (Normal).

In general, classification performance can be evaluated using sensitivity, specificity and ROC (receiver operating characteristic) area. The performance of a CAD system for breast cancer detection is usually reported with true positive rate (TP), false positives per image (FPI), which are described as follows:

$$TP=\frac{\text{Number of true positive marks}}{\text{Number of lesions}} \tag{9a}$$

$$FPI=\frac{\text{Number of false positive marks}}{\text{Number of images}} \tag{9b}$$

$$\text{Overlapping ratio}=\frac{\text{Overlapping area}}{\text{Overlay area}} \tag{9c}$$

where the Overlapping ratio defined in Equation (9c) is used to select typical patterns (for training purpose in classification) and to evaluate the annotation accuracy. Note that “positive” means cancerous whereas “negative” means healthy in the context. A “mark” is usually imposed on the mammography image when an alarm segment is classified as positive (cancerous), which is also referred as annotation in this paper. If a positive mark matches the ground truth (referred as “overlay” herein, predefined by a radiologist) very well, then it is so-called “true positive”; otherwise “false positive”.
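The probability P_C = k_C / k and the three performance measures can be sketched as follows. The data layout is an assumption for this example: each cluster is a (center, label) pair with the label "cancer" or "healthy", pixel regions are Python sets of (row, column) tuples, and k is assumed not to exceed the number of clusters. All function names are hypothetical.

```python
import math


def cancer_probability(feature, clusters, k):
    """P_C = k_C / k: share of cancerous clusters among the k nearest ones."""
    nearest = sorted(clusters, key=lambda item: math.dist(feature, item[0]))[:k]
    k_c = sum(1 for _, label in nearest if label == "cancer")
    return k_c / k


def classify(p_c, t_p=0.45):
    """An alarm segment is annotated as cancer when P_C exceeds T_P."""
    return "cancer" if p_c > t_p else "healthy"


def overlapping_ratio(alarm_pixels, overlay_pixels):
    """Overlapping area divided by overlay (ground truth) area."""
    return len(alarm_pixels & overlay_pixels) / len(overlay_pixels)


def tp_rate(true_positive_marks, lesions):
    """True positive rate over all lesions."""
    return true_positive_marks / lesions


def fpi(false_positive_marks, images):
    """False positives per image."""
    return false_positive_marks / images


toy_clusters = [((0, 0), "cancer"), ((1, 0), "cancer"),
                ((5, 5), "healthy"), ((6, 6), "healthy")]
toy_p_c = cancer_probability((0.5, 0.0), toy_clusters, k=3)  # 2 of 3 are cancerous
```

With R = 40 clusters per class and k = 50, as used in the experiments of Section 3, at least 10 of the 50 nearest clusters always come from each class, so P_C is confined to the range 0.2 to 0.8.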

## 3. Experimental Results and Discussions

Screening mammograms (exposed to low-dose x-rays) are suitable for screening tests because of the lower x-ray dose involved and the shorter imaging time. Two views of each breast are recorded in screening mammograms: the craniocaudal (CC) view, which is a top-to-bottom view, and the mediolateral oblique (MLO) view, which is a side view taken at an angle.

The screening mammograms used in our experiments were a subset (“Lumisys”) of the Digital Database for Screening Mammography project (DDSM, University of South Florida). These digitized mammograms were scanned by the LumiSys scanner (50 microns per pixel, 12 bits per pixel), with image sizes ranging from 2,728 × 3,920 pixels to 4,608 × 6,048 pixels. The GCD algorithm was implemented as described in Section 2 (refer to Figure 1). The following parameters were used in our implementation for both mass and calcification detection: (i) the threshold used for histogram matching: T_{HM} = 32; (ii) the CGF parameters used for alarm segment generation: f = [12 24 48], σ = [6 12 24] (see Equations (2) and (3) and Figure 2); (iii) for the Gabor filter bank: f = [6 12 24 48 80], σ_{u} = σ_{v} = [3 6 12 24 30], θ = 0°, 45°, 90°, 135° (see Equations (4–6) and Figure 3 and Figure 4); (iv) the classification features are the EHD features of 20 dimensions; (v) the number of clusters for each class R = 40, and thus a total of 80 clusters were formed; the number of nearest neighbors k = 50 (k < R × 2); (vi) the threshold used for annotations: T_{P} = 0.45, which means an alarm segment whose P_{C} > 0.45 will be annotated as a classified mass or calcification. If the number of alarm segments is greater than three, then only the three segments with the largest P_{C} values will be annotated on the mammogram in heavy black outlines.

#### 3.1. GCD for Mass Detection

A total of 431 mammograms (digitized by the LumiSys scanner) were analyzed to validate the GCD algorithm in mass detection, which included 159 normal images (from 80 healthy individuals) and 272 cancerous images (with at least one malignant mass present on each image, from 270 cancerous patients). To make full use of the dataset (of 431 samples) and to obtain reliable results, a ten-fold cross validation [25,26] method was used to divide the dataset into independent training and testing groups. By this procedure, each one of the ten subsets, in turn, was held out to be used as an independent testing set while the other nine subsets were combined to form a training set. From the training set, the typical patterns (features) were selected according to their overlapping ratios (e.g., greater than 0.75, for cancerous class only). The selected features were the most representative ones and were then used to form a certain number of clusters (R = 40) for each class with the FCM method. With the KNN (k = 50, i.e., find out the 50 nearest neighbors from 80 clusters) classifier, the P_{C} value of each case in the testing set was computed. All cases were tested after 10 runs and the performance of this GCD algorithm was evaluated by using the metrics defined in Equation (9).

At the end of segmentation (generating alarm segments), we had TP (Segmentation) = 92% and FPI (Segmentation) = 7.78. At the end of classification (reducing false alarms), by specifying T_{P} = 0.45, the performance was TP (Classification) = 90% and FPI (Classification) = 1.21.

Four cancerous cases of malignant masses are shown in Figure 5, Figure 6, Figure 7 and Figure 8. The detailed descriptions and case analyses are given in the figure captions. In each figure, Assessment and Subtlety of the first lesion are given. There are three panels in each figure: (a) is the original mammogram (ROI only); (b) is the preprocessed image (diffused and enhanced) with the annotations (detected by GCD, in heavy black outlines) and overlays (ground truths, in light black lines); (c) is the map of alarm segments generated from segmentation, overlapped on the overlays, and the probability values (P_{C}) of alarm segments are also presented. In the DDSM database, the situation of a cancerous area is described by Assessment, rating from 1 to 5 (most advanced) and Subtlety, rating from 1 to 5 (most detailed). So a mass lesion of (Assessment, Subtlety) = (5, 5) is most easily detected. An early-stage mass detection is demonstrated in Figure 6, where (Assessment, Subtlety) = (1, 2).

**Figure 5.** GCD results of Case “3499_Right” (one malignant mass lesion with irregular-shape and spiculated margin, assessment = 5, subtlety = 5): (a) Original mammogram; (b) Preprocessed image with one annotation (P_{C} = 0.55, detected by GCD, in heavy black outlines) and one overlay (ground truths, in light black line), a very good match between annotation and overlay; (c) 4 alarm segments (P_{C} = 0.55, 0.44, 0.28, 0.26) overlapped on one overlay.

**Figure 6.** GCD results of Case “3087_Right” (one malignant mass with irregular shape and obscured-ill_defined margin, assessment = 1, subtlety = 2, a very early-stage mass): (a) Original mammogram; (b) Preprocessed image with one annotation (P_{C} = 0.57, detected by GCD, in heavy black outline) and one overlay (ground truth, in light black line), an excellent match between annotation and overlay; (c) 3 alarm segments (P_{C} = 0.57, 0.37, 0.37) overlapped on one overlay.

**Figure 7.** GCD results of Case “3396_Right” (one malignant mass lesion with irregular-shape and spiculated margin, assessment = 4, subtlety = 3): (a) Original mammogram; (b) Preprocessed image with two annotations (P_{C} = 0.63, detected by GCD, in heavy black outlines) and one overlay (ground truths, in light black lines); one false alarm (P_{C} = 0.50) was annotated; (c) 2 alarm segments (P_{C} = 0.63, 0.50) overlapped on one overlay.

**Figure 8.** GCD results of Case “3062_Left” (one malignant mass with lobulated-architectural_distortion shape and obscured-ill_defined margin, assessment = 4, subtlety = 2): (a) Original mammogram; (b) Preprocessed image with one annotation (P_{C} = 0.71, detected by GCD, in heavy black outline) and one overlay (ground truth, in light black lines), a pretty good match between annotation and overlay; (c) One alarm segment (P_{C} = 0.71) overlapped on one overlay.

One normal case is illustrated in Figure 9, where 10 alarm segments were detected in segmentation but all were removed in classification. As a result, the proposed GCD algorithm produced no false alarms for this normal case.

**Figure 9.** GCD results of Case “3634_Right” (normal and healthy): (a) Original mammogram; (b) Preprocessed image without any annotation (no false alarms were annotated); (c) 10 alarm segments ($P_C$ = 0.44, 0.41, 0.35, 0.29, 0.28, 0.28, 0.28, 0.27, 0.25, 0.25).
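The false-alarm reduction visible in these captions can be sketched as a simple cutoff on the $P_C$ value of each alarm segment. The snippet below is only an illustration: the function name `annotate` and the cutoff of 0.5 are assumptions, not values stated in this section. Any cutoff above 0.44 and no greater than 0.50 would reproduce the annotations reported in the mass-detection captions above.

```python
from typing import List


def annotate(p_c_scores: List[float], threshold: float = 0.5) -> List[float]:
    """Keep only the alarm segments whose P_C reaches the cutoff.

    The default cutoff of 0.5 is an assumption made for illustration.
    """
    return [p for p in p_c_scores if p >= threshold]


# P_C values quoted in the captions of Figure 7 (cancer case, one false alarm)
# and Figure 9 (normal case, no annotation).
figure7 = [0.63, 0.50]
figure9 = [0.44, 0.41, 0.35, 0.29, 0.28, 0.28, 0.28, 0.27, 0.25, 0.25]

print(annotate(figure7))  # both segments are kept as annotations
print(annotate(figure9))  # every segment is removed
```

With this assumed cutoff, both segments of Figure 7 survive (one true detection and one false alarm), while all 10 segments of the normal case in Figure 9 are discarded.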

#### 3.2. GCD for Calcification Detection

In the calcification detection test, 155 digital mammograms (digitized by the “LumiSys” scanner, from the DDSM database) were analyzed by the proposed GCD algorithm. The 155 mammographic images are CC views only and comprise 80 normal images (from 80 healthy subjects, one randomly selected image per subject) and 75 cancerous images (from 74 patients). To balance the two classes in the training set, 80 normal images (rather than 159) were used against the 75 cancerous images. In the DDSM database, a calcification is also described by its type (Amorphous, Pleomorphic, Fine_Linear_Branching, etc.) and distribution (Clustered, Linear, Segmental, Regional, Diffusely_scattered), following the ACR BI-RADS® lexicon.

Similarly, ten-fold cross-validation was used to make full use of the dataset (155 samples) and to obtain reliable results. The experimental parameters were the same as those used in mass detection (Section 3.1). With the GCD algorithm we achieved TP (Segmentation) = 94% at FPI (Segmentation) = 4.21, and TP (Classification) = 93% at FPI (Classification) = 1.19 in calcification detection.
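As a rough illustration of the evaluation protocol, the sketch below splits 155 samples into ten folds and pools per-image results into a TP rate and an FPI value. It assumes that TP is the fraction of ground-truth lesions hit by at least one annotation and that FPI is the number of false marks divided by the number of images. The names `ImageResult`, `ten_fold_indices` and `tp_rate_and_fpi`, and the toy numbers, are hypothetical and are not taken from the paper.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ImageResult:
    n_lesions: int      # ground-truth lesions in the image (0 for a normal image)
    n_hits: int         # lesions matched by at least one annotation
    n_false_marks: int  # annotations that match no lesion


def ten_fold_indices(n_samples: int, n_folds: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle the sample indices and deal them into n_folds disjoint folds."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]


def tp_rate_and_fpi(results: List[ImageResult]) -> Tuple[float, float]:
    """Pool per-image counts into a TP rate and false positives per image."""
    total_lesions = sum(r.n_lesions for r in results)
    tp_rate = sum(r.n_hits for r in results) / total_lesions
    fpi = sum(r.n_false_marks for r in results) / len(results)
    return tp_rate, fpi


# Each fold serves once as the test set; the other nine folds train the classifier.
folds = ten_fold_indices(155)
print([len(f) for f in folds])

# Toy results for four images (made-up numbers, for illustration only).
toy = [
    ImageResult(n_lesions=1, n_hits=1, n_false_marks=0),
    ImageResult(n_lesions=1, n_hits=1, n_false_marks=1),
    ImageResult(n_lesions=0, n_hits=0, n_false_marks=2),
    ImageResult(n_lesions=1, n_hits=0, n_false_marks=0),
]
tp, fpi = tp_rate_and_fpi(toy)
print(f"TP = {tp:.2f}, FPI = {fpi:.2f}")
```

This plain split ignores class balance; a stratified split that keeps the normal-to-cancerous ratio in every fold would be a natural refinement.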

**Figure 10.** GCD results of Case “3039_Left” (one malignant calcification with pleomorphic type and segmental distribution, assessment = 4, subtlety = 3): (a) Original mammogram; (b) Preprocessed image with one annotation ($P_C$ = 0.71, detected by GCD, in heavy black outline) and one overlay (ground truth, in light black line), an excellent match; (c) 9 alarm segments ($P_C$ = 0.71, 0.36, 0.35, 0.34, 0.34, 0.33, 0.33, 0.31, 0.31) overlapped on one overlay.

**Figure 11.** GCD results of Case “0068_Right” (two malignant calcifications with amorphous type and clustered distribution, assessment = 5, subtlety = 4): (a) Original mammogram; (b) Preprocessed image with one annotation ($P_C$ = 0.69) and two overlays, a good match since one annotation hit two overlays; (c) 6 alarm segments ($P_C$ = 0.69, 0.39, 0.33, 0.27, 0.24, 0.22) overlapped on two overlays.

**Figure 12.** GCD results of Case “3674_Right” (healthy): (a) Original mammogram; (b) Preprocessed image with one annotation, a false alarm ($P_C$ = 0.66); (c) 8 alarm segments ($P_C$ = 0.66, 0.43, 0.35, 0.30, 0.29, 0.29, 0.28, 0.27).

#### 3.3. Discussions

In our present experiments, the KNN classifiers were trained with two separate datasets. One dataset consists of mass cases vs. healthy cases and is used to train and test a mass classifier, while the other consists of calcification cases vs. healthy cases and is used to train and test a calcification classifier. The normal cases were from the same group of healthy subjects (digitized by the “Lumisys” scanner). Overall, the performance of the presented GCD algorithm in both mass detection and calcification detection compares favorably with the results reported in the literature [27].

For real CAD applications in the field, the following solutions may be considered. (i) Run two trained classifiers (say, a mass classifier and a calcification classifier) on a given mammogram. This approach can distinguish mass annotations from calcification annotations but may put more false marks on the mammogram being analyzed. (ii) Run one classifier trained with a mixed database that includes cancerous cases (mass, calcification, or both) and healthy cases. This approach may reduce false positives but cannot tell whether a marked segment is a mass or a calcification. (iii) Run a multiple-class classifier trained with the mixed database, which may tell the lesion type (mass or calcification) and give fewer false positives. However, finding a set of suitable features for a multiple-class classifier will be more challenging. More extensive research and experiments will be conducted on how to classify multiple lesion types in breast cancer detection.
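Option (i) can be sketched as follows. This is a minimal illustration under stated assumptions: the two scoring callables stand in for trained mass and calcification classifiers, and the segment fields `p_mass` and `p_calc`, the function name `run_two_classifiers`, and the cutoff of 0.5 are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Segment = Dict[str, float]  # hypothetical container for one alarm segment


def run_two_classifiers(
    segments: List[Segment],
    mass_score: Callable[[Segment], float],
    calc_score: Callable[[Segment], float],
    threshold: float = 0.5,  # assumed cutoff, for illustration only
) -> List[Tuple[int, str]]:
    """Mark each segment with every lesion type whose score reaches the cutoff."""
    marks: List[Tuple[int, str]] = []
    for i, seg in enumerate(segments):
        if mass_score(seg) >= threshold:
            marks.append((i, "mass"))
        if calc_score(seg) >= threshold:
            marks.append((i, "calcification"))
    return marks


# Toy segments with precomputed scores (made-up numbers).
segments = [
    {"p_mass": 0.63, "p_calc": 0.20},
    {"p_mass": 0.30, "p_calc": 0.71},
    {"p_mass": 0.55, "p_calc": 0.52},
    {"p_mass": 0.28, "p_calc": 0.31},
]
marks = run_two_classifiers(
    segments,
    mass_score=lambda s: s["p_mass"],
    calc_score=lambda s: s["p_calc"],
)
print(marks)
```

Because a segment is marked whenever either classifier fires, the union of marks keeps the lesion type but can only add to the number of false marks, which is the trade-off noted for option (i).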

Our future efforts will also focus on extracting and integrating new features (such as wavelet features), analyzing multiple views (such as the asymmetry between the left and right breasts, and the correspondence between the CC view and the MLO view), and fusing multiple classifiers (such as support vector machines).

## 4. Conclusions

A new mass and calcification detection algorithm, termed the “Gabor cancer detection” (GCD) algorithm, is proposed. A circular Gaussian filter (CGF) is used in segmentation (generating alarm segments), while the EHD (edge histogram descriptor) features extracted from the Gabor-filtered images are used in classification (reducing false alarms). The histogram analysis adapts the alarm threshold to each CGF-filtered image and thus makes the alarm generation process robust to intensity variations from case to case.

The GCD algorithm was tested on a relatively large sample of mammograms (a subset of the DDSM database). The overall performance (TP and FPI) of the GCD algorithm is very good, and the accuracy in locating both masses and calcifications (measured with the overlapping ratio defined above) is very high. Furthermore, the GCD algorithm can successfully detect early-stage masses and calcifications (those with low values of Assessment and Subtlety in DDSM).

## Acknowledgements

This work was supported by grant #W81XWH-06-1-0543 from TATRC, DOD. The source data, digitized screening mammograms, were obtained from the DDSM project [7], University of South Florida.

## References

- ACR BI-RADS – Mammography, Ultrasound & Magnetic Resonance Imaging, 4th ed.; American College of Radiology: Reston, VA, USA, 2003.
- Kegelmeyer, W.P., Jr.; Pruneda, J.M.; Bourland, P.D.; Hillis, A.; Riggs, M.W.; Nipper, M.L. Computer-aided mammographic screening for spiculated lesions. Radiology **1994**, 191, 331–337.
- Liu, S.L.; Babbs, C.F.; Delp, E.J. Multiresolution detection of spiculated lesions in digital mammograms. IEEE Trans. Image Process. **2001**, 10, 874–884.
- Matsubara, T.; Fujita, H.; Endo, T.; Horita, K.; Ikeda, M.; Kido, C.; Ishigaki, T. Development of mass detection algorithm based on adaptive thresholding technique in digital mammograms. In Digital Mammography; Doi, K., Giger, M.L., Eds.; Elsevier: Amsterdam, The Netherlands, 1996; pp. 391–396.
- Qian, W.; Li, L.; Clarke, L.; Clark, R.A.; Thomas, J. Comparison of adaptive and non adaptive CAD methods for mass detection. Acad. Radiol. **1999**, 6, 471–480.
- Feig, S.A.; Galkin, B.M.; Muir, H.D. Evaluation of breast microcalcifications by means of optically magnified tissue specimen radiographs. In Recent Results in Cancer Research; Brunner, S., Langfeldt, B., Eds.; Springer: Berlin, Germany, 1987; Vol. 105, pp. 111–123.
- Sickles, E.A. Breast calcifications: mammographic evaluation. Radiology **1986**, 160, 289–293.
- Yu, S.N.; Li, K.Y.; Huang, Y.K. Detection of microcalcifications in digital mammograms using wavelet filter and Markov random field model. Comput. Med. Imaging Graph. **2006**, 30, 163–173.
- Soltanian-Zadeh, H.; Rafiee-Rad, F.; Pourabdollah-Nejad, S. Comparison of multiwavelet, wavelet, Haralick, and shape features for microcalcification classification in mammograms. Pattern Recogn. **2004**, 37, 1973–1986.
- Haralick, R.M. Statistical and structural approaches to texture. Proc. IEEE **1979**, 67, 786–804.
- Bhangale, T.; Desai, U.B.; Sharma, U. An unsupervised scheme for detection of microcalcifications on mammograms. In Proceedings of IEEE International Conference on Image Processing, Vancouver, BC, Canada, September 10–13, 2000; pp. 184–187.
- Rogova, G.L.; Stomper, P.C.; Ke, C. Microcalcification texture analysis in a hybrid system for computer aided mammography. Proc. SPIE **1999**, 3661, 1426–1433.
- Heath, M.; Bowyer, K.W.; Kopans, D.; Moore, R.; Kegelmeyer, P. Current status of the digital database for screening mammography. In Digital Mammography; Kluwer Academic Publishers: Nijmegen, The Netherlands, 1998; pp. 457–460.
- Nielsen, M.; Johansen, P.; Olsen, O.F.; Weickert, J. (Eds.) Scale-Space Theories in Computer Vision; Lecture Notes in Computer Science; Springer: Berlin, Germany, 1999; Vol. 1682.
- ter Haar Romeny, B.M.; Florack, L.; Koenderink, J. (Eds.) Scale-Space Theory in Computer Vision; Lecture Notes in Computer Science; Springer: Berlin, Germany, 1997; Vol. 1252.
- Perona, P.; Malik, J. Scale space and edge detection using anisotropic diffusion. IEEE Trans. Pattern Anal. Mach. Intell. **1990**, 12, 629–639.
- Barash, D.; Comaniciu, D. A common framework for nonlinear diffusion, adaptive smoothing, bilateral filtering and mean shift. Image Vis. Comput. **2004**, 22, 73–81.
- Weickert, J.; ter Haar Romeny, B.M.; Viergever, M.A. Efficient and reliable schemes for nonlinear diffusion filtering. IEEE Trans. Image Process. **1998**, 7, 398–410.
- Gonzalez, R.C.; Woods, R.E. Digital Image Processing, 2nd ed.; Prentice Hall: Upper Saddle River, NJ, USA, 2002.
- Weldon, T.P.; Higgins, W.E.; Dunn, D.F. Gabor filter design for multiple texture segmentation. Opt. Eng. **1996**, 35, 2852–2863.
- Jones, J.P.; Palmer, L.A. The two-dimensional spectral structure of simple receptive fields in cat striate cortex. J. Neurophysiol. **1987**, 58, 1187–1211.
- Chang, S.F.; Sikora, T.; Puri, A. Overview of the MPEG-7 standard. IEEE Trans. Circ. Syst. Video Technol. **2001**, 11, 688–695.
- Won, C.S.; Park, D.K.; Park, S.J. Efficient use of MPEG-7 edge histogram descriptor. ETRI J. **2002**, 24, 23–30.
- Bezdek, J.C. Pattern Recognition with Fuzzy Objective Function Algorithms; Plenum Press: New York, NY, USA, 1981.
- Bowd, C.; Medeiros, F.A.; Zhang, Z.; Zangwill, L.M.; Hao, J.; Lee, T.W.; Sejnowski, T.J.; Weinreb, R.N.; Goldbaum, M.H. Relevance vector machine and support vector machine classifier analysis of scanning laser polarimetry retinal nerve fiber layer measurements. Invest. Ophthalmol. Vis. Sci. **2005**, 46, 1322–1329.
- Zangwill, L.M.; Chan, K.; Bowd, C.; Hao, J.; Lee, T.W.; Weinreb, R.N.; Sejnowski, T.J.; Goldbaum, M.H. Heidelberg retina tomograph measurements of the optic disc and parapapillary retina for detecting glaucoma analyzed by machine learning classifiers. Invest. Ophthalmol. Vis. Sci. **2004**, 45, 3144–3151.
- Computer-aided detection and diagnosis in mammography. In Handbook of Image and Video Processing, 2nd ed.; Bovik, A., Ed.; Academic Press: Boston, MA, USA, 2005; Chapter 10.4; pp. 1195–1217.

© 2010 by the authors; licensee MDPI, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).