Classification of Microcalcification Clusters in Digital Mammograms Using a Stack Generalization Based Classifier

This paper presents a machine learning based approach for the discrimination of malignant and benign microcalcification (MC) clusters in digital mammograms. A series of morphological operations was carried out to facilitate feature extraction from the segmented microcalcifications. A combination of morphological, texture, and distribution features was extracted from individual MC components and MC clusters, and a correlation-based feature selection technique was used. The clinical relevance of the selected features is discussed. The proposed method was evaluated using three different databases: Optimam Mammography Image Database (OMI-DB), Digital Database for Screening Mammography (DDSM), and Mammographic Image Analysis Society (MIAS) database. The best classification accuracy (95.00±0.57%) was achieved for OPTIMAM using a stack generalization classifier with 10-fold cross validation, obtaining an Az value equal to 0.97±0.01.

[23] to segment nuclei in single-channel images. A superpixel-based framework was presented for segmentation that used a "hybrid" approach, which was intended to integrate the advantages of a region-based clustering algorithm and an edge detector with an integrated edge map.

[...] containing 5% of the highest positive intensity values from the difference image, (c) eliminating single pixels and performing erosion on (b), (d) Image A: pixels having a higher value than the threshold specified in section 2.2.2 are added to (c), (e) contrast enhancement filter applied to the bi-cubic interpolated image of (a), (f) Image B: the five percent of pixels having the highest intensity are selected from the filtered image, (g) Image C: logical summation of (d) and (f).

[...] higher spatial frequency. The weight was set to 0.10, as an increase in weight above 0.8 did not provide [...]

Fig. 6. (a) Elimination of blobs containing one or two pixels from the probability image generated in section 2.2.2 (see Fig. 5g), (b) final probability image, for example case (1_1076_463), after discarding all blobs from any 1 cm² pixel block in which the number of objects was less than 3. In this example, all the 1 cm² pixel blocks contained more than 3 blobs, so no object elimination was done.

From the probability image generated in section 2.2.2 (see Fig. 5g), regions containing one or two pixels were removed, as they were considered artifacts [35], and an erosion operation with a 2 × 2 unit element kernel was performed: see Fig. 6a. Here, a 2 × 2 unit element kernel was used for the [...]

Segmentation evaluation
The evaluation was carried out using the Dice similarity metric [37] [38], in line with our previous work [11]. The reference masks (see Fig. 8b) were generated from the radiologist's annotation outline (see Fig. 8a). Subsequently, the individual MCs that reside inside the radiologist's annotation were used to generate a convex hull. This convex hull (see Fig. 8f) and the reference mask (see Fig. 8b) were used to calculate the Dice similarity score (see Fig. 8g-8i). The Dice similarity metric for DDSM and MIAS is presented in Fig. 9.

[...] though the similarity score for our proposed approach is slightly higher than with Oliver's method [...]

To ensure the robustness of the feature selection and avoid bias, all the data was divided using [...]
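The Dice similarity score used in the segmentation evaluation above compares two binary masks as twice their overlap divided by the sum of their sizes. The following is a minimal sketch, assuming the masks are given as equally sized 2-D lists of 0/1 values; the function name `dice_score` and the toy masks are illustrative and are not taken from the paper.

```python
def dice_score(mask_a, mask_b):
    """Dice similarity between two binary masks given as equally sized 2-D lists."""
    same_shape = len(mask_a) == len(mask_b) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(mask_a, mask_b)
    )
    if not same_shape:
        raise ValueError("masks must have the same shape")
    size_a = sum(1 for row in mask_a for value in row if value)
    size_b = sum(1 for row in mask_b for value in row if value)
    overlap = sum(
        1
        for row_a, row_b in zip(mask_a, mask_b)
        for value_a, value_b in zip(row_a, row_b)
        if value_a and value_b
    )
    if size_a + size_b == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


# Toy masks (hypothetical): a reference mask and a convex-hull mask.
reference_mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
hull_mask = [
    [0, 1, 1, 1],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
score = dice_score(reference_mask, hull_mask)
```

Here both masks contain four foreground pixels and share three of them, so the score is 2 × 3 / (4 + 4) = 0.75.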

The feature extraction and selection technique, as previously mentioned, was applied separately to the digitized and digital databases to investigate whether the features from the digital database outperformed those extracted from the digitized database in classifying MC clusters. Table 1 presents the 4 most important features extracted and selected using the digitized database (DDSM), and [...] The in-depth details on the impact of our feature selection approach are described in section 6. The results are presented in Table 6 in section 6.

To evaluate the reliability of the feature selection approach, images from the digital and digitized databases were separately divided into ten folds. The process of feature selection was performed on [...]

To investigate the influence of shape, size, and texture aspects, each individual feature type was separately used for the classification using ensemble learning, see [...]

[...] time guarantees that the algorithm will produce identical results for each run. In this experiment, the seed number was set to 1 for the first run and its value was increased by 1 with each run. Hence, for 10 runs, the maximum seed value was 10. In 10-fold cross-validation, the original sample is randomly partitioned into 10 equal-size subsamples (folds). Of the 10 subsamples, a single subsample is retained for testing the model, and the remaining nine subsamples are used as training data. The cross-validation process is then repeated 10 times, with each of the folds used exactly once as the test data. The 10 results from the folds are averaged to produce a single estimate. The advantage of this method is that all observations are used for both training and testing, and each observation is used for testing exactly once.
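The seeded, repeated 10-fold procedure described above can be sketched in a few lines. This is an illustrative sketch only: the paper's experiments were run in Weka, and the function names (`stratified_folds`, `repeated_cv`), the stratified assignment scheme, the toy labels, and the placeholder `majority_baseline` scorer are all assumptions introduced here.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=10, seed=1):
    """Partition sample indices into n_folds folds with similar class proportions."""
    rng = random.Random(seed)  # a fixed seed makes the partition reproducible
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(n_folds)]
    position = 0
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        for index in members:  # deal the shuffled indices out round-robin
            folds[position % n_folds].append(index)
            position += 1
    return folds


def repeated_cv(labels, evaluate, n_folds=10, n_runs=10):
    """Repeat k-fold CV with seeds 1..n_runs and return the mean score of each run."""
    run_means = []
    for seed in range(1, n_runs + 1):
        folds = stratified_folds(labels, n_folds, seed)
        scores = []
        for k, test_idx in enumerate(folds):
            train_idx = [i for j, fold in enumerate(folds) if j != k for i in fold]
            scores.append(evaluate(train_idx, test_idx))
        run_means.append(sum(scores) / len(scores))
    return run_means


# Toy labels (hypothetical): 60 samples of class 0 and 40 of class 1.
labels = [0] * 60 + [1] * 40


def majority_baseline(train_idx, test_idx):
    """Placeholder scorer: accuracy of always predicting the training majority class."""
    train_labels = [labels[i] for i in train_idx]
    majority = max(set(train_labels), key=train_labels.count)
    return sum(1 for i in test_idx if labels[i] == majority) / len(test_idx)


run_means = repeated_cv(labels, majority_baseline)
```

In practice the placeholder scorer would be replaced by training and testing the actual classifier on the given index lists; the point of the sketch is that every sample is tested exactly once per run and that each run is reproducible from its seed.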

Note that the feature selection was only performed on the training data, and therefore it is not expected that overfitting will happen. By using stratified 10-fold cross-validation we avoided the [...]

[...] Fig. 11). Az is equivalent to the Wilcoxon signed-ranks test, which is a nonparametric alternative to the paired t-test [58]. All the classification and evaluation aspects were implemented using the Weka [59] data mining suite.
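To illustrate the point that feature selection is restricted to the training data, the sketch below ranks features using only the rows of a training split, so the held-out rows never influence which features are chosen. It is a deliberate simplification: it ranks features by the absolute Pearson correlation with the class label, which is not the full correlation-based feature selection technique used in the paper, and the function names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_features(X, y, train_idx, k):
    """Return the k feature indices most correlated with the label on training rows only."""
    y_train = [y[i] for i in train_idx]
    scored = []
    for feature in range(len(X[0])):
        column = [X[i][feature] for i in train_idx]
        scored.append((abs(pearson(column, y_train)), feature))
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [feature for _, feature in scored[:k]]


# Toy data (hypothetical): feature 0 is constant, feature 1 tracks the label,
# feature 2 alternates independently of the label.
X = [
    [1.0, 0.1, 0], [1.0, 0.2, 1], [1.0, 0.3, 0], [1.0, 0.4, 1],
    [1.0, 2.1, 0], [1.0, 2.2, 1], [1.0, 2.3, 0], [1.0, 2.4, 1],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]

train_idx = [0, 1, 2, 4, 5, 6]  # rows 3 and 7 are held out for testing
selected = select_features(X, y, train_idx, k=1)
```

Inside a cross-validation loop, `select_features` would be called once per fold with that fold's training indices, and the test rows would then be reduced to the same selected columns.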

The stacked generalization approach [46] was applied to create an additional classifier, described [...] features that were extracted and selected from the digital database (OPTIMAM) whilst using LOOCV [...]

Table 5. Classification accuracy using LOOCV and 10-fold CV applying all 51 and the 2 most salient features from the digital mammograms, and the 4 most salient features from the digitized mammograms, using stacked generalization. The images were segmented following the clinical grounding of cluster distribution. Naive Bayes was used as the meta-classifier.

The results presented in Table 3, Table 4, and [...] distribution in the sample cases. It is also noteworthy that the digital images were manually annotated [...] results. The feature extraction and selection were done individually using the digitized and digital mammograms, and afterwards those features were used to classify clusters in the digital database.
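The general idea of stacked generalization with a Naive Bayes meta-classifier can be sketched as follows: base learners are trained on part of the data, their predictions on held-out samples become the inputs of the meta-classifier, and the base learners are finally refitted on all data. This is a toy sketch under stated assumptions, not the Weka configuration used in the paper: the single-feature threshold base learners, the function names, the fold count, and the synthetic data are all hypothetical stand-ins.

```python
import math
import random


def train_stump(X, y, feature):
    """Base learner: threshold one feature at the midpoint of the two class means."""
    values0 = [row[feature] for row, label in zip(X, y) if label == 0]
    values1 = [row[feature] for row, label in zip(X, y) if label == 1]
    if not values0 or not values1:
        raise ValueError("both classes must be present in the training data")
    mean0 = sum(values0) / len(values0)
    mean1 = sum(values1) / len(values1)
    threshold = (mean0 + mean1) / 2.0
    sign = 1.0 if mean1 >= mean0 else -1.0
    return lambda row: 1 if sign * (row[feature] - threshold) > 0 else 0


def train_naive_bayes(Z, y):
    """Meta-classifier: Laplace-smoothed Naive Bayes over binary base predictions."""
    log_prior, log_like = {}, {}
    for c in (0, 1):
        rows = [z for z, label in zip(Z, y) if label == c]
        log_prior[c] = math.log((len(rows) + 1) / (len(Z) + 2))
        for j in range(len(Z[0])):
            p_one = (sum(z[j] for z in rows) + 1) / (len(rows) + 2)
            log_like[(c, j, 1)] = math.log(p_one)
            log_like[(c, j, 0)] = math.log(1.0 - p_one)

    def predict(z):
        scores = {
            c: log_prior[c] + sum(log_like[(c, j, v)] for j, v in enumerate(z))
            for c in (0, 1)
        }
        return max((0, 1), key=lambda c: scores[c])

    return predict


def fit_stacking(X, y, n_folds=5, seed=1):
    """Train the meta-classifier on out-of-fold base predictions, then refit the base learners."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[k::n_folds] for k in range(n_folds)]
    n_features = len(X[0])
    Z = [None] * len(X)  # meta-level inputs, one row per sample
    for fold in folds:
        held_out = set(fold)
        train = [i for i in indices if i not in held_out]
        X_train = [X[i] for i in train]
        y_train = [y[i] for i in train]
        stumps = [train_stump(X_train, y_train, f) for f in range(n_features)]
        for i in fold:
            Z[i] = [stump(X[i]) for stump in stumps]
    meta = train_naive_bayes(Z, y)
    base = [train_stump(X, y, f) for f in range(n_features)]
    return lambda row: meta([learner(row) for learner in base])


# Synthetic, well-separated toy data (not from the paper): 20 samples per class.
X = [[0.1 * i, 5.0 + 0.1 * i] for i in range(20)]
X += [[3.0 + 0.1 * i, 2.0 + 0.1 * i] for i in range(20)]
y = [0] * 20 + [1] * 20
classify = fit_stacking(X, y)
```

Training the meta-classifier on out-of-fold predictions, rather than on predictions for samples the base learners have already seen, is the step that distinguishes stacking from simply chaining classifiers.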

The proposed method was evaluated using three different databases: OPTIMAM, DDSM, and MIAS.

Two different classifiers, an ensemble learner and stack generalization, were applied to evaluate the classification results. The best classification accuracy (96.72 ± 0.46%) for the digital database was achieved using a stack generalization classifier with 10-fold CV, obtaining an Az value equal to 0.98 ± 0.00.