Bioengineering
  • Article
  • Open Access

31 March 2025

Automatic Blob Detection Method for Cancerous Lesions in Unsupervised Breast Histology Images

Department of Computer Science, University of South Africa, Preller Street, Muckleneuk Ridge, Pretoria 1709, South Africa
*
Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
This article belongs to the Special Issue Artificial Intelligence in Medical Image Processing and Segmentation, 2nd Edition

Abstract

The early detection of cancerous lesions is a challenging task given the cancer biology and the variability in tissue characteristics, thus rendering medical image analysis tedious and time-inefficient. In the past, conventional computer-aided diagnosis (CAD) and detection methods have heavily relied on the visual inspection of medical images, which is ineffective, particularly for large and visible cancerous lesions in such images. Additionally, conventional methods face challenges in analyzing objects in large images due to overlapping/intersecting objects and the inability to resolve their image boundaries/edges. Nevertheless, the early detection of breast cancer lesions is a key determinant for diagnosis and treatment. In this study, we present a deep learning-based technique for breast cancer lesion detection, namely blob detection, which automatically detects hidden and inaccessible cancerous lesions in unsupervised human breast histology images. Initially, this approach prepares and pre-processes data through various augmentation methods to increase the dataset size. Secondly, a stain normalization technique is applied to the augmented images to separate nucleus features from tissue structures. Thirdly, morphology operation techniques, namely erosion, dilation, opening, and a distance transform, are used to enhance the images by highlighting foreground and background pixels while removing overlapping regions from the highlighted nucleus objects in the image. Subsequently, image segmentation is handled via the connected components method, which groups highlighted pixel components with similar intensity values and assigns them to their relevant labeled components (binary masks). These binary masks are then used in the active contours method for further segmentation by highlighting the boundaries/edges of ROIs. 
Finally, a deep learning recurrent neural network (RNN) model automatically detects and extracts cancerous lesions and their edges from the histology images via the blob detection method. The proposed approach combines the connected components method and the active contours method to resolve the limitations of blob detection. The method is evaluated on 27,249 unsupervised, augmented human breast cancer histology images and achieves an F1 score of 98.82%.

1. Introduction

Nuclear detection is a key step for most CAD systems targeting image analysis, such as in automated capturing and grading for breast cancer (BC) tissue samples. The early diagnosis of BC relies heavily on how cancerous lesions spread across nuclei and tissue glands in an image. Conventional methods require medical practitioners to use the Bloom-Richardson grading system to determine the grade and extent of tumor cells morphing into normal nucleic cells, the degree of morphing, and the extent to which the tumor is increasing [1]. Therefore, the grading system directly correlates with the shape and appearance of breast cancer nucleus objects in histology images.
Recently, there has been an increased demand for the early detection of BC at screening sites/hospitals, thus opening up avenues for new research. The early, automatic detection of cancer increases the chances of making accurate decisions for successful treatment. Therefore, screening procedures are analyzed through computer-aided diagnosis (CAD) systems, which use medical images to improve the clinical efficiency and confidentiality. Conventionally, the evaluation of medical images has been time-inefficient and varies from person to person; thus, most recent research studies have targeted the analysis of medical images to aid in clinical diagnosis [2].
Automated nucleus object detection has also been challenging given the large number of nucleus objects, the sizes of high-resolution digitized medical images, and the variable sizes, appearances, textures, and shapes of individual nucleus objects. The hematoxylin and eosin (H&E) staining procedure has shown significant results, being the preferred standard for the histologic examination of human glands/tissues [3]. The early diagnosis of BC relies heavily on the extent to which cancerous lesions spread in histology images—specifically in nuclei and tissue glands—thus assisting in prognosis.
The distinction brought about by BC classification as either malignant or benign has led researchers to extensively explore the application of deep learning methods in the assessment of progression and treatment in cancer. Histopathology represents one use case that exemplifies the application of deep learning methods to big image data given their size and complexity. DL techniques have shown large success in various fields, namely object detection, image recognition, and classification. CAD systems targeting cancer detection in histology images have various research prospects. This study focuses on examining the tissue characteristics, cell nucleus isolation/identification, and cancerous lesion detection, thus assisting early diagnosis.
The detection of cancerous lesions in breast histology images is solely dependent on the segmentation of nucleus objects. Most of these segmentation methods largely revolve around techniques that target regions of interest (ROI) and edges/boundaries, namely watershed segmentation, active contours, and other grouped techniques that involve different morphology operations [4].
However, segmentation techniques suffer from oversegmentation and thus do not work well for overlapping nucleus cells. Active contours have been increasingly used in image segmentation; however, their major limitation is the inability to deal with the object inhomogeneity in large images, which leads to the segmentation of multiple objects as a single object [5].
In histopathology, the morphological appearance of different features and structures in an image, such as nucleus cells or glands, often indicates the presence of a disease. In the case of BC, the shape and morphological characteristics of nuclei in histology images correlate with the disease aggressiveness [6]. Conventionally, thresholding and morphology operations are the preferred techniques for image segmentation. Morphology operations, as proposed by [7], are used to pre-process, threshold, and further post-process images to detect edges. Automated segmentation techniques such as grayscaling, median filtering, and bottom–top hat filters, as proposed by [8], use pre-processing steps to enhance the image contrast. Thresholding is used to identify regions of interest, while post-processing morphology techniques, namely dilation, area opening, and hole filling, are used to improve the final segmentation results.
Thresholding and binary morphological operations are also presented in [9]; they utilize dilation and erosion to identify the breast region of interest (ROI), masking and isolating it from unwanted pectoral muscle regions. In [10], image pre-processing is handled through normalization, segmentation through color de-convolution for nucleus enhancement, and data augmentation to increase the dataset size, and a binary threshold is used to detect nucleus edges in the images.
Hence, CAD systems are important in predicting BC via accurately and efficiently isolating and identifying the locations of nucleus objects and segmenting them so that relevant morphological features related to BC may be obtained and used for subsequent detection.
The segmentation of cancerous nucleus cells in breast histology images presents several challenges, as discussed herein; thus, this study proposes an automatic blob detection method for cancerous lesions in unsupervised breast histology images.
This proposed method pre-processes images through various augmentation methods, namely random cropping, rotation, vertical and horizontal shift translation, and scaling adjustments to increase the size of the dataset. The H&E stain normalization technique is applied to the resultant augmented images to remove color inconsistencies and separate and isolate nucleic features from tissue structures. Morphology operations, namely erosion, dilation, opening, and a distance transform, are then used to highlight foreground and background pixels in the image.
Subsequently, the connected components analysis method is introduced to group highlighted pixel components with similar characteristics and assign them their relevant labeled components (binary masks). The active contours method then utilizes the resultant binary masks for further segmentation by resolving the inhomogeneity of ROI boundaries/edges. Lastly, a deep learning recurrent neural network (RNN) model automatically detects and extracts nucleus objects that contain cancerous lesions and their edges in the histology images via the blob detection method. This proposed approach utilizes the capabilities of the connected components analysis method and active contours method to resolve the limitations of blob detection.
The main contributions of this paper are as follows.
  • Augmentation methods are used to deal with data scarcity. Additionally, stain normalization is used to deal with color inconsistencies.
  • Morphology operations enhance the image by highlighting important features. The connected components analysis method is used to group components with similar characteristics and assist in separating overlapping and non-overlapping objects.
  • The active contours method uses the obtained binary masks from the connected components analysis to highlight and isolate the edges/boundaries of ROIs. Further, the blob detection method is used to resolve undersegmentation from the previous step and identify BC lesions (blobs) from the previously obtained masked images.
The rest of this paper is organized as follows: existing related work is discussed in Section 2, the proposed segmentation method is described in Section 3, the results and discussion are given in Section 4, the conclusions are presented in Section 5, and future work is described in Section 6.

3. Methods and Techniques

The proposed approach achieves the pre-processing, segmentation, and detection of BC in unsupervised histology images through the following steps.

3.1. Dataset Preparation and Pre-Processing

This study uses 24 unsupervised BC histology images from the publicly available Kaggle dataset repository.
Data are key for any neural network model to learn and deduce useful information [45]. Recent artificial intelligence research has been heavily reliant on deep learning algorithms. These algorithms often outperform conventional machine learning methods but rely on large datasets being available for model training. Because large datasets are often not publicly available, as is frequently the case with medical images, data scarcity is addressed here through data augmentation, which increases the dataset size.

3.1.1. Data Augmentation

Data augmentation artificially creates additional data, which are used to train DL models, resulting in performance improvements when tested/validated on a separate unlabeled dataset. The authors in [46] present a study where data augmentation was utilized on medical images to train deep learning models. The review provides insights into these techniques and supports the validation of the resultant models.
The scarcity of publicly available datasets also leads to issues such as data bias, inaccurate results, and overfitting, but data augmentation resolves these issues. Data augmentation techniques improve the performance of the deep learning-based diagnosis of medical conditions in different organs, namely the breast, lung, brain, and eyes, via different imaging modes, such as mammography, computed tomography (CT), and magnetic resonance imaging (MRI), as examined in [47].
Data augmentation also entails artificially transforming existing images in a dataset by rotation, scaling, cropping, flipping, and height and width shifts to create more images. Augmentation is preferable based on its significant effectiveness in training different deep learning models [48]. Further, it assists in solving data scarcity issues by increasing the size and variety of images in datasets, which is useful for the training of models without collecting new samples [49], and this increase in the dataset size assists in maintaining the image quality [18].
In this study, data augmentation methods such as rotation, scaling, and height and width shifts are used to increase the dataset size from 24 unsupervised BC histology images to 27,249 images.
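The geometric transformations named above can be illustrated with a minimal sketch. It is not the implementation used in this study: the helper names (`rotate90`, `shift`, `augment`) are hypothetical, rotation is restricted to multiples of 90 degrees, scaling is omitted, and plain Python lists stand in for image arrays so that the example has no dependencies.

```python
def rotate90(img):
    """Rotate a 2-D image (list of rows) 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def shift(img, dy, dx, fill=0):
    """Translate an image by (dy, dx) pixels; exposed pixels are set to `fill`."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w:
                out[ny][nx] = img[y][x]
    return out


def augment(img):
    """Return the original image plus rotated and shifted copies."""
    variants = [img]
    rotated = img
    for _ in range(3):                  # 90, 180, and 270 degree rotations
        rotated = rotate90(rotated)
        variants.append(rotated)
    variants.append(shift(img, 0, 1))   # width shift by one pixel
    variants.append(shift(img, 1, 0))   # height shift by one pixel
    return variants
```

Applying several such transformations, with varying parameters, to each source image is what allows a small set of originals to grow into a much larger training set.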

3.1.2. Data Stain Normalization

The emergence of medical imaging has led to advanced CAD systems and AI technologies that assist in digital pathology. The examination of tissue samples in medical images is commonly used to diagnose cancerous diseases, but the analysis of histology images is not always accurate. During the preparation and pre-processing stages, images exhibit various distortions and inconsistencies. These inconsistencies lessen the accuracy of computer-aided diagnosis, thus affecting pathologists’ diagnoses. Therefore, an effective stain normalization method is used to standardize and minimize color inconsistencies and variations in histology images. The authors of [50] review different stain normalization techniques, highlighting the main methodologies, contributions, strengths, and weaknesses, and rank them according to selected performance and accuracy scores.
In this study, color inconsistencies and variations are attributed to the H&E staining procedure, which highlights microscopic nucleic features, tissue structures, and image transformations resulting from data augmentation. Laboratory slide preparation, examination, analysis, and the digitalization of scanning samples are other factors that lead to image variations [51]. These factors negatively impact the training and testing of neural networks. Consequently, this study utilizes the Macenko et al. [22] stain normalization technique for BC images to separate nucleus features from tissue structures. The images in the dataset are first converted from the BGR to the RGB color space to enable smooth stain normalization.
Macenko stain normalization: This technique standardizes the color appearance of stained tissue slide images. Image colors are converted to their optical density (OD) equivalents via a simple logarithmic transformation, as shown below:
$$OD = -\log_{10}(I)$$
where $I$ is the RGB color vector, with each component normalized to [0, 1].
A threshold value $\beta$ is used to remove near-transparent pixels, i.e., data with a low OD intensity.
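As a minimal sketch of these first two steps, the following converts a normalized RGB triple to optical density and applies the threshold. It assumes the usual Macenko convention, in which OD is the negative base-10 logarithm of the intensity and pixels whose OD falls below β are discarded; the function names and the `eps` guard against log(0) are illustrative choices, not taken from the study.

```python
import math


def optical_density(rgb, eps=1e-6):
    """Convert an RGB triple with components in [0, 1] to optical density.

    Components are clamped to `eps` so that pure black does not produce log(0).
    """
    return tuple(-math.log10(max(c, eps)) for c in rgb)


def keep_pixel(od, beta=0.15):
    """Keep a pixel only if every OD component is at least `beta`."""
    return all(c >= beta for c in od)
```

For example, a white background pixel (1.0, 1.0, 1.0) has an OD of zero in every channel and is therefore dropped before the stain vectors are estimated.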
Singular value decomposition (SVD) is applied to the optical density tuples to create a plane corresponding to the two largest singular values. The OD-transformed pixels are then projected onto this plane to determine the angle of each point with respect to the first SVD direction. The color space transformation is applied to the original BC histology image. The image histogram is stretched such that the range covers the lower $(100 - \beta)\%$ of the data.
Minimum and maximum vectors are calculated and projected back to the OD space. The hematoxylin stain corresponds to the minimum vector, while the eosin stain is the maximum vector. Stain concentrations are determined to form a matrix representing the RGB channels and OD intensities, respectively. This study sets the values of α and β at 1 and 0.15, respectively. Figure 2 shows the original, augmented, normalized H&E, normalized H, and normalized E breast cancer histology images, respectively. After this stain normalization process, our proposed approach focuses on the normalized H image (image with only nucleus objects), having isolated it from the greater normalized H&E image set.
Figure 2. Original H&E image, augmented H&E image, normalized H&E image, normalized H image, and normalized E image, respectively.

3.2. Image Enhancement

Dataset image enhancement improves the brightness, contrast, and scaling to compensate for the non-uniformity of image illumination. This study utilizes thresholding, morphology operations, and a distance transform to enhance the images in the dataset.

3.2.1. Thresholding

Binary thresholding is used to capture the outline of a BC ROI in the normalized histology image/images and is shown in Figure 3.
Figure 3. Images after binary thresholding.

3.2.2. Morphology Operations

These include dilation and erosion operations to remove noise, remove overlapping edges, and extract certain regions from the BC histology images. Opening and closing operations are used to distinguish between the background and the foreground in an image. These regions are separated by diminishing and accentuating image pixels and edges. These operations also highlight the unknown area between the background and the foreground. Figure 4 shows images resulting from morphology operations.
Figure 4. Images after morphology operations, namely erosion and dilation operations and opening operations (clearing borders), respectively.
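The effect of erosion, dilation, and opening on a binary mask can be sketched as follows. This is an illustrative example only: it assumes a 3x3 square structuring element and ignores neighbors that fall outside the image, neither of which is specified in this study, and the helper names are hypothetical.

```python
def _neighbourhood_filter(mask, reducer):
    """Apply `reducer` (min or max) over each pixel's in-bounds 3x3 neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[ny][nx]
                for ny in range(y - 1, y + 2)
                for nx in range(x - 1, x + 2)
                if 0 <= ny < h and 0 <= nx < w
            ]
            out[y][x] = reducer(window)
    return out


def erode(mask):
    """A pixel stays 1 only if its whole neighbourhood is 1."""
    return _neighbourhood_filter(mask, min)


def dilate(mask):
    """A pixel becomes 1 if any pixel in its neighbourhood is 1."""
    return _neighbourhood_filter(mask, max)


def opening(mask):
    """Erosion followed by dilation: removes specks smaller than the element."""
    return dilate(erode(mask))
```

Opening a mask that contains a 3x3 block and an isolated single pixel keeps the block and removes the single pixel, which is the noise-removal behavior described above.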

3.2.3. Distance Transform

This isolates nucleus objects in the image by locating the foreground and deleting remaining ROIs. It also highlights and emphasizes the foreground objects and background of the BC histology image. Figure 5 shows the image after distance transformation.
Figure 5. Image after distance transformation.
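A distance transform assigns each foreground pixel its distance to the nearest background pixel, so that the centers of nucleus objects receive the largest values. The sketch below uses a two-pass city-block (L1) distance as an assumption; the distance metric used in this study is not stated, and the function name is illustrative.

```python
def distance_transform(mask):
    """City-block distance from each foreground pixel (1) to the nearest background pixel (0)."""
    h, w = len(mask), len(mask[0])
    far = h + w  # upper bound on any city-block distance inside the image
    dist = [[far if mask[y][x] else 0 for x in range(w)] for y in range(h)]

    # Forward pass: propagate distances from the top-left.
    for y in range(h):
        for x in range(w):
            if dist[y][x]:
                if y > 0:
                    dist[y][x] = min(dist[y][x], dist[y - 1][x] + 1)
                if x > 0:
                    dist[y][x] = min(dist[y][x], dist[y][x - 1] + 1)

    # Backward pass: propagate distances from the bottom-right.
    for y in range(h - 1, -1, -1):
        for x in range(w - 1, -1, -1):
            if dist[y][x]:
                if y < h - 1:
                    dist[y][x] = min(dist[y][x], dist[y + 1][x] + 1)
                if x < w - 1:
                    dist[y][x] = min(dist[y][x], dist[y][x + 1] + 1)
    return dist
```

Thresholding the resulting map at a fraction of its maximum is one common way to retain only the confident interior of each object, although the exact rule applied here is not specified.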

3.3. Segmentation

Nucleus segmentation is handled by the connected components analysis method and the active contours segmentation method.

3.3.1. Connected Components Analysis

The connected components analysis (CCA) method combines pixel components with similar neighborhood properties. Image pixels are grouped into connected darker and brighter regions: darker regions form the background, while brighter regions form the foreground. The CCA method extracts these ROIs as binary masks from BC histology images. These binary mask ROIs are shown in Figure 6.
Figure 6. Image after connected components analysis.
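Connected components labeling on a binary mask can be sketched with a breadth-first flood fill. The example assumes 4-connectivity, which is not specified in this study, and `label_components` is a hypothetical name.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected foreground regions of a binary mask.

    Returns (labels, count), where `labels` holds 0 for background and
    1..count for the pixels of each component.
    """
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if mask[sy][sx] and not labels[sy][sx]:
                count += 1
                labels[sy][sx] = count
                queue = deque([(sy, sx)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and mask[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

Each label can then be turned into its own binary mask by comparing the label image against that label value.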
The detection of darker and brighter regions is achieved by the Laplacian of Gaussian approach, using a convolutional kernel of the form
$$LoG(x, y) = \frac{x^2 + y^2 - 2\sigma^2}{\sigma^4} \, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$
such that σ is the kernel width.
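A discrete kernel of this kind can be built as in the sketch below, which assumes the standard unnormalized Laplacian of Gaussian, $(x^2 + y^2 - 2\sigma^2)/\sigma^4 \cdot e^{-(x^2 + y^2)/(2\sigma^2)}$. The default radius of three standard deviations and the function name are illustrative choices.

```python
import math


def log_kernel(sigma, radius=None):
    """Build a (2*radius + 1) square Laplacian-of-Gaussian kernel."""
    if radius is None:
        radius = int(math.ceil(3 * sigma))  # covers most of the Gaussian support
    kernel = []
    for y in range(-radius, radius + 1):
        row = []
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            value = (r2 - 2 * sigma ** 2) / sigma ** 4 * math.exp(-r2 / (2 * sigma ** 2))
            row.append(value)
        kernel.append(row)
    return kernel
```

With this sign convention the kernel is negative at its center and positive in the surrounding ring, so convolving it with an image gives strongly negative responses at bright blobs and strongly positive responses at dark blobs whose size matches σ.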

3.3.2. Active Contours Segmentation

Active contours have been widely used in segmenting medical images, especially in computer vision tasks, to describe the boundary shapes in an image. They are widely utilized to resolve cases where the approximate shape of a boundary/edge is unknown. The active contours model adapts and evolves according to image color variations, enabling the matching and tracking of object boundaries/edges. The active contours method also accentuates elusive boundary (contours) shapes by ignoring missing/inhomogeneous boundary/edge information.
The authors in [52] propose an active contours-based model (snake) to detect the boundaries of objects from deformed initial contours. The deformed contours use an energy function that decreases when the snake perfectly fits the object boundary in an image. As the number of objects in a medical image increases, the snake approach experiences difficulties in segmenting the image. Therefore, the numerical procedures proposed in [53,54] are used to detect topology changes automatically in the image.
The resultant binary-masked images from the previous connected components analysis are then used for contour detection. Multiple boundaries of objects are detected, as shown below.
Let $C(p, t) : [0, 1] \to \mathbb{R}^2$ denote a family of curves resulting from the motion of an initial curve $C_0(p)$ in the direction of its inward Euclidean normal vector $N$. Let $I$ denote the image in which the object boundaries are to be identified and detected. We assume that the plane evolution of the curve is given by
$$\frac{\partial C}{\partial t} = g(I)\,(k + v)\,N, \qquad C(p, t = 0) = C_0(p) \ \text{(initial curve)},$$
where v is a constant, k is the local curvature, N is the unit vector normal to the curve, and g ( I ) is a factor related to the image content.
We assume that the deforming curve $C(p, t)$ is the zero level set of a function $U$, i.e., $C(p, t)$ is the set of points $(x, y, t)$ satisfying $U(x, y, t) = 0$. Given Equation (2) and the derivative of $U(x, y, t) = 0$ with respect to space and time, the deformation of $C(p, t)$ is given by the deformation of the surface $U(x, y, t)$, whose evolution is given by
$$\frac{\partial U(x, y, t)}{\partial t} = g(I)\left(\mathrm{div}\left(\frac{\nabla U}{|\nabla U|}\right) + v\right)|\nabla U|, \qquad U(x, y, t = 0) = U_0(x, y).$$
Here, $|\nabla U|$ denotes the magnitude of the gradient, $\mathrm{div}$ denotes the divergence operator, and $U_0$ is the level set representation of $C_0$.
Consequently, the authors in [55,56] presented the concept of geodesic active contours, which results in a geometric model given by
$$\frac{\partial U(x, y, t)}{\partial t} = g(I)\left(\mathrm{div}\left(\frac{\nabla U}{|\nabla U|}\right) + v\right)|\nabla U| + \nabla g \cdot \nabla U, \qquad U(x, y, t = 0) = U_0(x, y).$$
From these results, a field $U(x, y, t)$ is generated, whose null positions correspond to the active contour locations at any given evolution time. In these equations, $v$ is a constant that constrains the active contours from either expanding or shrinking and is a function of the method used to draw the initial coarse contour. This constant is key within the model because it allows the initial curve to acquire a non-convex shape. The term $\mathrm{div}(\nabla U / |\nabla U|)$ denotes the curvature of the level set passing by a point and determines the regularizing effect of the model.
The function g ( I ) is utilized as a stopping factor in the evolving curve; specifically, the factor is small near an edge/boundary so as to stop the evolution when the contour moves close to the edge. The function g ( I ) is expressed as
$$g(I) = \frac{1}{1 + |\nabla I_1|^p},$$
where $I_1$ results from the low-pass Gaussian filtering of image $I$ and $p = 1$ or $2$; other expressions of $g(I)$ can be used to monitor other features. The effect of the term $\nabla g \cdot \nabla U$ is to capture the evolving contour as it moves towards an edge and push it back if it crosses the edge. Therefore, unlike the conventional snake method, the geometric active contours model is stable, handles topological changes, namely splitting and merging, and is devoid of any computational problems. The active contours method extracts ROIs, as shown in Figure 7.
Figure 7. Active contours in histology images.
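The behavior of the stopping factor can be illustrated with a short sketch. It assumes the form $g = 1 / (1 + |\nabla I_1|^p)$, takes an already smoothed image as input, and approximates the gradient with central differences clamped at the image border; the function names are hypothetical and the full level set evolution is not implemented here.

```python
import math


def gradient_magnitude(img):
    """Approximate |grad I| with central differences, clamped at the border."""
    h, w = len(img), len(img[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            gx = (img[y][min(x + 1, w - 1)] - img[y][max(x - 1, 0)]) / 2.0
            gy = (img[min(y + 1, h - 1)][x] - img[max(y - 1, 0)][x]) / 2.0
            mag[y][x] = math.hypot(gx, gy)
    return mag


def edge_stopping(smoothed, p=2):
    """Return g = 1 / (1 + |grad I|^p) for every pixel of a smoothed image."""
    return [[1.0 / (1.0 + m ** p) for m in row]
            for row in gradient_magnitude(smoothed)]
```

In flat regions the gradient is zero and $g = 1$, so the contour moves freely; across a step of height 10 the value drops to about 1/26 for $p = 2$, which slows the contour near the edge.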

3.4. Detection

Geometric features are the most important extraction features when detecting cancerous lesions in BC histology images. The previous active contours segmentation method extracts these geometric features, resulting in highlighted edge boundaries and the isolation of BC lesions from other ROIs in the image. Further, it is necessary to address oversegmentation brought about by the active contours segmentation failing to resolve the image edges’ inhomogeneity. This inhomogeneity leads to a high frequency of non-cancerous lesions (false positives) in the image.
Consequently, models experience longer processing times, leading to the poor performance of CAD systems. The authors in [41] adopt blob sensitivity parameters to select caries candidates and eliminate false positives from carious candidates. The present study uses a similar blob detection approach with high blobness values and low blobness values indicating BC and non-BC candidates, respectively. Therefore, the maximum and mean blobness values of cancerous candidates are used to eliminate false positives. The formulas are defined as
$$\mathrm{Blobness}_{mean}(T) = \frac{\sum_{p \in T} \mathrm{Blobness}(\lambda_p)}{N_T}$$
$$\mathrm{Blobness}_{max}(T) = \max_{p \in T} \mathrm{Blobness}(\lambda_p)$$
Here, $T$ refers to a cancerous candidate with $N_T$ voxels, and $p$ is a voxel belonging to that candidate. The size of the blobs is another feature extracted to aid in selecting BC candidates. The size feature is used because cancerous candidate detection based on Hessian analysis is sensitive to image intensity variations, which lead to false positives. The linear regression model [57] is applied to eliminate false positives within BC candidates. The cancerous selection function $L_s(T)$ is defined as
$$Z(T) = \beta_0 + \sum_{i=1}^{N_f} \beta_i x_i$$
$$L_s(T) = \frac{1}{1 + e^{-Z(T)}}$$
where $N_f$ is the number of features, $x_i$ is the feature value, $\beta_0$ is a constant coefficient, and $\beta_i$ is the corresponding coefficient estimated by the linear regression model. No threshold value is needed to determine the cancerous lesions after eliminating false positives, since geometric features have already been extracted via the active contours method in the segmentation step. Lastly, the remaining selected cancerous candidates are classified as “BC detected” and are shown in Figure 8.
Figure 8. BC detection after blob detection in masked images.
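The candidate features and the selection function can be sketched as follows. The per-voxel blobness values, the feature vector, and the coefficients are all placeholders: in this study the coefficients are estimated from data, and the sketch assumes that the selection function applies a sigmoid, $1 / (1 + e^{-Z(T)})$, to the weighted sum $Z(T)$.

```python
import math


def blobness_mean(values):
    """Mean blobness over the voxels of one candidate."""
    return sum(values) / len(values)


def blobness_max(values):
    """Maximum blobness over the voxels of one candidate."""
    return max(values)


def selection_score(features, coeffs, intercept):
    """Compute L_s(T) = 1 / (1 + exp(-Z)), with Z = intercept + sum(coeff * feature)."""
    z = intercept + sum(b * x for b, x in zip(coeffs, features))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical candidate: per-voxel blobness values and a blob size in pixels.
voxels = [0.2, 0.4, 0.9]
features = [blobness_mean(voxels), blobness_max(voxels), 12.0]
score = selection_score(features, coeffs=[1.5, 2.0, 0.05], intercept=-2.0)
```

The score lies between 0 and 1 and increases with each feature that has a positive coefficient, so candidates with high blobness are ranked as more likely to be BC lesions.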
The effect of the proposed blob detection method for unsupervised BC histology images was evaluated using a deep learning recurrent neural network model. The recurrent neural network architecture consisted of eight layers: one input dense layer, three hidden dense layers, one output dense layer, and three dropout layers. The categorical cross-entropy loss was minimized using the Adam optimizer, with a learning rate of 0.0001, a batch size of 32, a dropout rate of 0.2, and 30 epochs. We chose these hyperparameters based on iterative model experiments.

4. Results and Discussion

The experimental results of this study are based on a performance evaluation of the automatic blob detection method on unsupervised BC histology images. These experiments were carried out on 27,249 unsupervised BC histology images split into 20,436 training and 6813 testing set images. The data preparation, pre-processing, image enhancement, segmentation, and detection steps applied to the unsupervised BC dataset have been discussed in the preceding section.
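The reported counts are consistent with a 75/25 split of the 27,249 images. The sketch below reproduces that arithmetic; the ratio is inferred from the counts rather than stated in this study, and the shuffling, seed, and function name are assumptions made for illustration.

```python
import random


def train_test_split(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of `items` and split it into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)  # floor of the training share
    return shuffled[:cut], shuffled[cut:]


# Image indices stand in for the 27,249 augmented images.
train, test = train_test_split(range(27249))
```

With these settings the training set holds 20,436 items and the test set holds the remaining 6,813.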
Figure 1 shows an overview of the processing stages involved in the proposed detection method, namely pre-processing, image enhancement, segmentation, and detection. Consequently, the performance evaluation of the proposed approach is based on its automatic ability to detect cancerous lesions in unsupervised breast histology images, as shown in Figure 9.
Figure 9. BC detection after blob detection in original images.
These image results demonstrate the efficacy of the proposed method in detecting cancerous lesions in other histology images; it is not limited to breast histology images. Breast cancerous lesions are detected via their spread on tissues, signified by irregularities in the histology images. The H&E stain separation and normalization technique aids the separation of image pixels into hematoxylin (blue color), denoting nucleic features, and eosin (pink color), denoting tissue structures. Therefore, any other breast lesion on either lobules or ducts can be detected by separating nucleus objects from non-nucleus objects in the histology images, demonstrating the applicability of the proposed approach.
The proposed method uses various approaches to improve the efficiency in detecting BC in histology images. These approaches include data augmentation to deal with the small size of the available dataset and stain normalization to resolve color inconsistencies resulting from augmentation. Further, image enhancement is handled through thresholding and various morphology operations, namely dilation, erosion, and the distance transform. These enhancement techniques highlight the BC ROIs and remove noise and overlapping objects, thus isolating nucleic objects from non-nucleic objects.
The image results obtained from the enhancement phase are then segmented via the connected components analysis method and the active contours method. These methods are utilized to group components with similar characteristics into binary masks and resolve the ROI edges/boundaries’ inhomogeneity. Geometric features extracted from the resultant images are used to resolve any remaining blurred ROI edges/boundaries via the blob detection method. Consequently, this addresses oversegmentation and results in isolated BC lesions.
Our proposed technique achieved significant results in detecting cancerous lesions throughout the unsupervised breast histology image dataset. These results are attributed to various factors, namely dataset augmentation, stain normalization, image enhancement via morphology operations, and the mentioned segmentation methods.
Table 1 shows a comparison between our proposed approach and other state-of-the-art ROI detection-related techniques. From the literature reviewed, some of the methods discussed utilize image patches on individual images for the faster segmentation of whole slide images (WSI) and also aid data augmentation. Transfer learning has also been used in various methods in the literature to automatically extract feature vectors.
Table 1. Performance evaluation and comparison with state-of-the-art methods.
The DL methods discussed herein that use supervised images tend to produce high-performance results since the images are previously annotated and already pre-processed. There is also the use of pre-trained models such as VGG19 and ResNet to automatically segment, detect, and classify various ROIs. The preferred method in dealing with color inconsistencies is the one in [22] since it targets the specific nucleus ROIs.
For those methods discussed herein that do not utilize stain normalization but have produced significant results, emphasis is placed on the segmentation/detection methods applied to extract the necessary features. Most methods dealing with unsupervised images tend to focus on image pre-processing and stain normalization to assist in resolving data scarcity issues and color irregularities, respectively. Image enhancement techniques also play a pivotal role in highlighting and isolating important image features; thus, the utilization of morphology operations has been discussed in the literature. Segmentation and detection methods that use the obtained results from image enhancement stages tend to produce significant results due to focused image processing.
Figure 10 shows the model performance before weight regularization techniques are applied, thus exhibiting the presence of overfitting.
Figure 10. Model training/validation loss/accuracy graph curves before weight regularization techniques are applied.
Figure 11 visually shows the model’s performance on unsupervised BC histology images—specifically, the detection of BC after fine-tuning its parameters. Weight regularization techniques, namely dropout and early stopping, were applied to deal with overfitting.
Figure 11. Model training/validation loss/accuracy graph curves after dropout and early stopping are applied.

Limitations

Oversegmentation and the remaining edge inhomogeneity in the active contours method used herein subsequently affect blob candidate selection. Moreover, most segmentation/detection methods tend to perform segmentation/detection on whole slide images (WSI) rather than image patches, thus increasing the model’s computational time.
Additionally, there is reluctance among pathologists to invest in CAD systems due to the large number of false positive results. Therefore, it is necessary to introduce deep learning neural networks within CAD systems to assist in the early diagnosis and treatment of breast cancer. Consequently, the proposed automatic blob detection method for cancerous lesions in human breast histology images can be used to isolate nucleus objects from other tissue structures in an image dataset, leading to faster model computability.

5. Conclusions

Recently, we have seen the increased utilization of CAD systems for the analysis of medical images, including breast histology images, and studies of how they can facilitate and assist in the early diagnosis of cancerous lesions, as well as segmentation and detection. Segmentation and detection are key tasks that aid imaging analysts in obtaining important information from medical images, including histology images.
Several computer-aided diagnosis systems have emerged that aid the extraction of ROIs to identify objects with similar features and characteristics in images for exploration purposes. These systems allow medical practitioners to interpret medical images and thus improve the efficiency of diagnosis and treatment tasks. Segmentation and detection tasks have been improved with the introduction of these automatic diagnostic systems.
This study has shown the importance of prior image processing in effectively detecting cancerous lesions in breast histology images. The proposed approach has been systematically broken down into stages, namely pre-processing, image enhancement, segmentation, and detection. The preparation of the dataset involves resolving data scarcity via data augmentation, while stain normalization addresses color inconsistencies resulting from augmentation.
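As an illustration of how augmentation enlarges a dataset, the sketch below generates flipped and rotated variants of one image. The specific transforms are an assumption (geometric flips and 90-degree rotations are common choices for histology patches) and are not necessarily the ones used in this study; images are represented as plain nested lists to keep the example self-contained.

```python
def hflip(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def vflip(img):
    """Mirror the image top to bottom."""
    return [row[:] for row in img[::-1]]


def rot90_cw(img):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*img[::-1])]


def augment(img):
    """Return the original image plus five geometric variants."""
    r90 = rot90_cw(img)
    r180 = rot90_cw(r90)
    r270 = rot90_cw(r180)
    return [img, hflip(img), vflip(img), r90, r180, r270]


# Toy 2 x 2 "image" with distinct pixel values so each transform is visible.
toy = [[1, 2],
       [3, 4]]
variants = augment(toy)
print(len(variants), variants[3])
```

Here one input yields six training samples; the label of the original patch is assumed to remain valid for every variant.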
Thresholding and morphology operations are used to remove noise and enhance unclear features in the resultant images, respectively. The segmentation methods used in this study specifically target nucleus regions in histology images by grouping components with similar characteristics, removing overlapping and non-nucleus objects, and resolving edge boundary inhomogeneity while topographically distinguishing between different image regions, namely nucleus ROIs, the foreground, and the background.
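The thresholding, morphological opening, and connected-component steps summarized above can be sketched on a toy image. This is a simplified illustration under stated assumptions, not the pipeline used in this study: the threshold value, the 3 x 3 structuring element, the 4-connectivity, and the toy pixel values are all hypothetical, and the distance transform and active contours stages are omitted.

```python
from collections import deque


def threshold(image, t):
    """Binarize: 1 where the intensity is below t (dark, nucleus-like pixels), else 0."""
    return [[1 if v < t else 0 for v in row] for row in image]


def _morph(mask, keep_if):
    """Apply a 3x3 structuring element; pixels outside the image count as 0."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                mask[y + dy][x + dx] if 0 <= y + dy < h and 0 <= x + dx < w else 0
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)
            ]
            out[y][x] = 1 if keep_if(window) else 0
    return out


def erode(mask):
    return _morph(mask, all)   # keep a pixel only if its whole 3x3 window is set


def dilate(mask):
    return _morph(mask, any)   # set a pixel if any pixel in its 3x3 window is set


def opening(mask):
    """Erosion followed by dilation: removes objects smaller than the element."""
    return dilate(erode(mask))


def label_components(mask):
    """Label 4-connected foreground regions with a breadth-first search."""
    h, w = len(mask), len(mask[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y][x] and not labels[y][x]:
                count += 1
                labels[y][x] = count
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


# Toy 7 x 9 grayscale image: bright background, two dark 3x3 blobs, one noise pixel.
image = [[200] * 9 for _ in range(7)]
for y in range(1, 4):
    for x in range(1, 4):
        image[y][x] = 50
for y in range(3, 6):
    for x in range(5, 8):
        image[y][x] = 50
image[0][7] = 60  # isolated dark pixel standing in for noise

mask = threshold(image, 128)
_, raw_count = label_components(mask)
cleaned = opening(mask)
labels, blob_count = label_components(cleaned)
print(raw_count, blob_count)
```

Before opening, the noise pixel is counted as a third component; after opening only the two blobs remain, each with its own label that can serve as a binary mask for later stages.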
The outcome of the entire proposed technique is the development of a deep learning recurrent neural network that automatically detects cancerous lesions in unsupervised breast histology images. The neural network performs the detection task effectively and produces significant results. From the literature discussed and the results obtained with the proposed method, image pre-processing and image enhancement techniques, particularly augmentation and stain normalization, are crucial to the segmentation and detection task. These methods play a pivotal role in image feature extraction and in the detection of cancerous lesions in histology images, in turn improving overall performance and computational efficiency.

6. Future Work

CAD systems are necessary and offer a faster and better diagnosis to assist in early treatment. These systems offer an alternative solution for medical practitioners when detecting BC lesions in histology images.
This study provides a valuable solution that enables models to correctly identify, separate, and detect cancerous lesions (nucleus objects and their edges) in BC histology images, given the significant results obtained compared to other methods. The encouraging results of automatic blob detection offer insights and further avenues for exploration. Future work is not limited to breast histology images or unsupervised images. Exploration areas include medical fields with publicly available datasets and the introduction of hybrid approaches. Additionally, weight regularization methods and the fine-tuning of supervised and unsupervised models can be explored to improve the model’s efficacy and reduce overfitting.
These perspectives are discussed in detail below.
  • Data availability and integrity. Most deep learning approaches require huge volumes of data to achieve meaningful performance results. Therefore, publicly available image datasets are necessary, especially histology image datasets, to assist deep learning.
  • Regularization methods. These are needed to improve the performance of models. This can be achieved through model hyperparameter tuning, such as optimizing the learning rates, dropout, loss functions, activation functions, and early stopping methods.
  • Hybrid image processing/model approaches. Combining several image processing methods or model architectures could yield a hybrid method that improves the overall evaluation performance. This combination can occur at any step in the model, such as pre-processing, or by combining attributes of different models into one that enhances the training, extraction, detection, and classification tasks. Additionally, future work could move beyond BC histology images to explore and diagnose other human and animal diseases through image datasets.
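The hyperparameter tuning mentioned under regularization methods can be sketched as a small grid search. The search space, the candidate values, and the scoring function below are hypothetical placeholders; in a real experiment the scoring function would train the model with each configuration and return its validation loss.

```python
from itertools import product

# Hypothetical search space; the values are illustrative, not from this study.
search_space = {
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "dropout_rate": [0.2, 0.35, 0.5],
    "early_stopping_patience": [3, 5],
}


def grid(space):
    """Yield one configuration dict per combination of candidate values."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def placeholder_validation_loss(cfg):
    # Stand-in for "train the model and return its validation loss".
    # This formula is NOT a real metric; it only makes the example runnable.
    return abs(cfg["dropout_rate"] - 0.35) + abs(cfg["learning_rate"] - 1e-3)


configs = list(grid(search_space))
best = min(configs, key=placeholder_validation_loss)
print(len(configs), best)
```

An exhaustive grid grows multiplicatively with each added hyperparameter, so random or staged searches are often preferred once the space becomes large.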

Author Contributions

Conceptualization, V.M., E.M., Z.W., and D.K.M.; methodology, V.M., E.M., Z.W., and D.K.M.; software, V.M. and D.K.M.; validation, V.M., E.M., Z.W., and D.K.M.; formal analysis, V.M. and D.K.M.; investigation, V.M. and D.K.M.; resources, V.M., E.M., Z.W., and D.K.M.; data curation, V.M. and D.K.M.; writing (original draft preparation), V.M. and D.K.M.; writing (review and editing), V.M., E.M., Z.W., and D.K.M.; visualization, V.M. and D.K.M.; supervision, E.M. and Z.W.; project administration, E.M. and Z.W. All authors have read and agreed to the published version of the manuscript.

Funding

The APC was funded by The University of South Africa.

Institutional Review Board Statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the University of South Africa, College of Science, Engineering, and Technology, School of Computing Ethics Research Committee.

Data Availability Statement

The data used to support the findings of this study can be obtained from the corresponding authors upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Elston, C.W.; Ellis, I.O. Pathological prognostic factors in breast cancer. I. The value of histological grade in breast cancer: Experience from a large study with long-term follow-up. Histopathology 1991, 19, 403–410. [Google Scholar] [PubMed]
  2. Krithiga, R.; Geetha, P. Breast cancer detection, segmentation and classification on histopathology images analysis: A systematic review. Arch. Comput. Methods Eng. 2021, 28, 2607–2619. [Google Scholar]
  3. Fox, H. Is H&E morphology coming to an end? J. Clin. Pathol. 2000, 53, 38–40. [Google Scholar] [PubMed]
  4. Wang, P.; Hu, X.; Li, Y.; Liu, Q.; Zhu, X. Automatic cell nuclei segmentation and classification of breast cancer histopathology images. Signal Process. 2016, 122, 1–13. [Google Scholar]
  5. Ali, S.; Madabhushi, A. Segmenting multiple overlapping objects via a hybrid active contour model incorporating shape priors: Applications to digital pathology. In Proceedings of the Medical Imaging 2011, Lake Buena Vista, FL, USA, 13–17 February 2011; Volume 7962, pp. 909–921. [Google Scholar]
  6. Venkataraman, G.; Rycyna, K.; Rabanser, A.; Heinze, G.; Baesens, B.M.; Ananthanarayanan, V.; Paner, G.P.; Barkan, G.A.; Flanigan, R.C.; Wojcik, E.M. Morphometric signature differences in nuclei of Gleason pattern 4 areas in Gleason 7 prostate cancer with differing primary grades on needle biopsy. J. Urol. 2009, 181, 88–94. [Google Scholar]
  7. Lal, S.; Desouza, R.; Maneesh, M.; Kanfade, A.; Kumar, A.; Perayil, G.; Alabhya, K.; Chanchal, A.K.; Kini, J. A robust method for nuclei segmentation of H&E stained histopathology images. In Proceedings of the 2020 7th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 27–28 February 2020; pp. 453–458. [Google Scholar]
  8. Kaushal, C.; Singla, A. Automated segmentation technique with self-driven post-processing for histopathological breast cancer images. CAAI Trans. Intell. Technol. 2020, 5, 294–300. [Google Scholar]
  9. Zebari, D.A.; Zeebaree, D.Q.; Abdulazeez, A.M.; Haron, H.; Hamed, H.N.A. Improved threshold based and trainable fully automated segmentation for breast cancer boundary and pectoral muscle in mammogram images. IEEE Access 2020, 8, 203097–203116. [Google Scholar]
  10. Kiran, I.; Raza, B.; Ijaz, A.; Khan, M.A. DenseRes-Unet: Segmentation of overlapped/clustered nuclei from multi organ histopathology images. Comput. Biol. Med. 2022, 143, 105267. [Google Scholar]
  11. Irshad, H.; Veillard, A.; Roux, L.; Racoceanu, D. Methods for nuclei detection, segmentation, and classification in digital histopathology: A review—current status and future potential. IEEE Rev. Biomed. Eng. 2013, 7, 97–114. [Google Scholar]
  12. Aswathy, M.; Jagannath, M. Detection of breast cancer on digital histopathology images: Present status and future possibilities. Informatics Med. Unlocked 2017, 8, 74–79. [Google Scholar]
  13. Reshma, V.; Arya, N.; Ahmad, S.S.; Wattar, I.; Mekala, S.; Joshi, S.; Krah, D. Detection of breast cancer using histopathological image classification dataset with deep learning techniques. BioMed Res. Int. 2022, 2022, 8363850. [Google Scholar]
  14. Xie, J.; Liu, R.; Luttrell, J., IV; Zhang, C. Deep learning based analysis of histopathological images of breast cancer. Front. Genet. 2019, 10, 80. [Google Scholar]
  15. Gecer, B.; Aksoy, S.; Mercan, E.; Shapiro, L.G.; Weaver, D.L.; Elmore, J.G. Detection and classification of cancer in whole slide breast histopathology images using deep convolutional networks. Pattern Recognit. 2018, 84, 345–356. [Google Scholar] [PubMed]
  16. Toğaçar, M.; Özkurt, K.B.; Ergen, B.; Cömert, Z. BreastNet: A novel convolutional neural network model through histopathological images for the diagnosis of breast cancer. Phys. A Stat. Mech. Its Appl. 2020, 545, 123592. [Google Scholar]
  17. Mohanakurup, V.; Parambil Gangadharan, S.M.; Goel, P.; Verma, D.; Alshehri, S.; Kashyap, R.; Malakhil, B. Breast cancer detection on histopathological images using a composite dilated Backbone Network. Comput. Intell. Neurosci. 2022, 2022, 8517706. [Google Scholar]
  18. Araújo, T.; Aresta, G.; Castro, E.; Rouco, J.; Aguiar, P.; Eloy, C.; Polónia, A.; Campilho, A. Classification of breast cancer histology images using convolutional neural networks. PLoS ONE 2017, 12, e0177544. [Google Scholar]
  19. Vesal, S.; Ravikumar, N.; Davari, A.; Ellmann, S.; Maier, A. Classification of breast cancer histology images using transfer learning. In Proceedings of the Image Analysis and Recognition: 15th International Conference, ICIAR 2018, Póvoa de Varzim, Portugal, 27–29 June 2018; pp. 812–819. [Google Scholar]
  20. Feng, Y.; Zhang, L.; Yi, Z. Breast cancer cell nuclei classification in histopathology images using deep neural networks. Int. J. Comput. Assist. Radiol. Surg. 2018, 13, 179–191. [Google Scholar]
  21. Sohail, A.; Khan, A.; Nisar, H.; Tabassum, S.; Zameer, A. Mitotic nuclei analysis in breast cancer histopathology images using deep ensemble classifier. Med. Image Anal. 2021, 72, 102121. [Google Scholar]
  22. Macenko, M.; Niethammer, M.; Marron, J.S.; Borland, D.; Woosley, J.T.; Guan, X.; Schmitt, C.; Thomas, N.E. A method for normalizing histology slides for quantitative analysis. In Proceedings of the 2009 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Boston, MA, USA, 28 June–1 July 2009; pp. 1107–1110. [Google Scholar]
  23. Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Netw. 2018, 106, 249–259. [Google Scholar]
  24. Golatkar, A.; Anand, D.; Sethi, A. Classification of breast cancer histology using deep learning. In Proceedings of the Image Analysis and Recognition: 15th International Conference, ICIAR 2018, Póvoa de Varzim, Portugal, 27–29 June 2018; pp. 837–844. [Google Scholar]
  25. Yu, C.; Chen, H.; Li, Y.; Peng, Y.; Li, J.; Yang, F. Breast cancer classification in pathological images based on hybrid features. Multimed. Tools Appl. 2019, 78, 21325–21345. [Google Scholar]
  26. Liu, X.; Liu, J.; Feng, Z.; Xu, X.; Tang, J. Mass classification in mammogram with semi-supervised relief based feature selection. In Proceedings of the Fifth International Conference on Graphic and Image Processing (ICGIP 2013), Hong Kong, China, 26–27 October 2013; Volume 9069, pp. 252–256. [Google Scholar]
  27. George, K.; Faziludeen, S.; Sankaran, P. Breast cancer detection from biopsy images using nucleus guided transfer learning and belief based fusion. Comput. Biol. Med. 2020, 124, 103954. [Google Scholar]
  28. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110. [Google Scholar]
  29. Sornapudi, S.; Stanley, R.J.; Stoecker, W.V.; Almubarak, H.; Long, R.; Antani, S.; Thoma, G.; Zuna, R.; Frazier, S.R. Deep learning nuclei detection in digitized histology images by superpixels. J. Pathol. Informatics 2018, 9, 5. [Google Scholar]
  30. Dinh, T.L.; Kwon, S.G.; Lee, S.H.; Kwon, K.R. Breast tumor cell nuclei segmentation in histopathology images using efficientunet++ and multi-organ transfer learning. J. Korea Multimed. Soc. 2021, 24, 1000–1011. [Google Scholar]
  31. Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. Mobilenetv2: Inverted residuals and linear bottlenecks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 4510–4520. [Google Scholar]
  32. Rashmi, R.; Prasad, K.; Udupa, C.B.K. Breast histopathological image analysis using image processing techniques for diagnostic purposes: A methodological review. J. Med. Syst. 2022, 46, 7. [Google Scholar]
  33. Veta, M.; Huisman, A.; Viergever, M.A.; van Diest, P.J.; Pluim, J.P. Marker-controlled watershed segmentation of nuclei in H&E stained breast cancer biopsy images. In Proceedings of the 2011 IEEE International Symposium on Biomedical Imaging: From Nano to Macro, Chicago, IL, USA, 30 March–2 April 2011; pp. 618–621. [Google Scholar]
  34. Natarajan, V.A.; Kumar, M.S.; Patan, R.; Kallam, S.; Mohamed, M.Y.N. Segmentation of nuclei in histopathology images using fully convolutional deep neural architecture. In Proceedings of the 2020 International Conference on Computing and Information Technology (ICCIT-1441), Tabuk, Saudi Arabia, 9–10 September 2020; pp. 1–7. [Google Scholar]
  35. Guatemala-Sanchez, V.R.; Peregrina-Barreto, H.; Lopez-Armas, G. Nuclei segmentation on histopathology images of breast carcinoma. In Proceedings of the 2021 43rd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Mexico, 1–5 November 2021; pp. 2622–2628. [Google Scholar]
  36. Xie, L.; Qi, J.; Pan, L.; Wali, S. Integrating deep convolutional neural networks with marker-controlled watershed for overlapping nuclei segmentation in histopathology images. Neurocomputing 2020, 376, 166–179. [Google Scholar]
  37. Mahanta, L.B.; Hussain, E.; Das, N.; Kakoti, L.; Chowdhury, M. IHC-Net: A fully convolutional neural network for automated nuclear segmentation and ensemble classification for Allred scoring in breast pathology. Appl. Soft Comput. 2021, 103, 107136. [Google Scholar]
  38. Niaz, A.; Memon, A.A.; Rana, K.; Joshi, A.; Soomro, S.; Kang, J.S.; Choi, K.N. Inhomogeneous image segmentation using hybrid active contours model with application to breast tumor detection. IEEE Access 2020, 8, 186851–186861. [Google Scholar]
  39. Kaladevi, P.; Kanimozhi, N.; Nirmala, B.; Sivasankari, R. Morpho-contour exponential estimation algorithm for predicting breast tumor growth from MRI imagery. Int. J. Inf. Technol. 2024, 1–16. [Google Scholar]
  40. Xu, Y.; Wu, T.; Gao, F.; Charlton, J.R.; Bennett, K.M. Improved small blob detection in 3D images using jointly constrained deep learning and Hessian analysis. Sci. Rep. 2020, 10, 326. [Google Scholar]
  41. Majanga, V.; Viriri, S. Automatic blob detection for dental caries. Appl. Sci. 2021, 11, 9232. [Google Scholar] [CrossRef]
  42. Xu, Y.; Gao, F.; Wu, T.; Bennett, K.M.; Charlton, J.R.; Sarkar, S. U-net with optimal thresholding for small blob detection in medical images. In Proceedings of the 2019 IEEE 15th International Conference on Automation Science and Engineering (CASE), Vancouver, BC, Canada, 22–26 August 2019; pp. 1761–1767. [Google Scholar]
  43. Ingle, S.; Vidhate, A.; Chaudhari, S. Automatic pectoral muscles and artefacts removal in mammogram images for improved breast cancer diagnosis. Int. J. Bioinform. Res. Appl. 2024, 20, 627–647. [Google Scholar] [CrossRef]
  44. Kumar, T.S.; Sridhar, G.; Manju, D.; Subhash, P.; Nagaraju, G. Breast Cancer Classification and Predicting Class Labels Using ResNet50. J. Electr. Syst. 2023, 19. [Google Scholar] [CrossRef]
  45. Majanga, V.; Viriri, S. Dental images’ segmentation using threshold connected component analysis. Comput. Intell. Neurosci. 2021, 2021, 2921508. [Google Scholar] [CrossRef]
  46. Chlap, P.; Min, H.; Vandenberg, N.; Dowling, J.; Holloway, L.; Haworth, A. A review of medical image data augmentation techniques for deep learning applications. J. Med. Imaging Radiat. Oncol. 2021, 65, 545–563. [Google Scholar]
  47. Goceri, E. Medical image data augmentation: Techniques, comparisons and interpretations. Artif. Intell. Rev. 2023, 56, 12561–12605. [Google Scholar]
  48. Hussain, Z.; Gimenez, F.; Yi, D.; Rubin, D. Differential data augmentation techniques for medical imaging classification tasks. In Proceedings of the AMIA Annual Symposium Proceedings, Washington, DC, USA, 4–8 November 2017; American Medical Informatics Association: Bethesda, MD, USA, 2017; Volume 2017, p. 979. [Google Scholar]
  49. Garcea, F.; Serra, A.; Lamberti, F.; Morra, L. Data augmentation for medical imaging: A systematic literature review. Comput. Biol. Med. 2023, 152, 106391. [Google Scholar]
  50. Hoque, M.Z.; Keskinarkaus, A.; Nyberg, P.; Seppänen, T. Stain normalization methods for histopathology image analysis: A comprehensive review and experimental comparison. Inf. Fusion 2024, 102, 101997. [Google Scholar]
  51. Veta, M.; Pluim, J.P.; Van Diest, P.J.; Viergever, M.A. Breast cancer histopathology image analysis: A review. IEEE Trans. Biomed. Eng. 2014, 61, 1400–1411. [Google Scholar]
  52. Elmoataz, A.; Schüpp, S.; Clouard, R.; Herlin, P.; Bloyet, D. Using active contours and mathematical morphology tools for quantification of immunohistochemical images. Signal Process. 1998, 71, 215–226. [Google Scholar]
  53. Osher, S.; Sethian, J.A. Fronts propagating with curvature-dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. Comput. Phys. 1988, 79, 12–49. [Google Scholar] [CrossRef]
  54. Sethian, J.A. Level Set Methods: Evolving Interfaces in Geometry, Fluid Mechanics, Computer Vision, and Materials Science; Cambridge Monographs on Applied and Computational Mathematics; Cambridge University Press: Cambridge, UK, 1996; Volume 3. [Google Scholar]
  55. Kichenassamy, S.; Kumar, A.; Olver, P.; Tannenbaum, A.; Yezzi, A. Gradient flows and geometric active contour models. In Proceedings of the IEEE International Conference on Computer Vision, Cambridge, MA, USA, 20–23 June 1995; pp. 810–815. [Google Scholar]
  56. Caselles, V.; Kimmel, R.; Sapiro, G. Geodesic active contours. Int. J. Comput. Vis. 1997, 22, 61–79. [Google Scholar] [CrossRef]
  57. Hosmer, D.W.; Hjort, N.L. Goodness-of-fit processes for logistic regression: Simulation results. Stat. Med. 2002, 21, 2723–2738. [Google Scholar] [CrossRef] [PubMed]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
