Local Extremum Mapping for Weak Supervision Learning on Mammogram Classification and Localization

Zhu, Minjuan; Zhang, Lei; Wang, Lituan; Wang, Zizhou; Wang, Yan; Qian, Guangwu

doi:10.3390/bioengineering12040325

Open AccessArticle

Local Extremum Mapping for Weak Supervision Learning on Mammogram Classification and Localization

by

Minjuan Zhu

¹,

Lei Zhang

¹

,

Lituan Wang

¹

,

Zizhou Wang

²,

Yan Wang

² and

Guangwu Qian

^3,*

¹

College of Computer Science, Sichuan University, Section 4, Southern 1st Ring Rd., Chengdu 610065, China

²

Institute of High Performance Computing, Agency for Science, Technology and Research (A*STAR), Singapore 138632, Singapore

³

West China Biomedical Big Data Center, West China Hospital, Sichuan University, Section 4, Southern 1st Ring Rd., Chengdu 610065, China

^*

Author to whom correspondence should be addressed.

Bioengineering 2025, 12(4), 325; https://doi.org/10.3390/bioengineering12040325

Submission received: 2 February 2025 / Revised: 6 March 2025 / Accepted: 10 March 2025 / Published: 21 March 2025

(This article belongs to the Special Issue Breast Cancer: From Precision Medicine to Diagnostics)

Download

Browse Figures

Versions Notes

Abstract

The early and accurate detection of breast lesions through mammography is crucial for improving survival rates. However, the existing deep learning-based methods often rely on costly pixel-level annotations, limiting their scalability in real-world applications. To address this issue, a novel local extremum mapping (LEM) mechanism is proposed for mammogram classification and weakly supervised lesion localization. The proposed method first divides the input mammogram into multiple regions and generates score maps through convolutional neural networks. Then, it identifies the most informative regions by filtering local extrema in the score maps and aggregating their scores for final classification. This strategy enables lesion localization with only image-level labels, significantly reducing annotation costs. Experiments on two public mammography datasets, CBIS-DDSM and INbreast, demonstrate that the proposed method achieves competitive performance. On the INbreast dataset, LEM improves classification accuracy to 96.3% with an AUC of 0.976. Furthermore, the proposed method effectively localizes lesions with a dice similarity coefficient of 0.37, outperforming Grad-CAM and other baseline approaches. These results highlight the practical significance and potential clinical applications of our approach, making automated mammogram analysis more accessible and efficient.

Keywords:

mammography images; breast cancer classification; lesion localization; weak supervision; deep neural networks

1. Introduction

According to the latest cancer statistics [1], breast cancer has been the most common cancer among women, alone accounting for 31% of female cancers. The research of the World Health Organization has shown that early detection can have a positive impact on the prognosis and survival rate of breast cancer.

Mammography is widely recognized as an effective way to reduce mortality, and it has become a common tool in the early detection of breast cancer [2]. To help radiologists improve the mammography screening process and reduce their workloads, many computer-aided diagnosis (CAD) systems based on mammography images have been proposed to identify benign and malignant breast tumors [3,4,5,6]. With the surge of interest in deep neural networks (DNNs), it has been widely used in automatic mammography image analysis and has achieved great success in CAD systems. Existing DNN-based methods can generally be divided into the following two types.

The first treats breast cancer screening as a detection or segmentation task [7,8,9], focusing on locating or outlining lesions in mammograms and then classifying them as benign or malignant. These types of models can provide detailed lesion information and auxiliary evidence for radiologists to diagnose a mammographic image. Typically, sophisticated annotations are required for their construction, e.g., bounding boxes or pixel-level segmentation ground truth. These annotations are usually made by experienced radiologists to ensure accuracy, which requires a lot of effort and high costs. So, it is difficult to acquire a large amount of annotated data to build this type of model. For the same reason, a large number of unprocessed mammograms in hospitals have not been utilized due to a lack of pixel-level annotations. However, their image-level annotations can be easily queried from pathology reports. Therefore, how to reasonably utilize this part of the data to improve the efficiency of breast cancer screening has become an important question.

The second type of method focuses on establishing classification frameworks with only image-level annotations to predict the existence or absence of benign and malignant lesions in mammograms [10,11,12]. This method analyzes a large number of mammograms without detailed pixel-level annotations and fully mines its underlying information. However, due to the limitation of annotations, these models often fail to provide detailed location information. This information is very important because it can help radiologists quickly locate the support regions of model predictions. The absence of this information means that radiologists need to manually look for suspicious locations to verify the accuracy of predictions. Therefore, it is necessary to construct a method that can provide location information with only image-level annotation. To address this, several weakly supervised algorithms have been proposed that address the challenge of classifying mammograms and locating lesions with only image-level annotations [13,14,15,16]. However, they have usually provided only the location of the malignant lesions or introduce additional computational overhead.

In this study, a weakly supervised algorithm is proposed to address the challenges of classifying mammograms and localizing lesions using only image-level annotations. Due to lesions in mammograms being very small relative to the entire image, a large number of non-tumor regions can seriously interfere with the classification performance. Therefore, the core idea of the algorithm is to identify regions that are highly likely to contain lesions in the mammogram and then calculate the overall probability that a mammogram is benign or malignant. Meanwhile, lesion regions play a key role in mammogram classification; clinical studies have suggested that both the local details (such as the shape of the lesion) and the overall structure (such as the density and pattern of breast fibroglandular tissue) are necessary for an accurate diagnosis [17,18]. Due to the high variability, intricate appearance, and sparsity of breast lesions, it is even difficult for professional radiologists to diagnose them [19,20]. To overcome this obstacle, we subdivide the mammogram classification into two tasks: candidate lesion region selection and classification. Shu et al. [21] have explored two pooling structures to score the mammogram regionally and select candidate lesion regions according to a threshold. The results show that the prediction of a convolutional neural network (CNN) depends on the activation value of each spatial position on its feature map, and the spatial position with a relatively large or small value can usually correspond to the visual position of the object. However, this method depends on the selection of the ratio of selected candidate lesions, but the size and number of lesions in different samples are different, so the performance of the model is limited due to these parameters. To address these issues, a local extremum mapping (LEM) method is designed in this study. In the proposed LEM, all regions are scored for generating a score map, and the local extremum positions in this map are recorded as candidate lesion regions. Different from selecting candidate regions by threshold, the regions corresponding to the local extremum positions are more representative than their surrounding regions. These most representative extremum scores are collected to calculate the benign/malignant probability of the mammogram, and the lesions are localized on the basis of these positions. To further enhance this process, during training, a sparse loss function is designed to stimulate the emergence of extreme values in the score map. In the stage of lesion localization, the extreme positions will be screened out and backpropagated, and the pixels that contribute the most to those extreme values are considered the lesion pixels. In this process, the local maximum points are defined as corresponding to the malignant area, while the local minimum points correspond to benign areas.

In general, the proposed framework can be summarized into three stages: extracting features with a deep convolutional network, calculating the local extremum map based on the features, and backpropagating response points to obtain the location of lesions. Compared to pixel- or object-level labels that need to be manually annotated, information on whether a mammographic image contains benign or malignant lesions can easily be obtained from the pathology diagnosis. For lesion locating, the proposed method expands the existing deep convolutional network so that it can filter suspicious regions in the mammogram and finally realize the functions of locating lesions. This structure can be embedded into any CNN as a supplementary module, with little computational overhead, and it does not change the original CNN architectures. In summary, the main contributions of this work include the following:

A weakly supervised mammogram classification and localization model is proposed, which can locate lesions without detailed segmentation or detection annotations.
A local extremum mapping method is proposed that utilizes the extrema as criteria to select candidate lesion regions. It is a more comprehensive way to extend the feature distance between benign and malignant lesions.
Based on the above method, a sparse loss function is proposed to facilitate the generation of extremes and limit the number of extremes, and a localization algorithm is designed to obtain pixel-level lesion locations by backpropagating the extremes.

2. Related Work

After decades of development, deep neural networks have continuously made breakthroughs [22,23] and made great progress in computer vision [24,25,26]. For many big data analysis problems, deep learning models demonstrate significantly better performance than traditional methods [27,28]. Therefore, a multitude of methods to apply deep learning in medical image analysis have appeared [29,30,31]. Due to the rapid development of deep neural networks in medical image analysis, some deep learning methods have also been explored to solve the problem of automatic breast cancer screening. Most of these methods have demonstrated performance beyond traditional methods, and the models are easy to construct and more practical [11,12,32].

In mammographic image analysis, most DNN-based methods have two stages: segmentation or extraction of regions of interest (ROI), and ROI classification. Some works used DNN to classify breast masses in mammographic images [32,33,34]. Li Shen et al. [35] proposed a path-based method to detect and classify breast cancer lesions on mammograms. Al-antari et al. [36] constructed a complete framework for breast cancer detection, segmentation, and classification using digital X-ray mammograms. However, several recent studies have noted that these methodologies rely too much on ROI annotations and have poor scalability and repeatability. Chougrad et al. [37] collected lesion patches from mammographic images using segmentation annotations and built a lesion classification network to predict benign and malignant lesions. Agnes et al. [38] presented an end-to-end classification network implementing multilevel dilated convolutions to classify mammogram images. They further presented a deep ensemble transfer learning method and a deep classifier for feature extraction and mass classification [39]. The study in [40] addressed the problem of screening breast cancer in mammograms by converting it into a multiple instance learning (MIL) problem. They applied the sparsity assumption to the task of breast cancer classification to suppress the number of activated regions. Differently, the sparse loss in the proposed structure is intended to limit the non-extreme regions, make them fall within a range, and make the extreme values more distinguishable.

Deep weakly supervised localization methods with only image-level annotation as supervision typically aggregate deep responses to derive global confidence scores. Therefore, various pooling structures have been proposed to optimize the mapping of feature maps to classification probability. Global max pooling [41] selects the highest-response regions for category probability estimation, but it discards a significant amount of information. The global average pool [42] assigns equal importance to all regions, making it difficult to distinguish objects from the background. To address these limitations, Sun et al. [43] introduced log–sum–exponential pooling, which smoothly integrates the benefits of max and average pooling to focus on class-aware regions. Durand et al. [44] proposed rank max–min pooling, which selects high-scoring regions as positive and low-scoring ones as negative to improve feature discrimination. Zhou et al. [45] applied class peaks to image segmentation, showing that outliers in features are more informative. Shen et al. [14] proposed an attention-based CNN for the weakly supervised localization of malignant tumors in high-resolution mammographic images. Liang et al. [15] proposed a self-training strategy combined with CAM activation maps to replace attention to breast cancer localization. Sampaio et al. [16] explored the weakly supervised localization performance of mammography images using different activation map methods, such as CAM [42], GradCAM [46], and GradCAM++ [47].

However, these existing methods focus on extracting deep features from a global perspective while overlooking local spatial correlations, which are less suited to mammographic image analysis. Most mammogram lesions are small, and their classification depends not only on shape and size but also on their spatial distribution, which requires the consideration of local spatial relationships. For mammographic images, Han et al. [34] proposed a deep location soft-embedding-based network with regional scoring to evaluate and capture lesion location information, but they introduced an additional complex position embedding module for localization. Therefore, this paper designs a local extremum mapping mechanism to bridge feature space and image space. Specifically, the proposed local maximum mapping captures the spatial characteristics of suspected malignant lesions, while the local minimum mapping associates regions with benign lesions. By stimulating the emergence of these local extrema, the proposed method enables a comprehensive exploration of all regions of the image, enhancing the localization capability of the network. Furthermore, the proposed method can be simply integrated into any CNN architecture without modifying the original model structure, making it particularly beneficial for medical image analysis, where detailed annotations are time-consuming to obtain.

3. Method

In this section, a weakly supervised mammogram classification and localization technique is presented utilizing a local extremum map. CNN-based classifiers can produce the class activation map (CAM) [42] that specifies the contribution of each location in the image to the classification confidence. According to CNN’s feedforward process, it can be deduced that the abnormal point in the CAM usually corresponds with the visual cues of the target object, and in the mammogram, it is the lesion. Therefore, a mapping method based on the local extrema and the corresponding loss function is proposed to enhance the extremum in the CAM. In addition, considering that the lesion area in the mammographic image is small, a constraint term is added to the loss function to ensure the sparsity of the number of extrema. During the lesion localization phase, the extrema are backpropagated to generate maps that highlight informative regions for each object. The proposed mapping method retrieves all abnormalities in the mammogram, and the extrema are usually the most representative points in a local area, which can better locate the lesion.

3.1. The Overall Structure

The dataset used in the proposed framework contains mammograms and image-level annotations. Given a mammogram,

x

, its corresponding label is t. When it is fed into a standard CNN-based structure, a feature map can be obtained from the last convolutional layer. Let

M \in R^{C \times W \times H}

denote the feature map, where W and H represent the row and column size of the feature map, respectively, and C is the channel dimension. Most CNNs use global average pooling (GAP) [42] to obtain feature vectors from

M

, followed by a fully connected layer to compute the final classification confidence. By passing the parameters of this fully connected layer to a new convolutional layer with a kernel size of

1 \times 1

, it can be seamlessly converted into a structure with the ability to retain spatial information throughout forwarding. This process can be formulated as

y^{g a p} = σ [\sum_{c = 1}^{C} (\frac{1}{W H} \sum_{i = 1}^{W} \sum_{j = 1}^{H} (M_{c, i, j})) V_{c}] = σ [\frac{1}{W H} \sum_{i = 1}^{W} \sum_{j = 1}^{H} \sum_{c = 1}^{C} (M_{c, i, j} V_{c})],

(1)

where

y^{g a p}

denotes the category confidence calculated via GAP,

M_{c, i, j}

is the element of row c, column i, and channel j in the tensor

M

, and

V

is the weight matrix of the fully connected layer or convolutional layer. Since mammogram classification is treated as a binary classification problem, and only one neuron is used to represent the predicted probability (as shown in Figure 1), the size of

V

is

C \times 1

and its c-th element is denoted as

V_{c}

. The

σ

denotes the sigmoid active function that maps the confidence to the range of

(0, 1)

. According to Equation (1), the structure of a GAP followed by a fully connected layer can be implemented with a convolution layer of

1 \times 1

followed by a GAP. The latter is used in this work to construct the network. So, a score map,

S = {[S_{i, j}]}_{W \times H}

, can be obtained through this convolutional layer via

S_{i, j} = \sum_{c = 1}^{C} (M_{c, i, j} V_{c}),

(2)

where

S_{i, j}

is the score of position

(i, j)

in the score map

S

. The values in

S

can well reflect the importance of all regions in the input image

x

, so the most informative extreme values in

S

will be scanned out for a further calculation of category confidence. Figure 1 shows the detailed structure of the proposed method.

3.2. Local Extremum Mapping

In order to drive the extrema in the score map to be more prominent, a local extremum mapping layer is added to follow the last convolution layer. There are two types of local extrema in the score map, the local maxima and the local minima, corresponding to the suspected malignant and suspected benign lesions (masses or calcification), respectively. During the feedforward calculation pass, two sampling maps,

S^{m a x} = {[S_{i, j}^{m a x}]}_{W \times H}

and

S^{m i n} = {[S_{i, j}^{m i n}]}_{W \times H}

, are generated to locate all the maximum and minimum positions, which will be aggregated to compute the benign or malignant tendency of the lesion. In both sampling maps, the element at position

(i, j)

is equal to 0 or 1 (1 indicates the positions of the extremum). They are formed as

S_{i, j}^{m a x} = \{\begin{matrix} 0 & i f S_{i, j} \neq m a x (S_{i, j}^{s r d}), \\ 1 & i f S_{i, j} = m a x (S_{i, j}^{s r d}), \end{matrix}

(3)

and

S_{i, j}^{m i n} = \{\begin{matrix} 0 & i f S_{i, j} \neq m i n (S_{i, j}^{s r d}), \\ 1 & i f S_{i, j} = m i n (S_{i, j}^{s r d}), \end{matrix}

(4)

where

S_{i, j}^{s r d}

is the point set located around the point at coordinate

(i, j)

in score map

S

. When it is on the corner of the score map (such as the top left corner

S_{1, 1}

and the bottom right corner

S_{W, H}

), there are 4 points around it. When it is on the boundary and not the corner (such as the left boundary

S_{j, H}

for

\forall j \in (1, W)

) of the score map, there are 6 points around it. Otherwise, there are 9 points around it. They are used to query extremum points.

When all extrema are found, the maxima

S^{m a x}

and minima

S^{m i n}

will be aggregated separately to calculate the probability that those suspicious regions tend to be malignant or benign using

\begin{matrix} a^{m} = \sum_{i, j} (S_{i, j}^{m a x} \cdot m a x (S_{i, j}, 0)), \\ a^{b} = \sum_{i, j} (S_{i, j}^{m i n} \cdot m i n (S_{i, j}, 0)), \\ y^{l e m} = σ (\frac{a^{m} + a^{b}}{\sum_{i, j} (S_{i, j}^{m a x} + S_{i, j}^{m i n})}), \end{matrix}

(5)

where

a^{m}

indicates the aggregate value at which the mammogram tends to be malignant,

a^{b}

denotes the aggregate value at which the mammogram tends to be benign, and

y^{l e m}

indicates the final probability of malignancy. As Equation (5) shows, the calculation of probability only uses extremum values as input.

When Equations (1) and (5) are compared, it can be seen that all scores are densely sampled in the conventional method to compute the confidence of the category. When the imbalance between the lesion area and the background area is taken into account, the proposed method sparsely samples the score map and only selects regions with abnormal scores to calculate the probability of the category. In this way, it is possible to avoid the problem in which the feature distance between benign and malignant lesions is too close because of sampling too many background regions.

3.3. Sparse Loss Function

This section introduces the details of the sparse loss function designed for LEM. Mammography image classification is essentially a problem of classifying lesions. However, the lesion typically comprises only a tiny percentage in a mammogram, which means that most of the mammogram area contributes little to helping identify whether a mammogram is malignant or benign. In other words, the region of the lesion is relatively sparse throughout the mammogram. Correspondingly, when sampling scores, the active points in the score map should also be limited so that they are sparse. To constrain the number of active points and ensure that there is a small number of significant extrema, a sparse loss function

L_{s p a}

is first designed to constrain the LEM and defined as

\begin{matrix} L_{s p a} = ∥ L_{e l i} ∥_{1} + γ {∥ L_{p n a} ∥}_{1}, \\ L_{e l i} = \sum_{i, j} [S_{i, j}^{m a x} \cdot σ (m i n (S_{i, j}, 0)) + S_{i, j}^{m i n} \cdot σ (- m a x (S_{i, j}, 0))], \\ L_{p n a} = \sum_{i, j} [σ (m a x (S_{i, j}, τ)) + σ (- m i n (S_{i, j}, - τ))], \end{matrix}

(6)

where

L_{e l i}

is the elimination loss used to eliminate the wrong extrema in the score map, and

L_{p n a}

is the penalty loss to ensure the activation values are sparse. And the parameter

γ

is a penalty factor,

{∥ \cdot ∥}_{1}

denotes the norm

L_{1}

, and the

τ

in

L_{p n a}

is a threshold used to control the sparsity range. The effect of this function is shown in Figure 2.

In addition, the LEM loss function,

L_{l e m}

, and the GAP loss function,

L_{g a p}

, are calculated to stimulate the appearance of extrema in the score map:

\begin{matrix} L_{g a p} = g (y^{g a p}, t), \\ L_{l e m} = g (y^{l e m}, t), \\ g (y, t) = t \cdot log y + (1 - t) \cdot log (1 - y) . \end{matrix}

(7)

where

t \in {0, 1}

is the image-level annotation (0 denotes benign, and 1 denotes malignant) for mammographic image

x

. The cross-entropy function g is used to evaluate the difference between the predicted value and the label.

The complete loss function is formulated as

\begin{matrix} J = α (μ ∥ L_{s p a} ∥_{1} + L_{l e m}) + (1 - α) L_{g a p} + \frac{λ}{2} {∥ θ ∥}^{2}, \end{matrix}

(8)

where

θ

is the parameter of the network,

∥ \cdot ∥

denotes the

L_{2}

norm,

λ

is the regularizer that controls model complexity, and

μ

is the sparsity factor, balancing the sparsity assumption with the importance of different regions.

α \in [0, 1]

is the parameter used to dynamically control the choice of using GAP or LEM during the training phase. When

α = 0

, the proposed method degenerates into using CNN with GAP to compute the final classification confidence and lesion localization, whereas, when

α = 1

, only the proposed LEM is used. The motivation is that, at the beginning of the training stage, the generation of extrema is random, but candidate extrema can be generated by training with GAP. So,

α

is designed to control this process. During the training process, the gradient

\frac{\partial J}{\partial θ}

is computed based on the total loss defined in Equation (8):

\begin{matrix} \frac{\partial J}{\partial θ} = α μ \frac{\partial ‖ L_{s p a} ‖_{1}}{\partial θ} + α \frac{\partial L_{l e m}}{\partial θ} + (1 - α) \frac{\partial L_{g a p}}{\partial θ} + λ θ . \end{matrix}

(9)

By controlling the sparsity of the extrema and the training phase in this way, the adaptive potential of the network can be fully utilized. During classification, only a few regions are activated, which will make the activated regions more representative. This process is shown in Figure 3.

3.4. Lesion Localization

To locate the lesion, the contribution of each pixel in the mammogram to the confidence in the final output must be calculated. The CAM-based visualization structure (such as grad-cam, grad-cam++, etc.) calculates the importance of each region based on the last convolution layer, which has a good effect on big objects but is not fine-grained enough. Currently, most CNNs adopt

R e L U

as the activation function, leading to the fact that only weights greater than 0 contribute to the calculation. This conclusion has been proven in [48] and [45]. Therefore, for the purpose of calculating the importance of each pixel to a certain extremum, this property of CNN is utilized for reverse calculation in the proposed model. Unlike CAM, the final confidence in our model is not directly used to calculate the location map. All extrema will be searched, and those that meet the selection criteria will be backpropagated. Formulaically, the feature maps of two adjacent layers, l and

l + 1

, are defined as

M^{l}

and

M^{l + 1}

, respectively.

n_{l}^{i}

represents the i-th neuron in the layer l, and

n_{l + 1}^{j}

represents the j-th neuron in layer

l + 1

. Therefore,

M_{i}^{l}

and

M_{j}^{l + 1}

are the activation values of neurons

n_{i}^{l}

and

n_{j}^{l + 1}

, respectively.

In the feedforward calculation process, the activation value

M_{j}^{l + 1}

of the neuron

n_{j}^{l + 1}

is computed via

M_{j}^{l + 1} = R e L U (\sum_{i \in C_{j}} U_{i, j} M_{i}^{l} + b_{l})

, where

U_{i, j} \in θ

is the weight from neuron

n_{i}^{l}

to neuron

n_{j}^{l + 1}

,

b_{l}

is the bias in layer l, and

C_{j}

is the set of child connection nodes for neuron

n_{j}^{l + 1}

. Then, the importance values

P (M_{i, j}^{l})

of the previous layer can be computed top-down, starting from that of the output units, as follows:

\begin{matrix} P (M_{i}^{l}) = \sum_{j \in P_{i}} P (M_{i, j}^{l}) P (M_{j}^{l + 1}), \\ P (M_{i, j}^{l}) = R e L U (U_{i, j}) M_{i}^{l} Z_{j}, \end{matrix}

(10)

Here,

P_{i}

is the set of parent connection nodes of the neuron

n_{i}^{l}

, weights less than 0 are discarded through the function

R e L U

, and the term

Z_{j} = {(\sum_{i \in C_{j}} R e L U (U_{i, j}) M_{i}^{l})}^{- 1}

is a normalization factor that ensures the sum of importance up to 1, that is,

\sum_{i \in C_{j}} P (M_{i, j}^{l}) = 1

. The distribution of importance of the last feature map can be obtained directly from the score map

S

. Note that the score values in

S

are not calculated using the function

R e L U

. Therefore, in this layer, positive weights contribute to maximum values, and negative weights contribute to minimum values. The last layer can be calculated via

\begin{matrix} \begin{matrix} P (M_{i, j, c}^{m a x}) = R e L U (V_{c}) M_{i, j, c} S_{i, j}^{m a x}, \\ P (M_{i, j, c}^{m i n}) = - R e L U (- V_{c}) M_{i, j, c} S_{i, j}^{m i n} . \end{matrix} \end{matrix}

(11)

Through the above formula, the importance of each pixel in the image to different extrema can be calculated. Since the importance map is a spatial display of the contribution of pixels to the prediction, the salient regions in the importance map can be considered lesion areas when benign and malignant labels are used as supervision. The algorithms for the LEM training process and the location of the lesion are specified in Algorithms 1 and 2, respectively.

Algorithm 1 Training process of LEM

Require:

Input: Mammograms and image-level annotations.
Learning rate: $l r$

Ensure:

The learned parameters: $θ$

1:: for $e p o c h = 1 : m a x_e p o c h$ do
2:: Extracting features for each input image
3:: Computing the score map $S$ via Equation (2)
4:: Detecting extrema via Equations (3) and (4)
5:: Computing the malignant probability $y^{l e m}$ via Equation (5)
6:: Computing the gradient of $J$ with respect to the parameters $θ$ via Equations (8) and (9)
7:: Updating the parameters $θ$ : $θ$ = $θ$ − $l r \cdot \frac{\partial J}{\partial θ}$
8:: end for

Algorithm 2 Lesion localization

Require:

Input: A mammogram, $x$ , and its label, t.
A trained network with parameter $θ$ .

Ensure:

The lesion location map

1:: Forward $x$ to obtain the feature map $M$ ;
2:: Compute the extreme position map $S^{m a x}$ and $S^{m i n}$ via Equations (3) and (4)
3:: Compute the malignant probability $y^{l e m}$ for mammogram $x$ via Equation (5)
4:: if $y^{l e m} > 0$ then
5:: for $S_{i, j}^{m a x}$ in $S^{m a x}$ do
6:: Backpropagation for computing the location map
7:: end for
8:: else
9:: for $S_{i, j}^{m i n}$ in $S^{m i n}$ do
10:: Backpropagation for computing the location map
11:: end for
12:: end if
13:: Locating lesions in the location map

4. Experiments and Results

In this section, the details of the two datasets in the experiment are first presented. Then, the experimental setup is briefly introduced, including the evaluation strategy, the evaluation metric, and the implementation details. Lastly, the results of the experiment and the corresponding analysis are demonstrated.

4.1. Dataset

The structure proposed in this study was validated on two public data sets. Both datasets are scan data of mammograms stored in DICOM format. Their details and pre-processing steps are as follows.

CBIS-DDSM [49]: The CBIS-DDSM database (Curated Breast Imaging Subset of DDSM) consists of mammograms, segmentation annotations, lesion patches, and pathological diagnoses. This repository boasts a diverse array of 753 cases exhibiting calcifications and 891 cases displaying masses, a total of 3071 images. For all experiments conducted in this study, we only utilized image-level cancer labels for model training purposes.

INbreast [50]: The INbreast database consists of full-field digital mammograms and precise annotations, formatted similarly to modern computer vision datasets. The mammography images in this database contain a wide variety of lesions, such as masses, calcifications, asymmetries, and distortions. The INbreast database boasts a total of 115 cases, encompassing 410 images with BI-RADS annotations. In this study, we utilized BI-RADS annotations (BI-RADS

\in {4, 5, 6}

as malignant).

The CBIS-DDSM database uses the standard split provided by [49], with 85% of the data allocated to training and 15% to testing. Following [21], the INbreast dataset was randomly split into mutually exclusive subsets for training (80%) and testing (20%) subsets. The preprocessing of raw mammograms was carried out according to the method in the previous work of this study [21], and the size of mammograms was adjusted to a standard size of

800 \times 800

. Original mammograms were augmented. All images were standardized to ensure that the value of the pixels was located at

[0, 1]

. To reduce the impact of overfitting, in the training stage, several data augmentation techniques were applied to mammography images, including random horizontal flipping, adjustments to image contrast and saturation, rotation within the range of −30 to +30 degrees, and the addition of Gaussian noise.

4.2. Training Details

In the experiment, all CNN architectures utilized were built upon publicly available code provided by their respective authors, with only the last fully connected layer altered with our proposed LEM and no other structural modifications. The parameters of the convolutional layers were initialized on ImageNet. For a fair comparison, all experiments took an identical setup: the same update strategy, the same learning rate, and the same stopping condition. The initial learning rate of all models was

1 \times 10^{- 4}

. The

λ

in the loss function was set to a very small value,

1 \times 10^{- 5}

, to prevent the inability to generate extreme values; the

τ

in Equation (7) was set to the balanced value 3, and the

α

was initialized to

0.1

and gradually increased to 1 (gradual transition from GAP-dominated to LEM-dominated). The learning rate underwent a gradual decay process every eight epochs with a decay factor of

0.98

. The parameters were updated in mini-batches, utilizing a batch size of 32. The classification threshold value was set to 0.5, and the training process terminated after the 400th epoch. The proposed method was evaluated based on classification accuracy (ACC) and the area under the receiver operating characteristic curve (AUC) for the mammogram classification task, as well as the dice similarity coefficient (DSC) and the recall for the localization task.

4.3. Results and Analysis

4.3.1. Evaluation with Different Backbone Structures

The proposed LEM can be employed as an auxiliary layer in any CNN-based model. To verify its reliability and effectiveness with different CNN structures as the backbone, a set of control experiments was designed to compare the model performance under different conditions. Specifically, several popular models were trained and tested: the ResNet [51] series (Resnet-18, Resnet-50, and Resnet-101) and the DenseNet [24] series (DenseNet-121 and DenseNet-169). The parameters of LEM remain consistent across all models. In addition, the input sizes of different CNNs are inconsistent, but they do not affect the calculation of the score maps. Therefore, the input sizes of all networks in these experiments are set to the same size of

800 \times 800

. Table 1 shows the results of the LEM structure applied to different CNNs on the INbreast dataset.

It can be seen from the results that LEM has a significantly positive effect on all CNN-based networks listed, especially on the Resnet-18 structure. The reason may be that Resnet-18 is lighter and has fewer parameters than other networks, making its extracted features more primitive and containing more complex information. The GAP it uses struggles to extract effective lesion information from such feature maps, resulting in poor network performance. In contrast, LEM can optimize the extraction of spatial information and enhance the learning of abnormal regions in the training phase, making it easier to locate lesion regions in order to distinguish their benign and malignant nature. In addition, Resnet-18 with LEM outperforms the standard Resnet-50 and Resnet-101 models. It shows that LEM can fully exploit the potential of networks and improve their upper limit. Building models with better performance under the same scale of parameters in such a way can help the application of CNN-based CAD systems.

4.3.2. Evaluation of Different Mapping Methods

The proposed approach includes the mapping of feature maps to score maps. To evaluate whether LEM works efficiently, several pooling methods are compared with Resnet-18 as the backbone. Among them, global average pooling (GAP) [42] and global max pooling (GMP) [41] are the most widely used structures. Region-based group-max pooling (RGP) [21] and global group-max pooling (GGP) [21] are threshold-based pooling methods that have achieved good performance in mammogram classification. The experimental results are shown in Table 2.

The comparison results of different pooling structures show that the LEM achieves very competitive performance, with accuracy similar to the previous work, GGP, and far better than other methods. The AUC of LEM far exceeds that of other pooling methods, indicating its better robustness. This is consistent with the previous discussion suggesting that the sparse lesion hypothesis can benefit from the learning ability of the deep network itself. The LEM employs a more flexible candidate region selection strategy and constrains the score map with a sparse loss to fully exploit the potential of the network. In contrast to GGP and RGP, it does not suffer from inter-sample variation (because the number of selected regions can be different in different samples).

4.3.3. Comparisons with Other Methods

In recent years, some studies have attempted to apply several well-performing CNNs to the classification of mammography images. However, they did not take into account the characteristics of mammography, in which the classification of lesions is very different from the classification of typical objects in natural images, nor did they take full advantage of the potential of neural networks. The proposed method is completely automated and does not require pixel-level labeling or staged training. Image-level labels can also be directly derived from pathological results, facilitating the complete implementation of end-to-end training. The method was compared with other methods in mammography analysis, as shown in Table 3 and Table 4. These results are quoted from the original text.

The results show that the end-to-end trained LEM achieves superiority in the classification of whole mammograms. The features derived from the LEM structure are beneficial for training the deep network with mammograms as input. The DenseNet-169-based structure achieves a competitive result compared to the GMIC and MS + BRS models, proving the robustness of the proposed LEM structure. It shows the effectiveness of the features obtained via LEM in medical images. In addition, the accuracy in CBIS-DDSM is significantly lower than that in INbreast, possibly because of the different annotations. INbreast database only contains BI-RADS annotations, while CBIS-DDSM publishes benign/malignant annotations. In the domain of deep learning, annotations are critical, and the results are significantly impacted by them. Furthermore, the inconsistency of the data source devices in two datasets is also a crucial factor in determining the results.

4.3.4. Evaluation of Localization Performance

To further validate the lesion localization performance of the LEM, we calculated the pixel-level accuracy on the test set of the INbreast dataset and compared it with other segmentation models. The results are shown in the following Table 5.

The results show that the locating performance of LEM is better than that of standard CNN without pixel-level annotation. This weakly supervised learning method can improve the positioning ability of the model. Figure 4a,b visualizes the positioning performance of the model for masses and calcifications. It can be found that the model will also pay attention to surrounding tissues when locating calcifications.

Figure 4c shows two cases of model prediction errors, namely, predicting benign lesions as malignant and predicting malignant lesions as benign. However, it can be seen from the location map that the lesion is detectable, although the model prediction is wrong. The reason for this phenomenon may be that the lesions have some of the same properties in features, such as shape and color. The model can distinguish the lesion from the background, but it cannot accurately classify some lesions with non-obvious features. These cases are also difficult to distinguish via the standard CNN, but the proposed model can provide the basis for the prediction as verification evidence for diagnosis.

4.3.5. Ablation Study

In addition, we conducted ablation experiments to compare the performance of the model with and without sparse loss. The results are displayed in Table 6. As the results show, training the model with sparse loss increases the DSC by 12% but decreases the AUC by 0.3%. This indicates that the sparse loss does not have a positive effect on the classification performance of the model but can improve the localization performance of the model. The reason is that the sparse loss limits the number of extremes and enhances the association of extremes with lesions.

4.3.6. Visualization of the Score Map

In the comparative experiment on whether to use sparse loss, the visualization of the score map is shown in Figure 5. Figure 5a contains more extrema compared to Figure 5b, and the extremum values in Figure 5b are obviously higher than in Figure 5a. The network relies primarily on the most representative points. Therefore, training the model with sparse loss enhances the prominence of the lesion areas in the score map while effectively suppressing noise.

5. Conclusions

In this study, an end-to-end mammographic image diagnostic architecture has been proposed in which LEM was designed for mammogram classification and lesion localization without the need for pixel-level annotations. By identifying local extrema in the feature score maps, lesion-relevant regions were effectively captured, while non-discriminative areas were filtered out. Experimental results on the INbreast and CBIS-DDSM datasets demonstrated the effectiveness of the proposed LEM. However, there are still some limitations to lesion localization. The localization accuracy could still be improved, as small or subtle lesions may be underrepresented in the score maps. Future work will focus on enhancing feature refinement strategies and the integration of self-supervised or semi-supervised learning techniques to enhance location performance. In conclusion, the proposed method provides a practical solution for weakly supervised mammogram analysis, enhancing both classification accuracy and lesion localization performance while minimizing the reliance on costly pixel-level annotations. This is especially valuable in computer-aided diagnosis systems, for which reducing the cost of annotation is critical for scalability and efficiency.

Author Contributions

Conceptualization, M.Z. and L.Z.; methodology, M.Z. and G.Q.; software, M.Z.; validation, M.Z., L.W. and Z.W.; formal analysis, M.Z.; investigation, M.Z. and Y.W.; resources, L.Z. and G.Q.; data curation, M.Z.; writing—original draft preparation, M.Z.; writing—review and editing, L.Z., L.W. and G.Q.; visualization, M.Z.; supervision, L.Z. and G.Q.; project administration, G.Q.; funding acquisition, L.Z. and G.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant No. 62025601 and Grant No. U24A20341, Aier Eye Hospital-Sichuan University Research under Grant No. 23JZH043, and the National Key Research and Development Program of China under Grant No. 2024YFC2510700.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available from CBIS-DDSM at https://www.cancerimagingarchive.net/collection/cbis-ddsm/, accessed on 14 September 2017 and INbreast at https://www.kaggle.com/datasets/ramanathansp20/inbreast-dataset, accessed on 1 May 2024.

Acknowledgments

The authors would like to thank Zhenbin Wang and Xin Shu for their technical assistance with experiments and data analysis.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ACC	Accuracy
AUC	Area under the receiver operating characteristic curve
CAD	Computer-aided diagnosis
CAM	Class activation map
CNN	Convolutional neural network
DNN	Deep neural network
DSC	Dice similarity coefficient
GAP	Global average pooling
GGP	global group-max pooling
GMP	Global max pooling
LEM	Local extremum mapping
MIL	Multiple instance learning
RGP	Region-based group-max pooling
ROI	Regions of interest

References

Siegel, R.L.; Miller, K.D.; Wagle, N.S.; Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 2023, 73, 17–48. [Google Scholar] [CrossRef] [PubMed]
Moss, S.M.; Nystrom, L.; Jonsson, H.; Paci, E.; Lynge, E.; Njor, S.; Broeders, M. The impact of mammographic screening on breast cancer mortality in Europe: A review of trend studies. J. Med. Screen. 2012, 19 (Suppl. S1), 26. [Google Scholar] [CrossRef] [PubMed]
Hupse, R.; Karssemeijer, N. Use of normal tissue context in computer-aided detection of masses in mammograms. IEEE Trans. Med. Imaging 2009, 28, 2033–2041. [Google Scholar] [CrossRef] [PubMed]
Wang, J.; Yang, Y. A context-sensitive deep learning approach for microcalcification detection in mammograms. Pattern Recognit. 2018, 78, 12–22. [Google Scholar]
Jouirou, A.; Baâzaoui, A.; Barhoumi, W. Multi-view content-based mammogram retrieval using dynamic similarity and locality sensitive hashing. Pattern Recognit. 2021, 112, 107786. [Google Scholar] [CrossRef]
Karagoz, M.A.; Nalbantoglu, O.U. A self-supervised learning model based on variational autoencoder for limited-sample mammogram classification. Appl. Intell. 2024, 54, 3448–3463. [Google Scholar]
Zhao, W.; Lou, M.; Qi, Y.; Wang, Y.; Xu, C.; Deng, X.; Ma, Y. Adaptive channel and multiscale spatial context network for breast mass segmentation in full-field mammograms. Appl. Intell. 2021, 51, 8810–8827. [Google Scholar] [CrossRef]
Shen, Y.; Wu, N.; Phang, J.; Park, J.; Kim, G.; Moy, L.; Cho, K.; Geras, K.J. Globally-aware multiple instance classifier for breast cancer screening. In Proceedings of the International Workshop on Machine Learning in Medical Imaging; Springer: Cham, Switzerland, 2019; pp. 18–26. [Google Scholar]
Wu, N.; Phang, J.; Park, J.; Shen, Y.; Huang, Z.; Zorin, M.; Jastrzbski, S.; F¨¦vry, T.; Katsnelson, J.; Kim, E.a. Deep Neural Networks Improve Radiologists’ Performance in Breast Cancer Screening. IEEE Trans. Med. Imaging 2020, 39, 1184–1194. [Google Scholar] [CrossRef]
Fevry, T.; Phang, J.; Wu, N.; Kim, S.G.; Moy, L.; Cho, K.; Geras, K.J. Improving localization-based approaches for breast cancer screening exam classification. arXiv 2019, arXiv:1908.00615. [Google Scholar]
McKinney, S.M.; Sieniek, M.; Godbole, V.; Godwin, J.; Antropova, N.; Ashrafian, H.; Back, T.; Chesus, M.; Corrado, G.C.; Darzi, A.; et al. International evaluation of an AI system for breast cancer screening. Nature 2020, 577, 89–94. [Google Scholar] [CrossRef]
Wang, Y.; Wang, Z.; Feng, Y.; Zhang, L. WDCCNet: Weighted Double-Classifier Constraint Neural Network for Mammographic Image Classification. IEEE Trans. Med. Imaging 2022, 41, 559–570. [Google Scholar] [CrossRef] [PubMed]
Liu, W.; Shu, X.; Zhang, L.; Li, D.; Lv, Q. Deep multiscale multi-instance networks with regional scoring for mammogram classification. IEEE Trans. Artif. Intell. 2021, 3, 485–496. [Google Scholar] [CrossRef]
Shen, Y.; Wu, N.; Phang, J.; Park, J.; Liu, K.; Tyagi, S.; Heacock, L.; Kim, S.G.; Moy, L.; Cho, K.; et al. An interpretable classifier for high-resolution breast cancer screening images utilizing weakly supervised localization. Med. Image Anal. 2021, 68, 101908. [Google Scholar] [CrossRef]
Liang, G.; Wang, X.; Zhang, Y.; Jacobs, N. Weakly-Supervised Self-Training for Breast Cancer Localization. In Proceedings of the 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada, 20–24 July 2020; pp. 1124–1127. [Google Scholar] [CrossRef]
Sampaio, V.; Cordeiro, F.R. Improving Mass Detection in Mammography Images: A Study of Weakly Supervised Learning and Class Activation Map Methods. In Proceedings of the 2023 36th SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), Rio Grande, Brazil, 6–9 November 2023; pp. 139–144. [Google Scholar] [CrossRef]
Pereira, S.M.P.; McCormack, V.A.; Moss, S.M.; dos Santos Silva, I. The spatial distribution of radiodense breast tissue: A longitudinal study. Breast Cancer Res. 2009, 11, R33. [Google Scholar] [CrossRef] [PubMed]
Wei, J.; Chan, H.P.; Wu, Y.T.; Zhou, C.; Helvie, M.A.; Tsodikov, A.; Hadjiiski, L.M.; Sahiner, B. Association of computerized mammographic parenchymal pattern measure with breast cancer risk: A pilot case-control study. Radiology 2011, 260, 42–49. [Google Scholar] [CrossRef]
Oliver, A.; Freixenet, J.; Marti, J.; Perez, E.; Pont, J.; Denton, E.R.; Zwiggelaar, R. A review of automatic mass detection and segmentation in mammographic images. Med. Image Anal. 2010, 14, 87–110. [Google Scholar] [CrossRef]
Tang, J.; Rangayyan, R.M.; Xu, J.; Naqa, I.E.; Yang, Y. Computer-Aided Detection and Diagnosis of Breast Cancer With Mammography: Recent Advances. IEEE Trans. Inf. Technol. Biomed. 2009, 13, 236–251. [Google Scholar] [CrossRef]
Shu, X.; Zhang, L.; Wang, Z.; Lv, Q.; Yi, Z. Deep Neural Networks with Region-Based Pooling Structures for Mammographic Image Classification. IEEE Trans. Med. Imaging 2020, 39, 2246–2255. [Google Scholar] [CrossRef]
Zhang, L.; Yi, Z.; Amari, S.i. Theoretical study of oscillator neurons in recurrent neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2018, 29, 5242–5248. [Google Scholar] [CrossRef]
Li, Y.; Qian, G.; Jiang, X.; Jiang, Z.; Wen, W.; Zhang, S.; Li, K.; Lao, Q. Hierarchical-Instance Contrastive Learning for Minority Detection on Imbalanced Medical Datasets. IEEE Trans. Med. Imaging 2024, 43, 416–426. [Google Scholar] [CrossRef]
Huang, G.; Liu, Z.; Maaten, L.V.D.; Weinberger, K.Q. Densely Connected Convolutional Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2261–2269. [Google Scholar]
Wang, L.; Zhang, L.; Yi, Z. Trajectory Predictor by Using Recurrent Neural Networks in Visual Tracking. IEEE Trans. Cybern. 2017, 47, 3172–3183. [Google Scholar] [PubMed]
Wang, L.; Zhang, L.; Qi, X.; Yi, Z. Deep Attention-Based Imbalanced Image Classification. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 3320–3330. [Google Scholar] [CrossRef] [PubMed]
Qi, X.; Zhang, L.; Chen, Y.; Pi, Y.; Chen, Y.; Lv, Q.; Yi, Z. Automated diagnosis of breast ultrasonography images using deep neural networks. Med. Image Anal. 2019, 52, 185–198. [Google Scholar] [PubMed]
Xie, L.; Zhang, L.; Hu, T.; Li, G.; Yi, Z. Neural Network Model Based on Branch Architecture for the Quality Assurance of Volumetric Modulated Arc Therapy. Bioengineering 2024, 11, 362. [Google Scholar] [CrossRef]
Rajpurkar, P.; Irvin, J.; Zhu, K.; Yang, B.; Mehta, H.; Duan, T.; Ding, D.; Bagul, A.; Langlotz, C.; Shpanskaya, K. CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning. arXiv 2017, arXiv:1711.05225. [Google Scholar]
Azad, R.; Kazerouni, A.; Heidari, M.; Aghdam, E.K.; Molaei, A.; Jia, Y.; Jose, A.; Roy, R.; Merhof, D. Advances in medical image analysis with vision transformers: A comprehensive review. Med. Image Anal. 2024, 91, 103000. [Google Scholar]
Sakaida, M.; Yoshimura, T.; Tang, M.; Ichikawa, S.; Sugimori, H.; Hirata, K.; Kudo, K. The Effectiveness of Semi-Supervised Learning Techniques in Identifying Calcifications in X-ray Mammography and the Impact of Different Classification Probabilities. Appl. Sci. 2024, 14, 5968. [Google Scholar] [CrossRef]
Carneiro, G.; Nascimento, J.; Bradley, A.P. Unregistered Multiview Mammogram Analysis with Pre-trained Deep Learning Models. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Munich, Germany, 5–9 October 2015; pp. 652–660. [Google Scholar]
Jiao, Z.; Gao, X.; Wang, Y.; Li, J. A deep feature based framework for breast masses classification. Neurocomputing 2016, 197, 221–231. [Google Scholar]
Han, B.; Sun, L.; Li, C.; Yu, Z.; Jiang, W.; Liu, W.; Tao, D.; Liu, B. Deep Location Soft-Embedding-Based Network With Regional Scoring for Mammogram Classification. IEEE Trans. Med. Imaging 2024, 43, 3137–3148. [Google Scholar]
Shen, L.; Margolies, L.R.; Rothstein, J.H.; Fluder, E.; McBride, R.; Sieh, W. Deep learning to improve breast cancer detection on screening mammography. Sci. Rep. 2019, 9, 12495. [Google Scholar] [CrossRef]
Al-Antari, M.A.; Al-Masni, M.A.; Kim, T.S. Deep Learning Computer-Aided Diagnosis for Breast Lesion in Digital Mammogram. In Deep Learning in Medical Image Analysis; Springer: Berlin/Heidelberg, Germany, 2020; pp. 59–72. [Google Scholar]
Chougrad, H.; Zouaki, H.; Alheyane, O. Multi-label transfer learning for the early diagnosis of breast cancer. Neurocomputing 2019, 392, 168–180. [Google Scholar]
Agnes, S.A.; Anitha, J.; Pandian, S.I.A.; Peter, J.D. Classification of mammogram images using multiscale all convolutional neural network (MA-CNN). J. Med. Syst. 2020, 44, 30. [Google Scholar]
Arora, R.; Rai, P.K.; Raman, B. Deep feature–based automatic classification of mammograms. Med. Biol. Eng. Comput. 2020, 58, 1199–1211. [Google Scholar] [PubMed]
Zhu, W.; Lou, Q.; Vang, Y.S.; Xie, X. Deep Multi-instance Networks with Sparse Label Assignment for Whole Mammogram Classification. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention, Quebec City, QC, Canada, 11–13 September 2017; pp. 603–611. [Google Scholar]
Oquab, M.; Bottou, L.; Laptev, I.; Sivic, J. Is object localization for free?—Weakly-supervised learning with convolutional neural networks. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
Zhou, B.; Khosla, A.; Lapedriza, A.; Oliva, A.; Torralba, A. Learning deep features for discriminative localization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2921–2929. [Google Scholar]
Sun, C.; Paluri, M.; Collobert, R.; Nevatia, R.; Bourdev, L. Pronet: Learning to propose object-specific boxes for cascaded neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 3485–3493. [Google Scholar]
Durand, T.; Mordan, T.; Thome, N.; Cord, M. Wildcat: Weakly supervised learning of deep convnets for image classification, pointwise localization and segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 642–651. [Google Scholar]
Zhou, Y.; Zhu, Y.; Ye, Q.; Qiu, Q.; Jiao, J. Weakly supervised instance segmentation using class peak response. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3791–3800. [Google Scholar]
Selvaraju, R.R.; Cogswell, M.; Das, A.; Vedantam, R.; Parikh, D.; Batra, D. Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 618–626. [Google Scholar]
Chattopadhay, A.; Sarkar, A.; Howlader, P.; Balasubramanian, V.N. Grad-CAM++: Generalized Gradient-Based Visual Explanations for Deep Convolutional Networks. In Proceedings of the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp. 839–847. [Google Scholar] [CrossRef]
Zhang, J.; Bargal, S.A.; Lin, Z.; Brandt, J.; Shen, X.; Sclaroff, S. Top-Down Neural Attention by Excitation Backprop. Int. J. Comput. Vis. 2018, 126, 1084–1102. [Google Scholar]
Lee, R.S.; Gimenez, F.; Hoogi, A.; Miyake, K.K.; Gorovoy, M.; Rubin, D.L. A curated mammography data set for use in computer-aided detection and diagnosis research. Sci. Data 2017, 4, 170177. [Google Scholar]
Moreira, I.C.; Amaral, I.; Domingues, I.; Cardoso, A.; Cardoso, M.J.; Cardoso, J.S. INbreast: Toward a full-field digital mammographic database. Acad. Radiol. 2012, 19, 236–248. [Google Scholar]
He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
Dhungel, N.; Carneiro, G.; Bradley, A.P. Automated Mass Detection in Mammograms Using Cascaded Deep Learning and Random Forests. In Proceedings of the International Conference on Digital Image Computing: Techniques and Applications, Gold Coast, Australia, 30 November–2 December 2016; pp. 1–8. [Google Scholar]
Al-Antari, M.A.; Al-Masni, M.A.; Choi, M.T.; Han, S.M.; Kim, T.S. A fully integrated computer-aided diagnosis system for digital X-ray mammograms via deep learning detection, segmentation, and classification. Int. J. Med. Inform. 2018, 117, 44–54. [Google Scholar]
Chougrad, H.; Zouaki, H.; Alheyane, O. Deep convolutional neural networks for breast cancer screening. Comput. Methods Programs Biomed. 2018, 157, 19–30. [Google Scholar]
Wang, Y.; Feng, Y.; Zhang, L.; Wang, Z.; Lv, Q.; Yi, Z. Deep adversarial domain adaptation for breast cancer screening from mammograms. Med. Image Anal. 2021, 73, 102147. [Google Scholar]
Hu, T.; Zhang, L.; Xie, L.; Yi, Z. A multi-instance networks with multiple views for classification of mammograms. Neurocomputing 2021, 443, 320–328. [Google Scholar]
Zhang, C.; Zhao, J.; Niu, J.; Li, D. New convolutional neural network model for screening and diagnosis of mammograms. PLoS ONE 2020, 15, e0237674. [Google Scholar] [CrossRef] [PubMed]
Xie, L.; Zhang, L.; Hu, T.; Huang, H.; Yi, Z. Neural networks model based on an automated multi-scale method for mammogram classification. Knowl.-Based Syst. 2020, 208, 106465. [Google Scholar] [CrossRef]
Lopez, E.; Grassucci, E.; Valleriani, M.; Comminiello, D. Multi-View Breast Cancer Classification via Hypercomplex Neural Networks. arXiv 2022, arXiv:2204.05798. [Google Scholar]
Gao, Y.; Wang, X.; Zhang, T.; Han, L.; Beets-Tan, R.; Mann, R. Self-supervised learning of mammograms with pathology aware. In Proceedings of the Medical Imaging with Deep Learning, Zurich, Switzerland, 6–8 July 2022. [Google Scholar]
Lou, M.; Wang, R.; Qi, Y.; Zhao, W.; Xu, C.; Meng, J.; Deng, X.; Ma, Y. MGBN: Convolutional neural networks for automated benign and malignant breast masses classification. Multimed. Tools Appl. 2021, 80, 26731–26750. [Google Scholar] [CrossRef]
Ronneberger, O.; Fischer, P.; Brox, T. U-net: Convolutional networks for biomedical image segmentation. In Proceedings of the International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar]

Figure 1. The overall architecture integrates the proposed LEM into a convolutional neural network for mammography image analysis. Feature maps extracted via the CNN are processed via the LEM, which computes the score map to find all local extrema. Based on these extrema, the classification confidence of the input mammography image is calculated, and the regions that tend to contain lesions are localized via backpropagation.

Figure 2. Schematic diagram of the gradient direction in the score map. The red and yellow points in the figure correspond to the maximum and minimum values, respectively. A sparse loss makes all non-extreme points outside the interval

(- τ, τ)

approach this range so as to make the extreme points more significant.

Figure 2. Schematic diagram of the gradient direction in the score map. The red and yellow points in the figure correspond to the maximum and minimum values, respectively. A sparse loss makes all non-extreme points outside the interval

(- τ, τ)

approach this range so as to make the extreme points more significant.

Figure 3. The training phase using the proposed sparse loss function for LEM. For benign mammography images, the gradient generated via this loss will make the local extremum in the score map that meets the condition of the loss function smaller. Finally, the local maximum values are suppressed, and the local minima become more prominent. In malignant samples, the direction of the gradient is the opposite.

Figure 4. Visualization of whether sparse loss is used: (a) two samples containing calcification, (b) two samples with mass lesions, and (c) two samples that the model predicted incorrectly.

Figure 5. Visualization of lesion location map: (a) the samples visualized from the score map from LEM without sparse loss; (b) the LEM with sparse loss, resulting in fewer extrema.

Table 1. Accuracy and AUC comparisons of different CNN with LEM on INbreast.

Method	ACC (%)	AUC
Resnet-18	87.7	0.890
Resnet-50	88.9	0.921
Resnet-101	87.7	0.940
DenseNet-121	93.8	0.953
DenseNet-169	88.9	0.942
LEM Resnet-18	92.6	0.951
LEM Resnet-50	92.6	0.953
LEM Resnet-101	95.1	0.946
LEM DenseNet-121	96.3	0.976
LEM DenseNet-169	91.4	0.957