Article

Privacy-Preserving Synthetic Mammograms: A Generative Model Approach to Privacy-Preserving Breast Imaging Datasets

1 Ivannikov Institute for System Programming of the Russian Academy of Sciences, Moscow 109004, Russia
2 ISP RAS Research Center for Trusted Artificial Intelligence, Moscow 109004, Russia
* Author to whom correspondence should be addressed.
Informatics 2025, 12(4), 112; https://doi.org/10.3390/informatics12040112
Submission received: 5 August 2025 / Revised: 7 October 2025 / Accepted: 9 October 2025 / Published: 18 October 2025
(This article belongs to the Special Issue Health Data Management in the Age of AI)

Abstract

Background: Significant progress has been made in the field of machine learning, enabling the development of methods for automatic interpretation of medical images that provide high-quality diagnostics. However, most of these methods require access to confidential data, making them difficult to apply under strict privacy requirements. Existing privacy-preserving approaches, such as federated learning and dataset distillation, have limitations related to data access, visual interpretability, etc. Methods: This study explores the use of generative models to create synthetic medical data that preserves the statistical properties of the original data while ensuring privacy. The research is carried out on the VinDr-Mammo dataset of digital mammography images. A conditional generative method using Latent Diffusion Models (LDMs) is proposed with conditioning on diagnostic labels and lesion information. Diagnostic utility and privacy robustness are assessed via cancer classification tasks and re-identification tasks using Siamese neural networks and membership inference. Results: The generated synthetic data achieved a Fréchet Inception Distance (FID) of 5.8, preserving diagnostic features. A model trained solely on synthetic data achieved comparable performance to one trained on real data (ROC-AUC: 0.77 vs. 0.82). Visual assessments showed that synthetic images are indistinguishable from real ones. Privacy evaluations demonstrated a low re-identification risk (e.g., mAP@R = 0.0051 on the test set), confirming the effectiveness of the privacy-preserving approach. Conclusions: The study demonstrates that privacy-preserving generative models can produce synthetic medical images with sufficient quality for diagnostic tasks while significantly reducing the risk of patient re-identification. This approach enables secure data sharing and model training in privacy-sensitive domains such as medical imaging.

1. Introduction

In the modern world, neural network models occupy a central position across many fields. These models are trained on vast amounts of data and leverage the acquired experience to make decisions. Their advantages include the ability to efficiently process and analyze large volumes of information, automatically detect hidden patterns, and predict trends. Neural networks and computer vision now play an increasingly significant role in medicine, particularly in the analysis of medical images such as X-rays, CT scans, MRIs, and more [1,2,3,4,5]. Contemporary machine learning methods, including deep learning, demonstrate potential for improving diagnostic accuracy and developing personalized treatment [6,7,8]. However, the success of such methods hinges on the availability of extensive datasets for training neural networks. Specifically, deep learning systems, which consist of millions of trainable parameters, require large amounts of data to reliably learn meaningful representations [9,10]. In addition to quantity, the quality of the available data is crucial for medical research [11]. Diverse and well-curated datasets enable researchers to achieve effective generalization and reduce the risk of biased predictions in practice.
One of the main challenges in the development and training of neural networks for medical image analysis is access to sufficient volumes of data [12]. Sharing medical data is a complex issue due to stringent privacy and data protection requirements. A large portion of medical data remains isolated and stored locally in medical institutions and laboratories due to concerns related to patient data privacy [13,14]. As a result, access to medical data for research purposes and the training of neural networks can be significantly restricted. A practical way around data-sharing limits is to create synthetic patient data that keeps the original dataset’s statistics without exposing personal information. For instance, ref. [15] presents two differentially private methods for image obfuscation: pixelization with perturbation to the pixelized image and the application of Laplace perturbation to each pixel with Gaussian blur afterward. Ref. [16] presents a privacy-preserving automated method for breast density classification using craniocaudal (CC) view mammograms from the VinDr-Mammo dataset. Although synthetic data can be generated for all types of data modalities, this work focuses on mammogram images.
This study explores the method of generating and publishing synthetic datasets based on private datasets that are not publicly accessible. We study a threat model where an attacker who already has a patient’s images tries to confirm that patient’s presence in a public synthetic set to learn private details. Additionally, we assess the practical applicability of synthetic data using the example of a classification task.
To sum up, the contributions of our paper are as follows:
1. We create an identification model that demonstrates the capability to identify patients based on mammogram images.
2. We train a generative network to create new mammogram images, achieving a Fréchet Inception Distance (FID) of 5.8.
3. We verify that diagnostic information is preserved in the generated images.
4. We demonstrate that patient privacy is maintained, both through the identification model and by conducting a membership inference attack.

1.1. Patient Identification Using Medical Images

To the best of our knowledge, no published work has demonstrated patient identification based on mammogram images. The closest domain-related study is presented in [17], where patient identification based on chest X-ray images was achieved using two approaches. In the first approach, a Siamese network takes two images and outputs a probability p ∈ [0, 1] indicating whether they belong to the same patient. The second approach does not directly decide whether the images belong to the same patient; instead [18], a Siamese network with contrastive loss embeds images so that similar cases are close in Euclidean distance and dissimilar ones are far apart.

1.2. Synthetic Data Generation

Generative Adversarial Networks (GANs) [19] have been widely used for generating synthetic data in the context of medical imaging. With the advent of diffusion models [20], which have shown superior performance over GANs in many generative tasks, diffusion models have also been applied to generate medical images [20]. In [21], the authors utilize Conditional Latent Diffusion Models (CLDMs) to generate synthetic mammograms with malignant features based on an image of a healthy mammogram and a lesion mask. The results demonstrate practical applicability, but the method preserves breast shape, which is a drawback for de-identification.
Recent conditional LDMs add controllability via cross-attention and classifier-free guidance, improving prompt adherence while balancing fidelity and diversity [22,23]. Beyond text, lightweight add-ons like ControlNet and T2I-Adapter inject spatial priors (edges, depth, masks, pose) without retraining the backbone [24,25]. GLIGEN grounds generation with layouts (e.g., bounding boxes) for open-set, layout-constrained synthesis [26]. Scaling variants such as SDXL expand context and aspect-ratio conditioning for higher-resolution quality [27]. In medical imaging, mask- and anatomy-aware conditioning enforces topology and label consistency—crucial when preserving breast shape for de-identification [28,29].

1.3. Privacy-Preserving Approaches

The most common approaches to secure data sharing while ensuring privacy include federated learning, dataset distillation, and other methods [20,21,30]. In federated learning, the model is trained locally on data within medical institutions, and model updates are centrally aggregated. The main limitation of this method is the need for continuous interaction with medical institutions for model training, which limits its use in open-access environments. Dataset distillation ensures data privacy by distilling information into a compact and generalized dataset. However, it has the drawback of being dependent on the model architecture, and the resulting data is not visually interpretable.
In this work we propose using generative models to create synthetic datasets that retain the statistical properties of the original data while ensuring privacy.
For example, in [24], the authors used GANs to generate a synthetic dataset based on chest X-ray images and brain CT scans. They analyzed the loss in classification quality with various parameters of the original dataset and the synthetic dataset generated by the neural network. The authors achieved results for classification tasks on 128 × 128 images, demonstrating that the classification metric remained stable when training on synthetic data. However, for larger images the quality significantly decreased. The focus of this work was on the statistical properties of the generated data rather than on privacy. The authors also noted that their method ultimately does not guarantee data privacy but emphasized that generative models hold great potential for addressing this challenge.

1.4. Federated Learning for Privacy-Preserving Medical Imaging

Federated learning (FL) trains models across hospitals without moving raw images, sharing only model updates (often with secure aggregation and differential privacy) [31]. It can improve robustness under domain shift but faces non-IID data, intermittent clients/bandwidth, and the need for sustained cross-site MLOps. FL and our approach share the same goal—protect patient privacy—but differ in what is shared: FL keeps data local and shares only updates, whereas we release shareable synthetic images that approximate the real distribution [32]. This enables open benchmarking and reuse without coordinating live federated rounds. The two strategies are also compatible (e.g., FL-trained generators or federated evaluation of synthetic datasets) [33].
Examples of FL in practice include early multi-institutional brain tumor segmentation showing FL performance close to centralized training [34], FL frameworks for COVID-19 chest X-ray screening across hospitals [35], and recent breast cancer studies using FL with transfer learning across multiple centers for privacy-preserving classification [36]. For federated evaluation rather than training, MedPerf enables “bring-the-model-to-the-data” benchmarking in clinical sites [37].

1.5. Extension to 3D Imaging

The conditional latent–diffusion pipeline can be extended from 2D mammography to volumetric digital breast tomosynthesis (DBT) and breast MRI. Prior work shows diffusion models are effective for 3D medical image synthesis and translation, motivating a 3D variant of our approach [38,39]. Two practical designs are relevant: a native 3D latent model with a 3D U-Net, maximizing inter-slice coherence at higher compute, and a slice-wise model augmented with cross-slice context (e.g., axial attention or a lightweight 3D refiner), which reduces memory while preserving anatomical continuity—see, e.g., latent-diffusion approaches such as Make-A-Volume [40]. Conditioning can encode acquisition metadata: for DBT, view angle/compression and geometry; for MRI, sequence and time-point indicators, which improves controllability and clinical relevance.
Evaluation should combine volumetric perceptual metrics (e.g., FID/LPIPS aggregated over volumes) with task-level tests such as lesion detection in DBT/MRI, complemented by expert review for slice-to-slice anatomical consistency [40]. Public DBT resources [41] and domain reviews [42] provide baselines and highlight gaps—limited public data, reconstruction artifacts—where synthetic 3D data can help. Extending the generator to 3D would retain the privacy advantages of synthetic data while aligning more closely with current DBT and breast MRI workflows.

2. Materials and Methods

2.1. Dataset

The VinDr-Mammo dataset [43], which was released recently, contains 5000 mammographic studies (four images per patient, totaling 20,000 digital mammograms). The annotated examinations are split into a training set of 4000 studies and a test set of 1000 studies. The dataset uses the BI-RADS classification system. Images are classified according to the solution proposed in [44], specifically: categories 1 and 2 as “normal”, categories 4 and 5 as “cancer”, with category 3 excluded. Therefore, the images have two labels: {cancer, normal}.
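As a concrete illustration of this label mapping, the following Python sketch converts BI-RADS categories into the binary labels used throughout the paper. The annotation file name and the breast_birads column are assumptions about the VinDr-Mammo release, not details taken from the original text.

```python
import pandas as pd

def birads_to_label(category: int):
    # Categories 1-2 -> "normal", 4-5 -> "cancer"; category 3 is excluded
    if category in (1, 2):
        return "normal"
    if category in (4, 5):
        return "cancer"
    return None

# Hypothetical annotation file and column names
ann = pd.read_csv("breast-level_annotations.csv")
ann["category"] = ann["breast_birads"].str.extract(r"(\d+)", expand=False).astype(int)
ann["label"] = ann["category"].map(birads_to_label)
ann = ann.dropna(subset=["label"])  # drops BI-RADS 3 studies
```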

2.2. Patient Identification Based on Images

To address the task of patient identification using images, we applied a neural network model based on the SNN architecture as described in [14]. The model takes a mammogram image as input and outputs a 256-dimensional vector representing the image in a lower-dimensional space. During training, the model pulls embeddings from the same patient together and pushes embeddings from different patients apart using Euclidean distance. For this, the contrastive loss function [15] is used during training, calculated by the following formula:
\[ L = \frac{1}{2N} \sum_{i=1}^{N} \Big[ y_i\, d_i^2 + (1 - y_i)\, \max\big(0,\; m - d_i\big)^2 \Big] \tag{1} \]
where N is the total number of pairs, y_i is the label for the i-th pair (1 for positive pairs, 0 for negative pairs), d_i is the Euclidean distance between the representations of the i-th pair, and m is the margin (threshold) that defines the minimum distance between negative pairs. For further information, please refer to Appendix A.
After training the neural network, the search is performed as follows: for all images in the dataset, the neural network generates a vector representation of reduced dimensionality. Next, pairwise distances between the vectors are calculated, and the candidates are sorted in ascending order of distance. Thus, by using the contrastive loss function [15], which penalizes closeness between images of different patients and large distances between images of the same patient, the neural network is trained to solve the required task.
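A minimal PyTorch sketch of this contrastive training objective and the distance-based ranking is shown below; the function names and margin value are illustrative assumptions rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a: torch.Tensor, emb_b: torch.Tensor,
                     y: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
    """Contrastive loss of Equation (1): y = 1 for pairs from the same patient, 0 otherwise."""
    d = F.pairwise_distance(emb_a, emb_b)                   # Euclidean distance per pair
    pos = y * d.pow(2)                                      # pull same-patient pairs together
    neg = (1 - y) * torch.clamp(margin - d, min=0).pow(2)   # push different patients beyond the margin
    return 0.5 * (pos + neg).mean()

def rank_candidates(query_emb: torch.Tensor, gallery_emb: torch.Tensor) -> torch.Tensor:
    """Retrieval step: sort gallery embeddings by ascending Euclidean distance to each query."""
    dists = torch.cdist(query_emb.unsqueeze(0), gallery_emb.unsqueeze(0)).squeeze(0)
    return dists.argsort(dim=1)
```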
The following metrics were used to assess identification performance. Precision@R (Equation (2)) is a common metric used in retrieval tasks. It measures the precision of the top R returned results, i.e., how many of the retrieved results are relevant. Mathematically, it is defined as
\[ \mathrm{Precision@}R = \frac{r}{R} \tag{2} \]
where r is the number of relevant results among the top R results and R is the total number of retrieved results (R is a fixed value).
Acc@R (Accuracy at R, Equation (3)) measures the correctness of retrieval at rank R. It is defined as the fraction of queries whose Precision@R is non-zero:
\[ \mathrm{Acc@}R = \frac{1}{Q} \sum_{i=1}^{Q} \mathbb{I}\big(\mathrm{Precision@}R_i \neq 0\big) \tag{3} \]
where Q is the total number of queries, Precision@R_i is the Precision@R of the i-th query, and the indicator function equals 1 when Precision@R_i is non-zero and 0 otherwise. This metric measures the fraction of queries for which at least one relevant result is returned in the top R.
AP@R (Average Precision at R, Equation (4)) is a refined version of precision that takes the ranking of the relevant results into account. It is computed as the average precision over all positions in the top R results:
\[ \mathrm{AP@}R = \frac{1}{R} \sum_{i=1}^{R} \mathrm{Precision@}i \times \mathrm{rel}_i \tag{4} \]
where rel_i is an indicator that is 1 if the result at rank i is relevant and 0 otherwise.
mAP@R (Mean Average Precision at R, Equation (5)) is the average of AP@R values across multiple queries. It provides an overall measure of the ranking quality of the system:
\[ \mathrm{mAP@}R = \frac{1}{Q} \sum_{i=1}^{Q} \mathrm{AP}_i\mathrm{@}R \tag{5} \]
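These metrics can be computed directly from per-query relevance lists, as in the NumPy sketch below, which is a straightforward implementation of Equations (2)-(5); the function names are ours, not from the paper.

```python
import numpy as np

def average_precision_at_r(rel, R: int) -> float:
    """AP@R of Equation (4): rel is a binary relevance list ranked by ascending distance."""
    rel = np.asarray(rel[:R], dtype=float)
    precision_at_i = np.cumsum(rel) / (np.arange(R) + 1)   # Precision@i for i = 1..R
    return float((precision_at_i * rel).sum() / R)

def retrieval_metrics(relevance_lists, R: int):
    """Precision@R, Acc@R, and mAP@R averaged over all queries."""
    prec = [sum(rel[:R]) / R for rel in relevance_lists]            # Equation (2) per query
    acc = float(np.mean([p > 0 for p in prec]))                      # Equation (3)
    m_ap = float(np.mean([average_precision_at_r(rel, R) for rel in relevance_lists]))  # Equation (5)
    return float(np.mean(prec)), acc, m_ap
```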

2.3. Synthetic Data Generation

For the task of generating new datasets, we utilized the Latent Diffusion Model (LDM). An LDM consists of two parts: a Variational Auto-Encoder (VAE) that maps the image from the original space into a latent space and a diffusion model that learns the data distribution in that latent space. Therefore, the VAE must first be trained to achieve good image reconstruction quality; then, with the VAE weights fixed, the diffusion model is trained. Because we have class labels, view labels, and lesion boxes, we treat generation as conditional: a mask containing this information is additionally fed into the model. Hence, a CLDM was trained to account for this condition. To evaluate the perceptual fidelity of the synthetic images, we used the Fréchet Inception Distance (FID) computed on features extracted by the Inception-v3 network. Although Inception-v3 was trained on natural images, recent work demonstrates that FID can meaningfully reflect perceptual quality in medical-image synthesis [45]. Details can be found in Figure 1.
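The sketch below illustrates one way such a conditioning mask could be assembled. The two-channel layout (breast region plus lesion boxes) is our assumption based on the mask visualizations in Figures 4 and 5, and the output resolution parameter is illustrative.

```python
import cv2
import numpy as np

def build_condition_mask(breast_mask: np.ndarray, lesion_boxes, size: int = 64) -> np.ndarray:
    """Assemble a 2-channel conditioning mask: channel 0 = breast region, channel 1 = lesion boxes.
    breast_mask is a binary HxW array; lesion_boxes are (x_min, y_min, x_max, y_max) in pixel coords."""
    h, w = breast_mask.shape
    cond = np.zeros((2, size, size), dtype=np.float32)
    cond[0] = cv2.resize(breast_mask.astype(np.float32), (size, size),
                         interpolation=cv2.INTER_NEAREST)
    for (x_min, y_min, x_max, y_max) in lesion_boxes:
        xs, xe = int(x_min / w * size), int(np.ceil(x_max / w * size))
        ys, ye = int(y_min / h * size), int(np.ceil(y_max / h * size))
        cond[1, ys:ye, xs:xe] = 1.0
    return cond
```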

2.4. Patient Identification Experiment

We conducted an experiment to evaluate the model’s ability to identify a patient. A patient is considered identified if their mammogram image is among the top 3 results returned by the model (top@3). It is important to note that during the dataset split into training and validation sets, if one image of a patient is included in the training set, all other images of that patient will also be in the training set. Thus, there is no overlap of patients between the sets.
Each patient has four images: LCC (left craniocaudal), LMLO (left mediolateral oblique), RCC (right craniocaudal), and RMLO (right mediolateral oblique). To prevent potential errors in identification, where the model might match images from different projections of the same patient, we took several important steps.
Firstly, all images from the right side (RCC and RMLO) were flipped along the vertical axis to align with the projections from the left side (LCC and LMLO). This unification of image orientation prevents the model from being biased based on the gland’s projection.
Secondly, we applied binary gland masks as in [46] to remove projection labels from the images. This ensures that the model does not rely on these labels for identification and focuses exclusively on mammographic data.
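A compact sketch of these two preprocessing steps (masking out burned-in projection labels and mirroring right-side views) is given below; the function and argument names are ours.

```python
import cv2
import numpy as np

def standardize_view(image: np.ndarray, gland_mask: np.ndarray, laterality: str) -> np.ndarray:
    """Keep only the breast region (removing projection labels), then mirror right-side views
    so that all glands point in the same direction."""
    image = np.where(gland_mask > 0, image, 0)
    if laterality == "R":              # RCC / RMLO views
        image = cv2.flip(image, 1)     # flip around the vertical axis
    return image
```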

2.5. Synthetic Dataset Generation

To generate the synthetic dataset, the original repository from the article presenting the LDM [25] was used. Initially, the encoder and decoder of the model were trained. Specifically, two model architectures referred to by the authors as VQ-VAE and KL-VAE were trained.

3. Results

3.1. Patient Identification

All images were standardized, displaying only the mammograms without projection information. Thus, we created a dataset consisting of mammographic images where all glands are oriented in the same direction and projection labels have been removed. An example of images after applying the mask and rotation is shown in Figure 2.
For comparison, we also conducted an experiment with projection labels included. After training the model, the resulting quality metrics are shown in Table 1. An example of the model’s performance is presented in Figure 3.

3.2. Synthetic Dataset

After the training process was completed, images from the test set were reconstructed, and the quality of reconstruction was evaluated using the Fréchet Inception Distance (FID) metric. The PyTorch-FID library [47] was used for this calculation. The results are presented in Table 2. The evaluation showed that the KL-VAE model achieved significantly better reconstruction quality compared to VQ-VAE. Therefore, for further experiments, the pretrained KL-VAE model with frozen weights was used as the encoder and decoder.
Following the training of the encoder and decoder, a conditional diffusion model was trained in a latent space of size 4 × 64 × 64. This model takes as input the latent representation of the image of size 4 × 64 × 64 and a mask of size 2 × 64 × 64. The mask contains information about the gland projection, size, the presence of lesions, and their location. To enhance the accuracy of lesion generation, the loss function is multiplied by a factor of 2 in regions where lesions are localized.
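A minimal sketch of this lesion-weighted objective is shown below, assuming an L1 diffusion loss (as listed in Table A5) and a lesion mask resized to the 64 × 64 latent grid; the explicit weighting scheme is our reading of the "factor of 2" described above.

```python
import torch

def lesion_weighted_l1(pred_noise: torch.Tensor, target_noise: torch.Tensor,
                       lesion_mask: torch.Tensor, lesion_weight: float = 2.0) -> torch.Tensor:
    """Per-pixel L1 diffusion loss, multiplied by `lesion_weight` inside lesion regions.
    lesion_mask: (B, 1, 64, 64) tensor with 1.0 inside lesion boxes and 0.0 elsewhere."""
    per_pixel = (pred_noise - target_noise).abs()
    weights = 1.0 + (lesion_weight - 1.0) * lesion_mask   # 2.0 in lesions, 1.0 elsewhere
    return (weights * per_pixel).mean()
```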
Image generation and FID evaluation are computationally expensive and time-consuming processes, making it impractical to calculate FID at every epoch for selecting the best model weights. Therefore, the weights of the three best-performing models based on the loss function on the test set were saved. FID was then calculated for these models, and the best-performing model according to this metric was selected. The results are shown in Table 3.
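For reference, the FID computation with the PyTorch-FID library [47] reduces to a single call over two image folders; the directory names below are placeholders.

```python
import torch
from pytorch_fid.fid_score import calculate_fid_given_paths

# Equivalent CLI: python -m pytorch_fid real_pngs/ generated_pngs/
fid = calculate_fid_given_paths(
    ["real_pngs/", "generated_pngs/"],   # placeholder folders with exported images
    batch_size=50,
    device=torch.device("cuda" if torch.cuda.is_available() else "cpu"),
    dims=2048,                           # default Inception-v3 pool3 feature dimension
)
print(f"FID: {fid:.3f}")
```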
Additionally, we conducted a study to assess whether the synthetic mammograms are visually distinguishable from real ones. We generated 100 images and paired each with its corresponding real source image. A radiologist with 5 years of experience completed a blinded, pairwise task, selecting the synthetic image in each pair. The radiologist's accuracy was 43%, close to the 50% expected by chance, indicating that the synthetic images were not reliably distinguishable from the real ones. An example of a generated image is shown in Figure 4.

3.3. Evaluating the Practical Utility of the Synthetic Dataset

To assess the practical utility of the synthetic dataset, two identical cancer classification models based on the EfficientNet-B3 architecture [6] were trained. The first model was trained exclusively on the original data, while the second was trained entirely on the synthetic data, which was generated using masks from the original dataset. Both the original and synthetic training datasets have the same size and the same number of images with the "cancer" label.
All training parameters were identical in both cases. After training, classification metrics were computed on the test part of the original dataset. The metrics in Table 4 indicate that the model trained solely on synthetic data has slightly lower performance than the model trained on the original data. However, the quality achieved on the synthetic dataset is still sufficiently high, demonstrating the practical value of the generated data.
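The sketch below shows how such a classifier can be set up with torchvision. The single-channel stem adaptation and the use of generic loss and optimizer are simplifying assumptions; Table A6 lists the exact focal loss and Lion optimizer settings used in the experiments.

```python
import torch
import torch.nn as nn
from torchvision import models

# EfficientNet-B3 binary classifier (cancer vs. normal)
model = models.efficientnet_b3(weights=models.EfficientNet_B3_Weights.IMAGENET1K_V1)
stem = model.features[0][0]
model.features[0][0] = nn.Conv2d(1, stem.out_channels, kernel_size=stem.kernel_size,
                                 stride=stem.stride, padding=stem.padding, bias=False)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 1)

criterion = nn.BCEWithLogitsLoss()                          # stand-in for the focal loss of Table A6
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # stand-in for the Lion optimizer

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    optimizer.zero_grad()
    logits = model(images).squeeze(1)
    loss = criterion(logits, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```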

3.4. Privacy Assurance Evaluation

To make the rest of the evaluation clear, it is necessary to understand how the synthetic datasets are constructed. For each of the train and test sets, a corresponding synthetic set of the same size is generated. The generation process uses a mask derived from each image in the original dataset for conditional generation (see Figure 5). The generated image is then assigned the patient identifier associated with the mask used.
Ensuring privacy protection is a complex task. Several possible scenarios need to be checked, which we will consider sequentially.
First, we need to ensure that the original image of a patient does not match a generated one with the same identifier. If this happens, the patient’s information would be exposed.
To verify this, we computed the identification quality metrics described in Section 2.2. It is important to note that in this case a search (query) is conducted for each image from the original dataset in the generated dataset.
The results in Table 5 show that there is no unambiguous correspondence, and the possibility of re-identification with a matching identifier is quite low.
Another scenario to consider is when the model generates a similar or identical image for a different patient. In this case, an attacker could use visual similarity to establish that the patient's data is present in the dataset and to extract additional information.
To assess the likelihood of such similarities between images, we analyzed the distances between vectors obtained using the model. We calculated the distance between the original and the nearest original images as well as the distance between the original and the nearest generated images. Figure 6 and Figure 7 show that the distance between the original and generated images is greater than between the original images, indicating that there is no direct visual correspondence between them. However, it is necessary to analyze images from the left tail of the distribution, which have small distances, and to remove them if explicit visual matches are found. Another solution to this problem is to regenerate images with distances below a certain threshold.
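The distance analysis can be reproduced with a few lines of NumPy, as sketched below; real_emb and synth_emb are assumed to hold the identification model's embeddings for real and synthetic images, and the 5% quantile used as a review threshold is an illustrative choice, not a value from the paper.

```python
import numpy as np

def nearest_distances(query_emb: np.ndarray, ref_emb: np.ndarray,
                      exclude_self: bool = False) -> np.ndarray:
    """Distance from each query embedding to its nearest reference embedding (moderate set sizes)."""
    d = np.linalg.norm(query_emb[:, None, :] - ref_emb[None, :, :], axis=-1)
    if exclude_self:
        np.fill_diagonal(d, np.inf)          # ignore the trivial zero-distance self-match
    return d.min(axis=1)

d_real = nearest_distances(real_emb, real_emb, exclude_self=True)   # blue curves in Figures 6 and 7
d_synth = nearest_distances(real_emb, synth_emb)                     # red curves
threshold = np.quantile(d_real, 0.05)                                # illustrative review threshold
suspicious = np.where(d_synth < threshold)[0]                        # candidates for manual check or regeneration
```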

4. Discussion

This study addresses a critical challenge in medical imaging: balancing the need for large, high-quality datasets to train machine learning models with the need to protect patient privacy. By using generative models—specifically, Latent Diffusion Models (LDMs)—we demonstrate a viable path towards privacy-preserving synthetic data generation that preserves practical utility.
The key contribution of this work is a pipeline capable of generating mammographic images conditioned on diagnostic and spatial metadata, such as lesion location and projection type. This conditional synthesis approach can produce realistic and diverse samples while preserving essential diagnostic features. The use of a KL-VAE-based encoder–decoder further enhanced image fidelity, achieving an FID of 5.8. Ref. [48] reports an FID of 4.383 and Ref. [49] reports 52.89, but these studies are not directly comparable to ours because they use different datasets.
Our experiments revealed that the synthetic dataset is sufficient for training cancer classification models: a classifier trained solely on synthetic images achieved an ROC-AUC of 0.77 compared to 0.82 when trained on real data. Although there is a slight degradation in performance, the gap is relatively small, suggesting the potential of synthetic data as a surrogate in scenarios where access to real data is limited or legally restricted.
Importantly, the privacy evaluation demonstrated strong robustness to re-identification attempts. By training and testing an SNN-based identification model, we confirmed that synthetic images do not contain one-to-one mappings to original data. Re-identification metrics (e.g., mAP@R ≈ 0.0051) and distance distributions support the conclusion that patient identity is effectively obscured. Furthermore, the radiologist in our blinded evaluation was unable to reliably distinguish between real and synthetic samples, further affirming the realism of the synthetic data.
Our results speak to the fundamental tension between data fidelity and privacy: improving synthetic realism does not have to come at the cost of leaking patient-specific features. The presented method strikes a balance by incorporating semantic information (through masks and labels) while introducing sufficient latent-space variability to reduce similarity to real inputs.
Nevertheless, the synthetic data is derived from masks based on the original data, which could still embed subtle patient-specific traits if not carefully designed. Moreover, the cancer classification task, while illustrative, does not encompass the full spectrum of diagnostic applications. Further studies are needed to evaluate the utility of synthetic datasets for segmentation, detection, or multi-class classification tasks. While our privacy analysis includes standard metrics and qualitative inspection, formal guarantees such as differential privacy bounds were not established and represent an important direction for future research.
To sum up, this paper presents a comprehensive framework for generating private synthetic mammograms. Our key contributions are as follows:
  • High-quality image generation: We developed a Latent Diffusion Model (LDM) that produces realistic synthetic mammograms while preserving diagnostic features, achieving a Fréchet Inception Distance (FID) of 5.808. A blinded evaluation by a radiologist confirmed the visual fidelity, with an identification accuracy of only 43% (close to random chance).
  • Preservation of diagnostic utility: We verified that the synthetic data retains clinical value. A cancer classification model trained exclusively on our generated data achieved an ROC-AUC of 0.77, demonstrating only a slight performance drop compared to a model trained on original data (ROC-AUC: 0.82) and proving its practical utility for downstream tasks.
  • Robust patient privacy assurance: We rigorously evaluated privacy risks. Our identification model showed near-zero re-identification accuracy (mAP@R of 0.001 on the training set), and a distribution analysis of image embeddings confirmed that synthetic images are significantly less similar to their original counterparts than original images are to each other, effectively mitigating the risk of data leakage.
Overall, the study confirms that LDM-based synthetic data generation can be a powerful tool in privacy-sensitive domains, offering a way to enable open collaboration and data sharing without compromising patient confidentiality. Future work will focus on enhancing the conditioning mechanism, introducing formal privacy guarantees, and testing scalability across diverse imaging modalities and institutions.
In future work, we plan to take a closer look at more privacy attacks on our pipeline. We plan to test attribute-inference attacks and run ablations to see how much leakage comes from lesion cues and overall breast shape. On the defense side, we will try different fine-tuning techniques, simple regularization of the loss/conditioning (including randomized masking and small geometry tweaks), and basic output filters based on embedding similarity.
Another potential next step is a multi-reader study to check how well radiologists detect and classify lesions on synthetic mammograms. Using blinded, lesion-level reads and standard ROC/FROC measures, we would compare the results with matched real images to verify non-inferiority.

5. Conclusions

The objective of this study was to examine the potential of neural network models in ensuring the privacy of medical data transmission, with digital mammogram images serving as a case study. The initial stage of the study involved training a model for patient identification from mammogram images. This demonstrated the potential for identification, as confirmed by an mAP@R metric of 0.636.
To address this issue, an LDM was trained to generate a synthetic dataset. A synthetic sample was created using this model, and the Fréchet Inception Distance (FID) metric was calculated for it. Subsequently, a visual similarity experiment was conducted, during which a blinded reader was unable to reliably distinguish the original images from the synthetic ones.
Additionally, an experiment was conducted to illustrate the practical utility of synthetic data. A cancer classification model was trained on both the original and synthetic datasets. Classification quality metrics were calculated on the test subset of the original dataset. The findings indicate that the model trained exclusively on synthetic data exhibits slightly inferior quality metrics in comparison to the model trained on the original data. Nevertheless, the quality of the synthetic dataset remains sufficiently robust, thereby substantiating the practical utility of the generated data.
To analyze the assurance of privacy, metrics were initially calculated to confirm the fulfillment of the conditions delineated in the problem statement. Subsequently, an additional analysis of image similarity was conducted by comparing the Euclidean distances between the embedding vectors of the original and generated images.
In conclusion, synthetic data can be useful in clinical settings when used carefully, but it may miss device- or site-specific nuances, introduce subtle artifacts, or shift class balance. To keep performance reliable, using synthetic data in clinical practice requires site-specific validation, subgroup and calibration checks, and ongoing human oversight.

Author Contributions

Conceptualization, D.S., E.U., A.L., and Y.M.; methodology, D.S. and E.U.; software, D.S. and E.U.; validation, D.S., E.U., and Y.M.; formal analysis, Y.M.; investigation, D.S.; resources, Y.M.; data curation, E.U.; writing—original draft preparation, D.S.; writing—review and editing, D.S., E.U., A.L., and Y.M.; visualization, D.S.; supervision, Y.M.; project administration, E.U. and Y.M.; funding acquisition, Y.M. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by a grant, provided by the Ministry of Economic Development of the Russian Federation in accordance with the subsidy agreement (agreement identifier 000000C313925P4G0002) and the agreement with the Ivannikov Institute for System Programming of the Russian Academy of Sciences dated 20 June 2025 No. 139-15-2025-011.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
ML: Machine Learning
LDM: Latent Diffusion Model
CLDM: Conditional Latent Diffusion Model
VAE: Variational Autoencoder
VQ-VAE: Vector Quantized Variational Autoencoder
KL-VAE: Kullback–Leibler Variational Autoencoder
GAN: Generative Adversarial Network
SNN: Siamese Neural Network
FID: Fréchet Inception Distance
BI-RADS: Breast Imaging Reporting and Data System
CT: Computed Tomography
MRI: Magnetic Resonance Imaging

Appendix A. Experiments and Dataset Details

Appendix A.1. Dataset Details

The train set includes 7233 studies classified as BI-RADS categories 1 or 2 (‘normal’), 372 studies classified as category 3, and studies classified as categories 4 and 5 (‘cancer’). The test set includes 1808 studies classified as categories 1 or 2 (‘normal’), 93 studies classified as category 3, and 99 studies classified as categories 4 and 5 (‘cancer’). The images were resized to 512 × 512 and this size was used for all experiments.

Appendix A.2. Patient Identification Based on Images

The SNN network was trained using a two-stage procedure, similar to that described in [17]. The EfficientNet B3 model with pre-trained weights from the torchvision library was used as the backbone. In the initial phase, the backbone of the network was frozen, with only the task-specific head being trained. In the subsequent phase, the backbone was unfrozen, and the entire network underwent fine-tuning. For both stages we used mini-batch training with batch size 32, and standard image normalization (mean = 0.122, std = 0.231, pixel range [0, 1]). Data augmentation is detailed in Table A2. Optimization in each stage used SGD with momentum and a OneCycle learning-rate scheduler; specific optimizer and scheduler values for each stage are listed in Table A3. All reported epochs and scheduler settings are as used in the experiment configuration.
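A condensed sketch of this two-stage schedule, using the hyperparameters from Table A3, is given below; the 256-dimensional linear head and the way the backbone is wrapped are our assumptions.

```python
import torch.nn as nn
import torch.optim as optim
from torchvision import models

backbone = models.efficientnet_b3(weights=models.EfficientNet_B3_Weights.IMAGENET1K_V1)
backbone.classifier = nn.Identity()              # expose the 1536-d pooled features
head = nn.Linear(1536, 256)                      # 256-d embedding head (assumed design)
model = nn.Sequential(backbone, head)

def make_stage(max_lr: float, freeze_backbone: bool):
    for p in backbone.parameters():
        p.requires_grad = not freeze_backbone
    trainable = [p for p in model.parameters() if p.requires_grad]
    opt = optim.SGD(trainable, lr=max_lr, momentum=0.9, weight_decay=1e-5)
    sched = optim.lr_scheduler.OneCycleLR(opt, max_lr=max_lr, epochs=50, steps_per_epoch=500)
    return opt, sched

opt1, sched1 = make_stage(max_lr=0.1, freeze_backbone=True)     # stage 1: head only
opt2, sched2 = make_stage(max_lr=0.021, freeze_backbone=False)  # stage 2: full fine-tuning
```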

Appendix A.3. Synthetic Data Generation

For high-resolution image generation we followed the original Latent Diffusion Models repository and retained the original configuration structure and the majority of its values. The training process was divided into two distinct stages. The first-stage VAE (KL-8) used embed_dim = 4 and z_channels = 4 with an input resolution of 512 × 512. The LDM operates on 64 × 64 latents (image_size = 64, channels = 4) and concatenates the segmentation encoding to the latent channels (concat_mode = true; UNet in_channels = 6, see explanation below). The conditioning encoder for segmentation is a spatial rescaler with two output channels; consequently, concatenation results in six channels being fed into the UNet. For sampling, images were generated using the DDIM sampler (500 steps, η = 1). Note that augmentations are not applied during CLDM training.
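For orientation, the sampling step can be outlined as follows, assuming the DDIMSampler interface of the public latent-diffusion repository; the exact conditioning plumbing for concat mode may differ between repository versions, so this is a sketch rather than a drop-in script.

```python
import torch
from ldm.models.diffusion.ddim import DDIMSampler   # latent-diffusion repository

# cldm: the trained LatentDiffusion model; seg_mask: 2-channel conditioning mask batch
sampler = DDIMSampler(cldm)
cond = cldm.get_learned_conditioning(seg_mask)       # SpatialRescaler encoding of the mask
samples, _ = sampler.sample(S=500,                   # 500 DDIM steps
                            conditioning=cond,
                            batch_size=cond.shape[0],
                            shape=(4, 64, 64),       # latent C x H x W
                            eta=1.0,
                            verbose=False)
images = cldm.decode_first_stage(samples)            # KL-VAE decoder back to 512 x 512 images
```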

Appendix B. Parameters

Hardware

All training and evaluation runs were executed on a single NVIDIA A100 GPU with 40 GB of VRAM. All times below are single-run wall-clock durations measured on that machine.
Table A1. Single-run training wall-clock runtimes of experiments.

Experiment | Single-Run Runtime | Steps/Epochs
VAE (KL-8 first stage) | 2 days 5 h | 232,000 steps
Conditional LDM (CLDM) | 2 days 10 h | 259,000 steps
SNN stage 1 (frozen backbone) | 12.5 h | 50 epochs
SNN stage 2 (unfrozen backbone) | 17 h | 50 epochs
SNN total (both stages) | 1 day 5.5 h | 100 epochs
Classification model (EfficientNet) | 2.5 h | 20 epochs
Table A2. Augmentations used during patient identification (SNN) training.

Transform | Parameters/Notes
Resize | 512 × 512
Normalize | mean = 0.122, std = 0.231, max pixel value = 1
Hue/Saturation/Value | p = 0.5; val shift limit = 0.1 (hue/sat shift = 0)
Gaussian noise | p = 0.5; var_limit = [1.0 × 10−4, 0.005]; per_channel = True
Advanced blur | p = 0.35
Sharpen | p = 0.35; alpha ∈ [0.02, 0.2]
Pixel dropout | p = 0.3
Table A3. SNN optimization and scheduler settings (stage 1 and stage 2).

Setting | Stage 1 (Backbone Frozen) | Stage 2 (Backbone Unfrozen)
Optimizer | SGD | SGD
Initial/configured LR | 0.1 (max_lr in scheduler) | 0.021 (max_lr in scheduler)
Momentum | 0.9 | 0.9
Weight decay | 1.0 × 10−5 | 1.0 × 10−5
Scheduler | OneCycleLR (max_lr = 0.1) | OneCycleLR (max_lr = 0.021)
Scheduler config | epochs = 50, steps_per_epoch = 500 | epochs = 50, steps_per_epoch = 500
Batch size | 32 | 32
Table A4. VAE (KL-8)—key hyperparameters.

Parameter | Value/Notes
Target class | ldm.models.autoencoder.AutoencoderKL
Base LR | 4.5 × 10−6
Embed dim | 4
Loss type | LPIPSWithDiscriminator (disc_start = 50,001, disc_in_channels = 1, kl_weight = 1.0 × 10−6, disc_weight = 0.5)
ddconfig channel settings | ch = 128; ch_mult = [1, 2, 4, 4]; num_res_blocks = 2; attn_resolutions = []; dropout = 0.0
Batch size | 2
Table A5. Latent Diffusion Model (LDM)—key hyperparameters.

Parameter | Value/Notes
Target class | ldm.models.diffusion.ddpm.LatentDiffusion
Base LR | 4.5 × 10−6
Noise schedule | linear start = 0.0015; linear end = 0.0205; timesteps = 1000
Loss type | L1
Conditioning mode | concat_mode = true (concatenate segmentation encoding to latent channels)
UNet channels and heads | model_channels = 128; channel_mult = [1, 4, 8]; num_res_blocks = 2; num_heads = 8
cond_stage_config | SpatialRescaler with n_stages = 3, in_channels = 2 (segmentation encoder output)
Batch size | 16
Table A6. Classification model training and optimization settings.

Parameter | Value/Notes
Loss type | Focal (alpha = 0.948, gamma = 2.0, reduction = mean)
Optimizer | Lion (lr = 1.0 × 10−5, betas = [0.9, 0.99], weight_decay = 0.01)
Scheduler | ReduceLROnPlateau (monitor = AUROC, factor = 0.1, patience = 5, threshold = 1.0 × 10−4, mode = max)
Batch size | 32
Table A7. Augmentations used for classification training.

Transform | Parameters/Notes
Resize | 512 × 512
RandomGridShuffle | p = 0.2
AdvancedBlur | p = 0.35
Gaussian noise | p = 0.5; var_limit = [1 × 10−4, 0.005]; per_channel = True
Horizontal flip | p = 0.5
Hue/Saturation/Value | p = 0.5; val_shift_limit = 0.1 (hue/sat shift = 0)
Sharpen | p = 0.35; alpha ∈ [0.02, 0.2]
ShiftScaleRotate | p = 0.35; border_mode = 0
GridDropout | p = 0.1; ratio = 0.35
GridDistortion | p = 0.2; border_mode = 0
CoarseDropout | p = 0.2; max_holes = 20
PixelDropout | p = 0.3
Normalize | mean = 0.122, std = 0.231, max pixel value = 1

References

  1. Prodan, M.; Paraschiv, E.; Stanciu, A. Applying deep learning methods for mammography analysis and breast cancer detection. Appl. Sci. 2023, 13, 4272. [Google Scholar] [CrossRef]
  2. Gao, X.W.; Hui, R.; Tian, Z. Classification of CT brain images based on deep learning networks. Comput. Methods Programs Biomed. 2017, 138, 49–56. [Google Scholar] [CrossRef]
  3. Aamir, M.; Rahman, Z.; Dayo, Z.A.; Abro, W.A.; Uddin, M.I.; Khan, I.; Imran, A.S.; Ali, Z.; Ishfaq, M.; Guan, Y.; et al. A deep learning approach for brain tumor classification using MRI images. Comput. Electr. Eng. 2022, 101, 108105. [Google Scholar] [CrossRef]
  4. Ushakov, E.; Naumov, A.; Fomberg, V.; Vishnyakova, P.; Asaturova, A.; Badlaeva, A.; Tregubova, A.; Karpulevich, E.; Sukhikh, G.; Fatkhudinov, T. EndoNet: A Model for the Automatic Calculation of H-Score on Histological Slides. Informatics 2023, 10, 90. [Google Scholar] [CrossRef]
  5. Ibragimov, A.; Senotrusova, S.; Markova, K.; Karpulevich, E.; Ivanov, A.; Tyshchuk, E.; Grebenkina, P.; Stepanova, O.; Sirotskaya, A.; Kovaleva, A.; et al. Deep Semantic Segmentation of Angiogenesis Images. Int. J. Mol. Sci. 2023, 24, 1102. [Google Scholar] [CrossRef]
  6. De Fauw, J.; Ledsam, J.R.; Romera-Paredes, B.; Nikolov, S.; Tomasev, N.; Blackwell, S.; Askham, H.; Glorot, X.; O’dOnoghue, B.; Visentin, D.; et al. Clinically applicable deep learning for diagnosis and referral in retinal disease. Nat. Med. 2018, 24, 1342–1350. [Google Scholar] [CrossRef]
  7. Monteiro, M.; Newcombe, V.F.J.; Mathieu, F.; Adatia, K.; Kamnitsas, K.; Ferrante, E.; Das, T.; Whitehouse, D.; Rueckert, D.; Menon, D.K.; et al. Multiclass semantic segmentation and quantification of traumatic brain injury lesions on head CT using deep learning: An algorithm development and multicentre validation study. Lancet Digit. Health 2020, 2, e314–e322. [Google Scholar] [CrossRef]
  8. Chen, H.; Engkvist, O.; Wang, Y.; Olivecrona, M.; Blaschke, T. The rise of deep learning in drug discovery. Drug Discov. Today 2018, 23, 1241–1250. [Google Scholar] [CrossRef] [PubMed]
  9. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  10. Xing, X.; Papanastasiou, G.; Dìaz, O.; Alberich, L.C.; Osuala, R.; Nan, Y.; Lekadir, K.; Yang, G. Generating Synthetic Data in Cancer Research. In Trustworthy AI in Cancer Imaging Research; Springer: Cham, Switzerland, 2025; pp. 81–101. [Google Scholar]
  11. Esteva, A.; Robicquet, A.; Ramsundar, B.; Kuleshov, V.; DePristo, M.; Chou, K.; Cui, C.; Corrado, G.; Thrun, S.; Dean, J. A guide to deep learning in healthcare. Nat. Med. 2019, 25, 24–29. [Google Scholar] [CrossRef]
  12. Lo, B. Sharing clinical trial data: Maximizing benefits, minimizing risk. J. Am. Med. Assoc. 2015, 313, 793–794. [Google Scholar] [CrossRef] [PubMed]
  13. van Panhuis, W.G.; Paul, P.; Emerson, C.; Grefenstette, J.; Wilder, R.; Herbst, A.J.; Heymann, D.; Burke, D.S. A systematic review of barriers to data sharing in public health. BMC Public Health 2014, 14, 1144. [Google Scholar] [CrossRef]
  14. Phillips, M. International data-sharing norms: From the OECD to the general data protection regulation (GDPR). Hum. Genet. 2018, 137, 575–582. [Google Scholar] [CrossRef] [PubMed]
  15. Fan, L. Differential privacy for image publication. In Proceedings of the Theory and Practice of Differential Privacy (TPDP) Workshop 2019, London, UK, 11 November 2019; Volume 1, p. 6. [Google Scholar]
  16. Mongkolluksamee, S.; Khonthapagdee, S. Privacy-Preserving Breast Density Classification in Mammograms Using Fuzzy C-Means and Homomorphic Encryption. In Proceedings of the 2025 17th International Conference on Knowledge and Smart Technology (KST), Bangkok, Thailand, 26 February 2025–1 March 2025; pp. 1–6. [Google Scholar]
  17. Packhäuser, K.; Gündel, S.; Münster, N.; Syben, C.; Christlein, V.; Maier, A. Deep learning-based patient re-identification is able to exploit the biometric nature of medical chest X-ray data. Sci. Rep. 2022, 12, 14851. [Google Scholar] [CrossRef] [PubMed]
  18. Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition—Volume 2 (CVPR’06), New York, NY, USA, 17–22 June 2006; Volume 2, pp. 1735–1742. [Google Scholar]
  19. Skandarani, Y.; Jodoin, P.-M.; Lalande, A. GANs for medical image synthesis: An empirical study. J. Imaging 2023, 9, 69. [Google Scholar] [CrossRef]
  20. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  21. Angel, R.M.-D.; Sam-Millan, K.; Vilanova, J.C.; Mart’ı, R. Mame: Mammographic synthetic image generation with diffusion models. Sensors 2024, 24, 2076. [Google Scholar] [CrossRef]
  22. Sun, Y.; Chen, Z.; Zheng, H.; Deng, W.; Liu, J.; Min, W.; Elazab, A.; Wan, X.; Wang, C.; Ge, R. BS-LDM: Effective Bone Suppression in High-Resolution Chest X-Ray Images with Conditional Latent Diffusion Models. IEEE J. Biomed. Health Inform. 2025. [Google Scholar] [CrossRef]
  23. Zhu, L.; Xue, Z.; Jin, Z.; Liu, X.; He, J.; Liu, Z.; Yu, L. Make-A-Volume: Leveraging Latent Diffusion Models for Cross-Modality 3D Brain MRI Synthesis. arXiv 2023, arXiv:2307.10094. [Google Scholar]
  24. Shi, G.; Xiao, L.; Chen, Y.; Zhou, S.K. Applying deep learning in digital breast tomosynthesis for breast cancer screening: Opportunities and challenges. Med. Image Anal. 2021, 70, 101979. [Google Scholar] [CrossRef]
  25. Ho, J.; Salimans, T. Classifier-Free Diffusion Guidance. arXiv 2022, arXiv:2207.12598. [Google Scholar] [PubMed]
  26. Podell, D.; English, Z.; Lacey, K.; Blattmann, A.; Dockhorn, T.; Müller, J.; Penna, J.; Rombach, R. SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis. arXiv 2023, arXiv:2307.01952. [Google Scholar] [CrossRef]
  27. Sch, A.D.; Hetzel, J.; Gatidis, S.; Hepp, T.; Dietz, B.; Bauer, S.; Schwab, P. Overcoming barriers to data sharing with medical image generation: A comprehensive evaluation. npj Digit. Med. 2021, 4, 141. [Google Scholar] [CrossRef]
  28. Zhang, L.; Rao, A.; Agrawala, M. ControlNet: Adding Conditional Control to Text-to-Image Diffusion Models. arXiv 2023, arXiv:2302.05543. [Google Scholar]
  29. Mou, C.; Wang, X.; Xie, L.; Wu, Y.; Zhang, J.; Qi, Z.; Shan, Y.; Qie, X. T2I-Adapter: Learning Adapters for More Controllable Text-to-Image Diffusion Models. arXiv 2023, arXiv:2302.08453. [Google Scholar]
  30. Nguyen, H.T.; Nguyen, H.Q.; Pham, H.H.; Lam, K.; Le, L.T.; Dao, M.; Vu, V. VinDr-Mammo: A large-scale benchmark dataset for computer-aided diagnosis in full-field digital mammography. medRxiv 2022. [Google Scholar] [CrossRef]
  31. Adnan, M.; Kalra, S.; Cresswell, J.C.; Taylor, G.W.; Tizhoosh, H.R. Federated learning and differential privacy for medical image analysis. Sci. Rep. 2022, 12, 1953. [Google Scholar] [CrossRef] [PubMed]
  32. Zhang, C.; Xie, Y.; Bai, H.; Yu, B.; Li, W.; Gao, Y. A survey on federated learning. Knowl.-Based Syst. 2021, 216, 106775. [Google Scholar] [CrossRef]
  33. Bendoukha, A.-A.; Demirag, D.; Kaaniche, N.; Boudguiga, A.; Sirdey, R.; Gambs, S. Towards privacy-preserving and fairness-aware federated learning framework. Proc. Priv. Enhancing Technol. 2025, 2025, 845–865. [Google Scholar] [CrossRef]
  34. Seitzer, M. pytorch-fid: FID Score for PyTorch, Version 0.3.0. Available online: https://github.com/mseitzer/pytorch-fid (accessed on 31 August 2025). [Google Scholar]
  35. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. arXiv 2019, arXiv:1905.11946. [Google Scholar]
  36. Woodland, M.; Castelo, A.; Al Taie, M.; Silva, J.A.M.; Eltaher, M.; Mohn, F.; Shieh, A.; Kundu, S.; Yung, J.P.; Patel, A.B.; et al. Feature extraction for generative medical imaging evaluation: New evidence against an evolving trend. In International Conference on Medical Image Computing and Computer-Assisted Intervention; Springer: Berlin/Heidelberg, Germany, 2024; pp. 87–97. [Google Scholar]
  37. Sheller, M.J.; Reina, G.A.; Edwards, B.; Martin, J.; Pati, S.; Kotrotsou, A.; Milchenko, M.; Xu, W.; Marcus, D.S.; Colen, R.R.; et al. Multi-institutional Deep Learning Modeling Without Sharing Patient Data: A Feasibility Study on Brain Tumor Segmentation. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries; Lecture Notes in Computer Science Series; Springer: Cham, Switzerland, 2019; Volume 11383, pp. 92–104. [Google Scholar] [CrossRef]
  38. Feki, I.; Ammar, S.; Kessentini, Y.; Muhammad, K. Federated learning for COVID-19 screening from Chest X-ray images. Appl. Soft Comput. 2021, 106, 107330. [Google Scholar] [CrossRef]
  39. Roth, H.R.; Chang, K.; Singh, P.; Neumark, N.; Li, W.; Gupta, V.; Gupta, S.; Qu, L.; Ihsani, A.; Bizzo, B.C.; et al. Federated Learning for Breast Density Classification: A Real-World Collaborative Setting. In Domain Adaptation and Representation Transfer and Medical Image Learning with Less Labels and Imperfect Data (DART–MIA 2020); Lecture Notes in Computer Science Series; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12444, pp. 181–191. [Google Scholar] [CrossRef]
  40. Karargyris, A.; Umeton, R.; Sheller, M.J.; Aristizabal, A.; George, J.; Wuest, A.; Pati, S.; Kassem, H.; Zenk, M.; Baid, U.; et al. Federated benchmarking of medical artificial intelligence with MedPerf. Nat. Mach. Intell. 2023, 5, 799–810. [Google Scholar] [CrossRef] [PubMed]
  41. Wang, W.; Xia, Q.; Yan, Z.; Hu, Z.; Chen, Y.; Zheng, W.; Wang, X.; Nie, S.; Metaxas, D.; Zhang, S. Diffusion models in medical imaging: A comprehensive survey. Med. Image Anal. 2023, 91, 102999. [Google Scholar] [CrossRef]
  42. Khader, F.; Müller-Franzes, G.; Arasteh, S.T.; Han, T.; Haarburger, C.; Schulze-Hagen, M.; Schad, P.; Engelhardt, S.; Baeßler, B.; Foersch, S.; et al. Denoising diffusion probabilistic models for 3D medical image generation. Sci. Rep. 2023, 13, 7303. [Google Scholar] [CrossRef] [PubMed]
  43. Li, G.; Togo, R.; Ogawa, T.; Haseyama, M. Compressed gastric image generation based on soft-label dataset distillation for medical data sharing. Comput. Methods Programs Biomed. 2022, 227, 107189. [Google Scholar] [CrossRef]
  44. Li, G.; Togo, R.; Ogawa, T.; Haseyama, M. Dataset distillation for medical dataset sharing. arXiv 2022, arXiv:2209.14603. [Google Scholar] [CrossRef]
  45. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
  46. Ibragimov, A.; Senotrusova, S.; Litvinov, A.; Ushakov, E.; Karpulevich, E.; Markin, Y. MamT4: Multi-view attention networks for mammography cancer classification. In Proceedings of the 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC), Osaka, Japan, 2–4 July 2024; pp. 1965–1970. [Google Scholar]
  47. Shen, L.; Margolies, L.R.; Rothstein, J.H.; Fluder, E.; McBride, R.; Sieh, W. Deep learning to improve breast cancer detection on screening mammography. Sci. Rep. 2019, 9, 12495. [Google Scholar] [CrossRef]
  48. Park, S.; Lee, K.H.; Ko, B.; Kim, N. Unsupervised anomaly detection with generative adversarial networks in mammography. Sci. Rep. 2023, 13, 2925. [Google Scholar] [CrossRef]
  49. Jiménez-Gaona, Y.; Carrión-Figueroa, D.; Lakshminarayanan, V.; Rodríguez-Álvarez, M.J. Gan-based data augmentation to improve breast ultrasound and mammography mass classification. Biomed. Signal Process. Control. 2024, 94, 106255. [Google Scholar] [CrossRef]
Figure 1. Overview of the proposed methodology for generating and evaluating synthetic mammography datasets. (A) Synthetic images are generated from private datasets using a Conditional Latent Diffusion Model (CLDM). (B) The synthetic dataset ensures privacy against a DL-based identification model. (C) The synthetic dataset’s utility is evaluated by training a cancer classification model whose performance is comparable to that of a model trained on real datasets.
Figure 2. An example of images from the same patient: (a) before applying the mask and rotation, (b) after applying the mask and rotation.
Figure 3. An example of the model’s identification results. The query image is the image used for the search. The images highlighted in green belong to the target patient, while those in red do not.
Figure 4. Example of the image generation. From left to right: (1) real mammogram labeled as cancer; (2) corresponding mask where red denotes the breast region and yellow indicates the lesion location; (3) image generated conditionally on the mask; (4) Grad-CAM visualization of a pretrained classifier showing attention focused on the lesion area and predicting cancer.
Figure 5. Visual comparison of real and conditionally generated mammograms. The top rows show the original images alongside the corresponding generated samples conditioned on the lesion masks, where the red area on the mask denotes the breast region and the yellow area indicates the lesion location. The bottom rows present Grad-CAM visualizations for both the original and generated images, highlighting the classifier’s attention regions.
Figure 6. Comparison of distances in the Train set. Distances between images from the original and original dataset (blue), and from the original and generated dataset (red).
Figure 7. Comparison of distances in the Test set. Distances between images from the original and original dataset (blue) and from the original and generated dataset (red).
Table 1. Performance of patient identification task.

Setting | mAP@4 | Precision@4 | Precision@1 | Acc@3
W/O labels | 0.636 | 0.666 | 0.847 | 0.951
With labels | 0.649 | 0.679 | 0.855 | 0.944
Table 2. FID results between original and reconstructed images from the test set (↓ indicates that lower values are better).

Metric | VQ-VAE | KL-VAE
FID ↓ | 9.167 | 1.263
Table 3. FID metrics for LDM (↓ indicates that lower values are better).

FID ↓ | Generated Images | Original Images
6.545 | 4000 | 4000
5.808 | 8000 |
Table 4. Classification performance of models that were solely trained on the original and generated datasets, evaluated on the test portion of the original dataset.

Training data | ROC-AUC | F1-Score
Original | 0.82 | 0.43
Generated | 0.77 | 0.36
Table 5. Re-identification quality metrics.

Split | mAP@R | R-precision | Precision@1 | Acc@4
Train | 0.001 | 0.0019 | 0.0025 | 0.0075
Test | 0.0051 | 0.0082 | 0.0132 | 0.0332