Article

Color-Guided Mixture-of-Experts Conditional GAN for Realistic Biomedical Image Synthesis in Data-Scarce Diagnostics

Faculty of Materials Science and Ceramics, AGH University of Krakow, al. Mickiewicza 30, 30-059 Kraków, Poland
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(14), 2773; https://doi.org/10.3390/electronics14142773
Submission received: 29 May 2025 / Revised: 2 July 2025 / Accepted: 8 July 2025 / Published: 10 July 2025

Abstract

Background: Limited availability of high-quality labeled biomedical image datasets presents a significant challenge for training deep learning models in medical diagnostics. This study proposes a novel image generation framework combining conditional generative adversarial networks (cGANs) with a Mixture-of-Experts (MoE) architecture and color histogram-aware loss functions to enhance synthetic blood cell image quality. Methods: RGB microscopic images from the BloodMNIST dataset (eight blood cell types, resolution 3 × 128 × 128) underwent preprocessing with k-means clustering to extract the dominant colors and UMAP for visualizing class similarity. Spearman correlation-based distance matrices were used to evaluate the discriminative power of each RGB channel. A MoE–cGAN architecture was developed with residual blocks and LeakyReLU activations. Expert generators were conditioned on cell type, and the generator’s loss was augmented with a Wasserstein distance-based term comparing red and green channel histograms, which were found most relevant for class separation. Results: The red and green channels contributed most to class discrimination; the blue channel had minimal impact. The proposed model achieved 0.97 classification accuracy on generated images (ResNet50), with 0.96 precision, 0.97 recall, and a 0.96 F1-score. The best Fréchet Inception Distance (FID) was 52.1. Misclassifications occurred mainly among visually similar cell types. Conclusions: Integrating histogram alignment into the MoE–cGAN training significantly improves the realism and class-specific variability of synthetic images, supporting robust model development under data scarcity in hematological imaging.

1. Introduction

Biomedical challenges have become a critical focus for artificial intelligence (AI) research due to the unprecedented availability of clinical data generated by hospitals, diagnostic laboratories, and research institutions [1,2,3]. This data—spanning electronic health records (EHRs), laboratory test results, high-resolution medical images, and physiological signals—holds great promise for advancing diagnostics, predicting treatment outcomes, and enabling personalized therapies. Bioinformatics plays a pivotal role in this ecosystem by transforming raw clinical data into structured forms suitable for computational analysis, facilitating the training of machine learning (ML) models that can aid in clinical decision-making. However, the reliability of these models hinges on the accuracy and consistency of the underlying data, which is meticulously maintained by medical professionals. Even slight errors in device calibration, data collection protocols, or annotation quality can propagate through algorithms, potentially leading to serious clinical risks.
Despite the transformative potential of AI in healthcare, several persistent challenges limit its full integration into medical practice. One of the most significant barriers is the scarcity of large, high-quality, labeled datasets. This problem is especially acute in specialized diagnostic areas, such as hematology, histopathology, and rare disease detection, where data acquisition requires expert clinicians and sophisticated, often expensive, equipment. Class imbalance further complicates model development; pathological cases or specific cell types may represent only a small fraction of the dataset, while healthy or common cases dominate. This imbalance can bias models toward overpredicting majority classes, thereby undermining diagnostic accuracy for critical minority classes.
The heterogeneity of medical data introduces additional complexity. Variations in imaging modalities (e.g., MRI, CT, and microscopy); staining protocols; equipment manufacturers; image resolution; and even lighting conditions contribute to substantial variability in data representation. Such variability challenges the ability of models to generalize across institutions and patient populations. Moreover, regulatory constraints on data sharing, high diagnostic costs (particularly in systems where procedures are not fully reimbursed), unequal access to advanced medical equipment, and variable levels of staff training further limit the quantity and standardization of data available for AI model development.
To address these limitations, researchers have explored a range of strategies aimed at improving data availability, diversity, and model robustness. Data augmentation remains one of the most widely used techniques, introducing variations into existing images through operations such as rotation, translation, scaling, flipping, adding noise, adjusting contrast and color, cropping, and zooming [4,5,6,7]. These techniques can help models become more resilient to minor variations in image presentation but do not address deeper issues of class imbalance or data scarcity at the distributional level.
More recent efforts have focused on generative models for synthetic data creation. These models include generative adversarial networks (GANs), which learn to produce realistic images by pitting a generator against a discriminator, and their extension, conditional GANs (cGANs), which allow image generation tied to specific class labels [8,9]. Other approaches, such as Variational Autoencoders (VAEs), generate samples by modeling the probability distribution of existing data, while Diffusion Models iteratively denoise samples to synthesize highly realistic images [10,11]. These generative methods are particularly valuable in low-data regimes, offering the possibility of enriching datasets without further straining limited clinical resources. Transfer learning, where models pretrained on large, general purpose datasets like ImageNet are adapted to medical tasks, has also proven effective in overcoming data limitations by leveraging prior knowledge. Semi-supervised learning, employing strategies like self-training and co-training, enables models to utilize vast amounts of unlabeled data alongside smaller labeled subsets, progressively expanding training sets with pseudo-labeled data [12,13].
Privacy-preserving techniques, such as federated learning, facilitate model training across distributed datasets without exposing sensitive patient data, addressing legal and ethical barriers to data sharing.
Nevertheless, synthetic data generation for biomedical imaging still faces substantial challenges. Many generative frameworks underexploit color information, despite its diagnostic significance, especially in fields like histopathology and hematology, where subtle color variations differentiate morphologically similar cells or tissue structures. Color inconsistencies introduced during data synthesis can degrade the quality of synthetic images and limit their utility for training reliable diagnostic models.
In this study, we propose a novel synthetic image generation strategy that directly addresses these challenges: a Color-Guided Mixture-of-Experts Conditional GAN (MoE–cGAN) architecture tailored for biomedical image synthesis in data-scarce settings. Our method explicitly integrates targeted RGB color channel analysis into the generative process, leveraging the discriminative power of color to improve class-specific image fidelity. Unlike standard augmentation or unconditional synthesis techniques, our approach begins with a detailed statistical and structural analysis of the RGB distributions in stained biomedical images, particularly peripheral blood cell images. This analysis informs the generator’s design by guiding the incorporation of color histogram similarity—focused on diagnostically meaningful channels—into the loss function via Wasserstein distance. The Mixture-of-Experts structure introduces modular specialization, allowing different sub-generators to focus on particular classes, thereby enhancing diversity, addressing class imbalance, and improving minority class representation. This color-aware, expert-driven generative strategy offers a comprehensive solution for producing high-fidelity synthetic data, supporting the development of balanced, generalizable AI models for biomedical diagnostics.

2. Materials and Methods

2.1. Dataset and Data Preprocessing

The dataset utilized in this study is the BloodMNIST dataset, a subset of the MedMNIST benchmark collection [14]. The collection comprises 17,092 microscopic images of eight types of blood cells, including normal cells and cells acquired during hematologic or oncologic diseases, all of which appear purple due to the type of cell staining applied. The eight cell classes, i.e., basophils, eosinophils, erythroblasts, immature granulocytes, lymphocytes, monocytes, neutrophils, and platelets (thrombocytes), are all represented in the dataset. The labels for each cell image were assigned by expert pathologists. The detailed characteristics of the data are shown in Table S1. The database was divided into training (70%, 11,162 samples), validation (10%, 1785 samples), and test (20%, 3572 samples) sets, as proposed by the authors, who also provide tools for loading these three subsets. This is the standard division used in the MedMNIST v2 benchmarks. The library software allows images to be loaded at three different resolutions, i.e., 64 × 64, 128 × 128, and 224 × 224 pixels.
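For illustration, the splits can be loaded with a few lines of Python; the minimal sketch below assumes a medmnist package version whose loader exposes a size argument for the higher-resolution variants, so argument names may differ between library releases.

# Minimal sketch: loading the three official BloodMNIST splits at 128 x 128 resolution.
# Assumes a medmnist release whose BloodMNIST loader accepts a `size` argument.
from medmnist import BloodMNIST

splits = {split: BloodMNIST(split=split, size=128, download=True)
          for split in ("train", "val", "test")}
print({name: len(ds) for name, ds in splits.items()})  # number of samples per split
image, label = splits["train"][0]                       # PIL RGB image and class label (0-7)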
The BloodMNIST dataset plays an important role in the diagnosis of blood and parasitic diseases and in the creation of decision support systems in hematopathology. It enables the automation of microscopic analysis, relieving specialists and increasing diagnostic accuracy. It is used, among other applications, to assess the morphology of blood cells in the diagnosis of anemia, leukemia (e.g., AML and ALL), thalassemia, malaria, and sepsis. It can support the monitoring of treatment effectiveness (chemotherapy and immunotherapy) and be useful in population screening programs. BloodMNIST is also used in training AI models and in medical education, facilitating the recognition of blood pathologies.

2.2. Methods

To analyze color distributions and inter-class similarities, we employed k-means clustering, UMAP dimensionality reduction, and the Spearman rank correlation coefficient. K-means was used to identify dominant color patterns by clustering pixel intensities, providing insight into shared chromatic features across classes [15]. UMAP was applied to RGB histogram data to visualize sample groupings and assess whether color distributions align with cell class labels, aiding in the detection of class-separable or overlapping chromatic structures [16,17]. Spearman correlation quantified the inter-class histogram similarity, capturing monotonic relationships between color profiles while being robust to outliers and intensity shifts [18,19,20]. The flowchart of each algorithm is presented in Figure 1. Together, these methods support a deeper understanding of color-based variation in the dataset and its impact on class separability and generation fidelity.
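These color-analysis steps can be summarized in a short sketch; the code below is illustrative, assuming images are available as H × W × 3 uint8 RGB arrays grouped by class, and the function names are our own rather than part of any published implementation.

# Illustrative sketch of the color analysis: k-means color signatures, per-class channel
# histograms, and the (1 - Spearman correlation) distance used for inter-class comparison.
import numpy as np
from sklearn.cluster import KMeans
from scipy.stats import spearmanr

def dominant_colors(image, k=6):
    # Cluster all pixels of one image into k centroids (its "color signature").
    pixels = image.reshape(-1, 3).astype(np.float32)
    return KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels).cluster_centers_

def class_histogram(images, channel, bins=256):
    # Average normalized intensity histogram of one RGB channel over all images of a class.
    values = np.concatenate([img[..., channel].ravel() for img in images])
    hist, _ = np.histogram(values, bins=bins, range=(0, 255), density=True)
    return hist

def spearman_distance(hist_a, hist_b):
    # Distance between two class histograms: 1 - Spearman rank correlation.
    rho, _ = spearmanr(hist_a, hist_b)
    return 1.0 - rho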
The study used conditional generative adversarial networks (cGAN), consisting of a generator and a discriminator competing with each other in the training process. The generator creates samples imitating real data, and the discriminator assesses their authenticity. Additional conditioning, e.g., class labels, is introduced in cGAN, which allows control over the generated images [21,22,23].
cGAN training can be unstable, which is why regularization techniques such as layer normalization or loss function modifications are used. The effectiveness of the model is assessed based on the realism of the generated samples and the balance of losses of both networks—the optimal classification accuracy of the discriminator is about 50%. cGANs are used, among other fields, in medicine, where they allow the generation of synthetic data when the availability of real examples is limited [24,25,26].
LeakyReLU introduces a small, non-zero slope (typically α ≈ 0.01) for negative inputs, addressing the “dying neurons” issue inherent in standard ReLU. In GAN generators, it enhances the training stability by improving the gradient flow and maintaining activation diversity, which supports more effective learning and the generation of realistic, varied samples [27,28].
The Mixture-of-Experts (MoE) approach can be viewed as a dynamic instantiation of the Divide-and-Conquer strategy [29], in which a complex task is decomposed into subcomponents handled by specialized expert networks. Both MoE and Divide-and-Conquer frameworks aim to process distinct aspects of the input by delegating responsibilities to modular components. Unlike traditional Divide-and-Conquer, which uses static assignment, MoE incorporates a gating mechanism that dynamically routes inputs to the most relevant experts based on input characteristics. A key similarity lies in their integration of partial outputs into a final, unified result, while a central distinction is MoE’s adaptive routing and weighted aggregation, offering greater flexibility and efficiency. RGB-T Salient Object Detection (SOD), for example, applies Divide-and-Conquer principles by processing RGB and thermal data streams through modality-specific networks before fusing the outputs to enhance robustness and accuracy [30,31].
In conditional image generation, MoE combined with conditional GANs (cGANs) forms an advanced architecture capable of modeling complex and multimodal data distributions. Multiple expert generators specialize in different class-conditional distributions, and a gating network assigns input-dependent weights to combine their outputs. The shared discriminator evaluates both image realism and class fidelity, enforcing stronger supervision. This structure mitigates mode collapse by encouraging expert diversity and enables scalable generation, as the number of experts can grow without linearly increasing the inference cost. The effective optimization of MoE–cGAN architectures involves regularization and weight balancing techniques. Empirical studies demonstrate improved image detail, consistency, and adaptability over standard GAN variants, enabling more accurate and diverse generation across classes and conditions [32,33].
The Wasserstein distance, also known as Earth Mover’s Distance (EMD), measures the difference between two probability distributions as the minimum “transport cost” of mass from one distribution to the other [34]. In generative modeling, it provides more stable gradients than traditional divergence measures and is sensitive to subtle distributional shifts, which contributes to more robust training dynamics [35,36,37,38]. In this work, a 1D Wasserstein distance is employed to compare the cumulative histograms of pixel intensities in the red and green (R and G) channels between generated and real images, reflecting perceptually important color structures while excluding the blue channel due to its lower diagnostic relevance. Each image is flattened into a vector, and normalized histograms are computed over a shared range [0,1] using 256 bins. Cumulative histograms are then calculated and normalized, and the Wasserstein distance is computed as the mean absolute difference between corresponding bins. The final loss is the sum of distances for the R and G channels:
$\text{Wasserstein Distance} = \frac{1}{K}\sum_{i=1}^{K}\left| C_{\mathrm{real}}[i] - C_{\mathrm{gen}}[i] \right|, \qquad \mathrm{loss} = \mathrm{loss}_{R} + \mathrm{loss}_{G}$
where:
  • $C_{\mathrm{real}}[i]$—the value of the cumulative histogram for the real image at bin i.
  • $C_{\mathrm{gen}}[i]$—the corresponding value for the generated image.
  • K—the number of histogram bins.
This formulation guides the generator toward matching the color intensity distributions of real images, improving visual fidelity and reducing artifacts such as unnatural hues.
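A compact sketch of this loss is given below; it mirrors the described computation with a hard histogram (torch.histc), which does not propagate gradients through bin assignment, so a differentiable (soft-binned) histogram approximation may be substituted in actual training.

# Sketch of the cumulative-histogram Wasserstein loss for the R and G channels.
# Assumes `real` and `fake` are batches of RGB images scaled to [0, 1].
import torch

def cumulative_hist(channel_values, bins=256):
    # Normalized histogram over [0, 1], followed by a normalized cumulative sum.
    hist = torch.histc(channel_values.flatten(), bins=bins, min=0.0, max=1.0)
    hist = hist / (hist.sum() + 1e-8)
    cum = torch.cumsum(hist, dim=0)
    return cum / (cum[-1] + 1e-8)

def histogram_wasserstein_loss(real, fake, bins=256):
    # Mean absolute difference between cumulative histograms, summed over R (0) and G (1);
    # computed here over the whole batch of each channel for brevity.
    loss = 0.0
    for c in (0, 1):
        c_real = cumulative_hist(real[:, c], bins)
        c_gen = cumulative_hist(fake[:, c], bins)
        loss = loss + torch.mean(torch.abs(c_real - c_gen))
    return loss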
The architecture of the model used in this work is presented in Figure 2.

2.3. Evaluation Metrics

In multiclass classification with 8 classes, metrics such as accuracy, precision, recall, and F1-score are defined by extending their binary versions to account for multiple classes.
Let us assume the notation:
  • $TP_i$ (True Positive): the number of correct predictions for class i.
  • $FP_i$ (False Positive): samples from other classes misclassified as class i.
  • $FN_i$ (False Negative): samples from class i mislabeled as other classes.
Accuracy measures the overall correctness of the classification.
$\mathrm{Accuracy} = \dfrac{\text{Number of correct predictions}}{\text{Number of all samples}} = \dfrac{\sum_{i=1}^{8} TP_i}{N} = \dfrac{\sum_{i=1}^{8} TP_i}{\sum_{i=1}^{8}\left(TP_i + FN_i\right)}$
The total number of samples is equal to the sum of all predictions:
$N = \sum_{i=1}^{8}\left(TP_i + FN_i\right)$
As each sample is either:
  • correctly classified as its true class ($TP_i$ for the given class);
  • or incorrectly classified as a different class ($FN_i$).
Precision is the ratio of correctly classified samples of a given class to all samples classified as that class. Precision is calculated for each class as follows:
$P_i = \dfrac{TP_i}{TP_i + FP_i}$
Recall is the ratio of correctly classified samples of a given class to all samples that actually belong to that class. Recall is calculated for each class accordingly:
$R_i = \dfrac{TP_i}{TP_i + FN_i}$
F1-score is the harmonic mean of precision and sensitivity; it is calculated for each class in the following way:
$F1_i = \dfrac{2\, P_i R_i}{P_i + R_i}$
To obtain a single value for the entire model, metric aggregation is used.
Macro Average calculates the arithmetic mean of the metrics for all classes, treating each class equally.
$\mathrm{Macro}_{P} = \dfrac{1}{8}\sum_{i=1}^{8} P_i, \qquad \mathrm{Macro}_{R} = \dfrac{1}{8}\sum_{i=1}^{8} R_i$
Weighted Average assigns weights proportional to class size.
$\mathrm{Weighted}_{P} = \sum_{i=1}^{8} w_i P_i,$
where:
$w_i = \dfrac{\text{Number of samples from class } i}{N}$
The confusion matrix allows for a more detailed analysis of the classification errors. It is a table where the rows represent the actual classes and the columns represent the predicted classes. In the case of a multiclass problem, accuracy can also be calculated from the confusion matrix.
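The same metrics and the confusion matrix can be obtained directly with scikit-learn, as in the brief hypothetical example below, where the label arrays are random placeholders standing in for ground truth and model predictions.

# Illustrative computation of the aggregated metrics and the confusion matrix.
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_recall_fscore_support)

rng = np.random.default_rng(0)
y_true = rng.integers(0, 8, size=1000)   # placeholder ground-truth labels (8 classes)
y_pred = rng.integers(0, 8, size=1000)   # placeholder predicted labels

accuracy = accuracy_score(y_true, y_pred)
macro_p, macro_r, macro_f1, _ = precision_recall_fscore_support(y_true, y_pred, average="macro")
weighted_p, weighted_r, weighted_f1, _ = precision_recall_fscore_support(y_true, y_pred, average="weighted")
cm = confusion_matrix(y_true, y_pred)    # rows: actual classes, columns: predicted classes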
One of the most important metrics for assessing the quality of data generation is the Fréchet Inception Distance (FID)—the lower the FID value, the smaller the difference between the feature distributions of real and generated images. FID measures the similarity between the distributions using the Inception V3 model. Real and generated images are processed by the Inception V3 model to obtain feature vectors (usually from the pool3 layer, with 2048 dimensions). The mean and covariance of these features are calculated for both sets (real and generated). FID is the Fréchet (Wasserstein-2) distance between these two Gaussian distributions. The following notations are used:
  • k—feature space dimension;
  • $m_1 \in \mathbb{R}^{k}$—the mean feature vector extracted from the real distribution;
  • $m_2 \in \mathbb{R}^{k}$—the mean feature vector extracted from the generated distribution;
  • $C_1 \in \mathbb{R}^{k \times k}$—the covariance matrix of features for real data;
  • $C_2 \in \mathbb{R}^{k \times k}$—the covariance matrix of features for generated data.
The Fréchet Inception Distance (FID) is then defined as
$FID^{2} = \left\lVert m_1 - m_2 \right\rVert_2^{2} + \mathrm{Tr}\left(C_1 + C_2 - 2\left(C_1 C_2\right)^{1/2}\right)$
where:
  • $\left\lVert m_1 - m_2 \right\rVert_2^{2}$ denotes the squared Euclidean norm of the difference between the means;
  • $\mathrm{Tr}(\cdot)$ represents the trace of a matrix;
  • $\left(C_1 C_2\right)^{1/2}$ denotes the matrix square root of the product $C_1 C_2$.
FID does not depend directly on the number of classes but on the overall similarity of feature distributions. Therefore, a single FID can be calculated by comparing the entire generated set with the entire real set, or FID can be computed separately for each class.
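Given feature matrices already extracted with Inception V3, the formula above can be evaluated directly; the sketch below assumes feat_real and feat_gen are N × 2048 arrays of pool3 features and uses SciPy for the matrix square root.

# Sketch of the FID computation from pre-extracted Inception-V3 features.
import numpy as np
from scipy import linalg

def frechet_inception_distance(feat_real, feat_gen):
    m1, m2 = feat_real.mean(axis=0), feat_gen.mean(axis=0)
    c1 = np.cov(feat_real, rowvar=False)
    c2 = np.cov(feat_gen, rowvar=False)
    covmean = linalg.sqrtm(c1 @ c2)
    if np.iscomplexobj(covmean):   # numerical noise can introduce tiny imaginary parts
        covmean = covmean.real
    diff = m1 - m2
    return diff @ diff + np.trace(c1 + c2 - 2.0 * covmean)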
Monitoring the losses of both models is also important, because a uniform decrease in losses suggests that the generator is learning to effectively fool the discriminator and the discriminator is correctly recognizing the authenticity of the samples. Maintaining a balance between the generator and the discriminator is crucial, because one of the models being too strong can lead to training instability and deterioration of the quality of the results. The evaluation of GAN performance should be supplemented by a visual analysis of the generated samples, which allows detecting possible artifacts that are invisible using only numerical metrics.

2.4. Training, Testing, and Data Generation

In this project, images from the BloodMNIST database with dimensions of 128 × 128 are processed, and the color is saved in the RGB space. The RGB components are normalized to the interval [−1, 1].
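A minimal preprocessing sketch is shown below; it assumes the images are loaded as PIL RGB images (e.g., via the medmnist loader) and uses torchvision transforms to map pixel values from [0, 1] to [−1, 1].

# Channel-wise normalization of 3 x 128 x 128 RGB images to the interval [-1, 1].
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.ToTensor(),                              # PIL image -> float tensor in [0, 1]
    transforms.Normalize(mean=[0.5, 0.5, 0.5],
                         std=[0.5, 0.5, 0.5]),          # (x - 0.5) / 0.5 -> [-1, 1]
])
# The transform can be passed to the dataset loader, e.g., BloodMNIST(..., transform=preprocess).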

2.5. Hardware and Software

Data generation was performed with the use of deep neural networks implemented in Python 3, for which the input data were images from the MedMNIST dataset. For this purpose, the PyTorch 2.5.1 library was applied. Data simulation, collection, processing, and preliminary interpretation, as well as the preparation of charts and graphs, were implemented in Python. Calculations were performed using a Lenovo Legion laptop and a PC with the specifications presented in Table S2.

3. Results

3.1. Characteristics of the BloodMNIST Database with Regard to Color Analysis in the R, G, and B Channels

RGB color analysis plays a key role in BloodMNIST modeling due to the close relationship between the color and shape of blood cells such as erythrocytes, leukocytes, and platelets. Color differences, such as purple nuclei and pink cytoplasm, are important for classification and segmentation. RGB histograms allow for assessing the color distribution between classes, supporting feature extraction and eliminating false classifications resulting from similar shapes. Analysis of individual channels can help reduce dimensionality and improve classification accuracy. In transfer models (e.g., ResNet and EfficientNet), RGB is the basic input, and preserving color information increases the efficiency of using models pre-trained on datasets such as ImageNet. Colors also support artifact identification and noise reduction, improving the quality of the input data.
Figure 3 shows eight blood cell types along with RGB channel histograms, allowing the analysis of morphological features and differences in color distribution. Purple and pink tones dominate the entire dataset, resulting from eosin and basic staining. The background of the images has a distinct pinkish tone. The histograms show characteristic patterns: in the red channel, the highest intensities occur between 230 and 255; in the green channel, the lowest in the range of 210–230; and in the blue channel, values from 190 to 210 dominate. There is no blue intensity below 125, confirming the dominance of warmer tones. Purples have balanced red and blue values and low green intensity, giving them a cool, deep character. Dark purples have lower R and B channel values, and light purples have higher values. Pinks are dominated by red; light pinks have very high red and moderate other channels, while darker purples have slightly reduced intensity in all channels, maintaining a warm tone. Such RGB dependencies reflect the typical colors present in cell images and are important for distinguishing them and using them in machine learning models.
In order to identify the dominant colors in cell images, the k-means method (k = 6) was used, which allows for the reduction of the full color spectrum to a few representative shades—centroids. Each centroid reflects the average color value in one of the key color groups of a given class, creating a synthetic but very characteristic “color signature” of the cell. Unlike histograms, which show the full intensity distribution of the R, G, and B channels, the k-means method allows for a direct comparison of classes in color space, revealing their similarities and differences.
Basophils are distinguished by a strongly contrasting combination of a cream background and dark purple cytoplasmic granules, often obscuring the nucleus. Their centroids are clearly separated from other classes, which greatly facilitates their identification. Eosinophils, on the other hand, are characterized by the presence of large red granules on a light pink background of the cytoplasm. This translates into a strong dominance of the red channel and a clearly red color profile, very characteristic and well distinguishable [39,40,41].
Erythroblasts combine a pale pink cytoplasmic background with a darker, often purple, nucleus, resulting in the presence of both light and darker centroids corresponding to the cytoplasm and nucleus, respectively. Immature granulocytes, as transitional cells, show a more diffuse color distribution. Their irregular nuclei and less concentrated granules cause the centroids to fall within the range of cool but less distinct purples, which can make it difficult to clearly classify them based on color alone.
Lymphocytes and monocytes show great color similarity. In lymphocytes, most of the cell is occupied by a dark purple nucleus, surrounded by a thin layer of light pink cytoplasm. Monocytes are larger, their nucleus is kidney-shaped, and their cytoplasm is more grayish. Despite these differences, the dominant colors of both types are similar—centroids corresponding to purple and pink colors often appear, which can lead to errors in classification if only color is taken into account.
Neutrophils, the most common leukocytes, contain a segmented nucleus and small granules, which translate into a grayish, slightly purple cytoplasm. Their color distribution is even, and the dominant colors oscillate around dark purples and subdued shades, with small intensity shifts in the red and blue channels. Such a profile, although less pronounced than in eosinophils or basophils, still allows them to be recognized in the context of the entire database.
Platelets, as the smallest structures in the analyzed set, do not contain a nucleus and appear in the image as small pink–purple dots. Due to their size and simple structure, most of the color comes from the background of the preparation. The histograms show low color intensity, and the centroids are concentrated around very light, pale pink shades. Thanks to this, despite the lack of a complex structure, platelets are well recognizable based on their characteristic background and limited color range.
The use of the k-means method significantly simplifies the analysis and highlights the differences between cell classes while indicating which types are more difficult to distinguish due to color similarities. In combination with morphological analysis, this provides a solid basis for automated classification and statistical analysis in blood cell image recognition systems.
The efficiency of automatic recognition is also influenced by the morphological features of cells, such as size, nucleus shape, and nucleus-to-cytoplasm ratio. Lymphocytes, with a dominant nucleus, have a distinct color profile, while platelets—very small and anucleate—are more difficult to capture in color. Immature forms, such as granulocytes, are characterized by a less clear, “blurred” color distribution. Figure 3 illustrates that morphological differences correspond well with RGB histograms. The use of k-means allows the color analysis to be simplified to a few representative shades, which aids classification, especially for classes with distinct color and structural features.
Before applying the UMAP algorithm, the images were flattened to one-dimensional vectors, and then, their dimensionality was reduced to 200 principal components using PCA. UMAP performed separately for the R, G, and B channels (Figure 4) revealed that some classes, such as erythroblasts, lymphocytes, and platelets, form clearly separated clusters. The R channel allows for good discrimination of most classes, except for basophils and monocytes, which partially overlap. The G channel separates the classes better than the B channel, which shows weaker discrimination and more outliers. Therefore, the green channel may be more important in the design of classification algorithms. Overall, the UMAP analysis shows that classes with clear color features are well recognized, while more complex cases require additional information, e.g., cell morphology or the integration of data from several channels. The reliability of the embedding, at a level of about 0.84, confirms the accuracy of this representation.
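This embedding pipeline is summarized in the sketch below, which assumes the images are stacked in an (N, 128, 128, 3) uint8 array with corresponding labels and that the scikit-learn and umap-learn packages are available.

# Per-channel PCA (200 components) followed by UMAP, as used for Figure 4.
import numpy as np
from sklearn.decomposition import PCA
import umap

def channel_embedding(images, channel, n_pca=200, random_state=0):
    # Flatten one RGB channel per image, reduce to 200 principal components, embed with UMAP.
    flat = images[..., channel].reshape(len(images), -1).astype(np.float32)
    reduced = PCA(n_components=n_pca, random_state=random_state).fit_transform(flat)
    return umap.UMAP(random_state=random_state).fit_transform(reduced)

# embedding_r = channel_embedding(images, channel=0)  # red channel; repeat for G (1) and B (2)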
The study of color differences between cell classes was based on the calculation of the distance (1 − Spearman correlation) between the histograms in the R, G, and B channels (Figure 5). Lower values indicate greater similarity and higher values greater differences. In all the channels, the distances are relatively low, which suggests a partial dependence between the color signals. The smallest differences are observed in channel B, where only platelets stand out significantly due to their purple hue and small size. The largest differences between classes occur in channel G, which is consistent with previous UMAP observations and emphasizes its usefulness for classification.
Channel R may also contain important information, although it does not clearly distinguish one class. It is worth paying attention to some pairs of classes (e.g., lymphocytes–basophils and granulocytes–eosinophils), which, depending on the channel, show a variable degree of similarity. The green channel in particular reveals subtle but statistically significant differences, which may be important for the further improvement of classification algorithms.

3.2. Image Generation

The generator and discriminator modules were designed to make cGAN training efficient and stable, since the balance between the generator and the discriminator is crucial for obtaining high-quality results.

3.2.1. Generator

The optimized generator employs a Mixture-of-Experts (MoE) architecture, where multiple specialized sub-generators (experts) operate in parallel, and their outputs are adaptively combined via a gating network. The generator takes a latent noise vector (dim = 100) and a class label (0–7) as input. The label is embedded and concatenated with the noise vector to form a unified input, which is then processed by the gating network—comprising two linear layers with ReLU activation followed by softmax—to produce expert-specific weights. Each expert receives the same input and consists of a linear projection to an initial feature map (e.g., 128 × init_size × init_size); unflattening; and a series of ConvTranspose2d layers with InstanceNorm2d, ReLU/LeakyReLU, and two residual blocks. The output is passed through a final Conv2d and tanh activation to generate an RGB image. The final output is a weighted sum of the experts’ outputs, enabling dynamic specialization and improved generation quality across diverse classes.
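A simplified sketch of this generator is given below; the channel widths, number of upsampling stages, and omission of the residual blocks are illustrative abbreviations of the architecture described above rather than the exact published configuration.

# Simplified Mixture-of-Experts generator: label embedding, gating network, expert sub-generators.
import torch
import torch.nn as nn

class ExpertGenerator(nn.Module):
    def __init__(self, in_dim, init_size=16, channels=128):
        super().__init__()
        self.project = nn.Linear(in_dim, channels * init_size * init_size)
        self.net = nn.Sequential(
            nn.Unflatten(1, (channels, init_size, init_size)),
            nn.ConvTranspose2d(channels, 64, 4, stride=2, padding=1),    # 16 -> 32
            nn.InstanceNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1),          # 32 -> 64
            nn.InstanceNorm2d(32), nn.LeakyReLU(0.2, inplace=True),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),          # 64 -> 128
            nn.InstanceNorm2d(16), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),                   # RGB output in [-1, 1]
        )

    def forward(self, x):
        return self.net(self.project(x))

class MoEGenerator(nn.Module):
    def __init__(self, latent_dim=100, n_classes=8, n_experts=4, embed_dim=32):
        super().__init__()
        in_dim = latent_dim + embed_dim
        self.embed = nn.Embedding(n_classes, embed_dim)
        self.gate = nn.Sequential(nn.Linear(in_dim, 64), nn.ReLU(),
                                  nn.Linear(64, n_experts), nn.Softmax(dim=1))
        self.experts = nn.ModuleList(ExpertGenerator(in_dim) for _ in range(n_experts))

    def forward(self, noise, labels):
        x = torch.cat([noise, self.embed(labels)], dim=1)             # unified conditional input
        weights = self.gate(x)                                        # (batch, n_experts)
        outputs = torch.stack([expert(x) for expert in self.experts], dim=1)
        return (weights[:, :, None, None, None] * outputs).sum(dim=1)  # weighted sum of experts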

3.2.2. Discriminator

The discriminator performs both adversarial discrimination and auxiliary classification to enforce realism and label consistency. It consists of three Conv2d layers with increasing channel depth (64 → 128 → 256), LeakyReLU activations, and dropout (0.4/0.3), progressively downsampling the input while extracting hierarchical features. The final feature map is flattened and passed to two parallel linear heads: an adversarial output for real/fake prediction and a classification head for class label assignment. This dual-objective design improves the training stability and enhances the generator’s ability to produce class-consistent, high-fidelity images. The discriminator complements the Mixture-of-Experts generator by reinforcing both visual and semantic accuracy.
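An analogous sketch of the dual-head discriminator is shown below, again as an approximation of the described design for 3 × 128 × 128 inputs; dropout placement and channel widths follow the text, while kernel sizes are assumptions.

# Discriminator with shared convolutional features and two heads: adversarial and class logits.
import torch.nn as nn

class Discriminator(nn.Module):
    def __init__(self, n_classes=8):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),    # 128 -> 64
            nn.Dropout2d(0.4),
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),  # 64 -> 32
            nn.Dropout2d(0.3),
            nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True), # 32 -> 16
            nn.Flatten(),
        )
        feat_dim = 256 * 16 * 16
        self.adv_head = nn.Linear(feat_dim, 1)           # real/fake logit
        self.cls_head = nn.Linear(feat_dim, n_classes)   # auxiliary class logits

    def forward(self, img):
        feats = self.features(img)
        return self.adv_head(feats), self.cls_head(feats)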

3.2.3. Training of Image Generation Models

The GAN model is trained alternately: the discriminator learns to distinguish between real and generated images and to assign them correct labels, while the generator learns to create realistic images consistent with the given classes. Adam optimizers with different parameters were used: for the generator, lr = 0.0002, and for the discriminator, lr = 0.0001, with betas = (0.5, 0.999), and the batch size was set to 64. Two loss functions are used in training: BCEWithLogitsLoss (to distinguish real and fake images) and CrossEntropyLoss (to classify labels).
The discriminator training consists of calculating the total loss from real and generated images and their labels and then updating its weights. Additionally, the accuracy of its prediction is calculated. The generator, on the other hand, produces images based on a random vector and labels, which are assessed by the discriminator for the authenticity and correctness of classes. The total loss of the generator consists of the adversarial, classification, and histogram components, weighted by appropriate coefficients. Based on this, the generator updates its parameters. The accuracy of assigning labels by the discriminator to the generated images serves as an additional measure of the generator’s training effectiveness.
The loss for the generator is calculated as a weighted sum of three components:
  • Adversarial loss: the generator aims to “fool” the discriminator, so the generated images are treated as real.
  • Classification loss: the generator is penalized when the discriminator fails to assign the correct label to the generated image (comparison with fake_labels).
  • Histogram loss: the difference between the color distributions of the generated and real images, computed with the Wasserstein histogram loss function. Only the red and green channels are considered, as they were previously shown to best differentiate the classes.
The components are then combined; the total loss of the generator is the weighted sum of the three terms:
$g_{\mathrm{loss}} = 1.0 \cdot g_{\mathrm{loss,adv}} + 2.0 \cdot g_{\mathrm{loss,class}} + 10.0 \cdot g_{\mathrm{loss,hist}}$
The weights (1.0, 2.0, and 10.0) indicate the relative importance of the loss components (adversarial, classification, and histogram) and were selected in the optimization process. The generation strategy was tested for 100 epochs (200 iterations), generating a total of 1000 images from eight classes. The values of loss and accuracy are shown in Figure S1, and the sample images—real and generated—are compared in Figure 6.
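A sketch of one generator update implementing this weighted loss is shown below; the generator, discriminator, optimizers, and the histogram_wasserstein_loss function from Section 2.2 are assumed to be defined as described in the text, and variable names are illustrative.

# One generator optimization step with the weighted adversarial, classification, and histogram losses.
import torch
import torch.nn as nn

adv_criterion = nn.BCEWithLogitsLoss()
cls_criterion = nn.CrossEntropyLoss()
# opt_g = torch.optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
# opt_d = torch.optim.Adam(discriminator.parameters(), lr=0.0001, betas=(0.5, 0.999))

def generator_step(generator, discriminator, opt_g, real_imgs, labels, latent_dim=100):
    opt_g.zero_grad()
    z = torch.randn(real_imgs.size(0), latent_dim, device=real_imgs.device)
    fake_imgs = generator(z, labels)
    adv_logits, cls_logits = discriminator(fake_imgs)
    g_loss_adv = adv_criterion(adv_logits, torch.ones_like(adv_logits))   # treat fakes as real
    g_loss_class = cls_criterion(cls_logits, labels)
    # Rescale images from [-1, 1] to [0, 1] before the R/G cumulative-histogram comparison.
    g_loss_hist = histogram_wasserstein_loss((real_imgs + 1) / 2, (fake_imgs + 1) / 2)
    g_loss = 1.0 * g_loss_adv + 2.0 * g_loss_class + 10.0 * g_loss_hist
    g_loss.backward()
    opt_g.step()
    return g_loss.item()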

3.3. Participation of Experts in Image Generation

In this study, various computation variants involving two to eight experts were analyzed. It was observed that, with a larger number of experts, three to four paths made a distinct contribution. Therefore, in the final version of peripheral blood cell generation, four experts were used within the Mixture-of-Experts framework. Figure S2 shows, in the form of a heatmap, the percentage contribution of the four experts to the image generation of each of the eight classes.
Based on the expert contribution data for BloodMNIST classification, several patterns may be observed. Expert 4 demonstrates the highest overall contribution and shows expertise in eosinophil (66.0%) and neutrophil (63.6%) classification. This expert also contributes significantly to platelet (52.9%) and lymphocyte (51.2%) identification, suggesting broad competency across multiple cell types. Expert 2 exhibits a more focused specialization profile, with exceptional performance in monocyte classification (61.9%) and granulocyte identification (42.3%). However, this expert shows notably lower contributions to platelet (2.7%) and erythroblast (3.4%) classification. Expert 3 demonstrates intermediate performance, with strengths in granulocyte classification (55.6%) and erythroblast identification (34.9%). This expert maintains relatively consistent contributions across most cell types. Expert 1 shows the most balanced distribution pattern but with a generally lower overall contribution, suggesting a more conservative approach to annotation.
It is also worth focusing on observations for individual classes. Monocyte classification shows extreme expert specialization, with Expert 2 providing 61.9% of the annotations while other experts contribute minimally (2.9–28.4%), suggesting this cell type may require specific expertise. Eosinophil classification is heavily dominated by Expert 4 (66.0%), with Expert 1 providing secondary support (22.5%), indicating potential morphological complexity.

3.4. Final Evaluation of Image Generation

The experiments used a classification model based on the ResNet50 architecture, applied to the BloodMNIST dataset, containing blood cell images divided into eight classes. The transfer learning approach was used; the model pre-trained on ImageNet was adapted to the medical task by modifying its final layer for multiclass classification. The CrossEntropyLoss function and the Adam optimizer with a learning rate of 0.001 were used. The model was trained for 10 epochs, achieving an accuracy of 0.97 on the test set.
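The corresponding setup can be expressed in a few lines of PyTorch, as sketched below; a recent torchvision version with the weights argument is assumed.

# ResNet50 transfer learning: ImageNet weights, new 8-class head, CrossEntropyLoss, Adam (lr = 0.001).
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
model.fc = nn.Linear(model.fc.in_features, 8)   # replace the final layer for the eight cell classes

criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
# The model is then trained for 10 epochs on the BloodMNIST training split.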
ResNet50 was also used to evaluate the synthetic data generated in the study. Accuracy, precision, sensitivity (recall), and the F1 measure were determined (Figure 7), obtaining a total accuracy of 0.97. The main classification errors concerned the granulocyte and lymphocyte classes, which are illustrated in the confusion matrix.
The confusion matrix demonstrates strong overall model performance, with most blood cell classes classified with perfect or near-perfect accuracy. Basophils, eosinophils, erythroblasts, neutrophils, and platelets were classified with 100% accuracy, suggesting that these classes are well represented in the feature space and are easily distinguishable by the model.
However, some degree of confusion is observed between morphologically and chromatically similar classes. For instance, granulocytes, while mostly classified correctly (101 instances), were occasionally misclassified as monocytes (12 instances) and lymphocytes (1 instance). Likewise, lymphocytes were misclassified as erythroblasts (2 instances) and monocytes (11 instances). Monocytes were predicted with high accuracy (129 correct), with only a single misclassification as a lymphocyte. These patterns indicate difficulty in distinguishing between granulocytes, monocytes, and lymphocytes—subtypes that are known to share overlapping visual and structural features.
Importantly, this classification behavior aligns with previous observations on color distribution patterns. The confusion matrix analysis is consistent with the findings from UMAP projections and the distance matrix, which highlighted the chromatic proximity between immature granulocytes, neutrophils, and lymphocytes. Specifically, low inter-class color distances, such as the 0.080 distance in the red channel between granulocytes and neutrophils, point to the similarity in hue, which may contribute to the classifier’s errors.
Additionally, immature granulocytes exhibit more color outliers, making them more likely to be misclassified as neighboring classes in the feature space, particularly neutrophils and lymphocytes. The presence of dark purple hues, confirmed via k-means clustering, in all three of these classes further complicates their differentiation. These overlapping chromatic features reinforce the observed classification challenges and explain the localized confusion within the model’s predictions.
In summary, while the model shows excellent performance overall, misclassifications tend to occur between classes with high visual and color similarity. This highlights the importance of chromatic information in distinguishing subtle differences between closely related cell types and underscores the challenges in synthetic image generation and classification for such categories.
The FID value calculated for the entire database is 52.09, and the average FID equals 102.07. The FID values for each individual class are given in Table 1.
In the next stage of evaluating the quality of the generated image dataset, the analysis focused on whether individual classes retained the required data diversity and whether the inter-class distances were appropriately reproduced. Figure S3 presents the mean distances calculated for pairs of objects within classes (intra-class distances), as well as for pairs across the eight classes (inter-class distances). To ensure consistency with the BloodMNIST database, 128 objects from each class were randomly selected for the calculations. The results, illustrating both the mean values and the variability using box plots, are presented separately for each of the R, G, and B channels.
For the BloodMNIST dataset and intra-class analysis, the green channel (G) exhibits the highest variation within each class, followed by the red channel (R) and, finally, the blue channel (B), which shows the lowest variance. For inter-class analysis, the green channel (G) demonstrates a substantially greater ability to separate classes compared to the red (R) and blue (B) channels. Specifically, channel G provides the best class separability, while channel B exhibits poor inter-class distance, indicating limited utility for distinguishing between classes.
For the generated images, in the intra-class analysis, the ranking of variations among channels remains consistent with the real data. Channel B continues to display a low spread and a low median, indicating the tight clustering of values. The green channel maintains a wide spread similar to that observed in the real data, reflecting comparable within-class variations. In the inter-class analysis, the green channel once again shows the highest degree of class separation. Although there is a slight reduction in inter-class distances across all channels compared to the real data, the overall structure of class separability is preserved. As anticipated, channel B continues to provide poor separability between classes.
Spider plots (Figure S4) show the mean intra-class distance per class for each channel, represented by the lines for R, G, and B. The overall shape of the radar plots for both the real and generated data is quite similar, indicating that the class-specific consistency is preserved to some extent in the generated images. Regarding the magnitude of the distances, in most classes, channel B consistently shows lower intra-class distances in both the real and generated datasets, likely due to the inherently low variance in the blue channel. Conversely, channel G remains dominant in both cases, exhibiting the highest intra-class distances, which are well preserved in the generated data. The values in the generated dataset are slightly lower in some instances, suggesting the tighter clustering of data points, which may reflect a tendency toward over-regularization.
To conclude, the generated images successfully preserve the internal structure of the dataset. Specifically, they maintain class compactness in intra-class distributions and retain class separability in inter-class distributions.

4. Discussion

The research presented in this work proposes a novel strategy for synthetic biomedical image generation that leverages the discriminative potential of RGB color channels, particularly in the context of stained peripheral blood cells. The effectiveness of the approach stems from the integration of preliminary color analysis into model design and loss function formulation. In this section, we discuss the relevance and implications of the findings, analyze the observed phenomena, and identify limitations and directions for future research.

4.1. Contribution of Color Channels to Class Discrimination

The analysis of RGB histograms, UMAP projections, and inter-class distance matrices revealed that the red and green channels provided more important information for distinguishing blood cell types than the blue channel. This is due to the histological nature of the staining in BloodMNIST, where eosin and basic dyes impart pink–purple hues to the images with a dominant red component. Based on this, it was decided to exclude the blue channel from the histogram generator loss function, which simplified the model without losing key information.
This approach shows that selective color analysis based on biological interpretability can effectively support the optimization of generative models. It also highlights the importance of pre-color profiling in biomedical contexts, where subtle visual differences have high diagnostic significance.

4.2. Misclassification Patterns and Biological Justification

The classification errors mainly concerned classes with similar morphology and similar staining features, such as immature granulocytes, neutrophils, and lymphocytes. Their visual similarities, confirmed by histogram and UMAP cluster analysis, led to difficulties in the classifier’s discrimination, e.g., a low Spearman distance in the red channel between immature granulocytes and neutrophils.
This phenomenon highlights the biological justification of the errors—the boundaries between some cell classes are naturally blurred. Color, although important, is not always sufficient for effective class discrimination. Therefore, morphological information from microscopic images was also included in the generation process to better represent the structure of blood cells.

4.3. Ablation Study

This paper analyzes the effectiveness of three variants of the cGAN architecture in generating synthetic blood cell images. The first variant was based on standard convolutional layers, the second additionally included residual blocks, and the third used the Mixture-of-Experts (MoE) architecture. For each approach, three versions of the generator loss function were tested: without taking into account color histograms, with histograms of the red and green channels, and with a full RGB histogram.
Table 2 presents detailed metric results for each combination of architecture and loss function. It includes, among others, classification accuracy using the model trained on the BloodMNIST dataset, the FID index value, and other quality measures of the generated images. The data from the table clearly indicate that the best results, including a classification accuracy of 0.97 and a low FID, were achieved for the MoE–cGAN variant with a loss function taking into account the histograms of the red and green channels.
Ablation experiments showed that both increasing the complexity of the architecture and introducing a directed loss function significantly improve image quality. The MoE model proved to be particularly effective due to the specialization of experts, which allowed for better capturing class-specific morphological features. Moreover, selectively including only two channels (red and green) outperformed both the variant without histograms and the one with full RGB; including the blue channel introduced additional noise, which worsened the performance. These results emphasize the importance of biologically justified feature selection in the design of loss functions for generative models.

4.4. Comparison with Other Works

Various works present alternative approaches to generating additional histopathological images of peripheral blood cells as an extension of the BloodMNIST database [42,43,44,45,46,47]. All the strategies described differ significantly from our proposal, as they use images with a much lower resolution, the lowest of those offered by MedMNIST. After preprocessing, they have dimensions of 3 × 28 × 28 or 3 × 32 × 32 pixels. Our generation approach, on the other hand, uses images with a resolution of 3 × 128 × 128. Additionally, none of the articles mentioned provide the quality of generation for individual classes, only for the entire database.
The study [42] introduced a novel method, AE-COT-GAN, for medical image generation using GANs that effectively mitigates mode collapse by leveraging an autoencoder and extended semi-discrete optimal transport (SDOT) to transform distributions in the latent space. In extensive experiments on the BloodMNIST database, a FID value of 12.42 and an accuracy of 0.97 were obtained.
In [44], DM-GAN was presented, i.e., a multi-generator GAN architecture designed to address intra-class imbalance and isolated samples in medical image datasets by enhancing the sample diversity and quality. Through the integration of self-attention mechanisms and a novel generator loss combining mode-seeking and mutual exclusion terms, the model proved to be a useful tool for image generation. The best accuracy in peripheral blood cell classification of 0.93 was obtained using the DenseNet201 architecture.
The study [45] introduced LEGAN, a GAN-based approach that addresses intra-class imbalance by using the local outlier factor (LOF) to detect sparse regions and applying affine transformations to enhance the sample diversity before training. Additionally, a decentralization constraint based on information entropy guided the generator toward producing more diverse samples, resulting in improved image quality and classification performance across multiple medical imaging datasets. For the BloodMNIST dataset, a FID of 10.8 was obtained and an accuracy of ca. 0.93 for the augmented samples.
In [47], a novel generative model, HD-PAN, based on Hölder divergence, was proposed for semi-supervised disease classification using positive and unlabeled medical images, addressing the challenge of limited annotated data. Extensive experiments on benchmark datasets demonstrated that the method outperforms KL divergence-based approaches, achieving state-of-the-art results in medical image-assisted diagnosis. Taking the BloodMNIST dataset as an example, the accuracy rate reached 0.84, while the F1-score was 0.82.
The paper [43] introduced BDGAN, a generative model designed to improve medical image classification under data imbalance by enhancing both class boundary learning and intra-class diversity through a multi-generator architecture and specialized loss functions. Experiments on real-world datasets confirmed that BDGAN generates high-quality, diverse samples, leading to significant improvements in classification performance. For the BloodMNIST dataset, an accuracy of 0.94 was obtained for the generated samples.
The study [46] addressed the challenge of intra-class data imbalance in medical imaging by proposing a two-step GAN-based augmentation method guided by the Cluster-Based Local Outlier Factor (CBLOF) algorithm to distinguish sparse and dense samples. The GAN was trained to focus on sparse regions, and a one-class SVM was applied post-generation to filter out noisy samples. The experimental results on four medical datasets showed that the proposed method significantly enhances sample diversity and quality, leading to an approximate 3% improvement in classification accuracy. On the BloodMNIST dataset, the model achieved an average accuracy of 0.92 and a Fréchet Inception Distance (FID) of 10.6.
All described algorithms and results obtained apply to very low-resolution images.

4.5. Limitations and Future Works

The proposed framework effectively generates synthetic biomedical images, but there is room for improvement. The integration of spatial attention or segmentation mechanisms could better preserve important morphological details. Future work could include adaptive color normalization and domain adaptation to improve robustness to different staining protocols.
In the Mixture-of-Experts architecture, dynamic routing or expert pruning is worth considering in order to reduce the computational burden. Multimodal feature conditioning (color, shape, and texture) would capture complex biomedical features, and self-supervised objectives could increase the realism of the data without the need for intensive supervision. Such extensions would increase the utility of the method in various clinical contexts.

5. Conclusions

The aim of the work was to check whether the analysis of colors in histological images and their inclusion in modeling can improve the quality of images generated to supplement missing data. The research was carried out on images of eight classes of peripheral blood cells, which showed similarities in color and shape. No segmentation or other preprocessing techniques were used in the work; the images were used in their original form.
A novel element was the detailed examination of the role of the R, G, and B components, including their histograms, in class differentiation. The analysis using the k-means and UMAP methods showed that the red and green components distinguished classes better than the blue channel. Therefore, only the R and G channels were used in the generator loss function, which turned out to be crucial for improving the quality of generation.
A new strategy based on cGAN with the MoE (Mixture of Experts) mechanism was introduced, where the gate network dynamically selects the best expert to generate the image. The inclusion of R and G histograms in the loss function significantly improved the color and texture reproduction, increasing the realism of images and the classification accuracy (up to 0.97). The applied approach reduced the problem of mode collapse, increased the diversity of synthetic data, and enabled its effective use as support in the conditions of limited or imbalanced real data.
The quality of the generated data was further validated by analyzing intra- and inter-class distances, which showed that the synthetic images preserved the structural characteristics of the original dataset. The green channel consistently demonstrated the highest variation and class separability, while the blue channel remained the least informative. The generated images reflected similar patterns to real data, confirming their realism and structural fidelity.
In summary, the study shows that targeted color analysis and the integration of expert knowledge in the MoE–cGAN architecture make it possible to create high-quality synthetic medical data. The proposed method can be applied in various dataset augmentation projects, provided that the color characteristics of a given set are analyzed beforehand.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/electronics14142773/s1: Figure S1: Metrics recorded during image generation; Figure S2: Expert contribution heatmap for BloodMNIST dataset—percentage distribution by class; Figure S3: The mean distances calculated for pairs of objects within classes (intra-class distances) as well as for pairs across the eight classes (inter-class distances), for BloodMNIST dataset and generated images; Figure S4: Mean intra-class distances per class for the R, G, and B channels in the BloodMNIST dataset and the generated images; Table S1: Characteristics of the BloodMNIST database classes; Table S2: Hardware and software specifications.

Author Contributions

P.K.: Conceptualization; Methodology; Data curation; Formal analysis; Funding acquisition; Investigation; Software; Resources; Visualization; Project administration; Writing—original draft. F.C.: Conceptualization; Methodology. M.J.: Conceptualization; Methodology; Formal analysis; Investigation; Software; Supervision; Validation; Writing—original draft. All authors have read and agreed to the published version of the manuscript.

Funding

This research project was supported by the program “Excellence initiative—research university” at the AGH University of Krakow.

Data Availability Statement

The authors used publicly available data in this manuscript. Information on the availability of the dataset is provided in the paper. The algorithms implemented in Python are available on the GitHub platform at https://github.com/pwkwiek/blood (accessed on 7 July 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Asif, S.; Wenhui, Y.; Ur-Rehman, S.; Ul-ain, Q.; Amjad, K.; Yueyang, Y.; Jinhai, S.; Awais, M. Advancements and Prospects of Machine Learning in Medical Diagnostics: Unveiling the Future of Diagnostic Precision. Arch. Comput. Methods Eng. 2024, 32, 853–883. [Google Scholar] [CrossRef]
  2. Ramírez, J.G.C.; Islam, M.M.; Even, A.I.H. Machine Learning Applications in Healthcare: Current Trends and Future Prospects. J. Artif. Intell. Gen. Sci. 2024, 1, 1–12. [Google Scholar]
  3. Chen, X. AI in Healthcare: Revolutionizing Diagnosis and Treatment through Machine Learning. MZ J. Artif. Intell. 2024, 1, 1–18. [Google Scholar]
  4. Afkanpour, M.; Hosseinzadeh, E.; Tabesh, H. Identify the Most Appropriate Imputation Method for Handling Missing Values in Clinical Structured Datasets: A Systematic Review. BMC Med. Res. Methodol. 2024, 24, 188. [Google Scholar] [CrossRef]
  5. Dekermanjian, J.P.; Shaddox, E.; Nandy, D.; Ghosh, D.; Kechris, K. Mechanism-Aware Imputation: A Two-Step Approach in Handling Missing Values in Metabolomics. BMC Bioinform. 2022, 23, 179. [Google Scholar] [CrossRef]
  6. Garcea, F.; Serra, A.; Lamberti, F.; Morra, L. Data Augmentation for Medical Imaging: A Systematic Literature Review. Comput. Biol. Med. 2023, 152, 106391. [Google Scholar] [CrossRef]
  7. Islam, T.; Hafiz, M.S.; Jim, J.R.; Kabir, M.M.; Mridha, M.F. A Systematic Review of Deep Learning Data Augmentation in Medical Imaging: Recent Advances and Future Research Directions. Healthc. Anal. 2024, 5, 100340. [Google Scholar] [CrossRef]
  8. Saxena, D.; Cao, J. Generative Adversarial Networks (GANs). ACM Comput. Surv. 2021, 54, 63. [Google Scholar] [CrossRef]
  9. Wang, Z.; She, Q.; Ward, T.E. Generative Adversarial Networks in Computer Vision: A Survey and Taxonomy. ACM Comput. Surv. 2021, 54, 37. [Google Scholar] [CrossRef]
  10. Kazerouni, A.; Aghdam, E.K.; Heidari, M.; Azad, R.; Fayyaz, M.; Hacihaliloglu, I.; Merhof, D. Diffusion Models in Medical Imaging: A Comprehensive Survey. Med. Image Anal. 2023, 88, 102846. [Google Scholar] [CrossRef]
  11. Rguibi, Z.; Hajami, A.; Zitouni, D.; Elqaraoui, A.; Zourane, R.; Bouajaj, Z. Improving Medical Imaging with Medical Variation Diffusion Model: An Analysis and Evaluation. J. Imaging 2023, 9, 171. [Google Scholar] [CrossRef] [PubMed]
  12. Al-Azzam, N.; Shatnawi, I. Comparing Supervised and Semi-Supervised Machine Learning Models on Diagnosing Breast Cancer. Ann. Med. Surg. 2021, 62, 53–64. [Google Scholar] [CrossRef] [PubMed]
  13. Yang, X.; Song, Z.; King, I.; Xu, Z. A Survey on Deep Semi-Supervised Learning. IEEE Trans. Knowl. Data Eng. 2023, 35, 8934–8954. [Google Scholar] [CrossRef]
  14. Yang, J.; Shi, R.; Wei, D.; Liu, Z.; Zhao, L.; Ke, B.; Pfister, H.; Ni, B. MedMNIST v2—A Large-Scale Lightweight Benchmark for 2D and 3D Biomedical Image Classification. Sci. Data 2023, 10, 41. [Google Scholar] [CrossRef]
  15. Sinaga, K.P.; Yang, M.S. Unsupervised K-Means Clustering Algorithm. IEEE Access 2020, 8, 80716–80727. [Google Scholar] [CrossRef]
  16. McInnes, L.; Healy, J.; Saul, N.; Großberger, L. UMAP: Uniform Manifold Approximation and Projection. J. Open Source Softw. 2018, 3, 861. [Google Scholar] [CrossRef]
  17. Mittal, M.; Praveen Gujjar, J.; Guru Prasad, M.S.; Devadas, R.M.; Ambreen, L.; Kumar, V. Dimensionality Reduction Using UMAP and TSNE Technique. In Proceedings of the 2nd IEEE International Conference on Advances in Information Technology, ICAIT 2024, Chikkamagaluru, Karnataka, India, 24–27 July 2024; Volume 1, pp. 1–5. [Google Scholar] [CrossRef]
  18. Chattamvelli, R. Rank Correlation. In Correlation in Engineering and the Applied Sciences. Synthesis Lectures on Mathematics & Statistics; Springer: Berlin/Heidelberg, Germany, 2024; pp. 77–106. [Google Scholar]
  19. Shen, S.; Jin, S.; Li, F.; Zhao, J. Optical Coherence Tomography Parameters as Prognostic Factors for Stereopsis after Vitrectomy for Unilateral Epiretinal Membrane: A Cohort Study. Sci. Rep. 2024, 14, 6715. [Google Scholar] [CrossRef]
  20. Benčević, M.; Habijan, M.; Galić, I.; Babin, D.; Pižurica, A. Understanding Skin Color Bias in Deep Learning-Based Skin Lesion Segmentation. Comput. Methods Programs Biomed. 2024, 245, 108044. [Google Scholar] [CrossRef]
  21. Kim, H.; Ryu, S.M.; Keum, J.S.; Oh, S.I.; Kim, K.N.; Shin, Y.H.; Jeon, I.H.; Koh, K.H. Clinical Validation of Enhanced CT Imaging for Distal Radius Fractures through Conditional Generative Adversarial Networks (CGAN). PLoS ONE 2024, 19, e0308346. [Google Scholar] [CrossRef]
  22. Deabes, W.; Abdel-Hakim, A.E. CGAN-ECT: Reconstruction of Electrical Capacitance Tomography Images from Capacitance Measurements Using Conditional Generative Adversarial Networks. Flow Meas. Instrum. 2024, 96, 102566. [Google Scholar] [CrossRef]
  23. Campbell, J.N.A.; Dais Ferreira, M.; Isenor, A.W. Generation of Vessel Track Characteristics Using a Conditional Generative Adversarial Network (CGAN). Appl. Artif. Intell. 2024, 38, e2360283. [Google Scholar] [CrossRef]
  24. Yang, H.; Hu, Y.; He, S.; Xu, T.; Yuan, J.; Gu, X. Applying Conditional Generative Adversarial Networks for Imaging Diagnosis. In Proceedings of the 2024 IEEE 6th International Conference on Power, Intelligent Computing and Systems, ICPICS 2024, Shenyang, China, 26–28 July 2024; pp. 1717–1722. [Google Scholar]
  25. Skandarani, Y.; Jodoin, P.M.; Lalande, A. GANs for Medical Image Synthesis: An Empirical Study. J. Imaging 2023, 9, 69. [Google Scholar] [CrossRef] [PubMed]
  26. Zhang, Z.; Li, Y.; Shin, B.S. Robust Medical Image Colorization with Spatial Mask-Guided Generative Adversarial Network. Bioengineering 2022, 9, 721. [Google Scholar] [CrossRef] [PubMed]
  27. Sun, Y. The Role of Activation Function in Image Classification. In Proceedings of the 2021 IEEE 3rd International Conference on Communications, Information System and Computer Engineering, CISCE 2021, Beijing, China, 14–16 May 2021; pp. 275–278. [Google Scholar]
  28. Lakhdari, K.; Saeed, N. A New Vision of a Simple 1D Convolutional Neural Networks (1D-CNN) with Leaky-ReLU Function for ECG Abnormalities Classification. Intell. Based Med. 2022, 6, 100080. [Google Scholar] [CrossRef]
  29. Cormen, T.H.; Leiserson, C.E.; Rivest, R.L.; Stein, C. Introduction to Algorithms, 3rd ed.; MIT Press: Cambridge, MA, USA, 2009. [Google Scholar]
  30. Zhang, Q.; Huang, N.; Yao, L.; Zhang, D.; Shan, C.; Han, J. RGB-T Salient Object Detection via Fusing Multi-Level CNN Features. IEEE Trans. Image Process. 2020, 29, 3321–3335. [Google Scholar] [CrossRef]
  31. Tang, H.; Li, Z.; Zhang, D.; He, S.; Tang, J. Divide-and-Conquer: Confluent Triple-Flow Network for RGB-T Salient Object Detection. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 47, 1958–1974. [Google Scholar] [CrossRef]
  32. Gan, W.; Ning, Z.; Qi, Z.; Yu, P.S. Mixture of Experts (MoE): A Big Data Perspective. arXiv 2025, arXiv:2501.16352. [Google Scholar]
  33. Zhao, C.; Du, H.; Niyato, D.; Kang, J.; Xiong, Z.; Kim, D.I.; Shen, X.S.; Letaief, K.B. Enhancing Physical Layer Communication Security through Generative AI with Mixture of Experts. IEEE Wirel. Commun. 2025, 32, 176–184. [Google Scholar] [CrossRef]
  34. Rubner, Y.; Tomasi, C.; Guibas, L.J. Earth Mover’s Distance as a Metric for Image Retrieval. Int. J. Comput. Vis. 2000, 40, 99–121. [Google Scholar] [CrossRef]
  35. Mi, J.; Ma, C.; Zheng, L.; Zhang, M.; Li, M.; Wang, M. WGAN-CL: A Wasserstein GAN with Confidence Loss for Small-Sample Augmentation. Expert. Syst. Appl. 2023, 233, 120943. [Google Scholar] [CrossRef]
  36. Bobadilla, J.; Gutiérrez, A. Wasserstein GAN-Based Architecture to Generate Collaborative Filtering Synthetic Datasets. Appl. Intell. 2024, 54, 2472–2490. [Google Scholar] [CrossRef]
  37. Ngasa, E.E.; Jang, M.A.; Tarimo, S.A.; Woo, J.; Shin, H.B. Diffusion-Based Wasserstein Generative Adversarial Network for Blood Cell Image Augmentation. Eng. Appl. Artif. Intell. 2024, 133, 108221. [Google Scholar] [CrossRef]
  38. Anaya-Sánchez, H.; Altamirano-Robles, L.; Díaz-Hernández, R.; Zapotecas-Martínez, S. WGAN-GP for Synthetic Retinal Image Generation: Enhancing Sensor-Based Medical Imaging for Classification Models. Sensors 2025, 25, 167. [Google Scholar] [CrossRef] [PubMed]
  39. Lawrence, P.; Brown, C. Comparing Human-Level and Machine Learning Model Performance in White Blood Cell Morphology Assessment. Eur. J. Haematol. 2024, 114, 115–119. [Google Scholar] [CrossRef]
  40. Constantinescu, A.E.; Bull, C.J.; Jones, N.; Mitchell, R.; Burrows, K.; Dimou, N.; Bézieau, S.; Brenner, H.; Buchanan, D.D.; D’Amato, M.; et al. Circulating White Blood Cell Traits and Colorectal Cancer Risk: A Mendelian Randomisation Study. Int. J. Cancer 2024, 154, 94–103. [Google Scholar] [CrossRef]
  41. Ozga, M.; Nicolet, D.; Mrózek, K.; Walker, C.J.; Blachly, J.S.; Kohlschmidt, J.; Orwick, S.; Carroll, A.J.; Larson, R.A.; Kolitz, J.E.; et al. White Blood Cell Count Levels Are Associated with Inflammatory Response and Constitute Independent Outcome Predictors in Adult Patients with Acute Myeloid Leukemia Aged <60 Years. Am. J. Hematol. 2024, 99, 2236–2240. [Google Scholar] [CrossRef]
  42. Wang, J.; Lei, B.; Ding, L.; Xu, X.; Gu, X.; Zhang, M. Autoencoder-Based Conditional Optimal Transport Generative Adversarial Network for Medical Image Generation. Vis. Inform. 2024, 8, 15–25. [Google Scholar] [CrossRef]
  43. Ding, H.; Tao, Q.; Huang, N. BDGAN: Boundary and Diversity-Aware Generative Adversarial Network for Imbalanced Medical Image Augmentation. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing, Hyderabad, India, 6–11 April 2025; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2025. [Google Scholar]
  44. Ding, H.; Zhang, K.; Huang, N. DM-GAN: A Data Augmentation-Based Approach for Imbalanced Medical Image Classification. In Proceedings of the 2024 IEEE International Conference on Bioinformatics and Biomedicine, BIBM 2024, Lisbon, Portugal, 3–6 December 2024; Institute of Electrical and Electronics Engineers Inc.: Piscataway, NJ, USA, 2024; pp. 3160–3165. [Google Scholar]
  45. Ding, H.; Huang, N.; Wu, Y.; Cui, X. LEGAN: Addressing Intraclass Imbalance in GAN-Based Medical Image Augmentation for Improved Imbalanced Data Classification. IEEE Trans. Instrum. Meas. 2024, 73, 2517914. [Google Scholar] [CrossRef]
  46. Ding, H.; Huang, N.; Wu, Y.; Cui, X. Improving Imbalanced Medical Image Classification through GAN-Based Data Augmentation Methods. Pattern Recognit. 2025, 166, 111680. [Google Scholar] [CrossRef]
  47. Zhang, Y.; Li, C.; Liu, Z.; Li, M. Semi-Supervised Disease Classification Based on Limited Medical Image Data. IEEE J. Biomed. Health Inform. 2024, 28, 1575–1586. [Google Scholar] [CrossRef]
Figure 1. Pre-analysis of data correlation and distribution: explanation of k-means and UMAP algorithms and Spearman distances.
Figure 2. The Mixture-of-Experts (MoE) model combined with a conditional GAN (cGAN) as an advanced architecture for generating high-quality, complex images proposed in the presented study.
Figure 3. Histograms of the R, G, and B components (relative frequency) and dominant colors in 8 cell classes. The histogram colors represent the red, green, and blue components, while the squares next to the blood cell images indicate the dominant color in each image.
Figure 4. UMAP analysis results for each of the channels with pre-PCA.
Figure 5. Distance matrices (1 − Spearman) for the R, G, and B channels.
Figure 6. Visual comparison of peripheral blood cell classes: each row represents a distinct cell type (basophil, eosinophil, erythroblast, granulocyte, lymphocyte, monocyte, neutrophil, and platelet); the first four columns show real microscopic images, while the last four columns display corresponding synthetic images generated by the model. The colors in the images result from staining blood samples using methods applied in hematology.
Figure 7. Classification model metrics for evaluating blood cell image generation, and the confusion matrix for the 1000 generated images from eight classes.
Table 1. FID values for each class.

Class          FID
basophil       91.39
eosinophil     102.35
erythroblast   116.95
granulocyte    109.13
lymphocyte     104.18
monocyte       82.92
neutrophil     69.42
platelet       140.25
Table 2. Comparison of the metric values obtained when generating images with three different strategies.

Nr   NN Architecture              Color Histograms    Accuracy   FID (Entire Generated Set)   FID (Averaged over Individual Classes)
1    cGAN                         Not used            0.89       74.5                         137.0
2    cGAN                         Red, Green          0.87       76.3                         138.7
3    cGAN                         Red, Green, Blue    0.91       91.1                         149.5
4    cGAN with residual blocks    Not used            0.90       57.1                         112.2
5    cGAN with residual blocks    Red, Green          0.93       66.4                         123.8
6    cGAN with residual blocks    Red, Green, Blue    0.87       71.3                         132.5
7    Mixture-of-Experts cGAN      Not used            0.92       50.4                         98.4
8    Mixture-of-Experts cGAN      Red, Green 1        0.97       52.1                         102.1
9    Mixture-of-Experts cGAN      Red, Green, Blue    0.86       50.0                         101.7

1 The best result obtained using the MoE-cGAN algorithm.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
