Figure 1.
Asymmetric attention-enhanced UNet architecture for the cardiac MRI diffusion model. The network employs a five-level hierarchical structure with feature dimensions [64, 128, 256, 512, 1024]. Standard blocks (blue) handle local feature extraction in early encoder and late decoder stages, while attention blocks (yellow) capture global structural relationships in deeper layers. Skip connections (red dashed arrows) preserve spatial information across resolution levels. The asymmetric attention placement enables progressive construction from local tissue characteristics to global cardiac anatomical relationships.
Figure 2.
Detailed architecture of the multi-head self-attention mechanism used in attention blocks. Input features undergo layer normalization followed by convolutional processing before being split into Query (Q), Key (K), and Value (V) components. The attention operation computes scaled dot-product attention across 8 heads in parallel, with outputs concatenated and projected through linear layers. Residual connections ensure stable gradient flow throughout the attention computation.
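The data flow described in this caption (normalization, Q/K/V projection, scaled dot-product attention over 8 heads, concatenation, output projection, residual addition) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `attention_block` and its weight arguments are hypothetical, the "convolutional processing" is modeled as a per-position (1×1) linear projection, and the feature map is assumed to be flattened into N tokens of C channels.

```python
import math

def layer_norm(tokens, eps=1e-5):
    """Normalize each token (row) to zero mean and unit variance."""
    out = []
    for t in tokens:
        mean = sum(t) / len(t)
        var = sum((v - mean) ** 2 for v in t) / len(t)
        out.append([(v - mean) / math.sqrt(var + eps) for v in t])
    return out

def matmul(a, b):
    """Matrix product of two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention_block(tokens, w_q, w_k, w_v, w_out, num_heads=8):
    """tokens: N rows of C channels; each weight matrix is C x C (assumed)."""
    channels = len(tokens[0])
    assert channels % num_heads == 0, "channels must divide evenly across heads"
    head_dim = channels // num_heads
    normed = layer_norm(tokens)
    # Assumption: the convolution before the split acts as a 1x1 projection.
    q, k, v = (matmul(normed, w) for w in (w_q, w_k, w_v))
    concatenated = [[] for _ in tokens]
    for h in range(num_heads):
        lo, hi = h * head_dim, (h + 1) * head_dim
        qh = [row[lo:hi] for row in q]
        kh = [row[lo:hi] for row in k]
        vh = [row[lo:hi] for row in v]
        k_t = [list(col) for col in zip(*kh)]          # head_dim x N
        scores = matmul(qh, k_t)                       # N x N
        weights = [softmax([s / math.sqrt(head_dim) for s in row]) for row in scores]
        head_out = matmul(weights, vh)                 # N x head_dim
        for i, row in enumerate(head_out):
            concatenated[i].extend(row)                # concatenate heads
    projected = matmul(concatenated, w_out)
    # Residual connection: add the block input back to the projected output.
    return [[x + p for x, p in zip(t, pr)] for t, pr in zip(tokens, projected)]
```

With a zero output projection the block reduces to the identity, which is one simple way to check that the residual path is wired correctly.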
Figure 3.
Visual comparison showing (a) real cardiac MRI images used as ground truth, alongside synthetic images generated by different methods: (b) variational autoencoder (VAE) baseline (FID = 325.26, SSIM = 0.596 ± 0.065, MS-SSIM = 0.594 ± 0.090), (c) Wasserstein generative adversarial network with gradient penalty (WGAN-GP) (FID = 227.98, SSIM = 0.471 ± 0.083, MS-SSIM = 0.470 ± 0.164), (d) StyleGAN2 with adaptive discriminator augmentation (StyleGAN2-ADA) (FID = 117.70, SSIM = 0.406 ± 0.061, MS-SSIM = 0.350 ± 0.072), and (e) the proposed method (FID = 77.78, SSIM = 0.720 ± 0.143, MS-SSIM = 0.925 ± 0.069). Each panel displays multiple cardiac cross-sectional views illustrating the quality and realism of the images each approach generates. The proposed diffusion model achieves the lowest FID score (indicating better distributional similarity to real images) and the highest SSIM and MS-SSIM scores (indicating better structural preservation).
Figure 4.
The bar chart and accompanying table compare the percentage of total pixels occupied by three cardiac structures: left ventricle (LV), right ventricle (RV), and myocardium (Myo) in real versus generated images. Real data is shown in blue (light gray in grayscale) and generated data in purple (dark gray in grayscale). While LV and myocardium distributions show no statistically significant differences between real and generated images (p = 0.054 and p = 0.470, respectively), the right ventricle shows a statistically significant reduction in generated images (1.25% real vs. 1.13% generated, p = 0.004), representing a 9.7% relative decrease. Error bars indicate standard deviation. The overall similarity in class distributions indicates that the generative model maintains realistic anatomical proportions for most cardiac structures. Statistical analysis was performed using a two-sample Kolmogorov–Smirnov test comparing the real and generated distributions (n = 100 samples per group).
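The two-sample Kolmogorov–Smirnov comparison named in this caption can be sketched in plain Python. This is a minimal illustration, not the authors' analysis code: the function names `ks_statistic` and `ks_pvalue` are hypothetical, the p-value uses a common asymptotic series approximation rather than an exact computation, and the sample values are randomly generated stand-ins, not the study data.

```python
import math
import random

def ks_statistic(a, b):
    """Largest gap between the two empirical CDFs (two-sample KS statistic D)."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past x so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d

def ks_pvalue(d, n, m, terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (approximation)."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; the p-value is ~1
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))

# Illustrative stand-in data: 100 samples per group, as in the caption.
random.seed(0)
real = [random.gauss(1.25, 0.30) for _ in range(100)]
generated = [random.gauss(1.13, 0.20) for _ in range(100)]
d = ks_statistic(real, generated)
p = ks_pvalue(d, len(real), len(generated))
print(f"D = {d:.3f}, p = {p:.4f}")
```

Because the test compares whole empirical distributions rather than means alone, two groups with similar means can still differ significantly if their spreads differ.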
Figure 5.
The three panels show kernel density estimates comparing the distribution of pixel percentages for left ventricle (LV), right ventricle (RV), and myocardium (Myo) between real cardiac MRI images (blue/light gray) and generated synthetic images (purple/dark gray). The LV distributions show substantial overlap, with real data exhibiting a slightly broader spread. The RV panel reveals that generated images produce a more concentrated distribution with reduced variability compared to real images, consistent with the statistically significant difference noted in the previous analysis. The myocardium distributions demonstrate excellent agreement between real and generated data, with nearly identical peak locations and spread. The overlapping areas (shown in darker purple/gray) indicate regions where both distributions coincide, illustrating the model’s ability to capture realistic anatomical proportions across different cardiac structures. Statistical analysis was performed using a two-sample Kolmogorov–Smirnov test comparing the real and generated distributions (n = 100 samples per group).
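A kernel density estimate and the shared area under two such curves can be sketched as follows. This is an illustration only: the Gaussian kernel, the normal-reference bandwidth rule (1.06 · σ · n^(−1/5)), the helper names, and the sample values are all assumptions, since the caption does not state how the densities were computed.

```python
import math
import statistics

def normal_reference_bandwidth(samples):
    """Rule-of-thumb bandwidth: 1.06 * sample std * n^(-1/5)."""
    return 1.06 * statistics.stdev(samples) * len(samples) ** (-0.2)

def gaussian_kde(samples, bandwidth):
    """Return a function that evaluates the Gaussian KDE at a point x."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))
    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
    return density

def overlap_area(f, g, lo, hi, steps=2000):
    """Midpoint-rule integral of min(f, g): 1.0 means identical densities."""
    dx = (hi - lo) / steps
    return sum(min(f(lo + (i + 0.5) * dx), g(lo + (i + 0.5) * dx))
               for i in range(steps)) * dx

# Made-up pixel percentages for one structure (not the study data).
real_pct = [1.10, 1.32, 0.95, 1.48, 1.21, 1.05, 1.60, 1.27]
gen_pct = [1.08, 1.15, 1.02, 1.20, 1.11, 1.17, 1.09, 1.14]
f = gaussian_kde(real_pct, normal_reference_bandwidth(real_pct))
g = gaussian_kde(gen_pct, normal_reference_bandwidth(gen_pct))
shared = overlap_area(f, g, 0.0, 3.0)
print(f"shared area = {shared:.3f}")
```

A narrower generated distribution, as described for the RV panel, lowers the shared area even when the two peaks sit close together.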
Figure 6.
The violin plots display the probability density distributions and statistical comparisons for four key LV shape features: area (pixels), roundness (0–1 scale), eccentricity (0–1 scale), and solidity (0–1 scale). Real data is shown in blue (light gray in grayscale) and generated data in purple (dark gray in grayscale). Box plots within each violin indicate median, quartiles, and outliers, with mean and standard deviation values displayed. Statistical significance testing reveals that area (p = 0.803) and roundness (p = 0.174) show no significant differences between real and generated images. However, eccentricity shows a highly significant difference (p < 0.001), with real LV structures being more eccentric (0.61 ± 0.08) compared to generated ones (0.55 ± 0.12). Solidity also shows a significant but small difference (p = 0.016), though both distributions have nearly identical means (0.97). These results indicate that while the generative model captures most LV shape characteristics accurately, it tends to produce slightly more circular and less eccentric ventricular shapes than observed in real cardiac anatomy. *** indicates p < 0.001; * indicates p < 0.05; ns indicates not significant.
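The three dimensionless shape features can be illustrated with a short sketch. The caption does not give formulas, so the definitions below are assumed conventional ones: roundness as 4πA/P², solidity as region area divided by convex hull area, and eccentricity as that of the ellipse with the same second moments as the region. All function names are hypothetical, and roundness and solidity are computed here from a polygon outline rather than from a pixel mask.

```python
import math

def polygon_area(pts):
    """Shoelace area of a closed polygon given as (x, y) vertex tuples."""
    s = sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]))
    return abs(s) / 2.0

def polygon_perimeter(pts):
    return sum(math.dist(p, q) for p, q in zip(pts, pts[1:] + pts[:1]))

def convex_hull(pts):
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(pts))
    if len(pts) <= 2:
        return pts
    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def roundness(pts):
    """Assumed definition: 4*pi*A / P^2 (1.0 for a perfect circle)."""
    return 4.0 * math.pi * polygon_area(pts) / polygon_perimeter(pts) ** 2

def solidity(pts):
    """Assumed definition: region area divided by convex hull area."""
    return polygon_area(pts) / polygon_area(convex_hull(pts))

def eccentricity(pixels):
    """Eccentricity of the ellipse with the same second moments as the pixels."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    half = (sxx + syy) / 2.0
    root = math.sqrt(max(0.0, half * half - (sxx * syy - sxy * sxy)))
    lam_max, lam_min = half + root, half - root
    if lam_max == 0.0:
        return 0.0
    return math.sqrt(max(0.0, 1.0 - lam_min / lam_max))

square = [(0, 0), (2, 0), (2, 2), (0, 2)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]
print(roundness(square), solidity(l_shape))
```

Under these definitions a convex region has solidity 1 and a circle has eccentricity 0, so "more circular and less eccentric" corresponds to eccentricity moving toward 0.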
Figure 7.
The violin plots display the probability density distributions and statistical comparisons for four key RV shape features: area (pixels), roundness (0–1 scale), eccentricity (0–1 scale), and solidity (0–1 scale). Real data is shown in blue (light gray in grayscale) and generated data in purple (dark gray in grayscale). Box plots within each violin indicate median, quartiles, and outliers, with mean and standard deviation values displayed. Statistical analysis reveals that only area shows no significant difference between real and generated images (p = 0.220). All other features show significant differences: roundness is significantly lower in generated images (0.12 ± 0.03) compared to real images (0.17 ± 0.17, p = 0.009), eccentricity is highly significantly lower in generated images (0.54 ± 0.09 vs. 0.62 ± 0.10, p < 0.001), and solidity is significantly reduced in generated images (0.42 ± 0.07 vs. 0.46 ± 0.11, p = 0.007). These results indicate that the generative model systematically produces RV structures that are less round, less eccentric, and less solid than real RV anatomy, suggesting challenges in accurately capturing the complex and variable morphology of the right ventricle. *** indicates p < 0.001; ** indicates p < 0.01; ns indicates not significant.
Figure 8.
The violin plots display the probability density distributions and statistical comparisons for four key myocardium shape features: area (pixels), roundness (0–1 scale), eccentricity (0–1 scale), and solidity (0–1 scale). Real data is shown in blue (light gray in grayscale) and generated data in purple (dark gray in grayscale). Box plots within each violin indicate median, quartiles, and outliers, with mean and standard deviation values displayed. Statistical analysis reveals excellent agreement between real and generated myocardium features, with no significant differences observed across all four parameters: area (p = 0.724), roundness (p = 0.054), eccentricity (p = 0.920), and solidity (p = 0.923). The means are nearly identical for most features, with eccentricity and solidity showing perfect agreement (0.74 ± 0.12 and 0.80 ± 0.06, respectively, for both real and generated data). These results indicate that the generative model captures myocardium morphological characteristics with high fidelity, suggesting that the complex ring-like structure of the myocardium is well preserved in synthetic cardiac images. ns indicates not significant.
Figure 9.
The violin plots show the distribution of LV/Myo ratio, RV/LV ratio, and roundness measurements for both ventricles. Real data appears in blue (light gray in grayscale) and generated data in purple (dark gray in grayscale). Box plots show median and quartiles, with violin shapes indicating the distribution density. Mean and standard deviation values are displayed for each comparison. Note the excellent preservation of LV/Myo ratio (p = 0.908) and LV roundness (p = 0.702), while RV metrics show greater variability, reflecting the inherent challenge of modeling right ventricular geometry. The close agreement in left ventricular metrics indicates successful capture of LV morphology, whereas right ventricular differences highlight the complexity of accurately modeling RV anatomy in synthetic cardiac images.
Figure 10.
The histogram shows overlapping density distributions for myocardial wall thickness measurements in pixels, with real images in blue (light gray in grayscale) and generated images in purple (dark gray in grayscale). While the distributions overlap substantially, generated images tend toward slightly thinner walls, with a statistically significant difference (p < 0.001) between mean thicknesses (21.20 ± 9.01 pixels for real vs. 19.91 ± 9.20 pixels for generated images). Despite this difference, the distributional overlap indicates that the diffusion model maintains clinically plausible myocardial wall thickness variations.
Figure 11.
Pairwise feature relationship analysis for left ventricle (LV) morphological characteristics. This correlation matrix displays pairwise scatter plots and marginal distributions comparing real cardiac MRI data (blue/light gray) with generated synthetic data (purple/dark gray) across five key LV morphological features: area, perimeter, eccentricity, solidity, and roundness. The diagonal histograms show the distribution of individual features, while off-diagonal scatter plots reveal correlations between feature pairs. The substantial overlap between real and generated data points across all feature relationships demonstrates that the generative model successfully preserves the complex interdependencies between LV morphological characteristics, maintaining clinically realistic feature correlations in the synthetic cardiac images.
Table 1.
Quantitative comparison of cardiac MRI image generation methods.
Method | FID ↓ | SSIM ↑ | MS-SSIM ↑ |
---|---|---|---|
VAE Baseline | 325.26 | 0.596 ± 0.065 | 0.594 ± 0.090 |
WGAN-GP | 227.98 | 0.471 ± 0.083 | 0.470 ± 0.164 |
StyleGAN2-ADA | 117.70 | 0.406 ± 0.061 | 0.350 ± 0.072 |
Our Model | 77.78 | 0.720 ± 0.143 | 0.925 ± 0.069 |
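For readers unfamiliar with the SSIM column, the standard SSIM formula can be sketched over a whole image at once. This is a simplified illustration, not the evaluation code behind the table: published SSIM scores are normally averaged over local windows, whereas the hypothetical `global_ssim` below uses a single global window, with the conventional constants C1 = (0.01·L)² and C2 = (0.03·L)² for data range L.

```python
def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM between two equally sized, flattened images."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    c1 = (0.01 * data_range) ** 2   # stabilizes the luminance term
    c2 = (0.03 * data_range) ** 2   # stabilizes the contrast/structure term
    return ((2 * mean_x * mean_y + c1) * (2 * cov + c2)) / (
        (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2))

image = [0.1, 0.5, 0.9, 0.3]            # toy 2x2 image, flattened
inverted = [1.0 - v for v in image]     # same structure with contrast flipped
print(global_ssim(image, image), global_ssim(image, inverted))
```

An image compared with itself scores 1, and structurally opposite images score below 0, which is why higher values in the table are read as better structural preservation.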
Table 2.
Ablation study comparing different attention placement strategies in the cardiac MRI diffusion model. Asymmetric attention placement achieves the best image quality metrics (lowest FID, highest SSIM) while training faster than the uniform attention variant.
Architecture Variant | FID ↓ | SSIM ↑ | Training Time ↓ |
---|---|---|---|
Uniform Attention | 89.23 | 0.681 | 12.3 h |
No Attention | 95.47 | 0.634 | 7.1 h |
Asymmetric (Ours) | 77.78 | 0.720 | 8.2 h |
Table 3.
Comparison of pixel-level class distributions between real and generated images.
Structure | Real (%) | Generated (%) | Relative Diff. | p-Value |
---|---|---|---|---|
Left Ventricle | 1.52 ± 1.52 | 1.54 ± 1.54 | +1.1% | 0.054 |
Right Ventricle | 1.25 ± 1.25 | 1.13 ± 1.13 | −9.7% | 0.004 * |
Myocardium | 2.04 ± 2.04 | 2.01 ± 2.01 | −1.2% | 0.470 |
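The relative difference column is presumably computed as (generated − real) / real; the source does not state the formula, so this is an assumption. The short sketch below applies it to the rounded means shown in the table (the helper name `relative_diff_pct` is hypothetical). Values recomputed from rounded means can differ from the published column by a few tenths of a percentage point, which would be expected if the published figures were derived from unrounded means.

```python
def relative_diff_pct(real_mean, generated_mean):
    """Signed relative difference of generated vs. real, in percent."""
    return (generated_mean - real_mean) / real_mean * 100.0

# Rounded means from the table: (real %, generated %)
rows = {
    "Left Ventricle": (1.52, 1.54),
    "Right Ventricle": (1.25, 1.13),
    "Myocardium": (2.04, 2.01),
}
for name, (real_mean, generated_mean) in rows.items():
    print(f"{name}: {relative_diff_pct(real_mean, generated_mean):+.1f}%")
```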
Table 4.
Comparison of shape features between real and generated cardiac structures. Statistical comparisons performed using two-sample Kolmogorov–Smirnov test (n = 100 samples per method).
Structure | Feature | Real | Generated | Rel. Diff. | p-Value |
---|---|---|---|---|---|
LV | Area (pixels) | 996.5 ± 343.4 | 1007.8 ± 287.5 | +1.1% | 0.054 |
LV | Roundness | 0.91 ± 0.02 | 0.90 ± 0.03 | −0.5% | 0.702 |
LV | Eccentricity | 0.61 ± 0.08 | 0.55 ± 0.12 | −9.6% | 0.004 * |
LV | Solidity | 0.97 ± 0.01 | 0.97 ± 0.01 | −0.2% | 0.036 * |
RV | Area (pixels) | 781.4 ± 293.7 | 740.6 ± 154.0 | −5.2% | 0.008 * |
RV | Roundness | 0.17 ± 0.17 | 0.12 ± 0.03 | −26.7% | 0.023 * |
RV | Eccentricity | 0.62 ± 0.10 | 0.54 ± 0.09 | −12.5% | <0.001 * |
RV | Solidity | 0.46 ± 0.11 | 0.42 ± 0.07 | −7.8% | 0.025 * |
Myo | Area (pixels) | 1214.3 ± 709.9 | 1245.5 ± 570.5 | +2.6% | 0.217 |
Myo | Roundness | 0.66 ± 0.29 | 0.60 ± 0.14 | −9.2% | 0.619 |
Myo | Eccentricity | 0.74 ± 0.12 | 0.74 ± 0.12 | −0.2% | 0.600 |
Myo | Solidity | 0.80 ± 0.06 | 0.80 ± 0.05 | −0.1% | 0.199 |
Table 5.
Comparison of cardiac-specific metrics. Statistical comparisons performed using two-sample Kolmogorov–Smirnov test (n = 100 samples per method).
Metric | Real | Generated | Rel. Diff. | p-Value |
---|---|---|---|---|
LV/Myo Ratio | 0.84 ± 0.34 | 0.83 ± 0.27 | −1.4% | 0.908 |
RV/LV Ratio | 0.91 ± 0.41 | 0.79 ± 0.26 | −13.3% | 0.111 |
Myo Thickness (pixels) | 21.20 ± 9.01 | 19.91 ± 9.20 | −6.1% | <0.001 * |