Quantifying Structural Divergence Between Human and Diffusion-Based Generative Visual Compositions
Featured Application
Abstract
1. Introduction
- (a) Theoretical reframing of human–GenAI differentiation as a structural prior divergence problem rather than artifact-based detection.
- (b) Empirical evidence that negative space utilization constitutes a dominant modality-specific compositional signal.
- (c) Demonstration that structural anchoring and topological fragmentation metrics encode independent discriminative value.
- (d) Development of a reproducible and interpretable machine learning framework operating exclusively on low-level visual statistics.
2. Materials and Methods
2.1. Dataset Construction
2.1.1. Human-Designed Posters
- Curated (editorially featured) works: n = 60.
- Most Appreciated works: n = 20.
- Most Viewed works: n = 20.
- (a) Inclusion Criteria. Posters were included in the dataset if they satisfied all of the following conditions:
- Clear thematic relevance to a jazz festival, identifiable through title, description, tags, or visual composition.
- Presentation as a single finalized poster design (multi-page or multi-layout presentations were excluded).
- Minimum short-edge resolution of 1024 pixels to ensure reliable pixel-based computational analysis.
- Explicit identification as human-created (works labeled as AI-generated were excluded).
- Direct relevance to graphic design practice and poster design conventions.
- (b) Exclusion Criteria. The following materials were excluded from the dataset:
- Duplicate uploads or minor variations of the same poster.
- Mood boards, presentation layouts, or multi-design showcase pages.
- Posters embedded within mockups where the design area could not be clearly isolated.
- Non-poster outputs such as logos, UI designs, branding kits, or icon collections.
2.1.2. GenAI-Generated Posters
2.2. Image Preprocessing and Standardization
2.2.1. Geometric Normalization
2.2.2. Color Space Masking
2.3. Computational Visual Features
2.3.1. Structural Complexity Metrics
2.3.2. Color-Based Metrics
2.3.3. Compositional Metric
2.4. Statistical Analysis and Classification Framework
2.4.1. Statistical Testing
2.4.2. Random Forest Classification Framework
- Edge Density.
- Shannon Entropy.
- Colorfulness.
- Hue Entropy.
- Padding Ratio.
- Rule-of-Thirds Activation Score.
- Connected Component Density.
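As a minimal sketch, the seven descriptors listed above could be computed with OpenCV and NumPy along the following lines. The Canny thresholds, histogram bin counts, the block size underlying the padding-ratio proxy, and the rule-of-thirds radius are illustrative assumptions, not the authors' exact definitions; only the Hasler–Süsstrunk colorfulness formula [31] and the S ≤ 25 chromatic mask (Section 2.2.2) are taken from the text.

```python
import cv2
import numpy as np

def extract_features(img_bgr):
    """Sketch of the seven descriptors for a 1024x1024 preprocessed poster."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    h, w = gray.shape
    n_px = h * w

    # Edge density: fraction of Canny edge pixels (thresholds are assumptions).
    edges = cv2.Canny(gray, 100, 200)
    edge_density = np.count_nonzero(edges) / n_px

    # Shannon entropy of the grayscale intensity histogram (bits).
    p = np.bincount(gray.ravel(), minlength=256) / n_px
    shannon_entropy = -np.sum(p[p > 0] * np.log2(p[p > 0]))

    # Colorfulness after Hasler and Suesstrunk [31].
    b, g, r = cv2.split(img_bgr.astype(np.float64))
    rg, yb = r - g, 0.5 * (r + g) - b
    colorfulness = np.hypot(rg.std(), yb.std()) + 0.3 * np.hypot(rg.mean(), yb.mean())

    # Hue entropy over chromatic pixels only (S <= 25 excluded, Section 2.2.2),
    # normalized by the maximum entropy of the assumed 36-bin histogram.
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    hue = hsv[..., 0][hsv[..., 1] > 25]
    ph = np.histogram(hue, bins=36, range=(0, 180))[0] / max(hue.size, 1)
    hue_entropy = -np.sum(ph[ph > 0] * np.log2(ph[ph > 0])) / np.log2(36)

    # Padding ratio: share of 32x32 blocks with no edge response, an assumed
    # proxy for negative space (requires h and w divisible by 32).
    blocks = edges.reshape(h // 32, 32, w // 32, 32).max(axis=(1, 3))
    padding_ratio = np.mean(blocks == 0)

    # Rule-of-thirds activation: fraction of edge pixels lying within 10% of
    # the short edge of any of the four third-line intersections (radius assumed).
    ys, xs = np.nonzero(edges)
    radius2 = (0.10 * min(h, w)) ** 2
    near = np.zeros(ys.size, dtype=bool)
    for cy in (h / 3, 2 * h / 3):
        for cx in (w / 3, 2 * w / 3):
            near |= (ys - cy) ** 2 + (xs - cx) ** 2 <= radius2
    thirds_activation = near.mean() if ys.size else 0.0

    # Connected component density: edge-map components per megapixel.
    n_labels, _ = cv2.connectedComponents(edges)
    cc_density = (n_labels - 1) / (n_px / 1e6)   # minus the background label

    return np.array([edge_density, shannon_entropy, colorfulness,
                     hue_entropy, padding_ratio, thirds_activation, cc_density])
```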
- (i) The original five-feature configuration;
- (ii) A structural-only configuration including the two structural descriptors;
- (iii) A combined seven-feature configuration integrating all descriptors.
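Under the same assumptions, the three configurations can be compared with stratified five-fold cross-validation (Section 2.4.2). The Random Forest hyperparameters and the synthetic stand-in data below are placeholders for the real feature matrix, not values reported by the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Column indices follow the descriptor order of extract_features() above.
CONFIGS = {
    "original": [0, 1, 2, 3, 4],    # edge density ... padding ratio
    "structural": [5, 6],           # rule-of-thirds activation, component density
    "combined": list(range(7)),
}

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))       # stand-in for the extracted feature matrix
y = rng.integers(0, 2, size=200)    # 0 = human, 1 = GenAI

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, cols in CONFIGS.items():
    clf = RandomForestClassifier(n_estimators=500, random_state=42)
    auc = cross_val_score(clf, X[:, cols], y, cv=cv, scoring="roc_auc")
    print(f"{name}: ROC-AUC = {auc.mean():.3f} +/- {auc.std():.3f}")
```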
2.4.3. Feature Importance Analysis
3. Results
3.1. Compositional Separation (Padding Ratio)
3.2. Structural Complexity Differences
3.3. Chromatic Organization
3.4. Multidimensional Feature Space (PCA)
3.5. Random Forest Classification Performance
3.6. Permutation-Based Feature Importance
3.7. Feature Ablation Analysis
3.8. SHAP-Based Model Interpretability
3.9. Representative Best-, Typical-, and Worst-Case Examples
3.10. Structural Robustness and Feature Family Ablation Analysis
3.11. Comparative Evaluation with Alternative Traditional Classifiers
4. Discussion
4.1. Structural Implications
4.2. Interpretation in the Context of Previous Research and Design Theory
4.3. Multivariate Separability and Model Behavior
4.4. Broader Implications and Future Research Directions
4.5. Limitations and Scope of the Study
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
| Abbreviation | Definition |
|---|---|
| AI | Artificial Intelligence |
| GenAI | Generative Artificial Intelligence |
| PCA | Principal Component Analysis |
| RF | Random Forest |
| ROC | Receiver Operating Characteristic |
| AUC | Area Under the Curve |
| CV | Cross-Validation |
| HSV | Hue–Saturation–Value (color space) |
| RGB | Red, Green, Blue (color space) |
| SD | Standard Deviation |
References
- Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 10684–10695. [Google Scholar] [CrossRef]
- Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv 2022, arXiv:2204.06125. [Google Scholar] [CrossRef]
- Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239. [Google Scholar] [CrossRef]
- Floridi, L.; Cowls, J.; Beltrametti, M.; Chatila, R.; Chazerand, P.; Dignum, V.; Luetge, C.; Madelin, R.; Pagallo, U.; Rossi, F. AI4People—An Ethical Framework for a Good AI Society. Minds Mach. 2018, 28, 689–707. [Google Scholar] [CrossRef] [PubMed]
- Dwivedi, Y.K.; Kshetri, N.; Hughes, L.; Slade, E.L.; Jeyaraj, A.; Kar, A.K.; Baabdullah, A.M.; Koohang, A.; Raghavan, V.; Ahuja, M. So what if ChatGPT wrote it? Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int. J. Inf. Manag. 2023, 71, 102642. [Google Scholar] [CrossRef]
- Elgammal, A.; Liu, B.; Elhoseiny, M.; Mazzone, M. CAN: Creative Adversarial Networks, Generating Art by Learning about Styles and Deviating from Style Norms. In Proceedings of the 8th International Conference on Computational Creativity, Atlanta, GA, USA, 19–23 June 2017; pp. 96–103. [Google Scholar] [CrossRef]
- Hertzmann, A. Can Computers Create Art? Arts 2018, 7, 18. [Google Scholar] [CrossRef]
- Cetinic, E.; She, J. Understanding and Creating Art with AI: Review and Outlook. ACM Trans. Multimed. Comput. Commun. Appl. 2022, 18, 1–22. [Google Scholar] [CrossRef]
- Wang, S.-Y.; Wang, O.; Zhang, R.; Owens, A.; Efros, A.A. CNN-Generated Images Are Surprisingly Easy to Spot… for Now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8695–8704. [Google Scholar] [CrossRef]
- Gragnaniello, D.; Cozzolino, D.; Marra, F.; Poggi, G.; Verdoliva, L. Are GAN-Generated Images Easy to Detect? A Critical Analysis of the State-of-the-Art. arXiv 2021, arXiv:2104.02617. [Google Scholar] [CrossRef]
- Verdoliva, L. Media Forensics and DeepFakes: An Overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932. [Google Scholar] [CrossRef]
- Xuan, X.; Peng, B.; Wang, W.; Dong, J. On the Generalization of GAN Image Forensics. arXiv 2019, arXiv:1902.11153. [Google Scholar] [CrossRef]
- Li, W.; He, P.; Li, H.; Wang, H.; Zhang, R. Detection of GAN-Generated Images by Estimating Artifact Similarity. IEEE Signal Process. Lett. 2021, 28, 2137–2141. [Google Scholar] [CrossRef]
- Mahara, A.; Rishe, N. Methods and Trends in Detecting AI-Generated Images: A Comprehensive Review. Comput. Sci. Rev. 2026, 60, 100908. [Google Scholar] [CrossRef]
- Park, D.; Na, H.; Choi, D. Performance Comparison and Visualization of AI-Generated-Image Detection Methods. IEEE Access 2024, 12, 62609–62627. [Google Scholar] [CrossRef]
- Bird, J.J.; Lotfi, A. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access 2024, 12, 26896–26909. [Google Scholar] [CrossRef]
- Velásquez-Salamanca, D.; Martín-Pascual, M.Á.; Andreu-Sánchez, C. Interpretation of AI-Generated vs. Human-Made Images. J. Imaging 2025, 11, 227. [Google Scholar] [CrossRef]
- Redies, C. A Universal Model of Esthetic Perception Based on the Sensory Coding of Natural Stimuli. Spat. Vis. 2007, 21, 97–117. [Google Scholar] [CrossRef] [PubMed]
- Rigau, J.; Feixas, M.; Sbert, M. Informational Aesthetics Measures. IEEE Comput. Graph. Appl. 2008, 28, 24–34. [Google Scholar] [CrossRef]
- Machado, P.; Romero, J.; Nadal, M.; Santos, A.; Correia, J.; Carballal, A. Computerized Measures of Visual Complexity. Acta Psychol. 2015, 160, 43–57. [Google Scholar] [CrossRef]
- Datta, R.; Joshi, D.; Li, J.; Wang, J.Z. Studying Aesthetics in Photographic Images Using a Computational Approach. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 288–301. [Google Scholar] [CrossRef]
- Sigaki, H.Y.D.; Perc, M.; Ribeiro, H.V. History of Art Paintings through the Lens of Entropy and Complexity. Proc. Natl. Acad. Sci. USA 2018, 115, E8585–E8594. [Google Scholar] [CrossRef]
- Çınar Kalenderoğlu, S.; Demiröz, M. Integrating Text-to-Image AI in Architectural Design Education: Analytical Perspectives from a Studio Experience. J. Des. Studio 2024, 6, 247–258. [Google Scholar] [CrossRef]
- Koç, M.; As, İ. Evaluating the Aesthetic Quality in Computer-Generated Renderings via a Comparative Analysis. IDA Int. Des. Art J. 2025, 7, 256–268. [Google Scholar]
- Arnheim, R. Art and Visual Perception: A Psychology of the Creative Eye; University of California Press: Berkeley, CA, USA, 1974. [Google Scholar]
- Lidwell, W.; Holden, K.; Butler, J. Universal Principles of Design; Rockport Publishers: Beverly, MA, USA, 2010. [Google Scholar]
- Lupton, E.; Phillips, J.C. Graphic Design: The New Basics; Princeton Architectural Press: New York, NY, USA, 2015. [Google Scholar]
- Oppenlaender, J. A taxonomy of prompt modifiers for text-to-image generation. Behav. Inf. Technol. 2023, 43, 3763–3776. [Google Scholar] [CrossRef]
- Berlyne, D.E. Aesthetics and Psychobiology; Appleton-Century-Crofts: New York, NY, USA, 1971. [Google Scholar]
- Oliva, A.; Torralba, A. Building the Gist of a Scene: The Role of Global Image Features in Recognition. Prog. Brain Res. 2006, 155, 23–36. [Google Scholar] [CrossRef]
- Hasler, D.; Süsstrunk, S. Measuring Colourfulness in Natural Images. In Proceedings of the SPIE Human Vision and Electronic Imaging VIII, Santa Clara, CA, USA, 20–24 January 2003; pp. 87–95. [Google Scholar] [CrossRef]
- Shapiro, S.S.; Wilk, M.B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika 1965, 52, 591–611. [Google Scholar] [CrossRef]
- Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1988. [Google Scholar]
- Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]

| Methodology Reference | Evaluated Characteristics | Detection Analysis Framework | Interpretability | Main Limitation | Distinguishing Position Relative to the Present Study |
|---|---|---|---|---|---|
| Deep neural artifact-based detection [9,10,13] | Pixel-level anomalies, frequency-domain irregularities, and architecture-specific synthetic fingerprints. | CNN-based detectors and deep feature embeddings. | Low (black-box). | Strong dependence on learned representations; limited transparency; may be sensitive to changes in generative architectures. | The present study shifts the focus from artifact detection to interpretable structural divergence analysis. |
| Explainable or feature-based AI-image identification [15,16,17] | Synthesized visual features, explainable indicators, and image-level differences between AI-generated and human-made content. | Explainable AI frameworks and classifier-oriented visual analysis. | Moderate. | Primarily classification-oriented; limited emphasis on controlled compositional structure and thematic matching. | The present study emphasizes controlled thematic comparison and interpretable compositional quantification rather than classification alone. |
| Design- and aesthetics-oriented comparative studies [23,24] | Aesthetic quality, compositional judgment, and visually perceived differences in computer-generated imagery or AI-assisted design contexts. | Comparative design analysis and perceptual/aesthetic evaluation. | Moderate to high. | Not primarily designed for human–AI structural discrimination or reproducible compositional measurement. | The present study introduces a formalized quantification framework for structural and chromatic divergence in a controlled poster dataset. |
| Proposed method (this study) | Structural and compositional priors, including padding ratio (negative space), hue entropy, edge density, rule-of-thirds activation, and connected component density. | Transparent machine-learning framework (Random Forest) with SHAP, permutation importance, ablation analysis, and supplementary classifier sensitivity testing. | High (interpretable/white-box oriented). | Restricted to a controlled thematic domain and a single publicly accessible text-to-image generation platform; excludes higher-order semantic and OCR-based descriptors. | Reframes human–AI differentiation as a structural divergence problem and quantifies domain- and platform-constrained compositional differences using interpretable descriptors. |
| Parameter | Value Used | Section | Rationale | Expected Effect if Changed |
|---|---|---|---|---|
| Cropping strategy | Center-focused cropping | Section 2.2.1 | Reduces peripheral framing inconsistencies and edge artifacts while preserving the main compositional field. | More aggressive cropping may remove meaningful layout information; no cropping may increase framing-induced variability. |
| Resize resolution | 1024 × 1024 pixels | Section 2.2.1 | Balances sufficient spatial detail for feature extraction with comparability across heterogeneous poster formats. | Lower resolutions may suppress fine structural transitions and color variation; higher resolutions may increase computational cost without proportionate benefit. |
| Interpolation method | Bilinear interpolation | Section 2.2.1 | Provides stable geometric normalization while minimizing abrupt resampling artifacts. | Alternative interpolation methods may slightly alter edge transitions and fine-grained texture responses. |
| Color space conversion | RGB → HSV | Section 2.2.2 | Separates chromatic information from intensity and supports hue-based analysis more directly than RGB space. | Retaining RGB would reduce the specificity of hue-based metrics and complicate chromatic masking. |
| Saturation threshold (hue) | S ≤ 25 excluded | Section 2.2.2 | Removes near-achromatic regions so that hue entropy reflects meaningful chromatic organization rather than background noise. | Stricter or looser thresholds may alter the sensitivity of hue entropy to low-saturation background regions. |
| Edge detector | Canny edge detection | Section 2.3 | Provides a standardized estimate of local structural transition density. | Different edge detectors or thresholds may change the absolute edge density values and sensitivity to fine boundaries. |
| Significance threshold | α = 0.05 | Section 2.4.1 | Standard significance criterion for inferential comparison. | Stricter thresholds reduce false positives but may decrease sensitivity to moderate effects. |
| Cross-validation folds | 5 folds | Section 2.4.2 | Appropriate balance between robustness and stability for a moderate-sized dataset. | Fewer folds may increase variance; more folds may raise computational cost and increase split sensitivity. |
| Bootstrap iterations | 1000 | Section 2.4.3 | Provides a stable empirical estimate of ROC-AUC confidence intervals. | Fewer iterations may yield less stable interval estimates; more iterations increase computation with diminishing practical gain. |
| Primary classifier | Random Forest | Section 2.4.2 | Supports nonlinear interactions, moderate-sized data, and interpretable feature-importance analysis. | Alternative classifiers may yield different performance profiles and interpretability characteristics. |
| Supplementary SVM kernel | RBF | Section 3.11 | Captures nonlinear class boundaries in the original five-feature configuration. | A linear kernel may reduce flexibility if class separation is nonlinear. |
| Supplementary k-NN neighbors | k = 5 | Section 3.11 | Standard local-neighborhood setting for sensitivity comparison. | Smaller k may increase sensitivity to noise; larger k may oversmooth local class structure. |
| Structural descriptor set | Rule-of-Thirds Activation, Connected Component Density | Section 2.3 and Section 3.9 | Extends the framework beyond basic chromatic and density features to capture anchoring and fragmentation. | Excluding these descriptors reduces structural interpretability; including additional descriptors may complicate separability. |
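To illustrate the preprocessing parameters in the table above, the following is a minimal sketch consistent with Sections 2.2.1 and 2.2.2. The square center crop is an assumption about how "center-focused cropping" reconciles heterogeneous poster aspect ratios:

```python
import cv2

def preprocess(path):
    """Center-crop, bilinear-resize to 1024x1024, and mask near-achromatic
    pixels, following the parameters of Sections 2.2.1 and 2.2.2."""
    img = cv2.imread(path)                       # BGR, uint8
    h, w = img.shape[:2]
    side = min(h, w)                             # center-focused square crop (assumed)
    y0, x0 = (h - side) // 2, (w - side) // 2
    img = img[y0:y0 + side, x0:x0 + side]
    img = cv2.resize(img, (1024, 1024), interpolation=cv2.INTER_LINEAR)

    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)   # RGB -> HSV separation
    chromatic_mask = hsv[..., 1] > 25            # exclude S <= 25 (near-achromatic)
    return img, hsv, chromatic_mask
```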
| Metric | Human (Mean ± SD) | AI (Mean ± SD) | p-Value | Rank-Biserial r |
|---|---|---|---|---|
| Edge Density | 0.039 ± 0.025 | 0.030 ± 0.030 | <0.001 | −0.350 |
| Shannon Entropy | 3.577 ± 1.269 | 3.631 ± 1.068 | 0.925 | −0.008 |
| Colorfulness | 54.025 ± 30.592 | 35.963 ± 18.304 | <0.001 | −0.367 |
| Hue Entropy | 0.839 ± 0.150 | 0.998 ± 0.126 | <0.001 | +0.500 |
| Padding Ratio | 0.505 ± 0.179 | 0.171 ± 0.297 | <0.001 | −0.661 |
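The p-values and effect sizes in this table can be reproduced with a nonparametric two-sample test; the sketch below assumes the Mann–Whitney U test, which is consistent with the reported rank-biserial r (the sign of r depends on which group is passed first):

```python
from scipy.stats import mannwhitneyu

def compare_groups(human, ai):
    """Two-sided Mann-Whitney U test with rank-biserial effect size.
    The sign convention of r depends on group ordering."""
    u, p = mannwhitneyu(human, ai, alternative="two-sided")
    r = 2.0 * u / (len(human) * len(ai)) - 1.0   # rank-biserial correlation in [-1, 1]
    return u, p, r
```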
| Analysis Type | Metric | Value | Interpretation |
|---|---|---|---|
| Cross-validation stability | Mean AUC | 0.991 ± 0.004 | High performance stability |
| OOF Bootstrap | AUC (95% CI) | 0.99 (0.978–0.999) | Robust modality separation |
| Feature Ablation | AUC (without padding) | 0.903 | Multi-feature persistence |
| Distributional Divergence | KS Statistic | 0.76 (p < 10⁻²⁸) | Strong structural separation |
| Effect Size | Cohen’s d | 1.365 | Extremely large effect |
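A sketch of the robustness quantities summarized in this table, assuming that the bootstrap resamples out-of-fold predicted probabilities (`y_oof`) and that Cohen's d [33] uses the pooled standard deviation; the KS statistic is the standard two-sample test applied to a single feature:

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.metrics import roc_auc_score

def bootstrap_auc_ci(y_true, y_oof, n_boot=1000, seed=0):
    """95% bootstrap CI for ROC-AUC over out-of-fold probabilities.
    y_true: NumPy array of 0/1 labels; y_oof: out-of-fold probabilities."""
    rng = np.random.default_rng(seed)
    aucs = []
    n = len(y_true)
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if y_true[idx].min() == y_true[idx].max():   # skip one-class resamples
            continue
        aucs.append(roc_auc_score(y_true[idx], y_oof[idx]))
    return np.percentile(aucs, [2.5, 97.5])

def cohens_d(a, b):
    """Cohen's d [33] with pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * np.var(a, ddof=1) + (nb - 1) * np.var(b, ddof=1))
                     / (na + nb - 2))
    return (np.mean(a) - np.mean(b)) / pooled

# Distributional divergence on a single feature (e.g., padding ratio):
# ks_stat, ks_p = ks_2samp(padding_human, padding_ai)
```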
| Feature Group | Accuracy | ROC-AUC |
|---|---|---|
| Original | 0.955 | 0.9910 |
| Structural | 0.815 | 0.8820 |
| Combined | 0.955 | 0.9935 |
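Complementing the ablation results above, permutation importance (Sections 2.4.3 and 3.6) can be estimated with scikit-learn. The feature names, held-out split, and stand-in data below are illustrative assumptions:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

FEATURES = ["edge_density", "shannon_entropy", "colorfulness", "hue_entropy",
            "padding_ratio", "thirds_activation", "cc_density"]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))       # stand-in for the extracted feature matrix
y = rng.integers(0, 2, size=200)    # 0 = human, 1 = GenAI

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=42)
clf = RandomForestClassifier(n_estimators=500, random_state=42).fit(X_tr, y_tr)

# Mean ROC-AUC drop over 100 shuffles of each feature column on held-out data.
result = permutation_importance(clf, X_te, y_te, scoring="roc_auc",
                                n_repeats=100, random_state=42)
for name, mean, std in zip(FEATURES, result.importances_mean, result.importances_std):
    print(f"{name}: {mean:.3f} +/- {std:.3f}")
```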
| Classifier | Feature Set | Mean Accuracy | Mean ROC-AUC |
|---|---|---|---|
| Random Forest | Original 5 features | 0.955 | 0.991 |
| SVM (RBF) | Original 5 features | 0.970 | 0.996 |
| k-NN (k = 5) | Original 5 features | 0.955 | 0.991 |
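The classifier sensitivity comparison above could be reproduced along the following lines. Standardizing inside a pipeline is an assumption (SVM and k-NN are scale-sensitive, while Random Forest is not), and the stand-in data again substitutes for the real five-feature matrix:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X5 = rng.normal(size=(200, 5))      # stand-in for the five original features
y = rng.integers(0, 2, size=200)

MODELS = {
    "Random Forest": RandomForestClassifier(n_estimators=500, random_state=42),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True)),
    "k-NN (k = 5)": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in MODELS.items():
    scores = cross_validate(model, X5, y, cv=cv, scoring=["accuracy", "roc_auc"])
    print(f"{name}: acc = {scores['test_accuracy'].mean():.3f}, "
          f"auc = {scores['test_roc_auc'].mean():.3f}")
```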