Article

Quantifying Structural Divergence Between Human and Diffusion-Based Generative Visual Compositions

by Necati Vardar 1,* and Çağrı Gümüş 2

1 Department of Mechatronics Engineering, Faculty of Engineering and Natural Sciences, KTO Karatay University, Konya 42020, Türkiye
2 Graphic Design Department, Faculty of Fine Arts and Design, KTO Karatay University, Konya 42020, Türkiye
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(8), 3669; https://doi.org/10.3390/app16083669
Submission received: 4 March 2026 / Revised: 30 March 2026 / Accepted: 2 April 2026 / Published: 9 April 2026
(This article belongs to the Section Computing and Artificial Intelligence)

Featured Application

The proposed computational machine learning framework can be applied to structural pattern discrimination problems in image-based datasets where spatial and informational features require quantitative separation. By combining entropy-based descriptors with cross-validated classification, the methodology enables interpretable modeling of structured visual systems. The approach may support analytical workflows in digital design research, visual communication studies, and computational image analysis contexts where reproducible structural characterization is required.

Abstract

The rapid proliferation of text-to-image generative systems has transformed visual content production, yet the structural characteristics embedded in their compositional outputs remain insufficiently understood. Rather than approaching human–AI differentiation as a purely classification problem, this study investigates whether a controlled set of AI-generated and human-designed posters exhibits measurable structural divergence under thematically matched conditions. A dataset of jazz festival posters was analyzed using interpretable geometric and information-theoretic descriptors, including spatial density (padding ratio), edge density, chromatic dispersion, and entropy-based measures. Instead of relying on deep neural detection architectures, we employed a transparent machine-learning framework to examine intrinsic structural separability within feature space. Results demonstrated highly stable group separation (ROC-AUC = 0.99; 95% CI: 0.978–0.999) under cross-validated evaluation. Distributional analysis further revealed a pronounced divergence in spatial density allocation (Kolmogorov–Smirnov statistic = 0.76, p < 10⁻²⁸), accompanied by a very large effect size (Cohen’s d = 1.365). While padding ratio emerged as the dominant discriminative factor, additional entropy- and chromatic-based descriptors contributed to group separation even when spatial density was excluded (AUC = 0.903). These findings indicate that AI-generated and human-designed posters can diverge in negative space allocation and chromatic organization under controlled thematic and platform-specific conditions. The study contributes to the explainable analysis of generative visual systems by reframing human–AI differentiation as a structural divergence problem grounded in interpretable image statistics rather than as a model-specific artifact detection task.

1. Introduction

The rapid advancement of Generative Artificial Intelligence (GenAI), particularly diffusion-based and large-scale text-to-image architectures, has fundamentally transformed visual content production. Latent diffusion systems and hierarchical text-conditional generation models now produce high-resolution images exhibiting strong perceptual realism and stylistic consistency [1,2,3]. While this technological progress expands creative capacity, it simultaneously raises a foundational technical question: Do generative systems encode measurable structural priors that systematically differ from human-imposed compositional constraints?
Existing scholarship addressing AI-generated imagery has predominantly focused on ethical implications, authorship, and societal impact [4,5], or on perceptual and stylistic evaluation within computational creativity research [6,7,8]. Within multimedia forensics, detection approaches have largely relied on convolutional neural networks trained to identify architecture-specific artifacts or frequency-domain irregularities [9,10,11]. Although such models often achieve high classification performance, their reliance on model-dependent fingerprints may limit generalizability across evolving generative architectures [12,13]. Recent discussions therefore emphasize the need for model-agnostic and interpretable detection strategies that do not depend on opaque deep feature embeddings [13]. Recent review and benchmarking studies have further highlighted the rapid diversification of AI-generated image detection methods, while also emphasizing the importance of interpretability, comparative evaluation, and robustness across detection settings [14,15,16]. However, most existing methods remain embedded within high-dimensional learned representations, offering limited theoretical insight into the structural mechanisms differentiating human and generative visual production. Although recent studies have begun to address explainable identification and interpretation of AI-generated versus human-made imagery, most remain primarily classification-oriented and do not explicitly frame the problem as one of structural compositional divergence [16,17].
Parallel to generative modeling research, computational aesthetics has long demonstrated that visual artifacts encode measurable structural regularities. Information-theoretic measures such as Shannon entropy and chromatic dispersion have been used to approximate visual complexity and perceptual organization [18,19,20]. Edge-density statistics and spatial proportion metrics have similarly been employed to quantify compositional balance and structural detail intensity [21,22]. These interpretable descriptors provide a theoretically grounded bridge between low-level image statistics and higher-level perceptual attributes. Recent design-oriented studies have likewise examined how computational evaluation and AI-supported image generation interact with aesthetic judgment, compositional reasoning, and visual decision-making in applied creative contexts [23,24]. Despite this established foundation, the systematic application of interpretable compositional descriptors to differentiate contemporary GenAI outputs from human-designed graphic compositions remains limited. Most modality detection studies focus on artifact-level irregularities or deep representation learning rather than intrinsic compositional priors.
From a theoretical standpoint, human graphic design is guided by explicit compositional constraints such as hierarchical anchoring, modular segmentation, grid alignment, and deliberate negative space allocation [25,26,27]. Whitespace functions as an intentional structural variable contributing to visual clarity and perceptual equilibrium. In contrast, diffusion-based generative models iteratively optimize denoising objectives to approximate learned pixel distributions [1,3]. These architectures prioritize distributional plausibility and global coherence rather than explicit enforcement of compositional economy.
Consequently, generative systems may implicitly embed spatial density biases or diffuse chromatic dispersion patterns reflective of statistical priors rather than intentional layout constraints. This divergence suggests that human and generative design processes may operate under distinct structural priors that are computationally measurable independently of semantic content or architecture-specific artifacts.
Rather than proposing a new deep detection architecture, the present study contributes a methodological reframing of human–AI visual differentiation as a structural divergence problem. Its contribution lies in combining interpretable low-level and mid-level compositional descriptors, controlled thematic matching, and explainability-oriented multivariate analysis to quantify domain- and platform-constrained differences in visual organization. In this sense, the study is intended not as a benchmark for universal AI-image detection, but as an interpretable quantification framework for examining how human-designed and AI-generated visual compositions differ under controlled conditions.
The decision to focus on jazz festival posters was methodological rather than arbitrary. This domain provides a visually rich yet semantically coherent design space in which typography, negative space, color organization, and compositional hierarchy are central to poster construction. By constraining both groups to a single thematic category, we aimed to reduce between-sample heterogeneity arising from divergent communicative purposes and to isolate structural differences more plausibly linked to production mechanisms. Accordingly, the present study should be interpreted as a controlled proof-of-concept within a thematically stabilized design setting rather than as a universal claim spanning all visual domains.
We hypothesize that the examined AI-generated compositions exhibit systematic spatial density and chromatic dispersion regularities that differ from human-imposed compositional constraints, and that these differences can be detected using interpretable geometric and information-theoretic descriptors without reliance on semantic or deep feature representations. This hypothesis is also aligned with recent efforts to interpret AI-generated imagery not only through artifact detection performance but also through more transparent analysis of visual and structural regularities [15,17]. To test this hypothesis, the present study formulates human-versus-GenAI differentiation as a structural pattern discrimination problem within a multidimensional descriptor space. Rather than employing deep neural detection architectures, we construct a transparent computational framework grounded in entropy-based, chromatic, compositional, and structural anchoring metrics. Feature-family ablation, permutation analysis, and SHAP-based interpretability are employed to determine whether modality separability arises from isolated descriptors or coordinated compositional priors.
The primary contributions of this work are:
(a) Theoretical reframing of human–GenAI differentiation as a structural prior divergence problem rather than artifact-based detection.
(b) Empirical evidence that negative space utilization constitutes a dominant modality-specific compositional signal.
(c) Demonstration that structural anchoring and topological fragmentation metrics encode independent discriminative value.
(d) Development of a reproducible and interpretable machine learning framework operating exclusively on low-level visual statistics.
By isolating intrinsic geometric and chromatic properties under controlled thematic conditions, this study contributes to explainable AI in creative systems, computational authorship attribution, and structural bias analysis in generative pipelines.
To clarify the methodological position of the present study, Table 1 compares representative approaches used to distinguish human-designed and AI-generated visual compositions. As shown, many existing approaches remain centered on artifact-oriented detection or classifier-driven identification, often relying on deep learned representations with limited interpretability. More recent studies have improved explainability and comparative analysis, yet most still emphasize classification performance rather than controlled quantification of compositional divergence. In contrast, the present study focuses on interpretable structural and chromatic descriptors under controlled thematic matching, thereby reframing human–AI differentiation as a structural divergence problem rather than as a purely artifact-centered detection task.

2. Materials and Methods

This study compares human-designed and AI-generated jazz festival posters using a standardized computational analysis pipeline. The methodological framework consists of dataset construction, preprocessing, computational feature extraction, statistical testing, and supervised classification.

2.1. Dataset Construction

This study was designed as a controlled comparative analysis between human-designed and AI-generated posters within a fixed thematic domain. To ensure internal validity and reduce stylistic variability arising from divergent design purposes, both datasets were restricted to a single concept: jazz festival posters. This domain was selected because it offers a visually rich yet semantically coherent design space in which typography, negative space, color organization, and compositional hierarchy are all central to poster construction. This restriction was intended to create a controlled proof-of-concept setting that would reduce confounding variation introduced by divergent communicative purposes. Concept stabilization was implemented to ensure that measurable differences between groups could be attributed to production mechanisms rather than thematic heterogeneity. The use of thematically constrained visual datasets has been shown to improve interpretability and internal consistency in computational aesthetics research [20,21].
Two balanced datasets were constructed, each consisting of n = 100 posters, resulting in a total of 200 visual samples subjected to computational analysis. Equal sample sizes were maintained to prevent statistical bias caused by group imbalance and to support reliable comparative modeling. Balanced sampling strategies are widely recommended in comparative visual feature analysis to enhance statistical robustness and classification stability [19]. Dataset construction followed explicitly defined selection criteria to ensure transparency and reproducibility.

2.1.1. Human-Designed Posters

The human-designed dataset was constructed using publicly accessible works obtained from the digital design platform Behance (www.behance.net) (accessed on 4 February 2026), which hosts contemporary professional portfolios in graphic design. Behance was selected because it provides access to portfolio-based design outputs reflecting current design practices rather than commercially optimized stock templates. To reduce sampling bias and avoid overrepresentation of a single visibility category, a stratified sampling strategy was implemented. Posters were retrieved using the search query “Jazz Festival Poster” within the Creative Fields → Graphic Design category. The following strata were proportionally sampled:
  • Curated (editorially featured) works: n = 60.
  • Most Appreciated works: n = 20.
  • Most Viewed works: n = 20.
Stratified sampling approaches have been recommended in large-scale visual dataset studies to ensure stylistic diversity while maintaining conceptual consistency [19].
(a) Inclusion Criteria
Posters were included in the dataset if they satisfied all the following conditions:
  • Clear thematic relevance to a jazz festival, identifiable through title, description, tags, or visual composition.
  • Presentation as a single finalized poster design (multi-page or multi-layout presentations were excluded).
  • Minimum short-edge resolution of 1024 pixels to ensure reliable pixel-based computational analysis.
  • Explicit identification as human-created (works labeled as AI-generated were excluded).
  • Direct relevance to graphic design practice and poster design conventions.
(b) Exclusion Criteria
The following materials were excluded from the dataset:
  • Duplicate uploads or minor variations of the same poster.
  • Mood boards, presentation layouts, or multi-design showcase pages.
  • Posters embedded within mockups where the design area could not be clearly isolated.
  • Non-poster outputs such as logos, UI designs, branding kits, or icon collections.
All selected samples were documented with source metadata to ensure traceability and methodological transparency.

2.1.2. GenAI-Generated Posters

The AI-generated dataset was constructed using a publicly accessible text-to-image generation platform referred to in this study as “Nano Banana.” While proprietary implementation details are not publicly disclosed, the system appears to follow generation principles commonly associated with large-scale contemporary text-to-image models described in the prior literature [1,2,3].
Image generation was conducted in February 2026 under controlled prompting conditions to ensure thematic and compositional consistency. To maintain direct comparability with the human-designed dataset, the same thematic constraint, namely jazz festival poster design, was strictly enforced throughout the generation process. A total of n = 100 AI-generated posters were retained to match the human-designed sample size and preserve statistical balance across groups. The final set was assembled under a fixed procedural protocol to support thematic coherence and class comparability rather than to optimize classification outcomes.
A structured prompting protocol was implemented to approximate professional poster design conventions. Prompt engineering has been shown to influence stylistic and compositional characteristics in text-to-image systems [28]. Accordingly, prompts were standardized to include consistent semantic descriptors (e.g., “jazz festival poster, modern graphic design, typography-centered composition, minimalist geometric forms, limited color palette, high contrast, balanced layout, print-ready design”). Secondary stylistic modifiers were rotated systematically to introduce controlled variation while preserving conceptual coherence. For each prompt configuration, multiple candidate outputs were generated. To reduce selection bias, a single image per prompt configuration was retained based on thematic relevance and minimum poster-like visual coherence, without manual editing, cropping, retouching, or compositional modification. The curation procedure was intended to establish a comparable poster dataset rather than to optimize classification performance. Importantly, no samples were selected or excluded on the basis of feature values, expected classification difficulty, model predictions, or similarity to human-designed examples. No adjustments were made to spatial density, color distribution, or layout arrangement after generation.
All selected AI-generated posters were exported at native resolution and subsequently subjected to the identical preprocessing and normalization procedures described in Section 2.2. The model configuration and prompting framework were kept constant throughout the data collection phase to ensure experimental consistency. Because the platform is proprietary, full architectural transparency and version-level technical reproducibility are not publicly available. Reproducibility in the present study is therefore provided at the procedural level through explicit reporting of the thematic constraint, prompting strategy, generation period, and post-generation handling rules. Although exact output replication may be affected by potential external system updates beyond experimental control, the generation workflow itself remained fixed during dataset construction to preserve internal validity.
The overall workflow of the proposed measurement and testing framework is summarized in Figure 1.

2.2. Image Preprocessing and Standardization

Computational aesthetic analysis is highly sensitive to resolution, aspect ratio, framing, and color distribution. Without proper standardization, extracted numerical features may reflect technical presentation differences rather than intrinsic structural properties of the design itself. Therefore, a controlled preprocessing protocol was implemented prior to feature extraction to ensure comparability across human-designed and AI-generated posters. Similar normalization procedures are widely recommended in computational image analysis to minimize measurement bias [20,21]. All preprocessing steps were applied identically to both datasets to maintain methodological symmetry.

2.2.1. Geometric Normalization

Poster designs naturally vary in aspect ratio, layout format, and resolution. To reduce geometric heterogeneity while preserving compositional integrity, a two-step normalization procedure was applied.
First, center-focused cropping was performed to minimize peripheral framing inconsistencies. This decision was taken to reduce edge artifacts and scaling distortions that may artificially inflate structural metrics such as edge density. Previous studies in visual perception and computational aesthetics indicate that uncontrolled framing differences can significantly influence low-level structural measurements [19,20].
Following cropping, all images were resized to a standardized resolution of 1024 × 1024 pixels using bilinear interpolation. Resolution normalization ensures that pixel-dependent metrics (e.g., entropy, edge density) remain directly comparable across samples and are not confounded by scaling differences. The standardized resolution of 1024 × 1024 pixels was selected as a practical balance between preserving sufficient spatial detail for feature extraction and ensuring comparability across heterogeneous poster formats. Lower resolutions may suppress fine structural transitions and color variation, whereas substantially higher resolutions would increase computational cost without proportionate methodological benefit for the present descriptor set. Accordingly, the adopted normalization setting was chosen to support stable measurement while minimizing resolution-induced variability across samples.
A methodological distinction was made between content-based metrics and spatial distribution metrics. Structural features such as Edge Density, Shannon Entropy, and color-based measures were computed on normalized content regions. In contrast, spatial distribution measurements (e.g., padding ratio) were derived from the original, uncropped compositions to preserve information related to negative space usage. This separation prevents normalization artifacts from influencing spatial density analyses and allows negative space behavior to emerge as an independent structural variable.
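A minimal sketch of the two-step geometric normalization, assuming a centered square crop (the exact crop margins are not reported in the text) and OpenCV's bilinear resize:

```python
import cv2
import numpy as np

def normalize_geometry(img: np.ndarray, size: int = 1024) -> np.ndarray:
    """Center-crop to a square, then resize to size x size with bilinear
    interpolation (Section 2.2.1). Per the distinction above, this output
    feeds the content-based metrics only; padding ratio is measured on
    the original, uncropped composition."""
    h, w = img.shape[:2]
    side = min(h, w)
    top, left = (h - side) // 2, (w - side) // 2
    cropped = img[top:top + side, left:left + side]
    return cv2.resize(cropped, (size, size), interpolation=cv2.INTER_LINEAR)
```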

2.2.2. Color Space Masking

Color-based metrics are particularly sensitive to achromatic regions (pure white, black, or grayscale areas), which may distort hue distribution statistics. To ensure that hue entropy reflects meaningful chromatic organization rather than background noise, a saturation-based masking procedure was implemented.
All images were converted from RGB to HSV color space. Pixels with saturation values below a predefined threshold (S ≤ 25) were excluded from hue-based calculations. This masking step prevents neutral background regions from artificially reducing or inflating hue entropy measurements. The saturation threshold used for hue-based analysis was selected to exclude near-achromatic regions while preserving perceptually meaningful chromatic content. In principle, stricter or looser threshold values may affect the sensitivity of hue entropy to low-saturation background regions; however, the adopted threshold was chosen to reduce background-induced distortion while retaining active design colors relevant to the compositional analysis.
Similar chromatic filtering strategies have been employed in computational aesthetics research to isolate perceptually relevant color structures [20,28]. By excluding low-saturation pixels, hue entropy becomes a more reliable indicator of chromatic diversity and distribution regularity within the active design region. This approach enables clearer differentiation between limited-palette controlled designs and visually diffuse color organizations. All preprocessing operations were executed using Python (Version 3.14.2; Python Software Foundation, Wilmington, DE, USA) and Python-based image processing libraries including OpenCV (Version 4.13.0.90; OpenCV Foundation) and scikit-image (Version 0.26.0; scikit-image developers), ensuring procedural consistency and reproducibility.
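The masking step can be sketched as follows; note that OpenCV represents saturation on a 0–255 scale for 8-bit images, so the S ≤ 25 cutoff is assumed here to apply on that scale:

```python
import cv2
import numpy as np

def chromatic_mask(img_bgr: np.ndarray, s_thresh: int = 25) -> np.ndarray:
    """Boolean mask of chromatic pixels retained for hue-based metrics.
    Pixels with saturation <= s_thresh (near-achromatic regions) are
    excluded, as described in Section 2.2.2."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    return hsv[:, :, 1] > s_thresh
```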

2.3. Computational Visual Features

To objectively characterize structural and chromatic properties of the posters, a set of computational visual descriptors was extracted. The selected metrics were designed to represent complementary dimensions of visual organization, including structural complexity, chromatic organization, compositional spatial distribution, and spatial anchoring robustness. The feature set was intentionally restricted to interpretable low-level and mid-level structural descriptors rather than semantic embeddings, OCR-derived typography variables, or graph-based layout representations. This design choice was made to preserve methodological transparency and to test whether structural priors alone could reveal measurable divergence between human-designed and AI-generated compositions.
All descriptors were computed using a standardized Python-based pipeline applied identically to both datasets to ensure procedural consistency. The primary feature set included edge density, Shannon entropy, colorfulness, hue entropy, and padding ratio. To further capture higher-order spatial organization patterns, two additional structural robustness descriptors—Rule-of-Thirds Activation Score and Connected Component Density—were computed. These descriptors quantify compositional anchoring and spatial fragmentation, respectively. The selection of features was guided by prior work in computational aesthetics and visual perception research, where measurable low-level image statistics are frequently used to approximate perceptual attributes such as complexity, order, visual balance, and structural coherence [19,20,21].

2.3.1. Structural Complexity Metrics

Structural complexity reflects the density and variability of visual elements within a composition. In empirical aesthetics, complexity has long been associated with perceptual stimulation and information richness [29]. In computational terms, it can be approximated through edge-based and entropy-based measures. Edge Density quantifies the proportion of pixels identified as edges relative to the total pixel count. Edges were detected using the Canny edge detection algorithm applied to grayscale-converted images. The metric was computed as:
$$\mathrm{Edge\ Density} = \frac{\text{Number of edge pixels}}{\text{Total number of pixels}}$$
Edge-based measures have been widely used to approximate structural detail intensity and visual sharpness in image analysis studies [30]. Higher edge density values indicate greater local structural variation and detail concentration.
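A sketch of the edge density computation; the Canny hysteresis thresholds are not reported in the paper, so the 100/200 values below are illustrative assumptions:

```python
import cv2
import numpy as np

def edge_density(img_bgr: np.ndarray, low: int = 100, high: int = 200) -> float:
    """Proportion of edge pixels returned by the Canny detector on the
    grayscale-converted image (Section 2.3.1)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, low, high)  # binary map: 0 or 255 per pixel
    return float(np.count_nonzero(edges)) / edges.size
```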
Shannon entropy measures the distributional unpredictability of grayscale pixel intensities. It reflects tonal heterogeneity and information dispersion within an image. Entropy was computed as:
$$H = -\sum_{i} p_i \log_2 p_i$$
where $p_i$ represents the probability of intensity level $i$ occurring within the image.
Entropy-based metrics are commonly employed in computational aesthetics to capture information richness and structural irregularity [19,22]. Higher entropy values correspond to more heterogeneous tonal distributions.
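The grayscale entropy can be computed directly from the 256-bin intensity histogram; a minimal sketch:

```python
import numpy as np

def shannon_entropy(gray: np.ndarray) -> float:
    """Shannon entropy H = -sum(p_i * log2(p_i)) of the grayscale
    intensity distribution, with empty histogram bins dropped to
    avoid log(0)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))
```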

2.3.2. Color-Based Metrics

Color organization plays a central role in poster design, influencing perceptual salience and visual identity. To characterize chromatic structure, two complementary metrics were extracted: colorfulness and hue entropy.
Colorfulness was calculated using the method proposed by Hasler and Süsstrunk [31], which derives a perceptual color intensity score based on the mean and standard deviation of the red–green (RG) and yellow–blue (YB) opponent channels. This metric is widely used in image quality and aesthetic research as a robust estimator of perceived chromatic strength. Higher colorfulness values indicate stronger chromatic contrast and saturation differences within the image.
Hue entropy measures the distributional variability of hue values within the HSV color space. Following the preprocessing protocol described in Section 2.2.2, only pixels exceeding the saturation threshold were included in the calculation. Hue entropy was computed using the Shannon entropy formula applied to the hue histogram. This metric captures how evenly or unevenly color tones are distributed across the composition. Prior studies suggest that entropy-based color measures are effective in distinguishing controlled palette usage from diffuse chromatic dispersion [19,22].
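Both chromatic descriptors can be sketched as follows. The colorfulness term follows the Hasler–Süsstrunk opponent-channel formulation [31]; the 36-bin hue histogram and the normalization by log₂(bins) are assumptions introduced here, since the paper does not specify the binning or scaling of the hue entropy:

```python
import cv2
import numpy as np

def colorfulness(img_bgr: np.ndarray) -> float:
    """Hasler-Suesstrunk colorfulness: spread plus magnitude of the
    rg = R - G and yb = 0.5*(R + G) - B opponent channels [31]."""
    b, g, r = cv2.split(img_bgr.astype(np.float64))
    rg = r - g
    yb = 0.5 * (r + g) - b
    std_root = np.sqrt(rg.std() ** 2 + yb.std() ** 2)
    mean_root = np.sqrt(rg.mean() ** 2 + yb.mean() ** 2)
    return float(std_root + 0.3 * mean_root)

def hue_entropy(img_bgr: np.ndarray, s_thresh: int = 25, bins: int = 36) -> float:
    """Shannon entropy of the hue histogram over chromatic pixels only
    (saturation mask of Section 2.2.2), scaled by log2(bins) so values
    fall in [0, 1]."""
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    hues = hsv[:, :, 0][hsv[:, :, 1] > s_thresh]
    if hues.size == 0:
        return 0.0
    hist, _ = np.histogram(hues, bins=bins, range=(0, 180))  # OpenCV hue: 0-179
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)) / np.log2(bins))
```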

2.3.3. Compositional Metric

In addition to structural and chromatic features, a compositional metric was introduced to quantify spatial distribution characteristics. The padding ratio represents the proportion of empty or low-information regions relative to the overall poster area. Negative space plays a fundamental role in graphic design, contributing to hierarchy, readability, and visual balance. Padding ratio was calculated by identifying background-dominant regions through intensity homogeneity thresholds and computing:
$$\mathrm{Padding\ Ratio} = \frac{\text{Area of low-variation regions}}{\text{Total image area}}$$
Unlike edge or entropy measures, which focus on local variation, the padding ratio captures macro-level compositional structure. This metric allows assessment of whether designs tend toward spatial minimalism or density. The inclusion of spatial proportion metrics aligns with empirical aesthetics research emphasizing the importance of compositional balance and spatial economy in visual perception [20,29].
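One plausible block-based sketch of this measure follows, assuming homogeneity is assessed as a low per-block intensity standard deviation; the block size and the threshold below are illustrative, as the paper reports only the general thresholding principle:

```python
import cv2
import numpy as np

def padding_ratio(img_bgr: np.ndarray, block: int = 32,
                  var_thresh: float = 15.0) -> float:
    """Fraction of the composition covered by low-variation
    (near-background) blocks. Applied to the original, uncropped
    image per Section 2.2.1."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY).astype(np.float64)
    h, w = gray.shape
    low_var, total = 0, 0
    for y in range(0, h - block + 1, block):
        for x in range(0, w - block + 1, block):
            patch = gray[y:y + block, x:x + block]
            total += 1
            if patch.std() < var_thresh:  # homogeneous -> counts as padding
                low_var += 1
    return low_var / total if total else 0.0
```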
All feature extraction procedures were automated and applied uniformly to both human-designed and AI-generated datasets to ensure direct comparability.

2.4. Statistical Analysis and Classification Framework

The analytical workflow consisted of two complementary stages. First, extracted visual descriptors were statistically compared between human-designed and AI-generated posters to identify feature-level group differences. Second, a multivariate classification framework was applied to evaluate whether the proposed interpretable feature set could reliably support group discrimination under cross-validated conditions. Statistical testing procedures, effect-size estimation, and classification protocols are described in detail in the following subsections.

2.4.1. Statistical Testing

Prior to group comparisons, the distribution of each extracted visual feature was evaluated using the Shapiro–Wilk normality test [32]. This preliminary step ensured that subsequent statistical procedures were selected based on empirical distributional properties rather than assumed parametric conditions.
For features satisfying normality assumptions, Welch’s independent samples t-test was applied due to its robustness against unequal variances between groups. For features that violated normality assumptions, the Mann–Whitney U test was employed as a non-parametric alternative. All tests were two-tailed, and statistical significance was evaluated at α = 0.05. In this study, the p-value was interpreted as the probability of observing a group difference at least as extreme as that obtained in the sample, assuming that the null hypothesis of no true difference between groups is true.
To complement significance testing and avoid overreliance on p-values, effect sizes were calculated for each comparison. Cohen’s d was reported for parametric tests [33], while rank-based effect size estimates were computed for non-parametric comparisons. Reporting effect sizes allows interpretation of the practical magnitude of differences beyond statistical detectability.
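The distribution-aware testing protocol can be sketched with SciPy as follows; the rank-based effect size for the non-parametric branch is operationalized here as r = Z/√N, which is an assumption, since the paper does not name the exact rank-based estimator:

```python
import numpy as np
from scipy import stats

def compare_feature(human: np.ndarray, ai: np.ndarray, alpha: float = 0.05):
    """Shapiro-Wilk on each group, then Welch's t-test (normal case)
    or Mann-Whitney U (otherwise), per Section 2.4.1. Returns the
    two-tailed p-value and an effect size: Cohen's d (parametric)
    or rank-based r (non-parametric)."""
    normal = (stats.shapiro(human).pvalue > alpha
              and stats.shapiro(ai).pvalue > alpha)
    if normal:
        res = stats.ttest_ind(human, ai, equal_var=False)  # Welch's t-test
        pooled = np.sqrt((human.var(ddof=1) + ai.var(ddof=1)) / 2)
        effect = (human.mean() - ai.mean()) / pooled       # Cohen's d
    else:
        res = stats.mannwhitneyu(human, ai, alternative="two-sided")
        n1, n2 = len(human), len(ai)
        mu = n1 * n2 / 2
        sigma = np.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
        effect = (res.statistic - mu) / sigma / np.sqrt(n1 + n2)  # r = Z/sqrt(N)
    return res.pvalue, effect
```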

2.4.2. Random Forest Classification Framework

To evaluate whether the extracted visual descriptors collectively enable reliable discrimination between human-designed and AI-generated posters, a Random Forest (RF) classifier was implemented. Random Forest is an ensemble learning algorithm based on bootstrap aggregation and multiple decision trees, known for its robustness to multicollinearity, non-linear feature interactions, and moderate-sized datasets [34]. In the present study, Random Forest was selected as the primary classifier because it provides robust nonlinear decision boundaries, tolerates interactions among low-dimensional interpretable features, and yields stable feature-importance estimates that are consistent with the explainability-oriented design of the framework. The classification stage was not intended as an exhaustive benchmark against all traditional machine-learning algorithms, but rather as a transparent multivariate test of whether the proposed structural descriptors collectively support reliable discrimination between human-designed and AI-generated posters.
The baseline classification model was trained using five primary visual descriptors:
  • Edge Density.
  • Shannon Entropy.
  • Colorfulness.
  • Hue Entropy.
  • Padding Ratio.
To assess structural robustness and feature-family complementarity, two additional spatial organization descriptors were subsequently introduced:
  • Rule-of-Thirds Activation Score.
  • Connected Component Density.
Accordingly, three model configurations were evaluated:
(i) The original five-feature configuration;
(ii) A structural-only configuration including the two structural descriptors;
(iii) A combined seven-feature configuration integrating all descriptors.
Given the moderate dataset size, model evaluation was performed using stratified five-fold cross-validation to reduce split-dependent variability and to ensure balanced representation of both classes in each fold. In each iteration, 80% of the data were used for training and 20% for validation. Performance metrics were computed for each fold and averaged to obtain mean classification performance and standard deviation values. Model performance was assessed using accuracy, precision, recall, F1-score, and the area under the receiver operating characteristic curve (ROC-AUC). Reported values represent the mean ± standard deviation across cross-validation folds. Feature importance scores were computed using the mean decrease in impurity (Gini importance). In addition, permutation importance was conducted under the same cross-validation protocol to assess the robustness of feature contributions beyond impurity-based estimates. This multivariate modeling stage complements univariate statistical testing by capturing potential interaction effects among descriptors and assessing whether the proposed interpretable feature set supports stable predictive separability within a multivariate setting.
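A minimal sketch of the cross-validated evaluation for the baseline configuration, using scikit-learn; the forest hyperparameters are not reported in the paper, so the settings below are placeholder defaults:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

FEATURES = ["edge_density", "shannon_entropy", "colorfulness",
            "hue_entropy", "padding_ratio"]  # baseline five-feature set

def evaluate_rf(X: np.ndarray, y: np.ndarray, seed: int = 42) -> dict:
    """Stratified five-fold cross-validation of the Random Forest
    classifier, reporting mean +/- std per metric (Section 2.4.2)."""
    clf = RandomForestClassifier(n_estimators=500, random_state=seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    scores = cross_validate(clf, X, y, cv=cv,
                            scoring=("accuracy", "precision", "recall",
                                     "f1", "roc_auc"))
    return {m: (scores[f"test_{m}"].mean(), scores[f"test_{m}"].std())
            for m in ("accuracy", "precision", "recall", "f1", "roc_auc")}
```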

2.4.3. Feature Importance Analysis

In addition to structural and chromatic descriptors, compositional behavior was examined through the padding ratio metric defined in Section 2.3.3. Unlike edge density and entropy-based measures, which primarily quantify local variation and information dispersion, padding ratio captures global spatial organization by estimating the proportion of negative space within the composition. From a univariate statistical perspective, padding ratio distributions were compared between human-designed and AI-generated posters using the Mann–Whitney U test due to non-normal distribution characteristics. This analysis evaluates whether spatial economy and negative space utilization differ systematically between production paradigms. From a multivariate modeling perspective, padding ratio was incorporated as a predictor within the Random Forest framework. Its relative contribution to classification performance was quantified using mean decrease in impurity (Gini importance). Evaluating the compositional metric in both univariate and multivariate contexts enables complementary interpretation: statistical testing identifies distributional differences, while ensemble modeling reveals its discriminative relevance when interacting with other structural descriptors.
By integrating a compositional variable alongside low-level image statistics, the analytical framework extends beyond purely texture- or color-based differentiation and facilitates a broader structural interpretation of design behavior.
The principal analytical parameters used in the proposed framework, together with their selected values, methodological rationale, and expected effects if modified, are summarized in Table 2.

3. Results

Table 3 presents the descriptive statistics and non-parametric group comparisons of the extracted computational visual features for human-designed and AI-generated posters. Significant differences were observed across most structural and chromatic descriptors. Padding ratio (r = −0.661, p < 0.001) and hue entropy (r = 0.500, p < 0.001) demonstrated large effect sizes, indicating substantial compositional and chromatic divergence between production modalities. In contrast, Shannon entropy did not differ significantly between groups (r = −0.008, p = 0.925), suggesting comparable levels of global informational complexity.

3.1. Compositional Separation (Padding Ratio)

Among the extracted visual descriptors, the padding ratio—defined as the proportion of low-activity (near-background) regions relative to total image area—emerged as the most discriminative compositional variable within the extracted feature set between human-designed and AI-generated posters. Human-designed posters exhibited a substantially higher padding ratio (0.505 ± 0.179) compared to AI-generated posters (0.171 ± 0.297). The difference was statistically significant according to the Mann–Whitney U test (p < 0.001), with a large effect size (|r| = 0.661). Structurally, higher padding ratios in human-designed posters suggest greater use of negative space, whereas AI-generated posters tended to display denser visual occupancy and reduced spatial margins.
These findings indicate that spatial economy and negative space utilization represent a major compositional distinction between human and AI design outputs within the examined poster domain.

3.2. Structural Complexity Differences

Structural complexity was assessed using edge density and Shannon entropy metrics (Table 3). Edge density values were significantly higher in human-designed posters (0.039 ± 0.025) compared to AI-generated posters (0.030 ± 0.030). The difference was statistically significant (Mann–Whitney U test, p < 0.001), with a small-to-moderate effect size (|r| = 0.350). This result suggests that human-designed posters tend to exhibit sharper local structural transitions and more concentrated edge variations. Such differences may be associated with typographic detailing and compositional refinement, contributing to localized increases in structural contrast. In contrast, Shannon entropy did not reveal a statistically significant difference between groups (Human: 3.577 ± 1.269; AI: 3.631 ± 1.068; p = 0.925; r = −0.008). This indicates that global grayscale informational complexity—measured as the distribution of tonal intensities—remains comparable across human and AI-generated outputs.
Taken together, these findings suggest that while local structural density differs between production modalities, overall informational dispersion at the tonal level does not constitute a primary structural boundary between human and AI design processes.

3.3. Chromatic Organization

Chromatic organization was evaluated using colorfulness and hue entropy metrics (Table 3). Colorfulness values were significantly higher in human-designed posters (54.025 ± 30.592) compared to AI-generated posters (35.963 ± 18.304). The difference was statistically significant according to the Mann–Whitney U test (p < 0.001), with a small-to-moderate effect size (|r| = 0.367). This suggests that human-designed posters tend to exhibit stronger chromatic contrast and more pronounced opponent-color variation. In contrast, hue entropy values were significantly higher in AI-generated posters (0.998 ± 0.126) than in human-designed posters (0.839 ± 0.150) (p < 0.001; r = 0.500), indicating a moderate effect size. Higher hue entropy reflects a more diffuse and evenly distributed chromatic structure across the hue spectrum.
Together, these findings suggest that while human-designed posters demonstrate stronger color intensity and contrast, AI-generated posters exhibit greater chromatic dispersion. This contrast in color organization represents a systematic chromatic distinction between production modalities within the examined poster domain.

3.4. Multidimensional Feature Space (PCA)

To explore whether the extracted structural, chromatic, and compositional features collectively exhibit systematic distributional differences, Principal Component Analysis (PCA) was applied to the standardized dataset (Figure 2).
Projection of samples onto the two-dimensional PCA plane revealed a visible tendency toward group differentiation between human-designed and AI-generated posters. Although partial overlap remained, the distribution patterns indicate that combined feature interactions contribute to modality-dependent structural variation. It is important to note that PCA is an unsupervised dimensionality reduction technique. Therefore, the observed spatial tendencies reflect intrinsic variance structure within the feature space rather than classification-optimized separation.
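The projection in Figure 2 corresponds to the following standard unsupervised pipeline (standardization followed by a two-component PCA); class labels are used only for coloring the plot, never for fitting:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def pca_projection(X: np.ndarray) -> np.ndarray:
    """Standardize the descriptor matrix, then project onto the first
    two principal components (unsupervised dimensionality reduction)."""
    X_std = StandardScaler().fit_transform(X)
    return PCA(n_components=2).fit_transform(X_std)
```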

3.5. Random Forest Classification Performance

To evaluate the collective discriminative capacity of the extracted visual features, a Random Forest classifier was implemented using stratified five-fold cross-validation. Classification performance results are presented in Figure 3 and Figure 4.
The model achieved a mean cross-validated accuracy of 0.955 (±0.019). Out-of-fold predictions across all folds were aggregated to compute a global confusion matrix (Figure 3). The classifier correctly identified 98 human-designed posters and 93 AI-generated posters, resulting in 191 correct classifications out of 200 samples. Receiver Operating Characteristic (ROC) analysis based on pooled cross-validation predictions yielded an Area Under the Curve (AUC) of 0.991 (Figure 4), indicating robust cross-validated discriminative performance of the combined feature set. Notably, the classifier was trained exclusively on structural, chromatic, and compositional image descriptors without incorporating semantic or textual information. Therefore, the observed classification performance reflects distributional differences in low-level visual structure rather than content-based cues.
Overall, these findings demonstrate that the extracted computational features collectively provide substantial discriminative signal for distinguishing between human-designed and AI-generated posters within the examined thematic context.

3.6. Permutation-Based Feature Importance

To evaluate the robustness of feature contributions beyond impurity-based importance estimates, permutation importance analysis was conducted using aggregated out-of-fold (OOF) predictions obtained from stratified five-fold cross-validation, with 50 permutation repetitions per feature. Padding ratio emerged as the dominant predictor, producing a mean ROC-AUC decrease of 0.2206 (±0.0785) when permuted. Hue entropy exhibited moderate discriminative relevance (mean ΔAUC = 0.0931 ± 0.0361), whereas edge density, colorfulness, and Shannon entropy demonstrated comparatively limited impact on classification performance.
These findings indicate that the discriminative capacity of the model is primarily associated with global compositional structure—particularly negative space utilization—while local structural and tonal complexity measures contribute to a lesser extent within the examined dataset. The dominance of padding ratio was consistent with impurity-based importance estimates, supporting the stability of the compositional signal across cross-validated folds.
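A sketch of the out-of-fold permutation protocol described above, assuming the ROC-AUC drop is measured per fold against the unpermuted baseline; the authors' exact aggregation order is not specified:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold

def oof_permutation_importance(X, y, j, n_repeats=50, seed=0):
    """Mean ROC-AUC decrease when feature column j is permuted in the
    held-out fold, over stratified five-fold CV with n_repeats
    permutations per fold (Section 3.6)."""
    rng = np.random.default_rng(seed)
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    drops = []
    for train, test in cv.split(X, y):
        clf = RandomForestClassifier(n_estimators=500, random_state=seed)
        clf.fit(X[train], y[train])
        base = roc_auc_score(y[test], clf.predict_proba(X[test])[:, 1])
        for _ in range(n_repeats):
            X_perm = X[test].copy()
            X_perm[:, j] = rng.permutation(X_perm[:, j])  # break feature-label link
            drops.append(base - roc_auc_score(
                y[test], clf.predict_proba(X_perm)[:, 1]))
    return float(np.mean(drops)), float(np.std(drops))
```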

3.7. Feature Ablation Analysis

To further examine the relative contribution of individual features, an ablation study was conducted by systematically removing and isolating specific descriptors. Using all features, the model achieved a ROC-AUC of 0.991. When padding ratio was excluded, performance decreased to 0.903, indicating that although padding ratio substantially contributes to discrimination, additional structural and chromatic descriptors retain predictive information.
Importantly, the persistence of meaningful classification performance after exclusion of padding ratio demonstrates that structural divergence is not reducible to a single compositional variable. Hue entropy and structural anchoring metrics independently retain discriminative capacity, suggesting coordinated modality-specific patterns across spatial and chromatic dimensions. This multi-feature persistence supports the interpretation that human and generative design processes operate under distinct compositional priors rather than isolated geometric artifacts.
Using padding ratio alone yielded a ROC-AUC of 0.854, demonstrating that compositional spacing is a strong but not sufficient discriminator. Notably, combining padding ratio and hue entropy restored performance to 0.955, approaching the full model accuracy. These findings suggest that modality differentiation is primarily governed by complementary compositional and chromatic characteristics rather than by a single dominant variable. For clarity and comparative overview, key stability, ablation, and distributional divergence metrics are consolidated in Table 4.

3.8. SHAP-Based Model Interpretability

To further elucidate the directional and instance-level influence of individual descriptors, SHAP (SHapley Additive exPlanations) analysis was applied to the trained Random Forest model. The SHAP summary (beeswarm) plot is presented in Figure 5. As illustrated in Figure 5, padding ratio exhibits the largest overall contribution to model output, confirming its dominant role in classification. Lower padding ratios consistently shift predictions toward the AI-generated class (positive SHAP values), whereas higher padding ratios push predictions toward the human-designed class (negative SHAP values). This directional pattern indicates that negative space utilization constitutes the primary compositional signal encoded by the model. Hue entropy demonstrates a secondary but meaningful influence. Higher entropy values generally increase the likelihood of AI classification, suggesting that chromatic variability complements structural spacing in modality differentiation. In contrast, edge density, colorfulness, and Shannon entropy display substantially narrower SHAP distributions centered near zero, indicating comparatively limited influence on decision boundaries. Importantly, the alignment observed across ablation analysis, permutation importance testing, and SHAP interpretation reveals a stable and internally consistent decision structure. Collectively, these findings suggest that global compositional organization—particularly spatial distribution and color variability—rather than localized tonal or textural complexity, represents the dominant differentiating characteristic between AI-generated and human-designed posters within the evaluated feature space.
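The analysis behind Figure 5 can be reproduced in outline with the shap library's TreeExplainer; which class index corresponds to AI-generated depends on label encoding, so the indexing below is an assumption:

```python
import shap

def shap_beeswarm(fitted_rf, X, feature_names):
    """SHAP summary (beeswarm) plot for a fitted Random Forest.
    TreeExplainer yields exact Shapley values for tree ensembles."""
    explainer = shap.TreeExplainer(fitted_rf)
    shap_values = explainer.shap_values(X)
    # For binary classifiers shap_values is often a per-class list;
    # index 1 is assumed here to be the AI-generated class.
    vals = shap_values[1] if isinstance(shap_values, list) else shap_values
    shap.summary_plot(vals, X, feature_names=feature_names)
```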

3.9. Representative Best-, Typical-, and Worst-Case Examples

To complement the quantitative performance and feature-level interpretability analyses, representative best-case, typical-case, and worst-case examples from both human-designed and AI-generated posters are presented in Figure 6.

3.10. Structural Robustness and Feature Family Ablation Analysis

To evaluate the robustness of the proposed computational framework beyond the initial descriptor set, two additional structure-oriented features were introduced: rule-of-thirds activation score and connected component density. These descriptors were designed to quantify compositional anchoring and topological fragmentation, thereby extending the structural characterization of poster layouts.
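The paper does not publish closed-form definitions for these two descriptors, so the following sketch shows one plausible operationalization only: edge-activity pooling around the four rule-of-thirds intersections, and connected components counted per megapixel after Otsu binarization. Window sizes, thresholds, and the binarization scheme are all assumptions.

```python
import cv2
import numpy as np

def rule_of_thirds_activation(img_bgr, window: int = 64) -> float:
    """Mean edge activity in square windows centered on the four
    canonical rule-of-thirds intersections (assumed formulation)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 100, 200).astype(np.float64) / 255.0
    h, w = edges.shape
    pts = [(h // 3, w // 3), (h // 3, 2 * w // 3),
           (2 * h // 3, w // 3), (2 * h // 3, 2 * w // 3)]
    half = window // 2
    acts = [edges[max(0, y - half):y + half,
                  max(0, x - half):x + half].mean() for y, x in pts]
    return float(np.mean(acts))

def connected_component_density(img_bgr) -> float:
    """Foreground connected components per megapixel after Otsu
    binarization (assumed proxy for topological fragmentation)."""
    gray = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, 0, 255,
                              cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    n_labels, _ = cv2.connectedComponents(binary)
    return (n_labels - 1) / (gray.size / 1e6)  # exclude background label
```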
Group-level comparisons confirmed statistically significant differences between human-designed and AI-generated posters for both descriptors (Mann–Whitney U test, p < 10⁻¹⁰ for Rule-of-Thirds; p < 10⁻²¹ for Component Density). Human-designed posters exhibited higher activation around rule-of-thirds intersection regions and greater component density, indicating more modular and segmented layout organization. In contrast, AI-generated posters demonstrated comparatively lower structural fragmentation and weaker alignment with canonical compositional anchor points.
To assess discriminative robustness at the feature-family level, a structured ablation analysis was conducted using stratified five-fold cross-validation. Classification performance across descriptor configurations is summarized in Table 5.
The structural-only configuration (two features) achieved a mean accuracy of 0.815 and a ROC-AUC of 0.882, demonstrating that spatial anchoring and topological fragmentation independently encode substantial group-discriminative information. The original five-feature configuration achieved a mean cross-validated accuracy of 0.955 and a ROC-AUC of 0.991. When structural robustness descriptors were integrated with the original set (seven features in total), the ROC-AUC slightly increased to 0.9935 while classification accuracy remained stable at 0.955. This pattern indicates improved ranking stability without a corresponding gain in threshold-dependent classification performance. These performance values indicate strong separability within the present controlled dataset; however, they should not be interpreted as direct evidence of equivalent performance across unconstrained visual domains or alternative generative systems. Importantly, the addition of structural descriptors did not introduce instability or performance degradation. Instead, the results suggest complementary interaction between chromatic distribution metrics and higher-order compositional structure indicators. Collectively, the ablation results confirm that the framework remains robust under feature-family isolation and extension. Structural descriptors contribute independent discriminative value while preserving cross-validated stability, thereby strengthening the methodological reliability of the proposed computational differentiation approach.

3.11. Comparative Evaluation with Alternative Traditional Classifiers

To assess whether the observed structural separability was specific to the Random Forest framework or remained stable across alternative traditional classifiers, the original five-feature configuration was additionally evaluated using Support Vector Machine (SVM) and k-Nearest Neighbors (k-NN) under the same stratified five-fold cross-validation protocol.
As shown in Table 6, the SVM model achieved the highest mean accuracy (0.970) and mean ROC-AUC (0.996), slightly outperforming the Random Forest baseline. The k-NN classifier yielded a mean accuracy of 0.955 and a mean ROC-AUC of 0.991, demonstrating performance comparable to the Random Forest model. These supplementary results indicate that the discriminative signal encoded by the proposed interpretable feature set is not restricted to a single classifier family. Nevertheless, Random Forest was retained as the primary framework because its ensemble tree structure is well aligned with the feature-importance and explainability objectives of the study. This supplementary comparison was limited to the original five-feature configuration and should therefore be interpreted as a classifier-sensitivity check rather than a full benchmark across all feature-set variants.

4. Discussion

4.1. Structural Implications

The present study examined whether interpretable computational descriptors could differentiate human-designed and AI-generated posters within a controlled thematic domain. Across univariate testing and complementary model-based analyses, compositional and chromatic descriptors provided substantially stronger separation than global informational complexity measures.
The most pronounced divergence emerged in the padding ratio, which showed a large effect size (|r| = 0.661) and consistently dominated model explainability profiles. Both permutation-based importance and SHAP interpretation indicated that padding ratio contributed the largest share of predictive signal, and its influence was directionally stable: lower padding ratios shifted predictions toward the AI-generated class, whereas higher padding ratios shifted predictions toward the human-designed class. This pattern suggests that global negative-space allocation functions as the primary compositional discriminator within the evaluated feature space.
Beyond its discriminative strength, the magnitude of divergence observed in spatial density allocation is consistent with the possibility of a structural prior operating within the examined text-to-image generation setting. Rather than reflecting deliberate layout economy, AI-generated posters in this dataset tended to exhibit denser spatial occupancy and reduced negative space relative to human-designed posters. In contrast, human designers often regulate negative space intentionally to support hierarchy, readability, and compositional equilibrium. The substantial distributional separation (KS = 0.76) and very large effect size (Cohen’s d = 1.365) therefore suggest that whitespace divergence may reflect systematic differences in production logic within the present controlled setting, rather than incidental variation alone.
Importantly, this structural separation was not mirrored by global grayscale entropy. Shannon entropy did not differ between groups (p = 0.925), implying that the overall quantity of tonal information is comparable across the analyzed poster sets. The implication is not that one group contains “more information,” but that information is distributed differently in space; in other words, the observed differences appear to be more strongly associated with layout organization than with aggregate tonal complexity.
Edge density exhibited a smaller, secondary divergence (|r| = 0.350), indicating differences in local structural transitions; however, permutation and SHAP analyses showed that its marginal contribution to classification was limited relative to padding ratio and hue entropy. Taken together, the results suggest that group differentiation in this dataset is primarily associated with global compositional organization (negative-space utilization), complemented by chromatic dispersion characteristics, rather than by local textural complexity or overall informational density.

4.2. Interpretation in the Context of Previous Research and Design Theory

Prior research in computational aesthetics has largely focused on perceptual quality assessment, stylistic imitation, or semantic coherence in AI-generated imagery [6,25]. In contrast, the present study isolates low-level geometric and chromatic descriptors independent of semantic content. The results suggest that structural differences between the analyzed human-designed and AI-generated poster sets remain detectable even when higher-level semantic cues are excluded, indicating that distributional image statistics may encode systematic production-related differences within the present controlled setting.
From a design theory perspective, negative space constitutes a foundational principle in visual composition. Classical theories of visual balance and perceptual organization [25] emphasize spatial equilibrium and controlled allocation of empty regions as mechanisms for enhancing clarity, hierarchy, and legibility. Contemporary layout systems similarly treat whitespace as an intentional structural constraint rather than residual background [27]. The elevated padding ratios observed in human-designed posters are consistent with these principles, suggesting deliberate spatial restraint aligned with rule-based compositional strategies. By contrast, the lower padding ratios and increased hue entropy observed in AI-generated posters may reflect structural tendencies associated with the examined text-to-image generation setting. Such systems optimize image formation through learned statistical regularities, which may favor denser spatial occupation and broader chromatic dispersion relative to human-designed layouts. Importantly, this interpretation does not imply qualitative hierarchy; rather, it suggests that the analyzed AI-generated outputs may encode compositional regularities that differ from human-imposed spatial constraints under the present controlled conditions.
The chromatic findings further reinforce this distinction. While human-designed posters exhibited greater colorfulness within structured palettes, AI-generated posters demonstrated higher hue entropy, indicating more diffuse chromatic variability. This pattern may reflect differing compositional strategies: human designers often apply constrained contrast to guide attention hierarchically, whereas AI-generated images may distribute chromatic information more diffusely in accordance with learned statistical associations.
Collectively, these findings support the interpretation that human-designed and AI-generated posters in the present dataset may operate under different compositional tendencies. Within the controlled thematic context examined here, these differing tendencies manifest as measurable divergences in negative-space utilization and chromatic dispersion.
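As a concrete illustration of the chromatic descriptors discussed above, the sketch below computes a normalized hue entropy over saturation-masked HSV pixels, using the S ≤ 25 exclusion threshold reported in Table 2, together with the Hasler–Süsstrunk colorfulness metric [31]. The 36-bin hue histogram is an assumed discretization; the study's exact binning is not restated here.

```python
# Minimal sketch, not the authors' exact code: hue entropy over a
# saturation-masked HSV image and Hasler-Suesstrunk colourfulness [31].
import cv2
import numpy as np

def hue_entropy(bgr: np.ndarray, sat_cutoff: int = 25, bins: int = 36) -> float:
    """Shannon entropy of the hue distribution, normalized to [0, 1],
    ignoring near-achromatic pixels. The bin count is an assumption."""
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)
    h, s = hsv[..., 0], hsv[..., 1]
    hues = h[s > sat_cutoff]                     # drop low-saturation background
    if hues.size == 0:
        return 0.0
    p, _ = np.histogram(hues, bins=bins, range=(0, 180))  # OpenCV hue is 0-179
    p = p.astype(np.float64) / p.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum() / np.log2(bins))

def colorfulness(bgr: np.ndarray) -> float:
    """Hasler & Suesstrunk (2003) colourfulness in RG/YB opponent space."""
    b, g, r = cv2.split(bgr.astype(np.float64))
    rg = r - g
    yb = 0.5 * (r + g) - b
    return float(np.sqrt(rg.std() ** 2 + yb.std() ** 2)
                 + 0.3 * np.sqrt(rg.mean() ** 2 + yb.mean() ** 2))
```

Under this normalization, a poster restricted to a narrow palette yields hue entropy well below 1.0, while chromatically diffuse imagery approaches 1.0, matching the direction of the group means reported in Table 3.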

4.3. Multivariate Separability and Model Behavior

The multidimensional separability of the descriptor space was evaluated using Random Forest classification under stratified five-fold cross-validation. The original five-feature configuration achieved a mean cross-validated ROC-AUC of 0.991 and a mean accuracy of 0.955, confirming strong discriminative capacity based on compositional and chromatic descriptors. To assess the independent contribution of structural metrics, a model trained exclusively on the two newly introduced structural descriptors (Rule-of-Thirds activation and Connected Component Density) achieved a mean accuracy of 0.815 and a ROC-AUC of 0.882. Although lower than the full-feature configuration, this result indicates that structural organization alone encodes substantial group-discriminative information within the present dataset. When the structural descriptors were integrated with the original feature set (seven features total), classification performance slightly improved to a ROC-AUC of 0.9935, while classification accuracy remained stable at 0.955. The observed increase in ROC-AUC suggests enhanced ranking stability and greater separation margins between human-designed and AI-generated samples. Importantly, performance variance remained low across folds, indicating that the additional structural features did not introduce instability or overfitting.
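The ablation protocol just described can be reproduced in outline with scikit-learn. The sketch below evaluates a Random Forest under stratified five-fold cross-validation on the original, structural-only, and combined feature groups; the feature matrix, labels, and column assignments are random stand-ins rather than the study's data.

```python
# Sketch of the cross-validated feature-group ablation described above.
# X, y, and the column indices are hypothetical placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))                 # stand-in 7-feature matrix
y = rng.integers(0, 2, size=200)              # stand-in labels (0=human, 1=AI)

ORIGINAL = [0, 1, 2, 3, 4]                    # assumed column order for the 5 base features
STRUCTURAL = [5, 6]                           # rule-of-thirds, component density

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
groups = {"original": ORIGINAL, "structural": STRUCTURAL,
          "combined": ORIGINAL + STRUCTURAL}
for name, cols in groups.items():
    scores = cross_validate(RandomForestClassifier(random_state=42),
                            X[:, cols], y, cv=cv,
                            scoring=("accuracy", "roc_auc"))
    print(f"{name:10s} acc={scores['test_accuracy'].mean():.3f} "
          f"auc={scores['test_roc_auc'].mean():.4f}")
```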
These findings suggest that group separability in the present dataset arises not from a single dominant descriptor but from the complementary interaction between chromatic dispersion patterns and higher-order spatial organization metrics. While padding ratio remains a strong contributor, ablation results confirm that structural density and compositional anchor activation provide independent and additive discriminative value. From an engineering perspective, the results suggest that the analyzed AI-generated posters exhibit statistically detectable spatial density regularities and topological continuity tendencies within the present controlled setting. Conversely, human-designed compositions exhibit stronger compositional anchoring and modular fragmentation consistent with established layout principles. Together, these observations support the interpretation that the observed group differentiation in this dataset is more strongly associated with low-level structural organization than with semantic content.

4.4. Broader Implications and Future Research Directions

The present findings contribute to the broader discourse on human–AI creative divergence by demonstrating that structural differences between modalities are computationally quantifiable through interpretable geometric and chromatic descriptors. Beyond initial compositional metrics, the integration of structural anchoring and topological fragmentation indicators further revealed that modality-specific signatures are embedded not only in spatial density but also in higher-order layout organization. The complementary interaction observed between chromatic dispersion and structural descriptors indicates that separability arises from multidimensional compositional priors rather than isolated statistical artifacts.
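To ground the structural indicators referenced above, the sketch below gives one plausible operationalization of rule-of-thirds activation (relative edge concentration near the thirds lines) and connected-component density (foreground components per pixel on a binarized edge map). Band width, Canny thresholds, and normalization are illustrative assumptions rather than the study's exact definitions.

```python
# Illustrative sketch (not the authors' exact definitions) of the two
# structural descriptors named above.
import cv2
import numpy as np

def structural_descriptors(gray: np.ndarray, band_frac: float = 0.05):
    edges = cv2.Canny(gray, 100, 200)                 # illustrative thresholds
    h, w = edges.shape
    band_h, band_w = int(h * band_frac), int(w * band_frac)

    mask = np.zeros_like(edges, dtype=bool)
    for fy in (h // 3, 2 * h // 3):                   # horizontal thirds lines
        mask[max(0, fy - band_h):fy + band_h, :] = True
    for fx in (w // 3, 2 * w // 3):                   # vertical thirds lines
        mask[:, max(0, fx - band_w):fx + band_w] = True

    edge_px = edges > 0
    # Edge density near thirds lines relative to overall edge density.
    thirds_activation = edge_px[mask].mean() / max(edge_px.mean(), 1e-9)

    # Connected components on the edge map; label 0 is the background.
    n_components, _ = cv2.connectedComponents(edges)
    component_density = (n_components - 1) / (h * w)
    return float(thirds_activation), float(component_density)
```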
These results hold implications for explainable AI in creative systems, computational authorship attribution, and automated design auditing. The ability to operationalize compositional divergence using transparent and reproducible descriptors provides a methodological foundation for analyzing structural biases in generative pipelines without reliance on opaque deep feature embeddings. Such approaches may support future tools for evaluating generative authenticity, detecting layout regularities, or monitoring convergence between human and algorithmic design practices.
In methodological terms, the present work is not intended to introduce a new classifier architecture, but rather to establish an interpretable quantification framework for examining structural divergence between human-designed and AI-generated posters under controlled thematic conditions. Rather than framing modality differentiation as a forensic detection problem dependent on architecture-specific artifacts, the findings highlight the feasibility of structural bias analysis using interpretable geometric descriptors. This perspective may encourage a shift from artifact detection toward compositional behavior analysis in future human–AI comparative studies.
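As one concrete instance of such a transparent auditing workflow, the sketch below estimates descriptor-level contributions via scikit-learn's permutation importance on a held-out split; a SHAP analysis would follow the same pattern with a tree explainer. The data, split, and feature names are illustrative stand-ins.

```python
# Sketch of descriptor-level attribution via permutation importance.
# Inputs are random stand-ins, not the study's measurements.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)
names = ["padding_ratio", "hue_entropy", "edge_density",
         "colorfulness", "shannon_entropy"]              # assumed ordering

X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
result = permutation_importance(model, X_te, y_te, scoring="roc_auc",
                                n_repeats=50, random_state=42)
for i in np.argsort(result.importances_mean)[::-1]:
    print(f"{names[i]:16s} {result.importances_mean[i]:+.4f} "
          f"± {result.importances_std[i]:.4f}")
```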
Future research may extend this framework along several dimensions. Cross-domain validation across diverse poster genres and design categories would clarify whether the observed structural divergence persists beyond the present thematic context. Comparative analyses involving multiple generative architectures—including diffusion-based, transformer-based, and adversarial systems—would help determine whether the identified spatial density and topological patterns reflect architecture-specific priors or broader generative tendencies. Additionally, systematic manipulation of prompting constraints, such as explicit whitespace or grid alignment instructions, could help disentangle model-intrinsic structural biases from user-driven compositional influence.
Finally, integrating perceptual evaluation methods—such as eye-tracking, visual saliency mapping, or cognitive load assessment—may bridge computational structural metrics with human aesthetic perception. Establishing this connection would enable more comprehensive understanding of how measurable geometric divergence translates into experiential design differences between human and generative outputs.

4.5. Limitations and Scope of the Study

While the present study demonstrates robust cross-validated discriminative performance, several constraints define the scope of inference.
First, the AI-generated dataset was produced using a single publicly accessible text-to-image generation platform under controlled prompting configurations. Although this approach ensured internal consistency, it does not permit strong claims regarding whether the observed discrepancies are specific to the examined generation setting or extend more broadly across other generative paradigms. The observed divergence, particularly in negative space allocation and compositional anchoring, should therefore be interpreted as representative of the analyzed platform-specific setting rather than as universal across all image-generation frameworks.
Second, the restriction to jazz festival posters should be understood as a methodological control rather than a claim of domain exhaustiveness. Poster design in this category typically foregrounds typography, negative space, contrast, and compositional balance, making it suitable for structural analysis. At the same time, the present findings should not be generalized uncritically to domains governed by different communicative functions, such as advertising banners, editorial layouts, cinematic posters, or illustration-driven compositions. For example, advertising banners may prioritize denser call-to-action placement and reduced negative space, editorial layouts may be more strongly governed by grid regularity and typographic hierarchy, and cinematic posters may exhibit more image-dominant compositions with different anchoring logic. The present findings should therefore be interpreted as domain-conditioned structural tendencies rather than universal compositional laws.
Third, although stratified five-fold cross-validation and robustness-oriented performance analyses were employed, the dataset size (n = 200) remains moderate relative to the diversity of contemporary visual production settings. While internal stability across folds supports statistical reliability, broader external validity will require replication across larger datasets, additional design genres, and multiple collection periods.
Fourth, AI-generated samples were curated to ensure thematic coherence and minimum poster-like visual consistency. Although care was taken to avoid systematic bias, including the exclusion of feature-based or prediction-based selection, any manual curation step may still introduce a degree of residual selection bias. Future studies should therefore examine whether the observed structural patterns remain stable under larger-scale, less curated, and multi-source generation settings.
Fifth, although a supplementary comparison with support vector machines and k-nearest neighbors was conducted on the original five-feature configuration, the classification analysis remained centered on a Random Forest framework, selected for robustness, nonlinear interaction handling, and consistency with the interpretability-oriented design of the study rather than for exhaustive classifier benchmarking across all model families and feature-set variants. Broader comparative evaluation across alternative classifiers and extended feature configurations would further clarify the classifier-dependence of the reported performance profile.
Finally, the framework intentionally relies on interpretable low-level and mid-level structural and chromatic descriptors. Higher-order semantic, typographic, and graph-based layout representations were excluded to preserve methodological transparency and explainability. While this choice enhances interpretive clarity, integrating typography-aware, graph-based, and other higher-level descriptors may improve ecological validity and reveal additional modality-dependent signatures.
Taken together, these limitations define the boundaries of inference. The findings should therefore be interpreted as evidence of domain-constrained structural divergence supported by cross-validated internal robustness rather than as a claim of universal modality detection.

5. Conclusions

This study systematically quantified structural, chromatic, and compositional differences between human-designed and AI-generated posters using interpretable low-level computational visual descriptors. The results indicate that global compositional structure—particularly negative space utilization captured by the padding ratio metric—functions as a major differentiating factor within the present controlled dataset. Human-designed posters exhibited greater spatial restraint and clearer compositional anchoring, whereas AI-generated posters tended toward denser spatial occupancy patterns and altered structural modularity.
Although global grayscale entropy did not significantly differ between groups, chromatic dispersion and spatial density metrics consistently revealed group-level regularities under the examined thematic and generation conditions. The structural-only configuration retained substantial independent discriminative capacity (ROC-AUC = 0.882), indicating that spatial organization alone encodes meaningful separation signals in this setting. The original five-feature model achieved a mean cross-validated ROC-AUC of 0.991 and a mean accuracy of 0.955. When structural robustness descriptors were integrated into the full descriptor set (seven features total), classification performance slightly improved (ROC-AUC = 0.9935) while maintaining stable accuracy across folds. These findings suggest complementary interactions between chromatic variability and higher-order spatial organization.
Importantly, these group differences emerged without reliance on semantic or textual information, indicating that geometric and distributional image properties can encode measurable structural distinctions under controlled thematic conditions. The ablation analysis further suggests that the observed separability is not attributable to a single dominant variable alone but instead reflects coordinated contributions across multiple feature families.
From an engineering perspective, the contribution of the present work lies not in proposing a new classifier architecture, but in establishing a reproducible, interpretable quantification framework for platform- and domain-constrained visual differentiation grounded in image statistics.
However, the present investigation was intentionally restricted to a single thematic domain, a single publicly accessible text-to-image generation platform, and a Random Forest-based evaluation framework. Accordingly, the findings should be interpreted as evidence of internally robust structural divergence within a controlled proof-of-concept setting rather than as a claim of universal modality detection across all design domains, generative architectures, or classifier families. Future research should therefore examine cross-domain generalizability, multi-platform consistency, alternative classifier behavior, and perceptual correlates to determine the broader applicability of computational compositional signatures in visual design analysis.

Author Contributions

Conceptualization, N.V. and Ç.G.; methodology, N.V.; software, N.V.; validation, N.V.; formal analysis, N.V.; investigation, N.V.; resources, N.V.; data curation, N.V.; writing—original draft preparation, N.V.; writing—review and editing, N.V. and Ç.G.; visualization, N.V.; supervision, N.V.; project administration, N.V.; funding acquisition, N.V. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Ethical review and approval were waived for this study as it involved the computational analysis of publicly available digital images and did not involve human participants or animals.

Informed Consent Statement

Not applicable.

Data Availability Statement

The dataset consists of publicly accessible poster images retrieved from Behance and AI-generated samples produced during this study. Processed numerical feature data used for statistical analysis are available from the corresponding author upon reasonable request.

Acknowledgments

The authors acknowledge KTO Karatay University for institutional and academic support. The authors also thank the developers of the open-source software libraries used in this study. Grammarly (web-based writing tool; Grammarly Inc., San Francisco, CA, USA) was used to improve language during manuscript preparation. The authors have reviewed and approved the final version of the manuscript and take full responsibility for its content.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial Intelligence
GenAI: Generative Artificial Intelligence
PCA: Principal Component Analysis
RF: Random Forest
ROC: Receiver Operating Characteristic
AUC: Area Under the Curve
CV: Cross-Validation
HSV: Hue–Saturation–Value (color space)
RGB: Red, Green, Blue (color space)
SD: Standard Deviation

References

1. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-Resolution Image Synthesis with Latent Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 19–24 June 2022; pp. 10684–10695.
2. Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv 2022, arXiv:2204.06125.
3. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. arXiv 2020, arXiv:2006.11239.
4. Floridi, L.; Cowls, J.; Beltrametti, M.; Chatila, R.; Chazerand, P.; Dignum, V.; Luetge, C.; Madelin, R.; Pagallo, U.; Rossi, F. AI4People-An Ethical Framework for a Good AI Society. Minds Mach. 2018, 28, 689–707.
5. Dwivedi, Y.K.; Kshetri, N.; Hughes, L.; Slade, E.L.; Jeyaraj, A.; Kar, A.K.; Baabdullah, A.M.; Koohang, A.; Raghavan, V.; Ahuja, M. So what if ChatGPT wrote it? Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy. Int. J. Inf. Manag. 2023, 71, 102642.
6. Elgammal, A.; Liu, B.; Elhoseiny, M.; Mazzone, M. CAN: Creative Adversarial Networks, Generating Art by Learning about Styles and Deviating from Style Norms. In Proceedings of the 8th International Conference on Computational Creativity, Atlanta, GA, USA, 19–23 June 2017; pp. 96–103.
7. Hertzmann, A. Can Computers Create Art? Arts 2018, 7, 18.
8. Cetinic, E.; She, J. Understanding and Creating Art with AI: Review and Outlook. ACM Trans. Multimed. Comput. Commun. Appl. 2022, 18, 1–22.
9. Wang, S.-Y.; Wang, O.; Zhang, R.; Owens, A.; Efros, A.A. CNN-Generated Images Are Surprisingly Easy to Spot… for Now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 8695–8704.
10. Gragnaniello, D.; Cozzolino, D.; Marra, F.; Poggi, G.; Verdoliva, L. Are GAN-Generated Images Easy to Detect? A Critical Analysis of the State-of-the-Art. arXiv 2021, arXiv:2104.02617.
11. Verdoliva, L. Media Forensics and DeepFakes: An Overview. IEEE J. Sel. Top. Signal Process. 2020, 14, 910–932.
12. Xuan, X.; Peng, B.; Wang, W.; Dong, J. On the Generalization of GAN Image Forensics. arXiv 2019, arXiv:1902.11153.
13. Li, W.; He, P.; Li, H.; Wang, H.; Zhang, R. Detection of GAN-Generated Images by Estimating Artifact Similarity. IEEE Signal Process. Lett. 2021, 28, 2137–2141.
14. Mahara, A.; Rishe, N. Methods and Trends in Detecting AI-Generated Images: A Comprehensive Review. Comput. Sci. Rev. 2026, 60, 100908.
15. Park, D.; Na, H.; Choi, D. Performance Comparison and Visualization of AI-Generated-Image Detection Methods. IEEE Access 2024, 12, 62609–62627.
16. Bird, J.J.; Lotfi, A. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. IEEE Access 2024, 12, 26896–26909.
17. Velásquez-Salamanca, D.; Martín-Pascual, M.Á.; Andreu-Sánchez, C. Interpretation of AI-Generated vs. Human-Made Images. J. Imaging 2025, 11, 227.
18. Redies, C. A Universal Model of Esthetic Perception Based on the Sensory Coding of Natural Stimuli. Spat. Vis. 2007, 21, 97–117.
19. Rigau, J.; Feixas, M.; Sbert, M. Informational Aesthetics Measures. IEEE Comput. Graph. Appl. 2008, 28, 24–34.
20. Machado, P.; Romero, J.; Nadal, M.; Santos, A.; Correia, J.; Carballal, A. Computerized Measures of Visual Complexity. Acta Psychol. 2015, 160, 43–57.
21. Datta, R.; Joshi, D.; Li, J.; Wang, J.Z. Studying Aesthetics in Photographic Images Using a Computational Approach. In Proceedings of the European Conference on Computer Vision, Graz, Austria, 7–13 May 2006; pp. 288–301.
22. Sigaki, H.Y.D.; Perc, M.; Ribeiro, H.V. History of Art Paintings through the Lens of Entropy and Complexity. Proc. Natl. Acad. Sci. USA 2018, 115, E8585–E8594.
23. Çınar Kalenderoğlu, S.; Demiröz, M. Integrating Text-to-Image AI in Architectural Design Education: Analytical Perspectives from a Studio Experience. J. Des. Studio 2024, 6, 247–258.
24. Koç, M.; As, İ. Evaluating the Aesthetic Quality in Computer-Generated Renderings via a Comparative Analysis. IDA Int. Des. Art J. 2025, 7, 256–268.
25. Arnheim, R. Art and Visual Perception: A Psychology of the Creative Eye; University of California Press: Berkeley, CA, USA, 1974.
26. Lidwell, W.; Holden, K.; Butler, J. Universal Principles of Design; Rockport Publishers: Beverly, MA, USA, 2010.
27. Lupton, E.; Phillips, J.C. Graphic Design: The New Basics; Princeton Architectural Press: New York, NY, USA, 2015.
28. Oppenlaender, J. A taxonomy of prompt modifiers for text-to-image generation. Behav. Inf. Technol. 2023, 43, 3763–3776.
29. Berlyne, D.E. Aesthetics and Psychobiology; Appleton-Century-Crofts: New York, NY, USA, 1971.
30. Oliva, A.; Torralba, A. Building the Gist of a Scene: The Role of Global Image Features in Recognition. Prog. Brain Res. 2006, 155, 23–36.
31. Hasler, D.; Süsstrunk, S. Measuring Colourfulness in Natural Images. In Proceedings of the SPIE Human Vision and Electronic Imaging VIII, Santa Clara, CA, USA, 20–24 January 2003; pp. 87–95.
32. Shapiro, S.S.; Wilk, M.B. An Analysis of Variance Test for Normality (Complete Samples). Biometrika 1965, 52, 591–611.
33. Cohen, J. Statistical Power Analysis for the Behavioral Sciences, 2nd ed.; Lawrence Erlbaum Associates: Hillsdale, NJ, USA, 1988.
34. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32.
Figure 1. Workflow of the proposed measurement and testing framework. Human-designed and AI-generated posters were first organized under controlled thematic matching, then subjected to standardized preprocessing and interpretable feature extraction. The extracted descriptors were subsequently evaluated through statistical testing and multivariate analysis to quantify structural divergence between the two poster groups.
Figure 2. The first principal component (PC1) explained 39.22% of the total variance, while the second principal component (PC2) accounted for 22.42%. Together, these two components captured 61.64% of the overall variance in the five-dimensional feature space.
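For reference, explained-variance ratios of the kind reported in Figure 2 can be computed as follows; standardizing the mixed-scale descriptors before PCA is an assumption consistent with common practice, and the feature matrix here is a random stand-in rather than the study's data.

```python
# Sketch of the explained-variance computation behind Figure 2.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))                 # stand-in for the 5-feature matrix
Z = StandardScaler().fit_transform(X)         # assumed standardization step
pca = PCA(n_components=2).fit(Z)
print(pca.explained_variance_ratio_)          # [PC1 share, PC2 share]
```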
Figure 3. Confusion matrix obtained from pooled out-of-fold predictions using stratified 5-fold cross-validation. The classifier correctly identified 98 human-designed and 93 AI-generated posters.
Figure 4. Receiver operating characteristic (ROC) curve illustrating the classification performance of the proposed model in distinguishing between Human and AI-generated samples. The model demonstrated excellent discriminative ability with an area under the curve (AUC) of 0.991. The dashed diagonal line represents the performance of a random classifier.
Figure 5. SHAP summary (beeswarm) plot illustrating feature-level contributions to the Random Forest classifier. Each point represents an individual poster instance, with color indicating feature value (blue = low, red = high). Horizontal position reflects the SHAP value (impact on model output). Padding ratio exhibits the strongest directional and magnitude contribution to classification, followed by hue entropy, whereas edge density, colorfulness, and Shannon entropy show comparatively limited influence.
Figure 6. Representative best-case, typical-case, and worst-case examples from the cross-validated poster classification analysis. The top row (a–c) presents human-designed posters, whereas the bottom row (d–f) presents AI-generated posters. Best-case examples were defined as correctly classified samples with the highest class-confidence scores, typical-case examples as correctly classified samples whose confidence values were closest to the class-specific median, and worst-case examples as misclassified samples. These examples were selected from pooled out-of-fold predictions obtained during stratified five-fold cross-validation to illustrate clear, representative, and challenging classification outcomes.
Table 1. Comparison of representative methodologies for distinguishing human-designed and AI-generated visual compositions.

| Methodology (Reference) | Evaluated Characteristics | Detection/Analysis Framework | Interpretability | Main Limitation | Distinguishing Position Relative to the Present Study |
|---|---|---|---|---|---|
| Deep neural artifact-based detection [9,10,13] | Pixel-level anomalies, frequency-domain irregularities, and architecture-specific synthetic fingerprints. | CNN-based detectors and deep feature embeddings. | Low (black-box). | Strong dependence on learned representations; limited transparency; may be sensitive to changes in generative architectures. | The present study shifts the focus from artifact detection to interpretable structural divergence analysis. |
| Explainable or feature-based AI-image identification [15,16,17] | Synthesized visual features, explainable indicators, and image-level differences between AI-generated and human-made content. | Explainable AI frameworks and classifier-oriented visual analysis. | Moderate. | Primarily classification-oriented; limited emphasis on controlled compositional structure and thematic matching. | The present study emphasizes controlled thematic comparison and interpretable compositional quantification rather than classification alone. |
| Design- and aesthetics-oriented comparative studies [23,24] | Aesthetic quality, compositional judgment, and visually perceived differences in computer-generated imagery or AI-assisted design contexts. | Comparative design analysis and perceptual/aesthetic evaluation. | Moderate to high. | Not primarily designed for human–AI structural discrimination or reproducible compositional measurement. | The present study introduces a formalized quantification framework for structural and chromatic divergence in a controlled poster dataset. |
| Proposed method (this study) | Structural and compositional priors, including padding ratio (negative space), hue entropy, edge density, rule-of-thirds activation, and connected component density. | Transparent machine-learning framework (Random Forest) with SHAP, permutation importance, ablation analysis, and supplementary classifier sensitivity testing. | High (interpretable/white-box oriented). | Restricted to a controlled thematic domain and a single publicly accessible text-to-image generation platform; excludes higher-order semantic and OCR-based descriptors. | Reframes human–AI differentiation as a structural divergence problem and quantifies domain- and platform-constrained compositional differences using interpretable descriptors. |
Table 2. Summary of key analytical parameters, selected values, and their methodological roles.

| Parameter | Value Used | Section | Rationale | Expected Effect if Changed |
|---|---|---|---|---|
| Cropping strategy | Center-focused cropping | Section 2.2.1 | Reduces peripheral framing inconsistencies and edge artifacts while preserving the main compositional field. | More aggressive cropping may remove meaningful layout information; no cropping may increase framing-induced variability. |
| Resize resolution | 1024 × 1024 pixels | Section 2.2.1 | Balances sufficient spatial detail for feature extraction with comparability across heterogeneous poster formats. | Lower resolutions may suppress fine structural transitions and color variation; higher resolutions may increase computational cost without proportionate benefit. |
| Interpolation method | Bilinear interpolation | Section 2.2.1 | Provides stable geometric normalization while minimizing abrupt resampling artifacts. | Alternative interpolation methods may slightly alter edge transitions and fine-grained texture responses. |
| Color space conversion | RGB → HSV | Section 2.2.2 | Separates chromatic information from intensity and supports hue-based analysis more directly than RGB space. | Retaining RGB would reduce the specificity of hue-based metrics and complicate chromatic masking. |
| Saturation threshold (hue) | S ≤ 25 excluded | Section 2.2.2 | Removes near-achromatic regions so that hue entropy reflects meaningful chromatic organization rather than background noise. | Stricter or looser thresholds may alter the sensitivity of hue entropy to low-saturation background regions. |
| Edge detector | Canny edge detection | Section 2.3 | Provides a standardized estimate of local structural transition density. | Different edge detectors or thresholds may change the absolute edge density values and sensitivity to fine boundaries. |
| Significance threshold | α = 0.05 | Section 2.4.1 | Standard significance criterion for inferential comparison. | Stricter thresholds reduce false positives but may decrease sensitivity to moderate effects. |
| Cross-validation folds | 5 folds | Section 2.4.2 | Appropriate balance between robustness and stability for a moderate-sized dataset. | Fewer folds may increase variance; more folds may raise computational cost and increase split sensitivity. |
| Bootstrap iterations | 1000 | Section 2.4.3 | Provides a stable empirical estimate of ROC-AUC confidence intervals. | Fewer iterations may yield less stable interval estimates; more iterations increase computation with diminishing practical gain. |
| Primary classifier | Random Forest | Section 2.4.2 | Supports nonlinear interactions, moderate-sized data, and interpretable feature-importance analysis. | Alternative classifiers may yield different performance profiles and interpretability characteristics. |
| Supplementary SVM kernel | RBF | Section 3.10 | Captures nonlinear class boundaries in the original five-feature configuration. | A linear kernel may reduce flexibility if class separation is nonlinear. |
| Supplementary k-NN neighbors | k = 5 | Section 3.10 | Standard local-neighborhood setting for sensitivity comparison. | Smaller k may increase sensitivity to noise; larger k may oversmooth local class structure. |
| Structural descriptor set | Rule-of-Thirds Activation, Connected Component Density | Section 2.3 and Section 3.9 | Extends the framework beyond basic chromatic and density features to capture anchoring and fragmentation. | Excluding these descriptors reduces structural interpretability; including additional descriptors may complicate separability. |
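A minimal sketch of the preprocessing choices summarized in Table 2 is given below (center-focused crop, 1024 × 1024 bilinear resize, RGB → HSV conversion). The square center-crop geometry is one plausible reading of "center-focused cropping"; the authors' exact cropping logic may differ.

```python
# Sketch of the Table 2 preprocessing parameters; "poster.png" is a
# hypothetical input path and the square crop is an assumed geometry.
import cv2

def preprocess(path: str, size: int = 1024):
    bgr = cv2.imread(path)                    # OpenCV loads images as BGR
    h, w = bgr.shape[:2]
    side = min(h, w)                          # largest centered square region
    top, left = (h - side) // 2, (w - side) // 2
    crop = bgr[top:top + side, left:left + side]
    resized = cv2.resize(crop, (size, size),
                         interpolation=cv2.INTER_LINEAR)   # bilinear
    hsv = cv2.cvtColor(resized, cv2.COLOR_BGR2HSV)
    return resized, hsv
```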
Table 3. Comparison of visual feature distributions between human-designed and AI-generated posters. Group differences were evaluated using the Mann–Whitney U test. Effect sizes are reported as rank-biserial correlation coefficients (r).

| Metric | Human (Mean ± SD) | AI (Mean ± SD) | p-Value | Rank-Biserial r |
|---|---|---|---|---|
| Edge Density | 0.039 ± 0.025 | 0.030 ± 0.030 | <0.001 | −0.350 |
| Shannon Entropy | 3.577 ± 1.269 | 3.631 ± 1.068 | 0.925 | −0.008 |
| Colorfulness | 54.025 ± 30.592 | 35.963 ± 18.304 | <0.001 | −0.367 |
| Hue Entropy | 0.839 ± 0.150 | 0.998 ± 0.126 | <0.001 | +0.500 |
| Padding Ratio | 0.505 ± 0.179 | 0.171 ± 0.297 | <0.001 | −0.661 |
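The group comparisons in Table 3 can be reproduced in outline as follows: a two-sided Mann–Whitney U test with the rank-biserial correlation derived as r = 1 − 2U/(n₁·n₂). The input vectors below are random stand-ins, not the study's measurements.

```python
# Sketch of the Table 3 group comparison on stand-in data.
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(0)
human = rng.normal(0.50, 0.18, size=100)      # stand-in padding ratios
ai = rng.normal(0.17, 0.30, size=100)

u, p = mannwhitneyu(human, ai, alternative="two-sided")
r = 1.0 - 2.0 * u / (len(human) * len(ai))    # rank-biserial correlation
print(f"U={u:.1f}, p={p:.2e}, r={r:+.3f}")
```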
Table 4. Summary of robustness and structural divergence analyses.

| Analysis Type | Metric | Value | Interpretation |
|---|---|---|---|
| Cross-validation stability | Mean AUC | 0.991 ± 0.004 | High performance stability |
| OOF Bootstrap | AUC (95% CI) | 0.99 (0.978–0.999) | Robust modality separation |
| Feature Ablation | AUC (without padding) | 0.903 | Multi-feature persistence |
| Distributional Divergence | KS Statistic | 0.76 (p < 10⁻²⁸) | Strong structural separation |
| Effect Size | Cohen's d | 1.365 | Extremely large effect |
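The divergence statistics in Table 4 correspond to standard SciPy/NumPy computations, sketched below with random stand-in data: the two-sample Kolmogorov–Smirnov statistic and a pooled-standard-deviation Cohen's d.

```python
# Sketch of the Table 4 divergence statistics on stand-in data.
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(0)
human = rng.normal(0.50, 0.18, size=100)
ai = rng.normal(0.17, 0.30, size=100)

ks, p = ks_2samp(human, ai)
# Pooled SD for equal group sizes; Cohen's d as the standardized mean difference.
pooled = np.sqrt((human.std(ddof=1) ** 2 + ai.std(ddof=1) ** 2) / 2.0)
d = (human.mean() - ai.mean()) / pooled
print(f"KS={ks:.2f} (p={p:.1e}), Cohen's d={d:.3f}")
```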
Table 5. Structural robustness and feature family ablation results.

| Feature Group | Accuracy | ROC-AUC |
|---|---|---|
| Original | 0.955 | 0.9910 |
| Structural | 0.815 | 0.8820 |
| Combined | 0.955 | 0.9935 |
Table 6. Comparative performance of traditional classifiers on the original five-feature configuration under stratified five-fold cross-validation.

| Classifier | Feature Set | Mean Accuracy | Mean ROC-AUC |
|---|---|---|---|
| Random Forest | Original 5 features | 0.955 | 0.991 |
| SVM (RBF) | Original 5 features | 0.970 | 0.996 |
| k-NN (k = 5) | Original 5 features | 0.955 | 0.991 |
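The supplementary comparison in Table 6 follows a standard pattern, sketched below on random stand-in data under the same stratified five-fold protocol; standardizing inputs for the SVM and k-NN models is a common but here unconfirmed assumption.

```python
# Sketch of the three-classifier comparison behind Table 6 on stand-in data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.integers(0, 2, size=200)

models = {
    "Random Forest": RandomForestClassifier(random_state=42),
    "SVM (RBF)": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "k-NN (k=5)": make_pipeline(StandardScaler(),
                                KNeighborsClassifier(n_neighbors=5)),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
for name, model in models.items():
    s = cross_validate(model, X, y, cv=cv, scoring=("accuracy", "roc_auc"))
    print(f"{name:14s} acc={s['test_accuracy'].mean():.3f} "
          f"auc={s['test_roc_auc'].mean():.3f}")
```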