Article

A Critical Assessment of Modern Generative Models’ Ability to Replicate Artistic Styles

by Andrea Asperti 1,*,†, Franky George 2,†, Tiberio Marras 1,†, Razvan Ciprian Stricescu 1,† and Fabio Zanotti 1,†

1 Department of Informatics-Science and Engineering (DISI), University of Bologna, Via Mura Anteo Zamboni 7, 40126 Bologna, Italy
2 Data Science, Artificial Intelligence & Modelling, University of Hull, Cottingham Rd, Hull HU6 7RX, UK
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Big Data Cogn. Comput. 2025, 9(9), 231; https://doi.org/10.3390/bdcc9090231
Submission received: 4 July 2025 / Revised: 16 August 2025 / Accepted: 1 September 2025 / Published: 6 September 2025

Abstract

In recent years, advancements in generative artificial intelligence have led to the development of sophisticated tools capable of mimicking diverse artistic styles, opening new possibilities for digital creativity and artistic expression. This paper presents a critical assessment of the style replication capabilities of contemporary generative models, evaluating their strengths and limitations across multiple dimensions. We examine how effectively these models reproduce traditional artistic styles while maintaining structural integrity and compositional balance in the generated images. The analysis is based on a new large dataset of AI-generated works imitating artistic styles of the past, holding potential for a wide range of applications: the “AI-Pastiche” dataset. This study is supported by extensive user surveys, collecting diverse opinions on the dataset and investigating both technical and aesthetic challenges, including the ability to generate outputs that are realistic and visually convincing, the versatility of models in handling a wide range of artistic styles, and the extent to which they adhere to the content and stylistic specifications outlined in prompts, preserving cohesion and integrity in generated images. This paper aims to provide a comprehensive overview of the current state of generative tools in style replication, offering insights into their technical and artistic limitations, potential advancements in model design and training methodologies, and emerging opportunities for enhancing digital artistry, human–AI collaboration, and the broader creative landscape.

1. Introduction

Generative AI has rapidly expanded into creative fields, transforming how visual art is produced, modified, and experienced. Early breakthroughs, such as StyleGAN [1,2], laid the foundation for high-quality image synthesis, but the field has since been revolutionized by the rapid rise of diffusion-based models [3,4,5]. These newer techniques have significantly enhanced the ability to generate realistic images, mimic artistic styles [6,7], and even create entirely new visual compositions [8,9], establishing diffusion models as the dominant paradigm in generative artistry. Among these capabilities, style replication has emerged as a key area of interest, allowing users to apply diverse historical and modern artistic styles to AI-generated images [10,11,12]. This technology enables greater artistic expression and personalization, bridging the gap between computational creativity and traditional artistry and empowering artists, designers, and hobbyists to explore and reinterpret visual styles in ways that were previously highly specialized or time-intensive [13,14,15,16].
The purpose of this study is to provide a critical assessment of the capabilities and limitations of current generative tools in effectively replicating styles. By examining both the technical performance and aesthetic outcomes of these tools, the study aims to highlight their strengths, identify areas where they fall short, and offer insights into the potential improvements needed to enhance their application in creative fields.
Specifically, we compared twelve modern generative models: DALL·E 3 (https://openai.com/index/dall-e-3/, accessed on 3 July 2025), Stable Diffusion 1.5 (https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5, accessed on 3 July 2025), Stable Diffusion 3.5 large (https://stabledifffusion.com/tools/sd-3-5-large, accessed on 3 July 2025), Flux 1.1 Pro (https://flux1.ai/flux1-1, accessed on 3 July 2025), Flux 1 Schnell (https://fluxaiimagegenerator.com/flux-schnell, accessed on 3 July 2025), Omnigen (https://omnigenai.org/, accessed on 3 July 2025), Ideogram (https://ideogram.ai/login, accessed on 3 July 2025), Kolors 1.5 (https://klingai.com/text-to-image/new, accessed on 3 July 2025), Firefly Image 3 (https://firefly.adobe.com/, accessed on 3 July 2025), Leonardo Phoenix (https://leonardo.ai, accessed on 3 July 2025), Midjourney V6.1 (https://www.midjourney.com/imagine, accessed on 3 July 2025), and Auto-Aesthetics v1 (https://neural.love/blog/auto-aesthetics-v1-ai-art-revolution, accessed on 3 July 2025).
The models were compared using 73 uniform prompts that span a broad range of painting styles from the past five centuries. This resulted in the creation of a large supervised dataset of AI-generated artworks: the AI-Pastiche dataset. The dataset was explicitly conceived to highlight the multimodal interplay between language and image, focusing on how textual prompts guide visual style and content. The dataset can also offer a valuable resource for advancing research in areas such as deepfake detection, digital forensics, and the ethical study of AI-generated content. By supplying a controlled, high-quality set of AI-generated images, the dataset aids in training and testing models for improved detection accuracy, robustness against manipulation, and broader exploration of generative AI capabilities across fields ranging from security to digital art.
The quality of the generated images was evaluated based on two criteria: the ability of the models to faithfully replicate human-crafted artwork and their capacity to faithfully adhere to the style and content specified in the prompts, preserving coherence and integrity of the composition. The first criterion was assessed through a public survey in which participants were asked to distinguish between human-created and AI-generated images. The second criterion, which involved a per-prompt comparison of the samples generated by different models, was evaluated directly by members of our team along with a few additional volunteers.
The novelty of our approach consists precisely in the focus on human perception and qualitative evaluation across multiple generators, offering a blend of subjective assessment, style focus, and prompt–image alignment, all grounded in human judgment.
The results of our investigation reveal that while modern generative models demonstrate remarkable artistic capabilities, they still encounter significant challenges in faithfully replicating historical styles. Rather than a lack of detail, hyperrealism emerges as the primary obstacle—AI-generated images often display excessive sharpness and unnatural precision, making them visually striking but historically inconsistent. According to our evaluation, state-of-the-art models successfully produce images that non-expert users misidentify as human-created in less than 30% of cases, highlighting the persistent gap between AI-generated and traditionally crafted artworks.
This work is part of a larger and ambitious project that aims to assess whether Large Language Models possess an aesthetic sense and, if so, to identify the aesthetic principles that guide their preferences. This investigation represents a significant advancement in understanding the emergent abilities of LLMs [17,18,19,20] and their social implications. In evaluating the aesthetic sense of LLMs, it is essential to bypass any potential familiarity that the models may have with specific artworks, as this could allow them to draw on pre-existing evaluations or learned data. By using a dataset of fictional or AI-generated artworks, such as the one created as part of this study, we can ensure that LLMs rely solely on the information provided in the dataset, thus offering a more controlled evaluation of their aesthetic judgment.
In summary, this work makes two major contributions:
  • The creation of a well-curated, richly annotated dataset of AI-generated images, covering some of the most widely used generative models, suitable for a wide range of applications;
  • An in-depth evaluation through targeted human surveys, primarily assessing the perceived authenticity of generated images and their adherence to the prompt.
There are a few important disclaimers to make. First, we emphasize that the aim of this research study is not to conduct a comparative qualitative evaluation of different models—a task that would require a significantly larger dataset—but rather to assess the overall performance of these models in the style replication task. The current limitations of such systems are often evident even in a small number of examples; thus, the scale of AI-Pastiche is largely sufficient for a meaningful and focused assessment, demonstrating that problematic cases are not cherry-picked anomalies but rather recurring issues.
We see AI-Pastiche as a foundation for future comparative and diagnostic research across different modeling approaches and aesthetic criteria. The surveys included in this work are designed to enrich the dataset with human evaluations that would be impossible to obtain through automated methods. The choice to involve non-expert participants reflects our interest in the broader interpretability and public reception of AI-generated art. It is not in the scope of this article to carry out in-depth analyses using specific metrics or automatic image–text alignment scores; such investigations are precisely the kind of research we hope to enable and encourage through the release of this dataset.
The survey evaluation was conducted with a population of educated though non-expert users. We contend that at the current stage of development, state-of-the-art style replication does not yet require expert evaluation to identify its major limitations. In any case, expert and non-expert perspectives provide different yet equally valuable insights. No personal or identifiable information was collected at any point during the study. Participants were informed about the purpose and scope of the survey on the introduction page and were given the option to either proceed or opt out. They were also free to discontinue their participation at any time, and any incomplete responses were excluded from the dataset. We carefully considered the ethical implications of conducting a public survey of this scale and ensured that participation remained fully anonymous, voluntary, and non-invasive throughout the process.
Finally, we acknowledge that paintings are not merely flat images—they are anaglyphic, possessing 2½D qualities that play a critical role in perception through a complex interaction of texture, viewer, and context [21,22]. That said, the current quality of generative outputs does not yet reach a level where such subtle perceptual dimensions come into play in a meaningful way. As generative models continue to improve, perceptual features such as surface texture, depth, and material realism may gradually become more relevant for evaluation.
This article has the following structure: Section 2 reviews related work. In Section 3, we describe our methodology, the way the dataset was created, the selection of models, and the way surveys were formulated and conducted. Section 4 gives a detailed description of the dataset and its associated metadata. In Section 5 we give a detailed description of the surveys, the target audience, and the frameworks used to publish and collect data. Section 6 describes the results of the evaluation. An in-depth discussion of some of the main critical aspects of the style transfer capabilities of generative tools is given in Section 7. In Section 8, we offer a few ideas for future developments and outline some possible applications of our dataset.

2. Related Works

AI-driven artistic style transfer has grown significantly in recent years, driven by advances in deep learning and generative models. Several works have explored the capabilities, limitations, and applications of AI-generated imagery. Our work contributes with a comprehensive evaluation of multiple generative models, emphasizing their adherence to artistic style and prompt fidelity.
Early work, such as that of Gatys et al. [23], laid the foundation for neural style transfer, introducing methods that blend content and style representations from convolutional neural networks. Subsequent research expanded on these concepts, improving efficiency and control over style application [24]. More recently, diffusion-based models have demonstrated superior results in high-fidelity artistic synthesis, allowing for more nuanced style adaptation. Our study builds upon these advancements but diverges in its focus on evaluating multiple state-of-the-art models across diverse artistic styles and historical periods, allowing for a broader assessment of model performance.
One major area of focus has been the evaluation and detection of AI-generated images. For instance, studies such as CIFAKE by Bird and Lotfi [25] and GenImage by Zhu et al. [26] have measured how realistic synthetic images are and developed techniques to tell them apart from human-made art. Similarly, Li et al. [27] explored adversarial AI-generated art, shedding light on the challenges of authentication and detection. These efforts are vital to assessing the authenticity of generated works, particularly in contexts where human perception plays a critical role.
To support this kind of research, several large-scale datasets have been created:
  • ArtiFact Dataset [28]: This is a diverse mix of real and synthetic images, covering everything from human faces to animals to landscapes, vehicles, and artworks. It includes images synthesized by 25 different methods, including 13 GAN-based models and 7 diffusion models.
  • WildFake Dataset [29]: A dataset designed to assess the generalizability of AI-generated image detection models. It contains fake images sourced from the open-source community, covering various styles and synthesis methods.
  • TWIGMA Dataset [30]: A large-scale collection of AI-generated images scraped from Twitter, from 2021 to 2023, including metadata such as tweet text, engagement metrics, and associated hashtags.
While these studies focus on detecting AI-generated images, we focus on examining how convincingly these images replicate human-created art. Through public perception surveys, we assess whether generated paintings can be mistaken for human artwork, providing insights into the models’ ability to deceive a human viewer.
Beyond detection, generated images are increasingly used as data sources for synthetic training and research applications. The work by Yang et al. [31] discusses the implications of using AI-generated images for training machine learning models. They explore the potential of synthetic datasets to enhance machine learning capabilities while also addressing concerns related to biases, authenticity, and ethical challenges.
Another direction in the field is the use of diffusion models for artistic style transfer. Researchers such as Chung et al. [11] and Zhang et al. [7,12] have introduced training-free methods and pre-trained diffusion models specifically designed for style adaptation. These works highlight the effectiveness of modern diffusion-based architectures in achieving high-fidelity artistic synthesis while maintaining flexibility for style injection. Furthermore, the work by Png et al. [10] proposes a feature-guided approach that improves control over the stylistic aspects of the generated output.
The creative applications of generative AI have also been widely discussed. Haase et al. [15] explored the role of generated imagery in inspiring human creativity, particularly in design workflows. Similarly, Barros and Ai [13] investigated the integration of text-to-image models in industrial design, while Vartiainen and Tedre [16] examined their use in craft education. We complement these works by examining the limitations of generative tools in artistic fidelity, particularly their struggle with maintaining compositional balance, avoiding anachronisms, and ensuring stylistic coherence. We highlight critical shortcomings such as overuse of hyperrealism, anatomical distortions, and misinterpretations of historical context, which could be key obstacles to seamless integration into professional artistic workflow.
Furthermore, a growing body of work focuses on understanding the emergent capabilities of Large Language Models and their application in aesthetic evaluation. Studies such as those by Wei et al. [17] and Du et al. [19] discuss how LLMs develop new abilities, such as the preference for certain artistic styles. Wang et al. [32] analyzed evaluation metrics for generative images, offering insights into how to assess AI-generated art both quantitatively and qualitatively. These studies can be expanded with our proposed dataset, which, unlike other existing ones, is a controlled dataset of synthetic artworks.

3. Methodology

In this section we outline our methodology for the creation of the dataset, the selection of models, and their evaluation. Our goal was to build a controlled, art-style-focused dataset to evaluate whether modern diffusion models can produce images that are both visually coherent and stylistically faithful to well-defined art styles. To make results comparable across models, we standardized the prompts, generation settings, and metadata.

3.1. Creation of the Dataset, Aims, and Methodology Used for Data Acquisition

Crafting high-quality prompts is crucial because prompt design directly governs relevance and stylistic fidelity of outputs [33,34,35]. The prompt creation process began with obtaining the detailed descriptions of notable artworks drawn from authoritative art history sources and online museum archives. Using these references, we employed a two-stage, reverse-engineering-inspired method to fine-tune the prompts, adapting the descriptors to match the structural and length constraints typically supported by contemporary generative models.
In Stage 1, we manually drafted an initial set of prompts, with a common structure. They typically began with an indication of the style and historical period to imitate, sometimes reinforced by referencing a specific painter. This was followed by a detailed description of the subject, including suggestions for lighting, colors, and tones. Finally, each prompt concluded with a hint about the overall sentiment or emotion the artwork was intended to convey. These prompts were then tested on a subset of the diffusion models selected for the study to identify potential failure modes, including style drift, incorrect motifs, and anachronistic elements.
In Stage 2, the manual prompts were iteratively revised with GPT-4o to strengthen style cues by adding period-specific vocabulary, removing unnecessary adjectives, and clarifying composition. After each revision, a small batch of images was regenerated per model using the refined prompts. A prompt was retained for final art generation only if (i) the outputs consistently displayed the intended properties across a majority of models and (ii) GPT-4o, when given only the generated image, could correctly identify the target style described in the original prompt. Here are a couple of example prompts:
  • “Generate a detailed winter landscape painting in the Flemish Renaissance style of the second half of the 16th century. Depict a snow-covered village with small, rustic houses nestled into a hilly landscape. Include bare, slender trees in the foreground with hunters walking through the snow, accompanied by dogs. The scene should feature frozen lakes or ponds in the background, where villagers are skating and engaging in winter activities. The sky is a muted, wintry blue-gray, and the overall tone of the painting should evoke a peaceful, yet somewhat melancholic atmosphere, with intricate details showing rural life during winter.”
  • “Generate a view of Venice in the Vedutism style of the first half of the 18th century, focusing on a scene along the Grand Canal. The composition features detailed classical architecture with grand domes and facades, and gondolas moving along the canal. Add soft clouds to the sky and ensure there is little fading in the horizon, providing clear visibility of distant buildings. The color palette should include very soft blues and warm earth tones, avoiding saturated colors. The atmosphere remains calm and luminous, with minimal light-and-shadow effects, capturing the beauty and grandeur of Venice from a broad perspective.”
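The Stage 2 retention check (ii) can be sketched as follows. This is a minimal illustration assuming the OpenAI Python SDK; the question wording and the substring-matching rule are our own assumptions, not the exact procedure used:

```python
import base64
from openai import OpenAI

client = OpenAI()

def identifies_target_style(image_path: str, target_style: str) -> bool:
    """Retention check: given only the generated image, ask GPT-4o which
    style it imitates and compare the answer with the intended style."""
    with open(image_path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    reply = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Which artistic style and historical period does "
                         "this painting imitate? Answer briefly."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    # Naive matching rule (an assumption): the target style name must
    # appear verbatim in the model's answer.
    return target_style.lower() in reply.choices[0].message.content.lower()
```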
To ensure comparability, each accepted prompt was used unchanged across all diffusion models. For each (prompt, model) pair, we generated a fixed number of samples at the model’s native target resolution, using consistent sampler hyperparameters set to the model’s default values. We avoided post-processing and upscaling.
Each image is stored with detailed generation metadata, including the model, prompt text, subject, style, and period. This allows for the regeneration of similar samples and supports controlled ablation studies across models and settings. Further details on the dataset are provided in Section 4.
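A minimal sketch combining this generation protocol with the metadata storage just described, for the open-weight models driven through the diffusers library (the file layout, the number of samples per pair, and the annotation values are illustrative assumptions; proprietary models were accessed through their own interfaces):

```python
import json
import pathlib

import torch
from diffusers import StableDiffusionPipeline

# One of the 73 accepted prompts, with its descriptive annotations.
prompts = [{
    "text": "Generate a detailed winter landscape painting in the Flemish "
            "Renaissance style of the second half of the 16th century. ...",
    "subject": ["landscape", "village", "snow"],
    "style": "Flemish Renaissance",
    "period": "16th century",
}]
samples_per_pair = 2  # assumption: fixed sample count per (prompt, model) pair

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

out = pathlib.Path("ai_pastiche_raw")
out.mkdir(exist_ok=True)

for p_idx, prompt in enumerate(prompts):
    for i in range(samples_per_pair):
        # Default sampler hyperparameters, native resolution,
        # no post-processing and no upscaling.
        image = pipe(prompt["text"]).images[0]
        stem = f"sd15_{p_idx:03d}_{i}"
        image.save(out / f"{stem}.png")
        # Store the generation metadata alongside the image.
        with open(out / f"{stem}.json", "w") as f:
            json.dump({"model": "Stable Diffusion 1.5", **prompt}, f, indent=2)
```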

3.2. Models

Image generative models are a class of machine learning algorithms designed to synthesize novel images by learning the underlying patterns in existing data. By approximating the underlying distribution of visual data, these models generate outputs that form the foundation of various creative AI applications.
Within the domain of image generation, models are broadly categorized into Text-to-Image (Text2Img) and Image-to-Image (Img2Img) frameworks [36,37], although hybrid and specialized approaches also exist. Text2Img models generate entirely new images based on textual descriptions, effectively translating linguistic cues into visual representations. In contrast, Img2Img models modify or enhance existing images by leveraging an input image as a reference while applying stylistic or contextual transformations. This study primarily focuses on Text2Img models due to their ability to create images purely from descriptive text prompts, making them particularly suited for analyzing artistic style recreation.
To systematically evaluate the artistic fidelity and limitations of state-of-the-art (SOTA) commercial generative models, 12 diffusion-based models were selected, among the most widely used and highly regarded in the field. These models were identified based on their popularity and performance, as detailed in the Introduction. The selection was motivated by three key considerations:
1. Benchmarking Established Models: Using well-established models enables the creation of a high-quality AI-generated art dataset, which could serve as a valuable resource for future research.
2. Avoiding Training and Fine-Tuning Biases: Training a model from scratch or fine-tuning an existing open-source model would not provide a fair assessment of the out-of-the-box capabilities of these models. Our goal was to evaluate their pre-trained performance rather than their adaptability to new training objectives.
3. Computational Constraints: Training or fine-tuning diffusion models is highly resource-intensive. Proprietary models, in particular, are trained on vast datasets with ongoing refinements by dedicated research teams, making them the most suitable candidates for assessing the current peak capabilities of image generative AI.
Initially, 15 diffusion models were considered. Each model was tested using three standardized prompts to evaluate its ability to generate visually coherent and stylistically accurate images. Five researchers independently assessed the outputs based on realism, artifact minimization, and adherence to the prompt. A model was discarded if all five unanimously agreed that it failed to meet these criteria. For example, DeepFloyd IF [38] was among the initial 15 models considered but was excluded from further experimentation. Its generated outputs frequently failed to align with the described artistic movements, particularly struggling with facial features and even simple object shapes (e.g., dogs and other animals).
The final selection of 12 models used in our study is listed in the Introduction, with key specifications summarized in Table 1. It is important to note that many of these models are proprietary, and as a result, their architectural details and training methodologies remain undisclosed.

3.3. Evaluation Criteria

Models are evaluated based on two distinct and orthogonal criteria, each addressing a crucial aspect of their performance:
  • Authenticity. The first criterion evaluates the model’s ability to generate samples that are sufficiently realistic and convincing, such that they could be mistaken for artifacts created by a human. This involves assessing the quality of the generated output in terms of visual coherence, attention to detail, and overall believability. A high score in this area indicates that the model produces outputs that closely mimic human creativity and craftsmanship.
  • Adherence to Prompt Instructions. The second criterion focuses on the model’s capacity to accurately follow the detailed instructions specified in the prompt. This involves assessing how well the generated outputs align with the intended artistic style, thematic elements, or any specific requirements outlined. Success in this area demonstrates the model’s ability to interpret and faithfully execute complex and nuanced instructions.
These two evaluation criteria are deliberately designed to be independent. While a model may excel in producing outputs faithfully mimicking human art crafts, it might still fail to accurately adhere to the stylistic constraints of the prompt, or vice versa. By assessing these dimensions separately, this research study aims to obtain a comprehensive understanding of the model’s strengths and weaknesses across both realism and prompt alignment.
The way these criteria are addressed in our surveys will be described in Section 5.

4. The AI-Pastiche Dataset

AI-Pastiche is a carefully curated multimodal dataset comprising 953 AI-generated paintings in well-known historical artistic styles. These images were produced using 73 manually crafted textual prompts, each describing an imaginary artwork rendered in the visual language of a specific artistic tradition. The prompts incorporate references to subject matter, composition, and aesthetic features characteristic of different periods, enabling a focused investigation into how generative models interpret and reproduce stylistic and cultural cues. For each prompt, one or more images were generated across a selection of state-of-the-art text-to-image models (Section 3.2). The dataset also includes comprehensive metadata detailing the generation process, as well as human evaluation scores supporting analysis of prompt–image alignment, stylistic fidelity, and technical quality.

4.1. Dataset Objectives

The two primary purposes of the dataset are to
1. Analyze the capabilities and limitations of SOTA generative models in accurately recreating well-known painting styles;
2. Provide a high-quality AI-generated painting dataset for the research community, facilitating future studies on generative AI in artistic domains.
While the current dataset consists of 953 carefully selected images, we plan to expand it in future iterations, incorporating additional artistic styles and more diverse prompts to further evaluate model performance and limitations.

4.2. Metadata and Composition

The AI-Pastiche dataset includes detailed metadata for each generated painting, summarized in Table 2. It is important to note that attributes such as subject, style, and period correspond to the intended description in the prompt rather than a direct analysis of the generated image itself.
At present, the dataset exhibits some stylistic imbalances, particularly in terms of artistic periods (Table 3) and movements (Table 4). In future expansions, we aim to mitigate these imbalances by incorporating a broader range of historical styles and more diverse prompts.
All data from the AI-Pastiche dataset are available at the following Kaggle repository: https://www.kaggle.com/datasets/asperticsuniboit/deepfakedatabase/, accessed on 3 July 2025.

5. Human Surveys

To evaluate model performance according to the criteria outlined in Section 3, two distinct human surveys were designed and conducted, and their results were collected for analysis.

5.1. Authenticity

With authenticity, we refer to the extent to which a model generates outputs that convincingly resemble human-made creations.
The evaluation was conducted using a survey-based approach, where participants were asked to classify images as either AI-generated or human-made. For the human-made paintings, a subset of open-access images from the National Gallery of Art in Washington (https://www.nga.gov/open-access-images.html, accessed on 3 July 2025) was used. Participants were shown a set of 20 images, one at a time and in sequence, comprising a random mix of genuine and AI-generated works, and were asked to classify each image individually.
To ensure unbiased and reliable responses, the survey presented the images in randomized order, without any metadata or contextual information that could hint at their origin. This design encouraged participants to base their judgments solely on the visual and stylistic qualities of the images.
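A minimal sketch of this batch-assembly logic follows; the per-batch ratio of AI to genuine images is an assumption, as the survey design only fixes a random mix of 20 images:

```python
import random

def make_survey_batch(ai_images, human_images, k_total=20, seed=None):
    """Assemble one participant's batch: a random mix of AI-generated and
    genuine paintings, shuffled so that position carries no cue about origin."""
    rng = random.Random(seed)
    k_ai = rng.randint(8, 12)  # assumption: roughly balanced, randomized split
    batch = rng.sample(ai_images, k_ai) + rng.sample(human_images, k_total - k_ai)
    rng.shuffle(batch)  # randomized presentation order, no metadata attached
    return batch
```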
The survey was conducted anonymously, and no personal information was collected in accordance with privacy considerations. Given the focus on European painting, it was acknowledged that cultural background could influence participants’ perceptions. To account for this, participants were asked whether they identified with a European cultural background, with the option to decline to answer.
The survey reached approximately 600 participants, selected from a diverse pool to capture a wide range of perspectives. Most of the participants were students and colleagues, suggesting a relatively high level of education and some familiarity with artistic aesthetics, though typically without formal training in art critique. This selection was intentional, as it reflected the anticipated audience for AI-generated art in real-world scenarios. The study aimed to evaluate perceptual authenticity as it might be experienced by the general public.

5.2. Adherence to Prompt Instructions

The purpose of this evaluation is to assess each generated image based on its alignment with the requirements specified in the given prompt.
This classification task is significantly more complex than the previous one, as it requires a careful reading and thorough understanding of the prompt, as well as a comparative evaluation of outputs from different models. For this reason, participation was restricted to a selected group, comprising members of our research group, colleagues from the Department of Fine Arts, and some of their students. While the number of participants was considerably smaller than for the first survey, each person evaluated multiple prompts, resulting in 5706 entries, with an average of about 475 assessments per model.
In a companion study [48], the potential of fully automated evaluation using models such as CLIP [49] was explored, focusing on its perceptual capabilities across both human-made and AI-generated artworks. While CLIP performs well in anchoring images to broad semantic categories and proves effective in discriminative tasks such as matching a generated sample to its corresponding prompt, it often struggles with the more subjective dimensions of artistic evaluation, including style, historical period, and cultural context. As a result, its assessment of prompt adherence tends to be strongly biased toward content features and does not align closely with human judgments of artistic fidelity.
Our evaluation metric relies on subjective assessments of how well each image satisfies the requirements of the prompt, taking into account content, stylistic fidelity, and technical quality, penalizing the presence of visible defects or artifacts, as well as the lack of cohesion or integrity. Although it is theoretically possible to rank the generated images on a continuous scale, the inherent complexity of the task and the subjective nature of the evaluations led us to simplify the process. Instead, images are categorized into three broad classes: low, medium, and high alignment with the prompt.
These classifications—low, medium, and high—are not absolute or universal but are defined relative to the specific set of images generated for each prompt. This relative approach ensures that the evaluation accounts for the context and inherent variability within each batch of images.

6. Results of the Surveys

This section presents and analyzes the results of the surveys. As previously noted, the goal is not to compare the performance of different models but rather to offer a clearer understanding of the current state of the field. Our focus is on identifying the persistent challenges faced by generative models, highlighting specific problem areas, and discussing potential directions for improvement. By examining these limitations, we aim to contribute to the broader discourse on how these models can be refined and enhanced for more reliable and aesthetically convincing outputs.

6.1. Authenticity Results

Figure 1 depicts the confusion matrix resulting from the survey: overall, around 29% of AI-generated images were mistakenly attributed to humans. Interestingly, a slightly lower but still substantial share of human-created images was attributed to AI: in this case, the misclassification percentage is around 20%.
Figure 2 illustrates the frequency distribution of misclassification percentages for AI-generated images in the dataset. The distribution is skewed toward lower misclassification percentages, with a small subset of images achieving a perfect authenticity score.
Since the purpose of this work is not to rank models but merely to understand the overall state of the art, we only provide a summary evaluation of the six best models, as evidenced by our survey. The results are summarized in Table 5.
The best-performing model appears to be Ideogram, achieving an impressive authenticity rate close to 50%. It is also noteworthy that relatively older models, such as Stable Diffusion 1.5 and Omnigen, perform comparatively well against more recent competitors. As apparent from the results of the second survey (see Section 6.2), this is partly due to these models adopting a more liberal interpretation of the prompt, often sacrificing strict prompt adherence in favor of aesthetic quality.
Some examples of AI-generated artifacts, in different styles and periods, that were among the most frequently classified as human-made in our survey are shown in Figure 3.
A per-period investigation (see Table 6) shows that, not surprisingly, generative models perform particularly well in mimicking art of the last century and (some styles of) the 19th century. They clearly have much more trouble producing convincing artifacts from earlier periods.
Analyzing results according to artistic styles is complicated by the current underrepresentation of certain movements: values have been reported for the sake of completeness, but their statistical significance is modest. For instance, as shown in Table 7, the style with the highest model performance is “Art Nouveau”.
However, we have only a single prompt associated with this label, depicting a pencil sketch of a seated man in a pensive attitude. A few examples are shown in Figure 4. Due to the schematic simplicity of both the subject and the technique, it is not surprising that many of the AI-generated artifacts have been mistakenly perceived as human-made.
A similar problem arises with the “Satirical” style. Again, we have only one prompt associated with this category, referring to a caricature of Otto von Bismarck in the style of the satirical magazine La Lune from the end of the 19th century. Many models created convincing artifacts, as illustrated in Figure 5.
Apart from these cases, generative models appear to be more adept at imitating modern artistic styles, such as Impressionism, Cubism, Dadaism, Futurism, and similar movements. These styles often emphasize abstraction, bold shapes, and expressive brushwork, which align well with the strengths of generative models.
Conversely, models face greater challenges when attempting to replicate older artistic styles, such as Renaissance, Baroque, and Rococo. These styles are characterized by intricate details, realistic depictions, and complex compositions, which demand a level of precision and semantic interpretation that many models struggle to achieve.
Interestingly, the worst performance is observed when models attempt to imitate naïve art. One key reason for this difficulty is the challenge most models face in handling the “flat” perspective typical of this style, as discussed in Section 7.2.3. Unlike classical or modern styles, naïve art often employs a lack of depth, disproportionate figures, and an intuitive rather than rule-based approach to composition. This contradicts the implicit biases of generative models, which are often trained to prioritize realism, shading, and perspective consistency.

6.1.1. Distinction of Results for Cultural Background

As mentioned in the Introduction, participants in the survey were asked to disclose their cultural background to assess its potential impact on the perception of European paintings.
In this regard, the collected data are highly unbalanced, with European participants outnumbering non-European participants by approximately six to one. As a result, any analysis of this factor must be approached with caution, as the sample distribution may limit the reliability of our findings.
The only notable result concerns the misclassification rate for different historical periods, shown in Figure 6.
Not surprisingly, non-European participants tend to misclassify images from the 15th, 16th, and 17th centuries more frequently, likely due to a lower level of familiarity with the artistic movements of those periods. European art from these centuries is deeply rooted in specific cultural and historical contexts, with stylistic conventions that may not be as immediately recognizable to those who have not been extensively exposed to them.
A similar analysis across different artistic styles did not reveal any additional trends significant enough to report.

6.1.2. Influence of the Subject

Our final investigation examines the influence of subject matter on the model’s ability to generate artifacts that can be mistaken for human-made creations. For this analysis, the tags described in Section 4.2 are used. Specifically, each prompt is represented as a multilabel binarization over its associated set of tags. A linear regression was performed to predict the average degree of “authenticity”, as determined by the survey for all entries associated with the selected tags. The analysis was limited to tags appearing in at least two different prompts.
A high degree of predictive accuracy was not expected, given that other factors—such as the intended style and historical period—also influence outcomes. The primary focus of the analysis lies not in the predicted values themselves but in the weights assigned by the model to individual tags, particularly negative ones, which may highlight categories that pose challenges for current generative systems.
After normalizing the output by using Gaussian normalization, a prediction error of approximately 0.4 was obtained (compared with the unit standard deviation). As expected, the prediction accuracy is not particularly high, but it is sufficient to demonstrate a correlation between tags and perceived authenticity.
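A minimal sketch of this tag analysis, assuming scikit-learn; tag_sets (one set of tags per prompt) and authenticity (the per-prompt survey scores) are placeholders for the dataset annotations:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MultiLabelBinarizer

def tag_weights(tag_sets, authenticity, min_prompts=2):
    """Regress the normalized authenticity score on binarized prompt tags
    and return the per-tag weights together with the prediction error."""
    mlb = MultiLabelBinarizer()
    X = mlb.fit_transform(tag_sets)       # multilabel binarization, one column per tag
    keep = X.sum(axis=0) >= min_prompts   # keep tags appearing in at least two prompts
    X = X[:, keep]
    y = np.asarray(authenticity, dtype=float)
    y = (y - y.mean()) / y.std()          # Gaussian normalization (unit std)
    reg = LinearRegression().fit(X, y)
    rmse = float(np.sqrt(np.mean((reg.predict(X) - y) ** 2)))  # ~0.4 in our data
    return dict(zip(np.asarray(mlb.classes_)[keep], reg.coef_)), rmse
```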
In Figure 7, the weights associated with the different tags are shown. The investigation was repeated for all models (blue) and for a restricted subset of models comprising Ideogram, Midjourney, Stable-Diffusion-3.5-large, and DALL·E 3, which obtained high scores in both authenticity and prompt adherence.
Looking at the negative scores, a notable group is composed of tags related to humans: “crowd”, “person”, “persons”, “child”, and “portrait”. This provides strong evidence that generative models still struggle to represent humans convincingly when mimicking artistic painting. In addition, portraits of women tend to present more challenges compared with those of men.
This difficulty may arise from several, sometimes contrasting factors. For example, generative models may fail to achieve realism in highly complex and dynamic scenes involving multiple people or crowds, while at the same time, they may adopt exaggerated hyperrealism in portraits. We discuss these issues in more detail in Section 7.
From the naturalistic point of view, “clouds”, “flowers”, and “water” seem to have a negative impact. The tag “flower” contrasts with “still_life”, which, by comparison, has a significantly more positive score. In our dataset, the negative perception associated with flowers seems to be primarily linked to paintings in the Naïve style—one of the styles where generative models, as observed in the previous section, tend to perform the worst. The negative scores for “clouds” and “water” appear to stem from the inherent complexity of rendering these elements in a way that aligns with the stylistic and historical constraints specified in the prompt. It is also interesting to observe that while the “best” models seem to be able to cope with water in an acceptable way, their performance on “clouds” is even worse than average. We shall discuss this subject in more detail in Section 7.2.3, where we also provide a few examples.
Other tags related to nature, such as “fog”, “snow”, and “trees”, do not appear to pose significant challenges for generative models. These elements are often rendered convincingly, likely due to their relatively uniform structures and the abundance of high-quality reference images available in training datasets. However, the situation changes when considering specific moments of the day. Night scenes can easily suffer from inconsistencies in lighting and contrast or from the hyperrealistic rendering of specific elements, such as the moon. More notably, dawn and sunset present particular difficulties, as generative models often struggle to capture the complex interplay of warm and cool tones, the gradual transitions in atmospheric lighting, and the way natural and artificial light sources interact during these times. These shortcomings can lead to unnatural gradients, misplaced highlights, or an overall loss of realism, making these scenarios more challenging than other elements related to nature.
The explicit request in the prompt to add visible brushstrokes frequently increases the perception of authenticity. In addition, models generally perform better when prompted to adopt soft, muted tones rather than vibrant or dramatic color schemes. When working with softer tones, the model is more likely to produce balanced, harmonious compositions that align well with a wide range of artistic styles. In contrast, when tasked with generating highly saturated or dramatic lighting effects, models often tend to over-interpret the request, leading to exaggerated contrasts, unnatural color blending, or an overuse of artificial-looking highlights and shadows.
Finally, models appear to struggle with subjects related to mythology and religion, due to a combination of the inherent complexity of these themes and content moderation filters that may constrain or influence their performance.

6.2. Adherence to Prompt Instructions and Stylistic Fidelity Results

This survey measured user satisfaction based on the alignment of the generated images with the requirements specified in the given prompt, considering both content and style. Users rated their evaluations in three categories: “good”, “medium”, and “low”. The evaluation was not intended to be absolute, but rather comparative, assessing how each model’s output performed relative to others.
For instance, if a particular image was unanimously classified as “Good”, this does not necessarily imply that it was a highly satisfactory interpretation of the prompt. Rather, it simply indicates that in the collective judgment of the reviewers, it outperformed the outputs of other models.
Reviewers were encouraged to distribute their ratings in a balanced way across the three categories, to reduce the impact of prompt complexity and inherent variability within each batch of images.
To derive a summary score, we computed a weighted average, assigning a value of 1 to “good”, 0 to “medium”, and −1 to “low”.
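Concretely, the summary score per model reduces to the following (a minimal sketch):

```python
def adherence_score(ratings):
    """Weighted average of the three rating classes:
    'good' -> +1, 'medium' -> 0, 'low' -> -1."""
    value = {"good": 1, "medium": 0, "low": -1}
    return sum(value[r] for r in ratings) / len(ratings)

# Example: three 'good', one 'medium', one 'low' -> (3 + 0 - 1) / 5 = 0.4
print(adherence_score(["good", "good", "good", "medium", "low"]))
```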
The results are summarized in Figure 8 and Table 8. Again, in the table, we only list the best-performing models, according to our investigation.
It is worth noting that prompt adherence leads to a substantially different ranking compared with authenticity scores, which were evaluated without knowledge of the corresponding prompts. A model that was instructed to generate a Renaissance painting but instead produced a convincing Cubist artwork would likely receive a high authenticity score, despite failing to follow the intended artistic style.
This suggests that some models prioritize aesthetic quality over strict prompt adherence, opting for visually compelling outputs even at the expense of accuracy. This tendency is particularly evident in early-generation generative models, such as Stable Diffusion 1.5 and Omnigen, which frequently take creative liberties with prompt instructions.
Despite their loose interpretation of prompts and their occasional introduction of artifacts and distortions, these models remain among the most creative and surprising in our tests. Their ability to produce unexpected yet visually engaging results highlights a trade-off in generative AI: while newer models may achieve higher precision in style replication, earlier models often exhibit a greater degree of unpredictability and artistic exploration, which can sometimes lead to unexpectedly compelling outputs.

6.3. Survey Results Integration

Synthetic results from the human evaluation survey have been integrated into the AI-Pastiche dataset available on Kaggle. These include aggregated ratings for each image based on subjective assessments of its perceived authenticity (i.e., the proportion of respondents who believed the sample was created by a human) and its adherence to the prompt. In addition, we introduced a metadata column labeled defects, which captures the presence and severity of visible artifacts in the generated image (with 0 indicating no visible defects and 1 indicating major or repeated defects). All metadata values are expressed as floating-point numbers in the range [0, 1].
The inclusion of these annotations is intended to support further research in perceptual evaluation and prompt–image alignment.
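For instance, a researcher could filter and aggregate the annotations along these lines; this is a sketch in which the file name and all column names except defects are assumptions about the released metadata:

```python
import pandas as pd

meta = pd.read_csv("ai_pastiche_metadata.csv")  # hypothetical file name
# All annotation values are floats in [0, 1].
clean = meta[meta["defects"] < 0.5]             # drop images with major/repeated defects
by_model = clean.groupby("model")[["authenticity", "adherence"]].mean()
print(by_model.sort_values("authenticity", ascending=False))
```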

7. Critical Aspects of Artificial Generation

In this section, we highlight some of the most common and critical challenges observed in the generative models under consideration. These insights stem both from our direct experience in dataset creation and from the results of our surveys.
We structure the discussion around three major problem areas: artifacting and distortion (Section 7.1), hyperrealism (Section 7.2), and anachronisms (Section 7.3).

7.1. Artifacting and Distortion

Among the most evident problems are artifacting and distortion, where models fail to maintain anatomical coherence or structural integrity. A few major instances are discussed below.

7.1.1. Fingers, Hands, and Limbs

Correctly rendering hands and fingers remains one of the major challenges in generative image synthesis. The problem is common to most of the models: some examples are given in Figure 9.
The problem is well known and stems from several factors. The primary challenge is that hands frequently interact with objects or other parts of the body, leading to complex occlusions and overlapping regions. This poses difficulties both during training, where the model must abstract hands and fingers from their specific context, and during generation, where the model must realistically render them within the context of these interactions.
The problem is not limited to human figures. Animals are frequently depicted with an unnatural number of legs, heads, or similar distortions (see Figure 10).

7.1.2. Distortions in Complex Scenarios

Distortions become more pronounced in complex scenarios, such as groups of people, highly dynamic scenes, or intricate architectural compositions. In these cases, maintaining a natural balance between stylistic accuracy and structural coherence remains a challenging task for most models. A few typical examples are shown in Figure 11, but nearly all models struggled with these specific prompts: (a) a traditional rural festival in 19th-century Realism style, (b) a music lesson in Rococo style, and (c) a battle between knights in Early Renaissance style.
One frustrating limitation of current generative models is their inability to dynamically adjust generation time based on the complexity of the task. Unlike human artists, who naturally dedicate more time to intricate compositions while completing simpler ones more quickly, these models follow a fixed computational budget, regardless of the difficulty of the image being generated.
For instance, diffusion-based models operate within a predefined number of denoising steps, meaning that they do not inherently “realize” when an image requires additional refinement to resolve ambiguities in structure, perspective, or stylistic details. Whether generating a minimalist still life or a highly detailed historical battle scene, the model performs the same number of steps, often leading to over-processing in simple cases and underdeveloped details in complex ones.
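The point is visible directly in the standard diffusers interface, where the step budget is a fixed argument rather than something the model adapts to scene complexity (a sketch using the open Stable Diffusion 1.5 checkpoint; the prompts are illustrative):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stable-diffusion-v1-5/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# The denoising budget is chosen up front: 30 steps are spent on a simple
# still life exactly as on an intricate multi-figure battle scene.
still_life = pipe("a still life with two apples, oil on canvas",
                  num_inference_steps=30).images[0]
battle = pipe("a battle between knights in Early Renaissance style",
              num_inference_steps=30).images[0]
```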

7.2. Hyperrealism

Most generative models, often optimized for photorealism, struggle to reproduce the unique nuances and distinctive qualities characteristic of artistic styles from the past. We shall discuss the issues in three paradigmatic cases: portraits, still lifes, and landscapes.

7.2.1. Portraits

Modern generative models often exhibit a hyperrealistic tendency when replicating facial details, often in contrast with the historical artistic style they are supposed to mimic according to the prompt. This excessive sharpness and detail can create a fundamental mismatch between the expected stylistic conventions and the generated output, leading to images that feel anachronistic or unconvincing. A few examples are given in Figure 12.
The problem becomes even more apparent when considering historical limitations in artistic materials and techniques. Painters working with oil or tempera could not achieve the pore-level skin textures or ultra-sharp reflections that modern models tend to generate by default. When a generative model introduces such hyperrealistic details into a Renaissance painting or a 17th-century Dutch portrait, the output no longer aligns with the stylistic expectations of that period.

7.2.2. Still Lifes

Another common subject where generative models struggle to restrain their tendency toward excessive realism is still life painting. While still lifes often contain highly detailed depictions of objects, traditional artistic styles—especially those from historical periods—frequently employ soft lighting, controlled textures, and a painterly touch that distinguishes them from hyperrealistic renderings.
Generative models, however, tend to overemphasize surface details, reflections, and textures, producing results that lean toward photographic realism rather than adhering to the stylistic characteristics of classical still life compositions. This issue becomes particularly noticeable in flower arrangements, fruit compositions, and table settings, where the AI-generated images may include overly sharp edges, unnatural glossiness, or exaggerated depth-of-field effects that are inconsistent with traditional oil painting techniques.
A few typical examples are shown in Figure 13.

7.2.3. Landscapes and Cityscapes

As evidenced by the tag analysis in Section 6.1.2, the most challenging elements in the representation of naturalistic scenes are clouds and water, particularly when combined with specific times of the day—such as dawn or sunset—or when the prompt demands highly dramatic atmospheric effects. These conditions require a delicate interplay of light, color gradients, and reflections, which can be difficult for generative models to reproduce in a way that remains both visually coherent and stylistically faithful.
Successfully depicting clouds and water often requires a nuanced understanding of texture, movement, and atmospheric perspective. While most models have made significant progress in generating these elements with a high degree of realism, they often struggle when tasked with deviating from photorealism in favor of a specific artistic technique. Instead of adapting to the rushed brushwork of Impressionism or the soft, fairy-like contrast of Baroque landscapes, models frequently default to overly detailed or artificially blended textures, resulting in images that feel technically proficient but stylistically inconsistent with historical painting traditions. Some examples are given in Figure 14.
The tension between hyperrealism and stylistic accuracy becomes particularly evident in the case of Naïve art. While hyperrealism is defined by extreme attention to detail, the Naïve style is intentionally simplified, often characterized by flat perspectives, bold colors, and a disregard for proportional accuracy. Figures and objects may appear distorted or childlike, resembling works created without adherence to formal artistic training.
An example of this contrast is shown in Figure 15a, where the model introduces a strong fading effect at the horizon. While this technique enhances depth and improves realism, it is fundamentally at odds with the flat rendering style typical of Naïve painting. When prompted to generate a flatter sky with reduced fading and perspective effects, the model tends to overcompensate, resulting in an oversimplified artifact that still fails to fully capture the intended aesthetic, as illustrated in Figure 15b,c.

7.3. Anachronisms

Not infrequently, models add anachronistic elements to the painting, completely disrupting the historical setting and often creating unintentionally humorous effects. A typical example is the van in the middle of the scene of Figure 16a, inspired by the style of Pieter Bruegel the Elder.
The artwork of Figure 16b was supposed to be a watercolor inspired by the Realism art of the Industrial Revolution period, depicting an open-cut mine with industrial activity. Almost all models filled the scene with modern crane-like machines and trucks.
Another example is the iron fence in Figure 16c. The work is intended to represent the Expulsion from the Garden of Eden, supposedly mimicking the style of Raphael. The composition is rather confused, and the style is essentially Neoclassical; however, the iron fence enclosing the “garden” is particularly jarring. Wrought iron fences of this type became common in Europe only in the 19th century, primarily for gates or balconies, so its presence in a painting that is meant to belong to the Renaissance period is entirely unjustified.
Similar examples are the PC mouse in Figure 5a, likely a literal misinterpretation of the prompt’s request to include a small mouse in the picture, or the radiators in Figure 17.
While the clothing is generally accurate for the specified period and style, there are still some noticeable mistakes. For example, in Figure 12d, the girl washing garments, ostensibly from the 19th century, is wearing sneakers—a clear anachronism. Similarly, the group of people lounging by the seaside in Figure 18, intended to reflect the Impressionist movement, includes women wearing bikinis—an anachronism that is clearly out of place.
The most intriguing and complex example of anachronism is Midjourney’s depiction of the Baptism of Christ, shown in Figure 18b. The issue lies in the wound on Jesus’ side, traditionally associated with the crucifixion. This example is fascinating, as it highlights a fundamental challenge for generative models: semantically interpreting and contextually placing visual elements. The inclusion of the wound suggests that the model has conflated distinct aspects of Christian iconography, likely because its training data contains overlapping representations of Christ drawn from various narratives.
This mistake underscores the difficulty of ensuring historical and theological accuracy in generative outputs, particularly when representing complex religious or cultural symbols. It reflects a lack of nuanced understanding, with the model treating all depictions of Jesus as interchangeable rather than context-specific. Addressing such issues would require either more carefully curated training data or the development of advanced mechanisms for context-aware generation. This example serves as a compelling case study in the importance of aligning generative outputs with both stylistic fidelity and semantic coherence.

8. Conclusions

In this work, we have explored the capabilities and limitations of modern generative models in replicating historical artistic styles. Our analysis is structured around two main contributions: (1) the creation of a well-curated, supervised dataset of AI-generated artworks (the AI-Pastiche dataset) and (2) a comprehensive evaluation of generative models through user surveys assessing perceptual authenticity and prompt adherence.
The AI-Pastiche dataset is a richly annotated collection of AI-generated images, categorized by model, style, period, and subject matter. While its current size is modest (73 prompts, 12 models, and 753 images), it offers a valuable resource for analyzing the strengths and weaknesses of different generative approaches, with potential for diverse applications and for serving as a benchmark in future research on AI-driven artistic replication.
Using the AI-Pastiche dataset, we conducted a systematic evaluation of generative models based on extensive user surveys. We separately assessed perceptual authenticity—how convincingly an artwork mimics human-created paintings—and prompt adherence—how faithfully the output aligns with the given instructions. The results reveal a key trade-off: some models prioritize aesthetic quality over strict adherence to the prompt, while others sacrifice visual refinement for greater accuracy. This discrepancy underscores the challenges in balancing creative flexibility and control in generative image synthesis.
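For concreteness, perceived-authenticity figures of this kind (cf. Table 5) can be derived from raw survey responses along the following lines. This is a minimal sketch, assuming a tabular export with illustrative column names (model, judged_human) that are not those of our actual survey files:

```python
import pandas as pd

# Hypothetical survey export: one row per (image, respondent) judgment.
# Column names and values are illustrative, not those of the released files.
responses = pd.DataFrame({
    "model":        ["Ideogram", "Ideogram", "Midjourney", "Midjourney"],
    "judged_human": [True, False, True, False],  # image attributed to a human painter
})

# Perceived authenticity = fraction of judgments in which an AI-generated
# image was attributed to a human (the "Ratio" column of Table 5).
authenticity = (
    responses.groupby("model")["judged_human"]
    .agg(total="count", misclassified="sum")
    .assign(ratio=lambda t: t.misclassified / t.total)
    .sort_values("ratio", ascending=False)
)
print(authenticity)
```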
Our study highlights both the progress and ongoing challenges in generative AI for artistic style replication. While models can produce visually compelling outputs, a major obstacle remains their tendency toward hyperrealism. When attempting to reproduce historical styles, these models focus on surface-level details, such as textures and brushwork, yet fail to capture the deeper artistic principles that define each period. Artistic style is more than a sum of textures: it involves composition, narrative intent, spatial relationships, and cultural context. Given the limited availability of training data for many historical styles, achieving a truly contextually accurate AI-generated artwork remains a difficult task.
Another fundamental limitation is the rigid inference time of generative models. Unlike human artists, who naturally allocate more effort to complex compositions, these models operate under fixed computational budgets, leading to missed opportunities for adaptive refinement. Future improvements may involve confidence-based step adjustments, allowing the model to extend or shorten the generation process depending on the complexity of the scene. More advanced conditioning mechanisms could also enable models to better integrate structural coherence and artistic intent rather than simply mimicking surface features.
Ultimately, our findings point to the next frontier in generative AI for art: moving beyond simple visual reproduction toward models that can understand and interpret artistic traditions in a more holistic and historically grounded way. While significant challenges remain, improvements in training strategies, dataset curation, and adaptive inference methods could help bridge the gap between style imitation and true artistic coherence, bringing generative AI closer to meaningful contributions in digital artistry.
Our dataset could help trace progress in this direction. The field is evolving very rapidly, and even within the short time required for the publication of this article, new systems with improved capabilities have emerged. We plan to extend AI-Pastiche with outputs from these new models in the near future, while also increasing the number of prompts to improve stylistic coverage and provide a more balanced dataset. We are open to collaborations on future improvements to the dataset.

Author Contributions

Conceptualization, A.A.; methodology, A.A., F.G., T.M., R.C.S. and F.Z.; software, A.A., F.G., T.M., R.C.S. and F.Z.; investigation, A.A., F.G., T.M., R.C.S. and F.Z.; data curation, A.A., F.G., T.M., R.C.S. and F.Z.; writing, A.A., F.G., T.M., R.C.S. and F.Z.; supervision, A.A. All authors have read and agreed to the published version of the manuscript.

Funding

Research partially supported by the Future AI Research (FAIR) project of the National Recovery and Resilience Plan (NRRP), Mission 4 Component 2 Investment 1.3, funded by the European Union – NextGenerationEU.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The AI-Pastiche dataset is freely available on Kaggle at the following repository: https://www.kaggle.com/datasets/asperticsuniboit/deepfakedatabase/, accessed on 3 July 2025.
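A programmatic download is also possible; the following sketch uses the kagglehub client (the call shown is standard kagglehub usage, but the internal file layout of the download should be checked against the repository):

```python
import kagglehub

# Download the AI-Pastiche dataset from Kaggle; returns the local path
# of the cached copy (Kaggle credentials may be required).
path = kagglehub.dataset_download("asperticsuniboit/deepfakedatabase")
print("Dataset files available under:", path)
```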

Acknowledgments

The AI-Pastiche dataset was officially presented at the UnaEuropa Summer School on “AI & Creativity”, held at Campus Condorcet, Université Paris 1 Panthéon-Sorbonne, on 7–11 July 2025.

Conflicts of Interest

On behalf of all authors, the corresponding author states that there are no conflicts of interest.

References

  1. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analyzing and Improving the Image Quality of StyleGAN. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2020, Seattle, WA, USA, 13–19 June 2020; pp. 8107–8116. [Google Scholar] [CrossRef]
  2. Sauer, A.; Karras, T.; Laine, S.; Geiger, A.; Aila, T. StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis. In Proceedings of the International Conference on Machine Learning, ICML 2023, Honolulu, HI, USA, 23–29 July 2023; Proceedings of Machine Learning Research. Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J., Eds.; 2023; Volume 202, pp. 30105–30118. [Google Scholar]
  3. Ho, J.; Jain, A.; Abbeel, P. Denoising Diffusion Probabilistic Models. In Proceedings of the Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, Virtual, 6–12 December 2020; Larochelle, H., Ranzato, M., Hadsell, R., Balcan, M., Lin, H., Eds.; NeurIPS Foundation, Inc.: San Diego, CA, USA, 2020. [Google Scholar]
  4. Song, J.; Meng, C.; Ermon, S. Denoising Diffusion Implicit Models. In Proceedings of the 9th International Conference on Learning Representations, ICLR 2021, Virtual, 3–7 May 2021. [Google Scholar]
  5. Rombach, R.; Blattmann, A.; Lorenz, D.; Esser, P.; Ommer, B. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 10684–10695. [Google Scholar]
  6. Xu, Y.; Xu, X.; Gao, H.; Xiao, F. SGDM: An Adaptive Style-Guided Diffusion Model for Personalized Text to Image Generation. IEEE Trans. Multim. 2024, 26, 9804–9813. [Google Scholar] [CrossRef]
  7. Zhang, Z.; Zhang, Q.; Xing, W.; Li, G.; Zhao, L.; Sun, J.; Lan, Z.; Luan, J.; Huang, Y.; Lin, H. ArtBank: Artistic Style Transfer with Pre-trained Diffusion Model and Implicit Style Prompt Bank. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence, AAAI 2024, Vancouver, BC, Canada, 20–27 February 2024; pp. 7396–7404. [Google Scholar] [CrossRef]
  8. Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv 2022, arXiv:2204.06125. [Google Scholar] [CrossRef]
  9. Saharia, C.; Chan, W.; Saxena, S.; Li, L.; Whang, J.; Denton, E.L.; Ghasemipour, S.K.S.; Lopes, R.G.; Ayan, B.K.; Salimans, T.; et al. Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding. In Proceedings of the NeurIPS, New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
  10. Png, W.H.; Aun, Y.; Gan, M. FeaST: Feature-guided Style Transfer for high-fidelity art synthesis. Comput. Graph. 2024, 122, 103975. [Google Scholar] [CrossRef]
  11. Chung, J.; Hyun, S.; Heo, J. Style Injection in Diffusion: A Training-Free Approach for Adapting Large-Scale Diffusion Models for Style Transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2024, Seattle, WA, USA, 16–22 June 2024; pp. 8795–8805. [Google Scholar] [CrossRef]
  12. Zhang, Z.; Zhang, Q.; Lin, H.; Xing, W.; Mo, J.; Huang, S.; Xie, J.; Li, G.; Luan, J.; Zhao, L.; et al. Towards Highly Realistic Artistic Style Transfer via Stable Diffusion with Step-aware and Layer-aware Prompt. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI 2024, Jeju, Republic of Korea, 3–9 August 2024; pp. 7814–7822. [Google Scholar]
  13. Barros, M.; Ai, Q. Designing with words: Exploring the integration of text-to-image models in industrial design. Digit. Creat. 2024, 35, 378–391. [Google Scholar] [CrossRef]
  14. Zhou, H.; Zhu, J.; Mateas, M.; Wardrip-Fruin, N. The Eyes, the Hands and the Brain: What can Text-to-Image Models Offer for Game Design and Visual Creativity? In Proceedings of the 19th International Conference on the Foundations of Digital Games, FDG 2024, Worcester, MA, USA, 21–24 May 2024; Smith, G., Whitehead, J., Samuel, B., Spiel, K., van Rozen, R., Eds.; Association for Computing Machinery (ACM): New York, NY, USA, 2024; p. 23. [Google Scholar]
  15. Haase, J.; Djurica, D.; Mendling, J. The Art of Inspiring Creativity: Exploring the Unique Impact of AI-generated Images. In Proceedings of the 29th Americas Conference on Information Systems, AMCIS, Panama City, Panama, 10–12 August 2023; Pavlou, P.A., Midha, V., Animesh, A., Carte, T.A., Graeml, A.R., Mitchell, A., Eds.; Association for Information Systems (AIS): Atlanta, GA, USA, 2023. [Google Scholar]
  16. Vartiainen, H.; Tedre, M. Using artificial intelligence in craft education: Crafting with text-to-image generative models. Digit. Creat. 2023, 34, 1–21. [Google Scholar] [CrossRef]
  17. Wei, J.; Tay, Y.; Bommasani, R.; Raffel, C.; Zoph, B.; Borgeaud, S.; Yogatama, D.; Bosma, M.; Zhou, D.; Metzler, D.; et al. Emergent Abilities of Large Language Models. Trans. Mach. Learn. Res. 2022, 2022. [Google Scholar]
  18. Schaeffer, R.; Miranda, B.; Koyejo, S. Are Emergent Abilities of Large Language Models a Mirage? In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
  19. Du, Z.; Zeng, A.; Dong, Y.; Tang, J. Understanding Emergent Abilities of Language Models from the Loss Perspective. arXiv 2024, arXiv:2403.15796. [Google Scholar] [CrossRef]
  20. Huang, X.A.; Malfa, E.L.; Marro, S.; Asperti, A.; Cohn, A.G.; Wooldridge, M.J. A Notion of Complexity for Theory of Mind via Discrete World Models. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; Al-Onaizan, Y., Bansal, M., Chen, Y., Eds.; 2024; pp. 2964–2983. [Google Scholar]
  21. Nummenmaa, L.; Hari, R. Bodily feelings and aesthetic experience of art. Cogn. Emot. 2023, 37, 515–528. [Google Scholar] [CrossRef] [PubMed]
  22. Freedberg, D.; Gallese, V. Motion, emotion and empathy in esthetic experience. Trends Cogn. Sci. 2007, 11, 197–203. [Google Scholar] [CrossRef]
  23. Gatys, L.A.; Ecker, A.S.; Bethge, M. A Neural Algorithm of Artistic Style. arXiv 2015, arXiv:1508.06576. [Google Scholar] [CrossRef]
  24. Huang, X.; Belongie, S.J. Arbitrary Style Transfer in Real-Time with Adaptive Instance Normalization. In Proceedings of the IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, 22–29 October 2017; pp. 1510–1519. [Google Scholar] [CrossRef]
  25. Bird, J.J.; Lotfi, A. CIFAKE: Image Classification and Explainable Identification of AI-Generated Synthetic Images. arXiv 2023, arXiv:2303.14126. [Google Scholar] [CrossRef]
  26. Zhu, M.; Chen, H.; Yan, Q.; Huang, X.; Lin, G.; Li, W.; Tu, Z.; Hu, H.; Hu, J.; Wang, Y. GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image. arXiv 2023, arXiv:2306.08571. [Google Scholar]
  27. Li, Y.; Liu, Z.; Zhao, J.; Ren, L.; Li, F.; Luo, J.; Luo, B. The Adversarial AI-Art: Understanding, Generation, Detection, and Benchmarking. arXiv 2024, arXiv:2404.14581. [Google Scholar] [CrossRef]
  28. Rahman, M.A.; Paul, B.; Sarker, N.H.; Hakim, Z.I.A.; Fattah, S.A. Artifact: A Large-Scale Dataset with Artificial And Factual Images For Generalizable And Robust Synthetic Image Detection. In Proceedings of the IEEE International Conference on Image Processing, ICIP 2023, Kuala Lumpur, Malaysia, 8–11 October 2023; pp. 2200–2204. [Google Scholar] [CrossRef]
  29. Hong, Y.; Zhang, J. WildFake: A Large-scale Challenging Dataset for AI-Generated Images Detection. arXiv 2024, arXiv:2402.11843. [Google Scholar]
  30. Chen, Y.T.; Zou, J.Y. TWIGMA: A dataset of AI-Generated Images with Metadata From Twitter. In Proceedings of the Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023, New Orleans, LA, USA, 10–16 December 2023; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; NeurIPS Foundation, Inc.: San Diego, CA, USA, 2023. [Google Scholar]
  31. Yang, Z.; Zhan, F.; Liu, K.; Xu, M.; Lu, S. AI-Generated Images as Data Source: The Dawn of Synthetic Era. arXiv 2023, arXiv:2310.01830. [Google Scholar] [CrossRef]
  32. Wang, B.; Zhu, Y.; Chen, L.; Liu, J.; Sun, L.; Childs, P.R.N. A study of the evaluation metrics for generative images containing combinational creativity. Artif. Intell. Eng. Des. Anal. Manuf. 2023, 37. [Google Scholar] [CrossRef]
  33. Oppenlaender, J. The Creativity of Text-to-Image Generation. In Proceedings of the 25th International Academic Mindtrek conference, Academic Mindtrek 2022, Tampere, Finland, 16–18 November 2022; pp. 192–202. [Google Scholar] [CrossRef]
  34. Sanchez, T. Examining the Text-to-Image Community of Practice: Why and How do People Prompt Generative AIs? In Proceedings of the Creativity and Cognition, C&C 2023, Virtual, 19–21 June 2023; pp. 43–61. [Google Scholar]
  35. Lee, S.; Lee, J.; Bae, C.H.; Choi, M.; Lee, R.; Ahn, S. Optimizing Prompts Using In-Context Few-Shot Learning for Text-to-Image Generative Models. IEEE Access 2024, 12, 2660–2673. [Google Scholar] [CrossRef]
  36. Brooks, T.; Holynski, A.; Efros, A.A. InstructPix2Pix: Learning to Follow Image Editing Instructions. arXiv 2022, arXiv:2211.09800. [Google Scholar]
  37. Parmar, G.; Park, T.; Narasimhan, S.; Zhu, J. One-Step Image Translation with Text-to-Image Models. arXiv 2024, arXiv:2403.12036. [Google Scholar]
  38. Stability AI. Stability AI Releases DeepFloyd IF, a Powerful Text-to-Image Model That Can Smartly Integrate Text into Images. 2023. Available online: https://www.unite.ai/stability-ai-releases-text-to-image-model-deepfloyd-if/ (accessed on 3 July 2025).
  39. Peebles, W.; Xie, S. Scalable Diffusion Models with Transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, ICCV 2023, Paris, France, 1–6 October 2023; pp. 4172–4182. [Google Scholar]
  40. Esser, P.; Kulal, S.; Blattmann, A.; Entezari, R.; Müller, J.; Saini, H.; Levi, Y.; Lorenz, D.; Sauer, A.; Boesel, F.; et al. Scaling Rectified Flow Transformers for High-Resolution Image Synthesis. In Proceedings of the Forty-First International Conference on Machine Learning, ICML 2024, Vienna, Austria, 21–27 July 2024. Proceedings of Machine Learning Research (PMLR). [Google Scholar]
  41. Adobe. Adobe Introduces Firefly Image 3 Foundation Model to Take Creative Exploration and Ideation to New Heights. 2024. Available online: https://news.adobe.com/news/news-details/2024/adobe-introduces-firefly-image-3-foundation-model-to-take-creative-exploration-and-ideation-to-new-heights (accessed on 3 July 2025).
  42. Xiao, S.; Wang, Y.; Zhou, J.; Yuan, H.; Xing, X.; Yan, R.; Wang, S.; Huang, T.; Liu, Z. OmniGen: Unified Image Generation. arXiv 2024, arXiv:2409.11340. [Google Scholar] [CrossRef]
  43. Henry, A.; Dachapally, P.R.; Pawar, S.S.; Chen, Y. Query-Key Normalization for Transformers. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; Findings of ACL. Cohn, T., He, Y., Liu, Y., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; Volume EMNLP 2020, pp. 4246–4253. [Google Scholar] [CrossRef]
  44. Black Forest Labs. Announcing Black Forest Labs. 2024. Available online: https://bfl.ai/blog/24-08-01-bfl (accessed on 3 July 2025).
  45. Sauer, A.; Lorenz, D.; Blattmann, A.; Rombach, R. Adversarial Diffusion Distillation. In Proceedings of the Computer Vision—ECCV 2024—18th European Conference, Milan, Italy, 29 September–4 October 2024; Proceedings, Part LXXXVI. pp. 87–103. [Google Scholar]
  46. Kolors Team. Kolors: Effective Training of Diffusion Model for Photorealistic Text-to-Image Synthesis. 2024. Available online: https://huggingface.co/docs/diffusers/main/en/api/pipelines/kolors (accessed on 3 July 2025).
  47. Neural.love. AI Art Revolution: Introducing Auto-Aesthetics for Personalized Gen AI Experience. 2024. Available online: https://neural.love/blog/auto-aesthetics-v1-ai-art-revolution (accessed on 3 July 2025).
  48. Asperti, A.; Dessì, L.; Tonetti, M.C.; Wu, N. Does CLIP perceive art the same way we do? arXiv 2025, arXiv:2505.05229. [Google Scholar] [CrossRef]
  49. Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J.; et al. Learning Transferable Visual Models from Natural Language Supervision. In Proceedings of the 38th International Conference on Machine Learning, ICML 2021, Virtual, 18–24 July 2021; Proceedings of Machine Learning Research. Meila, M., Zhang, T., Eds.; 2021; Volume 139, pp. 8748–8763. [Google Scholar]
Figure 1. Human vs. AI-generation confusion matrix. Less than 30% of AI-generated images were attributed to humans. Around 20% of human artworks were attributed to AI.
Figure 2. Distribution of misclassification percentages for AI-generated images. Only a small set of AI-generated images reaches a high authenticity score.
Figure 3. Examples of convincing AI-generated examples of different styles and periods, according to the results of our survey.
Figure 4. “Art Nouveau” examples. The sketchy nature of the subject specified by the prompt adapted particularly well to the capacities of generative models.
Figure 5. Satirical examples. Caricatures of Otto von Bismarck in the style of the satirical magazine La Lune from the end of the 19th century.
Figure 6. Misclassification rates by period.
Figure 7. Impact of the tags on the perceived authenticity of the AI-generated artifacts, estimated through linear regression (a sketch of this computation is given after the figure captions). The analysis is restricted to tags occurring in at least two different prompts. We perform the investigation for all models (blue) and for a restricted subset of models with higher authenticity scores (red).
Figure 8. A histogram summarizing the adherence to prompt instructions for the different models, according to our survey.
Figure 9. Problems with fingers, hands, and feet; extra limbs.
Figure 10. Strange beasts, with an unnatural number of heads or legs, or fusing multiple sources.
Figure 11. Lack of structural coherence in complex scenarios and highly dynamic scenes.
Figure 12. Hyperrealism examples. The required styles were (a) Baroque, (b) Baroque, (c) Romanticism, (d) 19th-century Realism, (e) Impressionism, and (f) Romanticism.
Figure 13. Hyperrealism examples. The required styles were (a) Surrealism, (b) Classicism, and (c) Renaissance.
Figure 14. Hyperrealism examples. The required styles were (a) Impressionism, (b) Baroque, and (c) Rococo Vedutism.
Figure 15. (a) A Naïve-style sample generated by Dall·E, with an excessive sense of perspective. (b,c) Prompting the model to reduce the fading effect at the horizon and produce a flatter perspective results in an oversimplified artifact.
Figure 16. Examples of anachronisms: the van in (a), the modern trucks and machines in (b), and the iron fence in (c).
Figure 17. The subject of the two samples is “The Daughter of the Astronomer”, in Vermeer’s style. The problematic element is the radiator, which is not compatible with the intended period.
Figure 18. Other examples of anachronism: (a) Bikinis are not compatible with the period of Impressionism; (b) the wound on Jesus’ side would not be present at the time of His baptism.
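As anticipated in the caption of Figure 7, the tag-impact analysis is a linear regression of perceived authenticity on tag indicator variables. The following is a minimal sketch under assumed data (toy values and illustrative column names, not our actual analysis script):

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical per-image table: comma-separated prompt tags plus the
# fraction of respondents who judged the image human-made.
images = pd.DataFrame({
    "tags":         ["landscape,clouds", "portrait,gold", "landscape,water"],
    "authenticity": [0.41, 0.28, 0.35],
})

# One-hot encode the tags; keep only tags occurring at least twice
# (the paper restricts the analysis to tags appearing in at least two
# prompts, approximated here per image).
X = images["tags"].str.get_dummies(sep=",")
X = X.loc[:, X.sum(axis=0) >= 2]

# Each fitted coefficient estimates a tag's impact on perceived authenticity.
reg = LinearRegression().fit(X, images["authenticity"])
for tag, coef in zip(X.columns, reg.coef_):
    print(f"{tag}: {coef:+.3f}")
```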
Table 1. Comparison of the models used.
Model | Creator | Architecture | Conditioning | Resolution (Default) | Configurable Output Shape
Ideogram 2.0 (https://ideogram.ai/login, accessed on 3 July 2025) | Ideogram AI | Diffusion-based architecture, not fully disclosed | Text-to-image | 1024 × 1024 | Yes
Flux 1.1 Pro (https://bfl.ai/models/flux-pro, accessed on 3 July 2025) | Black Forest Labs | Rectified Flow Transformer [39,40] | Text-to-image | 2048 × 2048 | Yes
Dall·E 3 (via ChatGPT-4o) (https://openai.com/index/dall-e-3/, accessed on 3 July 2025) | OpenAI (https://openai.com/research/, accessed on 3 July 2025) | Diffusion-based architecture, not fully disclosed | Text-to-image | 1024 × 1024 | No
Firefly Image 3 (https://firefly.adobe.com/, accessed on 3 July 2025) | Adobe | Diffusion-based architecture, not fully disclosed [41] | Text-to-image and image-to-image | 2048 × 2048 | Yes
OmniGen (https://github.com/VectorSpaceLab/OmniGen, accessed on 3 July 2025) | Beijing Academy | Latent Diffusion Model [42] | Multimodal-to-image | 2048 × 2048 | Yes
Leonardo Phoenix (https://leonardo.ai, accessed on 3 July 2025) | Leonardo Interactive Pty | Diffusion-based architecture, not fully disclosed | Text-to-image and image-to-image | 1024 × 1024 | Yes
Midjourney V6.1 (https://www.midjourney.com/imagine, accessed on 3 July 2025) | Midjourney | Diffusion + Transformers framework, not fully disclosed | Text-to-image and image-to-image | 1024 × 1024 | Yes
Stable Diffusion 1.5 (https://huggingface.co/stable-diffusion-v1-5/stable-diffusion-v1-5, accessed on 3 July 2025) | Stability AI (discontinued) | Latent Diffusion Model (LDM) [5] | Text-to-image and image-to-image | 512 × 512 | Yes
Stable Diffusion 3.5-large (https://stability.ai/news/introducing-stable-diffusion-3-5, accessed on 3 July 2025) | Stability AI | Advanced LDM framework with CLIP and T5 text encoders [40] | Text-to-image and image-to-image [43] | 1024 × 1024 | Yes
Flux.1 Schnell (https://blackforestlabs.ai/announcing-black-forest-labs/, accessed on 3 July 2025) | Black Forest Labs | Fast version of Flux 1.1 [44], trained using latent adversarial diffusion distillation [45] | Text-to-image | 2048 × 2048 | Yes
Kolors 1.5 (https://klingai.com/text-to-image/new, accessed on 3 July 2025) | Kuaishou Kolors Team (Kling AI) | Large-scale Latent Diffusion-Based Model [46] | Text-to-image and image-to-image | 1024 × 1024 | Yes
Auto-Aesthetics v1 (https://neural.love/blog/auto-aesthetics-v1-ai-art-revolution, accessed on 3 July 2025) | Neural.love | Not disclosed [47] | Text-to-image | 1024 × 1024 | Yes
Table 2. Description of the attributes used in the dataset.
Attribute | Description
Generative Model | The model used to generate the image. The list of models is provided in Section 3.2.
Subject | A list of tags describing the image content based on the prompt. There are approximately 50 different tags, including categories such as “landscape”, “animals”, and “trees”. Some tags also represent color schemes or tonal impressions, such as “gold”, “soft tones”, and “vibrant tones”. Tags are stored as a comma-separated list.
Style | A synthetic label describing the artistic style of the image. Styles include Renaissance, Baroque, Rococo, Classicism, Romanticism, Realism, Satirical, Impressionism, Art Nouveau, Naïve, Expressionism, Futurism, Cubism, Dadaism, Fauvism, Abstractionism, Symbolism, and Surrealism.
Period | The historical period or century to which the intended painting style belongs, as specified in the prompt (e.g., 18th century and Renaissance).
Prompt | The full text prompt used to generate the image.
Generated Image | The identifier of the image.
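Given this schema, the metadata can be queried directly with pandas. A minimal sketch, assuming a CSV export whose file name (ai_pastiche_metadata.csv) is hypothetical while the column names follow Table 2:

```python
import pandas as pd

# Load the AI-Pastiche metadata (file name assumed; check the Kaggle
# repository for the actual file and column spelling).
meta = pd.read_csv("ai_pastiche_metadata.csv")

# Example query: Baroque-style images generated by Midjourney whose
# subject tags mention water.
baroque_water = meta[
    (meta["Style"] == "Baroque")
    & (meta["Generative Model"] == "Midjourney V6.1")
    & meta["Subject"].str.contains("water", case=False, na=False)
]
print(baroque_water[["Prompt", "Generated Image"]].head())
```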
Table 3. AI-Pastiche statistics by period.
Period | Total | %
20th century | 307 | 32.2
19th century | 289 | 30.3
16th century | 153 | 16.1
17th century | 117 | 12.3
15th century | 49 | 5.1
18th century | 38 | 4.0
Table 4. Most represented styles.
Style | Total | %
Renaissance | 202 | 21.2
Impressionism | 136 | 14.3
Romanticism | 92 | 9.7
Baroque | 86 | 9.0
Realism | 60 | 6.3
Surrealism | 50 | 5.2
Dadaism | 44 | 4.6
Table 5. Performance of models in terms of perceived authenticity. Only the six models with the highest performance according to our survey are listed. Total refers to all models.
Model | Total Count | Misclassified | Ratio
Ideogram | 532 | 263 | 0.49
Midjourney | 598 | 257 | 0.43
Stable Diffusion 3.5-large | 572 | 204 | 0.36
Stable Diffusion 1.5 | 600 | 195 | 0.33
OmniGen | 642 | 206 | 0.32
Dall·E 3 | 562 | 169 | 0.30
Total | 6868 | 1980 | 0.29
Table 6. Perceived authenticity of AI-generated images based on the intended historical period.
Period | Total Count | Misclassified | Ratio
20th century | 2095 | 697 | 0.33
19th century | 1955 | 541 | 0.28
17th century | 786 | 208 | 0.26
15th century | 331 | 84 | 0.25
16th century | 1083 | 265 | 0.24
18th century | 261 | 57 | 0.22
Table 7. Perceived authenticity of AI-generated images based on the artistic style meant to be replicated.
Style | Total Count | Misclassified | Ratio
Art Nouveau | 104 | 49 | 0.47
Cubism | 232 | 92 | 0.40
Satirical | 74 | 29 | 0.39
Impressionism | 922 | 350 | 0.38
Dadaism | 320 | 118 | 0.37
Futurism | 114 | 42 | 0.37
Classicism | 273 | 99 | 0.36
Fauvism | 119 | 40 | 0.34
Expressionism | 170 | 57 | 0.34
Symbolism | 302 | 98 | 0.32
Vedutism | 92 | 26 | 0.28
Renaissance | 1458 | 355 | 0.24
Romanticism | 635 | 154 | 0.24
Abstractionism | 91 | 20 | 0.22
Baroque | 574 | 123 | 0.21
Realism | 402 | 85 | 0.21
Surrealism | 334 | 70 | 0.21
Rococo | 157 | 30 | 0.20
Naïve | 201 | 33 | 0.16
Table 8. Average “adherence” score. The score was computed as a weighted average of the survey results, assigning a value of 1 to “good”, 0 to “medium”, and −1 to “low”. We only list models with better-than-average behavior, according to our investigation.
Model | Average Score
Leonardo Phoenix | 0.37
Dall·E 3 | 0.36
Midjourney | 0.32
Ideogram | 0.29
Stable Diffusion 3.5-large | 0.22
Flux 1.1 Pro | 0.08
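Written out, the weighted average underlying Table 8 is, for each model,

```latex
s = \frac{(+1)\, n_{\mathrm{good}} + 0 \cdot n_{\mathrm{medium}} + (-1)\, n_{\mathrm{low}}}
         {n_{\mathrm{good}} + n_{\mathrm{medium}} + n_{\mathrm{low}}}
  = \frac{n_{\mathrm{good}} - n_{\mathrm{low}}}{n_{\mathrm{good}} + n_{\mathrm{medium}} + n_{\mathrm{low}}}
```

so that the score ranges from −1 (all “low” responses) to 1 (all “good” responses).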