Research on Emotion-Based Inspiration Mechanism in Art Creation by Generative AI
Abstract
1. Introduction
2. Related Work
3. Methods
3.1. The Inspiration Mechanism
3.1.1. Creative Emotion Capture Stage
- Step 1:
- An artist interacts with a conversational agent interface by describing his or her creative intention or vision aloud. The artist’s voice input is captured as an audio signal and processed by a “multimodal emotion LLM”. This “multimodal emotion LLM” is not a single fine-tuned model but an LLM agent designed to interpret the artist’s vocal description, and it executes two main tasks. The first is speech-to-text transcription. For example, OpenAI’s Whisper model (large-v3) can be employed: it is recognized for its high accuracy and robustness in transcribing speech recorded in varied acoustic environments, which makes it well suited to capturing the artist’s natural speech. The second is emotion and content analysis: the transcribed text is analyzed to extract its literal content and emotional undertones as prompt tags. OpenAI’s GPT-4 model can be used for this task, with the agent instructing GPT-4 to act as an “artistic emotion analyst.”
- Step 2:
- At the same time, the artist provides a line drawing sketch that captures their initial idea of composition. This sketch serves as a structural skeleton and visual guidance for the subsequent image processing.
- Step 3:
- The voice description (from step 1) was fed into the “multimodal emotion LLM” to computationally understand the emotional nuances conveyed in the artist’s voice. The LLM analyzed the transcribed voice input to extract the expressed emotions and categorize them into text descriptions and emotion tags.
- Step 4:
- The text descriptions and emotion tags (from step 3) were combined to create a richer, multi-faceted prompt that includes both specific emotional cues and creative descriptive content.
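The capture stage in Steps 1, 3, and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's implementation: the transcription and analysis models are passed in as plain callables, and the `Description:`/`Tags:` reply format, the `ANALYST_INSTRUCTION` wording, and the helper names are hypothetical. Only tags from the thirteen-tag vocabulary of Section 3.2.2 are kept.

```python
import re
from typing import Callable

# The thirteen creative emotion tags of Section 3.2.2.
EMOTION_TAGS = frozenset({
    "#PassionAndLove", "#ExcitementAndAnticipation", "#CuriosityAndWonder",
    "#FocusAndFlow", "#FrustrationAndStruggle", "#DoubtAndVulnerability",
    "#DeterminationAndPerseverance", "#JoyAndElation",
    "#SatisfactionAndFulfillment", "#SurpriseAndDiscovery",
    "#ContemplationAndReflection", "#CatharsisAndRelease", "#CreativeAnxiety",
})

# Hypothetical instruction; the text only says GPT-4 acts as an "artistic emotion analyst".
ANALYST_INSTRUCTION = (
    "You are an artistic emotion analyst. Reply with one line "
    "'Description: <content>' and one line 'Tags: <#Tag ...>'."
)


def parse_analysis(analysis: str) -> tuple[str, list[str]]:
    """Split the (assumed) analyst reply into a description and known emotion tags."""
    description, tags = "", []
    for line in analysis.splitlines():
        key, _, value = line.partition(":")
        if key.strip().lower() == "description":
            description = value.strip()
        elif key.strip().lower() == "tags":
            for tag in re.findall(r"#[A-Za-z]+", value):
                # Drop tags outside the vocabulary, and duplicates.
                if tag in EMOTION_TAGS and tag not in tags:
                    tags.append(tag)
    return description, tags


def capture_emotion(
    audio_path: str,
    transcribe: Callable[[str], str],
    analyze: Callable[[str, str], str],
) -> dict:
    """Steps 1, 3 and 4: transcribe, analyze, then merge into one prompt."""
    transcript = transcribe(audio_path)                   # Step 1: speech-to-text
    analysis = analyze(ANALYST_INSTRUCTION, transcript)   # Step 3: emotion analysis
    description, tags = parse_analysis(analysis)
    prompt = f"{description} Emotion tags: {' '.join(tags)}"  # Step 4: combined prompt
    return {"transcript": transcript, "description": description,
            "tags": tags, "prompt": prompt}
```

In a real deployment, `transcribe` could wrap the open-source `whisper` package (for example `whisper.load_model("large-v3").transcribe(path)["text"]`) and `analyze` a GPT-4 chat call; injecting them as callables keeps the parsing logic testable offline.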
3.1.2. Visual Inspiration Stage
- Step 5:
- The line drawing sketch (from step 2) was used as input for the “Sketch-to-Image Generation Model”, which uses a diffusion model to generate a painted version of the sketch, the “Base Image”. The primary purpose is to translate the artist’s rough visual idea into a more concrete image without the artist having to paint the composition. The “Sketch-to-Image Generation Model” is also an LLM agent: it drives a diffusion model conditioned on both the sketch and the text description to transform the artist’s line drawing into a fully rendered base image.
- Step 6:
- Creative Emotion Weaving: the combined emotion tags and text description (from step 4) are fed into a “prompt-generation LLM” to generate a fully developed prompt, the “creative emotion prompt”.
- Step 7:
- Emotion Vision Rendering: the “Base Image” (from step 5) and the “creative emotion prompt” (from step 6) were used as inputs, with emotion serving as the prompt, to generate a batch of “Emotion Vision” images. The emotional prompt refines the “Base Image” through the “Text-Conditional Image-to-Image Generation Model”, imbuing it with the desired emotional qualities. This model is an LLM agent that uses an image-to-image (img2img) diffusion model, taking the “Base Image” as input and the “creative emotion prompt” as text conditioning to generate a new image. It refines the visual output so that it aligns with the intended emotional expression.
- Step 8:
- Rearrange art elements: generating “Emotion Vision” images is iterative and ultimately produces a batch of “Emotion Vision” images. Starting from a random seed, an AI agent rearranges the emotion tags and text descriptions to rewrite the “creative emotion prompt” with different art elements, controlling attributes such as color palette, style, or stroke expression. Consequently, “Emotion Vision” images in different emotional styles are produced to maximize the inspirational effect.
- Step 9:
- Inspiration during creation: the AI agent presents the batch of “Emotion Vision” images on the interface to inspire the artist throughout the creative journey.
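Steps 5 to 9 can be wired together as in the following sketch. It is hedged in two ways: the three model calls (`sketch_to_image`, `weave_prompt`, `img2img`) are hypothetical callables standing in for the diffusion and LLM agents, and the agent-driven rewrite of Step 8 is approximated by a deterministic, seed-based choice from a small, made-up pool of art-element options (`ART_ELEMENT_OPTIONS`).

```python
import random
from typing import Any, Callable

# Hypothetical option pools; the text names the attributes, not their values.
ART_ELEMENT_OPTIONS = {
    "color palette": ["muted earth tones", "cool blues and violets", "warm, saturated reds"],
    "style": ["expressionist", "impressionist", "cubist"],
    "stroke": ["thick impasto strokes", "loose, broken strokes", "sharp, angular lines"],
}


def rearrange_prompt(emotion_prompt: str, seed: int) -> str:
    """Step 8 stand-in: pick and order art-element directives from a seed."""
    rng = random.Random(seed)  # private generator, so each seed is reproducible
    directives = [f"{name}: {rng.choice(options)}"
                  for name, options in ART_ELEMENT_OPTIONS.items()]
    rng.shuffle(directives)
    return f"{emotion_prompt} | " + "; ".join(directives)


def visual_inspiration(
    sketch: Any,
    combined_prompt: str,
    sketch_to_image: Callable[[Any, str], Any],
    weave_prompt: Callable[[str], str],
    img2img: Callable[[Any, str, int], Any],
    seeds: list[int],
) -> list[tuple[str, Any]]:
    """Steps 5 to 9: base image, woven prompt, then one img2img render per seed."""
    base_image = sketch_to_image(sketch, combined_prompt)  # Step 5
    emotion_prompt = weave_prompt(combined_prompt)         # Step 6
    batch = []
    for seed in seeds:                                     # Steps 7 and 8
        prompt = rearrange_prompt(emotion_prompt, seed)
        batch.append((prompt, img2img(base_image, prompt, seed)))
    return batch  # Step 9: the interface shows this batch to the artist
```

Seeding a private `random.Random` per variant keeps each “Emotion Vision” prompt reproducible, which is convenient if the artist wants to revisit one variant later.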
3.2. Key Concept Design
3.2.1. Creative Emotion Capture
3.2.2. Creative Emotion Weaving
#PassionAndLove, #ExcitementAndAnticipation, #CuriosityAndWonder, #FocusAndFlow, #FrustrationAndStruggle, #DoubtAndVulnerability, #DeterminationAndPerseverance, #JoyAndElation, #SatisfactionAndFulfillment, #SurpriseAndDiscovery, #ContemplationAndReflection, #CatharsisAndRelease, #CreativeAnxiety
- Step 1:
- Creative Seed Input.
Sketch Concept: “A lone, gnarled tree on a barren hill.”
Text Description: “An artwork about isolation and resilience.”
Selected Emotion Tag: #DeterminationAndPerseverance
- Step 2:
- Initial, Unrefined Prompt.
“A lone, gnarled tree on a barren hill. An artwork about isolation and resilience.”
- Step 3:
- LLM-Mediated Refinement (The “Weaving”).
“You are an art director. Revise the following prompt to visually embody the feeling of ‘Determination and Perseverance’. Do not just add these words. Instead, describe specific changes to the light, color palette, line quality, and composition that would convey this emotion.”
- Step 4:
- The Final, Creative Emotion Prompt.
Prompt: A solitary, ancient, gnarled tree stands defiantly on a wind-swept, barren hill under a dramatic, stormy sky. To convey determination and perseverance, the tree’s lines must be thick and deeply etched, showing its age and struggle. The light should be stark, with a single, powerful beam breaking through the dark clouds to illuminate the tree, creating a high-contrast, heroic effect. The color palette should be muted and earthy (deep browns, grays) but with a defiant splash of deep green leaves on a single branch. The composition should place the tree slightly off-center, as if it is bracing against an unseen force, emphasizing its enduring struggle and resilience.
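The weaving procedure in Steps 1 to 3 above reduces to assembling two strings before the LLM call. The sketch below reuses the art-director instruction quoted in Step 3; the `CreativeSeed` dataclass, the `tag_to_phrase` helper, and the appended `Prompt:` line are illustrative assumptions, and `llm` is any text-in, text-out callable.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass
class CreativeSeed:
    """Step 1 inputs (hypothetical container)."""
    sketch_concept: str
    text_description: str
    emotion_tag: str


def tag_to_phrase(tag: str) -> str:
    """'#DeterminationAndPerseverance' -> 'Determination and Perseverance'."""
    words = re.findall(r"[A-Z][a-z]*", tag)
    return " ".join("and" if w == "And" else w for w in words)


def initial_prompt(seed: CreativeSeed) -> str:
    """Step 2: concatenate the raw inputs without refinement."""
    return f"{seed.sketch_concept} {seed.text_description}"


def weaving_instruction(seed: CreativeSeed) -> str:
    """Step 3: the art-director instruction, followed by the prompt to revise."""
    emotion = tag_to_phrase(seed.emotion_tag)
    return (
        "You are an art director. Revise the following prompt to visually embody "
        f"the feeling of '{emotion}'. Do not just add these words. Instead, describe "
        "specific changes to the light, color palette, line quality, and composition "
        "that would convey this emotion.\n\n"
        f"Prompt: {initial_prompt(seed)}"
    )


def weave(seed: CreativeSeed, llm: Callable[[str], str]) -> str:
    """Step 4: the LLM's reply becomes the creative emotion prompt."""
    return llm(weaving_instruction(seed)).strip()
```

Keeping the instruction separate from the model call makes it easy to log exactly what the art-director role was asked, which helps when comparing woven prompts across emotion tags.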
3.2.3. Emotion Vision Rendering
- Lines or edges can show direction, movement, and emotions.
- Shapes, composed of lines or color (circles, squares, or free-form), represent physical entities as flat, two-dimensional areas.
- Form is the three-dimensional counterpart of shape, using shading, perspective, or depth to create a stereoscopic impression.
- Space refers to the area around and within a piece of art; it can be filled with objects or left empty to create depth.
- Color describes a surface’s hue, saturation, and lightness; it can evoke feelings, add depth, and direct the viewer’s focus.
- Value indicates how light or dark a color is and helps create contrast, volume, and atmosphere.
- Texture describes the surface quality of a piece. It can be tangible or visual and adds depth and insight.
Prompt: Imagine a sprawling, surreal landscape dominated by colossal, geometric forms made of shattered, iridescent glass. Use lines to depict both sharp, fractured edges and flowing, ethereal trails emanating from the broken structures. The space is vast and disorienting, with a strong sense of atmospheric perspective. The color palette should be predominantly cool (blues, greens, purples) in the distance, shifting to warm, vibrant hues (reds, oranges, yellows) near the viewer. Consider incorporating a single organic element (e.g., a lone, weathered tree or a swirling vortex of energy) to contrast with the rigid geometry. Think about the value shifts within the glass shards to create a sense of depth and realism, and contrast it with the almost flat texture in the background.
Emotion Tag Focus: #CreativeAnxiety #FrustrationAndStruggle #DeterminationAndPerseverance #CatharsisAndRelease
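The example prompt above addresses several of the seven art elements explicitly. A hypothetical helper that assembles such prompts from per-element directives might look like the following; the element names come from the list above, while the function name and output format are assumptions made for illustration.

```python
# The seven art elements listed above, in a fixed order.
ART_ELEMENTS = ("line", "shape", "form", "space", "color", "value", "texture")


def compose_element_prompt(subject: str, directives: dict[str, str], tags: list[str]) -> str:
    """Build a prompt with one sentence per art element, then the tag focus."""
    unknown = sorted(set(directives) - set(ART_ELEMENTS))
    if unknown:
        raise ValueError(f"unknown art elements: {unknown}")
    parts = [subject.rstrip(".") + "."]
    for element in ART_ELEMENTS:  # fixed order keeps prompt variants comparable
        if element in directives:
            parts.append(f"{element.capitalize()}: {directives[element]}.")
    if tags:
        parts.append("Emotion Tag Focus: " + " ".join(tags))
    return " ".join(parts)
```

Rejecting unknown element names keeps a prompt-rewriting agent from drifting outside the seven-element vocabulary.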
3.2.4. Iterative Inspiration Cycle
- An initial creative idea forms in the artist’s mind.
- The artistic blueprint and the associated emotions are clarified through voice descriptions and sketch drawings.
- The artistic creation begins by focusing on embedding those emotions into the process.
- Using text-to-prompt AI, an emotional image prompt is generated and refined.
- At various checkpoints, prompt-to-image AI renders the “Emotion Vision” images.
- The accumulated “Emotion Vision” images are gathered into a collection.
3.2.5. Metrics for Emotional Vision
- Midpoints (x1, x2 in Formula (3)) control the location and width of the optimal inspiration peak. For instance, adjusting x1 shifts the point where inspiration begins to decline, defining the upper bound of the ideal similarity range.
- Steepness (k1, k2 in Formula (3)) controls how sharply the EVS curve rises and falls, modeling how “forgiving” or “strict” the criteria for inspiration are.
- Amplitude and Offset (A, C in Formula (3)) are scaling and baseline parameters that map the function to a desired scoring range (e.g., 0–100).
- Diversity Ratio and Decay Rate (α, k in Formula (4)): α controls the initial “boost” from diversity at a CLIP score of zero, and k controls how rapidly this boost fades as similarity increases.
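Formula (4) is not reproduced in this section. Assuming it takes the common exponential-decay form D(x) = α·exp(−k·x), with x the CLIP score on a 0 to 100 scale, the roles of α and k described above can be checked numerically: D(0) = α, and the boost halves every ln 2 / k points of similarity. The function names below are hypothetical.

```python
import math


def diversity_bonus(x: float, alpha: float, k: float) -> float:
    """Assumed diversity term alpha * exp(-k * x) for a CLIP score x in [0, 100]."""
    if not 0.0 <= x <= 100.0:
        raise ValueError("CLIP score must lie in [0, 100]")
    return alpha * math.exp(-k * x)


def half_life(k: float) -> float:
    """CLIP-score distance over which the diversity boost halves."""
    return math.log(2) / k


# Compare how alpha (initial boost) and k (decay rate) shape the curve.
for alpha, k in [(10.0, 0.05), (10.0, 0.10), (20.0, 0.05)]:
    print(f"alpha={alpha:>4}, k={k:.2f}: D(0)={diversity_bonus(0, alpha, k):.1f}, "
          f"D(50)={diversity_bonus(50, alpha, k):.2f}, half-life={half_life(k):.1f}")
```

Doubling α doubles the boost everywhere, whereas doubling k halves the half-life, so the two parameters can be tuned independently.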
3.2.6. Mathematical Modeling Aspects of the Emotion Vision Score (EVS)
- Low Similarity (CLIP score is low): When a generated image is too divergent from the artist’s sketch and textual description, it creates a conceptual disconnect. While it may be innovative, it does not honor the artist’s original intent and is unlikely to inspire the specific art project.
- High Similarity (CLIP score is high): When a generated image closely resembles the initial sketch, it offers no new perspective, creative leap, or unexpected development. It lacks transformative value and therefore also fails to inspire.
- Optimal Inspirational Range: The highest inspirational value lies in a “sweet spot” where the generated image maintains an optimal balance. It respects the structural and thematic essence of the original concept while introducing engaging, novel elements that advance the artist’s vision.
- The inspiration growth phase (L1 term): The first logistic function models the initial rise in inspirational value. As the CLIP score (x) moves away from zero, the image becomes more relevant to the artist’s intention, and its inspirational potential grows accordingly.
- The replication penalty phase (L2 term): The second logistic function models the decline in inspiration as the image becomes too derivative. As the CLIP score approaches 100, this term’s value increases; subtracted from the first term, it produces a sharp downturn in the total score, penalizing mere replication.
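Assuming Formula (3) has the usual double-sigmoid shape EVS(x) = A·[L1(x) − L2(x)] + C, where each L is a logistic curve with its own midpoint and steepness, the growth and penalty phases above can be reproduced in a few lines. The parameter names (`x_rise`, `k_rise`, `x_fall`, `k_fall`) and their default values are hypothetical, chosen only to show a peak between the two midpoints; their mapping onto x1, x2, k1, k2 should be taken from Formula (3) itself. The CLIP score helper follows the common 100 × max(cos, 0) convention on precomputed embeddings.

```python
import math
from typing import Sequence


def clip_score(image_emb: Sequence[float], text_emb: Sequence[float]) -> float:
    """CLIP score as 100 * max(cosine similarity, 0) on precomputed embeddings."""
    dot = sum(a * b for a, b in zip(image_emb, text_emb))
    norm = math.sqrt(sum(a * a for a in image_emb)) * math.sqrt(sum(b * b for b in text_emb))
    return 100.0 * max(dot / norm, 0.0)


def logistic(x: float, midpoint: float, steepness: float) -> float:
    return 1.0 / (1.0 + math.exp(-steepness * (x - midpoint)))


def evs(x: float, x_rise: float = 40.0, k_rise: float = 0.2,
        x_fall: float = 80.0, k_fall: float = 0.3,
        amplitude: float = 100.0, offset: float = 0.0) -> float:
    """Assumed double-sigmoid EVS: growth term minus replication penalty."""
    growth = logistic(x, x_rise, k_rise)    # L1: relevance to the artist's intent grows
    penalty = logistic(x, x_fall, k_fall)   # L2: penalty as x approaches 100
    return amplitude * (growth - penalty) + offset


# The score is low at both extremes and peaks in the "sweet spot".
best = max(range(101), key=evs)
print(f"peak at CLIP score {best}: EVS = {evs(best):.1f}")
```

With these illustrative defaults, the curve is near zero for very dissimilar and for near-duplicate images, matching the low- and high-similarity cases described above.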
4. Experiments and Discussion
4.1. Experimental Setup
4.2. The Performance of Creative Inspiration
- Impressionism aims to capture the ephemeral qualities of light and atmosphere, often evoking subtle emotions such as tranquility, wonder, or nostalgia that are closely tied to the depicted scene [25]. Weaving additional emotion into an Impressionist “Base” image may therefore compromise its rendering of color, light, and atmosphere. Expressionism seeks to convey feelings and emotions directly; its artists used intense, non-naturalistic colors and agitated brushstrokes to express subjective experiences, anxieties, and psychological states [25]. Consequently, weaving emotion into an Expressionist “Base” image may likewise interfere with its direct compositional and color expression.
- The “Original” condition is the score of the original artwork. For the “Abstract Expressionism” and “Impressionism” art movements, the “Original” score is significantly lower than both the “Base” and “EmotionWoven” scores. The Abstract Expressionism scores in particular show an interesting pattern: they peak at “EmotionWoven” and then dip below “Base” at “Original”. We infer that Abstract Expressionism is particularly challenging for artists, because it emphasizes non-representational forms and spontaneous gesture, using energetic brushstrokes, dripping paint, and bold colors or serene color fields to express complex, profound, and usually subconscious feelings [25]. In other words, our mechanism (“EmotionWoven”) has a significant inspirational effect compared with the traditional art creation method (“Original”), especially for the “Abstract Expressionism” and “Impressionism” art movements.
4.3. Emotion Vision Score Metric Analysis
4.4. Limitations of the Experimental Design
- Establishing a controlled and reproducible first step: The primary goal of the current work was to establish the feasibility and internal validity of the proposed mechanism, particularly the Emotion Vision Score (EVS). Human creativity is inherently subjective and variable, making it difficult to control in initial experiments. Using famous art movements with well-documented stylistic and emotional contexts (e.g., the angst of Expressionism and the geometry of Cubism) allowed us to create a standardized, reproducible benchmark and to systematically test whether the EVS metric behaves as theoretically intended across diverse artistic styles before introducing the complexities of an on-site user study.
- Simulating the generalized creative process: Converting a finished artwork into a line-drawing sketch reasonably simulates an artist’s initial compositional idea. The analysis of the original artwork provides a proxy for the rich “Text Description” that an artist might provide about their vision. This allowed us to provide a proof of concept for the entire mechanism—from the initial concept to the inspirational output—and validate that the mechanism could generate significantly different results based on various emotional prompts.
4.5. Benchmarking of the EVS Metric
- Face Validity: The mathematical formulation of the EVS, combining a double-sigmoid and a diversity decay function, was explicitly designed to model the established psychological principle that creativity flourishes in a “sweet spot” between the familiar and the novel. The function’s shape is a direct mathematical representation of this theory, giving it strong face validity.
- Convergent Validity with Domain Knowledge: The results presented in Figure 5 provide a form of convergent validation. The EVS metric generated scores that align with established knowledge from art history. For instance, “Cubism”, an art form defined by its geometric structure, received a high EVS even from the initial sketch, whereas “Abstract Expressionism”, which relies heavily on color and gesture, received a lower initial score but showed significant improvement after emotion weaving. This demonstrates that the EVS is sensitive to stylistic nuances in a way that is consistent with expert human understanding.
- Parameter Sensitivity Analysis: The analysis of the diversity decay component (α) in Figure 7 serves as a sensitivity analysis. It shows that the metric behaves predictably as its internal parameters are adjusted, indicating that it is a stable, well-behaved mathematical function.
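A sensitivity check of the kind described for Figure 7 can be sketched as a parameter sweep. The example sweeps α in an assumed diversity term α·exp(−k·x) (an assumption, since Formula (4) is not reproduced here) and checks two properties: raising α never lowers the score at any similarity level, and its influence fades as the CLIP score approaches 100. The helper names are hypothetical.

```python
import math
from typing import Callable, Iterable


def diversity_term(x: float, alpha: float, k: float) -> float:
    """Assumed form of the diversity component: alpha * exp(-k * x)."""
    return alpha * math.exp(-k * x)


def sweep(score: Callable[..., float], param: str, values: Iterable[float],
          xs: Iterable[float], **fixed: float) -> dict[float, list[float]]:
    """Evaluate score(x, param=v, **fixed) for every parameter value v and every x."""
    xs = list(xs)
    return {v: [score(x, **{param: v}, **fixed) for x in xs] for v in values}


def is_monotone_in_param(table: dict[float, list[float]]) -> bool:
    """True if, at every x, the score never drops as the parameter increases."""
    keys = sorted(table)
    return all(
        all(lo <= hi for lo, hi in zip(table[a], table[b]))
        for a, b in zip(keys, keys[1:])
    )


table = sweep(diversity_term, "alpha", [0.0, 5.0, 10.0], range(0, 101, 10), k=0.05)
spread = [table[10.0][i] - table[0.0][i] for i in range(len(table[0.0]))]
print("alpha monotone:", is_monotone_in_param(table))
print("effect of alpha at x=0 vs x=100:", round(spread[0], 3), round(spread[-1], 3))
```

The same `sweep` helper accepts any scoring function with keyword parameters, so the double-sigmoid terms could be swept in the same way.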
5. Conclusions
Funding
Conflicts of Interest
Abbreviations
GAI | Generative Artificial Intelligence |
LLM | Large Language Model |
EVS | Emotion Vision Score |
References
- Zhou, E.; Lee, D. Generative artificial intelligence, human creativity, and art. PNAS Nexus 2024, 3, pgae052.
- Boonstra, L. Prompt Engineering (White Paper). Available online: https://www.kaggle.com/whitepaper-prompt-engineering (accessed on 23 April 2025).
- Deonna, J.; Teroni, F. The creativity of emotions. Philos. Explor. 2025, 28, 165–179.
- Galanos, T.; Liapis, A.; Yannakakis, G.N. AffectGAN: Affect-Based Generative Art Driven by Semantics. In Proceedings of the 2021 9th International Conference on Affective Computing and Intelligent Interaction Workshops and Demos (ACIIW), Nara, Japan, 28 September–1 October 2021; IEEE: New York, NY, USA, 2021; pp. 1–7.
- Rothenberg, A. Creative Emotions and Motivations. In Flight from Wonder: An Investigation of Scientific Creativity; Oxford University Press: Oxford, UK, 2014; pp. 59–72.
- Naa Anyimah Botchway, C. Emotional Creativity. In Creativity; Brito, S.M., Thomaz, J., Eds.; IntechOpen: London, UK, 2022.
- Sundquist, D.; Lubart, T. Being Intelligent with Emotions to Benefit Creativity: Emotion across the Seven Cs of Creativity. J. Intell. 2022, 10, 106.
- Čábelková, I.; Dvořák, M.; Smutka, L.; Strielkowski, W.; Volchik, V. The predictive ability of emotional creativity in motivation for adaptive innovation among university professors under COVID-19 epidemic: An international study. Front. Psychol. 2022, 13, 997213.
- Rooij, A.; Corr, P.J.; Jones, S. Creativity and Emotion: Enhancing Creative Thinking by the Manipulation of Computational Feedback to Determine Emotional Intensity. In Proceedings of the 2017 ACM SIGCHI Conference on Creativity and Cognition, Singapore, 27–30 June 2017; pp. 148–157.
- Wu, Z.; Gong, Z.; Ai, L.; Shi, P.; Donbekci, K.; Hirschberg, J. Beyond silent letters: Amplifying LLMs in emotion recognition with vocal nuances. arXiv 2024, arXiv:2407.21315.
- Alzoubi, A.M.A.; Qudah, M.F.A.; Albursan, I.S.; Bakhiet, S.F.A.; Alfnan, A.A. The Predictive Ability of Emotional Creativity in Creative Performance Among University Students. SAGE Open 2021, 11, 215824402110088.
- Koley, S.; Bhunia, A.K.; Sekhri, D.; Sain, A.; Chowdhury, P.N.; Xiang, T.; Song, Y.Z. It’s All About Your Sketch: Democratising Sketch Control in Diffusion Models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 June 2024.
- Prudviraj, J.; Jamwal, V. Sketch & Paint: Stroke-by-Stroke Evolution of Visual Artworks. arXiv 2025, arXiv:2502.20119.
- Chatterjee, S. DiffMorph: Text-less Image Morphing with Diffusion Models. arXiv 2024, arXiv:2401.00739.
- Csikszentmihalyi, M. Flow: The Psychology of Optimal Experience; Harper & Row: New York, NY, USA, 1990.
- Deci, E.L.; Ryan, R.M. Self-determination theory. In Handbook of Theories of Social Psychology; Sage Publications Ltd.: Thousand Oaks, CA, USA, 2012; Volume 1, pp. 416–436.
- Finke, R.A.; Ward, T.B.; Smith, S.M. Creative Cognition: Theory, Research, and Application; MIT Press: Cambridge, MA, USA, 1996.
- Hogan, S. Art Therapy Theories: A Critical Introduction; Routledge: Abingdon, UK, 2015.
- Gero, J.S. Creativity, emergence and evolution in design. Knowl.-Based Syst. 1996, 9, 435–448.
- Adeyekun, A.J. The Elements and Principles of Art. 2019, p. 1. Available online: https://www.academia.edu/100095283/THE_ELEMENTS_AND_PRINCIPLES_OF_ART (accessed on 23 April 2025).
- Clarke, A.; Hulbert, S.; Summers, F. Towards a Fair, Rigorous and Transparent Fine Art Curriculum and Assessment Framework. Arts 2018, 7, 81.
- Radford, A.; Kim, J.W.; Hallacy, C.; Ramesh, A.; Goh, G.; Agarwal, S.; Sastry, G.; Askell, A.; Mishkin, P.; Clark, J. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 8748–8763.
- Zhou, M.; Wang, Z.; Zheng, H.; Huang, H. Long and short guidance in score identity distillation for one-step text-to-image generation. arXiv 2024, arXiv:2406.01561.
- Roper, L. Using Sigmoid and Double-Sigmoid Functions for Earth-States Transitions. 2000. Available online: https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=718b9fdd2ed27d0193179ec0eb01a2ac622d25e3 (accessed on 23 April 2025).
- Meecham, P.; Sheldon, J. Modern Art: A Critical Introduction, 2nd ed.; Routledge: Abingdon, UK, 2005.
- Wang, Z.; Bovik, A.C.; Sheikh, H.R.; Simoncelli, E.P. Image quality assessment: From error visibility to structural similarity. IEEE Trans. Image Process. 2004, 13, 600–612.
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595.
Emotion/State | Brief Description/Manifestation |
---|---|
Passion and Love | A strong motivation, a profound love for the medium, subject, or creation, and an intense desire or compulsion. |
Excitement and Anticipation | The excitement of a new idea and the eagerness to see a vision come to life. |
Curiosity and Wonder | Exploring materials and techniques, asking “Why?”, and staying open to new ideas. |
Focus and Flow (Immersion) | Deep concentration, fully absorbed in work; time disappears; effortless action; deep satisfaction. |
Frustration and Struggle | When things do not go as planned, the artist may face material resistance, creative blocks, and feelings of inadequacy. |
Doubt and Vulnerability | Questioning one’s abilities and ideas, worrying about how the work will be received, and feeling vulnerable in the creative process. |
Determination and Perseverance | Striving to overcome challenges by actively solving problems and refining the work to align with one’s vision. |
Joy and Elation | When elements align effectively, challenges are resolved, and innovative concepts emerge. |
Satisfaction and Fulfillment | A sense of accomplishment upon completion, along with the feeling that the effort was worthwhile. |
Surprise and Discovery | Artwork can take unexpected directions, leading to “happy accidents” and emergent qualities that bring delight. |
Contemplation and Reflection | Taking a step back to evaluate the work involves critical thinking and introspective decision making. |
Catharsis and Release | Transforming difficult emotions and experiences into creative expression serves as a powerful therapeutic tool for healing and growth. |
Anxiety (Performance/Reception) | Concern about how completed work or commissions will be received by others. |
Creative Emotion Tag | Primary Theoretical Source | Possible Role Within the Art Creative Process |
---|---|---|
Passion and Love | Self-Determination Theory [16] | Represents the intrinsic motivation and profound connection to the subject or medium, serving as the foundational driver for creative work. |
Excitement and Anticipation | Creative Cognition Models [17] | Captures the initial high-energy state when a new idea is conceived and the artist is eager to see it realized. |
Curiosity and Wonder | Problem-Solving Models of Creativity [17] | Reflects the exploratory mindset of the artist, involving openness to new techniques, materials, and ideas, and asking “What if?”. |
Focus and Flow (Immersion) | Flow Theory [15] | Represents the state of deep, effortless concentration and engagement where the artist is fully absorbed in the act of creating. |
Frustration and Struggle | Creative Cognition Models [17] | Acknowledges the common and necessary phase of encountering creative blocks, technical challenges, or material resistance. |
Doubt and Vulnerability | Psychoanalytic and Art Therapy Principles [18] | Addresses the internal states of self-questioning and the emotional risk involved in expressing personal ideas and exposing them to others. |
Determination and Perseverance | Self-Determination Theory; Problem-Solving Models [16,17] | Embodies the artist’s drive to overcome obstacles, refine their work, and bring their vision to completion through sustained effort. |
Joy and Elation | Flow Theory [15] | The positive affective state experienced when creative elements align, challenges are successfully overcome, or an innovative breakthrough occurs. |
Satisfaction and Fulfillment | Self-Determination Theory [16] | Relates to the feelings of competence, accomplishment, and autonomy that arise upon the completion of an artwork. |
Surprise and Discovery | Emergentist Theories of Creativity [19] | Represents the “happy accidents” and unexpected outcomes that arise during the process, leading to new and unplanned creative directions. |
Contemplation and Reflection | Creative Cognition Models [17] | Describes the critical phase of stepping back to evaluate the work, involving introspective decision making and assessment of the piece. |
Catharsis and Release | Psychoanalytic and Art Therapy Principles [18] | Captures the therapeutic function of art, where the creative process is used to express, process, and find release from difficult internal emotions. |
Anxiety (Performance/Reception) | Art Therapy Principles; Social Models of Creativity [18] | Relates specifically to the external pressures and concerns about how the final artwork will be perceived, judged, or received by an audience. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Yu, Y.-C. Research on Emotion-Based Inspiration Mechanism in Art Creation by Generative AI. Mathematics 2025, 13, 2597. https://doi.org/10.3390/math13162597