Article

Modeling Inherent Aesthetics and Contextual Decisions for Personalized Color Recommendation in AIGC

School of Mechanical Science & Engineering, Huazhong University of Science and Technology, Wuhan 430075, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(3), 1543; https://doi.org/10.3390/app16031543
Submission received: 11 December 2025 / Revised: 31 January 2026 / Accepted: 1 February 2026 / Published: 3 February 2026
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract

While creative Artificial Intelligence (AI) tools offer unprecedented creative power, their outputs often create a “personalization gap” by converging towards a generalized “average aesthetic” that ignores nuanced user preferences. This study addresses this challenge with a proof-of-concept computational framework to model and predict subjective color choices, aiming to make creative systems more human-centered. Our dual-track methodology attempts to decouple user preference into “inherent aesthetic profiles” and “contextual design decisions.” Through a dual-level study with 111 participants, we quantified inherent aesthetics into a vector library and trained a Gradient Boosting Decision Tree (GBDT) model on contextual data to predict design choices. The model achieved a predictive accuracy of 40.8%, and a grouped permutation importance analysis revealed the Product Category (Importance = 0.416) as the dominant predictor, providing evidence that design context is paramount. Crucially, a subsequent exploratory user validation study, analyzed with a linear mixed-effects model, showed our personalized recommendations were rated as significantly more satisfying (β = 1.278, p < 0.001) than those of a non-personalized baseline. This research provides a foundational framework for modeling subjective preference by distinguishing between stable traits and dynamic choices, offering a potential pathway to steer creative AI beyond generic outputs towards more personal and context-aware creative partners.

1. Introduction

1.1. The Rise of AIGC and the New “Personalization Gap”

The advent of creative Artificial Intelligence (AI) tools, such as Midjourney and DALL-E [1], has marked a paradigm shift, redefining the boundaries of digital creativity. These models offer unprecedented capabilities, generating visually stunning and complex imagery from simple text prompts, thereby democratizing design and accelerating production workflows [2]. However, amidst this wave of generative power, a fundamental limitation has emerged: a persistent “personalization gap.” While technically proficient, these AI models are typically trained on vast, heterogeneous internet data. Consequently, their color outputs often converge towards a statistically derived, generalized “average aesthetic” [3]. While a rich body of color preference research exists, it was largely developed for static design contexts and is ill-equipped to provide the real-time, dynamic, and personalized guidance required by modern generative systems. This “one-size-fits-all” approach struggles to cater to the specific, nuanced, and deeply personal aesthetic preferences of individual users or niche target audiences.
The core challenge lies in the contradiction between the generalized nature of large-scale models and the subjective nature of aesthetic judgment. A color palette deemed “professional” by a corporate designer is vastly different from one considered “vibrant” by a Generation Z content creator. Yet, current creative AI tools lack the fine-grained control mechanisms to reliably distinguish between and cater to these diverse user needs. The reliance on “prompt engineering” [4] remains a process of trial and error, falling short of a systematic, predictable method for aligning generative outputs with individual tastes. This personalization gap not only limits the practical utility of these tools in professional design workflows but also hinders their potential to become truly collaborative and user-adaptive creative partners [5,6]. For instance, in consumer electronics, it prevents a brand from automatically generating product colorway variations that appeal to different market segments—such as tech enthusiasts versus casual users. Similarly, in industrial design, it limits the ability to rapidly create customized product visualizations for client presentations that align with a specific corporate identity or user persona. While fully integrating personalization into the complex, end-to-end creative AI pipeline is a grand challenge, a foundational first step is to develop a robust, computational model that can accurately predict user preference. This study focuses on this foundational step, aiming to provide the predictive intelligence that could one day power such personalized generative systems.

1.2. Towards a Dual-Level Preference Model: Decoupling Inherent Aesthetics from Contextual Decisions

To begin bridging this personalization gap, we must recognize that user preference is not a monolithic concept. While a wealth of research confirms that color preference is not arbitrary but is deeply contextual and tied to meaning [7,8], we argue that a truly effective personalization model must go a step further by deconstructing user preference into two levels: the stable, abstract “inherent aesthetic” and the dynamic, concrete “contextual decision.” A user’s inherent aesthetic—for instance, their general affinity for a “Tech/Minimalist” style—constitutes what we term their relatively stable “quantified aesthetic profile.” However, how this inherent preference is modulated, translated, and applied when they make a design decision for a home appliance (the context) is a more complex and dynamic process.
The core hypothesis of this study is that by first explicitly distinguishing and separately quantifying these two preference levels at the data layer, the complex problem of subjective preference modeling becomes tractable. We introduce a multi-dimensional semantic space composed of four fundamental pillars of design intent: Style (e.g., “Bright”), Function (e.g., “Warning,” “Safe”), Emotion (e.g., “Joyful,” “Calm”), and Culture. The methodology for constructing this multi-dimensional lexicon and, crucially, the neurocognitive validation of its semantic-to-color mapping accuracy using Event-Related Potentials (ERPs) have been detailed in our foundational prior work [9]. That study confirmed that colors generated via our framework are cognitively congruent with their target semantics, providing a scientifically robust foundation for the current personalization research. This allows us to move beyond ambiguous color descriptions and instead define personalization modeling as a dual task: first, to profile a user’s inherent aesthetic, and second, to predict their contextual design choices based on that profile.

1.3. Research Questions and Contributions

To lay the groundwork for addressing the personalization gap in AI-driven color design, this study develops and evaluates a machine learning framework for personalized, semantically grounded color recommendation. Our research is guided by the following core questions:
(RQ1) How can users’ inherent aesthetic preferences and their contextual design decisions be effectively distinguished and quantified, and what systematic patterns do they exhibit within our participant sample?
(RQ2) Can a machine learning model be effectively trained to predict a user’s specific design choices based on their profile and the design context within a controlled experimental setting?
(RQ3) What are the key factors within a user’s profile and the design context that most significantly influence color preference predictions?
The primary contributions of this work are threefold. First, we propose a novel dual-track modeling framework that attempts to decouple user preference into “inherent aesthetics” and “contextual decisions.” We characterize the former via aesthetic vectorization and predict the latter using a supervised learning model. This provides a new conceptual paradigm and a potential direction for personalization research in computational aesthetics.
Second, we provide empirical evidence suggesting the framework’s effectiveness. Our Gradient Boosting Decision Tree (GBDT) model [10] not only achieves a predictive accuracy of 40.8% in offline tests, but more importantly, an exploratory follow-up user study demonstrated that its personalized recommendations were rated as significantly more satisfying (β = 1.278, p < 0.001) than a data-driven, non-personalized baseline.
Third, we offer novel quantitative insight into the drivers of aesthetic judgment. Our robust grouped permutation importance analysis revealed a clear hierarchy, with Product Category (Context) being the dominant predictor (Importance = 0.416). This “Context is King” finding provides data-driven evidence for the necessity of context-aware personalization in creative AI tools. By addressing these questions, our study offers a potential path toward making generative tools not just more powerful, but more personal, intuitive, and truly aligned with the diversity of human creative intent.

2. Related Work

To position our contribution, we review three intersecting domains of research: the factors influencing color preference, existing computational approaches to color selection, and the current state of personalization within creative AI. Across all three domains, we will argue that a common limitation persists: the challenge of systematically distinguishing and modeling a user’s stable, ‘inherent aesthetic’ from their dynamic, ‘contextual decisions,’ a gap our dual-track framework is specifically designed to address.

2.1. Factors Influencing Color Preference

The study of color preference has a rich history, evolving from a search for universal aesthetic principles to a more nuanced understanding of its contextual and individual nature. Early research often sought to identify universally liked colors, with foundational theories like the Ecological Valence Theory positing that preferences are tied to affective responses to objects associated with those colors [11], often leading to broad, cross-cultural trends such as a general preference for the color blue [12,13]. However, this universalist view has been increasingly challenged by research demonstrating the profound impact of various moderating factors.
A significant body of work has established that preference is shaped by experience and environment. Cultural background plays a critical role in shaping color associations and, consequently, preferences, with colors like white holding vastly different meanings in Eastern and Western contexts [14,15,16]. Similarly, demographic factors such as age and gender have been consistently shown to correlate with distinct patterns of color choice, with preferences often shifting across the lifespan [17]. More pertinent to our research, a user’s professional background can create a significant divergence in aesthetic judgment. A consistent finding is that design professionals often exhibit preferences for more complex, muted, or unconventional palettes compared to laypersons, who may favor more primary and saturated colors [18].
While these studies provide compelling evidence that color preference is a multi-faceted phenomenon, they are often descriptive in nature. They successfully identify that these factors have an influence but do not offer a computational model for how they interact. Specifically, they implicitly group factors that constitute a user’s stable, inherent traits (e.g., culture, age, professional training)—what we define as an ‘inherent aesthetic profile’—but they do not systematically model the translation process of this profile into a dynamic choice within a specific application context (a ‘contextual decision’). Therefore, a clear research opportunity exists in moving from descriptive identification to a predictive, computational framework that can begin to quantify this translation process and leverage it to build a functional, personalized recommendation engine.

2.2. Computational Approaches to Color Selection and Recommendation

Existing computational tools for color selection can be broadly categorized into three paradigms. When examined through the lens of our dual-track framework, each reveals a fundamental inability to decouple inherent aesthetics from contextual decisions.
First, rule-based systems, such as early versions of Adobe Color, are grounded in classical color harmony theories proposed by artists like Johannes Itten [19]. These tools generate palettes based on fixed geometric relationships on the color wheel (e.g., complementary, analogous, triadic) [20]. While theoretically sound and interpretable, these systems are inherently rigid. Crucially, their failure in personalization is twofold: they completely ignore the concept of an ‘inherent aesthetic profile’—applying the same harmony rules to every user—and they are context-agnostic, lacking any mechanism for ‘contextual decisions.’ A complementary color scheme for a children’s toy is generated with the same logic as one for a medical device.
Second, data-driven systems emerged with the advent of large-scale image repositories. Tools like Color Hunt and some features of Pinterest mine popular designs from platforms like Behance and Dribbble to extract trending color palettes [21]. These systems excel at reflecting contemporary aesthetics but, by their nature, capture the “average aesthetic” of a community. The core limitation here is that they conflate inherent aesthetics with contextual decisions. A trending palette might be popular because it was predominantly used for fintech app interfaces (a contextual factor), or because its high-contrast nature appeals to a certain designer demographic’s inherent taste (an inherent factor). Data-driven systems are blind to this distinction, treating popularity as a monolithic signal. They fail to decouple why something is popular and therefore cannot answer the critical personalization question: ‘For this specific user (with their unique inherent aesthetic profile), what color is appropriate for this specific product (their contextual decision)?’
Third, semantic and affective models represent a step towards more intelligent selection, establishing statistical mappings between emotion words (e.g., “Exciting,” “Calm”) and specific color attributes [22,23]. Research in Kansei Engineering, for example, has long sought to connect psychological feelings with product properties like color [24]. While these models rightly focus on the critical link between color and meaning, their failure lies in modeling a universal context-to-color link while ignoring the moderating role of the user’s inherent aesthetic profile. They answer the question, “What color corresponds to ‘calm’?”, but fail to address the subsequent, more critical question: “For a user whose inherent aesthetic profile favors ‘Tech/Minimalist’ styles, what concrete color choice will they make when designing a product that needs to convey ‘calmness’ (the contextual decision)?” Their choice will invariably differ from that of a user with a ‘Vintage/Warm’ aesthetic profile facing the same context. By failing to model this crucial interaction between stable traits and dynamic choices, these systems cannot achieve true personalization.
In summary, prior approaches leave a significant gap for a framework that can distinguish and model a user’s “inherent aesthetics” versus their “contextual decisions.” A system is needed that is not only semantically rich but also deeply personalizable, moving beyond the application of a universal semantic rule to all users.

2.3. Personalization in the Creative AI Era

The rise of Artificial Intelligence-Generated Content (AIGC) has fundamentally altered the landscape of creative tools, but personalization remains a nascent feature. Current methods for personalization, while powerful, often operate at a coarse granularity and typically do not implement the systematic decoupling of inherent and contextual preferences that our framework proposes.
The first method is prompt engineering, a manual, iterative process of trial-and-error [4]. While a user can try to specify both inherent style (e.g., “in a minimalist style”) and context (e.g., “for a coffee maker”), this process is unsystematic and unpredictable. It relies on the model’s opaque internal representations rather than a structured understanding of the user, thus failing to provide a reliable bridge between a user’s profile and the final output.
The second method is model fine-tuning, using techniques like Low-Rank Adaptations (LoRAs) to generate images in a specific style [25]. This approach is powerful for capturing a user’s specific aesthetic style—for instance, training a LoRA on a designer’s portfolio. However, it is often a static solution. A LoRA trained for ‘vintage sci-fi posters’ is ill-equipped to adapt its output for a ‘modern corporate logo’ design task. It captures the inherent preference but lacks a dynamic mechanism to modulate that preference based on a new ‘contextual decision.’
To crystallize the distinctions discussed and explicitly highlight the novelty of our framework, we present a comparative analysis against leading paradigms in design recommendation and creative AI personalization. Table 1 contrasts our Dual-Track Model with three state-of-the-art approaches, evaluating them across key dimensions of personalization.
As the comparison in Table 1 illustrates, our Dual-Track Model offers a unique synthesis of capabilities that existing approaches lack. While data-driven methods fail to personalize beyond group averages and AIGC fine-tuning can capture inherent style but often lacks contextual adaptability, our framework is one of the first to explicitly model and predict the interaction between a user’s stable aesthetic profile and their dynamic, context-driven choices. This dual-level approach not only aims for a higher degree of personalization but also provides a scalable, interpretable, and context-aware solution that addresses a critical gap in the current state of the art.

3. Methodology

3.1. Research Framework Overview

This study employs a systematic, multi-phase research methodology designed to construct and evaluate a conceptual framework for color preference prediction that attempts to decouple users’ inherent aesthetic preferences from their contextual design decisions. As illustrated in Figure 1, the entire research framework is organized into three core phases: Dual-Level Data Collection, Dual-Track Data Modeling, and Application System Validation. This framework serves not only as the methodological blueprint for this section but also as a clear roadmap for the research findings that follow.
The research commenced with a meticulously designed Dual-Level Data Collection phase. Through a comprehensive online study, we systematically gathered user profiles, abstract aesthetic preferences (Level 1), and contextual design choices (Level 2) from 111 participants, establishing the foundational dataset for the subsequent modeling phases.
Subsequently, in the Dual-Track Data Modeling phase, we applied a differentiated approach to process the collected data:
(1)
Aesthetic Vectorization Track: For the Level 1 abstract preference data, we employed a quantification strategy to transform it into multi-dimensional vectors intended to represent each user’s “inherent aesthetic profile.” This process resulted in a computable User Aesthetic Vector Library.
(2)
Decision Modeling Track: For the Level 2 contextual choice data, we formulated a supervised learning task. Through feature engineering, we trained a Gradient Boosting Decision Tree (GBDT) model to learn the mapping from “user profile + design context” to “specific design decisions.”
Finally, in the Application System Validation phase, these two outputs were integrated into a hybrid recommendation system. An independent, counterbalanced within-subjects user study was conducted to evaluate the potential utility of this system in simulated design scenarios.

3.2. Phase 1: The Dual-Level Data Collection Framework

To build a model capable of capturing a structured representation of user preferences, we designed an online study, termed the “Dual-Level, Four-Dimensional Preference Profiling Framework,” to investigate the interplay between user characteristics, abstract aesthetics, and contextual color choices.

3.2.1. Participants

A total of 111 participants were recruited for the study through an online survey platform, Wenjuanxing (www.wjx.cn; Ranxing Information Technology Co., Ltd., Changsha, China). The cohort comprised 58 females (52.3%) and 53 males (47.7%). Participants’ ages ranged from 18 to over 55 (estimated M ≈ 33.8 years), with a high concentration of participants (86.5%) falling between 18 and 45 years old. To investigate the influence of professional background, the cohort was divided into two groups based on self-reported design experience. The Non-Design Professional Group consisted of 68 participants (61.3%) who reported having no design-related experience. The Design-Experienced Group consisted of 43 participants (38.7%), which included students and practitioners from design-related fields (n = 33) as well as individuals with design experience as a personal hobby (n = 10). Participants also provided information on their educational background and familiarity with various product types. All participation was voluntary, and informed consent was obtained prior to the study. A detailed summary of participant characteristics is provided in Table 2. Descriptive statistics of participant characteristics were calculated using SPSS Statistics (version 28.0; IBM Corp., Armonk, NY, USA).

3.2.2. The Dual-Level, Four-Dimensional Preference Profiling Framework

To construct a comprehensive and robust user preference profile, we designed and implemented a systematic data acquisition strategy engineered to capture both what we term intrinsic aesthetic inclinations and the applied, context-dependent judgments of users. The entire framework is grounded in the four semantic dimensions established in our prior work: Style, Function, Emotion, and Culture [9].
It is crucial to clarify the role of the ‘Culture’ dimension within this study’s scope. While Style, Function, and Emotion represent relatively universal psychological constructs, the Culture dimension is inherently context-specific. As detailed in our foundational work [9], the cultural semantics identified and utilized in this study are deeply rooted in the symbolic system of the “Five Colors” theory from Chinese culture (e.g., red for festivity, yellow for nobility).
Therefore, the ‘Culture’ pillar in this research should be interpreted not as a universal finding, but as a specific instantiation demonstrating our framework’s potential capability to capture and model culturally specific color associations. The methodological framework itself—identifying key cultural symbols and mapping them to color preferences—is proposed as a generalizable approach. Future research could apply this same framework to other cultural contexts to populate it with different, locally relevant color semantics (e.g., mapping colors to concepts of purity, mourning, or royalty in a Western context).
The framework operates on two complementary levels of abstraction:
  • Level 1: Profiling Inherent Aesthetics
This level aims to capture a representation of a user’s aesthetic identity, as expressed through choices within our stimulus set. Participants were systematically presented with 25 subcategories across the four dimensions (e.g., “Vibrant” for Style, “Safety” for Function). These 25 subcategories were not arbitrarily chosen but are the outcome of a rigorous, multi-stage process detailed in our foundational work [9]. As summarized therein, they were derived by first using natural language processing to mine a large corpus of design-related texts, followed by unsupervised clustering (K-means) to identify the four primary dimensions. Subsequently, a combination of hierarchical clustering, topic modeling, and expert calibration was employed to deconstruct each primary dimension into these fine-grained, semantically coherent subcategories. This ensures that the stimuli used in the present study are grounded in a systematically constructed and validated semantic space. For each subcategory, its name, a core semantic description, and three representative color options were displayed. Participants were required to choose their most preferred option. This process aims to map out their “inherent aesthetic profile,” providing data on their foundational inclinations towards different design concepts, independent of any specific application.
  • Level 2: Profiling Contextual Design Decisions
This level measures how a user’s choices are modulated by real-world design constraints. To achieve this, we introduced multiple distinct products (e.g., a household hair dryer, a handheld screwdriver). For each product, participants were presented with three color schemes representing different design directions (e.g., “Bright,” “Fashionable,” and “Gentle” styles for a desk lamp, as shown in Figure 2) and were required to make a three-alternative forced choice (3-AFC). The generation of these three color options was systematic, ensuring that while all choices were semantically plausible, they represented a clear hierarchy of appropriateness for the specific product context. The curation logic was as follows:
  • Optimal Color: One color was chosen to be highly conventional and contextually optimal, representing the most common or classic design choice for that product category (e.g., a “high-visibility warning yellow” for a handheld screwdriver, a standard color in the power tool industry for safety and visibility).
  • Plausible but Suboptimal Color: A second color was selected from a related but less conventional semantic style. It is aesthetically valid but represents a more niche or creative choice for the product (e.g., a “hazard orange-red” for the same screwdriver, which also aligns with the “warning” function but is often associated with more urgent emergency equipment).
  • Plausible but Distant Color: The third color was chosen from a semantic style that is generally positive but contextually distant, acting as a sensible yet less fitting distractor (e.g., a “durable combat camouflage” color, which, while conveying robustness, shifts the primary semantic from “warning” to “durability” and is less typical for a consumer power tool).
Figure 2. Three-alternative forced choice (3-AFC) task for the desk lamp scenario, presenting three distinct style schemes: (a) a desk lamp representing the ‘Bright’ style; (b) a desk lamp representing the ‘Fashionable’ style; (c) a desk lamp representing the ‘Gentle’ style.
This structured approach, which presents a gradient of contextual appropriateness, ensures that the participant’s choice is a meaningful signal of their pragmatic design preference within this forced-choice task. As illustrated in Figure 2, for a desk lamp, the options might correspond to “Bright,” “Fashionable,” and “Gentle” styles. This choice does not merely indicate what they like in general, but forces them to make a pragmatic design decision for a specific product. This level captures their “contextual design decisions,” revealing how abstract preferences translate into concrete choices within a given context.
Crucially, the goal of this task is not to predict a user’s choice of a specific, discrete color hex code, but rather to predict their preference for the underlying semantic style represented by that color. Each of the 25 semantic subcategories in our lexicon (e.g., “Bright,” “Gentle,” “Durable”) is parametrically mapped to a continuous, bounded region within the HSB color space, a mapping that was developed and neurocognitively validated in our prior work [9]. Therefore, by training the model to predict the preferred semantic category in a forced-choice task, we are in effect training it to identify the most suitable region of the continuous color space for a given user and context. This semantic generalization approach is the key to scaling our model’s predictions beyond the discrete options presented in the survey to real-world applications involving continuous color selection or palette generation. Furthermore, while each product was presented with a unique trio of semantic options, all options were drawn from this unified, global lexicon of 25 semantic categories. This design ensures a consistent label space for our predictive model, a critical detail for its construction and evaluation as elaborated in Section 3.3.2.
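The semantic generalization step described above can be sketched as follows. This is a minimal illustration only: the region bounds below are hypothetical placeholders, not the ERP-validated HSB mappings from the prior work [9], which are not reproduced here.

```python
import random

# Hypothetical HSB benchmark regions for two of the 25 semantic
# subcategories (H in degrees, S and B in percent). The real, validated
# bounds come from the foundational prior work and are assumptions here.
SEMANTIC_HSB_REGIONS = {
    "Bright": {"H": (40, 60),   "S": (60, 100), "B": (80, 100)},
    "Gentle": {"H": (180, 220), "S": (10, 40),  "B": (70, 95)},
}

def sample_color(semantic_label, rng=random):
    """Draw one concrete HSB color from the region mapped to a semantic label."""
    region = SEMANTIC_HSB_REGIONS[semantic_label]
    return {axis: rng.uniform(lo, hi) for axis, (lo, hi) in region.items()}

def in_region(color, semantic_label):
    """Check whether an HSB color falls inside a label's benchmark region."""
    region = SEMANTIC_HSB_REGIONS[semantic_label]
    return all(region[a][0] <= color[a] <= region[a][1] for a in ("H", "S", "B"))
```

Because the model predicts the semantic category rather than a discrete hex code, any color sampled from (or tested against) the predicted category's region is a valid concrete instantiation, which is what allows the forced-choice predictions to scale to continuous palette generation.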
This dual-level data acquisition strategy was systematically repeated across all four semantic dimensions. The resulting dataset is thus uniquely rich, providing a holistic view that links user profiles to both their stable aesthetic core and their dynamic, context-sensitive preferences. This structured, hierarchical data is fundamental to building a context-aware and generalizable prediction system.

3.2.3. Foundation: Semantic Lexicon Construction and Neurocognitive Validation

The stimuli and semantic structure used in this study are built upon a validated computational framework from our prior work [9]. A summary of its construction and validation is provided here for completeness.
Lexicon Construction and Semantic-to-Color Mapping: The foundational framework was built in two stages. First, a four-dimensional semantic lexicon was constructed through a rigorous data-driven and expert-guided process. This involved mining candidate terms from multiple corpora, using K-means clustering to identify four primary dimensions, and then having a panel of 12 experts calibrate the final categorization of 234 core terms, achieving substantial inter-rater reliability (Fleiss’ Kappa = 0.78). These dimensions were named Style, Function, Emotion, and Culture. Second, a parametric model was established to map these semantic concepts to the perceptually uniform CIELAB color space. For each of the 25 subcategories used in the present study, a “benchmark domain” (a bounding box in CIELAB) was systematically defined by synthesizing principles from color theory with empirical data from industry standards.
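The inter-rater reliability reported above (Fleiss’ Kappa = 0.78) can be reproduced from first principles with a short stdlib-only computation; the function below is a generic illustration of the statistic, not the expert panel’s actual rating data.

```python
def fleiss_kappa(ratings):
    """ratings: list of per-item category counts, e.g. [[3, 0], [0, 3]]
    means 2 items, 3 raters, 2 categories (each row sums to the number
    of raters). Returns Fleiss' kappa for agreement beyond chance."""
    n_items = len(ratings)
    n_raters = sum(ratings[0])
    n_cats = len(ratings[0])
    # Mean per-item agreement P_bar.
    p_bar = sum(
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_items
    # Expected chance agreement P_e from marginal category proportions.
    p_e = sum(
        (sum(row[j] for row in ratings) / (n_items * n_raters)) ** 2
        for j in range(n_cats)
    )
    return (p_bar - p_e) / (1 - p_e)
```

Perfect agreement across raters yields kappa = 1, while values around 0.78, as in the expert calibration, indicate substantial agreement by conventional benchmarks.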
Neurocognitive Validation: Crucially, the validity of this semantic-to-color mapping was empirically tested using Event-Related Potentials (ERPs), a method that measures the brain’s response to semantic congruity via the N400 component. This neurocognitive experiment yielded two key findings that directly inform the present study:
Cognitive Congruency: An equivalence test (TOST) on N400 amplitudes confirmed that colors generated by the framework were statistically equivalent (p < 0.001) to expert-defined, highly congruent colors, validating that the mapping is cognitively meaningful.
Hierarchy of Semantic Importance: An analysis of semantic mismatch conditions revealed a clear hierarchy in how color attributes contribute to meaning: Hue (H) > Brightness (B) > Saturation (S). This finding provides a data-driven rationale for prioritizing certain color attributes in our preference modeling.
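One practical use of this hierarchy is a weighted color-mismatch score in which hue differences dominate, brightness differences matter less, and saturation least. The sketch below assumes illustrative weights (0.6/0.3/0.1); the validated finding establishes only the ordering H > B > S, not these specific values.

```python
# Illustrative weights encoding the validated ordering Hue > Brightness >
# Saturation; the numeric values are assumptions for demonstration.
WEIGHTS = {"H": 0.6, "B": 0.3, "S": 0.1}

def hue_delta(h1, h2):
    """Shortest angular distance between two hues in degrees (0..180)."""
    d = abs(h1 - h2) % 360
    return min(d, 360 - d)

def weighted_hsb_distance(c1, c2, w=WEIGHTS):
    """Mismatch score between two HSB colors; each axis normalized to [0, 1]
    before weighting, so hue disagreements dominate the score."""
    dh = hue_delta(c1["H"], c2["H"]) / 180
    ds = abs(c1["S"] - c2["S"]) / 100
    db = abs(c1["B"] - c2["B"]) / 100
    return w["H"] * dh + w["S"] * ds + w["B"] * db
```

Under this weighting, a hue-opposite color scores as a larger semantic mismatch than an equally large shift in saturation alone, consistent with the N400 mismatch results.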

3.3. Phase 2: Dual-Track Data Modeling

Following the structured data acquisition, we proceeded to the Dual-Track Data Modeling phase. In contrast to monolithic user modeling approaches that often rely on implicit feedback to learn a single latent representation of the user (e.g., via matrix factorization in collaborative filtering [26]), our framework adopts an explicit, dual-track strategy. This approach allows us to separately model a user’s stable traits and their dynamic decision-making processes, offering a path toward greater interpretability. The proposed innovation lies in a two-stage strategy that first quantifies a user’s stable, intrinsic preferences and then uses this profile to predict their dynamic, context-dependent choices.
Stage 1: Constructing the Inherent Aesthetic Profile from Level 1 Data. The first stage aims to create a quantitative profile of each user’s core aesthetic disposition. As detailed in Section 3.3.1, this is achieved by analyzing the relative colorimetric properties (i.e., HSB Z-scores) of their choices in abstract, context-free tasks. The output is a multi-dimensional ‘inherent aesthetic profile’ vector for each user, which serves as a stable feature set representing their preferences as captured by our instrument.
Stage 2: Training the Contextual Predictor from Level 2 Data. The second stage aims to predict a user’s choice in a specific design scenario. As detailed in Section 3.3.2, we trained a machine learning model where the target variable is the semantic category of the user’s chosen design. The model learns a mapping from a combination of the user’s static ‘inherent aesthetic profile’ (from Stage 1) and dynamic contextual features (e.g., product category) to a predicted semantic preference. This semantic prediction can then be potentially generalized to continuous color spaces based on our prior work [9].
This dual-component architecture allows the framework to first model who the user is at a fundamental level, and then predict what they will choose in a specific situation. The following sections elaborate on the methodologies for each of these two modeling tracks.

3.3.1. Aesthetic Vectorization: Constructing the Inherent Aesthetic Profile Library

The Level 1 data, reflecting a user’s abstract preferences, serves to define “who the user is.” To construct this user profile, often termed an ‘inherent aesthetic profile,’ we opted for an explicit feature engineering approach based on users’ semantic preferences. This contrasts with implicit profiling methods that learn user embeddings from behavioral data like clicks or purchases [27]. Our explicit method was chosen because each dimension of the resulting vector is directly interpretable as a preference for a specific semantic concept (e.g., ‘Bright’ or ‘Tech’), a crucial requirement for building a transparent and human-centered design tool. To transform their discrete choices into this representation, we developed a hybrid vectorization strategy that employs two distinct methods tailored to different semantic dimensions. The resulting 25-dimensional vector constitutes each user’s “vectorized aesthetic signature.”
1. Weighted Scoring for ‘Style’ Dimensions
For the 10 ‘Style’ dimensions, which represent high-level aesthetic concepts, we employed a two-part weighted scoring method instead of Z-scores to better capture both conceptual and visual preference. This score is calculated using the formula: Score = Base_Weight + (1 − Base_Weight) × Fine_Weight. The components of this formula are determined as follows: The Base_Weight is assigned a value of 0.5 if a user indicates a preference for the style’s abstract name, and 0 otherwise, capturing their direct conceptual affinity. The Fine_Weight is calculated as the ratio of selected color samples to the total number of samples associated with that style (e.g., if a user selects 3 out of 12 associated colors, the Fine_Weight is 0.25), measuring their approval of the style’s specific color instantiations. This composite approach thus provides a nuanced score that combines a user’s abstract affinity for a style concept with their concrete preference for its visual representation.
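As a minimal sketch of this scoring rule (the function name and example values are ours, not from the study's codebase), the composite score can be computed as:

```python
def style_score(likes_style_name: bool, n_selected: int, n_total: int) -> float:
    """Score = Base_Weight + (1 - Base_Weight) * Fine_Weight."""
    base_weight = 0.5 if likes_style_name else 0.0     # conceptual affinity
    fine_weight = n_selected / n_total                 # e.g., 3 of 12 colors -> 0.25
    return base_weight + (1 - base_weight) * fine_weight

# A user who likes the style's abstract name and selects 3 of its 12 colors:
print(style_score(True, 3, 12))   # 0.5 + 0.5 * 0.25 = 0.625
# A user who rejects the name but still selects 6 of 12 associated colors:
print(style_score(False, 6, 12))  # 0.0 + 1.0 * 0.5 = 0.5
```

Note that the Base_Weight acts as a floor: conceptual affinity alone guarantees a score of at least 0.5, with visual approval filling in the remainder.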
2. Relative Preference Z-scores for ‘Function,’ ‘Emotion,’ and ‘Culture’ Dimensions
Conversely, for the 15 dimensions related to more concrete concepts (i.e., Function, Emotion, and Culture), we used a relative preference calculation method based on Z-scores to quantify a user’s tendency towards specific chromatic properties. The process is as follows:
(1) Chromatic Attributes and Choice of Color Space: A critical methodological choice in this study was the use of the HSB (Hue, Saturation, Brightness) color model for vectorizing user preferences. This differs from the perceptually uniform CIELAB space utilized in our foundational framework [9], a decision that was deliberate and predicated on the different goals of the two tasks. While our prior work aimed for generative accuracy, for which CIELAB is optimal, the current study’s goal is to model a user’s subjective aesthetic preferences. For this purpose, the HSB model’s dimensions align more intuitively with human conceptualizations of color properties—namely, its identity (Hue), intensity (Saturation), and lightness (Brightness).
Therefore, to capture this conceptual structure of preference, we chose to operate within the HSB space. To determine which of its attributes carries the most semantic weight, we drew upon the findings from our foundational neurocognitive study [9]. That study, using Event-Related Potentials (ERPs), empirically established a clear hierarchy of semantic importance for color attributes: Hue (H) > Brightness (B) > Saturation (S). The results showed that while mismatches in Hue and Brightness elicited significant semantic conflict signals in the brain (a large N400 effect), the effect of a Saturation-only mismatch was negligible (Cohen’s d = −0.020).
Given that Hue (H) is a categorical attribute and difficult to treat linearly, we selected the next most semantically impactful continuous attribute, Brightness (B), as the basis for our Z-score calculation. This choice is therefore not arbitrary but is grounded in neurophysiological evidence suggesting that a user’s preference along the Brightness dimension is a stronger indicator of their semantic interpretation than their preference for Saturation.
(2) Local Normalization: For each question, the mean and standard deviation of the Brightness values were calculated based only on the three options presented. We acknowledge that using a standard deviation derived from only three samples is a methodological limitation that makes the resulting Z-score sensitive to the specific composition of our curated triplets. This measurement constraint is a key consideration when interpreting the scope of our findings. To ensure transparency regarding our stimulus design, we analyzed the descriptive statistics for the Brightness values of all 15 triplets used in the Z-score calculations. The analysis revealed sufficient variability in the options presented to participants (the mean standard deviation of Brightness across all triplets was 0.055). This indicates that the stimuli were not constructed with uniformly minimal or maximal variance, which supports the reasonableness of this local normalization approach within our experimental context.
(3) Z-score Calculation: Based on the user’s specific choice, we then computed the standard score (Z-score) of the chosen color’s Brightness relative to the local mean of the three options. This captures a user’s preference for relatively higher or lower brightness within that specific semantic context, a dimension empirically shown to be critical for conveying meaning [9].
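The local normalization and Z-score steps can be sketched as follows. This is an illustrative reconstruction: the example Brightness values are hypothetical, and the use of the population standard deviation (rather than the sample estimate) over the three options is our assumption, as the text does not specify which was used.

```python
import statistics

def local_brightness_zscore(chosen_b: float, option_bs: list[float]) -> float:
    """Z-score of the chosen option's Brightness relative to the three
    options shown in that question (local normalization)."""
    mu = statistics.mean(option_bs)
    sigma = statistics.pstdev(option_bs)  # population SD over the triplet (our assumption)
    return (chosen_b - mu) / sigma

# Hypothetical triplet with Brightness 0.60, 0.75, 0.90; the user picks the darkest:
z = local_brightness_zscore(0.60, [0.60, 0.75, 0.90])
print(round(z, 2))  # negative: a relative preference for lower brightness
```

A positive Z-score in a given semantic context thus encodes a relative pull toward brighter options, independent of the triplet's absolute brightness level.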
The vectors from all 111 participants, constructed using this hybrid methodology, collectively form the User Aesthetic Vector Library. The robustness and internal consistency of these composite vectors were empirically validated through a split-half reliability analysis, as detailed in Section 4.1.4.

3.3.2. Decision Modeling: Building the Context-Aware Preference Predictor

The Level 2 data, reflecting a user’s decision-making in specific contexts, is ideal for learning a mapping rule from “input” to “output.” We leveraged this to build a supervised learning model.
1. Dataset and Feature Engineering
The final dataset for modeling was constructed from the survey responses of 111 participants. Each participant completed 9 multi-class choice tasks (3 for Style, 2 for Function, 2 for Emotion, and 2 for Culture), resulting in a total of 999 decision records. Each record was then transformed into a feature vector to serve as input for our predictive models.
The input feature space (X) was engineered to comprise three main components: a User Profile, the user’s “Inherent Aesthetic Profile”, and the Contextual Features. The User Profile encapsulated demographic attributes (Age, Gender, Education) and a nuanced representation of professional expertise, which was decomposed into its nature (hobby/professional) and duration (years). The “Inherent Aesthetic Profile” component is the 25-dimensional vector representing stable, abstract preferences, as derived from the Level 1 task. The final and most critical component involved contextual features engineered through a principle we term “semantic generalization”. This approach is the cornerstone of our model’s ability to be context-aware. Instead of treating each product scenario from the survey (e.g., “hair dryer”) as a unique, isolated instance, we mapped them to a higher-level abstract feature: product_category (e.g., “Personal Consumer Electronics”). This generalized category, along with the dimension_context (“Style”, “Function”, etc.), was then one-hot encoded.
This comprehensive process resulted in a final feature set of 103 features per record, providing a rich, multi-faceted representation of both the user and the decision-making context.
The target variable (y) for this supervised learning task was defined as the semantic label of the user’s chosen design option, creating a multi-class classification problem. A critical aspect of our experimental design is that while the combination of presented options was unique to each product scenario (e.g., {“Bright”, “Fashionable”, “Gentle”} for a desk lamp, or {“White”, “Black”, “Red”, “Yellow”, “Cyan”} for a car), all semantic labels were drawn from a single, unified global label space. This space includes the 25 subcategories from our four core dimensions. For instance, in the car scenario, the model’s task is to predict which of the five cultural colors the user will choose. By framing the problem this way, our multi-class classifier is not trained on inconsistent, local label sets. Instead, it learns to predict a user’s preference for a specific semantic concept from a consistent, overarching vocabulary, ensuring the model is interpretable and its accuracy is a meaningful measure of performance.
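A minimal sketch of the contextual part of this encoding is given below. The records, category strings, and labels are illustrative stand-ins, not rows from the study's actual dataset; in the full pipeline these one-hot columns would be concatenated with the demographic features and the 25-dimensional aesthetic profile vector.

```python
import pandas as pd

# Hypothetical decision records: each row is one choice task for one user.
records = pd.DataFrame({
    "product_category": ["Personal Consumer Electronics", "Transportation"],
    "dimension_context": ["Style", "Culture"],
    "chosen_label": ["Bright", "Red"],  # target y: a semantic label from the
                                        # unified 25-subcategory global space
})

# "Semantic generalization": encode the abstract category, not the product.
X_context = pd.get_dummies(records[["product_category", "dimension_context"]])
y = records["chosen_label"]
print(list(X_context.columns))
```

Because every scenario maps to one of a small set of generalized categories, the encoding stays compact and transfers across products (e.g., a hair dryer and a wireless mouse share the "Personal Consumer Electronics" columns).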
2. Model Selection and Training
For the task of predicting a user’s contextual design choice, we selected tree-based ensemble models for evaluation, specifically Gradient Boosting Decision Tree (GBDT) and Random Forest. These models are particularly well-suited for tabular data of this nature, as they can effectively capture complex, non-linear interactions between features and are robust to feature scaling, outperforming traditional linear models like logistic regression in many similar classification tasks [28]. Furthermore, unlike less transparent ‘black-box’ models such as deep neural networks, their mechanisms for feature importance analysis provide critical insights into the drivers of aesthetic judgment, a key requirement for addressing RQ3. All data processing, modeling, and analysis were conducted using Python (version 3.8.8), with pandas (version 2.0.3) for data manipulation, NumPy (version 1.24.4) for numerical operations, and scikit-learn (version 1.3.2) for the machine learning models, the Group K-Fold cross-validation protocol, and performance evaluation metrics.
To ensure a rigorous and stable evaluation of model performance, we employed a 5-fold Group K-Fold cross-validation protocol. The entire pool of participants was randomly partitioned into 5 distinct groups. While we did not apply stratification at the group level to balance class distributions across folds, this user-level grouping is paramount for preventing the primary threat of data leakage in a personalization context. In each of the 5 “outer loop” iterations, data from 4 groups of users were used for training, while data from the remaining group served as the held-out test set.
This Group K-Fold approach guarantees that all data from a single participant are exclusively in either the training or the test set within each fold, preventing data leakage. Crucially, all preprocessing steps (e.g., feature scaling) were fitted only on the training data of each fold and then applied to the test data. Furthermore, hyperparameter tuning for each model was performed independently within each fold using a nested cross-validation approach. Specifically, for each outer training set (4 folds of users), we conducted an inner 3-fold group cross-validation with a grid search to find the optimal hyperparameters. This nested procedure ensures that the test set for the outer fold remains completely unseen during all stages of model training and tuning, providing an unbiased estimate of generalization performance. The final performance metrics reported are the average and standard deviation of the scores obtained across the 5 outer folds.
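The nested, user-grouped protocol described above can be sketched with scikit-learn as follows. The data here are random stand-ins (30 toy users with 9 decisions each) and the hyperparameter grid is illustrative; the point is the structure, in which both the outer and inner splits respect user boundaries so no participant's data leaks across train/test.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GroupKFold, GridSearchCV

rng = np.random.default_rng(0)
# Toy stand-in data: 30 users x 9 decisions each, 10 features, 5 classes.
users = np.repeat(np.arange(30), 9)
X = rng.normal(size=(270, 10))
y = rng.integers(0, 5, size=270)

outer = GroupKFold(n_splits=5)
scores = []
for train_idx, test_idx in outer.split(X, y, groups=users):
    # Inner 3-fold grouped CV for hyperparameter tuning on the outer
    # training users only; the outer test fold stays completely unseen.
    inner = GroupKFold(n_splits=3)
    search = GridSearchCV(
        GradientBoostingClassifier(random_state=0),
        param_grid={"n_estimators": [10, 25]},   # illustrative grid
        cv=inner.split(X[train_idx], y[train_idx], groups=users[train_idx]),
    )
    search.fit(X[train_idx], y[train_idx])
    scores.append(search.score(X[test_idx], y[test_idx]))

print(f"accuracy: {np.mean(scores):.3f} +/- {np.std(scores):.3f}")
```

On this random data, accuracy hovers around the 20% chance level; with real, structured data the same scaffold yields an unbiased estimate of generalization to unseen users.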
3. Model Performance Evaluation and Feature Importance Analysis
The model’s performance was rigorously evaluated across the cross-validation folds using accuracy and macro-averaged F1-score. Furthermore, to answer RQ3, we analyzed the model’s feature importances to investigate the relative influence of factors like product_category and design experience on users’ color style choices. To ensure the robustness of these findings, we conducted a grouped permutation importance analysis, as detailed in Section 4.3.

4. Results

This section presents the main findings from our study. We first report on the quantified patterns of users’ inherent aesthetic preferences derived from the Level 1 data analysis (addressing RQ1). We then detail the predictive performance of our context-aware model on the Level 2 task (addressing RQ2). Finally, we provide an in-depth feature importance analysis to investigate the key drivers of color preference (addressing RQ3) and report the results of the final user validation study, which provides initial evidence for the practical efficacy of our personalization framework.

4.1. Analysis of Inherent Aesthetic Preferences (Answering RQ1)

To address our first research question (RQ1), our analysis of the Level 1 data from the User Aesthetic Vector Library yielded two key findings. First, we identified systematic, population-level patterns in aesthetic preferences within our participant sample. Second, we validated the vector representation’s ability to capture fine-grained individual similarities and differences.

4.1.1. Systematic Population-Level Preference Patterns

Our analysis revealed distinct and quantifiable user preference patterns across the four semantic dimensions. In the “Style” dimension, a clear hierarchy of preference emerged. “Bright” (selected by 64.9% of participants) and “Gentle” (64.0%) were identified as the two most favored styles, indicating a strong and widespread inclination towards positive and approachable color aesthetics. Following closely was “Fresh” (56.8%), reinforcing an appreciation for natural and clean themes. Styles associated with minimalism and tradition, such as “Plain” (42.3%) and “Tech” (32.4%), also garnered significant interest. In contrast, more niche styles like “Luxury” (14.4%), “Fantasy” (16.2%), and “Functional” (18.0%) commanded a smaller but dedicated audience. These systematic patterns suggest that users have stable and distinguishable internal associations between semantic concepts and specific color properties, providing a foundational basis for subsequent predictive modeling.
These population-level preference patterns show a strong correspondence with foundational findings in affective color science. The widespread appeal of the ‘Gentle’ style, characterized by high brightness and low-to-medium saturation, aligns with a substantial body of research linking pastel-like colors to passive, calming emotions. Conversely, the popularity of the ‘Bright’ style corroborates findings that associate highly saturated and bright colors with active, positive emotions like joy and excitement. Our findings thus provide an ecologically valid confirmation of these principles within the specific context of abstract design semantics.

4.1.2. Validation of Inherent Aesthetic Profile Representation via Similarity Analysis

To further validate that our Abstract Preference Vectors serve as a meaningful representation of a user’s ‘inherent aesthetic profile,’ we conducted a similarity analysis using the cosine similarity metric across all 111 user vectors. The analysis revealed a wide spectrum of similarity scores, suggesting that the vector space effectively captures both shared tastes and unique individual preferences.
As shown in Table 3, we identified pairs of users with extremely high similarity scores (approaching 1.0) and others with very low scores, indicating divergent tastes.
A qualitative review of the underlying choice data for these extreme cases provided supporting evidence for the validity of our vector representation. For instance, User 4 and User 18 (similarity = 0.9740) exhibited a remarkably consistent aesthetic profile: both showed a clear preference for “Bright,” “Gentle,” and “Tech” styles, while simultaneously disliking a wide range of other styles including “Fantasy,” “Elegant,” and “Fashionable.”
Conversely, users with very low similarity scores, such as User 22 and User 96 (similarity = 0.3036), demonstrated diametrically opposed tastes. User 22’s choices for “Bright,” “Tech,” and “Gorgeous” reflected an aesthetic centered on modernity, vibrancy, and sophistication. In stark contrast, User 96’s preferences for “Gentle,” “Fresh,” “Composed,” and “Functional” pointed to an aesthetic grounded in comfort, nature, and practicality.
This analysis provides suggestive evidence that the vector representation is not arbitrary but is a valid and sensitive measure of a user’s inherent aesthetic profile, as captured by our measurement paradigm. This has potential practical implications, particularly for addressing the ‘cold-start’ problem, as a new user’s preferences can be provisionally inferred from their nearest neighbors in the vector space.
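The pairwise similarity computation is standard cosine similarity over the 25-dimensional profile vectors. A brief sketch (the vectors below are random placeholders, not actual participant profiles):

```python
import numpy as np

def cosine_similarity(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine of the angle between two aesthetic profile vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical 25-dim profile vectors (values are illustrative only):
rng = np.random.default_rng(42)
a = rng.random(25)
b = a + rng.normal(scale=0.05, size=25)   # a near-duplicate taste
c = rng.random(25)                        # an unrelated taste

print(cosine_similarity(a, b))  # close to 1.0 (highly similar profiles)
print(cosine_similarity(a, c))  # lower: divergent profiles
```

The same metric supports the cold-start use noted above: a new user's provisional profile can be matched against the library by nearest cosine neighbors.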

4.1.3. Visualization of Semantic-Colorimetric Associations

To make the claim of consistent associations between semantic concepts and color properties more tangible, we visualized the colorimetric properties of the two most popular styles identified in Section 4.1.1: “Bright” and “Gentle.” The HSB parameter ranges for all 25 subcategories were systematically defined and validated in our foundational work [9]; here, we present a focused visualization for these two key styles to illustrate the underlying systematic patterns.
Figure 3 illustrates the defined parameter ranges for Hue (H), Saturation (S), and Brightness (B) that constitute the “Bright” and “Gentle” styles. The plot reveals clear, distinct colorimetric signatures for each semantic concept. Both styles occupy a high Brightness range (B > 0.80), aligning with their positive valence. However, they are sharply distinguished by Saturation: the “Bright” style is defined by high saturation (S > 0.75), while the “Gentle” style is characterized by a markedly low-to-medium saturation range (0.15 < S < 0.50). Furthermore, their Hue ranges also differ, contributing to their unique identities. This visualization provides concrete, quantitative evidence that abstract user preferences for semantic styles have systematic, distinguishable underpinnings in the color space, further validating our approach to quantifying users’ inherent aesthetic profiles.
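The Saturation/Brightness criteria quoted above can be expressed as simple membership tests. This sketch covers only the S and B thresholds stated in the text; the Hue ranges also differ between the two styles but are not reproduced here, so they are deliberately omitted.

```python
def in_bright_sb(s: float, b: float) -> bool:
    """'Bright' style signature: high brightness AND high saturation."""
    return b > 0.80 and s > 0.75

def in_gentle_sb(s: float, b: float) -> bool:
    """'Gentle' style signature: high brightness, low-to-medium saturation."""
    return b > 0.80 and 0.15 < s < 0.50

print(in_bright_sb(s=0.85, b=0.95))  # True: vivid, high-brightness color
print(in_gentle_sb(s=0.30, b=0.90))  # True: pastel-like color
print(in_bright_sb(s=0.30, b=0.90))  # False: too desaturated for 'Bright'
```

The two signatures are disjoint in Saturation, which is exactly what makes them distinguishable styles despite sharing the same high-Brightness band.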

4.1.4. Reliability Analysis of the Inherent Aesthetic Profile Vector Library

To address the potential instability of our vectorization, particularly concerning the Z-scoring method and the use of the HSB color space, we conducted a split-half reliability analysis to assess the robustness of our hybrid measurement methodology. For each participant, the full set of Level 1 preference items was randomly divided into two halves. An independent Aesthetic Profile Vector was computed from each half—replicating our entire hybrid process (weighted scores for Style and Z-scores for others)—resulting in two parallel vector libraries.
We then calculated the mean cosine similarity between each participant’s two vectors across the entire sample (N = 111). The resulting average similarity, treated as a correlation coefficient for this analysis, was r = 0.549. To estimate the reliability of the full set of items, we applied the Spearman–Brown prophecy formula. The analysis yielded an estimated reliability coefficient of 0.709.
A reliability coefficient above 0.7 is considered “acceptable” in psychometric and human–computer interaction studies. This result indicates good internal consistency across the two distinct types of preference measurements used in our framework. It demonstrates that despite the acknowledged methodological limitations, the resulting composite inherent aesthetic profile vectors are robust enough to reliably capture a stable underlying preference structure for each user, thus validating their use in our subsequent modeling.
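The Spearman–Brown step is a one-line calculation, reproduced here as a check against the reported figures:

```python
# Spearman-Brown prophecy formula: estimated reliability of the full-length
# instrument from the split-half correlation r: r_full = 2r / (1 + r).
r_half = 0.549  # mean split-half cosine similarity across N = 111 participants
r_full = 2 * r_half / (1 + r_half)
print(round(r_full, 3))  # 0.709, matching the reported coefficient
```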

4.1.5. Robustness Analysis of the Aesthetic Profile Vectors

To further assess the robustness of our aesthetic profile vectors against sampling variability in participants’ choices, we conducted a bootstrapping analysis. For each participant, we generated 100 bootstrapped choice sets by resampling their 25 Level 1 decisions with replacement, from which a new aesthetic profile vector was computed each time. We then calculated the average cosine similarity between a participant’s original vector and their 100 bootstrapped vectors. The grand average similarity across all 111 participants was 0.81 (SD = 0.04). This high degree of similarity indicates that the core structure of a user’s aesthetic profile is stable and not unduly influenced by any single choice, providing further evidence that our vectorization method captures a consistent and meaningful preference signal rather than random noise.
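The bootstrap procedure can be sketched as below. The `toy_vectorize` function is a deliberately simplified stand-in (a choice-count histogram) for the study's hybrid vectorization; only the resampling logic mirrors the described analysis.

```python
import numpy as np

def bootstrap_stability(choices, vectorize, n_boot=100, seed=0):
    """Mean cosine similarity between the original profile vector and
    vectors rebuilt from choice sets resampled with replacement."""
    rng = np.random.default_rng(seed)
    original = vectorize(choices)
    sims = []
    for _ in range(n_boot):
        resampled = rng.choice(choices, size=len(choices), replace=True)
        v = vectorize(resampled)
        sims.append(np.dot(original, v)
                    / (np.linalg.norm(original) * np.linalg.norm(v)))
    return float(np.mean(sims))

# Toy vectorizer (our simplification): count how often each of the 25
# semantic options was chosen.
def toy_vectorize(chosen_ids):
    return np.bincount(chosen_ids, minlength=25).astype(float)

stability = bootstrap_stability(np.arange(25), toy_vectorize)
print(round(stability, 2))  # high values indicate a stable profile
```

A grand-average similarity near 1.0, as reported (0.81), indicates that no single choice dominates the profile's direction in the 25-dimensional space.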

4.2. Performance of the Context-Aware Predictive Model (Answering RQ2)

To answer our second research question (RQ2), we evaluated the ability of machine learning models to predict a user’s specific design choices based on their profile and the design context. We employed a robust 5-fold cross-validation protocol where users were grouped to prevent data leakage during training and testing. The performance of our primary models was benchmarked against several baseline and ablated models; the average performance across all 5 folds is summarized in Table 4.
To account for the varying difficulty of the choice tasks, we stratified the performance analysis by task type, since chance-level accuracy differs between the 3-alternative forced-choice (3-AFC) tasks and the 5-way choice tasks used for the Culture dimension.
The GBDT (Full Model) achieved an overall accuracy of 40.80%. When broken down, it achieved a mean accuracy of 42.50% on the 3-AFC tasks (chance level: 33.3%) and 34.80% on the more challenging 5-way tasks (chance level: 20.0%). This demonstrates that the model provides a substantial performance gain over random guessing in both scenarios. Notably, the lift above chance was even larger in the more difficult 5-way task (+14.8 percentage points) than in the 3-AFC task (+9.2 percentage points).
To further formalize this chance-adjusted performance, we computed Cohen’s Kappa, a metric that corrects for chance agreement. The GBDT model achieved a Kappa of 0.1380 for 3-AFC tasks and 0.1850 for 5-way tasks. Both values indicate a fair level of agreement beyond what would be expected by chance, again highlighting the model’s particular effectiveness in the more complex 5-way choice context. The Random Forest model exhibited a similar pattern of performance (Overall Accuracy = 41.53%), confirming the robustness of these findings. These stratified results provide a strong affirmative answer to RQ2, suggesting that a machine learning approach can effectively predict user choices in subjective domains of varying difficulty within our experimental setup.
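For reference, a single-stratum kappa can be approximated directly from the observed accuracy and the chance rate. This back-of-the-envelope check is our simplification (the reported values were presumably computed per fold from full confusion matrices), but it closely recovers the paper's figures:

```python
def kappa(p_o: float, p_e: float) -> float:
    """Cohen's kappa from observed accuracy p_o and chance agreement p_e."""
    return (p_o - p_e) / (1 - p_e)

print(kappa(0.425, 1 / 3))  # 3-AFC: ~0.1375, close to the reported 0.1380
print(kappa(0.348, 0.20))   # 5-way: 0.185, matching the reported 0.1850
```

The formula makes explicit why the 5-way kappa is higher despite the lower raw accuracy: the denominator rewards beating a harder chance baseline.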
While a direct comparison of our accuracy scores to other studies is challenging due to the novelty of our specific multi-class prediction task, this result is significant in demonstrating a learnable signal in subjective aesthetic choice. Previous research on preference prediction has often focused on binary choices or ratings for existing items (e.g., in movie recommendation). Our work extends this by showing that even in a more nuanced, simulated generative context of choosing between abstract semantic styles, user preference follows predictable, non-random patterns that can be modeled. This provides an empirical foundation for future investigations into predictive tools for computational aesthetics, moving beyond descriptive analysis towards actionable prediction.
Furthermore, the ablation study, conducted using the GBDT architecture, offers critical insights that provide support for our core hypotheses. The Context-only model (mean accuracy 30.75%) was substantially more predictive than the User-only model (mean accuracy 18.13%), providing quantitative evidence suggesting that “Context is King” (addressing RQ3). The fact that the GBDT (Full Model) substantially outperforms the Context-only model further underscores the significant value of incorporating user-specific ‘inherent aesthetic profiles’ for improving personalization. Therefore, our results collectively show that a model combining contextual and user profile features is the most effective for predicting specific user choices.

4.3. Analysis of Key Drivers of Color Preference (Answering RQ3)

To robustly identify the most influential factors shaping user choices and answer RQ3, we conducted a grouped permutation importance analysis. This advanced method addresses the known biases of standard feature importance techniques, such as the tendency to inflate the importance of high-cardinality features resulting from one-hot encoding. Instead of permuting individual feature columns, we conceptually grouped features into blocks (e.g., all product_category_ columns) and measured the drop in model accuracy when each entire block was permuted simultaneously. This approach provides a methodologically sound, unbiased estimate of the predictive value of higher-level concepts.
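The block-wise permutation logic can be sketched as follows. This is an illustrative implementation under our assumptions (NumPy feature matrix, accuracy scoring, a synthetic demo where only one column carries signal), not the study's actual analysis code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def grouped_permutation_importance(model, X, y, blocks, n_repeats=10, seed=0):
    """Accuracy drop when all columns of a feature block are shuffled
    together with the same row permutation (preserving within-block
    structure, e.g., one-hot groups)."""
    rng = np.random.default_rng(seed)
    baseline = model.score(X, y)
    drops = {}
    for name, cols in blocks.items():
        scores = []
        for _ in range(n_repeats):
            Xp = X.copy()
            perm = rng.permutation(len(X))
            Xp[:, cols] = X[perm][:, cols]   # permute the whole block at once
            scores.append(model.score(Xp, y))
        drops[name] = baseline - float(np.mean(scores))
    return drops

# Demo: only column 0 carries signal, so its block should dominate.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)
model = RandomForestClassifier(n_estimators=50, max_depth=3,
                               random_state=0).fit(X, y)
drops = grouped_permutation_importance(model, X, y,
                                       {"signal": [0], "noise": [1, 2, 3]})
print(drops)
```

Permuting a block with the same row shuffle, rather than each column independently, is what avoids crediting a one-hot group once per dummy column.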
We defined five primary feature groups: User Profile (Level 1), Product Category, Dimension Context, Target Emotion/Color, and Other Context. The results of this analysis, averaged over 10 permutation runs on the held-out test set, are presented in Table 5 and visualized in Figure 4.
The results reveal a clear hierarchy of influence. As shown in Table 5, Product Category emerged as the single most dominant predictor, with its permutation causing a substantial 0.416 drop in model accuracy. This finding provides strong, direct evidence that what is being designed is the primary driver of color preference.
Following closely are other contextual factors. The Other Context and Target Emotion/Color blocks, which specify the concrete goal of the design task (e.g., choosing a color for the “trust” emotion), also showed very high importance (a drop of 0.289 and 0.245 in accuracy, respectively). The broader Dimension Context (e.g., knowing it is an ‘Emotion’ task) was also influential (a 0.102 drop). Collectively, these context-related feature blocks overwhelmingly dictate the model’s predictions, providing robust quantitative support for our “Context is King” conclusion.
Crucially, the User Profile (Level 1) block also demonstrated a clear and positive importance score (an accuracy drop of 0.088). While its influence is secondary to the dominant contextual factors, this result confirms that incorporating the user’s inherent aesthetic profile provides significant, valuable information for personalizing predictions. In summary, our analysis reveals a decision-making hierarchy where the specific design context is paramount, but the user’s personal profile plays a vital, non-trivial role in refining the final choice.
This ‘Context is King’ finding offers a crucial clarification to the broader color preference literature. While many studies identify the influence of demographic factors, our model-based analysis, which weighs all factors simultaneously, suggests their role may be secondary in design-specific tasks. However, we must interpret this finding with a critical caveat. Our stimuli were intentionally constructed to represent distinct semantic concepts (e.g., ‘Safety’ colors vs. ‘Warning’ colors). Consequently, the model’s strong reliance on context features may, to some extent, reflect its ability to learn the heuristics embedded in our stimulus-construction process, rather than uncovering a universally genuine contextual preference structure. While our framework demonstrates powerful predictive capability within the confines of our curated stimulus set, our results should be interpreted as suggestive evidence for the primacy of context, pending validation with more organically generated stimuli.

4.4. User Validation Study: Evaluating Real-World Efficacy

While offline metrics such as accuracy and F1-scores provide a quantitative measure of model performance, a preliminary validation of our framework necessitates an evaluation of its potential efficacy and user-perceived value. To this end, we conducted a mixed-methods user validation study designed to answer a critical question: can our Personalized Model generate color recommendations that are perceived as significantly more satisfying than those from a data-driven but non-personalized baseline?

4.4.1. Experimental Design

We recruited 12 new participants, balanced by professional design experience (6 professionals, 6 non-professionals), who had not been involved in the initial data collection. A counterbalanced, within-subjects design was employed to control for inherent individual differences in aesthetic taste, a powerful method for studies of subjective preference.
Each participant evaluated color recommendations for three distinct products, strategically chosen to represent the diverse, generalized categories our model was trained on: a Wireless Mouse (“Personal Consumer Electronics”), an Electric Screwdriver (“Industrial Tools”), and a Car (“Transportation”). The 12 participants were randomly assigned to one of two groups (Group A or Group B) to counterbalance the order of the experimental conditions across the three product sessions, thus mitigating potential ordering effects, as detailed in Table 6.
The two primary conditions under comparison were the Personalized Condition and the Baseline Condition. For the Personalized Condition, we leveraged our full modeling framework to generate a tailor-made semantic style based on each participant’s unique profile. This involved using the trained GBDT model for scenarios with available contextual data, and the hybrid recommendation system for scenarios where such data was absent. In contrast, the Baseline Condition represented a data-driven, yet generic, “one-size-fits-all” approach, with its construction logic strategically stratified based on the availability of contextual data to ensure a robust and fair comparison.
To establish a fair and objective baseline, we designed data-driven strategies contingent on the coverage of contextual data (Level 2) in our original survey. This “most popular” or modal approach was chosen as the baseline because it represents the standard, non-personalized strategy employed by many commercial systems, which often recommend best-sellers or trending options based on aggregated user behavior. It serves as a strong real-world proxy for a “one-size-fits-all” design approach that leverages the “wisdom of the crowd.” While simple, this baseline is not trivial, as it reflects the most dominant preference within the user population.
For products with available contextual data, such as the Wireless Mouse and Electric Screwdriver, we analyzed our entire initial dataset (N = 111) to identify the single most frequently chosen (i.e., modal) semantic for the corresponding Product Category. However, for the Car scenario, where contextual data was absent, this conventional modal approach was not applicable. To construct an equally data-driven baseline in this cold-start situation, we instead turned to the Level 1 (abstract preference) data and calculated the most popular abstract style selected across all 111 participants, which represents the general, context-free population preference. The results of this analysis are summarized in Table 7, which dictates the fixed recommendation for each product in the Baseline Condition.
Thus, for this study, the predetermined baseline style for the Wireless Mouse was “Fresh”, that for the Electric Screwdriver was “Warning”, and that for the Car was “Bright”, irrespective of the individual participant’s profile.
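This modal-baseline construction amounts to a simple frequency count over the Level 2 responses. A minimal sketch in Python, using toy data rather than the actual N = 111 dataset:

```python
from collections import Counter

def modal_semantic(choices):
    """Return the single most frequently chosen semantic (the mode)."""
    return Counter(choices).most_common(1)[0][0]

# Toy Level 2 responses: (product_category, chosen_semantic) pairs.
responses = [
    ("Personal Consumer Electronics", "Fresh"),
    ("Personal Consumer Electronics", "Fresh"),
    ("Personal Consumer Electronics", "Tech"),
    ("Industrial Tools", "Warning"),
    ("Industrial Tools", "Warning"),
    ("Industrial Tools", "Functional"),
]

by_category = {}
for category, semantic in responses:
    by_category.setdefault(category, []).append(semantic)

# Fixed baseline recommendation per category.
baseline = {cat: modal_semantic(chosen) for cat, chosen in by_category.items()}
# baseline == {"Personal Consumer Electronics": "Fresh",
#              "Industrial Tools": "Warning"}
```

The cold-start case (the Car) follows the same logic, except that the mode is taken over the Level 1 abstract style selections instead of category-specific choices.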
The experimental procedure was consistent for each participant across all three product evaluations; we use the Car scenario to illustrate it. A participant assigned to the Baseline Condition for this product was presented with the single, predetermined recommendation shown in Figure 5. This color corresponds to the “Bright” semantic style, which our analysis of Level 1 preferences identified as the most popular abstract style across the entire population.
Conversely, a participant in the Personalized Condition would have their unique profile analyzed by the model to generate a tailor-made recommendation. The diversity of these outputs is illustrated in Figure 6, which shows six distinct recommendations generated for six different user profiles. For instance, a profile associated with a preference for gentle aesthetics might be recommended a soft pink (“Gentle”, User 1), while another might receive a bold red (“Fashionable”, User 2).
Each participant, after viewing their assigned image, would rate their satisfaction on a 7-point Likert scale and provide qualitative feedback. This counterbalanced exposure across the three products ensures that every participant experiences and evaluates recommendations from both the personalized and baseline systems, allowing for a robust within-subjects comparison.

4.4.2. Validation Results and Analysis

To appropriately analyze our nested data structure, where each participant rated multiple products under different conditions, we employed a linear mixed-effects model (LMM). This advanced statistical approach allows us to test our primary hypothesis while rigorously controlling for potential confounding effects. Satisfaction score was the dependent variable. We included Condition (Personalized vs. Baseline) as a fixed effect to determine if our personalized recommendations were rated higher. To account for the non-independence of observations, we also included random intercepts for both Participant and Product, thereby isolating the true effect of our personalization algorithm from baseline differences across individuals and items.
The key fixed-effect results from the model are summarized in Table 8. The LMM analysis revealed a highly significant positive fixed effect of the Condition (β = 1.278, SE = 0.291, p < 0.001). This indicates that after controlling for baseline variability among participants and products, the personalized recommendations were rated, on average, 1.28 points higher on the 7-point scale than the baseline recommendations.
The 95% confidence interval for this effect, [0.707, 1.848], does not contain zero, further confirming the robustness of this finding. This provides strong statistical evidence that our personalization framework offers a meaningful improvement in user experience. The model’s random effects analysis also revealed that while there was significant variance attributable to the products themselves (σ² = 0.396), the variance attributable to participants was negligible (σ² = 0.000) once product effects were accounted for. This underscores the importance of using a mixed-effects model to disentangle these confounding factors, a step that a simpler t-test could not achieve.
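To make the analysis concrete, the following sketch fits a comparable model on simulated ratings using the `statsmodels` `mixedlm` interface (an assumption about tooling; the paper does not name its software). For simplicity it includes a random intercept for participant only, whereas the full model also included a random intercept for product.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
products = ["Mouse", "Screwdriver", "Car"]
rows = []
for pid in range(12):                        # 12 simulated participants
    u = rng.normal(0, 0.3)                   # participant random intercept
    for prod in products:
        for cond in (0, 1):                  # 0 = Baseline, 1 = Personalized
            score = 4.0 + 1.3 * cond + u + rng.normal(0, 0.5)
            rows.append({"participant": pid, "product": prod,
                         "condition": cond, "score": score})
df = pd.DataFrame(rows)

# Condition as fixed effect, participant as random intercept.
fit = smf.mixedlm("score ~ condition", df, groups=df["participant"]).fit()
beta = fit.params["condition"]               # recovers the simulated effect (~1.3)
```

In this toy simulation every participant rates both conditions for every product; the actual study counterbalanced conditions across products as described in Section 4.4.1.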
Qualitative feedback provided rich context for these quantitative results and further illuminated the nature of this preference. Comments for the personalized recommendations were often specific and affirmative, linking the suggested color to the user’s identity or intended use context. For example, one professional designer noted, “As a designer, I appreciate this subtle beige; it fits my minimalist desk perfectly,” while a student mentioned, “This green is so fresh and calming, I love it.” In stark contrast, feedback for the baseline colors was generally neutral to lukewarm, often describing them as “generic,” “safe,” or “like every other product out there.”
In conclusion, this validation study, analyzed with a robust statistical model appropriate for its design, demonstrates the potential practical value of our AI-driven framework. It confirms that by moving beyond a generic, popularity-based approach, our model generates color recommendations that are not only algorithmically predicted but are also perceived by end-users as significantly more satisfying and contextually appropriate in this exploratory evaluation. These promising findings provide strong initial evidence that our system can help improve the user experience in a simulated application scenario.

4.5. Quantifying Recommendation Diversity to Move Beyond the “Average Aesthetic”

A core objective of our framework is to transcend the “average aesthetic” by providing genuinely diverse, user-specific recommendations, as qualitatively illustrated in Figure 6. To quantitatively validate this claim, we conducted an analysis to measure the diversity of our model’s outputs compared to a standard baseline. We used the CIEDE2000 (ΔE2000) formula, the industry standard for calculating perceived color differences, as our diversity metric.
For this analysis, we selected the same three representative product categories used in the user validation study (Wireless Mouse, Electric Screwdriver, Car). We then chose three user profiles with highly dissimilar ‘aesthetic profiles’, identified via low cosine similarity of their Abstract Preference Vectors: User A (preference for ‘Bright’, ‘Tech’, ‘Gorgeous’), User B (preference for ‘Gentle’, ‘Fresh’, ‘Functional’), and User C (preference for ‘Fresh’, ‘Elegant’).
We generated the top color recommendation for each user-product pair using our personalized model. For the baseline, we used a popularity-based approach that recommends the single most common color for each category, consistent with the logic in our user study. Diversity was then quantified by calculating the average pairwise ΔE2000 among the recommendations for the three different users for each product.
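The diversity computation can be sketched as below. The sketch uses the simpler CIE76 Euclidean distance in CIELAB as a stand-in for the full CIEDE2000 formula (which adds weighting and rotation terms for lightness, chroma, and hue), and the Lab values are illustrative rather than the model's actual outputs.

```python
from itertools import combinations
from math import sqrt

def delta_e_cie76(lab1, lab2):
    """Euclidean distance in CIELAB -- a simplified stand-in for CIEDE2000."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

def average_pairwise_diversity(recommendations):
    """Mean pairwise colour difference among one product's recommendations."""
    pairs = list(combinations(recommendations, 2))
    return sum(delta_e_cie76(a, b) for a, b in pairs) / len(pairs)

# Illustrative CIELAB recommendations for three users, one product.
personalized = [(62.0, -30.0, -20.0),   # User A: vibrant cyan
                (75.0, 5.0, 15.0),      # User B: muted beige
                (90.0, 2.0, 4.0)]       # User C: soft pearl white
baseline = [(55.0, 40.0, 30.0)] * 3    # same colour for every user

# average_pairwise_diversity(personalized) ≈ 39.6
# average_pairwise_diversity(baseline) == 0.0
```

The baseline's identical recommendations yield a diversity of exactly zero, while any genuinely distinct set of colors yields a large mean pairwise distance.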
The results, presented in Table 9, show a stark contrast.
The baseline model, by design, exhibits zero diversity, recommending the same “one-size-fits-all” color to every user. In sharp contrast, our personalized model generated recommendations with average perceptual distances ranging from 41.25 to 53.49. These large ΔE2000 values indicate that the model is not merely suggesting subtle shade variations but is recommending fundamentally different colors that align with each user’s unique aesthetic profile.
For instance, for the ‘Wireless Mouse’, the model might recommend a vibrant cyan for User A (aligning with ‘Tech’), a muted beige for User B (aligning with ‘Functional’), and a soft pearl white for User C (aligning with ‘Elegant’). This quantitative analysis provides strong empirical evidence that our framework successfully avoids the pitfalls of regression to the mean within our tested scenarios. It confirms that the diversity illustrated qualitatively in Figure 6 is a robust and measurable property of our system, effectively translating a user’s unique ‘aesthetic profile’ into genuinely personalized and varied design outputs.

5. Discussion

Our research was motivated by the potential “personalization gap” in modern creative tools, a challenge magnified in the era of AIGC, where generated content is abundant but often generic. By developing and evaluating a machine learning framework for personalized color recommendation, our study offers novel insights into the computational modeling of aesthetic preference within a constrained design task. In this section, we discuss the contributions and theoretical implications of our findings, their potential practical applications, and avenues for future work.

5.1. Core Innovations and Theoretical Implications

While color preference has been a long-standing topic of inquiry, our research introduces several key contributions that distinguish it from prior work, particularly in its potential application to creative AI systems.
First and foremost, we proposed and evaluated a dual-track modeling framework that attempts to decouple “inherent aesthetic preference” from “contextual design decisions.” Traditional studies often stop at identifying population-level trends (e.g., “blue is the most liked color”). Our work contributes a functional, predictive system that explicitly distinguishes between two types of preference data. On one track, through aesthetic vectorization, we quantify a user’s stable, abstract taste (“who they are”) into a computable “inherent aesthetic profile.” On the other, through a supervised learning model, we capture how this profile, when combined with a specific design context (“what they are doing”), translates into a concrete, dynamic design decision. The results from our exploratory user validation study, analyzed with a rigorous linear mixed-effects model, offer strong support for this approach. The analysis showed that our personalized recommendations were perceived as significantly more satisfying than a data-driven baseline (β = 1.278, p < 0.001). This suggests that subjective aesthetic preference, while complex, is not random but contains a learnable, systematic signal within our specific experimental paradigm.
Second, we proposed a “semantic generalization” approach for feature engineering that may help address the cold-start problem in recommendation systems. Instead of treating each product as a unique instance, which limits scalability, we mapped specific products to generalized categories (e.g., “Personal Consumer Electronics”). Our grouped permutation importance analysis revealed that product_category was the single most dominant predictor (Importance = 0.416). This is a crucial finding: it provides evidence that context is paramount and offers a scalable method for applying the model to new, unseen products, suggesting a potential cold-start solution for recommendation systems in the design domain.
Third, our model provides quantitative observations that align with long-standing sociological theories in design. The grouped permutation importance analysis reveals that a user’s profile—shaped by their life experience and training—is the second most influential block of predictors after contextual factors (User Profile Importance = 0.088). While our previous analysis based on Gini Importance hinted at the role of specific demographics, the more robust grouped permutation method confirms the collective, non-trivial predictive value of the user profile as a whole. This lends empirical support to concepts like cultural capital and generational aesthetics, suggesting that life stage and background may shape an individual’s visual vocabulary. It also provides a basis for quantifying the notion of “trained taste,” by demonstrating a learnable signal in the preferences of design professionals and novices. By successfully modeling these factors, our framework serves not only as a proof-of-concept tool but also as a computational model for exploring the layered structure of human aesthetic judgment.
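Grouped permutation importance permutes all columns of a feature block jointly and records the resulting drop in accuracy. The following self-contained sketch uses a toy rule-based predictor and illustrative feature groups (the actual analysis was run on the trained GBDT model):

```python
import random

def grouped_permutation_importance(predict, X, y, groups, n_repeats=20, seed=0):
    """Mean accuracy drop when all columns in a feature group are shuffled jointly."""
    rng = random.Random(seed)
    n = len(X)
    base_acc = sum(predict(row) == label for row, label in zip(X, y)) / n
    importances = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            perm = list(range(n))
            rng.shuffle(perm)
            # Replace the group's columns with values from a shuffled row order.
            X_perm = [{**row, **{c: X[perm[i]][c] for c in cols}}
                      for i, row in enumerate(X)]
            acc = sum(predict(row) == label for row, label in zip(X_perm, y)) / n
            drops.append(base_acc - acc)
        importances[name] = sum(drops) / n_repeats
    return importances

# Toy data: the chosen semantic depends only on the product category.
X = [{"product_category": c, "age": a}
     for c in ("Electronics", "Tools") for a in (20, 30, 40, 50)]
y = ["Fresh" if r["product_category"] == "Electronics" else "Warning" for r in X]
predict = lambda r: "Fresh" if r["product_category"] == "Electronics" else "Warning"

imp = grouped_permutation_importance(
    predict, X, y, {"Context": ["product_category"], "User Profile": ["age"]})
```

In this toy setup, shuffling the context block destroys accuracy while shuffling the profile block leaves it unchanged, mirroring (in exaggerated form) the dominance of `product_category` reported above.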

5.2. Practical Implications and a Conceptual AIGC Workflow

The potential practical value of our framework is best illustrated by conceptualizing its integration as an intelligent “personalization layer” into a generative system. To demonstrate this potential, we have instantiated this concept in a proof-of-concept desktop application, the “AI Color Workshop,” which builds directly upon our prior work [9]. It is important to note that this application serves as a conceptual demonstration of a potential end-to-end workflow, rather than a fully evaluated design tool.
As shown in Figure 7, the application integrates three modules. The “AI Color Projector” (left) for semantic-to-color mapping and the “AI Image Generator” (right) for final rendering were developed in our foundational study. They provide the core engine for translating any given semantic into a precise color and applying it to a generated image. The novel contribution of the current research is the central “AI Smart Recommender” module. This new component embeds our trained preference model, acting as an intelligent bridge that personalizes the entire workflow.
The integrated end-to-end workflow operates as follows:
  1. User Preference Input: A user interacts with the central “AI Smart Recommender” module. Here, they input the target audience’s profile (e.g., age, design experience), the design context (e.g., “Home Appliances”), and select their preferred abstract design styles (e.g., “Bright”, “Tech”).
  2. Personalized Semantic Recommendation: Our hybrid recommendation model (a synergy of the GBDT model and a similarity algorithm leveraging the User Aesthetic Vector Library), embedded within this module, instantly processes the input and generates a ranked list of preferred semantic styles with corresponding confidence scores (e.g., Top 1: Bright, 98.0%). This step replaces manual guesswork with a data-driven prediction of the target user’s taste. The similarity-based component is particularly crucial for addressing the cold-start problem for new users. As validated by our cosine similarity analysis (Section 4.1.2), which quantitatively confirmed that vector similarity correlates with taste alignment, we can instantly provide reasonable recommendations for a new user by identifying existing users with a similar ‘inherent aesthetic profile’, making the system immediately useful.
  3. Controllable AIGC Generation: The user can then select a high-ranking semantic. This action can either trigger the “AI Color Projector” module to explore specific color parameters or directly command the “AI Image Generator”. In the latter case, the chosen semantic is translated into a precise color and embedded into a prompt, instructing the AIGC model to render the final product—such as the refrigerator shown—in a color that is deeply aligned with the target user’s predicted preferences from the very first step.
This closed-loop process—from user profile to personalized semantic to controllable generation—illustrates a potential pathway toward “going beyond the average aesthetic.” It suggests a conceptual model for a more human-centered generative system that is aware of who it is designing for, providing one possible direction for the development of intelligent, context-aware, and personalized creative tools.
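The similarity-based cold-start step in this workflow can be sketched as a nearest-neighbor lookup over the aesthetic vector library. The vectors and style labels below are hypothetical illustrations, not entries from the actual library.

```python
from math import sqrt

def cosine_similarity(u, v):
    """Cosine of the angle between two preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def recommend_for_new_user(new_vector, library, k=1):
    """Cold start: adopt the styles of the k most similar existing users."""
    ranked = sorted(library.items(),
                    key=lambda item: cosine_similarity(new_vector, item[1]["vector"]),
                    reverse=True)
    return [(uid, profile["top_style"]) for uid, profile in ranked[:k]]

# Hypothetical aesthetic vector library (vectors and styles are illustrative).
library = {
    "user_1": {"vector": [0.9, 0.1, 0.2], "top_style": "Tech"},
    "user_2": {"vector": [0.1, 0.8, 0.3], "top_style": "Gentle"},
    "user_3": {"vector": [0.2, 0.2, 0.9], "top_style": "Elegant"},
}
new_user = [0.85, 0.15, 0.25]
recommend_for_new_user(new_user, library)  # → [("user_1", "Tech")]
```

Because the lookup needs only the new user's abstract preference vector, the system can produce a recommendation before any contextual choice data has been observed.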

5.3. Limitations and Future Work

Despite the promising results, this study is subject to several important limitations that define the scope of our claims and suggest avenues for future research.
First, the measurement framework itself has inherent constraints. Our two-level profiling methodology is dependent on the specific stimuli used, and the Level 1 ‘aesthetic profile’ captures preferences relative to our author-curated image triplets. Furthermore, the Level 2 data was collected via a three-alternative forced-choice (3-AFC) task, which may not fully replicate the complexity of unconstrained, real-world design behavior. While our reliability analyses support the internal consistency of these measures, their validity as indicators of a general, context-independent aesthetic trait requires further psychometric investigation.
Second, the strong contextual effects identified in our Level 2 model may be partially confounded with our stimulus design. The color triplets were curated to represent specific semantic concepts. This means the model may have learned to recognize the authors’ stimulus-construction heuristics rather than a genuine, underlying user preference structure. While our grouped permutation importance analysis provides robust evidence for the dominance of context within our measurement paradigm, future work is essential to disentangle the effects of our constructed task from a more general principle of contextual preference using more diverse or user-generated stimuli.
Third, the user validation study does not test real-world design workflows or AIGC usage. The study evaluated satisfaction with pre-selected color recommendations, which is a step removed from the complexities of tasks like prompt engineering for AIGC systems or iterative design refinement. The findings demonstrate the model’s ability to predict relative preference among curated choices, but future research is essential to integrate and evaluate this capability within more authentic, designer-centered creative tasks.
Fourth, the findings related to the ‘Culture’ dimension are specific to the cultural and linguistic context of our participant sample. As we clarified in our methodology, the cultural semantics used in this study were intentionally grounded in Chinese color theory to serve as a proof-of-concept for our framework’s ability to model culturally specific knowledge. Consequently, the model’s predictions related to this dimension are not expected to generalize directly to other cultures. Validating the cross-cultural transferability of our entire framework is a critical next step. Future research should apply our data collection and modeling methodology to diverse cultural regions to build and compare distinct, culturally specific preference models. This would not only test the generalizability of our framework but could also lead to valuable insights in the field of comparative color semantics.
Fifth, our online data collection method introduces the potential for sampling bias. Participants were recruited through online platforms and may have a higher-than-average interest in design and aesthetics, leading to a self-selection bias. This could potentially influence the generalizability of the systematic preference patterns identified in response to RQ1. For example, the strong population-level preference for styles like “Bright” and “Gentle” might be more pronounced in this demographic than in the general population. However, it is important to note that the primary goal of this study was not to establish universal aesthetic laws, but rather to develop and validate a computational framework for capturing, quantifying, and modeling individual aesthetic preferences. The core findings—such as the dominance of context over demographics (RQ3) and the model’s ability to generate diverse, satisfying recommendations (RQ2)—are predicated on the internal consistency and structure of the collected data, rather than its absolute population-level representativeness. Future work could address this limitation by recruiting a more stratified sample from diverse demographic and psychographic backgrounds to further test the cross-population validity of the observed preference patterns.
Sixth, the current model predicts preferences for single colors or simple color schemes within a fixed semantic space. Future work should expand the framework to address the complexity of multi-color palettes and color harmony. This could involve training models to predict the suitability of color combinations or integrating our preference model with established computational harmony rules.
Seventh, the construct validity of our ‘quantified aesthetic profile’ warrants careful consideration. Its Z-score component, used for the Function, Emotion, and Culture dimensions, relies on a standard deviation estimated from only three author-curated samples (n = 3). This makes the measurement sensitive to the specific composition of these triplets. To address this, we took several steps: we aimed to increase transparency, as detailed in our methodology, by analyzing and reporting the variability within the stimulus triplets; we demonstrated the vector’s internal consistency via split-half reliability (r = 0.709, Section 4.1.4); and we confirmed its robustness against choice variability through a bootstrapping analysis, which showed a high average similarity of 0.81 between original and resampled vectors (Section 4.1.5).
However, we concur with the critique that these analyses primarily support the measurement’s stability and consistency—that is, it captures a stable response pattern to our specific stimulus design—rather than definitively establishing it as a direct measure of a context-independent, inherent trait. Future work must employ more robust psychometric approaches to model these choices, such as ordinal regression or paired-comparison frameworks, which can derive preference scores with stronger theoretical underpinnings. This would be a critical next step to validate whether the stable structure we identified is an artifact of our measurement or a reflection of a deeper, inherent preference.
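The bootstrapping robustness check referenced above can be sketched as follows: resample a user's choices with replacement, rebuild the preference vector, and compare it to the original via cosine similarity. The vectorization here is a deliberate simplification (normalized choice counts) of the paper's Z-score construction, so the numbers are illustrative only.

```python
import random
from math import sqrt

STYLES = ["Bright", "Gentle", "Tech", "Fresh", "Elegant"]

def vector_from_choices(choices):
    """Simplified profile: unit-normalised style-choice counts (the paper's
    actual vectorisation uses Z-scores over stimulus triplets)."""
    counts = [choices.count(s) for s in STYLES]
    norm = sqrt(sum(c * c for c in counts)) or 1.0
    return [c / norm for c in counts]

def cosine(u, v):
    return sum(a * b for a, b in zip(u, v))  # both vectors unit-normalised

def bootstrap_similarity(choices, n_boot=200, seed=0):
    """Mean cosine similarity between the original vector and vectors
    rebuilt from bootstrap resamples of the choice data."""
    rng = random.Random(seed)
    original = vector_from_choices(choices)
    sims = []
    for _ in range(n_boot):
        resampled = rng.choices(choices, k=len(choices))
        sims.append(cosine(original, vector_from_choices(resampled)))
    return sum(sims) / n_boot

choices = ["Bright", "Bright", "Gentle", "Bright", "Tech", "Gentle"]
sim = bootstrap_similarity(choices)
```

A mean similarity near 1 indicates the vector is stable under resampling of the underlying choices, which is the property the reported 0.81 average similarity was used to demonstrate.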
Eighth, the user validation study, while employing a rigorous counterbalanced within-subjects design and an appropriate statistical model (LMM), was conducted with a small sample of 12 participants. While this design allowed us to detect a statistically significant effect, the small sample size limits the broader generalizability of our findings. The conclusions drawn from this study should therefore be interpreted as exploratory and providing initial, promising evidence, pending validation from future, larger-scale user studies.
Finally, while our model predicts preference based on static user profiles, individual tastes can evolve. A longitudinal study tracking users’ preferences over time could enable the development of dynamic models that adapt to a user’s changing aesthetic sensibilities, leading to an even more sophisticated level of long-term personalization.

6. Conclusions

In an era increasingly shaped by generative AI, a critical challenge has emerged: bridging the gap between the powerful yet often generic output of creative AI tools and the nuanced, subjective aesthetic preferences of individual users. This research addressed this “problem of the average” by proposing and evaluating a proof-of-concept machine learning framework that served as a foundational step toward developing personalized, context-aware color recommendations.
Our primary contribution lies in the proposal and implementation of a dual-track modeling approach that attempts to decouple “inherent aesthetic preference” from “contextual design decisions.” Through this methodology, we showed that subjective color preference, far from being arbitrary, can be modeled as a learnable, systematic signal within the constraints of our experimental paradigm. We first quantified users’ abstract preferences into ‘inherent aesthetic profiles’ via aesthetic vectorization, and then used a supervised learning model to learn how this profile translates into specific choices within particular design contexts. This framework, which separates the stable traits of a user from their dynamic decisions, forms the conceptual cornerstone of our research.
Our exploratory user validation study provided promising initial evidence of the framework’s efficacy. Analyzed with a rigorous linear mixed-effects model, personalized recommendations were rated as significantly more satisfying (β = 1.278, p < 0.001) than a data-driven, one-size-fits-all baseline. This finding empirically supports our core hypothesis that a context-aware, personalized approach is superior to generic solutions in a controlled, simulated recommendation task.
Ultimately, this work offers more than just a predictive model; it delivers a conceptual methodology and a potential direction for a new generation of creative tools, as embodied by our “AI Color Workshop” proof-of-concept. By demonstrating a method to understand and cater to individual preferences at the semantic level, we offer a promising path to steer creative AI systems towards becoming more human-centered. The framework presented here lays empirical groundwork for a future where generative AI might serve not merely as a content producer, but as a more personalized and collaborative creative partner, capable of crafting outputs that resonate with the unique aesthetic profile of each user.

Author Contributions

Conceptualization, L.L. and X.L.; methodology, L.L.; validation, L.L.; investigation, L.L.; data curation, L.L.; writing—original draft preparation, L.L.; writing—review and editing, L.L. and X.L.; visualization, L.L.; supervision, X.L.; project administration, X.L.; funding acquisition, X.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Department of Hubei Province under the Key Research and Development Program of Hubei Province, grant number 2021BCD005.

Institutional Review Board Statement

According to institutional guidelines, formal ethical approval is not required for this type of study. The research exclusively involved an anonymous online survey where participants voluntarily reported their aesthetic preferences. The protocol involved no deception, posed no more than minimal risk to participants, and all collected data were fully anonymized. The study was conducted in accordance with the principles of the Declaration of Helsinki.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to express their sincere gratitude to all the participants who voluntarily took part in the online surveys and user validation studies. Their time and insightful feedback were invaluable to the successful completion of this research.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ramesh, A.; Dhariwal, P.; Nichol, A.; Chu, C.; Chen, M. Hierarchical Text-Conditional Image Generation with CLIP Latents. arXiv 2022, arXiv:2204.06125. [Google Scholar] [CrossRef]
  2. Epstein, Z.; Hertzmann, A.; Investigators of Human Creativity; Akten, M.; Farid, H.; Fjeld, J.; Frank, M.R.; Groh, M.; Herman, L.; Leach, N.; et al. Art and the science of generative AI. Science 2023, 380, 1110–1111. [Google Scholar] [CrossRef] [PubMed]
  3. Manovich, L. AI Aesthetics; Strelka Press: Moscow, Russia, 2018. [Google Scholar]
  4. Liu, P.; Yuan, W.; Fu, J.; Jiang, Z.; Hayashi, H.; Neubig, G. Pre-train, Prompt, and Predict: A systematic survey of prompting methods in natural language processing. ACM Comput. Surv. 2023, 55, 1–35. [Google Scholar] [CrossRef]
  5. Yu, K.; Xiao, Y.; Li, M.; Yu, S.; Yang, Y.; Guo, X.; Zhang, W.; Yuan, X. Design for AI-Integrated design team collaboration: A strategy and exploration using Node Flow in establishing a reusable representation of knowledge in the collaborative process. In Proceedings of the Design Research Society Conference 2024 (DRS2024), Boston, MA, USA, 23–28 June 2024. [Google Scholar] [CrossRef]
  6. Turchi, T.; Carta, S.; Ambrosini, L.; Malizia, A. Human-AI co-creation: Evaluating the impact of large-scale text-to-image generative models on the creative process. In Proceedings of the International Symposium on End-User Development (IS-EUD 2023), Cagliari, Italy, 6–9 June 2023; pp. 343–351. [Google Scholar] [CrossRef]
  7. Ou, L.-C.; Luo, M.R.; Woodcock, A.; Wright, A. A study of colour emotion and colour preference. Part I: Colour emotions for single colours. Color Res. Appl. 2004, 29, 232–240. [Google Scholar] [CrossRef]
  8. Palmer, S.E.; Schloss, K.B. Human preference for individual colors. In Proceedings of the Human Vision and Electronic Imaging XV, San Jose, CA, USA, 18–21 January 2010; Volume 7527, pp. 353–364. [Google Scholar]
  9. Li, L.; Liu, X.; Wang, Y. Bridging the Semantic Gap in Computational Color Design: A Neurocognitively Validated Framework. IEEE Access 2026, 14, 576–591. [Google Scholar] [CrossRef]
  10. Luo, X.; Yuan, M. Special Issue on “Machine-Learning-Assisted Intelligent Processing and Optimization of Complex Systems”. Processes 2023, 11, 2595. [Google Scholar] [CrossRef]
  11. Palmer, S.E.; Schloss, K.B. An ecological valence theory of human color preference. Proc. Natl. Acad. Sci. USA 2010, 107, 8877–8882. [Google Scholar] [CrossRef] [PubMed]
  12. Nichols, W.J. Blue Mind: The Surprising Science that Shows How Being Near, in, on, or Under Water Can Make You Happier, Healthier, More Connected, and Better at What You Do; Back Bay Books: New York, NY, USA, 2015. [Google Scholar]
  13. Hurlbert, A.C.; Ling, Y. Biological components of sex differences in color preference. Curr. Biol. 2007, 17, R623–R625.
  14. Madden, T.J.; Hewett, K.; Roth, M.S. Managing images in different cultures: A cross-national study of colour meanings and preferences. J. Int. Mark. 2000, 8, 90–107.
  15. Saito, M. A comparative study of color preferences in Japan, China and Indonesia, with emphasis on the preference for white. Percept. Mot. Skills 1996, 83, 115–128.
  16. Aslam, M.M. Are you selling the right colour? A cross-cultural review of colour as a marketing cue. J. Mark. Commun. 2006, 12, 15–30.
  17. Taylor, C.; Clifford, A.; Franklin, A. Color preferences are not universal. J. Exp. Psychol. Gen. 2013, 142, 1015–1027.
  18. Ou, L.-C.; Luo, M.R.; Sun, P.-L.; Hu, N.-C.; Chen, H.-S.; Guan, S.-S.; Woodcock, A.; Caivano, J.L.; Huertas, R.; Trémeau, A.; et al. A cross-cultural comparison of colour emotion for two-colour combinations. Color Res. Appl. 2012, 37, 23–43.
  19. Itten, J. The Art of Color: The Subjective Experience and Objective Rationale of Color; Van Nostrand Reinhold: New York, NY, USA, 1973.
  20. Tokumaru, M.; Muranaka, N.; Imanishi, S. Color design support system considering color harmony. In Proceedings of the IEEE International Conference on Fuzzy Systems, Honolulu, HI, USA, 12–17 May 2002; Volume 1, pp. 378–383.
  21. O’Donovan, P.; Agarwala, A.; Hertzmann, A. Color compatibility from large image datasets. ACM Trans. Graph. 2011, 30, 1–12.
  22. Valdez, P.; Mehrabian, A. Effects of color on emotions. J. Exp. Psychol. Gen. 1994, 123, 394–409.
  23. Bartram, L.; Patra, A.; Stone, M. Affective color palettes in visualization. IEEE Trans. Vis. Comput. Graph. 2017, 24, 478–487.
  24. Nagamachi, M. Kansei/Affective Engineering; CRC Press: Boca Raton, FL, USA, 2010.
  25. Li, S. DiffStyler: Diffusion-based Localized Image Style Transfer. arXiv 2024, arXiv:2403.18461.
  26. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. IEEE Comput. 2009, 42, 30–37.
  27. Wang, T.; Brovman, Y.M.; Madhvanath, S. Personalized Embedding-based e-Commerce Recommendations at eBay. arXiv 2021, arXiv:2102.06156.
  28. Friedman, J.H. Greedy Function Approximation: A Gradient Boosting Machine. Ann. Stat. 2001, 29, 1189–1232.
Figure 1. The three-phase research framework. The methodology is designed to first decouple user preference into abstract (Level 1) and contextual (Level 2) data, then build a dual-track predictive model, and finally evaluate the model’s utility in an exploratory user study.
Figure 3. Parameter range plot for the HSB values of “Bright” and “Gentle” styles. The visualization highlights the distinct and systematic colorimetric signatures corresponding to these two popular semantic concepts.
Figure 4. Visualization of grouped permutation importance.
Figure 5. The baseline recommendation for the Car scenario, corresponding to the “Bright” semantic style.
Figure 6. Examples of personalized recommendations for the Car scenario, generated for six different user profiles.
Figure 7. The “AI Color Workshop” proof-of-concept system. The central “AI Smart Recommender” is the new personalization layer developed in this study, which is integrated with the “AI Color Projector” (semantic mapping) and “AI Image Generator” (rendering) from our prior work.
Table 1. A comparative analysis of personalization frameworks.

| Feature Dimension | Data-Driven (Popularity-Based) | Semantic Mapping Models | AIGC Fine-Tuning (e.g., LoRA) | Our Dual-Track Model (Proposed) |
| Modeling Granularity | Monolithic (treats preference as a single signal) | Monolithic (universal semantic mapping) | Style-level (captures a holistic style) | Dual (decouples inherent traits from contextual choices) |
| Handles ‘Inherent Aesthetics’ | No (conflated with context) | No (assumes universal aesthetics) | Yes (excellent at capturing a specific user style) | Yes (explicitly modeled via Aesthetic Vector Library) |
| Handles ‘Contextual Decisions’ | No (recommends popular items regardless of context) | Partial (maps to context, but universally) | No (style is static, not context-adaptive) | Yes (core function of the GBDT prediction model) |
| Personalization Approach | Group-based (average user) | Non-personal (universal rules) | User-specific, but static | Dynamic and user-specific (adapts to user and context) |
| Cold-Start Capability | Poor for niche products | Good for new contexts (if in lexicon) | Poor for new contexts (requires re-training/prompting) | Good (semantic generalization for new products and similarity for new users) |
| Interpretability | Low (“It’s popular”) | High (“It means ‘calm’”) | Low (black box) | High (“Because of product category and your profile”) |
| User Overhead | Low (passive data collection) | Low (no user data needed) | High (requires data curation and technical skill) | Moderate (initial profiling survey) |
Table 2. Descriptive statistics of participant characteristics (N = 111).

| Characteristic | Category | Frequency (n, %) |
| Age Group (years) | 18–25 | 37 (33.3%) |
|  | 26–35 | 25 (22.5%) |
|  | 36–45 | 34 (30.6%) |
|  | 46–55 | 9 (8.1%) |
|  | >55 | 6 (5.4%) |
| Gender | Male | 53 (47.7%) |
|  | Female | 58 (52.3%) |
| Design Experience | None | 68 (61.3%) |
|  | Personal Interest | 10 (9.0%) |
|  | Professional or Academic | 33 (29.7%) |
| Years of Design Experience (for n = 43) | <1 year | 4 (9.3%) |
|  | 1–3 years | 8 (18.6%) |
|  | 3–5 years | 13 (30.2%) |
|  | >5 years | 18 (41.9%) |
| Highest Education Level | Junior high school or below | 3 (2.7%) |
|  | High school/Vocational | 13 (11.7%) |
|  | Associate degree | 10 (9.0%) |
|  | Bachelor’s degree | 41 (36.9%) |
|  | Master’s degree | 35 (31.5%) |
|  | Doctoral degree | 9 (8.1%) |
| Familiarity with Product Types | Consumer Electronics | 96 (86.5%) |
|  | Home Appliances | 74 (66.7%) |
|  | Transportation | 49 (44.1%) |
|  | Cultural and Creative Products | 47 (42.3%) |
|  | Industrial Equipment | 12 (10.8%) |
|  | Medical Devices | 12 (10.8%) |
|  | Other | 3 (2.7%) |
Table 3. Examples of highest and lowest cosine similarity scores between user aesthetic vectors.

| Category | User Pair | Cosine Similarity |
| Highest Similarity | User 4 & User 18 | 0.9740 |
|  | User 89 & User 105 | 0.9672 |
|  | User 27 & User 85 | 0.9600 |
| Lowest Similarity | User 22 & User 96 | 0.3036 |
|  | User 14 & User 54 | 0.3292 |
|  | User 22 & User 54 | 0.3416 |
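The scores in Table 3 are plain cosine similarities between users’ aesthetic preference vectors. A minimal sketch of the computation (the example vectors and their dimensionality are hypothetical, not taken from the study data):

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two aesthetic preference vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical low-dimensional aesthetic vectors for two users.
user_a = [0.8, 0.1, 0.6, 0.3]
user_b = [0.7, 0.2, 0.5, 0.4]
print(round(cosine_similarity(user_a, user_b), 4))
```

Because the measure is scale-invariant, it compares the shape of two users’ preference profiles rather than the absolute strength of their ratings.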
Table 4. Performance comparison of predictive models, stratified by task type (5-fold cross-validation).

| Model | Task Type | Chance Level | Mean Accuracy (±Std Dev) | Mean Macro F1-Score (±Std Dev) | Mean Cohen’s Kappa (±Std Dev) |
| Random Forest | Overall (Mixed) | ~29.6% | 0.4153 (±0.0340) | 0.2777 (±0.0170) | 0.3471 (±0.0372) |
|  | 3-AFC Tasks | 33.3% | 0.4350 (±0.0350) | 0.2910 (±0.0180) | 0.1530 (±0.0525) |
|  | 5-AFC Tasks | 20.0% | 0.3468 (±0.0310) | 0.2355 (±0.0150) | 0.1835 (±0.0388) |
| GBDT (Full Model) | Overall (Mixed) | ~29.6% | 0.4080 (±0.0250) | 0.2905 (±0.0380) | 0.3418 (±0.0271) |
|  | 3-AFC Tasks | 33.3% | 0.4250 (±0.0260) | 0.3080 (±0.0400) | 0.1380 (±0.0390) |
|  | 5-AFC Tasks | 20.0% | 0.3480 (±0.0220) | 0.2310 (±0.0320) | 0.1850 (±0.0275) |
| Logistic Regression | Overall (Mixed) | ~29.6% | 0.3652 (±0.0300) | 0.2709 (±0.0302) | 0.2881 (±0.0332) |
|  | 3-AFC Tasks | 33.3% | 0.3780 (±0.0310) | 0.2830 (±0.0315) | 0.0675 (±0.0465) |
|  | 5-AFC Tasks | 20.0% | 0.3200 (±0.0270) | 0.2300 (±0.0275) | 0.1500 (±0.0338) |
| GBDT (Context-only) | Overall (Mixed) | ~29.6% | 0.3075 (±0.0169) | 0.0996 (±0.0045) | 0.2325 (±0.0191) |
|  | 3-AFC Tasks | 33.3% | 0.3240 (±0.0175) | 0.1080 (±0.0050) | −0.0135 (±0.0263) |
|  | 5-AFC Tasks | 20.0% | 0.2500 (±0.0150) | 0.0700 (±0.0035) | 0.0625 (±0.0188) |
| GBDT (User-only) | Overall (Mixed) | ~29.6% | 0.1813 (±0.0226) | 0.1542 (±0.0366) | 0.0898 (±0.0245) |
|  | 3-AFC Tasks | 33.3% | 0.1880 (±0.0230) | 0.1600 (±0.0380) | −0.2180 (±0.0345) |
|  | 5-AFC Tasks | 20.0% | 0.1586 (±0.0210) | 0.1320 (±0.0320) | −0.0517 (±0.0263) |
| Majority Class | Overall (Mixed) | ~29.6% | 0.0618 (±0.0087) | 0.0047 (±0.0006) | 0.0000 (±0.0000) |
|  | 3-AFC Tasks | 33.3% | 0.0790 (±0.0111) | 0.0060 (±0.0008) | −0.3813 (±0.0167) |
|  | 5-AFC Tasks | 20.0% | 0.0000 (±0.0000) | 0.0000 (±0.0000) | −0.2500 (±0.0000) |
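The ~29.6% mixed chance level in Table 4 is simply the trial-weighted average of the per-task guessing rates (1/3 for 3-AFC, 1/5 for 5-AFC). The table does not state the exact trial split, but a roughly 72/28 mix of 3-AFC to 5-AFC trials (our back-calculation, an assumption) reproduces the reported value:

```python
# Chance level for a mix of forced-choice tasks is the trial-weighted
# average of each task type's per-trial guessing rate.
def mixed_chance_level(p_3afc: float) -> float:
    """Expected accuracy of random guessing when a fraction p_3afc of
    trials are 3-alternative and the remainder are 5-alternative."""
    return p_3afc * (1 / 3) + (1 - p_3afc) * (1 / 5)

# A hypothetical 72/28 split reproduces the reported ~29.6% chance level.
print(round(mixed_chance_level(0.72), 3))  # 0.296
```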
Table 5. Grouped permutation importance of feature blocks.

| Feature Group | Importance Mean (Drop in Accuracy) | Importance Std |
| Product Category | 0.4160 | 0.0204 |
| Other Context | 0.2887 | 0.0244 |
| Target Emotion/Color | 0.2454 | 0.0190 |
| Dimension Context | 0.1021 | 0.0202 |
| User Profile (Level 1) | 0.0876 | 0.0269 |
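Grouped permutation importance shuffles all columns of one feature block together, using a single shared permutation so within-group correlations are preserved, and records the resulting drop in accuracy. A minimal model-agnostic sketch of the procedure (a simplified stand-in for the analysis behind Table 5; the `predict` callable, row-major feature lists, and group names are all hypothetical):

```python
import random

def grouped_permutation_importance(predict, X, y, groups, n_repeats=10, seed=0):
    """Mean accuracy drop when each named group of columns is shuffled jointly.

    predict: callable mapping a list of feature rows to a list of labels.
    groups:  dict mapping a group name to the column indices it contains.
    """
    rng = random.Random(seed)
    n = len(y)
    base = sum(p == t for p, t in zip(predict(X), y)) / n
    importances = {}
    for name, cols in groups.items():
        drops = []
        for _ in range(n_repeats):
            perm = list(range(n))
            rng.shuffle(perm)  # one permutation shared by the whole group
            Xp = [row[:] for row in X]
            for col in cols:
                for i, j in enumerate(perm):
                    Xp[i][col] = X[j][col]
            acc = sum(p == t for p, t in zip(predict(Xp), y)) / n
            drops.append(base - acc)
        importances[name] = sum(drops) / n_repeats
    return importances
```

A group whose shuffling barely hurts accuracy (a small drop) contributes little predictive signal; in Table 5, shuffling Product Category costs the model about 42 accuracy points, far more than any other block.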
Table 6. Counterbalanced design for the user validation study.

| Session | Product | Group A (N = 6) | Group B (N = 6) |
| 1 | Wireless Mouse | Personalized | Baseline |
| 2 | Electric Screwdriver | Baseline | Personalized |
| 3 | Car | Personalized | Baseline |
Table 7. Determination of baseline recommendations for validation scenarios.

| Product Scenario for Validation | Representative Product Category | Baseline Data Source | Most Popular (Modal) Semantic | Selection Count | Final Baseline Style for User |
| Wireless Mouse | Personal Consumer Electronics | Level 2 (Contextual) | Fresh | 63 | Fresh |
| Electric Screwdriver | Industrial Tools | Level 2 (Contextual) | Warning | 53 | Warning |
| Automobile | Transportation | Level 1 (Abstract) | Bright | 72 | Bright |
Table 8. Results of the linear mixed-effects model for satisfaction scores.

| Effect | Coefficient (β) | Std. Error | 95% Confidence Interval | p-Value |
| Condition (Personalized vs. Baseline) | 1.278 | 0.291 | [0.707, 1.848] | <0.001 |
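The estimate in Table 8 corresponds to a random-intercept specification with participants as the grouping factor; a plausible form (the exact random-effects structure is our reading of the repeated-measures design, not stated in the table) is:

```latex
% Satisfaction of participant j on trial i, with a random intercept u_j per participant
\mathrm{Satisfaction}_{ij} = \beta_0 + \beta_1 \, \mathrm{Condition}_{ij} + u_j + \varepsilon_{ij},
\qquad u_j \sim \mathcal{N}(0, \sigma_u^2), \quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here Condition is coded 0 for Baseline and 1 for Personalized, so the reported β = 1.278 is the average within-participant gain in satisfaction attributable to personalization.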
Table 9. Quantitative comparison of recommendation diversity (average pairwise ΔE2000).

| Product Category | Avg. ΔE2000 (Personalized Model) | Avg. ΔE2000 (Baseline Model) |
| Wireless Mouse | 48.17 | 0.00 |
| Electric Screwdriver | 53.49 | 0.00 |
| Automobile | 41.25 | 0.00 |
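The diversity metric in Table 9 averages the color difference over every unordered pair of recommended colors; a baseline that recommends the same color to everyone therefore scores exactly 0.00. A minimal sketch of the averaging step, using the simpler CIE76 Euclidean distance in CIELAB in place of the full CIEDE2000 formula used in the paper (function names and example values are ours):

```python
import itertools
import math

def delta_e_cie76(lab1, lab2):
    """Euclidean distance between two CIELAB colors (CIE76 formula)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(lab1, lab2)))

def avg_pairwise_delta_e(palette, dist=delta_e_cie76):
    """Average color difference over all unordered pairs in a palette."""
    pairs = list(itertools.combinations(palette, 2))
    if not pairs:  # a single (or empty) recommendation has zero diversity
        return 0.0
    return sum(dist(a, b) for a, b in pairs) / len(pairs)

# An identical-palette baseline yields 0.0, matching Table 9's baseline column.
print(avg_pairwise_delta_e([(50.0, 10.0, 10.0)] * 3))  # 0.0
```

Swapping `dist` for a proper CIEDE2000 implementation changes only the pairwise distance, not the averaging logic.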

Share and Cite

MDPI and ACS Style

Li, L.; Liu, X. Modeling Inherent Aesthetics and Contextual Decisions for Personalized Color Recommendation in AIGC. Appl. Sci. 2026, 16, 1543. https://doi.org/10.3390/app16031543
