Our research was motivated by the potential “personalization gap” in modern creative tools, a challenge magnified in the era of AIGC, where generated content is abundant but often generic. By developing and evaluating a machine learning framework for personalized color recommendation, our study offers novel insights into the computational modeling of aesthetic preference within a constrained design task. In this section, we discuss the contributions and theoretical implications of our findings, their potential practical applications, and avenues for future work.
5.1. Core Innovations and Theoretical Implications
While color preference has been a long-standing topic of inquiry, our research introduces several key contributions that distinguish it from prior work, particularly in its potential application to creative AI systems.
First and foremost, we proposed and evaluated a dual-track modeling framework that attempts to decouple “inherent aesthetic preference” from “contextual design decisions.” Traditional studies often stop at identifying population-level trends (e.g., “blue is the most liked color”). Our work contributes a functional, predictive system that explicitly distinguishes between two types of preference data. On one track, through aesthetic vectorization, we quantify a user’s stable, abstract taste (“who they are”) into a computable “inherent aesthetic profile.” On the other, through a supervised learning model, we capture how this profile, when combined with a specific design context (“what they are doing”), translates into a concrete, dynamic design decision. The results from our exploratory user validation study, analyzed with a rigorous linear mixed-effects model, offer strong support for this approach. The analysis showed that our personalized recommendations were perceived as significantly more satisfying than a data-driven baseline (β = 1.278, p < 0.001). This suggests that subjective aesthetic preference, while complex, is not random but contains a learnable, systematic signal within our specific experimental paradigm.
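To make the statistical approach concrete, a mixed-effects comparison of this kind can be sketched as follows. The data frame, column names, and ratings below are illustrative stand-ins, not our study data; the structure (a fixed effect of condition plus a random intercept per participant) mirrors the analysis described above.

```python
# Sketch: linear mixed-effects model comparing satisfaction ratings for
# personalized vs. baseline recommendations, with a random intercept per
# participant. All data below is hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

ratings = pd.DataFrame({
    "participant": ["p01", "p01", "p02", "p02", "p03", "p03"] * 4,
    "condition": ["personalized", "baseline"] * 12,
    "satisfaction": [6, 4, 5, 4, 7, 5, 6, 5, 5, 3, 6, 4,
                     7, 5, 6, 4, 5, 4, 6, 5, 7, 6, 5, 3],
})

# Fixed effect of condition; random intercept grouped by participant.
model = smf.mixedlm("satisfaction ~ condition",
                    data=ratings, groups=ratings["participant"])
result = model.fit()
print(result.summary())  # the condition coefficient plays the role of beta
```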
Second, we proposed a “semantic generalization” approach for feature engineering that may help address the cold-start problem in recommendation systems. Instead of treating each product as a unique instance, which limits scalability, we mapped specific products to generalized categories (e.g., “Personal Consumer Electronics”). Our grouped permutation importance analysis revealed that product_category was the single most dominant predictor (Importance = 0.416). This is a crucial finding: it provides evidence that context is paramount and offers a scalable method for applying the model to new, unseen products, suggesting a potential cold-start solution for recommendation systems in the design domain.
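The grouped permutation procedure behind this finding can be sketched as follows: an entire block of related features is shuffled jointly, and the resulting drop in held-out accuracy is taken as the block's importance. The synthetic features and the GBDT-style classifier below are illustrative stand-ins for our actual feature blocks and model.

```python
# Sketch: grouped permutation importance -- permute a whole block of
# features together and measure the drop in test accuracy.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 5))
# Make the "context" block (columns 0-1) genuinely predictive.
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = GradientBoostingClassifier(random_state=0).fit(X_tr, y_tr)
base_acc = model.score(X_te, y_te)

def grouped_importance(cols, n_repeats=20):
    """Mean accuracy drop when the given columns are permuted jointly."""
    drops = []
    for _ in range(n_repeats):
        Xp = X_te.copy()
        perm = rng.permutation(len(Xp))
        Xp[:, cols] = Xp[perm][:, cols]  # same row permutation for the block
        drops.append(base_acc - model.score(Xp, y_te))
    return float(np.mean(drops))

print("context block:", grouped_importance([0, 1]))
print("noise block:  ", grouped_importance([2, 3, 4]))
```

Permuting the block as a unit (rather than column by column) preserves within-block correlations, which is what makes the comparison between blocks such as product context and user profile meaningful.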
Third, our model provides quantitative observations that align with long-standing sociological theories in design. The grouped permutation importance analysis showed that a user’s profile—shaped by their life experience and training—is the second most influential block of predictors after contextual factors (User Profile Importance = 0.088). While our previous analysis based on Gini Importance hinted at the role of specific demographics, the more robust grouped permutation method confirms the collective, non-trivial predictive value of the user profile as a whole. This lends empirical support to concepts like cultural capital and generational aesthetics, suggesting that life stage and background may shape an individual’s visual vocabulary. It also provides a basis for quantifying the notion of “trained taste” by demonstrating a learnable signal in the preferences of design professionals and novices. By successfully modeling these factors, our framework serves not only as a proof-of-concept tool but also as a computational model for exploring the layered structure of human aesthetic judgment.
5.2. Practical Implications and a Conceptual AIGC Workflow
The potential practical value of our framework is best illustrated by conceptualizing its integration as an intelligent “personalization layer” into a generative system. To demonstrate this potential, we have instantiated this concept in a proof-of-concept desktop application, the “AI Color Workshop,” which builds directly upon our prior work [9]. It is important to note that this application serves as a conceptual demonstration of a potential end-to-end workflow, rather than a fully evaluated design tool.
As shown in Figure 7, the application integrates three modules. The “AI Color Projector” (left) for semantic-to-color mapping and the “AI Image Generator” (right) for final rendering were developed in our foundational study. They provide the core engine for translating any given semantic into a precise color and applying it to a generated image. The novel contribution of the current research is the central “AI Smart Recommender” module. This new component embeds our trained preference model, acting as an intelligent bridge that personalizes the entire workflow.
The integrated end-to-end workflow operates as follows:
1. A user interacts with the central “AI Smart Recommender” module. Here, they input the target audience’s profile (e.g., age, design experience), the design context (e.g., “Home Appliances”), and select their preferred abstract design styles (e.g., “Bright”, “Tech”).
2. Personalized Semantic Recommendation: Our hybrid recommendation model (a synergy of the GBDT model and a similarity algorithm leveraging the User Aesthetic Vector Library), embedded within this module, instantly processes the input and generates a ranked list of preferred semantic styles with corresponding confidence scores (e.g., Top 1: Bright, 98.0%). This step replaces manual guesswork with a data-driven prediction of the target user’s taste. The similarity-based component is particularly crucial for addressing the cold-start problem for new users. As validated by our cosine similarity analysis (Section 4.1.2), which quantitatively confirmed that vector similarity correlates with taste alignment, we can instantly provide reasonable recommendations for a new user by identifying existing users with a similar ‘inherent aesthetic profile,’ making the system immediately useful.
3. Controllable AIGC Generation: The user can then select a high-ranking semantic. This action can either trigger the “AI Color Projector” module to explore specific color parameters or directly command the “AI Image Generator”. In the latter case, the chosen semantic is translated into a precise color and embedded into a prompt, instructing the AIGC model to render the final product—such as the refrigerator shown—in a color that is deeply aligned with the target user’s predicted preferences from the very first step.
This closed-loop process—from user profile to personalized semantic to controllable generation—illustrates a potential pathway toward “going beyond the average aesthetic.” It suggests a conceptual model for a more human-centered generative system that is aware of who it is designing for, providing one possible direction for the development of intelligent, context-aware, and personalized creative tools.
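The similarity-based cold-start step in this workflow can be sketched as follows. The vector library, its dimensionality, the user IDs, and the liked-style mapping below are illustrative assumptions, not our actual User Aesthetic Vector Library.

```python
# Sketch: cold-start recommendation by ranking existing users' aesthetic
# profile vectors by cosine similarity to a new user's vector, then
# borrowing the nearest neighbors' preferred semantic styles.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical library of existing users' inherent aesthetic vectors.
library = {
    "u01": np.array([0.9, 0.1, 0.4]),
    "u02": np.array([0.2, 0.8, 0.5]),
    "u03": np.array([0.3, 0.3, 0.9]),
}
liked_semantics = {"u01": ["Bright"], "u02": ["Gentle"], "u03": ["Tech"]}

def recommend_for_new_user(new_vec, k=1):
    """Rank library users by profile similarity; borrow their top styles."""
    ranked = sorted(library, key=lambda u: cosine(new_vec, library[u]),
                    reverse=True)
    recs = []
    for u in ranked[:k]:
        recs.extend(liked_semantics[u])
    return recs

print(recommend_for_new_user(np.array([0.8, 0.2, 0.5])))
```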
5.3. Limitations and Future Work
Despite the promising results, this study is subject to several important limitations that define the scope of our claims and suggest avenues for future research.
First, the measurement framework itself has inherent constraints. Our two-level profiling methodology is dependent on the specific stimuli used, and the Level 1 ‘aesthetic profile’ captures preferences relative to our author-curated image triplets. Furthermore, the Level 2 data was collected via a three-alternative forced-choice (3-AFC) task, which may not fully replicate the complexity of unconstrained, real-world design behavior. While our reliability analyses support the internal consistency of these measures, their validity as indicators of a general, context-independent aesthetic trait requires further psychometric investigation.
Second, the strong contextual effects identified in our Level 2 model may be partially confounded with our stimulus design. The color triplets were curated to represent specific semantic concepts. This means the model may have learned to recognize the authors’ stimulus-construction heuristics rather than a genuine, underlying user preference structure. While our grouped permutation importance analysis provides robust evidence for the dominance of context within our measurement paradigm, future work is essential to disentangle the effects of our constructed task from a more general principle of contextual preference using more diverse or user-generated stimuli.
Third, the user validation study does not test real-world design workflows or AIGC usage. The study evaluated satisfaction with pre-selected color recommendations, which is a step removed from the complexities of tasks like prompt engineering for AIGC systems or iterative design refinement. The findings demonstrate the model’s ability to predict relative preference among curated choices, but future research is essential to integrate and evaluate this capability within more authentic, designer-centered creative tasks.
Fourth, the findings related to the ‘Culture’ dimension are specific to the cultural and linguistic context of our participant sample. As we clarified in our methodology, the cultural semantics used in this study were intentionally grounded in Chinese color theory to serve as a proof-of-concept for our framework’s ability to model culturally specific knowledge. Consequently, the model’s predictions related to this dimension are not expected to generalize directly to other cultures. Validating the cross-cultural transferability of our entire framework is a critical next step. Future research should apply our data collection and modeling methodology to diverse cultural regions to build and compare distinct, culturally specific preference models. This would not only test the generalizability of our framework but could also lead to valuable insights in the field of comparative color semantics.
Fifth, our online data collection method introduces the potential for sampling bias. Participants were recruited through online platforms and may have a higher-than-average interest in design and aesthetics, leading to a self-selection bias. This could potentially influence the generalizability of the systematic preference patterns identified in response to RQ1. For example, the strong population-level preference for styles like “Bright” and “Gentle” might be more pronounced in this demographic than in the general population. However, it is important to note that the primary goal of this study was not to establish universal aesthetic laws, but rather to develop and validate a computational framework for capturing, quantifying, and modeling individual aesthetic preferences. The core findings—such as the dominance of context over demographics (RQ3) and the model’s ability to generate diverse, satisfying recommendations (RQ2)—are predicated on the internal consistency and structure of the collected data, rather than its absolute population-level representativeness. Future work could address this limitation by recruiting a more stratified sample from diverse demographic and psychographic backgrounds to further test the cross-population validity of the observed preference patterns.
Sixth, the current model predicts preferences for single colors or simple color schemes within a fixed semantic space. Future work should expand the framework to address the complexity of multi-color palettes and color harmony. This could involve training models to predict the suitability of color combinations or integrating our preference model with established computational harmony rules.
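As one possible direction for this extension, a learned single-color preference score could be blended with a rule-based harmony term. The complementary-hue heuristic and the weighting below are illustrative assumptions for such a blend, not part of our current model.

```python
# Sketch: scoring a two-color palette by combining a model's preference
# score for the base color with a simple rule-based harmony term.
# The complementary-hue rule and the weight w are illustrative choices.
import colorsys

def hue_harmony(rgb1, rgb2):
    """Score in [0, 1]: peaks when hues are ~180 degrees apart."""
    h1 = colorsys.rgb_to_hsv(*rgb1)[0]
    h2 = colorsys.rgb_to_hsv(*rgb2)[0]
    d = abs(h1 - h2)
    d = min(d, 1.0 - d)                # circular hue distance in [0, 0.5]
    return 1.0 - abs(d - 0.5) / 0.5   # 1.0 at complementary, 0.0 at identical

def palette_score(pref_score, rgb1, rgb2, w=0.3):
    """Blend a single-color preference score with the harmony term."""
    return (1 - w) * pref_score + w * hue_harmony(rgb1, rgb2)

red, cyan, orange = (1, 0, 0), (0, 1, 1), (1, 0.5, 0)
print(palette_score(0.8, red, cyan))    # complementary pair scores higher
print(palette_score(0.8, red, orange))
```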
Seventh, the construct validity of our ‘quantified aesthetic profile’ warrants careful consideration. Its Z-score component, used for the Function, Emotion, and Culture dimensions, relies on a standard deviation estimated from only three author-curated samples (n = 3). This makes the measurement sensitive to the specific composition of these triplets. To address this, we took several steps: we aimed to increase transparency, as detailed in our methodology, by analyzing and reporting the variability within the stimulus triplets; we demonstrated the vector’s internal consistency via split-half reliability (r = 0.709, Section 4.1.4); and we confirmed its robustness against choice variability through a bootstrapping analysis, which showed a high average similarity of 0.81 between original and resampled vectors (Section 4.1.5).
However, we concur with the critique that these analyses primarily support the measurement’s stability and consistency—that is, it captures a stable response pattern to our specific stimulus design—rather than definitively establishing it as a direct measure of a context-independent, inherent trait. Future work must employ more robust psychometric approaches to model these choices, such as ordinal regression or paired-comparison frameworks, which can derive preference scores with stronger theoretical underpinnings. This would be a critical next step to validate whether the stable structure we identified is an artifact of our measurement or a reflection of a deeper, inherent preference.
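To illustrate the paired-comparison direction mentioned above, a Bradley-Terry model can derive per-item preference strengths from choice counts. The items, win counts, and simple gradient-ascent fitting routine below are a minimal sketch under assumed data, not our implementation or our study's choices.

```python
# Sketch: Bradley-Terry paired-comparison model fitted by gradient
# ascent on the log-likelihood. wins[i, j] = how often item i was
# chosen over item j (hypothetical counts).
import numpy as np

items = ["Bright", "Gentle", "Tech"]
wins = np.array([[0, 8, 6],
                 [2, 0, 5],
                 [4, 5, 0]], dtype=float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

theta = np.zeros(len(items))  # log-strength parameter per item
for _ in range(2000):
    P = sigmoid(theta[:, None] - theta[None, :])   # P[i, j] = P(i beats j)
    # Gradient of the Bradley-Terry log-likelihood w.r.t. theta.
    grad = (wins * (1 - P)).sum(axis=1) - (wins.T * P).sum(axis=1)
    theta += 0.01 * grad
    theta -= theta.mean()          # center to fix the identifiability shift

strengths = dict(zip(items, theta))
print(strengths)  # items with more pairwise wins get higher strengths
```

Unlike the raw Z-score construction, the fitted strengths come with an explicit probabilistic model of each choice, which is what gives this approach its stronger theoretical underpinnings.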
Eighth, the user validation study, while employing a rigorous counterbalanced within-subjects design and an appropriate statistical model (LMM), was conducted with a small sample of 12 participants. While this design allowed us to detect a statistically significant effect, the small sample size limits the broader generalizability of our findings. The conclusions drawn from this study should therefore be interpreted as exploratory and providing initial, promising evidence, pending validation from future, larger-scale user studies.
Finally, while our model predicts preference based on static user profiles, individual tastes can evolve. A longitudinal study tracking users’ preferences over time could enable the development of dynamic models that adapt to a user’s changing aesthetic sensibilities, leading to an even more sophisticated level of long-term personalization.