Human vs. LLM Creativity: A Comparative Analysis of Task-Dependent Asymmetry and Linguistic Mechanisms
Abstract
1. Introduction
2. Literature Review
2.1. Creativity Performance of LLM Text Generation
2.1.1. The Outbreak of Text Generation and Creativity
2.1.2. Hidden Constraints and Latent Limitations
2.2. Which Is Superior in Writing Creativity: Human or LLM?
2.2.1. The LLM Advantage in Formal Effectiveness
2.2.2. Human Uniqueness: Experience and Affect
2.2.3. Contrasting Underlying Mechanisms of Creativity
2.2.4. Conflicts in Comparative Creative Performance
1. The LLM Advantage in Divergent Tasks
2. The Human Advantage in Complex Creative Tasks
2.3. Writing Assisted by LLMs: Augmentation or Diminishment
2.3.1. Augmentation: Raising the Floor
2.3.2. Diminishment: Homogenization and Cognitive Constraints
1. The Ceiling-Lowering Effect and Homogenization
2. Loss of Agency
2.4. Gaps in Prior Research and Objectives of the Present Study
2.4.1. Gaps in Prior Research
1. Missing Task Typology Control
2. Monolithic Evaluation Metrics
3. Black Box of Linguistic Mechanisms
2.4.2. Research Objectives and Investigative Strategies
- Objective 1: Multi-dimensional cross-task validation
- Objective 2: Elucidating divergent linguistic mechanisms
- Objective 3: Deconstructing the “collaboration trap” in human–AI interaction
3. Materials and Methods
3.1. Participants and Materials
- Human-Only group (H): Texts composed by human participants without any LLM assistance.
- LLM-Only group (L): Texts generated entirely by the designated LLM via API.
- LLM-Assisted group (A): Texts composed by human participants who utilized an LLM as an integrated tool during the writing process.
3.1.1. Human Participants
3.1.2. LLM Generation
3.1.3. Writing Tasks and Corpus
Phase 1: Propositional Writing Tasks
Phase 2: Creative Writing Task
3.2. Experimental Procedure
- Step 1: Task assignment. Participants and LLMs were assigned to propositional tasks (Phase 1) or a creative task (Phase 2).
- Step 2: Text generation. Texts were produced by three author types: Human-Only, LLM-Only (GPT-5), and LLM-Assisted (using GPT-2 as a suboptimal tool; Phase 2 only).
- Step 3: Data preprocessing. A rigorous cleaning protocol excluded blank, off-topic, and non-Chinese responses, reducing the initial 1329 texts to a final corpus of 1296.
- Step 4: Expert rating. Five trained raters independently evaluated each text based on the dual-dimensional rubric of originality and effectiveness.
3.3. Creativity Assessment and Rating Protocol
3.3.1. Scoring Rubric
- Originality (O): This measured the degree of novelty, uniqueness, and imaginative quality of the text. It was calculated as the aggregate mean of sub-scores related to Inspiration.
- Effectiveness (E): This measured the degree of functional appropriateness, execution quality, and successful communication of the intended message.
3.3.2. Rater Training and Scoring
- Initial Training: Raters were thoroughly instructed on the theoretical definitions of originality and effectiveness and the operational descriptions for each level of the 5-point scale.
- Calibration Session: Prior to formal scoring, 15% of the total essay sample was independently evaluated by all five raters.
- Consensus Meeting: Notable discrepancies in the trial scores were identified and discussed in detail to refine the raters’ shared understanding and application of the rubric descriptors until consensus on the scoring standards was reached.
3.3.3. Reliability Assessment
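The reliability statistic itself is not given in this section. Because Shrout and Fleiss (1979) appears in the reference list, an intraclass correlation is a plausible choice; the sketch below assumes the two-way random-effects forms ICC(2,1) (single rater) and ICC(2,k) (mean of k raters), computed on a complete texts × raters matrix with no missing ratings. The function name `icc2` is hypothetical and is not the authors' code.

```python
import numpy as np

def icc2(scores) -> tuple[float, float]:
    """Return (ICC(2,1), ICC(2,k)) for an n_texts x k_raters score matrix.

    Two-way random-effects model in the Shrout & Fleiss (1979) notation;
    assumes every rater scored every text (no missing cells).
    """
    x = np.asarray(scores, dtype=float)
    n, k = x.shape
    grand = x.mean()
    row_means = x.mean(axis=1)  # one mean per text
    col_means = x.mean(axis=0)  # one mean per rater
    ss_rows = k * np.sum((row_means - grand) ** 2)
    ss_cols = n * np.sum((col_means - grand) ** 2)
    ss_total = np.sum((x - grand) ** 2)
    ss_err = ss_total - ss_rows - ss_cols
    msr = ss_rows / (n - 1)              # between-texts mean square
    msc = ss_cols / (k - 1)              # between-raters mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square
    icc_single = (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)
    icc_average = (msr - mse) / (msr + (msc - mse) / n)
    return float(icc_single), float(icc_average)
```

For this design, each call would take one dimension's ratings (originality or effectiveness) as an n × 5 array. If the five raters were treated as fixed rather than sampled, the ICC(3,·) forms would be the appropriate alternative.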
3.4. Computational Linguistic Feature Extraction
3.4.1. Feature Selection and Extraction
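The MTLD definition in the feature table (words needed before the running type–token ratio falls to 0.72) can be sketched as below. This assumes the text has already been segmented into word tokens (Chinese word segmentation is outside the sketch) and follows the commonly used formulation that averages a forward and a backward pass and adds a partial factor for the leftover segment; the function names are hypothetical, and this is not necessarily the extraction tool used in the study.

```python
from typing import Sequence

def _mtld_pass(tokens: Sequence[str], threshold: float = 0.72) -> float:
    """One directional pass: text length divided by the number of TTR 'factors'."""
    factors = 0.0
    types: set = set()
    count = 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count <= threshold:
            factors += 1               # TTR fell to the threshold: one full factor
            types, count = set(), 0    # start a new segment
    if count > 0:                      # partial factor for the unfinished segment
        ttr = len(types) / count
        factors += (1 - ttr) / (1 - threshold)
    # Undefined (returned as infinity) when no factor accumulates, e.g. all tokens unique.
    return len(tokens) / factors if factors > 0 else float("inf")

def mtld(tokens: Sequence[str], threshold: float = 0.72) -> float:
    """Mean of the forward and backward passes."""
    forward = _mtld_pass(tokens, threshold)
    backward = _mtld_pass(list(reversed(tokens)), threshold)
    return (forward + backward) / 2
```

Higher values mean more words are needed before repetition pulls the TTR down, i.e., greater lexical diversity.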
3.4.2. Modeling and Rationale Analysis
- Creativity_Score is the mean rater score for either originality or effectiveness;
- AuthorType is a dummy variable (0 = Human-Only, 1 = LLM-Only);
- Feature_i represents one of the 15 standardized linguistic features;
- γ_i is the coefficient for the interaction term, the primary parameter of interest.
- Model 1 (Main Effects) included AuthorType and all 15 Feature_i main effects.
- Model 2 (Full Model) added the 15 interaction terms (AuthorType × Feature_i) to Model 1 to test for moderation.
3.5. Statistical Analysis Strategy
3.5.1. Constraint 1: Cross-Sectional Cohort Differences
3.5.2. Constraint 2: Inconsistent LLMs Across Groups
3.6. Data Analysis Strategy
3.6.1. Comparative Analysis
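The t-test table reports df = 102 for the creative task, which equals 25 + 79 − 2 and is therefore consistent with a pooled-variance (Student) independent-samples t-test. Assuming that variant (the text does not name it), the statistic and a pooled-SD Cohen's d can be sketched as follows; the function name is hypothetical.

```python
import math
from statistics import mean, variance

def student_t_and_d(x, y):
    """Pooled-variance independent-samples t, its df, and Cohen's d for x vs. y."""
    n1, n2 = len(x), len(y)
    diff = mean(x) - mean(y)
    df = n1 + n2 - 2
    # Pooled SD from the two sample variances (n - 1 denominators).
    pooled_sd = math.sqrt(((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df)
    t = diff / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    d = diff / pooled_sd
    return t, df, d
```

Positive t and d mean that x (e.g., Human-Only) scored higher than y. For p-values, `scipy.stats.ttest_ind(x, y)` runs the same pooled test, since its default is `equal_var=True`.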
3.6.2. Unsupervised Clustering Analysis
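The clustering algorithm is not named in this section. Because the four profiles reported in the results are described by mean originality and mean effectiveness, a plausible sketch, assuming k-means with k = 4 on each text's two mean scores, is plain Lloyd iteration in NumPy. The function names are hypothetical; in practice `sklearn.cluster.KMeans(n_clusters=4, n_init=10, random_state=0)` performs the same procedure with multiple restarts.

```python
import numpy as np

def _assign(x: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the nearest center (Euclidean distance) for every row of x."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)

def kmeans(points, k: int, n_iter: int = 100, seed: int = 0):
    """Lloyd's k-means from a single random start; returns (labels, centers)."""
    x = np.asarray(points, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]
    for _ in range(n_iter):
        labels = _assign(x, centers)
        # Move each center to the mean of its members; keep it if the cluster empties.
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _assign(x, centers), centers
```

Cluster indices are arbitrary, so labels such as “Ideal” or “Safe” have to be assigned afterwards by inspecting the centers. Whether O and E were standardized before clustering is not specified.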
3.6.3. Hierarchical Moderated Regression (HMR) for Mechanism Exploration
Model Specification
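Combining the variable definitions in Section 3.4.2, the full model (Model 2) can be written as below, and Model 1 is the same equation without the γ_i terms. Only Creativity_Score, AuthorType, Feature_i, and γ_i are named in the text; β_0, β_1, δ_i, and ε are illustrative labels assumed here.

```latex
\text{Creativity\_Score} = \beta_0 + \beta_1\,\text{AuthorType}
  + \sum_{i=1}^{15} \delta_i\,\text{Feature}_i
  + \sum_{i=1}^{15} \gamma_i\,(\text{AuthorType} \times \text{Feature}_i)
  + \varepsilon
```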
Procedural Steps
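The two-step comparison (Model 1 with main effects, then Model 2 adding the AuthorType × Feature_i terms) can be sketched with ordinary least squares and the standard F-change test for ΔR². This assumes OLS with an intercept, z-standardized features, and a block test of all added interactions at once; the function names are hypothetical. A per-feature variant would add a single interaction column at a time.

```python
import numpy as np

def _fit_r2(X: np.ndarray, y: np.ndarray) -> tuple[float, int]:
    """OLS with an intercept; returns (R^2, number of estimated coefficients)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = float(resid @ resid)
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - ss_res / ss_tot, X1.shape[1]

def hierarchical_step(author, features, y):
    """Model 1: AuthorType + Feature_i; Model 2: Model 1 + AuthorType x Feature_i.

    Returns (delta_R2, F_change, (df_num, df_den)).
    """
    a = np.asarray(author, dtype=float)
    f = np.asarray(features, dtype=float)
    y = np.asarray(y, dtype=float)
    z = (f - f.mean(axis=0)) / f.std(axis=0, ddof=1)    # standardized features
    model1 = np.column_stack([a, z])
    model2 = np.column_stack([model1, a[:, None] * z])  # add interaction terms
    r2_1, k1 = _fit_r2(model1, y)
    r2_2, k2 = _fit_r2(model2, y)
    df_num, df_den = k2 - k1, len(y) - k2
    f_change = ((r2_2 - r2_1) / df_num) / ((1.0 - r2_2) / df_den)
    return r2_2 - r2_1, f_change, (df_num, df_den)
```

A p-value for the change would follow from `scipy.stats.f.sf(f_change, df_num, df_den)`.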
3.6.4. Specific Analysis of Human–AI Collaboration
Comparative Assessment of Creativity Scores
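The three-group comparison (H, A, L) is reported as an F test followed by Tukey HSD. A minimal one-way ANOVA sketch is below; with the group sizes in the profile table (25, 25, 79) its degrees of freedom would be (2, 126). The function name is hypothetical, and the Tukey step is omitted here; `scipy.stats.f_oneway` and `scipy.stats.tukey_hsd` are existing implementations of both steps.

```python
from statistics import mean

def one_way_anova(*groups):
    """Return (F, df_between, df_within) for two or more independent groups."""
    values = [v for g in groups for v in g]
    grand = mean(values)
    means = [mean(g) for g in groups]
    ss_between = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    df_between = len(groups) - 1
    df_within = len(values) - len(groups)
    f = (ss_between / df_between) / (ss_within / df_within)
    return f, df_between, df_within
```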
Linguistic Mechanism Comparison
4. Results
4.1. Comparative Performance: Task-Dependent Asymmetry
4.1.1. Global 2 (Author Type) × 2 (Task Category) ANOVA Results
4.1.2. Follow-Up Task-Specific Comparisons (t-Tests)
4.2. Exploratory Creativity Profiles
4.2.1. Identified Creativity Profiles Through Cluster Analysis
- Cluster 0: High originality, high effectiveness (Ideal): This profile represents optimal creativity, characterized by high mean scores on both originality and effectiveness. This cluster constituted 22.61% of the total analyzed corpus.
- Cluster 1: Low originality, middle effectiveness (Plain): This profile was characterized by the lowest mean originality score and mid-level effectiveness. Texts here are generally routine and predictable, though they possess a moderate degree of competence.
- Cluster 2: Low originality, high effectiveness (Safe): This was the largest cluster (35.41% of the corpus), defined by low originality but relatively high effectiveness. Texts in this profile are well-executed and competent but lack innovative or unique elements.
- Cluster 3: Middle originality, low effectiveness (Moderate): This profile showed a mixed pattern: mid-level originality but the lowest effectiveness score among the four clusters. This suggests texts that attempted unique ideas but struggled significantly in execution and coherence.
4.2.2. Distribution of Creativity Profiles by Author Type
4.3. Mechanism Exploration: The Moderating Role of Authorship on Linguistic Predictors
4.3.1. Overview of Moderation Effects
4.3.2. Linguistic Mechanisms in Propositional Tasks
4.3.3. Linguistic Mechanisms in the Creative Task
4.4. Style-Specific Linguistic Mechanisms of Human–AI Collaboration
4.4.1. Comparative Assessment of Creativity Scores and Profile Distribution
4.4.2. The Linguistic Mechanism of the Human–AI Collaboration Trap
4.4.3. Author Type × Style Interaction on Linguistic Feature
5. Discussion
5.1. Task-Dependent Asymmetry of Originality: Embodied Cognition vs. Probabilistic Generation
5.2. The Universal Dominance of Effectiveness: The Perfection of "Safe" Writing
5.3. Antithetical Linguistic Pathways: Cognitive Investment vs. Structural Optimization
5.4. The "Collaboration Trap": Semantic Collapse and Anchoring Effects
5.5. Limitations and Future Directions
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Appendix A. Writing Task Prompts
- Task 1: Propositional Writing Tasks
- Topic 1: Companionship is the Best Gift;
- Topic 2: If I Could Do It Again;
- Topic 3 (Title Completion): I Forgot ________.
- Task 2: College Creative Writing (Design Fiction Workshop)
- A. Workshop Theme:
- B. Core Task: Design Fiction.
- Possible Forms: An allegorical story, a topic for discussion, or a question full of contradiction/doubt.
- Creation Basis: Based on speculation and imagination, construct a future scenario set in the year 2040.
- Two Proposed Contextual Assumptions for 2040 (Choose One):
- The Era of AI Involution (Hyper-competition): By 2040, AI and automation have replaced some human jobs, boosting productivity but intensifying “Involution” (hyper-competition). Competition extends between humans, between humans and AI, and between AIs (e.g., the “Involution King” competes with AI, and some automation is developed solely for the sake of hyper-competition).
- Data Workers: By 2040, AI and automation have replaced some human jobs, work types have diversified, and “living is working” has become the norm. Humans generate data through daily activities (including sleep) to earn income, with different behaviors priced differently.
- C. Four-Step Creation Process
- Choose one 2040 background assumption and generate 4–5 imaginative scenes, using a fixed sentence structure for each:
- “In 2040, a ___________ (what kind of) scenario occurred. I ___________ (what situation/difficulty/challenge/event did I encounter). I hope ___________ (what change I want to make/what goal I want to achieve).”
- Based on the “future snippet,” supplement the action: “…To achieve this goal, I then did ______________” (multiple imaginative and contextually relevant actions may be developed).
- Select the most preferred snippet and elaborate on possible results/impacts (can involve personal/societal levels, be positive/negative, be expected/unexpected, or be active/frustrating).
- Elaborate on the snippet to form a complete Design Fiction, which must include the following:
- The social context of 2040;
- Events that occurred in 2040 (characters, actions, new products/services/policies);
- Reflection on the critical issues (impact of technology on work, the meaning of work for humans, the boundaries of work, etc.; reflection focus: AI’s impact on humans, ideal AI form, AI’s impact on future work, essential qualities of meaningful work, or division of labor between AI and humans).
Appendix B. Creativity Rating Instrument and Guidelines
- 1. Rating Dimensions and Descriptors

| Dimension | Scale | 5 (Strongly Agree/Very High) Descriptors |
|---|---|---|
| Originality | 5–1 | |
| Effectiveness | 5–1 | |

- 2. Detailed Explanations and Guidelines
- A. Novelty Aspects (More Detailed Definitions)
- Unusual Character or Background: The story’s characters or setting are highly unusual (e.g., in the topic “Companionship is the Best Gift,” a “book” is a more novel subject than “parents”; animals or aliens are the main characters; the story is set on another planet). The author may also make an unusual or unexpected choice in interpreting the writing prompt (e.g., interpreting the concept of power literally, or from a drastically different angle than typical explanations).
- Unusual Plot or Development: An unusual or unexpected development of the story or plot (e.g., describing a “father who left home early and did not accompany the grandmother”). The author makes unusual or unexpected choices in the plot’s direction (e.g., the story begins like a corny teen romance but has a surprisingly dark ending; the story starts very realistically but ends unexpectedly mysteriously; the story begins very dark but unexpectedly becomes funny).
- Uncommon Thoughts or Emotions: The author expresses uncommon but appropriate thoughts, emotions, or judgments (e.g., a “left-behind” child’s longing for a complete family).
- Rare Language Style: The author chooses very rare words, phrases, language, or rhetorical styles (e.g., extensive parallelism, personification, or archaic Chinese language).
- B. Concreteness Aspects (Detailed Definitions)
- Abstract: Refers to concepts that cannot be experienced through the senses (e.g., love and despair).
- Generalization: Use of overly vague terms that evoke no concrete image (e.g., “everything” or “her life”) without further explanation.
- C. Exclusion Criteria
- Off-topic essays are marked “Off-topic,” and blank papers are marked “Blank.” Neither receives a score.
- No ideological or moral judgments are to be made.
References
- Abraham, A. (2025). Why the standard definition of creativity fails to capture the creative act. Theory & Psychology, 35(1), 40–60.
- Al Hosni, J. (2025). Preserving authorial voice in academic texts in the age of generative AI: A thematic literature review. Arab World English Journal, 16(3), 244–258.
- Amabile, T. M. (1983). The social psychology of creativity. Springer.
- Anthropic. (2024). The Claude 3 model family: Opus, Sonnet, Haiku. Available online: https://api.semanticscholar.org/CorpusID:268232499 (accessed on 15 January 2026).
- Baer, J. (2012). Divergent thinking and creativity: A task-specific approach. Psychology Press.
- Baron, N. S. (2023). Who wrote this? How AI and the lure of efficiency threaten human writing. Stanford University Press. Available online: https://www.loc.gov/item/2023011363/ (accessed on 15 January 2026).
- Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617–645.
- Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021, March 3–10). On the dangers of stochastic parrots: Can language models be too big? 2021 ACM Conference on Fairness, Accountability, and Transparency (pp. 610–623), Toronto, ON, Canada.
- Boden, M. A. (2004). The creative mind: Myths and mechanisms (2nd ed.). Routledge.
- Boden, M. A. (2009). Computer models of creativity. AI Magazine, 30(3), 23–34.
- Brown, T. B., Mann, B., Ryder, N., Subbiah, M., Kaplan, J., Dhariwal, P., Neelakantan, A., Shyam, P., Sastry, G., Askell, A., Agarwal, S., Herbert-Voss, A., Krueger, G., Henighan, T., Child, R., Ramesh, A., Ziegler, D. M., Wu, J., Winter, C., … Amodei, D. (2020). Language models are few-shot learners. Advances in Neural Information Processing Systems, 33, 1877–1901.
- Bruner, J. (1991). The narrative construction of reality. Critical Inquiry, 18(1), 1–21.
- Brysbaert, M., Warriner, A. B., & Kuperman, V. (2014). Concreteness ratings for 40 thousand generally known English word lemmas. Behavior Research Methods, 46(3), 904–911.
- Chang, T. A., & Bergen, B. K. (2023). Language model behavior: A comprehensive survey. Computational Linguistics, 49(3), 567–610.
- Colton, S., & Wiggins, G. A. (2012, August 27–31). Computational creativity: The final frontier? 20th European Conference on Artificial Intelligence (pp. 21–26), Montpellier, France.
- Cropley, A. J. (2006). In praise of convergent thinking. Creativity Research Journal, 18(3), 391–404.
- Cropley, A. J., & Cropley, D. (2008). Resolving the paradoxes of creativity: An extended phase model. Cambridge Journal of Education, 38(3), 355–373.
- Csikszentmihalyi, M. (1997). Creativity: Flow and the psychology of discovery and invention. Harper Perennial.
- Damasio, A. R. (1999). The feeling of what happens: Body and emotion in the making of consciousness. Houghton Mifflin Harcourt.
- Diedrich, J., Benedek, M., Jauk, E., & Neubauer, A. C. (2015). Are creative ideas novel and useful? Psychology of Aesthetics, Creativity, and the Arts, 9(1), 35–40.
- Dinu, A., Florescu, A. M., & Resceanu, A. (2025). A comparative approach to assessing linguistic creativity of large language models and humans. Procedia Computer Science, 270, 1292–1301.
- Doshi, A. R., & Hauser, O. P. (2024). Generative AI enhances individual creativity but reduces the collective diversity of novel content. Science Advances, 10(28), eadn5290.
- Epstein, Z., Hertzmann, A., & the Investigators of Human Creativity. (2023). Art and the science of generative AI. Science, 380(6650), 1110–1111.
- Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30(4), 681–694.
- Gero, K. I., Liu, V., & Chilton, L. (2022, June 13–17). Sparks: Inspiration for science writing using language models. 2022 ACM Designing Interactive Systems Conference (pp. 1002–1019), Virtual Event, Australia.
- Giannuzzo, A. (2023). Creativity, intentions, and self-narratives: Can AI really be creative? In Progress in artificial intelligence (EPIA 2023) (pp. 52–63). Springer.
- Glăveanu, V. P., Ritter, S. M., Reiter-Palmon, R., Lubart, T., & Nijstad, B. A. (2013). Creativity as action: Findings from five creative domains. Frontiers in Psychology, 4, 176.
- Guilford, J. P. (1950). Creativity. American Psychologist, 5(9), 444–454.
- Haase, J., & Hanel, P. H. (2023). Artificial muses: Generative artificial intelligence chatbots have risen to human-level creativity. Journal of Creativity, 33(3), 100066.
- Hutson, M. (2021). Robo-writers: The rise and risks of language-generating AI. Nature, 591(7848), 22–25.
- Jakesch, M., Hancock, J. T., & Naaman, M. (2023). Human heuristics for AI-generated language are flawed. Proceedings of the National Academy of Sciences, 120(11), e2208839120.
- Ji, Z., Lee, N., Frieske, R., Yu, T., Su, D., Xu, Y., Ishii, E., Bang, Y., Madotto, A., & Fung, P. (2023). Survey of hallucination in natural language generation. ACM Computing Surveys, 55(12), 248.
- Jordanous, A. (2012). A standardised procedure for evaluating creative systems: Computational creativity evaluation based on what it is to be creative. Cognitive Computation, 4(3), 246–279.
- Karwowski, M., Kaufman, J. C., Lebuda, I., Szumski, G., & Firkowska-Mankiewicz, A. (2017). Intelligence in childhood and creative achievements in middle-age: The necessary condition approach. Intelligence, 64, 36–44.
- Kaufman, J. C. (2016). Creativity 101 (2nd ed.). Springer. Available online: https://www.springerpub.com/creativity-101-9780826129529.html (accessed on 15 January 2026).
- Kaufman, J. C., & Beghetto, R. A. (2009). Beyond big and little: The four c model of creativity. Review of General Psychology, 13(1), 1–12.
- Kenett, Y. N., Anaki, D., & Faust, M. (2014). Investigating the structure of semantic networks in low and high creative persons. Frontiers in Human Neuroscience, 8, 407.
- Koivisto, M., & Grassini, S. (2023). Best humans still outperform artificial intelligence in a creative divergent thinking task. Scientific Reports, 13(1), 13601.
- Köbis, N., & Mossink, L. D. (2021). Artificial intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry. Computers in Human Behavior, 114, 106553.
- Lee, M., Liang, P., & Yang, Q. (2022, April 30–May 5). CoAuthor: Designing a human-AI collaborative writing dataset for exploring language model capabilities. 2022 CHI Conference on Human Factors in Computing Systems (pp. 1–19), New Orleans, LA, USA.
- Liang, W., Yuksekgonul, M., Mao, Y., Wu, E., & Zou, J. (2023). GPT detectors are biased against non-native English writers. Patterns, 4(7), 100779.
- Liu, Z., Zhou, J., Li, Y., & Chen, J. (2023). Summary of ChatGPT-related research and perspective towards the future of large language models. Meta-Radiology, 1(2), 100017.
- Mar, R. A., & Oatley, K. (2008). The function of fiction is the abstraction and simulation of social experience. Perspectives on Psychological Science, 3(3), 173–192.
- Marcus, G., & Davis, E. (2019). Rebooting AI: Building artificial intelligence we can trust. Pantheon.
- Mazzone, M., & Elgammal, A. (2019). Art, creativity, and the potential of artificial intelligence. Arts, 8(1), 26.
- Messeri, L., & Crockett, M. J. (2024). Artificial intelligence and illusions of understanding in scientific research. Nature, 627(8002), 49–58.
- Mitchell, M., & Krakauer, D. C. (2023). The debate over understanding in AI’s large language models. Proceedings of the National Academy of Sciences, 120(13), e2215907120.
- Moura, F. T. (2023). Artificial intelligence, creativity, and intentionality: The need for a paradigm shift. Journal of Creative Behavior, 57(3), 336–338.
- Noy, S., & Zhang, W. (2023). Experimental evidence on the productivity effects of generative artificial intelligence. Science, 381(6654), 187–192.
- O’Sullivan, J. (2025). Stylometric comparisons of human versus AI-generated creative writing. Humanities and Social Sciences Communications, 12, 1708.
- Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., Zhang, C., Agarwal, S., Slama, K., Ray, A., Schulman, J., Hilton, J., Kelton, F., Miller, L., Simens, M., Askell, A., Welinder, P., Christiano, P., Leike, J., & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730–27744.
- Park, J. S., O’Brien, J. C., Cai, C. J., Morris, M. R., Liang, P., & Bernstein, M. S. (2023, October 29–November 1). Generative agents: Interactive simulacra of human behavior. 36th Annual ACM Symposium on User Interface Software and Technology (pp. 1–22), San Francisco, CA, USA.
- Pavlik, J. V. (2023). Collaborating with ChatGPT: Considering the implications of generative artificial intelligence for journalism and media education. Journalism & Mass Communication Educator, 78(1), 84–93.
- Peeperkorn, L., van der Linden, C., & de Ridder, R. (2024). The homogenization of creativity in the age of AI. New Media & Society.
- Pennebaker, J. W., Boyd, R. L., Jordan, K., & Blackburn, K. (2015). The development and psychometric properties of LIWC2015. University of Texas at Austin.
- Runco, M. A. (2024). The discovery and innovation of AI does not qualify as creativity. Journal of Cognitive Psychology, 1–10.
- Runco, M. A., & Charles, R. E. (1993). Judgments of originality and appropriateness as predictors of creativity. Personality and Individual Differences, 15(5), 537–546.
- Runco, M. A., & Jaeger, G. J. (2012). The standard definition of creativity. Creativity Research Journal, 24(1), 92–96.
- Searle, J. R. (1980). Minds, brains, and programs. Behavioral and Brain Sciences, 3(3), 417–424.
- Shrout, P. E., & Fleiss, J. L. (1979). Intraclass correlations: Uses in assessing rater reliability. Psychological Bulletin, 86(2), 420.
- Simonton, D. K. (1999). Origins of genius: Darwinian perspectives on creativity. Oxford University Press.
- Simonton, D. K. (2012). Taking the US Patent Office criteria seriously: A quantitative three-criterion creativity definition and its implications. Creativity Research Journal, 24(2–3), 97–106.
- Stein, M. I. (1953). Creativity and culture. The Journal of Psychology, 36(2), 311–322.
- Sternberg, R. J. (1999). Handbook of creativity. Cambridge University Press.
- Sternberg, R. J. (2024). Do not worry that generative AI may compromise human creativity or intelligence in the future: It already has. Journal of Intelligence, 12(7), 69.
- Sutskever, I., Vinyals, O., & Le, Q. V. (2014, December 8–12). Sequence to sequence learning with neural networks. 28th International Conference on Neural Information Processing Systems (pp. 3104–3112), Montreal, QC, Canada.
- Van Rooij, A., & Biskjaer, M. M. (2025, October 7–10). Has AI surpassed humans in creative idea generation? A meta-analysis. 36th Annual Conference of the European Association of Cognitive Ergonomics (pp. 1–11), Tallinn, Estonia.
- Vervaeke, J., Lillicrap, T. P., & Richards, B. A. (2012). Relevance realization and the emerging framework in cognitive science. Journal of Logic and Computation, 22(1), 79–99.
- Vinchon, F., Gironnay, V., & Lubart, T. (2024). GenAI creativity in narrative tasks: Exploring new forms of creativity. Journal of Intelligence, 12(12), 125.
- Wiggins, G. A. (2006). A preliminary framework for description, analysis and comparison of creative systems. Knowledge-Based Systems, 19(7), 449–458.
- Xu, W., Jojic, N., Rao, S., Brockett, C., & Dolan, B. (2025). Echoes in AI: Quantifying lack of plot diversity in LLM outputs. Proceedings of the National Academy of Sciences, 122(35), e2504966122.
- Yuan, A., Coenen, A., Reif, E., & Ippolito, D. (2022, March 22–25). Wordcraft: Story writing with large language models. 27th International Conference on Intelligent User Interfaces (pp. 841–852), Helsinki, Finland.
- Zhou, E., & Lee, D. (2024). Generative artificial intelligence, human creativity, and art. PNAS Nexus, 3(3), pgae052.





| Task Phase | Task Type | Author Type | Initial Sample | Final Cleaned Sample (N) |
|---|---|---|---|---|
| Phase 1 | Propositional | Human-Only (H) | 200 | 189 |
| | | Human-Only (H) | 200 | 190 |
| | | Human-Only (H) | 200 | 188 |
| | | LLM-Only (L) | 600 | 600 |
| Phase 2 | Creative Fiction | Human-Only (H) | 25 | 25 |
| | | LLM-Assisted (A) | 25 | 25 |
| | | LLM-Only (L) | 79 | 79 |
| Total Corpus Size | | | 1329 | 1296 |
| Domain | Feature | Operational Definition | Theoretical Rationale for Creativity |
|---|---|---|---|
| Lexical | MTLD (Measure of Textual Lexical Diversity) | The mean number of words needed before a lexical Type–Token Ratio (TTR) value of 0.72 is reached; averaged over consecutive samples. | High diversity can indicate rich vocabulary but may also lead to reduced thematic focus, impacting perceived originality and effectiveness. |
| | Text Difficulty | An index of text difficulty calculated from character frequency and stroke count. | Text difficulty is a proxy for processing fluency; moderate difficulty may be perceived as more thoughtful and effective. |
| | Count of Classical Chinese Words | The absolute count of words identified as belonging to classical Chinese lexicons. | Use of classical language signals erudition and stylistic flair, potentially enhancing originality. |
| Semantic | Perceptual Processes | The proportion of words related to seeing, hearing, or feeling (e.g., “see”, “listen”), based on the LIWC2015 dictionary. | Perceptual language creates vivid mental imagery, which can enhance both the vividness (effectiveness) and novelty (originality) of a text. |
| | Use of First-Person Singular | The proportion of first-person singular pronouns (e.g., “I”, “me”). | High usage may indicate subjectivity and personal narrative, which can affect perceived objectivity and originality depending on the task. |
| | Mean Concreteness | The average concreteness rating of all content words. | Concrete words are easier to process and evoke mental imagery. A balance of concrete and abstract language is often key to effective and creative writing. |
| | Causal Language | The proportion of words indicating causal relationships (e.g., “because”, “effect”), based on LIWC2015. | Reflects logical reasoning and structure, which is crucial for the effectiveness of argumentative or expository texts. |
| | Family-Related Words | The proportion of words referring to family members (e.g., “mother”, “brother”), based on LIWC2015. | May indicate the use of personal or social themes, influencing the perceived relatability and style of the text. |
| Syntactic | Mean Parse Tree Height | The average maximum path length from the root to any leaf node in a constituency parse tree for each sentence. | A proxy for syntactic complexity; higher trees suggest more complex sentence structures, often associated with higher cognitive ability. |
| | Average Dependency Distance | The average linear distance (in words) between a headword and its dependent elements in a sentence. | A well-validated metric of syntactic complexity; greater distances suggest more complex phrasing and embedded clauses. |
| | Words Per Sentence | The total number of words divided by the total number of sentences. | Measures sentence length; longer sentences can convey more complex ideas but may reduce readability and effectiveness. |
| | Conjunctions | The proportion of conjunctions (e.g., “and”, “but”) used in the text, based on LIWC2015. | Reflects the complexity of logical connections between ideas. |
| Discourse | Inter-Sentence Similarity | The mean cosine similarity between consecutive sentence pairs, calculated using Sentence-BERT (SBERT) embeddings. | Measures local cohesion. Very high similarity may indicate repetition (low originality), while very low similarity may signal incoherence (low effectiveness). |
| | Intra-Paragraph Sentence Similarity | The mean cosine similarity between all sentence pairs within a paragraph, calculated using SBERT embeddings. | Measures thematic cohesion at the paragraph level, a key component of overall text effectiveness. |
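The two discourse features in the table above reduce to averaging cosine similarities between sentence embeddings. The sketch below assumes the embeddings are already computed (for example with the `sentence-transformers` package's `SentenceTransformer.encode`; the specific SBERT model is not stated) and passed in with one row per sentence. The function names are hypothetical, and how paragraph-level values are aggregated into one score per text is not specified here.

```python
import numpy as np

def _unit_rows(emb) -> np.ndarray:
    """L2-normalize each row so dot products equal cosine similarities."""
    e = np.asarray(emb, dtype=float)
    return e / np.linalg.norm(e, axis=1, keepdims=True)

def inter_sentence_similarity(emb) -> float:
    """Mean cosine similarity of consecutive pairs (s1, s2), (s2, s3), ..."""
    e = _unit_rows(emb)
    if len(e) < 2:
        return float("nan")  # undefined for a single sentence
    return float(np.mean(np.sum(e[:-1] * e[1:], axis=1)))

def intra_paragraph_similarity(emb) -> float:
    """Mean cosine similarity over all distinct sentence pairs in one paragraph."""
    e = _unit_rows(emb)
    if len(e) < 2:
        return float("nan")
    sims = e @ e.T
    upper = np.triu_indices(len(e), k=1)  # each unordered pair once, no diagonal
    return float(sims[upper].mean())
```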
| Dependent Variable | Independent Variable | Sum of Squares | df | F-Statistic | p-Value | Partial η² |
|---|---|---|---|---|---|---|
| Originality (N = 1256) | Author Type (A) | 0.023 | 1 | 0.059 | 0.808 | <0.001 |
| | Task Category (T) | 0.020 | 1 | 0.050 | 0.823 | <0.001 |
| | A × T Interaction | 3.693 | 1 | 9.399 | 0.002 | 0.007 |
| | Residual | 492.011 | 1252 | | | |
| Effectiveness (N = 1256) | Author Type (A) | 107.061 | 1 | 328.068 | <0.001 | 0.208 |
| | Task Category (T) | 16.329 | 1 | 50.036 | <0.001 | 0.038 |
| | A × T Interaction | 0.064 | 1 | 0.196 | 0.658 | <0.001 |
| | Residual | 408.574 | 1252 | | | |
| Task | Dimension | t | df | p-Value | Cohen’s d | Key Finding |
|---|---|---|---|---|---|---|
| Creative Task (CT) | Originality | 3.764 | 102 | <0.001 | 0.74 | H > L |
| | Effectiveness | −4.280 | 102 | <0.001 | 0.84 | L > H |
| Propositional P1 | Originality | −1.407 | 387 | 0.160 | −0.14 | Not significant |
| | Effectiveness | −9.459 | 387 | <0.001 | −0.96 | L > H (Large Effect) |
| Propositional P2 | Originality | 0.863 | 388 | 0.389 | 0.09 | Not significant |
| | Effectiveness | −9.502 | 388 | <0.001 | −0.97 | L > H (Large Effect) |
| Propositional P3 | Originality | −0.463 | 386 | 0.643 | −0.05 | Not significant |
| | Effectiveness | −11.622 | 386 | <0.001 | −1.18 | L > H (Very Large Effect) |
| Cluster Index | Cluster Name (Profile) | Mean Originality (SD) | Mean Effectiveness (SD) | Sample Size (N) | Percentage (%) |
|---|---|---|---|---|---|
| 0 | Ideal Style (High O, High E) | 3.75 (0.35) | 3.87 (0.46) | 286 | 22.61 |
| 1 | Plain Style (Low O, Mid-E) | 2.26 (0.27) | 3.28 (0.34) | 332 | 26.24 |
| 2 | Safe Style (Low O, High E) | 2.80 (0.30) | 3.58 (0.40) | 448 | 35.41 |
| 3 | Moderate Style (Mid-O, Low E) | 3.30 (0.31) | 2.59 (0.35) | 199 | 15.73 |
| Author Type | Ideal (High O, High E) | Moderate (Mid-O, Low E) | Safe (Low O, High E) | Plain (Low O, Mid-E) |
|---|---|---|---|---|
| Human (Student) | 49.82% | 49.85% | 26.62% | 84.92% |
| LLM (AI-only) | 50.18% | 50.15% | 73.38% | 15.08% |
| Task Category | Dependent Variable | Linguistic Feature | ΔR² | β (Interaction) | p-Value |
|---|---|---|---|---|---|
| Propositional | Effectiveness (E) | Difficulty | 0.016 | −0.001 | <0.001 |
| | Effectiveness (E) | Classical Words | 0.008 | −0.005 | <0.001 |
| | Effectiveness (E) | Percept | 0.0072 | −9.321 | 0.001 |
| | Effectiveness (E) | MTLD | 0.0059 | −0.002 | 0.003 |
| | Originality (O) | MTLD | 0.0328 | −0.005 | <0.001 |
| | Originality (O) | Difficulty | 0.0056 | −0.001 | 0.009 |
| | Originality (O) | Family Words | 0.0055 | 9.738 | 0.012 |
| Creative | Effectiveness (E) | Difficulty | 0.0594 | −0.002 | 0.007 |
| | Effectiveness (E) | Paragraph Similarity | 0.0000 | 2.087 | 0.008 |
| | Originality (O) | First-Person Singular | 0.0469 | 15.40 | 0.019 |
| | Originality (O) | Difficulty | 0.0366 | −0.001 | 0.04 |
| Author Type | Ideal (C0) (High O, High E) | Plain (C1) (Low O, Mid-E) | Safe (C2) (Low O, High E) | Moderate (C3) (Mid-O, Low E) | Total |
|---|---|---|---|---|---|
| Human (HO) | 7 | 1 | 1 | 16 | 25 |
| AI-Assisted (HAC) | 5 | 3 | 1 | 16 | 25 |
| LLM-Only (LO) | 10 | 26 | 25 | 18 | 79 |
| Total | 22 | 30 | 27 | 50 | 129 |
| DV | Mean (H) | Mean (A) | Mean (L) | F | p-Value | Tukey HSD (Adj.) |
|---|---|---|---|---|---|---|
| Originality | 3.20 | 3.10 | 2.93 | 1.55 | 0.220 | No Significant Differences |
| Effectiveness | 2.87 | 2.82 | 3.82 | 32.73 | <0.001 | L ≫ H; L ≫ A |
| Feature | Mean (H) | Mean (A) | Mean (L) | F | p-Value |
|---|---|---|---|---|---|
| ParaSim (Intra-Paragraph Similarity) | 0.000 | 0.318 | 0.417 | 103.70 | <0.001 |
| First-Person Singular | 0.032 | 0.044 | 0.013 | 17.64 | <0.001 |
| Difficulty | 589.997 | 749.099 | 877.545 | 5.27 | 0.007 |
| Concreteness | 2.476 | 2.453 | 2.560 | 9.14 | <0.001 |
| Classical Words | 90.240 | 114.320 | 126.933 | 3.27 | 0.044 |
| Conjunctions | 0.046 | 0.057 | 0.047 | 3.85 | 0.025 |
| Feature | F(6, 114) | p | Partial η² |
|---|---|---|---|
| Paragraph Similarity (Adj.) | 3.40 | 0.004 | 0.148 |
| Lexical Diversity (MTLD) | 2.64 | 0.020 | 0.119 |
| Difficulty | 2.46 | 0.028 | 0.112 |
| Words Per Sentence | 2.40 | 0.032 | 0.110 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Yang, L.; Xin, T.; Yu, Y.; Wu, Y. Human vs. LLM Creativity: A Comparative Analysis of Task-Dependent Asymmetry and Linguistic Mechanisms. J. Intell. 2026, 14, 27. https://doi.org/10.3390/jintelligence14020027

