Game Knowledge Management System: Schema-Governed LLM Pipeline for Executable Narrative Generation in RPGs
Abstract
1. Introduction
- 1. We formalize the concept of a Game Knowledge Management System (G-KMS) and articulate a conceptual and operational lifecycle for executable game knowledge in LLM-based generation pipelines.
- 2. We propose an LLM-chained, schema-governed pipeline that produces structurally valid, semantically grounded, and engine-executable narrative knowledge.
- 3. We establish a multidimensional evaluation methodology that combines engine-level execution and human player interaction for reproducible system-level validation.
2. Related Work
2.1. Knowledge Representation and Management in Game Systems
2.2. Symbolic and Template-Based Narrative PCG
2.3. LLM-Based Narrative Generation and Dialogue Control
2.4. Runtime-Integrated PCG and Engine-Level Systems
2.5. Summary
3. Proposed Method
3.1. Architectural Overview
3.2. Data and Asset Standardization
3.3. Schema-Constrained LLM Generation Process
3.4. Evaluation Pipeline and Unity Implementation
4. Experimental Results and Analysis
4.1. Structural Validity
4.2. Textual Diversity and Redundancy
| Temp | Token Entropy H1 ↑ | Route Entropy ↑ | Interpretation |
|---|---|---|---|
| T0.3 | 1.71 | 0.49 | Relatively stable dialogue, but the lowest diversity. |
| T0.7 | 2.45 | 0.75 | Diversity increases significantly while the structure remains stable. |
| T1.0 | 2.65 | 0.76 | Highest diversity, but approaching the upper limit of randomness. |

| Temp | Mean Similarity ↓ | Near-Duplicate Rate (>0.9) ↓ | Interpretation |
|---|---|---|---|
| T0.3 | 0.612 | 4.8% | A few very similar tasks remain. |
| T0.7 | 0.607 | 2.3% | Reduced semantic repetition; more dispersed expression. |
| T1.0 | 0.598 | 1.1% | Almost no repetition, but some tasks drift from the theme. |
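The two diversity measures above can be reproduced with straightforward computations: unigram (H1) token entropy over the generated text, and the fraction of task pairs whose embedding cosine similarity exceeds 0.9. The sketch below is illustrative only; tokenization by whitespace and the plain-list embedding format are assumptions, and in the paper's setting the embeddings would come from S-BERT.

```python
import math
from collections import Counter

def token_entropy(texts):
    """Unigram (H1) token entropy in bits over a corpus of generated lines."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def near_duplicate_rate(embeddings, threshold=0.9):
    """Fraction of unordered task pairs whose cosine similarity exceeds threshold."""
    n = len(embeddings)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    dup = sum(1 for i, j in pairs if cosine(embeddings[i], embeddings[j]) > threshold)
    return dup / len(pairs) if pairs else 0.0
```

Mean pairwise similarity follows the same pattern, averaging `cosine` over all pairs instead of thresholding.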
4.3. Task-Graph Metrics
4.4. Engine-Level Playability Proxies
4.5. LLM Self-Evaluation
4.6. Human Player Study
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
| AI | Artificial Intelligence |
| API | Application Programming Interface |
| GenAI | Generative Artificial Intelligence |
| G-KMS | Game Knowledge Management System |
| HITL | Human-in-the-Loop |
| KMS | Knowledge Management System |
| PCG | Procedural Content Generation |
| LLM | Large Language Model |
| RAG | Retrieval-Augmented Generation |
| RTS | Real-Time Strategy |
| RPG | Role-Playing Game |
| DSR | Design Science Research |
| JSON | JavaScript Object Notation |
| DSL | Domain-Specific Language |
| NPC | Non-Player Character |
| UI | User Interface |
| GPT | Generative Pre-trained Transformer |
| t-SNE | t-distributed Stochastic Neighbor Embedding |
| UMAP | Uniform Manifold Approximation and Projection |
| S-BERT | Sentence-BERT |
| WebGL | Web Graphics Library |
Appendix A
| Layer | Purpose | Representative Constraints and Elements |
|---|---|---|
| Constraint Layer | Enforce structural validity and engine compatibility | |
| Context Layer | Ground narrative generation in world knowledge and style | |
| Few-Shot Layer | Provide structural and stylistic priors | |
| Generation Request Layer | Specify task-level narrative and gameplay requirements | |
| System-Level Output Rules | Enforce admission-level correctness before execution | |
Appendix B
| Normalization Rule | Description |
|---|---|
| Illegal key sanitization | Removes fields not permitted by the schema from actions and conditions (e.g., stray ‘id’, ‘text’, ‘to’, ‘states’ keys). |
| Branch balancing | Ensures each dialogue node contains at least one fallback branch (‘return_to_main’), preventing dead-end states. |
| NPC spatial correction | If ‘positionZone’ is missing, autofill with “All”; snap NPC coordinates to the nearest walkable cell (Manhattan metric). |
| Schema-field alignment | Aligns ‘metadata.title’ to match the top-level ‘title’ if inconsistent. |
| Fix-log recording | All modifications are stored in a ‘post_normalize.fixlog’ list for transparency and debugging. |
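The normalization rules above can be sketched as a single pass over a generated task. Only the rule names and the ‘post_normalize.fixlog’ field come from the table; the task dictionary layout, the `ALLOWED_ACTION_KEYS` whitelist, and the walkable-grid representation are assumptions for illustration, not the paper's exact schema.

```python
# Illustrative normalization pass; field layout and whitelist are assumptions.
ALLOWED_ACTION_KEYS = {"type", "target", "amount"}  # placeholder whitelist

def normalize_task(task, walkable_cells):
    fixlog = []
    # Illegal key sanitization: drop fields the schema does not permit.
    for action in task.get("actions", []):
        for key in [k for k in action if k not in ALLOWED_ACTION_KEYS]:
            del action[key]
            fixlog.append(f"removed illegal key '{key}'")
    # Branch balancing: every dialogue node gets a fallback branch.
    for node in task.get("dialogue", []):
        if not any(b.get("next") == "return_to_main" for b in node.get("branches", [])):
            node.setdefault("branches", []).append({"next": "return_to_main"})
            fixlog.append(f"added fallback branch to node '{node.get('id')}'")
    # NPC spatial correction: default zone, then snap to the nearest
    # walkable cell by Manhattan distance.
    for npc in task.get("npcs", []):
        npc.setdefault("positionZone", "All")
        x, y = npc["position"]
        npc["position"] = min(walkable_cells, key=lambda c: abs(c[0] - x) + abs(c[1] - y))
    # Schema-field alignment: metadata title must match the top-level title.
    if task.get("metadata", {}).get("title") != task.get("title"):
        task.setdefault("metadata", {})["title"] = task.get("title")
        fixlog.append("aligned metadata.title to title")
    # Fix-log recording: keep every modification for transparency.
    task["post_normalize"] = {"fixlog": fixlog}
    return task
```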
| Validation Category | Core Checks | Description |
|---|---|---|
| Schema Conformity | Draft-7 schema check; Required fields; Enum validity | Ensures JSON structure matches the formal task schema. In compatibility mode, schema violations are logged but not blocking. |
| Prefab and Portrait Validity | Prefab whitelist enforcement; Portrait naming rule (‘<Faction>_<m/f>_face.png’) | Verifies that all character and item prefabs exist in the official whitelist and portrait filenames match faction/gender patterns. |
| Position and Zone Consistency | Walkable-cell lookup; Zone normalization and aliasing; Near-miss tolerance with EPS | Confirms that player/NPC/item positions correspond to valid walkable points within tolerance. Zone names are normalized to avoid mismatch. Near misses are downgraded to warnings. |
| Dialogue-Graph Integrity | ‘option.next’ branch mapping; ‘endingId’ validity; Node structure correctness | Ensures all dialogue options reference existing branches, all ending IDs are defined, and no branches create dead references. |
| Title and Metadata Alignment | Check ‘title == metadata.title’ | Prevents inconsistencies between top-level and metadata titles during multi-step generation. |
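Of these validation categories, the dialogue-graph integrity check is the most graph-like, and can be sketched as a reference walk over branches and endings. The task layout below (nodes with `branches`, branches with `options`, each option carrying an optional `next` or `endingId`) is an assumption inferred from the table, not the paper's exact schema.

```python
# Illustrative dialogue-graph integrity check; the task layout is assumed.
def check_dialogue_graph(task):
    """Return a list of integrity errors: dead branch references and
    undefined ending ids."""
    errors = []
    branch_ids = {b["id"] for node in task["dialogue"] for b in node["branches"]}
    ending_ids = {e["id"] for e in task.get("endings", [])}
    for node in task["dialogue"]:
        for branch in node["branches"]:
            for opt in branch.get("options", []):
                nxt = opt.get("next")
                if nxt is not None and nxt not in branch_ids:
                    errors.append(f"dead reference: option -> '{nxt}'")
                eid = opt.get("endingId")
                if eid is not None and eid not in ending_ids:
                    errors.append(f"undefined ending: '{eid}'")
    return errors
```

A task passes this category only when the returned list is empty; in compatibility mode the errors would be logged rather than blocking, as described for schema conformity above.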
Appendix C

Appendix D
| Component | Description/Example |
|---|---|
| Evaluator Role | “You are a narrative reviewer evaluating the coherence and quality of a generated RPG quest”. |
| Evaluation Scope | Assess narrative consistency, quest logic, character voice, dialogue quality, solvability, and overall interestingness of a single generated task. |
| Scoring Dimensions | world_consistency, narrative_logic, solvability, character_voice, dialogue_quality, interestingness, and overall_score; each rated on a 1–5 Likert scale. |
| Violation Flags | Invalid prefab or location references penalize world_consistency; dead-end or unreachable dialogue branches penalize narrative_logic; use of reserved canon names is recorded as a violation flag. |
| Output Format | Structured JSON conforming to eval.schema.json, containing numeric scores for each dimension and a brief natural-language summary (≤120 characters) of the quest premise and resolution. |
| Example Output | { "overall": 4.17, "world_consistency": 5, "narrative_logic": 4, "solvability": 4, "character_voice": 4, "dialogue_quality": 4, "interestingness": 4, "summary": "Players resolve a missing wand incident through dialogue at a forest camp." } |
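The overall score in the example output is consistent with an unweighted mean of the six scoring dimensions: (5 + 4 + 4 + 4 + 4 + 4) / 6 ≈ 4.17. A minimal aggregator, assuming the dimension names from eval.schema.json and equal weighting:

```python
# Assumed: equal weighting over the six 1-5 Likert dimensions.
DIMENSIONS = ("world_consistency", "narrative_logic", "solvability",
              "character_voice", "dialogue_quality", "interestingness")

def overall_score(scores):
    """Unweighted mean of the six dimension scores, rounded to 2 decimals."""
    return round(sum(scores[d] for d in DIMENSIONS) / len(DIMENSIONS), 2)
```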
References
- Hu, C.; Zhao, Y.; Liu, J. Game Generation via Large Language Models. In Proceedings of the 2024 IEEE Conference on Games (CoG), Milan, Italy, 5–8 August 2024; pp. 1–4.
- Wu, W.; Wu, H.; Jiang, L.; Liu, X.; Zhao, H.; Zhang, M. From Role-Play to Drama-Interaction: An LLM Solution. In Findings of the Association for Computational Linguistics: ACL 2024; Association for Computational Linguistics: Stroudsburg, PA, USA, 2024; pp. 3271–3290.
- Liu, X.; Xie, Z.; Jiang, S. Personalized Non-Player Characters: A Framework for Character-Consistent Dialogue Generation. AI 2025, 6, 93.
- Park, J.S.; O’Brien, J.; Cai, C.J.; Morris, M.R.; Liang, P.; Bernstein, M.S. Generative Agents: Interactive Simulacra of Human Behavior. In UIST’23: Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, San Francisco, CA, USA, 29 October–1 November 2023; Association for Computing Machinery: New York, NY, USA, 2023; pp. 1–22.
- Buongiorno, S.; Klinkert, L.; Zhuang, Z.; Chawla, T.; Clark, C. PANGeA: Procedural Artificial Narrative Using Generative AI for Turn-Based, Role-Playing Video Games. Proc. AAAI Conf. Artif. Intell. Interact. Digit. Entertain. 2024, 20, 156–166.
- Shao, Y.; Li, L.; Dai, J.; Qiu, X. Character-LLM: A Trainable Agent for Role-Playing. arXiv 2023, arXiv:2310.10158.
- Kang, T.; Lin, M.C. Action2Dialogue: Generating Character-Centric Narratives from Scene-Level Prompts. arXiv 2025, arXiv:2505.16819.
- Li, J.; Li, Y.; Wadhwa, N.; Pritch, Y.; Jacobs, D.E.; Rubinstein, M.; Bansal, M.; Ruiz, N. Unbounded: A Generative Infinite Game of Character Life Simulation. arXiv 2024, arXiv:2410.18975.
- ISO 30401:2018; Knowledge Management Systems—Requirements. International Organization for Standardization: Geneva, Switzerland, 2018.
- Ontanón, S.; Synnaeve, G.; Uriarte, A.; Richoux, F.; Churchill, D.; Preuss, M. A Survey of Real-Time Strategy Game AI Research and Competition in StarCraft. IEEE Trans. Comput. Intell. AI Games 2013, 5, 293–311.
- Young, R.M.; Riedl, M.O. An Architecture for Integrating Plan-Based Behavior Generation with Interactive Game Environments. J. Game Dev. 2004, 1, 1–29.
- Togelius, J.; Yannakakis, G.N.; Stanley, K.O.; Browne, C. Search-Based Procedural Content Generation: A Taxonomy and Survey. IEEE Trans. Comput. Intell. AI Games 2011, 3, 172–186.
- Smith, G.; Whitehead, J.; Mateas, M. Tanagra: Reactive Planning and Constraint Solving for Mixed-Initiative Level Design. IEEE Trans. Comput. Intell. AI Games 2011, 3, 201–215.
- Alexander, R.; Martens, C. Deriving Quests from Open World Mechanics. In FDG’17: Proceedings of the 12th International Conference on the Foundations of Digital Games, Hyannis, MA, USA, 14–17 August 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 1–7.
- Ammanabrolu, P.; Broniec, W.; Mueller, A.; Paul, J.; Riedl, M. Toward Automated Quest Generation in Text-Adventure Games. In Proceedings of the 4th Workshop on Computational Creativity in Language Generation, Tokyo, Japan; Burtenshaw, B., Manjavacas, E., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 1–12.
- da Rocha Franco, A.d.O.; de Carvalho, W.V.; da Silva, J.W.F.; Maia, J.G.R.; de Castro, M.F. Managing and Controlling Digital Role-Playing Game Elements: A Current State of Affairs. Entertain. Comput. 2024, 51, 100708.
- Risi, S.; Togelius, J. Increasing Generality in Machine Learning through Procedural Content Generation. Nat. Mach. Intell. 2020, 2, 428–436.
- Koppen, L. Integrating a Human Feedback Loop in PCG for Level Design Using LLMs. Bachelor’s Thesis, University of Twente, Enschede, The Netherlands, 2024.
- Sun, Y.; Li, Z.; Fang, K.; Lee, C.H.; Asadipour, A. Language as Reality: A Co-Creative Storytelling Game Experience in 1001 Nights Using Generative AI. Proc. AAAI Conf. Artif. Intell. Interact. Digit. Entertain. 2023, 19, 425–434.
- Kumaran, V.; Carpenter, D.; Rowe, J.; Mott, B.; Lester, J. End-to-End Procedural Level Generation in Educational Games with Natural Language Instruction. In Proceedings of the 2023 IEEE Conference on Games (CoG), Boston, MA, USA, 21–24 August 2023; pp. 1–8.
- Nasir, M.U.; James, S.; Togelius, J. Word2World: Generating Stories and Worlds through Large Language Models. arXiv 2024, arXiv:2405.06686.
- Li, W.; Bai, Y.; Lu, J.; Yi, K. Immersive Text Game and Personality Classification. arXiv 2022, arXiv:2203.10621.
- Shuster, K.; Urbanek, J.; Szlam, A.; Weston, J. Am I Me or You? State-of-the-Art Dialogue Models Cannot Maintain an Identity. In Findings of the Association for Computational Linguistics: NAACL 2022, Seattle, WA, USA; Association for Computational Linguistics: Stroudsburg, PA, USA, 2022; pp. 2367–2387.
- Song, H.; Wang, Y.; Zhang, W.-N.; Liu, X.; Liu, T. Generate, Delete and Rewrite: A Three-Stage Framework for Improving Persona Consistency of Dialogue Generation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online; Jurafsky, D., Chai, J., Schluter, N., Tetreault, J., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2020; pp. 5821–5831.
- Ji, K.; Lian, Y.; Li, L.; Gao, J.; Li, W.; Dai, B. Enhancing Persona Consistency for LLMs’ Role-Playing Using Persona-Aware Contrastive Learning. arXiv 2025, arXiv:2503.17662.
- Takayama, J.; Ohagi, M.; Mizumoto, T.; Yoshikawa, K. Persona-Consistent Dialogue Generation via Pseudo Preference Tuning. In Proceedings of the 31st International Conference on Computational Linguistics, Abu Dhabi, UAE; Association for Computational Linguistics: Stroudsburg, PA, USA, 2025; pp. 5507–5514.
- Zhou, W.; Peng, X.; Riedl, M. Dialogue Shaping: Empowering Agents through NPC Interaction. arXiv 2023, arXiv:2307.15833.
- Jennings, N.; Wang, H.; Li, I.; Smith, J.; Hartmann, B. What’s the Game, Then? Opportunities and Challenges for Runtime Behavior Generation. In UIST’24: Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, Pittsburgh, PA, USA, 13–16 October 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 1–13.
- Wang, G.; Xie, Y.; Jiang, Y.; Mandlekar, A.; Xiao, C.; Zhu, Y.; Fan, L.; Anandkumar, A. Voyager: An Open-Ended Embodied Agent with Large Language Models. arXiv 2023, arXiv:2305.16291.
- Meta Fundamental AI Research Diplomacy Team (FAIR); Bakhtin, A.; Brown, N.; Dinan, E.; Farina, G.; Flaherty, C.; Fried, D.; Goff, A.; Gray, J.; Hu, H.; et al. Human-Level Play in the Game of Diplomacy by Combining Language Models with Strategic Reasoning. Science 2022, 378, 1067–1074.
- Zheng, S.; He, K.; Yang, L.; Xiong, J. MemoryRepository for AI NPC. IEEE Access 2024, 12, 62581–62596.
- Sudhakaran, S.; González-Duque, M.; Freiberger, M.; Glanois, C.; Najarro, E.; Risi, S. MarioGPT: Open-Ended Text2Level Generation through Large Language Models. Adv. Neural Inf. Process. Syst. 2023, 36, 54213–54227.
- Welleck, S.; Kulikov, I.; Roller, S.; Dinan, E.; Cho, K.; Weston, J. Neural Text Generation with Unlikelihood Training. arXiv 2019, arXiv:1908.04319.
- Chang, S.; Wang, J.; Dong, M.; Pan, L.; Zhu, H.; Li, A.H.; Lan, W.; Zhang, S.; Jiang, J.; Lilien, J.; et al. Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness. arXiv 2023, arXiv:2301.08881.
- Holtzman, A.; Buys, J.; Du, L.; Forbes, M.; Choi, Y. The Curious Case of Neural Text Degeneration. arXiv 2020, arXiv:1904.09751.
- Li, J.; Galley, M.; Brockett, C.; Gao, J.; Dolan, B. A Diversity-Promoting Objective Function for Neural Conversation Models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA; Knight, K., Nenkova, A., Rambow, O., Eds.; Association for Computational Linguistics: Stroudsburg, PA, USA, 2016; pp. 110–119.
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
- Ahmed, M.; Seraj, R.; Islam, S.M.S. The K-Means Algorithm: A Comprehensive Survey and Performance Evaluation. Electronics 2020, 9, 1295.
- van der Maaten, L.; Hinton, G. Visualizing Data Using t-SNE. J. Mach. Learn. Res. 2008, 9, 2579–2605.
- McIntyre, N.; Lapata, M. Plot Induction and Evolutionary Search for Story Generation. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden; Association for Computational Linguistics: Stroudsburg, PA, USA, 2010; pp. 1562–1572.
- Porteous, J.; Cavazza, M. Controlling Narrative Generation with Planning Trajectories: The Role of Constraints. In Interactive Storytelling. ICIDS 2009; Iurgel, I.A., Zagalo, N., Petta, P., Eds.; Springer: Berlin/Heidelberg, Germany, 2009; pp. 234–245.
- Riedl, M.O.; Young, R.M. Narrative Planning: Balancing Plot and Character. J. Artif. Intell. Res. 2010, 39, 217–268.
- Shaker, N.; Togelius, J.; Nelson, M.J. Procedural Content Generation in Games; Springer: Cham, Switzerland, 2016.
- Summerville, A.; Snodgrass, S.; Guzdial, M.; Holmgård, C.; Hoover, A.K.; Isaksen, A.; Nealen, A.; Togelius, J. Procedural Content Generation via Machine Learning (PCGML). IEEE Trans. Games 2018, 10, 257–270.
- Smith, G.; Othenin-Girard, A.; Whitehead, J.; Wardrip-Fruin, N. PCG-Based Game Design: Creating Endless Web. In Proceedings of the International Conference on the Foundations of Digital Games, Raleigh, NC, USA, 29 May–1 June 2012; Association for Computing Machinery: New York, NY, USA, 2012; pp. 188–195.
- Zheng, L.; Chiang, W.-L.; Sheng, Y.; Zhuang, S.; Wu, Z.; Zhuang, Y.; Lin, Z.; Li, Z.; Li, D.; Xing, E.P.; et al. Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena. In Proceedings of the 37th International Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023; Curran Associates Inc.: Red Hook, NY, USA, 2023; pp. 46595–46623.
- Bai, Y.; Jones, A.; Ndousse, K.; Askell, A.; Chen, A.; DasSarma, N.; Drain, D.; Fort, S.; Ganguli, D.; Henighan, T.; et al. Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback. arXiv 2022, arXiv:2204.05862.
| Category | Representative Work | Structural Control | Narrative Expressiveness | Dialogue/Persona Consistency | Engine-Level Integration | Distinguishing Features |
|---|---|---|---|---|---|---|
| Rule-Based/Planning-Based PCG | [14,15] | High (template/logic rules) | Low–Medium | None | Limited/none | Requires heavy authoring; constrained variation; no LLM richness |
| Template-Driven/Hybrid Systems | [5,18] | High (template anchors) | Medium | Limited | Editor-only; not automated | Template-bound; requires human supervision; not scalable |
| LLM-Based Narrative Generation | [1,2,4] | Low (free-form generation) | High | Moderate | None | No schema guarantees; inconsistent world models; text-only evaluation |
| LLM Dialogue/Persona Consistency Models | [3,6,7] | Medium (AMR/fine-tuning) | Medium | High | None | No quest-state control; no quest graph generation; no runtime verification |
| Open-Ended LLM Simulation/Multi-Agent Systems | [8,29] | Low–Medium | High | Moderate | Sandbox only | No schema validation; not engine-ready; no structural repair |
| LLM-Chained PCG Framework (ours) | - | High (JSON schema + graph repair) | High | High (world bible + persona coherence) | Full Unity runtime validation | An end-to-end LLM-to-engine pipeline for scalable, execution-ready JSON artifacts |
| Version | Tasks | Smoke Pass Rate | Validate Pass Rate | Interpretation |
|---|---|---|---|---|
| Early Baseline (VLST1) | 20 | 50% | 0% | Partially loadable, but severe semantic inconsistencies prevent any task from meeting validation requirements. |
| Prompt-Optimized (VLST2) | 20 | 100% | 0% | Prompt regularization fixes all structural issues but does not resolve deeper semantic constraints such as prefab mismatches and invalid ending references. |
| Narrative-Normalized (VLST3) | 20 | 100% | 100% | Automatic normalization enforces strict structural and semantic constraints, correcting missing fields and invalid references and flagging inconsistencies. |
| Prompt-Optimized (VLST4) | 100 | 100% | 68% | Built upon the normalization framework introduced in VLST3, this version enhances prompt richness and narrative guidance while relaxing overly rigid post-processing rules. |
| Final Generator (without Normalization) | 100 | 100% | 0% | Removing normalization under identical prompt and schema constraints leads to a complete loss of semantic validity, indicating that prompt- and schema-based control alone is insufficient for ensuring runtime-executable content. This confirms normalization as a necessary governance layer for maintaining engine-aligned semantic consistency, rather than a mere engineering convenience. |
| Final Generator (T = 0.3) | 100 | 100% | 78% | The final generator retains normalization as a safety layer but adopts a softer, selective correction strategy to preserve expressive diversity. Validation rates vary with sampling temperature, reflecting a controlled trade-off between structural reliability and narrative variability. Among these, T = 0.7 achieves the strongest balance between semantic validity and expressive richness. |
| Final Generator (T = 0.7) | 100 | 100% | 85% | |
| Final Generator (T = 1.0) | 100 | 100% | 77% |
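The smoke and validate pass rates above come from a two-stage admission gate: a smoke pass checks only that an artifact loads at all, while the validate pass applies the stricter schema and semantic checks. The sketch below is illustrative; the `smoke_test` and `validate` bodies are stand-ins (JSON parseability, a required-field check, and a prefab whitelist) for the paper's full validator, and the field names are assumptions.

```python
import json

def smoke_test(raw):
    """Smoke pass (placeholder): the artifact must at least parse as a JSON object."""
    try:
        return isinstance(json.loads(raw), dict)
    except json.JSONDecodeError:
        return False

def validate(task, prefab_whitelist):
    """Validate pass (placeholder): required fields present, prefabs whitelisted."""
    required = ("title", "dialogue", "npcs")
    if not all(field in task for field in required):
        return False
    return all(npc.get("prefab") in prefab_whitelist for npc in task["npcs"])

def pass_rates(raw_tasks, prefab_whitelist):
    """Return (smoke rate, validate rate) over a batch of generated artifacts."""
    smoke = [raw for raw in raw_tasks if smoke_test(raw)]
    valid = [raw for raw in smoke if validate(json.loads(raw), prefab_whitelist)]
    n = len(raw_tasks)
    return len(smoke) / n, len(valid) / n
```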
| Stage | Schema Errors | Semantic Errors | Missing Required Fields | Prefab Errors |
|---|---|---|---|---|
| VLST1 | High | High | Frequent | Frequent |
| VLST2 | Eliminated (structure fully stable) | High (deep semantic issues persist) | Eliminated | Partially resolved |
| VLST3 | Eliminated | Fully resolved (after normalization) | Eliminated | Only minor inconsistencies |
| VLST4 | Eliminated | Moderate (improved but not fully fixed) | Eliminated | Rare |
| Final_w/o Norm | Eliminated | High | Eliminated | Frequent |
| Final_T0.7 | Eliminated | Lowest | Eliminated | Nearly eliminated |
| Temp | Avg Token Count ↑ | Std ↑ | Interpretation |
|---|---|---|---|
| T0.3 | 82.3 | 25.0 | The shortest, with concise dialogue content. |
| T0.7 | 92.7 | 32.4 | More information, yet maintaining consistency. |
| T1.0 | 100.8 | 42.1 | The longest, but the task content varies greatly. |
| Version | Num Tasks | Avg Path Len | Branching Ratio | Clustering Coeff | Dead Ends | Reachable Endings | Total Endings |
|---|---|---|---|---|---|---|---|
| T0.3 | 78 | 3.0 | 1.341 | 0.0 | 0 | 5.16 | 5.16 |
| T0.7 | 85 | 3.0 | 1.361 | 0.0 | 0 | 5.38 | 5.38 |
| T1.0 | 77 | 3.0 | 1.362 | 0.0 | 0 | 5.15 | 5.15 |
| Golden_sample | 1 | 3.0 | 1.455 | 0.0 | 0 | 8 | 8 |
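The task-graph metrics above follow from standard graph traversal over the dialogue graph: the branching ratio averages outgoing edges over non-terminal nodes, dead ends are reachable non-ending nodes with no outgoing edge, and reachable endings are counted by a breadth-first search from the start node. A minimal sketch, assuming an edge-list representation rather than the paper's JSON task format:

```python
from collections import deque

def graph_metrics(edges, start, endings):
    """Toy dialogue-graph metrics via BFS from the start node."""
    adj = {}
    for src, dst in edges:
        adj.setdefault(src, []).append(dst)
    # BFS to find all reachable nodes.
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    # Branching ratio: outgoing edges per reachable non-terminal node.
    non_terminal = [n for n in seen if adj.get(n)]
    branching = sum(len(adj[n]) for n in non_terminal) / len(non_terminal)
    # Dead ends: reachable terminal nodes that are not declared endings.
    dead_ends = [n for n in seen if not adj.get(n) and n not in endings]
    return {"branching_ratio": branching,
            "dead_ends": len(dead_ends),
            "reachable_endings": len(seen & set(endings))}
```

With zero dead ends and reachable endings equal to total endings, as in the table, every declared ending is连 reachable and every terminal node is a legitimate ending.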
| Version | Avg. Overall | World Consistency | Narrative Logic | Solvability | Character Voice | Dialogue Quality | Interestingness | Invalid Pos. | Notes |
|---|---|---|---|---|---|---|---|---|---|
| T0.3 | 4.20 | 4.3 | 4.1 | 4.2 | 4.2 | 4.2 | 4.0 | 0 | Logic-stable, conservative creativity |
| T0.7 | 4.35 | 4.5 | 4.3 | 4.4 | 4.4 | 4.3 | 4.3 | 0 | Best balance between coherence and creativity |
| T1.0 | 4.25 | 4.2 | 4.1 | 4.2 | 4.5 | 4.5 | 4.7 | 1 | High creativity, mildly unstable structure |
| Dimension | Scores | Content |
|---|---|---|
| Story Pitch | - | Ilya seeks help at the edge of the campsite. You follow footprints and wind-scattered fragments, recover and repair the map, and return it to ensure safe night travel along the route. |
| World Consistency | 4.5 | No canon names; prefabs OK; positions OK |
| Narrative Logic | 4.5 | Branches closed; zone-event fit |
| Solvability | 4.0 | Objectives unlock and complete |
| Character Voice | 4.5 | Distinct speakers and concise options; personalities indicated |
| Dialogue Quality | 4.0 | No overlong lines |
| Interestingness | 4.0 | Search vs. promise routes add variety; fits campsite-to-forest beat. |
| Dimension | Question Items | Mean | SD | Interpretation |
|---|---|---|---|---|
| Playability and Clarity | Q1–Q2 | 4.17 | 0.22 | Players generally felt that the quest instructions were clear and the process was easy to understand. |
| Narrative Logic | Q3–Q4 | 4.23 | 0.33 | The narrative structure is coherent, with no jumps in plot or logical inconsistencies. |
| World Consistency | Q5 | 4.40 | 0.50 | The characters, events, and worldview are highly consistent. |
| Character Voice and Dialogue Quality | Q6–Q8 | 4.30 | 0.41 | The NPCs have distinct personalities, and their dialogue is natural and non-repetitive. |
| Interestingness | Q9 | 4.07 | 0.46 | The overall task is interesting and the process is engaging. |
| Overall Experience | Q10–Q11 | 4.57 | 0.47 | The experience was positive, and most players were willing to try more AI-generated tasks. |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Share and Cite
Rahman, A.; Yu, A.; Cho, K. Game Knowledge Management System: Schema-Governed LLM Pipeline for Executable Narrative Generation in RPGs. Systems 2026, 14, 175. https://doi.org/10.3390/systems14020175

