Next Article in Journal
A Comprehensive Review and Experimental Study on Biodiesel Upgrade Through Selective Partial Catalytic Hydrogenation
Previous Article in Journal
On the Context-Aware GNSS Navigation: Test of a k-Nearest Neighbors Classifier in Different Environments
 
 
Font Type:
Arial Georgia Verdana
Font Size:
Aa Aa Aa
Line Spacing:
Column Width:
Background:
Proceeding Paper

Ensuring the Secure Integration of Generative AI in Video Game Developments †

by
Pablo Natera-Muñoz
1,*,
Ruth Torres-Gallego
1,
Antonio Silva-Luengo
2,
Pablo García-Rodríguez
1 and
Alberto Carrón-Campón
1
1
Grupo de Ingeniería de Medios (GIM), University of Extremadura, 10003 Cáceres, Spain
2
Grupo Robolab, University of Extremadura, 10003 Cáceres, Spain
*
Author to whom correspondence should be addressed.
Presented at the First Summer School on Artificial Intelligence in Cybersecurity, Cancun, Mexico, 3–7 November 2025.
Eng. Proc. 2026, 123(1), 31; https://doi.org/10.3390/engproc2026123031
Published: 11 February 2026
(This article belongs to the Proceedings of First Summer School on Artificial Intelligence in Cybersecurity)

Abstract

The use of generative artificial intelligence (AI) is rapidly transforming video game development, enabling the creation of dynamic dialogues, procedural environments, and customized in-game experiences. However, this paradigm shift also introduces significant cybersecurity challenges. This paper explores the integration of generative AI in game design and the associated risks, including prompt injection, generation of harmful content, and potential data leakage. We propose a secure generative framework for video games that incorporates input sanitization, content moderation, and sandboxed model execution to mitigate these threats. Our methodology aims to balance creativity and security, enabling safe deployment of generative systems in modern game environments.

1. Introduction

Generative artificial intelligence is rapidly reshaping the video game industry, enabling dynamic dialogues, adaptive quests, and procedural environments. This integration, however, introduces new cybersecurity risks. Unlike deterministic systems, the stochastic nature of generative models can be exploited to bypass filters or gain unfair advantages. Furthermore, training data may contain sensitive information, leading to unintended leakage. Prompt injection attacks are particularly concerning, allowing users to subvert model behavior and create reputational risks for developers.
This paper addresses these concerns by proposing a robust framework for secure AI implementation. This architecture relies on three layers—the Input Sanitization Service (ISS), the Isolated Generative Model (IGM), and the Output Validation and Fallback System (OVFS)—to enforce integrity at every stage. Our goal is to provide technical guidelines that allow developers to harness generative models while maintaining a secure environment. This work thus provides a viable standard for integrating large language models in interactive entertainment.developers integrating large language models in interactive entertainment.

2. Materials and Methods

The proposed methodology consists of a layered defense built around three core principles: input control, output validation, and system isolation. The framework establishes a secure perimeter by treating player input as inherently untrustworthy and model output as requiring explicit verification. This approach implements adaptive security mechanisms designed to maintain both game integrity and the quality of the player experience.
The first element is input control. Since generative models are highly sensitive to prompts, it is essential to sanitize user inputs before they reach the AI system. This involves advanced context-aware checks to detect manipulation and reduce the likelihood of prompt injection attacks [1]. This process employs techniques such as prompt template enforcement—using structured formats like JSON or XML tags to define context—and multi-layered analysis to securely guide the model’s generation [2].
Equally important is the validation of generated content, necessary as even sanitized inputs can lead to harmful outputs due to model drift or hallucinations. All model outputs are evaluated through a combination of lightweight content classifiers, toxicity filters [3], and fallback systems that trigger re-generation or replace unsafe results. These filters ensure that generated content aligns with ethical standards and prevents biased content from reaching the player.
Finally, the methodology enforces the isolation of the generative AI component from the game engine. By deploying the model within a sandboxed environment [4], it is prevented from accessing sensitive resources or triggering unintended side effects. This is implemented using containerization technology, such as Docker [5], ensuring kernel-level separation of the LLM process from the game runtime. Robust logging and monitoring mechanisms [6] allow teams to detect misuse and adapt the system to evolving threats, establishing a solid foundation for responsible AI deployment.

3. Results

The proposed architecture is a three-layered system: the Input Sanitization Service (ISS), the Isolated Generative Model (IGM), and theOutput Validation and Fallback System (OVFS). This design treats the AI as a decoupled service running in a secure perimeter, enhancing security.

3.1. Input Sanitization Service (ISS)

The ISS is the mandatory gateway for all player prompts. It mitigates prompt injection and other sophisticated attacks by using advanced, context-aware filtering, ensuring all inputs are safe before reaching the generative model.
Prompt Filtering and Transformation This service enforces prompt templates and multi-layered analysis to securely guide the model [2]. Raw player input is converted into a constrained structure (e.g., using JSON/XML tags) that rigorously separates user text from immutable system instructions, preventing command hijacking [1]. This structure rigorously defines the NPC’s context, emotional state, and conversational goals. Specialized classifiers also detect and block complex adversarial phrases or sudden sentiment shifts.
Auditing and Red-Teaming Integration The ISS undergoes continuous AI red-teaming and security audits, adhering to established risk management frameworks [7]. All adversarial prompts (both failed and successful) are logged and fed into an automated retraining loop. These prompts are categorized by attack vector to improve the adaptive response. This process allows the filters to rapidly adapt to new, emerging threats.

3.2. Isolated Generative Model (IGM)

The IGM is the core execution environment for the LLM, focusing on operational security. It operates as a microservice with a principle of least privilege, limiting its capacity to interact with or compromise core game components.
Containerized Sandboxing and Isolation The generative AI component is executed within a strictly controlled sandboxed environment [4]. This is typically implemented using containerization technology, such as Docker [5], to ensure strict, kernel-level separation from the host OS and game runtime and to apply restrictions on file system, network, and memory access.
Observability and Latency Management The IGM integrates mechanisms for resource throttling and per-user rate limits to guarantee Quality of Service (QoS) and prevent DoS attacks. Robust logging and monitoring mechanisms [6] capture real-time data (latency, token usage, hardware utilization) to rapidly detect anomalies or exploitation attempts.

3.3. Output Validation and Fallback System (OVFS)

The OVFS is the final defensive layer, validating all content before it is delivered to the player. It ensures strict compliance with safety, ethical, and narrative standards, acting as a safeguard against harmful outputs caused by model drift or inherent biases.
Multi-Stage Content Vetting The OVFS implements a multi-stage vetting process. The initial stage uses lightweight, real-time content classifiers and toxicity filters to block immediate violations like hate speech [3]. This is followed by a narrative compliance check to ensure the response is contextually appropriate (e.g., confirming a medieval NPC does not discuss modern topics). Finally, outputs with low confidence scores are flagged for scrutiny.
Explainability and Remediation The OVFS incorporates principles of Explainable AI (XAI) to provide auditable logs detailing why content was flagged. This is vital for diagnostics and fine-tuning. If an output is flagged, the system triggers remediation: either a “soft failure” (re-prompting the IGM with safer parameters) or a “hard failure” (immediately replacing the content with a pre-defined, safe fallback message, known as “guardrailing”).

4. Discussion

As generative Al continues to evolve, future work in this area must focus on making secure content generation more scalable, accessible, and adaptable across game genres and platforms. One of the main priorities is the development of lightweight moderation systems that can operate in real time without disrupting the gameplay experience [8]. These systems must balance performance with safety, especially in complex multiplayer or open-world environments where player interaction is highly unpredictable. The core architectural challenge involves optimizing the ISS and OVFS layers to function efficiently at the edge, utilizing smaller, specialized safety classifiers to achieve the ultra-low latency required for truly immersive, real-time dialogue and procedural generation.
A second critical goal is to explore new, automated methods for aligning generative models intrinsically with game-specific ethics and narrative constraints. This will move beyond simple template constraints, implementing continuous fine-tuning of the IGM using adversarial data collected directly from the retraining loop generated by ISS and OVFS failures. This process ensures safety and narrative coherence are learned intrinsically by the model itself, which is a more robust solution than simple post-processing filters, and not merely enforced by external filters. Furthermore, we aim to significantly improve transparency by fully integrating the Explainable AI (XAI) mechanisms introduced in the OVFS layer, helping developers and auditors understand precisely why specific outputs are filtered or modified.
In the long term, we envision the creation of a standard, open framework for secure generative Al in games. This initiative requires the development of an open-source security abstraction layer—a standardized API—that handles the security microservices (ISS, IGM, OVFS) across different proprietary engines like Unity or Unreal Engine. This standardization will democratize secure integration, which is crucial for smaller developers and indie studios, and accelerate the adoption of unified, community-driven best practices. Collaboration between Al researchers, game developers, and cybersecurity experts will be absolutely essential to building a responsible, trustworthy, and scalable future for generative technologies in interactive entertainment.

Author Contributions

Conceptualization, P.N.-M. and R.T.-G.; methodology, P.N.-M. and R.T.-G.; software, P.G.-R. and A.C.-C.; validation, R.T.-G., P.G.-R. and A.C.-C.; formal analysis, A.S.-L.; investigation, R.T.-G.; resources, A.S.-L.; writing—original draft preparation, P.N.-M.; writing—review and editing, P.N.-M., R.T.-G., A.S.-L., P.G.-R. and A.C.-C.; supervision, P.N.-M.; project administration, P.N.-M.; funding acquisition, A.C.-C. All authors have read and agreed to the published version of the manuscript.

Funding

This initiative is carried out within the framework of the funds of the Recovery, Transformation and Resilience Plan, financed by the European Union (Next Generation) - National Cybersecurity Institute (INCIBE) in the project C107/23 “Artificial Intelligence Applied to Cybersecurity in Critical Water and Sanitation Infrastructures”.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. IBM. What Is a Prompt Injection Attack? Available online: https://www.ibm.com/think/topics/prompt-injection (accessed on 20 October 2025).
  2. NVIDIA Developer. Securing LLM Systems Against Prompt Injection. Available online: https://developer.nvidia.com/blog/securing-llm-systems-against-prompt-injection/ (accessed on 20 October 2025).
  3. Utopia Analytics. AI Content Moderation for Online Gaming and Chats. Available online: https://www.utopiaanalytics.com/ai-content-moderation-for-online-gaming-and-chat-services (accessed on 20 October 2025).
  4. Fortinet. What Is Sandboxing? Available online: https://www.fortinet.com/resources/cyberglossary/what-is-sandboxing (accessed on 20 October 2025).
  5. Docker, Inc. Securing LLM Workloads with Container Isolation. Available online: https://www.docker.com/blog/securing-llm-workloads-with-container-isolation/ (accessed on 20 October 2025).
  6. Harvard University Information Technology. AI Sandbox. Available online: https://www.huit.harvard.edu/ai-sandbox (accessed on 20 October 2025).
  7. National Institute of Standards and Technology (NIST). AI Risk Management Framework. Available online: https://www.nist.gov/itl/ai-risk-management-framework (accessed on 20 October 2025).
  8. Lasso. AI-Powered Content Moderation for Gaming Platforms. Available online: https://www.lassomoderation.com/industries/content-moderation-for-gaming/ (accessed on 20 October 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Natera-Muñoz, P.; Torres-Gallego, R.; Silva-Luengo, A.; García-Rodríguez, P.; Carrón-Campón, A. Ensuring the Secure Integration of Generative AI in Video Game Developments. Eng. Proc. 2026, 123, 31. https://doi.org/10.3390/engproc2026123031

AMA Style

Natera-Muñoz P, Torres-Gallego R, Silva-Luengo A, García-Rodríguez P, Carrón-Campón A. Ensuring the Secure Integration of Generative AI in Video Game Developments. Engineering Proceedings. 2026; 123(1):31. https://doi.org/10.3390/engproc2026123031

Chicago/Turabian Style

Natera-Muñoz, Pablo, Ruth Torres-Gallego, Antonio Silva-Luengo, Pablo García-Rodríguez, and Alberto Carrón-Campón. 2026. "Ensuring the Secure Integration of Generative AI in Video Game Developments" Engineering Proceedings 123, no. 1: 31. https://doi.org/10.3390/engproc2026123031

APA Style

Natera-Muñoz, P., Torres-Gallego, R., Silva-Luengo, A., García-Rodríguez, P., & Carrón-Campón, A. (2026). Ensuring the Secure Integration of Generative AI in Video Game Developments. Engineering Proceedings, 123(1), 31. https://doi.org/10.3390/engproc2026123031

Article Metrics

Back to TopTop