1. Introduction
Generative artificial intelligence is rapidly reshaping the video game industry, enabling dynamic dialogues, adaptive quests, and procedural environments. This integration, however, introduces new cybersecurity risks. Unlike deterministic systems, the stochastic nature of generative models can be exploited to bypass filters or gain unfair advantages. Furthermore, training data may contain sensitive information, leading to unintended leakage. Prompt injection attacks are particularly concerning, allowing users to subvert model behavior and create reputational risks for developers.
This paper addresses these concerns by proposing a robust framework for secure AI implementation. This architecture relies on three layers—the Input Sanitization Service (ISS), the Isolated Generative Model (IGM), and the Output Validation and Fallback System (OVFS)—to enforce integrity at every stage. Our goal is to provide technical guidelines that allow developers to harness generative models while maintaining a secure environment. This work thus provides a viable standard for integrating large language models in interactive entertainment.developers integrating large language models in interactive entertainment.
2. Materials and Methods
The proposed methodology consists of a layered defense built around three core principles: input control, output validation, and system isolation. The framework establishes a secure perimeter by treating player input as inherently untrustworthy and model output as requiring explicit verification. This approach implements adaptive security mechanisms designed to maintain both game integrity and the quality of the player experience.
The first element is input control. Since generative models are highly sensitive to prompts, it is essential to sanitize user inputs before they reach the AI system. This involves advanced context-aware checks to detect manipulation and reduce the likelihood of prompt injection attacks [
1]. This process employs techniques such as prompt template enforcement—using structured formats like JSON or XML tags to define context—and multi-layered analysis to securely guide the model’s generation [
2].
Equally important is the validation of generated content, necessary as even sanitized inputs can lead to harmful outputs due to model drift or hallucinations. All model outputs are evaluated through a combination of lightweight content classifiers, toxicity filters [
3], and fallback systems that trigger re-generation or replace unsafe results. These filters ensure that generated content aligns with ethical standards and prevents biased content from reaching the player.
Finally, the methodology enforces the isolation of the generative AI component from the game engine. By deploying the model within a sandboxed environment [
4], it is prevented from accessing sensitive resources or triggering unintended side effects. This is implemented using containerization technology, such as Docker [
5], ensuring kernel-level separation of the LLM process from the game runtime. Robust logging and monitoring mechanisms [
6] allow teams to detect misuse and adapt the system to evolving threats, establishing a solid foundation for responsible AI deployment.
3. Results
The proposed architecture is a three-layered system: the Input Sanitization Service (ISS), the Isolated Generative Model (IGM), and theOutput Validation and Fallback System (OVFS). This design treats the AI as a decoupled service running in a secure perimeter, enhancing security.
3.1. Input Sanitization Service (ISS)
The ISS is the mandatory gateway for all player prompts. It mitigates prompt injection and other sophisticated attacks by using advanced, context-aware filtering, ensuring all inputs are safe before reaching the generative model.
Prompt Filtering and Transformation This service enforces prompt templates and multi-layered analysis to securely guide the model [
2]. Raw player input is converted into a constrained structure (e.g., using JSON/XML tags) that rigorously separates user text from immutable system instructions, preventing command hijacking [
1]. This structure rigorously defines the NPC’s context, emotional state, and conversational goals. Specialized classifiers also detect and block complex adversarial phrases or sudden sentiment shifts.
Auditing and Red-Teaming Integration The ISS undergoes continuous AI red-teaming and security audits, adhering to established risk management frameworks [
7]. All adversarial prompts (both failed and successful) are logged and fed into an automated retraining loop. These prompts are categorized by attack vector to improve the adaptive response. This process allows the filters to rapidly adapt to new, emerging threats.
3.2. Isolated Generative Model (IGM)
The IGM is the core execution environment for the LLM, focusing on operational security. It operates as a microservice with a principle of least privilege, limiting its capacity to interact with or compromise core game components.
Containerized Sandboxing and Isolation The generative AI component is executed within a strictly controlled sandboxed environment [
4]. This is typically implemented using containerization technology, such as Docker [
5], to ensure strict, kernel-level separation from the host OS and game runtime and to apply restrictions on file system, network, and memory access.
Observability and Latency Management The IGM integrates mechanisms for resource throttling and per-user rate limits to guarantee Quality of Service (QoS) and prevent DoS attacks. Robust logging and monitoring mechanisms [
6] capture real-time data (latency, token usage, hardware utilization) to rapidly detect anomalies or exploitation attempts.
3.3. Output Validation and Fallback System (OVFS)
The OVFS is the final defensive layer, validating all content before it is delivered to the player. It ensures strict compliance with safety, ethical, and narrative standards, acting as a safeguard against harmful outputs caused by model drift or inherent biases.
Multi-Stage Content Vetting The OVFS implements a multi-stage vetting process. The initial stage uses lightweight, real-time content classifiers and toxicity filters to block immediate violations like hate speech [
3]. This is followed by a narrative compliance check to ensure the response is contextually appropriate (e.g., confirming a medieval NPC does not discuss modern topics). Finally, outputs with low confidence scores are flagged for scrutiny.
Explainability and Remediation The OVFS incorporates principles of Explainable AI (XAI) to provide auditable logs detailing why content was flagged. This is vital for diagnostics and fine-tuning. If an output is flagged, the system triggers remediation: either a “soft failure” (re-prompting the IGM with safer parameters) or a “hard failure” (immediately replacing the content with a pre-defined, safe fallback message, known as “guardrailing”).
4. Discussion
As generative Al continues to evolve, future work in this area must focus on making secure content generation more scalable, accessible, and adaptable across game genres and platforms. One of the main priorities is the development of lightweight moderation systems that can operate in real time without disrupting the gameplay experience [
8]. These systems must balance performance with safety, especially in complex multiplayer or open-world environments where player interaction is highly unpredictable. The core architectural challenge involves optimizing the ISS and OVFS layers to function efficiently at the edge, utilizing smaller, specialized safety classifiers to achieve the ultra-low latency required for truly immersive, real-time dialogue and procedural generation.
A second critical goal is to explore new, automated methods for aligning generative models intrinsically with game-specific ethics and narrative constraints. This will move beyond simple template constraints, implementing continuous fine-tuning of the IGM using adversarial data collected directly from the retraining loop generated by ISS and OVFS failures. This process ensures safety and narrative coherence are learned intrinsically by the model itself, which is a more robust solution than simple post-processing filters, and not merely enforced by external filters. Furthermore, we aim to significantly improve transparency by fully integrating the Explainable AI (XAI) mechanisms introduced in the OVFS layer, helping developers and auditors understand precisely why specific outputs are filtered or modified.
In the long term, we envision the creation of a standard, open framework for secure generative Al in games. This initiative requires the development of an open-source security abstraction layer—a standardized API—that handles the security microservices (ISS, IGM, OVFS) across different proprietary engines like Unity or Unreal Engine. This standardization will democratize secure integration, which is crucial for smaller developers and indie studios, and accelerate the adoption of unified, community-driven best practices. Collaboration between Al researchers, game developers, and cybersecurity experts will be absolutely essential to building a responsible, trustworthy, and scalable future for generative technologies in interactive entertainment.
Author Contributions
Conceptualization, P.N.-M. and R.T.-G.; methodology, P.N.-M. and R.T.-G.; software, P.G.-R. and A.C.-C.; validation, R.T.-G., P.G.-R. and A.C.-C.; formal analysis, A.S.-L.; investigation, R.T.-G.; resources, A.S.-L.; writing—original draft preparation, P.N.-M.; writing—review and editing, P.N.-M., R.T.-G., A.S.-L., P.G.-R. and A.C.-C.; supervision, P.N.-M.; project administration, P.N.-M.; funding acquisition, A.C.-C. All authors have read and agreed to the published version of the manuscript.
Funding
This initiative is carried out within the framework of the funds of the Recovery, Transformation and Resilience Plan, financed by the European Union (Next Generation) - National Cybersecurity Institute (INCIBE) in the project C107/23 “Artificial Intelligence Applied to Cybersecurity in Critical Water and Sanitation Infrastructures”.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data that support the findings of this study are available from the corresponding author upon reasonable request.
Conflicts of Interest
The authors declare no conflicts of interest.
| Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content. |