Article

Enhancing Security and Applicability of Local LLM-Based Document Retrieval Systems in Smart Grid Isolated Environments

1 Department of Hacking & Security, Far East University, Chungcheongbuk-do 27601, Republic of Korea
2 Department of Green Energy, Far East University, Chungcheongbuk-do 27601, Republic of Korea
3 Department of Computer Engineering, Sejong University, Seoul 05006, Republic of Korea
4 Convergence Engineering for Intelligent Drone, Sejong University, Seoul 05006, Republic of Korea
* Authors to whom correspondence should be addressed.
Electronics 2025, 14(17), 3407; https://doi.org/10.3390/electronics14173407
Submission received: 29 July 2025 / Revised: 19 August 2025 / Accepted: 21 August 2025 / Published: 27 August 2025

Abstract

The deployment of large language models (LLMs) in closed-network industrial environments remains constrained by privacy and connectivity limitations. This study presents a retrieval-augmented question-answering system designed to operate entirely offline, integrating local vector embeddings, ontology-based semantic enrichment, and quantized LLMs, while ensuring compliance with industrial security standards like IEC 62351. The system was implemented using OpenChat-3.5 models with two quantization variants (Q5 and Q8), and evaluated through comparative experiments focused on response accuracy, generation speed, and secure document handling. Empirical results show that both quantized models delivered comparable answer quality, with the Q5 variant achieving approximately 1.5 times faster token generation under limited hardware. The ontology-enhanced retriever further improved semantic relevance by incorporating structured domain knowledge into the retrieval stage. Throughout the experiments, the system demonstrated effective performance across speed, accuracy, and information containment—core requirements for AI deployment in security-sensitive domains. These findings underscore the practical viability of offline LLM systems for privacy-compliant document search, while also highlighting architectural considerations essential for extending their utility to environments such as smart grids or defense-critical infrastructures.

1. Introduction

Smart grids represent the next-generation power infrastructure that integrates Information and Communication Technologies (ICT) across the entire power lifecycle—from generation and transmission to distribution and consumption—to maximize energy efficiency and enable real-time responsiveness. These systems are composed of critical infrastructure elements such as power plants, substations, and transmission and distribution facilities, and are characterized by the interconnection of numerous operational devices, sensors, and control systems via dedicated networks [1]. However, such complexity simultaneously expands the surface of potential cyber threats. In particular, smart grids operating in closed-network (air-gapped) environments face unique challenges where external connectivity is restricted, necessitating a delicate balance between robust security and accessible information retrieval [2].
Terminology: In this paper, we use the term smart grid in the operational sense of a sensing- and communication-enabled grid that delivers real-time telemetry to a central control center. Edge assets do not perform autonomous diagnosis or closed-loop control; all decisions remain with human operators. Accordingly, the proposed system is a decision-support tool and does not issue control commands.
Recently, large language models (LLMs) have demonstrated remarkable capabilities in natural language understanding and document-based question answering, leading to their adoption across various industrial sectors. Nevertheless, most LLMs are deployed via cloud-based external APIs, rendering them impractical for use in closed-network infrastructures like smart grids due to concerns over data leakage and security compliance. Consequently, there is an emerging demand for alternative AI systems that can securely process internal data while mitigating the risk of information exposure [3].
Smart grid operators frequently rely on extensive internal documentation—such as incident response manuals, security reports, and policy records—for informed decision-making. However, existing keyword-based retrieval systems are limited in their ability to perform semantic search, resulting in inefficiencies that require significant time and human resources to locate relevant materials. This highlights the need for a document embedding-based LLM system that operates locally, particularly one that adopts a Retrieval-Augmented Generation (RAG) architecture to provide accurate and rapid document search and response capabilities [4,5].
In response to these needs, this study aims to design a local LLM-based document question-answering system that can be securely deployed within air-gapped industrial environments such as smart grids. The proposed system emphasizes security, real-time responsiveness, and resilience, thereby establishing a viable technical foundation for field application. Furthermore, by evaluating alignment with international standards such as IEC 62351 [6], this research seeks to demonstrate the potential of the system as a practical AI solution for enhancing industrial cybersecurity.

1.1. Challenges in Smart Grid Security and Document Management

The smart grid is an intelligent system that interconnects the entire power infrastructure—from power plants, substations, and transmission and distribution networks to end-user terminals—through advanced automation and real-time communication technologies. Compared to traditional power grids, smart grids involve significantly more control points and communication nodes, thereby increasing the surface area vulnerable to cyberattacks. In particular, the convergence of Operational Technology (OT) and Information Technology (IT) within smart grid architectures elevates the security risk, as compromises in control systems can directly result in physical consequences.
Due to this structural complexity, smart grids are typically operated within air-gapped environments that are physically and logically isolated from external networks. However, being disconnected from the internet does not guarantee immunity from cyber threats. Insider threats, the introduction of malware via portable media such as USB devices, and infections through temporarily connected laptops or diagnostic equipment during maintenance are all viable attack vectors. In practice, incidents targeting smart grid and SCADA systems are on the rise, highlighting the urgent need to enhance detection and response capabilities in security systems [7].
Simultaneously, incident response and operational decision-making in smart grids heavily rely on a large corpus of internal documentation [8]. This includes system architecture diagrams, incident response procedures, equipment manuals, regulatory compliance documents, and internal security guidelines. Security Operation Center (SOC) analysts and field operators are expected to make accurate and timely decisions based on these resources. However, most of these documents are unstructured, vary in format, and exist in large volumes, making it difficult to retrieve meaningful information promptly using conventional keyword-based search methods [8,9].
Moreover, the energy sector is subject to a variety of regulatory requirements and international standards—such as IEC 62351 [6] and NISTIR 7628 [2]—which often take the form of lengthy technical documents spanning hundreds of pages. Expecting security personnel or technical staff to manually search and interpret these documents in real time is both inefficient and error-prone.
Therefore, there is a pressing need for an intelligent system that can preserve the security constraints of air-gapped environments while enabling rapid semantic search of internal documents and providing natural language responses. Such a system must go beyond simple information retrieval to include contextual understanding based on domain knowledge, retrieval of precedent cases in similar threat scenarios, and assistance in interpreting standard compliance documents. To address these needs, we propose the development of a local LLM-based document question-answering system capable of operating independently within closed networks. This system aims to simultaneously enhance the security and operational efficiency of smart grids [10].

1.2. Research Contributions and Paper Structure

The study addresses the urgent need for a document-centric question-answering system that operates safely inside air-gapped critical infrastructures such as smart grids, eliminating any dependency on external networks while meeting stringent requirements for security, real-time responsiveness, and industrial applicability. Prior approaches, constrained by keyword retrieval and simple pattern matching, fall short of delivering semantically accurate results or supporting the domain-aware question answering demanded by field operators [9]. Moreover, directly deploying cloud-resident LLM services in smart-grid environments raises intolerable risks of data leakage, connectivity barriers, and non-compliance with international security regulations [11,12].
To overcome these limitations, the proposed work presents a fully offline Retrieval-Augmented Generation architecture in which e5-based embeddings, vector search, and LLM inference are executed locally, ensuring suitability for closed-network infrastructures. The design extends beyond mere functionality by incorporating defenses against prompt-injection and embedding inversion attacks, GPU-accelerated low-latency inference, and robust backup-and-re-indexing routines that sustain service continuity under failure conditions [13,14]. Further, by integrating a smart-grid-specific ontology and knowledge graph, the system elevates contextual accuracy and enables domain reasoning unattainable with conventional search engines. Alignment with IEC 62351 [6], NISTIR 7628 [2], and related standards demonstrates the solution’s interoperability and practical readiness for industrial rollout.
The remainder of the paper is organized as follows. Section 2 surveys relevant technologies and prior research. Section 3 details the overall architecture and the design of each subsystem. Section 4 explains security- and reliability-oriented core functions, while Section 5 describes the ontology-based semantic enhancement strategy. Section 6 analyzes conformity with international standards and discusses industrial deployment scenarios. Section 7 validates system effectiveness through experiments and scenario-driven evaluations. Section 8 and Section 9 conclude by summarizing contributions, acknowledging limitations, and outlining directions for future work.

2. Related Research

In critical infrastructures such as smart grids, where both high security and real-time responsiveness are paramount, the direct application of cloud-based large language models (LLMs) remains largely infeasible. Concerns over potential data leakage during external API calls, technical restrictions imposed by air-gapped environments, and the need to comply with international security standards all demand a fundamental rethinking of how such models are deployed—moving beyond performance metrics to reconsider operational paradigms entirely. Against this backdrop, growing attention has been directed toward local, self-contained LLM architectures. These architectures offer the ability to support natural language querying and document interpretation without exposing sensitive data to external networks, thereby aligning with the operational constraints of high-security industrial settings [15].
However, localizing an LLM is not simply a matter of installing the model onto a physical device. High-performing pre-trained models—such as those developed by Hugging Face or Meta—must be optimized for offline inference, often through techniques such as model quantization and parameter reduction tailored to GPU environments. Models with approximately seven billion parameters, including LLaMA2, OpenChat, and Mistral, have proven capable of delivering practical inference on a single GPU and are increasingly being deployed in conjunction with document-based Retrieval-Augmented Generation (RAG) systems. In these implementations, vector embedding models such as e5-large or BGE are used to index pre-processed documents, enabling efficient retrieval and semantic matching. The RAG architecture, in particular, plays a critical role in reducing hallucination and improving the factual reliability of generated responses [10,16].
Nevertheless, applying this architecture to smart grid environments introduces a distinct set of challenges. Unlike general IT systems, smart grids operate as cyber-physical systems (CPS) in which operational technology (OT) is directly linked to physical energy flow and control mechanisms. As such, document systems in this context are not merely used for information retrieval but are central to decision-making in areas such as incident response, control command interpretation, and compliance evaluation. To fulfill such roles, the system must possess domain-specific reasoning capabilities that exceed the limits of simple keyword or embedding-based retrieval. This requirement calls for an enhanced architecture that incorporates domain-aware ontologies and knowledge graphs [14,17].
A knowledge graph enables the model to understand the relationships between entities within the smart grid domain—for example, between generators and protective relays, or remote terminal units (RTUs) and communication protocols—beyond the level of individual word semantics. This structured representation allows the system to generate logically connected answers that go beyond sentence extraction. For instance, in response to a question such as “What is the response protocol for unauthorized access in a substation?”, the model would be able to follow a reasoning chain such as event type → response policy → applicable equipment manual, rather than simply quoting a document. This capability significantly differentiates the proposed system from conventional LLM implementations, and is particularly advantageous in highly standardized, interdependent domains like the power sector.
At the same time, such an architecture must be reinforced from a security perspective. During vector database access or LLM-driven document referencing in RAG systems, several risks may arise—including prompt injection attacks, embedding leakage leading to reverse inference, and unauthorized document exposure. To address these concerns, recent studies have proposed countermeasures such as differential privacy-enhanced embeddings, harmful prompt detection, query logging, and user authentication mechanisms [15]. Building on this body of research, the present study establishes the technical foundations for a domain-specialized local LLM document-response system that ensures both reliability and security in air-gapped smart grid environments.

Limitations of Existing Research

Although recent studies have increasingly explored natural language processing (NLP)-based question-answering systems applicable to high-security infrastructures such as smart grids, several technical and operational limitations persist when considering real-world deployment scenarios.
A primary limitation lies in the dependency on cloud-based LLM systems. Most existing question-answering frameworks are designed around external APIs provided by platforms such as OpenAI, Google, or Anthropic. As a result, they are fundamentally incompatible with air-gapped infrastructure environments or raise considerable security concerns when deployed in such contexts. While alternative local systems do exist, they are often limited to demonstration-level implementations and are generally not designed with the security requirements of critical infrastructure in mind, focusing instead on generic text processing tasks.
A second challenge is the lack of domain-specific knowledge integration. Fields such as power systems, industrial control, and smart grids operate on formalized equipment specifications, communication protocols, and security policies. General-purpose LLMs, in the absence of prior domain adaptation, struggle to process such structured information effectively. Although some studies have succeeded in handling predefined FAQ documents or simple policy responses, they fall short in addressing more complex reasoning tasks that require semantic linkage across heterogeneous documents or the incorporation of ontologies for context-aware inference.
A third and increasingly important limitation is the insufficient consideration of system-level security in question-answering architectures. Recent literature frequently raises concerns about prompt injection, embedding inversion, and the extraction of sensitive information via user queries. Yet few studies propose or implement structural safeguards—such as authentication-based access control, noise-injected vectorization, or query logging—that are essential for deploying such systems as trustworthy AI components within industrial environments. This absence of secure architecture design remains a critical barrier to practical adoption.
Finally, existing research rarely addresses alignment with international standards or operational compatibility with industry practices. Particularly in the power sector, frameworks such as IEC 62351 [6] and NISTIR 7628 [2] provide detailed security guidelines tailored for smart grid environments. Nonetheless, there remains a lack of architectural consideration for integration with incident response documents, operating procedures, or compliance policies derived from these standards. Consequently, many existing approaches are limited to proof-of-concept levels and remain unfit for deployment in security-sensitive, field-oriented environments.
To overcome these limitations, the present study proposes a locally operable LLM-based architecture explicitly designed for integration into smart grid infrastructures. The system is developed with a comprehensive focus on security, domain relevance, real-time responsiveness, and policy-level interoperability, thereby laying the groundwork for practical implementation in high-assurance environments.

3. System Architecture

3.1. Design Objectives and Security Constraints

The primary objective of this system is to implement a document-based question-answering framework powered by a local large language model (LLM), capable of operating entirely within a network-isolated environment. The goal is to enable the practical use of AI capabilities in industrial field settings while preventing the leakage of sensitive technical data or personal information. To achieve this, the system incorporates a retrieval-augmented design that combines local LLM inference with document indexing and embedding techniques based on LlamaIndex.
The system is specifically engineered to function in a fully offline setting. Internet connectivity and external API access are entirely disabled, with dummy API keys configured to ensure that no outbound connections are attempted. All data processing takes place strictly within the local machine, effectively eliminating any risk of information leakage to external servers. However, internal threats such as unauthorized access to the vector database or reverse engineering of document embeddings still pose potential risks. Accordingly, the system is designed to satisfy strict security requirements, including local data retention, prevention of unauthorized access, and protection against embedding misuse.

3.2. Overall Architecture of the Proposed Document QA System

The architecture of the proposed local document question-answering system is composed of four core modules: document ingestion and preprocessing, vector embedding generation, query processing, and response logging. When a user uploads local files in PDF or TXT format, the system extracts textual content using tools such as PyMuPDF (v1.24.10, Artifex Software Inc., Novato, CA, USA). The extracted text is then segmented into semantically meaningful chunks, and embeddings are generated for each segment. These embeddings are stored as a vector index in a local database, which is used during query time to identify the most relevant document fragments based on similarity search.
The retrieved fragments are subsequently reranked using a Sentence Transformer to refine contextual relevance, after which the final context is selected for answer generation. The local LLM then generates a response based on the selected context. All operations are performed entirely within the local environment without any external data transmission. Furthermore, both queries and generated responses are logged internally, supporting auditability and enabling secure monitoring and oversight of system activity. Figure 1 shows the overall system architecture.

3.3. Local LLM Configuration and Model Selection Criteria

In configuring the local document question-answering system, the OpenChat-3.5 model, an openly released chat model fine-tuned from the Mistral-7B base, was adopted as the base LLM. Two quantized versions of the model, Q5 and Q8, were selected to support different operational priorities. The Q5 model was employed to optimize for speed and lightweight execution, while the Q8 version prioritized response quality. Both models were deployed in .gguf format and executed without GPU acceleration using the llama.cpp backend through Python (version 3.12.6) bindings via LlamaCPP, enabling entirely CPU-based inference suitable for air-gapped environments.
Table 1 summarizes the key configuration differences between the two model versions. The Q5 configuration limits the maximum generation tokens to 256 and uses a shorter context window of 4096 tokens, enabling rapid inference even in constrained environments. In contrast, the Q8 model supports up to 1024 generation tokens and an extended context window of 8192 tokens, allowing for richer and more complete answers at the cost of higher resource consumption. Both configurations incorporate similarity-based document retrieval and reranking stages, although the Q8 variant engages with a greater number of retrieved and reranked documents to enhance contextual precision.
For both models, the generation temperature was set to 0.7 to balance determinism and variability in responses. To improve the contextual relevance of retrieved documents, a SentenceTransformer-based reranking module was integrated prior to LLM inference. Furthermore, the response generation process was configured to use LlamaIndex’s response_mode = “refine”, which allows the model to incorporate multiple retrieved contexts in a sequential manner. This setup ensures that an initial answer is generated from the first retrieved context, and is subsequently refined by incorporating additional relevant contexts step-by-step.
To balance information density and computational efficiency, the prompt structure was designed to encourage concise answers, ideally limited to within 100 characters. This approach not only supports faster response generation but also aids in summarizing essential information for field operability.
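The configuration contrasts described above (and summarized in Table 1) can be sketched as plain configuration data. This is an illustrative sketch only: the dictionary keys and the `select_profile` helper are assumptions for exposition, not the system's actual configuration schema, though the numeric values follow the settings reported above.

```python
# Illustrative model profiles for the two quantization variants described
# above. Key names are assumptions; values follow the reported settings
# (Q5: 256 generation tokens / 4096 context; Q8: 1024 / 8192; temp 0.7).
MODEL_PROFILES = {
    "q5": {  # speed-optimized variant
        "model_path": "openchat-3.5-0106.Q5_K_M.gguf",
        "max_new_tokens": 256,
        "context_window": 4096,
        "temperature": 0.7,
    },
    "q8": {  # quality-optimized variant
        "model_path": "openchat-3.5-1210.Q8_0.gguf",
        "max_new_tokens": 1024,
        "context_window": 8192,
        "temperature": 0.7,
    },
}

def select_profile(priority: str) -> dict:
    """Pick a profile: 'speed' selects Q5, anything else selects Q8."""
    return MODEL_PROFILES["q5" if priority == "speed" else "q8"]
```

In a deployment, such a profile would be passed to the llama.cpp bindings when loading the model, so the speed/quality trade-off becomes a single switch for operators.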

3.4. Document Processing and Embedding Strategy

To improve the semantic accuracy of document retrieval, this study adopts the Multilingual-e5-large embedding model provided by Hugging Face. While various alternative embedding models—such as RoBERTa, BERT, and BGE—were considered, the E5 model was ultimately selected due to its strong performance in semantic similarity tasks between questions and documents, its multilingual support, and its compatibility with CPU-based local environments. Competing models such as BGE were evaluated during preliminary testing but were not employed in the final system configuration.
The document preprocessing pipeline is designed to handle input files in both PDF and TXT formats, as shown in Figure 2. For PDF documents, the PyMuPDF (fitz) library is used to extract text from each page, after which all content is merged into a single unified string and converted into a LlamaIndex-compatible document object. TXT files are read directly using UTF-8 encoding and processed as raw text. Extracted text from each document is then segmented into semantically coherent chunks, rather than by sentence boundaries, using LlamaIndex’s Semantic Text Splitter module. This approach ensures that each chunk preserves meaningful context, which is essential for accurate retrieval.
Each generated chunk is subsequently transformed into a dense vector representation using the E5 embedding model. These vectors are then stored in a local vector store for retrieval and indexing. The overall document embedding procedure, including text extraction, semantic chunking, and vectorization, is summarized in the pseudocode shown in Figure 2. The embedded document vectors are managed through a Vector Store Index and are persistently stored on disk in a JSON-based format. This design enables the system to reload previously indexed data upon restart, eliminating the need to reprocess and re-embed all documents from scratch. Each vector is stored together with metadata indicating the source document name, which allows the system to identify the origin of each retrieved chunk and enhances the reliability and interpretability of the generated responses. While the current implementation supports only PDF and TXT formats, the architecture is extensible to accommodate a wider range of document types, including DOCX, HTML, and image-based inputs through OCR. Future development aims to expand the preprocessing pipeline accordingly to support these additional formats in a unified and scalable manner.
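The ingestion flow described above (extract text, chunk, embed, store with source metadata) can be sketched as follows. The fixed-size chunker and the hash-style embedder are deliberately simplified stand-ins for the Semantic Text Splitter and the E5 model; only the pipeline shape and the per-chunk metadata mirror the design.

```python
# Simplified ingestion sketch: chunk -> embed -> store with provenance.
# The chunker and embedder are toy stand-ins, not the production modules.
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str
    source: str          # metadata: originating document name
    vector: list[float]  # dense embedding used for similarity search

def split_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Fixed-size chunking with overlap (stand-in for semantic splitting)."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap
    return chunks

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic toy embedder (stand-in for Multilingual-e5-large)."""
    vec = [0.0] * dim
    for i, ch in enumerate(text):
        vec[i % dim] += ord(ch)
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]  # unit-normalize for cosine search

def ingest(doc_name: str, text: str) -> list[Chunk]:
    """Build the per-document chunk index with source metadata attached."""
    return [Chunk(t, doc_name, toy_embed(t)) for t in split_text(text)]
```

Storing the source name alongside each vector is what later lets the system report which document a retrieved chunk came from.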

3.5. Configuring a RAG-Based Retrieval and Reranking Pipeline

User queries are processed using a Retrieval-Augmented Generation (RAG) framework, which consists of three core stages: vector-based retrieval, reranking, and response generation. The overall process is implemented as shown in Figure 3.
In the vector-based retrieval stage, the input query $q$ is embedded into a dense vector representation $V_q \in \mathbb{R}^d$ using the E5 model [5]:
$V_q = f_{\mathrm{embed}}(q)$
The similarity between the query vector and each document chunk vector $V_i$ in the vector store is computed using cosine similarity [18]:
$\mathrm{sim}(V_q, V_i) = \dfrac{V_q \cdot V_i}{\lVert V_q \rVert \, \lVert V_i \rVert}$
The top-$k$ candidates with the highest similarity scores are selected [5]:
$C_{\mathrm{top}\text{-}k} = \mathrm{Top}_k\left(\{\mathrm{sim}(V_q, V_i)\}_{i=1}^{N}\right)$
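The retrieval stage formalized above can be sketched in a few lines of Python. Vectors are plain lists here, and `top_k` returns the indices of the $k$ chunks most similar to the query; this is a minimal illustration, not the production vector store.

```python
# Minimal sketch of the retrieval stage: cosine similarity between the
# query vector and each stored chunk vector, followed by top-k selection.
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity sim(u, v) = (u . v) / (||u|| * ||v||)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def top_k(query_vec: list[float], chunk_vecs: list[list[float]], k: int) -> list[int]:
    """Return indices of the k chunks most similar to the query vector."""
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(chunk_vecs)]
    scores.sort(reverse=True)  # highest similarity first
    return [i for _, i in scores[:k]]
```

A real deployment would delegate this scan to the Vector Store Index, but the scoring rule is the same.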
In the semantic reranking phase, the candidate chunks $c_1, c_2, \dots, c_k \in C_{\mathrm{top}\text{-}k}$ are passed through a cross-encoder $g_{\mathrm{rerank}}$, which jointly considers both the query and the candidate context to compute semantic relevance scores [19]:
$\mathrm{score}(q, c_i) = g_{\mathrm{rerank}}(q, c_i)$
The top $n$ contexts $\{c_{(1)}, \dots, c_{(n)}\}$ are then selected after reranking, where $n < k$.
For response generation, the selected contexts are assembled into a prompt $P$, which is input to the local LLM. The LLM then generates a refined answer $A$ using LlamaIndex's multi-step refinement mode. Let the initial answer be $A_1 = \mathrm{LLM}(q, c_{(1)})$, and let each refinement step be defined recursively [20]:
$A_{i+1} = \mathrm{LLM\_refine}(A_i, c_{(i+1)}), \quad \text{for } i = 1, \dots, n-1$
The final output is $A_n$, which integrates all selected context chunks progressively.
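The rerank-then-refine control flow above can be sketched as follows. The cross-encoder and the LLM are replaced by caller-supplied stand-in functions, so only the control flow (score, select top-$n$, then refine iteratively) mirrors the equations; none of this is the production model code.

```python
# Sketch of the rerank + iterative-refine flow. score_fn stands in for the
# cross-encoder g_rerank; llm_fn stands in for the local LLM.
def rerank(query: str, candidates: list[str], n: int, score_fn) -> list[str]:
    """Select the n candidates with the highest cross-encoder scores."""
    ranked = sorted(candidates, key=lambda c: score_fn(query, c), reverse=True)
    return ranked[:n]

def refine_answer(query: str, contexts: list[str], llm_fn) -> str:
    """A_1 = LLM(q, c_(1)); then A_{i+1} = LLM_refine(A_i, c_(i+1))."""
    answer = llm_fn(query, contexts[0], prev=None)  # initial answer
    for ctx in contexts[1:]:
        answer = llm_fn(query, ctx, prev=answer)    # refine with next context
    return answer
```

In LlamaIndex this loop corresponds to `response_mode="refine"`; the sketch just makes the sequential conditioning explicit.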
Finally, the entire interaction is securely logged, including the query $q$, the selected contexts $\{c_{(i)}\}$, the generated answer $A_n$, and metadata such as timestamps and retrieval statistics. This log is stored in an access-controlled environment for auditability and system performance evaluation.
Implementation note: Because each retrieved chunk’s original plaintext is persisted together with its embedding vector and provenance metadata (document ID, page span, timestamp) in the local index, the system directly inserts the selected plaintext chunks into the LLM prompt during answer generation. No inverse transformation or decoding from vectors is required—embeddings are used only for retrieval, while generation is always conditioned on the exact source text.
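The logging step can be illustrated as an append-only JSON-lines writer. The field names below are assumptions chosen for illustration, not the system's actual log schema; what matters is that each record captures the query, the provenance of the retrieved chunks, the answer, and a timestamp.

```python
# Illustrative audit-log writer: one JSON object per interaction, appended
# to a local file inside the secure boundary. Field names are examples.
import json
import time

def log_interaction(log_path: str, query: str, context_ids: list[str],
                    answer: str) -> dict:
    """Append one audit record and return it for inspection."""
    entry = {
        "ts": time.time(),        # timestamp for auditing
        "query": query,
        "contexts": context_ids,  # provenance of the retrieved chunks
        "answer": answer,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(entry, ensure_ascii=False) + "\n")
    return entry
```

An append-only, line-delimited format keeps records individually parseable and is easy to place under restrictive file permissions.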

3.6. User Interface Concept and Auditability

To aid non-expert users, we provide a conceptual GUI (Figure 4) that foregrounds evidence and auditability. The interface exposes top-k retrieved passages with sources and timestamps, highlights citation anchors in the generated answer, and displays a lightweight confidence cue. An “Evidence-Only” mode restricts output to verbatim extracts. Role-based views (operator/admin) and an immutable audit log align with the access-control and logging design of the system.

4. Security and Reliability-Oriented Design Enhancement

4.1. Security Requirements (Closed-Network Deployment)

The system is deployed inside closed-network energy infrastructures to avoid exposure to cloud services and outbound traffic.
Despite air-gapping, internal risks remain, including (i) compromise of the vector database and potential embedding-inversion attempts; (ii) systematic query aggregation by an authenticated but curious user; and (iii) prompt-injection via malicious inputs or documents that could steer the LLM to reveal protected information.
Security requirements: To ensure operational safety and trustworthiness under the above model, the system enforces the following controls:
SR1. Data locality.
All source documents and their embedding vectors are stored exclusively on local storage within the secure boundary. This eliminates the risk of inadvertent disclosure through internet transmission.
SR2. No external APIs.
Both embedding and LLM inference execute entirely on-premises with all outbound network paths disabled. No third-party API calls are allowed.
SR3. Protection of embedding vectors and vector data.
Stored embedding vectors and metadata are protected using cryptographic safeguards (e.g., encryption at rest, authenticated access), role-based access policies, and, where appropriate, obfuscation to reduce the impact of potential vector-inversion attacks.
SR4. Strong RBAC and separation of duties.
Administrative privileges (document registration, deletion, re-indexing, configuration) are separated from operator privileges and granted only after successful authentication and authorization.
SR5. Comprehensive audit logging with privacy controls.
All queries, retrieval events, and model responses are logged immutably for monitoring and security auditing. Access to logs is restricted to authorized personnel, and sensitive fields are handled according to privacy policy (masking/minimization where necessary).
SR6. Abuse and prompt-injection mitigations.
The interface supports evidence-first interactions (e.g., citations/Evidence-Only mode) and applies input hardening against prompt-injection. Query rate-limiting/throttling and monitoring help curb automated aggregation of operational details.
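The throttling control in SR6 can be sketched as a per-user sliding window. The quota and window length below are illustrative values, not the system's deployed settings.

```python
# Per-user sliding-window rate limiter sketching the SR6 throttling
# control. Quota and window length are illustrative values only.
from collections import deque

class RateLimiter:
    def __init__(self, max_queries: int, window_s: float):
        self.max_queries = max_queries
        self.window_s = window_s
        self._times: dict[str, deque] = {}  # per-user event timestamps

    def allow(self, user: str, now: float) -> bool:
        """Return True if the user's query fits within the current quota."""
        q = self._times.setdefault(user, deque())
        while q and now - q[0] >= self.window_s:
            q.popleft()  # drop events that fell out of the window
        if len(q) < self.max_queries:
            q.append(now)
            return True
        return False
```

Rejected queries would additionally be logged (SR5), so a pattern of quota hits is itself visible to auditors.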

4.2. Prompt Injection and Embedding Inversion Defense

To mitigate prompt-injection threats, that is, malicious attempts to manipulate model behavior through crafted user inputs, prompt construction strictly follows predefined templates designed to clearly distinguish between system-generated instructions and user queries. A general mathematical formulation of secure prompt construction can be expressed as follows [5]:
Prompt = Q_user + "\n[Document Context]\n" + C_selected
Here, Q_user refers to the user’s query, while C_selected indicates the document contexts that have been selected through semantic re-ranking processes.
This structured approach ensures clear delineation between user input and predefined system context. Preprocessing filters further identify and remove inputs containing potentially harmful commands or suspicious patterns. While such measures substantially mitigate current risks, advanced strategies such as reinforcement learning from human feedback (RLHF) or middleware-based external prompt validation remain areas for future exploration.
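The template-based construction and preprocessing filter described above can be sketched as follows. The filter pattern is a simplified illustration; a production deployment would use a more comprehensive rule set:

```python
import re

# Patterns that commonly signal injection attempts (illustrative, not exhaustive).
SUSPICIOUS = re.compile(
    r"(ignore (all|previous) instructions|system prompt|disregard)", re.IGNORECASE
)

def build_prompt(user_query: str, selected_contexts: list[str]) -> str:
    """Construct Prompt = Q_user + "\\n[Document Context]\\n" + C_selected,
    rejecting queries that match known injection patterns."""
    if SUSPICIOUS.search(user_query):
        raise ValueError("query rejected by input-hardening filter")
    context_block = "\n---\n".join(selected_contexts)
    return f"{user_query}\n[Document Context]\n{context_block}"

prompt = build_prompt(
    "What is the RTU maintenance interval?",
    ["Section 4.1: RTUs are inspected quarterly."],
)
```

Keeping the delimiter text fixed in the template, rather than interpolated from user input, is what preserves the system/user boundary the formulation above formalizes.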
To prevent embedding inversion attacks—unauthorized attempts to reconstruct sensitive content from embedding vectors—strict file permissions are enforced, limiting vector index access exclusively to authorized administrators. Additional protective strategies, including secure vector operations, cryptographic hashing of embeddings, or embedding data encryption techniques, are also under consideration. Future enhancements may incorporate differential privacy (DP) methods or perturbation through noise injection to further obscure sensitive information within embeddings. This protective approach can be mathematically expressed through the differential privacy paradigm.
M(X) = f(X) + Laplace(0, Δf/ε)
where M denotes the privacy mechanism, f(X) the original embedding function, Δf the sensitivity of the embedding, and ε the privacy budget. Such methods, however, require careful consideration due to inherent trade-offs with retrieval accuracy [12].
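A minimal sketch of this Laplace mechanism, using inverse-CDF sampling so that only the standard library is required (the embedding dimension and parameter values are illustrative):

```python
import math
import random

def laplace_sample(scale: float) -> float:
    """Draw one sample from Laplace(0, scale) via inverse-CDF sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(math.log(1.0 - 2.0 * abs(u)), u)

def privatize(embedding: list[float], sensitivity: float, epsilon: float) -> list[float]:
    """M(X) = f(X) + Laplace(0, Δf/ε): perturb each coordinate independently.
    Smaller epsilon means stronger privacy but more noise."""
    scale = sensitivity / epsilon
    return [x + laplace_sample(scale) for x in embedding]

noisy = privatize([0.0] * 8, sensitivity=1.0, epsilon=2.0)
```

The trade-off noted above is visible directly in the `scale = sensitivity / epsilon` term: tightening the privacy budget inflates the noise added to every stored vector, degrading retrieval precision.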

4.3. Real-Time Processing Optimization and GPU Acceleration

Currently, the implemented system operates exclusively on CPU resources, employing multi-threaded inference processes (e.g., Intel i7-13700F with 16-core processing). Benchmark evaluations demonstrated that the 5-bit quantized model (OpenChat-3.5-0106.Q5_K_M) achieves more than threefold inference speed improvements over the 8-bit quantized version (OpenChat-3.5-1210.Q8_0). Nonetheless, response latencies may still notably increase for extensive context handling or generation of lengthy responses.
Future research will explore GPU acceleration methods aimed at achieving real-time performance standards within operational energy infrastructure settings. Transitioning to GPU-based LLM inference, for instance, utilizing PyTorch (version 2.8.0)-compatible inference platforms, could reduce processing complexity significantly, from linear CPU-based complexity O ( n ) toward substantially improved parallel GPU computation efficiency.
T_GPU ≪ T_CPU = O(n)
Moreover, parallelized GPU-based embedding computations offer further latency reductions. Additional optimization strategies under consideration include model compression techniques, intelligent prompt summarization algorithms, and caching frequently requested responses to enhance system responsiveness in real-time contexts [18,21].

4.4. System Resilience Through Automated Indexing and Recovery

To enhance system resilience, embedded indices are persistently stored on disk (in JSON or equivalent serialized formats), enabling immediate restoration upon system restart without re-embedding overhead. A conceptual model of storage and recovery can be represented as shown in Figure 5.
Furthermore, manual re-indexing via administrator interfaces supports updates upon document changes. Future developments may incorporate automated indexing triggered by real-time file system monitoring or scheduled periodic updates. Regular incremental backups of index data will further ensure prompt recovery capabilities in scenarios involving data corruption or accidental loss, thereby significantly reinforcing system robustness.
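The persist-and-restore cycle described above can be sketched with an atomic write (temp file plus rename), which prevents a crash mid-write from corrupting the on-disk index; the index schema shown is a simplified stand-in for the system’s actual serialized format:

```python
import json
import os
import tempfile

def save_index(path: str, index: dict) -> None:
    """Persist the embedding index atomically: write to a temp file in the
    same directory, then rename over the target."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(index, f)
    os.replace(tmp, path)  # atomic replacement on POSIX and Windows

def load_index(path: str) -> dict:
    """Restore the index at startup without re-embedding any documents."""
    with open(path) as f:
        return json.load(f)

path = os.path.join(tempfile.gettempdir(), "qa_index.json")
index = {"doc1.pdf#chunk0": {"vector": [0.12, 0.98],
                             "text": "RTUs are inspected quarterly."}}
save_index(path, index)
restored = load_index(path)
```

Incremental backups then reduce to copying the serialized file, and recovery after corruption is a restore of the most recent valid copy.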
Operational fault monitoring and alerting. To complement the security-oriented logging described above, the system design includes a lightweight health monitor that probes critical components (embedding worker, vector index, LLM backend, storage, and the API gateway) and tracks heartbeats and error rates. When timeouts or errors are detected, the monitor automatically notifies administrators (e.g., via the audit log/SIEM and email) and can trigger policy-defined maintenance actions such as pausing ingestion, restarting the affected service, or failing over to a standby instance. This mechanism covers not only cybersecurity incidents but also ordinary equipment faults (e.g., disk errors, GPU out-of-memory), thereby improving availability. In the current prototype we primarily focus on security logging; integrating full health-monitoring and alerting into the production deployment is planned, which will further reduce time-to-detect and time-to-recover.

4.5. Access Control, Logging, and Privacy Protection Mechanisms

Security is systematically enforced through role-based access control mechanisms. Document management and administrative tasks are strictly segregated from user-level query functions, requiring secure authentication for administrative operations. A simplified representation of the access control logic is illustrated in Figure 6.
Authenticated administrative access prevents unauthorized modifications of sensitive document indices, ensuring operational security integrity. Future enhancements may further include granular user permissions to restrict highly sensitive documents to authorized personnel exclusively, accompanied by detailed administrative action logging for enhanced auditing transparency.
Comprehensive logging of query-response interactions, including query timestamp, content, context count, and response summaries, supports security auditing and performance diagnostics. Logs are securely stored with strictly controlled administrative access. Potential future privacy enhancements involve encryption or privacy-preserving masking of log entries—for instance, hashing user identifiers and establishing automated retention policies to purge logs beyond their required auditing lifecycle.
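The logging fields listed above, combined with the proposed hashing of user identifiers, can be sketched as a structured JSON log line; the salt value and field names are illustrative assumptions:

```python
import hashlib
import json
import time

def audit_entry(user_id: str, query: str, context_count: int,
                response_summary: str) -> str:
    """Build one JSON audit-log line with the user identifier salted and
    hashed, so logs support auditing without storing raw identities."""
    hashed_user = hashlib.sha256(("audit-salt:" + user_id).encode()).hexdigest()[:16]
    record = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
        "user": hashed_user,            # pseudonymous, but stable per user
        "query": query,
        "context_count": context_count,
        "response_summary": response_summary,
    }
    return json.dumps(record)

line = audit_entry("operator-07", "RTU-5 DNP3 anomaly?", 3,
                   "Possible DoS; see IEC 62351-7.")
```

A retention policy would then purge lines older than the required auditing lifecycle, and the salt could be rotated per retention window to prevent long-term linkage of pseudonymous identifiers.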
Importantly, since all system processing occurs locally without external transmission, inherent risks associated with cloud-based LLM deployments—particularly external leakage of sensitive personal or operational data—are fundamentally mitigated. This advantage represents a significant step forward in industrial security, offering a safe, privacy-preserving LLM integration environment. Further research may incorporate selective encryption of sensitive log information and advanced privacy-preserving embedding methodologies to strengthen system confidentiality.

5. Ontology-Based Semantic Enrichment Strategy

5.1. Domain Knowledge Modeling for Smart Grid Environments

The smart grid environment is characterized by inherent complexity, comprising diverse interconnected elements such as generators, substations, transformers, remote terminal units (RTUs), protection relays, and specific communication protocols. These components interact dynamically, forming intricate relational structures essential for accurate interpretation and effective operational management. However, conventional keyword-based retrieval systems often fail to capture these nuanced domain relationships adequately, limiting their effectiveness in sophisticated, real-time operational contexts.
To address this limitation, this study proposes adopting an ontology-based knowledge modeling framework specifically tailored for smart grid environments. Ontologies explicitly model domain-specific concepts, attributes, and relationships, providing a formal semantic representation that substantially enhances system interpretability. For instance, a domain-specific ontology can explicitly define relationships linking protection relays to specific transformers or clearly specify how communication protocols interface with RTU operations. These semantically structured definitions allow for more precise operational interpretation and facilitate accurate automated reasoning within the document question-answering system.

5.2. Integration of Ontologies Within the Question Answering Workflow

Integrating domain ontologies into the retrieval-augmented generation (RAG) workflow involves embedding ontology-based reasoning directly into the document retrieval and response-generation processes. Upon receiving a user query such as, “What are the required response steps following an unauthorized access alert at a substation?” the system does not merely retrieve isolated document fragments based on keyword matches. Instead, it employs embedding-based semantic similarity computation, measuring the cosine similarity between the query vector (q) and document vectors (d), defined mathematically as [22]:
Sim_cos(q, d) = (q · d) / (‖q‖ ‖d‖) = Σ_{i=1}^{n} q_i d_i / ( √(Σ_{i=1}^{n} q_i²) · √(Σ_{i=1}^{n} d_i²) )
This approach quantifies semantic relatedness precisely, ensuring more relevant and accurate retrieval outcomes.
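For reference, the cosine similarity above translates directly into code (a pure-Python sketch; production systems would vectorize this over the whole index):

```python
import math

def cosine_similarity(q: list[float], d: list[float]) -> float:
    """Sim_cos(q, d) = (q . d) / (||q|| * ||d||), as used to rank document
    chunks against the query embedding."""
    dot = sum(qi * di for qi, di in zip(q, d))
    norm_q = math.sqrt(sum(qi * qi for qi in q))
    norm_d = math.sqrt(sum(di * di for di in d))
    return dot / (norm_q * norm_d)

score = cosine_similarity([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])  # identical vectors -> 1.0
```

Because the measure normalizes by vector magnitude, it compares embedding directions only, which is what makes it robust to differences in chunk length.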
Moreover, after initial retrieval and semantic re-ranking, the ontology framework further guides prompt construction provided to the local LLM. Presenting the model with ontology-enriched semantic contexts rather than raw text alone allows the model to generate contextually coherent, operationally precise, and semantically accurate responses. Thus, ontology integration serves not only as a reference mechanism but as an integral component enhancing the overall semantic comprehension capabilities of the document QA system.

5.3. Enhancing Semantic Interpretation and Reasoning Accuracy

Adopting ontology-based semantic enrichment significantly improves the interpretive accuracy and reliability of the QA system within smart grid contexts. By explicitly modeling domain-specific semantic relationships, the ontology approach effectively resolves inherent ambiguities typically encountered in conventional keyword-based or shallow semantic retrieval approaches. For instance, operational protocols related to specific security threats or technical events often entail multi-step, cross-referenced documentation not explicitly interlinked in textual descriptions. Here, ontology-guided semantic retrieval leverages the formal ontology structure to identify relevant conceptual relationships accurately. Specifically, the semantic relationship score Rel_ont(C_i, C_j) between domain concepts C_i and C_j is mathematically defined by an exponential decay function of the ontology distance dist(C_i, C_j) [23]:
Rel_ont(C_i, C_j) = e^(−α · dist(C_i, C_j))
where α denotes a parameter adjusting the semantic relevance weighting based on conceptual distances. Consequently, closely related concepts receive higher weighting during retrieval, greatly enhancing multi-hop semantic inference accuracy.
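The decay-weighted relatedness score can be sketched over a toy concept graph; the edges below are illustrative examples, not the paper’s full ontology, and α = 0.5 is an arbitrary choice:

```python
import math
from collections import deque

# Toy concept graph (undirected adjacency; edges are illustrative only).
EDGES = {
    "Substation": ["Transformer", "RTU", "ProtectionRelay"],
    "Transformer": ["Substation", "ProtectionRelay"],
    "RTU": ["Substation", "DNP3"],
    "ProtectionRelay": ["Substation", "Transformer"],
    "DNP3": ["RTU"],
}

def ontology_distance(a: str, b: str) -> int:
    """Shortest-path hop count between two concepts, via BFS."""
    if a == b:
        return 0
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nxt in EDGES.get(node, []):
            if nxt == b:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return -1  # unreachable

def rel_ont(a: str, b: str, alpha: float = 0.5) -> float:
    """Rel_ont(C_i, C_j) = exp(-alpha * dist(C_i, C_j))."""
    d = ontology_distance(a, b)
    return 0.0 if d < 0 else math.exp(-alpha * d)

near = rel_ont("Substation", "Transformer")  # 1 hop
far = rel_ont("Substation", "DNP3")          # 2 hops, via RTU
```

As intended, directly linked concepts (Substation–Transformer) score higher than concepts two hops apart (Substation–DNP3), so retrieval weighting favors tightly related context.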
This enhanced semantic reasoning capability provides tangible operational advantages, particularly within critical infrastructure environments. Operators and security personnel gain contextually accurate and reliable guidance in real time, substantially improving situational awareness, decision-making efficiency, and operational responsiveness. Ultimately, ontology-based semantic enrichment positions the QA system as a more robust, contextually intelligent, and reliable tool, significantly increasing its practical value within complex, security-sensitive smart grid infrastructures.
Quantitative impact: Beyond the qualitative gains discussed above, we complement the analysis with an additional ranking metric—Mean Reciprocal Rank (MRR)—reported in Section 7.5, alongside top-1/top-3 relevance. This provides a more complete quantitative view of ontology-enhanced retrieval quality.

5.4. Smart Grid Domain Ontology Development and Example

Developing a smart grid-specific ontology requires a structured approach that clearly organizes the key entities and their relationships within the power infrastructure domain. The process typically starts by identifying core classes that form the foundation of the ontology. These include physical components like Substations, Transformers, Generators, Remote Terminal Units (RTUs), and Protection Relays, as well as communication protocols such as SCADA and IEC 61850. In addition, conceptual elements like security alerts or operational procedures are also included.
Each of these domain entities is defined as a distinct concept (or class) within the ontology, and real-world relationships are explicitly modeled to reflect how components interact in practice. For instance, a substation may contain multiple transformers, be monitored by an RTU, and include safety devices like protection relays. Likewise, a security event such as an Unauthorized Access Alert is linked to the specific substation where it occurs and is connected to a corresponding response procedure or protocol.
By formalizing these relationships, the ontology effectively constructs a knowledge graph that integrates both physical infrastructure and procedural context. This structure enables a QA system to automatically navigate and utilize these connections, allowing it to reason over complex scenarios and deliver informed responses.
Integrating this ontology-based knowledge model into the QA system enables the system to understand and respond more intelligently to user queries that involve domain-specific concepts. When a user’s question references something like an “unauthorized access at a substation,” the system does not just look for documents that match the exact keywords—it uses the ontology to recognize the semantic relationships behind the terms. It understands that an alert event is tied to a specific substation, and that the event is associated with a predefined response procedure.
This means the system can retrieve not only documents about the alert itself but also related materials—like technical information on the substation’s devices or security protocols—even when those documents are not directly linked. Because the ontology is built using formal structures like OWL or RDF, it supports consistent interpretation and logical reasoning. This allows the QA system or an inference engine to deduce additional insights, such as recognizing that “grid station” refers to the same concept as “substation,” or that a protection relay guarding a transformer implies that the transformer is part of the same substation.
Figure 7 shows a simplified example of this ontology. In the diagram, a Substation is connected to several related components: it contains a Transformer, includes a Protection Relay, and is monitored by an RTU. A security alert, like an Unauthorized Access Alert, is linked to the Substation where it occurs, and is further connected to a Response Procedure that outlines the appropriate reaction. This structure illustrates how various entities—equipment, monitoring devices, and event-handling protocols—are semantically connected, providing the QA system with a deeper context for understanding and inference.
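The Figure 7 structure can be approximated as a small triple store with synonym resolution and bounded graph expansion. The predicate names, instance identifiers, and synonym table below are illustrative assumptions, not the paper’s actual OWL/RDF vocabulary:

```python
# Triples mirroring the Figure 7 example (subject, predicate, object).
TRIPLES = [
    ("SubstationA", "contains", "Transformer1"),
    ("SubstationA", "includes", "ProtectionRelay1"),
    ("SubstationA", "monitoredBy", "RTU5"),
    ("UnauthorizedAccessAlert", "occursAt", "SubstationA"),
    ("UnauthorizedAccessAlert", "hasResponse", "ResponseProcedure12"),
]

SYNONYMS = {"grid station": "SubstationA", "substation a": "SubstationA"}

def resolve(term: str) -> str:
    """Map a colloquial term to its canonical ontology concept."""
    return SYNONYMS.get(term.lower(), term)

def neighbors(subject: str) -> list[tuple[str, str]]:
    """All (predicate, object) pairs for a concept: one hop of navigation."""
    return [(p, o) for s, p, o in TRIPLES if s == subject]

def related_concepts(term: str, hops: int = 2) -> set[str]:
    """Concepts reachable within `hops` edges, used to expand retrieval."""
    frontier, seen = {resolve(term)}, set()
    for _ in range(hops):
        nxt = set()
        for node in frontier:
            for _, obj in neighbors(node):
                if obj not in seen:
                    nxt.add(obj)
        seen |= frontier
        frontier = nxt
    return seen | frontier

expanded = related_concepts("UnauthorizedAccessAlert")
```

Starting from the alert, two hops reach both the response procedure and the substation’s equipment, which is exactly the expansion that lets the retriever surface related materials not directly linked in the documents themselves.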

6. Industrial Standards and Interoperability Considerations

6.1. Alignment with IEC 62351 [6] and NIST IR 7628 [2] Standards

Given the highly security-sensitive nature of smart grid infrastructures, compliance with internationally recognized standards such as IEC 62351 [6] and NIST IR 7628 [2] is essential. IEC 62351 [6] is a multi-part standard that addresses cybersecurity in power systems, providing technical specifications to ensure secure communication and data integrity within energy management and control environments. Likewise, NIST IR 7628 [2] offers comprehensive cybersecurity guidelines specifically tailored to smart grid systems, covering areas such as risk assessment, system protection, and secure operational procedures.
In the context of our proposed local LLM-based document QA system, careful alignment with these standards is crucial to demonstrate that the system enhances, rather than compromises, existing security requirements. This ensures that our design does not introduce new vulnerabilities, but instead reinforces the foundation laid by established guidelines.
Our system is particularly aligned with the core requirements of IEC 62351 [6]. It incorporates secure data handling, strong user authentication protocols, and role-based access control—all of which reflect key elements of the standard. For clarity, Table 2 summarizes how the different parts of IEC 62351 [6] are addressed by our QA system.
For instance, IEC 62351-3 [6] mandates the use of Transport Layer Security (TLS) to protect TCP/IP communications in SCADA environments. In response, our system encrypts all client-server communication channels using TLS and enforces mutual authentication. In addition, other components of the standard—such as SCADA protocol security, continuous monitoring, access control, key management, secure system architecture, and event logging—are all reflected in specific design choices within our implementation, as detailed in the table.
In addition to complying with IEC 62351 [6], the QA system’s design also reflects key principles from the NIST IR 7628 [2] Guidelines for Smart Grid Cyber Security. NIST IR 7628 [2] stresses a risk-based approach to protecting smart grid environments, and our deployment process starts with a thorough risk assessment (as outlined in Section 6.2) to align with that core principle.
The system’s localized data processing—keeping sensitive operational information on-premises and disconnected from the public internet—along with tightly controlled communication paths and robust logging and auditing mechanisms, all reflect NIST’s recommended practices for securing critical energy infrastructure.
By addressing both the technical controls of IEC 62351 [6] and the broader risk management framework of NIST IR 7628 [2], our QA system reinforces confidence in its suitability for highly regulated operational networks. This deliberate alignment ensures that the system strengthens, rather than weakens, the existing cybersecurity posture, making it a trusted solution for real-world deployment.

6.2. Guidelines for System Deployment Within Regulatory Environments

Deploying an AI-based system within critical energy infrastructure, where regulatory oversight is strict and operational safety is paramount, requires a methodical and well-structured approach from the outset. To ensure the system integrates smoothly and meets compliance expectations, several key guidelines must be followed throughout the deployment process.
First, it is essential to conduct comprehensive risk assessments in line with the NIST IR 7628 [2] framework. This involves thoroughly analyzing how the introduction of the QA system may impact existing grid operations, identifying potential cybersecurity vulnerabilities, and defining mitigation strategies in advance. These assessments help ensure that all stakeholders—from technical teams to compliance officers—fully understand the risks and the safeguards that will be in place. By starting with this foundation, the deployment process becomes more transparent, structured, and resistant to unforeseen disruptions.
Second, operational security policies must be clearly defined and maintained in accordance with the technical controls outlined in IEC 62351 [6]. These policies should establish how the QA system will be used in daily operations, outline procedures for continuous security monitoring, and specify response actions in case of a cybersecurity incident. Clear delineation of access roles, user permissions, and security responsibilities is especially important, particularly when mapping roles to IEC 62351-8 [6] role-based access control standards. In addition, logging and auditing processes must be documented and structured in a way that aligns with the upcoming IEC 62351-14 [6], allowing seamless integration with existing compliance tools and reporting workflows. These policies should not be static; they must be periodically reviewed and revised to reflect changes in the regulatory landscape or system configuration.
Third, preparing system administrators, operators, and all relevant personnel through structured training programs is a vital part of deployment. These training sessions should not only explain how to operate the QA system but also emphasize the security procedures and standards that must be upheld. Instructional content should include operational workflows, emergency response protocols, access management practices, and guidance on using the system in compliance with internal and external security frameworks. Beyond initial training, organizations should regularly conduct refresher courses and simulation drills such as mock cybersecurity incidents that involve interaction with the QA system to maintain high awareness and practical readiness among users.
By incorporating these measures (risk assessment, security policy development, and user education) into the deployment strategy from the beginning, the organization can ensure that the QA system reinforces existing security structures rather than introducing gaps. This proactive approach helps maintain regulatory compliance, protects critical infrastructure, and establishes a secure foundation for the responsible use of AI in smart grid environments.

6.3. System Modularity and Integration Potential with Existing Infrastructure

Modern smart grid environments are inherently complex and diverse, which is why the proposed QA system has been intentionally designed with modularity and interoperability at its core. The system architecture allows for natural integration with existing infrastructure components such as SCADA systems, energy management systems (EMS), and operational data historians, without requiring significant changes to existing hardware or software.
Each component is loosely connected, making it possible to introduce the QA functionality step by step. For example, the data embedding and indexing modules can independently link to various document repositories or databases already in use. The ingestion module can be configured to periodically pull disturbance reports or event logs from a SCADA historian and convert them into vector embeddings for indexing. This entire process operates in a read-only mode, meaning it does not send control commands or interfere with real-time grid operations, keeping operational risk to a minimum.
The local LLM inference module is also designed to be self-contained and can run on standard on-premises servers. It does not require internet access or any external cloud services, so there are no added network dependencies or security risks. This module communicates with other parts of the system through clearly defined APIs, which allows it to be integrated into specific operational settings such as being queried through an operator’s HMI while remaining isolated from core control networks.
This modular, interoperability-focused design makes the system easy to deploy in heterogeneous, legacy environments and allows each component, whether the ingestion module, ontology service, retriever, LLM, or user interface, to be updated, replaced, or scaled independently based on evolving needs.
As a result, the QA system can be positioned as a knowledge assistant that integrates seamlessly into the existing smart grid infrastructure. It adds immediate value by connecting fragmented data sources, supports informed decision-making, and does all of this while staying within the strict safety and security boundaries required for critical energy operations.

6.4. Example Scenario: Secure Integration in a Legacy Smart Grid Environment

To demonstrate the practical value of this design, consider a scenario where the QA system is deployed in a legacy substation environment. An operator in the control center notices an unusual increase in DNP3 protocol messages from RTU-5 at Substation A. In many installations, triaging an unusual increase in DNP3 traffic still requires correlating SCADA logs, equipment manuals, and cybersecurity guidance across systems. Experienced operators can resolve routine cases quickly; the proposed system is intended to support non-routine or cross-domain incidents, assist junior staff, and provide auditable justification rather than replace operator expertise.
With the QA system in place, the operator can simply ask a natural language question through the interface, such as, “There is an abnormal frequency of DNP3 messages from RTU-5 at Substation A. What could be the cause and what action should I take?”
The system processes the query using its modular components. The retriever identifies relevant information, such as recent intrusion detection alerts linked to RTU-5 and excerpts from the substation’s documentation. The ontology service adds context by linking RTU-5 to DNP3 communications and recognizing patterns associated with known attacks that generate excessive DNP3 traffic. Based on this input, the LLM module generates a concise response.
For example, it may respond, “RTU-5 at Substation A is generating a high volume of DNP3 messages, which may indicate a denial-of-service attempt targeting the communication network. This behavior aligns with IEC 62351-7 [6] guidelines on network monitoring. A similar pattern was observed in the Black Energy malware incident. It is recommended to inspect RTU-5 for suspicious activity and apply recent DNP3 security updates in accordance with IEC 62351-5 [6].”
The operator receives a clear explanation based on both current system data and established standards. The answer references the relevant standard, cites a real-world case for context, and offers a response aligned with best practices—all without exposing sensitive data outside the network.
This scenario illustrates how the QA system connects operational data with domain knowledge, helping the operator make timely, informed decisions. It interprets technical details in a way that is accessible and aligned with internal policies. By integrating smoothly with existing infrastructure and applying industry standards, the system strengthens incident response and improves overall situational awareness.

7. Evaluation and Analysis

7.1. Experimental Setup and Dataset Configuration

In order to rigorously evaluate the effectiveness of the proposed local LLM-based document retrieval and QA system, we established an experimental environment reflecting realistic operational conditions common in smart grid infrastructures. The testing was conducted on a local workstation featuring an Intel Core i7-13700F CPU with 16 cores and 32 GB RAM, without any external GPU acceleration to closely represent a typical operational setting. Two quantized variants of the OpenChat-3.5 local LLM—specifically, the Q5 model (0106.Q5_K_M, 5-bit quantization) and the Q8 model (1210.Q8_0, 8-bit quantization)—were comparatively assessed.
For embedding generation, we employed the Hugging Face multilingual-e5-large model, selected for its well-established semantic retrieval capabilities and suitability for CPU-only environments. Our dataset comprised operational and security policy documents relevant to smart grid systems, presented in PDF and TXT formats. These documents were segmented into semantically coherent text chunks and converted into vector embeddings. This approach realistically simulated typical smart grid operational document retrieval scenarios.
Scope of evaluation: The present study evaluates two configurations of the proposed system (quantization levels and ontology usage) under a fixed corpus to isolate architectural effects. We do not benchmark against manual operations or third-party enterprise search tools, as measuring operator triage time and accuracy would require controlled human-in-the-loop user studies in an operational control room—beyond the scope of this paper. Designing such a man–machine benchmark (e.g., rare-event triage, decision validation, and documentation time) is part of our planned future work.

7.2. Evaluation of Response Accuracy, Latency, and System Resource Usage

The system evaluation focused on three critical performance dimensions: semantic accuracy, query response latency, and system resource usage. A comprehensive set of 30 queries representative of actual smart grid operational scenarios was utilized to assess the system’s performance. Semantic accuracy was evaluated through manual inspection by domain experts who verified whether generated answers matched expected operational guidelines and procedures. Both Q5 and Q8 models demonstrated comparable semantic accuracy, with negligible qualitative differences observed.
However, response latency revealed significant differences attributable to quantization methods. The Q5 model, optimized for computational speed, consistently provided responses approximately three times faster than the Q8 model. Specifically, the average latency recorded for the Q5 model was between 3 and 4 s per query, whereas the Q8 model’s latency ranged from approximately 10 to 12 s per query. Regarding system resource usage, CPU utilization during inference peaked between 60% and 80%, and memory consumption remained within acceptable limits (below 75% of available RAM), underscoring the practical viability of deploying the system in typical operational settings without specialized hardware. These key experimental results are summarized concisely in Table 3.

7.3. Safety and Hallucination

Hallucination risk and mitigations. While we do not run a dedicated hallucination stress test, the design constrains ungrounded outputs through several safeguards: (i) retrieval-augmented prompting that presents only retrieved contexts to the LLM; (ii) ontology-guided context selection that narrows the response space to domain-valid relations; (iii) a GUI that surfaces explicit citations and an Evidence-Only mode for verbatim extracts; and (iv) operator-in-the-loop use with immutable audit logs and RBAC. In cases where retrieved evidence is sparse or ambiguous, operators are instructed to treat answers as decision support and to consult procedures, rather than executing automated actions. These safeguards are aligned with the system’s offline, security-first deployment model.

7.4. Experimental Scenario: Incident Report Retrieval for Power Substation Security Events

To evaluate the practical applicability of the system in realistic operational contexts, we conducted a focused scenario experiment simulating typical smart grid security incidents at power substations. Queries included practical security-focused questions, such as: “What immediate steps should be taken upon detecting unauthorized access at a substation?” and “Outline the recommended procedure following anomalous RTU communications.”
In these experimental scenarios, the retrieval-augmented generation (RAG) pipeline effectively located and retrieved relevant operational procedures and guidelines embedded in the document repository. The semantic re-ranking stage notably improved precision, filtering out contextually irrelevant documents and thereby providing more accurate and operationally meaningful responses. Furthermore, integrating ontology-based semantic contexts into LLM prompts significantly enhanced the overall interpretability and relevance of generated answers, as validated through careful review by domain experts. This clearly demonstrated the system’s capability to deliver contextually accurate, operationally precise, and reliable guidance, fulfilling critical information needs during security incidents.

7.5. Retrieval Accuracy and Semantic Precision Evaluation

In addition to latency and system resource usage, we conducted a detailed assessment of retrieval precision and semantic alignment between the top-ranked context segments and user intent. Across the 30 domain-specific queries previously used for performance benchmarking, we manually evaluated the contextual relevance of both the top-1 retrieved segment and the final response output.
Initial context selection based solely on cosine similarity of embedding vectors achieved 87% relevance when judged by domain experts. However, by introducing a secondary semantic re-ranking layer—based on transformer-derived semantic scores—the top-1 relevance accuracy improved to 93%. This layered retrieval mechanism ensured that the selected contexts better matched the operational intent expressed in the queries. The re-ranking process proved especially effective in disambiguating polysemous terms or operational shorthand commonly found in smart grid documents.
In practical terms, this result highlights that even without domain-specific fine-tuning or labeled training data, the combination of dense embedding and lightweight re-ranking enables robust semantic understanding of complex technical queries. The quantitative outcomes of this accuracy analysis are summarized in Table 4.
These results confirm that the retrieval pipeline, augmented with semantic re-ranking, provides contextually aligned and operationally meaningful documents suitable for downstream reasoning in mission-critical environments.
To further quantify ranking quality, we report Mean Reciprocal Rank (MRR) on the same 30-query set used for accuracy evaluation. Let $r_q$ denote the rank of the first relevant item for query $q$; MRR is defined as

$$\mathrm{MRR} = \frac{1}{|Q|} \sum_{q \in Q} \frac{1}{r_q}$$
This metric captures how consistently the system surfaces a relevant passage at the top of the list and complements top-1/top-3 relevance. In our setting, the ontology-enhanced pipeline yielded higher MRR than the cosine-only baseline, mirroring the improvements observed in top-1 accuracy (Table 4). This indicates that ontology guidance improves not only overall accuracy but also the ranking effectiveness of retrieved contexts presented to operators [24].
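The metric can be computed directly from the rank of the first relevant hit per query (ranks are 1-based; a query with no relevant hit conventionally contributes 0):

```python
def mean_reciprocal_rank(first_relevant_ranks):
    """MRR = (1/|Q|) * sum_q 1/r_q, where r_q is the 1-based rank of the
    first relevant item for query q (None when nothing relevant is found)."""
    reciprocal = [1.0 / r if r is not None else 0.0
                  for r in first_relevant_ranks]
    return sum(reciprocal) / len(reciprocal)
```

For example, first-relevant ranks of [1, 2, 4] across three queries give MRR = (1 + 1/2 + 1/4) / 3 ≈ 0.583.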

7.6. Case Study: Ontology-Enriched QA vs. Baseline QA Evaluation

To assess the impact of ontology integration in retrieval-augmented QA, we conducted a comparative experiment using two versions of our system (with and without ontology support) across two quantization levels of the OpenChat-3.5 model (Q5 and Q8). The evaluation used the same document set and question set for all configurations to ensure a consistent basis for comparison.
Table 5 and Table 6 present sample questions and answers produced by the baseline and ontology-enhanced systems at both quantization levels (Q5 and Q8), illustrating the differences in their outputs. We observe a clear trend: the baseline answers rely solely on the retrieved passages, which sometimes leads to omissions when the needed information is not explicitly present. The ontology-enriched answers, by contrast, are generally more informative and aligned with expert knowledge.
In Q1, both answers were correct, but the baseline response lacked specifics: it did not mention which elements of substation communication were secured or how the standard is structured. In contrast, the ontology-informed answer listed key components such as encryption and role-based access control, reflecting an understanding that IEC 62351 [6] is a multi-part standard addressing various security functions.
In Q2, the baseline gave general advice, while the ontology-enhanced version identified specific threats like DNP3 spoofing and firmware-based malware. It also connected these with appropriate countermeasures, referencing standards such as IEC 62351-5 [6] and IEC 62351-8 [6]. These standards were not mentioned in any single retrieved passage but were included through the ontology, helping the model respond more like a domain expert than a simple retriever.
In Q3, the ontology-informed answer combined information from an incident report with industry best practices. While the baseline response mentioned surface-level issues like firewall misconfiguration and outdated patches, the enhanced answer explained them in the broader context of architectural weaknesses, such as poor separation between IT and OT networks and the absence of intrusion monitoring. This approach helped clarify not just what failed but why it mattered, using the ontology to relate report details to common security principles.
Importantly, the use of the ontology did not cause the model to introduce false or irrelevant information. This is likely due to the controlled prompt design, in which only concise, query-relevant facts grounded in trusted documents or standards were added. For example, the ontology might supply simple facts like “RTUs use DNP3” and “IEC 62351-5 [6] secures DNP3,” allowing the model to combine these with the document content to form a coherent answer.
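A minimal version of such a controlled prompt template, using the DNP3 facts above as the injected ontology statements, might look like the sketch below. The exact template used in the system is not reproduced here; this layout is an assumption:

```python
def build_prompt(question, passages, ontology_facts):
    """Ground the model: retrieved passages first, then concise ontology
    facts, then the question, with an instruction to stay within them."""
    context = "\n".join(f"[Doc {i}] {p}" for i, p in enumerate(passages, 1))
    facts = "\n".join(f"- {f}" for f in ontology_facts)
    return ("Answer using only the context and domain facts below.\n\n"
            f"Context:\n{context}\n\n"
            f"Domain facts:\n{facts}\n\n"
            f"Question: {question}\nAnswer:")

prompt = build_prompt(
    "How can an RTU in a smart grid be compromised, and how to prevent it?",
    ["RTUs at the substation report over serial and IP links."],
    ["RTUs use DNP3", "IEC 62351-5 secures DNP3"],
)
```

Restricting the injected facts to short, standard-grounded statements keeps the prompt compact and limits the model's opportunity to hallucinate beyond the supplied evidence.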
Overall, the ontology helped guide the model toward more accurate and complete responses. Instead of guessing or omitting key points, it consistently referenced the correct standards and concepts, contributing to clearer and more useful outputs.

8. Ethical and Social Considerations

Deploying AI-driven decision support in safety-critical domains raises ethical and social concerns. First, automation bias may cause operators to over-trust model outputs; the system must therefore be framed as decision support, not decision-making. Second, accountability requires complete audit trails and provenance of retrieved evidence; our design enforces role-based access and immutable logs but still relies on organizational governance. Third, data minimization and purpose limitation are necessary when indexing internal documents: only operationally required corpora should be embedded, and retention policies must be enforced. Fourth, fairness and non-discrimination remain relevant when human resources or penalties could be affected; content policies should prevent the model from generating recommendations outside approved procedures. Finally, non-determinism in LLM outputs implies residual risk; we therefore mandate human-in-the-loop verification for safety-relevant actions and surface confidence cues and citations to retrieved passages. These safeguards (operator-in-the-loop workflows, provenance, and strict scoping of indexed data) help bound the social risks while maintaining the operational benefits of the proposed offline QA system.
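Immutability of audit trails can be approximated in software with a hash chain, where each entry commits to its predecessor so that a silent edit breaks verification. The sketch below (SHA-256 over canonical JSON) is one possible realization, not the system's actual logging code:

```python
import hashlib
import json
import time

class AuditLog:
    """Append-only log; each entry hashes its predecessor, so tampering
    with any stored entry is detectable by re-walking the chain."""

    def __init__(self):
        self.entries = []
        self._prev = "0" * 64  # genesis value

    def append(self, user, action):
        entry = {"ts": time.time(), "user": user,
                 "action": action, "prev": self._prev}
        digest = hashlib.sha256(
            json.dumps(entry, sort_keys=True).encode()).hexdigest()
        entry["hash"] = digest
        self._prev = digest
        self.entries.append(entry)

    def verify(self):
        prev = "0" * 64
        for e in self.entries:
            body = {k: v for k, v in e.items() if k != "hash"}
            expected = hashlib.sha256(
                json.dumps(body, sort_keys=True).encode()).hexdigest()
            if e["prev"] != prev or e["hash"] != expected:
                return False
            prev = e["hash"]
        return True
```

In a deployment, the chain head would additionally be anchored to write-once storage or periodically countersigned, since an attacker with full write access could otherwise recompute the entire chain.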

9. Conclusions

The above safeguards delimit the ethical and social scope of our system and inform future field trials and certification. This study confirmed that a document-based Retrieval-Augmented Generation (RAG) system can operate effectively even in closed environments where internet access is completely restricted. Experimental results demonstrated that the combination of vector-embedding-based semantic search and re-ranking mechanisms in a local LLM question-answering setup provides highly relevant responses aligned with user intent, achieving an answer accuracy of approximately 90%. A comparison between the OpenChat-3.5 models Q5_K_M (5-bit quantized) and Q8_0 (8-bit quantized) revealed that both models generated similarly accurate answers across test queries, maintaining a high correctness rate without significant performance disparity. Notably, Q5_K_M achieved approximately 1.5 times faster token generation than Q8_0, substantially reducing average response time. Most queries were answered within 10 s on a single CPU/GPU setup, approximating real-time performance. These findings suggest that model quantization can reduce latency without sacrificing accuracy, making the system well suited for practical applications in terms of both performance and responsiveness.
Unlike conventional cloud-based RAG systems, all processes are executed locally, eliminating the risk of sensitive data being transmitted externally. This design significantly enhances data security by eliminating dependency on external APIs [18]. Compared to traditional keyword-based enterprise search systems, the proposed model leverages semantic vector search, enabling more contextually accurate document retrieval [22]. Experimental results also showed that incorporating a reranking algorithm improved retrieval accuracy by approximately 10%, confirming the superiority of semantic search over simple keyword matching. In sum, the proposed local semantic search architecture with built-in security complements the performance strengths of cloud-based RAG while inheriting the safety of on-premise systems, thereby addressing the limitations of both approaches. The system successfully maintains cloud-level search precision in an offline setting while mitigating data leakage risks, an integrated improvement substantiated through empirical testing [25].
From the perspective of security and practical deployment, the system demonstrates strong applicability. Its entirely offline configuration allows it to be deployed in network-isolated environments such as manufacturing plants, financial institutions, military systems, or critical infrastructure including smart grids. As data remains entirely internal and all computations are locally performed, the system effectively prevents technology leakage and offers high operational reliability. The actual implementation is designed to rapidly retrieve and summarize internal documents without exposing source content externally, complying with the strict information protection requirements of industrial sites while providing AI-driven knowledge exploration in a practical manner. Even in highly sensitive environments, the system can help protect internal knowledge assets while enhancing productivity through AI, and the experimentally verified performance metrics support this real-world applicability.
Future work should focus on refining the system for high-security infrastructure environments such as smart grids [26]. First, faster response times should be pursued by optimizing model quantization techniques, applying lightweight LLMs, and leveraging dedicated hardware such as GPUs. Second, to address global industrial needs, the system must expand its support for multilingual documents [27,28]. Current operation focuses on Korean and English, but embedding models and retrieval pipelines should be redesigned for multilingual or language-agnostic capabilities. Third, enhancing embedding-level security remains a crucial challenge. Since vector representations may still pose reconstruction risks, mechanisms such as Differential Privacy should be explored to prevent sensitive information from leaking during embedding, alongside encryption and strict access controls for the embedding store [29]. If pursued along these directions, the system can evolve into a more robust and secure AI-powered document retrieval solution for completely offline environments [30].
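As one illustration of the differential-privacy direction, the Gaussian mechanism clips each embedding to a fixed L2 norm and adds calibrated noise before the vector is stored. The sigma below is a placeholder: in a real deployment it must be derived from the chosen (epsilon, delta) privacy budget and the clipping bound:

```python
import numpy as np

def privatize_embedding(vec, clip_norm=1.0, sigma=0.1, rng=None):
    """Gaussian-mechanism sketch: bound each vector's L2 sensitivity by
    clipping, then add isotropic Gaussian noise before indexing."""
    rng = rng or np.random.default_rng()
    norm = np.linalg.norm(vec)
    clipped = vec * min(1.0, clip_norm / max(norm, 1e-12))  # L2 clipping
    return clipped + rng.normal(0.0, sigma, size=vec.shape)  # calibrated noise
```

Noisier embeddings trade retrieval precision for reconstruction resistance, so the budget would need to be tuned against the top-1 relevance metrics reported in Section 7.5.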
In conclusion, this study proposes and validates a design strategy that addresses both data security and AI utilization, two core demands of modern industry, and offers meaningful contributions toward the development of the next generation of security-oriented knowledge retrieval systems.

Author Contributions

Conceptualization, K.L., S.Y. and Y.L.; funding acquisition, Y.L.; methodology, K.L., S.Y. and Y.L.; supervision, D.S.; validation, K.L. and Y.L.; writing—original draft, K.L. and Y.L.; writing—review and editing, D.S. and J.J. All authors have read and agreed to the published version of the manuscript.

Funding

This research was conducted with the support of the Human Resources Development Project for Regional Energy Clusters funded by the Ministry of Trade, Industry and Energy in 2025 (Project number: 20224000000070).

Data Availability Statement

The data presented in this study are available on request from the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sridhar, S.; Hahn, A.; Govindarasu, M. Cyber–physical system security for the electric power grid. Proc. IEEE 2012, 100, 210–224. [Google Scholar] [CrossRef]
  2. NISTIR 7628 Rev. 1; Guidelines for Smart Grid Cybersecurity. National Institute of Standards and Technology (NIST): Gaithersburg, MD, USA, 2014. Available online: https://nvlpubs.nist.gov/nistpubs/ir/2014/NIST.IR.7628r1.pdf (accessed on 1 March 2025).
  3. Yao, Y.; Duan, J.; Xu, K.; Cai, Y.; Sun, Z.; Zhang, Y. A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly. High-Confid. Comput. 2024, 4, 100211. [Google Scholar] [CrossRef]
  4. Knollmeyer, S.; Caymazer, O.; Grossmann, D. Document GraphRAG: Knowledge Graph Enhanced Retrieval Augmented Generation for Document Question Answering within the Manufacturing Domain. Electronics 2025, 14, 2102. [Google Scholar] [CrossRef]
  5. Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.; Rocktäschel, T.; et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv 2020, arXiv:2005.11401. [Google Scholar] [CrossRef]
  6. IEC 62351; Power Systems Management and Associated Information Exchange—Data and Communications Security. International Electrotechnical Commission (IEC): Geneva, Switzerland, 2018. Available online: https://webstore.iec.ch/publication/6028 (accessed on 23 March 2025).
  7. Lee, Y.; Chae, H.; Lee, K. Countermeasures against large-scale reflection DDoS attacks using exploit IoT devices. J. Inf. Commun. Converg. Eng. 2021, 19, 127–136. [Google Scholar] [CrossRef]
  8. Musman, S.; Turner, A. A game oriented approach to minimizing cybersecurity risk. Int. J. Saf. Secur. Eng. 2017, 8, 212–222. [Google Scholar] [CrossRef]
  9. Uzunov, A.V.; Fernandez, E.B. An extensible pattern-based library and taxonomy of security threats for distributed systems. Comput. Stand. Interfaces 2014, 36, 734–747. [Google Scholar] [CrossRef]
  10. Lang, J.; Guo, Z.; Huang, S. A Comprehensive Study on Quantization Techniques for Large Language Models. arXiv 2024, arXiv:2411.02530. [Google Scholar] [CrossRef]
  11. Dwork, C.; McSherry, F.; Nissim, K.; Smith, A. Calibrating Noise to Sensitivity in Private Data Analysis. In Theory of Cryptography; Halevi, S., Rabin, T., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2006; Volume 3876, pp. 265–284. [Google Scholar] [CrossRef]
  12. Dwork, C.; Roth, A. The Algorithmic Foundations of Differential Privacy. Found. Trends Theor. Comput. Sci. 2014, 9, 211–407. [Google Scholar] [CrossRef]
  13. Wang, L.; Yang, N.; Huang, X.; Jiao, B.; Yang, L.; Jiang, D.; Majumder, R.; Wei, F. Text Embeddings by Weakly-Supervised Contrastive Pre-Training. arXiv 2022, arXiv:2212.03533. [Google Scholar] [CrossRef]
  14. Perez, F.; Ribeiro, I. Ignore Previous Prompt: Attack Techniques for Language Models. arXiv 2022, arXiv:2211.09527. [Google Scholar] [CrossRef]
  15. Gao, Y.; Xia, X.; Guo, Y. Correction: Gao et al. A Modeling Method for Thermal Error Prediction of CNC Machine Equipment Based on Sparrow Search Algorithm and Long Short-Term Memory Neural Network. Sensors 2023, 23, 3600. Sensors 2024, 24, 2133. [Google Scholar] [CrossRef] [PubMed]
  16. Wan, Z.; Wang, X.; Liu, C.; Alam, S.; Zheng, Y.; Liu, J.; Qu, Z.; Yan, S.; Zhu, Y.; Zhang, Q.; et al. Efficient Large Language Models: A Survey. arXiv 2023, arXiv:2312.03863. [Google Scholar] [CrossRef]
  17. Bi, S.; Zhang, Y.J. Graph-Based Cyber Security Analysis of State Estimation in Smart Power Grid. arXiv 2016, arXiv:1612.05878. [Google Scholar] [CrossRef]
  18. Nickolls, J.; Buck, I.; Garland, M.; Skadron, K. Scalable Parallel Programming with CUDA. ACM Queue 2008, 6, 40–53. [Google Scholar] [CrossRef]
  19. Nogueira, R.; Cho, K. Passage Re-ranking with BERT. arXiv 2019, arXiv:1901.04085. [Google Scholar] [CrossRef]
  20. Madaan, A.; Tandon, N.; Gupta, P.; Hallinan, S.; Gao, L.; Wiegreffe, S.; Alon, U.; Dziri, N.; Prabhumoye, S.; Yang, Y.; et al. Self-Refine: Iterative Refinement with Self-Feedback. arXiv 2023, arXiv:2303.17651. [Google Scholar] [CrossRef]
  21. Owens, J.D.; Houston, M.; Luebke, D.; Green, S.; Stone, J.E.; Phillips, J.C. GPU Computing. Proc. IEEE 2008, 96, 879–899. [Google Scholar] [CrossRef]
  22. Salton, G.; Wong, A.; Yang, C.S. A Vector Space Model for Automatic Indexing. Commun. ACM 1975, 18, 613–620. [Google Scholar] [CrossRef]
  23. Wang, J.Z.; Du, Z.; Payattakool, R.; Yu, P.S.; Chen, C.F. A New Method to Measure the Semantic Similarity of GO Terms. Bioinformatics 2007, 23, 1274–1281. [Google Scholar] [CrossRef] [PubMed]
  24. Manning, C.D.; Raghavan, P.; Schutze, H. Introduction to Information Retrieval; Cambridge University Press: Cambridge, UK, 2008. [Google Scholar] [CrossRef]
  25. Liu, F.; Kang, Z.; Han, X. Optimizing RAG Techniques for Automotive Industry PDF Chatbots: A Case Study with Locally Deployed Ollama Models. arXiv 2024, arXiv:2408.05933. [Google Scholar] [CrossRef]
  26. Li, J.; Yang, Y.; Sun, J. Risks of Practicing Large Language Models in Smart Grid: Threat Modeling and Validation. arXiv 2024, arXiv:2405.06237. [Google Scholar] [CrossRef]
  27. Feng, F.; Yang, Y.; Cer, D.; Arivazhagan, N.; Wang, W. Language-agnostic BERT Sentence Embedding. arXiv 2020, arXiv:2007.01852. [Google Scholar] [CrossRef]
  28. Yu, P.; Merrick, L.; Nuti, G.; Campos, D. Arctic-Embed 2.0: Multilingual Retrieval Without Compromise. arXiv 2024, arXiv:2412.04506. [Google Scholar] [CrossRef]
  29. Koga, T.; Wu, R.; Chaudhuri, K. Privacy-Preserving Retrieval-Augmented Generation with Differential Privacy. arXiv 2024, arXiv:2412.04697. [Google Scholar] [CrossRef]
  30. Xu, J.; Li, Z.; Chen, W.; Wang, Q.; Gao, X.; Cai, Q.; Ling, Z. On-Device Language Models: A Comprehensive Review. arXiv 2024, arXiv:2409.00088. [Google Scholar] [CrossRef]
Figure 1. Overall system architecture.
Figure 2. Document embedding and indexing flow.
Figure 3. RAG-based query answering procedure.
Figure 4. Conceptual operator interface showing (A) query panel, (B) top-k retrieved passages with re-ranking, (C) model answer with citation badges and confidence cue, (D) role-based access control, (E) audit log and provenance, and (F) offline status indicator.
Figure 5. Save and restore concept.
Figure 6. Access control logic.
Figure 7. A simplified example of the smart grid ontology.
Table 1. Configuration comparison of quantized local LLMs.
Model Version | Max Generation Tokens | Context Window | Top-k Similarity Retrieval | Top-n Reranking | Description
OpenChat-3.5-0106.Q5_K_M | 256 | 4096 | 5 | 3 | Speed-optimized (5-bit quantization)
OpenChat-3.5-1210.Q8_0 | 1024 | 8192 | 10 | 3 | Accuracy-focused (8-bit quantization)
Table 2. Alignment of the QA system’s design with IEC 62351 [6] security standards.
IEC 62351 [6] Part | Focus Area | QA System Alignment
62351-3: Secure Communication (TCP/IP) | Defines use of TLS for securing TCP/IP-based power system protocols | All QA system network interactions (e.g., any client–server queries or data fetches) are protected with TLS encryption and mutual authentication, fulfilling secure communication requirements.
62351-5: Security for IEC 60870-5 and Derivatives (e.g., DNP3) | Adds authentication and cryptographic protection to legacy SCADA protocols | The QA system passively integrates with SCADA sources in a read-only mode, using secure protocols and respecting access controls, without sending control commands, ensuring compliance with IEC 62351-5 [6].
62351-6: Security for IEC 61850 Protocols (MMS) | Extends security to substation automation communications (IEC 61850 MMS) | Similarly, our system’s interactions with IEC 61850-based substation data (if any) occur only through secure, authenticated channels. The QA service does not participate in control operations; it only ingests authorized data and therefore remains compliant with IEC 62351-6 [6] security measures for MMS messaging.
62351-7: Network and System Management (NSM) | Defines standardized objects and alarms for monitoring the security status of power system devices | The QA system supports IEC 62351-7 [6] by ingesting and interpreting standardized security monitoring data, such as IDS alerts and equipment metrics, integrating them into its ontology-based knowledge base to answer queries on security events and system health.
62351-8: Role-Based Access Control (RBAC) | Specifies a standardized RBAC scheme for power system operations (user roles, permissions) | The QA system enforces strong user authentication and role-based access control (RBAC) in line with IEC 62351-8 [6], ensuring only authorized users can access sensitive functions or data, and fully adheres to existing utility access control policies.
62351-9: Key Management | Covers management of cryptographic keys and digital certificates for secure communications | The QA system follows IEC 62351-9 [6] by using the existing Public Key Infrastructure for certificate and key management, ensuring secure communications without introducing any ad hoc or insecure key handling practices.
62351-10: Security Architecture (Guidelines) | Provides overall security architecture best practices (defense in depth, network segmentation, minimal exposure) | The QA system adheres to IEC 62351-10 [6] by applying defense-in-depth principles: it is on-premises, cloud-independent, runs with minimal privileges, encrypts data at rest, and is isolated behind firewalls, minimizing attack surface and ensuring layered security throughout its lifecycle.
62351-14: Security Event Logging | Recommends standardized logging for security events to support auditing and forensic analysis | The QA system maintains tamper-resistant audit logs of all queries and actions, including timestamps, user IDs, and accessed resources. These logs can be mapped to the upcoming IEC 62351-14 [6] standard, supporting SIEM integration and ensuring auditability aligned with industry best practices.
Table 3. Summary of response accuracy, latency, and resource usage for local LLMs.
Performance Metric | OpenChat-3.5-Q5 (5-Bit) | OpenChat-3.5-Q8 (8-Bit)
Semantic Accuracy | High (expert verified) | High (expert verified)
Average Response Latency | 3–4 s/query | 10–12 s/query
CPU Utilization (peak) | ~60–70% | ~70–80%
Memory Usage | Moderate (<75% RAM) | Moderate (<75% RAM)
Suitability | Real-time, latency-critical | High-accuracy applications
Table 4. Summary of retrieval accuracy and semantic precision results.
Evaluation Metric | Cosine Similarity Only | With Semantic Re-Ranking
Top-1 Relevance Accuracy (30 queries) | 87% | 93%
Top-3 Relevance Coverage | 97% | 100%
Manual Expert Verification Needed | Moderate | Minimal
Table 5. Comparison table of Q5 models for the ontology-based QA system.
Question | Baseline-Q5 Answer | Ontology-Q5 Answer
Q1: “What does IEC 62351 [6] specify for securing substation communications?” | “IEC 62351 [6] is a standard for securing substation communications. It recommends using encryption and strong authentication to protect substation data.” | “IEC 62351 [6] is a multi-part standard for power system cybersecurity. For substation communications, it prescribes measures like encrypting SCADA protocol traffic (for example, using TLS for network communications) and enforcing strong authentication and role-based access control for operators.”
Q2: “How can an RTU in a smart grid be compromised, and how to prevent it?” | “An RTU can be compromised via network attacks or malware. To prevent this, use firewalls and keep its firmware updated.” | “RTUs can be compromised by network-based exploits (for example, an attacker intercepting or spoofing control commands) or through malware that targets the device’s firmware. To prevent this, one should secure the RTU’s network (using firewalls and intrusion detection), strengthen its communication protocols (for instance, apply IEC 62351-5 [6] to authenticate DNP3 messages), regularly update and verify firmware integrity, and enforce strict access controls on the device (using IEC 62351-8 [6] role-based access). By combining these measures, the RTU becomes much more resilient to cyber attacks.”
Q3: “In the 2020 malware incident report, what were the main security failures?” | “The report says the attacker bypassed the firewall and that critical patches were not applied in time.” | “The incident analysis found two main failures: poor network segmentation between the corporate IT and OT environments (which allowed the malware to spread into SCADA) and unpatched firmware on certain RTUs (missing critical updates that the malware exploited). It also noted a monitoring gap: the substation’s intrusion detection system failed to detect the breach. These weaknesses violated the defense-in-depth principle and enabled the attack to succeed.”
Table 6. Comparison table of Q8 models for the ontology-based QA system.
Question | Baseline-Q8 Answer | Ontology-Enhanced Q8 Answer
Q1: “What does IEC 62351 [6] specify for securing substation communications?” | “IEC 62351 [6] provides security for communication protocols in power systems. It includes using TLS encryption for TCP/IP and defining access control.” | “IEC 62351 [6] is a multi-part standard addressing power system cybersecurity. For substation communications, it mandates measures such as encrypting SCADA protocols (IEC 62351-3 [6] requires TLS for TCP/IP-based protocols), adding secure authentication for legacy protocols (IEC 62351-5 [6] covers DNP3/IEC 60870-5 [6]), and implementing role-based access control for operators (IEC 62351-8 [6]). Together, these provisions ensure end-to-end confidentiality and integrity in substation data exchange.”
Q2: “How can an RTU in a smart grid be compromised, and how to prevent it?” | “An RTU can be compromised via network attacks or malware. For prevention, use firewalls and update firmware.” | “Remote Terminal Units can be compromised through methods such as communication protocol exploits (for example, intercepting or spoofing DNP3 commands) or even physical tampering. Notorious malware such as Stuxnet has targeted RTUs by exploiting firmware vulnerabilities. To prevent such compromises, multiple layers are needed: network security (firewalls and intrusion detection on substation networks), protocol security enhancements (applying IEC 62351-5 [6] to authenticate DNP3 messages so attackers cannot spoof commands), regular firmware updates and integrity checks (to thwart malware), and strict access controls so only authorized personnel can reconfigure the RTU (per IEC 62351-8 [6]). By combining these measures, the RTU becomes much more resilient to cyber attacks.”
Q3: “In the 2020 malware incident report, what were the main security failures?” | “The report notes that an attacker bypassed the firewall and there was a lack of timely patching of systems.” | “According to the 2020 incident analysis, the malware infiltrated the plant network due to two key failures: inadequate network segmentation (the corporate IT and OT networks were bridged, allowing the malware to spread to SCADA) and outdated firmware on certain RTUs (missing critical patches that the malware exploited). In addition, there was no effective intrusion alarm: the substation’s IDS did not detect the malware’s activity (a gap in IEC 62351-7 [6] monitoring). These failures meant that the defense-in-depth principle was violated, enabling the attack to succeed.”

Share and Cite

MDPI and ACS Style

Lee, K.; Yang, S.; Jeong, J.; Lee, Y.; Shin, D. Enhancing Security and Applicability of Local LLM-Based Document Retrieval Systems in Smart Grid Isolated Environments. Electronics 2025, 14, 3407. https://doi.org/10.3390/electronics14173407

