Abstract
This paper introduces an LLM-driven design-time workflow for Internet of Things (IoT) and network management system development that combines the generative and summarization capabilities of Large Language Models (LLMs) with the formal rigor of Model-Driven Engineering (MDE). The key novelty lies in grounding LLM-assisted topology design and network management, including reasoning about configuration code, in formally verifiable models, which enables security- and safety-aware decisions by design with improved trust and explainability compared with black-box runtime solutions. The approach relies on activity-diagram-based models that provide formal semantics for capturing control flows, decision points, and interactions among IoT devices, edge nodes, and network management components, supporting systematic functional safety validation. Design-time security analysis is realized through MDE combined with Object Constraint Language (OCL) rules, allowing explainable detection of misconfigurations, policy violations, and potential vulnerabilities before deployment. The workflow is evaluated using representative IoT and mobile network management scenarios, demonstrating enhanced effectiveness and an up to 15-fold reduction in detection and corrective action time for critical tasks.
1. Introduction
Since the rapid rise of Generative Artificial Intelligence (GenAI) in recent years, its foundational models and methodologies, most notably Large Language Models (LLMs) [1], Vision Language Models (VLMs) [2], and Retrieval-Augmented Generation (RAG) frameworks [3], have been adopted with varying degrees of success across an increasingly wide spectrum of domains [4]. Applications now span manufacturing, blockchain systems, cybersecurity, and software engineering, each leveraging GenAI’s capacity to address tasks that were previously highly resource-intensive or infeasible using conventional algorithmic approaches.
LLMs in particular have emerged as a core enabler for intelligent automation, due to their strong capabilities in language understanding, semantic interpretation, structured reasoning, and text generation [1]. Their ability to synthesize complex unstructured inputs and provide context-aware outputs positions them as powerful tools for augmenting human decision making. Building on this, VLMs extend the functional scope of LLMs by incorporating image-based and other visual modalities [2]. This multimodal fusion opens new opportunities for automation in visually intensive workflows, such as network-topology interpretation, anomaly detection, and system visualization. In parallel, RAG methodologies have become fundamental for enhancing GenAI scalability and reliability, as they enrich model prompts with domain-specific, up-to-date contextual information retrieved from large external knowledge sources [3]. This retrieval-driven enhancement not only improves prompting efficiency but also makes it possible to account for privacy constraints, regulatory requirements, and organizational governance policies.
Additionally, a prominent trend in GenAI research is the emergence of agentic approaches, where complementary AI models are orchestrated to perform coordinated reasoning, planning, and decision making [5,6]. These multi-component systems aim to replicate certain aspects of human-like problem solving, improving transparency, explainability, and controllability, key factors in fostering user trust and acceptance [6]. Likewise, multi-agent GenAI architectures are being increasingly explored for tasks requiring robustness, cross-validation, or collaborative inference across several autonomous agents [5,6]. Such systems can evaluate one another’s outputs, mitigate risks of model hallucination, and significantly elevate the overall quality and reliability of the generated solutions. As a result, GenAI-based technologies are also being adopted within highly sensitive, safety-critical operational environments, such as telecommunications networks, infrastructure management platforms, and large-scale network operations centers [1,6].
At the same time, the widespread adoption of IoT and underlying mobile network technologies has elevated modern communication, automation, and data-driven decision making across industries. As these systems increasingly support critical services such as healthcare, transportation, smart cities, and industrial control, their reliability and trustworthiness have become essential. The high level of connectivity, combined with the large number of heterogeneous devices, also increases exposure to cyber threats and system failures, making robust design and management a necessity rather than an option [7]. Therefore, security and functional safety play a vital role in ensuring dependable IoT and mobile network operations [8]. Security protects devices, communication channels, and network infrastructure from unauthorized access, cyberattacks, and data breaches, safeguarding sensitive data and maintaining service availability. Functional safety, in turn, ensures that systems behave predictably and move to safe states in the presence of faults, errors, or unexpected conditions, particularly in safety-critical applications where failures may cause physical harm or significant damage. Together, security and functional safety address both intentional attacks and accidental failures, forming the foundation for resilient, safe, and trustworthy IoT and mobile network management systems.
This paper contributes to this growing body of work by examining the applicability and potential of state-of-the-art GenAI methodologies within the IoT domain, with a particular emphasis on network management processes and operational workflows. Given the inherent heterogeneity, scale, and complexity of modern communication infrastructures, which span multiple technologies, topologies, and service models, GenAI has significant potential to accelerate operational procedures, enhance automation, and reduce the time and technical expertise required for routine and specialized tasks [9]. In addition, the recent literature has emphasized the importance of enhancing IoT system manageability and safety through functional domain separation and management interface isolation [10]. Therefore, our objective is to evaluate how GenAI-enabled frameworks can support or automate key network management procedures, especially with respect to security and functional reliability.
The proposed approach is motivated by the following challenges observed in current IoT and network management practices:
- Predominance of runtime, black-box AI solutions, which provide limited formal guarantees, weak explainability, and reduced trust in safety- and security-critical environments.
- Lack of design-time verification mechanisms capable of proactively identifying security misconfigurations, policy violations, and functional safety issues before deployment.
- Limited integration between GenAI capabilities and formal engineering methods, resulting in AI-assisted decisions that are difficult to validate, justify, or audit.
- Growing complexity and heterogeneity of IoT and mobile network infrastructures, which increases the likelihood of human error during topology design and management configuration.
- Need for trustworthy and explainable AI-assisted workflows that can be safely adopted in regulated and mission-critical networked systems.
To address these challenges, this paper makes the following key contributions:
- An LLM-driven, design-time workflow for IoT and network management system development that shifts security and functional safety analysis from reactive runtime mechanisms to proactive by-design validation.
- A novel integration of LLMs with Model-Driven Engineering (MDE), grounding LLM-assisted topology design and management-code reasoning in formally verifiable models to enhance trust and explainability.
- An activity-diagram-based modeling approach with formal semantics to capture control flows, decision points, and interactions among IoT devices, edge nodes, and network management components.
- A design-time security analysis framework combining MDE with Object Constraint Language (OCL) rules to enable explainable detection of misconfigurations, policy violations, and potential vulnerabilities prior to deployment.
- Evaluation using representative IoT and mobile network management scenarios, including both locally deployable and proprietary solutions, demonstrating improved effectiveness and reduction in detection and corrective action time.
The remainder of this paper is organized as follows. Section 2 first reviews related works, the underlying network management notations, and our earlier publications in order to identify the research gaps and motivate the present study, and then provides an overview of the proposed LLM-empowered security-aware network management framework. Section 3 presents the two considered case studies and reports the results of applying the LLM-empowered framework to ensure security and functional safety by design in IoT infrastructures. Section 4 discusses the achieved outcomes and compares them with previous works. Finally, Section 5 presents the concluding remarks, summarizing the implications of our findings and discussing the broader prospects for GenAI adoption in IoT and mobile telecommunications.
2. Materials and Methods
2.1. Related Works
The contributions presented in this paper build upon and extend a series of our earlier research efforts. In our prior work [6], we investigated how MDE can be integrated with LLMs to support automated network design and experimentation. Employing MDE offers several advantages. First, it provides a compact and structured representation of experiment configurations, which can be retained within an AI agent’s memory and used to guide subsequent decision making and management operations. Second, the use of explicit models introduces a verifiable intermediate layer within the workflow, thereby enhancing transparency and strengthening human trust in the downstream outputs produced by GenAI-enabled components. Third, because the generated models are human-readable, they support direct human intervention, review, and iterative refinement. Our implementation relies on the Eclipse Modeling Framework (EMF) and Ecore metamodeling environment, complemented with OCL specifications to ensure model correctness, as detailed in [11].
Building on these foundations, we explored the integration of retrieval-augmented generation (RAG) in [12] to efficiently process and contextualize large bodies of textual information, including network standards, architectural specifications, and technical reference documents. This addition was shown to improve GenAI’s ability to handle complex, domain-specific inputs within automated network management workflows. In parallel, our work in [13] introduced a metamodeling-based abstraction layer capable of automatically capturing key architectural and topological characteristics from textual descriptions, enabling higher-level automation of network experimentation tasks.
We further expanded the scope of GenAI-driven automation by examining the role of Vision Language Models (VLMs) in performance evaluation and visual data extraction. As demonstrated in [14], VLMs can effectively interpret plots, diagrams, and other graphical performance indicators, enabling automated extraction of parameters and metrics that traditionally require manual inspection. In addition, our initial exploration of agentic frameworks for autonomous network and IoT system management was reported in [6,15], where we introduced multi-agent GenAI paradigms capable of coordinating tasks, validating results, and improving operational robustness. In [16], we proposed a GenAI-based security framework tailored for IoT environments. This framework integrates LLM-driven vulnerability assessment (based on source code, logs, and configuration analysis) with automated generation of corrective actions, such as command sequences and configuration updates that directly modify network behavior. Finally, in [17], we aggregated our contributions on LLM-driven network constraint modeling and automated code generation.
A consolidated overview of key contributions on related topics featuring authors of this paper is provided in Table 1.
Table 1.
Overview of authors’ previous works on GenAI in network management and experimentation.
In contrast, Table 2 surveys comparable external research on LLM-based security-aware IoT infrastructure and network management, highlighting the methodologies employed, the goals, the specific case studies investigated, and the underlying models. Most of the covered works address runtime aspects, which leaves a gap in design-time analysis. This paper therefore aims to bridge that gap with a complementary approach that provides the means to ensure security by design, building upon our previous works on GenAI-enabled network management [6,10,11,12,13,14,15,16,17].
Table 2.
Overview of external related works on GenAI in network management and experimentation.
2.2. Background
In this subsection, we cover the notations behind the software artifacts (configurations and executable commands) leveraged for programmatic network management in our case study. YANG and NETCONF together form a powerful and complementary framework for managing modern communication networks, and their benefits extend naturally into IoT. In this paper, we combine them to enable highly automated network management operations within complex IoT infrastructures by manipulating program code using LLMs.
YANG (Yet Another Next Generation) [24] is a modeling language widely used in telecommunications and networking to define the structure of configuration data, operational state information, and network services. Through YANG models, operators can describe how devices should be configured, how their status, counters, and alarms are monitored, and how complex services such as VPNs, quality of service, routing, and 5G network slicing are represented in a standardized way. By serving as a common schema, YANG promotes consistency, reduces vendor dependency, and enables automation across increasingly software-driven networks.
NETCONF (Network Configuration Protocol) [25] complements YANG by providing a secure and reliable mechanism to access and manipulate the data defined by these models. Operating typically over SSH and using XML-encoded messages, NETCONF allows management systems to retrieve device state, apply configuration changes, validate updates, and safely roll back to previous configurations when needed. This standardized, programmatic approach replaces manual, vendor-specific command-line interfaces, making large-scale automation and error reduction feasible in complex telecom environments.
When applied to IoT systems, the combination of YANG and NETCONF becomes especially valuable. IoT deployments often involve thousands or millions of heterogeneous devices that require consistent configuration, monitoring, and lifecycle management. YANG can be used to model IoT device capabilities, sensor data, connectivity parameters, and security settings, while NETCONF provides a secure channel to remotely configure and manage these devices at scale. Together, they enable automated provisioning, centralized monitoring, and reliable updates, supporting scalable, interoperable, and manageable IoT networks that align with modern SDN, NFV, and 5G-based infrastructures.
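As a hedged illustration of how a NETCONF request carries YANG-modeled data, the following Python sketch builds an `<edit-config>` RPC using only the standard library. The base namespace is the one defined for NETCONF; the `sensor-node` container, the `urn:example:iot-sensor` namespace, and the leaf names are hypothetical placeholders for whatever a concrete YANG module would define, and the sketch assumes a device that supports the candidate datastore.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"  # NETCONF base namespace
DEVICE_NS = "urn:example:iot-sensor"  # hypothetical YANG module namespace


def build_edit_config(message_id: str, leaves: dict[str, str]) -> str:
    """Return an <edit-config> RPC that targets the candidate datastore."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    edit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}edit-config")
    target = ET.SubElement(edit, f"{{{NETCONF_NS}}}target")
    ET.SubElement(target, f"{{{NETCONF_NS}}}candidate")
    config = ET.SubElement(edit, f"{{{NETCONF_NS}}}config")
    # Hypothetical container; a real device defines this in its YANG module.
    node = ET.SubElement(config, f"{{{DEVICE_NS}}}sensor-node")
    for name, value in leaves.items():
        ET.SubElement(node, f"{{{DEVICE_NS}}}{name}").text = value
    return ET.tostring(rpc, encoding="unicode")


payload = build_edit_config("101", {"report-interval": "30", "tls-enabled": "true"})
print(payload)
```

The sketch only constructs the message; sending it requires a NETCONF session (typically over SSH), which is outside its scope.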
2.3. LLM-Based Network Management Code Analysis
The proposed approach integrates Retrieval-Augmented LLMs with MDE to support secure and reliable system design during the early development of complex telecommunications infrastructure. The goal is to automate the extraction of relevant network commands, reason about potentially hazardous or insecure management activity sequences, and validate them using formal design rules before deployment.
Figure 1 summarizes the workflow; each step described below corresponds to a numbered element in the figure. Steps marked red can optionally involve human input or feedback to ensure the correctness of the subsequent steps.
Figure 1.
LLM-based IoT network management code functional safety analysis workflow.
Inputs preparation (1a/1b): A network command catalog and protocol descriptor catalog (1a, 1b) are treated as authoritative structured sources of truth (command names, allowable parameters, protocol semantics, and message formats). They are forwarded to dedicated parsers that produce machine-readable representations containing only the information relevant for subsequent security and functional safety analysis. These catalogs are considered non-exclusive, as telecom software may operate across multiple abstraction layers. At higher abstraction levels—typical for middleware or orchestration frameworks—command-level representations are used (1a). At lower layers—such as device drivers, network functions, or protocol handlers—raw protocol messages may also appear directly within the code (1b).
RAG-based context construction (2a/2b, 3): In practice, command catalogs contain thousands of entries, and protocol descriptors may include numerous message types with variations, exceeding LLM context limits and making token usage costly. Therefore, a RAG layer indexes catalogs and prior artifacts, ensuring that LLM outputs remain grounded in concrete, up-to-date data. The RAG retrieves only the specific catalog segments relevant to the telecom component’s configuration currently being analyzed.
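The retrieval idea can be sketched in a few lines of Python. This is a simplified stand-in that ranks catalog entries by token overlap with the analyzed code; it is not the retrieval method used in the paper, and a production RAG layer would more plausibly use embedding-based similarity. The catalog entries and the `retrieve` helper are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase alphanumeric tokens; '-' and '_' both act as separators."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(catalog: list[dict], code: str, top_k: int = 2) -> list[dict]:
    """Return the top_k catalog entries sharing the most tokens with the code."""
    code_tokens = tokenize(code)
    scored = []
    for entry in catalog:
        entry_tokens = tokenize(entry["name"] + " " + entry["description"])
        overlap = len(entry_tokens & code_tokens)
        if overlap:
            scored.append((overlap, entry))
    # Stable sort: entries with equal scores keep their catalog order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored[:top_k]]


# Hypothetical command catalog excerpt.
CATALOG = [
    {"name": "edit-config", "description": "modify a configuration datastore"},
    {"name": "get-config", "description": "retrieve a configuration datastore"},
    {"name": "commit", "description": "apply the candidate configuration"},
    {"name": "kill-session", "description": "force termination of a session"},
]

code = 'mgr.edit_config(target="candidate", config=payload)\nmgr.commit()'
context = retrieve(CATALOG, code)
print([entry["name"] for entry in context])
```

Only the retrieved entries would then be placed in the prompt, which keeps token usage bounded regardless of catalog size.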
Network command/protocol extraction (4–8): Prompt 1 extracts commands and protocol-level interactions used within the provided code. This produces a focused set of relevant commands/messages for the current security analysis. A rule-based or formal validator ensures that the extracted commands/messages exist in the catalogs and conform to expected formats and parameter ranges before further processing. This prevents constructs with potentially malicious or invalid values from propagating through the system. The validated command set forms the basis for management activity sequence reasoning in subsequent steps. Prompt 1 is given below.
Prompt 1.
Network command/protocol message extraction—“You are extracting a list of network commands and protocol messages based on the given source code {code}. For each extracted entry, include name, type, value/parameters, and protocol”.
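A rule-based validator of the kind described above can be sketched as follows. The catalog layout, command names, and parameter ranges are assumptions made for illustration; the entry fields mirror those requested in Prompt 1 (name, parameters, protocol).

```python
# Hypothetical catalog: command name -> protocol and allowed integer ranges.
CATALOG = {
    "set-mtu": {"protocol": "NETCONF", "params": {"mtu": (576, 9000)}},
    "set-report-interval": {"protocol": "NETCONF", "params": {"seconds": (1, 3600)}},
}


def validate_entry(entry: dict) -> list[str]:
    """Return a list of problems for one extracted entry (empty list = valid)."""
    name = entry.get("name")
    spec = CATALOG.get(name)
    if spec is None:
        return [f"unknown command: {name!r}"]
    errors = []
    if entry.get("protocol") != spec["protocol"]:
        errors.append(f"{name}: unexpected protocol {entry.get('protocol')!r}")
    for param, value in entry.get("parameters", {}).items():
        bounds = spec["params"].get(param)
        if bounds is None:
            errors.append(f"{name}: unexpected parameter {param!r}")
            continue
        low, high = bounds
        if not isinstance(value, int) or not low <= value <= high:
            errors.append(f"{name}: {param}={value!r} outside [{low}, {high}]")
    return errors


extracted = [
    {"name": "set-mtu", "protocol": "NETCONF", "parameters": {"mtu": 1500}},
    {"name": "set-mtu", "protocol": "NETCONF", "parameters": {"mtu": 70000}},
    {"name": "drop-all", "protocol": "NETCONF", "parameters": {}},
]
for entry in extracted:
    print(entry["name"], validate_entry(entry))
```

Entries with a non-empty error list would be rejected before the sequence-construction step.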
Management activity sequence construction (9–11): Prompt 2 generates a management-activity-sequence representation of the code so that intended network behavior is expressed as sequences of management activities, enabling easy identification of insecure or undesirable outcomes. The LLM (with RAG-retrieved context) produces structured management activity sequences, with the aim of identifying potential malicious behavior (e.g., “forged command → routing policy altered → network slice enters unstable state”). The extraction module converts LLM text into a machine-readable management activity sequence representation. For this purpose, PlantUML activity-diagram notation with minor extensions is used because of its simplicity and robustness during iterative LLM-driven extension, as shown in [13]. In this case, Prompt 2 has the following form:
Prompt 2.
“You are updating a PlantUML activity diagram describing a telecom management activity sequence (without comments or explanations) given as {current-activity-sequence}, based on the provided source code or configuration: {code}, and considering {relevant commands/messages}. Each management activity step should include notes specifying: input, input_format, output, output_format”.
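To make the extraction step concrete, the sketch below converts a small PlantUML activity diagram into an ordered, JSON-serializable list of steps. It assumes the common `:activity;` syntax and multi-line `note ... end note` blocks containing `key: value` lines; the diagram content is hypothetical, and the paper's own extensions to PlantUML are not modeled.

```python
import json
import re


def plantuml_to_steps(diagram: str) -> list[dict]:
    """Extract activities (in order) and their key/value notes."""
    steps: list[dict] = []
    in_note = False
    for raw in diagram.splitlines():
        line = raw.strip()
        if in_note:
            if line == "end note":
                in_note = False
            elif steps and ":" in line:
                key, value = line.split(":", 1)
                steps[-1]["notes"][key.strip()] = value.strip()
            continue
        activity = re.fullmatch(r":(.+);", line)
        if activity:
            steps.append(
                {"index": len(steps), "activity": activity.group(1).strip(), "notes": {}}
            )
        elif line.startswith("note") and ":" not in line:
            # Multi-line note attached to the preceding activity;
            # single-line notes ("note right: text") are ignored here.
            in_note = True
    return steps


DIAGRAM = """\
@startuml
start
:authenticate;
note right
  input: credentials
  input_format: XML
end note
:edit-config;
:commit;
stop
@enduml
"""

steps = plantuml_to_steps(DIAGRAM)
print(json.dumps(steps, indent=2))
```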
Functional safety/reliability validation (11–12): Management activity sequences are represented in an MDE environment as formal models (such as state machines, sequence diagrams, activity or domain-specific models). In our approach, PlantUML activity diagrams are converted into JSON using model-to-model transformation for rule checking. Ordering constraints are then verified. For this purpose, we support rules of the form a_i before/after a_j, which specify that activity a_i must occur earlier/later in the execution sequence than activity a_j. These rules are evaluated by identifying the corresponding activities in the modeled event sequence and comparing their sequence indices to verify compliance. Security and reliability rules (Rule 1 … Rule N) represent domain best practices and may be defined manually or generated with LLM support, as described in [16,17]. Typical examples include ensuring that authentication precedes privilege modification, that validation occurs before deployment, or that rollback actions follow failure detection in any analyzed management activity sequence. Despite their simplicity, these rules provide an effective and explainable mechanism for detecting unsafe or inconsistent management behaviors at design time.
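The index comparison behind the before/after rules can be sketched as follows. Two choices here are assumptions rather than details from the paper: the first occurrence of each activity is used, and a rule whose activities are absent from the sequence is reported as not applicable instead of as a violation. The activity names are illustrative.

```python
def check_rule(sequence: list[str], a_i: str, relation: str, a_j: str) -> str:
    """Evaluate 'a_i before a_j' or 'a_i after a_j' on an activity sequence."""
    if relation not in ("before", "after"):
        raise ValueError(f"unsupported relation: {relation!r}")
    if a_i not in sequence or a_j not in sequence:
        return "not-applicable"  # assumption: absent activities do not violate
    i, j = sequence.index(a_i), sequence.index(a_j)  # first occurrences
    holds = i < j if relation == "before" else i > j
    return "pass" if holds else "fail"


sequence = ["authenticate", "modify-privilege", "deploy", "validate"]
rules = [
    ("authenticate", "before", "modify-privilege"),
    ("validate", "before", "deploy"),
    ("rollback", "after", "detect-failure"),
]
report = {f"{a} {rel} {b}": check_rule(sequence, a, rel, b) for a, rel, b in rules}
print(report)
```

Because each verdict names the rule and the two activities involved, a failing result points directly at the offending part of the sequence.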
Feedback loops (6, 12): If validation fails, feedback returns to the developer through the indicated steps. This iterative loop connects LLM-generated hypotheses with formal verification and intermediate outputs before deployment. Corrective actions may be performed manually or automatically. Prompt 3, which can be used for automated correction, is given below.
Prompt 3.
“Based on the code analysis outcome {result}, correct the following code {code} to eliminate the detected issues”.
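One possible shape of the automated corrective loop is sketched below. The analyzer and the LLM are passed in as callables and replaced by stubs, so the example makes no assumption about a particular model or API; the `repair_loop` name, the round limit, and the stub behavior are hypothetical.

```python
from typing import Callable

PROMPT_3 = (
    "Based on the code analysis outcome {result}, correct the following "
    "code {code} to eliminate the detected issues"
)


def repair_loop(
    code: str,
    analyze: Callable[[str], list[str]],
    ask_llm: Callable[[str], str],
    max_rounds: int = 3,
) -> tuple[str, bool]:
    """Alternate analysis and LLM correction until no issues remain."""
    for _ in range(max_rounds):
        issues = analyze(code)
        if not issues:
            return code, True
        code = ask_llm(PROMPT_3.format(result="; ".join(issues), code=code))
    # Round limit reached: report whether the last proposal is clean.
    return code, not analyze(code)


# Stubs standing in for the real analyzer and LLM (both hypothetical).
def stub_analyze(code: str) -> list[str]:
    return ["insecure protocol: telnet"] if "telnet" in code else []


def stub_llm(prompt: str) -> str:
    return "connect(host, protocol='ssh')"


fixed, ok = repair_loop("connect(host, protocol='telnet')", stub_analyze, stub_llm)
print(fixed, ok)
```

Bounding the number of rounds keeps the loop from running indefinitely when the model cannot resolve an issue, at which point the result can be handed back to the developer.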
Deployment and testing (13–14): Validated artifacts (telecom service modules, orchestration scripts, and test cases) are deployed to the target telecom device, network function, or testbed platform, either directly or through remote update mechanisms. The generated management activity sequences are then executed and evaluated in a telecommunications testbed to observe concrete security or performance implications, enabling integration testing, simulation, and iterative refinement.
2.4. Secure Topology by Design
The workflow presented in this subsection provides an automated, security-by-design environment for telecommunications and distributed communication systems behind IoT infrastructure. By combining LLM-assisted modeling, formal security analysis, and automated YANG and NETCONF generation, it enables early detection of communication-related security flaws, iterative refinement of system architectures, and seamless transition from validated design to deployable configuration. As a result, complex communication systems can be designed, analyzed, and securely configured in a unified and systematic manner. The approach builds upon prior work on instance model generation, OCL-based constraint formalization, and metamodel-driven transformations [13,17]. The main objective is to ensure that communication interfaces, protocols, and management configurations satisfy security requirements by design, before implementation and deployment. The main steps of the proposed workflow are shown in Figure 2 and described below. Dashed lines denote optional steps, while red boxes denote steps in which human intervention or feedback can be taken into account.
Figure 2.
LLM-driven analysis of IoT system security aspects in design time.
System modeling (1): The process begins with user-provided descriptions of the target system architecture, including system components (devices, services, gateways, and controllers); communication relationships and data flows; network technologies and transport mechanisms; communication and application layer protocols; and assumptions related to authentication, encryption, access control, and isolation. The considered scope and covered system modeling aspects are defined with respect to the metamodel, which can be either manually crafted by a domain expert or automatically created using an LLM, starting from reference architecture documents (as presented in [13], denoted as steps 0a and 0b). These inputs are used to construct a structured prompt for system model instance generation (Prompt 4).
Prompt 4.
“Update the system: {system instance} based on the provided system metamodel: {metamodel} and user-defined requirements: {user input}. Identify system components, communication endpoints, channels, protocols, and relevant configuration attributes. Ensure that the generated model conforms to the metamodel and reflects the intended communication architecture”.
LLM-assisted instance model generation (2): Using the structured prompt, the LLM generates a concrete instance model conforming to a predefined telecommunication system metamodel. The generated model captures the intended communication topology and security properties of the system.
System feedback and iterative refinement (3): The generated system model is visualized using a PlantUML class diagram notation and presented to the user. This enables inspection of system topology and protocol usage, validation of architectural assumptions, and identification of missing or insufficient security mechanisms. The workflow supports iterative refinement, allowing the system model to be updated until it accurately reflects the intended architecture.
Security guidelines integration (4–5): Security-related guidelines relevant to IoT systems and underlying networks are provided in textual form. These may include secure communication requirements, restrictions on protocol usage, isolation and segmentation rules, authentication and authorization constraints, and secure management interface requirements. Using Prompt 5, these guidelines are automatically transformed into formal OCL rules aligned with the system metamodel. This step converts informal security knowledge into precise, machine-checkable constraints.
Prompt 5.
“Generate formal security constraints with respect to the provided system metamodel: {metamodel}, based on the given security and telecommunications guidelines: {reference architecture/security guidelines}. Express each requirement as a precise, verifiable OCL rule that can be evaluated against system model instances. Ensure that the constraints capture communication-level security properties such as protocol usage, authentication, encryption, and access control”.
Model-driven security analysis (6–7): The system model is transformed into an EMF/Ecore-compliant representation, enabling formal evaluation. The analysis process checks the system model against all OCL security constraints in order to detect communication-level security violations. Moreover, it also identifies the components, channels, or protocols responsible for each violation. The outcome is a pass/fail analysis report, highlighting violated rules in a form that is both machine-readable and understandable by system designers.
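The structure of such an analysis report can be illustrated with a Python stand-in for OCL invariants. This is not the EMF/Ecore and OCL implementation used in the paper; the `Channel` class, its attributes, and the two rules are hypothetical, and each predicate plays the role of an invariant evaluated against every model element.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Channel:
    """Hypothetical metamodel element describing a communication channel."""
    name: str
    protocol: str
    encrypted: bool
    is_management: bool = False


# Each rule mimics an invariant such as (illustrative OCL):
#   context Channel inv: self.is_management implies self.protocol = 'SSH'
RULES: dict[str, Callable[[Channel], bool]] = {
    "management-uses-ssh": lambda c: not c.is_management or c.protocol == "SSH",
    "channel-encrypted": lambda c: c.encrypted,
}


def analyze(channels: list[Channel]) -> dict[str, list[str]]:
    """Map each rule to the names of violating elements (empty list = pass)."""
    return {
        rule: [c.name for c in channels if not holds(c)]
        for rule, holds in RULES.items()
    }


model = [
    Channel("sensor-gw", "MQTT", encrypted=True),
    Channel("gw-mgmt", "Telnet", encrypted=False, is_management=True),
    Channel("gw-cloud", "HTTP", encrypted=False),
]
report = analyze(model)
print(report)
```

A rule passes when its list is empty; otherwise the listed element names identify which channels are responsible for the violation.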
Automated model correction (8) [optional]: If security violations are identified, the workflow supports an optional corrective loop. Using Prompt 6, the LLM can interpret the failing security rules, propose architectural or configuration-level corrections, and update the system model accordingly. The updated model is then re-analyzed until all security requirements are satisfied.
Prompt 6.
“Update the system model: {system instance} based on the provided analysis results: {passing/failing rules}. Identify elements that violate security constraints and propose modifications that resolve the violations while preserving the overall system intent. Ensure that the updated model remains compliant with the system metamodel: {metamodel} and addresses all reported security issues”.
Code generation (9): Once the system architecture passes all security checks, the validated model is used to automatically generate YANG data models. The generated YANG models describe communication interfaces and endpoints, protocol and service configuration parameters, security policies such as credentials, keys, and access control rules, and monitoring and management capabilities. Each managed component is mapped to a corresponding YANG module or submodule, ensuring consistency between validated design models and configuration representations. Additionally, from the generated YANG models, NETCONF configuration artifacts are derived. These artifacts enable secure configuration of networked components, enforcement of authentication and authorization policies, as well as consistent deployment of validated communication and security settings.
Prompt 7.
“Generate YANG model and NETCONF configuration that would reflect the changes to update current system: {system instance} based on the analysis results: {passing/failing rules}”.
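In the paper this step is performed by the LLM; as a purely illustrative counterpart, the sketch below shows the kind of mapping involved by rendering a validated component as a minimal YANG 1.1 module with a template. The `to_yang` helper, the `urn:example:` namespace, the component name, and the leaves are hypothetical.

```python
def to_yang(module: str, prefix: str, leaves: dict[str, str]) -> str:
    """Render a minimal YANG 1.1 module with one container of typed leaves."""
    lines = [
        f"module {module} {{",
        "  yang-version 1.1;",
        f'  namespace "urn:example:{module}";',  # hypothetical namespace
        f"  prefix {prefix};",
        "",
        f"  container {module} {{",
    ]
    for name, yang_type in leaves.items():
        lines += [f"    leaf {name} {{", f"      type {yang_type};", "    }"]
    lines += ["  }", "}"]
    return "\n".join(lines)


yang_text = to_yang(
    "iot-gateway",
    "gw",
    {"report-interval": "uint16", "tls-enabled": "boolean"},
)
print(yang_text)
```

Generating one module per managed component, as described above, keeps the mapping between design model elements and configuration schemas traceable.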
Configuration deployment (10): The NETCONF payloads can be directly applied to the target infrastructure, ensuring that deployment-level configurations faithfully implement the security-validated architecture.
2.5. Workflow Implementation
The implementation of the workflow relies on the flexible Node-RED environment (version 4.1.2) [26], building upon good practices from our previous work in [27]. Additionally, for LLM interfacing, we use the node-red-contrib-ai-intent extension (version 3.2.4) [28], which provides elements specialized for prompting and handling of generated results.
Node-RED is an open-source, flow-based programming tool that simplifies connecting hardware devices, APIs, and online services. Instead of writing large amounts of code, users build applications by wiring together visual “node” elements in a browser-based editor. Each node represents a specific task (such as reading data from a sensor, calling a web API, processing information, or sending messages), and data flows between them in real time. Node-RED is especially popular in IoT projects because it works well with devices like Raspberry Pi, microcontrollers, and cloud services. It supports many communication protocols (such as HTTP and MQTT) and has a large community library of reusable nodes, which makes development faster and more flexible. Overall, Node-RED is valued for its simplicity, visual clarity, and ability to rapidly prototype and deploy integrations, from smart home systems to data dashboards and automation workflows. Furthermore, node-red-contrib-ai-intent is a Node-RED extension consisting of a set of nodes designed to enhance automation workflows with AI and intent-based interactions. The package allows users to register custom intents and enables LLM-powered chatbot capabilities directly within Node-RED flows. It includes nodes for interacting with LLMs (such as OpenAI, Gemini, or local AI implementations) and workflows that can trigger registered intents dynamically. A screenshot of the environment showing the activity diagram construction workflow is given in Figure 3.
Figure 3.
Node-RED workflow implementation.
Node-RED is well suited for rapid prototyping and functional exploration due to its ease of use, flexibility, and extensibility, but its use in real-time systems and simulations introduces notable performance limitations. Its event-driven execution model on top of Node.js incurs message passing and scheduling overhead that can become a bottleneck in large or high-frequency simulations. Moreover, the lack of real-time guarantees and execution determinism, due to the JavaScript event loop, asynchronous callbacks, and garbage collection, limits its suitability for timing-sensitive or hard safety-critical scenarios. As system complexity grows, increased CPU and memory contention may lead to latency, message backlogs, or reduced responsiveness. However, in our case Node-RED is used only at design time, where strict real-time constraints do not apply, so these limitations are acceptable.
The main functionality of the tool is organized as an API, which is described in Table 3. The proposed framework exposes a set of RESTful API endpoints that support the construction, analysis, and validation of model-driven networked systems. The APIs enable the derivation of a metamodel from a reference architecture, the instantiation of system models based on requirements, and the generation of formal security rules from guidelines. Additional endpoints provide script analysis, rule compliance checking, and configuration value validation. Graphical representations of the current behavior model and metamodel can be retrieved on demand, allowing both automated verification and human-readable inspection of the system state.
Table 3.
Overview of Node-RED workflow API calls.
As can be noted from Table 3, the overall workflow consists of a sequence of modular design-time API calls of varying complexity. In what follows, we will consider their complexity expressed in Big O notation. Relevant elements will be denoted as follows: M—number of metamodel elements; S—number of system components or configuration entities; R—number of rules; A—number of activities extracted from scripts; P—number of configuration parameters or values; and (Ref)Req—number of (reference) textual requirements.
- constructMetamodel operates on the freeform text about reference architecture and scales linearly with the number of architectural requirements, yielding a complexity of O(RefReq).
- constructSystem processes freeform textual requirements and the metamodel to generate an instance model, with complexity O(Req), assuming that the metamodel is fixed at the point of observation.
- construct(Safety/Security)Rules translates textual safety and security guidelines into rules, scaling linearly with the number of input textual rule definitions, O(R).
- analyzeScript parses YANG scripts and constructs activity representations, with complexity proportional to the number of activities, O(A).
- checkRules validates extracted behavior against predefined rules. Since each rule may require scanning the activity set, this step dominates the workflow with O(R × A) complexity.
- analyzeValues evaluates configuration or operational parameters against constraints, scaling linearly with the number of parameters, O(P).
- currentBehaviorModel and currentMetamodel generate visual representations from existing models and scale linearly with model size, O(A) and O(M), respectively.
As all operations are performed at design time, this complexity remains acceptable and enables predictable, explainable, and scalable analysis for functional safety, security, and reliability assurance in network management system design.
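The O(R × A) bound for checkRules can be illustrated with a minimal sketch. It assumes that every rule is a predicate evaluated once per activity; the `Activity` and `Rule` types, the rule names, and the activity names are hypothetical and are not taken from the tool itself.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Activity:
    name: str  # hypothetical activity label extracted from a script


@dataclass
class Rule:
    name: str
    # Returns True when the given activity violates this rule.
    violates: Callable[[Activity], bool]


def check_rules(rules: List[Rule], activities: List[Activity]) -> Tuple[List[Tuple[str, str]], int]:
    """Scan every activity for every rule, which gives R x A evaluations."""
    violations = []
    evaluations = 0
    for rule in rules:
        for activity in activities:
            evaluations += 1
            if rule.violates(activity):
                violations.append((rule.name, activity.name))
    return violations, evaluations


activities = [Activity("receive_report"), Activity("reboot_device"), Activity("send_alert")]
rules = [
    Rule("no_unnamed_activity", lambda a: a.name == ""),
    Rule("no_reboot_activity", lambda a: a.name == "reboot_device"),
]
violations, evaluations = check_rules(rules, activities)
print(violations, evaluations)
```

With two rules and three activities, the loop performs six evaluations. Rules that relate pairs of activities (for example, ordering constraints) would need more than one pass per rule, so this sketch covers only the single-scan case described above.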
Additionally, we also consider RAG-based context construction used to ground LLM reasoning in large command catalogs and protocol descriptors, which may contain thousands of entries and exceed practical LLM context limits. To address this, the workflow employs a two-stage retrieve-and-re-rank pipeline that selectively extracts only the catalog fragments relevant to the telecom component under analysis. Let D denote the number of indexed catalog entries or artifacts, k the number of candidates retrieved in the first stage, r the number of candidates re-ranked (with r ≤ k), L the final prompt length, and T the generated output length. The retrieval stage uses vector-based approximate nearest-neighbor search, yielding a complexity of O(log D) per query when indexed retrieval is employed (or O(D) in the worst case without indexing). This stage returns a small candidate set of size k, significantly reducing the downstream processing cost. The re-ranking stage evaluates semantic relevance over the retrieved candidates and scales linearly with the shortlist size, resulting in O(r) complexity. Since r is typically small and bounded, this step remains efficient even for large catalogs. The context construction and generation stage then operates on the reduced, grounded context and follows transformer inference complexity, approximately O(L² + T²). Because RAG constrains the context to only relevant artifacts, both L and T remain bounded, effectively controlling token usage and inference cost. Overall, the complexity of RAG-based context construction is dominated by O(log D + r + L² + T²) per invocation when indexed retrieval is used. As these operations are also executed at design time and on a bounded context, the approach remains scalable while ensuring accurate, up-to-date, and grounded LLM outputs for configuration analysis.
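As a rough worked illustration of which term dominates, consider purely hypothetical values D = 10⁴, r = 20, L = 2000, and T = 500 (none of these are measurements from this work). Big O notation hides constant factors, so the figures below indicate relative orders of magnitude only.

```latex
\begin{aligned}
\log_2 D &= \log_2 10^{4} \approx 13.3, \\
r &= 20, \\
L^{2} + T^{2} &= 2000^{2} + 500^{2} = 4.25 \times 10^{6}.
\end{aligned}
```

Under these assumptions the generation term exceeds the retrieval and re-ranking terms by several orders of magnitude, which is why bounding L and T through selective retrieval matters more than further tuning of the retrieval stage.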
3. Results
In the first part of the Results Section, scenarios for both functional reliability analysis and security by design are presented. Afterwards, an evaluation for both a commercial LLM service and a locally deployable model aiming at crucial tasks for these scenarios is given.
3.1. Functional Safety Analysis Case Study
To demonstrate a proof of concept for reliability assurance in networked IoT environments, we consider several experimental scenarios based on an IoT edge-gateway system that covers two main aspects: failure handling and maintaining Quality of Service (QoS). For failure handling, the gateway must trigger an alert to the network operations center (NOC) when either the Wi-Fi sensor network or the LoRaWAN network reports a critical device failure. For QoS, the system aims to ensure that the infrastructure works seamlessly under varying conditions, including an increasing number of users and performance drops imposed by changes in the environment. Each scenario is considered from the perspective of the edge-controller decision-making module and contains functional flaws that should be discovered by the proposed LLM-empowered framework.
- Scenario 1: Gateway ignores device failure reports and continues normal data forwarding instead of issuing an alert.
- Scenario 2: Gateway issues alerts based on messages from a non-existent or disabled network interface (e.g., LoRaWAN), while only Wi-Fi sensors are active.
- Scenario 3: Gateway sends an alert even though no failure has yet been detected by either network.
- Scenario 4: The service scales down available compute resources even though the number of users increases, degrading performance instead of stabilizing it.
- Scenario 5: Premature configuration updates can cause unexpected downtime due to unnecessary restarts or reinitialization.
An activity-diagram representation of these network reliability scenarios is depicted in Figure 4. Critical control-flow faults affecting system reliability are highlighted in red within the scenarios’ activity diagrams. In the first scenario, the risk is that the network operations center does not receive a required alert, causing extended downtime of critical IoT devices. Moreover, within the second scenario, the gateway issues alerts using data from the wrong network interface, meaning real device failures may never be reported correctly. In the third scenario, alerts are sent prematurely, causing false alarms and potential overload of the monitoring system. Furthermore, in the case of the fourth scenario, resource underscaling under heavy load reduces service quality, increases latency, and may cause system unavailability. Finally, premature configuration updates can cause unexpected downtime due to unnecessary restarts or reinitialization.
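The control-flow faults in Scenarios 1, 3, and 4 can be sketched as checks over an ordered activity trace. This is a simplified illustration that assumes a linear trace rather than a full activity diagram with branches; the activity names and function names are hypothetical and do not reproduce the rules used in the case study.

```python
from typing import List


def alert_follows_failure(trace: List[str]) -> bool:
    """Scenario 1 check: every failure report is eventually followed by an alert."""
    for i, step in enumerate(trace):
        if step == "failure_reported" and "send_alert" not in trace[i + 1:]:
            return False
    return True


def no_alert_before_failure(trace: List[str]) -> bool:
    """Scenario 3 check: no alert is sent before the first failure report."""
    failure_seen = False
    for step in trace:
        if step == "failure_reported":
            failure_seen = True
        elif step == "send_alert" and not failure_seen:
            return False
    return True


def no_scale_down_after_load_increase(trace: List[str]) -> bool:
    """Scenario 4 check: a user increase is never directly followed by a scale-down."""
    return all(
        not (current == "users_increased" and following == "scale_down")
        for current, following in zip(trace, trace[1:])
    )


# Hypothetical traces: three faulty ones and one that satisfies all checks.
scenario_1 = ["failure_reported", "forward_data"]
scenario_3 = ["send_alert", "failure_reported"]
scenario_4 = ["users_increased", "scale_down"]
correct = ["failure_reported", "send_alert", "users_increased", "scale_up"]

print(alert_follows_failure(scenario_1))              # False
print(no_alert_before_failure(scenario_3))            # False
print(no_scale_down_after_load_increase(scenario_4))  # False
```

Each faulty trace fails exactly the check associated with its scenario, while the `correct` trace passes all three.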
Figure 4.
Functional reliability analysis scenarios activity diagrams.
The underlying rules checked against the activity diagram for each scenario are summarized in Table 4.
Table 4.
Summary of network management functional reliability case study scenarios.
3.2. Secure Topology by Design Case Study
This scenario considers a multi-domain distributed telecommunications system supporting data collection, control, and centralized management across different network zones. The system consists of the following components (as depicted in the metamodel in Figure 5):
Figure 5.
Security by design case study metamodel visualized as PlantUML class diagram: rectangles represent classes (“C” symbol stands for “class”); textual labels inside rectangles are class attributes; arrows denote the relationships between the classes.
- Field Device: Generates operational data and exposes a local control interface.
- Edge Gateway: Aggregates traffic from multiple field devices and performs protocol translation.
- Control Service: Provides centralized control and orchestration functions.
- Management Server: Performs configuration management, monitoring, and software updates.
- External Analytics Service: Receives telemetry data for analysis.
- Network Domains:
  - Local Domain: trusted internal communication;
  - Edge Domain: partially trusted aggregation layer;
  - Core Domain: centralized control and management;
  - External Domain: untrusted third-party services.
The modeled system represents a distributed, multi-tier communication architecture composed of field-level devices, aggregation components, control and management services, and external consumers. Each component plays a distinct role in data production, aggregation, control, and system management, and communicates with other components using explicitly defined communication paths.
The Field Device operates at the lowest level of the system and is responsible for generating operational and telemetry data. These data may include status information, measurements, or event notifications required for higher-level processing. The Field Device establishes a data communication link with the Edge Gateway, through which it transmits data either periodically or in an event-driven manner. This link is intended for internal data transport and is typically confined to a local or trusted network segment. The communication is unidirectional or bidirectional, depending on the use case, but is primarily optimized for efficient data transfer rather than centralized control.
The Edge Gateway acts as a central aggregation and mediation point within the system. It collects incoming data from one or more Field Devices, performs optional preprocessing or filtering, and forwards relevant information to other system components. Beyond simple data forwarding, the Gateway also serves as a security and protocol boundary, ensuring that data leaving the local domain conforms to approved communication protocols and security policies. All cross-domain communication is intentionally routed through the Gateway to prevent uncontrolled direct interactions between system components.
The Control Service provides centralized control logic and coordination functions. It consumes selected data streams from the Gateway and may issue control commands or configuration updates in response. Communication between the Gateway and the Control Service occurs over a secure control channel. This channel is designed to carry command and control messages that influence system behavior. Because of its operational importance, the control channel is required to use authenticated and encrypted communication and is restricted to approved protocols. The Gateway ensures that only validated and authorized messages are forwarded to the Control Service.
The Management Server is responsible for system-wide management tasks such as monitoring, configuration, diagnostics, and lifecycle management. The Gateway communicates with the Management Server over a dedicated management communication link. This link supports operations such as reporting operational status and health metrics, receiving configuration updates, and applying policy changes and software management actions. The management communication is required to use a secure management protocol and must be encrypted to protect sensitive configuration data and credentials. The Gateway acts as the managed endpoint and exposes only explicitly defined management interfaces.
The system also interacts with the External Analytics Service, which consumes selected telemetry data for advanced processing, reporting, or long-term analysis. This service resides in an external or less-trusted domain. To limit exposure and maintain security boundaries, the External Analytics Service does not communicate directly with internal components such as the Control Service or Field Devices. Instead, all telemetry destined for external analysis is routed through the Gateway using a secure telemetry link.
This link enforces encryption and protocol restrictions to protect data in transit and to prevent unauthorized access to internal system components. The overall architecture enforces clear separation between functional domains, including local, edge, core, and external domains. The Gateway is the only component permitted to communicate across domain boundaries, making it a focal point for enforcing security policies.
Each communication path is associated with (1) a specific purpose (data, control, management, or telemetry) and (2) an approved protocol with explicit security properties such as encryption and access control. This structured separation ensures that sensitive management and control operations are protected, external exposure is minimized, and communication-related security requirements can be formally analyzed and validated before deployment.
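The communication constraints described above can be approximated in a few lines of Python as a stand-in for OCL-style invariants. This is only a sketch: the `Link` type, its attribute names, the component identifiers, and the protocol strings are hypothetical and do not reproduce the metamodel or the formal rules of the case study.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Link:
    source: str
    target: str
    source_domain: str  # "local", "edge", "core", or "external"
    target_domain: str
    purpose: str        # "data", "control", "management", or "telemetry"
    protocol: str       # illustrative value only
    encrypted: bool


def check_link(link: Link) -> List[str]:
    """Return the findings for one communication path (empty list if compliant)."""
    findings = []
    crosses_domains = link.source_domain != link.target_domain
    # Only the gateway may communicate across domain boundaries.
    if crosses_domains and "EdgeGateway" not in (link.source, link.target):
        findings.append("cross-domain link bypasses the gateway")
    # Control, management, and telemetry paths are required to be encrypted.
    if link.purpose in {"control", "management", "telemetry"} and not link.encrypted:
        findings.append(f"{link.purpose} link is not encrypted")
    return findings


management = Link("EdgeGateway", "ManagementServer", "edge", "core", "management", "telnet", False)
telemetry = Link("FieldDevice", "ExternalAnalyticsService", "local", "external", "telemetry", "https", True)
data = Link("FieldDevice", "EdgeGateway", "local", "edge", "data", "mqtt", False)

for link in (management, telemetry, data):
    print(link.source, "->", link.target, check_link(link))
```

The unencrypted management link and the direct Field Device to External Analytics Service link are both flagged, while the internal data link to the gateway passes.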
The initial design intentionally includes multiple weaknesses, each formalized so that it can be detected by a corresponding OCL rule, as summarized in Table 5. In the fourth column, the part before the arrow represents the logical condition within the model instance diagram that triggers the rule check, and the part after the arrow gives the rule check itself.
Table 5.
Summary of security flaws in the use case and corresponding formal rules.
More details about the generated response as an outcome can be found in Appendix A, including both the YANG module and NETCONF configuration.
3.3. Experiments and Evaluation
Considering the fact that network management requirements, YANG models, and controller-side source code are often subject to intellectual property (IP) protection and confidentiality-related constraints, the adoption of locally deployable LLM solutions can be highly beneficial. This is particularly relevant in NETCONF/YANG-based environments, where configuration schemas, operational data models, and automation logic are frequently proprietary and should not be shared with external cloud services.
Therefore, in this paper, when it comes to evaluation of the proposed approach, we compare the effectiveness of two widely adopted LLMs: the commercial OpenAI GPT-5 [29] and the locally deployable Meta LLaMA-3.3-70B-Instruct [30] model. Building on our previous work addressing requirements handling and model-based analysis, and assuming a fixed retrieval-augmented generation (RAG) system for Retrieval and Re-Rank-based question answering from [12], the experiments presented in this paper focus on complementary analysis tasks not covered in our prior publications. These tasks include (i) NETCONF and YANG parameter mapping based on existing software artifacts and (ii) activity diagram-based event-driven validation of configuration and operational behavior derived from YANG-modeled state transitions.
Regarding the analyzed source code, we rely on an execution environment building upon previous works [31], aiming at network management simulation, extending it with support for programmable NETCONF interfaces and YANG-modeled telemetry.
Four key aspects covered within experiments are summarized in Table 6: (1) NETCONF/YANG mapping—identification of NETCONF RPC calls, YANG modules, containers, lists, and leaf values from source code, based on a RAG-generated shortlist of candidate YANG elements; (2) YANG state mapping—association of operational and configuration data accessed via NETCONF with corresponding YANG data tree paths; (3) activity-driven validation—construction of configuration and operational event-driven activity diagrams from YANG modules, which are subsequently analyzed against predefined safety, consistency, and correctness rules; and (4) corrective action generation—generation of actions mapped to specific YANG modules and data nodes, including NETCONF edit-config operations and constraint-aware rollback procedures, to safely resolve detected violations.
Table 6.
Experiments and results summary: tasks crucial for LLM-driven IoT security and functional safety.
For the NETCONF/YANG mapping experiments, two variants are evaluated, corresponding to smaller and larger sets of candidate command elements produced by the RAG-based system. The RAG subsystem itself is considered fixed and is, therefore, not evaluated within the scope of this paper. In this study, we employ a retrieve-and-re-rank pipeline to support automated network management tasks.
Scenario descriptions and relevant YANG/NETCONF elements are first encoded using the SentenceTransformer model all-MiniLM-L6-v2 [32] to retrieve the top-k semantically relevant configuration and monitoring parameters based on embedding similarity. The retrieved candidates are subsequently re-ranked using the cross-encoder model ms-marco-MiniLM-L-6-v2 [33,34] to improve relevance ordering. The highest-ranked YANG model elements are then segmented into chunks to ensure compatibility with the input constraints of large language models. In this paper, such an approach is applied to automatically identify and retrieve the appropriate YANG/NETCONF parameters required for security and functional safety analysis.
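The retrieve, re-rank, and chunk steps can be sketched end to end as follows. To keep the example self-contained, a bag-of-words cosine similarity stands in for the all-MiniLM-L6-v2 embeddings and a word-overlap score stands in for the ms-marco-MiniLM-L-6-v2 cross-encoder; both stand-ins, the catalog entries, and all function names are assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Callable, List


def embed(text: str) -> Counter:
    """Stand-in for a sentence embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, catalog: List[str], k: int) -> List[str]:
    """Stage 1: keep the k entries most similar to the query."""
    q = embed(query)
    return sorted(catalog, key=lambda entry: cosine(q, embed(entry)), reverse=True)[:k]


def overlap_score(query: str, candidate: str) -> float:
    """Stand-in for a cross-encoder: fraction of query words found in the candidate."""
    q = set(query.lower().split())
    return len(q & set(candidate.lower().split())) / len(q) if q else 0.0


def rerank(query: str, candidates: List[str], score: Callable[[str, str], float], r: int) -> List[str]:
    """Stage 2: re-order the shortlist with a pairwise scorer and keep the top r."""
    return sorted(candidates, key=lambda c: score(query, c), reverse=True)[:r]


def chunk(entries: List[str], max_words: int) -> List[List[str]]:
    """Greedily pack entries into chunks of at most max_words words."""
    chunks, current, size = [], [], 0
    for entry in entries:
        n = len(entry.split())
        if current and size + n > max_words:
            chunks.append(current)
            current, size = [], 0
        current.append(entry)
        size += n
    if current:
        chunks.append(current)
    return chunks


catalog = [
    "interfaces interface enabled leaf controls whether the interface is active",
    "system ntp server address leaf",
    "interfaces interface statistics in-errors counter",
    "alarms alarm severity leaf for device failure alert",
]
query = "device failure alert severity"
shortlist = retrieve(query, catalog, k=3)
top = rerank(query, shortlist, overlap_score, r=1)
print(top)
```

In a real setup the two stand-in scorers would be replaced by the embedding and cross-encoder models named above, and the chunk size would be expressed in tokens rather than words.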
The first column of Table 6 denotes the experiment category. The second column represents success criteria, which define the objective conditions that must be met for an aspect to be considered correctly handled. The next column summarizes the observations obtained when using the GPT-5 model, while the fourth column presents the corresponding observations for the locally deployable LLaMA model. The fifth column, speed-up, reports the relative reduction in task completion time achieved by the LLM-assisted workflow compared with a conventional manual engineering process. The traditional baseline involves expert-driven inspection of YANG models, NETCONF RPC traces, configuration scripts, and validation rules, typically requiring iterative analysis and cross-referencing across multiple artifacts. Speed-up values are expressed as multiplicative factors and represent the ratio between the average time required by the traditional approach and the average time required by the LLM-supported approach for the same task and input complexity. The speed-up for each aspect is calculated as an average over 20 independent runs under identical experimental conditions (four runs for each of the five scenarios). Measurements reflect end-to-end execution time, including model inference, post-processing, and result validation. Higher speed-up values indicate greater efficiency gains attributable to automated reasoning, semantic inference, and cross-artifact correlation performed by the LLMs.
The final column reports the overall model performance, expressed as the percentage of successful completions. Effectiveness in our evaluation is measured quantitatively as task-level completion accuracy, computed over repeated executions under identical experimental conditions. For a given task T and model M, effectiveness is defined by expression (1) as the ratio of successful executions to the total number of runs:
$$\mathrm{Effectiveness}(M, T) = \frac{1}{N} \sum_{i=1}^{N} I\left(C_i^{M,T}\right), \qquad (1)$$
where N = 20 denotes the number of independent runs, I(·) is an indicator function that equals 1 if all correctness conditions are satisfied and 0 otherwise, and $C_i^{M,T}$ represents the conjunction of all task-specific correctness criteria for run i. A run is considered successful only if all required outputs are correct according to the task-specific definitions. Therefore, for each experiment category, performance is reported as the percentage of successful completions over 20 independent executions (four runs for each of five scenarios from Table 4), using identical prompts, inference parameters, and retrieved context.
For NETCONF/YANG and YANG state-mapping tasks, a completion is considered successful if all relevant YANG nodes and their corresponding values are correctly identified. For the activity diagram-based event-driven analysis, a successful completion requires the construction of a syntactically and semantically correct activity sequence, enabling accurate detection of configuration errors, missing dependencies, or rule violations according to the previously mentioned validation scenarios (S1–S5). Finally, for corrective action generation, an execution is considered successful only when the generated commands are both syntactically correct and ensure that the required intervention is applied. The reported completion percentages therefore correspond to end-to-end task accuracy, rather than token-level or element-level precision/recall. This choice reflects the engineering-oriented objective of the workflow, where partial correctness is insufficient for automated network management. The evaluation thus prioritizes operational correctness, reproducibility, and deployability over abstract NLP metrics and scores, and binary task success under deterministic execution conditions is a highly relevant quantitative measure for comparing commercial cloud-based and locally deployable LLMs.
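The effectiveness measure and the speed-up ratio described above reduce to a few lines of code. The sketch below assumes that each run is recorded as a mapping from criterion name to a pass/fail flag; the criterion names, timings, and outcomes are invented for illustration and are not the measurements reported in Table 6.

```python
from statistics import mean
from typing import Dict, List


def run_successful(criteria: Dict[str, bool]) -> bool:
    """Indicator I(C): a run counts only if every correctness criterion holds."""
    return all(criteria.values())


def effectiveness(runs: List[Dict[str, bool]]) -> float:
    """Fraction of runs in which all task-specific criteria are satisfied."""
    return sum(1 for criteria in runs if run_successful(criteria)) / len(runs)


def speed_up(manual_seconds: List[float], assisted_seconds: List[float]) -> float:
    """Ratio of the average manual time to the average LLM-assisted time."""
    return mean(manual_seconds) / mean(assisted_seconds)


# Hypothetical outcome of N = 20 runs: 18 fully correct, 2 with one failed criterion.
runs = [{"nodes_identified": True, "values_correct": True}] * 18
runs += [{"nodes_identified": True, "values_correct": False}] * 2

print(effectiveness(runs))                  # 0.9
print(speed_up([600.0, 900.0], [50.0, 100.0]))  # 10.0
```

Because a single failed criterion makes the whole run count as unsuccessful, this measure is stricter than element-level precision or recall.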
For both evaluated models, the maximum token limit per prompt was set to 4096 tokens, and default temperature settings were applied. The locally deployable LLaMA model was executed on an NVIDIA A100 GPU with 80 GB of VRAM. The proposed workflow used within the experiments relies on a fixed and explicitly defined LLM configuration to ensure deterministic behavior and reproducibility across experiments. All interactions with the LLM follow a template-based prompting strategy, where structured prompts with predefined roles and placeholders are used. Depending on the task, prompts are executed in a zero-shot manner, relying on the model’s pretraining, or in a lightweight few-shot mode when format consistency is critical (structured JSON or PlantUML output). No open-ended conversational prompting is employed. To further control randomness, the LLM is executed with fixed inference parameters across all experiments. Specifically, the temperature is set to a low value (temperature = 0–0.2) to favor deterministic and conservative outputs, and top-p (nucleus) sampling is disabled to minimize stochastic variation across repeated executions of the workflow. Since top-p sampling dynamically selects tokens from a probability mass that can vary between runs, its use may introduce non-determinism that is undesirable in design-time analysis, where consistency and repeatability are critical. With top-p sampling disabled and a low temperature setting, the model consistently favors the highest-probability tokens, reducing output variability and improving the reliability of structured artifacts such as JSON, PlantUML models, Ecore metamodels, EMF-compliant model instances, OCL rules, and YANG configurations.
Such an execution configuration is particularly important for automated validation and formal analysis, where even minor syntactic or semantic variations can affect downstream processing. The maximum token limit is fixed per prompt type to ensure consistent response length and to avoid uncontrolled verbosity. These parameters remain unchanged throughout the evaluation. Reproducibility is additionally supported by the use of RAG, which constrains the LLM’s context to a deterministic subset of relevant artifacts (such as command catalogs). By grounding generation in retrieved context, using prompts with pre-defined structure and inference parameters, variability across runs is minimized. Finally, all GenAI-generated artifacts used in the experiments are empirically verified through human review, ensuring the correctness and consistency of the reported results.
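A template-based prompt with fixed inference parameters and a strict check on structured output might look like the sketch below. No model is called here; the `InferenceConfig` fields, the template wording, and the helper names are hypothetical, and representing "top-p disabled" as `top_p = 1.0` is an assumption about how an inference API would typically express it.

```python
import json
from dataclasses import dataclass
from string import Template
from typing import Dict, List


@dataclass(frozen=True)
class InferenceConfig:
    temperature: float = 0.0  # low value within the reported 0 to 0.2 range
    top_p: float = 1.0        # assumption: 1.0 means nucleus sampling is effectively off
    max_tokens: int = 4096


PROMPT = Template(
    "Role: $role\n"
    "Task: $task\n"
    "Context:\n$context\n"
    "Answer with a JSON object containing the keys: $keys"
)


def build_prompt(role: str, task: str, context: List[str], keys: List[str]) -> str:
    """Fill the fixed template; only the placeholders vary between runs."""
    return PROMPT.substitute(role=role, task=task, context="\n".join(context), keys=", ".join(keys))


def parse_structured_output(raw: str, keys: List[str]) -> Dict[str, object]:
    """Reject output that is not a JSON object with exactly the expected keys."""
    data = json.loads(raw)
    if not isinstance(data, dict) or set(data) != set(keys):
        raise ValueError("unexpected output structure")
    return data


config = InferenceConfig()
prompt = build_prompt(
    role="network engineer",
    task="map the configuration call to a YANG path",
    context=["alarms/alarm/severity"],
    keys=["yang_path", "value"],
)
parsed = parse_structured_output('{"yang_path": "alarms/alarm/severity", "value": "critical"}',
                                 ["yang_path", "value"])
print(prompt)
print(parsed)
```

Failing fast on malformed output keeps small syntactic deviations from propagating into downstream validation steps.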
4. Discussion
In this section, observations based on the achieved results are covered for key aspects, including both quantitative and qualitative perspectives, as well as limitations of the current solution.
4.1. Quantitative Aspects
NETCONF/YANG Command Mapping: When identifying NETCONF remote procedure calls (RPCs) and associating YANG data nodes with configuration and operational commands, the commercial model consistently achieves higher accuracy and generates concise, well-defined command–parameter mappings with minimal ambiguity. The locally deployable model correctly recognizes the majority of relevant NETCONF RPCs and YANG nodes; however, it occasionally produces redundant or overly broad command associations, increasing the need for manual validation and post-processing. Quantitative results show that the commercial model outperforms the locally deployable one by approximately 10% for smaller command sets and by up to 15% as the number of candidate commands increases, highlighting its greater robustness to expanded or noisier contextual input.
YANG state command mapping: Both models demonstrate strong performance when mapping configuration and operational state transitions to standardized YANG data tree paths, particularly for scenarios involving smaller command sets. At lower complexity levels, both achieve perfect accuracy, underscoring the benefits of YANG’s structured and hierarchical data representation for command interpretation. As the number of candidate commands grows, a more pronounced performance decline is observed for the locally deployable model. While the commercial model maintains near-perfect accuracy, the locally deployable model exhibits a modest decrease, primarily due to challenges in resolving vendor-specific extensions or non-standard YANG nodes that require normalization.
Functional safety analysis: In the functional safety analysis task, which requires reconstructing causal and temporal command sequences, the commercial model reliably produces complete and correct execution flows that conform to predefined safety and consistency constraints. The locally deployable model generally captures the primary command ordering but may miss secondary dependencies or strict sequencing requirements in more complex validation scenarios. These limitations are reflected in lower accuracy across several test cases and suggest that functional safety reasoning imposes greater demands on long-range dependency modeling and the interpretation of implicit constraints.
Corrective command generation: For corrective action generation, the commercial model produces precise, executable remediation commands that are explicitly linked to the relevant YANG modules and data nodes, including well-formed NETCONF <edit-config> operations and constraint-aware rollback procedures. In contrast, the locally deployable model often outputs higher-level or partially specified corrective commands, with incomplete references to YANG modules or missing parameter constraints. As a result, additional manual refinement is typically required before such commands can be safely integrated into automated management pipelines.
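For orientation, a well-formed NETCONF `<edit-config>` request of the kind discussed above can be assembled with the Python standard library. The base namespace is the standard NETCONF one; the module namespace `urn:example:edge-gateway`, the `gateway` container, and the `alert-on-failure` leaf are hypothetical and are not taken from the YANG module in Appendix A.

```python
import xml.etree.ElementTree as ET

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
MODULE_NS = "urn:example:edge-gateway"  # hypothetical YANG module namespace

ET.register_namespace("nc", NETCONF_NS)
ET.register_namespace("gw", MODULE_NS)


def build_edit_config(message_id: str, leaf: str, value: str) -> str:
    """Build an <edit-config> RPC that sets one leaf in the candidate datastore."""
    rpc = ET.Element(f"{{{NETCONF_NS}}}rpc", {"message-id": message_id})
    edit = ET.SubElement(rpc, f"{{{NETCONF_NS}}}edit-config")
    target = ET.SubElement(edit, f"{{{NETCONF_NS}}}target")
    ET.SubElement(target, f"{{{NETCONF_NS}}}candidate")
    config = ET.SubElement(edit, f"{{{NETCONF_NS}}}config")
    gateway = ET.SubElement(config, f"{{{MODULE_NS}}}gateway")
    ET.SubElement(gateway, f"{{{MODULE_NS}}}{leaf}").text = value
    return ET.tostring(rpc, encoding="unicode")


payload = build_edit_config("101", "alert-on-failure", "true")
print(payload)
```

Targeting the candidate datastore assumes that the device supports it; in that case the change takes effect only after a subsequent commit and can be abandoned beforehand, which is one way a rollback step could be realized.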
Execution efficiency and overall trends: For both commercial and locally deployable models, the duration of an individual task is on the order of seconds (slightly faster for the commercial model, which relies on high-performance cloud infrastructure, than for the reference configuration used for the locally deployable model), while manual operations are estimated to last from minutes to hours. Across all evaluated tasks, the locally deployable model achieves substantial speed-ups over the manual baseline, ranging from approximately 6× to 15× depending on task complexity. This efficiency advantage is particularly valuable for on-premise and edge deployments, where latency, operational cost, and data confidentiality are key concerns. Although the commercial model consistently delivers higher accuracy and greater robustness for complex NETCONF/YANG command reasoning and functional safety analysis, the locally deployable model demonstrates performance that is satisfactory for many practical IoT security and safety applications, especially when supported by a well-designed retrieval-augmented generation (RAG) pipeline.
4.2. Qualitative Aspects
This paper addresses the design-time analysis gap in the cybersecurity literature, shifting the focus from catching threats as they happen to preventing them through safety and security by design.
Key aspects of novelty compared with similar works can be summarized as follows:
- Focus on design-time vs. run-time: While contemporary research (such as [19,20,23]) focuses almost exclusively on runtime detection and mitigation, we target the early development phase of IoT and network management systems. This allows security flaws to be detected before a system is deployed, so that security is ensured by the design itself.
- Integration of formal methods (MDE and rule-based validation): The framework is novel in its synergy of LLMs with MDE rule validation mechanisms based on notations like OCL. By using LLMs to generate system model instances and then validating them against formal OCL rules, this research adds a verifiable intermediate layer that enhances transparency and human trust in AI-generated outputs.
- Semantic gap in existing architectures: While papers like [19,22] bridge the semantic gap for human operators at runtime, this work aims to bridge the gap between informal textual requirements and formal technical artifacts (like YANG modules and NETCONF payloads) required for infrastructure deployment.
- Safety-critical reliability: This paper addresses a gap in functional safety analysis, ensuring that IoT systems behave predictably and move to safe states during accidental failures, not just intentional cyberattacks.
In contrast with conventional automated network security analysis and management solutions, the proposed methodology provides greater flexibility and generality, as changes in usage domain (such as industrial IoT or healthcare), network topology, or design decisions do not require the development of new models or the retraining of existing ones, which can be costly and time-consuming. This advantage comes from the fact that LLMs are pre-trained on extensive text corpora and, therefore, exhibit strong summarization and generalization capabilities, including for previously unseen scenarios. Adaptation in such cases is achieved primarily by supplying an updated metamodel, domain rules, and a constrained set of allowed configuration templates, corrective actions, and command patterns, while the proposed approach focuses on delivering accurate and context-rich inputs to the generative models. Switching between domains therefore requires only updated inputs rather than new models. Furthermore, the tight integration with formally grounded MDE techniques introduces verifiability and explainability, which are essential for practical adoption and significantly enhance human trust in GenAI-based solutions for safety-critical use cases.
4.3. Limitations
At the current stage of this work, several limitations should be acknowledged. One key limitation arises from the inherent risk of LLM hallucinations, which may lead to incorrect or incomplete outputs in certain cases and affect the success of individual workflow steps. Although RAG and structured contextualization are employed to mitigate this effect, hallucinations are still not entirely eliminated. To address this risk, all GenAI-empowered experimental outcomes are empirically verified through human-in-the-loop review, ensuring the correctness and reliability of the final results. The systematic adoption of automated hallucination detection and mitigation mechanisms, such as the employment of multiple LLM agents and post-processing of individual agent outcomes, is considered an important direction for extending the current work.
Scalability represents another potential limitation. As the number of system components, interaction flows, or formal rules increases, the complexity of the underlying models and the volume of constraints may grow substantially. This may impact model management, analysis time, and the efficiency of design-time validation, particularly for large-scale IoT and network management systems. While the proposed workflow is modular by design, further investigation is needed to evaluate its scalability under significantly larger and more complex system configurations.
In addition, the effectiveness of the proposed approach depends on the quality and completeness of manually defined OCL rules and metamodel specifications. Inaccurate, incomplete, or overly restrictive rules may limit the detection of misconfigurations or lead to false positives during design-time analysis. Although the use of formal modeling improves transparency and explainability, the manual effort required to define and maintain high-quality rule sets remains a challenge and may affect adoption in practice.
Overall, while the proposed framework demonstrates strong potential for improving trust, security, and safety through explainable, design-time GenAI integration, these limitations highlight important areas for future research, including scalability optimization, rule engineering support, and robust hallucination mitigation strategies.
Regarding the evaluation, the selection of baselines in this study was guided by conceptual relevance to design-time network management, formal model-driven analysis, and automation of configuration and decision-making workflows. The chosen baselines represent widely adopted traditional and automated network management approaches that rely on manual intervention, rule-based logic, or narrowly scoped optimization mechanisms, thereby providing a meaningful point of comparison with the proposed GenAI- and MDE-enabled framework. However, the set of baselines could be strengthened by incorporating a wider range of methods. At the current stage, many recent GenAI- or deep learning-based solutions primarily focus on runtime optimization or black-box decision making, often without formal guarantees of correctness, which limits their direct comparability with the proposed design-time, verification-oriented workflow. Future work will therefore consider expanding the baseline set to include conceptually closer methods, such as learning-assisted network design tools and other hybrid approaches involving AI, and will provide a more systematic justification of the baseline selection criteria.
5. Conclusions
Based on the outcomes, it can be concluded that both the commercial and locally deployable large language models achieve satisfactory performance across the evaluated tasks, including NETCONF/YANG command mapping, YANG state interpretation, functional safety reasoning, and corrective command generation. The commercial model demonstrates consistently higher accuracy and stronger robustness, particularly in scenarios involving larger command sets, implicit dependencies, and complex safety constraints. Nevertheless, the locally deployable model delivers results that remain suitable for many practical IoT security and functional safety applications, especially when the analysis scope and contextual inputs are carefully controlled.
To compensate for the observed performance gap, additional supporting measures are beneficial when employing locally deployable models. In particular, the use of simpler and more explicit notations, such as PlantUML for representing event-driven chains and execution flows, helps reduce ambiguity and improves the model’s ability to reconstruct correct causal relationships. Furthermore, a well-designed retrieval-augmented generation (RAG) system plays a critical role by supplying relevant NETCONF commands, YANG modules, and operational context, thereby limiting the cognitive load on the model and mitigating errors caused by incomplete or noisy input. When these complementary techniques are applied, locally deployable models can approach the effectiveness of commercial solutions while offering significant advantages in terms of execution speed, deployment flexibility, and preservation of data confidentiality.
Overall, the findings suggest that locally deployable LLMs represent a viable and increasingly attractive option for LLM-driven IoT security and functional safety analysis. Although they may require additional preprocessing, contextualization, and representation-level simplifications, their performance is sufficient to support automated and semi-automated management workflows, particularly in environments where on-premise deployment and protection of sensitive software assets are essential.
Most state-of-the-art cybersecurity research resembles a highly trained guard dog: runtime detection is effective at spotting an intruder and raising an alert once they cross the fence. The work presented in this paper instead resembles an architect and building inspector. Rather than waiting for an intruder, it relies on AI together with MDE to review the blueprints and ensure, before the house is even built, that there are no hidden crawlspaces or faulty circuits that could cause a “fire” or allow a break-in, thereby bridging the gap in design-time security analysis. In terms of benefits, LLM-based automation reduces the time needed for individual tasks by a factor of 6 to 15. Moreover, detecting and eliminating security and safety issues early, in the design phase, adds value because the deployed system does not later have to be disrupted for redeployment and corrective actions at run-time.
Future research will investigate fine-tuning and various prompting techniques for locally deployable LLMs to narrow the performance gap with commercial models while preserving the advantages of on-premise deployment. In parallel, the use of simplified and structured intermediate representations will be further explored to improve interpretability and robustness. Moreover, the approach will be extended to closed-loop network management scenarios, incorporating runtime feedback from network management operations and policy enforcement mechanisms in synergy with run-time tools, with the aim of supporting continuous monitoring and automated system adaptation over time. In addition, future work can further enhance the security and safety aspects of the proposed framework by incorporating recent deep reinforcement learning (DRL)-based optimization approaches for non-orthogonal integrated sensing and communication (ISAC) systems [35,36]. These studies introduce advanced channel and interference models together with learning-based optimization techniques that can support the identification of safety-critical operating conditions, robustness margins, and security-sensitive configuration boundaries under dynamic wireless environments. Embedding such insights into the LLM- and MDE-driven design-time workflow would enable the formulation of formally verifiable security and safety constraints, thereby improving resilience to misconfigurations, performance degradation, and sensing–communication interference in next-generation IoT and network management systems.
Author Contributions
Conceptualization, N.P., D.K. and M.G.; methodology, N.P. and M.G.; software, N.P.; validation, D.K. and N.P.; formal analysis, N.P.; investigation, D.K.; resources, D.K.; data curation, N.P.; writing—original draft preparation, N.P. and D.K.; writing—review and editing, D.K. and M.G.; visualization, N.P.; supervision, D.K. and M.G.; project administration, D.K. and M.G.; funding acquisition, D.K. and M.G. All authors have read and agreed to the published version of the manuscript.
Funding
This work was carried out under the NATO Science for Peace and Security Programme project Ref. No. G6259, “Enhancing IoT Systems Security”.
Data Availability Statement
Crucial software artifacts are openly available on GitHub: https://github.com/penenadpi/llm_iot_sec (accessed on 8 January 2026). Additional experiment data and dependencies can be obtained upon request from the corresponding author.
Acknowledgments
During the preparation of this work, the authors used ChatGPT 5.2 for grammar, language style and spelling check. After using these services, the authors reviewed and edited the content as needed and take full responsibility for this publication’s content.
Conflicts of Interest
The authors declare no conflicts of interest.
Abbreviations
The following abbreviations were used in this manuscript:
| AI | Artificial Intelligence |
| API | Application Programming Interface |
| DRL | Deep Reinforcement Learning |
| EMF | Eclipse Modeling Framework |
| HTTPS | HyperText Transfer Protocol Secure |
| GenAI | Generative Artificial Intelligence |
| IoT | Internet of Things |
| ISAC | Integrated Sensing and Communication |
| JSON | JavaScript Object Notation |
| LLM | Large Language Model |
| LoRaWAN | Long-Range Wide-Area Network |
| MDE | Model-Driven Engineering |
| NOC | Network Operations Center |
| NETCONF | Network Configuration Protocol |
| NFV | Network Function Virtualization |
| NLP | Natural Language Processing |
| OCL | Object Constraint Language |
| QoS | Quality of Service |
| RAG | Retrieval-Augmented Generation |
| RPC | Remote Procedure Call |
| REST | Representational State Transfer |
| SDN | Software Defined Networking |
| SSH | Secure Shell |
| UML | Unified Modeling Language |
| VLM | Vision Language Model |
| YANG | Yet Another Next Generation |
| XAI | Explainable Artificial Intelligence |
| XML | Extensible Markup Language |
Appendix A
Appendix A.1
Example of generated YANG module for security by design case study.

    module secure-communication-model {
      namespace "urn:secure-communication";
      prefix scm;
      organization "Elfak";
      description
        "YANG model enforcing secure communication and management
         for distributed telecommunications systems.";
      container communication {
        list link {
          key "link-id";
          leaf link-id {
            type string;
            description "Unique identifier of the communication link.";
          }
          leaf protocol {
            type enumeration {
              enum HTTPS;
              enum NETCONF;
            }
            description "Approved communication protocol.";
          }
          leaf encrypted {
            type boolean;
            must ". = 'true'" {
              description
                "All non-local communication must be encrypted.";
            }
            description "Indicates whether encryption is enabled.";
          }
          leaf domain {
            type enumeration {
              enum Local;
              enum Edge;
              enum Core;
              enum External;
            }
            description "Trust domain of the communication link.";
          }
          leaf role {
            type enumeration {
              enum Data;
              enum Control;
              enum Management;
              enum Telemetry;
            }
            description "Functional role of the communication link.";
          }
          must "not(protocol = 'HTTPS' and role = 'Management')" {
            description
              "Management communication must not use HTTPS.";
          }
          must "(role != 'Management') or (protocol = 'NETCONF')" {
            description
              "Management communication must use NETCONF.";
          }
        }
      }
    }
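The three `must` statements of the module above can be mirrored in a few lines of Python, for instance to pre-check candidate link entries before a payload is generated. This is a minimal sketch under stated assumptions: the function name `check_link` and the dictionary layout are hypothetical, every leaf is assumed to be present in each entry, and the code is not a replacement for a YANG validator.

```python
from typing import Dict, List, Union

Link = Dict[str, Union[str, bool]]


def check_link(link: Link) -> List[str]:
    """Return the violated constraints for one link entry (empty list means valid).

    Assumes all five leaves (link-id, protocol, encrypted, domain, role) are set.
    """
    violations: List[str] = []
    # must ". = 'true'" on leaf 'encrypted' (as written, it applies to every domain)
    if link["encrypted"] is not True:
        violations.append("encrypted must be true")
    # must "not(protocol = 'HTTPS' and role = 'Management')"
    if link["protocol"] == "HTTPS" and link["role"] == "Management":
        violations.append("Management communication must not use HTTPS")
    # must "(role != 'Management') or (protocol = 'NETCONF')"
    if not (link["role"] != "Management" or link["protocol"] == "NETCONF"):
        violations.append("Management communication must use NETCONF")
    return violations


# Hypothetical entries used only to exercise the checks.
bad_management = {"link-id": "m1", "protocol": "HTTPS", "encrypted": True,
                  "domain": "Core", "role": "Management"}
local_plain = {"link-id": "l1", "protocol": "HTTPS", "encrypted": False,
               "domain": "Local", "role": "Data"}
print(check_link(bad_management))
print(check_link(local_plain))
```

Note that the second entry is flagged even though its domain is `Local`: the expression `. = 'true'` holds for every link, whereas its description refers only to non-local communication. Whether the rule or the description reflects the intended policy is a design decision that the sketch does not settle.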
Appendix A.2
Generated NETCONF payload for security by design case study.

    <rpc message-id="101"
         xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
      <edit-config>
        <target>
          <running/>
        </target>
        <config>
          <communication xmlns="urn:example:secure-communication">
            <link>
              <link-id>control-link</link-id>
              <protocol>HTTPS</protocol>
              <encrypted>true</encrypted>
              <domain>Edge</domain>
              <role>Control</role>
            </link>
            <link>
              <link-id>management-link</link-id>
              <protocol>NETCONF</protocol>
              <encrypted>true</encrypted>
              <domain>Core</domain>
              <role>Management</role>
            </link>
            <link>
              <link-id>telemetry-link</link-id>
              <protocol>HTTPS</protocol>
              <encrypted>true</encrypted>
              <domain>External</domain>
              <role>Telemetry</role>
            </link>
          </communication>
        </config>
      </edit-config>
    </rpc>
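For design-time review, the link entries of such a payload can be extracted with the Python standard library. The sketch below is illustrative only: `extract_links` is a hypothetical helper, and the embedded payload is shortened to two of the three links shown above.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List

NETCONF_NS = "urn:ietf:params:xml:ns:netconf:base:1.0"
PAYLOAD_NS = "urn:example:secure-communication"  # namespace used in the payload

# Shortened copy of the payload (two links) so the example is self-contained.
PAYLOAD = """<rpc message-id="101" xmlns="urn:ietf:params:xml:ns:netconf:base:1.0">
  <edit-config>
    <target><running/></target>
    <config>
      <communication xmlns="urn:example:secure-communication">
        <link>
          <link-id>control-link</link-id><protocol>HTTPS</protocol>
          <encrypted>true</encrypted><domain>Edge</domain><role>Control</role>
        </link>
        <link>
          <link-id>management-link</link-id><protocol>NETCONF</protocol>
          <encrypted>true</encrypted><domain>Core</domain><role>Management</role>
        </link>
      </communication>
    </config>
  </edit-config>
</rpc>"""


def extract_links(xml_text: str, data_ns: str = PAYLOAD_NS) -> List[Dict[str, str]]:
    """Return one {leaf-name: text} dictionary per <link> found in namespace data_ns."""
    root = ET.fromstring(xml_text)
    ns = {"nc": NETCONF_NS, "scm": data_ns}
    path = "nc:edit-config/nc:config/scm:communication/scm:link"
    links = []
    for link in root.findall(path, ns):
        # ElementTree tags look like '{namespace}name'; keep only the local name.
        links.append({child.tag.split("}", 1)[1]: (child.text or "") for child in link})
    return links


links = extract_links(PAYLOAD)
print([entry["link-id"] for entry in links])
print(len(extract_links(PAYLOAD, "urn:secure-communication")))
```

The second call looks for the namespace declared by the module in Appendix A.1 (`urn:secure-communication`) and finds no links, because the payload uses `urn:example:secure-communication`. The two generated artifacts would presumably need to be aligned on a single namespace before the payload could be applied to a device that implements the module.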
References
1. Bariah, L.; Zhao, Q.; Zou, H.; Tian, Y.; Bader, F.; Debbah, M. Large generative AI models for telecom: The next big thing? IEEE Commun. Mag. 2024, 62, 84–90.
2. Ranjani, H.G.; Prabhudesai, R. Measuring visual understanding in telecom domain: Performance metrics for image-to-UML conversion using VLMs. arXiv 2025, arXiv:2509.11667.
3. Zhang, R.; Du, H.; Liu, Y.; Niyato, D.; Kang, J.; Sun, S.; Shen, X.; Poor, H.V. Interactive AI with retrieval-augmented generation for next generation networking. IEEE Netw. 2024, 38, 414–424.
4. Petrović, N.; Al-Azzoni, I. Model-driven smart contract generation leveraging ChatGPT. In Advances in Systems Engineering, Proceedings of the ICSEng 2023, Las Vegas, NV, USA, 22–24 August 2023; Lecture Notes in Networks and Systems; Springer: Cham, Switzerland, 2023; Volume 761, pp. 387–396.
5. Hosseini, S.; Seilani, H. The role of agentic AI in shaping a smart future: A systematic review. Array 2025, 26, 100399.
6. Suljović, S.; Petrović, N.; Vujović, V.; Milašinović, M.; Đorđević, G.; Stefanović, R. AI agent-driven maintenance: Case study of outage probability for 5G wireless system with L-branch SC receiver influenced by Rician fading and Rician co-channel interference. In Proceedings of the 2025 12th International Conference on Electrical, Electronic and Computing Engineering (IcETRAN), Cacak, Serbia, 9–12 June 2025; pp. 1–5.
7. Sebestyen, H.; Popescu, D.E.; Zmaranda, R.D. A Literature Review on Security in the Internet of Things: Identifying and Analysing Critical Categories. Computers 2025, 14, 61.
8. Tomur, E.; Gül, A.; Aydın, M.A.; Erdin, E. SoK: Investigation of Security and Functional Safety in Industrial IoT. In Proceedings of the 2021 IEEE International Conference on Cyber Security and Resilience (CSR), Rhodes, Greece, 26–28 July 2021; pp. 226–233.
9. Bovenzi, G.; Cerasuolo, F.; Ciuonzo, D.; Di Monda, D.; Guarino, I.; Montieri, A.; Persico, V.; Pescapé, A. Mapping the landscape of generative AI in network monitoring and management. IEEE Trans. Netw. Serv. Manag. 2025, 22, 2441–2472.
10. Hao, L.; Zhang, S.; Schulzrinne, H. Advancing IoT system dependability: A deep dive into management and operation plane separation. In Proceedings of the 2025 IEEE 11th World Forum on Internet of Things (WF-IoT), Chengdu, China, 27–30 October 2025; pp. 1–6.
11. Suljović, S.; Petrović, N.; Krstić, D. GenAI-enabled network design for the case of the outage probability of a Beaulieu-Xie wireless fading environment with maximal ratio combining. In Proceedings of the 67th International Symposium ELMAR-2025, Zadar, Croatia, 15–17 September 2025; pp. 205–208.
12. Petrović, N.; Suljović, S.; Đorđević, S.; Vujović, V.; Stokić, I. RAG-enriched approach to network standardization: MGF-based calculation of average bit error probability in Nakagami-m fading environment with selection diversity receiver case study. In Proceedings of the 2024 59th International Scientific Conference on Information, Communication and Energy Systems and Technologies (ICEST), Sozopol, Bulgaria, 1–3 July 2024; pp. 1–4.
13. Petrović, N.; Vujović, V.; Zdravković, N.; Suljović, S.; Đorđević, G.; Đorđević, S. Metamodeling for network experimentation: ABEP of SC diversity with L branches under the combined effects of Rayleigh fading and BX co-channel interference in WCN case study. In Proceedings of the 24th International Symposium INFOTEH-JAHORINA (INFOTEH), Jahorina, Sarajevo, Bosnia and Herzegovina, 19–21 March 2025; pp. 1–6.
14. Milić, D.; Petrović, N.; Jović, M.; Stefanović, R.; Saliji, M.; Suljović, E. VLM-enabled network experimentation: Case study of channel capacity in system limited by η-μ fading. In Proceedings of the 2025 24th International Symposium INFOTEH-JAHORINA (INFOTEH), Jahorina, Sarajevo, Bosnia and Herzegovina, 19–21 March 2025; pp. 1–5.
15. Petrović, N.; Krstić, D.; Suljović, S.; Javor, D. LLM-driven approach to automated sustainability of IoT systems. In Proceedings of the 2025 IEEE 34th International Conference on Microelectronics (MIEL 2025), Nis, Serbia, 13–16 October 2025; pp. 1–4.
16. Petrović, N.; Krstić, D.; Suljović, S.; Hanczewski, S.; Glabowski, M. Agent-based AI approach to security in IoT systems leveraging GenAI. In Proceedings of the 33rd International Conference on Software, Telecommunications, and Computer Networks (SoftCOM 2025), Split, Croatia, 18–20 September 2025; pp. 1–5.
17. Krstić, D.; Suljović, S.; Djordjevic, G.; Petrović, N.; Milić, D. MDE and LLM synergy for network experimentation: Case analysis of wireless system performance in Beaulieu–Xie fading and κ–µ cochannel interference environment with diversity combining. Sensors 2024, 24, 3037.
18. Rigaki, M.; Catania, C.; Garcia, S. Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments. arXiv 2024, arXiv:2409.11276v1.
19. Baral, S.; Saha, S.; Haque, A. An Adaptive End-to-End IoT Security Framework Using Explainable AI and LLMs. In Proceedings of the 2024 IEEE 10th World Forum on Internet of Things (WF-IoT), Ottawa, ON, Canada, 10–13 November 2024.
20. Gutiérrez-Galeano, L.; Domínguez-Jiménez, J.-J.; Schäfer, J.; Medina-Bulo, I. LLM-Based Cyberattack Detection Using Network Flow Statistics. Appl. Sci. 2025, 15, 6529.
21. Diaf, A.; Korba, A.A.; Karabadji, N.E.; Ghamri-Doudane, Y. BARTPredict: Empowering IoT Security with LLM-Driven Cyber Threat Prediction. In Proceedings of the GLOBECOM 2024—2024 IEEE Global Communications Conference, Cape Town, South Africa, 8–12 December 2024; pp. 1239–1244.
22. Mahmood, M.A.I.; Ashab, F.; Sohan, M.S.; Chy, M.H.I.; Kader, M.F. LLM-Enhanced Security Framework for IoT Network: Anomaly Detection and Malicious Devices Identification. IEEE Access 2025, 13, 168405–168419.
23. Otoum, Y.; Asad, A.; Nayak, A. LLM-Based Threat Detection and Prevention Framework for IoT Ecosystems. arXiv 2025, arXiv:2505.00240.
24. IETF RFC 7950; Bjorklund, M. YANG—A Data Modeling Language for the Network Configuration Protocol (NETCONF). RFC Editor: Marina del Rey, CA, USA, 2016.
25. IETF RFC 6241; Enns, R.; Bjorklund, M.; Schoenwaelder, J.; Bierman, A. Network Configuration Protocol (NETCONF). RFC Editor: Marina del Rey, CA, USA, 2011.
26. OpenJS Foundation. Node-RED: Low-Code Programming for Event-Driven Applications: The Easiest Way to Collect, Transform and Visualize Real-Time Data. Available online: https://nodered.org/ (accessed on 9 February 2026).
27. Petrovic, N.; Tosic, M. SMADA-Fog: Semantic Model Driven Approach to Deployment and Adaptivity in Fog Computing. Simul. Model. Pract. Theory 2020, 101, 102033.
28. Node-RED Flow Library. @technithusiast/node-red-contrib-ai-intent. Available online: https://flows.nodered.org/node/@technithusiast/node-red-contrib-ai-intent (accessed on 28 December 2025).
29. OpenAI. Introducing GPT-5; OpenAI: San Francisco, CA, USA, 2025; Available online: https://openai.com/index/introducing-gpt-5/ (accessed on 26 December 2025).
30. Meta. Meta-Llama/LLaMA-3.3-70B-Instruct; Meta Platforms, Inc.: Menlo Park, CA, USA, 2024; Available online: https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct (accessed on 26 December 2025).
31. Petrović, N.; Vasić, S.; Milić, D.; Suljović, S.; Koničanin, S. GPU-supported simulation for ABEP and QoS analysis of a combined macro diversity system in a gamma-shadowed κ–µ fading channel. Facta Univ. Ser. Electron. Energetics 2021, 34, 89–104.
32. Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. Available online: https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2 (accessed on 28 December 2025).
33. Wang, W.; Wei, F.; Dong, L.; Bao, H.; Yang, N.; Zhou, M. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. In Proceedings of the 34th Conference on Neural Information Processing Systems (NeurIPS 2020), Online, 6–12 December 2020; Curran Associates Inc.: Red Hook, NY, USA, 2020. Article No. 485. Volume 1, pp. 5776–5788.
34. Cross-Encoder for MS Marco. Available online: https://huggingface.co/cross-encoder/ms-marco-MiniLM-L-6-v2 (accessed on 9 February 2026).
35. Ma, Z.; Liang, Y.; Zhu, Q.; Zheng, J.; Lian, Z.; Zeng, L.; Fu, C.; Peng, Y.; Ai, B. Hybrid-RIS-assisted cellular ISAC networks for UAV-enabled low-altitude economy via deep reinforcement learning with mixture-of-experts. IEEE Trans. Cogn. Commun. Netw. 2026, 12, 3875–3888.
36. Ma, Z.; Zhang, R.; Ai, B.; Lian, Z.; Zeng, L.; Niyato, D. Deep reinforcement learning for energy efficiency maximization in RSMA-IRS-assisted ISAC systems. IEEE Trans. Veh. Technol. 2025, 74, 18273–18278.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.




