Asset Discovery in Critical Infrastructures: An LLM-Based Approach
Abstract
1. Rationale and Motivation
- Semantic Context Awareness through LLM Integration: The framework integrates large language models (LLMs) to introduce semantic context awareness in device identification processes, eliminating the need for a priori data classification or context-specific rule sets that constrain existing solutions. This semantic understanding enables dynamic interpretation of heterogeneous data sources and protocols without predetermined taxonomies.
- Unified Asset Lifecycle Management via Mixture of Experts: The architecture employs a Mixture of Experts (MoE) approach that coordinates specialized, fine-tuned lightweight models for distinct operational phases—asset identification, vulnerability assessment, and network parameter optimization—thereby providing comprehensive asset lifecycle management within a unified framework rather than requiring separate tools for each function.
- Non-Intrusive Multi-Modal Data Fusion: The system implements a non-intrusive data fusion methodology that synthesizes information from passive traffic analysis, protocol-aware active probing, and multi-modal sensor inputs (including acoustic and electromagnetic signatures), fundamentally departing from traditional aggressive scanning approaches that pose operational risks to critical infrastructure.
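The phase-specific coordination named in the second contribution above can be pictured as a router that activates one expert per request. The following is a minimal sketch under stated assumptions: the `Expert` class, the phase keys, and the placeholder handlers are hypothetical stand-ins for the fine-tuned lightweight models and do not reproduce the framework's code.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical phase keys derived from the three operational phases named above.
PHASES = ("asset_identification", "vulnerability_assessment", "network_optimization")


@dataclass
class Expert:
    """Stand-in for one fine-tuned lightweight model (assumption)."""
    name: str
    handler: Callable[[str], str]


def build_router() -> Dict[str, Expert]:
    # Each handler is a placeholder; a real expert would invoke a model here.
    return {
        phase: Expert(
            name=f"{phase}_expert",
            handler=lambda text, p=phase: f"[{p}] processed {len(text)} chars",
        )
        for phase in PHASES
    }


def route(router: Dict[str, Expert], phase: str, payload: str) -> str:
    """Activate only the expert registered for the requested phase."""
    if phase not in router:
        raise KeyError(f"no expert registered for phase {phase!r}")
    return router[phase].handler(payload)


router = build_router()
result = route(router, "asset_identification", "22/tcp open ssh")
```

Keeping the dispatch table explicit means that only the selected expert runs for a given request, which is the property the "activate only what you need" idea relies on.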
2. Related Work
3. Background
3.1. Asset Discovery in ICS
3.2. Available Tools Review
3.3. Why LLMs Are Useful in ICSs
4. Architecture Overview
4.1. MoE Model Description: Activate Only What You Need
4.2. The Feedback Loop: Analyze–Enrich–Set
4.3. Requirements Elicitation and Threat Model
- Attack Surface Obfuscation: Incomplete asset discovery produces fragmented attack surface mappings, creating security blind spots where threat assessment and defensive measures are not implemented. These unmapped segments become high-risk attack vectors.
- Shadow Device Proliferation: Unmanaged devices within the operational technology perimeter represent critical security vulnerabilities. These assets, lacking proper patch management, security monitoring, and configuration control, constitute weak nodes susceptible to lateral movement attacks and persistent threats.
- Protocol Misclassification: Legacy and proprietary industrial protocols often exhibit non-standard behaviors that confound traditional discovery mechanisms, leading to asset misidentification and inappropriate security policy application.
- Temporal Asset-State Drift: Dynamic network topologies and device state changes in ICS environments create temporal inconsistencies in asset inventories, undermining continuous security monitoring and incident response capabilities.
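Two of the threats listed above, shadow devices and temporal asset-state drift, can be made concrete by comparing inventory snapshots taken at different times. The sketch below is illustrative only: the snapshot layout (MAC address mapped to an IP address and a label), the function name `diff_snapshots`, and the sample data are assumptions introduced for this example.

```python
from typing import Dict, Set, Tuple

# An inventory snapshot maps a MAC address to (IP address, device label).
Snapshot = Dict[str, Tuple[str, str]]


def diff_snapshots(previous: Snapshot, current: Snapshot) -> Dict[str, Set[str]]:
    """Compare two snapshots and report devices that appeared, vanished, or changed."""
    prev_keys, curr_keys = set(previous), set(current)
    changed = {mac for mac in prev_keys & curr_keys if previous[mac] != current[mac]}
    return {
        "appeared": curr_keys - prev_keys,  # candidates for unmanaged (shadow) devices
        "vanished": prev_keys - curr_keys,  # devices no longer observed
        "changed": changed,                 # same MAC, different IP or label
    }


# Hypothetical example data; addresses and labels are invented for illustration.
before = {
    "aa:aa:aa:aa:aa:01": ("192.168.0.10", "PLC"),
    "aa:aa:aa:aa:aa:02": ("192.168.0.20", "3D printer"),
}
after = {
    "aa:aa:aa:aa:aa:01": ("192.168.0.11", "PLC"),
    "aa:aa:aa:aa:aa:03": ("192.168.0.30", "unknown"),
}
drift = diff_snapshots(before, after)
```

Here the device that appears only in the later snapshot would be flagged for review as a possible shadow device, while the changed entry signals drift in an already known asset.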
5. Proof of Concept: AI-Based ICS Asset Discovery
5.1. Testbed Description
- The TP-Link Router TL-WR940N: Serves as the central hub managing internal traffic and segmenting the network from external access. It ensures consistent IP addressing and routing [18].
- The Robotic Arms (Niryo NED2): Two robotic manipulators, each based on Raspberry Pi hardware, with 6 degrees of freedom and programmable via API. These are Wi-Fi-connected and reflect devices used in both educational and prototyping scenarios [19].
- The 3D Printer UltiMaker S7 Pro Bundle: A Wi-Fi-connected printer representative of auxiliary manufacturing assets commonly integrated into ICS networks [20].
- The 3D Printer Bambu Lab X1E: Another Wi-Fi-enabled 3D printer, included to test recognition of similar devices across different manufacturers [21].
- The Programmable Logic Controller (PLC): Omron NX1P2, a core component in ICSs responsible for process control. This unit is connected via Ethernet and supports common industrial protocols. Its presence allows for evaluating asset detection in more sensitive segments of the network [22].
- The Workstation: A Windows 11 Pro workstation connected via Ethernet. It hosts the asset discovery framework and the local LLM execution engine. This machine is equipped with an Intel(R) Core(TM) i9-14900KF processor and 64 GB of RAM.
5.2. Tool Selection
5.3. Fine-Tuning LLM
- (1) Network reconnaissance data collection from the testbed environment using Nmap and traffic capture tools;
- (2) Automated dataset generation through the EasyTrain framework, converting raw network outputs into structured instruction–input–output triplets;
- (3) Data preprocessing and tokenization using Gemma’s native tokenizer with custom collation functions;
- (4) LoRA-based fine-tuning on CUDA-optimized hardware with mixed-precision training;
- (5) Adapter merging and model export for Ollama deployment;
- (6) Inference engine configuration with domain-specific system prompts and sampling parameters optimized for technical accuracy in industrial cybersecurity contexts.
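Step (2) of the pipeline above, turning raw network output into instruction–input–output triplets, could look roughly like the following sketch. The interface of the EasyTrain framework is not described here, so the function names, the instruction text, the empty `output` field, and the sample scan text are all assumptions; the 2000/200-character chunking follows the chunk size and overlap listed in the hyperparameter table.

```python
import json
from typing import Dict, Iterator, List


def chunk_text(text: str, size: int = 2000, overlap: int = 200) -> Iterator[str]:
    """Yield fixed-size character windows that overlap by `overlap` characters."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    if not text:
        return
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        yield text[start:start + size]


def build_triplets(raw_output: str, instruction: str) -> List[Dict[str, str]]:
    """Wrap each chunk of raw tool output in an instruction-input-output record."""
    return [
        # The output field is left empty here; it would be filled during labeling.
        {"instruction": instruction, "input": chunk, "output": ""}
        for chunk in chunk_text(raw_output)
    ]


# Hypothetical scan text standing in for real Nmap output.
sample = "Nmap scan report for 192.168.0.10\n22/tcp open ssh\n" * 100
records = build_triplets(sample, "Identify the device described by this scan output.")

# One JSON object per line, a common layout for supervised fine-tuning data.
jsonl = "\n".join(json.dumps(record) for record in records)
```

The overlap means that a service banner split at a chunk boundary still appears whole in at least one record, at the cost of some duplicated text.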
5.4. Output Report
Listing 1. Formatted JSON output report generated by the framework.
6. Results and Discussion
6.1. Preliminary Framework Evaluation
6.2. Scalability Evaluation
6.3. Current Limitations
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- [1] Santos, S.; Costa, P.; Rocha, A. IT/OT convergence in Industry 4.0: Risks and analysis of the problems. In Proceedings of the 2023 18th Iberian Conference on Information Systems and Technologies (CISTI), Aveiro, Portugal, 20–23 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
- [2] Murray, G.; Johnstone, M.N.; Valli, C. The convergence of IT and OT in critical infrastructure. In Proceedings of the Australian Information Security Management Conference, Perth, Australia, 5–6 December 2017.
- [3] CISA. ICS Advisory Report; Technical Report; Cybersecurity and Infrastructure Security Agency: Washington, DC, USA, 2023.
- [4] Schrick, N.L.; Lorenzen, C.; Mitchell, C.; Rials, C.; Swartzwelder, R.; Kelley, C.; Sweeney, C.; Nelson, J.; Hendrix, A.; Kalohi, D.; et al. The Growth of Asset Identification in OT Environments and Remaining Challenges. In Proceedings of the 2024 Resilience Week (RWS), Austin, TX, USA, 3–5 December 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6.
- [5] ISO/IEC 27001:2022; Information Security, Cybersecurity and Privacy Protection–Information Security Management Systems–Requirements, 3rd edition. International Organization for Standardization: Geneva, Switzerland; International Electrotechnical Commission: Geneva, Switzerland, 2022. Available online: https://www.iso.org/standard/27001 (accessed on 15 June 2025).
- [6] Angraini; Megawati; Haris, L. Risk Assessment on Information Asset an academic Application Using ISO 27001. In Proceedings of the 2018 6th International Conference on Cyber and IT Service Management (CITSM), Parapat, Indonesia, 7–9 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–4.
- [7] Trend Micro. New Research Reveals Three Quarters of Cybersecurity Incidents Occur Due to Unmanaged Assets. 2025. Available online: https://newsroom.trendmicro.com/2025-04-29-New-Research-Reveals-Three-Quarters-of-Cybersecurity-Incidents-Occur-Due-to-Unmanaged-Assets (accessed on 30 July 2025).
- [8] Hanka, T.; Niedermaier, M.; Fischer, F.; Kießling, S.; Knauer, P.; Merli, D. Impact of active scanning tools for device discovery in industrial networks. In Proceedings of the Security, Privacy, and Anonymity in Computation, Communication, and Storage: SpaCCS 2020 International Workshops, Nanjing, China, 18–20 December 2020; Proceedings 13. Springer: Berlin/Heidelberg, Germany, 2021; pp. 557–572.
- [9] Vermeer, M.; West, J.; Cuevas, A.; Niu, S.; Christin, N.; Van Eeten, M.; Fiebig, T.; Ganán, C.; Moore, T. SoK: A framework for asset discovery: Systematizing advances in network measurements for protecting organizations. In Proceedings of the 2021 IEEE European Symposium on Security and Privacy (EuroS&P), Vienna, Austria, 6–10 September 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 440–456.
- [10] Yang, W.; Fang, Y.; Zhou, X.; Shen, Y.; Zhang, W.; Yao, Y. Networked Industrial Control Device Asset Identification Method Based on Improved Decision Tree. J. Netw. Syst. Manag. 2024, 32, 32.
- [11] Park, M.; Cho, S.J.; Kim, H. A study on asset identification in smart buildings automation systems. In Proceedings of the 2023 Fourteenth International Conference on Ubiquitous and Future Networks (ICUFN), Paris, France, 4–7 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 923–925.
- [12] Wang, H.; Eklund, D.; Oprea, A.; Raza, S. FL4IoT: IoT device fingerprinting and identification using federated learning. ACM Trans. Internet Things 2023, 4, 1–24.
- [13] Fakih, M.; Dharmaji, R.; Moghaddas, Y.; Quiros, G.; Ogundare, O.; Al Faruque, M.A. LLM4PLC: Harnessing large language models for verifiable programming of PLCs in industrial control systems. In Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Practice, Lisbon, Portugal, 14–20 April 2024; pp. 192–203.
- [14] Vasilatos, C.; Mahboobeh, D.J.; Lamri, H.; Alam, M.; Maniatakos, M. LLMPot: Automated LLM-based industrial protocol and physical process emulation for ICS honeypots. arXiv 2024, arXiv:2405.05999.
- [15] Mo, S.; Salakhutdinov, R.; Morency, L.P.; Liang, P.P. IoT-LM: Large multisensory language models for the Internet of Things. arXiv 2024, arXiv:2407.09801.
- [16] Cai, W.; Jiang, J.; Wang, F.; Tang, J.; Kim, S.; Huang, J. A survey on mixture of experts in large language models. IEEE Trans. Knowl. Data Eng. 2025, 37, 3896–3915.
- [17] Messe, N.; Chiprianov, V.; Belloir, N.; El-Hachem, J.; Fleurquin, R.; Sadou, S. Asset-oriented threat modeling. In Proceedings of the 2020 IEEE 19th International Conference on Trust, Security and Privacy in Computing and Communications (TrustCom), Guangzhou, China, 29 December 2020–1 January 2021; IEEE: Piscataway, NJ, USA, 2020; pp. 491–501.
- [18] TP-Link Technologies Co., Ltd. TL-WR940N V6 User Guide. Available online: https://www.tp-link.com/us/user-guides/TL-WR940N_V6/ (accessed on 1 June 2025).
- [19] Niryo SAS. NED2 User Manual v1.0.0; Six-Axis Collaborative Robot; User Manual for NED2 Robotic Arm. Available online: https://static.generation-robots.com/media/manuel-utilisation-ned2-niryo-en.pdf (accessed on 1 June 2025).
- [20] UltiMaker. UltiMaker S7 Pro Bundle: Technical Specification. Available online: https://ultimaker.com/3d-printers/s-series/ultimaker-s7-pro-bundle/ (accessed on 1 June 2025).
- [21] Bambu Lab. Bambu Lab X1E 3D Printer: Technical Specification. Available online: https://eu.store.bambulab.com/products/x1e?srsltid=AfmBOoqdcffZl66_xp9JL-x8gq8jGQTNvtnv9EGNQCfcgbuWmVjZV9GN (accessed on 1 June 2025).
- [22] Omron Corporation. NX-Series NX1P2 CPU Unit Built-in I/O and Option Board User’s Manual. Available online: https://files.omron.eu/downloads/latest/manual/en/w579_nx-series_nx1p2_cpu_unit_built-in_i_o_and_option_board_users_manual_en.pdf?v=1 (accessed on 1 June 2025).
- [23] Python Software Foundation. Python 3.13.5 Documentation. Available online: https://docs.python.org/it/3/about.html (accessed on 1 June 2025).
- [24] Ollama Contributors. Ollama Documentation. 2025. Available online: https://github.com/ollama/ollama/tree/main/docs (accessed on 7 July 2025).
- [25] Google DeepMind. google/gemma-3-27b-it Model Card. Hugging Face Model Repository. 2025. Multimodal Gemma3 Model (27B), 128K Context Window; Access Requires Accepting Google’s License. Available online: https://huggingface.co/google/gemma-3-27b-it (accessed on 1 June 2025).
- [26] NVIDIA Corporation. GeForce RTX 4090 Graphics Card. 2022. Available online: https://www.nvidia.com/en-us/geforce/graphics-cards/40-series/rtx-4090/ (accessed on 7 July 2025).
- [27] Nmap Project. Nmap Documentation; Includes Reference Guide, Man Page, Installation Instructions, Scripting Engine Documentation. Available online: https://nmap.org/book/man.html (accessed on 7 July 2025).
- [28] Atkinson, R.; Bhatti, S.N. Address Resolution Protocol (ARP) for the Identifier-Locator Network Protocol for IPv4 (ILNPv4). RFC 6747. 2012. Available online: https://www.rfc-editor.org/info/rfc6747 (accessed on 7 July 2025).
Tool/Platform | Discovery Method | Key Capabilities | Primary Limitations |
---|---|---|---|
Traditional IT Security Solutions | |||
Nmap | Active scanning | Port scanning, service detection, network enumeration | No industrial protocol support, operational risks to OT devices, inappropriate for ICSs |
Shodan | Passive internet scanning | Internet-exposed device identification, global visibility | Air-gapped network inaccessibility, limited asset context |
Commercial ICS Security Platforms | |||
Tenable OT Security | Passive monitoring | Non-intrusive scanning, vulnerability detection, Nessus-based technology | Infrastructure modifications required, passive monitoring constraints |
Claroty | Multi-method | Passive monitoring, active queries, database parsing, data fusion | Complex integration, coordination challenges |
Armis | Agentless ML | ML-based analysis, behavioral profiling, device fingerprinting | Limited deep inspection, traffic pattern dependency |
Nozomi Guardian | AI multi-protocol | AI device profiling, wireless discovery (Wi-Fi, Bluetooth, Zigbee, LoRaWAN), smart polling | Scalability constraints, strategic sensor placement requirements |
Universal Industry Limitations | |||
Incomplete proprietary protocol coverage. | |||
Difficulty distinguishing PLC/RTU/HMI types. | |||
Limited firmware detection; virtual system identification challenges. | |||
Unresolved discovery–safety tension. | |||
Parameter | Description | Impact |
---|---|---|
Base model | Gemma-3 | Foundation model |
Fine-tuning method | LoRA (PEFT) | Reduces training cost while retaining pre-trained knowledge |
Adapted modules | q_proj, v_proj, k_proj, o_proj, gate_proj, up_proj, down_proj | Limits fine-tuning to selected model components |
LoRA settings | , , dropout = 0.1 | Control adaptation range and regularization |
Token limit | 1024 tokens | Defines max input length per example |
Batch size | 4 (effective 32) | Number of samples per training step (after gradient accumulation) |
Learning rate | | Controls weight update size during training
Scheduler | Cosine annealing | Adjusts learning rate dynamically |
Epochs | 2 | Number of full passes through the dataset |
Training steps | 500 | Total training updates applied |
Optimizer | AdamW (weight decay 0.01) | Improves convergence while mitigating overfitting |
Gradient checkpointing | Disabled | Ensures LoRA compatibility |
Tokenizer | Native Gemma tokenizer | Processes input sequences for the model |
Chunk size/overlap | 2000/200 characters | Maintains context across chunked input |
Data format | Instruction, input, output tuples | Structures tasks for supervised fine-tuning |
Sampling config | temperature = 0.7, top-p = 0.9, top-k = 40 | Balances determinism and response diversity during inference |
Hardware | NVIDIA RTX 4090 | GPU used for training |
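Two rows of the table above lend themselves to a short worked example: the batch-size row implies a gradient accumulation factor, and the sampling row maps onto the `options` object of a request to Ollama's `/api/generate` endpoint. The sketch below only builds the request body and does not send it. The model tag, system prompt, and user prompt are hypothetical placeholders, since the source does not state them.

```python
import json

# Values copied from the table above.
PER_STEP_BATCH = 4
EFFECTIVE_BATCH = 32
SAMPLING = {"temperature": 0.7, "top_p": 0.9, "top_k": 40}

# Gradient accumulation steps implied by the two batch sizes.
accumulation_steps = EFFECTIVE_BATCH // PER_STEP_BATCH


def build_generate_request(model: str, system_prompt: str, prompt: str) -> str:
    """Serialize a non-streaming request body for Ollama's /api/generate endpoint."""
    body = {
        "model": model,
        "system": system_prompt,
        "prompt": prompt,
        "stream": False,
        "options": SAMPLING,
    }
    return json.dumps(body)


# Hypothetical model tag and prompts, used only for illustration.
request_body = build_generate_request(
    model="ics-asset-discovery",
    system_prompt="You are an ICS asset identification assistant.",
    prompt="22/tcp open ssh",
)
```

With a per-step batch of 4 and an effective batch of 32, gradients would be accumulated over 8 forward and backward passes before each optimizer update.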
Model | Average Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
---|---|---|---|---|
Gemma3:27b | 66.67 | 46.00 | 66.00 | 54.00 |
DeepSeek-R1:32b | 58.30 | 25.00 | 50.00 | 33.33 |
Dolphin-Mistral:7b | 33.00 | 29.00 | 33.00 | 27.00 |
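As a reading aid for the table above, the following sketch recomputes F1 as the harmonic mean of the reported precision and recall and compares it with the reported F1. It assumes nothing about how the scores were aggregated; if the reported values are macro-averaged over classes, the averaged F1 need not equal the harmonic mean of the averaged precision and recall, so a gap is not by itself an error. The helper name `f1_from` is introduced here for illustration.

```python
def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both given in percent)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from the table above.
reported = {
    "Gemma3:27b": (46.00, 66.00, 54.00),
    "DeepSeek-R1:32b": (25.00, 50.00, 33.33),
    "Dolphin-Mistral:7b": (29.00, 33.00, 27.00),
}

# Harmonic-mean F1 per model, rounded to two decimals like the table.
recomputed = {name: round(f1_from(p, r), 2) for name, (p, r, _) in reported.items()}

# Difference between the recomputed and the reported F1, in percentage points.
gaps = {name: round(recomputed[name] - reported[name][2], 2) for name in reported}
```

A small gap is consistent with rounding, whereas a gap of several points suggests that the reported F1 was averaged per class rather than derived from the aggregate precision and recall.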
Model | Avg. Time |
---|---|
Gemma3:27b | 79.47 s |
DeepSeek-R1:32b | 48.49 s |
Dolphin-Mistral:7b | 28.53 s |
Model | Avg. GPU Usage |
---|---|
Gemma3:27b | 83.90% |
DeepSeek-R1:32b | 79.96% |
Dolphin-Mistral:7b | 67.85% |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Coppolino, L.; Iannaccone, A.; Nardone, R.; Petruolo, A. Asset Discovery in Critical Infrastructures: An LLM-Based Approach. Electronics 2025, 14, 3267. https://doi.org/10.3390/electronics14163267