TwinFedPot: Honeypot Intelligence Distillation into Digital Twin for Persistent Smart Traffic Security
Abstract
1. Introduction
- We introduce a novel inverse distillation strategy in which honeypots act as passive learners and generate semantic logits using Zero-Shot Learning (ZSL), rather than training local models.
- We implement TwinFedPot, a DT-based framework that aggregates logits at the central server to construct and update a teacher model capable of real-time threat classification without raw data sharing.
- Our approach enhances system scalability, adaptability, and privacy, making it suitable for large-scale, real-time smart traffic infrastructures and other cyber–physical systems.
- Extensive experiments on the CIC-IoT2023 dataset validate our system’s performance, showing improved generalization to unseen attacks and a reduced computational burden at the edge.
2. Literature Review
2.1. Digital Twins
- IT/OT infrastructure: This layer connects the physical and digital environments using networks, sensors, cloud platforms, and computing resources. It ensures reliable data flow, system scalability, and operational efficiency.
- Virtual representation: This component models the physical system using methods such as 3D modeling, simulation, or ML. It is continuously updated to reflect real-world conditions accurately.
- Service layer: Acting as middleware, this layer provides functions for accessing, updating, and synchronizing digital twins. It maintains consistency through one-way, two-way, or real-time data exchange.
- Security and trustworthiness: This layer ensures data integrity, access control, and transparency. It defines roles, responsibilities, and policies to protect the system from cyber threats.
- Applications: This top layer focuses on real-world implementations across domains such as healthcare, smart manufacturing, energy, and urban planning, showcasing the value of DTs in improving decision-making and performance.
2.2. Honeypots
2.3. Federated Distillation (FD)
- Simple voting: In this approach, each student model transmits its hard class prediction (i.e., the argmax of its output probability distribution) to the teacher. The teacher then aggregates these predictions and selects the most frequently occurring class label among them. For example, the authors of [16] use the majority vote of lightweight student models to determine the final image classification output.
- Logit averaging: Students send raw logits or softmax outputs, which are averaged to form a consensus. This approach has been effectively adopted in various domains. For instance, Sahraoui et al. [15] designed a federated distillation framework in the IoV-healthcare context, where the server broadcasts influenza-related insights, and clients return distilled logits, ensuring privacy and reducing communication overhead.
- Kullback–Leibler (KL) divergence: The teacher receives soft probability distributions (obtained by applying softmax to logits) from the student models and computes their average to form a consensus distribution. The teacher model is then updated by minimizing the KL divergence between its own predicted probability distribution and this aggregated consensus. The authors of [17] refer to this loss as “oracle loss”, which aligns the teacher’s output with the collective semantic knowledge distilled from the students.
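The three aggregation strategies above can be compared on a toy example. The following is a minimal sketch, not the implementation used in the cited works: the function names, the logit values, and the teacher output are hypothetical, and the averaging is done on softmax outputs with the consensus as the first argument of the KL divergence, which is one common convention.

```python
import math
from collections import Counter

def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def simple_vote(student_logits):
    """Simple voting: each student sends its argmax; the majority class wins."""
    votes = [max(range(len(z)), key=z.__getitem__) for z in student_logits]
    return Counter(votes).most_common(1)[0][0]

def average_probs(student_logits):
    """Averaging: element-wise mean of the students' softmax outputs."""
    probs = [softmax(z) for z in student_logits]
    n = len(probs)
    return [sum(p[k] for p in probs) / n for k in range(len(probs[0]))]

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pk * math.log((pk + eps) / (qk + eps)) for pk, qk in zip(p, q))

# Three hypothetical students scoring one sample over three classes.
student_logits = [[2.0, 0.5, 0.1], [1.5, 1.4, 0.2], [0.3, 2.2, 0.1]]
teacher_probs = softmax([1.0, 1.0, 0.0])  # hypothetical teacher output

vote = simple_vote(student_logits)               # hard-label consensus
consensus = average_probs(student_logits)        # soft consensus
loss = kl_divergence(consensus, teacher_probs)   # quantity the teacher would minimize
```

Voting discards how confident each student was, whereas the averaged distribution keeps that information, which is what the KL-based update exploits.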
2.4. Related Works
2.4.1. Blockchain-Based Security Frameworks
2.4.2. ML/DL-Based Security Architectures
3. System Model
- $\mathcal{H} = \{h_1, \dots, h_N\}$: A set of $N$ distributed honeypots deployed across the network.
- $\mathcal{D}_i$: The local dataset at honeypot $h_i$, consisting of samples $\{x_j\}$, where each $x_j$ is a log-derived feature vector continuously collected from system activities.
- $\mathcal{Y}$: Labels annotated using the MITRE ATT&CK framework [32], representing observed Tactics, Techniques, and Procedures (TTPs).
- $f_i$: A local student model at honeypot $h_i$, trained or fine-tuned on dataset $\mathcal{D}_i$, or used to infer unseen attack classes using ZSL.
- $f_\theta$: A global teacher model with parameters $\theta$, hosted on the DT cloud server and updated using knowledge distilled from the honeypots.
- $p_i(x)$: The soft prediction (logits or probability vector) produced by the student model $f_i$ for input $x$.
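To make the idea of a zero-shot student concrete, the sketch below scores an input against class description vectors, so a class with no local training samples can still receive a logit. This is only an illustration under strong assumptions: the text here does not specify the ZSL mechanism, and the attribute space, the prototype values, the `scale` factor, and the function names are all hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def zero_shot_logits(x, class_prototypes, scale=10.0):
    """Score input x against every class description vector, seen or unseen."""
    return {label: scale * cosine(x, proto)
            for label, proto in class_prototypes.items()}

# Hypothetical attribute space: [request rate, failed-auth ratio, payload entropy].
class_prototypes = {
    "benign": [0.1, 0.0, 0.3],
    "brute_force": [0.6, 0.9, 0.2],
    "ddos_flood": [1.0, 0.0, 0.1],  # described by attributes only, never trained on
}

x = [0.9, 0.05, 0.15]  # hypothetical log-derived feature vector
logits = zero_shot_logits(x, class_prototypes)
predicted = max(logits, key=logits.get)
```

The resulting dictionary of logits is the kind of soft prediction a honeypot could transmit instead of raw logs.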
- Physical assets: Each vehicle is equipped with a Telematics Control Unit (TCU), an embedded device that collects data from key vehicle sensors (e.g., GPS, OBD-II, fuel level, tire pressure, speed). The TCU processes these real-time data and transmits them to a cloud server via wireless networks (e.g., GPRS, cellular, LTE) in a structured format for further analysis.
- Traffic digital twin (TDT) system: Hosted on a cloud server, the TDT continuously processes and analyzes incoming data (as outlined in Algorithm 1) using advanced algorithms and ML techniques. This processing yields actionable, real-time insights for congestion analysis, dynamic route optimization, and predictive navigation. The system also integrates the global FD model, which aggregates the distilled knowledge from local honeypots. This aggregation mechanism functions as a decoy strategy, mitigating potential attacks targeting the physical infrastructure and enhancing overall system resilience.
Algorithm 1 TwinFedPot: inverse FD for DT
Require: Initial DT model parameters $\theta^{(0)}$, temperature scaling factor $T$
Ensure: Updated global DT model $f_\theta$
1: while any honeypot is active do
2:   for each honeypot $h_i$, $i = 1, \dots, N$, in parallel do
3:     Acquire and preprocess local log data from host system: $\mathcal{D}_i \leftarrow \mathrm{Preprocess}(\mathrm{logs}_i)$
4:     Apply the local zero-shot student model $f_i$ to each input $x \in \mathcal{D}_i$ to obtain logits $z_i(x)$
5:     Compute soft predictions using temperature-scaled softmax: $p_i(x) = \mathrm{softmax}(z_i(x)/T)$
6:     Transmit $p_i(x)$ to the DT server ▹ Only soft labels are shared.
7:   end for
8:   Aggregate all received client predictions: $\bar{p}(x) = \frac{1}{N} \sum_{i=1}^{N} p_i(x)$
9:   Update the DT model by minimizing the KL-based distillation loss: $\theta^{(t+1)} = \arg\min_{\theta} \mathrm{KL}\left(\bar{p}(x) \,\|\, f_\theta(x)\right)$
10:  Optionally broadcast updated teacher predictions to honeypots.
11:  Increment round: $t \leftarrow t + 1$
12: end while
return Continuously enhanced DT model ▹ TwinFedPot provides scalable, privacy-preserving, and attack-aware defense without local training or raw data sharing.
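The server-side part of Algorithm 1 can be illustrated with a small, self-contained sketch. It assumes a linear softmax teacher, a single shared input, uniform averaging of client predictions, and plain gradient descent; the helper names (`aggregate`, `distill_step`), the learning rate, and all numeric values are hypothetical and are not taken from the paper.

```python
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax; a larger T gives a softer distribution."""
    m = max(logits)
    exps = [math.exp((z - m) / T) for z in logits]
    s = sum(exps)
    return [e / s for e in exps]

def aggregate(client_probs):
    """Uniform average of the soft predictions received from the honeypots."""
    n = len(client_probs)
    return [sum(p[k] for p in client_probs) / n
            for k in range(len(client_probs[0]))]

def kl(p, q, eps=1e-12):
    """KL(p || q) between two discrete distributions."""
    return sum(pk * math.log((pk + eps) / (qk + eps)) for pk, qk in zip(p, q))

def teacher_logits(W, x):
    """Linear teacher: one weight row per class."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def distill_step(W, x, target, T=2.0, lr=0.5):
    """One gradient step on KL(target || softmax(Wx / T)), updating W in place.

    Returns the loss measured before the update.
    """
    q = softmax(teacher_logits(W, x), T)
    for k, row in enumerate(W):
        grad_logit = (q[k] - target[k]) / T  # d KL / d logit_k
        for d in range(len(row)):
            row[d] -= lr * grad_logit * x[d]
    return kl(target, q)

T = 2.0
x = [1.0, 0.5]  # hypothetical shared feature vector
honeypot_logits = [[3.0, 0.2, 0.1], [2.5, 0.4, 0.3], [2.8, 1.0, 0.2]]

client_probs = [softmax(z, T) for z in honeypot_logits]  # computed at the honeypots
p_bar = aggregate(client_probs)                          # aggregation at the server
W = [[0.0, 0.0] for _ in range(3)]                       # teacher starts untrained
losses = [distill_step(W, x, p_bar, T) for _ in range(200)]
```

Only `client_probs` crosses the network in this sketch; the raw logs and the student models stay on the honeypots, which mirrors the privacy argument made for the framework.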
- Adaptive honeypot network: Cloud-based honeypots are strategically deployed across multiple servers, leveraging high-speed monitoring and logging to detect and record malicious activities in real time. The intelligent detection process unfolds through several key stages:
  - Attack detection and logging: Each honeypot mimics a genuine system, enticing attackers to interact with it. Upon an attempted intrusion, the honeypot logs detailed information such as
    - Source of the attack (IP address, geolocation, attack vector);
    - Command execution logs;
    - Traffic patterns and anomalies;
    - Access attempts and privilege escalation;
    - Protocols used (e.g., TCP, UDP, ICMP).

    Before local model training or ZSL-based inference, raw attack data are preprocessed to extract meaningful features, such as access frequency, request patterns, and behavioral anomalies, and to label known attacks based on observed behavior.
  - Local model training/inference: Each honeypot either trains a local model on its processed log data or applies ZSL to it. By analyzing features such as access frequency and behavioral patterns, the honeypot identifies both known and novel attacks, enabling adaptive detection without prior labeled examples.
  - Inverse FD (IFD) for global defense: Each honeypot shares distilled knowledge (e.g., logits or soft labels) or updated model parameters with the central server. This secure knowledge exchange strengthens the global model’s ability to detect emerging and previously unseen threats while maintaining low communication overhead and preserving data sovereignty. As illustrated in Figure 3 and formalized in Equations (2) and (3), the DT aggregates these soft predictions and refines the global model through KL-divergence minimization, enabling robust and generalized threat detection.
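The preprocessing stage described above can be sketched as a per-source feature extractor over honeypot log records. The record fields (`src_ip`, `protocol`, `event`), the event names, and the chosen features are hypothetical and serve only to illustrate how access frequency and behavioral indicators might be derived; the paper's actual feature set may differ.

```python
from collections import Counter, defaultdict

def extract_features(events, window_seconds=60.0):
    """Group log events by source IP and derive simple per-source features."""
    by_source = defaultdict(list)
    for ev in events:
        by_source[ev["src_ip"]].append(ev)

    features = {}
    for src, evs in by_source.items():
        protocols = Counter(ev["protocol"] for ev in evs)
        failed = sum(1 for ev in evs if ev["event"] == "login_failed")
        escalations = sum(1 for ev in evs if ev["event"] == "sudo_attempt")
        features[src] = {
            "access_frequency": len(evs) / window_seconds,  # events per second
            "distinct_protocols": len(protocols),
            "failed_login_ratio": failed / len(evs),
            "privilege_escalation_attempts": escalations,
        }
    return features

# Hypothetical events captured within one 60-second window.
events = [
    {"src_ip": "203.0.113.7", "protocol": "TCP", "event": "login_failed"},
    {"src_ip": "203.0.113.7", "protocol": "TCP", "event": "login_failed"},
    {"src_ip": "203.0.113.7", "protocol": "TCP", "event": "login_failed"},
    {"src_ip": "203.0.113.7", "protocol": "TCP", "event": "sudo_attempt"},
    {"src_ip": "198.51.100.2", "protocol": "ICMP", "event": "ping"},
]

features = extract_features(events)
```

Each per-source dictionary can then be turned into a fixed-order feature vector and passed to the local student model.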
4. Evaluation Analysis
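- Accuracy, representing the proportion of correctly classified instances, is given by $\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$, where $TP$, $TN$, $FP$, and $FN$ denote the numbers of true positives, true negatives, false positives, and false negatives, respectively.
- True Positive Rate (TPR), also referred to as recall or sensitivity, measures the fraction of actual attacks that are correctly identified: $\text{TPR} = \frac{TP}{TP + FN}$.
- False Positive Rate (FPR) quantifies the proportion of benign instances misclassified as attacks, calculated as $\text{FPR} = \frac{FP}{FP + TN}$.
- F1-score, the harmonic mean of precision and recall, is defined as $F_1 = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$, where $\text{Precision} = \frac{TP}{TP + FP}$ and $\text{Recall} = \text{TPR}$.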
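These metrics can be computed directly from confusion-matrix counts. The sketch below uses the standard binary definitions; the function name and the example counts are illustrative and are not results from the paper.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, TPR (recall), FPR, precision, and F1 from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * tpr / (precision + tpr)) if (precision + tpr) else 0.0
    return {
        "accuracy": accuracy,
        "tpr": tpr,
        "fpr": fpr,
        "precision": precision,
        "f1": f1,
    }

# Hypothetical counts: 90 attacks (80 detected), 910 benign flows (10 flagged).
metrics = classification_metrics(tp=80, tn=900, fp=10, fn=10)
```

With heavily imbalanced traffic, accuracy can stay high while TPR is low, which is why TPR and F1 are reported alongside it.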
4.1. Dataset
4.2. Experimental Results
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Sahraoui, Y.; Kerrache, C.A.; Hadjkouider, A.M. Crowd Management Using Mobile Crowdsourcing and Remote Sensing During the COVID-19 Pandemic. In Mobile Crowdsensing and Remote Sensing in Smart Cities; Springer: Berlin/Heidelberg, Germany, 2024; pp. 141–159.
- Sahraoui, Y.; Kerrache, C.A. Safety in the Sky: Cloud-Powered Smart Security for Vehicular Crowdsensing. In Mobile Crowdsensing and Remote Sensing in Smart Cities; Springer: Berlin/Heidelberg, Germany, 2024; pp. 187–199.
- Abbasi, S.; Rahmani, A.M.; Balador, A.; Sahafi, A. Internet of Vehicles: Architecture, services, and applications. Int. J. Commun. Syst. 2021, 34, e4793.
- Sahraoui, Y.; Kerrache, C.A. Sensors and Metaverse-Digital Twin Integration: A Path for Sustainable Smarter Cities. In Mobile Crowdsensing and Remote Sensing in Smart Cities; Springer: Berlin/Heidelberg, Germany, 2024; pp. 41–50.
- Ariyachandra, M.M.F.; Wedawatta, G. Digital twin smart cities for disaster risk management: A review of evolving concepts. Sustainability 2023, 15, 11910.
- IBM Security and Ponemon Institute. 2020 Cost of a Data Breach Report. 2020. Available online: https://www.ibm.com/think/x-force/whats-new-2020-cost-of-a-data-breach-report (accessed on 7 July 2025).
- Sedlmeir, J.; Buhl, H.U.; Fridgen, G.; Keller, R. The Energy Consumption of Blockchain Technology: Beyond Myth. Bus. Inf. Syst. Eng. 2020, 62, 599–608.
- Michael, J.; Pfeiffer, J.; Rumpe, B.; Wortmann, A. Integration challenges for digital twin systems-of-systems. In Proceedings of the 10th IEEE/ACM International Workshop on Software Engineering for Systems-of-Systems and Software Ecosystems, Pittsburgh, PA, USA, 16 May 2022; pp. 9–12.
- Digital Twin Consortium. Digital Twin Platform Stack Architectural Framework; Digital Twin Consortium: Milford, MA, USA, 2023.
- Hegedüs, D.L.; Balogh, Á.; Érsok, M.; Erdődi, L.; Olcsák, L.; Bánáti, A. Beyond Static Defense: Dynamic Honeypots for Proactive Threat Engagement. In Proceedings of the 2024 IEEE 18th International Symposium on Applied Computational Intelligence and Informatics (SACI), Timisoara, Romania, 23–25 May 2024; pp. 000547–000552.
- Shi, L.; Li, Y.; Liu, T.; Liu, J.; Shan, B.; Chen, H. Dynamic distributed honeypot based on blockchain. IEEE Access 2019, 7, 72234–72246.
- Biedermann, S.; Mink, M.; Katzenbeisser, S. Fast dynamic extracted honeypots in cloud computing. In Proceedings of the 2012 ACM Workshop on Cloud Computing Security Workshop, Raleigh, NC, USA, 19 October 2012; pp. 13–18.
- Wang, Y.; Su, Z.; Benslimane, A.; Xu, Q.; Dai, M.; Li, R. Collaborative honeypot defense in uav networks: A learning-based game approach. IEEE Trans. Inf. Forensics Secur. 2023, 19, 1963–1978.
- Wang, Z.; You, J.; Wang, H.; Yuan, T.; Lv, S.; Wang, Y.; Sun, L. HoneyGPT: Breaking the Trilemma in Terminal Honeypots with Large Language Model. arXiv 2024, arXiv:2406.01882.
- Sahraoui, Y.; Kerrache, C.A.; Calafate, C.T.; Manzoni, P. FedRx: Federated distillation-based solution for preventing hospitals overcrowding during seasonal diseases using MEC. In Proceedings of the 2024 IEEE 21st Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 6–9 January 2024; pp. 558–561.
- Kang, J.; Gwak, J. Ensemble learning of lightweight deep learning models using knowledge distillation for image classification. Mathematics 2020, 8, 1652.
- Kang, M.; Mun, J.; Han, B. Towards oracle knowledge distillation with neural architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 4404–4411.
- Putz, B.; Dietz, M.; Empl, P.; Pernul, G. Ethertwin: Blockchain-based secure digital twin information management. Inf. Process. Manag. 2021, 58, 102425.
- Liu, X.; Jiang, Y.; Wang, Z.; Zhong, R.Y.; Cheung, H.; Huang, G.Q. imseStudio: Blockchain-enabled secure digital twin platform for service manufacturing. Int. J. Prod. Res. 2023, 61, 3984–4003.
- Kumar, P.; Kumar, R.; Kumar, A.; Franklin, A.A.; Garg, S.; Singh, S. Blockchain and deep learning for secure communication in digital twin empowered industrial IoT network. IEEE Trans. Netw. Sci. Eng. 2022, 10, 2802–2813.
- Amofa, S.; Xia, Q.; Xia, H.; Obiri, I.A.; Adjei-Arthur, B.; Yang, J.; Gao, J. Blockchain-secure patient Digital Twin in healthcare using smart contracts. PLoS ONE 2024, 19, e0286120.
- Kumar, R.; Aljuhani, A.; Javeed, D.; Kumar, P.; Islam, S.; Islam, A.N. Digital twins-enabled zero touch network: A smart contract and explainable AI integrated cybersecurity framework. Future Gener. Comput. Syst. 2024, 156, 191–205.
- Khan, A.; Shahid, F.; Maple, C.; Ahmad, A.; Jeon, G. Toward smart manufacturing using spiral digital twin framework and twinchain. IEEE Trans. Ind. Inform. 2020, 18, 1359–1366.
- Lu, Y.; Huang, X.; Zhang, K.; Maharjan, S.; Zhang, Y. Low-latency federated learning and blockchain for edge association in digital twin empowered 6G networks. IEEE Trans. Ind. Inform. 2020, 17, 5098–5107.
- Liao, S.; Wu, J.; Bashir, A.K.; Yang, W.; Li, J.; Tariq, U. Digital twin consensus for blockchain-enabled intelligent transportation systems in smart cities. IEEE Trans. Intell. Transp. Syst. 2021, 23, 22619–22629.
- Salim, M.M.; Camacho, D.; Park, J.H. Digital twin and federated learning enabled cyberthreat detection system for IoT networks. Future Gener. Comput. Syst. 2024, 161, 701–713.
- Krishnaveni, S.; Sivamohan, S.; Jothi, B.; Chen, T.M.; Sathiyanarayanan, M. TwinSec-IDS: An Enhanced Intrusion Detection System in SDN-Digital-Twin-Based Industrial Cyber-Physical Systems. Concurr. Comput. Pract. Exp. 2025, 37, e8334.
- Boyina, V.A.K.; Chettier, T.M.; Patolia, D.; Gupta, N. Cloud-Based Digital Twin for Cybersecurity Threat Prediction. In Proceedings of the 2025 3rd International Conference on Advancement in Computation & Computer Technologies (InCACCT), Mohali, India, 17–18 April 2025; pp. 762–766.
- Suhail, S.; Iqbal, M.; Hussain, R.; Jurdak, R. ENIGMA: An explainable digital twin security solution for cyber–physical systems. Comput. Ind. 2023, 151, 103961.
- Yigit, Y.; Bal, B.; Karameseoglu, A.; Duong, T.Q.; Canberk, B. Digital Twin-Enabled Intelligent DDoS Detection Mechanism for Autonomous Core Networks. IEEE Commun. Stand. Mag. 2022, 6, 38–44.
- Yigit, Y.; Kinaci, O.K.; Duong, T.Q.; Canberk, B. TwinPot: Digital twin-assisted honeypot for cyber-secure smart seaports. In Proceedings of the 2023 IEEE International Conference on Communications Workshops (ICC Workshops), Rome, Italy, 28 May–1 June 2023; pp. 740–745.
- MITRE Corporation. MITRE ATT&CK Framework. 2024. Available online: https://attack.mitre.org (accessed on 27 March 2025).
- Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CICIoT2023: A real-time dataset and benchmark for large-scale attacks in IoT environment. Sensors 2023, 23, 5941.
- Li, L.; Gou, J.; Yu, B.; Du, L.; Yi, Z.; Tao, D. Federated Distillation: A Survey. arXiv 2024, arXiv:2404.08564.
- Hilprecht, B.; Binnig, C. Zero-Shot Cost Models for Out-of-the-box Learned Cost Prediction. In Proceedings of the 12th Conference on Innovative Data Systems Research (CIDR), Amsterdam, The Netherlands, 12–15 January 2020.
Study | Decentralized Arch. | Tech. | Attack Awareness | Privacy Support | Defense Adaptivity | Resource Efficiency | Real-Time Monitoring | Scalability |
---|---|---|---|---|---|---|---|---|
[18] | ✓ | Blockchain | Limited | Partial | ✗ | Medium | Medium | High |
[19] | ✓ | Blockchain | ✗ | ✓ | ✗ | Medium | High | Medium |
[20] | ✓ | Blockchain, DL | Weak | ✓ | Moderate | ✗ | ✓ | Low |
[21] | ✓ | Blockchain | ✗ | ✓ | ✗ | ✗ | ✓ | Low |
[23] | ✓ | Blockchain | ✓ | ✓ | ✓ | Low | N/A | Moderate |
[22] | ✓ | Blockchain, XAI | Limited | ✓ | Moderate | Low | ✗ | Moderate |
[24] | ✓ | Blockchain, FL | ✓ | ✓ | ✓ | Low | ✓ | Moderate |
[25] | ✓ | Blockchain, Auction | Limited | Moderate | Low | Low | N/A | ✓ |
[26] | ✓ | Opt. FL | ✓ | ✓ | ✓ | Moderate | ✓ | ✓ |
[27] | ✗ | Hybrid DL | ✓ | ✗ | ✓ | Moderate | ✓ | Medium |
[28] | ✗ | ML | ✓ | ✓ | Partial | Medium | ✓ | Moderate |
[29] | ✗ | XAI | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ |
[31] | ✗ | Honeypot, ML | High | Weak | Moderate | Low | Medium | Low |
Ours | ✓ | Honeypot + DT + FD | High | High | High | High | High | High |
AI Model | Accuracy | TPR | Precision | F1-Score |
---|---|---|---|---|
Decision Trees (DTs) | 0.9708 | 0.8337 | 0.8104 | 0.8215 |
Random Forest (RF) | 0.9825 | 0.7955 | 0.9885 | 0.8658 |
XGBoost | 0.9836 | 0.8195 | 0.9677 | 0.8768 |
Adaboost | 0.9681 | 0.6964 | 0.9599 | 0.7661 |
Logistic Regression (LR) | 0.8930 | 0.4042 | 0.7874 | 0.4200 |
Quadratic Discriminant Analysis (QDA) | 0.4918 | 0.6393 | 0.5262 | 0.4412 |
DNN | 0.9624 | 0.5893 | 0.9156 | 0.6186 |
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Sahraoui, Y.; Hadjkouider, A.M.; Kerrache, C.A.; Calafate, C.T. TwinFedPot: Honeypot Intelligence Distillation into Digital Twin for Persistent Smart Traffic Security. Sensors 2025, 25, 4725. https://doi.org/10.3390/s25154725