Optimizing Internet of Things Honeypots with Machine Learning: A Review

Lanz, Stefanie; Pignol, Sarah Lily-Rose; Schmitt, Patrick; Wang, Haochen; Papaioannou, Maria; Choudhary, Gaurav; Dragoni, Nicola

doi:10.3390/app15105251

by Stefanie Lanz, Sarah Lily-Rose Pignol, Patrick Schmitt, Haochen Wang, Maria Papaioannou, Gaurav Choudhary and Nicola Dragoni *

Section of Cybersecurity Engineering, Department of Applied Mathematics and Computer Science, Technical University of Denmark (DTU), 2800 Kongens Lyngby, Denmark

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(10), 5251; https://doi.org/10.3390/app15105251

Submission received: 1 April 2025 / Revised: 30 April 2025 / Accepted: 5 May 2025 / Published: 8 May 2025

(This article belongs to the Special Issue Machine Learning and Data Analysis: Bridging Theory and Real-World Solutions)


Abstract

The increasing use of Internet of Things (IoT) devices has led to growing security concerns, necessitating advanced solutions to address emerging threats. Honeypots enhance IoT security by attracting and analyzing attackers. However, traditional honeypots struggle with adaptability and efficiency. This paper examines how machine learning enhances honeypot capabilities by improving threat detection and response mechanisms. A systematic literature review using the snowballing method explores the application of supervised, unsupervised, and reinforcement learning. Various machine learning classifiers are analyzed to optimize honeypot architectures. This paper focuses on two types of honeypots: dynamic honeypots, which evolve to mislead attackers, and adaptive honeypots, which respond to threats in real time. By evaluating low-interaction, high-interaction, and hybrid honeypots, we determine how different machine learning techniques enhance detection and resource efficiency. Key findings include improved detection: supervised learning models such as random forest reach an accuracy of up to 0.96. Adaptive honeypots utilizing machine learning demonstrate better resource management, reducing false positives and optimizing computational resources. Despite these improvements, high computational demands and limited real-world testing hinder widespread adoption in IoT environments. This paper provides an overview of current trends, identifies research gaps, and offers insights for developing more intelligent IoT honeypots. The reviewed evidence indicates that machine learning can help create more resilient and adaptive security solutions for IoT networks.

Keywords:

honeypot; internet of things; IoT; machine learning

1. Introduction

The Internet of Things (IoT) is increasingly critical in industry and society, impacting domains such as smart cities and homes. According to McKinsey [1], the global value of IoT could reach $5.5 to $12.6 trillion by 2030, including the value captured by consumers and customers of IoT products and services. The proliferation of IoT devices generates massive data volumes, necessitating efficient and secure communication [2]. As research by Nikolov [3] demonstrates, standard protocols such as XMPP, CoAP, and HTTP are often inadequate for IoT environments, leading to security vulnerabilities. IoT security faces numerous threats, including denial-of-service (DoS) attacks, eavesdropping, privilege escalation, and web injection [2]. The reliance on real-time wireless communication further expands the attack surface.

Honeypots play a vital role in IoT security by simulating vulnerable systems and capturing malicious activities. As the survey provided by Bringer [4] highlights, honeypots are used against various attacks, including malware, phishing, and DoS attacks. However, existing systems primarily focus on individual protocols and lack comprehensive platform simulations. Effective honeypots must exhibit realistic behaviors and interactions to deceive attackers. The diversity of IoT communication protocols complicates the development of versatile honeypots, and many protocols remain insufficiently tested [5].

Machine learning has emerged as a promising solution to address the challenges associated with IoT honeypots. It enables rapid analysis of extensive datasets, identifying attack patterns and malicious behaviors. Machine learning can automate security system adaptations, providing effective solutions for resource-constrained IoT environments [6]. Hussain [6] collected different machine learning techniques for various IoT threats, such as DoS and malware attacks. By leveraging historical attack data, machine learning models enhance threat detection and identify emerging attack vectors. Additionally, machine learning-driven honeypots can dynamically generate and adapt simulated environments, improving their realism and effectiveness [7].

This paper highlights the key characteristics of IoT honeypots with the application of machine learning techniques. In particular, the authors provide an in-depth analysis of specific honeypot and machine learning techniques, their functionalities, and benefits. The authors also introduce architectural approaches for IoT honeypots and compare findings to derive insights to optimize honeypot architectures and apply machine learning techniques in IoT security. The following research questions guide the analysis of this research work:

RQ 1: What is the current state of research on the integration of machine learning in IoT honeypots?

This question aims to provide an overview of existing studies and frameworks that explore the application of machine learning techniques in IoT honeypots. By reviewing the current literature, this research work seeks to identify key trends, methodologies, and research gaps, forming the basis for understanding how machine learning is being integrated into honeypot technologies.

RQ 2: Are there existing IoT honeypots that utilize machine learning, and what techniques do they employ?

The second research question examines practical implementations of machine learning-based IoT honeypots. It focuses on the specific machine learning techniques employed, such as supervised, unsupervised, or reinforcement learning, and their contributions to improving attack detection, classification, and response mechanisms. This analysis aims to highlight the practical benefits of machine learning in enhancing honeypot interactions, adaptability, and threat identification.

RQ 3: What challenges can be addressed using machine learning, and what potential improvements can it bring to IoT honeypots?

The third research question explores the broader implications of machine learning integration in IoT honeypots. By assessing the limitations and challenges currently faced by honeypots, this research work investigates how machine learning technologies can mitigate these issues. Additionally, this question examines the potential advantages of machine learning, including the development of dynamic honeypots and the combined use of multiple machine learning methodologies to create more adaptive and resilient security solutions.

To address these research questions, the authors conduct a systematic literature review following Wohlin’s snowballing methodology [8], with results classified using a concept-matrix approach inspired by Webster’s framework [9]. The core concepts analyzed include interactive honeypots, machine learning methodologies, detection architectures, and classifier technologies. The results section addresses the first two research questions by examining existing approaches and techniques. Building on these findings, the discussion section explores the advantages of dynamic honeypots over adaptive honeypots in IoT environments and evaluates the combined benefits of multiple machine learning methodologies. By synthesizing the current state of research and identifying areas for improvement, this paper aims to contribute to the advancement of machine learning-enhanced IoT honeypots, ultimately bolstering cybersecurity in IoT ecosystems.

The rest of this paper is organized as follows: Section 2 provides the theoretical background, introducing IoT, machine learning, and honeypot concepts. Section 3 compares this paper with existing surveys, highlighting key differences and contributions. Section 4 outlines the methodology of the systematic literature review. Section 5 presents the findings, covering interaction honeypots, machine learning techniques, detection architectures, and classifier technologies. Section 6 discusses the potential of dynamic honeypots over adaptive honeypots and examines the combined advantages of multiple machine learning techniques in IoT honeypots.

2. Theoretical Background

This section provides the theoretical background on honeypots, the IoT, and machine learning. The subsections are deliberately general and give an overview of each technology.

2.1. Honeypot

Honeypots are a tool for cyber deception that was initially introduced in the 1990s [4]. They are used to detect intruders and trick them into connecting to a certain system within the network. Since no authorized traffic is expected at a honeypot, any data it collects can be categorized as malicious and the result of an intrusion [10]. The flip side is that honeypots are useless if they are detected and subsequently avoided by attackers [11].

Honeypots can be divided into two main categories, depending on their purpose: research honeypots and production honeypots [11]. A research honeypot collects, observes, and studies the actions of attackers and the tools they use to penetrate the system, with the aim of identifying unknown vulnerabilities and attacks [11]. The information gathered about how attackers operate and execute attacks helps an organization prepare for the possibility of a real attack. However, it should be noted that the research honeypot itself does not contribute to security; only the collected data can be used to improve security in the future [12]. In contrast to research honeypots, production honeypots are primarily used for defense. They are placed in the production network behind the firewall and serve to mislead attackers by simulating a real system. This deception not only distracts attackers from the actual system but also notifies system administrators of potential attackers [11]. Production honeypots mimic corporate networks or services and entice attackers to expose vulnerabilities while collecting valuable data. They excel at detection performance by addressing common issues of intrusion detection systems (IDS), such as false positives and negatives and the handling of large traffic volumes. Unlike the traffic seen by an IDS, traffic to a honeypot is unlikely to be authorized, which simplifies analysis and reduces noise. While less effective as a standalone prevention mechanism, they complement robust security measures such as firewalls and system patching. Honeypots also enable detailed forensic analysis by isolating compromised computers without disrupting production systems [12].

As mentioned earlier, honeypots can be categorized by their purpose. Another very important way of categorizing honeypots is by their level of interaction. In general, honeypots can be divided into low-, medium-, and high-interaction honeypots [13]. Low-interaction honeypots simulate limited services without providing full system access, thus reducing risks but offering restricted functionality [13]. Examples like Honeyd and Specter are easy to implement, providing a façade for attackers while supporting the analysis of threats such as spammers and worms [12]. Medium-interaction honeypots, on the other hand, offer more technical complexity than low-interaction honeypots, but they remain less advanced than high-interaction ones. They simulate services that create a more convincing illusion of an operating system, enabling the logging and analysis of more sophisticated attacks while maintaining a low risk of compromise [14]. Examples such as mwcollect, nepenthes, and honeytrap are characterized by collecting malware and handling some unknown attacks through dynamic responses [12]. High-interaction honeypots are the most advanced, providing attackers with a real operating system to interact with, which allows for extensive data collection and analysis. However, they are complex to design, carry higher risks, and require constant monitoring to prevent them from becoming security vulnerabilities. Honeynets are a notable example, primarily used for research purposes [12,13].
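To make the idea of a low-interaction honeypot more concrete, the following minimal Python sketch emulates only the first step of a login service: it sends a banner, records whatever the peer sends first, and closes the connection. It is an illustration under stated assumptions and does not reproduce Honeyd, Specter, or any system reviewed here; the banner text, the port 2323, and the function names are hypothetical.

```python
import socket
from datetime import datetime, timezone

# Hypothetical banner imitating the login prompt of an embedded device.
FAKE_BANNER = b"BusyBox v1.20.2 built-in shell\r\nlogin: "


def handle_connection(conn, peer, log):
    """Send the fake banner, record the first bytes received, then hang up."""
    conn.settimeout(2.0)
    conn.sendall(FAKE_BANNER)
    try:
        data = conn.recv(1024)
    except socket.timeout:
        data = b""  # the peer connected but sent nothing
    log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "peer": peer,
        "payload": data.decode("latin-1").strip(),
    })
    conn.close()


def serve(host="127.0.0.1", port=2323, max_connections=1):
    """Accept a fixed number of connections and return the interaction log."""
    log = []
    with socket.create_server((host, port)) as server:
        for _ in range(max_connections):
            conn, peer = server.accept()
            handle_connection(conn, peer, log)
    return log
```

No real shell or operating system is ever exposed, which is what keeps the risk low and the captured interaction shallow. A high-interaction honeypot would instead hand the attacker a real, closely monitored system.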

Honeypots offer significant advantages as well as significant risks, making them a valuable but cautious addition to security mechanisms. On the one hand, they excel at collecting small, high-value datasets by focusing exclusively on traffic directed at them, avoiding the burdens of overwhelming logs or excessive alerts. Furthermore, their minimal resource requirements make them cost-effective, as they can operate efficiently on low-end or retired systems [12]. On the other hand, honeypots come with inherent limitations. They only monitor interactions in which they are directly involved, so attacks on other parts of the system remain undetected. Furthermore, attackers can reveal a honeypot’s true nature through fingerprinting, and if a honeypot is compromised, it could be exploited for attacks or to store illegal content [12].

2.2. Internet of Things

At its core, the IoT serves as a bridge between the real and virtual worlds, facilitating interactions between physical objects and digital systems. Through the internet, IoT devices collect, process, and share data, enabling real-time monitoring, control, and automation. This connectivity transforms traditional objects into smart devices capable of improving efficiency, optimizing resources, and enhancing user experiences. The concept of IoT has evolved significantly over the decades. The first instance of IoT can be traced back to the mid-1970s at Carnegie Mellon University, where a Coke vending machine was connected to the internet, allowing students to check its status remotely. This early example demonstrated the potential of connected devices long before the term “IoT” was coined [15]. In 2005, the International Telecommunication Union (ITU) formally introduced the concept of IoT at the World Summit on the Information Society in Tunisia, presenting it as a revolutionary technology with the potential to reshape global society [15].

IoT can be simply defined as the integration of sensors, processors, and controllers for actuation, all interconnected via the internet. These systems gather data from the environment, process them, and analyze them to inform decisions or trigger automated actions. Data streams generated by IoT devices can be stored for future reference or processed in real-time to enable immediate responses, making IoT a powerful tool for improving efficiency and enabling autonomous operations. A defining characteristic of IoT is its ability to operate autonomously, minimizing human intervention. IoT systems monitor and control processes independently, ensuring consistent and efficient performance. These systems also optimize resource utilization, reduce delays, and save time, making them invaluable in applications ranging from healthcare to industrial automation.

Despite their potential, many IoT devices operate on low-cost hardware, which restricts their ability to perform complex tasks or implement advanced security measures. Additionally, their reliance on low-bandwidth communication channels poses challenges for data-heavy applications and limits the adoption of robust encryption, increasing their vulnerability to attacks. The deployment of IoT devices in physically insecure or hostile environments further exposes them to risks. Remote management and the absence of manual control make it difficult to monitor and secure these devices effectively. Many IoT systems also rely on Radio Frequency Identification (RFID) technology, which is vulnerable to eavesdropping, spoofing, and other security threats, adding to the challenges of ensuring safe operations [16].

IoT systems are susceptible to a variety of security threats, including sinkhole, HELLO flood, wormhole, and Sybil attacks. In a sinkhole attack, a malicious node disrupts communication by attracting and manipulating traffic within the network. HELLO flood attacks involve flooding the network with false messages, leading to instability and data loss. Wormhole attacks tunnel data between distant nodes, creating routing disruptions, while Sybil attacks see a malicious node assuming multiple identities, enabling network manipulation or denial of service. These threats underscore the importance of robust security measures in IoT environments [16].

The IoT represents a groundbreaking shift in how the physical and digital worlds interact. By connecting real-world objects to the internet, IoT enables intelligent systems that enhance efficiency and autonomy. However, the challenges of resource limitations and security vulnerabilities must be addressed to fully realize its potential. As IoT continues to evolve, it remains a cornerstone of innovation across industries, offering unprecedented opportunities for connectivity and automation.

2.3. Machine Learning

Machine learning has its origins in the 1950s, when the term “intelligent machinery” was introduced [17]. Among the earliest advances is the work of Arthur Samuel, who developed the first machine learning algorithm in 1955, known as the Draughts algorithm, which could play the board game draughts (checkers) [17]. The development of the theory behind machine learning is heavily influenced by the concepts of neurobiology and mathematics [17]. As early as 1943, McCulloch and Pitts formulated a model of the neuron that used binary inputs and threshold logic, followed by Hebb’s 1949 rule, which describes the strengthening of neural connections during simultaneous activation [17]. Important milestones followed with Rosenblatt’s perceptron algorithm in 1957 and Minsky’s identification of the XOR problem in 1969, which showed the limitations of perceptrons [17]. In the 1980s, significant progress was made with the introduction of Hopfield networks and Boltzmann machines, as well as the refinement of multi-layer perceptrons (MLPs) by Rumelhart, Hinton, and McClelland with backpropagation. The development of classifiers and algorithms such as Support Vector Machines (1995), AdaBoost (1997), and Random Forests (2001) has expanded and improved the applicability of machine learning. The third wave of neural networks began around 2005, led by researchers such as Hinton, LeCun, and Bengio, who established the principles of deep learning [17].

Machine learning is a subfield of artificial intelligence, while deep learning is a specialized area within machine learning [17]. Today, machine learning is applied in a wide range of areas, including computer vision, prediction models, semantics, natural language processing, and information retrieval. Computer vision includes object recognition, detection, and processing, while prediction includes classification, analysis, and recommendation of topics. Semantic analysis refers to the process of relating the syntactic structures of paragraphs, sentences, and words to the entire text level. Natural language processing is concerned with programming and teaching computers to handle natural language data properly. Information retrieval is the science of searching for information in documents [17]. Important machine learning algorithms include linear classifiers, decision trees, support vector machines, and artificial neural networks, which form the basis for these applications and enable complex, data-driven solutions in numerous disciplines [17].

3. Comparison of Existing Surveys

In the existing literature, several surveys are already available that are thematically related to this research work. It is noticeable that these either focus on the use of machine learning techniques in specific cybersecurity domains or on honeypots in IoT environments, but without establishing a connection between the two aspects. In addition, most of the surveys are from 2018. This research work addresses this research gap by examining the application of machine learning techniques in the context of IoT honeypots. The thematic classification of the existing surveys is presented in Table 1.

This categorization makes it clear that, so far, no comprehensive survey exists that combines IoT honeypots with machine learning techniques. The aim of this research work is therefore to address this research gap and systematically analyze existing solutions.

The paper by Dangi et al. [18] belongs to the category of Machine Learning Techniques in Cybersecurity, as it extensively examines the use of machine learning and deep learning for securing 5G network slices. It analyzes the security risks that arise throughout the lifecycle of these virtual networks and illustrates how machine learning algorithms can be applied for attack detection, fault prevention, and network security optimization. Additionally, the paper provides a comprehensive overview of existing machine learning-based security solutions, compares various approaches, and develops a taxonomy that systematically describes the application of machine learning for different network functions. Since the focus is clearly on machine learning technologies for improving cybersecurity, this paper is categorized accordingly.

The paper by Franco et al. [19] falls into both the Honeypot Technology and IoT Environment categories, as it deals with the use of honeypots and honeynets to secure IoT, IIoT, and CPS systems. It analyzes the increasing threats to these interconnected environments due to their inherent vulnerabilities and demonstrates how honeypots can serve as a deception technology to identify attackers and better understand their methods. Furthermore, the paper presents a detailed taxonomy of existing honeypot and honeynet approaches, describes key design factors, and discusses open challenges in this research area. Due to its comprehensive approach, it is classified under both honeypot technology and IoT environments.

The paper by Jain et al. [20] belongs to the category Machine Learning Techniques in Cybersecurity, as it focuses on detecting and mitigating DDoS attacks in software-defined networks (SDNs) while analyzing various machine learning-based security solutions. It examines existing methods for identifying and mitigating DDoS attacks and evaluates them based on their effectiveness. Additionally, different types of DDoS attacks are described, along with the datasets, tools, and simulators used for attack detection. A key component of the paper is the comparative analysis of different DDoS detection techniques, where machine learning models play a crucial role. Since the paper strongly emphasizes machine learning-based security measures, it is clearly assigned to this category.

The paper by Kaur et al. [21] is primarily categorized under Honeypot Technology, as it addresses various security threats to IoT devices and highlights honeypots as an effective protective measure against attacks. It analyzes the vulnerabilities of different IoT devices and discusses various security solutions for detecting and combating threats such as botnet attacks, malware, and hacking. A significant portion of the paper is dedicated to the role of honeypots, which can deceive attackers and prevent attacks proactively, eliminating the need for subsequent detection. Additionally, it describes a simple implementation of a honeypot for resource-constrained IoT devices. Since the paper only briefly touches on the IoT environment and primarily focuses on honeypot technologies as a security measure, it is mainly categorized under Honeypot Technology, with a minor connection to the IoT Environment.

The paper by Oza et al. [22] belongs to both the Honeypot Technology and IoT Environment categories, as it explores the application of honeypots and honeynets to secure IoT devices against cyberattacks. It highlights the growing threat of cyberattacks on IoT infrastructures and demonstrates how honeypots can be used as a deception technology to attract attackers, analyze their methods, and develop appropriate protective measures. While simple honeypots often secure only individual systems, the paper also describes the use of honeynets, which consist of networks of highly interactive honeypots that simulate more realistic attack targets. Through the comprehensive analysis of different implementation scenarios, the paper demonstrates how these technologies not only serve a preventive function but also provide valuable data for attack detection. Since the focus is on both honeypot technology and its specific application in the IoT environment, the paper is assigned to both categories.

The paper by Razali et al. [23] also falls into both the Honeypot Technology and IoT Environment categories, as it investigates the application of honeypots in the context of IoT security. It describes honeypots as isolated networks that mimic real systems to attract attackers and monitor their interactions. Particularly in the IoT sector, where connected devices increasingly become targets of cyberattacks, honeypots serve as an effective means of attack detection and threat pattern analysis. The paper emphasizes that research on honeypots in IoT environments is becoming increasingly relevant, as IoT devices, due to their widespread use and security gaps, represent attractive attack targets. It provides an overview of various honeypot implementations in IoT and highlights their significance for cybersecurity. Since both the technology and its specific application in the IoT environment are central, the paper is clearly categorized under both domains.

4. Methodology

In the following, the rationale behind employing a systematic literature review, the methods utilized, and the process of conducting the literature search are presented in detail.

4.1. Research Design

We conducted a systematic literature review to identify and summarize the current state of machine learning in the development of IoT honeypots. This approach provides a structured and unbiased process for identifying relevant research and ensures a comprehensive and transparent overview of the research field. A systematic review is particularly valuable in areas such as IoT, where the research landscape is large and diverse. By applying this research methodology, this study was able to minimize biases, such as sample selection bias, and provide a reproducible framework for other researchers to build on [8]. The review process followed the guidelines of Petersen et al. [24] and included Wohlin’s snowballing technique [8] to balance thoroughness and efficiency. This approach allows us to identify both fundamental literature and recent advances. In particular, Wohlin’s snowballing technique plays a crucial role by starting from an initial set of relevant papers and using their references (backward snowballing) and citations (forward snowballing) to broaden the scope of the review. This approach was essential for capturing retrospective and prospective research [8]. Petersen et al. also explicitly pointed out that snowballing is an effective strategy for conducting systematic literature reviews in various research areas [24].

After completing the snowballing process, the authors created a concept-matrix as part of the data analysis to classify the results. Following Petersen’s guidelines, this topic-specific classification helped us to both refine existing topics and identify emerging topics [24]. The concept-matrix was based on Webster’s concept-centered matrix methodology, which supports logical analysis and the identification of key concepts [9]. This framework also enabled a deeper understanding of the IoT honeypot research area and its intersection with machine learning technologies. By combining backward and forward snowballing and presenting the results in a concept-matrix, this study was able to ensure a broad exploration of the field. This dual approach not only captured the current state of research but also provided insights into future directions in machine learning applications for IoT honeypots. This structured and rigorous methodology allowed us to build a solid foundation for knowledge growth in this emerging field.
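As an illustration of what a concept-centered matrix looks like as a data structure, the following Python sketch maps each paper to the concepts it covers and derives the column totals. The concept names are the four core concepts named in the introduction; the papers and their tags are placeholders invented for the example and do not correspond to the reviewed literature.

```python
# Concept columns: the four core concepts analyzed in this review.
CONCEPTS = (
    "interaction honeypots",
    "machine learning methodologies",
    "detection architectures",
    "classifier technologies",
)


def build_concept_matrix(tagged_papers, concepts=CONCEPTS):
    """Return {paper: {concept: True/False}} with one row per paper."""
    return {
        paper: {concept: concept in tags for concept in concepts}
        for paper, tags in tagged_papers.items()
    }


def papers_per_concept(matrix):
    """Count how many papers cover each concept (column totals)."""
    totals = {}
    for row in matrix.values():
        for concept, covered in row.items():
            totals[concept] = totals.get(concept, 0) + int(covered)
    return totals


# Placeholder papers and tags, invented purely for illustration.
example = {
    "Paper A": {"interaction honeypots", "classifier technologies"},
    "Paper B": {"machine learning methodologies"},
    "Paper C": {"interaction honeypots", "detection architectures"},
}
matrix = build_concept_matrix(example)
totals = papers_per_concept(matrix)
```

Reading the matrix by column rather than by row is the point of the concept-centered view: the totals show at a glance which concepts are well covered and which are sparsely studied.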

4.2. Data Collection

For the collection of relevant literature, the authors employed a systematic and structured approach, adhering to best practices and guidelines from Petersen et al. [24]. The first step in the data collection process was to define key search terms to ensure that all relevant literature on IoT honeypots and machine learning techniques was included.

The authors defined the core search terms as “honeypot”, “IoT” and “Internet of Things”. These terms reflect the primary focus of this research work on the development of honeypots within the IoT domain. Given that these terms are quite broad, they were combined using the logical operators “AND” and “OR” to increase the relevance of the search results. The precise search string used across all databases was:

(“iot” OR “internet of things”) AND “honeypot”

This search string allowed us to retrieve literature directly related to IoT honeypots as well as those exploring other cyber security and machine learning concepts in this domain.
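The Boolean logic of the search string can be sketched in Python as a filter over titles or abstracts. This is only an approximation: each database applies its own matching rules, so the case-insensitive whole-word matching and the acceptance of the plural "honeypots" are assumptions of this sketch, and the function name is hypothetical.

```python
import re

# ("iot" OR "internet of things"): whole-word match, case-insensitive (assumed).
IOT_TERMS = re.compile(r"\b(?:iot|internet of things)\b", re.IGNORECASE)
# "honeypot": also accepts the plural form (assumed).
HONEYPOT_TERM = re.compile(r"\bhoneypots?\b", re.IGNORECASE)


def matches_search_string(text):
    """True if the text satisfies ("iot" OR "internet of things") AND "honeypot"."""
    return bool(IOT_TERMS.search(text)) and bool(HONEYPOT_TERM.search(text))
```

For example, a title such as "Honeypots for the Internet of Things" satisfies the query, whereas "IoT intrusion detection" does not, because the second operand of the AND is missing.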

The authors selected four key databases for the literature search: Scopus, IEEE Xplore, ACM, and DTU Find. The authors chose these databases because of their broad and interdisciplinary coverage, making them well-suited for accessing a wide range of research articles in the relevant fields. Scopus was chosen for its extensive coverage of interdisciplinary scientific publications. It is particularly strong in computer science and applied engineering, making it an excellent source for conference papers and journal articles relevant to our paper. ACM Digital Library and IEEE Xplore are renowned for their focus on computer science and engineering, containing numerous publications on security architectures and machine learning techniques within the context of cybersecurity. DTU Find provides access to technical and engineering-focused research, including literature on advanced IoT technologies, which aligns well with the research objectives of this research work.

The literature research was conducted in October 2024. Therefore, the search covers developments in the field of machine learning for IoT honeypots up to and including this period. The results from these databases are summarized in Table 2, which shows the number of hits, i.e., the total number of articles retrieved, per database. Initially, 40 relevant articles were identified based on the defined search terms, forming the basis for the subsequent steps of the literature review process followed by this research work.

To ensure the thoroughness of this review, the authors established inclusion and exclusion criteria, drawing upon the guidelines provided by Wohlin et al. [8]. These criteria were designed to ensure that only high-quality and relevant literature was included in our analysis. The inclusion criteria included studies, conference papers, and technical reports that addressed machine learning techniques in IoT honeypots or related security mechanisms. The authors also focused on literature that provided detailed descriptions of methods, experiments, or technologies used, as this was essential for data analysis of this research work.

The exclusion criteria were as follows:

Literature that did not address honeypots or was unrelated to IoT.
Literature that lacked insights into machine learning techniques.
Literature that focused narrowly on technologies outside the scope of our research.
Literature that only focused on the hardware aspects of honeypots and did not incorporate machine learning.

Following these selection criteria, the authors initially identified 40 relevant articles, which were then used as the foundation for the snowballing process. This process, as described by Wohlin et al. [8], was employed to identify additional relevant literature by following the citation links of the selected articles. The snowballing process was carried out in two phases: backward and forward snowballing. Backward snowballing involved reviewing the reference lists of the selected papers to identify older relevant literature. By examining these references, this study could expand the scope of the search and identify foundational literature that informed current research in the area of IoT honeypots and machine learning. Forward snowballing focused on literature that cited the selected papers. This allowed us to identify newer works that were influenced by or built upon the selected articles.
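The two snowballing directions can be illustrated on a toy citation graph. The following is a minimal sketch only: the paper identifiers, the `references` mapping, and the `is_relevant` predicate are hypothetical placeholders and do not reproduce the data or tooling used in this review.

```python
def snowball(start_set, references, is_relevant, max_cycles=3):
    """Expand a paper set by backward and forward snowballing (illustrative)."""
    # Invert the reference map once to obtain "cited by" links,
    # which are needed for forward snowballing.
    cited_by = {}
    for paper, refs in references.items():
        for ref in refs:
            cited_by.setdefault(ref, set()).add(paper)

    selected = set(start_set)
    frontier = set(start_set)
    for _ in range(max_cycles):
        candidates = set()
        for paper in frontier:
            candidates |= set(references.get(paper, ()))  # backward: older work
            candidates |= cited_by.get(paper, set())      # forward: newer work
        new = {p for p in candidates - selected if is_relevant(p)}
        if not new:
            break  # no further relevant papers were found in this cycle
        selected |= new
        frontier = new
    return selected


# Hypothetical toy graph: each key cites the papers in its list.
references = {"A": ["B", "X"], "C": ["A"], "D": ["C"], "B": []}
relevant = {"A", "B", "C", "D"}  # stands in for the inclusion/exclusion criteria
result = snowball({"A"}, references, lambda p: p in relevant)
```

In this toy run, paper X is reached through backward snowballing but rejected by the relevance check, which mirrors how the inclusion and exclusion criteria filter candidates in each cycle.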

Figure 1 shows the full snowballing process the authors applied, carried out in three iterative cycles to refine and expand the literature base. The process began with 40 papers from the first search. Through snowballing, a total of 49 relevant papers were identified for the final set—34 from the first search, 10 through backward snowballing, and 5 through forward snowballing.

The authors managed and organized the selected literature with Citavi 6 as a reference management tool, which facilitated the efficient handling of bibliographic information and allowed us to easily categorize and access the literature. Each article was assessed for its relevance to the research questions and categorized accordingly. A summary of the key methods, findings, and contributions of each paper was also included in the database of this research work to assist in the analysis phase. The final set of selected literature consisted of 49 papers, which were considered relevant for this review.

4.3. Data Analysis

For the analysis of the selected literature, this study adopted a systematic approach to extract and synthesize key insights from the literature in the dataset of this research work. The primary objective was to identify recurring themes, key methodologies, and trends in the use of machine learning for the development of IoT honeypots. To achieve this, this study employed Webster’s concept-matrix approach [9], which is a widely used method for organizing and categorizing the findings from systematic literature reviews. The concept-matrix served as a tool for mapping out the different dimensions of the literature included in the review. This matrix allowed us to systematically compare and contrast various aspects of the research, identify gaps in the literature, and highlight areas where different literature converged or diverged in their findings. The concept-matrix was structured around four key dimensions that emerged from the literature: Interaction honeypots, machine learning techniques, detection architecture, and classifier technology.
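A concept matrix of this kind can be represented as a simple paper-by-dimension table. The sketch below is illustrative only: the paper labels and their assignments are hypothetical, and the actual classification is the one reported in Table 3.

```python
DIMENSIONS = (
    "interaction honeypots",
    "machine learning techniques",
    "detection architecture",
    "classifier technology",
)

# Hypothetical assignments; these are NOT the entries of Table 3.
assignments = {
    "paper_1": {"interaction honeypots", "machine learning techniques"},
    "paper_2": {"machine learning techniques", "classifier technology"},
    "paper_3": {"detection architecture"},
}


def build_matrix(assignments, dimensions=DIMENSIONS):
    """Return {paper: [True/False per dimension]} in a fixed column order."""
    return {
        paper: [dim in concepts for dim in dimensions]
        for paper, concepts in assignments.items()
    }


def coverage(matrix, dimensions=DIMENSIONS):
    """Count how many papers address each dimension (column sums)."""
    return {
        dim: sum(row[i] for row in matrix.values())
        for i, dim in enumerate(dimensions)
    }


matrix = build_matrix(assignments)
counts = coverage(matrix)
```

Column sums such as `counts` make thinly covered dimensions visible at a glance, which is one way a concept matrix helps to identify gaps in the literature.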

The classification of each paper along the four dimensions can be found in Table 3. Each of these dimensions corresponds to a critical aspect of IoT honeypot development in the context of machine learning. The interaction honeypot dimension relates to the level of engagement that the honeypot has with potential attackers, ranging from low interaction to high interaction. The machine learning techniques dimension covers the various algorithms and techniques employed in analyzing and detecting malicious activities within these honeypots. The detection architecture dimension refers to the systems and frameworks designed to detect and classify malicious behavior, while the classifier dimension focuses on the techniques used to categorize or identify attacker behavior. In general, the data analysis phase was instrumental in organizing the literature and generating a comprehensive understanding of the various approaches to using machine learning in the development and deployment of IoT honeypots. Through the systematic application of the concept-matrix and qualitative analysis techniques, we were able to extract meaningful insights that informed the conclusions and recommendations of this research work. In the following, we explain the findings in each dimension of the concept-matrix.

5. Findings

In this section, the individual concepts of the concept-matrix (Table 3) are presented: interaction honeypots, machine learning techniques, detection architecture, and classifier technology. The subcategories are implicitly included in the sections and not individually itemized.

5.1. Interaction Honeypots

Honeypots are tools for cyber deception that were initially introduced in the 1990s [4]. In the early 2010s, a literature review by Bringer et al. [4] identified numerous advances and trends that had evolved since the 1990s; the same can be said about the last 10+ years. The field has advanced significantly over the last decade, driven further by the emergence of machine learning and artificial intelligence in recent years. However, even though significant strides have been made in honeypots and cyber deception, major weaknesses remain to this day. The analyzed literature identifies two key issues. First, honeypots are unable to generate network traffic that mimics real traffic flow, which allows attackers to distinguish honeypots from real networks based on this characteristic [26]. Second, traditional honeypots have a static configuration and a fixed network location, which lose their value once the honeypot is detected or bypassed by attackers [60]. The main problem of traditional honeypots, and even of high-interaction honeypots, is therefore the ease with which attackers can detect them. Once attackers know that a honeypot exists, its value is nullified, as it can simply be bypassed [61].

In general, honeypots can be divided into low-interaction and high-interaction honeypots. Medium-interaction honeypots also exist but are less common, so this study focuses on the two types most prevalent in the literature. Low-interaction honeypots are designed to emulate specific services or protocols, such as Telnet, SSH, or HTTP, which are commonly targeted in IoT environments. These honeypots provide limited interaction with attackers, focusing on detecting and logging attack attempts [13,35]. Low-interaction honeypots are simple to deploy, making them ideal for large-scale monitoring. Tools like Honeyd and Dionaea have been widely adopted due to their flexibility and low resource requirements [13]. IoTCmal, for instance, can emulate a range of devices and protocols by implementing partial TCP/IP stacks, allowing researchers to simulate various IoT environments at scale [66]. Attackers often target common IoT services known for their vulnerabilities. Dionaea, for example, emulates services like FTP, SMB, and Telnet, which are frequently targeted by automated scanning tools, and captures malware samples that provide valuable insights into the types of attacks that exploit these services. This focus on widely used protocols makes low-interaction honeypots particularly effective for capturing data from large-scale automated attacks, such as those launched by botnets like Mirai [47]. The collected malware provides researchers with critical data for analyzing attack patterns and understanding the propagation mechanisms used by malware like the Mirai botnet [47]. The data collected by low-interaction honeypots can also be integrated into broader threat intelligence systems.
For instance, Kumar et al. discuss the integration of low-interaction honeypots in a multi-platform threat intelligence framework, where the data from these honeypots are correlated with other sources to enhance the detection of known malware signatures and emerging threats [40]. One of the major drawbacks of low-interaction honeypots is their vulnerability to detection. Skilled attackers can often recognize these honeypots through fingerprinting techniques, as their simplified responses and limited protocol implementations reveal their decoy nature [35]. Research by Kumar et al. highlights that low-interaction honeypots like KFSensor and Honeyd are effective for detecting basic reconnaissance but fail to engage attackers attempting more sophisticated exploits [40]. Moreover, by design, they cannot capture the full scope of an attacker’s methods. They are primarily useful for detecting initial scanning and exploitation attempts but provide limited insight into the attacker’s behavior once access is gained. These limitations make them less effective for studying complex, multi-stage attacks that require deeper system interaction [40].

High-interaction honeypots offer a deeper level of engagement by emulating real systems or using actual IoT devices. These honeypots can capture detailed data on attacker behavior, providing insights into sophisticated attacks that low-interaction honeypots cannot detect [13]. High-interaction honeypots like FirmPot and HoneyIoT provide realistic environments that closely mimic actual IoT devices, including vulnerable firmware versions. FirmPot, for instance, uses real firmware images of routers and surveillance cameras, allowing researchers to emulate the exact behavior of these devices [69]. This approach makes it difficult for attackers to distinguish the honeypot from a genuine device, increasing the chances of capturing detailed exploitation attempts [66].

HoneyIoT employs reinforcement learning techniques to dynamically adapt its responses based on attacker actions, using a Markov decision process to model and predict attacker behavior. This adaptability reduces the likelihood of detection and provides a more authentic interaction, allowing researchers to observe complex attack techniques and gain a deeper understanding of the tactics used by sophisticated threat actors [35]. Reinforcement learning is discussed in more detail in Section 5.2. On the other hand, high-interaction honeypots require significant resources, including powerful hardware, advanced virtualization, or even real devices, to provide realistic interactions. The complexity of deploying and maintaining these honeypots is highlighted in the study by Wang et al., which discussed the need for robust containment measures to prevent the honeypot from being compromised and used for malicious purposes [66]. Because high-interaction honeypots simulate real systems or use actual devices, there is a higher risk that attackers could exploit vulnerabilities in the honeypot itself. This potential for compromise necessitates careful isolation and monitoring, as a compromised honeypot could be leveraged to launch attacks against other systems [26].

Therefore, to compare the two types, it is important to understand that the decision on which honeypot to use comes with significant tradeoffs. Low-interaction honeypots excel in scalability and resource efficiency, making them suitable for detecting large-scale, automated attacks. However, their limited interaction capabilities restrict the depth of analysis. High-interaction honeypots provide richer data and deeper insights into attacker behavior but are resource-intensive and challenging to deploy at scale. High-interaction honeypots like FirmPot and HoneyIoT are more resilient against evasion tactics due to their realistic emulation [35,69]. In contrast, low-interaction honeypots are often detected by experienced attackers, reducing their effectiveness for capturing sophisticated threats.

In addition to the previously discussed types of honeypots, another approach has emerged in recent years: hybrid honeypot models. They integrate both low-interaction and high-interaction modules that map real device vulnerabilities to the public network. This hybrid approach allows for broad threat detection and detailed analysis, capturing a wide range of attacks from basic scans to complex malware exploits [66]. Some hybrid honeypots also dynamically adjust their interaction level based on detected threats. For example, when low-interaction sensors detect suspicious activity, the session is escalated to a high-interaction honeypot for further analysis. This adaptive strategy maximizes resource efficiency while maintaining comprehensive threat coverage [66]. Hybrid systems like the multi-platform architecture described by Kumar et al. aggregate data from various honeypots to generate actionable cyber-threat intelligence. This integrated approach enhances the detection of multistage attacks and provides a holistic view of the threat landscape, improving IoT security defenses [40].
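The escalation from a low-interaction sensor to a high-interaction honeypot can be sketched as a small session state machine. This is a minimal illustration under stated assumptions: the `Session` class, the token list, and the substring-based suspicion check are hypothetical and are not taken from any of the reviewed systems.

```python
from dataclasses import dataclass, field

# Hypothetical indicators that a session deserves deeper inspection.
SUSPICIOUS_TOKENS = ("wget", "curl", "chmod +x", "/bin/busybox")


@dataclass
class Session:
    source_ip: str
    commands: list = field(default_factory=list)
    tier: str = "low"  # every session starts on the low-interaction sensor


def is_suspicious(command):
    """Very rough heuristic: flag commands containing a known token."""
    return any(token in command for token in SUSPICIOUS_TOKENS)


def handle_command(session, command):
    """Log the command and escalate the session once it looks suspicious."""
    session.commands.append(command)
    if session.tier == "low" and is_suspicious(command):
        session.tier = "high"  # hand over to the high-interaction honeypot
    return session.tier


session = Session("198.51.100.7")  # documentation-range address
tiers = [
    handle_command(session, "ls"),
    handle_command(session, "wget http://example.invalid/bot.sh"),
    handle_command(session, "ls"),
]
```

Escalation is one-way in this sketch: once a session has been moved to the high-interaction tier, it stays there, so that expensive resources are spent only on sessions that have already shown suspicious behavior.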

In summary, current research highlights the complementary roles of low- and high-interaction honeypots in IoT security. Both types come with significant advantages and drawbacks, and these trade-offs must be considered when choosing a honeypot type. Hybrid honeypot approaches can leverage the advantages of both types; however, they do not eliminate their risks. In the following sections, the authors will further explore the ways in which machine learning can be applied to enhance the capabilities of honeypots in the IoT context.

5.2. Machine Learning Techniques

With the rapid proliferation of IoT devices, network security issues have become increasingly prominent. IoT devices often face constraints such as low power, limited storage, and complex deployment environments, making it difficult to adopt traditional security measures [35]. In recent years, honeypot technology has gained widespread application in IoT security research. By simulating vulnerable environments, honeypots attract and record attack behaviors, providing data for the analysis and detection of attack patterns. However, the massive and complex datasets generated by honeypot systems are challenging to analyze quickly and accurately using traditional techniques [44]. As a result, machine learning techniques have been introduced into IoT honeypot systems to enhance the efficiency of attack detection and classification.

Supervised learning techniques excel in detecting and classifying known attack patterns. These techniques rely on labeled data and learn the characteristics of known attack behaviors to identify similar patterns. The main advantages of supervised learning lie in its high detection accuracy and strong interpretability. For example, algorithms like Support Vector Machines (SVMs) and Random Forest (RF) can effectively distinguish between normal and malicious traffic [36]. Random forest, in particular, reduces the risk of overfitting by combining multiple decision trees, enabling efficient classification of complex traffic in IoT environments. Additionally, deep learning techniques such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory networks (LSTMs) have been applied to traffic analysis, extracting complex spatiotemporal features to detect more sophisticated attacks [71]. One practical application is botnet attack detection. In a smart factory scenario, combining honeypot data with a random forest algorithm achieved 96% accuracy, significantly enhancing security [44]. Another study showed that a random forest-based model achieved an AUC of 0.93 in classifying honeypot servers, effectively distinguishing them from regular servers and reducing the risk of detection by attackers [37]. As shown in Table 4, supervised learning models such as random forest, implemented in Weka and R-Studio, demonstrated high accuracy (up to 0.96) and consistent precision/recall values. However, discrepancies in false positive rates (FPR of 0.219 vs. 0.2667) underscore the importance of tool selection in real-world deployments. Despite these strengths, supervised learning in IoT environments faces challenges. First, it requires a large amount of labeled data, which is costly and time-consuming to obtain [29]. 
Second, complex models (e.g., deep learning) demand high computational and storage resources, making them difficult to deploy on resource-constrained IoT devices [50]. Balancing detection accuracy with computational efficiency remains a critical challenge for supervised learning applications in IoT honeypots.
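The metrics referred to above (accuracy, precision, recall, and false positive rate) all derive from the same confusion matrix. The following sketch shows the standard definitions on a small set of hypothetical labels; the labels are invented for illustration and are not results from Table 4.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary classification task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def metrics(tp, fp, tn, fn):
    """Standard metrics; ratios with an empty denominator are reported as 0.0."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }


# Hypothetical labels: 1 = malicious traffic, 0 = benign traffic.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
counts = confusion_counts(y_true, y_pred)
scores = metrics(*counts)
```

Because accuracy and false positive rate use different denominators, a model can score well on accuracy and still raise a noticeable share of false alarms on benign traffic, which is why both values are worth reporting.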

Unsupervised learning does not rely on labeled data, making it suitable for discovering unknown attack patterns and abnormal behaviors. Its primary applications include anomaly detection and traffic clustering. Clustering algorithms like K-means and DBSCAN group traffic data to identify anomalies [45]. K-means clusters data points based on their distance from centroids, while DBSCAN uses density to group adjacent points, marking those in sparse regions as anomalies. Isolation forest, a popular unsupervised anomaly detection algorithm, uses tree-based models to identify outliers, performing well in detecting infrequent attack patterns. For instance, studies have shown that isolation forest effectively identifies anomalous traffic in IoT networks, uncovering new attack patterns without labeled data [36]. Dimensionality reduction techniques like Principal Component Analysis (PCA) and t-SNE have also been applied to honeypot data analysis [32]. These techniques reduce high-dimensional feature spaces into low-dimensional visualizations, allowing analysts to intuitively identify anomalies in data distributions. By combining clustering algorithms and dimensionality reduction techniques, researchers can explore new attack behaviors in unsupervised environments, contributing to enhanced IoT network security. The advantages of unsupervised learning lie in its independence from labeled data, making it highly effective in detecting unknown threats. However, the lack of labels often results in less interpretable outcomes, and model performance is highly sensitive to feature selection and parameter tuning. In practice, unsupervised techniques are often complemented with domain knowledge or additional supervised learning to improve detection accuracy [36]. Table 4 summarizes the performance metrics of notable unsupervised models. 
For instance, combinations like CNN + ScS and DMLP + ScS reached 100% accuracy, precision, and recall in the referenced studies, although issues such as unclassified instances were noted.
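The centroid-distance idea behind K-means can be sketched in a few lines. This is a simplified illustration, assuming two-dimensional hypothetical traffic features, caller-supplied initial centroids, and a hand-picked distance threshold for flagging anomalies; none of these choices come from the reviewed studies.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain Lloyd's algorithm with caller-supplied initial centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(
                range(len(centroids)), key=lambda i: math.dist(p, centroids[i])
            )
            clusters[nearest].append(p)
        # Recompute each centroid as the mean of its members;
        # keep the previous centroid if a cluster ends up empty.
        centroids = [
            tuple(sum(c) / len(members) for c in zip(*members)) if members else old
            for members, old in zip(clusters, centroids)
        ]
    return centroids


def anomalies(points, centroids, threshold):
    """Flag points whose distance to the nearest centroid exceeds the threshold."""
    return [
        p for p in points if min(math.dist(p, c) for c in centroids) > threshold
    ]


# Hypothetical two-dimensional traffic features.
points = [(1, 1), (1, 2), (2, 1), (2, 2), (8, 8), (8, 9), (9, 8), (9, 9), (5, 20)]
centroids = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
outliers = anomalies(points, centroids, threshold=5.0)
```

Note that the outlier still pulls its cluster centroid toward itself, which illustrates the sensitivity to parameter choices mentioned above; density-based methods such as DBSCAN avoid this by not assigning sparse points to any cluster.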

Reinforcement learning learns optimal strategies through interactions between an agent and its environment, showing potential applications in IoT honeypot systems. For instance, reinforcement learning algorithms can dynamically adjust honeypot configurations to achieve higher attack capture rates [64]. Compared to supervised and unsupervised learning, reinforcement learning emphasizes decision-making in dynamic environments, making it well-suited for addressing evolving attack patterns and attacker strategies. One application of reinforcement learning in IoT network security is dynamically configuring honeypot deployment locations to attract attacker behaviors. For example, using Q-learning algorithms, the system can iteratively adjust honeypot positions to maximize their effectiveness in trapping attackers. Another potential application involves utilizing reinforcement learning models to automatically adapt defense strategies, enabling honeypot systems to respond differently to various attackers [32]. Despite its potential, reinforcement learning faces several challenges in IoT honeypot applications. First, reinforcement learning models typically require large volumes of interaction data for training, which can be difficult to obtain in IoT environments. Second, reinforcement learning models are computationally intensive and require high-performance hardware. Due to these limitations, most reinforcement learning research in IoT honeypots remains in theoretical validation stages [35]. As shown in Table 4, reinforcement learning models such as A2C and PPO have demonstrated the highest average reward in simulations, with reward-based proxies like a penalty for incorrect responses being used in place of standard false positive metrics. With advancements in computational technology, its application prospects merit further exploration.
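The idea of iteratively adjusting honeypot positions can be sketched with a single-state simplification of Q-learning, which is equivalent to a bandit problem. This is not the setup of any reviewed system: the capture rates, the deterministic reward, and all hyperparameters below are hypothetical assumptions chosen for illustration.

```python
import random


def learn_best_position(rewards, episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Single-state Q-learning over candidate deployment positions."""
    rng = random.Random(seed)
    q = [0.0] * len(rewards)
    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.randrange(len(rewards))            # explore
        else:
            action = max(range(len(q)), key=q.__getitem__)  # exploit
        reward = rewards[action]
        # With one state and no successor state, the Q-learning target
        # reduces to the immediate reward.
        q[action] += alpha * (reward - q[action])
    return q


# Hypothetical capture rates per candidate network position.
capture_rate = [0.2, 0.5, 0.9]
q_values = learn_best_position(capture_rate)
best = max(range(len(q_values)), key=q_values.__getitem__)
```

A full treatment would model the attacker's state and use noisy rewards, which is where the need for large volumes of interaction data noted above comes from.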

In summary, machine learning techniques offer diverse options for attack detection in IoT honeypots. Supervised learning is ideal for identifying known attack patterns, unsupervised learning excels at detecting unknown threats, and reinforcement learning provides innovative pathways for dynamically adjusting defense strategies. By selecting suitable machine learning techniques, researchers can enhance the effectiveness of honeypot technology in IoT network security.

5.3. Detection Architecture

As IoT devices have advanced, traditional brute-force login attacks, such as those targeting telnet ports, have become less effective due to measures like random default passwords and the disabling of telnet and SSH services to avoid botnets like Mirai. Attackers now exploit specific vulnerabilities, such as Remote Code Execution (RCE), to inject malicious code and compromise devices. These vulnerabilities often depend on the type, brand, or model of the target device, prompting attackers to perform pre-attack checks for better success rates. Furthermore, analyzing the interaction traces between attackers and real IoT devices provides valuable insights into attacker behaviors and tactics [35].

To better understand why there was a need for different honeypots, it is of utmost importance to understand what the limitations of traditional honeypots are. Traditional honeypots initially had low interaction capabilities. These early versions only simulated basic services and focused on detecting intrusions, but they lacked the ability to sustain complex interactions or collect detailed data about attacker tactics. Indeed, attackers scan networks for vulnerabilities and perform detailed checks on target devices before launching attacks, so low-interaction honeypots often fail to deceive attackers and miss capturing the actual attack [49]. Due to advances in malware, the need for more sophisticated honeypots emerged, including mid- and high-interaction honeypots. The problem is they still often fail to convincingly replicate the intricacies of real devices, which attackers could identify. IoT-specific vulnerabilities are particularly challenging, as attackers often check device-specific characteristics like brand and firmware version. Limited-interaction honeypots fail in these pre-check attack phases, reducing their effectiveness as they struggle to convincingly simulate device-specific characteristics that attackers look for before engaging fully [31]. Initial traditional setups like Cowrie used by HoneyShell could not fully deceive attackers, as initial phases allowed all credentials and did not emulate sophisticated device features. Later improvements added custom user filesystems and device responses to increase authenticity and engagement [61]. These limitations highlight the need for honeypots to enhance their interaction capabilities to effectively convince attackers that they are real devices.

The adaptivity of a honeypot refers to its ability to tailor its interactions based on an attacker’s behavior, convincingly mimicking an IoT device to avoid detection. Even when the optimal response is unknown, the honeypot leverages the attacker-provided data to select the most suitable reply. Figure 2 illustrates this adaptive process as a feedback loop between the learning automaton and a random environment. The honeypot performs actions within the environment, receives feedback based on those actions, and uses this information to refine its responses continually. AIIPot exemplifies this by broadcasting all incoming requests to the IoT device network, ensuring a highly plausible response that maintains the illusion of authenticity [49]. Adaptive honeypots generate responses based on the likelihood of eliciting a reply from the attacker, ensuring the interaction continues without prematurely ending the session. This feedback-driven process enables continuous learning and adaptation, convincing attackers they are engaging with a real device, thereby extending interaction durations and collecting richer, more detailed security insights [60].

Reinforcement learning often underlies the technologies used in adaptive honeypots. For example, HoneyIoT adapts over time by learning optimal responses, using feedback from attacker interactions to refine its strategies. First, the honeypot’s RL agent can either exploit known responses or explore new actions, gradually identifying the best strategies. Then, actions (like allow, block, or substitute) are taken on states represented by bash commands, such as wget and sudo, replicating real-world IoT actions. HoneyIoT models its interactions with attackers as an MDP and uses differential content mutation to update response fields, enhancing deception by changing select response fields during runtime. HoneyIoT’s reward structure motivates the agent to prolong interactions until it captures malware uploads or vulnerabilities, encouraging interaction patterns that lead to higher attacker engagement [31]. More generally, the honeypot receives rewards for successfully responding to attacks, reinforcing effective responses. Over time, this process converges to an optimal policy, ensuring the honeypot consistently applies the best response to each scenario. Reinforcement learning fosters an environment where honeypots can autonomously evolve, integrating new interaction strategies as they accumulate more attack data. This process enables a faster response to emerging threats, improving the honeypot’s ability to handle complex, automated attack sequences.

Recently, various adaptive honeypots have emerged. One of them is the AIIPot architecture, which introduces a sophisticated structure for adaptive, IoT-focused honeypots. The honey-Chatbot is a conversational interface that mimics IoT device responses, increasing realism by interacting with attackers in a seemingly natural way. The request/response database stores expected requests and responses typical to IoT devices. Unknown, out-of-vocabulary (OOV) requests are flagged and passed to the evaluator, gradually enriching the honeypot’s database. The evaluator module determines if requests are trustworthy, dictating response paths. Trusted requests are broadcast locally, while suspicious ones are redirected, further strengthening the illusion of an authentic IoT environment [49].
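The interplay between a request/response database and the handling of unknown requests can be sketched as follows. This is a toy model only: the class, its method names, the fallback reply, and the example requests are hypothetical and do not reflect AIIPot's actual implementation.

```python
class RequestResponseStore:
    """Toy request/response database with out-of-vocabulary (OOV) handling."""

    def __init__(self, known, fallback="HTTP/1.1 404 Not Found"):
        self.known = dict(known)
        self.fallback = fallback  # hypothetical default reply
        self.pending = []         # OOV requests waiting for the evaluator

    def respond(self, request):
        """Return a stored response, or queue the request as OOV."""
        if request in self.known:
            return self.known[request]
        if request not in self.pending:
            self.pending.append(request)
        return self.fallback

    def learn(self, request, response):
        """Store a response once a (hypothetical) evaluator has vetted it."""
        self.known[request] = response
        if request in self.pending:
            self.pending.remove(request)


store = RequestResponseStore({"GET /status": "HTTP/1.1 200 OK"})
first = store.respond("GET /cgi-bin/config")      # unknown request, queued
store.learn("GET /cgi-bin/config", "HTTP/1.1 401 Unauthorized")
second = store.respond("GET /cgi-bin/config")     # now answered from the store
```

The point of the sketch is the enrichment loop: each unknown request is answered conservatively once, then added to the database so that later attackers receive a more plausible reply.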

Another is the HoneyShell ecosystem. HoneyShell’s development highlights adaptivity through phases. Initially, it collects data and refines its responses to hide its honeypot nature. Later, it restricts accepted login credentials based on usage patterns and emulates file commands to align more closely with attacker expectations. These adaptations allow for more realistic interactions and better data collection on attacker behavior. The honeypot’s ecosystem consists of three main components: server farms (both on-premises and cloud-based) hosting honeypot instances to engage with attackers and collect data; a vetting system to obscure the honeypot’s true nature from attackers; and an analytics infrastructure to analyze data for insights into attacker behavior. Notably, “HoneyShell” was developed as a honeypot version of shell services, addressing a significant portion of the observed attack surface [61].

HoneyIoT specifically employs reinforcement learning to adaptively interact with attackers. Interactions are framed as a Markov Decision Process (MDP), and the agent selects optimal responses to keep attackers engaged and increase the likelihood of observing a full attack sequence. Differential content mutation further adapts response fields at runtime to better align with attacker expectations [35].

Testing the effectiveness of adaptive honeypots is crucial. To select the most effective algorithm for HoneyIoT, the collected attack data were modeled as a Markov Decision Process (MDP) and tested using OpenAI Gym and Stable Baselines3, and Guan [35] compares the different learning algorithms graphically. For comparison, a baseline approach without reinforcement learning was implemented, where responses were chosen randomly with equal probability. Rewards were assigned based on attacker actions: positive rewards (+5 for malware uploads, +1 for vulnerability exploitation) and negative rewards (−1 for other actions). The results across 100 randomized attack sessions showed that the Proximal Policy Optimization (PPO) algorithm demonstrated the fastest convergence and highest average rewards, making it the preferred choice for HoneyIoT. Testing results indicate that reinforcement learning-driven approaches significantly improve interaction quality, with rewards structured to reflect progress towards capturing detailed attack information [35].
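The reward scheme and the random baseline described above can be written down directly. The reward values (+5, +1, −1) are taken from the text; the action labels, function names, and example session are hypothetical.

```python
import random

# Reward values as stated in the text; the string labels are hypothetical.
REWARDS = {"malware_upload": 5, "vulnerability_exploit": 1}


def reward(attacker_action):
    """+5 for malware uploads, +1 for exploitation, -1 for anything else."""
    return REWARDS.get(attacker_action, -1)


def session_return(attacker_actions):
    """Total (undiscounted) reward collected over one attack session."""
    return sum(reward(action) for action in attacker_actions)


def random_baseline(responses, rng):
    """Baseline without learning: each response is equally likely."""
    return rng.choice(responses)


example_session = ["scan", "vulnerability_exploit", "malware_upload"]
total = session_return(example_session)
choice = random_baseline(["allow", "block", "substitute"], random.Random(0))
```

A session that ends in a malware upload earns a positive return even after several penalized steps, which is what encourages an agent to keep the attacker engaged rather than end the interaction early.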

The HoneyShell testing experiment evolved through two phases, as Tabari [61] shows. In Phase 1, the initial deployment minimally modified the original Cowrie code, collecting data and identifying responses that could reveal the honeypot’s nature. It accepted all username–password combinations to maximize attacker interaction. Three instances were deployed: two on-premises and one in the cloud. In Phase 2, launched nine months later for one on-premises instance (with the other remaining in Phase 1 for comparison), HoneyShell refined its operations. It restricted valid credentials to the top 30 combinations observed in Phase 1 and enhanced attacker environments by adding new users, file systems, and commands. Logs from Phase 1 informed improvements, such as emulating file command responses, ensuring more realistic interactions and attracting further attacker activity [61].
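The Phase 2 credential restriction amounts to a frequency count over the login attempts logged in Phase 1. The sketch below assumes that attempts are available as (username, password) pairs; the function names and sample data are hypothetical, and a smaller cutoff than 30 is used only to keep the example short.

```python
from collections import Counter


def top_credentials(login_attempts, n=30):
    """Return the n most frequently attempted (username, password) pairs."""
    counts = Counter(login_attempts)
    return {cred for cred, _ in counts.most_common(n)}


def accept_login(username, password, allowed):
    """Accept a login only if the pair is in the allowed set."""
    return (username, password) in allowed


# Hypothetical Phase 1 log.
phase1_log = (
    [("root", "root")] * 3
    + [("admin", "admin")] * 2
    + [("user", "1234")]
)
allowed = top_credentials(phase1_log, n=2)
```

Rejecting rarely used credentials makes the honeypot behave more like a real device, which would not accept arbitrary username and password combinations.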

The experiments demonstrated that adaptive honeypots, powered by reinforcement learning like the PPO algorithm and iterative refinements, significantly enhance interaction quality and realism, ensuring more effective data collection on attacker behavior.

Dynamic honeypots represent an advanced layer of cybersecurity by leveraging constant changes in their configurations, services, or system appearances to confuse and mislead attackers. Unlike static honeypots, which remain fixed and easier to profile over time, dynamic honeypots thrive on unpredictability, increasing the attackers’ uncertainty and discouraging them from exploiting vulnerabilities. This dynamism helps honeypots evade detection and sustain their effectiveness in hostile environments. An evolution of this concept is found in evolutionary dynamic systems, where the honeypot is treated as a dynamic ecosystem involving defenders, attackers, and legitimate users. These honeypots not only change configurations but also cycle between real and fake services, creating a security equilibrium that further enhances their deceptive capabilities. Unlike conventional dynamic honeypots, which primarily focus on altering appearances or configurations, evolutionary honeypots adopt a broader strategy by actively simulating real-life network behavior and integrating legitimate-like interactions [67].

5.4. Machine Learning Classifiers

Machine learning classifiers can be examined from several angles, including specific classifier algorithms and different types of classifiers. Machine learning can also be applied to IoT traffic analysis and malware detection. Finally, the smart honeypot framework “S-Pot” by Franco et al. [34] is introduced.

First, it is important to understand why machine learning classifiers matter for security analysis. Botnets produce very large and diverse data, which is why a synchronous view of attack activities is important. For this purpose, features such as duration or frequency can be indexed. Machine learning is well suited to this problem because it relies on feature-based classification and clustering. In addition, no unnecessary features need to be added and no static assumptions about botnet profiles need to be made [59]. The process is as follows (Figure 3): after the network packets are captured, they are split into time bins. Stateless and stateful feature vectors are then generated and used as input for machine learning classification [65]. Overall, classification makes machine learning a strong fit for security analysis.
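The time-bin step can be sketched as follows. This is a minimal illustration, not the pipeline of [65]: the `Packet` fields, the 10-second bin width, and the chosen features (packet count and mean length as stateless features, mean inter-packet gap as a stateful one) are assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Packet:
    timestamp: float  # seconds since capture start (hypothetical field)
    length: int       # packet length in bytes (hypothetical field)


def bin_packets(packets, bin_width=10.0):
    """Group packets into fixed-width time bins keyed by bin index."""
    bins = defaultdict(list)
    for pkt in packets:
        bins[int(pkt.timestamp // bin_width)].append(pkt)
    return bins


def feature_vector(bin_pkts):
    """Build one feature vector for the packets of a single bin.

    Stateless features look at packets in isolation (count, mean length);
    the stateful feature depends on packet order (mean inter-packet gap).
    """
    ordered = sorted(bin_pkts, key=lambda p: p.timestamp)
    count = len(ordered)
    mean_length = sum(p.length for p in ordered) / count
    gaps = [b.timestamp - a.timestamp for a, b in zip(ordered, ordered[1:])]
    mean_gap = sum(gaps) / len(gaps) if gaps else 0.0
    return [count, mean_length, mean_gap]


def extract_features(packets, bin_width=10.0):
    """Return {bin index: feature vector}, ready to feed a classifier."""
    bins = bin_packets(packets, bin_width)
    return {idx: feature_vector(pkts) for idx, pkts in sorted(bins.items())}


# Hypothetical capture: three small packets, then one large packet later.
capture = [Packet(0.5, 60), Packet(2.5, 100), Packet(4.5, 80), Packet(12.0, 1500)]
features = extract_features(capture)
```

Each resulting vector describes one time bin, so a classifier labels bins of traffic rather than individual packets.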

Two classifier types are distinguished here. The first is the ensemble voting classifier [53], which combines several models to increase prediction accuracy. The models are trained on different aspects of the data so that the ensemble is robust across scenarios and data sets. Each model then makes its own prediction, and all predictions are merged into a final classification. For merging, either majority voting (the most frequent prediction wins) or weighted voting (votes are weighted by model performance) can be used. Ensemble voting can be applied in a wide range of scenarios [53]. The second type is the weighted voting classifier [53]. It is used when the prediction performance of the individual models differs, and it assigns weights according to model performance. The process is as follows: first, a weight is assigned to each model; then, all model predictions are combined into a final prediction using a weighted average. The weights can be assigned manually or automatically and are based on training data [53]. In summary, the classifier type should be selected according to the desired objective. If the focus is on prediction accuracy, the ensemble voting classifier is a good choice. If the individual models perform differently, weighted voting classifiers should be preferred.
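The difference between majority voting and weighted voting can be shown with a small sketch. The model outputs and weights below are hypothetical, and the functions illustrate the general idea only; they are not the implementation used in [53].

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Return the label predicted by most models (ties: first seen wins)."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Return the label with the largest summed model weight."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per model prediction")
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


# Hypothetical outputs of three models for one network flow.
predictions = ["benign", "benign", "attack"]
# Hypothetical weights reflecting each model's validation performance.
weights = [1.0, 1.0, 2.5]

majority_label = majority_vote(predictions)            # two of three say benign
weighted_label = weighted_vote(predictions, weights)   # strongest model dominates
```

With these inputs the two schemes disagree: the majority picks "benign", while the single high-weight model tips the weighted vote to "attack".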

Classification and attack detection in IoT use various classifier techniques. IoT is characterized by short-range communication and distinct network flow characteristics. Attack detection with machine learning involves two different analyses: power side-channel analysis and behavioral analysis [65]. Power side-channel analysis uses power consumption data and extracts features with deep learning. In behavioral analysis, device behavior is represented as graphs, which are processed with Graph Convolutional Networks (GCNs). For both analyses, the machine learning process in IoT includes data collection, feature extraction based on packet length, inter-packet intervals, and protocol, and binary classification. Feature selection is based on the limited number of endpoints and regular packet intervals [65]. The most effective algorithms are random forest, K-nearest neighbors, and neural networks, which are used as ensemble voting classifiers [65]. The random forest classifier [33] is used for power side-channel analysis. Danilov et al. [30] show how well random forest performs compared to the naive Bayes classifier and the K-nearest neighbors method. The aim of IoT security is to identify different types of network traffic attacks [34]. Overall, the ensemble voting classifier appears to be the most suitable for IoT, and machine learning supports a broad range of analyses.

S-Pot is a smart honeypot framework that applies machine learning techniques to enhance cybersecurity [34]. Key components include data cleansing to remove noise and inconsistencies and feature selection to identify the most relevant characteristics for classifying attacks. Through data preprocessing and feature extraction, S-Pot prepares suitable input for its classifiers. The framework is designed for flexibility, allowing the integration of modern machine learning techniques, including deep learning. Classification results are passed to a dynamic rule configuration module. Notably, S-Pot can detect new attack types using machine learning classifiers, achieving an accuracy of 97 percent in attack detection with the J48 algorithm. It also reduces both false alarms and false negatives, making it a reliable and efficient solution for modern threat landscapes [34]. By combining these features, S-Pot demonstrates a robust and adaptable approach to addressing evolving cyber threats.
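Accuracy, false alarms (false positives), and false negatives measure different things, so it helps to see how each follows from a confusion matrix. The counts below are made up for illustration and are not results from [34].

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute basic detection metrics from confusion-matrix counts.

    tp: attacks flagged as attacks      fp: benign flagged as attacks
    tn: benign passed as benign         fn: attacks passed as benign
    """
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "false_positive_rate": fp / (fp + tn),  # share of benign raising alarms
        "false_negative_rate": fn / (fn + tp),  # share of attacks missed
    }


# Hypothetical counts for 1000 classified flows (500 attacks, 500 benign).
metrics = detection_metrics(tp=450, fp=25, tn=475, fn=50)
```

In this made-up case, an accuracy above 0.9 coexists with one in ten attacks being missed, which is why false positive and false negative rates are worth reporting alongside accuracy.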

The Hybrid Web Attack Detection Framework introduces an innovative Convolutional Neural Network (CNN)-based deep learning classifier tailored for IoT honeypots [54]. This deep learning classifier demonstrates superior performance with no signs of overfitting, ensuring reliable and efficient detection of web attacks. An integral component of the system is the Cookie Analysis Engine (CAE), which leverages deep learning to provide personalized analysis. The CAE is crucial for maintaining attacker sessions and enabling customized deception strategies. It functions by analyzing cookie fields and categorizing them as benign, suspicious, or malicious. However, its performance depends on the deep learning classifier for analyzing additional fields, highlighting a key limitation. An important observation is that many malicious requests contain seemingly benign cookie fields, underscoring the necessity of the deep learning classifier for comprehensive and accurate analysis [54]. By combining CNN-based classification with cookie analysis, this framework provides an effective and tailored solution for detecting web-based attacks in IoT honeypot environments.

To further validate and compare the performance of different machine learning classifiers, several open-source implementations and benchmarks can be utilized. For instance, the benchm-ml project on GitHub provides a minimal benchmark for scalability, speed, and accuracy of commonly used open-source implementations of machine learning algorithms, including random forests, gradient boosted trees, and deep neural networks [72]. Additionally, the scikit-learn bench project benchmarks various implementations of machine learning algorithms across data analytics frameworks such as scikit-learn, DAAL4PY, cuML, and XGBoost [72]. These resources offer valuable insights into the performance of different classifiers and can be instrumental in optimizing IoT honeypot systems. Overall, there are various classifier technologies for machine learning techniques that may be relevant for integration with IoT honeypots. These results can be seen as an extension of the machine learning method section.

6. Discussion

In this section, the potential advantages of dynamic honeypots over adaptive honeypots in IoT environments, as well as the combined benefits of various machine learning techniques in enhancing IoT honeypots, are explored. Additionally, the contributions and limitations of this study are discussed, and an outlook for future studies is provided.

6.1. Potential Advantages of Dynamic Honeypots in Internet of Things over Adaptive Honeypots

The debate over the effectiveness of evolutionary dynamic honeypots versus adaptive honeypots in IoT security highlights their distinct strengths and the nuanced contexts in which each can excel. Evolutionary dynamic honeypots bring a revolutionary approach to deception, leveraging phase-based cycles of real and fake services to create unpredictable environments. This unpredictability significantly increases the challenge for attackers, disrupting their ability to exploit systems. For instance, experiments with HoneyShell demonstrated the power of evolutionary dynamics. By alternating between phases, HoneyShell refined credential requirements and incorporated realistic responses based on prior attacker interactions. These adjustments not only maintained authenticity but also enabled the capture of deeper insights into attacker behavior. This phase-based adaptation is a hallmark of evolutionary systems, where constantly switching states reduces the average attack success rate and creates a security environment that outperforms static methods. HoneyShell’s deliberate refinements—such as enhanced user credentials, commands, and responses—offer a prime example of how such systems convincingly mimic real environments, making it harder for attackers to identify and exploit honeypots.

However, one might argue that while evolutionary systems excel in their broad-spectrum deterrence and long-term adaptability, they may not always be the best choice for scenarios requiring immediate, precise interaction. This is where adaptive honeypots come into play, offering unparalleled real-time responsiveness. Adaptive honeypots are designed to tailor their responses based on the attacker’s behavior in the moment, ensuring sustained engagement and enhancing the illusion of interacting with an authentic IoT device. This ability to react dynamically is particularly effective against novel attack strategies or unknown vulnerabilities, where pre-programmed evolution might lag behind emerging threats. Moreover, adaptive honeypots are well-suited for resource-constrained IoT setups, where the complexity of evolutionary systems may not be practical to implement.

The strengths of both evolutionary dynamic and adaptive honeypots lie in their complementary approaches. Are we aiming to confuse attackers over time, as seen in HoneyShell’s strategic evolution, or is the goal to dive deeply into specific interactions to uncover new insights in real time? The impact of these systems, whether through the dynamism of evolutionary cycles or the immediacy of adaptive engagement, underscores their vital roles in modern IoT security. By blending these approaches, we can envision a future where the balance of deception and responsiveness creates a more robust and versatile defense against ever-evolving cyber threats.

The analysis reveals consistent efforts to balance realism and scalability across IoT honeypot systems. Both low- and high-interaction honeypots struggle with generating believable traffic, while hybrid approaches aim to merge their strengths. Machine learning techniques are widely adopted but diverge in trade-offs: supervised learning achieves high accuracy but requires labeled data, whereas reinforcement learning adapts dynamically at higher computational costs. Key challenges include managing large data volumes, real-world deployment complexities of high-interaction systems, and environmental sensitivity that delays responses to new attack patterns.

Current methods remain limited by static configurations and resource intensity. Future work should prioritize lightweight machine learning integration to address deployability constraints. Cross-platform standardization could enhance threat intelligence sharing, while advanced adaptive models may improve autonomy in dynamic environments. These directions align with the observed gaps in existing systems, such as the need for real-time adaptability and efficient resource use.

6.2. Combined Advantages of Multiple Machine Learning Techniques in Internet of Things Honeypots

In the field of IoT honeypots, the combination of multiple machine learning techniques is of great significance. Supervised learning is highly accurate in detecting known attack patterns. It can be trained with labeled data to recognize specific attack signatures and behaviors that have been previously identified. For example, it can effectively identify common malware infections or known types of network intrusions. Unsupervised learning, however, shines in discovering unknown abnormal patterns. It can analyze the data without prior knowledge of what is normal or abnormal and detect unusual behaviors that might indicate new or emerging threats. Reinforcement learning optimizes the system’s defense strategy. It enables the honeypot to adapt and improve its responses over time based on the rewards or punishments it receives from the environment.

The combination of these three methods offers several advantages. In the initial stage, unsupervised learning helps to identify potential anomalies. It scans the IoT environment and flags any data points or behaviors that deviate from the norm. Then, supervised learning classifies these anomalies. It uses its learned models to determine whether the flagged items are actual attacks or just false positives. Finally, reinforcement learning takes the results from the previous two steps and optimizes the defense behavior of the honeypot. It decides on the best actions to take, such as adjusting the configuration of the honeypot or initiating countermeasures. This sequential process covers all aspects from the initial data analysis and anomaly discovery to the accurate identification of attacks and the dynamic adjustment of defense strategies, allowing for more comprehensive and accurate threat detection and response. It forms a complete security protection system that can improve the security and effectiveness of the honeypot system in the complex IoT environment. Figure 4 illustrates this sequential process, where each technique plays a key role.
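The three-stage sequence can be sketched in a few lines. Everything here is a simplifying assumption chosen to keep the example self-contained: a z-score test stands in for unsupervised anomaly detection, a nearest-centroid rule for the supervised classifier, and an epsilon-greedy value update for reinforcement learning. The feature (requests per minute), labels, actions, and rewards are hypothetical.

```python
import random
from statistics import mean, pstdev


def find_anomalies(values, threshold=2.0):
    """Stage 1 (unsupervised): flag values far from the mean by z-score."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]


def classify(value, centroids):
    """Stage 2 (supervised): assign the label of the nearest class centroid.

    The centroids would be learned from labeled data; here they are given.
    """
    return min(centroids, key=lambda label: abs(value - centroids[label]))


class ResponsePolicy:
    """Stage 3 (reinforcement): epsilon-greedy choice among honeypot actions."""

    def __init__(self, actions, epsilon=0.1, step=0.5, seed=0):
        self.values = {a: 0.0 for a in actions}
        self.epsilon, self.step = epsilon, step
        self.rng = random.Random(seed)

    def choose(self):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(self.values))  # explore
        return max(self.values, key=self.values.get)   # exploit

    def update(self, action, reward):
        # Move the action's value estimate toward the observed reward.
        self.values[action] += self.step * (reward - self.values[action])


# Hypothetical requests-per-minute observations from a honeypot.
traffic = [10, 12, 11, 9, 10, 11, 10, 12, 9, 95]
anomalies = find_anomalies(traffic)

# Hypothetical centroids learned from labeled flows.
centroids = {"scan": 40.0, "flood": 100.0}
labels = [classify(v, centroids) for v in anomalies]

# Exploration is switched off (epsilon=0.0) to keep the example deterministic.
policy = ResponsePolicy(["keep_session", "throttle"], epsilon=0.0)
policy.update("throttle", reward=1.0)  # e.g. the attacker stayed engaged
best_action = policy.choose()
```

Each stage consumes the output of the previous one, which is the point of the sequence: only flagged observations are classified, and only classified events drive the policy update.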

However, there are also challenges. The combination of multiple machine learning techniques requires a large amount of high-quality data for training and learning. In the IoT environment, collecting and labeling data is often difficult. The data may also contain noise and be incomplete. To solve these problems, data augmentation techniques can be adopted to increase the amount of data. Semi-supervised learning can be used to make better use of unlabeled data. Additionally, effective data cleaning and preprocessing algorithms need to be developed to improve data quality. Another challenge is the increase in model complexity and computational resource requirements. Combining multiple machine learning techniques can lead to more complex models that demand more computing power. In resource-constrained IoT devices and honeypot systems, this can be a problem. To address this, model compression techniques can be used to reduce the size of the models. Lightweight algorithms can be adopted to reduce computational burden. Edge computing can also be utilized to distribute the computational load. At the same time, optimizing algorithms and model structures can improve computational efficiency.
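As one concrete case of model compression, floating-point weights can be mapped to 8-bit integers. The sketch below shows symmetric linear quantization on a hypothetical weight list. It illustrates the idea only and is not a method taken from the reviewed studies; real deployments would rely on the quantization tooling of their machine learning framework.

```python
def quantize(weights, bits=8):
    """Map float weights to signed integers with one shared scale factor."""
    qmax = 2 ** (bits - 1) - 1          # 127 for 8 bits
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


# Hypothetical weights of a small model layer.
weights = [0.50, -1.27, 0.03, 1.00]
quantized, scale = quantize(weights)
restored = dequantize(quantized, scale)

# Rounding error is at most half a quantization step per weight.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

The trade-off is a smaller model at the cost of a bounded rounding error per weight, which may or may not be acceptable for a given classifier.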

In summary, the combination of supervised, unsupervised, and reinforcement learning techniques in IoT honeypots has great potential. By leveraging their complementary advantages and addressing the challenges, the security and performance of IoT honeypots can be enhanced, thus better protecting IoT systems from various security threats.

6.3. Limitations

While this paper provides valuable insights into the integration of machine learning into IoT honeypots, it aims to establish a foundation and represent the current state of research rather than solve a specific problem. Certain limitations associated with the scope and methodology of this research must be considered to provide context and adequately evaluate the findings.

The systematic literature review was conducted within a limited three-month period. Although the time-frame imposed natural constraints, rigorous efforts were made to ensure the review is complete and reproducible. The relatively short time-frame influenced the focus on meaningful and representative studies to enable clear and coherent analysis. IoT security, machine learning, and honeypots are dynamic fields that are rapidly evolving. The findings of this paper reflect the state of research as of October 2024. The rapid development of new machine learning models, IoT technologies, and emerging threats means that some findings presented here may need to be reevaluated in the future.

Additionally, the reviewed literature predominantly focuses on short-term experiments or simulations. These studies lacked long-term deployment scenarios, which limited our ability to draw conclusions about the sustainable performance and adaptability of machine learning techniques in IoT honeypots. Furthermore, machine learning solutions with significant computational requirements were often recommended in the literature. These solutions may not be compatible with resource-constrained IoT devices, which have limited processing power and storage capacity, and this poses challenges for practical use. Finally, ensuring the robustness of machine learning models against adversarial attacks remains a crucial challenge. Many existing studies do not address how well these models can withstand sophisticated evasion techniques used by attackers, highlighting a critical research gap.

Another significant limitation lies in data availability and quality. The effectiveness of machine learning-driven honeypots heavily depends on high-quality datasets for training and evaluation. However, obtaining real-world attack data is challenging due to privacy concerns, ethical considerations, and the constantly evolving nature of cyber threats. The reliance on publicly available datasets or simulated attack scenarios may not fully capture the complexity of real-world adversarial behaviors. Additionally, biases in dataset collection and annotation can influence model performance, leading to potential misclassifications and reduced generalizability of findings.

Furthermore, the interpretability and transparency of machine learning models in IoT honeypots remain an ongoing challenge. Many state-of-the-art models function as “black boxes”, making it difficult to explain their decision-making processes. This lack of interpretability can hinder trust and adoption, particularly in critical IoT infrastructures where explainability is crucial for security operations.

These limitations provide the context for interpreting the results of this paper. Addressing them in future research will be essential to further enhancing the effectiveness, efficiency, and applicability of machine learning-based IoT honeypots in real-world environments.

Finally, it is important to highlight that the real-world deployment of machine learning classifiers in IoT honeypot systems presents several implications. Firstly, the scalability and performance of these classifiers must be rigorously tested to handle large volumes of data and diverse attack patterns [73]. Ensuring robust performance in production environments requires thorough validation and continuous monitoring to adapt to evolving threats [74]. Additionally, the integration of machine learning models into existing security frameworks necessitates careful consideration of compatibility and interoperability [75]. Ethical considerations, such as data privacy and user trust, are paramount, requiring transparent and accountable deployment practices [74]. Addressing these challenges will enhance the effectiveness and reliability of IoT honeypot systems in real-world scenarios.

6.4. Future Research Directions

Building on the results of this paper, future research could explore several promising avenues for further developing the field of IoT honeypots and their integration with machine learning.

One key future research direction is the real-world implementation and long-term evaluation of machine learning-enhanced honeypots. Current studies predominantly focus on short-term experiments and controlled environments, but long-term deployments in diverse IoT settings could provide deeper insights into the sustained effectiveness and adaptability of these systems against evolving attack strategies. Investigating how these honeypots perform under real-world constraints, including varying network conditions and emerging threat landscapes, remains a crucial challenge.

Another important research direction involves optimizing machine learning models for IoT honeypots, particularly in terms of efficiency and resource consumption. Given the inherent limitations of IoT devices—such as restricted processing power and energy constraints—future studies should focus on developing lightweight and energy-efficient machine learning algorithms. This could include exploring novel model architectures, compression techniques, or hybrid approaches that balance computational demands with practical applicability.

The integration of emerging technologies into IoT honeypots also presents a significant opportunity for future research. For instance, 5G networks could enhance the scalability and responsiveness of honeypots, while edge computing could help distribute computational loads efficiently. Additionally, incorporating blockchain technology could improve data integrity and secure information sharing between honeypot networks, reducing the risk of tampering and increasing trust in collected threat intelligence.

Moreover, future research could investigate the potential of adversarial machine learning in IoT honeypots. Attackers are increasingly employing sophisticated evasion techniques to bypass security mechanisms. Studying how adversarial machine learning can be used both defensively (to improve honeypot deception) and offensively (to detect and counter adversarial attacks) could provide valuable insights into strengthening IoT security.

Finally, interdisciplinary research combining cybersecurity, behavioral analysis, and artificial intelligence could further refine the effectiveness of honeypots. Understanding attacker behavior through AI-driven behavioral analytics could lead to more adaptive and intelligent honeypot designs that dynamically adjust their strategies based on real-time threat intelligence.

By addressing these research directions, future studies can contribute to the advancement of IoT honeypots, ensuring they remain effective against increasingly sophisticated cyber threats while maintaining efficiency and scalability in real-world applications.

7. Conclusions

The increasing use of IoT devices has led to growing security concerns, necessitating advanced solutions to address emerging threats. Honeypots enhance IoT security by attracting and analyzing attackers. However, traditional honeypots struggle with adaptability and efficiency. This paper examines how machine learning enhances honeypot capabilities by improving threat detection and response mechanisms.

A systematic literature review using the snowballing method explores the application of supervised, unsupervised, and reinforcement learning in optimizing honeypot architectures. This paper focuses on dynamic honeypots, which evolve to mislead attackers, and adaptive honeypots, which respond to threats in real time. By evaluating low-interaction, high-interaction, and hybrid honeypots, we determine how different machine learning techniques enhance detection and resource efficiency.

The findings of this research work indicate that integrating multiple machine learning techniques enhances honeypot effectiveness (RQ1). Supervised, unsupervised, and reinforcement learning contribute to improved detection, adaptability, and automated response mechanisms, making honeypots more resilient against cyber threats.

The comparison of dynamic and adaptive honeypots highlights their respective strengths. Dynamic honeypots are effective in misleading attackers over time, while adaptive honeypots provide real-time threat responses. This paper also emphasizes the role of low-interaction, high-interaction, and hybrid honeypots in optimizing threat detection and security resource allocation. Despite their advantages, machine learning-based honeypots face challenges such as high computational costs and limited real-world deployment (RQ2, RQ3).

Future research should focus on developing lightweight machine learning models suitable for resource-constrained IoT environments. Long-term real-world testing, integration with emerging technologies like 5G and edge computing, and optimization techniques for model efficiency could further enhance the practicality and scalability of these systems.

In conclusion, this paper provides valuable insights into the role of machine learning in improving IoT honeypots. Addressing existing challenges and exploring future research directions can lead to more intelligent, adaptable, and effective security solutions for IoT networks.

Author Contributions

Conceptualization, S.L., S.L.-R.P., P.S., H.W., M.P., G.C. and N.D.; methodology, S.L., S.L.-R.P., P.S., H.W., M.P., G.C. and N.D.; validation, S.L., S.L.-R.P., P.S., H.W., M.P., G.C. and N.D.; investigation, S.L., S.L.-R.P., P.S. and H.W.; resources, S.L., S.L.-R.P., P.S. and H.W.; data curation, S.L., S.L.-R.P., P.S. and H.W.; writing—original draft preparation, S.L., S.L.-R.P., P.S., H.W., M.P., G.C. and N.D.; writing—review and editing, M.P., G.C. and N.D.; visualization, S.L., S.L.-R.P., P.S., H.W., M.P., G.C. and N.D.; supervision, M.P., G.C. and N.D.; project administration, M.P., G.C. and N.D.; funding acquisition, N.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest.

References

McKinsey & Company. The Internet of Things: Catching Up to an Accelerating Opportunity. Report. 2021. Available online: https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/iot%20value%20set%20to%20accelerate%20through%202030%20where%20and%20how%20to%20capture%20it/the-internet-of-things-catching-up-to-an-accelerating-opportunity-final.pdf (accessed on 24 April 2025).
Kozlov, D.; Veijalainen, J.; Ali, Y. Security and Privacy Threats in IoT Architectures. In Proceedings of the 7th International Conference on Body Area Networks, Oslo, Norway, 24–26 February 2012; pp. 256–262. [Google Scholar] [CrossRef]
Nikolov, N. Research of MQTT, CoAP, HTTP and XMPP IoT Communication Protocols for Embedded Systems. In Proceedings of the 2020 XXIX International Scientific Conference Electronics (ET), Sozopol, Bulgaria, 16–18 September 2020; pp. 1–4. [Google Scholar] [CrossRef]
Bringer, M.L.; Chelmecki, C.A.; Fujinoki, H. A Survey: Recent Advances and Future Trends in Honeypot Research. Int. J. Comput. Netw. Inf. Secur. 2012, 4, 63. [Google Scholar] [CrossRef]
Pa, Y.M.P.; Suzuki, S.; Yoshioka, K.; Matsumoto, T.; Kasama, T.; Rossow, C. IoTPOT: A Novel Honeypot for Revealing Current IoT Threats. J. Inf. Process. 2016, 24, 522–533. [Google Scholar] [CrossRef]
Hussain, F.; Hussain, R.; Hassan, S.A.; Hossain, E. Machine Learning in IoT Security: Current Solutions and Future Challenges. IEEE Commun. Surv. Tutorials 2020, 22, 1686–1721. [Google Scholar] [CrossRef]
Ahmed, Y.; Beyioku, K.; Yousefi, M. Securing Smart Cities Through Machine Learning: A Honeypot-Driven Approach to Attack Detection in Internet of Things Ecosystems. IET Smart Cities 2024, 6, 180–198. [Google Scholar] [CrossRef]
Wohlin, C. Guidelines for Snowballing in Systematic Literature Studies and a Replication in Software Engineering. In Proceedings of the 18th International Conference on Evaluation and Assessment in Software Engineering, London, UK, 13–14 May 2014; pp. 1–10. [Google Scholar] [CrossRef]
Webster, J.; Watson, R.T. Analyzing the Past to Prepare for the Future: Writing a Literature Review. MIS Q. 2002, 26, xiii–xxiii. [Google Scholar]
Spitzner, L. Honeypots: Catching the Insider Threat. In Proceedings of the 19th Annual Computer Security Applications Conference, Las Vegas, NV, USA, 8–12 December 2003; pp. 170–179. [Google Scholar] [CrossRef]
Ng, C.; Pan, L.; Xiang, Y. Design Honeypots. In Honeypot Frameworks and Their Applications: A New Framework; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
Mokube, I.; Adams, M. Honeypots: Concepts, Approaches, and Challenges. In Proceedings of the 45th Annual ACM Southeast Conference, Winston-Salem, NC, USA, 23–24 March 2007; pp. 321–326. [Google Scholar] [CrossRef]
Dara, N.; Shankar, P.; Arvind, P.V.; Singh, V. Intelligent Insight into IoT Threats: Leveraging Advanced Analytics with Honeypots for Anomaly Detection. In Proceedings of the 2024 IEEE 9th International Conference for Convergence in Technology (I2CT), Pune, India, 5–7 April 2024; pp. 1–6. [Google Scholar]
Lingenfelter, B.; Vakilinia, I.; Sengupta, S. Analyzing Variation Among IoT Botnets Using Medium Interaction Honeypots. In Proceedings of the 2020 10th Annual Computing and Communication Workshop and Conference (CCWC), Las Vegas, NV, USA, 6–8 January 2020; pp. 761–767. [Google Scholar] [CrossRef]
Millar, S. IoT Security Challenges and Mitigations: An Introduction. arXiv 2021, arXiv:2112.14618. [Google Scholar]
Sharmeela, C.; Sanjeevikumar, P.; Sivaraman, P.; Joseph, M. IoT, Machine Learning and Blockchain Technologies for Renewable Energy and Modern Hybrid Power Systems; CRC Press: Boca Raton, FL, USA, 2023.
Shinde, P.P.; Shah, S. A Review of Machine Learning and Deep Learning Applications. In Proceedings of the 2018 Fourth International Conference on Computing Communication Control and Automation (ICCUBEA), Pune, India, 16–18 August 2018; pp. 1–6.
Dangi, R.; Jadhav, A.; Choudhary, G.; Dragoni, N.; Mishra, M.; Lalwani, P. ML-Based 5G Network Slicing Security: A Comprehensive Survey. Future Internet 2022, 14, 116.
Franco, J.; Aris, A.; Canberk, B.; Uluagac, A.S. A Survey of Honeypots and Honeynets for Internet of Things, Industrial Internet of Things, and Cyber-Physical Systems. IEEE Commun. Surv. Tutor. 2021, 23, 2351–2383.
Jain, A.K.; Shukla, H.; Goel, D. A Comprehensive Survey on DDoS Detection, Mitigation, and Defense Strategies in Software-Defined Networks. Clust. Comput. 2024, 27, 13129–13164.
Kaur, B.; Pateriya, P.K. A Survey on Security Concerns in Internet of Things. In Proceedings of the 2018 Second International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 14–15 June 2018; pp. 27–34.
Oza, A.D.; Kumar, G.N.; Khorajiya, M. Survey of Snaring Cyber Attacks on IoT Devices with Honeypots and Honeynets. In Proceedings of the 2018 3rd International Conference for Convergence in Technology (I2CT), Pune, India, 6–8 April 2018; pp. 1–6.
Razali, M.F.; Razali, M.N.; Mansor, F.Z.; Muruti, G.; Jamil, N. IoT Honeypot: A Review from Researcher’s Perspective. In Proceedings of the 2018 IEEE Conference on Application, Information and Network Security (AINS), Langkawi, Malaysia, 21–22 November 2018; pp. 93–98.
Petersen, K.; Vakkalanka, S.; Kuzniarz, L. Guidelines for Conducting Systematic Mapping Studies in Software Engineering: An Update. Inf. Softw. Technol. 2015, 64, 1–18.
Alshahrani, A. Predication Attacks Based on Intelligent Honeypot Technique. Inf. Sci. Lett. 2023, 12, 46.
Bao, J.; Kantarcioglu, M.; Vorobeychik, Y.; Kamhoua, C. IoTFlowGenerator: Crafting Synthetic IoT Device Traffic Flows for Cyber Deception. arXiv 2023, arXiv:2305.00925.
Chempavathy, B.; Deshmukh, V.M.; Datta, A.; Shiva, A.T.; Singh, G. An Exploration into Secure IoT Networks Using Deep Learning Methodologies. In Proceedings of the 2022 IEEE International Conference for Advancement in Technology (ICONAT), Goa, India, 21–22 January 2022; pp. 1–4.
Chuang, P.J.; Hung, T.C. Enhanced Attack Blocking in IoT Environments: Engaging Honeypots and Machine Learning in SDN OpenFlow Switches. J. Appl. Sci. Eng. 2020, 23, 163–173.
Da Costa, K.A.; Papa, J.P.; Lisboa, C.O.; Munoz, R.; de Albuquerque, V.H.C. Internet of Things: A Survey on Machine Learning-Based Intrusion Detection Approaches. Comput. Netw. 2019, 151, 147–157.
Danilov, V.; Ovasapyan, T.; Ivanov, D.V.; Konoplev, A.S.; Moskvin, D.A. Generation of Synthetic Data for Honeypot Systems Using Deep Learning Methods. Autom. Control Comput. Sci. 2022, 56, 916–926.
Dowling, S.; Schukat, M.; Barrett, E. New Framework for Adaptive and Agile Honeypots. ETRI J. 2020, 42, 965–975.
El Ghazi, A.; Rachid, A.M. Machine Learning and Datamining Methods for Hybrid IoT Intrusion Detection. In Proceedings of the 2020 5th IEEE International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech), Marrakesh, Morocco, 24–26 November 2020; pp. 1–6.
El-Taie, M.; Kraidi, A.Y. A Deep Learning Framework for Securing IoT Against Malwares. J. Cybersecur. Inf. Manag. 2023, 11, 38–46.
Franco, J.; Aris, A.; Babun, L.; Uluagac, A.S. S-Pot: A Smart Honeypot Framework with Dynamic Rule Configuration for SDN. In Proceedings of the GLOBECOM 2022–2022 IEEE Global Communications Conference, Rio de Janeiro, Brazil, 4–8 December 2022; pp. 2818–2824.
Guan, C.; Liu, H.; Cao, G.; Zhu, S.; La Porta, T. HoneyIoT: Adaptive High-Interaction Honeypot for IoT Devices Through Reinforcement Learning. In Proceedings of the 16th ACM Conference on Security and Privacy in Wireless and Mobile Networks, Guildford, UK, 29 May–1 June 2023; pp. 49–59.
Harahsheh, K.M.; Chen, C.H. A Survey of Using Machine Learning in IoT Security and the Challenges Faced by Researchers. Informatica 2023, 47, 1–54.
Huang, C.; Han, J.; Zhang, X.; Liu, J. Automatic Identification of Honeypot Server Using Machine Learning Techniques. Secur. Commun. Netw. 2019, 2019, 2627608.
Iwabuchi, M.; Nakamura, A. A Heuristics and Machine Learning Hybrid Approach to Adaptive Cyberattack Detection. In Proceedings of the 2024 IEEE International Conference on Artificial Intelligence, Computer, Data Sciences and Applications (ACDSA), Victoria, Seychelles, 1–2 February 2024; pp. 1–7.
Khan, S.U.; Eusufzai, F.; Azharuddin Redwan, M.; Ahmed, M.; Sabuj, S.R. Artificial Intelligence for Cyber Security: Performance Analysis of Network Intrusion Detection. In Explainable Artificial Intelligence for Cyber Security: Next Generation Artificial Intelligence; Springer: Berlin/Heidelberg, Germany, 2022; pp. 113–139.
Kumar, S.; Janet, B.; Eswari, R. Multi Platform Honeypot for Generation of Cyber Threat Intelligence. In Proceedings of the 2019 IEEE 9th International Conference on Advanced Computing (IACC), Tiruchirappalli, India, 13–14 December 2019; pp. 25–29.
Layton, J.; Hu, F.; Hei, X. Survey of Machine Learning Defense Strategies. In AI, Machine Learning and Deep Learning; CRC Press: Boca Raton, FL, USA, 2023; pp. 121–129.
Lee, S.; Abdullah, A.; Jhanjhi, N. A Review on Honeypot-Based Botnet Detection Models for Smart Factory. Int. J. Adv. Comput. Sci. Appl. 2020, 11, 1–12.
Lee, S.; Abdullah, A.; Jhanjhi, N.; Kok, S. Classification of Botnet Attacks in IoT Smart Factory Using Honeypot Combined with Machine Learning. PeerJ Comput. Sci. 2021, 7, e350.
Lee, S.; Abdullah, A.; Jhanjhi, N.; Kok, S. Honeypot Coupled Machine Learning Model for Botnet Detection and Classification in IoT Smart Factory—An Investigation. In Proceedings of the MATEC Web of Conferences, Subang Jaya, Malaysia, 25 November 2020; Volume 335, p. 04003.
Liu, H.; Lang, B. Machine Learning and Deep Learning Methods for Intrusion Detection Systems: A Survey. Appl. Sci. 2019, 9, 4396.
Liu, J.; Liu, S.; Zhang, S. Detection of IoT Botnet Based on Deep Learning. In Proceedings of the 2019 Chinese Control Conference (CCC), Guangzhou, China, 27–30 July 2019; pp. 8381–8385.
Mahajan, V.; Singh, J. Malware Detection and Analysis Using Modern Honeypot Allied with Machine Learning: A Performance Evaluation. In Proceedings of the 2023 4th International Conference on Electronics and Sustainable Communication Systems (ICESC), Coimbatore, India, 6–8 July 2023; pp. 1539–1544.
Matin, I.M.M.; Rahardjo, B. Malware Detection Using Honeypot and Machine Learning. In Proceedings of the 2019 7th International Conference on Cyber and IT Service Management (CITSM), Jakarta, Indonesia, 6–8 November 2019; Volume 7, pp. 1–4.
Mfogo, V.S.; Zemkoho, A.; Njilla, L.; Nkenlifack, M.; Kamhoua, C. AIIPot: Adaptive Intelligent-Interaction Honeypot for IoT Devices. In Proceedings of the 2023 IEEE 34th Annual International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC), Toronto, ON, Canada, 5–8 September 2023; pp. 1–6.
Panda, M.; Abd Allah, A.M.; Hassanien, A.E. Developing an Efficient Feature Engineering and Machine Learning Model for Detecting IoT-Botnet Cyber Attacks. IEEE Access 2021, 9, 91038–91052.
Pashaei, A.; Akbari, M.E.; Lighvan, M.Z.; Charmin, A. Deep Learning Based Early Intrusion Detection in IIoT using Honeypot. Majlesi J. Electr. Eng. 2023, 17, 1–12.
Pauna, A.; Bica, I.; Pop, F.; Castiglione, A. On the Rewards of Self-Adaptive IoT Honeypots. Ann. Telecommun. 2019, 74, 501–515.
Pothumani, P.; Reddy, E.S. Network Intrusion Detection Using Ensemble Weighted Voting Classifier Based Honeypot Framework. J. Auton. Intell. 2024, 7, 1–12.
Shahid, W.B.; Aslam, B.; Abbas, H.; Afzal, H.; Khalid, S.B. A Deep Learning Assisted Personalized Deception System for Countering Web Application Attacks. J. Inf. Secur. Appl. 2022, 67, 103169.
Sharma, P.; Kapoor, S.; Sharma, R. Ransomware Detection, Prevention and Protection in IoT Devices Using ML Techniques Based on Dynamic Analysis Approach. Int. J. Syst. Assur. Eng. Manag. 2023, 14, 287–296.
Shinan, K.; Alsubhi, K.; Alzahrani, A.; Ashraf, M.U. Machine Learning-Based Botnet Detection in Software-Defined Network: A Systematic Review. Symmetry 2021, 13, 866.
Shobana, M.; Poonkuzhali, S. A Novel Approach to Detect IoT Malware by System Calls Using Deep Learning Techniques. In Proceedings of the 2020 International Conference on Innovative Trends in Information Technology (ICITIIT), Kottayam, India, 13–14 February 2020; pp. 1–5.
Shrivastava, R.K.; Bashir, B.; Hota, C. Attack Detection and Forensics Using Honeypot in IoT Environment. In Proceedings of the Distributed Computing and Internet Technology: 15th International Conference, ICDCIT, Bhubaneswar, India, 10–13 January 2019; pp. 402–409.
Sun, P.; Li, J.; Bhuiyan, M.Z.A.; Wang, L.; Li, B. Modeling and Clustering Attacker Activities in IoT Through Machine Learning Techniques. Inf. Sci. 2019, 479, 456–471.
Sun, C.; Bu, Y.; Chen, B.; Zhang, D.; Chen, Z.; Lu, X.; Zhang, S.; Sun, J. Application of Artificial Intelligence Technology in Honeypot Technology. In Proceedings of the 2021 International Conference on Advanced Computing and Endogenous Security, Nanjing, China, 21–22 April 2022; pp. 1–9.
Tabari, A.Z.; Liu, G.; Ou, X.; Singhal, A. Revealing Human Attacker Behaviors Using an Adaptive Internet of Things Honeypot Ecosystem. In IFIP International Conference on Digital Forensics; Springer Nature: Cham, Switzerland, 2023; pp. 73–90.
Tien, C.W.; Chen, S.W.; Ban, T.; Kuo, S.Y. Machine Learning Framework to Analyze IoT Malware Using ELF and Opcode Features. Digit. Threat. Res. Pract. 2020, 1, 1–19.
Tsemogne, O.; Hayel, Y.; Kamhoua, C.; Deugoué, G. Game-Theoretic Modeling of Cyber Deception Against Epidemic Botnets in Internet of Things. IEEE Internet Things J. 2021, 9, 2678–2687.
Veluchamy, S.; Kathavarayan, R.S. Deep Reinforcement Learning for Building Honeypots Against Runtime DoS Attack. Int. J. Intell. Syst. 2022, 37, 3981–4007.
Vishwakarma, R.; Jain, A.K. A Honeypot with Machine Learning Based Detection Framework for Defending IoT Based Botnet DDoS Attacks. In Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 23–25 April 2019; pp. 1019–1024.
Wang, B.; Dou, Y.; Sang, Y.; Zhang, Y.; Huang, J. IoTCMal: Towards a Hybrid IoT Honeypot for Capturing and Analyzing Malware. In Proceedings of the ICC 2020—2020 IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–7.
Wang, X.; Shi, L.; Cao, C.; Wu, W.; Zhao, Z.; Wang, Y.; Wang, K. Game Analysis and Decision Making Optimization of Evolutionary Dynamic Honeypot. Comput. Electr. Eng. 2024, 119, 109534.
Xu, Y.; Jiang, Y.; Yu, L.; Li, J. Brief Industry Paper: Catching IoT Malware in the Wild Using HoneyIoT. In Proceedings of the 2021 IEEE 27th Real-Time and Embedded Technology and Applications Symposium (RTAS), Nashville, TN, USA, 18–21 May 2021; pp. 433–436.
Yamamoto, M.; Kakei, S.; Saito, S. Firmpot: A Framework for Intelligent-Interaction Honeypots Using Firmware of IoT Devices. In Proceedings of the 2021 Ninth International Symposium on Computing and Networking Workshops (CANDARW), Matsue, Japan, 23–26 November 2021; pp. 405–411.
Yu, X.N.; Guo, W.K.; Liu, Y.Z.; Cao, Y.P.; Zhang, M.; Wang, H.F. An Automatic Features Extraction Model of IDS for IoT. In International Conference on Computer Engineering and Networks; Springer Nature: Singapore, 2022; pp. 1260–1268.
Shinly, S.S.S.; Raja, R.S. Investigation of Machine Learning Techniques in Intrusion Detection System for IoT Network. In Proceedings of the 2020 3rd International Conference on Intelligent Sustainable Systems (ICISS), Thoothukudi, India, 3–5 December 2020; pp. 1164–1167.
Benerradi, J.; Clos, J.; Landowska, A.; Valstar, M.F.; Wilson, M.L. Benchmarking Framework for Machine Learning Classification from fNIRS Data. Front. Neuroergon. 2023, 4, 994969.
Machine Learning Models Organization. Practical Guide: Deploying Machine Learning Models in Real-World. 2023. Available online: https://machinelearningmodels.org/practical-guide-deploying-machine-learning-models-in-real-world/ (accessed on 31 March 2025).
Paleyes, A.; Urma, R.G.; Lawrence, N.D. Challenges in Deploying Machine Learning: A Survey of Case Studies. ACM Comput. Surv. 2022, 55, 114.
Cabrera, C.; Paleyes, A.; Thodoroff, P.; Lawrence, N.D. Real-World Machine Learning Systems: A Survey from a Data-Oriented Architecture Perspective. arXiv 2023, arXiv:2302.04810.

Figure 1. Snowballing process.

Figure 2. Feedback connection of automation and environment from the literature [60].

Figure 3. Process flow for machine learning-based detection framework from the literature [65].

Figure 4. Process with different machine learning techniques.

Table 1. Existing surveys.

Author	Year	ML Techniques	Honeypot Technology	IoT Environment
Dangi [18]	2022	X
Franco [19]	2021		X	X
Jain [20]	2024	X
Kaur [21]	2018		X	partly
Oza [22]	2018		X	X
Razali [23]	2018		X	X
This paper	2025	X	X	X

Table 2. Overview of literature search results.

Category	Hits	Relevant Literature	Selected Literature
ACM	40	40	34
IEEE Xplore	219
Scopus	246
DTU Find	265
Backward	1314	28	10
Forward	674	53	5
Total			49

Table 3. Concept-Matrix.

Author	Year	Interaction Honeypots: Low Interaction	Interaction Honeypots: High Interaction	ML Technique: Supervised	ML Technique: Unsupervised	ML Technique: Reinforcement	Detection Architecture: Malware Detection	Detection Architecture: Adaptive Honeypots	Classifiers: Method	Classifiers: Framework
Alshahrani [25]	2023			X			X
Bao [26]	2023	X	X		X				X
Bringer [4]	2012	X	X
Chempavathy [27]	2022		X		X			X
Chuang [28]	2020							X
Da Costa [29]	2019			X	X				X
Danilov [30]	2022							X		X
Dara [13]	2024	X	X	X	X			X	X	X
Dowling [31]	2020			X	X	X		X
El Ghazi [32]	2020	X	X	X	X			X	X	X
El-Taie [33]	2023					X				X
Franco [34]	2022		X	X			X		X
Guan [35]	2023	X	X					X	X	X
Harahsheh [36]	2023			X	X
Huang [37]	2019			X
Iwabuchi [38]	2024						X
Khan [39]	2022			X
Kumar [40]	2019	X	X				X
Layton [41]	2023	X	X	X	X
Lee [42]	2020	X	X		X	X	X
Lee [43]	2021			X			X
Lee [44]	2021			X			X		X	X
Lingenfelter [14]	2020					X	X			X
Liu [45]	2019			X	X				X
Liu [46]	2019			X
Mahajan [47]	2023	X		X			X
Matin [48]	2019			X			X
Mfogo [49]	2023	X	X					X
Panda [50]	2021			X
Pashaei [51]	2023			X		X
Pauna [52]	2019				X			X
Pothumani [53]	2024			X			X		X
Shahid [54]	2022	X	X						X
Sharma [55]	2023			X	X			X
Shinan [56]	2021	X	X	X	X				X
Shobana [57]	2020			X			X
Shrivastava [58]	2019			X				X
Sun [59]	2019				X		X		X
Sun [60]	2022		X					X
Tabari [61]	2023				X		X	X
Tien [62]	2020			X					X
Tsemogne [63]	2021		X
Veluchamy [64]	2022					X
Vishwakarma [65]	2019								X
Wang [66]	2020	X	X							X
Wang [67]	2024				X			X
Xu [68]	2021		X							X
Yamamoto [69]	2021		X					X
Yu [70]	2022					X	X
Total		13	18	24	16	7	14	15	14	9

Table 4. Comparison table by metrics.

Learning Type	Accuracy	Precision	Recall	FPR
Supervised learning	0.953 (Weka), 0.96 (R-Studio) [44]	0.82 [37]	0.82 [37]	0.219 (Weka), 0.2667 (R-Studio) [44]
Unsupervised learning	100% (CNN + ScS), 100% (DMLP + ScS) [50]	100% micro and macro (CNN + ScS), 100% micro and macro (DMLP + ScS) [50]	100% micro and macro (CNN + ScS), 100% micro and macro (DMLP + ScS) [50]	Not explicitly given; noted issue with unclassified instances [50]
Reinforcement learning	Highest average reward (A2C, PPO) [35]	-	-	-1 reward for incorrect response, used as FPR proxy [35]
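
For readers comparing the columns of Table 4, the following minimal sketch shows how accuracy, precision, recall, and false positive rate (FPR) are conventionally derived from binary confusion-matrix counts, with attack traffic treated as the positive class. The names `ConfusionCounts` and `metrics` are hypothetical, and the counts are invented for illustration; none of them come from the reviewed studies, which may compute their figures differently (for example, micro or macro averaging over several classes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts; 'positive' means attack traffic."""
    tp: int  # attacks correctly flagged
    fp: int  # benign traffic wrongly flagged
    tn: int  # benign traffic correctly passed
    fn: int  # attacks missed


def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute the four metrics used as column headers in Table 4."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "accuracy": safe_div(c.tp + c.tn, total),
        "precision": safe_div(c.tp, c.tp + c.fp),
        "recall": safe_div(c.tp, c.tp + c.fn),
        "fpr": safe_div(c.fp, c.fp + c.tn),
    }


# Invented counts for illustration only (not from any cited paper).
example = ConfusionCounts(tp=80, fp=10, tn=90, fn=20)
result = metrics(example)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions a detector can report high accuracy while still having a non-trivial FPR, which is why Table 4 lists the two separately.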

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
