Abstract
The Internet of Things (IoT) has become integral to daily life. It plays a core role in operating everyday infrastructure, from energy grids and water distribution systems to healthcare and household devices. However, the rapid growth of IoT connections exposes this infrastructure to increasingly sophisticated cybersecurity threats. Many security measures have been proposed in response. The IoT-based Intrusion Detection System (IDS) is a salient component of the security layer that alerts security administrators to suspicious behavior. Machine learning-based IDSs, especially supervised models, show promising results, but such models require expensive labelling by domain experts. Active learning reduces this annotation cost by directing experts to label only a small set of carefully selected instances. This paper proposes a robust approach called Clustering-based Layered Active Instance REpresentation (CLAIRE), which selects both representative and informative instances. Representative instances are selected through three sequential clustering-based layers, while informative instances are selected by a fourth layer that implements an ensemble-based uncertainty mechanism. A comprehensive evaluation on two well-known IoT datasets, N-BaIoT and CICIoT2023, demonstrates promising results: the approach selects a small set of instances that captures the various distributions in the data, even in imbalanced datasets. We compare the results of the proposed approach with state-of-the-art baselines that work within the same scope of traditional machine learning.
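
To make the distinction between representative and informative instances concrete, the following is a minimal illustrative sketch, not the CLAIRE implementation. It assumes that a representative instance is the one nearest to its cluster centroid and that ensemble uncertainty is measured by vote entropy; the abstract specifies neither choice, and the function names `nearest_to_centroid`, `vote_entropy`, and `most_informative` are hypothetical.

```python
import math
from collections import Counter


def nearest_to_centroid(points, labels):
    """For each cluster label, return the index of the point closest to the
    cluster centroid (assumed notion of a 'representative' instance)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    chosen = {}
    for lab, idxs in clusters.items():
        dim = len(points[idxs[0]])
        # Centroid is the per-dimension mean of the cluster members.
        centroid = [sum(points[i][d] for i in idxs) / len(idxs) for d in range(dim)]
        chosen[lab] = min(idxs, key=lambda i: math.dist(points[i], centroid))
    return chosen


def vote_entropy(votes):
    """Entropy of the labels predicted by the ensemble members for one instance
    (assumed uncertainty measure; zero when all members agree)."""
    total = len(votes)
    return -sum((c / total) * math.log(c / total) for c in Counter(votes).values())


def most_informative(ensemble_votes, k):
    """Indices of the k instances on which the ensemble disagrees most."""
    ranked = sorted(
        range(len(ensemble_votes)),
        key=lambda i: vote_entropy(ensemble_votes[i]),
        reverse=True,
    )
    return ranked[:k]
```

In this sketch, the cluster labels would come from any clustering step and the votes from any set of trained classifiers; the instances returned by both functions together would form the small set sent to the expert for labelling.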