Abstract
The widespread adoption of IoT devices has brought its own share of security vulnerabilities. One such threat is the ARP spoofing attack, which enables a man-in-the-middle to intercept and modify communication and can give an intruder access to the user's entire local area network. The ACI-IoT-2023 dataset captures ARP spoofing attacks, yet the absence of specified extracted features hinders its application in machine learning-aided intrusion detection systems. To address this, we present a framework for ARP spoofing detection that enriches the dataset by extracting ARP-specific features and evaluating their impact under different time-window configurations. Beyond generic feature engineering and model evaluation, our contributions are threefold. First, we treat ARP spoofing as a time-window pattern and align the window length with the spoofing persistence observed in the dataset timesheet, turning window choice into an explainable, repeatable setting for constrained IoT devices. Second, we standardize deployment-oriented efficiency profiling (inference latency, RAM usage, and model size), reported alongside accuracy, precision, recall, and F1-score, to enable edge-feasible model selection. Third, we provide an ARP-focused, reproducible pipeline that reconstructs L2 labels from public PCAPs and derives missing link-layer indicators, yielding a transparent path from labeling to windowed features to training and evaluation. We systematically evaluate five models (Decision Tree, Random Forest, XGBoost, CatBoost, and K-Nearest Neighbors) across multiple time windows. The results show that XGBoost and CatBoost perform best at the 1800 s window, which corresponds to the longest spoofing duration in the timesheet, achieving accuracy above 0.93, precision above 0.95, recall near 0.91, and F1-scores above 0.93. Although Decision Tree has the lowest inference latency (∼0.4 ms), its lower recall risks missed attacks. By contrast, XGBoost and CatBoost sustain strong detection with less than 6 ms inference latency and moderate RAM usage, indicating practicality for IoT deployment. We also observe diminishing returns beyond ∼1800 s due to temporal over-aggregation.
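As an illustrative companion to the windowed ARP feature extraction described above, the following Python sketch derives simple ARP-specific indicators (request/reply counts and conflicting IP-to-MAC bindings) over fixed time windows from a PCAP. This is a minimal sketch under stated assumptions, not the paper's exact pipeline: the 1800 s window mirrors the abstract, while the scapy-based parsing and the feature names are illustrative choices.

```python
# Minimal, illustrative sketch of windowed ARP feature extraction (assumptions:
# scapy is available; feature names are hypothetical, not the paper's exact set).
from collections import defaultdict
from scapy.all import rdpcap, ARP

WINDOW_SECONDS = 1800  # window length aligned with the longest observed spoofing duration


def arp_window_features(pcap_path):
    """Return one feature row per fixed-length time window of ARP traffic."""
    packets = rdpcap(pcap_path)
    windows = defaultdict(lambda: {
        "arp_requests": 0,
        "arp_replies": 0,
        "ip_to_macs": defaultdict(set),
    })

    for pkt in packets:
        if not pkt.haslayer(ARP):
            continue
        arp = pkt[ARP]
        window_id = int(float(pkt.time) // WINDOW_SECONDS)
        w = windows[window_id]
        if arp.op == 1:      # who-has (request)
            w["arp_requests"] += 1
        elif arp.op == 2:    # is-at (reply)
            w["arp_replies"] += 1
        # Track every MAC address claiming a given sender IP within the window.
        w["ip_to_macs"][arp.psrc].add(arp.hwsrc)

    rows = []
    for window_id, w in sorted(windows.items()):
        conflicting = sum(1 for macs in w["ip_to_macs"].values() if len(macs) > 1)
        rows.append({
            "window_id": window_id,
            "arp_requests": w["arp_requests"],
            "arp_replies": w["arp_replies"],
            "reply_request_ratio": w["arp_replies"] / max(w["arp_requests"], 1),
            "ips_with_conflicting_macs": conflicting,  # >0 hints at spoofed IP-to-MAC bindings
        })
    return rows
```

In a full pipeline such per-window rows would feed the classifiers compared in the study, with inference latency, RAM usage, and model size profiled on the target IoT hardware alongside the detection metrics.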