1. Introduction
IoT device fingerprinting is the process of identifying and authenticating Internet of Things (IoT) devices by analyzing their unique characteristics—such as radio frequency signal patterns, network traffic behavior, or hardware-specific traits. As IoT deployments expand rapidly across environments like smart homes, healthcare systems, and industrial networks, secure and reliable device identification has become critical. Traditional identifiers like MAC or IP addresses are vulnerable to spoofing and cloning, underscoring the need for more robust, spoof-resistant fingerprinting techniques that leverage hard-to-replicate device-specific features.
Modern approaches employ machine learning, signal processing, and behavioral analysis to streamline device management and enhance security in increasingly complex and heterogeneous IoT ecosystems. These approaches can generally be classified into two main groups based on the behavior analysis domain.
The first group is the time-domain-based approach, which analyzes IoT device behavior through aggregated statistics extracted from network traffic. These methods extract features from various levels of network activity, including packet-level, flow-level, and link-layer communication (e.g., Wi-Fi or Ethernet) [
1,
2]. These approaches have shown promising performance in many scenarios; however, they also face significant limitations. One of the primary challenges is the limited visibility into device behavior, particularly for IoT devices that generate sparse, low-volume, or irregular traffic. Many such devices remain inactive for extended periods and only transmit data during brief, infrequent interactions. In these cases, the time-domain features may appear noisy, inconsistent, or insufficient for constructing accurate and reliable fingerprints.
The second group is frequency-domain-based analysis, which focuses on IoT device behavior as reflected in radio frequency signals. Fourier and wavelet analysis have primarily been applied to radio frequency signals [
3,
4,
5], which are emitted by IoT devices at the physical layer during wireless communication. Radio frequency fingerprinting (RFF) identifies devices by analyzing unique physical-layer characteristics of wireless signals, such as amplitude, phase, or modulation imperfections.
RFF is inherently limited in IoT networks with wired or mixed connectivity. Wired devices, such as industrial sensors, medical equipment, or point-of-sale systems connected via Ethernet, transmit data through cables and produce no RF signals to analyze, rendering RFF ineffective in wire-dominant environments like smart factories or medical networks. Moreover, RFF captures only device-specific hardware traits, such as transmitter imperfections, which reflect individual identities but not device function or type. This makes it challenging to identify device types (e.g., distinguishing a smart bulb from a smart thermostat), which is critical for network administrators performing tasks like firmware updates, vulnerability patching, or access control. Finally, environmental factors like interference or multipath fading can degrade RFF’s effectiveness.
To overcome the limitations of both time-domain and radio frequency-based fingerprinting approaches, we propose a novel solution that leverages the informative richness of network traffic alongside the robustness of frequency-domain representations. Frequency-domain features provide a more stable and discriminative feature space by capturing underlying periodicities, burst patterns, and behavioral rhythms that are often obscured in raw time-series data. Our approach combines the behavioral depth of time-domain analysis with the structural clarity and pattern recognition capabilities of the frequency domain by applying wavelet transform techniques, enabling accurate and resilient IoT device fingerprinting in diverse network environments.
We refer to the proposed solution as Wavelet IoT Device Fingerprint. As illustrated in
Figure 1, the system addresses the limitations of radio frequency fingerprinting (RFF) by enabling passive traffic collection from both wired and wireless interfaces at the network gateway level. This approach leverages network traffic characteristics—including flow behavior, packet timing, and communication patterns—that inherently carry device-specific and type-specific signatures. Moreover, wired connections offer more stable and noise-free traffic, improving the reliability of the fingerprinting process. To analyze the network traffic in the frequency domain, the system first converts the network traffic into time-series signals, then applies wavelet transform techniques (such as DWT or WST) to extract multi-resolution patterns across both time and frequency domains. These wavelet coefficients are rich in behavioral information and serve as features for classification. Finally, machine learning models are used to classify devices by their unique identity or device type. By operating in the frequency domain, the Wavelet IoT Device Fingerprint system provides a resilient and comprehensive solution for IoT device identification and classification.
This work is, to the best of our knowledge, the first to apply wavelet-based analysis (DWT and WST) to network-traffic signals specifically for the purpose of IoT device fingerprinting, and the first to conduct a systematic and comparative evaluation of these wavelet techniques across three distinct datasets. While wavelet-based analysis has been used for anomaly detection and intrusion detection [
6,
7,
8,
9,
10,
11], these domains focus on identifying deviations from established norms to detect suspicious or malicious behavior. In contrast, device fingerprinting is a fundamentally different problem: rather than detecting anomalies, it aims to uncover stable and distinctive patterns in device behavior that are consistent over time and unique to each device or device type. This distinction is critical, as techniques effective in anomaly detection do not necessarily translate to robust fingerprinting.
Moreover, prior works in IoT device fingerprinting have largely overlooked two critical evaluation dimensions: the impact of dataset size and complexity on model accuracy [
12] and the performance of fingerprinting methods under cyberattacks such as DDoS attacks [
12,
13]. In this study, we address these gaps by evaluating the proposed wavelet-based fingerprinting approach on heterogeneous datasets of varying sizes and under adversarial conditions. We examine how increasing the number and diversity of IoT devices affects classifier performance and assess the system’s resilience during DDoS attacks. This provides a more realistic measure of the model’s scalability and robustness in complex, real-world IoT environments.
We evaluate the proposed wavelet-based fingerprinting solution on three IoT datasets of varying sizes and complexity: the CIC IoT Dataset 2022 [
14], the IoT Device Classification Dataset by Sivanathan et al. [
15], and the CIC IoT Dataset 2023 [
16]. The results show a clear advantage in accuracy, with the approach effectively identifying both individual devices and device types. It also demonstrates robustness to network heterogeneity and maintains acceptable performance under DDoS attack conditions, making it well-suited for scalable and secure IoT environments.
The rest of this paper is organized as follows:
Section 2 reviews the related work.
Section 3 introduces the proposed approach.
Section 4 presents the experimental setup and the results, followed by a discussion in
Section 5. Finally,
Section 6 concludes the paper and outlines potential directions for future research.
4. Experiments and Results
In this section, we present the datasets, the evaluation scenarios, and the experimental results used to assess the effectiveness of the proposed wavelet-based IoT device fingerprinting solution.
To evaluate the system’s performance, we compare the proposed approach against baseline models that use traditional time-domain features, i.e., aggregated statistics extracted from network traffic. The baseline models are designed to ensure a fair and controlled comparison by using the same machine learning classifiers and being trained on the same datasets as the proposed method. The only difference between the baseline and the proposed system lies in the feature representation: the baseline models rely on a set of eight time-domain features, which also serve as the input from which the proposed method generates its wavelet-based features.
To provide a comprehensive and quantitative assessment, we employ standard performance metrics commonly used in machine learning and classification tasks, including:
Accuracy: Measures overall correctness of classification.
Precision: Assesses how many of the identified devices were correctly classified.
Recall: Measures the model’s ability to correctly identify all relevant devices.
F1-Score: Provides a balance between precision and recall.
Macro-averaged precision, recall, and F1-score are used as the primary evaluation metrics in this work to provide a fair and robust assessment of IoT device fingerprinting performance under class imbalance. These metrics are computed by evaluating precision, recall, and F1-score independently for each device class and then taking their unweighted average, thereby ensuring that all devices contribute equally to the final evaluation regardless of traffic volume or class frequency.
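As a concrete illustration, macro averaging with scikit-learn's metrics (a minimal sketch using toy labels, not the paper's data):

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

# Toy predictions over three imbalanced device classes.
y_true = ["cam", "cam", "cam", "plug", "plug", "bulb"]
y_pred = ["cam", "cam", "plug", "plug", "plug", "bulb"]

acc = accuracy_score(y_true, y_pred)
# average="macro": compute precision/recall/F1 per class, then take the
# unweighted mean, so rare device classes count as much as chatty ones.
prec, rec, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0
)
```

With `average="weighted"` instead, high-traffic classes would dominate the score, which is exactly what macro averaging avoids here.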
The evaluation of the proposed wavelet-based IoT fingerprinting solution is conducted across five distinct experimental scenarios, each designed to assess a specific dimension of the system’s performance, scalability, and robustness:
Scenario 1: Sampling Rate Sensitivity Analysis.
Scenario 2: Feature Reduction Analysis.
Scenario 3: Individual Device Identification Under Normal Conditions.
Scenario 4: Device Type Identification Under Normal Conditions.
Scenario 5: Performance Under Adversarial Conditions.
Each scenario targets a unique evaluation objective—ranging from fine-grained device recognition to generalized type classification, and resilience against malicious or abnormal traffic. Together, these scenarios provide a comprehensive framework for validating the effectiveness of the proposed solution under realistic and security-relevant conditions. Detailed descriptions of these scenarios are provided in the corresponding section.
Our learning instances are traffic windows of 300 s extracted with no overlap. To prevent temporal and near-duplicate leakage between partitions, we define sessions as time-block sessions: for each device, the traffic timeline is partitioned into contiguous non-overlapping time blocks of 15–30 min (block length), and all windows whose timestamps fall within the same time block are assigned the same session identifier. We then construct a grouping key from the device identifier and time block and perform the 70–15–15% train–validation–test split using GroupShuffleSplit, ensuring that all windows from the same device and time block remain entirely within a single partition. The fixed seed (seed = 42) is retained to make this grouping-based split reproducible. To quantify the stability and reliability of the reported performance, we evaluate all models using repeated fully grouped 5-fold cross-validation, where grouping is enforced at the session (time-block) level to prevent temporal and near-duplicate leakage between folds.
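A minimal sketch of this grouping-based split, assuming hypothetical device-ID and timestamp arrays and a 20 min block length within the stated 15–30 min range, with scikit-learn's GroupShuffleSplit applied twice to obtain the 70–15–15% partitions:

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

BLOCK_SECONDS = 20 * 60  # assumed 20 min blocks (paper uses 15-30 min)

def time_block_groups(device_ids, timestamps, block=BLOCK_SECONDS):
    """Grouping key: one group per (device, contiguous time block)."""
    blocks = (np.asarray(timestamps) // block).astype(int)
    return [f"{d}-{b}" for d, b in zip(device_ids, blocks)]

def grouped_70_15_15(X, y, groups, seed=42):
    """70-15-15 split; all windows of a (device, block) group stay together."""
    gss = GroupShuffleSplit(n_splits=1, train_size=0.70, random_state=seed)
    train_idx, rest_idx = next(gss.split(X, y, groups))
    rest_groups = [groups[i] for i in rest_idx]
    # Split the remaining 30% in half -> 15% validation, 15% test.
    gss2 = GroupShuffleSplit(n_splits=1, train_size=0.50, random_state=seed)
    val_rel, test_rel = next(gss2.split(rest_idx, groups=rest_groups))
    return train_idx, rest_idx[val_rel], rest_idx[test_rel]
```

The same grouping key can be passed to `GroupKFold` to realize the fully grouped 5-fold cross-validation described above.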
For each metric, performance is first computed independently on each fold and then aggregated. In addition to reporting mean ± standard deviation, we explicitly quantify uncertainty by computing 95% confidence intervals (CIs) across folds using the standard normal approximation. Specifically, given the fold-level standard deviation s and the number of folds n, the 95% CI is computed as mean ± 1.96 · s/√n. For clarity and transparency, the reported standard deviations are consistent with—and derived from—these confidence intervals, ensuring that variability and uncertainty are directly interpretable and reproducible. This dual reporting of mean ± standard deviation and explicit confidence intervals provides a rigorous assessment of result stability, particularly important given the near-perfect performance observed in several configurations.
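A small helper reproducing this aggregation (the fold scores in the test are illustrative, not the paper's results):

```python
import numpy as np

def mean_ci95(fold_scores):
    """Mean, sample std, and 95% CI across CV folds using the
    normal approximation: half-width = 1.96 * s / sqrt(n)."""
    scores = np.asarray(fold_scores, dtype=float)
    n = scores.size
    mean = scores.mean()
    std = scores.std(ddof=1)              # fold-level sample standard deviation
    half = 1.96 * std / np.sqrt(n)
    return mean, std, (mean - half, mean + half)
```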
In cases where the denominator (e.g., payload size) is zero for packet/payload ratios, we replace the undefined value with a large constant, chosen in our implementation to exceed typical ratio scales in the dataset by several orders of magnitude. This avoids direct representation of infinity, which can introduce numerical instability in floating-point operations. However, we recognize that such a substitution can act as an uncontrolled parameter if not addressed further. To mitigate this, in the implementation, prior to normalization and extraction of downstream wavelet features, we explicitly identify these large constants as outliers and eliminate the affected samples or features from the dataset. This is done via a simple threshold-based filter (values are flagged and removed), ensuring they do not propagate into the normalization process or influence wavelet decomposition. In our experiments, this filtering impacted less than 0.5% of the data points, and sensitivity analyses (re-running models with several alternative constant values) showed negligible changes in overall accuracy (<0.1% variance in F1 scores), supporting that our near-perfect results are robust and not materially dependent on this parameter.
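A hedged sketch of this sentinel-and-filter procedure; the sentinel value and filter threshold below are hypothetical, since the exact constants are implementation-specific:

```python
import numpy as np

SENTINEL = 1e9  # hypothetical large constant; the real value is implementation-specific

def safe_ratio(num, den, sentinel=SENTINEL):
    """Packet/payload ratio with a finite sentinel instead of inf on zero denominators."""
    num, den = np.asarray(num, float), np.asarray(den, float)
    return np.where(den == 0, sentinel, np.divide(num, np.where(den == 0, 1, den)))

def drop_sentinels(features, threshold=SENTINEL / 10):
    """Threshold filter: remove rows containing sentinel-flagged values
    before normalization and wavelet feature extraction."""
    mask = (np.abs(features) < threshold).all(axis=1)
    return features[mask], mask
```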
4.1. Machine Learning Algorithms
Four distinct machine learning algorithms—K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Random Forest (RF), and XGBoost (XGB)—were employed to represent four different families of machine learning techniques. These algorithms were selected because they have been widely adopted in prior research and allow the diverse properties of the extracted wavelet features to be systematically examined. The KNN classifier was applied as an instance-based learning method, in which each sample was classified by computing its distance to the k nearest training instances in the feature space. Through this mechanism, local geometric patterns within the feature vectors were effectively captured. The SVM classifier was used as a kernel-based discriminative model. In this approach, class boundaries were determined by identifying an optimal separating hyperplane that maximizes the margin between classes in a suitably transformed feature space, allowing complex and nonlinear separations to be modeled. The RF algorithm was incorporated as an ensemble technique drawn from the decision tree family. Multiple decision trees were constructed through randomized feature selection and bootstrapped sampling, and their outputs were aggregated to produce final predictions. This ensemble formulation was shown to enhance robustness against overfitting while capturing hierarchical and nonlinear relationships within frequency- and wavelet-domain representations. The XGB classifier was employed as a gradient-boosting-based ensemble method, in which trees were sequentially constructed to correct the residual errors of prior trees. Through its use of second-order gradient information, regularization, and optimized tree growth, XGB was able to model subtle boundary structures and complex interactions in the wavelet feature space with high predictive stability. 
Collectively, these algorithms—differing in their underlying assumptions, decision boundaries, and learning biases—enabled a comprehensive examination of the geometric, hierarchical, and discriminative characteristics encoded within the extracted wavelet features.
Model hyperparameters were selected manually rather than through automated search methods such as GridSearchCV or Bayesian optimization. This manual approach was chosen based on prior empirical experience with similar datasets and to maintain computational efficiency during experimentation.
Table 1 summarizes the hyperparameters used for each algorithm.
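For illustration, the four model families could be instantiated as follows; the hyperparameter values are placeholders (the actual values are in Table 1), and scikit-learn's GradientBoostingClassifier stands in for XGBoost to keep the sketch dependency-free:

```python
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

# Placeholder hyperparameters for illustration only; see Table 1 for the
# values actually used. GradientBoostingClassifier substitutes for XGBoost.
MODELS = {
    "KNN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf", C=10.0, gamma="scale"),
    "RF": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGB-like": GradientBoostingClassifier(n_estimators=200, random_state=42),
}
```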
4.2. Datasets
To evaluate the effectiveness and generalizability of the proposed wavelet-based IoT device fingerprinting approach, we utilize three publicly available datasets: the CIC IoT Dataset 2022 [
14], the UNSW IoT Device Classification Dataset by Sivanathan et al. [
15], and the CIC IoT Dataset 2023 [
16]. These datasets collectively provide a rich and diverse foundation for experimentation and validation.
We selected these datasets based on their varying sizes, levels of complexity, and degrees of heterogeneity, to thoroughly evaluate the scalability and robustness of our proposed fingerprinting approach. The CIC IoT Dataset 2023 is the largest and most comprehensive, featuring 105 IoT devices deployed in a complex network topology with support for multiple communication protocols, including Wi-Fi, Zigbee, and Z-Wave. This dataset reflects a highly realistic and challenging environment, including both benign and malicious traffic. The CIC IoT Dataset 2022 is moderate in scale, containing 60 devices and offering acceptable heterogeneity with multi-protocol support, though it operates within a simpler network structure. In contrast, the UNSW IoT dataset is the smallest, with only 28 devices, and features low complexity, as it contains benign traffic only and lacks the variability found in larger datasets.
Table 2 summarizes the key characteristics of each dataset, including the number of devices, types of traffic, and protocols.
4.3. Wavelet-Based Feature Construction Parameters
During model development, we systematically evaluated a range of parameters and configurations in order to identify settings that provide the best trade-off between classification accuracy and computational resource usage. This evaluation was guided by both empirical performance and practical deployment considerations.
For DWT-based feature extraction, we selected the Daubechies-4 (db4) mother wavelet with two decomposition levels. We evaluated both shallower and deeper decompositions; however, two levels were sufficient to capture the dominant multi-resolution structure present within the 300 s windows, while deeper decompositions led to coefficient dilution and marginal performance gains. The db4 wavelet was chosen due to its compact support and favorable time–frequency localization properties, which are well suited for modeling bursty and nonstationary IoT traffic patterns.
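A minimal sketch of this DWT feature extraction using PyWavelets; the per-band summary statistics are an assumption, as the exact coefficient statistics are not enumerated here:

```python
import numpy as np
import pywt

def dwt_features(signal, wavelet="db4", level=2):
    """Two-level db4 decomposition of a 1-D traffic signal, summarized
    per sub-band (assumed statistics: mean, std, max magnitude, energy)."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)  # [cA2, cD2, cD1]
    feats = []
    for band in coeffs:
        feats += [band.mean(), band.std(), np.abs(band).max(), (band ** 2).sum()]
    return np.array(feats)
```

Applied to a 300-sample window (one value per second at the 1 s sampling rate), this yields a compact fixed-length vector regardless of window content.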
For WST-based features, we used a configuration with J = 2 scales, L = 4 angular filters, and maximum scattering order = 2, implemented using the Kymatio framework. Increasing the number of scales beyond J = 2 resulted in a substantial increase in computational cost without corresponding improvements in classification performance. Retaining second-order scattering coefficients proved beneficial, as they effectively capture higher-order interactions and modulation-style variability in traffic dynamics that are characteristic of device-specific behavior.
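Our experiments use the Kymatio framework; as a library-free illustration of the underlying cascade (band-pass filtering, modulus, averaging, repeated to second order), a minimal NumPy sketch with a hypothetical Gaussian filter bank (not Kymatio's actual Morlet filters or parameterization) might look like:

```python
import numpy as np

def gabor_bank(n, J):
    """One Gaussian band-pass filter per dyadic scale j = 1..J (frequency domain)."""
    freqs = np.fft.fftfreq(n)
    filters = []
    for j in range(1, J + 1):
        xi = 0.25 / 2 ** (j - 1)       # centre frequency halves at each scale
        sigma = xi / 2                 # hypothetical bandwidth choice
        filters.append(np.exp(-((freqs - xi) ** 2) / (2 * sigma ** 2)))
    return filters

def scattering_1d(x, J=2):
    """Order-0/1/2 scattering-style coefficients of a 1-D signal:
    repeatedly band-pass filter, take the modulus, and average."""
    X = np.fft.fft(x)
    filters = gabor_bank(len(x), J)
    coeffs = [np.mean(x)]                   # zeroth order: global average
    for j1, f1 in enumerate(filters):
        u1 = np.abs(np.fft.ifft(X * f1))    # first-order modulus signal
        coeffs.append(u1.mean())
        U1 = np.fft.fft(u1)
        for f2 in filters[j1 + 1:]:         # second order: only coarser scales
            coeffs.append(np.abs(np.fft.ifft(U1 * f2)).mean())
    return np.array(coeffs)
```

The second-order loop is what captures the modulation-style variability mentioned above: it measures how the envelope of one frequency band itself fluctuates at coarser scales.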
4.4. Sampling Rate Sensitivity Analysis
The sampling rate plays a critical role in transforming raw network packet data into meaningful time-series signals for IoT device fingerprinting. An appropriate sampling rate determines the granularity and resolution of the captured traffic behavior, directly influencing the quality of extracted features and, consequently, the performance of the classification models. Therefore, investigating the optimal sampling rate is an essential step in designing an effective fingerprinting system. If the sampling rate is too low, important behavioral patterns may be missed; if too high, it may introduce noise or unnecessary computational overhead.
In this experiment, we evaluate the impact of different sampling rates on classification performance. Specifically, we generate time-series signals using multiple sampling intervals (1 s to 60 s) and measure the accuracy of various machine learning models at each setting. This analysis helps identify the sampling rate that provides the best balance between signal fidelity, model accuracy, and processing efficiency.
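The conversion from raw packets to a time-series signal at a chosen sampling interval can be sketched as follows; the bytes-per-bin aggregation and field names are assumptions for illustration:

```python
import numpy as np

def to_time_series(timestamps, sizes, window=300, interval=1):
    """Bin packet sizes into fixed-interval bytes-per-bin totals over a window.
    interval=1 gives the 1 s sampling rate; larger intervals coarsen the signal."""
    n_bins = int(window // interval)
    bins = np.minimum((np.asarray(timestamps) // interval).astype(int), n_bins - 1)
    series = np.zeros(n_bins)
    np.add.at(series, bins, sizes)  # sum bytes landing in each bin
    return series
```

Sweeping `interval` from 1 to 60 reproduces the sampling-rate sensitivity setup: at 60 s the 300 s window collapses to only five samples, which is consistent with the loss of behavioral detail observed below.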
Figure 4 presents the accuracy of various machine learning models using DWT and WST coefficients for both individual IoT device identification and device type classification.
Figure 4a shows the accuracy of individual IoT device identification across different sampling rates using DWT features.
Figure 4b shows the accuracy of device type classification across different sampling rates using DWT features. Similarly,
Figure 4c,d illustrate the models’ accuracy for both individual IoT device identification and IoT device type classification using WST features. The four figures show that the highest classification accuracy across all models is achieved at a 1 s sampling rate. As the sampling interval increases—in increments of 10 s up to 60 s—the accuracy consistently declines. At the 60 s sampling rate, all models demonstrate their lowest performance, indicating that coarse temporal granularity leads to a loss of critical behavioral patterns necessary for effective classification. Based on these results, a 1 s sampling rate is selected for the remaining experiments, as it provides the best trade-off between signal resolution and model accuracy for both device-level and type-level identification tasks.
4.5. Feature Reduction Analysis
Wavelet transform coefficients—whether derived from DWT or WST—capture signal behavior across multiple time-frequency scales, making them highly effective for characterizing IoT device communication patterns. However, this richness in representation often results in a high-dimensional feature space, which may include redundant or noisy features. Such redundancy can increase computational complexity and risk overfitting without significantly improving classification performance.
To address this, PCA is applied primarily as an exploratory feature-reduction step to assess whether the high-dimensional DWT and WST feature sets contain redundant or non-discriminative components, rather than as a mandatory preprocessing stage. To guide component selection, we adopt a model-driven performance criterion: instead of selecting the number of principal components solely based on retained variance ratios, we systematically evaluate classifier accuracy across a range of PCA dimensions and select the configuration that yields the best validation performance. This approach is particularly appropriate in the context of IoT device fingerprinting, where not all variance captured by PCA is relevant to device identity, and some components may predominantly reflect noise or channel effects.
Importantly, to prevent information leakage, PCA is fitted exclusively on the training data in each iteration, and the learned transformation is then applied to the corresponding validation and test sets. This procedure ensures a fair evaluation and preserves the integrity of the experimental protocol.
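A sketch of this leakage-free, model-driven component selection; the classifier and candidate dimensions are illustrative:

```python
from sklearn.decomposition import PCA
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

def accuracy_vs_components(X_tr, y_tr, X_val, y_val, dims=(10, 20, 30, 40)):
    """Fit PCA on training data only (inside the pipeline), then score each
    candidate dimensionality on the held-out validation set."""
    scores = {}
    for d in dims:
        pipe = make_pipeline(StandardScaler(), PCA(n_components=d),
                             KNeighborsClassifier(n_neighbors=5))
        pipe.fit(X_tr, y_tr)        # PCA transform learned from training data only
        scores[d] = pipe.score(X_val, y_val)
    return scores
```

Because the PCA step sits inside the pipeline, the validation and test sets are only ever projected with the training-fit transformation, matching the leakage-prevention protocol described above.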
Figure 5a illustrates the accuracy of machine learning models using varying numbers of features extracted from DWT coefficients. The results show that accuracy improves steadily up to 30 features, after which further increases yield minimal or no significant improvement. Beyond this point, the marginal gain does not justify the additional computational overhead. Similarly,
Figure 5b presents the results for WST-based features. The trend mirrors that of the DWT experiment: model accuracy increases up to approximately 30 features, after which it plateaus, indicating that the most informative components are concentrated in the first few dimensions.
These findings highlight the importance of feature reduction in optimizing the trade-off between model performance and efficiency, and justify the use of 30 features for subsequent experiments involving both DWT and WST-based fingerprinting.
4.6. Individual Device Identification Under Normal Conditions
In this experiment, we train and test supervised machine learning algorithms using benign network traffic, representing the normal operating conditions of IoT devices. The objective is to evaluate the system’s ability to accurately perform fine-grained device identification, distinguishing between individual IoT devices—even those from the same manufacturer or with similar functionality.
The experiment reflects use cases where precise identification is critical, such as tracking specific devices within a network, auditing device presence, or detecting unauthorized replicas that may appear visually or functionally identical but differ in origin or intent. The evaluation is conducted in three phases: (1) testing baseline models using time-domain features, (2) evaluating the proposed solution with DWT features, and (3) assessing performance using WST features.
4.6.1. Baseline Models’ Performance
Table 3 presents the performance of baseline machine learning models—KNN, SVM, Random Forest (RF), and XGBoost (XGB)—on three datasets (CIC2022, CIC2023, and UNSW) using time-domain features for individual IoT device identification. The results indicate moderate to low accuracy, highlighting the limitations of time-domain features in achieving reliable classification.
Among all configurations, the best performance was achieved by XGBoost on the UNSW dataset, with an accuracy of 77%, followed closely by its performance on CIC2022 (72%) and CIC2023 (68%). Random Forest also performed consistently across all datasets, with accuracies ranging from 64% to 69%. In contrast, SVM and KNN showed lower effectiveness, particularly on the CIC2023 dataset, where SVM recorded the lowest accuracy of 41%, and KNN followed closely at 43%. These results highlight the challenges of using limited conventional time-domain features alone for effective device fingerprinting—particularly in more complex or noisy datasets.
4.6.2. Discrete Wavelet Transform (DWT) Performance
Table 4 presents the performance of various machine learning models—KNN, SVM, Random Forest (RF), and XGBoost (XGB)—using Discrete Wavelet Transform (DWT) features for individual IoT device identification across three datasets: CIC2022, CIC2023, and UNSW.
The results presented in
Table 4 demonstrate a significant performance improvement over baseline time-domain models. Across all datasets, XGBoost and Random Forest consistently deliver the highest performance, achieving near-perfect scores. Specifically, both models reach 99% accuracy, precision, recall, and F1-score on CIC2022 and UNSW, indicating outstanding ability to distinguish between individual devices. On the CIC2023 dataset, XGBoost maintains a high accuracy of 99%, while Random Forest scores 97%, reflecting strong generalizability. In contrast, SVM and KNN show relatively lower but still solid performance. For instance, on CIC2023, SVM achieves 74% accuracy, while KNN reaches 83%. On CIC2022 and UNSW, both models perform better, with accuracies ranging from 81% to 95%.
4.6.3. Wavelet Scattering Transform (WST) Performance
The results presented in
Table 5 demonstrate that WST-based features substantially enhance the performance of traditionally weaker models, particularly KNN and SVM. On the CIC2022 dataset, KNN achieves 92% accuracy and SVM reaches 90%, both showing notable improvements over their performance using time-domain features and even Discrete Wavelet Transform (DWT) features. These models also maintain balanced precision and recall, resulting in strong F1-scores of 0.92 and 0.90, respectively.
Similarly, on the more challenging CIC2023 dataset, both KNN and SVM achieve 86% accuracy, with KNN demonstrating better overall balance (precision and recall = 0.86) compared to SVM (precision = 0.75, recall = 0.74). On the UNSW dataset, both KNN and SVM achieve near-perfect accuracy of 99%, equaling the performance of more advanced ensemble models such as Random Forest and XGBoost. This result is particularly significant, as it shows that simpler models can match the effectiveness of complex classifiers when powered by rich, hierarchical WST features. While XGBoost continues to deliver the highest overall performance, with accuracy ranging from 98% to 100% across all datasets, and Random Forest closely follows with 90% to 99% accuracy, the key insight is clear: WST dramatically improves the predictive power of non-ensemble models like KNN and SVM, effectively narrowing the performance gap between them and their more computationally intensive counterparts.
4.7. IoT Device Type Identification Under Normal Conditions
In this experiment, we again use benign traffic, but shift the focus from individual identification to device type identification. Here, the goal is to group devices into broader functional categories—such as smart plugs, IP cameras, thermostats, or sensors—based on their communication behavior.
This setting evaluates the model’s ability to capture shared behavioral patterns among different devices of the same type, regardless of manufacturer or deployment environment. It simulates practical scenarios where network administrators need to manage devices at a group level, for example, to roll out firmware updates, apply security patches, or enforce type-based access policies efficiently. The evaluation is conducted in three phases: (1) testing baseline models using time-domain features, (2) evaluating the proposed solution with DWT features, and (3) assessing performance using WST features.
4.7.1. Baseline Models’ Performance
The results presented in
Table 6 indicate moderate performance, with clear limitations in using the limited eight time-domain features for generalized type-level classification. On the CIC2022 dataset, XGBoost achieves the highest performance with 80% accuracy and an F1-score of 0.82, followed by Random Forest at 78% accuracy. KNN, by contrast, shows the weakest performance, with only 69% accuracy and an F1-score of 0.66. On the more challenging CIC2023 dataset, the overall accuracy of all models except XGBoost drops noticeably: XGBoost leads with 82% accuracy, while KNN and SVM fall to 55% and 57% accuracy, respectively. This suggests that time-domain features struggle to capture consistent patterns across varied device types in noisy or diverse environments. On the UNSW dataset, the trend remains consistent: XGBoost performs best with 87% accuracy, followed by Random Forest at 80%. KNN and SVM achieve 72% and 71% accuracy, respectively, showing limited discriminative power for device type classification.
4.7.2. Discrete Wavelet Transform (DWT) Performance
The results presented in
Table 7 clearly demonstrate that DWT-based feature representations yield high classification performance across all datasets, significantly outperforming the baseline models built on time-domain features. On the CIC2022 dataset, all models perform well, with XGBoost achieving perfect accuracy (100%), and Random Forest following closely with 99% accuracy. Simpler models like KNN also show strong results with 93% accuracy, while SVM achieves 85%, both maintaining respectable F1-scores of 0.93 and 0.89, respectively. On the more challenging CIC2023 dataset, which includes higher variability and noise, XGBoost once again performs robustly with 99% accuracy, and Random Forest achieves 98%, confirming their effectiveness even in complex environments. KNN maintains a solid performance with 84% accuracy, while SVM sees a drop to 75% accuracy and an F1-score of 0.76, indicating a greater sensitivity to overlapping device behaviors and traffic inconsistencies. Finally, on the UNSW dataset, all models exhibit exceptional performance: KNN, SVM, and Random Forest each achieve 99% accuracy, while XGBoost delivers perfect classification results with 100% accuracy, precision, recall, and F1-score.
4.7.3. Wavelet Scattering Transform (WST) Performance
The results presented in
Table 8 demonstrate that WST-based features enable high-performance IoT device type identification across all evaluated datasets and models. On the CIC2022 dataset, all models perform strongly. XGBoost achieves the highest performance with 99% accuracy across all metrics, followed closely by Random Forest at 97%, while both KNN and SVM reach 94% accuracy and F1-score. These results indicate that WST features enable effective type classification even among similar device categories. For the more complex CIC2023 dataset, there is a slight drop in performance due to increased device diversity and traffic complexity. XGBoost still leads with 96% accuracy, followed by Random Forest (93%), SVM (88%), and KNN (87%). On the UNSW dataset, all models achieve near-perfect results, with XGBoost reaching 100% across all metrics and the remaining models (KNN, SVM, RF) each scoring 99%. This reflects the dataset’s lower complexity and well-separated device types, where WST-based models perform exceptionally well.
4.8. Performance Under Adversarial Conditions
In this experiment, we assess the robustness of the models trained in Scenario 3 and Scenario 4 by introducing network traffic generated under DDoS attack conditions that alter normal communication patterns.
The objective is to evaluate how well the proposed wavelet-based fingerprints maintain their identification performance in the presence of attacks or environmental disruptions. This scenario is critical for understanding the real-world reliability of the system in secure IoT deployments, where the network may be partially compromised or operating under threat. In this experiment, we use only the CICIoT2022 and CICIoT2023 datasets because the UNSW dataset provides only benign traffic. The evaluation is conducted in three phases: (1) testing baseline models using time-domain features, (2) evaluating the proposed solution with DWT features, and (3) assessing performance using WST features. The training data consist exclusively of benign traffic, while the test data include both benign traffic and DDoS attack traffic.
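The three-phase protocol can be sketched as follows. This is a minimal, hypothetical illustration of the train-on-benign, test-on-mixed design: the function names (feature_fn, train_fn, score_fn) are placeholders for the respective feature extractors and classifiers, not the actual implementation.

```python
# Hypothetical sketch of the evaluation protocol: models are trained on
# benign traffic only, while DDoS traffic appears exclusively at test time.
# Each phase plugs in a different feature extractor (time-domain, DWT, WST).

def run_phase(feature_fn, train_fn, score_fn,
              benign_train, benign_test, ddos_test):
    """Train on benign (window, label) pairs only, then score on a
    mixed test set of benign and DDoS traffic windows."""
    model = train_fn([(feature_fn(w), y) for w, y in benign_train])
    mixed_test = benign_test + ddos_test  # attack traffic only in testing
    return score_fn(model, [(feature_fn(w), y) for w, y in mixed_test])
```

The key design point is that the classifier never sees attack traffic during training, so any test-time accuracy reflects how well the fingerprints survive attack-induced distortion.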
4.8.1. Baseline Models’ Performance
The results in
Table 9 reveal a significant performance degradation across all models and metrics in the presence of DDoS attacks. For the CIC2022 dataset, even the best-performing model, XGBoost, achieves only 22% accuracy with an F1-score of 0.23, while Random Forest follows with 19% accuracy. KNN and SVM models perform worse, with accuracies of 14% and 11%, respectively, and correspondingly low F1-scores (0.13 and 0.11). On the CIC2023 dataset, the degradation is even more pronounced. XGBoost remains the top-performing model but only achieves 18% accuracy, while Random Forest reaches 14%. KNN and SVM perform the poorest under attack conditions, with accuracy as low as 9% and 10%, and F1-scores of 0.08 and 0.10, respectively.
Table 10 presents the performance metrics of baseline machine learning models for IoT device type identification under DDoS attack conditions, using time-domain features. On the CIC2022 dataset, model performance is generally poor due to the disruptive nature of DDoS traffic. XGBoost achieves the highest accuracy at 33%, followed by Random Forest (29%), KNN (24%), and SVM (18%). Precision, recall, and F1-scores follow similar trends, all remaining below 0.35. These results indicate that traditional time-domain features are highly sensitive to attack-induced noise and fail to preserve the distinctive patterns needed for reliable classification. On the more complex CIC2023 dataset, performance degrades further. XGBoost remains the top performer but only reaches 29% accuracy, with an F1-score of 0.27. Random Forest trails slightly at 24% accuracy, while KNN and SVM drop to 15% and 12% accuracy, respectively. This sharp decline highlights the increased difficulty of identifying device types under adversarial conditions in more heterogeneous environments.
4.8.2. Discrete Wavelet Transform (DWT) Performance
Table 11 presents the performance metrics of various machine learning models for individual IoT device identification under DDoS attack conditions using Discrete Wavelet Transform (DWT) features. Compared to the time-domain baselines reported in
Table 9, the results demonstrate a significant improvement in model robustness when utilizing DWT-based feature representations. On the CIC2022 dataset, XGBoost achieves the highest accuracy at 62%, followed by Random Forest at 59%, marking a substantial improvement over their baseline accuracies of 22% and 19%, respectively. Similarly, KNN and SVM also see meaningful gains, reaching 43% and 41% accuracy, with corresponding F1-scores of 0.43 and 0.40, compared to their baseline values of just 14% and 11%. On the more challenging CIC2023 dataset, while all models experience an expected performance drop due to increased traffic complexity and attack intensity, DWT-based features still offer considerable resilience. XGBoost and Random Forest maintain relatively high accuracies of 58% and 53%, significantly outperforming their baseline figures of 18% and 14%. KNN and SVM, although achieving lower accuracies of 29% and 20%, respectively, still show noticeable improvement over their time-domain counterparts (9% and 10%).
Table 12 presents the performance metrics of DWT-based models for IoT device type identification under DDoS attack. On the CIC2022 dataset, all models show improved performance compared to time-domain baselines (as seen in
Table 10). XGBoost achieves the highest accuracy at 68%, followed by Random Forest at 63%, both maintaining consistent precision, recall, and F1-scores. KNN and SVM also show moderate performance with accuracy and F1-scores of 53% and 50%, respectively. These results demonstrate that DWT features enhance resilience to DDoS traffic for moderate datasets. For the more complex CIC2023 dataset, all models experience a drop in performance, yet DWT still enables relatively strong classification. XGBoost again leads with 66% accuracy, followed by Random Forest (60%), while KNN (44%) and SVM (38%) perform less effectively. Nevertheless, all DWT-based models outperform their time-domain counterparts, showing that DWT provides a robust feature space for device type classification under adversarial conditions.
4.8.3. Wavelet Scattering Transform (WST) Performance
The results in
Table 13 show that WST-based models deliver the best overall robustness under attack when compared to both time-domain and DWT-based counterparts (
Table 9 and
Table 11). On the CIC2022 dataset, XGBoost achieves the highest accuracy of 71%, with precision, recall, and F1-score all matching at 0.70. Random Forest follows with a strong 68% accuracy, while KNN and SVM both perform reasonably better, achieving 56–57% accuracy, with balanced F1-scores (0.55–0.56). These results reflect an improvement over baseline and DWT models. On the more challenging CIC2023 dataset, model performance understandably declines but remains superior to previous approaches. XGBoost still leads with 69% accuracy, and Random Forest closely follows at 62%, both maintaining F1-scores above 0.63. KNN and SVM experience moderate drops to 47% and 42% accuracy, respectively, but still outperform their time-domain versions, which fell below 10%.
The performance metrics of WST-based models for IoT device type identification under DDoS attack conditions are presented in
Table 14. On the CIC2022 dataset, all models exhibit resilience under DDoS attacks. XGBoost achieves the best overall performance with 73% accuracy, precision, recall, and F1-score—indicating robustness and stability. Random Forest follows closely with 69% accuracy, while KNN and SVM reach 61% and 57% accuracy, respectively. These results show that WST features enhance model performance under adversarial conditions compared to time-domain and even DWT-based approaches.
5. Discussion
The evaluation across all tables reveals consistent and insightful trends regarding the impact of feature type, dataset complexity, and machine learning classifier choice on the performance of IoT device fingerprinting models. These trends are observed across both individual device identification and device type classification, under both benign and adversarial conditions.
5.1. Impact of Feature Type
The choice of feature representation significantly affects model performance. Time-domain features perform the weakest overall—particularly on complex datasets like CIC2023—with accuracy dropping as low as 41–43%. These results highlight the limited ability of shallow time-domain statistics to capture the nuanced communication behaviors of IoT devices, especially in heterogeneous and large-scale deployments. Their susceptibility to noise and inability to generalize across varying traffic patterns make them unsuitable for reliable fingerprinting in complex environments.
In contrast, Discrete Wavelet Transform (DWT) captures low-frequency behavioral patterns that reflect stable, device-specific characteristics. This allows for more robust classification across datasets. However, DWT still struggles to capture high-frequency signal components and bursty or irregular behaviors, which are common in real-world IoT communications.
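To make the low-frequency/high-frequency separation concrete, the following is a minimal sketch of a single-level Haar DWT and a wavelet-energy feature vector in pure Python. It is illustrative only: the wavelet family, decomposition depth, and feature set used in the actual pipeline are not fixed by this sketch.

```python
import math

def haar_dwt(signal):
    """One level of the Haar DWT: approximation coefficients capture the
    low-frequency trend, detail coefficients the high-frequency change."""
    approx, detail = [], []
    for i in range(0, len(signal) - 1, 2):
        a, b = signal[i], signal[i + 1]
        approx.append((a + b) / math.sqrt(2))
        detail.append((a - b) / math.sqrt(2))
    return approx, detail

def wavelet_energy_features(signal, levels=3):
    """Energy of the detail coefficients at each level, plus the final
    approximation energy -- a compact summary of a traffic time series."""
    feats, current = [], signal
    for _ in range(levels):
        current, detail = haar_dwt(current)
        feats.append(sum(d * d for d in detail))
    feats.append(sum(a * a for a in current))
    return feats
```

Because the Haar transform is orthonormal, the total energy of the original signal is preserved across the approximation and detail bands, which is why energy-based features remain stable summaries of device behavior.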
Wavelet Scattering Transform (WST) outperforms both time-domain and DWT features, offering a more powerful and resilient feature representation. WST improves the performance of models like KNN and SVM, making these simpler algorithms viable alternatives in resource-constrained environments where deep models or ensembles may not be practical. Its advantage lies in its multi-layered, translation-invariant, and energy-preserving structure, which mimics the expressive power of deep neural networks while remaining computationally efficient and interpretable. WST effectively captures both fine-grained transient behaviors and long-term communication patterns, making it adaptable to diverse network conditions.
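Formally, the multi-layered structure referred to above corresponds to the standard scattering coefficients: cascades of wavelet convolutions and modulus nonlinearities, followed by low-pass averaging that provides the translation invariance,
\begin{align*}
S_0 x(t) &= (x \ast \phi)(t), \\
S_1 x(t, \lambda_1) &= \big( |x \ast \psi_{\lambda_1}| \ast \phi \big)(t), \\
S_2 x(t, \lambda_1, \lambda_2) &= \Big( \big| \, |x \ast \psi_{\lambda_1}| \ast \psi_{\lambda_2} \big| \ast \phi \Big)(t),
\end{align*}
where $\phi$ is a low-pass averaging filter and $\psi_{\lambda}$ are band-pass wavelets at scales $\lambda$. The first-order coefficients $S_1$ summarize energy per frequency band, while the second-order coefficients $S_2$ recover the transient and bursty structure that the averaging in $S_1$ discards.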
Under DDoS attack scenarios, while all models experience degradation, WST demonstrates better resilience compared to time-domain and DWT features. Its ability to capture deep structural patterns enables it to maintain reasonable performance even when traffic is saturated with attack flows.
5.2. Impact of Machine Learning Classifiers
The four classifiers evaluated—KNN, SVM, Random Forest (RF), and XGBoost (XGB)—show distinct performance characteristics. XGB and RF, both ensemble-based learning algorithms, consistently outperform KNN and SVM across all feature types and datasets. These ensemble models aggregate the predictions of multiple base learners to reduce variance and improve generalization, making them especially well-suited for complex, high-dimensional feature spaces such as those derived from DWT and WST.
On the other hand, KNN, while simpler and less computationally intensive, benefits greatly from richer feature representations. In particular, WST features significantly boost its accuracy, allowing this model to compete with more advanced classifiers in less complex datasets or constrained deployment scenarios.
5.3. Effect of Dataset Complexity
We evaluated our approach using three datasets that differ substantially in terms of the number of devices, class imbalance, and traffic heterogeneity. Among them, the CIC2023 dataset is the largest and most diverse, containing traffic from over 100 IoT devices, which introduces significantly higher variability, noise, and class imbalance. In contrast, the UNSW dataset, which includes 28 IoT devices, exhibits comparatively lower variability and noise, more balanced class distributions, and more homogeneous traffic patterns. Consistent with these differences, we observe that all evaluated models—including the weaker baseline classifiers—achieve higher accuracy on the UNSW dataset than on CIC2023.
These observations highlight the strong influence of dataset complexity on IoT device fingerprinting performance and motivate an important research question regarding how network scale, traffic heterogeneity, and class imbalance jointly affect the effectiveness and robustness of machine-learning-based fingerprinting solutions. Addressing this question rigorously would require a controlled scaling study on a large dataset, in which the number of devices, per-device sample counts, and traffic diversity are systematically varied under both controlled and unconstrained conditions. Such an analysis would enable a more precise quantification of how individual complexity factors impact fingerprint stability and classification performance and represents a valuable direction for future work beyond the scope of the present study.
5.4. Performance Under Anomalous Conditions
While the inclusion of DDoS traffic provides useful insight into model behavior under anomalous conditions, we emphasize that the presented experiments do not constitute a comprehensive robustness evaluation against adversarial attacks. Within the evaluated settings, wavelet-based features (DWT and WST) consistently exhibit better performance under anomaly-induced traffic conditions compared to baseline feature representations, indicating a degree of resilience to traffic disruption. However, we avoid overclaiming robustness, as the experiments do not explicitly separate multiple training and testing regimes nor do they systematically assess generalization across varying attack intensities with quantified uncertainty. Accordingly, we interpret the DDoS-related results as partial performance evaluations under anomalous conditions rather than definitive robustness guarantees. A rigorous robustness analysis—incorporating controlled adversarial scenarios, performance degradation metrics, and uncertainty estimates across multiple tasks and classifiers—remains an important direction for future work.
5.5. Comparison with State-of-the-Art Methods
It is important to note that direct numerical comparisons with prior work are often infeasible due to several well-recognized limitations in this research domain. First, many published studies do not release source code or sufficiently detailed implementation descriptions, which significantly hinders reproducibility. Second, reported results are highly dataset-dependent: some authors rely on publicly available datasets, whereas others evaluate exclusively on proprietary or laboratory-collected data, making cross-study comparisons inherently inconsistent. Third, performance outcomes are further affected by differences in preprocessing pipelines, feature-engineering choices, and hyperparameter configurations—details that are frequently omitted or only partially described in the literature. Despite these constraints, prior work generally reports classification accuracies of 90% or higher, and our results are consistent with—and in several cases exceed—this performance range.
In terms of approach and methodology, our novel solution is based on wavelet analysis, leveraging both the rich information content of raw network traffic and the robustness of frequency-domain analysis. This represents a largely unexplored direction in the current literature. We rigorously evaluated the proposed wavelet-based fingerprinting approach across three different public datasets, testing it under novel conditions not previously explored in the state-of-the-art methods, including the impact of dataset size and heterogeneity, and IoT device fingerprinting accuracy under adversarial conditions such as DDoS attacks.