Scalable Generation of Synthetic IoT Network Datasets: A Case Study with Cooja
Abstract
1. Introduction
1.1. Background
1.2. Our Contribution
1.3. Related Work and Novelty
2. Methodology
2.1. Simulation Environment
2.2. Automation Pipeline
- Stage 1: Topology specification.
- Stage 2: Simulation configuration.
- Stage 3: Execution.
- Stage 4: Parsing.
- Stage 5: Dataset assembly.
- Customization and extensibility.
- Intermediate artifacts.
2.3. Steps to Design the Experiment Pipeline
- Implement the mote firmware. In practice, these are C source files (.c) built against Contiki-NG. They also define what is logged in each simulation.
- Choose the topology or define a new one.
- Define the ranges of simulation configurations, i.e., the number of motes, transmission and interference ranges, and random seed.
- Implement the parser for the logs generated during the simulations.
- Implement a dataset aggregator, i.e., a script that scans the parsed logs of all simulations and produces an ML-ready dataset, typically a CSV file with inputs and outputs, possibly with additional references to simulation-specific outputs.
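The last step can be illustrated with a minimal aggregator sketch. It assumes a hypothetical layout in which the parser writes one `<run_id>/summary.json` per simulation, containing a `seed` and a list of `motes`. The file names and field names are illustrative assumptions, not the pipeline's actual format.

```python
import csv
import json
from pathlib import Path

# Hypothetical column set; a real dataset would follow the firmware's log fields.
FIELDS = ["run_id", "seed", "mote", "x", "y", "initial_battery", "sent", "source"]


def build_dataset(parsed_dir, out_csv):
    """Scan <parsed_dir>/<run_id>/summary.json and write one CSV row per mote."""
    rows = []
    for summary_path in sorted(Path(parsed_dir).glob("*/summary.json")):
        run = json.loads(summary_path.read_text())
        for mote in run["motes"]:
            rows.append({
                "run_id": summary_path.parent.name,
                "seed": run["seed"],
                "mote": mote["id"],
                "x": mote["x"],
                "y": mote["y"],
                "initial_battery": mote["initial_battery"],
                "sent": mote["sent"],
                # Reference back to the simulation-specific output.
                "source": str(summary_path),
            })
    with open(out_csv, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping a `source` column is one simple way to retain the "additional references to simulation-specific outputs" mentioned above.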
2.4. Case Study: RPL Network Analysis
- (i) draw grid dimensions by sampling and instantiate a grid with spacing S;
- (ii) put the root node at ;
- (iii) sample distinct grid sites without replacement for non-root nodes;
- (iv) add independent jitter to each non-root node: ;
- (v) check connectivity under a 50 m communication radius by forming the proximity graph (edges between nodes within 50 m) and verifying that all nodes are reachable from the root; if not, restart from Step (i) by resampling the grid and jitter. This loop is repeated up to 10 times. In practice, the algorithm always found a connected graph within one to five attempts.
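The sampling loop in Steps (i) to (v) can be sketched as follows. Several specifics are not recoverable from the text and are filled in here as labeled assumptions: the range of grid dimensions (`dim_range`), the spacing value, a uniform jitter of half-width `jitter`, and a root placed at the origin. Only the 50 m radius and the limit of 10 attempts come from the description.

```python
import math
import random
from collections import deque


def is_connected(points, radius):
    """Breadth-first search from node 0 (the root) over the proximity graph."""
    seen = {0}
    queue = deque([0])
    while queue:
        i = queue.popleft()
        for j in range(len(points)):
            if j not in seen and math.dist(points[i], points[j]) <= radius:
                seen.add(j)
                queue.append(j)
    return len(seen) == len(points)


def sample_topology(n_nodes, rng, spacing=40.0, jitter=5.0, radius=50.0,
                    dim_range=(3, 6), max_attempts=10):
    """Return (nodes, attempts) or (None, max_attempts) if no connected layout is found."""
    for attempt in range(1, max_attempts + 1):
        # (i) sample grid dimensions (range is an assumption) and build the grid
        nx = rng.randint(*dim_range)
        ny = rng.randint(*dim_range)
        sites = [(ix * spacing, iy * spacing) for ix in range(nx) for iy in range(ny)]
        # (ii) root position: assumed to be the origin in this sketch
        root = (0.0, 0.0)
        free = [s for s in sites if s != root]
        if len(free) < n_nodes - 1:
            continue  # grid too small for the requested node count, resample
        # (iii) distinct grid sites, sampled without replacement
        chosen = rng.sample(free, n_nodes - 1)
        # (iv) independent jitter per non-root node (uniform is an assumption)
        nodes = [root] + [
            (x + rng.uniform(-jitter, jitter), y + rng.uniform(-jitter, jitter))
            for x, y in chosen
        ]
        # (v) accept only if every node is reachable from the root
        if is_connected(nodes, radius):
            return nodes, attempt
    return None, max_attempts
```

Passing an explicit `random.Random(seed)` instance keeps each topology reproducible from its simulation seed.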
2.5. Baselines
2.6. Graph Neural Networks for Predicting Network Behavior
- Node encoder: A two-layer multi-layer perceptron maps raw node features (x-coordinate, y-coordinate, initial battery) to an H-dimensional embedding space. Features are normalized using training set statistics to ensure balanced gradient magnitudes.
- Graph convolutions: L GraphSAGE layers progressively refine node embeddings by aggregating information from expanding neighborhoods. Each layer applies a mean aggregation over neighbor embeddings, followed by a learned linear transformation and ReLU activation. More advanced aggregation methods could improve performance; we leave this exploration for future work. Residual connections and layer normalization stabilize the training of deeper architectures.
- Task-specific predictors: For node-level prediction (contact loss), a two-layer MLP maps refined node embeddings to scalar outputs. For graph-level prediction (coverage), node embeddings are aggregated via global mean pooling, concatenated with graph-level features (node count, grid spacing), and passed through a prediction MLP.
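To make the message-passing step concrete, the sketch below implements one mean-aggregation layer with a residual connection and the graph-level readout in plain Python. It is an illustration under assumptions, not the model used in the experiments: the split into separate `w_self` and `w_neigh` matrices follows a common GraphSAGE formulation, layer normalization and the encoder/predictor MLPs are omitted, and all function names are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def linear(weights, vec):
    """Matrix-vector product; weights is a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


def sage_layer(h, neighbors, w_self, w_neigh):
    """One mean-aggregation layer: ReLU(W_self h_v + W_neigh mean(h_u)) + h_v."""
    out = []
    for v, h_v in enumerate(h):
        if neighbors[v]:
            agg = mean([h[u] for u in neighbors[v]])
        else:
            agg = [0.0] * len(h_v)  # isolated node: nothing to aggregate
        z = [a + b for a, b in zip(linear(w_self, h_v), linear(w_neigh, agg))]
        z = [max(0.0, x) for x in z]                  # ReLU
        out.append([x + r for x, r in zip(z, h_v)])   # residual connection
    return out


def graph_readout(h, graph_features):
    """Global mean pooling concatenated with graph-level features."""
    return mean(h) + list(graph_features)


# Toy example: a 3-node path graph with 2-dimensional embeddings.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = [[1], [0, 2], [1]]
identity = [[1.0, 0.0], [0.0, 1.0]]
h1 = sage_layer(h, neighbors, identity, identity)
readout = graph_readout(h1, [3, 40.0])  # node count and grid spacing
```

The residual connection requires the input and output dimensions to match, which is why the node encoder first maps raw features to the H-dimensional space.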
3. Results
3.1. Dataset
3.2. Contact Loss Prediction
3.3. Network Coverage Prediction
4. Discussion
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest








| Component | Description |
|---|---|
| Firmware (required) | Source code for sink and sensing motes (src/*.c). |
| Topology definition (required) | Topology family and parameter ranges (e.g., grid, tree, random mesh). |
| Sweep configuration (required) | Duration, seeds, radio and battery settings. |
| Parser (required) | Adjusts to firmware-specific log message formats. |
| Dataset builder (optional) | Aggregates parsed results into ML-ready datasets. |
| Database loader (optional) | Inserts processed data into PostgreSQL tables. |
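
A sweep configuration of this kind can be expressed as a Cartesian product of parameter ranges. The sketch below is one possible representation, not the pipeline's actual configuration format: the `SimConfig` fields and the example values are hypothetical, and battery settings are left out for brevity.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class SimConfig:
    n_motes: int
    tx_range_m: float
    interference_range_m: float
    duration_s: int
    seed: int


def expand_sweep(n_motes, tx_ranges_m, interference_ranges_m, duration_s, seeds):
    """Return one SimConfig per combination of the given parameter ranges."""
    return [
        SimConfig(n, tx, ir, duration_s, seed)
        for n, tx, ir, seed in itertools.product(
            n_motes, tx_ranges_m, interference_ranges_m, seeds
        )
    ]


# Example values are illustrative only.
configs = expand_sweep(
    n_motes=[10, 20],
    tx_ranges_m=[50.0],
    interference_ranges_m=[50.0, 100.0],
    duration_s=300,
    seeds=range(3),
)
print(len(configs))  # 2 * 1 * 2 * 3 = 12 simulations
```
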
| Mote | x | y | CPU | LPM | Listen | Transmit | Off | Total | Initial Battery | Consumed | Remaining | Status | Uptime | Last msg recv by Root | Sent | Forwarded |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 0.0 | 0.0 | 17 | 268 | 285 | 0 | 0 | 285 | 0 | alive | 285 | 0 | 0 | |||
| 2 | 76.0 | 115.4 | 17 | 278 | 294 | 0 | 1 | 295 | 84 | 49 | 35 | alive | 295 | 152 | 16 | 9 |
| 3 | 18.9 | 130.2 | 3 | 292 | 294 | 0 | 1 | 295 | 73 | 49 | 24 | alive | 295 | 297 | 43 | 0 |
| 4 | 14.1 | −3.2 | 6 | 289 | 294 | 0 | 1 | 295 | 86 | 49 | 37 | alive | 295 | 297 | 49 | 0 |
| 5 | 42.3 | 29.2 | 9 | 286 | 294 | 0 | 1 | 295 | 99 | 49 | 50 | alive | 295 | 297 | 49 | 36 |
| 6 | −5.2 | 58.8 | 4 | 291 | 294 | 0 | 1 | 295 | 88 | 49 | 39 | alive | 295 | 297 | 45 | 0 |
| 7 | 39.7 | 107.0 | 5 | 149 | 154 | 0 | 1 | 155 | 25 | 25 | 0 | dead | 155 | 152 | 18 | 62 |
| 8 | 21.1 | 73.9 | 5 | 230 | 234 | 0 | 1 | 235 | 38 | 39 | 0 | dead | 235 | 232 | 34 | 0 |
| 9 | 82.3 | 108.7 | 18 | 277 | 293 | 1 | 1 | 295 | 51 | 49 | 2 | alive | 295 | 152 | 17 | 18 |
| 10 | 85.3 | 23.8 | 4 | 241 | 244 | 0 | 1 | 245 | 40 | 40 | 0 | dead | 245 | 237 | 35 | 0 |
| 11 | 16.2 | 81.3 | 7 | 288 | 294 | 0 | 1 | 295 | 53 | 49 | 4 | alive | 295 | 297 | 45 | 29 |
| 12 | 31.0 | 22.8 | 26 | 269 | 294 | 0 | 1 | 295 | 66 | 49 | 17 | alive | 295 | 297 | 52 | 396 |
| 13 | 50.0 | −5.7 | 7 | 288 | 294 | 0 | 1 | 295 | 55 | 49 | 6 | alive | 295 | 297 | 50 | 37 |
| 14 | 47.7 | 64.9 | 9 | 286 | 294 | 0 | 1 | 295 | 68 | 49 | 19 | alive | 295 | 297 | 49 | 82 |
| 15 | 87.3 | 135.4 | 15 | 280 | 294 | 0 | 1 | 295 | 81 | 49 | 32 | alive | 295 | 152 | 13 | 10 |
| 16 | 18.1 | 44.0 | 13 | 282 | 294 | 0 | 1 | 295 | 70 | 49 | 21 | alive | 295 | 297 | 48 | 121 |
| 17 | 73.5 | 10.3 | 5 | 290 | 294 | 0 | 1 | 295 | 83 | 49 | 34 | alive | 295 | 297 | 50 | 0 |
| 18 | 60.2 | 11.7 | 5 | 290 | 294 | 0 | 1 | 295 | 96 | 49 | 47 | alive | 295 | 297 | 48 | 0 |
| Hyperparameter | Values of the Grid | Selected Value for Contact Loss Prediction | Selected Value for Coverage Prediction |
|---|---|---|---|
| Learning rate | | | |
| Number of layers L | | 4 | 4 |
| Hidden dimension H | | 128 | 512 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Khachatrian, H.; Dovlatyan, A.; Grigoryan, G.; Raptis, T.P. Scalable Generation of Synthetic IoT Network Datasets: A Case Study with Cooja. Future Internet 2025, 17, 518. https://doi.org/10.3390/fi17110518

