Enhanced MQTT Protocol for Securing Big Data/Hadoop Data Management
Abstract
1. Introduction
- Client interaction: Exchanges between the client and the resource manager or data nodes can be compromised, with an infected client being able to introduce malicious content or compromise connections.
- Data fragmentation: The use of big data clusters implies redundancy of data distributed across multiple nodes. This fragmentation complicates security in the absence of an appropriate protection model.
- Data integrity: The integrity and redundancy of the information stored in each block must be verified.
- Data access control: Hadoop only offers access control at the data schema level, without finer granularity to precisely define user responsibilities and rights.
2. Background
2.1. Related Work
2.1.1. Security of Big Data/Hadoop
2.1.2. Security of the MQTT Protocol
2.1.3. Security of Big Data/Hadoop with MQTT
| Year [Ref.] | Authentication | Authorization | Confidentiality | Integrity (ID-DN) | Monitoring (IDS) | Availability | User Access Control | Network Access Control | Simulation Method Used | Results |
|---|---|---|---|---|---|---|---|---|---|---|
| Cloud/blockchain environments using big data | ||||||||||
| 2025 [7] | Hybrid cryptographic key management between DataNodes and NameNode | Based on blockchain-based access control and smart contracts | Ensured using AES-GCM encryption for data-at-rest and RSA for key exchange | Verified using digital signatures and hash-based validation of DataNode IDs | Integrated intrusion detection (IDS) module monitors Hadoop logs and node behaviors | Availability through distributed replication across multiple DataNodes | User access is verified using blockchain credentials | Network-level control via secure communication channels (TLS + IDS alerts) | Using Hadoop cluster (2–3 DataNodes) on VirtualBox | Encrypt (time/file size) (*) 2.03 s/size = 100 M (*) 10.75 s/size = 500 M (*) 21.10 s/size = 1 G |
| 2024 [10] | Kerberos authentication | No | Encryption: (*) Dual-thread ECC encryption (DH-ECC) (*) Paillier homomorphic encryption algorithm analysis | No | No | Improved storage efficiency | No | No | Hadoop-based encrypted storage scheme | Encrypt (time/file size = 4 G) (*) AES = 130 s (*) DES = 140 s (*) ECC = 150 s (*) DH-ECC = 50 s |
| 2023 [20] | Two-factor authentication (2FA), OTP | Access control mechanisms | Encryption (RSA, AES, RC4) | SHA-512, SHA-256, MD5 | Yes, in the cloud | Security and availability ensured through encryption | Yes, role-based and attribute-based access control implementation | Firewalls in the cloud | Microsoft Azure, C#/ASP.NET | Time in milliseconds versus size in bytes: encrypt = 3500 ms, decrypt = 700 ms |
| 2021 [33] | Mutual authentication with lightweight protocols in IoT edge computing | Blockchain-based access control using Hyperledger Fabric (HLF) | Blockchain-based provenance mechanism ensuring data privacy | Cryptographic hash functions and blockchain ledger for data integrity | No | Ensures data availability through decentralized storage on Hadoop | Blockchain-based authentication for IoT devices | No | Experimentation on Hadoop and Hyperledger Fabric with Raspberry Pi devices | 600 transactions per minute, 500 ms average response time, improved data traceability and security |
| Big data with Hadoop platform | ||||||||||
| 2024 [24] | No | No | No | No | Discusses the use of big data analytics in cybersecurity for real-time threat detection and mitigation | Highlights scalability and performance | No | No | Hadoop, Spark, and machine learning algorithms | Analytics results |
| 2024 [1] | No | No | RSA encryption | No | No | Ensures availability through HDFS replication and MapReduce for parallel processing. | No | No | Single-node Hadoop cluster | Size 1 MB: (Mappers 1 = 16 s/Mappers 2 = 15 s/Mappers 3 = 14 s) Size 10 MB: (Mappers 1 = 13 s/Mappers 2 = 14 s/Mappers 3 = 13 s) Size 100 MB: (Mappers 1 = 15 s/Mappers 2 = 14 s/Mappers 3 = 14 s) |
| 2023 [16] | CP-ABE (Ciphertext-Policy Attribute-Based Encryption), RSA, and AES | CP-ABE | AES, RSA, and CP-ABE | No | No | The Hadoop Distributed File System (HDFS) ensures high availability and fault tolerance. | Attribute-based access control | No | Conducted on a Hadoop system with an Intel® Core i5 processor, 8 processors, 16 GB memory, and CentOS 7.5 (1 slave) | Encrypt (time/file size = 1 GB) (*) Proposed = 5 min (*) DES = 13 min (*) 3DES = 12.5 min (*) Blowfish = 11.8 min |
| 2023 [21] | No | No | Privacy-Preserving Data Mining (PPDM) through data decentralization | No | No | Ensures security by decentralizing data to prevent unauthorized access | Decentralized Data Control | No | NA | Improved data privacy by reducing security risks from centralized data storage |
| 2023 [17] | AES (Advanced Encryption Standard) and OTP (One-Time Password) | NA | AES and OTP | No | No | HDFS ensures high availability | Access control is based on user attributes | No | NA | Improves encryption/decryption speed and reduces the size of encrypted files by 20% compared to traditional methods |
| 2021 [18] | Kerberos (between users and Hadoop services) | UNIX-style permissions and ACLs | AES | No | No | HDFS data replication | Kerberos and delegation tokens | No | Hadoop installed on a VirtualBox, virtual machine Ubuntu | AES crypt (key: 128/192/256) (block: 128) time/s: 150/300/700 |
| Hadoop with MQTT | ||||||||||
| 2024 [12] | NA | NA | NA | NA | NA | Uses Hadoop | NA | Local network | NA | Summary in the form of a BPMN diagram |
| 2024 [13] | NA | NA | NA | NA | NA | Combination of Kafka (for streaming) and Hadoop/HDFS (for storage) | NA | NA | Physical test bed with 6 IoT beacons | Calculates risk scores during a simulated evacuation (test bed) |
| Our work | ||||||||||
| 2026 | (*) AES (Advanced Encryption Standard) (*) JWT (JSON Web Tokens) | Access control mechanisms | AES-GCM | (*) SHA-3-256 (*) MD5 (as used by Hadoop) (*) Integrity offered by AES-GCM encryption | Monitoring module (verification function) | High availability through distributed replication across multiple DataNodes | User access is verified using a hash verification function | Network-level control via secure communication channels (TLS + IDS alerts) | Using Hadoop cluster (3 DataNodes) on VirtualBox | Encrypt (time/file size) (*) 2 s/size = 100 M (*) 7.69 s/size = 500 M (*) 19.97 s/size = 1 G |
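
As a reading aid for the Results column, the encryption times listed for this work can be set against those listed for [7] at the same file-size labels. The short sketch below only redoes that arithmetic from the table values; the function and variable names are illustrative, and because the two testbeds are not identical, the percentages should be read as indicative rather than as a controlled comparison.

```python
# Encryption times in seconds, copied from the comparison table.
# Keys are the file-size labels exactly as they appear in the table.
times_ref7 = {"100 M": 2.03, "500 M": 10.75, "1 G": 21.10}
times_ours = {"100 M": 2.00, "500 M": 7.69, "1 G": 19.97}


def relative_reduction(baseline: float, proposed: float) -> float:
    """Return the time saved by `proposed` as a percentage of `baseline`."""
    return (baseline - proposed) / baseline * 100.0


# Percentage of encryption time saved relative to [7], per file size.
reductions = {
    size: round(relative_reduction(times_ref7[size], times_ours[size]), 2)
    for size in times_ref7
}

if __name__ == "__main__":
    for size, pct in reductions.items():
        print(f"{size}: {pct:.2f}% less encryption time than [7]")
```

On these figures the gap is largest at the 500 M size and small at the two other sizes.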
3. Materials and Methods
3.1. System Architecture
3.2. Proposed Three-Phase Integrity Defense for Hadoop
3.2.1. Hadoop-Based Approach
| Algorithm 1. Secure Write Phase in the Enhanced Hadoop/MQTT Framework |
| Input: File to be written F Output: File F stored in replicated blocks across the DataNodes 1. HADOOP CLIENT: Divide file F into blocks {B1, B2, …, Bi} 2. HADOOP CLIENT → NAMENODE: Notify the NameNode to write the file 3. NAMENODE: Identify the appropriate DataNodes for each block 4. HADOOP CLIENT → DATA NODE (DN): Send each block to the assigned DN 5. REPEAT for each block Bk: 5.1 DN: Save block Bk 5.2 DN: Replicate the block to other DataNodes according to the replication policy (e.g., 2 copies) UNTIL all blocks are written |
| Algorithm 2. Read Phase in the Enhanced Hadoop/MQTT Framework |
| Input: Name of the file F to read Output: Reconstructed file F 1. NAMENODE → HADOOP CLIENT: Divide the file into i blocks {B1, B2, …, Bi} and specify the DataNodes that store each block 2. For each block Bk: 2.1 DN: Identify the location of the blocks (e.g., B1: DN1, DN3, B2: DN2, DNj, …) 3. HADOOP CLIENT: To retrieve block Bk: IF an available DN contains block Bk THEN 3.1 Retrieve the block from this DN ELSE 3.2 Retrieve the block from another DN containing this block // Repeat for all blocks {B1, B2, …, Bi} to reconstruct file F |
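
The write and read phases of Algorithms 1 and 2 can be mimicked with a small in-memory model. This is a minimal sketch, not Hadoop code: DataNodes are plain dictionaries, the block map plays the role of the NameNode metadata, and the round-robin placement, the tiny block size, and all function names are assumptions made for illustration.

```python
from __future__ import annotations


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Algorithm 1, step 1: divide the file into blocks B1..Bi."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def write_file(data: bytes, datanodes: dict[str, dict[int, bytes]],
               block_size: int = 4, replication: int = 2) -> dict[int, list[str]]:
    """Store every block on `replication` DataNodes and return the block map.

    Placement is round-robin, which is a hypothetical policy chosen only
    to keep the sketch deterministic.
    """
    names = list(datanodes)
    copies = min(replication, len(names))
    block_map: dict[int, list[str]] = {}
    for k, block in enumerate(split_into_blocks(data, block_size)):
        targets = [names[(k + r) % len(names)] for r in range(copies)]
        for dn in targets:
            datanodes[dn][k] = block  # steps 5.1 and 5.2: save and replicate
        block_map[k] = targets
    return block_map


def read_file(block_map: dict[int, list[str]],
              datanodes: dict[str, dict[int, bytes]],
              unavailable: frozenset[str] = frozenset()) -> bytes:
    """Algorithm 2: fetch each block from the first available replica."""
    parts = []
    for k in sorted(block_map):
        for dn in block_map[k]:
            if dn not in unavailable and k in datanodes[dn]:
                parts.append(datanodes[dn][k])
                break
        else:
            raise IOError(f"block {k} has no available replica")
    return b"".join(parts)


if __name__ == "__main__":
    cluster = {"DN1": {}, "DN2": {}, "DN3": {}}
    layout = write_file(b"hello hadoop!", cluster)
    # The file is still readable when one DataNode is down.
    print(read_file(layout, cluster, unavailable=frozenset({"DN1"})))
```

With two copies per block, losing a single DataNode leaves every block readable, which is the fallback branch (step 3.2) of Algorithm 2.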
3.2.2. Confidentiality Check
- The distributed file system (HDFS) sends the master node (NameNode) a request containing the information needed to create a new file.
- The NameNode checks the available storage space.
- Once the file characteristics are specified, the file is encrypted at the NameNode.
- An AES key is generated and applied to the file to encrypt it.
- The encrypted output is then written to a designated DN.
- Throughout the replication operation, the NameNode stores information about the current NameNode and the secondary NameNode (replication node).
3.2.3. Integrity Check
Integrity DataNode Level
- Retrieve the ID-DN.
- Calculate the new ID-DN hash value.
- Retrieve the ID-DN reference hash.
- Test whether the ID-DN has changed by comparing the new hash with the reference hash:
  - 4.1 If the verification fails: stop YARN.
  - 4.2 Otherwise, start YARN. If there is another DN, repeat the process from the beginning to verify the uniqueness of its ID.
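
The DataNode-level check above can be sketched with Python's standard `hashlib`. This is one possible reading of the steps, not the authors' script: it assumes the identifier is a string (for example a UUID), that SHA-3-256 is the hash in use, and that YARN is started only after every DN has passed. The function names and the `"START_YARN"`/`"STOP_YARN"` return values are hypothetical.

```python
import hashlib
import hmac


def sha3_256_hex(dn_id: str) -> str:
    """Hash a DataNode identifier with SHA-3-256 and return the hex digest."""
    return hashlib.sha3_256(dn_id.encode("utf-8")).hexdigest()


def verify_datanodes(current_ids: dict, reference_hashes: dict) -> str:
    """Compare each DN's new hash with its reference hash.

    Returns "STOP_YARN" as soon as one DN fails the comparison or reuses an
    identifier already seen, and "START_YARN" once every DN has passed.
    """
    seen = set()
    for dn_name, dn_id in current_ids.items():
        new_hash = sha3_256_hex(dn_id)
        ref_hash = reference_hashes.get(dn_name)
        # Step 4.1: a missing or different reference hash fails verification.
        if ref_hash is None or not hmac.compare_digest(new_hash, ref_hash):
            return "STOP_YARN"
        # Uniqueness check: two DNs must not share the same identifier.
        if new_hash in seen:
            return "STOP_YARN"
        seen.add(new_hash)
    # Step 4.2: all DNs verified.
    return "START_YARN"


if __name__ == "__main__":
    refs = {"DN1": sha3_256_hex("uuid-1"), "DN2": sha3_256_hex("uuid-2")}
    print(verify_datanodes({"DN1": "uuid-1", "DN2": "uuid-2"}, refs))
    print(verify_datanodes({"DN1": "uuid-1", "DN2": "tampered"}, refs))
```

`hmac.compare_digest` is used for the comparison so that the check runs in constant time; a plain `==` would give the same result.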
Integrity Blocks Level
- Retrieve the ID-DN and the ID-Blocks.
- Calculate the new hash value of the ID, H_Current, at the DN and block level.
- Retrieve the reference hash of (ID-DN/ID-Blocks), H_reference.
- Verify whether (ID-DN/ID-Blocks) has been modified by testing H_reference = ? H_Actual. If the verification fails, raise an alert; otherwise, move on to the next replica of the block on another DN and verify the uniqueness of its identifier.
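
The block-level check can be sketched in the same way. The example below assumes that each stored replica has a reference hash recorded under a slot label, that the hashed value is the string "ID-DN/ID-Block", and that SHA-3-256 is used; the slot labels, the separator, and the function names are hypothetical and only serve the illustration.

```python
import hashlib


def pair_hash(dn_id: str, block_id: str) -> str:
    """Hash the (ID-DN/ID-Block) pair with SHA-3-256.

    Joining the two identifiers with "/" mirrors the notation used in the
    text; the exact encoding is an assumption of this sketch.
    """
    return hashlib.sha3_256(f"{dn_id}/{block_id}".encode("utf-8")).hexdigest()


def check_replicas(observed: dict, reference: dict) -> list:
    """Return the slot labels whose current hash differs from the reference.

    observed:  slot label -> (dn_id, block_id) as currently read
    reference: slot label -> reference hash recorded earlier
    """
    alerts = []
    for slot, (dn_id, block_id) in observed.items():
        h_actual = pair_hash(dn_id, block_id)
        h_reference = reference.get(slot)
        if h_reference != h_actual:
            alerts.append(slot)  # raise an alert for this replica
        # otherwise continue with the next replica of the block
    return alerts


if __name__ == "__main__":
    reference = {
        "DN1:blk_0": pair_hash("uuid-1", "blk_0"),
        "DN2:blk_0": pair_hash("uuid-2", "blk_0"),
    }
    observed = {
        "DN1:blk_0": ("uuid-1", "blk_0"),
        "DN2:blk_0": ("uuid-X", "blk_0"),  # identifier changed on DN2
    }
    print(check_replicas(observed, reference))
```

Returning the list of failing replicas, rather than stopping at the first one, lets the caller report every affected DN in a single pass.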
3.3. Extended MQTT for Security
3.3.1. Execution Scenario
3.3.2. Authentication Module
3.3.3. Discovery Mechanism
4. Results
- Hadoop Environment Security Reinforcement: We propose an integrity verification mechanism for DNs to prevent unauthorized modifications or malicious node impersonation. This is achieved through SHA-3-256 hashing to validate DN identity, ensuring uniqueness, preventing hash collisions, and providing trustworthiness within the cluster.
- Data Encryption for Confidentiality and Integrity: We evaluate the performance of AES encryption modes (CTR, CBC, and GCM) with varying key sizes (128, 192, and 256 bits) to ensure secure data storage and transmission. The results (transfer times, computational overhead) are analyzed to determine the optimal encryption approach for large-scale datasets.
4.1. DataNode Integrity Verification
4.2. Analysis of AES Modes and Key Sizes Based on Transfer Time to HDFS
4.2.1. AES-CTR Mode (Counter Mode)
4.2.2. AES-CBC (Cipher Block Chaining)
4.2.3. AES-GCM (Galois/Counter Mode)
5. Discussion
5.1. Evaluation of AES Modes and Key Impact
5.2. Comparative Results
5.3. Evaluation of Extended MQTT

6. Limitations and Future Work
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kala, K.; Makhloga, K.; Khan, A.; Pandey, A.; Mittal, S. A Framework for Big Data Security Using MapReduce in IoT Enabled Computing. In Proceedings of the 11th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 15–16 March 2024; pp. 570–573.
- DemandSage. Google Search Statistics: How Many Google Searches Per Day? 2024. Available online: https://www.demandsage.com/google-search-statistics (accessed on 10 September 2025).
- Yassein, M.B.; Shatnawi, M.Q.; Aljwarneh, S.; Al-Hatmi, R. Internet of Things: Survey and open issues of MQTT protocol. In Proceedings of the 2017 International Conference on Engineering & MIS (ICEMIS), Monastir, Tunisia, 8–10 May 2017; IEEE: New York, NY, USA, 2017; pp. 1–6.
- Azzedin, F.; Alhazmi, T. Secure data distribution architecture in IoT using MQTT. Appl. Sci. 2023, 13, 2515.
- Shan, Y.; Su, Y.; Lin, J.; Shan, T. IoT Communication Based on MQTT and OneNET Cloud Platform in Big Data Environment. Preprints 2024, 2024011250.
- Shvaika, A.; Shvaika, D.; Landiak, D.; Artemchuk, V. A distributed architecture for MQTT messaging: The case of TBMQ. J. Big Data 2025, 12, 224.
- Filaly, Y.; Berros, N.; El Bekkali, M.; Younes El Bouzekri, E.L. A Comprehensive Survey on Big Data Privacy and Hadoop Security: Insights into Encryption Mechanisms and Emerging Trends. Results Eng. 2025, 27, 106203.
- Wen, C.; Yang, J.; Gan, L.; Pan, Y. Big data driven internet of things for credit evaluation and early warning in finance. Future Generat. Comput. Syst. 2021, 124, 295–307.
- Huang, X.; Yi, W.; Wang, J.; Xu, Z. Hadoop-based medical image storage and access method for examination series. Math. Probl. Eng. 2021, 2021, 5525009.
- Guan, S.; Zhang, C.; Wang, Y.; Liu, W. Hadoop-based secure storage solution for big data in cloud computing environment. Digit. Commun. Netw. 2024, 10, 227–236.
- Patel, C.; Doshi, N. A novel MQTT security framework in generic IoT model. Procedia Comput. Sci. 2020, 171, 1399–1408.
- Barton, M.; Budjac, R.; Tanuska, P.; Sladek, I.; Nemeth, M. Advancing small and medium-sized enterprise manufacturing: Framework for IoT-based data collection in Industry 4.0 concept. Electronics 2024, 13, 2485.
- Sarkar, S.; Kumar, S.S.; Giri, A.; Dammur, A. Advancing Urban Evacuation Management: A Real-Time, Adaptive Model Leveraging Cloud-Enabled Big Data and IoT Surveillance. In Proceedings of the 2023 4th International Conference on Intelligent Technologies (CONIT), Bangalore, India, 21–23 June 2024; IEEE: New York, NY, USA, 2024; pp. 1–6.
- Kaur, P.; Sharma, M.; Mittal, M. Big Data and Machine Learning Based Secure Healthcare Framework. Procedia Comput. Sci. 2018, 132, 1049–1059.
- Sh Ahmed, S.; L Abd Al-Nabi, D. Using Hadoop to analyze big data for multiple purposes: An applied study according to the map-reduce model. Int. J. Nonlinear Anal. Appl. 2023, 14, 47–62.
- Filaly, Y.; El Mendili, F.; Berros, N.; El Bouzekri El Idrissi, Y. Hybrid encryption algorithm for information security in Hadoop. Int. J. Adv. Comput. Sci. Appl. 2023, 14, 1297–1302.
- Fashakh, A.; Ibrahim, A.A. A proposed secure and efficient Big Data (Hadoop) security mechanism using encryption algorithm. In Proceedings of the 2023 International Conference on Electrical, Computer and Energy Technologies (ICECET), Cape Town, South Africa, 8–10 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–6.
- Saritha, G.; Nagalakshmi, V. An efficient approach for BigData security based on Hadoop system using cryptographic techniques. Indian J. Comput. Sci. Eng. 2021, 12, 1027–1037.
- Narayanan, U.; Paul, V.; Joseph, S. A novel system architecture for secure authentication and data sharing in cloud enabled Big Data Environment. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 3121–3135.
- Alabdulatif, A.; Thilakarathne, N.N.; Kalinaki, K. A Novel Cloud Enabled Access Control Model for Preserving the Security and Privacy of Medical Big Data. Electronics 2023, 12, 2646.
- Josphineleela, R.; Kaliappan, S.; Natrayan, L.; Garg, A. Big Data Security through Privacy-Preserving Data Mining (PPDM): A Decentralization Approach. In Proceedings of the Second International Conference on Electronics and Renewable Systems (ICEARS-2023), Tuticorin, India, 2–4 March 2023; IEEE: New York, NY, USA, 2023; pp. 718–721.
- Nayini, S.; Kandlakunta, A.R. Big Data Hadoop: Security, Privacy, Performance and Scalability. 2024. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=5071036 (accessed on 11 February 2026).
- Palit, S.; Roy, C.S. Securing Big Data with Hadoop in enterprise information systems. In Proceedings of the International Conference on Research in Education and Science (ICRES 2024), ISTES, Antalya, Turkey, 27–30 April 2024; ISTES: San Antonio, TX, USA, 2024; pp. 1402–1416.
- Dandekar, M.; Lote, S.; Dandekar, P. Implementing the power of Big Data analytics. In Proceedings of the 2024 IEEE 3rd International Conference on Electrical Power and Energy Systems (ICEPES), MANIT Bhopal, India, 21–22 June 2024; IEEE: New York, NY, USA, 2024.
- Singla, R.; Kaur, N.; Koundal, D.; Bharadwaj, A. Challenges and developments in secure routing protocols for healthcare in WBAN: A comparative analysis. Wirel. Pers. Commun. 2022, 122, 1767–1806.
- Kawaguchi, R.; Bandai, M. Edge based MQTT broker architecture for geographical IoT applications. In Proceedings of the 2020 International Conference on Information Networking (ICOIN), Barcelona, Spain, 7–10 January 2020; IEEE: New York, NY, USA, 2020; pp. 232–235.
- Nayak, M.; Patro, P.; Awotunde, J.B.; Gupta, S.K. IoT Security Architectures and Protocols. In Security Paradigms in 6G Smart Cities and IoT Ecosystems; CRC Press: Boca Raton, FL, USA, 2025; pp. 52–71.
- Nadeem, M.; Mustafa, R.; Abi-Char, P.E.; Tucker, R.S. A Study of Security Threats in IoT Network Layer using MQTT and TLS. In Proceedings of the 2025 12th International Conference on Information Technology (ICIT), Amman, Jordan, 27–30 May 2025; IEEE: New York, NY, USA, 2025; pp. 161–166.
- Sharma, A.; Malviya, R.; Gupta, R. Big data analytics in healthcare. In Cognitive Intelligence and Big Data in Healthcare; John Wiley & Sons, Inc.: Hoboken, NJ, USA, 2022; pp. 257–301.
- Karatas, M.; Eriskin, L.; Deveci, M.; Pamucar, D.; Garg, H. Big Data for Healthcare Industry 4.0: Applications, challenges and future perspectives. Expert Syst. Appl. 2022, 200, 116912.
- Chien, H.Y.; Shih, A.T.; Huang, Y.M. Exploring MQTT Broker-Based, End-to-End Models for Security and Efficiency. Sensors 2025, 25, 5308.
- Alharbi, S.; Awad, W.; Bell, D. HECS4MQTT: A Multi-Layer Security Framework for Lightweight and Robust Encryption in Healthcare IoT Communications. Future Internet 2025, 17, 298.
- Pajooh, H.H.; Rashid, M.A.; Alam, F.; Demidenko, S. IoT Big Data provenance scheme using blockchain on Hadoop ecosystem. J. Big Data 2021, 8, 114.
- Dworkin, M.J. Recommendation for Block Cipher Modes of Operation: The CBC, CFB, OFB, CTR, and XTS Modes. NIST 2001.
- Vidhya, S. Enhancing Cloud Security for Structured Data: An AES-GCM Based Format-Preserving Encryption Approach. In Artificial Intelligence Based Smart and Secured Applications, Proceedings of the International Conference on Advancements in Smart Computing and Information Security, Rajkot, India, 16–18 October 2024; Springer Nature: Cham, Switzerland, 2024; pp. 196–205.
- Dworkin, M.J. Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC. NIST 2007.

| Symbol Used | Description |
|---|---|
| ID-reference | DN identifier created at the first startup (reference identity) |
| Current-ID | DN Identifier created at the new startup (current identity) |
| ID-DN | DN Identifier |
| Reference ID-DN hash | |
| New ID-DN hash | |
| ID-Blocks | Identifier of Blocks |
| H_Current | Hash (ID-DN/ID-Block) at first startup |
| H_reference | Reference hash (ID-DN/ID-Blocks) |
| H_Actual | New hash (ID-DN/ID-Blocks) |

| Symbol Used | Description |
|---|---|
| UUID | Universally unique identifier of the DN |
| start-all.sh | Shell script that starts Hadoop |
| verify_all_datanode_hashes.sh | Shell script that verifies all DN hashes |
| AES in CTR, CBC, and GCM modes | AES modes of operation: CTR (Counter Mode), CBC (Cipher Block Chaining), GCM (Galois/Counter Mode) |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.
Kamoun-Abid, F.; Meddeb-Makhlouf, A. Enhanced MQTT Protocol for Securing Big Data/Hadoop Data Management. J. Sens. Actuator Netw. 2026, 15, 22. https://doi.org/10.3390/jsan15010022

