Search Results (27)

Search Parameters:
Keywords = Hadoop security

19 pages, 807 KB  
Article
DAG-Guided Active Fuzzing: A Deterministic Approach to Detecting Race Conditions in Distributed Cloud Systems
by Hongyi Zhao, Zhen Li, Yueming Wu and Deqing Zou
Appl. Sci. 2026, 16(4), 2061; https://doi.org/10.3390/app16042061 - 19 Feb 2026
Viewed by 502
Abstract
The rapid expansion of distributed cloud platforms introduces critical security challenges, specifically non-deterministic race conditions like Time-of-Check to Time-of-Use (TOCTOU) vulnerabilities. Traditional passive detection methods often fail to identify these transient “Heisenbugs” due to the asynchronous nature of multi-threaded control planes. To address this, we propose a novel DAG-Guided Active Fuzzing framework. Our approach constructs a Directed Acyclic Graph (DAG) to map causal dependencies of API operations and implements deterministic proactive scheduling. By injecting microsecond-level delays into identified race windows, the system enforces adversarial interleavings to expose hidden order and atomicity violations. Validated on 32 verified vulnerabilities across six distributed systems (including Hadoop and OpenStack), our method achieves an overall Recall (Detection Rate) of 68.8% across the entire dataset and a peak Precision of 92% in reproducibility tests, significantly outperforming random fuzzing baselines (p<0.01). Furthermore, the framework maintains a low runtime overhead of 11.5%. These findings demonstrate a favorable trade-off between detection depth and system efficiency, establishing the approach as a robust toolchain for transforming theoretical concurrency risks into reproducible security findings in large-scale cloud infrastructure. Full article
(This article belongs to the Special Issue Cyberspace Security Technology in Computer Science)
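
The delay-injection idea in this abstract — widening a race window so an adversarial interleaving triggers deterministically — can be illustrated with a minimal sketch. The class, delays, and scheduling below are invented for illustration and are not the authors' framework:

```python
import threading
import time

class SharedResource:
    """A toy resource with a check-then-use (TOCTOU) race window."""
    def __init__(self):
        self.exists = True
        self.violations = 0

    def check_then_use(self, race_delay=0.0):
        if self.exists:              # time-of-check
            time.sleep(race_delay)   # injected delay widens the race window
            if not self.exists:      # time-of-use: resource vanished in between
                self.violations += 1

def run_trial(race_delay):
    """One adversarial interleaving: a user thread races a deleter thread."""
    res = SharedResource()
    user = threading.Thread(target=res.check_then_use, args=(race_delay,))
    deleter = threading.Thread(target=lambda: setattr(res, "exists", False))
    user.start()
    time.sleep(race_delay / 2)   # schedule the deleter inside the widened window
    deleter.start()
    user.join(); deleter.join()
    return res.violations

# With the window widened, the order violation is exposed reliably;
# without the delay it would surface only rarely, as a "Heisenbug".
exposed = run_trial(race_delay=0.2)
```

The paper's framework chooses where to inject such delays from a DAG of causal API dependencies rather than blindly, which is what makes the exposure deterministic rather than probabilistic.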

27 pages, 3230 KB  
Article
Enhanced MQTT Protocol for Securing Big Data/Hadoop Data Management
by Ferdaous Kamoun-Abid and Amel Meddeb-Makhlouf
J. Sens. Actuator Netw. 2026, 15(1), 22; https://doi.org/10.3390/jsan15010022 - 16 Feb 2026
Viewed by 826
Abstract
Big data has significantly transformed data processing and analytics across various domains. However, ensuring security and data confidentiality in distributed platforms such as Hadoop remains a challenging task. Distributed environments face major security issues, particularly in the management and protection of large-scale data. In this article, we focus on the cost of secure information transmission, implementation complexity, and scalability. Furthermore, we address the confidentiality of information stored in Hadoop by analyzing different AES encryption modes and examining their potential to enhance Hadoop security. At the application layer, we operate within our Hadoop environment using an extended, secure, and widely used MQTT protocol for large-scale data communication. This approach is based on implementing MQTT with TLS; before connecting, we add a hash verification of the DataNodes’ identities and send a JWT. This protocol uses TCP at the transport layer for underlying transmission. The advantage of TCP lies in its reliability and small header size, making it particularly suitable for big data environments. This work proposes a triple-layer protection framework. The first layer assesses the performance of existing AES encryption modes (CTR, CBC, and GCM) with different key sizes to optimize data confidentiality and processing efficiency in large-scale Hadoop deployments. The second layer evaluates the integrity of DataNodes using a novel verification mechanism that employs SHA-3-256 hashing to authenticate nodes and prevent unauthorized access during cluster initialization. At the third tier, the integrity of data blocks within Hadoop is ensured using SHA-3-256. Through extensive performance testing and security validation, we demonstrate the effectiveness of the proposed integration. Full article
(This article belongs to the Section Network Security and Privacy)
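
The SHA-3-256 node-identity check described in this abstract can be sketched with the standard library. The registry layout, secret, and node names below are illustrative assumptions, not the paper's protocol:

```python
import hashlib
import hmac

# Hypothetical registry of approved DataNode identity digests, computed as
# SHA-3-256 over a cluster secret plus a node ID (all names are invented).
CLUSTER_SECRET = b"cluster-init-secret"

def node_digest(node_id: str) -> str:
    return hashlib.sha3_256(CLUSTER_SECRET + node_id.encode()).hexdigest()

APPROVED = {node_digest("datanode-01"), node_digest("datanode-02")}

def verify_node(node_id: str) -> bool:
    """Admit a node at cluster initialization only if its digest is approved."""
    candidate = node_digest(node_id)
    # Constant-time comparison avoids leaking digest prefixes via timing
    return any(hmac.compare_digest(candidate, d) for d in APPROVED)
```

An unauthorized node that cannot reproduce an approved digest is rejected before it joins the cluster, which is the goal of the paper's second protection layer.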

37 pages, 3325 KB  
Review
A Comprehensive Survey of MapReduce Models for Processing Big Data
by Hemn Barzan Abdalla, Yulia Kumar, Yue Zhao and Davide Tosi
Big Data Cogn. Comput. 2025, 9(4), 77; https://doi.org/10.3390/bdcc9040077 - 27 Mar 2025
Cited by 3 | Viewed by 5222
Abstract
With the rapid increase in the amount of big data, traditional software tools are facing complexity in tackling big data, which is a huge concern in the research industry. In addition, the management and processing of big data have become more difficult, thus increasing security threats. Various fields encountered issues in fully making use of these large-scale data with supported decision-making. Data mining methods have been tremendously improved to identify patterns for sorting a larger set of data. MapReduce models provide greater advantages for in-depth data evaluation and can be compatible with various applications. This survey analyses the various MapReduce models utilized for big data processing, the techniques harnessed in the reviewed literature, and the challenges. Furthermore, this survey reviews the major advancements of diverse types of MapReduce models, namely Hadoop, Hive, Pig, MongoDB, Spark, and Cassandra. Besides the reliable MapReduce approaches, this survey also examines various metrics utilized for computing the performance of big data processing among the applications. More specifically, this review summarizes the background of MapReduce and its terminologies, types, different techniques, and applications to advance the MapReduce framework for big data processing. This study provides good insights for conducting more experiments in the field of processing and managing big data. Full article
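
The MapReduce programming model surveyed here can be captured in a few lines: a map phase emits key–value pairs, a shuffle groups them by key, and a reduce phase aggregates each group. This word-count sketch is the classic textbook illustration, not code from any of the surveyed systems:

```python
from collections import defaultdict
from itertools import chain

def map_phase(doc):
    # Map: emit (word, 1) for every word in one input split
    return [(w.lower(), 1) for w in doc.split()]

def shuffle(pairs):
    # Shuffle: group values by key, as the framework does between phases
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate each key's list of values into a final result
    return {k: sum(vs) for k, vs in groups.items()}

docs = ["big data needs MapReduce", "MapReduce splits big jobs"]
counts = reduce_phase(shuffle(chain.from_iterable(map_phase(d) for d in docs)))
```

In Hadoop, Spark, and the other engines the survey covers, the same three phases run distributed across a cluster, with the shuffle moving data over the network between mappers and reducers.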

16 pages, 4769 KB  
Article
Digital Forensics Readiness in Big Data Networks: A Novel Framework and Incident Response Script for Linux–Hadoop Environments
by Cephas Mpungu, Carlisle George and Glenford Mapp
Appl. Syst. Innov. 2024, 7(5), 90; https://doi.org/10.3390/asi7050090 - 25 Sep 2024
Viewed by 3926
Abstract
The surge in big data and analytics has catalysed the proliferation of cybercrime, largely driven by organisations’ intensified focus on gathering and processing personal data for profit while often overlooking security considerations. Hadoop and its derivatives are prominent platforms for managing big data; however, investigating security incidents within Hadoop environments poses intricate challenges due to scale, distribution, data diversity, replication, component complexity, and dynamicity. This paper proposes a big data digital forensics readiness framework and an incident response script for Linux–Hadoop environments, streamlining preliminary investigations. The framework offers a novel approach to digital forensics in the domains of big data and Hadoop environments. A prototype of the incident response script for Linux–Hadoop environments was developed and evaluated through comprehensive functionality and usability testing. The results demonstrated robust performance and efficacy. Full article
(This article belongs to the Section Information Systems)
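
A preliminary investigation in a Linux–Hadoop environment typically starts from HDFS audit logs. The sketch below triages one audit-style line; the sample line is synthetic (in the general shape of HDFS `FSNamesystem.audit` output) and the script is an illustration, not the paper's incident response script:

```python
import re

# Synthetic sample line in the HDFS audit-log style (not real cluster data)
SAMPLE = ("2024-09-25 10:15:32,001 INFO FSNamesystem.audit: allowed=false "
          "ugi=alice (auth:SIMPLE) ip=/10.0.0.7 cmd=delete "
          "src=/data/records dst=null perm=null")

AUDIT_RE = re.compile(r"allowed=(?P<allowed>\w+)\s+ugi=(?P<user>\S+).*?"
                      r"ip=/(?P<ip>\S+)\s+cmd=(?P<cmd>\S+)\s+src=(?P<src>\S+)")

def triage(line):
    """Extract who/where/what from an audit line and flag denied operations
    for follow-up during a preliminary forensic investigation."""
    m = AUDIT_RE.search(line)
    if not m:
        return None
    event = m.groupdict()
    event["suspicious"] = event["allowed"] == "false"
    return event
```

Forensic readiness, as the paper argues, means this kind of collection and triage is prepared before an incident, so investigators are not improvising against replicated, distributed data after the fact.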

21 pages, 10483 KB  
Article
Evading Cyber-Attacks on Hadoop Ecosystem: A Novel Machine Learning-Based Security-Centric Approach towards Big Data Cloud
by Neeraj A. Sharma, Kunal Kumar, Tanzim Khorshed, A B M Shawkat Ali, Haris M. Khalid, S. M. Muyeen and Linju Jose
Information 2024, 15(9), 558; https://doi.org/10.3390/info15090558 - 10 Sep 2024
Cited by 5 | Viewed by 2304
Abstract
The growing industry and its complex and large information sets require Big Data (BD) technology and its open-source frameworks (Apache Hadoop) to (1) collect, (2) analyze, and (3) process the information. This information usually ranges in size from gigabytes to petabytes of data. However, processing this data involves web consoles and communication channels which are prone to intrusion from hackers. To resolve this issue, a novel machine learning (ML)-based security-centric approach has been proposed to evade cyber-attacks on the Hadoop ecosystem while considering the complexity of Big Data in Cloud (BDC). An Apache Hadoop-based management interface “Ambari” was implemented to address the variation and distinguish between attacks and activities. The analyzed experimental results show that the proposed scheme effectively (1) blocked the interface communication and retrieved the performance measured data from (2) the Ambari-based virtual machine (VM) and (3) BDC hypervisor. Moreover, the proposed architecture was able to provide a reduction in false alarms as well as cyber-attack detection. Full article
(This article belongs to the Special Issue Cybersecurity, Cybercrimes, and Smart Emerging Technologies)

10 pages, 1500 KB  
Proceeding Paper
A Novel Information Security Framework for Securing Big Data in Healthcare Environment Using Blockchain
by Lakshman Kannan Venugopal, Rajappan Rajaganapathi, Abhishek Birjepatil, Sundararajan Edwin Raja and Gnanasaravanan Subramaniam
Eng. Proc. 2023, 59(1), 107; https://doi.org/10.3390/engproc2023059107 - 22 Dec 2023
Cited by 9 | Viewed by 2554
Abstract
The Blockchain-based information security framework for health care big data environments is a framework designed for the secure storage, access, and transmission of health care data in big data environments. It combines the privacy and security advantages of encryption and decentralized networks offered by Blockchain technology with the scalability of distributed systems to provide an effective secure platform for big data applications. The framework is based on the principles of confidentiality and immutability to ensure the security and privacy of health care data. The framework is designed to support a wide range of information sources and use cases including patient records, clinical research, medical imaging, genomic data, and pharmaceutical trials. It is also designed to be compatible with existing distributed computing and data querying technologies such as Hadoop and Spark, which will help organizations to improve the accessibility of health care data. The Blockchain-based framework will also provide an audit trail, allowing hospitals and other organizations to better monitor and control access to their data. This will enable organizations to ensure compliance with HIPAA and other regulations, while providing enhanced confidentiality and privacy to users and patients. Full article
(This article belongs to the Proceedings of Eng. Proc., 2023, RAiSE-2023)
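
The audit-trail property this abstract attributes to Blockchain — immutability via chained hashes — can be shown with a minimal hash chain. The record fields and chain layout are illustrative, not the paper's framework:

```python
import hashlib
import json

def make_block(record, prev_hash):
    """Append one access-log record to a hash chain (a simplified audit trail)."""
    body = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
    return {"record": record, "prev": prev_hash,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

def verify_chain(chain):
    """Tampering with any earlier record breaks every later hash link."""
    for i, block in enumerate(chain):
        prev = chain[i - 1]["hash"] if i else "genesis"
        body = json.dumps({"record": block["record"], "prev": prev},
                          sort_keys=True)
        if (block["prev"] != prev
                or block["hash"] != hashlib.sha256(body.encode()).hexdigest()):
            return False
    return True

# Hypothetical health-record access events
chain = [make_block({"user": "dr_lee", "action": "read", "patient": 42}, "genesis")]
chain.append(make_block({"user": "nurse_kim", "action": "write", "patient": 42},
                        chain[-1]["hash"]))
```

Because each block commits to its predecessor's hash, an organization can detect retroactive edits to the access log, which is what makes such a trail useful for HIPAA-style compliance monitoring.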

27 pages, 2829 KB  
Article
EStore: A User-Friendly Encrypted Storage Scheme for Distributed File Systems
by Yuxiang Chen, Guishan Dong, Chunxiang Xu, Yao Hao and Yue Zhao
Sensors 2023, 23(20), 8526; https://doi.org/10.3390/s23208526 - 17 Oct 2023
Cited by 3 | Viewed by 3104
Abstract
In this paper, we propose a user-friendly encrypted storage scheme named EStore, which is based on the Hadoop distributed file system. Users can make use of cloud-based distributed file systems to collaborate with each other. However, most data are processed and stored in plaintext, which is out of the owner’s control after it has been uploaded and shared. Meanwhile, simple encryption guarantees the confidentiality of uploaded data but reduces availability. Furthermore, it is difficult to deal with complex key management as there is the problem whereby a single key encrypts different files, thus increasing the risk of leakage. In order to solve the issues above, we put forward an encrypted storage model and a threat model, designed with corresponding system architecture to cope with these requirements. Further, we designed and implemented six sets of protocols to meet users’ requirements for security and use. EStore manages users and their keys through registration and authentication, and we developed a searchable encryption module and encryption/decryption module to support ciphertext retrieval and secure data outsourcing, which will only minimally increase the calculation overhead of the client and storage redundancy. Compared with the original file system, users are not exposed to additional vulnerabilities. Finally, we conducted a security analysis of the protocols to demonstrate that EStore is feasible and secure. Full article
(This article belongs to the Special Issue Data Privacy, Security, and Trust in New Technological Trends)
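
Ciphertext retrieval of the kind EStore's searchable encryption module provides is commonly built on deterministic keyword tokens: the server indexes keyed hashes of keywords, never the plaintext. This is a generic sketch of that idea, not EStore's actual protocol; the key and identifiers are invented:

```python
import hashlib
import hmac

# Hypothetical per-user search key (in a real scheme, derived and protected)
SEARCH_KEY = b"per-user-search-key"

def keyword_token(word: str) -> str:
    # The server only ever sees this token, not the keyword itself
    return hmac.new(SEARCH_KEY, word.lower().encode(), hashlib.sha256).hexdigest()

index = {}  # token -> list of encrypted-file identifiers

def add_to_index(word, file_id):
    index.setdefault(keyword_token(word), []).append(file_id)

def search(word):
    """Retrieve matching ciphertext identifiers without revealing the keyword."""
    return index.get(keyword_token(word), [])

add_to_index("invoice", "enc-file-001")
add_to_index("invoice", "enc-file-007")
```

The client does one HMAC per query, which matches the paper's claim that ciphertext retrieval adds only minimal client-side computation.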

15 pages, 4226 KB  
Article
Research on Secure Storage Technology of Spatiotemporal Big Data Based on Blockchain
by Bao Zhou, Junsan Zhao, Guoping Chen and Ying Yin
Appl. Sci. 2023, 13(13), 7911; https://doi.org/10.3390/app13137911 - 6 Jul 2023
Cited by 3 | Viewed by 2159
Abstract
With the popularity of spatiotemporal big data applications, more and more sensitive data are generated by users, and the sharing and secure storage of spatiotemporal big data are faced with many challenges. In response to these challenges, the present paper puts forward a new technology called CSSoB (Classified Secure Storage Technology over Blockchain) that leverages blockchain technology to enable classified secure storage of spatiotemporal big data. This paper introduces a twofold approach to tackle challenges associated with spatiotemporal big data. First, the paper proposes a strategy to fragment and distribute space–time big data while enabling both encryption and nonencryption operations based on different data types. The sharing of sensitive data is enabled via smart contract technology. Second, CSSoB’s single-node storage performance was assessed under local and local area network (LAN) conditions, and results indicate that the read performance of CSSoB surpasses its write performance. In addition, read and write performance were observed to increase significantly as the file size increased. Finally, the transactions per second (TPS) of CSSoB and the Hadoop Distributed File System (HDFS) were compared under varying thread numbers. In particular, when the thread number was set to 100, CSSoB demonstrated a TPS improvement of 7.8% in comparison with HDFS. Given the remarkable performance of CSSoB, its adoption can not only enhance storage performance, but also improve storage security to a great extent. Moreover, the fragmentation processing technology employed in this study enables secure storage and rapid data querying while greatly improving spatiotemporal data processing capabilities. Full article
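
CSSoB's first idea — fragmenting spatiotemporal data and encrypting only the sensitive fragments — can be sketched as below. The fragment size, sensitivity test, and placeholder cipher are all illustrative assumptions; the XOR stands in for a real cipher and must not be read as the paper's encryption:

```python
def fragment(data: bytes, chunk_size: int):
    """Split a spatiotemporal record stream into fixed-size fragments."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def classify_and_store(fragments, is_sensitive):
    """Route each fragment to an encrypted or plain store by data type.
    (The XOR 'cipher' is a placeholder for a real encryption scheme.)"""
    encrypted_store, plain_store = [], []
    for frag in fragments:
        if is_sensitive(frag):
            encrypted_store.append(bytes(b ^ 0x5A for b in frag))
        else:
            plain_store.append(frag)
    return encrypted_store, plain_store

# Invented sample: GPS traces are public-ish, user identity is sensitive
data = b"GPS:31.2304,121.4737;USER:alice;GPS:30.5728,104.0668"
frags = fragment(data, 16)
enc, plain = classify_and_store(frags, lambda f: b"USER" in f)
```

Encrypting only the sensitive class of fragments is what lets a scheme like CSSoB keep query speed high on the non-sensitive bulk while still protecting identities.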

20 pages, 4788 KB  
Article
Generative Adversarial Networks (GAN) and HDFS-Based Realtime Traffic Forecasting System Using CCTV Surveillance
by Praveen Devadhas Sujakumari and Paulraj Dassan
Symmetry 2023, 15(4), 779; https://doi.org/10.3390/sym15040779 - 23 Mar 2023
Cited by 39 | Viewed by 5499
Abstract
The most crucial component of any smart city traffic management system is traffic flow prediction. It can assist a driver in selecting the most efficient route to their destination. The digitalization of closed-circuit television (CCTV) systems has resulted in more effective and capable surveillance imaging systems for security applications. The number of automobiles on the world’s highways has steadily increased in recent decades. However, road capacity has not developed at the same rate, resulting in significantly increasing congestion. The model learning mechanism cannot be guided or improved by prior domain knowledge of real-world problems. In reality, symmetrical features are common in many real-world research objects. To mitigate this severe situation, the researchers chose adaptive traffic management to make intelligent and efficient use of the current infrastructure. Data grow exponentially and become a complex item that must be managed. Unstructured data are a subset of big data that are difficult to process and have volatile properties. CCTV cameras are used in traffic management to monitor a specific point on the roadway. CCTV generates unstructured data in the form of images and videos. Because of the data’s intricacy, these data are challenging to process. This study proposes using big data analytics to transform real-time unstructured data from CCTV into information that can be shown on a web dashboard. As a Hadoop-based architectural stack that can serve as the ICT backbone for managing unstructured data efficiently, the Hadoop Distributed File System (HDFS) stores several sorts of data using the Hadoop file storage system, high-performance integrated virtual environment (HIVE) tables, and non-relational storage. Traditional computer vision algorithms are incapable of processing such massive amounts of visual data collected in real-time. However, the inferiority of traffic data and the quality of unit information are always symmetrical phenomena. As a result, there is a need for big data analytics with machine learning, which entails processing and analyzing vast amounts of visual data, such as photographs or videos, to uncover semantic patterns that may be interpreted. As a result, smart cities require a more accurate traffic flow prediction system. In comparison to other recent methods applied to the dataset, the proposed method achieved the highest accuracy of 98.21%. In this study, we look at the construction of a secure CCTV strategy that predicts traffic from CCTV surveillance using real-time traffic prediction analysis with generative adversarial networks (GAN) and HDFS. Full article

28 pages, 4528 KB  
Article
A Framework for Attribute-Based Access Control in Processing Big Data with Multiple Sensitivities
by Anne M. Tall and Cliff C. Zou
Appl. Sci. 2023, 13(2), 1183; https://doi.org/10.3390/app13021183 - 16 Jan 2023
Cited by 27 | Viewed by 9418
Abstract
There is an increasing demand for processing large volumes of unstructured data for a wide variety of applications. However, protection measures for these big data sets are still in their infancy, which could lead to significant security and privacy issues. Attribute-based access control (ABAC) provides a dynamic and flexible solution that is effective for mediating access. We analyzed and implemented a prototype application of ABAC to large dataset processing in Amazon Web Services, using open-source versions of Apache Hadoop, Ranger, and Atlas. The Hadoop ecosystem is one of the most popular frameworks for large dataset processing and storage and is adopted by major cloud service providers. We conducted a rigorous analysis of cybersecurity in implementing ABAC policies in Hadoop, including developing a synthetic dataset of information at multiple sensitivity levels that realistically represents healthcare and connected social media data. We then developed Apache Spark programs that extract, connect, and transform data in a manner representative of a realistic use case. Our result is a framework for securing big data. Applying this framework ensures that serious cybersecurity concerns are addressed. We provide details of our analysis and experimentation code in a GitHub repository for further research by the community. Full article
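
The core of attribute-based access control is a policy decision that compares subject, resource, and action attributes against rules, rather than consulting a static role list. The sketch below shows that decision shape; the attribute names and policy are invented for illustration and are not the Ranger/Atlas policies the paper implements:

```python
def evaluate(policy, subject, resource, action):
    """Grant access only if every attribute condition in the policy holds."""
    return (action in policy["actions"]
            and subject.get("clearance", 0) >= policy["min_clearance"]
            and resource.get("sensitivity") in policy["sensitivities"]
            and subject.get("role") in policy["roles"])

# Hypothetical policy for data at multiple sensitivity levels
policy = {"actions": {"read"},
          "min_clearance": 2,
          "sensitivities": {"phi", "public"},
          "roles": {"analyst", "clinician"}}

analyst = {"role": "analyst", "clearance": 3}
intern = {"role": "intern", "clearance": 1}
record = {"sensitivity": "phi"}
```

Because the decision is computed from attributes at request time, adding a new sensitivity level or role means editing policy data, not re-plumbing the application, which is the flexibility the paper exploits for mixed-sensitivity healthcare and social media data.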

18 pages, 2885 KB  
Article
Introducing UWF-ZeekData22: A Comprehensive Network Traffic Dataset Based on the MITRE ATT&CK Framework
by Sikha S. Bagui, Dustin Mink, Subhash C. Bagui, Tirthankar Ghosh, Russel Plenkers, Tom McElroy, Stephan Dulaney and Sajida Shabanali
Data 2023, 8(1), 18; https://doi.org/10.3390/data8010018 - 11 Jan 2023
Cited by 31 | Viewed by 11282
Abstract
With the rapid rate at which networking technologies are changing, there is a need to regularly update network activity datasets to accurately reflect the current state of network infrastructure/traffic. The uniqueness of this work was that this was the first network dataset collected using Zeek and labelled using the MITRE ATT&CK framework. In addition to identifying attack traffic, the MITRE ATT&CK framework allows for the detection of adversary behavior leading to an attack. It can also be used to develop user profiles of groups intending to perform attacks. This paper also outlined how both the cyber range and Hadoop’s big data platform were used for creating this network traffic data repository. The data was collected using Security Onion in two formats: Zeek and PCAPs. Mission logs, which contained the MITRE ATT&CK data, were used to label the network attack data. The data was transferred daily from the Security Onion virtual machine running on a cyber range to the big-data platform, Hadoop’s distributed file system. This dataset, UWF-ZeekData22, is publicly available at datasets.uwf.edu. Full article
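
Labelling network records from mission logs, as described in this abstract, amounts to joining two time-stamped streams. The sketch below labels each record with the ATT&CK tactic of any mission-log entry inside a time window; the field names, window size, and tactic value are illustrative assumptions, not the UWF-ZeekData22 schema:

```python
from datetime import datetime, timedelta

def label_records(zeek_records, mission_logs, window_s=60):
    """Label each network record with the MITRE ATT&CK tactic of a mission-log
    entry within window_s seconds of it; otherwise mark it benign."""
    labeled = []
    for rec in zeek_records:
        label = "benign"
        for log in mission_logs:
            if abs((rec["ts"] - log["ts"]).total_seconds()) <= window_s:
                label = log["tactic"]
                break
        labeled.append({**rec, "label": label})
    return labeled

t0 = datetime(2022, 2, 10, 12, 0, 0)
records = [{"ts": t0, "src": "10.0.0.5"},
           {"ts": t0 + timedelta(hours=2), "src": "10.0.0.9"}]
missions = [{"ts": t0 + timedelta(seconds=30), "tactic": "Reconnaissance"}]
out = label_records(records, missions)
```

At the dataset's scale this join would run as a distributed job over HDFS rather than a nested Python loop, but the labelling logic is the same.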

19 pages, 2707 KB  
Article
An Efficient Hybrid QHCP-ABE Model to Improve Cloud Data Integrity and Confidentiality
by Kranthi Kumar Singamaneni, Ali Nauman, Sapna Juneja, Gaurav Dhiman, Wattana Viriyasitavat, Yasir Hamid and Joseph Henry Anajemba
Electronics 2022, 11(21), 3510; https://doi.org/10.3390/electronics11213510 - 28 Oct 2022
Cited by 23 | Viewed by 3192
Abstract
Cloud computational service is one of the renowned services utilized by employees, employers, and organizations collaboratively. It is accountable for data management and processing through virtual machines and is independent of end users’ system configurations. The usage of cloud systems is very simple and easy to organize. They can easily be integrated into various storages of the cloud and incorporated into almost all available software tools such as Hadoop, Informatica, DataStage, and OBIEE for the purpose of Extraction-Transform-Load (ETL), data processing, data reporting, and other related computations. Because of this low-cost-based cloud computational service model, cloud users can utilize the software and services, the implementation environment, storage, and other on-demand resources with a pay-per-use model. Cloud contributors across this world move all these cloud-based apps, software, and large volumes of data in the form of files and databases into enormous data centers. However, the main challenge is that cloud users cannot have direct control over the data stored at these data centers. They do not even know the integrity, confidentiality, level of security, and privacy of their sensitive data. This exceptional cloud property creates several different security disputes and challenges. To address these security challenges, we propose a novel Quantum Hash-centric Cipher Policy-Attribute-based Encipherment (QH-CPABE) framework to improve the security and privacy of the cloud user’s sensitive data. In our proposed model, we used both structured and unstructured big cloud clinical data as input; the simulated experimental results show that the proposal achieves approximately 92% correctness of bit hash change and approximately 96% correctness of chaotic dynamic key production, with improved encipherment and decipherment times compared with conventional standards from the literature. Full article
(This article belongs to the Section Computer Science & Engineering)

17 pages, 1453 KB  
Article
Towards Developing a Robust Intrusion Detection Model Using Hadoop–Spark and Data Augmentation for IoT Networks
by Ricardo Alejandro Manzano Sanchez, Marzia Zaman, Nishith Goel, Kshirasagar Naik and Rohit Joshi
Sensors 2022, 22(20), 7726; https://doi.org/10.3390/s22207726 - 12 Oct 2022
Cited by 8 | Viewed by 3017
Abstract
In recent years, anomaly detection and machine learning for intrusion detection systems have been used to detect anomalies on Internet of Things networks. These systems rely on machine and deep learning to improve the detection accuracy. However, the robustness of the model depends on the number of data samples available, the quality of the data, and the distribution of the data classes. In the present paper, we focused specifically on the amount of data and class imbalance, since both parameters are key in IoT due to the fact that network traffic is increasing exponentially. For this reason, we propose a framework that uses a big data methodology with Hadoop–Spark to train and test multi-class and binary classification with a one-vs-rest strategy for intrusion detection using the entire BoT-IoT dataset. Thus, we evaluate all the algorithms available in Hadoop–Spark in terms of accuracy and processing time. In addition, since the BoT-IoT dataset used is highly imbalanced, we also improve the accuracy for detecting minority classes by generating more data samples using a Conditional Tabular Generative Adversarial Network (CTGAN). In general, our proposed model outperforms other published models including our previous model. Using our proposed methodology, the F1-score of one of the minority classes, i.e., the Theft attack, was improved from 42% to 99%. Full article
(This article belongs to the Special Issue Communication, Security, and Privacy in IoT)
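
The one-vs-rest strategy the paper uses turns one multi-class problem into K binary problems: each class is relabeled as positive against all the rest, and prediction picks the most confident binary model. The sketch below uses an invented centroid-distance "learner" as a stand-in for the Spark classifiers the paper evaluates:

```python
class CentroidClassifier:
    """Stand-in binary learner: confidence is how much closer a sample's
    feature mean lies to the positive-class centroid than to the negative."""
    def fit(self, X, y):
        mean = lambda xs: sum(xs) / len(xs)
        self.pos = mean([mean(x) for x, t in zip(X, y) if t == 1])
        self.neg = mean([mean(x) for x, t in zip(X, y) if t == 0])
        return self

    def score(self, x):
        m = sum(x) / len(x)
        return abs(m - self.neg) - abs(m - self.pos)

def one_vs_rest_fit(X, y, classes):
    """Train one binary model per class: that class relabeled 1, the rest 0."""
    return {c: CentroidClassifier().fit(X, [1 if t == c else 0 for t in y])
            for c in classes}

def one_vs_rest_predict(models, x):
    # Pick the class whose binary model is most confident
    return max(models, key=lambda c: models[c].score(x))

# Toy traffic features and labels (invented, not BoT-IoT data)
X = [[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.1, 0.8], [2.0, 2.1], [1.9, 2.2]]
y = ["normal", "normal", "ddos", "ddos", "theft", "theft"]
models = one_vs_rest_fit(X, y, {"normal", "ddos", "theft"})
```

One-vs-rest also makes the class-imbalance problem explicit: a rare class like Theft yields a binary problem with very few positives, which is exactly where the paper's CTGAN oversampling helps.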

14 pages, 2084 KB  
Article
Morton Filter-Based Security Mechanism for Healthcare System in Cloud Computing
by Sugandh Bhatia and Jyoteesh Malhotra
Healthcare 2021, 9(11), 1551; https://doi.org/10.3390/healthcare9111551 - 15 Nov 2021
Cited by 6 | Viewed by 4171
Abstract
Electronic health records contain the patient’s sensitive information. If these data are acquired by a malicious user, it will not only cause the pilferage of the patient’s personal data but also affect the diagnosis and treatment. One of the most challenging tasks in cloud-based healthcare systems is to provide security and privacy to electronic health records. Various probabilistic data structures and watermarking techniques were used in the cloud-based healthcare systems to secure patients’ data. Most of the existing studies focus on cuckoo and bloom filters, without considering their throughputs. In this research, a novel cloud security mechanism is introduced, which supersedes the shortcomings of existing approaches. The proposed solution enhances security with methods such as fragile watermarking, least significant bit replacement watermarking, a class reliability factor, and Morton filters included in the formation of the security mechanism. A Morton filter is an approximate set membership data structure (ASMDS) that provides many improvements over other data structures, such as cuckoo, bloom, semi-sorting cuckoo, and rank-and-select quotient filters. The Morton filter improves security; it supports insertion, deletion, and lookup operations and improves their respective throughputs by 0.9× to 15.5×, 1.3× to 1.6×, and 1.3× to 2.5× when compared to cuckoo filters. We used Hadoop version 0.20.3, and the platform was Red Hat Enterprise Linux 6; we executed five experiments, and the average of the results has been taken. The results of the simulation work show that our proposed security mechanism provides an effective solution for secure data storage in cloud-based healthcare systems, with a load factor of 0.9. Furthermore, to aid cloud security in healthcare systems, we presented the motivation, objectives, related works, major research gaps, and materials and methods; we thus presented and implemented a cloud security mechanism, in the form of an algorithm and a set of results and conclusions. Full article
(This article belongs to the Special Issue Emerging Technologies in Health Informatics and Management)
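
An approximate set-membership data structure (ASMDS) answers "is this item in the set?" with possible false positives but no false negatives. The Bloom filter below is the simplest member of the family the abstract compares (a Morton filter is a more advanced, bucketed ASMDS; this sketch only illustrates the concept, with invented parameters):

```python
import hashlib

class BloomFilter:
    """Minimal approximate set-membership data structure (ASMDS)."""
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k          # m bits, k hash functions
        self.bits = bytearray(m)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:4], "big") % self.m

    def add(self, item):
        for p in self._positions(item):
            self.bits[p] = 1

    def __contains__(self, item):
        # May return a false positive, but never a false negative
        return all(self.bits[p] for p in self._positions(item))

records = BloomFilter()
records.add("patient-1001")
```

Morton filters (like cuckoo filters) additionally support deletion and achieve the higher lookup/insert throughputs the abstract quotes, which a plain Bloom filter cannot.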

29 pages, 4955 KB  
Article
A Hadoop Based Framework Integrating Machine Learning Classifiers for Anomaly Detection in the Internet of Things
by Ikram Sumaiya Thaseen, Vanitha Mohanraj, Sakthivel Ramachandran, Kishore Sanapala and Sang-Soo Yeo
Electronics 2021, 10(16), 1955; https://doi.org/10.3390/electronics10161955 - 13 Aug 2021
Cited by 22 | Viewed by 4789
Abstract
In recent years, different botnet variants have been targeting government and private organizations, and there is a crucial need to develop a robust framework for securing the IoT (Internet of Things) network. In this paper, a Hadoop based framework is proposed to identify the malicious IoT traffic using a modified Tomek-link under-sampling integrated with automated hyper-parameter tuning of machine learning classifiers. The novelty of this paper is to utilize a big data platform for benchmark IoT datasets to minimize computational time. The IoT benchmark datasets are loaded in the Hadoop Distributed File System (HDFS) environment. Three machine learning approaches namely naive Bayes (NB), K-nearest neighbor (KNN), and support vector machine (SVM) are used for categorizing IoT traffic. Artificial immune network optimization is deployed during cross-validation to obtain the best classifier parameters. Experimental analysis is performed on the Hadoop platform. An average accuracy of 99% and 90% is obtained for the BoT-IoT and ToN-IoT datasets, respectively. The accuracy difference in the ToN-IoT dataset is due to the huge number of data samples captured at the edge layer and fog layer. However, in the BoT-IoT dataset only 5% of the training and test samples from the complete dataset are considered for experimental analysis, as released by the dataset developers. The overall accuracy is improved by 19% in comparison with state-of-the-art techniques. The computational times for the huge datasets are reduced by 3–4 hours through MapReduce in HDFS. Full article
(This article belongs to the Special Issue Security and Privacy for IoT and Multimedia Services)
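
Tomek-link under-sampling, which this paper modifies, removes the majority-class member of every pair of mutual nearest neighbors with opposite labels, cleaning the class boundary. The sketch below is the classic (unmodified) procedure on toy 1-D data, not the authors' exact algorithm:

```python
def nearest(i, X):
    """Index of the nearest other sample by squared Euclidean distance."""
    d = lambda a, b: sum((p - q) ** 2 for p, q in zip(a, b))
    return min((j for j in range(len(X)) if j != i), key=lambda j: d(X[i], X[j]))

def tomek_undersample(X, y, majority):
    """Drop the majority-class member of every Tomek link: a pair of
    mutual nearest neighbors carrying opposite labels."""
    drop = set()
    for i in range(len(X)):
        j = nearest(i, X)
        if nearest(j, X) == i and y[i] != y[j]:   # mutual NN, different classes
            if y[i] == majority:
                drop.add(i)
            elif y[j] == majority:
                drop.add(j)
    keep = [k for k in range(len(X)) if k not in drop]
    return [X[k] for k in keep], [y[k] for k in keep]

# Toy 1-D features: the majority point at 0.45 forms a Tomek link with
# the minority point at 0.5 and gets removed
X = [[0.0], [0.1], [0.45], [0.5], [2.0], [2.1]]
y = ["maj", "maj", "maj", "min", "maj", "maj"]
X2, y2 = tomek_undersample(X, y, "maj")
```

Removing these borderline majority samples gives minority classes (such as rare attack types in IoT traffic) a cleaner decision boundary before classifier training.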
