Sustainability
  • Article
  • Open Access

27 September 2021

Real-Time DDoS Attack Detection System Using Big Data Approach

1 Department of Software Engineering, University of Management and Technology, Lahore 54770, Pakistan
2 Department of Computer Engineering, National University of Technology, Islamabad 44000, Pakistan
3 College of Business, Abu Dhabi University, Abu Dhabi 59911, United Arab Emirates
4 Oxford Centre for Islamic Studies, University of Oxford, Marston Rd, Headington, Oxford OX3 0EE, UK
This article belongs to the Special Issue Big Data Security, Privacy and Sustainability

Abstract

Distributed Denial of Service (DDoS) attacks have become rampant and appear in many shapes and patterns, so they are not easy to detect and mitigate with earlier solutions. Many studies have used classification algorithms to detect DDoS attacks. DDoS attacks are easy to mount: they exploit network weaknesses and flood software services with requests. Detecting and mitigating DDoS attacks in real time is difficult, but a solution is valuable because these attacks can cause serious damage. This paper addresses the real-time prediction of application layer DDoS attacks with different machine learning models. We applied two machine learning approaches, Random Forest (RF) and Multi-Layer Perceptron (MLP), through the Scikit ML library and the big data framework Spark ML library for the detection of Denial of Service (DoS) attacks. In addition to detecting DoS attacks, we optimized the performance of the models by minimizing the prediction time compared with other existing approaches using the big data framework (Spark ML). We achieved a mean accuracy of 99.5% with the models both with and without big data approaches. However, in training and testing time, the big data approach outperforms the non-big data approach because Spark performs computations in memory in a distributed manner. The minimum average training and testing times were 14.08 and 0.04 min, respectively, using a big data tool (Apache Spark); the maximum average training and testing times were 34.11 and 0.46 min, respectively, using a non-big data approach. Using the big data approach, we can detect an attack in real time in a few milliseconds.

1. Introduction

The amount of data stored on the internet grows day by day, and so do the threats that target sensitive or crucial data, which has raised many security issues, such as malicious intrusions [1]. Targeted attacks and threats such as malware and botnets cause great damage to the community in different ways, such as financial loss or harm to health. Significant research has been conducted in which researchers have proposed different Intrusion Detection Systems (IDS) to mitigate the risk of malicious intrusion attacks [2]. Data can now be extracted easily with information retrieval models and information extraction techniques of any kind [3,4]. Traditional intrusion detection techniques work best only on slow-speed or small data. They are inefficient on big data and incapable of handling high-speed data, so new methods need to be adapted to large data to detect any signs of intrusion. The security and privacy of data is now a major challenge in the big data life cycle, particularly with respect to network attacks [5]. One major attack is the DDoS attack. DDoS attacks are cyber-attacks on specific servers or networks with the intended purpose of disrupting the normal operation of that network or server [6]. Detecting and mitigating DDoS attacks in real time is not easy, but a solution holds great value because these attacks can cause serious damage [7]. First, we need to define intrusion, intrusion detection, and intrusion detection systems.
  • According to the National Institute of Standards and Technology (NIST), intrusion is defined as bypassing the security mechanisms or violating the security policies of Confidentiality, Integrity and Availability (CIA) in computer networks [8].
  • Intrusion detection (ID) is divided into two processes: monitoring the intrusions and analyzing a network’s events to seek any malicious packets or a source in the computer network or a computer system.
  • Intrusion Detection System (IDS) detects an event as an intrusion when a noticeably different event occurs from a legitimate or authorized event [9].
Machine learning (ML) is the study of algorithms that improve through training and the use of data. It is considered a component of artificial intelligence. There are different kinds of learning depending on the information available, such as supervised, semi-supervised and unsupervised learning. ML has produced good results through bimodal approaches that combine big data and deep learning models. ML techniques are used in a broad range of applications, such as healthcare, for predicting COVID-19, osteoporosis and schistosomiasis [10,11,12,13,14,15].
Recognition and detection performance in security evaluation remains an issue, for example with DDoS attacks and blockchain. Many studies have used classification algorithms to detect DDoS attacks. DDoS attacks are easy to mount by exploiting network weaknesses and flooding software services with requests [16,17]. The elapsed time of any real-time DDoS detection is also a challenge, and because these attacks are not easy to detect and mitigate, a solution holds great value, as the attacks can cause serious damage [18]. However, the existing approaches have many problems in detecting DDoS attacks, such as the computation cost of detection and the inability to deal with the large number of requests passing through the network towards the server. Classification algorithms classify packets to distinguish DDoS packets from normal packets [19].
In recent years, many studies have applied the big data framework Spark ML to achieve better results, but, unlike our approach, they have not measured the running time after applying Spark [20,21,22,23,24,25]. Moreover, machine learning algorithms, together with the big data approach, can solve many complex problems and find many hidden patterns in data [26]. Big data approaches promote environmental sustainability by solving very complex problems in real time, in applications such as fake profile detection on social media or supply chain management with blockchain technology [27,28]. Apache Spark has evolved into a de facto platform for big data analysis. It is a high-performance cluster computing framework with language-integrated APIs in Java, R, and Python. Because it is a free software product that is evolving quickly, with a rising number of contributors from universities and industry, the full body of development and research underlying Apache Spark has proven difficult for academics to follow, particularly for novices in this domain. Practical reviews of large-scale data analysis with Apache Spark address its major components, concepts, and capabilities and, in particular, illustrate how Spark supports the design and implementation of large-scale data algorithms and pipelines for machine learning, graph analysis, and stream processing [29,30].
Two models were selected for real-time detection: Random Forest (RF) and Multi-Layer Perceptron (MLP). Both classifiers were evaluated using the Scikit ML libraries as a non-big data approach and the Spark ML libraries as a big data approach. In terms of accuracy, both approaches achieved a similar mean accuracy. In terms of training time and testing time, however, the big data approach outperforms the non-big data approach because Spark computations take place in memory in a distributed manner. The minimum average training and testing times were 14.08 and 0.04 min, respectively, using a big data tool (Apache Spark); the maximum average training and testing times were 34.11 and 0.46 min, respectively, using a non-big data approach. Using the big data approach, we can detect an attack in real time in a few milliseconds.
The following are the major contributions of our study:
  • As far as we know, no previous study has compared both accuracy and execution time between conventional machine learning and Apache Spark ML for Distributed Denial of Service (DDoS) detection.
  • We detected DoS attacks in an efficient manner with and without the use of big data machine learning approaches in Random Forest and Multi-Layer Perceptron models.
  • In addition to detecting DDoS attacks, we optimized the performance of the models by minimizing the execution time compared with other existing approaches using the big data framework.
The document is organized as follows: Section 2 describes the related work that has been done and reviews the current studies related to machine learning approaches in detecting network attacks; Section 3 presents a methodology and dataset we used for the experiments; Section 4 shows the experiments we performed using both techniques (big data and non-big data); Section 5 discusses the results obtained from both methods of detecting DDoS attacks in real-time; finally, in Section 6, the conclusions and future work are presented.

3. Material and Methodology

3.1. Dataset

We used the application layer DDoS dataset available on Kaggle. It is a large dataset of around 0.9 million records with 77 feature columns and a target column. It contains three labels: (1) DDoS slow loris; (2) DDoS Hulk; and (3) BENIGN [55].

3.2. Our Approach and Data Pre-Processing

Figure 1 shows the system block diagram that applies classification algorithms after pre-processing to detect the DDoS attack. There are four main components involved: (1) data acquisition, (2) data pre-processing, (3) classification machine model, and (4) evaluation to produce the output of whether the attack is DDoS.
Figure 1. System Block Diagram to predict DDoS Attacks.
The first component (data acquisition) of the proposed model supplies the dataset used in the pre-processing phase. In the second component (pre-processing), we prepared the data before applying any machine learning model. First, noise filtering was performed on the dataset; noise filtering is a collection of procedures used to reduce noise in collected data. Next, missing values were handled using one of several policies, such as ignoring records with missing entries, replacing missing entries with a global constant, or explicitly filling in missing attributes based on domain expertise. Third, we transformed the 3-class problem into a 2-class problem: we treated “DDoS slow loris” and “DDoS Hulk” as a single class and “BENIGN” as the other class. Fourth, we reduced the number of features from 77 to 25 using Principal Component Analysis (PCA), keeping only the components most suitable for prediction. PCA was used for dimensionality reduction in both approaches: Scikit PCA for the non-big data approach and Spark PCA for the big data approach. In the final step, all string-type data was transformed into float type. Input data for Apache Spark ML models must be in vector form, so the string values were converted to numerical values to create dense vectors. The string indexer library was used to index the target column for Spark ML, and the Standard Scaler library was used for the Scikit ML models.
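The label merging and string-to-float conversion described above can be sketched in plain Python. This is a minimal illustration, not the authors' pipeline: the function names, the sample rows, and the exact label spellings are assumptions, and the noise filtering, scaling, and PCA steps are deliberately left out.

```python
# Hypothetical label strings; the exact spellings in the dataset file are an assumption.
ATTACK_LABELS = {"DDoS slow loris", "DDoS Hulk"}
BENIGN_LABEL = "BENIGN"


def to_binary_label(label):
    """Map the three original classes onto two: 1 = DDoS, 0 = benign."""
    label = label.strip()
    if label in ATTACK_LABELS:
        return 1
    if label == BENIGN_LABEL:
        return 0
    raise ValueError(f"unexpected label: {label!r}")


def clean_rows(rows):
    """Drop rows with missing or non-numeric features; convert the rest to floats."""
    cleaned = []
    for *features, label in rows:
        try:
            values = [float(v) for v in features]
        except (TypeError, ValueError):
            continue  # one policy among several: ignore rows with missing entries
        # float() accepts "nan" and "inf" strings, so reject those explicitly.
        if any(v != v or v in (float("inf"), float("-inf")) for v in values):
            continue
        cleaned.append((values, to_binary_label(label)))
    return cleaned


# Toy rows (hypothetical values): two feature columns followed by the label.
raw = [
    ("80", "1.5", "DDoS Hulk"),
    ("443", "", "BENIGN"),  # missing value, so this row is dropped
    ("8080", "0.2", "DDoS slow loris"),
    ("53", "3.0", "BENIGN"),
]
data = clean_rows(raw)
```

Dropping incomplete rows is only one of the missing-value policies listed above; replacing the `continue` with an imputation step would implement the others.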
In the third component of our proposed model, we applied two machine learning models, RF and MLP, for training and testing. Both models were applied with and without the big data approach. We used the Scikit libraries for modeling in the non-big data approach, and for the big data approach we applied the Apache Spark ML libraries for training and testing. Finally, in the last step we measured the evaluation metrics of all the approaches and compared the results.

3.3. Classification Machine Learning Models

We applied two machine learning models for our approach in the case of with and without big data machine learning.

3.3.1. Random Forest

Random forests belong to the broad family of ensemble learning methods. They are easy to construct and quick to run, and they have proven quite useful in several fields. The fundamental premise of the Random Forest technique is to build multiple “simple” decision trees in the learning phase and to take the majority vote (the mode) across them in the classification phase. Among other advantages, this voting approach corrects the undesirable tendency of decision trees to overfit the training data. During the learning phase, random forests apply the general technique of bagging to the individual trees in the ensemble. Bagging repeatedly draws a random sample with replacement from the training set and fits a tree to each sample. Every tree is grown without any pruning. Research has used RF to model the spatial distribution of population density in a region [56].
Equation (1) gives the Gini importance of a node, assuming only two child nodes:
\[ mx_{y} = wt_{y} I_{y} - wt_{\mathrm{left}(y)} I_{\mathrm{left}(y)} - wt_{\mathrm{right}(y)} I_{\mathrm{right}(y)} \qquad (1) \]
where mx_y is the importance of node y, I is the impurity, and wt is the weighted number of samples reaching node y and its left and right child nodes.
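As a worked illustration of Equation (1), the following sketch computes the Gini impurity of a node and its importance from the labels reaching the node and its two children. The function names and the toy labels are assumptions made for illustration, and the weight of a node is taken here to be the fraction of all samples that reach it.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def node_importance(parent, left, right, n_total):
    """Weighted impurity of the node minus the weighted impurity of its children."""
    wt_parent = len(parent) / n_total
    wt_left = len(left) / n_total
    wt_right = len(right) / n_total
    return (
        wt_parent * gini_impurity(parent)
        - wt_left * gini_impurity(left)
        - wt_right * gini_impurity(right)
    )


# A perfectly separating split at the root of a toy tree (hypothetical labels).
parent = [0, 0, 0, 0, 1, 1, 1, 1]
left = [0, 0, 0, 0]
right = [1, 1, 1, 1]
importance = node_importance(parent, left, right, n_total=len(parent))
```

In this toy case the parent impurity is 0.5 and both children are pure, so the whole impurity of the parent is credited to the split.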
Figure 2 shows how the Random Forest classifier works. Several trees are used for prediction, and a voting approach determines the target output. Because tree construction involves randomness, the individual trees differ from one another.
Figure 2. Random Forests classifier.
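The voting step shown in Figure 2 can be sketched as follows. This is an illustrative fragment only: `majority_vote` and the list of per-tree predictions are hypothetical, and a real forest would obtain those predictions from trained trees.

```python
from collections import Counter


def majority_vote(tree_predictions):
    """Return the class predicted by the largest number of trees."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    # most_common(1) returns the (class, count) pair with the highest count;
    # ties go to the class that appeared first in the list.
    return Counter(tree_predictions).most_common(1)[0][0]


# Five hypothetical trees vote on one packet: 1 = DDoS, 0 = benign.
votes = [1, 0, 1, 1, 0]
prediction = majority_vote(votes)
```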
The number of trees in the ensemble is a free parameter that can readily be tuned using the out-of-bag error. Like Naive Bayes and nearest-neighbor methods, Random Forest is popular because of its simplicity and strong results. In contrast to those two techniques, however, the structure of the final trained Random Forest model is not predictable in advance.

3.3.2. Multi-Layer Perceptron

The Multi-Layer Perceptron (MLP) is one of the most common models in the domain of neural learning. An MLP is sometimes referred to as a “vanilla” neural network, and it is simpler than the intricate models of the present era. However, it has also paved the way for increasingly sophisticated convolutional neural networks [15,57].
The MLP is a feedforward neural network, meaning that information flows from the input layer towards the output layer. Weights are assigned to the links between the layers, and the weight indicates the significance of a link. The MLP is used for various tasks such as inventory evaluation. A Multi-Layer Perceptron is composed of interlinked neurons that transmit information to each other, like the human brain.
The MLP updates a weight when a misclassification error occurs. Equation (2) shows the update: the new weight is the old weight plus the learning rate (lr) multiplied by the difference between the expected value (y) and the predicted value (x), applied during the forward and backward passes.
\[ wt_{\mathrm{updated}} = wt_{\mathrm{old}} + lr \, (y - x) \, a \qquad (2) \]
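A single application of Equation (2) can be sketched as below. The text does not define the symbol a; this sketch assumes it is the input (activation) feeding the weight, as in the classic perceptron rule. The function name and numeric values are hypothetical, and the Adam optimizer used in the experiments applies a more elaborate update than this one.

```python
def update_weight(old_wt, lr, expected, predicted, activation):
    """One weight update: old weight + lr * (expected - predicted) * activation."""
    return old_wt + lr * (expected - predicted) * activation


# Misclassified sample: expected 1, predicted 0, so the weight moves.
new_wt = update_weight(old_wt=0.5, lr=0.1, expected=1.0, predicted=0.0, activation=2.0)

# Correctly classified sample: the error term is zero, so the weight is unchanged.
same_wt = update_weight(old_wt=0.5, lr=0.1, expected=1.0, predicted=1.0, activation=2.0)
```

The second call shows why the update only takes effect on errors: when the prediction matches the expected value, the correction term vanishes.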
Figure 3 shows the Multi-Layer Perceptron, including the input layer, hidden layer and output layer. A value is assigned to every neuron. The network can be split into three main layers: (1) the input layer, the initial layer that receives the input used to produce an output; (2) the hidden layer(s), of which there is at least one, which perform computations and operations on the input data to produce something meaningful; and (3) the output layer, whose neurons present the meaningful output.
Figure 3. Multi-Layer Perceptron classifier.

4. Experiments and Results

4.1. Experiment Setup

Random Forest and Multi-Layer Perceptron were used to detect an attack in real time and to evaluate performance with and without the big data approach. We used Apache Spark, a distributed big data framework, to speed up computationally expensive and time-consuming tasks. Spark ML libraries were used to evaluate performance with the big data approach, and Scikit ML libraries on Google Colab [58] were used for the non-big data approach. The system specifications of the servers used for both approaches were the same. We used Databricks Community Edition for the experiments, with 15.3 GB of available memory and two cores with a single DB [59]. Spark v3.1.1 was used for the Spark ML libraries, and no worker nodes were used. The availability zone of the server was US-West-2C.

4.2. Experimental Parameters

We used the cross-validation technique, starting with 100 trees and going up to 500 trees in steps of 100, with the other settings left at their defaults. The minimum number of samples required to split an internal node of a tree was two, and the minimum number of samples required at a leaf node was one. “Gini” was selected as the criterion for measuring the quality of a split. Moreover, the parameter that controls the number of features considered for a split was set to auto for the RF. About 70% of the data was used for training, and 30% was used for testing.
The parameters used for the experiment with the Multi-Layer Perceptron classifier were as follows. We used the cross-validation technique, starting with 100 iterations and going up to 500 iterations in steps of 100, with the other settings left at their defaults. The Adam optimizer was used for weight optimization, and the Rectified Linear Unit (ReLU) was used as the activation function. We also shuffled the samples in each iteration, and the exponential decay rate was set to 0.9. Here, too, 70% of the data was used for training and the remaining 30% was used for testing.
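The settings above can be collected into a small configuration sketch together with a 70/30 split. The dictionary keys follow scikit-learn's parameter naming, which is an assumption: the text does not name the parameters, and mapping the "exponential decay rate" to `beta_1` is a guess. Newer scikit-learn releases may not accept "auto" for `max_features`. The split helper and its seed are hypothetical.

```python
import random

# Assumed mapping of the described settings onto scikit-learn style names.
RF_PARAMS = {
    "n_estimators": [100, 200, 300, 400, 500],  # 100 to 500 trees, step 100
    "min_samples_split": 2,
    "min_samples_leaf": 1,
    "criterion": "gini",
    "max_features": "auto",
}

MLP_PARAMS = {
    "max_iter": [100, 200, 300, 400, 500],  # 100 to 500 iterations, step 100
    "solver": "adam",
    "activation": "relu",
    "shuffle": True,
    "beta_1": 0.9,  # assumed meaning of "exponential decay rate"
}


def train_test_split_70_30(records, seed=0):
    """Shuffle the records and return (train, test) with a 70/30 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


# Toy usage on 1000 record indices (hypothetical size).
train, test = train_test_split_70_30(range(1000))
```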

4.3. Evaluation Results

We measured the results of both models through accuracy, precision, recall, F1 score and the confusion matrix.
Figure 4 shows the accuracy comparison of both models, Random Forest classifier (RF) and Multi-Layer Perceptron (MLP) for both approaches, big data and non-big Data.
Figure 4. Accuracy comparison of both models (RF and MLP) for both approaches (big data and non-big data).
The RF classifier bases its decision on majority voting. For the non-big data approach, we achieved the minimum accuracy of about 99.868% when the model was trained with 100 trees and the maximum accuracy of 99.989% when it was trained with 400 trees. For the big data approach, we achieved the minimum accuracy of about 99.045% when the model was trained with 200 trees and the maximum accuracy of 99.895% when it was trained with 400 trees. The MLP classifier was trained with a minimum of 100 iterations up to a maximum of 500 iterations. For the non-big data approach, we achieved the minimum accuracy of about 97.5% when the model was trained with 200 iterations and the maximum accuracy of 99.9% when it was trained with 400 iterations. For the big data approach, we achieved the minimum accuracy of about 96.9% when the model was trained with 100 iterations and the maximum accuracy of 99.5% when it was trained with 500 iterations.
Table 1 shows the evaluation metrics (accuracy, precision, recall, and F1 score) of both models, Random Forest classifier (RF) and Multi-Layer Perceptron (MLP), for both the big data and non-big data approaches. For the RF model with the non-big data approach, the precision was 99.97% and the recall was 99.98%; the false-positive rate was 0.004, and the false-negative rate was 0.002. With the big data approach, the precision was 99.94% and the recall was 99.97%; the false-positive rate was 0.007, and the false-negative rate was 0.003.
Table 1. Evaluation metrics with and without big data approaches.
When the MLP model was evaluated, the precision of the model was 99.96%, whereas recall was 99.97%. The false-positive rate was 0.005 and false-negative rate was 0.003 for the non-big data approach. When the big data approach was used, the precision of the model was 99.94%, whereas recall was 99.97%. The false-positive rate was 0.007 and false-negative rate was 0.003.
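For reference, the metrics reported in this section all follow from the four cells of a binary confusion matrix. The sketch below shows the standard definitions; the function name and the counts are hypothetical and are not taken from the experiments, and the function assumes every denominator is non-zero.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary classification metrics from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),  # false-positive rate
        "fnr": fn / (fn + tp),  # false-negative rate
    }


# Hypothetical counts for illustration only.
m = classification_metrics(tp=90, fp=10, tn=95, fn=5)
```

Note that the false-negative rate and recall always sum to one, which gives a quick consistency check on reported values.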

4.4. Execution Time

Our main aim was to measure the execution time of all four approaches.
Figure 5 shows the running time comparison of both Random Forest classifier (RF) and Multi-Layer Perceptron (MLP) for both approaches, big data and non-big data.
Figure 5. Running time comparison of both models (RF and MLP) for both approaches (big data and non-big data).
For the RF model with the non-big data approach, we achieved the minimum testing time of about 0.2 min when the model was trained with 100 trees, and the maximum testing time was around 0.8 min when the model was trained with 400 trees. For the big data approach, we achieved the minimum testing time of about 0.064 min when the model was trained with 100 trees, and the maximum testing time was around 0.148 min when the model was trained with 300 trees.
For the MLP model with the non-big data approach, we achieved the minimum testing time of about 0.065 min when the model was trained with 300 iterations, and the maximum testing time was around 0.168 min when the model was trained with 500 iterations. For the big data approach, we achieved the minimum testing time of about 0.023 min when the model was trained with 500 iterations, and the maximum testing time was around 0.073 min when the model was trained with 100 iterations.
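Because the testing times above are reported in minutes for a whole test batch, it can help to convert them to seconds and to a per-record latency. The following sketch does this and also shows one way to time a call with the standard library. The helper names are hypothetical, and the record count of 270,000 is only an assumption (roughly 30% of about 0.9 million records), not a figure reported in the text.

```python
import time


def minutes_to_seconds(minutes):
    """Convert a duration in minutes to seconds."""
    return minutes * 60.0


def per_record_latency_ms(batch_minutes, n_records):
    """Average time per record in milliseconds for a batch prediction."""
    return minutes_to_seconds(batch_minutes) * 1000.0 / n_records


def time_call(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


# Minimum MLP testing time with the big data approach: 0.023 min.
batch_seconds = minutes_to_seconds(0.023)

# Assumed test-set size; see the note above.
latency_ms = per_record_latency_ms(0.023, n_records=270_000)

# Timing an arbitrary function call, here the built-in sum.
total, elapsed = time_call(sum, [1, 2, 3])
```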

5. Discussion

In terms of accuracy, we achieved a similar mean accuracy of the approaches, but in terms of training time and testing time the big data approach outperforms the non-big data approach due to the fact that Spark does all computations in memory in a distributed manner. In addition, the Random Forest classifier outperforms MLP, having a higher mean accuracy. In Figure 6, a comparison of average accuracies is presented for both the approaches.
Figure 6. Average accuracies comparison between models using big data approach and non-big data approach.
It is clearly seen that the average accuracy of the big data approach and the non-big data approach was roughly similar. Figure 7 shows the average training time of the models using the Scikit libraries and the Apache Spark ML libraries. The RF models took more time to train because RF is an ensemble technique in which many decision trees must be trained.
Figure 7. Average training time comparison between models using big data approach and non-big data approach.
Figure 8 shows the average testing time of the models using the Scikit libraries and the Apache Spark ML libraries. It can clearly be seen that by using the big data approach, we reduced the prediction time of the model to about 75% with almost similar prediction accuracy. The MLP algorithm achieves a minimum prediction time of a few milliseconds for predicting an attack.
Figure 8. Average testing time comparison between models using big data approach and non-big data approach.
Table 2 lists recent studies on DDoS detection using various machine learning, deep learning and big data approaches, together with their accuracies and execution times.
Table 2. State-of-the-art comparison work with our approaches in terms of accuracy and execution time.
It can clearly be seen in Table 2 that our approach using Random Forest and Multi-Layer Perceptron with the big data framework Spark achieved better accuracy and processing time overall. The minimum accuracy achieved in those studies was 56.64% in [34], using a Naive Bayes classifier with the non-big data approach. By comparison, the minimum accuracy of our approach was 99.05% with the MLP classifier using the non-big data approach. The maximum accuracy achieved in those studies was 99.88% in [54] with the REPTree model but without the big data approach, whereas the maximum accuracy of our approach was 99.94% with the RF classifier using the big data approach. The minimum processing time achieved in other studies was 0.48 s with a Logistic Regression classifier. Using our proposed model, the minimum processing time achieved was 0.04 s using the MLP classifier with the big data approach. Our proposed models achieved high accuracy and low processing time compared with other existing models.
However, our study has some limitations. First, we used only two machine learning models. Second, the dataset has only two classes.

6. Conclusions

With the growing presence of data on the internet, new opportunities for threats to target sensitive data have raised many security issues, such as malicious intrusions. One major type of attack is the DDoS attack. Traditional intrusion detection techniques work best only on slow-speed or small data. They are inefficient on big data and incapable of handling high-speed data, so new methods adapted to large data are needed to detect any signs of intrusion. In this paper, we predicted DDoS attacks in real time with different machine learning models using a big data approach. We used a distributed system, Apache Spark, together with classification algorithms to speed up the algorithms' execution. Additionally, we compared the results of the big data approach with those of the non-big data approach and showed how the former outperforms the latter. Apache Spark is a big data tool that, with the Spark ML libraries, can detect an attack in real time. We applied two machine learning approaches, Random Forest (RF) and Multi-Layer Perceptron (MLP), through the Scikit ML library and the big data framework Spark ML library for the detection of DoS attacks. In addition to detecting DoS attacks, we optimized the performance of the models by minimizing the prediction time compared with other existing approaches using the big data framework. We achieved a similar mean accuracy across the models used, but in terms of training time and testing time the big data approach outperforms the non-big data approach because Spark performs computations in memory in a distributed manner. The minimum average training and testing times were 14.08 and 0.04 min, respectively, using the big data tool (Apache Spark), and the maximum average training and testing times were 34.11 and 0.46 min, respectively, using the non-big data approach. Using the big data approach, we were able to detect an attack in real time in a few milliseconds.
In the future, we will compare Apache Spark with other big data tools in terms of the accuracy, training time and testing time of the machine learning models. We could also train different models and combine them with deep learning approaches, using convolutional neural network architectures to predict real-time results [60,61,62]. Moreover, we could also apply the datasets used in recent work on the big deep learning (Big DL) framework [63].

Author Contributions

All authors contributed equally to this work; conceptualization, M.J.A., A.Y., H.N., U.F., O.H., M.H., A.M.Z. and H.M.A.B.; methodology, M.J.A., U.F., O.H., M.H., A.M.Z. and H.M.A.B.; software, M.J.A., A.Y., H.N., U.F., O.H., M.H., A.M.Z. and H.M.A.B.; validation, M.J.A., A.Y., H.N., U.F., O.H., M.H., A.M.Z. and H.M.A.B.; investigation, M.J.A., A.Y., H.N., U.F., O.H., M.H., A.M.Z. and H.M.A.B.; resources; M.J.A., A.Y., H.N., U.F., O.H., M.H., A.M.Z. and H.M.A.B.; data curation, M.J.A., A.Y., H.N., U.F., O.H., M.H., A.M.Z. and H.M.A.B.; writing—original draft preparation, M.J.A., A.Y., H.N., U.F. and H.M.A.B.; writing—review and editing, M.J.A., H.N., U.F., O.H. and H.M.A.B.; visualization, M.J.A., A.Y., H.N., U.F., O.H. and H.M.A.B.; project administration M.J.A., A.Y., H.N., O.H., M.H. and A.M.Z.; funding acquisition, M.J.A., A.Y., H.N.; A.M.Z. and H.M.A.B. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not Applicable.

Data Availability Statement

The dataset is publicly available on the Kaggle website: https://www.kaggle.com/wardac/applicationlayer-ddos-dataset (accessed on 7 November 2019).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript.
AODE: Average One-Dependence Estimator
CAIDM: Collaborative and Adaptive Intrusion Detection Model
CIA: Confidentiality, integrity and availability
CNN: Convolutional Neural Network
DBN: Deep Belief Network
DBN-EGWO-KELM: Deep Belief Network - Enhanced Grey Wolf Optimizer - Kernel-based Extreme Learning Machine
DDoS: Distributed Denial of Service
DoS: Denial of Service
DR: Diabetic retinopathy
DT: Decision Tree
E-CARGO: Environments classes, agents, roles, groups, and objects
EGWO: Enhanced Grey Wolf Optimizer
ELM: Extreme Learning Machine
FCM: Fuzzy C-Means
HG-GA: Hyper-graph based Genetic Algorithm
ID: Intrusion Detection
IDS: Intrusion Detection System
KELM: Kernel-based Extreme Learning Machine
KM: K-Means
KNN: K-Nearest Neighbors
LFCM: Literal Fuzzy c-Means
LMDRT: Logarithm Marginal Density Ratios Transformation
LR: Logistic Regression
LSTM: Long Short-Term Memory
ML: Machine learning
MLP: Multi-Layer Perceptron
MQTT: Message Queuing Telemetry Transport
NB: Naïve Bayes
NIST: National Institute of Standards and Technology
NSL: Network Security Laboratory
PCA: Parallel Principal Component Analysis
PSO and KNN: Particle Swarm Optimization and K-Nearest Neighbors
RF: Random Forest
RNN: Recurrent Neural Network
RNN-IDS: Recurrent Neural Network Intrusion Detection System
SDN: Software Defined Network
SGD: Stochastic Gradient Descent
SRSIO-FCM: The Scalable Random Sampling with Iterative Optimization Fuzzy c-Means algorithm
SVC: Support Vector Classifier
SVC-RF: Support Vector Classifier with Random Forest
SVM: Support Vector Machine

References

  1. Munoz-Arcentales, A.; López-Pernas, S.; Pozo, A.; Alonso, Á.; Salvachúa, J.; Huecas, G. Data Usage and Access Control in Industrial Data Spaces: Implementation Using FIWARE. Sustainability 2020, 12, 3885.
  2. Song, J.; Lee, Y.; Choi, J.-W.; Gil, J.-M.; Han, J.; Choi, S.-S. Practical In-Depth Analysis of IDS Alerts for Tracing and Identifying Potential Attackers on Darknet. Sustainability 2017, 9, 262.
  3. Rehma, A.A.; Awan, M.J.; Butt, I. Comparison and Evaluation of Information Retrieval Models. VFAST Trans. Softw. Eng. 2018, 6, 7–14.
  4. Alam, T.M.; Awan, M.J. Domain analysis of information extraction techniques. Int. J. Multidiscip. Sci. Eng. 2018, 9, 1–9.
  5. Koo, J.; Kang, G.; Kim, Y.-G. Security and Privacy in Big Data Life Cycle: A Survey and Open Challenges. Sustainability 2020, 12, 10571.
  6. Privalov, A.; Lukicheva, V.; Kotenko, I.; Saenko, I. Method of Early Detection of Cyber-Attacks on Telecommunication Networks Based on Traffic Analysis by Extreme Filtering. Energies 2019, 12, 4768.
  7. Nishanth, N.; Mujeeb, A. Modeling and detection of flooding-based denial-of-service attack in wireless ad hoc network using Bayesian inference. IEEE Syst. J. 2020, 15, 17–26.
  8. Scarfone, K.; Mell, P. Guide to intrusion detection and prevention systems (idps). NIST Spec. Publ. 2007, 800, 94.
  9. Mukherjee, B.; Heberlein, L.T.; Levitt, K.N. Network intrusion detection. IEEE Netw. 1994, 8, 26–41.
  10. Gupta, M.; Jain, R.; Arora, S.; Gupta, A.; Awan, M.J.; Chaudhary, G.; Nobanee, H. AI-enabled COVID-19 Outbreak Analysis and Prediction: Indian States vs. Union Territories. Comput. Mater. Contin. 2021, 67, 933–950.
  11. Anam, M.; Ponnusamy, V.; Hussain, M.; Nadeem, M.W.; Javed, M.; Guan Goh, H.; Qadeer, S. Osteoporosis Prediction for Trabecular Bone using Machine Learning: A Review. Comput. Mater. Contin. 2021, 67, 89–105.
  12. Ali, Y.; Farooq, A.; Alam, T.M.; Farooq, M.S.; Awan, M.J.; Baig, T.I. Detection of Schistosomiasis Factors Using Association Rule Mining. IEEE Access 2019, 7, 186108–186114.
  13. Javed, R.; Saba, T.; Humdullah, S.; Jamail, N.S.M.; Awan, M.J. An Efficient Pattern Recognition Based Method for Drug-Drug Interaction Diagnosis. In Proceedings of the 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia, 6–7 April 2021; pp. 221–226.
  14. Nagi, A.T.; Awan, M.J.; Javed, R.; Ayesha, N. A Comparison of Two-Stage Classifier Algorithm with Ensemble Techniques On Detection of Diabetic Retinopathy. In Proceedings of the 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia, 6–7 April 2021; pp. 212–215.
  15. Abdullah, A.; Awan, M.; Shehzad, M.; Ashraf, M. Fake News Classification Bimodal using Convolutional Neural Network and Long Short-Term Memory. Int. J. Emerg. Technol. Learn. 2020, 11, 209–212.
  16. Polat, H.; Polat, O.; Cetin, A. Detecting DDoS Attacks in Software-Defined Networks Through Feature Selection Methods and Machine Learning Models. Sustainability 2020, 12, 1035.
  17. Ochôa, I.S.; Leithardt, V.R.Q.; Calbusch, L.; Santana, J.F.D.P.; Parreira, W.D.; Seman, L.O.; Zeferino, C.A. Performance and Security Evaluation on a Blockchain Architecture for License Plate Recognition Systems. Appl. Sci. 2021, 11, 1255.
  18. Dos Anjos, J.C.S.; Gross, J.L.G.; Matteussi, K.J.; González, G.V.; Leithardt, V.R.Q.; Geyer, C.F.R. An Algorithm to Minimize Energy Consumption and Elapsed Time for IoT Workloads in a Hybrid Architecture. Sensors 2021, 21, 2914.
  19. Ganguly, S.; Garofalakis, M.; Rastogi, R.; Sabnani, K. Streaming algorithms for robust, real-time detection of DDoS attacks. In Proceedings of the 27th International Conference on Distributed Computing Systems (ICDCS’07), Toronto, ON, Canada, 25–27 June 2007; p. 4. [Google Scholar]
  20. Awan, M.J.; Rahim, M.S.M.; Nobanee, H.; Yasin, A.; Khalaf, O.I.; Ishfaq, U. A Big Data Approach to Black Friday Sales. Intell. Autom. Soft Comput. 2021, 27, 785–797. [Google Scholar] [CrossRef]
  21. Awan, M.J.; Rahim, M.S.M.; Nobanee, H.; Munawar, A.; Yasin, A.; Azlanmz, A.M.Z. Social Media and Stock Market Prediction: A Big Data Approach. Comput. Mater. Contin. 2021, 67, 2569–2583. [Google Scholar] [CrossRef]
  22. Ahmed, H.M.; Javed Awan, M.; Khan, N.S.; Yasin, A.; Shehzad, H.M.F. Sentiment Analysis of Online Food Reviews using Big Data Analytics. Elem. Educ. Online 2021, 20, 827–836. [Google Scholar]
  23. Awan, M.J.; Khan, R.A.; Nobanee, H.; Yasin, A.; Anwar, S.M.; Naseem, U.; Singh, V.P. A Recommendation Engine for Predicting Movie Ratings Using a Big Data Approach. Electronics 2021, 10, 1215. [Google Scholar] [CrossRef]
  24. Awan, M.J.; Gilani, S.A.H.; Ramzan, H.; Nobanee, H.; Yasin, A.; Zain, A.M.; Javed, R. Cricket Match Analytics Using the Big Data Approach. Electronics 2021, 10, 2350. [Google Scholar] [CrossRef]
  25. Khalil, A.; Awan, M.J.; Yasin, A.; Singh, V.P.; Shehzad, H.M.F. Flight Web Searches Analytics through Big Data. Int. J. Comput. Appl. Technol. 2021, in press. [Google Scholar]
  26. Zhou, L.; Pan, S.; Wang, J.; Vasilakos, A.V. Machine learning on big data: Opportunities and challenges. Neurocomputing 2017, 237, 350–361. [Google Scholar] [CrossRef] [Green Version]
  27. Park, K.O. A study on sustainable usage intention of blockchain in the big data era: Logistics and supply chain management companies. Sustainability 2020, 12, 10670. [Google Scholar] [CrossRef]
  28. Awan, M.J.; Khan, M.A.; Ansari, Z.K.; Yasin, A.; Shehzad, H.M.F. Fake Profile Recognition using Big Data Analytics in Social Media Platforms. Int. J. Comput. Appl. Technol. 2021, in press. [Google Scholar]
  29. Kshetri, N.; Torres, D.C.R.; Besada, H.; Ochoa, M.A.M. Big Data as a Tool to Monitor and Deter Environmental Offenders in the Global South: A Multiple Case Study. Sustainability 2020, 12, 10436. [Google Scholar] [CrossRef]
  30. Awan, M.J.; Yasin, A.; Nobanee, H.; Ali, A.A.; Shahzad, Z.; Nabeel, M.; Zain, A.M.; Shahzad, H.M.F. Fake News Data Exploration and Analytics. Electronics 2021, 10, 2326. [Google Scholar] [CrossRef]
  31. Zhang, H.; Dai, S.; Li, Y.; Zhang, W. Real-time distributed-random-forest-based network intrusion detection system using Apache spark. In Proceedings of the 2018 IEEE 37th International Performance Computing and Communications Conference (IPCCC), Orlando, FL, USA, 17–19 November 2018; pp. 1–7. [Google Scholar]
  32. Wang, H.; Xiao, Y.; Long, Y. Research of intrusion detection algorithm based on parallel SVM on spark. In Proceedings of the 2017 7th IEEE International Conference on Electronics Information and Emergency Communication (ICEIEC), Macau, China, 21–23 July 2017; pp. 153–156. [Google Scholar]
  33. Zekri, M.; El Kafhali, S.; Aboutabit, N.; Saadi, Y. DDoS attack detection using machine learning techniques in cloud computing environments. In Proceedings of the 2017 3rd International Conference of Cloud Computing Technologies and Applications (CloudTech), Rabat, Morocco, 24–26 October 2017; pp. 1–7. [Google Scholar]
  34. Halimaa, A.; Sundarakantham, K. Machine learning based intrusion detection system. In Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 23–25 April 2019; pp. 916–920. [Google Scholar]
  35. Raman, M.G.; Somu, N.; Kirthivasan, K.; Liscano, R.; Sriram, V.S.S. An efficient intrusion detection system based on hypergraph-Genetic algorithm for parameter optimization and feature selection in support vector machine. Knowl.-Based Syst. 2017, 134, 1–12. [Google Scholar] [CrossRef]
  36. Wang, H.; Gu, J.; Wang, S. An effective intrusion detection framework based on SVM with feature augmentation. Knowl.-Based Syst. 2017, 136, 130–139. [Google Scholar] [CrossRef]
  37. Teng, S.; Wu, N.; Zhu, H.; Teng, L.; Zhang, W. SVM-DT-based adaptive and collaborative intrusion detection. IEEE/CAA J. Autom. Sin. 2017, 5, 108–118. [Google Scholar] [CrossRef]
  38. Ahmad, I.; Basheri, M.; Iqbal, M.J.; Rahim, A. Performance comparison of support vector machine, random forest, and extreme learning machine for intrusion detection. IEEE Access 2018, 6, 33789–33795. [Google Scholar] [CrossRef]
  39. Yin, C.; Zhu, Y.; Fei, J.; He, X. A deep learning approach for intrusion detection using recurrent neural networks. IEEE Access 2017, 5, 21954–21961. [Google Scholar] [CrossRef]
  40. Li, Z.; Yan, G. A Spark Platform-Based Intrusion Detection System by Combining MSMOTE and Improved Adaboost Algorithms. In Proceedings of the 2018 IEEE 9th International Conference on Software Engineering and Service Science (ICSESS), Beijing, China, 23–25 November 2018; pp. 1046–1049. [Google Scholar]
  41. Aftab, M.O.; Awan, M.J.; Khalid, S.; Javed, R.; Shabir, H. Executing Spark BigDL for Leukemia Detection from Microscopic Images using Transfer Learning. In Proceedings of the 2021 1st International Conference on Artificial Intelligence and Data Analytics (CAIDA), Riyadh, Saudi Arabia, 6–7 April 2021; pp. 216–220. [Google Scholar]
  42. Al-Qatf, M.; Lasheng, Y.; Al-Habib, M.; Al-Sabahi, K. Deep learning approach combining sparse autoencoder with SVM for network intrusion detection. IEEE Access 2018, 6, 52843–52856. [Google Scholar] [CrossRef]
  43. Kato, K.; Klyuev, V. Development of a network intrusion detection system using Apache Hadoop and Spark. In Proceedings of the 2017 IEEE Conference on Dependable and Secure Computing, Taipei, Taiwan, 7–10 August 2017; pp. 416–423. [Google Scholar]
  44. Marir, N.; Wang, H.; Feng, G.; Li, B.; Jia, M. Distributed abnormal behavior detection approach based on deep belief network and ensemble SVM using spark. IEEE Access 2018, 6, 59657–59671. [Google Scholar] [CrossRef]
  45. Kim, J.; Kim, J.; Thu, H.L.T.; Kim, H. Long short term memory recurrent neural network classifier for intrusion detection. In Proceedings of the 2016 International Conference on Platform Technology and Service (PlatCon), Jeju, Korea, 15–17 February 2016; pp. 1–5. [Google Scholar]
  46. Jha, P.; Tiwari, A.; Bharill, N.; Ratnaparkhe, M.; Nagendra, N.; Mounika, M. Fuzzy-Based Kernelized Clustering Algorithms for Handling Big Data Using Apache Spark. In Proceedings of 6th International Conference on Harmony Search, Soft Computing and Applications, ICHSA 2020, Advances in Intelligent Systems and Computing; Nigdeli, S.M., Kim, J.H., Bekdaş, G., Yadav, A., Eds.; Springer: Singapore, 2021; Volume 1275. [Google Scholar] [CrossRef]
  47. Saravanan, S. Performance evaluation of classification algorithms in the design of Apache Spark based intrusion detection system. In Proceedings of the 2020 5th International Conference on Communication and Electronics Systems (ICCES), Coimbatore, India, 10–12 June 2020; pp. 443–447. [Google Scholar]
  48. Syed, N.F.; Baig, Z.; Ibrahim, A.; Valli, C. Denial of service attack detection through machine learning for the IoT. J. Inf. Telecommun. 2020, 4, 482–503. [Google Scholar] [CrossRef]
  49. Priya, S.S.; Sivaram, M.; Yuvaraj, D.; Jayanthiladevi, A. Machine learning based DDoS detection. In Proceedings of the 2020 International Conference on Emerging Smart Computing and Informatics (ESCI), Pune, India, 12–14 March 2020; pp. 234–237. [Google Scholar]
  50. Ujjan, R.M.A.; Pervez, Z.; Dahal, K.; Khan, W.A.; Khattak, A.M.; Hayat, B. Entropy Based Features Distribution for Anti-DDoS Model in SDN. Sustainability 2021, 13, 1522. [Google Scholar] [CrossRef]
  51. Gadze, J.D.; Bamfo-Asante, A.A.; Agyemang, J.O.; Nunoo-Mensah, H.; Opare, K.A.-B. An Investigation into the Application of Deep Learning in the Detection and Mitigation of DDOS Attack on SDN Controllers. Technologies 2021, 9, 14. [Google Scholar] [CrossRef]
  52. Ahuja, N.; Singal, G.; Mukhopadhyay, D.; Kumar, N. Automated DDOS attack detection in software defined networking. J. Netw. Comput. Appl. 2021, 187, 103108. [Google Scholar] [CrossRef]
  53. Wang, Z.; Zeng, Y.; Liu, Y.; Li, D. Deep belief network integrating improved kernel-based extreme learning machine for network intrusion detection. IEEE Access 2021, 9, 16062–16091. [Google Scholar] [CrossRef]
  54. Dehkordi, A.B.; Soltanaghaei, M.; Boroujeni, F.Z. The DDoS attacks detection through machine learning and statistical methods in SDN. J. Supercomput. 2021, 77, 2383–2415. [Google Scholar] [CrossRef]
  55. Warda. Application-Layer DDoS Dataset. Available online: https://www.kaggle.com/wardac/applicationlayer-ddos-dataset (accessed on 7 November 2019).
  56. Wang, F.; Lu, W.; Zheng, J.; Li, S.; Zhang, X. Spatially explicit mapping of historical population density with random forest regression: A case study of Gansu Province, China, in 1820 and 2000. Sustainability 2020, 12, 1231. [Google Scholar] [CrossRef] [Green Version]
  57. Awan, M.J.; Raza, A.; Yasin, A.; Shehzad, H.M.F.; Butt, I. The Customized Convolutional Neural Network of Face Emotion Expression Classification. Ann. Rom. Soc. Cell Biol. 2021, 25, 5296–5304. [Google Scholar]
  58. Awan, M.J. Acceleration of Knee MRI Cancellous bone Classification on Google Colaboratory using Convolutional Neural Network. Int. J. Adv. Trends Comput. Sci. Eng. 2019, 8, 83–88. [Google Scholar] [CrossRef]
  59. Salloum, S.; Dautov, R.; Chen, X.; Peng, P.X.; Huang, J.Z. Big data analytics on Apache Spark. Int. J. Data Sci. Anal. 2016, 1, 145–164. [Google Scholar] [CrossRef] [Green Version]
  60. Mujahid, A.; Awan, M.J.; Yasin, A.; Mohammed, M.A.; Damaševičius, R.; Maskeliūnas, R.; Abdulkareem, K.H. Real-Time Hand Gesture Recognition Based on Deep Learning YOLOv3 Model. Appl. Sci. 2021, 11, 4164. [Google Scholar] [CrossRef]
  61. Mubashar, R.; Awan, M.J.; Ahsan, M.; Yasin, A.; Singh, V.P. Efficient Residential Load Forecasting using Deep Learning Approach. Int. J. Comput. Appl. Technol. 2021, in press. [Google Scholar]
  62. Awan, M.J.; Rahim, M.S.M.; Salim, N.; Mohammed, M.A.; Garcia-Zapirain, B.; Abdulkareem, K.H. Efficient Detection of Knee Anterior Cruciate Ligament from Magnetic Resonance Imaging Using Deep Learning Approach. Diagnostics 2021, 11, 105. [Google Scholar] [CrossRef] [PubMed]
  63. Awan, M.J.; Bilal, M.H.; Yasin, A.; Nobanee, H.; Khan, N.S.; Zain, A.M. Detection of COVID-19 in Chest X-ray Images: A Big Data Enabled Deep Learning Approach. Int. J. Environ. Res. Public Health 2021, 18, 10147. [Google Scholar] [CrossRef]
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
