The Next Generation Cognitive Security Operations Center: Adaptive Analytic Lambda Architecture for Efficient Defense against Adversarial Attacks

Abstract: A Security Operations Center (SOC) is a central technical level unit responsible for monitoring, analyzing, assessing, and defending an organization's security posture on an ongoing basis. The SOC staff works closely with incident response teams, security analysts, network engineers and organization managers, using sophisticated data processing technologies such as security analytics, threat intelligence, and asset criticality to ensure that security issues are detected, analyzed and finally addressed quickly. Those techniques are part of a reactive security strategy because they rely on the human factor, experience and the judgment of security experts, using supplementary technology to evaluate the risk impact and minimize the attack surface. This study suggests an active security strategy that adopts a vigorous method including ingenuity, data analysis, processing and decision-making support to face various cyber hazards. Specifically, the paper introduces a novel intelligence-driven cognitive computing SOC that is based exclusively on progressive, fully automatic procedures. The proposed λ-Architecture Network Flow Forensics Framework (λ-NF3) is an efficient cybersecurity defense framework against adversarial attacks. It implements the Lambda machine learning architecture, which can analyze a mixture of batch and streaming data, using two accurate novel computational intelligence algorithms. Specifically, it uses an Extreme Learning Machine neural network with Gaussian Radial Basis Function kernel (ELM/GRBFK) for the batch data analysis and a Self-Adjusting Memory k-Nearest Neighbors classifier (SAM/k-NN) to examine patterns from real-time streams. It is a forensics tool for big data that can enhance the automated defense strategies of SOCs so that they respond effectively to the threats their environments face.


Introduction
The cybersecurity threat landscape for interconnected and networked devices keeps expanding, and the volume of data is growing exponentially, so it is more important than ever for critical infrastructures and organizations to be strengthened with intelligence-driven security management and monitoring tools. Using the right combination of these intelligent centralized tools and big data technologies allows risks to be classified with high accuracy across network infrastructures and sophisticated attacks to be identified. Nevertheless, current SOCs rely mostly on human experience and expert opinion to evaluate and minimize potential cyber threats. Traditional signature-based security systems, in turn, are in most cases unable to identify evolving threats such as zero-day malware, or they produce a vast number of false alarms, and have thus proven ineffective as security management tools. Alternative, more innovative and more effective intelligent methods with fully automated capabilities therefore appear necessary to produce an up-to-date SOC that can handle security incidents.
Accordingly, SOCs are being forced to consider new ways to boost their cyber defenses, such as cloud strategies, big data analytics and artificial intelligence technologies, which are emerging as the frontrunners in the fight against cybercrime. With fully self-governed systems that mimic the functioning of the human brain and help to improve decision-making with minimum human interference, a Next Generation Cognitive Computing SOC (NGC2SOC) is in a far better position to strengthen and reinforce cybersecurity strategies. The ultimate purpose of the NGC2SOC is to provide sophisticated intelligence-driven tactics for real-time investigation of both known and unknown vulnerabilities, immediate access, evidence visualization and additional advanced tools or practices that reduce the potential risk to critical assets, combined with completely automated remediation of cybersecurity problems.
Machine learning is used to develop sophisticated representations and systems that produce dependable, repeatable decisions and discover unseen or hidden patterns by learning from historical data. In these models, the training and test data are expected to be drawn from identical, although possibly unknown, distributions; as a result, the models can be very sensitive to slight changes in the input or to a series of specific transformations [1]. Under certain circumstances, these sensitivities may alter the behavior of the machine learning algorithms. Specifically, machine learning systems are vulnerable to crafted adversarial examples, which may be imperceptible to the human eye but can lead the model to misclassify the input. In recent times, different types of adversaries, depending on their threat model, have leveraged these vulnerabilities to compromise machine learning systems wherever they have high incentives to do so.
An adversarial attack is an attempt to maliciously manipulate the input data or to exploit specific weaknesses of machine learning procedures in order to compromise the entire security system. For example, a trained neural network classifier decides which class a new observation belongs to based on a training set of observations whose class membership is known. The classification threshold is imperfect, and an appropriately designed and implemented adversarial attack, which corresponds to a modified input that may come from a modified dataset, can lead the algorithm to a wrong solution (wrong class). This is because neural networks operate on high-dimensional data, are prone to overfitting, can be too linear, and are characterized by the inherent uncertainty of their predictions.
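The effect of a small, targeted input modification on a model that is "too linear" can be illustrated with a toy example. The sketch below is not taken from the paper: the weights, bias and input are made-up values, and the perturbation rule (a bounded step against the sign of the weights, in the spirit of the fast gradient sign method) is only one well-known way of crafting such an input.

```python
# Hypothetical linear classifier: weights and bias are illustration values only.
WEIGHTS = [0.9, -0.4, 0.3]
BIAS = -0.1


def score(x):
    """Linear decision score w.x + b."""
    return sum(w * xi for w, xi in zip(WEIGHTS, x)) + BIAS


def predict(x):
    """Class 1 if the score is positive, otherwise class 0."""
    return 1 if score(x) > 0 else 0


def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def perturb(x, epsilon):
    """Shift every feature by at most epsilon in the direction that lowers the score.

    For a linear model the gradient of the score with respect to x is the
    weight vector, so the step is -epsilon * sign(w).
    """
    return [xi - epsilon * sign(w) for xi, w in zip(x, WEIGHTS)]


original = [0.3, 0.2, 0.1]
adversarial = perturb(original, epsilon=0.1)

print(predict(original), predict(adversarial))
```

No feature moves by more than 0.1, yet the score drops by 0.1 times the sum of the absolute weights, which is enough here to cross the decision threshold and change the predicted class.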
To understand the security properties of learning algorithms in adversarial settings, one should address the following main issues:


(1) identifying potential vulnerabilities of machine learning algorithms during learning and classification;
(2) devising appropriate attacks that correspond to the identified threats and evaluating their impact on the targeted system; and
(3) proposing countermeasures to improve the security of machine learning algorithms against the considered attacks.
In general, there are two defense strategies against adversarial attacks. The first is the reactive strategy, which consists of training another classifier, constructed with a different execution mode and different constraint settings so that it yields dissimilar decision boundaries even if all other conditions remain the same. For example, classifiers with multiple levels of diversity, which use dissimilar functional settings and diverse training sets, form dissimilar decision boundaries and can be combined to reduce the overall error. The second is the proactive strategy, which relies on suitable precautionary training capable of establishing the exact decision boundaries. An investigation of the training process of the learning model should try to discover the optimum weights. The weight vector is a very important parameter, as it is used in determining the confidence of the classifiers and of the pattern recognition process. For example, large weights strongly influence how the class boundaries of the overall model are regulated. Hence, it is extremely important to make machine learning algorithms robust against these adversaries.
This paper proposes an innovative λ-Architecture Network Flow Forensics Framework (λ-NF3) for network traffic analysis, demystification of malware traffic and encrypted traffic identification, as an efficient defense against adversarial attacks. The λ-NF3 is an effective and accurate network administration system that offers intelligent network flow forensics methods. It is intended for NGC2SOCs that can operate without relying on human experience and expert opinion to evaluate and minimize potential cyber threats. A basic innovation of the proposed methodology is the combination, for the first time, of two sophisticated algorithms in a hybrid machine learning framework. The framework employs a specific version of the Lambda architecture, combining an Extreme Learning Machine with Gaussian Radial Basis Function kernel (ELM/GRBFK) for batch data classification with a k-NN Classifier with Self-Adjusting Memory (SAM/k-NN) for investigating real-time data streams. The Lambda architecture was chosen because, in highly complex multifactorial problems on large datasets such as the one under consideration, the outcomes of the estimation are multi-variable, especially with respect to the analysis and integration of network data flows. This implementation follows a reactive cybersecurity strategy for dealing with adversarial attacks, as it trains two diametrically opposite classifiers to detect incoming potential threats and discard them. In addition, it is important to highlight that the proposed scheme offers high learning speed, ease of execution, minimal human involvement and minimal requirements in computational power and resources.
The remainder of this paper is organized as follows. Section 2 reviews the literature on machine learning approaches used in traffic analysis, on how the Lambda architecture improves the overall accuracy of a big data model, and on some interesting methods for hardening a machine learning system against adversarial attacks. Section 3 defines the proposed framework. Section 4 outlines the methodology. Section 5 presents the datasets. Section 6 explains the results. Section 7 presents the conclusions.

Literature Review
Adversarial attacks have been recorded against spam filtering, where spam messages are obscured through spelling mistakes [2]; in computer networks, where malware code masquerades as benign network packets [3]; in antivirus software, where malicious programs pass the signature detection test [4]; and in biometric recognition, where false biometric features may be used to mimic an authentic user (biometric spoofing) [5]. The network flow classification and categorization problem requires a vast amount of computing resources [6]. In addition, the exponential increase in the network data collected daily has led to the need for big data storage applications designed for high capacity, low latency, and rapid analytics. Furthermore, the velocity of data and the necessity of real-time analysis, combined with the variety of both structured and unstructured data forms, present many challenges, including scalability and storage bottlenecks, noise accumulation, spurious correlation, incidental endogeneity, and measurement errors. However, the biggest challenge in the analysis of network flow data, which is a big data processing problem, is achieving pattern recognition and knowledge mining by employing intelligent systems with proper architectures [7].
In addition, SOC staff are assisted by visual tools when studying big data. Their interpretation of what is shown on the screen may vary with their familiarity and skills. An essential requirement for efficiency is to maximize operators' cyber situational awareness by adopting expressive visualization tools as part of an all-inclusive decision-support method [8].
The λ-NF3 is an effective and innovative intelligence-driven cybersecurity method. This study emerged from extensive, long-term research on the network forensics process and cybersecurity methodologies, specifically on network traffic analysis, demystification of malware traffic and encrypted traffic identification [9][10][11][12][13][14][15][16][17]. Significant work has been done using various machine learning methods in various domains.
For example, one study demonstrates that vulnerabilities can be predicted using an SVM model based on a set of code metrics for a specific Android application [18]. The classification model exhibits good performance in terms of both accuracy and precision. However, this study applies to a limited pool of applications and few Android versions. In addition, Shabtai et al. [19] proposed a heuristic approach to static analysis of Android applications based on matching suspicious applications with predefined malware models. Static models are built from Android capabilities and Android Framework API call chains used by the application. All the analysis steps and the model construction are fully automated. However, the proposed method has smaller detection coverage with randomly chosen malware models.
In addition, in [20], the authors proposed an inter-application communication tool that detects application communication vulnerabilities. The proposed model can be used by developers to analyze their own applications before release, by application reviewers to analyze applications in the Android market, and by end users. The authors analyzed 20 applications and found 34 exploitable vulnerabilities; 12 of the 20 applications have at least one vulnerability. This shows that applications can be vulnerable to attack and that developers should take precautions to protect themselves from these attacks. Burguera et al. [21] proposed a behavior-based malware detection system, while Glodek et al. [22] proposed a permissions-based malware detection system; however, the classification performance of these systems is severely affected by limited supervised information and unknown applications.
On the other hand, Zhang et al. [23] developed a new method to tackle the problem of unknown applications in the crucial situation of a small supervised training set. The proposed method is capable of detecting unknown flows generated by unknown applications and of utilizing the correlation information among real-world network traffic to boost the classification performance. A theoretical analysis is provided to confirm the performance benefit of the proposed method. Moreover, a comprehensive performance evaluation conducted on two real-world network traffic datasets shows that the proposed scheme outperforms the existing methods in the critical network environment.
Malware attacks are increasingly popular attack vectors in online crime. As trends and anecdotal evidence show, preventing these attacks, regardless of their opportunistic or targeted nature, has proven difficult: intrusions happen, and devices get compromised, even at security-conscious organizations. Therefore, an alternative line of work has focused on detecting and disrupting the individual steps that follow an initial compromise and that are essential for the successful progression of the attack. Several approaches and techniques have been proposed to identify the Command and Control (C2) channel that a compromised system establishes to communicate with its controller. The success of C2 detection approaches depends on collecting relevant network traffic. As traffic volumes increase, this is proving increasingly difficult.
For example, Gardiner et al. [24] analyzed current approaches to ISP-scale network measurement from the perspective of C2 detection, discussed several weaknesses that affect current techniques and provided suggestions for their improvement. Hsu et al. [25] proposed an innovative structure for detecting botnets in real time based on performance metrics to investigate whether a suspicious server is a fast-flux bot. The most innovative part of this approach is that it works in either passive or active mode. Evaluations show that the proposed solution is a promising method that can identify the botnet's activities without noticeable performance degradation; however, the method fails when the compromised machines of the botnet communicate over encrypted channels.
In addition, Haffner et al. [26] employed an automated method to extract the payload content from the network flows of real-time applications and used several machine learning models to categorize the network traffic. The proposed method is time consuming and requires high CPU utilization. Furthermore, Holz et al. [27] developed a heuristic approach that calculates some properties identifying certain fast-flux botnets. This is a passive method for locating obsolete botnets, and it fails to detect dynamic fast-flux botnets that are based on sophisticated techniques. Almubayed et al. [28] presented a very interesting method to extract several features from the encrypted traffic of the Tor network. These features are appropriate and can classify the Tor traffic with very high accuracy.
On the other hand, several novel applications of the Lambda architecture have been reported [29,30]. For example, Kiran et al. [31] presented a cost-optimized lambda architecture that combines online and batch data processing to handle a huge volume of sensor data. Both procedures can produce effective data accumulations, combinations or aggregations that are easier to analyze for identifying hidden patterns. It is also a promising method that significantly reduces the processing time and the resource requirements. Moreover, Yamato et al. [32] used a lambda architecture to analyze data from IoT sensors. It is a data analytics framework that uses incremental learning techniques to identify anomalies in real time.
The automatically updating learning model improves analysis accuracy and is a promising method to defend against adversarial attacks. A new estimation procedure that is resilient to these attacks was proposed by Yong et al. [33]. Specifically, to respond to injection attacks and to adversarial additive and multiplicative errors, the authors proposed a method that splits the dataset into uncertainty subsets, which leads to a manageable optimization problem. In addition, Chong et al. [34] proposed a method that combines two algorithms to defend against adversarial attacks. In the first stage, an effective algorithm uses a finite window of measurements to reconstruct the initial state. In the second stage, a different algorithm takes over to estimate the exact state. Finally, Chen et al. [35] introduced a security regularization term that takes into account the evasion cost of feature manipulations by attackers in order to increase the system security.

Network Forensics
Network forensics is a progressive procedure involving the monitoring and analysis of network traffic for information gathering, legal investigation, or intrusion detection and prevention. In addition, network forensics deals with volatile and transient evidence in an unpredictable and dynamic environment, since network traffic is transmitted and then lost. An important subfield of network forensics is traffic classification, an automated procedure that classifies network patterns into several traffic classes according to numerous constraints. The main method to recognize and classify network traffic with high accuracy and precision is the classification process based on the payload [36].
Serious weaknesses of these methods are their complexity and their requirements in terms of computational resources. In addition, cybersecurity expert opinion is required to differentiate the provided services and implement appropriate security policies. Even the most sophisticated method, which relies on Deep Packet Inspection (DPI), is time-consuming, does not produce accurate results and suffers from high false alarm rates [37].
To summarize, network flow forensics methods depend on the availability of various system resources, need supervision from a network engineer with advanced skills in cybersecurity and nevertheless fail totally to identify zero-day exploitations.
Malicious botnets have become the most dangerous threat on the Internet today, using advanced techniques to obfuscate the network communication involved in their phishing schemes, malware delivery or other criminal enterprises. Thus, malware traffic analysis is the primary method for investigating botnets and identifying the command and control (C2) infrastructures associated with these activities. The most sophisticated types of malware establish and operate botnets by communicating with isolated C2 servers through an obfuscated communication layer built on the Tor network. The Tor network produces traffic similar to the normal encrypted traffic of the HTTPS protocol, making the identification procedure extremely difficult [38].
Since the security of cyber systems is a multifaceted procedure, SOC management cannot be based only on passive, signature-based defense applications that are ineffective against zero-day attacks. The discovery and identification of a penetration or interruption in the network should be an automatic and nearly real-time procedure, which would offer an important advantage to the administrators. From this point of view, more effective methods of network supervision, with capabilities for automated control in network traffic analysis, demystification of malware traffic and encrypted traffic identification, are important for estimating the behavior of malware, the purpose of attacks and the damage caused by malware activity.

Batch Processing
Batch data are usually datasets collected during some transactions or processes over a certain time period, and they characterize the system's functionality. Processing these data using conventional data mining or machine learning methods assumes that they are available and can be accessed simultaneously without any limitation on their processing or analysis time. It should be noted that these data are susceptible to noise, their classification has a significant cost, and they require serious hardware infrastructure for safe storage and general handling.
Batch processing is the execution of a sequence of procedures on batch data. This processing can be deferred or scheduled for the time period when the computing resources are less busy. It also avoids wasting system resources on manual intervention and management, keeping the overall rate of utilization high. It allows the system to use different processes for cooperating jobs and separate tasks, thus reducing the storage overhead and shifting bottlenecks. However, batch processing also has numerous drawbacks; for example, users are unable to terminate a process and must wait until the execution completes.
With the new technologies that have dominated our lives over the last few years, and especially with the constant rise of Internet sensors and actuators, the volume of data generated by devices is constantly increasing. The growing field of real-world implementations produces data streams at an increasing rate, requiring large-scale and real-time processing.

Stream Processing
Data streams are unbounded data produced by multiple network infrastructures, such as sensors, IoT equipment, etc. A typical example of streaming data is the data that can be collected from network monitoring queries at high traffic rates. Streaming data should be handled in sequence and incrementally, or over sliding time windows. Because of their strict time constraints and their more general availability, they call for detailed and specialized data processing techniques that can reveal, at multiple levels, the hidden knowledge they may contain. In addition, each item needs to be processed without access to the rest of the data. It should also be considered that concept drift may occur in the data, which means that the stream properties may change over time. Stream processing is frequently used in big data projects in which data streams are quickly produced by several dissimilar sources.
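As a minimal illustration of incremental handling over a sliding time window, the following sketch keeps only the observations from the last few seconds and updates a running mean as each item arrives. The class name, the window length and the sample stream are hypothetical and are not part of the proposed framework.

```python
from collections import deque


class SlidingWindowMean:
    """Mean of the values observed in the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.items = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def update(self, timestamp, value):
        """Add one observation, evict expired ones, return the current mean."""
        self.items.append((timestamp, value))
        self.total += value
        # Drop everything that has fallen out of the time window.
        while self.items[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.items.popleft()
            self.total -= old_value
        return self.total / len(self.items)


# Hypothetical stream of (arrival time in seconds, bytes observed in a flow).
stream = [(0, 100), (1, 300), (2, 200), (5, 1000), (6, 400)]
window = SlidingWindowMean(window_seconds=3)
means = [window.update(t, v) for t, v in stream]
print(means)
```

Each update touches only the newest item and the expired ones, so the full history never has to be stored or revisited.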
Streaming data and data generated by dynamic environments have led to some of the most active research areas of the new era. Stream processing techniques are used by machine learning applications on data streams for real-time analysis and knowledge extraction in shifting and feedback environments.
In the case of stream processing techniques by machine learning, the algorithms are controlled by a variety of possible shifting modes and constraints related to memory consumption, resource limitation and processing time. In this category, the available data arrive in sequential order and are used for training and forecasting by calculating the error in each iteration. The aim of the algorithms in this category is to minimize the cumulative error over all iterations. We consider that the intention of supervised learning using the square loss function is to minimize the empirical error calculated by the following function [39]:

$$I_n[w] = \sum_{j=1}^{n} \left( x_j^{T} w - y_j \right)^2 \quad (1)$$

where $x_j \in \mathbb{R}^d$, $w \in \mathbb{R}^d$ and $y_j \in \mathbb{R}$. Let $X$ be the data table of dimensions $i \times d$ and $Y$ the target values table of dimensions $i \times 1$, as they are defined after the arrival of the first $i$ data points.
Let us suppose that the covariance table $\Sigma_i = X^{T} X$ is invertible, and $f^{*}(x) = \langle w^{*}, x \rangle$ is the ideal result for the linear least squares problem, as shown in Equation (2):

$$w^{*} = (X^{T} X)^{-1} X^{T} Y = \Sigma_i^{-1} \sum_{j=1}^{i} x_j y_j \quad (2)$$

The calculation of the table $\Sigma_i = \sum_{j=1}^{i} x_j x_j^{T}$ has a time complexity of $O(i d^2)$. Inverting the $d \times d$ table has a time complexity of $O(d^3)$, whereas the rest of the multiplication requires a time complexity of $O(d^2)$, producing an overall complexity of $O(i d^2 + d^3)$. If we consider that $n$ is the number of points in the dataset, $i = 1, 2, \ldots, n$, and it is essential to recalculate the result after the arrival of each new data vector, we obtain a total complexity of $O(n^2 d^2 + n d^3)$ [39,40]. It is important to mention here that machine learning stream processing is appropriate in cases where the procedure must adapt dynamically to new patterns or data, or when the streams are produced as a function of time, as in the case of research on adversarial attacks.
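The cost argument above can be made concrete with a small sketch of the naive approach, in which the least squares solution is recomputed from scratch after every new data vector. This is an illustration only, written in plain Python; the helper names and the noise-free sample stream are assumptions, and a practical system would use a numerical library and an incremental update instead.

```python
def solve(a, b):
    """Solve a @ w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    w = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(m[row][k] * w[k] for k in range(row + 1, n))
        w[row] = (m[row][n] - tail) / m[row][row]
    return w


def least_squares(points):
    """Return the w that solves (X^T X) w = X^T Y for a list of (x, y) pairs."""
    d = len(points[0][0])
    sigma = [[0.0] * d for _ in range(d)]  # covariance table X^T X
    xty = [0.0] * d                        # X^T Y
    for x, y in points:                    # O(i * d^2) work for i points
        for r in range(d):
            xty[r] += x[r] * y
            for c in range(d):
                sigma[r][c] += x[r] * x[c]
    return solve(sigma, xty)               # O(d^3) work


# Hypothetical stream generated by y = 2*x1 + 3*x2 without noise.
stream = [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 5.0), ([2.0, 1.0], 7.0)]

seen = []
weights_over_time = []
for x, y in stream:
    seen.append((x, y))
    if len(seen) >= 2:  # the covariance table is invertible from this point on
        weights_over_time.append(least_squares(seen))

print(weights_over_time[-1])
```

Because `least_squares` walks over all points seen so far on every arrival, the total work grows quadratically with the length of the stream, which is the behaviour that motivates incremental stream learners.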

The Proposed Approach
The need to extract information from extensive networking flows in real time is a big data challenge of the kind that has reshaped prototypical big data architectures. Big data architectures include mechanisms for ingesting, protecting, processing, and transforming data into big data structures. In addition, these architectures typically comprise the examination of data lakes (batch processing), real-time analyses (stream processing), predictive analytics from unstructured data and machine learning tools that analyze data with low latency [31].
The analysis of very large volumes of data is time consuming and cannot be completed in real time. Abundant data storage that works with the entire dataset to store the results of the queries for future use is frequently required. One serious disadvantage of this method is that it introduces latency.
The lambda architecture addresses this problem by providing two pathways for data flow. A batch layer (cold path) holds all inbound data in their raw format and performs batch processing on the data. The outcome of this analysis is stored as a batch view. In addition, a speed layer (hot path) analyzes unbounded streams of data in real time. The hot path is designed for low latency, at the expense of accuracy [32]. Figure 1 is a depiction of the lambda architecture. The lambda architecture has been designed to balance latency, throughput, and fault tolerance, using the cold path to provide complete and accurate views of historical data. At the same time, it uses the hot path to provide real-time data stream analysis of new inputs. Finally, an additional element that enhances the process and adds accuracy to the entire model is that the two outputs can be joined before the final data presentation or the final decision.
In the first phase, the algorithmic approach of the λ-NF3 performs feature extraction from the network flow. In the second phase, these features are analyzed by both classifiers to minimize the possibility of being deceived by adversarial attacks. Both results are merged with a bias toward the cold path (batch processing), because it allows a good audit trail, whereas real-time processing is more difficult to audit.
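The paper does not specify how the two outputs are merged, so the following is only a sketch of one possible rule: a weighted blend of per-class scores in which the batch layer receives more than half of the weight. The function name, the score dictionaries and the weight of 0.6 are all hypothetical.

```python
def merge_views(batch_scores, speed_scores, batch_weight=0.6):
    """Blend per-class scores from the two layers and return the chosen label.

    batch_weight > 0.5 biases the decision toward the batch layer (cold path).
    The default of 0.6 is an illustrative assumption, not a value from the paper.
    """
    if not 0.5 < batch_weight <= 1.0:
        raise ValueError("batch_weight must favour the batch layer")
    labels = set(batch_scores) | set(speed_scores)
    merged = {
        label: batch_weight * batch_scores.get(label, 0.0)
        + (1.0 - batch_weight) * speed_scores.get(label, 0.0)
        for label in labels
    }
    # Sorting first makes ties resolve deterministically (alphabetical order).
    best = max(sorted(merged), key=merged.get)
    return best, merged


# Hypothetical outputs of the two layers for one flow; here they disagree.
batch_view = {"normal": 0.70, "abnormal": 0.30}
speed_view = {"normal": 0.35, "abnormal": 0.65}
label, merged = merge_views(batch_view, speed_view)
print(label, merged)
```

With these example scores the layers disagree, and the bias lets the batch view decide; a larger disagreement from the speed layer would still be able to overturn it unless `batch_weight` is set to 1.0.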
A depiction of the λ-NF3 process is shown in Figure 2. The first investigation is whether the traffic is normal or abnormal (network traffic analysis). If the traffic is abnormal, it is further analyzed (malware traffic demystification) to identify the specific abnormality (botnet, crimeware, Advanced Persistent Threat (APT), attack, CoinMiner, etc.). Otherwise, if the traffic is normal, it is further analyzed to inspect whether the application or protocol uses non-encrypted traffic (File Transfer Protocol (FTP), Hypertext Transfer Protocol (HTTP), Domain Name System (DNS), Simple Mail Transfer Protocol (SMTP), etc.) or encrypted traffic (encrypted traffic identification), as well as which protocol it uses (The onion router (Tor), Secure Shell (SSH), Secure Sockets Layer Web (SSLweb), Secure Sockets Layer Peer-to-Peer (SSLP2P), Secure Copy Protocol (SCP), Skype, etc.).
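The staged decision flow described above can be sketched as a small routing function. The stage names and the stub classifiers are hypothetical placeholders standing in for trained models; only the order of the checks follows the text.

```python
def analyze_flow(features, models):
    """Route one flow through the staged checks and return the labels found.

    `models` maps a stage name to any callable that takes the features and
    returns a label. The stage names are hypothetical.
    """
    result = {"traffic": models["traffic_analysis"](features)}
    if result["traffic"] == "abnormal":
        # Abnormal traffic: identify the specific kind of malware traffic.
        result["malware"] = models["malware_demystification"](features)
        return result
    # Normal traffic: encrypted or not, then which protocol.
    result["encryption"] = models["encryption_check"](features)
    if result["encryption"] == "encrypted":
        result["protocol"] = models["encrypted_protocol"](features)
    else:
        result["protocol"] = models["unencrypted_protocol"](features)
    return result


# Stub classifiers for illustration only; real stages would be trained models.
stub_models = {
    "traffic_analysis": lambda f: "abnormal" if f["suspicious"] else "normal",
    "malware_demystification": lambda f: "botnet",
    "encryption_check": lambda f: "encrypted" if f["tls"] else "non-encrypted",
    "encrypted_protocol": lambda f: "SSH",
    "unencrypted_protocol": lambda f: "HTTP",
}

abnormal_result = analyze_flow({"suspicious": True, "tls": False}, stub_models)
encrypted_result = analyze_flow({"suspicious": False, "tls": True}, stub_models)
plain_result = analyze_flow({"suspicious": False, "tls": False}, stub_models)
print(abnormal_result, encrypted_result, plain_result)
```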
A depiction of the classification process of the proposed λ-NF3 is presented in the following Figure 3.

Methodology
The λ-NF3 was developed on the general concept of coupling different types of sophisticated algorithms that differ significantly in their operation and configuration, have different architectures, and require different realizations, hyper-parameter settings and training techniques. These algorithms are presented below.

Extreme Learning Machines for Batch Data
An ELM is a type of Single-Hidden Layer Feed Forward Neural Network (SLFFNN) [41] with N hidden neurons. The most notable characteristic of ELMs is that the input weights and the biases in the hidden layer are assigned randomly [42]. In addition, an ELM can precisely learn K samples thousands of times faster [43] than a back-propagation feed-forward neural network, because parameters such as the stopping criterion, learning rate and learning epochs do not need to be tuned.
The mathematical background of the ELM is presented in [41][42][43]. Generally, the input data in an ELM are mapped to a random L-dimensional feature space with a training set of N samples $(x_i, t_i)$, $i \in \{1, \ldots, N\}$, with $x_i \in \mathbb{R}^d$ and $t_i \in \mathbb{R}^c$. The output is calculated as follows [41][42][43]:

$$f_L(x) = \sum_{i=1}^{L} \beta_i g_i(x) = h(x)\beta$$

The vector $\beta = [\beta_1, \ldots, \beta_L]^{T}$ is the weight matrix between the hidden and output layers. $h(x) = [g_1(x), \ldots, g_L(x)]$ is the outcome of the hidden layer for the input $x$, and $g_i(x)$ is the outcome of the ith neuron. If $\{(x_i, t_i)\}_{i=1}^{N}$ is a training set, then the learning problem is $H\beta = T$, where $T = [t_1, \ldots, t_N]^{T}$ is the target outcome and $H$ is the hidden layer output of the ELM, calculated as follows:

$$H = \begin{bmatrix} h(x_1) \\ \vdots \\ h(x_N) \end{bmatrix} = \begin{bmatrix} g_1(x_1) & \cdots & g_L(x_1) \\ \vdots & \ddots & \vdots \\ g_1(x_N) & \cdots & g_L(x_N) \end{bmatrix}$$

This research uses ELM/GRBFK. The Gaussian kernel is as follows:

$$K(u, v) = \exp\left(-\gamma \lVert u - v \rVert^2\right)$$

ELM is an important approach for handling and analyzing big data, as it requires minimal training time relative to comparable learning algorithms; it does not require fine manipulations to determine its operating parameters; and it can determine appropriate output weights towards the most effective resolution of a problem. Most importantly, ELMs have the potential to generalize, in contrast to comparable methods that adjust their performance based solely on their training dataset. The emerging use of ELM in big data analysis also creates the conditions for developing complex systems on low-cost machines.
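A kernel-based ELM with a Gaussian RBF kernel can be sketched in a few lines. The sketch below assumes the standard regularized kernel ELM solution, in which the coefficients are obtained from (I/C + Ω)α = T with Ω the kernel matrix of the training samples. The regularization constant C, the kernel width γ, the helper names and the toy data are assumptions for illustration and do not reproduce the configuration used in this research.

```python
import math


def gaussian_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    z = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(m[row][k] * z[k] for k in range(row + 1, n))
        z[row] = (m[row][n] - tail) / m[row][row]
    return z


def train_kernel_elm(xs, targets, gamma=1.0, c=100.0):
    """Return one coefficient vector per output class: (I/C + Omega) alpha = T."""
    n = len(xs)
    omega = [[gaussian_kernel(xs[i], xs[j], gamma) for j in range(n)] for i in range(n)]
    for i in range(n):
        omega[i][i] += 1.0 / c  # ridge term I/C
    n_outputs = len(targets[0])
    return [solve(omega, [t[k] for t in targets]) for k in range(n_outputs)]


def predict(x, xs, alpha, gamma=1.0):
    """Return the index of the class with the highest kernel-weighted score."""
    k_row = [gaussian_kernel(x, xi, gamma) for xi in xs]
    scores = [sum(k * a for k, a in zip(k_row, column)) for column in alpha]
    return scores.index(max(scores))


# Toy training set: two samples per class, one-hot targets (illustration only).
train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 0.8]]
train_t = [[1, 0], [1, 0], [0, 1], [0, 1]]
alpha = train_kernel_elm(train_x, train_t)

print(predict([0.05, 0.1], train_x, alpha), predict([0.95, 0.9], train_x, alpha))
```

Training reduces to a single linear solve with no iterative tuning of a learning rate or number of epochs, which is the property the text credits for the short training time.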

kNN Classifier with Self Adjusting Memory for Streaming Data
The SAM/k-NN is an artificial intelligence algorithm that is biologically inspired by multiple human memory systems, specifically short- and long-term memory [44]. Short-Term Memory (STM) is the capacity for holding, but not manipulating, a small quantity of information for a short time period. It is defined in contrast to Long-Term Memory (LTM), which is the stage of the memory model where informative knowledge is held indefinitely.
Recurrent reactivations are the primary mechanism that encodes memory information, culminating in the spreading of information to supplementary locations and the integration of new knowledge. Generally speaking, information from STM is conveyed to LTM as knowledge is transformed over time, a process called memory consolidation. For instance, once someone learns how to ride a bike, the skill is not forgotten because it is stored within the LTM. The SAM architecture is partly inspired by this model. For example, new inputs (streaming data) are generally more relevant to current estimates, which can be associated with temporal trends or time-based events. On the other hand, batch processing of historical data can lead to much better prediction results, while offering generalization.
The SAM architecture is described in Figure 4. Memories are represented by the sets $M_{ST}$, $M_{LT}$ and $M_C$. Each memory is a subset of $\mathbb{R}^n \times \{1, \dots, c\}$ with a different length that fluctuates throughout the adjustment procedure. The STM represents the current concept and is an active sliding window that contains the most recent $m$ samples of the data stream [44]:

$$M_{ST} = \{(x_i, y_i) \in \mathbb{R}^n \times \{1, \dots, c\} \mid i = t - m + 1, \dots, t\}$$

The LTM includes all previous information that does not contradict that of the STM. Different from the STM, the LTM is a set of $p$ points:

$$M_{LT} = \{(x_i, y_i) \in \mathbb{R}^n \times \{1, \dots, c\} \mid i = 1, \dots, p\}$$

The union of both memories is called Combined Memory (CM) and is defined as:

$$M_C = M_{ST} \cup M_{LT}$$

Every set induces a distance-weighted k-NN classifier: $kNN_{M_{ST}}$, $kNN_{M_{LT}}$ and $kNN_{M_C}$. The kNN function assigns a class label to a given data point $x$ based on a set $Z$:

$$kNN_Z(x) = \arg\max_{\hat{c}} \left\{ \sum_{x_i \in N_k(x, Z) \mid y_i = \hat{c}} \frac{1}{d(x_i, x)} \;\middle|\; \hat{c} = 1, \dots, c \right\}$$

where $d(x_1, x_2)$ is the Euclidean distance between two data points and $N_k(x, Z)$ returns the $k$ nearest neighbors of $x$ in $Z$. The SAM/k-NN model was introduced by Losing et al. [44].
The implementation of this algorithm as a data stream classification model is based on the general assumption that new data are more relevant to current forecasts, but prior knowledge is required to classify them properly. The optimal combination of the two processing levels can minimize errors and increase classification precision. The implementation of this model, which provides knowledge transfer potential, is an effective real-time forensics tool for identifying cyber or adversarial attacks.
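The distance-weighted vote over the three memories can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm of [44]: the memory cleaning, compression and adaptive window sizing of SAM are omitted, and the function name `weighted_knn`, the constant `eps` and the toy arrays are introduced here only for demonstration.

```python
from collections import defaultdict
import numpy as np

def weighted_knn(x, Z_X, Z_y, k=3, eps=1e-12):
    """Distance-weighted k-NN vote for query x against a memory Z = (Z_X, Z_y)."""
    d = np.linalg.norm(Z_X - x, axis=1)   # Euclidean distance to every stored point
    nearest = np.argsort(d)[:k]           # indices of the k nearest neighbors
    votes = defaultdict(float)
    for i in nearest:
        votes[Z_y[i]] += 1.0 / (d[i] + eps)  # eps (assumed) avoids division by zero
    return max(votes, key=votes.get)

# Toy memories (hypothetical values): the STM holds a recent concept (label 2),
# while the LTM still stores an older concept (label 1) in the same region.
stm_X = np.array([[5.0, 5.0], [5.2, 5.0], [5.0, 5.2]])
stm_y = np.array([2, 2, 2])
ltm_X = np.array([[0.0, 0.0], [0.1, 0.1], [5.1, 5.1], [4.9, 5.0]])
ltm_y = np.array([0, 0, 1, 1])

# The combined memory is the union of both memories.
cm_X = np.vstack([stm_X, ltm_X])
cm_y = np.concatenate([stm_y, ltm_y])

query = np.array([5.0, 5.1])
pred_stm = weighted_knn(query, stm_X, stm_y)
pred_ltm = weighted_knn(query, ltm_X, ltm_y)
pred_cm = weighted_knn(query, cm_X, cm_y)
```

In this toy case the STM and the LTM disagree, which is the situation in which SAM has to decide which memory to trust; in [44] that choice depends on the recent performance of each memory, a step not shown here.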

Data
The detailed extraction process [45], which covers the features that can identify network attacks from the network flow, is described in [46]. This extraction procedure is also enriched by some innovative representation practices for data structures and transformations that rely on the Pandas library for data manipulation in Python.
Five datasets were selected to create extremely complex situations that cover the most likely cases that can be detected in a network infrastructure and that are suitable for training the proposed λ-NF3 [47]: the Network Traffic Analysis (NTA) binary dataset, which contains 30 independent variables and 2 classes (normal or abnormal); the Demystification of Malware Traffic (DMT) multiclass dataset, which contains 30 independent variables and 5 malware classes (Botnet, Crimeware, APT, Attack and CoinMiner); the Encrypted Traffic Analysis (ETI) binary dataset, which includes 30 independent variables and 2 classes (encrypted or non-encrypted); the Encrypted Traffic Identification (EnTI) multiclass dataset, which encompasses 30 independent variables and 6 classes that represent encrypted protocols (Tor, SSH, SSLweb, SSLP2P, SCP, and Skype); and the Unencrypted Traffic Identification (UTI) multiclass dataset, which comprises 30 independent variables and 4 classes of unencrypted network protocols (FTP, HTTP, DNS, and SMTP).
The full list of the 30 data features is detailed in Table 1.

| No. | Name | Interpretation |
|---|---|---|
| 24 | fpsh_cnt | The number of times the PSH flag was set for packets travelling in the forward direction. |
| 25 | bpsh_cnt | The number of times the PSH flag was set for packets travelling in the backward direction. |
| 26 | furg_cnt | The number of times the URG flag was set for packets travelling in the forward direction. |
| 27 | burg_cnt | The number of times the URG flag was set for packets travelling in the backward direction. |
| 28 | total_fhlen | The total header length (network and transport layer) of packets travelling in the forward direction. |
| 29 | total_bhlen | The total header length (network and transport layer) of packets travelling in the backward direction. |
| 30 | dscp | Differentiated services code point, a field in the IPv4 and IPv6 headers. |
| | label | Class |
The detailed collection procedure is analytically described in [48].

Results
All simulations were run under the following hardware and software conditions: a laptop with an Intel i7 2.4 GHz CPU and 16 GB of DDR3 RAM, running Ubuntu 18.04 LTS with the Anaconda Python Data Science Platform and a TensorFlow Python environment.

Batch Data Classification Performance
The classification performance in the batch process was measured by building a Confusion Matrix (CM) and then calculating the True Positive Rate (TPR), the True Negative Rate (TNR) and the Total Accuracy (TA), as defined by Equations (11)-(13), respectively [49,50]:

$$TPR = \frac{TP}{TP + FN} \tag{11}$$

$$TNR = \frac{TN}{TN + FP} \tag{12}$$

$$TA = \frac{TP + TN}{N} \tag{13}$$

where $TP$, $TN$, $FP$ and $FN$ are the numbers of true positives, true negatives, false positives and false negatives, and $N = TP + TN + FP + FN$. In addition, the Precision (PRE), Recall (REC) and F-Score indices were used to evaluate the model. These metrics are defined in Equations (14)-(16), respectively [49,50]:

$$PRE = \frac{TP}{TP + FP} \tag{14}$$

$$REC = \frac{TP}{TP + FN} \tag{15}$$

$$F\text{-}Score = 2 \times \frac{PRE \times REC}{PRE + REC} \tag{16}$$

Ten-fold cross validation (10_FCV) was employed to measure the performance indices. Tables 2-6 present the outcomes of the λ-NF3 method and the equivalent results from competitive algorithms: Support Vector Machine (SVM), Multi-Layer Feed-Forward Artificial Neural Network (MLFF ANN), k-Nearest Neighbor (k-NN) and Random Forest (RF). The proposed ELM/GRBFK algorithm shows slightly better performance than the other methods across all datasets and, more importantly, the proposed batch processing approach is hundreds of times faster. Thus, the proposed method is appropriate for big data analytics.
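The batch metrics above follow directly from the four cells of a binary confusion matrix, as the short sketch below shows. The function name `batch_metrics` and the example counts are hypothetical and are not taken from Tables 2-6.

```python
def batch_metrics(tp, tn, fp, fn):
    """Compute TPR, TNR, TA, PRE, REC and F-Score from binary confusion counts."""
    n = tp + tn + fp + fn
    tpr = tp / (tp + fn)          # true positive rate
    tnr = tn / (tn + fp)          # true negative rate
    ta = (tp + tn) / n            # total accuracy
    pre = tp / (tp + fp)          # precision
    rec = tp / (tp + fn)          # recall (identical to TPR in the binary case)
    f_score = 2 * pre * rec / (pre + rec)
    return {"TPR": tpr, "TNR": tnr, "TA": ta,
            "PRE": pre, "REC": rec, "F": f_score}

# Hypothetical counts for illustration only.
demo_metrics = batch_metrics(tp=40, tn=45, fp=5, fn=10)
```

The sketch assumes that every denominator is non-zero; a fold that contains no positive or no negative samples would need separate handling.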

Streaming Data Classification Performance
The analysis of data streams is a specialized machine learning problem that requires specific metrics to measure accuracy. The Kappa statistic [51] is the most reliable measure to benchmark the accuracy in streaming data classification. It measures the agreement between two raters who each classify $N$ items into $C$ mutually exclusive classes. The definition of κ is [52]:

$$\kappa = \frac{p_o - p_e}{1 - p_e}$$

where $p_o$ is the relative observed agreement between raters (equal to accuracy) and $p_e$ is the hypothetical probability of chance agreement, using the observed data to estimate the probability of each rater randomly assigning each class. If the raters are in complete agreement, then κ = 1. If there is no agreement beyond chance, κ ≈ 0. Due to the temporal dependences in data streams, the Kappa-Temporal statistic was used [51]:

$$\kappa_{per} = \frac{p_o - p_{per}}{1 - p_{per}}$$

where $p_{per}$ is the accuracy of the persistent classifier. The Kappa-Temporal statistic takes values within $(-\infty, 1]$. If the classifier is perfectly accurate, then $\kappa_{per} = 1$. If the classifier achieves the same accuracy as the persistent classifier, then $\kappa_{per} = 0$. If $\kappa_{per} < 0$, the classifier under evaluation is performing worse than the baseline classifier [51].
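The two statistics can be computed from a sequence of true and predicted labels as sketched below. This is an illustrative sketch: it assumes that the persistent classifier predicts the previous true label, so the temporal statistic is evaluated from the second instance onward, and the function names and toy label sequences are introduced here for demonstration.

```python
from collections import Counter

def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def kappa(y_true, y_pred):
    """Kappa: agreement corrected for chance agreement p_e."""
    n = len(y_true)
    p_o = accuracy(y_true, y_pred)
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n ** 2
    return (p_o - p_e) / (1 - p_e)

def kappa_temporal(y_true, y_pred):
    """Kappa-Temporal: agreement corrected by the persistent classifier."""
    persistent = y_true[:-1]               # assumed baseline: previous true label
    target, pred = y_true[1:], y_pred[1:]  # skip the first instance (no history)
    p_o = accuracy(target, pred)
    p_per = accuracy(target, persistent)
    return (p_o - p_per) / (1 - p_per)     # undefined when p_per == 1

# Toy label sequences (hypothetical, for illustration only).
y_true_demo = [0, 0, 1, 1, 0, 0, 1, 1]
y_pred_demo = [0, 0, 1, 1, 0, 0, 1, 0]
k_demo = kappa(y_true_demo, y_pred_demo)
k_per_demo = kappa_temporal(y_true_demo, y_pred_demo)
```

On streams where the label rarely changes, the persistent baseline is already very accurate, so a classifier can score a high κ and still obtain a low or negative temporal value.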
Tables 7-11 present the results of the scenarios applied to streaming data in this research and the equivalent results from competitive methods: the Hoeffding Adaptive Tree (HAT) [52] and the primal estimated sub-gradient solver for support vector machines (SPegasos) [53]. The learning evaluation used 10,000 instances. The prequential evaluation method was used [54]. The training windows used were 5000 and 1000 instances. A basic innovation of this methodology is that it combines, for the first time, the ELM/GRBFK and SAM/k-NN algorithms in a hybrid machine learning framework. The combination offers high learning speed, ease of execution, minimal human involvement and minimal computational power and resources for network traffic analysis, demystification of malware traffic and encrypted traffic identification.
Finally, the datasets, developed after protracted and extensive investigation of the network protocols, cover the lower layers (transport, network and data link) and the higher layers (session, presentation and application) of the system. It is also important to note that the datasets were developed after evaluating the limitations of the characteristic behavior of those protocols and the purpose of their normal or abnormal behavior in a real networking environment.
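Prequential evaluation tests each arriving instance before the model learns from it, so every instance serves first as test data and then as training data. The loop below is a minimal sketch of that protocol; the `WindowMajority` learner is a hypothetical stand-in that only predicts the majority label of a sliding window and is not one of the classifiers compared in this paper.

```python
from collections import Counter, deque

class WindowMajority:
    """Hypothetical stand-in learner: majority label of a sliding window."""

    def __init__(self, window=1000):
        self.labels = deque(maxlen=window)  # oldest labels are dropped automatically

    def predict(self, x):
        if not self.labels:
            return None                      # no prediction before any training
        return Counter(self.labels).most_common(1)[0][0]

    def learn(self, x, y):
        self.labels.append(y)

def prequential_accuracy(stream, model):
    """Test-then-train loop over an iterable of (features, label) pairs."""
    correct = total = 0
    for x, y in stream:
        correct += int(model.predict(x) == y)  # 1) test on the new instance
        model.learn(x, y)                      # 2) then train on it
        total += 1
    return correct / total

# Toy stream with an abrupt change of label (features unused by the stand-in).
demo_stream = [(None, 0)] * 3 + [(None, 1)] * 3
demo_acc = prequential_accuracy(demo_stream, WindowMajority(window=3))
```

A smaller window lets the stand-in learner recover faster after the label changes, which mirrors the trade-off behind the two training window sizes mentioned above.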

Discussion
An advanced, dependable and highly effective cybersecurity system, using machine learning principles, is presented in this research paper. It is an appropriate tool for big data applications with many data streams in situations where signature-based approaches are computationally infeasible.
The development of λ-NF3 is based on the Lambda Architecture approach. This architecture can handle enormous quantities of data in real time by combining two machine learning algorithms, one for batch data and one for data streams. Specifically, it uses batch processing to provide complete and accurate views of historical data and real-time data stream processing to provide views of new inputs. The final decision comes from the two joined outputs.
The λ-NF3 is an adaptive analytic framework for efficient defense against adversarial attacks and is proposed for the NGC2SOC. This intelligence-driven method, which has produced promising outcomes, creates a reliable advanced application for improving cyber security infrastructures. Moreover, this implementation follows a reactive cyber security strategy for dealing with adversarial attacks, as it combines the training of two opposite classifiers to detect incoming anomalies and to discard them. Training is done by using sophisticated real datasets that correspond to realistic situations. The operating scenarios proposed with the combination of batch and streaming data create capabilities for a fully defined configuration of model parameters and for high-precision classification or correlation. Finally, the application of artificial intelligence on digital recording machines, aiming to recognize adversarial attacks with machine learning, enhances and simplifies cyber defense and introduces new perspectives in the management of cyber security policies.
The proposed system was tested and evaluated on real-world datasets of high complexity that emerged after extensive research on network behavior.The remarkable results and the generalization of the system meaningfully support the proposed methodology, although the degree of difficulty and realism that has been added has formed multifactorial questions of exhaustive examination and reproduction.
The evaluation of the proposed method was carried out by thoroughly presenting and quoting the metrics that can determine the classification accuracy of the algorithms. The broad application of the proposed technique, which minimizes the cost of cyber-attacks, is a prerequisite for the establishment of a NGC2SOC, aiming at the cyber security and protection of organizations and critical infrastructures.

Future Work
Future research could place the proposed model under a novel structure that combines semi-supervised approaches and online incremental learning for the identification of hidden patterns between unstructured data types included in network traffic. In addition, λ-NF3 could be improved by further tuning the parameters of the algorithms used by the Lambda architecture, so that an even more efficient, more accurate, and faster prediction procedure is achieved. An adapted visualization that can be merged into the proposed λ-NF3 would further assist NGC2SOC operators in handling cybersecurity incidents. Multi-format depictions may support a C2 system with advanced reports and representations that enhance the overall decision mechanism. Moreover, it would be significant to study the development of this framework by applying the Lambda architecture in a parallel and distributed environment such as Hadoop. Finally, an additional component that could be considered as a future extension concerns the operation of λ-NF3 with approaches of self-improvement and meta-learning to fully automate the defense against adversarial attacks.

Figure 3. The classification process of the proposed λ-NF3.

The overall process is presented in Algorithm 1.

Algorithm 1. The λ-NF3 Algorithm
Inputs:
    New network traffic data D1
Step 1: % Feature extraction
    Feature extraction from network flow
Step 2: % Make a prediction
    Use the pretrained ELM-RBF classifier using D1 to produce the prediction M1
    Use the streaming SAM/k-NN classifier using D1 to produce the prediction M2
    Assign weights of 0.60 to M1
    Merge the two predictions M1 and M2 into M
Output from traffic analysis:
    if Abnormal
        Malware Traffic Demystification for a class label Botnet, Crimeware, APT, Attack, CoinMiner
    else
        if Encrypted
            Encrypted Traffic Identification for a class label Tor, SSH, SSLweb, SSLP2P, SCP, Skype
        else
            for a class label FTP, HTTP, DNS, SMTP
        end if
    end if
Outputs:
    Class label for each new data D1
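The merge and routing steps of Algorithm 1 can be sketched as a weighted soft vote followed by a cascade of decisions. This is only an illustrative reading of the algorithm: the source states a weight of 0.60 for M1, and the sketch assumes that the remaining 0.40 goes to M2 and that both classifiers return class-probability vectors. The stage names reuse the dataset abbreviations, and the `stages` dictionary with its constant lambda classifiers is entirely hypothetical.

```python
import numpy as np

W_BATCH = 0.60           # weight of M1, as stated in Algorithm 1
W_STREAM = 1 - W_BATCH   # assumption: the remainder is given to M2

def merge(m1, m2):
    """Weighted soft vote of batch (M1) and streaming (M2) class probabilities."""
    return W_BATCH * np.asarray(m1, dtype=float) + W_STREAM * np.asarray(m2, dtype=float)

def classify_flow(features, stages):
    """Route one flow through the cascade: NTA, then DMT or ETI, then EnTI or UTI."""
    def decide(name):
        batch, stream, labels = stages[name]
        return labels[int(np.argmax(merge(batch(features), stream(features))))]

    if decide("NTA") == "abnormal":
        return decide("DMT")       # malware traffic demystification
    if decide("ETI") == "encrypted":
        return decide("EnTI")      # encrypted traffic identification
    return decide("UTI")           # unencrypted traffic identification

# Hypothetical stand-in classifiers returning fixed probability vectors.
stages = {
    "NTA": (lambda f: [0.8, 0.2], lambda f: [0.3, 0.7], ["normal", "abnormal"]),
    "DMT": (lambda f: [0, 0, 1, 0, 0], lambda f: [0, 0, 1, 0, 0],
            ["Botnet", "Crimeware", "APT", "Attack", "CoinMiner"]),
    "ETI": (lambda f: [0.1, 0.9], lambda f: [0.4, 0.6], ["non-encrypted", "encrypted"]),
    "EnTI": (lambda f: [0, 1, 0, 0, 0, 0], lambda f: [1, 0, 0, 0, 0, 0],
             ["Tor", "SSH", "SSLweb", "SSLP2P", "SCP", "Skype"]),
    "UTI": (lambda f: [0, 1, 0, 0], lambda f: [0, 1, 0, 0],
            ["FTP", "HTTP", "DNS", "SMTP"]),
}
demo_label = classify_flow(features=None, stages=stages)
```

In the toy configuration the two EnTI classifiers disagree (SSH against Tor) and the batch prediction prevails because it carries the larger weight.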

| No. | Name | Interpretation |
|---|---|---|
| 1 | srcip | The source IP address of the flow. |
| 2 | srcport | The source port number of the flow. |
| 3 | dstip | The destination IP address of the flow. |
| 4 | dstport | The destination port number of the flow. |
| 5 | total_fpackets | The total number of packets travelling in the forward direction. |
| 6 | total_bpackets | The total number of packets travelling in the backward direction. |
| 7 | min_fpktl | The minimum packet length (in bytes) from the forward direction. |
| 8 | max_fpktl | The maximum packet length (in bytes) from the forward direction. |
| 9 | min_bpktl | The minimum packet length (in bytes) from the backward direction. |
| 10 | max_bpktl | The maximum packet length (in bytes) from the backward direction. |
| 11 | min_fiat | The minimum interarrival time (in microseconds) between two packets. |
| 12 | max_fiat | The maximum interarrival time (in microseconds) between two packets. |
| 13 | min_biat | The minimum interarrival time (in microseconds) between two packets. |
| 14 | max_biat | The maximum interarrival time (in microseconds) between two packets. |
| 15 | duration | The time elapsed (in microseconds) from the first packet to the last packet. |
| 16 | min_active | The minimum duration (in microseconds) of a sub-flow. |
| 17 | max_active | The maximum duration (in microseconds) of a sub-flow. |
| 18 | min_idle | The minimum time (in microseconds) the flow was idle. |
| 19 | max_idle | The maximum time (in microseconds) the flow was idle. |
| 20 | sflow_fpackets | The average number of forward travelling packets in the sub-flows. |
| 21 | sflow_fbytes | The average number of bytes, travelling in the forward direction. |
| 22 | sflow_bpackets | The average number of backward travelling packets in the sub-flows. |
| 23 | sflow_bbytes | The average number of bytes, travelling in the backward direction. |