Special Issue "Machine Learning for Cybersecurity Threats, Challenges, and Opportunities"

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 31 July 2020.

Special Issue Editors

Prof. Luis Javier Garcia Villalba
Guest Editor
Group of Analysis, Security and Systems (GASS), Department of Software Engineering and Artificial Intelligence (DISIA), Faculty of Computer Science and Engineering, Office 431, Universidad Complutense de Madrid (UCM), Calle Profesor José García Santesmases, 9, Ciudad Universitaria, 28040 Madrid, Spain
Interests: anonymity; computer security; cyber security; cryptography; information security; intrusion detection; malware; privacy; trust
Prof. Dr. Rafael T. de Sousa Jr.
Guest Editor
Decision Technologies Laboratory - LATITUDE, Electrical Engineering Department (ENE), Institute of Technology (FT), University of Brasília (UnB), Brasília-DF, CEP 70910-900, Brazil
Interests: cyber, information and network security; distributed data services and machine learning for intrusion and fraud detection; signal processing; energy harvesting and security at the physical layer
Dr. Mario Blaum
Guest Editor
IBM Almaden Research Center, San Jose, CA, USA
Interests: error-correcting codes; fault tolerance; parallel processing; cryptography; modulation codes for magnetic recording; timing algorithms; holographic storage; parallel communications; neural networks; finite group theory
Dr. Ana Lucila Sandoval Orozco

Guest Editor
Cybersecurity INCT Unit 6, Decision Technologies Laboratory-LATITUDE, Electrical Engineering Department (ENE), Technology College, University of Brasília (UnB), Brasília-DF, CEP 70910-900, Brazil
Interests: computer and network security; multimedia forensics; error-correcting codes; information theory

Special Issue Information

Dear Colleagues,

Cybersecurity has become a major priority for every organization. The right controls and procedures must be put in place to detect potential attacks and protect against them. However, the number of cyber-attacks will always exceed the number of people working to defend against them. New threats are discovered daily, making it harder for current solutions to cope with the large amounts of data they must analyze. Machine learning systems can be trained to find attacks that are similar to known ones. This way, even the first intrusions of their kind can be detected and better security measures developed.

The sophistication of threats has also increased substantially. Sophisticated zero-day attacks may go undetected for months at a time. Attack patterns may be engineered to take place over extended periods of time, making them very difficult for traditional intrusion detection technologies to detect. Even worse, new attack tools and strategies can now be developed using adversarial machine learning techniques, requiring a rapid co-evolution of defenses that matches the speed and sophistication of machine learning-based offensive techniques. Based on this motivation, this Special Issue aims to provide a forum for people from academia and industry to communicate their latest results on theoretical advances and industrial case studies that combine machine learning techniques, such as reinforcement learning, adversarial machine learning, and deep learning, with significant problems in cybersecurity. Research papers can focus on offensive and defensive applications of machine learning to security. The potential topics of interest of this Special Issue are listed below. Submissions may present original research, rigorous dataset collection and benchmarking, or critical surveys.

Potential topics include but are not limited to:

  • Adversarial training and defensive distillation;
  • Attacks against machine learning;
  • Black-box attacks against machine learning;
  • Challenges of machine learning for cyber security;
  • Ethics of machine learning for cyber security applications;
  • Generative adversarial models;
  • Graph representation learning;
  • Machine learning forensics;
  • Machine learning threat intelligence;
  • Malware detection;
  • Neural graph learning;
  • One-shot learning; continuous learning;
  • Scalable machine learning for cyber security;
  • Steganography and steganalysis based on machine learning techniques;
  • Strengths and shortcomings of machine learning for cyber security.
Prof. Luis Javier Garcia Villalba
Prof. Dr. Rafael T. de Sousa Jr.
Dr. Mario Blaum
Dr. Ana Lucila Sandoval Orozco
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All papers will be peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (26 papers)


Research


Open Access Article
Malicious JavaScript Detection Based on Bidirectional LSTM Model
Appl. Sci. 2020, 10(10), 3440; https://doi.org/10.3390/app10103440 - 16 May 2020
Abstract
JavaScript is widely used on the Internet because of its powerful features, and almost all websites use it to provide dynamic functionality. However, this dynamic nature also carries potential risks. Authors of malicious scripts have started using JavaScript to launch various attacks, such as Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), and drive-by download attacks. Traditional malicious script detection relies on expert knowledge, but even for experts this is an error-prone task. To solve this problem, many learning-based methods for malicious JavaScript detection are being explored. In this paper, we propose a novel deep learning-based method for malicious JavaScript detection. In order to extract semantic information from JavaScript programs, we construct the Program Dependency Graph (PDG) and generate semantic slices, which preserve rich semantic information and are easy to transform into vectors. Then, a malicious JavaScript detection model based on the Bidirectional Long Short-Term Memory (BLSTM) neural network is proposed. Experimental results show that, in comparison with the other five methods, our model achieved the best performance, with an accuracy of 97.71% and an F1-score of 98.29%.
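As a rough illustration of the vectorization step (the semantic slices being "easy to transform into vectors"), the sketch below indexes slice tokens against a vocabulary and pads to a fixed length. The function names, padding scheme, and slice lengths are our own assumptions, not the authors' code.

```python
def build_vocab(slices):
    """Assign each distinct token a positive integer id; 0 is reserved for padding."""
    vocab = {}
    for tokens in slices:
        for tok in tokens:
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def vectorize(tokens, vocab, max_len=8):
    """Map one semantic slice to a fixed-length integer vector for a sequence model."""
    ids = [vocab.get(tok, 0) for tok in tokens][:max_len]  # unknown tokens -> 0
    return ids + [0] * (max_len - len(ids))                # right-pad with 0
```

Fixed-length integer sequences like these are what a BLSTM (or any recurrent classifier) would consume after an embedding layer.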

Open Access Article
ConvProtoNet: Deep Prototype Induction towards Better Class Representation for Few-Shot Malware Classification
Appl. Sci. 2020, 10(8), 2847; https://doi.org/10.3390/app10082847 - 20 Apr 2020
Abstract
Traditional malware classification relies on known malware types and significantly large, manually labelled datasets, which limits its ability to recognize new malware classes. For unknown malware types, or new variants of existing malware with only a few samples per class, common classification methods often fail due to severe overfitting. In this paper, we propose a new neural network structure called ConvProtoNet, which employs few-shot learning to address the problem of scarce malware samples while preventing overfitting. We design a convolutional induction module to replace the insufficient prototype reduction in most few-shot models and generate more appropriate class-level malware prototypes for classification. We also adopt a meta-learning scheme to make the classifier robust enough to adapt to unseen malware classes without fine-tuning. Even in extreme conditions where only 5 samples per class are provided, ConvProtoNet still achieves more than 70% accuracy on average and outperforms traditional malware classification methods and existing few-shot models in experiments conducted on several datasets. Extra experiments across datasets illustrate that ConvProtoNet learns general, dataset-invariant knowledge of malware, and careful model analysis proves the effectiveness of ConvProtoNet in few-shot malware classification.
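The convolutional induction module is the paper's contribution; the baseline it improves on, prototype reduction by averaging as in prototypical networks, can be sketched in a few lines. This is purely illustrative (embeddings and class labels are hypothetical), not ConvProtoNet itself.

```python
import math

def prototype(support_embeddings):
    """Baseline prototype reduction: element-wise mean of support-set embeddings."""
    n, dim = len(support_embeddings), len(support_embeddings[0])
    return [sum(v[i] for v in support_embeddings) / n for i in range(dim)]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def classify(query, prototypes):
    """Assign a query embedding to the class with the nearest prototype."""
    return min(prototypes, key=lambda label: euclidean(query, prototypes[label]))
```

ConvProtoNet replaces the mean in `prototype` with a learned convolutional induction over the support embeddings, which is what produces "more appropriate class-level malware prototypes".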

Open Access Article
Investigation of Dual-Flow Deep Learning Models LSTM-FCN and GRU-FCN Efficiency against Single-Flow CNN Models for the Host-Based Intrusion and Malware Detection Task on Univariate Times Series Data
Appl. Sci. 2020, 10(7), 2373; https://doi.org/10.3390/app10072373 - 30 Mar 2020
Abstract
Intrusion and malware detection tasks at the host level are a critical part of the overall information security infrastructure of a modern enterprise. While classical host-based intrusion detection systems (HIDS) and antivirus (AV) approaches are based on change monitoring of critical files and malware signatures, respectively, some recent research, utilizing relatively vanilla deep learning (DL) methods, has demonstrated promising anomaly-based detection results that already have practical applicability due to a low false positive rate (FPR). More complex DL methods typically provide better results in natural language processing and image recognition tasks. In this paper, we analyze the applicability of more complex dual-flow DL methods, such as the long short-term memory fully convolutional network (LSTM-FCN), the gated recurrent unit (GRU)-FCN, and several others, to the task specified on the attack-caused Windows OS system calls traces dataset (AWSCTD), and compare them with vanilla single-flow convolutional neural network (CNN) models. The results obtained demonstrate no advantage of dual-flow models in processing univariate time series data: they introduce an unnecessary level of complexity and increase training and anomaly detection time, which is crucial in the intrusion containment process. On the other hand, the newly tested AWSCTD-CNN-static (S) single-flow model demonstrated three times better training and testing times while preserving the high detection accuracy.

Open Access Article
DeepDCA: Novel Network-Based Detection of IoT Attacks Using Artificial Immune System
Appl. Sci. 2020, 10(6), 1909; https://doi.org/10.3390/app10061909 - 11 Mar 2020
Abstract
The Internet of Things (IoT) has recently attained tremendous popularity, although this promising technology brings a variety of security obstacles. Conventional solutions do not suit the new dilemmas created by the IoT ecosystem. Conversely, Artificial Immune Systems (AIS) are intelligent, adaptive systems that mimic the human immune system; they hold desirable properties for such a dynamic environment and provide an opportunity to improve IoT security. In this work, we develop a novel hybrid Deep Learning and Dendritic Cell Algorithm (DeepDCA) in the context of an Intrusion Detection System (IDS). The framework adopts the Dendritic Cell Algorithm (DCA) and a Self-Normalizing Neural Network (SNN). The aim of this research is to classify IoT intrusions and minimize false alarm generation, as well as to automate and smooth the signal extraction phase, which improves classification performance. The proposed IDS selects a convenient set of features from the IoT-Bot dataset, performs signal categorization using the SNN, and then uses the DCA for classification. The experimental results show that DeepDCA performed well in detecting IoT attacks, with a high detection rate of over 98.73% accuracy and a low false-positive rate. We also compared these results with state-of-the-art techniques, showing that our model performs classification tasks better than SVM, NB, KNN, and MLP. We plan to carry out further experiments to verify the framework using a more challenging dataset, to make further comparisons with other signal extraction approaches, and to work on real-time (online) attack detection.
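As a hedged sketch of the Dendritic Cell Algorithm half of the framework: a DCA accumulates weighted PAMP, danger, and safe signals per antigen and decides once a costimulation budget is exceeded. The weights and threshold below are illustrative assumptions, not DeepDCA's values.

```python
def dca_context(pamp, danger, safe):
    """One signal sample: costimulation (csm) and mature context contributions.
    Weights (2, 1, ±3) are illustrative; safe signals suppress the mature context."""
    csm = 2 * pamp + 1 * danger + 2 * safe
    mature = 2 * pamp + 1 * danger - 3 * safe
    return csm, mature

def classify_antigen(signal_stream, threshold=10.0):
    """Accumulate signals until the cell 'migrates' (csm >= threshold);
    flag the antigen as anomalous if the mature context dominates."""
    csm_total = mature_total = 0.0
    for pamp, danger, safe in signal_stream:
        csm, mature = dca_context(pamp, danger, safe)
        csm_total += csm
        mature_total += mature
        if csm_total >= threshold:
            return "anomalous" if mature_total > 0 else "normal"
    return "undecided"
```

In DeepDCA, the SNN automates the mapping from raw features to these signal categories, which this sketch takes as given.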

Open Access Article
Towards a Reliable Comparison and Evaluation of Network Intrusion Detection Systems Based on Machine Learning Approaches
Appl. Sci. 2020, 10(5), 1775; https://doi.org/10.3390/app10051775 - 04 Mar 2020
Cited by 1
Abstract
Presently, we live in a hyper-connected world where millions of heterogeneous devices continuously share information in different application contexts for wellness, improved communications, digital business, etc. However, the greater the number of devices and connections, the higher the risk of security threats in this scenario. To counteract malicious behaviours and preserve essential security services, Network Intrusion Detection Systems (NIDSs) are the most widely used line of defence in communications networks. Nevertheless, there is no standard methodology to evaluate and fairly compare NIDSs. Most proposals elude mentioning crucial steps of NIDS validation, which makes their comparison hard or even impossible. This work firstly includes a comprehensive study of recent NIDSs based on machine learning approaches, concluding that almost none of them carry out what the authors of this paper consider mandatory steps for a reliable comparison and evaluation of NIDSs. Secondly, a structured methodology is proposed and assessed on the UGR’16 dataset to test its suitability for addressing network attack detection problems. The recommended guideline and steps will definitely help the research community to fairly assess NIDSs, although a definitive framework is not a trivial task, and some extra effort should still be made to further improve its understandability and usability.

Open Access Article
Automated Vulnerability Detection in Source Code Using Minimum Intermediate Representation Learning
Appl. Sci. 2020, 10(5), 1692; https://doi.org/10.3390/app10051692 - 02 Mar 2020
Abstract
Vulnerability is one of the root causes of network intrusion. An effective way to mitigate security threats is to discover and patch vulnerabilities before an attack. Traditional vulnerability detection methods rely on manual participation and incur a high false positive rate. Intelligent vulnerability detection methods suffer from long-term dependency, out-of-vocabulary tokens, coarse detection granularity, and a lack of vulnerable samples. This paper proposes an automated and intelligent method of vulnerability detection in source code based on minimum intermediate representation learning. First, a sample in the form of source code is transformed into a minimum intermediate representation to exclude irrelevant items and reduce the length of the dependency. Next, the intermediate representation is transformed into a real-valued vector through pre-training on an extended corpus, retaining structural and semantic information. Then, the vector is fed to three concatenated convolutional neural networks to obtain high-level vulnerability features. Last, a classifier is trained using the learned features. To validate this vulnerability detection method, an experiment was performed. The empirical results confirm that, compared with traditional methods and state-of-the-art intelligent methods, our method performs better, with fine granularity.
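One common way to build such a reduced representation, which also mitigates the out-of-vocabulary problem, is to rename user-defined identifiers to symbolic placeholders while keeping language keywords and known library calls. The sketch below (our own simplification, not the paper's pipeline; the keyword and library sets are illustrative) does this for C-like source.

```python
import re

C_KEYWORDS = {"int", "char", "if", "else", "return", "for", "while", "void"}
LIB_FUNCS = {"strcpy", "memcpy", "printf", "malloc", "free"}  # illustrative subset

def normalize(source):
    """Map user-defined identifiers to VAR_n placeholders, preserving keywords
    and known library calls, to shrink the token vocabulary."""
    mapping = {}
    def repl(match):
        tok = match.group(0)
        if tok in C_KEYWORDS or tok in LIB_FUNCS:
            return tok
        if tok not in mapping:
            mapping[tok] = f"VAR_{len(mapping)}"
        return mapping[tok]
    return re.sub(r"[A-Za-z_]\w*", repl, source)
```

After normalization, distinct programs with the same structure collapse to the same token stream, which is what the subsequent pre-trained vectorization consumes.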

Open Access Article
Systematic Approach to Malware Analysis (SAMA)
Appl. Sci. 2020, 10(4), 1360; https://doi.org/10.3390/app10041360 - 17 Feb 2020
Abstract
Malware threats pose new challenges to analytic and reverse engineering tasks. A systematic approach to that analysis is needed, in an attempt to fully uncover the underlying attack vectors and techniques and find commonalities between them. In this paper, a method of malware analysis is described, together with a report of its application to the cases of Flame and Red October. The method has also been used by different analysts to analyze other malware threats like ‘Stuxnet’, ‘Dark Comet’, ‘Poison Ivy’, ‘Locky’, ‘Careto’, and ‘Sofacy Carberp’. The method presented in this work is a systematic and methodological process of analysis whose main objective is the acquisition of knowledge and a full understanding of a particular malware. Using the proposed method to analyze two well-known malware families, ‘Flame’ and ‘Red October’, helps to show the added value of the method.

Open Access Article
Graph Convolutional Networks for Privacy Metrics in Online Social Networks
Appl. Sci. 2020, 10(4), 1327; https://doi.org/10.3390/app10041327 - 15 Feb 2020
Abstract
In recent years, privacy leakage events in large-scale social networks have become increasingly frequent. Traditional methods relying on operators have been unable to effectively curb this problem. Researchers must turn their attention to the privacy protection of users themselves. Privacy metrics are undoubtedly the most effective method. However, social networks have a substantial number of users and a complex network structure and feature set. Previous studies either considered a single aspect or measured multiple aspects separately and then artificially integrated them. The measurement procedures are complex and cannot effectively be integrated. To solve the above problems, we first propose using a deep neural network to measure the privacy status of social network users. Through a graph convolution network, we can easily and efficiently combine the user features and graph structure, determine the hidden relationships between these features, and obtain more accurate privacy scores. Given the restriction of the deep learning framework, which requires a large number of labelled samples, we incorporate a few-shot learning method, which greatly reduces the dependence on labelled data and human intervention. Our method is applicable to online social networks, such as Sina Weibo, Twitter, and Facebook, that can extract profile information, graph structure information of users’ friends, and behavioural characteristics. The experiments show that our model can quickly and accurately obtain privacy scores in a whole network and eliminate traditional tedious numerical calculations and human intervention.
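A single graph-convolution step, which mixes each node's features with those of its neighbours through a normalized adjacency matrix before a linear map and nonlinearity, can be sketched without any framework. This is a minimal illustration; the paper's model stacks such layers with learned weights.

```python
def gcn_layer(adj, features, weights):
    """One graph-convolution step: average each node's features with its
    neighbours' (self-loops added), then apply a linear map and ReLU."""
    n = len(adj)
    # A_hat = A + I: add self-loops so a node keeps its own features
    a_hat = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    out = []
    for i in range(n):
        # neighbourhood aggregation: row i of D^-1 * A_hat * X
        agg = [sum(a_hat[i][j] * features[j][k] for j in range(n)) / deg[i]
               for k in range(len(features[0]))]
        # linear map (W) followed by ReLU
        row = [max(0.0, sum(agg[k] * weights[k][c] for k in range(len(agg))))
               for c in range(len(weights[0]))]
        out.append(row)
    return out
```

In a privacy-scoring setting, `features` would hold per-user profile/behaviour attributes and the final layer would emit a score per user.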

Open Access Article
Cybersecurity Threats Based on Machine Learning-Based Offensive Technique for Password Authentication
Appl. Sci. 2020, 10(4), 1286; https://doi.org/10.3390/app10041286 - 14 Feb 2020
Abstract
With the emergence of the online society, password authentication, the representative user authentication method, has been a key topic. However, various attack techniques have emerged to steal passwords entered from the keyboard; hence, keyboard data do not ensure security. To detect and prevent such attacks, a keyboard data protection technique using random keyboard data generation has been presented. This technique protects keyboard data by generating dummy keyboard data while the attacker is capturing the keyboard input. In this study, we demonstrate the feasibility of keyboard data exposure even under this protection technique. To prove the proposed attack technique, we gathered all the dummy keyboard data generated by the defense tool along with the real keyboard data input by the user, and evaluated the cybersecurity threat to keyboard data based on a machine learning-based offensive technique. We verified that an adversary can recover the keyboard data with 96.2% accuracy even when the protection technique designed to prevent keyboard data exposure is used; that is, the proposed method clearly differentiates the keyboard data input by the user from the dummy keyboard data. The contributions of this paper are that we derived and verified a new security threat and a new vulnerability of password authentication. Furthermore, the new cybersecurity threat derived from this study will benefit the security assessment of password authentication and of all authentication technologies and application services that take keyboard input.

Open Access Article
Collecting Vulnerable Source Code from Open-Source Repositories for Dataset Generation
Appl. Sci. 2020, 10(4), 1270; https://doi.org/10.3390/app10041270 - 13 Feb 2020
Abstract
Different Machine Learning techniques to detect software vulnerabilities have emerged in scientific and industrial scenarios. Different actors in these scenarios aim to develop algorithms for predicting security threats without requiring human intervention. However, these algorithms require data-driven engines based on the processing of huge amounts of data, known as datasets. This paper introduces the SonarCloud Vulnerable Code Prospector for C (SVCP4C). This tool aims to collect vulnerable source code from open source repositories linked to SonarCloud, an online tool that performs static analysis and tags potentially vulnerable code. The tool provides a set of tagged files suitable for extracting features and creating training datasets for Machine Learning algorithms. This study presents a descriptive analysis of these files and overviews the current status of C vulnerabilities, specifically buffer overflow, in the reviewed public repositories.

Open Access Article
A New Method of Fuzzy Support Vector Machine Algorithm for Intrusion Detection
Appl. Sci. 2020, 10(3), 1065; https://doi.org/10.3390/app10031065 - 05 Feb 2020
Abstract
Since SVM is sensitive to noise and outliers in system call sequence data, a new fuzzy support vector machine algorithm based on SVDD is presented in this paper. In our algorithm, noises and outliers are identified by a hypersphere of minimum volume that contains the maximum number of samples. The definition of fuzzy membership considers not only the relation between a sample and the hyperplane, but also the relations between samples. For each sample inside the hypersphere, the fuzzy membership function is a linear function of the distance between the sample and the hyperplane: the greater the distance, the greater the weight coefficient. For each sample outside the hypersphere, the membership function is an exponential function of that distance: the greater the distance, the smaller the weight coefficient. Compared with the traditional fuzzy membership definition based on the relation between a sample and its cluster center, our method effectively distinguishes noises or outliers from support vectors and assigns them appropriate weight coefficients even when they are distributed on the boundary between the positive and negative classes. The experiments show that the fuzzy support vector machine proposed in this paper is more robust than the support vector machine and than fuzzy support vector machines based on the distance between a sample and its cluster center.
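The two-regime membership definition can be sketched directly: linear growth with hyperplane distance for samples inside the SVDD hypersphere, exponential decay for samples outside it. This is a simplified illustration with assumed scale parameters, not the paper's exact formulas.

```python
import math

def fuzzy_membership(dist_to_hyperplane, inside_hypersphere,
                     max_dist=10.0, decay=1.0):
    """Two-regime fuzzy membership weight for a training sample:
    - inside the hypersphere (likely clean): weight grows linearly with the
      sample's distance from the separating hyperplane;
    - outside (likely noise/outlier): weight decays exponentially with it.
    max_dist and decay are illustrative scale parameters."""
    if inside_hypersphere:
        return min(1.0, dist_to_hyperplane / max_dist)
    return math.exp(-decay * dist_to_hyperplane)
```

Samples flagged as outliers thus contribute little to the margin, which is what makes the resulting SVM robust to noisy system call traces.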

Open Access Article
A Heterogeneous Ensemble Learning Framework for Spam Detection in Social Networks with Imbalanced Data
Appl. Sci. 2020, 10(3), 936; https://doi.org/10.3390/app10030936 - 31 Jan 2020
Abstract
The popularity of social networks provides people with many conveniences, but their rapid growth has also attracted many attackers. In recent years, the malicious behavior of social network spammers has seriously threatened the information security of ordinary users. To reduce this threat, many researchers have mined the behavior characteristics of spammers and have obtained good results by applying machine learning algorithms to identify spammers in social networks. However, most of these studies overlook the class imbalance present in real-world data. In this paper, we propose a heterogeneous stacking-based ensemble learning framework to ameliorate the impact of class imbalance on spam detection in social networks. The proposed framework consists of two main components, a base module and a combining module. In the base module, we adopt six different base classifiers and utilize this classifier diversity to construct new ensemble input members. In the combining module, we introduce cost-sensitive learning into deep neural network training. By setting different costs for misclassification and dynamically adjusting the weights of the prediction results of the base classifiers, we can integrate the input members and aggregate the classification results. The experimental results show that our framework effectively improves the spam detection rate on imbalanced datasets.
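A minimal sketch of the combining idea, weighted aggregation of base-classifier outputs with an asymmetric, cost-derived decision threshold, follows. The paper's actual combining module is a cost-sensitive deep network with dynamically adjusted weights; the costs and the weighted vote here are illustrative stand-ins.

```python
def combine(base_probs, weights, cost_fn=5.0, cost_fp=1.0):
    """Aggregate base-classifier spam probabilities with per-classifier weights,
    then decide with an asymmetric threshold: flag spam whenever the expected
    cost of missing spam exceeds that of a false alarm."""
    total = sum(weights)
    p_spam = sum(w * p for w, p in zip(weights, base_probs)) / total
    # cost-sensitive decision rule: p * cost_fn > (1 - p) * cost_fp
    return "spam" if p_spam * cost_fn > (1 - p_spam) * cost_fp else "ham"
```

With `cost_fn` five times `cost_fp`, the effective threshold drops well below 0.5, which is how cost sensitivity counters the minority (spam) class being swamped by the majority class.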

Open Access Article
Synthetic Minority Oversampling Technique for Optimizing Classification Tasks in Botnet and Intrusion-Detection-System Datasets
Appl. Sci. 2020, 10(3), 794; https://doi.org/10.3390/app10030794 - 22 Jan 2020
Cited by 1
Abstract
Presently, security is a hot research topic due to its impact on daily information infrastructure. Machine-learning solutions have been improving classical detection practices, but detection tasks employ irregular amounts of data, since the number of instances representing one or several malicious samples can vary significantly. In highly unbalanced data, classification models regularly have high precision with respect to the majority class, while minority classes are considered noise due to the lack of information they provide. Well-known datasets used for malware-based analyses, like botnet attacks and Intrusion Detection Systems (IDS), mainly comprise logs, records, or network-traffic captures that do not provide an ideal source of evidence as a result of obtaining raw data. As an example, the numbers of abnormal and constant connections generated by either botnets or intruders within a network are considerably smaller than those from benign applications. In most cases, inadequate dataset design may lead to the downgrade of a learning algorithm, resulting in overfitting and poor classification rates. To address these problems, we propose a resampling method, the Synthetic Minority Oversampling Technique (SMOTE), with a grid-search algorithm optimization procedure. This work demonstrates classification-result improvements for botnet and IDS datasets by merging synthetically generated balanced data and tuning different supervised-learning algorithms.
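Classic SMOTE generates each synthetic point by interpolating between a minority sample and one of its k nearest minority neighbours. A compact sketch (without the paper's grid-search tuning, and with a seeded RNG for reproducibility) is:

```python
import random

def smote(minority, n_new, k=2, rng=None):
    """Generate n_new synthetic minority samples: pick a base sample, pick one
    of its k nearest minority neighbours, and interpolate a random fraction
    of the way between them (classic SMOTE)."""
    rng = rng or random.Random(0)
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # interpolation fraction in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic
```

The oversampled minority class is then merged with the original data before training, which is the "synthetically generated balanced data" the abstract refers to.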

Open Access Article
Improving Incident Response in Big Data Ecosystems by Using Blockchain Technologies
Appl. Sci. 2020, 10(2), 724; https://doi.org/10.3390/app10020724 - 20 Jan 2020
Abstract
Big data ecosystems are increasingly important for the daily activities of any type of company. They are decisive elements in the organization, so any malfunction of this environment can have a great impact on the company's normal functioning; security is therefore a crucial aspect of this type of ecosystem. Security in big data must be considered not only during the creation and implementation of the ecosystem, but also throughout its entire lifecycle, including operation, and especially when managing and responding to incidents that occur. To this end, this paper proposes an incident response process supported by a private blockchain network that records the different events and incidents that occur in the big data ecosystem. The use of blockchain improves the security of the stored data, increasing its immutability and traceability. In addition, the stored records can help manage incidents and anticipate them, thereby minimizing the costs of investigating their causes and facilitating forensic readiness. This proposal integrates with previous research work, seeking to improve the security of big data by creating a process of secure analysis, design, and implementation, supported by a security reference architecture that serves as a guide in defining the different elements of this type of ecosystem. Moreover, this paper presents a case study in which the proposal is implemented using big data and blockchain technologies such as Apache Spark and Hyperledger Fabric.
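The immutability and traceability the abstract attributes to blockchain-backed records can be illustrated with a minimal hash chain. This sketch is not Hyperledger Fabric; the event strings are invented, and the point is only that each record embeds the previous record's digest, so retroactive edits are detectable.

```python
import hashlib
import json

def _digest(block):
    payload = json.dumps(block, sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

class IncidentLog:
    """Append-only log where each record embeds the previous record's
    hash, so any retroactive edit breaks the chain."""
    def __init__(self):
        self.chain = [{"index": 0, "event": "genesis", "prev": "0" * 64}]

    def record(self, event):
        block = {"index": len(self.chain),
                 "event": event,
                 "prev": _digest(self.chain[-1])}
        self.chain.append(block)

    def verify(self):
        return all(self.chain[i]["prev"] == _digest(self.chain[i - 1])
                   for i in range(1, len(self.chain)))

log = IncidentLog()
log.record("unauthorised Spark job submitted")
log.record("HDFS ACL modified")
print(log.verify())               # True: chain intact
log.chain[1]["event"] = "tampered"
print(log.verify())               # False: tampering detected
```

A real deployment replaces this single in-memory chain with a distributed ledger, so that no single administrator of the big data platform can rewrite the incident history.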

Open Access Article
A Zero-Knowledge Proof System with Algebraic Geometry Techniques
Appl. Sci. 2020, 10(2), 465; https://doi.org/10.3390/app10020465 - 08 Jan 2020
Abstract
Current requirements for securing data exchange over the internet against security breaches have to consider new cryptographic attacks. The most recent advances in cryptanalysis are boosted by quantum computers, which are able to break common cryptographic primitives. This makes evident the need for further communication protocols to secure sensitive data. Zero-knowledge proof systems have existed for decades and have been considered for providing authentication and identification services, but only recently has their popularity risen, due to novel applications in blockchain technology, the Internet of Things, and cloud storage, among others. A new zero-knowledge proof system is presented, which bases its security on two problems believed, so far, to be resistant to quantum attacks: the graph isomorphism problem and the isomorphism of polynomials problem.
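To make the graph isomorphism ingredient concrete, here is the classic interactive zero-knowledge protocol for that problem (not the system the paper proposes, which also involves isomorphism of polynomials): the prover commits to a random relabelling H of G1 and, depending on the verifier's coin flip, reveals an isomorphism from either G1 or G2 to H. The toy graph and seed are illustrative.

```python
import random

def permute(edges, p):
    """Apply vertex permutation p (a list) to an edge set."""
    return frozenset(frozenset({p[u], p[v]})
                     for u, v in (tuple(e) for e in edges))

def compose(p, q):                       # (p o q)[i] = p[q[i]]
    return [p[q[i]] for i in range(len(p))]

def invert(p):
    inv = [0] * len(p)
    for i, v in enumerate(p):
        inv[v] = i
    return inv

rng = random.Random(7)
n = 6
G1 = frozenset(frozenset({u, v})
               for u, v in [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3)])
sigma = list(range(n)); rng.shuffle(sigma)   # prover's secret isomorphism
G2 = permute(G1, sigma)                      # public second graph

def round_ok():
    pi = list(range(n)); rng.shuffle(pi)
    H = permute(G1, pi)                      # prover's commitment
    b = rng.randrange(2)                     # verifier's challenge bit
    if b == 0:
        return permute(G1, pi) == H          # reveal pi: H matches G1
    rho = compose(pi, invert(sigma))         # reveal pi o sigma^-1
    return permute(G2, rho) == H             # H matches G2

print(all(round_ok() for _ in range(20)))    # True: honest prover always passes
```

Each round leaks nothing about sigma, since H is a fresh random relabelling, yet a prover who does not know an isomorphism fails roughly half the rounds.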

Open Access Article
CyberSPL: A Framework for the Verification of Cybersecurity Policy Compliance of System Configurations Using Software Product Lines
Appl. Sci. 2019, 9(24), 5364; https://doi.org/10.3390/app9245364 - 08 Dec 2019
Cited by 2
Abstract
Cybersecurity attacks compromise organisations' compliance with their cybersecurity policies. Such failures may be due to the absence of security configurations or the use of default configuration values in software products and systems. Configuring products and systems is a known challenge in the software industry, since a wide range of parameters must be taken into account. In other contexts, configuration problems are solved using Software Product Lines, which is why this article proposes the Cybersecurity Software Product Line (CyberSPL) framework. CyberSPL is based on a methodology for designing product lines to verify cybersecurity policies according to the possible configurations. The patterns for configuring systems with respect to cybersecurity aspects are grouped by defining various feature models. The automated analysis of these models allows us to diagnose possible problems in security configurations, reducing or avoiding them. To support this proposal, a multi-user, multi-platform solution has been implemented that enables setting up a catalogue of public or private feature models. Moreover, analysis and reasoning mechanisms have been integrated to obtain all the configurations of a model and to detect whether a configuration is valid, including the root cause of problems for a given configuration. To validate the proposal, a real scenario is presented with a catalogue of four different feature models. In this scenario, the models have been analysed, different configurations have been validated, and several problematic configurations have been diagnosed.
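The feature-model analysis described above can be sketched with a toy model: features as booleans and cross-tree constraints as rules, with valid configurations found by enumeration. The feature names and constraints are invented for illustration and are not taken from the CyberSPL catalogue, which uses proper feature-model reasoners rather than brute force.

```python
from itertools import product

# Hypothetical security feature model (illustrative names and rules)
features = ["tls", "client_cert", "password_login", "mfa"]
constraints = [
    lambda c: c["tls"],                                        # root is mandatory
    lambda c: not c["client_cert"] or c["tls"],                # requires
    lambda c: not (c["client_cert"] and c["password_login"]),  # excludes
    lambda c: not c["mfa"] or c["password_login"],             # requires
]

def valid(config):
    """A configuration is valid iff it satisfies every constraint."""
    return all(rule(config) for rule in constraints)

def all_configurations():
    """Enumerate every valid configuration of the feature model."""
    for bits in product([False, True], repeat=len(features)):
        config = dict(zip(features, bits))
        if valid(config):
            yield config

configs = list(all_configurations())
print(len(configs))   # 4 valid configurations
print(valid({"tls": False, "client_cert": False,
             "password_login": True, "mfa": False}))   # False: tls is mandatory
```

Diagnosing an invalid configuration then amounts to reporting which rules it violates, which mirrors the root-cause analysis the framework offers.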

Open Access Article
Malware Detection on Byte Streams of Hangul Word Processor Files
Appl. Sci. 2019, 9(23), 5178; https://doi.org/10.3390/app9235178 - 29 Nov 2019
Abstract
While the exchange of data files and programs on the Internet grows exponentially, most users are vulnerable to infected files, especially malicious non-executables. Owing to the political situation between South and North Korea, many malicious actions have recently been found in Hangul Word Processor (HWP) non-executable files, because HWP is widely used in schools, military facilities, and government institutions of South Korea. An HWP file usually has one or more byte streams that are often used for malicious actions. Based on the assumption that infected byte streams exhibit particular patterns, we design a convolutional neural network (CNN) to capture such patterns. We conduct experiments on 534 HWP files that we prepared and demonstrate that the proposed CNN achieves the best performance compared to other machine-learning models. As new malicious attacks keep emerging, we will continue collecting such HWP files and investigating better model structures.
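The core operation of a CNN over a byte stream is a learned filter slid along the sequence. A minimal NumPy sketch of that single operation (not the authors' network, and with an invented byte pattern) shows how a matched filter responds most strongly where its pattern occurs:

```python
import numpy as np

def conv1d_valid(stream, kernel):
    """Slide kernel over the byte stream ('valid' padding) and return
    the activation at every offset -- the core 1-D CNN operation."""
    k = len(kernel)
    return np.array([float(np.dot(stream[i:i + k], kernel))
                     for i in range(len(stream) - k + 1)])

# toy byte stream with a suspicious 4-byte pattern embedded at offset 5
pattern = np.array([0x90, 0x90, 0xEB, 0xFE], dtype=float)
stream = np.concatenate([np.zeros(5), pattern, np.zeros(7)])

# a matched filter fires strongest where the pattern occurs
activations = conv1d_valid(stream / 255.0, pattern / 255.0)
print(int(np.argmax(activations)))   # 5: strongest response at the pattern
```

In the actual model the filters are learned from labelled HWP streams rather than hand-crafted, and many filters feed pooling and dense layers, but the locality argument is the same.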

Open Access Article
Bitcoin and Cybersecurity: Temporal Dissection of Blockchain Data to Unveil Changes in Entity Behavioral Patterns
Appl. Sci. 2019, 9(23), 5003; https://doi.org/10.3390/app9235003 - 20 Nov 2019
Abstract
The Bitcoin network is not only vulnerable to cyber-attacks but currently represents the cryptocurrency most frequently used for concealing illicit activities. Typically, Bitcoin activity is monitored by reducing the anonymity of its entities using machine-learning techniques that consider the whole blockchain. This entails two issues: first, it increases the complexity of the analysis, requiring greater effort; second, it may hide network micro-dynamics that are important for detecting short-term changes in entity behavioral patterns. This paper addresses both issues by performing a “temporal dissection” of the Bitcoin blockchain, i.e., dividing it into smaller temporal batches for entity classification. The idea is that a machine-learning model trained on a certain time interval (batch) should achieve good classification performance when tested on another batch if entity behavioral patterns are similar. We apply cascading machine-learning principles (a type of ensemble learning applying stacking techniques), introducing a “k-fold cross-testing” concept across batches of varying size. Results show that the blockchain batch size used for entity classification could be reduced for certain classes (Exchange, Gambling, and eWallet), as classification rates did not vary significantly with batch size, suggesting that behavioral patterns did not change significantly over time. Mixer and Market class detection, however, can be negatively affected. A deeper analysis of Mining Pool behavior showed that models trained on recent data perform better than models trained on older data, suggesting that “typical” Mining Pool behavior may be better represented by recent data. This work provides a first step towards uncovering entity behavioral changes via temporal dissection of blockchain data.
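The cross-testing idea, training on one temporal batch and evaluating on every other, can be sketched with synthetic data and a deliberately trivial nearest-centroid classifier (stand-ins for the paper's features and cascading ensemble). The drift value in the last batch is invented to simulate a behavioural change.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_batch(shift):
    """Toy 1-D 'entity features' for two classes; `shift` mimics
    behavioural drift over time."""
    a = rng.normal(0.0 + shift, 0.3, 50)   # class 0
    b = rng.normal(2.0 + shift, 0.3, 50)   # class 1
    return np.concatenate([a, b]), np.array([0] * 50 + [1] * 50)

def centroid_fit(X, y):
    return {c: X[y == c].mean() for c in (0, 1)}

def centroid_predict(model, X):
    return np.array([min(model, key=lambda c: abs(x - model[c])) for x in X])

batches = [make_batch(shift) for shift in (0.0, 0.1, 1.5)]  # last batch drifts

# cross-testing: train on batch i, evaluate on every batch j
acc = np.zeros((3, 3))
for i, (Xi, yi) in enumerate(batches):
    model = centroid_fit(Xi, yi)
    for j, (Xj, yj) in enumerate(batches):
        acc[i, j] = (centroid_predict(model, Xj) == yj).mean()

print(acc.round(2))   # off-diagonal drop signals behavioural change
```

A near-flat accuracy matrix would indicate stable behaviour (the Exchange/Gambling/eWallet case), while a drop when train and test batches are far apart mirrors the Mixer/Market and Mining Pool findings.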

Open Access Article
Malicious PDF Detection Model against Adversarial Attack Built from Benign PDF Containing JavaScript
Appl. Sci. 2019, 9(22), 4764; https://doi.org/10.3390/app9224764 - 08 Nov 2019
Abstract
Intelligent attacks using document-based malware that exploits vulnerabilities in document-viewing software or document file structure are increasing rapidly. Attacks using PDF (Portable Document Format) files are common, in proportion to the format's widespread use. We provide an in-depth analysis of the PDF structure and of JavaScript content embedded in PDFs. We then develop a diverse feature set encompassing structure and metadata features, such as file size, version, encoding method, and keywords, and content features, such as object names, keywords, and readable strings in JavaScript. When features are diverse, adversarial examples are hard to craft, because machine-learning algorithms become robust to small changes. We develop a detection model using black-box models with the structure and content features to minimize the risk of adversarial attacks. To validate the proposed model, we design an adversarial attack: we collect benign documents containing multiple JavaScript codes as the base for adversarial samples and build the samples by injecting malware code into them. The proposed model is evaluated against a large collection of malicious and benign PDFs. We found that random forest, an ensemble of decision trees, exhibits good malware-detection performance and is robust to adversarial samples.
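A structural feature extractor of the kind the abstract describes can be sketched as keyword and object counts over the raw bytes of a PDF. The keyword list and the hand-made sample below are illustrative, not the paper's full feature set.

```python
import re

# Keywords whose counts serve as structural features (illustrative subset)
KEYWORDS = [b"/JavaScript", b"/JS", b"/OpenAction", b"/Launch", b"/ObjStm"]

def extract_features(pdf_bytes):
    """Map raw PDF bytes to a fixed set of features:
    file size, object count, and per-keyword occurrence counts."""
    features = {
        "size": len(pdf_bytes),
        "objects": len(re.findall(rb"\d+ \d+ obj", pdf_bytes)),
    }
    for kw in KEYWORDS:
        features[kw.decode()] = pdf_bytes.count(kw)
    return features

# minimal hand-made sample standing in for a real document
sample = (b"%PDF-1.7\n"
          b"1 0 obj << /OpenAction 2 0 R >> endobj\n"
          b"2 0 obj << /S /JavaScript /JS (app.alert(1)) >> endobj\n"
          b"%%EOF")
feats = extract_features(sample)
print(feats["/JavaScript"], feats["/OpenAction"], feats["objects"])  # 1 1 2
```

Vectors like this, combined with readable-string features from the embedded JavaScript, are what the black-box classifier (e.g., a random forest) is trained on; an attacker injecting code must perturb many such counts at once, which is the diversity argument made above.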

Open Access Article
A Feature Analysis Based Identifying Scheme Using GBDT for DDoS with Multiple Attack Vectors
Appl. Sci. 2019, 9(21), 4633; https://doi.org/10.3390/app9214633 - 31 Oct 2019
Cited by 2
Abstract
In recent years, distributed denial of service (DDoS) attacks have increasingly combined multiple attack vectors, which has significantly improved their concealment and success rate. Therefore, improving the general detection capability for DDoS attacks and identifying DDoS attack traffic accurately and quickly play an important role in subsequent attack mitigation. This paper proposes a feature-analysis-based method using Gradient Boosting Decision Trees (GBDT) to efficiently detect and identify multivector DDoS attacks. The detection algorithm is applicable to both known and unknown DDoS attacks.
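The GBDT idea can be illustrated from scratch: boosting fits a sequence of weak trees, each to the residuals of the ensemble so far. The sketch below uses depth-1 trees (stumps), squared loss, and an invented one-dimensional traffic feature; the paper's detector operates on many flow features, not this toy.

```python
import numpy as np

def fit_stump(x, r):
    """Best threshold stump (left/right means) for the residuals r."""
    best = None
    for t in np.unique(x):
        left, right = r[x <= t], r[x > t]
        lm = left.mean() if len(left) else 0.0
        rm = right.mean() if len(right) else 0.0
        err = ((r - np.where(x <= t, lm, rm)) ** 2).sum()
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    _, t, lm, rm = best
    return lambda z: np.where(z <= t, lm, rm)

def gbdt_fit(x, y, rounds=20, lr=0.5):
    """L2 gradient boosting of decision stumps (a minimal GBDT)."""
    F = np.zeros_like(y, dtype=float)
    stumps = []
    for _ in range(rounds):
        stump = fit_stump(x, y - F)      # fit the current residuals
        F += lr * stump(x)
        stumps.append(stump)
    return lambda z: sum(lr * s(z) for s in stumps)

# toy traffic feature: packets/s; label 1 = attack flow, 0 = benign
x = np.array([1., 2., 3., 4., 50., 60., 70., 80.])
y = np.array([0., 0., 0., 0., 1., 1., 1., 1.])
model = gbdt_fit(x, y)
print((model(x) > 0.5).astype(int))   # [0 0 0 0 1 1 1 1]
```

Because each stage corrects the errors of the previous ensemble, GBDT handles mixed, heterogeneous traffic features well, which is one reason it suits multivector attack identification.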

Open Access Article
Ontology-Based System for Dynamic Risk Management in Administrative Domains
Appl. Sci. 2019, 9(21), 4547; https://doi.org/10.3390/app9214547 - 26 Oct 2019
Abstract
With the increasing complexity of cyberthreats, tools are needed to understand the changing context in real time. This paper presents an architecture and a prototype designed to model the risk of administrative domains in real time, exemplified by the case of a country, specifically Spain. To carry out this task, the assets and the threats detected by various information sources have been modeled. All this information is stored as knowledge using ontologies, which enables the application of reasoning engines to infer new knowledge that can be reused in subsequent reasoning. This modeling and reasoning have been enriched with a dynamic system for managing the trust placed in the different information sources, and reliability is further increased by including additional threat-intelligence information.
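The trust-management ingredient can be sketched as a trust-weighted vote over the verdicts of several information sources. The feed names, trust scores, and verdicts below are entirely hypothetical; the paper's system maintains such scores dynamically and combines them through ontology-based reasoning rather than this single formula.

```python
# Hypothetical feeds reporting on the same indicator, each with a
# trust score the system would adjust dynamically over time.
feeds = {"national_cert": 0.9, "osint_blog": 0.4, "partner_soc": 0.7}
reports = {"national_cert": 1, "osint_blog": 0, "partner_soc": 1}  # 1 = malicious

def weighted_verdict(feeds, reports):
    """Trust-weighted vote over the sources' verdicts, in [0, 1]."""
    total = sum(feeds.values())
    return sum(feeds[s] * reports[s] for s in reports) / total

s = weighted_verdict(feeds, reports)
print(round(s, 2))   # 0.8: the weighted evidence leans malicious
```

Lowering a source's trust after it reports inaccurately automatically dampens its influence on future risk assessments, which is the dynamic aspect the abstract highlights.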

Open Access Article
Insider Threat Detection Based on User Behavior Modeling and Anomaly Detection Algorithms
Appl. Sci. 2019, 9(19), 4018; https://doi.org/10.3390/app9194018 - 25 Sep 2019
Cited by 1
Abstract
Insider threats are malicious activities by authorized users, such as theft of intellectual property or security information, fraud, and sabotage. Although the number of insider threats is much lower than that of external network attacks, insider threats can cause extensive damage, and as insiders are very familiar with an organization's systems, their malicious behavior is very difficult to detect. Traditional insider-threat detection methods focus on rule-based approaches built by domain experts, but these are neither flexible nor robust. In this paper, we propose insider-threat detection methods based on user behavior modeling and anomaly detection algorithms. Based on user log data, we constructed three types of datasets: users' daily activity summaries, e-mail content topic distributions, and users' weekly e-mail communication histories. Then, we applied four anomaly detection algorithms and their combinations to detect malicious activities. Experimental results indicate that the proposed framework works well for imbalanced datasets in which there are only a few insider threats and no domain experts' knowledge is provided.
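Combining anomaly detectors, as the abstract describes, can be sketched on a toy daily-activity summary. The activity counts are invented, the two detectors (z-score and IQR) stand in for the paper's four algorithms, and the combination here is a simple logical OR: each detector can catch outliers the other misses.

```python
import statistics

# Hypothetical daily activity counts (e.g., file copies per day) for
# one user; the final day simulates an insider exfiltrating data.
activity = [12, 15, 11, 14, 13, 12, 16, 14, 13, 95]

def zscore_flags(xs, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sd = statistics.mean(xs), statistics.pstdev(xs)
    return [abs(x - mu) / sd > threshold for x in xs]

def iqr_flags(xs, k=1.5):
    """Flag values outside the Tukey fences built from the quartiles."""
    q1, _, q3 = statistics.quantiles(xs, n=4)
    lo, hi = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [x < lo or x > hi for x in xs]

# simple combination: flag a day if any detector fires (logical OR)
combined = [z or i for z, i in zip(zscore_flags(activity), iqr_flags(activity))]
print([d for d, f in enumerate(combined) if f])   # [9]: the anomalous day
```

On this sample the extreme day inflates the standard deviation enough that the z-score test alone stays just under its threshold, while the IQR test still fires, which is exactly why combining detectors helps on sparse insider-threat data.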

Open Access Article
Information Extraction of Cybersecurity Concepts: An LSTM Approach
Appl. Sci. 2019, 9(19), 3945; https://doi.org/10.3390/app9193945 - 20 Sep 2019
Cited by 2
Abstract
Extracting cybersecurity entities and the relationships between them from online textual resources, such as articles, bulletins, and blogs, and converting these resources into more structured and formal representations has important applications in cybersecurity research and is valuable for practitioners. Previous work on this task mainly relied on feature-based models, which are time-consuming and need labor-intensive feature engineering to describe the properties of entities, domain knowledge, entity context, and linguistic characteristics. To alleviate the need for feature engineering, we therefore propose neural network models, specifically long short-term memory (LSTM) models, for the tasks of Named Entity Recognition (NER) and Relation Extraction (RE). We evaluated the proposed models on two tasks. The first performs NER and evaluates the results against the state-of-the-art Conditional Random Fields (CRFs) method. The second performs RE with three LSTM models and compares their results to assess which model is more suitable for the cybersecurity domain. The proposed models achieved competitive performance with less feature-engineering work. We demonstrate that exploiting neural network models in cybersecurity text mining is effective and practical.
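The target representation an NER tagger (CRF or LSTM) predicts is usually a per-token label sequence in the BIO scheme. The sentence, labels, and entity types below are invented for illustration; the sketch shows how BIO labels are decoded back into entity spans.

```python
# A sentence from a hypothetical security bulletin, with the BIO labels
# a sequence tagger would be trained to predict (B = begin, I = inside,
# O = outside; suffix = entity type).
tokens = ["Attackers", "exploit", "CVE-2019-0708", "in", "Remote",
          "Desktop", "Services", "to", "run", "code"]
labels = ["O", "O", "B-VULN", "O", "B-APP", "I-APP", "I-APP", "O", "O", "O"]

def decode_entities(tokens, labels):
    """Collapse BIO labels back into (entity_text, entity_type) spans."""
    entities, current, etype = [], [], None
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-"):
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [tok], lab[2:]
        elif lab.startswith("I-") and current:
            current.append(tok)
        else:
            if current:
                entities.append((" ".join(current), etype))
            current, etype = [], None
    if current:
        entities.append((" ".join(current), etype))
    return entities

print(decode_entities(tokens, labels))
# [('CVE-2019-0708', 'VULN'), ('Remote Desktop Services', 'APP')]
```

Whether the labels come from hand-engineered CRF features or from an LSTM's learned representations, this decoding step is the same; the point of the paper is that the LSTM reaches competitive label quality without the feature engineering.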

Open Access Article
Malware Detection Approach Based on Artifacts in Memory Image and Dynamic Analysis
Appl. Sci. 2019, 9(18), 3680; https://doi.org/10.3390/app9183680 - 05 Sep 2019
Cited by 1
Abstract
The need to detect malware before it harms computers, mobile phones, and other electronic devices has caught the attention of researchers and the anti-malware industry for many years. To protect users from malware attacks, anti-virus software products are installed on the computer. Anti-virus software mainly uses signature-based techniques, which fail to detect malware that uses packing, encryption, or obfuscation, as well as unseen (new) malware. This paper proposes an integrated malware detection approach that applies memory forensics to extract malicious artifacts from memory and combines them with features extracted during the execution of the malware in a dynamic analysis. Pre-modeling techniques were also applied for feature engineering before training and testing the dataset on the machine-learning models. The experimental results show a significant improvement in both the detection accuracy rate and the false-positive rate, 98.5% and 1.7% respectively, when applying the support vector machine. The results verify that our integrated analysis approach outperforms other analysis methods. In addition, the proposed approach overcomes the limitation of single-path file execution in dynamic analysis by adding relevant memory artifacts that can reveal the real intention of malicious files.
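The integration step, merging per-sample memory artifacts with dynamic-analysis features into one vector before classifier training, can be sketched as follows. The feature names are invented stand-ins, not the paper's artifact list.

```python
# Hypothetical per-sample features from the two analysis stages.
memory_features = {"hidden_process": 1, "hooked_ssdt": 0, "injected_code": 1}
dynamic_features = {"files_written": 12, "registry_keys_set": 3, "network_conns": 5}

def merge(memory, dynamic):
    """Combine both feature sets into one vector, prefixing keys so the
    two analysis sources stay distinguishable after the merge."""
    merged = {f"mem_{k}": v for k, v in memory.items()}
    merged.update({f"dyn_{k}": v for k, v in dynamic.items()})
    return merged

sample = merge(memory_features, dynamic_features)
print(sorted(sample))   # six combined feature names, ready for a classifier
```

After this merge (and the pre-modeling feature engineering the abstract mentions), each sample is a single fixed-length vector suitable for training a support vector machine.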

Open Access Article
Network Intrusion Detection Based on Novel Feature Selection Model and Various Recurrent Neural Networks
Appl. Sci. 2019, 9(7), 1392; https://doi.org/10.3390/app9071392 - 03 Apr 2019
Cited by 6
Abstract
The recent increase in hacks and computer-network attacks around the world has intensified the need to develop better intrusion detection and prevention systems. The intrusion detection system (IDS) plays a vital role in detecting anomalies and attacks on networks, which have become larger and more pervasive in nature. However, most anomaly-based intrusion detection systems are plagued by high false positives, and Remote-to-Local (R2L) and User-to-Root (U2R) are two kinds of attack for which advanced IDS methods have low prediction accuracy. This paper therefore proposes a novel IDS framework to overcome these problems. The proposed framework comprises three main parts. The first builds the feature-selection model SFSDT, which generates the best feature subset from the original feature set; it is a hybrid of the Sequence Forward Selection (SFS) algorithm and a Decision Tree (DT) model. The second builds various IDS models trained on the best selected feature subset; the models are Recurrent Neural Networks (RNNs): traditional RNN, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Two IDS datasets, NSL-KDD (2010) and ISCX (2012), are used to train the models in the experiments. The final part evaluates the proposal by comparing the proposed models to other IDS models. The experimental results show that the proposed models achieve significantly improved detection accuracy and attack-type classification. Furthermore, memory-profiler measurements show that the approach can also reduce computation time.
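The SFS half of the SFSDT model can be sketched as a greedy loop that repeatedly adds whichever feature most improves a scoring function. Here a hypothetical lookup table stands in for the Decision Tree's validation accuracy; the feature names and gain values are invented.

```python
def sequential_forward_selection(features, score, k):
    """Greedily add the feature that most improves `score` on the
    currently selected subset, until k features are chosen or no
    candidate improves the score."""
    selected = []
    while len(selected) < k:
        best = max((f for f in features if f not in selected),
                   key=lambda f: score(selected + [f]))
        if score(selected + [best]) <= score(selected):
            break                      # no candidate helps; stop early
        selected.append(best)
    return selected

# Hypothetical scoring function standing in for a Decision Tree's
# validation accuracy: two features are informative, one barely helps,
# and larger subsets are mildly penalised.
GAIN = {"duration": 0.30, "src_bytes": 0.25, "flag_count": 0.02}
def score(subset):
    return 0.5 + sum(GAIN[f] for f in set(subset)) - 0.04 * max(0, len(subset) - 2)

chosen = sequential_forward_selection(list(GAIN), score, k=3)
print(chosen)   # ['duration', 'src_bytes']: the third feature hurts the score
```

In the paper the scoring function is an actual DT evaluated on the IDS dataset, and the resulting subset is what the RNN/LSTM/GRU models are trained on.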

Review


Open Access Review
Machine Learning and Deep Learning Methods for Intrusion Detection Systems: A Survey
Appl. Sci. 2019, 9(20), 4396; https://doi.org/10.3390/app9204396 - 17 Oct 2019
Cited by 7
Abstract
Networks play important roles in modern life, and cyber security has become a vital research area. An intrusion detection system (IDS), an important cyber security technique, monitors the state of software and hardware running in the network. Despite decades of development, existing IDSs still face challenges in improving detection accuracy, reducing the false alarm rate, and detecting unknown attacks. To solve these problems, many researchers have focused on developing IDSs that capitalize on machine learning methods, which can automatically discover the essential differences between normal and abnormal data with high accuracy. In addition, machine learning methods have strong generalizability, so they are also able to detect unknown attacks. Deep learning is a branch of machine learning whose remarkable performance has made it a research hotspot. This survey proposes a taxonomy of IDSs that takes data objects as the main dimension to classify and summarize the machine learning-based and deep learning-based IDS literature. We believe that this taxonomic framework suits cyber security researchers. The survey first clarifies the concept and taxonomy of IDSs. Then, the machine learning algorithms frequently used in IDSs, the metrics, and the benchmark datasets are introduced. Next, combined with representative literature, we take the proposed taxonomy as a baseline and explain how to solve key IDS issues with machine learning and deep learning techniques. Finally, challenges and future developments are discussed by reviewing recent representative studies.
