Machine Learning for Cybersecurity Threats, Challenges, and Opportunities

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (30 December 2020) | Viewed by 236953

Special Issue Editors

Group of Analysis, Security and Systems (GASS), Universidad Complutense de Madrid (UCM), 28040 Madrid, Spain
Interests: artificial intelligence; big data; computer networks; computer security; information theory; IoT; multimedia forensics; 6G
Decision Technologies Laboratory - LATITUDE, Electrical Engineering Department (ENE), Institute of Technology (FT), University of Brasília (UnB), Brasília-DF, CEP 70910-900, Brazil
Interests: cyber; information and network security; distributed data services and machine learning for intrusion and fraud detection; signal processing; energy harvesting and security at the physical layer
IBM Almaden Research Center, San Jose, CA, USA
Interests: error-correcting codes; fault tolerance; parallel processing; cryptography; modulation codes for magnetic recording; timing algorithms; holographic storage; parallel communications; neural networks; finite group theory
Cybersecurity INCT Unit 6, Decision Technologies Laboratory-LATITUDE, Electrical Engineering Department (ENE), Technology College, University of Brasília (UnB), Brasília-DF, CEP 70910-900, Brazil
Interests: computer and network security; multimedia forensics; error-correcting codes; information theory

Special Issue Information

Dear Colleagues,

Cybersecurity has become a major priority for every organization. The right controls and procedures must be put in place to detect potential attacks and protect against them. However, the number of cyber-attacks will always exceed the number of people trying to protect against them. New threats are discovered daily, making it harder for current solutions to cope with the large amount of data to analyze. Machine learning systems can be trained to find attacks that are similar to known attacks. This way, we can detect even the first intrusions of their kind and develop better security measures.

The sophistication of threats has also increased substantially. Sophisticated zero-day attacks may go undetected for months at a time. Attack patterns may be engineered to take place over extended periods of time, making them very difficult for traditional intrusion detection technologies to detect. Even worse, new attack tools and strategies can now be developed using adversarial machine learning techniques, requiring a rapid co-evolution of defenses that matches the speed and sophistication of machine learning-based offensive techniques. Based on this motivation, this Special Issue aims to provide a forum for people from academia and industry to communicate their latest results on theoretical advances and industrial case studies that combine machine learning techniques, such as reinforcement learning, adversarial machine learning, and deep learning, with significant problems in cybersecurity. Research papers can focus on offensive and defensive applications of machine learning to security. The potential topics of interest of this Special Issue are listed below. Submissions may present original research, serious dataset collection and benchmarking, or critical surveys.

Potential topics include but are not limited to:

  • Adversarial training and defensive distillation;
  • Attacks against machine learning;
  • Black-box attacks against machine learning;
  • Challenges of machine learning for cyber security;
  • Ethics of machine learning for cyber security applications;
  • Generative adversarial models;
  • Graph representation learning;
  • Machine learning forensics;
  • Machine learning threat intelligence;
  • Malware detection;
  • Neural graph learning;
  • One-shot learning; continuous learning;
  • Scalable machine learning for cyber security;
  • Steganography and steganalysis based on machine learning techniques;
  • Strengths and shortcomings of machine learning for cyber-security.
Prof. Luis Javier Garcia Villalba
Prof. Dr. Rafael T. de Sousa Jr.
Dr. Mario Blaum
Dr. Ana Lucila Sandoval Orozco
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Published Papers (37 papers)

Research

24 pages, 625 KiB  
Article
Machine Learning-Based Malicious X.509 Certificates’ Detection
by Jiaxin Li, Zhaoxin Zhang and Changyong Guo
Appl. Sci. 2021, 11(5), 2164; https://doi.org/10.3390/app11052164 - 01 Mar 2021
Cited by 4 | Viewed by 3183
Abstract
X.509 certificates play an important role in encrypting the transmission of data on both sides under HTTPS. With the popularization of X.509 certificates, more and more criminals leverage certificates to prevent their communications from being exposed by malicious traffic analysis tools. Phishing sites and malware are good examples. Those X.509 certificates found in phishing sites or malware are called malicious X.509 certificates. This paper applies different machine learning models, including classical machine learning models, ensemble learning models, and deep learning models, to distinguish between malicious certificates and benign certificates with Verification for Extraction (VFE). The VFE is a system we design and implement for obtaining plentiful characteristics of certificates. The result shows that ensemble learning models are the most stable and efficient models with an average accuracy of 95.9%, which outperforms many previous works. In addition, we obtain an SVM-based detection model with an accuracy of 98.2%, which is the highest accuracy. The outcome indicates the VFE is capable of capturing essential and crucial characteristics of malicious X.509 certificates.

20 pages, 2949 KiB  
Article
Designing Trojan Detectors in Neural Networks Using Interactive Simulations
by Peter Bajcsy, Nicholas J. Schaub and Michael Majurski
Appl. Sci. 2021, 11(4), 1865; https://doi.org/10.3390/app11041865 - 20 Feb 2021
Cited by 1 | Viewed by 2553
Abstract
This paper addresses the problem of designing trojan detectors in neural networks (NNs) using interactive simulations. Trojans in NNs are defined as triggers in inputs that cause misclassification of such inputs into a class (or classes) unintended by the design of a NN-based model. The goal of our work is to understand encodings of a variety of trojan types in fully connected layers of neural networks. Our approach is: (1) to simulate nine types of trojan embeddings into dot patterns; (2) to devise measurements of NN states; and (3) to design trojan detectors in NN-based classification models. The interactive simulations are built on top of TensorFlow Playground with in-memory storage of data and NN coefficients. The simulations provide analytical, visualization, and output operations performed on training datasets and NN architectures. The measurements of a NN include: (a) model inefficiency using modified Kullback–Leibler (KL) divergence from uniformly distributed states; and (b) model sensitivity to variables related to data and NNs. Using the KL divergence measurements at each NN layer and for each predicted class label, a trojan detector is devised to discriminate NN models with or without trojans. To document robustness of such a trojan detector with respect to NN architectures, dataset perturbations, and trojan types, several properties of the KL divergence measurement are presented.
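As a rough illustration of the kind of measurement mentioned in this abstract, the sketch below computes the standard KL divergence between an empirical distribution of observed states and a uniform distribution. It is not the authors' modified KL divergence, whose definition is not given here; the function name `kl_from_uniform` and its parameters are hypothetical.

```python
import math
from collections import Counter


def kl_from_uniform(states, num_possible_states):
    """Return KL(p || u) in bits.

    p is the empirical distribution of the observed `states`;
    u is the uniform distribution over `num_possible_states` outcomes.
    This is the textbook KL divergence, not the paper's modified variant.
    """
    if num_possible_states <= 0:
        raise ValueError("num_possible_states must be positive")
    if not states:
        raise ValueError("states must not be empty")
    counts = Counter(states)
    if len(counts) > num_possible_states:
        raise ValueError("more distinct states observed than declared possible")

    total = len(states)
    uniform_prob = 1.0 / num_possible_states
    divergence = 0.0
    # States that never occur contribute 0 (the limit of p*log(p) as p -> 0).
    for count in counts.values():
        p = count / total
        divergence += p * math.log2(p / uniform_prob)
    return divergence


# Hypothetical example: states that use all four outcomes evenly diverge
# from uniform by 0 bits; states stuck on one outcome diverge by log2(4).
evenly_used = kl_from_uniform([0, 1, 2, 3], 4)
single_state = kl_from_uniform([0, 0, 0, 0], 4)
```

Under this reading, a value near zero means the observed states are spread evenly, and larger values mean they are concentrated on a few outcomes.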

30 pages, 2267 KiB  
Article
TEDL: A Text Encryption Method Based on Deep Learning
by Peng Wang and Xiang Li
Appl. Sci. 2021, 11(4), 1781; https://doi.org/10.3390/app11041781 - 17 Feb 2021
Cited by 2 | Viewed by 2155
Abstract
Recent years have seen an increasing emphasis on information security, and various encryption methods have been proposed. However, for symmetric encryption methods, the well-known encryption techniques still rely on the key space to guarantee security and suffer from frequent key updating. Aiming to solve those problems, this paper proposes a novel symmetric-key method for text encryption based on deep learning called TEDL, where the secret key includes hyperparameters in the deep learning model and the core step of encryption is transforming input data into weights trained under hyperparameters. First, both communicating parties establish a word vector table by training a deep learning model according to specified hyperparameters. Then, a self-update codebook is constructed on the word vector table with the SHA-256 function and other tricks. When communication starts, encryption and decryption are equivalent to indexing and inverted indexing on the codebook, respectively, thus achieving the transformation between plaintext and ciphertext. Results of experiments and relevant analyses show that TEDL performs well in terms of security, efficiency, and generality, and has a lower demand for the frequency of key redistribution. In particular, as a supplement to current encryption methods, the time-consuming process of constructing a codebook increases the difficulty of brute-force attacks without degrading the efficiency of communications.

21 pages, 595 KiB  
Article
Intelligent Cyber Attack Detection and Classification for Network-Based Intrusion Detection Systems
by Nuno Oliveira, Isabel Praça, Eva Maia and Orlando Sousa
Appl. Sci. 2021, 11(4), 1674; https://doi.org/10.3390/app11041674 - 13 Feb 2021
Cited by 56 | Viewed by 6668
Abstract
With the latest advances in information and communication technologies, greater amounts of sensitive user and corporate information are shared continuously across the network, making it susceptible to an attack that can compromise data confidentiality, integrity, and availability. Intrusion Detection Systems (IDS) are important security mechanisms that can perform the timely detection of malicious events through the inspection of network traffic or host-based logs. Many machine learning techniques have proven to be successful at conducting anomaly detection throughout the years, but only a few considered the sequential nature of data. This work proposes a sequential approach and evaluates the performance of a Random Forest (RF), a Multi-Layer Perceptron (MLP), and a Long Short-Term Memory (LSTM) network on the CIDDS-001 dataset. The resulting performance measures of this particular approach are compared with the ones obtained from a more traditional one, which only considers individual flow information, in order to determine which methodology best suits the concerned scenario. The experimental outcomes suggest that anomaly detection can be better addressed from a sequential perspective. The LSTM is a highly reliable model for acquiring sequential patterns in network traffic data, achieving an accuracy of 99.94% and an F1-score of 91.66%.
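The gap between the accuracy and the F1-score reported in this abstract is typical of imbalanced data, where benign traffic dominates. The sketch below shows how both metrics are computed from confusion-matrix counts. The counts are hypothetical and are not taken from the paper or from CIDDS-001.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all samples that were classified correctly."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean (F1) for the positive class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 10,000 flows, of which only 55 are attacks.
tp, fp, fn, tn = 50, 5, 5, 9940

acc = accuracy(tp, tn, fp, fn)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
print(f"accuracy={acc:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

Because true negatives count toward accuracy but not toward F1, a handful of errors on the rare attack class barely moves accuracy while lowering F1 noticeably.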

15 pages, 922 KiB  
Article
An Attention-Based Graph Neural Network for Spam Bot Detection in Social Networks
by Chensu Zhao, Yang Xin, Xuefeng Li, Hongliang Zhu, Yixian Yang and Yuling Chen
Appl. Sci. 2020, 10(22), 8160; https://doi.org/10.3390/app10228160 - 18 Nov 2020
Cited by 19 | Viewed by 3424
Abstract
With the rapid development of social networks, the malicious behavior of spam bots and other anomalous accounts has become a critical information security problem threatening social network platforms. In order to reduce this threat, existing research mainly uses feature-based detection or propagation-based detection, and it applies machine learning or graph mining algorithms to identify anomalous accounts in social networks. However, with the development of technology, spam bots are becoming more advanced, and identifying bots is still an open challenge. This paper proposes a new semi-supervised graph embedding model based on a graph attention network for spam bot detection in social networks. This approach constructs a detection model by aggregating features and neighbor relationships, and learns a complex method to integrate the different neighborhood relationships between nodes to operate on the directed social graph. The new model can identify spam bots by capturing user features and two different relationships among users in social networks. We compare our method with other methods on real-world social network datasets, and the experimental results show that our proposed model achieves a significant and consistent improvement.

10 pages, 606 KiB  
Article
Detecting Cyber Threat Event from Twitter Using IDCNN and BiLSTM
by Yong Fang, Jian Gao, Zhonglin Liu and Cheng Huang
Appl. Sci. 2020, 10(17), 5922; https://doi.org/10.3390/app10175922 - 26 Aug 2020
Cited by 16 | Viewed by 3868
Abstract
In the context of increasing cyber threats and attacks, monitoring and analyzing network security incidents in a timely and effective way is the key to ensuring network infrastructure security. As one of the world’s most popular social media sites, Twitter hosts all kinds of user messages, from daily life to global news and political strategy. It can aggregate a large number of network security-related events promptly and provide a source of information flow about cyber threats. In this paper, for detecting cyber threat events on Twitter, we present a multi-task learning approach based on natural language processing technology and machine learning algorithms, namely the Iterated Dilated Convolutional Neural Network (IDCNN) and Bidirectional Long Short-Term Memory (BiLSTM), to establish a highly accurate network model. Furthermore, we collect a network threat-related Twitter database from public datasets to verify our model’s performance. The results show that the proposed model works well to detect cyber threat events from tweets and significantly outperforms several baselines.

23 pages, 4024 KiB  
Article
Hybrid Malware Classification Method Using Segmentation-Based Fractal Texture Analysis and Deep Convolution Neural Network Features
by Maryam Nisa, Jamal Hussain Shah, Shansa Kanwal, Mudassar Raza, Muhammad Attique Khan, Robertas Damaševičius and Tomas Blažauskas
Appl. Sci. 2020, 10(14), 4966; https://doi.org/10.3390/app10144966 - 19 Jul 2020
Cited by 89 | Viewed by 6601
Abstract
As the number of internet users increases, so does the number of malicious attacks using malware. The detection of malicious code is becoming critical, and the existing approaches need to be improved. Here, we propose a feature fusion method to combine the features extracted from pre-trained AlexNet and Inception-v3 deep neural networks with features attained using segmentation-based fractal texture analysis (SFTA) of images representing the malware code. In this work, we use distinctive pre-trained models (AlexNet and Inception-v3) for feature extraction. The purpose of deep convolutional neural network (CNN) feature extraction from two models is to improve the malware classifier accuracy, because both models have characteristics and qualities to extract different features. This technique produces a fusion of features to build a multimodal representation of malicious code that can be used to classify the grayscale images, separating the malware into 25 malware classes. The features that are extracted from malware images are then classified using different variants of support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and other classifiers. To improve the classification results, we also adopted data augmentation based on affine image transforms. The presented method is evaluated on the Malimg malware image dataset, achieving an accuracy of 99.3%, which makes it the best among the competing approaches.

20 pages, 473 KiB  
Article
Cross-Site Scripting Guardian: A Static XSS Detector Based on Data Stream Input-Output Association Mining
by Chenghao Li, Yiding Wang, Changwei Miao and Cheng Huang
Appl. Sci. 2020, 10(14), 4740; https://doi.org/10.3390/app10144740 - 09 Jul 2020
Cited by 15 | Viewed by 4689
Abstract
The largest number of cybersecurity attacks target web applications, among which Cross-Site Scripting (XSS) is the most popular attack method. Code auditing is the main method to avoid the damage of XSS at the source code level. However, manual audits and rule-based audit tools have numerous limitations. In the age of big data, assisting manual auditing through machine learning is a new research field. In this paper, we propose a new way to audit the XSS vulnerability in PHP source code snippets based on a PHP code parsing tool and a machine learning algorithm. We analyzed the operation sequence of source code and built a model to acquire the information that is most closely related to the XSS attack in the data stream. The method proposed can significantly improve the recall rate of vulnerability samples. Compared with related audit methods, our method has high reusability and excellent performance. Our classification model achieved an F1 score of 0.92, a recall rate of 0.98 (vulnerable sample), and an area under curve (AUC) of 0.97 on the test dataset.

17 pages, 1236 KiB  
Article
Providing Email Privacy by Preventing Webmail from Loading Malicious XSS Payloads
by Yong Fang, Yijia Xu, Peng Jia and Cheng Huang
Appl. Sci. 2020, 10(13), 4425; https://doi.org/10.3390/app10134425 - 27 Jun 2020
Cited by 7 | Viewed by 14574
Abstract
With the development of internet technology, email has become the formal communication method in modern society. Email often contains a large amount of personal privacy information, possible business agreements, and sensitive attachments, which make emails a good target for hackers. One of the most common attack methods used by hackers is email XSS (Cross-site scripting). By exploiting XSS vulnerabilities, hackers can steal identities, log into the victim’s mailbox, and steal content directly. Therefore, this paper proposes an email XSS detection model based on deep learning technology, which can identify whether the XSS payload is carried in the email or not. First, the model extracts the Sender, Receiver, Subject, Content, and Attachment field information from the original email. Second, the email XSS corpus is formed after data processing. The Word2Vec algorithm is introduced to train the corpus and extract features for each email sample. Finally, the model uses the Bidirectional-RNN algorithm and Attention mechanism to train the email XSS detection model. In the experiment, the AUC (area under curve) value of the Bidirectional-RNN model reached 0.9979. When the Attention mechanism was added, the accuracy upper limit of the Bidirectional-RNN model was raised to 0.9936, and the loss value was reduced to 0.03.

29 pages, 3410 KiB  
Article
Methodology for Forensics Data Reconstruction on Mobile Devices with Android Operating System Applying In-System Programming and Combination Firmware
by Claudinei Morin da Silveira, Rafael T. de Sousa Jr, Robson de Oliveira Albuquerque, Georges D. Amvame Nze, Gildásio Antonio de Oliveira Júnior, Ana Lucila Sandoval Orozco and Luis Javier García Villalba
Appl. Sci. 2020, 10(12), 4231; https://doi.org/10.3390/app10124231 - 20 Jun 2020
Cited by 7 | Viewed by 10582
Abstract
This paper proposes a new forensic analysis methodology that combines processes, techniques, and tools for physical and logical data acquisition from mobile devices. The proposed methodology allows an overview of the use of the In-System Programming (ISP) technique with the usage of Combination Firmware, aligned with specific collection and analysis processes. The experiments carried out show that the proposed methodology is convenient and practical and provides new possibilities for data acquisition on devices that run the Android Operating System with advanced protection mechanisms. The methodology is also feasible on devices that are compatible with Joint Test Action Group (JTAG) techniques and that use Embedded Multimedia Card (eMMC) or Embedded Multi-Chip Package (eMCP) as main memory. The techniques included in the methodology are effective on encrypted devices, on which the JTAG and Chip-Off techniques prove to be ineffective, especially on those that have an unauthorized access protection mechanism enabled, such as lock screen password, blocked bootloader, and Factory Reset Protection (FRP) active. Studies also demonstrate that data preservation and integrity are maintained, which is critical to a digital forensic process.

21 pages, 2665 KiB  
Article
Malicious JavaScript Detection Based on Bidirectional LSTM Model
by Xuyan Song, Chen Chen, Baojiang Cui and Junsong Fu
Appl. Sci. 2020, 10(10), 3440; https://doi.org/10.3390/app10103440 - 16 May 2020
Cited by 24 | Viewed by 4493
Abstract
JavaScript has been widely used on the Internet because of its powerful features, and almost all websites use it to provide dynamic functions. However, this dynamic nature also carries potential risks. Authors of malicious scripts have started using JavaScript to launch various attacks, such as Cross-Site Scripting (XSS), Cross-site Request Forgery (CSRF), and drive-by download attacks. Traditional malicious script detection relies on expert knowledge, but even for experts, this is an error-prone task. To solve this problem, many learning-based methods for malicious JavaScript detection are being explored. In this paper, we propose a novel deep learning-based method for malicious JavaScript detection. In order to extract semantic information from JavaScript programs, we construct the Program Dependency Graph (PDG) and generate semantic slices, which preserve rich semantic information and are easy to transform into vectors. Then, a malicious JavaScript detection model based on the Bidirectional Long Short-Term Memory (BLSTM) neural network is proposed. Experimental results show that, in comparison with the other five methods, our model achieved the best performance, with an accuracy of 97.71% and an F1-score of 98.29%.

21 pages, 6411 KiB  
Article
ConvProtoNet: Deep Prototype Induction towards Better Class Representation for Few-Shot Malware Classification
by Zhijie Tang, Peng Wang and Junfeng Wang
Appl. Sci. 2020, 10(8), 2847; https://doi.org/10.3390/app10082847 - 20 Apr 2020
Cited by 20 | Viewed by 3405
Abstract
Traditional malware classification relies on known malware types and significantly large, manually labeled datasets, which limits its ability to recognize new malware classes. For unknown malware types or new variants of existing malware with only a few samples in each class, common classification methods often fail to work well due to severe overfitting. In this paper, we propose a new neural network structure called ConvProtoNet, which employs few-shot learning to address the problem of scarce malware samples while preventing overfitting. We design a convolutional induction module to replace the insufficient prototype reduction in most few-shot models and generate more appropriate class-level malware prototypes for classification. We also adopt a meta-learning scheme to make the classifier robust enough to adapt to unseen malware classes without fine-tuning. Even in extreme conditions where only 5 samples in each class are provided, ConvProtoNet still achieves more than 70% accuracy on average and outperforms other traditional malware classification methods and existing few-shot models in experiments conducted on several datasets. Extra experiments across datasets illustrate that ConvProtoNet learns general knowledge of malware that is dataset-invariant, and careful model analysis demonstrates the effectiveness of ConvProtoNet in few-shot malware classification.
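For context, the "prototype reduction" used in many few-shot models is often a plain mean over the support embeddings of each class, followed by nearest-prototype classification. The sketch below shows that common baseline only. It is not ConvProtoNet's convolutional induction module, and the labels, embeddings, and function names are hypothetical.

```python
import math


def mean_prototype(embeddings):
    """Reduce a class's support embeddings to one prototype by averaging."""
    dim = len(embeddings[0])
    count = len(embeddings)
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]


def classify(query, support):
    """Return the label whose mean prototype is closest to `query`.

    `support` maps each class label to a list of embedding vectors
    (the few labeled samples available for that class).
    """
    prototypes = {label: mean_prototype(vecs) for label, vecs in support.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-D embeddings for two malware families, two samples each.
support_set = {
    "family_a": [[0.0, 0.0], [0.0, 2.0]],
    "family_b": [[10.0, 10.0], [12.0, 10.0]],
}
predicted = classify([1.0, 1.0], support_set)
```

In a real few-shot pipeline the embeddings would come from a trained encoder; here they are hard-coded so the reduction and distance steps are easy to follow.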

26 pages, 3444 KiB  
Article
Investigation of Dual-Flow Deep Learning Models LSTM-FCN and GRU-FCN Efficiency against Single-Flow CNN Models for the Host-Based Intrusion and Malware Detection Task on Univariate Times Series Data
by Dainius Čeponis and Nikolaj Goranin
Appl. Sci. 2020, 10(7), 2373; https://doi.org/10.3390/app10072373 - 30 Mar 2020
Cited by 14 | Viewed by 6008
Abstract
Intrusion and malware detection tasks on a host level are a critical part of the overall information security infrastructure of a modern enterprise. While classical host-based intrusion detection systems (HIDS) and antivirus (AV) approaches are based on change monitoring of critical files and malware signatures, respectively, some recent research, utilizing relatively vanilla deep learning (DL) methods, has demonstrated promising anomaly-based detection results that already have practical applicability due to a low false positive rate (FPR). More complex DL methods typically provide better results in natural language processing and image recognition tasks. In this paper, we analyze the applicability of more complex dual-flow DL methods, such as long short-term memory fully convolutional network (LSTM-FCN), gated recurrent unit (GRU)-FCN, and several others, to the task specified on the attack-caused Windows OS system calls traces dataset (AWSCTD) and compare them with vanilla single-flow convolutional neural network (CNN) models. The results obtained do not demonstrate any advantage of dual-flow models when processing univariate time series data: they introduce an unnecessary level of complexity and increase training and anomaly detection time, which is crucial in the intrusion containment process. On the other hand, the newly tested AWSCTD-CNN-static (S) single-flow model demonstrated three times better training and testing times while preserving the high detection accuracy.

23 pages, 776 KiB  
Article
DeepDCA: Novel Network-Based Detection of IoT Attacks Using Artificial Immune System
by Sahar Aldhaheri, Daniyal Alghazzawi, Li Cheng, Bander Alzahrani and Abdullah Al-Barakati
Appl. Sci. 2020, 10(6), 1909; https://doi.org/10.3390/app10061909 - 11 Mar 2020
Cited by 61 | Viewed by 5814
Abstract
Recently, the Internet of Things (IoT) has attained tremendous popularity, although this promising technology leads to a variety of security obstacles. Conventional solutions do not suit the new dilemmas brought by the IoT ecosystem. In contrast, Artificial Immune Systems (AIS) are intelligent and adaptive systems that mimic the human immune system, which holds desirable properties for such a dynamic environment and provides an opportunity to improve IoT security. In this work, we develop a novel hybrid Deep Learning and Dendritic Cell Algorithm (DeepDCA) in the context of an Intrusion Detection System (IDS). The framework adopts the Dendritic Cell Algorithm (DCA) and a Self Normalizing Neural Network (SNN). The aim of this research is to classify IoT intrusions and minimize false alarm generation, and also to automate and smooth the signal extraction phase, which improves the classification performance. The proposed IDS selects a suitable set of features from the IoT-Bot dataset, performs signal categorization using the SNN, and then uses the DCA for classification. The experimental results show that DeepDCA performed well in detecting IoT attacks, with a high detection rate demonstrating over 98.73% accuracy and a low false-positive rate. We also compared these results with state-of-the-art techniques, which showed that our model is capable of performing better classification tasks than SVM, NB, KNN, and MLP. We plan to carry out further experiments to verify the framework using a more challenging dataset, make further comparisons with other signal extraction approaches, and extend the work to real-time (online) attack detection.

21 pages, 488 KiB  
Article
Towards a Reliable Comparison and Evaluation of Network Intrusion Detection Systems Based on Machine Learning Approaches
by Roberto Magán-Carrión, Daniel Urda, Ignacio Díaz-Cano and Bernabé Dorronsoro
Appl. Sci. 2020, 10(5), 1775; https://doi.org/10.3390/app10051775 - 04 Mar 2020
Cited by 66 | Viewed by 6868
Abstract
Presently, we are living in a hyper-connected world where millions of heterogeneous devices are continuously sharing information in different application contexts for wellness, improving communications, digital businesses, etc. However, the greater the number of devices and connections, the higher the risk of security threats in this scenario. To counteract malicious behaviours and preserve essential security services, Network Intrusion Detection Systems (NIDSs) are the most widely used line of defence in communications networks. Nevertheless, there is no standard methodology to evaluate and fairly compare NIDSs. Most proposals omit crucial steps regarding NIDS validation, which makes their comparison hard or even impossible. This work first presents a comprehensive study of recent NIDSs based on machine learning approaches, concluding that almost none of them comply with what the authors of this paper consider mandatory steps for a reliable comparison and evaluation of NIDSs. Second, a structured methodology is proposed and assessed on the UGR’16 dataset to test its suitability for addressing network attack detection problems. The recommended guideline and steps will definitely help the research community to fairly assess NIDSs, although building a definitive framework is not a trivial task and, therefore, some extra effort should still be made to further improve its understandability and usability.

16 pages, 1416 KiB  
Article
Automated Vulnerability Detection in Source Code Using Minimum Intermediate Representation Learning
by Xin Li, Lu Wang, Yang Xin, Yixian Yang and Yuling Chen
Appl. Sci. 2020, 10(5), 1692; https://doi.org/10.3390/app10051692 - 02 Mar 2020
Cited by 52 | Viewed by 6908
Abstract
Vulnerabilities are one of the root causes of network intrusion. An effective way to mitigate security threats is to discover and patch vulnerabilities before an attack. Traditional vulnerability detection methods rely on manual participation and incur a high false positive rate. Intelligent vulnerability detection methods suffer from the problems of long-term dependence, out-of-vocabulary words, coarse detection granularity, and a lack of vulnerable samples. This paper proposes an automated and intelligent vulnerability detection method for source code based on minimum intermediate representation learning. First, the sample in the form of source code is transformed into a minimum intermediate representation to exclude irrelevant items and reduce the length of the dependency. Next, the intermediate representation is transformed into a real-valued vector through pre-training on an extended corpus, and the structure and semantic information are retained. Then, the vector is fed to three concatenated convolutional neural networks to obtain high-level features of vulnerability. Last, a classifier is trained using the learned features. To validate this vulnerability detection method, an experiment was performed. The empirical results confirmed that, compared with the traditional methods and the state-of-the-art intelligent methods, our method achieves better performance with fine granularity.

31 pages, 7123 KiB  
Article
Systematic Approach to Malware Analysis (SAMA)
by Javier Bermejo Higuera, Carlos Abad Aramburu, Juan-Ramón Bermejo Higuera, Miguel Angel Sicilia Urban and Juan Antonio Sicilia Montalvo
Appl. Sci. 2020, 10(4), 1360; https://doi.org/10.3390/app10041360 - 17 Feb 2020
Cited by 21 | Viewed by 9353
Abstract
Malware threats pose new challenges to analytic and reverse engineering tasks. A systematic approach to that analysis is needed in an attempt to fully uncover their underlying attack vectors and techniques and find commonalities between them. In this paper, a method of malware analysis is described, together with a report of its application to the case of Flame and Red October. The method has also been used by different analysts to analyze other malware threats like ‘Stuxnet’, ‘Dark Comet’, ‘Poison Ivy’, ‘Locky’, ‘Careto’, and ‘Sofacy Carberp’. The method presented in this work is a systematic and methodological process of analysis, whose main objective is to acquire knowledge and gain a full understanding of a particular malware. Using the proposed method to analyze two well-known malware samples, ‘Flame’ and ‘Red October’, will help to understand the added value of the method.

15 pages, 1476 KiB  
Article
Graph Convolutional Networks for Privacy Metrics in Online Social Networks
by Xuefeng Li, Yang Xin, Chensu Zhao, Yixian Yang and Yuling Chen
Appl. Sci. 2020, 10(4), 1327; https://doi.org/10.3390/app10041327 - 15 Feb 2020
Cited by 11 | Viewed by 2577
Abstract
In recent years, privacy leakage events in large-scale social networks have become increasingly frequent. Traditional methods relying on operators have been unable to effectively curb this problem. Researchers must turn their attention to the privacy protection of users themselves. Privacy metrics are undoubtedly the most effective method. However, social networks have a substantial number of users and a complex network structure and feature set. Previous studies either considered a single aspect or measured multiple aspects separately and then artificially integrated them. The measurement procedures are complex and cannot effectively be integrated. To solve the above problems, we first propose using a deep neural network to measure the privacy status of social network users. Through a graph convolution network, we can easily and efficiently combine the user features and graph structure, determine the hidden relationships between these features, and obtain more accurate privacy scores. Given the restriction of the deep learning framework, which requires a large number of labelled samples, we incorporate a few-shot learning method, which greatly reduces the dependence on labelled data and human intervention. Our method is applicable to online social networks, such as Sina Weibo, Twitter, and Facebook, that can extract profile information, graph structure information of users’ friends, and behavioural characteristics. The experiments show that our model can quickly and accurately obtain privacy scores in a whole network and eliminate traditional tedious numerical calculations and human intervention.

16 pages, 3696 KiB  
Article
Cybersecurity Threats Based on Machine Learning-Based Offensive Technique for Password Authentication
by Kyungroul Lee and Kangbin Yim
Appl. Sci. 2020, 10(4), 1286; https://doi.org/10.3390/app10041286 - 14 Feb 2020
Cited by 12 | Viewed by 4212
Abstract
With the emergence of the online society, password authentication, a representative user authentication method, has become a key topic. However, various attack techniques have emerged that steal passwords entered from the keyboard; hence, keyboard data is not secure. To detect and prevent such attacks, a keyboard data protection technique using random keyboard data generation has been presented. This technique protects keyboard data by generating dummy keyboard data while the attacker obtains the keyboard data. In this study, we demonstrate the feasibility of keyboard data exposure under the keyboard data protection technique. To prove the proposed attack technique, we gathered all the dummy keyboard data generated by the defense tool and the real keyboard data input by the user, and evaluated the cybersecurity threat to keyboard data using a machine learning-based offensive technique. We verified that an adversary obtains the keyboard data with 96.2% accuracy even when the protection technique intended to prevent keyboard data exposure is in use. Namely, the method proposed in this study clearly differentiates the keyboard data input by the user from dummy keyboard data. Therefore, the contributions of this paper are that we derived and verified a new security threat and a new vulnerability of password authentication. Furthermore, the new cybersecurity threat derived from this study will be useful for the security assessment of password authentication and of all types of authentication technology and application services that take input from the keyboard.

14 pages, 931 KiB  
Article
Collecting Vulnerable Source Code from Open-Source Repositories for Dataset Generation
by Razvan Raducu, Gonzalo Esteban, Francisco J. Rodríguez Lera and Camino Fernández
Appl. Sci. 2020, 10(4), 1270; https://doi.org/10.3390/app10041270 - 13 Feb 2020
Cited by 7 | Viewed by 4680
Abstract
Different Machine Learning techniques to detect software vulnerabilities have emerged in scientific and industrial scenarios. Different actors in these scenarios aim to develop algorithms for predicting security threats without requiring human intervention. However, these algorithms require data-driven engines based on the processing of huge amounts of data, known as datasets. This paper introduces the SonarCloud Vulnerable Code Prospector for C (SVCP4C). This tool aims to collect vulnerable source code from open source repositories linked to SonarCloud, an online tool that performs static analysis and tags the potentially vulnerable code. The tool provides a set of tagged files suitable for extracting features and creating training datasets for Machine Learning algorithms. This study presents a descriptive analysis of these files and overviews the current status of C vulnerabilities, specifically buffer overflow, in the reviewed public repositories.

19 pages, 2318 KiB  
Article
A New Method of Fuzzy Support Vector Machine Algorithm for Intrusion Detection
by Wei Liu, LinLin Ci and LiPing Liu
Appl. Sci. 2020, 10(3), 1065; https://doi.org/10.3390/app10031065 - 05 Feb 2020
Cited by 19 | Viewed by 4224
Abstract
Because SVM is sensitive to noise and outliers in system call sequence data, a new fuzzy support vector machine algorithm based on SVDD is presented in this paper. In our algorithm, noise and outliers are identified by a hypersphere with minimum volume that contains the maximum number of samples. The definition of fuzzy membership considers not only the relation between a sample and the hyperplane, but also the relation between samples. For each sample inside the hypersphere, the fuzzy membership function is a linear function of the distance between the sample and the hyperplane. The greater the distance, the greater the weight coefficient. For each sample outside the hypersphere, the membership function is an exponential function of the distance between the sample and the hyperplane. The greater the distance, the smaller the weight coefficient. Compared with the traditional fuzzy membership definition based on the relation between a sample and its cluster center, our method effectively distinguishes noise or outliers from support vectors and assigns them appropriate weight coefficients even though they are distributed on the boundary between the positive and the negative classes. The experiments show that the fuzzy support vector machine proposed in this paper is more robust than the support vector machine and fuzzy support vector machines based on the distance between a sample and its cluster center.
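The membership rule described in this abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact formulas, so the function name `fuzzy_membership`, the floor `s_min`, the normalizing distance `d_max`, and the decay rate `beta` are hypothetical choices for illustration only, not the authors' definitions.

```python
import math


def fuzzy_membership(dist_to_hyperplane, inside_hypersphere, d_max,
                     s_min=0.1, beta=1.0):
    """Toy membership weight for one training sample.

    Assumptions (not from the paper): weights lie in (0, 1], samples
    inside the hypersphere never weigh less than s_min, and samples
    outside it never weigh more than s_min.
    """
    if d_max <= 0:
        raise ValueError("d_max must be positive")
    d = max(dist_to_hyperplane, 0.0)
    if inside_hypersphere:
        # Linear in the distance: farther from the hyperplane, larger weight.
        return s_min + (1.0 - s_min) * min(d, d_max) / d_max
    # Exponential in the distance: farther from the hyperplane, smaller weight.
    return s_min * math.exp(-beta * d)


inside_near = fuzzy_membership(0.5, True, d_max=2.0)
inside_far = fuzzy_membership(1.5, True, d_max=2.0)
outside_far = fuzzy_membership(1.5, False, d_max=2.0)
```

In this sketch a suspected outlier (outside the hypersphere) always receives a smaller weight than any sample inside it, which is one simple way to realize the behaviour the abstract describes.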

18 pages, 1497 KiB  
Article
A Heterogeneous Ensemble Learning Framework for Spam Detection in Social Networks with Imbalanced Data
by Chensu Zhao, Yang Xin, Xuefeng Li, Yixian Yang and Yuling Chen
Appl. Sci. 2020, 10(3), 936; https://doi.org/10.3390/app10030936 - 31 Jan 2020
Cited by 48 | Viewed by 5228
Abstract
The popularity of social networks provides people with many conveniences, but their rapid growth has also attracted many attackers. In recent years, the malicious behavior of social network spammers has seriously threatened the information security of ordinary users. To reduce this threat, many researchers have mined the behavior characteristics of spammers and have obtained good results by applying machine learning algorithms to identify spammers in social networks. However, most of these studies overlook class imbalance situations that exist in real world data. In this paper, we propose a heterogeneous stacking-based ensemble learning framework to ameliorate the impact of class imbalance on spam detection in social networks. The proposed framework consists of two main components, a base module and a combining module. In the base module, we adopt six different base classifiers and utilize this classifier diversity to construct new ensemble input members. In the combining module, we introduce cost sensitive learning into deep neural network training. By setting different costs for misclassification and dynamically adjusting the weights of the prediction results of the base classifiers, we can integrate the input members and aggregate the classification results. The experimental results show that our framework effectively improves the spam detection rate on imbalanced datasets.

19 pages, 849 KiB  
Article
Synthetic Minority Oversampling Technique for Optimizing Classification Tasks in Botnet and Intrusion-Detection-System Datasets
by David Gonzalez-Cuautle, Aldo Hernandez-Suarez, Gabriel Sanchez-Perez, Linda Karina Toscano-Medina, Jose Portillo-Portillo, Jesus Olivares-Mercado, Hector Manuel Perez-Meana and Ana Lucila Sandoval-Orozco
Appl. Sci. 2020, 10(3), 794; https://doi.org/10.3390/app10030794 - 22 Jan 2020
Cited by 51 | Viewed by 4915
Abstract
Presently, security is a hot research topic due to its impact on daily information infrastructure. Machine-learning solutions have been improving classical detection practices, but detection tasks employ irregular amounts of data since the number of instances that represent one or several malicious samples can significantly vary. In highly unbalanced data, classification models regularly have high precision with respect to the majority class, while minority classes are considered noise due to the lack of information that they provide. Well-known datasets used for malware-based analyses like botnet attacks and Intrusion Detection Systems (IDS) mainly comprise logs, records, or network-traffic captures that do not provide an ideal source of evidence as a result of obtaining raw data. As an example, the numbers of abnormal and constant connections generated by either botnets or intruders within a network are considerably smaller than those from benign applications. In most cases, inadequate dataset design may degrade a learning algorithm, resulting in overfitting and poor classification rates. To address these problems, we propose a resampling method, the Synthetic Minority Oversampling Technique (SMOTE), with a grid-search algorithm optimization procedure. This work demonstrates classification-result improvements for botnet and IDS datasets by merging synthetically generated balanced data and tuning different supervised-learning algorithms.
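As a rough illustration of the resampling idea named above, the core interpolation step of SMOTE can be sketched in plain Python. This is a simplified sketch, not the authors' pipeline: it omits the grid-search tuning, picks base samples at random, and the function name `smote`, the toy data, and the defaults `k=3` and `seed=0` are assumptions made for the example.

```python
import math
import random


def smote(minority, n_synthetic, k=3, seed=0):
    """Create synthetic minority samples by interpolating between a
    randomly chosen minority sample and one of its k nearest minority
    neighbours (the basic SMOTE interpolation step)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Nearest neighbours of the base sample among the other minority samples.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical 2-D minority class (e.g., a handful of botnet flows).
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_synthetic=6)
```

Each synthetic point lies on a line segment between two real minority samples, so the oversampled class stays within the region the minority data already occupies.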

20 pages, 12601 KiB  
Article
Improving Incident Response in Big Data Ecosystems by Using Blockchain Technologies
by Julio Moreno, Manuel A. Serrano, Eduardo B. Fernandez and Eduardo Fernández-Medina
Appl. Sci. 2020, 10(2), 724; https://doi.org/10.3390/app10020724 - 20 Jan 2020
Cited by 10 | Viewed by 4434
Abstract
Big data ecosystems are increasingly important for the daily activities of any type of company. They are decisive elements in the organization, so any malfunction of this environment can have a great impact on the normal functioning of the company; security is therefore a crucial aspect of this type of ecosystem. When approaching security in big data as an issue, it must be considered not only during the creation and implementation of the big data ecosystem, but also throughout its entire lifecycle, including operation, and especially when managing and responding to incidents that occur. To this end, this paper proposes an incident response process supported by a private blockchain network that allows the recording of the different events and incidents that occur in the big data ecosystem. The use of blockchain enables the security of the stored data to be improved, increasing its immutability and traceability. In addition, the stored records can help manage incidents and anticipate them, thereby minimizing the costs of investigating their causes; that facilitates forensic readiness. This proposal integrates with previous research work, seeking to improve the security of big data by creating a process of secure analysis, design, and implementation, supported by a security reference architecture that serves as a guide in defining the different elements of this type of ecosystem. Moreover, this paper presents a case study in which the proposal is being implemented by using big data and blockchain technologies, such as Apache Spark or Hyperledger Fabric.

14 pages, 314 KiB  
Article
A Zero-Knowledge Proof System with Algebraic Geometry Techniques
by Edgar González Fernández, Guillermo Morales-Luna and Feliu Sagols
Appl. Sci. 2020, 10(2), 465; https://doi.org/10.3390/app10020465 - 08 Jan 2020
Cited by 4 | Viewed by 3424
Abstract
Current requirements for ensuring data exchange over the internet to fight against security breaches have to consider new cryptographic attacks. The most recent advances in cryptanalysis are boosted by quantum computers, which are able to break common cryptographic primitives. This makes evident the need for developing further communication protocols to secure sensitive data. Zero-knowledge proof systems have been around for a while and have been considered for providing authentication and identification services, but it has only been in recent times that their popularity has risen due to novel applications in blockchain technology, Internet of Things, and cloud storage, among others. A new zero-knowledge proof system is presented, which bases its security on two main problems, known to be resistant, up to now, against quantum attacks: the graph isomorphism problem and the isomorphism of polynomials problem.
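For orientation, the classic textbook zero-knowledge protocol for graph isomorphism, one of the two problems named above, can be sketched in Python. This is not the construction presented in the paper, which also relies on the isomorphism of polynomials problem. The toy graphs, the function names, and the use of `random` (which is not cryptographically secure) are assumptions made for the example.

```python
import random


def relabel(perm, edges):
    """Apply a vertex permutation to an undirected edge set."""
    return {tuple(sorted((perm[u], perm[v]))) for u, v in edges}


def compose(outer, inner):
    """Return the permutation v -> outer[inner[v]]."""
    return [outer[inner[v]] for v in range(len(inner))]


def commit(g1, n, rng):
    """Prover: publish a freshly shuffled copy h of g1, keep sigma private."""
    sigma = list(range(n))
    rng.shuffle(sigma)
    return sigma, relabel(sigma, g1)


def respond(sigma, secret, challenge):
    """Prover: reveal a map from g0 to h (challenge 0) or from g1 to h (challenge 1)."""
    return compose(sigma, secret) if challenge == 0 else sigma


def verify(g0, g1, h, challenge, response):
    """Verifier: check that the response maps the challenged graph onto h."""
    source = g0 if challenge == 0 else g1
    return relabel(response, source) == h


# Toy instance: the prover's secret is an isomorphism from g0 to g1.
n = 4
g0 = {(0, 1), (1, 2), (2, 3)}
secret = [2, 0, 3, 1]
g1 = relabel(secret, g0)

rng = random.Random(42)
accepted = []
for _ in range(10):
    sigma, h = commit(g1, n, rng)
    challenge = rng.randrange(2)  # verifier's coin flip
    accepted.append(verify(g0, g1, h, challenge, respond(sigma, secret, challenge)))
```

An honest prover passes every round, while a prover who does not know the isomorphism can answer at most one of the two challenges for a given commitment, so repeating the round drives the cheating probability down exponentially.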

28 pages, 5256 KiB  
Article
CyberSPL: A Framework for the Verification of Cybersecurity Policy Compliance of System Configurations Using Software Product Lines
by Ángel Jesús Varela-Vaca, Rafael M. Gasca, Rafael Ceballos, María Teresa Gómez-López and Pedro Bernáldez Torres
Appl. Sci. 2019, 9(24), 5364; https://doi.org/10.3390/app9245364 - 08 Dec 2019
Cited by 16 | Viewed by 3390
Abstract
Cybersecurity attacks affect organisations' compliance with their cybersecurity policies. Such weaknesses may be due to the absence of security configurations or the use of default configuration values of software products and systems. The complexity of configuring products and systems is a known challenge in the software industry, since it involves a wide range of parameters to be taken into account. In other contexts, configuration problems are solved using Software Product Lines. For this reason, this article proposes the Cybersecurity Software Product Line (CyberSPL) framework. CyberSPL is based on a methodology to design product lines to verify cybersecurity policies according to the possible configurations. The patterns to configure the systems related to the cybersecurity aspects are grouped by defining various feature models. The automated analysis of these models allows us to diagnose possible problems in the security configurations, reducing or avoiding them. As support for this proposal, a multi-user and multi-platform solution has been implemented, which makes it possible to set up a catalogue of public or private feature models. Moreover, analysis and reasoning mechanisms have been integrated to obtain all the configurations of a model and to detect whether a configuration is valid or not, including the root cause of problems for a given configuration. To validate the proposal, a real scenario is proposed in which a catalogue of four different feature models is presented. In this scenario, the models have been analysed, different configurations have been validated, and several configurations with problems have been diagnosed.

13 pages, 472 KiB  
Article
Malware Detection on Byte Streams of Hangul Word Processor Files
by Young-Seob Jeong, Jiyoung Woo and Ah Reum Kang
Appl. Sci. 2019, 9(23), 5178; https://doi.org/10.3390/app9235178 - 29 Nov 2019
Cited by 3 | Viewed by 3172
Abstract
While the exchange of data files or programs on the Internet grows exponentially, most users are vulnerable to infected files, especially to malicious non-executables. Due to the circumstances between South and North Korea, many malicious actions have recently been found in Hangul Word Processor (HWP) non-executable files because the HWP is widely used in schools, military facilities, and government institutions of South Korea. An HWP file usually has one or more byte streams that are often used for the malicious actions. Based on the assumption that infected byte streams have particular patterns, we design a convolutional neural network (CNN) to grasp such patterns. We conduct experiments on 534 HWP files that we prepared, and demonstrate that the proposed CNN achieves the best performance compared to other machine learning models. As new malicious attacks keep emerging, we will keep collecting such HWP files and investigate better model structures.

20 pages, 1152 KiB  
Article
Bitcoin and Cybersecurity: Temporal Dissection of Blockchain Data to Unveil Changes in Entity Behavioral Patterns
by Francesco Zola, Jan Lukas Bruse, Maria Eguimendia, Mikel Galar and Raul Orduna Urrutia
Appl. Sci. 2019, 9(23), 5003; https://doi.org/10.3390/app9235003 - 20 Nov 2019
Cited by 17 | Viewed by 4512
Abstract
The Bitcoin network is not only vulnerable to cyber-attacks but currently also represents the most frequently used cryptocurrency for concealing illicit activities. Typically, Bitcoin activity is monitored by decreasing anonymity of its entities using machine learning-based techniques, which consider the whole blockchain. This entails two issues: first, it increases the complexity of the analysis, requiring higher efforts and, second, it may hide network micro-dynamics important for detecting short-term changes in entity behavioral patterns. The aim of this paper is to address both issues by performing a “temporal dissection” of the Bitcoin blockchain, i.e., dividing it into smaller temporal batches to achieve entity classification. The idea is that a machine learning model trained on a certain time-interval (batch) should achieve good classification performance when tested on another batch if entity behavioral patterns are similar. We apply cascading machine learning principles (a type of ensemble learning applying stacking techniques), introducing a “k-fold cross-testing” concept across batches of varying size. Results show that the blockchain batch size used for entity classification could be reduced for certain classes (Exchange, Gambling, and eWallet), as classification rates did not vary significantly with batch size, suggesting that behavioral patterns did not change significantly over time. Mixer and Market class detection, however, can be negatively affected. A deeper analysis of Mining Pool behavior showed that models trained on recent data perform better than models trained on older data, suggesting that “typical” Mining Pool behavior may be represented better by recent data. This work provides a first step towards uncovering entity behavioral changes via temporal dissection of blockchain data.

17 pages, 5121 KiB  
Article
Malicious PDF Detection Model against Adversarial Attack Built from Benign PDF Containing JavaScript
by Ah Reum Kang, Young-Seob Jeong, Se Lyeong Kim and Jiyoung Woo
Appl. Sci. 2019, 9(22), 4764; https://doi.org/10.3390/app9224764 - 08 Nov 2019
Cited by 10 | Viewed by 4422
Abstract
Intelligent attacks using document-based malware that exploit vulnerabilities in document viewing software programs or document file structure are increasing rapidly. Many of these attacks use PDF (portable document format) files, in proportion to the format's widespread usage. We provide an in-depth analysis of PDF structure and of the JavaScript content embedded in PDFs. Then, we develop a diverse feature set encompassing structure and metadata features, such as file size, version, encoding method, and keywords, and content features, such as object names, keywords, and readable strings in JavaScript. When features are diverse, it is hard to develop adversarial examples because machine-learning algorithms are robust to small changes. We develop a detection model using black-box type models with the structure and content features to minimize the risk of adversarial attacks. To validate the proposed model, we design an adversarial attack. We collect benign documents containing multiple JavaScript codes as the base for adversarial samples. We build the adversarial samples by injecting malware code into the base samples. The proposed model is evaluated against a large collection of malicious and benign PDFs. We found that random forest, an ensemble of decision trees, exhibits good performance on malware detection and is robust against adversarial samples.

26 pages, 5209 KiB  
Article
A Feature Analysis Based Identifying Scheme Using GBDT for DDoS with Multiple Attack Vectors
by Jian Zhang, Qidi Liang, Rui Jiang and Xi Li
Appl. Sci. 2019, 9(21), 4633; https://doi.org/10.3390/app9214633 - 31 Oct 2019
Cited by 13 | Viewed by 3473
Abstract
In recent years, distributed denial of service (DDoS) attacks have increasingly shown the trend of multiattack vector composites, which has significantly improved the concealment and success rate of DDoS attacks. Therefore, improving the ubiquitous detection capability of DDoS attacks and accurately and quickly identifying DDoS attack traffic play an important role in later attack mitigation. This paper proposes a method to efficiently detect and identify multivector DDoS attacks. The detection algorithm is applicable to known and unknown DDoS attacks. Full article

15 pages, 4044 KiB  
Article
Ontology-Based System for Dynamic Risk Management in Administrative Domains
by Mario Vega-Barbas, Víctor A. Villagrá, Fernando Monje, Raúl Riesco, Xavier Larriva-Novo and Julio Berrocal
Appl. Sci. 2019, 9(21), 4547; https://doi.org/10.3390/app9214547 - 26 Oct 2019
Cited by 8 | Viewed by 5479
Abstract
With the increasing complexity of cyberthreats, it is necessary to have tools to understand the changing context in real-time. This document presents an architecture and a prototype designed to model the risk of administrative domains in real-time, using a country, specifically Spain, as an example. To carry out this task, the assets and the threats detected by various sources of information have been modeled. All this information is stored as knowledge using ontologies, which enables reasoning engines to infer new knowledge that can be used in subsequent reasoning. This modeling and reasoning have been enriched with a dynamic system for managing the trust of the different sources of information and with capabilities for increased reliability through the inclusion of additional threat intelligence information. Full article

21 pages, 2156 KiB  
Article
Insider Threat Detection Based on User Behavior Modeling and Anomaly Detection Algorithms
by Junhong Kim, Minsik Park, Haedong Kim, Suhyoun Cho and Pilsung Kang
Appl. Sci. 2019, 9(19), 4018; https://doi.org/10.3390/app9194018 - 25 Sep 2019
Cited by 43 | Viewed by 11638
Abstract
Insider threats are malicious activities by authorized users, such as theft of intellectual property or security information, fraud, and sabotage. Although the number of insider threats is much lower than that of external network attacks, insider threats can cause extensive damage. As insiders are very familiar with an organization’s system, it is very difficult to detect their malicious behavior. Traditional insider-threat detection methods focus on rule-based approaches built by domain experts, but they are neither flexible nor robust. In this paper, we propose insider-threat detection methods based on user behavior modeling and anomaly detection algorithms. Based on user log data, we constructed three types of datasets: user’s daily activity summary, e-mail contents topic distribution, and user’s weekly e-mail communication history. Then, we applied four anomaly detection algorithms and their combinations to detect malicious activities. Experimental results indicate that the proposed framework can work well for imbalanced datasets in which there are only a few insider threats and where no domain experts’ knowledge is provided. Full article

15 pages, 2098 KiB  
Article
Information Extraction of Cybersecurity Concepts: An LSTM Approach
by Houssem Gasmi, Jannik Laval and Abdelaziz Bouras
Appl. Sci. 2019, 9(19), 3945; https://doi.org/10.3390/app9193945 - 20 Sep 2019
Cited by 33 | Viewed by 4190
Abstract
Extracting cybersecurity entities and the relationships between them from online textual resources such as articles, bulletins, and blogs and converting these resources into more structured and formal representations has important applications in cybersecurity research and is valuable for professional practitioners. Previous works to accomplish this task were mainly based on utilizing feature-based models. Feature-based models are time-consuming and need labor-intensive feature engineering to describe the properties of entities, domain knowledge, entity context, and linguistic characteristics. Therefore, to alleviate the need for feature engineering, we propose the usage of neural network models, specifically the long short-term memory (LSTM) models to accomplish the tasks of Named Entity Recognition (NER) and Relation Extraction (RE). We evaluated the proposed models on two tasks. The first task is performing NER and evaluating the results against the state-of-the-art Conditional Random Fields (CRFs) method. The second task is performing RE using three LSTM models and comparing their results to assess which model is more suitable for the domain of cybersecurity. The proposed models achieved competitive performance with less feature-engineering work. We demonstrate that exploiting neural network models in cybersecurity text mining is effective and practical. Full article

12 pages, 1704 KiB  
Article
Malware Detection Approach Based on Artifacts in Memory Image and Dynamic Analysis
by Rami Sihwail, Khairuddin Omar, Khairul Akram Zainol Ariffin and Sanad Al Afghani
Appl. Sci. 2019, 9(18), 3680; https://doi.org/10.3390/app9183680 - 05 Sep 2019
Cited by 44 | Viewed by 6802
Abstract
The need to detect malware before it harms computers, mobile phones and other electronic devices has caught the attention of researchers and the anti-malware industry for many years. To protect users from malware attacks, anti-virus software products are downloaded on the computer. Anti-virus software mainly uses signature-based techniques to detect malware. However, this technique fails to detect malware that uses packing, encryption or obfuscation techniques. It also fails to detect unseen (new) malware. This paper proposes an integrated malware detection approach that applies memory forensics to extract malicious artifacts from memory and combines them with features extracted during the execution of malware in a dynamic analysis. Pre-modeling techniques were also applied for feature engineering before training and testing the data set on the machine learning models. The experimental results show a significant improvement in both detection accuracy rate and false positive rate, 98.5% and 1.7% respectively, by applying the support vector machine. The results verify that our integrated analysis approach outperforms other analysis methods. In addition, the proposed approach overcomes the limitation of single path file execution in dynamic analysis by adding more relevant memory artifacts that can reveal the real intention of malicious files. Full article

29 pages, 2021 KiB  
Article
Network Intrusion Detection Based on Novel Feature Selection Model and Various Recurrent Neural Networks
by Thi-Thu-Huong Le, Yongsu Kim and Howon Kim
Appl. Sci. 2019, 9(7), 1392; https://doi.org/10.3390/app9071392 - 03 Apr 2019
Cited by 69 | Viewed by 5519
Abstract
The recent increase in hacks and computer network attacks around the world has intensified the need to develop better intrusion detection and prevention systems. The intrusion detection system (IDS) plays a vital role in detecting anomalies and attacks on the network, which have become larger and more pervasive in nature. However, most anomaly-based intrusion detection systems are plagued by high false positive rates. Furthermore, Remote-to-Local (R2L) and User-to-Root (U2R) are two kinds of attack for which advanced IDS methods achieve low prediction accuracy. Therefore, this paper proposes a novel IDS framework to overcome these problems. The proposed framework includes three main parts. The first part builds the SFSDT model, a feature selection model that generates the best feature subset from the original feature set. This model is a hybrid of the Sequence Forward Selection (SFS) algorithm and the Decision Tree (DT) model. The second part builds various IDS models and trains them on the best-selected feature subset. The models are Recurrent Neural Networks (RNNs) of several kinds: traditional RNN, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Two IDS datasets are used in the experiments: NSL-KDD (2010) and ISCX (2012). The final part evaluates the proposed models by comparing them to other IDS models. The experimental results show that the proposed models achieve significantly improved detection accuracy as well as attack type classification. Furthermore, this approach can reduce the computation time, as measured by memory profilers. Full article

Review

26 pages, 1625 KiB  
Review
A Systematic Review of Defensive and Offensive Cybersecurity with Machine Learning
by Imatitikua D. Aiyanyo, Hamman Samuel and Heuiseok Lim
Appl. Sci. 2020, 10(17), 5811; https://doi.org/10.3390/app10175811 - 22 Aug 2020
Cited by 18 | Viewed by 7497
Abstract
This is a systematic review of over one hundred research papers about machine learning methods applied to defensive and offensive cybersecurity. In contrast to previous reviews, which focused on several fragments of research topics in this area, this paper systematically and comprehensively combines domain knowledge into a single review. Ultimately, this paper seeks to provide a base for researchers that wish to delve into the field of machine learning for cybersecurity. Our findings identify the frequently used machine learning methods within supervised, unsupervised, and semi-supervised machine learning, the most useful data sets for evaluating intrusion detection methods within supervised learning, and methods from machine learning that have shown promise in tackling various threats in defensive and offensive cybersecurity. Full article

28 pages, 2771 KiB  
Review
Machine Learning and Deep Learning Methods for Intrusion Detection Systems: A Survey
by Hongyu Liu and Bo Lang
Appl. Sci. 2019, 9(20), 4396; https://doi.org/10.3390/app9204396 - 17 Oct 2019
Cited by 461 | Viewed by 37971
Abstract
Networks play important roles in modern life, and cyber security has become a vital research area. An intrusion detection system (IDS), which is an important cyber security technique, monitors the state of software and hardware running in the network. Despite decades of development, existing IDSs still face challenges in improving the detection accuracy, reducing the false alarm rate and detecting unknown attacks. To solve the above problems, many researchers have focused on developing IDSs that capitalize on machine learning methods. Machine learning methods can automatically discover the essential differences between normal data and abnormal data with high accuracy. In addition, machine learning methods have strong generalizability, so they are also able to detect unknown attacks. Deep learning is a branch of machine learning whose remarkable performance has made it a research hotspot. This survey proposes a taxonomy of IDS that takes data objects as the main dimension to classify and summarize machine learning-based and deep learning-based IDS literature. We believe that this type of taxonomy framework is fit for cyber security researchers. The survey first clarifies the concept and taxonomy of IDSs. Then, the machine learning algorithms frequently used in IDSs, metrics, and benchmark datasets are introduced. Next, combined with the representative literature, we take the proposed taxonomic system as a baseline and explain how to solve key IDS issues with machine learning and deep learning techniques. Finally, challenges and future developments are discussed by reviewing recent representative studies. Full article