Applied Sciences

Research

24 pages, 625 KiB

Open Access Article

Machine Learning-Based Malicious X.509 Certificates’ Detection

by Jiaxin Li, Zhaoxin Zhang and Changyong Guo

Appl. Sci. 2021, 11(5), 2164; https://doi.org/10.3390/app11052164 - 1 Mar 2021

Cited by 8 | Viewed by 5151

Abstract

X.509 certificates play an important role in encrypting data transmitted between both parties under HTTPS. With the popularization of X.509 certificates, more and more criminals leverage certificates to prevent their communications from being exposed by malicious-traffic analysis tools. Phishing sites and malware are good examples. X.509 certificates found in phishing sites or malware are called malicious X.509 certificates. This paper applies different machine learning models, including classical machine learning models, ensemble learning models, and deep learning models, to distinguish between malicious and benign certificates with Verification for Extraction (VFE). The VFE is a system we designed and implemented to obtain a rich set of certificate characteristics. The results show that ensemble learning models are the most stable and efficient, with an average accuracy of 95.9%, which outperforms many previous works. In addition, we obtain an SVM-based detection model with an accuracy of 98.2%, which is the highest accuracy obtained. The outcome indicates that the VFE is capable of capturing the essential characteristics of malicious X.509 certificates.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

20 pages, 2949 KiB

Open Access Article

Designing Trojan Detectors in Neural Networks Using Interactive Simulations

by Peter Bajcsy, Nicholas J. Schaub and Michael Majurski

Appl. Sci. 2021, 11(4), 1865; https://doi.org/10.3390/app11041865 - 20 Feb 2021

Cited by 2 | Viewed by 3544

Abstract

This paper addresses the problem of designing trojan detectors in neural networks (NNs) using interactive simulations. Trojans in NNs are defined as triggers in inputs that cause misclassification of such inputs into a class (or classes) unintended by the design of an NN-based model. The goal of our work is to understand encodings of a variety of trojan types in fully connected layers of neural networks. Our approach is: (1) to simulate nine types of trojan embeddings into dot patterns; (2) to devise measurements of NN states; and (3) to design trojan detectors in NN-based classification models. The interactive simulations are built on top of TensorFlow Playground with in-memory storage of data and NN coefficients. The simulations provide analytical, visualization, and output operations performed on training datasets and NN architectures. The measurements of an NN include: (a) model inefficiency using modified Kullback–Leibler (KL) divergence from uniformly distributed states; and (b) model sensitivity to variables related to data and NNs. Using the KL divergence measurements at each NN layer and for each predicted class label, a trojan detector is devised to discriminate between NN models with and without trojans. To document the robustness of such a trojan detector with respect to NN architectures, dataset perturbations, and trojan types, several properties of the KL divergence measurement are presented.
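
To make the "KL divergence from uniformly distributed states" measurement more concrete, the following minimal Python sketch computes the plain (unmodified) KL divergence between the empirical distribution of observed states and a uniform distribution over all possible states. The abstract does not describe the paper's modification, so the function name `kl_from_uniform`, the string encoding of states, and the base-2 logarithm are illustrative assumptions, not the authors' definition.

```python
import math
from collections import Counter


def kl_from_uniform(observed_states, num_possible_states):
    """Plain KL divergence (in bits) of the empirical state distribution
    from a uniform distribution over `num_possible_states` states.

    Illustrative only: the paper uses a *modified* KL divergence whose
    exact form is not given in the abstract.
    """
    counts = Counter(observed_states)
    total = len(observed_states)
    uniform_prob = 1.0 / num_possible_states
    divergence = 0.0
    for count in counts.values():
        p = count / total
        # States that never occur contribute 0 and are simply skipped.
        divergence += p * math.log2(p / uniform_prob)
    return divergence


# Hypothetical layer with two binary nodes, so four possible states.
evenly_used = ["00", "01", "10", "11"]
collapsed = ["00", "00", "00", "00"]
print(kl_from_uniform(evenly_used, 4))
print(kl_from_uniform(collapsed, 4))
```

Under this plain definition, a layer that uses all of its states equally often has zero divergence, while a layer that collapses onto a single state has the maximum divergence of log2 of the number of possible states.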

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

30 pages, 2267 KiB

Open Access Article

TEDL: A Text Encryption Method Based on Deep Learning

by Peng Wang and Xiang Li

Appl. Sci. 2021, 11(4), 1781; https://doi.org/10.3390/app11041781 - 17 Feb 2021

Cited by 3 | Viewed by 3200

Abstract

Recent years have seen an increasing emphasis on information security, and various encryption methods have been proposed. However, well-known symmetric encryption techniques still rely on the key space to guarantee security and suffer from frequent key updating. To solve these problems, this paper proposes a novel symmetric-key method for text encryption based on deep learning called TEDL, where the secret key includes the hyperparameters of the deep learning model and the core step of encryption is transforming input data into weights trained under those hyperparameters. First, both communicating parties establish a word vector table by training a deep learning model according to specified hyperparameters. Then, a self-updating codebook is constructed on the word vector table with the SHA-256 function and other tricks. When communication starts, encryption and decryption are equivalent to indexing and inverted indexing on the codebook, respectively, thus achieving the transformation between plaintext and ciphertext. Experimental results and related analyses show that TEDL performs well in terms of security, efficiency, and generality, and requires less frequent key redistribution. In particular, as a supplement to current encryption methods, the time-consuming process of constructing a codebook increases the difficulty of brute-force attacks without degrading the efficiency of communications.
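
The idea that encryption and decryption reduce to indexing and inverted indexing on a shared codebook can be sketched in a few lines of Python. This is a toy illustration and not the TEDL construction: the word vectors below are made-up constants standing in for weights that both parties would obtain by training with the secret hyperparameters, the 16-character truncation of the SHA-256 digest is arbitrary, and the self-updating step and the "other tricks" mentioned in the abstract are omitted. The names `build_codebook`, `encrypt`, and `decrypt` are hypothetical.

```python
import hashlib
import struct


def build_codebook(word_vectors):
    """Map each word to a code derived from the SHA-256 hash of its vector."""
    codebook = {}
    for word, vector in word_vectors.items():
        # Serialize the floats deterministically before hashing.
        packed = struct.pack(f"{len(vector)}d", *vector)
        codebook[word] = hashlib.sha256(packed).hexdigest()[:16]
    return codebook


def encrypt(words, codebook):
    """Encryption as indexing: look up the code for each word."""
    return [codebook[word] for word in words]


def decrypt(codes, codebook):
    """Decryption as inverted indexing: look up the word for each code."""
    inverse = {code: word for word, code in codebook.items()}
    return [inverse[code] for code in codes]


# Made-up vectors; in the scheme described above they would come from a
# model trained by both parties under the same secret hyperparameters.
toy_vectors = {
    "meet": [0.12, -0.40, 0.33],
    "at": [0.05, 0.91, -0.27],
    "noon": [-0.66, 0.18, 0.74],
}
codebook = build_codebook(toy_vectors)
ciphertext = encrypt(["meet", "at", "noon"], codebook)
print(ciphertext)
print(decrypt(ciphertext, codebook))
```

Because the codes depend only on the vectors, two parties who train identical models end up with identical codebooks without ever transmitting the codebook itself, which is the property this sketch is meant to show.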

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

21 pages, 595 KiB

Open Access Article (Editor’s Choice)

Intelligent Cyber Attack Detection and Classification for Network-Based Intrusion Detection Systems

by Nuno Oliveira, Isabel Praça, Eva Maia and Orlando Sousa

Appl. Sci. 2021, 11(4), 1674; https://doi.org/10.3390/app11041674 - 13 Feb 2021

Cited by 97 | Viewed by 10604

Abstract

With the latest advances in information and communication technologies, greater amounts of sensitive user and corporate information are shared continuously across the network, making it susceptible to attacks that can compromise data confidentiality, integrity, and availability. Intrusion Detection Systems (IDS) are important security mechanisms that can perform the timely detection of malicious events through the inspection of network traffic or host-based logs. Many machine learning techniques have proven successful at anomaly detection over the years, but only a few have considered the sequential nature of the data. This work proposes a sequential approach and evaluates the performance of a Random Forest (RF), a Multi-Layer Perceptron (MLP), and a Long Short-Term Memory (LSTM) network on the CIDDS-001 dataset. The resulting performance measures are compared with those obtained from a more traditional approach, which only considers individual flow information, in order to determine which methodology best suits the scenario concerned. The experimental outcomes suggest that anomaly detection can be better addressed from a sequential perspective. The LSTM is a highly reliable model for acquiring sequential patterns in network traffic data, achieving an accuracy of 99.94% and an F1-score of 91.66%.
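
A gap between accuracy and F1-score such as the one reported above is typical of imbalanced data, where benign traffic dominates. The short Python sketch below shows how both metrics are computed from a confusion matrix. The counts are invented for illustration and are not taken from the paper or from CIDDS-001.

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all samples that were classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall for the positive class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented counts: 98 attack flows among 100,000 flows in total.
tp, fp, fn, tn = 90, 8, 8, 99_894
print(f"accuracy = {accuracy(tp, fp, fn, tn):.5f}")
print(f"F1-score = {f1_score(tp, fp, fn):.5f}")
```

Because true negatives enter the accuracy but not the F1-score, a handful of errors on the rare attack class barely moves the accuracy while lowering the F1-score noticeably.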

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

15 pages, 922 KiB

Open Access Article

An Attention-Based Graph Neural Network for Spam Bot Detection in Social Networks

by Chensu Zhao, Yang Xin, Xuefeng Li, Hongliang Zhu, Yixian Yang and Yuling Chen

Appl. Sci. 2020, 10(22), 8160; https://doi.org/10.3390/app10228160 - 18 Nov 2020

Cited by 28 | Viewed by 5201

Abstract

With the rapid development of social networks, the malicious behavior of spam bots and other anomalous accounts has become a critical information security problem threatening social network platforms. To reduce this threat, existing research mainly uses feature-based or propagation-based detection, applying machine learning or graph mining algorithms to identify anomalous accounts in social networks. However, as technology develops, spam bots are becoming more advanced, and identifying bots is still an open challenge. This paper proposes a new semi-supervised graph embedding model based on a graph attention network for spam bot detection in social networks. This approach constructs a detection model by aggregating features and neighbor relationships, and learns a complex method for integrating the different neighborhood relationships between nodes so that it can operate on the directed social graph. The new model can identify spam bots by capturing user features and two different relationships among users in social networks. We compare our method with other methods on real-world social network datasets, and the experimental results show that our proposed model achieves a significant and consistent improvement.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

10 pages, 606 KiB

Open Access Article

Detecting Cyber Threat Event from Twitter Using IDCNN and BiLSTM

by Yong Fang, Jian Gao, Zhonglin Liu and Cheng Huang

Appl. Sci. 2020, 10(17), 5922; https://doi.org/10.3390/app10175922 - 26 Aug 2020

Cited by 27 | Viewed by 5448

Abstract

In the context of increasing cyber threats and attacks, monitoring and analyzing network security incidents in a timely and effective way is the key to ensuring network infrastructure security. Twitter is one of the world’s most popular social media sites, and its users post all kinds of messages, from daily life to global news and political strategy. It can promptly aggregate a large number of network security-related events and provide a source of information about cyber threats. In this paper, to detect cyber threat events on Twitter, we present a multi-task learning approach that combines natural language processing with the Iterated Dilated Convolutional Neural Network (IDCNN) and Bidirectional Long Short-Term Memory (BiLSTM) to establish a highly accurate network model. Furthermore, we collect a network threat-related Twitter database from public datasets to verify our model’s performance. The results show that the proposed model works well at detecting cyber threat events from tweets and significantly outperforms several baselines.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

23 pages, 4024 KiB

Open Access Article

Hybrid Malware Classification Method Using Segmentation-Based Fractal Texture Analysis and Deep Convolution Neural Network Features

by Maryam Nisa, Jamal Hussain Shah, Shansa Kanwal, Mudassar Raza, Muhammad Attique Khan, Robertas Damaševičius and Tomas Blažauskas

Appl. Sci. 2020, 10(14), 4966; https://doi.org/10.3390/app10144966 - 19 Jul 2020

Cited by 115 | Viewed by 8459

Abstract

As the number of internet users increases, so does the number of malicious attacks using malware. The detection of malicious code is becoming critical, and the existing approaches need to be improved. Here, we propose a feature fusion method that combines the features extracted from pre-trained AlexNet and Inception-v3 deep neural networks with features obtained using segmentation-based fractal texture analysis (SFTA) of images representing the malware code. In this work, we use two distinct pre-trained models (AlexNet and Inception-v3) for feature extraction. The purpose of extracting deep convolutional neural network (CNN) features from two models is to improve the accuracy of the malware classifier, because each model extracts different features. This technique produces a fusion of features to build a multimodal representation of malicious code that can be used to classify the grayscale images, separating the malware into 25 classes. The features extracted from malware images are then classified using different variants of support vector machine (SVM), k-nearest neighbor (KNN), decision tree (DT), and other classifiers. To improve the classification results, we also adopted data augmentation based on affine image transforms. The presented method is evaluated on the Malimg malware image dataset, achieving an accuracy of 99.3%, which makes it the best among the competing approaches.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

20 pages, 473 KiB

Open Access Article

Cross-Site Scripting Guardian: A Static XSS Detector Based on Data Stream Input-Output Association Mining

by Chenghao Li, Yiding Wang, Changwei Miao and Cheng Huang

Appl. Sci. 2020, 10(14), 4740; https://doi.org/10.3390/app10144740 - 9 Jul 2020

Cited by 22 | Viewed by 6469

Abstract

Web applications are the target of the largest number of cybersecurity attacks, and Cross-Site Scripting (XSS) is the most common attack method. Code auditing is the main way to prevent XSS damage at the source code level. However, manual audits and rule-based audit tools have numerous limitations. In the age of big data, assisting manual auditing through machine learning is a new research field. In this paper, we propose a new way to audit XSS vulnerabilities in PHP source code snippets based on a PHP code parsing tool and a machine learning algorithm. We analyzed the operation sequence of the source code and built a model to acquire the information in the data stream that is most closely related to XSS attacks. The proposed method can significantly improve the recall rate for vulnerable samples. Compared with related audit methods, our method has high reusability and excellent performance. Our classification model achieved an F1 score of 0.92, a recall rate of 0.98 (vulnerable samples), and an area under the curve (AUC) of 0.97 on the test dataset.
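
The reported F1 score and recall also imply a precision, if both figures refer to the same class. That is an assumption, since the abstract labels only the recall as belonging to vulnerable samples. Rearranging F1 = 2PR / (P + R) gives P = F1 · R / (2R − F1), which the Python sketch below evaluates. The function name is illustrative.

```python
def precision_from_f1_and_recall(f1, recall):
    """Solve F1 = 2PR / (P + R) for the precision P."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("no valid precision exists for these values")
    return f1 * recall / denominator


# Values from the abstract, assumed here to describe the same class.
implied_precision = precision_from_f1_and_recall(f1=0.92, recall=0.98)
print(f"implied precision = {implied_precision:.3f}")
```

Under that assumption the implied precision is roughly 0.87, so the high recall would come with a moderate share of false alarms that an auditor still needs to review.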

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

17 pages, 1236 KiB

Open Access Article

Providing Email Privacy by Preventing Webmail from Loading Malicious XSS Payloads

by Yong Fang, Yijia Xu, Peng Jia and Cheng Huang

Appl. Sci. 2020, 10(13), 4425; https://doi.org/10.3390/app10134425 - 27 Jun 2020

Cited by 13 | Viewed by 17401

Abstract

With the development of internet technology, email has become the formal communication method of modern society. Email often contains a large amount of private personal information, possible business agreements, and sensitive attachments, which makes it a good target for hackers. One of the most common attack methods used by hackers is email XSS (Cross-Site Scripting). By exploiting XSS vulnerabilities, hackers can steal identities, log into the victim’s mailbox, and steal content directly. Therefore, this paper proposes an email XSS detection model based on deep learning, which can identify whether an email carries an XSS payload. Firstly, the model extracts the Sender, Receiver, Subject, Content, and Attachment fields from the original email. Secondly, the email XSS corpus is formed after data processing. The Word2Vec algorithm is trained on the corpus to extract features for each email sample. Finally, the model uses the Bidirectional-RNN algorithm and an Attention mechanism to train the email XSS detection model. In the experiment, the AUC (area under the curve) value of the Bidirectional-RNN model reached 0.9979. When the Attention mechanism was added, the accuracy upper limit of the Bidirectional-RNN model was raised to 0.9936, and the loss value was reduced to 0.03.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

29 pages, 3410 KiB

Open Access Article

Methodology for Forensics Data Reconstruction on Mobile Devices with Android Operating System Applying In-System Programming and Combination Firmware

by Claudinei Morin da Silveira, Rafael T. de Sousa Jr, Robson de Oliveira Albuquerque, Georges D. Amvame Nze, Gildásio Antonio de Oliveira Júnior, Ana Lucila Sandoval Orozco and Luis Javier García Villalba

Appl. Sci. 2020, 10(12), 4231; https://doi.org/10.3390/app10124231 - 20 Jun 2020

Cited by 19 | Viewed by 14641

Abstract

This paper proposes a new forensic analysis methodology that combines processes, techniques, and tools for physical and logical data acquisition from mobile devices. The proposed methodology gives an overview of the In-System Programming (ISP) technique used together with Combination Firmware, aligned with specific collection and analysis processes. The experiments carried out show that the proposed methodology is convenient and practical and provides new possibilities for data acquisition on devices that run the Android Operating System with advanced protection mechanisms. The methodology is also feasible on devices that are compatible with Joint Test Action Group (JTAG) techniques and that use Embedded Multimedia Card (eMMC) or Embedded Multi-Chip Package (eMCP) as main memory. The techniques included in the methodology are effective on encrypted devices, on which the JTAG and Chip-Off techniques prove ineffective, especially those with an unauthorized-access protection mechanism enabled, such as a lock screen password, a blocked bootloader, or active Factory Reset Protection (FRP). Studies also demonstrate that data preservation and integrity are maintained, which is critical to a digital forensic process.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

21 pages, 2665 KiB

Open Access Article

Malicious JavaScript Detection Based on Bidirectional LSTM Model

by Xuyan Song, Chen Chen, Baojiang Cui and Junsong Fu

Appl. Sci. 2020, 10(10), 3440; https://doi.org/10.3390/app10103440 - 16 May 2020

Cited by 32 | Viewed by 6064

Abstract

JavaScript is widely used on the Internet because of its powerful features, and almost all websites use it to provide dynamic functions. However, this dynamic nature also carries potential risks. Authors of malicious scripts use JavaScript to launch various attacks, such as Cross-Site Scripting (XSS), Cross-Site Request Forgery (CSRF), and drive-by download attacks. Traditional malicious script detection relies on expert knowledge, but even for experts, this is an error-prone task. To solve this problem, many learning-based methods for malicious JavaScript detection are being explored. In this paper, we propose a novel deep learning-based method for malicious JavaScript detection. To extract semantic information from JavaScript programs, we construct the Program Dependency Graph (PDG) and generate semantic slices, which preserve rich semantic information and are easy to transform into vectors. Then, a malicious JavaScript detection model based on the Bidirectional Long Short-Term Memory (BLSTM) neural network is proposed. Experimental results show that, in comparison with five other methods, our model achieved the best performance, with an accuracy of 97.71% and an F1-score of 98.29%.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

21 pages, 6411 KiB

Open Access Article

ConvProtoNet: Deep Prototype Induction towards Better Class Representation for Few-Shot Malware Classification

by Zhijie Tang, Peng Wang and Junfeng Wang

Appl. Sci. 2020, 10(8), 2847; https://doi.org/10.3390/app10082847 - 20 Apr 2020

Cited by 34 | Viewed by 4772

Abstract

Traditional malware classification relies on known malware types and large, manually labeled datasets, which limits its ability to recognize new malware classes. For unknown malware types or new variants of existing malware with only a few samples per class, common classification methods often fail to work well due to severe overfitting. In this paper, we propose a new neural network structure called ConvProtoNet, which employs few-shot learning to address the problem of scarce malware samples while preventing overfitting. We design a convolutional induction module to replace the insufficient prototype reduction used in most few-shot models and to generate more appropriate class-level malware prototypes for classification. We also adopt a meta-learning scheme to make the classifier robust enough to adapt to unseen malware classes without fine-tuning. Even in extreme conditions where only 5 samples per class are provided, ConvProtoNet still achieves more than 70% accuracy on average and outperforms traditional malware classification methods and existing few-shot models in experiments conducted on several datasets. Additional cross-dataset experiments illustrate that ConvProtoNet learns general, dataset-invariant knowledge of malware, and careful model analysis demonstrates the effectiveness of ConvProtoNet in few-shot malware classification.
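
For context, the simple "prototype reduction" that many few-shot models use, and that ConvProtoNet replaces with a convolutional induction module, can be sketched as averaging the support embeddings of each class and assigning a query to the nearest prototype. The Python sketch below shows only that baseline, not ConvProtoNet itself. The two-dimensional embeddings and the family names are made up for illustration, and the Euclidean distance is an assumed choice.

```python
import math


def mean_prototype(embeddings):
    """Reduce a class's support embeddings to their element-wise mean."""
    dim = len(embeddings[0])
    return [sum(vec[i] for vec in embeddings) / len(embeddings) for i in range(dim)]


def classify(query, support_set):
    """Assign the query to the class whose mean prototype is nearest."""
    prototypes = {label: mean_prototype(vecs) for label, vecs in support_set.items()}
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Made-up 2-D embeddings for two hypothetical malware families.
support = {
    "family_a": [[1.0, 1.0], [1.2, 0.8], [0.8, 1.2]],
    "family_b": [[-1.0, -1.0], [-0.8, -1.2], [-1.2, -0.8]],
}
print(classify([0.9, 1.1], support))
print(classify([-0.7, -0.9], support))
```

A plain mean weights every support sample equally, so one atypical sample can shift the prototype when only a few samples per class are available.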

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

26 pages, 3444 KiB

Open Access Article

Investigation of Dual-Flow Deep Learning Models LSTM-FCN and GRU-FCN Efficiency against Single-Flow CNN Models for the Host-Based Intrusion and Malware Detection Task on Univariate Times Series Data

by Dainius Čeponis and Nikolaj Goranin

Appl. Sci. 2020, 10(7), 2373; https://doi.org/10.3390/app10072373 - 30 Mar 2020

Cited by 17 | Viewed by 7602

Abstract

Intrusion and malware detection tasks on a host level are a critical part of the overall information security infrastructure of a modern enterprise. While classical host-based intrusion detection systems (HIDS) and antivirus (AV) approaches are based on change monitoring of critical files and malware signatures, respectively, some recent research utilizing relatively vanilla deep learning (DL) methods has demonstrated promising anomaly-based detection results that already have practical applicability due to a low false positive rate (FPR). More complex DL methods typically provide better results in natural language processing and image recognition tasks. In this paper, we analyze the applicability of more complex dual-flow DL methods, such as long short-term memory fully convolutional network (LSTM-FCN), gated recurrent unit (GRU)-FCN, and several others, to this task on the attack-caused Windows OS system calls traces dataset (AWSCTD) and compare them with vanilla single-flow convolutional neural network (CNN) models. The results obtained do not demonstrate any advantage of dual-flow models when processing univariate time series data; these models introduce an unnecessary level of complexity and increase training and anomaly detection time, which is crucial in the intrusion containment process. On the other hand, the newly tested AWSCTD-CNN-static (S) single-flow model demonstrated three times better training and testing times while preserving high detection accuracy.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

23 pages, 776 KiB

Open Access Article

DeepDCA: Novel Network-Based Detection of IoT Attacks Using Artificial Immune System

by Sahar Aldhaheri, Daniyal Alghazzawi, Li Cheng, Bander Alzahrani and Abdullah Al-Barakati

Appl. Sci. 2020, 10(6), 1909; https://doi.org/10.3390/app10061909 - 11 Mar 2020

Cited by 86 | Viewed by 7658

Abstract

The Internet of Things (IoT) has recently attained tremendous popularity, although this promising technology leads to a variety of security obstacles. Conventional solutions do not suit the new dilemmas brought by the IoT ecosystem. In contrast, Artificial Immune Systems (AIS) are intelligent and adaptive systems that mimic the human immune system, which holds desirable properties for such a dynamic environment and provides an opportunity to improve IoT security. In this work, we develop a novel hybrid Deep Learning and Dendritic Cell Algorithm (DeepDCA) in the context of an Intrusion Detection System (IDS). The framework adopts the Dendritic Cell Algorithm (DCA) and a Self-Normalizing Neural Network (SNN). The aim of this research is to classify IoT intrusions and minimize false alarm generation, and also to automate and smooth the signal extraction phase, which improves the classification performance. The proposed IDS selects a suitable set of features from the IoT-Bot dataset, performs signal categorization using the SNN, and then uses the DCA for classification. The experimental results show that DeepDCA performed well in detecting IoT attacks, with a high detection rate, over 98.73% accuracy, and a low false-positive rate. We also compared these results with state-of-the-art techniques, which showed that our model performs classification better than SVM, NB, KNN, and MLP. We plan to carry out further experiments to verify the framework using a more challenging dataset, to make further comparisons with other signal extraction approaches, and to address real-time (online) attack detection.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

21 pages, 488 KiB

Open Access Article

Towards a Reliable Comparison and Evaluation of Network Intrusion Detection Systems Based on Machine Learning Approaches

by Roberto Magán-Carrión, Daniel Urda, Ignacio Díaz-Cano and Bernabé Dorronsoro

Appl. Sci. 2020, 10(5), 1775; https://doi.org/10.3390/app10051775 - 4 Mar 2020

Cited by 105 | Viewed by 9544

Abstract

Presently, we are living in a hyper-connected world where millions of heterogeneous devices continuously share information in different application contexts for wellness, improved communications, digital businesses, etc. However, the greater the number of devices and connections, the higher the risk of security threats. To counteract malicious behaviours and preserve essential security services, Network Intrusion Detection Systems (NIDSs) are the most widely used line of defence in communications networks. Nevertheless, there is no standard methodology to evaluate and fairly compare NIDSs. Most proposals omit crucial steps of NIDS validation, which makes their comparison hard or even impossible. This work first presents a comprehensive study of recent NIDSs based on machine learning approaches, concluding that almost none of them follow what the authors of this paper consider mandatory steps for a reliable comparison and evaluation of NIDSs. Second, a structured methodology is proposed and assessed on the UGR’16 dataset to test its suitability for addressing network attack detection problems. The recommended guidelines and steps will definitely help the research community to fairly assess NIDSs, although establishing a definitive framework is not a trivial task and, therefore, some extra effort should still be made to further improve its understandability and usability.

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

16 pages, 1416 KiB

Open AccessArticle

Automated Vulnerability Detection in Source Code Using Minimum Intermediate Representation Learning

by Xin Li, Lu Wang, Yang Xin, Yixian Yang and Yuling Chen

Appl. Sci. 2020, 10(5), 1692; https://doi.org/10.3390/app10051692 - 2 Mar 2020

Cited by 74 | Viewed by 9531

Abstract

Vulnerability is one of the root causes of network intrusion. An effective way to mitigate security threats is to discover and patch vulnerabilities before an attack. Traditional vulnerability detection methods rely on manual participation and incur a high false positive rate. The intelligent vulnerability detection methods suffer from the problems of long-term dependence, out of vocabulary, coarse detection granularity and lack of vulnerable samples. This paper proposes an automated and intelligent vulnerability detection method in source code based on the minimum intermediate representation learning. First, the sample in the form of source code is transformed into a minimum intermediate representation to exclude the irrelevant items and reduce the length of the dependency. Next, the intermediate representation is transformed into a real value vector through pre-training on an extended corpus, and the structure and semantic information are retained. Then, the vector is fed to three concatenated convolutional neural networks to obtain high-level features of vulnerability. Last, a classifier is trained using the learned features. To validate this vulnerability detection method, an experiment was performed. The empirical results confirmed that compared with the traditional methods and the state-of-the-art intelligent methods, our method has a better performance with fine granularity. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

31 pages, 7123 KiB

Open AccessArticle

Systematic Approach to Malware Analysis (SAMA)

by Javier Bermejo Higuera, Carlos Abad Aramburu, Juan-Ramón Bermejo Higuera, Miguel Angel Sicilia Urban and Juan Antonio Sicilia Montalvo

Appl. Sci. 2020, 10(4), 1360; https://doi.org/10.3390/app10041360 - 17 Feb 2020

Cited by 26 | Viewed by 13576

Abstract

Malware threats pose new challenges to analytic and reverse engineering tasks. A systematic approach to that analysis is needed, in an attempt to fully uncover their underlying attack vectors and techniques and find commonalities between them. In this paper, a method of malware analysis is described, together with a report of its application to the case of Flame and Red October. The method has also been used by different analysts to analyze other malware threats like ‘Stuxnet’, ‘Dark Comet’, ‘Poison Ivy’, ‘Locky’, ‘Careto’, and ‘Sofacy Carberp’. The method presented in this work is a systematic and methodological process of analysis, whose main objective is to acquire knowledge and gain a full understanding of a particular malware. Using the proposed method to analyze two well-known malware samples, ‘Flame’ and ‘Red October’, will help to understand the added value of the method. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

15 pages, 1476 KiB

Open AccessArticle

Graph Convolutional Networks for Privacy Metrics in Online Social Networks

by Xuefeng Li, Yang Xin, Chensu Zhao, Yixian Yang and Yuling Chen

Appl. Sci. 2020, 10(4), 1327; https://doi.org/10.3390/app10041327 - 15 Feb 2020

Cited by 14 | Viewed by 3756

Abstract

In recent years, privacy leakage events in large-scale social networks have become increasingly frequent. Traditional methods relying on operators have been unable to effectively curb this problem. Researchers must turn their attention to the privacy protection of users themselves. Privacy metrics are undoubtedly the most effective method. However, social networks have a substantial number of users and a complex network structure and feature set. Previous studies either considered a single aspect or measured multiple aspects separately and then artificially integrated them. The measurement procedures are complex and cannot effectively be integrated. To solve the above problems, we first propose using a deep neural network to measure the privacy status of social network users. Through a graph convolution network, we can easily and efficiently combine the user features and graph structure, determine the hidden relationships between these features, and obtain more accurate privacy scores. Given the restriction of the deep learning framework, which requires a large number of labelled samples, we incorporate a few-shot learning method, which greatly reduces the dependence on labelled data and human intervention. Our method is applicable to online social networks, such as Sina Weibo, Twitter, and Facebook, that can extract profile information, graph structure information of users’ friends, and behavioural characteristics. The experiments show that our model can quickly and accurately obtain privacy scores in a whole network and eliminate traditional tedious numerical calculations and human intervention. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

16 pages, 3696 KiB

Open AccessArticle

Cybersecurity Threats Based on Machine Learning-Based Offensive Technique for Password Authentication

by Kyungroul Lee and Kangbin Yim

Appl. Sci. 2020, 10(4), 1286; https://doi.org/10.3390/app10041286 - 14 Feb 2020

Cited by 16 | Viewed by 5564

Abstract

Due to the emergence of the online society, password authentication, a representative user authentication method, has become a key topic. However, various attack techniques have emerged to steal passwords input from the keyboard; hence, keyboard data is not secure. To detect and prevent such attacks, a keyboard data protection technique using random keyboard data generation has been presented. This technique protects keyboard data by generating dummy keyboard data while the attacker obtains the keyboard data. In this study, we demonstrate the feasibility of keyboard data exposure under the keyboard data protection technique. To prove the proposed attack technique, we gathered all the dummy keyboard data generated by the defense tool and the real keyboard data input by the user, and evaluated the cybersecurity threat to keyboard data based on the machine learning-based offensive technique. We verified that an adversary obtains the keyboard data with 96.2% accuracy even when the defense technique intended to prevent keyboard data exposure is used. Namely, the method proposed in this study clearly differentiates the keyboard data input by the user from dummy keyboard data. Therefore, the contributions of this paper are that we derived and verified a new security threat and a new vulnerability of password authentication. Furthermore, the new cybersecurity threat derived from this study will benefit the security assessment of password authentication and of all types of authentication technology and application services that take input from the keyboard. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

14 pages, 931 KiB

Open AccessArticle

Collecting Vulnerable Source Code from Open-Source Repositories for Dataset Generation

by Razvan Raducu, Gonzalo Esteban, Francisco J. Rodríguez Lera and Camino Fernández

Appl. Sci. 2020, 10(4), 1270; https://doi.org/10.3390/app10041270 - 13 Feb 2020

Cited by 11 | Viewed by 6332

Abstract

Different Machine Learning techniques to detect software vulnerabilities have emerged in scientific and industrial scenarios. Different actors in these scenarios aim to develop algorithms for predicting security threats without requiring human intervention. However, these algorithms require data-driven engines based on the processing of huge amounts of data, known as datasets. This paper introduces the SonarCloud Vulnerable Code Prospector for C (SVCP4C). This tool aims to collect vulnerable source code from open source repositories linked to SonarCloud, an online tool that performs static analysis and tags the potentially vulnerable code. The tool provides a set of tagged files suitable for extracting features and creating training datasets for Machine Learning algorithms. This study presents a descriptive analysis of these files and overviews current status of C vulnerabilities, specifically buffer overflow, in the reviewed public repositories. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

19 pages, 2318 KiB

Open AccessArticle

A New Method of Fuzzy Support Vector Machine Algorithm for Intrusion Detection

by Wei Liu, LinLin Ci and LiPing Liu

Appl. Sci. 2020, 10(3), 1065; https://doi.org/10.3390/app10031065 - 5 Feb 2020

Cited by 26 | Viewed by 5486

Abstract

Since SVM is sensitive to noise and outliers in system call sequence data, a new fuzzy support vector machine algorithm based on SVDD is presented in this paper. In our algorithm, the noise and outliers are identified by a hypersphere with minimum volume that contains the maximum number of samples. The definition of fuzzy membership considers not only the relation between a sample and the hyperplane, but also the relation between samples. For each sample inside the hypersphere, the fuzzy membership function is a linear function of the distance between the sample and the hyperplane. The greater the distance, the greater the weight coefficient. For each sample outside the hypersphere, the membership function is an exponential function of the distance between the sample and the hyperplane. The greater the distance, the smaller the weight coefficient. Compared with the traditional fuzzy membership definition based on the relation between a sample and its cluster center, our method effectively distinguishes the noise or outliers from support vectors and assigns them appropriate weight coefficients even though they are distributed on the boundary between the positive and the negative classes. The experiments show that the fuzzy support vector machine proposed in this paper is more robust than the support vector machine and fuzzy support vector machines based on the distance between a sample and its cluster center. Full article
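
The two-branch membership rule described in this abstract can be illustrated with a minimal Python sketch. The abstract gives only the shape of the two functions (linear and increasing inside the hypersphere, exponential and decreasing outside it), so the function name `fuzzy_membership` and the parameters `max_dist`, `floor`, and `decay` are hypothetical choices made here for illustration, not the authors' formulas.

```python
import math


def fuzzy_membership(dist_to_hyperplane, inside_hypersphere,
                     max_dist, floor=0.1, decay=1.0):
    """Illustrative weight for one training sample (assumed form, not the paper's).

    dist_to_hyperplane: non-negative distance from the sample to the hyperplane.
    inside_hypersphere: True if the sample lies inside the SVDD hypersphere.
    max_dist: assumed normalising distance for the linear branch.
    floor, decay: assumed constants controlling the weight range and fall-off.
    """
    if dist_to_hyperplane < 0 or max_dist <= 0:
        raise ValueError("distances must be non-negative and max_dist positive")
    if inside_hypersphere:
        # Linear branch: the weight grows with the distance, capped at 1.0.
        ratio = min(dist_to_hyperplane / max_dist, 1.0)
        return floor + (1.0 - floor) * ratio
    # Exponential branch: the weight shrinks as the distance grows.
    return floor * math.exp(-decay * dist_to_hyperplane)
```

With these assumed constants, samples outside the hypersphere never receive a weight above `floor`, which reflects the idea that suspected noise and outliers contribute less to training than samples inside the hypersphere.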

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

18 pages, 1497 KiB

Open AccessArticle

A Heterogeneous Ensemble Learning Framework for Spam Detection in Social Networks with Imbalanced Data

by Chensu Zhao, Yang Xin, Xuefeng Li, Yixian Yang and Yuling Chen

Appl. Sci. 2020, 10(3), 936; https://doi.org/10.3390/app10030936 - 31 Jan 2020

Cited by 65 | Viewed by 6748

Abstract

The popularity of social networks provides people with many conveniences, but their rapid growth has also attracted many attackers. In recent years, the malicious behavior of social network spammers has seriously threatened the information security of ordinary users. To reduce this threat, many researchers have mined the behavior characteristics of spammers and have obtained good results by applying machine learning algorithms to identify spammers in social networks. However, most of these studies overlook class imbalance situations that exist in real world data. In this paper, we propose a heterogeneous stacking-based ensemble learning framework to ameliorate the impact of class imbalance on spam detection in social networks. The proposed framework consists of two main components, a base module and a combining module. In the base module, we adopt six different base classifiers and utilize this classifier diversity to construct new ensemble input members. In the combination module, we introduce cost sensitive learning into deep neural network training. By setting different costs for misclassification and dynamically adjusting the weights of the prediction results of the base classifiers, we can integrate the input members and aggregate the classification results. The experimental results show that our framework effectively improves the spam detection rate on imbalanced datasets. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

19 pages, 849 KiB

Open AccessArticle

Synthetic Minority Oversampling Technique for Optimizing Classification Tasks in Botnet and Intrusion-Detection-System Datasets

by David Gonzalez-Cuautle, Aldo Hernandez-Suarez, Gabriel Sanchez-Perez, Linda Karina Toscano-Medina, Jose Portillo-Portillo, Jesus Olivares-Mercado, Hector Manuel Perez-Meana and Ana Lucila Sandoval-Orozco

Appl. Sci. 2020, 10(3), 794; https://doi.org/10.3390/app10030794 - 22 Jan 2020

Cited by 71 | Viewed by 6778

Abstract

Presently, security is a hot research topic due to the impact in daily information infrastructure. Machine-learning solutions have been improving classical detection practices, but detection tasks employ irregular amounts of data since the number of instances that represent one or several malicious samples can significantly vary. In highly unbalanced data, classification models regularly have high precision with respect to the majority class, while minority classes are considered noise due to the lack of information that they provide. Well-known datasets used for malware-based analyses like botnet attacks and Intrusion Detection Systems (IDS) mainly comprise logs, records, or network-traffic captures that do not provide an ideal source of evidence as a result of obtaining raw data. As an example, the numbers of abnormal and constant connections generated by either botnets or intruders within a network are considerably smaller than those from benign applications. In most cases, inadequate dataset design may lead to the downgrade of a learning algorithm, resulting in overfitting and poor classification rates. To address these problems, we propose a resampling method, the Synthetic Minority Oversampling Technique (SMOTE) with a grid-search algorithm optimization procedure. This work demonstrates classification-result improvements for botnet and IDS datasets by merging synthetically generated balanced data and tuning different supervised-learning algorithms. Full article
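
As a rough illustration of the oversampling step named in this abstract, the sketch below generates synthetic minority samples by interpolating between a minority point and one of its nearest minority neighbours, which is the core idea of SMOTE. It is a standard-library-only sketch, not the authors' pipeline: the function name `smote_sketch` and its parameters are hypothetical, and the grid-search tuning mentioned in the abstract is not shown.

```python
import random


def smote_sketch(minority, n_synthetic, k=2, seed=0):
    """Return n_synthetic interpolated points built from minority-class samples.

    minority: list of equal-length numeric tuples (minority-class feature vectors).
    k: number of nearest minority neighbours considered for each base sample.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by squared Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic
```

Because every synthetic point lies on a segment between two existing minority samples, the minority class is enlarged without duplicating records exactly.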

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

20 pages, 12601 KiB

Open AccessArticle

Improving Incident Response in Big Data Ecosystems by Using Blockchain Technologies

by Julio Moreno, Manuel A. Serrano, Eduardo B. Fernandez and Eduardo Fernández-Medina

Appl. Sci. 2020, 10(2), 724; https://doi.org/10.3390/app10020724 - 20 Jan 2020

Cited by 15 | Viewed by 5951

Abstract

Big data ecosystems are increasingly important for the daily activities of any type of company. They are decisive elements in the organization, so any malfunction of this environment can have a great impact on the normal functioning of the company; security is therefore a crucial aspect of this type of ecosystem. When approaching security in big data as an issue, it must be considered not only during the creation and implementation of the big data ecosystem, but also throughout its entire lifecycle, including operation, and especially when managing and responding to incidents that occur. To this end, this paper proposes an incident response process supported by a private blockchain network that allows the recording of the different events and incidents that occur in the big data ecosystem. The use of blockchain enables the security of the stored data to be improved, increasing its immutability and traceability. In addition, the stored records can help manage incidents and anticipate them, thereby minimizing the costs of investigating their causes; that facilitates forensic readiness. This proposal integrates with previous research work, seeking to improve the security of big data by creating a process of secure analysis, design, and implementation, supported by a security reference architecture that serves as a guide in defining the different elements of this type of ecosystem. Moreover, this paper presents a case study in which the proposal is being implemented by using big data and blockchain technologies, such as Apache Spark or Hyperledger Fabric. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

14 pages, 314 KiB

Open AccessArticle

A Zero-Knowledge Proof System with Algebraic Geometry Techniques

by Edgar González Fernández, Guillermo Morales-Luna and Feliu Sagols

Appl. Sci. 2020, 10(2), 465; https://doi.org/10.3390/app10020465 - 8 Jan 2020

Cited by 6 | Viewed by 4691

Abstract

Current requirements for securing data exchange over the internet against security breaches have to consider new cryptographic attacks. The most recent advances in cryptanalysis are boosted by quantum computers, which are able to break common cryptographic primitives. This makes evident the need for developing further communication protocols to secure sensitive data. Zero-knowledge proof systems have been around for a while and have been considered for providing authentication and identification services, but only recently has their popularity risen due to novel applications in blockchain technology, Internet of Things, and cloud storage, among others. A new zero-knowledge proof system is presented, which bases its security on two main problems, known to be resistant, up to now, against quantum attacks: the graph isomorphism problem and the isomorphism of polynomials problem. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

28 pages, 5256 KiB

Open AccessArticle

CyberSPL: A Framework for the Verification of Cybersecurity Policy Compliance of System Configurations Using Software Product Lines

by Ángel Jesús Varela-Vaca, Rafael M. Gasca, Rafael Ceballos, María Teresa Gómez-López and Pedro Bernáldez Torres

Appl. Sci. 2019, 9(24), 5364; https://doi.org/10.3390/app9245364 - 8 Dec 2019

Cited by 26 | Viewed by 4509

Abstract

Cybersecurity attacks affect the compliance of cybersecurity policies of the organisations. Such disadvantages may be due to the absence of security configurations or the use of default configuration values of software products and systems. The complexity in the configuration of products and systems is a known challenge in the software industry since it includes a wide range of parameters to be taken into account. In other contexts, the configuration problems are solved using Software Product Lines. This is the reason why in this article the framework Cybersecurity Software Product Line (CyberSPL) is proposed. CyberSPL is based on a methodology to design product lines to verify cybersecurity policies according to the possible configurations. The patterns to configure the systems related to the cybersecurity aspects are grouped by defining various feature models. The automated analysis of these models allows us to diagnose possible problems in the security configurations, reducing or avoiding them. As support for this proposal, a multi-user and multi-platform solution has been implemented, enabling setting a catalogue of public or private feature models. Moreover, analysis and reasoning mechanisms have been integrated to obtain all the configurations of a model, to detect if a configuration is valid or not, including the root cause of problems for a given configuration. For validating the proposal, a real scenario is proposed where a catalogue of four different feature models is presented. In this scenario, the models have been analysed, different configurations have been validated, and several configurations with problems have been diagnosed. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

13 pages, 472 KiB

Open AccessArticle

Malware Detection on Byte Streams of Hangul Word Processor Files

by Young-Seob Jeong, Jiyoung Woo and Ah Reum Kang

Appl. Sci. 2019, 9(23), 5178; https://doi.org/10.3390/app9235178 - 29 Nov 2019

Cited by 5 | Viewed by 4424

Abstract

While the exchange of data files or programs on the Internet grows exponentially, most users are vulnerable to infected files, especially to malicious non-executables. Due to the circumstances between South and North Korea, many malicious actions have recently been found in Hangul Word Processor (HWP) non-executable files because the HWP is widely used in schools, military facilities, and government institutions of South Korea. The HWP file usually has one or more byte streams that are often used for the malicious actions. Based on an assumption that infected byte streams have particular patterns, we design a convolutional neural network (CNN) to grasp such patterns. We conduct experiments on our prepared 534 HWP files, and demonstrate that the proposed CNN achieves the best performance compared to other machine learning models. As new malicious attacks keep emerging, we will keep collecting such HWP files and investigate better model structures. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

20 pages, 1152 KiB

Open AccessArticle

Bitcoin and Cybersecurity: Temporal Dissection of Blockchain Data to Unveil Changes in Entity Behavioral Patterns

by Francesco Zola, Jan Lukas Bruse, Maria Eguimendia, Mikel Galar and Raul Orduna Urrutia

Appl. Sci. 2019, 9(23), 5003; https://doi.org/10.3390/app9235003 - 20 Nov 2019

Cited by 21 | Viewed by 6009

Abstract

The Bitcoin network not only is vulnerable to cyber-attacks but currently represents the most frequently used cryptocurrency for concealing illicit activities. Typically, Bitcoin activity is monitored by decreasing anonymity of its entities using machine learning-based techniques, which consider the whole blockchain. This entails two issues: first, it increases the complexity of the analysis requiring higher efforts and, second, it may hide network micro-dynamics important for detecting short-term changes in entity behavioral patterns. The aim of this paper is to address both issues by performing a “temporal dissection” of the Bitcoin blockchain, i.e., dividing it into smaller temporal batches to achieve entity classification. The idea is that a machine learning model trained on a certain time-interval (batch) should achieve good classification performance when tested on another batch if entity behavioral patterns are similar. We apply cascading machine learning principles—a type of ensemble learning applying stacking techniques—introducing a “k-fold cross-testing” concept across batches of varying size. Results show that blockchain batch size used for entity classification could be reduced for certain classes (Exchange, Gambling, and eWallet) as classification rates did not vary significantly with batch size; suggesting that behavioral patterns did not change significantly over time. Mixer and Market class detection, however, can be negatively affected. A deeper analysis of Mining Pool behavior showed that models trained on recent data perform better than models trained on older data, suggesting that “typical” Mining Pool behavior may be represented better by recent data. This work provides a first step towards uncovering entity behavioral changes via temporal dissection of blockchain data. Full article
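
The batch-wise evaluation idea in this abstract can be sketched in a few lines of Python. The abstract does not spell out the exact "k-fold cross-testing" protocol, so the sketch assumes the simplest reading: split time-stamped records into contiguous batches, then train on one batch and test on each of the others. The names `temporal_batches` and `cross_testing_pairs` and the `(timestamp, payload)` record layout are hypothetical.

```python
def temporal_batches(records, n_batches):
    """Split (timestamp, payload) records into contiguous, time-ordered batches."""
    if n_batches < 1 or n_batches > len(records):
        raise ValueError("n_batches must be between 1 and the number of records")
    ordered = sorted(records, key=lambda r: r[0])
    size, extra = divmod(len(ordered), n_batches)
    batches, start = [], 0
    for b in range(n_batches):
        # The first `extra` batches take one additional record each.
        end = start + size + (1 if b < extra else 0)
        batches.append(ordered[start:end])
        start = end
    return batches


def cross_testing_pairs(n_batches):
    """List (train_index, test_index) pairs: train on one batch, test on another."""
    return [(i, j) for i in range(n_batches) for j in range(n_batches) if i != j]
```

Comparing the score of each pair against the time gap between its two batches is one way to see whether entity behaviour drifts over time, which is the question the abstract raises for classes such as Mining Pool.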

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

17 pages, 5121 KiB

Open AccessArticle

Malicious PDF Detection Model against Adversarial Attack Built from Benign PDF Containing JavaScript

by Ah Reum Kang, Young-Seob Jeong, Se Lyeong Kim and Jiyoung Woo

Appl. Sci. 2019, 9(22), 4764; https://doi.org/10.3390/app9224764 - 8 Nov 2019

Cited by 15 | Viewed by 5831

Abstract

Intelligent attacks using document-based malware that exploit vulnerabilities in document viewing software programs or document file structure are increasing rapidly. PDF (portable document format) files are frequently used in such attacks, in proportion to the format's widespread usage. We provide an in-depth analysis of PDF structure and of JavaScript content embedded in PDFs. Then, we develop a diverse feature set encompassing structure and metadata features such as file size, version, encoding method and keywords, and content features such as object names, keywords, and readable strings in JavaScript. When features are diverse, it is hard to develop adversarial examples because machine-learning algorithms are then robust to small changes. We develop a detection model using black-box type models with the structure and content features to minimize the risk of adversarial attacks. To validate the proposed model, we design an adversarial attack. We collect benign documents containing multiple JavaScript codes as the base for adversarial samples. We build the adversarial samples by injecting malware code into the base samples. The proposed model is evaluated against a large collection of malicious and benign PDFs. We found that random forest, an ensemble algorithm of decision trees, exhibits good performance on malware detection and is robust against adversarial samples. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

26 pages, 5209 KiB

Open AccessArticle

A Feature Analysis Based Identifying Scheme Using GBDT for DDoS with Multiple Attack Vectors

by Jian Zhang, Qidi Liang, Rui Jiang and Xi Li

Appl. Sci. 2019, 9(21), 4633; https://doi.org/10.3390/app9214633 - 31 Oct 2019

Cited by 15 | Viewed by 4451

Abstract

In recent years, distributed denial of service (DDoS) attacks have increasingly shown the trend of multiattack vector composites, which has significantly improved the concealment and success rate of DDoS attacks. Therefore, improving the ubiquitous detection capability of DDoS attacks and accurately and quickly identifying DDoS attack traffic play an important role in later attack mitigation. This paper proposes a method to efficiently detect and identify multivector DDoS attacks. The detection algorithm is applicable to known and unknown DDoS attacks. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

15 pages, 4044 KiB

Open AccessArticle

Ontology-Based System for Dynamic Risk Management in Administrative Domains

by Mario Vega-Barbas, Víctor A. Villagrá, Fernando Monje, Raúl Riesco, Xavier Larriva-Novo and Julio Berrocal

Appl. Sci. 2019, 9(21), 4547; https://doi.org/10.3390/app9214547 - 26 Oct 2019

Cited by 10 | Viewed by 6946

Abstract

With the increasing complexity of cyberthreats, tools are needed to understand the changing context in real time. This paper presents an architecture and a prototype designed to model the risk of administrative domains in real time, using a country, specifically Spain, as the example case. To carry out this task, the assets and the threats detected by various information sources have been modeled. All this information is stored as knowledge using ontologies, which enables reasoning engines to infer new knowledge that can be used in subsequent reasoning. This modeling and reasoning have been enriched with a dynamic system for managing the trust of the different information sources, and with capabilities for increased reliability through the inclusion of additional threat intelligence information. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

21 pages, 2156 KiB

Open AccessArticle

Insider Threat Detection Based on User Behavior Modeling and Anomaly Detection Algorithms

by Junhong Kim, Minsik Park, Haedong Kim, Suhyoun Cho and Pilsung Kang

Appl. Sci. 2019, 9(19), 4018; https://doi.org/10.3390/app9194018 - 25 Sep 2019

Cited by 78 | Viewed by 16187

Abstract

Insider threats are malicious activities by authorized users, such as theft of intellectual property or security information, fraud, and sabotage. Although insider threats are far less numerous than external network attacks, they can cause extensive damage. As insiders are very familiar with an organization’s system, it is very difficult to detect their malicious behavior. Traditional insider-threat detection methods focus on rule-based approaches built by domain experts, but they are neither flexible nor robust. In this paper, we propose insider-threat detection methods based on user behavior modeling and anomaly detection algorithms. Based on user log data, we constructed three types of datasets: the user’s daily activity summary, the topic distribution of e-mail contents, and the user’s weekly e-mail communication history. Then, we applied four anomaly detection algorithms and their combinations to detect malicious activities. Experimental results indicate that the proposed framework can work well for imbalanced datasets in which there are only a few insider threats and where no domain experts’ knowledge is provided. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

15 pages, 2098 KiB

Open AccessArticle

Information Extraction of Cybersecurity Concepts: An LSTM Approach

by Houssem Gasmi, Jannik Laval and Abdelaziz Bouras

Appl. Sci. 2019, 9(19), 3945; https://doi.org/10.3390/app9193945 - 20 Sep 2019

Cited by 45 | Viewed by 5690

Abstract

Extracting cybersecurity entities and the relationships between them from online textual resources such as articles, bulletins, and blogs and converting these resources into more structured and formal representations has important applications in cybersecurity research and is valuable for professional practitioners. Previous works to accomplish this task were mainly based on utilizing feature-based models. Feature-based models are time-consuming and need labor-intensive feature engineering to describe the properties of entities, domain knowledge, entity context, and linguistic characteristics. Therefore, to alleviate the need for feature engineering, we propose the usage of neural network models, specifically the long short-term memory (LSTM) models to accomplish the tasks of Named Entity Recognition (NER) and Relation Extraction (RE). We evaluated the proposed models on two tasks. The first task is performing NER and evaluating the results against the state-of-the-art Conditional Random Fields (CRFs) method. The second task is performing RE using three LSTM models and comparing their results to assess which model is more suitable for the domain of cybersecurity. The proposed models achieved competitive performance with less feature-engineering work. We demonstrate that exploiting neural network models in cybersecurity text mining is effective and practical. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

12 pages, 1704 KiB

Open AccessArticle

Malware Detection Approach Based on Artifacts in Memory Image and Dynamic Analysis

by Rami Sihwail, Khairuddin Omar, Khairul Akram Zainol Ariffin and Sanad Al Afghani

Appl. Sci. 2019, 9(18), 3680; https://doi.org/10.3390/app9183680 - 5 Sep 2019

Cited by 66 | Viewed by 8639

Abstract

The need to detect malware before it harms computers, mobile phones and other electronic devices has caught the attention of researchers and the anti-malware industry for many years. To protect users from malware attacks, anti-virus software products are installed on the computer. Anti-virus software mainly uses signature-based techniques to detect malware. However, these techniques fail to detect malware that uses packing, encryption or obfuscation. They also fail to detect previously unseen (new) malware. This paper proposes an integrated malware detection approach that applies memory forensics to extract malicious artifacts from memory and combines them with features extracted during the execution of malware in a dynamic analysis. Pre-modeling techniques were also applied for feature engineering before training and testing the data set on the machine learning models. The experimental results show a significant improvement in both detection accuracy and false positive rate, 98.5% and 1.7% respectively, when applying the support vector machine. The results verify that our integrated analysis approach outperforms other analysis methods. In addition, the proposed approach overcomes the limitation of single path file execution in dynamic analysis by adding more relevant memory artifacts that can reveal the real intention of malicious files. Full article
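The two metrics quoted in this abstract, detection accuracy and false positive rate, are both computed from a confusion matrix. The following is a minimal sketch of that arithmetic only. The function name and the counts are hypothetical, since the abstract does not report the underlying counts, and nothing here reproduces the authors' evaluation code.

```python
from typing import Tuple


def accuracy_and_fpr(tp: int, tn: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (accuracy, false positive rate) from confusion-matrix counts.

    tp: malicious samples flagged as malicious
    tn: benign samples passed as benign
    fp: benign samples wrongly flagged as malicious
    fn: malicious samples wrongly passed as benign
    """
    total = tp + tn + fp + fn
    negatives = fp + tn  # all samples that are actually benign
    if total == 0 or negatives == 0:
        raise ValueError("need at least one sample and one benign sample")
    accuracy = (tp + tn) / total
    false_positive_rate = fp / negatives
    return accuracy, false_positive_rate


# Hypothetical counts, chosen only to illustrate the calculation.
acc, fpr = accuracy_and_fpr(tp=90, tn=95, fp=5, fn=10)
print(f"accuracy={acc:.3f}, false positive rate={fpr:.3f}")
```

Note that the false positive rate is normalized by the benign samples only, so on an imbalanced data set it can move independently of accuracy.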

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

29 pages, 2021 KiB

Open AccessArticle

Network Intrusion Detection Based on Novel Feature Selection Model and Various Recurrent Neural Networks

by Thi-Thu-Huong Le, Yongsu Kim and Howon Kim

Appl. Sci. 2019, 9(7), 1392; https://doi.org/10.3390/app9071392 - 3 Apr 2019

Cited by 94 | Viewed by 7087

Abstract

The recent increase in hacks and computer network attacks around the world has intensified the need to develop better intrusion detection and prevention systems. The intrusion detection system (IDS) plays a vital role in detecting anomalies and attacks on the network, which have become larger and more pervasive in nature. However, most anomaly-based intrusion detection systems are plagued by high false positive rates. Furthermore, Remote-to-Local (R2L) and User-to-Root (U2R) are two kinds of attack for which advanced IDS methods achieve low prediction accuracy. Therefore, this paper proposes a novel IDS framework to overcome these problems. The proposed framework consists of three main parts. The first part builds the SFSDT model, a feature selection model that generates the best feature subset from the original feature set. This model is a hybrid of the Sequence Forward Selection (SFS) algorithm and the Decision Tree (DT) model. The second part builds various IDS models and trains them on the best-selected feature subset. The Recurrent Neural Networks (RNN) used are the traditional RNN, Long Short-Term Memory (LSTM), and Gated Recurrent Unit (GRU). Two IDS datasets are used in the experiments: NSL-KDD (2010) and ISCX (2012). The final part evaluates the proposed models by comparing them to other IDS models. The experimental results show that the proposed models achieve significantly improved detection accuracy as well as attack type classification. Furthermore, this approach can reduce the computation time, according to memory profiler measurements. Full article
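The feature selection step described in this abstract pairs a forward selection search with a decision tree. As a rough illustration of the greedy search alone, here is a minimal sketch of forward selection as a wrapper around a scoring function. It is not the authors' SFSDT implementation: the scorer `toy_score`, the gain table, and the feature names are hypothetical stand-ins for "validation accuracy of a decision tree trained on this subset".

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> List[str]:
    """Greedily grow a feature subset, one feature at a time."""
    selected: List[str] = []
    best_score = float("-inf")
    while len(selected) < max_features:
        candidates = [f for f in features if f not in selected]
        if not candidates:
            break
        # Score every one-feature extension of the current subset.
        scored = [(score(frozenset(selected + [f])), f) for f in candidates]
        top_score, top_feature = max(scored, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the score, so stop early
        selected.append(top_feature)
        best_score = top_score
    return selected


# Hypothetical per-feature gains; a real scorer would train and
# validate a classifier on each candidate subset instead.
TOY_GAIN = {"duration": 0.30, "src_bytes": 0.25, "flag": 0.10, "noise": -0.05}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(TOY_GAIN[name] for name in subset)


chosen = sequential_forward_selection(list(TOY_GAIN), toy_score, max_features=4)
print(chosen)
```

With this toy scorer the search stops before reaching `max_features`, because adding the last remaining feature would lower the score. In a wrapper setup, the cost is dominated by the number of times the scorer retrains the model.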

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

Review

26 pages, 1625 KiB

Open AccessReview

A Systematic Review of Defensive and Offensive Cybersecurity with Machine Learning

by Imatitikua D. Aiyanyo, Hamman Samuel and Heuiseok Lim

Appl. Sci. 2020, 10(17), 5811; https://doi.org/10.3390/app10175811 - 22 Aug 2020

Cited by 36 | Viewed by 13310

Abstract

This is a systematic review of over one hundred research papers about machine learning methods applied to defensive and offensive cybersecurity. In contrast to previous reviews, which focused on several fragments of research topics in this area, this paper systematically and comprehensively combines domain knowledge into a single review. Ultimately, this paper seeks to provide a base for researchers that wish to delve into the field of machine learning for cybersecurity. Our findings identify the frequently used machine learning methods within supervised, unsupervised, and semi-supervised machine learning, the most useful data sets for evaluating intrusion detection methods within supervised learning, and methods from machine learning that have shown promise in tackling various threats in defensive and offensive cybersecurity. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)

28 pages, 2771 KiB

Open AccessEditor’s ChoiceReview

Machine Learning and Deep Learning Methods for Intrusion Detection Systems: A Survey

by Hongyu Liu and Bo Lang

Appl. Sci. 2019, 9(20), 4396; https://doi.org/10.3390/app9204396 - 17 Oct 2019

Cited by 805 | Viewed by 57988

Abstract

Networks play important roles in modern life, and cyber security has become a vital research area. An intrusion detection system (IDS), which is an important cyber security technique, monitors the state of software and hardware running in the network. Despite decades of development, existing IDSs still face challenges in improving the detection accuracy, reducing the false alarm rate and detecting unknown attacks. To solve the above problems, many researchers have focused on developing IDSs that capitalize on machine learning methods. Machine learning methods can automatically discover the essential differences between normal data and abnormal data with high accuracy. In addition, machine learning methods have strong generalizability, so they are also able to detect unknown attacks. Deep learning, a branch of machine learning, performs remarkably well and has become a research hotspot. This survey proposes a taxonomy of IDS that takes data objects as the main dimension to classify and summarize machine learning-based and deep learning-based IDS literature. We believe that this type of taxonomy framework is fit for cyber security researchers. The survey first clarifies the concept and taxonomy of IDSs. Then, the machine learning algorithms frequently used in IDSs, metrics, and benchmark datasets are introduced. Next, combined with the representative literature, we take the proposed taxonomic system as a baseline and explain how to solve key IDS issues with machine learning and deep learning techniques. Finally, challenges and future developments are discussed by reviewing recent representative studies. Full article

(This article belongs to the Special Issue Machine Learning for Cybersecurity Threats, Challenges, and Opportunities)
