Detection of Hacker Intention Using Deep Packet Inspection

Foreman, Justin; Waters, Willie L.; Kamhoua, Charles A.; Hemida, Ahmed H. Anwar; Acosta, Jaime C.; Dike, Blessing C.

doi:10.3390/jcp4040037


by Justin Foreman ¹,*, Willie L. Waters ¹, Charles A. Kamhoua ², Ahmed H. Anwar Hemida ², Jaime C. Acosta ² and Blessing C. Dike ¹

¹ Department of Electrical and Computer Engineering, Prairie View A&M University, Prairie View, TX 77446, USA
² DEVCOM Army Research Laboratory, Adelphi, MD 20783, USA
* Author to whom correspondence should be addressed.

J. Cybersecur. Priv. 2024, 4(4), 794-804; https://doi.org/10.3390/jcp4040037

Submission received: 26 June 2024 / Revised: 3 September 2024 / Accepted: 4 September 2024 / Published: 1 October 2024


Abstract

Ideally, in a real cyberattack, early detection of a hacker's probable intent can lead to improved mitigation or prevention of exploitation. With knowledge of the basic principles of communication protocols, a hacker's intentions during the reconnaissance/scanning phase can be inferred by detecting specific patterns of behavior associated with hacker tools and commands. Analyzing the reconnaissance behavior of the TCP SYN scan between Nmap and the target host, we built machine learning models using a filtering method we developed to label a dataset for detecting this behavior. We conclude that feature selection and detailed, targeted labeling based on behavior patterns yield high accuracy and F1 scores with Random Forest and Logistic Regression classifiers.

Keywords:

deep packet inspection; labeling; machine learning; reconnaissance; intention; intrusion; detection

1. Introduction

Reconnaissance is the attacker's engagement with a network before launching an attack (exploit), with the goal of gaining more information about the network. Port scanning is the process of determining which of a target device's ports are open for sending and receiving data and which are closed. One way of doing this is to send information requests to the target device and observe its responses.

Intention detection is the detection of abnormal activity during the reconnaissance and scanning phases of an attack, together with a determination of the attacker's goal or objective. As we see it, intention detection takes place during these two phases [1], and the relevant activity can be detected by observing communication protocol behavior on the network.

This research is based on detecting information requests by observing network behavior at the communication protocol level. When certain scanning commands are executed, the related network protocol exhibits a characteristic pattern. By observing the packet content, specifically the Transmission Control Protocol (TCP) flag settings, the hacker's abnormal activity can be detected.

Deep packet inspection (DPI) is a technique for examining network packet data in detail, which can be used to identify abnormalities and possibly filter out malware and other unwanted traffic [2]. After considering several tools for network data ingest, we chose NFStream, an open-source network data analysis framework, mainly because of its support for deep packet inspection and its easy integration into our Python notebooks on Google Colab.

NFStream is an open-source framework that allows high-throughput network traffic flow analysis on IT networking hardware. NFStream converts network packets (in pcap format) into flows (sessions) by aggregating packets that share a common key. The key consists of the source and destination addresses, the source and destination port numbers, the protocol, the Virtual LAN (VLAN) ID, and the tunnel identifier. NFStream derives and calculates flow features, TCP flag counts, statistics, and application-layer ground truth [3]. This ground truth is made reliable by integrating the nDPI deep packet inspection toolkit into the traffic behavior analysis [4].
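The flow-aggregation idea can be illustrated with a short sketch. It is not NFStream's implementation: the `Packet` record, the helper names, and the choice to order the two endpoints canonically (so that both directions of a conversation map to one bidirectional flow) are our own assumptions, and real NFStream flows additionally expire on idle and active timeouts.

```python
from collections import defaultdict
from dataclasses import dataclass

# TCP flag bit masks from the TCP header definition (RFC 9293).
SYN, RST, ACK = 0x02, 0x04, 0x10


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet record (not an NFStream type)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int  # IP protocol number, e.g. 6 for TCP
    tcp_flags: int = 0
    vlan_id: int = 0
    tunnel_id: int = 0


def flow_key(p: Packet) -> tuple:
    """Build a direction-independent key from addresses, ports, protocol, VLAN, and tunnel."""
    a, b = (p.src_ip, p.src_port), (p.dst_ip, p.dst_port)
    low, high = (a, b) if a <= b else (b, a)
    return (low, high, p.protocol, p.vlan_id, p.tunnel_id)


def aggregate(packets):
    """Group packets into bidirectional flows, counting packets, SYNs, and RSTs per flow."""
    flows = defaultdict(lambda: {"packets": 0, "syn_packets": 0, "rst_packets": 0})
    for p in packets:
        stats = flows[flow_key(p)]
        stats["packets"] += 1
        if p.tcp_flags & SYN:
            stats["syn_packets"] += 1
        if p.tcp_flags & RST:
            stats["rst_packets"] += 1
    return dict(flows)


# A SYN-scan probe of an open port: SYN out, SYN/ACK back, RST out.
probe = [
    Packet("10.0.0.5", "172.16.1.20", 40000, 80, 6, SYN),
    Packet("172.16.1.20", "10.0.0.5", 80, 40000, 6, SYN | ACK),
    Packet("10.0.0.5", "172.16.1.20", 40000, 80, 6, RST),
]
# One flow: 3 packets, 2 with SYN set, 1 with RST set.
print(aggregate(probe))
```

Ordering the endpoints is what makes the SYN/ACK reply from the target land in the same flow as the scanner's SYN and RST.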

Combining machine learning (ML) algorithms with DPI technology, we developed models that demonstrate how early detection can be implemented. Using binary classification, we show how abnormalities such as port scanning can be detected early by an ML model. We assert that the attacker's intention may be inferred from the type of scanning behavior detected.

In this study, we characterize the behavior of the SYN scan and develop an algorithm to detect it. We postulate that this work can be extended to characterize network protocol behavior based on the types of Nmap scripts used to scan for specific vulnerabilities [5,6]. Once a pattern associated with a particular scan is identified, the hacker's intention can be inferred. Such detection would allow cyber deception techniques such as honeypots, honeynets, or masking to thwart the attack [7]. The goal of this research is to perform such characterizations in order to develop a method for identifying these behavioral patterns and inferring hacker intention.

This research is part of a broader effort to build a reconnaissance intrusion detection system. The remainder of the paper discusses related work, gives an overview of TCP flags, and then presents the methodology, results, discussion, and conclusions.

2. Related Work

Related work falls into two categories: foundational protocol operation, and research on detecting network reconnaissance activity such as network discovery and port scanning. We reviewed [8] to confirm our understanding of the TCP protocol and of the control flags carried in the TCP packet header, and we confirmed that TCP flag states provide connection-oriented information that can be used to detect network discovery and port scanning. We also conducted a literature search on scanning tools to determine variations in their functionality and concluded that Nmap, an open-source tool, provides all the capabilities needed for network discovery and for generating port scanning traffic. The authors of [9] discuss Nmap principles, rules, and experiments, and we used the official Nmap website [5] to confirm Nmap usage and commands.

Regarding the detection of abnormal traffic, the authors of [10] used unsupervised deep learning techniques to identify IoT botnet activity. The authors of [11] used selected features to analyze normal and abnormal traffic with Wireshark for anomaly detection. Their approach, however, requires constant attention: profiling must be repeated frequently because traffic behavior patterns change and baselines therefore become outdated. The authors of [12] analyzed control flags to detect congestion in a network by observing the ratios of TCP flag counts. The authors of [13] presented a method for identifying abnormal traffic using flow characteristics and a connection threshold. Of particular interest to us was [14], whose authors examined TCP packets, formulated the behavior of TCP control packets, and trained a neural network to detect it. Their work led us to examine AI/ML algorithms and to apply our knowledge of TCP control flag states to label a dataset for ground truth. To our knowledge, this is the first research to show how automated labeling logic can be applied based on conditions in deep packet features.

2.1. TCP Flags Overview

TCP flags are single-bit fields in the TCP header that indicate a particular state during a TCP conversation. Each has its own significance: together they are used to initiate connections, manage data delivery, and tear down connections. They can also be used for troubleshooting or to control how a particular connection is handled [15]. The commonly used TCP flags are SYN, ACK, RST, FIN, URG, and PSH; less common flags include CWR and ECE [15]:

SYN (synchronize): packets that are used to begin a connection.
ACK (acknowledgment): packets that are used to confirm receipt of data packets; also used for confirmation of initiation request and tear down requests.
RST (reset): aborts a connection, or indicates that the target port or service is not accepting requests.
FIN (finish): signifies that the connection is being torn down. Both the sender and receiver send FIN packets for graceful termination of the connection.
PSH (push): signifies that the incoming data should be passed to the application directly instead of being buffered.
URG (urgent): signifies that the segment carries urgent data, located by the urgent pointer field, which should be processed ahead of other data.
CWR (congestion window reduced): set by a sender to indicate that it has received a segment with the ECE flag set and has reduced its congestion window.
ECE (ECN-Echo): in a SYN segment, signifies that the TCP peer is Explicit Congestion Notification (ECN)-capable; during data transfer, signals that a packet marked Congestion Experienced (CE) was received.
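Because each flag is a single bit, the flags of a segment can be read with bit masks. The following sketch, whose function names are our own, decodes the eight flag bits stored in the byte at offset 13 of the TCP header (RFC 9293, with CWR and ECE from RFC 3168). For example, a SYN/ACK reply carries the value 0x12, and a probe with FIN, PSH, and URG set carries 0x29.

```python
# Bit masks of the eight TCP flags in header byte 13, most significant bit first.
TCP_FLAG_BITS = {
    "CWR": 0x80,
    "ECE": 0x40,
    "URG": 0x20,
    "ACK": 0x10,
    "PSH": 0x08,
    "RST": 0x04,
    "SYN": 0x02,
    "FIN": 0x01,
}


def decode_tcp_flags(flags_byte: int) -> list:
    """Return the names of the flags set in a TCP flags byte, most significant bit first."""
    if not 0 <= flags_byte <= 0xFF:
        raise ValueError("TCP flags byte must be in the range 0..255")
    return [name for name, mask in TCP_FLAG_BITS.items() if flags_byte & mask]


def encode_tcp_flags(names) -> int:
    """Inverse of decode_tcp_flags: combine flag names into a single byte."""
    value = 0
    for name in names:
        value |= TCP_FLAG_BITS[name.upper()]
    return value


print(decode_tcp_flags(0x12))  # ['ACK', 'SYN']
print(decode_tcp_flags(0x29))  # ['URG', 'PSH', 'FIN']
```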

In normal TCP traffic, flags appear in a small set of combinations: SYN to open a connection, SYN/ACK in reply, ACK (often with PSH) while data is exchanged, and FIN (usually with ACK) or, less often, RST to close it. Other combinations, such as a segment with no flags set or with FIN, PSH, and URG set together, are likely abnormal, whether intentional or not. Detecting these abnormal operations can lead to the early detection of malfunctioning hardware or expose a hacker's information gathering and scanning activities. Therefore, it is useful to have an ML system that can detect this abnormal behavior.

3. Methodology

Intention detection seeks to pinpoint abnormal information gathering activity directed at a target network or system, with the objective of classifying the goals or intentions of the intruder. The TCP flags (CWR, ECE, URG, ACK, PSH, RST, SYN, and FIN) provide state information about the target according to which flags the sender or receiver sets. Table 1 shows different port scanning techniques that may be used to gather information about a target host or system. We also included a UDP scan in the research; UDP has no flags, but other indications can be observed, such as the ICMP port-unreachable messages returned for closed ports.

First, to confirm our understanding, we reviewed how TCP flags are used [15,16]. Flag settings and flag counts can indicate that certain actions, such as operating-system and port scans, are being conducted. Second, we observed the traffic in our sandbox LAN, which consisted of the system shown in Figure 1.

Network Mapper (Nmap) is a tool used for network exploration and for security audits that assess a system's vulnerabilities. Nmap determines which hosts are available on the network, which services each host offers (application name and version), which operating system each host runs, and whether firewalls or packet filters are in use.

We issued Nmap commands toward a target Windows Server 2012 host from the Kali 0100 attack device, captured the traffic to a pcap file via the Kali 0200 packet collector, and viewed the pcap file in Wireshark. Observing Nmap command traffic in a small network allowed us to determine the behavior of the network traffic and the settings of the TCP flags. From these observations, we developed Python logic that labels abnormal samples with label = 1 and all other samples with label = 0.

We generated a dataset in our SECURE Lab that captured benign and scan/probe traffic using Nmap. The general network setup is shown in Figure 2.

We issued Nmap commands for scanning on the 10.0.0.x network with the target node on network 172.x.x.x. This network represents the typical business network. Network address translation is implemented such that the internal network address is not exposed to the internet.

Nmap was run with combinations of different Nmap options to probe the 172.16.1.x/24 network with both regular and stealth scans. Wireshark, an open-source network monitor, captured the probing traffic through port mirroring on a Cisco network switch.

Table 2 lists, in order of execution, the Nmap commands entered from the Kali Linux 1 (offensive Kali) workstation; the traffic they generated was captured to a pcap file. Normal network traffic was generated simultaneously with the Spirent CyberFlood CF20.

We chose to implement three models in Python (Naïve Bayes, Random Forest, and Logistic Regression) because of our prior experience with them and their high accuracy (>90%) in our earlier experimental results. Figure 3 summarizes the methodology and approach to model development. The workflow consisted of removing samples with missing data, balancing the dataset, performing feature selection, normalizing the features, splitting the dataset into training and testing sets, and training the three ML models. A Random Forest algorithm was used for feature selection, reducing the number of input features from 87 to 22. The three models were then created, and the test dataset was applied to each. The models were developed in Google Colab, a hosted Jupyter notebook environment, from an Intel Core i7 workstation running Windows 11, and the code was executed on a Colab Tensor Processing Unit (TPU) runtime. An excerpt from the Colab notebook is shown in Figure 4.

We developed Python code that labeled each sample as normal (0) or abnormal (1). Because we are demonstrating supervised ML algorithms, it was essential to give the dataset correct label values, which we did by developing filtering logic; to the best of our knowledge, this is a novel approach. The filtering logic is based on the state of the TCP flags and the application name. The pcap file is input to NFStream, a data analysis framework [3], which reads the file, aggregates the packets into flows, and exports them as a Pandas DataFrame. The authors of [3] provide a definition of each feature. Because of the deep packet inspection, we obtained features that include TCP flag states and flow statistics, such as the number of bidirectional packets in a link session. Figure 4 shows one set of labeling logic, built from these features (data items), that was used for binary classification. Labeling the dataset is a critical step in developing an ML model.

Correctly labeling a dataset requires input from a domain expert. To our knowledge, researchers in this area seldom explain how they labeled a dataset for ground truth. We labeled the dataset for binary classification of normal and abnormal (TCP SYN scan) traffic. In the logic of Figure 4, we set label = 1 (abnormal) if the application is unknown and the RST flag count is at least 1 in either direction, that is, for packets sent by the source or by the destination.
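A minimal sketch of the labeling rule described above follows. The paper's actual logic appears in Figure 4 and is not reproduced here; this version assumes NFStream-style column names (`application_name`, `src2dst_rst_packets`, `dst2src_rst_packets`, the per-direction RST counters NFStream reports when `statistical_analysis=True`) and assumes that nDPI reports unrecognized traffic with the application name `"Unknown"`. Flow records are plain dictionaries, and their values are hypothetical, so that the example runs without NFStream installed.

```python
def label_flow(flow: dict) -> int:
    """Return 1 (abnormal, possible TCP SYN scan) or 0 (normal) for one flow record.

    Assumed rule: the application is unknown AND at least one RST was seen
    in either direction (source to destination or destination to source).
    """
    unknown_app = flow["application_name"] == "Unknown"
    rst_seen = flow["src2dst_rst_packets"] >= 1 or flow["dst2src_rst_packets"] >= 1
    return int(unknown_app and rst_seen)


def label_flows(flows) -> int:
    """Add a 'label' key to every flow record and return how many are abnormal."""
    for flow in flows:
        flow["label"] = label_flow(flow)
    return sum(flow["label"] for flow in flows)


# Hypothetical flow records for illustration only.
flows = [
    {"application_name": "Unknown", "src2dst_rst_packets": 1, "dst2src_rst_packets": 0},
    {"application_name": "Unknown", "src2dst_rst_packets": 0, "dst2src_rst_packets": 1},
    {"application_name": "HTTP", "src2dst_rst_packets": 1, "dst2src_rst_packets": 0},
    {"application_name": "Unknown", "src2dst_rst_packets": 0, "dst2src_rst_packets": 0},
]
print(label_flows(flows))  # 2
```

With an NFStream DataFrame, the same condition could be expressed as a vectorized boolean mask over these three columns and cast to an integer label column.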

We confirmed this logic based on the operational principle that, during a SYN scan, either the source or the destination host will set the RST flag. Nmap starts by sending a SYN, as shown in Figure 5. In a normal connection, the target would respond with a SYN/ACK, and the source would complete the three-way handshake with an ACK and then exchange data until the session is complete.

In an Nmap SYN scan, by contrast, the scanning host answers the SYN/ACK with an RST instead of an ACK, because the SYN/ACK has already told it that the port is open; if the port is closed, the target itself replies with an RST. Applying this logic to the Nmap dataset, we labeled 77 link sessions as abnormal, as shown in Figure 4. We reviewed a sample of the abnormal records and observed that their source IP address matched the address of our Nmap source node.
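The open-port and closed-port cases can be distinguished by the order of flags within a flow, as in the following sketch. The `(direction, flags)` representation and the function names are hypothetical. Both cases leave at least one RST in the flow (from the scanner when the port is open, from the target when it is closed), which is why a labeling rule that counts RSTs in either direction captures both.

```python
# TCP flag bit masks (RFC 9293).
SYN, RST, ACK = 0x02, 0x04, 0x10


def is_half_open_probe(exchange) -> bool:
    """True if a flow looks like a SYN-scan probe of an open port.

    `exchange` is an ordered list of (direction, flags) pairs, where direction
    is "out" (scanner to target) or "in" (target to scanner). Expected pattern:
    SYN out, SYN/ACK in, then RST out instead of the handshake-completing ACK.
    """
    if len(exchange) != 3:
        return False
    (d1, f1), (d2, f2), (d3, f3) = exchange
    syn_out = d1 == "out" and (f1 & (SYN | ACK)) == SYN
    synack_in = d2 == "in" and (f2 & (SYN | ACK)) == (SYN | ACK)
    rst_out = d3 == "out" and bool(f3 & RST)
    return syn_out and synack_in and rst_out


def is_closed_port_reply(exchange) -> bool:
    """True if a bare SYN was answered directly by an RST from the target."""
    if len(exchange) != 2:
        return False
    (d1, f1), (d2, f2) = exchange
    return d1 == "out" and (f1 & (SYN | ACK)) == SYN and d2 == "in" and bool(f2 & RST)


open_port = [("out", SYN), ("in", SYN | ACK), ("out", RST)]
full_handshake = [("out", SYN), ("in", SYN | ACK), ("out", ACK)]
closed_port = [("out", SYN), ("in", RST | ACK)]
print(is_half_open_probe(open_port), is_half_open_probe(full_handshake))  # True False
print(is_closed_port_reply(closed_port))  # True
```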

To mitigate class imbalance, we applied class weights so that both classes contribute equally during training, as shown in Figure 6. We also repeated the analysis using the Synthetic Minority Over-sampling Technique (SMOTE) to balance the dataset.
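The two balancing strategies can be sketched as follows. Class weights leave the data unchanged and scale each class's contribution to the training loss by n_samples / (n_classes × class_count), the heuristic behind scikit-learn's `class_weight="balanced"` option. SMOTE instead creates synthetic minority samples by interpolating between a minority sample and one of its k nearest minority neighbours. The code is a minimal illustration with hypothetical function names, not the configuration used in this study; libraries such as imbalanced-learn provide a complete SMOTE implementation.

```python
import math
import random
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: create n_new synthetic points from minority samples.

    Each synthetic point lies on the segment between a random minority sample
    and one of its k nearest minority neighbours (Euclidean distance).
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Sort the other minority samples by distance to x and keep the k nearest.
        others = [m for j, m in enumerate(minority) if j != i]
        others.sort(key=lambda m: math.dist(x, m))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, neighbour)])
    return synthetic


weights = balanced_class_weights([0] * 90 + [1] * 10)
print(weights)  # {0: 0.5555555555555556, 1: 5.0}
```

With these weights, 90 normal samples and 10 abnormal samples each contribute a total weight of 50. Oversampling is normally applied to the training split only, so that synthetic points do not appear in the test set.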

Our preprocessing steps also included removing missing data, performing feature selection, and normalizing and scaling the features. The labeled dataset was split into training and test sets with an 80/20 split. The training set was used as input to three ML algorithms (Logistic Regression, Random Forest, and Gaussian Naïve Bayes), which were trained to detect abnormal traffic [18,19]. We compare the results in the next section.
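The split and scaling steps can be sketched as follows. The text does not say whether the 80/20 split was stratified or which scaling method was used; this sketch assumes a stratified split (so the rare abnormal class appears in both sets in the same proportion) and min-max scaling whose bounds are learned from the training rows only, so that no test-set information leaks into training. The function names and the toy data are hypothetical.

```python
import random


def stratified_split(rows, labels, test_fraction=0.2, seed=42):
    """Split rows into train/test lists of (row, label), keeping each class's ratio."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    train, test = [], []
    for y, members in by_class.items():
        rng.shuffle(members)
        n_test = round(len(members) * test_fraction)
        test += [(r, y) for r in members[:n_test]]
        train += [(r, y) for r in members[n_test:]]
    rng.shuffle(train)
    rng.shuffle(test)
    return train, test


def fit_min_max(train_rows):
    """Learn per-feature minimum and maximum from the training rows only."""
    columns = list(zip(*train_rows))
    return [min(c) for c in columns], [max(c) for c in columns]


def apply_min_max(rows, mins, maxs):
    """Scale features with training-set bounds; constant features map to 0."""
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, mins, maxs)]
        for row in rows
    ]


rows = [[float(i), float(i % 7)] for i in range(100)]
labels = [1 if i < 10 else 0 for i in range(100)]
train, test = stratified_split(rows, labels)
print(len(train), len(test), sum(y for _, y in test))  # 80 20 2
```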

4. Results

We demonstrated the logic for labeling the dataset for the TCP SYN scan based on the state of the TCP flags [15]. The goal of a SYN scan is to determine the state of a port on a target system without completing a full connection [20,21]. We trained the three ML models to detect the TCP SYN scan. A well-behaved application would not normally perform a TCP SYN scan; therefore, detecting this behavior indicates abnormal activity, most likely by a hacker. If a hacker's intention can be detected at this stage from their behavior, defenses can be put in place before the threat develops further.

Table 3 shows the model performance results. We initially ran the code on a subset of the data, and the accuracies were very high (~1). To validate these findings, we ran our algorithm on the full 32-gigabyte dataset, about four times the size of the initial subset, which produced the very high accuracies shown in Table 3. We attribute these results to our feature selection process and to a detailed labeling process implemented in Python code developed after studying the behavior of the SYN scan. Although we demonstrate models for the TCP SYN scan, models for the binary classification of any of the scans in Table 1 can be created in the same way.

Comparing the results for the dataset balanced with class weights and with the Synthetic Minority Over-sampling Technique (SMOTE) shows that the Logistic Regression and Random Forest algorithms yield high accuracy, precision, recall, and F1 scores under both balancing methods.

With class weights, the confusion matrices for the Random Forest and Logistic Regression algorithms were identical; they are shown in Figure 7. There were no false positives and no false negatives. The confusion matrix for Gaussian Naïve Bayes, shown in Figure 8, contains 2,962 false positives. Because its recall was 1.0, the lower F1 score of Gaussian Naïve Bayes is due entirely to its low precision, which leaves room for improvement.
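Table 3's precision, recall, and F1 values follow from confusion-matrix counts, since F1 = 2PR/(P + R) = 2TP/(2TP + FP + FN). The sketch below checks this arithmetic. The true-positive count is not reported in the text, so 15 is an inferred value: with a recall of 1.0 (so FN = 0) and the 2,962 false positives reported for Gaussian Naïve Bayes, 15 true positives reproduce the precision (15/2,977 ≈ 0.005039) and F1 score (30/2,992 ≈ 0.010027) in Table 3's class-weighted column.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Algebraically equal to 2 * precision * recall / (precision + recall).
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1


# Gaussian Naive Bayes (class weights): FN = 0, FP = 2,962; TP = 15 is inferred.
p, r, f1 = precision_recall_f1(tp=15, fp=2962, fn=0)
print(round(p, 6), round(r, 6), round(f1, 6))  # 0.005039 1.0 0.010027
```

The same inferred count is consistent with the SMOTE Logistic Regression column: 15 true positives and 2 false positives give a precision of 15/17 ≈ 0.882353 and an F1 score of 0.9375.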

5. Discussion

As the results indicate, the Random Forest and Logistic Regression metrics were high. Random Forest classification copes well with imbalanced datasets and resists over-fitting, yielding a better-performing model [22,23,24].

Although Gaussian Naïve Bayes processed the data in significantly less time, its precision and F1 scores were much lower than those of Logistic Regression and Random Forest, so the model needs improvement in precision. We attribute its lower metrics to its assumption that features are conditionally independent given the class [25], which flow features such as related packet and flag counts are unlikely to satisfy.

A Random Forest algorithm was used for feature selection, reducing the number of input features to the models from 87 to 22. By steadily reducing the number of features, we were able to gauge the impact of the remaining features on model efficacy. We observed an inflection point at the top 22 features: with fewer features, model performance began to suffer. Reducing the number of features also shortened processing times.
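The pruning procedure can be sketched as follows, assuming a Random Forest has already produced one importance score per feature (for instance, through scikit-learn's `RandomForestClassifier.feature_importances_`) and that the model has been retrained and scored for several feature counts k. The helper names, the tolerance, and the importance values below are hypothetical; the text does not state the exact criterion used to identify the inflection point.

```python
def rank_features(importances: dict) -> list:
    """Order feature names from most to least important."""
    return sorted(importances, key=importances.get, reverse=True)


def select_top_k(importances: dict, k: int) -> list:
    """Keep the k highest-ranked feature names."""
    return rank_features(importances)[:k]


def smallest_adequate_k(scores_by_k: dict, tolerance: float = 0.005) -> int:
    """Return the smallest feature count whose score is within `tolerance` of the best.

    `scores_by_k` maps k, the number of top-ranked features kept, to a
    validation score (for example, F1) from a model retrained on those k features.
    """
    best = max(scores_by_k.values())
    return min(k for k, score in scores_by_k.items() if score >= best - tolerance)


# Hypothetical importance values for four NFStream-style features.
importances = {
    "bidirectional_packets": 0.40,
    "src2dst_syn_packets": 0.35,
    "dst_port": 0.15,
    "bidirectional_duration_ms": 0.10,
}
print(select_top_k(importances, 2))  # ['bidirectional_packets', 'src2dst_syn_packets']
```

Feeding `smallest_adequate_k` the scores measured at each k turns the visual "inflection point" into an explicit, repeatable rule.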

We propose that by analyzing TCP flag states, as exposed through NFStream's deep packet inspection, we can build ML models that detect specific hacker behaviors; by characterizing the network behavior associated with certain hacker scans, intent can then be inferred at an early stage. Once intention is detected, appropriate measures such as cyber deception can be used to better thwart an attack.

6. Conclusions

The types of scans in Table 1, adopted from [16], are undertaken when a hacker is in the information-gathering and scanning phases. This activity can be detected using an ML model for binary classification. We suggest that, using TCP flag states exposed through NFStream's deep packet inspection, ML models can be built to detect specific hacker behaviors, from which intent can be inferred at an early stage by characterizing the behavior associated with certain types of tools and scans. We did this for the Nmap SYN scan in this study, and the approach can be extended to other scans.

Through the experiment, we demonstrated the selection of important features using Random Forest feature selection. In addition, NFStream's deep packet features preserve the TCP flag state of each link session.

In the future, we plan to build a multi-class classifier for detecting the different scans in Table 1 and other specific types of Nmap scans, enabling early identification of hacker reconnaissance activity from which hacker intent may be inferred. We also plan to study other techniques for detecting horizontal and vertical scanning for botnet detection.

We believe that intent detection should be part of any intrusion detection system and that this study is a stepping stone toward establishing intruder intent. Knowing an intruder's intention could be combined with the existing field of cyber deception to develop strategies that thwart an attack and safeguard resources. Furthermore, we believe that future network intrusion detection systems (IDSs) will identify the attacker's intention along with the type of attack. The lightweight system developed in this research could be deployed on an embedded computing platform such as the NVIDIA Jetson Nano.

Author Contributions

J.F. (principal investigator): concept originator and concept development; W.L.W.: concept development and software and lab implementation; C.A.K.: DoD Mentor of research, assisted in defining research scope; A.H.A.H.: assisted in mentoring of research and scope development; J.C.A.: provided important insight into methodology; B.C.D.: helped with laboratory setup and concept implementation. All authors have read and agreed to the published version of the manuscript.

Funding

Department of Defense, Prairie View A&M University College of Engineering.

Data Availability Statement

Data are available upon request, pending approval of the SECURE research center director.

Acknowledgments

Research was sponsored by the Under Secretary of Defense for Research and Engineering (USRE) and was accomplished under Cooperative Agreement Number W911NF-19-2-0120. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the USRE or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation herein. Additionally, this research was supported in part by the Prairie View A&M University Roy G. Perry College of Engineering and facilitated in the SECURE (Systems to Enhance Cybersecurity for Universal Research Environment) Center of Excellence in the Prairie View A&M University Roy G. Perry College of Engineering.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Engebretson, P. The Basics of Hacking and Penetration Testing; Syngress-Elsevier Publishing: Amsterdam, The Netherlands, 1974.
2. Brook, C. What Is Deep Packet Inspection? And How It Really Works. Fortinet Digital Guardian. Available online: https://www.digitalguardian.com/blog/what-deep-packet-inspection-how-it-works-use-cases-dpi-and-more (accessed on 3 September 2024).
3. Aouini, Z.; Pekar, A. NFStream: A flexible network data analysis framework. Comput. Netw. 2022, 204, 108719.
4. Deri, L.; Martinelli, M.; Cardigliano, A. nDPI: Open-source high speed packet inspection. In Proceedings of the 2014 International Wireless Communications and Mobile Computing Conference (IWCMC), Nicosia, Cyprus, 4–8 August 2014.
5. Lyon, G. Nmap Official Website. Available online: https://nmap.org/ (accessed on 12 April 2023).
6. Ann, E. Use Nmap to Discover Vulnerabilities, Launch DoS Attacks, and More. Null Byte. Available online: https://null-byte.wonderhowto.com/how-to/use-nmap-7-discover-vulnerabilities-launch-dos-attacks-and-more-0168788/ (accessed on 3 September 2024).
7. Brice, H. The Ultimate Guide to Cyber Deception Technology. LUPOVIS. Available online: https://www.lupovis.io/the-ultimate-guide-to-cyber-deception-technology/ (accessed on 12 August 2024).
8. Kurose, J.F.; Ross, K.W. Computer Networking: A Top-Down Approach, 7th ed.; Pearson Publishing: Upper Saddle River, NJ, USA, 2017.
9. Liao, S.; Zhou, C.; Zhao, Y.; Zhang, Z.; Zhang, C.; Gao, Y.; Zhong, G. A Comprehensive Detection Approach of Nmap: Principles, Rules and Experiments. In Proceedings of the 2020 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC), Chongqing, China, 29–30 October 2020; pp. 64–71.
10. Apostol, I.; Preda, M.; Bica, I. IoT Botnet Anomaly Detection Using Unsupervised Deep Learning. Electronics 2021, 10, 1876.
11. Gill, M.S.; Lindskog, D.; Zavarsky, P. Profiling Network Traffic Behavior for the Purpose of Anomaly-Based Intrusion Detection. In Proceedings of the 2018 17th IEEE International Conference on Trust, Security and Privacy in Computing and Communications/12th IEEE International Conference on Big Data Science and Engineering (TrustCom/BigDataSE), New York, NY, USA, 1–3 August 2018; pp. 885–890.
12. Fukushima, M.; Goto, S. Analysis of TCP Flags in Congested Network. IEICE Trans. Inf. Syst. 2000, E83-D, 996–1002.
13. Muraleedharan, N. Analysis of TCP flow data for traffic anomaly and scan detection. In Proceedings of the 16th IEEE International Conference on Networks, New Delhi, India, 12–14 December 2008; pp. 1–4.
14. Balram, S.; Wiscy, M. Detection of TCP SYN Scanning Using Packet Counts and Neural Network. In Proceedings of the 2008 IEEE International Conference on Signal Image Technology and Internet Based Systems (SITIS '08), Washington, DC, USA, 30 November–3 December 2008.
15. Cao, D. Understanding TCP Flags SYN ACK RST FIN URG PSH. Available online: https://www.howtouselinux.com/post/tcp-flags (accessed on 3 September 2024).
16. Bhuyan, M.H.; Bhattacharyya, D.K.; Kalita, J.K. AOCD: An Adaptive Outlier Based Coordinated Scan Detection Approach. Int. J. Netw. Secur. 2012, 14, 339–351.
17. SECURE Center, Network Lab Setup Diagram, Prairie View A&M University. Available online: http://pvamu1.s3-website-us-east-1.amazonaws.com/ (accessed on 3 September 2024).
18. Shafiq, M.; Tian, Z.; Sun, Y.; Du, X. Selection of effective machine learning algorithm and Bot-IoT attacks traffic classification for internet of things in smart city. Future Gener. Comput. Syst. 2020, 107, 433–442.
19. Shaukat, K.; Luo, S.; Varadharajan, V.; Hameed, I.A.; Xu, M. A Survey on Machine Learning Techniques for Cyber Security in the Last Decade. IEEE Access 2020, 8, 222310–222354.
20. Hanna, K.T. What is SYN Scanning. TechTarget. Available online: https://www.techtarget.com/searchnetworking/definition/SYN-scanning (accessed on 1 August 2023).
21. Paliwal, A. The Ultimate Port Scanning Guide: Part 3-Port Scans. SecOps Solution. Available online: https://www.secopsolution.com/blog/the-ultimate-port-scanning-guide (accessed on 3 September 2024).
22. Huh, K. Surviving in a Random Forest with Imbalanced Datasets. Medium. Available online: https://medium.com/sfu-cspmp/surviving-in-a-random-forest-with-imbalanced-datasets-b98b963d52eb (accessed on 2 September 2024).
23. Luo, H.; Pan, X.; Wang, Q.; Ye, S.; Qian, Y. Logistic Regression and Random Forest for Effective Imbalanced Classification. In Proceedings of the 2019 IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA, 15–19 July 2019; IEEE: Piscataway, NJ, USA, 2019; Volume 1, pp. 916–917.
24. RITP. Logistic Regression and Regularization: Avoiding Overfitting and Improving Generalization. Medium. Available online: https://medium.com/@rithpansanga/logistic-regression-and-regularization-avoiding-overfitting-and-improving-generalization-e9afdcddd09d (accessed on 2 September 2024).
25. Rennie, J.D.; Shih, L.; Teevan, J.; Karger, D.R. Tackling the Poor Assumptions of Naive Bayes Text Classifiers. In Proceedings of the 20th International Conference on Machine Learning (ICML), Washington, DC, USA, 21–24 August 2003; pp. 616–623.

Figure 1. Inspection lab setup with devices and addresses.

Figure 2. SECURE lab setup [17].

Figure 3. Summary of methodology to create ML models.

Figure 4. Example labeling logic (source: our Colab research notebook).

Figure 5. TCP SYN scan [9].

Figure 6. Dataset balancing.

Figure 7. Random Forest, Logistic Regression Confusion Matrix.

Figure 8. Naïve Bayes Confusion Matrix.

Table 1. Scan types and TCP flag conditions [16].

Port Scan Technique	Protocol	TCP Flags in Probe	Target Reply (Open Port)	Target Reply (Closed Port)
TCP Connect	TCP	SYN	SYN/ACK	RST
SYN Scan	TCP	SYN	SYN/ACK	RST
SYN/ACK Scan	TCP	SYN/ACK	RST	RST
FIN Scan	TCP	FIN	No reply	RST
ACK Scan	TCP	ACK	No reply	RST
NULL Scan	TCP	None	No reply	RST

Table 2. Nmap commands used [5].

Probe Type	Nmap Command Issued	Short Description
Nmap host scan	sudo nmap 172.16.1.20	Default method of scanning the host for identifying open ports.
Nmap ping scan	sudo nmap -sn 172.16.1.20	Performs host discovery to determine available hosts, without a port scan.
TCP SYN scan	sudo nmap -sS 172.16.1.20	TCP packet sent with SYN flag set; after receiving response in the form of SYN-ACK, it disconnects by sending RST flag. Also known as stealth scan.
TCP connect scan	sudo nmap -sT 172.16.1.20	Attempts TCP connection during the scan by issuing a TCP connect call. Default TCP scan when SYN scan is not an option.
TCP ACK scan	sudo nmap -sA 172.16.1.20	Determines whether the port is filtered or unfiltered
TCP window scan	sudo nmap -sW 172.16.1.20	Similar to ACK scan except it exploits implementation details of certain systems to differentiate open ports from closed ports. It also can determine if port is filtered.
UDP scan	sudo nmap -sU 172.16.1.20	Used with UDP protocol to determine open, closed, filtered state of UDP ports
Null scan	sudo nmap -sN 172.16.1.20	Scan with no TCP flags set (all flag bits zero) to determine the state of a port (open, closed, filtered).
FIN scan	sudo nmap -sF 172.16.1.20	Scan with FIN flag set to determine state of port (open, closed, filtered).
XMAS scan	sudo nmap -sX 172.16.1.20	Sends a packet with the FIN, URG, and PSH flags set to determine the state of a port (open, closed, filtered).
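The TCP probes in Table 2 differ in the flags set on the first packet the scanner sends, so a flow's opening segment already hints at the scan type. The sketch below is a hypothetical helper, not part of Nmap or NFStream. It covers only the TCP scans in Table 2 (the UDP and ping scans are not flag-based), ignores the ECN bits, and cannot separate a SYN scan from a connect scan or an ACK scan from a window scan, because each pair sends identical first probes; they differ in the scanner's follow-up packets or in the reply fields it inspects.

```python
# TCP flag bit masks (RFC 9293).
FIN, SYN, PSH, ACK, URG = 0x01, 0x02, 0x08, 0x10, 0x20

# Flags carried by the first probe of each TCP scan in Table 2.
PROBE_SIGNATURES = {
    0: "NULL scan",
    FIN: "FIN scan",
    FIN | PSH | URG: "XMAS scan",
    ACK: "ACK or window scan",
    SYN: "SYN or connect scan",
}


def classify_probe(flags: int) -> str:
    """Name the Table 2 scan whose opening probe carries exactly these flags.

    Intended for the first packet of a flow; the ECN bits (ECE, CWR) are ignored.
    """
    return PROBE_SIGNATURES.get(flags & 0x3F, "unrecognized")


print(classify_probe(0x29))  # XMAS scan
print(classify_probe(0x00))  # NULL scan
```

A bare ACK is routine inside an established connection; it is suspicious only when it opens a flow with no preceding handshake, which is why the helper is meant for a flow's first packet.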

Table 3. Model performance results from our Colab notebook.

Metrics	Logistic Regression (Class Weights)	Random Forest (Class Weights)	Gaussian Naïve Bayes (Class Weights)	Logistic Regression (SMOTE)	Random Forest (SMOTE)	Gaussian Naïve Bayes (SMOTE)
Accuracy	1.0	1.0	0.977270	0.999985	1.0	0.996278
Precision	1.0	1.0	0.005039	0.882353	1.0	0.03
Recall	1.0	1.0	1.0	1.0	1.0	1.0
F1 Score	1.0	1.0	0.010027	0.9375	1.0	0.058252
ROC AUC	1.0	1.0	0.988634	0.999992	1.0	1.0
Training Time	24.253008	7.964047	0.110240	17.384840	19.179496	0.250027
Prediction Time	0.202459	0.496398	0.015879	0.129446	0.506340	0.016541
Detection Rate	0.999885	0.999885	0.977155	0.999870	0.9999885	0.996163
Memory Usage (MB)	0.001340	0.091180	0.00143	0.001330	0.092680	0.001410

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
