Detecting Stepping-Stone Intrusion and Resisting Intruders’ Manipulation via Cross-Matching Network Traffic and Random Walk

Yang, Jianhua; Wang, Lixin; Qin, Maochang; Neundorfer, Noah

doi:10.3390/electronics12020394


by Jianhua Yang *, Lixin Wang, Maochang Qin and Noah Neundorfer

TSYS School of Computer Science, Columbus State University, Columbus, GA 31907, USA

* Author to whom correspondence should be addressed.

Electronics 2023, 12(2), 394; https://doi.org/10.3390/electronics12020394

Submission received: 20 October 2022 / Revised: 21 December 2022 / Accepted: 8 January 2023 / Published: 12 January 2023

(This article belongs to the Special Issue Advanced Future Communication Techniques and Security Solutions for 6G and Internet of Things)


Abstract:

Attackers can exploit compromised hosts to launch attacks over the Internet. This protects an intruder, placing them behind a long connection chain consisting of multiple compromised hosts. Such attacks are called stepping-stone intrusions. Many algorithms have been proposed to detect stepping-stone intrusions, but most detection algorithms are weak in resisting intruders’ session manipulation, such as chaff-perturbation. This paper proposes a novel detection algorithm: Packet Cross-Matching and RTT-based two-dimensional random walk. Theoretical proof shows network traffic cross matching can be effective in resisting attackers’ chaff attack. Our experimental results over the AWS cloud show that the proposed algorithm can resist attackers’ chaff attacks up to a chaff rate of 100%.

Keywords:

stepping-stone intrusion; session manipulation; chaff attack; time-jittering attack; random walk; network traffic; cross matching

1. Introduction

To avoid being detected or captured, most intruders tend to exploit compromised computer hosts to attack the victims they are interested in. We call the compromised computer hosts stepping-stones [1]. Most attackers establish a long connection chain with more than three stepping-stones to better protect themselves when launching attacks. The more stepping-stones used, the harder it is to capture the attacker. Such attacks are called stepping-stone intrusions.

Stepping-stone intruders are especially hard to track or capture because they attack through a long TCP/IP session. One simple way to detect such an intrusion, developed in 2000, is to decide whether a host plays the role of a stepping-stone; an alternative is to estimate the number of compromised hosts. Many real-world applications use stepping-stones legitimately; thus, flagging an intrusion merely because a host has been used as a stepping-stone may produce a false positive. Estimating the number of compromised hosts reduces the likelihood of a false positive, since there are almost no legitimate uses for a connection chain with more than three hosts. Nevertheless, because of its simplicity and reliability, many methods have still been proposed that detect stepping-stone intrusion by determining whether a host plays the role of a stepping-stone.

The first method to detect stepping-stone intrusion by determining whether a host plays the role of a stepping-stone was described by S. Staniford-Chen and L. T. Heberlein in 1995 [2]. This method was created even before the stepping-stone concept was formally proposed. In [2], the contents of the network traffic of an incoming connection and an outgoing connection are compared. If the two connections carry the same traffic contents, the two sessions are treated as relayed. A relayed connection pair such as this indicates that the host plays the role of a stepping-stone. However, this initial algorithm cannot be applied to encrypted network communication sessions, which have been widely used since 2000.

Y. Zhang and V. Paxson developed a time-based thumbprint method [1] to detect stepping-stone intrusion. This method compares the time thumbprint of an incoming connection with that of an outgoing connection. It does not require viewing the contents of a packet and is therefore unaffected by encryption. Instead of using the contents of a series of packets, the time-based thumbprint compares the ON and OFF time gaps of the packets collected from an incoming connection with those of the packets collected from an outgoing connection. When we monitor an interactive network connection, there are periods in which no network traffic flows through the session; such a period is called an ‘OFF’ time gap. Conversely, a period in which packets are flowing through the connection is called an ‘ON’ gap. When we monitor a network connection continuously, we obtain an ON–OFF gap sequence. This sequence uniquely identifies the interactive network connection. Therefore, the problem of detecting stepping-stone intrusion becomes one of comparing two sequences, which avoids the previous need to view packet contents. This approach [1] can thus be applied to an encrypted interactive network connection.

K. Yoda and H. Etoh [3] also proposed a similar idea to detect stepping-stone intrusion in 2000. Rather than using a time-based thumbprint, their algorithm examines the deviation between two different interactive network sessions. A small deviation indicates a high likelihood that the two interactive sessions are relayed, and a relayed session pair indicates a high possibility that the host plays the role of a stepping-stone. The deviation [3] can be calculated using only the header information of TCP/IP packets. Since the header is not encrypted, this method is applicable to an encrypted network session.

We found that the primary issue of the above approaches is that the time-based thumbprint and the deviation between two network sessions can be garbled by intruders’ session manipulation, such as chaff perturbation and/or time-jittering. Chaff perturbation is a concealment method in which attackers insert trivial packets into a regular TCP/IP connection to make two relayed sessions appear unrelated. Chaff perturbation will be discussed in detail in Section 3.

Research from D. L. Donoho et al. [4] reveals that attackers cannot disguise their network sessions to evade detection without limit: the attackers’ ability to manipulate a live interactive session is capped. A. Blum, D. Song, and S. Venkataraman developed an algorithm [5] that uses TCP/IP packet counts to detect stepping-stone intrusion by checking the difference in the number of packets traveling over two connections. If the two network sessions are relayed, that difference in packet count is bounded with high probability. This approach tends to fail to resist attackers’ chaff attacks because the number of chaffed packets necessary to evade detection is relatively small.

T. He and L. Tong [6] developed an approach called Detect Bounded Memory Chaff to detect stepping-stone intrusion in 2007. They claimed that their algorithm can resist intruders’ chaff evasion proportional to the size of the network traffic. Their proof and experimental results show that, assuming the packet delay is bounded by ∆, in every n network packets an attacker needs to inject at least n/(1 + λΔ) packets into the connection to evade detection, where λ is the parameter of a Poisson distribution. Obviously, the smaller the ∆, the better the algorithm performs in resisting intruders’ chaff attacks. However, a smaller ∆ might also introduce a higher false positive detection error.
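To make the dependence on ∆ concrete, the bound quoted above can be evaluated numerically. The following Python sketch only computes the expression n/(1 + λΔ) as stated in the text; the function name and the sample parameter values are illustrative assumptions, not values taken from [6].

```python
def min_chaff_packets(n: int, lam: float, delta: float) -> float:
    """Evaluate the bound n / (1 + lam * delta) quoted in the text.

    n     : number of network packets considered
    lam   : Poisson rate parameter (lambda), assumed non-negative
    delta : maximum packet delay, assumed non-negative
    """
    if n < 0 or lam < 0 or delta < 0:
        raise ValueError("n, lam and delta must be non-negative")
    return n / (1.0 + lam * delta)


if __name__ == "__main__":
    # Illustrative values only: a tighter delay bound requires more chaff.
    for delta in (0.1, 0.5, 1.0):
        print(delta, min_chaff_packets(1000, lam=2.0, delta=delta))
```

Under these assumed values, shrinking ∆ raises the number of chaff packets an attacker must inject, which matches the qualitative statement above.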

K. H. Yung [7] developed an algorithm to detect stepping-stone intrusion by roughly estimating the length of a connection chain built by attackers. The longer a connection chain, the higher the possibility that the chain was built by an attacker, because it does not make sense to use a long connection chain to access a host if a shorter one exists. K. H. Yung’s approach uses packet round-trip time (RTT) and needs to match network traffic to estimate the length of a connection chain. Unfortunately, the packet matching approach in [7] cannot make a correct packet match for a long connection chain. J. Yang developed some approaches to match network traffic [8] which work for a long connection.

In 2021, Dr. L. Wang et al. proposed using k-means clustering to mine network packets and estimate the length of a connection chain [9]. This approach overcomes the main issue of the existing SSID (Stepping-Stone Intrusion Detection) approaches that estimate the length of a connection chain, namely that a large number of TCP packets must be captured and processed to make an effective detection. The algorithm proposed in [9] can accurately determine the length of a connection chain with a smaller number of packets. However, this method loses effectiveness if there are many outliers in the packets’ RTTs.

In ref. [10], Dr. L. Wang et al. proposed an approach that uses packet crossover to estimate the downstream length of a connection chain. Packet crossover is a phenomenon in which a Send (request) packet meets the Echo (response) packet of a previous Send packet while traveling along the connection chain between a client host and a server host. The paper proved that the length of a downstream connection chain strictly increases as the ratio of packet crossover increases. Dr. L. Wang et al. also proposed a framework to test the resistance of stepping-stone intrusion detection algorithms against time-jittering session attacks [11].

In 2021, Dr. J. Yang et al. proposed an algorithm that uses encrypted packets to detect stepping-stone intrusion [12]. To apply this approach to the encrypted packets of a host’s incoming and outgoing connections, Dr. J. Yang et al. assumed that the two connections would use the same encryption algorithm, which may not always be true. Dr. J. Yang improved the algorithm published in [12] to make it accurate regardless of whether the encryption algorithms of the incoming and outgoing connections are the same. The initial result was published in the proceedings of the 36th International Conference on Advanced Information Networking and Applications, Sydney, Australia, April 2022 [13].

In this paper, we propose a new approach that exploits packet cross-matching and models the RTTs of network packets as a random walk to detect stepping-stone intrusion and to resist intruders’ chaff evasion attacks at a higher rate than all other existing approaches. This paper is an extended version of the conference paper published in [14]. The idea was initially motivated by the approach proposed in [15], which uses an RTT-based one-dimensional random walk to detect stepping-stone intrusions. In this paper, we make three significant improvements over the conference paper [14]. The first is to adopt a new packet matching algorithm published in 2021 [16]. The second is to make the cross-matching algorithm more robust and efficient by using the ratio between the number of RTT changes and the total number of RTTs, instead of an absolute number of RTTs, as the critical factor in determining whether an intrusion is occurring. The third is to provide more experimental results, obtained on Amazon AWS cloud infrastructure, to verify the improved performance of the new algorithm, especially its capacity to resist intruders’ chaff evasion attacks.

In [14], the experiments were conducted over a short connection of three hosts. In each connection, we collected two types of packets: Send and Echo. We placed the Send packets collected into an array called Send stream. Similarly, an Echo stream contained all the Echo packets from a connection. The idea of cross-matching is to match the TCP/IP packets not only from the streams in the same connection but also the streams from different connections. The difference between the amount of RTTs matched from the same connections, and the amount of RTTs matched from different connections are the two dimensions in the Random Walk.

The rest of this paper is organized as follows. Section 2 describes a way to model and match computer network traffic. Section 3 discusses the stepping-stone chaff evasion attack. Section 4 presents the RTT-based random walk. Section 5 describes packet cross-matching. Section 6 presents the detection algorithm using a two-dimensional random walk based on packet cross-matching. Section 7 analyzes how this algorithm can detect stepping-stone intrusion and resist intruders’ chaff attacks. In Section 8, we present experimental results to justify the performance of the proposed algorithm. In Section 9, we conclude the paper and discuss some future work.

2. Modelling and Matching Network Traffic

Network communication traffic consists of the interactions between the requests of a client and the responses from the corresponding server. To simplify our description and clarify the packet matching process for readers, network communication is modeled in this research as Send, Echo, and Ack packets.

2.1. Send and Echo Definition

A TCP/IP packet has a one-byte Flag field in its header with six bits in the order “UAPRSF” from bit 10 to bit 15. These bits indicate the packet type, and the two other bits (bits 8 and 9) are used for TCP congestion control. If bit 11 (“A”) is set, the packet is an Acknowledgement packet. If bit 12 (“P”) is set, the packet is being used for the ‘Push’ function; it pushes the buffered data to the receiver-side application. A packet with its “P” flag set must carry a data payload. A packet with only its “A” flag set signals to the opposite side that the data sent out has been received correctly, and it does not carry any data payload. A packet with both the “A” and “P” flags set plays two roles: it carries a data payload and also acknowledges the opposite side.

In a client–server communication model, we assume that Host A (client) communicates with Host B (server) via a TCP connection. Hosts A and B can each act as both a sender and a receiver. If a TCP packet has the “P” flag set and is sent from the client to the server, the packet is defined as a Send packet; conversely, if it is sent from the server to the client, it is defined as an Echo packet. Similarly, an Ack is a TCP packet sent from either the server or the client that has the “A” flag set. Intuitively, a Send packet is a request made by a client, and an Echo packet is a server’s response to that request. In general, a request will fit into one packet, but the response to one request may be too large to fit into one packet. A Send or an Echo packet may also act as an acknowledgment packet.
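The Send/Echo/Ack definitions above can be expressed as a small classification routine. The sketch below is a minimal Python illustration, assuming that the TCP flag byte and the direction of travel have already been extracted from a captured packet; the names `PacketKind` and `classify` are hypothetical and are not part of the authors' tooling. The masks 0x08 (PSH) and 0x10 (ACK) are the standard TCP flag bit values.

```python
from enum import Enum

# Standard TCP flag bit masks within the flags byte.
PSH = 0x08  # "P": push buffered data to the application
ACK = 0x10  # "A": acknowledgment field is valid


class PacketKind(Enum):
    SEND = "Send"    # P set, client -> server
    ECHO = "Echo"    # P set, server -> client
    ACK = "Ack"      # A set, no P (pure acknowledgment)
    OTHER = "Other"  # e.g. SYN, FIN or RST without P or A


def classify(flags: int, from_client: bool) -> PacketKind:
    """Classify a TCP packet following the definitions in Section 2.1.

    A packet with both A and P set is reported as Send or Echo, since it
    carries a payload; it additionally acts as an acknowledgment.
    """
    if flags & PSH:
        return PacketKind.SEND if from_client else PacketKind.ECHO
    if flags & ACK:
        return PacketKind.ACK
    return PacketKind.OTHER


if __name__ == "__main__":
    print(classify(PSH | ACK, from_client=True))   # a request from the client
    print(classify(PSH | ACK, from_client=False))  # a response from the server
    print(classify(ACK, from_client=True))         # a pure acknowledgment
```

Giving the “P” flag priority over the “A” flag is a design choice of this sketch: it keeps data-carrying packets in the Send and Echo streams that packet matching operates on.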

2.2. Packet Matching and RTT

A Send, also called a request packet, from the client side may trigger one or more Echo packets at the server side. Packet matching is finding the corresponding Echo packet for a given Send. The reason to study packet matching is that the time gap between two matched packets reflects the length of the connection chain between the client and the server. We call this gap the Round-Trip Time (RTT) between a client and its corresponding server. Notably, the RTT here is defined over a connection chain between the two communicating hosts, which may span multiple intermediate hosts (the stepping-stones).

If a Send packet triggers only one Echo packet, it is trivial to match the two packets and compute the RTT between them. If a Send packet triggers multiple Echo packets, the Send packet can be matched with the first Echo packet to compute the RTT. The difficulty is that when multiple Sends each result in multiple Echo packets, it is unclear whether the Sends can still be correctly matched with the Echoes. Packet matching approaches are discussed in Section 5, “Packet Cross-Matching”.
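The two simple cases above (one Send followed by one or several Echoes) can be sketched in a few lines of Python. This is an illustration only, under the assumption that packets arrive as time-ordered `(timestamp, kind)` tuples with kind `"S"` or `"E"` and that Sends do not overlap; it does not handle the harder case of interleaved Sends, and the function name is hypothetical.

```python
def first_echo_rtts(packets):
    """Match each Send with the first Echo that follows it.

    packets: iterable of (timestamp, kind) sorted by timestamp, where kind
             is "S" for a Send and "E" for an Echo.
    Returns the list of RTTs (Echo timestamp minus Send timestamp).
    """
    rtts = []
    pending_send = None  # timestamp of the earliest unanswered Send
    for timestamp, kind in packets:
        if kind == "S":
            if pending_send is None:
                pending_send = timestamp
        elif kind == "E" and pending_send is not None:
            # Only the first Echo after a Send contributes an RTT;
            # later Echoes of the same Send are ignored.
            rtts.append(timestamp - pending_send)
            pending_send = None
    return rtts


if __name__ == "__main__":
    # Timestamps in milliseconds (made-up values for illustration).
    trace = [(0, "S"), (120, "E"), (130, "E"), (1000, "S"), (1100, "E")]
    print(first_echo_rtts(trace))
```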

2.3. RTT Distribution

Matching a Send packet with its Echo packets is not trivial, especially when multiple Send packets are echoed by multiple Echo packets. Since our goal in matching TCP packets is to calculate the RTT between a matched Send and Echo, it is not necessary to match a Send packet with all its Echoes. Instead, we just compute the RTT of the first matched packets. J. Yang et al. [8] proposed an algorithm to find RTTs without directly matching the packets. This algorithm takes advantage of the RTT distribution to obtain the RTTs of TCP packets.

A packet’s RTT can be treated as the sum of four network delays: processing delay, queuing delay, transmission delay, and propagation delay [17]. Though there may be many hops/connections between a source host and its destinations, for simplicity, we can model the delay of a network connection as a queue. If we use T(t) to represent the RTT in a TCP/IP connection, T(t) can be expressed as in Equation (1):

$$T(t) = T_0 + \Delta T(t) \tag{1}$$

where $T_0$ is a constant and $\Delta T(t)$ is a variation. The constant part is primarily from propagation delay, and the varying part $\Delta T(t)$ is from network queuing delay [18]. We apply the M/M/1 queuing model to the RTT queue [19] and obtain the distribution of $\Delta T(t)$. This distribution is modeled in Equation (2):

$$P(\Delta T > x) = \lim_{t \to \infty} P(\Delta T(t) > x) = e^{-\gamma_i x}, \qquad \gamma_i = \mu_i (1 - \rho_i) \tag{2}$$

where $\mu_i$ is the service rate on link i and $\rho_i$ is the corresponding utilization factor.
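As a numerical illustration of Equation (2), the tail probability of the queuing delay on a single link can be evaluated directly. The Python sketch below assumes a stable queue (utilization strictly below 1); the function name and the sample parameter values are hypothetical.

```python
import math


def delay_tail_probability(x: float, service_rate: float, utilization: float) -> float:
    """Evaluate P(dT > x) = exp(-gamma * x), gamma = mu * (1 - rho), per Equation (2).

    x            : delay threshold (same time unit as 1 / service_rate)
    service_rate : mu_i, the service rate on link i
    utilization  : rho_i, the utilization factor, assumed to satisfy 0 <= rho < 1
    """
    if x < 0 or service_rate <= 0 or not 0 <= utilization < 1:
        raise ValueError("require x >= 0, service_rate > 0 and 0 <= utilization < 1")
    gamma = service_rate * (1.0 - utilization)
    return math.exp(-gamma * x)


if __name__ == "__main__":
    # A heavier load (larger rho) makes long queuing delays more likely.
    for rho in (0.2, 0.5, 0.8):
        print(rho, delay_tail_probability(1.0, service_rate=2.0, utilization=rho))
```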

From the results in Equation (2), we know that the variance of RTTs in a network connection can be simulated as an exponential distribution. Due to the cumulative delay of all the hosts between the client and the server, it is hard to simulate the variance of RTTs of a TCP connection chain as an exponential distribution. From the research results in [8], we found that the occurrences of $\Delta RTT$ for a TCP/IP connection chain can be modeled as a Poisson distribution [17,18]. We adopt a Poisson distribution to model the variance of RTTs in this paper.

If the variance of RTTs follows a Poisson distribution, a data mining-clustering approach can be used to match network traffic [8]. One important feature of the Poisson distribution tells us that more than 90% of the RTTs are distributed around its mean within one standard deviation distance. This can be described by the inequality $\mu - \sigma \leq RTTs \leq \mu + \sigma$. Here, $\mu$ is the mean value and $\sigma$ is the standard deviation of the Poisson distribution.
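The inequality above suggests a simple filter: keep only candidate RTTs that fall within one standard deviation of the mean. The following Python sketch illustrates that filter alone; it is not the clustering-partitioning algorithm of [8], and the function name and sample values are assumptions made for illustration.

```python
import statistics


def within_one_sigma(rtts):
    """Return the RTTs r satisfying mu - sigma <= r <= mu + sigma.

    mu and sigma are the sample mean and the population standard deviation
    of the input values.
    """
    if not rtts:
        return []
    mu = statistics.mean(rtts)
    sigma = statistics.pstdev(rtts)
    return [r for r in rtts if mu - sigma <= r <= mu + sigma]


if __name__ == "__main__":
    # Made-up RTT candidates in milliseconds; 300 is an obvious outlier.
    candidates = [100, 102, 98, 101, 99, 300]
    print(within_one_sigma(candidates))
```

In this made-up sample the outlier inflates both the mean and the standard deviation, yet it still falls outside the one-sigma band and is dropped.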

3. Chaff Attack Definition and Implementation

3.1. Chaff Definition

Chaff attacks have been widely used by intruders in recent years, and many existing SSID approaches can be defeated by them. In this section, we explain the concept of a chaff attack and how it lowers the performance of stepping-stone intrusion detection algorithms, potentially rendering them useless. The chaffing process injects packets into a live TCP/IP interactive session. It exploits packet forging or spoofing techniques to interfere with an established live TCP session. The chaff technique is commonly used in man-in-the-middle, denial-of-service, and/or stepping-stone intrusion attacks. Chaffing can be performed through the following procedure: create a raw socket; build the Ethernet, IP, and TCP headers; prepare an empty data payload for the stepping-stone intrusion; assemble the headers and data to generate a chaff packet; and send the chaff packet out through the raw socket. Some existing tools make it easy to create chaff packets, including Packit, hping, and Ettercap. In the following section, we discuss Packit in detail.

3.2. Packit

One popular tool to make and inject TCP/IP packets into a network connection is Packit [20]. Using this tool, intruders can make essentially any type of packet, including ARP, IP, TCP, UDP, ICMP, and Ethernet headers. Typically, attackers need to manually define TCP and Ethernet header values to evade SSID systems. The tool allows intruders to specify the injected traffic, including the type of packet, the total number of packets, the number of seconds to wait between each injected packet, the injection rate, the network interface to inject into, the data payload, and the length of the packets to inject. Here is an example of injecting packets into a network connection. We could insert 70 TCP/IP packets at a rate of 10 packets per second from 172.24.210.19 on port 521 to 164.27.191.25 on port 80 with the SYN and RST flags set. We assume the sequence number is 432,719,088 and the source Ethernet address is DD:44:A0:4B:8D:11. The corresponding Packit command would be “packit -s 172.24.210.19 -d 164.27.191.25 -S 521 -D 80 -F SR -q 432719088 -c 70 -b 10 -e DD:44:A0:4B:8D:11”.

3.3. Chaff Tool Developed Using C#

We developed a C# program to make and inject TCP/IP packets. We first validate the source and destination ports entered by the user. The easiest way to validate a port is to check that it lies between 1 and 65,535 inclusive.

A TCP packet can be built from the Datalink layer up to the Application layer. At the Datalink layer, the MAC (Media Access Control) address of the gateway can be obtained, so that packets can be sent outside of a LAN (Local Area Network). We also need to confirm that the MAC address of an active network interface is obtained and valid. Next, we construct the packet at the Network layer in IPv4 format. From the user’s input, it is trivial to get the source and destination IP addresses. We can compute the header checksum from the given header information. We assume no fragmentation is required. If the identification number of a packet is not given by the user, it is empty by default. The transport-layer protocol field can be set up automatically. All other fields, including “Options”, “TTL”, and “TypeOfService”, can be set up according to the user’s input. However, the TTL value must be validated: its legal value cannot be more than 255.

We set up the Transport layer after completing the Network layer. In the Transport layer, we use the user’s input to set the source and destination port numbers. The Transport layer header checksum is set up automatically. The user can choose the packet sequence and acknowledgment numbers; otherwise, the system sets the default sequence number to 100 and the default acknowledgment number to 50. The TCP packet flags, namely U, A, P, R, S, and F, are decided by user input. If no user input is detected, the available window size is set to 100 by default. Finally, we need to set up the Application layer to make the payload. In this layer, the user can decide the message and/or data to send across the network. This is the payload part of a packet. In our system, a textbox is provided for the user to enter their message. After completing all the above steps, we come to the last step, building the packet. We use the “PacketBuilder” function to assemble the above parts of the packet into a cohesive whole. We also need to select a network device to set up the interface through which packets are sent out.

In the last stage, we use the communicator we built to send the packet over the network. We use the method in the “PacketBuilder” class to build our packet, then send the packet out using the “SendPacket” method of the “PacketCommunicator” class. This concludes our discussion on how to chaff a network connection with C#.
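The validation rules and default values described in this section can be summarized in a short sketch. The Python code below is not the authors' C# tool and does not build or send any packet; it only models the user-supplied fields, the defaults stated above (sequence number 100, acknowledgment number 50, window size 100), and the port, TTL, and flag checks. The class name `ChaffPacketSpec` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ChaffPacketSpec:
    """User-supplied fields of a chaff packet, with the defaults from the text."""
    src_port: int
    dst_port: int
    ttl: int
    seq: int = 100      # default sequence number stated in the text
    ack: int = 50       # default acknowledgment number stated in the text
    window: int = 100   # default window size stated in the text
    flags: str = ""     # any combination of the letters U, A, P, R, S, F
    payload: bytes = b""

    def __post_init__(self):
        # A valid port lies between 1 and 65,535 (0 is excluded).
        for name in ("src_port", "dst_port"):
            port = getattr(self, name)
            if not 1 <= port <= 65535:
                raise ValueError(f"{name} must be in 1..65535, got {port}")
        # The TTL field is one byte wide, so it cannot exceed 255.
        if not 0 <= self.ttl <= 255:
            raise ValueError(f"ttl must be in 0..255, got {self.ttl}")
        unknown = set(self.flags) - set("UAPRSF")
        if unknown:
            raise ValueError(f"unknown TCP flags: {sorted(unknown)}")


if __name__ == "__main__":
    spec = ChaffPacketSpec(src_port=521, dst_port=80, ttl=64, flags="SR")
    print(spec)
```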

3.4. Effect of Chaff on RTT-Based Random-Walk

From Section 3.2 and Section 3.3, we know that it is not hard for attackers to chaff a network connection. A chaff attack can defeat most SSID algorithms. We examine one detection method, the ON–OFF time thumbprint [1], as an example to demonstrate how detection is defeated. We monitor a host and collect five Send and Echo packets from the incoming connection and from the outgoing connection of the host, respectively. We assume the following packet sequences are obtained: P_in = {27, 52, 61, 80, 92}, and P_out = {27, 52, 62, 81, 92}. Each value in a sequence is the timestamp of a captured packet, either a Send or an Echo. We obtain two ON–OFF thumbprint sequences: T_in = {25, 9, 19, 12} and T_out = {25, 10, 19, 11}, where “25, 19, 11, 12” represent the “OFF” times of the connections and “9, 10” represent the “ON” times. It is easy to see that the two thumbprints are closely related, which suggests that the two connections are relayed. Suppose we chaff the outgoing connection with five TCP packets using Packit, so that P_out = {27, 37, 42, 52, 57, 62, 81, 109}. Assuming a threshold of 10, the corresponding outgoing time thumbprint becomes T_out = {35 (ON), 45 (OFF)}. Clearly, after the chaffed packets are injected into the outgoing connection, it is hard to tell whether the two connections are still relayed. The chaff attack thus uses packet injection to evade an SSID system. The algorithm proposed in this paper can resist intruders’ chaff evasion attack.
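The unchaffed thumbprints in this example can be reproduced with a few lines of Python. The sketch below computes the gaps between consecutive timestamps and labels each gap, assuming (as the example suggests) that a gap of at most the threshold counts as ON and a larger gap counts as OFF. The function names are hypothetical, and this is a simplified reading of the thumbprint method in [1], not a reimplementation of it.

```python
def gaps(timestamps):
    """Return the time gaps between consecutive packet timestamps."""
    return [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def label_gaps(timestamps, threshold=10):
    """Label each gap as ON (gap <= threshold) or OFF (gap > threshold)."""
    return [(gap, "ON" if gap <= threshold else "OFF") for gap in gaps(timestamps)]


if __name__ == "__main__":
    p_in = [27, 52, 61, 80, 92]
    p_out = [27, 52, 62, 81, 92]
    print(label_gaps(p_in))   # gaps 25, 9, 19, 12
    print(label_gaps(p_out))  # gaps 25, 10, 19, 11
```

The two labeled sequences differ by at most one time unit in each position, which is why the unchaffed connections are easy to recognize as relayed.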

4. Random-Walk Model

4.1. One-Dimensional Random-Walk

A random walk can be described as a stochastic process that traces a route consisting of a succession of random steps in a mathematical space. It can describe the change of a time series variable from one period to the next. In each period, the value of the variable takes an independent random step up or down. We define the first differences as the differences from one observation to the next. If the first differences are random, the series is considered to follow a random walk. An important point for understanding the random-walk model is that the series itself is not random; the random part is the change from one period to the next.

Here is a simple example to demonstrate a random-walk process. The process starts with 0. It randomly selects the next step either −1 or 1. Then, it adds the randomly selected value to the observation from the previous time step. The process repeats the previous step until it stops.
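The process just described can be written down directly. The following Python sketch simulates it with the standard `random` module; the function name and the optional `seed` parameter (used only to make a run repeatable) are assumptions introduced for illustration.

```python
import random


def random_walk(steps, seed=None):
    """Simulate a one-dimensional random walk starting at 0.

    At each step, -1 or +1 is chosen with equal probability and added to the
    previous observation. Returns the whole path, including the start.
    """
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


if __name__ == "__main__":
    path = random_walk(20, seed=1)
    print(path)
    # The first differences are the random part of the series.
    print([b - a for a, b in zip(path, path[1:])])
```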

The above simple model is a typical example of a one-dimensional random-walk. In each step, moving left (−1) or right (+1) occurs with equal probability. A stepping-stone host has incoming connections from other hosts as well as outgoing connections to other hosts. If we monitor a host and collect all the TCP/IP network traffic to and from the host, then the difference between the number of packets from the incoming connection and the number from the outgoing connection can be modeled as a time series. This series is a random-walk process. In other words, if the difference can be modeled as a random-walk, the host is very likely being used as a stepping-stone.

The results in [8] show that random-walk can be used to detect stepping-stone intrusion. Unfortunately, the approach proposed in [8] is challenged by intruders’ chaff evasion manipulation. From the above analysis of the chaff technique, we know that intruders can theoretically insert any number of packets into a connection. This means that an approach applying random-walk to detect intrusion can be evaded if enough chaff packets are injected.

4.2. RTT-Based Random Walk

Chaffed packets are typically removed before arriving at the destination host of a connection chain. Otherwise, the chaffed meaningless packets would be received and processed by the receiver and would cause errors because they are not a correct part of the communication. Chaffed packets do not incur any Echo packets. Thus, the likelihood of a chaff packet being matched to the Echo packets triggered by normal Send packets is low. This means that chaffed packets could potentially be filtered out by packet matching. If we monitor a host being used as a stepping-stone and match the Send and Echo packets at its incoming and outgoing link, respectively, we obtain the number of matched packets N_in from the incoming link, as well as the number N_out from the outgoing link. The difference between N_in and N_out can show the behavior of a random-walk. This can resist intruders’ chaff manipulation since the chaffed packets are rarely matched. The details of applying RTT-based random-walk to detect stepping-stone intrusion can be obtained from [15].
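One way to picture the difference between N_in and N_out as a walk is to step up for every matched packet on the incoming link and step down for every matched packet on the outgoing link, in time order. The Python sketch below illustrates this reading only; it assumes the matching has already been done and that the timestamps of matched packets are available, and it is not the detection procedure of [15]. The function name is hypothetical.

```python
def difference_walk(in_times, out_times):
    """Track N_in - N_out as matched packets are observed over time.

    in_times  : timestamps of matched packets on the incoming link
    out_times : timestamps of matched packets on the outgoing link
    Returns the running difference after each observed matched packet.
    """
    events = sorted([(t, +1) for t in in_times] + [(t, -1) for t in out_times])
    walk = []
    difference = 0
    for _, step in events:
        difference += step
        walk.append(difference)
    return walk


if __name__ == "__main__":
    # Made-up timestamps: each incoming match is soon followed by an outgoing one,
    # so the difference stays close to zero.
    print(difference_walk([1, 3, 5], [2, 4, 6]))
```

Because unmatched chaff packets never enter either list, they do not move the walk, which is the intuition behind the resistance described above.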

Intruders can manipulate a connection in more ways than just chaffed Send packets. One reasonable assumption is that intruders could insert any type of packets into a connection. If Echo and Send packets were chaffed concurrently, it would further complicate SSID using RTT-based random-walk, since the number of packets matched may now be affected. To alleviate this issue, we propose a Packet Cross-Matching algorithm to detect stepping-stone intrusion.

5. Packet Cross-Matching

5.1. Packet Matching

The methods that utilize packet matching to detect stepping-stone intrusion not only have a largely reduced false positive error but also resist intruders’ evasion due to traffic manipulation. Packet matching is the process of pairing the requests over a TCP connection from the sender with the corresponding responses from the receiver. The RTT from the matched packets reflects the length of the connection chain from sender to receiver. The longer a connection chain is, the higher the probability that the stepping-stone intrusion is detected.

Packet matching techniques have been extensively studied since 2002 [7]. The easiest way to match TCP packets is by using packet sequence numbers. The payload of a TCP connection is numbered in bytes, and each TCP packet carries a sequence number in its header that gives the number of the first byte in its payload. Together with the payload length, this tells the recipient the sequence number of the last byte the packet carries. If a packet is sent with sequence number 100 and length 20, it carries 20 bytes of data numbered from 100 to 119. If the packet is correctly received, the Echo packet’s acknowledgment number will be 120, which indicates that all 20 bytes have been correctly received. Examining a Send packet’s sequence number and an Echo packet’s acknowledgment number can therefore determine whether the two packets are matched. This is only reliable in a local area network, where each request is received, processed, and echoed in time. The Internet is far more complex than a local area network for packet matching.
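The sequence-number rule above amounts to a one-line check. The Python sketch below reproduces the worked example (sequence number 100, length 20, acknowledgment number 120); the function names are hypothetical, and the modulo operation accounts for the 32-bit width of TCP sequence numbers. The check is deliberately strict and, as discussed in the text, does not cover cumulative acknowledgments or retransmissions.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence and acknowledgment numbers are 32-bit values


def expected_ack(seq, payload_len):
    """Acknowledgment number that confirms receipt of the whole payload."""
    return (seq + payload_len) % SEQ_SPACE


def is_match(send_seq, send_len, echo_ack):
    """True if the Echo's acknowledgment number acknowledges exactly this Send."""
    return echo_ack == expected_ack(send_seq, send_len)


if __name__ == "__main__":
    # Bytes 100..119 were sent, so the matching Echo acknowledges 120.
    print(expected_ack(100, 20))
    print(is_match(100, 20, 120))
```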

On the Internet, packets are matched by using RTTs rather than using packet sequence and acknowledgment numbers. The time gap between matched packets can reflect the time length of an extended connection chain. If we know a packet’s RTT, we can obtain the matched packet pair using said RTT. The Clustering-Partitioning data mining algorithm [8] is an approach that matches TCP/IP packets by taking advantage of the distribution of packets’ RTTs.

As we have discussed, the purpose of matching TCP/IP network traffic is to compute their RTTs. The RTT of a network packet can represent the length of a connection chain. Using RTTs helps us predict the length of a connection chain, and the number of hosts compromised as stepping-stones in that chain. Matching packets is not trivial, especially in the context of the Internet. There are many challenges in matching TCP/IP packets over the Internet.

Matching packets on the Internet is harder than in a local area network. The primary reason lies in some aspects of the TCP/IP protocol design. DARPA designed the TCP/IP protocol and first used it in ARPANET in the 1970s [21]. Security was not an important concern of the design at that time; most design effort focused on network communication efficiency. Understanding how TCP/IP works is crucial for understanding packet matching and its challenges. Before moving to packet matching, we first discuss the design issues of the TCP/IP protocol.

TCP/IP is a protocol suite containing many different protocols, of which TCP is the most important. TCP is a connection-oriented protocol, and it regulates how to deliver messages reliably from a source host to its destination. TCP only defines the communication rules between two nodes; each node can be either a host or a router. TCP supports many useful features to make network communication more efficient, including pipelining, the three-way handshake, cumulative acknowledgment, congestion control, and flow control. Pipelining can cause packet crossover: because a request may be sent before the acknowledgment of the previous request has arrived, a request that is still on its way may be passed by the response to the previous request. Pipelining thus yields higher network utilization but complicates packet matching. Cumulative acknowledgment also complicates packet matching, because multiple requests received by a recipient can be acknowledged and echoed in a single response packet. Another factor that affects TCP packet matching is packet resending. To implement reliable delivery, TCP checks each received packet for errors and for out-of-order arrival; if either is detected, the sender is required to resend the packet. Likewise, if the sender does not receive the acknowledgment of a request packet within a predetermined time, it resends the request automatically. When the original packet was in fact received, this resending produces duplicates of the same packet at the receiver side. All these factors make TCP packet matching a complex process.
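To see why cumulative acknowledgment breaks one-to-one matching, the following minimal sketch contrasts an exact sequence/acknowledgment comparison with the set of Sends that a single acknowledgment actually covers. The function names and the two sample packets are hypothetical, and sequence-number wrap-around is ignored for brevity.

```python
from typing import List, Tuple

Send = Tuple[int, int]  # (sequence number, payload length in bytes)


def sends_matching_exactly(sends: List[Send], ack: int) -> List[Send]:
    """Sends whose next-byte number equals the acknowledgment number."""
    return [(seq, length) for seq, length in sends if seq + length == ack]


def sends_covered_by_ack(sends: List[Send], ack: int) -> List[Send]:
    """Sends whose payload lies entirely below the acknowledgment number,
    i.e., every Send that a cumulative acknowledgment confirms."""
    return [(seq, length) for seq, length in sends if seq + length <= ack]


# Two pipelined requests answered by one response with acknowledgment 150.
sends = [(100, 20), (120, 30)]
print(sends_matching_exactly(sends, 150))  # only the second Send
print(sends_covered_by_ack(sends, 150))    # both Sends
```

The first Send never sees an acknowledgment number of 120, so an exact comparison leaves it unmatched even though it was delivered.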

Another factor that affects packet matching is packet loss. A Send, Echo, or Ack may be lost during network communication for many reasons. As we mentioned before, a lost Send can be resent. However, due to the cumulative acknowledgment feature of TCP, a lost acknowledgment or Echo packet might be ignored if the sender-side timer does not expire, whereas a lost request packet must be resent after its timer expires. Because matching TCP packets from Internet communication is so difficult, packets’ RTTs are instead estimated from the distribution of RTTs. As we discussed before, a data mining and clustering approach [8] can match Send and Echo packets in a network connection chain by making use of the RTTs’ distribution.

5.2. Packet Cross-Matching

As shown in Figure 1, host h_i is used as a stepping-stone; its incoming connection is denoted as C_in and its relayed outgoing connection as C_out. We use S_i and E_i to represent the Send and Echo packets collected from C_in, respectively. We do likewise for S_j and E_j on C_out. The connection chain from the intruder’s host to h_i is called the upstream connection chain of h_i, and the one from h_i to the victim’s host is called the downstream connection chain. We assume the host immediately prior to h_i along the upstream link is h_{i-1}, and the one immediately after h_i along the downstream link is h_{i+1}.

Traditionally, we only apply packet matching to the packets collected from either the incoming connection C_in or the outgoing connection C_out; that is, we match S_i with E_i, or S_j with E_j. Since S_i can be chaffed from the host h_{i-1}, and E_i can be chaffed from the host h_i, carefully inserted chaff packets will affect the number of RTTs and thereby defeat the RTT-based random-walk detection approach. Similarly, E_j can be chaffed from the downstream host h_{i+1}, and S_j can be chaffed from the host h_i. To deal with intruders’ chaff manipulation and make connection manipulation harder, we propose the Packet Cross-Matching algorithm.

As shown in Figure 1, the previous packet matching approaches focus on matching Send and Echo packets in either C_in or C_out; that is, matching S_i with E_i, as well as matching S_j with E_j. If either S_i/S_j or E_i/E_j are chaffed, the corresponding matching results may be affected. Packet Cross-Matching is to match the Sends in C_in with the Echoes in C_out. It also matches the Send packets in C_out with the Echo packets in C_in. Suppose Send and Echo packets are collected from the incoming and outgoing connections of a host and are put into four queues, respectively, as the following:

S_{i} = \{s_{i1}, s_{i2}, \dots, s_{in}\}

E_{i} = \{e_{i1}, e_{i2}, \dots, e_{im}\}

S_{j} = \{s_{j1}, s_{j2}, \dots, s_{jn}\}

E_{j} = \{e_{j1}, e_{j2}, \dots, e_{jm}\}

Applying Cross-Matching, the Send packets in S_i can match not only with the Echo packets in E_i, but also with the packets in E_j. Similarly, the Sends in S_j can match with the Echoes in E_i, as well as with the packets in E_j. In this paper, we elect to use the improved clustering partitioning approach proposed in [16] to match network traffic. We can obtain four RTT queues from a host monitored: RTT_{i-i}, RTT_{i-j}, RTT_{j-j}, and RTT_{j-i}.
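The cross-matching step can be sketched as follows. Note that the matcher used here is a deliberately simple greedy stand-in (pair each Send with the earliest unused Echo that follows it within a time bound); it is not the clustering-partitioning approach of [16]. The names `match_rtts`, `cross_match`, and `max_rtt`, as well as the sample timestamps, are assumptions made for illustration.

```python
from typing import Dict, List


def match_rtts(sends: List[float], echoes: List[float], max_rtt: float) -> List[float]:
    """Greedy stand-in matcher (an assumption, not the algorithm of [16]):
    pair each Send timestamp with the earliest unused Echo timestamp that is
    later than it and at most max_rtt seconds away; return the RTTs."""
    rtts: List[float] = []
    echoes = sorted(echoes)
    k = 0  # index of the next unused Echo
    for s in sorted(sends):
        # Echoes that are not later than this Send cannot answer it.
        while k < len(echoes) and echoes[k] <= s:
            k += 1
        if k < len(echoes) and echoes[k] - s <= max_rtt:
            rtts.append(echoes[k] - s)
            k += 1
    return rtts


def cross_match(s_i: List[float], e_i: List[float],
                s_j: List[float], e_j: List[float],
                max_rtt: float) -> Dict[str, List[float]]:
    """Build the four RTT queues: Sends of each connection are matched
    with the Echoes of the same connection and of the other connection."""
    return {
        "i-i": match_rtts(s_i, e_i, max_rtt),
        "i-j": match_rtts(s_i, e_j, max_rtt),
        "j-i": match_rtts(s_j, e_i, max_rtt),
        "j-j": match_rtts(s_j, e_j, max_rtt),
    }


# Hypothetical timestamps (seconds) for a relayed pair of connections.
queues = cross_match(
    s_i=[0.00, 1.00], e_i=[0.21, 1.21],
    s_j=[0.01, 1.01], e_j=[0.20, 1.20],
    max_rtt=0.5,
)
print({name: len(rtts) for name, rtts in queues.items()})
```

For an unchaffed, relayed pair of connections such as this one, all four queues end up with the same number of RTTs.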

6. Detection Algorithm

6.1. Modelling the Problem

Our research goal is to determine whether a host is used for stepping-stone intrusion from the network traffic on its incoming and outgoing connections. As we discussed, most existing research focuses on determining whether there exists a relayed incoming and outgoing connection pair. Unfortunately, chaff perturbation can easily defeat those approaches, even though some researchers simply assumed that chaff perturbation only happens in one stream (either the Send packet stream or the Echo packet stream) of a connection. With the help of manipulation tools, however, intruders can easily inject meaningless packets into any stream of any connection concurrently. Therefore, it is necessary to develop a novel approach to defeat intruders’ evasion attacks, especially chaff perturbation. In this research, we assume that intruders can only inject chaff into each host in a connection chain independently. In other words, intruders cannot coordinate chaff injections on two or more consecutive hosts in a connection chain.

As we discussed in Section 3 (Stepping-stone Chaff Attack), an intruder can inject meaningless packets, including Sends and Echoes, into a connection by specifying the source and destination IP addresses and port numbers. The injected packets can be observed only at the source host and the destination host of the injection, and not at any other stepping-stone. This can be explained via the scenario shown in Figure 2.

In Figure 2, we assume that an intruder can make a connection chain using OpenSSH from the host used by the Intruder to the host used by the Victim through the stepping-stone hosts h1, h2, h3, and h4. Host h2 is used as a sensor where a detection program can collect packets and decide if there exists an intrusion. If the intruder injects some packets at the outgoing connection of the sensor (the host h2) by specifying the source port at h2, the IP address of h2, the destination IP address of the Victim host, and port 22 used as the SSH server port, the injected packets can be captured at the sensor and Victim host, respectively, not at hosts h3 and h4. This also reminds us that injected packets cannot go along a connection chain. If injected packets could be processed and replied to, the echoed packet would return to the sensor, but not via the connection between h2 and h3. If the injected packets are matched, they may not give similar RTTs to those from the connection chain. So, the packet matching approach can easily filter out the injected packets.

Injected packets are not allowed to arrive at the Victim host since they are meaningless and cannot be executed. Most intruders inject packets from a port at a sensor and arrange for the chaff packets to be dropped immediately at the destination host. Intruders can inject packets into the packet streams S_i and E_i, as well as S_j and E_j, as shown in Figure 1. This would complicate stepping-stone intrusion detection and defeat most existing detection methods. We propose the Cross-Matching RTT-based random walk SSID algorithm to defeat this type of chaff perturbation attack.

6.2. Detection Algorithm

As shown in Figure 1, the host h_i plays the role of stepping-stone. The host incoming and outgoing connections are denoted as C_in and C_out, respectively. The packets collected from C_in can be divided into Send packet stream S_i, and Echo stream E_i. Similarly, S_j and E_j are two packet streams for the outgoing connection. The incoming and outgoing connection of a stepping-stone host should be relayed. However, intruders’ chaff manipulation may break it.

In our previous study, we matched Send and Echo packets in the same connection to detect stepping-stone intrusion. We found that this approach is easily defeated by intruders’ chaff manipulation. Our recent study has shown that cross-matching Sends and Echoes, that is, matching packets not only within the same connection but also across connections (for example, packets from the incoming connection with packets from the outgoing connection), resists intruders’ chaff attacks better than same-connection matching.

As Figure 1 shows, the host h_i is monitored, and the Send and Echo packets from its incoming connection C_in and outgoing connection C_out are collected. We put the collected Sends and Echoes into four queues: the Send queue S_i and the Echo queue E_i from C_in, and S_j and E_j from C_out.

S_i = {s_i1, s_i2, …, s_im};

E_i = {e_i1, e_i2, …, e_in};

S_j = {s_j1, s_j2, …, s_jk};

E_j = {e_j1, e_j2, …, e_jl}

We match the Sends in queue S_i with the Echoes in queues E_i and E_j, respectively, using the packet-matching algorithm proposed in [16]. This gives us matched queues RTT_ii and RTT_ij. Similarly, queue RTT_ji can be obtained from the packet matching between queues S_j and E_i, and queue RTT_jj from the packet matching between S_j and E_j. Let the numbers of elements in RTT_ii, RTT_ij, RTT_ji, and RTT_jj be N_ii, N_ij, N_ji, and N_jj, respectively. It is trivial to compute the differences between these numbers:

Δ_{ii-jj} = N_{ii} - N_{jj}

Δ_{ii-ij} = N_{ii} - N_{ij}

Δ_{ii-ji} = N_{ii} - N_{ji}

Δ_{jj-ij} = N_{jj} - N_{ij}

Δ_{jj-ji} = N_{jj} - N_{ji}

The above equations can determine not only if there exists a stepping-stone intrusion, but also if chaff attack exists, and which session is chaffed. If Δ_{ii-jj} is always bounded, and is close to zero, this indicates a successful detection of stepping-stone intrusion. If Δ_{ii-ij} is always larger than zero, it tells us that the session E_j is being chaffed from its downstream adjacent host. Similarly, if Δ_{ii-ji} is always larger than zero, we detect a stepping-stone intrusion, and know that the session S_j is chaffed. If Δ_{jj-ij} is always larger than zero, we conclude that the session S_i is chaffed. Similarly, if Δ_{jj-ji} is always larger than zero, we understand that the session E_i is chaffed.
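As a rough illustration of these rules, the sketch below computes the five differences from one set of counts and lists the streams that the rules would flag. The text requires a difference to be "always" larger than zero, that is, across repeated observations; this sketch checks a single observation only, and all names and sample counts in it are hypothetical.

```python
from typing import Dict, List


def count_differences(n_ii: int, n_ij: int, n_ji: int, n_jj: int) -> Dict[str, int]:
    """The five differences between the sizes of the four RTT queues."""
    return {
        "ii-jj": n_ii - n_jj,
        "ii-ij": n_ii - n_ij,
        "ii-ji": n_ii - n_ji,
        "jj-ij": n_jj - n_ij,
        "jj-ji": n_jj - n_ji,
    }


# Which stream each positive difference points to, as stated in the text.
CHAFFED_STREAM = {"ii-ij": "E_j", "ii-ji": "S_j", "jj-ij": "S_i", "jj-ji": "E_i"}


def suspected_chaffed_streams(deltas: Dict[str, int]) -> List[str]:
    """Streams flagged by a single observation (assumption: one sample
    stands in for the 'always larger than zero' condition)."""
    return [stream for key, stream in CHAFFED_STREAM.items() if deltas[key] > 0]


print(suspected_chaffed_streams(count_differences(105, 105, 105, 105)))
print(suspected_chaffed_streams(count_differences(110, 100, 110, 110)))
```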

Summarizing the above ideas, we arrive at an algorithm that not only detects stepping-stone intrusion but also resists intruders’ chaff perturbation attack. For convenience, we name the algorithm “Cross-Matching RTT-based Random Walk”, abbreviated as CMRW. The details of the CMRW algorithm are shown in the following (Algorithm 1).

Algorithm 1: CMRW Algorithm (Cross-Matching RTT-based Random Walk detection algorithm).

Input: Threshold α (between 0 and 1) and collected TCP packet streams: E_i, S_i, E_j, S_j
Output: Stepping-stone intrusion detected or not
Begin:
          Step 1: Call the Clustering-Partitioning packet matching algorithm [16] to match S_i with E_i, S_i with E_j, S_j with E_i, and S_j with E_j, respectively.
          Step 2: From the above packet matching, obtain the numbers of RTTs from the different packet matching pairs: N_ii, N_ij, N_ji, and N_jj.
          Step 3: Compute the differences among N_ii, N_ij, N_ji, and N_jj as follows:
                    Δ_{ii-jj} = N_{ii} - N_{jj}
                    Δ_{ii-ij} = N_{ii} - N_{ij}
                    Δ_{ii-ji} = N_{ii} - N_{ji}
                    Δ_{jj-ij} = N_{jj} - N_{ij}
                    Δ_{jj-ji} = N_{jj} - N_{ji}
          and compute the ratio α_{xx-yy} between Δ (the difference between two numbers of RTTs) and N (the number of RTTs) as follows:
                    α_{ii-jj} = Δ_{ii-jj} / ((N_{ii} + N_{jj}) / 2)
                    α_{ii-ij} = Δ_{ii-ij} / ((N_{ii} + N_{ij}) / 2)
                    α_{ii-ji} = Δ_{ii-ji} / ((N_{ii} + N_{ji}) / 2)
                    α_{jj-ij} = Δ_{jj-ij} / ((N_{jj} + N_{ij}) / 2)
                    α_{jj-ji} = Δ_{jj-ji} / ((N_{jj} + N_{ji}) / 2)
          Step 4:
          If α_{ii-jj} < α & α_{ii-ij} < α & α_{ii-ji} < α & α_{jj-ij} < α & α_{jj-ji} < α,
          Then, stepping-stone intrusion, OR
          If α_{ii-jj} > α & α_{ii-ij} < α & α_{ii-ji} < α & α_{jj-ij} < α & α_{jj-ji} < α,
          Then, stepping-stone intrusion, and possibly S_i and E_i, or S_j and E_j, are chaffed, OR
          If α_{ii-jj} > α & α_{ii-ij} < α & α_{ii-ji} < α & α_{jj-ij} < α & α_{jj-ji} > α,
          Then, stepping-stone intrusion, and possibly (S_j, S_i and E_i), or (S_i, S_j and E_j) are chaffed
End

In the above algorithm, a typical value of α in our experiments is 3%. Randomly chaffing any packet stream individually does not usually affect the number of RTTs, because of the packet matching algorithm. Our experimental results show that even chaffing both the Send and Echo streams concurrently in any connection, or chaffing all Send streams in both connections concurrently, still may not affect the number of RTTs. In either situation, the CMRW algorithm can detect stepping-stone intrusion with or without packet chaffing.
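Steps 3 and 4 of Algorithm 1 can be sketched in Python as below. This is a literal reading of the rules under stated assumptions: the ratios are compared as signed values exactly as written, a ratio is defined as 0 when both counts are 0, and any input that fires none of the three rules is reported as no detection. The function names, return strings, and sample counts are hypothetical; the default threshold of 0.03 is the typical value quoted above.

```python
from typing import Dict


def cmrw_ratios(n_ii: int, n_ij: int, n_ji: int, n_jj: int) -> Dict[str, float]:
    """Step 3: ratio of each difference to the mean of the two counts."""
    pairs = {
        "ii-jj": (n_ii, n_jj),
        "ii-ij": (n_ii, n_ij),
        "ii-ji": (n_ii, n_ji),
        "jj-ij": (n_jj, n_ij),
        "jj-ji": (n_jj, n_ji),
    }
    # Assumption: a pair of empty queues yields a ratio of 0.
    return {k: ((a - b) / ((a + b) / 2) if a + b else 0.0) for k, (a, b) in pairs.items()}


def cmrw_decide(n_ii: int, n_ij: int, n_ji: int, n_jj: int, alpha: float = 0.03) -> str:
    """Step 4: apply the three rules in the order given in Algorithm 1."""
    r = cmrw_ratios(n_ii, n_ij, n_ji, n_jj)
    below = {k: v < alpha for k, v in r.items()}
    above = {k: v > alpha for k, v in r.items()}
    if all(below.values()):
        return "intrusion"
    middle_below = below["ii-ij"] and below["ii-ji"] and below["jj-ij"]
    if above["ii-jj"] and middle_below and below["jj-ji"]:
        return "intrusion, streams of one connection possibly chaffed"
    if above["ii-jj"] and middle_below and above["jj-ji"]:
        return "intrusion, streams of both connections possibly chaffed"
    return "no intrusion detected"


print(cmrw_decide(105, 105, 105, 105))
print(cmrw_decide(110, 108, 108, 100))
print(cmrw_decide(200, 100, 100, 100))
```

With equal counts every ratio is zero and the first rule fires; in the second call only the ratio between N_ii and N_jj exceeds the threshold, so the second rule fires.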

7. Resistance Analysis to Chaff Attack

In Section 6, the algorithm CMRW to detect stepping-stone intrusion has been presented. In this section, we will analyze its resistance to chaff attacks. Before heading to our analysis, we will prove a theorem which shows the relationship between injected packets and the number of RTTs from a network connection.

Theorem 1:

If a network connection is chaffed with either Send or Echo packets, then the number of RTTs of the packets from the connection cannot be decreased compared with the un-chaffed connection.

Proof.

We collected the packets from a network connection and put them in a packet sequence, such as {…, s_i, s_{i+1}, …, e_j, e_{j+1}, …}, where s_i and s_{i+1} are assumed to be two adjacent Sends. There may be other non-Send packets between s_i and s_{i+1}. Packets e_j and e_{j+1} are two adjacent Echo packets. It is possible that there are some other types of packets in between the two Sends, but no Echoes. It is also assumed that s_i matches with e_j and s_{i+1} matches with e_{j+1}. Therefore, the RTTs between the matched Sends and Echoes are calculated as RTT_{i,j} = e_j - s_i and RTT_{i+1,j+1} = e_{j+1} - s_{i+1}. If a Send packet s' is injected in between s_i and s_{i+1}, and s' is either close to s_i or close to s_{i+1}, then by the packet matching algorithm [16] the RTTs between s' and e_j or e_{j+1} can be filtered out. Thus, in this case, the number of RTTs will remain unchanged even under chaff perturbation. Otherwise, s' cannot match with either e_j or e_{j+1}, and cannot match with any Echoes before e_j or after e_{j+1}, because if so, there will be a contradiction. This tells us that if a packet is injected between two adjacent Sends, it will not contribute to the number of RTTs. The only exception is that if an additional Echo, such as e', is injected between e_j and e_{j+1}, the two injections may or may not match each other randomly. If they happen to match each other, the pair will contribute to the number of RTTs. Otherwise, it does not affect SSID at all. However, the probability that two injected packets match each other is low. Therefore, we draw our conclusion that “injected Sends or Echoes cannot decrease the number of RTTs in a network connection”. □

Consider a relayed pair of incoming and outgoing connections at a host, which gives four streams: a Send stream and an Echo stream on each of the incoming and outgoing connections. Depending on which stream is chaffed, we have the following four cases in total.

Case 1:

Figure 3 shows the first case in which S_i' represents a chaffed stream. Send packet stream S_i' cannot be chaffed at host h_i. The chaff packets must be injected into the connection at the outgoing connection of the adjacent upstream host of h_i, and the injections must be removed from the outgoing connection of the host h_i, otherwise the two Send streams would be relayed. Based on algorithm CMRW and Theorem 1, the four differences Δ_{ii-jj}, Δ_{ii-ji}, Δ_{ij-jj}, Δ_{ij-ji} are all larger than zero.

Case 2:

Figure 4 shows the second case in which S_j' represents a chaffed stream. In this case, the chaffed packet can be injected into the outgoing connection of the host h_i and the injected packets will head to the adjacent downstream host of h_i. Based on the algorithm CMRW and Theorem 1, the four differences Δ_{jj-ii}, Δ_{jj-ij}, Δ_{ji-ii}, Δ_{ji-ij} are all larger than zero.

Case 3:

Figure 5 shows the third case in which E_i' represents a chaffed Echo stream. In this case, chaffed packets can be injected at the incoming connection of the host h_i, and the injections will go to the adjacent upstream host of h_i. Based on the algorithm CMRW and Theorem 1, the four differences Δ_{ii-jj}, Δ_{ii-ij}, Δ_{ji-jj}, Δ_{ji-ij} are all larger than zero.

Case 4:

Figure 6 shows the last case in which E_j' represents a chaffed Echo packet stream. In this case, the chaffed packets must be injected from the incoming connection of the adjacent downstream host of h_i, and the injected packets must be removed from the incoming connection of the host h_i, otherwise the two Echo streams would be relayed. Based on algorithm CMRW and Theorem 1, the four differences Δ_{jj-ji}, Δ_{jj-ii}, Δ_{ij-ii}, Δ_{ij-ji} are all larger than zero.

The above analysis demonstrates that CMRW can resist a chaff attack wherever a connection is chaffed, unlike the previous detection algorithm, which can only resist chaff attacks to a certain degree and only when the Send stream of the incoming connection is chaffed. Another important feature of the CMRW algorithm is that it can maintain a high stepping-stone detection rate even when the session is manipulated by chaff perturbation in different connections.

8. Experimental Results and Analysis

We designed an experiment to test the performance of the proposed cross-matching algorithm. In this experiment, a connection chain of 8 hosts, CCT30→AWS1→AWS2→AWS3→AWS4→AWS5→AWS6→AWS7 (7 connections), was established using SSH. We used CCT35 to record network traffic at every incoming and outgoing connection of each host from AWS1 to AWS6. CCT30 was used as the attacker’s host, and AWS7 was used as the victim. CCT30 and CCT35 are located in the computing lab of Columbus State University, GA, USA. AWS1 through AWS7 are servers from Amazon Web Services (AWS). Table 1 gives the AWS servers’ IP addresses and geographic locations.

We executed three different scripts to simulate three attackers to generate network traffic at the hosts CCT30 through AWS7. Each of the three attackers’ scripts was repeated ten times.

Attacker 1 script:

pwd

whoami

sudo su

ls

cd /etc

ls -a

scp -p shadow attacker username@attacker IP:/home/seed/Documents

exit

Attacker 2 script:

whoami

pwd

cd /home/seed/Documents

ls

nano text_file.txt

//paste a large text and save it

ls

cat hello.txt

exit

Attacker 3 script:

whoami

pwd

cd /home/seed/Documents

ls

nano hello.txt

//enter a few sentences and save the text file

ls

cat hello.txt

exit

We used the following tcpdump command to collect network traffic at the incoming connection of each sensor host ten times:

sudo tcpdump -nn -tt "(dst port 22 && dst xxx.xxx.xxx.xxx) || (src xxx.xxx.xxx.xxx && src port 22)" > AWS1_IN_AttackX_TestX.txt

The command below was used for the outgoing connection of each sensor host, also ten times:

sudo tcpdump -nn -tt "(dst port 22 && src xxx.xxx.xxx.xxx) || (dst xxx.xxx.xxx.xxx && src port 22)" > AWS1_OUT_AttackX_TestX.txt

In both cases, “xxx.xxx.xxx.xxx” referred to the hosts’ own IP address, and X referred to a test number from 1 to 10.

We only captured Send and Echo packets with the above tcpdump commands. Since the experimental results obtained from the different servers are similar, we only present the results obtained from AWS1. For each test, we obtained one file storing the Sends and Echoes collected from the incoming connection of AWS1, called AWS1-IN-TestX, and one file, AWS1-OUT-TestX, storing the Send and Echo packets collected from the outgoing connection. We divided the Send and Echo packets of each file into two separate files. So, from AWS1-IN-Test1 we obtained S-IN-Test1 containing all the Send packets and E-IN-Test1 containing all the Echo packets. Similarly, from AWS1-OUT-Test1, we obtained S-OUT-Test1 and E-OUT-Test1. We call the packet-matching algorithm [16] to match S-IN-Test1 with E-IN-Test1 (denoted as S_in_E_in), S-IN-Test1 with E-OUT-Test1 (denoted as S_in_E_o), S-OUT-Test1 with E-IN-Test1 (denoted as S_o_E_in), and S-OUT-Test1 with E-OUT-Test1 (denoted as S_o_E_o). Table 2 shows the packet matching results for 10 tests at AWS1. The number inside each parenthesis represents the number of Send packets collected from either the incoming connection or the outgoing connection of AWS1. The number before the parenthesis represents the number of matched Send and Echo pairs. The results in Table 2 show that without any manipulation, in each test, all the numbers in the four columns are equal. This tells us a stepping-stone intrusion is detected based on our proposed algorithm.
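Dividing a capture into a Send file and an Echo file could be done along the following lines. This is a sketch under assumptions, not the authors’ tooling: it assumes the usual one-line-per-packet text output of `tcpdump -nn -tt` for IPv4, treats packets addressed to port 22 as Sends and packets coming from port 22 as Echoes, and skips zero-length segments (pure acknowledgments). The regular expression, function name, and sample lines are hypothetical.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line shape, e.g.:
# 100.500000 IP 10.0.0.5.51234 > 10.0.0.9.22: Flags [P.], seq 1:37, ack 1, win 502, length 36
LINE_RE = re.compile(
    r"^(?P<ts>\d+\.\d+) IP (?P<src>\S+)\.(?P<sport>\d+) > "
    r"(?P<dst>\S+)\.(?P<dport>\d+): .*length (?P<len>\d+)"
)


def split_send_echo(lines: Iterable[str], ssh_port: int = 22) -> Tuple[List[float], List[float]]:
    """Return (Send timestamps, Echo timestamps) for one captured connection."""
    sends: List[float] = []
    echoes: List[float] = []
    for line in lines:
        m = LINE_RE.match(line)
        if not m or int(m["len"]) == 0:
            continue  # unparsable line, or a segment without payload
        ts = float(m["ts"])
        if int(m["dport"]) == ssh_port:
            sends.append(ts)   # request travelling toward the SSH server
        elif int(m["sport"]) == ssh_port:
            echoes.append(ts)  # response coming back from the SSH server
    return sends, echoes


sample = [
    "100.500000 IP 10.0.0.5.51234 > 10.0.0.9.22: Flags [P.], seq 1:37, ack 1, win 502, length 36",
    "100.750000 IP 10.0.0.9.22 > 10.0.0.5.51234: Flags [P.], seq 1:37, ack 37, win 502, length 36",
    "100.760000 IP 10.0.0.5.51234 > 10.0.0.9.22: Flags [.], ack 37, win 502, length 0",
]
print(split_send_echo(sample))
```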

The reliability of a detection algorithm in resisting intruders’ session manipulation, especially chaff-perturbation, is a key factor in measuring the algorithm’s performance. Chaff-perturbation may happen at either the incoming or outgoing connection of a host. Chaffing an outgoing connection of a host is equivalent to chaffing the incoming connection of its adjacent host. In this paper, we chaff the incoming connection of AWS1 with a chaff rate of 10%, 20%, …, up to 100%. Due to space limitations, we only show the resisting performance of the proposed algorithm under chaff rates of 10%, 50%, and 100%. Three terms used below need explanation. Chaff Send means injecting some Send packets randomly into the packets collected from a connection. Chaff Echo means injecting some Echo packets randomly into the packets collected from a connection. Chaff Send and Echo means injecting both Send and Echo packets randomly into the packet stream collected.
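Random chaff injection into a collected stream might be simulated as sketched below. Two assumptions are made that the text does not spell out: the chaff rate is taken to be the number of injected packets divided by the number of original packets, and chaff timestamps are drawn uniformly between the first and last original timestamp. The function name and the fixed seed are illustrative only.

```python
import random
from typing import List


def inject_chaff(timestamps: List[float], chaff_rate: float, seed: int = 0) -> List[float]:
    """Return the stream with round(chaff_rate * n) extra timestamps mixed in,
    sorted by time. A packet stream is represented by its timestamps only."""
    if not timestamps:
        return []
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n_chaff = round(chaff_rate * len(timestamps))
    first, last = min(timestamps), max(timestamps)
    chaff = [rng.uniform(first, last) for _ in range(n_chaff)]
    return sorted(timestamps + chaff)


original = [float(t) for t in range(10)]  # ten packets, one per second
for rate in (0.1, 0.5, 1.0):
    print(rate, len(inject_chaff(original, rate)))
```

Applying the function to the Send timestamps, the Echo timestamps, or both corresponds to Chaff Send, Chaff Echo, and Chaff Send and Echo, respectively.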

As shown in Table 2, we use Test01 as an example to explain Send packet insertion. The first column of Test01 shows that 106 Send packets were collected, and 105 of them were matched with corresponding Echo packets collected from the same connection. The second, third, and fourth columns of Test01 show the same packet matching results but with different packet matching combinations. The second column shows the matching results between the Send packets from the incoming connection of AWS1 and the Echo packets from the outgoing connection. The third and fourth columns show the matching results between the Send packets from the outgoing connection and the Echo packets from the incoming and outgoing connections, respectively. We first chaff Send packets into the incoming connection of AWS1 with chaff rates of 10%, 20%, 30%, …, up to 100%. If we chaff the Send packets in the incoming connection with a 10% chaff rate, it affects the matching rate for the first and second columns of Test01. The more Send packets are chaffed, the more severely the packet matching rate is affected. This means chaff-perturbation may potentially overturn the detection algorithm. Due to space limits, Table 3 only shows the packet cross-matching results under chaff rates of 10%, 50%, and 100%. It clearly shows that even at a 100% chaff rate, a stepping-stone intrusion can be detected using the cross-matching and random-walk model. We also found that the first and second columns share the same matching rate regardless of the chaff rate. A similar conclusion can be drawn for the third and fourth columns.

We chaffed Echo packets into the incoming connection of AWS1 with chaff rates of 10%, 20%, 30%, …, up to 100%. The packet cross-matching results with Echo chaff rates of 10%, 50%, and 100% are shown in Table 4. Even though the packet matching rates are not equal across the four columns, the first and third columns remain the same as each other, as do the second and fourth columns. The results in Table 4 tell us that even under 100% chaff-perturbation manipulation, the proposed algorithm is still able to detect stepping-stone intrusion. It can resist intruders’ chaff-perturbation of 100%.

We chaffed both Send and Echo packets into the incoming connection of AWS1 with chaff rates of 10%, 20%, 30%, …, up to 100%. Table 5 shows the packet cross-matching results with both Send and Echo chaff rates of 10%, 50%, and 100%. The packet matching results show that even under severe chaff-perturbation, such as 100% for both Send and Echo packets, the difference in the packet matching rate between the first and third columns follows random-walk behavior, and the matching rates of the second and fourth columns remain the same with three exceptions (see the results indicated in red), where they still stay very close. The results show that the proposed algorithm can resist intruders’ chaff-perturbation up to 100% even when both the Send and Echo streams are chaffed.

9. Conclusions and Future Work

In this paper, we propose a novel approach to detecting stepping-stone intrusion, as well as resisting intruders’ chaff manipulation, using network traffic cross-matching and an RTT-based random walk. To the best of our knowledge, this is the first time network packet cross-matching has been proposed, and we apply both packet cross-matching and random walk to detect stepping-stone intrusion and resist intruders’ chaff perturbation. The basic idea of applying packet cross-matching to resist intruders’ chaff-perturbation attack is that injected Send or Echo packets may affect the packet matching rate of either the incoming or outgoing connection of a host, yet the difference in the packet matching rate between the incoming and outgoing connections still follows random-walk behavior. This feature can be used not only to detect stepping-stone intrusion but also to resist intruders’ chaff perturbation evasion attack. The proposed algorithm was validated using the network traffic collected from a TCP connection chain spanning 8 hosts, including 1 local computer and 7 Amazon AWS cloud servers. The experimental results show that a stepping-stone intrusion can be detected even when an incoming connection is chaffed by Send chaff, Echo chaff, or both. The experimental results also show that the proposed algorithm can resist intruders’ chaff-perturbation evasion up to a 100% chaff rate. Future work will focus on using other types of packets, such as Acknowledgement packets, ICMP packets, and UDP packets, to detect stepping-stone intrusion and resist intruders’ session manipulation. Another direction is to test whether the algorithm proposed here can still resist intruders’ chaff manipulation if an intruder coordinates chaff on two or more hosts in a connection chain.

Author Contributions

J.Y.: conceptualization, methodology, writing, project administration, and funding acquisition; L.W.: validation, formal analysis, investigation, supervision, and funding acquisition; M.Q.: software, validation, resources, and formal analysis; N.N.: data curation, software, and validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Security Agency NCAE-C research grant H98230-20-1-0293 with Columbus State University, Georgia, USA.

Data Availability Statement

The network traffic packets captured for this project can be found from the shared Drive: https://drive.google.com/drive/folders/1KvTRBPthVGEf5sk1HJiftSI0uThYi1Dp. Accessed on 1 September 2022.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhang, Y.; Paxson, V. Detecting Stepping-Stones. In Proceedings of the 9th USENIX Security Symposium, Denver, CO, USA, 14–17 August 2000; pp. 67–81.
2. Chen, S.S.; Heberlein, L.T. Holding Intruders Accountable on the Internet. In Proceedings of the IEEE Symposium on Security and Privacy, Oakland, CA, USA, 8–10 May 1995; pp. 39–49.
3. Yoda, K.; Etoh, H. Finding Connection Chain for Tracing Intruders. In Proceedings of the 6th European Symposium on Research in Computer Security, Toulouse, France, 4–6 October 2000; Volume 1985, pp. 31–42.
4. Donoho, D.L.; Flesia, A.; Shankar, U.; Paxson, V.; Coit, J.; Staniford, S. Detecting Pairs of Jittered Interactive Streams by Exploiting Maximum Tolerable Delay. In Proceedings of the 5th International Symposium on Recent Advances in Intrusion Detection, Zurich, Switzerland, 16–18 October 2002; pp. 45–59.
5. Blum, A.; Song, D.; Venkataraman, S. Detection of Interactive Stepping-Stones: Algorithms and Confidence Bounds. In Proceedings of the International Symposium on Recent Advance in Intrusion Detection, Sophia Antipolis, France, 15–17 September 2004; pp. 20–35.
6. He, T.; Tong, L. Detecting encrypted stepping-stone connections. IEEE Trans. Signal Process. 2007, 55, 1612–1623.
7. Yung, K.H. Detecting Long Connecting Chains of Interactive Terminal Sessions. In Proceedings of the International Symposium on Recent Advance in Intrusion Detection, Zurich, Switzerland, 16–18 October 2002; pp. 1–16.
8. Yang, J.; Huang, S.-H.S. Mining TCP/IP Packets to Detect Stepping-Stone Intrusion. J. Comput. Secur. 2007, 26, 479–484.
9. Wang, L.; Yang, J.; Xu, X.; Wan, P.-J. Mining Network Traffic with the k-Means Clustering Algorithm for Stepping-stone Intrusion Detection. Wirel. Commun. Mob. Comput. 2021, 2021, 1–9.
10. Wang, L.; Yang, J.; Lee, A. An Effective Approach for Stepping-Stone Intrusion Detection Using Packet Crossover. In Proceedings of the 23rd World Conference on Information Security Applications (WISA2022), Jeju Island, Republic of Korea, 24–26 August 2022; to be published.
11. Wang, L.; Yang, J.; Workman, M.; Wan, P.-J. A Framework to Test Resistance of Detection Algorithms for Stepping-Stone Intrusion on Time-Jittering Manipulation. Wirel. Commun. Mob. Comput. 2021, 2021, 1–8.
12. Yang, J.; Wang, L.; Shakya, S. Modelling Network Traffic and Exploiting Encrypted Packets to Detect Stepping-stone Intrusions. J. Internet Serv. Inf. Secur. 2022, 12, 2–25.
13. Neundorfer, N.; Yang, J.; Wang, L. Modelling Network Traffic via Identifying Encrypted Packets to Detect Stepping-stone Intrusion under the Framework of Heterogonous Packet Encryption. In Proceedings of the 36th International Conference on Advanced Information Networking and Applications, Sydney, Australia, 13–15 April 2022; Volume 450, pp. 516–527.
14. Yang, J. Resistance to Chaff Attack through TCP/IP Packet Cross-Matching and RTT-based Random Walk. In Proceedings of the 30th IEEE International Conference on Advanced Information Networking and Applications, Crans-Montana, Switzerland, 23–25 March 2016; pp. 784–789.
15. Yang, J.; Zhang, Y. RTT-based Random Walk Approach to Detect Stepping-Stone Intrusion. In Proceedings of the 29th IEEE International Conference on Advanced Information Networking and Applications, Gwangju, Republic of Korea, 24–27 March 2015; pp. 558–563.
16. Yang, J.; Wang, L. Applying MMD Data Mining to Match Network Traffic for Stepping-Stone Intrusion Detection. Sensors 2021, 21, 7464.
17. Li, Q.; Mills, D.L. On the Long-range Dependence of Packet Round-trip Delays in Internet. In Proceedings of International Conference on Communications (ICC’98), Atlanta, GA, USA, 7–11 June 1998; Volume 1, pp. 1185–1192.
18. Paxson, V.; Floyd, S. Wide-area Traffic: The Failure of Poisson Modeling. IEEE/ACM Trans. Netw. 1995, 3, 226–244.
19. Kao, E. An Introduction to Stochastic Processes; Duxbury Press: New York, NY, USA, 1996; pp. 47–87.
20. Packet Capturing. Available online: http://linux.die.net/man/8/packit (accessed on 18 August 2015).
21. The History of TCP/IP. Available online: https://www.techtarget.com/searchnetworking/definition/TCP-IP (accessed on 2 December 2018).

Figure 1. A stepping-stone host.

Figure 2. A long connection chain with Intruder, Stepping-stones and Victim.

Figure 3. Incoming Send stream is chaffed.

Figure 4. Outgoing Send stream is chaffed.

Figure 5. Incoming Echo stream is chaffed.

Figure 6. Outgoing Echo stream is chaffed.

Table 1. AWS servers’ IP and geographic location.

Server Name	OS	Public IP Address	Private IP Address	Geographic Location
AWS 1	Ubuntu	34.239.127.118	172.31.86.245	Virginia, USA
AWS 2	Ubuntu	52.59.96.142	172.31.29.198	Frankfurt, Germany
AWS 3	Ubuntu	18.133.230.26	172.31.39.118	London, UK
AWS 4	Ubuntu	52.60.79.102	173.31.12.163	Vancouver, Canada
AWS 5	Ubuntu	52.194.232.92	172.31.0.70	Tokyo, Japan
AWS 6	Ubuntu	34.248.116.10	172.31.32.141	Dublin, Ireland
AWS 7	Ubuntu	52.67.153.198	172.31.13.98	São Paulo, Brazil

Table 2. Packet cross-matching results at AWS1.

Test	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o
Test01	105(106)	105(106)	105(106)	105(106)
Test02	112(113)	112(113)	112(113)	112(113)
Test03	111(113)	112(113)	111(113)	112(113)
Test04	118(132)	118(132)	118(132)	118(132)
Test05	114(116)	114(116)	114(115)	114(115)
Test06	107(108)	107(108)	107(108)	107(108)
Test07	111(114)	111(114)	111(114)	111(114)
Test08	116(119)	116(119)	116(119)	116(119)
Test09	122(126)	123(126)	122(125)	123(125)
Test10	122(179)	122(179)	122(179)	122(179)

Table 3. Packet cross-matching results with Send chaffed rate 10%, 50%, and 100% at AWS1.

	Chaff Send 10%				Chaff Send 50%				Chaff Send 100%
Test	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o
Test01	104(116)	104(116)	105(106)	105(106)	105(138)	105(138)	105(106)	105(106)	105(213)	105(213)	105(106)	105(106)
Test02	112(124)	112(124)	112(113)	112(113)	112(147)	112(147)	112(113)	112(113)	113(227)	113(227)	112(113)	112(113)
Test03	111(124)	112(124)	111(113)	112(113)	111(147)	112(147)	111(113)	112(113)	111(227)	112(227)	111(113)	112(113)
Test04	118(145)	118(145)	118(132)	118(132)	118(171)	118(171)	118(132)	118(132)	118(265)	118(265)	118(132)	118(132)
Test05	114(127)	114(127)	114(115)	114(115)	110(151)	110(151)	114(115)	114(115)	115(233)	115(233)	114(115)	114(115)
Test06	108(118)	108(118)	107(108)	107(108)	107(140)	107(140)	107(108)	107(108)	107(217)	107(217)	107(108)	107(108)
Test07	111(125)	111(125)	111(114)	111(114)	112(148)	112(148)	111(114)	111(114)	111(229)	111(229)	111(114)	111(114)
Test08	116(131)	116(131)	116(119)	116(119)	116(155)	116(155)	116(119)	116(119)	116(239)	116(239)	116(119)	116(119)
Test09	122(138)	123(138)	122(125)	123(125)	123(164)	123(164)	122(125)	123(125)	125(253)	125(253)	122(125)	123(125)
Test10	122(197)	122(197)	122(179)	122(179)	122(233)	122(233)	122(179)	122(179)	122(359)	123(359)	122(179)	122(179)

Table 4. Packet cross-matching results with Echo chaffed rate 10%, 50%, and 100% at AWS1.

	Chaff Echo 10%				Chaff Echo 50%				Chaff Echo 100%
Test	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o
Test01	101(106)	105(106)	101(106)	105(106)	82(106)	105(106)	82(106)	105(106)	64(106)	105(106)	64(106)	105(106)
Test02	111(113)	112(113)	111(113)	112(113)	94(113)	112(113)	94(113)	112(113)	71(113)	112(113)	71(113)	112(113)
Test03	109(113)	112(113)	109(113)	112(113)	88(113)	112(113)	88(113)	112(113)	72(113)	112(113)	72(113)	112(113)
Test04	112(132)	118(132)	112(132)	118(132)	87(132)	118(132)	87(132)	118(132)	74(132)	118(132)	73(132)	118(132)
Test05	104(116)	114(116)	104(115)	114(115)	85(116)	114(116)	85(115)	114(115)	67(116)	114(116)	67(115)	114(115)
Test06	105(108)	107(108)	105(108)	107(108)	81(107)	107(108)	81(107)	107(108)	58(105)	107(108)	58(105)	107(108)
Test07	108(114)	111(114)	108(114)	111(114)	81(114)	111(114)	81(114)	111(114)	66(114)	111(114)	66(114)	111(114)
Test08	109(119)	116(119)	109(119)	116(119)	82(119)	116(119)	82(119)	116(119)	62(119)	116(119)	62(119)	116(119)
Test09	120(125)	123(126)	120(124)	123(125)	93(121)	123(126)	93(120)	123(125)	71(121)	123(126)	72(120)	123(125)
Test10	119(179)	122(197)	119(179)	122(179)	96(179)	122(179)	96(179)	122(179)	73(179)	122(179)	73(179)	122(179)

Table 5. Packet cross-matching results with both Send and Echo chaffed rate 10%, 50%, and 100% at AWS1.

	Chaff Both Send and Echo 10%				Chaff Both Send and Echo 50%				Chaff Both Send and Echo 100%
Test	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o	S_in_E_in	S_in_E_o	S_o_E_in	S_o_E_o
Test01	100(116)	105(116)	101(106)	105(106)	86(159)	105(159)	82(106)	105(106)	77(213)	105(213)	64(106)	105(106)
Test02	111(124)	112(124)	111(113)	112(113)	97(170)	112(170)	94(113)	112(113)	91(227)	113(227)	71(113)	112(113)
Test03	109(124)	112(124)	111(113)	112(113)	91(170)	112(170)	88(113)	112(113)	89(227)	112(227)	72(113)	112(113)
Test04	112(145)	118(145)	118(132)	118(132)	91(198)	118(198)	87(132)	118(132)	83(265)	118(265)	73(132)	118(132)
Test05	104(127)	114(127)	114(115)	114(115)	91(174)	114(174)	85(115)	114(115)	78(233)	113(233)	67(115)	114(115)
Test06	105(118)	108(118)	105(108)	107(108)	85(160)	107(160)	81(107)	107(108)	63(211)	107(217)	58(105)	107(108)
Test07	108(125)	111(125)	108(114)	111(114)	85(171)	111(171)	81(114)	111(114)	74(229)	111(229)	66(114)	111(114)
Test08	109(131)	116(131)	109(119)	116(119)	87(179)	116(179)	82(119)	116(119)	76(239)	116(239)	62(119)	116(119)
Test09	120(137)	122(138)	120(124)	123(125)	101(180)	124(189)	93(120)	123(125)	90(244)	125(253)	72(120)	123(125)
Test10	119(197)	122(197)	119(179)	122(179)	99(269)	122(269)	96(179)	122(179)	81(359)	122(359)	73(179)	122(179)

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
