1. Introduction
The Internet is structured as an interconnection of smaller networks owned by different entities (academic, governmental, commercial, or other). These networks are called Autonomous Systems (AS) and are identified by their peers through a unique AS number (ASN). An AS owns a set of IP addresses that can be assigned to users or network equipment. Each AS handles its internal communications and has one or more gateways linked to adjacent ASes. According to the report produced by the Number Resource Organization (NRO), the number of allocated ASNs in December 2021 was around 116,900 [1]. To handle such an amount of potential communication, two main protocols are in use: the Internet Protocol (IP) for the data plane and the Border Gateway Protocol (BGP) for the control plane. BGP's task is to make sure every router knows how to forward incoming packets. In a nutshell, BGP allows ASes to construct routing tables by advertising the set of IP addresses they own and spreading those advertisements along with the ordered list of ASes traversed so far. Hence, every AS can keep its routing table up to date. The issue with this procedure is that it fully relies on trust: there is no security feature preventing an AS from advertising a set of IP addresses that it does not truly own. Such a misleading advertisement (whether intentional or accidental) can lead to modified routing tables and, when performed on purpose, is known as a prefix hijacking attack. Incidents of this kind have occurred frequently during the last two decades. For instance, in February 2008, YouTube became unreachable for two hours after Pakistan Telecom falsely claimed to be the best route for reaching it [2]. Another striking breakdown happened in April 2010, when China Telecom advertised wrong traffic routes: for approximately 20 minutes, a significant share of the Internet's traffic adopted those routes, including some traffic of the US government, military sites, and commercial sites such as Yahoo! and IBM [3]. More recently, in June 2019, an AS owned by the company “Safe Host” mishandled a router update and started to advertise wrong routes. China Telecom echoed those claims, which ultimately led to a substantial amount of traffic going through China Telecom before reaching its intended destination. This situation lasted about two hours [4].
Those incidents had an important and very noticeable impact, hence they were quickly detected. However, a smaller-scale relay may be established by a stealthy attacker over the long term, which could allow man-in-the-middle attacks and metadata gathering, or could globally impact the network in terms of latency or private data leaks. A noticeable example of this stealthier kind of attack is presented in [5], where the authors show that hijacking Bitcoin messages in order to delay block propagation can entail financial losses. Aside from those practical issues, a successful international relay attack would also damage the geopolitical reputation of the victim's country.
Since July 1994, when BGP-4 was first described [6], many proposals have tried to enhance its security [7,8,9,10]. All these contributions aimed to strengthen BGP by making it possible to authenticate and authorize BGP updates between ASes. The Internet Engineering Task Force (IETF) initiated the BGPSec standardization project [11] based on Secure-BGP [7]. The key idea is to use the RPKI public-key infrastructure to certify AS signatures. Hence, a BGPSec update contains the reachable set of IP addresses along with the list of all the ASes that received the update, where each participating AS signs its own pre-path. Other attempts, using techniques for anomaly detection, localization, and mitigation, can be found with different levels of efficiency [12,13,14], exploring strategies such as the creation of alternative routes or the analysis of hijacked BGP route announcements. According to [15], these attempts target only specific subproblems and do not provide complete detection.
On a more global scale, Path Aware Networking (PAN) is emerging as a novel way of thinking about routing architectures, allowing more accurate knowledge of the path traveled by data [16,17,18,19,20,21]. An important goal of those architectures is to achieve precise and, above all, trustworthy tracking of the routers traversed by a packet. While a complete redesign of the Internet routing architecture might well be the perfect long-term solution, there is still a long way to go before a worldwide suitable design makes it to standardization.
Our Contribution.
- We approach the detection of relay attacks using time measurement. To the best of our knowledge, this has never been done before in the context of Internet communications.
- We analyze the time stability between two communicating nodes by running intercontinental experiments over 5 months.
- We propose ICRP: a two-party cryptographic protocol performing simultaneously the sending of messages, the measurement of the timings, the authentication of the receiver, and the decision about the legitimacy of the route. The decision process uses a so-called decision function, taking as input a sample of measures captured on the fly. The function checks whether the sample matches the “expected behavior” between the nodes and outputs a Boolean (1 if the sample is suspicious, i.e., the traffic might be hijacked, 0 otherwise).
- We implement a prototype to test the performance of our protocol for large amounts of exchanged data (up to 200 MB).
The remainder of the paper is structured as follows. In Section 2, we analyze time stability over Internet communications and show that this stability is achieved for UDP communications between terminals in different locations, even for intercontinental exchanges. In Section 3, we introduce the function deciding whether a given exchange is suspicious. This decision function outputs a Boolean (1 for suspicious and 0 otherwise). It takes as inputs a freshly collected sample and a so-called “reference sample”. The reference sample represents the “expected behavior” between the nodes and is constructed during a learning phase prior to the first execution of the protocol. We then test the efficiency of the decision function by observing the false positive and false negative rates over a large set of both genuine communications and relay simulations. In Section 4, we first describe Distance-Bounding protocols. They are used in short-range contactless communications for two-party authentication [22,23,24,25,26]. Those protocols achieve authentication while ensuring an upper bound on the distance separating the parties, which greatly complicates relay attacks. Secondly, we present our protocol ICRP, which translates the idea of Distance-Bounding into the context of Internet communications. In Section 5, we describe our prototype implementation and evaluate the overhead induced by our solution in terms of latency, computational complexity, and packet size. We believe our approach to be innovative and realistic for practical applications, and so, in Section 6, we present an illustrative example.
2. Internet Latency
The protocol we introduce in this paper strongly relies on Internet latency and its stability over time. We consequently describe, in this section, the experiments we performed to measure this latency and evaluate how much it is impacted by a traffic hijacking attack.
2.1. Time Measurement
We distinguish two methods for measuring the transit time between two machines. The One Way Transit Time (OWTT) represents the time measured between the sending of a packet and its arrival at the destination. This approach attempts to capture the real time separating two endpoints, but it demands precise clock synchronization between those points, as well as sending the timestamp along with the packet.
The Round Trip Time (RTT) measures the time between the sending of a packet and the reception of a response. As this is a one-sided measure, there is no need for clock synchronization. The approximation OWTT ≈ RTT/2 is often made, but there is no guarantee that the transit times in both directions are comparable. It is therefore preferable to consider RTT as a stand-alone metric rather than a way to estimate OWTT.
In this paper, we adopt the RTT metric. Using the OWTT metric would force one of the parties of our protocol to send its timestamp data to the other party for a travel time to be computed. This has at least two clear downsides: (1) it raises the overall quantity of data to be sent, and (2) it may open a breach for an attack aiming to falsify the measures.
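To illustrate the measurement procedure, here is a minimal Python sketch of an RTT measurement campaign over UDP. It assumes the remote party echoes every packet back; the address, port, and sample size are placeholders rather than the exact tooling we used.

```python
import socket
import time

RECEIVER = ("203.0.113.10", 9000)  # placeholder address and port

def measure_rtts(count=7000, size=512, timeout=2.0):
    """Send `count` UDP packets and record one RTT (in ms) per echoed response."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    payload = b"\x00" * size
    rtts = []
    for _ in range(count):
        start = time.perf_counter()          # monotonic, high-resolution clock
        sock.sendto(payload, RECEIVER)
        try:
            sock.recvfrom(size)
            rtts.append((time.perf_counter() - start) * 1000.0)
        except socket.timeout:
            pass                             # lost packet: no RTT recorded
    return rtts
```

Since RTT is a one-sided measure, a monotonic local clock suffices; no clock synchronization with the remote node is required.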
2.2. Experimental Setup
We measured RTTs for UDP traffic between two parties, the Sender and the Receiver, sometimes relayed by the Attacker. We define below the key points of our experiments.
Locations. We use four nodes located in different countries for our experiments:
- France
- Germany
- Poland
- USA (Oregon)
Hijacked traffic. We ran our experiments on the Internet, hence we had no control over the route between the Sender and the Receiver. For this reason, we simulated the presence of a relay by sending the packets directly from the Sender to the Attacker and then from the Attacker to the Receiver (a minimal forwarder sketch is given at the end of this subsection).
Packet size. The impact of packet length on RTT is very weak for realistic variations [27]. Hence, we arbitrarily chose to use 512-byte packets across all our experiments.
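For completeness, the relay simulation mentioned above reduces to a straightforward UDP forwarder. The sketch below is illustrative (addresses are placeholders): the Attacker node simply retransmits each datagram toward the real Receiver and each response back toward the Sender.

```python
import socket

LISTEN = ("0.0.0.0", 9000)            # attacker's listening endpoint
RECEIVER = ("203.0.113.10", 9000)     # placeholder: the real receiver

def relay():
    """Forward each datagram from the sender to the receiver and back."""
    front = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    front.bind(LISTEN)
    back = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    while True:
        packet, sender = front.recvfrom(65535)
        back.sendto(packet, RECEIVER)   # relay towards the receiver
        response, _ = back.recvfrom(65535)
        front.sendto(response, sender)  # relay the response back
```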
2.3. RTT Measurements without Adversary
2.3.1. Stability over a Short Period
We present the results of short-period (i.e., a few minutes) experiments in Figure 1, Figure 2 and Figure 3. Figure 1 shows 6 graphs, in which each “+” represents the value of one RTT in milliseconds (readable on the y-axis). Each graph is a plot of 7000 RTTs between two endpoints collected in a row. The dates and times of the start and end of the measurements are given on each individual graph. Figure 2 and Figure 3 display the same samples on a more zoomed-in scale, along with their statistical distribution.
Regarding the stability of the measurements, the distributions show that the majority of the measures are concentrated in one dense interval. Depending on the sources and destinations, the samples take different shapes. Noticeably, the samples from Germany-Oregon and France-Oregon seem to be formed of several layers. When this is the case, one layer always stands out from the others: for the sample gathered from France to Oregon, 93% of its measures lie in the dominant layer, and for Germany to Oregon, 81% do.
2.3.2. Stability over a Long Period
In this paper, we decided to check the RTT stability over long periods (a few months) for two main reasons. First, we want to validate our time-based approach, given that this kind of method has never been used for Internet relay detection before. Second, as our decision function uses a reference sample to test fresh samples, we need to know whether this reference remains representative over time or should be regularly updated.
In this section, we display long-term measurements. During a full month, we gathered 1000 RTTs per hour between two nodes (Poland and Oregon) to observe the overall evolution.
Figure 4 shows that long-term stability is achieved over this period. However, comparing this very large sample with older measures in Figure 5 also shows a slight shift of about 3 ms. Going further in this analysis, we examined several samples collected between early September and mid-January. Figure 6 shows two graphs: the top one displays the means in milliseconds of those samples, with the days on which they were collected readable on the x-axis along with their respective sizes (in parentheses); the bottom one shows the same data on a more zoomed-in scale. Figure 6 shows that the stability of the measures may drift on the order of a few milliseconds. The same result is noticeable for samples between Germany and Oregon. Those variations remain small in comparison with the impact caused by a relay on the path (see Section 2.4).
2.4. RTT Measurements with Adversary
Figure 7 shows the impact of a relay on the Round Trip Time for exchanges between the node in Poland and the one in Oregon. We display an alternation of standard communications and relayed communications going through the node in France.
It appears that the relay has a drastic impact on the measured time: the RTTs increase by more than 150 ms. For this specific route, the impact caused by the relay is more than enough to efficiently distinguish a genuine route from a relayed one.
The impact of a relay may be caused by many factors, such as the number of traversed routers, the location of the attacker, their proximity to a genuine route, their control over some network equipment, and so forth. This means that there exist one or more optimal setups lowering the impact of a relay to a minimum. With that in mind, we chose to define our decision process so that its detection sensitivity can be adjusted. By doing so, we give users a dynamic capacity to face adversaries even in very efficient setups; see Section 3.
3. Decision Function
We define a decision function, noted D. This function takes as input a fresh sample S_fresh of size n and returns a bit: 0 (i.e., accepted) or 1 (i.e., rejected). D uses a parameter called S_ref, which is a trusted sample of RTTs. The sample S_ref can be seen as a fingerprint of the expected timing behavior between two nodes and, as seen in Section 2, this reference sample is not subject to great changes over long periods of time.
We tested D on numerous samples, some of which come from genuine communications between the Sender and the Receiver, and the rest from a relay simulation where the Sender sends its packets to an intermediary node that relays them to the Receiver.
3.1. Definitions
In this section, we define the keywords, concepts and ideas that will be used throughout this paper.
3.1.1. Reference Sample
The reference sample S_ref consists of a large set of measures gathered in advance, during a learning phase performed between the Sender and the Receiver. It represents the standard values we can expect when measuring RTTs between them. It is worth noting that the learning phase should take place when there is no ongoing attack, that is, when the route taken by the packets during the measurements has not been altered by a malicious party.
The reference sample should be updated when the genuine RTTs deviate from their reference due, for example, to modifications in the network topology. The experiments presented in Section 2, Figure 5, show that such a modification may occur, but it does not cause a drastic change in the measures in comparison with the impact of a relay.
In environments where RTTs are not stable, one can consider performing dynamic updates of the reference sample to improve the reliability of the protocol. For example, any new valid execution of the protocol provides 256 fresh RTTs that can be concatenated to S_ref while the 256 oldest ones are removed from it. Automatic updates should be monitored, though, as they may allow poisoning attacks on the reference sample.
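As an illustration, such a sliding-window update could be implemented as follows; the window size and the gating on accepted sessions are our assumptions based on the description above.

```python
from collections import deque

class ReferenceSample:
    """Fixed-size pool of trusted RTTs, refreshed after valid sessions."""

    def __init__(self, initial_rtts, window=10_000):
        # A bounded deque drops the oldest entries automatically.
        self.rtts = deque(initial_rtts, maxlen=window)

    def update(self, fresh_rtts, session_accepted):
        # Only valid (accepted) executions may refresh the reference;
        # otherwise an attacker could slowly poison it.
        if session_accepted:
            self.rtts.extend(fresh_rtts)
```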
3.1.2. Terminology
D outputs a binary response: 0 if the tested sample is considered genuine, 1 otherwise. Throughout Section 3.2, we challenge D with genuine and relayed samples and analyze its efficiency using the following terminology: a false positive is a genuine sample rejected by D (output 1), while a false negative is a relayed sample accepted by D (output 0).
3.2. Description and Efficiency
Given the stability of the samples, we chose to use a positional decision process. Our decision function selects a threshold t depending on the reference sample S_ref it uses. This threshold is a time limit that at most a given proportion p of the fresh sample S_fresh is allowed to exceed. Typically, the threshold should therefore be around the (1 − p)-th percentile of S_ref. The decision function accepts S_fresh if this upper bound on the proportion is fulfilled, and rejects it otherwise (see Algorithm 1).
Algorithm 1: Decision function. Input: a fresh sample S_fresh, the reference sample S_ref, a proportion p. Output: 0 (accepted) or 1 (rejected).
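Below is a minimal Python sketch of this decision process, consistent with Algorithm 1; the percentile-based threshold derivation is our reading of the description above, not the verbatim prototype code.

```python
def decide(fresh_sample, reference_sample, p=0.05):
    """Return 1 (suspicious) if more than a proportion p of the fresh
    sample exceeds a threshold taken near the (1 - p)-th percentile of
    the reference sample, and 0 (genuine) otherwise."""
    ref = sorted(reference_sample)
    idx = min(int((1 - p) * len(ref)), len(ref) - 1)
    t = ref[idx]                       # threshold derived from the reference
    above = sum(1 for rtt in fresh_sample if rtt > t)
    return 1 if above > p * len(fresh_sample) else 0
```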
3.2.1. False Negatives and False Positives
We show in Figure 8 (respectively, Figure 9) the false positive and false negative rates we obtained in relation to the threshold value for communications between Germany and Oregon (respectively, Poland and Oregon). Those tests were performed on more than 500 samples gathered over several months. As we saw in Section 2.4, the measurements of our hijacking simulation created such a time gap that this decision function is strong enough to achieve absolute detection; Figure 8 and Figure 9 both highlight this by exhibiting a very large interval of possible threshold values leading to neither false positives nor false negatives.
Note that those graphs may change depending on the capacity and positional setup of the intermediary. Indeed, it is expected that the number of additional routing devices visited during the relay is highly related to its efficiency. This would be reflected by a false negative rate growing closer to 1 for lower threshold values. Choosing a suitable threshold then becomes a matter of estimating how efficient an attacker can get.
3.2.2. Choosing the Threshold
As stated in Section 2.4, the efficiency of an attack may depend on many factors, such as the attacker's connection speed, the current network topology, and probably others. To detect an attacker disposing of an optimal relay setup, we should set the decision function to the highest sensitivity that can be supported. This is achieved by letting the threshold be as low as possible while keeping some breathing room to also avoid most false positives. From Figure 9, we see that the minimum threshold value producing no false positives in our tests lies around 192 ms. Nevertheless, we observed in Section 2.3.2 that small variations in the samples might emerge over long periods of time (Figure 5 and Figure 6). Hence, users can choose to slightly relax the sensitivity of the decision function with a higher threshold. Allowing the threshold to sit about 5 to 10 ms higher than normally expected trades the assurance of very few false positives against the possibility of missing an attack, assuming that such an efficient relay is achievable between those nodes.
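In code, this relaxation amounts to adding a safety margin on top of the percentile-derived threshold; the margin value below is illustrative.

```python
def relaxed_threshold(reference_sample, p=0.05, margin_ms=7.5):
    """Percentile-based threshold plus a margin absorbing long-term drift."""
    ref = sorted(reference_sample)
    t = ref[min(int((1 - p) * len(ref)), len(ref) - 1)]
    return t + margin_ms
```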
5. Performance
In this section, we evaluate the performance of ICRP. First, we give an overview of our prototype implementation, explaining the problem with the sequential representation given in Figure 11 and Figure 12. Then, we analyze the results of this implementation with respect to three main points: (1) the computational cost of the cryptographic operations (hashing, signature, verification, and the decision function); (2) the throughput in comparison with the direct sending of UDP packets; and (3) the data overhead added compared to a classical sending.
5.1. Prototype Implementation
Our proposal, as displayed in Figure 11 and Figure 12, has a very clear downside: its sequentiality. Indeed, with this representation, each time the Sender sends a packet, it has to wait for a response before sending the next one. This is especially problematic when ICRP runs in active mode and aims to send many consecutive packets. Hence, the Sender needs to perform the sending and the reception of the Receiver's responses concurrently. Similarly, if the Sender and the Receiver need to run multiple consecutive sessions in active mode, the verification and authentication part of the protocol must not be realized sequentially, as it would force the Sender to wait until the end of a session to start a new one.
Figure 14 schematically shows the differences in efficiency between three simplified implementation models for the Sender's side: the top one is the sequential implementation, the middle one uses 2 concurrent threads for the sending and the reception of acknowledgments, and the bottom one uses 3 concurrent threads for sending, receiving, and verifying. The dotted lines represent a repeated operation, while the solid lines represent inactive periods of time for the current thread.
Our prototype is implemented according to the bottom model of Figure 14. The third part, handling verification, is separated into three threads for synchronization purposes. The other party, the Receiver, is also implemented concurrently, with one thread handling the reception of packets and the sending of responses, and a second thread performing the cryptographic computations. We provide below a description of each thread's actions.
On the Sender's side:
Thread 1: in charge of sending all the packets to the Receiver and generating a timestamp for each departure. It stores the timestamps in a structure shared by all threads.
Thread 2: in charge of receiving every response from the Receiver, generating a timestamp upon each arrival, and computing the RTTs from the timestamps placed in the shared structure.
Thread 3: in charge of updating the hash context with the values known beforehand by the Sender, that is, the content of the packets and the random bits it sends.
Thread 4: in charge of updating the hash context with the values received from the Receiver, that is, the response bits.
Thread 5: in charge of receiving the Receiver's signature; it waits for all the data it needs to be available from the other threads, then checks the signature and applies the decision function to the RTTs of the current session.
On the Receiver's side:
Thread 1: in charge of receiving the packets and sending the responses to the Sender.
Thread 2: in charge of updating the hash context with the values known beforehand by the Receiver, that is, its response bits. It then waits for all the data it needs to be available from the other thread (the content of the packets and the random bits sent by the Sender), proceeds to Hash&Sign, and finally sends the signature to the Sender.
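To make the shared-structure mechanism concrete, here is a simplified Python sketch of the Sender's first two threads; the endpoint, the round count, and the data layout are placeholders, and the real prototype uses five threads as described above.

```python
import socket
import struct
import threading
import time

RECEIVER = ("203.0.113.10", 9000)   # placeholder endpoint
N_ROUNDS = 256                       # illustrative round count

sent_at = {}                         # shared: (session, round) -> send time
rtts = {}                            # shared: (session, round) -> RTT in ms
lock = threading.Lock()

def send_thread(sock, session, payloads):
    """Thread 1: send every packet of the session, timestamping departures."""
    for rnd, payload in enumerate(payloads):
        header = struct.pack("!HH", session, rnd)   # 2B session + 2B round
        with lock:
            sent_at[(session, rnd)] = time.perf_counter()
        sock.sendto(header + payload, RECEIVER)

def recv_thread(sock):
    """Thread 2: receive responses, timestamp arrivals, compute RTTs."""
    for _ in range(N_ROUNDS):
        data, _ = sock.recvfrom(65535)
        now = time.perf_counter()
        session, rnd = struct.unpack("!HH", data[:4])
        with lock:
            rtts[(session, rnd)] = (now - sent_at[(session, rnd)]) * 1000.0

# Example usage for one session:
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
payloads = [b"\x00" * 500 for _ in range(N_ROUNDS)]
t1 = threading.Thread(target=send_thread, args=(sock, 0, payloads))
t2 = threading.Thread(target=recv_thread, args=(sock,))
t2.start(); t1.start(); t1.join(); t2.join()
```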
Using multiple threads to boost performance forces the parties to tag each packet with a sequence number, in order to link each message and response to the correct round and session and avoid confusion in the time measurements. Consequently, we added a 4-byte header indicating the sequence number. These 4 bytes consist of 2 bytes indicating the current session number followed by 2 bytes indicating the current round number. Writing the round and session indexes on 2 bytes each is good enough for our experimental needs while being easy to implement in our prototype. In a real-case scenario, though, the number of rounds n of an ICRP session should not be greater than 512, because a relay detection can only occur once the n rounds are over. If n is at most 512, its associated field in the 4-byte header can be limited to only 9 bits, leaving 23 bits for the session index. We chose a 4-byte header as it is also the size of the SQN field of the TCP header, which serves the same purpose of keeping sessions synchronized between the nodes. This TCP field wraps around once it reaches its maximum value of 2^32 − 1, which is high enough to ensure that no two packets with the same SQN transit at the same time.
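For illustration, the alternative layout mentioned above (23 bits for the session index, 9 bits for the round index) fits in the same 4-byte header, as the following hypothetical packing shows; it is not the layout used by our prototype.

```python
import struct

def pack_header(session, rnd):
    """23-bit session index and 9-bit round index in a 4-byte header."""
    assert 0 <= session < 2**23 and 0 <= rnd < 2**9
    return struct.pack("!I", (session << 9) | rnd)

def unpack_header(header):
    value, = struct.unpack("!I", header)
    return value >> 9, value & 0x1FF   # (session, round)
```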
5.2. Analysis
Our protocol has three main parameters:
k: the number of sessions to execute.
n: the number of rounds per session. It also defines the size of a collected sample and the number of packets sent during one session. We assume n to be constant from one session to the next.
p: the size of each packet in bytes. We also assume that p remains constant over rounds and sessions.
We call a (k, n, p)-sending the sending, with our prototype, of k · n · p bytes of data through k sessions of n rounds with a constant packet size of p bytes.
5.2.1. Complexity of the Computations
We leave the choice of the cryptographic primitives to users, as we believe they are interchangeable in our protocol. For our experiments, we arbitrarily chose SHA-256 as the hash function, together with a standard signature algorithm. Those choices are voluntarily poor performance-wise; however, they allow us to give an upper complexity bound.
For each session, the Receiver (respectively, the Sender) performs Hash&Sign (respectively, Hash&Verify) over the packets and the random bits exchanged in both directions. This amounts to n(8p + 2) bits of data to be hashed. SHA-256 is based on the Merkle–Damgård construction, meaning that the message to hash is split into blocks of identical size which are processed by a compression function. Hence, its complexity is linear in the number of blocks involved. SHA-256 uses 512-bit blocks, so for each session, the hashing cost, counted in compression function calls, is given by ⌈n(8p + 2)/512⌉ (up to padding).
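For example, with n = 256 rounds and p = 500-byte packets (the packet size used in our throughput tests), a session hashes 256 × (8 × 500 + 2) = 1,024,512 bits, i.e., ⌈1,024,512/512⌉ = 2001 compression function calls, plus one or two extra calls for padding.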
Table 1 shows the number of compression function calls and the corresponding hashing time depending on n, for a fixed value of p.
Regarding the signature and the verification, the input value is always a 256-bit string, so the time taken remains constant for both operations.
Finally, the Sender runs the decision function on the sample. This process is linear in the size n of the sample, as it goes through the table of RTTs and increments a counter every time the considered value is higher than the chosen threshold. Note that n should not be too large, because the verification is performed only after every n packets sent; hence, a high n leaves a wider amount of data to be relayed before detection. We believe moderate values such as n = 256 to be the most suitable choices. These values being very small, we can consider the cost of the decision to be negligible.
Note that the slower the overall verification process is, the later a suspicious sample will be detected. However, as the authentication and verification are performed concurrently with the other processes, those times do not impact the throughput.
5.2.2. Throughput
In this section, we test the impact of the parameters n, p, and k on the sending time of large blocks of data. We then compare those times with the throughput obtained by sending raw UDP packets unsupervised by ICRP.
Impact of Parameters n and k
Table 2 shows the times in seconds involved in a (k, n, p)-sending of 20 Megabytes (respectively, 100 Megabytes) of data using three possible values of the pair (k, n). The packet size p remains constant at 500 bytes for those tests. For this configuration, we see that the number of sessions k and of rounds n has no visible impact on the overall sending time.
Impact of Packet Size p
As stated in [27], the packet size has a low impact on the sending time of a single packet. This means that the more data contained in every single packet, the faster the sending of the overall message (made of multiple packets) will be. Figure 15 shows the evolution of the time needed to send 100 Megabytes of data depending on the size of the individual packets. We observe that this time decreases for realistic values of p. The maximum size of UDP packets is implicitly specified in the official IETF documentation, RFC 768 [32]: the UDP header contains a 16-bit field called “Length” representing the length in bytes of the packet (header included), so the theoretical maximum size of a UDP packet is 65,535 bytes. In practice, however, most services (for instance DNS) restrict the packet length to 512 bytes in order to respect the Maximum Transmission Unit (MTU) on the Internet and avoid frequent packet loss.
Comparison with a Direct Sending of UDP Packets
To see how ICRP performs, we compared the times involved in sending a given amount of data between two fixed nodes, using our prototype implementation on the one hand and, on the other hand, a direct sending involving no time measurements, authentication, or acknowledgments. As Figure 15 demonstrated, the size of individual packets has an impact on the global sending time of a file; hence, the two methods should send packets of comparable sizes. Table 3 displays the measures for the overall sending of 10, 40, 100, and 200 Megabytes of data using the two methods with a constant packet size of 500 bytes, and compares the obtained throughputs.
The average throughput for direct sending is slightly higher than the one obtained with ICRP. The measures were performed between a personal computer based in Caen, France, and a server supplied by AWS (Amazon Web Services). The slight loss in performance is due to the fact that ICRP has to handle multiple threads concurrently on both the Sender's and the Receiver's side, which is obviously not the case for direct sending. This implies that the processing capabilities of the endpoints have an impact on the throughput. This impact remains low, though: in this experiment, the sending machine was a personal laptop with modest processing capabilities, and the throughput loss remained small.
5.2.3. Volume
We quantify the volume of overhead data added by a (k, n, p)-sending in Active Mode in comparison with the amount of raw information transmitted (k · n · p bytes).
Each message (respectively, response) is marked with a random bit. This gives 2kn additional bits of information traveling through the network.
Each message (respectively, response) is complemented with a sequence number encoded in a 4-byte header. This adds another 64kn bits of data.
During the verification part of our prototype, a signature is sent for each session, with an additional 2-byte tag indicating the current session number. Writing |σ| for the bit size of a signature, this adds another k(|σ| + 16) bits.
Overall, the total overhead of our protocol is 66kn + k(|σ| + 16) bits. The proportion of additional data traveling through the network is therefore (66n + |σ| + 16)/(8np).
This proportion is unrelated to the number of sessions k.
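As an illustrative computation (assuming, for the sake of the example, a 512-bit signature): with n = 256 and p = 500, the proportion is (66 × 256 + 512 + 16)/(8 × 256 × 500) = 17,424/1,024,000 ≈ 1.7%.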
Table 4 displays the overhead proportion for a few practical values of n and p.
It appears that increasing the values of n and p lowers the overhead proportion but, as stated before, it is preferable to keep the number of rounds n below 512; otherwise, the decision process would be applied too rarely. It is also advisable to restrict the size p of individual packets to avoid too frequent packet loss.