Article

Address Obfuscation to Protect against Hardware Trojans in Network-on-Chips

by Thomas Mountford 1,*, Abhijitt Dhavlle 2, Andrew Tevebaugh 1, Naseef Mansoor 3, Sai Manoj Pudukotai Dinakarrao 2 and Amlan Ganguly 1,*
1 Computer Engineering, Rochester Institute of Technology, Rochester, NY 14623, USA
2 Electrical and Computer Engineering, George Mason University, Fairfax, VA 22030, USA
3 Computer Information Science, Minnesota State University, Mankato, MN 56001, USA
* Authors to whom correspondence should be addressed.
J. Low Power Electron. Appl. 2023, 13(3), 50; https://doi.org/10.3390/jlpea13030050
Submission received: 20 June 2023 / Revised: 23 August 2023 / Accepted: 24 August 2023 / Published: 6 September 2023

Abstract

In modern computing, which relies on the interconnection of networks used in many/multi-core systems, any system can be critically subverted if the interconnection is compromised. This can be done in a multitude of ways, but the threat of a hardware Trojan (HT) being injected into a system is particularly prevalent due to the increase in third-party manufacturers for system-on-chip (SoC) designs. With a local injection of an HT in an SoC, an adversary can gain access to information about applications running on the system by revealing specific communications of the SoC, and the network-on-chip (NoC) as a whole. This heavily compromises the system and gives information to the attacker, which can lead to more tailored, compromising attacks. In this paper, we demonstrate an HT that exploits communication patterns inside an SoC to reveal applications that are running on an NoC with multi/many-core processors. This is performed by leaking packet counts, after which the attacker then uses machine learning techniques to identify applications running on processors, and the SoC as a whole. We also propose a LUT-based obfuscation technique to limit the information available to the hardware Trojan. Our results indicate that this obfuscation method can reduce the accuracy of this attack from 99% to <8% in multi/many-core systems.

1. Introduction

The demand for computing has driven the modern digital world to adopt powerful platforms, such as many/multi-core chips or blade servers running multiple processors. These parallel platforms need interconnection networks, such as a network-on-chip (NoC), that connect processing elements together [1]. The NoC is a communication subsystem responsible for sending information between the modules in the system, which typically are processing elements. The vital role that the NoC performs as the communication fabric, as well as its large surface area, makes it one of the most vulnerable elements in the modern computer system. These modern computing systems require many components, which pushes many chip vendors to become fabless. Becoming fabless means that many parts of the chip are contracted out and not manufactured in the vendor's own facilities. This can reduce operating costs as well as the time to market and design costs. In addition, modern system-on-chips (SoCs) use many third-party IPs (3PIPs), which can be obtained from untrustworthy vendors. An attacker could introduce malicious logic, such as a hardware Trojan (HT) [2], at the foundry or at the third-party design house. An HT has a variety of nefarious uses, such as information leakage, functionality subversion, battery exhaustion, and system delay [2,3,4]. Embedding an HT in something as vital as the NoC can reveal a multitude of information to an attacker. In this work, the HT gathers packet counts and destinations through infected switches. This information can be used to reveal the application suites running on the system, compromising the user profile. These applications represent specific processes that are often used to benchmark NoC and processor designs. These benchmarks provide a detailed view of what processes could be running on an infected system. 
This information can further benefit the attacker by enabling more tailored attacks suited for the specific system running an obscure process. This not only affects the infected processor but all systems which have the compromised processor as a subsystem. For instance, if a military camera system has an HT deployed, an attacker could clearly see that the processor is running facial tracking software. This not only tells the attacker what the chip is being used for but can also give that attacker the whereabouts of other, non-infected camera systems. One infected switch subverted the military backbone and led to a compromise in national security [5]. In this paper, we introduce a lightweight NoC-based HT that looks at the flits traversing through a switch. The HT increments a counter corresponding to the destination of each packet. This can be performed with multiple HTs in different switches, improving the accuracy of the attack. After a certain number of clock cycles, the HT packetizes the counts and sends them to an external attacker. The attacker uses data analysis techniques to determine the benchmark applications running on the system, compromising the user's security. This is done using a machine learning (ML) algorithm, specifically an artificial neural network (ANN), that can differentiate between application suites. The detection gains accuracy with more HTs embedded in the system, but with only two HTs, we demonstrate 97% accuracy using these ML techniques. To defend against this HT, we demonstrate LUT-based routing that obfuscates the header in each packet. This obfuscation prevents the attacker from finding the destination and the path length of each packet. It requires each switch in the NoC to use a LUT-based routing method that is implemented during the design phase of the NoC. A switch is responsible for routing the packets throughout a system. Switches are typically connected to each processing element and module. 
This can include the cores of the system, each cache level, memory, and I/O ports of the system. Typically, the header of the packet contains the destination address, which enables the router inside each switch to route the packet in the correct direction [1]. The LUT obfuscation method does not operate this way. Instead, each packet has its own path calculated in the network interface (NI) before routing. The route is then written to the header in the form of instructions for each switch to follow. Each switch reads its instruction, then shifts the header so that the next instruction is in line. This reduces the ML attack to a random guess as to the application running on the system. This LUT-based routing also allows a variety of routing methods to be used to further improve the security of the NoC and the SoC as a whole.

2. Related Work

Numerous researchers have explored the development of HTs within network-on-chip (NoC) components, such as routers, links, and network interfaces (NIs). These HTs possess the capability to intercept or manipulate data within the NoC, leading to various security risks such as denial-of-service (DoS) attacks, data snooping, and performance degradation. The detection of such HTs after the chip has been manufactured presents significant challenges, given the intricate design and lengthy manufacturing processes involved in modern multi/many processor system-on-chips. This significantly inhibits detection methods, such as physical inspection [6], functional testing [7], and side-channel analysis [8]. As a result, the majority of the existing research concentrates on devising solutions to mitigate the impact caused by these HTs rather than detecting and eliminating them.
The authors in [9] introduced a method to address HTs within the NoC routers. Their proposal involves implementing a bit shuffling-based encoding mechanism inside the routers to prevent HTs from diverting packets away from their intended destination cores, thereby mitigating performance degradation attacks. In another study [10], the authors countered snooping attacks aimed at extracting sensitive information through HTs embedded in the NIs. They incorporated encoding modules utilizing algebraic manipulation detection (AMD) and cyclic redundancy checks (CRC) as safeguards against such attacks.
In [11], authors presented a novel design of an HT embedded in the NIs capable of duplicating packets and launching a denial-of-service (DoS) attack. To address this issue, a multi-layer approach consisting of several components was proposed. Firstly, they introduced an encoding-based snooping invalidation module (SIM) to detect duplicate packets. Additionally, they proposed a low-power data-snooping detection circuit called THANOS, which uses threshold voltage degradation for detection. Lastly, a malicious application blacklisting mechanism was implemented to stop these attacks. However, it should be noted that these encoding-based solutions require additional hardware. This leads to increased communication overhead due to the presence of encoding–decoding modules in the data path. Furthermore, it adds complexity to the design of the NoC.
As a result of these challenges, a Trojan detection mechanism and a Trojan-aware routing algorithm were put forward in [12]. Their aim was to address the impact and mitigation of a misrouting hardware Trojan (HT) capable of launching a DoS attack by header flit manipulation. While these works concentrated on the design of HTs capable of executing DoS or performance degradation attacks, there is a limited amount of research that explores the combined effect of HTs and malicious applications working together to launch an attack on multi/many processor system-on-chips (MPSoCs).
In [13], the authors presented an attacker model that involves a hardware Trojan (HT) implanted in the NoC routers along with an accomplice application operating on the MPSoC. When triggered, the HT enables the application to issue commands to the compromised NoC component, thereby facilitating various attacks, such as snooping and denial-of-service (DoS). To counter such attacks, the authors proposed a layered security architecture consisting of a data scrambling layer, packet certification layer, and node obfuscation layer. However, the one-time pad XOR cipher employed by the data scrambler and packet certification layer can be compromised by accumulating a sufficient number of encrypted packets. Additionally, this work failed to adequately address snooping-based HTs.
In this work, for the first time, we demonstrate that a remote attacker, utilizing machine learning-based data processing techniques, can accurately determine the application profile in a multi/many processor system-on-chip (MPSoC) by analyzing the payload of a lightweight HT, which essentially functions as a counter. The reason behind the attacker’s ability to derive the application profile lies in the use of dimension-ordered routing (DOR) in NoC architectures. DOR exhibits a strong correlation between packet traversal frequencies at specific switches within the NoC and the applications being executed. The attacker can leverage this correlation to achieve high accuracy in profiling applications. Secure routing mechanisms, such as region-based routing (RBR) and segment-based routing (SBR), proposed in [14], are insufficient in mitigating the impact of such HTs. Although these routing mechanisms partition the NoC into different security zones and minimize inter-zone traffic, they prove ineffective against the proposed attacker model. Moreover, these mechanisms are deterministic in nature, further compromising their effectiveness.
In contrast, the work in [15] introduced an NoC routing algorithm that employs a combination of west-first and adaptive XY routing. This approach aims to enhance NoC security by providing additional routing paths and reducing interference with potential attackers. However, partially adaptive routing algorithms like this one are susceptible to attacks that exploit the available routing paths. Priority-based routing mechanisms, such as noninterference-based adaptive (NIBR) routing proposed in [16], based on DOR, are not completely impervious to attacks that exploit packet counts within NoC routers. Therefore, to effectively protect an NoC against such attackers, a randomized routing mechanism becomes crucial. However, implementing a fully randomized routing approach can severely degrade the performance of an NoC since the packets may not be routed along the shortest paths. In [17], a scheduling change is used along with time-division multiplexing to prevent side-channel attacks in the NoC. The authors also implement source-destination path randomization to further improve security. The authors in [18] implement a detection algorithm that detects changes in traffic communication to identify hostile ICs in the NoC. In [19], the authors construct and counter three types of Trojan attacks: packet duplication, application blocking, and misrouting. A detection algorithm was constructed in [20] to identify HTs inside NoCs. Ref. [21] presents a Trojan that significantly delays the NoC with a minimal footprint.

3. Threat Model

Our threat model involves a system containing multiple users or tenants that use multi-core processors as a processing engine, interconnected with an NoC. With the prevalence of third-party manufacturers, the NoC is often produced by a different organization than the one that made the design. This allows malicious HTs to be inserted by these other organizations during the fabrication process. These HTs are often simple in functionality and small compared to the NoC as a whole. In this threat model, an HT counts the destinations of packets going through it, then sends the counts to a core running an accomplice application on the same chip. The accomplice application sends its own address to the HT so that the HT can echo the payload back to it. On receiving the payload data from the HT, the accomplice application sends the payload data to an external attacker. Hence, the HT does not have to know the address of the external attacker. A similar hardware-software based attack model was discussed in [13].
The attacker can then take the offloaded data and perform a traffic analysis attack using large-scale computing. This moves most of the computation out of the NoC and allows the HT to remain undetected. This can lead to the attackers gaining knowledge of the applications running on the infected system, which compromises user privacy. These applications can be gathered by the attacker from a variety of CPU and GPU benchmark suites.
These benchmark applications fall into many categories that can correspond to unique targets that this HT can identify. By gathering applications outlined in the benchmarks used or from application suites of interest, an attacker can clearly identify applications from the financial, science, math, animation, and physics fields. An attacker can even simulate more applications for a wider range of identification. This has many security implications that could compromise national security. For instance, if the HT discussed is inserted in any device for military or other purposes, the attacker could easily narrow down what specific applications the device was running. If the device was, for example, a camera monitoring system, one would expect the HT to see a large amount of body-tracking and some face sim applications running. This not only lets the attacker know the use of the device and potentially the location of the IP, but also gives insight into the software running on the system, which can allow a more complicated attack to be tailored to the specific device. Figure 1 shows the attack model discussed here. The functionality and design of the HT are described next.

4. Attack Model

In this section, we discuss the design of the HT, the trigger mechanism, and the off-chip HT payload analysis mechanism to determine the application profile of multi/many core processors.

4.1. Hardware Trojan (HT) Design

The HT is inserted inside the routing block of the NoC or interconnection switch. Each HT contains 16-bit counters that can count the number of the packets addressed to every destination NoC switch over a fixed time window. Hence, the total number of counters in the HT is dependent on the number of NoC switches. When a header flit arrives at the infected router, the HT reads the destination field from the header flit and increments the value of the corresponding counter, tracking the number of packets destined to that particular NoC switch. After counting the number of packets destined for all the NoC switches for a fixed time window, the HT packetizes these counts as an HT payload. Then, the HT transmits this payload to a core in the multi/many core system running an accomplice application. The fixed time period is denoted as an observation window in this paper. The HT also consists of another 16-bit down counter that functions as the timer to track this observation window. Once the observation window timer expires, the HT resets the counters for the next observation window and creates the HT payload. Although, in this HT design, we used concrete values for the counters, all these counter sizes are configurable. However, the size of the individual counters in the HT is bounded by the size of the observation window timer. This is because each packet consisting of multiple flits takes several clock cycles to be routed through a switch [22]. Thus, the maximum packet count will be less than the duration of the observation window. This particular HT does not impact the data path of the legitimate packets getting routed in the NoC, as it is not sequential to the routing logic, and the counting happens in parallel to the routing. Therefore, the timing analysis cannot detect the HT(s). In our attack model, we assume one or more NoC switches in the system to be infected with the HT. 
While research has proposed the design of HTs to count the number of packets, such as the one in [23], reading the destination is a new design concept. The HT design proposed in [23] uses one counter per switch and tallies the total packets through the infected switch. Due to this, a higher number of HTs is required to determine the application profile with high accuracy. On the other hand, the destination-reading HT partitions this total packet count by destination, which provides another dimension of information, so that fewer HTs are needed to determine the application profile with high accuracy. This is shown in Section VI-B. Moreover, a few such counters of these moderate sizes are also undetectable in large multi/many-core processors in terms of both area overhead and power consumption, as even a single NoC switch with a size of 30–40 K gates [22] is orders of magnitude more complex than the HT. The attacker has the option of reducing the size of the counters to, say, 8 bits and increasing the frequency of packets sent from the HT. This balance can be decided by the attacker to better circumvent the current detection standards.
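The counting behaviour described in this subsection can be illustrated with a small behavioural model. This is only a sketch in software, not the hardware design: the class and method names are hypothetical, the 80-switch and 5000-cycle defaults are taken from the evaluation setup, and saturating the counters at their maximum value is an assumption, since the text does not state the overflow behaviour.

```python
class TrojanCounterModel:
    """Behavioural sketch of the destination-counting HT (illustrative names)."""

    def __init__(self, num_switches=80, window_cycles=5000, counter_bits=16):
        self.num_switches = num_switches
        self.window_cycles = window_cycles
        self.max_count = (1 << counter_bits) - 1
        self.counts = [0] * num_switches   # one counter per destination switch
        self.timer = window_cycles         # down counter for the observation window

    def observe_header(self, dest):
        """Called when a header flit passes; dest is the destination switch index."""
        # Assumption: counters saturate instead of wrapping around.
        self.counts[dest] = min(self.counts[dest] + 1, self.max_count)

    def tick(self):
        """Advance one clock cycle; return the payload when the window expires."""
        self.timer -= 1
        if self.timer > 0:
            return None
        payload = list(self.counts)             # packetize the counts
        self.counts = [0] * self.num_switches   # reset for the next window
        self.timer = self.window_cycles
        return payload
```

Because `observe_header` never touches the packet itself, the model reflects the point made above that the counting happens in parallel to the routing logic.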

4.2. Hardware Trojan Trigger Design

We envision that the proposed HT lies parallel to the data path of the packet transfer mechanism over the NoC and is always on. Such an HT is difficult to detect through timing analysis, as it does not affect the latency of the packets. Furthermore, due to the nature of the payload as described below, the area and power footprint of the HT is negligible compared to that of the NoC or the entire processor.
As an alternative to the more common “always on” Trojan, a conditional trigger can be designed, such as a counter trigger or combination input-based trigger [2]. While these triggers can be beneficial, we do not use them for two reasons. The first is the hardware complexity that the conditional trigger brings. While the conditional trigger method can reduce the long-term power consumption of the Trojan, the power draw while the Trojan is on increases. The increased power draw as well as the area overhead may lead to detection. The second is that our proposed HT must continuously monitor traffic in order to gather data in the system as described in Section 4.1. This is the main reason why a conditional trigger is not beneficial and could actually lead to worse performance. Since our HT is always triggered, the probability-of-detection analysis framework [24] is not applicable here.

4.3. Off-Chip Hardware Trojan Payload Analysis

To demonstrate the threat that such an HT can pose, an HT payload analyzer is constructed using ML techniques. In this work, an artificial neural network (ANN) model is trained to serve as this payload analyzer.
There are two advantages of using the ANN-based classifiers for analyzing the HT payload to predict the application profile. Firstly, ANNs are capable of mapping complex patterns efficiently, as well as being resilient to noise in the data. Secondly, for a similar type of counter-based HT proposed in [23], the authors showed that the scaling complexity for such ANNs is much lower compared to other machine learning classifiers like SVM, DT, and KNN when predicting multiple applications from the HT payload data. This is because multiple applications can be predicted by a single ANN by using softmax activation at the output layer.
The data set used to train the ANN contains 80 features with 13 outputs. The 80 features are the 80 switches used in a 64-core system with 16 memory modules. The 13 outputs are the applications that could be running on the system, based on a collection of 13 application kernels from the PARSEC benchmark suite. These data are created using a simulator, the same way that an attacker could. In this work, a simulator is used that can monitor the movement of flits as described in Section 6.1. In this simulator, various benchmarks are run from various suites that contain common uses for a multi-core SoC. These traces are then used to make a data set with information that the HT would be able to gather. Since traffic patterns differ due to architecture, routing algorithms, and traffic, the patterns will not be constant for the different applications. To simulate this characteristic, random noise is added to the packet counts. This noise follows a negative binomial distribution with an M/M/1 queuing model [25]. This is intended to make the ANN much more robust to variations in architecture, routing algorithms, and traffic. To supplement this, the ANN is trained on all 80 features (switch data) available, which yields much better application prediction results. The caveat to doing this is that it is not feasible for a system to have an HT in every switch. This would have a massive area, power, and traffic overhead that would greatly increase the chance of detection. To reflect the real-life attack scenario, switches are selected to be infected based on their (negative) correlation to each other. This gives a list of optimal switch locations that would give unique data based on the application, which leads to a better ANN. In addition, this correlation-based selection minimizes the footprint in the affected system.
For traffic analysis of the payload data from the HT capable of collecting packet counts for each destination separately, a 5-layer ANN is deployed that has a variable node input (N), corresponding to the number of infected switches in the system. Since each infected switch has payload data corresponding to all switches as the Trojan gathers packet counts directed toward all destinations, N becomes the number of infected switches multiplied by the number of switches in the system. This brings the final neuron layout to N-800-500-200-64-13 neurons. The hidden layers have ReLU as an activation function, and a softmax is used as the output layer function. This is shown in Table 1. The output has thirteen neurons, corresponding to the thirteen potential applications that the system could be running. The computation time, overhead, and power draw are not applicable to the ANN because all of the computations and analysis are deployed off of the infected system. Common metrics to measure the performance of the ANN are used, such as accuracy, precision, recall, and F1-score.
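To make the N-800-500-200-64-13 layout concrete, the following sketch builds the layer sizes for a given number of infected switches and runs a forward pass with ReLU hidden layers and a softmax output. It is written in plain Python for readability rather than speed; the function names are hypothetical, the weights are randomly initialized (He-style scaling is an assumption) and untrained, and no training loop is shown.

```python
import math
import random


def layer_sizes(num_infected, num_switches=80, num_apps=13):
    """Input width N = infected switches x switches in the system."""
    return [num_infected * num_switches, 800, 500, 200, 64, num_apps]


def init_params(sizes, seed=0):
    """Random (untrained) weights; the scaling choice is an assumption."""
    rng = random.Random(seed)
    params = []
    for fan_in, fan_out in zip(sizes, sizes[1:]):
        scale = math.sqrt(2.0 / fan_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(fan_in)]
                   for _ in range(fan_out)]
        params.append((weights, [0.0] * fan_out))
    return params


def softmax(z):
    peak = max(z)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def forward(params, x):
    """ReLU on hidden layers, softmax on the output layer."""
    for i, (weights, biases) in enumerate(params):
        z = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        is_output = i == len(params) - 1
        x = softmax(z) if is_output else [max(0.0, v) for v in z]
    return x
```

For example, `layer_sizes(2)` gives an input width of 160 for two infected switches, and `forward` returns thirteen class probabilities that sum to one.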

5. Defense Model

In this section, we discuss the design of the defense model, which consists of path-based routing, LUT obfuscation, and the randomization of permutations.

5.1. LUT Obfuscation

The proposed defense mechanism against the HT-based traffic analysis attack is based on defeating the traffic analysis by obfuscating the destination in the packet header. With traditional flit routing, the destination of the flit is stored unencrypted in the header. The header is then passed to each switch, where the switch uses the header to determine the direction of travel (Figure 2a). A common routing strategy used in many/multi-core chips is X-Y routing. In the case of a two-dimensional architecture, a packet will fully traverse one dimension before moving in the other. For example, in an 8 × 8 mesh, if a packet needs to move corner to corner, it would first make 7 hops in the X dimension, before making 7 hops in the Y dimension. This is a low-cost solution that ensures that the optimal path is taken during routing.
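As a minimal sketch of the X-Y routing just described, the function below lists the hops a packet takes on a 2D mesh. The function name, the coordinate convention (East as increasing x, North as increasing y), and the single-letter port labels are assumptions made for illustration.

```python
def xy_route(src, dst):
    """Return the hop sequence for dimension-ordered X-Y routing on a mesh.

    src and dst are (x, y) tuples. Assumed convention: 'E'/'W' move along x,
    'N'/'S' move along y, and 'L' delivers to the local node.
    """
    (sx, sy), (dx, dy) = src, dst
    hops = []
    # Traverse the X dimension completely first.
    hops += ["E" if dx > sx else "W"] * abs(dx - sx)
    # Then traverse the Y dimension.
    hops += ["N" if dy > sy else "S"] * abs(dy - sy)
    hops.append("L")
    return hops
```

For the corner-to-corner case in an 8 × 8 mesh, `xy_route((0, 0), (7, 7))` yields seven hops in X followed by seven hops in Y and a final local delivery.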
The header cannot simply be encrypted due to the latency overhead that encrypting and decrypting at each switch would incur. If the header contains the path for the packet to follow instead of the destination, the destination is no longer directly readable by an HT. This path contains the direction (port number) that the packet takes at each switch in order, appended to one another. Every switch is fitted with a look-up table (LUT), which it uses to route the packet based on the path in the header (Figure 2b). If another dimension is added to the architecture, these LUTs can simply increase in size to account for the additional routing options. Every switch in the SoC will have a LUT to route packets through the SoC; however, data that stay inside a single processing element are not affected by the switches.
The header now contains each port number along the packet’s path for every switch on the path to follow. After the port number is read, the path in the header is shifted so that the next instruction is first in the queue. The destination can still be found with the header by forward tracing the unencrypted path. For instance, in Figure 2b, seeing that there are three Easts followed by a Local (2,2,2,4), the destination can be deduced to end at switch D.
In order to prevent this, the path needs to be obfuscated. While the path could be fully encrypted, this would incur the same prohibitive latency as encrypting the destination. Instead, the LUT indices can be pseudo-randomized for each switch, so that discovering the obfuscation of one switch does not compromise the whole system. This pseudo-randomization occurs in the LUT at design time as described in Section 5.2, and the path is obfuscated in the network interface when the packet is generated (Figure 2c). Randomization occurs at design time for two main reasons. The first is the processing overhead that the randomization incurs. We choose to randomize during design time to avoid the increased need for processing resources during runtime. Secondly, randomization during design time has the added benefit of additional security layers. For example, if the randomization occurred at runtime, an HT could be designed to read the runtime obfuscation of the switch, and subsequently subvert the entirety of the obfuscation.
While the path is obfuscated with the LUTs implemented, the path length is still available to the attacker. The path length can be used to determine the destination region that the packet is heading towards. This is not as powerful as a destination attack, but the destination region could still be used to determine the applications running. This attack is halted by padding all of the paths in the header. This makes all of the headers the same length, preventing the path length from being identified. In addition, every switch appends a port number to the end of the path once it has popped its own port number and routed the packet. While a packet could in principle end up being routed by these switch-generated port numbers, the end condition of a packet reaching its Local (Internal) port should prevent the packet from being routed indefinitely. The LUT-based routing subverts attacks from third parties during the manufacturing phase of the NoC. Since the routing is implemented during design, injected Trojans will still be affected by the new routing architecture. The routing can further be configured by the frequency of LUT pseudo-randomization, which is chosen at design time. A system centered on security may opt for a randomization at every boot, while a system optimizing for performance may randomize only once. This architecture allows different routing algorithms to still be used, such as random routing [23], to further increase performance and security.
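The pieces of this subsection (a per-switch permuted LUT, a path header encoded in the network interface, padding to a fixed length, and each switch popping one entry and appending another) can be sketched together as follows. All names are hypothetical. The port order is chosen so that East is 2 and Local is 4, matching the (2,2,2,4) example; the positions of the other ports are assumptions. For brevity the LUTs are shuffled with Python's `random` module rather than with the permutation-plus-LCG procedure of Section 5.2.

```python
import random

PORTS = ["N", "S", "E", "W", "L"]  # unobfuscated indices: E = 2, L = 4


def make_luts(switch_ids, seed=0):
    """Design-time step: give every switch its own permutation of the ports."""
    rng = random.Random(seed)
    luts = {}
    for switch in switch_ids:
        perm = PORTS[:]
        rng.shuffle(perm)
        luts[switch] = perm  # luts[switch][index] -> physical port
    return luts


def encode_header(path_switches, path_ports, luts, header_len, rng):
    """NI step: translate each hop into the index that switch's LUT expects."""
    header = [luts[s].index(p) for s, p in zip(path_switches, path_ports)]
    # Pad with arbitrary indices so that every header has the same length.
    header += [rng.randrange(len(PORTS)) for _ in range(header_len - len(header))]
    return header


def switch_step(switch_id, header, luts, rng):
    """Switch step: pop the first index, look it up, append a filler index."""
    port = luts[switch_id][header[0]]
    new_header = header[1:] + [rng.randrange(len(PORTS))]
    return port, new_header
```

A packet travelling A → B → C → D with ports E, E, E, L is encoded once in the NI, after which each switch on the path recovers its own port from its private LUT while the header length stays constant. An HT in one switch sees only indices whose meaning differs from switch to switch.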

5.2. Pseudo-Randomization of the LUT Indices

In order to obfuscate each step in the path, all of the LUTs need to be pseudo-randomized. The first step in this process is to generate permutations of all the possible directions that a flit can take at a switch. In standard mesh architectures, these directions are limited to North, South, East, West, and Local (Internal) to the switch's node. To generate all of the 120 (5!) permutations, the Narayana Pandita generation algorithm is used. This algorithm outputs a list of all the permutations of size n, in this case 5, and generates each successive permutation with a time complexity of O(n). This algorithm was originally published in [26] but has since been reworded and published in [27].
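A compact version of the next-permutation step attributed to Narayana Pandita is sketched below, together with a helper that enumerates all orderings of the five directions. The function names are illustrative; the directions are represented by the integers 0 to 4.

```python
def next_permutation(seq):
    """Rearrange seq in place into the next lexicographic permutation.

    Returns False (leaving seq unchanged) if seq is already the last one.
    """
    # 1. Find the rightmost position i with seq[i] < seq[i + 1].
    i = len(seq) - 2
    while i >= 0 and seq[i] >= seq[i + 1]:
        i -= 1
    if i < 0:
        return False
    # 2. Find the rightmost element greater than seq[i] and swap.
    j = len(seq) - 1
    while seq[j] <= seq[i]:
        j -= 1
    seq[i], seq[j] = seq[j], seq[i]
    # 3. Reverse the suffix after position i.
    seq[i + 1:] = reversed(seq[i + 1:])
    return True


def all_permutations(items):
    """Enumerate every permutation of items in lexicographic order."""
    current = sorted(items)
    result = [tuple(current)]
    while next_permutation(current):
        result.append(tuple(current))
    return result
```

Calling `all_permutations(range(5))` produces the 120 candidate LUT orderings, each step costing at most a linear scan, a swap, and a suffix reversal.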
With the permutations generated, the list needs to be randomized by generating a pseudo-randomized ordering of the indices 0 to 119. In software, this can be completed with a linear congruential generator (LCG). LCGs generate pseudo-randomized numbers that repeat with a constant period. This is defined as
$$X_{n+1} = (a X_n + c) \bmod m$$
where $X_{n+1}$ is the next pseudo-random number, $X_n$ is the current number (initially the seed), m is the modulus, a is the multiplier, and c is the increment. The Hull–Dobell theorem [28] is used to make the randomization algorithm have a period of m, in this case 120, without repeating any numbers within a period.
This theorem adds three rules to the LCG. The first is that m and c are coprime, i.e., they only share an integer divisor of one. The second rule is that a − 1 needs to be divisible by all prime factors of m. The final rule is that if m is divisible by 4, a − 1 is divisible by 4 as well. The end result is all 120 numbers generated in a pseudo-random order. While the randomization and the permutation occur in software in the network interface (NI), an LFSR can be used to replace the LCG in the NI if hardware is required.
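The following sketch checks the Hull–Dobell conditions in their standard form (c coprime to m; a − 1 divisible by every prime factor of m; a − 1 divisible by 4 whenever m is) and generates one full period for m = 120. The parameters a = 61 and c = 7 are illustrative values chosen here because they satisfy the conditions; the text does not state which parameters were used.

```python
from math import gcd


def prime_factors(n):
    """Return the set of prime factors of n by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors


def satisfies_hull_dobell(a, c, m):
    """True if the LCG x -> (a*x + c) mod m has full period m."""
    if gcd(c, m) != 1:
        return False
    if any((a - 1) % p != 0 for p in prime_factors(m)):
        return False
    if m % 4 == 0 and (a - 1) % 4 != 0:
        return False
    return True


def lcg_sequence(seed, a, c, m):
    """Generate m successive outputs of the LCG starting from seed."""
    x, out = seed % m, []
    for _ in range(m):
        x = (a * x + c) % m
        out.append(x)
    return out
```

With a = 61, c = 7, and m = 120, `lcg_sequence` visits every index from 0 to 119 exactly once, so it can be used to pick permutations from the generated list without repeats.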
With the destination obfuscated, the theoretical destination prediction rate is 1/n, with n representing the number of cores in the system. With n being 64 in this case, we can assume a prediction rate of approximately 1.6%. While this is the rate for the obfuscated destination, the output of the attack is one of thirteen classes. Since the attack is reliant on the destination, the application prediction rate should be 1/m, where m is the number of applications. This works out to approximately 7.7% for the application prediction, which is equivalent to a random guess.

6. Evaluation

A system with 64 cores and 16 memory modules is used in an 8 × 8 mesh configuration. The packet traffic is simulated with the Synfull simulator using the PARSEC and SPLASH-2 benchmark suites, which are executed until completion [29]. The characteristics of the cores are defined in Table 2.

6.1. Experimental Setup

In the 64-core environment, we considered a single application running as multiple threads executed by multiple cores that share a memory stack. This configuration was used to map the traffic patterns that each application generates. The NoC switches are synthesized using an ASIC design flow with Synopsys Design Compiler and 65 nm CMP standard cell libraries (https://mycmp.fr/, accessed on 25 November 2022). We used Cadence simulation, assuming link lengths for a 20 mm × 20 mm NoC topology, to obtain energy and power dissipation. The ANN ML classifier was trained on data from NoXim [30], a cycle-accurate simulator that was used to track packets and their destinations going through each switch.
Our proposed attack model was implemented and evaluated on a 64-core mesh architecture using X-Y routing. The leaked packet count serves as the payload, transmitted by the inserted hardware Trojan (HT) to the ML-based attacker at regular intervals of 5000 cycles. To gauge the efficiency of the attack, we used the ML engine's accuracy, F1-score, recall, and precision on the attacker side as metrics. Additionally, we varied the number of observed features (i.e., inserted HTs) within the system and assessed the corresponding impact on these metrics. Attack efficiency increased with feature size, as shown in Figure 3. All thirteen applications achieved 100% F1-score, precision, and recall with four HTs embedded in the 64-core, 80-switch system. With only two HTs embedded, a similar result was found, with a combined application accuracy of 97.96% (Figure 4), recall of 97.95%, and precision of 97.82%. Hence, we can conclude that the proposed attacker is capable of effectively interpreting the user profile with minimal footprint or overhead within the chip.
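For readers who want to reproduce these metrics from predicted and true application labels, a minimal sketch is given here. It assumes macro-averaging over classes, which the text does not specify, and the function name `classification_metrics` is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute accuracy and macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p, but it was wrong
            fn[t] += 1  # true class t was missed
    precisions, recalls, f1s = [], [], []
    for lbl in labels:
        predicted = tp[lbl] + fp[lbl]
        actual = tp[lbl] + fn[lbl]
        prec = tp[lbl] / predicted if predicted else 0.0
        rec = tp[lbl] / actual if actual else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

With thirteen application classes, the labels would simply be integers 0 to 12. A different averaging scheme (for example, weighting by class frequency) would give slightly different numbers.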
This header-based HT is also compared to the HT discussed in [23]. As shown in Figure 4, the header-based HT achieves higher accuracy with half the number of HTs installed in the system.

6.2. ML Performance with Proposed LUT Obfuscation

We also evaluate the proposed attack model against the proposed LUT-based obfuscation. Figure 4 shows the accuracy of the attacker on the mesh architecture for X-Y and LUT-obfuscated routing. It can be observed from Figure 4 that, due to the routing obfuscation, the accuracy of the attacker falls significantly (to <8%) compared to deterministic routing, even with higher feature sizes. Moreover, Figure 4 shows that increasing the feature size does not increase the accuracy significantly, which demonstrates the robustness of the proposed LUT obfuscation.

6.3. Routing and HT Overheads

The increased security offered by the proposed LUT-based routing comes with additional overheads in the routing logic, as shown in the inset of Figure 1. We consider each switch to have a LUT with 5 indices, totaling 2 bytes for each switch or 160 bytes for the entire 80-switch system.
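One plausible way to arrive at these storage figures is sketched here. It assumes each LUT entry encodes one of the five directions in the minimum number of whole bits and that each LUT is rounded up to whole bytes; the paper does not state the encoding, so this is an assumption for illustration only.

```python
from math import ceil, log2

NUM_DIRECTIONS = 5   # North, South, East, West, Local
NUM_SWITCHES = 80    # 64 cores plus 16 memory modules


def lut_storage_bytes(entries: int = NUM_DIRECTIONS, switches: int = NUM_SWITCHES) -> dict:
    """Estimate per-switch and total LUT storage under the assumed encoding."""
    bits_per_entry = ceil(log2(NUM_DIRECTIONS))   # 3 bits are enough to name 5 directions
    bits_per_lut = entries * bits_per_entry       # 5 entries of 3 bits each
    bytes_per_lut = ceil(bits_per_lut / 8)        # round up to whole bytes
    return {
        "bits_per_entry": bits_per_entry,
        "bits_per_lut": bits_per_lut,
        "bytes_per_lut": bytes_per_lut,
        "total_bytes": bytes_per_lut * switches,
    }
```

Under these assumptions, the estimate matches the 2 bytes per switch and 160 bytes in total quoted above.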
The 5-index LUT is compared to an X-Y router over the average of 5 hops to the destination node. For X-Y routing, the delay at each switch is 0.2365 ns, for a total routing latency of 1.1825 ns. The LUT has a single-switch latency of 0.2330 ns, for a total of 1.165 ns. During routing, the LUT therefore reduces latency by about 0.0175 ns compared to the X-Y router, a speedup of about 1.5%. However, there is an additional overhead with the LUT when the flit header is constructed. With an average of 5 hops, the network interface needs to access 5 indices in its 120-index LUT. Each access has a latency of 1.0238 ns, for a total latency of 5.119 ns. With the entire life of the flit in scope, LUT routing is about 5 ns slower than a comparable system with X-Y routing.
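The latency arithmetic in this paragraph can be reproduced with a few lines of code. The per-switch and per-lookup figures are taken from the text; the function and key names are hypothetical.

```python
HOPS = 5                     # average hops to the destination
XY_PER_SWITCH_NS = 0.2365    # X-Y routing delay per switch
LUT_PER_SWITCH_NS = 0.2330   # LUT routing delay per switch
NI_LOOKUP_NS = 1.0238        # network-interface LUT access per index


def latency_summary() -> dict:
    """Recompute the routing and header-construction latencies, in nanoseconds."""
    xy_total = HOPS * XY_PER_SWITCH_NS
    lut_routing = HOPS * LUT_PER_SWITCH_NS
    header_build = HOPS * NI_LOOKUP_NS
    saving = xy_total - lut_routing
    return {
        "xy_total_ns": xy_total,
        "lut_routing_ns": lut_routing,
        "routing_saving_ns": saving,
        "routing_saving_pct": 100 * saving / xy_total,
        "header_build_ns": header_build,
        "lut_total_ns": lut_routing + header_build,
        "net_extra_ns": lut_routing + header_build - xy_total,
    }
```

The in-switch saving is small compared with the one-time header construction cost at the network interface, which is why the net difference over the life of a flit comes out to roughly 5 ns.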
While this makes the per-flit routing latency roughly five times that of X-Y routing (6.284 ns versus 1.1825 ns), it does not mean that the SoC with LUT routing takes five times longer to run. During routing, additional delays are introduced by many sources, such as traffic in switches or thermal throttling. These added delays often occur during the transmission of a packet, where LUT-based routing excels. By offloading the path calculations to the NI with LUT routing, some of these delays can actually be reduced.

7. Conclusions

In this paper, an HT was designed and implemented in a multi-core network-on-chip. The HT was embedded into one or more switches in the NoC. This HT kept an array of counts that corresponded to the destination switches of packets that were routed through the infected switch. These counts were then sent off the system and run through an artificial neural network. The ANN predicted which applications were running on the system. With only one HT in an eighty-switch system, the program was able to predict the application with 76% accuracy. This increased to 97% with two HTs implemented. Compared to the switching network in the NoC, this HT was many orders of magnitude smaller and had much less power draw. This made the HT very difficult to detect. A defense against this HT was then constructed that obfuscated the header in the packets. This defense moved the switches from destination-based routing to path-based routing. The path was then further obfuscated by running a pseudo-randomization algorithm on system startup. This made each path instruction at every switch unique. The path length was hidden, with each switch padding the routing path with data. This defense reduced the HT's application prediction accuracy to the equivalent of a random guess, regardless of the number of HTs realistically embedded in the system. The path routing also offloaded the routing calculation to packet generation, reducing the in-switch routing latency by about 1.5%.

Future Work

There are many directions that future research on this specific HT can take. On the attack model side, improvements can be made to the neural network as well as to the set of applications being detected. The ANN used can be further optimized, or another architecture could be used to better effect. In a multi-user scenario, many applications can be run at the same time. While the neural network was built with that in mind, more optimizations need to be developed before three or more concurrent applications can be detected with the HT. Other neural network models may prove more useful to this end, such as a CNN with image-parsing techniques. We envision scenarios where multiple applications are hosted on the system in parallel, and the number of parallel applications is also widely variable. Therefore, we may have scenarios with exponentially increasing training data volumes. In this case, a simple 5-layer ANN as evaluated in this work may not learn the inference adequately, and we may need more sophisticated methods, such as CNNs. When using CNNs, the training data gathered from the simulator can be transformed into an image, where each switch is treated as a pixel location and the packet counts for each destination are treated as a separate channel of that frame. These frames of data can be generated from the simulation platform while different combinations of applications are hosted on the system.
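The image transformation described above could look roughly like the following sketch. The 8 × 10 pixel layout for the 80 switches, the use of 80 destination channels, and the function name `counts_to_frame` are all assumptions made for illustration; the text does not fix any of them.

```python
from typing import Dict, List

ROWS, COLS = 8, 10        # hypothetical pixel layout covering the 80 switches
NUM_DESTINATIONS = 80     # assumed: one channel per destination switch


def counts_to_frame(counts: Dict[int, List[int]]) -> List[List[List[int]]]:
    """Build frame[row][col][channel] from per-switch packet counts.

    `counts` maps the ID of a switch hosting an HT to its per-destination
    packet counts; switches without an HT are left as all-zero pixels.
    """
    frame = [[[0] * NUM_DESTINATIONS for _ in range(COLS)] for _ in range(ROWS)]
    for switch_id, per_dest in counts.items():
        if not 0 <= switch_id < ROWS * COLS:
            raise ValueError(f"switch ID {switch_id} is outside the assumed layout")
        if len(per_dest) != NUM_DESTINATIONS:
            raise ValueError("expected one count per destination")
        row, col = divmod(switch_id, COLS)
        frame[row][col] = list(per_dest)
    return frame
```

One such frame would be produced per leak interval, giving a sequence of multi-channel images that a CNN could be trained on. How the memory-module switches are placed in the pixel grid is a design choice this sketch does not settle.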
On the hardware side of the attack, the HTs could be made smaller. The relationship between the storage of the HT counters and the rate at which packets are generated should be investigated further. While the attacker uses correlation to decide which switch to insert the Trojan into, they could also use the same method to restrict the destinations that they count. While this would decrease the accuracy of the HT, the area saving may outweigh the downsides. The defense against the HT could also be improved. The suggested method offloads the entire path calculation to the NI before the packet is released. Ideally, the switch should help calculate the path as well. This could further increase the security of the NoC, as well as allow routing methods to be based on a packet's lifespan rather than the number of switches passed through. Alternatives to path routing could be investigated as well. While path routing was selected due to the low circuit overhead that five directions entail, the entire destination could be obfuscated with significantly bigger LUTs in the switches. There may also be room for destination and path routing to be implemented together. For instance, for larger systems, it may be beneficial to calculate the path for the next two hops and then encrypt the destination in the header. This would offload the path calculation to the header, keep the destination unreadable by the attacker, and prevent the destination from going through encryption at every switch.

Author Contributions

Conceptualization, T.M. and A.G.; methodology, T.M., A.T., N.M. and A.G.; software, T.M.; formal analysis, A.D. and S.M.P.D.; investigation, T.M.; resources, T.M., A.G., S.M.P.D. and A.D.; data curation, T.M. and N.M.; writing—original draft preparation, T.M.; writing—review and editing, A.G., S.M.P.D. and A.D.; visualization, T.M. and A.G.; supervision, A.G.; project administration, A.G.; funding acquisition, A.G. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the US National Science Foundation (NSF) CAREER Grant CNS-1553264.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Dally, W.J.; Towles, B. Route packets, not wires: On-chip interconnection networks. In Proceedings of the 38th Annual Design Automation Conference, Las Vegas, NV, USA, 22 June 2001.
  2. Xiao, K.; Forte, D.; Jin, Y.; Karri, R.; Bhunia, S.; Tehranipoor, M. Hardware trojans: Lessons learned after one decade of research. ACM Trans. Des. Autom. Electron. Syst. (TODAES) 2016, 22, 1–23.
  3. Jin, Y.; Kupp, N.; Makris, Y. Experiences in hardware trojan design and implementation. In Proceedings of the IEEE International Workshop on Hardware-Oriented Security and Trust, San Francisco, CA, USA, 27 July 2009.
  4. Chakraborty, R.S.; Narasimhan, S.; Bhunia, S. Hardware trojan: Threats and emerging solutions. In Proceedings of the IEEE International High Level Design Validation and Test Workshop, San Francisco, CA, USA, 4–6 November 2009.
  5. Cruz, J.; Farahmandi, F.; Ahmed, A.; Mishra, P. Hardware trojan detection using atpg and model checking. In Proceedings of the 31st International Conference on VLSI Design and 2018 17th International Conference on Embedded Systems (VLSID), Pune, India, 6–10 January 2018.
  6. Skorobogatov, S. Physical Attacks and Tamper Resistance. In Introduction to Hardware Security and Trust; Springer: New York, NY, USA, 2012.
  7. Dubrova, E.; Näslund, M.; Selander, G. Secure and efficient lbist for feedback shift register-based cryptographic systems. In Proceedings of the 19th IEEE European Test Symposium (ETS), Paderborn, Germany, 26–30 May 2014; pp. 1–6.
  8. Kocher, P.; Jaffe, J.; Jun, B. Differential power analysis. In Advances in Cryptology—CRYPTO ’99; Wiener, M., Ed.; Springer: Berlin/Heidelberg, Germany, 1999; pp. 388–397.
  9. Manoj Kumar, J.Y.V.; Swain, A.K.; Kumar, S.; Sahoo, S.R.; Mahapatra, K. Run time mitigation of performance degradation hardware trojan attacks in network on chip. In Proceedings of the IEEE Computer Society Annual Symposium on VLSI, Hong Kong, China, 8–11 July 2018; pp. 738–743.
  10. Boraten, T.; Kodi, A.K. Packet security with path sensitization for NoCs. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 14–18 March 2016; pp. 1136–1139.
  11. Raparti, V.Y.; Pasricha, S. Lightweight mitigation of hardware trojan attacks in NoC-based manycore computing. In Proceedings of the 2019 56th ACM/IEEE Design Automation Conference (DAC), Las Vegas, NV, USA, 2–6 June 2019; pp. 1–6.
  12. Manju, R.; Das, A.; Jose, J.; Mishra, P. Sectar: Secure noc using trojan aware routing. In Proceedings of the 2020 14th IEEE/ACM International Symposium on Networks-on-Chip (NOCS), Hamburg, Germany, 24–25 September 2020; pp. 1–8.
  13. Ancajas, D.M.; Chakraborty, K.; Roy, S. Fort-NoCs: Mitigating the threat of a compromised NoC. In Proceedings of the 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 1–5 June 2014; pp. 1–6.
  14. Fernandes, R.; Marcon, C.; Cataldo, R.; Silveira, J.; Sigl, G.; Sepúlveda, J. A security aware routing approach for NoC-based MPSoCs. In Proceedings of the Symposium on Integrated Circuits and Systems Design, Belo Horizonte, Brazil, 29 August–3 September 2016.
  15. Indrusiak, L.S.; Harbin, J.; Sepulveda, M.J. Side-channel attack resilience through route randomisation in secure real-time networks-on-chip. In Proceedings of the International Symposium on Reconfigurable Communication-Centric SoC (ReCoSoC), Madrid, Spain, 12–14 July 2017.
  16. Boraten, T.H.; Kodi, A.K. Securing NoCs against timing attacks with non-interference based adaptive routing. In Proceedings of the IEEE/ACM International Symposium on Networks-on-Chip (NOCS), Torino, Italy, 4–5 October 2018.
  17. Shalaby, A.; Tavva, Y.; Carlson, T.E.; Peh, L.-S. Sentry-NoC: A statically-scheduled NoC for secure SoCs. In Proceedings of the 15th IEEE/ACM International Symposium on Networks-on-Chip, Madison, WI, USA, 14–15 October 2021.
  18. Meng, X.; Raj, K.; Ray, S.; Basu, K. SEVNOC: Security Validation of System-on-Chip Designs with NoC Fabrics. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2022, 42, 672–682.
  19. Weber, I.; Marchezan, G.; Caimi, L.; Marcon, C.; Moraes, F.G. Open-source NoC-based many-core for evaluating hardware trojan detection methods. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020.
  20. Li, Z.; Wang, J.; Huang, Z.; Wang, Q. EA-based Mitigation of Hardware Trojan Attacks in NoC of Coarse-Grained Reconfigurable Arrays. In Proceedings of the 2022 International Conference on Networking and Network Applications (NaNA), Urumqi, China, 3–5 December 2022.
  21. Yao, J.; Zhang, Y.; Hua, Y.; Li, Y.; Yang, J.; Chen, X. Spotlight: An Impairing Packet Transmission Attack Targeting Specific Node in NoC-based TCMP. In Proceedings of the 2023 IEEE European Test Symposium (ETS), Venezia, Italy, 22–26 May 2023.
  22. Pande, P.P.; Grecu, C.; Jones, M.; Ivanov, A.; Saleh, R. Performance evaluation and design trade-offs for network-on-chip interconnect architectures. IEEE Trans. Comput. 2005, 54, 1025–1040.
  23. Ahmed, M.M.; Dhavlle, A.; Mansoor, N.; Sutradhar, P.; Dinakarrao, S.M.P.; Basu, K.; Ganguly, A. Defense against on-chip trojans enabling traffic analysis attacks. In Proceedings of the 2020 Asian Hardware Oriented Security and Trust Symposium (AsianHOST), Kolkata, India, 15–17 December 2020; pp. 1–6.
  24. Shayan, M.; Basu, K.; Karri, R. Hardware trojans inspired ip watermarks. IEEE Des. Test 2019, 36, 72–79.
  25. Hogg, R.V.; McKean, J.; Craig, A.T. Introduction to Mathematical Statistics; Pearson Education: New York, NY, USA, 2005.
  26. Pandita, N. The Ganita Kaumudi; Indian Press: Benares, India, 1936.
  27. Roy, R. Sources in the Development of Mathematics: Series and Products from the Fifteenth to the Twenty-First Century; Cambridge University Press: Cambridge, UK, 2011.
  28. Hull, T.E.; Dobell, A.R. Random number generators. SIAM Rev. 1962, 4, 230–254.
  29. Badr, M.; Jerger, N.E. Synfull: Synthetic traffic models capturing cache coherent behaviour. ACM SIGARCH Comput. Archit. News 2014, 42, 109–120.
  30. Catania, V.; Mineo, A.; Monteleone, S.; Palesi, M.; Patti, D. Noxim: An open, extensible and cycle-accurate network on chip simulator. In Proceedings of the IEEE 26th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Toronto, ON, Canada, 27–29 July 2015.
Figure 1. NoC switch components with inserted HT.
Figure 2. Differences in routing architectures. (a) Destination-based routing steps. A packet will have the destination in its header, and calculate the next hop of the packet at each switch. (b) LUT-based routing steps. A packet will have a set of directions in its header and will execute the directions at each switch. (c) LUT obfuscated routing steps. A packet will still have a set of directions to execute, but the directions are uniquely obfuscated to each switch.
Figure 3. (a) F1-Score, (b) Precision and (c) Recall, for X-Y routing with different features (number of HTs observed).
Figure 4. Performance with obfuscated and unobfuscated data.
Table 1. Architectural details of the neural network.
Input Image Dimensions | (#HT * 80) × 13
Hidden Layer | Convolution: two layers
Activation Function | LeakyReLU
Dropout | 0.4
Output Layer | Dense with sigmoid activation
Optimizer | Adam with learning rate = 0.0002 and beta_1 = 0.5
Loss Function | Binary cross entropy
Table 2. Component configuration for simulation.
Component | Configuration
System size | 64 cores, Out-of-Order, 16 cores/chip
Cache | 32 KB (private L1), 512 KB (shared L2), MOESI
NoC router | 3-stage pipelined, 5 ports, 0.078 pJ/bit
Total VC | 4, each 8 flits deep, 64 bits/flit
Wired NoC links | 64-bit flits, single cycle latency, 0.2 pJ/bit/mm
Technology | 65 nm, 1 V supply, 1 GHz clock
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Mountford, T.; Dhavlle, A.; Tevebaugh, A.; Mansoor, N.; Dinakarrao, S.M.P.; Ganguly, A. Address Obfuscation to Protect against Hardware Trojans in Network-on-Chips. J. Low Power Electron. Appl. 2023, 13, 50. https://doi.org/10.3390/jlpea13030050

AMA Style

Mountford T, Dhavlle A, Tevebaugh A, Mansoor N, Dinakarrao SMP, Ganguly A. Address Obfuscation to Protect against Hardware Trojans in Network-on-Chips. Journal of Low Power Electronics and Applications. 2023; 13(3):50. https://doi.org/10.3390/jlpea13030050

Chicago/Turabian Style

Mountford, Thomas, Abhijitt Dhavlle, Andrew Tevebaugh, Naseef Mansoor, Sai Manoj Pudukotai Dinakarrao, and Amlan Ganguly. 2023. "Address Obfuscation to Protect against Hardware Trojans in Network-on-Chips" Journal of Low Power Electronics and Applications 13, no. 3: 50. https://doi.org/10.3390/jlpea13030050

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
