Hardware Activation by Means of PUFs and Elliptic Curve Cryptography in Field-Programmable Devices

Reusable design using IP cores requires efficient methods for protecting the Intellectual Property of the designer and the corresponding license agreements. In this work, a new protection procedure is presented that establishes an activation protocol similar to the activation process used in the software world. The procedure, named SEHAS (Secure Hardware Activation System), allows the distribution of cores in either Blocked (not functioning) or Demo (functioning with limited features) modes, while enforcing the license agreements by identifying not only the IP core but also the implementation device, using Physically Unclonable Functions (PUFs). Moreover, SEHAS secures the exchange of information between the core and the core vendor using an Elliptic Curve Cryptosystem (ECC). This secure channel allows the IP core vendor to send a unique Activation Code to the core in order to switch it to the Activated mode, thus enabling all its features.


Introduction
Reusable design has reduced the costs and the human resources required for the development of complex systems. Nevertheless, the distribution of reusable modules in the form of Intellectual Property Cores (IP cores) leads to new challenges for the adequate licensing and protection of the developer's rights. In fact, once the IP core has been delivered, the developer loses control over how the core is used or redistributed. In the literature, several solutions for the protection of IP cores at different description levels have been proposed [1][2][3][4][5][6][7][8][9][10][11][12][13][14], with watermarking techniques implemented at the HDL level being the most suitable for IP core protection [2][3][4][5][6][7][8][9].
These methods allow the identification and ownership proof of the protected cores by means of a Digital Signature (DS), but they do not provide mechanisms for detecting unauthorized distribution of a core, since the physical device used for the implementation is not considered in the protection process. In this sense, Physical Unclonable Functions (PUFs) [15][16][17][18][19][20][21] can provide a unique identification of the device, thus making it possible to establish a license for an IP core implemented on a specific device. This can be achieved by combining the DS of the core with the Device ID (DID) provided by the PUF. In principle, the IP core vendor has access to the IP core information, and can thus embed a DS into the core under protection, but the vendor has no access to the device, which is held by the customer. Therefore, it is necessary to establish a secure channel for exchanging information between the device and the vendor in order to incorporate the DID information within the IP core under protection. This secure channel can be established by implementing a Public Key Cryptosystem (PKC) [22,23] in the IP core. The PKC implementation should require few area resources in order to minimize the area overhead introduced by the protection system. Elliptic Curve Cryptography (ECC) [24] has emerged as the preferred solution for the hardware implementation of a PKC when low area, low power consumption or narrow bandwidths are required [25,26]. In fact, ECC requires smaller key sizes than RSA for similar security levels [22,27], while allowing efficient hardware implementations when binary fields GF($2^m$) are used [28,29]. Thus, the introduction of both PUF and ECC units into the IP core enables the identification of the core and the device, and allows license infringements to be detected when the core is implemented on an unauthorized device.

Moreover, the features provided by the PUF and the PKC can be extended to enable new functionality. In this regard, the concept of Hardware Activation is introduced in [14] for the protection of microprocessor IP cores. The idea is similar to the activation process used in the protection of software programs: an activation code is required for the system to operate properly; otherwise the system under protection remains inoperative or, optionally, in Demo mode. In this paper, taking advantage of the availability of a DS, a DID, and a secure channel for the interchange of information between the core and the vendor through the PKC, Hardware Activation is generalized to any type of IP core. The rest of the article is organized as follows: Section 2 describes the Secure Hardware Activation System (SEHAS), Section 3 presents the proposed designs and implementations for the different blocks composing SEHAS, and Section 4 is devoted to the conclusions of the work.

Secure Hardware Activation System
In the present work, a protection scheme enabling the Hardware Activation [14] of a protected IP core is developed. The method, called SEHAS (SEcure Hardware Activation System), enables detailed monitoring of license compliance through the generation of a unique Activation Code (AC) for each pair of IP core and implementation device. In this scheme, the IP core vendor can choose between two options for the behavior of the core when the specific license Activation Code has not been supplied:
• Blocked mode. In this mode, the IP core remains blocked, without any functionality.
• Demo mode. In this mode, the core maintains a limited functionality. This mode makes it possible to distribute the core for performance and feature evaluation by the customer, in a similar way to the demo versions of software programs.
From either of these two states, the protected core is switched to Activated mode when the correct license AC is supplied, enabling all features of the IP core. Thus, the concept of Hardware Activation is similar to the activation process in the software world, but applied to the distribution and licensing of hardware IP cores. In the following subsection, the protocol for generating the Activation Code in a secure way is described.

Activation Protocol
The proposed protocol requires a set of operations to be performed by the customer and the IP core vendor. Figure 1 shows the flow diagram of the protocol, which comprises the following operations:
1. Watermarking of the IP Core. The vendor generates a Digital Signature (DS), corresponding to the license details, and introduces it into the IP core to be protected by means of watermarking techniques such as IPP@HDL [6] or e-coreIPP [9].
2. Introduction of the SEHAS module into the core. The logic needed for extracting the DS is included in the SEHAS module, which is then embedded into the IP core. Note that the DS is not visible from outside at any time. Also, the structure of the IP core circuit, which includes the SEHAS module and the additional logic needed for protection, will not be modified later. Thus, if a Physical Unclonable Function (PUF) is used for obtaining the Device ID (DID), the PUF's result will remain unchanged in the next steps. As a result of this operation, an IP core in Blocked/Demo mode is obtained and supplied to the customer.
3. Generation of the Activation Request Code (ARC). The customer enables the generation of the ARC by applying a given control sequence to the core. This ARC is the public key of the ECC PKC [22] established for securing the information exchange between the vendor and the core. The private key is obtained from the DS and the DID, thus including information about both the IP core and the implementation device. In summary, the private-public key pair of the core consists of the CIC (Combined Identification Code, obtained by concatenating the DS and the DID) and the ARC. Once the ARC is generated, it is sent to the vendor. In this way, the customer has no information about the DS or the DID, since only the public key (ARC) is sent.
4. Generation of the Activation Code (AC) and the Secure Shared Value (SSV). The vendor generates its own private-public key pair. The private key is named Core-Device License (CDL), and the public key is the Activation Code. Using the CDL (private key of the vendor) and the ARC (public key of the core), the Secure Shared Value (SSV) is computed. The same SSV can be derived from the CIC (private key of the core) and the AC (public key of the vendor).
5. Generation of the ROM configuration. The SEHAS module contains a ROM, which includes part of the combinational logic of the protected core. Depending on the ROM contents, the core will be in Blocked, Demo or Activated mode. In this step, the vendor generates a ROM configuration that modifies the logic functionality of the core. This new ROM configuration enables the switch to the Activated mode by means of the SSV. Also, the vendor sends the AC to the customer. Again, the exchanged information is a public key, and the ROM configuration corresponds to modified logic functions that carry no useful information on their own. Moreover, the ROM contents could be encrypted with a private-key (symmetric) cryptosystem if more security were needed. AES-128 can be implemented using only 444 LUTs [30], maintaining high levels of security with low area overhead. The key used for ROM encryption/decryption will be the Digital Signature (DS) of the IP core.
6. Activation of the protected core. The customer loads the ROM configuration and supplies the AC to the core. The SEHAS module internally computes the SSV using the AC (public key of the vendor) and the CIC (private key of the core), and then switches to the Activated mode.
As a result of the described process, the customer gets an activated IP core with full features. Note that this core will not operate correctly in a different device, because the CIC will be different (the DS will be the same, but the DID will not), and consequently the SSV obtained will not be able to switch the core to the Activated mode. In the following subsection, the elements required for implementing SEHAS are described.
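The key relations behind Operations 3 to 6 can be summarized as follows. This is only a compact restatement under illustrative notation: G denotes the base point of the curve, the dot denotes the scalar-point product, and how the SSV bits are taken from the resulting point (for example, from one coordinate) is an assumption not fixed by the description above.

```latex
\begin{align*}
\mathrm{CIC} &= \mathrm{DS} \,\|\, \mathrm{DID}
  && \text{(private key of the core)}\\
\mathrm{ARC} &= \mathrm{CIC} \cdot G
  && \text{(public key of the core, sent to the vendor)}\\
\mathrm{AC}  &= \mathrm{CDL} \cdot G
  && \text{(public key of the vendor, sent to the customer)}\\
\mathrm{SSV} &= \mathrm{CDL} \cdot \mathrm{ARC}
   = \mathrm{CDL} \cdot \mathrm{CIC} \cdot G
   = \mathrm{CIC} \cdot \mathrm{AC}
  && \text{(computed independently on both sides)}
\end{align*}
```

Since a different device yields a different DID and hence a different CIC, the product of that CIC with the same AC no longer matches the SSV used by the vendor to build the ROM configuration.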

Hardware Structure for SEHAS Implementation
The activation protocol described above requires a set of hardware elements for implementing each of the operations. Figure 2 shows the block diagram of an IP core protected using the SEHAS method. The functions of the different blocks are the following:
1. Watermarking Unit. Operation 1 of the activation process is performed through one of the watermarking techniques available in the literature [6,9]. As a result of applying such a method, a DS is stored in the protected core. The watermarking unit is responsible for extracting the DS previously stored in the watermarked core, as detailed in Operation 2 above. The extracted DS is concatenated with the DID generated by the Device Identification Unit, resulting in the CIC that is needed for completing Operation 3.
2. Device Identification Unit. This unit provides the Device ID (DID), using an FPGA vendor specific function or a Physical Unclonable Function (PUF). In Section 3, these different possibilities will be considered. The DID obtained is combined with the DS supplied by the watermarking unit, generating the CIC.
3. ECC Unit. The ECC unit performs the cryptographic operations needed by SEHAS. When the gen input takes the '0' value, the ECC unit computes the ARC from the CIC and the base point G [22] (Operation 3). Note that the activ input must take the value '1' for the ARC to be available at the output of the core. When the gen input takes the '1' value, the ECC unit computes the SSV from the CIC and the AC, as required for activating the core (Operation 6).
4. SSV Register. The SSV register holds the SSV value computed by the ECC unit, allowing the IP core to operate in Activated mode while powered on.
5. AMC. The AMC block contains part of the combinational logic of the IP core under protection. The combinational functions are implemented using a ROM, with additional logic XOR-ing the functions with the SSV. Depending on the ROM configuration and the SSV, the core will be in Blocked, Demo or Activated mode.
The next section proposes hardware implementations for the described modules.
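The XOR masking performed by the AMC can be illustrated with a minimal Python sketch. It assumes that the ROM word for each input combination holds every function output XOR-ed with the corresponding SSV bit, so that XOR-ing again with the SSV register restores the original outputs. The helper names (`build_rom`, `amc_output`) and the two sample functions are hypothetical and only serve the illustration.

```python
def build_rom(functions, ssv_bits, n_vars):
    """Vendor side: for every input combination, store f_i XOR SSV(i)."""
    rom = []
    for address in range(2 ** n_vars):
        rom.append([f(address) ^ ssv_bits[i] for i, f in enumerate(functions)])
    return rom


def amc_output(rom, ssv_register, address):
    """Core side: XOR the addressed ROM word with the SSV register."""
    return [g ^ s for g, s in zip(rom[address], ssv_register)]


# Two hypothetical 3-input functions standing in for logic of the protected core.
def f_and(address):
    return 1 if address == 0b111 else 0


def f_parity(address):
    return bin(address).count("1") & 1


functions = [f_and, f_parity]
correct_ssv = [1, 0]          # illustrative 2-bit SSV
rom = build_rom(functions, correct_ssv, n_vars=3)

# With the correct SSV the original outputs are recovered.
recovered = [amc_output(rom, correct_ssv, a) for a in range(8)]
# With a wrong SSV, every output whose SSV bit differs is inverted.
corrupted = [amc_output(rom, [0, 0], a) for a in range(8)]
```

In this toy case the wrong SSV differs only in bit 0, so only the first function is inverted; with an m-bit SSV obtained on an unlicensed device, roughly half of the outputs would be expected to be wrong.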

SEHAS Blocks Implementations
In this section, the implementation of each of the SEHAS blocks is addressed, considering different possibilities. The Watermarking Unit implementation depends on the technique used for storing the DS in the protected core. In [6,9], different watermarking techniques are described, along with the corresponding extraction logic. The next subsection addresses the implementation of the Device Identification Unit.

Device Identification in FPGAs Using PUFs
The usual target devices for reusable design using IP cores are Field Programmable Devices (FPLs), mainly FPGAs. The two main FPGA manufacturers, Xilinx Inc. (San Jose, CA, USA) and Altera Corp. (San Jose, CA, USA), provide mechanisms for the identification of their devices. In the case of Xilinx devices, the families from Spartan-6 onwards contain an embedded, device-unique 57-bit identifier named "device DNA", permanently programmed into the FPGA and unchangeable. The DNA information is stored in an eFUSE (electrically programmable fuse) register, and can be read internally from the FPGA using the DNA_PORT primitive supplied by Xilinx. The DNA feature is not supported by all device families. Altera uses a similar technique, providing a 64-bit Altera Unique Chip ID, which can be retrieved by means of the ALTCHIP_ID IP core. As in the case of Xilinx DNA, ALTCHIP_ID is not available for all devices. Thus, using the unique DID provided by FPGA vendors makes the IP core technology dependent, and restricts the devices where it can be implemented. Physical Unclonable Functions (PUFs) [15,16,18] represent an alternative capable of providing a DID based on process variations in the manufacturing of the device. The selected PUF must have two main properties:
• Variability. The PUF should generate a different DID for each device, and this DID must have good random statistical properties over different devices.
• Repeatability. The PUF should always generate the same DID in the same device, independently of the operating conditions.
There are different implementations proposed for PUFs on FPGAs [16][17][18][19][20], and an analysis of these proposals is outlined in [21]. According to [21], implementations based on the different frequencies generated by ring oscillators are more suitable for FPGAs than other solutions, such as the arbiter PUF [20] or the butterfly PUF [19]. In [18], a specific PUF for FPGAs is developed and presented, but it depends on the internal structure of the logic elements of the device. Here, we use a PUF implementation based on pairs of oscillators whose frequencies differ slightly as a consequence of process variations [17], as shown in Figure 3. Each oscillator feeds a counter, and the difference in frequency causes a difference in the values presented by the counters at a given time. This difference is detected by a comparator, generating a bit which is stored in a latch or flip-flop (FF block). The oscillators are usually implemented using Ring Oscillators (RO), as proposed in [16], allowing the control of the oscillator frequency by increasing or reducing the number of elements in the ring. In this paper, a new implementation of the oscillators is proposed, using a T flip-flop chain as a frequency divider for achieving a better waveform, while reducing the area of the counters. The length of the T flip-flop chain determines the stability of the clock signal obtained from the oscillator, and experimental tests carried out using a logic analyser show that a chain with 4 flip-flops provides a stable waveform. Moreover, the oscillators have a start input for synchronizing the start time of both oscillators, and a capture signal for establishing the time when the counters are compared. The interval between these two time instants must be long enough to allow the counters to exhibit different values, which determines the size of the counters. Generation of 960 PUF bits on 20 Altera DE-1 boards (Cyclone II EP2C20F484C7 devices) with 8-bit counters provides 502 '0's and 458 '1's. The difference between the occurrences of '0's and '1's is mainly due to the fact that equal values in the counters result in a '0' at the PUF output. Thus, 8-bit counters provide enough variability while maintaining a limited overhead.
On the other hand, repeatability has been tested by performing 1000 repetitions of a 48-bit NANDTO (NAND T-chain oscillator) PUF on each board. The experimental results provided a probability of obtaining an erroneous PUF output of $P_{f48} = 0.05$. To improve repeatability, a majority function has been implemented, selecting the most repeated value in a sequence of three PUF calculations. With this improvement, the probability of failing in the calculation of a 48-bit NANDTO PUF drops to $P^{m3}_{f48} = 0.003$. Moreover, if 10 activation attempts are performed at boot time of the protected system, the probability of failing in the activation process is negligible.
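The comparator, the majority function and the effect of repeated attempts can be sketched in a few lines of Python. Several points are assumptions made only for illustration: the comparator is taken to output '1' when the first counter is strictly ahead (ties give '0', as stated above), the majority is applied bit by bit over three readings, and the attempts are treated as statistically independent. The function names are hypothetical.

```python
def puf_bit(count_a: int, count_b: int) -> int:
    """Comparator: '1' only if counter A is strictly ahead; equal counters give '0'."""
    return 1 if count_a > count_b else 0


def majority3(r1: int, r2: int, r3: int) -> int:
    """Bitwise majority of three PUF readings packed as integers."""
    return (r1 & r2) | (r1 & r3) | (r2 & r3)


def all_attempts_fail(p_fail: float, attempts: int) -> float:
    """Probability that every attempt fails, assuming independent attempts."""
    return p_fail ** attempts


# Three noisy readings of the same 4-bit response; single-bit errors are voted out.
stable = majority3(0b1010, 0b1000, 0b1110)

# Failure probability after 10 attempts, starting from the measured 0.003.
p_boot_failure = all_attempts_fail(0.003, 10)
```

Under the independence assumption, ten consecutive failures have a probability of the order of $10^{-26}$, which is consistent with describing the failure of the activation process as negligible.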
To complete the PUF design, a control unit generates the start and capture signals that synchronize the start and end times of the PUF calculation. Figure 4 shows the proposed structure for the NANDTOs, and Table 1 shows the implementation results for 32-bit, 48-bit and 64-bit NANDTO PUFs over several devices from Altera and Xilinx. Implementations were performed using the following development boards:

The PUF generation requires 770 clock cycles, which is equivalent to 38.5 µs when operating at 50 MHz. The area required is not negligible, because two counters (of at least 8 bits) along with the corresponding comparator must be implemented for obtaining each PUF bit.
With the NANDTO PUF presented, a DID is available for identifying the device even when the manufacturer does not provide one. In the next section, the secure channel for interchanging information between the device and the vendor is detailed.

Low Area ECC Unit for Implementing a PKC in IP Cores
As pointed out in the Introduction, Elliptic Curve Cryptography (ECC) [22,24] is especially suitable for hardware implementation when GF($2^m$) fields are used. In the protection system described in this work, an ECC Unit is included for securing communications between the device and the IP vendor. The next subsection introduces the background on ECC.

Elliptic Curves over Finite Fields
An elliptic curve E defined over a finite field GF(q) consists of a set of points $P = (x_p, y_p)$, where $x_p$ and $y_p$ are elements of GF(q) satisfying the Weierstrass equation [24], together with the point at infinity, O. In the standard [22], the curves are defined by means of two coefficients $a \in$ GF(q) and $b \in$ GF(q), named the coefficients of E. If q is a power of 2, it must be $b \neq 0$ in GF($2^m$), and the points $P = (x_p, y_p)$ over E (except O) must satisfy the equation:

$$y_p^2 + x_p y_p = x_p^3 + a x_p^2 + b$$

If $q = 2^m$, the field elements can be represented using polynomial bases, by choosing an irreducible polynomial f(t). In this case, an element a of the field is represented using the bit string:

$$(a_{m-1} \ldots a_1 a_0)$$

corresponding to the polynomial

$$a_{m-1} t^{m-1} + \cdots + a_1 t + a_0$$

where the $a_i$ are elements of GF(2). When considering binary fields, the inverse of the point P(x, y) of the curve E is defined as:

$$-P = (x, x + y)$$

One of the characteristics enabling the use of elliptic curves in cryptography is the possibility of defining an internal operation on the curve, named elliptic addition. Geometrically, the sum of two points P, Q is given by a point R = P + Q with the property that P, Q, and −R lie on the same straight line. From the full elliptic addition, the scalar product of a point P of the curve E and a natural number n is defined as:

$$nP = \underbrace{P + \cdots + P}_{n}$$

Moreover, given a positive prime integer r dividing the number of points on E, #E, and a curve point G of order r (generator of a subgroup of order r), an EC key pair can be defined. Specifically, taking $s \in [1, r - 1]$ as an EC private key, and $W = sG$ as the associated EC public key, a public key cryptosystem for interchanging a secret value can be established. In fact, if two communicants A and B generate their corresponding EC key pairs $(s_A, W_A)$, $(s_B, W_B)$, party A can compute the secret value $P = s_A \cdot W_B$ using the public key of B. Then, B can recover the same secret value from the public key of A, computing $P = s_B \cdot W_A$, because:

$$s_B \cdot W_A = s_B \cdot s_A \cdot G = s_A \cdot s_B \cdot G = s_A \cdot W_B$$

This is the basic principle for public key cryptography using EC, and the conditions to be met by the parameters involved and the details of the different primitives can be found in [22,24,31].

Domain Parameters for the ECC Unit
For the interchange of secure information between two counterparts, a set of domain parameters for the EC cryptosystem must be established. The ECC unit must provide a secure communication channel with low area requirements, so we selected the finite field GF($2^{163}$) from the standard [23]. Using the curve B-163 from [32], the EC domain parameters used in the ECC Unit will be the following:

$r = 5846006549323611672814742442876390689256843201587$ (number of elements of the subgroup)
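The key exchange of the activation protocol over B-163 can be sketched in Python as follows. This is a minimal, non-constant-time illustration and not the hardware design: it uses affine coordinates and a plain double-and-add scalar product instead of the Montgomery ladder discussed in the next subsection. The curve constants are transcribed from the published B-163 parameters and should be checked against the standard before any use. The helper `make_cic`, the way the CIC is reduced into $[1, r-1]$, the use of the x coordinate as SSV, and the sample DS, DID and CDL values are all assumptions introduced for the example.

```python
M = 163
F_POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1  # t^163 + t^7 + t^6 + t^3 + 1
A = 1
B = 0x020A601907B8C953CA1481EB10512F78744A3205FD
G = (0x03F0EBA16286A2D57EA0991168D4994637E8343E36,
     0x00D51FBC6C71A0094FA2CDD545B11C5C0C797324F1)
R = 5846006549323611672814742442876390689256843201587   # order of G
O = None                                                 # point at infinity


def gf_mul(a, b):
    """Product of two field elements (bit strings as ints) modulo F_POLY."""
    res = 0
    while b:
        if b & 1:
            res ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= F_POLY
    return res


def gf_inv(a):
    """Field inverse by the extended Euclidean algorithm on binary polynomials."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    u, v, g1, g2 = a, F_POLY, 1, 0
    while u != 1:
        j = u.bit_length() - v.bit_length()
        if j < 0:
            u, v, g1, g2, j = v, u, g2, g1, -j
        u ^= v << j
        g1 ^= g2 << j
    return g1


def ec_add(p, q):
    """Elliptic addition on y^2 + xy = x^3 + A x^2 + B (affine coordinates)."""
    if p is O:
        return q
    if q is O:
        return p
    (x1, y1), (x2, y2) = p, q
    if x1 == x2:
        if y1 != y2 or x1 == 0:          # q = -p, or p has order two
            return O
        lam = x1 ^ gf_mul(y1, gf_inv(x1))                # doubling slope
        x3 = gf_mul(lam, lam) ^ lam ^ A
        return (x3, gf_mul(x1, x1) ^ gf_mul(lam ^ 1, x3))
    lam = gf_mul(y1 ^ y2, gf_inv(x1 ^ x2))               # addition slope
    x3 = gf_mul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    return (x3, gf_mul(lam, x1 ^ x3) ^ x3 ^ y1)


def ec_mul(k, p):
    """Scalar-point product k*p by left-to-right double-and-add."""
    acc = O
    for i in reversed(range(k.bit_length())):
        acc = ec_add(acc, acc)
        if (k >> i) & 1:
            acc = ec_add(acc, p)
    return acc


def on_curve(p):
    x, y = p
    x2 = gf_mul(x, x)
    return gf_mul(y, y) ^ gf_mul(x, y) == gf_mul(x2, x) ^ gf_mul(A, x2) ^ B


def make_cic(ds: bytes, did: bytes) -> int:
    """Assumed mapping: CIC = DS || DID read as an integer, forced into [1, R-1]."""
    return int.from_bytes(ds + did, "big") % (R - 1) + 1


ds, did = b"LICENSE-0001", bytes.fromhex("a1b2c3d4e5f6")  # hypothetical DS and 48-bit DID
cic = make_cic(ds, did)                      # private key of the core
arc = ec_mul(cic, G)                         # ARC, sent to the vendor
cdl = 0x1234567890ABCDEF1234567890ABCDEF     # hypothetical vendor private key (CDL)
ac = ec_mul(cdl, G)                          # AC, sent to the customer
ssv_vendor = ec_mul(cdl, arc)[0]             # x coordinate taken as SSV (assumption)
ssv_core = ec_mul(cic, ac)[0]
```

Both sides obtain the same 163-bit value, while a core with the same DS running on a device with a different DID derives a different one from the same AC.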

Scalar-Point Product Using the Montgomery Ladder Algorithm
As shown above, the scalar-point product is the basic operation for ECC, and it is the operation performed by the ECC Unit. There are several algorithms for scalar-point multiplication, but the Montgomery Ladder (ML) algorithm [31] is the one preferred for EC cryptography. In fact, the ML algorithm always takes the same number of steps, hindering side channel attacks, and the basic operation in each step is the field multiplication if projective or mixed coordinates are used [31]. Inversion (the most costly field operation) is only needed for coordinate conversion and for recovering the y coordinate (the algorithm operates only on the x coordinate) [31]. Writing:

$$k = \sum_{i=0}^{m-1} k_i 2^i, \quad k_i \in \{0, 1\}$$

Algorithm 1 provides the scalar-point product kP. Note that the number of steps is fixed by m and is independent of the value of k, thus hindering side channel attacks based on analyzing power consumption. The operations required in Algorithm 1 are point addition and doubling. When using projective coordinates, the operations in the main loop reduce to field multiplications, additions and squarings [31]. Additions and squarings are combinational operations, and inversion is only required for coordinate conversion, resulting in an execution time of [28]:

The execution time can be reduced if only Koblitz curves [32] are used [28]:

For the sake of generality, the ECC unit will support all types of standard curves, with the approximate execution time stated in Equation (8). In the implementation of the ECC Unit, performance is not critical because the PKC operations will be carried out only during the "boot" process. Thus, the design is oriented towards optimizing the area requirements.
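The fixed-length structure of the ladder can be sketched generically in Python. To keep the example short and self-contained, the group operation is passed in as a function, and integer addition modulo a small number is used as a stand-in for elliptic addition; this swap, the class `CountingGroup` and the function name are assumptions for illustration only. The sketch shows the step count, not the x-only projective formulas of [31], and the Python branch on the key bit is not itself side-channel safe.

```python
def montgomery_ladder(k, p, m, add, zero):
    """Return k*p using exactly m ladder steps (requires 0 <= k < 2**m).

    Invariant after every step: r1 = r0 + p.
    """
    r0, r1 = zero, p
    for i in reversed(range(m)):
        if (k >> i) & 1:
            r0, r1 = add(r0, r1), add(r1, r1)   # bit 1: r0 <- r0 + r1, r1 <- 2*r1
        else:
            r0, r1 = add(r0, r0), add(r0, r1)   # bit 0: r0 <- 2*r0, r1 <- r0 + r1
    return r0


class CountingGroup:
    """Stand-in group: integers modulo `modulus` under addition, counting operations."""

    def __init__(self, modulus):
        self.modulus = modulus
        self.additions = 0

    def add(self, a, b):
        self.additions += 1
        return (a + b) % self.modulus


group = CountingGroup(1009)
result = montgomery_ladder(k=12345, p=5, m=16, add=group.add, zero=0)
ops_for_large_k = group.additions

group_small = CountingGroup(1009)
montgomery_ladder(k=1, p=5, m=16, add=group_small.add, zero=0)
ops_for_small_k = group_small.additions
```

Each step performs one addition and one doubling whatever the key bit is, so the total number of group operations is 2m for any k below $2^m$.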
The main field operations required for the scalar-point operation are multiplication and inversion. For field multiplication, the implementation in [33] completes the operation over GF($2^{163}$) using only 504 LUTs and 163 clock cycles. Regarding inversion, although very efficient implementations based on the Itoh-Tsujii algorithm [29] are available, the divider proposed in [34] needs only 836 LUTs on Xilinx devices [28]. Figure 5 shows the block diagram of the ECC Unit, and Table 2 presents the implementation results for the proposed ECC scalar-point multiplication unit. The area required by the proposed ECC Unit is slightly higher than that of the design presented in [28] for Virtex 5 devices (4100 LUTs), but our design can operate with non-Koblitz curves such as B-163 [32]. Regarding performance, a scalar-point operation is completed in 8 µs when operating at 50 MHz.

Activation Modes Circuit (AMC) Design
The third main block of the SEHAS protection system is the Activation Modes Circuit (AMC). This activation circuitry enables the different modes in the protected IP core, and performs the switching between modes as a function of the Secure Shared Value (SSV) generated from the AC. The AMC includes part of the combinational logic of the protected core, limiting the functionality of the core if the SSV is not valid. Figure 6 shows the structure of the AMC, which includes a ROM, a set of XOR gates, and a register containing the SSV. If $f_1, f_2, \ldots, f_n$ are the combinational functions to be implemented in the AMC, the function $g_i$ stored in the ROM is $g_i = f_i \oplus SSV(i)$, where $SSV(i)$ is the i-th bit of the SSV. Thus, the only option for recovering the original $f_i$ functions is providing the correct SSV. The implementation of the combinational functions using a ROM presents two advantages:
1. The vendor can provide the $g_i$ functions by sending only the ROM configuration. This method does not produce any modification in the original circuit and, consequently, does not affect the PUF output.
2. The vendor can provide $g_i$ functions for entering Demo mode, if desired, without changing the structure of the circuit.
If a finite field GF($2^m$) is used, the SSV will have m bits, and a maximum of m combinational functions can be included in the ROM. If the functions have n variables, the size of the memory will be $2^n \cdot m$ bits. If m = 163, as stated in Section 3.2.2, and functions with n = 5 are considered, a memory of 5216 bits is required. Although current programmable devices have dedicated memory blocks that can host this amount of memory without difficulty, not all designs contain this number of combinational functions. To maintain the level of security, the number of output bits must be kept around m = 163, reducing the number of address bits of the ROM and adding multiplexers at the output. As an example, if 10 functions of 5 variables are available, they can be included in a ROM with a 1-bit address and 160 output bits, followed by ten 16-to-1 multiplexers. In the AMC, the multiplexers are added at the output of the XOR gates. With these elements, the area of the AMC for accommodating k functions of n variables can be estimated as follows: $h = \lfloor m/k \rfloor$ will be the size of the k multiplexers at the output of the AMC. If $l = \log_2(h)$, then the number of 6-input LUTs (Spartan 6 and Virtex 6 devices) required is given by:

Regarding the ROM memory, the number of address bits will be $q = n - l$, and the ROM size:

$$\#bits(ROM) = 2^q \cdot m$$

Finally, the number of LUTs required for the XOR gates is #LUTs(xor) = m, and the SSV register takes #LUTs(SSV) = m. Thus, for the example above with k = 10 and n = 5, we will have h = 16, l = 4, #LUTs(mux) = 752, q = 1, #bits(ROM) = 326, #LUTs(xor) = 163, and #LUTs(SSV) = 163.
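The sizing arithmetic of the AMC can be checked with a short Python helper. It is a sketch under stated assumptions: the multiplexer size is rounded down to a power of two, the number of select bits is capped at n, and the multiplexer LUT count is deliberately left out. The function name `amc_estimate` and the returned field names are hypothetical.

```python
def amc_estimate(k: int, n: int, m: int = 163) -> dict:
    """Estimate AMC resources for k functions of n variables and an m-bit SSV."""
    if k < 1 or k > m:
        raise ValueError("k must be between 1 and m")
    l = (m // k).bit_length() - 1      # select bits: floor(log2(m / k))
    l = min(l, n)                      # cannot use more select bits than variables
    h = 2 ** l                         # inputs of each output multiplexer
    q = n - l                          # remaining ROM address bits
    return {
        "mux_inputs": h,
        "mux_select_bits": l,
        "rom_address_bits": q,
        "rom_bits": (2 ** q) * m,
        "luts_xor": m,
        "luts_ssv": m,
    }


example = amc_estimate(k=10, n=5)      # the case discussed in the text
full_rom_bits = (2 ** 5) * 163         # plain ROM with 163 functions of 5 variables
```

For k = 10 and n = 5 the helper gives h = 16, l = 4, q = 1 and a 326-bit ROM, compared with 5216 bits for the plain ROM with all five variables used as address bits.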

SEHAS Area and Performance Analysis
Previous sections have described the different blocks integrating SEHAS. Now, the complete SEHAS protection system will be analyzed, with special attention to the area and performance overhead. Because SEHAS allows different options, four designs will be considered:
• SEHAS-DID-ROM. In this design, the DID provided by the device manufacturers is used. Thus, DNA_PORT has been used for Xilinx devices, while ALTCHIP_ID was not available for the Altera devices used for tests. Regarding the AMC, a ROM with no encryption is used, resulting in the design with the lowest area overhead.
• SEHAS-PUF48-ROM. This design assumes the unavailability of a DID supplied by the manufacturers, and makes use of a 48-bit NANDTO PUF. The AMC includes a ROM with no encryption.
• SEHAS-DID-AES. This design considers a SEHAS protection module taking advantage of DID functions provided by device manufacturers, and including an AMC with a ROM encrypted using AES-128.
• SEHAS-PUF48-AES. Finally, a design using a 48-bit PUF and an AES encrypted ROM is considered.
Table 3 shows the results corresponding to these four designs. SEHAS-DID-ROM requires only 5200 LUTs for implementing the Hardware Activation protocol. This is affordable for medium-size cores, representing an area overhead of 5% on Cyclone IV (114,000 LEs) or 10% on Spartan-6 devices (55,000 LUTs). The total time required to complete an activation process, taking into account that SEHAS carries out 10 activation attempts before reporting a "failed activation", is around 32.6 ms.
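The overhead figures quoted above follow from simple ratios, which the following Python lines reproduce. The function name is hypothetical, the comparison of LUTs against Cyclone IV logic elements is only approximate, and the per-attempt time assumes that the ten attempts take equal time.

```python
def area_overhead_percent(sehas_luts: int, device_cells: int) -> float:
    """Share of the device logic taken by the protection system, in percent."""
    return 100.0 * sehas_luts / device_cells


cyclone_iv = area_overhead_percent(5200, 114_000)   # about 4.6 %, quoted as 5 %
spartan_6 = area_overhead_percent(5200, 55_000)     # about 9.5 %, quoted as 10 %

# Worst case of 10 attempts in about 32.6 ms, i.e. roughly 3.26 ms per attempt.
per_attempt_ms = 32.6 / 10
```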
Regarding power consumption, SEHAS requires around 800 mW at 50 MHz on Spartan-6 devices, but it operates only at boot time. Thus, SEHAS does not affect the power consumption of the IP core during normal operation. It is difficult to compare SEHAS with other procedures, because it is the only protection method providing all the Hardware Activation features. There are other proposals, such as [35,36], using PUFs and symmetric encryption, but [35] uses a "simulated" PUF, and authentications are performed at FPGA configuration time. The proposal in [36] is based on integrating a PUF into the FPGA along with an AES module, thus being device dependent.

Conclusions
In this work, a new procedure for the protection and Hardware Activation of IP cores implemented on FPGAs has been presented. The method, named SEHAS, allows the distribution of IP cores in Blocked or Demo mode, and their activation, switching the protected system to Activated mode, by means of an Activation Code (AC) provided by the vendor. The AC takes into account not only the identification of the core through a Digital Signature, but also the identification of the target device using a PUF. The PUF proposed in this work, based on the NANDTO oscillators, allows the identification of any FPGA device, independently of the FPGA vendor or the device family. Also, a secure channel for interchanging information between the IP core and the vendor is established by means of an Elliptic Curve Cryptosystem, including an ECC unit in the protection system. With these elements, SEHAS constitutes a hardware activation system with high levels of security, although the area requirements (around 8000 LUTs, as shown in previous sections) make SEHAS suitable only for high-value IP cores. Nevertheless, the area resources can be significantly reduced in exchange for security, using ECC fields outside the standards. Regarding performance, SEHAS is intended not to limit the operating frequency of the IP core under protection, as the results have shown.

Figure 1. Flow diagram for the hardware activation process.

Figure 2. Block diagram of a SEHAS protected IP core.

Table 1. Area and delay figures for NANDTO PUF implementation on different devices.

Table 2. Area and delay figures for ECC unit implementation on different devices.