Time- and Amplitude-Controlled Power Noise Generator against SPA Attacks for FPGA-Based IoT Devices

Abstract: Power noise generation for masking power traces is a powerful countermeasure against Simple Power Analysis (SPA), and it has also been used against Differential Power Analysis (DPA) or Correlation Power Analysis (CPA) in the case of cryptographic circuits. This technique makes use of power consumption generators as basic modules, which are usually based on ring oscillators when implemented on FPGAs. These modules can be used to generate power noise and also to extract digital signatures through the power side channel for Intellectual Property (IP) protection purposes. In this paper, a new power consumption generator, named Xored High Consuming Module (XHCM), is proposed. Compared to other proposals in the literature, XHCM improves the amount of current consumption per LUT when implemented on FPGAs. Experimental results show that these modules can achieve current increments in the range from 2.4 mA (with only 16 LUTs on Artix-7 devices, with a power consumption density of 0.75 mW/LUT, when using a single HCM) to 11.1 mA (with 67 LUTs when using 8 XHCMs, with a power consumption density of 0.83 mW/LUT). Moreover, a version controlled by Pulse-Width Modulation (PWM) has been developed, named PWM-XHCM, which is, like XHCM, suitable for power watermarking. In order to build countermeasures against SPA attacks, a multi-level XHCM (ML-XHCM) is also presented, which is capable of generating different power consumption levels with minimal area overhead (27 six-input LUTs for generating 16 different amplitude levels on Artix-7 devices). Finally, a randomized version, named RML-XHCM, has also been developed using two True Random Number Generators (TRNGs) to generate current consumption peaks with random amplitudes at random times. RML-XHCM requires fewer than 150 LUTs on Artix-7 devices.
Taking into account these characteristics, this article makes two main contributions: first, XHCM and PWM-XHCM provide an efficient power consumption generator for extracting digital signatures through the power side channel; second, ML-XHCM and RML-XHCM are powerful tools for the protection of processing units against SPA attacks in IoT devices implemented on FPGAs.


Introduction
Security of the information generated and managed by Internet of Things (IoT) devices is one of the main current challenges [1] within the IoT context. Several advances have been achieved in recent years: introducing encryption in communication protocols such as MQTT-SN [2], optimizing cryptographic algorithms to be executed on low-cost microcontrollers [3], or using low-cost FPGAs to implement IoT platforms including hardware cryptoprocessors to add public-key cryptography [4]. These security features are focused on the protection of the information when entering or leaving the device, but information being processed by the microcontroller or microprocessor should also be protected against Side Channel Attacks (SCA) [5]. In this sense, power consumption is one of the main side channels widely used to extract sensitive information from IoT devices [5,6] and processing systems [7][8][9], and to attack cryptographic algorithms implemented in hardware [10][11][12] or software [13]. These attacks are based on acquiring power traces from the target system and analyzing them later. Depending on the type of analysis performed over these power traces, it is possible to distinguish three main power SCAs:
• Simple Power Analysis (SPA) consists of extracting information directly from power traces [14]. This type of attack does not require specialized instrumentation and allows extracting information from a low number of power traces; thus, it is critical to implement some kind of countermeasure to prevent these attacks. These countermeasures are usually based on masking operations executed in the system [15] by randomizing the order in which instructions are executed [16], or masking the data involved in some operations by means of arithmetic or boolean operations with random values [17]. Other very effective countermeasures consist of generating "power noise" to prevent useful information from being extracted from the power traces [18,19].
• Differential Power Analysis (DPA) is based on analysis of the statistical correlation of acquired power traces [20]. Attacks based on DPA are more difficult to avoid than those based on SPA, especially if they are combined with other techniques such as Artificial Intelligence (AI) [21], because the effects of masking or noise introduction can be overcome. On the other hand, this type of attack requires the acquisition of a high number of power traces, and frequent changes of the secret key can be an effective countermeasure in the particular case of cryptographic circuits. In this case, a Public-Key Cryptosystem (PKC) is usually required, which may be provided by hybrid cryptoprocessors such as those in [22] or [4] in order to perform these frequent changes securely. Other countermeasures in the literature include advanced masking [23] or sophisticated power noise generation [24].
• Correlation Power Analysis (CPA) is based on the same principles as DPA, and the only difference is that CPA makes use of the correlation factor instead of plain correlation [25] to guess the secret key or any other relevant information. It provides similar results and also requires a high number of power traces [26,27].
In the case of hardware or software implementations of cryptographic algorithms, the operations to be performed are always the same, they are well known, and it is possible to obtain thousands of power traces generated with the same private key. All these facts make feasible the use of CPA or DPA attacks. In the case of processing units, CPA and DPA attacks are more difficult to perform because the instructions to be executed are not known and can vary due to interrupts, interaction with other devices, communications, data values, etc. On the other hand, with a well-trained classifier, it is possible to perform SPA attacks to disassemble the program being executed [28]. In addition, an SPA is simple to apply [10,29]; thus, it is critical to implement countermeasures against it. Indeed, it can be applied by direct observation of power consumption traces using an oscilloscope or a DC power analyzer. Its simplicity makes it affordable in a variety of situations [9,11], but it is also possible to establish countermeasures [30,31].
In the case of attacks based on monitoring power consumption, one of the most used countermeasures consists of generating additional power consumption by means of dedicated circuits to hide the information included in the power traces of the target circuit or algorithm [32]. These dedicated circuits generating significant power consumption are usually based on Ring Oscillators (RO) [33][34][35][36] that can be activated or deactivated by means of a control signal, thus allowing control of the time when the consumption peaks are generated. Other structures such as the High Consuming Module (HCM) presented in [37] may be used for this purpose, but both ROs and HCMs have the drawback that once implemented, it is not possible to control the amplitude of the corresponding power consumption peak. Additionally, HCMs have been proven to be a useful tool to extract digital signatures through the side channel in order to protect the Intellectual Property (IP) of devices.
In this paper, we propose two modifications of the HCM in [37] to both optimize the power consumption generated per LUT in FPGAs and to allow its real-time control.
The first modification, named Xored High Consuming Module (XHCM), can be controlled by means of Pulse-Width Modulation (PWM), taking advantage of the high current consumption range offered by these elements to achieve fine-grained control of the generated amplitude. This also enables new features for power watermarking [38] and IP core protection. The other modification, named Multi-Level Xored High Consuming Module (ML-XHCM), enables the generation of different power consumption levels with minimal area overhead, thus being suitable to implement countermeasures against SPA attacks.
The rest of the article is organized as follows: Section 2 reviews power-based side channel attacks and the available countermeasures; Section 3 presents different circuits used to generate additional power consumption in FPGAs and introduces the Xored High Consuming Modules, showing experimental results and comparing them to HCMs; Section 4 is devoted to the description of the two proposals of controlled XHCMs for the protection against SPAs of information processed in FPGA-based IoT platforms; Section 5 summarizes the experimental results and the comparison to other power noise generators; and Section 6 presents some conclusions and future work.

Previous Work
In this section, we introduce the most common side channel attacks described in the literature for processing and cryptographic circuits, as well as the available countermeasures for SPAs, focusing on those based on the generation of power consumption noise.

Side Channel Attacks on Computing and Cryptographic Circuits
The first side channel attacks were developed to attack cryptographic circuits, as in [20], where an SPA on a smart card performing Data Encryption Standard (DES) [39] operations is described. This shows the importance of protecting circuits managing sensitive information against power consumption analysis. This type of attack may lead to obtaining the secret key, but on the other hand, there are some countermeasures available in the literature to hinder it. These countermeasures are mainly based on avoiding conditional jumps and carefully studying the timing of the different operations when protecting software implementations [40], or improving dynamic power consumption to mask power variations. Further, in [20], the basis for a DPA on DES is described. This attack is more difficult to avoid because it is based on the statistical correlation of a set of power consumption traces, but at the same time, the need for thousands of power traces enables the frequent change of the secret key as a countermeasure. Nowadays, DES has been abandoned due to its security issues, and the Advanced Encryption Standard (AES) has emerged as the new standard for symmetric encryption. AES construction is different from that of DES, but it presents similar vulnerabilities to SPA and DPA attacks, as reported in [10,29,41,42]. Additionally, the acquisition of power traces can be combined with the setup of a system of algebraic equations in order to obtain the secret key. This technique is known as algebraic crypto-analysis, and some works have reported successful attacks on AES, although the solving of these systems of equations is not trivial [43,44].
Other powerful techniques based on different principles are the one proposed in [45], where fault injection is combined with side channel attacks; the proposal in [21], where AI is combined with correlation power analysis (CPA); the use of collision attacks as in [46]; or the use of AI combined with EM attacks [47]. Although exploitation of side channels was originally intended to attack cryptographic algorithms, their application has been extended to general computing systems [7,8], microcontrollers used in IoT devices [6,28,48] and, more recently, to systems implementing neural networks [9,49]. In these processing systems, a high amount of sensitive information is managed and, even if the exchange and transmission of information is usually protected by means of cryptographic algorithms, it is vulnerable to power analysis of the processing system itself. Moreover, in these processing systems, the countermeasures developed for cryptographic circuits are not always suitable. Indeed, in a cryptographic algorithm the sequence of operations is well-known, and it is possible to use countermeasures such as masking [23], which can be effective against SPA, CPA and DPA attacks. Nevertheless, in general computing systems, the attacks are oriented to find out what instructions are being executed and thus to try to extract associated data. In this case, the attacks are based on previous training of a classifier in order to recognize the power patterns generated by the instructions and a later SPA over the target system [28,48]. Therefore, it is required to implement countermeasures against SPA attacks in both computing and cryptographic circuits.

Countermeasures for SPA Attacks
There are different proposals of countermeasures for SPA, DPA and CPA attacks. As previously noted, DPA and CPA attacks are more effective than SPA attacks when applied to cryptographic circuits, as these circuits always perform the same operations, and it is possible to collect sets containing thousands of power traces generated using the same private key to apply statistical correlation techniques. Countermeasures against DPA and CPA are also effective against SPA attacks, and there are several approaches to protect software and hardware implementations of cryptographic algorithms, particularly for AES, which is currently the most widespread symmetric cipher. In the case of hardware implementations, these countermeasures are mainly based on masking and hiding techniques [50]. Regarding software implementations, masking techniques try to conceal information being processed by applying randomly generated masks to intermediate values using arithmetic [15] or boolean [17] operations. This makes it difficult to associate peaks in power traces with specific intermediate values of AES (or any other cryptographic algorithm) operations. Hiding techniques try to hinder the extraction of information from power traces in two dimensions: time and amplitude. In the case of hiding in the time domain, the idea is to randomize the order in which some operations are performed [23], while hiding in the amplitude domain implies introducing modifications to power consumption [51]. Note that when considering the protection of computing systems or IoT microcontrollers, the instructions and operations may be unknown or may be very different depending on the data or the interaction with external systems. In this context, CPA or DPA techniques are not feasible, and SPAs are the main concern in the protection of the power side channel in such systems. The main proposals for hardware SPA protection in the literature are summarized in the following:
• Use of voltage regulators. In [52], it is shown that Low-Dropout Regulators (LDO) help to de-correlate the input current from the current drawn by the circuit. A more advanced proposal is presented in [53], where a converter-gating technique including a multi-phase switched-capacitor converter is used for de-correlating currents. These solutions are intended for ASIC implementations of the circuit under protection, where the voltage regulator is included in the same chip.
• Generating power noise. This technique considers power consumption as an output signal and generates "noise" on that output to hide the contents of this signal [54]. These methods make use of Finite State Machines [32] or ring oscillators to generate such noise [19].
• Masking arithmetic or boolean operations. As previously noted, a method to hide the results of cryptographic operations is to mask the operands or the results with boolean or arithmetic operations with random values. In [55], the addition of a randomly generated mask is proposed for protecting AES against SPA and DPA, while [56] proposes the use of multiplication as the masking arithmetic operation. As an example of the use of boolean masking, [57] proposes the use of the XOR operation. These techniques are not suitable for microcontrollers used in IoT because they require additional area and increase the processing time of software programs.
• In the case of processing systems, the combination of the techniques above with the specific design of some modules of the microprocessor can lead to the implementation of designs resistant to power attacks. An example can be found in [58], in which a RISC-V processor is modified in order to be resistant to power side channel attacks, but at the cost of severe area overhead.
Taking into account the advantages and drawbacks of each protection method against SPAs, in this paper we focus on the implementation of an effective power noise generator to hinder SPA attacks on microcontrollers implemented on FPGAs.

Power Watermarking
Power watermarking [38] enables the use of the power consumption side channel to extract license or intellectual property information from a digital circuit. This information is usually a watermark or a digital signature [59] that unequivocally identifies the owner of the design under protection. The main issue in these protection systems is how to generate the variations on power consumption to enable easy extraction of the watermark from the power signal. In [37], it is shown that HCMs generate suitable signals for this purpose. In principle, HCMs do not significantly increase the power consumption of the system under protection because they are activated only when extraction of the watermark is required. To apply these modules to protect a processing unit against SPA attacks, they have to be modified to generate peaks of power consumption with an amplitude similar to that generated by the execution of a processor instruction, thus generating reasonable power consumption overhead.

Power Noise Generation
Power noise generation [19,32] is one of the main countermeasures used against SPA, DPA and CPA attacks. It can be used standalone or in combination with other countermeasures such as arithmetic or boolean masking, as it acts directly on the power side channel. The idea is to generate power consumption peaks similar to those generated by the computation system or the cryptoprocessor under protection. In order to properly hide the information contained in the power traces, these generated peaks should meet two requirements:
• The amplitude of the peaks should be similar to that generated by the operations of interest.
• The times when the peaks are generated should be random.
With these two requirements, generated peaks should be indistinguishable from those generated by the circuits under protection. In principle, to build a Power Noise Generator (PNG) with these properties, two main elements are required: a Power Consumption Generator (PCG) and a True Random Number Generator (TRNG). These elements are studied in the following sub-sections.

Power Consumption Generators in FPGAs
To generate controlled power peaks, specific circuits producing instant power consumption are required. In general, the power consumption of a circuit implemented on an FPGA can be expressed as [60]:

$$P = V_{DD}^{2} \cdot f \cdot \sum_{i} C_{eff}(i) \cdot SW(i) \qquad (1)$$

where $C_{eff}(i)$ is the effective capacitance, and $SW(i)$ is the switching activity, both of element $i$. Note that in (1), it is assumed that all elements in the circuit are powered at the same voltage, $V_{DD}$, and are operating at the same frequency, $f$. In the case of FPGAs, $C_{eff}(i)$ can be considered equal if all elements under consideration are logic gates, as they are implemented in LUTs. In the case of including flip-flops, $C_{eff}$ will have a different value. From this equation, if we want to generate high power consumption with a small number of LUTs, high values for $f$ should be achieved. This is the idea behind using ring oscillators (ROs) as basic PCG circuits [33]. Indeed, an RO is the simplest structure for generating significant power consumption with low area requirements in an FPGA, as shown in Figure 1. This scheme corresponds to a Simple Ring Oscillator with one inverting element, SRO(1), which can be enabled or disabled by means of the enable input and the AND gate.

Figure 1. SRO with 1 inverting element.
In the case of an SRO with n elements, it is necessary to distinguish two cases depending on n being odd or even, as shown in Figure 2. From this figure, the increase in dynamic power consumption generated by SRO(n) can be estimated as in [60], by applying Equation (1) to the gates of the ring oscillating at frequency $f_{ro}$, where $C_{AND}$ and $C_{NOT}$ are the effective capacitances, and $SW_{AND}$ and $SW_{NOT}$ are the switching activities of the AND and NOT gates, respectively. Note that in the case of n being an even number, there is a NAND gate (AND gate with inverted output), but it can be considered that in an FPGA its effective capacitance is the same as that of an AND gate. In principle, power consumption generated by an SRO may be increased by adding more inverters. Indeed, according to Equation (1), there will be more elements in the sum, and power consumption should increase. However, there is a side effect: the delay of the signal from the beginning to the end of the ring is increased because there are more elements, and consequently, $f_{ro}$ decreases. Table 1 shows experimental results corresponding to SRO(n) for different values of n in a CMOD-A7 Digilent board including an Artix-7 XC7A35T-1CPG236C device from Xilinx/AMD. The designs have been implemented using the Vivado 2020.2 design suite. The board is powered at 5 V and includes voltage regulators to generate the different voltage levels required by the FPGA. This device has been selected because it is a low-cost, small-size FPGA suitable for IoT designs, and it is included on a board with a reduced number of peripherals that can easily be powered externally. In this table, column ∆I presents the increment of current when enabling the corresponding SRO, and ∆W is the corresponding increment in power. Measurements were obtained using a Keysight N6705C DC Power analyzer.
In [61], a detailed study of ring oscillator implementations in UltraScale devices from Xilinx is presented, with power consumption densities from 1.7 mW/LUT to 2.2 mW/LUT, which are consistent with results in Table 1, taking into account the technology step between Artix-7 and UltraScale devices.
From Table 1, there is no significant increase in power when adding a reduced number of inverting elements. In order to avoid the effect of $f_{ro}$ decreasing when n increases, one solution is to arrange n SRO(1) elements in parallel rather than using the SRO(n). In that case, the results in Table 2 were obtained, where m is the number of parallel SRO(1) elements.
A structure built on this idea is the High Consuming Module (HCM) presented in [37] and displayed in Figure 3. In this case, eight SRO(1) are operating in parallel, and the corresponding eight outputs are combined by means of AND and OR gates to generate additional switching activity without increasing the number of LUTs. Indeed, one HCM generates ∆I = 1.2 mA (∆W = 6 mW) requiring only 8 LUTs, while two HCMs operating in parallel generate ∆I = 2.1 mA (∆W = 10.5 mW) with 16 LUTs in an Artix-7 XC7A35T-1CPG236C device. It must also be noted that these modules do not affect the maximum operating frequency of the overall system, since they do not interact with the system clock, as shown in Figure 3.
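The relation between the reported current and power increments can be checked with a few lines of Python. This is only a sketch: it assumes that ∆W is obtained as the 5 V board supply multiplied by ∆I, which is consistent with the HCM figures quoted above, and the function names are illustrative rather than taken from the paper.

```python
SUPPLY_VOLTAGE_V = 5.0  # board supply voltage stated in the text


def power_increment_mw(delta_i_ma, supply_v=SUPPLY_VOLTAGE_V):
    """Power increment in mW for a current increment given in mA (assumed dW = V * dI)."""
    return delta_i_ma * supply_v


def power_density_mw_per_lut(delta_i_ma, luts, supply_v=SUPPLY_VOLTAGE_V):
    """Power increment per LUT, the figure of merit used to compare generators."""
    return power_increment_mw(delta_i_ma, supply_v) / luts


# HCM figures quoted in the text: (label, delta I in mA, LUTs)
for label, delta_i, luts in [("1 HCM", 1.2, 8), ("2 HCMs", 2.1, 16)]:
    dw = power_increment_mw(delta_i)
    density = power_density_mw_per_lut(delta_i, luts)
    print(f"{label}: dW = {dw:.1f} mW, density = {density:.2f} mW/LUT")
```

Under this assumption, one HCM (1.2 mA, 8 LUTs) yields 6 mW and 0.75 mW/LUT, matching the values given in the text.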

Xored High Consuming Modules (XHCMs)
The HCM presented in the previous section can be improved by replacing the AND/OR gates in Figure 3 with XOR gates, obtaining the so-called Xored High Consuming Module (XHCM). Introduction of XOR gates generates some more switching activity, since in the last level, the AND gate, with 25% switching probability, is replaced by an XOR function with 50% switching probability (in the rest of the levels, AND and OR gates are alternated and, on average, should have the same switching activity). At the same time, this requires the same number of LUTs when implemented on FPGAs. Experimental results show that one HCM generates 6.0 mW of power overhead with one instance and 10.5 mW when using two independent instances, while a single XHCM generates 6.5 mW, and two generate 11.5 mW, thus providing better results than HCMs. Note that the switching activity can be increased if the instances are gated with OR/AND functions in HCMs and XOR gates in XHCMs instead of being independent. Table 3 shows complete experimental results and comparisons of HCMs with XHCMs in gated and non-gated versions, where m is the number of HCM/XHCM instances. When comparing independent instances of HCMs and XHCMs, the latter provide a 7% improvement in power generated per LUT. In the gated versions, this improvement rises to 10-15%.
According to Table 3, gated XHCMs with m = 1 and m = 2 are suitable for generating power noise intended to mask power traces of processing systems or cryptographic processors, while those with m = 4 or m = 8 can be useful for generating signals to be transmitted through the power side channel due to the high current increase, which makes it easier to recover the signal, as illustrated in Figure 4. A complete study regarding the generation and recovery of power signals using HCMs for power watermarking (which is immediately applicable to XHCMs) can be found in [37].
In both cases, it would be desirable not only to control the time during which the additional power consumption is generated, but also to control the amplitude of the generated consumption peaks. This amplitude control, which is required to generate power noise against SCAs, is approached in the next section. In the following, we use the term XHCM for gated versions, as they are advantageous compared to independent XHCM instances.
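The 25% and 50% figures quoted for the AND and XOR gates can be reproduced by enumerating truth tables. The sketch below takes one possible reading of "switching probability", namely the fraction of input combinations for which a two-input gate outputs 1 when its inputs are independent and uniformly distributed; this reading, the helper names and the additional toggle estimate are assumptions for illustration, not definitions from the paper.

```python
from functools import reduce
from itertools import product
from operator import and_, or_, xor


def one_probability(gate, n_inputs=2):
    """Fraction of the 2**n input combinations for which the gate outputs 1."""
    outputs = [reduce(gate, bits) for bits in product((0, 1), repeat=n_inputs)]
    return sum(outputs) / len(outputs)


def toggle_probability(p_one):
    """Chance that two independent samples of the output differ."""
    return 2 * p_one * (1 - p_one)


for name, gate in [("AND", and_), ("OR", or_), ("XOR", xor)]:
    p = one_probability(gate)
    print(f"{name}: P(out=1) = {p:.2f}, toggle estimate = {toggle_probability(p):.3f}")
```

With this model, AND gives 0.25 and XOR gives 0.50, while AND and OR share the same toggle estimate, in line with the remark that alternating AND and OR levels have, on average, the same switching activity.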

Controlled XHCMs
A first approach for controlling the amplitude of the power signal generated by XHCMs can be the use of a Pulse-Width Modulation (PWM) module connected to the enable signal. This allows generation of different waveforms that can extend the type of signals transmitted by means of XHCMs through the side channel for extracting information. This is covered in the next subsection.

PWM-XHCM
As noted above, PWM can be used to generate different output levels in an XHCM. Indeed, as the typical delay of an LUT in an Artix-7 device is 0.13 ns, the oscillation frequency of the feedback elements can be estimated to be in the GHz range. Therefore, if a 100-MHz PWM is introduced, XHCM power generation can be considered to have continuous behavior that can be modulated by PWM. Additionally, it is interesting for the mentioned application to generate a high number of different output levels. Using 10 bits for the period, it is possible to generate 1024 different power levels. This large number of power levels enables the generation of different functions for extracting information from the inside of the chip. Figure 5 shows the block diagram of the proposed PWM-XHCM system, with 10-bit resolution for the PWM period and 10 bits for the duty cycle. It requires 80 LUTs when implemented on an Artix-7 device. As an example, Figure 6 shows the increment of current drain generated by a 40% duty-cycle PWM with a 100-MHz clock and a period of 2.5 µs. In this figure, the white line corresponds to the variations generated by thermal noise, and the solid yellow line is the average of the current. Table 4 shows results for different duty cycles.
Since PWM_duty can be modified on-the-fly, it is possible to generate periodic functions, such as sin(ωt + φ), and to change the amplitude, frequency or phase of those functions, thus enabling generation of amplitude-, frequency- or phase-modulated signals to be transmitted from the inside of the chip. This adds new features and flexibility for power watermarking applications [37,38]. As a drawback, the introduction of PWM increases the general power consumption of the entire system, as can be observed in Figure 4 and Figure 6. Indeed, the minimal current has an average of 36.5 mA without PWM, and around 58.0 mA when introducing PWM.
This fact, along with the correlation of the signal with the clock feeding the PWM block, makes it not a recommended solution to mask power consumption peaks. Moreover, this structure does not generate instantaneous power consumption peaks, since it generates an average of power consumption over a complete PWM period.
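As an illustration of how duty-cycle values for PWM-XHCM could be computed, the following Python sketch converts a PWM period and duty cycle into clock counts and builds a sequence of duty values that follows one period of a sinusoid. The register semantics (period and duty expressed as counts of the 100-MHz clock, at most 1024 counts) and all function names are assumptions for illustration; the paper does not specify them.

```python
import math

CLOCK_HZ = 100e6      # PWM clock frequency from the text
RESOLUTION_BITS = 10  # resolution of the period and duty registers


def period_counts(period_s, clock_hz=CLOCK_HZ):
    """Number of clock cycles in one PWM period (assumed register semantics)."""
    counts = round(period_s * clock_hz)
    if not 1 <= counts <= 2 ** RESOLUTION_BITS:
        raise ValueError("period does not fit in the 10-bit range")
    return counts


def duty_counts(duty_fraction, period):
    """Clock cycles during which the XHCM enable input is held high."""
    if not 0.0 <= duty_fraction <= 1.0:
        raise ValueError("duty fraction must be within [0, 1]")
    return round(duty_fraction * period)


def sine_duty_sequence(period, samples, amplitude=0.5, offset=0.5, phase=0.0):
    """Duty values approximating offset + amplitude * sin(2*pi*k/samples + phase)."""
    sequence = []
    for k in range(samples):
        level = offset + amplitude * math.sin(2 * math.pi * k / samples + phase)
        level = min(1.0, max(0.0, level))  # clamp to a valid duty fraction
        sequence.append(round(level * period))
    return sequence


period = period_counts(2.5e-6)  # the 2.5 us example period
print(period, duty_counts(0.40, period))
print(sine_duty_sequence(period, samples=4))
```

For the example in the text, a 2.5 µs period corresponds to 250 clock cycles, and a 40% duty cycle to 100 of them. Changing `amplitude`, `samples` or `phase` between periods corresponds to the amplitude, frequency and phase modulation mentioned above.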

Multi-Level XHCMs
In order to overcome the drawbacks of a PWM-controlled XHCM, a modification of XHCM allowing each oscillator to be individually enabled or disabled is proposed. Figure 7 shows the corresponding Multi-Level XHCM (ML-XHCM), which can generate eight different levels of power consumption. Comparing the circuit in Figure 7 to the one presented in Figure 3, two main differences can be observed: First, the AND/OR gates have been replaced by XOR gates in order to take advantage of the 50% switching probability of these gates, as detailed in Section 3.2. Second, the enable input, which simultaneously enables or disables the eight ring oscillators in Figure 3, has been replaced in Figure 7 by eight enable(i) inputs that can individually enable or disable each ring oscillator, thus providing different levels of power consumption depending on the number of inputs enabled at a given time. In the case of m = 1, this implies a resolution of δI = 0.16 mA, thus generating current consumption peaks from 0.16 mA to 1.3 mA. If these amplitudes are randomly generated, these peaks can be used to mask power consumption variations generated by different processing operations in computing systems or crypto-processors. Figure 8 shows the block diagram of the controlled ML-XHCM, where the global pow_enable signal enables the power consumption unit (pow_enable = '0' resets the latch; pow_enable = '1' enables it), pow_level allows introduction of the desired power level, and load loads the value from the decoder into the latch. The decoder enables as many ROs as indicated by the pow_level input. It requires 17 LUTs when implemented on Artix-7 devices.
In the case of m = 2, 16 different amplitude levels with a resolution of δI = 0.13 mA can be generated, requiring 27 LUTs on Artix-7 devices. Note that through the randomization of the values passed to the pow_level input of the controlled ML-XHCM and the times when pow_enable is enabled, our proposal allows the introduction of random values in both amplitude and time domains, thus making it suitable for generating power noise to mask power traces in SPA attacks.
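The behavior of the decoder and latch in the controlled ML-XHCM can be modeled in software as a thermometer decoder. The sketch below is a behavioral model only, not the hardware description used in the paper: the encoding of pow_level as "number of oscillators to enable", the linear current estimate and the function names are assumptions.

```python
def thermometer_decode(pow_level, n_ro=8):
    """Return the enable(i) vector with the first pow_level oscillators enabled."""
    if not 0 <= pow_level <= n_ro:
        raise ValueError("pow_level must be between 0 and n_ro")
    return [1 if i < pow_level else 0 for i in range(n_ro)]


def latch_output(enables, pow_enable):
    """pow_enable = 0 resets the latch; pow_enable = 1 passes the loaded value."""
    return list(enables) if pow_enable else [0] * len(enables)


def expected_current_ma(pow_level, delta_i_ma):
    """Current increase assuming each enabled oscillator adds one step delta_i."""
    return pow_level * delta_i_ma


# m = 1: eight levels with the 0.16 mA resolution reported in the text
for level in (1, 4, 8):
    enables = latch_output(thermometer_decode(level), pow_enable=1)
    print(level, enables, f"{expected_current_ma(level, 0.16):.2f} mA")
```

With δI = 0.16 mA, level 8 gives 1.28 mA under the linear assumption, close to the 1.3 mA upper bound reported for m = 1.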

Random Number Generation
In order to generate random values for the randomization of the amplitude and the enable time of ML-XHCM modules, we propose the use of a TRNG based on ROs and specifically designed for FPGAs [62]. The TRNG has been implemented using 50 ROs and a register to stabilize the values generated by the ROs. Finally, the 50 bits of the register are XORed to obtain one random bit. Figure 9 shows the scheme of the TRNG, which uses clk_trng for synchronization and the enable_trng input for enabling or disabling the random generator, while the rnd_bit output is the resulting random bit. This TRNG requires only 51 six-input LUTs on an Artix-7 device.
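The XOR reduction performed by the TRNG can be illustrated with a short behavioral model. In this sketch, Python's `secrets` module stands in for the register that samples the 50 ring oscillators, so it demonstrates only the data flow and not the physical entropy source; returning 0 when the generator is disabled is also an assumption.

```python
import secrets
from functools import reduce
from operator import xor

N_RO = 50  # number of ring oscillators used by the TRNG in the text


def sample_ring_oscillators(n_ro=N_RO):
    """Software stand-in for the register that samples the RO outputs."""
    return [secrets.randbits(1) for _ in range(n_ro)]


def xor_reduce(bits):
    """XOR all sampled bits into a single output bit (parity of the register)."""
    return reduce(xor, bits, 0)


def trng_bit(enable_trng=True):
    """One rnd_bit per clk_trng cycle; assumed to be 0 while disabled."""
    if not enable_trng:
        return 0
    return xor_reduce(sample_ring_oscillators())


print([trng_bit() for _ in range(16)])
```

The output bit is the parity of the sampled register, so a single unpredictable oscillator sample is enough to make the output unpredictable.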

Randomized ML-XHCM
In this section, we add two uncorrelated TRNGs to the ML-XHCM, thus enabling the generation of power consumption peaks of random amplitudes at random times. Indeed, if a TRNG feeds a shift register and the output of this register is connected to the ML-XHCM described in Figure 7, different amplitudes may be randomly generated. This is shown in Figure 10, where TRNG_A and SR_A are the TRNG and the shift register, respectively, for randomizing amplitudes. A similar structure can be used to randomize time, but in this case, the consumption peaks will be activated 50% of the time. In order to make the percentage of activation time configurable, a decoder can be added, as shown in Figure 10, so that the peaks are active 1/k of the time, where k is the number of outputs of the decoder. As will be shown in the next section, the addition of the decoder and shift registers has no significant effect on the required area resources.
Table 5 summarizes the experimental results obtained from the different implementations proposed in this paper on an Artix-7 device. In this table, $n_{ro}$ stands for the number of ROs included in the module, Amp. levels is the number of different levels of amplitude that the modules can generate, and δI is the resolution in the current increase. From this table, XHCM modules allow the generation of high current (or equivalently, high power) increases, thus being useful for power watermarking applications in which easily recoverable square signals are preferred (see [37]). If other types of signals, such as sinusoids, are required, PWM-XHCM provides 1024 different levels to build any type of waveform at the cost of a few more LUTs.
On the other hand, to generate power noise against SPA attacks on processing systems, ML-XHCM and RML-XHCM provide low-area overhead solutions for the protection of processing units used in IoT devices and implemented on low-cost FPGAs.
Table 6 compares our proposals for building countermeasures against SPA attacks with other methods in the literature. In [19], a power consumption generator based on the activation of 20-RO sets is proposed, with a total of 32 amplitude levels. Although this work does not provide area requirements in terms of LUTs, the need for 620 ROs implies more area resources than ML-XHCM or RML-XHCM. Amplitude levels are controlled by an activity sensor, also based on ROs, which tries to follow the levels of activity in an AES implementation. Randomization in time is achieved using an LFSR, which provides worse random properties than a TRNG. The works in [24,54] are oriented to the protection of AES implementations, and their control is based on specific operations of this encryption algorithm. Therefore, our proposals present very low area overhead while providing complete flexibility, not only for the protection of cryptographic algorithms but also for the protection of processing units in low-cost FPGAs, thus improving on the features provided by other works in the literature.
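The randomization scheme of RML-XHCM (one TRNG and shift register selecting the amplitude, a second TRNG with a shift register and decoder gating the activation time) can be sketched behaviourally as follows. This is an illustrative model, not the hardware description: the class and function names, the use of Python's secrets module in place of the two TRNGs, the one-bit-per-cycle shifting, and the choice of decoder output 0 as the enable line are all assumptions.

```python
import secrets

AMP_BITS = 4  # 16 amplitude levels, as reported for ML-XHCM


class ShiftRegister:
    """Serial-in, parallel-out shift register of a given width."""

    def __init__(self, width):
        self.width = width
        self.value = 0

    def shift_in(self, bit):
        mask = (1 << self.width) - 1
        self.value = ((self.value << 1) | (bit & 1)) & mask
        return self.value


def rml_xhcm_trace(n_cycles, time_bits,
                   bit_a=lambda: secrets.randbits(1),
                   bit_t=lambda: secrets.randbits(1)):
    """Return the amplitude level applied in each cycle (0 means inactive).

    bit_a and bit_t stand in for the two uncorrelated TRNGs. The decoder
    has k = 2**time_bits outputs; only output 0 is assumed to enable the
    module, so it is active about 1/k of the time.
    """
    sr_a = ShiftRegister(AMP_BITS)   # amplitude shift register (SR_A)
    sr_t = ShiftRegister(time_bits)  # hypothetical time shift register
    trace = []
    for _ in range(n_cycles):
        level = sr_a.shift_in(bit_a())
        select = sr_t.shift_in(bit_t())
        trace.append(level if select == 0 else 0)
    return trace
```

For example, rml_xhcm_trace(1000, time_bits=2) models a decoder with k = 4 outputs, so roughly one cycle in four carries a randomly chosen amplitude level.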

Conclusions and Future Work
Power consumption noise generation is an effective countermeasure against SPA attacks on cryptographic circuits and processing units. In this context, power consumption generators are essential modules to build security applications such as power noise generation for masking power traces in side channel attacks against computing systems or cryptographic processors. Moreover, these modules can also be applied to power watermarking to protect the intellectual property of IP cores. In this paper, a modification of the HCM from [37], named XHCM, has been proposed. This module improves the amount of current consumption per LUT on FPGAs when compared to other proposals in the literature. Indeed, experimental results show that it is possible to achieve current increments in the range from 2.4 mA, with only 16 LUTs when using a single XHCM on Artix-7 devices with a power consumption density of 0.75 mW/LUT, to 11.1 mA with 67 LUTs when using 8 XHCMs, with a power consumption density of 0.83 mW/LUT. Additionally, two controlled versions have been developed. The first one, PWM-XHCM, can be controlled by PWM and allows the implementation of different types of waveforms and modulation procedures. The characteristics of XHCM and PWM-XHCM are especially suitable for power watermarking applications. On the other hand, the multi-level controlled ML-XHCM is able to provide different power consumption levels with minimal area overhead, thus enabling the generation of randomized power noise in the amplitude and time domains. This design is suitable for hiding processing information in power traces as a countermeasure to SPA and DPA attacks. As a proof-of-concept of the possibilities for randomizing the time and amplitude domains of ML-XHCM, the so-called RML-XHCM has been developed, including two de-correlated TRNGs to generate power consumption peaks of random amplitude at random times.
All these features can be implemented with fewer than 150 LUTs in a low-cost Artix-7 device from Xilinx/AMD. Therefore, the main contributions of this article can be summarized as:
• An improved high-consuming power generator, XHCM, and a PWM-controlled variant, PWM-XHCM, have been developed, thus providing an efficient solution to extract digital signatures through the power side channel.
• A modification of XHCM enabling the control of both amplitude and time domains, named ML-XHCM, has been proposed, providing a powerful tool for the protection of processing units against SPA attacks in IoT devices implemented on FPGAs. Moreover, a version randomized in both amplitude and time, called RML-XHCM, has also been developed as a proof-of-concept.
As future work, we plan to build a secure IoT device with power noise masking for the processing unit based on the open-hardware platform for IoT devices on FPGAs presented in [63].
Funding: This work was partially supported by the Consejería de Economía y Conocimiento de la Junta de Andalucía (Spain) and by the European Regional Development Funds (ERDF) under Project B-TIC-588-UGR20.