Body Bias Optimization for Real-Time Systems

Abstract: Real-time systems for embedded usage need to be energy efficient without compromising their ability to meet task deadlines. Dynamic body bias (BB) scaling is a promising approach to managing leakage energy and operational speed, especially for silicon-on-insulator devices. However, traditional energy models do not account for the overhead of adjusting the BB voltage and are therefore inaccurate. This paper presents a more accurate model for calculating energy overhead using an analytical double exponential expression for dynamic BB scaling and an optimization method based on nonlinear programming with consideration of the real-chip parameter constraints. The use of the proposed model resulted in an energy reduction of about 32% at lower frequencies in comparison with the conventional model. Moreover, the energy overhead was reduced to approximately 14% of the total energy consumption. This methodology provides a framework and design guidelines for real-time systems and computer-aided design.


Introduction
Power consumption has become an important factor, especially for real-time systems (RTSs) for embedded usage that must meet task deadlines. The computational node in such systems often works intermittently with a certain interval. Reducing leakage power consumption in sleep mode has become a major concern as technology feature size continues to scale.
RTS energy efficiency has been extensively studied, and several techniques have been developed for saving energy, including dynamic power management (DPM) [1] such as power gating (PG) [2] and dynamic voltage scaling (DVS) [3]. Although these techniques improve energy efficiency, they often require a significant amount of power, since they must directly control the system supply voltage. When the supply voltage is in the near-threshold region, the range of power-supply scaling is restricted [4]. Moreover, the application of PG results in the loss of the data in the memory element. Thus, PG often introduces serious problems in embedded systems.
Body bias (BB) control is attracting the attention of designers as a means of controlling the tradeoff between leakage power and performance without affecting power supply [5,6]. It is especially efficient in fully depleted silicon-on-insulator (FD-SOI) technology [7][8][9], which is commonly used for low-power systems.
Previous studies on dynamic BB control [5,10,11] have two limitations. First, they were not based on a model of BB switching-voltage overhead. Most of them ignored it or included the energy consumption in the active state. Second, a method has not been proposed for finding the optimal BB voltage and the optimal supply voltage for meeting task deadlines.
Some Internet of Things (IoT) devices work with a long interval and thus have a task deadline of more than a few seconds. The overhead of switching the BB voltage is hence trivial. On the other hand, since the GPS time period is millisecond-order, the deadline for locating mobile devices is often millisecond-order. Some factory automation tasks also require a millisecond-order deadline. Precise power optimization techniques are thus required to cope with such newly developed applications. We have devised a mathematical expression of the transient, based on the standard full switching impulse voltage (double exponential) expression [12][13][14], for estimating the switching energy. We also devised an interior point method (IPM) based on this power model that can be used to obtain optimality in nonlinear programming (NLP). The main contributions of this study can be summarized as follows:
• A mathematical model for calculating energy overhead based on the double exponential equation,
• an increase in the accuracy of the energy overhead calculation model, and
• a method for optimizing energy consumption by optimizing the BB voltage and supply voltage by applying NLP.
This methodology provides a framework and design guidelines for RTSs and computer-aided design (CAD).
This paper is organized as follows. Section 2 describes BB, FD-SOI technology, and related work on energy reduction. Section 3 explains the mathematical power model and introduces the concept of the double exponential waveform. In addition, it introduces the term for increasing the accuracy of the energy overhead. The energy consumption optimization method is described in Section 4. The experimental results are presented, analyzed, and discussed in Section 5 along with the method for increasing the accuracy of the energy overhead calculation model. Finally, Section 6 concludes the paper and mentions future work.

BB for Silicon on Thin Box
Our target technology is silicon-on-thin-box (SOTB) technology, a novel FD-SOI technology [7]. It features latch-up immunity, high temperature tolerance, high performance, radiation hardness, and high BB sensitivity due to its insulating buried oxide layer, which is widely used in SOI devices [8,9]. These body-driven characteristics enable high-level energy reduction using BB. Unlike conventional FD-SOI devices, an SOTB device is formed on an ultra-thin buried oxide (BOX) layer (about 10 nm thick), as shown in Figure 1, enabling a wide range of BB control. Consequently, SOTB technology ensures more efficient reduction in leakage current using BB control than conventional metal-oxide semiconductor field-effect transistors (MOSFETs).
The SOTB states are classified in accordance with the nMOS BB voltage VBN, the pMOS BB voltage VBP, and the supply voltage VDD. As with other FD-SOI technologies, the default state of a given MOSFET (VBN = 0 and VBP = VDD) in SOTB technology is called zero body bias (ZBB). If a lower voltage is applied to the nMOS body (VBN < 0) and a higher voltage is applied to the pMOS body (VBP > VDD), the depletion width increases; thus, the threshold voltage increases. This condition is known as reverse body bias (RBB). With RBB, the higher threshold voltage lowers the leakage current at the expense of a longer delay time, so performance is degraded.
In contrast, if a higher voltage is applied to the nMOS body (VBN>0) and a lower voltage is applied to the pMOS body (VBP<VDD), the depletion width decreases; therefore, the threshold voltage decreases. This condition is known as forward body bias (FBB). With FBB, since the threshold voltage is lower, the delay is reduced at the expense of an increase in the leakage current. We use balanced BB such that VBN = VDD − VBP, which normally results in the best performance per energy [11]. Thus, BB is represented simply as VBN hereafter.
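The three bias conditions described above can be written down as a small classification routine. The following is only an illustrative sketch: the function names, the tolerance, and the "mixed" label for unbalanced combinations are assumptions introduced here and do not come from the measured system.

```python
def balanced_vbp(vbn: float, vdd: float) -> float:
    """Balanced bias from the text: VBN = VDD - VBP, hence VBP = VDD - VBN."""
    return vdd - vbn


def classify_body_bias(vbn: float, vbp: float, vdd: float, tol: float = 1e-9) -> str:
    """Return 'ZBB', 'RBB', 'FBB', or 'mixed' for bias voltages given in volts."""
    n_shift = vbn          # nMOS body relative to its zero-bias level (0 V)
    p_shift = vbp - vdd    # pMOS body relative to its zero-bias level (VDD)
    if abs(n_shift) <= tol and abs(p_shift) <= tol:
        return "ZBB"       # VBN = 0 and VBP = VDD
    if n_shift < -tol and p_shift > tol:
        return "RBB"       # VBN < 0 and VBP > VDD: higher threshold, lower leakage
    if n_shift > tol and p_shift < -tol:
        return "FBB"       # VBN > 0 and VBP < VDD: lower threshold, shorter delay
    return "mixed"         # hypothetical label for combinations the text does not name


# Example with an illustrative supply voltage of 0.4 V.
state = classify_body_bias(-0.5, balanced_vbp(-0.5, 0.4), 0.4)
```

With balanced BB, a single value of VBN is enough to determine the state, which is why the text represents BB by VBN alone.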

Related Work
Since energy efficiency is essential in embedded RTSs, several techniques have been developed for saving energy, and much research has been carried out on saving energy by controlling the power supply and clock frequency. The DPM technique reduces the energy dissipation of RTSs with low-power idle states [1]. However, when the power supply is cut off during the idle state, volatile data are lost. Hence, when data need to be preserved, a certain level of voltage must be supplied as a power supply, which restricts the leakage power reduction. The DVS technique can drastically reduce the dynamic power due to the quadratic power supply dependency [3]. However, the range for power supply scaling is highly restricted when the power supply voltage is near the threshold region [4].
Several studies evaluated the use of BB control combined with DVS [5,10]. The models developed can calculate an optimal power supply and BB voltage for each operational frequency. They are based on the assumption of ideal voltage regulators that can output any voltage obtained from the models. However, actual voltage regulators have certain output voltage resolution limitations. Akgul et al. proposed a power-management method that takes into account such voltage constraints [10]. They assumed discrete power supply voltages and succeeded in reducing the energy despite the restrictions on leakage power reduction and power supply scaling. However, these studies were not based on parameters from real chips, and the overhead of adjusting the BB was not considered.
Several approaches have been proposed for improving energy efficiency. When considering overhead conditions or analyzing idle regions, all of these approaches are based on circuit-level information [15][16][17][18][19]. Since we do not have any circuit-level information for the target processor, we must instead measure the real chip. Our goal is to obtain a realistic model using only parameters from a simple evaluation of the real chip and process parameters.
Without power-saving control, a task is executed in time T_exe and finishes before the given deadline. The frequency and voltage are constant up to the deadline, so power is wasted in the idle region, as shown in Figure 2a.
A previous study [20] focused on two possible scenarios for executing a given task on an RTS while considering a predefined deadline. In the first scenario, the system works at the minimum frequency at which task execution finishes at the deadline. This means that the minimum VDD and ZBB voltages are supplied with a minimum frequency to satisfy the deadline (for example, 10 MHz), as illustrated in Figure 2b. This scenario is our baseline. In the second scenario, shown in Figure 3, the VDD is optimized to boost the frequency in accordance with the alpha power law, so the task is executed in much less time than in the first scenario. This is our test scenario. During the time remaining until the deadline, RBB is applied to reduce the leakage power. If the BB voltage is fixed and the substrate has been charged, almost no current is required to maintain the bias. If the voltage changes dynamically, energy is lost due to substrate charging and discharging.
We previously established a functional mathematical model for power and timing that uses parameters extracted from real chips [6,11]. Moreover, we performed several of the first studies that included energy overhead parameters extracted from real chips [20][21][22]. By using our power and timing model to control BB, we reduced energy consumption by 15.3% at 40 MHz for a deadline of 3 ms and an optimal VBN for the RBB voltage (−500 mV) compared with the baseline. The overhead for dynamic BB switching occupied a considerable part of the total energy required; thus, we had to evaluate the overhead for real chips. Although we identified the optimal point, we used a brute-force coarse-grain search with a limited range of values: from −200 to −700 mV with 100 mV steps for VBN and from 20 to 60 MHz with 10 MHz steps for the frequency [20]. Hence, exploring BB voltages other than the evaluated values was impossible. To overcome these problems, we have now devised a model that includes BB switching overhead and is suitable for optimization methodologies.

Baseline Model
In our previous work [6,11,20], we developed a power and timing model using BB control. It is based on real-chip measurements of leakage current, switching current, and maximum operating frequency. We measured the target chip at 25 °C. We again assume that the execution time of the target task is fixed and can be estimated, as described in a previous paper [20]. Here, execution time is represented as T_exe. The total energy (E_T) is the sum of the static energy (E_s), dynamic energy (E_d), energy overhead of the sleep-down transition (E_ovs), and idle energy (E_id):

$$E_T = E_s + E_d + E_{ovs} + E_{id} \quad (1)$$

This sum is expanded in Equation (2), where I is the leakage current, A and B are the coefficients of the exponential terms for VDD and VBN, respectively, and α · C is the coefficient of dynamic energy corresponding to the switching activity factor at capacitance C. CPI is the clock cycles per instruction, that is, the number of cycles needed to execute an instruction; the target system is a V850 E-Star microcontroller (described in Section 5). E_ovs is the sum of the VBN and VBP energies when we apply BB (Figure 3). Although this paper focuses on VBN for simplicity, the evaluated value includes both energies. Only the sleep-down energy is considered here, since it represents current charging, while the wake-up transition represents current discharging.
Figure 4 illustrates the device-circuit level connection. We must adjust VDD to the optimal value for each target application and deadline. Additionally, we must adjust VBN/VBP to the optimal BB value and switch dynamically to/from zero bias. The energy consumed by a VBN/VBP generator itself is not included in E_ovs. Various types (analog or digital that could work at the near-threshold region) of charge pump circuits/DACs with various tradeoffs have been proposed for VBN/VBP generators [23][24][25][26], and considering the total system including them is beyond the scope of this paper.
Figure 4. Device-circuit level connection for the target system: V850 E-Star. VDD (supply voltage) is adjusted to the optimal value for each target application and deadline, and not changed during the execution. VBN/VBP (nMOS BB voltage/pMOS BB voltage) are also adjusted to the optimal BB value and switched dynamically to/from zero bias.
As stated above, the execution time for a given task is defined as T_exe; the task is executed with N instructions. The idle time is defined as T_id.
Additionally, T_exe should satisfy the deadline constraint of Equation (3), where D is the deadline at which the critical task must be completed, and T_ovsT is the time needed to establish the necessary VDD and VBN when switching to and from active and idle states. T_ovsT can be defined as the sum of the wake-up and sleep-down times, t_w and t_s:

$$T_{ovsT} = t_w + t_s \quad (4)$$

Some of these parameters are obtained from real-chip evaluation, and the others come from the characteristics of the SOTB device. The details of obtaining the parameters were described in our previous paper [20]. However, there was no way to mathematically represent E_ovs in Equation (2). That is why we could not apply an optimization method to the above expressions. Although there are several models for representing transition behavior [15], they are mostly for controlling the PG supply voltage. Hence, we propose using a double exponential expression, a conventional method in power electronics.
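The timing relations above can be checked with a few lines of code. This sketch assumes that one period is partitioned as D = T_exe + T_ovsT + T_id, which is an assumption made here for illustration; the function name and the sample values are hypothetical.

```python
def idle_time(deadline: float, t_exe: float, t_wake: float, t_sleep: float) -> float:
    """Idle time left in one period, assuming D = T_exe + T_ovsT + T_id (seconds)."""
    t_ovs_total = t_wake + t_sleep              # T_ovsT = t_w + t_s
    t_id = deadline - t_exe - t_ovs_total
    if t_id < 0:
        raise ValueError("deadline cannot be met: T_exe + T_ovsT exceeds D")
    return t_id


# Hypothetical numbers: 3 ms deadline, 1 ms execution, 0.1 ms wake-up, 0.2 ms sleep-down.
t_id_example = idle_time(3e-3, 1e-3, 0.1e-3, 0.2e-3)
```

A negative idle time signals that the chosen frequency is too low for the deadline once the switching times are included.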
Before going into detail, we show the overall workflow of this study in Figure 5.

Double Exponential Waveform Expression
We use the double exponential waveform expression to model E_ovs. We consider an electrical transient to be a temporary disturbance in a power system caused by voltage switching, so a minimal level of transient energy is expected regardless of the circuit. The electrical transients have the shape of a standard full switching impulse (SI) waveform, that is, a double exponential waveform; in our case, this transient occurs after we finish the task execution and apply BB, as shown in Figure 3. We analyze this transient period from the real-chip current measurements. SI waveforms are characterized by three parameters: the rise time (t_rise), which is the time it takes to reach the maximum current amplitude; the current amplitude (I_ovs); and the tail time (t_tail), that is, the time it takes to settle.
We use these parameters to fit E_ovs to the double exponential waveform [27]. The SI waveform is expressed as Equation (5), where gamma (γ) and delta (δ) are related to the t_rise and t_tail times, respectively, and kappa (κ) is the amplitude-modifying factor used to compensate for interaction between the two exponential terms. The κ factor is related to γ and δ and can be calculated using Equation (6) [28]. Finally, to obtain the energy overhead, we integrate Equation (5) from time 0 to t_s, which gives E_ovs as Equation (7).
Furthermore, we assume that VBN changes instantly between constant values. If the voltage source has an internal resistance, the voltage drop must be taken into consideration. Since some charge pump circuits used in VBN controllers have a large internal resistance, it may need to be considered. Here, we assume an ideal battery and a constant VBN/VBP in order to separate the analysis from battery issues.
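To make the integration step concrete, the sketch below uses a generic double exponential current pulse and integrates it in closed form. It is not a transcription of Equations (5) to (7): the decay rates `a` and `b`, the peak normalization used for the amplitude factor, and the assumption that the energy equals the constant voltage step times the integrated current are all illustrative choices made here.

```python
import math


def peak_normaliser(a: float, b: float) -> float:
    """Factor k such that k * (exp(-a t) - exp(-b t)) peaks at exactly 1 (needs b > a > 0)."""
    t_peak = math.log(b / a) / (b - a)          # time at which the pulse is maximal
    return 1.0 / (math.exp(-a * t_peak) - math.exp(-b * t_peak))


def si_current(t: float, i_peak: float, a: float, b: float) -> float:
    """Double exponential current pulse with peak amplitude i_peak (amperes)."""
    return i_peak * peak_normaliser(a, b) * (math.exp(-a * t) - math.exp(-b * t))


def overhead_energy(v_step: float, i_peak: float, a: float, b: float, t_s: float) -> float:
    """Energy over [0, t_s], assuming a constant voltage step: |V| * integral of i(t) dt."""
    integral = (1.0 - math.exp(-a * t_s)) / a - (1.0 - math.exp(-b * t_s)) / b
    return abs(v_step) * i_peak * peak_normaliser(a, b) * integral


# Hypothetical pulse: 1 mA peak, slow decay 2e4 1/s, fast decay 2e5 1/s, 0.25 ms window.
e_example = overhead_energy(-0.5, 1e-3, 2e4, 2e5, 2.5e-4)
```

Because the integral has a closed form, the overhead becomes a smooth function of the bias voltage and the waveform parameters, which is what allows it to be placed inside an optimization objective.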

Switching Impulse Waveform Model Coefficients
To find appropriate coefficients for the target chip, we use real-chip measurement results with several predefined values of VBN.
The proposed method for modeling E_ovs is based on known physical parameters, t_s and I_ovs, and the VBN voltage variation (−200 to −700 mV) that we established from our previous evaluation of an SOTB device [20]. First, we use the Nelder-Mead algorithm to calculate the analytical function parameters of the SI waveform (γ, δ, and κ) from the known measurements (t_rise and t_tail) of t_s [29], starting from approximate initial values. Next, we calculate κ with Equation (6), evaluate Equation (7) using the computed coefficients, and assess the results by using the mean absolute percentage deviation. Finally, we adjust t_rise, t_tail, γ, and δ as required to optimize the fitting and recalculate κ. We used this fitting process in each VBN step of our evaluation.
This fitting process is repeated until coefficients are obtained with a minimal error in accordance with the extracted real-chip measurements made in our previous work; for those measurements, we used an SG-4322 function generator to provide BB. Both VBP and VBN were changed simultaneously. The energy and timing overheads were measured using a Keysight MSOX 4104A oscilloscope and N2820A current probe [20]. This process was done to enable comparison of the calculated results with the measured ones and was used as a reference to fine-tune the analytical parameters; that is, real-chip measurement was required only once.
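The goodness-of-fit measure used in this fitting loop, the mean absolute percentage deviation, can be computed as follows. The function name and the sample numbers are hypothetical and serve only to show the calculation; they are not measured values.

```python
def mapd(measured, modelled):
    """Mean absolute percentage deviation between two equal-length sequences, in percent."""
    if len(measured) != len(modelled) or len(measured) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    total = sum(abs((m - c) / m) for m, c in zip(measured, modelled))
    return 100.0 * total / len(measured)


# Hypothetical energies (arbitrary units): each modelled value deviates by 10%.
example_error = mapd([100.0, 200.0], [90.0, 220.0])
```

In the fitting process described above, this value would be recomputed after each adjustment of the waveform parameters until it stops decreasing.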
To check the validity of our proposed method, we compare the measured and calculated results in Figure 6 for the worst-case (the largest and smallest VBN) evaluation. This fitting process yielded time parameters t_rise = (9.34/100) t_s and t_tail = (24/100) t_s with an average error of 10.5%. We present the results of using the E_ovs model in Table 1. Although the maximum error was about 14%, the impact on the total energy required was about 1.6%, as explained below. This seems to be a reasonable error. The γ and δ analytical function parameters were fully evaluated through optimization (Section 4) and through the scenario shown in Figure 3 (Section 5). The mean error for these settings is discussed in a later section. In short, we update Equation (2) with the energy overhead equation, Equation (7), giving us the updated equation for total energy, Equation (10).

Problem Definition
There is a tradeoff between power savings and switching overhead. While a high RBB saves a significant amount of static power in the idle state, the switching overhead is larger. Several variables are involved in this tradeoff. Moreover, there is a considerable number of tradeoff possibilities. Let us consider the tradeoff of BB characteristics mentioned in Section 2.1. We control the RBB characteristics by using several electrical parameters concurrently.
More advanced analyses are required to weigh the tradeoffs among all of the variables involved. Therefore, we aim at optimizing selection of the RBB and supply voltage while simultaneously considering the given task deadline, with minimal energy switching penalties and energy waste.
Consistent with the tradeoff information mentioned above, we can describe the problem as a single-objective optimization problem: Given an application, optimize the energy consumption and performance of the given task when there are concurrent options for RBB and the supply voltage.
We use Equation (10) to model this optimization problem. We catalog the variables and coefficients involved into four groups, as summarized in Table 2. The system coefficient variables (I, A, B, α, and C) are acquired in accordance with the method described in [6]. Here, the problem is the optimization of energy consumption by finding the optimal VBN and VDD voltages, constrained by the switching overhead penalties and energy waste. This is thus a problem of finding the minimum of a constrained nonlinear multivariable function.

Interior Point Nonlinear Programming Model
The Newton-Raphson method is commonly applied to engineering problems due to its swift and robust convergence characteristics. Nonetheless, if a given problem has saddles, multiple roots, or the initial condition is not a valid starting point (since, from a geometrical point of view, selection of the starting point is arbitrary), the algorithm might get caught in a suboptimal solution or may not even converge. It is thus essential that the convergence condition is ensured; therefore, we use a more robust method, the interior point method (IPM). By using the IPM, we can reach and guarantee convergence to the optimum solution by traversing the interior region described by the double exponential waveform rather than around its surface, as done by the Newton-Raphson method. The IPM has been proven to achieve an optimal solution efficiently for these types of optimization problems [30,31].
Its convergence advantage and computational efficiency make IPM an excellent problem-solving method for NLP [31]. The objective function is the equation for total energy (Equation (10)) as a function of VBN and VDD, and the optimization problem is subject to

$$-700\ \mathrm{mV} \le VBN \le -200\ \mathrm{mV} \quad (12)$$
$$304.11\ \mathrm{mV} \le VDD \le 470.87\ \mathrm{mV} \quad (13)$$

The goal is to minimize E_T. To do so, we minimize the objective function (Equation (11)) over VBN and VDD while satisfying the voltage constraints for VBN (Equation (12)) and VDD (Equation (13)); these constraints are based on our previous analysis of an SOTB device [20]. Another crucial constraint is the frequency, since VDD is related to frequency by the alpha power law; to avoid wasting energy, the system must work at the minimum frequency that attains the required performance. Furthermore, the frequency must be calculated in accordance with its VDD. This relationship is described in the next section. We established an evaluation framework [20] from 20 to 60 MHz with 10 MHz steps and calculated the VDD from this frequency range.
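The idea of staying strictly inside the feasible region can be illustrated with a basic log-barrier scheme, one of the simplest interior point methods. The sketch below is not the program used in this work: the objective is a hypothetical separable quadratic standing in for Equation (10), and only the box bounds are taken from Equations (12) and (13). All function and parameter names are assumptions.

```python
import math


def barrier_box_minimise(c, lo, hi, t0=1.0, mu=10.0, gap_tol=1e-3):
    """Minimise sum_i (x_i - c_i)^2 subject to lo_i < x_i < hi_i with a log barrier."""
    n = len(c)
    x = [(l + h) / 2.0 for l, h in zip(lo, hi)]       # strictly interior starting point
    t = t0
    while True:
        for _newton in range(100):                    # Newton iterations at fixed t
            worst = 0.0
            for i in range(n):
                def phi(v, i=i):
                    # barrier-augmented objective for coordinate i
                    return t * (v - c[i]) ** 2 - math.log(v - lo[i]) - math.log(hi[i] - v)
                g = 2 * t * (x[i] - c[i]) - 1 / (x[i] - lo[i]) + 1 / (hi[i] - x[i])
                h = 2 * t + 1 / (x[i] - lo[i]) ** 2 + 1 / (hi[i] - x[i]) ** 2
                step = -g / h
                for _halving in range(60):            # backtrack: stay interior and descend
                    v = x[i] + step
                    if lo[i] < v < hi[i] and phi(v) <= phi(x[i]) + 0.25 * step * g:
                        break
                    step *= 0.5
                else:
                    step = 0.0                        # no acceptable step found
                x[i] += step
                worst = max(worst, g * g / h)         # Newton decrement before the step
            if worst < 1e-8:
                break
        if 2 * n / t <= gap_tol:                      # 2n barrier terms bound the gap
            return x
        t *= mu                                       # tighten the barrier


# Bounds in mV from Equations (12) and (13); the target point c is hypothetical.
x_opt = barrier_box_minimise([-800.0, 397.0], [-700.0, 304.11], [-200.0, 470.87])
```

In this toy case the unconstrained minimizer of the first coordinate lies outside the box, so the iterates approach the −700 mV bound from the inside without ever crossing it, whereas the second coordinate settles at its interior minimizer.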
We must ensure that our model complies with the device for its operational time and rising time when applying BB. To demonstrate this, we assume a hypothetical scenario of a 3 ms deadline, which is the independent variable.
We developed a program in MATLAB [32] to compute the objective function with the IPM-NLP algorithm. The target optimization variables for the algorithm are VBN and VDD. We map the coefficients and the formulas to compute the variables of Equation (11). Next, the program calculates the variables and sweeps across the coefficients. For each iteration, the program evaluates each variable with the possible combinations. In this manner, we keep the relationships among variables. It iterates the objective function evaluating VBN and VDD until it converges. The results for our scenario were a VBN of −449 mV and a VDD of 397 mV.
In reality, the VBN and operational frequency are discrete values; however, both have various tradeoffs between cost and accuracy, depending on the available BB generators and clock frequency controllers. Since our method can find a continuous optimal value, we can set the most promising discrete values close to the optimal one in consideration of the available BB generators and clock controllers [23][24][25][26]. We compiled the optimization model with MATLAB R2019a 9.6.0.1174912 on an HP notebook computer (Windows 10 64-bit, Intel i7-8550U CPU 1.8 GHz, RAM 16 GB). The IPM-NLP computation time was 0.474 s.
To evaluate the efficiency of our method, we compared its computation time with that estimated for a brute-force fine-grain search; the brute-force coarse-grain search (real-chip measurements) was used for the evaluation and results. The coarse-grain search used a VBN configuration with a voltage variation of −200 to −700 mV with 100 mV steps and a frequency range of 20 to 60 MHz with 10 MHz steps as well as its associated VDD in accordance with the previously reported method [20]. For the fine-grain search, we used the same ranges but with unit step granularity for each case (VBN, VDD, and frequency) and swept through every combination. The computation time for this search was 4.265 s, so the proposed optimization method (0.474 s) reduced the computation time by ≈90%. Moreover, it guarantees an exact optimal solution. This optimization process is suitable for a compiler or design CAD tools if the execution time of the target program and the deadline are fixed. If they are changed due to a change in requirements, optimization must be done in the run-time system. The IPM-NLP computation time is short enough for optimization to be performed in an edge system. This optimization is needed only when a new task is introduced into the system, which is assumed to happen too infrequently to severely influence the energy consumption. Thus, the energy for optimization itself was omitted.
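The size of the coarse grid and the relative time saving quoted above can be reproduced with a short script. The helper names and the toy energy function are hypothetical; only the ranges, step sizes, and the two computation times are taken from the text.

```python
from itertools import product

vbn_mv = list(range(-200, -701, -100))     # -200, -300, ..., -700 mV
freq_mhz = list(range(20, 61, 10))         # 20, 30, ..., 60 MHz
coarse_grid = list(product(vbn_mv, freq_mhz))


def brute_force(grid, energy_fn):
    """Return the grid point with the smallest value of energy_fn."""
    return min(grid, key=energy_fn)


def time_saving(t_proposed: float, t_reference: float) -> float:
    """Fraction of computation time saved relative to the reference search."""
    return 1.0 - t_proposed / t_reference


def toy_energy(point):
    """Hypothetical bowl-shaped energy surface, used only to exercise the search."""
    vbn, f = point
    return (vbn + 449) ** 2 + (f - 38) ** 2


best_coarse = brute_force(coarse_grid, toy_energy)
saving = time_saving(0.474, 4.265)
```

The coarse grid contains 6 × 5 = 30 combinations, and a grid search can only return one of them, which is why a continuous optimum between grid points cannot be found this way.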

Optimal Frequency
Once we find the optimal VDD, the next step is to find the optimal frequency f. The gate delay t_d in MOSFETs is expressed using the alpha power law [33], given as Equation (14), where the proportionality constant is the process parameter and α is the velocity saturation coefficient for the MOSFET, with 1 ≤ α ≤ 2 (2 in the case of SOTB technology) [6,33]. The frequency is proportional to the reciprocal of t_d. Therefore, we can determine f from the optimized VBN and VDD using Equation (15), where F is a coefficient related to frequency and V_th is the threshold voltage, which varies due to the back gate biasing. V_th can be linearly approximated using Equation (16), where V_th0 is the threshold voltage with ZBB and K_γ is a constant given by the technology process coefficient (back gate biasing). Table 3 summarizes the power model coefficients obtained from real-chip measurements [6]. Using the coefficients in the table in Equation (15), we obtained f_core = 38.02 MHz and f_mem = 38.72 MHz for the core and memory, respectively. The variation between f_core and f_mem is very small; therefore, as a rule of thumb, we use the slowest one.
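The dependence of frequency on VDD and VBN can be sketched with the textbook form of the alpha power law. The expressions below are assumed forms, f = F (VDD − V_th)^α / VDD and V_th = V_th0 − K_γ VBN, chosen so that RBB (VBN < 0) raises the threshold voltage as described in Section 2; they are not copied from Equations (14) to (16), and all coefficient values are hypothetical rather than taken from Table 3.

```python
def threshold_voltage(vth0: float, k_gamma: float, vbn: float) -> float:
    """Linear back-gate approximation (assumed sign: negative VBN raises V_th)."""
    return vth0 - k_gamma * vbn


def max_frequency(vdd: float, vbn: float, f_coeff: float, vth0: float,
                  k_gamma: float, alpha: float = 2.0) -> float:
    """Alpha-power-law frequency estimate; voltages in volts, result in units of f_coeff."""
    vth = threshold_voltage(vth0, k_gamma, vbn)
    if vdd <= vth:
        raise ValueError("VDD must exceed the threshold voltage")
    return f_coeff * (vdd - vth) ** alpha / vdd


# Hypothetical coefficients: V_th0 = 0.25 V, K_gamma = 0.1, F = 1e9.
f_zero_bias = max_frequency(0.40, 0.0, 1e9, 0.25, 0.1)
f_reverse_bias = max_frequency(0.40, -0.2, 1e9, 0.25, 0.1)
```

Under these assumptions, applying RBB at a fixed VDD lowers the attainable frequency, which is why the text applies RBB only in the idle state and runs the task at zero bias.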

Target System: V850 E-Star
To explore the capabilities of the proposed methodology, we evaluate the break-even time (BET) and optimize the energy for several deadlines. BET is important because, if it exceeds the given deadline, the proposed methodology cannot be used (is not effective). Instead, the device should remain active to cope with the short deadline.
To evaluate the efficiency of the methodology, we used a V850 E-Star microcontroller consisting of a processing unit and a memory module as the target system [34][35][36]. It is a 32 bit RISC microcontroller with a simple in-order five-stage pipeline, 46.2 k gate logic cells, and 128 kb instruction/data memory modules for car electronics developed by Renesas Electronics. The V850 E-Star basically executes one instruction per clock cycle; hence, CPI = 1. The chip is implemented with an LEAP 65 nm FD-SOI SOTB technology node. As mentioned above, the evaluated results include both the VBN and VBP energies.
Additionally, the local memory is the most significant part in the V850 E-Star, because it takes a large part of the leakage power and is also on the critical path. To choose the coefficients to build our model, we consider the memory as the worst-case scenario.

Break-Even Time
We evaluate the BET using Equation (7), which is the proposed energy overhead calculation model. In the BET expression, P_s is the static power consumed during the active state and P_id is the power consumed during the idle state. In our previous work [22], we characterized the efficiency of dynamic BB scaling and established the working region of the VBN voltage framework. We set the base of this comparison as the nominal VDD = 600 mV, which is a typical supply voltage for the SOTB used for the V850 E-Star. Here, we use this working region as a baseline. For the optimization phase, we use a deadline of 3 ms. Moreover, we use the optimized voltage conditions for VDD and VBN computed using Equation (11). As shown in Figure 7, the BET of the optimized VBN is found at the midpoint of the working region. We obtain 0.28 ms. This is consistent with the 0.25 ms for a −500/−400 mV brute-force search, with ≈10% error.
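A common way to define the break-even time is the switching overhead energy divided by the power saved while idle. The sketch below assumes that definition, BET = E_ovs / (P_s − P_id); this form, the function name, and the sample values are assumptions for illustration and are not taken from the measured results.

```python
def break_even_time(e_ovs: float, p_static_active: float, p_idle: float) -> float:
    """Idle duration (s) after which the saved energy equals the switching overhead."""
    saving = p_static_active - p_idle          # power saved per second of idling
    if saving <= 0:
        raise ValueError("idle power must be lower than active static power")
    return e_ovs / saving


# Hypothetical values: 3 nJ overhead, 7 uW active static power, 1 uW idle power.
bet_example = break_even_time(3e-9, 7e-6, 1e-6)
```

If the idle interval available before the deadline is shorter than this value, applying RBB costs more energy than it saves, which matches the remark that the device should then remain active.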

Optimized VBN-VDD
First, we focus on the active state. We set BB at zero bias, and D and T_exe are given. The number N of instructions is determined from the baseline scenario with CPI = 1. As mentioned, the V850 E-Star executes one instruction per clock cycle [35,36]. Since the operational frequencies with the settings given above are higher than that of the baseline, the instruction execution of each task finishes prior to the deadline. Furthermore, when a periodic real-time task finishes execution, the system is put into the idle state until the next active state.
We use the optimized voltage conditions for VDD and VBN derived from Equation (11). Additionally, we calculate the optimal frequency from Equation (15). We set the optimized VDD and keep it fixed during the active and idle periods. We do not change it dynamically because this increases the cost. Next, when the task execution finishes, we put the system into the idle state by applying RBB with the optimized VBN. This creates an electrical transient, which is calculated using Equation (7).
As an example of this optimization, we evaluate the test scenario illustrated in Figure 3 using Equation (10) for a deadline of 3 ms. Figure 8 shows the energy consumption at the optimal VBN and the change in energy consumption with different VBN coarse voltages for different frequencies.
The baseline scenario is represented by the dotted line (frequency of 10 MHz). The optimal point of energy reduction, represented by the continuous line, is at 38.06 MHz with an energy consumption of 3.07 µJ on average, corresponding to 76.22% of the baseline. Figure 9 depicts the total energy E_T breakdown by element: E_s, E_d, E_ovs, and E_id. The graph is grouped by coarse VBN voltages and frequencies. The optimized scenario breakdown energy regions are delimited by horizontal lines. The largest energy reduction is 23.78% for the 38.06 MHz case. These results demonstrate that the E_ovs share of the total energy, which ranges from 6% (−200 mV, best case) to 20% (−700 mV, worst case) for the coarse VBN voltage steps, is 14% at the optimum.
For further analysis, we apply the approach used to evaluate the optimized VBN-VDD voltages for a 3 ms deadline (Figure 8) to deadlines of 2 ms, 3 ms, 4 ms, 12 ms, and 1 s, using the optimized voltages for VDD and VBN accordingly. As shown in Figure 10, the shorter the deadline, the higher the frequency needed to meet the deadline. The optimized VBN-VDD is on the right side of the graph. The reduction ratios are 18.61%, 23.78%, 26.59%, 32.11%, and 53.19% for 2 ms, 3 ms, 4 ms, 12 ms, and 1 s, respectively. Each reduction ratio is higher than its coarse-voltage counterpart. Since the VBN has no significant variation, we can expect that energy reduction is lower for shorter deadlines and higher for longer deadlines. At 1 s, for example, a lower supply voltage and a lower frequency yield a significant energy reduction. Table 4 summarizes the optimized configurations obtained using interior point nonlinear programming.
Figure 10. Energy reduction ratio vs. baseline for deadlines of 2 ms, 3 ms, 4 ms, 12 ms, and 1 s considering leakage current in the idle state and optimal frequency. The shorter the deadline, the higher the frequency needed to meet the deadline.

Model Accuracy
The energy overhead calculation model presented in this paper is based on the power and timing model described in Section 3. In Section 3.1, we presented the model for E_s, E_d, and E_id. In Sections 3.2 and 3.3, we introduced the SI double exponential model, Equation (7). This equation introduces double exponential waveform analytical function parameters (γ, δ, and κ). These are the fitting parameters for the SI waveform. We use the Nelder-Mead algorithm, a well-proven algorithm for this purpose, to estimate them. We consider the following errors.

• Mean error. We calculate the error for each VBN coarse voltage, as shown in Table 1. Despite the difference between the real device and the ideal model, the model provides a close approximation. The mean error between the analytical model and the real-chip measurement is 10.5%.
• Effect on the model. Although the model is a function of time, the error depends on the VBN voltage. The time duration of E_ovs changes slightly; however, this does not have a major effect on the waveform. In contrast, the VBN voltage changes in large steps (every 100 mV) and thus affects the result. The maximum error is about 14%, whereas the effect on the total energy is about 1.6%. Even though the model has a mean error of about 10%, the energy reduction increases. As shown in Figure 10, the energy reduction ratio increases from 17.97% to 18.61%, from 21.86% to 23.78%, from 23.81% to 26.59%, from 27.71% to 32.11%, and from 29.64% to 53.19% for 2 ms, 3 ms, 4 ms, 12 ms, and 1 s, respectively, as the supply voltage, RBB voltage, and frequency decrease. Thus, the effect of the error on the model is negligible.
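The improvement per deadline can be tabulated from the ratios quoted above. This is a small sketch that assumes the "from" values are the coarse-voltage results and the "to" values are the optimized ones; the variable names are illustrative.

```python
# Energy reduction ratios (%) quoted in the text, per deadline.
deadlines = ["2 ms", "3 ms", "4 ms", "12 ms", "1 s"]
coarse_pct = [17.97, 21.86, 23.81, 27.71, 29.64]      # assumed: coarse VBN steps
optimized_pct = [18.61, 23.78, 26.59, 32.11, 53.19]   # optimized VBN-VDD

# Gain of the optimized configuration, in percentage points.
gain_points = {
    d: round(opt - coarse, 2)
    for d, coarse, opt in zip(deadlines, coarse_pct, optimized_pct)
}

for deadline, gain in gain_points.items():
    print(f"{deadline:>5}: +{gain:.2f} percentage points")
```

The gain grows with the deadline, which matches the observation that longer deadlines leave more room for lowering the supply voltage and frequency.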
Additionally, the coefficients of the target device depend on the chip temperature. As reported in [37] for the FD-SOI SOTB, the coefficients have an accuracy of 93.8% at 25°C, and the accuracy is maintained at 91.6% at 50°C. However, in the worst case, at the highest commercial temperature of 65°C, the accuracy decreases to 79.5%. Similar variations are therefore expected for the proposed model.
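The parameter fitting described at the start of this section can be sketched as follows. This is a minimal illustration, not the authors' implementation: the waveform shape κ(e^(−γt) − e^(−δt)) is an assumed stand-in for Equation (7), the samples are synthetic, the units are arbitrary, and the function names (`double_exp`, `sse`, `nelder_mead`) are hypothetical. A simplified Nelder-Mead routine is written out so that the example has no dependencies; in practice, a library routine such as SciPy's `scipy.optimize.minimize` with `method="Nelder-Mead"` would likely be used instead.

```python
import math

def double_exp(t, gamma, delta, kappa):
    # Assumed waveform shape; the paper's Equation (7) may differ.
    return kappa * (math.exp(-gamma * t) - math.exp(-delta * t))

def sse(params, samples):
    """Sum of squared errors between the assumed model and the samples."""
    gamma, delta, kappa = params
    try:
        return sum((double_exp(t, gamma, delta, kappa) - y) ** 2 for t, y in samples)
    except OverflowError:
        return float("inf")  # reject parameter sets that blow up numerically

def nelder_mead(f, x0, step=0.1, tol=1e-14, max_iter=2000):
    """Simplified Nelder-Mead: reflection, expansion, inside contraction, shrink."""
    n = len(x0)
    simplex = [list(x0)]
    for i in range(n):
        vertex = list(x0)
        vertex[i] += step * abs(vertex[i]) if vertex[i] else step
        simplex.append(vertex)
    values = [f(v) for v in simplex]

    for _ in range(max_iter):
        # Order vertices from best (lowest error) to worst.
        order = sorted(range(n + 1), key=values.__getitem__)
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        if values[-1] - values[0] < tol:
            break

        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]
        worst = simplex[-1]

        def along(scale):
            # Point on the line through the centroid and the worst vertex.
            return [c + scale * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = along(2.0)
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = along(-0.5)
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every vertex halfway toward the best one.
                best = simplex[0]
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]

    best_index = min(range(n + 1), key=values.__getitem__)
    return simplex[best_index], values[best_index]

# Synthetic, noise-free samples from hypothetical "true" parameters.
TRUE_PARAMS = (2.0, 20.0, 1.0)  # (gamma, delta, kappa), illustrative only
samples = [(0.02 * i, double_exp(0.02 * i, *TRUE_PARAMS)) for i in range(51)]

x0 = [1.5, 15.0, 0.8]  # deliberately wrong starting guess
initial_loss = sse(x0, samples)
fit, final_loss = nelder_mead(lambda p: sse(p, samples), x0)
print("fitted (gamma, delta, kappa):", [round(p, 3) for p in fit])
print("squared error before/after:", initial_loss, final_loss)
```

With measured waveforms, `samples` would hold the (time, value) pairs captured from the chip, and the residual after fitting gives the kind of per-voltage error discussed above.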

Conclusions and Future Work
In this paper, we proposed an analytical approach and methodology for optimizing the reverse body bias and supply voltage using interior point nonlinear programming for real-time systems.
We devised an equation for estimating the overhead energy that includes analytical function coefficients. We computed these coefficients using the Nelder-Mead algorithm, thereby transforming the physical parameters of the double exponential waveform into analytical function coefficients. We incorporated this mathematical model into the complete total energy model to improve RTS energy efficiency and accuracy. Then, we used the interior point nonlinear programming model for minimizing the total energy consumption.
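The optimization problem summarized above can be written compactly as a constrained program. The following is only a sketch: the decomposition of E_T into E_s, E_d, E_ovs, and E_id follows the energy breakdown used in the evaluation, whereas the cycle count N, the operating frequency f(V_DD, V_BN), the deadline D, and the voltage bounds are placeholder symbols introduced here for illustration. The exact constraints used with the interior point solver may differ, for example by including the transition time in the deadline constraint.

```latex
\begin{aligned}
\min_{V_{DD},\,V_{BN}} \quad & E_T = E_s + E_d + E_{ovs} + E_{id} \\
\text{subject to} \quad & \frac{N}{f(V_{DD}, V_{BN})} \le D, \\
& V_{DD}^{\min} \le V_{DD} \le V_{DD}^{\max}, \\
& V_{BN}^{\min} \le V_{BN} \le V_{BN}^{\max}.
\end{aligned}
```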
The evaluation results demonstrate that the proposed methodology can significantly reduce energy consumption without affecting the system's ability to meet the task deadline. We analyzed how BB optimization affects energy saving in terms of the tradeoff between energy consumption and execution time. The optimal VDD ranges from 399 to 375 mV for deadlines from 2 to 12 ms and is 341 mV for 1 s. The optimal VBN ranges from −445 to −477 mV over the same deadline range and is −689 mV for 1 s. These voltages correspond to frequencies from 38.86 to 30.94 MHz, and to 20 MHz for 1 s. The results show that the sleep-down transition accounts for 14% of the total energy consumed. We obtained a BET of 0.28 ms, which is consistent with the 0.25 ms found by the brute-force search at −500 mV/−400 mV, an error of ≈10%.
If the execution time of the target program is severely affected by the inputs and is difficult to estimate, the proposed methodology cannot be applied. However, a number of real-time scheduling algorithms have been reported for programs for which the execution time can be estimated.
Thus, the proposed model can be used as a reference for RTS, for automated computation, and for CAD (under development and future work) [38,39]. Extracting the parameters from the real chip was the first step toward modeling all SOTB devices. With this model, we can determine VBN/VBP, VDD, and how long each should be applied when the target application and the deadline are given. This makes the model useful for designing the system, including the chip.
These results demonstrate that the proposed methodology can achieve greater energy reduction, that it increases the accuracy, and that it can be automated.
At the time of the evaluation, only typical (TT) device dies were available. Future work is to validate the methodology across different device dies (fast, FF, and slow, SS) and different device architectures.