1. Introduction
Outer space is full of radiation sources, including the solar wind, solar flares, coronal mass ejections, galactic cosmic rays, the Van Allen radiation belts, and solar particle events. This radiation environment consists of particles such as protons, electrons, neutrons, and heavy ions [1]. A strike by any of these particles may compromise the normal operation of electronic circuits on board space systems in this environment. Depending on the type and characteristics of the impinging radiation, different effects, either irreversible or (partially or totally) reversible, may arise. There are two major effects of radiation: total ionizing dose (TID) and single event effects (SEEs). TID, also called the cumulative effect, produces gradual changes in the operational parameters of devices, which tend to degrade device characteristics over time. SEEs cause abrupt changes or transient behavior in circuits. Such effects interfere with the operation of space systems' electronics and, in some cases, threaten the survival of such systems. While TID effects reveal themselves gradually, often over years of operation before a complete failure, SEEs do not. This work considers alleviating the effects of SEEs on electronic circuits used for space applications.
Currently, the study of techniques to keep electronic circuits operational in such a hostile environment has increased [2], driven by the growing number of applications of radiation-tolerant circuits, such as space missions, satellites, and high-energy physics experiments [3,4]. This paper considers a module-level approach to radiation hardening using a fault-tolerant method.
Fault-tolerant methods use redundancy to mask or work around faults in electronic circuits. Redundancy is one of the most important methods for obtaining highly reliable systems. Redundancy techniques deliver continuous service in the presence of hardware faults by adding hardware components or computation time, which are used for fault detection or fault masking so that the effect of faults is not reflected in the output signal [5]. The most common radiation mitigation techniques are the TMR and FMR methods [6,7]. They are highly efficient but very costly, and are used in situations where high reliability is targeted. Reliability is an important quality measure of a fault-tolerant system.
Reliability is defined as the probability of not failing in a particular environment for a particular mission time. Suppose a system consists of N identical components. Let S(t) be the number of surviving components at time t, and Q(t) the number of components that have failed up to time t. Then the probability of survival of the components, also known as the reliability R(t), is given by:

$$R(t) = \frac{S(t)}{N} \tag{1}$$

A measure of failure, F(t), is defined as the conditional probability that the system fails by time t, referred to as the unreliability or failure time distribution:

$$F(t) = \frac{Q(t)}{N} \tag{2}$$

Since S(t) + Q(t) = N, therefore:

$$R(t) = 1 - F(t) \tag{3}$$

Since F(t) is a probability, its derivative is a probability density function, defined as

$$f(t) = \frac{dF(t)}{dt} = -\frac{dR(t)}{dt} \tag{4}$$

where f(t) gives the probability of failure per unit time.

Now, the failure rate λ is defined as the number of failures per unit time relative to the number of surviving components:

$$\lambda(t) = \frac{1}{S(t)}\,\frac{dQ(t)}{dt} \tag{5}$$

Using Equation (3), the failure rate can be written as

$$\lambda(t) = \frac{f(t)}{R(t)} = -\frac{1}{R(t)}\,\frac{dR(t)}{dt} \tag{6}$$

The expression may be integrated from 0 to time t, considering that R(0) = 1 at time t = 0 and that the reliability at time t is R(t). Then,

$$\int_{0}^{t} \lambda(\tau)\,d\tau = -\int_{1}^{R(t)} \frac{dR}{R} = -\ln R(t), \qquad R(t) = \exp\!\left(-\int_{0}^{t} \lambda(\tau)\,d\tau\right) \tag{7}$$

Often λ is assumed to be constant during the useful life of the system. Thus,

$$R(t) = e^{-\lambda t} \tag{8}$$

The mean time to failure (MTTF) for the system is obtained as

$$\mathrm{MTTF} = \int_{0}^{\infty} R(t)\,dt = \frac{1}{\lambda} \tag{9}$$

Assuming independent and identical modules, each with reliability Rm and a constant failure rate λ, and using the binomial theorem, the reliability of TMR is given as

$$R_{TMR} = \sum_{i=2}^{3} \binom{3}{i} R_m^{\,i}\,(1 - R_m)^{3-i} = 3R_m^{2} - 2R_m^{3} \tag{10}$$
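As a rough numerical check of the constant-failure-rate model, the following Python sketch evaluates R(t) = exp(−λt) and the TMR reliability 3Rm² − 2Rm³, and estimates the MTTF by integrating the reliability over time. The failure rate value, the function names, and the ideal (fault-free) voter are illustrative assumptions, not values taken from this work.

```python
import math


def module_reliability(lam, t):
    """Reliability of one module with constant failure rate lam: exp(-lam * t)."""
    return math.exp(-lam * t)


def tmr_reliability(rm):
    """Probability that at least 2 of 3 independent modules work (ideal voter assumed)."""
    return 3 * rm**2 - 2 * rm**3


def mttf_numeric(reliability, t_max, steps=200_000):
    """Trapezoidal estimate of the integral of reliability(t) from 0 to t_max."""
    dt = t_max / steps
    total = 0.5 * (reliability(0.0) + reliability(t_max))
    for i in range(1, steps):
        total += reliability(i * dt)
    return total * dt


lam = 1e-3            # hypothetical failure rate (failures per hour)
horizon = 50 / lam    # long enough that R(t) is negligible at the end

single_mttf = mttf_numeric(lambda t: module_reliability(lam, t), horizon)
tmr_mttf = mttf_numeric(lambda t: tmr_reliability(module_reliability(lam, t)), horizon)

print(f"single module MTTF * lambda = {single_mttf * lam:.4f}")
print(f"TMR MTTF * lambda           = {tmr_mttf * lam:.4f}")
```

Under these assumptions the single-module estimate approaches 1/λ and the TMR estimate approaches 5/(6λ). TMR therefore has a shorter MTTF than a single module, but a higher reliability whenever Rm > 0.5, that is, early in the mission.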
3. Proposed Four Modules Architecture
Although the modified triplex–duplex architecture has the best reliability and, consequently, MTTF, its disadvantage is its high hardware resource utilization. In an effort to achieve high reliability with lower resource requirements, a four-module architecture was developed, as shown in Figure 2. It has higher reliability than both the TMR and FMR methods and a lower hardware resource requirement than the FMR and modified triplex–duplex methods.
The operation of this architecture is similar to that of the modified triplex–duplex architecture above, except that there are four physical modules and two clone modules, reducing the total number of actual duplicated modules to four instead of six. The clone modules are created as long as at least two of the physical modules are fault free, which significantly reduces hardware resource utilization compared to the FMR and modified triplex–duplex methods. The architecture masks the failure of two physical modules out of four.
The proposed four-module architecture is comparable, in terms of reliability, to the highly reliable four-module self-purging redundancy [8,9]. Self-purging redundancy uses a threshold voter instead of a majority voter. A threshold voter outputs a 1 if the number of its inputs that are 1 is greater than or equal to the threshold value; otherwise it outputs a 0. The idea of self-purging redundancy is that if only one module fails, its output will differ from the others. A switch checks whether a module's output differs from the output of the threshold voter. If it does, the module is assumed to be faulty and its control flip-flop is reset to 0. This permanently masks the output of the module, so that its input to the threshold voter will always be 0.
As pointed out in [8], the self-purging method is not very popular because of its complex threshold voter architecture. In the self-purging technique, faulty-module detection is performed by comparing each module's output with the voted output. In the developed four-module method, however, the detection of the faulty module is carried out before voting, which reduces the complexity associated with a faulty voter, especially when multiple voters are used, as in self-purging redundancy. Moreover, the proposed four-module redundancy technique can tolerate the simultaneous failure of two modules, whereas a four-module self-purging redundancy with a threshold of 2 cannot. Self-purging redundancy with a threshold of T can tolerate up to T − 1 simultaneous failures.
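The self-purging behavior described above can be sketched in Python as follows. This is a simplified single-bit model under stated assumptions: the class name `SelfPurgingStage`, its `enabled` list (standing in for the control flip-flops), and the `step` method are hypothetical names introduced for illustration, and the voter itself is assumed fault free.

```python
def threshold_voter(inputs, threshold):
    """Output 1 if at least `threshold` inputs are 1, otherwise 0."""
    return 1 if sum(inputs) >= threshold else 0


class SelfPurgingStage:
    """Single-bit model of self-purging redundancy (illustrative only)."""

    def __init__(self, n_modules, threshold):
        self.threshold = threshold
        # One control flip-flop per module: 1 = participating, 0 = purged.
        self.enabled = [1] * n_modules

    def step(self, module_outputs):
        # A purged module always contributes 0 to the threshold voter.
        gated = [out & en for out, en in zip(module_outputs, self.enabled)]
        voted = threshold_voter(gated, self.threshold)
        # Any still-enabled module that disagrees with the vote is purged for good.
        for i, out in enumerate(module_outputs):
            if self.enabled[i] and out != voted:
                self.enabled[i] = 0
        return voted


stage = SelfPurgingStage(n_modules=4, threshold=2)
print(stage.step([1, 1, 1, 0]), stage.enabled)  # one module disagrees and is purged

fresh = SelfPurgingStage(n_modules=4, threshold=2)
print(fresh.step([0, 0, 1, 1]))  # two modules fail to 1 at once, correct value is 0
```

In this model, with four modules and a threshold of 2, a single faulty module is outvoted and purged. If two modules fail to 1 simultaneously while the correct value is 0, the threshold is reached and the voter outputs the wrong value, which illustrates the T − 1 limit stated in the text.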
Assuming the same conditions as in previous cases for reliability calculation,
There is a 25% and 30% improvement in MTTF compared to the TMR and FMR methods, respectively.
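The MTTF comparison can be explored with a short Python sketch. It rests on several assumptions that are not confirmed by the text: FMR is modeled as a 3-of-5 majority system, the proposed architecture is idealized as a 2-of-4 system (because it masks the failure of two physical modules out of four), voters and detection logic are fault free, and every module has the same constant failure rate λ. The function names are hypothetical. Each term Rm^j of the binomial reliability polynomial integrates to 1/(jλ), which gives the MTTF exactly in units of 1/λ.

```python
from fractions import Fraction
from math import comb


def k_of_n_reliability(k, n, rm):
    """Probability that at least k of n independent modules, each with reliability rm, work."""
    return sum(comb(n, i) * rm**i * (1 - rm) ** (n - i) for i in range(k, n + 1))


def k_of_n_mttf(k, n):
    """Exact MTTF of an ideal k-of-n system in units of 1/lambda.

    Expands each binomial term into powers of Rm = exp(-lambda * t) and uses
    the fact that the integral of Rm**j over t from 0 to infinity is 1/(j * lambda).
    """
    total = Fraction(0)
    for i in range(k, n + 1):
        for j in range(n - i + 1):
            coeff = comb(n, i) * comb(n - i, j) * (-1) ** j
            total += Fraction(coeff, i + j)
    return total


mttf_tmr = k_of_n_mttf(2, 3)    # TMR: 2-of-3
mttf_fmr = k_of_n_mttf(3, 5)    # assumed model of FMR: 3-of-5
mttf_four = k_of_n_mttf(2, 4)   # assumed model of the four-module scheme: 2-of-4

print("TMR :", mttf_tmr)
print("FMR :", mttf_fmr)
print("four:", mttf_four)
print("four - TMR:", mttf_four - mttf_tmr)
print("four - FMR:", mttf_four - mttf_fmr)
```

Under these assumptions the MTTFs are 5/6, 47/60, and 13/12 (in units of 1/λ), so the four-module model exceeds TMR by 1/4 and the assumed FMR model by 3/10. This seems consistent with the quoted 25% and 30% figures if they are read as differences normalized to the single-module MTTF of 1/λ, although that reading is itself an assumption.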
The contributions of the developed methods are as follows:
The authors proposed a high-reliability redundancy technique called the modified triplex–duplex redundancy, which has a 61% and 66% longer expected life than the TMR and FMR techniques, respectively, although its hardware utilization is higher than that of both methods.
To rectify the hardware consumption drawback of the modified triplex–duplex technique, the authors proposed a novel four-module redundancy technique derived from the modified triplex–duplex method, with the following advantages:
○ It is comparable in reliability to the four-module self-purging redundancy with a threshold of 2 and to TMR with one spare, with the additional advantages of tolerating the simultaneous failure of two modules and of reduced complexity, both of which those two techniques lack.
○ It gives a 30% higher MTTF than FMR while using fewer hardware resources.
○ It gives a 25% higher MTTF than the TMR method.
○ Unlike self-purging redundancy, which requires a specialized threshold voter, the proposed method can be used with both single and triplicated majority voter architectures, since it is based on the modified triplex–duplex architecture.
5. FPGA Implementation and Results Obtained
The digital PID compensator, an 8-bit sigma-delta ADC, and an 8-bit 1.5 MHz DPWM, as well as all the redundancy techniques, have been implemented in MATLAB and Xilinx System Generator. The overall objective is to regulate the output voltage to the desired value irrespective of the input voltage and load variations within the given ranges, and irrespective of the radiation-induced failure of any number of the duplicated modules, up to the masking ability of the redundancy technique being used.
5.1. Hardware-in-the-Loop Simulation
Hardware-in-the-loop (HIL) simulation is a powerful and practical method for testing the embedded controller efficiently. By thoroughly testing the controller in a virtual environment before proceeding to real-world tests of the complete system, one can meet reliability and time requirements in a cost-effective manner. HIL simulation also allows verifying whether the vendor-specific FPGA synthesis tool actually retains the module-level design, which is often not the case. Therefore, an HIL block is generated that represents the radiation-tolerant digital voltage mode controller for the synchronous buck converter.
The manual switches (S1, S2, S3, and S4), shown at the input of the controller HIL block diagram in Figure 5, are used to emulate radiation faults during simulation. This is accomplished by switching the controller inputs to signals other than those expected from the feedback system, or by switching the inputs to ground (zero). The duplicated voters' [11] error detectors (PIDErr1, PIDErr2, and PIDErr3) and the DPWM signal voters' error detectors (PWMErr1 and PWMErr2), shown at the output of the controller HIL block diagram in Figure 5, can be used to initiate a repair/reconfiguration process [12,13,14] when radiation faults occur in the respective voters, if such systems are used.
Figure 6 shows the converter output voltage and current without radiation fault emulation. Figure 7 shows the HIL simulation block during fault emulation of modules 1 and 2. Figure 8 presents the output voltage and current of the converter under fault emulation of modules 1 and 2. Module 1 is switched to a different signal at 0.5 ms, and module 2 is then switched to a different signal at 1 ms to emulate the radiation fault. As is clear from Figure 8, the converter output voltage rises for a short interval when the second module is switched. This is due to switching transients.
There are five other possible fault emulation cases. All of these fault emulation combinations produced the same output voltages and currents as the case shown in Figure 8.
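The count of fault emulation cases follows from choosing two faulty modules out of four. A minimal Python sketch, assuming the cases are exactly the unordered pairs of modules and that each case reuses the 0.5 ms and 1 ms switching instants of the case shown (both are assumptions; the names are illustrative):

```python
from itertools import combinations

MODULES = (1, 2, 3, 4)
SWITCH_TIMES_MS = (0.5, 1.0)  # instants from the modules 1 and 2 case, assumed for all cases

# Every unordered pair of modules that can be faulted together.
fault_cases = list(combinations(MODULES, 2))

for first, second in fault_cases:
    t1, t2 = SWITCH_TIMES_MS
    print(f"fault module {first} at {t1} ms, then module {second} at {t2} ms")

print(len(fault_cases), "cases in total")
```

The pair (1, 2) is the case shown in the figures, which leaves five other combinations.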
5.2. Comparison of FPGA Resource Utilization and Reliability
As can be seen from Table 2, the proposed four-module redundancy uses fewer hardware resources than the FMR and modified triplex–duplex redundancies while having higher reliability than the TMR and FMR techniques, as explained earlier.
6. Conclusions
This paper presents a module-level design approach to an FPGA-based radiation-tolerant digital voltage mode controller for a synchronous buck converter. A four-module high-reliability redundancy technique is proposed and implemented on a Zynq-7000 development board (Zybo). The technique has been compared with three other more commonly used redundancy techniques in terms of reliability and FPGA resource utilization. It is observed that the developed method has a 25% and 30% longer expected life than the TMR and FMR techniques, respectively, and requires fewer FPGA resources than the FMR and modified triplex–duplex techniques.
It is shown that the proposed method can be used for radiation-tolerant synchronous buck converter design in applications requiring a relatively longer mission time than the TMR and FMR techniques can provide. The work can be used in applications where the fault-masking ability of a system is required, for example space applications, power electronic converter applications, computers, satellites, and high-energy physics experiments.