Fault Analysis and Non-Redundant Fault Tolerance in 3-Level Double Conversion UPS Systems Using Finite-Control-Set Model Predictive Control

Abstract: Fault-tolerance is critical in power electronics, especially in Uninterruptible Power Supplies, given their role in protecting critical loads. Hence, it is crucial to develop fault-tolerant techniques to improve the resilience of these systems. This paper proposes a non-redundant fault-tolerant double conversion uninterruptible power supply based on 3-level converters. The proposed solution can correct open-circuit faults in all semiconductors (IGBTs and diodes) of all converters of the system (including the DC-DC converter), ensuring full-rated post-fault operation. This technique leverages the versatility of Finite-Control-Set Model Predictive Control to implement highly specific fault correction. This type of control enables a conditional exclusion of the switching states affected by each fault, allowing the converter to avoid these states when the fault compromises their output but still use them in all other conditions. Three main types of corrective actions are used: predictive controller adaptations, hardware reconfiguration, and DC bus voltage adjustment. However, highly differentiated corrective actions are taken depending on the fault type and location, maximizing post-fault performance in each case. Faults can be corrected simultaneously in all converters, as well as some combinations of multiple faults in the same converter. Experimental results are presented demonstrating the performance of the proposed solution.


Introduction
Power electronics converters play a major role in numerous critical applications, causing internal converter faults to carry significant consequences. Internal faults can disrupt or even interrupt critical processes, bringing severe economic losses, service unavailability, safety risks, or even loss of life. Therefore, power converter reliability and system resilience have become extremely important over the last years. Fault tolerance provides a way to mitigate the effects of internal converter faults, allowing the system to remain in operation until the faults are repaired. This significantly reduces the impact of faults and dramatically increases the system's resilience, avoiding undesired downtime. Semiconductor faults are one of the main causes of converter failure, so the development of fault-tolerant techniques to mitigate semiconductor faults is critical.
An Uninterruptible Power Supply (UPS) provides protection to a critical load, improving its resilience. Hence, it is imperative to guarantee that the load is protected not only against grid faults but also from internal faults in the UPS system itself. Therefore, fault-tolerant strategies are crucial to reduce or eliminate the effect of internal UPS faults, allowing the UPS and, consequently, the critical load to remain in operation after a fault occurs. This is particularly important in double-conversion UPS systems since all power supplied to the load flows permanently through the power converters. in other operating points or the dynamic response [75][76][77]. To tackle this problem, several techniques have been proposed to automatically adjust weighting factors during operation [77][78][79][80]. In particular, the solution proposed in [77] provides a continuous weighting factor adjustment, allowing the controller to adapt to different operating conditions and improving both steady-state performance and the dynamic response. Due to its high adaptability to different conditions, this technique presents great potential for fault-tolerant applications, since it can adapt to new (and initially unforeseen) operating conditions without re-tuning.
A cooperative FCS-MPC principle is proposed in [74], to improve multi-converter performance without increasing the overall computational load of the system (compared to independent controllers). This cooperative solution is combined with the dynamically weighted FCS-MPC technique proposed in [77] and applied to a 3-level double conversion UPS system. The resulting control system proposed in [74] is highly adaptable and promotes high cooperation between all converters of the UPS, improving the overall performance and dynamic response.
Other FCS-MPC solutions have also been proposed for UPS applications. For example, in [81][82][83][84][85] FCS-MPC solutions are proposed for 2-level UPS inverters. However, all these solutions focus uniquely on the inverter stage of the UPS and overlook the remaining system. Other studies focus on the parallel connection of multiple UPS inverters using FCS-MPC techniques [86,87]. Once again, these techniques focus only on the inverter stage of the UPS and a 2-level topology is used. In [74] a full 3-level UPS system is studied, considering all UPS converters (rectifier, inverter and DC-DC) and all operation modes. In [87,88] FCS-MPC techniques are proposed to control full UPS modules (rectifier+inverter) connected in parallel. These studies use both 2-level [87] and 3-level [88] topologies. However, the DC-DC converter and UPS battery bank are overlooked. Thus the operation of a full UPS system using FCS-MPC was rarely studied, especially with multilevel topologies. In addition, most solutions in the literature use conventional fixed weighting factor approaches, which may not provide sufficient adaptability to enable post-fault operation without altering controller parameters.
Due to the dynamic nature of the controller proposed in [74], this technique should be particularly suited for use in fault-tolerant UPS systems, since it can dynamically adapt to different operating conditions without changes to the controller parameters. The cooperative principle used between the converters of the UPS is also advantageous, since it allows each converter to aid in compensating the limitations imposed by faults in the others only when needed (without compromising normal operation). Even though [74] demonstrated the advantages of this controller, it did not study the system response in case of faults and did not propose any mechanisms to allow post-fault operation. Thus, this paper uses the control system proposed in [74], now proposing several control adaptations, topology changes and additional techniques to create a highly resilient non-redundant fault-tolerant UPS system. Some solutions have previously been presented featuring fault-tolerant measures to correct IGBT open-circuit (OC) faults in FCS-MPC-based systems. The studies in [41][42][43][44][45] correct OC faults in 2-level grid-connected converters by connecting the faulty phase to the midpoint of the DC bus using additional switches. A similar approach is taken in [61], where the faulty phase is connected to an auxiliary capacitor charged at half the DC bus voltage. In these cases, the prediction model is altered in order to consider the correct post-fault output voltage in the faulty phase. In [41,44] an additional objective is included in the cost function, to maintain DC bus capacitor balance in post-fault conditions. In [44] the system can operate even with a phase current sensor fault. In [45] the objective function is altered to minimize post-fault power ripple. In [43], a multi-vector approach is taken, applying multiple voltage vectors in each sampling period. Thus, in fault-tolerant operation the controller presents a modulator-like output. 
All solutions mentioned above require a distinct control-set and altered prediction model to be used after the fault.
In [47], a fault-tolerant FCS-MPC controller is proposed for doubly-fed induction generator applications, using 3-level NPC converters. As in the previous cases, slow switches are used to connect the faulty-phase to the midpoint of the DC bus, in all cases of IGBT OC fault in the NPC converter (regardless of the faulty switch). After the reconfiguration, the controller permanently considers a fixed switching state in the faulty phase (equivalent to the reconfiguration), thus reducing the available control set. The technique in [47] can also correct short-circuit faults, by permanently removing the switching states that cause a short-circuit from the control-set. In both cases, the post-fault control-set is merely a subset of the one available in normal conditions instead of a distinct one, making controller implementation significantly simpler. The solutions in [27,31] provide fault-tolerant capabilities to 3-level NPC rectifiers and motor drive inverters and also consider a subset of the initial control set for fault-tolerant operation. In these cases, switching states affected by the fault are excluded from the control set only when current in the faulty phase flows in a specific direction (affected by the fault), which allows the converter to still use them in other conditions. However, these studies do not explain how this exclusion is implemented. Moreover, [27] considers only faults in inner IGBTs of the NPC converter (all other faults are overlooked) and, since no other corrective actions are taken, the converter is actually unable to correct the effect of these faults (it is somewhat reduced, but high current distortion still exists). In [31] both inner and outer IGBT faults can be corrected. In the case of inner IGBTs, the faulty phase is connected to the DC bus midpoint using additional switches. To avoid derating, the DC bus voltage is increased in post-fault conditions. 
However, the new DC bus voltage value is chosen manually, so it cannot automatically adapt to varying load conditions. The previously described fault-tolerant FCS-MPC techniques study fault-tolerance in a single converter and imply a derating of the system in post-fault operation. Only the solution in [61] avoids system derating, by doubling the DC bus voltage after a fault is detected (using an auxiliary boost converter which drives the DC bus of the inverter).
Despite the great versatility of FCS-MPC, its application to fault-tolerant solutions has been relatively limited, focusing mostly on the study of single-converter systems (most frequently of 2-level type) and correcting only IGBT faults. Moreover, few studies have leveraged the versatility of FCS-MPC to impose advanced switching constraints in fault-tolerant operation. Also, few, if any, studies have focused on fault-tolerant FCS-MPC UPS systems and most solutions consider a post-fault derating, which is unacceptable in UPS applications. This paper proposes a new non-redundant fault-tolerant technique for a complete double-conversion UPS, based on multilevel topologies and leveraging the versatility of FCS-MPC. The proposed system uses two 3-level NPC converters and a 3-level DC-DC converter. A fault-tolerant scheme is proposed to correct faults in all UPS converters, guaranteeing full-rated operation after open-circuit faults in any semiconductor of the UPS. Unlike the solutions found in the literature, which focus mostly on the correction of OC faults in the IGBTs, the technique proposed in this paper can correct faults in all semiconductors of the UPS: IGBTs, clamping diodes, and anti-parallel diodes.
Even though numerous fault-tolerant solutions have been proposed for NPC converters, the fault-tolerant operation of a full multilevel-based double-conversion UPS system has not been properly studied. Also, few, if any, studies have addressed fault tolerance in the 3-level DC-DC converter, especially when used in this kind of system. The proposed solution uses a common approach for all UPS converters, based on three types of corrective actions: controller adaptations, hardware reconfiguration, and DC bus voltage adjustment.
Unlike most previously proposed solutions, the technique proposed in this paper does not use the same corrective action for all types of faults. Instead, highly differentiated corrective actions are taken depending on the faulty semiconductor type and location. This minimizes the impact of the fault correction on the remaining converters and maximizes the overall UPS performance in post-fault conditions. In addition, conversely to most fault-tolerant solutions found in the literature (FCS-MPC and modulator-based), the proposed technique does not entirely prevent the use of the switching states affected by the fault. Instead, a new technique is described in detail to selectively remove these switching states from the control-set only in affected current conditions. This means those switching states can still be used in all conditions that do not require the faulty component to carry a current. Thus, the converter retains higher control versatility. This type of advanced switching state exclusion is made possible by FCS-MPC and would be impossible to implement with most other control techniques.
The proposed solution requires minimal hardware expansion and allows full rated post-fault operation even after simultaneous faults in all UPS converters. The proposed technique can even correct multiple simultaneous faults in the same converter, depending on the faulty switches.
To summarize, this paper presents a detailed fault impact analysis and proposes a new, highly comprehensive, non-redundant fault-tolerant technique for multilevel-based UPS systems, which allows fast correction of OC faults in any semiconductor in the UPS (IGBTs or diodes) and full-rated post-fault operation. The proposed solution leverages the power and versatility of FCS-MPC to enable a selective switching state exclusion and implements highly differentiated corrective action for different faults, thus maximizing post-fault performance. The proposed technique can simultaneously correct faults in the 3 converters of the UPS and even several cases of multiple simultaneous faults within the same converter.
Experimental results are presented to demonstrate the performance and advantages of the proposed fault-tolerant solution in different scenarios.
This paper is organized as follows. Section 2 describes the studied UPS system. Section 3 presents a detailed fault impact analysis on the modulation capabilities of the grid-side and load-side converters, together with the proposed fault-tolerant solutions for these converters. Section 4 presents a fault impact analysis on the DC-DC converter as well as the proposed fault-tolerant solution for this converter. In Section 5, experimental results are presented and discussed. The advantages, shortcomings, and applicability of the proposed solution are discussed in detail in Section 6.

System Description
In this paper, a 3-level double conversion UPS system is proposed, consisting of two 3-Level Neutral-Point-Clamped (3LNPC) converters and a 3-Level DC-DC (3LDC) converter sharing a single DC bus, as represented in Figure 1. The grid-side converter (GSC) is connected to the grid by an inductive filter, while the load-side converter (LSC) is connected to the load using an LC filter. The DC-DC converter (DCC) connects the DC bus to the battery bank using an inductive filter. In this paper, the 3 phases on the grid side are represented by $X \in \{R, S, T\}$ while on the load side, they are represented by $X \in \{A, B, C\}$. The IGBTs on each leg are denoted as $S_{Xn}$, where $X$ denotes the phase and $n$ identifies the IGBT, numbered from top (1) to bottom (4). The anti-parallel diodes are identified as $D_{Xn}$, using the same numbering. The clamping diodes are denoted as $D_{X5}$ and $D_{X6}$ (numbered from top to bottom). The IGBTs and anti-parallel diodes of the 3LDC converter are denoted as $S_{Dn}$ and $D_{Dn}$ (numbered from top to bottom). Each 3LNPC converter has 3 possible switching states $S_X \in \{-1, 0, 1\}$ in each phase. The 3LDC converter has 4 possible switching states.
The proposed fault-tolerant UPS does not use redundant components and requires only minimal hardware expansion. Only 8 additional bidirectional switches are needed for fault tolerance (shown in red in Figure 1). These can be relays, triacs or contactors, for example. This represents a relatively low increase in cost and complexity when compared to the standard (non-fault-tolerant) system.
The three converters in the UPS system are controlled using a Cooperative and Dynamically Weighted Finite-Control-Set Model Predictive Control technique, proposed in [74]. This technique provides excellent dynamic response and steady-state performance, as well as improved converter cooperation. The use of a Model Predictive Control technique is highly advantageous for fault-tolerant purposes, since it allows the inclusion of advanced restrictions on the control-set, allowing the proposed technique to effectively limit/restrict the use of switching states affected by the existing fault in specific conditions. Since the used control technique is not the main focus of this work, it will not be discussed in detail in this paper (for more details, interested readers may refer to [74]). Instead, only the specific changes made to the controller in the scope of the fault-tolerant operation will be described.
Even though the detection and identification of faults is critical for the correct activation of fault-tolerant measures, they can be studied separately from the fault tolerance approach. Hence, this paper will focus uniquely on the fault-tolerant techniques used to correct faults on the UPS power converters and not on the techniques used to detect and identify those faults. For the experimental implementation of this work, a fault diagnosis technique based on those proposed in [89,90] is used to trigger the proposed fault-tolerant measures.

Fault Analysis and Proposed Fault Tolerance for the Grid-Side and Load-Side Converters
In this section, the impact of each type of fault affecting the 3LNPC converters is analyzed and the proposed fault-tolerant solution is presented. A space-vector analysis is used to provide a visual and intuitive analysis of the limitations imposed by each type of fault on the available modulation area, as well as of the correction provided by the proposed technique.

Fault Impact on the 3LNPC Modulation Capabilities
Each 3LNPC converter has 27 possible switching states, which produce 19 distinct voltage vectors in the absence of faults, as represented in Figure 2a. As visible in this figure, each of the 12 outer vectors is generated by a single switching state, while each inner voltage vector can be produced by two (redundant) switching states. The zero vector can be produced by three distinct switching states. In normal conditions, the converter has the full hexagon available as the modulation area. In Figure 2, the triplets shown at the end of each vector represent the switching states in the 3 phases that generate that vector, for example (1, 1, −1) or (0, 1, −1).
When any of the semiconductors of the 3LNPC converter becomes faulty and is left in OC, several switching states can stop producing the expected converter voltage, reducing the number of available voltage vectors. Figure 2b-e displays the voltage vectors affected by each type of semiconductor fault, as well as the usable modulation area after the fault (shaded in gray). In each subfigure, the voltage vectors displayed in red can no longer guarantee the expected output voltage and those in blue are only guaranteed by a single switching state (a redundant state is lost).
When a fault occurs, the voltage output of the converter is compromised, reducing the operating range of the system. For example, when an OC fault occurs in outer IGBT $S_{R1}$, several vectors are affected by the fault, reducing the available modulation area on the right side of the αβ plane, as shown in Figure 2b. Despite this reduction, balanced voltage modulation is still possible within the inner hexagon.
It is important to note that the voltage vectors affected by a given fault (shown in Figure 2) do not become entirely unavailable. For example, IGBT $S_{R1}$ only carries a current when $i_R < 0$. When $i_R > 0$, the current in phase R flows through the corresponding anti-parallel diode $D_{R1}$ and the affected switching states produce the expected output voltage and vectors. On the other hand, when $i_R < 0$, those states produce an output voltage different from expected, making it impossible to generate the corresponding voltage vectors. Thus, the voltage vectors and modulation areas shown as available in Figure 2 represent those that can be used correctly in all conditions.
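The vector counts quoted above can be checked with a short enumeration. The sketch below is only an illustration: it assumes per-phase switching states in {−1, 0, 1} and uses an integer key proportional to the Clarke (αβ) components, so the names `vector_key`, `vectors` and `redundancy` are hypothetical and not part of the original work.

```python
from itertools import product
from collections import defaultdict

def vector_key(state):
    """Integer pair proportional to the (alpha, beta) components of a state.

    alpha is proportional to 2*sa - sb - sc and beta to sb - sc, so two
    switching states share a key exactly when they give the same vector.
    """
    sa, sb, sc = state
    return (2 * sa - sb - sc, sb - sc)

# Group the 27 switching states by the voltage vector they produce.
vectors = defaultdict(list)
for state in product((-1, 0, 1), repeat=3):
    vectors[vector_key(state)].append(state)

# Count how many vectors are produced by 1, 2 or 3 switching states.
redundancy = defaultdict(int)
for states in vectors.values():
    redundancy[len(states)] += 1

print(len(vectors))              # number of distinct voltage vectors
print(dict(sorted(redundancy.items())))  # vectors per redundancy level
```

Under these assumptions the enumeration reproduces the 19 distinct vectors, with 12 single-state vectors, 6 vectors with two redundant states, and one vector (the zero vector) with three.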
A clamping diode fault only affects 2 voltage vectors and the full modulation area can still be used relying on the remaining vectors, as shown in Figure 2c. An OC in an anti-parallel diode has an impact on the modulation area analogous to the outer IGBT case, as shown in Figure 2d. However, this case presents an additional problem: when an anti-parallel diode is left in OC and an affected switching state is selected, there is no valid flow path for the phase current in the faulty phase. This means that the current flowing in the affected phase will be quickly extinguished, leading to very high voltage spikes. Nonetheless, if only the unaffected voltage vectors are used, balanced operation is still possible within the inner hexagon.
On the other hand, when a fault occurs in an inner IGBT ($S_{R2}$), the converter completely loses the ability to generate voltage vectors on one side of the modulation area, as represented in Figure 2e. This means that a balanced operation of the three-phase converter becomes impossible after the fault. To address this issue, a hardware reconfiguration can be performed, permanently connecting the faulty phase to the midpoint of the DC bus. This results in the modulation area shown in Figure 2f, making a balanced operation once again possible within the inner hexagon. The possibility of a balanced operation is the main requirement for the proposed fault-tolerant technique.
This analysis makes it clear that different types of faults have significantly different impacts on the output voltage that can be produced by the 3LNPC converter.

Multiple Simultaneous Faults in a 3LNPC Converter
If more than one fault affects a given 3LNPC converter simultaneously, the available switching states and voltage vectors will be further restricted. However, in some of these cases the inner hexagon is still available in the modulation area, making a fault-tolerant operation possible. Some examples are shown in Figure 3.  For example, when multiple faults occur affecting only outer IGBTs or anti-parallel diodes located in the upper half or lower half of their respective legs, the inner hexagon remains available. Some examples of this are displayed in Figure 3a,b. As shown in Figure 3b, the inner hexagon remains available, even with faults in the 3 phases, as long as the faults are located in the same half of their respective legs. On the other hand, as seen in Figure 3c, if switches in both the upper and lower half-legs are affected, the available modulation region no longer encompasses the inner hexagon and fault correction is impossible.
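The inner-hexagon argument for multiple faults can be illustrated numerically. The following sketch assumes, as a worst case, that an outer-IGBT or anti-parallel-diode fault in the upper half of a leg removes state 1 in that phase and that a fault in the lower half removes state −1 (the latter is inferred by symmetry and is an assumption). The helper names are hypothetical.

```python
from itertools import product

STATES = (-1, 0, 1)

def vector_key(state):
    """Integer pair proportional to the (alpha, beta) components of a state."""
    sa, sb, sc = state
    return (2 * sa - sb - sc, sb - sc)

# Zero vector plus the six inner (small) vectors of the 3LNPC converter.
INNER = {vector_key(s) for s in product((0, 1), repeat=3)}

def inner_hexagon_available(excluded):
    """Check whether every inner vector is still reachable.

    excluded maps a phase index (0, 1, 2) to the set of per-phase states
    assumed unusable in that phase (worst case, for any current direction).
    """
    usable = [
        s for s in product(STATES, repeat=3)
        if all(s[p] not in excluded.get(p, set()) for p in range(3))
    ]
    return INNER <= {vector_key(s) for s in usable}

# Single upper-half fault in the first phase.
print(inner_hexagon_available({0: {1}}))
# Upper-half faults in all three phases (same half of each leg).
print(inner_hexagon_available({0: {1}, 1: {1}, 2: {1}}))
# Upper-half fault in one phase, lower-half fault in another.
print(inner_hexagon_available({0: {1}, 1: {-1}}))
```

With these assumptions, the first two cases keep the inner hexagon while the mixed upper/lower case does not, in line with the behavior described for Figure 3a-c.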
Several other cases of multiple faults can maintain a balanced modulation within the inner hexagon, as exemplified by Figure 3d,e. In these cases, a fault correction is possible with the proposed technique (described next). Table 1 summarizes different types of single and multiple faults in a 3LNPC converter, their consequences on the modulation area and correctability with the proposed fault-tolerant technique.

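[Table 1. Single and multiple faults in a 3LNPC converter, their consequences on the modulation area, and their correctability with the proposed technique. Recoverable column headers: Voltage Spikes; Required DC Bus Increase; Correctable with Proposed Technique. Only the first row label (single fault, outer IGBT) survives; the remaining table body is not recoverable.]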

Proposed Fault-Tolerant Technique for the GSC and LSC
The proposed fault-tolerant UPS system uses the non-redundant topology presented in Figure 1 and consists of three main parts: controller adaptations, hardware reconfiguration (topology change) and DC bus voltage adjustment.

Controller Adaptations
After a fault (or reconfiguration), the 3LNPC converter cannot reliably use all its switching states. Hence, it is necessary to make sure that the controller does not select switching states that will produce an incorrect voltage output. In several solutions found in the literature, this is done by altering the modulation patterns in order to avoid the affected switching states [23,24,48,50,52,54,55]. However, this requires custom patterns to be developed for each fault case, which would be a significant undertaking given the large number of faults considered in this work. Additionally, the control states affected by the fault are typically removed completely from the switching possibilities of the converter, which is sub-optimal. In this work, a selective switching state exclusion is proposed, based on an FCS-MPC technique.
Consider the case of IGBT $S_{R1}$. This switch is used when switching state 1 is selected in phase R. However, it can only carry negative phase current ($i_R < 0$). When $i_R > 0$, the current flows through the anti-parallel diode $D_{R1}$ instead. Hence, when $S_{R1}$ is faulty, the modulation area is affected (as shown in Figure 2b) only when $i_R < 0$. When $i_R > 0$, the full modulation area is available. Thus, it makes sense to prevent the controller from using the affected control state only when $i_R < 0$, keeping a higher control versatility in the remaining conditions. This conditional exclusion of switching states is made possible by the extremely versatile nature of FCS-MPC and would be extremely hard to achieve with most other types of controller. Nonetheless, its implementation is rather straightforward with this technique.
The GSC has 2 main objectives: grid current tracking and DC bus capacitor balancing. The overall cost function used in the GSC [74] is given by

$$\hat{g}_{g} = \hat{W}_{i_g}\,\hat{g}_{i_g} + \hat{W}_{bal_g}\,\hat{g}_{bal_g} + g_{fault_g} \tag{1}$$

where $\hat{g}_{i_g}$ and $\hat{g}_{bal_g}$ represent the partial cost functions regarding the grid current tracking and capacitor balancing, respectively, and $\hat{W}_{i_g}$ and $\hat{W}_{bal_g}$ represent their respective dynamic weighting factors. The term $g_{fault_g}$ represents the newly added constraint for selective switching state exclusion due to the existence of faults (explained next).
Similarly, the LSC has 2 main objectives: load voltage tracking and DC bus capacitor balancing. In addition, a constraint is used to limit the output LSC current. The overall LSC cost function [74] is given by

$$\hat{g}_{L} = \hat{W}_{v}\,\hat{g}_{v} + \hat{W}_{bal_L}\,\hat{g}_{bal_L} + g_{i_L} + g_{fault_L} \tag{2}$$

where $\hat{g}_{v}$ and $\hat{g}_{bal_L}$ represent the partial cost functions regarding the load voltage tracking and capacitor balancing, respectively, and $\hat{W}_{v}$ and $\hat{W}_{bal_L}$ represent their respective dynamic weighting factors. The term $g_{i_L}$ represents a hard constraint that prevents the output LSC current from surpassing predefined values and $g_{fault_L}$ represents the newly added constraint for selective switching state exclusion after faults (explained next).
Whenever a fault is diagnosed, the controller first identifies the switching states whose output is altered by that fault and the specific current conditions in which those states are affected. This information is displayed in Table 2.

[Table 2. Switching states affected by each fault in phase X (to be avoided by the controller). Surviving column header: Switching States to Avoid. The table body is not recoverable.]
Then, if the current in the faulty phase $i_X$ is in the range affected by an existing fault, a very high penalization ($g_{fault} = \infty$) is included in the FCS-MPC objective function whenever an affected switching state (from Table 2) is considered by the controller. This effectively prevents the controller from using that switching state at that instant. If no affected switching state is being considered, then $g_{fault} = 0$.
This mechanism effectively prevents the use of switching states affected by an existing fault, only in the affected current conditions. If no fault affects the converter in the present current conditions, no penalization is applied ($g_{fault} = 0$) and the controller is free to use all switching states. If there are faults in several phases, the penalization is applied to all control options using an affected state in at least one phase.
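The conditional exclusion described above can be sketched as a small penalty function. This is a minimal illustration rather than the authors' implementation: the lookup `AFFECTED`, the fault labels `"S1"` and `"RECONF"`, and the function signature are hypothetical. Only two rules taken from the text are encoded (state 1 is affected by an outer IGBT $S_{X1}$ fault when $i_X < 0$, and a reconfigured phase may only use state 0); the remaining entries of Table 2 are not reproduced.

```python
import math

# Hypothetical excerpt of the affected-state lookup (not the full Table 2).
# current_sign = -1 means "affected only when i_X < 0"; None means "always".
AFFECTED = {
    "S1": {"states": {1}, "current_sign": -1},          # outer IGBT S_X1 open circuit
    "RECONF": {"states": {1, -1}, "current_sign": None},  # phase tied to the DC midpoint
}

def g_fault(candidate, currents, faults):
    """Return inf if the candidate uses an affected state, otherwise 0.

    candidate: per-phase switching states, e.g. (1, 0, -1)
    currents:  per-phase currents in the same phase order
    faults:    dict mapping a phase index to a list of fault labels
    """
    for phase, labels in faults.items():
        for label in labels:
            rule = AFFECTED[label]
            sign = rule["current_sign"]
            in_affected_range = sign is None or currents[phase] * sign > 0
            if in_affected_range and candidate[phase] in rule["states"]:
                return math.inf
    return 0.0

faults = {0: ["S1"]}  # open-circuit fault in the outer upper IGBT of the first phase
print(g_fault((1, 0, -1), (-5.0, 2.0, 3.0), faults))   # negative current: state excluded
print(g_fault((1, 0, -1), (5.0, -2.0, -3.0), faults))  # positive current: state allowed
```

In an FCS-MPC loop, this value would simply be added to the cost of each candidate switching state before the minimum is selected, so excluded candidates are never chosen while the same states remain available in unaffected current conditions.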
This solution significantly improves the response of the converter, with minimal impact on its operation and minimal design effort. This simple control restriction maximizes the available control options and ensures that the controller only considers valid switching states, correctly predicting the effect of each of them.
Fault-tolerant techniques frequently require control changes for post-fault operation. However, given the dynamic nature of the predictive controller, no further changes to the control system are necessary. This is a major advantage of using MPC, resulting in a very simple and effective fault-tolerant control. Additionally, the Dynamically Weighted controller used in this work [74] automatically adapts to the new conditions, so there is no need to define new weighting factor values for post-fault operation (significantly reducing the implementation effort of the fault-tolerant system).
Clamping diode faults (single or multiple) can be corrected with these controller adaptations alone, with no further corrective action, since the converter retains full modulation capabilities. All other fault types require additional actions.

Hardware Reconfiguration
The hardware reconfiguration of the 3LNPC converters consists of the deactivation of all IGBTs in the faulty phase and the activation of the additional bidirectional switch included in that phase (shown in red in Figure 1). This permanently connects the AC terminal of the faulty phase to the midpoint of the DC bus. This approach is common in 3-phase converters, having been used in 2-level [37,38,40–44,46] and NPC [29,47–55] converters. This fault-tolerant scheme requires only one additional bidirectional switch per phase of the 3LNPC converter, as represented in Figure 1. Triacs or Solid-State Relays (SSR) are typically used as these bidirectional switches. One of the main advantages of this approach is its relatively low cost (when compared to redundant techniques).
As visible in Figure 2f, after the reconfiguration only a limited portion of the modulation area is available, but balanced voltage modulation can be achieved within the inner hexagon. This is critical to correct inner IGBT faults, in which a balanced operation was impossible after the fault. However, for other types of faults a hardware reconfiguration is not necessary.
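The modulation area left after a reconfiguration can be illustrated by fixing the reconfigured phase at state 0 and enumerating the remaining options. This is a sketch under the assumption of per-phase states in {−1, 0, 1}; the names `vector_key`, `INNER`, `reconfigured` and `reachable` are hypothetical.

```python
from itertools import product

def vector_key(state):
    """Integer pair proportional to the (alpha, beta) components of a state."""
    sa, sb, sc = state
    return (2 * sa - sb - sc, sb - sc)

# Zero vector plus the six inner (small) vectors of the healthy converter.
INNER = {vector_key(s) for s in product((0, 1), repeat=3)}

# After reconfiguration, the first phase is tied to the DC bus midpoint (state 0).
reconfigured = [s for s in product((-1, 0, 1), repeat=3) if s[0] == 0]
reachable = {vector_key(s) for s in reconfigured}

print(len(reconfigured))   # remaining switching states
print(len(reachable))      # distinct vectors they produce
print(INNER <= reachable)  # whether the inner hexagon is still covered
```

Under these assumptions, 9 switching states remain and each produces a distinct vector, so all vector redundancy is lost, yet the zero vector and the six inner vectors are still reachable.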
In most solutions proposed in the literature, the hardware reconfiguration is always performed, regardless of the type of detected fault. In the technique proposed in this paper, the hardware reconfiguration is only used when it is strictly necessary (to correct inner IGBT faults or multiple faults in the same phase). By avoiding the hardware reconfiguration, a major advantage is achieved: the voltage vector redundancy is not completely lost (in most cases). This allows the converter to retain more of its DC bus balancing capabilities and improves the performance of the system in post-fault operation.
After the hardware reconfiguration is performed, the controller allows only the selection of switching state 0 in the reconfigured phase (by penalizing states 1 and −1, from Table 2). This ensures that the controller correctly predicts the converter response.

DC Bus Voltage Adjustment
After a fault (or reconfiguration) occurs, the available modulation area of the converter is reduced. By increasing the DC bus voltage, the available modulation area can be increased, covering the full modulation range when the DC bus voltage is doubled. This allows the system to maintain full-rated operation, which is critical in a UPS system. This principle has been used in several applications and converters [48][49][50] and is illustrated in Figure 4 for the case of a reconfiguration in phase R. When the DC bus voltage is doubled, the inner hexagon coincides with the original operating range, but the converter has significantly fewer switching options. This approach is used in all fault cases that result in a balanced modulation being possible only within the inner hexagon.
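As a brief check of why doubling the DC bus voltage restores the original operating range, the vector magnitudes can be compared. This sketch assumes an amplitude-invariant Clarke transform and pole voltages of $S_X V_{DC}/2$; neither convention is stated in the text, although the resulting ratio does not depend on the scaling chosen.

```latex
\begin{align*}
|\bar{v}_{large}| &= \tfrac{2}{3}\,V_{DC}, \qquad |\bar{v}_{small}| = \tfrac{1}{3}\,V_{DC} \\
V_{DC}' = 2\,V_{DC} \;&\Rightarrow\; |\bar{v}_{small}'| = \tfrac{1}{3}\,V_{DC}' = \tfrac{2}{3}\,V_{DC} = |\bar{v}_{large}|
\end{align*}
```

The vertices of the inner hexagon at the doubled voltage therefore coincide with the vertices of the original outer hexagon.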
When the DC bus voltage reference is doubled, the grid current and battery current references are saturated to their maximum allowed value. This ensures that both the GSC and DCC supply as much power as possible to the DC bus, making the DC bus voltage adjustment as fast as possible and therefore reducing the duration of the transient before steady-state post-fault operation is reached. After both capacitors reach a predefined voltage threshold (90% of their new target voltage), regular operation is resumed.
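The transient logic described above can be sketched as a simple condition. The function name, its arguments, and the assumption that each capacitor targets half of the new DC bus reference are hypothetical and serve only as an illustration; the voltages in the usage lines are arbitrary example values.

```python
def references_saturated(v_c1, v_c2, v_dc_ref, threshold=0.9):
    """Return True while the current references should remain saturated.

    Assumes each DC bus capacitor targets half of the new DC bus reference.
    Saturation ends only once BOTH capacitors reach threshold * target.
    """
    target = v_dc_ref / 2
    return v_c1 < threshold * target or v_c2 < threshold * target

# Example with an arbitrary new reference of 1600 V (800 V per capacitor).
print(references_saturated(700.0, 750.0, 1600.0))  # one capacitor still below 720 V
print(references_saturated(730.0, 725.0, 1600.0))  # both capacitors above 720 V
```

Requiring both capacitors to cross the threshold before resuming regular operation follows the description in the text; a practical controller might also latch this condition, which is not shown here.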
Due to the specifics of GSC operation, outer IGBT faults in the GSC can be corrected in a more optimized manner. The fundamental current and voltage vectors on the GSC are given by (3), where $\bar{v}_s$ and $\bar{i}_s$ represent the grid voltage and current vectors and $\bar{v}_g$ represents the fundamental GSC output voltage vector. The objective of the GSC is to absorb sinusoidal currents from the grid, drawing the necessary active power for the UPS operation and a user-defined reactive power level (typically zero, for unity power factor). Hence, from (3), it is possible to conclude that the fundamental GSC output voltage should slightly lead the grid voltage vector, as illustrated in Figure 5a.
Consider the case of a fault in IGBT $S_{R1}$. This switch only carries negative current ($i_R < 0$), which means that the fault only affects the converter when the grid current vector $\bar{i}_s$ is located on the left-hand side of the αβ plane, as illustrated in Figure 5a. However, since $\bar{v}_g$ only slightly leads $\bar{i}_s$, when $\bar{i}_s$ is on the left side of the plane (affected by the fault), $\bar{v}_g$ is almost always within the available modulation region. On the other hand, when $i_R > 0$ ($\bar{i}_s$ on the right side), the full modulation area is available.
Hence, the fault in $S_{R1}$ only truly impacts the GSC operation in the conditions represented in Figure 5a, when $\bar{i}_s$ is still on the left semi-plane and $\bar{v}_g$ is on the right semi-plane. This explains why this type of fault only affects the GSC during a small portion of the period (as will be shown in Section 5). Given the well-defined behavior of the GSC, the DC bus voltage can be increased only enough to ensure that the resulting modulation area encompasses $\bar{v}_g$ in the borderline scenario shown in Figure 5b, instead of doubling this voltage.
Through some trigonometric analysis of Figure 5b, it can be concluded that the required DC bus voltage for this minimal DC bus voltage increase is given by (4), where $f$ is the grid frequency, $|\bar{v}_s|$ is the grid voltage amplitude (given by the PLL) and $i^*_{sd}$, $i^*_{sq}$ are the direct and quadrature components of the grid current reference vector (responsible for the active and reactive power components, respectively). The grid current reference is calculated by the controller (refer to [74]) and is therefore automatically adjusted when the load or operating conditions change, automatically adjusting the target DC bus voltage for the correction ($V^*_{DC\,min}$).
This analysis assumes that the converter would generate the reference current vector $\bar{i}_g$ by applying the ideal output voltage vector $\bar{v}_g$. However, FCS-MPC does not use a modulator and can only apply a single control state per switching cycle, so it cannot apply a precise voltage vector. More importantly, the FCS-MPC controller needs to simultaneously pursue other concurrent objectives, which means it may need to compromise the reference current tracking in some control cycles. This leads the fundamental output voltage vector to deviate from the required vector $\bar{v}_g$ represented in Figure 5. Hence, a margin factor $k$ is introduced in (4). This factor provides some margin for the controller to compensate for these deviations. The value of $k$ can be chosen through testing, seeking a compromise between tracking accuracy and DC bus voltage minimization; in this paper, a value of $k = 1.1$ is used. With the proposed implementation, the minimum threshold $V^*_{DC\,min}$ is automatically adjusted depending on the operating conditions of the UPS.
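The "small portion of the period" can be illustrated with a short numerical sketch. It makes two simplifying assumptions that are not taken from the paper: the output voltage vector leads the current vector by a constant angle, and the fault is counted as affecting the converter whenever the current vector is in the left half-plane while the voltage vector is in the right half-plane. The function names and the 10° lead angle are hypothetical.

```python
import cmath
import math

def fault_affects(i_s: complex, v_g: complex) -> bool:
    """Condition of Figure 5a, simplified: current vector on the left half of
    the alpha-beta plane while the voltage vector is on the right half."""
    return i_s.real < 0 and v_g.real > 0

def affected_fraction(lead_deg: float, samples: int = 3600) -> float:
    """Fraction of one fundamental period in which the fault matters, assuming
    v_g leads i_s by a constant angle `lead_deg` (unit-amplitude vectors)."""
    lead = math.radians(lead_deg)
    hits = 0
    for k in range(samples):
        theta = 2.0 * math.pi * k / samples
        i_s = cmath.rect(1.0, theta)
        v_g = cmath.rect(1.0, theta + lead)
        hits += fault_affects(i_s, v_g)
    return hits / samples

fraction = affected_fraction(10.0)  # hypothetical 10 degree lead angle
```

Under these assumptions, the affected fraction is approximately the lead angle divided by 360°, so a small lead angle translates into a short affected interval per period.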
Thus, in the case of an outer IGBT fault in the GSC, a lower DC bus voltage increase is performed, mitigating the effect of the fault with lower DC bus voltage. This is advantageous because a higher DC bus voltage increases the stress to the components and tends to reduce the overall UPS efficiency (as shown in Section 5).
In the case of any other type of fault in the UPS, it is impossible to reliably establish a minimum DC voltage level (due to the unknown nature of the load, for example), so this voltage is always doubled for post-fault operation, except in the cases of clamp diode faults and of a single outer IGBT fault in the GSC.

Differentiated Correction Action for Each Type of Fault
The proposed fault-tolerant technique uses a highly differentiated approach for the correction of each type of fault. This minimizes the impact of the correction on the overall operation of the system and allows several faults to be corrected simultaneously, maximizing the reliability of the system without requiring additional components.
The decision process of the fault-tolerant procedure is represented in the flowchart in Figure 6. As visible in the figure, the converter takes distinct actions for each type of fault, in order to maximize the operational potential of the converter. The procedure in Figure 6 is valid for both the GSC and LSC.
If a single inner IGBT fault or multiple faults in the same phase occur, the hardware reconfiguration is performed and the DC bus voltage is doubled; the controller then only considers switching state 0 in the faulty phase. When a single outer IGBT in the GSC is faulty, the controller selectively avoids the affected switching states and a minimal DC bus voltage increase is performed, to the value given by (4). When faults are detected only on clamp diodes (single or multiple faults), the fault tolerance is ensured merely through controller adaptations, without changes to the DC bus voltage. In all other cases, the controller selectively avoids the switching states affected by the fault and doubles the DC bus voltage. In many cases, this restores complete modulation capabilities, enabling a correction of the fault effect. However, in some multiple fault cases, a complete fault compensation is not possible (as shown in Table 1). In these cases, even though a full compensation is impossible, the controller still performs the controller adaptations and DC bus voltage doubling, reducing the fault effects as much as possible. If a reconfiguration has been previously performed and a new fault occurs in a different phase, a full correction is impossible, so the previous reconfiguration is kept and the new faults are partly mitigated using only the selective switching state elimination.
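The decision procedure described in this paragraph can be summarized in a short sketch. This is an illustrative reading of the text, not the flowchart of Figure 6 itself: the names `Fault` and `select_actions`, the device labels and the returned dictionary are hypothetical, and the order in which the rules are checked (clamp-diode-only faults before same-phase faults) is an assumption. Fault combinations that the text does not spell out fall through to the final "all other cases" rule.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Fault:
    phase: str   # e.g. "R", "S", "T" (GSC) or "A", "B", "C" (LSC)
    device: str  # "inner_igbt", "outer_igbt", "clamp_diode", "antiparallel_diode"

def _actions(reconfigure, dc_bus, controller):
    return {"reconfigure": reconfigure, "dc_bus": dc_bus, "controller": controller}

def select_actions(faults, converter, reconfigured_phase=None):
    """Pick corrective actions for a 3LNPC converter ("GSC" or "LSC")."""
    if not faults:
        raise ValueError("at least one fault is required")
    phases = {f.phase for f in faults}
    devices = [f.device for f in faults]

    # A reconfiguration already exists and a fault appears in another phase:
    # keep the reconfiguration and only mitigate through state exclusion.
    if reconfigured_phase is not None and phases - {reconfigured_phase}:
        return _actions(reconfigured_phase, "doubled", "selective_exclusion")
    # Clamp diode faults only (single or multiple): controller adaptations only.
    if all(d == "clamp_diode" for d in devices):
        return _actions(None, "unchanged", "selective_exclusion")
    # Single inner IGBT fault, or multiple faults in one phase: reconfigure.
    if len(phases) == 1 and (len(faults) > 1 or devices[0] == "inner_igbt"):
        (phase,) = phases
        return _actions(phase, "doubled", "force_state_0")
    # Single outer IGBT fault in the GSC: minimal DC bus increase, Equation (4).
    if len(faults) == 1 and devices[0] == "outer_igbt" and converter == "GSC":
        return _actions(None, "minimal_increase", "selective_exclusion")
    # All other cases: selective exclusion and DC bus doubling.
    return _actions(None, "doubled", "selective_exclusion")
```

For example, `select_actions([Fault("R", "outer_igbt")], "GSC")` selects the minimal DC bus increase, whereas the same fault in the LSC leads to DC bus doubling. The sketch does not model which multiple-fault cases of Table 1 can be fully compensated.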

Fault Analysis and Proposed Fault Tolerance for the DC-DC Converter
In this section, the impact of each type of fault affecting the 3-level DC-DC converter is analyzed and the proposed fault-tolerant solution is presented. The same analysis and correction philosophy is used as in the previous section.

Fault Impact on the 3LDC Voltage Output Capabilities
In addition to faults in the 3LNPC converters of the UPS system (GSC and LSC), faults can also occur in the DC-DC converter. These faults can severely compromise the operation of the UPS, by making it impossible to charge the batteries or use the energy stored in them to supply the load if the grid fails. Therefore, it is extremely important to include fault-tolerant capabilities in the 3-level DC-DC converter.
Analogously to the 3LNPC case, OC faults in the 3LDC converter reduce the available output voltage generation capabilities of the DC-DC converter. This is illustrated in Figure 7 for several possible fault cases.
As shown in Figure 7, different faults have a distinct impact on the available voltage modulation interval. When an inner IGBT or diode is left in OC, the converter loses the ability to generate voltages higher than $v_{DC}/2$ (Figure 7b,c). On the other hand, when an outer semiconductor is faulty, the converter is unable to generate voltages lower than $v_{DC}/2$ (Figure 7c,e).
Depending on the battery bank voltage, it may still be possible to draw positive and negative current from the batteries after the fault (if $v_{bat}$ is within the available operating range). However, this depends on factors such as the battery charge level and capacitor balance. To ensure that the fault-tolerant solution is independent of the battery bank design choice and charge state, a full operating area needs to be guaranteed in post-fault operation.

Proposed Fault-Tolerant Technique for the DC-DC Converter
The fault-tolerant approach proposed for the 3LDC converter is based on the same design principles as the one previously presented for the 3LNPC case. This technique is based on the same 3 actions: controller adaptations, hardware reconfiguration and DC bus voltage adjustment.

Controller Adaptations
As proposed for the 3LNPC case, the FCS-MPC controller is used to selectively overlook the switching states affected by the fault only in the conditions in which the faulty switch would be used. The switching states to be avoided by the controller in each fault case are shown in Table 3, as well as those to be avoided after a reconfiguration (described next).
As previously described, a very high penalization is applied to the FCS-MPC objective function when an affected switching state is considered in the affected conditions shown in Table 3. The overall cost function of the DC-DC converter [74] has two main objectives (battery current tracking and DC bus capacitor balancing). In this cost function, $\hat{g}_{i_{bat}}$ and $\hat{g}_{bal_D}$ represent the partial cost functions regarding the battery current tracking and capacitor balancing, respectively, and $\hat{W}_{i_{bat}}$ and $\hat{W}_{bal_D}$ represent their respective dynamic weighting factors. The factor $g_{fault_D}$ represents the newly added constraint for selective switching state exclusion. This constraint is implemented analogously to those defined in Section 3.3.1.
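The selective exclusion mechanism can be illustrated with a minimal sketch. It assumes, for illustration only, that the fault constraint enters the cost function as an additive penalty and that the "affected conditions" are identified by the sign of the battery current (positive for charging, negative for discharging). The penalty value, the function names and the example exclusion table are hypothetical and do not reproduce Table 3.

```python
PENALTY = 1e9  # "very high penalization"; the actual value is an assumption

def fault_penalty(state, i_bat, excluded):
    """`excluded` maps a switching state to the battery-current sign
    (+1 charging, -1 discharging) for which the fault compromises it."""
    sign = 1 if i_bat >= 0 else -1
    return PENALTY if excluded.get(state) == sign else 0.0

def select_state(candidates, i_bat, excluded, w_ibat=1.0, w_bal=1.0):
    """Pick the switching state with the lowest total cost.

    `candidates` maps each state to its predicted partial costs
    (g_ibat, g_bal): current tracking and capacitor balancing.
    """
    def cost(state):
        g_ibat, g_bal = candidates[state]
        return (w_ibat * g_ibat + w_bal * g_bal
                + fault_penalty(state, i_bat, excluded))
    return min(candidates, key=cost)

# Hypothetical example: state 1 is compromised only while charging.
candidates = {1: (0.1, 0.0), 2: (0.5, 0.2), 3: (0.4, 0.1)}
excluded = {1: +1}
while_charging = select_state(candidates, i_bat=2.0, excluded=excluded)
while_discharging = select_state(candidates, i_bat=-2.0, excluded=excluded)
```

In this example the penalized state is skipped while charging but remains selectable while discharging, which is the behaviour that distinguishes selective exclusion from removing the state altogether.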

DC Bus Voltage Doubling
As in the 3LNPC converter case, doubling the DC bus voltage restores the original voltage modulation range (when the lower half of the voltage interval is available). This is illustrated in Figure 8.
However, this does not correct the fault in the case of inner semiconductor faults, since the lower half of the modulation region is lost. In these cases, a hardware reconfiguration is needed.

Hardware Reconfiguration
In order to correct faults in inner semiconductors, a hardware reconfiguration is necessary. Thus, a non-redundant fault-tolerant 3LDC topology is proposed. This topology, shown in Figure 1, requires only 2 additional bidirectional switches to be included in the 3LDC converter (such as triacs or SSRs). The reconfiguration performed with this topology is demonstrated for the example of an OC fault in IGBT S D2 , shown in Figure 9.
In order to correct an inner semiconductor fault ($S_{D2}$ in the example), the additional switch located in the same half-leg is activated. This results in the configuration shown in Figure 9b, which is equivalent to a permanent activation of the faulty switch. This way, the previously unavailable states become the only available ones, as visible in Figure 9c.
Given that each fault in the 3LDC converter only affects the converter during either the charging or discharging of the batteries (positive or negative battery current, respectively), as seen in Table 3, the reconfiguration can be activated or deactivated, in order to maximize the readiness of the UPS system. For example, if diode $D_{D2}$ is in OC, the fault disrupts converter operation only when charging the batteries, so a reconfiguration is only required when the batteries need to be charged. On the other hand, when discharging the batteries this reconfiguration is unnecessary and needlessly limits the switching possibilities of the converter (reducing, for example, its ability to balance the DC bus capacitors). Conversely, an OC in IGBT $S_{D2}$ affects the converter while discharging the batteries, so a reconfiguration is only needed when discharging the batteries (UPS in stored energy mode).

Correction Action for Each Type of Fault
The decision process for the proposed fault-tolerant strategy in the 3LDC converter is represented schematically in the flowchart in Figure 10.
The fault-tolerant actions to be taken by the system depend mainly on the operating mode of the DC-DC converter (charging or discharging the batteries). Since each fault only affects the converter in one of these modes, a given hardware reconfiguration can be activated only when the converter operates in a mode affected by the fault.
Since the DC-DC converter may need to quickly draw energy from the batteries in case of grid failure, the system is always left in the state which allows immediate battery discharge (by default), except during the battery charging process. This ensures maximum readiness for compensation of grid faults.
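A minimal sketch of this mode-dependent activation is given below. The function name and the string labels are hypothetical; the rule itself follows the text: the reconfiguration is active only in the mode disrupted by the fault, and whenever the batteries are not being charged the converter is kept ready for discharge.

```python
def reconfiguration_active(affected_mode: str, charging_batteries: bool) -> bool:
    """Decide whether the hardware reconfiguration should be active.

    affected_mode: "charge" or "discharge", the operating mode disrupted by
    the inner semiconductor fault (for example, "charge" for an open circuit
    in D_D2 and "discharge" for an open circuit in S_D2).
    charging_batteries: True only while the batteries are being charged;
    otherwise the converter is kept ready for immediate discharge.
    """
    if affected_mode not in ("charge", "discharge"):
        raise ValueError("affected_mode must be 'charge' or 'discharge'")
    current_mode = "charge" if charging_batteries else "discharge"
    return affected_mode == current_mode
```

With this rule, a fault that affects discharging keeps the reconfiguration active by default (so a grid failure can be handled immediately), while a fault that affects charging only triggers it during the charging process.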
Despite the non-redundant nature of the proposed fault-tolerant scheme, it is possible to correct some combinations of simultaneous faults. For example, if two faults exist but each of them affects the converter in a different operating mode (battery charging/discharging), both of them can be corrected, since they do not simultaneously affect the system. Moreover, if 2 faults affect the same operating mode but are located in the same half-leg (for example, $S_{D1}$ and $D_{D2}$), the converter can maintain a correct operation as long as the reconfiguration is performed (as it would if only $D_{D2}$ was faulty).
In conclusion, the proposed fault tolerance scheme can correct multiple faults independently of each other if they affect different battery modes (charge/discharge) and can correct multiple faults in the same mode as long as they are located in the same half-leg. This provides a very high degree of protection, even though the technique has no redundant equipment.
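The rule stated in this conclusion can be written as a short check. The sketch below is illustrative: the function name, the tuple layout and the half-leg labels ("H1", "H2") are hypothetical, and the affected mode and half-leg of each device must be supplied by the caller (for example, from Table 3).

```python
from collections import defaultdict

def correctable(faults):
    """Check whether a set of 3LDC faults can be corrected simultaneously.

    `faults` is an iterable of (name, affected_mode, half_leg) tuples.
    Faults that affect different modes are handled independently; faults
    that share a mode must all be located in the same half-leg.
    """
    half_legs_by_mode = defaultdict(set)
    for _name, mode, half_leg in faults:
        half_legs_by_mode[mode].add(half_leg)
    return all(len(legs) == 1 for legs in half_legs_by_mode.values())

# Example from the text: S_D1 and D_D2 share a mode and a half-leg.
same_half_leg = correctable([("S_D1", "charge", "H1"), ("D_D2", "charge", "H1")])
# Different modes: each fault is corrected when its own mode is active.
different_modes = correctable([("D_D2", "charge", "H1"), ("S_D2", "discharge", "H1")])
# Hypothetical pair sharing a mode but located in different half-legs.
different_half_legs = correctable([("X1", "charge", "H1"), ("X2", "charge", "H2")])
```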

Experimental Results
Experimental results are now presented, demonstrating the advantages of the proposed fault-tolerant technique. The prototype developed to test the proposed fault-tolerant UPS system is shown in Figure 11, and its parameters are listed in Table 4. For prototype safety, the grid voltage is adjusted to 60 V (rms) with an autotransformer and the DC bus reference in the absence of faults is set to 110 V. The UPS supplies a nonlinear load, composed of a 3-phase resistive load (50 Ω resistors connected in ∆) connected in parallel with a 3-phase diode rectifier feeding a 50 Ω//94 µF load. All controllers, fault diagnosis and fault-tolerant techniques are implemented in Matlab/Simulink and executed in real-time in a dSpace MicroLabBox platform, with a sampling time of 70 µs. A Yokogawa WT3000 power analyzer is used to monitor system performance. Custom-made MOSFET-based SSRs are used to perform the hardware reconfigurations.

Normal UPS Operation
The steady-state performance of the proposed UPS is shown in Figure 12. As seen in this figure, the load draws a highly distorted current from the UPS (THD ≈ 19.33%), but the UPS ensures low load voltage distortion (THD ≈ 2.55%), well within the acceptable range. The load draws approximately 375 W and 235 VAr. The UPS draws approximately sinusoidal currents from the grid (THD ≈ 0.86%) with unity power factor. In these conditions, the GSC and LSC display an average switching frequency of approximately 2.8 kHz and 4.4 kHz, respectively. Given the low voltage and low power conditions used for prototype safety, the UPS displays an overall efficiency of approximately 74% in this test.

Grid-Side Converter Faults
An OC fault is created in IGBT $S_{R2}$ at t = 80 ms and is quickly identified. After the diagnosis, the reconfiguration is immediately performed and the DC bus voltage reference is doubled to 220 V. As seen in Figure 13, the switching state 0 is permanently selected in phase R after the reconfiguration. As soon as the fault is detected, the GSC and DCC immediately increase their current references to the maximum defined value ($\bar{i}^*_g = 15$ A in the GSC and $i^*_{bat} = 10$ A in the DCC). This ensures that they supply the maximum possible power to the DC bus, in order to quickly charge it. After approximately 45.5 ms, both capacitors surpass 90% of the target voltage (110 V per capacitor) and the current references return to their regular calculated values. This results in a very fast transition to post-fault operation. The load voltage is not affected by the fault or DC bus charging, continuously ensuring a high-quality voltage waveform at the critical load.
Figure 14 demonstrates the UPS operation in the presence of an $S_{R2}$ fault (with no correction) and the steady-state post-fault operation with the proposed technique. It is clear that an uncorrected fault severely degrades the grid current waveform, resulting in a very high THD. On the other hand, with the proposed fault-tolerant solution the UPS presents approximately the same grid current distortion as in normal operation (THD ≈ 1%). The THD of the load voltage waveform actually decreases in post-fault operation. Even though this reduction seems contradictory, it can be easily explained. After the DC voltage is doubled, the LSC mostly uses the inner voltage vectors. However, since this converter is healthy, the outer voltage vectors are still available (with double the amplitude). These larger vectors provide faster compensation of the current spikes generated by the non-linear load, reducing the small "dips" caused by these spikes on the load voltage waveform and therefore reducing the overall THD. Thus, the harmonic content found on the UPS does not increase in post-fault operation.
Even though phase R no longer presents switching (since it is permanently connected to the DC bus midpoint), phases S and T display a higher switching frequency than in normal operation (≈3.6 kHz). More importantly, the higher voltage and lack of medium voltage vectors (now located at the outside of the initial modulation area) lead to a significantly higher switching frequency on the LSC (close to 8 kHz compared to 4.4 kHz in normal operation). Consequently, the higher voltage applied to each semiconductor and higher switching frequency lead to increased converter losses. This is clear in the overall UPS efficiency, which drops from an initial 73.8% to approximately 66.4%.
This demonstrates that an efficiency reduction is expected in post-fault conditions. However, it must be noted that the amplitude of this reduction is not representative of the one expected in an industrial system. Since very low power is drawn from the UPS in this test, even a small increase in losses produces a significant efficiency reduction. In high-power operation, the efficiency should be reduced, but not in such a considerable way. Even with this efficiency reduction, fault tolerance is highly advantageous, since it allows the system to remain in operation until the fault is repaired.
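A rough worked example shows how small the additional losses behind this efficiency drop are in absolute terms. It assumes, purely for illustration, that the UPS output power in both tests equals the approximately 375 W drawn by the load in normal operation; the function name is hypothetical.

```python
def losses_w(p_out_w: float, efficiency: float) -> float:
    """Power dissipated in the UPS for a given output power and efficiency."""
    return p_out_w / efficiency - p_out_w

P_LOAD_W = 375.0  # assumed output power, taken from the normal-operation test

pre_fault_losses = losses_w(P_LOAD_W, 0.738)    # efficiency before the fault
post_fault_losses = losses_w(P_LOAD_W, 0.664)   # efficiency in post-fault operation
extra_losses = post_fault_losses - pre_fault_losses
```

Under this assumption, the efficiency drop from 73.8% to 66.4% corresponds to roughly 57 W of additional losses, a small absolute amount that nonetheless weighs heavily against such a low output power.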
It is also clear from Figure 13 that the capacitor balance is reduced after the fault, resulting in higher capacitor voltage oscillation than in normal operation. Despite the efficiency reduction, the UPS successfully maintains operation in the same conditions after the fault, without compromising the operation of the critical load.
The fault correction procedure in case of an outer IGBT fault (in $S_{R1}$) is shown in Figure 15. The fault is detected at t = 60 ms and the converter immediately triggers the DC voltage adjustment. As seen in the figure, the grid currents are distorted while the DC bus voltage increases, but resume a sinusoidal waveform once the DC bus approaches its target voltage. In this case, the DC bus voltage is not doubled, but increased only to the voltage value calculated using (4).
From Figure 16a it is clear that this fault only affects the converter for a short portion of the period, during the transition to negative current (as seen in Figure 5). Nonetheless, it causes significant grid current distortion, which needs to be corrected. As shown in Figure 16b, the proposed technique reduces the current THD to approximately 1.65%.
Several design choices made in the fault-tolerant technique can be better understood by studying Figure 16c-e. Figure 16c clearly demonstrates that without adjusting the DC bus voltage, an acceptable correction may not be possible (depending on the operating conditions). Comparing Figure 16b,d, one can conclude that a higher THD reduction would be possible by doubling the DC bus voltage, instead of performing a minimal DC bus voltage increase. However, a higher DC bus voltage also leads to higher switching losses in both the GSC and LSC: with full DC bus voltage doubling, the average GSC/LSC switching frequency is 4.4 kHz/8 kHz vs. 4.2 kHz/7.5 kHz in the minimal voltage increase case. This consequently causes lower UPS efficiency (65.6% vs. 68%). This is why a minimal voltage increment is advantageous.
It is important to note that the minimal voltage increase in this case was quite significant (from 110 V to approximately 185 V, roughly a 68% increase). This is mainly due to the very low grid voltage used in the tests (for prototype safety). If a higher voltage is used, a significantly lower voltage increase would be seen (proportionally). For example, with a grid voltage of 400 V, 700 V DC bus and the converter drawing 7.5 kW from the grid, Equation (4) would result in a minimum DC voltage of approximately 826.5 V (a 19.5% increase). Thus, in industrial applications this approach will carry an even greater advantage than demonstrated by these results.
Finally, Figure 16e demonstrates the UPS performance obtained with a complete elimination of the affected switching state (1) from phase R. This approach reduces the control versatility more than the proposed solution, since the state cannot be used in the conditions in which it is not affected by the fault. Even though this does not compromise the modulation area, it reduces the DC bus balancing capabilities of the converter, which leads to higher DC bus unbalance (clearly seen in the figure), as well as a higher grid current THD. This clearly demonstrates the advantages of using the proposed selective switching state exclusion technique. Figure 17 displays the case of a fault in clamp diode D R5 of the GSC. As shown in Figure 17, the proposed technique is able to correct the clamp diode fault merely through software adaptations. If no corrective action is taken, the current in the faulty phase R is slightly distorted, as seen in Figure 17a. More importantly, the unexpected voltage output of the affected switching state leads to erroneous DC bus power calculation, which raises the DC bus voltage level (from 55 V to about 63 V). By avoiding the affected control state, this problem is entirely avoided and the grid current THD remains similar to pre-fault conditions. As before, the proposed selective state exclusion promotes better DC bus voltage balance than a full state exclusion. In this fault case, the DC bus voltage remains the same and the switching frequency is not significantly altered, resulting in approximately the same efficiency as in normal operation.
Since anti-parallel diode faults could be potentially destructive for the developed prototype, due to the high voltage spikes caused by the sudden phase current elimination, this type of fault was not tested experimentally. Instead, the fault-tolerant procedure was triggered without an actual fault, merely to demonstrate the transient and post-fault performance with this type of fault. This is illustrated in Figure 18. As in previous cases, the DC bus voltage reference is immediately doubled and the controller selectively avoids the switching states that use the faulty diode (thus avoiding causing additional voltage spikes). During the DC bus voltage doubling the grid currents are considerably distorted, but resume operation with low distortion as soon as the target DC voltage is reached (THD ≈ 0.98% in steady-state). The load voltage THD is reduced, as previously explained. As before, a UPS efficiency reduction is observed in fault-tolerant operation.
Figure 19 demonstrates the corrective potential of the proposed technique in cases where multiple simultaneous faults exist in the GSC. The UPS post-fault performance in each presented case is demonstrated with no corrective action (on top) and with the proposed fault-tolerant mechanism (on the bottom). In all cases, it is clear that the proposed technique significantly reduces the impact of the faults on the grid current waveform and on the DC bus voltage level (which tends to deviate from the intended reference in the presence of faults). The load voltage waveform also tends to slightly improve. In the first and second cases, the post-correction THD values are similar to those found in normal operation (lower in the case of voltage THD, as previously explained), as seen in Figure 19d,e. The third case, shown in Figure 19c,f, represents the case of two faults in outer IGBTs, one located in an upper half-leg and the other in a lower half-leg ($S_{R1}$ and $S_{S4}$).
From Table 1, this type of fault cannot be completely corrected, since it is impossible to recover a full modulation area. Nonetheless, the proposed technique is able to significantly reduce the effect of the fault, leaving the resulting grid current THD with an acceptable value (3.1%). This demonstrates that even in cases in which a full modulation area cannot be obtained, the doubling of the DC bus and the proposed selective state exclusion can provide acceptable post-fault performance. In all cases, GSC/LSC switching frequency significantly increases after the DC voltage doubling (to slightly over 4 kHz/8 kHz), once again leading to an efficiency reduction.

Load-Side Converter Faults
Experimental results are now presented for the case of faults in the LSC. Figure 20 demonstrates the steady-state UPS operation after a fault in IGBT $S_{A1}$ or $S_{A2}$, when no corrective actions are applied. As visible, these faults have a very high impact on the load voltage waveform, especially in the inner IGBT case. These faults severely distort the load voltage waveforms and alter their RMS value, which can compromise the operation of the protected critical load.
In the inner IGBT fault case with the proposed correction, the fault occurs at t = 80 ms and is quickly identified. The DC bus voltage reference is immediately doubled and the controller avoids the affected switching states. The GSC and DCC immediately begin to inject as much power as possible into the DC bus, raising its voltage to the new reference value (220 V) in little over 30 ms. The grid current distortion remains close to the pre-fault values. The load voltage distortion after the reconfiguration is not significantly increased (it rises from 2.55% to 2.7%). During the transient, the load voltage is distorted (mainly in phase A). However, this distortion is relatively mild and lasts only for about one fundamental period, which should not compromise the critical load operation. The DC bus capacitor balance is not significantly affected by LSC faults, since the GSC is mainly responsible for keeping the DC bus capacitors balanced.
In post-fault operation, the GSC switching frequency increases to approximately 3.3 kHz. The switching frequency in phases B and C of the LSC increases to approximately 5.1 kHz. Thus, the system displays switching frequencies significantly lower than in the case of GSC faults. In addition, phase A no longer has switching losses, since a hardware reconfiguration is performed. This results in an overall efficiency considerably higher than in the case of GSC faults: system efficiency is reduced from 73.8% to 71.4%.
The response of the proposed fault-tolerant system when a fault occurs in outer IGBT S A1 is displayed in Figure 22. In this case, as seen in the decision process in Figure 6, no hardware reconfiguration is performed. The DC bus voltage doubling is similar to the previous case. Since a reconfiguration is not performed, the LSC retains higher control versatility, which is why it can achieve lower load voltage distortion than in the previous case (THD = 2.3%). In this case, the load voltage distortion during the transient is corrected even faster than in the previous case. Hence, the critical load operation is not affected by the fault. With this type of fault, the average switching frequency in post-fault operation is higher than in the previous case (4 kHz in the GSC and ≈ 5.6 kHz in the LSC, with all phases in operation). For this reason, a lower efficiency is obtained.
The clamp diode and anti-parallel diode cases are similar to those presented for the GSC (and are therefore omitted). Some cases of multiple simultaneous faults in the LSC are shown in Figure 23, without any fault correction (on top) and with the proposed fault-tolerant scheme (on the bottom).
The first case, shown in Figure 23a,d represents the case of 2 faults in the same phase (IGBTs S A2 and S A3 ). As seen in Figure 23a, these faults critically compromise the load voltage waveform when uncorrected. On the other hand, when the proposed fault tolerance is used, these 2 faults can be corrected through the reconfiguration of phase A and the doubling of the DC bus voltage. As seen in Figure 23d, the UPS keeps similar grid current and load voltage distortion after the fault. As in the previous cases, the overall UPS efficiency is reduced, but is considerably higher than in the case of GSC faults.
The second case, displayed in Figure 23b,e, represents the case of 2 faults in outer IGBTs in different phases, both in the upper half-leg. In this case, the fault can be completely corrected, resulting in performance similar to the pre-fault operation (with a lower efficiency). The case in Figure 23c,f represents one of the cases in which a complete modulation area cannot be achieved: two outer IGBTs in different half-legs ($S_{A1}$ and $S_{B4}$ in this case; refer to Table 1). As predicted, these two faults cannot be completely compensated by the proposed technique. Unlike the analogous case of GSC faults (in Figure 19f), sufficient fault mitigation cannot be achieved and significant voltage distortion is seen on the load voltage waveform (THD = 11.3%), which will most likely compromise the operation of the critical load.
Nonetheless, several cases of simultaneous multiple faults can be effectively corrected in the LSC.

DC-DC Converter Faults
Results are now presented demonstrating the performance of the system when an OC fault occurs in the DC-DC converter. Figure 24 demonstrates the UPS operation when a fault occurs in outer IGBT $S_{D1}$ of the 3LDC converter, while the batteries are being charged. As visible in Figure 24, the batteries are initially being charged with a 2 A current. Then, at t = 20 ms, an OC fault is created in IGBT $S_{D1}$. The DC bus voltage reference is immediately doubled. Both the GSC and the DCC immediately begin charging the DC bus. The fault in $S_{D1}$ does not affect the discharging of the batteries, so the DCC can contribute to raising the DC bus voltage without restrictions. Once the DC bus is charged, the DCC resumes battery charging. At this point, correct operation with positive current is already possible and normal operation is resumed.
Figure 25 displays the case of a fault in inner IGBT $S_{D2}$. The UPS operates in stored energy mode when the fault occurs, with the DCC supplying all energy to the load. When the fault is identified, at t = 150 ms, the reconfiguration is immediately performed, which restores the ability of the converter to generate negative current. Simultaneously, the DC bus voltage reference is doubled. This makes the DCC draw as much power as possible from the batteries, in order to raise the DC bus voltage.
In this particular case, since the UPS operates in stored energy mode, the GSC cannot contribute to raising the DC bus voltage, making DC bus charging slower. To minimize this problem, the DCC can draw higher current from the batteries to speed up this process (effectively operating in overload for the duration of the process). This is possible because batteries typically support higher discharge currents for short periods (higher than the rated continuous discharge current). Since the DC charging process is usually fast (the DCC operates in overload only for approximately 226 ms in this test), this does not pose a risk for the batteries.
In the experimental tests, the battery overcurrent limit established for this period is 13 A. Depending on the battery type and characteristics, this overcurrent limit can be significantly higher (for example, with lithium batteries). The higher the current drawn from the batteries, the faster the transition to post-fault operation.
After the reconfiguration, the DCC loses its ability to balance the DC bus capacitors, and is only able to charge capacitor C 2. Thus, as the converter tries to raise the DC bus voltage, it also significantly increases the unbalance between the two capacitors (reaching values as high as 60 V, as visible in Figure 25). This case is particularly affected by this limitation because the GSC is not available in stored-energy mode and cannot contribute to the DC bus balance. The LSC contributes to correcting this unbalance, but because this converter gives higher priority to the output voltage waveform, the capacitor balancing is relatively slow. Due to an initial drop in the voltage v C 1, the load voltage waveform is slightly affected for approximately 2 fundamental periods (with a small reduction of peak value).
Approximately 350 ms after the fault, at t = 500 ms, steady state is reached. Since the DCC can only charge capacitor C 2, the LSC is the only converter responsible for maintaining capacitor balance. Thus, an unbalance of approximately 16 V is maintained in steady state. The overall post-fault unbalance could be lowered by raising the relative weight of the DC bus balancing objective in the LSC global FCS-MPC cost function, but this would lead to a degradation of the load voltage waveform. This unbalance (approximately 7% of the total DC bus voltage) is not ideal, but does not compromise the continuous operation of the UPS in fault-tolerant mode, especially since the UPS can only remain in stored energy mode for a relatively short period of time (until the batteries are depleted). It is also important to note that this case is relatively unlikely, since a UPS rarely operates in stored energy mode (only when the grid fails) and only for short periods of time. Despite this, the proposed fault-tolerant solution ensures a continuous and uninterrupted load supply in any fault case.
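The trade-off between capacitor balancing and load voltage quality can be illustrated with a minimal sketch. The two-term cost function, the weight name w_bal, and all numeric values below are assumptions made for illustration only; they are not the cost function or the measurements of the tested prototype.

```python
def lsc_cost(v_ref, v_pred, vc1_pred, vc2_pred, w_bal):
    """Hypothetical two-term cost: squared voltage tracking error
    plus weighted squared capacitor voltage imbalance."""
    tracking = (v_ref - v_pred) ** 2
    balance = (vc1_pred - vc2_pred) ** 2
    return tracking + w_bal * balance


def best_state(candidates, v_ref, w_bal):
    """Return the candidate switching state with the lowest cost."""
    return min(
        candidates,
        key=lambda c: lsc_cost(v_ref, c["v"], c["vc1"], c["vc2"], w_bal),
    )


# Made-up predictions for two candidate switching states.
candidates = [
    {"name": "A", "v": 100.0, "vc1": 120.0, "vc2": 104.0},  # exact tracking, 16 V imbalance
    {"name": "B", "v": 97.0, "vc1": 115.0, "vc2": 109.0},   # 3 V error, 6 V imbalance
]

low = best_state(candidates, v_ref=100.0, w_bal=0.01)["name"]
high = best_state(candidates, v_ref=100.0, w_bal=0.1)["name"]
print(low, high)
```

With the lower weight, the state with the better voltage tracking is selected; with the higher weight, the controller accepts a tracking error in exchange for a smaller imbalance, which is the degradation mechanism described above.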

Multiple Converter Faults
The ability of the proposed non-redundant fault-tolerant scheme to correct faults in multiple converters simultaneously is demonstrated in Figures 26 and 27. In Figure 26, a fault in inner IGBT S R2 of the GSC is created first. The fault is quickly identified, and the hardware reconfiguration is performed in phase R of the GSC. The DC bus voltage reference is immediately doubled. At t = 0.865 s, a fault in inner IGBT S A2 of the LSC is created. The reconfiguration is immediately performed in phase A of the LSC and, since the DC bus voltage is already doubled, post-fault operation is immediately achieved (with a practically instantaneous transition). After the second fault occurs, a higher DC bus capacitor voltage oscillation can be observed. Nonetheless, this is not significant and does not affect the UPS operation.
In Figure 27, the same two faults occur, but in reverse order. Thus, as the LSC fault happens first, the LSC needs to wait for the DC bus voltage doubling to achieve post-fault operation, leading to a short period of slight voltage waveform distortion, as seen in Figure 21.
When the GSC fault occurs, its transition to post-fault operation is practically instantaneous, since it does not need to wait for the DC bus voltage doubling.
As this example demonstrates, the proposed non-redundant fault-tolerant strategy allows the correction of simultaneous faults in different converters, retaining full-rated capabilities. Faults can be corrected simultaneously in all converters of the UPS.

Discussion and Conclusions
As described in Sections 3 and 4 and demonstrated by the experimental results in Section 5, the proposed fault-tolerant technique has a very high fault correction potential, especially since this is a non-redundant system. In this section, the advantages, disadvantages and industrial applicability of the proposed technique are discussed in light of the presented results and the most important conclusions are presented.
Fault-tolerant solutions in the literature mostly focus on IGBT OC faults, with only a few solutions considering clamp diode faults. In contrast, the proposed fault-tolerant technique can correct OC faults in all semiconductors of the UPS (IGBTs, their respective anti-parallel diodes, and the clamp diodes), resulting in a very comprehensive correction solution. Additionally, the proposed technique provides fault tolerance not only in the 3-level NPC converters, but also in the 3-level DC-DC converter, something not done before.
The proposed system requires little additional hardware and adds little complexity to the system topology: only three bidirectional switches are required for each 3-level NPC converter and only two switches for the 3-level DC-DC converter (a total of eight switches). On the other hand, since the fault-tolerant procedure requires (in some cases) a doubling of the DC bus voltage, the minimum voltage rating of the DC bus capacitors and all the IGBTs is doubled. A higher component voltage rating can significantly increase the system cost, especially in high-voltage systems. When the required IGBT voltage ratings are close to the limit of current technology, raising the voltage rating may sharply increase system cost, rendering the solution unfeasible. For this reason, this solution is unlikely to be economically viable in medium to high-voltage applications.
Nonetheless, 3-level converters are also used industrially in low voltage systems (such as in high-power datacenter UPS systems), due to their higher power quality, lower losses and smaller filtering requirements. In this type of system, the required IGBT voltage in a 3-level converter is relatively low. Thus, even when the voltage requirements are doubled, the IGBTs still fall within a very commonly used voltage range, which means the price increase is not overwhelming. In these cases, even though the UPS cost increase is not negligible, it is acceptable given the increase in system resilience. Consider the example of a datacenter UPS: the UPS system typically feeds only low voltage loads and needs to be extremely reliable. If, for example, a DC bus voltage of 800 V is used, each IGBT would need to be rated at 400 V in a regular system or 800 V in the proposed fault-tolerant solution. Nowadays, IGBTs up to 1200 V are extremely common, which makes them relatively affordable. Hence, even though the required investment is not negligible, the significantly higher reliability of the proposed fault-tolerant system justifies it, especially in highly critical facilities, such as high-tier datacenters, in which resilience is the top priority.
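The rating arithmetic of this example can be reproduced with a short sketch. The function name and the 1200 V comparison constant are illustrative assumptions; the calculation covers only the minimum blocking voltage discussed above and does not include the design margin that a practical component selection would add.

```python
def igbt_min_blocking_voltage(v_dc_nominal, fault_tolerant):
    """Minimum blocking voltage per IGBT in a 3-level converter.

    Each IGBT blocks half of the DC bus voltage. In the fault-tolerant
    solution the DC bus voltage may be doubled, so the worst-case bus
    voltage is twice the nominal one.
    """
    v_dc_worst = 2 * v_dc_nominal if fault_tolerant else v_dc_nominal
    return v_dc_worst / 2


COMMON_IGBT_RATING_V = 1200  # rating mentioned in the text as widely available

regular = igbt_min_blocking_voltage(800, fault_tolerant=False)
tolerant = igbt_min_blocking_voltage(800, fault_tolerant=True)
print(regular, tolerant, tolerant <= COMMON_IGBT_RATING_V)
```

For the 800 V bus of the example, the doubled requirement still lies within the 1200 V device class.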
The proposed technique allows all UPS converters to retain full modulation capacity after internal faults. This avoids derating the system and ensures that the critical load can continue to operate in the same conditions, regardless of the fault.
Previous work in the literature has used the hardware reconfiguration adopted for the 3-level NPC converters. However, the reconfiguration is typically performed whenever a fault is detected, regardless of its type. The proposed fault-tolerant solution performs this reconfiguration only when strictly necessary: in the case of a 3LNPC inner IGBT fault or multiple faults in the same phase, or in the case of a 3LDC inner semiconductor fault (IGBT or diode). Thus, a reconfiguration is avoided in most cases. By avoiding an unnecessary reconfiguration, the faulty converter retains more usable switching states, ensuring higher control flexibility and minimizing the fault impact on the overall UPS performance.
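The reconfiguration rule stated above can be written as a small decision function. This is a sketch only: the function name, the converter labels, and the (kind, position) encoding of a fault are hypothetical, and only the conditions named in the text are encoded.

```python
def needs_reconfiguration(converter, faults):
    """Decide whether the hardware reconfiguration is required.

    converter: "3LNPC" or "3LDC" (labels assumed for this sketch).
    faults: list of (kind, position) tuples for one phase or leg,
            e.g. ("igbt", "inner") or ("clamp_diode", "clamp").
    """
    if converter == "3LNPC":
        # Multiple faults in the same phase always trigger reconfiguration.
        if len(faults) > 1:
            return True
        # A single fault triggers it only for an inner IGBT.
        return any(kind == "igbt" and pos == "inner" for kind, pos in faults)
    if converter == "3LDC":
        # Any inner semiconductor (IGBT or diode) triggers it.
        return any(pos == "inner" for _kind, pos in faults)
    raise ValueError(f"unknown converter type: {converter}")


print(needs_reconfiguration("3LNPC", [("igbt", "outer")]))  # single outer IGBT fault
print(needs_reconfiguration("3LNPC", [("igbt", "inner")]))  # single inner IGBT fault
```

Faults that do not satisfy these conditions are handled by the other corrective actions, so the converter keeps its full set of switching states.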
Increasing the DC bus voltage leads to increased losses in the converter, reducing its efficiency. However, this is not a critical problem, for two reasons: (1) the main objective of the UPS is to protect the critical load, which is ensured by the proposed technique; (2) the UPS will only work in fault-tolerant conditions for a relatively short period, until the faults are repaired. Nonetheless, a minimal DC bus voltage increase technique is also proposed specifically for the case of outer IGBT faults in the GSC. This approach allows the correction of this type of fault with a lower DC bus voltage increase, which leads to a lower UPS efficiency reduction in post-fault operation.
Unlike most solutions found in the literature, the proposed solution does not entirely prevent the converter from using the switching states affected by the fault. Instead, the adaptations made to the FCS-MPC controller ensure that the switching states affected by the fault are selectively avoided only in the conditions in which the faulty switch would need to carry a current (and fail to do so). In all other conditions, the switching state is still usable. This significantly reduces the impact of the control changes, compared to the standard solution of entirely eliminating the affected switching states from the available switching options. As demonstrated by the results, this provides more significant fault correction and reduces the negative impact of the fault. This selective exclusion of control states for each specific fault case is only made possible by the use of Model Predictive Control, which enables the inclusion of advanced control-set restrictions.
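The principle of conditional exclusion can be sketched for a single NPC phase leg. The conduction table below is the textbook one for a 3-level NPC leg, under the assumed convention that a positive current flows out of the phase terminal; the device names (S1 to S4, D1 to D4, Dc1, Dc2) and the function name are illustrative and do not follow the naming used for the prototype.

```python
# Devices that carry the phase current for each (state, current sign) pair.
# States: P = upper rail, O = neutral point, N = lower rail.
# Assumed convention: positive current flows out of the phase terminal.
CONDUCTION = {
    ("P", True): {"S1", "S2"},
    ("P", False): {"D1", "D2"},
    ("O", True): {"Dc1", "S2"},
    ("O", False): {"S3", "Dc2"},
    ("N", True): {"D3", "D4"},
    ("N", False): {"S3", "S4"},
}


def usable_states(faulty_devices, phase_current):
    """Return the states whose conduction path avoids every faulty device.

    A state is excluded only when, for the present current direction,
    an open-circuit device would have to carry the current.
    Zero current is treated as positive in this sketch.
    """
    positive = phase_current >= 0
    return [
        state
        for state in ("P", "O", "N")
        if not (CONDUCTION[(state, positive)] & set(faulty_devices))
    ]


print(usable_states({"S1"}, 5.0))   # ['O', 'N']
print(usable_states({"S1"}, -5.0))  # ['P', 'O', 'N']
```

In a predictive controller of this kind, the cost function would then be evaluated only over the returned states, so state P remains available for an outer IGBT fault whenever the current flows through the anti-parallel diodes instead.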
One of the main advantages of the proposed scheme is its highly differentiated approach to fault correction, depending on the existing fault(s). This minimizes the negative impact of fault correction on all UPS system converters and maximizes its overall performance.
The proposed technique can correct faults in all converters of the UPS simultaneously. Depending on the type of fault, different actions can be taken in each converter. This can result, for example, in the simultaneous reconfiguration of all converters. In addition, it is also possible to correct several cases of multiple simultaneous faults in the same converter. In all cases in which a full post-fault modulation area is achievable, similar grid current and load voltage behavior can be obtained. In the remaining cases, the proposed selective exclusion of the affected switching states and the doubling of the DC bus voltage enable only partial mitigation of the fault effect (which can be sufficient for GSC faults, but not for LSC faults). The proposed fault-tolerant technique provides very high UPS reliability by compensating for the effect of faults in all semiconductors of the UPS, thus allowing the system to remain in operation after any semiconductor fault occurs. The possibility of multiple fault correction further increases the resilience of the proposed fault-tolerant system.
It should be noted that even though the fault correction ensures a full modulation area in the faulty converters, it reduces their DC bus balancing capabilities. This is typically not critical as the controller can maintain an acceptable capacitor balance. However, if all active converters are simultaneously faulty and have their balancing capabilities reduced, the DC bus capacitors may become more significantly unbalanced, especially when operating in stored-energy mode.
A new fault-tolerant technique for the 3-level DC-DC converter is also proposed in this paper, based on the same principle used in the 3-level NPC converters. This technique allows the DC-DC converter to operate continuously with a full operating range, thus guaranteeing the viability of the UPS in case of an internal fault. Despite having a higher battery current ripple and severely reduced DC bus balancing capabilities in post-fault operation, the DC-DC converter can continue to operate uninterruptedly after an internal fault, ensuring proper UPS operation until the faults are repaired.
The proposed technique takes differentiated action depending on the faulty switch and whether the batteries need to be charged, minimizing the effect of the fault correction on the converter. The solution can simultaneously correct faults affecting each battery mode (charge/discharge).
Even though the proposed technique requires the charging of the DC bus, and therefore does not provide an instant transition to fault-tolerant operation, the transient impact on the critical load is relatively low. As demonstrated by the presented results, the load voltage generated by the UPS is not significantly affected by faults in the GSC and DCC, even during the reconfiguration and voltage doubling procedure. When a fault occurs in the LSC, the load voltage waveform can be slightly distorted while the DC bus voltage is increased. However, this distortion is relatively low and exists only during a short period of time, so it should not have a significant impact on the critical load operation.
In the tested conditions, the proposed UPS system complies with the power quality standards EN 50160 and IEC 61000-3-2, both in relation to the output voltage and to the current drawn from the grid, even in fault-tolerant operation. In the presented results, the output voltage total harmonic distortion is kept below 3% for all correctable faults, which is well below the maximum limit of 8% defined for a point of delivery. Individual harmonics also present low values, well below the admissible limits defined in EN 50160. The current drawn from the grid also presents low distortion, with each harmonic component below the limits established in IEC 61000-3-2, even in the cases with higher distortion (within the fault cases correctable with the proposed technique). The developed prototype is a scaled-down, lab-oriented system working at low voltage and current levels, so compliance with these standards has relatively little practical significance. Nonetheless, a full-scale system using these techniques should display an analogous response, resulting in a high-quality, highly resilient UPS system.
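As an illustration of how such a total harmonic distortion figure is obtained, the following sketch computes the THD of one sampled fundamental period with a direct discrete Fourier transform and compares it with the 8% limit mentioned above. The waveform is synthetic (a fundamental plus a 3% fifth harmonic), not measured data, and the function names and the choice of harmonics up to order 40 are assumptions of this sketch.

```python
import cmath
import math


def harmonic_magnitudes(samples, max_order):
    """Amplitudes of harmonics 1..max_order of one sampled period."""
    n = len(samples)
    mags = []
    for h in range(1, max_order + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * h * k / n)
            for k, x in enumerate(samples)
        )
        mags.append(2 * abs(acc) / n)
    return mags


def thd_percent(samples, max_order=40):
    """THD in percent, relative to the fundamental amplitude."""
    mags = harmonic_magnitudes(samples, max_order)
    distortion = math.sqrt(sum(m * m for m in mags[1:]))
    return 100 * distortion / mags[0]


# Synthetic test waveform: unit fundamental plus a 3% fifth harmonic.
N = 256
wave = [
    math.sin(2 * math.pi * k / N) + 0.03 * math.sin(2 * math.pi * 5 * k / N)
    for k in range(N)
]

thd = thd_percent(wave)
THD_LIMIT_PERCENT = 8.0  # limit quoted in the text
print(round(thd, 3), thd < THD_LIMIT_PERCENT)
```

The samples must span an integer number of fundamental periods (exactly one here); otherwise spectral leakage would distort the harmonic amplitudes.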
It is also important to note that the proposed corrective measures need to be triggered by fault diagnosis. Thus, the correction can only be as fast as the used diagnostic technique.
If the diagnostic algorithm used does not provide fast detection and identification of the faulty switch, there may be a short period, after the fault occurs and until the diagnosis is completed, in which the fault affects the UPS operation and no corrective action has yet been taken. For this reason, it is very important to have a fast and accurate fault diagnosis solution, to minimize this delay as much as possible.
Fault diagnosis is typically harder to achieve when a given fault has a relatively low impact on the converter (e.g., clamp diode faults). This can potentially lead to a slower identification and a consequent delay of the fault-tolerant action. However, since the impact of the fault on the UPS is low in these cases, this short delay does not usually carry significant consequences. In a UPS system, the main concern is the protection of the critical load, so it is particularly important to quickly correct faults in the LSC. Since all faults in the LSC have a significant impact on the UPS, fault identification is easier to achieve and a fast diagnosis is obtained.
As demonstrated by the results, the UPS efficiency can be considerably lower in fault-tolerant operation. It is important to note that a very steep efficiency reduction was seen in the results due to the low power levels used in the tests (for prototype safety). The magnitude of this reduction is not representative of what would be found in a high-power system, but it does demonstrate that an efficiency reduction is to be expected. The UPS can also present higher DC bus capacitor unbalance and higher harmonic distortion in post-fault scenarios. Given that the main priority of the UPS is to protect a critical load, the observed efficiency reduction is entirely acceptable, since its cost is far lower than that of an unscheduled interruption or disruption of a critical process. Moreover, fault-tolerant operation should be a temporary state, maintained only until the fault is repaired. In highly critical applications, such as high-tier datacenters or other critical infrastructures, maintenance teams should be able to quickly perform maintenance operations and replace the faulty component(s), reverting the system to normal operation in a relatively short time frame. The proposed techniques effectively keep the UPS system in operation with acceptable performance and full-rated characteristics, protecting the critical process and avoiding unscheduled interruptions and downtime.