FPGA-Based Implementation of MMC Control Based on Sorting Networks

: In Modular Multilevel Converter (MMC) applications, the balancing of the capacitor voltages is one of the most important issues for achieving the proper behavior of the MMC. The Capacitor Voltage Balancing (CVB) control is usually based on classical sorting algorithms which consist of repetitive/recursive loops. This leads to an increase of the execution time when many Sub-Modules (SMs) are employed. When the execution time of the balancing is longer than the sampling period, the proper operation of the MMC cannot be ensured. Moreover, due to their inherent sequential operation, sorting algorithms are suitable for software implementation (microcontrollers or DSPs), but they are not appropriate for a hardware implementation. Instead, in this paper, Sorting Networks (SNs) are proposed due to their convenience for implementation in FPGA devices. The advantages and the main challenges of the Bitonic SN in MMC applications are discussed and different FPGA implementations are presented. Simulation results are provided in normal and faulty conditions. Moreover, a comparison with the widely used bubble sorting algorithm and max/min approach is made in terms of execution time and performance. Finally, hardware-in-the-loop results are shown to prove the effectiveness of the implemented SN.


Introduction
Nowadays, the Modular Multilevel Converter (MMC) has become a promising solution in different applications, such as in High Voltage Direct Current (HVDC) [1,2], high-power motor drivers [3,4] and STATic COMpensators (STATCOM) [5,6]. Thanks to several advantages, such as high modularity, scalability, low Total Harmonic Distortion (THD), high efficiency and high reliability, the interest in this topology has increased in both industry and academy [7]. However, this topology presents several challenges, such as the necessity to control the circulating current, ensure the balance of the losses among the Sub-Modules (SMs) and maintain the capacitor voltage balanced [8].
In the literature, two Capacitor Voltage Balancing (CVB) control algorithms are mainly proposed: the individual control approach [9] and the global arm control approach. The latter is commonly used in the Nearest Level Control (NLC) which requires a Sorting Algorithm (SA) [10]. Indeed, to balance the capacitor voltages (CVs), the SMs with the highest or lowest CV must be selected based on the arm current direction. Then, the SA provides a sorted list of the SMs according to their capacitor voltages.
In MMC applications, the implementation of SAs is a key challenge mainly due to the timing performances and the high computation efforts of this kind of algorithms. Low timing performances can slow down the CVB algorithm by limiting either the maximum sampling frequency or the maximum allowable number of SMs in the converter. The SAs are usually implemented in

Overview of MMC: Topology and Control
The proposed implementation of the sorting algorithm can be used in any kind of MMC. The SMs can be either half-bridge or full-bridge, without any changes in the SN. In the following, a three-phase HVDC grid connected MMC application with half-bridge SM is considered (Figure 1). Its topology and control structure are described in the next sections.
Energies 2018, 11, x FOR PEER REVIEW 3 of 18 phase HVDC grid connected MMC application with half-bridge SM is considered ( Figure 1). Its topology and control structure are described in the next sections.

MMC Topology
Each phase is composed of a leg that in turn consists of an upper and a lower arm. Each arm is composed of N series connected SMs, an arm inductor Larm and the parasitic resistances in the arm, here denoted with Rarm [21]. The half-bridge structure for the SM is considered. It consists of two switches with two antiparallel diodes and a capacitor C, as depicted in Figure 1. The capacitor can be inserted or bypassed in the arm circuit based on the status of the two switches [8].

MMC Control Levels Hierarchy
Among the different modulation techniques, the NLC is often adopted for MMC [10]. The block diagram of a three-phase MMC controller based on such a modulation technique is displayed in Figure 2. The outer current control provides the reference voltage Vref for each phase from the measured grid currents. To achieve the results presented in this paper, a classical outer current control has been implemented in the digital platform [22]. On the other hand, the circulating current control is adopted to limit the inherent circulating current ripple that is generated in the MMC. Then, the reference voltage is adjusted before the NLC. A common circulating current control based on resonant controllers has been used in this paper [22]. After that, the NLC calculates the insertion indices for the upper and lower arm for each leg, as expressed in: where p and m represent the phase ( = , , ) and the arm ( = , ), respectively. It is worth mentioning that the NLC only results the number of SMs to be inserted and it is indifferent to which SMs are selected. This task is taken in charge by the CVB control algorithm to ensure the balance between the capacitor voltages in the arm. The main elements of the CVB control algorithm are the sorting method and the selection technique. The aim of the sorting method is to put in ascendant order the SMs of an arm according to the measured capacitor voltages , . Figure 1. Schematic representation of a three-phase grid connected MMC.

MMC Topology
Each phase is composed of a leg that in turn consists of an upper and a lower arm. Each arm is composed of N series connected SMs, an arm inductor L arm and the parasitic resistances in the arm, here denoted with R arm [21]. The half-bridge structure for the SM is considered. It consists of two switches with two antiparallel diodes and a capacitor C, as depicted in Figure 1. The capacitor can be inserted or bypassed in the arm circuit based on the status of the two switches [8].

MMC Control Levels Hierarchy
Among the different modulation techniques, the NLC is often adopted for MMC [10]. The block diagram of a three-phase MMC controller based on such a modulation technique is displayed in Figure 2. The outer current control provides the reference voltage V ref for each phase from the measured grid currents. To achieve the results presented in this paper, a classical outer current control has been implemented in the digital platform [22]. On the other hand, the circulating current control is adopted to limit the inherent circulating current ripple that is generated in the MMC. Then, the reference voltage is adjusted before the NLC. A common circulating current control based on resonant controllers has been used in this paper [22]. After that, the NLC calculates the insertion indices for the upper and lower arm for each leg, as expressed in: where p and m represent the phase (p = a, b, c) and the arm (m = u, l), respectively. It is worth mentioning that the NLC only results the number of SMs to be inserted and it is indifferent to which SMs are selected. This task is taken in charge by the CVB control algorithm to ensure the balance between the capacitor voltages in the arm. The main elements of the CVB control algorithm are the sorting method and the selection technique. The aim of the sorting method is to put in ascendant order the SMs of an arm according to the measured capacitor voltages V C i ,pm . The selection technique intends to select the SMs to be inserted based on the sorted list evaluated by the sorting method and the direction of the arm current ipm. The best balancing performance is achieved when the sorting is executed in each sampling period and always the best SMs are selected. However, in this case, a high switching frequency fsw is resulted. Different improved methods have been developed to decrease fsw [23][24][25]. In this work, a reduction of the fsw is achieved by inserting or bypassing only the difference between the required SMs and the actual ones, as shown in Figure 3. Moreover, the sorting is also performed if the capacitor voltages reach an upper or lower threshold value [23]. In this case, the SM with the highest (lowest) capacitor voltage is bypassed if the current is positive (negative) and another one is inserted in its place.
Another possible solution to reduce the fsw and decrease the execution time is to adopt max/min approaches. They are based on the fact that only one SM has to be usually inserted/bypassed in one sampling period. However, this assumption is valid only during steady state operation. During faults, more SMs have to be manipulated in order to ensure fast reaction from the converter side. The max/min methods require more sampling periods to insert/bypass the required number of SMs. This leads a slower control dynamic as shown in Section 4.

Problem Statement
The aim of this paper is to deal with the efficient FPGA-implementation of the sorting method to reduce the execution time of the whole controller. To have all voltage steps equal to one level, the sampling period Ts has to satisfy [22]: The selection technique intends to select the SMs to be inserted based on the sorted list evaluated by the sorting method and the direction of the arm current i pm . The best balancing performance is achieved when the sorting is executed in each sampling period and always the best SMs are selected. However, in this case, a high switching frequency f sw is resulted. Different improved methods have been developed to decrease f sw [23][24][25]. In this work, a reduction of the f sw is achieved by inserting or bypassing only the difference between the required SMs and the actual ones, as shown in Figure 3. Moreover, the sorting is also performed if the capacitor voltages reach an upper or lower threshold value [23]. In this case, the SM with the highest (lowest) capacitor voltage is bypassed if the current is positive (negative) and another one is inserted in its place.
Another possible solution to reduce the f sw and decrease the execution time is to adopt max/min approaches. They are based on the fact that only one SM has to be usually inserted/bypassed in one sampling period. However, this assumption is valid only during steady state operation. During faults, more SMs have to be manipulated in order to ensure fast reaction from the converter side. The max/min methods require more sampling periods to insert/bypass the required number of SMs. This leads a slower control dynamic as shown in Section 4.

Problem Statement
The aim of this paper is to deal with the efficient FPGA-implementation of the sorting method to reduce the execution time of the whole controller. To have all voltage steps equal to one level, the sampling period T s has to satisfy [22]:  In Equation (2), N is the number of SMs in the arm. Consequently, the controller execution time Tcontrol has to fulfill: The common solution based on SAs has limited timing performances. They limit either the maximum allowable sampling frequency or the number of SMs in the arm. A significant reduction of Tcontrol can be achieved by adopting the proposed SNs. They are convenient for FPGA implementation due to their parallel structure. Moreover, they avoid the use of iterative and branch instructions.
Different factorization levels are introduced to give the possibility to choose the best tradeoff between required hardware resources and execution time. For these reasons, they are attractive for MMC applications and their detailed description is given in the next section.

Description of the Sorting Networks
The sorting networks are widely adopted in data processing [19] due to their timing performance. They consist of a fixed parallel structure composed of m-horizontal wires and several Compare-and-Swap (CS) operators. The latter carries out the sorting of two input elements: it compares them and ensures that the larger input value comes out from the upper output wire and the smaller input from the lower wire. Among the different SNs, the Bitonic structure is chosen in this work due to its modular aspect and to the reduced amount of CS operators [19,20].
Such a sorting network is composed of different stages which in turn are composed by several CS operators. In Figure 4, an eight-input Bitonic Sorting Network is depicted. In this case, six stages in the structure can be highlighted. The number of stages obviously depends on the number of the input. The unsorted sequence, denoted with x, is applied on the left, and the sorted list y results on the right. Only one element per wire can be applied. In Equation (2), N is the number of SMs in the arm. Consequently, the controller execution time T control has to fulfill: The common solution based on SAs has limited timing performances. They limit either the maximum allowable sampling frequency or the number of SMs in the arm. A significant reduction of T control can be achieved by adopting the proposed SNs. They are convenient for FPGA implementation due to their parallel structure. Moreover, they avoid the use of iterative and branch instructions.
Different factorization levels are introduced to give the possibility to choose the best tradeoff between required hardware resources and execution time. For these reasons, they are attractive for MMC applications and their detailed description is given in the next section.

Description of the Sorting Networks
The sorting networks are widely adopted in data processing [19] due to their timing performance. They consist of a fixed parallel structure composed of m-horizontal wires and several Compare-and-Swap (CS) operators. The latter carries out the sorting of two input elements: it compares them and ensures that the larger input value comes out from the upper output wire and the smaller input from the lower wire. Among the different SNs, the Bitonic structure is chosen in this work due to its modular aspect and to the reduced amount of CS operators [19,20].
Such a sorting network is composed of different stages which in turn are composed by several CS operators. In Figure 4, an eight-input Bitonic Sorting Network is depicted. In this case, six stages in the structure can be highlighted. The number of stages obviously depends on the number of the input. The unsorted sequence, denoted with x, is applied on the left, and the sorted list y results on the right. Only one element per wire can be applied. Firstly, each SM is enumerated starting from the top of each arm, as shown in Figure 1, and its position is named here , . Each element xi must consist of the acquired capacitor voltage , and the corresponding position , , as depicted in Figure 4, where the subscript pm has been omitted for simplicity. A comparison of the capacitor voltages is performed and the swap operation is executed for both , and , if the voltages are not in the right order. The output y results in a sorted list with the voltages in ascending order along with the corresponding physical SM position. Then, if the SM with the highest voltage is needed for insertion, the first element is considered; otherwise, the last one is selected.
Another aspect that should be considered is the length of the list. Indeed, in MMC applications, the modularity is one of the main advantages [7]. It consists in the possibility to raise the power rating by adding more SMs in the arm. This means that the number of SMs, and then the length of the list, is not known a priori. However, a maximum number of SMs in an arm (named here with M) can always be presumed and then the SN is built for this worst-case scenario. In these conditions, some elements of the list can be left empty since the actual length of the list N can be different from M. To guarantee the proper behavior of the SN, these dummy elements have to be filled by setting the voltage equal to 0 and the position to −1 [20]. Therefore, these elements are kept in the last positions and the elements with positions different from −1 are selected when the SMs with lowest voltages are required.
An example of an eight-input Bitonic SN for a MMC with three SMs per arm ( = 8 and = 3) is depicted in Figure 4. The dummy element is in Position 4 and it is kept in that position along the net. The result can be achieved before the last stage by considering only a sub-structure of the network as shown in the example. After having presented the SNs and having discussed their main peculiar aspects in MMC applications, some simulation results are shown in the next section.

Simulation Results
In this section, the Bitonic SN is compared with the max/min approach to show the differences between an algorithm that completely sorts the list and one that searches only the SM with the highest/lowest CV. The simulations have been performed in PLECS ® power electronic simulation environment in both normal and faulty conditions. The MMC and grid parameters are given in Tables  1 and 2, respectively.
The Bitonic SN and the max/min approach have been implemented in PLECS. It is worth to note that the SN is intrinsically parallel; to simulate it its treatment has been serialized. Thus, its behavior is equal to the one achieved with a classical sorting algorithm such as the BSA. A tolerance band has Firstly, each SM is enumerated starting from the top of each arm, as shown in Figure 1, and its position is named here P i,pm . Each element x i must consist of the acquired capacitor voltage V C i ,pm and the corresponding position P i,pm , as depicted in Figure 4, where the subscript pm has been omitted for simplicity. A comparison of the capacitor voltages is performed and the swap operation is executed for both V C i ,pm and P i,pm if the voltages are not in the right order. The output y results in a sorted list with the voltages in ascending order along with the corresponding physical SM position. Then, if the SM with the highest voltage is needed for insertion, the first element y 1 is considered; otherwise, the last one is selected.
Another aspect that should be considered is the length of the list. Indeed, in MMC applications, the modularity is one of the main advantages [7]. It consists in the possibility to raise the power rating by adding more SMs in the arm. This means that the number of SMs, and then the length of the list, is not known a priori. However, a maximum number of SMs in an arm (named here with M) can always be presumed and then the SN is built for this worst-case scenario. In these conditions, some elements of the list can be left empty since the actual length of the list N can be different from M. To guarantee the proper behavior of the SN, these dummy elements have to be filled by setting the voltage equal to 0 and the position to −1 [20]. Therefore, these elements are kept in the last positions and the elements with positions different from −1 are selected when the SMs with lowest voltages are required.
An example of an eight-input Bitonic SN for a MMC with three SMs per arm (M = 8 and N = 3) is depicted in Figure 4. The dummy element is in Position 4 and it is kept in that position along the net. The result can be achieved before the last stage by considering only a sub-structure of the network as shown in the example. After having presented the SNs and having discussed their main peculiar aspects in MMC applications, some simulation results are shown in the next section.

Simulation Results
In this section, the Bitonic SN is compared with the max/min approach to show the differences between an algorithm that completely sorts the list and one that searches only the SM with the highest/lowest CV. The simulations have been performed in PLECS ® power electronic simulation environment in both normal and faulty conditions. The MMC and grid parameters are given in Tables 1 and 2, respectively.
The Bitonic SN and the max/min approach have been implemented in PLECS. It is worth to note that the SN is intrinsically parallel; to simulate it its treatment has been serialized. Thus, its behavior is equal to the one achieved with a classical sorting algorithm such as the BSA. A tolerance band has also been introduced as shown in [23]. The max/min method proposed in [13] has been adopted in this work. The tolerance bands are set to 2 kV.

Quantity Value
Grid frequency ( f grid ) 50 Hz

Normal Condition
The simulation results shown in this section were achieved during steady state condition, i.e., without any faults in the system. At 0.2 s, the converter starts to deliver power to the grid. In Figure 5, the output currents, circulating currents and active power are shown when the Bitonic SN is adopted. The capacitor voltages for both the Bitonic SN and the max/min approach are depicted in Figure 6. also been introduced as shown in [23]. The max/min method proposed in [13] has been adopted in this work. The tolerance bands are set to 2 kV.

Normal Condition
The simulation results shown in this section were achieved during steady state condition, i.e., without any faults in the system. At 0.2 s, the converter starts to deliver power to the grid. In Figure 5, the output currents, circulating currents and active power are shown when the Bitonic SN is adopted. The capacitor voltages for both the Bitonic SN and the max/min approach are depicted in Figure 6. When the best SMs are always selected, the algorithm that provides the complete sorted list achieves a better balance in comparison with the max/min approach. However, it leads to a high switching frequency, equal to 409 Hz in the studied case (Figure 6a). The max/min approach instead only selects one SM to be inserted or bypassed when a change in , is resulted from the NLC (Figure 6b). Then, the switching frequency is intrinsically optimized and almost equal to 60 Hz. By When the best SMs are always selected, the algorithm that provides the complete sorted list achieves a better balance in comparison with the max/min approach. However, it leads to a high switching frequency, equal to 409 Hz in the studied case (Figure 6a). The max/min approach instead only selects one SM to be inserted or bypassed when a change in N p,m is resulted from the NLC (Figure 6b). Then, the switching frequency is intrinsically optimized and almost equal to 60 Hz.
By adopting the improved f sw for the Bitonic SN, as described in Figure 3, the f sw can be reduced. In this case, it is comparable to the one achieved with the max/min method, but the achieved balance is worse than the one achieved without the f sw optimization, as shown in Figure 6c. adopting the improved fsw for the Bitonic SN, as described in Figure 3, the fsw can be reduced. In this case, it is comparable to the one achieved with the max/min method, but the achieved balance is worse than the one achieved without the fsw optimization, as shown in Figure 6c. Finally, it can be concluded that in normal condition and when the fsw optimization is active, the max/min method and the sorting algorithm give almost the same balancing results and switching frequency. From now on, the fsw optimization is considered active, if not differently mentioned. Finally, it can be concluded that in normal condition and when the f sw optimization is active, the max/min method and the sorting algorithm give almost the same balancing results and switching frequency. From now on, the f sw optimization is considered active, if not differently mentioned.

Phase-to-Ground Fault
The phase-to-ground fault is simulated by adopting the previous MMC and grid parameters. When this kind of fault appears in the system, the control algorithm demands more SMs to be inserted or bypassed in the same sampling period. The fault is applied at 0.5 s on the phase a. Figure 7 shows the output currents, the capacitor voltages of phase a and the number of switches for the same phase. When the number of switches is positive, it means that the SMs have been inserted, while, when it is negative, the SMs have been bypassed. It is worth noting that, in the case of the SN, three SMs have been bypassed at the moment of the fault. This number can relatively increase if the number of SMs in the arm is higher. The possibility to apply the required changes in one sampling period T s enhances the dynamic performance of the controller. Indeed, at the fault instant, the algorithms that provide the fully sorted list are able to track the current reference and avoid spikes on the output current as shown in Figure 8.

Phase-to-Ground Fault
The phase-to-ground fault is simulated by adopting the previous MMC and grid parameters. When this kind of fault appears in the system, the control algorithm demands more SMs to be inserted or bypassed in the same sampling period. The fault is applied at 0.5 s on the phase a. Figure 7 shows the output currents, the capacitor voltages of phase a and the number of switches for the same phase. When the number of switches is positive, it means that the SMs have been inserted, while, when it is negative, the SMs have been bypassed. It is worth noting that, in the case of the SN, three SMs have been bypassed at the moment of the fault. This number can relatively increase if the number of SMs in the arm is higher. The possibility to apply the required changes in one sampling period Ts enhances the dynamic performance of the controller. Indeed, at the fault instant, the algorithms that provide the fully sorted list are able to track the current reference and avoid spikes on the output current as shown in Figure 8.  On the other hand, the max/min method keeps only one switch per each Ts, requiring more sampling periods to insert/bypass the needed SMs. This leads to a lower controller dynamic that provokes a spike on the output current, as depicted in Figure 8. It is also possible to run the max/min method several times in the same sampling period to get the required SMs, but this leads to a longer execution time that can easily exceed the chosen Ts, not guaranteeing the proper controller behavior. On the other hand, the max/min method keeps only one switch per each T s , requiring more sampling periods to insert/bypass the needed SMs. This leads to a lower controller dynamic that provokes a spike on the output current, as depicted in Figure 8. It is also possible to run the max/min method several times in the same sampling period to get the required SMs, but this leads to a longer execution time that can easily exceed the chosen T s , not guaranteeing the proper controller behavior. The advantages of the proposed SN over the min/max approach have been demonstrated and the FPGA implementation of the Bitonic SN is now dealt with.

FPGA Implementation of the Bitonic SN
Different FPGA-based implementations of SNs are shown and compared in this section. Firstly, the fully pipelined SN is presented. After that, a hybrid structure and a fully factorized SN are proposed for reducing the required resources. Finally, they are compared in terms of required resources and latency, i.e., the number of system clock cycles for achieving the final result.

Fully Pipelined SN
This kind of implementation of the SN allows a drastic reduction of the execution time. This architecture is driven by an external clock signal that synchronizes the data through the SN. After each stage, a bank of flip-flops is allocated for storing the results of CS operators, as shown in Figure  9. A new list of CVs can be fed to the input of the SN every clock cycle. The basic structure of the CS operator is also presented in Figure 9.  The advantages of the proposed SN over the min/max approach have been demonstrated and the FPGA implementation of the Bitonic SN is now dealt with.

FPGA Implementation of the Bitonic SN
Different FPGA-based implementations of SNs are shown and compared in this section. Firstly, the fully pipelined SN is presented. After that, a hybrid structure and a fully factorized SN are proposed for reducing the required resources. Finally, they are compared in terms of required resources and latency, i.e., the number of system clock cycles for achieving the final result.

Fully Pipelined SN
This kind of implementation of the SN allows a drastic reduction of the execution time. This architecture is driven by an external clock signal that synchronizes the data through the SN. After each stage, a bank of flip-flops is allocated for storing the results of CS operators, as shown in Figure 9. A new list of CVs can be fed to the input of the SN every clock cycle. The basic structure of the CS operator is also presented in Figure 9. It consists of a comparator and four multiplexers. The inputs are the two SM capacitor voltages V C a and V C b and the two SM positions P a and P b . The outputs are the sorted capacitor voltages V s,C a and V s,C b and their corresponding positions P s,a and P s,b . The values of b v and b p correspond to the size of the fixed-point format of the voltage and of the position, respectively.

M-Factorized SN
To reduce the required number of CS operators, the previous SN structure can be factorized. The objective is to reuse a single CS operator to perform more CS operations. However, this optimization leads to an increase of the latency in the architecture. Then, a compromise between the factorization level, i.e., which sub-structure of the Bitonic SN is factorized, and the latency is necessary. Figure 10 shows an example using = 8 and the factorization level L equal to 4. This means that the fourinput Bitonic SN sub-structure, highlighted in Figure 4, is factorized. This solution allows a reduction of the required CS operators and, therefore, the used resources. Indeed, the factorized four-input Bitonic SN only requires two CS operators instead of six. However,

M-Factorized SN
To reduce the required number of CS operators, the previous SN structure can be factorized. The objective is to reuse a single CS operator to perform more CS operations. However, this optimization leads to an increase of the latency in the architecture. Then, a compromise between the factorization level, i.e., which sub-structure of the Bitonic SN is factorized, and the latency is necessary. Figure 10 shows an example using M = 8 and the factorization level L equal to 4. This means that the four-input Bitonic SN sub-structure, highlighted in Figure 4, is factorized.

M-Factorized SN
To reduce the required number of CS operators, the previous SN structure can be factorized. The objective is to reuse a single CS operator to perform more CS operations. However, this optimization leads to an increase of the latency in the architecture. Then, a compromise between the factorization level, i.e., which sub-structure of the Bitonic SN is factorized, and the latency is necessary. Figure 10 shows an example using = 8 and the factorization level L equal to 4. This means that the fourinput Bitonic SN sub-structure, highlighted in Figure 4, is factorized. This solution allows a reduction of the required CS operators and, therefore, the used resources. Indeed, the factorized four-input Bitonic SN only requires two CS operators instead of six. However, This solution allows a reduction of the required CS operators and, therefore, the used resources. Indeed, the factorized four-input Bitonic SN only requires two CS operators instead of six. However, the latency will be 6 instead of 3 and no other list can be inserted at the input during this calculation, reducing the throughput of the architecture. It is worth noting that this solution also requires eight Multiplexers.

Fully Factorized SN
The last proposed architecture is the fully factorized SN, i.e., L = M. This alternative can be adopted to drastically reduce the required resources at the cost of a larger latency.
This architecture is depicted in Figure 11. The state machine generates and sends the synchronization signals to the data path which processes the input data. The Map operator implements the multiplexers and the registers needed for the factorization. The sorting done signal is raised when the sorted list is available at the output.
Energies 2018, 11, x FOR PEER REVIEW 12 of 18 the latency will be 6 instead of 3 and no other list can be inserted at the input during this calculation, reducing the throughput of the architecture. It is worth noting that this solution also requires eight Multiplexers.

Fully Factorized SN
The last proposed architecture is the fully factorized SN, i.e., = . This alternative can be adopted to drastically reduce the required resources at the cost of a larger latency.
This architecture is depicted in Figure 11. The state machine generates and sends the synchronization signals to the data path which processes the input data. The Map operator implements the multiplexers and the registers needed for the factorization. The sorting done signal is raised when the sorted list is available at the output.

Comparison
In this section, the previous SN structures are compared in terms of five-input Look-Up-Tables (LUTs), Flip-Flops (FFs) and latency. The LUTs, FFs and latency for these structures can be easily preevaluated, as shown in [20]. Figure 12a depicts the required LUTs with different factorization level. It is shown that, by increasing the number of SMs, N, the required LUTs increase. The fully pipelined is the structure that requires the highest amount of LUTs, as expected. By increasing the factorization level, the needed LUTs can be reduced. The right y axis shows the percentage of the required LUT when the selected low-cost System-on-Chip (SoC) device is considered. Then, it is obvious that a fully pipelined structure for high numbers of N is impracticable with this kind of device. The same happen for the FFs, as depicted in Figure 12b. On the other side, a higher factorization level leads a higher latency number, as shown in Figure 12c. Thus, a compromise is required between the LUTs, FFs and latency. The designer can then pre-evaluate the required resources and the latency to choose the best solution for its requirements.

Comparison
In this section, the previous SN structures are compared in terms of five-input Look-Up-Tables (LUTs), Flip-Flops (FFs) and latency. The LUTs, FFs and latency for these structures can be easily pre-evaluated, as shown in [20]. Figure 12a depicts the required LUTs with different factorization level. It is shown that, by increasing the number of SMs, N, the required LUTs increase. The fully pipelined is the structure that requires the highest amount of LUTs, as expected. By increasing the factorization level, the needed LUTs can be reduced. The right y axis shows the percentage of the required LUT when the selected low-cost System-on-Chip (SoC) device is considered. Then, it is obvious that a fully pipelined structure for high numbers of N is impracticable with this kind of device. The same happen for the FFs, as depicted in Figure 12b. On the other side, a higher factorization level leads a higher latency number, as shown in Figure 12c. Thus, a compromise is required between the LUTs, FFs and latency. The designer can then pre-evaluate the required resources and the latency to choose the best solution for its requirements.

Timing Diagram
In this section, the timing diagram for the proposed Bitonic SN is presented in Figure 13. The following control actions can be executed in parallel: the current controls (for both the output current is and the circulating current ic), the upper arm sorting and the lower arm sorting. This leads a reduction of the whole control time Tcontrol. The latter is the sum of two contributions: (a) the longer time between the current control time Tcc and the time needed to the Bitonic SN Tbn; and (b) the time

Timing Diagram
In this section, the timing diagram for the proposed Bitonic SN is presented in Figure 13. The following control actions can be executed in parallel: the current controls (for both the output current i s and the circulating current i c ), the upper arm sorting and the lower arm sorting. This leads a reduction of the whole control time T control . The latter is the sum of two contributions: (a) the longer time between the current control time T cc and the time needed to the Bitonic SN T bn ; and (b) the time T selection needed for selecting the SM to be inserted. From Equation (3) and Figure 13, it is possible to derive the maximum sorting time T max,sort that has to be guaranteed: Energies 2018, 11, x FOR PEER REVIEW 14 of 18 Tselection needed for selecting the SM to be inserted. From Equation (3) and Figure 13, it is possible to derive the maximum sorting time Tmax,sort that has to be guaranteed: , ≤ (4) Figure 13. Timing Diagram of the MMC control based on BSN.

Hardware-In-the-Loop Results
A MMC system is usually composed by tens or hundreds of modules per arm. The realization of this kind of systems is very expensive, and then an usual approach is to test the control behavior using HIL approach. In this section, HIL results are provided to validate the proposed sorting method and compare it with the bubble SA and with the max/min approach in terms of the achieved sorting execution time. The Zedboard platform, equipped with a Xilinx SoC Zynq-7020 device (named here as Zynq), has been employed. This device consists of almost 85,000 logic element cells, 4.9 Mb block RAM, 220 DSP and two embedded ARM cortex A9 processors with a clock frequency equal to 667 MHz.
The MMC control, composed by the output current control and the circulating current control, has been implemented in the first ARM core of the Processor System (PS). Along with the MMC control, the bubble sorting algorithm and the max/min approach have been also carried out in the same core. On the other hand, the proposed Bitonic SN has been implemented in the Programmable Logic (PL) side by allowing the implementation of the hardware structures presented in Section 3 and the adoption of the timing diagram shown in Figure 13. The fully factorized Bitonic SN structure has been chosen to save hardware resources. In the remaining ARM core, a single-phase MMC model has been emulated based on [26]. The case of N = 32 has been considered, which is realistic in view of the industrial implementation of MMC-HVDC by ABB [27]. A single-phase system has been chosen considering that the performances of the Bitonic SN are identical for the single-phase and the threephase system. Indeed, in the three-phase case, it can be easily implemented in parallel for achieving the same execution time shown in the following. The adopted capacitor value is 2.4 mF and the output energy is equal to 33 MW. The remaining MMC parameters are shown in Table 1. The whole implementation structure is depicted in Figure 14. The internal signals of the board are read and displayed on the PC through a serial communication. A sag voltage of 50% is emulated after 105 ms. The corresponding capacitor voltages, output currents, and output voltages for the max/min approach and the Bitonic SN are shown in Figure 15.

Hardware-In-the-Loop Results
A MMC system is usually composed by tens or hundreds of modules per arm. The realization of this kind of systems is very expensive, and then an usual approach is to test the control behavior using HIL approach. In this section, HIL results are provided to validate the proposed sorting method and compare it with the bubble SA and with the max/min approach in terms of the achieved sorting execution time. The Zedboard platform, equipped with a Xilinx SoC Zynq-7020 device (named here as Zynq), has been employed. This device consists of almost 85,000 logic element cells, 4.9 Mb block RAM, 220 DSP and two embedded ARM cortex A9 processors with a clock frequency equal to 667 MHz.
The MMC control, composed by the output current control and the circulating current control, has been implemented in the first ARM core of the Processor System (PS). Along with the MMC control, the bubble sorting algorithm and the max/min approach have been also carried out in the same core. On the other hand, the proposed Bitonic SN has been implemented in the Programmable Logic (PL) side by allowing the implementation of the hardware structures presented in Section 3 and the adoption of the timing diagram shown in Figure 13. The fully factorized Bitonic SN structure has been chosen to save hardware resources. In the remaining ARM core, a single-phase MMC model has been emulated based on [26]. The case of N = 32 has been considered, which is realistic in view of the industrial implementation of MMC-HVDC by ABB [27]. A single-phase system has been chosen considering that the performances of the Bitonic SN are identical for the single-phase and the three-phase system. Indeed, in the three-phase case, it can be easily implemented in parallel for achieving the same execution time shown in the following. The adopted capacitor value is 2.4 mF and the output energy is equal to 33 MW. The remaining MMC parameters are shown in Table 1. The whole implementation structure is depicted in Figure 14. The internal signals of the board are read and displayed on the PC through a serial communication. A sag voltage of 50% is emulated after 105 ms. The corresponding capacitor voltages, output currents, and output voltages for the max/min approach and the Bitonic SN are shown in Figure 15.   Figure 14. Hardware architecture of the developed system. The processor system and the programmable logic communicate through the AXI bus.
It is worth noting that the limited dynamic performance of the max/min approach cause an undesired overshoot in is of about 28%. The achieved execution times for the three techniques in both normal and fault conditions are summarized in Table 3. The sorting network has been implemented in FPGA and then some operations can be executed in parallel: the current control and the sorting algorithm in this case. Then, the control time is evaluated by summing up the longer execution time between the current control and the sorting algorithm with the selection time.
On the other hand, the bubble SA and the max/min approach have been implemented in the ARM core and then the current control, the sorting algorithm and the selection method are executed in a sequential manner. The whole execution time is then evaluated by summing up all the terms. The bubble SA achieves the worst timing performance by requiring maximum 4.56 µs in normal conditions and 6.46 µs during the emulated fault. It is worth noting that its execution time is not fixed. Thus, it is not deterministic and can be much higher than the ones obtained in this example. It is worth noting that the limited dynamic performance of the max/min approach cause an undesired overshoot in i s of about 28%. The achieved execution times for the three techniques in both normal and fault conditions are summarized in Table 3. The sorting network has been implemented in FPGA and then some operations can be executed in parallel: the current control and the sorting algorithm in this case. Then, the control time is evaluated by summing up the longer execution time between the current control and the sorting algorithm with the selection time.
On the other hand, the bubble SA and the max/min approach have been implemented in the ARM core and then the current control, the sorting algorithm and the selection method are executed in a sequential manner. The whole execution time is then evaluated by summing up all the terms. The bubble SA achieves the worst timing performance by requiring maximum 4.56 µs in normal conditions and 6.46 µs during the emulated fault. It is worth noting that its execution time is not fixed. Thus, it is not deterministic and can be much higher than the ones obtained in this example.  The max/min method needs 2.56 µs and 2.75 µs to be performed in normal and fault conditions, respectively. However, it does not provide a fully sorted list by causing current overshoot during faults. It is important to mention that its execution time is also not fixed. On the other hand, the proposed SN only requires a fixed execution time equal to 0.5 µs. It is also able to provide a fully sorted list useful during fault cases or when more SMs are required (like at the start-up). Its short execution time allows the proper behavior of the controller. However, if the number of SMs increases, the execution time raises and Ts decreases at the same time. In the case the execution time of the fully factorized structure is higher than the maximum allowable Ts, it is still possible to adopt either a hybrid structure or the fully pipelined SN that allow a reduction of the execution time. This choice can be done before starting the implementation thanks to the pre-evaluated resources and latency shown in Section 5 by reducing the time-to-market. Then, the right compromise between the sorting time and the required resources can be ensured. Table 4 shows the hardware resources required for the implementation of one fully factorized 64-Bitonic SN in the adopted device. If a three-phase 64-SM MMC is of interest, six Bitonic SN have to be considered and then almost 60% of the available resources are consumed. It is also possible to use only one fully factorized Bitonic SN for accomplishing the sorting of all the arms. Obviously, this solution leads to an increment in the execution time. Another solution can be to implement a fully pipelined Bitonic SN that allows a higher throughput (a new list can be sent after the first stage has been performed) and then a lower execution time at the expense of an increment of the required resources. As already said, the right compromise has to be achieved depending on the case and the discussion in Section 5 is for this aim.  The max/min method needs 2.56 µs and 2.75 µs to be performed in normal and fault conditions, respectively. However, it does not provide a fully sorted list by causing current overshoot during faults. It is important to mention that its execution time is also not fixed. On the other hand, the proposed SN only requires a fixed execution time equal to 0.5 µs. It is also able to provide a fully sorted list useful during fault cases or when more SMs are required (like at the start-up). Its short execution time allows the proper behavior of the controller. However, if the number of SMs increases, the execution time raises and T s decreases at the same time. In the case the execution time of the fully factorized structure is higher than the maximum allowable T s , it is still possible to adopt either a hybrid structure or the fully pipelined SN that allow a reduction of the execution time. This choice can be done before starting the implementation thanks to the pre-evaluated resources and latency shown in Section 5 by reducing the time-to-market. Then, the right compromise between the sorting time and the required resources can be ensured. Table 4 shows the hardware resources required for the implementation of one fully factorized 64-Bitonic SN in the adopted device.

Required Resources Utilization (%)
LUT 5121 (53 k) 9.6 FF 1571 (106 k) 1.5 If a three-phase 64-SM MMC is of interest, six Bitonic SN have to be considered and then almost 60% of the available resources are consumed. It is also possible to use only one fully factorized Bitonic SN for accomplishing the sorting of all the arms. Obviously, this solution leads to an increment in the execution time. Another solution can be to implement a fully pipelined Bitonic SN that allows a higher throughput (a new list can be sent after the first stage has been performed) and then a lower execution time at the expense of an increment of the required resources. As already said, the right compromise has to be achieved depending on the case and the discussion in Section 5 is for this aim.

Conclusions
In this paper an FPGA-implementation of the Bitonic Sorting Network for MMC applications has been proposed. The main advantages of the proposed sorting method are: the fixed parallel structure and the possibility to be efficiently implemented in FPGA devices. This leads to a shorter and deterministic execution time, which results in a better performance of the CVB control. Three different architectures have been proposed to meet the different possible requirements of the application of interest: the fully pipelined architecture, the hybrid structure and the fully factorized SN. Their hardware resources and latency have been pre-evaluated and compared. This allows choosing the structure that best fits the application requirements before starting the implementation. Simulation results have been given in both normal and faulty conditions to compare the proposed sorting method with the max/min approach. The latter is considered to request the shortest execution time but it does not provide the fully sorted list by deteriorating the dynamic performance of the controller.
The fully factorized Bitonic SN has been implemented in a low-cost Xilinx Zedboard along with the MMC plant model and the corresponding control. Based on the HIL results, the proposed sorting method shows superior performance. Moreover, the execution time and the output current of the implemented SN have been compared with the ones achieved with the bubble sorting algorithm and the max/min technique in the case of N = 32 and a sag voltage of 50%. The output current obtained with the max/min technique presents an overshoot of about 28% in comparison with the one achieved by adopting an algorithm that provides a fully sorted list. Then, the main gain of the proposed SN compared to the max/min approach is the ability to handle fault occurrences. Moreover, the sorting and selection for the SN require 2.96 µs to be executed against the 9.22 µs and 3.05 µs needed by the bubble SA and the max/min approach, respectively. Considering the hardware resources, it requires about 10% of the available resources of the adopted board.