Real-Time Regulation of Beam-Based Feedback: Implementing an FPGA Solution for a Continuous Wave Linear Accelerator

Control applications targeting fast industrial processes rely on real-time feasible implementations. One of such applications is the stabilization of an electron bunch arrival time in the context of a linear accelerator. In the past, only the electric field accelerating the electron bunches was actively controlled in order to implicitly stabilize the accelerated electron beam. Nowadays, beam properties are specifically measured at a target position and then stabilized by a dedicated feedback loop acting on the accelerating structures. This dedicated loop is usually referred to as a beam-based feedback (BBF). Following this, the control system of the electron linear accelerator for beams with high brilliance and low emittance (ELBE) is planned to be upgraded by the BBF, and the problem of implementing a designed control algorithm becomes highly relevant. In this work, we propose a real-time feasible implementation of a high-order H2 regulator based on a field-programmable gate array (FPGA). By presenting simulation and synthesis results made in hardware description language (HDL) VHDL, we show that the proposed digital solution is fast enough to cover the bunch repetition rates frequently used at ELBE, such as 100 kHz. Finally, we verify the implementation by using a dedicated FPGA testbench.


Introduction
Radio frequency (RF) particle accelerators employ RF electromagnetic fields to accelerate charged particles to high energies while forming the particles into well-defined beams. A beam of accelerated particles can then be used to create a secondary radiation of ultra-short photon pulses, thus providing a light source for scientific experiments.  ELBE is a versatile light source located at Helmholtz-Zentrum Dresden-Rossendorf (HZDR), Germany. As illustrated in Figure 2, the general layout of the ELBE THz beamline follows the concept of a light source, namely: (1) electron bunches are generated by an electron gun, (2) these electron bunches are accelerated with the help of RF linacs consisting of superconducting RF cavities, and (3) photons are produced from these accelerated electron bunches inside a periodic magnetic structure, called undulator. elaborated by introducing systolic arrays in order to deal with state-space matrix operations. Despite the fact that there are many sophisticated systolic array architectures [19][20][21][22], in this work, a straightforward systolic structure suffices that is similar to a 2-D array. Furthermore, we examine the application of fixed-point data types to the regulator algorithm. The latency of the resulting implementation is then compared to the one based on floating-point data types. Finally, by assembling a hardware testbench that involves another FPGA, we verify the output of our implementation against simulation data from Simulink.
The remainder of this paper is organized as follows: Section 2 presents the BBF loop structure at ELBE and discusses the loop elements, including the bunch arrival time regulator, the LLRF controller and the BAM sensor. The discussion aims to build a proper context for the regulator implementation. Section 3 describes the hardware and software frameworks used in this work. Following this, Section 4 introduces the resulting firmware architecture and demarcates the new regulator logic from the given framework. By moving towards internal details, Section 5 focuses on the implementation of state-space formalism and introduces important topics, such as systolic arrays and fixed-point data types. Section 6 demonstrates the hardware setup used to verify the regulator implementation and discusses the verification results. Finally, Section 7 concludes the paper.

Beam-Based Feedback Loop Structure at ELBE
ELBE beam-based feedback extends the existing control scheme by introducing a regulation loop involving a BAM sensor, a BBF regulator and a LLRF system of the fourth accelerating cavity (C4). This results in a new interconnection of control elements that involves cascaded loops. Figure 3 elaborates the emerging beam-based feedback structure, while making the main accent on the beam-based feedback regulator.

Beam-Based Feedback Regulator
The BBF regulator designed in [4] represents a 7th-order single-input single-output H 2 control law expressed in a state-space form where where x, u and y denote the usual state, control input and measured output, and where A, B 2 , C 2 and D 22 are the system, input, output and feed-forward matrices, respectively. Matrices K u , L x and L u are the products of H 2 synthesis generated by MATLAB function h2syn. In order to alleviate the implementation of this control law, Equations (1) and (2) and since the new matrices can be evaluated offline, Equations (1) and (2) can be rewritten as As a result, Equations (11) and (12) form the structure of the beam-based feedback regulator displayed in Figure 3.
The regulator acts by correcting the setpoints of RF variables inside the LLRF controller FPGA. So to further reveal the loop details, the LLRF block displayed in Figure 3 needs to be expanded and elaborated.

RF System as Actuator for Electron Beam Regulation
In case of ELBE, every RF system is controlled by its dedicated LLRF controller. The purpose of such control is the stabilization of the corresponding RF variables, i.e., the amplitude A and phase φ of an accelerating RF field. Due to implementation specifics, the LLRF controller operates with the in-phase I and quadrature Q representations of the field amplitude and phase. Stabilized by the LLRF controller these I and Q signals are then applied inside a vector modulator (VM) to modulate a reference RF signal coming from a master oscillator. But since the signals generated by the LLRF controller are in the range of milliwatts, hence the name low-level RF, a solid-state power amplifier (SSPA) is used to drive the modulated RF signal to a kilowatt range. This finally results in a high-power RF signal that is passed through a waveguide to a RF cavity. Figure 4 illustrates the described RF system by expanding the LLRF block first introduced in Figure 3. To incorporate the RF setpoint correction signal a coming from the beam-based feedback regulator, the LLRF controller FPGA is extended by a setpoint modulation logic M. The logic is defined as where a denotes a change in the RF field amplitude expressed in percent units, and where r I and r Q are the RF field setpoints in I and Q, respectively. Factor 1 /100 is required to convert the amplitude change in percent to a relative error ∆A /A [8]. In fact, this factor could be applied by the BBF regulator implementation. In this case, however, the small resulting number would suffer from underflows caused by the fixed-point data type in use by the LLRF implementation. Therefore, this factor is applied last. To sum up, the I and Q setpoints are modulated by the relative amplitude error in order to alter the RF field oscillating inside the cavity. This will cause a change in the arrival time of subsequent electron bunches, which can then be diagnosed by the BAM sensor.

Bunch Arrival Time Monitor as Sensor of Electron Beam
The beam-based feedback is represented by a bunch arrival time measured by the BAM sensor, see Figure 3. The operation principle of this sensor is based on measuring bunch arrival time relative to an actively-stabilized optical timing reference. Specifically, periodic pulses of a reference laser are amplitude-modulated with electric signals coming from pick-up antennas that probe the electric field of the passing electron bunch. The arrival time information is thus transferred into an amplitude modulation of coincident laser pulses. According to such arrival time representation, the output of this sensor is defined by a dimensionless modulation value τ with a working point set to τ = 1. Figure 5 exemplifies a BAM readout taken at a bunch repetition rate of 50 kHz. The BAM readout displays arrival time fluctuations caused by various disturbances, e.g., noise coming from the electron gun, drifts caused by changes in ambient temperature and RF noise inherent to RF acceleration process. In particular, the presented time domain data reveals distinct low frequency oscillations about the working point. Simultaneously, the frequency domain data features two components inherent to RF noise. These are (1) a random component represented by a spectral profile that decays with certain slopes as the frequency offset increases and (2) a deterministic component that manifests itself as a number of spurs along that profile. A harmonic of voltage ripple at 100 Hz is a representative of the latter.
The beam-based feedback regulator seeks to compensate the fluctuations in signal τ, and the simulation of such compensation is displayed in Figure 6. The compensated data demonstrates the work of the regulator, i.e., (1) time domain data exhibits no drift and (2) the frequency data demonstrates suppression of low frequency noise up to ca. 5 kHz. Compared to the initial τ signal with no feedback, the regulator almost halves the disturbance effect in terms of the corresponding rms values.
The 50 kHz bunch repetition rate is commonly used for the ELBE THz beamline. Still, higher rates are possible, e.g., several hundred kilohertz or even a megahertz. Hence, the main reason to aim for a low-latency implementation of the regulator algorithm.

Hardware/Software Environment for Beam-Based Feedback Regulator
As a system, the beam-based feedback regulator relies on a number of subsystems to perform common tasks, such as establishing a user interface, sending and receiving of data, saving these data for further analysis, etc. To alleviate the heavy lifting, the regulator solution takes advantage of a digital platform based on the micro telecommunications computing architecture (MTCA) [23] and deployed on the ELBE accelerator machine [24].

MTCA.4 Hardware Environment
The fact that the regulator will be incorporated into the existing digital platform sets a few important conditions: (1) use of custom FPGA boards and (2) compliance with the existing firmware framework developed in VHDL [25]. Consequently, the regulator implementation is carried out on a specific data processing and telecommunication board, called the advanced mezzanine card (AMC) TCK7 [26]. The board features a high-performance FPGA device XC7K420T from Xilinx. To help leveraging the features of this board the given firmware framework provides support for • RAM-or register-based internal interface (II) to communicate with CPU software; • Low-latency links (LLLs) to communicate with other FPGA boards; • Data acquisition (DAQ) facility to save diagnostic data to DDR3 memory.
In this work, the AMC TCK7 board must (1) receive data from the BAM FPGA, (2) do the processing and (3) send the result to the LLRF FPGA. While the processing is done inside the XC7K420T chip, the receiving/sending occurs through the low-latency links that are driven by 10-Gigabit small form-factor pluggable (SFP) optical transceivers. Measurement signals τ and control signals a are transferred via these LLL facilities. Similarly, the initialization of the processing stage happens through a register-based hardware/software (HW/SW) interface driven by a PCI Express (PCIe) connection. This interface is used by external CPU software to set the values of gain matrix elements, setpoints, etc. In addition, the processing stage needs to save diagnostics data. For this reason, DAQ facility is used to dump data to DDR3 memory. To sum up this description, Figure 7 illustrates a general block diagram showing the beam-based feedback regulator in the context of the given MTCA. 4   The complexity of the presented hardware system underscores the importance of a software-based user interaction. Hence, the necessity to establish a hardware/software interface that would facilitate the initialization, control and other user related tasks.

Hardware/Software Interface
The MTCA.4 technology uses a special software framework [27] to support the PCIebased communication between the related hardware and software. Specifically, this framework provides the necessary application programming interface (API) to establish a registerbased HW/SW interface. By writing and reading these registers the user is able to interact with the beam-based feedback regulator FPGA, and Figure 8 depicts a basic user interaction as a unified modeling language (UML) use case diagram. 022 submitted to Sensors 8 of 21 The complexity of the presented hardware system underscores the importance of a 197 software-based user interaction. Hence the necessity to establish a hardware/software 198 interface that would facilitate the initialization, control and other user related tasks. Along with the basic use case to initialize the gain data when the application starts, a more sophisticated variant is to allow updating these data when the regulation is already running. The difficulty is related to the fact that ELBE is operated in CW mode, so there are no pauses long enough to permit a plain data overwrite. Moreover, it is reasonable to assume that updating the gain data separately, i.e., regardless of the algorithm state as a whole, will lead to erroneous algorithm results, hence the necessity to update the gains (1) all at once and (2) at a proper time instant. Essentially, the second requirement suggests an FSM-based implementation on the hardware side. In the meantime, the first requirement is fulfilled on the software side by first making a time-consuming write of the entire data from the software to intermediate hardware buffers and then triggering a fast data update on the hardware side. Figure 9 demonstrates this concept.
Version August 11, 2022 submitted to Sensors 8 of 21 The complexity of the presented hardware system underscores the importance of a 197 software-based user interaction. Hence the necessity to establish a hardware/software 198 interface that would facilitate the initialization, control and other user related tasks. The starting and stopping use cases are both based on the manipulation of the same flag to enable feedback data propagation to the regulation algorithm. This one-bit flag, called CTL_ENA, is set to 1 to enable regulation. Fundamentally, this defines an eventdriven behavior of the algorithm implementation, i.e., when the algorithm receives no data, the regulation idles, see Figure 10. In essence, the given MTCA.4 environment sets a number of requirements to the firmware architecture of the beam-based feedback regulator, namely: • firmware is written in hardware description language VHDL; • digital design targets an FPGA device from Xilinx; • communication logic adheres to the given protocols, i.e., LLL and II; • regulator algorithm is governed by an FSM.
With this in mind, the firmware architecture can now be elaborated.

Firmware Architecture of Beam-Based Feedback Regulator
The structure of the MTCA.4 firmware framework divides the on-chip logic into board and application compartments. The former manages the board specific features, including the low-level communication interfaces and clock generation, while the latter defines the application logic. Both compartments are then united inside a top level VHDL entity. Figure 11 depicts this structure as a general block diagram and uses color codes to demarcate the new regulator logic from the given framework.

Behavioral Model
The behavior of the regulator can be modeled in terms of the data that flows inside the application compartment and drives the initialization, regulation and diagnostics: Initialization Before running the regulation algorithm, the gain data need to be initialized.
Since these data are transferred from the CPU software, the values are received by the firmware through the II communication. In fact, a single gain can be represented by a matrix with multiple values, so the receiver should be RAM-based. According to the firmware requirements outlined in Section 3, this RAM is expected to serve as an intermediate buffer that is first written by the software and then, after a software trigger, read by the actual gain logic. Once such initialization is complete, the regulation data flow becomes enabled.

Regulation
The regulation data flow starts from the reception of a BAM sensor measurement coming over a dedicated low-latency link. Provided the regulator algorithm is not busy at the moment and the regulation is enabled by the user, see Figure 10, the new measurement triggers the computation of a new control signal. Otherwise, the received measurement is dropped. Such behavior ensures that the algorithm always sees up-to-date measurements. When the algorithm is indeed triggered, the data starts flowing to gains and further to sums. Once the regulation algorithm is complete, the new control signal is transmitted over a dedicated low-latency link to the LLRF actuator.

Diagnostics
The diagnostics saves the regulation data, including the BAM sensor measurements and the corresponding control signals. Importantly, the BAM data are saved even if the regulator is disabled. This allows to diagnose the open-loop behavior of the system.
To sum up, Figure 12 illustrates a functional schematic of the BBF application. Note that the illustrated firmware blocks belong to the new regulator logic, hence the corresponding color code. In this context, the additional Rx and Tx blocks act as adapters between the framework and the regulator in order to establish a properly registered ready/valid handshake-a flow control technique [28] that is used throughout the entire regulator logic. Note also that for the sake of brevity the diagram omits the transformation from signal τ

Finite State Machine
In addition to the firmware requirements outlined in Section 3, another reason for involving an FSM to manage the regulator algorithm is inspired by the analogy between a discrete-time state-space form of the regulator and a Mealy state machine [29]. Specifically, a hardware implementation of the Mealy machine is composed of combinational and sequential logic blocks. The combinational logic takes the current state together with input and computes the next state and output. The next state is then saved, or registered, by the sequential logic. The process repeats during the next iteration. Accordingly, Figure 13 depicts an FSM of the state-space regulator, where sequential logic block z −1 registers the regulator state, and where combinational logic blocks f (·) and g(·) implement Expressions (11) and (12), respectively. Unfolding this FSM approach, Figure 14 demonstrates a state diagram that captures the essential parts of the regulator behavior. Note that once the BBF algorithm is triggered, f (·) and g(·) will be executed in parallel. Yet f (·) will take significant time to process its longest operation, i.e., Θx. So it is possible to get the result of g(·) and initiate the sending of LLRF data before f (·) completes. Hence, a busy state with two stages. Finally, the latency of algorithm operations, such as Θx, underscores the necessity to analyze the implementation of these state-space constructs.

State-Space Implementation in Hardware
In this work, we propose to describe the state-space formalism of the beam-based feedback regulator in terms of systolic arrays. Indeed, Equations (11) and (12) are based on matrix-vector multiplication, and systolic arrays offer a hardware architecture that enables massively parallel execution of this mathematical operation. Moreover, when implemented on FPGAs, systolic arrays can enjoy not only the inherently parallel nature of FPGAs, but also the availability of specialized circuits that are able to efficiently perform mathematical operations.

Systolic Array Structure
Systolic arrays are grid-like interconnections of data processing elements, or nodes, that are driven by data to perform some specific uniform operation. In particular, a properly organized data flow can drive an array of multiply-accumulate nodes, or MACs, in order to implement a matrix-vector multiplication. Figure 15 demonstrates an example of a node interconnection that computes the product c = A b, where b and c are 2 × 1 input and output vectors, respectively, and where A is a 2 × 2 matrix. In this straightforward systolic implementation the number of nodes corresponds to the size of the output vector c. This allows to parallelize the computation of individual output vector elements and to keep the computation results local to the nodes. Consequently, when the data finishes propagating through this interconnection in a wave-like manner, the nodes will store a fully computed output vector. An efficient implementation of the presented systolic array node, i.e., the multiplieraccumulator unit, requires a design that exploits specialized computational resources provided by an FPGA, namely digital signal processor (DSP) slices. These high-speed circuits support a number of mathematical functions, including multiplication and addition, and therefore can accelerate a compute-intensive design. According to the data sheet [30], the XC7K420T device features 1680 DSP slices, and each slice contains a preadder, a 25 × 18 multiplier, an adder, and an accumulator. Correspondingly, this kind of slice can be configured to perform a multiply-accumulate operation in the form of a · b + c, where a and b are 25-bit and 18-bit signals, respectively, and where accumulation with c is sign-extended to 48 bits. On the one hand, by adhering to these signal widths, e.g., in case of custom fixed-point data types, a MAC unit can be mapped to a single DSP slice on the FPGA, thus leading to an optimal use of FPGA resources. On the other hand, if the MAC unit needs to operate with wider signals, e.g., when dealing with 32-bit floating-point data, the number of DSP slices per MAC unit increases, and the design consumes more FPGA resources. To investigate the optimal solution-both in terms of resources and latency-this work focuses on the custom fixed-point data types.

Fixed-Point Analysis of Regulator Signals and Gains
Due to the 25 × 18 signal widths dictated by the DSP48E1 slice [31], the fixed-point implementation differentiates between two types of data: • signals flowing through the regulator, e.g., from y[k] to u[k]; • gain matrix values that modify the signals.
The signals and gains are assigned to 25-bit and 18-bit words, respectively, thus prioritizing the precision of the signal data. Apart from the precision, the 25-bit signal word needs to allocate enough bits for the integer part to avoid overflows inside the regulator. To mitigate this risk and to improve the conditioning of the regulator, the inputs and outputs are properly scaled. It is expected that the input signal coming from the BAM sensor exhibits a deviation of roughly δτ max = 0.1. This yields y = δτ · δτ −1 max = 0.1 · 10 = 1, thus normalizing the signal magnitudes flowing inside the regulator.
However, there can also be unexpected spikes in the BAM readout, see Figure 5. In general, a large deviation of the input signal δτ will result in a considerable action taken by the regulator. In case of ELBE, a drastic change of the signal driving the cavity will have a high probability to trigger the ELBE protection system to switch off the accelerator. To cope with this unwanted scenario the signal δτ is limited as follows Setting δτ lim = 1 yields y = δτ lim · δτ −1 max = 1 · 10 = 10. Under these conditions, four bits are enough to represent the integer part of the signal. The derived signal data type is then (1,25,20), where the fixed-point notation is specified as (sign, word length, word fraction length). (15) In contrast, the fixed-point data type for gains of a 4th-order regulator is derived by examining the magnitudes of the values stored inside the corresponding matrices, namely Depending on the magnitude range of a particular matrix, the gain data type can vary its precision. For example, the maximum magnitude among the matrix values constituting the gain Θ is −1.0294. Hence, only one bit suffices for the integer part, another bit represents the sign, and 16 bits can be allocated to the fraction part, thus yielding a precision of 2 −16 = 1.5259 × 10 −5 . Following this, Table 1 displays the various fixed-point data types assigned to the regulator matrices.  (1,18,16) The gains δτ −1 max and a max are integers with values 10 and 1, respectively, so their fraction parts could be omitted. Still, it is expected that these gains can be manipulated during run-time to do fine-tuning of the regulator. Thus, the fraction parts are preserved and the corresponding data types are both chosen as (1,18,10). In addition, the BAM and LLRF signals have fixed-point data types (0, 18,15) and (1,18,10), respectively. Consequently, the regulator implementation needs to take this into account when receiving or sending low-latency link data.
The presented analysis underscores the fact that a fixed-point implementation requires a thorough examination of the data flowing inside the design. It is therefore of interest to see how much this effort helps to keep the systolic array structure optimal in terms of latency and resources, e.g., compared to a floating-point implementation.

Data Flow to Drive Systolic Arrays
The operation of systolic arrays relies on a properly organized data flow. In this work, a digital circuit responsible for the data flow is called a data channel, and multiple data channels compose a data stream. As shown in Figure 16   Inside the data stream circuit, the data are structured based on their types. The gain is treated as a two-dimensional matrix and is organized into matrix rows. These rows are represented by separate data channels implemented as RAMs. Such organization facilitates the gain data throughput, because it allows to write or read an entire matrix column in a single clock cycle. This also places a requirement on the software to write the gain data to intermediate RAM buffers in a column-major order. After a software trigger, the gain controller will rely on the proper format of the intermediate data in order to initialize the internal data channels column by column. Unlike the gain, the signal is placed into a register-based memory. Again, this allows to register an incoming signal, i.e., the gain argument, in a single clock cycle. Along with the gain column, the signal vector element is then fed into the systolic array to perform the necessary computation. Once the computation is over, the results from each systolic array node are assembled into a result vector which is then propagated to the gain entity output.
Undoubtedly, the data stream circuit introduces some amount of overhead into the design. Moreover, the latency of the data management will dominate in case of low-order regulator implementations, see Figure 17. The data stream overhead is partially related to data padding required by systolic arrays. As can be seen in Figure 15, the data flow driving the systolic array is padded with zeros. Although these zeros ensure the correct computation inside the nodes, they also lead to the increase of the corresponding data channel size which becomes data channel size = gain row size + gain column size − 1.
Of course, the overall latency of the regulation algorithm is also affected by the implementation of the state-space computation units, i.e., the MAC and the sum. This is especially true for the floating-point case which uses a 32-bit single precision data type [32] and thus relies on an intellectual property (IP) from Xilinx [33]. In this context, even though the sum implementation represents a trivial element-wise summation of the argument vectors, and thus does not involve systolic arrays, its floating-point implementation still uses DSP resources, see Table 2. Indeed, by adhering to the dictated signal widths the fixed-point MAC implementation is mapped to a single DSP slice. Compared to the floating-point counterpart that uses more DSPs per MAC, the one-to-one mapping allows the fixed-point design to occupy considerably less DSP resources on the FPGA.
It can be argued, though, that the usage of the proposed gain entity architecture for scalar operations, such as δτ −1 max · δτ, is far beyond what is required. Hence, the unwanted overhead. This is a reasonable point provided the system does not change in the future. Yet there are plans to extend the system in order to regulate the compression and energy of the electron bunches. In this case, the current architecture can be easily scaled to implement the required mathematical operation, e.g., where E and C denote the energy and compression of the electron bunches, respectively. This underscores the scalability of the current digital solution.
In general, the presented data demonstrates that the proposed state-space architecture can be routinely clocked at 240 MHz-a particular clock signal generated by the board compartment of the MTCA.4 firmware framework. Running at such frequency allows the fixed-point implementation to stay on the sub-microsecond level even in the case of high-order regulator designs. The floating-point implementation exhibits greater latency, but even the 7th-order regulator designed in [4] takes less than 2 microseconds to perform its state-space computation. The 50 kHz bunch repetition rate is thus supported by both implementations. In the mean time, the fast fixed-point implementation needs to be verified in hardware.

Firmware Verification
The correctness of the regulator implementation is verified by assembling a digital setup which involves the BBF regulator FPGA connected to an additional FPGA that serves as a testbench. Along with the verification of the state-space implementation, the usage of a separate FPGA allows testing the low-latency link communication. The corresponding installation can be observed in Figure 18. The two FPGAs are slided into a MTCA.4 crate leaving only their front panels exposed. As can be seen, these panels feature SFP slots that are used to interconnect the two devices using optical cables.
In principle, the testbench operation can be outlined as follows

1.
Sending a stimulus to the BBF regulator FPGA; 2.
Comparing the received response with a similar one generated by a floating-point MATLAB simulation.
The MATLAB simulation is run offline and is driven by the same stimulus-simulated BAM data. Hence, both responses should be identical to a certain degree of precision. Figure 19 summarizes the outlined testbench operation.   Figure 19 summarizes the outlined testbench operation.  The difference between the response simulated in MATLAB and the one produced by the testbench is displayed in Figure 21. Essentially, the assembled testbench represents an open-loop system, i.e., the output of the testbench regulator has no effect on the next BAM data sample. Coupled with the precision issue, the open-loop scenario leads to accumulation of the mismatch between the two responses. Still, it is expected that this negative behavior will disappear, once the fixed-point regulator closes the real machine loop. In the closed-loop scenario, the correctness of the first few responses will matter, and the current testbench data exhibits up to 20 correct responses before the least significant fraction bit flips. Note that the magnitude of the bit flip corresponds to the precision of the fixed-point data type (1,18,10) used by the LLRF controller.

Conclusions
In the context of the linear accelerator ELBE, the benefit of implementing a beam-based feedback regulator using an FPGA is twofold. Firstly, the low-latency nature of FPGAs allows dealing with fast processes, such as the regulation of an electron bunch arrival time. Secondly, the high configurability of FPGAs enables the implementation of sophisticated regulation algorithms, e.g., an optimal H 2 regulator in its state-space representation.
Accordingly, in this work we proposed an FPGA-based implementation of a beambased feedback regulation algorithm to compensate electron bunch arrival time fluctuations. Using a top-down approach, we gradually introduced the levels of the regulator system, starting from the structure of the ELBE beam-based feedback loop and culminating in the digital architecture of a state-space regulator. In order to implement the matrix-vector multiplications of the state-space formalism, we used systolic arrays. Even though the systolic arrays added data management overhead into the design, we saw that the design could easily be scaled in terms of the regulator orders as well as its inputs and outputs. Furthermore, we made a thorough analysis of a potential fixed-point implementation and then compared it to a floating-point one. Essentially, when clocked at 240 MHz the fixed-point implementation of a 4th-order H 2 regulator takes 0.425 µs to perform its state-space computations, thus enabling sub-microsecond sample times. In contrast, a similar floating-point implementation takes 1.025 µs. Finally, we verified the correctness of the implementation by running a hardware testbench which included two interconnected FPGAs.
The next step is to validate the proposed digital solution on the real machine. This will allow evaluating the whole spectrum of decisions made in this and our previous works, including the disturbance modeling, the choice of the regulation algorithm and the state-space implementation. Such evaluation will be the subject of our future report.

Conflicts of Interest:
The authors declare no conflict of interest.