Scalable Hardware-Efficient Architecture for Frame Synchronization in High-Data-Rate Satellite Receivers

Crocetti, Luca; Pagani, Emanuele; Bertolucci, Matteo; Fanucci, Luca

doi:10.3390/electronics13030668


Scalable Hardware-Efficient Architecture for Frame Synchronization in High-Data-Rate Satellite Receivers^†

¹ Department of Information Engineering, University of Pisa, Via G. Caruso 16, 56122 Pisa, Italy

² IngeniArs S.r.l., Via Ponte a Piglieri 8, 56121 Pisa, Italy

^* Authors to whom correspondence should be addressed.

^† This paper is an extended version of our paper published in 2nd IEEE Industrial Electronics Society Annual Online Conference (IES ONCON 2023).

Electronics 2024, 13(3), 668; https://doi.org/10.3390/electronics13030668

Submission received: 29 December 2023 / Revised: 25 January 2024 / Accepted: 29 January 2024 / Published: 5 February 2024

(This article belongs to the Special Issue Advances in Algorithms and Architectures for Digital Signal Processing)


Abstract

The continuous technical advancement of scientific space missions has resulted in a surge in the amount of data that is transferred to ground stations within short satellite visibility windows, which has consequently led to higher throughput requirements for the hardware involved. To aid synchronization algorithms, the communication standards commonly used in such applications define a physical layer frame structure that is composed of a preamble, segments of modulation symbols, and segments of pilot symbols. Therefore, the detection of a frame start becomes an essential operation, whose accuracy is undermined by the large Doppler shift and quantization errors in hardware implementations. In this work, we present a design methodology for frame synchronization modules that are robust against large frequency offsets and rely on a parallel architecture to support high throughput requirements. Several algorithms are evaluated in terms of the trade-off between accuracy and resource utilization, and the best solution is exemplified through its application to the CCSDS 131.2-B-1 and CCSDS 131.21-O-1 standards. The implementation results are reported for a Xilinx KU115 FPGA, thereby showing the capability of supporting baud rates that are greater than 2 Gbaud, as well as a corresponding throughput of 15.80 Gbps. To the best of our knowledge, this paper is the first to propose a design methodology for parallel frame synchronization modules that has applicability to the CCSDS 131.2-B-1 and CCSDS 131.21-O-1 standards.

Keywords:

frame synchronization; frame synchronizer; CCSDS 131.2-B-1; CCSDS 131.21-O-1; SCCC receiver; SCCC-X; space telemetry; parallel architecture; FPGA; DSP

1. Introduction

Over the last few years, space applications have pushed the requirements for the data rates and security services of communication links to the limit. Increasingly complex space missions involve onboard instruments that are characterized by sensors with particularly high resolutions and payload components, thereby generating a huge amount of data that require very high-data-rate links to be transferred to ground stations within the visibility window available in a single orbit pass. Governmental agencies such as the European Space Agency (ESA) are attempting to address these problems by issuing and updating reports and standards through the Consultative Committee for Space Data Systems (CCSDS). In particular, the CCSDS produced the 131.2-B-1 [1] standard in 2012, which defines flexible advanced coding and modulation schemes for high-rate telemetry applications. Such coding and modulation schemes rely on Serially Concatenated Convolutional Codes (SCCCs), which are a class of Forward Error Correction (FEC) codes that are highly suitable for turbo (iterative) decoding. Currently, the CCSDS 131.2-B-1 constitutes a fundamental reference for telemetry (TM) downlink applications in Earth Exploration Satellite Services (EESSs) [2,3] such as geostationary Earth orbit (GEO)-to-ground and low Earth orbit (LEO)-to-ground systems. In addition, in 2021, the CCSDS produced the 131.21-O-1 [4] standard, which tackles the increased throughput requirements of future missions through the utilization of advanced adaptive techniques. In particular, the CCSDS defined an extension of the SCCCs specified in [1], i.e., Serially Concatenated Convolutional Code eXtension (SCCC-X), which can be exploited in a high elevation regime without saturating the data rate of Earth observation missions, while still respecting realistic constraints [4].

According to [5], the block diagram of a generic receiver can be depicted as in Figure 1, in which the baseband receiver takes inputs from an analog-to-digital converter (ADC) that interfaces it with the radio frequency (RF) front-end that operates at some intermediate frequency (IF).

The CCSDS has also produced standards and recommendations to address the cybersecurity threats in space links and communications, such as, in particular, the 352.0-B-2 [6] and 355.0-B-2 [7] standards, which have led to the endorsement of security schemes based on the Advanced Encryption Standard (AES) and the Secure Hash Algorithm 2 (SHA2). Specifically, the CCSDS 352.0-B-2 standard recommends the usage of the Galois Counter Mode (GCM) of AES to protect data with authenticated encryption, as well as the usage of the Cipher-Based Message Authentication Code (CMAC), Galois Message Authentication Code (GMAC), and Hash-Based Message Authentication Code (HMAC) schemes to protect data with authenticated integrity. The CCSDS 355.0-B-2 standard instead defines how to apply such security services to the data exchanged over space links. In particular, it defines the space data link security (SDLS) frame format and the corresponding security protocol for different space protocols, including TM protocols. The application of the security functions must be performed on the transmitter side before the synchronization and channel coding stages, by encapsulating the payload with specific security fields. On the recipient side, the removal of security fields and the retrieval of the original payload (e.g., by decryption) must be performed only after the synchronization and channel decoding stages. Therefore, the integration of the security mechanisms specified in [6,7] does not require the modification of the receiver architecture shown in Figure 1 because, in this case, they will be applied after the output of the receiver (i.e., in the decoded data).

We have already presented the development and the implementation of hardware modules for the security functions approved by the CCSDS in [8], thereby achieving results that outperform the state-of-the-art technology, which includes future improvements in the security strength with key generation and derivation services that are aided by highly qualified random number generators [9] and protection mechanisms against side-channel attacks (SCAs) [10]. Regarding the development of the baseband receiver, we presented a preliminary work in [11], which focused on the frame synchronization module. Indeed, the authors of [12] have identified the frame synchronization process as one of the most critical aspects of the receiving chain, which creates a bottleneck. In Earth observation missions, the amount of collected data to be delivered to ground stations has reached several terabytes per day [13]. As further improvements are expected in the future, the implementation of the frame synchronization module has become a challenging task for designers due to its high throughput requirements. In [11], we proposed a design methodology for the frame synchronization module by relying on a parallel approach. The necessity of this approach arose from a collaborative project with IngeniArs S.r.l. for the FPGA implementation of a receiver that was able to comply with the CCSDS 131.2-B-1 standard and to support a baud rate of at least 1 Gbaud. IngeniArs S.r.l. owned a serial version of the frame synchronization module that would have required an operating frequency of 1 GHz on the KU115 FPGA to support the baud rate of 1 Gbaud. Since it was not possible to implement the serial frame synchronization module with such an operating frequency, we opted for a parallel approach, thereby deriving a design methodology that could offer different solutions according to the synthesis frequency, the resource consumption, and the number of parallel input samples.
To the best of our knowledge, we are the first in the literature to propose such a parallel approach for receivers compliant with the CCSDS 131.2-B-1 standard. In this work, we present the advancements obtained by developing a parallel-architecture frame synchronizer based on the methodology that we presented in [11] and its implementation on a KU115 FPGA. We have been able to realize frame synchronization modules capable of supporting a baud rate of at least 1 Gbaud, at operating frequencies much lower than 1 GHz (specifically from 62.5 MHz up to 250 MHz at most). Moreover, the proposed frame synchronizer can also support applications compliant with the CCSDS 131.21-O-1 standard, since its functioning depends only on the frame format at the physical layer (PL), and both the 131.2-B-1 and 131.21-O-1 standards use the same PL frame format.

The remainder of this work is organized as follows. Section 2 describes the PL frame defined by the CCSDS 131.2-B-1 and 131.21-O-1 standards and provides a review of the state-of-the-art synchronization algorithms related to it. Section 3 analyzes the synchronization algorithms presented in Section 2 in terms of the trade-off between accuracy and the consumption of logic resources and identifies the best solution. Section 4 reports the methodology that we proposed in [11] for the design of parallel frame synchronization modules and provides additional insights. Section 5 presents the results obtained by designing the proposed module in VHDL and implementing it on a KU115 FPGA. Finally, Section 6 presents the conclusions of this work.

2. PL Frame for CCSDS 131.2-B-1 and 131.21-O-1 and Synchronization Algorithms

The CCSDS 131.2-B-1 standard specifies the use of SCCCs with different constellation formats (i.e., QPSK, 8-PSK, 16-APSK, 32-APSK, and 64-APSK), which are combined into 27 different modulation and coding rate (MODCOD) combinations. Accordingly, the CCSDS 131.21-O-1 standard defines ten additional MODCODs that extend those in CCSDS 131.2-B-1 (i.e., MODCOD 28 to 37) to include the constellations 128-APSK and 256-APSK. As a result, efficient bandwidth utilization can be achieved in a variety of scenarios, and channel adaptation techniques, such as variable coding and modulation and adaptive coding and modulation [14], are inherently supported. The CCSDS 131.2-B-1 and the CCSDS 131.21-O-1 define the same PL frame as the communication unit for the physical layer. Figure 2 shows its structure, which consists of three main parts:

The frame marker (FM), which is composed of 256 $π / 2$ BPSK modulated known symbols, and whose main purpose is to enable frame start detection; however, it can also be used for carrier frequency and phase recovery [15] and signal-to-noise ratio estimation [15].
The frame descriptor (FD), which is composed of 64 $π / 2$ BPSK modulated symbols that carry the information about the MODCOD used and indications about the presence or absence of pilot symbols.
The codeword segment (CWS), which consists of 16 codeword sections (CW#i for i = 1, 2, …, 16) of modulated symbols with additional optional pilot symbols. Specifically, each codeword section is made up of 15 subsections of 540 data symbols, optionally followed by 16 pilot symbols, for a total of 8340 or 8100 symbols (hence for a total of 133,440 or 129,600 symbols for the entire CWS). The pilot symbols can be used to recover the carrier phase and any residual frequency offset [15].
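The symbol counts quoted in the list above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch; the constant and function names are illustrative and do not come from the standards.

```python
# Symbol counts taken from the PL frame description above.
FM_SYMBOLS = 256                    # frame marker
FD_SYMBOLS = 64                     # frame descriptor
CODEWORD_SECTIONS = 16              # CW#1 ... CW#16
SUBSECTIONS_PER_CW = 15
DATA_SYMBOLS_PER_SUBSECTION = 540
PILOT_SYMBOLS_PER_SUBSECTION = 16   # present only when pilots are enabled


def cws_length(pilots: bool) -> int:
    """Number of symbols in the whole codeword segment (CWS)."""
    per_subsection = DATA_SYMBOLS_PER_SUBSECTION
    if pilots:
        per_subsection += PILOT_SYMBOLS_PER_SUBSECTION
    return CODEWORD_SECTIONS * SUBSECTIONS_PER_CW * per_subsection


def frame_length(pilots: bool) -> int:
    """Number of symbols in a full PL frame (FM + FD + CWS)."""
    return FM_SYMBOLS + FD_SYMBOLS + cws_length(pilots)


if __name__ == "__main__":
    for pilots in (True, False):
        print(pilots, cws_length(pilots), frame_length(pilots))
```

The full frame length obtained this way (FM plus FD plus CWS) is the observation window length used by the peak detection logic later in the paper.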

From a theoretical perspective, several solutions to the frame synchronization problem have been published in the literature, the simplest of which is based on computing the correlation of a portion of the received sequence with the known sync word at each symbol position and comparing the result with a threshold [16]. However, many improvements to the simple correlation metric have been proposed in the past. Pioneering works by Massey [17] and Nielsen [18] presented optimal and sub-optimal metrics for the case of an additive white Gaussian noise channel with binary signaling, a constant frame length (i.e., periodic occurrence of the sync word), and coherent demodulation, i.e., assuming perfect carrier frequency and phase recovery prior to frame synchronization being performed. Subsequent studies have extended these early results in various directions, such as those of multi-level modulations, ISI-impaired and fading channels, or code-aided frame synchronization: a detailed survey, which is beyond the scope of this work, can be found in [19], where, furthermore, optimal and sub-optimal rules for frame synchronization in the presence of carrier phase offset (i.e., non-coherent frame synchronization) were reported for the first time.

Since our focus is on the CCSDS 131.2-B-1 standard, which is commonly employed for communication with satellites where a large Doppler shift is experienced, we are interested in non-coherent frame synchronization in the presence of a possibly large frequency offset. A reference work on this subject is [20], where four metrics that are remarkably robust against carrier frequency offset were derived through an approximation of the maximum likelihood function. These metrics are given in Equations (1)–(4).

L_{0} (μ) = \sum_{i = 1}^{L - 1} \{{|\sum_{k = i}^{L - 1} r_{μ + k}^{*} \cdot s_{k} \cdot r_{μ + k - i} \cdot s_{k - i}^{*}|}^{2} - \sum_{k = μ + i}^{μ + L - 1} {|r_{k}|}^{2} {|r_{k - i}|}^{2}\}

(1)

L_{1} (μ) = \sum_{i = 1}^{L - 1} \{|\sum_{k = i}^{L - 1} r_{μ + k}^{*} \cdot s_{k} \cdot r_{μ + k - i} \cdot s_{k - i}^{*}| - \sum_{k = μ + i}^{μ + L - 1} |r_{k}| |r_{k - i}|\}

(2)

L_{2} (μ) = |\sum_{k = 1}^{L - 1} r_{μ + k}^{*} \cdot s_{k} \cdot r_{μ + k - 1} \cdot s_{k - 1}^{*}| - \sum_{k = μ + 1}^{μ + L - 1} |r_{k}| |r_{k - 1}|

(3)

L_{3} (μ) = |\sum_{k = 1}^{L - 1} r_{μ + k} \cdot s_{k}^{*} \cdot r_{μ + k - 1}^{*} \cdot s_{k - 1}|

(4)

For each metric in Equations (1)–(4), r and s are complex numbers denoting the received symbols and the ideal frame marker symbols, respectively, while the operator * denotes the complex conjugate of the symbol to which it is applied. L is the FM length, which is equal to 256 for the standards CCSDS 131.2-B-1 and CCSDS 131.21-O-1, and it determines the upper limit of the summations. Since the expected ideal FM symbols s are defined by the standard and known in advance, the products $s_{k} \cdot s_{k - 1}^{*}$ can be pre-computed and stored in lookup tables.
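To make Equations (3) and (4) concrete, the sketch below evaluates $L_{2} (μ)$ and $L_{3} (μ)$ in plain Python on a toy sequence. It is an illustration under stated assumptions rather than the authors' implementation: the marker is a short random unit-modulus sequence (not the CCSDS frame marker), the channel is noise-free, and the helper `toy_received` is hypothetical.

```python
import cmath
import random


def metric_l3(r, s, mu):
    """Equation (4): modulus of the single (lag-1) double correlation."""
    acc = 0j
    for k in range(1, len(s)):
        acc += r[mu + k] * s[k].conjugate() * r[mu + k - 1].conjugate() * s[k - 1]
    return abs(acc)


def metric_l2(r, s, mu):
    """Equation (3): same correlation modulus minus the energy correction term.

    The sum inside Equation (3) is the complex conjugate of the one in
    Equation (4), so both have the same modulus.
    """
    correction = sum(abs(r[mu + k]) * abs(r[mu + k - 1]) for k in range(1, len(s)))
    return metric_l3(r, s, mu) - correction


def toy_received(s, mu0, total, delta_f, phase, rng):
    """Assumed test signal: random unit-modulus filler with the marker s at
    offset mu0, rotated by a normalized frequency offset delta_f and a phase."""
    symbols = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(total)]
    symbols[mu0:mu0 + len(s)] = s
    return [x * cmath.exp(1j * (2 * cmath.pi * delta_f * n + phase))
            for n, x in enumerate(symbols)]


if __name__ == "__main__":
    rng = random.Random(0)
    marker = [cmath.exp(2j * cmath.pi * rng.random()) for _ in range(16)]
    rx = toy_received(marker, mu0=20, total=60, delta_f=0.2, phase=1.0, rng=rng)
    values = [metric_l3(rx, marker, mu) for mu in range(60 - 16 + 1)]
    print("peak position:", values.index(max(values)))
```

In this noise-free setting, each term of the sum at the true marker position reduces to the same rotation $e^{j 2 π Δ f}$, so the frequency and phase offsets do not reduce the peak value, which is the robustness property attributed to double correlation.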

Although the work presented in [20] can be considered dated, it continues to serve as a seminal reference for frame synchronization techniques, as no significant theoretical advances have been made to date. This is confirmed by the fact that the authors of [5] recommend using one of the metrics in [20]. The particular robustness of these metrics against carrier frequency offset is due to the use of double correlation [21], which also makes them insensitive to phase offset. Furthermore, the implementation of these metrics exhibits moderate complexity, which we have further reduced as shown in Section 3. It is worth noting that the timing uncertainty of the symbols provided to the frame synchronizer must be recovered beforehand to avoid performance degradation. In addition, since timing recovery takes place before frame synchronization, a non-data-aided algorithm must be used. More recently, the use of post-detection integration, which is a technique originally developed for code acquisition in spread-spectrum communications [22], has been investigated and developed into various detection schemes [23,24,25]. FFT-based techniques have also been studied [26].

3. Accuracy versus Resource Consumption Trade-Off in Synchronization Algorithms

Regarding the impact on the hardware implementation, the double summation adopted by $L_{0} (μ)$ and $L_{1} (μ)$ makes them impractical for use in a high-data-rate receiver, since the corresponding architecture requires a $255 \times 255$ complex matrix to store the different $r_{μ + k}^{*} \cdot r_{μ + k - i}$ products and a $255 \times 255$ real matrix for the moduli. In addition, the number of operations for each input sample is considerably high. In contrast, the $L_{2} (μ)$ and $L_{3} (μ)$ metrics do not involve a double summation and differ from each other only by a correction term that is present in the former but not in the latter. The operations required to compute the metric $L_{3} (μ)$ are 255 complex multiplications and 254 complex sums, plus the final modulus. In the case of the metric $L_{2} (μ)$, the operations required for the correction term must also be included, which correspond to an additional 255 real multiplications and 254 real sums, plus the computation of 255 complex number moduli. The latter comes at a high cost, since the exact modulus calculation requires a square root. Although less complex yet approximate solutions could be used, such as the alpha-max plus beta-min algorithm [27] or CORDIC [28], the impact of their inherent algorithmic error would have to be carefully evaluated, and an additional number of operations would be introduced in proportion to the required accuracy. Therefore, the metric $L_{3} (μ)$ is the best candidate for hardware implementation in terms of logic resource utilization.
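To illustrate the kind of algorithmic error mentioned above, the following sketch implements the alpha-max plus beta-min modulus approximation and compares it with the exact value. The coefficients $α = 1$ and $β = 1/2$ are a common shift-friendly choice assumed here for illustration; they are not taken from this paper or from [27].

```python
import math


def alpha_max_beta_min(re, im, alpha=1.0, beta=0.5):
    """Approximate sqrt(re^2 + im^2) as alpha*max(|re|,|im|) + beta*min(|re|,|im|).

    With alpha = 1 and beta = 1/2 (assumed values), the multiplication by
    beta is a one-bit right shift in fixed-point hardware.
    """
    a, b = abs(re), abs(im)
    return alpha * max(a, b) + beta * min(a, b)


def relative_error(re, im):
    """Relative error of the approximation with respect to the exact modulus."""
    exact = math.hypot(re, im)
    return (alpha_max_beta_min(re, im) - exact) / exact


if __name__ == "__main__":
    worst = max(relative_error(math.cos(math.radians(d)), math.sin(math.radians(d)))
                for d in range(0, 91))
    print("worst relative error on a 1-degree grid:", worst)
```

With these coefficients the estimate never falls below the exact modulus, and the overestimation stays below 12%; tighter coefficient pairs reduce the error at the cost of true multiplications.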

Regarding the false detection probability, [20] shows that the metric $L_{1} (μ)$ is the best performing, followed by $L_{2} (μ)$, $L_{0} (μ)$, and $L_{3} (μ)$. However, the low complexity of $L_{3} (μ)$ makes it reasonable to further analyze its false detection probability to determine whether it meets our requirements. To this end, we used the kernel density estimation technique [29] to fit the distribution of the metric values in two separate scenarios: when the metric is evaluated at the correct starting point of the frame marker, and when it is evaluated at any other position within the frame. Figure 3 illustrates the two distributions for MODCOD 1, $E_{s} / N_{0} = - 2$ dB, and a normalized frequency offset $Δ f = 0.2$. The orange distribution represents the metric values at the correct marker position, while the blue distribution represents the metric values at all other positions.

To estimate the false detection probability, we conducted a test as follows: $N - 1$ samples were taken from the blue distribution and one from the orange distribution, where N is the length of the entire frame (i.e., 133,760 or 129,920 symbols, with or without pilot symbols, respectively). If the sample from the orange distribution has a higher value than all the others, the start-of-frame is correctly detected; otherwise, it is not. By repeating the test a sufficient number of times, we estimated the correct detection probability to be approximately $p = 0.99903$. This confirms that simple peak detection is effective for frame synchronization, given that the usual requirement is a 99% probability of correct start-of-frame detection in operating conditions such as those considered in our test. However, post-processing can be performed on the metric values to further increase the reliability of frame synchronization. One possible solution is to declare the frame lock (or unlock) event after a certain number of peaks is detected (or missed, respectively) at the expected position, which is at a distance from the previous peak equal to the frame length in symbols. With this, the probability of correct frame synchronization becomes that of Equation (5), where K is the considered number of peaks to be matched.

p_{post-p} = 1 - {(1 - 0.99903)}^{K}

(5)
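Equation (5) is straightforward to evaluate numerically. The helper below is a minimal sketch with a hypothetical function name; it simply tabulates the expression for the single-peak probability reported above and a few values of K.

```python
def post_processing_probability(p_single: float, k: int) -> float:
    """Equation (5): 1 - (1 - p_single)^K for K peaks to be matched."""
    if k < 1:
        raise ValueError("K must be at least 1")
    return 1.0 - (1.0 - p_single) ** k


if __name__ == "__main__":
    for k in (1, 2, 3):
        print(k, post_processing_probability(0.99903, k))
```

For K = 1 the expression reduces to the single-peak probability, so the helper can also be used to explore how many matched peaks are needed when a cheaper and less accurate metric is adopted.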

The value of K provides an additional degree of freedom that can be traded for further simplification. For example, instead of the formally required ℓ₂-norm (Euclidean norm), the modulus of the sum required by the metric $L_{3} (μ)$ can be computed as the ℓ₁-norm (Manhattan norm). This affects the probability of correct frame synchronization, which decreases to $p = 0.94738$, but the loss can be easily compensated for by increasing the value of K (e.g., $K = 2$ increases the probability to 0.9972). From a hardware implementation perspective, substituting the ℓ₂-norm with the ℓ₁-norm results in significant resource savings, because it eliminates the need for a square-root unit and only requires a sum with the possible sign inversion of the addends. In conclusion, Equation (6) gives the approximated version of $L_{3} (μ)$.

{\hat{L}}_{3} (μ) = |Re \{\sum_{k = 1}^{L - 1} r_{μ + k} \cdot r_{μ + k - 1}^{*} \cdot s_{k}^{*} \cdot s_{k - 1}\}| + |Im \{\sum_{k = 1}^{L - 1} r_{μ + k} \cdot r_{μ + k - 1}^{*} \cdot s_{k}^{*} \cdot s_{k - 1}\}|

(6)
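The effect of replacing the ℓ₂-norm with the ℓ₁-norm can be seen in a small floating-point sketch of Equations (4) and (6). The four-symbol sequence used in the demonstration is a toy example assumed for illustration, not the CCSDS frame marker.

```python
import cmath
import math


def correlation_sum(r, s, mu):
    """Complex sum shared by Equations (4) and (6)."""
    total = 0j
    for k in range(1, len(s)):
        total += r[mu + k] * r[mu + k - 1].conjugate() * s[k].conjugate() * s[k - 1]
    return total


def metric_l3(r, s, mu):
    """Equation (4): Euclidean (l2) norm of the sum."""
    return abs(correlation_sum(r, s, mu))


def metric_l3_hat(r, s, mu):
    """Equation (6): Manhattan (l1) norm of the sum, no square root needed."""
    total = correlation_sum(r, s, mu)
    return abs(total.real) + abs(total.imag)


if __name__ == "__main__":
    marker = [1 + 0j, 1j, -1 + 0j, -1j]   # toy marker (assumption)
    # Rotate symbol n by n*pi/4, i.e., a normalized frequency offset of 1/8.
    rx = [x * cmath.exp(1j * math.pi / 4 * n) for n, x in enumerate(marker)]
    print(metric_l3(rx, marker, 0), metric_l3_hat(rx, marker, 0))
```

For any complex sum, the ℓ₁-norm lies between the ℓ₂-norm and $\sqrt{2}$ times the ℓ₂-norm, and where it falls in that range depends on the angle of the sum, which in turn depends on the frequency offset.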

Based on the discussion in this section, the metric ${\hat{L}}_{3} (μ)$ is the most suitable option for the design of hardware frame synchronization modules in high-speed receivers that comply with the CCSDS 131.2-B-1 and CCSDS 131.21-O-1 standards. In high-data-rate applications, the parallel processing of multiple samples per clock cycle is a compelling choice. The reduced logic complexity corresponding to Equation (6) is key to achieving moderate resource utilization. Additionally, the presence of complex multiplications is well suited to FPGA devices, which typically embed DSPs that can be exploited for such operations.

4. Architecture of the Frame Synchronization Module

To better understand the mechanisms and resources required, it is convenient to analyze and illustrate the construction of the serial version before describing the parallel architecture of a frame synchronization module based on Equation (6).

4.1. Serial Frame Synchronization Module

The application of metric ${\hat{L}}_{3} (μ)$ to the PL frame results in the highest correlation value at the last symbol of the FM. Therefore, by observing as many symbols as the frame length (i.e., 133,760 or 129,920), the start-of-frame can be easily identified once the end of the FM is detected, because the end of the FM lies at a fixed offset from the beginning of the frame.

The complex product between the frame symbols in Equation (6) can be expressed as $p_{μ, k} = r_{μ + k} \cdot r_{μ + k - 1}^{*}$, and its real and imaginary components can be made explicit as follows to derive the corresponding hardware implementation:

\begin{aligned} p_{μ, k} &= \underbrace{[Re \{r_{μ + k}\} + j \cdot Im \{r_{μ + k}\}]}_{r_{μ + k}} \cdot \underbrace{[Re \{r_{μ + k - 1}\} - j \cdot Im \{r_{μ + k - 1}\}]}_{r_{μ + k - 1}^{*}} \\ &= \underbrace{[Re \{r_{μ + k}\} \cdot Re \{r_{μ + k - 1}\} + Im \{r_{μ + k}\} \cdot Im \{r_{μ + k - 1}\}]}_{Re \{p_{μ, k}\}} \\ &\quad + j \cdot \underbrace{[Re \{r_{μ + k - 1}\} \cdot Im \{r_{μ + k}\} - Re \{r_{μ + k}\} \cdot Im \{r_{μ + k - 1}\}]}_{Im \{p_{μ, k}\}} \end{aligned}

(7)

Equation (7) demonstrates that the complex product $p_{μ, k}$ can be implemented using four DSPs, one for each multiplication between the vectors representing the real and imaginary components, along with one adder (for $Re \{p_{μ, k}\}$) and one subtractor (for $Im \{p_{μ, k}\}$). This information is sufficient to describe the first part of the module dedicated to calculating metric ${\hat{L}}_{3} (μ)$, as shown in Figure 4.
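The mapping of Equation (7) onto four real multipliers, one adder, and one subtractor can be mirrored in a short integer model. This is a behavioral sketch only; the function name and the variable names `m0` to `m3` are hypothetical and do not correspond to signals in the VHDL design.

```python
def conj_product(i_cur, q_cur, i_prev, q_prev):
    """Compute p = r_cur * conj(r_prev) with four real multiplications.

    (i_cur, q_cur) are the I/Q components of the current symbol and
    (i_prev, q_prev) those of the previous one, as in Equation (7).
    """
    m0 = i_cur * i_prev      # Re{r_cur} * Re{r_prev}
    m1 = q_cur * q_prev      # Im{r_cur} * Im{r_prev}
    m2 = i_prev * q_cur      # Re{r_prev} * Im{r_cur}
    m3 = i_cur * q_prev      # Re{r_cur} * Im{r_prev}
    return m0 + m1, m2 - m3  # adder for the real part, subtractor for the imaginary part


if __name__ == "__main__":
    print(conj_product(3, -2, 1, 4))
```

Working on integers keeps the model close to fixed-point hardware arithmetic, where each of the four products would occupy one DSP slice.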

In Figure 4, the green elements represent resources dedicated to the real components of processed vectors, while the yellow elements represent resources dedicated to the imaginary counterpart. The two-location shift registers at the input store the current sample ($r_{μ + k}$, split into its components $I = Re \{r_{μ + k}\}$ and $Q = Im \{r_{μ + k}\}$) and the previous sample ($r_{μ + k - 1}$), which feed a complex multiplier that computes the product $p_{μ, k}$ according to Equation (7). The real and imaginary components of $p_{μ, k}$, i.e., $Re \{p_{μ, k}\}$ and $Im \{p_{μ, k}\}$, respectively, are stored in dedicated shift registers of $L - 1 = 255$ locations each to collect the 255 complex products for the summation (Equation (6)).

Referring back to Equation (6), the products $p_{μ, k}$ are multiplied by the products of ideal symbols $s_{k}^{*} \cdot s_{k - 1}$, which are denoted as $z_{k}$ from this point onwards. According to the specifications of standards CCSDS 131.2-B-1 and CCSDS 131.21-O-1, the product $z_{k} = s_{k}^{*} \cdot s_{k - 1}$ can only take the values $2 j$ or $- 2 j$ for any of the 255 possible combinations. The factor 2 can be omitted because it is common to all multiplications with the products $p_{μ, k}$, which gives $z_{k} \approx j$ or $z_{k} \approx - j$. When performing arithmetic with complex numbers, multiplying by $j$ or $- j$ results in swapping the real and imaginary parts of the complex number and changing the sign of these parts accordingly. Therefore, if the real and imaginary components of $p_{μ, k}$ are made explicit, it follows that

p_{μ, k} \cdot z_{k} \approx \begin{cases} - Im \{p_{μ, k}\} + j \cdot Re \{p_{μ, k}\}, & \text{if } z_{k} = 2 j \\ Im \{p_{μ, k}\} - j \cdot Re \{p_{μ, k}\}, & \text{if } z_{k} = - 2 j \end{cases}

(8)
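Equation (8) shows that no multiplier is needed for this step. The sketch below models it as a swap with a sign inversion; the function name and the Boolean flag encoding the sign of the ideal-symbol product are assumptions made for illustration.

```python
def swap_layer(p_re, p_im, z_is_plus_j):
    """Multiply p = p_re + j*p_im by +j or -j without any multiplier.

    z_is_plus_j is True when the ideal-symbol product equals +2j and
    False when it equals -2j (the common factor 2 is dropped).
    """
    if z_is_plus_j:
        return -p_im, p_re   # p * (+j) = -Im{p} + j*Re{p}
    return p_im, -p_re       # p * (-j) =  Im{p} - j*Re{p}


if __name__ == "__main__":
    print(swap_layer(2, 5, True), swap_layer(2, 5, False))
```

Since the ideal symbols are known in advance, the flag for each of the 255 positions is a constant, so in hardware the selection can be fixed at synthesis time rather than decided at run time.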

By defining $a_{μ, k} = p_{μ, k} \cdot z_{k}$ and using Equation (8), Equation (6) can be reformulated as shown in Equation (9).

{\hat{L}}_{3} (μ) = |Re \{\sum_{k = 1}^{L - 1} a_{μ, k}\}| + |Im \{\sum_{k = 1}^{L - 1} a_{μ, k}\}|

(9)

Defining $A (μ) = \sum_{k = 1}^{L - 1} a_{μ, k}$, it follows that ${\hat{L}}_{3} (μ) = |Re \{A (μ)\}| + |Im \{A (μ)\}|$, which is equivalent to the ℓ₁-norm of $A (μ)$. This expression can be used to finalize the architecture outline of the unit responsible for calculating metric ${\hat{L}}_{3} (μ)$, as illustrated by Figure 5. With respect to Figure 4, the additional resources include a swap layer that swaps and reverses the sign of the real and imaginary parts of the products $p_{μ, k}$ according to Equation (8), two adder trees to compute the summation over the 255 elements (one for the real part and one for the imaginary part), and a unit for the calculation of the ℓ₁-norm.
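The data path just described (input registers, complex multiplier, product shift register, swap layer, adder trees, and ℓ₁-norm unit) can be summarized in a behavioral model that consumes one symbol per call. This is a sketch under assumptions: the class and attribute names are hypothetical, the sign list `z_signs` stands in for the lookup table derived from the standard's frame marker, and pipelining and bit widths are ignored.

```python
from collections import deque


class SerialMetricCalculator:
    """Behavioral model of the serial approximated-metric unit (one symbol per call)."""

    def __init__(self, z_signs):
        # z_signs[k-1] is +1 when z_k = +j and -1 when z_k = -j, for k = 1 .. L-1.
        self.z_signs = list(z_signs)
        # Shift register holding the last L-1 products p (oldest first).
        self.products = deque(maxlen=len(self.z_signs))
        self.prev = None  # previous input symbol as an (I, Q) pair

    def push(self, i_cur, q_cur):
        """Feed one symbol; return the metric once L-1 products are available, else None."""
        if self.prev is not None:
            i_prev, q_prev = self.prev
            p_re = i_cur * i_prev + q_cur * q_prev   # Equation (7), real part
            p_im = i_prev * q_cur - i_cur * q_prev   # Equation (7), imaginary part
            self.products.append((p_re, p_im))
        self.prev = (i_cur, q_cur)
        if len(self.products) < len(self.z_signs):
            return None
        sum_re = sum_im = 0
        for (p_re, p_im), sign in zip(self.products, self.z_signs):
            # Swap layer, Equation (8): p*(+j) = -Im + j*Re, p*(-j) = Im - j*Re.
            sum_re += -sign * p_im
            sum_im += sign * p_re
        return abs(sum_re) + abs(sum_im)             # l1-norm, Equation (9)
```

A value is returned only after L symbols have been received, and the metric produced at a given call refers to the window that ends at the symbol just pushed, which matches the observation that the peak appears at the last symbol of the FM.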

Similar to Figure 4, in Figure 5, the green elements represent resources dedicated to the real components of processed vectors, while the yellow elements represent resources dedicated to the imaginary counterpart.

To complete the serial frame synchronization module, it is necessary to integrate a peak detection unit to seek the highest value of ${\hat{L}}_{3} (μ)$ within an observation window (i.e., 133,760 or 129,920 symbols), a buffering block for the input symbols to cover the latency introduced by the processing chain (i.e., the two previous blocks), and a pass-gate switch that allows the transfer of input symbols to the output of the module when the synchronization occurs (or blocks them otherwise). The pass-gate switch is driven by the peak detection unit. Figure 6 shows the corresponding high-level architecture of a serial frame synchronization module. Figure 7 instead illustrates the corresponding state diagram of the finite state machine (FSM) that regulates the transfer of the input symbols to the output and is integrated into the peak detection unit.

In relation to Figure 6, the (serial) frame synchronization module takes as input the real and imaginary components of the received symbols (I and Q, respectively) and a control flag (pilot_en) that signals the presence or absence of pilot symbols within the frame. The flag is used by the peak detection unit to determine the length of the observation window, which is either 133,760 or 129,920 symbols, respectively. The peak detection unit stores and updates the highest correlation value within the observation window and the position of the symbol that produced it, based on the values of the ${\hat{L}}_{3} (μ)$ metric calculated by the dedicated block.

Upon system boot, and more generally after every reset, the peak detection unit enters the UNINITIALIZED state (refer to Figure 7) and uses the first observation window to initialize the registers that are dedicated to the peak's statistics (value and position). Afterwards, it transitions to an idle state, which is denoted as the UNLOCK state. The peak detection unit remains in this state until synchronization occurs. Meanwhile, it blocks the transfer of input symbols to the output by driving the pass-gate switch accordingly. Synchronization occurs only when the peak of the ${\hat{L}}_{3} (μ)$ metric is detected in the same position within the observation window for a certain number of consecutive times; when this happens, the peak detection unit declares the lock of frame synchronization and moves to the LOCK state (Figure 7). This is signaled by a dedicated output flag (lock state flag in Figure 6) that is set to 1. The parameter LOCK_THR (Figure 7) indicates the number of consecutive peak matches and is configurable through a synthesis parameter corresponding to the value K in Section 3. Similarly, the UNLOCK_THR parameter (Figure 7) indicates the unlock threshold, which corresponds to the number of consecutive mismatches of the position in which the highest correlation peak is expected within the observation window. When this threshold is reached (unlock event), the peak detection unit moves back to the UNLOCK state, sets the lock state flag to 0, and drives the pass-gate switch to interrupt the symbol transfer on the output. If the unlock event occurs during the transfer of a frame, the ongoing frame is not abruptly interrupted. Instead, the transfer is completed until the end of this frame, and the interruption of symbol transfer on the output occurs at the beginning of the next frame.

As shown in Figure 6, the frame synchronization module also includes an input data buffer block that is integrated to increase the robustness of the mechanism for the transfer of input symbols. By implementing such a buffer unit with a width of one symbol (for the serial implementation) and a depth equal to the sum of the latencies introduced by the approximated metric calculator and the peak detection unit, a new frame can be immediately transferred to the output if the lock of frame synchronization occurs exactly when a new frame begins, without wasting it. This feature can be appreciated in Figure 8. When there is no offset between the beginning of the frames and the beginning of the observation windows, the buffering of the input data allows for the transmission of symbols on the output starting from frame #N + 1. Without the buffering approach, this would not be possible, since the first symbols of frame #N + 1 would already have been consumed and, because of the processing latency, the LOCK state cannot be declared exactly at the end of observation window #N. This would force the frame synchronization module to wait for the beginning of the next frame, i.e., frame #N + 2, as illustrated in Figure 9.
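The lock and unlock logic described above can be sketched as a small state machine that is updated once per observation window with the position of the highest peak. This is one possible reading of the text, not a transcription of Figure 7: the class and method names are hypothetical, the handling of the reference position on a mismatch in the UNLOCK state is an assumption, and the deferred interruption of an ongoing frame is not modeled.

```python
from enum import Enum


class State(Enum):
    UNINITIALIZED = 0
    UNLOCK = 1
    LOCK = 2


class PeakDetectionFsm:
    """Window-level model of the lock/unlock behavior (assumed transitions)."""

    def __init__(self, lock_thr, unlock_thr):
        self.lock_thr = lock_thr        # LOCK_THR: consecutive matches required
        self.unlock_thr = unlock_thr    # UNLOCK_THR: consecutive mismatches tolerated
        self.state = State.UNINITIALIZED
        self.ref_pos = None             # expected peak position within the window
        self.matches = 0
        self.mismatches = 0

    def end_of_window(self, peak_pos):
        """Update the state with the peak position found in the window just ended."""
        if self.state is State.UNINITIALIZED:
            self.ref_pos = peak_pos     # first window only initializes the statistics
            self.state = State.UNLOCK
        elif self.state is State.UNLOCK:
            if peak_pos == self.ref_pos:
                self.matches += 1
                if self.matches >= self.lock_thr:
                    self.state = State.LOCK
                    self.mismatches = 0
            else:
                self.ref_pos = peak_pos  # assumption: restart from the new position
                self.matches = 0
        else:  # State.LOCK
            if peak_pos == self.ref_pos:
                self.mismatches = 0
            else:
                self.mismatches += 1
                if self.mismatches >= self.unlock_thr:
                    self.state = State.UNLOCK
                    self.ref_pos = peak_pos
                    self.matches = 0
        return self.state
```

In this model the pass-gate switch would simply follow the state, passing symbols only while the machine is in the LOCK state.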

4.2. Parallel Frame Synchronization Module

The architecture described in Section 4.1 can be extended to implement a parallel frame synchronization module, while optimizing the consumption of logic resources, by assuming that a certain number of parallel symbols PS is provided as input to the frame synchronization module in each clock cycle ($r_{\mu+k}$, $r_{\mu+k+1}$, …, $r_{\mu+k+PS-1}$). Indeed, PS complex products $p_{\mu,k}$, $p_{\mu,k+1}$, …, $p_{\mu,k+PS-1}$ must be calculated and stored in the corresponding shift register units within a clock cycle. The depth of these shifting units must be L – 1 + PS – 1 = 255 + PS – 1 locations, instead of 255 locations, as in the serial implementation. To prevent the saturation of the subsequent computing chain, the swap layer and the adder trees for the real and imaginary components must be replicated in parallel. This requires a total of PS identical structures, each of which calculates a different $\hat{L}_{3}(\mu)$ value using a different slice of 255 complex products extracted from the 255 + PS – 1 available products. For example, if PS = 4, the shift register units must have a depth of 255 + 4 – 1 = 258 locations, which include the products $p_{\mu,1}$, $p_{\mu,2}$, $p_{\mu,3}$, …, $p_{\mu,258}$. As a result, the PS = 4 parallel computing chains formed by the swap layer and adder trees must take as inputs the following ranges of products: $p_{\mu,1}$ to $p_{\mu,255}$, $p_{\mu,2}$ to $p_{\mu,256}$, $p_{\mu,3}$ to $p_{\mu,257}$ and $p_{\mu,4}$ to $p_{\mu,258}$. Accordingly, the peak detection unit must integrate a comparator tree to compare the PS metrics $\hat{L}_{3}(\mu)$ in parallel with each other and with respect to the reference $\hat{L}_{3}(\mu)$ value for the ongoing observation window. Based on this, the corresponding architecture outline can be derived for both the approximated metric calculator and the frame synchronization module in the case of parallel implementation. Figure 10 and Figure 11, respectively, illustrate these schemes. It should be noted that, unlike the serial implementation case shown in Figure 6, the parallel version of the frame synchronization module requires the integration of an additional module dedicated to aligning the output symbols with respect to a segment of PS parallel symbols, as shown in Figure 11. This is because, at the output, PS parallel symbols are transmitted per clock cycle to sustain the input data rate and to avoid saturating the internal processing chains. If this feature were not implemented and the beginning of the frames did not match the start of a PS segment, each segment at the beginning (or end) of a frame transferred to the output would contain symbols from both the current and previous (or next) frames. Therefore, it is necessary to integrate a mechanism to signal which symbols belong to which frame.

4.3. Considerations for Hardware Implementations and Testing

Since our goal was to implement the parallel frame synchronization module in hardware using VHDL, we focused on calibrating the critical delays caused by long combinational paths to ensure that the module could support high operating frequencies. Specifically, we concentrated on the adder tree units, which were designed using the typical layered architecture of an adder tree. Each layer instantiates half as many adders as it has inputs (e.g., $N/2$ adders for $N$ inputs) and sums the inputs in pairs, generating $N/2$ sums, for a total of $\log_{2}(N)$ layers. The last layer produces a single output sum. The number of inputs to be added depends on the symbol length of the FM field, which is 256, according to the specifications of CCSDS 131.2-B-1 and CCSDS 131.21-O-1 (Section 2). Therefore, a total of 255 inputs must be added, resulting in an adder tree structure with 8 layers. Since the output vectors of each layer serve as inputs to the next layer, a long combinational path can be formed from the input of the adder trees in Figure 10 to their output. We made it possible to instantiate (or not) a pipeline stage for each layer using a corresponding synthesis parameter integrated within the VHDL code. Figure 12 depicts the layered adder tree structure and the insertion points (red lines) for the optional pipeline stages that break the long combinational path. Additionally, one of the inputs of the first layer in Figure 12 is tied to zero because only 255 addends must be summed, as explained in Section 4. However, for simplicity, we implemented a symmetric structure with 256 inputs for the first layer, leaving the synthesis tool to optimize the corresponding logic architecture.
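The layered structure can be mimicked in software with a minimal Python sketch. The function `adder_tree` and its `pipeline` argument are illustrative assumptions: the list of booleans plays the role of the per-layer synthesis parameter, and the returned latency is simply the number of enabled register stages, which is a simplification of real timing behavior.

```python
def adder_tree(values, pipeline=None):
    """Sum `values` with a layered pairwise adder tree.

    Returns (total, number_of_layers, added_latency_in_cycles).
    """
    n = len(values)
    if n == 0 or n & (n - 1) != 0:
        raise ValueError("a power-of-two number of inputs is expected")
    layer = list(values)
    n_layers = 0
    while len(layer) > 1:
        # One layer: len(layer) / 2 adders, each summing a pair of inputs.
        layer = [layer[i] + layer[i + 1] for i in range(0, len(layer), 2)]
        n_layers += 1
    if pipeline is None:
        pipeline = [False] * n_layers
    if len(pipeline) != n_layers:
        raise ValueError("one pipeline flag per layer is expected")
    # Each enabled pipeline register adds one clock cycle of latency.
    return layer[0], n_layers, sum(pipeline)


# 255 addends plus one input tied to zero give a symmetric 256-input tree.
addends = list(range(1, 256)) + [0]
total, layers, latency = adder_tree(addends, pipeline=[True, False] * 4)
print(total, layers, latency)
```

Enabling a register after every other layer, as in the usage line, halves the number of adders on the longest combinational path at the cost of four extra cycles of latency in this model.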

The combinational path within the adder trees is influenced by the bit width of the first layer inputs due to the carry chains. As a result, it also relies on the bit width of the symbols received across the computation chain in Figure 10. To address this, we included a synthesis parameter in the VHDL code to adjust the bit width of the input symbols. In conclusion, the VHDL code for the parallel frame synchronization module allows for the tuning of the following features:

The bit width of the input symbols;
The number of parallel input symbols (i.e., PS);
The insertion of a pipeline stage for each layer of the adder tree architecture;
The threshold for the lock event (parameter LOCK_THR) and the unlock event (parameter UNLOCK_THR).

As shown in [11], we tested our module with RTL simulations for various combinations of configuration parameters and successfully verified its operation under different conditions. To do this, we used a bit-true MATLAB model of a CCSDS receiver developed by IngeniArs S.r.l., which complies with the 131.2-B-1 standard. We used this model to extract the corresponding test vectors for the frame synchronization module. Additionally, the MATLAB model allowed us to determine the optimal bit width for data vectors at various stages of the processing chain within the frame synchronization module. This referred to the minimum bit width that did not degrade the accuracy of the correlation metric during the frame synchronization process. We discovered that using 12 bits was sufficient to represent the input symbols. Additionally, using only the most significant half of the complex products $p_{\mu,k}$, $p_{\mu,k+1}$, …, $p_{\mu,k+PS-1}$ does not compromise the accuracy of the calculation of $\hat{L}_{3}(\mu)$. However, reducing the output sums across the adder tree layers by even one bit significantly reduces the accuracy. Therefore, we developed the VHDL code for the frame synchronization module to maintain accuracy while reducing the logical complexity and combinational delays caused by larger vectors. The sequence of complex products $p_{\mu,k}$, $p_{\mu,k+1}$, …, $p_{\mu,k+PS-1}$ is the input to the adder tree units, and, by halving their bit width, we significantly reduced the combinational path inside these units.

We conducted several tests exploiting the bit-true MATLAB model from IngeniArs to introduce and modulate noise on the received samples. The noise amplitude was varied to verify the proposed module’s ability to successfully lock frame synchronization, even when setting the lock threshold $K = 2$ (Equation (5)), as explained in Section 2. Furthermore, since the IngeniArs application required the recovery of symbols with a phase shift of $0^{\circ}$, $90^{\circ}$, $180^{\circ}$, or $270^{\circ}$, we utilized the MATLAB model to introduce these values for the symbol phase in the corresponding test vectors. The simulations also confirmed that our module was not affected by the phase offset, as stated in Section 2. In Table 1, we summarize the range of values used in the combinations of configuration parameters for testing.

5. Results

Exploiting the configurable VHDL code of the proposed frame synchronization module, we implemented it on a KU115 FPGA from Xilinx/AMD (specifically the device xcku115-flva2104-2) using the Vivado tool version 2020.2. We performed several experiments by varying the values of the synthesis parameters presented in Section 4.3. In all cases, the goal was to support at least a 1 Gbaud rate. The corresponding operating frequency can be calculated by dividing the baud rate by the parallelization factor PS. For example, in the case of PS = 4, the corresponding frequency required is $1\ \text{Gbaud} / 4\ \text{symbols} = 10^{9}\ \frac{\text{symbols}}{\text{s}} \cdot \frac{1}{4} \cdot \frac{1}{\text{symbols}} = 250\ \text{MHz}$. This also shows that the serial version of the frame synchronization module (PS = 1) would require a frequency of 1 GHz. The results of the implementation experiments are reported in Table 2, and each one is labeled with a specific identifier declared in the first column (Run column). In terms of logic resource consumption, we report the utilization of embedded digital signal processors (DSPs) and the total utilization of programmable logic elements (i.e., the configurable logic blocks, CLBs) in the corresponding columns. We also detail the cost of the CLBs in terms of combinational logic elements (CLB LUTs) and registers (CLB Registers). In particular, each CLB of the KU115 FPGA contains 8 CLB LUTs and 16 CLB Registers.
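The relation between baud rate, parallelization factor and clock frequency is simple enough to check with a few lines of Python. The helper name `required_frequency_hz` is an assumption introduced here for illustration; the sketch only restates the division described above for the 1 Gbaud target.

```python
def required_frequency_hz(baud_rate, ps):
    """Clock frequency needed to sustain `baud_rate` with PS symbols per cycle."""
    if ps < 1:
        raise ValueError("PS must be a positive integer")
    return baud_rate / ps


TARGET_BAUD = 1e9  # 1 Gbaud target used in the experiments

for ps in (1, 4, 8, 16):
    f_mhz = required_frequency_hz(TARGET_BAUD, ps) / 1e6
    print(f"PS = {ps:2d}: {f_mhz} MHz")
```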

With reference to Table 2, we tested different parallelism values (PS column), in particular the values 4, 8 and 16. In all cases, the implementation on the KU115 FPGA was successful, reporting a maximum supported frequency higher than the corresponding required frequency (Required frequency column) that we set through the timing constraints. In addition, we tested different solutions of the pipeline stages inside the adder trees for the cases PS = 4 and PS = 8. To give a clearer insight into this aspect, we report in Table 2 all eight layers of the adder tree structure (Figure 12) in the sub-columns of the column Adder tree pipeline stage. For each of them, we indicate whether the corresponding pipeline stage is instantiated or not by using a full black circle (•) or an empty circle (∘), respectively. On the one hand, this aspect was found to be crucial to achieving the frequency required by the implementation; on the other hand, the possibility of moving the pipeline stages across the layers of the adder trees allowed us to reduce the consumption of logic resources. For example, we studied three different solutions for PS = 4 that were able to achieve the target baud rate of 1 Gbaud (i.e., a frequency of 250 MHz), which were runs #1, #2 and #3. For these cases, Table 2 shows that the most favorable configuration of the adder tree pipeline stages in terms of resource utilization is the one used in run #3, with a total CLB consumption of 6.98%. Runs #1 and #2, on the other hand, consume 7.07% and 6.99% of the CLBs, respectively. The usage of other logic resources such as DSPs was the same. From another point of view, Table 2 shows that the total utilization of the programmable logic elements of the FPGA is, on average, about 7%, 13.5% and 26.8% for PS = 4, 8 and 16, respectively. Therefore, in order to better evaluate the proposed solutions, the rightmost column (Efficiency) shows the efficiency of the different experiments. It is expressed in terms of kbaud/CLB and is calculated by dividing the target baud rate of 1 Gbaud by the utilization of the CLB resources (CLBs column). Such data show that the higher the level of parallelism, the lower the efficiency when targeting a 1 Gbaud symbol rate. At the same time, the higher the level of parallelism, the lower the frequency required for the implementation when targeting 1 Gbaud. Therefore, even though the utilization of logic resources increases with the level of parallelism, if the maximum supported frequency does not reach the frequency required by the target application, it is possible to reduce the frequency requirement by increasing the parallelization factor PS. For the case PS = 16, we did not insert any pipeline stage within the adder trees because the required frequency (i.e., 62.5 MHz) was low enough that they were not necessary, saving costs in terms of CLBs.

The utilization of the DSPs in Table 2 highlights another aspect that confirms what we have presented in this work, specifically in Section 4. In fact, the presented architecture requires one complex multiplication to process one input symbol (in one clock cycle). According to Equation (7), this can be translated as four different multiplications between scalars (the real and imaginary components of the input symbol and the previous one), which, in hardware, corresponds to four different (and parallel) multipliers, each one dedicated to a scalar multiplication. Therefore, the total number of (parallel) multipliers can be predetermined and calculated as $4 \cdot PS$, i.e., 16, 32 or 64 for PS = 4, 8 or 16, respectively. The values in the DSPs column confirm this by emphasizing that one DSP is used for each (scalar) multiplication.
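As a small illustration of the multiplier count, the following Python sketch decomposes a generic complex product into four scalar multiplications and computes the resulting number of parallel multipliers. The function names are hypothetical, and the sketch deliberately does not reproduce Equation (7): whether one operand is conjugated there is not modeled, since a conjugation changes only signs and not the number of multiplications.

```python
def complex_mult_4(a_re, a_im, b_re, b_im):
    """(a_re + j*a_im) * (b_re + j*b_im) using four scalar multiplications."""
    m0 = a_re * b_re
    m1 = a_im * b_im
    m2 = a_re * b_im
    m3 = a_im * b_re
    return m0 - m1, m2 + m3  # (real part, imaginary part)


def multiplier_count(ps):
    """Parallel scalar multipliers for PS complex products per clock cycle."""
    return 4 * ps


print(complex_mult_4(3, 2, 1, -4))
print([multiplier_count(ps) for ps in (4, 8, 16)])
```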

However, the operating frequencies shown in Table 2 are not the maximum supported ones, but only those required for the 1 Gbaud use case. Therefore, we also investigated this aspect by determining the maximum frequency that the different runs could support. We report the results of this analysis in Table 3 for the same runs #3, #5 and #6 of Table 2. This investigation showed that the proposed solutions for the frame synchronization module can support baud rates higher than 1 Gbaud. The maximum supported baud rate can be calculated by multiplying the parallelization factor PS by the corresponding maximum frequency. For example, in the case of run #6, the maximum supported baud rate is 16 symbols · 85 MHz = 1.36 Gbaud.

The results in Table 3 do not show any significant changes with respect to the considerations about the efficiency of the solutions reported in Table 2.

As a final experiment, we explored the possibility of further increasing the maximum frequency (and the corresponding baud rate) by merging the results of runs #3 and #5. In particular, we applied to run #5 (PS = 8) the same adder tree pipeline strategy used in run #3 (PS = 4). The surprising result was that the corresponding implementation was able to offer a baud rate of 2.12 Gbaud without significant changes in resource consumption, as shown in Table 4. In fact, this new solution, labeled run #7, requires 13.45% of the KU115 CLBs (i.e., 11,150), while run #5 requires 13.36% (i.e., 11,080 CLBs). The main difference lies in the number of CLB Registers, which is more than double for run #7 compared to run #5. However, it is worth remembering that each CLB embeds 16 CLB Registers, so the effect of the additional CLB Register usage on the total number of CLBs is scaled down by a factor of 16. Furthermore, a CLB is counted as used (occupied) regardless of how many of its internal resources are in use: whether only one of its 8 CLB LUTs, only one of its 16 CLB Registers, or all 8 CLB LUTs and all 16 CLB Registers are in use, the CLB that contains them is always counted as used. Thus, the number of CLBs used by run #7 seems reasonable despite the large increase in CLB Register usage. In fact, in run #5, most of the CLBs hosting the combinational layers of the adder trees had only their CLB LUT resources occupied; run #7 also uses the local CLB Registers of these CLBs (i.e., the CLB Registers left unused in the same CLB) to implement the additional pipeline stages that must be placed immediately after the combinational layers of the adder trees. 
In other words, in run #5, many CLBs are partially used (only in the corresponding CLB LUT resource, but still used overall), while, in run #7, the corresponding CLB Register resources are also used, thus fully utilizing these CLBs and significantly increasing the overall efficiency of the implementation. Indeed, as can be seen in the corresponding column of Table 4, the efficiency in kbaud/CLB of run #7 is not only much higher than that of run #5, but it is also the highest of all the solutions presented, even of runs #1, #2 and #3 in Table 2. Similarly, run #7 also has the highest baud rate. Furthermore, the results in Table 4 confirm that the critical path is in the long combinational path of the adder tree units, as assumed. In fact, they show that it is possible to increase the maximum frequency by integrating additional pipeline stages only in the adder tree units for the same parallelization factor.
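The efficiency figure for run #7 can be reproduced from the numbers quoted in the text with a short Python sketch. The helper names are assumptions made for illustration, and the 265 MHz clock used below is the value implied by 2.12 Gbaud at PS = 8, not a figure quoted from Table 4.

```python
def max_baud_rate(ps, f_max_hz):
    """Maximum baud rate: PS symbols per cycle times the maximum clock."""
    return ps * f_max_hz


def efficiency_kbaud_per_clb(baud_rate, used_clbs):
    """Efficiency metric in kbaud per occupied CLB."""
    return baud_rate / used_clbs / 1e3


# Run #7: PS = 8, 2.12 Gbaud, 11,150 CLBs (values quoted in the text).
baud_run7 = max_baud_rate(ps=8, f_max_hz=265e6)  # implied clock, see lead-in
print(baud_run7 / 1e9, "Gbaud")
print(round(efficiency_kbaud_per_clb(baud_run7, 11_150), 1), "kbaud/CLB")
```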

The throughput corresponding to the baud rate can be calculated based on the information block size of a MODCOD and the following formula:

$$R_{b} = \underbrace{\left(\frac{R_{S}}{S_{F}}\right)}_{\text{frame rate}} \cdot 16 \cdot B \qquad (10)$$

In Equation (10), B is the number of transmitted/received information bits, i.e., the information block size parameter defined in [1,4] for the different MODCODs. In other words, it represents the data payload size in bits for each transmitted/received codeword. $R_{S}$ and $S_{F}$ are, respectively, the symbol rate (i.e., the baud rate) and the PL frame length in symbols (i.e., 133,760 or 129,920). The division of $R_{S}$ by $S_{F}$ corresponds to the frame rate. Since each PL frame contains 16 codewords, the bit rate $R_{b}$ can be calculated by multiplying B by 16 and by the frame rate. In the best case, i.e., for the minimum PL frame length of 129,920 symbols (no pilot symbols within the frame), the target baud rate of 1 Gbaud allows a throughput of 5.38 Gbps for MODCOD 27 defined in the CCSDS 131.2-B-1. In fact, this MODCOD is the one with the largest information block size among the MODCODs of CCSDS 131.2-B-1, which is $B = 43{,}678$ bits. According to Equation (10), the corresponding calculation is $(10^{9} / 129{,}920) \cdot 43{,}678 \cdot 16 \approx 5.38 \cdot 10^{9}$ bit/s. Similarly, MODCOD 37, defined in CCSDS 131.21-O-1, is characterized by an information block size $B = 60{,}510$ bits; hence, the corresponding throughput for 1 Gbaud is 7.45 Gbps. Referring to the results of run #7 in Table 4, it provides a baud rate of 2.12 Gbaud, so the corresponding throughput for MODCOD 27 and MODCOD 37 is 11.40 Gbps and 15.80 Gbps, respectively.
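The throughput figures above follow directly from Equation (10) and can be reproduced with a minimal Python sketch. The function and constant names are illustrative assumptions; the numerical inputs (frame length without pilots, information block sizes, baud rates) are those quoted in the text.

```python
CODEWORDS_PER_FRAME = 16
FRAME_LEN_NO_PILOTS = 129_920  # symbols per PL frame without pilot symbols


def throughput_bps(symbol_rate, frame_len, info_block_bits):
    """Equation (10): frame rate times 16 codewords times B bits each."""
    frame_rate = symbol_rate / frame_len
    return frame_rate * CODEWORDS_PER_FRAME * info_block_bits


cases = {
    "MODCOD 27 (B = 43,678)": 43_678,
    "MODCOD 37 (B = 60,510)": 60_510,
}
for label, block_bits in cases.items():
    for baud in (1e9, 2.12e9):
        gbps = throughput_bps(baud, FRAME_LEN_NO_PILOTS, block_bits) / 1e9
        print(f"{label} at {baud / 1e9:.2f} Gbaud: {gbps:.2f} Gbps")
```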

Comparison with Other Solutions

As mentioned in Section 1, to the best of our knowledge, this work is the first to propose a parallel architecture for high-data-rate frame synchronization modules according to the specifications of the CCSDS 131.2-B-1 and 131.21-O-1 standards, so a comparison with other existing solutions is not possible. In addition, the literature does not even provide detailed works on serial architectures for the same standards, making it difficult to find results that can be used for a comparison with our proposal. The most complete work that we can find concerns the implementation of a frame synchronization module for the Digital Video Broadcasting Satellite Second Generation (DVB-S2) standard. Indeed, the authors of [30] propose the implementation of a parallel frame synchronization module according to the specifications of the DVB-S2 standard, with a parallelism level equal to 4 and based on the search of correlation peaks between the received symbols and the expected frame header. Similarities can be found between our architecture (Figure 10 and Figure 11) and the one presented in [30], such as the complex multiplication units, swapping layers, and pipelined adder trees. In particular, the authors of [30] adopted three pipeline stages inside the adder trees and their module performs a double peak search: one correlation peak is computed with respect to the beginning of the frame, similar to our solutions, while a second correlation peak is searched in the same observation window with respect to a secondary header field. Despite the double correlation peak search, which could justify a higher resource cost for improved accuracy, the results show that the solution proposed in [30] requires the analysis of at least four frames to successfully complete the frame synchronization process with 99% accuracy. 
Instead, our solution is able to successfully complete the frame synchronization process using two frames, as shown analytically in Section 2 and in Equation (5) and confirmed by the simulations. A final significant difference lies in the depth of the buffer for the complex products since the frame header length defined by the DVB-S2 standard corresponds to 90 symbols instead of the 256 symbols of the FM field defined by the CCSDS 131.2-B-1 and 131.21-O-1 standards. Although the authors of [30] do not report the maximum frequency supported by their solution, they give the supported baud rate of 25 Mbaud only for the constellations QPSK and 8-PSK. Instead, our solution also supports the 16-APSK, 32-APSK and 64-APSK constellations. Table 5 summarizes the comparison with the work presented in [30].

6. Conclusions

In this work, we presented a design methodology for the implementation of frame synchronization modules based on a parallel approach to meet the high data rate requirements in satellite receivers capable of supporting a minimum baud rate of 1 Gbaud. The illustrated guidelines show the main issues to focus on when designing a frame synchronizer based on a parallel approach and how to configure the architecture of such a module to find the best trade-off in terms of accuracy, timing and resource utilization. The proposed approach is compliant with both the CCSDS 131.2-B-1 and CCSDS 131.21-O-1 standards, and our implementation experiments also showed that it is possible to maximize the supported baud rate with very little additional cost in terms of resource (CLB) utilization. As shown by the results in Table 4, the maximum baud rate supported by our solutions was 2.12 Gbaud on the KU115 FPGA, which corresponds to a maximum throughput of 11.40 Gbps and 15.80 Gbps for CCSDS 131.2-B-1 and CCSDS 131.21-O-1, respectively, and it requires less than 14% resource utilization. Therefore, our solution is capable of supporting the throughput requirements of next-generation payload data telemetry systems that comply with these standards. To the best of our knowledge, this work represents the first example in the literature that addresses the topic of the FPGA-based implementation and characterization of a parallel frame synchronization module compliant with the CCSDS 131.2-B-1 and CCSDS 131.21-O-1 standards. In particular, we have proposed a number of solutions to satisfy the baud rate constraint and to overcome the high operating frequency requirement (1 GHz) of serial implementations, which may make them infeasible on FPGA devices. In fact, for the 1 Gbaud target, all the proposed solutions require much lower operating frequencies (maximum 250 MHz and down to 62.5 MHz), with resource consumption between 6.98% and 26.77% on the KU115 FPGA.

Future work will include the evaluation of optimization techniques to further reduce the logic complexity and resource consumption, the development of the other modules of the baseband receiver for these applications and the integration of security modules [8] compliant with the CCSDS 352.0-B-2 and 355.0-B-2 standards. In terms of optimization, for example, the work presented in [31] could be used to implement complex multipliers using three DSPs instead of four.

Author Contributions

Conceptualization, L.C. and E.P.; methodology, L.C.; validation, L.C.; investigation, L.C., E.P. and M.B.; data curation, L.C. and E.P.; writing—original draft preparation, L.C. and E.P.; writing—review and editing, L.C. and E.P.; visualization, L.C. and E.P.; supervision, L.F.; project administration, L.C.; funding acquisition, L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by IngeniArs S.r.l. and by the Italian Ministry of University and Research (MUR) through project CN4-CN00000023 of the program Recovery and Resilience Plan (PNRR), under grant agreement no. I53C22000720001, and through the project FoReLab of the program “Departments of Excellence”.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Authors Emanuele Pagani and Matteo Bertolucci were employed by the company IngeniArs S.r.l. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ADC	Analog-to-Digital Converter
AES	Advanced Encryption Standard
APSK	Amplitude and Phase-Shift Keying
CCSDS	Consultative Committee for Space Data Systems
CMAC	Cipher-Based Message Authentication Code
CLB	Configurable Logic Block
CW	Codeword
CWS	Codeword Segment
DVB-S2	Digital Video Broadcasting Satellite Second Generation
DSP	Digital Signal Processor
ESA	European Space Agency
EESS	Earth Exploration Satellite Service
FD	Frame Descriptor
FEC	Forward Error Correction
FM	Frame Marker
FPGA	Field Programmable Gate Array
GCM	Galois Counter Mode
GEO	Geostationary Earth Orbit
GMAC	Galois Message Authentication Code
HMAC	Hash-Based Message Authentication Code
IF	Intermediate Frequency
LEO	Low Earth Orbit
LUT	Lookup Table
MATLAB	Matrix Laboratory
MODCOD	Modulation and Coding
PL	Physical Layer
PSK	Phase-Shift Keying
QPSK	Quadrature Phase-Shift Keying
RF	Radio Frequency
SCA	Side-Channel Attack
SCCC	Serially Concatenated Convolutional Code
SCCC-X	Serially Concatenated Convolutional Code eXtension
SDLS	Space Data Link Security
SHA2	Secure Hash Algorithm 2
VHDL	VHSIC Hardware Description Language
VHSIC	Very-High-Speed Integrated Circuit

References

1. CCSDS. Flexible Advanced Coding and Modulation Scheme for High Rate Telemetry Applications; Recommended Standard CCSDS 131.2-B-1 (Blue Book); CCSDS Secretariat, National Aeronautics and Space Administration: Washington, DC, USA, 2012.
2. Lamoral Coines, A.; Jiménez, V.P.G. CCSDS 131.2-B-1 transmitter design on FPGA with adaptive coding and modulation schemes for satellite communications. Electronics 2021, 10, 2476.
3. Ugolini, A.; Montorsi, G.; Colavolpe, G. Next Generation High-Rate Telemetry. IEEE J. Sel. Areas Commun. 2018, 36, 327–337.
4. CCSDS. Serially Concatenated Convolutional Codes-eXtension (SCCC-X); Experimental Specification CCSDS 131.21-O-1 (Orange Book); CCSDS Secretariat, National Aeronautics and Space Administration: Washington, DC, USA, 2021.
5. Baldi, M.; Bertinelli, M.; Chiaraluce, F.; Closas, P.; Dhakal, P.; Garello, R.; Maturo, N.; Navarro, M.; Palomo, J.M.; Paolini, E.; et al. State-of-the-art space mission telecommand receivers. IEEE Aerosp. Electron. Syst. Mag. 2017, 32, 4–15.
6. CCSDS. CCSDS Cryptographic Algorithms; Recommended Standard CCSDS 352.0-B-2 (Blue Book); CCSDS Secretariat, National Aeronautics and Space Administration: Washington, DC, USA, 2019.
7. CCSDS. Space Data Link Security Protocol; Recommended Standard CCSDS 355.0-B-2 (Blue Book); CCSDS Secretariat, National Aeronautics and Space Administration: Washington, DC, USA, 2022.
8. Crocetti, L.; Falaschi, F.; Saponara, S.; Fanucci, L. Highly-efficient Galois Counter Mode Symmetric Encryption Core for the Space Data Link Security Protocol. In Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2023; Lecture Notes in Electrical Engineering (LNEE); Springer: Cham, Switzerland, 2024; Volume 1110, pp. 297–303.
9. Baldanzi, L.; Crocetti, L.; Falaschi, F.; Belli, J.; Fanucci, L.; Saponara, S. Digital Random Number Generator Hardware Accelerator IP-Core for Security Applications. In Applications in Electronics Pervading Industry, Environment and Society. ApplePies 2019; Lecture Notes in Electrical Engineering (LNEE); Springer: Cham, Switzerland, 2020; Volume 627, pp. 117–123.
10. Crocetti, L.; Baldanzi, L.; Bertolucci, M.; Sarti, L.; Carnevale, B.; Fanucci, L. A simulated approach to evaluate side-channel attack countermeasures for the Advanced Encryption Standard. Integration 2019, 68, 80–86.
11. Crocetti, L.; Pagani, E.; Bertolucci, M.; Fanucci, L. Implementation Strategies for Highly-accurate and Efficient Frame Synchronization Modules in Satellite Communication Receivers. In Proceedings of the 2nd IEEE Industrial Electronics Society Annual Online Conference (IES ONCON 2023), Virtual Online, 8–10 December 2023.
12. Baldi, M.; Bertinelli, M.; Chiaraluce, F.; Closas, P.; Garello, R.; Maturo, N.; Navarro, M.; Palomo, J.M.; Paolini, E.; Pfletschinger, S.; et al. NEXCODE: Next generation uplink coding techniques. In Proceedings of the 2016 International Workshop on Tracking, Telemetry and Command Systems for Space Applications (TTC), Noordwijk, The Netherlands, 13–16 September 2016; pp. 1–8.
13. Wang, P.; Li, H.; Chen, B.; Zhang, S. Enhancing Earth Observation Throughput Using Inter-Satellite Communication. IEEE Trans. Wirel. Commun. 2022, 21, 7990–8006.
14. Toptsidis, N.; Arapoglou, P.D.; Bertinelli, M. Link adaptation for Ka band low Earth orbit Earth Observation systems: A realistic performance assessment. Int. J. Satell. Commun. Netw. 2012, 30, 131–146.
15. CCSDS. SCCC—Summary of Definition and Performance; CCSDS 130.11-G-1, Informational Report; CCSDS Secretariat, National Aeronautics and Space Administration: Washington, DC, USA, 2019; Issue 1.
16. Scholtz, R. Frame Synchronization Techniques. IEEE Trans. Commun. 1980, 28, 1204–1213.
17. Massey, J. Optimum Frame Synchronization. IEEE Trans. Commun. 1972, 20, 115–119.
18. Nielsen, P. Some Optimum and Suboptimum Frame Synchronizers for Binary Data in Gaussian Noise. IEEE Trans. Commun. 1973, 21, 770–772.
19. Chiani, M. Noncoherent Frame Synchronization. IEEE Trans. Commun. 2010, 58, 1536–1545.
20. Choi, Z.Y.; Lee, Y. Frame synchronization in the presence of frequency offset. IEEE Trans. Commun. 2002, 50, 1062–1065.
21. Choi, Z.Y.; Lee, Y.H. On the use of double correlation for frame synchronization in the presence of frequency offset. In Proceedings of the 1999 IEEE International Conference on Communications (Cat. No. 99CH36311), Vancouver, BC, Canada, 6–10 June 1999; Volume 2, pp. 958–962.
22. Viterbi, A. CDMA: Principles of Spread Spectrum Communication; Addison-Wesley Wireless Communications Series; Addison-Wesley Publishing Company: Boston, MA, USA, 1995.
23. Pedone, R.; Villanti, M.; Vanelli-Coralli, A.; Corazza, G.E.; Mathiopoulos, P.T. Frame Synchronization in Frequency Uncertainty. IEEE Trans. Commun. 2010, 58, 1235–1246.
24. Mazzali, N.; Stante, G.; Bhavani, S.M.R.R.; Ottersten, B. Performance analysis of noncoherent frame synchronization in satellite communications with frequency uncertainty. In Proceedings of the 2015 IEEE Symposium on Communications and Vehicular Technology in the Benelux (SCVT), Luxembourg, 24 November 2015; pp. 1–6.
25. Kim, P.; Corazza, G.E.; Pedone, R.; Villanti, M.; Chang, D.I.; Oh, D.G. Enhanced Frame Synchronization for DVB-S2 System Under a Large of Frequency Offset. In Proceedings of the 2007 IEEE Wireless Communications and Networking Conference, Hong Kong, China, 11–15 March 2007; pp. 1183–1187.
26. Liess, M.; Lázaro, F.; Munari, A. Frame Synchronization Algorithms for Satellite Internet of Things Scenarios. In Proceedings of the 2022 11th Advanced Satellite Multimedia Systems Conference and the 17th Signal Processing for Space Communications Workshop (ASMS/SPSC), Graz, Austria, 6–8 September 2022; pp. 1–8.
27. Smyk, R.; Czyżak, M. Improved magnitude estimation of complex numbers using alpha max and beta min algorithm. Zesz. Nauk. Wydz. Elektrotechniki Autom. Politech. Gdan. 2016, 51, 167–171.
28. Volder, J.E. The CORDIC Trigonometric Computing Technique. IRE Trans. Electron. Comput. 1959, EC-8, 330–334.
29. Parzen, E. On Estimation of a Probability Density Function and Mode. Ann. Math. Stat. 1962, 33, 1065–1076.
30. Li, Q.; Zeng, X.; Wu, C.; Zhang, Y.; Deng, Y.; Jun, H. Optimal frame synchronization for DVB-S2. In Proceedings of the 2008 IEEE International Symposium on Circuits and Systems (ISCAS), Seattle, WA, USA, 18–21 May 2008; pp. 956–959.
31. Paz, P.; Garrido, M. Efficient Implementation of Complex Multipliers on FPGAs Using DSP Slices. J. Signal Process. Syst. 2023, 95, 543–550.

Figure 1. High-level outline of the baseband receiver.

Figure 2. Structure of the PL frame defined by CCSDS 131.2-B-1 [1] and CCSDS 131.21-O-1 [4].

Figure 3. Histogram of values for the metric $L_{3}(\mu)$ [11]. The orange distribution indicates the metric values and the frequency density when using symbols of the FM field; the blue distribution gives the same information when using all other symbols of the frame.

Figure 4. Received symbol multiplication block of the $\hat{L}_{3}(\mu)$ metric calculation unit for the serial implementation of the frame synchronization module [11].

Figure 5. Block diagram of the $\hat{L}_{3}(\mu)$ metric calculation unit for the serial implementation of the frame synchronization module [11].

Figure 6. High-level block diagram for the serial implementation of the frame synchronization module [11].

Figure 7. FSM of the frame synchronization module [11].

Figure 8. Behavior of the frame synchronization module in the case of a lock event when no offset is present between the beginning of the frame and the beginning of the observation window.

Figure 9. Behavior of the frame synchronization module in the case of a lock event when an offset is present between the beginning of the frame and the beginning of the observation window.

Figure 10. Block diagram of the $\hat{L}_{3}(\mu)$ metric calculation unit for the parallel implementation of the frame synchronization module [11].

Figure 11. High-level block diagram for the proposed parallel implementation approach of the frame synchronization module [11].

Figure 12. Layered structure of adder tree units. Red lines indicate the pipeline stages that can be instantiated, or not, by means of dedicated synthesis parameters in order to reduce the critical path.
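
As a rough behavioral illustration of the layered adder tree with optional pipeline stages, the Python sketch below sums the operands layer by layer and reports how many register stages are crossed and the longest run of adder layers between two registers, which serves here only as a stand-in for the critical path. The function name `adder_tree`, the 256-operand input (chosen so that eight layers result), and the placement of each register directly after its layer are assumptions made for illustration; the example pattern mirrors the circles of run #3 in Table 2.

```python
from typing import List, Sequence, Tuple


def adder_tree(values: Sequence[int], pipeline: Sequence[bool]) -> Tuple[int, int, int]:
    """Behavioral model of a layered binary adder tree (illustrative only).

    values   : operands; their count must be a power of two (assumption).
    pipeline : pipeline[i] is True when a register follows layer i + 1.
    Returns (sum, latency in clock cycles, longest run of adder layers
    between two registers).
    """
    n = len(values)
    if n == 0 or n & (n - 1):
        raise ValueError("the number of operands must be a power of two")
    n_layers = n.bit_length() - 1
    if len(pipeline) != n_layers:
        raise ValueError("one pipeline flag per layer is required")

    level: List[int] = list(values)
    latency = 0
    run = 0          # adder layers crossed since the last register
    longest_run = 0
    for has_register in pipeline:
        # One layer: add neighbouring pairs, halving the operand count.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        run += 1
        longest_run = max(longest_run, run)
        if has_register:
            latency += 1
            run = 0
    return level[0], latency, longest_run


# Pipeline pattern of run #3 in Table 2: registers after layers 2, 5, and 7.
RUN_3 = [False, True, False, False, True, False, True, False]
total, latency, longest = adder_tree(list(range(256)), RUN_3)
print(total, latency, longest)
```

In this model, enabling more registers shortens the longest combinational run at the cost of extra latency and registers, which is the trade-off the synthesis parameters expose.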

Table 1. Value ranges used for combinations of configuration parameters for testing.

Parameter	Values (Range)
Input symbol bit width	8–24
Parallelization factor (PS)	1, 2, 4, 6, 8, 10, 12, 16, 20
Lock threshold (LOCK_THR)	2–5
Unlock threshold (UNLOCK_THR)	2–5
Pipeline stages in adder trees	All possibilities (refer to Figure 12)
MODCODs (CCSDS 131.2-B-1)	1–27
Signal-to-noise ratio ($E_{S}/N_{0}$)	−2 dB to 30 dB
Symbol phase offset	$0^{\circ}$, $90^{\circ}$, $180^{\circ}$, $270^{\circ}$

Table 2. Implementation results of the proposed solutions on the KU115 FPGA for a target baud rate of 1 Gbaud. In the sub-columns of the Adder Tree Pipeline Stage column, filled and empty circles indicate that the pipeline stage of the corresponding adder tree layer is instantiated or not instantiated, respectively.

Run	PS	Adder Tree Pipeline Stage								Required Frequency [MHz]	CLBs		CLB LUTs		CLB Registers		DSPs		Efficiency [kbaud/CLB]
Run	PS	1	2	3	4	5	6	7	8	Required Frequency [MHz]	CLBs		CLB LUTs		CLB Registers		DSPs		Efficiency [kbaud/CLB]
#1	4	∘	∘	•	∘	∘	•	∘	•	250	5859	(7.07%)	32,979	(4.97%)	11,834	(0.89%)	16	(0.29%)	170.68
#2	4	∘	•	∘	∘	•	∘	∘	•	250	5793	(6.99%)	33,372	(5.03%)	15,675	(1.18%)	16	(0.29%)	172.62
#3	4	∘	•	∘	∘	•	∘	•	∘	250	5791	(6.98%)	33,391	(5.03%)	15,819	(1.19%)	16	(0.29%)	172.68
#4	8	∘	∘	∘	•	∘	∘	∘	∘	125	11,194	(13.50%)	62,221	(9.38%)	12,195	(0.92%)	32	(0.58%)	89.33
#5	8	∘	∘	∘	∘	•	∘	∘	∘	125	11,221	(13.53%)	62,102	(9.36%)	10,275	(0.77%)	32	(0.58%)	89.12
#6	16	∘	∘	∘	∘	∘	∘	∘	∘	62.5	22,212	(26.79%)	120,613	(18.18%)	9909	(0.75%)	64	(1.16%)	45.02

Table 3. Implementation results of the solutions presented in Table 2 on the KU115 FPGA for the maximum supported frequency. The run identifier (Run column) corresponds to the one in Table 2. In the sub-columns of the Adder Tree Pipeline Stage column, filled and empty circles indicate that the pipeline stage of the corresponding adder tree layer is instantiated or not instantiated, respectively.

Run	PS	Adder Tree Pipeline Stage								Maximum Frequency [MHz]	Baud Rate [Gbaud]	CLBs		CLB LUTs		CLB Registers		DSPs		Efficiency [kbaud/CLB]
Run	PS	1	2	3	4	5	6	7	8	Maximum Frequency [MHz]	Baud Rate [Gbaud]	CLBs		CLB LUTs		CLB Registers		DSPs		Efficiency [kbaud/CLB]
#3	4	∘	•	∘	∘	•	∘	•	∘	265	1.06	5788	(6.98%)	33,399	(5.03%)	15,821	(1.19%)	16	(0.29%)	183.14
#5	8	∘	∘	∘	∘	•	∘	∘	∘	166.67	1.33	11,080	(13.36%)	62,597	(9.44%)	10,431	(0.79%)	32	(0.58%)	120.34
#6	16	∘	∘	∘	∘	∘	∘	∘	∘	85	1.36	22,198	(26.77%)	121,340	(18.29%)	10,184	(0.77%)	64	(1.16%)	61.27
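
The Baud Rate and Efficiency columns appear consistent with two simple relations, inferred here from the tabulated values rather than quoted from the text: the baud rate is PS times the clock frequency, and the efficiency is the baud rate divided by the number of occupied CLBs. The sketch below recomputes both for the rows of Table 3; the function names are hypothetical.

```python
def baud_rate_gbaud(ps: int, clock_mhz: float) -> float:
    """PS symbols are processed per clock cycle, so baud rate = PS * f_clk."""
    return ps * clock_mhz / 1e3


def efficiency_kbaud_per_clb(ps: int, clock_mhz: float, clbs: int) -> float:
    """Baud rate expressed in kbaud, divided by the number of occupied CLBs."""
    return ps * clock_mhz * 1e3 / clbs


# (run, PS, maximum frequency [MHz], CLBs) copied from Table 3.
TABLE_3 = [
    ("#3", 4, 265.0, 5788),
    ("#5", 8, 166.67, 11080),
    ("#6", 16, 85.0, 22198),
]

for run, ps, f_mhz, clbs in TABLE_3:
    baud = round(baud_rate_gbaud(ps, f_mhz), 2)
    eff = round(efficiency_kbaud_per_clb(ps, f_mhz, clbs), 2)
    print(f"{run}: {baud} Gbaud, {eff} kbaud/CLB")
```

The same relations reproduce the 1 Gbaud target of Table 2, where the required frequency is the target baud rate divided by PS (for example, 1000 MHz / 4 = 250 MHz).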

Table 4. Implementation results of the solutions presented in Table 2 on the KU115 FPGA for the maximum supported frequency. The run identifier (Run column) corresponds to the one in Table 2. In the sub-columns of the Adder Tree Pipeline Stage column, filled and empty circles indicate that the pipeline stage of the corresponding adder tree layer is instantiated or not instantiated, respectively.

Run	PS	Adder Tree Pipeline Stage								Maximum Frequency [MHz]	Baud Rate [Gbaud]	CLBs		CLB LUTs		CLB Registers		DSPs		Efficiency [kbaud/CLB]
Run	PS	1	2	3	4	5	6	7	8	Maximum Frequency [MHz]	Baud Rate [Gbaud]	CLBs		CLB LUTs		CLB Registers		DSPs		Efficiency [kbaud/CLB]
#5	8	∘	∘	∘	∘	•	∘	∘	∘	166.67	1.33	11,080	(13.36%)	62,597	(9.44%)	10,431	(0.79%)	32	(0.58%)	120.34
#7	8	∘	•	∘	∘	•	∘	•	∘	265	2.12	11,150	(13.45%)	63,458	(9.57%)	25,684	(1.94%)	32	(0.58%)	190.13

Table 5. Comparison with other existing solutions from the literature.

Work	PS	Adder Tree Pipeline Stages	Frequency (Max.)	Baud Rate (Max.)	Depth of Complex Product Buffer	Supported Standards	Supported Constellations
This work	Configurable	Configurable	Up to 265 MHz	>2 Gbaud	255 + PS - 1	CCSDS 131.2-B-1, CCSDS 131.21-O-1	QPSK, 8-PSK, 16-APSK, 32-APSK, 64-APSK
[30]	Fixed (4)	Fixed (3)	–	25 Mbaud	31	DVB-S2	QPSK, 8-PSK

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
