Integrated DVB-X2 Receiver Architecture with Common Acceleration Engine

Featured Application: This work can be used as a reference for the development of integrated broadcast receivers. Abstract: This paper proposes an integrated DVB-X2 receiver architecture that supports multiple broadcasting standards, namely DVB-T2, DVB-C2, and DVB-S2, on a single platform. The system consists of a tuner block, an H/W-based receiver engine, a frame processor, and an A/V decoder. Specifically, the proposed architecture addresses key design and technical issues: reducing the complexity of the receiver, accessing the H/W-based receiver engine efficiently, and simplifying the OFDM demodulator. The H/W-based receiver engine, which performs the DVB-X2 demodulation and channel decoding functions, is implemented in two FPGA devices. The frame processor is implemented with 256 MB of memory and a DSP operating at a clock speed of 1.0 GHz. To verify the functionality of the proposed DVB-X2 receiver, various test scenarios were considered in a laboratory setting. In particular, the proposed system was tested under various operating modes specified in the DVB-T2, DVB-C2, and DVB-S2 standards, and it operated successfully in all test scenarios.

Although many broadcast receivers support the individual DVB-X2 standards, developing an integrated DVB-X2 receiver that supports multiple broadcasting standards on a single platform remains challenging. DVB-T2, DVB-C2, and DVB-S2 have much in common: each consists of similar, or identical, functional blocks. An integrated DVB-X2 receiver can therefore be implemented with minimal additional hardware (H/W) logic resources. A single receiver that can receive different types of broadcasts regardless of the transmission medium reduces manufacturing and production costs compared with developing a separate receiver for each broadcasting system. The integrated receiver also allows low-cost, small-size consumer set-top boxes to be designed for DVB-T2, DVB-C2, and DVB-S2 networks. For these reasons, various commercial chips have been developed to support multiple broadcasting standards.
This paper proposes an integrated DVB-X2 receiver architecture with a common accelerator. The remainder of this paper is organized as follows. Section 2 describes the system, and Sections 3 and 4 describe the architectures of the proposed receiver and of the DVB-T2/C2 demodulator, respectively. Implementation results and conclusions are presented in Sections 5 and 6, respectively.

System Description
In this section, brief descriptions of DVB-T2, DVB-C2, and DVB-S2 standards as well as the proposed DVB-X2 receiver are given.

Multi-Mode Broadcasting Receiver
There are several issues in the design and implementation of a multi-mode broadcasting receiver. First, DVB-T2, DVB-C2, and DVB-S2 transmission systems operate with many different transmission modes and parameters, which makes the receiver architecture complex. A software (S/W)-based implementation makes it possible to implement complex functions more easily and more rapidly than an H/W-based implementation [5], and functions can be added or modified through S/W upgrades. However, an S/W-based implementation is slower and consumes more power than an H/W-based one. It is therefore efficient to implement the numerous complex functions in S/W and to implement in H/W only the simple functions that require heavy computation. An efficient receiver architecture that reduces receiver complexity through H/W-S/W co-design is thus required to support the various transmission modes and parameters. Second, DVB-T2, DVB-C2, and DVB-S2 systems consist of functional blocks that are similar or practically identical across the systems, together with blocks specific to each system. When the similar or identical blocks are implemented with a common accelerator, a multi-mode integrated receiver can be implemented efficiently. However, a large volume of data must then be transferred between the common accelerator and the system-specific blocks, so the method used to access the common accelerator affects system performance metrics such as latency and processing time. An effective interface scheme for accessing the common accelerator is therefore required. Lastly, DVB-C2 uses the same OFDM sub-carrier spacing as the 4K FFT mode of DVB-T2, and its guard interval (GI) lengths partly overlap with those of DVB-T2. It also uses the same scattered pilot patterns, which allows the same channel estimation block to be used for both systems [4].
Consequently, a low-complexity demodulator architecture that combines the DVB-T2 and DVB-C2 demodulators without severe overhead is required.
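As a worked check of the sub-carrier spacing statement, consider an 8 MHz channel. Both the DVB-T2 and DVB-C2 specifications define an elementary period T = 7/64 µs for 8 MHz channels (a value taken from the standards, not from this paper), so a 4K FFT gives both systems the same useful symbol duration T_U and sub-carrier spacing Δf:

```latex
T = \frac{7}{64}\,\mu\mathrm{s}, \qquad
T_U = 4096\,T = 448\,\mu\mathrm{s}, \qquad
\Delta f = \frac{1}{T_U} = \frac{1}{448\,\mu\mathrm{s}} \approx 2232\ \mathrm{Hz}
```

Because the FFT size is shared, a GI fraction g corresponds to 4096g samples in either system, so the same time-domain synchronization hardware can serve both.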

DVB-T2, DVB-C2, and DVB-S2 Standards
DVB-T2 provides six FFT sizes up to 32K, seven GI lengths, and four modulation schemes up to 256-quadrature amplitude modulation (QAM). In addition, it uses a concatenated LDPC and BCH code with various code rates. DVB-C2 provides a single 4K FFT size, two GI lengths, five modulation schemes up to 4096-QAM, and the LDPC/BCH code. DVB-S2 provides four modulation modes: quadrature phase-shift keying (QPSK), 8-PSK, 16-ary amplitude and phase-shift keying (16-APSK), and 32-APSK. It also uses the LDPC/BCH code with various code rates. Table 1 summarizes the DVB-T2, DVB-C2, and DVB-S2 features. Figure 1 shows block diagrams of DVB-T2, DVB-C2, and DVB-S2 receivers.
Figure 2 shows the frame structures of DVB-T2, DVB-C2, and DVB-S2. In DVB-T2, a super frame consists of up to 255 T2 frames. A T2 frame consists of a P1 symbol, a P2 symbol, and data symbols. The P1 symbol, which marks the start of the frame, is designed for estimating time synchronization and coarse fractional frequency synchronization, and it carries 7 bits of basic system information such as the FFT size and multiple input single output (MISO) availability.
Appl. Sci. 2019, 9, 3983 4 of 16
The P1 symbol is a 1K OFDM symbol modulated with binary PSK (BPSK) and has a C-A-B structure, in which the main part A is preceded by part C and followed by part B, each acting as a guard interval. The P2 symbol carries Layer 1 (L1) signaling information and consists of L1 pre-signaling and L1 post-signaling. The L1 pre-signaling conveys peak-to-average power ratio (PAPR) availability, information on the L1 post-signaling, the pilot pattern, the number of data symbols, the number of T2 frames, and GI information. The L1 post-signaling contains physical layer pipe (PLP) and future extension frame (FEF)-related information. A C2 frame consists of a preamble symbol and data symbols. The preamble symbol carries L1 signaling part 2 data, which provide all the information a receiver needs to access the Layer 2 (L2) signaling and the PLPs in the C2 frame. A preamble header of 32 OFDM cells is placed in front of the L1 time interleaving block in each preamble symbol. In DVB-S2, each frame consists of a PLHEADER followed by slots, in which the DTV broadcast content is actually transferred. The PLHEADER consists of the start of frame (SOF) field and the physical layer signaling (PLS) code. The SOF is sent at the beginning of the frame. The PLS code carries the MODCOD field, which specifies the modulation and code rate, and the TYPE field, which specifies the FECFRAME length (64,800 or 16,200 bits) and the presence or absence of pilots. The PLHEADER consists of 90 symbols, and each slot also consists of 90 symbols. The number of slots, S, is determined by the modulation scheme and the FECFRAME size. When pilots are enabled, a pilot block of 36 symbols is inserted after every 16 slots.
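The slot arithmetic above can be made concrete. The following Python sketch computes the number of slots S and the resulting PL frame length in symbols. The constants (a 26-symbol SOF plus a 64-symbol PLS code, and one pilot block per 16 slots with none after the final group) come from the DVB-S2 specification rather than from this paper, and the function names are hypothetical.

```python
# Sketch: DVB-S2 PL frame length from the slot structure (constants per the DVB-S2 spec).
BITS_PER_SYMBOL = {"QPSK": 2, "8PSK": 3, "16APSK": 4, "32APSK": 5}
SLOT_SYMBOLS = 90          # one slot = 90 symbols
PLHEADER_SYMBOLS = 90      # SOF (26 symbols) + PLS code (64 symbols)
PILOT_BLOCK_SYMBOLS = 36   # one pilot block per 16 slots when pilots are on
SLOTS_PER_PILOT = 16


def num_slots(modulation: str, fecframe_bits: int) -> int:
    """Number of 90-symbol slots S in one XFECFRAME."""
    symbols = fecframe_bits // BITS_PER_SYMBOL[modulation]
    if symbols % SLOT_SYMBOLS:
        raise ValueError("FECFRAME does not fill an integer number of slots")
    return symbols // SLOT_SYMBOLS


def pl_frame_symbols(modulation: str, fecframe_bits: int, pilots: bool) -> int:
    """PLHEADER + S slots + floor((S - 1) / 16) pilot blocks (if enabled)."""
    s = num_slots(modulation, fecframe_bits)
    n_pilot_blocks = (s - 1) // SLOTS_PER_PILOT if pilots else 0
    return PLHEADER_SYMBOLS + s * SLOT_SYMBOLS + n_pilot_blocks * PILOT_BLOCK_SYMBOLS


for mod in BITS_PER_SYMBOL:
    print(mod, num_slots(mod, 64800), pl_frame_symbols(mod, 64800, pilots=True))
```

Because S depends only on the modulation and FECFRAME length, the receiver can derive the whole PL frame layout once the PLS code (MODCOD and TYPE) has been decoded.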

System Overview
The proposed DVB-X2 receiver provides digital terrestrial, cable, and satellite DTV services on a single receiver platform. Figure 3 shows the block diagram of the proposed DVB-X2 receiver. The tuner block, consisting of DVB-T2, DVB-C2, and DVB-S2 tuners, converts radio frequency (RF) signals into digital baseband signals. The demodulation block consists of DVB-T2, DVB-C2, and DVB-S2 demodulators; each demodulator starts operating when its corresponding transmission signal is received and performs baseband processing such as synchronization, channel estimation, and equalization. The common accelerator performs the de-interleaving and LDPC/BCH decoding functions. The frame processor processes the data field of each baseband (BB) frame of DVB-T2, DVB-C2, or DVB-S2 using the header information of that frame. The de-multiplexer (De-MUX) performs the inverse of the stream and mode adaptations to reproduce the transport stream (TS), generic continuous stream (GCS), generic fixed-length packetized stream (GFPS), or generic stream encapsulation (GSE) sent by the transmitter. The audio/video (A/V) decoder reproduces audio and video signals from the TS, GCS, GFPS, or GSE signals.

DVB-X2 Receiver Architecture
The common accelerator consists of de-interleavers (frequency, time, and cell) and a forward error correction (FEC) decoding block that includes a constellation de-mapper, a bit de-interleaver, and an LDPC/BCH decoder. Frequency de-interleaving is performed within one OFDM symbol using a permutation function. Time de-interleaving operates on an interleaving frame, which is composed of a number of FEC blocks. The cell de-interleaver performs de-interleaving within a single FEC block of 64,800 or 16,200 bits using a permutation function. These de-interleavers perform simple tasks such as memory read and write operations. However, they require a large amount of memory and must generate different permutation addresses for the many operating modes and parameters. The frequency, time, and cell de-interleavers are therefore more effectively implemented in S/W than in H/W. LDPC decoding involves complex message-passing operations between variable nodes and check nodes. Because the error correction performance of the LDPC decoder improves through iterative computation, the decoder requires a large number of high-speed calculations. The LDPC decoder is therefore more effectively implemented in H/W to improve reception performance. The BCH decoder concatenated with the LDPC decoder computes syndromes to detect errors in the input data, determines the error locations from the error locator polynomial, and outputs the delayed input data after correcting the bits at those locations.
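To illustrate why these de-interleavers reduce to address generation plus memory reads and writes, here is a minimal Python sketch of two generic forms: a permutation (address-table) de-interleaver of the kind used for frequency and cell de-interleaving, and a row-column block de-interleaver of the kind used for time de-interleaving. The permutation is a hypothetical pseudo-random stand-in, not the address sequence H(q) defined in the standards, and the row/column dimensions are placeholders.

```python
import random


def make_permutation(n: int, seed: int) -> list[int]:
    """Hypothetical stand-in for a standard-defined address generator H(q)."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)
    return perm


def permute_interleave(cells, perm):
    """Transmitter side (for testing): input cell q is written to address perm[q]."""
    out = [None] * len(cells)
    for q, cell in enumerate(cells):
        out[perm[q]] = cell
    return out


def permute_deinterleave(cells, perm):
    """Receiver side: output position q is read back from address perm[q]."""
    return [cells[perm[q]] for q in range(len(cells))]


def block_interleave(cells, n_rows, n_cols):
    """Generic row-column interleaver (for testing): write column-wise, read row-wise."""
    if len(cells) != n_rows * n_cols:
        raise ValueError("block size must equal n_rows * n_cols")
    return [cells[c * n_rows + r] for r in range(n_rows) for c in range(n_cols)]


def block_deinterleave(cells, n_rows, n_cols):
    """Inverse of block_interleave: write row-wise, read column-wise."""
    if len(cells) != n_rows * n_cols:
        raise ValueError("block size must equal n_rows * n_cols")
    return [cells[r * n_cols + c] for c in range(n_cols) for r in range(n_rows)]


data = list(range(12))
perm = make_permutation(len(data), seed=1)
print(permute_deinterleave(permute_interleave(data, perm), perm) == data)
print(block_deinterleave(block_interleave(data, 3, 4), 3, 4) == data)
```

Neither function does any arithmetic on the cell values; the cost lies in the buffer size and in producing the right address sequence for each mode, which matches the argument above for placing these functions in S/W.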
As shown in Figure 4, the proposed DVB-X2 receiver architecture is composed of a tuner block, an H/W-based receiver engine, an S/W-based frame processor, and an A/V decoder. The H/W-based receiver engine is composed of functional blocks that require real-time operation. The frame processor, on the other hand, does not require real-time operation and is composed of functional blocks that must support many different modes and parameters. The H/W-based receiver engine includes a DVB-T2/C2 demodulator, a DVB-S2 demodulator, and a channel decoder. The frame processor performs three main functions: channel estimation for the DVB-T2/C2 demodulator, the various de-interleaving functions, and frame decoding, including frame de-mapping.

DVB-T2/C2 OFDM Demodulator
The performance of an OFDM demodulator depends mainly on the channel estimation algorithm. A channel estimator does not require real-time processing because it estimates the channel from a number of OFDM symbols. If the channel estimator is implemented in upgradable, flexible S/W, various channel estimation algorithms can easily be applied as the channel changes during operation. Functional blocks that operate at the sample level and require high-speed signal processing are efficiently implemented in H/W, whereas functional blocks that determine the performance of the demodulator and can run in non-real time are effectively implemented in S/W. Figure 5 shows the block diagram of the proposed DVB-T2/C2 demodulator. The demodulator can select either an H/W-based channel estimator or an embedded S/W-based channel estimator. The DVB-T2/C2 demodulator stores the pilots at the FFT output into the pilot memory, and stores the OFDM symbols, after recovery of the residual symbol timing offset (STO) and carrier frequency offset (CFO), into the OFDM memory. In addition, the demodulator writes the index of the current OFDM symbol stored in the OFDM memory into the status register and then generates an interrupt signal to the frame processor. After receiving the interrupt, the frame processor performs channel estimation using the pilots stored in the pilot memory and stores the estimated channel information into the channel memory. By reading the symbol index from the status register, the frame processor knows which OFDM symbol the pilots currently being processed belong to. Channel equalization is performed using either the channel information stored in the channel memory or the channel information generated by the H/W-based channel estimator. The OFDM symbols stored in the OFDM memory are used for de-interleaving and frame de-mapping in the frame processor.
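The paper does not state which algorithm its S/W channel estimator runs. As a minimal sketch of the kind of routine that could be swapped in, the following Python code assumes a textbook least-squares (LS) estimate at the pilot carriers, linear interpolation across frequency, and zero-forcing equalization. The pilot spacing, pilot value, and channel are hypothetical, and the time-direction interpolation that DVB-T2/C2 scattered pilots normally also require is omitted.

```python
import numpy as np


def ls_channel_estimate(rx_symbol, pilot_idx, pilot_values):
    """LS estimate at pilot carriers, linearly interpolated over all carriers.

    np.interp holds the edge values constant outside the pilot range.
    """
    h_pilots = rx_symbol[pilot_idx] / pilot_values
    carriers = np.arange(len(rx_symbol))
    # np.interp works on real data, so interpolate real and imaginary parts separately.
    h_re = np.interp(carriers, pilot_idx, h_pilots.real)
    h_im = np.interp(carriers, pilot_idx, h_pilots.imag)
    return h_re + 1j * h_im


def zf_equalize(rx_symbol, h_est):
    """Zero-forcing equalization: divide each carrier by its channel estimate."""
    return rx_symbol / h_est


rng = np.random.default_rng(0)
n = 97
pilot_idx = np.arange(0, n, 12)                       # hypothetical pilot spacing
tx = (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)
tx[pilot_idx] = 1.0 + 0j                              # hypothetical pilot value
h_true = (0.8 + 0.3j) + (0.002 - 0.004j) * np.arange(n)  # channel linear in frequency
rx = h_true * tx
h_est = ls_channel_estimate(rx, pilot_idx, tx[pilot_idx])
eq = zf_equalize(rx, h_est)
print(np.max(np.abs(eq - tx)))
```

Because the example channel is linear in frequency and pilots sit on both edge carriers, linear interpolation recovers it exactly. On a real multipath channel, this is where a more capable S/W algorithm would be substituted without touching the H/W.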

DVB-S2 Demodulator
The DVB-S2 demodulator consists of a matched filter, a symbol timing recovery block, a carrier phase recovery block, a frame detector, a PLS decoder, a PL descrambler, an equalizer, and a de-mapper, as shown in Figure 6. The matched filter maximizes the signal-to-noise ratio (SNR) in the presence of additive stochastic noise; it is implemented as a 41-tap square root raised cosine filter. The symbol timing recovery block uses an interpolator to correct the timing errors that arise when samples are converted to symbols. The Gardner algorithm is used to estimate the timing error [6], and the interpolation filter is implemented as a cubic Lagrange interpolator. The carrier phase recovery block is designed with a second-order phase-locked loop (PLL), which recovers the carrier frequency and phase offsets using an automatic frequency control function. The frame detector searches for the start of the DVB-S2 frame using the SOF and the PLS code; it is implemented using a correlator. The equalizer corrects magnitude and phase errors in the received signals. The PLS decoder is implemented using a Reed-Muller (32, 6) decoder. The PL descrambler descrambles the XFECFRAME of the PL frame.
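Three of the front-end blocks above can be sketched compactly in Python: the 41-tap square root raised cosine taps, the cubic Lagrange interpolator, and the Gardner timing error detector. The oversampling factor (4 samples per symbol) and roll-off (0.35, one of the DVB-S2 roll-off factors) are assumptions, since the paper states only the tap count. The Gardner detector is written in one common form; sign conventions differ between references.

```python
import numpy as np


def rrc_taps(num_taps: int = 41, beta: float = 0.35, sps: int = 4) -> np.ndarray:
    """Square root raised cosine impulse response, normalized to unit energy."""
    t = (np.arange(num_taps) - (num_taps - 1) / 2) / sps  # time in symbol periods
    h = np.empty(num_taps)
    for i, ti in enumerate(t):
        if np.isclose(ti, 0.0):
            h[i] = 1.0 + beta * (4 / np.pi - 1)
        elif np.isclose(abs(ti), 1 / (4 * beta)):
            # Removable singularity of the general expression below.
            h[i] = (beta / np.sqrt(2)) * ((1 + 2 / np.pi) * np.sin(np.pi / (4 * beta))
                                          + (1 - 2 / np.pi) * np.cos(np.pi / (4 * beta)))
        else:
            num = np.sin(np.pi * ti * (1 - beta)) + 4 * beta * ti * np.cos(np.pi * ti * (1 + beta))
            h[i] = num / (np.pi * ti * (1 - (4 * beta * ti) ** 2))
    return h / np.sqrt(np.sum(h ** 2))


def cubic_lagrange(x_m1, x0, x1, x2, mu):
    """Interpolate between x0 and x1 at fractional offset mu in [0, 1).

    Uses the four samples at positions -1, 0, 1, 2 around the interpolant.
    """
    c_m1 = -mu * (mu - 1) * (mu - 2) / 6
    c0 = (mu + 1) * (mu - 1) * (mu - 2) / 2
    c1 = -(mu + 1) * mu * (mu - 2) / 2
    c2 = (mu + 1) * mu * (mu - 1) / 6
    return c_m1 * x_m1 + c0 * x0 + c1 * x1 + c2 * x2


def gardner_error(prev_sym, mid_sample, curr_sym):
    """Gardner timing error from two symbol-spaced samples and the sample midway.

    It is zero when the midpoint sample falls on the zero crossing of a
    transition, and its sign indicates whether sampling is early or late.
    """
    return np.real(np.conj(mid_sample) * (curr_sym - prev_sym))


h = rrc_taps()
print(len(h), int(np.argmax(h)))
print(cubic_lagrange(-1.0, 0.0, 1.0, 8.0, 0.5))  # samples of n**3; exact for cubics
print(gardner_error(-1.0, 0.0, 1.0))
```

In a timing loop, the detector output would drive a loop filter that updates mu, and cubic_lagrange would then produce the symbol-rate samples fed back to the detector. That loop is not shown here.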

Figure 6. Block diagram of the DVB-S2 demodulator.

LDPC Decoder
The belief-propagation, or sum-product, algorithm (SPA) is a powerful method for decoding LDPC codes [7]. However, its check node operations require a large amount of computation [8]. The min-sum algorithm was introduced to reduce the complexity of the SPA check node operations [9]. The layered decoding algorithm is a form of partially parallel decoding that divides the parity check matrix into multiple layers, and it converges quickly [10]. The LDPC codes in DVB-T2, DVB-C2, and DVB-S2 are quasi-cyclic (QC) LDPC codes, which are well suited to H/W implementation because of the regularity of their parity check matrices. By applying a specific permutation to the original parity check matrix of the LDPC code, the matrix can be transformed into a QC-LDPC-like pattern of 360 × 360 sub-blocks.
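The combination of min-sum and layered scheduling can be illustrated with a small Python sketch. It treats each row of the parity check matrix as one layer, uses a normalized min-sum check node update with an assumed scaling factor of 0.75, and stops early once all parity checks are satisfied. The (7,4) Hamming matrix is a toy stand-in for a DVB-X2 parity check matrix; the paper does not specify its scaling factor or stopping rule.

```python
import numpy as np


def layered_min_sum(H, channel_llr, max_iter=20, alpha=0.75):
    """Layered normalized min-sum decoding; each row of H is one layer.

    LLR convention: positive means bit 0 is more likely.
    """
    H = np.asarray(H, dtype=bool)
    L = np.array(channel_llr, dtype=float)   # a-posteriori LLRs, updated layer by layer
    R = np.zeros(H.shape)                    # check-to-variable messages
    rows = [np.flatnonzero(row) for row in H]
    for it in range(1, max_iter + 1):
        for m, cols in enumerate(rows):
            Q = L[cols] - R[m, cols]         # variable-to-check messages
            signs = np.where(Q < 0, -1.0, 1.0)
            mags = np.abs(Q)
            order = np.argsort(mags)
            min1, min2 = mags[order[0]], mags[order[1]]
            # Each edge receives the minimum over the *other* edges of this check.
            excl_min = np.where(np.arange(len(cols)) == order[0], min2, min1)
            # prod(signs) * signs[i] equals the product of the other edges' signs.
            R[m, cols] = alpha * np.prod(signs) * signs * excl_min
            L[cols] = Q + R[m, cols]
        bits = (L < 0).astype(int)
        if not np.any(H.astype(int) @ bits % 2):   # all parity checks satisfied
            return bits, it
    return bits, max_iter


H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
llr = [2.0, 2.5, 1.8, -0.5, 2.2, 1.9, 2.1]   # all-zero codeword, bit 3 received weakly wrong
bits, iters = layered_min_sum(H, llr)
print(bits, iters)  # -> [0 0 0 0 0 0 0] 1
```

Because each layer immediately updates L, later layers in the same iteration already see the corrected value of bit 3. This is the source of the faster convergence of layered decoding mentioned above.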
As shown in Figure 7, the LDPC decoder consists of a log-likelihood ratio (LLR) memory update unit, two permutation network units, and P processing element (PE) units. The LLR memory update unit stores the initial and updated LLR values for each bit of a codeword. The two permutation network units shuffle and reshuffle messages, and they are implemented as two barrel shifters. Each PE unit computes variable and check node messages using the layered decoding scheme. It consists of two MIN_GEN units, a check node update (CNU) unit, and random access memory (RAM) that stores the check node messages. The two MIN_GEN units generate the check node messages of the previous and current iterations from the check node messages stored in RAM. The CNU unit finds the minimum value among the variable node messages. The LDPC decoder is implemented with a partially parallel architecture of 90 PE units (P = 90), and it decodes the LDPC code using 6-bit LLR messages.
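The hardware-oriented pieces of this architecture can be modeled in a few lines of Python: saturating 6-bit LLR quantization, a barrel shifter that cyclically rotates one sub-block of messages, and a compressed check node state (two smallest magnitudes, position of the smallest, and signs) from which the outgoing messages can be regenerated. The quantization step, the symmetric saturation range, the shift direction, and the compressed-state format are assumptions chosen for illustration. The paper does not state how its CNU and MIN_GEN units encode the messages stored in RAM.

```python
import numpy as np

LLR_BITS = 6
LLR_MAX = 2 ** (LLR_BITS - 1) - 1   # 31: symmetric range for a signed 6-bit message


def quantize_llr(llr, step=0.5):
    """Hypothetical fixed-point mapping: scale by 1/step, round, saturate to +/-31."""
    q = np.round(np.asarray(llr, dtype=float) / step)
    return np.clip(q, -LLR_MAX, LLR_MAX).astype(np.int8)


def barrel_shift(block, shift):
    """Cyclic shift of a Z-element message vector (Z = 360 for DVB-X2).

    Result element i is block[(i + shift) % Z], one common circulant convention.
    """
    return np.roll(block, -shift)


def compress_check(q_msgs):
    """Reduce one check node's incoming messages to (min1, min2, pos, signs)."""
    q = np.asarray(q_msgs)
    mags = np.abs(q)
    pos = int(np.argmin(mags))
    min2 = np.min(np.delete(mags, pos))
    signs = np.where(q < 0, -1, 1)
    return int(mags[pos]), int(min2), pos, signs


def expand_check(min1, min2, pos, signs):
    """Regenerate the outgoing min-sum messages from the compressed state."""
    total_sign = int(np.prod(signs))
    mags = np.full(len(signs), min1)
    mags[pos] = min2               # the minimum edge receives the second minimum
    return total_sign * signs * mags


print(quantize_llr([0.3, -20.0, 40.0]))
print(barrel_shift(np.arange(5), 2))
print(expand_check(*compress_check([5, -3, 7, 12])))
```

Storing only the two minima, one index, and the signs per check node, instead of one message per edge, is a widely used way to shrink check node RAM in min-sum decoders.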

Interface between the H/W Engine and the Frame Processor
A large amount of data is exchanged between the H/W engine and the frame processor. If the data are not exchanged efficiently, the frame processor may have difficulty processing DVB-T2, DVB-C2, or DVB-S2 frames. To solve this problem, an interrupt method is used. The DVB-T2/C2 demodulator generates an interrupt each time an OFDM symbol is received, and it provides an index each time an OFDM symbol is stored so that no OFDM symbol is lost. The DVB-S2 demodulator cannot easily share the first-in first-out (FIFO) memory used by the DVB-T2/C2 demodulator because it uses a clock with a different frequency; it therefore interfaces directly with the LDPC decoder through a separate FIFO memory. Because interrupts from multiple sources can degrade the performance of the frame processor, data transfer between the LDPC decoder and the frame processor is driven solely by the interrupt generated by the DVB-T2/C2 demodulator. Figure 8 shows the data transfer procedure between the frame processor and the H/W engine for decoding a T2 frame. The frame processor operates as follows. When the H/W engine triggers an interrupt, the frame processor moves the data stored in the memory of the H/W engine into its internal memory. It then performs whatever functions are possible with the data collected so far and waits for the next interrupt. Some functions of the frame processor require data from many OFDM symbols. For example, time de-interleaving requires a number of T2 frames to be received, depending on the operating mode. Therefore, on each interrupt the frame processor only stores the data needed for time de-interleaving and performs the other functions. When a later interrupt arrives and all of the required data have been collected, the frame processor performs time de-interleaving and the remaining functions.
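The interrupt-driven procedure can be modeled as a small state machine. In this Python sketch, each interrupt delivers the symbol index from the status register and a copy of the H/W engine's symbol memory. The handler checks the index for lost symbols, moves the data into processor memory, and runs time de-interleaving only once enough symbols have accumulated. The class, its method names, and `symbols_per_ti_block` are hypothetical; in DVB-T2 the amount of data needed depends on the mode and can span several T2 frames.

```python
from dataclasses import dataclass, field


@dataclass
class FrameProcessor:
    """Hypothetical model of the interrupt-driven frame processor loop."""
    symbols_per_ti_block: int              # placeholder for the mode-dependent TI depth
    expected_index: int = 0
    buffer: list = field(default_factory=list)
    ti_blocks_done: int = 0

    def on_interrupt(self, status_index: int, hw_memory: list) -> None:
        # The status register index identifies the stored symbol; a gap means
        # a symbol was overwritten before the processor read it.
        if status_index != self.expected_index:
            raise RuntimeError(
                f"lost OFDM symbol(s): expected {self.expected_index}, got {status_index}")
        self.expected_index += 1
        # Move the symbol out of H/W engine memory into processor memory.
        self.buffer.append(list(hw_memory))
        self.run_available_functions()

    def run_available_functions(self) -> None:
        # Per-symbol work (e.g., channel estimation) would run here on every interrupt.
        if len(self.buffer) == self.symbols_per_ti_block:
            self.time_deinterleave(self.buffer)
            self.buffer = []

    def time_deinterleave(self, symbols) -> None:
        self.ti_blocks_done += 1           # placeholder for the actual de-interleaving


fp = FrameProcessor(symbols_per_ti_block=4)
for i in range(10):
    fp.on_interrupt(i, [i])
print(fp.ti_blocks_done, len(fp.buffer))
```

Keeping a single interrupt source means the handler has one ordering guarantee to check (the symbol index), which is the design motivation given above for not letting the LDPC decoder raise its own interrupts.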

DVB-T2/C2 Demodulator Architecture
This section describes a DVB-T2/C2 demodulator architecture that combines the DVB-T2 and DVB-C2 demodulators with low H/W complexity by sharing functional blocks.
The proposed DVB-T2/C2 demodulator consists of a common OFDM demodulation unit, a DVB-T2/C2 specific unit, and a control unit, as shown in Figure 9. The common OFDM demodulation unit includes a signal power detector, a GI detector, CFO and STO synchronization blocks, a variable FFT, a channel equalizer, and residual CFO and STO synchronization blocks. The DVB-T2/C2 specific unit is composed of a P1 symbol detector, a T2 pilot generator, a C2 sub-carrier and preamble detector, and a C2 pilot generator. The control unit configures the DVB-T2/C2 demodulator to operate in either DVB-T2 or DVB-C2 mode.
Appl. Sci. 2019, 9, x 9 of 16

Mode Detection
DVB-T2 signals can be detected in the time domain by detecting the P1 symbol. The P1 symbol, transmitted at the start of each T2 frame, carries information such as the FFT size and the MISO mode. Thus, once the P1 symbol is detected, the FFT size needed to operate the demodulator in DVB-T2 mode is known. DVB-C2, in contrast, uses a fixed 4K FFT, so the GI length can be detected and the CFO and STO removed without any prior mode information. The C2 preamble can then be detected in the frequency domain after the FFT operation. Therefore, if the proposed demodulator starts in DVB-C2 mode, a DVB-T2 signal (through the P1 symbol in the time domain) and a DVB-C2 signal (through the preamble in the frequency domain) can be detected at the same time.
Alternatively, the mode can be detected from the GI length. DVB-C2 uses only two GI fractions, 1/128 and 1/64. Table 2 shows the number of GI samples used in DVB-T2. The 4K FFT mode of DVB-T2 does not use the GI fractions 1/128 and 1/64 used in DVB-C2. Therefore, this method can detect the operation mode of the demodulator in the time domain from the output of the GI detector. Because the operating mode is determined from time-domain signals, it is found quickly, which shortens the time needed to start the service.
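The GI-based decision can be sketched by tabulating GI lengths in samples. The DVB-T2 entries below follow the FFT-size and GI-fraction combinations permitted by ETSI EN 302 755 as we understand them (they should match Table 2); `classify_4k_gi`, the 4K-delay assumption, and the `"unknown"` fallback to P1 detection are illustrative choices, not the paper's logic. With a 4K correlation delay, a 32- or 64-sample GI can only come from DVB-C2, because the DVB-T2 4K mode uses 128 to 1024 samples.

```python
from fractions import Fraction as F

# DVB-T2 GI fractions allowed for each FFT size (cf. Table 2 / ETSI EN 302 755).
T2_GI_FRACTIONS = {
    1024: (F(1, 16), F(1, 8), F(1, 4)),
    2048: (F(1, 32), F(1, 16), F(1, 8), F(1, 4)),
    4096: (F(1, 32), F(1, 16), F(1, 8), F(1, 4)),
    8192: (F(1, 128), F(1, 32), F(1, 16), F(19, 256), F(1, 8), F(19, 128), F(1, 4)),
    16384: (F(1, 128), F(1, 32), F(1, 16), F(19, 256), F(1, 8), F(19, 128), F(1, 4)),
    32768: (F(1, 128), F(1, 32), F(1, 16), F(19, 256), F(1, 8), F(19, 128)),
}
C2_FFT_SIZE = 4096                       # DVB-C2 always uses a 4K FFT
C2_GI_FRACTIONS = (F(1, 128), F(1, 64))


def gi_samples(n_fft, fraction):
    """GI length in samples (exact: every listed product is an integer)."""
    return int(n_fft * fraction)


C2_GI = {gi_samples(C2_FFT_SIZE, f) for f in C2_GI_FRACTIONS}     # {32, 64}
T2_4K_GI = {gi_samples(4096, f) for f in T2_GI_FRACTIONS[4096]}   # {128, 256, 512, 1024}


def classify_4k_gi(n_gi):
    """Mode decision from a GI length measured with a 4K-sample correlation delay."""
    if n_gi in C2_GI:
        return "DVB-C2"
    if n_gi in T2_4K_GI:
        return "DVB-T2"
    return "unknown"                     # no 4K match: fall back to P1 detection


print(classify_4k_gi(64), classify_4k_gi(256))          # DVB-C2 DVB-T2
print(max(gi_samples(n, f) for n, fs in T2_GI_FRACTIONS.items() for f in fs))  # 4864
```

The largest DVB-T2 entry, 4864 samples (32K FFT, GI fraction 19/128), is the GI length that sets the depth of the delay line in the GI detector.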

STO and CFO Synchronization Unit
The purpose of STO synchronization is to find the correct starting point of the FFT window so that the GI can be removed. For DVB-C2, STO synchronization can be performed by exploiting the correlation between the GI and the end of the OFDM symbol from which it is copied. For DVB-T2, it can also be performed by using either the P1 symbol or the GI. The purpose of CFO synchronization is to compensate the CFO caused by Doppler shifts and by frequency mismatch between the local oscillators of the transmitter and the receiver. The CFO can be estimated from the output of the P1 symbol detector or the GI detector, and the CFO synchronization block compensates it by using a numerically controlled oscillator (NCO) and a complex multiplier. Figure 10a shows the block diagram of the GI detector. It detects the GI by correlating the input signal with a copy of it delayed in a FIFO memory. Figure 10b shows the proposed GI average unit, which uses an accumulator to reduce H/W complexity. DVB-T2 uses many GI lengths, resulting from combinations of seven GI fractions and six FFT sizes, as shown in Table 2. To compute the average over all of these GI lengths, the proposed GI average unit requires 4864 delay units for input complex samples (the longest GI, 32K FFT with GI fraction 19/128), 35 complex adders, and 12 delay units for accumulated complex values.
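A minimal NumPy sketch of GI-based STO and CFO estimation in the spirit of Figure 10a: the input is multiplied by the conjugate of a copy delayed by one FFT length, and the product is summed over a GI-length window with a running accumulator (a cumulative sum minus a delayed copy of itself). This is the standard cyclic-prefix correlation technique, not the paper's GI average unit. The function names, the noise-free zero-padded test signal, and the CFO convention (normalized to the subcarrier spacing, |ε| < 0.5) are assumptions.

```python
import numpy as np


def gi_correlation(r, n_fft, n_gi):
    """P[d] = sum_{k < n_gi} conj(r[d+k]) * r[d+k+n_fft], via a running accumulator."""
    prod = np.conj(r[:-n_fft]) * r[n_fft:]             # input times delayed conjugate
    acc = np.concatenate(([0], np.cumsum(prod)))       # accumulator
    return acc[n_gi:] - acc[:-n_gi]                    # moving sum over n_gi samples


def estimate_sto_cfo(r, n_fft, n_gi):
    p = gi_correlation(r, n_fft, n_gi)
    d = int(np.argmax(np.abs(p)))                      # coarse STO: start of the GI
    eps = np.angle(p[d]) / (2 * np.pi)                 # CFO in subcarrier spacings
    return d, eps


rng = np.random.default_rng(0)
n_fft, n_gi, offset, eps_true = 1024, 128, 300, 0.2   # 1K FFT with GI fraction 1/8
qpsk = (rng.choice([-1, 1], n_fft) + 1j * rng.choice([-1, 1], n_fft)) / np.sqrt(2)
x = np.fft.ifft(qpsk) * np.sqrt(n_fft)                 # unit-power OFDM symbol
symbol = np.concatenate((x[-n_gi:], x))                # prepend the GI (cyclic prefix)
tx = np.concatenate((np.zeros(offset), symbol, np.zeros(200)))
rx = tx * np.exp(2j * np.pi * eps_true * np.arange(len(tx)) / n_fft)  # apply CFO
sto, eps = estimate_sto_cfo(rx, n_fft, n_gi)
print(sto, round(float(eps), 3))                       # 300 0.2
```

Because the GI is a copy of the symbol tail, every product inside the window has the same phase 2πε, so the peak position gives the STO and its phase gives the CFO. A real receiver would run this on noisy, continuous input, where a normalized metric and averaging (as in the paper's GI average unit) matter.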

Variable FFT Unit
The Radix-2 algorithm is known for its small quantization noise, and FFTs of various sizes can easily be implemented with it. The Radix-4 algorithm has larger quantization noise than the Radix-2 algorithm, and it cannot easily be adapted to different FFT lengths, because it natively handles only lengths that are powers of four. However, the Radix-4 algorithm is better suited to high-speed operation, since it needs fewer complex multipliers than the Radix-2 algorithm. The Radix-2² algorithm implements the Radix-4 algorithm with Radix-2 blocks [11]. It implements the FFT efficiently because it keeps the main benefit of the Radix-4 algorithm, a small number of multipliers, together with the benefits of the Radix-2 algorithm, namely small quantization noise and easy implementation.
FFT architectures fall into two categories: pipeline architectures and memory-based architectures [12][13][14]. Memory-based architectures require less H/W than pipeline architectures, but they have longer latency and need a faster operating clock. Moreover, because the data rate in the FFT core differs among the FFT modes, the interface circuits between functional blocks become complicated. Pipeline FFT architectures offer lower latency and power consumption, which makes them the most suitable choice for DVB-T2 and DVB-C2. Pipeline architectures are further divided into the multipath delay commutator (MDC), single-path delay feedback (SDF), and single-path delay commutator (SDC) types, according to how input and output data are exchanged at each stage.
DVB-T2 requires FFT sizes from 1K up to 32K. Considering both quantization noise and efficient H/W implementation, the Radix-2² SDF FFT is an effective choice. To support all FFT sizes from 1K up to 32K, the Radix-2² SDF architecture is used for the 1K FFT, and Radix-2 SDF stages are used for the expandable parts. Figure 11 shows the block diagram of the variable FFT.
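The arithmetic behind this variable FFT can be checked with a functional (not cycle-accurate) NumPy model. It is a sketch under stated assumptions: `fft_r22` is a recursive decimation-in-frequency Radix-2² FFT (two Radix-2 butterflies with a trivial −j rotation between them, then one twiddle multiplication per Radix-4 step), and `variable_fft` applies Radix-2 DIF stages until the length reaches a 1K core, mirroring the split described above. Neither the SDF pipelining nor fixed-point quantization is modelled, and both function names are hypothetical.

```python
import numpy as np


def fft_r22(x):
    """Radix-2^2 DIF FFT model; len(x) must be a power of two."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n == 1:
        return x.copy()
    if n == 2:                                   # leftover Radix-2 butterfly
        return np.array([x[0] + x[1], x[0] - x[1]])
    q = n // 4
    x0, x1, x2, x3 = x[:q], x[q:2 * q], x[2 * q:3 * q], x[3 * q:]
    n3 = np.arange(q)
    out = np.empty(n, dtype=complex)
    for k1 in (0, 1):
        s = 1 - 2 * k1                           # (-1)^k1
        # BF2-I: butterflies on samples N/2 apart, then the trivial -j rotation.
        a0 = x0 + s * x2
        a1 = (x1 + s * x3) * (-1j) ** k1
        for k2 in (0, 1):
            b = a0 + (1 - 2 * k2) * a1           # BF2-II: butterflies on samples N/4 apart
            twiddle = np.exp(-2j * np.pi * n3 * (k1 + 2 * k2) / n)
            out[k1 + 2 * k2::4] = fft_r22(b * twiddle)
    return out


def variable_fft(x, core_size=1024):
    """Radix-2 DIF stages shrink the transform until it reaches the Radix-2^2 core."""
    x = np.asarray(x, dtype=complex)
    n = len(x)
    if n <= core_size:
        return fft_r22(x)
    half = n // 2
    top, bottom = x[:half], x[half:]
    out = np.empty(n, dtype=complex)
    out[0::2] = variable_fft(top + bottom, core_size)            # even bins
    twiddle = np.exp(-2j * np.pi * np.arange(half) / n)
    out[1::2] = variable_fft((top - bottom) * twiddle, core_size)  # odd bins
    return out


x = np.random.default_rng(0).standard_normal(2048) + 0j
print(np.allclose(variable_fft(x), np.fft.fft(x)))               # True
```

Each extra Radix-2 stage doubles the supported size, so 2K to 32K need one to five such stages in front of the 1K core.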

Channel Equalizer Unit
DVB-T2 defines various scattered pilot patterns, and DVB-C2 uses a subset of them. Figure 12 shows the block diagram of the channel equalizer, which is designed around these scattered pilot patterns. To support the various pilot patterns, the channel estimator performs pilot symbol-assisted channel estimation using the pilots and their respective positions. It is implemented as a least squares (LS) estimator. The interpolator then performs linear interpolation in the frequency direction.
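A minimal NumPy sketch of pilot-assisted LS estimation followed by linear frequency interpolation for a single OFDM symbol. The one-tap division used for equalization, the pilot spacing of 12, the BPSK pilot values, and the test channel are assumptions for illustration, not DVB-T2/C2 parameters. The test channel is chosen to be linear in frequency so that linear interpolation reproduces it exactly.

```python
import numpy as np


def ls_linear_equalize(y, pilot_idx, pilot_tx):
    """LS estimate at the pilots, linear interpolation across frequency,
    then one-tap division of every carrier by the interpolated estimate."""
    h_pilot = y[pilot_idx] / pilot_tx                        # LS: H_p = Y_p / X_p
    k = np.arange(len(y))
    h_est = (np.interp(k, pilot_idx, h_pilot.real)           # interpolate real and
             + 1j * np.interp(k, pilot_idx, h_pilot.imag))   # imaginary parts separately
    return y / h_est, h_est


# Hypothetical single-symbol example: a pilot on every 12th carrier, including the last one.
rng = np.random.default_rng(1)
n_carriers, spacing = 97, 12
k = np.arange(n_carriers)
pilot_idx = k[::spacing]                                     # 0, 12, ..., 96
x = (rng.choice([-1, 1], n_carriers) + 1j * rng.choice([-1, 1], n_carriers)) / np.sqrt(2)
x[pilot_idx] = rng.choice([-1.0, 1.0], len(pilot_idx))       # hypothetical BPSK pilot values
h = (1 + 0.5j) + (0.02 - 0.01j) * k                          # channel linear in frequency
y = h * x
x_hat, h_est = ls_linear_equalize(y, pilot_idx, x[pilot_idx])
print(np.allclose(h_est, h), np.allclose(x_hat, x))          # True True
```

Supporting a different pilot pattern only changes `pilot_idx` and `pilot_tx`, which is why the estimator takes pilot positions as an input.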

Residual STO and CFO Synchronization Unit
A residual STO rotates the phase of each frequency-domain sample by an amount proportional to its subcarrier index, and this rotation changes continuously over time; it can be corrected by estimating how the phase of the pilots varies over time. A residual CFO rotates all subcarriers of a symbol by a common phase that grows from symbol to symbol; it can likewise be corrected by estimating the phase variation of the pilots. Figure 13 shows the block diagram of the synchronization block that estimates and compensates the residual STO and CFO. The block operates as follows. During the first cycle, it estimates the STO contained in the pilots and then removes it from the pilots. After STO correction, it estimates the CFO by using the STO-corrected pilots. Using the estimated CFO and STO, it then performs fine STO and CFO correction. The proposed STO and CFO synchronization block can estimate and remove both the STO and the CFO by using an average unit, a phase estimator, an NCO, and a complex multiplier. The proposed architecture achieves low H/W complexity because the average unit, the phase estimator, the NCO, and the complex multiplier are shared by both estimations.
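The two-step procedure (STO first, then CFO on the STO-corrected pilots) can be sketched in NumPy. This assumes, for illustration, that the pilots of two consecutive OFDM symbols are compared, that the pilots are uniformly spaced, and that the pilot phase change follows slope·k + common; the function name and the test values are hypothetical. Both steps reuse the same primitives (sum for averaging, angle for phase estimation, a complex exponential as the NCO, and a complex multiply), which is the sharing the text describes.

```python
import numpy as np


def residual_sto_cfo(p_prev, p_curr, pilot_idx):
    """Estimate residual STO (phase slope across carriers) and then residual CFO
    (common phase) from the pilots of two consecutive OFDM symbols."""
    z = p_curr * np.conj(p_prev)                     # per-pilot phase change between symbols
    spacing = pilot_idx[1] - pilot_idx[0]            # assumes uniformly spaced pilots
    # Step 1, STO: average the phase step between neighbouring pilots.
    slope = np.angle(np.sum(z[1:] * np.conj(z[:-1]))) / spacing
    # Remove the STO ramp (NCO-style rotation applied with a complex multiply).
    z_sto_free = z * np.exp(-1j * slope * pilot_idx)
    # Step 2, CFO: average the STO-corrected pilots; only a common phase remains.
    common = np.angle(np.sum(z_sto_free))
    return slope, common


rng = np.random.default_rng(2)
pilot_idx = np.arange(0, 1200, 12)
p_prev = rng.standard_normal(len(pilot_idx)) + 1j * rng.standard_normal(len(pilot_idx))
slope_true, common_true = 0.002, 0.3                 # rad/carrier and rad/symbol (hypothetical)
p_curr = p_prev * np.exp(1j * (slope_true * pilot_idx + common_true))
slope, common = residual_sto_cfo(p_prev, p_curr, pilot_idx)
print(round(float(slope), 6), round(float(common), 6))   # 0.002 0.3
```

Under the usual OFDM model, the common phase per symbol corresponds to a normalized CFO of ε = common·N / (2π(N + N_GI)), and the slope to a timing drift of slope·N / (2π) samples per symbol (the sign depends on the FFT convention). The slope step only resolves |slope·spacing| < π, so the pilot spacing bounds the residual STO that can be tracked.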

Implementation Results and Lab Test
The proposed DVB-X2 receiver is implemented with a tuner block, an H/W-based receiver engine, a frame processor, and an A/V decoder, as shown in Figure 14. The tuner block consists of DVB-T2, DVB-C2, and DVB-S2 tuner modules. The DVB-T2 and DVB-C2 tuner modules are implemented identically, each using a half-network interface module (NIM) type tuner for DVB-T/C and a 14-bit analog-to-digital converter (ADC); the DVB-S2 tuner module uses a half-NIM type tuner for DVB-S/S2 and a 10-bit ADC. The DVB-X2 receiver engine for the demodulation and channel decoding functions is implemented in two FPGA devices, and the implementation results are given in Table 3. The frame processor is implemented with 256 MB of memory and a DSP operating at up to 1.0 GHz, and the A/V decoder is implemented with a commercial chip for IP set-top boxes. Table 4 compares the FPGA resource utilization of the DVB-T2, DVB-C2, DVB-S2, and DVB-X2 demodulators. As shown in Table 4, the proposed DVB-X2 demodulator requires the same amount of memory as the DVB-T2 demodulator, while it requires 43.3%, 10.5%, and 16.7% more slice registers, slice LUTs, and DSP48Es, respectively.
To verify the functionality and reliability of the proposed DVB-X2 receiver, tests were conducted in the laboratory, as shown in Figure 15. Commercial PC-based modulators were used to generate DVB-T2, DVB-C2, and DVB-S2-modulated signals. The laboratory test ran continuously for more than 24 hours, and the results show that the proposed DVB-X2 receiver operated without any service interruption. To verify the functionality and performance of the frame processor, the time required to process a T2 frame generated with the test parameters of Table 5 was measured. DVB-T2, DVB-C2, and DVB-S2 all have similar BB frame structures, but baseband signal processing in DVB-T2 is the most complex, so the T2 frame was used for this test.
The T2 frame, which consists of the P1 symbol, the P2 symbol, and data symbols, is received every 9268 µs. Table 6 shows the measured processing time for the T2 frame. The processing time was measured with the DSP internal timer, which has a clock cycle of 6 ns. The measurement shows that the frame processor needs 3717 µs to process a T2 frame, which satisfies the required 9268 µs. The frame de-mapping function, the frame decoding function, and the frequency, cell, and time de-interleaving functions require relatively little processing time. The proposed receiver was also tested under various DVB-T2/C2/S2 operation modes as specified in the standards and demonstrated successful operation in all test scenarios.
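The real-time claim reduces to simple arithmetic on the reported figures (a 9268 µs frame period, 3717 µs of processing, and a 6 ns timer tick). The snippet below only re-derives the processor load and the slack per frame from those numbers; the tick count assumes the timer advances exactly once every 6 ns.

```python
FRAME_PERIOD_US = 9268      # a T2 frame arrives every 9268 us
PROCESSING_US = 3717        # measured DSP processing time per T2 frame
TIMER_TICK_NS = 6           # DSP internal timer resolution

load = PROCESSING_US / FRAME_PERIOD_US           # fraction of the frame period used
slack_us = FRAME_PERIOD_US - PROCESSING_US       # time left before the next frame
ticks = PROCESSING_US * 1000 // TIMER_TICK_NS    # timer ticks spanned by 3717 us
print(f"load = {load:.1%}, slack = {slack_us} us, ticks = {ticks}")
# load = 40.1%, slack = 5551 us, ticks = 619500
```

A load of about 40% means the frame processor finishes each T2 frame well before the next one arrives, which is the condition the measurement is meant to confirm.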

Figure 15. Test environment for the proposed DVB-X2 receiver (PC-based modulator and DVB-X2 receiver).

Conclusions
In this paper, an integrated DVB-X2 receiver architecture that supports multi-mode broadcasting standards on a single platform was proposed. The proposed receiver architecture uses a common acceleration engine to receive DVB-T2, DVB-C2, and DVB-S2-modulated signals, and its functional blocks are divided between the H/W engine and the S/W engine. The proposed receiver is implemented with the tuner block, two FPGA devices, the DSP, 256 MB of memory, and the A/V decoder. According to the implementation results, the proposed receiver uses the same amount of memory as the DVB-T2 receiver, with only a small increase in logic resources. Thus, the proposed receiver architecture can support the DVB-T2, DVB-C2, and DVB-S2 standards while minimizing the additional H/W resources required. The functionality of the proposed receiver was verified under various DVB-T2/C2/S2 operation modes as specified in the standards. The proposed DVB-X2 receiver is applicable to consumer digital broadcasting systems such as DVB set-top boxes and digital HDTVs.