Design and Realization of Dynamically Adjustable Multi-Pulse Real-Time Coherent Integration System

Bi, Jinrui; Zhang, Hongyu; Sun, Lihua; Jiang, Qingchao

doi:10.3390/electronics15020397


by Jinrui Bi, Hongyu Zhang, Lihua Sun * and Qingchao Jiang

School of Information Science and Engineering, East China University of Science and Technology, Shanghai 200237, China

* Author to whom correspondence should be addressed.

Electronics 2026, 15(2), 397; https://doi.org/10.3390/electronics15020397

Submission received: 8 December 2025 / Revised: 3 January 2026 / Accepted: 9 January 2026 / Published: 16 January 2026

(This article belongs to the Special Issue From Circuits to Systems: Embedded and FPGA-Based Applications)


Abstract

Radar signal coherent integration technology is a critical method to improve the performance of detection systems. However, existing techniques face challenges regarding real-time performance and the flexibility of multi-pulse coherent accumulation. In this paper, a dynamically configurable multi-pulse multi-frame real-time coherent integration system based on FPGA is designed and implemented, and the dynamic configuration of the number of pulses and the number of frames stored for each pulse is realized through the host computer. The experimental results show that the output signal delay of coherent integration is 33 microseconds at 40 pulses, and the energy gain reaches 16 dB at 40 pulses, which provides a dynamically configurable hardware platform and solution for real-time coherent integration of high-frame-count, multi-pulse radar signals.

Keywords:

radar detection; coherent integration; FPGA; dynamic configuration

1. Introduction

Coherent accumulation has become a critical technique in modern radar and communication systems for combating channel fading and enhancing the detection of weak signals [1,2,3]. By integrating multiple radar pulse echoes before detection, this technique effectively suppresses random noise, thereby significantly improving the signal-to-noise ratio and detection probability in complex electromagnetic environments [1,4,5,6,7].

Low-latency operation during coherent accumulation is paramount for practical deployment. In military and security scenarios, such as missile warning or fighter radar systems, real-time processing minimizes the latency between target discovery and system response, ensuring timely interception or evasion [8,9,10]. Likewise, platforms such as autonomous vehicles and drones rely on radar to sense the environment in real time. For instance, a vehicle traveling at 100 km/h (about 27.8 m/s) covers roughly 0.28 m during a 10 ms coherent accumulation delay, and 2.8 m if the delay grows to 100 ms, which directly reduces the available braking margin.

Conventional implementation platforms for coherent accumulation face significant limitations [11]. Central Processing Units (CPUs), constrained by their sequential architecture, struggle to satisfy the temporal constraints inherent in high-throughput radar streams. While Digital Signal Processors (DSPs) provide some parallelism, their limited number of multiply-accumulate (MAC) units renders them inefficient for large-scale computations. Application-Specific Integrated Circuits (ASICs) offer optimal performance for a given algorithm but lack flexibility; any change in radar parameters requires costly and time-consuming re-fabrication, as they cannot be reconfigured through software [12,13,14]. Furthermore, most existing implementations employ a fixed number of accumulation pulses, a rigid approach ill-suited for dynamic targets and varying environments. For instance, tracking a high-velocity maneuvering target requires reducing the pulse count to mitigate Doppler spread, whereas for a low-velocity target, increasing the pulse count is preferable to maximize the signal-to-noise ratio [15,16,17]. To address these challenges of real-time performance and flexibility, this paper proposes a dynamically configurable, multi-pulse coherent accumulation scheme utilizing a Field-Programmable Gate Array (FPGA).
Our approach is implemented on a high-performance hardware platform comprising an Analog Devices AD9361 RF transceiver and a Xilinx Ultrascale series FPGA. The core of our design leverages the high-bandwidth capabilities of DDR4 memory, combined with an innovative data framing architecture, to enable real-time processing of the radar pulse stream. Consequently, the proposed system not only satisfies stringent real-time requirements but also facilitates runtime parameter tuning of the integration interval, demonstrating superior adaptability and flexibility for diverse operational scenarios.

To provide the necessary theoretical grounding, the mathematical basis of coherent integration is briefly introduced. Assuming the radar transmits $N$ pulses, the received signal for the $n$-th pulse can be modeled as

$$x_{n}(t) = s_{n}(t) + w_{n}(t)$$

where $s_{n}(t)$ represents the target echo and $w_{n}(t)$ denotes the additive white Gaussian noise. The process of coherent integration involves the summation of these $N$ pulse echoes, which can be expressed as

$$y(t) = \sum_{n=0}^{N-1} x_{n}(t) = \sum_{n=0}^{N-1} s_{n}(t) + \sum_{n=0}^{N-1} w_{n}(t) \tag{1}$$

Since the target signals $s_{n}(t)$ are phase-aligned (coherent), they add up constructively (voltage sums linearly), whereas the noise components $w_{n}(t)$ are uncorrelated and add up incoherently. Consequently, for $N$ accumulated pulses, the signal power increases by a factor of $N^{2}$, while the noise power increases by a factor of $N$, resulting in a Signal-to-Noise Ratio (SNR) improvement of $N$ (or $10 \log_{10} N$ dB).
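The $N^{2}$ versus $N$ argument can be checked numerically. The following is a minimal simulation sketch, not the authors' test procedure: it assumes a real-valued echo that is identical in every pulse, independent Gaussian noise per pulse, and illustrative values for the sample count, tone frequency, and noise level. The function name `coherent_gain_db` is hypothetical.

```python
import math
import random


def coherent_gain_db(n_pulses, n_samples=4000, noise_sigma=1.0, seed=0):
    """Estimate the SNR gain (dB) obtained by summing n_pulses aligned echoes."""
    rng = random.Random(seed)
    # Assumption: the echo s_n(t) is identical in every pulse (perfect alignment).
    echo = [math.cos(2.0 * math.pi * 0.05 * k) for k in range(n_samples)]

    summed_signal = [0.0] * n_samples
    summed_noise = [0.0] * n_samples
    first_pulse_noise = None
    for _ in range(n_pulses):
        # Independent white Gaussian noise w_n(t) for each pulse.
        noise = [rng.gauss(0.0, noise_sigma) for _ in range(n_samples)]
        if first_pulse_noise is None:
            first_pulse_noise = noise
        for k in range(n_samples):
            summed_signal[k] += echo[k]
            summed_noise[k] += noise[k]

    def power(x):
        return sum(v * v for v in x) / len(x)

    snr_in = power(echo) / power(first_pulse_noise)    # single-pulse SNR
    snr_out = power(summed_signal) / power(summed_noise)  # SNR after summation
    return 10.0 * math.log10(snr_out / snr_in)


if __name__ == "__main__":
    for n in (1, 15, 30, 40):
        print(n, round(coherent_gain_db(n), 2), round(10.0 * math.log10(n), 2))
```

Because signal and noise are tracked separately, the estimate isolates the effect described in the text; with a finite number of samples it fluctuates slightly around $10 \log_{10} N$.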

Beyond SNR improvement, coherent integration plays a pivotal role in determining radar resolution. While range resolution is primarily dictated by the signal bandwidth ($\Delta R = c / (2B)$), coherent integration extends the coherent processing interval (CPI), which directly refines the Doppler (velocity) resolution. Theoretically, the Doppler resolution is inversely proportional to the total integration time ($\Delta f_{d} = 1 / T_{\mathrm{CPI}}$). Therefore, by dynamically increasing the number of accumulated pulses, the system not only enhances weak signal detection but also improves the capability to distinguish targets with small velocity differences.
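As a worked illustration of the two resolution formulas, the sketch below evaluates them for a few pulse counts. It assumes that the CPI equals the pulse count times a constant pulse repetition interval ($T_{\mathrm{CPI}} = N \cdot \mathrm{PRI}$) and uses the standard relation $\Delta v = \lambda \Delta f_{d} / 2$ for velocity resolution. The bandwidth, PRI, and carrier frequency in the usage section are illustrative values, not parameters reported in this paper.

```python
C = 299_792_458.0  # speed of light in m/s


def range_resolution_m(bandwidth_hz):
    """Range resolution: delta_R = c / (2 B)."""
    return C / (2.0 * bandwidth_hz)


def doppler_resolution_hz(n_pulses, pri_s):
    """Doppler resolution: delta_f_d = 1 / T_CPI, assuming T_CPI = N * PRI."""
    return 1.0 / (n_pulses * pri_s)


def velocity_resolution_mps(n_pulses, pri_s, carrier_hz):
    """Velocity resolution: delta_v = wavelength * delta_f_d / 2."""
    wavelength = C / carrier_hz
    return wavelength * doppler_resolution_hz(n_pulses, pri_s) / 2.0


if __name__ == "__main__":
    # Hypothetical operating point chosen only for illustration.
    bandwidth_hz, pri_s, carrier_hz = 20e6, 500e-6, 2.4e9
    print("range resolution [m]:", range_resolution_m(bandwidth_hz))
    for n in (1, 10, 40):
        print(n, doppler_resolution_hz(n, pri_s),
              velocity_resolution_mps(n, pri_s, carrier_hz))
```

Under these assumptions, raising the pulse count from 10 to 40 narrows the Doppler resolution cell by a factor of four, while the range resolution is unaffected.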

2. Overall System Logic Design

The overall hardware architecture of the system, illustrated in Figure 1, is centered around a Xilinx Kintex Ultrascale FPGA (XCKU040-FFVA1156-2-I; AMD, Santa Clara, CA, USA) and an Analog Devices AD9361 agile RF transceiver (Analog Devices, Norwood, MA, USA). Serving as the high-performance RF front-end, the AD9361 is responsible for critical functions including RF signal transmission and reception, frequency synthesis, and digital up/down-conversion. The FPGA acts as the system’s central processing and control hub, tasked with the real-time storage of radar data, execution of the core coherent accumulation algorithm, and generation of linear frequency modulation (LFM) waveforms. Furthermore, the FPGA manages all external communication, receiving configuration commands from a host computer via a serial interface and transmitting the processed data at high speed over Gigabit Ethernet using the UDP protocol. Together, these components form a complete and efficient pipeline for data acquisition, processing, and communication.

Centering on an embedded MicroBlaze soft-core processor (AMD, Santa Clara, CA, USA), the top-level architecture (illustrated in Figure 2) implements a hardware-software co-processing paradigm. This design strategically separates the system’s operation into a control path, managed by the processor, and a high-speed data path, implemented in programmable logic. In the control path, the MicroBlaze processor acts as the master controller. It configures the AD9361 transceiver’s internal registers via the SPI bus to set key parameters (e.g., frequency, gain, bandwidth) and processes commands received from a host computer through a UART module. To communicate with the data path, it uses the AXI4-Lite bus protocol to send control signals to the FPGA’s programmable logic (PL), permitting the flexible orchestration of specialized hardware components.

In the data path, the MicroBlaze generates and writes custom waveform data into on-chip Block RAM (BRAM). From there, dedicated driver logic within the PL reads this data at high speed and streams it to the AD9361 for transmission. The precise timing of this entire process is managed by PL-based timer and interrupt controllers, ensuring signal periodicity and stability. This architecture fully leverages the flexibility of the MicroBlaze for complex control tasks while exploiting the massive parallelism of the FPGA’s PL for high-performance data processing.

2.1. BRAM Data Transfer Module

This module is responsible for generating and buffering the transmission waveforms. The data flow begins when the MicroBlaze soft-core processor receives configuration parameters, such as frequency, pulse width, and amplitude, from a host computer via the UART interface. Based on these parameters, the processor software generates the corresponding digital waveform sequence. Finally, using the AXI bus, the MicroBlaze writes the generated data into the FPGA’s on-chip Block RAM (BRAM), where it is cached for subsequent high-speed access by the programmable logic.

2.2. AD9361 Data Driver Module Design

In this work, the AD9361 is configured via its SPI port to operate in a dual-receiver, dual-transmitter (2R2T) mode using a Low-Voltage Differential Signaling (LVDS) interface. The data transfer timing for the receive (RX) path is detailed in Figure 3. This interface employs a source-synchronous, double data rate (DDR) scheme where each 12-bit I/Q sample is transmitted to the FPGA over two full clock cycles. The RX_FRAME signal functions as a channel selector: a high level indicates data transmission for the first channel, while a low level indicates transmission for the second. The transmit (TX) interface operates based on a similar protocol.

The AD9361 provides the FPGA with six differential data pairs, one differential source-synchronous clock pair, and a frame synchronization signal. The physical layer implementation for this interface, detailed in the block diagram in Figure 4, follows a structured processing chain. First, the incoming differential clock is converted to a single-ended signal by an IBUFDS primitive and then routed through a BUFG global buffer to ensure high-quality, low-skew clock distribution. The differential data and frame signals are similarly passed through IBUFDS primitives. To compensate for signal skew arising from PCB trace length mismatches, each of these single-ended signals is then fed into an IDELAYE2 primitive for fine-grained delay tuning. Finally, the delay-compensated signals are input to IDDR primitives, which de-serialize the double-data-rate stream by capturing data on both the rising and falling clock edges. The resulting parallel bits are then concatenated to reconstruct the full 12-bit I and Q data samples. The transmit path is implemented using an analogous methodology.
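The final concatenation step can be modeled in software. The sketch below is a behavioral illustration only: it assumes that each 12-bit sample arrives as two 6-bit halves, most significant half first, in the order I[11:6], Q[11:6], I[5:0], Q[5:0], and that samples are two's-complement. This ordering should be checked against the AD9361 reference manual before reuse, and the function names are hypothetical.

```python
def sign_extend_12(value):
    """Interpret the low 12 bits of value as a two's-complement integer."""
    value &= 0xFFF
    return value - 0x1000 if value & 0x800 else value


def split_sample(sample):
    """Split a signed 12-bit sample into (upper 6 bits, lower 6 bits)."""
    raw = sample & 0xFFF
    return (raw >> 6) & 0x3F, raw & 0x3F


def assemble_iq(i_msb, q_msb, i_lsb, q_lsb):
    """Rebuild signed 12-bit I and Q from four 6-bit bus captures.

    Assumed capture order (one 6-bit word per clock edge):
    I[11:6], Q[11:6], I[5:0], Q[5:0].
    """
    i_raw = ((i_msb & 0x3F) << 6) | (i_lsb & 0x3F)
    q_raw = ((q_msb & 0x3F) << 6) | (q_lsb & 0x3F)
    return sign_extend_12(i_raw), sign_extend_12(q_raw)


if __name__ == "__main__":
    i_hi, i_lo = split_sample(-5)
    q_hi, q_lo = split_sample(1234)
    print(assemble_iq(i_hi, q_hi, i_lo, q_lo))
```

Four 6-bit captures per I/Q pair correspond to two full clock cycles at double data rate, which matches the timing described for Figure 3.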

2.3. Data Storage and Coherent Integration Module

The data storage and coherent accumulation capabilities of this system are built upon a high-performance DDR4 SDRAM operating at a clock frequency of 1200 MHz, which provides the bandwidth needed for continuous stream processing of high-volume radar data. The module processes a composite 24-bit data stream (12-bit I and 12-bit Q) from the AD9361 at a 40 MHz sampling rate. Because DDR4 memory requires periodic refresh cycles and cannot service read and write operations simultaneously, we employ a time-division operational strategy: one full frame of data is written before the read-and-process sequence for multiple frames is initiated.

To enable flexible pulse accumulation, the system supports a dynamically configurable range of 1 to 40 pulses. Accordingly, the 512 MB physical address space of the DDR4 is logically partitioned into 40 distinct 12.8 MB regions, each dedicated to storing the data from a single pulse echo. For efficient data management and memory alignment, we define a ‘data frame’ as a 2 KB block containing 680 24-bit samples. This structure allows the storage depth for each pulse, configurable from 1 to 18,823 frames, to be dynamically set by the host computer based on the specific detection task. As illustrated in Figure 5, the module comprises three core sub-modules: Data Write, Coherent Accumulation, and Data Computation, which collectively manage the entire workflow from data buffering to the final accumulated output.

2.3.1. Data Write Module

The data write module first monitors the initialization completion status of the MIG (Memory Interface Generator) IP core to ensure that the DDR4 SDRAM is ready. Upon receiving a ‘start accumulating’ command from the MicroBlaze processor, the module starts writing the incoming 24-bit I/Q data stream from the AD9361 into a dedicated asynchronous FIFO buffer. The write port bit width of this FIFO is configured to 24 bits to directly match the input data. To accommodate DDR4 burst transfers, the read port bit width of the FIFO is extended to 192 bits (24 bits × 8) accordingly.
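The 24-bit to 192-bit width conversion can be illustrated with a small packing model. This is a sketch under stated assumptions rather than a description of the FIFO IP: it assumes the first sample written lands in the most significant 24 bits of the 192-bit word, and the helper names are hypothetical. It also shows that a 680-sample frame maps to exactly 85 such words (2040 bytes), which fits within the 2 KB frame used in this design.

```python
SAMPLE_BITS = 24        # 12-bit I + 12-bit Q
SAMPLES_PER_WORD = 8    # 8 x 24 bits = 192 bits
FRAME_SAMPLES = 680     # samples per data frame, as stated in the text


def pack_word(samples):
    """Pack eight 24-bit samples into one 192-bit integer (first sample on top)."""
    if len(samples) != SAMPLES_PER_WORD:
        raise ValueError("exactly 8 samples are required per 192-bit word")
    word = 0
    for sample in samples:
        word = (word << SAMPLE_BITS) | (sample & 0xFFFFFF)
    return word


def unpack_word(word):
    """Recover the eight 24-bit samples from a 192-bit integer."""
    return [
        (word >> (SAMPLE_BITS * (SAMPLES_PER_WORD - 1 - k))) & 0xFFFFFF
        for k in range(SAMPLES_PER_WORD)
    ]


def frame_to_words(frame):
    """Convert a frame of 24-bit samples into a list of 192-bit words."""
    if len(frame) % SAMPLES_PER_WORD:
        raise ValueError("frame length must be a multiple of 8 samples")
    return [
        pack_word(frame[k:k + SAMPLES_PER_WORD])
        for k in range(0, len(frame), SAMPLES_PER_WORD)
    ]


if __name__ == "__main__":
    words = frame_to_words(list(range(FRAME_SAMPLES)))
    print(len(words), "words,", len(words) * 192 // 8, "bytes")
```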

2.3.2. Coherent Integration Module

This module orchestrates the intricate process of storing and retrieving data frames from DDR4 memory for accumulation. Its operation begins by reconstructing a complete data frame (680 samples, 2 KB) from the 192-bit-wide stream provided by the Data Write module’s output FIFO. The module’s logic is best understood in two phases: an initial buffer-filling phase and a steady-state circular buffer phase.

Initial Filling Phase: Initially, the system writes incoming data frames sequentially into the N dynamically configured pulse regions within the DDR4 memory. The memory address pointer automatically advances to the next region once the current one is filled to its configured frame depth. This process continues until all N pulse regions have been populated for the first time.

Steady-State Circular Buffer Phase: Once the buffers are full, the module transitions to its steady-state operation, functioning as a large, multi-pulse circular buffer. Here, the arrival of a new data frame, which is used to overwrite the oldest corresponding frame data, simultaneously triggers a synchronized read operation. This read command fetches data from the same frame address across all N active pulse regions. This critical step provides the N parallel data streams required for the subsequent computation stage. As shown in Figure 6, the data read from each of the N regions is then temporarily buffered into one of 40 dedicated, 192-bit-wide FIFOs, with the number of active FIFOs precisely matching the configured pulse count.

To further clarify the control logic of the time-division strategy, a state diagram is presented in Figure 7. The system initiates in the IDLE state and transitions to the Initial Filling Phase state upon receiving the start command. In this phase, data is sequentially written to the DDR4 memory until all pulse buffers are filled. Subsequently, the system enters the Steady-State (Circular Buffer) phase. In this state, a ‘Write-Read’ time-division cycle is executed: each new frame write triggers a simultaneous read operation from all N pulse addresses, ensuring real-time data coherency.
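The two phases can be captured in a short behavioral model. The sketch below is a software illustration, not the HDL: the class and method names are hypothetical, frames are modeled as lists of integers, and it assumes that the synchronized read happens after the new frame has been written, so the sum includes the newest frame.

```python
class MultiPulseBuffer:
    """Behavioral model of N pulse regions, each holding `depth` frames."""

    def __init__(self, n_pulses, depth):
        self.n_pulses = n_pulses
        self.depth = depth
        self.regions = [[None] * depth for _ in range(n_pulses)]
        self.frames_written = 0

    def push(self, frame):
        """Store one frame; return the accumulated frame once in steady state."""
        region = (self.frames_written // self.depth) % self.n_pulses
        slot = self.frames_written % self.depth
        in_steady_state = self.frames_written >= self.n_pulses * self.depth

        # Write: during filling this populates an empty slot, afterwards it
        # overwrites the oldest frame stored at this slot position.
        self.regions[region][slot] = frame
        self.frames_written += 1

        if not in_steady_state:
            return None  # initial filling phase: no output yet

        # Synchronized read: same slot address across all N regions, then sum.
        columns = zip(*(self.regions[r][slot] for r in range(self.n_pulses)))
        return [sum(column) for column in columns]


if __name__ == "__main__":
    buf = MultiPulseBuffer(n_pulses=3, depth=2)
    for t in range(9):
        print(t, buf.push([t, 10 * t]))
```

In this model the first output appears only after $N \times D$ frames have been stored, and every later frame produces exactly one accumulated frame.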

Quantitative timing model. Let the input frame rate be $f_{\mathrm{frm}}$. With $N$ pulse regions and a configured depth of $D$ frames per region, the initial fill time $T_{\mathrm{fill}}$ can be approximated as

$$T_{\mathrm{fill}} \approx \frac{N \times D}{f_{\mathrm{frm}}} \tag{2}$$

After initialization, the steady-state scheduling follows a fixed per-frame pipeline: write, synchronized read, FIFO dispatch, and compute. Hence, the steady-state output latency $T_{\mathrm{ss}}$ can be expressed as

$$T_{\mathrm{ss}} \approx T_{\mathrm{wr}} + T_{\mathrm{rd}} + T_{\mathrm{fifo}} + T_{\mathrm{pipe}} \tag{3}$$

where $T_{\mathrm{wr}}$ and $T_{\mathrm{rd}}$ are the DDR4 write and read service times (including arbitration and refresh effects), $T_{\mathrm{fifo}}$ denotes the buffering and handshake delay, and $T_{\mathrm{pipe}}$ is the fixed computation pipeline delay. This explains why the output latency remains nearly constant once the pipeline is full.
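The fill-time approximation is easy to evaluate for a given configuration. The sketch below assumes that frames arrive back to back, so the frame rate is the 40 MHz sampling rate divided by 680 samples per frame. That relation is an assumption made here for illustration, and the results are estimates rather than a substitute for the measured values in Table 1.

```python
def frame_rate_hz(sample_rate_hz=40e6, samples_per_frame=680):
    """Assumed input frame rate for a continuous, gap-free sample stream."""
    return sample_rate_hz / samples_per_frame


def fill_time_s(n_pulses, depth_frames, f_frm_hz):
    """Initial fill time: T_fill ~= N * D / f_frm."""
    return n_pulses * depth_frames / f_frm_hz


if __name__ == "__main__":
    f_frm = frame_rate_hz()
    for n_pulses, depth in ((1, 1), (40, 1), (40, 100)):
        t_fill = fill_time_s(n_pulses, depth, f_frm)
        print(n_pulses, depth, f"{t_fill * 1e3:.3f} ms")
```

The estimate scales linearly with both the pulse count and the frame depth, whereas the steady-state latency does not depend on the frame depth.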

2.3.3. Data Computing Module

This module performs the parallel summation of the data streams from the multiple output FIFOs. A naive approach of summing up to 40 input channels in a single clock cycle would result in an extremely high logic fan-in, creating a critical path that makes achieving timing closure impossible at the target frequency. To overcome this, our design implements a three-stage pipelined adder tree, as illustrated in Figure 8.

Stage 1: The 40 input data streams are divided into groups, and seven parallel adders generate seven partial sums. Stage 2: The seven partial sums from the first stage are further reduced by a second layer of adders, yielding two intermediate sums. Stage 3: A final adder combines the two intermediate sums to produce the fully accumulated output signal. This hierarchical, pipelined architecture systematically breaks the long combinatorial path into shorter, manageable segments, which significantly reduces the overall path delay and ensures stable operation at high clock frequencies.

To ensure high-precision processing, the input signal consists of 12-bit signed integers for both I and Q channels. Accumulating up to 40 pulses requires a theoretical dynamic range expansion of approximately 5.3 bits ($\log_{2} 40$). To prevent overflow and avoid truncation errors, the internal adder tree and the final output interface are designed with a 24-bit width per channel (48-bit total for I/Q), as shown in Figure 5. This provides sufficient headroom (24 bits > 12 bits + 6 bits) to maintain full precision throughout the integration process, ensuring that no numerical degradation occurs.
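The reduction structure and the bit-growth argument can be sketched together. The model below assumes a grouping of six inputs per first-stage adder (six groups of six and one group of four) and a 4 + 3 split in the second stage. The text specifies only the 40 to 7 to 2 to 1 reduction, so these group sizes are an assumption, as is treating inactive FIFOs as zero inputs. The function names are hypothetical.

```python
def chunk(values, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [values[k:k + size] for k in range(0, len(values), size)]


def adder_tree_40(inputs):
    """Three-stage reduction of up to 40 inputs: 40 -> 7 -> 2 -> 1."""
    if len(inputs) > 40:
        raise ValueError("at most 40 input channels are supported")
    padded = list(inputs) + [0] * (40 - len(inputs))  # inactive channels add 0
    stage1 = [sum(group) for group in chunk(padded, 6)]  # 7 partial sums
    stage2 = [sum(group) for group in chunk(stage1, 4)]  # 2 intermediate sums
    return sum(stage2)                                   # final accumulated value


def required_bits(input_bits, n_inputs):
    """Width needed to sum n_inputs signed values of input_bits without overflow."""
    return input_bits + (n_inputs - 1).bit_length()


if __name__ == "__main__":
    worst_case = adder_tree_40([-2048] * 40)  # most negative 12-bit input
    print(worst_case, required_bits(12, 40))
```

In a pipelined implementation each stage would be registered, giving the three-cycle latency observed for `coherent_result`; the Python model only reproduces the arithmetic.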

2.4. Ethernet Data Transfer Module

This module manages the high-speed uplink of processed data to a host computer via Gigabit Ethernet. At the hardware level, the system employs a Micrel KSZ9031RNX Physical Layer (PHY) transceiver (Micrel, San Jose, CA, USA), which interfaces with the FPGA through the Reduced Gigabit Media Independent Interface (RGMII) standard.

At the protocol level, the implementation is based on the TCP/IP suite. We specifically selected the User Datagram Protocol (UDP) as the transport layer protocol, as it is ideally suited for the real-time, continuous data streams generated by the radar system. Unlike connection-oriented protocols such as TCP, UDP eliminates the overhead associated with connection establishment and data acknowledgment handshakes. This results in significantly lower latency and higher transmission efficiency, making it the optimal choice for high-throughput, real-time streaming applications. The internal logic of the UDP transmitter is implemented as a finite state machine (FSM), as depicted in Figure 9.

The transmission data path begins with the final accumulated results from the Data Computation module being fed into a dedicated FIFO within the Ethernet module. This FIFO serves as a packet assembly buffer, interfacing with the byte-oriented network stack by providing an 8-bit (1-byte) wide read port. The system monitors the fill level of this buffer continuously. Once the amount of cached data reaches 1500 bytes—a size selected to match a typical Ethernet Maximum Transmission Unit (MTU)—the UDP packet transmission sequence is triggered, sending a full data payload to the host computer.

2.5. GUI Interface and Operation

System control and parameter configuration are handled by a custom host-side Graphical User Interface (GUI) developed in MATLAB (Version R2023a), shown in Figure 10. The application, which communicates with the hardware platform via a serial port, features an automatic scanning and connection function for user convenience. The GUI is organized into three primary panels.

Transceiver & Acquisition Settings: This panel allows for the configuration of the AD9361 RF front-end, including parameters like transmit attenuation and receive gain. Users can also set key acquisition parameters, such as the transmission duration and the number of data frames to capture. A ‘System Initialization’ button provides a one-click reset to default values.

Waveform Generation: Here, users can define the transmission waveform by specifying its amplitude and frequency. Generating a Linear Frequency Modulated (LFM) signal is straightforward: the user selects a checkbox, enters the LFM parameters, and clicks the ‘Generate Signal’ button.

Accumulation Control & System Status: This panel contains the most critical run-time control, empowering the operator to modify the pulse count on-the-fly for coherent accumulation (from 1 to 40). It also provides real-time hardware monitoring, where users can query the board’s temperature and voltage, with the readouts appearing in a status display console. It should be noted that the runtime adjustability process involves a latency of approximately 3 ms, attributed to serial transmission and processor interrupt handling. Furthermore, to prevent memory addressing errors when switching pulse counts, the system executes a soft reset of the data storage module upon receiving new configuration parameters. While this momentarily interrupts the data stream, it ensures the correctness of the subsequent integration cycle.

Processed data is transmitted from the FPGA over the network to the host computer, where it can be captured using a standard network utility like the ‘NetAssist’ tool (Version 5.0.14) shown in Figure 11. To receive the data stream, the utility is configured by selecting the UDP protocol, setting the local host’s IP address and listening port, and then activating the listening service. Furthermore, the tool includes an integrated data logging feature. Enabling the ‘Receive and save to file’ option automatically stores the payload of all incoming UDP packets to a local file, providing a convenient method for subsequent offline analysis.
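As an alternative to a GUI utility, the stream can be captured with a short script. The sketch below is a host-side illustration under several assumptions that are not stated in the text: each result occupies 6 bytes (a 24-bit signed I followed by a 24-bit signed Q, big-endian), and the listening port is a placeholder. The function names are hypothetical.

```python
import socket

BYTES_PER_SAMPLE = 6  # assumed layout: 3 bytes I, then 3 bytes Q


def parse_payload(payload):
    """Decode a UDP payload into complex samples; trailing partial data is dropped."""
    usable = len(payload) - len(payload) % BYTES_PER_SAMPLE
    samples = []
    for offset in range(0, usable, BYTES_PER_SAMPLE):
        i_val = int.from_bytes(payload[offset:offset + 3], "big", signed=True)
        q_val = int.from_bytes(payload[offset + 3:offset + 6], "big", signed=True)
        samples.append(complex(i_val, q_val))
    return samples


def receive_packets(port=8080, max_packets=10, bind_ip="0.0.0.0"):
    """Yield decoded sample lists from incoming UDP packets (port is a placeholder)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.bind((bind_ip, port))
        for _ in range(max_packets):
            payload, _sender = sock.recvfrom(2048)
            yield parse_payload(payload)


if __name__ == "__main__":
    for samples in receive_packets():
        print(len(samples), "samples received")
```

With this assumed layout, a 1500-byte payload carries 250 accumulated I/Q samples.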

3. System Test Results

The physical testbed, as shown in Figure 12, consists of a Xilinx Kintex Ultrascale development board (AXK040; ALINX, Shanghai, China) and an Analog Devices AD9361-based RF card (FMC-SDR001-9361; Milianke, Changzhou, China). The RF module is interfaced with the FPGA board through a high-speed FMC connector. External connections include a serial cable for system configuration and a Gigabit Ethernet cable for high-speed data transfer to the host computer.

3.1. Single Tone Signal Testing

To validate the logical correctness of the data storage and coherent accumulation module, a local loopback test was conducted. In this test, the AD9361 was configured via its SPI interface to operate in an internal digital loopback mode. This setup allows the device’s receive path to directly capture the single-tone signal generated by its own transmit path. We then used the Integrated Logic Analyzer (ILA) to observe the key internal signals in real time throughout this process.

The results of this test, captured using the ILA, are presented in Figure 13. The signals labeled rake1 to rake40 represent the data from the individual pulse echoes, while coherent_result is the final accumulated output. The waveform capture confirms two critical aspects of the design. First, the individual rake signals are perfectly phase-aligned, which validates the integrity and accuracy of the data storage and retrieval path through the DDR4 memory. Second, the coherent_result signal provides the correct summed output after a deterministic three-clock-cycle pipeline delay. This observation verifies the functional correctness of the Data Computation module and the validity of its timing design.

3.2. LFM Signal Testing

To verify that the pulse count can be reconfigured at run time, we tested the coherent accumulation of a Linear Frequency Modulated (LFM) signal with the pulse count set to 1, 15, 30, and 40 via the host computer GUI. The resulting accumulated output waveforms, captured using the Vivado (Version 2018.3) Integrated Logic Analyzer (ILA), are shown in Figure 14. These results confirm that the system correctly adjusts its processing pipeline in real time according to the user-configured pulse count, validating the flexibility of the design.

To quantify timing performance, we conducted latency tests by varying both the number of accumulation pulses and the number of data frames stored per pulse. We measured two key metrics: the initial buffer fill time and the steady-state output latency. As shown in Table 1, the initial buffer fill time reflects the duration required to populate the memory regions before the first accumulation result is available. It is directly proportional to the product of the pulse count and the number of frames stored per pulse. In contrast, the steady-state output latency remains constant regardless of the number of stored frames for a given pulse count. This demonstrates that the system’s output latency is not constrained by the history depth of the radar data, which is a significant advantage of the proposed DDR4-based time-division architecture.

Finally, to analyze the processing gain, both the accumulated LFM signal (40 pulses) and a single-pulse raw signal were exported via the UDP module to MATLAB. The comparative analysis is shown in Figure 15. The measured energy gain provides a critical metric for evaluating the system’s phase linearity. Theoretically, the SNR improvement factor for coherent integration of $N$ pulses is given by $G = 10 \log_{10}(N)$. For the maximum configuration of $N = 40$ pulses used in our experiment, the theoretical limit is

$$G_{\mathrm{theoretical}} = 10 \log_{10}(40) \approx 16.02\ \mathrm{dB} \tag{4}$$

Our experimental result of 16 dB is remarkably close to this theoretical value. This minimal discrepancy indicates that the system maintains excellent phase synchronization across the entire accumulation window.

3.3. Hardware Resource and Power Analysis

To evaluate the system’s suitability for embedded applications, we analyzed the hardware resource utilization and power consumption using the Xilinx Vivado suite (Version 2018.3). As shown in Table 2, the coherent integration module occupies only a fraction of the XCKU040’s resources (e.g., 10.16% LUTs and 0.42% DSPs), ensuring high expandability. Furthermore, the total estimated power consumption is 2.847 W, the static power consumption is 0.254 W, and the dynamic power consumption is 2.593 W. This combination of low resource occupancy and high energy efficiency confirms that the proposed system meets the stringent SWaP (Size, Weight, and Power) requirements of airborne radar platforms.

4. Discussion

4.1. Performance Comparison

To comprehensively evaluate the competitiveness of the proposed system, we conducted a comparative analysis against diverse architectures, including DSP, ASIC, and traditional BRAM-based FPGA solutions. The comparison focuses on processing latency, flexibility, and scalability, as summarized in Table 3.

As shown in Table 3, DSP-based solutions offer high flexibility but suffer from high latency due to serial instruction execution, often reaching millisecond levels for large-scale matrix computations. Conversely, ASIC implementations provide the lowest latency (<20 µs) but lack flexibility; parameters are hardwired, preventing dynamic reconfiguration. Traditional BRAM-based FPGAs offer low latency but are limited by on-chip memory capacity, which restricts pulse scalability.

To further benchmark the system, we compared it with specific state-of-the-art FPGA implementations. Existing streaming architectures, such as the real-time noise radar processor by Ankel et al. [18], achieve ultra-low processing latency (typically 10 µs) through fixed hardware pipelines. However, this architecture lacks flexibility; adjusting the number of integration pulses requires modifying the HDL code and re-synthesizing the bitstream. Similarly, BRAM-based designs, such as the long-time integration algorithm by Mi et al. [19], are severely constrained by the FPGA’s on-chip memory resources (typically <50 Mb), preventing the processing of high-frame-count signals.

In contrast, our proposed DDR4-based architecture leverages external SDRAM to achieve high scalability (up to 40 pulses) and dynamic configurability. Although this introduces a deterministic latency of 33 µs due to memory access cycles, this delay is negligible compared to the typical Pulse Repetition Interval (PRI) of radar systems (often hundreds of microseconds). Thus, the proposed design effectively bridges the gap between DSP flexibility and ASIC speed, making it an optimal solution for adaptive radar systems.

4.2. Robustness in Realistic Environments

While the current experimental validation was conducted under controlled laboratory conditions, realistic radar environments introduce factors such as Doppler shifts and phase noise that can impact integration efficiency. Specifically, large Doppler shifts caused by high-speed targets can introduce phase errors across the pulse train, potentially leading to coherence loss if the integration time is fixed. A key advantage of the proposed architecture is its dynamic configurability: it allows the system to reduce the number of accumulation pulses (e.g., from 40 to 10) to ensure the integration window remains within the target’s coherence time, thereby mitigating Doppler effects. Furthermore, regarding low SNR conditions, the measured 16 dB energy gain (Figure 15) confirms the system’s robustness and its ability to significantly improve the signal-to-noise ratio for weak target detection.
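The benefit of shortening the integration window under Doppler can be illustrated with a simple phasor model. The sketch below assumes a constant, uncompensated Doppler shift, a constant pulse repetition interval, and plain summation of the pulses. The Doppler and PRI values in the usage section are hypothetical and are not taken from the experiments in this paper.

```python
import cmath
import math


def coherent_gain_factor(n_pulses, doppler_hz, pri_s):
    """Normalized amplitude of the sum of N echoes with a per-pulse phase step.

    Returns 1.0 for perfect alignment and approaches 0 when the phase
    rotates through a full cycle across the integration window.
    """
    phase_step = 2.0 * math.pi * doppler_hz * pri_s  # phase advance per pulse
    total = sum(cmath.exp(1j * phase_step * n) for n in range(n_pulses))
    return abs(total) / n_pulses


if __name__ == "__main__":
    doppler_hz, pri_s = 50.0, 500e-6  # hypothetical target and PRI
    for n in (10, 20, 40):
        print(n, round(coherent_gain_factor(n, doppler_hz, pri_s), 4))
```

With these illustrative values, 40 pulses span one full Doppler cycle and the echoes largely cancel, whereas 10 pulses span a quarter cycle and retain most of the coherent amplitude.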

4.3. Scalability Analysis

The current system configuration is limited to 40 pulses primarily due to two architectural factors: the timing closure challenges associated with the depth of the parallel adder tree (Section 2.3.3) and the read bandwidth saturation of the single-channel DDR4 interface during the time-division retrieval phase. To support higher pulse counts (e.g., >100) in future iterations, the design could be extended by adopting a cascaded accumulation strategy to reduce logic fan-in or by upgrading the hardware platform to utilize High Bandwidth Memory (HBM), which would significantly expand the available read throughput for massive multi-pulse processing.

5. Conclusions

This paper has presented the design and implementation of a real-time coherent accumulation system, based on an FPGA and the AD9361 transceiver, that offers significant advantages in both processing speed and configuration flexibility. The core innovation is a novel time-division architecture that leverages the high bandwidth of DDR4 SDRAM. This approach overcomes the real-time bottlenecks of conventional methods, achieving a continuous stream of accumulated results with a remarkably low latency of just 33 microseconds post-initialization. Furthermore, the system provides exceptional adaptability through a MATLAB-based host GUI, which facilitates the dynamic configuration of the pulse count (1–40) and frame depth (1–18,823) to meet various operational demands. Experimental results, demonstrating an energy gain of 16 dB for a 40-pulse accumulation, have validated the effectiveness and robustness of the proposed design.

To transition the system from a laboratory prototype to an autonomous embedded sensor, the dependence on the host PC can be removed by leveraging the on-chip MicroBlaze processor. In an autonomous configuration, the MicroBlaze firmware would be upgraded to run a closed-loop control algorithm, automatically adjusting the accumulation parameters based on environmental feedback (e.g., increasing pulse count when no target is detected). Furthermore, future work will involve integrating a hardware-based CFAR detector after the coherent integration stage. This would allow the FPGA to make detection decisions locally and transmit only concise target coordinates to the platform’s mission computer via standard avionics interfaces (e.g., CAN bus), significantly reducing data link bandwidth requirements.

6. Patents

The work reported in this paper has resulted in a patent with publication number CN120492385A.

Author Contributions

Conceptualization, L.S., J.B. and Q.J.; methodology, L.S. and J.B.; software, J.B. and H.Z.; validation, J.B., H.Z., L.S. and Q.J.; formal analysis, J.B.; investigation, J.B. and H.Z.; writing—original draft preparation, J.B.; writing—review and editing, H.Z., L.S. and Q.J.; visualization, J.B. and H.Z.; supervision, L.S.; project administration, L.S.; funding acquisition, L.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China (12304553).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

The authors would like to thank the reviewers for their insightful comments and suggestions, which helped improve the quality of this manuscript. The authors also wish to acknowledge the support from the laboratory at the School of Information Science and Engineering, East China University of Science and Technology.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

ASIC	Application-Specific Integrated Circuit
AXI	Advanced eXtensible Interface
BRAM	Block RAM
CPU	Central Processing Unit
DDR	Double Data Rate
DSP	Digital Signal Processor
FIFO	First-In, First-Out
FPGA	Field-Programmable Gate Array
FSM	Finite State Machine
GUI	Graphical User Interface
ILA	Integrated Logic Analyzer
LFM	Linear Frequency Modulation
LVDS	Low-Voltage Differential Signaling
MAC	Multiply-Accumulate
MTU	Maximum Transmission Unit
PHY	Physical Layer
PL	Programmable Logic
RGMII	Reduced Gigabit Media Independent Interface
RX	Receive
SPI	Serial Peripheral Interface
TCP/IP	Transmission Control Protocol/Internet Protocol
TX	Transmit
UART	Universal Asynchronous Receiver-Transmitter
UDP	User Datagram Protocol

References

Kannanthara, J.; Griffiths, D.; Jahangir, M.; Jones, J.M.; Baker, C.J.; Antoniou, M.; Bell, C.J.; White, H.; Bongs, K.; Singh, Y. Whole system radar modelling: Simulation and validation. IET Radar Sonar Navig. 2023, 17, 1050–1060.
Luong, D.; Balaji, B. Quantum two-mode squeezing radar and noise radar: Covariance matrices for signal processing. IET Radar Sonar Navig. 2020, 14, 97–104.
Davis, M.E. Merrill I. Skolnik’s 50 year impact on radar development. IEEE Aerosp. Electron. Syst. Mag. 2022, 37, 57–59.
Gong, J.; Yan, J.; Li, D.; Chen, R. Comparison of radar signatures based on flight morphology for large birds and small birds. IET Radar Sonar Navig. 2020, 14, 1365–1369.
Sanchez-Rivas, D.; Rico-Ramirez, M.A. Towerpy: An open-source toolbox for processing polarimetric weather radar data. Environ. Model. Softw. 2023, 167, 105746.
Henry, D.; Aubert, H.; Galaup, P.; Véronèse, T. Dynamic estimation of the yield in precision viticulture from mobile millimeter-wave radar systems. IEEE Trans. Geosci. Remote Sens. 2021, 60, 1–15.
Wen, Q.; Cao, S. Radar range-doppler flow: A radar signal processing technique to enhance radar target classification. IEEE Trans. Aerosp. Electron. Syst. 2023, 60, 1519–1529.
Chaves, C.S.; Geschke, R.H.; Shargorodskyy, M.; Herschel, R.; Kose, S.; Leuchs, S.; Krebs, C. Multisensor polarimetric MIMO radar network for disaster scenario detection of persons. IEEE Microw. Wirel. Compon. Lett. 2021, 32, 238–240.
Haynes, M.S.; Chapin, E.; Moussessian, A.; Madsen, S.N. Opposite-side ambiguities in radar sounding interferometry. IEEE Trans. Geosci. Remote Sens. 2020, 58, 4640–4652.
Daum, F. A system engineering perspective on quantum radar. In Proceedings of the 2020 IEEE International Radar Conference (RADAR), Washington, DC, USA, 28–30 April 2020; IEEE: New York, NY, USA, 2020; pp. 958–963.
Raphaeli, D.; Bilik, I. Challenges in automotive MIMO radar calibration in anechoic chamber. IEEE Trans. Aerosp. Electron. Syst. 2023, 59, 6205–6214.
Vu, V.T.; Ivanenko, Y.; Pettersson, M.I. Phase error calculation caused by start-stop approximation in processing FMCW radar signals for SAR imaging. IEEE Access 2023, 11, 103669–103678.
Frazer, G.J.; Williams, C.G. Emerging Trends in Radar: HF Skywave Radar. IEEE Aerosp. Electron. Syst. Mag. 2025; in press.
Kumbul, U.; Uysal, F.; Vaucher, C.S.; Yarovoy, A. Automotive radar interference study for different radar waveform types. IET Radar Sonar Navig. 2022, 16, 564–577.
Gao, X.; Roy, S.; Xing, G. MIMO-SAR: A hierarchical high-resolution imaging algorithm for mmWave FMCW radar in autonomous driving. IEEE Trans. Veh. Technol. 2021, 70, 7322–7334.
Hoang, H.; John, M.; McEvoy, P.; Ammann, M.J. Calibration to mitigate near-field antennas effects for a MIMO radar imaging system. Sensors 2021, 21, 514.
Brigada, D.J.; Ryvkina, J. Radar-optimized wind turbine siting. IEEE Trans. Sustain. Energy 2021, 13, 403–413.
Ankel, M.; Tholén, M.; Bryllert, T.; Ulander, L.M.; Delsing, P. Implementation of a coherent real-time noise radar system. IET Radar Sonar Navig. 2024, 18, 1002–1013.
Mi, Y.; Zhang, Y.; Yang, J. Long-time coherent integration algorithm for high-speed maneuvering target detection. J. Appl. Remote Sens. 2023, 17, 026515.

Figure 1. System block diagram.

Figure 2. Logical structure of the top layer of the system.

Figure 3. AD9361 receive timing diagram.

Figure 4. AD9361 data reception processing block diagram.

Figure 5. Structure of data storage and coherent integration module.

Figure 6. Schematic diagram of data storage and output.

Figure 7. State transition diagram of the time-division operational strategy.

Figure 8. Data operation diagram.

Figure 9. UDP send FSM state diagram.

Figure 10. MATLAB-based GUI for system control.

Figure 11. Ethernet data reception interface.

Figure 12. Physical system test photo.

Figure 13. Single-tone signal local loopback test. The arrows indicate the three-clock-cycle pipeline delay between the input data and the coherent integration output.

Figure 14. Output results for different numbers of pulses. (a) one pulse, (b) fifteen pulses, (c) thirty pulses, (d) forty pulses.

Figure 15. Waveform of LFM signal with coherent integration.

Table 1. Test delay times.

Number of Coherently Integrated Pulses	Stored Frames per Pulse	Initial Multi-Pulse Buffer Time (ms)	Coherent Integration Output Delay (µs)
20	100	34	24
20	1000	340	24
40	100	68	33
40	1000	680	33

Table 2. Hardware resource utilization on XCKU040 (AMD, Santa Clara, CA, USA).

Resource Type	Available	Used	Utilization Rate (%)
Logic Slice LUTs	242,400	24,617	10.16
Flip-Flops	484,800	27,227	5.62
Block RAM Tile	600	293	48.83
DSP48E2 Slices	1920	8	0.42
Global Clock Buffers	32	12	37.50

Table 3. Comparative analysis of coherent integration architectures.

Architecture	DSP	ASIC	FPGA (BRAM)	Proposed FPGA (DDR4)
Latency	High (ms)	Ultra-Low (<20 µs)	Ultra-Low (<20 µs)	Low (33 µs)
Flexibility	High	None	Medium	High (Dynamic)
Scalability	High	Low	Low	High (Max 40 pulses)
Cost	Low	High	Medium	Medium

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bi, J.; Zhang, H.; Sun, L.; Jiang, Q. Design and Realization of Dynamically Adjustable Multi-Pulse Real-Time Coherent Integration System. Electronics 2026, 15, 397. https://doi.org/10.3390/electronics15020397
