Article

Computationally Efficient Implementation of Joint Detection and Parameters Estimation of Signals with Dispersive Distortions on a GPU

Science and Research Department, Moscow Technical University of Communications and Informatics, Moscow 111024, Russia
*
Author to whom correspondence should be addressed.
Sensors 2022, 22(9), 3105; https://doi.org/10.3390/s22093105
Submission received: 4 March 2022 / Revised: 14 April 2022 / Accepted: 18 April 2022 / Published: 19 April 2022

Abstract

The detector is an integral part of any device for receiving and processing radio signals. Signals that have passed through the ionospheric channel acquire an unknown Doppler shift and are subject to dispersion distortions. Joint detection and parameter estimation is needed to improve reception quality and detection accuracy. Advances in modern hardware make it possible to implement a device for the joint detection and estimation of signal parameters on standard central processing units (CPUs) and graphics processing units (GPUs). This article discusses the implementation of a signal detector that operates in real time. Two algorithms for estimating the Doppler frequency shift are compared: multiplication by complex exponentials and the fast Fourier transform (FFT). The computational cost and execution speed of both algorithms are compared on the CPU and GPU.

1. Introduction

Ionospheric radio communication is a highly reliable and cost-effective solution for communication with outlying regions, as well as with regions whose infrastructure has been disrupted by natural disasters. Decameter ionospheric radio communication systems are currently being developed toward higher information transmission rates [1,2,3,4,5,6]. When broadband signals are used in the decameter range, frequency dispersion has a significant effect on the signal [7,8,9,10,11,12,13,14]: different frequency components of a wideband signal experience different propagation delays. This difference leads to synchronization errors and degrades the quality of signal detection and information reception [15,16,17]. A separate problem is the detection of long signal preambles with a duration of several seconds and a spectrum wider than 100 kHz [18,19,20], with coherent accumulation of the detected signal energy throughout its duration. In this case, the signal base (time-bandwidth product) exceeds 50 dB, and the required accuracy of estimation and compensation of the Doppler frequency shift is tenths and, in some cases, hundredths of a hertz. Otherwise, coherent accumulation of signal energy over intervals of several seconds or even tens of seconds becomes impossible. Along with the estimation of the Doppler shift [21,22,23,24,25,26], the dispersion distortions of the detected signals must also be estimated and compensated.
In this paper, we show that a device for the joint detection and estimation of the parameters of signals with dispersion distortions can be built on graphics processors. The implementations proposed in this paper allow for the simultaneous detection of signals and estimation of dispersion distortions, delay, Doppler shift, and initial phase in real time.

2. Related Work

Stein, Tolimieri, and Winograd laid the foundations of research on algorithms for calculating ambiguity functions. Stein described a processing approach for obtaining joint delay and frequency offset (DTO/DFO) estimates for continuous signals based on the efficient calculation of complex ambiguity functions [27]. Typically, it involves a two-mode process with a coarse mode and a fine mode. Coarse mode is used to greatly reduce the time delay and frequency offset uncertainty, after which fine mode calculations are performed. Fine mode uses a mixing product/filtering interpretation, which greatly reduces the processing load. Tolimieri and Winograd proposed an algorithm for calculating the discrete ambiguity function in [28]. They rely on the fact that most basic applications need only limited parts of the DFT of a discrete ambiguity function. To compute these, they first pass a long sequence through a decimating FIR filter and then apply the FFT algorithm. Computationally efficient algorithms for the joint estimation of the Doppler shift and time delay are also considered in [29,30]. These papers propose a method based on a pre-weighted Zoom FFT with a cascaded filter algorithm to minimize the processing load of cross-ambiguity functions without compromising performance. The weighting process in the Zoom FFT method removes redundant calculations, and multi-stage filtering reduces complexity and yields a high-performance system. A segment processing method was also proposed for calculating the ambiguity function when input data frames overlap. By treating the calculation of the cross-ambiguity functions of overlapping data frames as the calculation of the FFT of the overlapping data, the redundancy of the calculations can be eliminated.
Modern techniques for reducing the complexity of the cross-ambiguity function (CAF) are based on numerical fitting for CAF [31]. These algorithms make full use of the property that the CAF is symmetrical in the frequency domain. Simulation results show that, compared to the method that looks for the CAF peak, the proposed algorithm can significantly reduce computational complexity while meeting the accuracy requirements of the joint time-frequency estimate.
In paper [32], the authors propose a method for solving the problem of determining the mutual delay time of ultra-wideband signals. A modified algorithm, which can be implemented by parallel calculation of the cross-ambiguity function, was used to compensate Doppler shift of the recorded signals. This algorithm was based on the division of an ultra-wideband signal into separate frequency channels. An increase in the computational efficiency of the proposed algorithm was achieved by parallel calculation of the convolution function and cross-ambiguity.
However, none of the above works address the compensation of dispersion distortions or the processing of signals with a base over 50 dB. There are also no computationally efficient GPU implementations that allow for the real-time detection of signals with a base of more than 50 dB (a spectrum width of hundreds of kHz and a duration of a few seconds) with the simultaneous estimation of dispersion distortions, delay, Doppler shift, and initial phase. Given these features, the joint detection and estimation of signal parameters requires large computational resources. Modern technology makes it feasible to develop computationally efficient implementations of various algorithms on GPUs. For example, GPU implementations have been used to build systems for the parallel simulation of MIMO radars [33] and to build a digital downconverter [34]. GPUs are also widely used in deep learning [35]. Thus, computing on GPUs is becoming more and more efficient.

3. Analytical Formulation of the Problem

The complex envelope of the signal at the input of the joint detection and signal parameter estimation device can be represented as the sum of the complex envelope of the useful signal, distorted by the frequency dispersion of the ionospheric channel, and the complex envelope of white Gaussian noise:
$$\dot{y}_i(\varphi, \tau = l\Delta t, f_d, s) = e^{j\varphi} e^{j 2\pi f_d (i-l)\Delta t}\, \dot{x}_{i-l}(s) + \dot{n}_i, \quad i = 0 \div N_p - 1,$$
where $\dot{\bar{x}}(s) = \dot{\bar{x}} * \dot{\bar{h}}(s)$ is the complex envelope of the useful signal distorted by the ionospheric channel, $\dot{h}_i(s)$ is the complex envelope of the ionospheric channel impulse response (IR), $\dot{x}_i$ is the complex envelope of the useful undistorted signal, $f_d$ is the Doppler frequency shift, $\tau$ is the delay in seconds, $l$ is the delay in samples, $\Delta t$ is the sample time, $s$ is the slope of the dispersion characteristic (the parameter that characterizes dispersion distortions), $\varphi$ is the unknown phase shift, $\dot{n}(t)$ is the complex envelope of white Gaussian noise with zero mean and variance $\sigma_n^2$, and $N_p$ is the number of samples.
The complex envelope of the ionospheric channel impulse response is related to the frequency response of the ionospheric channel $\dot{H}(j2\pi f)$ through the Fourier transform: $\dot{h}(t, s) = \int_{-\infty}^{\infty} \dot{H}(j2\pi f)\, e^{j 2\pi f t}\, df$. The transmitted signal $x(t)$ is known at the receiving side.
The ionospheric channel model, which takes into account frequency dispersion, is proposed in [8]. We consider a version of this model with a linear dispersion characteristic. The frequency response of the ionospheric channel in the absence of multipath signal propagation can then be described as
$$\dot{H}(j2\pi f) = e^{j\pi s f^2}, \quad f \in [-\Delta f/2;\ \Delta f/2],$$
where $\Delta f$ is the bandwidth of the ionospheric channel.
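As an illustration of the linear dispersion model, the following Python sketch samples the channel frequency response over the channel bandwidth and evaluates the group delay implied by its quadratic phase. The function names, the number of frequency bins, and the value of the slope are hypothetical choices made for this example, and the sign of the exponent follows the expression as written in the text.

```python
import cmath
import math


def dispersive_frequency_response(s, bandwidth_hz, num_bins):
    """Sample H(f) = exp(j*pi*s*f^2) on num_bins points of [-bandwidth/2, bandwidth/2)."""
    step = bandwidth_hz / num_bins
    freqs = [-bandwidth_hz / 2 + k * step for k in range(num_bins)]
    response = [cmath.exp(1j * math.pi * s * f * f) for f in freqs]
    return freqs, response


def group_delay(s, f):
    """Group delay -(1/(2*pi)) * d(arg H)/df for arg H = pi*s*f^2, which equals -s*f."""
    return -s * f


if __name__ == "__main__":
    s = 1e-10  # hypothetical slope of the dispersion characteristic, in s/Hz
    freqs, response = dispersive_frequency_response(s, 400e3, 8)
    for f, h in zip(freqs, response):
        delay_us = group_delay(s, f) * 1e6
        print(f"{f:10.0f} Hz  |H| = {abs(h):.3f}  delay = {delay_us:+.2f} us")
```

In this model the channel is all-pass (the magnitude of the response is 1 at every frequency) and only the phase is distorted. The group delay varies linearly with frequency, which is why the parameter is called the slope of the dispersion characteristic.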
The decision statistic can be found as:
$$\dot{\lambda}_i(\varphi, \tau, f_d, s) = \sum_{n=0}^{N_p-1} \dot{y}_n(\varphi, \tau = l\Delta t, f_d, s)\, \dot{g}_{i-n}(f_d, s),$$
where the matched filter impulse response $\dot{g}$ is defined as
$$\dot{g}_{N_p-1-i}(f_d, s) = \sum_{n=0}^{N_p-1} \dot{x}_n\, e^{j 2\pi f_d n \Delta t}\, \dot{h}_{i-n}(s).$$
Then, the parameter estimates can be found as:
$$\hat{\varphi}, \hat{\tau}, \hat{f}_d, \hat{s} = \arg\max_{\varphi, \tau, f_d, s} \dot{\lambda}_i(\varphi, \tau, f_d, s),$$
where $\hat{\varphi}$, $\hat{\tau}$, $\hat{f}_d$ and $\hat{s}$ are estimates of $\varphi$, $\tau$, $f_d$ and $s$, respectively.

4. Implementation of a Matched Filter

From Equation (2) it can be seen that the number of matched filters needed to obtain a complete set of decision statistics $\dot{\lambda}_i(\varphi, \tau, f_d, s)$ is determined by the number of possible Doppler frequency shifts $f_d$ and slopes of the dispersion characteristic $s$:
$$N_{mf} = N_{f_d} N_s,$$
where $N_{mf}$ is the number of matched filters, $N_{f_d}$ is the number of possible Doppler frequency shifts $f_d$, and $N_s$ is the number of possible slopes of the dispersion characteristic $s$. A large number of matched filters places high demands on the computing platform. The Doppler frequency shift $f_d$ can instead be taken into account (for its estimation) after matched filtering, in which case Equation (2) can be represented as:
$$\dot{\lambda}_i(\varphi, \tau, f_d, s) = e^{j 2\pi f_d i \Delta t} \sum_{n=0}^{N_p-1} \dot{y}_n(\varphi, \tau = l\Delta t, f_d, s)\, \dot{g}_{i-n}(s),$$
where
$$\dot{g}_{N_p-1-i}(s) = \sum_{n=0}^{N_p-1} \dot{x}_n\, \dot{h}_{i-n}(s).$$
This transformation reduces the number of required matched filters to $N_{mf} = N_s$, which can significantly reduce computational costs. However, in an ionospheric channel, the Doppler frequency shift causes a phase drift over the observation interval of the complex envelope at the matched filter input, which leads to SNR losses at the matched filter output. To minimize these losses, we convolve not with a reference signal of duration $N_p$, but with the signals (see Figure 1):
$$\dot{x}_{m,n} = \dot{x}_{n + m N_{pp}}, \quad n = 0 \div N_{pp} - 1, \quad m = 0 \div M - 1,$$
where $N_{pp} = N_p / M$, and $M$ is the number of splits of the original sequence.
In this case, matched filtering can be performed using a series-matched filter, which is a set of filters matched with the sequences $\dot{x}_{m,n}$.
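The partition of the reference sequence into sub-sequences can be sketched in Python as follows. The function name is hypothetical, and the sketch assumes that the reference length is an exact multiple of the number of splits, so that every block has the same length.

```python
def split_reference(x, num_blocks):
    """Return blocks[m][n] = x[n + m * N_pp] for m in 0..M-1 and n in 0..N_pp-1."""
    n_pp, remainder = divmod(len(x), num_blocks)
    if remainder != 0:
        raise ValueError("len(x) must be a multiple of num_blocks")
    return [list(x[m * n_pp:(m + 1) * n_pp]) for m in range(num_blocks)]


if __name__ == "__main__":
    reference = list(range(12))  # stand-in for the reference samples
    for m, block in enumerate(split_reference(reference, 3)):
        print(m, block)
```

Each block then serves as the reference of its own, shorter matched filter, so the phase drift caused by the Doppler shift accumulates over one block rather than over the whole sequence.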

4.1. Estimation Algorithm via Complex Exponents

A filter matched with a series of sequences is shown in Figure 2. The signal at the output of each matched filter can be written as:
$$\dot{\lambda}_{m,n}(s) = \sum_{l=0}^{N_{pp}-1} \dot{y}_{m,l}\, \dot{g}_{m,n-l}(s), \quad n = 0 \div N_{pp} - 1, \quad m = 0 \div M - 1,$$
where $\dot{g}_{M-1-m,\,N_{pp}-1-n}(s) = \sum_{k=0}^{N_p-1} \dot{x}_{k + m N_{pp}}\, \dot{h}_{n-(k + m N_{pp})}(s)$ is the complex envelope of the impulse response of the filter matched to the $m$-th sequence.
The Doppler frequency shift is taken into account as:
$$\dot{\lambda}_{m,n}(f_d, s) = \dot{\lambda}_{m,n}(s)\, e^{j 2\pi f_d (n + m N_{pp}) \Delta t}.$$
The decision statistics at the matched filter output can be obtained as:
$$\dot{\lambda}_n(f_d, s) = \sum_{m=0}^{M-1} \dot{\lambda}_{m,n}(f_d, s).$$
The interval of allowable values of the Doppler frequency shift is $\left[-\frac{f_s}{2 N_{pp}} : \frac{f_s}{2 N_{pp}}\right]$, where $f_s$ is the sample rate. Within this interval, the value of the estimated Doppler frequency shift can be arbitrary. A significant drawback of this implementation is the amount of RAM required to store the arrays of complex exponentials.
The scheme of the joint detection and signal parameter estimation device is shown in Figure 3.
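The Doppler compensation and summation described in this subsection can be sketched in Python as follows. The sketch assumes that the per-block matched filter responses are already available as a list of lists indexed by block and sample. It uses a negative exponent so that the factor cancels a positive Doppler rotation in the received signal; that sign convention, the function name, and the numerical values are assumptions made for the example.

```python
import cmath
import math


def combine_blocks_direct(block_outputs, f_d, sample_rate_hz):
    """Rotate block_outputs[m][n] by the Doppler hypothesis f_d and sum over blocks m."""
    num_blocks = len(block_outputs)
    n_pp = len(block_outputs[0])
    dt = 1.0 / sample_rate_hz
    combined = []
    for n in range(n_pp):
        acc = 0j
        for m in range(num_blocks):
            # Assumed sign: the exponent is negative to undo a positive Doppler rotation.
            phase = -2.0 * math.pi * f_d * (n + m * n_pp) * dt
            acc += block_outputs[m][n] * cmath.exp(1j * phase)
        combined.append(acc)
    return combined


if __name__ == "__main__":
    FS, F_TRUE, M, N_PP = 8.0, 0.25, 3, 4  # hypothetical toy values
    outputs = [[cmath.exp(2j * math.pi * F_TRUE * (n + m * N_PP) / FS)
                for n in range(N_PP)] for m in range(M)]
    for f_d in (-0.25, 0.0, 0.25):
        level = abs(combine_blocks_direct(outputs, f_d, FS)[0])
        print(f"f_d = {f_d:+.2f}: |lambda_0| = {level:.3f}")
```

Because the hypothesis can be any real number, this variant places no grid constraint on the Doppler search, at the cost of one complex exponential per block, sample, and hypothesis.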

4.2. Algorithm with Doppler Estimation via FFT

Multiplication by complex exponentials and the subsequent summation used to estimate the Doppler frequency shift can be done using the FFT.
Let $f_d = k \frac{f_s}{N}$, then Equation (10) can be represented as:
$$\dot{\lambda}_{n,k}(f_d = k\Delta f, s) = \sum_{m=0}^{M-1} \dot{\lambda}_{m,n}(s)\, e^{j 2\pi k m},$$
where
$$\dot{\lambda}_{m,n}(s) = \sum_{l=0}^{N_{pp}-1} \dot{y}_{m,l}\, \dot{g}_{m,n-l}(s), \quad n = 0 \div N_{pp} - 1, \quad m = 0 \div M - 1.$$
Equation (11) can be calculated using FFT algorithms from $\dot{\lambda}_{m,n}(s)$ for each $k$. This algorithm, in contrast to the algorithm with multiplications by complex exponentials, makes it possible to estimate the Doppler frequency shift only for $f_d = k\Delta f$, where $k = \left[-\frac{N_{pp}}{2} : \frac{N_{pp}}{2}\right]$. The scheme of the filter matched with a series of sequences, with the search over Doppler frequency shifts performed through the FFT, is shown in Figure 4.
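The idea of replacing the per-hypothesis multiplications with a transform across the block index can be sketched in Python as follows. The sketch makes several assumptions that the text does not state explicitly: a transform of length $M$ taken over the block index $m$, the kernel $e^{-j 2\pi k m / M}$, and a resulting Doppler grid step of $f_s / (M N_{pp})$. A plain DFT stands in for the FFT library call, and the function name is hypothetical.

```python
import cmath
import math


def doppler_bank_dft(block_outputs):
    """Return stats[n][k] = sum over m of block_outputs[m][n] * exp(-2j*pi*k*m/M).

    Bin k is read as the Doppler hypothesis f_d = k * f_s / (M * N_pp); bins
    k >= M/2 alias to negative shifts, as in any DFT.
    """
    num_blocks = len(block_outputs)
    n_pp = len(block_outputs[0])
    stats = []
    for n in range(n_pp):
        column = [block_outputs[m][n] for m in range(num_blocks)]
        stats.append([
            sum(column[m] * cmath.exp(-2j * math.pi * k * m / num_blocks)
                for m in range(num_blocks))
            for k in range(num_blocks)
        ])
    return stats


if __name__ == "__main__":
    M, N_PP, K_TRUE = 8, 4, 3  # hypothetical sizes and true Doppler bin
    responses = [[cmath.exp(2j * math.pi * K_TRUE * m / M)] * N_PP for m in range(M)]
    stats = doppler_bank_dft(responses)
    best_bin = max(range(M), key=lambda k: abs(stats[0][k]))
    print("strongest Doppler bin:", best_bin)
```

Under these assumptions the result differs from the direct method only by a phase factor that depends on the sample index and the bin, so the magnitudes of the decision statistics coincide on the grid frequencies.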

5. GPU Implementation

A matched filter with a series of sequences is implemented on the GPU using the “Overlap and Save” fast convolution algorithm [36] and clFFT, a library for parallel FFT and IFFT computation on the GPU implemented in OpenCL [37] (see Figure 5). The clFFT library, developed by clMathLibraries, is an OpenCL implementation of discrete fast Fourier transforms. The input data are loaded into the GPU in blocks of $N_{pp}$ samples. Loading is performed into a circular buffer $B_{input}$ of size $N_{pp}(M + 1)$. After the next block of samples is loaded, the buffer $B_{input}$ is passed to FFTs of size $2N_{pp}$ with an overlap of $N_{pp}$ samples. The FFT results are written to a buffer $B_{FFT}$ of size $2N_{pp}M$. The post-FFT samples are multiplied by the frequency response samples $H_i(s)$, $i = 0, 1, \ldots, M - 1$. The multiplication result is written to the buffer $B_{MUL}$ and passed to IFFTs of size $2N_{pp}$. The samples after the IFFT are placed in the buffer $B_{IFFT}$. The second half of each $2N_{pp}$-sample IFFT output is the response $\dot{\lambda}_{m,n}(s)$ of the filter matched to the $m$-th sequence.
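The overlap-save step for a single filter can be sketched in Python as follows. This is a simplified illustration, not the GPU implementation: a small recursive radix-2 FFT stands in for clFFT, only one filter is used instead of the bank of $M$ filters, and the block length is assumed to equal the filter length and to be a power of two. All function names are hypothetical.

```python
import cmath


def fft(a, inverse=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], inverse)
    odd = fft(a[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for k in range(n // 2):
        tw = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + tw
        out[k + n // 2] = even[k] - tw
    return out


def ifft(a):
    """Inverse FFT with 1/n normalization."""
    n = len(a)
    return [v / n for v in fft(a, inverse=True)]


def overlap_save(x, g):
    """Convolve x with g block by block: FFT size 2N, overlap N, keep the second half."""
    n = len(g)
    if n == 0 or n & (n - 1):
        raise ValueError("filter length must be a power of two")
    if len(x) % n:
        raise ValueError("len(x) must be a multiple of the block length")
    g_spec = fft(list(g) + [0j] * n)        # filter spectrum, zero-padded to 2N
    prev = [0j] * n                         # previous block (zeros before the first one)
    y = []
    for start in range(0, len(x), n):
        cur = list(x[start:start + n])
        spec = fft(prev + cur)              # 2N-point FFT of [previous, current]
        circ = ifft([a * b for a, b in zip(spec, g_spec)])
        y.extend(circ[n:])                  # first half is aliased, second half is valid
        prev = cur
    return y


def direct_convolution(x, g):
    """Reference linear convolution truncated to len(x) outputs."""
    return [sum(g[k] * x[t - k] for k in range(len(g)) if t - k >= 0)
            for t in range(len(x))]


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    g = [1, -1, 2, 0.5]
    fast, slow = overlap_save(x, g), direct_convolution(x, g)
    print(max(abs(a - b) for a, b in zip(fast, slow)))
```

Keeping only the second half of each inverse transform mirrors the statement that the second half of each output of size $2N_{pp}$ is the filter response; the first half is corrupted by circular wrap-around.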
The received responses are transferred to the module that takes the Doppler frequency shifts into account and forms the total decision statistics. This module is made in two versions. The first version directly multiplies the filter responses by complex exponentials and then sums them. The multiplications by complex exponentials are performed by calculating different samples of the decision statistics in different GPU work items (WI).
The set of work items $w_{i,j}$ of the graphics processor is represented as a matrix $W$ of dimension $R_1 \times R_2$ (see Figure 6), where $R_1$ and $R_2$ are the numbers of work items in the first and second dimensions, respectively. These values are determined by the GPU implementation and have to be taken into account when parallelizing the algorithm for the GPU.
Within the available number of work items, it is proposed to parallelize the calculation of all samples of the decision statistics for all possible values of the Doppler frequency shift $f_d$. The number of work items required to compute the decision statistic samples $\dot{\lambda}_n(f_d, s)$ for a single Doppler frequency shift value is $N_{pp}$. The maximum number of work items available for calculating the decision statistic samples for one value of the Doppler frequency shift can be calculated as:
$$N_{\max\_items\_exp} = \frac{R_1 R_2}{N_{f_d}}.$$
Then, the actual number of work items is defined as:
$$N_{items\_exp} = \min(N_{\max\_items\_exp}, N_{pp}).$$
When the required number of work items exceeds the number of available GPU work items, some work items calculate several samples of the decision statistics $\dot{\lambda}_n(f_d, s)$.
When performing calculations on the GPU, work items are combined into work groups (WG). The best performance is achieved by setting the work group size $N_{size\_work\_group}$ to the maximum, which is determined by the specific GPU implementation. The number of work groups for computing the decision statistic samples $\dot{\lambda}_n(f_d, s)$ for one value of the Doppler frequency shift is:
$$N_{work\_group} = \frac{N_{items\_exp}}{N_{size\_work\_group}}.$$
The distribution of calculations between work items and GPU work groups is shown in Figure 7. This figure shows that the calculation of the decision statistics values $\dot{\lambda}_n(\varphi, \tau, f_d, s)$ is divided into $N_{f_d}$ groups of $N_{work\_group} \times N_{size\_work\_group}$ work items. Each of these groups calculates the decision statistic samples $\dot{\lambda}_n(\varphi, \tau, f_d, s)$ for one of the possible values of the Doppler frequency shift $f_d$. Performing these computations in parallel improves the performance of the algorithm.
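The work-item bookkeeping above can be sketched in Python as follows. The device limits passed in the example are hypothetical, and the use of integer (floor) division in both quotients is an assumption, since the text gives the ratios without stating a rounding rule.

```python
from typing import NamedTuple


class WorkPartition(NamedTuple):
    max_items_per_shift: int   # upper bound on work items for one Doppler hypothesis
    items_per_shift: int       # work items actually used for one Doppler hypothesis
    groups_per_shift: int      # work groups for one Doppler hypothesis


def partition_work_items(r1, r2, n_fd, n_pp, work_group_size):
    """Split an r1 x r2 grid of work items between n_fd Doppler hypotheses."""
    max_items = (r1 * r2) // n_fd          # assumption: round down to whole work items
    items = min(max_items, n_pp)
    groups = items // work_group_size      # assumption: only full work groups are counted
    return WorkPartition(max_items, items, groups)


if __name__ == "__main__":
    # Hypothetical device limits combined with N_fd = 201 and N_pp = 32768.
    print(partition_work_items(r1=1024, r2=1024, n_fd=201, n_pp=32768,
                               work_group_size=1024))
```

When the number of items per hypothesis is smaller than the block length, each work item has to process several samples of the decision statistics, as noted in the text.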
The second version of the module that takes the Doppler frequency shifts into account and forms the total decision statistics uses the FFT through the clFFT library. According to Equation (11) and Figure 4, the FFT must be taken over the $n$-th samples of all responses $\dot{\lambda}_{m,n}(s)$. As Figure 8 shows, the clFFT library can perform the FFT over all $n$-th samples for all $\dot{\lambda}_{m,n}(s)$, $n = 0 \div N_{pp} - 1$, $m = 0 \div M - 1$, directly from the buffer $B_{IFFT}$, without additional memory operations or data copies. The number of these FFT operations is $M$.
The FFT results are written to the buffer $B_{mf}$ in such a way that the decision statistics $\dot{\lambda}_n(f_d, s)$ for different values of the Doppler frequency shift are stored sequentially in memory.

6. Comparison of Algorithms Computational Complexity

Computational complexity is affected by the number of possible values of $f_d$ and $s$, which are denoted $N_{f_d}$ and $N_s$, respectively. Computational complexity is expressed as the number of complex multiplications per input sample. For the two implementations of the algorithm, the computational complexity of the device for joint detection and estimation of signal parameters is defined as:
$$N_{cm} = \left(2M\left(\log_2(2N_{pp}) - 1\right) + M N_{f_d}\right) N_s,$$
$$N_{cm\_fft} = \left(2M\left(\log_2(2N_{pp}) - 1\right) + \frac{N_{f_d}}{2}\left(\log_2(N_{f_d}) - 2\right)\right) N_s.$$
Thus, the computational complexity of the proposed algorithm depends on the number of partitions of the original sequence $M$, the duration of one part of the original sequence $N_{pp}$, the number of possible values of the Doppler frequency shift $N_{f_d}$, and the number of possible slopes of the dispersion characteristic of the ionospheric channel $N_s$.
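The two complexity expressions can be evaluated numerically with the following Python sketch. The function names are hypothetical; the example uses the block length, number of splits, and number of Doppler hypotheses quoted in the test section, with a single dispersion slope chosen purely for illustration.

```python
import math


def complex_mults_direct(m, n_pp, n_fd, n_s):
    """Complex multiplications per input sample, Doppler search via complex exponentials."""
    return (2 * m * (math.log2(2 * n_pp) - 1) + m * n_fd) * n_s


def complex_mults_fft(m, n_pp, n_fd, n_s):
    """Complex multiplications per input sample, Doppler search via the FFT."""
    return (2 * m * (math.log2(2 * n_pp) - 1)
            + n_fd / 2 * (math.log2(n_fd) - 2)) * n_s


if __name__ == "__main__":
    M, N_PP, N_FD, N_S = 86, 32768, 201, 1  # N_S = 1 is an illustrative choice
    direct = complex_mults_direct(M, N_PP, N_FD, N_S)
    via_fft = complex_mults_fft(M, N_PP, N_FD, N_S)
    print(f"direct: {direct:.0f}, FFT: {via_fft:.0f}, ratio: {direct / via_fft:.2f}")
```

The matched-filtering term is identical in both expressions, so the difference between the two implementations comes entirely from the Doppler search term.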

7. Test Results on CPU and GPU

For the experiment, a six-core Intel Core i7-8700 CPU with a clock frequency of 3.2 GHz and a GeForce RTX 3060 GPU with 3584 CUDA cores, a base clock frequency of 1.32 GHz, and a 192-bit memory bus were used. The computing platform had 32 GB of RAM running at 2400 MT/s. The experiment was carried out under Ubuntu Linux 20.04 with Nvidia GPU driver version 460.73.01 and clFFT library version 2.12.2. The implementation was compiled with gcc 9.4.0 using the -O2 optimization flag. Five cores and 10 threads of the Intel Core i7-8700 CPU were used for the calculations; one core and two threads were left for the operating system. Testing was performed on a signal with a bandwidth $\Delta F = 400$ kHz and a duration $T = 7$ s. The base of this signal was 64.5 dB. These parameters were chosen based on the results of field experiments carried out on single-hop ionospheric paths up to 3000 km long. The search ranges for the Doppler frequency shift and the slope of the dispersion characteristic of the ionospheric channel were also selected based on the results of field experiments. The dependence of the computational complexity on the number of possible values of the Doppler frequency shift $N_{f_d}$ for different numbers of slopes of the dispersion characteristic of the ionospheric channel $N_s$, for $N_{pp} = 32768$ and $M = 86$, is shown in Figure 9.
This graph shows that an increase in the number of possible values $N_{f_d}$ leads to only a slight increase in computational complexity compared to an increase in the number of possible values $N_s$. The dependence of the number of complex multiplications on the number of possible values of $f_d$ for different numbers of splits $M$ of the original signal is shown in Figure 10.
The number of experiments performed to obtain averaged results was 1000. Increasing the number $M$ leads to an increase in computational complexity.
Table 1 shows the dependence of the algorithm running time on the block duration for $f_d = -5 : 0.05 : 5$ ($N_{f_d} = 201$).
Table 2 shows how many times faster the RTX 3060 GPU is than the baseline Intel Core i7-8700 processor. It can be seen that the performance gain of the RTX 3060 GPU in the algorithm without FFT decreases with increasing block duration, while in the algorithm with FFT, it remains constant.
The TDP of the RTX 3060 GPU is 170 W, while the TDP of the Intel Core i7-8700 is 65 W. Thus, using the RTX 3060 GPU instead of the Intel Core i7-8700 CPU increases power consumption by a factor of 2.62, while the minimum performance increase is a factor of 4.37. Therefore, it is advisable to use a GPU, since the increase in performance exceeds the increase in power consumption.
The dependence of the matched filter response level on the block duration at a Doppler shift of $f_d = 3$ is shown in Figure 11.
Implementation with Doppler shift estimation via FFT on the GPU is the most efficient and allows for processing one sample in less than 2 µs with a loss of no more than 0.5 dB. With a block duration of less than 80 ms, the loss does not exceed 0.5 dB.
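The figures quoted in this section can be cross-checked with a short calculation. The signal base is taken here as the time-bandwidth product, and the sampling rate is assumed to equal the signal bandwidth ($f_s = \Delta F$); the latter is an assumption that the text does not state explicitly.

```latex
\begin{align*}
B &= 10\log_{10}(\Delta F \cdot T) = 10\log_{10}\!\left(4\times10^{5}\cdot 7\right)
   = 10\log_{10}\!\left(2.8\times10^{6}\right) \approx 64.5\ \text{dB},\\
N_p &= N_{pp} M = 32768 \cdot 86 = 2\,818\,048 \approx \Delta F \cdot T = 2.8\times10^{6},\\
T_{block} &= N_{pp} / f_s = 32768 / \left(4\times10^{5}\ \text{Hz}\right) \approx 81.9\ \text{ms},\\
T_{sample} &= 1 / f_s = 2.5\ \mu\text{s},\\
P_{GPU} / P_{CPU} &= 170\ \text{W} / 65\ \text{W} \approx 2.62.
\end{align*}
```

Under that assumption, a processing time below 2 µs per sample fits within the 2.5 µs sample period, which is consistent with real-time operation.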

8. Conclusions

This paper proposes two implementations of the joint detection and estimation of the parameters of signals with dispersion distortions on the CPU and GPU. In the first method, the Doppler frequency shift is estimated directly, through multiplication by complex exponentials. In the second method, the Doppler frequency shift is estimated through the FFT. All matched-filter convolutions in the proposed implementations are performed with the “Overlap and Save” fast convolution algorithm. The computational complexity of the proposed implementations is calculated. It is shown that the method based on the estimation of the Doppler frequency shift through the FFT is the most computationally efficient. Implementation of this method on the GPU allows for the joint detection and estimation of signal parameters in real time. It is also shown how the duration of a block in a matched filter with a series of sequences affects the response level: reducing the block duration reduces the loss in the matched filter response level but increases computational complexity.

Author Contributions

Conceptualization, V.I.L. and E.M.L.; methodology, formal analysis, and investigation V.I.L.; software, writing—original draft preparation, and writing—review and editing, V.I.L. and N.A.K.; validations and supervision E.M.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Jorgenson, M.B.; Johnson, R.W.; Nelson, R.W. An Extension of Wideband HF Capabilities. In Proceedings of the IEEE Military Communications Conference, San Diego, CA, USA, 18–20 November 2013; pp. 1202–1206.
  2. Pijoan, J.L.; Altadill, D.; Torta, J.M.; Alsina-Pagès, R.M.; Marsal, S.; Badia, D. Remote Geophysical Observatory in Antarctica with HF Data Transmission: A Review. Remote Sens. 2014, 6, 7233–7259.
  3. Kandaurov, N.A. Signal-Code Constructs for Wideband HF Communication. In Proceedings of the 2019 Systems of Signal Synchronization, Generating and Processing in Telecommunications, Yaroslavl, Russia, 1–3 July 2019; p. 5.
  4. Laraway, S.A.; Loera, J.; Moradi, H.; Farhang-Boroujeny, B. Experimental Comparison of FB-MC-SS and DS-SS in HF Channels. In Proceedings of the MILCOM 2018 IEEE Military Communications Conference (MILCOM), Los Angeles, CA, USA, 29–31 October 2018; pp. 714–719.
  5. Deumal, M.; Vilella, C.; Socoro, J.; Alsina-Pagès, R.M.; Pijoan, J.L. A DS-SS signaling based system proposal for low SNR HF digital communications. In Proceedings of the 10th International Conference on Ionospheric Radio Systems and Techniques, London, UK, 18–21 July 2006; pp. 128–132.
  6. Laraway, S.A.; Farhang-Boroujeny, B. Performance Analysis of a Multicarrier Spread Spectrum System in Doubly Dispersive Channels with Emphasis on HF Communications. IEEE Open J. Commun. Soc. 2020, 1, 462–476.
  7. Sun, H.; Yang, G.; Cui, X.; Zhu, P.; Jiang, C. Design of an Ultrawideband Ionosonde. IEEE Geosci. Remote Sens. Lett. 2015, 12, 1042–1045.
  8. Ivanov, D.V. Methods and Mathematical Models for Study of the Propagation of Decameter Complex Signals and Correction its Dispersion Distortions; MarSTU: Yoshkar-Ola, Russia, 2006.
  9. Male, J.; Porte, J.; Gonzalez, T.; Maso, J.M.; Pijoan, J.L.; Badia, D. Analysis of the Ordinary and Extraordinary Ionospheric Modes for NVIS Digital Communications Channels. Sensors 2021, 21, 2210.
  10. Adjemov, S.S.; Lobov, E.M.; Kandaurov, N.A.; Lobova, E.O. Methods and algorithms of broadband HF signals dispersion distortion compensation. In Proceedings of the 2019 Systems of Signal Synchronization, Generating and Processing in Telecommunications (SYNCHROINFO), Yaroslavl, Russia, 1–3 July 2019; pp. 1–9.
  11. Barnes, R.I.; Earl, G.F. A wideband technique for micro-ranging in OTHR. In Proceedings of the 2008 IEEE Radar Conference, Rome, Italy, 26–30 May 2008; pp. 1–6.
  12. Huang, D.; Liu, E.; Hu, H.; Liu, J. Algorithm for the estimation of ionosphere parameters from ground scatter echoes of SuperDARN. Sci. China Technol. Sci. 2018, 61, 1755–1764.
  13. Ajemov, S.S.; Lobov, E.M.; Kandaurov, N.A.; Lobova, E.O.; Lipatkin, V.I. Algorithms of Estimating and Compensating the Dispersion Distortions of Wideband Signals in the HF Channel. H&ES Res. 2021, 13, 57–74.
  14. Lipatkin, V.I.; Lobova, E.O.; Kandaurov, N.A. Wideband Signals Dispersion Distortions Optimum Tracking Compensator Based On Digital Filter Banks Using Farrow Filters. In Proceedings of the 2020 Systems of Signals Generating and Processing in the Field of on Board Communications, Moscow, Russia, 19–20 March 2020; p. 6.
  15. Lipatkin, V.I.; Lobova, E.O. Broadband Noise-like Signal Parameters Joint Estimation Quality with Dispersion Distortions in the Ionospheric Channel. In Proceedings of the 2020 Systems of Signal Synchronization, Generating and Processing in Telecommunications, SYNCHROINFO 2020, Svetlogorsk, Russia, 1–3 July 2020; p. 6.
  16. Lipatkin, V.I.; Lobova, E.O.; Telengator, K.E. The Influence of the Quality of the Estimation of Dispersion Distortions of a Broadband HF Signal on the Noise Immunity of a Radio Link. In Proceedings of the 2021 Systems of Signal Synchronization, Generating and Processing in Telecommunications, SYNCHROINFO 2021, Svetlogorsk, Russia, 30 June–2 July 2021; p. 4.
  17. Lipatkin, V.I.; Lobov, E.M.; Lobova, E.O.; Kandaurov, N.A. Cramer-Rao Bounds for Wideband Signal Parameters Joint Estimation in Ionospheric Frequency Dispersion Distortion Conditions. In Proceedings of the 2021 Systems of Signals Generating and Processing in the Field of on Board Communications, Moscow, Russia, 16–18 March 2021; p. 7.
  18. Arnold, E.; Rodriguez-Morales, F.; Paden, J.; Leuschen, C.; Keshmiri, S.; Yan, S.; Ewing, M.; Hale, R.; Mahmood, A.; Blevins, A.; et al. HF/VHF Radar Sounding of Ice from Manned and Unmanned Airborne Platforms. Geosciences 2018, 8, 182.
  19. Davey, S.J.; Fabrizio, G.A.; Rutten, M.G. Multipath-aware detection and tracking in skywave over-the-horizon radar. In Proceedings of the 2017 IEEE Radar Conference (RadarConf), Seattle, WA, USA, 8–12 May 2017; pp. 0636–0640.
  20. Fabrizio, G.; Zadoyanchuk, A.; Francis, D.; Nguyen, V. Using emitters of opportunity to enhance track geo-registration in HF over-the-horizon radar. In Proceedings of the 2016 IEEE Radar Conference (RadarConf), Philadelphia, PA, USA, 2–6 May 2016; pp. 1–5.
  21. Tao, S.; Ran, T.; Rong, S.R. A Fast Method for Time Delay, Doppler Shift and Doppler Rate Estimation. In Proceedings of the 2006 CIE International Conference on Radar, Shanghai, China, 16–19 October 2006; pp. 1–4.
  22. Wang, Y.L.; Wu, Y.; Yi, S.C. An Efficient Direct Position Determination Algorithm Combined with Time Delay and Doppler. Circuits Syst. Signal Process 2016, 35, 635–649.
  23. Deng, L.; Wei, P.; Zhang, Z.; Zhang, H. Doppler Frequency Shift Based Source Localization in Presence of Sensor Location Errors. IEEE Access 2018, 6, 59752–59760.
  24. Ren, F.; Gao, H.; Yang, L. Distributed Multistatic Sky-Wave Over-The-Horizon Radar Based on the Doppler Frequency for Marine Target Positioning. Electronics 2021, 10, 1472.
  25. Warrington, E.M.; Stocker, A.J. Measurements of the Doppler and multipath spread of the HF signals received over a path oriented along the midlatitude trough. Radio Sci. 2003, 38, 1–12.
  26. Knapp, C.H.; Carter, G.C. The Generalized Correlation Method for Estimation of Time Delay. IEEE Trans. Acoust. Speech Signal Process. 1976, 24, 320–327.
  27. Stein, S. Algorithms for ambiguity function processing. IEEE Trans. Acoust. Speech Signal Process. 1981, 29, 588–599.
  28. Tolimieri, R.; Winograd, S. Computing the ambiguity surface. IEEE Trans. Acoust. Speech Signal Process. 1985, 33, 1239–1245.
  29. Zhihai, Z.; Tao, S. Research on Fast Computation of Ambiguity Function. In Proceedings of the 2008 Congress on Image and Signal Processing, Washington, DC, USA, 27–30 May 2008; pp. 188–192.
  30. Zhang, W.; Tao, R.; Ma, Y. Fast computation of the ambiguity function. In Proceedings of the 7th International Conference on Signal Processing, 2004, Beijing, China, 31 August–4 September 2004; Volume 3, pp. 2124–2127.
  31. Zhang, Z.; Wang, X.; Zou, Y.; Zhang, R. A Low Complexity Algorithm for Time-Frequency Joint Estimation of CAF Based on Numerical Fitting. In Proceedings of the 2020 IEEE/CIC International Conference on Communications in China (ICCC Workshops), Online, 9–11 August 2020; pp. 214–218.
  32. Ershov, R.A.; Morozov, O.A.; Fidelman, V.R. Time Delay Estimation of Ultra-wideband Signals by Calculation of the Cross-Ambiguity Function. In Wireless Communications, Networking and Applications. Lecture Notes in Electrical Engineering; Zeng, Q.A., Ed.; Springer: New Delhi, India, 2016; Volume 348.
  33. Liu, G.; Yang, W.; Li, P.; Qin, G.; Cai, J.; Wang, Y.; Wang, S.; Yue, N.; Huang, D. MIMO Radar Parallel Simulation System Based on CPU/GPU Architecture. Sensors 2022, 22, 396.
  34. Kandaurov, N.A.; Lipatkin, V.I.; Varlamov, V.O. Implementing Digital Downconversion on a GPU. In Proceedings of the 2021 Systems of Signal Synchronization, Generating and Processing in Telecommunications, SYNCHROINFO 2021, Svetlogorsk, Russia, 30 June–2 July 2021; p. 8.
  35. He, Y.; Li, X.; Li, R.; Wang, J.; Jing, X. A Deep-Learning Method for Radar Micro-Doppler Spectrogram Restoration. Sensors 2020, 20, 5007.
  36. Proakis, J.G. Digital Communications, 4th ed.; McGraw-Hill: New York, NY, USA, 2001.
  37. Munshi, A. The OpenCL Specification. Khronos OpenCL Working Group. Version 1.2. Document Revision 19. 2012. Available online: https://www.khronos.org/registry/OpenCL/specs/opencl-1.2.pdf (accessed on 17 April 2022).
Figure 1. Reference signal divided into M parts.
Figure 2. Matched filter with a series of sequences with complex exponents.
Figure 3. Scheme of the device for joint detection and signal parameter estimation.
Figure 4. Matched filter with a series of sequences with searches over Doppler frequency shifts via FFT.
Figure 5. Implementation diagram of a matched filter with a series of sequences.
Figure 6. A set of GPU work items.
Figure 7. Distribution of computations between GPU work items.
Figure 8. Scheme of the module for taking into account Doppler frequency shifts and obtaining the total decisive statistics, implemented through the FFT.
Figure 9. Dependence of the number of complex multiplications on the number of possible values of $f_d$ for different numbers of possible values $N_s$ of the slope of the dispersion characteristic of the ionospheric channel; $M = 86$, $N_{pp} = 32768$.
Figure 10. Dependence of the number of complex multiplications on the number of possible values of $f_d$ for different numbers of splits $M$ of the original signal at $N_s = 10$.
Figure 11. Dependence of the response level of the matched filter on the block duration at a Doppler frequency shift $f_d = 3$.
Table 1. Experimental running time of the algorithms per input sample (µs) for different block durations.

| Algorithm implementation type | Block length 10.24 ms | Block length 20.48 ms | Block length 40.96 ms | Block length 81.92 ms | Block length 163.84 ms |
|---|---|---|---|---|---|
| Doppler without FFT on CPU | 251.1 | 124.4 | 62.59 | 31.3 | 15.91 |
| Doppler with FFT on CPU | 17.83 | 9.17 | 5.88 | 3.98 | 2.51 |
| Doppler without FFT on GPU | 7.36 | 4.21 | 2.49 | 1.61 | 1.19 |
| Doppler with FFT on GPU | 3.91 | 2.03 | 1.29 | 0.91 | 0.55 |
Table 2. GPU RTX 3060 performance boost vs. CPU Intel Core i7-8700.

| Algorithm implementation type | Block length 10.24 ms | Block length 20.48 ms | Block length 40.96 ms | Block length 81.92 ms | Block length 163.84 ms |
|---|---|---|---|---|---|
| Doppler without FFT | 34.12 | 29.55 | 25.14 | 19.44 | 13.37 |
| Doppler with FFT | 4.56 | 4.52 | 4.56 | 4.37 | 4.56 |
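
The entries of Table 2 are consistent with dividing the CPU time per sample by the GPU time per sample from Table 1. The following minimal Python sketch recomputes them under that assumption; the names `TIMES_US` and `speedup` are illustrative and do not come from the article.

```python
# Per-sample running times in microseconds, copied from Table 1.
# Each list is ordered by block length: 10.24, 20.48, 40.96, 81.92, 163.84 ms.
BLOCK_LENGTHS_MS = [10.24, 20.48, 40.96, 81.92, 163.84]

TIMES_US = {
    ("without FFT", "CPU"): [251.1, 124.4, 62.59, 31.3, 15.91],
    ("with FFT", "CPU"): [17.83, 9.17, 5.88, 3.98, 2.51],
    ("without FFT", "GPU"): [7.36, 4.21, 2.49, 1.61, 1.19],
    ("with FFT", "GPU"): [3.91, 2.03, 1.29, 0.91, 0.55],
}


def speedup(method):
    """Assumed definition: CPU time / GPU time, rounded to two decimals."""
    cpu = TIMES_US[(method, "CPU")]
    gpu = TIMES_US[(method, "GPU")]
    return [round(c / g, 2) for c, g in zip(cpu, gpu)]


if __name__ == "__main__":
    for method in ("without FFT", "with FFT"):
        for block_ms, ratio in zip(BLOCK_LENGTHS_MS, speedup(method)):
            print(f"Doppler {method}, block {block_ms} ms: {ratio}x")
```

Under this reading, the GPU gain of the variant without FFT falls from about 34 to about 13 as the block length grows, whereas the FFT variant stays close to 4.5 at every block length.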
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Lipatkin, V.I.; Lobov, E.M.; Kandaurov, N.A. Computationally Efficient Implementation of Joint Detection and Parameters Estimation of Signals with Dispersive Distortions on a GPU. Sensors 2022, 22, 3105. https://doi.org/10.3390/s22093105

AMA Style

Lipatkin VI, Lobov EM, Kandaurov NA. Computationally Efficient Implementation of Joint Detection and Parameters Estimation of Signals with Dispersive Distortions on a GPU. Sensors. 2022; 22(9):3105. https://doi.org/10.3390/s22093105

Chicago/Turabian Style

Lipatkin, Vladislav I., Evgeniy M. Lobov, and Nikolai A. Kandaurov. 2022. "Computationally Efficient Implementation of Joint Detection and Parameters Estimation of Signals with Dispersive Distortions on a GPU" Sensors 22, no. 9: 3105. https://doi.org/10.3390/s22093105

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Metrics

Back to TopTop