Counter-Interception and Counter-Exploitation Features of Noise Radar Technology

In defense applications, the main features of radars are the Low Probability of Intercept (LPI) and the Low Probability of Exploitation (LPE). Thanks to the ongoing technological progress, the counterpart uses increasingly capable intercept receivers and signal processors. Noise Radar Technology (NRT) is probably a very effective answer to the increasing demand for operational LPI/LPE radars. The design and selection of the radiated waveforms, while respecting the prescribed spectrum occupancy, has to comply with the conflicting requirements of LPI/LPE and of a favorable shape of the ambiguity function. Information theory appears to be a "technologically agnostic" tool for quantifying the LPI/LPE capability of noise waveforms with little, or no, a priori knowledge of the means and the strategies used by the counterpart. An information-theoretical analysis can lead to practical results in the design and selection of NRT waveforms.


Introduction
The most relevant features of Noise Radar systems in defence applications are tightly related to modern Electronic Warfare (EW) systems, whose intercept, identification and jamming capabilities are quickly evolving, following the evolution of radar threats. Both EW and radar systems are boasting more and more "intelligence" thanks to the tighter and tighter convergence of computer science, communications, signal processing and big-data analytics. The history of anti-interference radars is long: more or less "clever" anti-jamming techniques have been proven for over half a century in the radar context [1]. They include (to name only two of them) Adaptive Frequency Selection, with which the radar automatically selects the least jammed operating frequency, and the Adaptive Antenna System, which counteracts sidelobe jamming [2,3]. The natural follow-on has been a generation of more and more adaptive radars, arriving in this century at the concept of cognitive radar [4,5], with some (more or less partial) implementations [6,7] of it. Modern adaptive radars [8] may change their operating modes and the radiated waveforms almost instantaneously: these radar threats are adaptable and reprogrammable, creating an urgent need for EW engineers to characterise them correctly.
At the same time, the increasing scarcity of the electromagnetic spectrum, which is, of course, the main resource for radio communications, radio navigation and radar, has generated much interest in research and development activities, both academic and industrial. Hot topics today include Communication and Radar Spectrum Sharing (CRSS).
For example, a radar waveform optimised for target tracking is selected by the radar for a good reason and may be of a higher threat level to an airborne platform than a different emission that is clearly associated with a surveillance task.
Thus, present-day operational radars, also agile ones, may be "more easily" intercepted to feed the "libraries" of the emitters of tactical and strategic interest, while the Noise Radar pseudorandom radiated signals may only be statistically analysed, a matter to be discussed in the remaining parts of this paper.
As an example, let us consider (among many) two highly-automated EW systems: the autonomous decoys and the electronic deception means used to confuse the enemy's intelligence, surveillance, and reconnaissance (ISR) systems. A common element of these applications is the Digital Radio Frequency Memory (DRFM) system shown in Figure 1. The key elements of the DRFM system are a fast Analog-to-Digital Converter (ADC), a fast Digital-to-Analog Converter (DAC), and a fast dual memory [40]. Using them, a "copy" of the radar signal is acquired, appropriately delayed and transmitted a number of times to jam the radar receiver. When the radar changes its signal, this particular jammer is only effective at radar ranges greater than the range of the platform carrying the DRFM, but if the radar signal is predictable, or even transmitted unchanged many times, all radar ranges may be jammed. When the EW system combines the DRFM technology with waveform analyses in the ES domain that have the aforementioned intelligent features, the jammer becomes "smart" with deceptive features, as shown in Figure 2.
Figure 2. General block diagram of a deceptive/smart radar jammer using Digital Radio Frequency Memory (DRFM). Remark: the block "Libraries" includes both the "signal" and the "threat" level. In this case, the jamming signal results from the aforementioned steps (a), (b) and (c), with the Machine Learning/Deep Learning analysis of the signals emitted by the radar (see for instance [41]) according to their "signatures" and their statistical features, and the comparison of the result with the a priori information stored in the ad hoc "libraries".
In this general frame, the remaining part of this paper is dedicated to the robustness of Noise Radar waveforms; for example, in Section 4.2 of [22] dedicated to Waveforms, it is claimed that a pure random-phase coded signal is the best waveform for CW LPI radar. This solution was studied and tested [42][43][44] by the Liu Guosui group, which is one of the forerunners of NRT. In fact, the acquisition/recording of pseudorandom radar signals (not necessarily phase-only modulated) is of little advantage to the counterpart and their limited information content, in general, does not allow the enemy to feed any "signal library"; more generally, the analysis of this content allows us to quantify the Low Probability of Exploitation (LPE) property of NRT.
In [45] a novel system concept is introduced combining active sensing by a noise radar (NR) with electronic warfare (EW) tasks. This joint NR/EW sensor concept provides simultaneous operations of spectral sensing, jamming and radar target detection.

Pseudo-Random Number Generators and Cryptographic Security
The entropy concept, in its broad sense, can be used to describe the disorder and lack of predictability of a system (Appendix A). Computation units are deterministic systems, hence they cannot generate entropy; they may only collect it from outside sources when needed. Strictly speaking, computers cannot generate a random sequence, but only a sequence of pseudo-random numbers (PRN), which is not strictly random, being the result of an algorithm: the (pseudo) randomness indicates that these numbers "appear random" to an external observer, i.e., they may pass some statistical tests. It is well known that a PRN generation (PRNG) algorithm has a starting point, called the "seed", which defines the whole sequence up to a repetition point. In a finite-state machine such as a digital unit, repetition cannot be avoided, but it can be pushed farther and farther away by exploiting the digital resources, above all registers and memory. The widely used "Mersenne Twister" [46] is the collective name of a family of PRNGs based on F2-linear maps whose period, for 32-bit integers, is the huge 2^19937 − 1. Today the Mersenne Twister is the default generator in C compilers, the Python language, the Maple mathematical computation system, and many other environments. In practice, repetition is a minor problem with respect to the low statistical quality of some widely used PRNGs [47].
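As a minimal illustration (Python's `random` module uses the Mersenne Twister), the seed fully determines the whole sequence, which is why such output is pseudo-random rather than random; seed values below are arbitrary:

```python
import random

# Two generators seeded identically produce identical "random" sequences:
# the seed fully determines the stream, so it is pseudo-random, not random.
g1 = random.Random(12345)
g2 = random.Random(12345)

seq1 = [g1.random() for _ in range(5)]
seq2 = [g2.random() for _ in range(5)]
assert seq1 == seq2  # deterministic: identical streams from the same seed

# A different seed gives a different (but equally reproducible) stream.
g3 = random.Random(54321)
assert [g3.random() for _ in range(5)] != seq1
```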
In principle, an unpredictable truly-random number generator (TRNG), also called a hardware-based RNG, is a nondeterministic system driven by a noise source (i.e., a physical process) governed by quantum mechanical laws, such as the reflection of photons by a beamsplitter, radioactive decay and many others. However, practical methods to derive the requested numbers from such sources (i.e., to extract and exploit their entropy in compliance with the recommendation for the entropy sources used for random bit generation [48]) do not always supply acceptable results, as important problems of accuracy and dynamic range arise in most cases. For example, one could try to get random numbers by measuring radioactive decay times and taking the time interval X between two successive decays. It is well known that, for a fixed decay rate λ, the time interval X (the inter-arrival time in traffic theory) is an exponential random variable with distribution: F_X(x) = 1 − exp(−λx), x ≥ 0. Hence, taking the output quantity x and transforming it into u = 1 − exp(−λx), one should obtain (in theory) a random number generator whose output U has a uniform distribution between 0 and 1. In practice, problems arise. (a) Particle detectors have an efficiency less than unity. (b) The clock needed to measure X has a finite resolution ∆x, i.e., the resulting random number is zero when two successive decay times differ by less than ∆x. (c) A real clock has drifts and higher-order errors. (d) The decay rate λ is a priori unknown and has to be measured, leading to possible errors in the distribution of U. For these reasons, the desired uniform distribution cannot be exactly achieved in practice.
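The decay-time scheme can be sketched numerically, assuming an ideal exponential source with λ known exactly (which, as noted in point (d), is not true in practice); the values of λ and the clock resolution ∆x below are purely illustrative:

```python
import math
import random

rng = random.Random(0)
lam = 2.0   # decay rate λ (assumed known here; in practice it must be estimated)
dx = 0.05   # finite clock resolution Δx

# Ideal inter-arrival times X ~ Exp(λ) and the transform U = 1 - exp(-λX),
# which in theory is uniform on (0, 1).
x = [rng.expovariate(lam) for _ in range(100_000)]
u_ideal = [1.0 - math.exp(-lam * xi) for xi in x]

# With a real clock, X is quantised to multiples of Δx: short gaps collapse
# to zero and U lands on a discrete grid instead of being uniform.
x_clock = [dx * round(xi / dx) for xi in x]
u_real = [1.0 - math.exp(-lam * xi) for xi in x_clock]

mean_ideal = sum(u_ideal) / len(u_ideal)            # ≈ 0.5 for a uniform variable
zero_frac = sum(1 for xi in x_clock if xi == 0.0) / len(x)
print(f"mean of ideal U: {mean_ideal:.3f}, fraction quantised to zero: {zero_frac:.3f}")
```

Even in this idealised setting, the quantised sequence shows a visible mass of zeros, illustrating problem (b) above.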
In general, methods to extract entropy/randomness from physical sources are costly, hard to implement and only used in particular, highly sensitive applications. This is why computer-based PRNGs are preferred to "physical" generators and are so widely used. An exception may be the low-quality methods used in consumer computer applications (e.g., electronic games) to create "entropy pools" based on easily usable phenomena such as random movements of the mouse, the least significant digits of the clock at a given event and so on.
However, a TRNG was recently developed based on beta radiation, enabled by integrated circuits (ICs) suitably designed to detect the very low energy of these radiations [49,50]. This generator, although with a low number of bits, has shown a relatively simple structure, low cost and small volume, passing the National Institute of Standards and Technology (NIST) test [48].
Cryptography [51] is a driving factor for research on PRNGs, which are used in various cryptographic steps, with an overall security level mostly depending on the quality of the pseudo-random number generator used. A PRNG suitable for cryptographic use is called a Cryptographically Secure Pseudo-Random Number Generator (CSPRNG). The strength of a cryptographic system heavily depends on the properties of these CSPRNGs. Depending on the particular application, a CSPRNG might generate sequences with some of these features: (i) they appear random to an external observer, (ii) they are unpredictable in advance, (iii) they cannot be reproduced using affordable means. A perfect CSPRNG is one that produces a sequence of independent, equally distributed numbers, i.e., if X_i is the integer number (between 1 and N) generated at the i-th step, the probability of X_i equals 1/N independently of the outputs of all the other steps, i.e., of all X_k with k ≠ i. In other words, knowing the past or future numbers does not help to predict the current number.
A perfect CSPRNG would permit the implementation of an unbreakable cryptographic system, i.e., one robust to any attack, even using unlimited computation power. It is the celebrated one-time pad, in which each bit of the message is coded by addition (XOR operation) to the corresponding bit of the key, which is the output of the CSPRNG, and decoded with the same operation by the legitimate recipient knowing the key. In practice, this method has the important limitation that the key must be kept secret and used only once, in addition to the well-known fundamental problem of the distribution of the cryptographic keys.
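A minimal sketch of the one-time pad principle described above (message and variable names are ours; `secrets` stands in for a CSPRNG):

```python
import secrets

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

message = b"NOISE RADAR"
key = secrets.token_bytes(len(message))   # one-time key, as long as the message

ciphertext = xor_bytes(message, key)      # encryption: XOR with the key
recovered = xor_bytes(ciphertext, key)    # decryption: XOR with the same key

assert recovered == message
# Reusing the key breaks the scheme: the XOR of two ciphertexts encrypted
# with the same pad leaks the XOR of the two plaintexts, which is why the
# pad must be used only once.
```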
The features of CSPRNGs make them suitable to generate waveforms to be emitted by Noise Radars, which have to be secure against statistical analysis and reproduction for jamming and spoofing purposes. The main difference with respect to the aforementioned cryptographic applications is the inherent randomness of the physical medium (including receiver noise, environmental noises and channel/target fluctuations), [52][53][54]. This topic is also studied in recent research on Physical Layer secure communications and the Internet of Things [55,56]. In order to benefit the legitimate receiver (the own radar receiver) while denying the operation of a counterpart receiver, one shall exploit the difference between the channel to the legitimate receiver and the one to an eavesdropper (an intercept receiver in radar/EW applications) to securely transmit confidential messages (even without using an encryption key).
A possible example could be transmitting one communication code with a main, narrow-beam antenna and a different code, for deception purposes, by an auxiliary antenna (similar to the Interrogator Side Lobe Suppression (ISLS) one in the Secondary Surveillance Radar (SSR) Mode S, Selective Mode), whose pattern is higher than the sidelobe structure of the main antenna [57,58], thus masking its sidelobe signals.

Information Content of Radar Signals
When studying the properties of noise radar waveforms, the main question is: "How much information about the signals emitted from a particular radar may be obtained by analysing more and more samples from the radar emission?" Likely, the answer mainly depends on the operation (and performance) of the counterpart system and on the operational theatre. However, a rather general answer may be searched in terms of Information Theory [59], in which a measure of the information contained in a signal is related to the entropy concept [60] from which the mutual information is derived [61] (see Appendix A).
To introduce the concept of mutual information, we start considering a generic measurement system as sketched in Figure 3 [61]. Generally speaking, X is a vector whose components define the parameters of the "object" we wish to measure. The "measurement mechanism" maps X into a random vector Y ("observer") introducing an inherent inaccuracy due to the measurement errors and to disturbing effects. We denote I(X; Y) the mutual information between X and Y, i.e., the amount of information that the measurement (Y) provides about the parameters of the "object" (X). More and more information can be obtained about X when I(X; Y) increases. In communication systems, it is desirable to choose, among all transmissible signals, the ones that maximise the mutual information. Conversely, a low mutual information level implies a high difficulty in identifying the "object", as in the case of an LPI radar system.
In Electronic Support (ES) measurements, the "object" is a particular radar (able to transmit some types of waveform) and the components of the vector X are the (relatively few) parameters of the waveform as obtained by the EW system, for example, the TOA (time of arrival), the duration (PW, Pulse Width), the bandwidth B, the codes (called MOP, Modulation on Pulse), samples of the power spectrum and more. Adding goniometry to the ESM, an additional component is available, i.e., the DOA (direction of arrival). With the classification function or even the Specific Emitter Identification (SEI) feature, the result is a vector of integers that enumerate the radar sources in the library. In such a case, the information of X is not only a set of parameters (estimation problem) but it includes identification (decision problem on many hypotheses) or, as a subcase, classification of the radar source.
The "observer" is the vector Y, whose components represent the samples of the signal as intercepted by the ESM. The vector Y is affected by noise, multipath, and measurement errors. Hence, I(X; Y) is a measure of the (partial and corrupted) information that Y provides about X. The aim of LPI radar design is to select a set of radar waveforms that minimise the information I(X; Y) for best, or optimal, LPI features. However, this set of waveforms has to guarantee the requirements for optimal target detection, i.e., a high peak-to-sidelobe ratio in the auto-correlation function (and, considering the Doppler shift, in the ambiguity function) to mitigate the masking effect due to strong targets. The constraints include a limited bandwidth to meet the frequency regulations, the required range resolution and, finally, the efficiency in the transmitted power. Concerning the latter, noise radars offer the possibility to control the peak-to-average power ratio (PAPR) of the radiated waveform. For a given noise sequence g[k] of length n, the PAPR is defined as:

PAPR = max_k |g[k]|² / [ (1/n) Σ_{k=1}^{n} |g[k]|² ]    (1)

To maximise the transmitted power, deterministic waveforms (e.g., chirp, Barker, . . . , [62]) are normally phase/frequency-only modulated, with unitary PAPR. Unless the amplitude is saturated, a noise waveform has a natural PAPR around 10 (or greater) with reduced transmitted energy, which causes a loss, compared to a PAPR of unity, in signal-to-noise ratio (SNR) equal to:

L_SNR = 10 log₁₀(PAPR) dB    (2)

i.e., about 10 dB or greater for the natural PAPR. To ensure a loss lower than 2 dB, the PAPR must be less than 1.6. Of course, when a hard limiter is applied to the noise waveform, the loss is zero. However, in this case, the noise waveform has a reduced number of degrees of freedom (DoF), equal to the number (BT) of phase values versus the 2BT degrees of full freedom, i.e., amplitude and phase pairs or, equivalently, I and Q (in-phase and quadrature) pairs.
More generally, a reduction in PAPR causes a decrease in the equivalent number of DoF and therefore, potentially, a greater probability of interception of the waveform. When the intercepted (observed) signal Y is modelled as a complex discrete-time random process, it is natural to arrive at the concept of the Mutual Information Rate (MIR) for the real part and the imaginary part of Y as a measure of the rate of growth of the common information versus the time, as explained in Section 3.1.
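The PAPR behaviour described above can be checked numerically. The sketch below (sequence lengths and seed are illustrative) compares a Gaussian-I/Q noise sequence, whose "natural" PAPR is on the order of 10 for sequences of this length, with a unimodular (phase-only) sequence, whose PAPR is exactly one:

```python
import math
import random

def papr(g):
    """Peak-to-average power ratio of a complex sequence g[k]."""
    powers = [abs(z) ** 2 for z in g]
    return max(powers) / (sum(powers) / len(powers))

rng = random.Random(1)
n = 100_000

# "Natural" noise waveform: Gaussian I and Q components.
gauss = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]

# Unimodular (hard-limited) waveform: random phase, constant amplitude.
unimod = [complex(math.cos(p), math.sin(p))
          for p in (rng.uniform(0, 2 * math.pi) for _ in range(n))]

p_gauss = papr(gauss)
print(f"Gaussian noise PAPR ≈ {p_gauss:.1f}")          # around 10 for this length
print(f"Unimodular PAPR     = {papr(unimod):.2f}")     # exactly 1
print(f"SNR loss = 10·log10(PAPR) ≈ {10 * math.log10(p_gauss):.1f} dB")
```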

Mutual Information of a Random Process
For a real discrete-time random process represented by a sequence of n equally-distributed random variables {X_1, X_2, . . . , X_n}, with marginal entropies h(X_i), i = 1, 2, . . . , n, and joint entropy h(X_1, X_2, . . . , X_n), the mutual information (not to be confused with the I(X; Y) of Figure 3) can be evaluated (see Appendix A) as [59]:

I(X_1, X_2, . . . , X_n) = Σ_{i=1}^{n} h(X_i) − h(X_1, X_2, . . . , X_n)    (3)

As a measure of the rate of growth of the common information versus the time, we introduce the mutual information rate (MIR):

MIR = I(X_1, X_2, . . . , X_n) − I(X_1, X_2, . . . , X_{n−1})    (4)

Substituting Equation (3) in Equation (4) and using the relation between the joint and the conditional entropy (see Appendix A for details), the MIR can be written as:

MIR = h(X_n) − h(X_n | X_{n−1}, X_{n−2}, . . . , X_1)    (5)

Therefore the MIR represents the entropy of a single sample, h(X_n), reduced by the knowledge of its past, i.e., by the conditional entropy h(X_n | X_{n−1}, X_{n−2}, . . . , X_1). If the process is stationary in the wide sense (WSS) and n tends to infinity, Equation (5) becomes (for details see Appendix A):

MIR = h(X) − h_r(X_1, X_2, . . . , X_n)    (6)

with h(X_n) = h(X) for each n, and h_r(X_1, X_2, . . . , X_n) the entropy rate, i.e., the measure of the average information carried by each sample in a random sequence of n consecutive samples. In many cases, the evaluation of h(X) and h_r(X_1, X_2, . . . , X_n) is computationally hard, a well-known exception being that of the WSS Gaussian processes, where:

h_r = (1/2) ln(2πe) + (1/4π) ∫_{−π}^{+π} ln S(ω) dω    (7)

with S(ω) denoting the power spectral density of the Gaussian process, and:

h(X) = (1/2) ln[ 2πe · (1/2π) ∫_{−π}^{+π} S(ω) dω ]    (8)

Therefore:

MIR = (1/2) ln[ (1/2π) ∫_{−π}^{+π} S(ω) dω ] − (1/4π) ∫_{−π}^{+π} ln S(ω) dω    (9)

The MIR is non-negative and equal to zero if and only if S(ω) is a constant, i.e., for a white Gaussian process. By Equation (9), we can introduce the Spectral Flatness Measure (SFM) for the Gaussian process:

SFM = exp[ (1/2π) ∫_{−π}^{+π} ln S(ω) dω ] / [ (1/2π) ∫_{−π}^{+π} S(ω) dω ] = exp(−2·MIR)    (10)

The SFM is a well-known and widely accepted method for the evaluation of the "whiteness" (or "compressibility", in audio or imaging applications) of a signal. From the non-negativity of the MIR, it is easy to show that 0 < SFM ≤ 1.
Values of SFM close to zero (MIR ≫ 1) correspond to a more structured (or less random) signal; SFM = 1 (MIR = 0) corresponds to a random, unpredictable signal. Table 1 shows the theoretical MIR and SFM evaluated using Equations (9) and (10) for three different power spectral densities: Uniform, Hamming and Blackman-Nuttall (see Figure 4). In Section 4, they will be used to generate noise waveforms with the assigned spectrum. The Blackman-Nuttall spectrum will produce a more structured signal in contrast to the others.
In Appendix B the definition of MIR is extended to a complex process [63][64][65]. If the process {Z_n} is a second-order circular doubly white Gaussian process (see Appendix B for the definition), the marginal entropy h(Z) is the sum of the entropy of the real part and that of the imaginary part. Additionally, the entropy rate equals the sum of the entropy rates of the real and the imaginary parts. Hence, for Z = X + jY:

MIR(Z) = MIR(X) + MIR(Y)    (11)
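The SFM and MIR of Equations (9) and (10) can be checked numerically in discrete form, with the integrals replaced by averages over power-spectrum samples: the SFM becomes the ratio of the geometric to the arithmetic mean of the samples, and MIR = −(1/2) ln SFM. The sketch below compares a uniform and a Hamming-shaped spectrum; it uses standard window coefficients and is not the paper's exact computation for Table 1:

```python
import math

def sfm_and_mir(psd):
    """Discrete spectral flatness (geometric/arithmetic mean of PSD samples)
    and the corresponding MIR = -0.5 * ln(SFM)."""
    n = len(psd)
    arith = sum(psd) / n
    geo = math.exp(sum(math.log(s) for s in psd) / n)
    sfm = geo / arith
    return sfm, -0.5 * math.log(sfm)

n = 4096
flat = [1.0] * n   # white spectrum: SFM = 1, MIR = 0
# Hamming-shaped power spectral density (squared standard Hamming window).
hamming = [(0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1))) ** 2
           for k in range(n)]

for name, psd in [("Uniform", flat), ("Hamming", hamming)]:
    sfm, mir = sfm_and_mir(psd)
    print(f"{name:8s} SFM = {sfm:.3f}  MIR = {mir:.3f}")
```

As expected, the flat spectrum gives SFM = 1 and MIR = 0, while the shaped spectrum gives SFM < 1 and a positive MIR, i.e., a more structured signal.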

On the Significance of MIR and SFM in Radar
The MIR and the SFM are used to characterise the information content of various signals, e.g., the ones from musical instruments [66] and, sometimes, radar signals [67].
The MIR analysis of radar signals, e.g., to characterise their LPI/LPE features, would prove really useful when one takes into account the full information chain of the counterpart: reception/interception, analysis/extraction, up to data exploitation, which also includes some a priori information stored in tactical databases or "libraries". Such an analysis is beyond the scope of this paper. We only wish to emphasise the importance of the a priori information with the ideal, simple example that follows.
Let the counterpart know the operating band, with bandwidth B (e.g., B = 50 MHz), of the victim radar, and be able to sample its signals at the Nyquist rate. Let the intercept receiver operate at a constant SNR measured on the band B before processing. The exemplary radar, here, is assumed to have two basic options: (a) Linear Frequency Modulation (LFM) pulse with duration T such that B·T ≫ 1. (b) Noise Radar operation emitting noise waveforms with three cases: (b1) "natural" Peak-to-Average Power Ratio, PAPR ≥ 10 (Gaussian process for the components I and Q); (b2) "low PAPR", e.g., PAPR = 1.5 (non-Gaussian process for I and Q); (b3) "unimodular" waveform, PAPR = 1 (non-Gaussian process for I and Q).
In case (a), when the counterpart knows T (no matter whether by intelligence or by measurement; remember that B is supposed to be known), the signal is fully determined (apart from an immaterial constant phase). Hence, in principle, the "new" information from the waveform's samples is nearly zero, which is consistent with the "flat" spectrum of a chirp signal with SFM = 1, hence with a nearly null MIR. Similar considerations apply to any signal defined by a finite number of parameters: these parameters determine the values of the signal samples or, vice versa, may in principle be determined from a suitable, limited number of these samples.
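For instance, a baseband LFM pulse is fully determined by the pair (B, T) alone. In the sketch below (sampling choices are ours), an eavesdropper knowing these two parameters regenerates every sample exactly, i.e., the samples carry no residual information:

```python
import cmath

def lfm_chirp(B, T, n):
    """Baseband LFM pulse of bandwidth B and duration T, n samples.
    Once B and T are known, every sample is determined (up to an
    immaterial constant phase)."""
    k = B / T            # chirp rate (Hz/s)
    ts = T / n           # sampling interval
    return [cmath.exp(1j * cmath.pi * k * (i * ts) ** 2) for i in range(n)]

# With B and T intercepted, the counterpart regenerates the exact waveform.
s1 = lfm_chirp(50e6, 100e-6, 5000)   # the "victim" radar's pulse
s2 = lfm_chirp(50e6, 100e-6, 5000)   # the eavesdropper's reconstruction
assert s1 == s2                      # sample-by-sample identical
```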
Some preferred signal-analysis methods against LPI radar use time-frequency transforms. Their main aim is to classify the emitter, e.g., to define which signals are transmitted, choosing among Barker (nested Barker), polyphase, Frank, Golay, and other codes [62]. In short, they are aimed at classification, a role in which the MIR is less important. Moreover, the ESM "sees" a signal that consists of the radar emission plus noise (from its receiver), so Equation (11) has only a theoretical value.
Different considerations apply to the case of interception of NRT signals, case (b), where the number of (real) parameters defining the (pseudo-random) signals reaches the possibly huge value of 2BT. Hence, even when the parameters B and T are known to the intercepting party, a number of real parameters only slightly less than 2BT remains unknown, and the MIR analysis may be meaningful, and sometimes interesting, at least to compare different pseudorandom waveforms. In case (b1) the process is Gaussian and Equation (11) can be applied; conversely, in cases (b2) and (b3) the low PAPR makes the noise process no longer Gaussian and Equation (11) is no longer applicable. For non-Gaussian statistics, it is necessary to evaluate the distance from the Gaussian distribution. A possible approach is based on the use of negentropy (see Appendix A), which will be considered in Section 4.2.
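The departure from Gaussianity caused by PAPR reduction can be glimpsed with a simple moment-based indicator: the excess kurtosis, which is zero for a Gaussian variable and underlies common moment-based approximations of negentropy. The sketch below (clipping level and seed are illustrative, and this is only a crude stand-in for the negentropy analysis of Section 4.2) hard-limits a Gaussian component, as done to lower the PAPR, and shows the resulting non-Gaussianity:

```python
import random

rng = random.Random(7)
n = 200_000
x = [rng.gauss(0, 1) for _ in range(n)]   # Gaussian I component, "natural" PAPR

def excess_kurtosis(v):
    """Fourth standardised moment minus 3; zero for a Gaussian variable."""
    m = sum(v) / len(v)
    var = sum((s - m) ** 2 for s in v) / len(v)
    m4 = sum((s - m) ** 4 for s in v) / len(v)
    return m4 / var ** 2 - 3.0

clip = 1.0   # hard amplitude limit used to lower the PAPR
x_clipped = [max(-clip, min(clip, s)) for s in x]

print(f"excess kurtosis, Gaussian : {excess_kurtosis(x):+.3f}")          # ≈ 0
print(f"excess kurtosis, clipped  : {excess_kurtosis(x_clipped):+.3f}")  # clearly non-zero
```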

Estimation of Entropy and Mutual Information by Simulation
Three exemplary noise waveforms Z(k) = X(k) + jY(k) of N samples, duration T = N·T_s = 100 µs (T_s is the sampling time), with spectrum inside [−B/2, +B/2] defined by three spectral windows (Uniform, Hamming and Blackman-Nuttall) with B = 50 MHz, are generated as described in [68], with a detailed functional description of the blocks of Figure 5. We assume that the sampling frequency F_s = 1/T_s is set to B, hence the number of complex samples is BT, corresponding to 2BT real samples (or 2BT degrees of freedom). The PAPR is varied from the "natural" value (ten or greater) to a value of one (minimum loss in Signal-to-Noise Ratio). In this frame, the three waveforms are compared in terms of marginal entropy, joint entropy and mutual information.
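A minimal sketch of the generation principle, shaping a white complex Gaussian spectrum with the assigned spectral window and transforming back to the time domain; this is not the exact block diagram of [68], and N, the window and the seed are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
N = 4096  # number of complex samples (BT); the paper's case gives BT = 5000

# White complex Gaussian spectrum, then the assigned spectral window
# (Hamming here; Uniform and Blackman-Nuttall are handled the same way).
spec = rng.normal(size=N) + 1j * rng.normal(size=N)
shaped = spec * np.hamming(N)

# Back to the time domain; normalise to unit mean power.
z = np.fft.ifft(shaped)
z = z / np.sqrt(np.mean(np.abs(z) ** 2))

# The resulting Gaussian-I/Q waveform exhibits its "natural" PAPR.
papr = np.max(np.abs(z) ** 2) / np.mean(np.abs(z) ** 2)
print(f"'natural' PAPR of the shaped noise waveform: {papr:.1f}")
```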


4.1. "Natural" PAPR (Gaussian Process)
A "natural" PAPR makes X(k) and Y(k) zero-mean Gaussian-distributed random variables with the same variance σ²_X = σ²_Y = σ². The latter, here, is set to 0.5 for the given waveforms with unit power. Posing X_i = [X_i(1) X_i(2) . . . X_i(N)], where i denotes the i-th trial (i = 1, 2, . . . , N_trial, with N_trial set to 10^5), the analysis is limited here to the first and second probability hierarchy, i.e., it is performed only on a pair of extracted random variables X_1 = [X_i(1)] and X_2 = [X_i(2)]; the same is carried out for Y_1 and Y_2. In the following Monte Carlo simulation the marginal entropies h(X_1), h(X_2), the joint entropy h(X_1, X_2) and the mutual information I(X_1, X_2) are estimated following two different approaches.
In the first approach, the marginal entropy is estimated by the analytical expression valid for a Gaussian model, ĥ(X) = (1/2)·ln(2πe·S²), using the sample variance S²_{1(2)}. (The sample variance is the estimator S² = (1/(N−1))·Σ_{i=1}^{N}(X_i − X̄)²; for a Gaussian population S² is unbiased and consistent, with Var[S²] = 2σ⁴/(N−1) [60]. For N = 10^5 and σ² = 0.5, the standard deviation is 2.24 × 10⁻³.) In Table 2, for N = 10^5, the values of S²_1 and S²_2 are shown for the real part of the signal (X). Similar results are obtained for the imaginary part (Y). The maximum deviation of S²_{1(2)} from 0.5 (theoretical value) is around 3 × 10⁻³, comparable with the standard deviation of the estimator, equal to 2.24 × 10⁻³. The joint entropy is estimated using Equation (13), ĥ(X_1, X_2) = (1/2)·ln[(2πe)²·|K̂|], with |K̂| the determinant of the estimated covariance matrix K̂ = S²·[1, ρ̂_12; ρ̂_12, 1], where ρ̂_12 is the sample correlation coefficient [60]; after Fisher's transformation W = (1/2)·ln[(1 + ρ̂_12)/(1 − ρ̂_12)], setting N = 10^5, the standard deviation of W is σ_W ≅ 3.16 × 10⁻³.
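The first (sample-parameter) approach can be sketched as follows; the bivariate Gaussian generator stands in for two contiguous samples of an FCG waveform, with ρ = 0.4258 (the theoretical lag-T_s value for the Hamming window, used here as an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(1)
n_trial = 100_000
rho = 0.4258                      # theoretical lag-Ts correlation (Hamming)
cov = 0.5 * np.array([[1.0, rho], [rho, 1.0]])   # sigma^2 = 0.5
x1, x2 = rng.multivariate_normal([0.0, 0.0], cov, size=n_trial).T

s1, s2 = x1.var(), x2.var()       # sample variances S1^2, S2^2
K = np.cov(x1, x2)                # estimated covariance matrix

# Gaussian-model entropy estimates (nat)
h1 = 0.5 * np.log(2 * np.pi * np.e * s1)
h2 = 0.5 * np.log(2 * np.pi * np.e * s2)
h12 = 0.5 * np.log((2 * np.pi * np.e) ** 2 * np.linalg.det(K))
mi = h1 + h2 - h12                # estimate of -0.5*ln(1 - rho^2)
print(round(h1, 4), round(h12, 4), round(mi, 4))
```

With σ² = 0.5 the marginal entropies are close to ln√(πe) ≅ 1.0724 nat, and the mutual information estimate follows −(1/2)·ln(1 − ρ²).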
For the Uniform spectrum, at lag T_s = 1/B the random variables {X_1, X_2} are uncorrelated (ρ_12 ≅ 0), while for Hamming and Blackman-Nuttall the width of the main lobe of the autocorrelation increases and ρ_12 assumes the values reported in Table 3. The maximum deviation of ρ̂_12 from its theoretical value, as shown in Table 3, after Fisher's transformation into W, is comparable with the standard deviation of W.

Table 3. Estimated joint entropy and information of the real part of two contiguous samples with natural PAPR.

Finally, by Equation (3) with n = 2, the mutual information is estimated as Î(X_1, X_2) = ĥ(X_1) + ĥ(X_2) − ĥ(X_1, X_2) = −(1/2)·ln(1 − ρ̂²_12). It results that, having considered a pair of successive samples, the mutual information depends only on the correlation coefficient and, for ρ_12 ≅ 0, it is very close to zero. If |ρ_12| tends to one (X_1, X_2 becoming perfectly correlated), the mutual information goes to infinity.
The estimates of ĥ(X_1), ĥ(X_2), ĥ(X_1, X_2) and Î(X_1, X_2), using the sample parameters, are shown in Tables 2 and 3. A second approach to estimating the marginal and the joint entropy uses the 1D and 2D histograms as approximations of the probability density and joint density functions (details about the entropy estimation by histogram are reported in Appendix C). Then, the mutual information is estimated using ĥ(X_1), ĥ(X_2) and ĥ(X_1, X_2) in Equation (3) as in the previous approach. Figure 6 shows the projections of the 2D histograms on the plane (X_1, X_2). A uniform spectrum shows a circular symmetry due to the independence of (X_1, X_2). With the Hamming and Blackman-Nuttall spectra, after 1/B the theoretical correlation coefficient is 0.4258 and 0.6727 respectively (the level curves of the joint density function are ellipses with the major axis sloping).
The entropies estimated using the histograms and the corresponding estimates of the mutual information, shown in Table 2, are in good agreement with those estimated by the sample parameters (S²_1, S²_2, ρ̂_12) and with the theoretical ones.
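A sketch of the histogram approach of Appendix C, applied here to two independent Gaussian samples for which h(X) = ln√(πe) ≅ 1.0724 nat and I ≅ 0 (bin count and sample size are illustrative choices):

```python
import numpy as np

def entropy_hist(x, bins=60):
    """1-D differential entropy (nat) from a normalised histogram."""
    p, edges = np.histogram(x, bins=bins, density=True)
    dx = edges[1] - edges[0]
    p = p[p > 0]
    return -np.sum(p * np.log(p)) * dx

def joint_entropy_hist(x1, x2, bins=60):
    """2-D differential entropy (nat) from a normalised 2-D histogram."""
    p, xe, ye = np.histogram2d(x1, x2, bins=bins, density=True)
    dx, dy = xe[1] - xe[0], ye[1] - ye[0]
    p = p[p > 0]
    return -np.sum(p * np.log(p)) * dx * dy

rng = np.random.default_rng(2)
x1 = rng.normal(0.0, np.sqrt(0.5), 100_000)   # sigma^2 = 0.5
x2 = rng.normal(0.0, np.sqrt(0.5), 100_000)   # independent of x1
mi = entropy_hist(x1) + entropy_hist(x2) - joint_entropy_hist(x1, x2)
print(round(entropy_hist(x1), 4), round(mi, 4))
```

The 2-D plug-in estimate carries a small positive bias in the mutual information when the number of pairs per cell is low, consistent with the convergence behaviour discussed in Appendix C.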

4.2. Controlled PAPR (Non-Gaussian Process)
A reduction of the PAPR from the natural value by a non-linear transformation (Alternating Projection algorithm in Figure 5) distorts the random process, reducing or deleting its Gaussianity. For a real random variable X, the negentropy J(X) measures the loss of Gaussianity and is defined as J(X) = h_G(X) − h(X), where h_G(X) = (1/2)·ln(2πeσ²) is the entropy of a Gaussian random variable with the same variance σ² as the random variable X. As is well known, see for instance [60], the Gaussian distribution has the maximal entropy among all distributions with the same variance, hence a positive J(X) measures the distance of X from the Gaussian model. The main problem in using negentropy is its evaluation, which is very difficult when the probability density function of X is unknown. Figure 7 shows the estimated negentropy of the real part of a single waveform generated by FCG (with Uniform, Hamming and Blackman-Nuttall spectrum, B = 50 MHz, σ² = 0.5), varying the PAPR, where h_G(X) = (1/2)·ln(2πeσ²) = ln√(πe) ≅ 1.0724 and h(X) is estimated using the histogram (Appendix C). The negentropy results are independent of the spectrum shape. When the PAPR is less than 4σ ≅ 3 the process begins to be definitely non-Gaussian. The same trend is obtained for the imaginary part. According to Figure 7, for PAPR > 3 the joint entropy and the joint information are those of a Gaussian process (see Figure 8a,b). For PAPR < 3 the non-linear transformation de-correlates the samples, reducing both the joint entropy (Figure 8a) and the mutual information (Figure 8b).
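As a rough stand-in for the Alternating Projection PAPR control of Figure 5 (whose details are in [68]), the sketch below hard-clips a white complex Gaussian waveform to a target PAPR and estimates the negentropy of its real part via a histogram, showing the loss of Gaussianity as the PAPR decreases:

```python
import numpy as np

def entropy_hist(x, bins=60):
    """1-D differential entropy (nat) from a normalised histogram."""
    p, edges = np.histogram(x, bins=bins, density=True)
    dx = edges[1] - edges[0]
    p = p[p > 0]
    return -np.sum(p * np.log(p)) * dx

def clip_to_papr(z, papr):
    """Hard-clip a unit-power complex waveform to the requested PAPR,
    then restore unit power (a crude single amplitude projection)."""
    limit = np.sqrt(papr * np.mean(np.abs(z) ** 2))
    mag = np.maximum(np.abs(z), 1e-30)
    z = z * np.minimum(1.0, limit / mag)
    return z / np.sqrt(np.mean(np.abs(z) ** 2))

rng = np.random.default_rng(3)
n = 200_000
z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
results = {}
for papr in (10.0, 3.0, 1.5):
    x = np.real(clip_to_papr(z, papr))
    h_g = 0.5 * np.log(2 * np.pi * np.e * x.var())  # Gaussian entropy, same variance
    results[papr] = h_g - entropy_hist(x)           # negentropy estimate (nat)
    print(papr, round(results[papr], 4))
```

Near the natural PAPR almost no samples are clipped and the estimated negentropy stays close to zero; as the PAPR approaches one the clipping affects a substantial fraction of the samples and the negentropy grows, in line with the trend of Figure 7.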


Conclusions, Recommendations and Perspectives for Future Research
The transmitted waveform is a key performance element in every radar design. Using Noise Radar Technology, the pseudo-random waveforms shall be suitably "tailored" to satisfy contrasting requirements in terms of power efficiency (calling for a "low", often nearly unitary, PAPR) and of the information available to any intercepting counterpart (calling for a "high" PAPR, equal or close to that of a Gaussian process, of the order of 9-10). In addition, low sidelobes of the autocorrelation function are required by many applications, calling for a significant spectral weighting (e.g., Blackman-Nuttall or Taylor).
From a preliminary information-theoretic analysis using the negentropy concept, the effect of the selected PAPR on the information content available to EW receivers was analysed in its main trends. It results that for a PAPR value above an approximate threshold of three, the Gaussian approximation for the entropy holds. On the other hand, when the PAPR goes from about three towards unity, there is a steep increase in the negentropy and a significant decrease in the joint information of a pair of successive samples due to their progressive decorrelation. This decorrelation increases the quantity of information available to the counterpart and might limit the LPI properties of the radar by supporting detection or interception of the noise radar emission. Thus, the trade-off in noise radar waveform design between longer detection ranges (demanding higher effective transmitted powers, often implemented efficiently by lower PAPR values) and LPI features remains. This paper aims to support such design decisions by providing a quantitative analysis of the relationship between the PAPR and the detectability or exploitability of the radar waveform.
Author Contributions: G.G. defined the main content, including a synthesis of the modern EW systems, and the overall organisation of the paper. G.P. developed the waveform analysis tools and obtained the related results. K.S. and C.W. studied, checked, and critically reviewed the waveform generation methods and the related analysis of their information content. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.

Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.

Conflicts of Interest:
The authors declare no conflict of interest.


Appendix A

Appendix A.1. Entropy and Negentropy
From an energetic point of view, in a transformation process, maximum efficiency is obtained when the sum of all involved energies after the process has occurred has an ability to produce work equal to their sum before. The II Law of Thermodynamics, the well-known "increase of entropy principle", states that maximum efficiency is attained in energy transformations for an (ideal) reversible process, i.e., an "infinitely slow" one, in which the entropy production is zero, hence, the order is saved. The extent of the entropy creation due to irreversibility is a measure of the "lack of ideality" of a process. In energy transformations, we get the maximum efficiency using processes in which the entropy creation due to irreversibility is minimal.
The mechanical-statistical character of the entropy concept and its logarithmic connection to probability is due to Ludwig E. Boltzmann (20 February 1844-5 September 1906) and his kinetic gas theory. The tombstone of Boltzmann in the Viennese Zentralfriedhof shows his celebrated formula for the entropy (S) of a system with W possible microstates: S = k_B·log(W), where k_B is the Boltzmann constant and log(·) the natural logarithm.
In the Information Theory context [59], entropy is generally seen as a measure of the uncertainty (unpredictability) of an event, and for a random variable X, of the "distance" of its realisations from the predictable ones.
Given a discrete random variable X with probability mass function p(x_i) = P{X = x_i}, the entropy is defined as (E[·] being the expected value operator): H(X) = −Σ_i p(x_i)·log(p(x_i)) = E[−log p(X)]. For a continuous random variable X with probability density function f(x), the differential entropy (denoted with the lowercase letter) is defined as: h(X) = −∫ f(x)·ln(f(x)) dx. In the present discussion the natural logarithm will be used. Note that when dealing with discrete random variables the equivalent formulation with base-2 logarithm (log_2) is more widely used and leads to the well-known information unit called the bit, while the natural logarithm, more widely used in signal processing, leads to the (equivalent) nat unit: 1 nat = 1/ln(2) ≅ 1.443 bit.
A tutorial example of entropy evaluation is the coin-tossing experiment with outcomes head (with probability p_1 = P{head}) and tail (with probability p_2 = P{tail} = 1 − p_1), which generates a Bernoulli-type variable. Its entropy is: H_Bernoulli = −[p_1·log(p_1) + p_2·log(p_2)], which is equal to 1, i.e., one bit of information, when log_2 is used and the coin is unbiased (p_1 = 0.5), i.e., the result is minimally predictable (the disorder is maximum). For a biased coin, predictability increases and the entropy is smaller: for p_1 = 0.25 it equals 0.8113 bit (0.5623 nat), and for p_1 going to zero, or to one, the entropy goes to zero, i.e., the result becomes predictable.
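The coin example can be checked directly (a minimal sketch):

```python
import numpy as np

def bernoulli_entropy_bits(p1):
    """Entropy in bits of a Bernoulli variable with P{head} = p1."""
    p = np.array([p1, 1.0 - p1])
    p = p[p > 0]                  # by convention 0*log(0) = 0
    return float(-np.sum(p * np.log2(p)))

print(bernoulli_entropy_bits(0.5))    # unbiased coin: 1.0 bit
print(bernoulli_entropy_bits(0.25))   # biased coin: ~0.811 bit
```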
When the random variable X is normalised (i.e., with zero mean and unit variance), its entropy depends only on the particular shape of its probability density function f(x) or probability mass function p(x_i). For real continuous random variables the interesting result holds that, among all normalised random variables, the Gaussian (Normal) variable has the maximum entropy: h_G(X) = (1/2)·ln(2πe) ≅ 1.42 nat. Hence, any non-linear transformation of a Gaussian variable reduces the entropy, or "creates some negentropy". The term negentropy is first found in Schrödinger's book [69], which discusses the complexity of life and its evolution towards more and more "organised", or somehow "ordered", living species, a situation which seems to "create order" and to "negate the entropy", apparently against the II Law of Thermodynamics.
For a random variable X, the negentropy J(X) is defined as: J(X) = h_G(X) − h(X), where h_G(X) = (1/2)·ln(2πeσ²) is the entropy of a Gaussian random variable with the same variance σ² as X. As said, the Gaussian distribution has the maximal entropy among all distributions with the same variance, hence J(X) ≥ 0.
The main problem in using negentropy is its evaluation, which is very difficult when the probability density function of X is unknown and has to be estimated. Therefore, approximations of negentropy use the higher-order (up to fourth) moments [70].

Appendix A.2. Self-Information and Mutual Information

Writing the entropy as H(X) = E[I(X)], with I(X) = −log p(X), the entropy of X equals the expectation of the random variable I(X). The latter is called the self-information of X and measures the information obtained by observing X or, equivalently, the a priori uncertainty in the outcome of the event {X = x_i} in the discrete case and {x < X < x + dx} for a continuous random variable.
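A minimal sketch of such a moment-based negentropy approximation, using the classical expression J(X) ≈ E[X³]²/12 + kurt(X)²/48 for a normalised variable (whether this is the exact expression of [70] is an assumption; refined variants exist):

```python
import numpy as np

def negentropy_moments(x):
    """Approximate negentropy (nat) from third- and fourth-order moments:
    J ~ E[x^3]^2/12 + kurt(x)^2/48, with x normalised."""
    x = (x - x.mean()) / x.std()      # zero mean, unit variance
    skew = np.mean(x ** 3)
    kurt = np.mean(x ** 4) - 3.0      # excess kurtosis
    return skew ** 2 / 12.0 + kurt ** 2 / 48.0

rng = np.random.default_rng(4)
j_gauss = negentropy_moments(rng.normal(size=100_000))     # close to 0
j_unif = negentropy_moments(rng.uniform(-1, 1, 100_000))   # clearly positive
print(j_gauss, j_unif)
```

For a uniform variable the excess kurtosis is −1.2, so the approximation gives J ≈ 1.44/48 = 0.03 nat, while for Gaussian data both moment terms vanish apart from sampling noise.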
By Equation (A8), we can introduce the mutual information rate (MIR) as a measure of the rate of growth of the common information as a function of time: MIR = h(X) − (1/n)·h(X_1, X_2, . . . , X_n). If {X_1, X_2, . . . , X_n} are independent, MIR = 0. Using the relation between the joint and the conditional densities for {X_1, X_2, . . . , X_n}, i.e., f(x_1, x_2, . . . , x_n) = f(x_n|x_{n−1}, . . . , x_1)·f(x_{n−1}|x_{n−2}, . . . , x_1)· . . . ·f(x_2|x_1)·f(x_1), the joint entropy can be written as a function of the conditional entropies: h(X_1, X_2, . . . , X_n) = h(X_n|X_{n−1}, . . . , X_1) + h(X_{n−1}|X_{n−2}, . . . , X_1) + . . . + h(X_2|X_1) + h(X_1) (A13). For a real wide-sense stationary (WSS) process, h(X_n) = h(X) for each n and, when n → ∞, it can be demonstrated that lim_{n→∞} h(X_n|X_{n−1}, X_{n−2}, . . . , X_1) = lim_{n→∞} (1/n)·h(X_1, X_2, . . . , X_n) ≜ h_r(X_1, X_2, . . . , X_n), where the second limit defines the entropy rate h_r(X_1, X_2, . . . , X_n), i.e., the measure of the average information carried by each sample in a random sequence of n consecutive samples. The above concepts can be extended to n discrete random variables. For example, consider a process generating, at each step, a random natural number X_i in the interval from 1 to N (where N in computer applications may be, for example, the largest representable integer number, equal to 2^b, where b is the number of bits used in the representation). If the distribution of X is uniform and each generation is independent of the others, the resulting entropy after M steps is M·log(N), and it seems natural to define log(N) as the entropy rate, or information rate, of this process.

Appendix B
For a complex random variable Z = X + jY the probability density function f(Z) is defined by the joint density of the real (X) and imaginary (Y) parts, i.e., f(Z) ≜ f(X, Y).
Considering a complex random vector Z_n ∈ C^n, with n complex components {Z_1, Z_2, . . . , Z_n}, the joint probability density function is f(Z_n) ≜ f(X_n, Y_n), where X_n, Y_n ∈ R^n are real vectors denoting the real and imaginary parts of Z_n.
The entropy of Z_n is: h(Z_n) = −∫ f(Z_n)·ln(f(Z_n)) dZ_n ≜ h(X_n, Y_n). A complex random process {Z(t)}, after sampling, can be described by n complex samples, i.e., by a row vector Z_n = [Z_1, Z_2, . . . , Z_n], whose components are complex random variables Z_i = X_i + jY_i with i = 1, 2, . . . , n. Without loss of generality, we suppose Z_n to be zero-mean.
In this context of complex processes, to extend the definition of the entropy rate, we introduce the second-order stationarity (SOS) concept. The covariance function R(i, i + m) = E[Z_{i+m}·Z*_i] alone is not sufficient to entirely describe an SOS process. It is necessary to introduce a second function, called the pseudo-covariance function (also called the relation function in [65]), defined as R̃(i, i + m) = E[Z_{i+m}·Z_i].

Definition of a Second-Order Stationary (SOS) Process
A complex random process Z_n is said to be SOS if: (i) it is WSS, i.e., ∀i E[Z_i] is constant and the covariance function R(i, i + m) = E[Z_{i+m}·Z*_i] only depends on the index difference m, i.e., R(i, i + m) = R(m); (ii) ∀i the pseudo-covariance function R̃(i, i + m) = E[Z_{i+m}·Z_i] only depends on the index difference m, i.e., R̃(i, i + m) = R̃(m).
For complex random processes, the WSS assumption does not necessarily imply SOS. In many applications (communications, medical image processing, . . . ) the signals (realisations of the process) are, in the most general case, complex, with the real and imaginary parts possibly correlated to each other. In this case R̃(m) is complex. However, when the real and imaginary parts are uncorrelated, R̃(m) has real values, and when R̃(m) = 0 the process is said to be second-order circular (SOC) [63]. (Definition: a complex process is second-order circular (SOC) if its second-order statistics are invariant under any phase transformation, i.e., considering Z and Z·e^{jα}, their covariance functions R(m) are equal for any real number α, while their pseudo-covariance functions R̃(m) are equal if and only if they are zero.)
If the operator F{·} defining the Fourier Transform is applied to the covariance function and to the pseudo-covariance function, the power spectrum S(ω) = F{R(m)} and the pseudo-power spectrum S̃(ω) = F{R̃(m)} are defined.
For a real process S̃(ω) = S(ω), which uniquely defines the power spectrum of the process, showing the well-known properties of symmetry, non-negativity and finiteness of the integral over the interval [−π, +π]. Conversely, any function S(ω) satisfying the previous three properties can be considered to be the Fourier transform of the covariance function of a real process.
For a complex process, given the two functions S(ω) and S̃(ω), they are respectively the power spectrum and the pseudo-power spectrum of a complex SOS random process if they satisfy the necessary conditions: S(ω) ≥ 0 and |S̃(ω)|² ≤ S(ω)·S(−ω). An SOS complex process is said to be "white" if the covariance function is R(m) = R(0)·δ(m), where δ(m) is the delta function, and no restriction is imposed on R̃(m). Instead, an SOS complex process is said to be "doubly white" if it is "white" and R̃(m) = R̃(0)·δ(m).
In general R̃(m) is complex, but when the real and imaginary parts are uncorrelated it assumes a real value, and when R̃(m) = 0 the process is a circular white process. Now, after the introduction of the above concepts and definitions, we define the entropy rate of a complex random process Z_n. As in the real case, the entropy rate is defined as h_r(Z_n) = lim_{n→∞} (1/n)·h(Z_1, Z_2, . . . , Z_n), if the limit exists, and it results that h(Z_1, Z_2, . . . , Z_n) ≤ Σ_{k=1}^{n} h(Z_k), where the equality occurs if and only if the Z_i are independent.
Hence, the entropy rate can be used to measure the sample dependence and it reaches the upper bound when all samples of the process are independent.
Given a complex SOS Gaussian process Z_n with power spectrum S(ω) and pseudo-power spectrum S̃(ω), the entropy rate is [64]: h_r(Z_1, Z_2, . . . , Z_n) = ln(πe) + (1/4π)·∫_{−π}^{+π} ln[S(ω)·S(−ω) − |S̃(ω)|²] dω (A19). We now show that for a doubly white Gaussian random process with R̃(0) ∈ R (i.e., with the real and imaginary parts uncorrelated), the entropy rate, Equation (A19), is the sum of the entropy rates of the real part, h_r(X_n), and of the imaginary part, h_r(Y_n), of the process. Since X_n and Y_n are uncorrelated, we can directly use the entropy rate formula for real Gaussian processes, and the entropy rate of the complex Gaussian process is: h_r(Z_n) = h_r(X_n) + h_r(Y_n) (A22). Since X_n and Y_n are white, i.e., with constant spectra S_X(ω) = R_X(0) and S_Y(ω) = R_Y(0), and owing to the uncorrelation hypothesis for X_n, Y_n, R(0) = R_X(0) + R_Y(0) and R̃(0) = R_X(0) − R_Y(0), it results that S(ω) = S(−ω) = S_X(ω) + S_Y(ω) and S̃(ω) = S_X(ω) − S_Y(ω), real (A23). Inserting Equation (A23) into Equation (A19), we obtain Equation (A22), being S(ω)·S(−ω) − |S̃(ω)|² = 4·S_X(ω)·S_Y(ω). If the Gaussian random process is second-order circular white, R̃(m) = 0, the entropy rate is simply given by twice the entropy rate in the real domain, and MIR = h(Z) − h_r(Z_1, Z_2, . . . , Z_n) can be evaluated considering the marginal entropy h(Z) = ln(πe) + ln[(1/2π)·∫_{−π}^{+π} S(ω) dω] (sum of the entropies of the real and imaginary parts) and the entropy rate as the sum of the entropy rates of the real and imaginary parts: h_r(Z_1, Z_2, . . . , Z_n) = ln(πe) + (1/2π)·∫_{−π}^{+π} ln[S(ω)] dω. Hence: MIR = ln[(1/2π)·∫_{−π}^{+π} S(ω) dω] − (1/2π)·∫_{−π}^{+π} ln[S(ω)] dω.
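The last expression lends itself to a direct numerical check; in the sketch below the MIR of a second-order circular Gaussian process is computed from a power spectrum sampled on a uniform grid over [−π, π) (the cosine-shaped spectrum is an illustrative choice):

```python
import numpy as np

def mir_from_spectrum(S):
    """MIR (nat per sample) of a second-order circular Gaussian process:
    MIR = ln((1/2pi) * int S dw) - (1/2pi) * int ln(S) dw,
    approximated by averages over a uniform frequency grid."""
    return np.log(np.mean(S)) - np.mean(np.log(S))

omega = np.linspace(-np.pi, np.pi, 4096, endpoint=False)
flat = np.ones_like(omega)              # white spectrum
shaped = 1.0 + 0.9 * np.cos(omega)      # coloured (non-white) spectrum
mir_flat = mir_from_spectrum(flat)      # 0: independent samples
mir_shaped = mir_from_spectrum(shaped)  # > 0: correlated samples
print(mir_flat, round(mir_shaped, 4))
```

For the flat spectrum the two terms coincide and MIR = 0, consistent with the independence of the samples of a circular white Gaussian process; any spectral shaping makes the average of the logarithm smaller than the logarithm of the average (Jensen's inequality), giving MIR > 0.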
Appendix C

The performance of the estimation depends on the number of samples (or pairs, for 2D histograms) used to evaluate the histogram. Figure A1a shows the estimated marginal entropy and Figure A1b the joint entropy versus the number of samples (or pairs), together with their standard deviations (Figure A1c). Generally, this approach underestimates the entropy for a low number of samples (lack of values in the histogram evaluation). For a large number of samples (or pairs), i.e., ≥ 10^5, the performance becomes appreciable.
This approach could conceptually be extended to estimate the joint density in the multivariate case (n > 2); however, the "curse of dimensionality" [71] makes this impossible to realise. The term "curse of dimensionality" was coined by R. E. Bellman (1961) to indicate that the number of samples needed to estimate an arbitrary function with a given level of accuracy grows exponentially with the number of variables it comprises.