Sensors
  • Article
  • Open Access

14 February 2023

Comparison of Information Criteria for Detection of Useful Signals in Noisy Environments

Institute of Control Sciences of RAS, 117997 Moscow, Russia
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Development, Investigation and Application of Acoustic Sensors: Part II

Abstract

This paper considers the appearance of indications of useful acoustic signals in the signal/noise mixture. Various information characteristics (information entropy, Jensen–Shannon divergence, spectral information divergence and statistical complexity) are investigated in the context of solving this problem. Both time and frequency domains are studied for the calculation of information entropy. The effectiveness of statistical complexity is shown in comparison with other information metrics for different signal-to-noise ratios. Two different approaches for statistical complexity calculations are also compared. In addition, analytical formulas for complexity and disequilibrium are obtained using entropy variation in the case of signal spectral distribution. The connection between the statistical complexity criterion and the Neyman–Pearson approach for hypothesis testing is discussed. The effectiveness of the proposed approach is shown for different types of acoustic signals and noise models, including colored noises, and different signal-to-noise ratios, especially when the estimation of additional noise characteristics is impossible.

1. Introduction

Since Shannon [1] introduced information and information entropy, these concepts have attracted significant attention from scientists, as evidenced by the large number of articles devoted to the development of information theory in relation to various theoretical and practical aspects. Many different information criteria, metrics, and methods for their calculation that are based, one way or another, on the concepts of Shannon entropy have been proposed and investigated [2]. These metrics can be used quite successfully in signal processing, which eventually led to the emergence of a separate section of this scientific area, called entropic signal analysis [3].
For signals described by time series, the information entropy can be calculated on the basis of both signal representation in the time domain [4] and its representation in the frequency domain [5], i.e., using the signal spectrum. The convenience of the second approach comes from the fact that white noise, which is usually used to model background noise in these problem statements, has a uniform frequency distribution. This allows us to simplify its mathematical description and separate useful signals more effectively.
Decision theory considers change-point detection problems, which are closely related to the problems discussed above: often, in such problems, the moment of change in the parameters of a random process registered in discrete time must be determined. In [6,7], many probabilistic-statistical methods of solving such problems are considered. The Neyman–Pearson approach to this problem was used in [8]. Additionally, one cannot ignore the so-called Anomaly Detection Problems, where detection of anomalies in time series is required [9,10,11], i.e., the moment at which the behavior of the system begins to qualitatively differ from normal for various reasons, in particular, due to unwanted external interference. The electrocardiogram (ECG) is one example of such a time series, and ECGs have been analyzed in a large number of articles, for example, [12]. The presence of an anomaly in this case can indicate health problems, and detection at an early stage may save the life of the patient.
Of particular interest is the processing of acoustic signals, which can be useful, for example, in Voice Activity Detection (VAD) problems [13] related to voice assistants. The task is usually to separate speech segments from background environmental noise. Related articles [5,14,15] present a method for endpoint detection, i.e., the determination of the limits of a speech signal in a mixture of this signal and background noise based on the calculation of the spectral entropy. The general idea of methods based on information criteria is that their values experience a sharp jump when a useful signal appears in the noise.
In a series of articles [16,17,18,19,20,21], researchers introduced the concept of a statistical measure of signal complexity, which they called statistical complexity. In [22,23], statistical complexity and information entropy were used to classify various underwater objects of animate and inanimate nature from recorded sound. In the present article, we use this measure to indicate the appearance of a useful acoustic signal in a highly noisy mixture. It should be noted that an advantage of the proposed method is that it does not require any a priori knowledge about the signal to be detected. However, if a priori information, such as the approximate frequency range of the signal, is known, its detection will be even more accurate.
The structure of the paper is as follows. Section 2 provides a brief theoretical summary of the information criteria used in various known signal detection methods. In Section 3, entropy variation is investigated and statistical complexity is introduced. In Section 4, the connection between statistical complexity and the Neyman–Pearson criterion for hypothesis testing is discussed to justify the proposed approach. Section 5 provides a variety of examples, compares different information criteria, and discusses the results, which allows us to make an educated choice of a suitable rule for the detection of signals in a noisy mixture. Section 6 summarizes the conducted research and outlines directions for future work.

2. Information Criteria

2.1. Information Entropy and Other Information Criteria

In information theory, the entropy of a random variable is the average level of “surprise” or “uncertainty” inherent in the possible outcomes of the variable. For a discrete random variable $X$ that takes values in the alphabet $\mathcal{X}$ and has a distribution density $p : \mathcal{X} \to [0, 1]$, the entropy, according to Shannon [1], is defined as
$$H(p) = -\sum_{x \in \mathcal{X}} p(x) \log_2 p(x), \qquad (1)$$
where the sum runs over all possible values of the variable. When computing the sum in (1), it is agreed that $0 \log 0 = 0$, and this convention holds for all subsequent equations. From Formula (1), it follows that the entropy reaches its maximum value when all states of the system are equally probable.
There are several definitions of information divergences, i.e., statistical distances between two distributions. The Kullback–Leibler divergence (also known as relative entropy) between two discrete probability distributions $p(x)$ and $q(x)$ on an event set $\mathcal{X}$ is defined as
$$D_{KL}(p \| q) = \sum_{x \in \mathcal{X}} p(x) \log_2 \frac{p(x)}{q(x)}. \qquad (2)$$
This measure is a statistical distance and distinguishes statistical processes by indicating how much p ( x ) differs from q ( x ) by the maximum likelihood hypothesis test when the actual data obey the distribution p ( x ) . It is easy to see that
$$D_{KL}(p \| q) = H(p, q) - H(p), \qquad (3)$$
where H ( p , q ) is a cross-entropy between p and q:
$$H(p, q) = -\mathbb{E}_p[\log_2 q], \qquad (4)$$
where $\mathbb{E}_p[\cdot]$ denotes the mathematical expectation with respect to the distribution $p$.
The symmetrized Kullback–Leibler distance [13] is often used in studies:
$$\rho(p \| q) = D_{KL}(p \| q) + D_{KL}(q \| p). \qquad (5)$$
However, the Jensen–Shannon divergence, which symmetrizes the Kullback–Leibler divergence and is often a more convenient information measure for practical applications, is used more often:
$$JSD(p \| q) = \frac{D_{KL}(p \| m) + D_{KL}(q \| m)}{2}, \qquad m = \frac{p + q}{2}. \qquad (6)$$
It is symmetric and always has a finite value. The square root of the Jensen–Shannon divergence is a metric that is often called the Jensen–Shannon distance.
It is easy to see that
$$JSD(p \| q) = H(m) - \frac{1}{2}\left( H(p) + H(q) \right). \qquad (7)$$
Another quantity related to the complexity of the system is the ”disequilibrium,” denoted by D, which shows the deviation of a given probability distribution from a uniform one. The concept of the statistical complexity of a system can be considered a development of the concept of entropy. In [16,17,18,19,20,21], it is defined as
$$C = H \cdot D, \qquad (8)$$
where C is the statistical complexity, H is the information entropy, and D is a measure of the disequilibrium of the distribution relative to the uniform one.
The measure of statistical complexity reflects the relationship between the amount of information and its disequilibrium in the system. As a parameter D, according to the authors of [16], one can choose any metric that determines the difference between the maximum entropy and the entropy of the studied signal. The simplest example of disequilibrium is the square of the Euclidean distance in R N between the original distribution and the uniform distribution, but often, the Jensen–Shannon divergence [22,23] is also used.
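To make these definitions concrete, the following minimal Python sketch computes the normalized Shannon entropy, the Jensen–Shannon divergence (7), the squared Euclidean disequilibrium, and the resulting statistical complexity $C = H \cdot D$ for discrete distributions. The function names and the toy distributions are ours and purely illustrative; this is a sketch of the definitions above, not the authors' implementation.

```python
import numpy as np

def shannon_entropy(p, normalize=True):
    """Shannon entropy of a discrete distribution p, with the convention 0*log0 = 0."""
    p = np.asarray(p, dtype=float)
    nz = p > 0
    h = -np.sum(p[nz] * np.log2(p[nz]))
    return h / np.log2(len(p)) if normalize else h

def jensen_shannon(p, q):
    """Jensen-Shannon divergence JSD(p||q) = H(m) - (H(p) + H(q)) / 2, Equation (7)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    m = 0.5 * (p + q)
    return (shannon_entropy(m, normalize=False)
            - 0.5 * (shannon_entropy(p, normalize=False) + shannon_entropy(q, normalize=False)))

def disequilibrium_sq(p):
    """Squared Euclidean distance between p and the uniform distribution."""
    p = np.asarray(p, dtype=float)
    return np.sum((p - 1.0 / len(p)) ** 2)

def statistical_complexity(p, diseq=disequilibrium_sq):
    """Statistical complexity C = H * D, Equation (8), for a chosen disequilibrium D."""
    return shannon_entropy(p) * diseq(p)

# A peaked distribution carries both entropy and disequilibrium, so its
# complexity is nonzero, whereas the uniform distribution has D = 0.
p_uniform = np.full(8, 1 / 8)
p_peaked = np.array([0.65, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05, 0.05])
print(statistical_complexity(p_uniform))  # 0.0
print(statistical_complexity(p_peaked))   # > 0
```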

2.2. Time Entropy

Now let us consider the information characteristics mentioned above in relation to time series. The Shannon entropy for systems with unequal probability states is defined as follows: Let the $i$-th state of the system have probability $p_i = N_i / N$, where $N$ is the sample size and $N_i$ is the number of samples at the $i$-th level. Then, the entropy $H(p)$, according to Formula (1), equals
$$H(p) = -\sum_{i=1}^{N} p_i \log_2 p_i. \qquad (9)$$
From here, we consider discrete probability distributions p i with the following properties:
$$p_i \in [0, 1], \qquad \sum_{i=1}^{N} p_i = 1. \qquad (10)$$
There are different ways to calculate the probabilities $p_i$ from a time series. The simplest one is as follows: First, the maximum $x_{\max}$ and minimum $x_{\min}$ values are found for the considered time series $x(t)$ with $N$ data points. Then, the interval $(x_{\max} - x_{\min})$ is divided into $n$ subintervals (levels) so that the width $\Delta x$ of each subinterval is not less than the confidence interval of the observations. The resulting sample is treated as a “message”, and the $n$ subintervals are treated as an “alphabet”. Then, we find the number $\Delta N_i$ of sample values $x_k$ that fall into each of the subintervals and determine the relative population level $p_i^t$ (the probability of a value from the sample falling into subinterval $i$, that is, the relative frequency of occurrence of the “letter” in the “message”):
$$p_i^t = \frac{\Delta N_i}{N}, \qquad \sum_{i=1}^{n} \Delta N_i = N, \qquad \sum_{i=1}^{n} p_i^t = 1. \qquad (11)$$
The elementary entropy of the sampling is defined as the Shannon entropy (9) on the set $p_i^t$ and is normalized by the total number of states $n$ so that its values belong to the interval $[0, 1]$:
$$H(p^t) = -\frac{\sum_{i=1}^{n} p_i^t \log_2 p_i^t}{\log_2 n}. \qquad (12)$$
This approach is known as the first sampling entropy [4] and is used, for example, in [24] to detect the hydroacoustic signals emitted by an underwater source.
On the other hand, the second sampling entropy can be defined as
$$H(p^0) = -\frac{\sum_{i=1}^{N} p_i^0 \log_2 p_i^0}{\log_2 N}, \qquad p_i^0 = \frac{x(t_i)}{\sum_{k=1}^{N} x(t_k)}. \qquad (13)$$
In this case, the signal samples themselves are considered “letters”, which are distributed across the time axis in contrast to the amplitude axis from (12), and the “alphabet” is the whole set of amplitudes.
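As an illustration, here is a minimal Python sketch of the two sampling entropies $H(p^t)$ and $H(p^0)$ described above. For the second entropy, the sketch takes absolute values of the samples so that the weights $p_i^0$ remain non-negative for signed audio data; that step is our assumption, since (13) presumes non-negative weights.

```python
import numpy as np

def first_sampling_entropy(x, n_levels=64):
    """H(p^t), Equation (12): the amplitude range is split into n_levels
    subintervals ("letters") and the relative bin populations (11) form
    the distribution."""
    counts, _ = np.histogram(x, bins=n_levels)
    p = counts / counts.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p)) / np.log2(n_levels)

def second_sampling_entropy(x):
    """H(p^0), Equation (13): the samples themselves act as "letters";
    absolute values are taken here (our assumption) so that the weights
    are non-negative for signed audio data."""
    w = np.abs(np.asarray(x, dtype=float))
    p = w / w.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p)) / np.log2(len(w))
```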

2.3. Spectral Entropy

In addition to the time domain, the entropy can be calculated based on the representation of the signal in the frequency domain, i.e., p i can be calculated with the spectrum of the signal. Spectral entropy is a quantitative assessment of the spectral complexity of the signal in the frequency domain from an energy point of view.
Consider the time series x ( t ) and its spectral decomposition in the frequency domain X ( f i ) with N f f t frequency components, obtained using the Fast Fourier Transform (FFT). The spectral power density is estimated as follows:
$$s(f_i) = \frac{1}{N_{fft}} \left| X(f_i) \right|^2. \qquad (14)$$
Then, the probability distribution of the spectral power density $p^s = \{p_1, p_2, \dots, p_{N_{fft}}\}$ can be written in the form
$$p_i^s = \frac{s(f_i)}{\sum_{k=1}^{N_{fft}} s(f_k)}, \qquad i = 1, \dots, N_{fft}, \qquad (15)$$
where $s(f_i)$ is the spectral energy of the spectral component with frequency $f_i$, $p_i^s$ is the corresponding probability, $N_{fft}$ is the number of spectral components in the FFT, and the superscript $s$ indicates that the distribution refers to the signal spectrum. The resulting function is a spectral distribution density function.
Finally, the spectral entropy can be determined with Equation (9) and normalized by the size of the spectrum:
$$H(p^s) = -\frac{\sum_{k=1}^{N_{fft}} p_k^s \log_2 p_k^s}{\log_2 N_{fft}}. \qquad (16)$$
The Spectral Information Divergence ($SID$) method [25] was recently added to the Matlab mathematical package; it assesses the similarity of two signals according to Formula (5), using the divergence between the probability distributions of their spectra:
$$SID(r, t) = \sum_i p_i \log \frac{p_i}{q_i} + \sum_i q_i \log \frac{q_i}{p_i}, \qquad (17)$$
where $r$ and $t$ are the reference and test spectra, respectively, and the values of the probability distributions $p_i$ and $q_i$ for these spectra are determined according to (15).
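A minimal Python sketch of the spectral quantities above follows: the spectral probability distribution (15), the normalized spectral entropy (16), and the SID (17). The use of a one-sided FFT of a real-valued window and the small regularization constant in the SID are our assumptions, not part of the original method.

```python
import numpy as np

def spectral_distribution(x):
    """Spectral probability distribution p^s, Equations (14)-(15);
    a one-sided FFT of a real-valued window is assumed here."""
    X = np.fft.rfft(x)
    s = np.abs(X) ** 2      # spectral power; the constant 1/N_fft cancels in the normalization
    return s / s.sum()

def spectral_entropy(x):
    """Normalized spectral entropy, Equation (16)."""
    p = spectral_distribution(x)
    n = len(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p)) / np.log2(n)

def spectral_information_divergence(r, t, eps=1e-12):
    """SID between reference and test spectra, Equation (17);
    eps is a small regularization constant (our addition) to avoid log(0)."""
    p, q = spectral_distribution(r), spectral_distribution(t)
    return np.sum(p * np.log((p + eps) / (q + eps))) + np.sum(q * np.log((q + eps) / (p + eps)))
```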

4. Hypothesis Testing

The classical probabilistic approach to the considered problem of detecting useful signals against background noise is binary hypothesis testing. The binary problem associated with deciding whether only noise is received (hypothesis $\Gamma_0$) or a mixture of a useful signal and noise is received (hypothesis $\Gamma_1$) is solved [8].
In the statistical decision theory [6], it is shown that, in signal detection in the presence of noise, the optimal decisive rule is based on a comparison of the likelihood ratio with some threshold. The Neyman–Pearson criterion is used to select the threshold in the absence of a priori probabilities of the presence and absence of a useful signal. The efficiency of the detection procedure using the Neyman–Pearson criterion is characterized by the probability of correct detection with a fixed probability of false alarms.
The solution to the problem of distinguishing between two hypotheses can be derived from the following variant of the Neyman–Pearson lemma.
Lemma 2
(Neyman–Pearson). Let there be a measurable function, called a decisive rule,
$$d(x_1, \dots, x_N) = \begin{cases} 1, & \text{the hypothesis } \Gamma_0, \\ 0, & \text{the hypothesis } \Gamma_1, \end{cases}$$
based on which
$$\alpha(d) = \mathrm{Probability}(\text{of detecting } \Gamma_0 \mid \Gamma_1 \text{ is true}),$$
$$\beta(d) = \mathrm{Probability}(\text{of detecting } \Gamma_1 \mid \Gamma_0 \text{ is true}).$$
The decisive rule d * is optimal if
$$\alpha(d^*) + \beta(d^*) = \inf_{d} \left[ \alpha(d) + \beta(d) \right] = Er(N; \Gamma_0, \Gamma_1),$$
where $Er(N; \Gamma_0, \Gamma_1)$ is called the error function.
In the problem of the detection of a useful signal, α is known as the probability of a false alarm occurring, and β is known as the probability of missing a useful signal.
The error function can be calculated precisely through the variation of the measure (with a sign) by the following formula from [6]:
$$Er(N; \Gamma_0, \Gamma_1) = 1 - \frac{1}{2} \left\| P_0^{(N)} - P_1^{(N)} \right\| = 1 - TV(P_0, P_1), \qquad (37)$$
where $P_0^{(N)}$ is the multivariate distribution function of the observation statistics under hypothesis $\Gamma_0$, $P_1^{(N)}$ is the multivariate distribution function of the observation statistics under hypothesis $\Gamma_1$, and $TV$ is the total variation $TV(P_0, P_1) = \frac{1}{2}\left\| P_0^{(N)} - P_1^{(N)} \right\|$.
The peculiarity of Formula (37) is that if the supports on which the hypothesis measures $\Gamma_0$ and $\Gamma_1$ are concentrated are different, then $Er(N; \Gamma_0, \Gamma_1) \to 0$. If the measures $P_0^{(N)}$ and $P_1^{(N)}$ are similar, then $\left\| P_0^{(N)} - P_1^{(N)} \right\| \to 0$, and then $Er(N; \Gamma_0, \Gamma_1) \to 1$.
For the problem of detecting a deterministic useful signal, the behavior of $\left\| P_0^{(N)} - P_1^{(N)} \right\| = 2\,TV(P_0, P_1)$ and the possibility of a reasonable estimate of this value are of interest.
Bounds on $TV(P, Q)$ are known in terms of the estimate of $JSD(P \| Q)$, which is used to compute the statistical complexity $C_{JSD}$. Both $TV$ and $JSD$ are metrics on the space of probability distributions, whereas in Euclidean space, $D_{SQ}$ (21) serves as such a metric. Since the problem of detecting a deterministic signal in the presence of background noise is considered, it is reasonable to additionally take this “determinism” into account by multiplying $D_{SQ}$ by the entropy $H$, which motivates the introduction and use of statistical complexity in the forms (22) and (25).
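As a small numerical illustration of Formula (37), the following Python sketch computes the total variation distance and the corresponding error function for two discretized distributions. The Gaussian-shaped example distributions and all names are ours and purely hypothetical; the sketch only demonstrates how the minimal total error shrinks as the distributions separate.

```python
import numpy as np

def total_variation(p0, p1):
    """Total variation distance TV(P0, P1) = 0.5 * sum |p0 - p1|."""
    return 0.5 * np.sum(np.abs(np.asarray(p0) - np.asarray(p1)))

def error_function(p0, p1):
    """Er = 1 - TV(P0, P1), Formula (37): the minimal total error
    alpha + beta achievable by the optimal decisive rule."""
    return 1.0 - total_variation(p0, p1)

# Hypothetical example: "noise-only" vs. "signal-plus-noise" distributions
# discretized on the same grid; the further apart they are, the smaller Er.
grid = np.linspace(-5.0, 5.0, 201)
p_noise = np.exp(-grid ** 2 / 2.0)
p_noise /= p_noise.sum()
p_mix = np.exp(-(grid - 1.0) ** 2 / 2.0)
p_mix /= p_mix.sum()
print(error_function(p_noise, p_mix))
```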

5. Modelling and Discussion

5.1. The Calculation Algorithm and Presentation of the Simulation Results

In all experiments, graphs of the information characteristics are presented as functions of time. The characteristics are calculated from the signal according to the following algorithm:
  • After being digitized with the sampling rate F, the audio signal is divided into short segments containing W digital samples.
  • The discrete densities p i (15) are calculated from the time or frequency domains.
  • The information criterion is calculated using p i .
  • The sequence of values is displayed together with the signal on the time axis (each of the obtained values is extended by W counts).
When a certain threshold of the information criterion is exceeded, this indicates the appearance of a useful signal in the mixture.
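A minimal Python sketch of this windowed procedure is shown below. The function and parameter names are ours; the metric argument stands for any of the criteria above (e.g., the spectral entropy or the statistical complexity), and the comparison sign may need to be reversed for criteria that decrease when a useful signal appears.

```python
import numpy as np

def detect_segments(x, W, metric, threshold):
    """Windowed detection sketch: split the digitized signal into
    non-overlapping segments of W samples, evaluate an information
    criterion on each segment, and flag segments whose value exceeds
    the threshold (reverse the comparison for criteria that decrease
    when a useful signal appears)."""
    n_win = len(x) // W
    values = np.empty(n_win)
    for k in range(n_win):
        values[k] = metric(x[k * W:(k + 1) * W])
    flags = values > threshold
    # Each value is extended over its W samples for plotting against the signal.
    return np.repeat(values, W), np.repeat(flags, W)

# Hypothetical usage with the spectral entropy sketch from Section 2.3:
# curve, mask = detect_segments(audio, W=1024, metric=spectral_entropy, threshold=0.9)
```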
The signal processing results according to this algorithm are presented below. For different acoustic signals, a comparison of the quality of indication of the appearance of a useful signal by different information criteria at different levels of added white noise is demonstrated. In addition, Section 5.5 shows a comparison of two methods for calculating the statistical complexity and draws conclusions about the usefulness of both.
The first acoustic signal chosen was an audio recording of a humpback whale song recorded underwater. A large set of such recordings is available from the Watkins Marine Mammal Sound Database collected by Woods Hole Oceanographic Institution and the New Bedford Whaling Museum. The ability to separate such signals from strong sea noise may be useful for research biologists for further classification and study. In addition, these signals are similar in structure to the human voice with separate words, the extraction of which could be useful, for example, in tasks of voice activity detection and speech recognition.
In all of the graphs presented below, the signal is marked with a blue line, and the corresponding information metric is marked with a red line. The left vertical axis corresponds to the values of the signal amplitude, and the right vertical axis corresponds to the values of the information metric. All horizontal axes represent the timeline in seconds. The signal is shown without added noise for better comprehension, but the variable parameter of the standard deviation σ N of the white noise is marked with a dashed line. All information metrics are normalized for the convenience of presentation. All calculations and visualizations were performed using Python. White, brown, and pink noises, which were artificially added to audio recordings, were also generated numerically.
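The paper does not specify how the colored noises were generated; one common numerical approach, given here only as an assumption, is to shape the spectrum of white Gaussian noise with a $1/f^{\beta}$ power law ($\beta = 0$ for white, $\beta = 1$ for pink, $\beta = 2$ for brown noise).

```python
import numpy as np

def colored_noise(n, beta, rng=None):
    """Generate noise with an approximately 1/f**beta power spectrum by
    shaping the spectrum of white Gaussian noise:
    beta = 0 -> white, beta = 1 -> pink, beta = 2 -> brown noise."""
    rng = np.random.default_rng() if rng is None else rng
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    freqs[0] = freqs[1]                    # avoid division by zero at the DC bin
    spectrum *= freqs ** (-beta / 2.0)     # amplitude scaling = sqrt of the power scaling
    noise = np.fft.irfft(spectrum, n)
    return noise / noise.std()             # unit standard deviation

# noisy = clean + sigma_n * colored_noise(len(clean), beta=1)  # e.g., pink noise
```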

5.2. Time Information Criteria

First, we consider the behavior of the information entropies H ( p t ) and H ( p 0 ) , calculated from the time samples of the signal x ( t ) .
Figure 1 shows that, as the noise increases, there is serious degradation of the time entropy graph for both calculation methods, so that for $\sigma_N = 2000$ (SNR ≈ 1.5 dB), these information criteria can no longer serve as reliable indicators of the appearance of a useful signal in the mixture. We note an interesting feature of the behavior of $H(p^0)$ and $H(p^t)$: the value of the first characteristic is maximal for the uniform distribution and decreases with the appearance of a useful signal in the mixture, while, in contrast, the value of the second is minimal in the absence of a signal and increases with its appearance. This follows directly from the formulas for calculating the distributions and entropies (11)–(13).
Figure 1. Graphs of H ( p t ) and H ( p 0 ) for different levels of added noise.
The information characteristic L H ( p q ) stands out favorably from the time entropies, as demonstrated above.
In Figure 2, the $L_H$ for the noise level $\sigma_N = 2000$ indicates the appearance of a useful signal and works adequately even for twice that noise level. However, it should be noted that this is true only for stationary noise, whose mean value does not change over time. Otherwise, this metric will react to changes in the noise as well, which follows from Formula (29). Moreover, an initial estimate of $\sigma_N$ is required for the correct functioning of this criterion.
Figure 2. Graphs of L H ( p q ) for different levels of added noise.

5.3. Time Entropy H ( p 1 )

The time entropy $H(p^1)$, associated with another grouping of the “alphabet” derived from the signal samples, is considered separately, and the graphs for different numbers of letters are shown in Figure 3 and Figure 4.
Figure 3. The entropy H ( p 1 ) for the number of letters of the alphabet equal to 64.
Changing the alphabet partitioning negatively affects the effectiveness of entropy in this representation:
Figure 4. The entropy H ( p 1 ) for the number of letters of the alphabet equal to 8.

5.4. Spectral Information Criteria

Information criteria based on the spectral distribution $p^s$ are free of the disadvantages of the time-domain criteria.
Figure 5 shows the dependence of the spectral entropy on time. We can see a significant improvement in the maximum allowable noise level, at which the indication of the appearance of a useful signal is still possible, with respect to the graphs presented in Figure 2.
Figure 5. Spectral entropy plots for different SNRs.
The key point is that white noise has a well-defined uniform probability distribution in the spectral representation, which greatly simplifies the calculation of the entropy and removes the need to estimate the variance of this noise. Moreover, even if the noise is not stationary, i.e., its parameters change over time, within a small window W it can still be considered white, and the above statement remains true.
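This can be checked numerically: for a window of white Gaussian noise, the normalized spectral entropy is close to its maximum value of 1. The short sketch below is our own illustration, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(4096)                   # one window of white Gaussian noise
X = np.fft.rfft(w)
p = np.abs(X) ** 2
p /= p.sum()                                    # spectral probability distribution (15)
H = -np.sum(p * np.log2(p)) / np.log2(len(p))   # normalized spectral entropy (16)
print(H)                                        # close to 1: the spectrum is nearly uniform
```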
The distribution p s can be used as the basis for a number of information divergences (17), (24), (22), (25):
Figure 6 shows that the separability of the information metrics decreases along with the signal-to-noise ratio (SNR). However, the statistical complexity $C_{SQ}$ performs better than all other criteria, because it still allows a useful signal to be distinguished when the other metrics behave irregularly and no longer show a significant excess over the areas without a signal. Thus, it is the most promising characteristic in our opinion.
Figure 6. Information divergence plots based on spectral distribution.

5.5. Comparison of Different Ways of Calculating the Statistical Complexity

Of separate interest is the comparison of the behavior of the statistical complexities C S Q and C J S D , which essentially correspond to different methods of calculating the same value of statistical complexity. Figure 7 illustrates this comparison.
Figure 7. Comparison of statistical complexities C S Q and C J S D .
We can see that C S Q shows a better result when used as an indicator of the appearance of a useful signal in white noise compared to the Jensen–Shannon divergence.

5.6. Hydroacoustic Signal Model of an Underwater Marine Object

The second signal is a modelled hydroacoustic signal of an underwater marine object. The study of such signals is important in military and civilian applications, because it can automate the process of analyzing the hydroacoustic scene and identifying potential threats. In Figure 8, spectral entropy dependencies for different levels of added noise are shown.
Figure 8. Spectral entropy plots for different SNRs.
Figure 9 shows the dependencies of the statistical complexity for the given signal. It is worth noting that the selected information metric shows the presence of a useful signal even for a very small SNR (−17 dB) in the last example.
Figure 9. Graphs of statistical complexity for different SNRs.
It can be observed that, in comparison with all other information metrics, the statistical complexity shows the best result in terms of indicating the presence of a useful signal in the mixture, because it remains effective for small SNRs, while all other characteristics can no longer detect a useful signal in noisy receiving channels.

5.7. Hydroacoustic Signal Model with Pink Noise

Now let us change the additive noise model and use pink noise instead of white noise. As can be observed in Figure 10, the spectral entropy shows an unsatisfactory result for the chosen low SNR.
Figure 10. Spectral entropy for pink noise model.
Figure 11 shows that, along with the spectral entropy, the statistical complexity C S Q performs poorly, but C J S D confidently shows the presence of a signal.
Figure 11. Statistical complexities C S Q and C J S D for pink noise model.

5.8. Hydroacoustic Signal Model with Brown Noise

In this example, brown noise is used as the noise model. As in the previous subsection, spectral entropy fails in the task of signal extraction, as shown in Figure 12.
Figure 12. Spectral entropy for brown noise model.
However, Figure 13 shows that the statistical complexity with the Jensen–Shannon disequilibrium exhibits satisfactory performance.
Figure 13. Statistical complexities C S Q and C J S D for brown noise model.
The results are summarized in Table 1. A checkmark indicates that confident indication of the useful signal is possible, and a cross indicates that it is not.
Table 1. Comparison of information criteria for different noise models and SNRs.

6. Conclusions

The article proposed a method for indicating the appearance of a useful signal in a heavily noisy mixture based on the statistical complexity. The analytical formulas used to determine the disequilibrium and statistical complexity were obtained using entropy variation. The effectiveness of the proposed approach for two types of acoustic signals in comparison with other information metrics was shown for different models of added noise. For white noise, the appearance of a deterministic signal was shown to be reliably detected for a very small SNR (−15 dB) when the statistical complexity based on the spectral distribution variance was used as the criterion. However, for more complex noise models, the statistical complexity with the Jensen–Shannon disequilibrium was shown to be more efficient. Both the time and frequency domains were considered for the entropy calculation. The criteria for signal detection in a heavily noisy mixture based on time distributions were shown to be less informative than those based on the spectral distribution. The connection between the statistical complexity criterion and the Neyman–Pearson approach for hypothesis testing was also discussed. Future work will be devoted to research on information criteria based on two- and multidimensional distributions, and acoustic signals with realistic background noise will be considered.

Author Contributions

Conceptualization, A.G. and P.L.; methodology, A.G. and P.L.; software, P.L.; validation, L.B. and P.L.; formal analysis, P.L., L.B. and A.G.; investigation, L.B., A.G. and P.L.; writing—original draft preparation, P.L. and A.G.; writing—review and editing, L.B., A.G. and P.L.; visualization, P.L.; supervision, A.G.; project administration, A.G.; funding acquisition, A.G. All authors have read and agreed to the published version of the manuscript.

Funding

The work of A.G. and P.L. was partially supported by the Russian Science Foundation under grant no 23-19-00134.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The humpback whale song example was downloaded from https://cis.whoi.edu/science/B/whalesounds/bestOf.cfm?code=AC2A (accessed on 25 September 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
FFT: Fast Fourier Transform
SNR: Signal-to-noise ratio

Appendix A

Proof of Lemma 1.
The difference between entropies for the distributions p i and q i gives the entropy variation δ H :
$$\delta H = H(q + \delta q) - H(q) = -\sum_{i=1}^{N} (q_i + \delta q_i) \log_2 (q_i + \delta q_i) + \sum_{i=1}^{N} q_i \log_2 q_i = -\sum_{i=1}^{N} (q_i + \delta q_i) \log_2 \left[ q_i \left( 1 + \frac{\delta q_i}{q_i} \right) \right] + \sum_{i=1}^{N} q_i \log_2 q_i.$$
The property of the logarithm of the product and the regrouping of the summands allows the chain of equations to continue as follows:
$$\delta H = -\sum_{i=1}^{N} (q_i + \delta q_i) \left[ \log_2 q_i + \log_2 \left( 1 + \frac{\delta q_i}{q_i} \right) \right] + \sum_{i=1}^{N} q_i \log_2 q_i = -\sum_{i=1}^{N} q_i \log_2 \left( 1 + \frac{\delta q_i}{q_i} \right) - \sum_{i=1}^{N} \delta q_i \left[ \log_2 q_i + \log_2 \left( 1 + \frac{\delta q_i}{q_i} \right) \right] = -\sum_{i=1}^{N} \delta q_i \log_2 q_i - \sum_{i=1}^{N} (q_i + \delta q_i) \log_2 \left( 1 + \frac{\delta q_i}{q_i} \right).$$
The first sum equals the difference between the cross-entropy and the entropy. In the next transformation, the logarithm is expanded into an infinite series, and the resulting sum is split into two parts:
$$\delta H = H(p, q) - H(q) - \sum_{i=1}^{N} \sum_{n=1}^{\infty} (q_i + \delta q_i) \frac{(-1)^{n+1}}{n \ln 2} \left( \frac{\delta q_i}{q_i} \right)^n = H(p, q) - H(q) - \sum_{i=1}^{N} \sum_{n=1}^{\infty} \frac{(-1)^{n+1} \delta q_i^n}{n \ln 2 \, q_i^{n-1}} - \sum_{i=1}^{N} \sum_{n=1}^{\infty} \frac{(-1)^{n+1} \delta q_i^{n+1}}{n \ln 2 \, q_i^{n}}.$$
The summand corresponding to $n = 1$ is separated from the first double sum, and the summation continues from $n = 2$:
$$\delta H = H(p, q) - H(q) - \sum_{i=1}^{N} \frac{\delta q_i}{\ln 2} - \sum_{i=1}^{N} \sum_{n=2}^{\infty} \frac{(-1)^{n+1} \delta q_i^n}{n \ln 2 \, q_i^{n-1}} - \sum_{i=1}^{N} \sum_{n=1}^{\infty} \frac{(-1)^{n+1} \delta q_i^{n+1}}{n \ln 2 \, q_i^{n}}.$$
The separated summand is zero, since $\sum_{i=1}^{N} \delta q_i = 0$ (both $p = q + \delta q$ and $q$ are probability distributions). Shifting the summation index of the first double sum yields $\delta H$ in the following form:
$$\delta H = H(p, q) - H(q) + \sum_{i=1}^{N} \sum_{n=1}^{\infty} \frac{(-1)^{n+1} \delta q_i^{n+1}}{(n+1) \ln 2 \, q_i^{n}} - \sum_{i=1}^{N} \sum_{n=1}^{\infty} \frac{(-1)^{n+1} \delta q_i^{n+1}}{n \ln 2 \, q_i^{n}} = H(p, q) - H(q) + \sum_{i=1}^{N} \sum_{n=1}^{\infty} \frac{(-1)^{n+1} \delta q_i^{n+1}}{\ln 2 \, q_i^{n}} \left( \frac{1}{n+1} - \frac{1}{n} \right) = H(p, q) - H(q) - \sum_{i=1}^{N} \sum_{n=1}^{\infty} \frac{(-1)^{n+1} \delta q_i^{n+1}}{q_i^{n} \, n (n+1) \ln 2}.$$
Another shift of the summation index leads to Equation (18), which completes the proof of the Lemma. □

References

  1. Shannon, C.E. A Mathematical Theory of Communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
  2. Ribeiro, M.; Henriques, T.; Castro, L.; Souto, A.; Antunes, L.; Costa-Santos, C.; Teixeira, A. The Entropy Universe. Entropy 2021, 23, 222. [Google Scholar] [CrossRef] [PubMed]
  3. Gray, R.M. Entropy and Information Theory; Springer: Boston, MA, USA, 2011. [Google Scholar] [CrossRef]
  4. Delgado-Bonal, A.; Marshak, A. Approximate Entropy and Sample Entropy: A Comprehensive Tutorial. Entropy 2019, 21, 541. [Google Scholar] [CrossRef]
  5. Shen, J.L.; Hung, J.W.; Lee, L.S. Robust entropy-based endpoint detection for speech recognition in noisy environments. In Proceedings of the 5th International Conference on Spoken Language Processing (ICSLP 1998), Sydney, Australia, 30 November–4 December 1998. [Google Scholar]
  6. Shiryaev, A.N.; Spokoiny, V.G. Statistical Experiments and Decisions; WORLD SCIENTIFIC: Singapore, 2000. [Google Scholar] [CrossRef]
  7. Johnson, P.; Moriarty, J.; Peskir, G. Detecting changes in real-time data: A user’s guide to optimal detection. Philos. Trans. R. Soc. A Math. Phys. Eng. Sci. 2017, 375, 16. [Google Scholar] [CrossRef]
  8. Ship detection using Neyman-Pearson criterion in marine environment. Ocean Eng. 2017, 143, 106–112. [CrossRef]
  9. Mehrotra, K.G.; Mohan, C.K.; Huang, H. Anomaly Detection Principles and Algorithms; Terrorism, Security, and Computation; Springer International Publishing: Cham, Switzerland, 2017. [Google Scholar] [CrossRef]
  10. Howedi, A.; Lotfi, A.; Pourabdollah, A. An Entropy-Based Approach for Anomaly Detection in Activities of Daily Living in the Presence of a Visitor. Entropy 2020, 22, 845. [Google Scholar] [CrossRef] [PubMed]
  11. Bereziński, P.; Jasiul, B.; Szpyrka, M. An Entropy-Based Network Anomaly Detection Method. Entropy 2015, 17, 2367–2408. [Google Scholar] [CrossRef]
  12. Horie, T.; Burioka, N.; Amisaki, T.; Shimizu, E. Sample Entropy in Electrocardiogram During Atrial Fibrillation. Yonago Acta Medica 2018, 61, 049–057. [Google Scholar] [CrossRef] [PubMed]
  13. Ramirez, J.; Segura, J.; Benitez, C.; de la Torre, A.; Rubio, A. A new Kullback-Leibler VAD for speech recognition in noise. IEEE Signal Process. Lett. 2004, 11, 266–269. [Google Scholar] [CrossRef]
  14. Wu, B.F.; Wang, K.C. Robust Endpoint Detection Algorithm Based on the Adaptive Band-Partitioning Spectral Entropy in Adverse Environments. Speech Audio Process. IEEE Trans. 2005, 13, 762–775. [Google Scholar] [CrossRef]
  15. Weaver, K.; Waheed, K.; Salem, F. An entropy based robust speech boundary detection algorithm for realistic noisy environments. In Proceedings of the International Joint Conference on Neural Networks, Portland, OR, USA, 20–24 July 2003; Volume 1, pp. 680–685. [Google Scholar] [CrossRef]
  16. López-Ruiz, R. Shannon information, LMC complexity and Rényi entropies: A straightforward approach. Biophys. Chem. 2005, 115, 215–218. [Google Scholar] [CrossRef] [PubMed]
  17. Catalán, R.G.; Garay, J.; López-Ruiz, R. Features of the extension of a statistical measure of complexity to continuous systems. Phys. Rev. E 2002, 66, 011102. [Google Scholar] [CrossRef] [PubMed]
  18. Calbet, X.; López-Ruiz, R. Tendency towards maximum complexity in a nonequilibrium isolated system. Phys. Rev. E 2001, 63, 066116. [Google Scholar] [CrossRef] [PubMed]
  19. Rosso, O.; Larrondo, H.; Martin, M.; Plastino, A.; Fuentes, M. Distinguishing Noise from Chaos. Phys. Rev. Lett. 2007, 99, 154102. [Google Scholar] [CrossRef] [PubMed]
  20. Lamberti, P.; Martin, M.; Plastino, A.; Rosso, O. Intensive entropic non-triviality measure. Phys. A Stat. Mech. Its Appl. 2004, 334, 119–131. [Google Scholar] [CrossRef]
  21. Zunino, L.; Soriano, M.C.; Rosso, O.A. Distinguishing chaotic and stochastic dynamics from time series by using a multiscale symbolic approach. Phys. Rev. E 2012, 86, 046210. [Google Scholar] [CrossRef] [PubMed]
  22. Li, Z.; Li, Y.; Zhang, K. A Feature Extraction Method of Ship-Radiated Noise Based on Fluctuation-Based Dispersion Entropy and Intrinsic Time-Scale Decomposition. Entropy 2019, 21, 693. [Google Scholar] [CrossRef] [PubMed]
  23. Dai, Y.; Zhang, H.; Mao, X.; Shang, P. Complexity–entropy causality plane based on power spectral entropy for complex time series. Phys. A: Stat. Mech. Its Appl. 2018, 509, 501–514. [Google Scholar] [CrossRef]
  24. Quazi, A.H. Method for Detecting Acoustic Signals from an Underwater Source. U.S. Patent US5668778A, 16 September 1997. [Google Scholar]
  25. Chang, C.-I. An information-theoretic approach to spectral variability, similarity, and discrimination for hyperspectral image analysis. IEEE Trans. Inf. Theory 2000, 46, 1927–1932. [Google Scholar] [CrossRef]
  26. Sason, I. On f-Divergences: Integral Representations, Local Behavior, and Inequalities. Entropy 2018, 20, 383. [Google Scholar] [CrossRef] [PubMed]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
