# Entropy Power, Autoregressive Models, and Mutual Information

## Abstract


## 1. Introduction

## 2. Entropy Power/Entropy Rate Power

Let **X** be a stationary continuous-valued random process with samples ${X}^{n} = [{X}_{i}, i = 1, 2, \dots, n]$; then the differential entropy rate of the process **X** is [19]
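The entropy power (or entropy rate power) associated with a differential entropy rate $h$ in nats is $Q = e^{2h}/(2\pi e)$, and for a Gaussian source it equals the variance, since the Gaussian maximizes differential entropy for a given variance. A minimal sketch of that relationship (function names are illustrative, not from the paper):

```python
import math

def gaussian_diff_entropy(var):
    """Differential entropy (nats) of a Gaussian with variance var."""
    return 0.5 * math.log(2.0 * math.pi * math.e * var)

def entropy_power(h):
    """Entropy power Q = exp(2h) / (2*pi*e), for h in nats."""
    return math.exp(2.0 * h) / (2.0 * math.pi * math.e)

# For a Gaussian source the entropy power equals the variance;
# for any other distribution with the same variance, Q is smaller.
```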

## 3. Autoregressive Models

#### 3.1. The Power Spectral Density

#### 3.2. The Levinson-Durbin Recursion

where $\mathbf{R}$ is an $m \times m$ Toeplitz matrix of the autocorrelation terms in Equation (17), $\mathbf{A}={[{a}_{1},{a}_{2},\dots ,{a}_{m}]}^{T}$, and $\mathbf{C}$ is a column vector of the autocorrelation terms $R\left(j\right), j=1,2,\dots ,m$.
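The Levinson-Durbin recursion exploits the Toeplitz structure of $\mathbf{R}$ to solve these normal equations in $O(m^2)$ operations instead of the $O(m^3)$ of general elimination, producing the predictor coefficients and the minimum mean squared prediction error (MMSPE) at every intermediate order. A minimal sketch of the standard recursion, assuming the sign convention $A(z) = 1 + a_1 z^{-1} + \cdots + a_m z^{-m}$ (the convention in the paper's Equation (18) may differ):

```python
def levinson_durbin(r, m):
    """Levinson-Durbin recursion on autocorrelation lags r[0..m].

    Returns (a, e): a[0] = 1 and a[1..m] are the coefficients of the
    prediction-error filter A(z) = 1 + a1 z^-1 + ... + am z^-m,
    and e is the order-m minimum mean squared prediction error.
    """
    a = [1.0]
    e = r[0]
    for i in range(1, m + 1):
        # reflection (PARCOR) coefficient for order i
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / e
        # order update: a_j <- a_j + k * a_{i-j}, with new tap a_i = k
        a = [1.0] + [a[j] + k * a[i - j] for j in range(1, i)] + [k]
        e *= (1.0 - k * k)
    return a, e
```

For an AR(1)-shaped autocorrelation $R(k) = 0.9^{|k|}$, the recursion yields a single nonzero tap $a_1 = -0.9$ and MMSPE $1 - 0.9^2 = 0.19$, with higher-order taps vanishing.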

## 4. Minimum MSPE and AR Models

## 5. Log Ratio of Entropy Powers

#### 5.1. Gaussian Distributions

#### 5.2. Laplacian Distributions

#### 5.3. Increasing Predictor Order

#### 5.4. Maximum Entropy Spectral Estimate

#### 5.5. Orthogonal Decompositions and Whitened Prediction Errors

## 6. Experimental Examples

## 7. Application to Speech Coding

#### 7.1. Speech Waveform Coding

#### 7.2. Code-Excited Linear Prediction

## 8. Other Possible Applications

#### 8.1. ECG Classification

#### 8.2. EEG Classification

#### 8.3. Geophysical Exploration

## 9. Conclusions

## Funding

## Acknowledgments

## Conflicts of Interest

## Abbreviations

| Abbreviation | Meaning |
|---|---|
| ADPCM | Adaptive differential pulse code modulation |
| AMR | Adaptive multirate |
| AR | Autoregressive |
| CELP | Code-excited linear prediction |
| DPCM | Differential pulse code modulation |
| ECG | Electrocardiogram |
| EEG | Electroencephalogram |
| EVS | Enhanced voice services |
| MSPE | Mean squared prediction error |
| MMSPE | Minimum mean squared prediction error |
| MMSE | Minimum mean squared error |
| PCM | Pulse code modulation |
| SNR | Signal-to-noise ratio |
| SPER | Signal-to-prediction-error ratio |
| VoIP | Voice over Internet Protocol |

## References

- Box, G.; Jenkins, G. Time Series Analysis: Forecasting and Control; Holden-Day: San Francisco, CA, USA, 1970.
- Grenander, U.; Rosenblatt, M. Statistical Analysis of Stationary Time Series; John Wiley & Sons: New York, NY, USA, 1957.
- Koopmans, L.H. The Spectral Analysis of Time Series; Academic Press: New York, NY, USA, 1995.
- Makhoul, J. Linear prediction: A tutorial review. Proc. IEEE **1975**, 63, 561–580.
- Markel, J.D.; Gray, A.H. Linear Prediction of Speech; Springer: Berlin, Germany, 1976; Volume 12.
- Rabiner, L.R.; Schafer, R.W. Digital Processing of Speech Signals; Pearson: Englewood Cliffs, NJ, USA, 2011.
- Robinson, E.A.; Treitel, S. Geophysical Signal Analysis; Prentice-Hall: Upper Saddle River, NJ, USA, 1980.
- Robinson, E.A.; Durrani, T.S. Geophysical Signal Processing; Prentice-Hall International: London, UK, 1986.
- Vuksanovic, B.; Alhamdi, M. AR-based method for ECG classification and patient recognition. Int. J. Biom. Bioinform. **2013**, 7, 74–92.
- Gersch, W.; Martinelli, F.; Yonemoto, J.; Low, M.; McEwen, J. A Kullback Leibler-nearest neighbor rule classification of EEGs: The EEG population screening problem, an anesthesia level EEG classification application. Comput. Biomed. Res. **1980**, 13, 283–296.
- Gersch, W.; Martinelli, F.; Yonemoto, J.; Low, M.; McEwen, J. Kullback Leibler nearest neighbor rule classification of EEGs. Comput. Biomed. Res. **1980**, 16.
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. **1948**, 27, 379–423.
- Gallager, R.G. Information Theory and Reliable Communication; Wiley: New York, NY, USA, 1968.
- Gibson, J.D.; Mahadevan, P. Log likelihood spectral distance, entropy rate power, and mutual information with applications to speech coding. Entropy **2017**, 19, 496.
- Gray, R. Information rates of autoregressive processes. IEEE Trans. Inf. Theory **1970**, 16, 412–421.
- Gibson, J.D.; Berger, T.; Lookabaugh, T.; Lindbergh, D.; Baker, R.L. Digital Compression for Multimedia: Principles and Standards; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1998.
- Cover, T.M.; Thomas, J.A. Elements of Information Theory; Wiley-Interscience: Hoboken, NJ, USA, 2006.
- Shynk, J.J. Probability, Random Variables, and Random Processes: Theory and Signal Processing Applications; John Wiley & Sons: Hoboken, NJ, USA, 2012.
- El Gamal, A.; Kim, Y.H. Network Information Theory; Cambridge University Press: Cambridge, UK, 2011.
- Kolmogorov, A.N. Theory of the Transmission of Information. In Proceedings of the Session on Scientific Problems of Automation; 1956.
- Pinsker, M.S. Information and Information Stability of Random Variables and Processes; Holden-Day: San Francisco, CA, USA, 1964.
- Berger, T. Rate Distortion Theory; Prentice-Hall: Englewood Cliffs, NJ, USA, 1971.
- Berger, T.; Gibson, J.D. Lossy source coding. IEEE Trans. Inf. Theory **1998**, 44, 2693–2723.
- Chu, W.C. Speech Coding Algorithms; Wiley: Hoboken, NJ, USA, 2003.
- Gray, R.M. Toeplitz and circulant matrices: A review. Found. Trends Commun. Inf. Theory **2006**, 2, 155–239.
- Gray, R.M.; Hashimoto, T. A note on rate-distortion functions for nonstationary Gaussian autoregressive processes. IEEE Trans. Inf. Theory **2008**, 54, 1319–1322.
- Gray, R.M. Linear Predictive Coding and the Internet Protocol; Now Publishers: Hanover, MA, USA, 2010.
- Durbin, J. The fitting of time series models. Rev. Inst. Int. Stat. **1960**, 28, 233–243.
- Choi, B.; Cover, T.M. An information-theoretic proof of Burg's maximum entropy spectrum. Proc. IEEE **1984**, 72, 1094–1096.
- Atal, B.S.; Hanauer, S.L. Speech analysis and synthesis by linear prediction of the speech wave. J. Acoust. Soc. Am. **1971**, 50, 637–655.
- Silverman, B.W. Density Estimation for Statistics and Data Analysis; Routledge: Abingdon-on-Thames, UK, 2018.
- Darbellay, G.A.; Vajda, I. Estimation of the information by an adaptive partitioning of the observation space. IEEE Trans. Inf. Theory **1999**, 45, 1315–1321.
- Hudson, J.E. Signal processing using mutual information. IEEE Signal Process. Mag. **2006**, 23, 50–54.
- Grenander, U.; Szegő, G. Toeplitz Forms and Their Applications; University of California Press: Oakland, CA, USA, 1958.
- Sayood, K. Introduction to Data Compression; Morgan Kaufmann: Waltham, MA, USA, 2017.
- Gibson, J. Speech compression. Information **2016**, 7, 32.
- 3GPP. Speech Codec Speech Processing Functions; Adaptive Multi-Rate-Wideband (AMR-WB) Speech Codec; Transcoding Functions; TS 26.190; 3rd Generation Partnership Project (3GPP), 2011.
- Dietz, M.; Multrus, M.; Eksler, V.; Malenovsky, V.; Norvell, E.; Pobloth, H.; Miao, L.; Wang, Z.; Laaksonen, L.; Vasilache, A.; et al. Overview of the EVS codec architecture. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, Australia, 19–24 April 2015; pp. 5698–5702.
- Cummiskey, P.; Jayant, N.; Flanagan, J. Adaptive quantization in differential PCM coding of speech. Bell Labs Tech. J. **1973**, 52, 1105–1118.
- Gibson, J. Speech coding for wireless communications. In Mobile Communications Handbook, 3rd ed.; CRC Press: Boca Raton, FL, USA, 2012; pp. 539–557.
- Robinson, E.A. Predictive Decomposition of Time Series with Applications to Seismic Exploration. Ph.D. Thesis, MIT, Cambridge, MA, USA, 1954.
- Robinson, E.A. Predictive decomposition of time series with application to seismic exploration. Geophysics **1967**, 32, 418–484.

**Table 1.** Change in Mutual Information from Equation (34) as the Predictor Order is Increased: Frame 45, SPER = 16 dB.

| $N$ | $MMSPE(X, X_N)$ | $I(X; X_N) - I(X; X_{N-1})$ |
|---|---|---|
| 0 | 1.0 | 0 bits/letter |
| 0–1 | 0.3111 | 0.842 bits/letter |
| 1–2 | 0.0667 | 1.11 bits/letter |
| 2–3 | 0.0587 | 0.092 bits/letter |
| 3–4 | 0.0385 | 0.304 bits/letter |
| 4–5 | 0.0375 | 0.019 bits/letter |
| 5–6 | 0.0342 | 0.065 bits/letter |
| 6–7 | 0.0308 | 0.069 bits/letter |
| 7–8 | 0.0308 | 0.0 bits/letter |
| 8–9 | 0.0261 | 0.12 bits/letter |
| 9–10 | 0.0251 | 0.026 bits/letter |
| 0–10 Total | 0.0251 | 2.647 bits/letter |
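The tabulated increments in Tables 1–3 are numerically consistent with a log ratio of successive MMSPE values, $\Delta I = \frac{1}{2}\log_2(\mathrm{MMSPE}_{N-1}/\mathrm{MMSPE}_N)$ bits/letter. Equation (34) itself is not reproduced in this extract, so the sketch below assumes that form:

```python
import math

def mi_increment(mmspe_prev, mmspe_new):
    """Mutual information gained by raising the predictor order by one,
    assuming Delta I = 0.5 * log2(MMSPE_{N-1} / MMSPE_N) bits/letter."""
    return 0.5 * math.log2(mmspe_prev / mmspe_new)

# Frame 45 (Table 1): the 1 -> 2 step yields the largest gain,
# mi_increment(0.3111, 0.0667), about 1.11 bits/letter, and gains
# beyond order 2 are comparatively small.
```

Successive increments telescope, so the 0–10 total is simply $\frac{1}{2}\log_2(\mathrm{MMSPE}_0/\mathrm{MMSPE}_{10})$, matching the last row of each table to within rounding.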

**Table 2.** Change in Mutual Information from Equation (34) as the Predictor Order is Increased: Frame 23, SPER = 11.85 dB.

| $N$ | $MMSPE(X, X_N)$ | $I(X; X_N) - I(X; X_{N-1})$ |
|---|---|---|
| 0 | 1.0 | 0 bits/letter |
| 0–1 | 0.2577 | 0.978 bits/letter |
| 1–2 | 0.1615 | 0.339 bits/letter |
| 2–3 | 0.1611 | 0.0 bits/letter |
| 3–4 | 0.1179 | 0.225 bits/letter |
| 4–5 | 0.1118 | 0.042 bits/letter |
| 5–6 | 0.0962 | 0.104 bits/letter |
| 6–7 | 0.0704 | 0.226 bits/letter |
| 7–8 | 0.0653 | 0.054 bits/letter |
| 8–9 | 0.0653 | 0.0 bits/letter |
| 9–10 | 0.0652 | 0.0 bits/letter |
| 0–10 Total | 0.0652 | 1.968 bits/letter |

**Table 3.** Change in Mutual Information from Equation (34) as the Predictor Order is Increased: Frame 3314, SPER = 7.74 dB.

| $N$ | $MMSPE(X, X_N)$ | $I(X; X_N) - I(X; X_{N-1})$ |
|---|---|---|
| 0 | 1.0 | 0 bits/letter |
| 0–1 | 0.6932 | 0.265 bits/letter |
| 1–2 | 0.4918 | 0.25 bits/letter |
| 2–3 | 0.4782 | 0.02 bits/letter |
| 3–4 | 0.3554 | 0.215 bits/letter |
| 4–5 | 0.3474 | 0.0164 bits/letter |
| 5–6 | 0.2109 | 0.36 bits/letter |
| 6–7 | 0.2065 | 0.015 bits/letter |
| 7–8 | 0.1788 | 0.104 bits/letter |
| 8–9 | 0.1682 | 0.044 bits/letter |
| 9–10 | 0.1680 | 0.0 bits/letter |
| 0–10 Total | 0.1680 | 1.29 bits/letter |

© 2018 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Gibson, J.
Entropy Power, Autoregressive Models, and Mutual Information. *Entropy* **2018**, *20*, 750.
https://doi.org/10.3390/e20100750
