# A Review of Shannon and Differential Entropy Rate Estimation

## Abstract

## 1. Introduction

## 2. Entropy and Entropy Rate

**Definition 1.**

**Definition 2.**

**Definition 3.**

**Definition 4.**

**Definition 5.**

**Definition 6.**

**Theorem 1.**

## 3. Parametric Approaches

#### 3.1. Gaussian Processes

**Definition 7.**

**Σ** is the covariance matrix.

#### 3.1.1. Maximum Entropy Spectral Estimation

#### 3.1.2. Maximum Likelihood Spectral Estimation

#### 3.1.3. Non-Parametric Spectral Density Estimation

#### 3.2. Markov Processes

#### 3.2.1. Markov Chains

#### 3.2.2. Hidden Markov Models

#### 3.2.3. Other Markov Processes

#### 3.3. Renewal/Point Processes

## 4. Non-Parametric Approaches

#### 4.1. Discrete-Valued, Discrete-Time Entropy Rate Estimation

#### 4.2. Continuous-Valued, Discrete-Time Entropy Rate Estimation

#### 4.2.1. Approximate Entropy

#### 4.2.2. Sample Entropy

#### 4.2.3. Permutation Entropy

#### 4.2.4. Specific Entropy

## 5. Conclusions

**Table 1.** Comparison of entropy rate estimation techniques, categorised by whether the modelling estimate and the entropy rate estimate are parametric or non-parametric. The modelling estimate refers to the quantity that is estimated in the technique, and the entropy rate estimate refers to the full entropy rate expression used. For example, when estimating the entropy rate of a Markov chain using plug-in estimation, the modelling estimates of the transition probabilities, ${p}_{ij}$, and the stationary distribution, ${\pi}_{j}$, may be non-parametric, whereas the entropy rate estimator is a parametric estimator for the Markov model. There are no estimators that combine a non-parametric entropy rate estimate with a parametric modelling estimate, because non-parametric entropy rate estimators do not use a model.

Entropy Rate Estimate | Modelling Estimate: Parametric | Modelling Estimate: Non-Parametric |
---|---|---|
Parametric | [7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26] | [27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43] |
Non-Parametric | N/A | [44,45,46,47,48,49,50,51,52,53,54] |
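The Markov chain example in the caption of Table 1 can be illustrated with a minimal sketch. It assumes a first-order chain and the standard entropy rate expression $H = -\sum_{i} {\pi}_{i} \sum_{j} {p}_{ij} \log {p}_{ij}$. The function name `plugin_markov_entropy_rate` is hypothetical, and using the empirical occupation frequencies as a stand-in for the stationary distribution is an assumption made here for simplicity, not a method taken from the reviewed literature.

```python
import math
from collections import Counter


def plugin_markov_entropy_rate(sequence, base=2.0):
    """Plug-in entropy rate estimate for a first-order Markov chain.

    Hypothetical helper: the transition probabilities p_ij are estimated
    by empirical transition frequencies, and the stationary distribution
    pi_i is approximated (an assumption) by the empirical frequency with
    which state i appears as the source of a transition.
    """
    if len(sequence) < 2:
        raise ValueError("need at least two observations")

    # Count observed transitions i -> j and visits to each source state i.
    pair_counts = Counter(zip(sequence[:-1], sequence[1:]))
    state_counts = Counter(sequence[:-1])
    total = len(sequence) - 1

    rate = 0.0
    for (i, _j), n_ij in pair_counts.items():
        p_ij = n_ij / state_counts[i]   # non-parametric estimate of p_ij
        pi_i = state_counts[i] / total  # non-parametric estimate of pi_i
        # Substitute both estimates into the parametric Markov expression.
        rate -= pi_i * p_ij * math.log(p_ij, base)
    return rate


if __name__ == "__main__":
    # A strictly alternating sequence is fully predictable.
    print(plugin_markov_entropy_rate([0, 1] * 50))
```

The two modelling estimates are obtained non-parametrically from counts, while the final expression is specific to the Markov model, which is the sense in which the caption describes the entropy rate estimator as parametric.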

**Table 2.** Comparison of entropy rate estimation techniques, partitioned into four categories based on whether they operate in discrete or continuous time and whether they work on discrete-valued or continuous-valued data.

