Complexity in animal communication: Estimating the size of N-Gram structures

In this paper, new techniques that use conditional entropy to estimate the combinatorics of symbols are applied to animal communication studies in order to estimate a communication's repertoire size. By using conditional entropy estimates at multiple orders, the paper estimates the total repertoire sizes for animal communication across bottlenose dolphins, humpback whales, and several species of birds for N-grams of length one to three. In addition to discussing the impact of this method on studies of animal communication complexity, the reliability of these estimates is compared to that of other methods through simulation. While entropy does undercount the total repertoire size due to rare N-grams, it gives a more accurate picture of the most frequently used repertoire than a raw repertoire count alone.


Introduction
The complexity of animal communication is a topic frequently discussed but difficult to resolve. While it is beyond dispute that many species communicate, even the basic purposes of these communications, whether to convey information or simply to influence the behavior of others, remain debated. Communication complexity has been measured with methods such as entropy rate and Lempel-Ziv complexity [15]. In this paper, we will focus on methods using conditional entropy. Measuring animal communication in terms of entropy in bits, these studies have attempted to look at animal communication structure at various lengths (N-grams) in order to determine the structure of the communications and whether the tools of information theory can lend themselves to a better understanding of animal behavior and, possibly, of what types of information can be communicated.
The conditional entropy of order N is defined as

H_N = -\sum_{i,j} p(b_i, j) \log_2 p_{b_i}(j)

where p(b_i, j) is the joint probability of the (N-1)-gram b_i followed by the symbol j, and p_{b_i}(j) is the conditional probability of j given b_i. Analyses of a large group of these estimators are given in [17,18].
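To make the estimator concrete, conditional entropy of this form can be computed directly from an observed symbol sequence. A minimal sketch in Python (the function name and toy sequence are ours, not from the studies surveyed):

```python
from collections import Counter
from math import log2

def conditional_entropy(seq, order):
    """Estimate the order-`order` conditional entropy, in bits, of a symbol
    sequence: H(next symbol | preceding (order-1)-gram). order=1 gives the
    plain first-order entropy."""
    if order == 1:
        counts = Counter(seq)
        n = len(seq)
        return -sum(c / n * log2(c / n) for c in counts.values())
    # Count order-grams and their (order-1)-gram prefixes over the same windows.
    joint = Counter(tuple(seq[i:i + order]) for i in range(len(seq) - order + 1))
    prefix = Counter(tuple(seq[i:i + order - 1]) for i in range(len(seq) - order + 1))
    total = sum(joint.values())
    h = 0.0
    for gram, c in joint.items():
        h -= (c / total) * log2(c / prefix[gram[:-1]])
    return h

# A strictly alternating sequence: one bit per symbol, but fully
# predictable once the previous symbol is known.
seq = list("abababababababab")
print(round(conditional_entropy(seq, 1), 3))  # → 1.0
print(round(conditional_entropy(seq, 2), 3))  # → 0.0
```

Note that the plug-in estimates produced this way are biased low for small samples, which is exactly the bias issue the corrections discussed later address.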
Soon after human languages, animal communications of varying types were studied using entropy. With large symbol alphabets, however, a decrease in the information graph could indicate inadequate sample sizes at larger orders rather than the fundamental order of the underlying Markov process.

With these caveats, the information graphs will still be shown as an illustration of the results of the studies on each animal communication, and they should be used with caution to establish the complexity of sequences.

In general, the larger the order of dependence, the more "complex" the communication is deemed.

For example, many bird call sequences seem to show first-order dependence, though this is uncertain, since a sample size of several multiples of the number of symbols squared is needed to confirm it (Figure 1).
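The sample-size requirement above is easy to make concrete. A small sketch, with the multiplying factor chosen purely for illustration rather than taken from the paper:

```python
def min_sample_size(m, multiple=5):
    """Heuristic from the text: confirming first-order dependence needs a
    sample that is a multiple of the squared symbol count (M**2). The
    `multiple` factor is an illustrative choice, not from the paper."""
    return multiple * m ** 2

# A 20-symbol repertoire already calls for thousands of observed bigrams:
print(min_sample_size(20))  # → 2000
```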

Figure 1. Minimum bias-corrected conditional entropy by conditional entropy order.

Where little information is available, we can estimate upper and lower bounds for the entropy bias. This will be described following the section on combinatorics.

Combinatorics of Information Theory and Repertoire Size
One of the lesser-known, but extremely useful, facets of information theory is the way entropy can be used for combinatorics. In particular, the number of combinations of a symbol set can be more accurately estimated using the first-order entropy than with an assumption of uniform likelihood. For large N, the following estimate of the number of typical N-grams is accurate:

W_N = M^{NH}

Here H is the Shannon (first-order) entropy using a logarithm of base M, where M is the number of distinct symbols. This assumes that each symbol appears independently; any deviation from that assumption is an additional element of error in this analysis.
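Under this assumption, the entropy-based count of typical N-grams can be computed directly from observed symbol frequencies. A minimal Python sketch (the function name and toy data are ours):

```python
from collections import Counter
from math import log

def typical_ngram_count(seq, n):
    """Entropy-based estimate W_N = M**(N*H) of the number of typical
    N-grams, with H the first-order entropy in base-M logarithms."""
    counts = Counter(seq)
    m = len(counts)        # alphabet size M
    total = len(seq)
    h = -sum(c / total * log(c / total, m) for c in counts.values())
    return m ** (n * h)

# A heavily skewed 4-symbol alphabet: far fewer typical trigrams than
# the naive upper bound M**3 = 64.
seq = list("aaaaaaaaaaaaaaaabcd")  # 16 a's and one each of b, c, d
print(round(typical_ngram_count(seq, 3)))  # → 6
```

The skewed example illustrates the point in the text: counting all M^N combinations badly overestimates how many N-grams are actually in typical use.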

Since H is the first-order entropy, this Shannon-Weaver model assumes that each symbol has an i.i.d. probability of appearing in each position of the N-gram. If there is any correlation between symbols, the larger N becomes, the more likely W_N is to be inaccurate. In this model there is no co-dependence between symbols, no symbol being more or less likely to follow another; the base assumption is that the symbols are independently distributed.

In the next section, we will investigate the complexity of communication in several species, including bottlenose dolphins and humpback whales. In this paper we use entropy combinatorial techniques to estimate the N-gram repertoires of six species. For our analysis, we use the results from the low-noise data set.

Skylarks were recorded in France and Poland to test the hypothesis that habitat change, marked in France but not in Poland, is having a significant effect on the call patterns of Alauda arvensis L. While songs were more widely shared amongst different birds in the restricted habitat near Paris, song complexity was almost identical in both locations.

For this paper, we use the data from the continuous habitat in Poland.

Here we use the data from these papers to reproduce the information graphs for each species. First, we represent the minimum bias-corrected conditional entropies as information graphs from order 0 (log M, with M the number of individual symbols) to the third order. Only the humpback whale data stops at the second order, due to a lack of data on the third-order entropy.

In analyzing the data from the species and estimating repertoires, it is essential to define sample sizes and correct for bias. In Table 1, the basic data from the papers is shown. One key issue is that the sample size at each order was not always available, but for dolphins, humpback whales, and starlings, this methodology was used to calculate S_2 and S_3.
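As one concrete example of bias correction, the Miller-Madow adjustment adds (K-1)/(2n ln 2) bits to the plug-in entropy estimate, where K is the number of observed symbols and n the sample size. This is a standard textbook correction and not necessarily the estimator used in the source papers:

```python
from collections import Counter
from math import log2, log

def miller_madow_entropy(seq):
    """Plug-in entropy (bits) plus the Miller-Madow small-sample bias
    correction (K-1)/(2*n*ln 2). One standard correction; the surveyed
    papers may use a different bias estimator."""
    counts = Counter(seq)
    n = len(seq)
    h_plugin = -sum(c / n * log2(c / n) for c in counts.values())
    return h_plugin + (len(counts) - 1) / (2 * n * log(2))

# The plug-in estimate is biased low on small samples; the correction
# nudges the estimate upward.
sample = list("aabbccdd")
print(round(miller_madow_entropy(sample), 3))  # → 2.271
```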

Where it could not be calculated directly, the bias was set to the maximum possible value: that which would make the conditional entropy at this order (usually the third order) equal to that of the second order.
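Repertoire sizes can then be derived from the corrected conditional entropies via the chain rule: the joint entropy of an N-gram is the sum of the conditional entropies of orders 1 through N. A sketch with hypothetical entropy values (our reading of the derivation; the numbers are invented for illustration, not taken from the tables):

```python
def repertoire_size(cond_entropies_bits, n):
    """Typical N-gram repertoire from the chain rule: the joint entropy of
    an N-gram is H_1 + H_2 + ... + H_N, so W_N = 2**(sum) when the
    conditional entropies are in bits."""
    return 2 ** sum(cond_entropies_bits[:n])

# Hypothetical bias-corrected conditional entropies (bits) for orders 1-3:
h = [3.2, 2.1, 1.5]
print(round(repertoire_size(h, 2)))  # → 39: typical bigram repertoire
print(round(repertoire_size(h, 3)))  # → 111: typical trigram repertoire
```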

In Tables 3 and 4, the final estimates for the bias-corrected conditional entropies and the derived repertoire sizes are given.

Table 3. The corrected conditional entropies, minimum and maximum, calculated for orders 1-3 according to the paper data and the values in Tables 1 and 2. Values with asterisks indicate where the maximum bias correction would have exceeded the previous-order entropy; in these cases, the maximum bias is limited to the bias-corrected previous-order entropy.

From these tables, especially Table 4