# Using Data-Compressors for Classification Hunting Behavioral Sequences in Rodents as “Ethological Texts”

## 1. Introduction

## 2. The Suggested Method

The null hypothesis is $H_0$ = {the behavioral sequences are generated by a single source}, and the alternative hypothesis is $H_1$ = {the behavioral sequences are generated by different sources}. We stored the sequences of symbols (each symbol corresponding to a performed behavioral element) in text files (.txt), say X, Y, and Z. All species were compared with each other in pairs. Our task is to determine how close these sources are to each other. To do this, we first divide each source text file approximately in half. Suppose we are dealing with three sources. We denote the first halves by X*, Y*, and Z*. We divide the second halves into fragments of equal size, for example 120 bytes, and designate them $x_1, x_2, \ldots, x_n$; $y_1, y_2, \ldots, y_n$; and $z_1, z_2, \ldots, z_n$. In our example, let $n = 9$, so that there are 27 such sample files. We then append each fragment ($x_i$, $y_i$, $z_i$) individually to each of the first halves (X*, Y*, and Z*), obtaining 81 augmented text files ($X^{*}x_i$, $X^{*}y_i$, $X^{*}z_i$, $Y^{*}x_i$, $Y^{*}y_i$, $Y^{*}z_i$, etc.). All files obtained, including the first halves X*, Y*, and Z*, are archived separately. Each pair (X, Y), (X, Z), and (Y, Z) is then examined separately, and the association coefficient is determined for each pair.

Let us consider the pair (X, Y) as an example. For every augmented file, we compute the difference Δ between the size of its archive and the size of the archive of the corresponding first half: $\Delta(X^{*}y_i) = \phi(X^{*}y_i) - \phi(X^{*})$, where $\phi(\cdot)$ denotes the size of the archived file. For example, $\phi(X^{*}y_1) - \phi(X^{*}) = 59$ bytes and $\phi(Y^{*}y_1) - \phi(Y^{*}) = 41$ bytes; $\phi(X^{*}y_2) - \phi(X^{*}) = 69$ bytes and $\phi(Y^{*}y_2) - \phi(Y^{*}) = 46$ bytes; $\phi(X^{*}y_3) - \phi(X^{*}) = 71$ bytes and $\phi(Y^{*}y_3) - \phi(Y^{*}) = 38$ bytes; and so on. For each fragment, we then record which first half yields the smaller difference, and count these cases. Suppose that $\Delta(X^{*}y_i) > \Delta(Y^{*}y_i)$ in all nine cases, that $\Delta(X^{*}x_i) < \Delta(Y^{*}x_i)$ in one case, and that $\Delta(Y^{*}x_i) < \Delta(X^{*}x_i)$ in the remaining eight cases. These counts are placed in the corresponding cells of the 2 × 2 table (see also Figure A1 in Appendix A). For our example, comparing the sources X and Y, the resulting matrix is given in Table 1.
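
The counting step can be illustrated with a short Python sketch. It uses only the three pairs of differences quoted above for the fragments $y_1$, $y_2$, $y_3$; the function name `count_smaller` is hypothetical, and leaving ties uncounted is an assumption, since the text does not say how ties are handled.

```python
def count_smaller(delta_on_x_star, delta_on_y_star):
    """Count, fragment by fragment, which first half gives the smaller size increase.

    Returns (wins_for_X_star, wins_for_Y_star). Ties are not counted
    (an assumption; the text does not specify tie handling).
    """
    pairs = list(zip(delta_on_x_star, delta_on_y_star))
    x_wins = sum(1 for dx, dy in pairs if dx < dy)
    y_wins = sum(1 for dx, dy in pairs if dy < dx)
    return x_wins, y_wins


# Differences (in bytes) quoted in the text for fragments y_1, y_2, y_3.
delta_Xy = [59, 69, 71]  # Delta(X* y_i)
delta_Yy = [41, 46, 38]  # Delta(Y* y_i)

print(count_smaller(delta_Xy, delta_Yy))  # (0, 3)
```

Applied to all nine y-fragments and all nine x-fragments of the example, the two resulting pairs of counts fill the y and x columns of the 2 × 2 table, respectively.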

## 3. The Procedure

#### 3.1. Notions and Data Encoding

#### 3.2. Constructing Sequences for Hypothesis Testing

## 4. Results

## 5. Discussion and Conclusions

## Supplementary Materials

## Author Contributions

## Funding

## Acknowledgments

## Conflicts of Interest

## Appendix A

#### Animals and Housing

#### Experimental Scheme

| Symbol | Behavioural Element |
|---|---|
| Q | Running |
| S | Walking |
| W | Bite |
| E | Capturing the prey by forepaws (only in rodents) |
| R | Handling (only in rodents) |
| H | Nibbling insects’ legs |
| G | Carrying the prey in teeth |
| D | Sniffing |
| N | Pinning the prey down to the ground by one paw (only in shrews) |
| M | The same, by two paws (only in shrews) |
| C | Freezing |
| V | Turning the body by 90° |
| B | U-turn |
| F | Turning the head |
| Y | Rearing against the wall |
| U | Backwards movement |
| X | Self-grooming |
| J | Jump |
| I | Free-standing rearing |

**Figure A1.** The procedure for processing data to obtain the 2 × 2 matrices. Step 1. We divide each source file approximately in half. We leave the first half unchanged and divide the second half into several fragments of equal size. The program that we used to cut text files is in the public domain: https://github.com/m-novikov/sequence_cut. Step 2. To the first halves of the source files, we individually append the fragments containing behavioral sequences of the same species, obtaining the files $X^{*}x_1$, $Y^{*}y_1$, etc. We then individually append, to the same first halves, the fragments containing sequences of the other species, obtaining the files $X^{*}y_1$, $Y^{*}x_1$, etc. These augmented files make it possible to compare the structural features of the behavioral sequences of two species. Step 3. We archive each of the files obtained individually. Step 4. For each pair of species, we calculate the difference between the size of the archive of each augmented file and the size of the archive of the corresponding first half of the source file. Step 5. For each fragment, we determine which first half gives the minimal difference and count the number of such cases. Step 6. We place the resulting counts into the cells of the 2 × 2 matrix.
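
Steps 1 to 6 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' program: it works on byte strings in memory rather than on files, uses `zlib` from the standard library as a stand-in compressor (the archiver is not named here), cuts on raw byte offsets rather than on sequence boundaries, and leaves ties uncounted. All function names are hypothetical.

```python
import zlib


def split_source(data: bytes, fragment_size: int = 120):
    """Step 1: keep the first half, cut the second half into equal fragments."""
    half = len(data) // 2
    first_half, rest = data[:half], data[half:]
    n = len(rest) // fragment_size  # an incomplete trailing fragment is dropped
    fragments = [rest[i * fragment_size:(i + 1) * fragment_size] for i in range(n)]
    return first_half, fragments


def compressed_size(data: bytes) -> int:
    """Step 3: size in bytes of the compressed data (zlib is an assumed stand-in)."""
    return len(zlib.compress(data, 9))


def delta(first_half: bytes, fragment: bytes) -> int:
    """Steps 2 and 4: size increase caused by appending a fragment to a first half."""
    return compressed_size(first_half + fragment) - compressed_size(first_half)


def two_by_two(source_x: bytes, source_y: bytes, fragment_size: int = 120):
    """Steps 5 and 6: rows are the first halves (X*, Y*), columns the fragment sets (x, y)."""
    x_star, xs = split_source(source_x, fragment_size)
    y_star, ys = split_source(source_y, fragment_size)
    table = [[0, 0], [0, 0]]
    for col, fragments in enumerate((xs, ys)):
        for frag in fragments:
            dx, dy = delta(x_star, frag), delta(y_star, frag)
            if dx < dy:
                table[0][col] += 1  # fragment compresses better after X*
            elif dy < dx:
                table[1][col] += 1  # fragment compresses better after Y*
    return table
```

Calling `two_by_two(open("X.txt", "rb").read(), open("Y.txt", "rb").read())` on two source files (the file names are placeholders) would return a table laid out like the 2 × 2 matrices in this paper.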

**Figure 1.** A dendrogram of similarity between the hunting behaviors of the species studied, based on the association coefficients from Table 6.

| | x | y |
|---|---|---|
| X* | 1 | 0 |
| Y* | 8 | 9 |

| | x | z |
|---|---|---|
| X* | 9 | 0 |
| Z* | 0 | 9 |

| | y | z |
|---|---|---|
| Y* | 5 | 0 |
| Z* | 4 | 9 |

| Species | Size of the Source Text File (Bytes) | Number of Sequences in the Source Text File | Size of the First Part of the Source Text File (Bytes) | Number of Sample Files Obtained |
|---|---|---|---|---|
| Rattus norvegicus | 2572 | 108 | 1290 | 9 |
| Apodemus agrarius | 3343 | 83 | 1672 | 9 |
| Phodopus campbelli | 1715 | 43 | 801 | 4 |
| P. sungorus | 1585 | 76 | 792 | 6 |
| Allocricetulus eversmanni | 1463 | 60 | 731 | 5 |
| Al. curtatus | 2814 | 115 | 1407 | 9 |
| Lasiopodomys gregalis | 1086 | 34 | 543 | 3 |
| Alticola tuvinicus | 1319 | 157 | 659 | 5 |
| Sorex araneus | 1637 | 61 | 818 | 5 |

| Species | A. agrarius | L. gregalis |
|---|---|---|
| A. agrarius | 6 | 0 |
| L. gregalis | 3 | 3 |
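
As a worked check of how such a 2 × 2 table can be turned into an association coefficient and a significance level, the sketch below computes the phi coefficient and a two-sided Fisher exact test in plain Python. The choice of these two statistics is an assumption: this section does not name the coefficient or the test. It is motivated only by the fact that, for the table above, both values agree with the A. agrarius and L. gregalis entries of the coefficient and p-value tables (0.58 and 0.180). The function names are hypothetical.

```python
from math import comb, sqrt


def phi_coefficient(a: int, b: int, c: int, d: int) -> float:
    """Phi coefficient of the 2 x 2 table [[a, b], [c, d]]."""
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact p-value: sum of the probabilities of all tables
    with the same margins that are no more likely than the observed one."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(k: int) -> float:
        # Hypergeometric probability that the top-left cell equals k.
        return comb(row1, k) * comb(row2, col1 - k) / comb(total, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    return sum(prob(k) for k in range(lo, hi + 1) if prob(k) <= p_obs * (1 + 1e-9))


# Table above: rows A. agrarius (6, 0) and L. gregalis (3, 3).
print(round(phi_coefficient(6, 0, 3, 3), 2))         # 0.58
print(round(fisher_exact_two_sided(6, 0, 3, 3), 2))  # 0.18
```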

| Species | R. nor. | A. ag. | P. cam. | P. sun. | Al. ev. | Al. cur. | L. gr. | Alt. tuv. | S. ar. |
|---|---|---|---|---|---|---|---|---|---|
| R. norvegicus | 0 | 0.58 | 1 | 0.74 | 0.37 | 0.24 | 1 | 0.85 | 1 |
| A. agrarius | 0.58 | 0 | 0.28 | 0.87 | 0.85 | 1 | 0.58 | 0.86 | 0.93 |
| P. campbelli | 1 | 0.28 | 0 | 0.53 | 0 | 0.44 | 0.73 | 1 | 1 |
| P. sungorus | 0.74 | 0.87 | 0.53 | 0 | 0.83 | 0.49 | 1 | 1 | 1 |
| Al. eversmanni | 0.37 | 0.85 | 0 | 0.83 | 0 | 0.45 | 0.6 | 1 | 1 |
| Al. curtatus | 0.24 | 1 | 0.44 | 0.49 | 0.45 | 0 | 0.82 | 1 | 1 |
| L. gregalis | 1 | 0.58 | 0.73 | 1 | 0.6 | 0.82 | 0 | 1 | 1 |
| Alt. tuvinicus | 0.85 | 0.86 | 1 | 1 | 1 | 1 | 1 | 0 | 1 |
| S. araneus | 1 | 0.93 | 1 | 1 | 1 | 1 | 1 | 1 | 0 |
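
A dendrogram can be derived from a matrix of this kind by agglomerative clustering. The sketch below treats the coefficients as distances (the diagonal is 0) and uses average linkage. The linkage rule is an assumption, since the clustering method is not stated in this section, so the merge order need not match the published figure. The function name `average_linkage` is hypothetical.

```python
names = ["R. nor.", "A. ag.", "P. cam.", "P. sun.", "Al. ev.",
         "Al. cur.", "L. gr.", "Alt. tuv.", "S. ar."]

# Association coefficients copied from the matrix above.
dist = [
    [0, 0.58, 1, 0.74, 0.37, 0.24, 1, 0.85, 1],
    [0.58, 0, 0.28, 0.87, 0.85, 1, 0.58, 0.86, 0.93],
    [1, 0.28, 0, 0.53, 0, 0.44, 0.73, 1, 1],
    [0.74, 0.87, 0.53, 0, 0.83, 0.49, 1, 1, 1],
    [0.37, 0.85, 0, 0.83, 0, 0.45, 0.6, 1, 1],
    [0.24, 1, 0.44, 0.49, 0.45, 0, 0.82, 1, 1],
    [1, 0.58, 0.73, 1, 0.6, 0.82, 0, 1, 1],
    [0.85, 0.86, 1, 1, 1, 1, 1, 0, 1],
    [1, 0.93, 1, 1, 1, 1, 1, 1, 0],
]


def average_linkage(dist, names):
    """Repeatedly merge the two clusters with the smallest mean pairwise distance.

    Returns the list of merges as (members_a, members_b, distance).
    """
    clusters = {i: [i] for i in range(len(names))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for i, p in enumerate(keys):
            for q in keys[i + 1:]:
                pairs = [(u, v) for u in clusters[p] for v in clusters[q]]
                d = sum(dist[u][v] for u, v in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        merges.append(([names[u] for u in clusters[p]],
                       [names[v] for v in clusters[q]],
                       round(d, 3)))
        clusters[p] = clusters[p] + clusters.pop(q)
    return merges


for left, right, height in average_linkage(dist, names):
    print(left, "+", right, "at", height)
```

The first merge joins P. campbelli and Al. eversmanni, the only pair with a coefficient of 0.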

| Species | R. nor. | A. ag. | P. cam. | P. sun. | Al. ev. | Al. cur. | L. gr. | Alt. tuv. | S. ar. |
|---|---|---|---|---|---|---|---|---|---|
| R. norvegicus | X | 0.029 * | 0.001 ** | 0.011 * | 0.360 | 1.000 | 0.005 ** | 0.005 ** | 0.001 ** |
| A. agrarius | | X | 1.000 | 0.002 ** | 0.005 ** | 0.001 ** | 0.180 | 0.003 ** | 0.003 ** |
| P. campbelli | | | X | 0.200 | 1.000 | 0.230 | 0.140 | 0.008 ** | 0.009 ** |
| P. sungorus | | | | X | 0.015 * | 0.100 | 0.020 * | 0.002 ** | 0.002 ** |
| Al. eversmanni | | | | | X | 0.150 | 0.190 | 0.008 ** | 0.008 ** |
| Al. curtatus | | | | | | X | 0.020 * | 0.001 ** | 0.001 ** |
| L. gregalis | | | | | | | X | 0.020 * | 0.020 * |
| Alt. tuvinicus | | | | | | | | X | 0.008 ** |
| S. araneus | | | | | | | | | X |

