Open Access (this article is freely available and reusable)
Mathematics 2019, 7(5), 427; https://doi.org/10.3390/math7050427
Article
Retrieving a Context Tree from EEG Data
$^{1}$ Instituto de Matemática e Estatística, Universidade de São Paulo, São Paulo 05508090, Brazil
$^{2}$ Centro de Matemática, Universidad de la República, Uruguay and Instituto Pasteur de Montevideo, Montevideo 11400, Uruguay
$^{3}$ Instituto de Matemática, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21941909, Brazil
$^{4}$ Instituto de Biofísica, Universidade Federal do Rio de Janeiro, Rio de Janeiro 21941902, Brazil
$^{*}$ Author to whom correspondence should be addressed.
Received: 28 March 2019 / Accepted: 5 May 2019 / Published: 14 May 2019
Abstract
It has been repeatedly conjectured that the brain retrieves statistical regularities from stimuli. Here, we present a new statistical approach that allows this conjecture to be addressed. This approach is based on a new class of stochastic processes, namely, sequences of random objects driven by chains with memory of variable length.
Keywords:
stochastic chains with memory of variable length; sequences of random objects driven by context tree models; stochastic modeling of EEG data
1. Introduction
Consider the following experimental situation. A listener is exposed to a sequence of auditory stimuli, generated by a stochastic chain, while electroencephalographic (EEG) signals are recorded from his or her scalp. Going back to von Helmholtz [1], a classical conjecture in neurobiology claims that the listener’s brain automatically identifies statistical regularities in the sequence of stimuli (see, for instance, [2,3]). If this is the case, then a signature of the stochastic chain generating the stimuli should somehow be encoded in the brain activity. The question is whether this signature can be identified in the EEG data recorded during the experiment. The goal of this paper is to discuss a new probabilistic framework in which this conjecture can be formally addressed.
To model the relationship between the random chain of auditory stimuli and the corresponding EEG data, we introduce a new class of stochastic processes. A process in this class has two components. The first one is a stochastic chain taking values in the set of auditory units. The second one is a sequence of functions corresponding to the sequence of EEG chunks recorded during exposure to the successive auditory stimuli.
We use a stochastic chain with memory of variable length to model the dependence on the past that characterizes the sequence of auditory stimuli. Stochastic chains with memory of variable length were introduced by Rissanen [4] as a universal system for data compression. In his seminal paper, Rissanen observed that in many real-life stochastic chains the dependence on the past does not have a fixed length. Instead, it changes at each step as a function of the past itself. He called context the smallest final string of past symbols containing all the information required to predict the next symbol. The set of all contexts defines a partition of the past and can be represented by a rooted and labeled oriented tree. For this reason, many authors call stochastic chains with memory of variable length context tree models. We adopt this terminology here. A nonexhaustive list of articles on context tree models, with applications in biology and linguistics, includes [5,6,7,8,9,10,11,12,13].
An interesting property of stochastic chains with memory of variable length and finite context trees is that they are dense, in the $\overline{d}$-topology, in the class of chains of infinite order with continuous and non-null transition probabilities and summable continuity rates. This result follows easily from Fernández and Galves [14] and Duarte et al. [15]. We refer the reader to these articles for definitions and more details.
Besides modeling the chain of auditory units, we must also model the relationship between the chain of stimuli and the sequence of EEG chunks. To that end, we assume that at each time step a new EEG chunk is chosen according to a probability measure (defined on a suitable class of functions) that depends only on the context assigned to the sequence of auditory units generated up to that time. In particular, to describe the new class of stochastic chains introduced in this paper, we also need a family of probability measures on the set of functions corresponding to the EEG chunks, indexed by the contexts of the context tree characterizing the chain of auditory stimuli.
In this probabilistic framework, the neurobiological question can now be rigorously addressed as follows. Is it possible to retrieve the context tree characterizing the chain of stimuli from the corresponding EEG data? This is a problem of statistical model selection in the class of stochastic processes we have just informally described.
This article is organized as follows. In Section 2, we provide an informal overview of our approach. In Section 3, we introduce the notation, recall what a context tree model is and introduce the new class of sequences of random objects driven by context tree models. A statistical procedure to select, given the data, a member of the class of sequences of random objects driven by context tree models is presented in Section 4. The theoretical result supporting the proposed method, namely Theorem 1, is given in the same section. In Section 5, we conduct a brief simulation study to illustrate the statistical selection procedure presented in Section 4. The proof of Theorem 1 is given in Section 6.
2. Informal Presentation of Our Approach
Volunteers are exposed to sequences of auditory stimuli generated by a context tree model while EEG signals are recorded. The auditory units used as stimuli are strong beats, weak beats or silent units, represented by the symbols 2, 1 and 0, respectively.
The way the sequence of auditory units was generated can be informally described as follows. Start with the deterministic sequence
$$2\;1\;1\;2\;1\;1\;2\;1\;1\;2\;1\;1\;2\;\dots .$$
Then, replace each weak beat (symbol 1) by a silent unit (symbol 0) with probability $\epsilon$, independently of the other weak beats.
An example of a sequence produced by applying this procedure to the basic sequence is
$$2\;1\;1\;2\;0\;1\;2\;1\;1\;2\;0\;0\;2\;\dots .$$
In the sequel, this stochastic chain is denoted by the symbols $({X}_{0},{X}_{1},{X}_{2},\dots )$.
The stochastic chain just described can be generated step by step by an algorithm using only information from the past. We require the algorithm to use, at each step, the shortest string of past symbols necessary to generate the next symbol.
This algorithm can be described as follows. To generate ${X}_{n}$, given the past ${X}_{n-1},{X}_{n-2},\dots$, we first look at the last symbol ${X}_{n-1}$.
 If ${X}_{n-1}=2$, then$${X}_{n}=\begin{cases}1, & \text{with probability } 1-\epsilon,\\ 0, & \text{with probability } \epsilon.\end{cases}$$
 If ${X}_{n-1}=1$ or ${X}_{n-1}=0$, then we need to go back one more step:
 ▪
 if ${X}_{n-2}=2$, then$${X}_{n}=\begin{cases}1, & \text{with probability } 1-\epsilon,\\ 0, & \text{with probability } \epsilon;\end{cases}$$
 ▪
 if ${X}_{n-2}=1$ or ${X}_{n-2}=0$, then ${X}_{n}=2$ with probability 1.
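The two-step rule above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation; the function name `generate_stimuli` is hypothetical, and starting the chain from $X_0=2$ (as in the example sequences) is our assumption, since the rule needs up to two past symbols.

```python
import random


def generate_stimuli(n, eps, seed=None):
    """Return X_0, ..., X_{n-1} produced by the two-step rule described above.

    Assumption (not fixed by the text): the chain starts from X_0 = 2.
    """
    rng = random.Random(seed)
    x = [2]
    while len(x) < n:
        if x[-1] == 2 or x[-2] == 2:
            # Last symbol 2, or last symbol weak/silent preceded by 2:
            # weak beat (1) with probability 1 - eps, silent unit (0) with probability eps.
            x.append(0 if rng.random() < eps else 1)
        else:
            # Two weak/silent units in a row: a strong beat (2) with probability 1.
            x.append(2)
    return x[:n]


print(generate_stimuli(13, eps=0.0))  # [2, 1, 1, 2, 1, 1, 2, 1, 1, 2, 1, 1, 2]
```

With $\epsilon=0$ the sketch reproduces the deterministic basic sequence; with $\epsilon>0$ every third symbol is still a strong beat, and only the weak beats may be silenced.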
The algorithm described above is characterized by two elements. The first one is a partition of the set of all possible sequences of past units. This partition is represented by the set
$$\tau =\{00,10,20,01,11,21,2\}.$$
In partition $\tau$, the string 00 represents the set of all strings ending in the ordered pair $(0,0)$; 10 represents the set of all strings ending in the ordered pair $(1,0)$, …; and finally the symbol 2 represents the set of all strings ending in 2. Following Rissanen [4], we call any element of this partition a context.
For instance, if
$$\dots ,{X}_{n-3}=1,{X}_{n-2}=2,{X}_{n-1}=0,{X}_{n}=1,$$
the context associated with this past sequence is 01.
The partition $\tau $ of the past as described above can be represented by a rooted and labeled tree (see Figure 1) where each element of the partition is described as a leaf of the tree.
In the construction described above, for each sequence of past symbols, the algorithm first identifies the corresponding context w in the partition $\tau$. Once the context w is identified, the algorithm chooses the next symbol $a\in \{0,1,2\}$ using the transition probability $p(a\mid w)$. In other words, each context w in $\tau$ defines a probability measure on $\{0,1,2\}$. The family of transition probabilities indexed by the elements of the partition is the second element characterizing the algorithm.
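The generic mechanism (identify the context, then draw from $p(\cdot\mid w)$) can be sketched as below. The names `TAU`, `find_context`, `transition_probs` and `next_symbol` are hypothetical; strings are written in time order, so the last character is the most recent symbol, and the probabilities are those implied by the generating rule of this section.

```python
import random

# Contexts of tau, oldest symbol first: "10" stands for X_{n-2} = 1, X_{n-1} = 0.
TAU = ["00", "10", "20", "01", "11", "21", "2"]


def find_context(past, tau=TAU):
    """Return the unique element of tau that is a suffix of the string `past`."""
    matches = [w for w in tau if past.endswith(w)]
    if len(matches) != 1:
        raise ValueError(f"{past!r} does not determine a unique context")
    return matches[0]


def transition_probs(eps):
    """p(. | w) implied by the generating rule of this section."""
    weak_next = {"0": eps, "1": 1 - eps, "2": 0.0}   # a weak beat, silenced w.p. eps
    strong_next = {"0": 0.0, "1": 0.0, "2": 1.0}     # a strong beat with probability 1
    return {w: weak_next if w in ("2", "20", "21") else strong_next for w in TAU}


def next_symbol(past, p, rng):
    """Identify the context w of `past`, then draw the next symbol from p(. | w)."""
    w = find_context(past)
    symbols, weights = zip(*p[w].items())
    return rng.choices(symbols, weights=weights)[0]


rng = random.Random(0)
p = transition_probs(0.2)
print(find_context("1201"))         # 01, as in the example above
print(next_symbol("1211", p, rng))  # 2, since the context of "1211" is 11
```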
The family of transition probabilities associated with $\tau$ is shown in Table 1.
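As a reading aid, the transition probabilities implied by the generating rule described above can be collected in a single display. This tabulation is derived from that rule (contexts written in time order, so that 20 means $X_{n-2}=2$, $X_{n-1}=0$), not copied from Table 1:

$$\begin{array}{c|ccc} w & p(0\mid w) & p(1\mid w) & p(2\mid w)\\ \hline 00 & 0 & 0 & 1\\ 10 & 0 & 0 & 1\\ 20 & \epsilon & 1-\epsilon & 0\\ 01 & 0 & 0 & 1\\ 11 & 0 & 0 & 1\\ 21 & \epsilon & 1-\epsilon & 0\\ 2 & \epsilon & 1-\epsilon & 0 \end{array}$$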
Using the notion of context tree, the neurobiological conjecture can now be rephrased as follows. Is the brain able to identify the context tree generating the sample of auditory stimuli? From an experimental point of view, the question is whether it is possible to retrieve the tree presented in Figure 1 from the corresponding EEG data. To deal with this question we introduce a new statistical model selection procedure described below.
Let ${Y}_{n}$ be the chunk of EEG data recorded while the volunteer is exposed to the auditory stimulus ${X}_{n}$. Observe that ${Y}_{n}$ is a continuous function taking values in ${\mathbb{R}}^{d}$, where $d\ge 1$ is the number of electrodes. Its domain is the time interval of length, say T, during which the acoustic stimulus ${X}_{n}$ is presented.
The statistical procedure introduced in the paper can be informally described as follows. Given a sample $({X}_{0},{Y}_{0}),\dots ,({X}_{n},{Y}_{n})$ of auditory stimuli and associated EEG chunks and for a suitable initial integer $k\ge 1$, do the following.
 For each string $\mathbf{u}={u}_{1},{u}_{2},\dots ,{u}_{k}$ of symbols in $\{0,1,2\}$, identify all occurrences in the sequence ${X}_{0},{X}_{1},\dots ,{X}_{n}$ of the string $a\mathbf{u}$, obtained by concatenating the symbol $a\in \{0,1,2\}$ and the string $\mathbf{u}$.
 For each $a\in \{0,1,2\}$, define the subsample of all EEG chunks ${Y}_{m}={Y}_{m}^{(a\mathbf{u})}$ such that ${X}_{m-k}=a,{X}_{m-k+1}={u}_{1},\dots ,{X}_{m}={u}_{k}$ (see Figure 2).
 For any pair $a,b\in \{0,1,2\}$, test the null hypothesis that the laws of the EEG chunks ${Y}^{(a\mathbf{u})}$ and ${Y}^{(b\mathbf{u})}$ collected at Step 2 are equal.
 (a)
 If the null hypothesis is not rejected for any pair of symbols a and b, we conclude that the occurrence of a or b before the string $\mathbf{u}$ does not affect the law of the EEG chunks. Then, we restart the procedure with the one-step shorter string $\mathbf{u}={u}_{2},\dots ,{u}_{k}$.
 (b)
 If the null hypothesis is rejected for at least one pair of symbols a and b, we conclude that the law of the EEG chunks depends on the entire string $a\mathbf{u}$ and we stop the pruning procedure.
 We keep pruning the string ${u}_{1},\dots ,{u}_{k}$ until the null hypothesis is rejected for the first time.
 Call ${\widehat{\tau}}_{n}$ the tree constituted by the strings which remained after the pruning procedure.
The question is whether ${\widehat{\tau}}_{n}$ coincides with the context tree $\tau $ generating the sequence of auditory stimuli.
An important technical issue must be clarified at this point, namely, how to test the equality of the laws of two subsamples of EEG chunks. This is done using the projective method informally explained below.
Suppose we have two samples of random functions, each sample composed of independent realizations of some fixed law. To test whether the two samples are generated by the same law, we choose a “direction” at random and project each function of the samples onto this direction. This produces two new samples of real numbers. We then test whether the samples of projected real numbers have the same law. Under suitable conditions, a theorem by Cuesta-Albertos et al. [16] ensures that, for almost every direction, if the projected laws coincide then the original laws also coincide; hence, a test on the projected samples is also a test of the equality of the original laws.
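The projective idea can be sketched as follows for scalar signals ($d=1$) sampled on a uniform grid of $[0,1]$. This is a simplified illustration under our own assumptions: the random direction is a discretized Brownian path (the choice made for the test statistic in Section 4), the inner product is approximated by a Riemann sum, and all function names are hypothetical. Only the Kolmogorov–Smirnov distance is computed; no $p$-value is attached.

```python
import math
import random
from bisect import bisect_right


def brownian_direction(grid_size, rng):
    """Discretized standard Brownian path on [0, 1] with grid_size points, W(0) = 0."""
    dt = 1.0 / (grid_size - 1)
    w = [0.0]
    for _ in range(grid_size - 1):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    return w


def project(f, h):
    """Left Riemann-sum approximation of <f, h>, the integral of f(t) h(t) over [0, 1]."""
    dt = 1.0 / (len(f) - 1)
    return sum(a * b for a, b in zip(f[:-1], h[:-1])) * dt


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov distance between the empirical CDFs of a and b."""
    a, b = sorted(a), sorted(b)
    return max(abs(bisect_right(a, t) / len(a) - bisect_right(b, t) / len(b))
               for t in a + b)


def projected_ks(sample1, sample2, rng):
    """Project both samples of discretized functions on one random direction, then compare."""
    h = brownian_direction(len(sample1[0]), rng)
    return ks_statistic([project(f, h) for f in sample1],
                        [project(f, h) for f in sample2])
```

Because the same direction is used for both samples, identical samples give distance 0, while samples whose laws differ typically give a large distance for almost every direction.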
The arguments informally sketched in this section are formally developed in the subsequent sections.
3. Notation and Definitions
Let A be a finite alphabet. Given two integers $m,n\in \mathbb{Z}$ with $m\le n$, the string $({u}_{m},\dots ,{u}_{n})$ of symbols in A is often denoted by ${u}_{m}^{n}$; its length is $\ell ({u}_{m}^{n})=n-m+1$. The empty string is denoted by ∅ and its length is $\ell (\varnothing )=0$. Given two strings u and v of elements of A, we denote by $uv$ the string in ${A}^{\ell (u)+\ell (v)}$ obtained by the concatenation of u and v. By definition, $u\varnothing =\varnothing u=u$ for any string $u\in {A}^{\ell (u)}$. The string u is said to be a suffix of v if there exists a string s satisfying $v=su$. This relation is denoted by $u\preceq v$. When $v\ne u$, we say that u is a proper suffix of v and write $u\prec v$. Hereafter, the set of all finite strings of symbols in A is denoted by ${A}^{\ast}:={\bigcup}_{k=1}^{\infty}{A}^{k}$. For any finite string $w={w}_{-k}^{-1}$ with $k\ge 2$, we write $\mathrm{suf}(w)$ to denote the one-step shorter string ${w}_{-k+1}^{-1}$.
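In code, a convenient (assumed) representation stores a string $w_{-k}^{-1}$ as a Python `str` whose first character is the oldest symbol $w_{-k}$; concatenation $uv$ is then `u + v`, and the relations above become one-liners. The helper names are hypothetical.

```python
def is_suffix(u, v):
    """u ⪯ v: there is a (possibly empty) string s with v = s u."""
    return v.endswith(u)


def is_proper_suffix(u, v):
    """u ≺ v: u is a suffix of v and u differs from v."""
    return u != v and v.endswith(u)


def suf(w):
    """suf(w): drop the oldest (leftmost) symbol of w, defined for len(w) >= 2."""
    if len(w) < 2:
        raise ValueError("suf(w) requires a string of length at least 2")
    return w[1:]


u, v = "1", "21"
print(is_suffix(u, v), is_proper_suffix(v, v), suf("021"), u + v)  # True False 21 121
```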
Definition 1.
A finite subset τ of ${A}^{\ast}$ is a context tree if it satisfies the following conditions:
 1.
 Suffix Property. For no $w\in \tau$ is there a $u\in \tau$ with $u\prec w$.
 2.
 Irreducibility. No string belonging to τ can be replaced by a proper suffix without violating the suffix property.
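Definition 1 can be checked mechanically for a finite set of strings, as in the sketch below (hypothetical helper names; strings are stored oldest symbol first). Since $A^\ast$ excludes the empty string, only nonempty proper suffixes are tried when testing irreducibility.

```python
def has_suffix_property(tau):
    """Suffix property: no element of tau has a proper suffix that also belongs to tau."""
    return not any(u != w and w.endswith(u) for w in tau for u in tau)


def is_irreducible(tau):
    """Irreducibility: replacing any w by a nonempty proper suffix breaks the suffix property."""
    for w in tau:
        for i in range(1, len(w)):
            replaced = (set(tau) - {w}) | {w[i:]}
            if has_suffix_property(replaced):
                return False
    return True


def is_context_tree(tau):
    """Definition 1 for a finite collection of strings (oldest symbol first)."""
    return has_suffix_property(tau) and is_irreducible(tau)


print(is_context_tree(["00", "10", "20", "01", "11", "21", "2"]))  # True
print(is_context_tree(["00"]))  # False: "00" can be replaced by "0"
```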
The set $\tau $ can be identified with the set of leaves of a rooted tree with a finite set of labeled branches. The elements of $\tau $ are always denoted by $w,u,v,\dots $.
The height of the context tree $\tau $ is defined as $\ell \left(\tau \right)=max\{\ell (w)\phantom{\rule{0.166667em}{0ex}}:\phantom{\rule{0.166667em}{0ex}}w\in \tau \}.$ In the present paper, we only consider context trees with finite height.
Definition 2.
Let τ and ${\tau}^{\prime}$ be two context trees. We say that τ is smaller than ${\tau}^{\prime}$ and write $\tau \preceq {\tau}^{\prime}$ if for every ${w}^{\prime}\in {\tau}^{\prime}$ there exists $w\in \tau$ such that $w\preceq {w}^{\prime}$.
Given a context tree $\tau$, let $p=\{p(\cdot \mid w):w\in \tau \}$ be a family of probability measures on A indexed by the elements of $\tau$. The pair $(\tau ,p)$ is called a probabilistic context tree on A. Each element of $\tau$ is called a context. For any string ${x}_{-n}^{-1}\in {A}^{n}$ with $n\ge \ell (\tau )$, we write ${c}_{\tau}({x}_{-n}^{-1})$ to denote the unique context in $\tau$ that is a suffix of ${x}_{-n}^{-1}$.
Definition 3.
A probabilistic context tree $(\tau ,p)$ with height $\ell (\tau )=k$ is irreducible if for any ${a}_{-k}^{-1}\in {A}^{k}$ and $b\in A$ there exist a positive integer $n=n({a}_{-k}^{-1},b)$ and symbols ${a}_{0},{a}_{1},\dots ,{a}_{n}=b\in A$ such that
$$p({a}_{0}\mid {c}_{\tau}({a}_{-k}^{-1}))>0,\quad p({a}_{1}\mid {c}_{\tau}({a}_{-k}^{-1}{a}_{0}))>0,\quad \dots ,\quad p({a}_{n}\mid {c}_{\tau}({a}_{-k}^{-1}{a}_{0}\cdots {a}_{n-1}))>0.$$
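Definition 3 is a reachability condition, which can be checked by a graph search over pasts of length $k=\ell(\tau)$, as in the sketch below. The function name and the representation of $p$ as a nested dictionary `p[w][a]` are assumptions; strings are stored oldest symbol first. Since $n$ must be positive, the sketch collects the symbols produced after at least two steps.

```python
from itertools import product


def is_irreducible_pct(tau, p, alphabet):
    """Definition 3: from every past of length k = height(tau), every symbol b is produced
    as a_n = b for some n >= 1, using only transitions of positive probability."""
    k = max(len(w) for w in tau)

    def ctx(s):  # the context of tau that is a suffix of s
        return next(w for w in tau if s.endswith(w))

    def successors(s):  # pasts of length k reachable in one step from s
        return {s[1:] + a for a in alphabet if p[ctx(s)][a] > 0}

    for start in map("".join, product(alphabet, repeat=k)):
        seen, stack = set(), list(successors(start))  # pasts reached after >= 1 step
        while stack:
            s = stack.pop()
            if s not in seen:
                seen.add(s)
                stack.extend(successors(s))
        after_two = set().union(*(successors(s) for s in seen))  # after >= 2 steps
        if {s[-1] for s in after_two} != set(alphabet):
            return False
    return True


tau = ["00", "10", "20", "01", "11", "21", "2"]
eps = 0.2
weak = {"0": eps, "1": 1 - eps, "2": 0.0}
strong = {"0": 0.0, "1": 0.0, "2": 1.0}
p = {w: weak if w in ("2", "20", "21") else strong for w in tau}
print(is_irreducible_pct(tau, p, "012"))  # True
```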
Definition 4.
Let $(\tau ,p)$ be a probabilistic context tree on A. A stochastic chain ${\left({X}_{n}\right)}_{n\in \mathbb{N}}$ taking values in A is called a context tree model compatible with $(\tau ,p)$ if
 1.
 for any $n\ge \ell (\tau )$ and any finite string ${x}_{-n}^{-1}\in {A}^{n}$ such that $P({X}_{0}^{n-1}={x}_{-n}^{-1})>0$, it holds that$$P({X}_{n}=a\mid {X}_{0}^{n-1}={x}_{-n}^{-1})=p(a\mid {c}_{\tau}({x}_{-n}^{-1}))\quad \text{for all } a\in A;$$
 2.
 for any $1\le j<\ell ({c}_{\tau}({x}_{-n}^{-1}))$, there exists $a\in A$ such that$$P({X}_{n}=a\mid {X}_{0}^{n-1}={x}_{-n}^{-1})\ne P({X}_{n}=a\mid {X}_{n-j}^{n-1}={x}_{-j}^{-1}).$$
With this notation, we can now introduce the class of random objects driven by a context tree model.
Definition 5.
Let A be a finite alphabet, $(\tau ,p)$ a probabilistic context tree on A, $(F,\mathcal{F})$ a measurable space and $({Q}^{w}:w\in \tau )$ a family of probability measures on $(F,\mathcal{F})$. The bivariate stochastic chain ${({X}_{n},{Y}_{n})}_{n\in \mathbb{N}}$ taking values in $A\times F$ is a sequence of random objects driven by a context tree model compatible with $(\tau ,p)$ and $({Q}^{w}:w\in \tau )$ if the following conditions are satisfied:
 1.
 ${\left({X}_{n}\right)}_{n\in \mathbb{N}}$ is a context tree model compatible with $(\tau ,p)$.
 2.
 The random elements ${Y}_{0},{Y}_{1},\dots$ are $\mathcal{F}$-measurable. Moreover, for any integers $\ell (\tau )\le m\le n$, any string ${x}_{m-\ell (\tau )+1}^{n}\in {A}^{n-m+\ell (\tau )}$ and any sequence ${J}_{m},\dots ,{J}_{n}$ of $\mathcal{F}$-measurable sets,$$P({Y}_{m}\in {J}_{m},\dots ,{Y}_{n}\in {J}_{n}\mid {X}_{m-\ell (\tau )+1}^{n}={x}_{m-\ell (\tau )+1}^{n})=\prod _{k=m}^{n}{Q}^{{c}_{\tau}({x}_{k-\ell (\tau )+1}^{k})}({J}_{k}).$$
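The product formula in Definition 5 says that, given the stimuli, the responses are independent and each $Y_k$ has law $Q^{c_\tau(x_{k-\ell(\tau)+1}^{k})}$. A minimal sketch of this sampling step, assuming each $Q^w$ is supplied as a (hypothetical) callable `samplers[w](rng)` that returns one element of $F$, is:

```python
import random


def draw_responses(x, tau, samplers, rng):
    """Draw Y_k from Q^{c_tau(x_{k-l+1}, ..., x_k)} independently, for l <= k < len(x).

    l = l(tau) is the height of tau, strings are stored oldest symbol first, and
    samplers[w](rng) (hypothetical) returns one draw from Q^w. Y_k for k < l is None.
    """
    height = max(len(w) for w in tau)
    ys = [None] * len(x)
    for k in range(height, len(x)):
        window = x[k - height + 1:k + 1]  # x_{k-l+1}, ..., x_k
        w = next(c for c in tau if window.endswith(c))
        ys[k] = samplers[w](rng)
    return ys


tau = ["00", "10", "20", "01", "11", "21", "2"]
labels = {w: (lambda rng, w=w: w) for w in tau}  # toy "law": returns its own context
print(draw_responses("2112012", tau, labels, random.Random(0)))
# [None, None, '11', '2', '20', '01', '2']
```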
Definition 6.
A sequence of random objects driven by a context tree model compatible with $(\tau ,p)$ and $({Q}^{w}:w\in \tau )$ is identifiable if for any context $w\in \tau$ there exists a context $u\in \tau$ such that $\mathrm{suf}(w)=\mathrm{suf}(u)$ and ${Q}^{w}\ne {Q}^{u}.$
The process $\left({X}_{n}\right)$ is called the stimulus chain and $\left({Y}_{n}\right)$ is called the response chain.
The experimental situation described in Section 2 can now be formally presented as follows.
 The stimulus chain $({X}_{n})$ is a context tree model taking values in an alphabet whose elements are symbols indicating the different types of auditory units appearing in the sequence of stimuli. We call $(\tau ,p)$ its probabilistic context tree.
 Each element ${Y}_{n}$ of the response chain $({Y}_{n})$ represents the EEG chunk recorded while the volunteer is exposed to the auditory stimulus ${X}_{n}$. Thus, ${Y}_{n}=({Y}_{n}(t),t\in [0,T])$ is a function taking values in ${\mathbb{R}}^{d}$, where T is the time distance between the onsets of two consecutive auditory stimuli and d is the number of electrodes used in the analysis. The sample space F is the Hilbert space ${L}^{2}([0,T],{\mathbb{R}}^{d})$ of ${\mathbb{R}}^{d}$-valued functions on $[0,T]$ having square-integrable components. The Hilbert space F is endowed with its usual Borel $\sigma$-algebra $\mathcal{F}$.
 Finally, $({Q}^{w},w\in \tau )$ is a family of probability measures on ${L}^{2}([0,T],{\mathbb{R}}^{d})$ describing the laws of the EEG chunks.
From now on, the pair $(F,\mathcal{F})$ always denotes the Hilbert space ${L}^{2}([0,T],{\mathbb{R}}^{d})$ endowed with its usual Borel $\sigma$-algebra.
4. Statistical Selection for Sequences of Random Objects Driven by Context Tree Models
Let $({X}_{0},{Y}_{0}),\dots ,({X}_{n},{Y}_{n})$, with ${X}_{k}\in A$ and ${Y}_{k}\in F$ for $0\le k\le n$, be a sample produced by a sequence of random objects driven by a context tree model compatible with $(\overline{\tau},\overline{p})$ and $({\overline{Q}}^{w}:w\in \overline{\tau})$. Before introducing the statistical selection procedure, we need two more definitions.
Definition 7.
Let τ be a context tree and fix a finite string $s\in {A}^{\ast}$. We define the branch in τ induced by s as the set ${B}_{\tau}\left(s\right)=\{w\in \tau :w\succ s\}$. The set ${B}_{\tau}\left(s\right)$ is called a terminal branch if for all $w\in {B}_{\tau}\left(s\right)$ it holds that $w=as$ for some $a\in A$.
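The branch $B_\tau(s)$ and the terminal-branch test of Definition 7 translate directly into code; the helper names are hypothetical and strings are stored oldest symbol first. As in the definition, an empty branch is vacuously terminal.

```python
def branch(tau, s):
    """B_tau(s): elements of tau that have s as a proper suffix."""
    return {w for w in tau if w != s and w.endswith(s)}


def is_terminal_branch(tau, s):
    """B_tau(s) is terminal if each of its elements is a s for a single symbol a."""
    return all(len(w) == len(s) + 1 for w in branch(tau, s))


tau = ["00", "10", "20", "01", "11", "21", "2"]
print(sorted(branch(tau, "0")), is_terminal_branch(tau, "0"))  # ['00', '10', '20'] True
```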
Given a sample ${X}_{0},\dots ,{X}_{n}$ of symbols in A and a finite string $u\in {A}^{\ast}$, the number of occurrences of u in the sample ${X}_{0},\dots ,{X}_{n}$ is defined as
$${N}_{n}(u)=\sum _{m=\ell (u)-1}^{n}\mathbf{1}\{{X}_{m-\ell (u)+1}^{m}=u\}.$$
Definition 8.
Given integers $n>L\ge 1$, an admissible context tree of maximal height L for the sample ${X}_{0},\dots ,{X}_{n}$ of symbols in A is any context tree τ satisfying
 1.
 $w\in \tau $ if and only if $\ell \left(w\right)\le L$ and ${N}_{n}\left(w\right)\ge 1$.
 2.
 Any string $u\in {A}^{\ast}$ with ${N}_{n}\left(u\right)\ge 1$ is a suffix of some $w\in \tau $ or has a suffix $w\in \tau $.
For any pair of integers $1\le L<n$ and any string $u\in {A}^{\ast}$ with $\ell (u)\le L$, call ${I}_{n}(u)$ the set of indexes belonging to $\{\ell (u)-1,\dots ,n\}$ at which an occurrence of the string u ends in the sample ${X}_{0},\dots ,{X}_{n}$, that is,
$${I}_{n}(u)=\{\ell (u)-1\le m\le n:{X}_{m-\ell (u)+1}^{m}=u\}.$$
Observe that, by definition, $|{I}_{n}(u)|={N}_{n}(u)$. If ${I}_{n}(u)=\{{m}_{1},\dots ,{m}_{{N}_{n}(u)}\}$, we set ${Y}_{k}^{(u)}={Y}_{{m}_{k}}$ for each $1\le k\le {N}_{n}(u)$. Thus, ${Y}_{1}^{(u)},\dots ,{Y}_{{N}_{n}(u)}^{(u)}$ is the subsample of ${Y}_{0},\dots ,{Y}_{n}$ induced by the string u.
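The counts $N_n(u)$, the index sets $I_n(u)$ and the induced subsamples can be computed with a sliding window, as in the following sketch (hypothetical names; `x` is the stimulus string stored oldest symbol first and `y` is any list of responses aligned with it). Overlapping occurrences are counted, as in the definition of $N_n(u)$.

```python
def occurrence_indices(x, u):
    """I_n(u): indices m with l(u) - 1 <= m <= n and x_{m-l(u)+1}, ..., x_m equal to u."""
    k = len(u)
    return [m for m in range(k - 1, len(x)) if x[m - k + 1:m + 1] == u]


def count_occurrences(x, u):
    """N_n(u) = |I_n(u)|."""
    return len(occurrence_indices(x, u))


def subsample(x, y, u):
    """Y^(u)_1, ..., Y^(u)_{N_n(u)}: responses recorded where an occurrence of u ends."""
    return [y[m] for m in occurrence_indices(x, u)]


x = "2112012"
print(occurrence_indices(x, "12"), count_occurrences(x, "1"))  # [3, 6] 3
```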
Given $u\in {A}^{\ast}$ such that ${N}_{n}(u)\ge 1$ and $h\in F$, we define the empirical distribution associated with the projection of the sample ${Y}_{1}^{(u)},\dots ,{Y}_{{N}_{n}(u)}^{(u)}$ onto the direction h as
$${\widehat{Q}}_{n}^{u,h}(t)=\frac{1}{{N}_{n}(u)}\sum _{m=1}^{{N}_{n}(u)}{1}_{(-\infty ,t]}(\langle {Y}_{m}^{(u)},h\rangle ),\quad t\in \mathbb{R},$$
where for any pair of functions $f,h\in F$,
$$\langle f,h\rangle =\sum _{i=1}^{d}{\int}_{0}^{T}{f}_{i}(t){h}_{i}(t)dt.$$
For a given pair $u,v\in {A}^{\ast}$, with $max\{\ell (u),\ell (v\left)\right\}\le L$ and $h\in F$, the Kolmogorov–Smirnov distance between the empirical distributions ${\widehat{Q}}_{n}^{u,h}$ and ${\widehat{Q}}_{n}^{v,h}$ is defined by
$$\mathrm{KS}({\widehat{Q}}_{n}^{u,h},{\widehat{Q}}_{n}^{v,h})=\sup_{t\in \mathbb{R}}\left|{\widehat{Q}}_{n}^{u,h}(t)-{\widehat{Q}}_{n}^{v,h}(t)\right|.$$
Finally, we define for any pair $u,v\in {A}^{\ast}$ such that $max\{\ell (u),\ell (v\left)\right\}\le L$ and $h\in F,$
$${D}_{n}^{h}(({Y}_{1}^{\left(u\right)},\dots ,{Y}_{{N}_{n}\left(u\right)}^{\left(u\right)}),({Y}_{1}^{\left(v\right)},\dots ,{Y}_{{N}_{n}\left(v\right)}^{\left(v\right)}))=\sqrt{\frac{{N}_{n}\left(u\right){N}_{n}\left(v\right)}{{N}_{n}\left(u\right)+{N}_{n}\left(v\right)}}\mathrm{KS}({\widehat{Q}}_{n}^{u,h},{\widehat{Q}}_{n}^{v,h}).$$
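Once the chunks have been projected onto $h$, the statistic $D_n^h$ only involves the two lists of real numbers $\langle Y_m^{(u)},h\rangle$ and $\langle Y_m^{(v)},h\rangle$. A minimal sketch (hypothetical names) that uses the fact that the supremum of the difference of two empirical distribution functions is attained at one of the pooled sample points:

```python
from bisect import bisect_right
from math import sqrt


def ecdf(values):
    """Empirical distribution function t -> #{m : values[m] <= t} / len(values)."""
    ordered = sorted(values)
    return lambda t: bisect_right(ordered, t) / len(ordered)


def ks_distance(a, b):
    """KS distance sup_t |F_a(t) - F_b(t)|, evaluated at the pooled sample points."""
    fa, fb = ecdf(a), ecdf(b)
    return max(abs(fa(t) - fb(t)) for t in list(a) + list(b))


def d_statistic(a, b):
    """D_n^h: sqrt(N(u) N(v) / (N(u) + N(v))) times the KS distance."""
    return sqrt(len(a) * len(b) / (len(a) + len(b))) * ks_distance(a, b)


# a and b stand for the projections <Y_m^(u), h> and <Y_m^(v), h>.
a, b = [0.1, 0.4, 0.7, 0.9], [0.5, 0.8, 1.2, 1.5]
print(ks_distance(a, b), d_statistic(a, b))
```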
Our selection procedure can now be described as follows. Fix an integer $1\le L<n$ and let ${\mathcal{T}}_{n}$ be the largest admissible context tree of maximal height L for the sample ${X}_{0},\dots ,{X}_{n}$. Here, largest means that if $\tau$ is any other admissible context tree of maximal height L for the sample ${X}_{0}^{n}$, then $\tau \preceq {\mathcal{T}}_{n}.$
For any string $u\in {A}^{\ast}$ such that ${B}_{{\mathcal{T}}_{n}}(u)$ is a terminal branch, we test the null hypothesis
$${H}_{0}^{(u)}:\ \mathcal{L}({Y}_{1}^{(au)},\dots ,{Y}_{{N}_{n}(au)}^{(au)})=\mathcal{L}({Y}_{1}^{(bu)},\dots ,{Y}_{{N}_{n}(bu)}^{(bu)}),\quad \forall \, au,bu\in {B}_{{\mathcal{T}}_{n}}(u)$$
using the test statistic
$${\Delta}_{n}(u)={\Delta}_{n}^{W}(u)=\max_{a,b\in A}{D}_{n}^{W}\left(({Y}_{1}^{(au)},\dots ,{Y}_{{N}_{n}(au)}^{(au)}),({Y}_{1}^{(bu)},\dots ,{Y}_{{N}_{n}(bu)}^{(bu)})\right),$$
where $W=(({W}_{1}(t),\dots ,{W}_{d}(t)):t\in [0,T])$ is a realization of a d-dimensional Brownian motion on $[0,T]$.
We reject the null hypothesis ${H}_{0}^{(u)}$ when ${\Delta}_{n}(u)>c$, where $c>0$ is a suitable threshold. When the null hypothesis ${H}_{0}^{(u)}$ is not rejected, we prune the branch ${B}_{{\mathcal{T}}_{n}}(u)$ in ${\mathcal{T}}_{n}$ and set as the new candidate context tree
$${\mathcal{T}}_{n}=\left({\mathcal{T}}_{n}\backslash {B}_{{\mathcal{T}}_{n}}\left(u\right)\right)\cup \left\{u\right\}.$$
On the other hand, if the null hypothesis ${H}_{0}^{(u)}$ is rejected, we keep ${B}_{{\mathcal{T}}_{n}}(u)$ in ${\mathcal{T}}_{n}$ and stop testing ${H}_{0}^{(s)}$ for strings $s\in {A}^{\ast}$ such that $s\preceq u$.
In each pruning step, take a string $s\in {A}^{\ast}$ that induces a terminal branch in ${\mathcal{T}}_{n}$ and has not been tested yet. This pruning procedure is repeated until no more pruning is performed. We denote by ${\widehat{\tau}}_{n}$ the final context tree obtained by this procedure. The formal description of the above pruning procedure is provided in Algorithm 1 as pseudocode.
Algorithm 1 Pseudocode describing the pruning procedure used to select the tree ${\widehat{\tau}}_{n}$. 
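The pruning procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code: responses are assumed to be already projected onto a single Brownian path $W$ (so `proj[m]` stands for $\langle Y_m,W\rangle$), the initial tree $\mathcal{T}_n$ is taken to be the set of length-$L$ strings observed in the sample, candidate branches are visited in lexicographic order, and all names are hypothetical.

```python
from bisect import bisect_right
from math import sqrt


def d_statistic(a, b):
    """sqrt(N_a N_b / (N_a + N_b)) times the two-sample Kolmogorov-Smirnov distance."""
    sa, sb = sorted(a), sorted(b)
    ks = max(abs(bisect_right(sa, t) / len(sa) - bisect_right(sb, t) / len(sb))
             for t in sa + sb)
    return sqrt(len(sa) * len(sb) / (len(sa) + len(sb))) * ks


def prune(x, proj, L, c):
    """Sketch of the pruning procedure; returns the selected tree as a set of strings.

    x: stimulus string X_0 ... X_n, oldest symbol first, one character per symbol.
    proj: proj[m] stands for <Y_m, W>, the m-th response projected on one Brownian path W.
    L: maximal height. c: rejection threshold, e.g. c_{alpha_n}.
    Assumption: the initial tree T_n is the set of length-L strings observed in x.
    """
    def values(w):  # projected subsample Y^(w), one value per occurrence of w
        k = len(w)
        return [proj[m] for m in range(k - 1, len(x)) if x[m - k + 1:m + 1] == w]

    def branch(tree, s):  # B(s): elements of tree having s as a proper suffix
        return [w for w in tree if w != s and w.endswith(s)]

    tree = {x[m - L + 1:m + 1] for m in range(L - 1, len(x))}
    blocked = set()  # strings s for which H_0^(s) is no longer tested
    while True:
        suffixes = {w[1:] for w in tree if len(w) >= 2} - blocked
        candidates = sorted(s for s in suffixes
                            if all(len(w) == len(s) + 1 for w in branch(tree, s)))
        if not candidates:
            return tree
        s = candidates[0]
        b = branch(tree, s)
        samples = {w: values(w) for w in b}
        delta = max(d_statistic(samples[u], samples[v]) for u in b for v in b)
        if delta > c:
            # H_0^(s) rejected: keep B(s) and stop testing s and all of its suffixes.
            blocked.update(s[i:] for i in range(len(s)))
        else:
            # H_0^(s) not rejected: replace the branch B(s) by the string s.
            tree = (tree - set(b)) | {s}
```

The visiting order is a free choice in this sketch; the text only requires that each tested string induce a terminal branch that has not been tested yet.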

To state the consistency theorem, we need the following definitions.
Definition 9.
A probability measure P defined on $(F,\mathcal{F})$ satisfies the Carleman condition if all the absolute moments ${m}_{k}=\int \|h\|^{k}P(dh)$, $k\ge 1,$ are finite and
$$\sum _{k\ge 1}{m}_{k}^{-1/k}=+\infty .$$
Definition 10.
Let P be a probability measure on $(F,\mathcal{F})$. We say that P is continuous if ${P}^{h}$ is continuous for any $h\in F$, where ${P}^{h}$ is defined by
$${P}^{h}((-\infty ,t])=P(x\in F:\langle x,h\rangle \le t),\quad t\in \mathbb{R}.$$
Let V be a finite set of indexes and $({P}_{i}:i\in V)$ be a family of probability measures on $(F,\mathcal{F})$. We say that $({P}_{i}:i\in V)$ is continuous if for all $i\in V$, the probability measure ${P}_{i}$ is continuous.
In what follows, let ${c}_{\alpha}=\sqrt{(1/2)ln(2/\alpha )}$, where $\alpha \in (0,1).$ We say that ${\alpha}_{n}\to 0$ slowly enough as $n\to \infty $ if
$$\frac{\sqrt{n}}{{c}_{{\alpha}_{n}}}\to \infty \phantom{\rule{4pt}{0ex}}\mathrm{as}\phantom{\rule{4pt}{0ex}}n\to \infty .$$
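The threshold $c_\alpha$ comes from the tail of the Kolmogorov distribution $K=\sup_{t}|B(t)|$ of a Brownian bridge, $P(K>c)=2\sum_{k\ge1}(-1)^{k-1}e^{-2k^2c^2}$, whose first term equals $\alpha$ exactly at $c=c_\alpha$. The sketch below (hypothetical names, truncated series) evaluates both quantities for a few levels $\alpha$.

```python
import math


def c_alpha(alpha):
    """Threshold c_alpha = sqrt((1/2) ln(2 / alpha)), for 0 < alpha < 1."""
    return math.sqrt(0.5 * math.log(2.0 / alpha))


def kolmogorov_tail(c, terms=100):
    """P(K > c) for K = sup |B(t)|, B a Brownian bridge (series truncated after `terms` terms)."""
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * c * c)
                     for k in range(1, terms + 1))


for alpha in (0.05, 0.01, 0.001):
    c = c_alpha(alpha)
    print(f"alpha={alpha}: c_alpha={c:.4f}, P(K > c_alpha)={kolmogorov_tail(c):.6f}")
```

For the choice $\alpha_n=1/n$ used in Section 5, $c_{\alpha_n}=\sqrt{(1/2)\ln(2n)}$ grows only logarithmically, so $\sqrt{n}/c_{\alpha_n}\to\infty$ and $\alpha_n\to 0$ slowly enough in the above sense.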
Theorem 1.
Let $({X}_{0},{Y}_{0}),\dots ,({X}_{n},{Y}_{n})$ be a sample produced by an identifiable sequence of random objects driven by a context tree model compatible with $(\overline{\tau},\overline{p})$ and $({\overline{Q}}_{w}:w\in \overline{\tau})$, and let ${\widehat{\tau}}_{n}$ be the context tree selected from the sample by Algorithm 1 with $L\ge \ell (\overline{\tau})$ and threshold ${c}_{{\alpha}_{n}}=\sqrt{(1/2)\ln(2/{\alpha}_{n})}$, where ${\alpha}_{n}\in (0,1)$. If $(\overline{\tau},\overline{p})$ is irreducible and $({\overline{Q}}_{w}:w\in \overline{\tau})$ is continuous and satisfies the Carleman condition, then for ${\alpha}_{n}\to 0$ slowly enough as $n\to \infty$,
$$\underset{n\to \infty}{lim}P({\widehat{\tau}}_{n}\ne \overline{\tau})=0.$$
The proof of Theorem 1 is presented in Section 6.
5. Simulation Study
In this section, we illustrate the performance of Algorithm 1 by applying it to a toy example. We consider the context tree model compatible with $(\overline{\tau},\overline{p})$ described in Section 2 with $\epsilon =0.2$. For each $w\in \overline{\tau}$, we assume that ${\overline{Q}}^{w}$ is the law of a diffusion process with drift coefficient ${f}_{w}={({f}_{w}(t))}_{0\le t\le 1}$ and constant diffusion coefficient. For simplicity, all diffusion coefficients are assumed to be 1. For each context $w\in \overline{\tau}$, we assume ${f}_{w}=K{g}_{w}$, where K is a positive constant and ${g}_{w}$ is the density of a Gaussian random variable with mean ${\mu}_{w}$ and standard deviation ${\sigma}_{w}$, restricted to the interval $[0,1]$. In the simulation, we take $K=5$. The shapes of the functions ${f}_{w}$ and the corresponding values of ${\mu}_{w}$ and ${\sigma}_{w}$ are shown in Figure 3. One can check that the assumptions of Theorem 1 are satisfied by this toy example.
To implement Algorithm 1 numerically, we assume that all trajectories of the diffusion processes are observed at the equally spaced points $0={t}_{0}<{t}_{1}<\dots <{t}_{100}=1$, where ${t}_{i}=\frac{i}{100}$ for each $0\le i\le 100$. For each sample size $n=100,120,140,\dots ,1000,$ we estimate the fraction of times Algorithm 1, with ${\alpha}_{n}=1/n$ and $L=4$, correctly identifies the context tree $\overline{\tau}$, based on 100 random samples of the model of size $n.$ The results are reported in Figure 4.
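Responses of the kind used in this toy example can be simulated with an Euler–Maruyama scheme on the grid $t_i=i/100$, as sketched below. The initial value $Y(0)=0$ and the use of the Gaussian density on $[0,1]$ without renormalization are our assumptions, the function names are hypothetical, and the values of $\mu_w$ and $\sigma_w$ used in the paper are those of Figure 3, which are not reproduced here; any values passed to the function are placeholders.

```python
import math
import random


def gaussian_density(t, mu, sigma):
    """Density of N(mu, sigma^2) at t."""
    return math.exp(-0.5 * ((t - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def simulate_chunk(mu, sigma, rng, K=5.0, steps=100):
    """Euler-Maruyama path of dY = K g(t) dt + dB(t) on t_i = i / steps, 0 <= i <= steps.

    Assumptions: Y(0) = 0, unit diffusion coefficient, and g is the N(mu, sigma^2)
    density evaluated on [0, 1] without renormalization.
    """
    dt = 1.0 / steps
    y = [0.0]
    for i in range(steps):
        drift = K * gaussian_density(i * dt, mu, sigma)
        y.append(y[-1] + drift * dt + rng.gauss(0.0, math.sqrt(dt)))
    return y


# Placeholder parameters; the paper's values of mu_w and sigma_w are given in Figure 3.
chunk = simulate_chunk(mu=0.5, sigma=0.1, rng=random.Random(0))
print(len(chunk))  # 101
```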
6. Proof of Theorem 1
The proof of Theorem 1 is a direct consequence of Propositions 1 and 2 presented below.
Proposition 1.
Let $({X}_{0},{Y}_{0}),\dots ,({X}_{n},{Y}_{n})$ be a sample produced by a sequence of random objects driven by a context tree model compatible with $(\overline{\tau},\overline{p})$ and $({\overline{Q}}_{w}:w\in \overline{\tau})$ satisfying the assumptions of Theorem 1. Let $\alpha \in (0,1)$ and set ${c}_{\alpha}=\sqrt{(1/2)\ln(2/\alpha )}$. For any integer $L\ge \ell (\overline{\tau})$, context $w\in \overline{\tau}$, direction $h\in F\backslash \{0\}$, and strings $u,v\in {\cup}_{k=1}^{L-\ell (w)}{A}^{k}$ such that $w\preceq u$ and $w\preceq v$, it holds that
$$\lim_{n\to \infty}P({D}_{n}^{h}(({Y}_{1}^{(u)},\dots ,{Y}_{{N}_{n}(u)}^{(u)}),({Y}_{1}^{(v)},\dots ,{Y}_{{N}_{n}(v)}^{(v)}))>{c}_{\alpha})\le \alpha .$$
In particular, for any ${\alpha}_{n}\to 0$ as $n\to \infty $, we have
$$\underset{n\to \infty}{lim}P({D}_{n}^{h}(({Y}_{1}^{\left(u\right)},\dots ,{Y}_{{N}_{n}\left(u\right)}^{\left(u\right)}),({Y}_{1}^{\left(v\right)},\dots ,{Y}_{{N}_{n}\left(v\right)}^{\left(v\right)}))>{c}_{{\alpha}_{n}})=0.$$
Proof.
The irreducibility of $(\overline{\tau},\overline{p})$ implies that, P-a.s., both ${N}_{n}(u)$ and ${N}_{n}(v)$ tend to $+\infty$ as n diverges. Thus, Theorem 3.1(a) of [16] implies that the law of ${D}_{n}^{h}(({Y}_{1}^{(u)},\dots ,{Y}_{{N}_{n}(u)}^{(u)}),({Y}_{1}^{(v)},\dots ,{Y}_{{N}_{n}(v)}^{(v)}))$ is independent of the strings u and v, and also of the direction $h\in F\backslash \{0\}$. It also implies that ${D}_{n}^{h}(({Y}_{1}^{(u)},\dots ,{Y}_{{N}_{n}(u)}^{(u)}),({Y}_{1}^{(v)},\dots ,{Y}_{{N}_{n}(v)}^{(v)}))$ converges in distribution to $K={\sup}_{t\in [0,1]}|B(t)|$ as $n\to \infty$, where $B=(B(t):t\in [0,1])$ is a Brownian bridge. Since $P(K>{c}_{\alpha})\le 2{e}^{-2{c}_{\alpha}^{2}}=\alpha$ (the tail $P(K>c)=2{\sum}_{k\ge 1}{(-1)}^{k-1}{e}^{-2{k}^{2}{c}^{2}}$ is an alternating series with decreasing terms), the first part of the result follows.
By the first part of the proof, for any fixed $\alpha \in (0,1)$, we have that for all n large enough,
$$P\Big(D_n^h\big((Y_1^{(u)},\dots,Y_{N_n(u)}^{(u)}),(Y_1^{(v)},\dots,Y_{N_n(v)}^{(v)})\big)>c_\alpha\Big)\le2\alpha.$$
Thus, given $\epsilon>0$, take $\alpha\in(0,1)$ such that $2\alpha<\epsilon$ to deduce that, for all $n$ large enough,
$$P\Big(D_n^h\big((Y_1^{(u)},\dots,Y_{N_n(u)}^{(u)}),(Y_1^{(v)},\dots,Y_{N_n(v)}^{(v)})\big)>c_\alpha\Big)<\epsilon.$$
Since $c_{\alpha_n}\to\infty$ as $n\to\infty$, we have $c_{\alpha_n}>c_\alpha$ for all $n$ sufficiently large. For such $n$, the probability in the second limit of the statement is bounded by the left-hand side of the previous inequality, hence smaller than $\epsilon$. Since $\epsilon>0$ is arbitrary, the result follows. □
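To see numerically how the threshold $c_\alpha$ relates to the limiting law of $K$, the following Python sketch evaluates $c_\alpha$ and a truncation of the Kolmogorov series for $P(K>c)$. The function names are ours and hypothetical; the sketch only illustrates that $P(K>c_\alpha)$ falls just below $\alpha$ and that $c_{\alpha}$ grows without bound as $\alpha\to0$.

```python
import math


def c_alpha(alpha: float) -> float:
    """Threshold c_alpha = sqrt((1/2) * ln(2/alpha)) from Proposition 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    return math.sqrt(0.5 * math.log(2.0 / alpha))


def kolmogorov_sf(c: float, terms: int = 100) -> float:
    """P(K > c) for K = sup_{t in [0,1]} |B(t)|, B a Brownian bridge.

    Truncates the series 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 c^2); with 100 terms
    it is accurate for c >= 0.5, which covers c_alpha for every alpha in (0, 1).
    """
    return 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * c * c)
                     for k in range(1, terms + 1))


if __name__ == "__main__":
    for alpha in (0.5, 0.1, 0.05, 0.01, 0.001):
        c = c_alpha(alpha)
        # The tail probability stays slightly below alpha, while c grows as alpha -> 0.
        print(f"alpha={alpha:<6} c_alpha={c:.4f} P(K > c_alpha)={kolmogorov_sf(c):.8f}")
```

For $\alpha=0.05$ this gives $c_\alpha\approx1.3581$, the familiar 5% critical value of the Kolmogorov distribution.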
Proposition 2 reads as follows.
Proposition 2.
Let $(X_0,Y_0),\dots,(X_n,Y_n)$ be a sample produced by an identifiable sequence of random objects driven by a context tree model compatible with $(\overline{\tau},\overline{p})$ and $(\overline{Q}_w : w\in\overline{\tau})$, and let $\widehat{\tau}_n$ satisfy the assumptions of Theorem 1. Let $\alpha\in(0,1)$ and define $c_\alpha=\sqrt{\tfrac{1}{2}\ln(2/\alpha)}$. For any string $s\in A^{*}$ such that $B_{\overline{\tau}}(s)$ is a terminal branch, there exists a pair $w,w'\in B_{\overline{\tau}}(s)$ such that, for almost every realization of a Brownian motion $W=(W(t):t\in[0,T])$ on $[0,T]$,
$$\lim_{n\to\infty}P\Big(D_n^W\big((Y_1^{(w)},\dots,Y_{N_n(w)}^{(w)}),(Y_1^{(w')},\dots,Y_{N_n(w')}^{(w')})\big)\le c_{\alpha_n}\Big)=0,$$
whenever $\alpha_n\to0$ slowly enough as $n\to\infty$.
Proof.
Since the sequence of random objects $(X_0,Y_0),(X_1,Y_1),\dots$ is identifiable and $B_{\overline{\tau}}(s)$ is a terminal branch, there exists a pair $w,w'\in B_{\overline{\tau}}(s)$ whose associated distributions $\overline{Q}^{w}$ and $\overline{Q}^{w'}$ on $F$ are different, and both $\overline{Q}^{w}$ and $\overline{Q}^{w'}$ satisfy the Carleman condition. For each $n\ge1$, define
$$N_n:=\sqrt{\frac{N_n(w)\,N_n(w')}{N_n(w)+N_n(w')}},$$
if $\min\{N_n(w),N_n(w')\}\ge1$; otherwise, set $N_n=0$. The irreducibility of $(\overline{\tau},\overline{p})$ implies that $n^{-1/2}N_n\to C$ as $n\to\infty$, $P$-a.s., where $C$ is a positive constant depending on $w$ and $w'$.
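The constant $C$ can be written explicitly under the assumption that the relative frequencies $N_n(w)/n$ and $N_n(w')/n$ converge $P$-a.s. to positive limits, denoted here $\pi(w)$ and $\pi(w')$ (this notation is introduced only for illustration). Dividing numerator and denominator by $n$ gives
$$\frac{N_n}{\sqrt{n}}=\sqrt{\frac{\big(N_n(w)/n\big)\big(N_n(w')/n\big)}{N_n(w)/n+N_n(w')/n}}\;\longrightarrow\;\sqrt{\frac{\pi(w)\,\pi(w')}{\pi(w)+\pi(w')}}=C\quad P\text{-a.s.}$$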
Now, Theorem 3.1(b) of [16] implies that, for almost every realization of a Brownian motion $W$ on $F$,
$$\liminf_{n\to\infty}\mathrm{KS}\big(\widehat{Q}_n^{W,w},\widehat{Q}_n^{W,w'}\big)>0\quad P\text{-a.s.}$$
Since $D_n^W\big((Y_1^{(w)},\dots,Y_{N_n(w)}^{(w)}),(Y_1^{(w')},\dots,Y_{N_n(w')}^{(w')})\big)/c_{\alpha_n}=\frac{\sqrt{n}}{c_{\alpha_n}}\cdot\frac{N_n}{\sqrt{n}}\cdot\mathrm{KS}\big(\widehat{Q}_n^{W,w},\widehat{Q}_n^{W,w'}\big)$, where the second factor converges to $C>0$ and the third has a positive limit inferior $P$-a.s., it suffices that $\alpha_n\to0$ slowly enough that $c_{\alpha_n}/\sqrt{n}\to0$. The ratio then diverges to $+\infty$ $P$-a.s., so the probability that it is at most $1$ tends to $0$, and the result follows. □
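The identity used in this proof suggests how the statistic can be computed in practice. The following Python sketch assumes that $D_n^h$ is $N_n$ times the Kolmogorov-Smirnov distance between the empirical laws of the projections $\langle Y_i,h\rangle$ in $L^2[0,T]$, that each chunk is sampled on a uniform grid, and that the inner product is approximated by a Riemann sum. These are our assumptions, not a reproduction of the authors' implementation; all function names, the grid, and the synthetic data are hypothetical.

```python
import numpy as np


def brownian_path(grid: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Sample one Brownian motion path W on `grid`, with W(grid[0]) = 0."""
    dt = np.diff(grid, prepend=grid[0])  # first increment has length 0
    return np.cumsum(rng.standard_normal(grid.size) * np.sqrt(dt))


def project(chunks: np.ndarray, h: np.ndarray, dt: float) -> np.ndarray:
    """Riemann-sum approximation of <Y_i, h> = int Y_i(t) h(t) dt, one value per row."""
    return (chunks @ h) * dt


def ks_two_sample(x: np.ndarray, y: np.ndarray) -> float:
    """Kolmogorov-Smirnov distance sup_t |F_x(t) - F_y(t)| between two empirical CDFs."""
    xs, ys = np.sort(x), np.sort(y)
    pooled = np.concatenate([xs, ys])  # the sup is attained at a data point
    fx = np.searchsorted(xs, pooled, side="right") / xs.size
    fy = np.searchsorted(ys, pooled, side="right") / ys.size
    return float(np.max(np.abs(fx - fy)))


def projected_ks_statistic(chunks_u: np.ndarray, chunks_v: np.ndarray,
                           h: np.ndarray, dt: float) -> float:
    """N_n * KS distance between projected samples; 0 if either sample is empty."""
    n_u, n_v = len(chunks_u), len(chunks_v)
    if min(n_u, n_v) == 0:
        return 0.0
    scale = np.sqrt(n_u * n_v / (n_u + n_v))  # the factor N_n of the proof
    return float(scale * ks_two_sample(project(chunks_u, h, dt),
                                       project(chunks_v, h, dt)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grid = np.linspace(0.0, 1.0, 101)  # T = 1 and 101 grid points: hypothetical
    dt = grid[1] - grid[0]
    W = brownian_path(grid, rng)
    # Synthetic chunks whose mean curves differ by a constant shift (hypothetical data).
    for size in (25, 100, 400):
        y_w = np.sin(2 * np.pi * grid) + 0.5 * rng.standard_normal((size, grid.size))
        y_w2 = np.sin(2 * np.pi * grid) + 0.3 + 0.5 * rng.standard_normal((size, grid.size))
        print(size, projected_ks_statistic(y_w, y_w2, W, dt))
```

When the two underlying distributions differ, the printed values should typically grow roughly like the square root of the sample size, mirroring the factor $\sqrt{n}$ in the proof.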
Proof of Theorem 1.
Let ${C}_{\overline{\tau}}$ be the set of contexts belonging to a terminal branch of $\overline{\tau}$. Define also the following events
$$U_n=\bigcup_{w\in C_{\overline{\tau}}}\{\Delta_n^W(\mathrm{suf}(w))\le c_{\alpha_n}\}\quad\text{and}\quad O_n=\bigcup_{w\in\overline{\tau}}\ \bigcup_{\substack{s\succ w:\\ \ell(s)\le L}}\{\Delta_n^W(s)>c_{\alpha_n}\}.$$
It follows from the definition of Algorithm 1 that $\{\widehat{\tau}_n\ne\overline{\tau}\}\subseteq U_n\cup O_n$, and hence
$$P(\widehat{\tau}_n\ne\overline{\tau})\le P(U_n)+P(O_n).$$
Thus, it is enough to prove that for any $\epsilon>0$ there exists $n_0=n_0(\epsilon)$ such that $P(U_n)\le\epsilon/2$ and $P(O_n)\le\epsilon/2$ for all $n\ge n_0$.
By the union bound, we see that
$$P(U_n)\le\sum_{w\in C_{\overline{\tau}}}P\big(\Delta_n^W(\mathrm{suf}(w))\le c_{\alpha_n}\big).$$
The sequence of random objects $(X_0,Y_0),(X_1,Y_1),\dots$ is identifiable, and for each $w\in C_{\overline{\tau}}$ the set $B_{\overline{\tau}}(\mathrm{suf}(w))$ is a terminal branch. Hence, for each $w\in C_{\overline{\tau}}$ there exists $w'\in B_{\overline{\tau}}(\mathrm{suf}(w))$ such that the associated distributions $\overline{Q}^{w}$ and $\overline{Q}^{w'}$ on $F$ are different and both satisfy the Carleman condition. Since
$$\{\Delta_n^W(\mathrm{suf}(w))\le c_{\alpha_n}\}\subset\Big\{D_n^W\big((Y_1^{(w)},\dots,Y_{N_n(w)}^{(w)}),(Y_1^{(w')},\dots,Y_{N_n(w')}^{(w')})\big)\le c_{\alpha_n}\Big\},$$
and $\overline{\tau}$ is finite, Proposition 2 implies that $P(U_n)\to0$ as $n\to\infty$ if $\alpha_n\to0$ slowly enough. As a consequence, for any $\epsilon>0$ there exists $n_0=n_0(\epsilon)$ such that $P(U_n)\le\epsilon/2$ for all $n\ge n_0$.
Using the union bound again, we have
$$P(O_n)\le\sum_{w\in\overline{\tau}}\ \sum_{\substack{s\succ w:\\ \ell(s)\le L}}P\big(\Delta_n^W(s)>c_{\alpha_n}\big).$$
Since $\overline{\tau}$ and the alphabet $A$ are finite, and
$$\{\Delta_n^W(s)>c_{\alpha_n}\}=\bigcup_{a,b\in A}\Big\{D_n^W\big((Y_1^{(as)},\dots,Y_{N_n(as)}^{(as)}),(Y_1^{(bs)},\dots,Y_{N_n(bs)}^{(bs)})\big)>c_{\alpha_n}\Big\},$$
we deduce from Proposition 1 and the inequality in Equation (6) that, for any $\epsilon>0$, $P(O_n)\le\epsilon/2$ for all $n$ large enough. This concludes the proof of the theorem. □
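The identity for the event $\{\Delta_n^W(s)>c_{\alpha_n}\}$ used in this proof says that this event occurs exactly when some pair of one-symbol extensions $as$, $bs$ yields a pairwise statistic above the threshold, that is, when the maximum over pairs exceeds it. The Python sketch below evaluates that maximum for a given string $s$. It is not a reproduction of Algorithm 1 (defined earlier in the article), only of the event on which the proof relies. The pairwise statistic is passed in as a function, `mean_gap` is a toy stand-in rather than $D_n^W$, and treating an $s$ with fewer than two observed extensions as having $\Delta=0$ is our own assumption.

```python
from itertools import combinations
from typing import Callable, Dict, Sequence, TypeVar

Sample = TypeVar("Sample")


def delta(s: str, alphabet: Sequence[str], samples: Dict[str, Sample],
          stat: Callable[[Sample, Sample], float]) -> float:
    """Max of stat(samples[a + s], samples[b + s]) over pairs of distinct symbols.

    Extensions a + s that never occurred (absent from `samples`) are skipped;
    pairs with a == b are omitted because a two-sample statistic of identical
    samples is zero.
    """
    children = [a + s for a in alphabet if a + s in samples]
    values = [stat(samples[x], samples[y]) for x, y in combinations(children, 2)]
    return max(values, default=0.0)


def keep_children(s: str, alphabet: Sequence[str], samples: Dict[str, Sample],
                  stat: Callable[[Sample, Sample], float], c: float) -> bool:
    """True exactly when the event {Delta(s) > c} occurs."""
    return delta(s, alphabet, samples, stat) > c


def mean_gap(x, y) -> float:
    """Toy two-sample statistic (hypothetical stand-in for D_n^W)."""
    return abs(sum(x) / len(x) - sum(y) / len(y))


if __name__ == "__main__":
    # Hypothetical one-dimensional "chunks" for the extensions 01, 11 and 21 of s = "1".
    samples = {"01": [0.0, 0.1], "11": [0.1, 0.0], "21": [2.0, 2.2]}
    print(delta("1", "012", samples, mean_gap))
    print(keep_children("1", "012", samples, mean_gap, c=1.0))
```

Passing the statistic as a parameter keeps the max-over-pairs logic separate from how $D_n^W$ itself is computed.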
Author Contributions
All authors contributed equally to this work.
Funding
This work is part of USP project "Mathematics, computation, language and the brain", FAPESP project "Research, Innovation and Dissemination Center for Neuromathematics" (grant 2013/07699-0), and project "Plasticity in the brain after a brachial plexus lesion" (FAPERJ grants E-26/010002902/2014, E-26/010002474/2016). A. Galves and C.D. Vargas are partially supported by CNPq fellowships (grants 311719/2016-3 and 309560/2017-9, respectively). C.D. Vargas is also partially supported by a FAPERJ fellowship (CNE 202.785/2018). A. Duarte was fully and successively supported by CNPq and FAPESP fellowships (grants 201696/2015-0 and 2016/17791-9). G. Ost was fully and successively supported by CNPq and FAPESP fellowships (grants 201572/2015-0 and 2016/17789-4).
Conflicts of Interest
The authors declare no conflict of interest.
References
[1] Von Helmholtz, H. Handbuch der Physiologischen Optik; translated by the Optical Society of America in 1924 from the third German edition (1910) as Treatise on Physiological Optics, Volume III; Leopold Voss: Leipzig, Germany, 1867; Volume 3.
[2] Garrido, M.I.; Sahani, M.; Dolan, R.J. Outlier responses reflect sensitivity to statistical structure in the human brain. PLOS Comput. Biol. 2013, 9.
[3] Wacongne, C.; Changeux, J.; Dehaene, S. A Neuronal Model of Predictive Coding Accounting for the Mismatch Negativity. J. Neurosci. 2012, 32, 3665–3678.
[4] Rissanen, J. A Universal Data Compression System. IEEE Trans. Inf. Theory 1983, 29, 656–664.
[5] Bühlmann, P.; Wyner, A.J. Variable length Markov chains. Ann. Stat. 1999, 27, 480–513.
[6] Csiszár, I.; Talata, Z. Context tree estimation for not necessarily finite memory processes, via BIC and MDL. IEEE Trans. Inf. Theory 2006, 52, 1007–1016.
[7] Leonardi, F.G. A generalization of the PST algorithm: modeling the sparse nature of protein sequences. Bioinformatics 2006, 22, 1302–1307.
[8] Galves, A.; Löcherbach, E. Stochastic chains with memory of variable length. TICSP Ser. 2008, 38, 117–133.
[9] Garivier, A.; Leonardi, F. Context tree selection: A unifying view. Stoch. Processes Appl. 2011, 121, 2488–2506.
[10] Gallo, S. Chains with unbounded variable length memory: perfect simulation and a visible regeneration scheme. Adv. Appl. Probab. 2011, 43, 735–759.
[11] Galves, A.; Galves, C.; García, J.E.; Garcia, N.L.; Leonardi, F. Context tree selection and linguistic rhythm retrieval from written texts. Ann. Appl. Stat. 2012, 6, 186–209.
[12] Galves, A.; Garivier, A.; Gassiat, E. Joint Estimation of Intersecting Context Tree Models. Scand. J. Stat. 2013, 40, 344–362.
[13] Belloni, A.; Oliveira, R.I. Approximate group context tree. Ann. Stat. 2017, 45, 355–385.
[14] Fernández, R.; Galves, A. Markov approximations of chains of infinite order. Bull. Braz. Math. Soc. 2002, 33, 295–306.
[15] Duarte, D.; Galves, A.; Garcia, N.L. Markov approximation and consistent estimation of unbounded probabilistic suffix trees. Bull. Braz. Math. Soc. 2006, 37, 581–592.
[16] Cuesta-Albertos, J.A.; Fraiman, R.; Ransford, T. Random projections and goodness-of-fit tests in infinite-dimensional spaces. Bull. Braz. Math. Soc. New Ser. 2006, 37, 477–501.
Figure 2.
EEG signals recorded from four electrodes. The sequence of stimuli is indicated in the top horizontal line. Vertical lines indicate the beginning of the successive auditory units. The distance between two successive vertical lines is $T>0$. Solid vertical lines indicate the successive occurrence times of the string $\mathbf{u}$. The first yellow strip corresponds to the chunk $Y_n^{(a\mathbf{u})}$ associated with the string $a\mathbf{u}$. The second yellow strip corresponds to the chunk $Y_n^{(b\mathbf{u})}$ associated with the string $b\mathbf{u}$.
Figure 3.
Functions $g_w$ and the corresponding values of $\mu_w$ and $\sigma_w$ for $w\in\tau=\{2,11,21,01,00,10,20\}$: (top) function $g_2$; (middle) functions $g_{21}$, $g_{11}$ and $g_{01}$; (bottom) functions $g_{20}$, $g_{10}$ and $g_{00}$.
Figure 4.
Proportion of correct identification of the context tree $\overline{\tau}=\{2,01,11,21,20,10,00\}$ by applying Algorithm 1 to simulated data with sample sizes $n=100,120,140,\dots ,1000.$ For sample sizes larger than 200, the proportion of correct identification is at least $95\%$.
| Context $w$ | $p(0\mid w)$ | $p(1\mid w)$ | $p(2\mid w)$ |
|---|---|---|---|
| 2 | $\epsilon$ | $1-\epsilon$ | 0 |
| 21 | $\epsilon$ | $1-\epsilon$ | 0 |
| 20 | $\epsilon$ | $1-\epsilon$ | 0 |
| 11 | 0 | 0 | 1 |
| 10 | 0 | 0 | 1 |
| 01 | 0 | 0 | 1 |
| 00 | 0 | 0 | 1 |
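The table gives the transition probabilities $p(\cdot\mid w)$ of the context tree model $\overline{\tau}=\{2,01,11,21,20,10,00\}$ as a function of $\epsilon$. The following minimal Python sketch simulates a stimulus sequence from it. It assumes that contexts are written with the most recent symbol on the right (consistent with Figure 3 grouping $g_{21}$, $g_{11}$, $g_{01}$ under the common last symbol 1), starts from the hypothetical past "2", and uses $\epsilon=0.2$ purely for illustration; function names are hypothetical.

```python
import random
from typing import Dict

Table = Dict[str, Dict[str, float]]


def transition_table(eps: float) -> Table:
    """p(. | w) for each context w; the most recent symbol is the rightmost one."""
    after_2 = {"0": eps, "1": 1.0 - eps, "2": 0.0}  # rows 2, 21 and 20
    to_2 = {"0": 0.0, "1": 0.0, "2": 1.0}           # rows 11, 10, 01 and 00
    return {"2": after_2, "21": after_2, "20": after_2,
            "11": to_2, "10": to_2, "01": to_2, "00": to_2}


def context_of(past: str, table: Table) -> str:
    """Return the unique context w in the table that is a suffix of `past`."""
    matches = [w for w in table if past.endswith(w)]
    if len(matches) != 1:
        raise ValueError(f"no unique context for past {past!r}")
    return matches[0]


def simulate(n: int, eps: float, seed: int = 0, past: str = "2") -> str:
    """Generate n symbols of the chain, starting from the given past."""
    rng = random.Random(seed)
    table = transition_table(eps)
    out = []
    for _ in range(n):
        probs = table[context_of(past, table)]
        symbols = list(probs)
        x = rng.choices(symbols, weights=[probs[a] for a in symbols])[0]
        out.append(x)
        past = (past + x)[-2:]  # contexts have length at most 2
    return "".join(out)


if __name__ == "__main__":
    print(simulate(30, eps=0.2))  # eps = 0.2 is a hypothetical value
```

Starting from the past "2", every third symbol of the output is a 2, and each of the other symbols is 1 with probability $1-\epsilon$ and 0 with probability $\epsilon$.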
© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).