Retrieving a context tree from EEG data

It has been repeatedly conjectured that the brain retrieves statistical regularities from stimuli, separating their structural features from noise. Here we present a new statistical approach to address this conjecture. This approach is based on a new class of stochastic processes driven by context tree models. It is associated with a new experimental protocol in which structured auditory sequences are presented to volunteers while electroencephalographic signals are recorded from their scalp. A statistical model selection procedure for functional data is presented to analyze the electrophysiological signals, and this procedure is proved to be consistent. Applied to samples of electrophysiological trajectories collected during the presentation of structured auditory stimuli, it produces results supporting the conjecture that the brain effectively identifies the context tree characterizing the source.

1. Introduction. This paper can be summarized as follows.
• We present a new statistical approach to address the conjecture that the brain operates as a statistician, assigning models to external stimuli.
• This is done through the introduction of a new class of stochastic processes driven by context tree models.
• This approach leads to a new experimental protocol to test the conjecture.
• In order to analyze the electrophysiological signals collected using our experimental protocol, we introduce a new statistical model selection procedure for functional data.
• We also prove that this procedure is consistent. This is exemplified with an application to simulated data reproducing some of the features of the experimental data.
• Applied to samples of electrophysiological trajectories collected by our team, this new statistical selection procedure for functional data produces results which support the conjecture that the brain effectively identifies the context tree characterizing the source.
Early evidence that the brain performs probabilistic modeling comes mostly from behavioral experiments (see, among others, Wolpert, Doya and Kawato (2003) for a review). There is also evidence that the brain employs predictive coding through the identification of markers of prediction and deviance detection (Friston (2005), Garrido et al. (2009), Wacongne et al. (2011) and Wacongne, Changeux and Dehaene (2012)). However, to the best of our knowledge, none of these papers succeeded in presenting a new statistical framework in which model selection was rigorously employed to test the conjecture that the brain assigns probabilistic models to samples of stimuli. This is precisely what we do in the present paper.
Our approach is based on the introduction of a new class of stochastic processes driven by context tree models. Using this new class of stochastic processes, it is possible to establish a formal relationship between structured sequences of random stimuli and the stochastic process associated with the brain's processing of the stimuli.
This new framework suggests a new experimental protocol which can be summarized as follows. A volunteer is exposed to sequences of auditory stimuli. The sequences are generated step by step by a random source. Electroencephalographic (EEG) signals are recorded during the exposure to the auditory sequence of stimuli. The conjecture is that the volunteers' brain automatically identifies the context tree characterizing the source. If this is the case, a signature of the structure of the source should be encoded in the brain activity. The question is whether this signature can be identified in the EEG data recorded during the experiment. This is the challenge faced here.
Considering the electroencephalogram as the realization of a stochastic process means that we face a problem of statistical model selection with functional data, which is in general a difficult and unsolved issue. However, the stochastic-process-driven-by-a-context-tree-model approach adopted here makes the question tractable. This is done through the introduction of a new model selection procedure for functional data driven by a context tree model. This procedure is proved to be consistent. The effective application of the procedure to EEG data uses the projective method introduced in Cuesta-Albertos, Fraiman and Ransford (2006).
Let us briefly describe the structure of the random source generating the auditory stimuli. This source is defined by an algorithm which sets the probability of occurrence of each successive unit. The algorithm has two components.
• The first component describes a partition of the set of all possible sequences of past units. This partition is defined using only the final past units. The number of units used to define the partition is variable and changes as a function of the past itself.
• The second component is a family of probability measures on the set of auditory units, indexed by the elements of the partition. In other terms, the probability of occurrence of the next unit depends on the element of the partition to which the past up to that point belongs.
Random sources of this type were introduced by Rissanen (1983) under the name of context tree models (CTM). The reason for this name is the following. Partitions of the past as described above can be represented by a rooted and labeled tree. In this tree, each element of the partition corresponds to a leaf. Rissanen called each element of the partition, each leaf, a context.
Introduced by Rissanen (1983) as a universal system for data compression, context tree models became known in the statistics community through Bühlmann and Wyner (1999), in which they appear under the name of variable length Markov chains (VLMC). Context tree models are stochastic chains with memory of variable length, which is the name adopted in the review paper by Galves and Löcherbach (2008), to which we refer the reader for a self-contained presentation of the field.

This article is organized as follows. In Section 2 we introduce the notation and basic definitions, including the notion of context tree model, presented in Subsection 2.1, and the notion of stochastic process driven by a context tree model, presented in Subsection 2.3. The random sources used in our experimental protocol are examples of context tree models; they are presented in Subsection 2.2. Our new procedure for statistical model selection is presented in Section 3, together with Theorem 3.1 on the consistency of the model selection procedure. The case study with data obtained using our experimental protocol is presented in Section 4. In particular, in Subsection 4.3 we state Theorem 4.1 on the consistency of the statistical model selection using the projective method. A simulation study is presented in Section 5. The proofs of Theorems 3.1 and 4.1 are presented in Appendices A.1 and A.2 respectively. Appendix A.3 contains the proof of a technical lemma needed in Subsection 4.4.
2. Stochastic processes driven by context tree models.
2.1. Context tree models. Let $A$ be a finite alphabet. Given two integers $m, n \in \mathbb{Z}$ with $m \le n$, the string $(u_m, \ldots, u_n)$ of symbols in $A$ is often denoted by $u_m^n$; its length is $\ell(u_m^n) = n - m + 1$. The empty string is denoted by $\emptyset$ and its length is $\ell(\emptyset) = 0$. Given two strings $u$ and $v$ of elements of $A$, we denote by $uv$ the string in $A^{\ell(u)+\ell(v)}$ obtained by the concatenation of $u$ and $v$. By definition, $u\emptyset = \emptyset u = u$ for any string $u$. The string $u$ is said to be a suffix of $v$ if there exists a string $s$ satisfying $v = su$. This relation will be denoted by $u \preceq v$. When $v \neq u$ we say that $u$ is a proper suffix of $v$ and write $u \prec v$. Hereafter, the set of all finite strings of symbols in $A$ is denoted by $A^* := \bigcup_{k=1}^{\infty} A^k$.
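To fix ideas, the suffix relation can be checked mechanically. The following is a minimal sketch, under the assumption that strings are represented as Python tuples of symbols of $A$ (a representation introduced here only for illustration).

```python
def is_suffix(u, v):
    """True when u is a suffix of v, i.e. v = su for some string s."""
    return len(u) <= len(v) and v[len(v) - len(u):] == u

def is_proper_suffix(u, v):
    """True when u is a suffix of v and u differs from v (proper suffix)."""
    return u != v and is_suffix(u, v)

# Example: (1, 1) is a proper suffix of (2, 1, 1); concatenation uv is u + v.
assert is_proper_suffix((1, 1), (2, 1, 1))
assert (2,) + (1, 1) == (2, 1, 1)
```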
Definition 1. A finite subset $\tau$ of $A^*$ is a context tree if it satisfies the following conditions:
1. Suffix property. No $w \in \tau$ has a proper suffix $u \in \tau$; that is, there is no pair $u, w \in \tau$ with $u \prec w$.
2. Irreducibility. No string belonging to $\tau$ can be replaced by a proper suffix without violating the suffix property.
The set $\tau$ can easily be identified with the set of leaves of a rooted tree with a finite set of labeled branches. The elements of $\tau$ will always be denoted by $w$.
Definition 2. The height of the context tree $\tau$ is defined as $\ell(\tau) = \max\{\ell(w) : w \in \tau\}$.

Given a context tree $\tau$, let $p = \{p(\cdot \mid w) : w \in \tau\}$ be a family of probability measures on $A$ indexed by the elements of $\tau$.
Definition 3. The pair (τ, p) will be called a probabilistic context tree on A. Each element of τ will be called a context.
Definition 4. Let $\tau$ and $\tau'$ be two context trees. We say that $\tau$ is smaller than $\tau'$, and write $\tau \preceq \tau'$, if for every $w \in \tau$ there exists $w' \in \tau'$ such that $w \preceq w'$. If $\tau \preceq \tau'$ and $\tau \neq \tau'$ we write $\tau \prec \tau'$ and say that $\tau$ is strictly smaller than $\tau'$.
Definition 5. Let $(\tau, p)$ be a probabilistic context tree on $A$. A time-homogeneous irreducible stochastic chain $(X_n)_{n \in \mathbb{Z}}$ taking values in $A$ is called a context tree model compatible with $(\tau, p)$ if:
1. For each integer $n \ge \ell(\tau)$, each symbol $a \in A$ and any finite string $x_{-n}^{-1} \in A^n$,
(2.1)   $P(X_0 = a \mid X_{-n}^{-1} = x_{-n}^{-1}) = p(a \mid c_\tau(x_{-n}^{-1})),$
where $c_\tau(x_{-n}^{-1})$ is the only context in $\tau$ which is a suffix of $x_{-n}^{-1}$.
2. No proper suffix of $c_\tau(x_{-n}^{-1})$ satisfies (2.1).
Context tree models will also be called stochastic chains with memory of variable length.
2.2. The random sources used in our experimental protocol. We can now formally define the two random sources producing the auditory stimuli in our experimental protocol. Both sources are examples of context tree models generating random sequences of strong beats, weak beats and silent units. We use the symbols 0, 1 and 2 to represent respectively silent units, weak beats and strong beats, so that the alphabet of the two sources is the set $A = \{0, 1, 2\}$. The sources can be described as follows.
(i) Consider a deterministic sequence $(s_n)_{n \in \mathbb{Z}}$, either the period-3 sequence $\ldots 2\,1\,1\,2\,1\,1\,2\,1\,1\,2 \ldots$ or a second deterministic sequence of period 4.
(ii) Fix $\varepsilon \in (0, 1)$, the omission probability of the weak beats. For each $n \in \mathbb{Z}$ define
(2.2)   $X_n = 2$ if $s_n = 2$; if $s_n = 1$ then, independently of everything else, $X_n = 1$ with probability $1 - \varepsilon$ and $X_n = 0$ with probability $\varepsilon$.
The stochastic chain derived from the first deterministic sequence, with period 3, will be called the Ternary random source. The stochastic chain derived from the second deterministic sequence, with period 4, will be called the Quaternary random source.
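As a small illustration, the sampler below simulates a source of this type under rule (2.2). The skeleton argument is one period of the deterministic sequence; [2, 1, 1] gives the Ternary source, while the Quaternary source uses the period-4 skeleton specified in the paper (not reproduced in this excerpt, so any value passed for it here is an assumption).

```python
import random

def sample_source(n, skeleton, eps=0.2, seed=None):
    """Sample n auditory units (0 = silent, 1 = weak beat, 2 = strong beat).

    skeleton : one period of the deterministic sequence (s_n),
               e.g. [2, 1, 1] for the Ternary source.
    eps      : omission probability of the weak beats, as in (2.2).
    """
    rng = random.Random(seed)
    out = []
    for k in range(n):
        s = skeleton[k % len(skeleton)]
        # each weak beat is independently replaced by a silent unit
        # with probability eps; strong beats are never omitted
        out.append(0 if s == 1 and rng.random() < eps else s)
    return out

# Example: twelve units of a Ternary sample with omission probability 0.2.
print(sample_source(12, [2, 1, 1], eps=0.2, seed=1))
```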
It is an easy exercise to check that the Quaternary and Ternary random sources are context tree models, with context trees given respectively in (2.3). The graphical representation of these trees is given in Figure 1 and the associated families of transition probabilities are presented in Table 1.

2.3. Stochastic processes driven by context tree models. In the experimental protocol described in Section 1, we record a segment of EEG for each single acoustic beat or silent unit generated by the random source. The conjecture is that the distribution of each EEG chunk depends on the context associated to the sequence of acoustic units generated up to the present time step. This suggests the following definition.

Definition 6. Let $A$ be a finite alphabet, $(\tau, p)$ a probabilistic context tree on $A$, $(F, \mathcal{F})$ a measurable space and $(Q_w : w \in \tau)$ a family of probability measures on $(F, \mathcal{F})$. The bivariate stochastic chain $(X_n, Y_n)_{n \in \mathbb{Z}}$ taking values in $A \times F$ is a stochastic process driven by a context tree model compatible with $(\tau, p)$ and $(Q_w : w \in \tau)$ if the following conditions are satisfied.
1. $(X_n)_{n \in \mathbb{Z}}$ is a stochastic chain with memory of variable length compatible with $(\tau, p)$.
2. For any integers $m \le n$, any string $x_{m-\ell(\tau)+1}^{n} \in A^{n-m+\ell(\tau)}$ and any sequence $J_m, \ldots, J_n$ of $\mathcal{F}$-measurable sets,
$P\big(Y_k \in J_k,\, m \le k \le n \mid X_{m-\ell(\tau)+1}^{n} = x_{m-\ell(\tau)+1}^{n}\big) = \prod_{k=m}^{n} Q_{c_\tau(x_{m-\ell(\tau)+1}^{k})}(J_k).$
The process $(X_n)_{n \in \mathbb{Z}}$ will be called the source chain and $(Y_n)_{n \in \mathbb{Z}}$ will be called the response chain.
Example 1. Our experimental protocol can be presented in the framework of stochastic processes driven by context tree models. This is done as follows.
• The source chain $(X_n)_{n \in \mathbb{Z}}$ represents the sequence of units of the auditory stimuli. This chain takes values in the alphabet $\{0, 1, 2\}$, where 1 (respectively 2) indicates that the auditory unit is a weak (respectively strong) beat and 0 indicates a silent unit. The chain $(X_n)_{n \in \mathbb{Z}}$ represents the sequence of stimuli produced by either the Ternary or the Quaternary random source, whose context trees are defined in (2.3) and graphically represented in Figure 1.
• The response chain $(Y_n)_{n \in \mathbb{Z}}$ is constituted by the successive chunks of EEG data, each chunk corresponding to an auditory unit. This means that each $Y_n$ is a real function $Y_n = (Y_n(t), t \in [0, T])$, where $T$ is the time distance between the onsets of two consecutive auditory stimuli. A minimal simulation sketch of such a driven process is given right after this example.
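As announced above, here is a minimal simulation sketch of a stochastic process driven by a context tree model. The toy context tree, the mean levels and the white-noise form of each $Q_w$ are illustrative assumptions; they mimic the structure of Definition 6, not the actual EEG data.

```python
import numpy as np

def sample_driven_process(x, q_means, grid_len=113, seed=None):
    """Draw one response chunk per symbol of the source sample x.

    q_means maps each context w (a tuple) to a mean level; each Q_w is
    modelled, purely for illustration, as white Gaussian noise around
    that context-dependent mean, discretized on grid_len points.
    """
    rng = np.random.default_rng(seed)
    height = max(len(w) for w in q_means)

    def context(past):
        # longest-suffix lookup: the unique w that is a suffix of the past
        for k in range(min(height, len(past)), 0, -1):
            if tuple(past[-k:]) in q_means:
                return tuple(past[-k:])
        raise ValueError("past shorter than every candidate context")

    ys = []
    for k in range(height - 1, len(x)):
        w = context(x[:k + 1])  # context of the units up to time k
        ys.append(q_means[w] + rng.standard_normal(grid_len))
    return ys

# Toy tree whose contexts are the single symbols, with arbitrary mean levels.
chunks = sample_driven_process([2, 1, 1, 2, 0, 1],
                               {(0,): 0.0, (1,): 1.0, (2,): 2.0}, seed=0)
```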
The goal of the experimental protocol sketched in Section 1 is to produce empirical evidence supporting the conjecture that the brain does statistical model selection. In our case, this means checking whether it is possible to retrieve the structure of the source producing the auditory stimuli from a statistical analysis of the EEG data. Using the formalism of stochastic processes driven by context tree models described above, this question can be formulated as follows. Is it possible to recover the context tree $\tau$ defining the random source generating the chain $(X_n)_{n \in \mathbb{Z}}$ from the observable sample $Y_1, \ldots, Y_n$? The statistical framework to address this question is presented in the next section.

3. Statistical model selection.
In this section we address the problem of statistical model selection in the class of stochastic processes driven by context tree models. More precisely, let $(F, \mathcal{F})$ be a measurable space, $(\bar\tau, \bar p)$ a probabilistic context tree on a finite alphabet $A$, and $(\bar Q_w : w \in \bar\tau)$ a family of probability measures on $(F, \mathcal{F})$. Finally, let $(X_1, Y_1), \ldots, (X_n, Y_n)$, with $X_k \in A$ and $Y_k \in F$ for $1 \le k \le n$, be a sample produced by a stochastic process driven by the context tree model compatible with $(\bar\tau, \bar p)$ and $(\bar Q_w : w \in \bar\tau)$. Our task is to present a statistical procedure to select a context tree from the sample $(X_1, Y_1), \ldots, (X_n, Y_n)$. Before presenting our statistical selection procedure we need two more definitions.
Definition 7. Let $\tau$ be a context tree and $s$ a finite string of symbols in $A$. We define the branch in $\tau$ induced by $s$ as the set $B_\tau(s) = \{w \in \tau : w = as \text{ for some } a \in A\}$.
Given a sample $(X_1, \ldots, X_n)$ of symbols in $A$ and a finite string $u \in A^*$, we write $N_n(u)$ to denote the number of occurrences of the string $u$ in the sample $(X_1, \ldots, X_n)$, that is,
$N_n(u) = \sum_{t=\ell(u)}^{n} \mathbf{1}\{X_{t-\ell(u)+1}^{t} = u\}.$

Definition 8. Given integers $n > L \ge 1$, an admissible context tree of maximal height $L$ for the sample $(X_1, \ldots, X_n)$ of symbols in $A$ is any context tree $\tau$ with $\ell(\tau) \le L$ such that $N_n(w) \ge 1$ for every $w \in \tau$ and every string $u \in A^*$ with $\ell(u) = L$ and $N_n(u) \ge 1$ has a suffix belonging to $\tau$.

For any pair of integers $n > L \ge 1$ and any string $u \in A^*$ with $\ell(u) \le L$, the set of indexes belonging to $\{L, \ldots, n\}$ at which the string $u$ appears in the sample $(X_1, \ldots, X_n)$ is denoted by $I_n(u)$, that is,
$I_n(u) = \{t \in \{L, \ldots, n\} : X_{t-\ell(u)+1}^{t} = u\},$
and $(Y^{(u)}_1, \ldots, Y^{(u)}_{N_n(u)})$ denotes the subsample of $Y_1^n$ induced by the string $u$, namely the responses $Y_t$ with $t \in I_n(u)$.

Finally, let $D$ be a measurable function $D : \bigcup_{m_1, m_2 \ge 1} F^{m_1} \times F^{m_2} \to \mathbb{R}_+$ and $(c_n)_{n \ge 1}$ a sequence of positive real numbers satisfying, for any pair of strings $u, v \in A^*$ with $\max\{\ell(u), \ell(v)\} \le L$, the following conditions.
C.1 If there exists $w \in \bar\tau$ such that $w \preceq u$ and $w \preceq v$, it holds that
$\lim_{n \to \infty} P\big(D\big((Y^{(u)}_1, \ldots, Y^{(u)}_{N_n(u)}), (Y^{(v)}_1, \ldots, Y^{(v)}_{N_n(v)})\big) > c_n\big) = 0.$
C.2 If there exist $w, w' \in \bar\tau$ with $w \preceq u$, $w' \preceq v$ and $\bar Q_w \neq \bar Q_{w'}$, it holds that
$\lim_{n \to \infty} P\big(D\big((Y^{(u)}_1, \ldots, Y^{(u)}_{N_n(u)}), (Y^{(v)}_1, \ldots, Y^{(v)}_{N_n(v)})\big) \le c_n\big) = 0.$

Our selection procedure can now be described as follows. For given integers $1 \le L < n$, let $T_n$ be the largest admissible context tree of maximal height $L$ for the sample $(X_1, \ldots, X_n)$; largest means that if $\tau$ is any other admissible context tree of maximal height $L$ for the sample $X_1^n$, then $\tau \preceq T_n$. For any string $s \in A^*$ of maximal length such that $|B_{T_n}(s)| \ge 2$, we test the null hypothesis $H_0^{(s)}$ that the subsamples of responses induced by the strings of $B_{T_n}(s)$ are identically distributed, using the test statistic
$\Delta_n(s) = \max_{as,\, bs \in B_{T_n}(s)} D\big((Y^{(as)}_1, \ldots, Y^{(as)}_{N_n(as)}), (Y^{(bs)}_1, \ldots, Y^{(bs)}_{N_n(bs)})\big).$
We reject the null hypothesis $H_0^{(s)}$ when $\Delta_n(s) > c_n$. When $H_0^{(s)}$ is not rejected, we prune the branch $B_{T_n}(s)$ in $T_n$ and set as a new candidate tree
$T_n \leftarrow \big(T_n \setminus B_{T_n}(s)\big) \cup \{s\}.$
On the other hand, if the null hypothesis $H_0^{(s)}$ is rejected, the branch $B_{T_n}(s)$ is kept and the null hypothesis is not tested any longer for any string $u \in A^*$ such that $u \preceq s$.
At each pruning step, always take the largest string $s \in A^*$ which has not been tested yet. This pruning procedure is repeated until no more pruning is performed. We denote by $\hat\tau_n$ the final context tree obtained by this pruning procedure. The formal description of the above procedure is provided as pseudo code in Algorithm 1.
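The pseudo code of Algorithm 1 is not reproduced in this excerpt; the following Python sketch mirrors the pruning loop as described above, with the statistic D, the threshold c_n and the construction of the initial admissible tree left as simplified placeholders (in particular, the initial tree is reduced here to the set of all length-L strings occurring in the sample).

```python
def select_tree(x, y, D, c_n, L):
    """Sketch of the pruning procedure of Section 3.

    x : list of source symbols; y : list of responses, y[t] paired with x[t];
    D : statistic comparing two subsamples of responses; c_n : threshold.
    """
    def is_suffix(u, v):
        return len(u) <= len(v) and v[len(v) - len(u):] == u

    def subsample(u):
        # responses observed at the occurrences of the string u
        k = len(u)
        return [y[t] for t in range(L - 1, len(x))
                if tuple(x[t - k + 1:t + 1]) == u]

    tree = {tuple(x[t - L + 1:t + 1]) for t in range(L - 1, len(x))}
    kept = set()  # strings whose branch was kept: their suffixes are frozen

    while True:
        # longest not-yet-frozen string s inducing a branch of size >= 2
        candidates = sorted(
            {w[1:] for w in tree
             if sum(v[1:] == w[1:] for v in tree) >= 2
             and not any(is_suffix(w[1:], frozen) for frozen in kept)},
            key=len, reverse=True)
        if not candidates:
            return tree
        s = candidates[0]
        branch = [w for w in tree if w[1:] == s]
        delta = max(D(subsample(u), subsample(v))
                    for u in branch for v in branch if u != v)
        if delta > c_n:
            kept.add(s)                         # reject H0: keep the branch
        else:
            tree = (tree - set(branch)) | {s}   # prune: replace branch by s
```

A toy choice of D, for instance the absolute difference between subsample means, already makes the sketch runnable; Section 4 replaces it by the projective Kolmogorov-Smirnov statistic.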
The consistency of Algorithm 1 is the content of our first theorem.
Theorem 3.1. Let $(X_1^n, Y_1^n)$ be a sample produced by a stochastic process driven by a context tree model compatible with $(\bar\tau, \bar p)$ and $(\bar Q_w : w \in \bar\tau)$, and let $\hat\tau_n$ be the context tree selected from the sample by Algorithm 1 with $L \ge \ell(\bar\tau)$. If the statistic $D$ and the threshold sequence $(c_n)_{n \ge 1}$ satisfy conditions C.1 and C.2, then $\lim_{n \to \infty} P(\hat\tau_n \neq \bar\tau) = 0$.
4. Case study: retrieving context trees from EEG data. A total of 20 healthy volunteers (9 female; mean age 30, standard deviation 6.8; 18 right-handed) were evaluated. None of the volunteers had any neurological pathology. The volunteers signed an informed consent form after the nature of the study and the protocol to be performed had been completely understood. The experimental protocol was approved by the local ethical research committee (process number 22047613.2.0000.5261).
4.1. Experimental protocol. The experimental protocol is the one briefly sketched in Section 1. The experiment consisted of exposing volunteers to sequences of auditory stimuli defined as strong beats, weak beats and silent units, indicated respectively by the symbols 2, 1 and 0. The sequences of auditory stimuli were produced by the context tree models called the Ternary and Quaternary random sources in Subsection 2.2. Here the omission probability (see equation (2.2)) took the value 0.2. A third random source was employed to separate the samples generated by the Ternary and Quaternary random sources. This third random source produced sequences of independent auditory stimuli taking the values 0, 1 and 2, each with probability 1/3; the corresponding context tree reduces to the single root with no branches. The goal of introducing this unpredictable sequence was to shuffle the cards, so to speak, before the volunteer was exposed to the next sample.
The volunteer was exposed to two 12 min blocks of samples generated by each of the random sources. The blocks were separated from each other by a period ranging from 5 to 10 min, during which data collection was interrupted. Each sample was a concatenation of three 1 min sequences of auditory units generated independently by the same random source. Each sequence of auditory units was separated from the next one by a 15 s silent interval.
All volunteers were exposed to two different orderings: for half of the volunteers the first block was Ternary, Independent, Quaternary and the second block was Quaternary, Independent, Ternary; the inverse ordering was used for the other half, to balance possible order effects.
Presentation software (Presentation Mixer as a Primary Buffer and a Sound card: SoundMAX HD Audio) was used to play the auditory sequences through a headset. The loudness of the stimuli was individually adjusted before the start of the experiment by presenting the strong beat and asking the volunteer to tune it to a comfortable level (range: 0.1-0.3 dB).

4.2. Data acquisition and pre-processing. Electroencephalographic recording was performed by means of a 128-channel system (Geodesic HydroCel GSN 128 EGI, Electrical Geodesic Inc.) during the exposure to the auditory sequences of stimuli. The electrode cap, previously immersed in saline solution (KCl), was placed on the volunteer's scalp. Volunteers were instructed to close their eyes and remain quiet throughout the experiment.
The EEG signal was amplified with a nominal gain of 20. The acquisition was performed at a sampling frequency of 250 Hz. During acquisition the signal was filtered in the analog domain (Butterworth first-order band-pass filter, 0.1-200 Hz; Geodesic EEG System 300, Electrical Geodesic Inc.). The electrode positioned on the vertex (Cz) was used as reference.
The data were preprocessed offline using EEGLAB (Delorme and Makeig (2004)) running in the MATLAB environment (MathWorks, Natick, MA, version R2012a). Signals were filtered with a Butterworth fourth-order band-pass filter of 1-30 Hz. Artifacts exceeding ±100 µV were removed. The data were then segmented into events of 450 ms, each one indexed by the corresponding auditory unit. Finally, baseline correction was performed using the signal collected during the 50 ms before each event start (see Figure 2).

4.3. Statistical framework. In this section we discuss how to apply the theoretical framework presented in Section 3 to the case where the driven process $(Y_n)_{n \in \mathbb{Z}}$ is a sequence of chunks of EEG signals produced by our experimental protocol, as described in Example 1.
The framework is the following. The stochastic process driven by a context tree model $(X_n, Y_n)_{n \in \mathbb{Z}}$ is such that $Y_n \in L^2([0, T])$ for some $T > 0$. To use Theorem 3.1 we need to define a statistic for samples of elements of $L^2([0, T])$ and a threshold sequence $(c_n)_{n \ge 1}$ satisfying conditions C.1 and C.2. This will be done using the projective method of Cuesta-Albertos, Fraiman and Ransford (2006), as follows.
Let $1 \le L < n$ be fixed integers and $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample produced by a stochastic process driven by a context tree model with $Y_k \in L^2([0, T])$, for $1 \le k \le n$ and some $T > 0$. Given a string $u \in A^*$ with $\ell(u) \le L$ and an element $h \in L^2([0, T])$, define the empirical distribution associated to the string $u$ and the direction $h$ as
$\hat Q^{u,h}_n(t) = \frac{1}{N_n(u)} \sum_{j=1}^{N_n(u)} \mathbf{1}\{\langle Y^{(u)}_j, h\rangle \le t\}, \quad t \in \mathbb{R},$
where, for any pair of functions $f, h \in L^2([0, T])$,
$\langle f, h \rangle = \int_0^T f(t)\, h(t)\, dt.$
For a given pair of finite strings $u$ and $v$ with $\max\{\ell(u), \ell(v)\} \le L$ and $h \in L^2([0, T])$, the Kolmogorov-Smirnov distance between the empirical distributions $\hat Q^{u,h}_n$ and $\hat Q^{v,h}_n$ is denoted by
$\|\hat Q^{u,h}_n - \hat Q^{v,h}_n\|_{KS} = \sup_{t \in \mathbb{R}} \big|\hat Q^{u,h}_n(t) - \hat Q^{v,h}_n(t)\big|.$
Let us then define, for any pair of strings $u, v \in A^*$ with $\max\{\ell(u), \ell(v)\} \le L$ and $h \in L^2([0, T])$,
(4.1)   $D^h\big((Y^{(u)}_1, \ldots, Y^{(u)}_{N_n(u)}), (Y^{(v)}_1, \ldots, Y^{(v)}_{N_n(v)})\big) = \left(\frac{N_n(u)\, N_n(v)}{N_n(u) + N_n(v)}\right)^{1/2} \|\hat Q^{u,h}_n - \hat Q^{v,h}_n\|_{KS}.$
Under some assumptions on the family of probability measures generating the response process $(Y_n)_{n \in \mathbb{Z}}$, it will be proved that if the direction $h$ appearing in $D^h$ is chosen as a realization of a Brownian motion $W = (W(t) : 0 \le t \le T)$, then the statistic $D^W$ satisfies conditions C.1 and C.2 for almost all realizations of $W$. These are the contents of Propositions 1 and 2 in Appendix A.2.
For this reason, for any string $s \in A^*$ and any realization $W$ of a Brownian motion, we define
$\Delta^W_n(s) = \max_{as,\, bs \in B_{T_n}(s)} D^W\big((Y^{(as)}_1, \ldots, Y^{(as)}_{N_n(as)}), (Y^{(bs)}_1, \ldots, Y^{(bs)}_{N_n(bs)})\big).$
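A minimal numerical sketch of the statistic $D^W$ follows, assuming the chunks are discretized on a common time grid and that (4.1) carries the two-sample Kolmogorov-Smirnov scaling reconstructed above.

```python
import numpy as np

def brownian_motion(grid, rng):
    """One realization of W on the grid, with W(grid[0]) = 0."""
    dt = np.diff(grid, prepend=grid[0])
    return np.cumsum(np.sqrt(dt) * rng.standard_normal(len(grid)))

def D_W(ys_u, ys_v, w, grid):
    """Project each discretized chunk on the direction W with the L2 inner
    product, then return the scaled Kolmogorov-Smirnov distance between
    the two empirical distributions of the projections."""
    proj_u = np.sort([np.trapz(yy * w, grid) for yy in ys_u])
    proj_v = np.sort([np.trapz(yy * w, grid) for yy in ys_v])
    m, n = len(proj_u), len(proj_v)
    pooled = np.concatenate([proj_u, proj_v])
    F_u = np.searchsorted(proj_u, pooled, side="right") / m
    F_v = np.searchsorted(proj_v, pooled, side="right") / n
    ks = np.max(np.abs(F_u - F_v))
    return np.sqrt(m * n / (m + n)) * ks

# Delta_n^W(s) is then the maximum of D_W over all pairs of subsamples
# induced by the strings of the branch B(s).
```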
To state the consistency theorem in this framework we need the following assumption.
Assumption 1. The family $(\bar Q_w : w \in \bar\tau)$ of probability measures on $(L^2([0, T]), \mathcal{B}(L^2([0, T])))$ is non-atomic and, for any string $s \in A^*$ such that $|B_{\bar\tau}(s)| \ge 2$, there exist $w, w' \in B_{\bar\tau}(s)$ such that $\bar Q_w \neq \bar Q_{w'}$ and either $\bar Q_w$ or $\bar Q_{w'}$ satisfies the Carleman condition.

Here a probability measure $P$ on $(L^2([0, T]), \mathcal{B}(L^2([0, T])))$ is called non-atomic if, for any $h \in L^2([0, T])$, the distribution of the projection $\langle \cdot, h \rangle$ under $P$ is non-atomic. Given a finite set of indexes $V$ and a family $(P_i : i \in V)$ of probability measures on $(L^2([0, T]), \mathcal{B}(L^2([0, T])))$, we say that $(P_i : i \in V)$ is non-atomic if, for all $i \in V$, the probability measure $P_i$ is non-atomic.
Theorem 4.1. Let $(X_1, Y_1), \ldots, (X_n, Y_n)$ be a sample produced by a stochastic process driven by a context tree model compatible with $(\bar\tau, \bar p)$ and $(\bar Q_w : w \in \bar\tau)$, and let $\hat\tau_n$ be the context tree selected from the sample by Algorithm 1 with $L \ge \ell(\bar\tau)$. If $(\bar Q_w : w \in \bar\tau)$ satisfies Assumption 1 and the threshold sequence $(c_n)_{n \ge 1}$ is such that $c_n \to \infty$ and $c_n n^{-1/2} \to 0$ as $n \to \infty$, then $\lim_{n \to \infty} P(\hat\tau_n \neq \bar\tau) = 0$.
The proof of Theorem 4.1 is postponed to Appendix A.2.

4.4. Data analysis.
In what follows, the application of the statistical procedure described in Subsection 4.3 is detailed. The analysed data set may be summarized as follows. Let $V = \{v_1, v_2, \ldots, v_{20}\}$ denote the set of the 20 volunteers under consideration, $E = \{e_1, e_2, \ldots, e_{18}\}$ the set of all electrodes belonging to the 10-20 system except the reference electrode $C_z$, and $S = \{\text{Ternary}, \text{Quaternary}\}$ the set of possible random sources. In the sequel, to avoid cumbersome notation, we describe the data set, as well as the application of the statistical procedure used to select a context tree from it, for a fixed volunteer $v \in V$ and random source $s \in S$.
The sample of the auditory sequence produced by the random source $s \in S$ and presented to the volunteer $v \in V$ is denoted by $X_1, \ldots, X_n$, where $n$ is the sample size. Since the time of exposure to each random source (which is the same for all volunteers) is 6 min and the length of the interval between two consecutive auditory units is 450 ms, the sample size is $n = 800$. The corresponding EEG recording restricted to the set of electrodes $E$ is denoted by $Y_1, \ldots, Y_n$, where $Y_k = (Y^e_k : e \in E)$, with $Y^e_k = (Y^e_k(t) : t \in [0, T])$ and $T = 450$ ms, is the vector of EEG chunks associated to the $k$-th auditory unit $X_k$, for $1 \le k \le n$. The sampling frequency of the EEG acquisition being 250 Hz and $T = 450$ ms, each EEG chunk consists of 113 values. Hence all the computations necessary to numerically implement our statistical procedure are done in discrete time, $t = 1, \ldots, 113$. Therefore, the sample data set is
(4.2)   $(X_1, Y_1), \ldots, (X_n, Y_n),$
where $n = 800$ and $Y_k \in \mathbb{R}^{113 \times 18}$ for all $1 \le k \le n$.

An extension of Algorithm 1, henceforth called Algorithm 2, is then employed to select a context tree from the sample in (4.2). Algorithm 2 works as follows. Fix an integer $N \ge 1$. For each electrode $e \in E$, apply Algorithm 1 to the data $(X_1, Y^e_1), \ldots, (X_n, Y^e_n)$, replacing steps 6, 7 and 8 by the following procedure.
• Generate $N$ independent realizations $W_1, \ldots, W_N$ of the Brownian motion on $[0, T]$ and compute $\hat\Delta_n(s) = \sum_{m=1}^{N} \mathbf{1}\{\Delta^{W_m}_n(s) > c_n\}$. If $\hat\Delta_n(s)$ is large, that is, $\hat\Delta_n(s) > C_1$ for some constant $C_1 > 0$, perform step 8 of Algorithm 1. Otherwise, perform steps 10 and 11 of Algorithm 1.
By using more than one random projection we increase the power of the test. We denote by $\hat\tau^e_n$ the context tree selected from the sample $(X_1, Y^e_1), \ldots, (X_n, Y^e_n)$ by applying this more powerful version of Algorithm 1. Finally, setting, for any $w \in A^*$, $Z_n(w) = \sum_{e \in E} \mathbf{1}\{w \in \hat\tau^e_n\}$, the context tree $\hat\tau_n$ assigned to the sample (4.2) is then defined as the tree formed by the strings $w \in A^*$ with $Z_n(w) > C_2$, for some constant $C_2 > 0$.
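A schematic sketch of the two voting stages of Algorithm 2 follows. The per-projection statistics $\Delta_n^{W_m}(s)$ are assumed to be computed as in the previous sketch, and the rule producing the final tree from $Z_n$ is the reconstructed one stated above.

```python
def branch_vote(deltas, c_n, C1):
    """First stage, one electrode: N projections vote on the branch B(s).
    deltas = [Delta_n^{W_1}(s), ..., Delta_n^{W_N}(s)]; the branch is kept
    (H0 rejected) when more than C1 projections exceed the threshold c_n."""
    return sum(d > c_n for d in deltas) > C1

def aggregate_trees(trees_per_electrode, C2):
    """Second stage: Z_n(w) counts the electrodes whose selected tree
    contains w; the final tree keeps the strings w with Z_n(w) > C2."""
    counts = {}
    for tree in trees_per_electrode:
        for w in tree:
            counts[w] = counts.get(w, 0) + 1
    return {w for w, z in counts.items() if z > C2}

# Example: a string selected by 12 of 18 electrode trees survives with C2 = 9.
final = aggregate_trees([{(2,)}] * 12 + [{(0, 2)}] * 6, C2=9)
```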
The threshold values $c_n$, $C_1$ and $C_2$ are chosen in such a way that the probability of overestimating $\bar\tau$ is bounded by a prescribed level, where $\bar\tau$ is the context tree defining the stochastic chain $(X_n)_{n \in \mathbb{Z}}$. The existence of such a triple is the content of Lemma 1, whose proof is postponed to Appendix A.3. The boundedness of the EEG signals (after the pre-processing step) implies that their underlying family of probability distributions satisfies the Carleman condition. Thus the consistency of Algorithm 2 follows immediately from Theorem 4.1.

4.5. Results. The results obtained from the analysis of the EEG recorded while the volunteers were exposed to the Quaternary and Ternary sources are summarized in Figures 3 and 4.
In the Quaternary case this summary shows that:
- 15 of the context trees correctly have 2 as a context;
- 19 of the context trees correctly do not have 1 as a context;
- 18 of the context trees correctly do not have 0 as a context;
- 12 of the context trees correctly satisfy the three features above.
This means that 12 of the 20 volunteers successfully identified the correct context tree truncated at height 2.
In the Ternary case the summary shows that:
- only 3 of the context trees correctly have 2 as a context;
- 16 of the context trees correctly do not have 1 as a context;
- 11 of the context trees correctly do not have 0 as a context.
A main reason for context misidentification in the data analysis could be the small number of EEG samples used in each prune-or-keep decision. For instance, in the case of the Quaternary source, 14 volunteers failed to identify 000 as a context and 7 volunteers incorrectly decided not to prune the branch 020. This is not surprising when we observe that, for each volunteer, the strings 000 and 020 appear at most 14 and 15 times respectively in the sample. In the Ternary source, the strings 020 and 002 appear at most 17 and 15 times, with an average number of occurrences per volunteer equal to 10.3 and 10.7 respectively.

Fig 3: Context tree summary for the Quaternary data. This context tree summarizes the twenty context trees obtained from the EEG data recorded while the volunteers were exposed to the Quaternary source. White nodes indicate the number of subjects who correctly identified the node as not being a context. Black nodes indicate the number of subjects who correctly identified the node as a context.

Fig 4: Context tree summary for the Ternary data. This context tree summarizes the twenty context trees obtained from the EEG data recorded while the volunteers were exposed to the Ternary source. White nodes indicate the number of subjects who correctly identified the node as not being a context. Black nodes indicate the number of subjects who correctly identified the node as a context.
In general, the results obtained in the Ternary case are less impressive than those obtained for the Quaternary case. This could be a consequence of the fact that in the Ternary case the two random units (which could take the values 0 or 1) occur successively, while in the Quaternary case the two random units are separated by deterministic units.
In general, the question of the importance of the specific features of the context trees used in our experimental setup deserves further discussion. This will be done in a forthcoming article.
5. Simulation study. We performed simulation studies for stochastic processes driven by the Quaternary and Ternary context tree models presented in Figure 1 and Table 1. Throughout this section, $\bar\tau$ is either the Quaternary or the Ternary context tree. For each of the two context tree models we consider two different possibilities, P1 and P2, to define the family of probability measures $(\bar Q_w : w \in \bar\tau)$.
If we choose a random function $h \in L^2([0, T])$ according to a probability measure $Q$ on $L^2([0, T])$ and then project $h$ on a Brownian motion $W = (W(t) : t \in [0, T])$, the resulting variable is, conditionally on $h$, normally distributed with mean 0 and variance given by
(5.1)   $V(h) = \int_0^T \int_0^T h(s)\, h(t)\, \min(s, t)\, ds\, dt.$
In particular, the elements of the set $V_w = \{V_1, \ldots, V_{N_n(w)}\}$, where $V_j = V(Y^{(w)}_j)$ for $1 \le j \le N_n(w)$ (recall the notation introduced in Section 3), are the variances of normal distributions which depend on the context $w$ through the distribution $\bar Q_w \circ V^{-1}$. Although the distribution $\bar Q_w$ is unknown, we can always compute the set of variances through formula (5.1) from our data set. This set of variances is used to define the two possibilities P1 and P2 considered in the simulation study. In this way, we incorporate some of the features of the experimental data into the simulated data.
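Assuming (5.1) is the Brownian-covariance double integral reconstructed above, the variances can be evaluated directly on the discrete grid; the sketch below includes a sanity check against the known value $\mathrm{Var}\big(\int_0^T W(t)\, dt\big) = T^3/3$.

```python
import numpy as np

def projection_variance(h, grid):
    """Numerical evaluation of (5.1): the double integral of
    h(s) h(t) min(s, t) ds dt, using the Brownian covariance
    E[W(s) W(t)] = min(s, t) on a discrete grid."""
    cov = np.minimum.outer(grid, grid)                 # min(s, t)
    inner = np.trapz(h[None, :] * cov, grid, axis=1)   # integrate over t
    return np.trapz(h * inner, grid)                   # then over s

# Sanity check: for h = 1 on [0, 1], <h, W> has variance 1/3.
grid = np.linspace(0.0, 1.0, 2001)
print(projection_variance(np.ones_like(grid), grid))  # ~0.3333
```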
For each context tree model, we chose an electrode of a specific volunteer. For either model, the electrode and volunteer chosen were the ones with the best performance in the statistical analysis. In the simulation associated to the Quaternary context tree model, EEG signals recorded from electrode E58 of volunteer V19 were used. In the Ternary case, electrode E92 of volunteer V05 was chosen. For each context $w \in \bar\tau$ we computed, using (5.1), the set of variances $V_w$ of the EEG signals associated to $w$.
The choices P1 and P2 defining the family of probability measures $(\bar Q_w : w \in \bar\tau)$ are as follows. In P1, for each context $w \in \bar\tau$, $\bar Q_w$ is a Gaussian distribution whose mean $\mu_w$ is the average value over the set $V_w$ and whose variance is $\sigma^2 = 10^{-4}$. The right (left) side of Table 2 shows the empirical averages of $V_w$, for each context $w \in \bar\tau$, in the Quaternary (Ternary) case. In P2, for each $w \in \bar\tau$, we take $\bar Q_w$ as the uniform distribution over the set $V_w$. All the algorithms are provided as supplementary material.

Table 2: Transition probabilities of the Quaternary (right) and Ternary (left) stochastic processes in the P1 protocol, where $\bar Q_w = N(\mu_w, 10^{-4})$ and $\mu_w$ is the empirical average of the set of variances of the EEG signals associated to each context.

We considered four choices of sample size $n$: 50 000, 100 000, 150 000 and 200 000, for each of the two scenarios P1 and P2. For each value of $n$ we simulated 100 samples. Then, for each sample, we selected a context tree according to the statistical model selection procedure described in Section 3, starting from the complete admissible context tree of maximal height 3.
Tables 3 and 4 report, on the right (left) side, the number of times, within the 100 simulated samples, that the Quaternary (Ternary) context tree was correctly selected by the algorithm under the possibilities P1 and P2 respectively. We also report the number of times the second most selected context tree was chosen by the algorithm. In the Quaternary case, in all simulation protocols, the second most selected context tree is the Quaternary context tree with the contexts {21, 01} replaced by the symbol 1. In the Ternary case, in all simulation protocols, the second most selected context tree is the one which has the symbol 0 as a context instead of {00, 10, 20}, and keeps all the other contexts of the Ternary context tree.

APPENDIX A: MATHEMATICAL PROOFS
A.1. Proof of Theorem 3.1.
Proof of Theorem 3.1. Define $C_{\bar\tau} = \{w \in \bar\tau : |B_{\bar\tau}(\mathrm{suf}(w))| \ge 2\}$, where $\mathrm{suf}(w) \in A^*$ is the string obtained from the string $w$ by removing its last symbol; if $w = w^{-1}_{-k}$ for some $k \ge 1$, then $\mathrm{suf}(w) = w^{-1}_{-(k-1)}$, with the convention that $w^{-1}_0 = \emptyset$. Define also the underestimation event $U_n = \{\hat\tau_n$ contains a proper suffix of some $w \in \bar\tau\}$ and the overestimation event $O_n = \{\hat\tau_n \neq \bar\tau\} \setminus U_n$. It follows from the definition of Algorithm 1 that $P(\hat\tau_n \neq \bar\tau) = P(U_n) + P(O_n)$. Thus, we need to prove that both $P(U_n)$ and $P(O_n)$ converge to 0 as $n \to \infty$.

We start by proving that $P(U_n) \to 0$ as $n \to \infty$. By the union bound, we see that
(A.1)   $P(U_n) \le \sum_{w \in C_{\bar\tau}} P\big(\Delta_n(\mathrm{suf}(w)) \le c_n\big).$
Since $|B_{\bar\tau}(\mathrm{suf}(w))| \ge 2$, there exists a pair $u, v \in B_{\bar\tau}(\mathrm{suf}(w))$ whose associated distributions $\bar Q_u$ and $\bar Q_v$ on $F$ are different. Observing that
$\Delta_n(\mathrm{suf}(w)) \ge D\big((Y^{(u)}_1, \ldots, Y^{(u)}_{N_n(u)}), (Y^{(v)}_1, \ldots, Y^{(v)}_{N_n(v)})\big),$
Condition C.2 then implies that $P(U_n) \to 0$ as $n \to \infty$.
To conclude the proof, it remains to show that $P(O_n) \to 0$ as $n \to \infty$. Using again the union bound, we have
(A.2)   $P(O_n) \le \sum_{s} P\big(\Delta_n(s) > c_n\big),$
where the sum is taken over the strings $s \in A^*$ with $\ell(s) < L$ having a suffix in $\bar\tau$. Observing that
$\{\Delta_n(s) > c_n\} \subseteq \bigcup_{as,\, bs \in B_{T_n}(s)} \big\{D\big((Y^{(as)}_1, \ldots, Y^{(as)}_{N_n(as)}), (Y^{(bs)}_1, \ldots, Y^{(bs)}_{N_n(bs)})\big) > c_n\big\},$
we deduce from Condition C.1 and inequality (A.2) that $P(O_n) \to 0$ as $n \to \infty$.
A.2. Proof of Theorem 4.1. Theorem 4.1 will follow easily from the two propositions below. The first proposition states that if the family $\{\bar Q_w : w \in \bar\tau\}$ is continuous and the threshold sequence $(c_n)_{n \ge 1}$ satisfies $c_n \to \infty$ as $n \to \infty$, then for each $h \in L^2([0, T])$ the statistic $D^h$ and $(c_n)_{n \ge 1}$ fulfill Condition C.1. More precisely:

Proposition 1. Let $\bar\tau$ be a context tree and $\{\bar Q_w : w \in \bar\tau\}$ be a family of continuous probability distributions on $(F, \mathcal{F})$. For any pair of strings $u, v \in A^*$ for which there exists $w \in \bar\tau$ such that $w \preceq u$ and $w \preceq v$, any $h \in F \setminus \{0\}$ and any sequence $(c_n)_{n \ge 1}$ such that $c_n \to \infty$ as $n \to \infty$, it holds that
$\lim_{n \to \infty} P\big(D^h\big((Y^{(u)}_1, \ldots, Y^{(u)}_{N_n(u)}), (Y^{(v)}_1, \ldots, Y^{(v)}_{N_n(v)})\big) > c_n\big) = 0.$

Proof. By recurrence of the source chain, the random variables $N_n(u)$ and $N_n(v)$ diverge as $n \to \infty$. Hence, Theorem 3.1(a) of Cuesta-Albertos, Fraiman and Ransford (2006) implies that the limiting distribution of the random variable $D^h\big((Y^{(u)}_1, \ldots, Y^{(u)}_{N_n(u)}), (Y^{(v)}_1, \ldots, Y^{(v)}_{N_n(v)})\big)$ does not depend on the strings $u$ and $v$, nor on the direction $h \in F \setminus \{0\}$. Moreover, for each $t > 0$, it holds that
$\lim_{n \to \infty} P\big(D^h\big((Y^{(u)}_1, \ldots, Y^{(u)}_{N_n(u)}), (Y^{(v)}_1, \ldots, Y^{(v)}_{N_n(v)})\big) > t\big) = G(t),$
where $G(t) = 2 \sum_{k=1}^{\infty} (-1)^{k+1} e^{-2k^2 t^2}$. Given $\varepsilon > 0$, let $t_0$ be a positive real number such that $G(t) < \varepsilon/2$ for all $t \ge t_0$. Such a $t_0$ exists since $t \mapsto G(t)$ is a non-increasing function with $G(t) \to 0$ as $t \to \infty$. Hence, for all $n$ sufficiently large so that $c_n \ge t_0$, we have, for all $n$ large enough,
$P\big(D^h\big((Y^{(u)}_1, \ldots, Y^{(u)}_{N_n(u)}), (Y^{(v)}_1, \ldots, Y^{(v)}_{N_n(v)})\big) > c_n\big) < \varepsilon,$
which concludes the proof.
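The tail function $G$ can be evaluated numerically, which is convenient for locating a $t_0$ with $G(t_0) < \varepsilon/2$; a small sketch follows (the series is truncated, which is harmless for $t$ bounded away from 0).

```python
import math

def G(t, terms=100):
    """Kolmogorov tail: G(t) = 2 * sum_{k>=1} (-1)^(k+1) exp(-2 k^2 t^2)."""
    return 2.0 * sum((-1) ** (k + 1) * math.exp(-2.0 * k * k * t * t)
                     for k in range(1, terms + 1))

def t0_for(eps, step=1e-3, t_start=0.5):
    """Smallest t >= t_start (up to step) with G(t) < eps / 2; G is
    non-increasing on (0, inf), so a forward scan suffices."""
    t = t_start
    while G(t) >= eps / 2:
        t += step
    return t

print(t0_for(0.05))  # ~1.48: beyond this point the Kolmogorov tail is < 0.025
```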
We will now show that under Assumption 1 and assuming that (c n ) n≥1 is such that c n → ∞ and c n n −1/2 → 0 as n → ∞, for almost all realizations of a Brownian motion W = (W (t) : t ∈ [0, T ]), the statistic D W and the threshold sequence (c n ) n≥1 also fulfill Condition C.2. This is the content of the second proposition.
Proposition 2. For any $(c_n)_{n \ge 1}$ satisfying $c_n \to \infty$ and $c_n n^{-1/2} \to 0$ as $n \to \infty$, and any family $\{\bar Q_w : w \in \bar\tau\}$ of probability distributions on $(L^2([0, T]), \mathcal{B}(L^2([0, T])))$ verifying Assumption 1, there exists a pair $w, w' \in \bar\tau$ such that, for almost all realizations of a Brownian motion $W = (W(t) : t \in [0, T])$,
$\lim_{n \to \infty} P\big(D^W\big((Y^{(w)}_1, \ldots, Y^{(w)}_{N_n(w)}), (Y^{(w')}_1, \ldots, Y^{(w')}_{N_n(w')})\big) \le c_n\big) = 0.$

Proof. For $h \in L^2([0, T])$, write the event $\big\{D^h\big((Y^{(w)}_1, \ldots, Y^{(w)}_{N_n(w)}), (Y^{(w')}_1, \ldots, Y^{(w')}_{N_n(w')})\big) \le c_n\big\}$ as $\{Z^h_n \ge 1\}$ for an appropriate random variable $Z^h_n$, so that the result will follow immediately once we prove that, for almost all realizations of a Brownian motion $W$ on $[0, T]$, $P$-almost surely $Z^W_n \to 0$ as $n \to \infty$.
By the strong law of large numbers for context tree models, we have $P$-almost surely
(A.4)   $\lim_{n \to \infty} \frac{n\,\big(N_n(w) + N_n(w')\big)}{N_n(w)\, N_n(w')} = \frac{p(w) + p(w')}{p(w)\, p(w')},$
where $p(w)$ denotes the stationary probability of the string $w$. The proof of Theorem 4.1 is now easy.
Proof of Theorem 4.1. Proceed as in the proof of Theorem 3.1: Proposition 1 implies that the right-hand side of (A.2) vanishes as $n \to \infty$, while the right-hand side of (A.1) vanishes as $n \to \infty$ thanks to Proposition 2.
A.3. Lemma 1. In what follows, for each $n \ge 1$, let $\hat\tau_n$ be the context tree selected by applying Algorithm 2 to the sample data $(X_1, Y_1), \ldots, (X_n, Y_n)$, where for each $1 \le m \le n$, $Y_m = (Y^e_m : e \in E)$ for some finite set of indexes $E$, and, for $e \in E$, the sample $(X_1, Y^e_1), \ldots, (X_n, Y^e_n)$ is produced by a stochastic process driven by a context tree model compatible with a probabilistic context tree $(\bar\tau, \bar p)$ (recall the setup presented in Subsection 4.4). Lemma 1 reads as follows.

Lemma 1. Given $\alpha \in (0, 1]$, the threshold values $c_n$, $C_1$ and $C_2$ of Algorithm 2 can be chosen in such a way that $P(\bar\tau \prec \hat\tau_n) \le \alpha$.
Proof. Given $\alpha_1 \in (0, 1]$, using properties of the Kolmogorov-Smirnov statistic we may find a threshold value $c_n$ such that, for any finite string $s \in A^*$ with $s \notin \bar\tau$ and any realization of a Brownian motion $W = (W(t) : t \in [0, T])$, it holds that $P(\Delta^W_n(s) > c_n) \le \alpha_1$. In the sequel we write $\hat\Delta_n(s) = \hat\Delta^e_n(s)$ to stress the dependence on $e \in E$. By the independence of $W_1, \ldots, W_N$, it follows that, conditionally on $Y^e_1, \ldots, Y^e_n$, the random variable $\hat\Delta^e_n(s)$ is stochastically dominated by a Binomial random variable $B(N, \alpha_1)$. From this it is straightforward to deduce that, for any given $\alpha_2 \in (0, 1]$, it is possible to find a constant $C_1$ such that, for any finite string $s \notin \bar\tau$, $P(\hat\Delta^e_n(s) > C_1) \le \alpha_2$.
Since for any $s \in A^*$ we have $\{s \in \hat\tau^e_n\} \subseteq \{\hat\Delta^e_n(\mathrm{suf}(s)) > C_1\}$ (recall the definition of $\hat\tau^e_n$ given in Subsection 4.4), it follows from the inequality above that, for any $s \notin \bar\tau$, $P(s \in \hat\tau^e_n) \le \alpha_2$. Notice that in Algorithm 2 the Brownian random directions chosen to compute $\hat\tau^e_n$ and $\hat\tau^{e'}_n$ are independent for $e \neq e' \in E$. We may deduce, similarly as above, that, conditionally on $(Y_1, \ldots, Y_n)$, the random variable $Z_n(s)$ with $s \notin \bar\tau$ is stochastically bounded by a Binomial random variable $B(|E|, \alpha_2)$. It then follows that, given any $\alpha \in (0, 1]$, we may choose $C_2$ in such a way that
$P(\bar\tau \prec \hat\tau_n) \le \sum_{w \in \bar\tau} \sum_{s \succ w :\, \ell(s) \le L} P(Z_n(s) > C_2) \le \alpha.$

SUPPLEMENTARY MATERIAL
Supplement A: Data and Scripts (http://www.e-publications.org). The directory SUPPLEMENT contains the subdirectories DATASET and SCRIPTS. The data files for each volunteer, as well as a Readme file containing the data description, are available in the directory DATASET. The SCRIPTS directory contains three subdirectories: PRE-PROCESSING, SIMULATION and PROJECTIVE-METHOD. The PRE-PROCESSING directory contains all the scripts concerning the pre-processing of the data set, briefly described in Subsection 4.2. The SIMULATION directory provides the scripts used in Section 5. The PROJECTIVE-METHOD directory contains all the scripts used in the data analysis. Each of these subdirectories also includes a Readme file explaining how to use the scripts.