cVEP Training Data Validation—Towards Optimal Training Set Composition from Multi-Day Data

This paper investigates the effects of the repetitive block-wise training process on the classification accuracy of a code-modulated visual evoked potential (cVEP)-based brain–computer interface (BCI). cVEP-based BCIs are popular thanks to the autocorrelation properties of their stimulation sequences. The cVEP stimuli are generated by a specific code pattern, usually an m-sequence, which is phase-shifted between the individual targets. Typically, cVEP classification requires a subject-specific template (individually created from the user's own pre-recorded EEG responses to the same stimulus target), which is compared to the incoming electroencephalography (EEG) data using correlation algorithms. The amount of collected user training data determines the accuracy of the system. In this offline study, previously recorded EEG data collected during an online experiment with 10 participants over multiple sessions were used. A template-matching target identification method, with models similar to those of task-related component analysis (TRCA), was used for target classification. The spatial filter was generated by canonical correlation analysis (CCA). When comparing the training models from one session with the same session's data (intra-session) and with the data from the other session (inter-session), the accuracies were (94.84%, 94.53%) and (76.67%, 77.34%), respectively. In order to identify the most reliable configuration for accurate classification, the training data blocks from different sessions (days) were combined interchangeably. In the best training set composition, the participants achieved an average accuracy of 82.66% for models based on only two training blocks from two different sessions. Similarly, at least five blocks were necessary for the average accuracy to exceed 90%. The presented method can further improve cVEP-based BCI performance by reusing previously recorded training data.


Introduction
The electroencephalography (EEG)-based brain–computer interface (BCI) is a technology that has the potential of replacing, enhancing, and improving human interaction with the surrounding environment, as well as enhancing digital life [1][2][3][4].
While the most popular and easy-to-implement paradigms are based on visually evoked potentials (e.g., the SSVEP-based BCI [5,6]), the code-modulated VEP (cVEP) has been highly researched in the recent past, with very promising results in terms of accuracy and information transfer rate (ITR) [7,8]. Spüler et al. used a one-class support vector machine and error-related potentials for an online adaptation of a cVEP-based BCI system [9]. In 2016, Nakanishi et al. utilised canonical correlation analysis (CCA) and datasets recorded on two different days to evaluate the performance of different session-to-session transfer learning approaches, in order to optimise SSVEP classification [10]. Nakanishi et al. also presented an enhanced ensemble task-related component analysis (TRCA) method, in which the spatial filters of all targets are combined [11]. The original study [15] recorded two training sessions per participant, which were spread apart by about 2 weeks. Here, we combine these recorded data into one pool, which is further analysed.

Participants
The data of the 10 participants of the original experiment [15] were utilised in this offline analysis. All subjects were recruited from Rhine-Waal University of Applied Sciences (two female), with a mean (SD) age of 25.4 (4.1) years. Written informed consent was obtained before the experiment; the experiment was approved by the ethical committee of the medical faculty of the University of Duisburg-Essen, Germany. Participants had normal or corrected-to-normal vision and little to no prior BCI experience. All participants received a financial reward for their participation.

Stimulus Presentation
The stimuli were presented on a 24.5-inch monitor (Acer Predator XB252Q) with a refresh rate of 240 Hz and a resolution of 1920 × 1080 pixels, which was positioned approximately 60 cm from the subject's eyes [15]. The stimuli consisted of 32 white square targets with sizes of 150 × 150 pixels (4.3 × 4.3 cm), which corresponds to a visual angle of 4.1°, presented over a black background; see Figure 1, which shows the graphical user interface from [15] with 32 targets. The distance between the targets was 75 pixels or 2.1 cm (vertically and horizontally). During the training session, the current target on which the participants needed to fixate their gaze was marked with a green frame.
The cVEP stimuli used the 63-bit m-sequence c1 = 101011001101110110100100111000101111001010001100001000001111110 [27], where '0' represented 'black' and '1' represented 'white', thus presenting at full contrast. The remaining stimuli ck, k = 2, ..., K, where K = 32, were generated by circularly left-shifting c1 by k · 2 bits. The update rate was set to 60 Hz; thus, the colour of the stimulus changed in accordance with the bit sequence on every fourth frame at the used refresh rate of 240 Hz. One cycle of this m-sequence was displayed for a duration of 1.05 s, which corresponds to the historical vertical refresh rate of 60 Hz used in many previous studies.
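The code layout above can be sketched in a few lines; the following is an illustrative Python snippet (not the authors' MATLAB code), assuming the first target uses the unshifted sequence and each further target is lagged by 2 bits:

```python
import numpy as np

# The 63-bit m-sequence c1 from the text ('1' = white frame, '0' = black frame).
C1 = "101011001101110110100100111000101111001010001100001000001111110"

def target_codes(c1: str = C1, n_targets: int = 32, lag: int = 2) -> np.ndarray:
    """Return an (n_targets x 63) array of stimulus codes, where row k
    is c1 circularly left-shifted by k * lag bits."""
    base = np.array([int(b) for b in c1], dtype=int)
    return np.stack([np.roll(base, -k * lag) for k in range(n_targets)])

codes = target_codes()
```

At the 240 Hz refresh rate, each bit is held for four frames, so one pass through the 63-bit code takes 63/60 = 1.05 s, matching the cycle length above.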
In order to achieve the most reliable stimulus presentation, the G-Sync® technology was disabled. Instead, the fixed refresh rate (240 Hz) was used, where a single frame is drawn within 4.17 milliseconds. Additionally, with the graphics card manufacturer's tool, the number of pre-rendered frames (in the graphics card memory) was set to 1 (the minimal available value). This reduced the internal delays, usually present in graphics card hardware, between rendering a frame and actually drawing it on the screen during the vertical screen refresh. Further details regarding the stimulus design can be found in [15].

Recordings
The recordings used for this study were the training phase of the original study, which consisted of a series of six repetitions (blocks). In every block, the participants had to fixate their gaze on each target, cued sequentially through all 32 targets, for 2.10 s, with a gaze-shift pause of 1.0 s between targets. Between the blocks, the participants had an opportunity for a short break (while the EEG was still connected and the participant remained seated), after which they continued by pressing the space key.

Hardware and Software for Data Analysis
The hardware used for the offline analysis was an MSI GT73 notebook equipped with an Intel processor (Intel Core i7-6820HK CPU @2.70 GHz) and 16 GB of RAM, running Windows 10 Education. The software utilised for the offline analysis was MATLAB® 2021b, and the ensemble TRCA methods from [11] were adopted.

CCA-Based Spatial Filter Design and Template Generation
Canonical correlation analysis is the most popular and widely used method for generating spatial filters; it finds a linear transformation that maximises the correlation between the recorded signal and a template signal, e.g., sine–cosine reference signals or averaged EEG template signals [28,29]. Typically, only the first canonical correlation and the corresponding weights (w) are used for classification and for the construction of filters [11].
Similar to our previous study, we combined the CCA method with the TRCA template construction [15,22,30]. Each training trial was stored in an m × n matrix, where m denotes the number of electrode channels (here, m = 16) and n denotes the number of sample points (here, two 1.05 s stimulus cycles, n = 1.05 · F_S · 2 = 1260, with a sampling rate of F_S = 600 Hz).
Given two multi-dimensional variables X ∈ R^{m1×n} and Y ∈ R^{m2×n}, CCA identifies weights w_X ∈ R^{m1} and w_Y ∈ R^{m2} that maximise the correlation, ρ, between the so-called canonical variates x = X^T w_X and y = Y^T w_Y by solving

\max_{w_X, w_Y} \rho(x, y) = \frac{w_X^T X Y^T w_Y}{\sqrt{(w_X^T X X^T w_X)\,(w_Y^T Y Y^T w_Y)}}.   (1)

The correlation value ρ that solves (1) is the canonical correlation. For most VEP-based BCI realisations, only the first canonical correlation weights (w_X, w_Y) are used for classification or for the spatial filter design.
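Equation (1) can be solved numerically via the standard eigenproblem reformulation of CCA. The following is an illustrative Python sketch (not the authors' MATLAB implementation; the function name is ours):

```python
import numpy as np

def cca_first(X: np.ndarray, Y: np.ndarray):
    """First canonical correlation rho and weights (w_X, w_Y) for
    X (m1 x n) and Y (m2 x n), obtained from the eigenproblem
    Cxx^-1 Cxy Cyy^-1 Cyx w_X = rho^2 w_X."""
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    Cxx, Cyy, Cxy = X @ X.T, Y @ Y.T, X @ Y.T
    M = np.linalg.solve(Cxx, Cxy) @ np.linalg.solve(Cyy, Cxy.T)
    vals, vecs = np.linalg.eig(M)
    i = int(np.argmax(vals.real))
    w_x = vecs[:, i].real
    w_y = np.linalg.solve(Cyy, Cxy.T) @ w_x  # w_Y, determined up to scale
    rho = float(np.sqrt(max(vals[i].real, 0.0)))
    return rho, w_x, w_y
```

The largest eigenvalue equals the squared first canonical correlation; scaling of the weight vectors does not affect ρ, since correlation is scale-invariant.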
The recorded data were segmented into single trials T_{i,j} ∈ R^{m×n}, i = 1, ..., K, j = 1, ..., n_b, where n_b = 6, which were used to generate CCA-based spatial filters w ∈ R^m. In total, 32 × 6 = 192 such trials were recorded.
For each target, individual templates X̄_i ∈ R^{m×n} and filters w_i were determined (i = 1, ..., K), where X̄_i represents the average of the n_b trials corresponding to the i-th class. In order to generate the spatial filters, two matrices were constructed from the training trials and the class average. When inserted into Equation (1), these two matrices yield the filter vectors w_i, i = 1, ..., K.
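One plausible realisation of this per-class construction is sketched below (illustrative Python, under our assumption about the two matrices: the concatenated single trials on one side and the correspondingly repeated class average on the other; the function name is ours):

```python
import numpy as np

def class_template_and_filter(trials: np.ndarray):
    """trials: (n_b, m, n) array of the single trials of one class.
    Returns the averaged template (m x n) and a spatial filter w (m,),
    the first CCA weights between the concatenated trials and the
    repeated class average."""
    n_b, m, n = trials.shape
    template = trials.mean(axis=0)              # class average
    X = np.hstack(list(trials))                 # m x (n_b * n) concatenated trials
    Y = np.tile(template, (1, n_b))             # average, repeated n_b times
    X = X - X.mean(axis=1, keepdims=True)
    Y = Y - Y.mean(axis=1, keepdims=True)
    M = np.linalg.solve(X @ X.T, X @ Y.T) @ np.linalg.solve(Y @ Y.T, Y @ X.T)
    vals, vecs = np.linalg.eig(M)
    w = vecs[:, int(np.argmax(vals.real))].real  # first canonical weights
    return template, w
```

With m = 16 and n = 1260 as above, this yields one 16 × 1260 template and one 16-element filter per class.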

Classification
In order to further improve the target classification, the ensemble spatial filter w introduced in the TRCA method [11], which concatenates the spatial filters of all targets (classes), and the filter bank-based analysis [31] were implemented. For the filter bank design, the sub-band weights a_k were calculated as

a_k = k^{-1.25} + 0.25,   k = 1, ..., 5,

which, after normalisation to unit sum, gives the following results: 0.386, 0.207, 0.156, 0.132, and 0.119. The lower cut-off frequencies for the k-th sub-band were 6, 14, 22, 30, and 38 Hz, and the upper cut-off frequency was 60 Hz for all sub-bands, filtered with a 4th-order Butterworth filter. In order to cancel out any phase response, forward and reverse (zero-phase) filtering was applied. The output command (C) was calculated from the weighted linear combination of the Pearson correlation coefficients ρ_k(X̄_i^T w_i, T_i^T w_i) over the K filter bank sub-bands with weights a_k, for every trial in each block. The information transfer rate was calculated utilising the regular ITR formula

\mathrm{ITR} = \frac{60}{T}\left(\log_2 N + P\log_2 P + (1-P)\log_2\frac{1-P}{N-1}\right),   (5)

where N is the number of targets, P is the accuracy, and T is the total time, including the gaze-shift (inter-trial) pause. An online ITR calculator can be found at https://bci-lab.hochschule-rhein-waal.de/en/itr.html (accessed on 31 December 2021).
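The sub-band weighting and the ITR formula can be checked with a few lines of Python (an illustrative sketch; the weight rule a_k = k^(-1.25) + 0.25 is the standard filter bank choice and reproduces the normalised values quoted above):

```python
import numpy as np

def filterbank_weights(K: int = 5) -> np.ndarray:
    """Sub-band weights a_k = k^(-1.25) + 0.25, normalised to unit sum."""
    a = np.arange(1, K + 1, dtype=float) ** (-1.25) + 0.25
    return a / a.sum()

def itr_bits_per_min(N: int, P: float, T: float) -> float:
    """Regular ITR formula: N targets, accuracy P, selection time T
    in seconds (including the gaze-shift pause)."""
    if P >= 1.0:
        bits = np.log2(N)  # the P*log2(P) terms vanish at P = 1
    else:
        bits = (np.log2(N) + P * np.log2(P)
                + (1 - P) * np.log2((1 - P) / (N - 1)))
    return 60.0 / T * float(bits)
```

For example, with N = 32 targets, perfect accuracy, and a selection time of T = 3.1 s (2.10 s stimulation plus 1.0 s gaze shift), the formula gives 60/3.1 · 5 ≈ 96.8 bits/min.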

Procedure
Both training sessions from the original study [15] were recorded on different days (mean distance 14.4 days), and each training session contained 6 training blocks. All blocks from both sessions were used as follows: the testing data blocks from the 1st session (D1) were numbered 1 to 6, and the testing data blocks from the 2nd session (D2) were numbered 7 to 12; see Figure 2.
For this study, the blocks were arranged in the sequence seq = 1, 7, 2, 8, 3, 9, 4, 10, 5, 11, 6, 12 for the leave-one-out, stepwise correlation analysis and for 5 different variant combinations, which are described as follows (see Figure 2):
• Variant 1 (V1): block bl12 was used for testing, and blocks bl1–bl11 in seq order were used for model building.
• Variant 2 (V2): blocks bl12 and bl6 were used for testing, and blocks bl1–bl11 in seq order without bl6 were used for model building.
• Variant 3 (V3): blocks bl12, bl6, and bl11 were used for testing, and blocks bl1–bl10 in seq order without bl6 were used for model building.
• Variant 4 (V4): blocks bl12, bl6, bl11, and bl5 were used for testing, and blocks bl1–bl10 in seq order without bl5 and bl6 were used for model building.
• Variant 5 (V5): blocks bl12, bl6, bl11, bl5, and bl10 were used for testing, and blocks bl1–bl9 in seq order without bl5 and bl6 were used for model building.
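In all five variants, the model-building set is simply seq with the tested blocks removed, which can be expressed compactly (illustrative Python; the names are ours):

```python
# Alternating testing sequence of the 12 blocks from both sessions.
SEQ = [1, 7, 2, 8, 3, 9, 4, 10, 5, 11, 6, 12]

# Testing blocks per variant V1-V5, as described above.
VARIANT_TESTS = {
    1: [12],
    2: [12, 6],
    3: [12, 6, 11],
    4: [12, 6, 11, 5],
    5: [12, 6, 11, 5, 10],
}

def variant_split(v: int):
    """Return (testing blocks, model-building blocks in seq order)."""
    test = VARIANT_TESTS[v]
    train = [b for b in SEQ if b not in test]
    return test, train
```

For instance, variant_split(3) keeps bl1–bl10 except bl6 for model building, matching Variant 3 above.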
Additionally, a cross-testing between session 1 (D1), containing bl1–bl6, and session 2 (D2), containing bl7–bl12, was performed, where models built from one session were used for testing the data from the other session; e.g., model M (training data, x-axis) from subject 1, session 1 was used to classify the testing data (y-axis) of all subjects from the same session (session 1, Figure 3, left side) and all subjects from the other session (session 2, Figure 4, left side). The other direction was also analysed; e.g., model M (training data, x-axis) from subject 1, session 2 was used to classify the testing data (y-axis) of all subjects from the same session (session 2, Figure 3, right side) and all subjects from the other session (session 1, Figure 4, right side). The accuracies of these results are presented in Figures 3 and 4, and the detailed diagonals are presented in Table 1.
The stepwise increasing concatenated models were built as follows (e.g., for testing bl12):
• Step 1: model bl1
• Step 2: model bl1 + bl7
• ...
• Step 11: model bl1 + bl7 + ... + bl6
For the leave-one-out cross-validation, whichever block was tested, the steps were built accordingly without that block.
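The stepwise model building can be sketched as follows (illustrative Python; the helper name is ours):

```python
SEQ = [1, 7, 2, 8, 3, 9, 4, 10, 5, 11, 6, 12]

def stepwise_models(tested_block: int, seq=SEQ):
    """Leave-one-out stepwise models: drop the tested block from seq,
    then grow the model one block at a time (Step 1, Step 2, ...)."""
    remaining = [b for b in seq if b != tested_block]
    return [remaining[:k] for k in range(1, len(remaining) + 1)]

steps = stepwise_models(12)  # Step 1 = [bl1], Step 2 = [bl1, bl7], ...
```

With one of the 12 blocks held out for testing, eleven blocks remain, so each tested block yields eleven stepwise models.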
The accuracy was calculated for every trial out of the 32 in one block. (Figure 2: the recorded blocks from [15] were divided into 12 blocks and rearranged into a testing sequence, which was split into model-building and testing datasets.) Table 1 shows the results of the session 1 (bl1–bl6) vs. session 2 (bl7–bl12) cross-analysis, where the training data of one session were used to create the reference models M and the data from both sessions were used for testing D. The training data of session 1 were used to build the session 1 model (M_session1), and this model was used to classify the testing data from session 1 (D_session1) and the testing data of session 2 (D_session2). Similarly, the training data of session 2 were used to build the session 2 model (M_session2), and this model was used to classify the testing data of session 1 (D_session1) and session 2 (D_session2). A pairwise Mann–Whitney U-test between M_session1 D_session1 vs. M_session1 D_session2 showed statistical significance with U = 21.5, Z = −2.17, p < 0.05; between M_session1 D_session1 vs. M_session2 D_session1, statistical significance with U = 23.0, Z = −2.05, p < 0.05; between M_session2 D_session2 vs. M_session1 D_session2, no statistical significance with U = 30.5, Z = −1.48, p = 0.14; and between M_session2 D_session2 vs. M_session2 D_session1, no statistical significance with U = 30.5, Z = −1.48, p = 0.14.

Results
The bottom part of Table 1 shows the corresponding information transfer rates (ITRs), calculated with Equation (5).
The values in Table 1 also represent the corresponding diagonals of Figures 3 and 4, where the additional cross-subject validation was performed. In Figure 3, the models and data from the same session (intra-session) were cross-validated between the subjects, and in Figure 4, the models and data from the other session (inter-session) were cross-validated between the subjects.
In order to test the interchangeability of single training blocks from different multi-day sessions, the testing sequence arrangement (see Figure 2) was constructed and evaluated in Figure 5 with a single block, bl12, against a stepwise increasing concatenated model from both sessions. This approach was also used to compare all other blocks from the multi-day pool in a leave-one-out cross-validation with stepwise increasing (concatenated) models, including all other blocks, in Table 2. The eleven bars in Figure 5 correspond to the column labelled 12 in Table 2.
The different variants constructed in Figure 2 and evaluated in Table 3 test the different multi-day training data compositions in order to build a minimal training model for improving the accuracy. Here, a Kruskal–Wallis test showed no statistical significance between the different variants V1–V5, with H(4) = 0.271, p = 0.992. The models were built by stepwise adding the blocks in seq order; e.g., the labels "1", "7", "2", and "8" represent the models built from bl1, bl1 + bl7, bl1 + bl7 + bl2, and bl1 + bl7 + bl2 + bl8, respectively. (Figure 5: the whiskers mark the standard error. Table 3: accuracies of the different variants for the tested 10 participants, P1–P10.)

Discussion
This paper focuses on the offline analysis of only the cVEP training data of the original study [15]; the training procedure was unified for all participants of the original study and was performed on 2 separate days, spread apart by around 14 days.
Here, the focus lies on finding a suitable training process for the used cVEP system by cross-analysing the training data of both sessions and finding the best training strategy. First, the training data of each session were used for TRCA model building and testing (see Figure 3). When analysing the intra-session classification accuracies, averaged over all six blocks of a leave-one-out cross-validation, session 1 and session 2 reached 94.84% and 94.53%, respectively, which corresponds to ITRs of 88.0 bits/min and 87.1 bits/min (see Table 1).
Both training sessions' data from the 1st and the 2nd day (session 1 and session 2, respectively) were cross-validated against the models from the other session (inter-session); see Table 1 and Figure 4. Interestingly, this cross-session analysis (presented in Table 1) shows that the participants could still achieve reliable control over the system, with accuracies of around 76.67% and 77.34%, which shows that a model from one day can classify data recorded on another day.
In order to avoid favouring the second session and the dependencies arising from session continuity, the block data of both sessions were interchangeably (alternately) stacked, creating a continuous testing sequence seq. This sequence was also used for the stepwise increasing models. These models were analysed with a leave-one-out cross-validation of every block and averaged across participants (Table 2), yielding a maximum accuracy of 96.88%, which is higher than the intra-session averages of both sessions (Table 1). This result was achieved when testing block bl8 with the model at step 6. The results of the testing sequence show a relevant increase in accuracy when additional blocks are added (increasing step number) to the model build. Interestingly, in step 2, where just two blocks were used for the model, the averaged accuracy was 82.66%. This accuracy increased with every additional step until it reached 94.44%.
Lastly, to find a minimalistic training procedure, the tested sequence seq data were arranged into the five variants. In every variant, the accuracy was calculated by averaging the results of every testing block (see Table 3, Figure 2) with a fixed model. When analysing these results, the average accuracy showed almost no difference between the variants V1–V5. Interestingly, when more testing data were used in a variant (e.g., V5), there was only a slight increase in the classification accuracies, reaching 93.56% compared to, e.g., V2, which achieved 92.5%, the same as V1.
These results show that a minimum of two training blocks from at least two sessions is required in order to achieve sufficient cVEP-based BCI system control, such that the accuracy exceeds 80%, as mentioned in Renton et al. [32]. In this case, the average classification accuracies reached 82.66%, and if at least five training blocks are used, the average accuracy exceeds 90% (see Table 2). The 2nd step of the tested sequence included the 1st blocks of both sessions: one old (previous session) and one new (current session) block of training data. The findings of this study suggest that already collected training data can be efficiently reused, e.g., as transferred training data, to increase the user-friendliness of the BCI system by reducing repetitive training and thus minimising the potential visual fatigue caused by the stimuli. Compared to the system developed by Thielen et al. [26], which only requires an initial warm-up period of 12 s, the here-presented approach requires a minimum of one new training block, which corresponds to 98.2 s (including the inter-trial time of 1.0 s). On the other hand, some minimal training can be beneficial (easier adaptation) for different stimulation hardware (higher refresh rates, which correspond to higher cVEP carrier frequencies), where the number of Gold codes can be limited; instead, longer m-sequences can be implemented (one example is a 124-bit quintary m-sequence [33]).

Conclusions
Our findings show that at least two training blocks from two sessions form a sufficient starting point for further improvements in a modern cVEP-based BCI system, which in future research can be further improved with, e.g., an online adaptation process. While novel, emerging systems can work without user-specific training (e.g., Spüler et al. [9] or Thielen et al. [26]), most of the currently used and popular cVEP-based BCI systems require some training phase that could benefit from previously collected training data, including data from other subjects. This cross-subject feature extraction requires further research. While nothing can replace fresh EEG training data (a necessity for an optimal model), the need for new re-training data can be reduced to a bare minimum. This can help potential users to reuse their own previously collected data in a BCI system for everyday use, thus shortening the setup times and further increasing the initial daily performance of a cVEP system. This is also valid for many other BCI paradigms.