1. Introduction
Electroencephalography (EEG) has emerged as a highly effective non-invasive method for measuring human cognitive activities and mental states, with applications ranging from neurorehabilitation to mindfulness training [
1,
2]. In recent years, advances in sensor technology and machine learning have supported the development of systems that interpret brain activity patterns, particularly those that aim to achieve subject-independent analysis [
3,
4,
5]. Among these applications, meditation assessment has received significant attention due to its relevance to mental health [
6], stress reduction [
7], and cognitive enhancement [
1,
8]. However, a major challenge in developing generalized EEG-based systems for such purposes lies in the variability of brain signals across individuals and sessions [
9]. This study focuses on the subject-independent classification of EEG signals across multiple meditation and non-meditation sessions using machine learning techniques, with the goal of improving the robustness and reliability of EEG-based mental state monitoring systems [
10,
11].
Meditation has been shown to produce both temporary and lasting changes in practitioners [
12,
13], where “Trait” [
14,
15] refers to the lasting changes that develop after long-term meditation practice. “State” [
16,
17], on the other hand, refers to the temporary changes that occur while a person is meditating compared to when they are not. Since this study aims to classify meditation and non-meditation states, we focus on “State” characteristics. Many people who practise meditation aim to achieve progressively calmer mental states, and currently, psychological assessments such as mindfulness scales, questionnaires, and interviews have demonstrated such progressive differences among individuals who practise meditation [
18,
19].
At the same time, neuroimaging techniques such as EEG [
20,
21,
22], fMRI (functional magnetic resonance imaging) [
23], fNIRS (functional near-infrared spectroscopy) [
24], and PET (positron emission tomography) [
25], along with physiological measurements such as heart rate variability [
26] and cortisol levels [
27], have been used to understand various characteristics of meditation/non-meditation. A significant number of studies have been conducted using EEG to examine different aspects of meditation/non-meditation [
28,
29,
30], and some of these studies highlight the importance of developing software systems that can support individuals in practicing meditation, particularly in guiding them toward progress [
31,
32,
33]. At the most basic level, the calmness achieved through meditation can be identified by comparing brain patterns during meditation and non-meditation, with non-meditation serving as a baseline [
29,
34]. However, since meditation skills improve over time with repeated practice, understanding states such as calmness requires computational analysis and recognition of patterns in multiple-session meditation/non-meditation EEG data [
35,
36].
We recognized the importance of studying patterns in multiple-session meditation/non-meditation EEG data [
37,
38,
39]. Our earlier work [
37] showed that while meditation sessions share similar EEG characteristics, they are clearly distinguishable from non-meditation sessions. As a next step, in our following study [
38], we successfully classified intra-subject, multiple-session meditation/non-meditation EEG data. A system that monitors and distinguishes brain patterns between meditation and non-meditation is more practical and user-friendly when it is subject-independent: such a system requires no personal calibration or prior data collection from each user, which makes it easier to use and more accessible in real-world applications. Therefore, as the next step, this paper tests subject-independent, multiple-session meditation/non-meditation EEG data classification and compares its performance with the outcomes of our previous intra-subject classification study [
38].
EEG is a method used to measure the electrical activity of the brain [
40,
41] by placing electrodes on the scalp at predefined locations [
42]. Studies have shown that EEG signals collected in this manner, which consist of oscillatory patterns, can correlate with different mental tasks [
43]. One common method for analyzing EEG data is to first transform it into the frequency domain [
44], where, based on specific characteristics, the signal is divided into five frequency bands: Delta (0.5–4 Hz), Theta (4–8 Hz), Alpha (8–13 Hz), Beta (13–30 Hz), and Gamma (>30 Hz) [
45,
46,
47]. Based on past research, the Theta [
48,
49,
50,
51] and Alpha [
52,
53,
54] bands are most commonly associated with meditation/non-meditation studies, and they were utilized in our study as well. EEG data are largely non-linear and complex; extracting meaningful patterns from them is therefore a lengthy, multi-step process, and the sequence of steps is commonly referred to as the brain–computer interface (BCI) pipeline [
55].
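As a small illustration of the band definitions above, the following sketch maps a frequency to its band name. Treating each band as a half-open interval (so that a boundary value such as 8 Hz belongs to the higher band) is an assumption made here, as are the names `BANDS`, `band_of`, and `THETA_ALPHA`.

```python
# Band edges in Hz, taken from the text above. Treating each band as a
# half-open interval [low, high) is an assumption made for this sketch.
BANDS = {
    "delta": (0.5, 4.0),
    "theta": (4.0, 8.0),
    "alpha": (8.0, 13.0),
    "beta": (13.0, 30.0),
    "gamma": (30.0, float("inf")),
}


def band_of(freq_hz):
    """Return the name of the band containing freq_hz, or None below 0.5 Hz."""
    for name, (low, high) in BANDS.items():
        if low <= freq_hz < high:
            return name
    return None


# Combined theta + alpha range, the two bands most often linked to meditation.
THETA_ALPHA = (BANDS["theta"][0], BANDS["alpha"][1])

print(band_of(6.0), band_of(10.0), THETA_ALPHA)
```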
A BCI pipeline consists of multiple steps that can generally be broken down into data collection, preprocessing, feature extraction, classification, and application [
56,
57]. Preprocessing involves cleaning and artifact removal of the collected EEG data and modifying it into a different structure or a simpler format, such as through dimensionality reduction [
58,
59]. Since EEG data are complex and have a low signal-to-noise ratio, this cleaning and preprocessing step plays a significant role in the entire study. After the data are cleaned and converted into the desired format, the next step is feature extraction [
56,
60]. The aim here is to extract informative features hidden in the EEG data that can be used in downstream tasks such as classification. Even after thorough cleaning, a certain amount of noise remains in the EEG data. Because of this, advanced signal processing methods such as short-time Fourier transform (STFT) [
61,
62], common spatial patterns (CSPs) [
63,
64,
65], and event-related potentials (ERPs) [
66] are used in feature extraction.
After feature extraction, the next step in the BCI pipeline is classification and application [
67,
68,
69]. When we examine past work, notable classification algorithms used with meditation EEG include Support Vector Machines (SVMs) [
70], Linear Discriminant Analysis (LDA) [
71], Artificial Neural Networks (ANNs) [
72], Decision Trees, Random Forests [
4], and k-Nearest Neighbors (k-NN) [
9]. Among these, SVMs and ANNs were the most commonly used methods, with ANNs being widely adopted by researchers in recent years. The use of ANNs has increased not only with EEG but also in many research scenarios involving large, nonlinear datasets with high computational complexity that require processing to extract meaningful patterns [
73]. This is because ANNs can function as deep learning algorithms that learn from data, adapt to new information, and model complex nonlinear relationships [
74]. An ANN is loosely modelled on neuron interactions in the brain; a basic ANN consists of an input layer, an output layer, and a number of hidden layers, where each layer contains one or more nodes depending on the problem it is designed to solve. An ANN is first trained on a dataset before being used on a new dataset, and training occurs by sending signals forward and backward through the network while optimizing the internal weights to learn from the training data [
75]. After a significant amount of analysis, in our previous study on intra-subject multiple-session classification, we used ANNs to ensure consistent conditions during classification while comparing the performance of different feature extraction methods [
38]. In this study, we use the BCI pipelines validated in our previous study, allowing us to compare the performance of intra-subject and inter-subject multiple-session meditation/non-meditation classification on common ground.
This study aims at the novel task of evaluating how effectively subject-independent EEG data classification can be achieved for multiple-session meditation and non-meditation EEG data. This work extends our previously published study [
38], where we first demonstrated that multiple-session meditation and non-meditation EEG data exhibit common characteristics within each group. Following that, we successfully performed intra-subject classification of multiple-session meditation and non-meditation EEG data. As a continuation of these studies, this work focuses on subject-independent classification of multiple-session meditation and non-meditation EEG data.
By focusing on both multiple-session meditation and subject-independent classification, this study adds significant value to scientific research. Past studies show a growing demand and interest in achieving subject-independent EEG data classification, particularly in areas such as motor imagery [
56,
76]. Such an advancement would enable the development of applications that can be trained on existing data and used by new individuals without further modifications. To the best of our knowledge, no prior research exists on subject-independent, multiple-session meditation EEG classification. Therefore, this study is a first step toward developing algorithms that can support apps guiding individuals in their meditation progress. Such an app is particularly valuable when it is subject-independent and supports multiple-session meditation practice, as progress in meditation depends on consistent long-term practice.
2. Materials and Methods
This study is an extension of our previous work [
38], where we successfully demonstrated intra-subject, multi-session EEG data classification for meditation and non-meditation states. In this work, we test how subject-independent, multi-session classification of the same states performs compared to that study. To keep the inter-subject and intra-subject results comparable, this study follows the same procedure as our previous work, with one difference: the EEG sessions used to train and test the selected machine learning algorithms come from different participants rather than from the same person. Since the data preprocessing and construction of the BCI pipelines were carried out as in our previous study, only the most important steps are summarized here, and readers can refer to our earlier paper [
38] for the full description. At the same time, the parts of the procedure that are unique to this study are thoroughly explained in this section.
This study uses a publicly available EEG dataset (https://doi.org/10.18112/openneuro.ds003816.v1.0.1). A description of the dataset can be found in one of our previous research articles [
37]. The dataset is labelled as experienced meditators [
77,
78,
79], but since there is no measurement of experience levels, such as years of practice, we assume that there may be varying levels of experience among the expert participants. In this study, five sessions of EEG data collected per mental task for each of the 12 participants were used, which is a subset of the above online dataset. The study uses both meditation and non-meditation EEG data. Two meditation types were used in this study: loving-kindness meditation [
80,
81] on oneself (LKM-Self) and on others (LKM-Others). Therefore, for each participant, 15 sessions of EEG data were used, with 5 sessions per meditation type and 5 sessions of non-meditation data. Two independent classification tests were conducted: one for LKM-Self vs. non-meditation and the other for LKM-Others vs. non-meditation. Altogether, 180 (15 × 12) sessions of EEG data were used in this study, and these 180 sessions were the highest-quality sessions selected from the original dataset after initial cleaning and screening.
The selected 180 sessions of EEG data were further cleaned and preprocessed using EEGLAB [
59,
82] in MATLAB R2021b. A “Basic FIR filter” was applied to extract the data between 2 Hz and 45 Hz, and bad channels were identified and corrected [
83,
84]. Independent component analysis (ICA) [
71,
85,
86] was then performed on each EEG data session, and cleaning was carried out by selecting the appropriate components through visual inspection [
59,
87]. Since the original EEG data had 127 channels, principal component analysis (PCA) [
71,
85] was used along with ICA to achieve dimensionality reduction [
88,
89], allowing ICA to run smoothly. The cleaned EEG data sessions were saved in “.eeg” file format for use in the BCI pipelines developed using Python 3.9 [
90,
91].
In our previous study [
38], intra-subject classification was performed using 5 session pairs of EEG data for the two compared meditation/non-meditation mental tasks for a selected participant. Since we plan to compare the results of the two studies, in this study, we also used 5 session pairs of EEG data for the mental tasks. After selecting these 10 sessions, we used three sizes of training data to match our previous study in order to compare the results. Therefore, in this study, we used 2 session pairs, 3 session pairs, or 4 session pairs of EEG data for training the machine learning algorithms in independent analyses.
The cleaned data consist of 180 sessions from 12 participants. Since we are working on inter-subject classification, we first organized this dataset into three groups corresponding to the three mental tasks. Specifically, 60 sessions were assigned to each of LKM-Self, LKM-Others, and non-meditation, with each group containing 5 sessions of EEG data from each of the 12 participants. Two independent studies were conducted to classify LKM-Self vs. non-meditation and LKM-Others vs. non-meditation. Since the procedures were the same for both cases, the description is provided only for the LKM-Self vs. non-meditation classification.
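The grouping described above can be pictured as a nested mapping from mental task to participant to sessions. The sketch below is illustrative only: the session identifiers are hypothetical placeholders, not file names from the dataset.

```python
TASKS = ("LKM-Self", "LKM-Others", "non-meditation")
N_PARTICIPANTS = 12
SESSIONS_PER_TASK = 5

# sessions[task][participant] -> list of session identifiers.
# The identifier format is hypothetical and used only for illustration.
sessions = {
    task: {
        p: [f"{task}/P{p:02d}/S{s}" for s in range(1, SESSIONS_PER_TASK + 1)]
        for p in range(1, N_PARTICIPANTS + 1)
    }
    for task in TASKS
}

# Count sessions per task group and overall.
per_task = {
    task: sum(len(s) for s in by_participant.values())
    for task, by_participant in sessions.items()
}
total = sum(per_task.values())
print(per_task, total)
```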
For all independent tests conducted, the first task was to randomly select 5 participants per mental task, ensuring that each participant contributed at most one session for that task. Since each participant had 5 sessions of data per mental task, 1 session was then randomly selected from the available 5 for each chosen participant. In this way, each mental task contained data from 5 different participants. Among the two mental tasks, a single participant’s data might appear in one or both tasks depending on the random selection. The selected 5 meditation and 5 non-meditation sessions were then paired randomly to create 5 meditation/non-meditation session pairs. This selection process was repeated each time a pair of meditation/non-meditation mental tasks was used to study their classification strength.
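A minimal sketch of the selection procedure described above, assuming participants are identified by the integers 1 to 12 and sessions by 1 to 5. The function name `draw_session_pairs` and the seeded generator are illustrative choices, not the implementation used in the study.

```python
import random


def draw_session_pairs(participants, n_sessions=5, n_pairs=5, rng=None):
    """Draw n_pairs meditation/non-meditation session pairs.

    For each task, n_pairs distinct participants are sampled and one of
    their n_sessions sessions is picked at random. The same participant
    may appear in both tasks, as in the text.
    """
    rng = rng or random.Random()

    def draw_task():
        chosen = rng.sample(participants, n_pairs)  # distinct participants
        return [(p, rng.randrange(1, n_sessions + 1)) for p in chosen]

    meditation = draw_task()
    non_meditation = draw_task()
    rng.shuffle(non_meditation)  # random pairing across the two tasks
    return list(zip(meditation, non_meditation))


pairs = draw_session_pairs(list(range(1, 13)), rng=random.Random(0))
for (med_p, med_s), (non_p, non_s) in pairs:
    print(f"meditation P{med_p} S{med_s}  <->  non-meditation P{non_p} S{non_s}")
```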
For the LKM-Self vs. non-meditation pair, three independent studies were conducted, each using a different number of training session pairs. These studies were evaluated using three independent BCI pipelines. In this study, we used the same three BCI pipelines that produced the best performance in our previous work [
38] on intra-subject multiple-session meditation/non-meditation EEG classification, so that intra-subject and inter-subject performance can be compared directly. Each EEG session was segmented into 2 s epochs with a 1 s overlap [
92]. The main difference among these three BCI pipelines is the method used for feature extraction. The three BCI pipelines use either common spatial patterns (CSPs) [
63,
64,
65], short-time Fourier transform (STFT) [
61,
62], or a fusion of CSP and STFT in each pipeline for feature extraction.
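The epoching step (2 s windows with a 1 s overlap) can be sketched as follows. The sampling rate of 250 Hz in the usage example is an assumed value for illustration; the text does not state the rate used.

```python
def epoch_bounds(n_samples, sfreq, win_s=2.0, overlap_s=1.0):
    """Return (start, stop) sample indices of overlapping epochs.

    Consecutive epochs are shifted by (win_s - overlap_s) seconds; any
    trailing samples that do not fill a whole window are dropped.
    """
    win = int(round(win_s * sfreq))
    step = int(round((win_s - overlap_s) * sfreq))
    return [(start, start + win) for start in range(0, n_samples - win + 1, step)]


# Example: a 10 s recording at an assumed 250 Hz sampling rate.
bounds = epoch_bounds(n_samples=10 * 250, sfreq=250)
print(len(bounds), bounds[0], bounds[1], bounds[-1])
```

With these settings a recording of T whole seconds (T ≥ 2) yields T − 1 epochs, since each new epoch starts one second after the previous one.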
To give some background on the feature extraction methods used in the three BCI pipelines, common spatial patterns (CSPs) identify spatial filters that maximize variance differences between two mental tasks, capturing discriminative patterns across EEG channels. Short-time Fourier transform (STFT) analyzes the spectral content of EEG signals over short time windows, allowing the extraction of frequency-based features such as theta and alpha bands, which are relevant to meditation. The third pipeline combines CSP and STFT, taking advantage of both spatial and spectral information and providing a richer feature representation for classification. These methods are critical for the current study, as they enable the BCI pipelines to extract complementary information from multi-session EEG data and differentiate meditation from non-meditation states effectively.
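To make the STFT idea concrete, the sketch below computes, for each short window of a single channel, the fraction of spectral power that falls in the theta and alpha range (4–13 Hz). It is a simplified stand-in rather than the pipeline's feature extractor: it uses a plain discrete Fourier transform with a rectangular window, and the 64 Hz sampling rate and synthetic sine signals are assumptions for illustration. CSP is not shown here because it requires a generalized eigendecomposition of the class covariance matrices.

```python
import cmath
import math


def band_power_fraction(window, sfreq, low_hz=4.0, high_hz=13.0):
    """Fraction of a window's spectral power lying in [low_hz, high_hz]."""
    n = len(window)
    mean = sum(window) / n
    x = [v - mean for v in window]  # remove the DC offset
    total = 0.0
    in_band = 0.0
    for k in range(1, n // 2 + 1):  # positive-frequency DFT bins
        coeff = sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        power = abs(coeff) ** 2
        freq = k * sfreq / n
        total += power
        if low_hz <= freq <= high_hz:
            in_band += power
    return in_band / total if total > 0 else 0.0


def stft_band_features(signal, sfreq, win_s=2.0, step_s=1.0):
    """One band-power fraction per sliding window (2 s windows, 1 s step)."""
    win = int(win_s * sfreq)
    step = int(step_s * sfreq)
    return [
        band_power_fraction(signal[s:s + win], sfreq)
        for s in range(0, len(signal) - win + 1, step)
    ]


sfreq = 64  # assumed sampling rate for this toy example
alpha_like = [math.sin(2 * math.pi * 10 * t / sfreq) for t in range(4 * sfreq)]
beta_like = [math.sin(2 * math.pi * 20 * t / sfreq) for t in range(4 * sfreq)]
print(stft_band_features(alpha_like, sfreq))
print(stft_band_features(beta_like, sfreq))
```

A 10 Hz oscillation places nearly all of its power inside the 4–13 Hz range, whereas a 20 Hz oscillation places almost none there, which is the kind of contrast a spectral feature is meant to capture.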
In this study, EEG data were analyzed across several frequency ranges to identify the most effective band for meditation and non-meditation classification. Consistent with our previous findings [
38], the theta and alpha frequency range (4–13 Hz) again produced the best classification performance. In all three BCI pipelines, artificial neural networks [
74,
75] were used as the classification algorithms, providing a common ground to compare the performance of the three feature extraction algorithms. We tested several network configurations and activation functions, and similarly to our previous study, a compact multi-layer perceptron (MLP) with two hidden layers of 20 nodes each and a logistic activation function provided the best performance. Although EEG data are high-dimensional, dimensionality-reduction techniques reduced the feature set to a maximum of 14 inputs for each BCI pipeline. For this compact input, the structure with two 20-node hidden layers offered the highest accuracy with low computational cost, while increasing the network size did not improve performance. Since we used three training sizes on three BCI pipelines, we conducted nine studies per classification task. As this was carried out independently for LKM-Self vs. non-meditation and LKM-Others vs. non-meditation, eighteen studies were conducted in total.
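The network shape described above (up to 14 inputs, two hidden layers of 20 logistic units) can be sketched as a plain forward pass. This is an untrained illustration only: the single logistic output unit, the weight initialization range, and all function names are assumptions, and training by backpropagation is omitted.

```python
import math
import random

# 14 inputs, two hidden layers of 20 nodes, one output (output size assumed).
LAYER_SIZES = [14, 20, 20, 1]


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_network(layer_sizes, rng):
    """Return one (weights, biases) tuple per layer with small random weights."""
    layers = []
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def forward(layers, features):
    """Propagate one feature vector through every layer."""
    activations = features
    for weights, biases in layers:
        activations = [
            logistic(sum(w * a for w, a in zip(row, activations)) + b)
            for row, b in zip(weights, biases)
        ]
    return activations[0]


def count_parameters(layers):
    return sum(len(w) * len(w[0]) + len(b) for w, b in layers)


rng = random.Random(0)
net = init_network(LAYER_SIZES, rng)
prob = forward(net, [rng.random() for _ in range(14)])
print(count_parameters(net), prob)
```

Under these assumptions the network has 741 trainable parameters (300 + 420 + 21), which illustrates how compact the classifier is. In practice a library implementation would normally be used; for example, scikit-learn's `MLPClassifier` accepts `hidden_layer_sizes=(20, 20)` and `activation="logistic"`, although the text does not state which library was used.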
All the tests conducted started with a random selection of 5 pairs of EEG session data for the selected meditation/non-meditation mental tasks. Then, three different studies were conducted based on the number of session pairs used for the training of the machine learning algorithms. In the following sections, each of these three studies is explained one after the other.
The first study was conducted using 2 EEG data session pairs for training and 1 EEG data session pair for testing the BCI pipelines. This study starts by randomly selecting 5 EEG data session pairs for the LKM-Self and non-meditation mental tasks. From these, 3 of the 5 session pairs were selected at a time, with 2 pairs used for training and the remaining pair used for testing. Rotating the testing session pair gave 3 tests for each selection of 3 session pairs. Since there are 10 ways to select 3 session pairs out of 5, a total of 30 (10 × 3) tests were conducted for each random selection of 5 LKM-Self/non-meditation session pairs. This random selection was repeated for 30 independent experiments, resulting in 900 tests per BCI pipeline. With 3 independent BCI pipelines, a total of 2700 (900 × 3) tests were conducted on the classification of LKM-Self vs. non-meditation using 3 session pairs of EEG data (2 for training and 1 for testing).
The second study was conducted using 3 EEG data session pairs for training and 1 EEG data session pair for testing the BCI pipelines. This study begins by randomly selecting 5 EEG data session pairs for the LKM-Self and non-meditation mental tasks. From these, 4 of the 5 session pairs were selected at a time, with 3 pairs used for training and the remaining pair used for testing. Rotating the testing session pair gave 4 tests for each selection of 4 session pairs. Since there are 5 ways to select 4 session pairs out of 5, a total of 20 (5 × 4) tests were conducted for each random selection of 5 LKM-Self/non-meditation session pairs. This random selection was repeated for 30 independent experiments, resulting in 600 tests per BCI pipeline. With 3 independent BCI pipelines, a total of 1800 (600 × 3) tests were conducted on the classification of LKM-Self vs. non-meditation using 4 session pairs of EEG data (3 for training and 1 for testing).
The third study was conducted using 4 EEG data session pairs for training and 1 EEG data session pair for testing the BCI pipelines. This study begins by randomly selecting 5 EEG data session pairs for the LKM-Self and non-meditation mental tasks. All 5 selected session pairs were used at a time, with 4 pairs used for training and the remaining pair used for testing. Rotating the testing session pair gave 5 tests for each random selection of 5 LKM-Self/non-meditation session pairs. This random selection was repeated for 30 independent experiments, resulting in 150 tests per BCI pipeline. With 3 independent BCI pipelines, a total of 450 (150 × 3) tests were conducted on the classification of LKM-Self vs. non-meditation using 5 session pairs of EEG data (4 for training and 1 for testing).
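The three train/test schemes described above can be enumerated with a short sketch. Session pairs are represented by the indices 0 to 4, and the function name `enumerate_splits` is an illustrative choice.

```python
from itertools import combinations


def enumerate_splits(n_pairs, n_train):
    """List every (train_pairs, test_pair) split.

    A subset of n_train + 1 session pairs is chosen, then each member of
    the subset takes a turn as the held-out test pair.
    """
    splits = []
    for subset in combinations(range(n_pairs), n_train + 1):
        for test in subset:
            train = tuple(i for i in subset if i != test)
            splits.append((train, test))
    return splits


counts = {n_train: len(enumerate_splits(5, n_train)) for n_train in (2, 3, 4)}
print(counts)  # {2: 30, 3: 20, 4: 5}
```

The counts match the 30, 20, and 5 tests per random selection reported for the first, second, and third studies, respectively.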
When considering all three studies with different numbers of session pairs tested on the 3 BCI pipelines, a total of 4950 (2700 + 1800 + 450) tests were conducted to study the classification of LKM-Self vs. non-meditation. A similar approach was used to study the classification of LKM-Others vs. non-meditation; thus, a total of 9900 (4950 × 2) independent tests were conducted in our study for testing inter-subject, multiple-session meditation and non-meditation EEG data classification. The full description of the implementation of the 3 BCI pipelines is provided in our previous research article [
38], and a summary of the procedure is shown in
Figure 1. That article describes the steps for using CSP, STFT, and the fusion of CSP and STFT for feature extraction. It also describes how deep learning was implemented using artificial neural networks with 2 hidden layers to match each of the feature sets and obtain optimal outcomes.
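The test counts quoted above follow from a short calculation. Writing $k$ for the number of training session pairs and $N_k$ for the number of tests per random selection of 5 session pairs (notation introduced here for illustration only):

```latex
\begin{align*}
N_k &= \binom{5}{k+1}\,(k+1), \qquad N_2 = 30,\quad N_3 = 20,\quad N_4 = 5,\\
\text{tests per pipeline} &= 30\,N_k \;\Rightarrow\; 900,\ 600,\ 150,\\
\text{tests per comparison} &= 3 \times (900 + 600 + 150) = 4950,\\
\text{total tests} &= 2 \times 4950 = 9900.
\end{align*}
```

Here the factor 30 is the number of independent experiments, 3 is the number of BCI pipelines, and 2 is the number of meditation types compared against non-meditation.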
3. Results
In this study, we evaluated the effectiveness of subject-independent multiple-session EEG data classification for distinguishing between meditation and non-meditation states. These findings are compared with our previous research on intra-subject multiple-session EEG data classification. The results are divided into two parts, as we independently compared non-meditation with two meditation techniques: LKM-Self and LKM-Others. Each of these comparisons was tested under three conditions based on the number of EEG session pairs used (three, four, or five) for both the LKM-Self/non-meditation and LKM-Others/non-meditation classification tasks.
As a result, six tables were generated to present the findings from these six studies. For each study, three independent classification tests were conducted using three different BCI pipelines, which employed CSP, STFT, or a fusion of CSP and STFT for feature extraction. Accordingly, each of the six tables includes the classification accuracies achieved by all three pipelines. Each study was repeated independently 30 times to generalize the findings and improve result reliability. Leave-one-session-out cross-validation was applied, and the average and standard deviation were computed to report the classification accuracy and associated uncertainty for each test.
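The reported average and uncertainty can be computed as in the sketch below. The accuracy values are made-up placeholders, and the use of the sample standard deviation (rather than the population form) is an assumption, since the text does not specify which was used.

```python
from statistics import mean, stdev


def summarize(accuracies):
    """Return (mean, sample standard deviation) of a list of accuracies."""
    return mean(accuracies), stdev(accuracies)


# Hypothetical accuracies from one set of leave-one-session-out tests.
fold_accuracies = [0.62, 0.71, 0.55, 0.80, 0.67]
avg, sd = summarize(fold_accuracies)
print(f"{avg:.3f} ± {sd:.3f}")
```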
Table 1 presents the average classification accuracies obtained for LKM-Self vs. non-meditation EEG data using three session pairs based on 30 independent experiments conducted for each of the three BCI pipelines with different feature extraction methods: CSP, STFT, and a fusion of CSP and STFT. Each of these 30 experiments consists of 30 independent tests, and each row in the table shows the average accuracy and uncertainty for those 30 tests across the three BCI pipelines. For all 2700 (30 × 30 × 3) tests conducted, two session pairs were used for training the machine learning algorithm, and one session pair was used for testing the algorithm. The “Mean Accuracy (All Tests)” provides the average accuracy and uncertainty for all 900 tests conducted for each BCI pipeline. Since we observed a significant level of uncertainty for each pipeline across the 900 tests, we calculated the “Mean Accuracy (Bottom 50%)” and the “Mean Accuracy (Top 50%)” of these 900 tests. These mean accuracies and uncertainties are displayed at the bottom of
Table 1.
Table 2 follows the same description as provided for
Table 1, with the only difference being the use of the meditation mental task LKM-Others instead of LKM-Self. Therefore,
Table 2 presents the average classification accuracies obtained for LKM-Others vs. non-meditation EEG data using three session pairs, based on 30 independent experiments conducted for each of the three BCI pipelines employing different feature extraction methods: CSP, STFT, and a fusion of CSP and STFT. Each BCI pipeline consists of 900 tests, adding up to a total of 2700 tests. The table also includes the results for “Mean Accuracy (All Tests)”, “Mean Accuracy (Bottom 50%)”, and “Mean Accuracy (Top 50%)”.
The results shown in
Table 3,
Table 4,
Table 5 and
Table 6 follow the same structure and reporting format as described for
Table 1 and
Table 2, with the only differences being the number of session pairs (four and five instead of three) and the meditation task (LKM-Self or LKM-Others). Each table presents the average accuracies and uncertainties across the three BCI pipelines (CSP, STFT, and CSP–STFT fusion), including “Mean Accuracy (All Tests),” “Mean Accuracy (Bottom 50%),” and “Mean Accuracy (Top 50%).”
To support the tabulated results (
Table 1,
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6),
Figure 2,
Figure 3,
Figure 4 and
Figure 5 provide a visual summary of the mean classification accuracies obtained across the three BCI pipelines (CSP, STFT, and CSP + STFT). These figures illustrate the overall performance trends for LKM-Self vs. non-meditation and LKM-Others vs. non-meditation classification under different session pair conditions (three, four, and five).
4. Discussion
In this study, we evaluated the performance of inter-subject (subject-independent), multi-session EEG data classification for meditation and non-meditation states and compared it with our previous work on intra-subject, multi-session EEG data classification for the same states. Six separate studies were conducted, resulting in six tables of outcomes. These included the LKM-Self vs. non-meditation study and the LKM-Others vs. non-meditation study, each performed using three, four, and five EEG session pairs. Since each of these six independent studies was tested using three different BCI pipelines, each table contains three columns corresponding to these pipelines. The main difference among the three pipelines lies in the use of different feature extraction algorithms, namely, CSP, STFT, and CSP + STFT. With three pipelines applied across the six studies, a total of 18 (6 × 3) outcomes are presented in the six result tables (
Table 1,
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6).
Each of the six tables contains 30 rows of results, and each row reports the average classification accuracy along with the corresponding error for a single random selection of five session pairs. As described in
Section 2, after randomly selecting five session pairs of meditation/non-meditation EEG data, each pipeline was tested on all possible combinations of those five pairs, with classification accuracy computed for each instance (when three, four, and five session pairs were used, the total number of possible combinations was 30, 20, and 5, respectively). These classification accuracies were then used to calculate the average classification accuracy and the associated error, which is reported as a single row in each table. For the 30 random selections, 30 rows are therefore shown for each BCI pipeline.
Our aim was to obtain a mean accuracy for each of the BCI pipelines using the results in the six tables so that the results could be compared with the mean accuracies obtained in our previous studies on intra-subject classification. Therefore, we calculated the overall average for each pipeline across the 30 independent experiments. In the tables, these results are labelled as “Mean Accuracy (All Tests).” For the tables containing results of three, four, and five session pairs, these overall averages and errors for each pipeline were calculated using classification accuracy results obtained from 900, 600, and 150 independent tests, respectively. The total number of tests conducted for the entire study was 9900 ((900 + 600 + 150) × 3 × 2).
After obtaining the 18 overall mean accuracy values for the six tables, they were compared with the corresponding instances from our previous studies. We observed that the mean accuracy for each instance in this study was lower than the corresponding accuracy in the previous study for the same instance. This suggests that inter-subject classification produced lower classification accuracy than intra-subject classification. Additionally, a notable finding in this study was the large errors associated with these mean accuracy values. These results indicate that, in inter-subject, multi-session meditation/non-meditation classification, some instances yield high classification accuracies while others yield low ones.
Taken together, the reduced mean accuracies and the large errors show that performance varied widely from test to test. One possible explanation, which should be tested in future research, is that this variability may be related to differences in meditation experience levels among the participants, since session selection was performed randomly. Although the dataset was labelled as consisting of experienced meditators, the actual experience level, such as years of practice, was not measured. We therefore hypothesize that a range of experience levels may have existed among the experienced meditators in the dataset. If this is the case, we would expect some selection combinations to yield high classification accuracies, while others would result in lower accuracies. High accuracies could occur when the random selection included meditation EEG sessions from participants with similarly high levels of meditation experience, whereas low accuracies could result when the random selection mixed EEG data from participants with varying levels of experience.
To test this assumption for the 18 studies shown in
Table 1,
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6, after calculating the “Mean Accuracy (All Tests)”, we also computed the “Mean Accuracy (Bottom 50%)” and “Mean Accuracy (Top 50%)”. For each BCI pipeline, the classification accuracies were divided into two equal groups, the lower 50% and the upper 50%, and the mean accuracy and corresponding error were calculated for each group. These calculations were performed across the 18 independent studies, and the resulting values are presented at the bottom of
Table 1,
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6. The results show the mean accuracies of both the top and bottom 50% of the accuracy values obtained when testing each BCI pipeline for each of the three training sizes used in the study.
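The split described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function names `mean_and_error` and `split_half_summary` are hypothetical, the "error" is assumed here to be the sample standard deviation (the text does not specify which error measure was used), and the input values are made up.

```python
import statistics

def mean_and_error(values):
    """Return (mean, error) for a list of accuracies.

    Assumption: "error" is taken to be the sample standard deviation.
    """
    mean = statistics.fmean(values)
    error = statistics.stdev(values) if len(values) > 1 else 0.0
    return mean, error

def split_half_summary(accuracies):
    """Summarise all tests, the bottom 50%, and the top 50% of accuracies."""
    ordered = sorted(accuracies)
    half = len(ordered) // 2
    bottom = ordered[:half]                 # lowest half of the accuracies
    top = ordered[len(ordered) - half:]     # highest half of the accuracies
    # With an odd number of tests the middle value would fall in neither half;
    # the test counts reported in the text (900, 600, 150) are all even.
    return {
        "all": mean_and_error(ordered),
        "bottom_50": mean_and_error(bottom),
        "top_50": mean_and_error(top),
    }

# Made-up accuracies (in percent) for illustration only; not values from the study.
example = [52.0, 55.0, 61.0, 48.0, 90.0, 85.0, 58.0, 93.0]
summary = split_half_summary(example)
for label, (mean, error) in summary.items():
    print(f"{label}: mean = {mean:.2f}, error = {error:.2f}")
```

Because the two groups are formed by sorting, the top-half mean is always at least as large as the bottom-half mean; what carries information is the size of the gap between them.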
At a glance, there is a large difference between the mean accuracies calculated from the top and bottom halves of the results. This indicates that classification accuracy depends on the data selected for the classification: if accuracies were not influenced by the choice of sessions, the difference between the top 50% and bottom 50% would not be substantial. Because we used random selection, some combinations yielded markedly high classification accuracies, while others resulted in notably low accuracies. This was reflected in the large variance among the classification accuracies for all tests within a single BCI pipeline model, and further evidenced by the substantial difference between the mean accuracies of the top 50% and the bottom 50%.
The summary of the classification accuracy results (“Mean Accuracy (All Tests)”) presented in
Table 1,
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6 is further elaborated in
Figure 2,
Figure 3,
Figure 4 and
Figure 5. Additionally, we compared the results obtained in this study (“Mean Accuracy (All Tests)”, “Mean Accuracy (Bottom 50%)”, and “Mean Accuracy (Top 50%)”) with those from our previous study, and this comparison is illustrated in
Figure 6,
Figure 7,
Figure 8,
Figure 9,
Figure 10 and
Figure 11. In the following sections, these figures are explained individually.
First, we focus on the performance trends of the 18 different studies, which we label as “Mean Accuracy (All Tests).” For three, four, and five session pairs, each mean accuracy was computed from 900, 600, and 150 tests, respectively.
Figure 2,
Figure 3,
Figure 4 and
Figure 5 present the comparison of mean classification accuracies for LKM-Self/non-meditation and LKM-Others/non-meditation using the three algorithms, CSP, STFT, and CSP + STFT, across three, four, and five session pairs of EEG data.
For LKM-Self/non-meditation (
Figure 2 and
Figure 4), the CSP + STFT pipeline clearly outperforms the others for three- and four-session pairs, and it performs slightly better in the five-session pair case. Increasing the number of training pairs generally improves accuracy for CSP and STFT, while for CSP + STFT the improvement is seen mainly when going from three to four session pairs. Overall, classification accuracy improved in 5 out of 6 cases where the number of session pairs increased.
For LKM-Others/non-meditation (
Figure 3 and
Figure 5), CSP + STFT outperforms the others in the three- and four-session pair cases, while STFT performs best in the five-session pair case. As with LKM-Self, increasing session pairs usually improved classification accuracy for CSP and STFT but not consistently for CSP + STFT. Here, 4 out of 6 cases showed improvements.
Across both meditation types, an increase in the number of training session pairs led to higher classification accuracy in 9 out of 12 instances (75.0%). This trend is broadly consistent with our previous intra-subject study, where increasing training session pairs improved classification accuracy in all 12 comparisons (100%).
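The count behind these proportions can be reproduced with a short tally. The sketch below assumes that each pipeline and meditation type contributes two comparisons (three to four session pairs, and four to five), which gives the 6 cases per meditation type and 12 overall mentioned in the text. The function name `count_improvements` is hypothetical, and the accuracies are made up for illustration.

```python
def count_improvements(mean_accuracy):
    """Count step-ups in session pairs (3 -> 4, 4 -> 5) that raise mean accuracy.

    mean_accuracy maps (meditation type, pipeline) to {session pairs: accuracy}.
    """
    improved = 0
    total = 0
    for by_size in mean_accuracy.values():
        sizes = sorted(by_size)
        # Compare each training size with the next larger one.
        for smaller, larger in zip(sizes, sizes[1:]):
            total += 1
            if by_size[larger] > by_size[smaller]:
                improved += 1
    return improved, total

# Made-up mean accuracies (in percent); not values from the study.
example = {
    ("LKM-Self", "CSP"): {3: 60.0, 4: 62.0, 5: 65.0},
    ("LKM-Self", "CSP + STFT"): {3: 70.0, 4: 72.0, 5: 71.0},
    ("LKM-Others", "STFT"): {3: 58.0, 4: 57.0, 5: 63.0},
}
improved, total = count_improvements(example)
print(f"{improved} of {total} comparisons improved ({100 * improved / total:.1f}%)")
```

Applied to all 18 mean accuracies (3 pipelines × 2 meditation types × 3 training sizes), the same tally would yield the 12 comparisons discussed above.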
The performance of the three BCI pipelines (CSP, STFT, and CSP + STFT) was further tested using pairwise
t-tests (
Table 7). The results indicate that CSP + STFT performs significantly better than CSP (t(5) = −3.15,
p = 0.025), while its advantage over STFT falls just short of statistical significance at α = 0.05 (t(5) = −2.43,
p = 0.059). No significant difference was found between CSP and STFT (t(5) = −0.59,
p = 0.581).
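For readers who want to reproduce this kind of comparison, the following is a minimal sketch of a paired t statistic using only the Python standard library. It assumes that the five degrees of freedom reported above correspond to six paired mean accuracies per pipeline (presumably two meditation types × three training sizes). The function name `paired_t` is hypothetical and the input values are made up. The p-value is not computed here; it would come from a t distribution with the returned degrees of freedom.

```python
import math
import statistics

def paired_t(a, b):
    """Return (t, df) for a paired t-test on the differences a[i] - b[i]."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.fmean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Made-up mean accuracies (in percent) for two pipelines; not values from the study.
pipeline_a = [60.0, 62.0, 65.0, 58.0, 61.0, 66.0]
pipeline_b = [70.0, 71.0, 69.0, 66.0, 72.0, 70.0]
t_stat, df = paired_t(pipeline_a, pipeline_b)
print(f"t({df}) = {t_stat:.2f}")
```

With this sign convention, a negative t means the first pipeline has the lower mean accuracy.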
Next, the results obtained in this study are compared with those from our previous study, and
Figure 6,
Figure 7,
Figure 8,
Figure 9,
Figure 10 and
Figure 11 present the corresponding comparisons. The results from this study include “Mean Accuracy (All Tests)”, “Mean Accuracy (Bottom 50%)”, and “Mean Accuracy (Top 50%)”, as shown in
Table 1,
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6. These are compared with the matching experimental results from our previous study, labelled as “Mean Accuracy (Intra-Subject, Previous Study)”. Each graph allows for a comparison of the performances of the three BCI pipelines, CSP, STFT, and CSP + STFT, for a selected meditation type and session pair count.
Figure 6 presents the classification accuracies for LKM-Self/non-meditation using 3 session pairs of EEG data with the three algorithms, CSP, STFT, and CSP + STFT, while
Figure 7 shows the corresponding results for LKM-Others/non-meditation. Similarly,
Figure 8 displays the classification accuracies for LKM-Self/non-meditation using 4 session pairs, and
Figure 9 presents the same for LKM-Others/non-meditation. Finally,
Figure 10 and
Figure 11 show the classification accuracies for LKM-Self and LKM-Others/non-meditation, respectively, when using 5 session pairs of EEG data with the three algorithms.
When we look at
Figure 6,
Figure 7,
Figure 8,
Figure 9,
Figure 10 and
Figure 11 for the 18 different study instances (3 BCI pipelines × 2 meditation types × 3 training session pair sizes), we observe several recurring patterns, which we address one by one below.
The first pattern across all 18 study instances is that the mean accuracy (all tests) in this study was lower than in the same instances of our previous intra-subject study. This is expected, because this study tested inter-subject (subject-independent) classification, meaning that multi-session EEG data from different participants were used for training and testing the algorithms. In contrast, the previous study conducted a single test using multi-session data from a single person for both training and testing. Meditation EEG data from multiple participants involve greater variability than multi-session data from a single individual, which makes it harder to extract meaningful information for the classification algorithms. Nevertheless, good classification performance was achieved in certain instances. Further research is therefore needed to generalize these results and achieve high mean accuracies for subject-independent classification using multiple-session meditation/non-meditation EEG data.
The second pattern is that the mean accuracy (all tests) for these 18 studies had high error values, that is, high variability: the classification accuracies were widely distributed around the mean values. Thus, while the mean accuracies in this study were lower than those in our previous study, some individual tests yielded relatively good accuracies and others relatively poor ones. Because the session pairs were selected randomly, the high variance shows that certain selection combinations produced good classification accuracies. Although we ensured that, for each mental task, the session data were always selected from different participants, some sessions still matched each other more closely, leading to higher classification accuracies. The underlying reasons why certain selection combinations yield high accuracy and others low accuracy remain to be explored.
One possible explanation is that, although the selected participants are labelled as expert meditators, they may still differ in expertise within the expert level. If so, session data randomly drawn from participants with similar levels of meditation skill would tend to produce higher classification accuracies. Whether or not this is the case, the important result is that some individual tests achieved high classification accuracies. Future studies should therefore examine why some session combinations fail to yield good classification accuracies and develop ways to address this. Such studies would contribute meaningfully toward subject-independent, multiple-session meditation/non-meditation classification.
To further illustrate that some session combinations yield high accuracies and others low accuracies, we divided each set of accuracy results into two equal parts and calculated the mean accuracy for each part. For each pipeline, 900, 600, and 150 tests were conducted for three, four, and five session pairs, respectively. In each case, the set was split in half, with the higher half of the values placed in one group and the lower half in the other, and the mean and error were then calculated for each group. These mean values (“Mean Accuracy (Bottom 50%)” and “Mean Accuracy (Top 50%)”) are shown in
Figure 6,
Figure 7,
Figure 8,
Figure 9,
Figure 10 and
Figure 11, enabling us to compare them with the full mean accuracies of this study, “Mean Accuracy (All Tests),” and the results of the past study, “Mean Accuracy (Intra-Subject, Previous Study)”.
When examining
Figure 6,
Figure 7,
Figure 8,
Figure 9,
Figure 10 and
Figure 11, we observe a large difference between the mean accuracies of the bottom 50% and the top 50%, indicating that some session pair combinations yield better classification accuracies than others. Although the overall mean accuracies (all tests) in this study were lower than the corresponding values in the previous study, the mean accuracies of the top 50% were consistently higher than those of the previous intra-subject classification. These positive findings highlight the importance of future research aimed at identifying the factors behind the lower-performing combinations and at improving the overall effectiveness of subject-independent, multiple-session meditation/non-meditation classification.
Additionally, we tested the results of the three-, four-, and five-session pairs independently to examine whether bimodality was present in the classification accuracies obtained in our study (
Table 8). Hartigan’s dip test was used. For the three-session case, unimodality was rejected, indicating a multimodal, plausibly bimodal, distribution among the 5400 test results (900 tests for each of the six combinations of pipeline and meditation type). In contrast, for the four- and five-session cases, which together comprised 4500 tests (3600 + 900), the test failed to reject unimodality, so there was no statistically significant evidence of bimodality. Across the entire dataset, we conclude that although a hint of bimodality is visible, further research is needed to confirm this characteristic in multiple-session meditation/non-meditation EEG data.
Furthermore, findings that improve multiple-session classification will contribute to the future development of subject-independent EEG meditation-guiding apps. Progress in meditation is achieved through rigorous practice across multiple sessions, so any algorithm intended to support a person’s meditation progress should be able to distinguish patterns across sessions, specifically differentiating between meditation and non-meditation states. Although research on apps that support mental calmness and relaxation is still in its early stages, even the most basic apps currently available attract significant attention and high demand. For these reasons, this study on subject-independent, multiple-session meditation/non-meditation classification holds great significance in current EEG research.