2.1. Samples and EEG Acquisition System
In this experiment, sucrose and quinine serve as standard reference substances for preparing sweet- and bitter-tasting solutions, respectively. For the preparation of sweet-tasting solutions at different concentrations, 0.3 g, 0.6 g, 1.2 g, 2.4 g, and 2.8 g of sucrose were weighed and dissolved separately in 100 mL of distilled water to obtain five distinct concentrations. For the preparation of bitter-tasting solutions at different concentrations, 0.0002 g, 0.0004 g, 0.0008 g, 0.0016 g, and 0.0032 g of quinine were weighed and dissolved separately in 100 mL of distilled water to obtain five distinct concentrations.
The EEG acquisition system used was the NCERP-P system, manufactured by Shanghai Nuocheng Electric Co., Ltd. EEG data from 21 channels related to taste were collected at a sampling rate of 256 Hz. The structure of the detection system is shown in Figure 1. According to the International 10–20 electrode placement system, electrodes were precisely positioned at the following locations on the EEG cap produced by Wuhan Green Tech Co., Ltd.: Fz, Cz, Pz, T3, T4, C3, C4, Fp1, Fp2, F7, F8, T5, T6, O1, O2, F3, F4, P3, P4, A1, and A2. The reference electrode was placed on the bilateral earlobes. Conductive gel was applied at each electrode, and the contact conditions were adjusted to ensure that the impedance of each channel remained below 10 kΩ. Upon completion of system assembly, baseline signal acquisition and equipment calibration were implemented sequentially to guarantee the acquisition of high-fidelity EEG signals with minimal artifacts.
In order to achieve accurate regulation and high-sensitivity detection of taste-evoked EEG responses, a custom-developed liquid delivery module was integrated with the EEG recording system. The core components of this liquid delivery module include an STM32F103ZE main control board (supplied by Yehuo Technology Co., Ltd., Dongguan, China), an S15S-53J miniature vacuum pump (manufactured by Hailin Technology Co., Ltd., Chengdu, China), an sfos-1037v-01 electromagnetic valve (provided by Yehuo Electronic Technology Co., Ltd., Dongguan, China), and an AB32-S21P020C-11R liquid flow sensor (produced by ODE Limited, Hong Kong, China). Specifically, the main control board serves as the central command unit, which not only allows for precise tuning of liquid flow velocity and dispensed volume but also executes real-time and accurate control over the on-off status of the electromagnetic valve and the operational parameters of the vacuum pump. The external part of the device is equipped with a soundproof cover and sound-absorbing foam to effectively reduce operational noise, thereby creating a quiet and stable experimental environment. Consequently, a dedicated taste stimulation device (Figure 1) was also designed and applied in this study, ensuring that liquid stimuli entered the subject’s mouth at a constant flow rate and volume. This minimized human error and guaranteed accuracy and reproducibility throughout the stimulation process.
2.2. Taste Detection Experiment
In this study, we designed and conducted a taste-related EEG stimulation experiment. A total of 20 healthy adult subjects were recruited for the experiment, including 10 males and 10 females, aged 20–24 years (mean age 22.15 ± 1.23 years); all were right-handed and had no history of smoking. Pure water stimulation was included as a control condition to rule out the influence of non-taste factors on EEG activity. To keep the data balanced, the number of samples was identical for each subject, for each concentration of the sweet and bitter tastes, and for the pure-water control. All samples were processed according to the same pre-treatment, segmentation, and artifact removal procedures, with no sample bias.
To ensure the stability and reliability of the EEG data collected during the taste experiment, the following experimental protocols were implemented:
(1) Experimental Environment Control: EEG experiments are highly sensitive to external interference. To minimize the effects of environmental noise and lighting variations, the experiment was conducted in a quiet, isolated laboratory equipped with acoustic isolation and electromagnetic shielding systems to avoid external disturbances. Adjustable lighting and light-blocking devices were installed indoors to maintain a soft and stable visual environment. The laboratory was cleaned and ventilated prior to the experiment, in order to prevent odors from affecting the participants’ mood. During the experiment, room temperature was maintained at 22 ± 2 °C and relative humidity at 50 ± 10% RH, in order to ensure participant comfort and EEG signal stability.
(2) Participant State Requirements: Participants were instructed to ensure adequate sleep the night before the experiment, so that they could participate in a good mental state. On the day of the experiment, consumption of caffeine, spicy foods, or any substances that may affect neural activity was prohibited. The use of perfumes or other odor-stimulating products was also avoided, so as to prevent cross-influences between taste and olfaction.
(3) Physiological Preparation and Taste Control: To avoid signal distortion caused by high scalp resistance, the participants were required to wash their hair on the day of the experiment and refrain from using oily hair care products. The experiment was generally scheduled approximately two hours after a meal, in order to minimize the effects of hunger or satiety on taste sensitivity. Before the experiment, the participants rinsed their mouths with water to remove residual taste substances, ensuring the independence and comparability of each stimulus.
(4) Equipment and Electromagnetic Protection: Only the EEG acquisition system, stimulus-control computer, and essential auxiliary devices remained in the laboratory, eliminating interference from other electromagnetic emission sources. Grounding checks and signal calibration were performed before the experiment, in order to ensure that the impedance of all electrodes was below 10 kΩ. Baseline signal detection was completed before the formal experiment began, in order to guarantee stable system operation and the acquisition of authentic and effective signals.
The specific steps of the formal experiment were as follows:
(1) Preparation Phase.
After entering the laboratory, the participant first underwent identity verification and received an explanation of the experiment. The experimenter provided a detailed introduction to the experimental purpose, task requirements, and precautions, and informed consent was obtained. Subsequently, the participant was guided to a comfortable experimental chair, maintaining an upright sitting posture with a relaxed body. Both hands were placed naturally on the table surface, and movement was minimized to reduce the impact of motion artifacts on EEG signals.
(2) Electrode Placement and Calibration.
Following the International 10–20 electrode placement system, electrodes were precisely positioned at the following locations on the EEG cap manufactured by Wuhan Green Tech Co., Ltd. (Wuhan, China): Fz, Cz, Pz, T3, T4, C3, C4, Fp1, Fp2, F7, F8, T5, T6, O1, O2, F3, F4, P3, P4, A1, and A2. The reference electrode was placed on both earlobes. Conductive gel was applied at each electrode, and the contact conditions were adjusted to ensure that the impedance of each channel remained below 10 kΩ. After installation, baseline signal detection and device calibration were performed to ensure stable EEG signals without significant artifacts.
(3) Taste Stimulus Presentation.
This experiment adopted a randomized presentation design. Six concentration gradients (including water) were set for both sweet and bitter solutions, and one concentration was randomly selected for stimulation each time. Six distinct gradients of taste stimulation solutions were freshly prepared and assigned the labels S0-S5 and B0-B5 two hours prior to the formal initiation of the experiment, with ultrapure distilled water (designated as concentration 0) employed as the blank control.
All formulated solutions were then placed in a 37 °C constant-temperature incubator for storage, so as to maintain a stable temperature consistent with that of the human oral cavity. The detailed implementation procedures were as follows:
(1) Prior to the initiation of each experimental trial, the start button was activated to trigger the distilled water rinsing mode of the taste stimulation system. Within approximately 5 s, the system rapidly delivered distilled water to the nozzle, enabling thorough flushing and cleansing of the internal pipelines. This pre-trial rinsing step was implemented to eliminate residual substances that might otherwise introduce interference to the experimental outcomes.
(2) After the distilled water rinsing process, the system automatically shifted to the solution rinsing mode. The experimental solution was pumped to the nozzle at an elevated flow rate for 3 s, ensuring the complete expulsion of any residual distilled water within the pipelines and the full priming of the pipelines with the experimental solution. At the conclusion of the rinsing phase, a buzzer emitted a 0.5 s alert to notify the participant of the imminent start of the trial.
(3) Two seconds after the termination of the rinsing procedure, the taste stimulation system transitioned to the solution stimulation mode. A 2.5 mL aliquot of the stimulation solution was dispensed at a flow rate of 0.5 mL/s over a 5 s duration, with uniform delivery targeted at the region 0.25–0.5 cm above the center of the participant’s tongue. Participants were instructed to retain the solution in their oral cavity without swallowing for 15 s, facilitating the adequate induction of taste-evoked EEG signals.
(4) Upon the completion of EEG data acquisition, the buzzer was activated again to signal the end of the experimental stimulation phase. The participants then performed three rounds of oral rinsing, each with 50 mL of purified water, in order to thoroughly eliminate any residual solution from the oral cavity, thereby preventing taste carryover and cross-stimulation interference. To mitigate taste fatigue, the participants were given a 30 min rest interval before commencing the subsequent experimental trial.
(5) Each participant received only one type of taste stimulus per session, with a total of 5 sessions. In each session, six concentration gradients were randomly selected, and each concentration was repeated for 6 rounds, with the presentation order randomized to reduce order effects. EEG data for pure water were also collected as a control in each session. After each round, the participant rested for 24 h. The entire experiment was monitored by researchers, and the trial could be terminated at any time if discomfort occurred. The timing sequence for each experimental trial is shown in Figure 2, and the overall parallel experimental process is illustrated in Figure 3.
(6) The sampling frequency of EEG detection was 256 Hz, which can effectively capture the dynamic responses of taste perception. Each taste concentration stimulation generated six parallel sets of 15 s EEG data. During sample segmentation, each sample lasted 1 s (256 sampling points), so each subject contributed 6 × 15 = 90 samples under each taste concentration stimulation.
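The segmentation arithmetic above can be illustrated with a minimal sketch. It assumes non-overlapping 1 s windows and a (channels, time) array layout; the function name `segment_trial` and the random placeholder data are illustrative and not part of the original pipeline.

```python
import numpy as np

FS = 256          # sampling rate in Hz (from the text)
TRIAL_SEC = 15    # duration of one recorded trial in seconds
N_REPEATS = 6     # parallel repetitions per concentration
N_CHANNELS = 21   # number of EEG channels


def segment_trial(trial, fs=FS, win_sec=1):
    """Split one (channels, time) trial into non-overlapping windows.

    Returns an array of shape (n_windows, channels, fs * win_sec).
    """
    win = fs * win_sec
    n_win = trial.shape[1] // win
    trimmed = trial[:, : n_win * win]
    # (channels, n_win, win) -> (n_win, channels, win)
    return trimmed.reshape(trial.shape[0], n_win, win).transpose(1, 0, 2)


# Placeholder data standing in for six 15 s recordings of one concentration.
rng = np.random.default_rng(0)
trials = rng.standard_normal((N_REPEATS, N_CHANNELS, FS * TRIAL_SEC))
samples = np.concatenate([segment_trial(t) for t in trials], axis=0)
print(samples.shape)  # (90, 21, 256)
```

Each of the six 15 s recordings yields 15 windows, which gives the 90 samples per subject and concentration stated in the text.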
Before performing deep feature extraction and classification of EEG signals, the raw taste-related EEG data needed to be preprocessed. First, a 0.5–45 Hz bandpass filter was applied; its core purpose was to lock onto the effective rhythm frequency bands of taste-evoked EEG activity, fully retaining the five key frequency bands, namely Delta (0.5–4 Hz), Theta (4–8 Hz), Alpha (8–13 Hz), Beta (13–30 Hz), and Gamma (30–45 Hz), while filtering out very low-frequency baseline drift and ineffective high-frequency noise above 45 Hz [2]. In practical engineering implementation, digital bandpass filters have non-ideal transition band characteristics. The cutoff at 45 Hz is not abrupt, and some residual high-frequency components remain in the 45–50 Hz range [9,33]. Therefore, we added a 47 Hz lowpass filter as a protective measure, further suppressing the residual energy in the 45–50 Hz range, preventing edge components of power-line interference from entering the effective frequency band, and reducing the interference burden for subsequent precise notch filtering. Furthermore, the 50 Hz notch filter is a narrowband interference suppression technique, which is functionally complementary to the bandpass and lowpass filters [5]. In laboratory environments, 50 Hz power-line interference has strong energy and sharp peaks, which can easily pass through the transition band of a conventional bandpass filter. The bandpass filter only provides broadband attenuation and cannot precisely eliminate this strong, single-frequency interference. The 50 Hz notch filter can deeply suppress power-line noise at a specific frequency, while maximally preserving the effective rhythm signals within 0.5–45 Hz, avoiding the loss of useful information. In summary, the preprocessing in this study strictly followed the order of broadband filtering first, followed by precise noise suppression. Specifically, we first applied a 0.5–45 Hz bandpass filter to define the effective signal range; then, we used a 47 Hz lowpass filter to compensate for the bandpass transition band defect; finally, we applied a 50 Hz notch filter to remove residual interference. This order prevented the progressive propagation of interference and minimized the phase shift and amplitude distortion caused by cascaded filtering.
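The three-stage filtering order described above could be sketched as follows with SciPy. The text does not state the filter family, order, or notch quality factor, so the fourth-order Butterworth designs, the quality factor `Q = 30`, and the zero-phase (forward-backward) application are assumptions made only for illustration.

```python
import numpy as np
from scipy import signal

FS = 256.0  # sampling rate in Hz (from the text)


def preprocess(eeg, fs=FS):
    """Band-pass 0.5-45 Hz, low-pass 47 Hz, then 50 Hz notch along the last axis.

    Filter order (4) and notch quality factor (30) are assumed values.
    """
    sos_bp = signal.butter(4, [0.5, 45.0], btype="bandpass", fs=fs, output="sos")
    sos_lp = signal.butter(4, 47.0, btype="lowpass", fs=fs, output="sos")
    b_notch, a_notch = signal.iirnotch(50.0, Q=30.0, fs=fs)
    # Forward-backward filtering gives zero phase shift at every stage.
    out = signal.sosfiltfilt(sos_bp, eeg, axis=-1)
    out = signal.sosfiltfilt(sos_lp, out, axis=-1)
    out = signal.filtfilt(b_notch, a_notch, out, axis=-1)
    return out


# Synthetic check: a 10 Hz rhythm contaminated by 50 Hz power-line noise.
t = np.arange(int(15 * FS)) / FS
raw = np.sin(2 * np.pi * 10 * t) + np.sin(2 * np.pi * 50 * t)
clean = preprocess(raw)
```

On this synthetic signal the 10 Hz component should pass almost unchanged while the 50 Hz component is strongly attenuated, which mirrors the intent of the cascade.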
2.3. Wavelet Graph Convolutional Neural Network
EEG data are characterized by two essential properties: multiscale rhythmicity and spatial nonstationarity [2,5,6]. Taste-evoked EEG is a strongly non-stationary signal; its neural responses have instantaneous peaks and sustained activation over time. Classical bandpass filtering would lose this key dynamic response information. In contrast, WT can localize signals in both the time and frequency domains; it not only extracts standard frequency bands, but also accurately captures energy changes at each time point, making it more suitable for transient, evoked neural activities such as taste perception. First, WT is applied to the preprocessed EEG signals for time–frequency decomposition [27]. This method adaptively captures transient rhythmic events at each electrode, yielding five frequency bands: Delta, Theta, Alpha, Beta, and Gamma. Stable and reproducible correlations exist between rhythms of different frequencies and specific physiological states or cognitive functions of the brain [34]. These correlations provide a biophysically grounded and functionally well-defined standardized analytical framework for understanding and decoding brain states. Subsequently, to address the nonstationary nature of the signals, the maximum and average energy within each frequency band are computed via WT, representing the salient nonstationary features and overall trend characteristics of each band [27]. Wavelet energy calculation is stable and highly noise-resistant. Taste-evoked EEG signals are strongly non-stationary, and energy can directly reflect both the intensity and dynamic changes in neural activity. Maximum energy captures the peak of neural responses, corresponding to the moment of strongest activation elicited by taste stimuli. Average energy represents the overall level of neural activity across the entire time window, reflecting the sustained characteristics of taste processing. The two measures are complementary, enabling comprehensive quantification of neural rhythm differences across the Delta, Theta, Alpha, Beta, and Gamma frequency bands, thereby enhancing the ability to discriminate between sweet and bitter tastes as well as their concentrations. Next, graph structures are constructed from the features computed for different electrodes, where electrodes serve as graph nodes and the Pearson correlation coefficients between electrodes form the adjacency matrix. The calculation of Pearson correlation coefficients is efficient, stable, and suitable for quickly constructing graph structures for low-channel EEG; it can quantify the functional connection strength between electrodes. Finally, a WGCNet is designed to classify EEG data corresponding to different concentrations of sweet and bitter tastes.
Figure 4 shows the main structure of WGCNet, and the model consists of the following four main modules:
(1) Time–Frequency Decomposition via WT.
The rhythmic characteristics of EEG signals are distributed across different frequency ranges, and specific frequency bands are closely associated with the neural processing of taste perception. To accurately capture the transient rhythmic responses of each electrode under taste stimulation, this study employs the Morlet wavelet as the mother wavelet for time–frequency decomposition [35,36]. The Morlet wavelet offers both excellent temporal localization and frequency resolution, enabling adaptive tracking of instantaneous frequency changes in EEG signals; its mathematical expression is as follows:
$$\psi(t) = \pi^{-1/4} e^{j\omega_0 t} e^{-t^2/2}, \qquad W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} x(t)\, \psi^{*}\!\left(\frac{t - b}{a}\right) \mathrm{d}t$$
where $\omega_0$ represents the carrier frequency, which ensures a balanced resolution of the wavelet in both the time and frequency domains, and $W(a, b)$ is the decomposition result of the signal $x(t)$ at scale $a$ and time shift $b$. During the pre-experiment, we also examined the influence of different wavelet basis functions on model classification performance. The results of this comparison are presented in Table S1 of the Supplementary File. The specific procedure for time–frequency decomposition is as follows:
Step 1: After preprocessing the raw EEG signals to remove baseline drift and power-line interference, the Morlet wavelet is applied to perform continuous WT on the EEG signals from 21 channels (each channel comprising 256 sampling points). This step calculates the inner product between the signal and wavelet bases at different scales, yielding a time–frequency coefficient matrix with dimensions T × F, where T = 256 represents the number of timepoints and F corresponds to the number of frequency bands defined by the scaling.
Step 2: Based on the physiological rhythmic characteristics of EEG signals, the time–frequency coefficients are divided into five typical frequency bands: Delta, Theta, Alpha, Beta, and Gamma. Each band corresponds to an independent sub-matrix of time–frequency coefficients, enabling the separate extraction of distinct neural rhythmic features.
Through wavelet time–frequency decomposition, the one-dimensional (256 × 1) time-domain EEG signal is transformed into a two-dimensional (256 × 5) time–frequency feature matrix. This representation retains the temporal dynamics of the signal while clarifying its frequency–rhythm distribution, thereby providing a structured data foundation for subsequent feature quantification.
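As a rough illustration of Steps 1 and 2, the sketch below computes a continuous wavelet transform with an analytic Morlet wavelet in the frequency domain and then groups the coefficient columns by band. The carrier frequency `w0 = 6`, the 1 Hz frequency grid, the scale normalisation, and the helper names `morlet_cwt` and `split_bands` are assumptions for this example; the authors' actual implementation is not given in the text.

```python
import numpy as np

BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}  # band edges in Hz (from the text)


def morlet_cwt(x, fs, freqs, w0=6.0):
    """Continuous wavelet transform of a 1-D signal with an analytic Morlet wavelet.

    Returns a complex array of shape (len(x), len(freqs)).
    """
    n = len(x)
    omega = 2.0 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)  # angular frequencies (rad/s)
    x_hat = np.fft.fft(x)
    coeffs = np.empty((n, len(freqs)), dtype=complex)
    for j, f in enumerate(freqs):
        s = w0 / (2.0 * np.pi * f)  # scale (in seconds) whose spectral peak sits at f
        # Fourier transform of the scaled wavelet, kept on positive frequencies only.
        psi_hat = np.pi ** -0.25 * np.exp(-0.5 * (s * omega - w0) ** 2) * (omega > 0)
        norm = np.sqrt(2.0 * np.pi * s * fs)  # assumed per-scale energy normalisation
        coeffs[:, j] = np.fft.ifft(x_hat * norm * psi_hat)
    return coeffs


def split_bands(coeffs, freqs, bands=BANDS):
    """Group coefficient columns into one sub-matrix per frequency band."""
    return {name: coeffs[:, (freqs >= lo) & (freqs < hi)]
            for name, (lo, hi) in bands.items()}


fs = 256
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 10 * t)      # synthetic 10 Hz (alpha-band) rhythm
freqs = np.arange(1.0, 45.0)        # 1 Hz grid; a 1 s window cannot resolve 0.5 Hz
W = morlet_cwt(x, fs, freqs)
sub = split_bands(W, freqs)
```

For the synthetic 10 Hz input, the time-averaged power is largest in the column for 10 Hz, which falls inside the Alpha sub-matrix.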
(2) Computation of Maximum and Mean Wavelet Energy Values.
EEG signals evoked by taste stimulation exhibit marked non-stationary characteristics. Variations in energy across different frequency bands directly reflect the intensity and dynamic trends of neural activity. To quantify both the non-stationary features and the overall patterns within each frequency band, this study calculates the maximum and mean energy values from the time–frequency coefficient matrix of each band as core features.
Wavelet Energy Calculation: For the $c$-th channel and the $b$-th frequency band, let $W_{c,b}$ (of size $T \times F_b$) denote the time–frequency coefficient matrix of the band (where $F_b$ represents the number of frequency points in that band). The instantaneous energy at each time point $t$ is defined as follows:
$$E_{c,b}(t) = \sum_{f=1}^{F_b} \left| W_{c,b}(t, f) \right|^{2}$$
where $\left| W_{c,b}(t, f) \right|$ represents the magnitude of the time–frequency coefficient. The energy calculation reflects the intensity of neural activity within that frequency band at the corresponding timepoint.
Extraction of Maximum Energy Values: For each channel and each frequency band, the maximum value is extracted from the instantaneous energy sequence $E_{c,b}(t)$ ($t = 1, 2, \ldots, T$). This metric highlights the peak neural responses elicited by taste stimulation, captures the most prominent rhythmic activity characteristics, and reflects the intensity of key activation moments during neural processing. The calculation is expressed as follows:
$$E^{\max}_{c,b} = \max_{1 \le t \le T} E_{c,b}(t)$$
Extraction of Mean Energy Values: The mean value of the instantaneous energy sequence is calculated for each channel and each frequency band. This metric characterizes the average level of neural activity within the band over the entire stimulus period, reflecting the overall activation trend of neural rhythms during taste perception. The calculation is expressed as follows:
$$E^{\mathrm{mean}}_{c,b} = \frac{1}{T} \sum_{t=1}^{T} E_{c,b}(t)$$
Ultimately, the feature vector for each channel has a dimension of 2 × 5 = 10 (5 frequency bands × 2 energy metrics). Across all 21 channels, this yields an original feature response matrix of size 21 × 10, enabling cross-subject classification based on the time–frequency characteristics of the EEG signals.
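The construction of the 21 × 10 feature matrix can be sketched as below. The input is assumed to be an array of wavelet coefficients with shape (channels, time, frequencies); the column ordering (five maximum energies followed by five mean energies) and the function name `energy_features` are choices made for this example rather than details confirmed by the text.

```python
import numpy as np

BAND_EDGES = [(0.5, 4), (4, 8), (8, 13), (13, 30), (30, 45)]  # Delta to Gamma, in Hz


def energy_features(coeffs, freqs):
    """Map (channels, time, freqs) wavelet coefficients to (channels, 10) features.

    Columns 0-4 hold the maximum band energy, columns 5-9 the mean band energy.
    """
    power = np.abs(coeffs) ** 2
    e_max, e_mean = [], []
    for lo, hi in BAND_EDGES:
        mask = (freqs >= lo) & (freqs < hi)
        inst = power[:, :, mask].sum(axis=2)  # instantaneous energy per time point
        e_max.append(inst.max(axis=1))
        e_mean.append(inst.mean(axis=1))
    return np.stack(e_max + e_mean, axis=1)


# Placeholder coefficients: 21 channels, 256 time points, 44 frequencies (1-44 Hz).
rng = np.random.default_rng(0)
freqs = np.arange(1.0, 45.0)
coeffs = rng.standard_normal((21, 256, 44)) + 1j * rng.standard_normal((21, 256, 44))
features = energy_features(coeffs, freqs)
print(features.shape)  # (21, 10)
```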
(3) Graph Structure Construction for EEG Features.
Advanced graph-based EEG classification models can significantly improve feature representation and classification ability. EEG electrodes can be naturally modeled as graph nodes, and the functional connections between electrodes as graph edges. Graph computing can precisely mine spatial topology and functional synergy among brain regions, and it also adapts well to the non-Euclidean structure and strong spatial dependencies of EEG; therefore, applying it to EEG recognition has strong physiological and technical justification [
37]. In the graph modeling of this paper, the nodes are the 21 EEG electrodes. The edge set consists of the connections between all electrode pairs. The adjacency matrix is a connection strength matrix, where each connection is quantified using the Pearson correlation coefficient. The edge set defines the structure of connections between electrodes, while the adjacency matrix provides the numerical expression of those connection strengths. The relationship between the two is that of “structural definition” versus “numerical representation”. The specific procedure is as follows:
Step 1: Definition of Graph Nodes: The 21 EEG electrodes are defined as graph nodes. The feature vector of each node corresponds to the 10-dimensional energy feature set derived from the corresponding electrode, forming a graph node feature matrix.
Step 2: Construction of the Adjacency Matrix: The Pearson Correlation Coefficients are adopted to quantify the strength of feature-based associations between electrodes, serving as the edge weight of the graph (adjacency matrix).
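Step 3: Standardization of the Adjacency Matrix: Since Pearson correlation coefficients can be negative, whereas the undirected graph used in graph convolution requires non-negative edge weights, the absolute values of the entries of the adjacency matrix are taken.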
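The three steps above can be condensed into a short sketch. It assumes that the correlation is computed between the 10-dimensional feature vectors of each electrode pair; zeroing the diagonal (no self-loops) is an additional assumption that the text does not specify, and the name `build_adjacency` is illustrative.

```python
import numpy as np


def build_adjacency(features):
    """Map (nodes, dims) node features to a (nodes, nodes) adjacency matrix.

    Entries are absolute Pearson correlation coefficients between node feature vectors.
    """
    adj = np.abs(np.corrcoef(features))  # each row is treated as one variable
    np.fill_diagonal(adj, 0.0)           # assumption: self-loops are removed
    return adj


rng = np.random.default_rng(0)
features = rng.standard_normal((21, 10))  # placeholder for the 21 x 10 energy features
adj = build_adjacency(features)
print(adj.shape)  # (21, 21)
```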
By constructing this graph structure, the initially dispersed electrode features are integrated into spatially correlated, structured data, effectively exploring the cooperative processing of information among brain regions and providing a basis for capturing the distributed neural mechanisms underlying taste perception. The procedure for building the Graph Convolutional Network (GCN) is detailed below:
In a GCN, $G = (V, E, A)$ denotes an undirected graph [38], where $V$ represents the set of nodes, $E$ represents the set of edges, and $A$ is the adjacency matrix that defines the connections between nodes. The structure of the graph can be expressed by the Laplacian matrix ($L$), which is formulated as follows:
$$L = D - A$$
where $D$ denotes the degree matrix, and the $i$-th diagonal element thereof is calculated as follows:
$$D_{ii} = \sum_{j} A_{ij}$$
In an undirected graph, $L$ is a positive semidefinite matrix, and $L = L^{\mathsf{T}}$. It undergoes spectral decomposition to yield:
$$L = U \Lambda U^{-1}$$
where $U = [u_1, u_2, \ldots, u_n]$ denotes mutually orthogonal eigenvectors, which serve as the Fourier basis, and $n$ is the number of nodes. $\Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_n)$ represents a diagonal matrix, and $\lambda_i$ indicates the eigenvalue corresponding to $u_i$. Given that $U$ is an orthogonal matrix ($U U^{\mathsf{T}} = I$), the above expression can be rewritten as follows:
$$L = U \Lambda U^{\mathsf{T}}$$
The graph Fourier transform converts signals defined in the spatial domain to the spectral domain in order to facilitate convolution operations, followed by an inverse transformation back to the spatial domain. The conversion of a graph signal $x$ from the spatial domain to the spectral domain can be formulated as follows:
$$\hat{x} = U^{\mathsf{T}} x$$
The inverse graph Fourier transform of the signal is formulated as follows:
$$x = U \hat{x}$$
Accordingly, graph convolution can be formulated as follows:
$$x *_{G} g = U \left( (U^{\mathsf{T}} x) \odot (U^{\mathsf{T}} g) \right)$$
where $*_{G}$ represents graph convolution, $\odot$ denotes the Hadamard product, and $g$ is the convolution kernel. Let $g_{\theta} = \mathrm{diag}(U^{\mathsf{T}} g)$. Ultimately, the expression of graph convolution can be converted into
$$x *_{G} g = U g_{\theta} U^{\mathsf{T}} x$$
The trainable parameters of the convolution kernel can be formulated as follows:
$$g_{\theta} = \mathrm{diag}(\theta_1, \theta_2, \ldots, \theta_n)$$
However, GCN involves computationally expensive eigendecomposition. Moreover, the number of graph convolution parameters equals the number of graph nodes. Consequently, an excessive number of convolution parameters can impair decision performance. It is noteworthy that the graph convolution kernel is inevitably multiplied by the matrix $U$, leading to a computational complexity of $O(n^2)$ for GCN. To address this issue, Chebyshev polynomials are adopted in this study to simultaneously reduce the number of parameters and lower the computational complexity. Initially, a polynomial approximation is applied to the convolution kernel, which is expressed as follows:
$$g_{\theta}(\Lambda) \approx \sum_{k=0}^{K-1} \theta_k \Lambda^{k}$$
where $K$ denotes the number of retained polynomial terms (orders $0$ to $K-1$), with $K$ being substantially smaller than the number of nodes ($n = 21$). This polynomial approximation reduces the number of convolution kernel parameters from $n$ to $K$. Nevertheless, since the input signal still requires multiplication by the matrix $U$, the computational complexity of the graph convolution operation remains at $O(n^2)$. Therefore, Chebyshev polynomials are adopted, and the convolution operation is termed Chebyshev convolution [39]. The convolution kernel can be transformed as follows:
$$U g_{\theta}(\Lambda) U^{\mathsf{T}} \approx \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})$$
where $\theta_k$ denotes the Chebyshev polynomial coefficient, and $T_k(\tilde{L})$ represents the Chebyshev polynomial evaluated at the scaled Laplacian matrix $\tilde{L} = 2L/\lambda_{\max} - I$. In the implementation of graph convolution computations, the Chebyshev polynomial can be formulated as follows:
$$T_0(\tilde{L}) = I, \quad T_1(\tilde{L}) = \tilde{L}, \quad T_k(\tilde{L}) = 2\tilde{L}\, T_{k-1}(\tilde{L}) - T_{k-2}(\tilde{L})$$
Subsequent to polynomial fitting, the convolution operation can be rephrased as follows:
$$x *_{G} g \approx \sum_{k=0}^{K-1} \theta_k T_k(\tilde{L})\, x$$
where $\theta_k$, $T_k(\tilde{L})$, and $\tilde{L}$ are defined as above. With the adoption of this strategy, eigendecomposition of the Laplacian matrix is no longer necessary, and the computational complexity is reduced from $O(n^2)$ to $O(Kn)$. Concurrently, benefiting from the polynomial-based calculation paradigm, the number of convolution parameters is diminished from $n$ to $K$. In the present study, the parameter $K$ was set to 2.
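The equivalence between the Chebyshev recurrence in the vertex domain and filtering in the spectral domain can be checked numerically with a small sketch. The function names, the random test graph, and the use of `numpy.linalg.eigvalsh` to obtain the largest eigenvalue are choices made for this example; in practice the largest eigenvalue is often only estimated so that no eigendecomposition is needed.

```python
import numpy as np


def scaled_laplacian(adj):
    """Return 2 L / lambda_max - I for the combinatorial Laplacian L = D - A."""
    lap = np.diag(adj.sum(axis=1)) - adj
    lam_max = np.linalg.eigvalsh(lap).max()
    return 2.0 * lap / lam_max - np.eye(adj.shape[0])


def cheb_conv(x, lap_tilde, theta):
    """Apply sum_k theta[k] * T_k(lap_tilde) @ x using the three-term recurrence."""
    t_prev, t_curr = x, lap_tilde @ x
    out = theta[0] * t_prev
    if len(theta) > 1:
        out = out + theta[1] * t_curr
    for k in range(2, len(theta)):
        t_prev, t_curr = t_curr, 2.0 * (lap_tilde @ t_curr) - t_prev
        out = out + theta[k] * t_curr
    return out


# Random symmetric, non-negative adjacency matrix on 21 nodes (placeholder graph).
rng = np.random.default_rng(0)
a = rng.random((21, 21))
adj = (a + a.T) / 2.0
np.fill_diagonal(adj, 0.0)

lap_tilde = scaled_laplacian(adj)
x = rng.standard_normal(21)
theta = np.array([0.5, -0.3, 0.2])   # three illustrative coefficients
y_vertex = cheb_conv(x, lap_tilde, theta)

# Reference: the same filter applied in the spectral domain.
lam, u = np.linalg.eigh(lap_tilde)
g = np.polynomial.chebyshev.chebval(lam, theta)
y_spectral = u @ (g * (u.T @ x))
print(np.allclose(y_vertex, y_spectral))
```

Only matrix-vector products with the scaled Laplacian are needed in `cheb_conv`, which is the source of the reduced cost discussed above.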
(4) WGCNet Architecture.
Based on the computational procedures described above, WGCNet was designed (pseudocode is given in Algorithm 1). First, two graph convolutional (GC) layers were employed to extract deep EEG features. To balance computational complexity and feature-extraction effectiveness, each graph convolutional layer used 20 kernels, as determined in preliminary experiments. Subsequently, a fast graph pooling operation was applied to aggregate and reduce the dimensionality of the deep EEG features. Finally, two fully connected (FC) layers performed nonlinear mapping from the deep EEG features to the labels corresponding to different taste concentrations. The first FC layer contained 32 neurons, and the second FC layer contained 6 neurons.
| Algorithm 1: Wavelet Graph Convolutional Neural Network (WGCNet) |
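| Input: EEG signals: $X \in \mathbb{R}^{N \times C \times T}$ (N = samples, C = 21 channels, T = 256 timepoints); Num_classes = 6 |
| Output: Classification probability: $\hat{y} \in \mathbb{R}^{N \times 6}$ |
| 1. For each sample $x_i$: |
| a. Morlet wavelet: Decompose each channel into Delta/Theta/Alpha/Beta/Gamma bands |
| b. Compute for each band: Instant energy $E_{c,b}(t)$, then $E^{\max}_{c,b}$, $E^{\mathrm{mean}}_{c,b}$ |
| c. Channel feature: $f_c = [E^{\max}_{c,1}, \ldots, E^{\max}_{c,5}, E^{\mathrm{mean}}_{c,1}, \ldots, E^{\mathrm{mean}}_{c,5}] \in \mathbb{R}^{10}$ |
| d. Sample feature matrix: $F \in \mathbb{R}^{21 \times 10}$ |
| 2. Build graph: Nodes = F; Adjacency matrix $A = \lvert \mathrm{PCC}(F) \rvert$ |
| 3. Chebyshev GCN (K = 2): |
| a. Laplacian $L = D - A$, scaled Laplacian $\tilde{L} = 2L/\lambda_{\max} - I$ |
| b. Chebyshev polynomials: $T_0 = I$, $T_1 = \tilde{L}$, $T_k = 2\tilde{L} T_{k-1} - T_{k-2}$ |
| c. GConv1: 20 kernels |
| d. GConv2: 20 kernels |
| 4. Pooling + FC: |
| a. Pooling = FastAveragePooling (GConv2) |
| b. FC1: 32 neurons |
| c. FC2: 6 neurons |
| 5. Return $\hat{y}$ |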
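To make the tensor shapes in Algorithm 1 concrete, the following is a minimal NumPy forward pass for a single sample. Several details are assumptions that the text does not specify: ReLU activations, Glorot-style random initialisation, bias terms only in the FC layers, and average pooling taken as a global mean over the 21 nodes. The function names are illustrative, and training is not shown.

```python
import numpy as np


def cheb_basis(lap_tilde, k_terms):
    """Return [T_0(L~), ..., T_{K-1}(L~)] via the Chebyshev recurrence."""
    basis = [np.eye(lap_tilde.shape[0]), lap_tilde]
    for _ in range(2, k_terms):
        basis.append(2.0 * lap_tilde @ basis[-1] - basis[-2])
    return basis[:k_terms]


def graph_conv(h, basis, weights):
    """h: (nodes, in_dim); weights: (K, in_dim, out_dim) -> (nodes, out_dim)."""
    return sum(t @ h @ w for t, w in zip(basis, weights))


def relu(z):
    return np.maximum(z, 0.0)


def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()


def init_params(rng, in_dim=10, n_kernels=20, k_terms=2, hidden=32, n_classes=6):
    def glorot(*shape):  # assumed initialisation scheme
        return rng.standard_normal(shape) * np.sqrt(2.0 / (shape[-2] + shape[-1]))
    return {
        "gc1": glorot(k_terms, in_dim, n_kernels),
        "gc2": glorot(k_terms, n_kernels, n_kernels),
        "fc1": glorot(n_kernels, hidden), "b1": np.zeros(hidden),
        "fc2": glorot(hidden, n_classes), "b2": np.zeros(n_classes),
    }


def wgcnet_forward(features, adj, params, k_terms=2):
    """features: (21, 10) energy features; adj: (21, 21) -> class probabilities (6,)."""
    lap = np.diag(adj.sum(axis=1)) - adj
    lap_tilde = 2.0 * lap / np.linalg.eigvalsh(lap).max() - np.eye(len(adj))
    basis = cheb_basis(lap_tilde, k_terms)
    h = relu(graph_conv(features, basis, params["gc1"]))   # GConv1: 20 kernels
    h = relu(graph_conv(h, basis, params["gc2"]))          # GConv2: 20 kernels
    pooled = h.mean(axis=0)                                # assumed global average pooling
    z = relu(pooled @ params["fc1"] + params["b1"])        # FC1: 32 neurons
    return softmax(z @ params["fc2"] + params["b2"])       # FC2: 6 neurons


rng = np.random.default_rng(0)
features = rng.standard_normal((21, 10))   # placeholder node features
adj = np.abs(np.corrcoef(features))
np.fill_diagonal(adj, 0.0)
probs = wgcnet_forward(features, adj, init_params(rng))
print(probs.shape)  # (6,)
```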
2.4. Cross-Subject Training Based on Meta-Learning
The core goal of meta-learning is to enable the model to “learn how to learn”, rather than to directly learn the EEG features of a specific group of subjects. This aims to solve the problem of sharp accuracy degradation in cross-subject recognition caused by individual brain differences. Traditional deep learning models are trained on one batch of subjects and applied directly to new subjects, which often fails due to distribution differences across individuals. In contrast, meta-learning extracts general taste-related neural representations from multiple subjects and learns a set of “good initialization parameters”. This allows the model to quickly adapt to a completely new subject with only a small number of samples and a few gradient update steps. The technical process is as follows:
Step 1: Data Construction.
A total of 10,800 samples (20 subjects × 6 classes × 90 samples per class) are partitioned by subject into 20 independent “task domains”. The data from each subject constitutes a distinct meta-task, within which support and query sets are defined. For each subject, 30 samples per class are randomly selected to form the support set (following a 6-way 30-shot task structure), while the remaining 60 samples per class serve as the query set. The support set is used for meta-model training to enable rapid learning, and the query set is used for evaluation and meta-loss computation. Let the overall dataset be denoted as $\mathcal{D} = \{D_1, D_2, \ldots, D_{20}\}$, where $D_i$ represents the EEG data of the $i$-th subject. Each $D_i$ comprises six concentration-level labels (including a control condition). For each subject dataset $D_i$, the support set $S_i$ and query set $Q_i$ are partitioned according to the “6-way 30-shot” scheme. The meta-task $\mathcal{T}_i = (S_i, Q_i)$ is thus constructed accordingly.
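The 6-way 30-shot partition for one subject might be implemented along the following lines. The function name `split_support_query`, the fixed random seed, and the label layout (90 consecutive samples per class) are assumptions made for this illustration.

```python
import numpy as np


def split_support_query(labels, n_shot=30, seed=0):
    """Return (support_idx, query_idx) with n_shot samples per class in the support set."""
    rng = np.random.default_rng(seed)
    support, query = [], []
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        support.extend(idx[:n_shot])   # randomly chosen support samples
        query.extend(idx[n_shot:])     # remaining samples form the query set
    return np.array(support), np.array(query)


labels = np.repeat(np.arange(6), 90)   # one subject: 6 classes x 90 samples
s_idx, q_idx = split_support_query(labels)
print(len(s_idx), len(q_idx))  # 180 360
```

Repeating this split for each of the 20 subjects gives 20 meta-tasks covering all 20 × 6 × 90 = 10,800 samples.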
Step 2: Meta-learning Training and Cross-Validation.
(1) Outer-Loop Leave-One-Subject-Out Validation: A rigorous cross-subject evaluation is performed using the leave-one-subject-out method. Specifically, the data from one subject are held out as the meta-test task (representing a completely unseen subject), while the data from the remaining 19 subjects serve as meta-training tasks.
(2) Inner-Loop Meta-Cross-Training: Among the 19 meta-training task domains, a five-fold cross-validation scheme (4, 4, 4, 4, 3) is adopted to partition the meta-training and meta-validation sets. The meta-learning model, which employs model-agnostic meta-learning (MAML), learns a good initial parameter set directly, enabling rapid adaptation to a new subject with only a few gradient steps. This model iteratively trains the WGCNet on the meta-training set. The core process is as follows: For each meta-task, a few gradient update steps are performed on its support set (rapid adaptation), after which the loss is computed on the query set. The gradients are then aggregated to update the initialization parameters of the meta-model. The objective is to enable the model to learn how to quickly adapt to a new subject. The meta-validation set is used for model parameter tuning and identifying the optimal classification network. Let the initial parameters of the meta-model be denoted as $\theta$. For each meta-training task $\mathcal{T}$, rapid task-specific adaptation is first carried out using its support set $S$. The updated task parameters, denoted as $\theta'$, are defined as follows:
$$\theta' = \theta - \alpha \nabla_{\theta} \mathcal{L}\left(f_{\theta}(S), y_S\right)$$
where $\alpha$ is the within-task learning rate (set to 0.01); $\mathcal{L}$ denotes the cross-entropy loss function with $L_2$ regularization (the coefficient of the penalty term is 0.01). $f_{\theta}(S)$ represents the predicted output of the WGCNet model with parameters $\theta$ for the support set $S$, and $y_S$ denotes the ground-truth labels of the support set $S$. A multi-step adaptation strategy (default: 3 steps) is employed during training, meaning that $\theta'$ undergoes multiple iterative updates:
$$\theta'_{m} = \theta'_{m-1} - \alpha \nabla_{\theta'_{m-1}} \mathcal{L}\left(f_{\theta'_{m-1}}(S), y_S\right), \quad m = 1, 2, 3, \quad \theta'_{0} = \theta$$
The meta-loss function is defined as the average loss over the query sets of all meta-training tasks under the adapted parameters $\theta'$:
$$\mathcal{L}_{\mathrm{meta}}(\theta) = \frac{1}{|\mathcal{B}|} \sum_{\mathcal{T} \in \mathcal{B}} \mathcal{L}\left(f_{\theta'}(Q), y_Q\right)$$
where $\mathcal{B}$ denotes a subset of meta-training tasks, $y_Q$ represents the ground-truth labels of the query set $Q$, and $f_{\theta'}(Q)$ is the predicted output of the model on the query set after its parameters are updated to $\theta'$. The meta-parameters $\theta$ are updated via gradient descent on the meta-loss, following the update rule:
$$\theta \leftarrow \theta - \beta \nabla_{\theta} \mathcal{L}_{\mathrm{meta}}(\theta)$$
where $\beta$ represents the meta-learning rate, set to 0.01.
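The inner and outer loops can be illustrated with a deliberately simplified sketch. Two substitutions are made and should be read as assumptions: a linear softmax classifier stands in for WGCNet so that gradients can be written by hand, and the first-order approximation of MAML is used, in which the query-set gradient at the adapted parameters replaces the full second-order meta-gradient. The learning rates, the three adaptation steps, and the penalty coefficient follow the text; the synthetic tasks and function names are illustrative.

```python
import numpy as np


def loss_and_grad(w, x, y, n_classes=6, l2=0.01):
    """L2-regularised softmax cross-entropy of a linear model and its gradient."""
    logits = x @ w
    logits = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    onehot = np.eye(n_classes)[y]
    loss = -np.mean(np.log(p[np.arange(len(y)), y] + 1e-12)) + 0.5 * l2 * np.sum(w ** 2)
    grad = x.T @ (p - onehot) / len(y) + l2 * w
    return loss, grad


def adapt(w, x_s, y_s, alpha=0.01, steps=3):
    """Inner loop: a few gradient steps on the support set."""
    for _ in range(steps):
        _, g = loss_and_grad(w, x_s, y_s)
        w = w - alpha * g
    return w


def meta_step(w, tasks, alpha=0.01, beta=0.01, steps=3):
    """Outer loop (first-order approximation): average query-set gradients
    evaluated at the adapted parameters and update the shared initialisation."""
    meta_grad = np.zeros_like(w)
    meta_loss = 0.0
    for x_s, y_s, x_q, y_q in tasks:
        w_adapted = adapt(w, x_s, y_s, alpha, steps)
        loss_q, g_q = loss_and_grad(w_adapted, x_q, y_q)
        meta_grad += g_q / len(tasks)
        meta_loss += loss_q / len(tasks)
    return w - beta * meta_grad, meta_loss


rng = np.random.default_rng(0)


def make_task(n_support=30, n_query=60, dim=10, n_classes=6):
    """Synthetic stand-in for one subject: weakly class-dependent features."""
    def draw(n):
        y = np.repeat(np.arange(n_classes), n)
        x = rng.standard_normal((len(y), dim))
        x[:, :n_classes] += np.eye(n_classes)[y]
        return x, y
    return (*draw(n_support), *draw(n_query))


tasks = [make_task() for _ in range(4)]
w0 = np.zeros((10, 6))
w1, meta_loss = meta_step(w0, tasks)
```

A full MAML implementation would differentiate through the inner-loop updates, which requires an automatic differentiation framework and is outside the scope of this sketch.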
Step 3: Testing and Evaluation.
When the trained meta-model encounters a test subject, the data of that subject are defined as a meta-test task $\mathcal{T}_{\mathrm{test}}$. Using the 30 samples per class from this subject as the support set $S_{\mathrm{test}}$, rapid adaptation is performed. The adapted parameters for the new task are obtained as follows:
$$\theta'_{\mathrm{test}} = \theta - \alpha \nabla_{\theta} \mathcal{L}\left(f_{\theta}(S_{\mathrm{test}}), y_{S_{\mathrm{test}}}\right)$$
where $\theta$ denotes the trained meta-parameters, $\alpha$ is the within-task learning rate, and $\mathcal{L}$ is the loss function. The classification performance is then evaluated using the query set $Q_{\mathrm{test}}$ of that subject. This process is repeated 20 times, each time leaving out a different subject as the test domain. Finally, the average performance over all 20 runs is computed as the final metric for cross-subject generalization capability. The performance evaluation indicators of the leave-one-out method include accuracy, precision, recall rate, and F1-score [40,41].
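The evaluation protocol can be outlined with the sketch below, which enumerates the leave-one-subject-out splits with the (4, 4, 4, 4, 3) inner folds and computes the four reported metrics from a confusion matrix. Macro-averaging over the six classes and the random assignment of subjects to folds are assumptions, since the text does not state how the metrics are averaged or how folds are formed; the function names are illustrative.

```python
import numpy as np


def loso_splits(n_subjects=20, fold_sizes=(4, 4, 4, 4, 3), seed=0):
    """Yield (test_subject, folds), where folds partition the remaining subjects."""
    rng = np.random.default_rng(seed)
    bounds = np.cumsum(fold_sizes)[:-1]
    for test in range(n_subjects):
        train = rng.permutation([s for s in range(n_subjects) if s != test])
        yield test, [f.tolist() for f in np.split(train, bounds)]


def macro_metrics(y_true, y_pred, n_classes=6):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)           # rows: true class, columns: predicted
    tp = np.diag(cm).astype(float)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {"accuracy": tp.sum() / cm.sum(), "precision": precision.mean(),
            "recall": recall.mean(), "f1": f1.mean()}


splits = list(loso_splits())
test_subject, folds = splits[0]
print(test_subject, [len(f) for f in folds])  # 0 [4, 4, 4, 4, 3]
```

In a full run, the metrics returned by `macro_metrics` on each held-out subject's query set would be averaged over the 20 splits.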