Article

A Portable and Affordable Four-Channel EEG System for Emotion Recognition with Self-Supervised Feature Learning

1 Department of Electrical and Computer Engineering, Faculty of Science and Technology, University of Macau, Taipa 999078, Macau
2 Centre for Cognitive and Brain Sciences, Institute of Collaborative Innovation, University of Macau, Taipa 999078, Macau
3 Guangdong Institute of Intelligence Science and Technology, Hengqin, Zhuhai 519031, China
* Authors to whom correspondence should be addressed.
These authors contributed equally to this work.
Mathematics 2025, 13(10), 1608; https://doi.org/10.3390/math13101608
Submission received: 24 April 2025 / Revised: 8 May 2025 / Accepted: 11 May 2025 / Published: 14 May 2025

Abstract

Emotions play a pivotal role in shaping human decision-making, behavior, and physiological well-being. Electroencephalography (EEG)-based emotion recognition offers promising avenues for real-time self-monitoring and affective computing applications. However, existing commercial solutions are often hindered by high costs, complicated deployment processes, and limited reliability in practical settings. To address these challenges, we propose a low-cost, self-adaptive wearable EEG system for emotion recognition through a hardware–algorithm co-design approach. The proposed system is a four-channel wireless EEG acquisition device supporting both dry and wet electrodes, with a component cost below USD 35. It features over 7 h of continuous operation, plug-and-play functionality, and modular expandability. At the algorithmic level, we introduce a self-supervised feature extraction framework that combines contrastive learning and masked prediction tasks, enabling robust emotional feature learning from a limited number of EEG channels with constrained signal quality. Our approach attains the highest performance of 60.2% accuracy and 59.4% Macro-F1 score on our proposed platform. Compared to conventional feature-based approaches, it demonstrates a maximum accuracy improvement of up to 20.4% using a multilayer perceptron classifier in our experiment.

1. Introduction

Emotions play an essential role in human daily life, profoundly influencing cognition, communication, and decision-making processes [1]. Among various physiological signals, electroencephalography (EEG) has emerged as a promising modality for emotion recognition due to its high temporal resolution, non-invasiveness, and real-time monitoring capabilities [2].
The applications of EEG-based emotion recognition span multiple domains. In automotive systems, emotion-aware driver assistance is considered a potential approach to enhance road safety [3]. Within clinical neurology, identifying emotional responses to specific stimuli can facilitate the diagnosis of affective disorders such as post-traumatic stress disorder (PTSD) [4] and depression [5]. Therapeutic interventions including robot-assisted and music-assisted therapies also benefit from emotion detection. In information retrieval, emotion recognition enables affective computing applications such as implicit tagging of multimedia content [6]. Healthcare applications include early detection of negative emotional states during social media consumption [7]. The gaming industry utilizes emotion recognition to adapt game dynamics [8], while virtual reality systems leverage emotional states to enhance educational outcomes [9].
Although EEG-based emotion recognition holds great promise, many challenges remain in moving the technology from the laboratory into daily life. Traditional laboratory-grade EEG equipment provides high-quality signals but suffers from complex setup procedures, poor wearing comfort, and limited mobility [10]. Compared with laboratory-grade devices, commercial wireless EEG devices offer improved portability and cost-effectiveness for personal applications [11], yet even the most affordable single-channel devices remain prohibitively expensive (e.g., the NeuroSky MindWave at USD 199) [12].
Moreover, commercial systems exhibit technical limitations. To achieve cost-effectiveness, commercial devices may trade off signal quality relative to laboratory-grade equipment. For instance, some commercial devices show a broadened α band (8–13 Hz): the Fp1-channel power spectrum of the MindWave is generally similar to that of medical systems, but its bandwidth is slightly increased [10]. Other researchers found that mismatch negativity (MMN) waveforms captured by the Emotiv headset exhibited a lower signal-to-noise ratio, and for half of the subjects these waveforms differed noticeably from those recorded with medical-grade devices [13].
These potential signal quality issues make emotion recognition with commercial EEG devices significantly more challenging. To reconcile high performance with low cost and portability, we developed a robust emotion feature extractor for the proposed device, compensating for the performance degradation that conventional methods typically exhibit on portable systems. Our main contributions are as follows:
  • A modular wireless EEG acquisition system supporting Wi-Fi communication with adjustable sampling rates (250–1000 Hz), compatible with both dry and wet electrodes, featuring plug-and-play operation, and built at a component cost of under USD 35.
  • A robust self-supervised feature extractor (EmoAdapt) that learns discriminative emotional features on top of general EEG representations by integrating contrastive learning and masked prediction tasks, minimizing the influence of limited EEG signal quality.
Experiments show that, through this hardware–software synergy, the system effectively performs emotion recognition, reaching a maximum cross-session accuracy of 60.2% and a Macro-F1 score of 59.4% on our proposed platform. Compared to conventional feature-based approaches, it shows an accuracy improvement of up to 20.4% with a multilayer perceptron (MLP) classifier in our experiment.
The remainder of this paper is organized as follows: Section 2 reviews related work; Section 3 details the EEG acquisition system; Section 4 presents our methodology; Section 5 describes the experimental setup; Section 6 analyzes results; Section 7 provides a discussion; and Section 8 concludes the paper.

2. Related Works

EEG-based emotion recognition systems require tight integration between signal acquisition hardware and data processing algorithms. In this section, we review related work in both domains. We begin by introducing common methods for recognizing emotions from EEG signals. Then, we review EEG acquisition systems, including commercial-grade devices and self-designed platforms.

2.1. EEG-Based Emotion Features

EEG features for emotion computing are primarily categorized into time-domain, frequency-domain, and time–frequency-domain features. Time-domain features are relatively easy to extract and mainly include event-related potentials [14], higher-order zero-crossing analysis [15], and Hjorth parameters [16]. Frequency-domain features are the most widely used in emotion computing due to their high correlation with psychological activity. Through a Fourier transform, signals are typically decomposed into five frequency bands: δ (1–3 Hz), θ (4–7 Hz), α (8–13 Hz), β (14–30 Hz), and γ (31–50 Hz). From these bands, frequency-domain features are extracted, mainly power spectral density (PSD), differential entropy (DE), differential asymmetry (DASM), rational asymmetry (RASM), and energy spectrum (ES); among these, PSD and DE have demonstrated particularly strong performance in emotion recognition systems [17,18,19]. Because the Fourier transform operates over the entire time window, time–frequency-domain features are employed to capture both global and local signal characteristics. Beyond these conventional features, numerous techniques have been proposed to further improve feature quality. Techniques such as feature smoothing and linear dynamic systems (LDS) [20] are typically applied to whole-trial features to remove emotion-irrelevant components. Domain adaptation (DA) methods are effective for deriving domain-invariant features, thereby achieving better emotion recognition across sessions or subjects [17]. However, none of these conventional feature extraction methods can effectively address the challenges posed by low-SNR signals acquired with resource-constrained EEG systems.
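To make the two dominant frequency-domain features concrete, the sketch below (not code from the cited studies; the band limits follow the list above, while the sampling rate and window length are illustrative assumptions) computes band-averaged PSD and the corresponding DE values with SciPy, using the Gaussian form of differential entropy.

```python
import numpy as np
from scipy.signal import welch

BANDS = {"delta": (1, 3), "theta": (4, 7), "alpha": (8, 13),
         "beta": (14, 30), "gamma": (31, 50)}

def band_psd_de(eeg, fs=200.0):
    """eeg: (channels, samples). Returns per-band mean PSD and DE for each channel."""
    freqs, psd = welch(eeg, fs=fs, nperseg=int(fs))      # Welch PSD, ~1 Hz resolution
    df = freqs[1] - freqs[0]
    psd_feat, de_feat = {}, {}
    for name, (lo, hi) in BANDS.items():
        band = (freqs >= lo) & (freqs <= hi)
        psd_feat[name] = psd[:, band].mean(axis=1)       # average power in the band
        band_var = psd[:, band].sum(axis=1) * df         # integrated band power ~ variance
        # DE of a Gaussian signal: 0.5 * ln(2 * pi * e * variance)
        de_feat[name] = 0.5 * np.log(2 * np.pi * np.e * band_var)
    return psd_feat, de_feat
```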

2.2. Self-Supervised Learning

As a branch of unsupervised learning, self-supervised learning automatically creates pseudo-labels based on the inherent properties of the signals themselves. It has been widely applied in various fields, including computer vision, natural language processing (NLP), and bioinformatics, and has recently emerged as an effective approach for automatic feature extraction. The approach offers dual advantages: it effectively exploits vast quantities of unlabeled data while reducing the generalization degradation caused by label noise. Self-supervised learning can generally be divided into two major categories: contrastive learning and masked prediction tasks. The goal of contrastive learning is to learn the differences between classes, most commonly by constructing negative samples through data augmentation. The SimCLR framework was extended to time-series data to train a channel-level feature extractor, with data augmentation performed by channel reorganization and dataset fusion [21]. In another approach, signal transforms were used for data augmentation, and a CNN-based pretext task was constructed [22]. Contrastive learning has also been employed to study EEG characteristics of sleep, with explicitly constructed positive and negative samples [23].
The core objective of the mask prediction task is to reconstruct masked portions of input data. This approach was notably implemented in Masked Autoencoders (MAEs) [24], which applied SSL to partially observed image patches using Vision Transformers (ViTs). Unlike SimMIM [25], where the full set of image patches is used for reconstruction, MAE employs a high masking rate (over 75%) resulting in a better ability to recover missing information.
Recent advancements aim to integrate contrastive learning principles with masked prediction tasks. For instance, spatial offset cropping was introduced as an augmentation strategy, where an enhanced view of the input was compared with the predicted view from an online feature decoder [26]. This dual-path framework enabled the simultaneous capture of both global structural features and discriminative local patterns. A self-supervised learning framework combining contrastive learning and masked prediction tasks was proposed for the classification of sleep stages under single-channel EEG, incorporating a MAMBA-based temporal context module [27].

2.3. Commercial EEG Acquisition System

In recent years, the demand for portable EEG acquisition devices has grown substantially, leading to the development of a variety of commercial-grade systems. These devices are primarily designed for use in research, healthcare, and consumer applications, with an emphasis on portability, ease of use, and accessibility.
The OpenBCI Ganglion is a four-channel, open-source EEG acquisition system known for its versatility. It is particularly favored by researchers and developers due to its customizability. The system supports both dry and wet electrodes and includes a graphical user interface (GUI) for streamlined setup and configuration [28]. The EPOC Insight is a wireless EEG headset developed by Emotiv, specifically designed for cognitive research and brain-computer interface (BCI) applications. It features five EEG channels and is optimized for user convenience, requiring minimal setup time [29]. The Muse S is a consumer-focused EEG device aimed at meditation and sleep monitoring. It incorporates multiple sensors, including EEG, photoplethysmography (PPG), and an accelerometer, making it a multifunctional platform for wellness applications. However, the device has limitations in terms of dynamic range and resolution [30]. The NeuroSky MindWave is another signal acquisition device designed for research and educational purposes [31]. A comparative summary of these and other commercial systems is provided in Table 1 and Table 2.
Commercial EEG devices are highly portable, user-friendly, and equipped with wireless connectivity, making them ideal for non-laboratory applications. However, fixed electrode configurations, high costs, and limited customizability remain significant barriers to advanced research and widespread adoption.

2.4. Self-Designed EEG Acquisition System

Many researchers have recognized the limitations of existing EEG acquisition systems and have proposed various custom-designed solutions. Based on the implementation of the analog front end (AFE) in the signal acquisition chain, these designs can be categorized as follows.

2.4.1. Designs Using Discrete General-Purpose Operational Amplifiers

These systems utilize standard-precision operational amplifiers to construct front-end signal conditioning circuits for amplification, filtering, and biasing. Analog-to-digital conversion is subsequently performed using general-purpose ADC chips.
A flexible and portable wireless EEG headband was developed using discrete op-amps like the OPA314 and OPA333 (Texas Instruments, Dallas, TX, USA) [37]. Integrated on a flexible PCB, that system supported eight channels for signal acquisition, quantization, and motion artifact detection, with electrodes directly mounted on the board. It was suitable for mobile monitoring of the prefrontal and temporal lobes. Another study analyzed and optimized a previously proposed EEG acquisition circuit, achieving performance comparable to that of commercial systems [38]. Additionally, a wearable three-channel EEG sensor for depression diagnosis was introduced [39]. That system employed flexible, non-invasive electrodes to acquire frontal EEG signals and incorporated the Ant Lion Optimization (ALO) algorithm, demonstrating a high signal-to-noise ratio (SNR) and classification accuracy.
The primary advantages of this approach are design flexibility and customizable sub-circuits, allowing more controlled signal quality. However, this method typically involves more complex circuitry, reduced integration, and higher implementation costs.

2.4.2. Designs Based on Dedicated Bio-AFE Chips

These designs can be categorized into two types: those with integrated ADCs and those without.
  • Chips Integrating AFE and ADC
This category includes well-established solutions such as Texas Instruments’ ADS1298 and ADS1299, widely adopted in both academic and engineering contexts. A versatile smart glasses-based platform called “GAPSES” was proposed, utilizing the ADS1298 for EEG and EOG acquisition [40]. Targeted at the temporal lobes, this platform incorporates the GAP9 (GreenWaves Technologies, Grenoble, France) processor, which enhances edge-side processing for applications like eye movement detection and steady-state visual evoked potential (SSVEP) analysis. Another innovative solution, EEG-Linx, is a modular WESN platform featuring miniaturized ADS1299-based nodes with a DRL-free design [41]. This system supports synchronized, wireless multi-node EEG acquisition and has been validated with SSVEP and ASSR tasks, demonstrating signal quality comparable to commercial devices for flexible and unobtrusive EEG monitoring. Additionally, some researchers introduced an IoT-enabled, wearable multimodal monitoring system (IEMS) for neurological intensive care units (NICUs) [42]. That system monitored EEG, regional cerebral oxygen saturation ( rSO 2 ), temperature, ECG, PPG, and bioimpedance (Bio-Z), supporting remote diagnosis and clinical decision-making, with hospital-based validation confirming its robustness under severe interference. Lastly, a portable EEG acquisition system was developed using the LHE7909 (Legendsemi, Suzhou, China) and ESP32-S3 (Espressif, Shanghai, China), specifically targeting motor imagery tasks [43], with experimental results demonstrating the reliability of the acquired EEG signals.
These solutions offer high integration, incorporating features such as built-in programmable gain amplifiers (PGAs), driven right leg (DRL) circuits, lead-off detection, digital filtering, and digitized outputs. These functions simplify downstream data acquisition. However, the trade-off lies in slightly higher cost and reduced design flexibility.
  • Chips Providing Only Biopotential Amplification (External ADC Required)
A representative example is Analog Devices’ AD8232 (Wilmington, MA, USA), which integrates an amplifier, band-pass filter, and DRL driver but outputs analog signals, requiring an external ADC. Researchers have developed an open-source biopotential sensor based on the AD8232, capable of acquiring EEG, ECG, and EMG signals [44]. That approach balances integration with flexibility. However, the requirement for an external ADC increases PCB area and reduces system compactness.

2.4.3. Highly Integrated Bio-SoC Solutions

Recent products have achieved even greater levels of system integration. For example, Nanochap’s NNC-EPC001 (Hangzhou, China) is a RISC-V-based System on Chip (SoC) that integrates EEG, ECG, EMG, and PPG analog front ends. It features a complete ExG signal chain—including PGAs, a 24-bit ADC, AC lead-off detection, and a DRL circuit—along with PPG transceivers and an on-chip Heart APP for real-time heart rate detection [45]. Although no published studies have yet employed this SoC for EEG acquisition, it shows promise for future development.
The aforementioned EEG acquisition systems exhibit strong performance and innovative designs, but they often involve trade-offs between cost, complexity, integration level, and flexibility. In particular, high-performance systems based on commercial bio-AFE chips may offer excellent signal quality but are generally expensive and less customizable, while discrete designs provide flexibility at the cost of compactness and integration. To address these limitations, this work proposes a low-cost, modular EEG acquisition system that balances signal quality, hardware simplicity, and system flexibility. Details of the system design are presented in the following section.

3. Proposed EEG Acquisition System

This section presents the proposed acquisition system. As illustrated in Figure 1, the overall workflow begins with the EEG amplifier, which amplifies the signal and performs analog-to-digital conversion. The main control module then reads the digitized data, packages it, and transmits it to the host computer via Wi-Fi. The software on the host computer handles data reception, storage, and visualization and forwards it to the algorithmic processing stage. The system is described from three perspectives: hardware development, embedded system development, and software development.

3.1. Hardware Development

As illustrated in Figure 2, a modular hardware architecture was adopted, consisting of three primary components: the EEG amplifier module, the main control module, and the expansion module. This design approach enhanced flexibility and scalability while maintaining cost-efficiency.

3.1.1. EEG Amplifier Module

The EEG amplifier module serves as the core component of the system. It functions as a generalized amplifier, responsible not only for amplifying weak EEG signals but also for converting them into digital form for subsequent processing. In the proposed design, the module primarily consists of two EEG-specific amplifiers and an ADC. Figure 3 shows the schematic diagram of the entire module. The signal input, common-mode feedback, and built-in LDO are all part of the internal circuitry of the analog front end. The Bias Generation circuit generates a shifting voltage to align the amplifier output with the ADC input, and below it are several other auxiliary circuit modules.

Amplifier

EEG signals typically exist in the microvolt range, making them unsuitable for direct analog-to-digital conversion without amplification. To address this, the KS1092—a lightweight, dual-channel bio-amplifier developed by Kingsense (Shenzhen, China)—was employed in this design [46]. The KS1092 integrates amplification, filtering, and biasing functions and includes two cascaded programmable gain amplifiers (PGAs), offering a total gain range from 360 to 2760. It amplifies weak EEG signals to a usable voltage range of 0–1.8 V, powered by an internal 1.8 V low-dropout (LDO) regulator. A built-in band-pass filter restricts the output bandwidth to 0.5–200 Hz. Without external biasing resistors, the amplifier achieves an input impedance of up to 5 GΩ, supporting both dry and wet electrodes. The design incorporates a common-mode rejection ratio (CMRR) of 100 dB, enhancing resistance to external interference. Furthermore, a built-in right-leg driving (RLD) circuit provides common-mode feedback to suppress noise such as power-line interference.
As shown in Figure 4, an 18 MΩ bias resistor (R_BIAS) stabilizes the input DC operating point at approximately 0.9 V, ensuring proper internal operation and improved common-mode noise suppression. To limit the current injected into the human body, a 10 kΩ resistor (R_LIM) is connected in series at the output. The high level of integration in the KS1092 enables the entire acquisition circuit to be implemented with only five passive components. Additional features include Fast Restore (FR) and lead-off detection (LDF). The FR function temporarily creates a low-resistance path to the output, accelerating stabilization after step inputs—particularly useful due to the low cut-off frequency of the high-pass filter. The LDF function monitors electrode connectivity by detecting power-line interference, although the KS1092 lacks an internal AC excitation source and therefore does not support real-time quantitative impedance measurements.

ADC

In the biomedical field, EEG signal acquisition commonly employs either Successive-Approximation Register (SAR) or ΣΔ structured ADCs [47]. SAR-based ADCs typically offer higher speed and lower power consumption compared to ΣΔ ADCs. On the other hand, ΣΔ ADCs provide superior resolution due to oversampling and noise-shaping techniques. By operating at high clock frequencies, ΣΔ ADCs can effectively convert time resolution into analog resolution, thereby enhancing signal quality. The downside is that they consume more power and have slower conversion speeds, resulting in some latency. For multi-channel EEG acquisition, channel synchronization is another consideration. Based on their sampling strategy, ADCs can be classified as either synchronous or asynchronous. Synchronous-sampling ADCs include multiple ADC cores and enable simultaneous sampling across all channels. In contrast, asynchronous-sampling ADCs utilize a single core along with a multiplexer (MUX) to switch between channels. For asynchronous-sampling ΣΔ ADCs, the longer conversion time makes the delay between channels more pronounced.
To ensure simultaneous acquisition, signal quality, and cost-effectiveness, this work employed the ADS131M08 from Texas Instruments, a synchronous 8-channel, 24-bit ΣΔ ADC with a maximum sampling rate of 32 kSPS. The device supports both differential and single-ended input configurations. The ADS131M08 includes an internal 1.2 V reference voltage and also accepts an external reference in the range of 1.1–1.3 V. Its input voltage range is limited to ±V_ref, which does not fully accommodate the KS1092's output range of 0–1.8 V. To address this mismatch, a 0.9 V reference voltage was generated by dividing the 1.8 V output of the KS1092's internal LDO with two identical series resistors. The accuracy of the clock source directly affects the precision of the ADC's sampling rate. To ensure timing stability and minimize susceptibility to interference, an 8.192 MHz active crystal oscillator with a frequency tolerance of ±10 ppm was selected as the ADC's clock source instead of a standard passive crystal. The external clock f_CLKIN enters the ADC and passes through a pre-divider, yielding the modulator clock f_MOD = f_CLKIN/2. The output data rate is f_DATA = f_MOD/OSR, where OSR denotes the oversampling ratio. Thus, for an output rate of 500 Hz, an OSR of 8192 can still be used, ensuring high signal quality.
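As a quick illustration of the clock relationships above (plain arithmetic, not vendor code):

```python
f_CLKIN = 8_192_000      # 8.192 MHz active crystal oscillator
f_MOD = f_CLKIN / 2      # modulator clock after the internal pre-divider
OSR = 8192               # oversampling ratio
f_DATA = f_MOD / OSR     # output data rate: 500.0 Hz
```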
The ADS131M08 incorporates a Global-Chop function. This function samples the positive input three times, then swaps the inputs and samples the negative input three times. The final result is calculated from these six samples. This reduces the impact of the input offset voltage (V_OS) of the internal PGA. This mechanism enhances the accuracy of low-frequency signal acquisition. A dedicated “Data Ready” pin generates a falling-edge pulse upon completion of each data conversion. By connecting this pin to a microcontroller’s GPIO, an external interrupt can be used to trigger data retrieval via SPI, eliminating the need for a timer and reducing processing overhead.

Impedance Measurement

As noted previously, the KS1092 does not include a built-in AC excitation source and therefore cannot perform quantitative contact impedance measurements. To overcome this limitation, an external impedance measurement circuit was added to the EEG amplifier module as shown in Figure 5. The circuit comprises a digital-to-analog converter (DAC7311, Texas Instruments), which generates an external AC signal. This signal is injected into the body through the target electrode using an analog switch (TMUX1309, Texas Instruments). The resulting current flowing through the electrode is converted into a voltage signal by a precision resistor. An instrumentation amplifier (INA350, Texas Instruments) is used to amplify the voltage signal, and the amplified signal is then sampled by the ADC.
Finally, the contact impedance can be calculated from the measured voltage as follows:
$Z_{\text{total}} = \frac{V_{DAC}}{I_{in}} = \frac{V_{DAC}}{V_{OUT}/(A_{v,INA} \cdot R_{\text{sample}})} = \frac{V_{DAC} \cdot R_{\text{sample}} \cdot A_{v,INA}}{V_{OUT}},$
$Z_{\text{contact}} = Z_{\text{total}} - R_{\text{limit}} - R_{\text{sample}},$
where $Z_{\text{total}}$ represents the total impedance in the loop and $Z_{\text{contact}}$ denotes the contact impedance; $R_{\text{limit}}$ is the current-limiting resistor and $R_{\text{sample}}$ is the sampling resistor; $V_{DAC}$ is the amplitude of the DAC output signal, $I_{in}$ is the amplitude of the injected current, $V_{OUT}$ is the output voltage amplitude of the instrumentation amplifier (INA), and $A_{v,INA}$ is the gain of the INA.
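A minimal sketch of this calculation is given below; the component values in the example call are hypothetical and only illustrate how the two equations combine.

```python
def contact_impedance(v_dac, v_out, gain_ina, r_sample, r_limit):
    """All voltages are amplitudes (V); resistances in ohms; returns Z_contact in ohms."""
    i_in = (v_out / gain_ina) / r_sample     # injected current recovered from the INA output
    z_total = v_dac / i_in                   # total loop impedance
    return z_total - r_limit - r_sample      # subtract series resistors to get the contact impedance

# Hypothetical example values:
z_contact = contact_impedance(v_dac=0.1, v_out=0.5, gain_ina=10.0,
                              r_sample=1_000.0, r_limit=10_000.0)
```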

Power and I/O Configuration

The EEG amplifier module was sensitive to noise. Given that its power consumption was low, an LDO chip, the TPS7A2023 (Texas Instruments), was used to generate 3.3 V as AVDD, while another LDO chip, the LP5912 (Texas Instruments), generated 3.3 V as DVDD. Both chips have high PSRR and low noise. To minimize interference from digital components, the analog and digital grounds were separated to improve analog signal integrity.
The AFE required multiple configurable I/O ports. Connecting them directly to the main control module would occupy too many I/O ports and enlarge the connector. Therefore, an I2C-based I/O expander was employed, allowing 16 I/O ports to be controlled over only two I2C lines. The module also supported configurable I2C addresses, so the main control module only needed to scan the I2C bus after startup to identify the currently attached EEG amplifier module, achieving plug-and-play functionality across different EEG amplifier modules.
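The scan-and-identify step can be summarized by the following sketch; the address map is hypothetical and the smbus2 package stands in for the microcontroller's I2C driver, so this illustrates the plug-and-play logic rather than the actual firmware.

```python
from smbus2 import SMBus

# Hypothetical mapping from expander I2C address to amplifier-module variant
KNOWN_MODULES = {0x20: "4-channel EEG amplifier", 0x21: "expansion amplifier"}

def identify_module(bus_id=1):
    with SMBus(bus_id) as bus:
        for addr, name in KNOWN_MODULES.items():
            try:
                bus.read_byte(addr)          # a present device ACKs its address
                return addr, name
            except OSError:
                continue                     # no ACK at this address; keep scanning
    return None, "no amplifier module detected"
```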

3.1.2. Main Control Module

The main control module was built around the Espressif ESP32-S3 (Shanghai, China), a high-performance dual-core microcontroller that integrates Wi-Fi and BLE capabilities. Its dual-core architecture enables concurrent execution of Wi-Fi communication and data acquisition or processing tasks. The module also incorporates a 6-axis MEMS MotionTracking sensor (ICM42688, TDK InvenSense, San Jose, CA, USA) for real-time orientation monitoring, and a TF card slot to support offline data storage and backup. Given the wireless transmission demands of the ESP32-S3, it represented the system’s primary power consumer. Supplying 3.3 V via an LDO from a 3.7 V battery was inefficient and reduced battery life; therefore, a buck DC-DC converter was used to efficiently step down input voltages (3.6–5 V) to 3.3 V for the ESP32-S3. Meanwhile, the lower-power IMU and TF card were powered through an LDO to ensure a stable 3.3 V supply. A Micro-USB port was also provided for direct PC connection, enabling convenient debugging and data acquisition. A schematic of the main control module is shown in Figure 6.

3.1.3. Expansion Module

The ESP32-S3 supports IEEE 802.11b/g/n protocols, commonly referred to as Wi-Fi 4, and operates in the 2.4 GHz ISM band, which is constrained by a limited number of non-overlapping channels (specifically channels 1, 6, and 11) and high device density. In high-concurrency scenarios, these constraints can lead to network congestion and packet loss. Under such conditions, the TCP retransmission mechanism may further exacerbate transmission delays and increase system load. Moreover, inadequate router performance or an excessive number of connected devices can contribute to persistent packet loss.
To address these limitations, we developed an SLE expansion module based on the BearPi-BM H21E (Nanjing, China) [48], as illustrated in Figure 7. The SLE module obtains data from the main controller via UART and sends them to the receiver on the computer. The receiver is connected to the computer via USB. This module supports the SparkLink Low Energy (SLE) protocol—a low-power sub-protocol within the NearLink framework proposed by Huawei. Compared to Wi-Fi and BLE, SLE provides more stable connectivity, lower latency, and higher reliability [49], making it a viable alternative transmission channel. The expansion module includes reserved interfaces for power, UART, SPI, and other specific functions, enabling future integration of additional physiological signal modules such as PPG, ECG, or GSR, as required.

3.1.4. Power Supply

Due to the need for Wi-Fi transmission, the system's power consumption is significantly higher than that of Bluetooth-based designs. The system operates on a 2000 mAh, 3.7 V lithium battery, supporting over 7 h of continuous operation. When connected to a computer via Micro-USB, it uses the 5 V USB supply as the main power source. An ORing configuration was implemented using the LM66100 ideal-diode IC from Texas Instruments, enabling multiple power supply options: with ORing, the higher-voltage source supplies power to the entire system.
Moreover, to extend acquisition time, a dual-battery power supply module was developed that supports manual and automatic switching between batteries, allowing for battery replacement without power interruption. As shown in Figure 8, the key component is the TPS2117 (Texas Instruments), which functions as a power multiplexer with manual and priority switching capabilities. Control signals are generated via a button on the board, based on a binary counter, enabling the switching between three modes: manual use of VIN1, manual use of VIN2, and priority use of VIN1 (switch to VIN2 when VIN1 < 3.6 V). The QA port generates a signal for switching between manual and automatic modes, while the QB port is used to select between the two batteries in manual mode.

3.1.5. PCB Design

Both the EEG amplifier and control boards adopted a 4-layer stack-up. The top and bottom layers were designated for signal routing, the second layer was a dedicated ground plane, and the third layer served as the power plane. To ensure lower noise, the path for low-frequency, weak analog signals like EEG should be kept short and wide. The PCB is shown in Figure 9. All modules shared the same PCB size, 42 mm × 32 mm, and the thickness was 1.0 mm.

3.1.6. Cost Description

The component prices for the EEG amplifier and the main control module are shown in Table 3. As shown in Table 4, the price of a minimum operational system is less than USD 35.

3.2. Embedded System Development

The embedded development of the system was based on PlatformIO (Core 6.1.18). We employed a hybrid framework of ESP-IDF (version 5.1.2) and Arduino (version 2.0.14), integrating Arduino as a component of ESP-IDF to streamline the development process.
As shown in Figure 10a, upon powering up, the MCU enters the “Hardware Setup” mode. It configures the relevant pins and scans the I2C addresses to identify the current EEG amplifier module. The parameters of the EEG amplifier module are then initialized based on the default settings stored in the Non-Volatile Storage (NVS). This process includes enabling the AFE, setting the gain, configuring the ADC sampling rate, activating Global-Chop, and establishing the cut-off frequency for the high-pass filter. Next, the MCU connects to a host computer via Wi-Fi, where the host sends the necessary configuration information. The controller retrieves the current time using the SNTP protocol to initialize storage on the TF card.
After completing the hardware initialization, the device automatically enters the EEG measurement mode. Data reading relies on a trigger from the DR pin, which activates the external interrupt and the task of sending EEG data. Once the external interrupt is received, the device immediately reads the ADC data through SPI, together with any event data. To reduce power and network load, data are not transmitted for every single data point. Instead, a certain number of data points are accumulated before transmission. When the number of data points in the queue reaches a predefined threshold, the interrupt service routine (ISR) sends a notification to the EEG transmission task. Upon receiving this notification, the transmission task dequeues the data, parses and packages them, and then stores them on the TF card or sends them to the host computer.
In the paradigm of emotion recognition, viewing stimulus materials is essential for inducing emotions. To analyze the EEG data, it is necessary to know the start and end times of the stimuli. Therefore, tagging the start and end of each stimulus is crucial. To ensure reliable transmission of event data, data transmission and event/command transmission are differentiated. The following subsections introduce data transmission and event transmission separately.

3.2.1. Data Transmission

To ensure reliable data transmission, many Wi-Fi-based acquisition systems utilize the TCP protocol or higher-level protocols built on TCP. For example, TCP is employed to transmit data in the JSON format [50]. In the EEG field, a commonly used higher-level protocol is the Lab Streaming Layer (LSL) protocol. This real-time data streaming and middleware system is primarily designed for multimodal data synchronization and transmission in neuroscience. The LSL protocol encompasses both TCP and UDP, with data transmission primarily relying on TCP [51]. For instance, a system developed based on LSL demonstrated its capabilities in multimodal biosensing [52]. Furthermore, the LSL protocol employs the Precision Time Protocol (PTP), which provides high synchronization accuracy and low latency for multimodal data [53]. However, since the system operates outside of controlled laboratory conditions, strict low-latency synchronization is not a primary requirement. While the Lab Streaming Layer (LSL) is effective for synchronization, it introduces additional overhead. Additionally, real-world network environments may exhibit instability, making reliable data arrival more critical than synchronization precision.
Basic TCP socket-based transmission minimizes overhead but poses risks such as packet loss. In addition, due to constraints imposed by the Maximum Segment Size (MSS), framing issues such as packet coalescence ("sticky packets") may occur, adversely affecting downstream data processing. To address these challenges, the Message Queuing Telemetry Transport (MQTT) protocol, which operates over TCP, was selected for data transmission. It is a lightweight messaging protocol based on a publish/subscribe architecture, enabling efficient message publishing and subscription. Its Quality of Service (QoS) mechanism improves transmission reliability over TCP. MQTT defines three QoS levels: 0, 1, and 2. The differences between them are shown in Table 5.
The Quality of Service (QoS) level was configured to 1, ensuring that at least one copy of each data packet reached the host computer to minimize the risk of data loss. However, this setting may result in the delivery of duplicate packets. Although configuring QoS to level 2 would guarantee exactly one delivery, it would impose a significant load on the embedded device [54]. Therefore, QoS level 1 was selected as a trade-off between reliability and system overhead. To improve transmission efficiency, data were transmitted in raw binary format instead of using the JSON format. Each data packet included a header and a tail for framing purposes. An incremental packet ID was appended to each packet, along with an internal timestamp and other relevant metadata. These additions facilitated accurate and efficient data analysis on the host computer.
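As an illustration of the framing described above, the host-side sketch below parses one such binary packet; the byte layout (header/tail markers, field widths, sample encoding) is a hypothetical example rather than the documented on-wire format.

```python
import struct

HEADER, TAIL = b"\xAA\x55", b"\x55\xAA"      # hypothetical framing bytes

def parse_packet(payload: bytes):
    """Return (packet_id, timestamp_ms, samples) or None if framing is invalid."""
    if not (payload.startswith(HEADER) and payload.endswith(TAIL)):
        return None
    body = payload[len(HEADER):-len(TAIL)]
    meta_fmt = "<IQH"                        # little-endian: packet ID, timestamp, sample count
    packet_id, timestamp_ms, n = struct.unpack_from(meta_fmt, body, 0)
    samples = struct.unpack_from(f"<{n}i", body, struct.calcsize(meta_fmt))
    return packet_id, timestamp_ms, samples
```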

3.2.2. Event and Command Transmission

As previously discussed, while the MQTT protocol provides reliable data transmission, it presents certain limitations in real-time control scenarios [55]. A key drawback is the absence of a native prioritization mechanism. As a result, when control commands or critical events are transmitted concurrently with high-throughput EEG data via MQTT, the system may experience delayed responses, compromising temporal sensitivity and real-time performance.
To address this limitation, a dedicated TCP socket-based communication channel was implemented for event and command transmission. The corresponding processing thread on the embedded system was assigned a higher priority than MQTT-related tasks, ensuring timely handling of high-importance messages, particularly under conditions of heavy network load. Upon receiving a command, the device returned an acknowledgment (ACK) signal to the host; if the ACK was not received within a predefined timeout period, the host initiated automatic retransmission until a specified maximum number of attempts was reached. In scenarios requiring stricter responsiveness guarantees, deploying an auxiliary processing module, rather than handling all tasks on a single SoC, may be necessary to maintain real-time performance.
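A host-side sketch of this command channel is shown below; the port number, ACK token, timeout, and retry budget are illustrative assumptions, not values from the implementation.

```python
import socket

def send_command(host: str, cmd: bytes, port=5000, timeout=0.5, max_retries=3) -> bool:
    with socket.create_connection((host, port), timeout=timeout) as sock:
        for _ in range(max_retries):
            sock.sendall(cmd)                    # issue the command
            try:
                if sock.recv(16) == b"ACK":      # device acknowledges receipt
                    return True
            except socket.timeout:
                pass                             # no ACK in time; retransmit
    return False
```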

3.3. Software Development

The proposed data acquisition system was equipped with a corresponding software platform. The block diagram of the software is illustrated in Figure 11. The software was developed based on JavaScript (ES6+) and Node.js (18.20.0).
The software design encompassed both the backend service architecture and the graphical user interface (GUI) development.

3.3.1. Backend Service

The input layer began with an MQTT broker that handled message brokering, followed by a server-side MQTT client that subscribed to all data topics published by the device. As previously noted, the system used QoS level 1, which may result in duplicate message delivery; network congestion could also cause packets to arrive out of order. To mitigate these issues, a buffering mechanism was implemented using the sorted-set data structure of Redis (5.0). Redis, a high-performance NoSQL database, allows each element to be assigned a score (in this case, the packet ID), which enables both deduplication and reordering prior to downstream processing.
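The buffering logic can be sketched as follows; the production service is written in Node.js, so this Python/redis-py version with an illustrative key name only demonstrates how a sorted set deduplicates and reorders packets by ID.

```python
import redis

r = redis.Redis()

def buffer_packet(packet_id: int, payload: bytes, key="eeg:buffer"):
    # ZADD stores one member per unique payload with the packet ID as its score,
    # so duplicate QoS-1 deliveries collapse onto the same entry.
    r.zadd(key, {payload: packet_id})

def drain_in_order(key="eeg:buffer", batch=256):
    # ZPOPMIN returns members ordered by score (packet ID), restoring the stream order.
    return [payload for payload, _score in r.zpopmin(key, batch)]
```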
The forwarding layer employed Node.js’s EventEmitter to broadcast the processed data to all registered output services. Current output options include local file storage and Redis-based storage. Due to browser security restrictions that prevent direct use of TCP sockets, data transmission to the graphical user interface was conducted via the WebSocket protocol. Additionally, a TCP socket forwarding module was provided to enable external servers to access real-time data streams for advanced analysis.
A command management server operated above the data pipeline to coordinate the configuration of the input, forwarding, and output services. It also handled command issuance to the device, including instructions to start or stop measurements and to tag specific events.

3.3.2. Web-Based GUI

The GUI of the software platform was developed using modern Web technologies. A screenshot of the waveform display interface is shown in Figure 12. The front-end was implemented with Vue 3 (3.3.4), while waveform rendering was handled using the open-source library P5.js (1.11.4). The interface allows users to configure both the displayed time window and amplitude range to suit various visualization needs.

4. Methodology

This section presents EmoAdapt, a self-supervised learning framework designed to extract discriminative EEG features under real-world acquisition constraints. By integrating contrastive learning and masked learning, EmoAdapt constructs robust and generalizable representations from variable and noisy EEG data.
Unlike conventional masked image modeling (MIM)-based approaches [27], EmoAdapt generates augmented EEG signals via domain-specific transformations and contrasts them with the original signals to learn invariant features. To further enhance discriminability, a masked prediction task is introduced to capture localized patterns critical for emotion recognition. Additionally, different masking rates are applied to the augmented and original signals, promoting hierarchical learning of local-to-global representations. This design ensures robust and comprehensive feature extraction: contrastive learning aligns general features, while masked prediction captures comprehensive features spanning local and local-to-global representations. The components are elaborated in detail below.

4.1. Self-Supervised Feature Extractor

As illustrated in Figure 13, the feature extractor in EmoAdapt comprises four main components: signal transformation, tokenization with 1D convolutional neural network (1D-CNN), transformer encoder, and transformer decoder.

4.1.1. Signal Transformation

Signal transformation was designed to modify the input signals while maximally preserving both local and global EEG features, thereby amplifying the influence of interference factors so that the model learns interference-independent features. We evaluated various signal augmentation techniques for this purpose. Given the requirements of emotion recognition tasks and the channel limitation of our device, time-domain augmentation was preferred over frequency-domain and spatial-domain methods, as temporal features are less decisive for emotion computation and more susceptible to signal quality. Following typical time-domain augmentation techniques, our signal transformation framework incorporated the following operations: Gaussian noise injection, amplitude scaling, horizontal flipping, vertical flipping, temporal dislocation, and time warping.
Given original EEG data $X \in \mathbb{R}^{N \times C \times S}$, where N is the number of trials, C is the number of channels, and S is the number of time samples per trial, we define six signal transformation methods (a NumPy sketch of these operations is given after the list):
1.
Gaussian noise addition:
$X_{\text{noisy}} = X + \epsilon,$
where $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is additive noise following a Gaussian distribution with zero mean and standard deviation $\sigma$.
2.
Amplitude scaling:
$X_{\text{scaled}} = \alpha X,$
where α is the amplitude scaling factor.
3.
Horizontal flipping (sign inversion):
$X_{\text{h-flipped}} = -X,$
4.
Vertical flipping (time reversal):
$X_{\text{v-flipped}}[n, c] = [v_S, v_{S-1}, \ldots, v_1, v_0],$
where $X[n, c] = [v_0, v_1, \ldots, v_{S-1}, v_S]$, n is the trial index, and c is the channel index.
5.
Temporal dislocation:
(a)
Split each trial $X_i \in \mathbb{R}^{C \times S}$ into n segments along the S dimension, and denote the jth segment as $X_{i,j} \in \mathbb{R}^{C \times s_j}$, where $\sum_{j=1}^{n} s_j = S$.
(b)
Randomly permute the segments.
(c)
Concatenate the permuted segments to form the new sample:
$X_{i,\text{dislocated}} = X_{i,\pi(1)} \oplus X_{i,\pi(2)} \oplus \cdots \oplus X_{i,\pi(n)},$
where $\pi: \{1, 2, \ldots, n\} \to \{1, 2, \ldots, n\}$ is a random permutation (bijective function) that shuffles the order of the segments.
6.
Time warping:
(a)
Split each trial $X_i \in \mathbb{R}^{C \times S}$ into m segments along the S dimension, and denote the jth segment as $X_{i,j} \in \mathbb{R}^{C \times s_j}$, where $\sum_{j=1}^{m} s_j = S$.
(b)
Each segment $X_{i,j}$ is stretched or compressed by a random time-scaling factor $r_{i,j}$, drawn independently for each segment from a uniform distribution:
$s'_{i,j} = r_{i,j} \cdot s_j,$
where $r_{i,j} \sim U(d_{\text{low}}, d_{\text{high}})$ follows a uniform distribution with lower scaling limit $d_{\text{low}}$ and upper scaling limit $d_{\text{high}}$.
Linear interpolation is then applied to each segment $X_{i,j} = [v_0, v_1, \ldots, v_{s_j}]$, transforming the segment length from the original $s_j$ to the new length $s'_{i,j}$ and yielding the interpolated segment $X'_{i,j} = [u_0, u_1, \ldots, u_{s'_{i,j}-1}]$:
$u_{j'} = v_{\lfloor t \rfloor} + (t - \lfloor t \rfloor) \cdot \left( v_{\lceil t \rceil} - v_{\lfloor t \rfloor} \right),$
where
  • $t = \frac{s_j - 1}{s'_{i,j} - 1} \, j'$, where $j' = 0, 1, \ldots, s'_{i,j} - 1$ is the index within the interpolated segment.
  • $\lfloor t \rfloor$ (floor operation): the greatest integer $\leq t$.
  • $\lceil t \rceil$ (ceiling operation): the smallest integer $\geq t$.
(c)
The scaled segments are reconnected to form the signal $Z_i \in \mathbb{R}^{C \times \sum_{j=1}^{m} s'_{i,j}}$. To align the new signal length with the original, the time-warped signal is finally obtained by resampling $Z_i$ back to the original length S:
$X_{i,\text{warping}} = \text{Resample}(Z_i),$
where $Z_i = X'_{i,1} \oplus \cdots \oplus X'_{i,m}$, and the time-warped signal $X_{i,\text{warping}} \in \mathbb{R}^{C \times S}$.
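Below is a NumPy sketch of these six transformations for a single trial of shape (channels, samples); parameter values (noise level, scaling factor, segment counts, scaling limits) are illustrative, not the settings used in the experiments.

```python
import numpy as np

def gaussian_noise(x, sigma=0.05):
    return x + np.random.normal(0.0, sigma, size=x.shape)

def amplitude_scaling(x, alpha=1.2):
    return alpha * x

def horizontal_flip(x):                        # sign inversion
    return -x

def vertical_flip(x):                          # time reversal
    return x[:, ::-1]

def temporal_dislocation(x, n_segments=5):
    segments = np.array_split(x, n_segments, axis=1)
    order = np.random.permutation(n_segments)  # random permutation pi
    return np.concatenate([segments[i] for i in order], axis=1)

def time_warping(x, m_segments=5, d_low=0.8, d_high=1.2):
    warped = []
    for seg in np.array_split(x, m_segments, axis=1):
        s_j = seg.shape[1]
        s_new = max(2, int(round(np.random.uniform(d_low, d_high) * s_j)))
        t_old = np.linspace(0.0, 1.0, s_j)
        t_new = np.linspace(0.0, 1.0, s_new)
        # linear interpolation of each channel to the new segment length
        warped.append(np.stack([np.interp(t_new, t_old, ch) for ch in seg]))
    z = np.concatenate(warped, axis=1)
    # resample the concatenated signal back to the original length S
    t_z = np.linspace(0.0, 1.0, z.shape[1])
    t_x = np.linspace(0.0, 1.0, x.shape[1])
    return np.stack([np.interp(t_x, t_z, ch) for ch in z])
```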

4.1.2. Tokenization and 1D CNN

Tokenization began with a sliding-window operation on both the original and the transformed EEG signals. Specifically, the sliding window was applied to each channel individually, with each windowed segment generating one token. All resulting tokens across channels were concatenated to form the token set $[X_i]_{i=1}^{N}$. The order of token concatenation was inconsequential at this stage, as position embeddings were subsequently applied to preserve temporal and spatial relationships. The sliding window had a size of 1 s and a step of 0.2 s throughout all experiments.
The sliding window approach served two critical purposes: First, it generated an expanded set of tokens to facilitate effective masked prediction tasks. Second, by generating tokens derived from different channels at varying temporal positions, the model could learn cross-channel correlations and temporal dependencies during masked prediction tasks.
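A sketch of this per-channel sliding-window tokenization (1 s window, 0.2 s step) is given below, assuming a 200 Hz sampling rate.

```python
import numpy as np

def tokenize(x, fs=200, win_s=1.0, step_s=0.2):
    """x: (channels, samples) -> array of tokens with shape (n_tokens, win_samples)."""
    win, step = int(win_s * fs), int(step_s * fs)
    tokens = []
    for ch in x:                                         # one channel at a time
        for start in range(0, x.shape[1] - win + 1, step):
            tokens.append(ch[start:start + win])         # each window becomes one token
    return np.stack(tokens)
```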
Unlike conventional visual or textual data, EEG signals possess spatiotemporal characteristics that require specialized feature extraction. When general Masked Autoencoders (MAEs) [24] are applied directly to segmented EEG data, they fail to adequately capture temporal characteristics within segments. This indicates that the simple linear projection used in ViT and MAE is insufficient for embedding EEG signal tokens. To address this limitation, we incorporated a 1D CNN, denoted as $F$, for temporal feature extraction from each token.
The 1D CNN architecture was based on a multi-scale 1D ResNet framework, which accelerated gradient flow during backpropagation and consequently improved parameter updates. It drew inspiration from [27], with modifications to the max pooling, average pooling, and fully connected layers to meet affective computing requirements. The architecture employed a shared convolutional layer comprising a 1D convolution with kernel size 7, batch normalization, ReLU activation, and max pooling. The shared layer was followed by three parallel feature extractors with distinct kernel sizes (3, 5, and 7). Each extractor contained three identical convolutional blocks (1D convolution → batch normalization → ELU activation), with a residual connection and average pooling applied after the final block. The multi-scale features from all extractors were concatenated and processed through fully connected layers to produce the output vector, which already represents the token embedding used in subsequent modules.
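A PyTorch sketch of this token embedder is shown below; only the overall structure (shared kernel-7 convolution, three parallel kernel-3/5/7 branches of three conv–BN–ELU blocks with a residual connection and average pooling, concatenation, and a fully connected projection) follows the description, while channel widths, pooling sizes, and the embedding dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One parallel feature extractor with a fixed kernel size."""
    def __init__(self, ch, k):
        super().__init__()
        block = lambda: nn.Sequential(
            nn.Conv1d(ch, ch, k, padding=k // 2), nn.BatchNorm1d(ch), nn.ELU())
        self.blocks = nn.ModuleList([block() for _ in range(3)])
        self.pool = nn.AdaptiveAvgPool1d(1)

    def forward(self, x):
        out = x
        for blk in self.blocks:
            out = blk(out)
        out = self.pool(out + x)           # residual connection, then average pooling
        return out.flatten(1)

class TokenEmbedder(nn.Module):
    def __init__(self, ch=32, embed_dim=768):
        super().__init__()
        self.shared = nn.Sequential(       # shared layer: conv(k=7) -> BN -> ReLU -> max pool
            nn.Conv1d(1, ch, 7, padding=3), nn.BatchNorm1d(ch), nn.ReLU(), nn.MaxPool1d(2))
        self.branches = nn.ModuleList([Branch(ch, k) for k in (3, 5, 7)])
        self.fc = nn.Linear(3 * ch, embed_dim)

    def forward(self, tokens):             # tokens: (n_tokens, window_samples)
        x = self.shared(tokens.unsqueeze(1))
        feats = torch.cat([b(x) for b in self.branches], dim=1)
        return self.fc(feats)              # (n_tokens, embed_dim) token embeddings
```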

4.1.3. Transformer Encoder

The transformer encoder $E$ mapped token embeddings into a latent space Z; its structure followed the Masked Autoencoder of [24], excluding the linear projection. Given the extracted features $[F(X_i)]_{i=1}^{N}$, where N is the total number of tokens, position encoding was first performed by adding a position embedding $P_i$ to each token. Notably, the positional encoding encapsulated partial temporal information and the spatial layout of the EEG channels through the token sequence. Random masking was then applied to the token embeddings according to a predetermined masking ratio $\rho$, on both the original EEG branch ($\rho$ = 0.8) and the transformed EEG branch ($\rho$ = 0.5). The unmasked tokens were collected and concatenated with a position-encoded class token C, a learnable token that aggregates the information of all other tokens into a global feature representation. This combined representation was processed through four series-connected transformer encoder blocks (8 heads, embedding dimension 768) to obtain the latent space Z:
$[Z_i]_{i=1}^{\hat{N}+1} = E\left\{ [F(X_i) + P_i]_{i=1}^{\hat{N}} \oplus [C + P_{\hat{N}+1}] \right\},$
where $\hat{N} = (1 - \rho)N$ denotes the number of tokens remaining after masking, and $\oplus$ represents the concatenation operation. The latent class token $Z_{\hat{N}+1}$ served as the feature representation for downstream emotion recognition tasks.
Notably, we applied distinct masking ratios to the original ($\rho_{\text{orig}}$) and transformed ($\rho_{\text{trans}}$) EEG signals, with $\rho_{\text{orig}} > \rho_{\text{trans}}$, to facilitate local-to-global representation learning. Only the preceding networks, up to and including the transformer encoder, were used in the downstream task evaluation.

4.1.4. Transformer Decoder

The transformer decoder $D$ reconstructed the EEG signal from the latent representations $[Z_i]_{i=1}^{\hat{N}}$; only the latent representations of the original signal were used. The decoder had a structure similar to the encoder but employed three series-connected transformer blocks (8 heads, embedding dimension 768) and included an additional linear projection layer. The linear projection was applied to the latent representations $[Z_i]_{i=1}^{\hat{N}}$ to reduce the token dimensionality. The projected latent representations were then concatenated with mask tokens $[M_i]_{i=1}^{N-\hat{N}}$ to restore the original token sequence of length N. This combined representation underwent positional encoding before being processed by the transformer decoder. The final reconstructed EEG signal Y was obtained through layer normalization and a linear projection $G$ applied to the decoder output:
$[Y_i]_{i=1}^{N} = G \cdot D\left\{ \mathcal{P}\left( [Z_i]_{i=1}^{\hat{N}} \oplus [M_i]_{i=1}^{N-\hat{N}} \right) + [P_i]_{i=1}^{N} \right\},$
where $\mathcal{P}$ denotes the token reordering operation. Note that the positional embeddings $P_i$ differ between the transformer encoder and decoder.

4.2. Training Task

The pretext objectives comprise signal reconstruction and contrastive learning. For the signal reconstruction task, we computed the loss only on masked patches, between the decoder's predictions and the original EEG signals, so that $[Y_i]_{i=1}^{N_m}$ approximated $[X_i]_{i=1}^{N_m}$ as closely as possible. Given the high variability of raw EEG signals, the traditional Mean Squared Error (MSE) may suffer from instability due to differences in data scale and magnitude. We therefore employed a Normalized Mean Squared Error (NMSE) loss for more robust optimization. The reconstruction loss was:
$\mathcal{L}_{\text{rec}} = \frac{1}{N_m} \sum_{i=1}^{N_m} \left( Y_i - \frac{X_i - \bar{X}_i}{\sigma_{X_i}} \right)^2,$
where $N_m = N - \hat{N}$ represents the number of masked tokens, and $\bar{X}_i$ and $\sigma_{X_i}$ represent the mean and standard deviation of each token, respectively.
The objective of contrastive learning is to maximize the similarity between identical instances while minimizing the similarity between different instances. Given the original EEG signal and its transformed pair, we extracted two latent class tokens, denoted as $Z_{\hat{N}+1}^{\text{original}}$ and $Z_{\hat{N}+1}^{\text{transform}}$, respectively.
Based on these tokens, we constructed positive pairs by matching tokens from the same batch position index, while negative pairs were formed by pairing tokens from different indices. The similarity between these token pairs was then quantified using the Normalized Temperature-scaled Cross-Entropy loss (NT-Xent) [56].
The final contrastive loss $\mathcal{L}_{\text{contrast}}$ was computed as the mean of the bidirectional NT-Xent losses between the original and transformed token representations:
$\mathcal{L}_{\text{contrast}} = \frac{1}{2}\left( \mathcal{L}_{\text{original} \to \text{transform}} + \mathcal{L}_{\text{transform} \to \text{original}} \right),$
$\mathcal{L}_{a \to b} = -\frac{1}{N_b} \sum_{i=1}^{N_b} \log \frac{\exp\left( \text{sim}(z_i^a, z_i^b)/\tau \right)}{\sum_{k=1}^{2N_b} \mathbb{1}_{[k \neq i]} \exp\left( \text{sim}(z_i^a, z_k^b)/\tau \right)},$
where $\text{sim}(u, v) = u^{\top} v / (\|u\| \|v\|)$ is the cosine similarity, $\tau$ is a temperature parameter scaling the similarity distribution (default 0.05), $N_b$ is the batch size, and $\mathbb{1}_{[k \neq i]}$ is an indicator function equal to 1 when $k \neq i$.
In the training process, the masked prediction task was performed concurrently with contrastive learning. To balance the contributions of reconstruction loss and contrastive loss, we introduced a hyperparameter α . This configuration allowed the overall loss magnitude to be controlled by the learning rate, while the ratio between reconstruction loss and contrastive loss could be adjusted through α . The final compound loss function was
$\mathcal{L} = \mathcal{L}_{\text{rec}} + \alpha \mathcal{L}_{\text{contrast}},$
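A PyTorch sketch of this compound objective follows; tensor shapes, the epsilon guard, and the default temperature are illustrative, and the NT-Xent term is written in its usual symmetric form.

```python
import torch
import torch.nn.functional as F

def nmse_loss(pred, target):
    """pred, target: (n_masked_tokens, token_len); target is normalized per token."""
    norm = (target - target.mean(dim=1, keepdim=True)) / (target.std(dim=1, keepdim=True) + 1e-8)
    return ((pred - norm) ** 2).mean()

def nt_xent(z_a, z_b, tau=0.05):
    """z_a, z_b: (batch, dim) latent class tokens of the original / transformed branches."""
    b = z_a.shape[0]
    z = F.normalize(torch.cat([z_a, z_b], dim=0), dim=1)              # (2B, dim)
    sim = (z @ z.t()) / tau                                           # scaled cosine similarities
    sim = sim.masked_fill(torch.eye(2 * b, dtype=torch.bool, device=sim.device), float("-inf"))
    targets = torch.cat([torch.arange(b, 2 * b), torch.arange(0, b)]).to(sim.device)
    return F.cross_entropy(sim, targets)                              # positive pair sits at the matched index

def total_loss(pred, target, z_orig, z_trans, alpha=1.0):
    return nmse_loss(pred, target) + alpha * nt_xent(z_orig, z_trans)
```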

4.3. Classification Task

The proposed method did not focus on the design of downstream classifiers, as a wide variety of established models are already available for emotion recognition. Machine learning-based methods include Linear Discriminant Analysis (LDA) [57], Bayesian methods [58], SVM [59], MLP, KNN [60], and Random Forest [61]. Deep learning-based approaches comprise Deep Belief Networks (DBNs) [20], CNNs [62,63,64], CNN-LSTM hybrid models [65], Graph Neural Networks (GNNs) [66], and CNN–transformer hybrids [67]. In addition to conventional classifiers, domain adaptation (DA) techniques are more suitable for cross-subject tasks, given their growing prominence in contemporary research [17,18].
ML algorithms generally achieve lower accuracy than the latest DL algorithms, though their robustness enables consistent performance across diverse scenarios; thus, ML methods remain prevalent in most emotion recognition systems and commercial applications. In the following experiments, we selected four commonly used ML methods for downstream task training and testing: SVM, MLP, KNN, and RF.
Model performance was evaluated using accuracy (ACC) and Macro-F1 (MF1) scores. ACC reflects the overall prediction correctness of the model, while MF1 simultaneously considers both precision and recall across all classes, providing a balanced measure especially valuable for imbalanced datasets. Formulas for ACC and MF1 are as follows:
$ACC = \frac{TP + TN}{TP + TN + FP + FN},$
$MF1 = \frac{2}{N} \sum_{i=1}^{N} \frac{P_i \times R_i}{P_i + R_i},$
where $P_i = \frac{TP_i}{TP_i + FP_i}$ and $R_i = \frac{TP_i}{TP_i + FN_i}$; $TP$ and $TN$ denote the true positives and true negatives, $FP$ and $FN$ denote the false positives and false negatives, and N is the total number of classes.
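For reference, both metrics correspond directly to standard scikit-learn calls:

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    acc = accuracy_score(y_true, y_pred)
    mf1 = f1_score(y_true, y_pred, average="macro")   # unweighted mean of per-class F1 scores
    return acc, mf1
```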

5. Experiment and Materials

This section outlines the datasets employed for algorithm validation, the experimental protocol for data acquisition, and all data processing procedures.

5.1. Channel Selection

For few-channel emotion recognition devices, electrode selection is crucial. Since emotions typically involve activity across different brain regions, sparse electrode configurations often fail to adequately capture emotional dynamics. Based on prior research on optimal electrode placement, the temporal lobe regions are generally prioritized [20]. However, to better capture the spatial patterns of brain activity, we positioned two additional electrodes over the prefrontal and parietal regions. The final electrode configuration consisted of Fp1, C5, Cp3, and P4, with the reference and bias electrodes placed on the left ear (A1) and right ear (A2), respectively. The layout of the selected EEG electrodes according to the 10–20 system is shown in Figure 14.

5.2. SEED Dataset

The SEED dataset was collected from 15 participants (7 males and 8 females), each undergoing three sessions one week apart, resulting in a total of 45 sessions [68]. In each session, participants watched 15 film clips (5 positive, 5 negative, and 5 neutral) presented in a well-organized sequence to prevent consecutive clips of the same emotion. Each trial began with a 5 s hint, followed by a 4 min film clip, a 45 s self-assessment where participants reported their emotional responses via a questionnaire, and a 15 s rest period. The dataset was designed for a three-class emotion classification task (positive, negative, neutral), with participants instructed to provide immediate and genuine feedback after each clip to ensure reliable emotional labeling.

5.3. Experiment Protocol

To ensure the comparability and reliability of our experiment, we employed the same emotional induction materials as the SEED dataset, consisting of 15 Chinese film clips categorized into positive, neutral, and negative. The experiment involved two healthy subjects with no visual impairments, each seated comfortably in front of a computer. The experiment comprised a total of 15 trials, with each trial beginning with a 5 s cue before the video playback, followed by a 20 s rest period after each clip. The presentation order of the film clips was kept the same as that of the SEED dataset.

5.4. Data Processing

For the SEED dataset, we utilized the provided processed data, which had been downsampled to 200 Hz and, after segmentation, filtered with a 0–75 Hz low-pass filter. The processed data were segmented using a fixed 5 s window without overlap to simulate a real online task, where 5 s was selected to balance prediction accuracy and feedback frequency. Labels were assigned to the segmented data according to the video sequence, and the segments were concatenated along the first dimension, yielding labels of shape (windows) and data of shape (windows × channels × 1000), with each experiment saved as a separate file. Our collected data underwent identical processing steps, including downsampling to 200 Hz, 0–75 Hz low-pass filtering, and segmentation with a fixed 5 s window, with each experiment likewise saved as an individual file.
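A minimal NumPy sketch of this windowing step is given below; the trial array and label value are placeholders, while the 200 Hz rate and 5 s non-overlapping windows follow the procedure described above.

```python
import numpy as np

FS = 200              # sampling rate after downsampling (Hz)
WIN = 5 * FS          # 5 s window -> 1000 samples

def segment_trial(trial, label):
    """Split one (channels x samples) trial into non-overlapping 5 s windows.

    Returns data of shape (windows, channels, 1000) and one label per window.
    """
    n_win = trial.shape[1] // WIN
    data = np.stack([trial[:, i * WIN:(i + 1) * WIN] for i in range(n_win)])
    labels = np.full(n_win, label)
    return data, labels

trial = np.random.randn(4, FS * 240)        # placeholder 4-channel, 4-minute trial
X, y = segment_trial(trial, label=2)        # X: (48, 4, 1000), y: (48,)
```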

6. Results

6.1. Signal Acquisition Validation

To verify the functionality of the proposed EEG acquisition system, a series of signal recording experiments were conducted under typical usage scenarios.
Figure 15 shows two representative types of bioelectrical signals recorded in addition to EEG. Clear eye-blink artifacts are visible in the EEG traces. Distinct EMG activity was observed during voluntary teeth clenching, reflecting effective muscle signal acquisition.
For EEG recordings, the alpha-band activity was clearly observed in the occipital region during the eyes-closed state. Figure 16 illustrates the PSD and time–frequency analysis of EEG signals under eyes-open and eyes-closed conditions. A pronounced increase in alpha-band power was evident during the eyes-closed state, consistent with well-established physiological patterns.
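This eyes-open versus eyes-closed contrast can be reproduced with a standard Welch PSD estimate; the signal arrays and the 30 s duration below are placeholders for the recorded occipital traces.

```python
import numpy as np
from scipy.signal import welch

FS = 200
eyes_open = np.random.randn(FS * 30)     # placeholder occipital EEG, eyes open
eyes_closed = np.random.randn(FS * 30)   # placeholder occipital EEG, eyes closed

def alpha_power(x, fs=FS, band=(8, 13)):
    f, pxx = welch(x, fs=fs, nperseg=2 * fs)    # Welch PSD with 2 s segments
    sel = (f >= band[0]) & (f <= band[1])
    return np.trapz(pxx[sel], f[sel])           # integrated alpha-band power

print(alpha_power(eyes_open), alpha_power(eyes_closed))
```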

6.2. Methodology Validation

6.2.1. Feature Baselines

Power spectral density (PSD) and differential entropy (DE) features are commonly employed in emotion recognition systems, with their effectiveness well documented in previous studies. Both PSD and DE features are typically extracted across five standard frequency bands after applying a Hamming window: δ: 1–3 Hz, θ: 4–7 Hz, α: 8–13 Hz, β: 14–30 Hz, and γ: 31–50 Hz. It should be noted that the DE feature, originally an extension of Shannon entropy under a Gaussian assumption, depends on the variance of the EEG time series. Since this variance equals the average energy within the frequency band, DE can alternatively be computed from the PSD. The time-domain DE is given by [68]:
$$h(X) = \frac{1}{2} \log\left(2 \pi e \sigma^2\right),$$
where the EEG time series $X$ follows $X \sim \mathcal{N}(\mu, \sigma^2)$. For both PSD and DE features in our experiment, we applied the FFT with a fixed 1 s window (200 samples) and no overlap between windows.
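As an illustration of the PSD-based route to DE, the sketch below estimates band power with a Hamming-windowed periodogram and applies $\frac{1}{2}\log(2\pi e \sigma^2)$ per band. The scaling convention and the use of `scipy.signal.periodogram` are implementation assumptions, not the authors' exact extraction code.

```python
import numpy as np
from scipy.signal import periodogram

FS = 200
BANDS = {'delta': (1, 3), 'theta': (4, 7), 'alpha': (8, 13), 'beta': (14, 30), 'gamma': (31, 50)}

def de_features(window, fs=FS):
    """Differential entropy per band for one (channels x 200) 1 s segment."""
    f, pxx = periodogram(window, fs=fs, window='hamming', axis=-1)
    feats = []
    for lo, hi in BANDS.values():
        sel = (f >= lo) & (f <= hi)
        band_power = pxx[..., sel].mean(axis=-1)          # average band power as variance estimate
        feats.append(0.5 * np.log(2 * np.pi * np.e * band_power))
    return np.stack(feats, axis=-1)                       # shape (channels, 5)

print(de_features(np.random.randn(4, FS)).shape)          # (4, 5)
```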
In addition to frequency-domain features, we incorporated time-domain features considering potential online system applications. We selected the following computationally efficient features (a short implementation sketch follows the list):
  • Power (P): reflects signal intensity, calculated as the ratio of the sum of squared signal samples to the number of samples.
  • Line length (L): measures waveform dimensional changes, influenced by both frequency and amplitude variations, computed as the sum of absolute differences between consecutive samples.
  • Root-mean-square (RMS): obtained by taking the square root of the mean squared EEG signal samples.
  • First difference (D1): calculated as the sum of differences between the $N-1$ consecutive sample pairs, divided by $N-1$.
  • Second difference (D2): computed similarly to D1 but based on the $N-2$ pairs of samples two steps apart, divided by $N-2$.
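A compact NumPy sketch of these five features is shown below; treating D1 and D2 as mean absolute differences is an assumption about the exact convention used.

```python
import numpy as np

def time_domain_features(x):
    """Five inexpensive time-domain features for a 1-D EEG window x of length N."""
    x = np.asarray(x, dtype=float)
    n = x.size
    power = np.sum(x ** 2) / n                        # P: mean squared amplitude
    line_length = np.sum(np.abs(np.diff(x)))          # L: summed absolute sample-to-sample change
    rms = np.sqrt(np.mean(x ** 2))                    # RMS
    d1 = np.sum(np.abs(np.diff(x))) / (n - 1)         # D1: mean first difference (absolute value assumed)
    d2 = np.sum(np.abs(x[2:] - x[:-2])) / (n - 2)     # D2: mean difference between samples two steps apart
    return np.array([power, line_length, rms, d1, d2])

print(time_domain_features(np.random.randn(1000)))
```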

6.2.2. Model Baselines

Multiple recent and classical machine learning and deep learning methods were compared in the experiments, and they were tuned to their best performance.
Feature-based machine learning baselines and their parameters were as follows (a scikit-learn configuration sketch is given after this list):
  • Random Forest (RF): the number of estimators was set to 100 for both traditional and SSL-based features.
  • Multilayer perceptron (MLP):
    SSL-based features: hidden layer size = 1000, initial learning rate = $1 \times 10^{-5}$, ReLU activation.
    Traditional features: hidden layer size = 100, initial learning rate = $1 \times 10^{-4}$, ReLU activation.
  • k-Nearest Neighbors (k-NN): the number of neighbors (k) was set to 50 for both feature types.
  • Support Vector Machine (SVM):
    SSL-based features: regularization parameter C = 5 , RBF kernel.
    Traditional features: C = 10 , RBF kernel.
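As a rough mapping of the above settings onto scikit-learn estimators (shown for the SSL-based features; the released training code may differ in details such as feature scaling and solver defaults):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Baseline classifiers configured as listed above for SSL-based features.
baselines = {
    'RF': RandomForestClassifier(n_estimators=100),
    'MLP': MLPClassifier(hidden_layer_sizes=(1000,), learning_rate_init=1e-5, activation='relu'),
    'KNN': KNeighborsClassifier(n_neighbors=50),
    'SVM': SVC(C=5, kernel='rbf'),
}
```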
Deep learning baselines included the following:
  • ERTNet is a CNN–transformer model in which feature maps are first extracted through temporal and spatial convolutions, and higher-level spatiotemporal features are then integrated by a transformer [67].
  • EEGNet is a classical compact CNN for EEG decoding, built on depthwise and separable convolutions [63].
  • ShallowConvNet and DeepConvNet are two CNNs of different depths designed for EEG decoding [64].
  • TSception is a multi-scale convolutional neural network consisting of dynamic temporal layers, asymmetric spatial layers, and high-level fusion layers [62].

6.2.3. Validation on SEED Dataset

In the cross-session evaluation, we ensured that the SSL model and downstream machine learning training utilized identical sessions for the training set, while the test set was exclusively composed of data from a distinct session. Specifically, the SEED dataset comprised three sessions, each containing data from 15 subjects. Initially, we trained the unsupervised model using all subjects from one session for 40 epochs. Subsequently, the same session’s data were employed for downstream emotion classification training. Upon completion of these two training phases, the model was evaluated on the remaining two sessions separately, with ACC and MF1 being recorded. To enhance robustness, four distinct machine learning models were trained and tested in parallel. After iterating this process across all three sessions, the results were averaged to yield the SSL features’ performance reported in Table 6. Baseline features were trained and tested with the ML models in exactly the same manner; in effect, the baseline features simply replace the SSL model’s output in the downstream training and testing. For the DL methods, the models were trained on one session and independently evaluated on the other two sessions.
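The protocol can be summarized by the following loop. Here the SSL (or baseline) features are assumed to have been extracted beforehand for every session, so `session_feats` and `session_labels` are plain arrays; the SVM settings mirror those listed in Section 6.2.2.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, f1_score

def cross_session_eval(session_feats, session_labels):
    """Train on one session's features, test on each remaining session, average ACC / MF1.

    session_feats: list of (n_windows, n_features) arrays, one per session
    session_labels: list of (n_windows,) label arrays
    """
    accs, mf1s = [], []
    for i in range(len(session_feats)):
        clf = SVC(C=5, kernel='rbf')
        clf.fit(session_feats[i], session_labels[i])
        for j in range(len(session_feats)):
            if j == i:
                continue
            pred = clf.predict(session_feats[j])
            accs.append(accuracy_score(session_labels[j], pred))
            mf1s.append(f1_score(session_labels[j], pred, average='macro'))
    return float(np.mean(accs)), float(np.mean(mf1s))
```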
The experimental results demonstrated that the SSL features extracted by the SSL model outperformed the baseline features on the RF, MLP, and SVM in both ACC and MF1, and they also outperformed all DL methods. Notably, the SVM classifier delivered the highest performance with SSL features, attaining an average accuracy of 0.5471 and an MF1 of 0.5447. Furthermore, SSL features exhibited a smaller performance variance among different ML algorithms, with maximum deviations of only 0.034 (ACC and MF1) across models; this showed the robustness of our proposed algorithm.
However, we observed that the SSL features achieved comparable ACC to DE features when using KNN, while outperforming DE features in terms of MF1. The improvements in SSL features over traditional methods remained modest, with the highest gains observed for the MLP (0.058 in ACC and 0.066 in MF1). Although our algorithm demonstrated greater robustness than traditional features on the SEED dataset, the overall performance improvement was limited.

6.2.4. Validation on Experiment Dataset

Following the validation on the SEED dataset, we further investigated the performance of the SSL model on our proposed portable device through a cross-session experiment. In our cross-session test, we intentionally employed a cross-dataset SSL feature extractor to evaluate the robustness and generalization capability. Specifically, the SSL feature extractor was trained on session 3 from the SEED dataset, while downstream ML classifiers were trained and tested based on the experiment’s data in a cross-session manner. Our validation results indicated that this cross-dataset approach caused a performance reduction of 1–3% compared to conventional cross-session SSL training.
Despite this slight performance trade-off, we deliberately adopted the cross-dataset paradigm for two key reasons: to prove the robustness of our algorithm to the changes in the dataset and to highlight its low sensitivity to the EEG signal quality.
For the ML baselines’ comparison, we exclusively employed DE features, as they demonstrated better performance compared to PSD and time-domain features in prior cross-session validation.
Experimental results from Table 7 demonstrate that our proposed method outperformed ML baselines based on DE features across both ACC and MF1. The most significant improvement was observed with the MLP, achieving a 20.4% increase in ACC and a 20.9% enhancement in MF1. The RF classifier yielded the highest performance under the SSL features (ACC: 60.2%; MF1: 59.4%). The DL baselines showed comparable performance to DE-based ML methods in that experiment. Tsception, which had the best performance among DL methods, exhibited similar results to the DE-based Random Forest, with differences in both ACC and MF1 being less than 1%. However, our proposed approach based on RF outperformed Tsception and DE-based RF by around 5% ACC and 6% MF1. Through a comparative analysis between our experiment and the previous experiment on the SEED dataset, our method demonstrated significantly greater performance improvements on our dataset compared to other baseline approaches. For instance, while the SSL feature-based SVM showed only a 1% ACC improvement over DE features on the SEED dataset, our experiments revealed a substantial 9% enhancement. Similarly, the MLP achieved a 20.4% ACC improvement in our study, significantly higher than the 5.8% observed on the SEED dataset. The Random Forest classifier showed a 4.8% ACC increase (compared to 0.24% on the SEED dataset), and KNN demonstrated a 9.9% improvement (versus −0.17% on the SEED dataset).
These results indicated that our model exhibited more significant effectiveness under our proposed device, which aligned with our initial hypotheses. These findings substantially support that our model performs particularly well on devices with potential signal quality issues.

6.2.5. Ablation Study

In this section, we conducted ablation experiments to validate the effectiveness of each component in our model, and the experiment results can be found in Table 8. As described in Section 4 (Methodology), we introduced contrastive learning through an additional branch, where different mask rates were applied to the original EEG and transformed EEG branches, enabling the model to learn from local-to-global representations.
To investigate the impact of key components, we performed three ablation studies (a configuration sketch of these variants follows the list):
1. Non-ST (signal transformation): the signal transformation component was removed to examine the effectiveness of contrastive learning.
2. Non-DMR (differential mask rate): the mask rate of the transformed EEG branch was set equal to that of the original EEG branch, which was fixed at 0.8 during all experiments.
3. Non-STM (spatio-temporal masking): the mask rate of the transformed EEG branch was set to zero (no masking).
The latter two variants adjust the mask rate of the transformed EEG branch to evaluate the importance of local-to-global learning.
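A minimal way to express these variants is as overrides of the full model’s configuration; only the settings explicitly stated above are included, and all other hyperparameters are assumed unchanged.

```python
# Each ablation variant overrides one setting of the full EmoAdapt configuration.
ablation_variants = {
    'Non-ST':  {'signal_transform': False},        # remove the signal-transformation component
    'Non-DMR': {'mask_rate_transformed': 0.8},     # equal to the original-branch mask rate
    'Non-STM': {'mask_rate_transformed': 0.0},     # no masking on the transformed branch
}
```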

6.2.6. Visualization

To comprehensively evaluate the performance enhancement of SSL features over conventional features, we employed t-distributed stochastic neighbor embedding (t-SNE) for dimensionality reduction and visualization. The t-SNE projection demonstrated an expansion of inter-class separability, indicating that the distinctions between different emotion categories became more pronounced in the SSL feature space. Additionally, intra-class variability was substantially minimized, resulting in tighter clustering of samples belonging to the same emotional category. As illustrated in Figure 17, the SSL features exhibited more compact intra-class distributions and clearer inter-class separation boundaries compared to conventional DE features.
Figure 17a,b shows the data of session 3 of the 15th subject in the SEED dataset. We can see the three non-overlapping classes of emotions from the t-SNE space in Figure 17a. However, the DE features demonstrated limited performance on devices with potential signal quality issues, as illustrated in Figure 17c. On our acquisition equipment, the DE features failed to discriminate among the three emotional classes, showing complete overlap in their representations. In contrast, our proposed method achieved effective separation of the three emotional categories in the t-SNE space with significantly reduced overlapping.
Notably, we observed a partial overlap between a small subset of neutral and happy emotion samples in Figure 17b, which likely reflected inherent label noise caused by imperfect emotional elicitation during data collection rather than deficiencies in the SSL model itself. This observation underscores the model’s ability to capture subtle emotional nuances that may not be perfectly aligned with the ground-truth labels.
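The projection can be reproduced along the following lines; the feature matrix and label array are placeholders, the class ordering is an assumption, and only the perplexity value (30) is taken from the caption of Figure 17.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

features = np.random.randn(600, 128)          # placeholder SSL (or DE) feature vectors
labels = np.random.randint(0, 3, size=600)    # placeholder emotion labels

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(features)
for cls, name in enumerate(['negative', 'neutral', 'positive']):   # assumed label order
    pts = emb[labels == cls]
    plt.scatter(pts[:, 0], pts[:, 1], s=5, label=name)
plt.legend()
plt.title('t-SNE of extracted features')
plt.show()
```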

7. Discussion

As shown in Table 1, the amplifier selected in this work operated at a lower voltage while providing higher gain compared to several established commercial devices. While this configuration reduced the burden on the ADC, it constrained the dynamic range, limiting the input to 4.4 mVpp. To mitigate this, the system is typically operated at a lower gain setting to accommodate a wider dynamic range. Additionally, stable electrode attachment is critical to prevent motion-induced voltage spikes that may cause amplifier saturation. Currently, input noise, CMRR, and other parameters are not yet optimized and will be addressed in future iterations. However, our prototype system incorporating the proposed algorithm successfully achieved the intended emotion analysis objectives, demonstrating encouraging initial performance.
To address these signal quality limitations, we introduced the self-supervised feature extractor EmoAdapt. As demonstrated in Table 6 and Table 7, EmoAdapt significantly alleviated performance degradation due to signal noise. However, it still encountered challenges related to cross-subject feature distribution discrepancies, particularly in scenarios involving new users without conditions for model fine-tuning. Future work will explore extending the framework to a semi-supervised paradigm, where partial label information is incorporated into the NT-Xent loss to better align feature centroids between positive and negative pairs.
Currently, the system is a single-modal, low-channel EEG acquisition device. Extensive studies have demonstrated that multimodal EEG can enhance emotion recognition performance. For instance, Yang et al. proposed an AI edge computation platform for emotion recognition using wearable physiological sensors [69]. By utilizing EEG and ECG/PPG signals for emotion classification, the accuracy achieved was 94.3% and 76.8%, respectively. That platform integrated a RISC-V CPU, an accelerator, and an FPGA, enabling real-time applications and local neural network training. Future enhancements may involve exploring more complex network architectures and multimodal signal fusion strategies.
Similarly, Kim et al. introduced a wearable system capable of detecting emotional transitions and identifying causal relationships in daily scenarios [70]. Additionally, Saffaryazdi et al. combined facial micro-expressions, EEG, GSR, PPG, and other multimodal signals for emotion recognition [71]. Unlike macro-expressions, micro-expressions provide more nuanced cues, contributing to greater accuracy. Their experimental results indicated that combining multiple modalities and fusing their outputs could improve emotion recognition.
Motivated by these insights, future development of the proposed system could exploit its reserved expansion interfaces to enable multimodal signal acquisition. Nevertheless, given the system’s emphasis on cost-effectiveness, it is essential to critically assess whether the integration of additional modalities yields performance improvements that justify the increased hardware complexity and cost. If such gains do not scale proportionally, the incorporation of additional sensing modalities may not be economically justified.
In this paper, we proposed a modular concept for the system, allowing the analog front-end to be easily replaced. So far, we have only validated the feasibility of this concept using the KS1092 AFE, and no side-by-side comparison with state-of-the-art equipment has been performed. In future work, various AFE modules can be designed on top of this system and evaluated individually, and an objective cost-effectiveness metric can be established from the test results to facilitate the selection process for researchers and engineers during system design.

8. Conclusions

This study presented a complete, portable, and cost-effective EEG-based emotion recognition system that integrated modular hardware with a robust self-supervised learning framework. The hardware, developed with a total component cost of approximately USD 31, supported both wet and dry electrodes, wireless data transmission, onboard storage, adjustable gain and sampling rates, and real-time visualization through a web interface, making it suitable for real-world deployment. On the algorithmic side, we proposed EmoAdapt, a self-supervised feature extractor designed to enhance emotion-related EEG feature learning. Evaluated on both public (SEED) and in-house datasets under cross-session, cross-subject, and cross-dataset scenarios, EmoAdapt consistently outperformed baseline methods in terms of feature robustness and discriminability, particularly under the low-signal-quality conditions typical of portable acquisition. The proposed system demonstrated a well-balanced design between hardware affordability and algorithmic performance, offering a scalable and accessible solution for practical affective-computing applications.

Author Contributions

Conceptualization, H.L. (Hao Luo), H.L. (Haobo Li), C.-I.I. and F.W.; Methodology, H.L. (Haobo Li) and W.T.; Software, H.L. (Hao Luo) and Y.Y.; Validation, H.L. (Hao Luo) and H.L. (Haobo Li); Formal analysis, H.L. (Haobo Li) and Y.Y.; Investigation, H.L. (Hao Luo) and W.T.; Resources, C.-I.I. and F.W.; Data curation, H.L. (Haobo Li); Writing—original draft, H.L. (Hao Luo), H.L. (Haobo Li); Writing—review & editing, W.T., Y.Y., C.-I.I. and F.W.; Visualization, H.L. (Hao Luo) and H.L. (Haobo Li); Supervision, C.-I.I. and F.W.; Project administration, C.-I.I. and F.W.; Funding acquisition, C.-I.I. and F.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded in part by The Science and Technology Development Fund, Macau SAR (File no. 0085/2023/AMJ), by University of Macau (File no. MYRG2022-00197-FST, MYRG-GRG2024-00285-FST, MYRG-CRG2024-00048-FST-ICI), and by Guangdong Basic and Applied Basic Research Foundation (Grant No. 2023A1515010844).

Data Availability Statement

The SEED dataset used in the methodology validation can be found at https://bcmi.sjtu.edu.cn/home/seed/seed.html (accessed on 20 April 2025). The proposed algorithm code and processed experimental data can be found at https://github.com/LihaoboECE/EmoAdapt (accessed on 20 April 2025).

Acknowledgments

Thanks are due to Jerry Zhang for offering technical support on KS1092.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Alarcao, S.M.; Fonseca, M.J. Emotions recognition using EEG signals: A survey. IEEE Trans. Affect. Comput. 2017, 10, 374–393. [Google Scholar] [CrossRef]
  2. Li, X.; Zhang, Y.; Tiwari, P.; Song, D.; Hu, B.; Yang, M.; Zhao, Z.; Kumar, N.; Marttinen, P. EEG based emotion recognition: A tutorial and review. ACM Comput. Surv. 2022, 55, 1–57. [Google Scholar] [CrossRef]
  3. Halim, Z.; Rehan, M. On identification of driving-induced stress using electroencephalogram signals: A framework based on wearable safety-critical scheme and machine learning. Inf. Fusion 2020, 53, 66–79. [Google Scholar] [CrossRef]
  4. Rozgic, V.; Vazquez-Reina, A.; Crystal, M.; Srivastava, A.; Tan, V.; Berka, C. Multi-modal prediction of ptsd and stress indicators. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; IEEE: New York, NY, USA, 2014; pp. 3636–3640. [Google Scholar]
  5. Cai, H.; Qu, Z.; Li, Z.; Zhang, Y.; Hu, X.; Hu, B. Feature-level fusion approaches based on multimodal EEG data for depression recognition. Inf. Fusion 2020, 59, 127–138. [Google Scholar] [CrossRef]
  6. Moshfeghi, Y.; Jose, J.M. An effective implicit relevance feedback technique using affective, physiological and behavioural features. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 28 July–1 August 2013; pp. 133–142. [Google Scholar]
  7. Nguyen, T.H.; Chung, W.Y. Negative news recognition during social media news consumption using EEG. IEEE Access 2019, 7, 133227–133236. [Google Scholar] [CrossRef]
  8. Alakus, T.B.; Gonen, M.; Turkoglu, I. Database for an emotion recognition system based on EEG signals and various computer games–GAMEEMO. Biomed. Signal Process. Control 2020, 60, 101951. [Google Scholar] [CrossRef]
  9. Vesisenaho, M.; Juntunen, M.; Fagerlund, J.; Miakush, I.; Parviainen, T. Virtual reality in education: Focus on the role of emotions and physiological reactivity. J. Virtual Worlds Res. 2019, 12, 1. [Google Scholar] [CrossRef]
  10. Ratti, E.; Waninger, S.; Berka, C.; Ruffini, G.; Verma, A. Comparison of medical and consumer wireless EEG systems for use in clinical trials. Front. Hum. Neurosci. 2017, 11, 398. [Google Scholar] [CrossRef]
  11. Maskeliunas, R.; Damasevicius, R.; Martisius, I.; Vasiljevas, M. Consumer-grade EEG devices: Are they usable for control tasks? PeerJ 2016, 4, e1746. [Google Scholar] [CrossRef]
  12. Dadebayev, D.; Goh, W.W.; Tan, E.X. EEG-based emotion recognition: Review of commercial EEG devices and machine learning techniques. J. King Saud-Univ.-Comput. Inf. Sci. 2022, 34, 4385–4401. [Google Scholar] [CrossRef]
  13. Badcock, N.A.; Mousikou, P.; Mahajan, Y.; De Lissa, P.; Thie, J.; McArthur, G. Validation of the Emotiv EPOC® EEG gaming system for measuring research quality auditory ERPs. PeerJ 2013, 1, e38. [Google Scholar] [CrossRef] [PubMed]
  14. Song, T.; Zheng, W.; Song, P.; Cui, Z. EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Trans. Affect. Comput. 2018, 11, 532–541. [Google Scholar] [CrossRef]
  15. Petrantonakis, P.C.; Hadjileontiadis, L.J. Emotion recognition from EEG using higher order crossings. IEEE Trans. Inf. Technol. Biomed. 2009, 14, 186–197. [Google Scholar] [CrossRef]
  16. Oh, S.H.; Lee, Y.R.; Kim, H.N. A novel EEG feature extraction method using Hjorth parameter. Int. J. Electron. Electr. Eng. 2014, 2, 106–110. [Google Scholar] [CrossRef]
  17. Yang, Y.; Wang, Z.; Tao, W.; Liu, X.; Jia, Z.; Wang, B.; Wan, F. Spectral-spatial attention alignment for multi-source domain adaptation in EEG-based emotion recognition. IEEE Trans. Affect. Comput. 2024, 15, 2012–2024. [Google Scholar] [CrossRef]
  18. Yang, Y.; Wang, Z.; Song, Y.; Jia, Z.; Wang, B.; Jung, T.P.; Wan, F. Exploiting the Intrinsic Neighborhood Semantic Structure for Domain Adaptation in EEG-based Emotion Recognition. IEEE Trans. Affect. Comput. 2025, 1–13. [Google Scholar] [CrossRef]
  19. Shi, L.C.; Jiao, Y.Y.; Lu, B.L. Differential entropy feature for EEG-based vigilance estimation. In Proceedings of the 2013 35th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan, 3–7 July 2013; IEEE: New York, NY, USA, 2013; pp. 6627–6630. [Google Scholar]
  20. Zheng, W.L.; Lu, B.L. Investigating critical frequency bands and channels for EEG-based emotion recognition with deep neural networks. IEEE Trans. Auton. Ment. Dev. 2015, 7, 162–175. [Google Scholar] [CrossRef]
  21. Mohsenvand, M.N.; Izadi, M.R.; Maes, P. Contrastive representation learning for electroencephalogram classification. In Proceedings of the Machine Learning for Health, Virtual, 11 December 2020; PMLR: New York, NY, USA, 2020; pp. 238–253. [Google Scholar]
  22. Wang, X.; Ma, Y.; Cammon, J.; Fang, F.; Gao, Y.; Zhang, Y. Self-supervised EEG emotion recognition models based on CNN. IEEE Trans. Neural Syst. Rehabil. Eng. 2023, 31, 1952–1962. [Google Scholar] [CrossRef]
  23. Jiang, X.; Zhao, J.; Du, B.; Yuan, Z. Self-supervised contrastive learning for EEG-based sleep staging. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; IEEE: New York, NY, USA, 2021; pp. 1–8. [Google Scholar]
  24. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollár, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  25. Xie, Z.; Zhang, Z.; Cao, Y.; Lin, Y.; Bao, J.; Yao, Z.; Dai, Q.; Hu, H. Simmim: A simple framework for masked image modeling. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), New Orleans, LA, USA, 18–24 June 2022. [Google Scholar]
  26. Huang, Z.; Jin, X.; Lu, C.; Hou, Q.; Cheng, M.M.; Fu, D.; Shen, X.; Feng, J. Contrastive Masked Autoencoders are Stronger Vision Learners. IEEE Trans. Pattern Anal. Mach. Intell. 2024, 46, 2506–2517. [Google Scholar] [CrossRef]
  27. Lee, C.H.; Kim, H.; Han, H.-J.; Jung, M.K.; Yoon, B.C.; Kim, D.J. NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG. arXiv 2024, arXiv:2404.17585. [Google Scholar]
  28. OpenBCI. Ganglion EEG Device. Available online: https://docs.openbci.com/Ganglion/GanglionSpecs/ (accessed on 1 April 2025).
  29. Emotiv. Insight EEG Device. Available online: https://www.emotiv.com/products/insight (accessed on 1 April 2025).
  30. InteraXon. MUSE S Athena EEG Device. Available online: https://choosemuse.com/products/muse-s-athena (accessed on 1 April 2025).
  31. NeuroSky. MindWave Mobile EEG Device. Available online: https://store.neurosky.com/pages/mindwave (accessed on 1 April 2025).
  32. Bitbrain. Diadem Dry EEG Device. Available online: https://www.bitbrain.com/neurotechnology-products/dry-eeg/diadem (accessed on 1 April 2025).
  33. Bitbrain. Ikon Textile EEG Device. Available online: https://www.bitbrain.com/neurotechnology-products/textile-eeg/ikon (accessed on 1 April 2025).
  34. Emotiv. EPOCX EEG Device. Available online: https://www.emotiv.com/products/epoc-x (accessed on 1 April 2025).
  35. InteraXon. MUSE 2 EEG Device. Available online: https://choosemuse.com/products/muse-2 (accessed on 1 April 2025).
  36. OpenBCI. Cyton EEG Device. Available online: https://docs.openbci.com/Cyton/CytonSpecs/ (accessed on 1 April 2025).
  37. Dabbaghian, A.; Yousefi, T.; Fatmi, S.Z.; Shafia, P.; Kassiri, H. A 9.2-g Fully-Flexible Wireless Ambulatory EEG Monitoring and Diagnostics Headband With Analog Motion Artifact Detection and Compensation. IEEE Trans. Biomed. Circuits Syst. 2019, 13, 1141–1151. [Google Scholar] [CrossRef]
  38. Wang, X.; Huang, W.; He, C.; Wu, H.; Lin, J.; Cheng, L. A Flexible EEG Acquisition Headband with High Reliability and High Signal-to-Noise Ratio. IEEE Sens. J. 2024, 24, 14370–14379. [Google Scholar] [CrossRef]
  39. Tian, F.; Zhu, L.; Shi, Q.; Wang, R.; Zhang, L.; Dong, Q.; Qian, K.; Zhao, Q.; Hu, B. The Three-Lead EEG Sensor: Introducing an EEG-Assisted Depression Diagnosis System Based on Ant Lion Optimization. IEEE Trans. Biomed. Circuits Syst. 2023, 17, 1305–1318. [Google Scholar] [CrossRef]
  40. Frey, S.; Lucchini, M.A.; Kartsch, V.; Ingolfsson, T.M.; Bernardi, A.H.; Segessenmann, M.; Osieleniec, J.; Benatti, S.; Benini, L.; Cossettini, A. GAPses: Versatile smart glasses for comfortable and fully-dry acquisition and parallel ultra-low-power processing of EEG and EOG. IEEE Trans. Biomed. Circuits Syst. 2024, 1–11. [Google Scholar] [CrossRef]
  41. Ding, R.; Hovine, C.; Callemeyn, P.; Kraft, M.; Bertrand, A. A wireless, scalable and modular EEG sensor network platform for unobtrusive brain recordings. IEEE Sens. J. 2025, 1. [Google Scholar] [CrossRef]
  42. Jiang, Y.; Tian, M.; Zhang, J.; Li, J.; Tan, C.; Ren, C.; Feng, J.; Cai, Y.; Gao, J.; Ma, Y.; et al. IEMS: An IoT-Empowered Wearable Multimodal Monitoring System in Neurocritical Care. IEEE Internet Things J. 2023, 10, 1860–1875. [Google Scholar] [CrossRef]
  43. Zhu, Y.; Yu, R.; Huang, S.; He, A. A Motor Imagery Training System Based on LHE7909 Portable EEG Acquisition Device. In Proceedings of the 2024 5th International Symposium on Computer Engineering and Intelligent Communications (ISCEIC), Wuhan, China, 8–10 November 2024; pp. 70–73. [Google Scholar] [CrossRef]
  44. Mendes Junior, J.J.A.; Campos, D.P.; Biassio, L.C.d.A.V.D.; Passos, P.C.; Júnior, P.B.; Lazzaretti, A.E.; Krueger, E. AD8232 to Biopotentials Sensors: Open Source Project and Benchmark. Electronics 2023, 12, 833. [Google Scholar] [CrossRef]
  45. Nanochap. Vital Signs Parameter Detection Chip. Available online: https://www.nanochap.cn/shengwuchuangan/shengmingtizhengcanshujiancexinpian.html (accessed on 1 April 2025).
  46. Kingsense. Single/Dual Channel Low Power Microvolt (uV) Level Signal Acquisition Analog Front-End. Available online: http://www.ks-chip.com/products/show-773.html (accessed on 21 March 2025).
  47. Li, B.; Cheng, T.; Guo, Z. A Review of EEG Acquisition, Processing and Application. J. Phys. Conf. Ser. 2021, 1907, 012045. [Google Scholar] [CrossRef]
  48. BearPi. Introduction of BearPi-BM H21E. Available online: https://www.bearpi.cn/core_board/bearpi/bm/h21e/ (accessed on 1 April 2025).
  49. Gao, M.; Wan, L.; Shen, R.; Gao, Y.; Wang, J.; Li, Y.; Vucetic, B. SparkLink: A Short-Range Wireless Communication Protocol with Ultra-Low Latency and Ultra-High Reliability. Innovation 2023, 4, 100386. [Google Scholar] [CrossRef]
  50. Zou, B.; Zheng, Y.; Shen, M.; Luo, Y.; Li, L.; Zhang, L. BEATS: An Open-Source, High-Precision, Multi-Channel EEG Acquisition Tool System. IEEE Trans. Biomed. Circuits Syst. 2022, 16, 1287–1298. [Google Scholar] [CrossRef]
  51. LSL Development Team. Lab Streaming Layer (LSL). Available online: https://labstreaminglayer.readthedocs.io/ (accessed on 4 April 2025).
  52. Siddharth; Patel, A.N.; Jung, T.P.; Sejnowski, T.J. A Wearable Multi-Modal Bio-Sensing System Towards Real-World Applications. IEEE Trans. Biomed. Eng. 2019, 66, 1137–1147. [Google Scholar] [CrossRef] [PubMed]
  53. Iwama, S.; Takemi, M.; Eguchi, R.; Hirose, R.; Morishige, M.; Ushiba, J. Two common issues in synchronized multimodal recordings with EEG: Jitter and Latency. Neurosci. Res. 2024, 203, 1–7. [Google Scholar] [CrossRef] [PubMed]
  54. Lee, S.; Kim, H.; Hong, D.k.; Ju, H. Correlation analysis of MQTT loss and delay according to QoS level. In Proceedings of the The International Conference on Information Networking 2013 (ICOIN), Bangkok, Thailand, 28–30 January 2013; pp. 714–717. [Google Scholar] [CrossRef]
  55. Soni, D.; Makwana, A. A Survey on Mqtt: A Protocol of Internet of Things (Iot). In Proceedings of the International Conference on Telecommunication, Power Analysis and Computing Techniques (ICTPACT-2017), Chennai, India, 6–8 March 2017; Volume 20, pp. 173–177. [Google Scholar]
  56. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
  57. Lee, Y.Y.; Hsieh, S. Classifying different emotional states by means of EEG-based functional connectivity patterns. PLoS ONE 2014, 9, e95415. [Google Scholar] [CrossRef]
  58. Chung, S.Y.; Yoon, H.J. Affective classification using Bayesian classifier and supervised learning. In Proceedings of the 2012 12th International Conference on Control, Automation and Systems, Jeju, Republic of Korea, 17–21 October 2012; IEEE: New York, NY, USA, 2012; pp. 1768–1771. [Google Scholar]
  59. Lan, Z.; Sourina, O.; Wang, L.; Liu, Y. Real-time EEG-based emotion monitoring using stable features. Vis. Comput. 2016, 32, 347–358. [Google Scholar] [CrossRef]
  60. Mohammadi, Z.; Frounchi, J.; Amiri, M. Wavelet-based emotion recognition system using EEG signal. Neural Comput. Appl. 2017, 28, 1985–1990. [Google Scholar] [CrossRef]
  61. Ackermann, P.; Kohlschein, C.; Bitsch, J.A.; Wehrle, K.; Jeschke, S. EEG-based automatic emotion recognition: Feature extraction, selection and classification methods. In Proceedings of the 2016 IEEE 18th International Conference on e-Health Networking, Applications and Services (Healthcom), Munich, Germany, 14–16 September 2016; IEEE: New York, NY, USA, 2016; pp. 1–6. [Google Scholar]
  62. Ding, Y.; Robinson, N.; Zhang, S.; Zeng, Q.; Guan, C. TSception: Capturing temporal dynamics and spatial asymmetry from EEG for emotion recognition. IEEE Trans. Affect. Comput. 2022, 14, 2238–2250. [Google Scholar] [CrossRef]
  63. Lawhern, V.J.; Solon, A.J.; Waytowich, N.R.; Gordon, S.M.; Hung, C.P.; Lance, B.J. EEGNet: A compact convolutional neural network for EEG-based brain–computer interfaces. J. Neural Eng. 2018, 15, 056013. [Google Scholar] [CrossRef]
  64. Schirrmeister, R.T.; Springenberg, J.T.; Fiederer, L.D.J.; Glasstetter, M.; Eggensperger, K.; Tangermann, M.; Hutter, F.; Burgard, W.; Ball, T. Deep learning with convolutional neural networks for EEG decoding and visualization. Hum. Brain Mapp. 2017, 38, 5391–5420. [Google Scholar] [CrossRef]
  65. Rajpoot, A.S.; Panicker, M.R. Subject independent emotion recognition using EEG signals employing attention driven neural networks. Biomed. Signal Process. Control 2022, 75, 103547. [Google Scholar]
  66. Wang, M.; El-Fiqi, H.; Hu, J.; Abbass, H.A. Convolutional neural networks using dynamic functional connectivity for EEG-based person identification in diverse human states. IEEE Trans. Inf. Forensics Secur. 2019, 14, 3259–3272. [Google Scholar] [CrossRef]
  67. Liu, R.; Chao, Y.; Ma, X.; Sha, X.; Sun, L.; Li, S.; Chang, S. ERTNet: An interpretable transformer-based framework for EEG emotion recognition. Front. Neurosci. 2024, 18, 1320645. [Google Scholar] [CrossRef] [PubMed]
  68. Duan, R.N.; Zhu, J.Y.; Lu, B.L. Differential entropy feature for EEG-based emotion classification. In Proceedings of the 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), San Diego, CA, USA, 6–8 November 2013; IEEE: New York, NY, USA, 2013; pp. 81–84. [Google Scholar]
  69. Yang, C.J.; Fahier, N.; He, C.Y.; Li, W.C.; Fang, W.C. An ai-edge platform with multimodal wearable physiological signals monitoring sensors for affective computing applications. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020; IEEE: New York, NY, USA, 2020; pp. 1–5. [Google Scholar]
  70. Kim, B.H.; Jo, S.; Choi, S. ALIS: Learning affective causality behind daily activities from a wearable life-log system. IEEE Trans. Cybern. 2021, 52, 13212–13224. [Google Scholar] [CrossRef] [PubMed]
  71. Saffaryazdi, N.; Wasim, S.T.; Dileep, K.; Nia, A.F.; Nanayakkara, S.; Broadbent, E.; Billinghurst, M. Using facial micro-expressions in combination with EEG and physiological signals for emotion recognition. Front. Psychol. 2022, 13, 864047. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Overall workflow of the proposed acquisition system.
Figure 2. Block diagram of the proposed acquisition device.
Figure 3. Schematic of the EEG amplifier module.
Figure 4. Schematic of the biasing circuit of the KS1092 amplifier.
Figure 5. Schematic of the impedance measurement circuit.
Figure 6. Schematic of the main control module.
Figure 7. Block diagram of the expansion module.
Figure 8. (a) ORing configuration of the power supply. (b) Schematic of the dual-battery supply module.
Figure 9. PCB design of the proposed acquisition system.
Figure 10. The key flowcharts in the embedded program. Figure (a) illustrates the state diagram depicting the system’s functional modes. Figure (b) outlines the hardware initialization process and critical steps. Figure (c) presents the flowchart for reading and transmitting EEG data.
Figure 11. Software block diagram.
Figure 12. Screenshot of the Web-based graphical user interface.
Figure 13. Overview of the EmoAdapt pipeline, including (1) signal transformation for generating negative samples, (2) tokenization and 1D-CNN for temporal feature extraction, (3) transformer encoder, and (4) transformer decoder.
Figure 14. The EEG cap layout based on the 10-20 system with highlighted selected channels (selected channels: Fp1, C5, Cp3, and P4 are highlighted with yellow, left ear reference electrode: A1, and right ear bias electrode: A2).
Figure 15. Using the earlobe as a reference, bioelectrical signals were collected from FP1. Figure (a) shows the EOG signal associated with blinking, with the blink periods highlighted in light blue. Figure (b) presents the electromyographic (EMG) signal related to teeth grinding, with the teeth-grinding period marked in light blue.
Figure 16. EEG signals in the occipital region during the eyes-open and eyes-closed states. Figure (a) shows the time-domain waveform during the 1 s eyes-open state, while Figure (b) displays the waveform in the eyes-closed state. (c) presents a comparison of the signal PSD in the 30 s eyes-open and eyes-closed states. (d) displays the time–frequency analysis for the eyes-open and eyes-closed states, revealing a distinct alpha wave in the eyes-closed state.
Figure 17. t-SNE visualization of Subject 15, Session 3, in the SEED dataset and one subject in our experimental data (t-SNE perplexity = 30): (a) Using DE features from the subject in the SEED dataset. (b) Using SSL features from the subject in the SEED dataset (model trained using SEED session 3). (c) Using DE features from the subject in our experimental data. (d) Using SSL features from the subject in our experimental data (same model as in (b)).
Table 1. Comparison table of commercial EEG device parameters.
Product | Company | Channel Number | Electrode Position | Input Noise (μVrms) | Input Range (mVpp) | Input Impedance (GΩ) | BW (Hz)
Diadem | Bitbrain | 12 | Fixed | <1 | 200 | 50 | DC–40
Ikon | Bitbrain | 5 | Fixed | <1 | 200 | 50 | DC–40
EPOCX | Emotiv | 14 | Fixed | / | 8 | / | 0.16–43
Insight | Emotiv | 5 | Fixed | / | 8.4 | / | 0.5–43
MUSE 2 | InteraXon | 4 | Fixed | / | 2 | / | /
Muse S | InteraXon | 4 | Fixed | / | 2 | / | /
Cyton | OpenBCI | 8 | Not fixed | / | 5000 | 0.5 | 150
Ganglion | OpenBCI | 4 | Not fixed | / | 3000 | 0.1 | 150
MindWave Mobile 2 | NeuroSky | 1 | Fixed | / | / | / | 3–100
Table 2. Comparison table of commercial EEG device parameters (continued).
Product | Sampling Frequency (Hz) | Resolution (bits) | CMRR (dB) | Communication | Battery Life (h) | Price (USD) | Source
Diadem | 256 | 24 | >100 | BT | 8 | / | [32]
Ikon | 256 | 24 | >100 | BLE | 9 | / | [33]
EPOCX | 256 | 16 | / | BLE | 9 | 999 | [34]
Insight | 128 | 16 | / | BLE | 20 | 499 | [29]
MUSE 2 | 256 | 12 | / | BT 4.2 | 5 | 294 | [35]
Muse S | 256 | 14 | / | BLE 5.3 | 10 | 519 | [30]
Cyton | 250 | 24 | 110 | BLE | / | 999 | [36]
Ganglion | 200 | 24 | 106 | BLE | / | 499 | [28]
MindWave Mobile 2 | 512 | 23 | / | BT/BLE | 8 | 129.99 | [31]
Table 3. Cost comparison of EEG amplifier and main control modules.
EEG Amplifier: Model/Category | Total Cost (USD) | Main Control: Model/Category | Total Cost (USD)
KS1092 (AFE) | 10.59 | ESP32-S3 Wireless Module | 3.44
ADS131M08 (ADC) | 2.76 | ICM-42688-P (IMU) | 1.46
Resistor | 0.22 | Resistor | 0.15
Capacitor | 0.30 | Capacitor | 0.15
Switch and connector | 0.58 | Switch and connector | 1.27
Others | 4.12 | Others | 1.46
Total | 18.57 | Total | 7.91
The component prices in the table are converted based on the prices and exchange rates as of March 2025.
Table 4. Cost breakdown of a minimum device.
Component | Cost (USD)
EEG amplifier module | 18.57
Main control module | 7.91
Battery | 2.96
Lead wires | 1.54
Total | 30.98
The prices in the table are for the components only and do not include the PCB costs.
Table 5. Comparison of MQTT QoS levels.
QoS Level | Description
QoS 0 | Messages are sent once without guaranteed delivery.
QoS 1 | Messages are sent at least once but may be duplicated.
QoS 2 | Messages are sent exactly once, ensuring reliable delivery.
Table 6. Cross-session experiment on SEED.
Features | Model | Mean ACC | Mean MF1
SSL features | RF | 0.5235 | 0.5175
SSL features | MLP | 0.5432 | 0.5392
SSL features | KNN | 0.5133 | 0.5109
SSL features | SVM | 0.5471 | 0.5447
DE | RF | 0.5211 | 0.5135
DE | MLP | 0.4856 | 0.4733
DE | KNN | 0.5150 | 0.5051
DE | SVM | 0.5372 | 0.5334
PSD | RF | 0.5204 | 0.5130
PSD | MLP | 0.3977 | 0.3953
PSD | KNN | 0.3806 | 0.3798
PSD | SVM | 0.3606 | 0.2313
TD | RF | 0.4919 | 0.4860
TD | MLP | 0.4349 | 0.4270
TD | KNN | 0.4730 | 0.4710
TD | SVM | 0.4220 | 0.3610
Mix | RF | 0.5132 | 0.5079
Mix | MLP | 0.4182 | 0.4145
Mix | KNN | 0.3806 | 0.3798
Mix | SVM | 0.3607 | 0.2313
ERTNet | 0.5225 | 0.5165
EEGNET | 0.5210 | 0.5086
ShallowConvNet | 0.5046 | 0.4980
DeepConvNet | 0.4954 | 0.4629
Tsception | 0.5091 | 0.4970
TD refers to time-domain features, Mix is the combination of DE, PSD, and time-domain features. Bold numbers indicate the best results across different features for the same ML method.
Table 7. Cross-session test on the experiment’s data.
Features | Model | Mean ACC | Mean MF1
SSL features * | RF | 0.6017 | 0.5935
SSL features * | MLP | 0.5955 | 0.5824
SSL features * | KNN | 0.6062 | 0.5928
SSL features * | SVM | 0.5999 | 0.5894
DE | RF | 0.5540 | 0.5309
DE | MLP | 0.3915 | 0.3737
DE | KNN | 0.5073 | 0.4769
DE | SVM | 0.5051 | 0.4733
ERTNet | 0.5284 | 0.4789
EEGNET | 0.5307 | 0.5053
ShallowConvNet | 0.4861 | 0.4712
DeepConvNet | 0.4339 | 0.3571
Tsception | 0.5470 | 0.5233
* The SSL features were generated using a cross-dataset feature extractor trained on SEED’s session 3. Bold numbers indicate the best results across different features for the same ML method.
Table 8. MF1 performance for ablation study.
ComponentsRFMLPKNNSVM
STDMRSTM
0.51760.52300.49980.5358
0.49950.52220.47740.5167
0.50930.51530.49290.5279
0.48820.49640.45950.4805
The ✓ shows which components are retained in the ablation study. ST denotes the signal transformation component, DMR denotes the differential masking ratio, and STM denotes signal transformation branch masking.