Spectrogram Contrast Enhancement Improves EEG Signal-Based Emotional Classification

Malallah, Fahad Layth; Iqbal, Kamran

doi:10.3390/app152312634


by Fahad Layth Malallah ^1,2,* and Kamran Iqbal

¹ School of Engineering and Engineering Technology, University of Arkansas at Little Rock, Little Rock, AR 72204, USA

² Department of Computer Networks and Internet, College of Information Technology, Ninevah University, Mosul 41002, Iraq

^* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(23), 12634; https://doi.org/10.3390/app152312634

Submission received: 3 November 2025 / Revised: 24 November 2025 / Accepted: 25 November 2025 / Published: 28 November 2025

(This article belongs to the Special Issue Brain Functional Connectivity: Prediction, Dynamics, and Modeling—2nd Edition)


Featured Application

Analyzing and regulating human emotions has the potential to improve self-awareness, build stronger relationships, support emotional intelligence, and promote therapies for mental health outcomes.

Abstract

Neuroscience adopts a multidimensional approach to decoding thoughts and actions originating inside the brain; the associated technology is known as the Brain Computer Interface (BCI). However, achieving high accuracy in electroencephalography (EEG) signal-based decoding remains a challenge and an open topic in BCI research. This study aims to enhance the accuracy of signal classification for identifying human emotional states. We utilized the publicly available EEG–Audio–Video (EAV) dataset, which comprises EEG recordings from 42 subjects across five emotional categories. Our key contribution is to apply two-dimensional contrast enhancement to the spectrogram for feature extraction, followed by classification using the EEGNet model. As a result, a 12.5% improvement in classification accuracy over the baseline was achieved. This contribution demonstrates a potential advancement in BCI-based EEG signal processing in neuroscientific research.

Keywords:

adaptive contrast enhancement; brain computer interface; deep learning; emotional classification; electroencephalography; Short-Time Fourier Transform

1. Introduction

An electroencephalogram (EEG) records the electrical activity of the brain using surface electrodes attached to the scalp. EEG has been widely employed in research on Brain Computer Interfaces (BCIs) [1]. Neuroscience integrates several subfields, such as psychology and cognitive science, neuroimaging, and artificial intelligence (AI), to explore the functioning of the entire nervous system. Within neuroscience, BCI deals with communication between an individual’s brain signals and an external device without involving oral communication or motor functions [2]. BCI-EEG signals are characterized by their frequency, amplitude, waveform morphology, and spatial placement across scalp electrodes [3]. Typically, EEG signals span a frequency range of 0.1 Hz to 100 Hz and are categorized into five primary bands (Figure 1): Delta (δ), in the range of 0.5–3.5 Hz, associated with deep sleep and comatose states; Theta (θ), in the range of 3.5–7.5 Hz, linked to creativity, stress, and deep meditation; Alpha (α), in the range of 7.5–12 Hz, predominant during relaxed and calm mental states; Beta (β), in the range of 13–30 Hz, observed during focused attention, visual processing, and motor coordination; and Gamma (γ), with frequencies above 30 Hz, which emerges during complex cognitive functions, motor execution, and multitasking [4,5].

The human brain is functionally divided into four major lobes: the frontal, temporal, parietal, and occipital lobes [6]. Each lobe is associated with a distinct set of structures, which correspond to specific neural functions. As shown by using color codes in Figure 2, the frontal lobe (Fp1, Fp2, AFz, F7, F3, Fz, F4, F8, FC5, FC1, FCz, FC2, and FC6) is primarily responsible for executive functions, including cognitive control, decision-making, and the regulation of emotional responses during task execution. The temporal lobe (T7, TP9, T8, and T10) plays a critical role in auditory processing and the perception of biological motion. The parietal lobe (P7, P3, Pz, P4, P8, PO9, PO10, CP1, CP2, CP5, and CP6) is largely involved in somatosensory processing, spatial representation, and tactile perception. Finally, the occipital lobe (O1, Oz, and O2) is primarily responsible for visual processing, particularly the perception and interpretation of visual stimuli [4].

Despite recent advancements, EEG-based BCI systems continue to face significant challenges, particularly with respect to low classification accuracy [7] and inter-subject variability [8]. A related limitation, intra-subject variability, refers to the phenomenon where EEG signals corresponding to the same cognitive task or thought vary across different recording sessions for the same individual. In other words, the EEG pattern generated by a specific mental activity may not be identical when that activity is repeated at a later date or time [9]. The main contribution of this research is to demonstrate that spectrograms generated from EEG signals via the Short-Time Fourier Transform (STFT) can be enhanced through adaptive contrast enhancement (ACE). This preprocessing technique gives a clearer representation of the underlying neural patterns in the spectrograms, which, in turn, facilitates the extraction of more discriminative features and thus improves the final classification performance in subsequent analytical stages. Accordingly, this idea has been applied and tested on EEG recordings of a person's emotional state to show the improved recognition accuracy for BCI applications.

To demonstrate the BCI potential of EEG signals, we used the publicly available EAV dataset [10] as a benchmark. The EAV dataset contains recordings from 42 subjects across five emotional classes: neutral, anger, happiness, sadness, and calmness [10]. EEG recordings from two arbitrarily chosen channels (channel 0 and channel 5) of subject_1 are depicted in Figure 3. The classification accuracy is improved by refining the spectrogram feature extraction methods previously employed with the EEGNet architecture. The proposed methodology uses the STFT to transform EEG signals into a time–frequency representation. Adaptive contrast enhancement is then applied to obtain a better representation, enabling the EEGNet model to capture both temporal and spectral features more accurately.

This research paper is organized into five sections: Section 2 presents the literature review. Section 3 explains the research methodology design. Section 4 presents and discusses the results of the study. Finally, Section 5 presents the conclusion, followed by the list of references.

2. Literature Review

One of the fundamental components of BCI technology is the ability to recognize emotional states within the brain. This process, often referred to as brain decoding, can be achieved through both invasive [11] and non-invasive [12] methods. Invasive BCIs involve the implantation of microelectronic devices beneath the scalp or directly into neural tissue, as in electrocorticography (ECoG) [13], offering high signal fidelity and accuracy. However, these methods present significant challenges, including the risk of infection, high cost, and surgical complexity [14]. In contrast, non-invasive techniques such as those based on EEG or functional near-infrared spectroscopy (fNIRS) [15] are widely adopted due to their safety, portability, and ease of use. While non-invasive approaches offer a more practical solution for everyday applications, they typically yield lower accuracy than invasive methods due to signal attenuation and noise. Nevertheless, EEG-based systems in particular have become central to the development of second-generation BCI technologies, offering a promising balance between usability and performance [16,17]. The task of decoding emotions from brain activity has attracted considerable attention from researchers, but recognition accuracy still requires substantial improvement. Recognizing and classifying EEG bio-signals is a complex task due to several inherent characteristics: high intra-subject variability, high dimensionality, non-stationarity, and a strong susceptibility to noise [18]. These challenges are further compounded when applying deep learning techniques to EEG-based emotion recognition. In particular, two major obstacles persist: the variability of emotional patterns across individuals (inter-subject variability) and the limited availability of labeled EEG datasets. Several studies on emotion recognition using EEG signals have been made publicly available. Notably, Wang [18] proposed a method based on a pre-trained Vision Transformer for emotion recognition, evaluating its performance across four widely used public datasets: SEED, SEED-IV, DEAP, and FACED. The cross-dataset emotion recognition accuracy reached 93.14% on SEED, 83.18% on SEED-IV, 93.53% on DEAP, and 92.55% on FACED. The approach utilizes a transfer learning framework known as Pre-trained Encoder from Sensitive Data (PESD).

Another notable study, published in Scientific Data (a Nature Portfolio journal) [10], introduced the EAV dataset for emotion recognition in conversational contexts. This multimodal dataset incorporates three modalities (EEG, audio, and video) to model human emotions more comprehensively. Among these, EEG plays a central role. The EEG component consists of 30-channel EEG recordings. A total of 42 participants took part in the study, each engaging in cue-based conversational scenarios designed to elicit five distinct emotional states: neutrality, anger, happiness, sadness, and calmness. Each participant contributed approximately 200 interactions, encompassing both listening and speaking tasks, resulting in a total of 8400 interactions across all participants. For EEG data acquisition, the BrainAmp system (Brain Products, Munich, Germany) was used. EEG signals were collected via Ag/AgCl electrodes placed at standardized scalp locations: Fp1, Fp2, F7, F3, Fz, F4, F8, FC5, FC1, FC2, FC6, T7, C3, Cz, C4, T8, CP5, CP1, CP2, CP6, P7, P3, Pz, P4, P8, PO9, O1, Oz, O2, and PO10. Data were sampled at 500 Hz, with reference electrodes placed at the mastoids and grounding via the AFz electrode. The electrode impedance was maintained below 10 kΩ to ensure data quality. The EEG recordings were initially stored in BrainVision Core Data Format and later imported into MATLAB for further processing and analysis [10]. Emotion recognition performance for each modality was evaluated using deep neural network (DNN) models. The best classification accuracy achieved for EEG-based emotion recognition using this dataset was approximately 60%. We use this result as a benchmark, as we aim to enhance the classification accuracy on the EAV dataset.

Other researchers have classified EEG into three classes (happy, neutral, and sad), as reported in [19]. Using an SVM classifier with time–frequency features, an accuracy of 88.93% was achieved. Various EEG datasets are designed to study emotional responses under varying experimental conditions. The DEAP dataset [20] records 32-channel EEG from 32 participants during music video stimulation, annotated with continuous dimensions (valence, arousal, dominance, and liking), making it suitable for dimensional emotion modeling. The SEED series [21] employs 64-channel EEG and movie clips to induce discrete emotions across 15 subjects (positive, negative, and neutral in SEED, extended with emotions such as happy, sad, and fear in SEED-IV/V), facilitating categorical emotion classification studies. In contrast, the DREAMER dataset [22] uses 14-channel EEG from 23 subjects watching film clips, providing self-reported valence, arousal, and dominance ratings, balancing practicality with robust affective annotations. Another dataset, MPED (Song et al., 2019) [23], extends the emotional descriptors to include arousal, valence, and discrete emotional states (DES) via 62-channel EEG. Table 1 lists other research on EEG-based emotion classification, together with important attributes such as the number of EEG channels, dataset name, number of individuals, and number of output classes; the table also lists the methodology used and the reported accuracy.

3. Materials and Methods

The signal processing and classification of emotional EEG waves involve several processing steps, including band-pass filtering, downsampling, and a reshaping process. Following preprocessing, STFT [30,31] is applied as a feature extraction technique to enhance the signal’s representational characteristics. Unlike a standard Fourier Transform, which assumes signal stationarity, the STFT operates by dividing the continuous EEG signal into brief, sequential time segments using a sliding window function. This allows for the computation of a local Fourier spectrum for each segment, effectively capturing the temporal evolution of spectral power across key frequency bands (delta, theta, alpha, beta, and gamma). The resultant time–frequency representation (TFR) provides a highly informative feature set that preserves crucial information about both the timing and the frequency content of neural oscillations and transient events. Then, the adaptive contrast enhancement (ACE) process is applied. Later, a two-dimensional feature matrix is prepared for subsequent advanced analysis within the EEGNet environment [32], which employs a deep learning training model to classify specific cognitive states. The preprocessing pipeline and the overall model architecture are depicted in the general block diagram as shown in Figure 4.

In this study, EEG data from 42 subjects were retrieved from the public EAV dataset [10]. Each record has two files: an EEG data file that contains the raw EEG signals and a corresponding label file containing the class annotations. The EEG signals are represented as $X \in \mathbb{R}^{s \times c \times n}$, where $s$ is the number of time samples over 20 s of recording at 500 Hz, $c = 30$ is the number of channels, and $n = 200$ is the number of trials. Labels are stored as a one-hot encoded matrix $Y \in \{0, 1\}^{k \times n}$, where $k$ is the number of classes. The input dataset for each subject has dimensions $X_{input} = 10000 \times 30 \times 200$. The data were organized into segments, with each segment representing a trial of EEG recordings across multiple channels. The preprocessing pipeline consisted of several steps to prepare the EEG data for classification. The first step applied bandpass filtering (BPF) to retain frequencies between 3 Hz and 50 Hz, resulting in the filtered signal $X_{f}$, given by the following equation:

$$X_{f}(s, c) = \mathrm{BPF}\big(X(s, c), [3, 50], f_{s}\big) \qquad (1)$$

Next, the filtered EEG signal $X_{f}(s, c, n)$ was downsampled from $f_{s} = 500$ Hz to $f_{s2} = 100$ Hz using polyphase resampling [32] to reduce computational complexity, as given by the following equation:

$$X_{d}(s', c, n) = \mathrm{resample}\left(X_{f}(s, c, n), \frac{f_{s}}{f_{s2}}\right) \qquad (2)$$

where $s'$ indexes the downsampled time samples. The downsampled data have the shape $X_{d}(s', c, n) = 2000 \times 30 \times 200$. The downsampled signals were then segmented into trials and transposed into a tensor $X_{s} \in \mathbb{R}^{n \times s' \times c}$, where $n = 200$ is the number of trials, $s' = 2000$ is the number of downsampled time points per trial, and $c = 30$ is the number of channels, so that $X_{s} = 200 \times 2000 \times 30$.
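The filtering, resampling, and transposition steps of Equations (1) and (2) can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' code: the Butterworth design, the filter order, the zero-phase filtering, and the function name `preprocess` are all assumptions, since the text only specifies the 3–50 Hz pass band and polyphase resampling.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, resample_poly

def preprocess(x, fs=500, fs2=100, band=(3.0, 50.0), order=4):
    """x: raw EEG shaped (samples, channels, trials) -> (trials, samples', channels)."""
    # Eq. (1): band-pass along the time axis (Butterworth, order 4: assumed design).
    sos = butter(order, band, btype="bandpass", fs=fs, output="sos")
    x_f = sosfiltfilt(sos, x, axis=0)
    # Eq. (2): polyphase resampling from fs to fs2 (500 Hz -> 100 Hz, a factor of 5).
    x_d = resample_poly(x_f, up=fs2, down=fs, axis=0)
    # Reorder the axes to (trials, samples', channels).
    return np.transpose(x_d, (2, 0, 1))

rng = np.random.default_rng(0)
x_raw = rng.standard_normal((10000, 30, 4))  # 4 synthetic trials instead of 200
x_s = preprocess(x_raw)
print(x_s.shape)  # (4, 2000, 30)
```

With the full 200 trials per subject, the same call would yield the $200 \times 2000 \times 30$ tensor described above.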

The downsampled signals $X_{s}$ are reshaped and transposed to align with the segmentation requirements of the classification model. Five specific class labels, $k \in \{1, 3, 5, 7, 9\}$, were randomly selected for classification. One-hot encoding was applied to represent the class labels, as in the following equation:

$$Y'_{k, i} = \begin{cases} 1, & \text{if class } k \text{ corresponds to trial } i \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$

In the above, $Y' \in \{0, 1\}^{5 \times n'}$, where $n'$ is the number of selected trials. Accordingly, the input dataset $X_{s} \in \mathbb{R}^{t \times c \times s \times 1}$ has the dimensions $X_{s} = 400 \times 30 \times 500 \times 1$ (400 trials, 30 channels, and 500 time points of the EEG signal), with the corresponding labels forming a $400 \times 5$ matrix. Next, the STFT was applied to the EEG dataset as a key methodological improvement that could enhance the EEG signal recognition accuracy by providing time–frequency localization.
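The label selection and the one-hot encoding of Equation (3) can be illustrated with a short sketch. The helper name `select_and_encode` and the toy label vector are hypothetical; the encoding is returned with one row per trial, matching the $400 \times 5$ layout mentioned above.

```python
import numpy as np

def select_and_encode(labels, selected=(1, 3, 5, 7, 9)):
    """Keep trials whose label is in `selected` and one-hot encode them."""
    labels = np.asarray(labels)
    selected = np.asarray(selected)      # must be sorted for searchsorted
    keep = np.isin(labels, selected)     # boolean mask of the retained trials
    idx = np.searchsorted(selected, labels[keep])  # class position 0..4
    one_hot = np.eye(len(selected), dtype=int)[idx]
    return keep, one_hot

labels = np.array([0, 1, 3, 4, 9, 7, 5, 2])  # toy labels, not dataset values
keep, y = select_and_encode(labels)
print(y.shape)  # (5, 5): five retained trials, five classes
```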

3.1. Short-Time Fourier Transform (STFT) Feature Extraction

Given an input EEG dataset $x \in \mathbb{R}^{t \times c \times s \times 1}$, where $t$ is the number of trials, $c$ is the number of channels, and $s$ is the number of time samples, the STFT was applied to each trial. The STFT parameters are $f_{s}$ (the sampling frequency in Hz), $N_{seg}$ (the segment length), and $N_{overlap}$ (the number of overlapping samples between consecutive segments). The STFT is applied to each one-dimensional EEG signal $x_{t, c}(n)$, where $x_{t, c}(n) = x(t, c, n)$ represents the time-domain signal for trial $t \in \{1, 2, 3, \dots, T\}$, channel $c \in \{1, 2, 3, \dots, C\}$, and sample index $n \in \{0, 1, 2, \dots, S - 1\}$. The STFT is defined by the following equation:

$$Z_{t, c}(k, m) = \sum_{n = 0}^{N_{seg} - 1} x_{t, c}\big(n + m \cdot (N_{seg} - N_{overlap})\big) \cdot w(n) \, e^{- \frac{j 2 \pi k n}{N_{seg}}} \qquad (4)$$

where

$Z_{t, c} (k, m)$ is the complex-valued STFT coefficient for frequency_bin ( $k)$ and time_bin ( $m)$ ;
$w (n)$ is a window function of length $N_{s e g}$ ;
$k \in \{0, 1, \dots, ⌊N_{s e g} / 2⌋\}$ indexes the frequency_bins assuming positive frequencies for real-valued signals;
$m \in \{0, 1, \dots, M - 1\}$ indexes the time bins, where $M = ⌊\frac{S - N_{s e g}}{N_{s e g} - N_{o v e r l a p}}⌋ + 1$ is the number of time_bins.
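Equation (4) can be implemented directly in a few lines. The sketch below assumes a Hann window (the text does not name the window function) and a 100 Hz sampling rate, and the function name `stft_eq4` is hypothetical. It applies no boundary padding, so the number of time bins follows the formula for $M$ given above.

```python
import numpy as np

def stft_eq4(x, n_seg=128, n_overlap=64):
    """Direct form of Eq. (4) for one 1-D signal; returns an array shaped (k, m)."""
    hop = n_seg - n_overlap
    m_bins = (len(x) - n_seg) // hop + 1          # M, without boundary padding
    w = np.hanning(n_seg)                         # window choice is an assumption
    frames = np.stack(
        [x[m * hop : m * hop + n_seg] * w for m in range(m_bins)], axis=1
    )
    # rfft keeps only the non-negative frequency bins k = 0..floor(n_seg / 2).
    return np.fft.rfft(frames, axis=0)

fs = 100                                          # assumed sampling rate in Hz
t = np.arange(500) / fs
x = np.sin(2 * np.pi * 10 * t)                    # synthetic 10 Hz tone
Z = stft_eq4(x)
print(Z.shape)                                    # (65, 6)
peak_bin = int(np.abs(Z[:, 0]).argmax())          # bin closest to 10 Hz
```

For a 500-sample signal this gives 65 frequency bins and 6 time bins; libraries that pad the signal boundaries produce more time bins for the same parameters.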

The magnitude is then obtained as the absolute value of the complex-valued STFT coefficients: $Y_{t, c}(k, m) = |Z_{t, c}(k, m)|$. The resulting STFT magnitude is organized into a four-dimensional array $Y \in \mathbb{R}^{t \times c \times k \times m}$, where $k = \lfloor N_{seg} / 2 \rfloor + 1$ is the number of frequency_bins (including the zero frequency and, for even $N_{seg}$, the Nyquist frequency) and $m$ is the number of time_bins. For each trial $t$ and channel $c$, the STFT magnitude $Y_{t, c}(k, m)$ is computed, and the results are stacked as $Y(t, c, k, m) = Y_{t, c}(k, m)$.

It may be noted that the choice of the window function $w(n)$ and the parameters $N_{seg}$ and $N_{overlap}$ affects the time–frequency resolution trade-off. A larger $N_{seg}$ provides better frequency resolution but poorer time resolution, while a larger $N_{overlap}$ increases the number of time_bins, improving temporal smoothness. The sampling frequency $f_{s}$ determines the frequency resolution, with frequency_bins corresponding to $f_{k} = k \frac{f_{s}}{N_{seg}}$ for $k = 0, 1, 2, \dots, \lfloor \frac{N_{seg}}{2} \rfloor$.
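As a worked illustration, assuming the STFT is applied to the 100 Hz downsampled signal with the values $N_{seg} = 128$ and $N_{overlap} = 64$ reported in Section 3.3, the resolution figures would be:

$$\Delta f = \frac{f_{s}}{N_{seg}} = \frac{100}{128} \approx 0.78\ \text{Hz}, \qquad T_{win} = \frac{N_{seg}}{f_{s}} = 1.28\ \text{s}, \qquad T_{hop} = \frac{N_{seg} - N_{overlap}}{f_{s}} = 0.64\ \text{s}$$

Under these assumptions, the $\lfloor 128 / 2 \rfloor + 1 = 65$ frequency_bins span 0 Hz to the Nyquist frequency of 50 Hz, and consecutive time_bins are 0.64 s apart. The symbols $\Delta f$, $T_{win}$, and $T_{hop}$ are introduced here for illustration only.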

To facilitate compatibility with the convolutional neural network (CNN) of the EEGNet model, the 2D time–frequency representation is transformed into a 1D feature vector. This is achieved through a flattening operation, which reshapes the spectrogram matrix, comprising $k$ frequency_bins and $m$ time_bins, into a contiguous vector of length $STFT_{Cof} = k \times m$. Consequently, the final feature set, denoted as STFT_Cof, contains all magnitude values from the STFT for each channel in each subject, thereby rendering the time–frequency structure into a format suitable for EEGNet. The dataset is now ready, as $X_{R} \in \mathbb{R}^{t \times c \times STFT_{Cof} \times 1}$, to be fed into EEGNet for training and building the reference model. Thus, the final tensor is $X_{R} \in \mathbb{R}^{400 \times 30 \times 585 \times 1}$.
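The magnitude and flattening steps can be sketched with `scipy.signal.stft`. Using this function, its default Hann window, and its default zero-padded boundaries are assumptions; they are chosen here because, for 500 samples with a segment length of 128 and an overlap of 64, they yield 65 frequency bins and 9 time bins, which is consistent with the feature length of 585 stated above. Note also that SciPy scales the coefficients, so magnitudes differ from Equation (4) by a constant factor.

```python
import numpy as np
from scipy.signal import stft

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 30, 500))   # synthetic (trials, channels, samples)

# SciPy defaults (Hann window, zero-padded boundaries) are assumed here.
freqs, times, Z = stft(x, fs=100, nperseg=128, noverlap=64, axis=-1)
mag = np.abs(Z)                          # (trials, channels, k, m)

# Flatten each k x m spectrogram into one feature vector per channel.
features = mag.reshape(mag.shape[0], mag.shape[1], -1, 1)
print(mag.shape, features.shape)         # (4, 30, 65, 9) (4, 30, 585, 1)
```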

3.2. Adaptive Contrast Enhancement (ACE)

Adaptive local spectral contrast enhancement (ACE) is used to improve the interpretability and feature salience of time–frequency representations [33,34]. Several studies in the literature have passed STFT spectrograms directly to the next-stage classifier, deep learning model, or LSTM network [35]. These include a multi-input CNN on STFT features to classify EEG motor imagery [36] and a multi-feature extraction CNN used to improve the STFT features for classifying emotional status [37]. Another type of STFT improvement embeds a processing step before the classifier, for instance, combining common spatial pattern (CSP) with the STFT and then passing the result to a neural network classifier [38]. Feature selection and related techniques, such as dimensionality reduction, LDA, and random forest, have also been used after the STFT as a type of spectrogram improvement, followed by classification with an SVM [39]. In the proposed work, STFT features are not passed directly to the classifier; instead, they are first enhanced by an ACE process and then sent to the classifier.

In this study, ACE is used to mitigate the low contrast of the STFT spectrogram features. This is accomplished by normalizing the spectrogram $STFT_{Cof}$ based on local statistics within a defined neighborhood $N$.

Let $STFT_{Cof} \in \mathbb{R}^{k \times m}$ denote an input spectrogram and $Enhanc\_STFT_{Cof} \in \mathbb{R}^{k \times m}$ denote the enhanced spectrogram. Let a neighborhood $\mathcal{N}(k, m)$ of extent $N_{k} \times N_{m}$ be defined around each point $(k, m)$, where the kernel size is specified by the neighborhood_size hyperparameter. The local statistical estimation includes the local mean $\mu(k, m)$ and the local standard deviation $\sigma(k, m)$, which are estimated within this neighborhood using uniform filters. The local mean $\mu(k, m)$ is computed by the following equation:

$$\mu(k, m) = \frac{1}{N} \sum_{(i, j) \in \mathcal{N}(k, m)} STFT_{Cof}(i, j) \qquad (5)$$

where $N$ is the number of points in the neighborhood $\mathcal{N}(k, m)$ of the point $(k, m)$. The local standard deviation $\sigma(k, m)$, which measures local spectral contrast (texture), is derived from the local mean of squares by the following equation:

$$\sigma(k, m) = \sqrt{\frac{1}{N} \sum_{(i, j) \in \mathcal{N}(k, m)} STFT_{Cof}(i, j)^{2} - \mu(k, m)^{2}} \qquad (6)$$

Each point in the spectrogram is then normalized (Z-score) by subtracting the local mean and dividing by the local standard deviation, as in the following equation:

$$Enhanc_{Norm}\_STFT_{Cof}(k, m) = \frac{STFT_{Cof}(k, m) - \mu(k, m)}{\sigma(k, m) + \epsilon} \qquad (7)$$

A small constant $\epsilon = 10^{-8}$ is added for numerical stability and to prevent division by zero. This operation effectively stretches the local dynamic range, boosting components that stand out from their local background. To map the enhanced data $Enhanc_{Norm}\_STFT_{Cof}(k, m)$ back to a physically meaningful range, a min-max rescaling is applied to normalize within the range [0, 1], as in the following equation:

$$Enhanc_{Scaled}\_STFT_{Cof}(k, m) = \frac{Enhanc_{Norm}\_STFT_{Cof}(k, m) - \mathrm{Min}(Enhanc_{Norm}\_STFT_{Cof})}{\mathrm{Max}(Enhanc_{Norm}\_STFT_{Cof}) - \mathrm{Min}(Enhanc_{Norm}\_STFT_{Cof})} \qquad (8)$$

It is then rescaled to the original amplitude range of the input spectrogram $STFT_{Cof}(k, m)$ to preserve the global amplitude relationships while maintaining the enhanced local contrast, as given by the following equation:

$$Enhanc\_STFT_{Cof}(k, m) = Enhanc_{Scaled}\_STFT_{Cof}(k, m) \cdot \big(\mathrm{Max}(STFT_{Cof}) - \mathrm{Min}(STFT_{Cof})\big) + \mathrm{Min}(STFT_{Cof}) \qquad (9)$$
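Equations (5) to (9) map naturally onto a uniform (box) filter. The following sketch uses `scipy.ndimage.uniform_filter`; the function name `adaptive_contrast_enhance`, the default neighborhood size of 5, SciPy's default reflective boundary handling, and the clipping of small negative variances are assumptions for illustration rather than details given in the text.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def adaptive_contrast_enhance(spec, neighborhood_size=5, eps=1e-8):
    """Local z-score normalization of a 2-D spectrogram, rescaled to its input range."""
    spec = np.asarray(spec, dtype=float)
    mu = uniform_filter(spec, size=neighborhood_size)              # Eq. (5)
    mean_sq = uniform_filter(spec ** 2, size=neighborhood_size)
    # Clip tiny negative values caused by floating-point round-off.
    sigma = np.sqrt(np.maximum(mean_sq - mu ** 2, 0.0))            # Eq. (6)
    norm = (spec - mu) / (sigma + eps)                             # Eq. (7)
    span = norm.max() - norm.min()
    scaled = (norm - norm.min()) / (span if span > 0 else 1.0)     # Eq. (8)
    return scaled * (spec.max() - spec.min()) + spec.min()         # Eq. (9)

rng = np.random.default_rng(0)
spec = np.abs(rng.standard_normal((65, 9)))    # synthetic k x m magnitude spectrogram
enhanced = adaptive_contrast_enhance(spec)
print(enhanced.shape)                          # (65, 9)
```

By construction, the output keeps the minimum and maximum of the input spectrogram, while the values in between are redistributed according to local contrast.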

3.3. Deep Learning EEGNet Classifier

After extracting the features for all subjects and channels, the enhanced data $Enhanc\_STFT_{Cof}(k, m)$ is fed into the CNN, which consists of 14 layers arranged in two blocks. Table 2 details the sequential layer architecture of EEGNet [32], a convolutional neural network (CNN). Each block consists of a sequence of layers including convolution, normalization, ReLU activation, and average pooling, with their corresponding hyperparameter configurations specified in the table.

The model is trained with the Adam optimizer and the categorical cross-entropy loss function. For evaluation, the STFT with ACE dataset is split into two sets: 50% for training and the other 50% set aside for testing. The 50% training dataset is further divided into training (70%) and validation (30%) subsets. The model was trained for 100 epochs with a batch size of 32, and performance was monitored on the validation subset. The choice of the window function $w(n)$ and the parameters $N_{seg} = 128$ and $N_{overlap} = 64$ affects the time–frequency resolution trade-off; these parameters were selected following experimentation to achieve better accuracy. The model’s performance was evaluated on the blind test set using accuracy and the weighted F1-score. A confusion matrix was computed for each subject to assess classification performance across the five classes, and the confusion matrices were summed across all subjects to obtain an aggregate view of the model’s performance across the selected classes. Average accuracy and F1-scores were calculated over all 42 subjects to summarize the model’s effectiveness. In the experiments, the adjustable parameters include the number of classes Nclasses = 5, dropout rate δ = 0.5, filters F1 = 8, depth multiplier D = 2, filters F2 = 16, and normalization rate η = 0.25. The dropout type is either SpatialDropout2D or Dropout.
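The data partition described above can be sketched with index arithmetic alone. Whether the original split was random, stratified, or sequential is not stated, so the random permutation, the seed, and the function name `split_indices` are assumptions.

```python
import numpy as np

def split_indices(n_trials, seed=0):
    """50% test, then 70/30 train/validation on the remaining half."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(n_trials)       # random order is an assumption
    n_test = n_trials // 2
    test, rest = perm[:n_test], perm[n_test:]
    n_val = int(round(0.3 * len(rest)))
    val, train = rest[:n_val], rest[n_val:]
    return train, val, test

train, val, test = split_indices(400)
print(len(train), len(val), len(test))     # 140 60 200
```

For the 400 trials per subject, this leaves 140 trials for fitting, 60 for validation monitoring, and 200 for the blind test.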

4. Results and Discussion

STFT-based spectrograms for selected EEG signals from subjects 2 and 3 are shown in Figure 5. Figure 5a depicts EEG signals of subject 2 across three channels: 0, 5, and 10. The spectrogram for each channel is plotted with time bins on the x-axis and frequency bins on the y-axis, with spectral power indicated by color intensity. The figure shows that signals with higher amplitude have higher spectral power in the frequency domain. This representation captures how the spectral content of the signals changes over time, which provides the classifier with a more informative input. Similarly, Figure 5b shows the EEG signals for subject 3, using the same three channels 0, 5, and 10, in both the time and frequency domains. The method thus reveals the energy distribution across frequency bands over time, offering an informative abstraction of temporal dynamics that enhances feature discriminability for subsequent classification.

A limitation of the STFT technique is its fixed time–frequency resolution: because the STFT uses a fixed window size, $N_{seg}$, the time–frequency resolution is constant across all frequencies. This can be suboptimal for signals with both low-frequency components (requiring longer windows for better frequency resolution) and high-frequency components (requiring shorter windows for better time resolution). Thus, the STFT requires adjusting the trade-off between time and frequency resolution. Moreover, spectral leakage occurs because of the use of a finite window $w(n)$; the choice of window function mitigates but does not eliminate this issue. The STFT is also sensitive to parameter selection, because its performance depends heavily on the choice of $N_{seg}$ and $N_{overlap}$ for the window function. Suboptimal parameters can lead to poor resolution or artifacts, requiring domain expertise or empirical tuning. Finally, the tensor $Y \in \mathbb{R}^{T \times C \times K \times M}$ can be high-dimensional, especially for large $T$, $C$, or $STFT_{Cof}$, which increases memory and computational requirements for downstream processing.

4.1. Signal Enhancement Using STFT with ACE

Figure 6 visualizes the effect of the adaptive spectral contrast enhancement by comparing the original STFT, $STFT_{Cof}$, with the enhanced STFT, $Enhanc\_STFT_{Cof}$, of subject 2 and subject 3 using randomly selected channels 5 and 10 from each subject. The ACE has three main effects:

First, contrast is increased, which means that the high- and low-frequency bin regions in the spectrogram are clearer in the enhanced versions. This makes it easier for the machine learning model to distinguish between different frequency components and their changes over time.
Second, spectral features are sharper or more defined in the enhanced plots.
Third, background noise is reduced, because the relevant signal components are amplified while the less important background noise is relatively suppressed.

It should be noted that assessing the difference between Figure 6a, the original STFT, and Figure 6b, the enhanced STFT, is subjective. ACE also has limitations that need to be considered. For example, the normalization process can create artifacts around strong, isolated features, where the local standard deviation is very low just outside the feature’s boundary. In addition, the ACE method assumes local stationarity within the window, so it may perform poorly in the presence of highly non-stationary interfering signals in the EEG.

The choice of the neighborhood_size hyperparameter $N$ is essential for improving the EEG signal representations, and it involves a trade-off. With a small window, for instance $N = 3$, the process captures a very fine-grained, high-frequency texture. This is useful for enhancing narrow spectral lines but may also amplify high-frequency noise. Conversely, with a large window, for instance $N = 15$, the process captures broader spectral trends. This is effective for enhancing larger structures, such as a formant’s spectral envelope, but may overlook finer details. Overall, the enhancement aims to make the important spectral characteristics of the EEG data more visually prominent, which can be beneficial for subsequent analysis or input into a machine learning model like EEGNet.

4.2. Recognition Accuracy

The classification accuracy is assessed by exploiting the confusion matrix (CM) [40]. In the case of two classes, CM has four parameters: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN), as illustrated in Table 3.

In this paper, the proposed classifier is evaluated by its accuracy and F1 score, which are defined by the following equations:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN} \qquad (10)$$

$$\mathrm{F1\ score} = \frac{TP}{TP + 0.5\,(FP + FN)} \qquad (11)$$

As this research involves five emotional categories, the standard confusion matrix is extended from two to five classes. The idea is to consider one class as true and the remaining four as false. For example, if the second class is considered true, the first, third, fourth, and fifth are deemed false. Figure 7 presents a comparative analysis of classification accuracy for 42 subjects, contrasting the performance of original EEG data (black), preprocessed EEG dataset using Short-Time Fourier Transform (STFT) feature extraction (orange), and the STFT with ACE processing (green). The x-axis represents the subject index (1–42), while the y-axis denotes accuracy values ranging from 0 to 1.
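The one-versus-rest reading of the confusion matrix can be sketched as follows. The row/column convention (rows are true classes, columns are predictions), the use of the trace-over-total form for overall multi-class accuracy, and the support-weighted averaging of the per-class F1 values are assumptions; the toy matrix is illustrative and does not come from the study.

```python
import numpy as np

def one_vs_rest_metrics(cm):
    """cm[i, j] = number of trials of true class i predicted as class j (assumed layout)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp          # predicted as the class but belonging elsewhere
    fn = cm.sum(axis=1) - tp          # belonging to the class but predicted elsewhere
    accuracy = tp.sum() / cm.sum()    # overall multi-class accuracy
    f1 = tp / (tp + 0.5 * (fp + fn))  # Eq. (11), one value per class
    support = cm.sum(axis=1)
    weighted_f1 = float((f1 * support).sum() / support.sum())
    return float(accuracy), f1, weighted_f1

cm = [[8, 1, 1],
      [2, 7, 1],
      [0, 2, 8]]                      # toy 3-class example
accuracy, f1, weighted_f1 = one_vs_rest_metrics(cm)
```

The same function applies unchanged to a $5 \times 5$ matrix, and summing per-subject matrices before calling it gives the aggregate view described in Section 3.3. A class with no true or predicted trials would need separate handling to avoid division by zero.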

The plot reveals variability in the accuracy across subjects, with some showing significant improvement after STFT with ACE dataset processing, while others exhibit minimal change or slight degradation. Figure 8 presents a comparative analysis of classification performance based on the F1-score for 42 subjects, contrasting the original EEG data (black), the preprocessed EEG dataset using STFT feature extraction (orange), and the STFT with ACE processing (green). The x-axis represents the subject index (1–42), while the y-axis denotes F1-score values ranging from 0 to 1. Most subjects show an improvement after STFT with ACE dataset processing compared with the original EEG dataset, while others exhibit minimal change or slight degradation.

As shown in Figure 7 and Figure 8, adding the ACE stage resulted in a marginal decrease in recognition rate for a small subset of subjects compared to the STFT features alone. However, this minor reduction is offset by the improvement observed in the average over all subjects. The aggregate performance, measured by the mean accuracy and F1 score across all 42 subjects, therefore supports the ACE method, which raises the mean accuracy by 1.66 percentage points (from 70.84% to 72.50%).

4.3. Comparison of Classification Accuracies

Table 4 contains the EEG recognition accuracy results for five emotional classes: neutrality, anger, happiness, sadness, and calmness. Eight experiments were run for training and testing, and their averages were calculated. Table 4 shows that the proposed method (STFT + Adaptive Contrast Enhancement + EEGNet) achieves the highest accuracy in every run, with an average of 72.50%, outperforming both STFT + EEGNet (average of 70.84%) and the baseline EEGNet alone (average of 59.94%, which is similar to the accuracy published with the original dataset [10]).

The substantial increase from the baseline to the STFT-based models confirms the importance of time–frequency features for EEG analysis, while the additional, smaller gain from ACE preprocessing indicates that it acts as an effective refinement that enhances feature discriminability in the spectrograms. The low variance across all eight experimental runs supports the robustness of these improvements and the conclusion that each processing stage, including the ACE approach, contributes to more accurate and reliable EEG classification.

The eight experimental runs were conducted using identical hyperparameters and configurations to ensure methodological consistency. This replication was performed to verify the initial results and to assess the robustness of the default parameter set against run-to-run variability, and thus to confirm that the outcomes were not attributable to random chance. The average result of 72.5% can be considered highly stable, since the standard deviation (SD) of ±0.42 is very small relative to this average, indicating consistent and reproducible performance across all eight experiments.
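The averages and the spread discussed above can be recomputed from the per-run values in Table 4. The following is a minimal sketch using only the Python standard library; the helper name `summarize` is illustrative, and because the SD convention (sample or population) is not stated in the text, both are printed. Small differences in the last digit relative to Table 4 are expected, since the tabulated runs are already rounded to two decimals.

```python
from statistics import mean, pstdev, stdev

# Per-run accuracies (%) copied from Table 4.
runs = {
    "EEGNet": [60.12, 59.58, 60.22, 59.79, 60.10, 60.17, 59.20, 60.30],
    "STFT + EEGNet": [70.70, 71.55, 70.48, 71.31, 70.92, 69.92, 70.20, 71.65],
    "STFT + ACE + EEGNet": [72.75, 72.42, 72.51, 73.11, 72.41, 72.56, 71.56, 72.63],
}


def summarize(values):
    """Return (mean, sample SD, population SD) for one column of Table 4."""
    return mean(values), stdev(values), pstdev(values)


summary = {name: summarize(values) for name, values in runs.items()}

for name, (avg, sd_sample, sd_pop) in summary.items():
    print(f"{name:22s} mean={avg:.2f}  SD(sample)={sd_sample:.2f}  SD(pop)={sd_pop:.2f}")

# Differences between mean accuracies, expressed in percentage points.
gain_over_stft = summary["STFT + ACE + EEGNet"][0] - summary["STFT + EEGNet"][0]
gain_over_baseline = summary["STFT + ACE + EEGNet"][0] - summary["EEGNet"][0]
print(f"ACE gain over STFT:     {gain_over_stft:.2f} percentage points")
print(f"ACE gain over baseline: {gain_over_baseline:.2f} percentage points")
```

Both gains are differences between mean accuracies, so they are percentage points rather than relative improvements.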

Table 5 presents the confusion matrix for the STFT with ACE preprocessing. Each subject contributes 200 test instances (40 instances per class), so the 42 subjects together provide 1680 test samples per class, as listed in the table. The model demonstrates strong overall performance for a complex five-class problem. The values along the main diagonal (1195, 1195, 1287, 1294, and 1130) sum to 6101 correct predictions, while off-diagonal elements represent misclassifications. The overall accuracy is 6101/8400 = 72.6%, i.e., the number of correct predictions divided by the total number of samples (5 classes × 1680 instances).
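The arithmetic above, together with the one-vs-rest extension of Equations (10) and (11), can be checked directly on the entries of Table 5. The sketch below is illustrative only: the function name `one_vs_rest` is hypothetical, and the orientation of the matrix (columns as true classes, rows as predicted classes) is an assumption inferred from the fact that every column sums to 1680.

```python
# Confusion matrix copied from Table 5.
# ASSUMPTION: columns index the true class and rows the predicted class,
# inferred from each column summing to 1680 (42 subjects x 40 instances).
cm = [
    [1195, 100, 69, 48, 185],
    [135, 1195, 159, 88, 150],
    [69, 170, 1287, 135, 67],
    [76, 55, 118, 1294, 148],
    [205, 160, 47, 115, 1130],
]
n_classes = len(cm)
total = sum(sum(row) for row in cm)
correct = sum(cm[k][k] for k in range(n_classes))
overall_accuracy = correct / total
column_sums = [sum(cm[r][k] for r in range(n_classes)) for k in range(n_classes)]


def one_vs_rest(matrix, k):
    """Treat class k as positive and all other classes as negative."""
    tp = matrix[k][k]
    fp = sum(matrix[k]) - tp                        # predicted k, true class differs
    fn = sum(row[k] for row in matrix) - tp         # true class k, predicted otherwise
    tn = sum(sum(row) for row in matrix) - tp - fp - fn
    return tp, fp, fn, tn


print(f"overall accuracy = {correct}/{total} = {overall_accuracy:.1%}")
class_accuracy = []
class_f1 = []
for k in range(n_classes):
    tp, fp, fn, tn = one_vs_rest(cm, k)
    class_accuracy.append(tp / (tp + fn))            # share of true class k recognized
    class_f1.append(tp / (tp + 0.5 * (fp + fn)))     # Equation (11)
    binary_accuracy = (tp + tn) / (tp + fp + tn + fn)  # Equation (10), one-vs-rest
    print(f"class {k + 1}: class acc.={class_accuracy[-1]:.1%}  "
          f"F1={class_f1[-1]:.3f}  one-vs-rest acc.={binary_accuracy:.1%}")
```

The per-class values computed this way agree with the "Class Acc." row of Table 5 to within rounding.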

Figure 9 depicts the evaluation across all 42 subjects, featuring three distinct confusion matrices. The first CM corresponds to testing on the 200 instances of each subject’s original dataset, establishing a baseline performance. The second matrix presents the testing results using STFT preprocessing on a held-out set of 200 instances, representing a blind testing scenario. The third matrix illustrates the results achieved by integrating STFT with ACE preprocessing. A comparative analysis reveals that the principal diagonal of the third confusion matrix (proposed method) contains the highest counts of correct predictions, demonstrating a superior classification accuracy and an enhanced emotional recognition rate attributable to the combined STFT-ACE preprocessing pipeline.

The graph in Figure 10 illustrates the relationship between classification accuracy and the neighborhood size, a hyperparameter of the ACE preprocessing. As shown, the accuracy is notably sensitive to this parameter, initially increasing to an apparent optimum between neighborhood sizes of 10 and 11, where a peak performance of approximately 73% is attained. Beyond this peak, the accuracy decreases consistently as the neighborhood size increases to 14, suggesting that larger neighborhoods introduce non-discriminative information that degrades the model’s performance. We note that STFT-ACE fusion is a novel idea for improving the EEG signal representation, even though it currently offers a small improvement that could be increased through optimization and fine-tuning of hyperparameters.
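To make the role of the neighborhood size concrete, the following minimal sketch applies a generic local contrast enhancement to a toy two-dimensional array. It is not the ACE formulation used in this paper: the function names, the fixed `gain`, and the border clipping are all assumptions chosen for illustration. Each value is replaced by its local mean plus an amplified deviation from that mean, where the local mean is taken over a window controlled by `n`.

```python
def local_mean(img, n):
    """Mean of each pixel's neighborhood.

    The window reaches n // 2 pixels in every direction (so it is
    2 * (n // 2) + 1 pixels wide) and is clipped at the image borders.
    """
    rows, cols = len(img), len(img[0])
    half = n // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                img[i][j]
                for i in range(max(0, r - half), min(rows, r + half + 1))
                for j in range(max(0, c - half), min(cols, c + half + 1))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def enhance_contrast(img, n=3, gain=2.0):
    """Amplify each pixel's deviation from its local mean by `gain`.

    HYPOTHETICAL stand-in for an adaptive contrast enhancement step;
    the paper's own ACE formula is not reproduced here.
    """
    means = local_mean(img, n)
    return [
        [m + gain * (v - m) for v, m in zip(row, mean_row)]
        for row, mean_row in zip(img, means)
    ]


# Toy 3 x 3 "spectrogram" with a single bright bin in the centre.
toy = [[0.0, 0.0, 0.0], [0.0, 9.0, 0.0], [0.0, 0.0, 0.0]]
enhanced = enhance_contrast(toy, n=3, gain=2.0)
print(enhanced[1][1])  # local mean is 1.0, so 1.0 + 2.0 * (9.0 - 1.0) = 17.0
```

With `n=1` the window contains only the pixel itself and the output equals the input, whereas larger values of `n` compare each bin against an increasingly broad spectral trend. A sweep such as the one in Figure 10 would repeat the full training and testing pipeline for each candidate `n`.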

4.4. SHAP Channel Importance Analysis

To highlight which EEG channels contribute most to the model, Shapley Additive Explanations (SHAP) analysis is applied. The mean absolute SHAP value of each EEG channel provides a quantitative assessment of that channel’s contribution to the predictive output of the trained model [41]. In Figure 11, channels with taller bars are more influential in determining the model’s predictions than channels with shorter bars. A high mean absolute SHAP value for a channel suggests that the information contained in the signal from that channel contributes significantly to the model’s ability to discriminate between the different classes. Conversely, channels with low mean absolute SHAP values contribute less to the model’s performance.

Based on the mean absolute SHAP values presented in Figure 11 and the sorted list of channels by importance, channel 15, channel 29, channel 10, channel 1, and channel 2 exhibit higher mean absolute SHAP values than the other channels. This means that the information acquired by these specific channels is more critical for the model to classify the emotional EEG data. In contrast, channels with shorter bars, such as channel 9, channel 21, and channel 26, have lower mean absolute SHAP values, indicating that they are less influential in the model’s decision-making process for this task and dataset. Therefore, the dataset could potentially be reduced by removing such channels without a substantial impact on the model accuracy. Analyzing the spatial distribution of the important channels on an EEG cap could help neuroscientists to identify which brain regions (channels) are likely to be relevant for emotion processing. In addition, SHAP can be used in the future to improve the accuracy by applying channel reduction based on mean absolute SHAP values and then combining it in a pipeline with the STFT and ACE for better accuracy and more efficient computation.
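The ranking described above reduces to a simple aggregation once per-sample SHAP values are available. The following sketch uses a small table of hypothetical SHAP values (three samples, four channels) rather than values from the trained model; the function names and the choice of keeping two channels are illustrative assumptions, and channels are indexed from 0 in the code.

```python
# HYPOTHETICAL SHAP values: one row per test sample, one column per channel.
# Real values would come from a SHAP explainer applied to the trained model.
shap_values = [
    [0.30, -0.02, 0.10, -0.25],
    [-0.40, 0.01, -0.05, 0.20],
    [0.20, -0.03, 0.15, -0.30],
]


def mean_abs_by_channel(values):
    """Mean absolute SHAP value of each channel across all samples."""
    n_samples = len(values)
    n_channels = len(values[0])
    return [
        sum(abs(row[ch]) for row in values) / n_samples
        for ch in range(n_channels)
    ]


def rank_channels(importance):
    """Channel indices sorted from most to least influential."""
    return sorted(range(len(importance)), key=lambda ch: importance[ch], reverse=True)


importance = mean_abs_by_channel(shap_values)
ranking = rank_channels(importance)
top_k = ranking[:2]  # channels that a reduced montage would keep

for ch in ranking:
    print(f"channel {ch}: mean |SHAP| = {importance[ch]:.3f}")
print("channels kept:", top_k)
```

Taking absolute values before averaging matters because positive and negative contributions from the same channel would otherwise cancel, which would understate that channel's influence.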

We acknowledge that the scope of the current study is limited to the EAV dataset. Specifically, EAV provides a larger data pool (42 subjects) and a greater diversity of emotional classes (five emotional categories) compared to other benchmark datasets, enabling a robust assessment of the model generalizability and accuracy. We further note that the present study is limited to the proof of concept; comprehensive benchmarking is left as a key objective of our future work. In the context of accuracy, it is noted that datasets with a smaller number of classes may produce a higher classification accuracy than EAV. Other factors include how clean and well-defined the classification task is for each dataset. For example, in the literature, real-world datasets with two or three classes and only 15 subjects give above 90% accuracy.

5. Conclusions

Despite recent improvements in BCI, a major challenge in EEG-based BCI systems remains the limited recognition rate and the difficulty of representing an individual’s neural patterns well enough for accurate and reliable classification. This paper proposed an efficient methodology aimed at enhancing classification accuracy through a robust pipeline for EEG signal processing and classification. The proposed approach leverages the STFT of non-stationary EEG signals with ACE to amplify high-frequency details by attenuating low-frequency components. The proposed method achieved an average accuracy of 72.50%. This represents an improvement of 12.55 percentage points over the baseline accuracy of approximately 60% (59.94% in Table 4), which is in line with the accuracy reported in the original EAV study [10]. The proposed methodology has the potential to enhance EEG recognition across multiple fields. Future work will focus on optimizing key hyperparameters of the STFT, namely the sampling frequency, window length, and overlap percentage. This study employed standard default values of Fs = 100 Hz, Seg_length = 128 samples, and a 50% Seg_overlap; these settings were not optimized, and their optimization represents a promising way to enhance analytical performance. In addition, optimization may be used to tune the ACE neighborhood size, and SHAP-based channel interpretation may be explored for feature selection to facilitate the deployment of this methodology on lightweight embedded systems. Further, eliminating uninformative EEG channels could have a positive effect on performance.
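For orientation, the STFT defaults quoted above fix the time and frequency resolution of the spectrogram. The short calculation below is a sketch under stated assumptions: the values of `fs`, `seg_length`, and `overlap` are those given in the text, whereas the segment length `n_samples` is a hypothetical value chosen only to illustrate the frame count, and no boundary padding is assumed.

```python
fs = 100            # sampling frequency in Hz (from the text)
seg_length = 128    # STFT window length in samples (from the text)
overlap = 0.5       # 50% overlap between consecutive windows (from the text)
n_samples = 500     # HYPOTHETICAL segment length, for illustration only

hop = int(seg_length * (1 - overlap))       # samples between window starts
freq_bins = seg_length // 2 + 1             # one-sided spectrum of a real signal
freq_resolution = fs / seg_length           # spacing between frequency bins, Hz
nyquist = fs / 2                            # highest representable frequency, Hz
window_seconds = seg_length / fs            # duration covered by one window, s
frames_no_padding = 1 + (n_samples - seg_length) // hop

print(f"hop = {hop} samples, window = {window_seconds:.2f} s")
print(f"{freq_bins} bins from 0 to {nyquist:.0f} Hz, spacing {freq_resolution:.5f} Hz")
print(f"{frames_no_padding} frames for a {n_samples}-sample segment without padding")
```

A longer window would narrow the frequency spacing at the cost of coarser time resolution, which is the trade-off that tuning these hyperparameters would explore.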

Author Contributions

Conceptualization, F.L.M. and K.I.; Methodology, F.L.M. and K.I.; Validation, F.L.M.; Formal analysis, F.L.M. and K.I.; Investigation, F.L.M. and K.I.; Resources, F.L.M.; Writing—original draft, F.L.M.; Writing—review and editing, F.L.M. and K.I.; Visualization, K.I.; Supervision, K.I.; Project administration, K.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available upon request from the first author. The original EEG data [10] is available at https://github.com/nubcico/EAV (accessed on 30 September 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Bastos, N.S.; Marques, B.P.; Adamatti, D.F.; Billa, C.Z. Analyzing EEG signals using decision trees: A study of modulation of amplitude. Comput. Intell. Neurosci. 2020, 2020, 3598416.
2. Machado, S.; Cunha, M.; Velasques, B.; Minc, D.; Bastos, V.H.; Budde, H.; Cagy, M.; Piedade, R.; Ribeiro, P. Interface cérebro-computador: Novas perspectivas para a reabilitação. Rev. Neurociências 2009, 17, 329–335.
3. Jayaraman, V.; Sivalingam, S.; Munian, S. Analysis of Real Time EEG Signals. Master’s Thesis, Linnaeus University, Växjö, Sweden, 2014.
4. Ghosh, R.; Deb, N.; Sengupta, K.; Phukan, A.; Choudhury, N.; Kashyap, S.; Phadikar, S.; Saha, R.; Das, P.; Sinha, N.; et al. SAM 40: Dataset of 40 subject EEG recordings to monitor the induced-stress while performing stroop color-word test, arithmetic task, and mirror image recognition task. Data Brief 2021, 1, 14562090.
5. Pandey, P.; Tripathi, R.; Miyapuram, K.P. Classifying oscillatory brain activity associated with indian rasa s using network metrics. Brain Inform. 2022, 9, 15.
6. Jacobson, S.; Pugsley, S.; Marcus, E.M. Cerebral cortex functional localization. In Neuroanatomy for the Neuroscientist; Springer: Berlin/Heidelberg, Germany, 2025; pp. 347–379.
7. Kumar, S.; Sharma, A. Advances in non-invasive EEG-based brain-computer interfaces: Signal acquisition, processing, emerging approaches, and applications. Signal Process. Strateg. 2025, 281–310.
8. Thakur, S.; Thakur, S.; Rana, A.; Kumar, P.; Kumar, K.; Chen, C.M. Exploring the Evolution of Feature Extraction Methods in Brain–Computer Interfaces (BCIs): A Systematic Review of Research Progress and Future Trends. Wiley Interdiscip. Rev. Data Min. Knowl. Discov. 2025, 15, e70040.
9. Barrows, P.; Van Gordon, W.; Gilbert, P. Current trends and challenges in EEG research on meditation and mindfulness. Discov. Psychol. 2024, 4, 148.
10. Lee, M.-H.; Shomanov, A.; Begim, B.; Kabidenova, Z.; Nyssanbay, A.; Yazici, A.; Lee, S.-W. EAV: EEG-Audio-Video Dataset for Emotion Recognition in Conversational Contexts. Sci. Data 2024, 11, 1026.
11. Merk, T.; Köhler, R.M.; Brotons, T.M.; Vossberg, S.R.; Peterson, V.; Lyra, L.F.; Vanhoecke, J.; Chikermane, M.; Binns, T.S.; Li, N. Invasive neurophysiology and whole brain connectomics for neural decoding in patients with brain implants. Nat. Biomed. Eng. 2025, 1–18.
12. d’Ascoli, S.; Bel, C.; Rapin, J.; Banville, H.; Benchetrit, Y.; Pallier, C.; King, J.-R. Decoding individual words from non-invasive brain recordings across 723 participants. arXiv 2024, arXiv:2412.17829.
13. Leuthardt, E.C.; Moran, D.W.; Mullen, T.R. Defining surgical terminology and risk for brain computer interface technologies. Front. Neurosci. 2021, 15, 599549.
14. Arico, P.; Borghini, G.; Di Flumeri, G.; Sciaraffa, N.; Colosimo, A.; Babiloni, F. Passive BCI in operational environments: Insights, recent advances, and future trends. IEEE Trans. Biomed. Eng. 2017, 64, 1431–1436.
15. Nia, A.F.; Tang, V.; Talou, G.D.M.; Billinghurst, M. Decoding emotions through personalized multi-modal fNIRS-EEG Systems: Exploring deterministic fusion techniques. Biomed. Signal Process. Control. 2025, 105, 107632.
16. Zhang, M.; Qian, B.; Gao, J.; Zhao, S.; Cui, Y.; Luo, Z.; Shi, K.; Yin, E. Recent Advances in Portable Dry Electrode EEG: Architecture and Applications in Brain-Computer Interfaces. Sensors 2025, 25, 5215.
17. Tam, N.D. A second-generation non-invasive brain–computer interface (BCI) design for wheelchair control. Acad. Eng. 2025, 2.
18. Wang, F.; Tian, Y.-C.; Zhou, X. Cross-dataset EEG emotion recognition based on pre-trained Vision Transformer considering emotional sensitivity diversity. Expert. Syst. Appl. 2025, 279, 127348.
19. Sun, H.; Wang, H.; Wang, R.; Gao, Y. Emotion recognition based on EEG source signals and dynamic brain function network. Methods 2025, 415, 110358.
20. Koelstra, S.; Muhl, C.; Soleymani, M.; Lee, J.-S.; Yazdani, A.; Ebrahimi, T.; Pun, T.; Nijholt, A.; Patras, I. Deap: A database for emotion analysis; using physiological signals. IEEE Trans. Affect. Comput. 2011, 3, 18–31.
21. Zheng, W.-L.; Zhu, J.-Y.; Lu, B.-L. Identifying stable patterns over time for emotion recognition from EEG. IEEE Trans. Affect. Comput. 2017, 10, 417–429.
22. Katsigiannis, S.; Ramzan, N. DREAMER: A database for emotion recognition through EEG and ECG signals from wireless low-cost off-the-shelf devices. IEEE J. Biomed. Health Inform. 2017, 22, 98–107.
23. Song, T.; Zheng, W.; Lu, C.; Zong, Y.; Zhang, X.; Cui, Z. MPED: A multi-modal physiological emotion database for discrete emotion recognition. IEEE Access 2019, 7, 12177–12191.
24. Song, T.; Zheng, W.; Song, P.; Cui, Z. EEG emotion recognition using dynamical graph convolutional neural networks. IEEE Trans. Affect. Comput. 2018, 11, 532–541.
25. Goshvarpour, A. Cognitive-inspired spectral spatiotemporal analysis for emotion recognition utilizing electroencephalography signals. Cogn. Comput. 2025, 17, 2.
26. Chakravarthi, B.; Ng, S.-C.; Ezilarasan, M.; Leung, M.-F. EEG-based emotion recognition using hybrid CNN and LSTM classification. Front. Comput. Neurosci. 2022, 16, 1019776.
27. Karthiga, M.; Suganya, E.; Sountharrajan, S.; Balusamy, B.; Selvarajan, S. Eeg based smart emotion recognition using meta heuristic optimization and hybrid deep learning techniques. Sci. Rep. 2024, 14, 30251.
28. Alidoost, Y.; Asl, B.M. Entropy-Based Emotion Recognition Using EEG Signals. IEEE Access 2025, 13, 51242–51254.
29. Islam, M.; Lee, T. An automated extraction of spectral-temporal and spatial-temporal features of EEG for emotion detection. Brain Inform. 2025, 12, 19.
30. Mateo, C.; Talavera, J.A. Bridging the gap between the short-time Fourier transform (STFT), wavelets, the constant-Q transform and multi-resolution STFT. Signal Image Video Process. 2020, 14, 1535–1543.
31. Harris, F. Polyphase Interpolators with Reversed Order of Up-Sampling and Down-Sampling. In Proceedings of the 2021 55th Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 31 October–3 November 2021; Springer: Berlin/Heidelberg, Germany, 2025; pp. 918–924.
32. Pandey, V.; Panwar, N.; Kumbhar, A.; Roy, P.P.; Iwamura, M. Enhanced Cross-Task EEG Classification: Domain Adaptation with EEGNet. In Proceedings of the International Conference on Pattern Recognition, 2025; Springer: Berlin/Heidelberg, Germany, 2025; pp. 354–369.
33. Zhang, W.; Zhuang, P.; Sun, H.-H.; Li, G.; Kwong, S.; Li, C. Underwater image enhancement via minimal color loss and locally adaptive contrast enhancement. IEEE Trans. Image Process. 2022, 31, 3997–4010.
34. Safonov, I.V.; Kurilin, I.V.; Rychagov, M.N.; Tolstaya, E.V. Adaptive global and local contrast enhancement. In Adaptive Image Processing Algorithms for Printing; Springer: Berlin/Heidelberg, Germany, 2017; pp. 1–39.
35. Xia, L.; Wang, R.; Ye, H.; Jiang, B.; Li, G.; Ma, C.; Gao, Z. Hybrid LSTM–transformer model for the prediction of epileptic seizure using scalp EEG. IEEE Sens. J. 2024, 24, 21123–21131.
36. Ali, O.; Saif-ur-Rehman, M.; Dyck, S.; Glasmachers, T.; Iossifidis, I.; Klaes, C. Enhancing the decoding accuracy of EEG signals by the introduction of anchored-STFT and adversarial data augmentation method. Sci. Rep. 2022, 12, 4245.
37. Babu, N.R.; Viswanathan, V. MFENet: A Multi-Feature Extraction Network for Enhanced Emotion Detection Using EEG and STFT. IEEE Access 2025, 13, 133338–133350.
38. Liyanagedera, N.D.; Bareham, C.A.; Kempton, H.; Guesgen, H.W. Novel machine learning-driven comparative analysis of CSP, STFT, and CSP-STFT fusion for EEG data classification across multiple meditation and non-meditation sessions in BCI pipeline. Brain Inform. 2025, 12, 4.
39. Chen, X.; Bao, X.; Jitian, K.; Li, R.; Zhu, L.; Kong, W. Hybrid EEG Feature Learning Method for Cross-Session Human Mental Attention State Classification. Brain Sci. 2025, 15, 805.
40. Ting, K.M. Confusion Matrix. In Encyclopedia of Machine Learning; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2011.
41. Lundberg, S.; Lee, S.-I. A Unified Approach to Interpreting Model Predictions. arXiv 2017, arXiv:1705.07874.

Figure 1. EEG signals displaying brain rhythms characterized by distinct frequency bands: Delta, Theta, Alpha, Beta, and Gamma.

Figure 2. Placement of the electrodes for a 32-channel EEG across all four cerebral lobes—frontal, temporal, parietal, and occipital.

Figure 3. Randomly picked EEG signal samples from the EAV dataset: (a) Subject_1: channel_0, (b) Subject_1: channel_5.

Figure 4. General block diagram for the proposed methodology for emotional EEG signal classification.

Figure 5. EEG signal waveform in the time domain and following STFT: Channels 0, 5, and 10, (a) Subject 2, (b) Subject 3.

Figure 6. Spectrograms of EEG channels 5 and 10 for subject 2 and subject 3: (a) before ACE pre-processing, (b) after ACE preprocessing.

Figure 7. Accuracy comparison between the original EEG (black), STFT feature extraction (orange), and the proposed STFT with ACE (green) across 42 subjects in the EAV dataset, demonstrating superior performance of the proposed method.

Figure 8. F1-score comparison between the original EEG (black), STFT feature extraction (orange), and the proposed STFT with ACE (green) across 42 subjects in the EAV dataset, demonstrating superior performance of the proposed method.

Figure 9. The confusion matrix of the three experiments: (a) original EEG dataset, (b) STFT for EEG dataset, (c) enhanced STFT by ACE for the EEG dataset.

Figure 10. The relationship between the number of ACE neighbors (N) and the model’s accuracy.

Figure 11. EEG channel importance based on mean absolute SHAP values.

Table 1. Recent research related to EEG emotional classifications.

EEG Channels	Dataset/No. Individuals/No. Emotion Classes	Method	Accuracy (%)	Year/Ref.
32	SEED-IV/15/4	Graph Convolutional Network (GCN) based on functional connectivity	~90.4	2018/[24]
32	DEAP/32/4 (sad, fear, neutral, happy)	Wavelet transform singular value to optimize computational	89.55	2025/[25]
32	SEED-V/20/4 (Joy, Fear, Sad, Peace)	Hybrid CNN-LSTM	~98	2022/[26]
32	DEAP and SEED	Meta-heuristic optimization and hybrid deep learning techniques	SEED: 99; DEAP: 100	2024/[27]
32	DEAP/32/4	multiscale fluctuation-based dispersion entropy (MFDE) and refined composite MFDE (RCMFDE)	96.67	2025/[28]
32	DEAP/32/5 valence (Low, High), arousal, dominance, and liking levels	2D CNN-BiLSTM	94.71	2025/[29]

Table 2. List of CNN layers in the EEGNet [32].

	Layer Name/Hyperparameters		Layer Name/Hyperparameters
1	Block 1: Input Tensor: $X_{R} \in R^{400 \times 30 \times 585 \times 1}$	8	Block 2: SeparableConv2D: F2 filters, kernel (1, 8), padding = same
2	Conv2D: F1 filters, kernel (3, 3), padding = same	9	BatchNorm, ELU
3	BatchNorm	10	AveragePooling2D: (1, 2)
4	DepthwiseConv2D: kernel (C, 1), depth D, max-norm 1	11	Dropout: δ
5	BatchNorm, ELU	12	Output: Flatten
6	AveragePooling2D: (1, 2)	13	Dense: Nclasses, max-norm η
7	Dropout: δ	14	Softmax Output: Class probabilities

Table 3. Confusion matrix for classification over two class labels.

Input Class	Output Classes
Input Class	Class 1	Class 2
Class 1	(TP)	(FN)
Class 2	(FP)	(TN)

Table 4. Combined accuracy of the classification experiment with 42 subjects: before STFT, after STFT, and combined ACE with STFT.

Experiment No.	Accuracy % EEGNet	Accuracy % STFT + EEGNet	Accuracy % STFT + ACE + EEGNet
1	60.12	70.70	72.75
2	59.58	71.55	72.42
3	60.22	70.48	72.51
4	59.79	71.31	73.11
5	60.10	70.92	72.41
6	60.17	69.92	72.56
7	59.20	70.20	71.56
8	60.30	71.65	72.63
Average/SD	59.94/±0.38	70.84/±0.59	72.50/±0.42

Table 5. The confusion matrix for classification with STFT + ACE for the EEG dataset.

Classes	1	2	3	4	5
1	1195	100	69	48	185
2	135	1195	159	88	150
3	69	170	1287	135	67
4	76	55	118	1294	148
5	205	160	47	115	1130
42 subjects	1680	1680	1680	1680	1680
1 subject	40	40	40	40	40
Class Acc.	71.1%	71.1%	76.6%	77.0%	67.2%

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

