1. Introduction
Real-time physiological monitoring using wearable sensors is increasingly recognized as a valuable tool in psychiatric practice, offering continuous, objective data to support mental health assessment and intervention [1,2,3]. Electrodermal Activity (EDA), in particular, is a measure of changes in the electrical properties of the skin influenced by the activity of sweat glands, which are regulated by the autonomic nervous system (ANS). Specifically, EDA reflects the sympathetic branch of the ANS, which is associated with physiological arousal and responses to emotional or stressful stimuli [4]. When an individual experiences heightened emotional arousal, the sympathetic nervous system stimulates eccrine sweat glands, particularly on the palms and soles, leading to measurable fluctuations in skin conductance [5,6]. These fluctuations are captured as changes in electrical conductance on the skin’s surface, which is typically measured in microsiemens (µS).
EDA is widely used in psychophysiological research due to its sensitivity to emotional, cognitive, and stress-related processes [7,8]. It has been applied extensively in fields such as psychology, neuroscience, and human–computer interaction for applications ranging from stress and anxiety detection to emotion recognition and mental health monitoring [9]. More recently, measuring EDA has become increasingly accessible with the advent of wearable sensors, allowing continuous and unobtrusive monitoring of physiological arousal in everyday environments. Research-grade devices such as the Feel Monitor [10], Empatica E4 [11], and Shimmer3 GSR+ [12] have been widely adopted in clinical and research settings [13,14,15,16,17,18], while consumer-grade devices like the Fitbit and Garmin wearables are increasingly being used in mental health monitoring [19,20]. These platforms facilitate the detection of stress patterns and physiological correlates of emotional states, supporting early identification of psychiatric symptoms and timely, personalized interventions [21].
EDA signals are typically divided into two primary components, phasic and tonic, each representing different aspects of ANS arousal. The phasic component reflects short-term fluctuations that are closely tied to discrete external or internal stimuli, such as sudden loud noises, visual stimuli, or cognitive tasks that elicit an arousal response. These rapid changes, also known as skin conductance responses (SCR), occur in reaction to specific events and are characterized by sharp, transient increases in skin conductance followed by a gradual return to baseline [4]. Phasic activity provides insights into an individual’s immediate, momentary responses to stimuli, making it useful for studying arousal dynamics, event-related responses, and stimulus-triggered physiological changes [4].
The tonic component, on the other hand, reflects overall arousal levels over longer periods. It is often referred to as the skin conductance level (SCL) and is indicative of an individual’s general physiological state or emotional baseline [4]. The tonic component changes gradually and is influenced by factors such as stress, attention, or sustained emotional states rather than immediate stimuli. SCL provides essential context for interpreting an individual’s broader arousal state [4].
Given the distinct physiological information each component provides, it becomes evident that decomposing EDA signals into their phasic and tonic components is critical for a comprehensive understanding of ANS functioning. This methodological distinction has direct clinical implications: accurately isolating phasic activity can help identify acute stress responses in real time, while monitoring tonic levels may reveal chronic stress or emotional dysregulation patterns. Such insights can inform timely, personalized interventions in patients with mood or anxiety disorders, making EDA decomposition not just a technical requirement but a clinically actionable step in advancing precision mental health care. By isolating these elements, researchers can better distinguish between momentary, stimulus-induced reactions and the broader, baseline arousal states that evolve over time.
However, a significant challenge in evaluating EDA decomposition techniques is the lack of a universally accepted ground truth for distinguishing phasic and tonic EDA components, which directly impacts the clinical validation and broader adoption of EDA-based monitoring tools in psychiatric care. Without clear ground truth, it becomes difficult to objectively assess the performance of different decomposition algorithms, which, in turn, complicates their integration into clinical workflows that require reliability and interpretability. This methodological gap limits the utility of EDA signals in informing treatment personalization and delays the implementation of real-time stress interventions based on these physiological markers. In lab conditions, researchers trigger ANS responses, which allow them to directly localize SCRs in the signal. Naturally, the duration of such studies is limited. This means that such experiments do not offer the opportunity to observe meaningful changes in the slow-changing SCL. Continuous monitoring, on the other hand, allows for the study of SCL but lacks explicit information about where SCRs have occurred.
Bach and Friston [22] also discuss this lack of a definitive reference and how it hinders the objective assessment of the accuracy and reliability of different methods. They assert, however, that phasic/tonic decomposition methods are more accurate than directly analyzing the EDA signal, as was standard practice previously.
In this article, we investigate whether deep learning—specifically, a Transformer architecture in a non-autoregressive setting—can be used to address the challenge of EDA decomposition without requiring detailed supervision. This approach enables deployment in wearable devices for continuous, real-world monitoring of autonomic arousal. By facilitating real-time tracking, the method has the potential to enhance clinical decision support systems in psychiatry and bridge the gap between algorithmic innovation and clinical adoption in mental health care. In the remainder of the article, we first present the relevant background (Section 2), including both methods specifically designed for decomposing EDA signals and data-driven methods, which identify events that stand out from the overall signal without referring to EDA-specific knowledge. We then discuss the challenge of comparing methods in the absence of ground truth and propose the comparison metrics that will be used in our experiments (Section 3). Next, we present our Transformer-based architecture and how we use data collected in the wild to compare it against prior methods (Section 4). We then present and discuss the results and the insights gained regarding the relative strengths and limitations of EDA-specific methods, generic data-driven methods, and our deep-learning method (Section 5). Finally, we conclude and present future work and potential applications (Section 6).
2. Background
In this section, we first present literature specifically targeting EDA signal decomposition. These methods are characterized by the fact that they incorporate extensive domain knowledge on the morphology of SCRs and the EDA signal in general. We then proceed to present generic methods for time series detrending, Transformer neural networks, and, in general, data-driven methods that do not rely on domain-specific priors.
2.1. Knowledge-Driven Methods
Benedek and Kaernbach [23,24] propose the Ledalab phasic/tonic separation method, which directly reflects the physiology that generates the EDA signal. They build on earlier works that identify the Bateman biexponential function as accurately reflecting the characteristic steep onset and slow recovery of phasic impulses; in fact, the Bateman function is both physiologically motivated and consistent with the data [25]. In theory, one could simply fit the Bateman function to calculate the phasic component and then subtract that from the full EDA signal to calculate the tonic component. However, as noted by Benedek and Kaernbach [24], this is not straightforward due to (a) the variability of the two parameters of the Bateman function both across subjects and for a given subject and (b) the fact that new responses may be superimposed on the recovery slope of previous ones.
What they do instead is to use the Bateman function as a way to detect the segment of the EDA signal that is the result of imposing the response impulse on the underlying skin conductance level, assuming that the impulse peak has been detected. In other words, Ledalab does not use the Bateman function as a direct detector of SCRs, but as a way to ‘guess’ a Bateman-shaped area around each local maximum in the signal. To account for variation in the parameters of the Bateman function, an optimization task is performed for each maximum to estimate the parameters that maximize metrics inspired by known properties of the phasic and tonic components, which also include elements that have been empirically found to work. This is followed by deconvolution of the EDA signal over the Bateman function. This gives a driver signal: a signal such that its convolution with the Bateman function would give the original EDA signal; or, in other words, a signal where occurrences of the shape of the Bateman function stand out more clearly. Naturally, because of the optimization step, practically all maxima fit the Bateman shape and appear in the driver signal. To separate actual impulses from noise artifacts, Ledalab applies an empirical threshold to identify ‘significant peaks’. Once significant peaks are detected, the region of the EDA signal that corresponds to each peak (as per the Bateman function) is zeroed out. The core idea is that phasic impulses are contained in time, so if we remove the regions with phasic activity, what remains is a purely tonic signal. The tonic component of the phasic-activity regions can then be accurately interpolated between the purely tonic regions. Once the complete tonic component is estimated, simple subtraction gives the phasic component.
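To make the driver-signal idea concrete, the sketch below builds the Bateman biexponential impulse response and simulates a purely phasic signal as the convolution of a sparse driver with that kernel; Ledalab works in the opposite direction, deconvolving the measured EDA signal over the kernel to obtain the driver. The time constants, sampling rate, and SCR positions are illustrative assumptions, not values from Ledalab.

```python
# Illustrative sketch (not the Ledalab implementation).
import numpy as np

FS = 8.0                              # sampling rate in Hz (assumed)
t = np.arange(0, 10, 1 / FS)          # 10 s support for the impulse response

def bateman(t, tau1=2.0, tau2=0.75):
    """Biexponential impulse response: fast onset (tau2), slow recovery (tau1)."""
    b = np.exp(-t / tau1) - np.exp(-t / tau2)
    return b / b.max()                # normalize the peak to 1

kernel = bateman(t)

# Forward model: a sparse driver (SCR onsets and strengths) convolved with the kernel.
driver = np.zeros(8 * 60)                      # one minute of driver signal at 8 Hz
driver[[80, 200, 330]] = [0.4, 0.9, 0.6]       # three hypothetical SCRs
phasic = np.convolve(driver, kernel)[: driver.size]

# Ledalab inverts this model: deconvolving the (optimized) Bateman kernel out of
# the EDA signal yields a driver in which SCR occurrences stand out as sharp peaks.
```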
Greco et al. [26] follow a more direct approach in their cvxEDA algorithm. Instead of Ledalab’s multi-step method, cvxEDA directly optimizes the complete formulation of EDA as the sum of a Bateman-shaped phasic component, a relatively smooth (cubic spline) tonic component, and some residual (sampling or modeling error). The optimizer finds the best parameters for all components, again guided by priors about the shape of phasic impulses.
Hernando-Gallego et al. [27] follow a similar approach, also jointly optimizing the parameters of the phasic and the tonic model. When compared to cvxEDA, one difference is that their sparsEDA algorithm uses a dictionary of impulse models to select from, instead of allowing the optimizer to set the parameters of the Bateman function. Also, the method is more explicitly formulated to take into consideration the sampling rate and other theoretical and practical considerations regarding the sampled and discretized signal.
In a more recent variation of this optimization-based line of research, Wickramasuriya et al. [28] use the Zdunek and Cichocki [29] method for finding sparse overlapping signals. However, this method needs to be parameterized with the onset and recovery times, which are known to vary between subjects and even between experiments with the same subject. In order to estimate these parameters, Wickramasuriya et al. [28] first apply cvxEDA on a short portion of the data.
Jain et al. [30] emphasize the need for a more complex tonic model than the cubic spline interpolation used in the works presented above. Jain et al. [30] note that in the case of wearable devices, the baseline can have abrupt shifts due to movement, of a magnitude that cannot be captured by the usual noise models. In order to represent such discontinuities, they model EDA as being composed of a step baseline function, a tonic component, and the phasic impulses. Exploiting reasonable assumptions regarding the maximum frequency of discontinuities and the almost-sparse nature of the phasic component, they are able to formulate a solvable optimization task.
2.2. Data-Driven Methods
The methods presented above are specifically designed for decomposing the EDA signal, and thus encode prior knowledge about the shape of SCRs. However, the manifestation of SCRs in actual data is not uniform and clear to the extent that we can consider recognizing them a solved problem. The amplitude of the actual responses varies, and the relative amplitude with respect to noise even more so; furthermore, overlapping SCRs might obscure the slow-recovery pattern predicted by the Bateman function.
Given the above, it makes sense to also experiment with data-driven methods that identify responses as events that stand out from the overall signal without referring to prior observations about their shape. Framing EDA decomposition in this way also invites the wide variety of statistical detrending methods that have been proposed to separate fast-moving events from any slow-moving trend in the background. Detrending typically works by fitting a linear (or, in any case, relatively simple) model for the overall trend and then subtracting that from the signal, as illustrated in the sketch below. Fitting a linear model to a time series is one of the most common tasks in non-parametric statistics, and many textbook methods (such as least squares and the Theil–Sen estimator) are part of daily practice in econometrics, physics, meteorology, and practically every natural science.
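A minimal detrending sketch along these lines uses the Theil–Sen estimator from SciPy (the same routine used later in Section 4); the frame length and the synthetic input are placeholders.

```python
import numpy as np
from scipy.stats import theilslopes

FS = 8                                                    # sampling rate in Hz (assumed)
rng = np.random.default_rng(0)
frame = 2.0 + 0.05 * rng.standard_normal(3 * 60 * FS)     # placeholder 3 min EDA frame

t = np.arange(frame.size) / FS
slope, intercept, _, _ = theilslopes(frame, t)            # robust linear trend fit

tonic = intercept + slope * t                             # straight-line tonic estimate
phasic = frame - tonic                                    # residual, treated as the fast component
```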
Similarly to the statistical methods discussed above, machine learning methods also have the advantage that they can exploit data to find regularities for which we lack a clear definition. Artificial Neural Networks (ANN) [31] are known to be universal function approximators with minimal requirements for prior knowledge, but specific network architectures are best suited for different kinds of tasks. In our case, we are looking for a network that is able to capture the positional dependencies that model the steep onset and slow recovery of phasic impulses while simultaneously abstracting away the specific position of each impulse in the EDA signal. The convolutional neural network (CNN) architecture ([31], Chapter 9) has been specifically designed to capture such local patterns and, even closer to our case, recurrent neural networks (RNN) ([31], Chapter 10) and long short-term memory (LSTM) [32] target capturing local events in time series.
Unlike non-parametric statistics, however, machine learning relies on a training dataset where relatively short (although not necessarily exact) spans of signal are annotated as positive and negative examples of the pattern of interest. Manually creating such a dataset at the scale required for training an ANN is a daunting task, and such methods are typically used in tasks where datasets of positive and negative examples are readily available or can be extracted at scale. In our case, it is preferable to focus on unsupervised methods that can be trained to separate the fast-moving and the slow-moving component of the EDA signal without explicit examples.
The use of CNNs designed to reconstruct and/or forecast time series in an autoregressive, and therefore unsupervised, manner can be especially enhanced by using dilated convolutions, which are able to capture long-range dependencies effectively along the temporal dimension ([33], closing paragraph). Similar successful approaches include WaveNet, a CNN-type model with dilated convolutions for fixed-length time series [34], and the combination of multiple CNNs, each designed to capture either closeness patterns or long- and short-range dependencies in time series [35].
Naturally, in order to be able to distinguish the two components, signal data points that are far removed from each other must be taken into account together. A transformer model with self-attention provides attention connections with a very wide receptive field where temporal correlations are less locally focused and more widely connected, allowing it to uncover long-range dependencies [36].
A natural issue that arises with using attention on long time series is computational efficiency. In the traditional approach of scaled dot-product attention, the computational complexity and memory requirements scale quadratically with the size of the input, making it difficult to handle inputs beyond 512 tokens [37]. At the sampling rate of 8 Hz, which is generally accepted as sufficient for capturing EDA features, this restricts our input to roughly 1 min (512 samples at 8 Hz correspond to 64 s). Various modified attention mechanisms have been proposed to tackle this problem; among the most notable and successfully used are the Informer model, which uses ProbSparse self-attention (a probabilistic approach to sparsifying the self-attention matrix) [38], the Reformer, which approximates the full self-attention mechanism with a more computationally manageable locality-sensitive hashing (LSH) variant [39], and the Autoformer model [40], which uses an auto-correlation mechanism instead of traditional attention.
We chose the Autoformer model for two main reasons. First, the model is very efficient in uncovering long-range dependencies because, instead of computing pairwise attention weights between all elements in the sequence, the autocorrelation attention mechanism leverages the statistical properties of time series data to identify and focus on the most predictive parts of the sequence [40]. Second, the Autoformer is a detrending model, as its blocks sequentially remove and refine a trend component from the time series input. Given the above, we expect the Autoformer to be an appropriate basis for developing a trend-sensitive decomposition mechanism that separates the slow-moving tonic component from the fast-moving phasic component.
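As a rough illustration of the first point, the sketch below computes series autocorrelation with a single FFT (via the Wiener–Khinchin relation) and keeps the top-k lags; the Autoformer aggregates time-delayed copies of the series at such lags instead of computing pairwise attention weights. The series and the value of k are illustrative, and the sketch omits the head-wise and batched details of the actual model.

```python
import torch

def topk_autocorrelation_lags(x: torch.Tensor, k: int = 3):
    """x: 1-D series. Returns the k lags with the strongest autocorrelation."""
    spectrum = torch.fft.rfft(x - x.mean())
    # Wiener-Khinchin: autocorrelation is the inverse FFT of the power spectrum.
    autocorr = torch.fft.irfft(spectrum * torch.conj(spectrum), n=x.numel())
    lags = torch.topk(autocorr[1:], k).indices + 1        # skip the trivial lag 0
    return lags, autocorr

# A toy periodic series (period 64 samples, i.e. 8 s at 8 Hz) plus noise.
n = 3 * 60 * 8
series = torch.sin(torch.arange(n) * 2 * torch.pi / 64) + 0.1 * torch.randn(n)
lags, _ = topk_autocorrelation_lags(series)               # expected to cluster near multiples of 64
```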
3. Comparing EDA Decomposition Methods
In order to compare algorithms, we would normally go back to their purpose and define metrics that reflect how well each algorithm serves this purpose. This is less straightforward in multi-step methods, where we do not have ground truth annotations for the outputs of the intermediate steps but only for the system as a whole. Nevertheless, the core intuition behind decomposition is that once the slow-moving SCL component has been removed, SCR peaks will have similar amplitude so that peak-detection methods can be directly parameterized.
We should note here that these characteristics of the SCL and SCR components are encoded in the optimization step of Ledalab and cvxEDA; therefore, the result is expected to be optimal for a perfectly noiseless signal. However, in data collected in the wild, either algorithm can fail under different circumstances: Ledalab optimizes each peak separately, which might result in fitting peaks that are noise artifacts; cvxEDA optimizes globally, which might result in counterintuitive SCL as it tries to fit noise artifacts. We will revisit this point in Section 5.2, but what is important is that the most informed methods in the domain often give different results without any clear, automated way to decide which is more accurate.
Without access to golden-truth decomposition, we need to establish measures of similarity between decompositions and compare the behavior of the different methods. Such a measure of similarity could be the RMS difference between SCL curves, or whether the same SCR peaks are identified under the same peak detection parameters. However, we observe that the ultimate goal is not the identification of SCR peaks per se, but statistics known to be characteristic of ANS responses: SCR frequency, mean amplitude, and the power spectral density of different frequency bands. In other words, two alternative decompositions of the same signal might appear different when comparing whether the same SCRs have been identified, but give the same or very similar feature values.
Kreibig ([41], Table 2) lists the EDA features proposed in the relevant literature. There seems to be a consensus on the direction of change of SCL, the frequency and amplitude of SCR, and the nonspecific skin conductance response rate. Other features that appear prominently, although not universally, include the Ohmic Perturbation Duration index (OPD) and SYDER skin potential forms. We refer the reader to Silva et al. [42] for precise definitions. Among those, only the direction of change of SCL, the frequency of SCR, and the amplitude of SCR can be directly extracted from the decomposed signal without sophisticated further analysis. The rest of the features require identifying the onset and end timepoints of the SCR, which is either a by-product of some (but not all) of the methods under consideration or a substantial secondary analysis that can (for overlapping SCR) produce errors independently of the decomposition quality.
Based on the above, in the experiments presented below we use the direction of change of SCL, the frequency of SCR, and the amplitude of SCR as similarity metrics, because they depend on the decomposition in a direct and straightforward way.
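For concreteness, a sketch of how these three features can be read off a decomposed 3 min frame is given below; the peak-detection settings (here SciPy's find_peaks with a prominence threshold) and the threshold value are illustrative choices, not those of any specific method discussed above.

```python
import numpy as np
from scipy.signal import find_peaks

def frame_features(tonic: np.ndarray, phasic: np.ndarray, fs: float = 8.0,
                   min_prominence: float = 0.01):
    """Direction of SCL change, SCR frequency (per minute), and mean SCR amplitude."""
    duration_min = tonic.size / fs / 60.0
    scl_direction = np.sign(tonic[-1] - tonic[0])              # rising, flat, or falling SCL
    peaks, props = find_peaks(phasic, prominence=min_prominence)
    scr_frequency = peaks.size / duration_min                  # SCRs per minute
    scr_amplitude = float(np.mean(props["prominences"])) if peaks.size else 0.0
    return scl_direction, scr_frequency, scr_amplitude
```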
4. Experimental Setup
4.1. Datasets and Methods
Our dataset is extracted from data acquired in previous studies by Tsirmpas et al. [18,43], comprising longitudinal EDA data collected over a 16-week period from 40 individuals. This dataset is considerably larger-scale than most datasets previously used to study EDA, both with respect to the number of different users and with respect to the duration of the EDA signal per user. It is also acquired from wearable devices in the wild, offering a very representative sample of what EDA looks like in personalized health and well-being use cases, as opposed to controlled-environment clinical studies and research.
For our empirical comparisons, we have used the following implementations of the relevant methods presented in Section 2:
Ledalab, as implemented by Pypsy 0.1.5 (available from https://github.com/brennon/Pypsy, accessed on 20 May 2024), with the change that the input is resampled to 24 Hz instead of 25 Hz. This has no impact on the method, but makes it easier to downsample back to the 8 Hz rate of our data.
cvxEDA and sparsEDA, as implemented in NeuroKit2 0.2.10 (available from https://github.com/neuropsychology/NeuroKit, accessed on 20 May 2024, and from PyPI). Note that this version or later must be used, as earlier versions do not include our sparsEDA patch.
Theil detrending, as implemented in SciPy 1.11 (package scipy.stats.theilslopes from https://github.com/scipy, accessed on 20 May 2024, and from PyPI).
All methods were applied on non-overlapping 3 min frames to produce the tonic component, noting that Theil detrending produces a slope/intercept pair, which is interpreted as a straight line with those parameters. The phasic component was then calculated as the residual after subtracting the tonic component from the EDA signal.
For all methods, the EDA signal was first cleaned with a 4th-order Butterworth low-pass filter with a 3 Hz cutoff frequency. The number of peaks was calculated using the NeuroKit2 peak detection algorithm with default parameters. Since we are interested in observing differences between the peaks detected after different decomposition methods, the specific algorithm and parameters used are not expected to affect our results, as long as the same algorithm and parameters are used for all methods. For this reason, we have not used the peaks reported by decomposition methods such as Ledalab, where detecting SCRs is an integral part of the algorithm.
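The sketch below outlines this pipeline end to end for one of the methods (cvxEDA via NeuroKit2): Butterworth low-pass cleaning, decomposition on non-overlapping 3 min frames, and peak counting with NeuroKit2's default detector. The synthetic input signal is a placeholder, and the exact NeuroKit2 options shown are our assumption of a reasonable call, not a verbatim excerpt of our experimental code.

```python
import neurokit2 as nk
import numpy as np
from scipy.signal import butter, filtfilt

FS = 8                                              # Hz
FRAME = 3 * 60 * FS                                 # non-overlapping 3 min frames

rng = np.random.default_rng(0)
eda = 2.0 + 0.05 * rng.standard_normal(10 * FRAME)  # placeholder 30 min recording

# 4th-order Butterworth low-pass filter with a 3 Hz cutoff, applied forward-backward.
b, a = butter(N=4, Wn=3, btype="lowpass", fs=FS)
clean = filtfilt(b, a, eda)

peaks_per_frame = []
for start in range(0, clean.size - FRAME + 1, FRAME):
    frame = clean[start:start + FRAME]
    components = nk.eda_phasic(frame, sampling_rate=FS, method="cvxeda")
    phasic = components["EDA_Phasic"].to_numpy()
    signals, _ = nk.eda_peaks(phasic, sampling_rate=FS)     # default peak detection
    peaks_per_frame.append(int(signals["SCR_Peaks"].sum()))
```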
4.2. Feel Transformer
We base our Feel Transformer on the Autoformer [40], but we made several changes to the architecture, although we largely re-used the implementations of the individual layers from the original Autoformer implementation (available from https://github.com/thuml/Autoformer, accessed on 20 May 2024). We used one encoder and two decoder blocks, with the aim of adding more processing power and depth to our network. At the same time, using the input sequence again before the decoder blocks acts as a residual connection. We use the network in a non-autoregressive fashion. That is, there is no left-to-right processing as is usual with the decoder part of a transformer; rather, the sequence is processed bidirectionally in both encoder and decoder blocks. An overview of the architecture is shown in Figure 1.
The motivation behind this modification comes from the lack of explicit supervision regarding how EDA is decomposed into its SCL and SCR components, combined with the lack of supervision regarding when to expect ANS responses in our in-the-wild data acquisition. Since ANS responses follow (unknown to us) stimuli and cannot possibly be predicted from the past EDA signal, our objective cannot be prediction (forecasting) as in the original Autoformer paper, but the non-autoregressive reconstruction of the input into two components. The loss function is the mean squared error between the original time series and the sum of the two components, i.e., the reconstruction of the full signal.
To bias the decomposition toward a slow-moving SCL and a fast-moving SCR component, we simplified the network that produces the SCL part of the output to an average-pooling layer, forcing the deeper part of the network to reconstruct the SCR component. The intuition behind this is that the SCR network will be challenged to converge when it is presented with data where similar morphologies appear at different amplitude levels, but it can converge by learning morphologies that are at roughly the same amplitude level, since the loss is calculated after adding the SCL and the SCR components.
The network of the Feel Transformer has the following characteristics:
There is one encoder layer and two decoder layers. The dimension of the embedding is 32. The output dimension of the first convolution of the feed-forward block is 16.
There are 4 attention heads.
The SCL component is modeled by 1-D average pooling.
We tested three different sizes for the average pooling kernel: (a) 8 × 60 + 1 (we will refer to this as Feel Transformer 1 in the results presented below); (b) 8 × 30 + 1 (we will refer to this as Feel Transformer 2); (c) 8 × 1 + 1 (Feel Transformer 3). Since our data is at 8 Hz, this effectively means that the granularity of the SCL is 60 s, 30 s, or 1 s, respectively.
Furthermore, a ReLU activation function was used for the non-linear blocks. Regarding training, we used a learning rate of 0.001 with an Adam optimizer for the mean squared error (MSE) loss, with a weight decay of 0.1. The final hyperparameter selection resulted from a search over several hyperparameter configurations in order to minimize the MSE loss. More specifically, we formed all the possible configurations resulting from a choice of (i) 8, 16, 32, or 64 dimensions for the embedding, (ii) 4, 8, or 16 dimensions for the output of the first convolution, and (iii) 2, 4, 8, or 16 attention heads. For each configuration, we obtained the resulting MSE loss on a validation subset of the training data and observed for which configuration this loss was minimized.
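The toy sketch below captures the two design choices described above, an average-pooling SCL head and an unsupervised MSE reconstruction loss over the sum of the two components, using the stated optimizer settings (Adam, learning rate 0.001, weight decay 0.1). It is a deliberately simplified stand-in built from a generic Transformer encoder, not the actual Feel Transformer (which uses Autoformer encoder/decoder blocks); the batch and input are placeholders.

```python
import torch
import torch.nn as nn

class FeelSketch(nn.Module):
    """Toy stand-in: average-pooling SCL head + a small encoder for the SCR residual."""
    def __init__(self, kernel_size=8 * 60 + 1, d_model=32, n_heads=4, d_ff=16):
        super().__init__()
        self.pad = (kernel_size - 1) // 2
        self.avg = nn.AvgPool1d(kernel_size, stride=1)
        self.embed = nn.Linear(1, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=d_ff,
                                           activation="relu", batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.project = nn.Linear(d_model, 1)

    def forward(self, x):                                  # x: (batch, length, 1)
        # SCL: moving average with edge replication so the length is preserved.
        padded = torch.cat([x[:, :1].repeat(1, self.pad, 1), x,
                            x[:, -1:].repeat(1, self.pad, 1)], dim=1)
        scl = self.avg(padded.permute(0, 2, 1)).permute(0, 2, 1)
        # SCR: reconstructed from the detrended signal by the attention blocks.
        scr = self.project(self.encoder(self.embed(x - scl)))
        return scl, scr

model = FeelSketch()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=0.1)
mse = nn.MSELoss()

frame = torch.randn(4, 3 * 60 * 8, 1)                     # placeholder batch of 3 min frames
scl, scr = model(frame)
loss = mse(scl + scr, frame)                               # unsupervised reconstruction loss
loss.backward()
optimizer.step()
```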
Table 1 recaps all the methods included in the experiments and their key characteristics, and Figure 2 gives a graphical outline of the experiment’s workflow.
6. Conclusions and Further Work
We presented the Feel Transformer, a new method for decomposing EDA signals into a slow-moving tonic component and a sparse, fast-moving phasic component. Our method is based on the Autoformer, a variation of the Transformer NN architecture, but it departs from the standard Autoformer in that it explicitly encodes the knowledge that one of the two components must be a relatively simple and slow-moving curve that fits the general trend of the signal. In this respect, the Feel Transformer is more similar to domain-agnostic detrending, as both estimate the tonic component from the overall signal (including the SCRs), under the (reasonable) assumption that SCRs are sparse and not expected to drastically affect the estimation of the tonic component. On the other hand, the three EDA-specific methods use a three-pass approach, where they first use prior knowledge of the overall shape of SCR to identify possible SCRs, then estimate the tonic component from the remaining non-SCR signal only, and then extract the phasic component and apply peak detection to identify actual SCRs.
The empirical results validate this similarity: when extracting the features generally considered most useful in EDA analysis (SCL direction, SCR density, SCR amplitude), the Feel Transformer agrees with detrending on all three features, while the Feel Transformer and detrending agree with Ledalab and cvxEDA on SCR density, and with sparsEDA on SCR amplitude (Section 5.1). The visual inspection of characteristic frames reveals that the cubic interpolation used by Ledalab and cvxEDA creates non-existent peaks when the EDA signal has abrupt changes. Such changes are rare in laboratory conditions, where datasets are usually acquired, but are a lot more common in personal healthcare and well-being applications that rely on the EDA signal acquired in the wild from wearable devices. It should also be noted that laboratory studies validate methods by observing signal segments that are known to contain ANS responses, because such responses have been explicitly elicited by stimuli. However, such approaches cannot validate the tonic component, since their limited duration cannot contain meaningful SCL changes. Based on the above, we observe that the agreement of the Feel Transformer with domain-specific methods on the SCR features is a positive indication for the validity of our method; the minor disagreement between the Feel Transformer and domain-specific methods on the SCL should not be taken into account, since the performance of domain-specific methods on the SCL features has not been validated.
As future work, we plan to investigate the robustness of the Feel Transformer to noisy signals. Wearable biosensors used in real-world settings often introduce artifacts due to movement, sweat, and intermittent signal loss. In contrast to traditional statistical deconvolution methods, which treat all deviations symmetrically and lack contextual understanding, Transformer-based models can learn to identify and ignore non-informative segments by capturing global signal structure. This ability is particularly critical in differentiating emotional stress from physical exertion in ambulatory monitoring scenarios, such as in cases of Post-traumatic stress disorder (PTSD) or Attention-deficit/hyperactivity disorder (ADHD). Demonstrating that the Feel Transformer can reliably extract meaningful features from such noisy, in-the-wild data would mark a major advancement over laboratory-optimized methods.
A further advantage of the Feel Transformer is that, in contrast to the other methods, it is a generative model. This offers the opportunity to non-autoregressively simulate or forecast future physiological states (features) and then use the generated signal to extract further features besides the ones used to generate the signal in the first place. This enables applications such as anticipatory interventions and anomaly detection in psychiatry, ranging from predicting stress overload and panic attack onset to forecasting depressive episode relapse.
Recent work supports the feasibility of such applications: Yang et al. [44] demonstrated the use of Transformer-based models to forecast affective states by integrating wearable sensor data with self-reported diaries, achieving high accuracy in mood prediction across temporal windows. Similarly, Halkiopoulos and Gkintoni [45] highlight the extensive use of Transformer and reinforcement learning models on biosignals such as heart rate and EDA fluctuations during virtual reality-based therapeutic stimuli, where predictive simulations enable adaptive adjustments to therapeutic content in real time, tailoring intervention intensity to the individual’s physiological profile. These are some of the studies that highlight the potential of models like the Feel Transformer not only to decompose biosignals but also to generate plausible physiological trajectories, an essential capability for real-world mental health applications.