The experimental setup combined a high-performance computing workstation with accurate physiological monitoring tools, configured to ensure immersive, reliable, low-latency multimodal data acquisition during continuous TikTok exposure.
4.1. Engagement Binarization and Frequency
To determine whether the proportion of “engaged” moments exceeded the baseline derived from [21], engagement values were binarized using the logistic-sigmoid function with the parameters and threshold specified in Equation (2).
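As a minimal sketch of the binarization step, the snippet below maps raw engagement scores through a logistic sigmoid and thresholds the result. The function names, the 0 to 100 score scale, and the values of `k`, `x0`, and `tau` are illustrative assumptions; the study's actual parameters are those of Equation (2).

```python
import math

def sigmoid(x, k, x0):
    """Logistic sigmoid with slope k and midpoint x0 (hypothetical parameters)."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))

def binarize(values, k, x0, tau):
    """Label a frame as engaged (1) when its sigmoid-mapped score exceeds tau."""
    return [1 if sigmoid(v, k, x0) > tau else 0 for v in values]

# Illustrative raw scores and parameters only, not the study's values.
raw = [5.0, 40.0, 50.0, 62.0, 95.0]
labels = binarize(raw, k=0.1, x0=50.0, tau=0.5)
engaged_fraction = sum(labels) / len(labels)
print(labels, engaged_fraction)
```

With a strict inequality, a score exactly at the midpoint is labeled non-engaged; whether the original pipeline uses a strict or non-strict comparison is not stated in the text.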
Across all participants, a total of 707,029 frames were analyzed, of which 234,508 (33.2%) were classified as “engaged.” To evaluate whether TikTok exposure elicited facial engagement above the baseline, a one-sided binomial test compared the observed proportion of engaged frames against the reference probability p0 = 0.250875, yielding a highly significant deviation (p < 0.001). As shown in Figure 3, the observed proportion of engaged frames clearly exceeds the baseline probability, confirming that the TikTok stimuli elicited above-baseline facial engagement at the frame level during the experimental sessions.
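The reported comparison can be checked approximately with the frame counts given above. The sketch below uses a normal approximation to the one-sided binomial test rather than the exact test, which is an assumption made here to keep the example dependency-free; the function name is hypothetical.

```python
import math

def one_sided_binomial_z(successes, trials, p0):
    """Normal approximation to a one-sided (greater) binomial test.

    Returns the observed proportion, the z statistic, and the upper-tail p-value.
    """
    p_hat = successes / trials
    se = math.sqrt(p0 * (1.0 - p0) / trials)          # standard error under H0
    z = (p_hat - p0) / se
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))     # P(Z >= z)
    return p_hat, z, p_value

# Counts and baseline probability as reported in the text.
p_hat, z, p_value = one_sided_binomial_z(234_508, 707_029, 0.250875)
print(f"p_hat = {p_hat:.3f}, z = {z:.1f}, p < 0.001: {p_value < 0.001}")
```

Because the test treats every frame as an independent trial, consecutive frames from the same participant inflate the effective sample size; the very large z statistic should be read with that in mind.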
4.1.1. Temporal Dynamics of Engagement
To test the hypothesis that engagement would increase over time, Pearson’s correlation coefficient (r) was computed between elapsed time (in seconds) and normalized engagement for each participant. Out of 27 participants, 23 exhibited negative correlations, indicating that engagement tended to decrease as viewing time progressed. After removing invalid or missing values, the mean correlation was negative, and a one-sample t-test on Fisher z-transformed correlations confirmed that this downward trend was statistically significant.
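The per-participant correlation analysis can be sketched as follows. The correlation values in `rs` are hypothetical placeholders, not the study's data, and the helper names are assumptions; the code computes only the t statistic and degrees of freedom, leaving the p-value to a statistics library.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def fisher_t(correlations):
    """One-sample t statistic (against 0) on Fisher z-transformed correlations.

    Returns the back-transformed mean correlation, the t statistic, and df.
    Values of exactly +1 or -1 are not supported (atanh is undefined there).
    """
    z = [math.atanh(r) for r in correlations
         if r is not None and not math.isnan(r)]      # drop invalid entries
    n = len(z)
    mean_z = sum(z) / n
    sd_z = math.sqrt(sum((v - mean_z) ** 2 for v in z) / (n - 1))
    t = mean_z / (sd_z / math.sqrt(n))
    return math.tanh(mean_z), t, n - 1

# Hypothetical per-participant correlations; NaN marks an unusable recording.
rs = [-0.30, -0.10, 0.05, -0.25, float("nan"), -0.20]
mean_r, t_stat, df = fisher_t(rs)
print(f"mean r = {mean_r:.3f}, t({df}) = {t_stat:.2f}")
```

Averaging in z-space and back-transforming is one common convention; the text does not say whether the reported mean correlation was computed this way or as a plain average of r values.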
To further illustrate this temporal decline, Figure 4 provides a participant-level visualization of the temporal evolution of facial engagement across the viewing session by directly comparing average normalized engagement during the early (minutes 1–5) and late (minutes 11–15) intervals. Each point represents one participant, with the horizontal axis indicating early-session engagement and the vertical axis indicating late-session engagement. The dashed diagonal corresponds to the identity line (y = x), which denotes equal engagement levels across both intervals. Points located below the diagonal indicate participants whose average engagement decreased during the later portion of the session, whereas points above the diagonal would indicate increased engagement over time. As illustrated in the figure, the majority of participants fall below the identity line, revealing a systematic reduction in facial engagement during the late viewing interval. This pattern provides a clear visual complement to the correlation-based analysis, demonstrating that the observed negative association between engagement and elapsed time reflects a consistent within-subject temporal decline rather than being driven by a small subset of participants.
These findings contradict the initial hypothesis of sustained or increasing engagement, suggesting instead that continuous exposure to short-form videos may induce emotional habituation or cognitive fatigue. Notably, this analysis was performed using the continuous normalized engagement signal rather than binarized labels, ensuring that the observed temporal decay is not an artifact of threshold-based classification.
Figure 5 illustrates the growth-curve analysis of facial engagement across the viewing session using a linear mixed-effects modeling framework. The gray lines represent the engagement trajectories of all participants included in the study (N = 27), plotted at the minute level to visualize inter-individual variability in baseline engagement and temporal response patterns. Some trajectories appear partially truncated due to missing values associated with shorter or incomplete recordings; these were intentionally retained as missing data to avoid selective visualization or participant exclusion. The black line depicts the fixed-effect component of the mixed-effects model, capturing the average engagement trajectory across all participants while accounting for individual differences through random intercepts. Despite substantial variability across participants, this group-level trajectory exhibits a consistent downward trend as exposure time progresses, indicating a systematic decline in facial engagement during sustained exposure to algorithmically curated short-form video content. Importantly, the mixed-effects growth-curve approach allows the temporal decay in engagement to be modeled independently of individual baseline differences. While some participants begin with high engagement levels and others with lower initial responsiveness, the overall downward trajectory remains consistent, suggesting that the observed engagement decay is a robust temporal phenomenon across the full sample rather than an artifact driven by a small subset of participants.
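For reference, a random-intercept growth-curve model of the kind described here can be written as below. The linear time term and all symbols are assumptions introduced for illustration, since the exact model specification is not given in the text: E_{ij} is the engagement of participant i at minute t_{ij}, β_1 is the population-level slope (the black line), and u_i is the participant-specific intercept offset.

```latex
E_{ij} = \beta_0 + \beta_1\, t_{ij} + u_i + \varepsilon_{ij},
\qquad u_i \sim \mathcal{N}(0, \sigma_u^2),
\quad \varepsilon_{ij} \sim \mathcal{N}(0, \sigma_\varepsilon^2)
```

Under this form, a negative estimate of β_1 corresponds to the downward group-level trend, while σ_u^2 absorbs differences in baseline engagement between participants.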
This finding aligns with theories of emotional habituation and attentional fatigue, whereby repeated exposure to high-arousal stimuli leads to a gradual attenuation of emotional responsiveness.
From a methodological perspective, the use of a mixed-effects model strengthens the interpretation of the results by simultaneously capturing within-subject temporal dynamics and between-subject variability. Rather than relying on simple averaged time series or correlation-based analyses, this approach provides a more comprehensive representation of engagement dynamics under continuous algorithmic stimulation. The figure therefore offers converging evidence that short-form video platforms can elicit strong initial emotional engagement but struggle to sustain it over time, reinforcing the study’s central claim regarding habituation and emotional fatigue in algorithm-driven media environments.
4.1.2. GSR Analysis
For each participant, the GSR values were aligned with the engagement labels using a millisecond-level timestamp tolerance. The mean GSR conductance was then computed separately over the sets of engaged and non-engaged moments, as shown in Equation (3).
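The alignment step can be sketched as a nearest-timestamp match within a tolerance window. The function name, the tuple-based data layout, and the 20 ms tolerance in the usage example are illustrative assumptions; the study's actual tolerance is not reproduced here.

```python
from bisect import bisect_left

def _mean(xs):
    """Arithmetic mean, or NaN when no samples were matched."""
    return sum(xs) / len(xs) if xs else float("nan")

def align_and_average(gsr, labels, tolerance_ms):
    """Match each engagement label to the nearest GSR sample within tolerance_ms.

    gsr:    list of (timestamp_ms, conductance_uS), sorted by timestamp
    labels: list of (timestamp_ms, 0 or 1)
    Returns (mean GSR for engaged frames, mean GSR for non-engaged frames).
    """
    times = [t for t, _ in gsr]
    buckets = {0: [], 1: []}
    for t, label in labels:
        i = bisect_left(times, t)
        # Candidates: the GSR samples just before and just after the label time.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
        if not candidates:
            continue
        j = min(candidates, key=lambda c: abs(times[c] - t))
        if abs(times[j] - t) <= tolerance_ms:
            buckets[label].append(gsr[j][1])
        # Labels with no GSR sample inside the window are dropped.
    return _mean(buckets[1]), _mean(buckets[0])

# Hypothetical samples: four GSR readings and four labeled frames.
gsr = [(0, 3.0), (100, 3.1), (200, 3.4), (300, 3.6)]
labels = [(5, 1), (98, 0), (210, 1), (260, 0)]
engaged_mean, non_engaged_mean = align_and_average(gsr, labels, tolerance_ms=20)
print(engaged_mean, non_engaged_mean)
```

In this example the frame at 260 ms is discarded because its nearest GSR sample lies 40 ms away, which shows how a tight tolerance trades sample count for alignment accuracy.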
Table 3 summarizes the individual mean conductance levels. Across participants, a paired-sample t-test revealed no statistically significant difference in mean GSR between engaged (M = 3.81 μS) and non-engaged moments (M = 3.77 μS). Thus, physiological arousal, as captured by GSR, did not systematically increase during facially classified “engaged” intervals.
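A paired comparison of per-participant means can be sketched as follows. The conductance values are hypothetical and do not correspond to Table 3; the function name is also an assumption.

```python
import math

def paired_t(a, b):
    """Paired-sample t statistic and degrees of freedom for matched lists a, b."""
    d = [x - y for x, y in zip(a, b)]                 # within-participant differences
    n = len(d)
    mean_d = sum(d) / n
    sd_d = math.sqrt(sum((v - mean_d) ** 2 for v in d) / (n - 1))
    return mean_d / (sd_d / math.sqrt(n)), n - 1

# Hypothetical mean conductance (in microsiemens) for five participants.
engaged = [3.9, 2.1, 5.0, 4.2, 3.0]
non_engaged = [3.7, 2.2, 4.9, 4.3, 2.9]
t_stat, df = paired_t(engaged, non_engaged)
print(f"t({df}) = {t_stat:.3f}")
```

For these placeholder values the statistic is well below the two-sided 5% critical value for 4 degrees of freedom (2.776), mirroring the kind of null result described above: large differences between participants, small and inconsistent differences within them.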
To facilitate visual inspection of the participant-level GSR values reported in Table 3, Figure 6 presents a per-participant comparison of mean electrodermal conductance during facially classified engaged and non-engaged moments. Substantial inter-individual variability in absolute GSR levels is observed; however, no consistent pattern of higher conductance during engaged states emerges across participants.
The absence of a significant GSR difference suggests that facially detected engagement may not directly correspond to sympathetic arousal, emphasizing the complexity of emotional responses during digital media consumption.
4.2. Fast Nonlinear Complexity Analysis of Physiological and Expressive Signals
To complement the linear and distribution-based analyses presented earlier, we introduce a lightweight yet theoretically grounded nonlinear complexity assessment of the physiological (GSR) and expressive (facial engagement) signals recorded during continuous TikTok exposure. Traditional nonlinear methods such as Sample Entropy and full Lempel–Ziv Complexity are computationally demanding for high-frequency, long-duration time series and are therefore impractical for rapid experimental pipelines or real-time affective computing systems. To address this limitation while preserving interpretability, we implement two fast-complexity estimators widely adopted in physiological informatics: Approximate Entropy Light (ApEn-L), which captures local unpredictability in autonomic fluctuations, and Binary Pattern Complexity (BPC), which quantifies the structural richness of state transitions in the engagement sequence.
Figure 7 summarizes the results of the nonlinear complexity analyses by comparing early (minutes 1–5) and late (minutes 11–15) segments of the session. While the Binary Pattern Complexity (BPC) metric shows a clear reduction in the late interval, indicating a loss of structural richness in facial engagement patterns, the Approximate Entropy Light (ApEn-L) values remain relatively stable across time, with no pronounced decrease between early and late segments.
Specifically, the BPC metric shows a marked reduction between the early and late segments. Because BPC reflects the diversity of transitions in the binarized engagement sequence, a lower value indicates that facial engagement responses become more stereotyped and less behaviorally flexible over time. This result suggests that even when participants exhibit facial markers of engagement, these expressions follow increasingly constrained patterns during extended exposure. The phenomenon mirrors reductions in expressive bandwidth observed in sustained-attention tasks and reinforces the conceptualization of algorithmically curated short-form video streams as high-intensity but low-durability affective stimuli.
The introduction of these fast nonlinear complexity metrics represents a novel methodological contribution to the study of short-form media consumption. Whereas prior TikTok research has primarily focused on amplitude-based measures, self-reports, or algorithmic engagement signals, our complexity-based analysis indicates that continuous platform exposure leads not only to decreasing engagement levels but also to a loss of intrinsic dynamical richness in expressive behavior, while the complexity of physiological arousal remains comparatively stable. This selective degradation of affective complexity highlights a previously undocumented dimension of emotional fatigue in algorithmically driven media environments, offering new implications for digital well-being, attentional sustainability, and the design of ethical personalized content delivery systems.
Figure 7 illustrates the nonlinear complexity patterns of both physiological and expressive responses during continuous TikTok exposure. The first metric, Approximate Entropy Light (ApEn-L), provides a computationally efficient proxy for the unpredictability of the GSR signal. Higher ApEn-L values indicate richer autonomic fluctuations, whereas lower values reflect increased predictability and reduced adaptability of the sympathetic nervous system. As shown in the figure, the relative stability of ApEn-L suggests that autonomic signal unpredictability does not systematically decline over the course of exposure. This indicates that, although facial engagement becomes more stereotyped over time, the underlying physiological arousal dynamics retain a comparable level of local variability.
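To make the entropy idea concrete, the sketch below implements standard approximate entropy and wraps it in a hypothetical "light" variant that decimates the signal first. The decimation step, the `apen_light` name, and the choice of r as 0.2 times the standard deviation are assumptions for illustration only; the text does not define how ApEn-L is computed.

```python
import math

def _phi(x, m, r):
    """Average log-frequency of length-m templates matching within tolerance r."""
    n = len(x) - m + 1
    templates = [x[i:i + m] for i in range(n)]
    total = 0.0
    for a in templates:
        # Count templates within Chebyshev distance r (self-match included).
        c = sum(1 for b in templates
                if max(abs(p - q) for p, q in zip(a, b)) <= r)
        total += math.log(c / n)
    return total / n

def apen(x, m=2, r=0.2):
    """Standard approximate entropy: phi(m) - phi(m + 1)."""
    return _phi(x, m, r) - _phi(x, m + 1, r)

def apen_light(x, m=2, r_factor=0.2, step=4):
    """Hypothetical 'light' variant: decimate by `step`, then set r = r_factor * std."""
    y = x[::step]
    mean = sum(y) / len(y)
    sd = math.sqrt(sum((v - mean) ** 2 for v in y) / len(y))
    return apen(y, m, r_factor * sd)

# A perfectly regular signal versus a chaotic one (logistic map), both synthetic.
regular = [0.0, 1.0] * 100
chaotic = [0.3]
for _ in range(199):
    chaotic.append(4.0 * chaotic[-1] * (1.0 - chaotic[-1]))

print(apen_light(regular, step=1), apen_light(chaotic, step=1))
```

The regular signal yields a value near zero and the chaotic one a clearly larger value, which is the sense in which higher entropy indicates richer, less predictable fluctuations. The pairwise template comparison is quadratic in signal length, which is why some form of downsampling or windowing is typically needed for long recordings.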
The second metric, Binary Pattern Complexity (BPC), quantifies the diversity of state transitions in the binarized engagement sequence. A higher BPC value reflects a more flexible and varied pattern of facial engagement responses, while a lower value indicates that behavioral expressions follow increasingly repetitive or stereotyped trajectories. The marked reduction in BPC observed in the late segment demonstrates that participants’ expressive behavior narrows over time, even when engagement is still detected. This finding suggests that the nature of engagement shifts from a dynamically rich pattern to a more uniform response profile as the session progresses.
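One simple way to quantify pattern diversity in a binary sequence is to count how many of the possible length-k patterns actually occur. The definition below is an illustrative assumption and not necessarily the formula used for BPC in this study; the function name, the window length k = 3, and the example sequences are hypothetical.

```python
def binary_pattern_complexity(bits, k=3):
    """Fraction of possible length-k binary patterns that occur in `bits`.

    Illustrative definition only; the study's BPC formula is not given in the text.
    """
    n_windows = len(bits) - k + 1
    if n_windows <= 0:
        raise ValueError("sequence shorter than pattern length")
    seen = {tuple(bits[i:i + k]) for i in range(n_windows)}
    # Normalize by the largest number of distinct patterns that could appear.
    return len(seen) / min(2 ** k, n_windows)

# Hypothetical binarized engagement sequences of equal length.
early = [0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 1, 0]   # varied transitions
late = [0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]    # long, stereotyped runs
print(binary_pattern_complexity(early), binary_pattern_complexity(late))
```

Both sequences contain engaged frames, yet the late sequence visits fewer distinct patterns (0.75 versus 1.0), illustrating how complexity can fall even when engagement is still detected.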
Taken together, the metrics depicted in Figure 7 reveal a differentiated pattern of emotional dynamics. Whereas expressive behavior, as captured by BPC, exhibits a marked reduction in structural complexity over time, physiological arousal dynamics, as indexed by ApEn-L, remain comparatively stable. This dissociation suggests that expressive engagement may undergo faster habituation than autonomic processes, reinforcing the interpretation of emotional engagement as a multi-component construct with partially independent temporal trajectories.
4.3. Discussion
The present study provides empirical evidence on the temporal dynamics of emotional engagement during continuous TikTok use through a multimodal experimental approach. The results demonstrate that, although the proportion of engaged frames significantly exceeded the theoretical baseline, engagement declined as exposure continued. This finding challenges the assumption that algorithmically tailored content produces sustained affective involvement, instead suggesting a habituation effect consistent with cognitive saturation and emotional fatigue. The negative time–engagement correlation reflects the attenuation of responsiveness to repeated high-arousal stimuli, aligning with established models of hedonic adaptation and attentional decay in digital media contexts. From a psychophysiological standpoint, these results imply that while TikTok initially captures attention through novelty and audiovisual stimulation, its ability to maintain emotional arousal may be inherently self-limiting over prolonged sessions.
The observed decline in emotional engagement over time can be coherently interpreted through established theories of emotional habituation. Habituation theory posits that repeated exposure to stimuli with similar affective characteristics leads to a progressive reduction in emotional responsiveness, even when stimulus intensity remains high. In the context of short-form video platforms, algorithmic personalization ensures a continuous stream of emotionally optimized content; however, this very optimization may accelerate habituation by reducing novelty and increasing perceptual redundancy. Our findings empirically support this mechanism by demonstrating that initial engagement peaks rapidly but subsequently attenuates during continuous exposure, suggesting that emotional intensity alone is insufficient to sustain long-term engagement without novelty or meaningful variation.
From an attentional perspective, the temporal decay of engagement aligns with models of attentional resource depletion and cognitive fatigue. According to attentional decay frameworks, sustained exposure to high-frequency, high-arousal stimuli taxes limited cognitive resources, leading to reduced responsiveness over time. Short-form video platforms intensify this process by minimizing recovery intervals and continuously demanding orienting responses through rapid audiovisual transitions. The negative correlation between engagement and elapsed time observed in this study provides biometric evidence for attentional fatigue in algorithmically curated media environments, reinforcing the notion that continuous stimulation may paradoxically undermine sustained attention.
Importantly, the dissociation between facially inferred engagement and physiological arousal observed in this study challenges simplified models of emotional engagement that assume a direct correspondence between expressive behavior and autonomic activation. While facial expression analysis captures overt, socially legible markers of engagement, electrodermal activity reflects underlying sympathetic nervous system dynamics that may habituate more rapidly. This divergence suggests that emotional engagement is a multi-layered construct in which expressive and physiological components follow distinct temporal trajectories. The findings therefore support hierarchical and component-process models of emotion, emphasizing the need for multimodal measurement frameworks in affective computing and media psychology.
Taken together, these findings motivate a testable theoretical framework in which emotional engagement during algorithmically curated media exposure follows a three-phase trajectory: (1) rapid affective activation driven by novelty and reward anticipation, (2) progressive habituation characterized by declining physiological arousal and attentional resources, and (3) expressive persistence with reduced autonomic support. This framework generates clear, testable predictions—for example, that physiological markers of arousal will decay faster than facial indicators under continuous exposure, and that introducing meaningful content variation or recovery intervals may partially restore engagement dynamics. By articulating engagement as a dynamic, multi-component process rather than a static outcome, the present study advances theoretical coherence in the study of digital media engagement.
The dissociation observed between facially inferred engagement and GSR further underscores the complexity of measuring affective responses in interactive digital environments. While Affectiva’s AFFDEX algorithm provides a valid proxy for observable expressivity, it may not fully reflect sympathetic activation at the autonomic level. The absence of significant GSR differences between engaged and non-engaged moments suggests that facial cues of engagement are partially decoupled from underlying physiological arousal. This divergence is not necessarily contradictory but indicative of multi-layered emotional processing, where overt expressivity and physiological excitation follow distinct temporal and intensity trajectories. Methodologically, this emphasizes the importance of integrating complementary modalities (such as heart-rate variability, pupil dilation, or EEG) to capture a more holistic spectrum of user engagement.
Beyond its empirical contributions, this study advances the methodological framework for examining emotional engagement in algorithmic media systems. By synchronizing biometric and behavioral data in real time, the experiment bridges the gap between subjective self-reports and objective physiological markers, offering a replicable pipeline for future affective computing research. However, the laboratory setting, constrained exposure time, and logistic parameterization of the engagement function may limit generalizability to naturalistic mobile contexts. Future work should expand temporal scope, incorporate adaptive thresholds for engagement detection, and examine cross-modal synchrony under ecologically valid conditions. Such refinements would enhance understanding of how platform design, content variability, and individual traits jointly modulate emotional regulation and digital well-being in emerging media ecosystems.
The proportion of engaged frames (33.2%) significantly exceeded the theoretical baseline (p0 = 0.250875), confirming TikTok’s high initial emotional appeal.
Engagement levels exhibited a significant negative correlation with time, indicating a consistent decline across participants.
GSR conductance did not differ significantly between engaged and non-engaged frames, suggesting that facial indicators of engagement were not consistently accompanied by measurable physiological arousal.
It is important to consider that electrodermal activity reflects autonomic responses with inherent physiological latencies, typically occurring several seconds after stimulus processing. In experimental paradigms with discrete and well-defined emotional events, time-lagged or event-aligned analyses can provide valuable insight into causal affective dynamics. However, the present study intentionally employed a naturalistic, continuous browsing paradigm in which participants were exposed to an uninterrupted stream of algorithmically curated short-form videos. In such contexts, emotional stimulation is sustained and overlapping rather than event-based, making it difficult to identify discrete engagement onsets or stimulus boundaries suitable for event alignment. Consequently, the analyses focused on distribution-level and temporal trends in engagement and physiological arousal rather than fine-grained event-locked coupling. This design choice aligns with the study’s objective of characterizing global engagement dynamics and habituation effects under continuous algorithmic stimulation, rather than modeling moment-to-moment causal responses. Future studies employing controlled stimulus timing, explicit event markers, or experimentally induced engagement episodes may extend the present framework by incorporating time-lagged or event-aligned analyses to further disentangle expressive and autonomic response dynamics.
An additional methodological extension involves the use of individualized engagement thresholds, calibrated to each participant’s baseline or distributional properties. While such personalization may further reduce inter-individual variability, it requires reliable ground-truth labels or extended calibration phases, which were beyond the scope of the present study. Future work may integrate adaptive or participant-specific thresholds to enhance personalization and cross-study comparability.
An important consideration concerns the role of video content characteristics in shaping emotional engagement. While incorporating explicit content labels or emotional attributes may appear desirable for disentangling semantic effects from temporal dynamics, such an approach assumes that content operates as an independent experimental variable. In algorithm-driven platforms such as TikTok, however, content selection is itself an endogenous outcome of continuous personalization and affective optimization.
The present study intentionally prioritizes ecological validity by examining emotional engagement under uninterrupted, algorithmically curated stimulation, rather than isolating predefined content categories. From this perspective, short-form video content functions as a manifestation of the algorithmic system rather than an external factor. Artificially controlling or labeling content may therefore obscure the very mechanisms through which algorithmic personalization shapes affective dynamics over time.
By focusing on temporal patterns of engagement and physiological response during continuous exposure, the study isolates system-level effects such as habituation, emotional fatigue, and multimodal dissociation. Future research may extend this framework by combining controlled content paradigms with naturalistic feeds to disentangle semantic attributes from algorithmic delivery, thereby complementing the system-level insights provided here.
Taken together, these results indicate that TikTok content elicits strong but transient emotional engagement, with diminishing affective responsiveness over continuous exposure. The dissociation between facial and physiological metrics highlights the need for multimodal models of engagement that account for both observable and latent affective processes.
4.3.1. Rationale and Value of the Multimodal Approach
A central contribution of this study lies in the combined analysis of facial expression metrics and electrodermal activity, which enables a more nuanced characterization of emotional engagement than either modality alone. Facial engagement indices derived from the AFFDEX SDK capture observable and socially legible expressions of responsiveness, reflecting how users outwardly react to stimuli. In contrast, GSR provides a direct measure of sympathetic nervous system activation, indexing underlying physiological arousal that may not be consciously expressed.
The integration of both modalities revealed a systematic dissociation between expressive and physiological components of engagement. While facial engagement exhibited a clear temporal decline and a reduction in structural complexity over sustained exposure, physiological arousal did not show corresponding differences between engaged and non-engaged states and maintained relatively stable nonlinear complexity. This divergence indicates that outward expressions of engagement may habituate more rapidly than autonomic processes, a pattern that would remain undetected under a unimodal design.
By jointly examining expressive and physiological signals, the multimodal framework allowed us to move beyond simple confirmation of engagement levels and instead uncover the layered and partially independent dynamics of emotional engagement under continuous algorithmic stimulation. This approach refines existing models of digital engagement by demonstrating that reliance on a single modality—either facial metrics or physiological arousal alone—can lead to incomplete or potentially misleading interpretations. The findings thus underscore the necessity of multimodal measurement frameworks for capturing the full complexity of affective responses in highly stimulating digital media environments.
4.3.2. Role of Subjective Feedback in Interpreting Biometric Engagement
Although the primary analyses of this study focus on objective biometric measures, a post-session self-report questionnaire was administered to capture participants’ subjective impressions of their engagement experience. This questionnaire was intentionally not included in the main statistical analyses, as self-reported engagement reflects reflective, post hoc evaluation rather than moment-to-moment affective dynamics. In contrast, the biometric measures employed in this study were designed to capture rapid, non-conscious emotional and physiological responses during continuous exposure.
The distinction between subjective and biometric measures is particularly relevant in light of the observed dissociation between facial engagement and physiological arousal. Participants’ self-reports often reflect perceived enjoyment, interest, or fatigue after the session, whereas facial expressions and GSR index real-time expressive and autonomic processes that may evolve independently of conscious appraisal. Rather than serving as redundant indicators, these modalities provide complementary perspectives on engagement operating at different levels of awareness.
From this perspective, the subjective feedback collected post-session offers contextual support for interpreting the biometric findings, helping to situate objective engagement decay and multimodal dissociation within participants’ conscious experience. Future work may integrate synchronized self-report probes or experience sampling methods with biometric measures to further bridge subjective and physiological dimensions of engagement in algorithmically curated media environments.