Eye-Tracker Study of Influence of Affective Disruptive Content on User’s Visual Attention and Emotional State

When a user is reading interesting content or searching for information on a website, a pop-up advertisement appearing in the middle of the screen is perceived as irritating. Interrupted cognitive processes are unwanted by the user but desired by advertising providers. Diverting visual attention away from the main content is intended to focus the user on the disruptive content that has appeared. Is the attempt to reach the user by any means justified? In this study, we examined the impact of pop-up emotional content on user reactions. For this purpose, a cognitive experiment was designed in which a text-reading task was interrupted by two types of affective pictures: positive and negative. To measure the changes in user reactions, an eye-tracker (for analysis of eye movements and changes in gaze points) and the iMotions platform (for analysis of facial muscle movements) were used. The results confirm the impact of the type of emotional content on users’ reactions during cognitive process interruptions and indicate that the negative impact of cognitive process interruptions on the user can be reduced. The negative content evoked lower cognitive load, narrower visual attention, and lower irritation compared to positive content. These results offer insight into how to provide more efficient Internet advertising.


Introduction
Intrusive web advertising, such as animated, pop-up, and layer ads and other informational content appearing in any part of the screen, drags users' attention away from their primary task: reading or navigating through the main content of web pages. This approach is a standard procedure used by advertisement designers to ensure that advertisements are delivered and noticed. As a result, increasing the effectiveness of ad delivery is often accompanied by an increase in intrusiveness [1][2][3]. However, as was observed in early studies related to TV marketing [4], the increased usage of invasive forms of advertising leads to ad avoidance. This pattern strengthens for various forms of interactive media such as pop-up ads [1], video content [5], and within social media [6].
The distractions introduced by online ads are being perceived as annoying by an increasing number of users, which has created a negative attitude toward the ads. Irritation is associated with most forms of intensive advertising, especially with forced exposure such as pop-up, layer [1], and pre-roll video ads [2]. Therefore, advertising companies are facing the dilemma of how to draw users' attention to the disruptive content without evoking their negative attitude. To limit the negative impact of ads on consumers, various techniques can be used such as increasing ad relevance [7,8], including entertaining content [9], increasing privacy protection [10], using gamification [11], or increasing user control [12]. Entirely eliminating the negative feelings evoked by interruptions may be challenging, but it is at least possible to lower their intensity by providing the content of emotional load or highly engaging content, thereby attracting user attention.
With the above assumptions, adding emotional appeal to video pre-roll ads has received special focus. Li et al. reported the results of an experiment where a level of intrusiveness was measured with a scale [3], showing that ads with positive emotional load appeal are considered less intrusive and generate more positive attitude toward the advertising content and the brand. However, the results of extended study where more emotional states were considered [13] showed that the usage of basic emotions such as disgust, happiness, sadness, surprise, and suspense within pre-roll video ads would not help to stop or reduce irritation, regardless of their valence. The study confirmed that the basic emotions significantly increase the skipping of video ads, but complex affective responses such as humor, fun, and warmth decrease skipping and can help to increase ad effectiveness. Another study verified the possibility of decreasing perceived intrusiveness by adding emotional appeal to pre-roll ads displayed with video content, where higher positive emotional appeal led to lower perceived intrusiveness [14].
Emotional load is also known as a factor modifying visual attention. The influence of affective pictures on attention is reported in [15][16][17]. Most of these effects were originally reported for negative stimuli (snakes, fearful, or angry faces). The results showed that stimuli with negative valence are detected rapidly and shift visual attention to their location [15,18,19]. Conversely, positive affect is thought to promote the exploration of new information, in contrast to negative emotions, which primarily act to focus (narrow) attention and cognition. In support, authors [20,21] demonstrated that positive mood states widen the range of attention to visual and conceptual space. Such results suggest that positive valence, at least in terms of mood, can serve to broaden one's range of attention, unlike the effects of negative valence, which narrow the distribution or scope of one's attentional field of view. The effects of wide and narrow attention are called ambient and focal attention, respectively, in the eye-tracking literature. Ambient attention is typically characterized by relatively short fixations followed by long saccades. Conversely, focal attention is described by long fixations followed by short saccades [22,23].
A separate study investigating how the arousal level of a stimulus influences visual attention assumed that highly arousing stimuli capture attention, regardless of their emotional valence [24]. Thus, according to arousal theories of attention, it is arousal rather than valence that influences the amount of attention that is voluntarily or involuntarily focused on stimuli.
The studies presented above analyzed the influence of the emotional content on users' visual attention or their emotional response. In this study, our aim was to study both factors simultaneously and to determine whether such an approach allows for a more compact and unified answer to the question on how to prepare the marketing content to reduce user irritation and limit ad avoidance. Hence, the aim of this study was to measure the impact of affective disruptive content and its emotional type on changes in user visual attention, cognitive load, and emotional states.
To achieve our aim, we designed a cognitive experiment. During the experiment, a subject's task was to read and understand the text displayed on the screen. The text-reading task was suddenly interrupted by affective pictures (taken from the International Affective Picture System (IAPS) [25]) appearing in the center of the screen (Figure 1). Although different kinds of emotions can be evoked by those pictures, we based our study on the most general classification of emotion that divides the whole spectrum of emotions into two classes: positive and negative. Thus, positive and negative pictures were used without adding subclasses or levels of intensity to create sharp differences between the stimuli. During the experiment, we collected two types of data: eye movements and gaze points (acquired by an eye-tracker) and facial muscle micro-movements (recorded with the iMotions platform). Whereas the eye movements were recorded to detect changes in visual attention and cognitive load, the patterns of facial muscle movements were acquired to measure the level and type of emotions felt by users. Notably, our goal was not to recognize the primary emotions, such as happiness, anger, sadness, etc. The results of our experiment confirm that the choice of emotional load of the disruptive content is an important factor when designing ads. The correct type of emotional content might reduce the negative impact of interrupting the user's ongoing cognitive process. As we found, the more negative the content, the lower the cognitive load, the narrower the visual attention, and the lower the irritation.
The paper is structured as follows. Section 1 presents the motivation behind the study. All the materials and methods used in the research can be found in Section 2. Section 3 discusses the results of our experiment, and Section 4 concludes the paper.

Experiment Setup
The goal of the experiment was to measure the impact of affective disruptive content and its emotional type on changes in user visual attention, cognitive load, and emotional states (Figure 2). The experiment was conducted with 33 respondents who declared normal or corrected-to-normal vision. All analyzed data were fully anonymized. Before the experiment, the participants provided informed written consent to have data from the perceptual experiment used in research (according to the Bioethics Committee Agreement no KB-0012/24/2020). The experiment was performed using an NEC monitor with a native resolution of 1680 × 1050 pixels. The monitor display was calibrated to the sRGB color space using a Minolta CS-200 colorimeter and a Specbos 1201 spectroradiometer. During the experiment, the gaze points and eye movements were recorded with a Tobii Pro X3 120 Hz eye-tracker, and the movements of the user's face muscles were recorded with the iMotions platform. The recording device (HP webcam HD 3300 720P) was attached to the upper part of the stimulus screen.
The respondents' main task was to read a set of 10 different texts; each text was followed by a short questionnaire testing text comprehension. The reading process was interrupted by visual stimuli with a positive or negative emotional load. Examples of the negative and positive stimuli displayed during the experiment, as well as the text-reading task, are presented in Figure 3.
The two-level randomization procedure was applied in the process of selecting the stimuli for each respondent. First, the order of the condition (negative/positive) was randomly chosen; second, the pictures in one emotional group were randomly arranged. Such an experimental plan enabled us to find out whether there are statistically significant differences between the two experimental conditions, that is, between positive and negative stimuli.
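The two-level randomization can be illustrated with a short Python sketch. It assumes a blocked design in which all pictures of one condition are shown before those of the other, which the description suggests but does not state outright. The function name, the picture identifiers, and the seed are hypothetical.

```python
import random


def build_stimulus_order(negative, positive, rng):
    """Two-level randomization: condition order first, then picture order per group."""
    # Level 1: randomly decide which emotional condition is presented first.
    conditions = [("negative", list(negative)), ("positive", list(positive))]
    rng.shuffle(conditions)
    order = []
    for label, pictures in conditions:
        # Level 2: randomly arrange the pictures inside this emotional group.
        rng.shuffle(pictures)
        order.extend((label, picture) for picture in pictures)
    return order


# Hypothetical identifiers; the real IAPS picture names are listed in Table 1.
negative_pictures = [f"neg_{i:02d}" for i in range(1, 16)]
positive_pictures = [f"pos_{i:02d}" for i in range(1, 16)]
schedule = build_stimulus_order(negative_pictures, positive_pictures, random.Random(7))
print(len(schedule), schedule[0][0], schedule[-1][0])
```

Passing an explicit `random.Random` instance keeps each respondent's schedule reproducible from a stored seed.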
The disrupting stimuli were presented for seven seconds with random intervals (13 and 26 s) between each two of them. In total, 30 stimuli were displayed for each user. After displaying the whole set of 30 stimuli (15 negative and 15 positive), the remaining part of the reading task was free from interruptions. The average experiment time was about 15-20 min.
The affective pictures used in the experiment were obtained from the IAPS database [25], which was used for the assessment of human emotions and the assignment of experimental tasks [26]. The IAPS database consists of approximately 900 visual stimuli (photographs), each rated on three dimensions of subjects' emotional responses (arousal, dominance, and valence), which are assessed with the Self-Assessment Manikin (SAM) [27]. For the experiment, a total of 30 IAPS photographs (15 for each of the positive and negative groups) were selected based on the following valence values: negative, 4 points or less; positive, 6.3 points or more. According to the signed agreement, pictures from the IAPS database cannot be shared outside of research; therefore, we present them with their names and characteristics in Table 1. The visualization of the experiment in Figure 3 is only for illustrative purposes.
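The valence-based selection rule can be written as a simple filter. The sketch below assumes a mapping from picture identifier to mean valence rating; the identifiers and ratings shown are invented for illustration and are not IAPS data.

```python
def split_by_valence(ratings, negative_max=4.0, positive_min=6.3):
    """Split pictures into negative and positive candidate groups by mean valence."""
    negative = [pid for pid, valence in ratings.items() if valence <= negative_max]
    positive = [pid for pid, valence in ratings.items() if valence >= positive_min]
    return negative, positive


# Hypothetical ratings, not taken from the IAPS norms.
example_ratings = {"pic_a": 2.1, "pic_b": 5.0, "pic_c": 7.4, "pic_d": 4.0}
negative_group, positive_group = split_by_valence(example_ratings)
print(negative_group, positive_group)
```

Pictures with valence between the two thresholds fall into neither group, which is what produces the sharp contrast between conditions.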

Eye-Tracking Metrics
Eye movement characteristics provide indirect access to cognitive processes, e.g., decision making [28], attention [29], and memory [30]. In particular, ocular events, e.g., saccades, blinks, fixations, and pupillary responses, involve different neural circuitries in connection with visuomotor information processing [31]. Fixation duration has often been investigated in the context of visual attention modes: the ambient mode, characterized by short fixations and long saccades during early scene inspection, and the focal mode, characterized by longer fixations and associated with more detailed object feature processing during later inspection phases [32].
A number of studies [33][34][35] have suggested that saccade velocity (SCV), saccade amplitude (SCA), saccade duration (SCD), fixation duration (FD), blink duration (BD), blink frequency (BF), and pupil dilation range (PDR) may be sensitive to mental load variation and fatigue. According to [36], the saccade is a relevant ocular event for studying fatigue and the subsequent decrease in user attention. The velocity of saccades is an especially good indicator of stress, mental overload, irritation states, and lowered attention. Saccade velocity is the average saccade speed in degrees per second. A higher saccade velocity indicates higher stress and task complexity and lower concentration while performing a task. The higher the cognitive load, the shorter the saccades, and the higher the saccade velocity [37].
In our research, to measure the change in visual attention when a cognitive task was interrupted, we chose two metrics: fixation duration, expressed as the fixation duration in the area of interest (AOI) of a given stimulus relative to the fixation duration in the experiment window for the total time the stimulus was displayed (fixation duration percent, FDP), and saccade velocity (SCV). Notably, according to [1], information or advertisement that is deemed important, interesting, or intriguing rewards the viewer, who is thereby less likely to feel irritated by the interruption. Therefore, in our research, we wanted to check whether the value that viewers receive from advertising could be increased by considering the emotional message contained in the ad.

Eye-Tracker Data Preprocessing
The eye-tracker enabled us to register the gaze points and eye movements and thus to isolate the elements that attracted the user's attention at a given moment. To analyze the eye-tracker data, the metrics were computed for every picture that interrupted the text-reading process and for every observer. However, before the statistical differences between data were computed, the saccadic velocities were standardized with Equation (1):

$$z_{i,j} = \frac{d_{i,j} - \bar{d}_i}{\sigma_i} \qquad (1)$$

where $z_{i,j}$ is the standardized saccadic velocity, $i$ is the observer number ($i \in (1, \ldots, n)$), $j$ is the picture number ($j \in (1, \ldots, m)$), $d_{i,j}$ is the value of saccadic velocity, $\bar{d}_i$ is the mean value of saccadic velocity, and $\sigma_i$ is the standard deviation of saccadic velocity. The fixation duration percent ($FDP_k$) is described according to Equation (2), so the values did not require standardization:

$$FDP_k = \frac{FDAOI_k}{FDS_k} \times 100\% \qquad (2)$$

where $FDAOI_k$ is the fixation duration in the area of interest $AOI_k$ defined for stimulus $k$, and $FDS_k$ is the fixation duration in the experiment's window during the time when stimulus $k$ was displayed.
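As an illustration, the two metrics can be computed with the Python standard library alone. This is a sketch under assumptions rather than the pipeline used in the study: the data layout (a dictionary of per-observer velocity lists), the function names, and the use of the population standard deviation are all choices made here, and FDP is assumed to be expressed as a percentage.

```python
from statistics import mean, pstdev


def standardize_per_observer(velocities):
    """z-score each observer's saccade velocities, as in Equation (1).

    velocities: dict mapping observer id -> list of per-picture saccade velocities.
    """
    standardized = {}
    for observer, values in velocities.items():
        mu = mean(values)
        sigma = pstdev(values)  # assumption: population SD; the sample SD is equally plausible
        if sigma == 0:
            standardized[observer] = [0.0 for _ in values]  # constant series carries no contrast
        else:
            standardized[observer] = [(v - mu) / sigma for v in values]
    return standardized


def fixation_duration_percent(fd_aoi, fd_slide):
    """FDP, as in Equation (2): share of fixation time spent inside the stimulus AOI."""
    if fd_slide == 0:
        return 0.0  # no fixations recorded while the stimulus was shown
    return 100.0 * fd_aoi / fd_slide


z_scores = standardize_per_observer({"obs_1": [210.0, 250.0, 190.0]})
print(z_scores["obs_1"], fixation_duration_percent(1.5, 6.0))
```

Standardizing within each observer removes individual differences in baseline saccade speed, so that only the relative change between pictures enters the group comparison.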
Next, as the observers may have received implausible impression scores because they misunderstood the experiment instruction or did not engage in the task and provided random answers, a screening procedure was employed. For this, we applied the standard approach described in [38], Annex 2.3.1, that provides a numerical screening procedure. The procedure involves counting the number of trials in which an observer's result lies outside the ±2 standard deviation range and rejecting those observers for whom (a) more than 5% of the trials are outside that range and (b) the trials outside that range are evenly distributed so that the absolute difference between the counts of trials exceeding the lower and upper bounds of that range is not more than 30%. We applied this procedure to our data but did not find any participants that needed to be removed.
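The screening rule, as summarized in the paragraph above, can be sketched as follows. This follows only the description given here and not the full procedure in [38]; in particular, normalizing the difference between the upper and lower counts by the total number of outlying trials is an assumption, as are the function name and data layout.

```python
from statistics import mean, pstdev


def screen_observers(scores, outlier_share=0.05, balance_limit=0.30):
    """Return the observers that the +/-2 SD screening rule would reject.

    scores: dict mapping observer id -> list of per-trial values, with the
    same trial order for every observer.
    """
    observers = list(scores)
    n_trials = len(next(iter(scores.values())))
    # Per-trial mean and SD across observers define the acceptance band.
    bands = []
    for t in range(n_trials):
        column = [scores[o][t] for o in observers]
        mu, sd = mean(column), pstdev(column)
        bands.append((mu - 2 * sd, mu + 2 * sd))
    rejected = []
    for o in observers:
        above = sum(1 for t, v in enumerate(scores[o]) if v > bands[t][1])
        below = sum(1 for t, v in enumerate(scores[o]) if v < bands[t][0])
        outside = above + below
        if outside == 0:
            continue
        too_many = outside / n_trials > outlier_share          # condition (a)
        balanced = abs(above - below) / outside <= balance_limit  # condition (b), assumed normalization
        if too_many and balanced:
            rejected.append(o)
    return rejected
```

An observer who is consistently high or consistently low is kept under this rule, because a systematic offset is not evidence of random answering; only frequent deviations in both directions lead to rejection.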

Analysis of Emotions
The complexity of emotional response to presented stimuli allows measurement of the user's emotional state from many different perspectives. Among the approaches used for assessing human emotional responses, two main methods can be distinguished. The first is based on analysis of reactions hidden inside the biosignals measured from the human body by using methods such as electroencephalography (EEG) [39], electrocardiography (ECG) [40], electromyography (EMG) [41], galvanic skin response (GSR) [40,42,43], and eye-tracking (ET) [44]. The second is focused on analyzing more external responses, such as body gestures [45], speech [46], or facial expressions [47][48][49][50]. Although all these approaches have been extensively studied, in the domain of emotional expression, the movements of facial muscles are regarded as the central source of information [51,52].
Facial expression analysis is usually based on the Facial Action Coding System (FACS) published in 1978 by Ekman and Friesen [53]. The system was first created in 1970 by Hjortsjö [54]; then, it was enhanced and published by Ekman in 1978 and once again updated in 2002 [55]. FACS describes a set of facial muscle movements coded in the form of action units (AUs). A single AU corresponds to a contraction or relaxation of one or more muscles. The FACS defines 44 AUs [51], such as AU7, lid tightener; AU14, dimpler; AU24, lip pressor; etc. The intensities of AU movements are measured on a discrete scale, ranging from A (trace) to E (maximum). The FACS alone does not provide any direct emotion descriptors [47]. The classification of emotions is based on patterns of AU activations that are described in related sources [56]. The sets of patterns corresponding to six basic emotions (anger, disgust, fear, happiness, sadness, and surprise) are additionally listed in [47].
Although the manual assignment of scores to different AUs to recognize underlying emotional expressions is possible for static pictures or short video clips, the emotional tagging of longer video material is a tedious task. Therefore, automated recognition systems are usually used for this purpose. Apart from different algorithms and approaches proposed by scientists from leading research centers [57][58][59], fully commercial systems such as Azure Face API, Face++, Noldus FaceReader, or the iMotions module for Facial Expression Analysis [60][61][62] are available on the market. In our work, we used the last of these systems, the iMotions module, which uses the Affectiva algorithm [63] to detect AU movements and the underlying emotions.

Emotional Data Preprocessing
Although the Facial Expression Analysis module provides both the raw information about the activation of single action units and their translation to core emotions (joy, anger, fear, disgust, contempt, sadness, and surprise) together with some additional indexes, only nine features were stored for offline analysis. Seven of them represented the core emotions mentioned above, and two represented more general mental characteristics: attention and engagement. Each of these features was provided as an intensity score measured in a normalized scale from 0 to 100.
All nine features were processed according to the same processing pipeline, which was composed of three steps. The first step was performed over the whole set of features, which was accompanied by the stimulus vector. The task of this step was to remove all empty records that were introduced during the recording process as a result of massive changes in a subject's position that temporarily broke the contact between the camera and the subject. Usually, empty records appeared when the picture box was removed from the screen at the beginning of the next text-reading period. The average number of empty records was about 10% of the total.
All subsequent steps of the processing pipeline were performed individually on each feature. The task of step 2 was to remove the influence of outliers. This step was necessary because each sudden movement of the subject's head introduced a large change in the feature value. To deal with the outlier problem, we calculated the 5th and 95th percentiles over the feature time series and replaced all the feature values falling under the 5th percentile or exceeding the 95th percentile with the value of the 5th or 95th percentile, respectively. In step 3, we applied 250 ms nonoverlapping windows to the feature time series. For each window, a corresponding label (-1, text-reading period; 1, negative picture presentation; 3, positive picture presentation) was assigned. Next, each window was described by its mean value.
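Steps 2 and 3 can be sketched in plain Python as shown below. Several details are assumptions made for the illustration: the linear-interpolation percentile, the number of samples per 250 ms window (which depends on the recording rate, not given here), and the choice to drop windows that straddle a text/picture border. All function names are hypothetical.

```python
from statistics import mean


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of an already sorted list."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def clip_outliers(series, low_q=5, high_q=95):
    """Step 2: replace values beyond the 5th/95th percentiles with those percentiles."""
    ordered = sorted(series)
    low, high = percentile(ordered, low_q), percentile(ordered, high_q)
    return [min(max(v, low), high) for v in series]


def window_means(series, labels, samples_per_window):
    """Step 3: average nonoverlapping windows and attach the window label.

    Windows containing more than one label are skipped (assumed handling of
    border periods between text and picture presentation).
    """
    result = []
    for start in range(0, len(series) - samples_per_window + 1, samples_per_window):
        window_labels = set(labels[start:start + samples_per_window])
        if len(window_labels) != 1:
            continue  # window straddles a text/picture border
        result.append((window_labels.pop(), mean(series[start:start + samples_per_window])))
    return result
```

Clipping to percentiles, rather than deleting extreme samples, keeps the time series evenly spaced so that the fixed-length windows remain aligned with the stimulus labels.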
At the end of the processing pipeline, about 260 samples characterizing negative and 260 samples characterizing positive picture periods were obtained for each subject (10 pictures × (4 windows per second × 7 s of 1 picture presentation − 2 possible border periods)). The number of samples characterizing the text-reading periods varied among subjects. This number was much higher since (i) the text was presented before each picture and (ii) the text presentation time was longer. On average, about 3560 samples characterizing text periods (20 text periods × (4 windows per second × 45 s of 1 text presentation − 2 possible border periods)) were obtained for each subject. Hence, the total size of the feature matrix of one subject was about 9 features × 4080 samples. To ensure the comparability of the feature matrices between subjects, each feature in the matrix was standardized to have zero mean and unit standard deviation. The feature matrices (and the stimulus vectors) calculated for all the subjects were concatenated and submitted to statistical analysis. Since the data distribution in the three analyzed groups significantly deviated from the normal distribution for most features, in all analyses, the non-parametric Kruskal-Wallis test with a significance level of 0.05 was used to test the significance of between-group differences.
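The sample counts quoted above follow directly from the window length and the presentation times. The short Python check below reproduces the arithmetic using only the figures given in this paragraph; the function and variable names are illustrative.

```python
WINDOWS_PER_SECOND = 4  # 250 ms nonoverlapping windows


def samples_per_subject(n_periods, seconds_per_period, border_windows=2):
    """Number of windowed samples after discarding the border windows of each period."""
    return n_periods * (WINDOWS_PER_SECOND * seconds_per_period - border_windows)


negative_samples = samples_per_subject(10, 7)   # 10 * (28 - 2) = 260
positive_samples = samples_per_subject(10, 7)   # 260
text_samples = samples_per_subject(20, 45)      # 20 * (180 - 2) = 3560
total_samples = negative_samples + positive_samples + text_samples
print(negative_samples, positive_samples, text_samples, total_samples)  # 260 260 3560 4080
```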

Results and Discussion
The following section discusses results from the perceptual experiment with the goal to determine whether, during advertising design, both users' visual attention and their emotional response can be considered to reduce the user irritation and limit ad avoidance.
The user's natural reactions, such as gaze point changes, eye movements, and the resulting oculometrics, were analyzed in depth, as well as the user's natural facial expressions. The eye-tracker and the iMotions Facial Expression Analysis Module employed during the experiment were key to measuring these natural responses reliably.

Eye Tracking
In order to determine if the negatively loaded content attracted more focal user attention and incurred lower cognitive load compared to positively loaded content, an eye-tracking signal was acquired from the experiment and analyzed with two metrics: ratio of fixation duration in the area of interest (AOI, covering the analyzed stimuli) to fixation duration on the slide of the experiment where the stimuli was displayed (FDP); and saccade velocity (SCV) when the affective picture was displayed.
To analyze the eye-tracker signal in relation to the displayed stimuli (text and picture), two areas of interest were defined: AOI 1 (covering the part of the screen where only the text was displayed) and AOI 2 (covering the part of the screen where the disruptive affective pictures were suddenly displayed during the text reading). The defined AOIs for an example screen from the experiment are depicted in Figure 4. The measurements of saccade velocity help in observing the pattern of a scan path and in exploring cognitive effort. Saccade velocity is highly correlated with discriminatory parameters of cognitive performance [64].
The first oculometric analyzed was fixation duration percent (FDP): the longer the fixation time, the more the component attracts the user's attention. Therefore, we analyzed the relation between the fixation duration in the area of interest of a given stimulus and the fixation duration in the experiment window during the period when the stimulus was displayed (see Equation (2)). To visualize this, the heat maps for all three groups of stimuli (text, positive pictures, and negative pictures) for the given AOIs are depicted in Figure 5. When analyzing the gaze focus for text reading, AOI 1 was considered. For emotional pictures that interrupted the text-reading process, AOI 2 was analyzed. First, we compared the text and the affective pictures without dividing the pictures into positive and negative (Figure 6 (left)). The one-way, two-level Kruskal-Wallis test (at a 5% significance level) comparing the fixation time spent on text reading and on disruptive pictures yielded p < 10^{-17}, indicating a significant difference between text reading and emotional pictures. This means that the user was clearly more focused on reading the text than on the picture that interrupted this task. This is not surprising; however, a statistically significant difference also appeared between pictures with opposite emotional characteristics (Figure 6 (middle)). This means that the perceptions of the affective stimuli differed. The significant difference was identified for the first 10 affective pictures (p = 0.006). Although the pictures were stimuli interrupting the user's cognitive process, the subjects paid attention to them to different extents depending on their emotional type. Much more time was devoted to the negative than to the positive pictures.
This agrees with the rule that more intriguing objects attract attention and that attention is focused on negative stimuli, which is consistent with previously reported results [15,18,19]. The results from our experiments indicate that the negative pictures were more interesting to the users than the positive ones, and users devoted more time to them.
The second oculometric analyzed was mean saccade velocity (SCV), which was measured while the emotional stimulus was displayed. Saccadic eye movements bring the line of sight to details of interest in the visual scene. In earlier research [33][34][35][36][37], saccade velocity has been identified as a very good indicator of fatigue, stress, mental overload, irritation states, and lowered attention; it indicates a high degree of distraction during the gaze concentration process. In other words, SCV indicates the stability of a participant's gaze: a more stable gaze yields a smaller saccade velocity and a less chaotic eye pattern; a higher average saccade velocity indicates higher stress and lower concentration while performing the cognitive task. The higher the cognitive load, the shorter the saccades, and the higher the saccade velocity [37].
Given the above, we used the saccade velocity to test whether the emotional characteristic of disruptive stimuli (positive or negative) has an impact on user attention, especially gaze stability, and on cognitive load when the cognitive task was disrupted. As the characteristics of saccades differ for text reading and looking at pictures, for the analysis of saccadic metrics, we only considered the affective stimuli. During text reading, the saccade velocity is high, which is in the nature of reading. During picture viewing, the metric should be much lower if the user is focused.
Analyzing the results obtained from the experiment (Figure 6 (right)), we found that the standardized velocity of saccadic movements for the negative pictures was lower than for the positive pictures. The difference in mean saccade velocity between pictures of different emotional characteristics according to the one-way, two-level Kruskal-Wallis test (at a 5% significance level) was statistically significant (p = 0.04). The SCV metric showed a greater cognitive load and irritation in the case of positive pictures in comparison to the negative ones. This finding is most likely related to the fact that the negative pictures were so intriguing and surprising that they eliminated the irritation resulting from the interruption of the cognitive process, in contrast to the positive pictures. This could also have resulted from a mechanism whereby one negative feeling is reduced by another negative influence.

Emotions Recognition
To identify which emotions were evoked when the subject's cognitive task (text-reading task) was interrupted with emotional pictures, we used a one-way statistical test. In this test, we were not interested in picture valence: we only wanted to check whether the emotional features significantly differed between the two conditions: the text-reading condition and the picture presentation condition. To prepare data for this analysis, the samples labeled as negative and positive were joined together to form one group, which was named picture. Next, the Kruskal-Wallis test (significance level: 0.05) testing the picture condition against the text condition was applied separately to each feature. The total means for both conditions and each feature, together with the p-value levels, are presented in Figure 7. In the figure, seven out of nine features (attention, anger, sadness, disgust, joy, surprise, and fear) significantly differed between the two analyzed conditions.
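For readers who want to see what a two-group Kruskal-Wallis comparison involves, the following self-contained Python sketch computes the tie-corrected H statistic and its chi-square p-value for one degree of freedom. It is an illustration written for this text, not the software used in the study; in practice a library routine such as `scipy.stats.kruskal` would normally be used. The function names and the sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_wallis_two_groups(a, b):
    """Return (H, p) for two independent samples, with tie correction."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b)) - 3.0 * (n + 1)
    h = max(h, 0.0)  # guard against tiny negative values from rounding
    # Tie correction: 1 - sum(t^3 - t) / (n^3 - n) over groups of tied values.
    counts = {}
    for v in pooled:
        counts[v] = counts.get(v, 0) + 1
    correction = 1.0 - sum(t ** 3 - t for t in counts.values()) / float(n ** 3 - n)
    if correction == 0:
        return 0.0, 1.0  # all observations identical
    h /= correction
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(H / 2)).
    return h, math.erfc(math.sqrt(h / 2.0))


# Hypothetical standardized feature values for the two conditions.
text_condition = [0.2, -0.1, 0.4, 0.0, 0.3]
picture_condition = [0.9, 1.1, 0.7, 1.4, 0.8]
print(kruskal_wallis_two_groups(text_condition, picture_condition))
```

The test works on ranks rather than raw values, which is why it remains valid for the non-normally distributed features described in the preprocessing section.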
Five of the significant effects found in the analysis agreed with our preliminary expectations and are straightforward to explain. The text-reading period required more attention than the picture presentation period because the subjects were forced to read and understand the text in order to provide the correct answers to the questions following each section of the text. The higher disgust in the picture presentation period was also expected, since some of the negative pictures were perceived as revolting by most subjects, which contrasted with the neutral tone of the text. The significant increase in the anger feature in the picture presentation period is also easy to explain. The pictures appearing in front of the subject's eyes and obscuring the text interrupted the current cognitive task performed by the subject and were hence perceived as annoying. The higher anger was negatively correlated with the next significant feature, joy, which explains its smaller value in the picture presentation period. The next effect that we detected was less sadness in the picture presentation period. Once again, the explanation of this effect is straightforward. Sadness is a negative emotion that reflects a person's avoidance tendency. Anger is also a negative emotion, but it reflects the motivation to engage (approach tendency). Since a person cannot withdraw and engage at the same time, one emotion has to prevail. In the case of interrupting a person's ongoing cognitive process, anger is usually much stronger than any sadness; as a result, the pop-up pictures induced more anger and less sadness.
The emotional feature that was the most unexpected to us was the feature depicting the subject's surprise. At the beginning of the experiment, we were sure that since the picture presentation time was random, the subject should be surprised each time when a picture appeared on the screen. Our anticipation was found to be false: the subjects were significantly more surprised by the text than by the pictures. One possible explanation of this effect might be that the symptoms of surprise appeared in both the text-reading and picture presentation periods. However, during the text-reading period, the surprise emotion had chances to develop; in the picture presentation period, it was quickly overcome by anger associated with the interruption of the cognitive task.
The last significant effect that we identified during the analysis was the higher fear in the text-reading period. This effect was the opposite of our expectation. The set of negative pictures used in our experiment also included pictures with fearful content. Hence, we expected those pictures to evoke fear in the subjects. The fearful pictures did induce some fear, but not enough to offset the nonfearful positive pictures (compare Figure 8).
Figure 7 shows the differences between the text-reading and the picture presentation periods. As we mentioned above, the majority of the identified effects were as expected. However, the main aim of our analysis was not to observe what happened when we presented a picture on the screen but to determine whether the valence of the picture could induce differences in the emotions felt by the subjects. To this end, we performed another set of Kruskal-Wallis tests (significance level: 0.05), one test per feature. This time, we tested all periods when the positive pictures were presented (positive condition) against all periods when the negative pictures were presented (negative condition). The total means for both conditions and each feature, together with the p-value levels, are presented in Figure 8. In the figure, six out of nine features (engagement, anger, sadness, joy, surprise, and contempt) significantly differed between the two analyzed conditions. The comparison of the grand means obtained for three of those features showed the assumed tendency: the negative pictures were perceived as more surprising, more contemptuous, and sadder than positive ones.
Three other significant effects, higher anger and engagement and lower joy for positive pictures, were the opposite of what we expected. At the beginning of the experiment, we assumed that positively loaded content would be able to attenuate the irritation induced by interrupting the user's main task: the text-reading task. Hence, we assumed that anger would be significantly higher for negative than for positive pictures. However, the higher anger in the positive picture presentation period was one of the most pronounced effects in the experiment. Since engagement and joy are highly correlated with anger (engagement positively and joy negatively), higher anger during the positive picture presentation period also directly explains the two remaining effects: higher engagement and lower joy.
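To make the per-feature comparison of the positive and negative conditions concrete, the following is a minimal Python sketch of a Kruskal-Wallis test for two independent samples, applied once per feature. It is an illustration under stated assumptions, not the analysis code used in the study: the function names, the `ALPHA` constant, and the example scores are hypothetical, and the p-value relies on the chi-square approximation with one degree of freedom, which is only approximate for small samples.

```python
import math
from collections import Counter

ALPHA = 0.05  # significance level stated in the text


def average_ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_two_groups(a, b):
    """Kruskal-Wallis H (tie-corrected) and p-value for two samples."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sum_a = sum(ranks[:len(a)])
    rank_sum_b = sum(ranks[len(a):])
    h = 12 / (n * (n + 1)) * (
        rank_sum_a ** 2 / len(a) + rank_sum_b ** 2 / len(b)
    ) - 3 * (n + 1)
    # Tie correction: 1 - sum(t^3 - t) / (N^3 - N) over groups of tied values.
    correction = 1 - sum(t ** 3 - t for t in Counter(pooled).values()) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    h = max(h / correction, 0.0)  # guard against tiny negative rounding error
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p = math.erfc(math.sqrt(h / 2))
    return h, p


def compare_conditions(scores_by_feature):
    """Run one test per feature; map feature -> (H, p, significant?)."""
    results = {}
    for feature, (positive, negative) in scores_by_feature.items():
        h, p = kruskal_two_groups(positive, negative)
        results[feature] = (h, p, p < ALPHA)
    return results


# Made-up scores purely for illustration (not data from the experiment).
example = {
    "anger": ([6, 7, 8, 9, 10], [1, 2, 3, 4, 5]),
    "fear": ([1, 2, 3, 4], [1, 2, 3, 4]),
}
results = compare_conditions(example)
```

In practice, an established routine such as `scipy.stats.kruskal` would normally be preferred over a hand-written version; the sketch only spells out the ranking, the H statistic, and the per-feature decision at the 0.05 level.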

Discussion
Although marketing activity is inherent to digital environments and commercial aspects are important for portals, games, and social platforms, user experience considerations make it important to keep those environments as user-friendly as possible. The types of content and techniques used have an impact on the obtained effects as well as on user perception and attitudes toward brands. Earlier attempts showed that the level of intrusiveness could be reduced by adding emotional appeal to video ads [14]. Earlier results based on subjective assessments from user surveys showed that positive emotional content generated a more positive attitude toward the advertising content and the brand. To avoid subjectivity, we based our study on objective measures derived from eye tracking and facial emotion recognition. With those measures, we obtained effects contradictory to previously reported findings: simple positive content based on, e.g., animals, cats, or nice views, was not able to reduce the irritation caused by disruptive content.
In reducing the negative impact of disruptive content, the intensity of stimuli and the level of attention induced during interruptions by negative content were more effective than a positive emotional load. This supports earlier results [1] showing that if information, as well as an ad, is regarded as important, interesting, or intriguing by the viewer, the viewer feels a lower level of irritation due to the interruption. The content with potentially negative appeal from different categories, such as dangerous animals, disgusting pictures, or sad scenes, generated higher attention as measured with eye tracking. This results in capturing information with focal vision, which is important for effective communication [65]. This matters for practical applications because attention is one of the key factors recognized as important for brand memory [66] and attitudes [67].
Hence, one of the conclusions of our experiment is that user attention was dependent on the valence of the emotional content. In the case of positive pictures, the attention was more ambient than for the negative ones, which were characterized by more focal attention. This conclusion is supported by the fixation time: long fixation times corresponded to the pictures with a negative load, in contrast to the short fixation times observed for the positive pictures. The analysis of saccadic eye movements (saccade velocity) provided more insight into the users' cognitive states and emotions: it showed greater cognitive load and irritation (which is consistent with the results in [37]); surprisingly, this was found not for negative pictures but for positive ones. These results might be a consequence of the negative pictures being so intriguing and surprising that they eliminated the irritation resulting from the interruption of the cognitive process, which the positive pictures did not do. The conclusions formulated on the basis of eye patterns are confirmed by the facial features analysis. Here, irritation, measured by anger level, was much higher when pictures with positive valence were presented.
These results are consistent with those of earlier studies [68], emphasizing that negative stimuli are treated as emotionally more intense than positive stimuli. Cognitive process interruption is intense on its own, and content with positive emotional load did not have enough intensity to replace its negative impact.
Our experiment was based on several methodological choices. Firstly, we used affective pictures from a standardized database instead of real advertisements. We chose this approach because those pictures were accompanied by an identified emotional load. Moreover, they allowed us to keep the emotional load undisturbed by marketing content, brand names, or product attitudes. On the other hand, our choice might limit the generalizability of our results, as real marketing stimuli can have a slightly different influence on the users.
Secondly, we used an eye-tracking system and facial emotion recognition software to measure the emotional reaction of respondents. Both approaches are usually the first choice in experiments focused on emotion recognition, although methods that directly measure the biosignals generated by the human nervous system, such as electroencephalography, electrocardiography, electromyography, or galvanic skin response, usually provide more precise measurements. However, those methods are characterized by higher invasiveness and narrower applicability, which makes them less practical and harder to use outside laboratory conditions.
Thirdly, we focused on only one cognitive task, a text-reading task that is typical for portals or blogs, which are the main targets of marketing activity. This choice might limit the transfer of our results to non-text content that provides different levels of user engagement. Moreover, we used neutral texts, while texts within websites might have a positive or negative emotional load. Consistency or inconsistency between the text and the content used to interrupt the user's cognitive process can also have an impact on differences in users' reactions.
One possible limitation of the results presented in this paper is related to the scaling of the intensity of both positive and negative content. We used two classes of images, and in both classes, different levels of positive or negative impact could be distinguished. This opens additional directions for future research on the impact of content type and task difficulty on user emotions.

Conclusions
The study presented in this paper was designed to provide a better understanding of how emotional content used in marketing affects users' perception. To achieve this aim, we conducted a cognitive experiment with affective pictures that enabled us to study users' visual attention and their emotional response to visual stimuli.
By studying both factors simultaneously, we were able to find a more compact and unified answer to the question of how to prepare marketing content to reduce user irritation and limit ad avoidance.
Our experiment clearly shows that user attention depends on the valence of the emotional content. In the case of positive pictures, the attention was more ambient than for the negative ones, which were characterized by more focal attention. This conclusion was also supported by the fixation time, which was higher for negative pictures. The conclusions formulated on the basis of eye patterns were also confirmed by the facial features analysis. Here, irritation, measured by anger level, was much higher when pictures with positive valence were presented.
The presented results show the possibility of replacing emotions induced by interruptions by using content with a stronger emotional load, such as shocking or scary pictures. This has practical implications for adjusting the emotional intensity of the content used in a marketing technique and the intensity of other stimuli within the user's environment. We do not suggest using shocking content in typical campaigns; instead, we propose searching for methods of attracting user attention by emotionally intensifying the disruptive content, because this might lower user irritation.