The Impact of Pet Videos on Emotional Face Processing
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
The manuscript addresses an emerging topic: how exposure to pet-related media influences emotional processing. The idea is interesting and aligns with current discussions on media-induced emotion regulation and mental health. The authors conducted three experiments that form a coherent research narrative. However, the manuscript contains substantial conceptual, methodological, analytical, and interpretational issues. Many central claims (especially regarding the “social attributes” of pets) are insufficiently supported by the data or the experimental design. Important confounding variables are uncontrolled; several analytical interpretations are inaccurate or overstated; and the narrative contains theoretical leaps not warranted by the presented evidence.
- First, I think that key confounding variables are not controlled. The authors repeatedly argue that pet videos exert unique effects due to their “social attributes.” However, the pet videos contain no human–animal interaction and show only animals alone, making them poor proxies for social support or social cues. The stimuli represent cute biological movement, not validated social-supportive content. This undermines the central theoretical claim. Without measures of perceived sociality or social support, the data cannot justify the conclusion that social attributes drive the observed effects.
- The attentional bias index (RT_incon – RT_con) is used, but: No raw RT descriptives, SDs, or accuracy data are provided. No justification is given for interpreting positive vs. negative index values. The dot-probe task is widely criticized for poor reliability; this is not acknowledged. In Fig. 1 and the related text, the manuscript interprets the index inconsistently and makes strong statements without reporting the underlying RT patterns.
- Experiment 2 is used to explain results from Experiment 1. However: Experiments were conducted on different days, with potential order effects. No mediation analysis is performed. No direct measurement of valence perception within Experiment 1.
- I think that Experiment 3 does not support the manuscript’s interpretation. The authors conclude that pet videos do not influence word valence because words lack social attributes. However: Words differ from faces in sensory modality, emotional immediacy, and processing stages. The null effect could be due to lower sensitivity, semantic processing delay, or lower arousal, not “lack of sociality.”
- Statistical reporting needs improvement. No effect sizes (ηp², Cohen’s d) are provided. Multiple comparisons are not corrected despite a high number of LSD post-hocs. Some interpretations overstate borderline non-significant results (e.g., p = 0.160 treated as a meaningful trend).
- Figures lack error bars or standard deviations.
I think there is still a serious problem with this manuscript, that is, the structure of the manuscript does not meet the requirements of SCI papers. I strongly recommend that the author present the three experiments in a complete experimental writing format.
In addition, the author did not state any ethical requirements either.
Author Response
Reviewer 1
Comment 1: First, I think that key confounding variables are not controlled. The authors repeatedly argue that pet videos exert unique effects due to their “social attributes.” However, the pet videos contain no human–animal interaction and show only animals alone, making them poor proxies for social support or social cues. The stimuli represent cute biological movement, not validated social-supportive content. This undermines the central theoretical claim. Without measures of perceived sociality or social support, the data cannot justify the conclusion that social attributes drive the observed effects.
Response:
Thank you very much for your valuable comment. As you rightly pointed out, due to the lack of measurements for sociality and social support, the data of this study cannot fully support the conclusion that social attributes drive the observed effects. Therefore, in the descriptions regarding the potential mediation of pet videos' effects on emotional face processing by social attributes, the word "may" is consistently used throughout the manuscript.
On one hand, the inference that "the attentional bias–modulating effect of pet videos may not be solely attributable to positive mood induction but could also be related to the social support they provide" is based on the following rationale: First, the pet videos in this study featured cats and dogs, the animals most commonly kept as companion animals in daily life. Unlike other cute animals (e.g., rabbits, birds), cats and dogs share close emotional bonds with humans. Liu et al. (2025) found that viewing images of companion animals (cats and dogs) specifically activates brain regions associated with human attachment, as well as neural circuits involved in memory, emotion, and attachment-related cognitive processing. This suggests that viewing companion animal images engages the human socio-emotional system. Furthermore, research indicates that people primarily keep pets for emotional support (Zhang et al., 2024a), and humans can form attachment relationships with pets, which provide emotional support and comfort (Keefer et al., 2014). Therefore, a potential important reason why "pet videos specifically modulate emotional face processing" may be the social and emotional support associated with pet videos, which is a characteristic not present in scenery videos.
On the other hand, your comment precisely highlights the limitations of the conclusions supported by our data. To avoid conflating "possible speculative reasons" with "conclusions drawn from the data," we have made the following revisions to the manuscript:
- In the Discussion section of Experiment 1, we have explained the social attributes of the animals in the pet videos and emphasized that the statement "the attentional bias–modulating effect of pet videos may not be solely attributable to positive mood induction but could also be related to the social support they provide" is based on logical inference (speculation) rather than being data-driven (conclusion). (see page 6, lines 246-257, in red)
- In the General Discussion section, we have addressed the limitations of the study and suggested future research directions to better control for the confounding variable of social support. (see pages 12-13, lines 500-505, 538-544, in red)
Comment 2: The attentional bias index (RT_incon – RT_con) is used, but: No raw RT descriptives, SDs, or accuracy data are provided. No justification is given for interpreting positive vs. negative index values. The dot-probe task is widely criticized for poor reliability; this is not acknowledged. In Fig. 1 and the related text, the manuscript interprets the index inconsistently and makes strong statements without reporting the underlying RT patterns.
Response:
(1) Descriptive statistics for raw reaction times (RTs), standard deviations, etc., have been added to the table (see page 5, lines 217-219, in red);
(2) We have added a detailed explanation of the interpretation of positive and negative values of the Attention Bias Index: Attention Bias Index (ABI) = RTincon - RTcon. Specifically, the attentional bias index was calculated by subtracting reaction times in congruent trials from reaction times in incongruent trials. A positive attentional bias index indicates that reaction times in incongruent trials were longer than those in congruent trials, suggesting faster responses in congruent trials and thus the presence of an attentional bias toward emotional stimuli. In this case, participants responded more rapidly to probes appearing at the location of emotional stimuli, whereas probes presented at the location of neutral stimuli required longer reaction times due to attentional reorienting. An attentional bias index of zero indicates no difference in reaction times between congruent and incongruent trials, reflecting the absence of attentional bias. A negative attentional bias index indicates shorter reaction times in incongruent trials than in congruent trials, suggesting faster responses in incongruent trials and reflecting attentional avoidance of emotional stimuli. In this case, participants responded more rapidly to probes appearing at the location of neutral stimuli and more slowly to probes appearing at the location of emotional stimuli, indicating avoidance of emotional information. (see pages 4-5, lines 181-194, in red)
(3) We have added an explanation for the dependent variable in Experiment 1 (Attention Bias Index Difference): Attention Bias Index Difference = ABIpre - ABIpost. The change in attentional bias was computed as the attentional bias index before video viewing minus the attentional bias index after video viewing. A more positive value indicates less post-video bias toward emotional stimuli. (see page 5, lines 196-199, in red)
(4) We have revised the y-axis label and caption for Figure 1 as follows:

Figure 1. Attention Bias Index Difference. Attention Bias Index Difference = ABIpre - ABIpost. The attentional bias index difference was defined as the attentional bias index before minus after video viewing, with more positive values indicating reduced attentional bias toward emotional stimuli. Error bars represent means ± SEM.
(5) We have acknowledged the limitation of the dot-probe task: Finally, this study employed a dot-probe task to measure attentional bias. However, it should be noted that the dot-probe task has been widely criticized for its low reliability (Kappenman et al., 2014). Future research could adopt more reliable experimental paradigms or combine eye-tracking and EEG techniques to validate the present findings and further clarify their cognitive mechanisms. (see page 13, lines 548-552, in red)
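As an illustration of the index definitions in items (2) and (3), the arithmetic can be sketched in Python. This is a minimal sketch, not the authors' analysis code: the function names and the reaction times are hypothetical, and it assumes the index is computed from per-condition mean RTs for a single participant.

```python
from statistics import mean


def attentional_bias_index(rt_incongruent, rt_congruent):
    """ABI = mean RT on incongruent trials minus mean RT on congruent trials (ms)."""
    return mean(rt_incongruent) - mean(rt_congruent)


def abi_difference(abi_pre, abi_post):
    """ABI difference = ABI before video viewing minus ABI after video viewing."""
    return abi_pre - abi_post


# Hypothetical reaction times in milliseconds for one participant.
pre_incon, pre_con = [520, 540, 530], [500, 510, 490]
post_incon, post_con = [515, 505, 510], [505, 500, 495]

abi_pre = attentional_bias_index(pre_incon, pre_con)     # 530 - 500 = 30
abi_post = attentional_bias_index(post_incon, post_con)  # 510 - 500 = 10
diff = abi_difference(abi_pre, abi_post)                 # 30 - 10 = 20
```

In this made-up case the pre-video index is positive (a bias toward the emotional stimulus) and the positive difference indicates that the bias shrank after viewing.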
Comment 3: Experiment 2 is used to explain results from Experiment 1. However: Experiments were conducted on different days, with potential order effects. No mediation analysis is performed. No direct measurement of valence perception within Experiment 1.
Response:
- Regarding order effects: Experiments 1 and 2 involved different participants. All participants in this study took part in only one of the experiments. Furthermore, a Latin square design was used to counterbalance the order of conditions within each experiment. Therefore, the experimental results are minimally influenced by order effects.
- Regarding mediation analysis: Since the two experiments were completed by different participant groups, mediation analysis is not applicable. We have stated in the Participants section of Experiment 1 (2.1.1) that none of the participants took part in other experiments. We will add a similar statement in the Participants section of Experiment 2. (see page 7, lines 281-282, in red)
- Regarding the absence of direct valence perception measurement in Experiment 1: Experiment 1 used a dot-probe paradigm where emotional and neutral stimuli were presented simultaneously for a brief duration, and participants were required to respond to the probe as quickly as possible. It was not suitable to ask participants to evaluate facial valence concurrently while performing this task.
We believe the results of Experiment 2 are representative and reflect the cognitive processing patterns of the majority, thus they can be used to explain the results of Experiment 1.
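The Latin square counterbalancing mentioned in the response on order effects can be sketched as follows. This is only an illustration under stated assumptions: the response does not say which square was used, so a simple cyclic square is assumed, and the condition labels and function names are hypothetical.

```python
def latin_square(conditions):
    """Build a cyclic Latin square: each condition appears once per row and column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


def order_for_participant(participant_index, conditions):
    """Assign a row of the square to a participant by cycling through the rows."""
    square = latin_square(conditions)
    return square[participant_index % len(square)]


# Hypothetical labels for three video conditions.
CONDITIONS = ["pet", "scenery", "neutral"]
square = latin_square(CONDITIONS)
for i in range(3):
    print(i, order_for_participant(i, CONDITIONS))
```

A cyclic square of this kind balances the ordinal position of each condition but not which condition immediately precedes which, so it limits rather than eliminates order effects.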
Comment 4: I think that Experiment 3 does not support the manuscript’s interpretation. The authors conclude that pet videos do not influence word valence because words lack social attributes. However: Words differ from faces in sensory modality, emotional immediacy, and processing stages. The null effect could be due to lower sensitivity, semantic processing delay, or lower arousal, not “lack of sociality.”
Response:
First, your suggestion highlights a non-negligible potential reason for the null effect in Experiment 3. As you noted, we also stated in the Discussion of Experiment 3: "Numerous studies have demonstrated distinct processing mechanisms between text and images in humans, supported by behavioral (Lin et al., 2022) and neuroimaging evidence (Vandenberghe, 1996; Reisch et al., 2020; Lin et al., 2023). For example, emotional effects elicited by words emerge selectively in late-stage semantic processing, while facial emotional processing dominates early sensory stages (Rellecke et al., 2011; Reisch et al., 2020). Therefore, the valence ratings of emotional words may exhibit lower sensitivity to video-induced modulation compared to facial stimuli." This aligns with your point. Your suggestion provides additional possible reasons for the null effect. We fully agree that the null effect in Experiment 3 could stem from multiple factors, including: (1) Processing stage differences: emotional effects of words occur more in later semantic processing, while facial emotion is processed in early perceptual stages; (2) Sensory modality and emotional immediacy: visual dynamic videos may have weaker emotional arousal effects on static text; (3) The social attribute hypothesis is only one possibility, not the sole explanation.
Second, we would like to clarify the inferences and conclusions drawn from Experiment 3. Our study aimed to test whether pet videos influence the affective perception of stimuli beyond socially relevant facial cues. As noted in Experiment 3 (page 10, lines 376-379), we proposed that “If the perception target lacks social attributes, pet videos may not exert specific effects on emotional perception. Consequently, this experiment employs emotional words as the perception target, investigating the impact of emotional videos on valence assessment of emotional words.” The purpose was to extend the scope of our investigation by examining whether pet videos affect the valence perception of non-social stimuli, such as emotional text. The results indicated no significant differences in valence ratings for positive, negative, or neutral words across the three conditions. Consequently, we cautiously concluded in the manuscript that “Importantly, this effect was only observed in facial stimuli with social attributes.” (Abstract, page 1, lines 18-19). It is important to note that our emphasis is not on stating that “pet videos do not influence emotional text due to the lack of social attributes.” Rather, we highlight that, by comparison with emotional text, the effect of pet videos on affective valence perception is observed exclusively for facial stimuli with social attributes. The purpose of Experiment 3, therefore, was to delineate the scope and stability of the effect of pet videos on emotion processing.
Finally, and more importantly, your comment made us aware of potential confusion in the inferences and conclusions of Experiment 3. Therefore, we will add the following content to the Discussion of Experiment 3:
It should be noted that one of the primary aims of the present experiment was to examine whether the modulatory effect of pet videos on emotional processing depends on the social attributes of the stimuli (e.g., faces used in Experiments 1 and 2). The results showed that pet videos did not modulate the valence perception of non-social emotional words, suggesting that their effects may be more sensitive to stimuli with social attributes. However, this null effect may also be attributable to systematic differences between words and faces in terms of emotional immediacy, sensory modality, or depth of processing. Future research could further clarify the boundary conditions and cognitive mechanisms underlying the effects of pet videos on emotional processing by directly comparing different types of social and non-social stimuli. (see page 12, lines 456-464, in red)
Comment 5: Statistical reporting needs improvement. No effect sizes (ηp², Cohen’s d) are provided. Multiple comparisons are not corrected despite a high number of LSD post-hocs. Some interpretations overstate borderline non-significant results (e.g., p = 0.160 treated as a meaningful trend).
Response:
(1) Regarding reporting effect sizes (ηp²): In the results analyses of all three experiments, we have reported ηp² values for main effects and interaction effects. For example, in Experiment 2, we reported ηp² values for the main effect of face emotion type, the main effect of video type, and their interaction (page 8, lines 319-323). We will further check all relevant analyses in the revised manuscript to ensure complete and standardized statistical reporting of effect sizes.
(2) Regarding correction for multiple comparisons: Your point is very important. We did use the Bonferroni correction for post-hoc (LSD) comparisons in the data analysis but failed to state this explicitly in the original text. This was an oversight in our description. We will add this clarification in the "Data Analysis" sections of the revised manuscript and update/annotate the corresponding results.
(3) Regarding the interpretation of borderline results like "p = 0.160": Upon review, the original text did not contain statements treating p = 0.160 as a significant trend. We consistently adhere to p < 0.05 as the standard for statistical significance and base inferences only on results meeting this threshold in the discussion. If there are any statements in the text that might be misinterpreted, we will clarify or adjust them during revision to ensure strict correspondence between interpretation and statistical results.
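For readers unfamiliar with the two statistics discussed in items (1) and (2), a minimal sketch follows. It assumes the standard textbook definitions: a Bonferroni adjustment multiplies each p-value by the number of comparisons (capped at 1), and ηp² = SS_effect / (SS_effect + SS_error). The function names, p-values, and sums of squares are hypothetical and are not taken from the manuscript.

```python
def bonferroni_adjust(p_values):
    """Multiply each p-value by the number of comparisons, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def partial_eta_squared(ss_effect, ss_error):
    """Partial eta squared: effect sum of squares over effect plus error sum of squares."""
    return ss_effect / (ss_effect + ss_error)


# Hypothetical uncorrected p-values for three pairwise comparisons.
raw_p = [0.010, 0.030, 0.160]
adjusted_p = bonferroni_adjust(raw_p)

# Flag which comparisons remain below the conventional 0.05 threshold.
significant = [p < 0.05 for p in adjusted_p]

# Hypothetical sums of squares from an ANOVA table.
eta_p2 = partial_eta_squared(ss_effect=20.0, ss_error=80.0)
```

With these made-up values only the first comparison stays below 0.05 after adjustment, which shows how a result that looks significant before correction (p = 0.030) may not survive it.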
Comment 6: Figures lack error bars or standard deviations.
Response:
Thank you for pointing out this detail regarding figure presentation. In the original Figures 1, 2, and 3, we did include error bars (representing the standard error of the mean) for each bar to reflect the estimation precision of the sample means. We carefully checked the submitted manuscript files and suspect that the visual prominence of the error bars might have been reduced during document generation or conversion, making them less discernible to you. We apologize for this.
In the revised manuscript, we will explicitly indicate the specific statistic represented by the error bars in the figure legends or captions (e.g., "Error bars represent means ± standard error of the mean (SEM)").
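The error-bar statistic named in the caption can be computed as in the sketch below. It assumes the usual definition of the standard error of the mean (sample standard deviation divided by the square root of the sample size); the function name and the scores are hypothetical, not data from the study.

```python
from math import sqrt
from statistics import mean, stdev


def mean_and_sem(values):
    """Return the sample mean and the standard error of the mean (SD / sqrt(n))."""
    n = len(values)
    return mean(values), stdev(values) / sqrt(n)


# Hypothetical per-participant scores for one condition.
scores = [10, 20, 30, 40]
m, sem = mean_and_sem(scores)

# Lower and upper ends of an error bar drawn as mean ± SEM.
lower, upper = m - sem, m + sem
```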
Comment 7: I think there is still a serious problem with this manuscript, that is, the structure of the manuscript does not meet the requirements of SCI papers. I strongly recommend that the author present the three experiments in a complete experimental writing format.
Response:
We have supplemented the previously omitted details in the Methods sections.
Comment 8: In addition, the author did not state any ethical requirements either.
Response:
This study was approved by the Ethics Review Board of Sichuan Normal University (ID: 2023LS018) and was conducted in accordance with the ethical principles of the Declaration of Helsinki. Written informed consent was obtained from all participants prior to their inclusion in the study. This information has been added to the manuscript (see page 14, lines 576-578, in red).
References:
Kappenman, E. S., Farrens, J. L., Luck, S. J., & Proudfit, G. H. (2014). Behavioral and ERP measures of attentional bias to threat in the dot-probe task: poor reliability and lack of correlation with anxiety. Frontiers in Psychology, 5, 1368.
Keefer, L., Landau, M., & Sullivan, D. (2014). Non-human Support: Broadening the Scope of Attachment Theory. Social and Personality Psychology Compass, 8(9), 524–535.
Lin, W., Li, Z., Zhang, X., Gao, Y., & Lin, J. (2023). Electrophysiological evidence for the effectiveness of images versus text in warnings. Scientific Reports, 13, 1278.
Lin, W., Lin, J., Yang, Y., Liao, J., Chen, W., & Mo, L. (2022). The difference in the warning effect of different warning signs. International Journal of Occupational Safety and Ergonomics, 28(2), 890–900.
Liu, H., Zhou, X., Lin, J., & Lin, W. (2025). Specific Neural Mechanisms Underlying Humans' Processing of Information Related to Companion Animals: A Comparison with Domestic Animals and Objects. Animals, 15(21), 3162.
Rault, J. L., Bateson, M., Boissy, A., Forkman, B., Grinde, B., Gygax, L., et al. (2025). A consensus on the definition of positive animal welfare. Biology Letters, 21(1).
Reisch, L. M., Wegrzyn, M., Woermann, F. G., Bien, C. G., & Kissler, J. (2020). Negative content enhances stimulus-specific cerebral activity during free viewing of pictures, faces, and words. Human Brain Mapping, 41(15), 4332–4354.
Rellecke, J., Palazova, M., Sommer, W., & Schacht, A. (2011). On the automaticity of emotion processing in words and faces: Event-related brain potentials evidence from a superficial task. Brain and Cognition, 77(1), 23–32.
Vandenberghe, R., Price, C., Wise, R., Josephs, O., & Frackowiak, R. S. (1996). Functional anatomy of a common semantic system for words and pictures. Nature, 383(6597), 254–256.
Zhang, X., He, Y., Yang, S., & Wang, D. (2024a). Human Preferences for Dogs and Cats in China: The Current Situation and Influencing Factors of Watching Online Videos and Pet Ownership. Animals, 14, 3458.
Reviewer 2 Report
Comments and Suggestions for Authors
The manuscript presents three experiments that aim to examine the mechanisms by which exposure to pet-related video stimuli influences human processing of emotional facial expressions.
The work is clearly presented, employs an original paradigm, and has significant applied value for research on multispecies coexistence in everyday and mediated contexts, including situations without direct human–animal interaction, such as viewing prerecorded video material. The methodological procedures are appropriate for the hypotheses formulated by the authors, and the interpretation of the findings is coherent and well integrated with the existing literature.
My suggestions to the authors are grouped in the following three directions:
1). More clarification on animal welfare
I am recommending that the authors explicitly state throughout the manuscript that all pets depicted in the videos displayed indicators consistent with positive welfare, following the consensus definition proposed by Rault et al. (2025). Directly linking the description of the stimuli to this framework would strengthen the ethical justification of the study and clarify that the animals were experiencing predominantly positive states rather than merely the absence of distress.
2). The characteristics of the participants
In the participant description, the authors are encouraged to report whether participants were current or former pet owners and to include any available information on their attitudes toward companion animals. If such data were not collected, the absence of these variables and their potential influence on responses to pet-related stimuli should be explicitly acknowledged as limitations.
3). Ethical aspects of pet video recording
In the Discussion section, it would be useful to add a statement that the recording of companion animals (e.g. for mental health intervention purposes) should be conducted using non-invasive, welfare-oriented, and ethically sound procedures. This section should also clarify that generating such materials by pet owners ought not to prioritize human mental health benefits at the expense of animal wellbeing, but rather aim to promote positive outcomes for both humans and animals.
Author Response
Comment 1: More clarification on animal welfare
I am recommending that the authors explicitly state throughout the manuscript that all pets depicted in the videos displayed indicators consistent with positive welfare, following the consensus definition proposed by Rault et al. (2025). Directly linking the description of the stimuli to this framework would strengthen the ethical justification of the study and clarify that the animals were experiencing predominantly positive states rather than merely the absence of distress.
Response:
Thank you for your valuable suggestion. Firstly, regarding the emotional attributes of the videos, we have clearly stated in the Methods section (page 4, lines 141-142) that all pet videos used were evaluated for emotional valence by independent participants, and their ratings were significantly higher than those for neutral videos. This ensures the video content itself possesses positive emotional attributes.
Additionally, we have added a separate paragraph in the video materials section of the experimental methods (see page 4, lines 143-145, in red) to explain the ethical standards for animal video recording and selection, specifically as follows:
All animals shown in the videos met established indicators of positive animal welfare (e.g., expression of natural behaviors and positive mental states), following the consensus framework proposed by Rault et al. (2025), to ensure compliance with ethical research standards.
We believe this addition helps enhance the ethical transparency of the study and more accurately reflects the positive state of the animals in the videos. Thank you again for your valuable suggestion.
Comment 2: The characteristics of the participants
In the participant description, the authors are encouraged to report whether participants were current or former pet owners and to include any available information on their attitudes toward companion animals. If such data were not collected, the absence of these variables and their potential influence on responses to pet-related stimuli should be explicitly acknowledged as limitations.
Response:
The participants in this study were university students with normal or corrected-to-normal vision. Since participants' pet ownership status was not the primary focus of this study, such data were not collected, nor were participants' attitudes towards companion animals considered. As you pointed out, this variable could potentially influence responses to pet-related stimuli. Therefore, this limitation should be acknowledged in the discussion, and future research directions should be indicated. We have added the following content to the Discussion section:
In addition, because participants’ pet ownership status was not the primary focus of this study, data regarding this variable were neither collected nor analyzed. Future research could consider including participants’ pet ownership and attitudes toward companion animals as potential variables for further investigation. (see page 13, lines 544-548, in red)
We hope this addition makes the paper more rigorous and points the way for future research.
Comment 3: Ethical aspects of pet video recording
In the Discussion section, it would be useful to add a statement that the recording of companion animals (e.g. for mental health intervention purposes) should be conducted using non-invasive, welfare-oriented, and ethically sound procedures. This section should also clarify that generating such materials by pet owners ought not to prioritize human mental health benefits at the expense of animal wellbeing, but rather aim to promote positive outcomes for both humans and animals.
Response:
We have added a relevant statement at the end of the Discussion section, emphasizing that future pet video recording should follow non-invasive, animal welfare-oriented ethical principles and aim for the mutual well-being of humans and animals. The specific addition is as follows:
It is noteworthy that although the pet videos used in this study were sourced from publicly available online platforms, future applications of such content in formal mental health interventions or research should adhere strictly to non-invasive, animal welfare–oriented ethical guidelines. The production of videos should not pursue human psychological benefits as the sole objective; rather, it should also promote positive human–animal interactions and shared well-being, ensuring that animals remain in a natural, comfortable, and stress-free state throughout recording (Rault et al., 2025). Such practices are not only an ethical requirement for scientific research but also constitute a foundation for developing sustainable and responsible human–animal interactive media. (see page 13, lines 525-533, in red)
We believe this addition helps enhance the ethical integrity and social responsibility awareness of the research. Thank you again for your insightful suggestion.
References:
Kappenman, E. S., Farrens, J. L., Luck, S. J., & Proudfit, G. H. (2014). Behavioral and ERP measures of attentional bias to threat in the dot-probe task: poor reliability and lack of correlation with anxiety. Frontiers in Psychology, 5, 1368.
Keefer, L., Landau, M., & Sullivan, D. (2014). Non-human Support: Broadening the Scope of Attachment Theory. Social and Personality Psychology Compass, 8(9), 524–535.
Lin, W., Li, Z., Zhang, X., Gao, Y., & Lin, J. (2023). Electrophysiological evidence for the effectiveness of images versus text in warnings. Scientific Reports, 13, 1278.
Lin, W., Lin, J., Yang, Y., Liao, J., Chen, W., & Mo, L. (2022). The difference in the warning effect of different warning signs. International Journal of Occupational Safety and Ergonomics, 28(2), 890–900.
Liu, H., Zhou, X., Lin, J., & Lin, W. (2025). Specific Neural Mechanisms Underlying Humans' Processing of Information Related to Companion Animals: A Comparison with Domestic Animals and Objects. Animals, 15(21), 3162.
Rault, J. L., Bateson, M., Boissy, A., Forkman, B., Grinde, B., Gygax, L., et al. (2025). A consensus on the definition of positive animal welfare. Biology Letters, 21(1).
Reisch, L. M., Wegrzyn, M., Woermann, F. G., Bien, C. G., & Kissler, J. (2020). Negative content enhances stimulus-specific cerebral activity during free viewing of pictures, faces, and words. Human Brain Mapping, 41(15), 4332–4354.
Rellecke, J., Palazova, M., Sommer, W., & Schacht, A. (2011). On the automaticity of emotion processing in words and faces: Event-related brain potentials evidence from a superficial task. Brain and Cognition, 77(1), 23–32.
Vandenberghe, R., Price, C., Wise, R., Josephs, O., & Frackowiak, R. S. (1996). Functional anatomy of a common semantic system for words and pictures. Nature, 383(6597), 254–256.
Zhang, X., He, Y., Yang, S., & Wang, D. (2024a). Human Preferences for Dogs and Cats in China: The Current Situation and Influencing Factors of Watching Online Videos and Pet Ownership. Animals, 14, 3458.
Round 2
Reviewer 1 Report
Comments and Suggestions for Authors
I have no questions.

