Self-disclosure to a Robot: Only for Those Who Suffer the Most

Abstract: Social robots may become an innovative means to improve the well-being of individuals. Earlier research showed that people easily self-disclose to a social robot, even in cases where that was unintended by the designers. We report on an experiment of self-disclosure in a diary journal or to a social robot after negative mood induction. The off-the-shelf robot was complemented with our in-house developed AI chatbot and could talk about 'hot topics' after having been trained on thousands of entries from a complaint website. We found that people who felt strong negativity after being exposed to shocking video footage benefited the most from talking to our robot rather than writing down their feelings. For people less affected by the treatment, a confidential robot chat and writing a journal page did not differ significantly. We discuss emotion theory in relation to robotics and possibilities for an application in design (the emoji-enriched 'talking stress ball'). We also underline the importance of otherwise disregarded outliers in a data set that is of a therapeutic nature.


Introduction
Since the outbreak of the COVID-19 pandemic, there has been an upsurge of interest in social isolation, loneliness, and depression. People living alone, people with low socioeconomic status, and, quite unexpectedly, youngsters and students are at risk of loneliness (Bu, Steptoe, & Fancourt, 2020a [96]; Bu, Steptoe, & Fancourt, 2020b [97]). In the United States, lockdowns and social distancing measures were associated with increased levels of loneliness, while loneliness correlated highly with depression and suicidal ideation. Loneliness remained high even after distancing measures were relaxed (Killgore, Cloonan, Taylor, Miller, & Dailey, 2020 [98]). In the United Kingdom, a country severely impacted by the pandemic, people with COVID-19 were likely to develop psychiatric disorders and were lonelier, particularly women, adolescents, and young adults (Li & Wang, 2020 [99]). In Hong Kong, where our current study took place, COVID-19 even led to "alarming levels of psychiatric symptoms," with loneliness playing a disadvantageous role (Tso & Park, 2020 [100]). A number of interventions may help reduce the feeling of loneliness during social isolation, among which are mindfulness exercises, lessons on friendship, robot pets, and programs that facilitate making social contact (Williams, Townson, Kapur, Ferreira, Nunn, Galante, ... & Usher-Smith, 2021) [101].
Media exposure to negative information, such as war and disasters, may also lead to negative psychological outcomes, in particular feelings of anxiety (Hopwood & Schutte, 2017) [29]. In developed countries, depression, stress, and anxiety seem to increase among the youth as a result of intensive media use. According to Twenge, Joiner, Rogers, and Martin (2018) [86], for example, cases of (attempted) suicide among adolescents have gone up in the US since 2010, which would be linked to heavy media usage.
To improve individuals' mental well-being, a considerable number of studies focuses on the reduction of negative emotions. Emotions are a characteristic human phenomenon (Frijda, 2007) [22] and have a huge impact on individuals' lives. To self-disclose, talking with a psychiatrist as well as journal writing are methods widely adopted in psychotherapy. A variety of studies examined journal writing to reduce distress (Alford, Malouff, & Osland, 2005 [1]; Hemenover, 2003 [26]; Horneffer & Jamison, 2002 [27]; Ireland, Malouff, & Byrne, 2007 [30]). Journal writing has beneficial effects particularly for college students (Frattaroli, 2006) [19]. Writing as an intervention would transfer nonverbal memories into a verbal form that helps reorganize the memories, resulting in stress reduction (Pennebaker & Francis, 1996 [58]; Dalton & Glenwick, 2009 [13]).
The meta-analysis by Frisina, Borod, and Lepore (2004) [23] indeed found that writing improved health outcomes (d = .19). However, the effect was stronger for physical outcomes (d = .21) than for psychological outcomes (d = .07) (ibid.). In accordance, Pascoe (2016) [53] states that the effectiveness of writing to reduce the level of negative emotions is limited and needs further study. The most beneficial form of writing seems to include large numbers of positive emotion words and a moderate number of negative emotion words. Participants who used too many or too few negative emotion words benefited less from a writing intervention (Pascoe, 2016) [53], so that writing may be contra-indicated for individuals with, for instance, alexithymia, who are unable to express emotions (Lumley, 2004) [44]. Moreover, studies conducted by Pennebaker (1993) [56], Pennebaker and Beall (1986) [57], and Murray and Segal (1994) [50] point out that the physical presence of a therapist is what moderates the negative emotions rather than the writing itself.
The problem is that worldwide, mental-health workers, therapists, and psychiatrists are in short supply (World Health Organization, 2018) [94]. Luckily, however, and quite unexpectedly, since the release of the Rogerian chatbot-therapist ELIZA (Weizenbaum, 1966) [92], people nowadays do not merely share their secrets with fellow humans but also with their Apple Siri voice agent (e.g., Saffarizadeh, Boodraj, & Alashoor, 2017) [70] as well as with conversation and companion robots (e.g., Hoorn, Konijn, Germans, Burger, & Munneke, 2015) [28]. Perhaps, then, social robots may be an 'AI-in-Design' alternative for practicing emotion-disclosure interventions. Provided that they work well, of course.
In that respect, Wada, Shibata, Saito, Sakamoto, and Tanie (2005) [89] showed that social robots alleviate adverse emotions such as loneliness and stress. Measured on a geriatric-depression scale as well as a 'face scale,' the level of depression of participants significantly decreased after interaction with a social robot (ibid.). Jibb, Birnie, Nathan, Beran, Hum, Victor, and Stinson (2018) [31] found that talking to a robot reduced the level of distress among children who underwent cancer treatment. Dang and Tapus (2015) [14] found that social robots can assist humans during emotion-oriented coping, using a stress-eliciting game played together with a robot. Cabibihan, Javed, Ang, and Aljunied (2013) [6] present evidence that robots work well for autistic children and improve their adaptive behaviors (e.g., Robins [34]). Social robots also may increase the mental well-being of older adults through perceived emotional support and interaction (Pu, Moyle, Jones, & Todorovic, 2018) [62].
In psychotherapy as well, robots may meet the special needs of individuals with cognitive, physical, or social disabilities (Libin & Libin, 2004) [43]. The meta-analysis conducted by Costescu, Vanderborght, and David (2014) [10] indicates that overall, in robot-enhanced psychotherapy, robots have medium-sized significant effects on the improvement of behavior but not so much on cognitive and subjective aspects. Yet, individual studies sometimes do show that social robots improve performance on the subjective and cognitive level as well (e.g., Kidd).

In view of the generally positive therapeutic effects of robots in reducing stress and anxiety, our research question is whether social robots offer an alternative to traditional diary writing to 'let off some steam,' particularly in coming to terms with negatively valenced emotions after violent-media exposure. We expected that social robots would do better than writing down one's feelings because the robot more closely resembles talking to a person (i.e., a virtual therapist) and writing may not be everybody's preferred way of expression. Therefore, we propose (H1) that a social robot that invites self-disclosure from its user decreases the level of negative emotions more than pencil-and-paper approaches do. As a medium (H2), a social robot that invites self-disclosure will be regarded as more relevant to the user's goals and concerns than pencil-and-paper approaches.

Participants and Design
After obtaining approval from the institutional Ethical Review Board (filed under HSEARS20200204003), voluntary participants (N = 45; MAge = 24.9, SDAge = 3.29, 55.6% female, Chinese nationality) were randomly assigned to a between-subjects experiment of self-disclosure after negative-mood induction in a Robot (n = 24; 54.2% female) versus a Writing condition (n = 21; 57.1% female). All participants had university training at the master's level, except for four with doctorate degrees, three with bachelor's degrees, and one with a diploma degree. Informed consent was formally obtained from all participants. They did not receive any credits or monetary rewards.

Procedure
Participants were brought into a dimly lit and shielded-off section of the experimenter room and seated in front of a laptop. The experiment consisted of negative-mood induction and then self-disclosure with one of two media, after which participants filled out an online questionnaire in the Qualtrics survey and experiment environment (https://www.qualtrics.com/).
In the induction part, participants were shown a 10m6s video compilation of three documentaries about a serious earthquake that happened in Wenchuan, Sichuan, China, in 2008. Viewing negative media, including videos, images, and texts, effectively induces negative emotions with an increasing activation of the aversive system (Bolls, Potter, & Lang, 2001 [4]; Lang, Shin, & Lee, 2005 [37]). In accordance with Siedlecka and Denson's (2019) [78] review, which found video to be the most effective means of mood induction, we prepared a video on the Sichuan earthquake to make the contents culturally related to our participants and bring relevance and realness to the experience.
After the video and 30-40s of instruction, participants either talked to a robot about their experiences during the video or wrote them down on paper. Neither the robot nor the writing utensils were visible before self-disclosure. The self-disclosure session took about 10 minutes. The movements of the robot and the text input were handled by remote control (Wizard of Oz); the conversation itself was handled autonomously by our in-house developed AI chatbot (next section).
After self-disclosure ended, participants filled out a 30-item structured questionnaire (Appendix A), reporting on their assessment of the video clip and talking to the robot or writing the journal page. Appendix A shows the English translation of the Chinese version in the robot condition. Supplementary Materials 1 provides both questionnaire versions, for robot and writing, in Chinese and in English translation. Items on the questionnaire were presented in blocks with pseudo-random sequences of items within blocks, different for each participant. We ended the questionnaire inquiring about demographic information. Upon completion, participants were thanked for their participation and debriefed.

Video materials
The video materials for negative-mood induction were 10 minutes and 6 seconds long and were composed of excerpts from the following three Sichuan earthquake Internet documentaries: Internet video in memory of the Wenchuan Sichuan earthquake 10th anniversary (cut at 00:02-01:19 [71]).
The robot of our choice was a Robotis DARwIn Mini, a 3D-printable, programmable, and customizable miniature humanoid robot, 27 cm tall, with a Bluetooth connection to a laptop. The robot could stand up and move its arms while speaking through an AI chatbot. Technical details about the DARwIn Mini can be found in Supplementary Materials 1. The actions DARwIn could execute during the experiment, such as waving and raising its arms, were controlled remotely.

Self-disclosure chatbot
The DARwIn Mini cannot speak; therefore, we created our own chatbot, using the DARwIn Mini as the humanoid embodiment of our self-disclosure-inviting AI chatbot. Next, we give a concise account of the development of both the hardware and the software. Supplementary Materials 1 offers further specifications.
Hardware development. Two main components made up the hardware of our chatbot: the core board, a Raspberry Pi Zero (WH), and an extension board that was connected to the speaker. We engineered these two boards into an integrated circuit. Figure 1 offers an impression of the hardware prototype.

Software development. To create a chatbot adjacent to the DARwIn Mini, we set up a homepage for test subjects to assess the chatbot system (for details on the chatbot, please refer to Supplementary Materials 1) (www.roboticmeme.com). For website development, we used Semantic UI as the front-end framework (https://semantic-ui.com/) and Node.js as the back-end (https://nodejs.org/en/). We tentatively called the chatbot MEME and invited test subjects to share their secrets with MEME in our test environment. The chatbot on the website had speech recognition in Putonghua, Cantonese, and English, using a Turing robot API. To increase the traffic on our website, we also created an official WeChat account and used Python to run a server in Google Cloud (https://cloud.google.com/). On WeChat, we used Chill chat with the Xiaohuangji corpus for information retrieval.
Ours was a hierarchical chatting system consisting of three layers: (1) a rule-based layer that focused on certain specific chatting tasks (Eliza.py and regular expressions), (2) an information-retrieval layer that searched for the answer in a corpus built from Weibo conversations and conversations about movies, and (3) a generation layer that used the general-purpose seq2seq encoder as well as a Generative Adversarial Network, a machine-learning tool, to generate a response (https://github.com/google/seq2seq; https://en.wikipedia.org/wiki/Generative_adversarial_network). We adopted the k-means algorithm for sentence-vector clustering. After many iterations of improvement, the final model could effectively answer a question.
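The three-layer fallback can be sketched as follows. The rule table, retrieval corpus, and generator below are illustrative stand-ins, not the actual Eliza.py rules, Weibo corpus, or seq2seq/GAN model:

```python
import re
import difflib

# Layer 1: rule-based patterns (stand-ins for the Eliza.py rules).
RULES = [
    (re.compile(r"\bI feel (.+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bhello\b", re.I), "Hello, how are you today?"),
]

# Layer 2: retrieval corpus (stand-in for the Weibo/movie conversations).
CORPUS = {
    "I failed my exam": "That sounds hard. What happened?",
    "My friend ignores me": "Tell me more about your friend.",
}

def generate(utterance):
    # Layer 3: generative fallback (stand-in for the seq2seq/GAN model).
    return "I see. Please go on."

def reply(utterance, threshold=0.6):
    # Layer 1: fire the first matching rule.
    for pattern, template in RULES:
        m = pattern.search(utterance)
        if m:
            return template.format(*m.groups())
    # Layer 2: retrieve the closest corpus entry, if similar enough.
    match = difflib.get_close_matches(utterance, CORPUS, n=1, cutoff=threshold)
    if match:
        return CORPUS[match[0]]
    # Layer 3: fall back to the generative model.
    return generate(utterance)

print(reply("I feel lonely"))  # Why do you feel lonely?
```

Each layer only answers when the one above it cannot, so specific therapeutic rules take precedence over retrieved and generated small talk.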
For natural language understanding, we installed a Rasa stack and so made the conversation somewhat more contextualized (https://rasa.com/). For Rasa to estimate what a user means to say, we classified a number of conversational topics that had to do with negative experiences. To that end, we analyzed the contents of a complaint website and ran a spider program to collect the users' comments, after which we did data mining for hot topics.
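The hot-topic mining step can be illustrated along these lines. The sample comments, English whitespace tokenizer, and stop-list are toy stand-ins for the scraped Chinese comments and the tokenizer actually used:

```python
from collections import Counter
import re

# Illustrative comments; the real data were thousands of Chinese
# entries collected from a complaint website.
comments = [
    "my love life is a mess and work is stressful",
    "family trouble again, so much worry about the entry exam",
    "work work work, no love, only loneliness",
]

STOPWORDS = {"my", "is", "a", "and", "so", "much", "the",
             "no", "only", "about", "again"}

def hot_topics(texts, top_n=5):
    """Tokenize, drop stopwords, and rank tokens by frequency."""
    tokens = []
    for text in texts:
        # Crude tokenizer; the actual pipeline tokenized Chinese text.
        tokens += [t for t in re.findall(r"[a-z]+", text.lower())
                   if t not in STOPWORDS]
    return Counter(tokens).most_common(top_n)

print(hot_topics(comments))  # 'work' and 'love' rank highest
```

The high-frequency tokens that survive the stop-list are the 'hot topics' that seeded the Rasa topic classes.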
For training, we sampled a 2 years' record of almost 500 pages and nearly 10,000 comments. We then tokenized these utterances and identified the high-frequency items ('hot topics'). An impression of the results is depicted in Figure 2: People worried most about unrequited love, emotions, relationships, family, love, homosexual love, cheating, love crush, self, life, work, making love (sex), being disappointed in life, being the only one, feelings, loss, life, cheering up, marriage, trouble and worry, loneliness, depression, study, entry exams to university and college, secrets, and love relationships.

The complete set-up of the self-disclosure AI chatbot is shown in Figure 3. The sing, movie, poem, and weather options were not used in the actual experiment. For the experiment, we installed our chatbot system in a voice kit that stood behind the DARwIn Mini. We did not install voice-recognition software because of its inefficiency (i.e., slow and inaccurate). Instead, a confederate not visible to the participant typed in the participant's utterances. Information processing and replying to the participants was done autonomously by our AI. Figure 4 exhibits the interaction flow. Together, the DARwIn Mini standing in front of the voice kit carrying our self-disclosure AI chatbot made up the 'robot condition' in our experiment. Figure 5 shows the final set-up.

We constructed the conversation following guidelines in psychotherapy (i.e., Nystul, 2016) [52]. For example, open questions such as "How do you feel about that?" were asked to guide the participants' reflections on their experience. During the conversation, only minimal encouragement like "Yes, I see" was provided by the robot. The open questions that were coded into the chatbot were also posed to participants in the writing condition, during their instruction. In inviting self-disclosure, the robot basically followed social norms from social penetration theory (Altman & Taylor, 1973 [2]).
Based on Nomura and Kawakami (2011) [51], however, the robot did not share secrets with its user and did not (need to) apply reciprocity, although this is an important social rule in human interaction (cf. Psychopathology Committee of the Group for the Advancement of Psychiatry, 2001) [61]. Thus, the robot was not self-disclosing but invited self-disclosure by asking open questions (Hoorn, Konijn, Germans, Burger, & Munneke, 2015) [28].

Measures
For measurement, we worked from a dimensional model of valence and relevance (cf. Smith & Ellsworth, 1985) [81] rather than a categorical model, which classifies emotions by name ('sad' or 'happy'). Emotion words are fuzzy (Frijda, 1998) and the appraisal of an event may elicit a variety of emotions (Parkinson, 1995 [54]; Frijda, 2007 [22]). Different people interpret events differently, and so different emotions are generated by the same event. If negative emotions are presented only by name, then, consensus among the participants may be low. Because appraisal is a dynamic process (Frijda, 2007 [22]; Carrera & Oceja, 2007 [7]; Russell, 2017 [67]), the possibility exists that an individual has multiple emotions at a single event. It is difficult to list all possible negative emotions in a questionnaire by name without eventually measuring fatigue effects. Therefore, we assessed the core concept of valence (positive/pleasant vs. negative/unpleasant), which is a more fundamental process compared to aspects of valence that require associative and sometimes conceptual processing (Scherer, 2013) [75].
In self-report instruments such as the Positive and Negative Affect Schedule (PANAS), valence is conceived of as two unipolar dimensions (Watson, Clark, & Tellegen, 1988) [91]. Each affective direction would be mediated by an independent neural pathway (e.g., Diener, 1999) [15]. This is in contrast to earlier approaches that maintain a bipolar measurement (for a discussion, see Russell & Carroll, 1999a [68], 1999b [69]). Moreover, two unipolar scales (running from 0 to n and from 0 to p) can at one point be placed in a bipolar constellation (n to p), but from a bipolar measurement, one can never return to a unipolar conception. Therefore, we decided on two unipolar dimensions to measure affect and constructed a structured questionnaire with multiple items per measurement scale, featuring positive (indicative) items as well as negative (counter-indicative) items. This approach also remedied potential answering tendencies.
We used two versions of a structured questionnaire as appropriate to one of two conditions: Talking with the robot or journal writing on a piece of paper (Appendix A and Supplementary Materials 1). The questionnaire was constructed from emotion literature (e.g., Scherer, 2013 [75]; Frijda, 2007 [22]; Russell, 2003 [63]) and ran four measurement scales: Valence after the movie but before treatment (robot or writing), Valence after treatment, Relevance, and Novelty. Together with the Demographics, Novelty served as a control.
Items were Likert-type statements followed by a 6-point rating scale (1 = strongly disagree, 6 = strongly agree). One half of the items on each measurement scale consisted of indicative statements and the other half of counter-indications. Blocks of related items were offered in pseudo-random order, different for each participant. Items within blocks were also presented pseudo-randomly to each participant.
The measurement scale 'Valence before treatment' (ValB) consisted of four indicative items, for example, "I feel good" and of four counter-indicative items, for example, "I feel bad." We used the same items for measurement of Valence after talking to the robot or writing on paper but adjusted the wording to the situation. Thus, 'Valence after treatment' (ValA) also had four indicative and four counter-indicative items. Relevance of robot or writing to goals and concerns (i.e. personal emotion regulation) was measured with two indicative items (e.g., '… is useful') and two counter-indicative items (e.g., '… is worthless').
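Combining indicative and counter-indicative items into one scale score typically involves reverse-coding the counter-indicative items, which on a 6-point Likert range means recoding a response x as 7 - x. A minimal sketch; the item names and responses are hypothetical:

```python
def scale_mean(responses, indicative, counter_indicative, scale_max=6):
    """Mean scale score; counter-indicative items are reverse-coded
    as (scale_max + 1 - x) on the 1..scale_max Likert range."""
    scores = [responses[item] for item in indicative]
    scores += [scale_max + 1 - responses[item] for item in counter_indicative]
    return sum(scores) / len(scores)

# Hypothetical Valence-before responses (item names are illustrative).
answers = {"feel_good": 2, "feel_calm": 1, "feel_bad": 6, "feel_upset": 5}
m = scale_mean(answers, ["feel_good", "feel_calm"], ["feel_bad", "feel_upset"])
print(m)  # (2 + 1 + 1 + 2) / 4 = 1.5
```

A low score like this would indicate strongly negative valence, as expected right after the mood induction.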
To control for a possible confound of the robot as a novel means to regulate emotions, the Novelty scale was composed of three indicative items (e.g., '… is new') and three counter-indicative items (e.g., '… is commonplace'). Demographics included information about the participant's Gender, Age, Education level, and Country. At the end of the questionnaire, participants could leave their comments.
We then conducted reliability analyses on our measurement scales (for elaboration, see Supplementary Materials 1). For the variables of theoretical interest, all measurement scales, with all items included, achieved good to very good reliability in the first run (Cronbach's α ≥ .82). This held for the separate subscales of Valence (4 items each), for their combinations (ValB and ValA, 8 items each), and for Relevance (4 items). After repair, the control variable of Novelty (5 items) had Cronbach's α = .77.
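Cronbach's α follows the standard formula α = k/(k-1) · (1 - Σσ²ᵢ / σ²ₜₒₜₐₗ), with k items, per-item variances σ²ᵢ, and the variance of the summed scores σ²ₜₒₜₐₗ. A sketch with hypothetical item data:

```python
def cronbach_alpha(items):
    """items: one response list per item, aligned across participants.
    Uses population variance throughout."""
    k = len(items)
    n = len(items[0])

    def var(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Per-participant total scores across the k items.
    totals = [sum(item[p] for item in items) for p in range(n)]
    item_var_sum = sum(var(item) for item in items)
    return k / (k - 1) * (1 - item_var_sum / var(totals))

# Hypothetical 4-item scale, five participants (rows are items).
data = [
    [5, 4, 2, 5, 3],
    [5, 5, 1, 4, 3],
    [4, 4, 2, 5, 2],
    [5, 4, 1, 5, 3],
]
print(round(cronbach_alpha(data), 2))  # 0.96
```

Highly consistent items, as in this toy data set, yield an α well above the .82 floor reported for the Valence and Relevance scales.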
To test discriminant validity, we performed Principal Component Analysis with Varimax rotation on Valence-after (ValA), Relevance, and Novelty. Indicative items formed a positive-Valence subscale as the counter-indicative items clustered into a negative-Valence subscale. Items on the Relevance scale neatly fell in line as intended. Novelty showed some spread over both Valence and Relevance. However, because this was a control variable, we kept the scale intact and will observe in the Results section its tendency to coalesce with variables of theoretical interest.
We then calculated the means (M) across the items on each scale and performed an outlier analysis for Valence (before and after), Relevance, and Novelty. We found that participant 9 was an outlier on MValB and participant 39 on MValA. Participants 5 and 21 were outliers on MValAi. Participants 39, 27, 38, and 33 were outliers on MValAc (see Supplementary Materials 1). There were no outliers on MNov, MRel, MValBc, and MValBi. We will perform our effects analysis with and without those outliers.
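The exact outlier criterion is given in Supplementary Materials 1; one common choice consistent with the scatter-plot inspection reported below, Tukey's 1.5 IQR boxplot rule, can be sketched as:

```python
import statistics

def iqr_outliers(scores, factor=1.5):
    """Flag indices whose value lies beyond factor * IQR outside
    the quartiles (Tukey's boxplot rule)."""
    q1, _, q3 = statistics.quantiles(scores, n=4)
    iqr = q3 - q1
    lo, hi = q1 - factor * iqr, q3 + factor * iqr
    return [i for i, x in enumerate(scores) if x < lo or x > hi]

# Hypothetical mean scale scores for ten participants.
scores = [2.1, 2.4, 2.0, 2.3, 5.8, 2.2, 2.5, 2.1, 2.6, 2.2]
print(iqr_outliers(scores))  # [4]
```

Only the one participant far outside the bulk of the distribution is flagged; the analysis is then run once with and once without such cases.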

Demographics
We checked the country that participants came from, but only one participant (31) reported she was from Africa; the rest were from China. Inspection of the scatter plot, however, showed that participant 31 was not in the zone of outliers. Therefore, we treated this person as part of the same sample and did not handle her differently in the analysis.
Next, we checked whether Age correlated with the eight dependent variables (Valence-bipolar before and after, Positive and Negative Valence before and after, Relevance, and Novelty). We calculated Pearson bivariate correlations (two-tailed) and found no significant relations. Age did have a near-significant weak negative correlation with Positive-Valence-before (r = -.27, p = .08), indicating that with higher age, people tended to be less positive after viewing the earthquake video.
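A Pearson bivariate correlation of this kind can be computed directly from the product-moment formula; the Age and Valence values below are hypothetical:

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation coefficient."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical Age vs. Positive-Valence-before scores.
age = [22, 24, 25, 27, 29, 31]
pos_val = [3.1, 2.9, 3.0, 2.6, 2.4, 2.2]
print(round(pearson_r(age, pos_val), 2))  # -0.97
```

The toy values are deliberately exaggerated; the reported r = -.27 reflects a much weaker negative trend.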
Then we examined whether Gender was influential for the eight dependent variables. We ran a MANOVA (Pillai's Trace) to check the effect of Gender but found no significant multivariate effects (V = .11, F(7,37) = .68, p = .688). Yet, Gender did exert a small, barely significant univariate effect on the experience of Novelty (F(1,41) = 4.18, p = .047, ηp² = .09). Throughout, females experienced more Novelty (M = 4.03, SD = .83) than did males (M = 3.50, SD = .87). However, Novelty was a control variable in our experiment and not of theoretical interest. Therefore, we concluded that Gender did not have a significant effect on the variables theoretically related to our hypotheses.
Among all participants, there were four with doctorate degrees, three with bachelor's degrees, and one with a diploma degree. The rest all held master's degrees. We found that participant 39, who held a doctorate degree, was also one of the outliers on the scale means. Thus, we excluded this participant from the effect analysis of Educational background.
We put the seven participants with a degree other than master's in one group and randomly chose seven other participants (who were not outliers) with a master's degree for the other group. We performed an independent-samples t-test to check whether Education had an effect on the eight dependent variables related to our theoretical hypotheses. We ran this test five times, each time with a different set of master's participants, and found that in certain group comparisons, Education did have an effect on some of the theoretical variables (see Supplementary Materials 1). Therefore, we made two data sets, one with all 45 participants (24 in the robot group and 21 in the writing group) and one with 31 participants (17 in the robot group and 14 in the writing group), excluding the outliers and the participants with a non-master's educational background. We confronted our hypotheses with both of these sets.

Manipulation check: Emotional effects after negative-mood induction and after treatment
We wanted to check whether any emotion at all was provoked by the shocking video footage of the earthquake and whether the treatment (robot or writing) evoked any change in emotion at all. Or did everything remain at scale value 1 (no emotions reported)?
For N = 45, we ran a one-sample t-test (two-tailed) with 1 as the test value to see if any negative (or positive) emotions occurred after mood induction as well as after treatment. For Positive Valence after the earthquake clips, MValBi showed that t = 8. To check whether before-after effects of treatment actually occurred, we also ran paired-samples t-tests (two-tailed) in both data sets, N = 45 and n = 31. Note that these are not tests of our hypotheses but a mere inspection of whether anything happened at all.
For the difference between MValBc - MValAc with N = 45, t = 9.34, p < .00001. For the difference between MValBc - MValAc with n = 31, t = 9.42, p < .00001, so that we may conclude that participants became less negative after treatment (MValBc was significantly larger than MValAc).
For the difference between MValBi - MValAi with N = 45, t = -7.16, p < .00001. For the difference between MValBi - MValAi with n = 31, t = -7.24, p < .00001, so that we may conclude that participants became more positive after treatment. Whether through a robot or through writing, the treatment had an effect in the expected direction.
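A paired-samples t statistic of this kind is the mean of the per-participant difference scores divided by its standard error; a sketch with hypothetical before-after Valence means (significance would still be looked up against t with df degrees of freedom):

```python
import math
import statistics

def paired_t(before, after):
    """Paired-samples t statistic on the difference scores."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)   # sample SD, n - 1 denominator
    t = mean_d / (sd_d / math.sqrt(n))
    return t, n - 1                  # t value and degrees of freedom

# Hypothetical negative-Valence means before and after treatment.
val_before = [4.8, 5.0, 4.2, 4.6, 5.1, 4.4, 4.9, 4.7]
val_after  = [2.1, 2.6, 2.0, 2.4, 2.8, 2.2, 2.5, 2.3]

t, df = paired_t(val_before, val_after)
print(round(t, 2), df)
```

A large positive t here mirrors the reported pattern: negative valence dropped substantially from before to after treatment.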

Effect of Media (robot vs. writing) on Valence and Relevance
To analyze the changes in Valence after talking to a robot or writing a diary page, we computed three mean difference scores: for bipolar Valence, ∆Val = MValA - MValB; for Positive Valence, ∆ValP = MValAi - MValBi; and for Negative Valence, ∆ValN = MValAc - MValBc. Table 1 shows ∆Val, ∆ValP, ∆ValN, MRel, and MNov for the two conditions (robot vs. writing). The top half of Table 1 shows the averages for the entire sample (N = 45), the bottom half those with suspected cases excluded (n = 31).

Effects on bipolar Valence and Relevance
Next, we performed a General Linear Model (GLM) Multivariate analysis of Media (2: robot vs. writing) on ∆Val and MRel (grand mean scores) with MNov as a covariate. We did this for N = 45 and n = 31 separately. For an extensive report, see Supplementary Materials 1.
With Novelty excluded from the analysis, the pattern of multivariate effects was similar to before (V = .09, F(2,42) = 2.09, p = .136, ηp² = .09). Officially, we should stop our scrutiny here. Yet, when we looked into the main effect of Media on ∆Val, we saw that without Novelty, the effect became significant (F(1,43) = 4.23, p = .046, ηp² = .09). As a trend, beneath the surface, it seemed that talking to a robot (M∆Val = 1.76, SD = 1.25) had a more positive impact on Valence (bipolar conception) than did writing (M∆Val = 1.10, SD = .81) after negative mood induction. For the reduced data set, the corresponding effect was only marginally significant (F = 3.14, p = .087, ηp² = .10). Without the covariates, the pattern of these results did not change.
In all, the only (marginally) significant effect we could establish for the theoretical variables was with N = 45, without MNov as a covariate, in a bipolar conception of Valence (∆Val). We wondered, then, how this could be the case given that the mood induction and the treatment had been so successful according to the t-tests (Section 3.2).

Effect of Media on Valence and Relevance for those who felt most negative
In clinical trials, it is good practice to contrast a control group with a treatment group and measure the effects of a drug or medical device (e.g., Friedman, Furberg, & DeMets, 2010, p. 2) [20]. We attempted the same but now with depressed people (after mood induction), using two different media (robot vs. pen-and-paper). However, another approach in clinical research is to try a drug on healthy volunteers versus patient volunteers, and this is what we had so far failed to recognize: Part of the participants may not have been affected much by the mood induction and therefore did not need treatment or comfort from our robot or from journal writing. After all, they were not distressed; they did feel the emotion but were 'immune to the affliction,' so the treatment was superfluous: a sub-sample ceiling effect.
Therefore, we performed a median split for both data sets N = 45 and n = 31 on the variable MValBc (Negative Valence before treatment). In the data set with N = 45, with the outliers included, 23 participants were on the side of feeling most negative. Twelve of them were in the robot condition and 11 in the writing condition.
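A median split of this kind can be sketched as follows; the scores are hypothetical, and how ties at the median are assigned is a design choice:

```python
import statistics

def median_split(values):
    """Indices of participants at or above the median (the 'most
    negative' half) vs. below it; ties at the median go to the
    high group in this sketch."""
    med = statistics.median(values)
    high = [i for i, v in enumerate(values) if v >= med]
    low = [i for i, v in enumerate(values) if v < med]
    return high, low

# Hypothetical MValBc (Negative Valence before treatment) scores.
mvalbc = [4.9, 3.1, 5.2, 2.8, 4.4, 3.6, 5.0, 2.9]
high, low = median_split(mvalbc)
print(high, low)  # [0, 2, 4, 6] [1, 3, 5, 7]
```

Only the high group, the participants most affected by the mood induction, then enters the contrast of robot versus writing.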
For n = 31, without the outliers, 17 participants felt most negative, 10 of whom talked to a robot after viewing the footage while 7 did the writing. Table 2 provides the means and SDs of ∆Val, ∆ValP, ∆ValN, MRel, and MNov for talking to a robot or writing a journal page for those participants who felt very negative after watching the earthquake video. MNov also showed significant multivariate effects (V = .47, F(2,19) = 8.42, p = .002, ηp² = .47) but on MRel alone (F(1,20) = 16.85, p = .001, ηp² = .46), not on ∆Val (F < 1, p = .459). Novelty made things more relevant.
With emotional outliers included, then, talking to a robot (M∆Val = 2.74, SD = .83) had a more positive impact on Valence (bipolar conception) than did writing (M∆Val = 1.56, SD = .84) after negative mood induction. The level of novelty of the medium made things more relevant, but neither medium was significantly more relevant than the other, and Novelty did not significantly influence the valence result. After removing MNov as a covariate, we found that no significant multivariate effects were present any more (V = .30). It seemed, then, with the outliers dismissed from the data (and less power due to fewer subjects), that the effects tended to disappear. It is for those who suffer the most that robots are most helpful. The novelty aspect of talking to a robot may make the medium more relevant to personal goals and concerns but is not (or for the less affected only marginally) influential for feeling more positive after a chat with a robot about negative experiences. A subsequent repeated-measures test showed that robots exerted higher levels of undifferentiated Valence (non-unipolar) than writing on paper. We repeated that test with Novelty as the covariate, but MNov did not significantly contribute to any of the effects.
Then we did the same for the data set of n = 17. We ran two GLM Repeated measures of Media (2 conditions) on the within-subjects factor (∆ValP vs. ∆ValN) with MRel and MNov as separate covariates. Multivariate tests showed that no significant effects were obtained for ∆ValP vs. ∆ValN (V = .008, F(1,14) = .11, p = .749, ηp² = .008). Here as well, the height of positive and negative valence did not differ. The interaction of (∆ValP vs. ∆ValN) with Media also was not significant. Thus, with outliers excluded, robots still exerted higher levels of undifferentiated Valence (non-unipolar) than writing on paper. With Novelty included, these positive effects became somewhat more pronounced for the less affected.

Exploratory analyses
In the preceding analyses, we saw that Novelty mainly affected Relevance, indicating that a medium becomes more relevant the newer it is to those who are emotionally affected, but not overly so. In Section 3.1, we found in turn that Novelty was affected by Gender. Therefore, we explored the Media × Gender effects on Novelty with Univariate ANOVA for both data sets (N = 45 and n = 31). The research question was whether robots were newer to women than to men, or vice versa.
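The shape of such an exploration can be sketched as follows. The cell means and sizes below are hypothetical, and scipy's one-way F-test across the four cells plus a simple cell-mean contrast stand in for the Univariate ANOVA reported above:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

# Hypothetical perceived-novelty (MNov) scores in a 2 x 2 layout:
# Media (robot / writing) x Gender (female / male), N = 45 in total.
cells = {
    ("robot", "female"):   rng.normal(5.6, 0.9, 12),
    ("robot", "male"):     rng.normal(4.9, 0.9, 11),
    ("writing", "female"): rng.normal(3.4, 0.9, 11),
    ("writing", "male"):   rng.normal(2.9, 0.9, 11),
}

# Omnibus test across the four cells (stand-in for the univariate ANOVA).
F, p = stats.f_oneway(*cells.values())

# Interaction contrast: is the female-male novelty gap larger for robots
# than for writing?
gap_robot = cells[("robot", "female")].mean() - cells[("robot", "male")].mean()
gap_write = cells[("writing", "female")].mean() - cells[("writing", "male")].mean()
print(f"omnibus F={F:.2f}, p={p:.4f}; interaction contrast={gap_robot - gap_write:.2f}")
```

An interaction contrast near zero would match the study's finding that women's higher novelty experience was not specific to the robot condition.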

Discussion and Conclusions
Our manipulation was successful: the video significantly induced a strong negative mood. Our treatment also was successful: we could demonstrate a significant improvement of positive affect and a reduction of negative affect after treatment.
We assumed that after negative-mood induction, (H1) a social robot that invites self-disclosure would lower the level of negative emotions more than writing a journal page does. Indeed, our self-disclosure AI chatbot, in unison with the DARwIn Mini embodiment, made viewers of video recordings of the 2008 Wenchuan (Sichuan) earthquake in China significantly more positive. This was particularly so for people who were most negatively affected by the video. For those less affected, writing a diary page also sufficed.
In our study, valence should be conceived of as a bipolar dimension. Significant and reasonably strong main effects of robots exerting more positive results than writing were established with bipolar valence, particularly for those with high negativity. Even when we analyzed valence as a within-factor with two levels measured as separate unipolar scales, the significant effects of media occurred across positive and negative valence, not for these measures separately. Novelty of the medium (either robot or writing) did not affect the effects on bipolar valence, or only occasionally for the less affected.
Then a note on analysis. If we had followed conventional statistical practice, we would have eliminated the outliers from our data set and would have found no differences between writing and robots in alleviating stress and anxiety. In reporting such a null effect, looking at normal distributions only, we would have missed the upshot that those who are most in need of mental support should not be deprived of a treatment that is more effective than traditional text writing: something that comes closer to a therapist, a social robot that can relieve the shortage of caregivers in the mental care sector.
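To make this methodological point concrete, here is a minimal sketch of the kind of screening rule involved (Tukey's IQR fences; the scores below are invented for illustration). Extreme responders are exactly the values such a rule flags, and dropping them pulls the group mean back toward the rest of the sample, which is how a subgroup benefit can vanish into a null result:

```python
import numpy as np

rng = np.random.default_rng(1)

def tukey_fences(x, k=1.5):
    """Boolean mask of values inside Tukey's fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return (x >= q1 - k * iqr) & (x <= q3 + k * iqr)

# Hypothetical valence-change scores in the robot condition: most
# participants improve modestly, two "emotional outliers" improve a lot.
robot = np.concatenate([rng.normal(1.6, 0.6, 12), [5.5, 6.0]])

mask = tukey_fences(robot)
# Conventional screening drops precisely the strongest responders,
# shrinking the robot condition's apparent advantage.
print(f"full mean: {robot.mean():.2f}, screened mean: {robot[mask].mean():.2f}")
```

The screened mean is necessarily closer to the bulk of the distribution, so any effect carried by the strongest responders is attenuated or lost.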
We also hypothesized (H2) that a social robot that invites self-disclosure is more relevant to goals, needs, and concerns than writing on paper. This was not the case. Although we measured the highest grand mean averages for relevance, whether tested for those high or low on emotional negativity, or for men or women, relevance did not differ for any of the fixed factors tested and did not significantly contribute to the effects on valence. Only in unison with novelty did relevance take effect. The novelty aspect of talking to a robot or writing on paper apparently made the medium more relevant to personal goals and concerns. As an aside, women experienced more novelty in the presented medium than men did, but this was not specific to the robot or the writing condition.
New technologies such as social robots bring various opportunities for discovering new methods to improve an individual's well-being, and suggest that such technologies may alleviate the pressure on current healthcare, such as care for older adults, depressed youth, and groups with special needs (Broadbent, 2017 [5]). Our study focused on social robots helping individuals to improve their mental well-being through self-disclosure. The results suggest that individuals with a relatively high level of negative emotions benefit the most from robots.
Our results are not consistent with Slavin-Spenny, Cohen, Oberleitner, and Lumley (2011) [79], who found that four conditions of disclosure (writing, private spoken disclosure, talking to a passive listener, and talking to an active facilitator) had about the same effect in reducing negative emotions, because after the disclosure session the negative emotions remained. Our results also run counter to Murray and Segal (1994) [50], who found that the two procedures (talking and writing) were almost identical in reducing negative affect and in producing adaptive changes in cognition and self-esteem. However, our results are not at odds with Murray, Lamnin, and Carver (1989) [49], who compared writing and talking with a psychotherapist and found that after writing, no increase in positive emotions occurred, whereas after talking with the therapist, positive emotions increased. Maybe the answer lies in a change of focus: talking to a (virtual) therapist does not so much decrease negativity as compensate for it by increasing positive affect.
Not reducing negativity, however, would go against studies by Murray and Segal (1994) [50], Epstein, Sloan, and Marx (2005) [16], Sloan, Marx, Epstein, and Lexington (2007), and Perez, Penate, Bethencourt, and Fumero (2017) [80], which all show that emotional disclosure interventions are effective in reducing the level of negative emotions. Perhaps the decrease in negativity takes longer than the immediate joy of encountering a (virtual) human. The length of the emotional disclosure sessions in the said studies was considerably longer than our 10 minutes. Frattaroli (2006) [19] concluded from a meta-analysis that such sessions usually last for days or weeks.
A limitation of our current study is that participants took the questionnaire only once, after talking with the robot or after writing, rather than after the video as well as after … [46], for example, by filling out a questionnaire. Perhaps next time we should combine self-reports with physiological reactions and behaviors for triangulation purposes (Erevelles, 1998 [17]; Lang, 1993 [38]; Frijda, 1986 [21]). Moreover, … [88] report beneficial effects of robot-enhanced therapy.
With such overwhelming evidence for robot-supported mental well-being, we felt we should set out to put this knowledge into design practice. As it stands, we are in the process of developing our own robot MEME, a talking stress ball that embodies our self-disclosure AI chatbot. MEME may replace throwing a brick, launching a gas grenade, or suicidal ideation. We hope MEME may help people who feel depressed during social isolation or calm down those who seek violence to settle their disputes in Hong Kong. Figure 7 shows the development steps so far. MEME is round and covered in silicone rubber for a soft touch, look, and feel. It is portable, pocket-sized, and easy to carry around. The cover can be adapted to the user's taste. Interaction with MEME takes place through emoji (Figure 8). Many studies in Computer-Mediated Communication assert the importance of emoji in non-verbal interactions to represent a person's affectionate or depressed feelings (e.g., Crystal, 2006 [12]; Rezabek & Cochenour, 1998 [64]; Wolf, 2000 [93]; Li & Yang, 2018 [42]). MEME can be found at www.roboticmeme.com. The logo was designed according to the directions of Walsh, Winterich, and Mittal (2010) [90].
To conclude, the video footage of the 2008 Sichuan earthquake aroused negative emotions, which were mitigated by self-disclosure to a robot or by writing a diary page. The choice of medium was indifferent to most participants; both means worked for them.
For those who felt extremely bad after the shocking video, however, the medium did make a difference. Those high on negative valence were significantly more positive after talking to a social robot.
Valence in this study should be conceived of as a bipolar scale (i.e., more positive is less negative). Relevance had little to do with these effects and was most susceptible to the novelty of the medium: the newer, the more personally relevant the medium seemed. The experience of novelty had little effect on valence and was higher for robots than for writing. Although females experienced more novelty throughout, this had nothing to do with robots as such. Robots seem good candidates to aid people with stress and anxiety problems. We took a shot at such opportunities by designing our own MEME stress ball with an emoji-enriched, self-disclosure-inviting AI chatbot.
Noteworthy, however, is that the positive effects we observed hold for the robot as a whole. We should not attribute the positive effects on emotional valence to particular design features, such as specific parts of the embodiment or the quality of the chatbot. As one of the participants astutely commented: "The robot answered my questions in weird ways sometimes, and repeated some questions. I think the unexpected movement of the robot was the best part of the experiment. It affectively changed my mood. Not so much the conversation itself."
Let this be a reminder to us robot researchers, AI developers, and designers: The robot made funny moves and that cheered this participant up. Not the conversation about difficult things. Perhaps in the future, in concert with talking stress balls, we should create paper and pens that make sudden funny moves as well.
Supplementary Materials: Technical Report S1: Self-disclosure to a Robot or on Paper.
Institutional Review Board Statement: The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Human Subjects Ethics Sub-committee of the university, filed under HSEARS20200204003.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Data can be made available by the corresponding author.
Acknowledgments: This study was a Chinese, Korean, and Dutch collaboration of students in the School of Design and the Dept. of Computing. Zachary Tan is kindly acknowledged for assembling the DARwIn Mini as part of his primary school project on robotics.
Conflicts of Interest: The authors declare that they have no conflict of interest.

Appendix A
Structured questionnaire for self-disclosure to a robot (translated from Chinese into English). Online Resource 1 provides the questionnaire versions for the robot as well as for writing, in Chinese and English.
Dear Sir/Madam, Thank you for taking the time to participate in our experiment. We would like to ask you to answer a few questions. Answering these questions will take only a few minutes.
You have the right to withdraw at any point during the study, for any reason, and without any prejudice. If you would like to contact the Principal Investigator in the study to discuss this research, please e-mail <name> via <name>@connect.polyu.hk.
By clicking the button below, you acknowledge that your participation in the study is voluntary, you are 18 years of age, and that you are aware that you may choose to terminate your participation in the study at any time and for any reason. The data provided by the participants of the study will be processed and published anonymously in the results sections of the paper.
This study is supervised by The Hong Kong Polytechnic University.
Thank you for your participation.
With kind regards, Team Social Robot MEME

I agree to participate in this study
I do not agree to participate in this study