2.3.1. Humanistic Art Therapy: Robot as Partner
In the current article, we argue that some approaches might not be appropriate for robots, due to the risk of harm if a robot presents a mistaken diagnosis. First, mistakes could be common in early robot prototypes; art therapy requires a high degree of understanding of humans, which is currently difficult to expect from a robot given the emerging nature of this field. Second, there seems to be a potentially dangerous power imbalance in the case of a “normal” therapist looking over the shoulder of an “abnormal” patient and judging, in that decisions based on some diagnosis could adversely affect a person. Some similar arguments have been made in human science: for example, it has been proposed that some approaches could be oppressive if there is some disagreement in values between therapist and patient [72, 73]. This line of thought was also echoed in the area of human–robot interaction, for therapy in general, by Ziemke and colleagues, who questioned the assumption that the therapist is an all-knowing expert who can deduce the truth [54], and by Tapus et al., who also advocated that therapy robots should be hands-off [74]. Furthermore, Kahn et al. stated that the important target is how to design a scenario in which people will interact with robots as partners in a joint creative enterprise [75]. Thus, we envision, from a humanistic perspective, that the first fundamental scenario to target for art therapy robots should be one in which the person is at the center of the interaction, and the robot does not diagnose and judge, but rather tries to play a supporting role.
Following this basic conceptualization that a robot can act as a partner in art rather than a judge, we propose a basic scenario, following the Five W strategy (why, who, what, where, when) to address some salient questions [76]. Since “why” has already been discussed, we turn first to the question of who should paint. Art therapy can be conducted with various numbers of people and robots. A benefit of interacting with a single person, rather than a group, is that we expect the person will be better able to perceive full attention from the robot, due to, e.g., the “Socratic bottleneck”, where in a group only one person can speak at once [77]. (If we assume the art therapy robot uses a familiar interface for communicating, humanoid or possibly animal-like, then, similar to humans or animals, it can show attention through cues such as gaze, body pose, speech, location, and motions (e.g., art-making). However, because humans and animals tend to have only one head, one body, and a few arms, and take turns speaking, such a robot cannot directly look at and listen to multiple people at once. It would have to look from one to another, listening to one person’s response then another’s, similar to a human, providing an impression of giving less than its full attention to each person. In other words, any time one person is speaking is a time when another person will appear to not be receiving full attention. This effect is known in pedagogy, and is a reason for sometimes breaking students into small groups, so that they can interact more actively [77].)
Group therapy can also result in a sharing of strongly emotional experiences which might otherwise not be possible, and improve relationships [
1]; however, this is a more complex case, introducing new dynamics, such as the relationships between people and how choices will be regulated based on who is present. It could also be possible to allocate multiple robots per person; for example, one robot could steady the arm of a person with Parkinson’s disease, while another robot could paint beside them, providing company. However, we believe this case is also more complex. Therefore, we suggest that the more basic dyadic case is useful for initial explorations.
For such a dyadic case, we propose that three basic cases can be described, in which the robot’s involvement in painting is 0% (only the person paints), somewhere in between 0% and 100%, such as 50% (both the person and robot paint), or 100% (only the robot paints). To determine which a person prefers, the robot can ask at the beginning of the interaction. In the first case, the person might wish to achieve independently without relying on others and feel ownership over the art. In the second case, the person might value the enjoyment from social interaction or expect a nicer result if co-creating. In the third case, the person might be incapable of physically participating, or merely prefer to passively observe. In the first and second cases, the robot can seek to infer a person’s emotions from the art they make, while, in the third case, the robot can seek to recognize how the person is feeling directly. In all cases, the robot can attempt to make some basic conversation: In the first and second cases, the robot can also ask the person about what they are painting, whereas, in the third case, the robot will simply comment on what it is painting. From the perspective of simplicity, the first case is arguably the simplest; however, the second and third cases we feel are most in line with our vision of the robot as a partner, and fundamental from the perspective of exploring how a robot can interact in an emotional and creative manner.
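The three cases above can be summarized as a small decision table. The following is a minimal sketch rather than an implementation from this work: the names `Involvement`, `SessionPlan`, `classify_involvement`, and `plan_session`, the string labels, and the representation of the robot's share as a number between 0.0 and 1.0 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Involvement(Enum):
    PERSON_ONLY = "person_only"  # robot paints 0%
    SHARED = "shared"            # robot paints between 0% and 100%
    ROBOT_ONLY = "robot_only"    # robot paints 100%


@dataclass(frozen=True)
class SessionPlan:
    emotion_source: str      # where the robot looks for emotional cues
    conversation_topic: str  # what the robot talks about


def classify_involvement(robot_share: float) -> Involvement:
    """Map the robot's share of the painting (0.0 to 1.0) to one of the three cases."""
    if not 0.0 <= robot_share <= 1.0:
        raise ValueError("robot_share must be between 0.0 and 1.0")
    if robot_share == 0.0:
        return Involvement.PERSON_ONLY
    if robot_share == 1.0:
        return Involvement.ROBOT_ONLY
    return Involvement.SHARED


def plan_session(robot_share: float) -> SessionPlan:
    """Choose emotion-sensing and conversation behavior for the given case."""
    mode = classify_involvement(robot_share)
    if mode is Involvement.ROBOT_ONLY:
        # Only the robot paints: sense the person directly, comment on its own art.
        return SessionPlan("direct_observation", "robot_painting")
    # The person paints (alone or with the robot): infer emotion from their art
    # and ask about what they are painting.
    return SessionPlan("artwork", "person_painting")
```

In such a sketch, the value of `robot_share` would come from the person's answer when the robot asks about their preference at the beginning of the interaction.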
Regarding what a person should paint, we suggest giving the person themselves the dignity to freely select what to paint. Where a more formal procedure is desired, objective tests could be adapted, such as the Diagnostic Drawing Series (DDS), which assesses colors, lines, and composition (e.g., placement and integration) by asking a person to make a picture, make a picture of a tree, and make a picture of how they feel using lines, shapes, and colors [78]. However, we note that this test could not be directly applied to the painting scenario considered here, as it is designed specifically for pastels; also, requirements on time, number of artworks to produce, and subject matter might not be desired in various cases: for example, if the artwork is also intended to be aesthetic or to be displayed somewhere, or if a person requires more time, or does not have the physical strength to produce many drawings. We suggest that a robot can in general seek to track such objective information over time to roughly gauge a person’s state and the effectiveness of sessions, even with freely chosen art. In addition, for individuals who cannot decide what to paint, a robot could suggest some topic; for example, self-portraiture is a tool used by art therapists as a way to promote self-reflection and self-acceptance [79].
Another question which should be addressed is where the robot should paint. If a robot draws on the same substrate or canvas, the interaction could be felt to be more social and intimate; that is, there could be a stronger effect of social facilitation [
80]. Furthermore, the robot could help the human to paint better, especially for individuals with restricted mobility, e.g., adding details which might require high dexterity, knowledge, or technical ability. Conversely, allowing a person to complete their own painting could result in an increased perception of accomplishment, and ownership, resulting from high perceived involvement, personalization, expression of territoriality, and control [
81]. (Feeling ownership can also enhance memory through the so-called self-referential effect [
82], which could be useful for dementia patients.) Such a case might also be simpler for initial investigations, as a robot does not need to track where a person is painting and avoid colliding with them. We argue that both cases can have benefits, so an art therapy robot should be capable of engaging in a range of art-making behaviors, from facilitation to completing a painting by itself. We also propose that the robot first ask the human how they would like to paint, and, if the same substrate is used, that the robot should allow the human as much as possible to play an important role in making the artwork.
In addition, a decision should be made on when art therapy should take place. Art therapy is tailored for the individual both in terms of number of sessions and structure of sessions, as follows [6]. Single sessions are possible, although therapy tends to last from several weeks to a year. Sessions can be approximately an hour each. A common structure for each session, in line with models such as the Creative Axis Model, is to have some warm-up activity, a main activity, and reflection at the end. Similarly, initial sessions can focus on goal-setting, identifying problems, and familiarization; middle sessions can involve working on central themes, then more challenging and complex themes, distancing from problems, investigating solutions, and answering questions; and final sessions can focus on review, next steps, and closure. In addition, art therapists can make responsive art before, during, or after sessions. We believe the most practical starting point for research is to focus on a single session first. If the robot does not paint during the same session, timing and memory become factors that should also be considered, which might be especially important for persons with dementia; therefore, we also suggest that initial work focuses on the basic case in which the robot makes art at the same time as the person, in close social proximity.
Within this basic scenario, summarized in Table 1, we next consider some guidelines about how to conduct the desired form of humanistic art therapy. Phillips recommended: (1) investigating the meaning of the art with the person; (2) accepting (and at times encouraging) the communication of strong emotions, including negative ones; (3) praising creativity and skill, even for negative depictions; and (4) suggesting alternatives for disturbing negative content to express feelings [
83]. Additionally, the importance of keeping track of the timing and progression in the artworks over time was mentioned, as well as the importance of understanding popular culture, clinical, and social context for seeking to find meaning in a person’s art, although the latter could be difficult for a robot. These guidelines can be followed both via verbal interaction and via the therapist’s own artwork, in a process which has been described as responsive art, or visual feedback. Phillips noted that often visual feedback was more successful than a verbal response in investigating meaning, in line with our idea that a robot can paint with a person as a companion.
Such prescriptions are in line with our idea of the robot acting as a companion, rather than a judge, but they do not clearly state how a robot can decide what to draw. For example, if a person is expressing a negative emotion such as sadness, what should a robot do? The literature presents some evidence supporting two potentially opposing premises: that a robot could try to match a human’s negative emotions, and that it could seek to distract with a positive emotional display. In human–robot interaction, Goetz et al. compared a robot which is always positive to a robot which matched its mood to the context, finding the latter was more liked [
84]; Tapus and colleagues also found that people preferred a robot to match its behaviors to a user’s personality [
85]. A benefit of such an approach could be that the person does not feel that they always have to be positive and suppress their emotions, which has been reported to be an ineffective emotion regulation strategy with some negative consequences [
86]. Furthermore, a tendency to like people with similar attitudes has been described [
87], implying that such convergence of emotion displays could engender liking. (We note that this concept of displaying similar emotions does not imply any suggestion that a robot’s appearance should or should not resemble a human’s, which is debated: According to Mori’s Uncanny Valley hypothesis, a near-human appearance with slight imperfections could potentially trigger negative impressions; conversely, it has been argued that negative impressions can be avoided using an attractive design as not all imperfections elicit the same responses [
88], and also that humans can quickly become accustomed and extend their expectations for appearance [
89].) In addition, humor could also be perceived in a robot behaving in a sad manner, much like seeing a pet’s concern for its owner; negative emotions can also be valuable [90], and Phillips’s second guideline above, regarding accepting others’ emotions, could also be interpreted as suggesting matching.
A caveat is that the studies above were not conducted in the context of art therapy and it was not determined if people felt better as a result. In art therapy, Drake and Winner found that distraction had a better effect on a person’s mood than venting (i.e., expressing positive, rather than negative emotions, which might help to avoid negative rumination) [
91]. We believe various positive effects could ensue from a robot’s positive emotional behavior: happiness could arise through emotional contagion; a person could feel safe if a robot never displays negativity; and the robot could seem to have its own goals and not be merely reactive. In addition, Phillips’s third and fourth guidelines could be interpreted as suggesting distraction, by praising and suggesting alternatives.
However, Drake and Winner’s study was not conducted with robots. Moreover, if a robot always acts in the same positive manner, ignoring the person’s emotions, it could appear boring [
92], or insincere [
93]. Positive behaviors such as laughter can be irritating when perceived to stem from schadenfreude [
94]. Moreover, if a robot adopts a purely positive stance about everything, and does not acknowledge any negative aspects, a person might feel the need to express the negative side themselves, in line with the concepts of the devil’s advocate and reverse psychology, which might result in undesired negative rumination. Furthermore, the robot could be interpreted as implying that, if a person is not positive like it, they must have a problem, which could produce negative feelings.
We argue that the results of these studies do not necessarily contradict one another: for example, a robot in an interaction can do both, sometimes matching, thereby showing empathy and gaining trust, and sometimes distracting, thereby helping the user to feel better. Alternatively, the robot could do both simultaneously, expressing a mixture of negative and positive emotions: for example, drawing a sad scene with a positive rainbow. Furthermore, there is not necessarily a conflict or disruption of contingency in responding positively to a negative emotional display. A study conducted in the context of analyzing conversations on Twitter suggested that the form in which positive emotions are shown is important, finding that sympathy, greetings, and recommendations in particular exerted a strong positive influence on others’ emotions [
95]. Conversely, worrying, teasing, and complaining often caused others to feel negative emotions. In terms of basic Ekman emotion categories, worrying and complaint can be associated with fear and anger, respectively (high arousal, low valence), whereas sympathy is displayed through sadness (low arousal, low valence) toward a person’s unfortunate situation, or affection (high valence) toward a person. Greetings and recommendations are typically happy (mid to high valence). Teasing is typically insincere, and can be insincere anger, or insincere joy. Additionally, the study also found that people usually expressed the same emotion or a positive emotion, in line with our idea that both matching and distraction are useful behaviors to employ. We note that there are also theories in human science such as communication accommodation theory (CAT) which could offer additional insight into when to match or distract (converge or diverge); however, this theory, in line with social identity theory (SIT), focuses on explaining human motivations for social approval, communication efficiency, and social identity, rather than on how a robot could engender positive affective changes [
96].
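The idea of combining matching and distraction can be illustrated with a minimal sketch. This is only one possible reading under stated assumptions, not a method from the cited studies: the valence scale from -1 to 1, the rule of matching for a fixed number of turns before distracting, the default of two turns, and the names `ResponseForm`, `choose_strategy`, and `helpful_forms` are all hypothetical. The table of response forms simply restates the associations described in the paragraph above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ResponseForm:
    name: str
    displayed_emotion: str
    positive_influence: bool  # as described for the Twitter study in the text


# Forms of response and the emotions the text associates with them.
RESPONSE_FORMS = (
    ResponseForm("sympathy", "sadness or affection", True),
    ResponseForm("greeting", "happiness", True),
    ResponseForm("recommendation", "happiness", True),
    ResponseForm("worrying", "fear", False),
    ResponseForm("complaining", "anger", False),
    ResponseForm("teasing", "insincere anger or joy", False),
)


def helpful_forms() -> List[str]:
    """Names of the response forms described as positively influencing others."""
    return [form.name for form in RESPONSE_FORMS if form.positive_influence]


def choose_strategy(person_valence: float, matched_turns: int, match_first: int = 2) -> str:
    """Return 'match' or 'distract' for the robot's next emotional display.

    person_valence: assumed scale from -1 (negative) to 1 (positive).
    matched_turns: how many times the robot has already matched a negative display.
    match_first: assumed number of matching turns before switching to distraction.
    """
    if person_valence >= 0.0:
        return "match"  # matching a positive display is itself a positive display
    if matched_turns < match_first:
        return "match"  # show empathy first to build trust
    return "distract"   # then offer a positive display
```

For a negative display, a matching turn could be realized as sympathy and a distracting turn as a recommendation, since both forms appear among the results of `helpful_forms()`; how many matching turns are appropriate is an open question that this sketch does not answer.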
We believe that a key concept underlying matching is empathy, which describes the capability to perceive, understand, and share another person’s emotions [
74,
97]. From this perspective, we believe that matching is not merely mimicry in showing similar emotions, but rather there is also a cognitive aspect involving perspective taking, which is required to deal with complex emotions involving mixed emotions and referents. More specifically, when matching, emotions to display can be influenced by the type and degree of emotions perceived; the robot’s personality; and the perceived importance for the robot to act based on the closeness of the relationship between robot and person, as well as the appraised context (e.g., for humans a bystander effect is observed in which helping behaviors are inhibited when many people are present) [
98]. Based on such theory, a rich computational model of empathy has been built, which we believe will enable matching behavior in art therapy robots [
99].
Some further insight can also be obtained by considering not just explicit guidelines for what a human therapist should do, but also the underlying processes embedded in art-making which allow people to experience positive effects. As noted in Section 1, in art therapy, some positive effects (such as improved self-awareness, self-image, relaxation, and social interactions) are facilitated via processes of self-exploration, self-fulfillment, catharsis, and perceiving belonging [
11]. Self-awareness is enhanced by projecting and exploring emotions and experiences, which might be easier to express with symbols than words (described as “refraction”, “dramatic distancing”, or being “once-removed”), and by allowing inference of a person’s current state and progression over time; enhanced self-awareness can facilitate healing, via reappraisal. Self-image is enhanced by fulfillment: via opportunities to achieve and to actively take an empowered role in improving one’s situation, promotion of cognitive ability (creative thinking), and positive distraction allowing a person to escape from negative rumination to a state of “remembered wellness” and regain an identity not defined by their problems. Relaxation is promoted by catharsis: the release of stress and tensions, engagement in a repetitive physical activity in which the person can freely choose when to start and end, and the subjective nature of aesthetics allowing for many different kinds of “good” result. Social interactions can be improved by feeling a sense of belonging, by being able to share with others, and by being included in a form of expression which is accessible to people of all ages and cultures. Thus, we propose that an art therapy robot can seek to promote self-exploration, self-fulfillment, catharsis, and perceiving belonging.
To promote self-exploration, the robot should sometimes ask a person about their art; in doing so, the person should not be directly interrogated, but rather their emotions can be explored indirectly through the art. The robot can also compile data on a person which can be given, if approved, to a caregiver such as a doctor to make inferences on the person’s state and progression. To promote self-fulfillment, the robot could leave the most important parts of the painting for the person to paint, providing opportunities to feel independence, control, purpose, and growth, while possibly scaffolding and adjusting the challenge to the person’s skill level. In addition, the robot can refer to the person as an artist or as creative or skillful, seek to positively distract the person sometimes, and engage the person in creative thinking, asking questions such as “What do you think this looks like?” or “What would you do next time you wanted to paint something like this?” Furthermore, self-acceptance could be facilitated by including positive personalized content; for example, a robot could paint fish for a person with dementia who used to enjoy fishing. To promote catharsis, the robot should not put time pressure on a person, but rather ensure the atmosphere is relaxing; the robot should take over the physical act of painting as little as possible, and should recognize when a person wants to start and end. To promote social interactions, the robot should suggest opportunities, such as showing the painting to others, or displaying a painting somewhere; the robot could furthermore suggest including others in paintings, mention others who have painted similar paintings, or seek to include interesting content in paintings which could lead to conversation.
2.3.2. General Interactions
We have considered the human science literature specific to art therapy to gain some initial ideas. Here, we enrich those ideas by considering general requirements to achieve good interactions with a robot, based on the idea that, although humans can do much automatically without conscious thought, robots need to be explicitly programmed. In other words, positive user experience (UX) is important to realize successful interactions with a robot, but does not automatically result when a system is built; rather, conscious design is required [
100]. Here, we discuss properties such as behavior modalities, and how to structure behavior to facilitate general well-being.
A fundamental question is which modalities a robot should use for input and output. The usefulness of speech for art therapy can be expected, as psychotherapy is sometimes referred to as “talk therapy”; verbal and vocal channels allow complex information to be conveyed [
101], in a highly salient fashion [
102], without requiring a person to look away from art-making and possibly lose concentration [
103]. Visual output is also useful, in that numerous streams of information can be shown continuously and simultaneously, including to people with difficulty hearing, and a human does not have to wait to hear complete messages from a robot, which could be difficult for users with limited attention. Moreover, tactile interfaces are fundamental, both for operating tools and machines, and for affectionate interactions. Since human art therapists typically utilize multiple modalities, we propose that robot art therapists will likewise ultimately benefit from the ability to engage in multimodal interaction.
We also consider the overall problem of facilitating a person’s perceived well-being. In general, a robot can seek to facilitate hedonic and eudaimonic aspects of well-being, i.e., short-term positive feelings, and aspects which should contribute to eventual good feelings over the long run. Short-term feelings can be easier to measure when assessing the usefulness of some intervention, but their relation to long-term happiness is not always clear, especially considering the phenomenon of “hedonic adaptation”, where people after a fortunate or unfortunate event tend to return to the same “set point” or degree of happiness [104]. This mechanism, which emerges from neurochemical desensitization, could be beneficial for a person’s ability to survive (e.g., helping people to avoid becoming oblivious to danger after fortunate events, and to recover after unfortunate ones), but it reduces the certainty of achieving positive long-term effects when focusing only on short-term design properties. Likewise, the impact of focusing only on eudaimonic factors can also be unclear; a person struggling hard to improve themselves might experience much suffering every day, without certainty that their goal will ever be reached. Therefore, we believe it is useful to design for both aspects, also in trying to facilitate good feelings toward the artist, the art, and the robot.
In our previous work, we have proposed some designs for facilitating hedonic well-being in an interaction with a robot, based on guidelines that a robot’s behavior should be rewarding, helpful, or inspiring (possibly combining function and playfulness, reactivity and proactivity), clear in regard to its intentions, and carefully executed [
105,
106]. In addition, some general criteria have been proposed to be important for a person’s eudaimonic well-being [
107], comprising self-acceptance, positive relations with others, autonomy, environmental mastery, purpose in life, and personal growth, but how to apply them for a robot to support a person’s well-being in a creative application is unclear. We note that these criteria have been embedded in a form of therapy called “well-being therapy”, which is intended to help with affective disorders, but this involves a scenario very different from the one we consider: a therapist assesses notes on perceived experiences to detect impairments and leads a process of cognitive restructuring [108]. Instead, we argue here that these dimensions of eudaimonic well-being can be related to the qualities positively affected by art therapy previously noted. Self-exploration allows for personal growth and perceiving purpose in life. Enhancing self-image relates to self-acceptance; likewise, being autonomous and controlling one’s environment leads to having a positive self-image. Furthermore, enhancing social interactions entails positive relations with others.
Aside from theoretical prescriptions, the measurement of bodily signals in neuroscience is also contributing new understanding. Korb has summarized some positive effects from feeling gratitude, labeling negative emotions, decision-making, touch, bright light, and exercise [109]. Feeling gratitude resulted in increased levels of dopamine and serotonin, associated with well-being. We note additionally that not only being thankful for positive experiences, but also being forgiving of negative experiences, has been linked with well-being [3]. Labeling negative emotions resulted in higher ventrolateral prefrontal cortex activation and reduced amygdala activity, which relates to reduced perceptions of worry and fear. Making decisions, involving actively selecting and planning actions, especially without striving for perfection, engaged the prefrontal cortex, helped overcome striatum activity related to negative habits, calmed the limbic system, increased dopamine, and led to a stress-reducing feeling of control. Touch resulted in increased levels of oxytocin, serotonin, and dopamine (neurotransmitters associated with well-being), as well as reduced perceptions of pain and of social exclusion, the latter of which was reported in an fMRI study to affect the brain similarly to physical pain. We also note that warmth is a typical component of human touch, and that physical warmth has been linked with perceived psychological warmth [110]. Bright light in the day and exercise also led to numerous positive effects such as boosted serotonin levels.
Based on this, we draw the following conclusions: If a person does not know what to paint, the robot can suggest painting something for which they feel grateful; for negative displays, a robot can ask them to consider forgiveness. A robot should ask a person to describe depicted emotions, although this does not mean a robot should not also make inferences itself based on the person’s art and/or direct signals. As suggested previously, the robot should allow the person to make various decisions. If possible, it would be an advantage for the robot to also be capable of engaging in simplified touch interactions (e.g., recognizing a hug or pat, and reacting appropriately); some previous studies in robotics have proposed such methods [
111,
112,
113,
114]. Likewise, an advanced system could act to select a painting environment which provides some sunlight and adequate warmth, and if possible should let the person move to get physical exercise (e.g., to fetch paints).
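The conversational conclusions above can be read as a small set of rules. The sketch below shows one way to encode them, under stated assumptions: the `Observation` fields, the prompt labels, and the function `suggest_prompts` are hypothetical names introduced for illustration, and the sketch deliberately leaves out touch, lighting, warmth, and exercise, which depend on hardware and environment.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Observation:
    has_topic: bool         # the person already knows what to paint
    negative_display: bool  # the art or direct signals suggest negative emotion
    emotion_labeled: bool   # the person has already described the depicted emotions


def suggest_prompts(obs: Observation) -> List[str]:
    """Return prompt labels the robot could use next, in a fixed order."""
    prompts = []
    if not obs.has_topic:
        # Gratitude: suggest painting something the person feels grateful for.
        prompts.append("suggest_gratitude_topic")
    if obs.negative_display:
        # Forgiveness: invite the person to consider it for negative displays.
        prompts.append("invite_forgiveness")
    if not obs.emotion_labeled:
        # Labeling: ask the person to describe the depicted emotions.
        prompts.append("ask_to_describe_emotions")
    # Decision-making: always leave choices to the person.
    prompts.append("offer_choice")
    return prompts
```

The labels would still need to be mapped to actual utterances or painted responses, and the robot could combine them with its own inferences from the person's art and direct signals.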
In designing interactions for well-being, another challenge will be how to assess the success of art therapy interactions. For this, a number of instruments have been described, including the Bradburn Affect Balance Scale, Fordyce Happiness Scale, and Satisfaction with Life Scale (SWLS) for well-being, as well as the Friedman Affect Scale and Positive and Negative Affect Schedule-X [3]. For working specifically with elderly people with dementia in art therapy, the following instruments have been proposed: Cornell Scale for Depression in Dementia (CSDD), the Multi Observational Scale for the Elderly (MOSES), the Mini-Mental State Exam (MMSE), the Rivermead Behavioural Memory Test (RBMT), Tests of Everyday Attention (TEA), Benton Fluency Task, and Bond-Lader Mood Scale [12]. Thus, many tools exist which can be applied to evaluate robot art therapy interactions.
In summary, we have examined both explicit and implicit guidelines for human art therapists, as well as general guidelines for achieving good interactions, to prescribe interactive guidelines for an art therapist robot.