The Power of Gaze in Music. Leonard Bernstein’s Conducting Eyes

The paper argues for the importance and richness of gaze communication in orchestra and choir conducting, and presents three studies on this issue. First, an interview with five choir and orchestra conductors reveals that they are not fully aware of the potential of gaze to convey indications in music performance. A conductor who was fully conscious of the importance of gaze communication, however, was Leonard Bernstein, who conducted a performance of Haydn’s Symphony No. 88 using only his face and gaze. Therefore, a fragment of this performance is analyzed in an observational study, in which a qualitative analysis singles out the gaze items exploited by Bernstein and their corresponding meanings. Finally, a perception study is presented in which three of these items are submitted to expert, amateur, and non-expert participants. The results show that while the signal for “start” is fairly well recognized, the other two, “pay attention” and “crescendo and accelerando”, are more difficult to interpret. Furthermore, significant differences in gaze item recognition emerge among participants: experts not only recognize the items more often, but also benefit from audio-visual over video-only presentation, while non-experts do not take advantage of audio in their recognition.

Since antiquity, rhetoricians such as Aristotle, Cicero, and Quintilian have acknowledged the relevance of gaze in the orator's communication [12]. In modern psychological research, the seminal work of [13] reviewed the functions of gaze in humans and other animals; Ekman [14] pointed out the expressive import of the eyebrows in conveying surprise and other emotions, but also anticipated their syntactic use as interrogative markers, which [15] later described as particularly precise and systematic in the sign languages of the deaf. Further uses of gaze have been illustrated, ranging from conveying the performative of communicative acts [16,17], through regulating turn-taking and requesting and giving backchannel, thus managing synchronization with speech and the negotiation of participant roles in interaction [18,19], to fulfilling meta-discursive functions such as skipping a topic [20] and rhetorical functions such as emphasizing something or being ironic about it [21]. Moreover, the eyes are not only exploited for expressive or discursive functions: they can also bear referential information, for instance, by pointing at objects or people in the surrounding context, or by producing iconic signals that convey physical and metaphorical properties such as small or big size, or subtle and difficult concepts [10]. Gaze direction, however, which is so often the focus of gaze studies, is not the only parameter that imprints meaning on gaze communication. More generally, gaze can be seen as a lexicon: a set of systematic correspondences between meanings (i.e., various types of information about the world and about the Sender's internal states) and signals (i.e., physical aspects of gaze).
In writing down such a lexicon, [10], besides distinguishing the above meanings of gaze (syntactic, referential, discursive, and conversational functions), also proposes an analysis of gaze signals in terms of a set of physical parameters (gaze direction, eye position in the sclera, eye humidity and reddening, eyelid and eyebrow position and movements), each of which can take one of several values in any given gaze item (e.g., eyes upward vs. downward in the sclera; half-open vs. wide open vs. closed eyelids; raised vs. frowning eyebrows). This accounts for the high flexibility of gaze in producing communicative signals and conveying an articulated lexicon of meanings in everyday communication, much richer than the one captured by tracking people's gaze direction alone.
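To illustrate why a parameter-based description yields a richer repertoire than gaze direction alone, the following minimal Python sketch enumerates the combinations obtainable from just the three example parameters and values listed above. The parameter names are our own hypothetical labels, and the remaining parameters (gaze direction, humidity, reddening, movements) are deliberately left out; the count is only the number of distinct physical configurations, not a claim that each one is a meaningful lexicon item.

```python
from itertools import product
from typing import Dict, List, Tuple

# Hypothetical labels for three of the parameters named in the text,
# each restricted to the example values the text mentions.
GAZE_PARAMETERS: Dict[str, Tuple[str, ...]] = {
    "eye_position_in_sclera": ("upward", "downward"),
    "eyelids": ("half-open", "wide open", "closed"),
    "eyebrows": ("raised", "frowning"),
}


def candidate_signals(parameters: Dict[str, Tuple[str, ...]]) -> List[Dict[str, str]]:
    """Return every combination of one value per parameter as a dict."""
    names = list(parameters)
    return [dict(zip(names, values)) for values in product(*parameters.values())]


signals = candidate_signals(GAZE_PARAMETERS)
print(len(signals))  # 2 * 3 * 2 = 12
print(signals[0])
```

Adding any further parameter multiplies the count by its number of values, which is the sense in which a handful of parameters can support a large set of distinguishable signals.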

Body and Gaze in Music Performance
The vast majority of works on the body communication of musicians or conductors investigate the use of gestures or whole-body movements. Over the last 30 years, these works have investigated the functions of movements in performance [22,23], sometimes focusing on the meanings of performers' movements [24][25][26] and on how they influence the perception and subjective experience of music [1][2][3][6,27], for instance, by assessing the effects of the performer's gaze direction and other cues (nodding, self-touch, stance width, step size, resolute impression) in the first minute after stage entrance on the audience's first impression [28]. Other works focus on the reciprocal body communication between co-performers, such as the progressive coordination, synchronization, and time lag reduction achieved through gestures and glances in piano duos [4,29,30] and through head and gaze direction in quartets [31]. Concerning communication in ensembles, [32], through automated motion detection, shows how the head and gaze direction and the ancillary head movements of a group of players spontaneously change when they change the way they interact with the rest of the orchestra and when they see the conductor. The study in [33], using mobile eye-tracking, finds high variability in the amount of gazing at the partner in clarinet, violin, and piano trios.
The study in [34] stresses the importance of gaze, along with sound, body movement, facial expression, and breath, in giving and seeking information during musical interaction, and distinguishes mutual gaze (simply looking at the other's body) from eye contact (looking into each other's eyes). [35] finds that the players in a duo watch each other more after rehearsal than before, and that gazes are longer during temporally unstable passages than during regularly timed ones.
Overall, these results might indicate that a collaborative and monitoring gaze can be established only once a shared interpretation of the piece has been reached. The study in [36] finds that the onset and duration of the looking behaviors of improvising instrumentalists do not co-vary with musical structure or with the social context of performance; the underlying timing mechanisms support social interaction processes, facilitating interpersonal communication. In an experiment where subjects only saw or only heard the other band members, [37] observed glances at the conductor 28% of the time, with glances lasting under one second. The study in [38] also shows that pianists use nonverbal auditory and visual cues during duet performances, while in [39], which investigated several aspects of social interaction in piano duos, including dominance relationships, conflict management, and nonverbal communication, pianists highlight the importance of expressive body movements and positive facial gestures, but above all of eye contact, which gradually increases from rehearsal to performance and is essential for coordinating the musical content.
The study in [40], besides describing peculiar gestures, collects the musicians' descriptions of various types of eye contact among the first violinist, the second violinist, and the cellist, and argues that such "conversation with the eyes" also conveys the special social relation among performers.
On the other side of the stage, [31] found that, for both non-musicians and musicians, gaze information helps to perceive musical expressivity. The study in [41] used eye-tracking to analyze the audience's gaze during a multipart performance by a female duo, finding that the melody part attracted significantly more visual attention than the accompaniment; moreover, joint attention phenomena emerged as the singers shifted their gaze toward their co-performer, and the melody or accompaniment role strongly influenced the total duration of the audience's gazes. Comparing the use of acoustic cues in guitar pieces to convey emotions, [42] found that performers were successful at communicating emotions to listeners, that their cue utilization was well matched to that of listeners, and that it was more consistent across different melodies than across performers.

The Conductor's Body
The social relationships in small ensembles and orchestras, and their mechanisms of leadership and group interaction, have been analyzed from perspectives ranging from phenomenological [43] to computational [44], all stressing the central role of the conductor, to the point that changing the conductor brings about a change in entrainment and, hence, lower ratings of the performance [44].
Concerning the conductor's communication, gestures are the most frequently studied cue in the literature on conducting [22,45,46], while in the field of gesture studies, [47][48][49] analyze iconicity and metaphor in gestures. As to other modalities, [50] finds that eye contact and facial expressions of approval and disapproval lead to more favorable opinions of more expressive conductors, irrespective of an actually better performance. The study in [51], investigating the use of facial expressions in conjunction with conducting gestures and their interpretation by instrumental performers, found that only the performers' experience level, and not the conductor's concomitant facial expression, affected their ability to interpret 53 different conducting gestures. By contrast, using an occlusion technique, [52] highlighted the importance of "facial affective behavior for expressive conducting". In this experiment, 127 participants, both musically trained and untrained, watched randomly presented video excerpts without sound, in which they could see only the face, only the arms, or the whole body in simulated peripheral vision: the "only arms" condition was judged as conveying the most information, but the interpretation of excerpts showing the face was the most similar to that of the complete video sequences with sound, taken as a reference. Some studies ask conductors to self-assess their own leadership potential or other conducting abilities [53], thus investigating their awareness of their own use of communicative cues and techniques.
When it comes to the specific use of gaze, more research is presently underway, also exploiting eye-tracking [54]; nonetheless, eye gaze is mainly investigated in the audience, in performers, and between co-performers [35], and rarely as a specific tool of the conductor's communication. Admittedly, some works address the relevance that conductors attribute to various cues, including gaze: for instance, some consider eye contact so important that they recommend marking in the score the points at which to gaze at the musicians [55]. However, in all these cases, the gaze parameter taken into account is almost exclusively gaze direction. What should be (but is not) generally acknowledged in the literature is that the range of communicative potentialities, and of the corresponding physical patterns of gaze actions by conductors, is much wider than bare glances. It is not simply the very fact of looking at performers that is meaningful: the conductor can resort to a whole repertoire of different ways of using the eyes, each consisting of a particular pattern of gaze actions and conveying a specific meaning to performers.

Body Lexicons of Musicians and Conductors
Some works on multimodal communication in musical performance show that the body behaviors of both musicians and conductors form proper lexicons, that is, sets of signals systematically corresponding to given meanings concerning specific aspects of music. In pianists, [26] found that the movements of the head, trunk, and face sometimes have only a motoric function, merely helping to perform the technical gesture, as when a circular movement of the fingers on the piano is accompanied by a circular movement of the head. In other cases, body movements do not communicate anything to others but simply express the performer's internal state: e.g., closed eyes express concentration, and raised inner eyebrows express sadness. As found in both pianists and conductors [26,56], four kinds of emotions are expressed during a performance: (1) Process Emotions, those felt while playing or conducting (tension, fear of making mistakes, excitement, or flow); (2) Outcome Emotions, due to the appreciation of how the performance is going (shame or satisfaction, disgust or ecstasy); (3) Meaning Oriented Emotions, enacted by musicians or conductors in order to impress them on the music being performed (e.g., sadness, joy, triumph, poignancy); (4) Movement Oriented Emotions, whose expression implies (and helps to perform) a particular movement or manner of movement of the technical gesture. For example, both pianists and conductors typically frown when playing or asking for a "forte", because frowning is an expression of anger, and displaying (hence, somehow feeling) anger mobilizes the energy necessary to play loud. Finally, body movements may have a fully communicative function in music performance, conveying specific meanings to others; this occasionally happens in players, but is always the case for conductors.
Therefore, various studies have attempted to uncover the lexicons of the conductors' gestures and gaze: first, qualitative analyses of corpora of conducting hypothesize correspondences between specific gestures or gaze items and the meanings conveyed by the conductor; then, perception studies test the hypotheses put forward about some of these gesture or gaze items. After singling out the gestures for intensity, i.e., those providing indications such as forte, piano, crescendo, and diminuendo [56][57][58], a lexicon of gaze in conducting was proposed [59,60].

The Lexicon of the Conductor's Gaze
This work is part of a more general project aimed at uncovering the repertoire of the conductor's body communication. The project adopts a model that defines communication as any event in which a Sender (S) has the goal that an Addressee (A) acquire a new belief (B) and, to this end, produces a communicative signal (s), a perceivable stimulus linked to B through a Communication System (CS), i.e., a system of rules setting correspondences between signals and meanings, represented in both the Sender's and the Addressee's minds. The goal of communicating may be a conscious one, as when a conductor closes his eyes to signal that he is concentrating before starting (hence, that he is about to start); but it may also be a goal that S is not aware of, as when his sweating expresses his anxiety about the performance. According to the modality in which they are produced (words, prosody, gestures, facial expressions, gaze, posture), signals make up different Communication Systems, which may be of two different types. A "creative" system is a set of rules of resemblance used to create new iconic signals by imitating places, shapes, or actions: e.g., curved arms widening, imitating a swelling body, make an iconic gesture asking for "crescendo". A "codified" system, that is, a "lexicon", is a list of rules of correspondence in which a specific meaning corresponds to each signal in the Senders' and the Addressees' minds: e.g., the index finger held vertically against the mouth is a gesture asking for "piano"; frowning eyebrows is a gaze item asking for "forte".
Positing a lexicon of conducting gaze rests on the idea that if a signal used by a conductor did not correspond to the same meaning in both his/her mind and the musicians' minds, they simply could not understand each other, and the performance would suffer. Our hypothesis is therefore that conductors and musicians share a lexicon of gaze in which each gaze item corresponds to a specific meaning; our previous works [59,60] and the present one first try to derive the items of this lexicon, and then test whether they are in fact shared across performers and whether they are also comprehensible to non-musicians.
In [60], in a corpus of conducting of an amateur choir, 17 gaze items were singled out and described in terms of their parameters (e.g., gaze direction, eyelid and eyebrow positions and movements); literal and indirect meanings were attributed to each of them, and they were finally clustered into types according to the functions they fulfill in musical performance (Table 1). While some uses of gaze are non-communicative (e.g., line 3: the conductor looks around to check for the musicians' attention), some have an interactional function, such as requesting attention or providing feedback (line 7: he opens his eyes wide to signal reproach for a mistake), and others have a technical function, providing indications for the various parameters of music: raising the eyebrows to ask for a "piano" (line 11) or squinting the eyes to ask for a "sforzato" (line 15) (indications of intensity), or suddenly gazing at a section of the orchestra to give the start (line 1). Finally, an emotional/attitude category includes items that exploit gaze expressions of Meaning Oriented or Movement Oriented Emotions, aimed either at conveying expressive indications or at suggesting the right attitude in playing or singing: e.g., raising the inner parts of the eyebrows to ask for a sad sound (line 13), or raising the eyebrows while retracting the head into the shoulders to ask for accuracy (line 12) [59,60]. Table 1 is adapted from [59].
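A codified system in the sense described above can be pictured as a lookup table from signals to meanings, grouped by function. The Python sketch below is only an illustration, not the representation used in [59,60]: it encodes the Table 1 items paraphrased in the paragraph above, with signal descriptions, field names, and the helper functions `interpret` and `by_function` being our own hypothetical choices. Signals absent from the table return no meaning, mirroring the idea that uncodified signals are not part of the shared lexicon.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class LexiconEntry:
    signal: str    # physical description of the gaze item (paraphrased)
    meaning: str   # meaning attributed to it
    function: str  # functional type: "interactional", "technical", "emotional/attitude"


# Items paraphrased from the text; Table 1 line numbers in comments.
GAZE_LEXICON: List[LexiconEntry] = [
    LexiconEntry("suddenly gaze at a section", "start", "technical"),            # line 1
    LexiconEntry("open eyes wide", "reproach for a mistake", "interactional"),  # line 7
    LexiconEntry("raise eyebrows", "piano", "technical"),                       # line 11
    LexiconEntry("raise eyebrows, retract head into shoulders", "accuracy",
                 "emotional/attitude"),                                         # line 12
    LexiconEntry("raise inner parts of eyebrows", "sad sound",
                 "emotional/attitude"),                                         # line 13
    LexiconEntry("squint eyes", "sforzato", "technical"),                       # line 15
]

_BY_SIGNAL: Dict[str, LexiconEntry] = {e.signal: e for e in GAZE_LEXICON}


def interpret(signal: str) -> Optional[str]:
    """Return the shared meaning of a signal, or None if it is not in the lexicon."""
    entry = _BY_SIGNAL.get(signal)
    return entry.meaning if entry else None


def by_function(entries: Iterable[LexiconEntry]) -> Dict[str, List[str]]:
    """Group meanings by the function they fulfill in performance."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        groups[entry.function].append(entry.meaning)
    return dict(groups)


print(interpret("squint eyes"))  # sforzato
print(by_function(GAZE_LEXICON)["technical"])
```

A real lexicon would also need to handle polysemy (the same signal taking different meanings in different contexts), which a flat one-to-one lookup like this deliberately ignores.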
In the following, we present three studies aimed at deepening our knowledge of the use of communicative gaze in conducting: first, an interview with five conductors to assess their awareness of the communicative importance of gaze during performance; second, a qualitative observational study of a concert by Leonard Bernstein conducted only by gaze and facial expression, aimed at singling out some of his communicative gaze items; third, a perception study testing the comprehension of some of these items by professional experts, amateurs, and non-experts.

Study 1. The Conductors' Awareness of Gaze Communication in Music Performance
As we have seen, some of the above studies [53,55] test the conductors' awareness of the cues, techniques, and strategies they exploit in conducting. In this work, before presenting two studies on the specific role of gaze in conducting (Sections 5 and 6), we want to test the conductors' level of awareness of their eye communication. What do conductors think about their own use of gaze while conducting? Do they monitor their own use of the eyes? And what importance do they attribute to it as a communication tool in performance?
Here we present a study aimed at testing the degree to which conductors are aware of the importance of gaze communication during choir or orchestra conducting.

Method
To investigate this issue, we conducted an in-depth interview with five professional conductors (all of whom asked to remain anonymous), in order to assess their use of gaze in conducting and their awareness of it. The first two interviews were aimed at establishing a control sample before focusing on some more specific questions in the subsequent three.
The nine questions of the interview aimed to explore the participants' previous thoughts about gaze communication, but also to test their awareness of it in their conducting style and their inclination to give more value to such a neglected tool as gaze (Table 2).
Table 2. An interview on the use of communicative gaze by conductors.
The first two questions of the interview, (1) What aspects of music do you prefer to emphasize in your style of conducting? (intensity and dynamics, variations in time or rhythm, expressive elements, attention request or feedback, etc.) and (2) Do you always prioritize the same elements regardless of the repertoire?, aimed at understanding the conductor's style and at singling out the elements he or she preferred to convey while conducting: for example, it would not be pertinent to extract data about rhythm if the conductor him/herself says s/he usually prefers to convey dynamics.
Among the aspects of music mentioned before (intensity and dynamics, variations in time or rhythm, expressive elements, request for attention or feedback, etc.), which ones, in your opinion, are better suited to be conveyed through gaze? (6) From 1 to 10, how do you rate the importance of gaze while conducting? These questions tested the five conductors' awareness of gaze, taking into account both their experience and their preferences.
Question 4 highlights how consciously the conductor uses gaze to communicate with the performers, while Question 5 investigates his/her general perception of gaze as a tool that can potentially be used, even if s/he might not use it. Question 6 focuses instead on how much importance gaze has for the conductor, relative to other communication channels, on a scale from 1 to 10.
(7) If you had to choose only one option, what main function would you attribute to your gaze during orchestra/choir conducting? (8) What kind of input do you intend to convey through your gaze when you are conducting? Questions (7) and (8) try to assess whether the conductors have conscious communicative intentions behind their own use of gaze.
(9) How much of the intentions you communicate with your eyes do you think musicians/singers receive?
This last question is the only one in the list that is not about what conductors themselves think or feel about gaze; rather, it aims at understanding whether conductors can judge the potential, and the result, of this communicative tool.
Every interview was conducted face to face. Although the interview was fairly structured, some questions had to be reiterated when the answer was not pertinent, mainly in order to overcome some language barriers. The interviews were audio-recorded and transcribed verbatim. The excerpts quoted below are translated from Italian.

Participants
The questions of the interview aimed at exploring the participants' previous thoughts. We chose five professional conductors: two from Italy, one from Lithuania, one from Argentina, and one from Bolivia. We aimed at a perspective as international as possible, since internationality is crucial to highlight to what extent gaze signals are culturally or linguistically determined. Moreover, to test whether awareness of gaze communication in conducting cuts across different conductors, we chose subjects of different gender and age (one female and four males, from 30 to 70 years old) and with different repertoires (two mainly conducting orchestras and three mainly choirs), ranging from classical to pop-modern music.

Results
From the first questions of the in-depth interviews, two opposite trends emerge regarding conducting style: the tendency to emphasize the elements that convey emotions (preference for dynamics, phrasing, expressive elements) and, conversely, the tendency to emphasize precision elements (starts and caesuras, time, rhythm).
"If I have to choose the moments in which I interact more with the orchestra, I would undoubtedly say when dynamics and phrasing are involved." (Conductor 1, Male)
The relationship between conducting style and repertoire varies: three conductors claim that, while certainly taking into account the variables due to repertoire, all in all they tend to highlight the same elements, while the other two stress that repertoire is fundamental in shaping their conducting.
A significant element that deserves further study is that the conductors who claim to be tied to routine are those more connected to choral work. A hypothesis to account for this lies in the type of work a conductor carries out with a choir, which is often continuous (if not almost exclusive), or at least more assiduous than the one the conductor usually has with an orchestra (which is usually more tied to conventional signals).
"Precise starts, precise closures, well-timed, expressive sound. When you are working with a choir for a while, you also already know what weaknesses they could have and try to prevent them." (Conductor 2, Male)
As to the functions for which they use gaze signals, all the interviewed conductors seem to agree: their answers mainly include control and feedback and the signaling of upcoming changes (both expressive and rhythmic). Interestingly, although they claim that in their own conducting gaze mainly serves control, requests for attention, and feedback, they believe that, in general, gaze can also effectively convey expressive elements and dynamics, even if some conductors specify that they find it unlikely that one can convey signals through gaze alone, and consider integrated communication more effective.
"Definitely the expressive elements and dynamics. Dynamics, however, together with the movements, because the eyes cannot give precise signs of when the phrasing should begin or end."
They all agree, however, that indications of time or rhythm cannot be transmitted through gaze alone; they all believe that the only thing gaze can signal is an imminent change (and even this signal would lack the information needed for performance).
"For time I prefer using hands, big movements. […] there are pieces in which the choir sings without the conductor, in ancient music for example, but it is always one element of the choir who makes a sign to make others start, it is impossible for the choir to do it alone". (Conductor 3, Male)
"Only through gaze? I don't really see how, you can't, you know, blink in time ... it would be pure madness". (Conductor 4, Male)
Such ambiguity about the importance of gaze is also reflected in the ratings (on a scale from 1 to 10) elicited by Question 6: the interviewed conductors rate it 5.50, 6, 7, 7.75, and 9.50, ranging from skepticism to enthusiasm.
Finally, regardless of the intentions conductors claim to express through their eyes, all of them agree that gaze is a clear signal for musicians, because these are professionals and practice a lot before rehearsals.
"If only the gaze I don't know [how much they understand], together [with hands] surely they understand". (Conductor 5, Female)
"As I said there was practice, every musician already knows what the conductor wants in that particular passage". (Conductor 3, Male)
To sum up, the answers of the interviewed conductors concerning the importance of gaze and their actual use of gaze signals do not reveal a thorough acknowledgment of the communicative import of this kind of signal in conducting. They claim that they use gaze mostly to ask for attention and give feedback and, as a secondary function, to convey information about emotions to performers; but by and large, they consider gaze signals mostly ineffective for conveying indications related to time or rhythm, and in general they believe that gaze is not autonomous as a conducting signal, but must be combined with gesture to convey meaningful indications.
Judging from these results, conductors on average do not seem to have a very high level of awareness of their own gaze communication and of its relevance in conducting; or at least, they do not seem to attach such concrete meanings to it, and believe instead that gaze is mostly useful to trigger attention in general.
Nevertheless, although this attitude towards gaze appears quite common in our interviews, some conductors are known to be deeply aware of the importance, richness, and usefulness of this communication tool. Leonard Bernstein is one of them. This is best witnessed by a performance of Haydn's Symphony No. 88 in G Major by Leonard Bernstein with the Wiener Philharmoniker in 1989 (the whole performance, lasting 29'51", can be viewed at youtube.com/watch?v=AV_ZE4zcl3I; the fragment analyzed runs from 25'15" to 28'52"), in which, during the encore after the performance, he conducts with his hands down or behind his back, using only his face and gaze.

Study 2. Bernstein's Conducting Gaze-A Qualitative Analysis
Leonard Bernstein is one of the most complex and fascinating characters of the late twentieth-century musical scene: a great orchestra conductor, an inspired composer, an effective pedagogue, and a controversial public figure. Beyond his prolific output as composer and performer, what makes him the perfect subject for our research purposes is his tendency, especially in the last years of his career, to favor an intimate conducting style: at times he conducted mainly with his gaze and face, and the Haydn performance above is a clear example of this. Bernstein is thus an ideal case for demonstrating that gaze in conducting can be credited with the status of a lexicon.

Research Questions and Hypotheses
In this second study, we investigated Bernstein's use of gaze as a conducting tool, asking whether, and which, relevant indications for music performance are conveyed by Bernstein's eyes, and whether his gaze items are understood by expert musicians and laypeople. To answer the first of these questions, we carried out a qualitative analysis of his use of gaze during conducting.
Our hypothesis was that the types of gaze already found in previous analyses of other conductors [59,60] are also valid for Bernstein.

Materials and Method
To verify this, we carried out a qualitative analysis of a 3'37" fragment of the above performance of Haydn's Symphony No. 88 in G Major, conducted by Leonard Bernstein. This is a standard symphony in four movements, written for flute, two oboes, two bassoons, two horns, two trumpets, timpani, basso continuo (harpsichord), and strings. We chose to analyze the Finale (Allegro con spirito) of this symphony for several reasons: first, Haydn is an eighteenth-century composer, writing when the notation system was already sufficiently codified and specific terms for dynamics and tempo were shared, so the score provides a solid basis for analyzing the conducting performance; second, the Finale contains all the aspects of music mentioned by the conductors in our interviews, such as dynamics, time, and rhythm; third, thanks to the many tight shots and close-ups of Bernstein in the video, his gaze behavior is quite clear and can be analyzed better than in most orchestral performances.
The fragment was analyzed by two independent raters using a simplified version of the annotation scheme proposed in [59,60] (Table 3). The analysis was carried out on muted video, to avoid being influenced by the corresponding sound, by describing each gaze behavior considered communicative and attributing a meaning to it; the ambiguous annotations were later discussed between the two raters and also checked against the corresponding audio. Column 1 contains the time of the gaze item analyzed, column 2 its description, column 3 its meaning, and column 4 its function for musical performance. In column 3, we write the literal meaning and sometimes, after an arrow, a possible indirect meaning that can be inferred from the literal one; in column 4, we write the functions of the literal and, where present, the indirect meaning in column 3, respectively. Thus, a single item may have two functions, one fulfilled by the literal meaning and one by the indirect meaning.
For example, at time 0.23 in our version of the video (column 1), Bernstein raises his eyebrows (column 2), a polysemous gaze item whose core meaning, common to all its uses [10], is "pay attention"; in this context, it means "play lighter" (i.e., pay attention not to be too heavy in your movements) (column 3), with the function of conveying an indication of intensity (column 4).
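The four-column scheme maps naturally onto a small record type, which can help keep two raters' annotations in a consistent format before they are compared. The Python sketch below is our own illustration, not the annotation tool used in [59,60]; the class and field names are hypothetical. It encodes the 0.23 example above, recording only the single function the text states for it.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class GazeAnnotation:
    time: str                        # column 1: time in the video
    description: str                 # column 2: physical description of the item
    literal_meaning: str             # column 3, before the arrow
    indirect_meaning: Optional[str]  # column 3, after the arrow (may be absent)
    functions: Tuple[str, ...]       # column 4: function(s) for the performance

    def meaning_cell(self) -> str:
        """Render column 3 as 'literal → indirect', or just the literal meaning."""
        if self.indirect_meaning is None:
            return self.literal_meaning
        return f"{self.literal_meaning} → {self.indirect_meaning}"

    def row(self) -> Tuple[str, str, str, str]:
        """Return the annotation as the four columns of the table."""
        return (self.time, self.description, self.meaning_cell(), "; ".join(self.functions))


# The worked example from the text.
example = GazeAnnotation(
    time="0.23",
    description="raises his eyebrows",
    literal_meaning="pay attention",
    indirect_meaning="play lighter",
    functions=("intensity",),
)
print(example.row())
```

Keeping the literal and indirect meanings in separate fields, rather than in one free-text cell, makes it straightforward to count how often an indirect meaning is present when tallying functions.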

Results
The annotation revealed that Bernstein's gaze behaviors are similar to those performed by other conductors [59,60]. As with them, some signals have interactional functions (e.g., calling for attention, providing feedback) while others have technical functions (start, intensity, tempo, expression); and the emotion displays performed by gaze, again like those found in previous analyses of other conductors, can express either the outcome of the ongoing performance (Outcome Emotion Expression), thus sometimes fulfilling a feedback function, or the emotion to be stamped on the music played (Meaning Oriented Emotion Expression). All in all, the fragment analyzed contains 56 gaze items, and the functions of their direct and indirect meanings are distributed as in Table 4.
As in previous studies on intensity gestures [56][57][58], our method for discovering the conductor's lexicon follows a two-step approach. First, we consider the communicative signals investigated, whether gestures or gaze items, only from the point of view of their production, simply attributing to the Sender the intention of communicating a given meaning by using that particular signal; this was the aim of Study 2. The second step is to check, through a perception study, whether that signal is actually interpreted as bearing that very same meaning by the Receivers; this is carried out in Study 3. In other words, in Study 2 we only formulate hypotheses about the meaning of each gaze signal in the fragment; these hypothesized meanings are then tested in Study 3.

Study 3. How Comprehensible is Bernstein's Lexicon of Gaze?
After deriving some of Bernstein's conducting gaze items through qualitative analysis, we wanted to test how comprehensible they are and how widely they are shared among musicians and laypeople; to this end, we ran a perception study.

Research Questions
Our research questions were the following. RQ1: How comprehensible are Bernstein's gaze items? RQ2: Is their comprehensibility linked to the viewer's musical expertise? RQ3: Are they equally comprehensible with or without the corresponding sound, i.e., how is the interpretation of gaze samples affected by their being presented in audio-visual or video-only mode? RQ4: Do the most expert participants take greater advantage of the audio in recognizing the gaze items?

Experimental Design
To answer these questions, we designed a perception study with a mixed design and three independent variables: 1. the meaning of the gaze items and 2. the presentation mode (video-only/audio-visual), both varied within subjects, and 3. the participant's musical expertise (Expert/Amateur/Non-expert), varied between subjects. The dependent variable was the meaning attributed by the participant to each gaze item, coded as expected or unexpected with respect to the meaning resulting from our previous semantic analysis.

Materials
According to the conductors interviewed in Study 1, the aspects of music performance most typically conveyed by gaze are the signals for "start", "keep the time", "pay attention", "crescendo/diminuendo", and "accelerando/rallentando". Based on these results and on our analysis of Bernstein's performance, we chose three of his gaze items. For each of them, a video clip showing Bernstein's whole face was extracted from the analyzed fragment; the clips were then presented either in audio-visual mode or in video-only mode without audio. The three items were:
1. a gaze signal conveying the technical meaning "start": he looks at a section with open eyes, then raises his eyebrows, opening his eyes wide, and then closes his eyes;
2. the "interactional" gaze signal for "pay attention": he opens his eyes wide while raising his eyebrows, and turns his gaze from right to left by moving his irises right-left in the sclera;
3. a gaze signal meaning "I ask you to perform a crescendo and accelerando": he raises his eyebrows while corrugating his forehead and then frowns intensely, looking at the orchestra.
This last gaze item is quite complex; it was chosen to test whether even difficult items such as this one are generally attributed meaningful interpretations.
As specified above, the clips showed the whole face of the conductor. An alternative would have been to crop the stimuli so as to show only his eye region. As demonstrated by [52] using occlusion techniques, the face is generally the richest part of the conductor's body in conveying expressiveness, i.e., information concerning the emotions to be stamped onto music. However, the meanings we attributed to the chosen gaze stimuli were not emotional but technical or interactional. In fact, in the "start" and "pay attention" stimuli, the emotional import contributed by facial muscles outside the eye region did not make a relevant difference to the meaning of the gaze stimuli per se. In the "accelerando" part of the third stimulus, the situation is somewhat reversed: an originally emotional gaze signal, here a deep frown generally conveying anger or concentration, comes to bear a non-emotional meaning, namely "mobilize your energy to hurry up".

Pilot Study
We first ran a pilot study aimed at refining the procedure for the subsequent perception study. A focus group was conducted recruiting 12 participants, four for each of three levels of musical competence: Experts, Amateurs, and Non-experts. We considered people professionally working with music as "Experts": musicians, singers, composers and arrangers, conductors, music teachers; "Amateurs", those with some knowledge of music technique but not accustomed to the rituality of professionals: music students, non-professional musicians; and "Non-experts", those with no technical knowledge of music.
In the focus group, the clips of the three selected gaze signals cut from Bernstein's video were shown to the 12 participants in video-only mode, and for each clip participants were asked to guess its possible meaning. The meanings proposed for each of the three items were recorded and later used to construct the distractors for the subsequent perception study.

Method
Finally, a questionnaire was built to test participants' attribution of meanings to the three clips of Bernstein's video, i.e., the gaze items meaning, according to our hypothesis, "start", "pay attention", and "crescendo and accelerando". After answering biographical questions on gender and age, each participant watched the three clips of conducting behavior, each shown twice, first in video-only mode and then in audio-visual mode, and answered a multiple-choice question for each clip, in which the expected meaning was mixed with four distractors that emerged from the focus group (not only the most frequently proposed alternatives but sometimes the most interesting ones). All alternatives were randomized: the video-only and audio-visual versions of the same clip had the same answers, in different random orders. Finally, in the last question, following the audio-visual "crescendo/accelerando" clip, participants were asked to rate, on a 10-point Likert-type scale, how certain they felt about the meaning of that clip. This question was asked only for the third clip, given its particular complexity.
Before starting the questionnaire, the recruited participants were asked to classify themselves as Experts, Amateurs, or Non-experts, according to the definitions given above. A control question also asked them about their work or activity, and those who defined themselves as Experts but did not have a musical job were reclassified as Amateurs.
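The questionnaire logic can be sketched as follows, assuming answers are stored as plain strings. The function names (`build_item`, `final_group`, `score`) and the distractor labels are hypothetical placeholders, not the actual materials: the sketch only shows how the same five alternatives can be presented in two different random orders, how the control question reclassifies self-declared Experts, and how each answer is coded as expected (1) or unexpected (0).

```python
import random


def build_item(expected: str, distractors: list[str],
               rng: random.Random) -> tuple[list[str], list[str]]:
    """Return the options for the video-only and audio-visual versions of one clip.

    Both versions contain the same five answers (the expected meaning plus
    four distractors) but in two different random orders.
    """
    options = [expected, *distractors]
    if len(distractors) != 4 or len(set(options)) != 5:
        raise ValueError("need one expected answer and four distinct distractors")
    video_only = options[:]
    rng.shuffle(video_only)
    audio_visual = video_only[:]
    while audio_visual == video_only:  # reshuffle until the order differs
        rng.shuffle(audio_visual)
    return video_only, audio_visual


def final_group(self_declared: str, has_music_job: bool) -> str:
    """Control question: self-declared Experts without a musical job become Amateurs."""
    if self_declared == "Expert" and not has_music_job:
        return "Amateur"
    return self_declared


def score(answer: str, expected: str) -> int:
    """Dependent variable: 1 if the expected meaning was chosen, 0 otherwise."""
    return int(answer == expected)


# Hypothetical distractor labels; the real ones came from the focus group.
rng = random.Random(42)
vo, av = build_item("start", ["distractor A", "distractor B", "distractor C", "distractor D"], rng)
print(vo)
print(av)
```

Shuffling a copy of the first order until it differs guarantees that the two presentations of the same clip never show the alternatives in an identical sequence.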

Results
For all statistical analyses, IBM SPSS 26.0 was used, and the significance level was set to α = .05; observed power (1 − β) was computed with the same α. RQ1: Our first research question concerned the comprehensibility of the three gaze items by Bernstein submitted to participants. Our analysis indicates that they are fairly comprehensible: the overall percentage of correct interpretations without audio is 48%. (For the sake of clarity, since we describe both the overall results and those of the single signals (§ 6.7.1-3), we sometimes use percentages and sometimes the average score of recognized signals. The two are interchangeable: if a single signal is recognized 50% of the time, the corresponding average is 0.5; for the overall scores, 50% corresponds to 1.5 out of three signals.)
Since each answer concerning an item was chosen among five possible ones (one expected answer and four distractors), the chance level is 20%, so 48% is more than twice the level of chance.
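The arithmetic behind the chance level and the percentage/average conversion can be checked with a few lines of Python. The binomial tail function is an added illustration, not the analysis reported here (which used ANOVA in SPSS 26.0); it assumes that each answer is an independent guess among five equally likely alternatives.

```python
from math import comb

N_OPTIONS = 5            # one expected answer + four distractors
N_SIGNALS = 3            # "start", "pay attention", "crescendo and accelerando"
CHANCE = 1 / N_OPTIONS   # probability of guessing the expected answer


def to_average_score(proportion: float, n_signals: int = N_SIGNALS) -> float:
    """Convert a recognition proportion into the average number of signals recognized."""
    return proportion * n_signals


def binomial_upper_tail(k: int, n: int, p: float = CHANCE) -> float:
    """P(X >= k) for X ~ Binomial(n, p): probability of k or more correct answers by pure guessing."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


print(to_average_score(0.5))   # 50% overall corresponds to 1.5 of 3 signals
print(0.48 / CHANCE)           # roughly 2.4 times the chance level
```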
RQ2: The second question was whether the level of comprehension depends on the viewer's musical expertise. Our results show a close relationship between the viewers' musical expertise and the correct interpretation of the gaze items. A three-way analysis of variance (ANOVA) was performed with expertise, gaze signal, and presentation modality as factors and correct interpretation as the dependent variable. The main effect of expertise was significant, F(2, 1104) = 28.34, p < 0.001, ηp² = 0.049, 1 − β > 0.99. Post-hoc comparisons with Bonferroni correction showed significant differences between Experts (M = 0.65) and Non-experts (M = 0.43) (p < .001), and between Experts and Amateurs (M = 0.43) (p < 0.001).
As shown in Figure 1, Experts systematically perform better in interpreting the gaze items in both conditions (video-only and audio-visual), followed by Amateurs (albeit with a remarkable difference between interpretation from video-only and from audio-visual clips) and Non-experts.
RQ4: The three-way ANOVA revealed a significant interaction between expertise and presentation modality, F(2, 1104) = 6.36, p = 0.002, ηp² = 0.01, 1 − β = 0.90; in other words, the difference between video-only and audio-visual presentation varies significantly across our three experimental sub-samples. Bonferroni-corrected post-hoc analyses revealed that Amateurs and Experts, as opposed to Non-experts, took advantage of the audio to guess the right meaning. Post-hoc results are reported in Table 5. Moreover, when the individual gaze signals are considered, Amateurs performed significantly better with the addition of audio for "pay attention" (p = 0.005) and "crescendo and accelerando" (p = 0.008), whereas Experts performed better only for "crescendo and accelerando" (p < 0.001), as highlighted by the significant Expertise × Modality × Gaze interaction, F(4, 1104) = 3.46, p = 0.008, ηp² = 0.01, 1 − β = 0.86.
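For readers unfamiliar with the correction, here is a minimal sketch of the Bonferroni adjustment: each raw p-value is multiplied by the number of comparisons and capped at 1, which is equivalent to testing each raw p-value against α divided by that number. The raw p-values in the example are hypothetical, not values from this study.

```python
def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni correction: multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significant_after_correction(p_values: list[float], alpha: float = 0.05) -> list[bool]:
    """Flag which comparisons remain significant at alpha after the correction."""
    return [p < alpha for p in bonferroni_adjust(p_values)]


# Hypothetical raw p-values for three pairwise expertise comparisons
# (Experts vs Amateurs, Experts vs Non-experts, Amateurs vs Non-experts).
raw = [0.01, 0.02, 0.5]
print(bonferroni_adjust(raw))
print(significant_after_correction(raw))
```

With three comparisons, a raw p of 0.02 would be nominally significant but no longer survives the correction.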
As can be observed in Figure 2, when the audio is added, all three groups behave similarly; in the video-only condition, by contrast, Amateurs performed poorly in recognizing the "pay attention" and "crescendo and accelerando" signals. In both modalities, the scores of the Non-experts, unlike those of the other groups, remain stable.
Curiously enough, a one-way ANOVA also revealed a gender effect in the video-only condition, F(1, 185) = 5.46, p = 0.021, ηp² = 0.03, 1 − β = 0.64: men (M = 1.59, SD = 0.80) tended to recognize the signals better than women (M = 1.30, SD = 0.91). To be on the safe side, we also checked the gender distribution across the levels of expertise and found no significant association, χ²(2, N = 186) = 3.47, p = 0.17, thus concluding that, at least as far as gaze conducting signals are concerned, males perform better than females. These data deserve further and more focused investigation, since such a male advantage in reading body language appears to contradict not only popular wisdom but also several findings of a female advantage in social perception tasks [61,62], including facial emotion recognition tests [63] and the recognition of distinct features depicting bodily movements [64]. However, this difference might be confined to technical conducting signals only.
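The partial eta squared values reported in this section can be recomputed from each F ratio and its degrees of freedom through the standard identity ηp² = F·df_effect / (F·df_effect + df_error). The sketch below is a consistency check rather than a re-analysis: it shows that the recomputed values round to those reported.

```python
def partial_eta_squared(f: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared from an F ratio: F*df_effect / (F*df_effect + df_error)."""
    return (f * df_effect) / (f * df_effect + df_error)


# (effect, F, df_effect, df_error, reported partial eta squared), copied from the text.
REPORTED = [
    ("Expertise", 28.34, 2, 1104, 0.049),
    ("Expertise x Modality", 6.36, 2, 1104, 0.01),
    ("Expertise x Modality x Gaze", 3.46, 4, 1104, 0.01),
    ("Gender (video-only)", 5.46, 1, 185, 0.03),
]

for name, f, df1, df2, reported in REPORTED:
    recomputed = partial_eta_squared(f, df1, df2)
    print(f"{name}: recomputed {recomputed:.3f}, reported {reported}")
```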
We now proceed to a more fine-grained description of each single gaze signal.

Gaze Signal for "Start"
The first video showed the gaze signal that indicates "start". For this signal, the gap between correct interpretations from audio-visual and video-only presentation widens as the level of expertise decreases. While Experts recognize the signal even without audio (M = 0.73, SD = 0.44), this mean drops, in a fairly predictable manner, to .61 (SD = 0.49) for Amateurs and to .47 (SD = 0.50) for Non-experts, the only significant difference being between Experts and Non-experts (p = 0.005); the pattern remains the same in the audio-visual mode, with Experts at .69 (SD = 0.46) and Non-experts at .41 (SD = 0.49) (p = 0.002).
Although recognition of this signal varies with musical expertise, overall it is understood on average by 61% of the sample in the case of video-only clips, and by 56% when sound is added.
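Because each response is scored 0 or 1, the mean of a group is a proportion p, and its standard deviation is close to √(p(1 − p)); the sample SD is slightly larger, by a factor √(n/(n − 1)). This gives a quick consistency check on the reported means: for example, a mean of 0.73 implies an SD of about 0.44, whereas a mean of 0.073 would imply about 0.26.

```python
from math import sqrt


def binary_sd(mean: float) -> float:
    """Population SD of a 0/1 recognition score whose mean (proportion correct) is `mean`."""
    return sqrt(mean * (1 - mean))


# Means reported for the "start" clip, plus 0.073 for comparison.
for m in (0.73, 0.073, 0.61, 0.47, 0.69, 0.41):
    print(f"M = {m}: implied SD = {binary_sd(m):.2f}")
```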

Gaze Signal for "Pay Attention"
The second clip shows the gaze signal for "pay attention". Its meaning is peculiar, since it expresses the relationship between the conductor and the orchestra. This peculiarity seems to be confirmed by our data, in that most Experts (M = 0.72, SD = 0.45) recognize the "pay attention" signal even in the absence of audio, an indication that, at least for a competent audience, the signal is highly codified and shared. Amateurs, by contrast, recognize it less. In fact, the most interesting result about this gaze signal comes from comparing Amateurs' and Non-experts' responses. In both groups, introducing the auditory stimulus modifies recognition: for Amateurs, the auditory stimulus helps them identify the correct meaning (they double their rate of correct responses), while the case of Non-experts is quite different.
The data obtained from Non-experts' answers, at first sight, seem contradictory. Unexpectedly, as much as 53% of Non-experts (almost ten percentage points more than the interpretation of the gaze-signal for "start", which is, in general, the most recognizable signal) identify the right option even without audio, but this percentage drops (43%) when the auditory stimulus is introduced in the second version of the video.
Overall, this signal is understood on average by 52% of the sample in the case of video-only clips, and by 56% when sound is added.

Gaze Signal for "Crescendo and Accelerando"
The third video clip explored the comprehensibility of a quite complex gaze signal issued by Bernstein: one asking for "crescendo and accelerando". This indication is complex because it combines two indications pertaining to different parameters of music: intensity (crescendo) and tempo (accelerando). In the video-only condition, Experts chose the expected option only in a minority of cases (M = 0.29, SD = 0.45): they seem able to recognize the indication of intensity, but not the indication of tempo combined with it. Amateurs did even worse (M = 0.20, SD = 0.40). Surprisingly enough, Non-experts recognized it almost twice as often as Amateurs (M = 0.38, SD = 0.49). None of these differences is statistically significant, as shown by Bonferroni-corrected post-hoc comparisons.
Things change radically with the addition of sound. For Experts, the rate of the "perform crescendo and accelerando" option rises significantly (M = 0.76, SD = 0.42; p < 0.001), revealing that this group takes more advantage of the audio than the other sub-samples. Amateurs benefit a great deal too (M = 0.44, SD = 0.50; p = 0.008); Non-experts' scores remain low (M = 0.37, SD = 0.48), though still above the chance level of 20%.
On average, this signal is understood by 30% of the sample, in the case of video-only clips, and by 53% when sound is added.

Confidence Ratings
Given the complexity of this last gaze indication, "crescendo and accelerando", we added a self-assessment scale to measure participants' confidence in their response: they were asked to rate, on a Likert-type scale from 1 to 10, how confident they were in their answer to the video. An omnibus one-way ANOVA revealed that the three groups differ significantly, F(2, 184) = 33.82, p < 0.001, ηp²

Discussion
The hypothesis of this work was that the conductor's gaze could be encoded in specific signals that are, to some extent, intelligible; our specific aim was to understand how recognizable the gaze signals used by Bernstein in his conducting are, whether their recognition is related to the viewers' level of musical expertise, and whether it changes depending on the presence or absence of sound.
As for the first research question, only the "start" gaze signal is fairly recognized by the whole sample. By contrast, the signal for "pay attention" is less recognized. The subtlety of this signal lies in the fact that through it the conductor transmits his emotions to the orchestra and sends signals so that the performance reflects his idea of the work, while at the same time asking the orchestra for a commitment to attend to his signals and interpret them correctly: quite a complex relationship for a Non-expert to understand. A possible, if curious, account for this result might thus be that signals like requests for attention and feedback (praise and reproach), which belong to the class of interactional signals, are not specific to conducting, so Non-experts do not expect to find them in a musical performance and, in the presence of music, rather try to interpret them as technical signals; Experts and Amateurs, instead, having some experience of the relationship between conductor and performers, find these meanings rather predictable and recognize them both with and without audio. From this perspective, it is not surprising that the "pay attention" gaze signal is difficult to understand for an audience totally uninformed about musical technique: for them the sound is no longer an aid to understanding, as it was for Amateurs, but an obstacle, a distracting factor.
With regard to the last and most complex signal, "crescendo and accelerando", the interesting finding concerns the participants' confidence in their interpretation, and the slightly lower confidence of Amateurs compared with Non-experts.
Coming to the second research question, a significant difference emerges in the correctness of video-only interpretations among the three groups of participants. Experts systematically performed better than both Amateurs and Non-experts, whereas Amateurs, contrary to our hypothesis, provided fewer correct interpretations than Non-experts. When audio is added, instead, this odd effect disappears, and recognition increases linearly with expertise.
Regarding our third and fourth research questions, correct interpretations differ between video-only and audio-visual presentation: the same signals are better interpreted with audio than without it. Furthermore, the difference between the two modes varies significantly with expertise; in particular, Non-experts are the only participants who do not take advantage of the audio cue. This counts as an implicit confirmation of the importance of expertise for grasping the right meaning of gaze signals.
This study on Bernstein's conducting gaze items can be compared with a previous work on the recognition of gaze items by the conductor of an amateur choir [59]. A common result is the high recognition rate of the item for "start"; apart from this single item, however, that work found no significant difference between Experts' and Non-experts' recognition, whereas in the present study Experts outperform both Non-experts and Amateurs. This might depend on the different gaze items (apart from "start") investigated in the two works: in the previous one, the gaze items mainly concerned technical aspects of music, like tempo and intensity, whereas here an "interactional" signal was submitted to participants, which, as seen above, might have more sharply separated Experts, Non-experts and, above all, Amateurs.

Conclusion
During the last few decades, a conspicuous body of research has investigated multimodal communication in music performance, addressing how performers and conductors use body movement to communicate with each other and with the audience. Conducting gestures have been investigated also in their semantic aspects: the meanings they convey and the semiotic devices they exploit [45,47-49,56-58]. As for gaze, most studies on the use of the eyes in performance, both between co-performers and between performers and conductor, deal with what we would call an "input" rather than an "output" function: using the eyes mainly to acquire information about what another is doing [10], in order to reach synchronization or to catch technical or expressive indications. However, even studies that tackle the "output", i.e., the truly communicative functions of the conductor's eyes, do not capture the full richness of the conductor's gaze in conveying indications to musicians; even sophisticated eye-tracking studies, in music performance [33] as in everyday interaction [19], tell us little about the semantic import of gaze. First, their results are often confined to a single parameter such as gaze direction, without taking into account many other parameters, such as eyebrow and eyelid movements or iris position in the sclera, that make up a rich repertoire of gaze signals [10,60]. Second, they do not try to find systematic correspondences between the various signals and specific meanings for conducting.
The goal of our work, instead, was to show that there is much more than gaze direction to the communicative use of the eyes in conducting: after discovering a shared lexicon in other conductors [59,60], we looked for instances of this lexicon in Leonard Bernstein's gaze communication. To this end, we conducted three studies exploiting three different empirical methods. First, an interview with five conductors revealed that they are, to some extent, aware of their own use of gaze in conducting, but do not attribute specific meanings to particular gaze items, and do not think that some parameters of music, such as rhythm, can be effectively communicated by gaze alone. Second, an observational study coded the communicative gaze items used by Leonard Bernstein while conducting Haydn's Symphony No. 88. Third, a perception study tested the comprehensibility of three of these items to Experts, Amateurs, and Non-experts when presented in video-only vs. audio-visual clips, finding that the three items are reasonably recognized, but that recognition is affected by both expertise level and presentation mode.
This work seems to confirm that the communicative items of gaze used in conducting make up a shared, systematic, and specific lexicon. The fact that Experts generally, and Amateurs sometimes, are better than Non-experts at recognizing the items' meanings suggests that these items are, in a sense, part of a "conducting gaze" communication system.
If some conductors, like Antonio Guarnieri quoted above, or Richard Strauss cited by [52], did in fact "conduct with their eyes", Leonard Bernstein did so consciously: the very fact that, after conducting the whole symphony with the baton, he decided in the encore to keep his hands down or behind his back and to conduct by face and gaze only shows that he was fully aware of their self-sufficiency and even their effectiveness as a conducting tool.
Coming to the limitations of this study, we must keep in mind that gaze is only one of the signals produced by the face, while other parts of it, e.g., the mouth, may convey different messages at the same time. Therefore, we were very selective in choosing the fragments used as stimuli, so that other parts of the face neither added relevant information nor intensified the meaning conveyed by gaze. Even so, the extent to which the coordination of the whole face contributes to the meaning attributed to a gaze item is not clear. Future research should therefore refine this issue by presenting participants with clips in which only the eye region is visible. Further studies should also test the comprehensibility of a broader set of gaze items.
What is the point of such work? Studying the forms and meanings of the conductor's lexicon of gaze, and the production and comprehension of its items, may contribute, on the theoretical side, to an in-depth knowledge of the communicative devices involved in music performance and of their similarities to and differences from those of everyday communication; on the applied side, it may enhance conductors' awareness of their communicative instruments and improve the teaching of conducting; on the technological side, it may help build virtual conducting devices, such as embodied conductor agents and robots [65].
Finally, all of this work might be exploited to compare the effects on ensemble performance of the extent to which conductors make use of their gaze lexicon, and thus to understand in detail how the conductor's body behavior influences music.