In assessing others’ moods, personality traits, and relationships, we are able to make relatively reliable judgments based on a small amount of information [1,2]. This ability, termed ‘thin slicing’, relies on cognitive and social heuristics and has been researched extensively. Additional research has sought to examine the phenomenon in other domains, such as the assessment of performance.
Tsay [3] investigated how different cue modalities affect the judgement of musical performances, including those of leaders (conductors), given that auditory cues are perceived to be the most important cues in music. Musical novices selected winners from performances by three competition finalists presented as sound-only, visual-only, or audiovisual clips and reported which cues mattered most to their judgments [3]. Although all groups of participants in all experiments (musical novices and professionals) reported that sound is the most important information when evaluating music, the visual-only condition consistently yielded the best accuracy in selecting winners, while the sound-only and audiovisual conditions yielded the worst, or below-chance, accuracy. Tsay notes that judgments of winners are based mostly on visual rather than auditory cues and that sound might actually distract people from selecting the actual winners. These results indicate that musicians and non-musicians alike appear to overweight visual information in their evaluation of music performances; musicians and music adjudicators are therefore strongly encouraged to attend more closely to the specific ways in which visual cues affect judgments of music.
An important task in science is evaluating the credibility of research claims, based on both face validity and statistical reliability. Individual research papers are not stand-alone pieces of evidence but contributions to the corpus of a field at large, and it is important to be able to assess which pieces of evidence are more or less valuable to that field. The Systematizing Confidence in Open Research and Evidence (SCORE) project [4] is an endeavour with two goals: first, to assess individual research claims from studies in the social sciences over time, and second, to consider the elements of specific studies that may make their evidence more or less reliable to the field. The current study is a replication of a study by Tsay [3], in which the author assessed the ability of non-musicians to evaluate musical performances based on very short extracts of the performances. The original paper’s claim that musical performances can be successfully assessed from a 6-s clip could have negative repercussions for the evaluation of such performances. For example, when auditioning somebody for an orchestra or for a place in a music performance program, one may choose to make a decision based on a very brief portion of the audition. However, if the effect reported by Tsay is spurious, we may be doing a disservice to performers by not assessing their full performance.
1.1. Musical Elements
The relative impact of visual and auditory cues has been studied by Thompson, Graham, and Russo [5], who explored the influence of audio-visual integration on listeners’ perception of music and how visual aspects affect communication between performers and listeners. Thompson et al. [5] divided participants into audio-only and audiovisual groups and asked them to rate the level of dissonance of performances in which the performer conveyed either a strong sense of dissonance or a neutral facial expression. In the audiovisual group, ratings were significantly higher for the visually dissonant performances than for the neutral ones, but this difference was not seen in the audio-only group. The authors attribute this to listeners integrating visual with auditory aspects of performance to form an audiovisual mental representation of music, a representation that is not entirely predictable from the auditory input alone [5]. These findings indicate that facial expressions and gestures contribute substantially to the visual cues in music, which enhance the audience’s experience and interpretation of music performances.
Siminoski [6] tested whether audiences’ understanding of musical performances is affected by the two main cues in music, auditory and visual. Clarinetist and pianist duets performed under four performer conditions: (1) normal setting, (2) no visual but full auditory feedback, (3) full visual with partial auditory feedback, and (4) no visual but partial auditory feedback. Participants then judged the expression, unity, and subjective likeability of the performances, which were presented as audio-only, visual-only, or audiovisual clips. The normal performance setting received the highest ratings across all aspects and stimulus types. In the audio-only presentation, ratings did not differ across the performer conditions, whereas the visual-only and audiovisual presentations showed significantly more differences [6].
Pope [7] examined how performance quality in both auditory and visual components, together with evaluators’ music experience, shapes music judgments. The quality of performance in audio and video was varied (good or poor), and participants were assigned to one of four conditions (audio-only, video-only, good video + good or poor audio, or poor video + good or poor audio) in which they rated musical aspects of the performances. Good-quality performances received significantly higher ratings than poor-quality performances for all evaluation aspects, and for almost all evaluation aspects the good video + good or poor audio condition received higher ratings than the other three conditions [7]. The authors argue that this is possibly because string orchestras can lessen the influence of aural deficiencies in their performances through good visual presentation, which shows that visual aspects of performance affect judgments of many musical characteristics even though the performances are primarily auditory.
Tsay (2014) published another study on the importance of visual and auditory stimuli in evaluating music performances, questioning the common belief that sound is the most reliable source of information when judging music [8]. Tsay found that although participants reported that sound mattered most to their judgments, both musical novices and experts successfully identified the winners of music competitions from silent videos (visual-only) but were unable to do so from audio-only or even audiovisual recordings [8]. This suggests that the influence of visual cues does not depend on musical experience, and that visual cues are likely overweighted even though they are neither valued nor recognized as relevant stimuli when judging music. This may be because pressures that constrain our cognitive resources lead to a dependence on vision.
However, when Mehr, Scannell, and Winner [9] attempted to replicate Tsay’s [8] work and tested the robustness and generalizability of its findings (the precedence of visual over auditory cues in music judgment), they concluded that the findings were not robust, since minor changes in methods generated significantly different or even opposite results. For example, Mehr et al. note that when stimuli were presented in pairs rather than triads, participants were unable to reliably identify a winner, whereas in triads they were able to. They suggest this is due to the probabilities associated with guessing: for example, a participant who can rule out one of three performances can then guess between the remaining two with a 50% chance of being correct. Similarly, in Brimhall’s [10] experiment, participants who observed the audiovisual stimulus provided ratings similar to those of participants who heard the audio-only clips, indicating no significant effect of presentation condition (audio-only or audiovisual) and suggesting that the visual stimulus did not influence the evaluation of music. Therefore, there is evidence that vision may not triumph as the dominant sense, and that musical performances can be rated without the influence of visual cues.
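The guessing argument attributed to Mehr et al. above can be made concrete with a short calculation. The sketch below is illustrative only: the function name `guessing_accuracy` is hypothetical, and it assumes that a participant guesses uniformly at random among the performances not ruled out and never rules out the true winner. It is not a description of the analysis used in any of the cited studies.

```python
from fractions import Fraction


def guessing_accuracy(n_options: int, n_ruled_out: int = 0) -> Fraction:
    """Expected accuracy when guessing uniformly among the remaining options.

    Assumption (illustrative): the true winner is never among the
    options the participant rules out.
    """
    remaining = n_options - n_ruled_out
    if remaining < 1:
        raise ValueError("at least one option must remain")
    return Fraction(1, remaining)


# Pure guessing among three finalists (triad presentation).
triad_chance = guessing_accuracy(3)
# Guessing after correctly ruling out one of the three finalists.
triad_informed = guessing_accuracy(3, 1)
# Pure guessing between two performances (paired presentation).
pair_chance = guessing_accuracy(2)

print(triad_chance, triad_informed, pair_chance)
```

Under these assumptions, ruling out a single performance raises expected accuracy in a triad from 1/3 to 1/2, which exceeds the triad chance level without the participant identifying the winner directly; in a pair, 1/2 is simply the chance level.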
1.2. Visual Elements
The movements, or body gestures, of performers are often considered the most obvious visual element to study in relation to music evaluations. Trevor and Huron [11] tested the effect of performer movement on judgments of performance quality. Movements were created for animated stick-figure performers playing both slow and fast passages, with performance motion that was magnified, original, or diminished; participants adjusted the range of motion to create the best musical performance [11]. Participants significantly amplified the performers’ motions for the fast passages, while preferring approximately normal movement for the lyrical passages. This indicates that greater performance motion is associated with performances perceived as superior, particularly for fast passages. Perhaps audiences feel less inclined to increase the motion for lyrical or slower passages because these are already fairly expressive, whereas fast passages are comparatively inexpressive because their greater technical demands make it difficult to add expressive motion.
Researchers have also been interested in the relationship between body movements and musical expressivity and have explored the effect of non-verbal body gestures on the expressivity [12,13,14] and emotional quality [15] of music performances. In these studies, participants were assigned to visual-only, audio-only, or audiovisual groups and were presented with solo performances played in one of three expressive manners (restrained, normal, or exaggerated), which they rated for expressivity and emotional quality [15,16]. Expressive intention had its greatest impact when the performances could be seen (visual-only), which reveals not only that vision is a useful source of information about manner, but also that it specifies manner more clearly than the other presentation conditions [15,16]. Hence, visual as well as auditory information needs to be considered in music perception, with visual information appearing to be the most effective determinant of the expressivity and emotion conveyed.
Weiss, Nusseck, and Spahn [16] addressed the same question as Trevor and Huron [13], examining the influence of ancillary gestures, but specifically in clarinetists. Participants viewed videos of kinematic displays of clarinetists, recorded with optical markers attached to specific body parts to capture full-body motion, and rated four different motion types on five general aspects of music: expressiveness, match of the movements to the music, musical fluency, professionalism, and overall impression of the performance [16]. The highest ratings were given to performances with predominant motion, and the lowest ratings to performances with overall low motion [16]. The authors proposed that this might reflect the relationship between the perceived degree of motion and a musician’s intended level of expressiveness: a musician who purposely exaggerates motion can enhance the perceived expressivity and quality of the performance. As such, there is evidence that bodily gestures and movements enhance audiences’ and adjudicators’ experience and evaluation of musical performances.
1.3. Evaluators’ Musical Ability
Marozeau, Innes-Brown, Grayden, Burkitt, and Blamey [17] and Griffiths and Reay [18] focused on the effect of visual cues on music evaluations and on whether evaluators’ musical training mediates this effect. Musicians and non-musicians were asked to rate, respectively, the difficulty of separating a four-note repeating melody from interleaved random distracter notes [13], and four video clips on three musical aspects [18]. The clips paired professional (good) or amateur (bad) audio (PA or AA) with professional or amateur visuals (PV or AV): PA+AV, PV+AA, PA+PV, and AA+AV. Marozeau et al. [17] found that when there was no visual cue, musicians generally rated melody segregation as less difficult than non-musicians did, but when a visual cue was present, difficulty ratings for musicians and non-musicians were very similar. This indicates that visual cues affect listeners’ perception of music, while the effect of evaluators’ level of musical experience on their judgments remains unclear. Griffiths and Reay [18], in turn, found evidence that visual information has a greater impact than auditory information on evaluations of performance quality, as the clip with bad audio + good video was rated significantly higher than the clip with good audio + bad video on all three evaluation measures; they found no significant effect of musical training on any of the evaluation measures. This result contrasts with Marozeau et al.’s [17] findings and suggests that evaluators’ musical ability may be unimportant.
Mitchell and MacDonald [19] explored the importance of visual and audio priming in identifying music performers, and which cue is stronger. Musicians were assigned to either a visual-audio (V-A) condition, in which they watched the target performer and then listened to a line-up of the target and distractors, or an audio-visual (A-V) condition, in which the order was reversed, and were asked to identify the target performer [19]. All participants identified the target above chance level regardless of presentation order and number of distractors, but identification rates were significantly higher in the V-A than in the A-V condition. This indicates that although both audio and visual cues provide enough information to complete the task, visual cues are presumably more robust information for identifying performers, and people are more sensitive to them than to auditory cues [19]. Thus, the music industry should be aware that visual priming is more important than audio priming for correctly identifying targets.