Review

The Senses of Music: Towards a Theoretical Model of Multisensory Musical Experience

by Cristiane Nogueira *, Ana Isabel Pereira and Helena Rodrigues
Department of Musicology, School of Social Sciences and Humanities (NOVA FCSH), Centre for Music Studies (CESEM), NOVA University of Lisbon, 1069-061 Lisbon, Portugal
* Author to whom correspondence should be addressed.
Encyclopedia 2026, 6(5), 94; https://doi.org/10.3390/encyclopedia6050094
Submission received: 23 February 2026 / Revised: 3 April 2026 / Accepted: 20 April 2026 / Published: 22 April 2026
(This article belongs to the Collection Encyclopedia of Social Sciences)

Abstract

A growing number of studies have highlighted the various sensory interactions involved in the musical experience, such as relationships between music and dimensions of taste, olfaction, sound, and visual qualities, including associations between pitch and the size of images or objects, spatial location and frequency, and instrumental timbres and visual shapes. These studies share the premise that the way we relate to the musical phenomenon, whether in the processes of production, perception, or understanding, emerges from an integrated and intrinsically multisensory perceptual event. Nevertheless, because music is part of everyday life and because this experience is inherently subjective, such interactions tend to occur so naturally and seem so obvious that they have been relegated to common sense. However, evidence indicates that sensory interactions constitute a fundamental ancestral mechanism for cognitive and neuronal development governed by non-arbitrary tendencies, multiple variables, and patterns of predictability. The novel contribution of this review is to advance a dynamic theoretical model of multisensory musical experience that takes crossmodal correspondences as its central organising axis, articulated through three structuring principles (universality, congruence effect, hierarchical tendency) and their interaction with musical organisation, cognitive structure, and the sensory systems mobilised by music. A future research agenda is also proposed to broaden and deepen investigations in the field of music psychology and human development.

1. Introduction

As an experience involving different sensory mechanisms, music has been recognised for its intrinsically multisensory nature [1]. In fact, from the manipulation of sound through musical instruments to the reading of signs and notation to physical involvement, whether creating, playing, or listening, the musical phenomenon activates and interacts with different perceptual senses, such as auditory, tactile, visual, and kinesthetic, as well as emotional and cognitive mechanisms.
Although seemingly obvious, a growing body of research shows that sensory interactions involving the musical phenomenon are governed by non-arbitrary tendencies, as in the case of the relationships between pitch and the size of images and objects [2,3,4,5,6,7], spatial location and frequency [8], brightness and visual forms [9], in addition to instrumental timbres and visual forms [10]. Other studies have reported connections between music and colours [11,12,13], icons and figures [14,15,16], and paintings [17,18,19]. In addition, relationships between music and taste [20,21,22,23,24,25] and between music and the sense of smell [26] have been reported. These studies share the premise that the way we relate to the musical phenomenon, whether in its production, perception, or understanding, emerges from an integrated perceptual event. There is evidence that combining information across sensory systems constitutes a fundamental ancestral mechanism for cognitive and neural development [27], manifesting itself in different ways among individuals. Consequently, interactions between sensory domains are believed to be central features of the human mind [28]. In seeking to understand the foundations that underpin this integration, this review draws on philosophical assumptions and empirical evidence to propose a theoretical model that helps to understand the multisensory nature of music, while also offering an agenda for future research in this area.

2. Philosophical and Theoretical Foundations of Multisensory Perception

2.1. Aristotelian Tradition

It was the pre-Socratic philosopher Democritus (c. 460–370 BC) who originally proposed that all senses could be conceived as modes of touch activated by currents of atoms and, as such, endowed with common properties. Later, Aristotle (384–322 BC), credited with classifying the five basic perceptual senses (hearing, smell, taste, touch, and sight), developed two concepts regarding the nature of sensory experience: special objects and sensus communis.
The Aristotelian notion of sensus communis “lies at a higher level than do the other five senses” [29] (p. 4), since it would comprise a specific perceptual sense with important psychological functions, such as appreciation, awareness, judgment, discrimination, and conclusion of sensory perception. These functions would be triggered by the apprehension of six qualities, named by Aristotle as “sensible attributes”, common to all perceptual senses, namely: movement, rest, number, size, shape, and unity. The taste of a grape, for example, would be appreciated only by the sense of taste, while other qualities, such as the shape and size of the fruit, would be perceived more satisfactorily by Aristotle’s sensus communis.
Over the centuries, these concepts have underpinned the production of knowledge about perception, undergoing various transformations as scientific developments have advanced. However, especially since the development of the fields of psychology, psychophysics, and cognitive sciences, the unidirectional and passive understanding of the perceptual process has given way to a more complex, diverse, cyclical, and therefore multisensory view of this phenomenon [30]. Some researchers identify sensus communis as the basis for understanding contemporary multisensoriality [31,32]. Based on this, Marks (1978) presents the concept of “unity of the senses,” which concerns the relationship between the perceptual senses and the sharing of information among them [29].
In music, the concept of unity of the senses [29] can be observed both in performance and in musical appreciation. At a concert, for example, the listener focuses on the musical discourse while observing the expressiveness of the performers, visually following their gestures and movements; the atmosphere of the concert hall is captured through a distinctive aroma and the softness or comfort of the seats; the listener may be led to tap their feet or perform another movement, spontaneous or not, in response to the music; the timbre of a particular instrument may conjure up images or figures, and the melody may evoke memories and emotions that may positively or negatively reinforce the present experience. All the senses are integrated, capturing and giving meaning not only to the sound-musical stimulus, but also to the other surrounding stimuli.

2.2. Three Philosophical Trends for Understanding Multisensory Perception

Caznok (2008) presents three trends in understanding the phenomenon of multisensory perception, placing the concept of perception and the senses in perspective, including their qualities and inherent attributes [32].
Firstly, multisensoriality stems from the assumption that objects and events accumulate characteristics and properties, and it is up to the senses to capture them. In this case, perception would be based on the idea that “objects themselves contain properties that are similar or different, sometimes requiring the union and sometimes the separation of the senses” [32] (p. 117). In other words, when an object or event exhibits similar properties, perception tends to integrate them (e.g., a small object may be associated with a high-pitched sound, which promotes an integrated perception between sight and hearing); on the other hand, when they exhibit different properties, there may be a separation between the sensory modalities (e.g., if a visually small object is associated with a low-pitched sound, this incongruity may lead to differentiated processing between the senses).
Examples of attributes shared by both certain objects and events, as well as those perceived by different senses, include size, shape, quality, and movement. In music, these attributes would be apprehended primarily through hearing, although analogous correspondents in other domains, such as the visual, could also be activated. Several studies have shown associations between high-pitched sounds and small objects and images, and low-pitched sounds and large objects or images [2,3,4,5,6,7,8], as well as between musical timbre and visual form, with more strident and metallic sounds, originating from the triangle, cymbal, and gong, being associated with pointed shapes, while fuller sounds, such as those from the piano, cello, and marimba, were associated with rounded shapes [10].
The second trend argues that, rather than residing in objects and events, the capacity for “correspondence between the senses would then be an attribute of the perceptual apparatus” [32] (p. 119). In other words, the perceptual senses would be able to perceive certain attributes in common, called “suprasensory,” and would therefore be closer to each other, such as smell and taste, and sight and hearing. According to Marks (1978), “the several senses display a fundamental unity in part because a class of suprasensory attributes pertains to sensations on all modalities” [29] (p. 5), such as intensity, quality, duration, and extension.
Some studies corroborate this perspective, highlighting relationships between auditory and visual attributes: the correspondence between faster music and more saturated colours, and slower music and less saturated colours [13]; the taste of a particular beer being perceived as stronger and with a higher alcohol content in the presence of music, regardless of the type, when compared to the silent condition [33]; and in the presence of music, certain sensory attributes of wine, such as acidity and alcohol content, were perceived more strongly when compared to the silent condition [34].
The third trend in understanding multisensory perception is based on psychophysical properties shared by senses and stimuli [29], that is, on quantifiable functional relationships in sensory perception. Among these, sensitivity, i.e., “the speed with which a sensory organ responds to a stimulus” [32] (p. 122), and discrimination (the ability to detect, compare, and associate crossmodal stimuli) are particularly relevant [35]. This perspective has clear methodological implications, since it suggests that multisensory processes can be tested by controlling sensory mechanisms and stimulus properties, supporting procedures such as preference tests, judgements of stimulus qualities, association tasks, and the systematic manipulation of variables such as stimulus characterisation and the congruence level between paired inputs.
Although presented individually, these three trends in multisensory perception do not operate in isolation but rather in a dynamic, integrated manner. This convergence can be observed in one of the most recurrent and empirically investigated phenomena in the field of multisensory research, namely crossmodal correspondence (CC). As presented in more detail below, crossmodal correspondences (CCs) highlight a series of interactions between perceptual modalities related to musical and sound experiences, constituting a structural framework for proposing a model of the multisensory nature of music.

3. Multisensory Phenomena and Conceptual Delimitation

Multisensory perception is a complex phenomenon in which multiple sensory modalities interact in response to environmental stimuli [36,37,38]. Whether in the perception or processing of captured information, this phenomenon enables a more integrated and meaningful perception of the surrounding world, which is essential for its understanding [30]. In this sense, multisensory processing (or multisensory perception) is an umbrella term encompassing perceptual phenomena arising from the fusion of sensory modalities, such as sensory integration, crossmodal correspondence, and synesthesia, as briefly defined in Table 1.
Crossmodal correspondence refers to “a tendency for a sensory feature, or attribute, in one modality, either physically present or merely imagined, to be matched (or associated) with a sensory feature in another sensory modality” [38] (p. 140). The study of this phenomenon evolved in parallel with the growth of the field of experimental psychology [29,35,39,40], when scholars from diverse fields began conducting studies of the human mind. Consequently, CCs have been referred to by different terms over time, such as synesthetic/intermodal congruence correspondences, synesthetic correspondence, intermodal equivalence, intermodal similarities, natural intermodal mappings, and weak synesthesia [29,40,44,45]. Contemporary research converges on treating CCs as genuine multisensory phenomena, with their own theoretical status [27,35,46,47].
The renewed interest in CCs emerged alongside experimental psychology [29] and Gestalt traditions, which emphasised sensory integration as a recurrent property of perception and helped disentangle CCs from synesthesia. First reported in 1880 by Francis Galton, synesthesia comprises “an automatic and involuntary phenomenon in which one modality evokes activation in a second, typically unrelated sensory or cognitive modality, resulting in the experience of atypical qualia” [41] (p. 259). As a neurological condition, the phenomenon involves a stimulus (called an inducer) that can be either sensory (a sound, a taste) or cognitive (a number, a letter, a day of the week) and a sensation (called a concurrent) that is an involuntary, automatic, and consistent response to the inducer [42]. Neuroscience was largely responsible for the resurgence of this topic in the late 1990s, providing evidence of this phenomenon through psychophysical tests and neuroimaging techniques [41].
Both synesthesia and CCs originate from connections between different sensory modalities, and some types of synesthetic association are also observed in non-synesthetes [40]. However, these phenomena differ in terms of population and directionality. It is estimated that approximately 4% of the world’s population is synesthetic, experiencing at least one type of synesthesia [41], among the more than 150 documented synesthetic combinations [42,48]. CCs, by contrast, are widely distributed in the population, statistically consistent, and empirically investigated in experimental psychology. Regarding directionality, synesthesia has been recognised as unidirectional: a letter (inducer) can evoke only one type of colour (concurrent), but the reverse is not true; the same colour will not necessarily evoke the letter. CCs, in turn, are characterised as bidirectional phenomena, i.e., in the presence of music, an individual may associate it with a specific visual shape and vice versa.
As a widely shared phenomenon, CCs are theoretically better suited to underpin an understanding of music’s multisensory nature. To understand CCs in more detail, it is useful to consider how they are operationalised in empirical work, as explained below.

4. Crossmodal Correspondences: Variables and Nature

Studies in the field of CCs are mostly empirical and experimental, methodologically structured around sensory pairing [41] (e.g., sight and hearing, smell and taste, touch and sight). For example, to verify the associations between pitch and visual size, contrasting sound stimuli are used, such as a high-pitched and a low-pitched sound, as well as figures or images with different sizes [39,44]. Hearing and taste form one of the most investigated pairs in crossmodal studies, with a predominance of real foods and beverages as stimuli, such as chocolate [20], chocolate ice cream [22,23], beer [33], and wine [40,41,42,43,44,45,46,47,48,49,50,51], among others. Between hearing and vision, on the other hand, the use of colour stimuli [11,12], images and icons [14,15], and visual shapes [52,53] predominates.
CCs can involve two types of stimuli, namely simple or complex [46]. A simple stimulus represents an isolated property; simple stimuli are those “that do not have any particular semantic/aesthetic meaning/association” [54] (p. 240). For auditory stimuli, this means a high- or low-pitched sound or a specific instrumental timbre; for visual stimuli, it corresponds to a colour, a luminous point, or a specific figure. Pitch has been one of the most recurrent simple auditory stimuli in CCs [2,3,4,5,6,54], followed by timbre [10], brightness, and visual shapes [9].
Complex stimuli combine multiple qualities and can be “operationally defined as having multiple individualizable elements or attributes” [46] (p. 6). For example, the relationship between the senses of hearing and taste was explored using an excerpt from Tchaikovsky’s String Quartet No. 1 combined with different types of wine [34]; excerpts from Bach, Mozart, and Brahms were used in interaction with a variety of colours [11]; crossmodal associations were examined between musical excerpts from the classical guitar repertoire of various composers, such as Villa-Lobos, Tárrega, and Albéniz, and a series of paintings [17].
Across these designs, CCs are investigated using tasks that probe perceived congruence, that is, the level of compatibility between pairs of sensory stimuli observed both in some events in the environment and in predefined sensory pairings. According to Parise and Spence (2013), “crossmodal correspondences might operate as additional cues to help solve the correspondence problem by biasing the brain toward integrating congruent stimuli and segregating incongruent ones” [39] (p. 802). For example, in implicit association tasks involving pairs of audiovisual stimuli, congruent pairs were identified more quickly than incongruent pairs [6].
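As a purely illustrative sketch of how such a congruence effect might be quantified, the example below compares mean reaction times for congruent and incongruent pitch and size pairings. The reaction times, the trial structure, and the function name congruence_effect are hypothetical and invented for illustration; they are not taken from the studies cited above.

```python
from statistics import mean

# Hypothetical reaction times in milliseconds (invented for illustration only).
trials = [
    {"pitch": "high", "size": "small", "rt_ms": 512},
    {"pitch": "low", "size": "large", "rt_ms": 498},
    {"pitch": "high", "size": "large", "rt_ms": 561},
    {"pitch": "low", "size": "small", "rt_ms": 574},
]

# Assumed congruent pairings, following the pitch-size mapping described in the text.
CONGRUENT = {("high", "small"), ("low", "large")}


def congruence_effect(trials):
    """Return mean incongruent RT minus mean congruent RT, in milliseconds."""
    congruent = [t["rt_ms"] for t in trials if (t["pitch"], t["size"]) in CONGRUENT]
    incongruent = [t["rt_ms"] for t in trials if (t["pitch"], t["size"]) not in CONGRUENT]
    return mean(incongruent) - mean(congruent)


effect = congruence_effect(trials)
print(f"Congruence effect: {effect} ms")
```

Under these assumptions, a positive value would indicate that congruent pairs were identified more quickly than incongruent ones; an actual study would require many trials per participant and an appropriate statistical test.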
Finally, the nature of CCs has been characterised along four partially overlapping bases: structural, statistical, semantic [35,39,40], and emotional [46,54], as shown in Table 2.
Correspondences of a structural nature arise from the characteristics of the nervous system itself and from the way the brain reacts to and encodes sensory stimuli. This type of correspondence is explained by different factors, such as the principle of neural economy, “whereby the brain adopts similar mechanisms to process a number of different features from different sensory modalities, which, as a consequence, might end up being associated” [39] (p. 794). In addition, they can rely on the structure and proximity of certain brain areas, where “multiple sensory dimensions might be associated as a consequence of their being processed in neighboring (see Ref. [55]) or interconnected brain areas (see Rouw and Scholte 2007)” [39] (p. 794).
On the other hand, statistical CCs are based on environmental influences and reflect the regularities of the world around us, suggesting a learning bias. For example, in the natural world, small animals tend to produce higher-pitched sounds than large animals, a combination that may underlie one of the most frequently cited audiovisual CCs in the literature, which links sounds to the size of objects [2,3,4,5,6,7]. In this case, statistical correspondence ensures that, after frequent exposure to certain combinations in nature, “sensory systems acquire information concerning the statistical regularities of the environment and hence the correlations between multiple sensory cues (…) can then be used in order to decide which stimuli normally go together, (…) and which to keep separate” [39] (p. 795).
In the case of semantic CCs, the role of language and lexical meaning in the combination of sensory stimuli is noteworthy. Given that many languages use the same adjectives or words to describe different experiences (for example, the word “softly” can refer to both the texture of a surface and the sound of an instrument; “up” and “down” are attributed to both sound volume and spatial elevation), this type of correspondence results “from the common linguistic labels used to describe various perceptual dimensions that eventually come to be associated” [39] (p. 796). On the other hand, it is believed that semantic correspondences may arise as a result of statistical regularities, since the natural environment offers recurring, non-arbitrary combinations that shape certain verbal associations. Indeed, some types of CCs may have a combined basis, grouping together more than one of these categories [41].
More recently, emotional correspondences have been emphasised [46], highlighting how shared affective tone (e.g., the perception of the taste of beer [33], chocolate ice cream [22,23], and the appreciation of works of art [19]) can organise crossmodal mappings, especially for complex stimuli. In musical contexts, these bases often combine [41], so that a given mapping, for instance, between pitch and visual size, may simultaneously draw on statistical learning, emotional colouring, and structural properties of the nervous system. Considering music’s ability to connect broad neural networks, it is plausible that certain CCs also have structural or even semantic foundations.

5. The Senses of Music or the Music of the Senses: A Proposed Model

Building on these foundations, this review proposes a theoretical model that specifically addresses the multisensory nature of musical experience. While previous work has documented a wide range of crossmodal correspondences and multisensory interactions, there is still a lack of a music-centred model that integrates empirical findings from music cognition with philosophical accounts of perception and contemporary multisensory research. This proposal treats music as a privileged domain for crossmodal activation and interaction.
The model comprises three main components. First, musical organisation refers to the intrinsic structure of the musical stimulus (e.g., pitch, timbre, tempo, articulation, texture, form) that affords crossmodal mappings. Second, cognitive structure designates the individual’s repertoire of statistical learning, affective dispositions, cultural background, and semantic associations that shape how multisensory regularities are perceived and interpreted. Third, mobilised sensory systems, understood as the sensory systems activated during information processing in a given modality, include the auditory, visual, tactile, kinesthetic, gustatory, and olfactory modalities, which may be engaged by both physically present stimuli and imagined, metaphorical, and remembered experiences.
From this perspective, the multisensory nature of music operates in an expanded sensory dimension, extending beyond the strictly auditory domain, for both passive listeners and performers. When in contact with music, the subject not only decodes sound stimuli, but also simulates movements, imagines spatial trajectories, creates mental images, and attributes meanings and relationships to other perceptual domains, such as flavours, colours, textures, and shapes, whether spontaneously or induced. Multisensory perception, therefore, is not a secondary or illustrative effect of the musical experience, but one of its fundamental organising principles.
On a deeper level, the model regards CCs as the organising axes of multisensory experiences in music, whose three structuring principles (universality, the congruence effect, and hierarchical tendency) influence the way in which multisensory musical experiences emerge and vary across contexts and individuals.

5.1. Structural Principles (Universality, Congruence, Hierarchical Tendency)

By universality, we mean that crossmodal associations comprise recurring phenomena in human experience and, in the first instance, emerge from statistical regularities in the physical environment. From this perspective, despite cultural variations and subjectivities, it is possible to find non-arbitrary relationships in various crossmodal domains related to music. Similar patterns of association were found between musical excerpts and images of different shapes (dots, lines, spirals, circular shapes) among individuals from the United Kingdom, Japan, and Papua New Guinea, suggesting that musical parameters such as articulation and pitch find abstract visual parallels even in a population with no knowledge of musical notation [56].
The same principle was observed in one of the first tests involving CCs, known as the Kiki-Bouba Effect, in which the tendency to relate the meaningless words ‘kiki’ and ‘bouba’ to pointed and rounded visual shapes, respectively, was observed among English speakers on the island of Tenerife [57], Tamil speakers in India [55], non-Western populations such as the Himba of Kaokoland in Namibia [58], as well as among native children on the Mahale Peninsula in Tanganyika [59].
Therefore, the principle of universality in CCs suggests that this phenomenon comprises a characteristic of the perceptual apparatus, occurring among all individuals, albeit in varying ways and degrees. Furthermore, it suggests that certain CCs are transversal to different cultures, regardless of prior experience, as in the case of the Kiki-Bouba Effect mentioned above.
On the other hand, the congruence effect suggests that the cognitive system searches for crossmodal combinations consistent with internalised patterns, whose associations have, in some way, already been observed in the environment or learned over time [40]. In music, for example, large instruments, such as the tuba and double bass, produce lower sounds than smaller instruments, such as the flute and violin. The same can be observed in nature, where small animals emit high-pitched sounds, while large animals emit low-pitched sounds.
Several studies indicate that pairs of congruent stimuli are processed more quickly [6,50,60,61] and have positive effects on perceived pleasantness and enjoyment [25,26]. The congruence effect also allows us to identify certain domains that share similar characteristics and qualities. For example, in the field of music, there is evidence that musical tempo is related to the speed of body movements [62,63], visual shapes [14,52], and pitch [6].
Although the congruence effect offers a certain perceptual stability, the expressiveness and sensitivity inherent in the musical experience result, in part, from its manipulation. Thus, in the musical context, some crossmodal associations can depart from the traditionally accepted perspective, inspire transgressions, and propose interesting configurations.
Finally, considering hierarchical tendency as one of the structuring principles of the multisensory nature of music, it is suggested that musical stimuli follow a path of variables along a continuum conditioned by hierarchical levels, whether physiological, psychological, musical, or cultural [56]. Consequently, the way in which musical elements are organised in music may enhance or attenuate some associations.
The hierarchical tendency becomes apparent when a particular musical excerpt features a variety of musical elements simultaneously, as in F. Chopin’s Prélude Op. 28 No. 22 in G minor, a piece for solo piano performed at a Molto agitato tempo, with contrasting dynamics (at times forte, at times soft) and a striking, contrapuntal rhythm. When used in a study of correspondence between musical excerpts and visual forms and icons, an excerpt from this work was associated with pointed shapes [14], although the piano’s timbre had been associated with rounded shapes in a previous study [10]. In this case, other musical elements, perhaps the intensity or the tempo, exerted a greater influence on the type of CC established.
Similarly, when using musical stimuli with the same timbre (flute) but with melodic and articulatory variations, it was observed that staccato excerpts were associated with dots, while legato excerpts were associated with circular figures, suggesting that, in these cases, articulation, rather than timbre, was the determining factor [56]. Other studies involving complex musical stimuli have identified tempo as the most decisive quality in these associations [52]. Therefore, this hierarchical organisation allows us to understand how simple musical stimuli activate direct mappings, while complex musical structures integrate multiple layers of CCs.

5.2. Interaction Between Musical Organisation, Cognitive Structure, Mobilised Sensory Systems and Affective Aspects

From this perspective, it is suggested that the multisensory nature of music emerges from the dynamic interaction between three elements: the intrinsic organisation of music, as a phenomenon articulated by different sound elements; the individual cognitive structure, shaped by statistical learning and by affective, cultural, and semantic factors; and the set of sensory attributes mobilised by a given stimulus. This interaction is permeated by the principles of universality, congruence effect, and hierarchical tendency. Even though crossmodal associations reveal non-arbitrary patterns, such as those presented throughout this work, the musical experience manifests as a unique multisensory phenomenon, continuously modulated by the particularities of each person.
In this regard, it should be noted that, despite growing interest, there are still few studies that directly address individual differences in multisensory experiences related to music. Nevertheless, the literature consistently highlights the role of emotions in the processing of CCs, influencing not only the way in which stimuli from different modalities are integrated, but also the meanings and responses that emerge from them.
For example, studies suggest that musical preferences can influence the perceived taste of different foods: listening to one’s favourite music increased the perceived enjoyment of eating chocolate ice cream [22]; when participants listened to music they liked, positive emotions were evoked and the perceived sweetness of the ice cream increased, whereas when they listened to music they did not like, combined with unpleasant sounds, the bitter and toasty characteristics of the food were more prominent [23]; emotional reactions triggered by music also influenced specific aspects of beer tasting, such as alcohol content and body [33]. In the context of a museum exhibition, background music significantly influenced judgements of pleasure, beauty and preference whilst viewing a work by W. Kandinsky: musical excerpts rated positively increased appreciation of the painting, whilst those rated negatively reduced it [19].
These studies suggest that emotions play a central role in modulating CCs, demonstrating that multisensory processing is influenced not only by the characteristics of the stimuli but also by individuals’ emotional responses. These findings provide a starting point for future research into the relationship between the universal and individual aspects of multisensory experience, whilst also considering cultural variables and musical experience.

6. Prospects for Future Research

In view of the foundations and theoretical model presented here, it is pertinent to outline an agenda for future research to test and deepen them, contributing to the consolidation of this field of study across at least three areas, namely:
A. Theoretical: (1) to investigate the relationship between statistical learning and patterns of correspondence involving music in specific sensory domains; (2) to compare CCs in response to simple and complex stimuli, verifying the functioning of the congruence effect and hierarchical tendency; (3) to propose a predictive model of CCs in musical experience connecting with expectation theory.
B. Empirical: (1) to investigate how CCs work in real contexts, such as restaurants, concert halls, and learning contexts, in dialogue with more ecological research trends; (2) to explore the role of attention in the perception of multisensory experiences and the emotional factors involved in this process; (3) to analyse the effects of musical training, age, and cultural context on CCs; (4) to investigate the structural basis of multisensory perception using neuroimaging techniques, identifying the areas of the brain involved in multisensory experiences.
C. Teaching: (1) to examine the effects of multisensory approaches on music teaching and learning processes; (2) to provide empirical evidence on multisensory processing from childhood onwards; (3) to investigate the relationships between multisensory experiences and the formation of concepts and memorisation of musical patterns.

7. Considerations

This review explored the fundamentals of music’s multisensory nature and proposed a theoretical model for understanding this phenomenon based on CCs. The philosophical and theoretical foundations of the topic were grounded in empirical evidence, positioning crossmodal experiences not as exceptional events but as essential mechanisms of human development shared by all individuals.
The proposed dynamic model suggests that musical experience emerges from the interaction between the intrinsic organisation of sound elements, individual cognitive structure, and the set of sensory attributes mobilised by each stimulus. By articulating the principles of universality, congruence effect, and hierarchical tendency, the model seeks to integrate the structural, affective, and semantic dimensions of musical experience, respecting the uniqueness that results from the interaction between shared patterns and individual perceptions.
Finally, the research agenda outlined in this review highlights the potential of the proposed model to inform future theoretical, empirical, and pedagogical work in music psychology and human development. In this sense, the model invites interdisciplinary dialogue and provides a framework for designing studies that bridge laboratory paradigms, ecological contexts, and music-based interventions throughout the lifespan.

Author Contributions

Conceptualization, C.N., A.I.P. and H.R.; visualization, C.N. and A.I.P.; writing—original draft preparation, C.N.; writing—review and editing, C.N., A.I.P. and H.R.; funding acquisition, C.N. and H.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was developed within the framework of CESEM (Centre for Music Studies, UID/693/2025). Financial support was provided by FCT (Portuguese Foundation for Science and Technology) through a PhD grant (PD/BD/150597/2020), co-financed under the European Social Fund (ESF) and the Ministry of Science, Technology, and Higher Education (MCTES).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

The content of the manuscript was created without the help of AI. During the preparation of this manuscript, the authors used DeepL Translator (2026) to translate the text from Portuguese into English. Afterwards, ChatGPT 3.5 was used to review the translation and correct language that would sound unacademic or inappropriate to native speakers (across all sections). The entire text was then checked and corrected manually where necessary. The authors checked the terminology against the original sources, reviewed and edited the output, and take full responsibility for the content of this publication.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
CC: Crossmodal correspondence
CCs: Crossmodal correspondences

References

  1. Tan, S.; Pfordresher, P.; Harré, R. Psychology of Music: From Sound to Significance; Psychology Press: New York, NY, USA, 2010.
  2. Deroy, O.; Spence, C. Why we are not all synesthetes (not even weakly so). Psychon. Bull. Rev. 2013, 20, 643–664.
  3. Gallace, A.; Spence, C. Multisensory synesthetic interactions in the speeded classification of visual size. Percept. Psychophys. 2006, 68, 1191–1203.
  4. Parise, C.; Spence, C. Synesthetic congruency modulates the temporal ventriloquism effect. Neurosci. Lett. 2008, 442, 257–261.
  5. Parise, C.; Spence, C. “When birds of a feather flock together”: Synesthetic correspondences modulate audiovisual integration in non-synesthetes. PLoS ONE 2009, 4, e5664.
  6. Parise, C.; Spence, C. Audiovisual crossmodal correspondences and sound symbolism: A study using the implicit association test. Exp. Brain Res. 2012, 220, 319–333.
  7. Walker, P.; Smith, S. Stroop interference based on the multimodal correlates of haptic size and auditory pitch. Perception 1985, 14, 729–736.
  8. Evans, K.; Treisman, A. Natural cross-modal mappings between visual and auditory features. J. Vis. 2010, 10, 6.
  9. Marks, L. On Cross-Modal Similarity: Auditory-Visual Interactions in Speeded Discrimination. J. Exp. Psychol. 1987, 16, 384–394.
  10. Adeli, M.; Rouat, J.; Molotchnikoff, S. Audiovisual correspondence between musical timbre and visual shapes. Front. Hum. Neurosci. 2014, 8, 352.
  11. Palmer, S.; Schloss, K.; Xu, Z.; Prado-León, L. Music–color associations are mediated by emotion. Proc. Natl. Acad. Sci. USA 2013, 110, 8836–8841.
  12. Lindborg, P.; Friberg, A. Colour Association with Music Is Mediated by Emotion: Evidence from an Experiment Using a CIE Lab Interface and Interviews. PLoS ONE 2015, 10, e0144013.
  13. Whiteford, K.L.; Schloss, K.B.; Helwig, N.E.; Palmer, S.E. Color, music, and emotion: Bach to the blues. i-Perception 2018, 9, 2041669518808535.
  14. Murari, M.; Schubert, E.; Rodà, A.; Da Pos, O.; De Poli, G. How >:( is Bizet? Icon ratings of music. Psychol. Music 2017, 46, 749–760.
  15. Marin, M.; Gingras, B.; Bhattacharya, J. Crossmodal transfer of arousal, but not pleasantness, from the musical to the visual domain. Emotion 2012, 12, 618–631.
  16. Simurra, I.; Vanzella, P.; Sato, J. Timbre and Visual Forms: A crossmodal study relating acoustic features and the Bouba-Kiki Effect. In Proceedings of the 2nd International Conference on Timbre (Timbre 2020), Thessaloniki, Greece (online), 3–4 September 2020; Available online: http://timbre2020.mus.auth.gr/assets/papers/Timbre2020_proceedings.pdf (accessed on 3 April 2026).
  17. Albertazzi, L.; Canal, L.; Micciolo, R. Cross-modal associations between materic painting and classical Spanish music. Front. Psychol. 2015, 6, 424.
  18. Albertazzi, L.; Canal, L.; Micciolo, R.; Hachen, I. Cross-Modal Perceptual Organization in Works of Art. i-Perception 2020, 11, 2041669520950750.
  19. Braun Janzen, T.; de Oliveira, B.; Ventorim Ferreira, G.; Sato, J.R.; Feitosa-Santana, C.; Vanzella, P. The effect of background music on the aesthetic experience of a visual artwork in a naturalistic environment. Psychol. Music 2022, 51, 16–32.
  20. Guetta, R.; Loui, P. When music is salty: The crossmodal associations between sound and taste. PLoS ONE 2017, 12, e0173366.
  21. Reinoso-Carvalho, F.; Wang, Q.; van Ee, R.; Persoone, D.; Spence, C. “Smooth operator”: Music modulates the perceived creaminess, sweetness, and bitterness of chocolate. Appetite 2017, 108, 383–390.
  22. Kantono, K.; Hamid, N.; Shepherd, D.; Yoo, M.J.Y.; Carr, B.T.; Grazioli, G. The effect of background music on food pleasantness ratings. Psychol. Music 2016, 44, 1111–1125.
  23. Lin, Y.; Hamid, N.; Shepherd, D.; Kantono, K.; Spence, C. Musical and Non-Musical Sounds Influence the Flavour Perception of Chocolate Ice Cream and Emotional Responses. Foods 2022, 11, 1784.
  24. Galmarini, M.; Silva Paz, R.; Enciso Shoquehuanca, D.; Zamora, D.; Mesz, B. Impact of music on the dynamic perception of coffee and evoked emotions evaluated by temporal dominance of sensations (TDS) and emotions (TDE). Food Res. Int. 2021, 150, 110795.
  25. Wang, Q.; Spence, C. Assessing the Effect of Musical Congruency on Wine Tasting in a Live Performance Setting. i-Perception 2015, 6, 2041669515593027.
  26. Velasco, C.; Balboa, D.; Marmolejo-Ramos, F.; Spence, C. Crossmodal effect of music and odor pleasantness on olfactory quality perception. Front. Psychol. 2014, 5, 1352.
  27. Sathian, K.; Ramachandran, V. Multisensory Perception: From Laboratory to Clinic; Academic Press: Cambridge, MA, USA, 2020.
  28. Ramachandran, V.; Marcus, Z.; Chunharas, C. Bouba-Kiki: Cross-domain resonance and the origins of synesthesia, metaphor, and words in the human mind. In Multisensory Perception: From Laboratory to Clinic; Sathian, K., Ramachandran, V., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 3–40.
  29. Marks, L. The Unity of Senses: Interrelations Among the Modalities; Academic Press: New York, NY, USA, 1978; Available online: https://play.google.com/books/reader?id=qjq0BQAAQBAJ&pg=GBS.PA5 (accessed on 3 April 2026).
  30. Bruno, N.; Pavani, F. Perception: A Multisensory Perspective; Oxford University Press: New York, NY, USA, 2018.
  31. Auvray, M.; Spence, C. The multisensory perception of flavor. Conscious. Cogn. 2008, 17, 1016–1031.
  32. Caznok, Y.B. Música: Entre o Audível e o Visível; Editora Unesp: São Paulo, Brazil, 2008.
  33. Reinoso-Carvalho, F.; Dakduk, S.; Wagemans, J.; Spence, C. Not Just Another Pint! The Role of Emotion Induced by Music on the Consumer’s Tasting Experience. Multisens. Res. 2019, 32, 367–400.
  34. Spence, C.; Richards, L.; Kjellin, E.; Huhnt, A.-M.; Daskal, V.; Scheybeler, A.; Velasco, C.; Deroy, O. Looking for crossmodal correspondences between classical music and fine wine. Flavour 2013, 2, 29.
  35. Parise, C. Crossmodal Correspondences: Standing Issues and Experimental Guidelines. Multisens. Res. 2016, 29, 7–28.
  36. Bergantini, L. Multissensorialidade: Contribuições da Arte-Tecnologia a Partir do Caso do Festival Ars Electronica 2019. Ph.D. Thesis, Escola de Comunicações e Artes da Universidade de São Paulo, São Paulo, Brazil, 2021. Available online: https://www.teses.usp.br/teses/disponiveis/27/27159/tde-15022022-120744/en.php (accessed on 3 April 2026).
  37. Leote, R. (Ed.) Processos perceptivos e multissensorialidade: Entendendo a arte multimodal sob conceitos neurocientíficos. In ArteCiênciaArte; Editora UNESP: São Paulo, Brazil, 2015; pp. 23–44. Available online: https://books.scielo.org/id/mqfvk (accessed on 3 April 2026).
  38. Stein, B.; Burry, D.; Constantinidis, C.; Laurienti, P.; Meredith, M.; Perraut, T., Jr.; Ramachandran, R.; Röder, B.; Rowland, B.; Sathian, K.; et al. Semantic confusion regarding the development of multisensory integration: A practical solution. Eur. J. Neurosci. 2010, 31, 1713–1720.
  39. Parise, C.; Spence, C. Audiovisual cross-modal correspondences in the general population. In The Oxford Handbook of Synesthesia; Simner, J., Hubbard, E.M., Eds.; Oxford University Press: Oxford, UK, 2013; pp. 790–815.
  40. Spence, C. Crossmodal correspondences: A tutorial review. Atten. Percept. Psychophys. 2011, 73, 971–995.
  41. Brang, D.; Ramachandran, V.S. How do crossmodal correspondences and multisensory processes relate to synesthesia? In Multisensory Perception: From Laboratory to Clinic; Sathian, K., Ramachandran, V.S., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 259–281.
  42. Bragança, G. Relações entre Sensações Sinestésicas, Estados Emocionais e Estruturas Musicais. Ph.D. Thesis, Instituto de Ciências Biológicas da Universidade Federal de Minas Gerais, Belo Horizonte, Brazil, 2014. Available online: https://hdl.handle.net/1843/BUBD-9WMUJF (accessed on 3 April 2026).
  43. Ramachandran, V.S.; Hubbard, E.M. Hearing colors, tasting shapes. Sci. Am. Mind 2003, 16, 16–23.
  44. Eitan, Z. How pitch and loudness shape musical space and motion. In The Psychology of Music in Multimedia; Tan, S., Cohen, A., Lipscomb, S., Kendall, R., Eds.; Oxford University Press: Oxford, UK, 2013; pp. 165–191.
  45. Martino, G.; Marks, L. Synesthesia: Strong and weak. Curr. Dir. Psychol. Sci. 2001, 10, 61–65.
  46. Spence, C. Assessing the Role of Emotional Mediation in Explaining Crossmodal Correspondences Involving Musical Stimuli. Multisens. Res. 2020, 33, 1–29.
  47. Spence, C. Exploring Group Differences in the Crossmodal Correspondences. Multisens. Res. 2022, 35, 495–536.
  48. Cytowic, R.E.; Eagleman, D.M. Wednesday is Indigo Blue: Discovering the Brain of Synesthesia; Boston Review: Cambridge, MA, USA, 2009.
  49. Hauck, P.; Hecht, H. Having a Drink with Tchaikovsky: The Crossmodal Influence of Background Music on the Taste of Beverages. Multisens. Res. 2019, 32, 1–24.
  50. North, A. The effect of background music on the taste of wine. Br. J. Psychol. 2012, 103, 293–301.
  51. Wang, Q.; Mesz, B.; Riera, P.; Trevisan, M.; Sigman, M.; Guha, A.; Spence, C. Analysing the Impact of Music on the Perception of Red Wine via Temporal Dominance of Sensations. Multisens. Res. 2019, 32, 455–472.
  52. Blazhenkova, O.; Kumar, M.M. Angular versus curved shapes: Correspondences and emotional processing. Perception 2018, 47, 67–89.
  53. Liew, K.; Lindborg, P.; Rodrigues, R.; Styles, S. Cross-Modal Perception of Noise-in-Music: Audiences Generate Spiky Shapes in Response to Auditory Roughness in a Novel Electroacoustic Concert Setting. Front. Psychol. 2018, 9, 178.
  54. Spence, C.; Sathian, K. Audiovisual crossmodal correspondences: Behavioural consequences and neural underpinnings. In Multisensory Perception: From Laboratory to Clinic; Sathian, K., Ramachandran, V.S., Eds.; Academic Press: Cambridge, MA, USA, 2020; pp. 239–258.
  55. Ramachandran, V.S.; Hubbard, E.M. Synaesthesia: A window into perception, thought and language. J. Conscious. Stud. 2001, 8, 3–34.
  56. Athanasopoulos, G.; Antović, M. Conceptual integration of sound and image: A model of perceptual modalities. Music Sci. 2018, 22, 72–87.
  57. Köhler, W. Gestalt Psychology; Liveright: New York, NY, USA, 1929.
  58. Bremner, A.J.; Caparos, S.; Davidoff, J.; de Fockert, J.; Linnell, K.J.; Spence, C. “Bouba” and “Kiki” in Namibia? A remote culture make similar shape-sound matches, but different shape-taste matches to Westerners. Cognition 2013, 126, 165–172.
  59. Davis, R. The fitness of names to drawings: A cross-cultural study in Tanganyika. Br. J. Psychol. 1961, 52, 259–268.
  60. Padulo, C.; Tommasi, L.; Brancucci, A. Implicit Association Effects Between Sound and Food Images. Multisens. Res. 2018, 31, 779–791.
  61. Padulo, C.; Mangone, M.; Brancucci, A.; Balsamo, M.; Fairfield, B. Crossmodal congruency effects between sound and food pictures in a forced-choice task. Psychol. Res. 2021, 85, 2340–2345.
  62. London, J.; Burger, B.; Thompson, M.; Toiviainen, P. Speed on the dance floor: Auditory and visual cues for musical tempo. Acta Psychol. 2016, 164, 70–80.
  63. Kohn, D.; Eitan, Z. Moving music: Correspondences of musical parameters and movement dimensions in children’s motion and verbal responses. Music Percept. 2016, 34, 40–55.
Table 1. Key concepts in multisensory musical experience.
| Concept | Definition (short) | Population scope | Example |
|---|---|---|---|
| Sensory integration | Combination of information from different senses into one percept or unified event [2,38] | All individuals | Seeing and hearing a performer |
| Crossmodal correspondences | Non-arbitrary feature matchings across modalities [35,36,37,38,39,40] | All individuals (statistical) | High pitch and small visual size |
| Synesthesia | Involuntary, idiosyncratic concurrents triggered by inducers [2,41,42,43] | Minority (~4%) | Pitch and colour in synesthetes |
Table 2. Nature of crossmodal correspondence.
| Type | Basis | Example |
|---|---|---|
| Structural [39,40,55] | Neural architecture | Shared coding of intensity across senses |
| Statistical [35,39,40] | Environmental regularities | Large instruments and low sounds |
| Semantic [39,40] | Shared linguistic descriptors | “Bright” timbre and “bright” colour |
| Emotional [46,54] | Shared affective meaning | Sad music and dark desaturated colours |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nogueira, C.; Pereira, A.I.; Rodrigues, H. The Senses of Music: Towards a Theoretical Model of Multisensory Musical Experience. Encyclopedia 2026, 6, 94. https://doi.org/10.3390/encyclopedia6050094

