The Effects of Ethnically Congruent Music on Eye Movements and Food Choice—A Cross-Cultural Comparison between Danish and Chinese Consumers

Musical fit refers to the congruence between music and the attributes of a food or product in context, which can prime consumer behavior through semantic networks in memory. Most research on musical fit in a cultural context has thus far been limited to monocultural groups in field studies, where uncontrolled confounds can potentially influence the outcome. To overcome these limitations, and to explore the effects of ethnically congruent music on visual attention and food choice across cultures, the present study recruited 199 participants from China (n = 98) and Denmark (n = 101) for a laboratory food choice paradigm with eye-tracking data collection. Within each culture group, the study used a between-subject design in which half of the participants listened only to instrumental “Eastern” music and the other half only to instrumental “Western” music, while both groups performed a food choice task involving “Eastern” and “Western” food. Chi-square tests revealed a clear ethnic congruency effect between music and food choice across cultures, whereby Eastern (vs. Western) food was chosen more in the Eastern music condition, and Western (vs. Eastern) food was chosen more in the Western music condition. Furthermore, results from a generalized linear mixed model suggested that Chinese participants fixated more on Western (vs. Eastern) food when Western music was played, whereas Danish participants fixated more on Eastern (vs. Western) food when Eastern music was played. Interestingly, no such priming effects were found when participants listened to music from their own culture, suggesting that music-evoked visual attention may be culturally dependent. 
Collectively, our findings demonstrate that ambient music can have a significant impact on consumers’ explicit and implicit behaviors, while at the same time highlighting the importance of culture-specific sensory marketing applications in the global food industry.


Introduction
Sensory marketing can be defined as "marketing that engages the consumers' senses and affects their perception, judgment and behavior" [1]. In the context of food, researchers and practitioners have for decades used this idea to study and create optimal dining and retail atmospherics that facilitate consumption and sales [2,3]. Auditory contributions to these sensory nudges are an increasingly popular topic of research [4,5]. In particular, literature on ambient, atmospheric, or in-store […]
1. Eastern music would lead to more fixations on Eastern foods, and Western music would similarly lead to more fixations on Western foods.
2. These fixation patterns could congruently predict the choice of food, consistent with the "gaze bias theory" of visual attention and choice [38].
Importantly, despite previously reported cultural differences between Eastern and Western cultures in musical priming effects [23], our previous study, which similarly included Chinese and Danish participants but used taste-congruent soundtracks, did not find any major interaction effects between the music played and culture in the fixation and choice analyses [37]. Therefore, we did not a priori expect any cultural dependency with regard to either fixations or food choice. However, another study (not incorporating music) did find that Chinese (vs. American) participants had marginally more revisits, i.e., they alternated their fixations more between the food and the background when observing food images with varying background saliency [36]. The authors suggested that such a difference could be rooted in a generally more holistic (vs. analytic) worldview among Chinese participants. Accordingly, we also hypothesized that:
3. Chinese participants would have more revisits than Danish participants.

Participants
A total of 199 participants (98 Chinese and 101 Danish) took part in the main study. Two Chinese and four Danish participants were excluded from the analysis due to eye-tracking quality below 70%, resulting in a final sample of 193 participants: 96 Chinese (mean age ± SD = 22.16 ± 2.73 years; 65% female) and 97 Danish (mean age ± SD = 23.81 ± 2.85 years; 61% female). The Chinese participants were university students in Beijing, China, recruited through participant pools on the social media platforms WeChat and QQ. The Danish participants were college/university students in Aarhus, Denmark, recruited through flyers on the Aarhus University campus and through social media, including student Facebook groups. The study thus consisted of two identical experiments carried out in two different locations (Beijing, China and Aarhus, Denmark), with equivalent physical environments and experimental setups (see Section 2.4). All participants fulfilled the screening criteria and reported normal or corrected-to-normal hearing, and normal or corrected-to-normal vision without color blindness. Written informed consent was obtained from all participants. The study was approved by the Ethics Committee of Aarhus University (approval number: 2019-616-000019) and was conducted in accordance with the ethical standards laid out in the Declaration of Helsinki and by the Ethics Committee of the Institute of Psychology, Chinese Academy of Sciences. All participants were compensated for their participation (Chinese participants were paid via Alipay or WeChat Pay; Danish participants received a gift card).

Figure 1.
Example of a choice stimulus with two Western food items at the top and two Eastern food items at the bottom. Participants were instructed to choose, by mouse click, the one of the four food items that they most wanted to eat. The overlaid labels indicate the AOIs drawn in the iMotions software® (not shown to participants).

Auditory Stimuli
Two music excerpts were used in the study to represent "Eastern" and "Western" music, respectively. The Eastern excerpt was taken from Hong Ting's performance of Lotus out of water, a traditional Chinese composition, and the Western excerpt was Tal Farlow's interpretation of the jazz standard Autumn leaves. Each excerpt featured only a single plucked string instrument: the Chinese guzheng and the acoustic guitar, respectively. The tempo of both excerpts was normalized to 90 bpm on average using Audacity 2.3.2 for Mac OS version 16.33, and the excerpts were played from a Bose SoundLink Revolve Bluetooth speaker (Årslev, Denmark) at approximately 55 decibels to mimic a realistic setting with ambient music. We deliberately did not use headsets, as they would draw unnecessary attention to the sound component of the study and could distract participants from the food choice task. Previous studies have endorsed the use of speakers in experiments with food and music [37,41,42]. We controlled listening volume by always placing the Bluetooth speaker at the same position relative to the participant (i.e., 80 cm away). The excerpts can be heard at: https://soundcloud.com/danni-peng-li/sets/hongting-tal-farlow-90bpm.
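The original tempi of the two recordings are not reported, so the following is only a sketch of the arithmetic behind a tempo-only (time-stretch) normalization to 90 bpm, the kind of change Audacity's Change Tempo effect applies when given a percent change. The source tempi below are hypothetical placeholders, not measurements of the study's excerpts.

```python
def tempo_percent_change(source_bpm: float, target_bpm: float = 90.0) -> float:
    """Percent tempo change that moves an excerpt from source_bpm to target_bpm.

    Positive values speed the excerpt up; negative values slow it down.
    """
    if source_bpm <= 0 or target_bpm <= 0:
        raise ValueError("tempi must be positive")
    return (target_bpm / source_bpm - 1.0) * 100.0


def stretched_duration(duration_s: float, source_bpm: float, target_bpm: float = 90.0) -> float:
    """Playing time after the tempo change: the same beats at a faster tempo take less time."""
    return duration_s * source_bpm / target_bpm


# Hypothetical source tempi (placeholders, NOT the study's measured values).
for label, bpm in [("slower excerpt", 75.0), ("faster excerpt", 120.0)]:
    print(f"{label}: {tempo_percent_change(bpm):+.1f}% change, "
          f"60 s becomes {stretched_duration(60.0, bpm):.0f} s")
```

A tempo-only change of this kind leaves pitch unaffected, unlike Audacity's Change Speed effect, which alters tempo and pitch together.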

Design and Procedure
Participants were asked to refrain from consuming food and drinks (except water) for two hours prior to the study, in order to control for effects of hunger on food choice [43,44]. Participants entered an experiment room with artificial lighting (i.e., no sunlight) and were seated approximately 70 cm from an HP EliteDisplay E243i 24" 16:10 monitor (screen resolution of 1920 × 1080 pixels; Årslev, Denmark, and Beijing, China). They were instructed to minimize head movements during eye-tracking recording. Before the experiment began, the screen-based Aurora Smart Eye eye-tracker (Årslev, Denmark), with a 60 Hz sampling rate and a head rotation and gaze accuracy of 0.3°, was calibrated. The two sound conditions were assigned between participants, such that half of the Chinese and half of the Danish participants listened only to Eastern music, while the other half listened only to Western music. Music started at the beginning of the calibration period and played throughout the entire eye-tracking experiment of 20 trials. In each trial, participants were shown a menu card (see Section 2.2) and instructed to choose, by mouse click, the one of the four food items that they most wanted to eat at that moment, while eye-tracking data were collected. No time limit was set, but participants were asked to choose based on their initial thought. To control for exposure bias, a fixation cross that participants had to focus on was displayed for 2 s before each trial (Figure 2).
Participants completed a training session of four trials prior to actual data collection. Based on the food categories, two areas of interest (AOIs; Eastern AOI and Western AOI) were defined for each trial, as we were interested not in each specific food item per se, but in whether it was Eastern or Western (Figure 1). After the eye-tracking choice task, participants answered a follow-up questionnaire in order to control for potential biases in the eye-tracking results. This included three questions about each of the twenty food items (liking: anchored with "Dislike extremely" to "Like extremely"; familiarity: anchored with "Not familiar at all" to "Extremely familiar"; and healthiness: anchored with "Extremely unhealthy" to "Extremely healthy"), with ratings given on continuous 9-point scales (see Appendix A; Figure A1). Participants also listened to both music excerpts for 15 s, then rated their liking and indicated whether they believed each excerpt to be "Asian", "Western", or neither. The order of the food items and music excerpts was randomized for each participant. The whole study took approximately 30 min per participant. The experiment was designed and the data were preprocessed using iMotions® software (https://imotions.com/platform/).
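The between-subject music assignment and the per-participant trial order described above can be sketched as follows. This is not the iMotions study configuration: the participant and menu identifiers, the shuffle-based balanced assignment, and the fixed seeds are illustrative assumptions, and the four training trials are omitted.

```python
import random

MUSIC_CONDITIONS = ("Eastern", "Western")


def assign_music(participant_ids, seed=0):
    """Balanced between-subject assignment within one culture group.

    Half of the shuffled participants hear only Eastern music, the rest only
    Western music (with an odd group size, Western gets the extra participant).
    """
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {pid: MUSIC_CONDITIONS[0] if i < half else MUSIC_CONDITIONS[1]
            for i, pid in enumerate(ids)}


def trial_sequence(menu_ids, fixation_s=2.0, seed=0):
    """Randomized order of menu cards for one participant, each preceded by a fixation cross."""
    rng = random.Random(seed)
    order = list(menu_ids)
    rng.shuffle(order)
    steps = []
    for menu in order:
        steps.append(("fixation_cross", fixation_s))
        steps.append(("menu_card", menu))  # participant clicks one of four items
    return steps


# Hypothetical identifiers for one culture group and the 20 menu cards.
chinese_ids = [f"CN{i:03d}" for i in range(1, 97)]
menus = [f"menu_{i:02d}" for i in range(1, 21)]
conditions = assign_music(chinese_ids, seed=1)
first_participant_trials = trial_sequence(menus, seed=1)
```

Seeding each participant's shuffle separately keeps the orders reproducible while still differing between participants.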

Figure 2.
Example of two trials. Before each trial, a fixation cross is presented for 2 s before participants make a food choice while eye-tracking is recorded. The trials were randomized between participants.

Eye-Tracking Metrics
One gaze point equals one raw sample captured by the eye-tracker. At a sampling rate of 60 Hz, each gaze point represents one sixtieth of a second (16.67 ms). A fixation is a cluster of gaze points that are close in both time and space (i.e., when the eyes are held on a specific entity). The typical duration of a fixation is 100-300 ms. Total fixation time is the summed duration of a participant's fixations (excluding data points between fixations). Fixation count is the number of fixations recorded within an AOI. Revisit count is the number of returns to an AOI (i.e., a participant fixates on an AOI, fixates somewhere else, and then returns to the same AOI). We initially focused on these three metrics as they are widely used in eye-tracking research [45,46]. In preprocessing, a Velocity-Threshold Identification (I-VT) fixation filter was applied, and fixations shorter than 60 ms were removed from the analysis. See Appendix A (Figure A2) for an example of a heatmap based on fixation values.
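To make these definitions concrete, the sketch below groups 60 Hz gaze samples into fixations with a basic I-VT rule, drops fixations shorter than 60 ms, and derives fixation count, total fixation time, and revisit count per AOI. It is not the iMotions implementation: the 30°/s velocity threshold (a commonly used value), the assumption that gaze is already expressed in degrees of visual angle, and the majority-vote AOI assignment are simplifying assumptions.

```python
import math
from collections import Counter

SAMPLE_RATE_HZ = 60
SAMPLE_MS = 1000 / SAMPLE_RATE_HZ  # one gaze point lasts 16.67 ms at 60 Hz


def ivt_fixations(samples, velocity_threshold=30.0, min_duration_ms=60.0):
    """Simplified I-VT filter.

    samples: list of (x_deg, y_deg, aoi) gaze points in degrees of visual angle,
             where aoi is "Eastern", "Western", or None (outside both AOIs).
    Returns a list of (aoi, duration_ms) fixations in temporal order.
    """
    groups, current = [], []
    for prev, cur in zip(samples, samples[1:]):
        # angular velocity between consecutive samples (degrees per second)
        velocity = math.dist(prev[:2], cur[:2]) * SAMPLE_RATE_HZ
        if velocity < velocity_threshold:
            if not current:
                current.append(prev)
            current.append(cur)
        elif current:  # a saccade ends the running fixation
            groups.append(current)
            current = []
    if current:
        groups.append(current)

    fixations = []
    for group in groups:
        duration = len(group) * SAMPLE_MS
        if duration >= min_duration_ms:  # discard fixations shorter than 60 ms
            aoi = Counter(s[2] for s in group).most_common(1)[0][0]
            fixations.append((aoi, duration))
    return fixations


def aoi_metrics(fixations, aois=("Eastern", "Western")):
    """Fixation count, total fixation time (ms), and revisit count per AOI."""
    metrics = {a: {"fixation_count": 0, "total_fixation_ms": 0.0, "revisit_count": 0}
               for a in aois}
    visits = Counter()
    previous = object()  # sentinel: no fixation seen yet
    for aoi, duration in fixations:
        if aoi in metrics:
            metrics[aoi]["fixation_count"] += 1
            metrics[aoi]["total_fixation_ms"] += duration
            if aoi != previous:  # entering the AOI from elsewhere starts a new visit
                visits[aoi] += 1
        previous = aoi
    for a in aois:
        metrics[a]["revisit_count"] = max(visits[a] - 1, 0)  # first visit is not a revisit
    return metrics
```

For example, the fixation sequence Western, Western, Eastern, Western, (outside), Western gives the Western AOI four fixations but only two revisits, because consecutive fixations inside the same AOI belong to one visit.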

Statistical Analysis
All data was organized in Microsoft Excel for Mac OS version 16.33, and the analyses were performed in RStudio for Mac OS version 1.3.959.
Analysis of choice data was performed using chi-square independence tests to examine the relation between the music conditions (Eastern vs. Western) and food choice (Eastern vs. Western) for each culture. Yates' continuity correction was applied to all chi-square tests.
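For reference, the Yates-corrected statistic for a 2 × 2 table (music condition × food category chosen) can be computed by hand as below. This is a sketch rather than the authors' R code, the counts are invented for illustration, and the p-value uses the fact that a chi-square variable with one degree of freedom has survival function erfc(√(χ²/2)).

```python
import math


def chi_square_yates(table):
    """Chi-square test of independence for a 2x2 table with Yates' continuity correction.

    table: [[a, b], [c, d]], e.g. rows = music condition, columns = food category chosen.
    Returns (chi2, p) with 1 degree of freedom.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals, col_totals = (a + b, c + d), (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink |O - E| by 0.5 (but not below zero) before squaring
            chi2 += max(abs(observed - expected) - 0.5, 0.0) ** 2 / expected
    p = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square with df = 1
    return chi2, p


# Invented counts: rows = Eastern / Western music, columns = Eastern / Western food chosen.
chi2, p = chi_square_yates([[30, 10], [10, 30]])
```

For this invented table every expected count is 20, so the corrected statistic is 4 × 9.5² / 20 = 18.05. R's `chisq.test` applies the same correction to 2 × 2 tables by default.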
Eye-tracking data were analyzed using generalized linear mixed models (GLMMs) via the glmer() function of the lme4 package in R. GLMMs have been extensively employed in eye-tracking studies with repeated-measures designs [47][48][49], as they model the response as a combination of fixed and random effects and account for the hierarchical structure and non-independence of observations within individual participants by adding random intercepts [50]. They are also versatile in the types of response variables they accommodate: Gaussian (total fixation time), Poisson (fixation/revisit count), and binary (choice).
For the current eye-tracking analyses, the independent variables were "culture" (between-subject factor; Chinese vs. Danish), "music condition" (between-subject factor; Eastern vs. Western), and "AOI" (within-subject factor; Eastern vs. Western), all coded as fixed effects. "Participant ID" and "Trial number" entered the model as random effects. The dependent variables of interest initially included total fixation time, fixation count, and revisit count. Omnibus tests were carried out to test the main effects of, and interactions between, the fixed independent variables. If the GLMM indicated a significant interaction, follow-up Bonferroni-corrected post hoc t-tests were performed on the simple main effects. The eye-tracking metrics were furthermore entered as predictors in a multiple logistic regression on the binary variable of food choice (chosen = 1; not chosen = 0), to assess the relationship between eye movements and eventual food choice regardless of food type.
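A minimal sketch of the Bonferroni adjustment used for the post hoc simple-effect tests follows, assuming two comparisons per reduced model (one per music condition); the number of comparisons the authors corrected for is an assumption here, and the raw p-values are made up. Capping at 1 is how adjusted values such as p = 1.000 can arise.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each raw p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(p * m, 1.0) for p in p_values]


# Hypothetical raw p-values for two simple main effects (AOI within each music condition).
raw = [0.011, 0.55]
adjusted = bonferroni(raw)
significant = [p < 0.05 for p in adjusted]
```

Comparing the adjusted p-values with 0.05 is equivalent to comparing the raw p-values with 0.05 / m.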
Follow-up questionnaires were analyzed using one-way multivariate analysis of variance (MANOVA) to assess culture-specific differences in liking, familiarity, and healthiness ratings of the composite variables Eastern food (the average score of the ten Eastern food items) and Western food (the average score of the ten Western food items), as well as in liking of the two music excerpts. If a global effect was found, Bonferroni-corrected univariate one-way analyses of variance (ANOVA) were then performed on each dependent variable. In addition, two-tailed paired t-tests were performed to test for within-culture differences in liking, familiarity, and healthiness ratings between Eastern and Western food, and in liking between Eastern and Western music.
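The composite scores and the within-culture paired comparison can be sketched as follows. The ratings are invented, and only the t statistic and its degrees of freedom are computed, since the Python standard library has no t distribution (SciPy's `scipy.stats.ttest_rel` would supply the two-tailed p-value). With n participants the test has n − 1 degrees of freedom, matching the t(95) and t(96) reported for the 96 Chinese and 97 Danish participants.

```python
import math
from statistics import mean, stdev


def composite_score(item_ratings):
    """Average one participant's ratings over the ten items of a food category."""
    return mean(item_ratings)


def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two measures from the same participants."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length samples with at least two pairs")
    diffs = [a - b for a, b in zip(x, y)]
    # standard error = sample SD of the differences / sqrt(n)
    standard_error = stdev(diffs) / math.sqrt(len(diffs))
    return mean(diffs) / standard_error, len(diffs) - 1


# Invented 9-point liking ratings for four participants (ten items per category).
eastern = [composite_score(r) for r in ([6] * 10, [7] * 10, [8] * 10, [9] * 10)]
western = [composite_score(r) for r in ([5] * 10, [6] * 10, [6] * 10, [7] * 10)]
t, df = paired_t(eastern, western)
```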

Eye-Tracking
A summary of the omnibus eye-tracking results is shown in Table 1 and explained further in this section. To avoid redundancy and to allow a clearer visual interpretation of the differences between the two cultures (as for the choice results), the subsequent post hoc analyses were based on two reduced models, one for Chinese and one for Danish participants, each with two independent variables (music condition and AOI). Furthermore, because total fixation time and fixation count tend to be highly correlated [51], we ran a correlation analysis on these two variables and found that they were indeed highly correlated for both Chinese (r = 0.88; t(3838) = 113; p < 0.001) and Danish (r = 0.87; t(3878) = 111; p < 0.001) participants. Therefore, to avoid statistical redundancy, we focused on fixation count (instead of total fixation time) and revisit count. Excluding total fixation time also avoids potential multicollinearity in our regression model [52]. All reported post hoc analyses are Bonferroni-corrected for multiple comparisons and are explained in the following sections.
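The reported t values follow from the standard test of a Pearson correlation, t = r√(n − 2) / √(1 − r²) with n − 2 degrees of freedom. As a rough consistency check (an inference, not a statement from the authors), n = 96 × 20 × 2 = 3840 AOI-level observations for the Chinese group and 97 × 20 × 2 = 3880 for the Danish group reproduce the reported degrees of freedom of 3838 and 3878. Because t is very sensitive to r in this range, the rounded coefficients give roughly 115 and 110, whereas unrounded values of about 0.877 and 0.872 give the reported 113 and 111.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """t statistic (df = n - 2) for testing H0: rho = 0."""
    df = n - 2
    return r * math.sqrt(df) / math.sqrt(1.0 - r * r), df


# Observation counts inferred as participants x 20 trials x 2 AOIs (an assumption).
for label, r, n in [("Chinese", 0.88, 96 * 20 * 2), ("Danish", 0.87, 97 * 20 * 2)]:
    t, df = t_from_r(r, n)
    print(f"{label}: r = {r}, t({df}) = {t:.1f}")
```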

Fixation Count-Chinese Participants
For Chinese participants, an interaction effect in fixation count between music and AOI (z(3834) = 5.25; p < 0.001; Table 1) was detected. Post hoc analyses on the simple main effects showed that participants listening to Western music fixated significantly more on Western (vs. Eastern) food items (t(959) = 1.93; p = 0.022), but Eastern music did not lead to any differences (t(959) = −2.55; p = 0.110; Figure 4A).

Fixation Count-Danish Participants
For Danish participants, an interaction effect in fixation count between music and AOI (z(3874) = 5.26; p < 0.001; Table 1) was observed. Post hoc analyses on the simple main effects showed that participants listening to Eastern music had significantly more fixations on Eastern (vs. Western) food items (t(1906) = 3.98; p < 0.001), but no differences were detected in the group only listening to Western music (t(1956) = −0.36; p = 1.000; Figure 4C).

Revisit Count-Chinese Participants
For Chinese participants, an interaction effect between music condition and AOI was found (z(3834) = 3.41; p < 0.001; Table 1). Post hoc analyses on the simple main effects indicated that participants listening to Western music revisited the Western (vs. Eastern) food significantly more (t(959) = −2.35; p = 0.038), but no differences were observed in the participants listening to Eastern music (t(959) = 1.89; p = 0.119; Figure 4B).

Revisit Count-Danish Participants
No main or interaction effects on revisit count were observed for the Danish participants (Table 1; Figure 4D).

Relationship between Food Choice and Eye Movements
The multiple logistic regression model for Chinese participants showed that both fixation count (z(3834) = 17.87; p < 0.001) and revisit count (z(3834) = −1.98; p = 0.048) contributed significantly to food choice (Table 2). Correspondingly, for Danish participants, both fixation count (z(3834) = 18.84; p < 0.001) and revisit count (z(3834) = −3.96; p < 0.001) predicted food choice. For both cultures, fixation count was the strongest predictor, meaning that AOIs that were fixated more often were correspondingly chosen more often.

Music Excerpts
The MANOVA (Figure 6) revealed a significant difference between the two cultures on the combined dependent variables (liking of the Eastern music and liking of the Western music; F(1, 191) = 4.94; p = 0.008). Post hoc univariate ANOVAs showed that liking of the Western music contributed significantly to the global effect. Specifically, Danish participants liked the Western music significantly more than Chinese participants (t(191) = −7.30; p < 0.001), but no difference was observed with regard to the Eastern music (t(191) = 1.93; p = 0.11). The within-culture two-tailed paired t-tests showed no difference in liking between Eastern and Western music for Chinese participants (t(95) = −0.64; p = 0.263), but Danish participants liked the Western (vs. Eastern) music significantly more (t(96) = 4.00; p < 0.001). Finally, 97% and 85% of the participants categorized the Eastern excerpt as Eastern and the Western excerpt as Western, respectively.

In terms of familiarity, Chinese (vs. Danish) participants rated Eastern food as more familiar (t(191) = 7.98; p < 0.001), and conversely, Danish participants were more familiar with Western food than Chinese participants (t(191) = 7.98; p < 0.001). With regard to healthiness ratings, Chinese participants rated Eastern food as significantly healthier than Danish participants did (t(191) = 3.35; p = 0.002), but no culture-based difference was detected for Western food (t(191) = −0.03; p = 1.000).

Discussion
Although the ideas and applications of sensory marketing have been employed widely by practitioners in small- to large-scale businesses [2,53], and studies have explored the effects of musical fit on consumer behavior [11,25,26], this is the first study that focuses on the ethnic congruity between "East" and "West" in terms of participants, music, and food. By using eye-tracking to quantify visual attention during a laboratory-based food choice task, our study has also circumvented the uncertainties of previous experiments that focused exclusively on the choice itself in uncontrolled environments.
Consistent with our hypothesis 1, effects of musical fit on visual attention and food choice were observed. The chi-square results likewise showed a clear congruence between what participants chose and what they listened to. In line with previous literature [18,26], both Chinese and Danish participants chose significantly more Eastern (vs. Western) food in the Eastern music condition and more Western (vs. Eastern) food in the Western music condition. Not surprisingly, Chinese participants were more familiar with the Eastern food items, and Danish participants with the Western ones.
Interestingly, and partially in contrast to our hypothesis 1, our results showed that participants' fixations were only affected by music that was ethnically different from their own culture. That is, Chinese participants fixated more on Western food when Western music was played (with no difference for participants listening to Eastern music), whereas Danish participants fixated more on Eastern food when Eastern music was played (with no difference for participants in the Western music condition). Yeoh and North [26] found that musical fit effects in Asian participants only occurred when there was no clear preference between the competing foods. Our findings suggest, however, that musical fit influenced visual attention only when the music was incongruent with participants' ethnic culture, regardless of food preference within the two cultures. In other words, we still observed priming effects for music from a culture different from participants' own, despite differences in food preference.
Importantly, in the domain of visual attention, these effects were more prominent for fixation count than for revisit count, as indicated by the lower β estimates and higher p-values of revisit count in the regression analyses. In fact, fixation count was the strongest predictor of food choice: each additional fixation increased the log odds of choice (vs. no choice) by 0.25 and 0.26 for Chinese and Danish participants, respectively. Manippa and colleagues [54] similarly found that their visual primes did not influence participants' food choices, but that dwell time and fixation count were higher for chosen than for non-chosen foods. Consequently, in line with our hypothesis 2, people simply look more at the food that is eventually chosen, consistent with previous studies [55,56] and supporting the "gaze bias theory" [38]. Notably, some evidence suggests that increased visual attention to a product does not always translate into choice in practice [57,58].
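Converting the reported log-odds coefficients to odds ratios may make them easier to read: exp(0.25) ≈ 1.28 and exp(0.26) ≈ 1.30, so each additional fixation on an AOI multiplies the odds of choosing it by roughly 1.3, holding revisit count constant. The sketch below shows this conversion; the intercept used in the probability example is hypothetical, since intercepts are not reported in this section.

```python
import math


def odds_ratio(beta):
    """Multiplicative change in the odds of choice per one-unit increase in the predictor."""
    return math.exp(beta)


def choice_probability(intercept, beta, fixation_count):
    """Logistic model: P(chosen) = 1 / (1 + exp(-(intercept + beta * fixation_count)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + beta * fixation_count)))


def odds(p):
    """Convert a probability to odds."""
    return p / (1.0 - p)


for culture, beta in [("Chinese", 0.25), ("Danish", 0.26)]:
    print(f"{culture}: odds ratio per extra fixation = {odds_ratio(beta):.3f}")

# Hypothetical intercept of -2.0, chosen only for illustration.
p4 = choice_probability(-2.0, 0.25, 4)
p5 = choice_probability(-2.0, 0.25, 5)
```

The odds ratio is constant across fixation counts, but the corresponding change in probability is not: it depends on the baseline probability set by the intercept and the other predictors.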
Taken together, these results suggest that ethnically congruent music can impact consumer choice as well as visual attention in both Western and Eastern cultures, possibly by creating referential meaning through semantic connotations. On a broader level, our findings demonstrate that not only extensively studied extrinsic product factors such as packaging and label design [33,59], but also contextual, ambient stimuli unrelated to the product itself, can influence consumers' visual attention.
In light of the ongoing discussion on stereotypically reported cultural differences in human cognition, in which the "East" is more collectivistic and focuses on structural determinants of behavior while the "West" relies more on analytical and individualistic processing and reasoning [60][61][62], none of our findings clearly reflected this view. Interestingly, with regard to revisit count, only Chinese participants listening to Western music had more revisits to Western (vs. Eastern) food items, which is in line with our hypothesis 3. One may argue that this reflects Easterners' more "holistic" viewing patterns, as they alternate their fixations more between the AOIs. Although the effect was not significant, Zhang and Seo [36] also found a trend for Chinese (vs. American) participants to make more revisits between the food and the background, particularly as background saliency increased, whereas Americans tended to fixate more on the individual focal food objects. Hence, when background cues (whether table and plate decorations or music) are more salient or different from the norm (such as instrumental Western jazz), they may prompt more visually explorative behavior in individuals of Eastern ethnicity, as previously reported [61,63].
On the other hand, these subtle differences may simply be due to the lower familiarity and higher palatability ratings of the Western food, and/or greater hesitation in making the choice. Moreover, since Chinese participants fixated more on Western food in the Western music condition, even though there was no clear preference between the two music excerpts, the increased revisits are arguably just the product of musical fit and choice hesitation. That is, when participants were exposed to Western music, more revisits to Western food likely reflect predominantly deliberate "system 2" thinking, as opposed to decisions based on more immediate "system 1" processes [64,65].
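The revisit argument above rests on how revisits accumulate when gaze alternates between AOIs. The sketch below uses one common operationalization, which is assumed here and not taken from the eye-tracking software used in the study. Consecutive fixations on the same AOI form a single visit, fixations outside all AOIs are labeled `None`, and every visit to an AOI after the first counts as a revisit. The AOI labels "W" (Western food) and "E" (Eastern food), the function name, and the scanpaths are hypothetical.

```python
from collections import Counter
from itertools import groupby
from typing import Optional, Sequence


def revisit_counts(aoi_sequence: Sequence[Optional[str]]) -> dict:
    """Return the number of revisits per AOI for one trial's fixation sequence.

    Consecutive fixations on the same label collapse into one visit; fixations
    outside any AOI (None) end a visit but are not counted themselves.
    """
    visits = [aoi for aoi, _ in groupby(aoi_sequence) if aoi is not None]
    return {aoi: n - 1 for aoi, n in Counter(visits).items()}


# Two hypothetical scanpaths with identical fixation counts per AOI (W: 3, E: 2)
alternating = ["W", "E", "W", "E", "W"]  # gaze switches back and forth
focal = ["W", "W", "W", "E", "E"]        # gaze dwells on one item, then moves on

print(revisit_counts(alternating))  # {'W': 2, 'E': 1}
print(revisit_counts(focal))        # {'W': 0, 'E': 0}
```

Both scanpaths contain the same number of fixations on each item, so fixation count cannot distinguish them. Revisit count, by contrast, captures the back-and-forth pattern associated with more explorative viewing.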

Limitations
It should be noted that the affective and cognitive processes mentioned above may depend heavily on other factors that were not accounted for in the current study. For instance, we did not control for personality traits or BMI, and collected background information only on age, gender, and culture. It has previously been documented that individuals with lower self-esteem, poorer foresight, and more impulsive behavior, as observed in, e.g., obese participants, are more susceptible to being influenced by sensory nudges [20,66]. In contrast, a recent study revealed that visual nudging interventions were effective regardless of impulsivity traits [67], suggesting that at least some forms of nudging apply consistently across the consumer spectrum. Nonetheless, a more refined stratification of groups based on individual differences, in addition to cultural background, would have strengthened the current study.
Along the same lines, one may argue that the demographic category "Chinese" is too broad, as there is enormous cultural variation within China itself [68]. However, our participants were recruited only from universities in Beijing, China, and had therefore lived in Beijing at least during their university years. In addition, the majority of Chinese citizens are still influenced by the same Confucian ideologies of collectivism and holism [69], and in any case, the within-China diversity is arguably smaller than the difference between China and Denmark, or between "East" and "West".
That said, despite the similar demographic background (age and gender distribution) of the Chinese and Danish participants, one should be cautious when comparing responses between these two groups, given differences in response style and conceptualization [70]. For example, we cannot know for sure that subjective ratings such as music liking reflected participants' actual preferences. Although the two cultures reportedly did not differ in their ratings of the Eastern music, implicit electrophysiological measures could have been used to validate these responses [71]. This could perhaps clarify whether the auditory stimuli also generated embodied meaning (i.e., spontaneous feelings) through affective priming, and not solely referential meaning through semantic congruency and memory. In other words, we would be able to test whether the music also induced emotional and affective responses in addition to semantic congruency. In particular, if one music excerpt had been clearly preferred over the other, this might have been a decisive factor detectable by (neuro)physiological measures, leading to differences in reward valuation and emotional regulation, as observed by Salimpoor and colleagues [72]. This could in turn modify the hedonic experience of the food items through so-called "sensation transference" [73] and consequently affect visual attention and eye movements.
In addition, we did not explicitly ask participants about their familiarity with the music excerpts. Such information would add to the understanding of the referential meaning of the music retrieved from memory and prior experience. Instead, we asked whether they considered the music to be "Eastern" or "Western", which at least confirmed that the majority of participants perceived the intended ethnic congruency.
Finally, our study involved a terminological/technological uncertainty, namely the quantitative inference of visual attention. Although there is a strong association between eye movements and attention [74], we must concede that fixating on an object does not by itself imply that the object is perceptually or cognitively processed in the brain [46]. To mitigate this limitation, we discarded fixations shorter than 60 ms using the I-VT fixation filter during preprocessing [75].
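As a rough illustration of this preprocessing step, the sketch below classifies gaze samples with a simple velocity threshold, which is the core idea of I-VT, and then drops candidate fixations shorter than 60 ms. It is a simplified sketch and not the filter implementation used in the study. Several details are assumptions: the 30°/s velocity threshold, the precomputed per-sample angular velocities, the function and class names, and the sample data. Steps that production filters typically add, such as gap fill-in, noise reduction, and merging of adjacent fixations, are omitted. Fixation duration is approximated as the time between the first and last sample of a run.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Fixation:
    start_ms: float
    end_ms: float

    @property
    def duration_ms(self) -> float:
        return self.end_ms - self.start_ms


def ivt_fixations(
    timestamps_ms: Sequence[float],
    velocities_deg_s: Sequence[float],
    velocity_threshold: float = 30.0,  # assumed threshold in degrees/second
    min_duration_ms: float = 60.0,     # minimum fixation duration from the text
) -> List[Fixation]:
    """Group below-threshold samples into fixations and drop short ones."""
    candidates: List[Fixation] = []
    run_start = None  # index of the first sample in the current slow run
    for i, v in enumerate(velocities_deg_s):
        if v < velocity_threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            # A fast (saccadic) sample closes the current fixation candidate
            candidates.append(Fixation(timestamps_ms[run_start], timestamps_ms[i - 1]))
            run_start = None
    if run_start is not None:
        # Close a run that lasts until the end of the recording
        candidates.append(Fixation(timestamps_ms[run_start], timestamps_ms[-1]))
    return [f for f in candidates if f.duration_ms >= min_duration_ms]


# Hypothetical 100 Hz recording: three slow runs separated by fast samples
ts = list(range(0, 200, 10))
vel = [5.0] * 8 + [200.0] * 2 + [5.0] * 3 + [200.0] + [5.0] * 6
print(ivt_fixations(ts, vel))
```

In this toy recording, the three slow runs last about 70, 20, and 50 ms, so only the first survives the 60 ms cutoff.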

Conclusions
To our knowledge, the present study is the first to explore the implicit behavioral effects of ethnically congruent music in Eastern and Western participants. Our findings suggest that musical fit influences visual attention and food choice in both cultures studied. However, the congruency bias is not uniform for Chinese and Danish participants. Instead, the visual saliency of specific food categories is arguably more prominent when the music excerpts are incongruent with one's cultural background. In other words, the musical fit effect seems to work best with relatively unfamiliar food/music combinations. To increase reliability and confidence for practitioners and stakeholders, future studies should aim to replicate these findings with various types of consumers and musical