Communication
Peer-Review Record

Musical Training Improves Audiovisual Integration Capacity under Conditions of High Perceptual Load

by Jonathan M. P. Wilbiks * and Courtney O’Brien
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 25 October 2019 / Revised: 2 January 2020 / Accepted: 17 January 2020 / Published: 24 January 2020
(This article belongs to the Special Issue Multisensory Modulation of Vision)

Round 1

Reviewer 1 Report

The authors aimed to test the hypothesis that elements of musicality in participants can contribute to variation in audiovisual integration capacity. Participants watched a series of visual displays and judged which elements of the display changed when a tone was presented. Participants’ integration capacity estimates were correlated with three elements of musicality. They found audiovisual integration capacity was positively correlated with the amount of musical training in the shortest SOA condition.

The study is well conducted. My main concern is that the finding reported in the current study is rather limited. The reported correlation between integration capacity and amount of musical training is relatively weak: it is the only significant one (p = 0.038) among 9 correlation tests. This result will not be significant if we apply standard methods to adjust the p-values of multiple correlation coefficients. At the very least, the authors could report the effect size.

The removal of the 20% of participants who did not perform above chance is questionable, although the authors justify it on the grounds that such data management practices were used in earlier studies. I wonder what the musicality of these removed participants was.

How reliable is the Goldsmiths Musical Sophistication Index? Why is the General Musical Sophistication factor not included in the correlational analysis?

Another minor point is that the introduction needs to be more focused.

Author Response

REVIEWER 1

 

Comments and Suggestions for Authors

The authors aimed to test the hypothesis that elements of musicality in participants can contribute to variation in audiovisual integration capacity. Participants watched a series of visual displays and judged which elements of the display changed when a tone was presented. Participants’ integration capacity estimates were correlated with three elements of musicality. They found audiovisual integration capacity was positively correlated with the amount of musical training in the shortest SOA condition.

The study is well conducted. My main concern is that the finding reported in the current study is rather limited. The reported correlation between integration capacity and amount of musical training is relatively weak: it is the only significant one (p = 0.038) among 9 correlation tests. This result will not be significant if we apply standard methods to adjust the p-values of multiple correlation coefficients. At the very least, the authors could report the effect size.

In looking into this issue, we also found that we should have used a Spearman correlation rather than a Pearson correlation. We therefore now report the Spearman correlations in the paper. We also now adjust our significance threshold to account for the fact that each individual metric is tested in 3 correlations. As such, the adjusted threshold we use is p < .017.
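As a rough illustration of the switch described in this response, the sketch below computes a Spearman coefficient as a Pearson correlation on ranks. It is not the authors' analysis code: the function names and the `training` and `capacity` values are hypothetical, chosen only to show that a monotonic but non-linear relationship gives a Spearman coefficient of 1 while the Pearson coefficient stays below 1.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical scores, for illustration only (not data from the study).
training = [7, 12, 20, 31, 45]
capacity = [0.8, 0.9, 1.1, 1.2, 3.0]

rho = spearman(training, capacity)
r = pearson(training, capacity)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Because Spearman's coefficient depends only on rank order, it makes no assumption of linearity or normality in the raw scores, which is the usual motivation for preferring it when those assumptions are doubtful.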

The removal of the 20% of participants who did not perform above chance is questionable, although the authors justify it on the grounds that such data management practices were used in earlier studies. I wonder what the musicality of these removed participants was.

As we acknowledge in the manuscript, we also regret this, but we wished to continue our usual data management practices. We also note that there was no significant difference in musicality between included and removed participants, and that those participants that were removed seemed to be responding randomly, rather than according to the task.

How reliable is the Goldsmiths Musical Sophistication Index? Why is the General Musical Sophistication factor not included in the correlational analysis?

The Gold-MSI is a highly reliable measure, with Cronbach’s α of .872 for active engagement, .873 for perceptual abilities, and .903 for musical training, and test-retest correlations of .899, .894, and .974, respectively. (This text has been added to section 2.2 of the manuscript as well.)

We did not use General Musical Sophistication because it functions as a composite score of the six subscores of the MSI (including the three we already used).

Another minor point is that the introduction needs to be more focused.

We have attempted to tighten up the introduction.

Reviewer 2 Report

Review of Vision MS # 638354: “Musical training improves audiovisual integration capacity under conditions of high perceptual load”

 

This manuscript describes a study examining the degree to which an individual’s musical training and experience influences her/his capacity for integrating audio-visual stimuli. Specifically, a capacity measure is employed to determine if musically trained observers can more readily track and integrate multisensory input across multiple objects than those who have less musical experience. This is, in general, a very well written manuscript addressing an interesting question. It makes a useful contribution to our understanding of the constraints on multisensory integration, and is a logical study in a series being conducted by the authors focused more generally on determining those factors that contribute to an individual’s ability to integrate multisensory stimuli. As such, I believe this manuscript is suitable for publication in the journal Vision. However, I feel that some of the concerns outlined below should be addressed in order to make it a stronger paper.

 

Some General concerns:

 

There are some areas where the manuscript would benefit from a bit more clarity in the description of, in particular, methods. For example, on Line 64, the phrase “Different dots change a total of ten times” might be more clearly stated as “There are 10 intervals in which such polarity changes could occur….”. Similarly, on Line 84, the authors state that participants were “…unable to integrate more than one visual stimulus” – presumably they are referring to the integration of multiple visual stimuli with an auditory stimulus, but this is not stated as such.

 

There are some inconsistencies in tense usage throughout the document (e.g., on Line 61, in describing results from a series of studies previously conducted, the authors state “participants are”, which should be “participants were”). The manuscript should be reviewed and these inconsistencies addressed.

 

Finally, the title includes the term “high perceptual load”, referring to the fact that it is only at the fastest stimulus presentation rates (where presumably perceptual load is greatest) that there is a correlation between musical training and capacity. The authors should come back to this point at the end of the document, and perhaps should point to studies that explicitly suggested faster presentation rates led to greater perceptual load.

 

 

SPECIFIC COMMENTS, INCLUDING MINOR EDITORIAL NOTES

Abstract

Line 21: insert “been” between “have” and “previously”

 

Introduction

Line 25: insert comma after “continue to be”

Line 26: Insert period after “psychological research”

            Insert comma after Miller [1]

Line 121: “non-musicians tempora…” → “non-musicians’ temporal…”

Line 125: insert “impact of” between “the” and “years”

Line 129: “memories” → “memorize”

Line 139: consider inserting “from our laboratory” after the word “literature”

 

 

 

Method

Participants

Line 160: data was → data were

 

Materials

Lines 181-182: hyphenate stimulus modifiers (60-ms; 400-Hz; 5-ms)

            on-set → onset

 

Procedure

Line 193: “12 individual conditions of stimulus were created…” → “Twelve individual stimulus conditions were created…”

 

In Figure 1, the example trial provided is one in which the probe dot location did not change. It is suggested that the authors either explain this further or (and?) employ a trial example in which the probe dot is indicating the location of a change. It might help the reader to better understand the design.

 

Results

Line 220: Consider adding the symbols associated with the variables. That is, “the number of visual stimuli (n)…capacity (K)”.

Line 221: the use of the term “perfect” to describe probability is unusual. Is there a better term that can be used?

 

Line 232: The authors state an “increase in capacity with decreasing SOAs”. Assuming I am understanding the data, it appears in Figure 2 that capacity (K) is increasing with increasing SOAs, which is what one might predict. If this is not the case, the authors need to be more clear in the description of conditions. More generally speaking, the use of the term SOA may be a bit misleading as it may suggest an asynchrony across modalities. Indeed, the authors refer alternately to SOA and visual stimulus presentation rate, so why not merely use the latter term? In either case, consistency is preferred.

 

I’m also curious about why the authors did not test a ‘no audio cue’ condition as a baseline. Although it may be difficult to know when the potential change was to take place, particularly at the fastest presentation rates, the visual configurations of interest always occurred at the same place in the sequence. There was no uncertainty about that. However, it may be that the task would simply be impossible.

 

One might assume from the description that each of the 10 presentations of the dot grid remained on (for the duration associated with the stimulus presentation rate) until the next dot grid was presented. Is that the case, or was there a blank period inserted between each presentation?

Author Response

REVIEWER 2

Comments and Suggestions for Authors

Review of Vision MS # 638354: “Musical training improves audiovisual integration capacity under conditions of high perceptual load”

This manuscript describes a study examining the degree to which an individual’s musical training and experience influences her/his capacity for integrating audio-visual stimuli. Specifically, a capacity measure is employed to determine if musically trained observers can more readily track and integrate multisensory input across multiple objects than those who have less musical experience. This is, in general, a very well written manuscript addressing an interesting question. It makes a useful contribution to our understanding of the constraints on multisensory integration, and is a logical study in a series being conducted by the authors focused more generally on determining those factors that contribute to an individual’s ability to integrate multisensory stimuli. As such, I believe this manuscript is suitable for publication in the journal Vision. However, I feel that some of the concerns outlined below should be addressed in order to make it a stronger paper.

Some General concerns:

There are some areas where the manuscript would benefit from a bit more clarity in the description of, in particular, methods. For example, on Line 64, the phrase “Different dots change a total of ten times” might be more clearly stated as “There are 10 intervals in which such polarity changes could occur….”. Similarly, on Line 84, the authors state that participants were “…unable to integrate more than one visual stimulus” – presumably they are referring to the integration of multiple visual stimuli with an auditory stimulus, but this is not stated as such.

 

We have made these changes.

There are some inconsistencies in tense usage throughout the document (e.g., on Line 61, in describing results from a series of studies previously conducted, the authors state “participants are”, which should be “participants were”). The manuscript should be reviewed and these inconsistencies addressed.

We have reviewed the manuscript and removed tense issues throughout, including on line 61.

Finally, the title includes the term “high perceptual load”, referring to the fact that it is only at the fastest stimulus presentation rates (where presumably perceptual load is greatest) that there is a correlation between musical training and capacity. The authors should come back to this point at the end of the document, and perhaps should point to studies that explicitly suggested faster presentation rates led to greater perceptual load.

 We appreciate this feedback, and have included this in the general discussion.

SPECIFIC COMMENTS, INCLUDING MINOR EDITORIAL NOTES

Abstract

Line 21: insert “been” between “have” and “previously”

We inserted this word.

Introduction

Line 25: insert comma after “continue to be”

Line 26: Insert period after “psychological research”

            Insert comma after Miller [1]

Line 121: “non-musicians tempora…” → “non-musicians’ temporal…”

Line 125: insert “impact of” between “the” and “years”

Line 129: “memories” → “memorize”

Line 139: consider inserting “from our laboratory” after the word “literature”

 We have made each of the corrections above, and thank the Reviewer for the corrections.

 

 

Method

Participants

Line 160: data was → data were

 We have made this correction.

Materials

Lines 181-182: hyphenate stimulus modifiers (60-ms; 400-Hz; 5-ms)

            on-set → onset

We have made these corrections. 

Procedure

Line 193: “12 individual conditions of stimulus were created…” → “Twelve individual stimulus conditions were created…”

 We have made this correction.

In Figure 1, the example trial provided is one in which the probe dot location did not change. It is suggested that the authors either explain this further or (and?) employ a trial example in which the probe dot is indicating the location of a change. It might help the reader to better understand the design.

 We appreciate that this figure may have made it difficult for readers to understand, and have replaced the figure with a version in which the probed location did change. We also include a note in the figure caption that this was the case.

Results

Line 220: Consider adding the symbols associated with the variables. That is, “the number of visual stimuli (n)…capacity (K)”.

We have added in these symbols.

Line 221: the use of the term “perfect” to describe probability is unusual. Is there a better term that can be used?

 We have replaced “perfect” with “approaches certainty”, but we are happy to receive additional suggestions from the Reviewer.

Line 232: The authors state an “increase in capacity with decreasing SOAs”. Assuming I am understanding the data, it appears in Figure 2 that capacity (K) is increasing with increasing SOAs, which is what one might predict. If this is not the case, the authors need to be more clear in the description of conditions. More generally speaking, the use of the term SOA may be a bit misleading as it may suggest an asynchrony across modalities. Indeed, the authors refer alternately to SOA and visual stimulus presentation rate, so why not merely use the latter term? In either case, consistency is preferred.

 We have replaced SOA with ‘presentation speed’ throughout the manuscript. We also apologize for the previously misleading statement, and the Reviewer is correct in stating that capacity estimates increase along with longer/slower speeds of presentation.

I’m also curious about why the authors did not test a ‘no audio cue’ condition as a baseline. Although it may be difficult to know when the potential change was to take place, particularly at the fastest presentation rates, the visual configurations of interest always occurred at the same place in the sequence. There was no uncertainty about that. However, it may be that the task would simply be impossible.

We did not do this because previous work by Van der Burg et al. (2013) and from our own lab (Wilbiks & Dyson, 2016, 2018; Wilbiks et al., 2019) has done so and found no perceptual boost from visual stimuli comparable to the one we see from auditory stimuli.

One might assume from the description that each of the 10 presentations of the dot grid remained on (for the duration associated with the stimulus presentation rate) until the next dot grid was presented. Is that the case, or was there a blank period inserted between each presentation?

There were no blank periods inserted between each presentation, but the change of dot(s) was instantaneous at each change.

Round 2

Reviewer 1 Report

I would like to thank the authors for addressing some of my concerns. However, the revision is very minimal. Specifically, the authors made few changes to the introduction. No justification was given for the change from Pearson to Spearman correlation. Why was p < 0.017 used, given that each of the 3 metrics was tested for 3 correlations?

Minor: 

The term 'presentation rate' is still a bit confusing. Rate usually refers to the number of stimuli presented within a unit of time. Maybe 'presentation duration' would be more appropriate, e.g., 'capacity estimates increase along with longer presentation duration'?

Author Response

I would like to thank the authors for addressing some of my concerns. However, the revision is very minimal. Specifically, the authors made few changes to the introduction.

In the previous revision, we had attempted to focus the introduction by tightening up the language around the papers we cite. We have now attempted to do this further, and believe we have struck a balance between concision and providing enough information to show why the cited studies support our project. We would be happy to provide additional revisions if the Reviewer wishes to provide more specific points to be revised, and we thank the Reviewer for their continued contributions to this manuscript.

 

No justification was given for the change from Pearson to Spearman correlation. 

We have now included a justification that Spearman correlations were used as the data violated the assumptions of Pearson correlations. 

Why was p < 0.017 used, given that each of the 3 metrics was tested for 3 correlations?

As each data point was used in 3 correlations, a Bonferroni-corrected value was calculated as follows: .05 / 3 ≈ .0167. This is now noted in the manuscript as well.
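The arithmetic behind this threshold can be written out in a few lines. The sketch below is illustrative only: `bonferroni_threshold` is a hypothetical helper name, and the family of 9 tests is the alternative framing raised in the Reviewer's question, not a correction the authors report using. The value 0.038 is the p-value quoted by Reviewer 1 in Round 1 for the original Pearson analysis; the revised Spearman p-values are not given in this record.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under a Bonferroni correction."""
    return alpha / n_tests


# Threshold the authors describe: 3 correlations per musicality metric.
per_metric = bonferroni_threshold(0.05, 3)

# Alternative framing from the Reviewer's question: all 9 correlations as one family.
all_nine = bonferroni_threshold(0.05, 9)

# p-value quoted in Round 1 for the original Pearson correlation.
original_p = 0.038

print(f"alpha / 3 = {per_metric:.4f}; significant: {original_p < per_metric}")
print(f"alpha / 9 = {all_nine:.4f}; significant: {original_p < all_nine}")
```

Under either family size, the originally quoted p-value of 0.038 is above the corrected threshold, which is consistent with the concern Reviewer 1 raised in Round 1.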

Minor: 

The term 'presentation rate' is still a bit confusing. Rate usually refers to the number of stimuli presented within a unit of time. Maybe 'presentation duration' would be more appropriate, e.g., 'capacity estimates increase along with longer presentation duration'?

We agree with the Reviewer on this terminology, and have changed 'rate' to 'duration' throughout.

 

 
