Peer-Review Record

The Auditory-Visual Stroop Test to Assess Subjects with Tinnitus

Brain Sci. 2026, 16(6), 565; https://doi.org/10.3390/brainsci16060565

by Anna Carolina Marques Perrella de Barros¹, Daniela Gil¹, Flavia Alencar de Barros², Richard S. Tyler³, Ektor Tsuneo Onishi²,* and Fátima Cristina Alves Branco-Barreiro¹

Reviewer 1: Palak Gupta

Reviewer 2: Anonymous

Submission received: 27 March 2026 / Revised: 11 May 2026 / Accepted: 23 May 2026 / Published: 27 May 2026

(This article belongs to the Section Sensory and Motor Neuroscience)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Please see attached.

Comments for author File: Comments.pdf

Author Response

Dear Reviewer,

Thank you very much for the thorough review and insightful comments on our manuscript, “The Auditory-Visual Stroop Test to assess subjects with tinnitus”. We have carefully considered every suggestion and made the necessary revisions to improve the quality of the paper.

Below is a detailed, point-by-point response to the reviewers’ comments. For clarity, the reviewers’ original comments are presented, followed by our corresponding response.

Introduction

Comment: The Introduction is very limited and almost nothing of what has previously been investigated in this field is mentioned there. The Introduction must be improved with this in mind.

Response: Dear reviewer, we appreciate the suggestion to expand the historical and investigative context of the Introduction. Our initial approach was to provide a concise opening, as we have recently published a detailed scoping review on this specific topic (Reference [9]), which we cited to avoid repetitive reporting.

However, we understand that a more self-contained Introduction would benefit the reader. We have now expanded this section to include a broader overview of previous investigations into the relationship between tinnitus and cognitive interference. Specifically, we have added a summary of key findings regarding executive attention deficits in tinnitus populations, providing the necessary foundation for our study’s objectives while maintaining the depth of our Discussion.

Procedures

Comment: The descriptions of the procedures are far from being adequate, maybe except for the procedure in the First Stage. But in the Second Stage and the Third Stage it is not possible to understand how these experiments were run. For the reader, the experimental procedure is one of the most important sections of each paper because if the reader does not understand how the experiment was run, the results do not tell the reader anything. Furthermore, if the reader does not understand the experiment and its results, the results are of no scientific value.

Response: Dear reviewer, we appreciate your comment and carefully reviewed the Methods section.

Comment: The method that is described in the First Stage is not a Stroop task; it is a cueing task similar to the classic Posner cueing task and includes valid and invalid cues. It also seems that the AV-Stroop tests that the authors used in the two other stages are similar to the one used in the First Stage and are therefore not Stroop tests.

Response: We appreciate the reviewer’s critical eye regarding the task’s classification. However, we respectfully disagree with the categorization of this protocol as a Posner cueing task. While both tasks utilize spatial locations, they evaluate entirely different cognitive mechanisms. We have clarified this in the manuscript based on the following three pillars:

Conditioning vs. Spatial Discovery (The Role of Training)

Unlike the classic Posner task, where a participant must "find" a target based on a cue's validity, our participants underwent a rigorous training track prior to the experimental stages (as detailed in our institutional repository).

- During this training, subjects were conditioned to a specific logical rule: the side of the auditory stimulus is the required response side.

- We provided immediate feedback during this phase to ensure the stimulus-response association was reflexive.

- Therefore, during the actual test, the auditory signal was not a "cue" for a hidden location, but a direct command for a specific motor response.

Interference vs. Orienting (The Stroop Mechanism)

In a Posner task, the delay in reaction time is caused by re-orienting the attentional spotlight. In our task, the delay is caused by interference suppression.

- The visual disk serves only as a validation of the learned logic.

- The "distractor" squares are the critical element: they create a Stroop-like conflict by providing a visual stimulus that contradicts the auditory command.

- The participant must exert inhibitory control to ignore the visual distractor and follow the learned auditory rule. This "clash" between a dominant visual stimulus and a conditioned auditory rule is the hallmark of an Audio-Visual Stroop Task.

Congruency as Stimulus-Response Compatibility

The reviewer mentions that "congruent" and "incongruent" are not valid here. In the field of cognitive psychology, these terms are standard for Stimulus-Response Compatibility (SRC) paradigms.

- Congruent: The auditory command and visual environment align.

- Incongruent: The visual distractors conflict with the auditory command.

Since our study aimed to measure how tinnitus affects executive function (the ability to inhibit distractions), the Stroop paradigm is the theoretically correct framework.
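As a rough illustration of the congruent/incongruent distinction described above, the sketch below computes an interference score as the difference in mean reaction time between incongruent and congruent trials. The trial structure, the field names (`condition`, `rt_ms`), and the reaction times are hypothetical; they are not taken from the manuscript and do not represent the authors' scoring procedure.

```python
from statistics import mean


def interference_score(trials):
    """Mean RT on incongruent trials minus mean RT on congruent trials (ms)."""
    congruent = [t["rt_ms"] for t in trials if t["condition"] == "congruent"]
    incongruent = [t["rt_ms"] for t in trials if t["condition"] == "incongruent"]
    if not congruent or not incongruent:
        raise ValueError("need at least one trial per condition")
    return mean(incongruent) - mean(congruent)


# Hypothetical trials, for illustration only.
example_trials = [
    {"condition": "congruent", "rt_ms": 500},
    {"condition": "congruent", "rt_ms": 520},
    {"condition": "incongruent", "rt_ms": 600},
    {"condition": "incongruent", "rt_ms": 640},
]

print(interference_score(example_trials))
```

A positive score means responses were slower when the visual distractors conflicted with the auditory command, which is the pattern a Stroop-type interpretation would predict.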

Comment: It is not clear what the word reading (WR) and colour naming (CN) tasks involved. That must be made clear.

Response: Dear reviewer, we appreciate your insightful comment. We reviewed the methods to clarify. Thanks for your kind guidance.

Comment: First stage: Adaptation of the Auditory-Visual Stroop Test. The authors must explain clearly what the purpose of this task was. The task in this part is a modified Posner cueing task. The cue was the sound, the target the disk (if it was a ball, then it must have been visualised in three-dimensional space) and the squares acted as distractors. The cue could either be valid, i.e. it correctly suggested the target's location, or it could be invalid, i.e. suggesting the wrong location of the target. Therefore, the concepts congruent and incongruent are not valid in this context. In a Stroop task, when the word red is written in red, the condition is congruent, but if the word red is written in blue, then the condition is incongruent. To be able to associate the sound with the target, it is necessary for the participants to get feedback that informs them whether their response was correct or wrong. Furthermore, it seems that there was only one type of sound, which means that the sound did not convey any message related to the identity of the target, but only its possible location.

Response: We acknowledge the reviewer's point regarding the necessity of feedback to establish stimulus association. To address this, we have clarified the role of the pre-experimental training phase. During this stage, participants underwent a dedicated training track where they received immediate feedback to consolidate the logical association between auditory lateralization and the required response side.

Regarding the sound's identity, while the acoustic properties remained constant, the sound functioned as a symbolic command rather than a simple spatial alert. Through the training process, the sound's 'message' was successfully mapped to a specific directional response. The visual disk then served as a reinforcement of this learned identity. This training ensured that by the time the experimental 'Stroop-type' trials began, the association was reflexive, allowing us to accurately measure the interference caused by the incongruent visual distractors.

Comment: The authors must also tell how the stimuli were presented, how big they were (in pixels or visual degrees if possible), the space between the stimuli, how many distractors, the distance to the screen, how the participant responded (with mouse, finger, ...), how long the stimuli were visible, the time from when one scene disappeared until the next appeared, the size (in pixels) of the screen, the refresh rate of the screen. The authors must also report the duration of the sounds; the interval between the disappearance of the current stimuli and the appearance of the next.

Response: Dear reviewer, we thank you for your comment. Regarding the specific visual degrees and pixel dimensions, we have focused our reporting on the standardization of the environment for all participants. As the task evaluates a rule-based cognitive conflict (Stroop effect) rather than low-level visual perception or spatial acuity, these standardized conditions ensured that the observed interference was a result of executive processing rather than physical stimulus characteristics. The duration of the sounds and the interval between the disappearance of the current stimuli and the appearance of the next are reported in the Methods section (Pages 2 and 3). We have clarified that participants were explicitly instructed to respond as quickly and as accurately as possible, a standard requirement for measuring the interference effects inherent in a Stroop-type paradigm. Regarding the physical presentation, all stimuli were displayed on a standardized monitor with consistent refresh rates and dimensions for all subjects. We reviewed the Methods to address these concerns.

Comment: Page 2, line 78 "Previous training [...]" Previous to what? Please, explain

Response: Dear reviewer, we appreciate your insightful comment. We reviewed the methods to clarify. Thanks for your kind guidance.

Comment: Page 3, line 92 "After recorded instruction [...]" I do not know what this means. Please explain.

Response: Dear reviewer, we appreciate your support. We reviewed the methods to make it clearer. Thanks for your guidance.

Comment: Second Stage: Comparison of the Auditory-Visual Stroop Test, the Conventional Stroop Test, and the Montreal Cognitive Assessment results. What was the purpose of this experiment (or stage)? The authors must clearly describe the purpose of this experiment.

Response: Dear reviewer, we appreciate your comment. We reviewed the objectives at the end of the Introduction to make them clear (Page 2).

Comment: Page 5 In the paragraph that begins in line 143, the authors discussed a pretest. But the purpose of it was not explained. One could speculate that it had something to do with the selection of participants and that some of them might have failed on the test. This must be made clear.

Response: Dear reviewer, we thank you for your kind support and guidance. We reviewed the Methods section to clarify the training stage.

Comment: Lines 147 and 148 "C-Stroop test’s total execution time and number of errors were registered by researchers." How was the time registered? And from what point to which point? Or was it the total time that it took the participants to finish the test? If the participants had to read the words, then the reading speed may have affected the response times.

Response: Dear reviewer, we reviewed the Methods section to clarify the total execution time measure. We appreciate your concern regarding the potential influence of baseline reading speed on reaction times. We have addressed this in the manuscript by clarifying that our primary focus was not on absolute response times, but on the relative difference in execution time between tasks.

Because we compared the total execution time across tasks that shared identical linguistic and motor requirements, any individual variation in reading speed functioned as a constant across all conditions. Therefore, the differences observed between the tinnitus and control groups specifically reflect interference suppression and executive control rather than baseline reading proficiency.
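The argument above implicitly treats baseline reading speed as an additive component of execution time. The sketch below spells out that assumption with hypothetical numbers (none of the values or function names come from the manuscript): under an additive model the baseline cancels in the between-task difference, whereas it would not cancel if reading speed scaled the interference multiplicatively.

```python
def execution_time(baseline_s, interference_s):
    """Additive model (an assumption): total time = baseline + interference."""
    return baseline_s + interference_s


def task_difference(baseline_s, interference_a_s, interference_b_s):
    """Difference in execution time between task B and task A for one participant."""
    return (execution_time(baseline_s, interference_b_s)
            - execution_time(baseline_s, interference_a_s))


# Two hypothetical readers with different baseline speeds but equal interference.
slow_reader = task_difference(baseline_s=60.0, interference_a_s=5.0, interference_b_s=20.0)
fast_reader = task_difference(baseline_s=40.0, interference_a_s=5.0, interference_b_s=20.0)

print(slow_reader, fast_reader)
```

Both differences are identical here only because the baseline enters additively; whether that holds for the C-Stroop data is an empirical question that this sketch does not answer.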

Comment: Third Stage: Application of the Auditory-Visual Stroop Test in subjects with tinnitus and controls. As in the second stage, it is not clear what the purpose of this part is. That must be made clear. Additionally, the c-Stroop test is not mentioned.

Response: Dear reviewer, we appreciate your comment. We reviewed the objectives at the end of the Introduction to make them clear (Page 2). The C-Stroop test is part of the Second Stage.

Comment: Page 6, line 168 "Participants in the Control Group completed medical history [...]" What is "medical history"?

Response: Dear reviewer, we thank you for your kind support and guidance. We reviewed the Methods section to clarify the training stage.

Comment: Page 6, Table 4 "49.1 (SD+47.9)" Standard deviation is the deviation around the mean. What is SD+47.9? The value of standard deviation is always >= 0. Please adjust the reported standard deviation values accordingly.

Response: Dear reviewer, we really appreciate your support. We corrected the symbol.
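For reference, a sample standard deviation is non-negative by construction, so a summary cell is usually written as "mean (SD value)" or "mean ± SD" rather than with a plus sign. The helper below is a minimal sketch of that reporting convention; the function name and the input values are hypothetical and are not the data behind Table 4.

```python
from statistics import mean, stdev


def format_mean_sd(values, decimals=1):
    """Format a sample as 'mean (SD sd)' using the sample standard deviation."""
    m = mean(values)
    sd = stdev(values)  # sample SD (n - 1 denominator); never negative
    return f"{m:.{decimals}f} (SD {sd:.{decimals}f})"


# Hypothetical measurements, for illustration only.
example_values = [2, 4, 4, 4, 5, 5, 7, 9]
print(format_mean_sd(example_values))
```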

Comment: Page 7 Lines 177 and 178 "It was not analyzed the classification of the results obtained, within or outside expectations." I do not get the meaning of this sentence, please rephrase.

Response: Dear reviewer, we really appreciate your support. We rephrased this sentence.

Comment: Paragraph that begins with "Tinnitus pitch and loudness matching was performed [...]" in line 181. In this paragraph, it seems that the authors are trying to explain how the sound stimuli's characteristics were matched to the characteristics of the tinnitus. But it is not possible to understand how that was done. The authors describe how the sound stimuli were created, but it is very likely that the characteristics of the tinnitus varied between participants and that several sound stimuli were therefore needed, and that is not mentioned. The authors must explain this process clearly. Line 185 "The presentation mode was ipsilateral." Ipsilateral to what? Lines 187 and 188 "This measurement allowed the definition of the type and frequency of tinnitus sensation (pitch)." What measure? Paragraph that begins with "To investigate the sensation of intensity [...]" in line 189. The authors wrote that the same stimulus was used. That suggests that there was only one type of sound stimulus and that the same stimulus was used for all participants. Finding the loudness of the sound stimulus that matched the perceived tinnitus loudness by always increasing the loudness is not a very robust method. And what was the established threshold and how was it found?

Response: Dear reviewer, we appreciate your comment. We reviewed the tinnitus pitch and loudness measurement description in the Methods section to make it clearer.

Comment: Statistical Analysis. Page 7, Lines 204 and 205 "The p-value of the no-association test and confidence interval were calculated for each correlation coefficient." What is a "no-association test"? Please explain. Line 206 "The inferential analyses used in the descriptive analysis were:" Inferential analysis usually refers to making assumptions from the data that could be generalised to the population from which the sample is drawn. But descriptive analyses simply describe the data. Please rephrase. Furthermore, the whole paragraph that begins in line 206 is very obscure and has to be rephrased to adequately explain how and when these tests were used. Line 209 "The significance level was set at 5% (*)." What does the asterisk in the parentheses represent? There are also asterisks in the tables without any explanations. Please add explanations or remove the asterisks (which is a better option).

Response: We appreciate the reviewer’s thorough critique of our statistical reporting. We have revised the 'Statistical Analysis' section to adhere to standard nomenclature and to clarify the application of each test:

- Terminology: The term 'no-association test' has been replaced with 'significance testing of the correlation coefficients.'

- Categorization: We have clearly separated the descriptive statistics (used to summarize the sample) from the inferential statistics (used to compare groups and test hypotheses).

- Asterisks: We have removed the asterisks from the text and tables as suggested. We now report exact p-values or explicitly state the significance level in the text to ensure transparency.

- Clarity: The paragraph has been restructured chronologically to explain exactly which tests were applied to which variables, ensuring the workflow is no longer obscure.
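To make "significance testing of the correlation coefficients" concrete, the sketch below computes a correlation coefficient, its t statistic, and an approximate 95% confidence interval via the Fisher z-transformation. It assumes a Pearson coefficient, which the response does not specify; the function names and the data are hypothetical, and this is not presented as the authors' analysis.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two samples of equal length, n >= 4")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r * r))


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate 95% CI for rho using the Fisher z-transformation (requires |r| < 1)."""
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return math.tanh(z - half_width), math.tanh(z + half_width)


# Hypothetical paired scores, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]
r = pearson_r(x, y)
print(r, t_statistic(r, len(x)), fisher_ci(r, len(x)))
```

The p-value would then be read from a t distribution with n - 2 degrees of freedom. Because the Fisher interval is an approximation, it need not agree exactly with the t-based test near the significance boundary.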

Comment: Results, General. Please report which variables were included in the analysis and which tests were used to compute the test statistics.

Response: Dear reviewer, thank you for your insightful comment. We revised the Statistical Analysis description in the Methods section to fill this gap.

Comment: First Stage: Adaptation of the Auditory-Visual Stroop Test. There are actually no results in this section.

Response: We appreciate the reviewer’s observation. We have clarified this section to highlight that the results of the First Stage consisted of the successful development and technical validation of the experimental stimuli.

This phase was essential to produce the standardized 'test tracks' tailored for participants with tinnitus. The results of this stage are characterized by the final description of the auditory-visual pairings, and the technical parameters that ensured the reliability of the subsequently collected behavioral data. We have revised the text to more explicitly present these developmental outcomes as the foundational results of this stage.

Comment: Second Stage: Comparison of the Auditory-Visual Stroop Test, the Conventional Stroop Test, and the Montreal Cognitive Assessment results. In the first paragraph, the authors report correlations between AV-Stroop test and c-Stroop test. But when explaining the experimental procedure, c-Stroop is not adequately explained. And neither is the AV-Stroop test (which probably is a cueing test).

Response: Dear reviewer, we thank you for your kind support and guidance. We reviewed the Methods section to clarify these questions.

Comment: There seem to be several spelling errors in Table 6. Please rectify.

Response: Dear reviewer, we really appreciate your support. We reviewed spelling in Table 6.

Comment: Page 9, Line 248 "The results were confirmed by the confidence intervals also presented in Tables 5 and 6" Well, this is clearly an overstatement. One third of the results in Table 5 are not significant and in Table 6, five of nine results are not significant. Note also that it is not guaranteed that results are significant even if the CIs do not include 0.

Response: Dear reviewer, we appreciate your rigorous evaluation of our statistical interpretation. We acknowledge that the term 'confirmed' was an overstatement, and we have revised the manuscript to provide a more nuanced discussion of the findings.

Comment: Third Stage: Application of the Auditory-Visual Stroop Test in subjects with tinnitus and controls. Page 9, Line 255 "[...] both training and the three test stages [...]" What training and what are the three test stages? This is not clear because this is reported in the section on the Third Stage, no training is mentioned there, and there do not seem to be three test stages.

Response: Dear reviewer, we thank you for your support. We reviewed the Methods section to make it clearer. It is described on Page 8. Additionally, we reviewed the Results and Discussion sections.

Comment: Table 7. It seems that in column 2 (with Study Group as heading) the execution times are reported along with the standard deviation. The heading of the columns in this table must clearly represent the content of the corresponding column.

Response: Dear reviewer, we appreciate your comment. We agree that the column headings should be more descriptive to facilitate the interpretation of the data. We have updated the Table 7 headers from 'Execution Time (s)' to 'Execution Time (s): Mean (SD)'. All other column headings have been reviewed for clarity and technical accuracy.

Comment: Minor comments Replace the word subject (or subjects) with the word participant (or participants).

Response: Dear reviewer, we appreciate your comment. We updated the manuscript with the words participant/participants.

Comment: Page 1, line 21 "This three-stage study aimed to adapt [...]" Do not personalise the study. The study, per se, did not aim at anything. Page 2, line 78 "AV-Stroop test adaptation used auditory and visual stimuli." Do not personalise the test. The test, per se, did not use any stimuli, but stimuli were used in the tests.

Response: Dear reviewer, we appreciate your comment. We corrected these sentences. Thanks for your support.

Once again, we appreciate the time and effort the reviewers have dedicated to improving our work, and we hope that the revisions we made address all concerns. It is an honor to submit the revised version of our manuscript to Brain Sciences.

Thank you for considering our revised manuscript.

Sincerely,

Authors

Reviewer 2 Report

Comments and Suggestions for Authors

Except for the Discussion section, which I just took a quick look at, I have carefully read the manuscript and after that I concluded it must be more or less rewritten and reorganised. As an example, in the results section for the First Stage no results are actually reported. I only took a quick look at the Discussion section because the experimental procedures – and therefore the results sections – are very ambiguous, see below.

Introduction

The Introduction is very limited and almost nothing of what has previously been investigated in this field is mentioned there. The Introduction must be improved with this in mind.

Procedures

The descriptions of the procedures are far from being adequate, maybe except for the procedure in the First Stage. But in the Second Stage and the Third Stage it is not possible to understand how these experiments were run. For the reader, the experimental procedure is one of the most important sections of each paper because if the reader does not understand how the experiment was run, the results do not tell the reader anything. Furthermore, if the reader does not understand the experiment and its results, the results are of no scientific value.

The method that is described in the First Stage is not a Stroop task; it is a cueing task similar to the classic Posner cueing task and includes valid and invalid cues. It also seems that the AV-Stroop tests that the authors used in the two other stages are similar to the one used in the First Stage and are therefore not Stroop tests.

It is not clear what the word reading (WR) and colour naming (CN) tasks involved. That must be made clear.

First stage: Adaptation of the Auditory-Visual Stroop Test

The authors must explain clearly what the purpose of this task was.

The task in this part is a modified Posner cueing task. The cue was the sound, the target the disk (if it was a ball, then it must have been visualised in three-dimensional space) and the squares acted as distractors. The cue could either be valid, i.e. it correctly suggested the target's location, or it could be invalid, i.e. suggesting the wrong location of the target. Therefore, the concepts congruent and incongruent are not valid in this context. In a Stroop task, when the word red is written in red, the condition is congruent, but if the word red is written in blue, then the condition is incongruent.

To be able to associate the sound with the target, it is necessary for the participants to get feedback that informs them whether their response was correct or wrong. Furthermore, it seems that there was only one type of sound, which means that the sound did not convey any message related to the identity of the target, but only its possible location.

The authors must also tell how the stimuli were presented, how big they were (in pixels or visual degrees if possible), the space between the stimuli, how many distractors, the distance to the screen, how the participant responded (with mouse, finger, ...), how long the stimuli were visible, the time from when one scene disappeared until the next appeared, the size (in pixels) of the screen, the refresh rate of the screen. The authors must also report the duration of the sounds; the interval between the disappearance of the current stimuli and the appearance of the next.

Page 2, line 78
"Previous training [...]" Previous to what? Please, explain

Page 3, line 92
"After recorded instruction [...]" I do not know what this means. Please explain.

Second Stage: Comparison of the Auditory-Visual Stroop Test, the Conventional Stroop Test, and the Montreal Cognitive Assessment results

What was the purpose of this experiment (or stage)? The authors must clearly describe the purpose of this experiment.

Page 5
In the paragraph that begins in line 143, the authors discussed a pretest. But the purpose of it was not explained. One could speculate that it had something to do with the selection of participants and that some of them might have failed on the test. This must be made clear.

Lines 147 and 148
"C-Stroop test’s total execution time and number of errors were registered by researchers."

How was the time registered? And from what point to which point? Or was it the total time that it took the participants to finish the test? If the participants had to read the words, then the reading speed may have affected the response times.

Third Stage: Application of the Auditory-Visual Stroop Test in subjects with tinnitus and controls

As in the second stage, it is not clear what is the purpose of this part. That must be made clear. Additionally, the c-Stroop test is not mentioned.

Page 6, line 168
"Participants in the Control Group completed medical history [...]"
What is "medical history"?

Page 6, Table 4
"49.1 (SD+47.9)" Standard deviation is the deviation around the mean. What is SD+47.9? The value of standard deviation is always >= 0. Please adjust the reported standard deviation values accordingly.

Page 7

Lines 177 and 178
"It was not analyzed the classification of the results obtained, within or outside expectations."
I do not get the meaning of this sentence, please rephrase.

Paragraph that begins with "Tinnitus pitch and loudness matching was performed [...]" in line 181.

In this paragraph, it seems that the authors are trying to explain how the sound stimuli's characteristics were matched to the characteristics of the tinnitus. But it is not possible to understand how that was done. The authors describe how the sound stimuli were created, but it is very likely that the characteristics of the tinnitus varied between participants and that several sound stimuli were therefore needed, and that is not mentioned. The authors must explain this process clearly.

Line 185
"The presentation mode was ipsilateral."
Ipsilateral to what?

Lines 187 and 188
"This measurement allowed the definition of the type and frequency of tinnitus sensation (pitch)."
What measure?

Paragraph that begins with "To investigate the sensation of intensity [...]" in line 189

The authors wrote that the same stimulus was used. That suggests that there was only one type of sound stimulus and that the same stimulus was used for all participants. Finding the loudness of the sound stimulus that matched the perceived tinnitus loudness by always increasing the loudness is not a very robust method. And what was the established threshold and how was it found?

Statistical Analysis

Page 7

Lines 204 and 205
"The p-value of the no-association test and confidence interval were calculated for each correlation coefficient."
What is a "no-association test"? Please explain.

Line 206
"The inferential analyses used in the descriptive analysis were:"

Inferential analysis usually refers to making assumptions from the data that could be generalised to the population from which the sample is drawn. But descriptive analyses simply describe the data. Please rephrase.

Furthermore, the whole paragraph that begins in line 206 is very obscure and has to be rephrased to adequately explain how and when these tests were used.

Line 209
"The significance level was set at 5% (*)."
What does the asterisk in the parentheses represent? There are also asterisks in the tables without any explanations. Please add explanations or remove the asterisks (which is a better option).

Results

General

Please report which variables were included in the analysis and which tests were used to compute the test statistics.

First Stage: Adaptation of the Auditory-Visual Stroop Test
There are actually no results in this section

Second Stage: Comparison of the Auditory-Visual Stroop Test, the Conventional Stroop Test, and the Montreal Cognitive Assessment results

In the first paragraph, the authors report correlations between AV-Stroop test and c-Stroop test. But when explaining the experimental procedure, c-Stroop is not adequately explained. And neither is the AV-Stroop test (which probably is a cueing test).

There seem to be several spelling errors in Table 6. Please rectify.

Page 9

Line 248
"The results were confirmed by the confidence intervals also presented in Tables 5 and 6"
Well, this is clearly an overstatement. One third of the results in Table 5 are not significant and in Table 6, five of nine results are not significant. Note also that it is not guaranteed that results are significant even if the CIs do not include 0.

Third Stage: Application of the Auditory-Visual Stroop Test in subjects with tinnitus and controls

Page 9

Line 255
"[...] both training and the three test stages [...]"
What training and what are the three test stages? This is not clear because this is reported in the section on the Third Stage, no training is mentioned there, and there do not seem to be three test stages.

Table 7.
It seems that in column 2 (with Study Group as heading) the execution times are reported along with the standard deviation. The heading of the columns in this table must clearly represent the content of the corresponding column.

Minor comments

Replace the word subject (or subjects) with the word participant (or participants)

Page 1, line 21
"This three-stage study aimed to adapt [...]" Do not personalise the study. The study, per se, did not aim at anything.

Page 2, line 78
"AV-Stroop test adaptation used auditory and visual stimuli." Do not personalise the test. The test, per se, did not use any stimuli, but stimuli were used in the tests.

Author Response

Dear Reviewer,

Below is a detailed, point-by-point response to the reviewers’ comments. For clarity, the reviewers’ original comments are presented, followed by our corresponding response.

Comment 0 (Title) Issue: The title contains words that are not capitalized in accordance with title case conventions. "assess," "subjects," and "tinnitus" should each begin with a capital letter. Suggestion: Authors should correct the title to read: "The Auditory-Visual Stroop Test to Assess Subjects with Tinnitus."

Response: Dear reviewer, we really appreciate your support. We corrected the title.

Introduction:

Comment 1 (Lines 44–66) Issue: The Introduction is approximately 400 words and consists of only three paragraphs. It does not establish the clinical context for tinnitus prevalence, chronicity, or multidimensional burden. This characterization is necessary to justify the clinical significance of the work. The cognitive-tinnitus relationship is summarized in insufficient depth without describing the current state of the field or the direction of cognitive assessment in tinnitus management. Suggestion: Authors should expand the Introduction by opening with tinnitus prevalence and clinical burden, followed by a substantive review of the cognitive-tinnitus literature including current attentional models and the state of cognitive assessment in clinical tinnitus practice.

Comment 2 (Lines 44–66) Issue: No survey of existing cognitive assessment tools in the tinnitus indication is provided. The authors cite their own scoping review (Ref #9) but do not use it to inform what is lacking in the existing literature. Without this, the necessity of a new test cannot be evaluated by the reader. Suggestion: Authors must summarize the strengths and limitations of existing Stroop adaptations and cognitive tools in tinnitus, drawing explicitly on Ref #9, before introducing the proposed AV-Stroop.

Response: Dear reviewer, we revised the Introduction to fill this gap. We thank you for your guidance.

Comment 3 (Lines 44–66) Issue: The AV-Stroop paradigm is never introduced or described in the Introduction. The acronym appears in the title and abstract, but the reader receives no conceptual grounding in what an auditory-visual Stroop is, how it differs from the conventional version, or what prior work exists using this paradigm. Suggestion: Authors should introduce the AV-Stroop paradigm conceptually in the Introduction, including its distinguishing features from the conventional Stroop and relevant prior work.

Response: Dear reviewer, we revised the Introduction to fill this gap. We thank you for your guidance.

Comment 4 (Lines 44–66) Issue: The rationale for using nonverbal auditory stimuli, a central methodological decision, does not appear until the Discussion (lines 320–324). This leaves readers processing the entire methodology without understanding a key design choice. Suggestion: Authors should consider relocating the nonverbal stimulus rationale to the Introduction, where it can inform the readers’ understanding of the test design prior to the Methods section.

Response: Dear reviewer, we revised the Introduction to fill this gap. We thank you for your guidance.

Comment 5 (Lines 61–66) Issue: The final paragraph lists four study aims but states no directional hypotheses. For a three-stage study of this complexity, hypotheses are expected and necessary. Each stage had a testable, directional prediction that should be stated explicitly. Without hypotheses, the study reads as exploratory and the Discussion has no evaluative framework to return to. Suggestion: Authors must state directional hypotheses for each stage, including expected correlation between AV-Stroop and c-Stroop, expected independence from MOCA, expected group differences in Stage 3, and expected sensitivity of the Tinnitus Pitch track, in addition to the four aims currently listed.

Response: We thank the reviewer for this constructive suggestion. We agree that stating explicit, directional hypotheses strengthens the evaluative framework of the study. We have revised the final paragraph of the Introduction to include specific hypotheses for each stage. Specifically, we have formalized our expectations regarding the convergent validity between the AV-Stroop and c-Stroop tests, the independence of these measures from general cognitive screening (MoCA), the expected performance deficits in the tinnitus group, and the sensitivity of the customized Tinnitus Pitch tracks. These additions provide a clear benchmark for the discussion of our results.

Methods:

Comment 6 (Lines 68–75) Issue: The study does not include a statement of compliance with the Declaration of Helsinki. For research involving human subjects, this is a mandatory reporting requirement for Brain Sciences (MDPI) and cannot be omitted. Ethics committee approval alone, while necessary, does not substitute for this declaration. Suggestion: Authors must add a statement confirming the study was conducted in accordance with the Declaration of Helsinki.

Response: We thank the reviewer for identifying this critical omission. We fully agree that compliance with the Declaration of Helsinki is a fundamental requirement for research involving human participants. We have updated the 'Methods' section (under the 'Ethical Considerations' or 'Participants' subheading) to explicitly state that the study was conducted in accordance with the Declaration of Helsinki. This statement now accompanies our local ethics committee approval details.

Comment 7 (Lines 68–75) Issue: There is no dedicated Participants section. Participant definitions, demographics, and group characteristics are distributed across Tables 2, 3, and 4, appearing piecemeal across three stages. The tinnitus Study Group definition and Control Group definition do not appear until Stage 3 (lines 156–160), despite being foundational to the entire study. The number of participants in Stage 1 is never stated. Whether the Stage 2 sample (n=45) and Stage 3 sample (n=70) are independent or overlapping is never addressed. Suggestion: Authors must add a dedicated Participants subsection at the beginning of the Methods, before any stage descriptions, covering: a) tinnitus diagnosis criteria and by whom it was confirmed; b) how secondary tinnitus was excluded; c) how absence of tinnitus was confirmed in controls; d) recruitment source and method for both groups; e) medication use and relevant comorbidities; f) total participant numbers across all three stages; and g) whether Stage 2 and Stage 3 samples overlap. A CONSORT-style participant flow diagram covering all three stages is strongly recommended.

Response: Dear reviewer, we appreciate your insightful comment. We revised the Methods to clarify this question. Thanks so much for your kind support.

Comment 8 (Lines 68–75) Issue: The study design is classified as 'observational, cross-sectional' throughout, but Stage 1 is a test development and adaptation process that does not fit this classification. Additionally, the sentence 'The present study was carried out in three stages' provides no useful information. Suggestion: Authors should consider refining the study design classification to reflect the mixed nature of the three stages and replace the uninformative stage announcement with a brief one-line preview of each stage's purpose.

Response: Dear reviewer, we appreciate your insightful comment. We revised the Methods to clarify this question. Thanks so much for your kind support.

Comment 9 (Lines 77–131) Issue: The training design choices are stated without justification: the number of training items (18), the binaural-to-monaural sequencing, and the 0.5-second audiovisual stimulus delay are all unexplained. No prior literature is cited to support these parameters. Suggestion: Authors should provide citations or empirical rationale for key training design parameters, particularly the 0.5-second audiovisual delay, which directly influences the nature of the auditory-visual conflict.

Response: Dear reviewer, we appreciate your insightful comment. We revised the Methods to clarify this question. Thanks so much for your kind support and guidance.

Comment 10 (Lines 77–131) Issue: As a test adaptation paper, no piloting or iterative refinement process is described. It is unclear how the 34 final tracks were selected, whether any were modified or discarded, or whether the training procedure was trialed before use. Suggestion: Authors should describe any piloting process used during test development, including sample characteristics, outcomes, and any modifications made. If no piloting was conducted, this should be acknowledged as a limitation.

Response: Dear reviewer, we agree that describing the refinement of the stimuli is essential for a methodological paper. We have added a description of the pilot phase conducted during Stage 1 in Methods.

Comment 11 (Lines 116–118) Issue: The output level is stated as calibrated to '0 dB' without specifying the reference: 0 dB HL, 0 dB SL, 0 dBFS, or audiometer dial. This ambiguity is clinically significant in a tinnitus population where subclinical high-frequency hearing loss may be present even within normal audiometric limits. Suggestion: Authors must specify the stimulus presentation level with the correct decibel reference and justify the chosen level.

Response: Dear reviewer, we appreciate your insightful comment. We revised the Methods to clarify this question. Thanks so much for your kind support.

Comment 12 (Lines 95–99) Issue: The 80/20 congruent/incongruent ratio is justified only as a 'rare stimulus effect' without citation, and no alternative ratios are discussed. More importantly, since congruent and incongruent trials are never analyzed separately in the results, the practical value of this design manipulation is undermined. Suggestion: Authors should cite prior literature supporting the 80/20 ratio and acknowledge in the limitations that the rare stimulus effect could not be directly measured due to the total execution time outcome measure.

Response: Dear reviewer, we thank you for your guidance. We added this to the limitations section.

Comment 13 (Lines 133–152) Issue: The c-Stroop version used in Stage 2 is insufficiently described. The cited reference (Ref #10) is a Portuguese-language source, and no procedural details are provided, such as number of items, time limit, scoring procedure, or whether it was paper-based or computerized. Suggestion: Authors must provide full procedural details for the c-Stroop version administered, sufficient to allow replication.

Response: Dear reviewer, we appreciate your insightful comment. We revised the Methods to clarify this question. Thanks so much for your kind support.

Comment 14 (Lines 133–152) Issue: No test-retest or inter-rater reliability data are reported for the AV-Stroop. For a paper whose primary contribution is a new clinical instrument, reliability is a prerequisite for interpreting validity. The absence of reliability data is a major gap. Suggestion: Authors must report test-retest reliability of the AV-Stroop and inter-rater reliability for manual error counting or clearly acknowledge their absence as a primary limitation requiring future investigation.

Response: Dear reviewer, we thank you for your guidance. We added this to the limitations section.

Comment 15 (Lines 133–152) Issue: The order of test administration for the AV-Stroop, c-Stroop, and MOCA is not stated. Fatigue, practice, and carry-over effects across three sequentially administered cognitive tasks are plausible and unaddressed. Suggestion: Authors should specify the test administration order and state whether it was fixed or counterbalanced. If fixed, carry-over and fatigue effects should be acknowledged as limitations.

Response: Dear reviewer, we thank you for your comment. A fixed administration order (MOCA, c-Stroop, AV-Stroop) was employed to ensure procedural standardization. No fatigue was reported by participants, likely due to the brief duration and high efficiency of the tasks. We have acknowledged the lack of counterbalancing as a limitation.

Comment 16 (Lines 148–152) Issue: The MOCA is a screening tool for mild cognitive impairment. As the eligibility criteria excluded participants with neurological or psychiatric disorders, scores in this healthy sample likely clustered near the ceiling. The null MOCA finding cannot be straightforwardly interpreted without reporting the distribution of MOCA scores. Suggestion: Authors should report the distribution of MOCA scores in the Stage 2 sample and acknowledge the likely ceiling effect as a constraint on interpreting the null association between MOCA and AV-Stroop performance.

Response: Dear reviewer, thanks for your comment. The inclusion criteria excluded diagnosed neurological disorders, but MoCA scores ranged from 15 to 29, demonstrating significant cognitive variability within the sample. The lack of correlation between these scores and AV-Stroop execution times suggests that bimodal interference suppression operates as a specific executive process that is independent of the global cognitive status captured by the MoCA screening. We addressed these topics in the Results, Discussion, and Limitations sections.

Comment 17 (Lines 154–196) Issue: The testing environment is never described. For an auditory cognitive task, background noise levels, room acoustics, and use of a sound-treated booth are relevant to result validity and reproducibility. Suggestion: Authors should specify the acoustic environment in which testing was conducted. If a sound-treated room was not used, this should be acknowledged as a limitation.

Response: Dear reviewer, we have clarified the testing environments in the manuscript. While audiological evaluations occurred in a sound-treated booth, the AV-Stroop was administered in a quiet, distraction-free clinical room using circumaural headphones. This setup ensured consistent acoustic delivery and provided passive noise attenuation, minimizing the influence of environmental sounds. We have updated the Methods section to specify the equipment and environment used. Thanks for your comment.

Comment 18 (Lines 175–188) Issue: Tinnitus pitch matching has well-documented poor test-retest reliability in the literature. No repeat measurement or reliability check is described, and no acknowledgment of this limitation is provided, despite the Tinnitus Pitch track being the most novel element of Stage 3. Suggestion: Authors should acknowledge the established reliability limitations of tinnitus pitch matching and describe any steps taken to minimize their impact.

Response: Dear reviewer, to address the known variability in tinnitus pitch matching, we implemented a verification protocol where each match was retested and confirmed. Only stable, reproducible pitch matches were utilized for the Stage 3 stimulus construction. This replication step was specifically designed to minimize the impact of documented test-retest reliability issues in pitch matching. This information was added to the Methods section and also discussed in the Limitations section. We appreciate your kind support and guidance.

Comment 19 (Lines 176–196) Issue: HADS scores indicating clinical anxiety (12–21) are present in five tinnitus participants and clinical depression scores in four. The authors state these were collected for homogeneity purposes but deliberately not clinically interpreted (lines 175–180). Given the well-established relationship between affective disorders and cognitive performance, retaining these participants without comment is a methodological concern. Suggestion: Authors should consider either excluding participants with clinical HADS scores or conducting a sensitivity analysis to assess their influence on AV-Stroop performance.

Response: Dear reviewer, we acknowledge your concern regarding the potential influence of affective factors on cognitive performance. However, because this investigation was designed as a preliminary validation of the AV-Stroop's clinical utility, we prioritized a representative clinical sample over a strictly filtered one to ensure the results reflect a typical tinnitus population. While we collected HADS scores to characterize the sample's homogeneity, a detailed sensitivity analysis or sub-group exclusion was deemed outside the current study's scope. We have instead addressed this as a limitation, noting that future research should isolate the impact of anxiety and depression on AV-Stroop performance.

Comment 20 (Lines 192–196) Issue: The order of the four AV-Stroop test tracks in Stage 3 (WN, NB, PT, Tinnitus Pitch) is never specified. If the Tinnitus Pitch track was consistently administered last, cumulative fatigue could partially account for the higher error rate observed in that track. Suggestion: Authors must specify the track administration order. If Tinnitus Pitch was always administered last, fatigue as a contributing factor should be acknowledged and, if possible, addressed analytically.

Response: Dear reviewer, thanks for your comment. The AV-Stroop tracks were administered in a fixed sequence: White Noise, Narrow Band, Pure Tone, and finally Tinnitus Pitch. We acknowledge the potential for order effects; however, the total duration of the four tracks was less than 12 minutes (each track was approximately 120 seconds), and no participants reported subjective fatigue or requested breaks. This suggests that the increased error rate observed in the Tinnitus Pitch track is more likely attributable to the higher cognitive load of the personalized stimuli rather than cumulative fatigue. We have noted the fixed administration order as a limitation.

Comment 21 (Lines 197–209) Issue: No a-priori power calculation is provided for any stage of the study. Sample sizes of n=45 (Stage 2) and n=30/40 (Stage 3) are stated without justification. Given the low absolute error counts in the results, the study may be underpowered for error-based comparisons. Suggestion: Authors must provide a power calculation for each stage, specifying the primary outcome variable, expected effect size, desired power, significance level, and resulting minimum sample size required.

Response: Dear reviewer, while an a-priori power calculation was not performed, the observed sample sizes (n=45 and n=30) proved sufficient to identify statistically significant differences with moderate strength. These results confirm that the study was adequately powered to support our primary hypotheses. We have acknowledged the small absolute error counts as a limitation, noting that they reflect the high efficiency of the task rather than a lack of statistical sensitivity.
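To make the inputs the reviewer lists in Comment 21 concrete (expected effect size, desired power, significance level, resulting minimum sample size), the following is a minimal sketch of an a-priori sample-size estimate for a correlation outcome, using the standard Fisher z approximation. It is not the authors' method. The function name `n_for_correlation` and the planning value r = 0.30 are hypothetical; 0.448 and 0.690 are the coefficients quoted in Comment 35 and are used here only as illustrative inputs.

```python
import math
from statistics import NormalDist


def n_for_correlation(r, alpha=0.05, power=0.80):
    """Approximate minimum n to detect a correlation of size r.

    Uses the Fisher z approximation for a two-sided test:
    n = ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_beta = std_normal.inv_cdf(power)           # quantile for the desired power
    return math.ceil(((z_alpha + z_beta) / math.atanh(r)) ** 2 + 3)


# Illustrative planning values only (0.30 is a hypothetical effect size).
for r in (0.30, 0.448, 0.690):
    print(f"expected r = {r:.3f} -> minimum n = {n_for_correlation(r)}")
```

Smaller expected correlations require larger samples, which is why the expected effect size has to be stated before data collection rather than inferred from the observed result.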

Comment 22 (Lines 197–209) Issue: No correction for multiple comparisons is described despite numerous statistical tests across multiple tracks, outcomes, and stages. Several reported p-values fall close to the 5% threshold (e.g., p=0.059, p=0.079) and could shift to non-significance after correction. Suggestion: Authors must apply an appropriate multiplicity correction (e.g., Bonferroni or Holm-Bonferroni) or declare a pre-specified primary endpoint hierarchy and interpret secondary outcomes accordingly.

Response: Dear reviewer, we appreciate this methodological point. This study was designed as an exploratory investigation into a novel assessment tool (AV-Stroop). Following the logic of Rothman (1990), we elected not to apply multiplicity corrections to avoid increasing Type II errors (missing true effects), which is critical in the early validation phase of a new clinical instrument. We have maintained the original p-values to allow for a transparent discussion of the trends observed, but we have updated the Limitations section to acknowledge that these findings should be interpreted with caution.
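For readers unfamiliar with the correction named in Comment 22, the following is a minimal sketch of the Holm-Bonferroni step-down procedure. It is an illustration, not an analysis of the study data: only p=0.059 and p=0.079 are taken from the review, while the other two p-values and the function name `holm_bonferroni` are hypothetical.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return one boolean per p-value: True if the null is rejected.

    Holm's step-down rule: sort p-values ascending and compare the k-th
    smallest (k = 0, 1, ...) against alpha / (m - k); stop at the first
    failure and retain all remaining hypotheses.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # every larger p-value is retained as well
    return reject


# 0.004 and 0.020 are hypothetical; 0.059 and 0.079 are quoted in the review.
p_values = [0.004, 0.020, 0.059, 0.079]
for p, rejected in zip(p_values, holm_bonferroni(p_values)):
    print(f"p = {p:.3f} -> {'significant' if rejected else 'not significant'} after Holm")
```

In this illustration the hypothetical p=0.020, which is nominally below 0.05, no longer counts as significant once four tests are considered together, which is the kind of shift the reviewer is concerned about.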

Comment 23 (Lines 197–209) Issue: Effect sizes are absent from all inferential analyses. Without them, the clinical significance of statistically significant findings cannot be assessed, which is particularly important for a paper proposing a new clinical tool. Suggestion: Authors must report effect sizes alongside p-values throughout the Results section.

Response: Dear reviewer, we thank you for this comment. Following the conventions established by Cohen (1988), our reported values represent effects of moderate strength, which supports the clinical relevance of the findings. We have updated the text to explicitly state that these coefficients are being utilized as our measure of effect size.

Comment 24 (Lines 197–209) Issue: The statistical analysis section describes all tests in a single undifferentiated block without mapping specific tests to specific stages or designating primary versus secondary outcomes. Suggestion: Authors should reorganize the statistical analysis section to clearly map each test to the stage and outcome it addresses and explicitly designate primary and secondary outcomes for each stage.

Response: Dear reviewer, we appreciate your insightful comment. We revised the Methods to clarify this question. Thanks so much for your kind support.

Results:

Comment 25 (Lines 212–227) Issue: The Stage 1 results section contains no quantitative data. It describes the number and format of tracks created, content that belongs in the Methods, and reports no performance metrics, participant responses, or outcomes of any kind. Suggestion: Authors should either remove Stage 1 from the Results section and integrate its content into Methods, or report actual outcome data (e.g., participant feedback, modification rates, error rates during piloting) that justify the adaptation decisions made.

Comment 26 (Lines 228–250) Issue: The Stage 2 results present inferential statistics without first reporting descriptive statistics for the AV-Stroop, c-Stroop, and MOCA. The actual score distributions for the c-Stroop and MOCA are never reported, preventing assessment of range restriction or ceiling effects. Suggestion: Authors must report mean, standard deviation, and range for all three instruments before presenting correlation results. MOCA score distribution is particularly important given the ceiling effect concern.

Response: Dear reviewer, we have addressed the request for descriptive data by emphasizing the previously reported MoCA distribution. The broad range confirms the absence of a ceiling effect in our sample. Regarding the c-Stroop and AV-Stroop, these results were included as comparative benchmarks. Consequently, we have prioritized the reporting of their performance trends within the correlational analysis. We have updated the Results section to more clearly summarize the descriptive characteristics of these instruments.

Comment 27 (Lines 234–237) Issue: Table 5 uses the column header 'TP' while the text and all other tables use 'PT' for pure tone. This inconsistency requires correction. Suggestion: Authors should correct 'TP' to 'PT' in Table 5 for consistency with the rest of the manuscript.

Response: Dear reviewer, we really appreciate your support. We reviewed the abbreviation in Table 5.

Comment 28 (Lines 255–260) Issue: The finding that tinnitus subjects were slower even during the training track — which contained only congruent stimuli with no interference effect — is reported briefly and without adequate interpretation. This finding directly bears on the central debate between generalized cognitive depletion and interference-specific impairment in tinnitus. Suggestion: Authors should give the training track finding more prominent treatment in both Results and Discussion, explicitly addressing what it implies about the generalized versus specific nature of tinnitus-related cognitive slowing.

Response: Dear reviewer, we agree that the slower performance during the training track is a critical finding. As detailed in the Discussion (Pages 16 and 17), we have explicitly addressed this result as evidence of a generalized cognitive impairment in the tinnitus group. We have revised the manuscript to make this distinction more prominent.

Comment 29 (Lines 256–259) Issue: The binary error threshold of fewer than 3 versus 3 or more errors per track is applied throughout Table 7 without justification. It is unclear whether this cutoff was pre-specified, derived from prior literature, or determined post-hoc based on the data distribution. Suggestion: Authors must state the basis for the 3-error threshold. If it was determined post-hoc, this should be acknowledged as a limitation and a sensitivity analysis using a continuous error variable should be considered.

Response: Dear reviewer, the three-error threshold was determined post-hoc based on the observed data distribution. Given the low absolute error counts—a finding consistent with previous Stroop-based studies—a binary threshold was necessary to distinguish between baseline performance and significant interference. We have updated the manuscript to acknowledge this post-hoc classification as a limitation and have clarified that this threshold serves as an initial benchmark for identifying clinically relevant executive difficulty in this pilot phase.

Comment 30 (Lines 256–259) Issue: The PT error difference is reported at p=0.079 and described as observed at the '10% significance level.' Reporting a finding at a threshold that contradicts the stated 5% significance level is inconsistent and should not be presented as a finding. Suggestion: Authors should reframe the PT error result (p=0.079) as a non-significant trend. The significance threshold cannot be selectively relaxed for individual comparisons.

Response: Dear reviewer, we appreciate your insightful comment. We revised this sentence in the manuscript.

Comment 31 (Lines 272–278) Issue: The WN versus Tinnitus Pitch error comparison yields p=0.059, which does not reach the stated 5% significance level, yet the overall conclusion that the Tinnitus Pitch track demonstrated greater interference is presented without this qualification. Additionally, congruent and incongruent trial performance are never reported separately despite the 80/20 design being specifically intended to create differential conditions. Suggestion: Authors should report the WN vs Tinnitus Pitch result consistently with the stated significance threshold. Authors should also report, where possible, separate performance metrics for congruent and incongruent trial series.

Response: Dear reviewer, we have ensured that the p=0.059 result is consistently characterized as a marginal trend rather than a statistically significant finding, and we revised the manuscript to make this clearer. Separate performance metrics for the congruent and incongruent series could not be obtained with the system adopted to run the test. This is mentioned as a limitation of our study.

Discussion:

Comment 32 (Lines 286–466) Issue: The Discussion mirrors the three-stage structure of the Methods and Results, creating a sequential summary of findings rather than an analytical synthesis. It functions as a theoretical reporting of results rather than an inferential discussion. Since no hypotheses were stated in the Introduction, the Discussion has no evaluative framework to return to and cannot assess whether the data confirmed or refuted the study's predictions. Suggestion: Authors should consider restructuring the Discussion as a flowing analytical narrative rather than three sequential stage-summaries. It should open by returning to the study hypotheses, synthesize what the totality of the three stages demonstrates about the AV-Stroop as a clinical tool, engage directly with contradictory or unexpected findings, and build an explicit argument for the tool's clinical value and limitations.

Response: Dear reviewer, we have maintained the three-stage structure of the Discussion to facilitate clarity and organization, given the comprehensive nature and distinct methodologies of each phase. Additionally, we have revised the text to clarify the a-priori predictions that guided this research.

Comment 33 (Lines 319–324) Issue: The rationale for using nonverbal auditory stimuli — avoiding semantic and language processing biases to establish a more automatic and basal conflict — appears here for the first time. This is a central methodological justification that belongs in the Introduction. Suggestion: Authors should relocate this rationale to the Introduction as noted in Comment 4.

Response: Dear reviewer, we revised the Introduction to fill this gap. We thank you for your guidance.

Comment 34 (Lines 297–301) Issue: The neural overlap between Stroop-activated regions and tinnitus-related brain areas is listed in adjacent paragraphs without connecting them. This is potentially the most interesting theoretical point in the paper: shared neural substrates could explain why tinnitus disrupts Stroop performance, yet this is left implicit. Suggestion: Authors should explicitly connect the two sets of brain regions and develop the neural overlap argument as a theoretical contribution of the work.

Response: Dear reviewer, we agree that the neural overlap between Stroop-activated regions and tinnitus-related brain areas represents a core theoretical contribution of this work. We have revised the Discussion to explicitly connect these two bodies of research. Thanks so much for your guidance.

Comment 35 (Lines 341–349) Issue: The conceptual equivalence claim based on AV-Stroop and c-Stroop correlation is overstated. Moderate correlation coefficients (0.448–0.690) indicate shared but also substantial unshared variance. What accounts for the unshared variance is not discussed. Suggestion: Authors should temper the conceptual equivalence claim and discuss possible reasons for the unshared variance between the two tests, which could reflect genuine construct differences.

Response: Dear reviewer, we appreciate your comment. We revised the manuscript to address this question.
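The reviewer's point in Comment 35 about shared and unshared variance can be made concrete by squaring the reported coefficients. The short sketch below does this arithmetic for the two bounds quoted in the comment (0.448 and 0.690); the function name `shared_variance` is hypothetical, and treating r² as the proportion of shared variance assumes a linear relationship between the two test scores.

```python
def shared_variance(r):
    """Proportion of variance shared by two measures with correlation r (r squared)."""
    return r ** 2


# Bounds of the correlation range quoted in Comment 35.
for r in (0.448, 0.690):
    shared = shared_variance(r)
    print(f"r = {r:.3f}: shared = {shared:.1%}, unshared = {1 - shared:.1%}")
```

Under that assumption, roughly 20% to 48% of the variance is shared between the AV-Stroop and the c-Stroop, so more than half of the variance remains unexplained by the other test even at the upper bound.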

Comment 36 (Lines 355–358) Issue: The MOCA null finding is interpreted as confirming the task is 'simple, basal, easy-to-apply' without acknowledging the probable ceiling effect on MOCA scores in a healthy sample, as noted in Comment 16. Suggestion: Authors should qualify this interpretation with the ceiling effect caveat.

Comment 37 (Lines 388–398) Issue: The suggestion that WN and NB tracks showing the greatest error differences reflects the slightly higher proportion of noise-like tinnitus in the Study Group is a post hoc interpretation. The study was not designed or powered to test this hypothesis and tinnitus subtype sample sizes are too small to support it. Suggestion: Authors should clearly flag this as a speculative post-hoc observation requiring dedicated investigation in a larger, subtype-stratified sample.

Response: Dear reviewer, we appreciate your guidance. We acknowledge that the connection between tinnitus subtypes and performance on the WN and NB tracks is a post-hoc observation. We have revised the Discussion to explicitly flag this interpretation as speculative. We agree that our current sample was not powered for subtype stratification, and we have clarified that this finding serves as a hypothesis-generating observation for future, larger-scale studies.

Comment 38 (Lines 411–418) Issue: The Tinnitus Pitch finding is presented as a key conclusion but the discussion does not acknowledge its principal vulnerabilities: the WN vs Tinnitus Pitch p=0.059 result, the absence of a between-group comparison for this track, the poor test-retest reliability of tinnitus pitch matching, and the potential fatigue effect if this track was always administered last. Suggestion: Authors must acknowledge these limitations transparently when discussing the Tinnitus Pitch finding. The result should be framed as promising and hypothesis-generating rather than confirmatory.

Response: Dear reviewer, we revised the manuscript to fill this gap. Thanks for your comment.

Comment 39 (Lines 435–438) Issue: The negative emotional valence hypothesis for the Tinnitus Pitch finding is very interesting but introduced speculatively without any emotional valence rating having been obtained from participants and without a validated emotional Stroop paradigm for comparison. Suggestion: Authors should either develop this hypothesis with specific future methodological suggestions or remove it from the Discussion.

Response: Dear reviewer, we revised the manuscript, pointing to the need for future investigation. Thanks for your comment.

Comment 40 (Lines 449–453) Issue: The distinction between cognitive efficiency (reaction time) and cognitive performance (accuracy) is a meaningful theoretical contribution central to reconciling this study's findings with prior literature, but it is introduced and resolved in two sentences. Suggestion: Authors should consider expanding this distinction into a substantive paragraph addressing the existing literature on this topic and what the current findings specifically add.

Response: Dear reviewer, we revised the manuscript to fill this gap. Thanks for your comment.

Limitations:

Comment 41 (Lines 468–489) Issue: The limitations section acknowledges only two issues. The following are absent: absence of test-retest and inter-rater reliability data; no power calculation; no correction for multiple comparisons; no effect sizes; unspecified testing environment; tinnitus pitch matching reliability; no between-group Tinnitus Pitch comparison; restricted tinnitus phenotype (whistling/hissing only) limiting generalizability; probable MOCA ceiling effect; and unaddressed clinical HADS scores in a subset of participants. Suggestion: Authors must substantially expand the limitations section to address all principal methodological constraints identified above.

Response: Dear reviewer, we expanded the Limitations section. Thanks for your comment.

Comment 42 (Lines 471–473) Issue: The authors list 'not assessing the influence of tinnitus severity, characteristics, and symptom onset time' as a study limitation. However, THI scores, tinnitus duration, pitch, and loudness data were collected and are available for analysis. This is an analytical choice, not a design limitation. Suggestion: Authors should justify why THI and tinnitus characteristics were not included as covariates in the analyses or conduct and report these analyses.

Response: We appreciate the reviewer’s observation. The decision to exclude tinnitus characteristics (THI, duration, etc.) as covariates was a deliberate choice to maintain the scope of this initial three-stage validation study. Given the comprehensive nature of the current work, we believe these analyses warrant a dedicated, separate investigation. We have retained this in the Limitations/Future Directions section, explicitly stating that our goal here was to establish the tool’s fundamental sensitivity before pursuing more granular clinical stratification in future reports.

Conclusions:

Comment 43 (Lines 491–499) Issue: The conclusions section is six sentences long for a three-stage study and is substantively identical to the abstract. It does not return to the four stated study aims, does not acknowledge any limitations qualifying the findings, and does not address clinical significance or next steps. Suggestion: Authors must substantially expand the conclusions to systematically address each study aim, acknowledge the principal limitations qualifying the headline findings, state the novel contribution of the work relative to existing tools and articulate the clinical and research implications.

Response: Dear reviewer, we revised the manuscript to address this gap. Thank you for your comment.

Comment 44 (Lines 494–495) Issue: The statement 'AV-Stroop performance was not affected by the mild cognitive impairment screening score' is presented as a clean conclusion without acknowledging the probable ceiling effect on MOCA scores discussed in Comments 16 and 36. Suggestion: Authors should qualify this conclusion accordingly.

Comment 45 (Lines 498–499) Issue: The conclusion that 'a stimulus with spectral features similar to tinnitus perception proved to be more effective' overstates the evidence. This finding rests on a within-group analysis, a p-value that partially falls below the stated threshold, and several unaddressed methodological vulnerabilities. Suggestion: Authors should reframe this conclusion as a promising preliminary finding warranting confirmation in a larger, independently designed study.

Response: Dear reviewer, we revised the conclusion. Thanks for your comment.

Comment 46 (Line 495) Issue: Grammatical error: 'Subjects with tinnitus were slowly than controls' should read 'slower than controls.' Suggestion: Authors should correct this error.

Response: Dear reviewer, we revised the conclusion. Thanks for your comment.

Thank you for considering our revised manuscript.

Sincerely,

Authors

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

Final Recommendation (Second Round Review)

The revision represents a meaningful improvement. Key additions including the Helsinki declaration, directional hypotheses, study flowchart, pilot description, expanded limitations, and revised conclusions are acknowledged. The power calculation absence is noted but no further action is required of the authors on this point, and the Discussion structure is accepted as a stylistic choice.

However, the following critical issues remain unresolved and must be addressed prior to acceptance:

Must resolve:

Erroneous find-and-replace substituting "participantive" for "subjective" throughout text and reference list
Duplicate paragraph in Stage 2 results (Lines 334–340)
Table 6 TP/PT inconsistency persists despite Comment 27 being acknowledged
Stage 2/Stage 3 cohort overlap must be explicitly quantified
Participants section still lacks tinnitus diagnosis criteria, exclusion of secondary tinnitus, and confirmation of tinnitus absence in controls
MOCA ceiling effect inadequately dismissed — mean and SD must be reported and the null finding interpreted more cautiously throughout
Effect sizes for Stage 3 Mann-Whitney and Friedman tests remain absent — rank-biserial correlation and Kendall's W must be reported

Author Response

Dear Reviewer,

We sincerely thank you for your thorough review and insightful comments on our manuscript entitled “The Auditory-Visual Stroop Test to assess subjects with tinnitus”. We have carefully addressed all points raised and revised the manuscript accordingly to improve its clarity, methodological rigor and overall quality.

Below we provide a detailed point-by-point response. The reviewer's comments are presented, followed by our responses.

Comment: Erroneous find-and-replace substituting "participantive" for "subjective" throughout text and reference list

Response: Thank you for identifying this issue. The error resulted from an unintended global text replacement. All instances have been carefully reviewed and corrected throughout the manuscript, including the reference list.

Comment: Duplicate paragraph in Stage 2 results (Lines 334–340)

Response: We appreciate this observation. The duplicated paragraph has been removed and the results section has been revised to ensure clarity and avoid redundancy.

Comment: Table 6 TP/PT inconsistency persists despite Comment 27 being acknowledged

Response: Thank you for highlighting this inconsistency. Table 6 has been reviewed and corrected.

Comment: Stage 2/Stage 3 cohort overlap must be explicitly quantified

Response: We agree that this point required clarification. The manuscript has been revised to explicitly state that Stage 3 included all participants from Stage 2 (n=45), with the addition of 25 newly recruited participants, resulting in a total sample of n=70.

Comment: Participants section still lacks tinnitus diagnosis criteria, exclusion of secondary tinnitus, and confirmation of tinnitus absence in controls

Response: We appreciate this important comment. The Participants section has been revised to clearly distinguish between Stages 2 and 3. Stage 2 aimed to validate the AV-Stroop tool and therefore included participants regardless of tinnitus status. Stage 3 focused on clinical comparison and applied stricter criteria. We have now explicitly stated the diagnostic criteria for tinnitus, the exclusion of secondary tinnitus and the confirmation of absence of tinnitus in the Control Group.

Comment: MOCA ceiling effect inadequately dismissed — mean and SD must be reported and the null finding interpreted more cautiously throughout

Response: Thank you for this valuable suggestion. We have now reported the mean MOCA score (24.067) and standard deviation (3.360), demonstrating variability in cognitive performance rather than a strict ceiling effect. Additionally, the interpretation of the lack of association between MOCA and AV-Stroop performance has been revised throughout the manuscript to reflect a more cautious interpretation, emphasizing that the AV-Stroop may capture specific executive processes not fully assessed by global cognitive screening tools.

Comment: Effect sizes for Stage 3 Mann-Whitney and Friedman tests remain absent — rank-biserial correlation and Kendall's W must be reported

Response: We appreciate this important methodological recommendation. Effect sizes have now been calculated and included.
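
For reference, the two effect sizes named in the comment above (rank-biserial correlation for the Mann-Whitney test and Kendall's W for the Friedman test) can be computed as in the following minimal sketch. It is not the authors' analysis code: the function names, the example task times and the Friedman chi-square value are all hypothetical and serve only to show the arithmetic.

```python
from itertools import product


def rank_biserial(x, y):
    """Rank-biserial correlation for two independent samples.

    Equals 2 * U1 / (n1 * n2) - 1, where U1 is the Mann-Whitney U
    statistic for x: each (x, y) pair with x > y counts 1, a tie counts 0.5.
    """
    wins = sum(1.0 if a > b else 0.5 if a == b else 0.0
               for a, b in product(x, y))
    return 2.0 * wins / (len(x) * len(y)) - 1.0


def kendalls_w(friedman_chi2, n_subjects, n_conditions):
    """Kendall's W derived from the Friedman statistic: W = chi2 / (N * (k - 1))."""
    return friedman_chi2 / (n_subjects * (n_conditions - 1))


# Hypothetical total task times in seconds (not data from the manuscript).
tinnitus_times = [130.0, 128.5, 141.2, 135.0]
control_times = [120.0, 125.5, 119.8, 131.0]

r = rank_biserial(tinnitus_times, control_times)

# Hypothetical Friedman chi-square of 9.0 for 10 participants and 4 conditions.
w = kendalls_w(friedman_chi2=9.0, n_subjects=10, n_conditions=4)

print(f"rank-biserial r = {r:.2f}, Kendall's W = {w:.2f}")
```

Both statistics are bounded (r between -1 and 1, W between 0 and 1), which makes them easier to compare across tasks than the raw U or chi-square values.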

We greatly appreciate your valuable feedback, which has significantly strengthened our manuscript.

It is our honor to submit the revised version of our manuscript for consideration in Brain Sciences.

Sincerely,

Authors

Reviewer 2 Report

Comments and Suggestions for Authors

Comment 1: The manuscript has definitely been substantially improved and has more or less been rewritten. Therefore, I treated it like a new one and mostly ignored my previous comments and the authors' responses to them. However, the manuscript still needs substantial improvements.

Comment 2: The concept "execution time" is problematic in many places in the manuscript, especially since in one place it is defined as the time from when the stimulus appears until the participant responds (lines 190 – 193). In other cases, the concept of "execution time" refers to the total time it took to run the whole test. Maybe "total test time" or "total task time" would be better. Anyway, the meaning (or definition) of "execution time" must be the same throughout the manuscript.

Comment 3, Abstract: The authors frequently report "p-value" less or equal to x or bigger or equal to y. Since the p-value is singular the reporting method provides little information. When reporting that a single value is bigger or equal to e.g. 0.6 the value can range from 0.6 to infinity. Please adjust the writing method in all instances where a single p-value is reported like this.

Comment 4, Introduction: The introduction is still very limited and some of what is in the Discussion section should be moved here. This is especially pronounced when discussing the First Stage. It would be ideal to explain the traditional Stroop test in a few words, e.g. in the first paragraph.

Comment 5, lines 51 – 58: The authors should inform the reader about how tinnitus affects the cognitive tasks as reported in the papers they cite.

Comment 6, lines 59 – 63: The discussion in these lines is very limited and the authors should tell the reader more about the application of the Stroop test, and cite some of the papers that are involved along with ref 9.

Comment 7, line 62 "assess neuronal connectivity during": How is it possible to assess neuronal connectivity by applying the Stroop test? Please, explain and cite a study or two that has done so.

Comment 8, lines 64 – 68: Instead of citing ref [9] only, the authors should also cite some of the studies that have shown that tinnitus may interfere with top-down control and discuss the interference.

Comment 9, lines 69 – 73: Please explain those variations and cite the relevant papers. With respect to the rest of the manuscript, it would be ideal to explain how a stimulus can be made comparable to tinnitus matched pitch and what tinnitus matched pitch is.

Comment 10, lines 80 – 82: As the AV-Stroop test has been used it is clear that it assesses executive control and attentional factors. Please, explain in a few words what will be done differently here and what is the main purpose of this stage. When reading further, the purpose becomes obscure.

Comment 11, lines 83 – 85: Is it reasonable to expect that "cognitive screening test performance" can affect the AV-Stroop's results? The performance of a test is not the same as the performance of the individual that was tested. Please, rephrase.

Comment 12, lines 93 – 94: What is a Tinnitus Pitch stimulus? And interference in what context?

First Stage: Adaptation of the Auditory-Visual Stroop Test

Comment 13, line 126: The participants were instructed to respond when they heard the sound and point at the circle. But were they told that if the sound was presented to the left ear the circle was on the left, if to the right ear then on the right, and if to both ears then at the centre? If they were not told about the connection between the side of the sound stimulus and the side of the visual stimulus, their main emphasis would have been on the connection between the sound and the circle.

Comment 14: In the section on the First Stage, the authors never mention the participants that participated in this stage. They never explain (either in this section or in the Results section) whether or how the data collected in this stage were analysed and, if they were analysed, there is no discussion of how the results were used. At this point, it seems that the First Stage was not needed. Please explain what data was collected, how it was analysed and how it was used.

Comment 15, paragraph that begins in line 123: When discussing the procedure in this paragraph, please refer to the panels in Figure 1. As the paragraph is written, it seems that panel 1 (the topmost) was not used. The sentence "Sound stimulus was initially presented binaurally (target visual stimulus presented in the center of two distracting stimuli)." (lines 129 and 130) probably refers to panel 2 in the figure.

Comment 16: It seems that each pairing was presented 6 times since the authors wrote "completing 18 sequential items". It is a little strange to refer to the presentations as "items" rather than trials (or rounds).

Comment 17: Furthermore, it seems that this paragraph describes the first step in the First Stage and that the main purpose was "To establish the stimulus-response association".

Comment 18: How much time passed from the end of the above discussed part of Stage One until the second part began? This time is important because the longer it was, the weaker the association between the sound and the circle.

Comment 19: lines 126 – 128: The authors wrote: "During this phase, participants were conditioned to the rule that auditory stimulation (sound source) dictated the correct response side." In other words, the participants learned that the cue (the sound) was always valid. But I doubt that 18 presentations (3 x 6) were sufficient to condition the participants regarding the locations of the circle. The Stroop interference arises from responses that are close to being automatic. But it is very unlikely that 18 (3 x 6) presentations suffice to establish automatic responses, especially if the participants were not told that the sound conveyed information about the location of the circle.

Comment 20: Paragraph that begins in line 142: It seems that a second step in the First Stage is discussed in this paragraph.

Comment 21, line 146: Now the authors refer to each presentation as "series of stimuli" which seems to be trials and were previously referred to as "items". Please keep the terminology constant.

Comment 22: In this part the sound was either presented at the same side as the circle or on the other side. Which means that the cue (the sound) was either congruent (valid) or incongruent (invalid). But at this point, it is not clear if the same sound as in the first part was used here or not. This must be reported. Neither is it clear if the same participants participated in this part and in the previous part. Please, make that clear.

Comment 23, lines 151 and 152: "Each test track was about 120 second" What is a "test track"? Please explain.

Comment 24, line 153: "the following sequence of stimuli begins." What sequence of stimuli? Please explain.

Comment 25, line 155: There exist several types of monitors. That begs the question, what is a "standardized" monitor? Please, explain.

Comment 26, line 156: What is a consistent refresh rate?

Comment 27, lines 170 and 171: the authors wrote: "(right side/left side/right side + left side in stereo sound)" This description is hard to understand. Please, clarify.

Comment 28: In the paragraph that begins in line 167 the stimuli are described. It is important for the readers to know the characteristics of the stimuli before they read about how they were presented. And in the paragraph that begins in line 177 the equipment used is described. Therefore, these two paragraphs should be moved to the beginning of the First Stage section.

Comment 29, lines 177 and 178: "using a touchscreen notebook (11.6 inches, 1366 x 768 pixels, HD)" Is this the "standardized" monitor mentioned in line 155? If that is the case, the discussion of the "standardized" monitor in line 155 should be deleted.

Comment 30, Table 1: Column headings are missing, please add them.

Comment 31, paragraph that begins in line 194: It seems that in this paragraph, the authors are explaining that the stimuli were refined (or revised) to improve their quality. But they do not explain how it was done or what the refinement was based on. One could expect that the data collected in this stage (First Stage) would have been used but it seems that the two data sets were never used. If that is the case, the authors must explain why the data was collected. But if I am wrong, the authors must explain how the data was used.

Comment 33, line 196: How were those 34 test tracks selected? And what constitutes a "test track"? Please, clarify.

Comment 34, lines 197 and 198, sentence that begins with "Additionally": The meaning of this sentence is obscure, please clarify. Furthermore, so far, the authors have not provided any evidence supporting "rapid conditioning" and neither have they provided evidence supporting "stable performance". Therefore, this statement is unsupported.

Comment 35: At this stage in the paper, the reader should have a clear view and understanding of what was the purpose of the First Stage, how the data was used, how the stimuli were refined, how many participants participated in this stage and if the same participants participated in both parts of the First Stage. But that is not the case, so this section has to be substantially improved.

Comment 36: Furthermore, after reading more of the manuscript, it seems that the AV-Stroop test was designed in this stage and used later in the two other stages. If that is the case, then the authors must explain the structure of the test. If the structure is not explained at this stage, it is very hard to comprehend the next two stages.

The Second Stage

Comment 36: The participants that participated in the First Stage, did some or all of them participate in this stage also?

Comment 37, paragraph that begins in line 211: I suppose that the AV-Stroop test is the one that was used in the First Stage. Please make that clear and how many trials (or presentations) were used. Was a training section included? The sounds reported here, were they the same as in the First Stage?

Comment 38, line 215: In this line the authors cite reference 10, which is not in English and that makes the value of this citation very limited. I am not saying that it is wrong to cite ref. 10 but please explain what is the main purpose of citing this paper and preferably cite one similar in English.

Comment 39, line 221: In lines 190 – 193, the "total execution time" was defined as the time from when the stimulus appeared until the participant responded. But what is the "total execution time" now? Please, explain.

Comment 40, paragraph that begins in line 223: In this paragraph the authors presented "the MOCA test", which is a psychometric test developed in Canada, and cited reference 11. This reference is not in English and therefore has limited value, since it is reasonable to expect that the majority of the possible readers might not be able to read the cited paper. Please add a citation that evaluates this test and is written in English; there exist several possible papers. The psychometric properties of the MOCA test are very important.

Comment 41, paragraph that begins in line 227: In this paragraph the authors report the order of the tests used. They also mention an S-Stroop test that has not been discussed before; please explain what the S-Stroop test is. Furthermore, it is not clear how many trials (or presentations) were included; at least not for the AV-Stroop test. It is also unclear what the participants' task was in the S-Stroop test.

The Third Stage

Comment 42: It seems that all of the 45 participants that participated in the Second Stage participated in the Third Stage and 25 new participants were added to the sample. Please, rephrase the sentence that begins in line 234 to make this clear.

Comment 43: Tables 3 and 4. The value of standard deviation is always positive. So, writing SD +/- is wrong. Please, rectify.

Comment 44, line 248: In my previous review I asked what "clinical history" is and the authors responded that they had clarified this in the new version. But that is not the case. So, I ask again, what does it mean to "completed a clinical history"?

Comment 45, line 282: What was included in the training stage?

Comment 46, line 289: Please explain exactly what the total task execution time is and how it was measured. Please, also report how the number of errors was recorded.

Comment 47: Paragraph that begins in line 282. Please report how many trials (presentations) were in each task.

Statistical Analysis

Comment 48: Why do the authors cite reference 18 while discussing the statistical test they used? It seems to be a basic level textbook. While that might be OK, the authors should cite some peer reviewed published papers (in English) if they believe that citations are necessary.

3. Results

First Stage

Comment 49: The authors never explain how the 34 standardised test tracks were made or selected and neither have they explained what a test track is. That must be done. A big portion of what is reported here, should be in the section on the First Stage because it would greatly help the reader to understand the following stages.

Second Stage

Comment 50: Tables 5 and 6. What is Stroop LP?

Third Stage

Comment 51: In this part number of errors (and some kind of ratios) and execution times are analysed. But the total number (or ratio) of errors for each group in each task is never reported. This information is very important, so please report it.

Comment 52: The first paragraph (begins in line 366). In this paragraph the authors are discussing the outcome of the statistical tests. But it is hard to understand what was compared. In lines 366 and 367 the authors are referring to the execution times reported in rows 2 (Training) to 5 (PT) of Table 7. But in the next sentence it seems that they are referring to the rest of the table and that part is complicated, see my discussion of Table 7, below.

Comment 53, lines 366 and 367. In which tasks were these differences found? It is not sufficient to state that they are in the training and test tracks. Please, explain.

Comment 54: Tables 7 and 8. The value of standard deviation is always positive. So, writing SD +/- is wrong. Please, rectify.

Comment 55: Table 7. I recommend that the authors split this table into two tables because the information in the first part (first 5 rows) is very different from the information in the rest of the table.

Comment 56: Table 7, rows 6 – 12 (inclusive). In this part it seems that the error variable is now binary (fewer than 3 errors versus 3 or more errors) and that only the "fewer than 3" category is compared between the groups. Please explain the reasons for this. Furthermore, it seems that what is compared is the number of participants who made fewer than 3 errors, not the number of errors per se as stated in the table's caption. Whether or not the authors split this table into two, the caption must correctly and clearly explain what is included in the table.
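
The distinction drawn in Comment 56 can be illustrated with a minimal sketch. The error counts below are hypothetical and are not taken from the manuscript, and the function name is an assumption. The sketch only shows that two groups can have the same number of participants below the cutoff of 3 errors while differing in their total number of errors.

```python
def summarize_errors(error_counts, cutoff=3):
    """Summarize per-participant error counts in two ways:
    as a total and as a split at the given cutoff."""
    n = len(error_counts)
    below = sum(1 for e in error_counts if e < cutoff)
    return {
        "total_errors": sum(error_counts),
        "n_below_cutoff": below,
        "n_at_or_above_cutoff": n - below,
        "proportion_below_cutoff": below / n,
    }


# Hypothetical error counts, one value per participant.
group_a = [0, 1, 2, 2, 5]
group_b = [0, 0, 0, 1, 12]

summary_a = summarize_errors(group_a)
summary_b = summarize_errors(group_b)

# Same dichotomized result (4 of 5 participants below the cutoff) ...
print(summary_a["n_below_cutoff"], summary_b["n_below_cutoff"])
# ... but different total error counts.
print(summary_a["total_errors"], summary_b["total_errors"])
```

Reporting both summaries, as requested in Comment 51, lets the reader see what the dichotomized comparison leaves out.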

Comment 57, line 387. A p-value < 0.05 could be infinitely low, please report the exact p-value.

Comment 58: Table 8. I recommend that the authors split this table into two tables.

Comment 59: Table 8. It is hard to see what is actually reported in Table 8. In the WN-line a value of 140.5 is reported and a p-value of 0.907. The intention seems to be to report some comparisons, but what is compared to what? Please, explain in the caption.

Comment 60: Table 8. Except for the Multiple comparisons' part, p-values are only for the WN-values. Please explain the reasons behind it.

Comment 61: Table 8. What is "Multiple comparisons (errors)"? As it is written in the table (e.g. WN x NB) it suggests that this might refer to interactions. Please explain, e.g. in the caption.

4. Discussion

First Stage

Comment 62, lines 411 and 412: Ref. 9 is a review paper so please cite directly some of "those studies" (e.g. [x], [y], [z]; for a review see [9]).

Comment 63, lines 413 and 414: "Additionally, the association with the presentation side establishes a conflict between the stimulus dimensions [22,23]." What is the meaning of this sentence? In ref. 22 they used the go/no-go paradigm, and the performance of their subjects was not compared between sides.

Comment 64, lines 415 to 419: The most expected form of presentation was sound and circle (presented 18 times) but not the side. The pairing of sound locations and visual stimuli was presented 6 times for the left and 6 times for the right. If the participants had been told about the relationship between the side of the sound and the side of the circle, that would have helped. But since that was not done, the connection between sound and circle was much stronger than that between sound and side. Furthermore, no evidence supporting this claim, based on the collected data, has been reported.

Comment 65, lines 420 to 422: It is highly unlikely that 2 x 6 presentations resulted in automatic responses.

Comment 66, lines 432 – 435: This paragraph should definitely be in the intro.

Comment 67, line 433: Please note that ref. 22 did not use the Stroop paradigm but the go/no-go paradigm.

Comment 68, line 434: I suggest that the authors replace the word "accurate" with the word "detailed". This is because I don't think that it affects the accuracy of the analysis whether the total time (the execution time, in this case) or the response times are analysed. The authors should also keep in mind that the concept of execution time could just as well mean response time, since execution time is the time it takes to execute something. This becomes even more important because in lines 190 – 192 the execution time is defined as response time to a single presentation.

Comment 69: The majority of the text in this section belongs more to the Introduction section than to the Discussion section. Therefore, the authors should consider moving some of this text to the intro.

Second Stage

Comment 70, line 446: The authors used a sample drawn (selected) from a population. Therefore, it is misleading to use the word "population" in this context, please rectify.

Comment 71, line 453: Please, cite directly some of the previous studies in addition to ref. 9.

Comment 72, lines 456 – 458: Number of errors in which task? In Table 6, correlation results are reported, so please replace the word "associated" with "correlated".

Third stage

Comment 73, line 485: Please note that ref. 22 did not use the Stroop paradigm but the go/no-go paradigm.

Comment 74, lines 487 – 491: In Table 7, it seems that the reported values are the number (and ratio) of participants that made errors either fewer than 3 or 3 and more. Therefore, we do not know about the total number of errors.

Comment 75, line 498: In Table 7, error counts (i.e. the number of errors) are not reported; instead, the number of participants that made errors is reported. Please make this clear.

Comment 76, line 552: According to Table 7, the Control Group made errors in the 3 tests the authors mention here. Please take a look at this.

Comment 77, lines 565 and 566: Please cite some of these studies. If no studies are cited, this claim is unsupported.

Comment 78, line 570: More than what?

Comment 79, line 578: "Unlike other studies [25]": only one study is cited, so the singular (study) should be used rather than the plural, although it would be best to cite more than just one.

Lines 578 – 582: Please note that ref. 22 did not use the Stroop paradigm but the go/no-go paradigm.

Limitations and Future Directions

Comment 80, line 592: It is not clear what the authors are referring to when mentioning "The operating system". Usually, this term refers to the operating system used to control the computer (e.g. MacOS, Linux, Windows). Given that the authors refer to PowerPoint in line 177, they are probably referring to the software used, not the operating system per se. Furthermore, in line 190 the authors wrote that the execution time was recorded automatically. This is contradictory, please explain this discrepancy.

Comment 81, lines 619 – 623: This paragraph should be in the intro (or the section on statistical analysis) because if it is located there, the reader knows this before reading the results section.

Author Response

Dear Reviewer,

Below we provide a detailed point-by-point response. The reviewer's comments are presented, followed by our responses.

Response: We appreciate this overall assessment. In this revised version, we have implemented substantial changes across sections, including restructuring the Introduction, clarifying the methodology (particularly Stage 1), standardizing terminology and improving presentation of results and tables.

Response: Thank you for identifying the inconsistency. The term “execution time” has been replaced with “total task time”, defined as the total duration required to complete the task. Additionally, when referring to single-stimulus responses, we now consistently use “response time”.

Response: We appreciate your comment and agree with this important point. All p-values are now reported as exact values whenever possible. In cases where values are below the reporting threshold, we use the standard format (p < 0.001).
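
The reporting rule described in this response can be sketched as a small formatting helper. This is an illustrative assumption rather than the authors' procedure: the function name and the choice of three decimal places are hypothetical, while the 0.001 threshold is the one stated above.

```python
def format_p(p, threshold=0.001, decimals=3):
    """Return an exact p-value string, or 'p < threshold' for very small values."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie between 0 and 1")
    if p < threshold:
        return f"p < {threshold}"
    return f"p = {p:.{decimals}f}"


# Two p-values quoted in the review comments, plus one below the threshold.
print(format_p(0.059))   # p = 0.059
print(format_p(0.907))   # p = 0.907
print(format_p(0.0004))  # p < 0.001
```

Checking the threshold before rounding avoids reporting a value such as 0.0006 as "p = 0.001".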

Response: We appreciate your comment. This section has been substantially expanded and restructured to improve conceptual clarity and provide a stronger theoretical foundation.

Specifically, we have:

Included a concise explanation of the classical Stroop paradigm and its relevance to executive control
Expanded the discussion on cognitive interference in tinnitus, including attentional and inhibitory mechanisms
Clarified how tinnitus may affect top-down control processes
Added references to primary studies (in addition to review articles)
Provided a clearer description of auditory stimulus matching (tinnitus pitch matching) and its relevance
Explicitly stated the purpose and novelty of each study stage, particularly Stage 1

Comment 5, lines 51 – 58: The authors should inform the reader about how tinnitus affects the cognitive tasks as reported in the papers they cite.

Response: We appreciate your comment. We reviewed the introduction to address this gap.

Comment 7, line 62 "assess neuronal connectivity during": How is it possible to assess neuronal connectivity by applying the Stroop test? Please, explain and cite a study or two that has done so.

Response: We appreciate your comment. We reviewed the introduction to address this gap.

Response: We appreciate your comment. We reviewed the introduction and rephrased this sentence.

Comment 12, lines 93 – 94: What is a Tinnitus Pitch stimulus? And interference in what context?

Response: We appreciate your comment. We reviewed the introduction to clarify these questions.

First Stage: Adaptation of the Auditory-Visual Stroop Test

Response: We appreciate this important methodological point. We have clarified in the text that the spatial mapping between auditory stimuli and visual targets was not explicitly instructed, as this was an intentional design choice to preserve the ecological and intuitive nature of the task.

Response: We agree that this section required substantial clarification. Stage 1 has now been clearly defined as a technical development and calibration phase, not a data-driven analytical stage. We now explicitly state that no clinical data were collected for inferential analysis and that the purpose was to develop the AV-Stroop paradigm, calibrate auditory stimuli, define the stimulus-response structure and standardize task parameters. We also clarify how this stage informed the design of Stages 2 and 3.

Response: Dear reviewer, we appreciate your comment. We reviewed the methods to clarify these questions.

Response: Dear reviewer, we appreciate your comment. We reviewed the methods to clarify these questions. There were 9 trials on the right side and 9 trials on the left side.

Comment 17: Furthermore, it seems that this paragraph describes the first step in the First Stage and that the main purpose was "To establish the stimulus-response association".

Response: We thank the reviewer for this insightful observation. We agree that the main purpose of this initial step is to establish the stimulus-response association. We have updated the text to clarify it.

Response: We thank the reviewer for raising this important point. We have clarified in the methodology that the experimental trials were initiated immediately following the training phase. The time interval between the two steps was minimal (less than one minute), which ensured that the stimulus-response association remained strong and active throughout the testing procedure.

Response: We thank the reviewer for this important conceptual distinction. We have revised the manuscript to avoid overstating the concept of conditioning. The term “conditioned” has been replaced with “familiarization”, and we now clarify that the task relies on stimulus-response compatibility, not automaticity, and that the brief training phase was intended to ensure task comprehension, not to induce automatic responses.

Comment 20: Paragraph that begins in line 142: It seems that a second step in the First Stage is discussed in this paragraph.

Response: Dear reviewer, we appreciate your comment. The adaptation of the AV-Stroop Test encompassed the development of a training block and the test tracks. We reviewed the methods to clarify these questions.

Comment 21, line 146: Now the authors refer to each presentation as "series of stimuli" which seems to be trials and were previously referred to as "items". Please keep the terminology constant.

Response: Dear reviewer, we appreciate your comment. Terminology has been standardized throughout: “items”/”series of stimuli” was replaced with “trials”.

Response: Dear reviewer, we appreciate your comment. We revised the Methods to clarify these points.

Comment 23, lines 151 and 152: "Each test track was about 120 second" What is a "test track"? Please explain.

Comment 24, line 153: "the following sequence of stimuli begins." What sequence of stimuli? Please explain.

Response: Thank you for pointing out this ambiguity. By "the following sequence of stimuli," we intended to refer to the series of experimental auditory and visual stimuli that are presented to the participant during the trials. We have revised the manuscript to be more precise, explicitly stating that this refers to the series of test trials involving both congruent and incongruent conditions.

Comment 25, line 155: There exist several types of monitors. That begs the question, what is a "standardized" monitor? Please, explain.

Response: We appreciate your request for clarification. We have replaced the term "standardized" to clarify that the exact same monitor and hardware setup were used for all participants throughout the data collection process.

Comment 26, line 156: What is a consistent refresh rate?

Response: We appreciate your request for clarification. The exact same monitor and hardware setup were used for all participants to ensure that stimulus size, brightness, and display latency remained uniform and controlled for every subject.

Comment 27, lines 170 and 171: the authors wrote: "(right side/left side/right side + left side in stereo sound)" This description is hard to understand. Please, clarify.

Response: Thank you for pointing out the ambiguity in our description of the auditory stimuli. We have revised the text to replace the colloquial description with standard acoustic terminology, specifying that the auditory stimuli were delivered via stereo headphones in three configurations: unilateral right, unilateral left, and binaural presentation.

Response: Dear reviewer, we appreciate your comment. We reviewed the methods and reorganized it.

Response: Dear reviewer, we appreciate your comment. We reviewed the methods to make it clearer.

Comment 30, Table 1: Column headings are missing, please add them.

Response: Dear reviewer, we appreciate your insightful comment. We added the headings. Thanks for your support.

Response: Dear reviewer, we appreciate your comment.

We have clarified that the refinement process was based on:

pilot testing of stimulus clarity and discrimination
verification of participant comprehension
adjustment of stimulus presentation timing

We now explicitly state that this process was iterative and observational, rather than based on formal statistical analysis.

Comment 33, line 196: How were those 34 test tracks selected? And what constitutes as "test tracks"? Please, clarify.

Response: Thank you for the question and the opportunity to clarify. We have expanded this section to clearly define "test tracks" as pre-structured auditory stimuli sets derived from clinical pitch-matching protocols, including Pure Tone (PT), Narrow Band (NB), and White Noise (WN) stimuli across standard audiological frequencies (0.25 to 8 kHz).

Response: Thank you for pointing out the lack of clarity and evidence regarding this statement. We have revised the text to clarify that our assessment of the training track was based on observational testing, which indicated that 18 trials were sufficient to establish the stimulus-response association and stabilize total track times without inducing cognitive fatigue. We have adjusted the language to reflect this practical observation rather than presenting it as a formal psychometric validation.

Response: Dear reviewer, we appreciate your comment. We reviewed the methods to make it clearer and improve this section.

Response: Thank you for this valuable feedback. We have clarified that the AV-Stroop test used in Stages 2 and 3 is identical to the version developed in Stage 1, that the same stimulus types and structure were maintained, and that a training phase was consistently applied.

The Second Stage

Comment 36: The participants that participated in the First Stage, did some or all of them participate in this stage also?

Response: Dear reviewer, we appreciate your comment. We reviewed the methods to make it clearer.

Response: We thank the reviewer for the question. We confirm that the AV-Stroop test used in this stage is the same instrument developed and adapted in the First Stage. To avoid ambiguity, we have updated the manuscript to explicitly state that the same paradigm, stimulus types (PT, NB, and WN), and training protocol were utilized across the clinical stages.

Response: We appreciate your comment regarding the accessibility of reference [10]. We have retained this citation because it contains the foundational data and original methodology specific to this paradigm. The purpose of citing these works is to establish the theoretical basis for our approach.

Response: Dear reviewer,

Terminology has been fully standardized to “total task time”, as recommended. Thank you for your guidance and support.

Response: Thank you for this feedback. While we have retained our original reference because it establishes the foundational parameters used in our specific study design, we appreciate the need for accessible English-language documentation regarding the test's psychometric properties. Accordingly, we have added the foundational English-language reference for the MoCA (Nasreddine et al., 2005) to the manuscript to provide the requested psychometric context.

Response: Thank you for highlighting this error. Corrected to C-Stroop throughout the manuscript.

The Third Stage

Response: Dear reviewer, we appreciate your comment. We reviewed the methods to make it clearer.

Comment 43: Tables 3 and 4. The value of standard deviation is always positive. So, writing SD +/- is wrong. Please, rectify.

Response: Thank you for the comment. We corrected it.

Response: Thank you for the follow-up; we apologize for the continued ambiguity. We have replaced the phrase "completed a clinical history" with a more precise description of the data collection process. Specifically, this involved the collection of demographic and identification data, including age and gender, to characterize the participant cohort, as well as data regarding general health. We have updated the manuscript accordingly.

Comment 45, line 282: What was included in the training stage?

Response: Thank you for the question; we apologize for any lack of clarity. The training stage consisted of a dedicated familiarization period using a "training track." We revised this sentence to make it clearer.

Comment 46, line 289: Please explain exactly what the total task execution time is and how it was measured. Please, also report how the number of errors were recorded.

Response: Thank you for the question and the suggestion. We have clarified that "total task execution time" is defined as the total task time. As suggested, we have adopted and standardized the term "total task time" throughout the manuscript. We have clarified that the measurement mechanics for both the total task time and the error rates are defined in the methodology section of the First Stage: the total task time was recorded automatically and was defined as the total time it took to run the whole test. Errors were defined as incorrect target selection or inappropriate responses and were manually counted.

Comment 47: Paragraph that begins in line 282. Please report how many trials (presentations) were in each task.

Response: Thank you for the question; we apologize for any ambiguity. We have updated the manuscript to clarify that each task in this stage utilized the same format and trial structure as the AV-Stroop test adapted in the first stage.

Statistical Analysis

Response: Thank you for this observation. We agree that the citation was not necessary for these standard statistical procedures. We have removed the reference from the manuscript as requested.

Results

First Stage

Response: Thank you for this feedback. We have revised the methodology of the First Stage to define what constitutes a "test track" and to explain the selection criteria of the 34 standardized tracks used in the paradigm.

Second Stage

Comment 50: Tables 5 and 6. What is Stroop LP?

Response: Thank you for pointing out this oversight. The term "Stroop LP" was a typographical error. We have corrected it throughout Tables 5 and 6, as well as in the related text, to ensure consistency with the established methodology.

Third Stage

Response: Dear reviewer, we added this information. Thank you for your careful support.

Response: Dear reviewer, we reviewed the results section to clarify these questions. Thank you for your kind support.

Comment 53, lines 366 and 367. In which tasks were these differences found? It is not sufficient to tell that is in the training and test tracks. Please, explain.

Response: Dear reviewer, we reviewed the results section to clarify these questions. Thank you for your kind support.

Comment 54: Tables 7 and 8. The value of standard deviation is always positive. So, writing SD +/- is wrong. Please, rectify.

Response: Thank you for the comment. We corrected it.

Response: Dear reviewer, we appreciate your comment. We divided the table. Thank you for your kind support and guidance.

Response: Thank you for the detailed observation and for pointing out the ambiguity in the table caption. We clarify that due to the low frequency and skewed distribution of errors, a categorical approach (participants making < 3 errors vs. > 3 errors) was adopted to allow meaningful comparison between groups.

Comment 57, line 387. A p-value < 0.05 could be infinitely low, please report the exact p-value.

Response: Dear reviewer, we revised the manuscript to address your comment and avoid generic numbers. Thanks for your kind guidance.

Comment 58: Table 8. I recommend that the authors split this table into two tables

Response: Dear reviewer, we appreciate your comment. We split the table. Thanks for your kind support and guidance.

Response: Dear reviewer, thanks for your comment. In the Third Stage, we presented the comparison between the Study and Control Groups. Later, the comparison within the Study Group considered the analysis of the results in the Tinnitus Pitch track. We reviewed the text to clarify.

Comment 60: Table 8. Except for the Multiple comparisons' part, p-values are only for the WN-values. Please explain the reasons behind it.

Response: Dear reviewer, in this table, we

We have revised Table 8 and its caption to explicitly describe:

the comparisons performed
the reference condition (tinnitus pitch)
the meaning of reported p-values

Comment 61: Table 8. What is "Multiple comparisons (errors)"? As it is written in the table (e.g. WN x NB) it suggests that this might refer to interactions. Please explain, e.g. in the caption.

Response: Thank you for this observation and for pointing out the potential ambiguity in the table's terminology. The label "Multiple comparisons (errors)" refers to the pairwise post-hoc comparisons conducted after the Friedman test. Within the tinnitus group, the Friedman test was applied to compare the number of errors across the different test tracks (White Noise, Narrow Band, Pure Tone, and Tinnitus Pitch). The notation "WN x NB" denotes the comparison between these specific conditions, not an interaction effect. We have revised the caption of the table to clarify this methodology.
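
To illustrate the procedure described in this response, the sketch below computes the classic Friedman chi-square statistic across four tracks and lists the six pairwise comparisons that a post-hoc step would cover. It is only a sketch under stated assumptions: the error counts are invented, the helper names are hypothetical, the label TP for the Tinnitus Pitch track is an assumption, no tie correction is applied, and the post-hoc test itself is not reproduced because the response does not name it.

```python
from itertools import combinations


def average_ranks(values):
    """Rank values from 1..k; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for idx in order[i:j + 1]:
            ranks[idx] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(rows):
    """Classic Friedman chi-square statistic (no tie correction).

    rows: one list per participant, one value per condition.
    """
    n = len(rows)
    k = len(rows[0])
    rank_sums = [0.0] * k
    for row in rows:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)


# Hypothetical error counts, NOT data from the manuscript.
# Columns: WN, NB, PT, TP (TP = Tinnitus Pitch, an assumed label).
conditions = ["WN", "NB", "PT", "TP"]
errors = [
    [0, 1, 2, 3],
    [1, 0, 2, 3],
    [0, 1, 3, 2],
    [0, 2, 1, 3],
]

statistic = friedman_statistic(errors)

# Each pair is a comparison between two tracks, not an interaction term.
pairs = [f"{a} vs {b}" for a, b in combinations(conditions, 2)]
print(round(statistic, 3), pairs)
```

With four tracks there are six pairwise comparisons, so any correction for multiple testing would be applied across those six p-values.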

Discussion

First Stage

Comment 62, lines 411 and 412: Ref. 9 is a review paper so please cite directly some of "those studies" (e.g. [x], [y], [z]; for a review see [9]).

Response: Dear reviewer, thanks for your comment. We revised the text to fill this gap.

Response: Thank you for the question and for pointing out this ambiguity. We revised the manuscript to fill this gap.

Response: Thank you for this observation.

We agree with the reviewer that automaticity cannot be assumed. We have revised the Discussion to remove this interpretation and instead describe the task in terms of:

stimulus-response compatibility
task familiarization

This aligns the interpretation with established cognitive frameworks.

Comment 65, lines 420 to 422: It is highly unlikely that 2 x 6 presentations resulted in automatic responses.

Response: Dear reviewer, we thank you for this comment. We would like to reinforce that the task is intuitive; participants rapidly adapted to the association without requiring extensive training. All participants successfully completed this phase and demonstrated full understanding during testing, confirming that the brief period was adequate for the experimental task.

Comment 66, lines 432 – 435: This paragraph should definitely be in the intro.

Response: Thank you for the recommendation. We carefully considered moving this paragraph to the Introduction. However, we respectfully retained it in the Discussion section because it provides an interpretation of the specific methodological decisions presented as a result of the AV-Stroop test. We reviewed the Introduction to be more comprehensive and the writing to improve the quality and consistency of our manuscript. We sincerely thank you for your kind support.

Comment 67, line 433: Please note that ref. 22 did not use the Stroop paradigm but the go/no-go paradigm.

Response: Thank you for the question and for pointing out this ambiguity. We revised the manuscript to fill this gap.

Response: Dear reviewer, we appreciate your guidance and support. We standardized the terms, as you recommended, and revised the manuscript to make it clearer.

Response: Thank you for this observation. We carefully considered the suggestion to move this text to the Introduction. However, we respectfully retain this discussion in its current section because it specifically addresses the adaptation of the instrument, which is a direct methodological outcome of our study rather than background literature. Moving this interpretation to the Introduction would require presenting the study's results before they have been introduced. Therefore, we believe it is most appropriate to keep this discussion connected to the interpretation of our findings.

Second Stage

Comment 70, line 446: The authors used a sample drawn (selected) from a population. Therefore, it is misleading to use the word "population" in this context, please rectify.

Response: Dear reviewer, thanks for your comment. We rectified it.

Comment 71, line 453: Please, cite directly some of the previous studies in addition to ref. 9.

Response: Dear reviewer, thanks for your comment. We reviewed the text and added these references.

Comment 72, lines 456 – 458: Number of errors in which task? In Table 6, correlations results are reported, so please replace the word "associated" with correlated.

Response: Thank you for the comment. We revised the manuscript to fill this gap.

Third stage

Comment 73, line 485: Please note that ref. 22 did not use the Stroop paradigm but the go/no-go paradigm.

Response: Thank you for the question and for pointing out this ambiguity. We revised the manuscript to fill this gap.

Response: Thank you for the question and for pointing out this ambiguity. We revised the manuscript to fill this gap and added the information about the number of errors.

Comment 75, line 498: In Table 7, error counts (e.g. number of errors) is not reported, but the number of participants that made errors are reported. Please make this clear.

Response: Thank you for the comment. We reviewed the manuscript to make it clearer.

Comment 76, line 552: According to Table 7, the Control Group made errors in the 3 tests the authors mention here. Please take a look at this.

Response: Thank you for the comment. We reviewed the manuscript to make it clearer.

Comment 77, lines 565 and 566: Please cite some of these studies. If no studies are cited, this claim is unsupported.

Response: Thank you for the comment. We reviewed the manuscript to make it clearer.

Comment 78, line 570: More than what?

Response: Thank you for the comment. We revised the manuscript to correct this.

Comment 79, line 578: "Unlike other studies [25]" only one cited so singular (study) should be used but not plural. Although it would be best to cite more than just one.

Response: Thank you for the comment. We revised the manuscript to correct this.

Lines 578 – 582: Please note that ref. 22 did not use the Stroop paradigm but the go/no-go paradigm.

Response: Thank you for the question and for pointing out this ambiguity. We revised the manuscript to fill this gap.

Limitations and Future Directions

Response: Thank you for identifying this terminology error.

The term “operating system” has been replaced with “presentation software (Microsoft PowerPoint)”.

We clarify that:

stimulus presentation was controlled via PowerPoint
timing was recorded using its internal features

This resolves the inconsistency.

Comment 81, lines 619 – 623: This paragraph should be in the intro (or the section of statistical analysis) because if it is located there, the reader knows this before reading the results section.

Response: Dear reviewer, thank you for your kind support. We made this correction.

We appreciate your time and effort to improve our work, and we hope that the revisions we have made address all concerns.

It is an honor to submit the revised version of our manuscript to Brain Sciences.

Thank you for considering our revised manuscript.

Sincerely,

Authors

Round 3

Reviewer 2 Report

Comments and Suggestions for Authors

Comment 1: I have read the authors' responses in the cover letter and the revised manuscript. In many cases the authors wrote in the cover letter that they had modified the manuscript in accordance with my comments without actually making the modifications. If the authors do not agree with my comments, they must explain why and justify that they are not going to modify the manuscript. But it is very important that what is written in the cover letter is congruent with what was done in the manuscript. However, the manuscript is definitely much better now but it still needs considerable improvements.

Comment 2, the error rates: Number of errors is one of the measures that the authors used to compare the groups. They cite studies that also reported few errors and low error rates. Both refs. 8 and 15 warn the reader to keep the low error rates in mind when interpreting the results. This strongly suggests that error rates are not the preferred measure to evaluate the Stroop effect. Ref 20 reports results of a two-cases study and with quite a few errors. In comment no. x I discuss the potential problem of error rates as low as those reported in the manuscript, and this is a point that the authors need to discuss.

Abstract
Comment 3, line 36: "wording reading" -> word reading

Intro

Comment 4: Although the Intro is better than in the previous version, it is still not good enough. In many cases it looks like a collection of almost unrelated paragraphs. In other words, the flow in the Intro is far from being good and needs to be improved.

Comment 5, line 96: More than what?

First stage

Comment 6: The First Stage is split into several paragraphs, and it seems that in the first paragraph the preparation (or making) of the sound stimuli is explained. But in the next paragraph the authors discuss the equipment used and in the third they discuss how this "experiment" was run. Therefore, I strongly recommend that a sub-heading be added to the first and the third paragraph; that would substantially improve readability.

Comment 7: The sentence that begins in line 145. I do not agree that the authors were refining the AV-Stroop paradigm, they were inventing a new one.

Comment 8, line 147: What is clinical data? And since clinical data was not collected, was some other kind of data collected?

Comment 9, line 157: In this line the authors wrote that the "presentation format" was sent to left ear, right ear and both ears (bilateral). But what is a "presentation format"?

Comment 10, lines 159 – 161: This "computerized test material"? What did it consist of? Please explain.

Comment 11, line 168: I don't think that the participants were conditioned, but they became familiar with what was the target and what was the distractor.

Comment 12, lines 164 – How often were the stimuli presented in the condition described in Figure 2, Panel 1?

Comment 13, lines 164 – 167: In these lines the authors wrote: "Participants hear the recorded instruction to point to the circle to verify their visual discrimination of the target [...]" and "Afterwards, participants hear the recorded instruction “When you hear the sound, point to the circle”." These sentences seem to convey the same or similar meaning, but the word "Afterwards" makes them obscure: "afterwards" compared to what? Please, rephrase.

Comment 14, lines 167 and 168: In comment no. in my last report, I expressed my doubts that 3 x 6 paired presentations were sufficient to condition the participants and in their reply the authors wrote: "We have revised the manuscript to avoid overstating the concept of conditioning." However, they are still overstating the effects of those 3 x 6 paired presentations. It is necessary to keep in mind that those 18 presentations were split into three conditions, left, right and middle with 6 presentations in each condition.

Comment 15, lines 170 and 171: According to these lines, the sounds were presented to both ears but it is not mentioned how often.

Comment 16, lines 172 – 174: In these lines the authors wrote that there were "18 sequential series of stimuli". In the authors' response to my previous comment no. 21 the authors wrote that "“items”/”series of stimuli” was replaced with “trials”" but that seems not to be the case. Please keep your modifications of the manuscript in accordance with your responses in the cover letter.

Comment 17, lines 176 – 181: I repeat from my last report (e.g. comment 19). The strongest connection must have been with the stimulus type (circle), not with the location of the stimulus.

Comment 18, line 179: In this line the authors wrote: "This allowed for implicit association of stimuli", association to what?

Comment 19, line 180: I do not agree with the statement that the presentations ensured the association of stimuli, but I'll leave that for the future readers. Keep in mind that the strongest association was between the sound (wherever it was presented) and the stimulus type (the circle) but not the locations. The sound-circle pairing was presented 18 (9 left and 9 right) times plus the times the pairing was presented to both ears with the circle in the middle. How often for both ears and the circle in the middle?

Comment 20: In my previous comment no 23 I asked what a "test track" is and the authors responded that they had now clarified that in the revised manuscript. But that is not the case. Again, please keep your modifications of the manuscript in accordance with the responses in the cover letter.

Comment 21, line 200: What is a test track? Despite my previous comment no 23 and the authors' response to it, it is still not clear what a test track is. Please explain.

Comment 22, Table 1. It seems to me that in the column headers the authors use the (probably not an English word) "Serie" where they are probably referring to trials. This is contrary to the authors' response to my comment no 21.

Comment 23, line 236: What are "standard clinical ranges"? Please explain.

Comment 24, lines 236 – 238: If there are no data supporting the statement that the 18 training trials (series of stimuli) sufficiently established stable "track" (trials?) then that statement is completely unsupported. It is possible that the authors subjectively concluded that these 18 trials sufficed, but they must support that statement with data, i.e. objective evidence that the reader can evaluate when reading the paper.

Comment 25, lines 242 and 243: In what context are frequencies in the range 0.25 to 8 kHz standardised? Please, explain.

Comment 26, lines 244 and 245: How did the display, per se, present stimuli? And what is a uniform monitor (left and right)?

Comment 27, lines 246 – 250: This description clearly shows that the tasks were cueing tasks with valid and invalid cues. However, I just leave this to the future readers.

Comment 28, line 251: What is total track time? Probably the same as total task time defined in lines 227 and 228. I reiterate from above and from my previous report, please keep the terminology constant.

Comment 29: In the authors' response to my previous comment no 31 the authors wrote: "pilot testing of stimulus clarity and discrimination" and "verification of participant comprehension". To be able to draw conclusions about this, results from data are necessary. Furthermore, in this same response the authors wrote that they explicitly stated that the process of evaluation in the section on the First Stage was mainly observational, but I was unable to find that statement.

Comment 30: The authors' response to my previous comment no. 34: I was unable to find this in the revised manuscript.

Comment 31: Concluding remarks on the First Stage: Because no analyses were used to refine the stimuli and no discussion of how the authors evaluated the stimuli, this section is of very limited value for the future readers. Therefore, most of this section should be removed. Figure 2, panels 2, 3 and 4 should be kept and used to explain the experimental procedure along with the info provided in lines 239 – 252. The authors must keep in mind that replications are very important in science. And to make replications possible the experimental procedure (or procedures) must be explained in sufficient detail to make it possible for others to run a very similar experiment. Most of what is written in the section on the First Stage does not contribute to this. It suffices to provide a good and detailed explanation of the structure of the stimuli, e.g. the duration of the sound, the frequency and the loudness all of which is reported.

Second Stage

Comment 32: My previous comment no. 36. The authors have not responded adequately to this comment.

Comment 33, lines 260 – 262: If tinnitus can affect cognitive processing (which I do not doubt), the participants with tinnitus might have affected the results from the Second Stage.

Third Stage

Comment 34, line 313: In this line the authors still refer to "a clinical history" and it is still unclear what it means (this is the third time I ask about this). It might be obvious to some of the possible future readers which are clinical researchers (i.e. investigating something related to mental illness). But I am pretty sure that not all the readers of Brain Sciences are clinical researchers. So please, rephrase the sentences where "a clinical history" is discussed.

Results - First Stage

Comment 35, lines 393 – 396: The authors still do not explain how the test tracks were validated. Please explain.

Results - Second Stage

Comment 36, lines 400 – 401: Please inform the readers that are not familiar with the MoCA test, what the range is (min value and max value) and where the cutoff between diagnosed and undiagnosed people is.

Comment 36, line 406: Although the Stroop scores were not associated with the MoCA scores, please report the p-value.

Results - Third Stage

Comment 37, lines 454 – 456: This sentence is very obscure. What did the authors mean when they wrote "distribution of participants"? With respect to what follows, I suppose that the participants were split into two groups within each condition (two groups in study group and two groups in the control group) based on the number of errors they made. Please rephrase to make this clear.

Comment 38, line 455 and 456: In these lines the authors wrote: "considering an error threshold of fewer than or equal to and more than 3 errors." This definition is ambiguous. The threshold was less than or equal to 3: Lower threshold <= 3. Or greater than 3: Upper threshold > 3. I strongly recommend that the authors rephrase this and use mathematical signs instead of writing this out with words, because by doing so the ambiguity disappears. According to the definition in the text, the upper limit for the group with fewer errors was <= 3 but in Table 9 it seems to be < 3. Please take a look at this.
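
The ambiguity raised in this comment can be made concrete with a short sketch. The error counts below are invented for illustration and the function names are hypothetical; the point is only that "<= 3 vs > 3" assigns every participant to a group, whereas "< 3 vs > 3" leaves anyone with exactly 3 errors unassigned.

```python
def split_inclusive(error_counts, threshold=3):
    """Split on '<= threshold' vs '> threshold': every count lands in one group."""
    low = [e for e in error_counts if e <= threshold]
    high = [e for e in error_counts if e > threshold]
    return low, high


def split_strict(error_counts, threshold=3):
    """Split on '< threshold' vs '> threshold': counts equal to the threshold are dropped."""
    low = [e for e in error_counts if e < threshold]
    high = [e for e in error_counts if e > threshold]
    return low, high


# Hypothetical per-participant error counts, NOT data from the manuscript.
counts = [0, 1, 2, 3, 3, 4, 9]

low, high = split_inclusive(counts)
strict_low, strict_high = split_strict(counts)

# Participants who belong to neither group under the strict reading.
unassigned = len(counts) - len(strict_low) - len(strict_high)
print(len(low), len(high), unassigned)
```

Stating the rule with explicit signs in both the text and the table, in the same form, removes the discrepancy.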

Comment 39, Table 8: Please, keep the number of decimal places constant.

Comment 40, Table 9: It is not possible to find out what the p-values refer to. For example, when looking at the results for the WN-task, there is one row with < 3 errors and one row for >= 3 errors. But the reported p-value is between these two lines and that makes it impossible to understand what was compared that resulted in the reported p-values. Please make that clear, e.g. in the table's caption.

Comment 41, Tables 8 and 9: From Table 8 it is obvious that the participants made very few errors and the info in Table 9 confirms it; a big majority of the participants made less than 3 errors which actually means that their number of errors ranged from 0 to 2. The tasks in the AV-test were very easy (discriminating between a circle and a square). Such low error rates suggest that the task was so easy that ceiling effects are present and that makes the error analyses results much less valuable. The authors should discuss these possible ceiling effects regarding error rates.

Comment 42, lines 469 – 470: This paragraph is rather obscure. What does "performed" refer to? And between which test tracks? Please state clearly what test was used and which variables (both independent and dependent) were included in the test.

Comment 43, Table 10: The caption is obscure. What do the authors mean when writing "considering the Tinnitus Pitch track"? Please explain. Please also explain what the p-values refer to. That must be some kind of comparisons, but what is compared is far from being clear, please make that clear. Why is only one effect size reported and effect size of which measure? And what values are reported in the Results column (there are missing spaces)? The column headings have to be descriptive of what is in the corresponding column. Please rectify.

Comment 44, Table 11. I strongly recommend that the authors replace the "x" (e.g. WN x NB) with "vs" because that makes it obvious that e.g. WN is compared to NB. The difference between the conditions (e.g. WN vs. NB) must also be reported. Please remove from "Legend" description information that is not relevant for the table.

Comment 45, lines 478 – 487: When discussing the results and referring to "overall pattern" it is not sufficient to report only the p-values because a difference can be significant while it is so small that it is of no importance. In this paragraph the authors refer to Table 11, but in that table only the p-values are reported (well, also two effect sizes) but the value of the differences is not reported. And because the values are not reported it is not justified to write about trends. Furthermore, according to Table 8 the error rates are low (min value 0 and max value 9) and such a low error rate might even reflect ceiling effects. Because of this, the results from the error analyses need to be interpreted with caution.

Discussion

Comment 46, lines 515 – 520: In response to my previous comment no. 64, the authors wrote in the cover letter that they agreed with the comment and had modified the text accordingly. But that is not the case. In addition, they still ignore the fact that the most common pairing is not between sides (2 x 6) but between sound and stimulus type (i.e. sound and circle 18 times). I don't think that 6 or even 18 presentations are sufficient to condition an individual. But that might suffice to make an individual familiar with the pairing, even the side-to-side pairing.

Comment 47: The authors' response to my previous comment no. 65. Even though the participants finished the task with acceptable performance, that does not mean that the responses were automated. And since this part of the discussion refers to the First Stage, there are no results reported that support the authors' claim in the response to my comment that "participants rapidly adapted to the association without requiring extensive training. All participants successfully completed this phase and demonstrated full understanding during testing, confirming that the brief period was adequate for the experimental task." It does not suffice to write that the participants' performance was adequate; that must be supported by data.

Comment 48, line 529 and the authors' response to my previous comment 67: In the cover letter the authors wrote that they had modified the manuscript accordingly, but ref 22 is still cited as if ref 22 used Stroop, which was not the case. Please rectify.

Comment 49, line 651: In this line the authors wrote: "only the Study group made more than three errors per test." As I pointed out in my previous comment no. 76, participants in both groups made errors in the WN, NB, and TP tasks. The statement is clearly wrong and must be corrected! This is one more example showing that, although the authors wrote in the cover letter that they had modified the manuscript in accordance with my comment, they did not.

Author Response

Dear Reviewer,

We sincerely thank you for your thorough review and insightful comments on our manuscript entitled “The Auditory-Visual Stroop Test to assess subjects with tinnitus”.

We have carefully addressed all points raised and revised the manuscript accordingly to improve its clarity, methodological rigor and overall quality.

Below we provide a detailed point-by-point response. Reviewer’s comments are presented followed by our responses.

Response: We sincerely apologize for any perceived discrepancies between our previous cover letter and the manuscript revisions. It was certainly not our intention to overlook any of the requested changes. We have performed a rigorous, line-by-line audit of the entire manuscript to ensure that every response provided in this letter is explicitly reflected in the updated text.

We appreciate the reviewer’s comment that the manuscript is 'definitely much better now.' Following your guidance, we have further refined the document to meet the required standard. In the responses below, we have clearly indicated the exact page and line numbers for every modification, and where we have respectfully maintained our original approach, we have provided a detailed justification for doing so.

Response: We thank the reviewer for this insightful critique regarding the utility of error rates in the Stroop paradigm. We agree that in populations with preserved cognitive function, error rates are often low and may not capture the interference effect as sensitively as total task time or reaction time.

As suggested, we have expanded our Discussion to explicitly address this. We clarify that while the low error rates prevent these measures from being the primary indicator of the Stroop effect, they serve as a crucial "quality control" metric. They demonstrate that the observed differences in total task times across the AV-Stroop test were not due to a "speed-accuracy trade-off," but rather reflected a genuine increase in cognitive load. We have cited the warnings from references [8] and [15] to frame our null findings more cautiously, acknowledging that the stability of the error rates reflects the high compensatory capacity of this cohort. We have updated this on Page 22, Lines 775-783 (Discussion).

Abstract
Comment 3, line 36: "wording reading" -> word Reading

Response: Dear reviewer, we appreciate your comment. We corrected it (Page 1, Line 36).

Intro

Response: We thank the reviewer for the critique regarding the narrative flow. We have performed a reorganization of the Introduction to transition from a thematic list to a cohesive narrative. Specifically, we reordered the paragraphs to follow a logical 'funnel' approach:

Establishing the global impact of tinnitus on cognitive resources;
Theoretically linking this impact to inhibitory control and executive functions;
Identifying the gap in current assessment methods.

To ensure these sections are well-integrated, we included a new transitional paragraph (Page 2, Lines 57-63) that explicitly bridges the gap between auditory symptoms and the specific cognitive paradigms used in this study. This structural shift ensures that the Introduction now leads logically and directly to our research objectives.

Comment 5, line 96: More than what?

Response: We thank the reviewer for pointing out this lack of clarity. We have revised the sentence to explicitly state the comparison. The text now specifies that the adapted procedure is more sensitive for assessing inhibitory control in tinnitus patients compared to earlier versions of the Stroop task. This clarification has been added to Page 3, Lines 102-105.

First stage

Response: We thank the reviewer for this constructive suggestion to improve the readability of the 'First Stage' of our study. We agree that subheadings help the reader navigate the different technical aspects of stimulus preparation, equipment, and experimental execution. As recommended, we have restructured this section by adding subheadings on Pages 4 to 8, Lines 164-263.

Comment 7: The sentence that begins in line 145. I do not agree that the authors were refining the AV-Stroop paradigm, they were inventing a new one.

Response: We appreciate the reviewer’s perspective regarding the novelty of our approach. While we are encouraged by the suggestion that this represents a 'new' paradigm, we have intentionally used the term 'adaptation' out of respect for the foundational work of Stroop (1935). In the field of cognitive psychology, the Stroop effect refers to the specific mechanism of interference between conflicting stimulus dimensions. Since our test utilizes this core mechanism—applied to a cross-modal (auditory-visual) context—we believe 'adaptation' is the most technically accurate term to describe the evolution of the original paradigm for this specific clinical population. We have revised the sentence in lines 160-163 (Page 4) to better reflect this balance between novelty and theoretical heritage.

Comment 8, line 147: What is clinical data? And since clinical data was not collected, was some other kind of data collected?

Response: We thank the reviewer for identifying this ambiguity. We clarify that in the 'First Stage' of the study, no clinical data (i.e., psychometric or performance data from a patient sample) were collected. This stage was strictly dedicated to the development and technical adaptation of the instrument. The outcomes of this stage were the standardized training and test tracks themselves. To avoid confusion, we have replaced 'clinical data' with 'patient performance data' and added a clarifying sentence on Page 4, Lines 157-162 to specify that this phase focused solely on the technical construction of the stimuli.

Comment 9, line 157: In this line the authors wrote that the "presentation format" was sent to left ear, right ear and both ears (bilateral). But what is a "presentation format"?

Response: We thank the reviewer for identifying this imprecise terminology. By 'presentation format,' we intended to refer to the spatial localization of the acoustic stimuli. To clarify this, we have replaced the phrase with 'acoustic presentation modes' (Page 5, Line 173).

Comment 10, lines 159 – 161: This "computerized test material"? What did that consist of? Please explain.

Response: We thank the reviewer for the request for clarification. The term 'computerized test material' refers specifically to the digital multimedia files developed during the First Stage of the study. This includes the complete set of synchronized auditory and visual tracks—comprising the training tracks and the experimental test tracks (White Noise, Narrow Band, and Pure Tone)—formatted for digital presentation. We have revised the text on Page 5, Lines 177-179 to explicitly define this material.

Comment 11, line 168: I don't think that the participants were conditioned, but they became familiar with what was the target and what was the distractor.

Response: We appreciate the reviewer’s comment regarding the terminology used in line 168. While the task was designed to be highly intuitive, we agree that 'familiarization' more accurately describes the cognitive process occurring during the training phase. Our goal was to ensure participants were fully habituated to the target stimuli and distractors before the experimental tracks began. We have replaced the term 'conditioned' with 'familiarized and habituated' to better align with standard cognitive psychology nomenclature (Page 5, Lines 191, 203, and 204).

Comment 12, lines 164 – How often were the stimuli presented in the condition described in Figure 2, Panel 1?

Response: We thank the reviewer for this question. In the condition described in Figure 2, Panel 1, the stimuli were presented only one time per trial to establish the initial baseline for stimulus recognition. We have updated the manuscript on Page 5, Lines 194-196 to explicitly state that this was a single presentation to ensure the methodology is fully reproducible.

Response: We apologize for the lack of clarity in these lines. The reviewer is correct that the phrasing was repetitive. The first sentence refers to a preliminary check of visual discrimination, while the second refers to the specific task instruction for the trial. We have removed the word 'Afterwards' and rephrased the section to clarify the sequence (Page 5, Lines 185-190).

Response: We apologize if the description of the training phase led to any confusion. We would like to clarify that the total number of stimuli in the training tracks is 21 presentation trials, as detailed in the Results section (and summarized below). The 21 trials consist of:

1 initial identification slide (visual verification);
1 specific task instruction slide;
1 sound stimulus presented binaurally (target visual stimulus presented in the center of two distracting stimuli);
18 congruent auditory-visual presentation trials (distributed across left and right channels).

We reviewed the text to make it clear that the condition “Sound stimulus presented binaurally (target visual stimulus presented in the center of two distracting stimuli)” was presented once to each participant (Page 5, Lines 185-203).

Comment 15, lines 170 and 171: According to these lines, the sounds were presented to both ears but not mentioned how often.

Response: We clarify that the binaural (both ears) presentation occurred one time during the specific training phase described in the Results section, and we inserted this information in the Methods section (Page 5, Lines 194-196).

Response: We sincerely apologize for this oversight and for the inconsistency between our previous response and the manuscript text. We understand that this discrepancy makes the review process more difficult, and we thank the reviewer for their patience.

We have now performed a rigorous, word-by-word audit of the entire manuscript to ensure that the term 'trials' is used consistently.

Comment 17, lines 176 – 181: I repeat from my last report (e.g. comment 19). The strongest connection must have been between stimulus type (circle) but not to location of the stimulus.

Response: We appreciate the reviewer’s comment regarding the terminology used in lines 176-181. While the task was designed to be highly intuitive, we agree that 'familiarization' more accurately describes the cognitive process occurring during the training phase. Our goal was to ensure participants were fully habituated to the target stimuli and distractors before the experimental tracks began. We have replaced the term 'conditioned' with 'familiarized and habituated' to better align with standard cognitive psychology nomenclature (Page 5, Lines 191, 203, and 204).

Comment 18, line 179: In this line the authors wrote: "This allowed for implicit association of stimuli", association to what?

Response: We thank the reviewer for identifying this ambiguity. We have clarified the sentence to specify the elements being associated. The text now explains that this phase allowed for the implicit association between the auditory stimuli and the visual target (the circle). This familiarization ensured that the participants could link the sound to the correct response automatically before the introduction of conflicting (incongruent) stimuli. We have updated Lines 208-212 (Page 5) accordingly.

Response: We respectfully disagree with the suggestion that the association is independent of spatial location. The training phase was specifically designed to establish a congruent spatial-auditory association. By pairing the sound location (left, right, or center) with the corresponding visual target location, we ensured that participants developed a strong expectation of spatial congruency. This is a critical prerequisite for the experimental phase, where we then introduce incongruent spatial locations to measure the resulting interference.

Regarding the frequency of these presentations, as detailed in our previous responses to ensure manuscript-wide congruency: the training consists of 21 total trials, including 1 instruction, 1 visual discrimination, 8 left-side pairs, 8 right-side pairs, and 1 binaural/center pair. We have updated Page 5, Lines 185-212 to explicitly state these totals, providing full transparency for the reader regarding the spatial distribution of the stimuli. Additionally, we included a heading with the technical specification of the AV-Stroop test (Pages 8 and 9).

Response: We sincerely apologize for this oversight. To resolve this, we have performed a comprehensive, word-by-word audit to ensure the term 'test track' is explicitly defined and used consistently. As requested, we have added a formal definition on Page 4, Lines 152-156.

Comment 21, line 200: What is a test track? Despite my previous comment no 23 and the authors' response to it, it is still not clear what a test track is. Please explain.

Response: Dear reviewer, we thank you for your careful revision and guidance. As mentioned before, we included a formal definition in Methods section (Page 4, Lines 152-156): “The experimental phase utilized three distinct test tracks—specifically the White Noise (WN), Narrow-Band (NB), and Pure Tone (PT) tracks. Each test track consists of the synchronized auditory and visual stimuli developed and standardized during the First Stage of this study.”

Response: Dear reviewer, thanks for your comment. We corrected it (Page 7, Line 250).

Comment 23, line 236: What are "standard clinical ranges"? Please explain.

Response: We thank the reviewer for the opportunity to clarify this terminology. The term 'standard clinical ranges' refers to the frequency spectrum of 0.25 to 8 kHz, which corresponds to the standard range used in diagnostic pure-tone audiometry to evaluate human hearing sensitivity. We utilized this range to ensure that the auditory stimuli in the PT (Pure Tone) and NB (Narrow-Band) conditions were representative of the sounds encountered in clinical audiological assessments and are relevant to the speech-frequency spectrum. We have revised Line 236 to explicitly state these frequency boundaries and their clinical context (Page 8, Lines 269-273).

Response: We understand the reviewer’s concern regarding the empirical support for the training phase duration. The determination that 21 series (including 18 congruent trials) were sufficient was based on an internal pilot trial (n=5) conducted during the instrument's development. In this pilot, we observed that participants achieved 100% accuracy and verbalized full understanding of the task within the first 10–15 trials. Furthermore, during the main study, no participants requested additional instructions or reported confusion after the training phase, suggesting that the task was sufficiently intuitive and self-explanatory. We have revised the text to mention that the training duration was determined through a pilot phase to ensure procedural clarity (Page 8, Lines 274-278).

Comment 25, lines 242 and 243: In what context are frequencies in the range 0.25 to 8 kHz standardised? Please, explain.

Response: Dear reviewer, thanks for your comment. As mentioned before, in our response to Comment 23, these frequencies were chosen because they correspond to the standard range used in diagnostic pure-tone audiometry to evaluate human hearing sensitivity.

Comment 26, lines 244 and 245: How did the display, per se, present stimuli? And what is a uniform monitor (left and right)?

Response: We thank the reviewer for identifying these points of confusion. To clarify, the stimuli were delivered via a Microsoft PowerPoint presentation displayed on a single, high-resolution monitor. The term 'uniform' was intended to describe the consistent visual background of the slides.

Regarding the 'left and right' placement, this refers to the spatial positioning of the visual markers (circle and square) on the screen, which correspond to the lateralization of the auditory stimuli. We have revised the manuscript to explicitly state that the task was delivered via presentation software and to clarify that the 'left and right' refers to the on-screen stimulus locations (Lines 293–297, Page 9).

Comment 27, lines 246 – 250: This description clearly shows that the tasks were cueing tasks with valid and invalid cues. However, I just leave this to the future readers.

Response: We appreciate the reviewer’s observation; however, we respectfully disagree that this is a simple cueing task. In a spatial cueing paradigm, a stimulus typically indicates where a target might appear. In our study, the auditory and visual stimuli are part of a multi-modal stimulus event where the location of the sound acts as a distractor or a facilitator.

The conflict between congruent (aligned) and incongruent (misaligned) spatial information is a classic measure of interference and inhibitory control, directly analogous to the Stroop effect. Specifically, the participant must inhibit the spatial information provided by the auditory channel to process the visual target. We have refined the description to clarify that the 'mismatch' between sound and target location is the mechanism used to induce cognitive interference, rather than a predictive cue (Page 9, Lines 303-305).

Response: Dear reviewer, thanks for your comment. We corrected it (Page 9, Line 306).

Response: We apologize for the difficulty in locating the statement regarding the observational nature of the First Stage evaluation. To address this, we have now explicitly added this information on Page 8, Lines 264-285. We provide the following observational data from the internal pilot (n=5):

Comprehension: 100% of pilot participants successfully identified the target and followed the instructions within the first trials.
Clarity: No participants reported ambiguity regarding stimulus discrimination (e.g., distinguishing between the circle and square).
Protocol Adjustment: Based on these observations, the training track was set to 21 series to ensure a margin of safety for task fluency and to prevent fatigue.

By providing these specific observational outcomes, we aim to offer the objective evidence required to support our conclusion that the training phase was sufficient.

Comment 30: The authors response to my previous comment no. 34: I was unable to find this in the revised manuscript.

Response: We sincerely apologize for this oversight. To resolve this, we have inserted this information on Page 8, Lines 264-285, as mentioned in our previous response.

Response: We respectfully disagree with the suggestion to remove the First Stage section. We fully agree with the reviewer that replication is the cornerstone of science, and it is precisely for this reason that we believe the First Stage must be preserved.

The First Stage is not merely a preliminary exercise; it represents the instrument validation process. Removing it would leave future researchers without the necessary context of how the stimuli were standardized, how the synchronization between auditory and visual components was achieved, and why specific parameters (duration, frequency, and spatiality) were selected.

To address the reviewer’s concerns regarding 'limited value,' we have significantly streamlined this section to focus strictly on the technical specifications and the observational validation (n=5) mentioned in previous responses (Page 8, Lines 263-285). This ensures that the 'foundation' of the experiment is documented in sufficient detail to allow for exact replication, while maintaining Figure 2 as the primary visual guide for the procedure.

Second Stage

Comment 32: My previous comment no. 36. The authors have not responded adequately to this comment.

Response: We respectfully disagree that the structure has not been adequately explained, as we consider the technical details provided in Stage 1 to be the essential foundation of the instrument. However, we acknowledge that a more explicit 'structural summary' may assist the reader in transitioning between stages.

To ensure there is no ambiguity, we have organized the structural specifications of the AV-Stroop test into a clear technical summary at the end of the First Stage (Pages 8 and 9, Lines 287-313). This summary explicitly links the development in Stage 1 to the applications in Stages 2 and 3, confirming that the trial counts, congruent/incongruent ratios, and timing parameters remain constant throughout the study. By pointing directly to these parameters, we believe the transition to the subsequent stages is now fully supported by the foundational data.

Comment 33, lines 260 – 262: If tinnitus can affect cognitive processing (which I do not doubt), the participants with tinnitus might have affected the results from the Second Stage.

Response: We appreciate the reviewer’s point; however, we would like to clarify that the presence of tinnitus does not confound the results of the Second Stage due to the within-subjects (repeated measures) design employed in this study.

In our analysis, each participant’s performance was compared directly against their own performance across the different experimental conditions. Because each individual served as their own control, any baseline cognitive interference or processing delay potentially caused by tinnitus was a constant factor across all conditions for that specific participant. We have clarified this methodological point in Page 9, Lines 322-325 to ensure the robustness of the internal comparison is clear to the reader.

Third Stage

Response: We sincerely apologize for the continued lack of clarity regarding the term 'clinical history.' We understand that 'clinical history' is too broad for a multidisciplinary audience.

We have completely removed the term from the manuscript and replaced it with a specific description of the medical and audiological data collected. We now explicitly state that the Study Group completed a structured interview regarding tinnitus characteristics and general health comorbidities. We have ensured that this specific terminology is used in Lines 376-378 (Page 11) and throughout the text to ensure the study is transparent and replicable for all readers of Brain Sciences.

Results - First Stage

Comment 35, lines 393 – 396: The authors still do not explain how the test tracks were validated. Please explain.

Response: We clarify that the validation of the test tracks in the First Stage was based on qualitative and behavioral criteria obtained during the protocol observational validation (n=5).

The tracks were considered 'validated' when all pilot participants demonstrated:

100% Identification Accuracy: The ability to distinguish between the circle and square markers and the left/right auditory stimuli without error.
Procedural Consistency: Completion of the 21-series training track without requesting additional instructions.
Perceptual Clarity: Verbal confirmation that the auditory stimuli (WN, NB, PT) were clearly audible and spatially distinct.

We have revised and included Lines 462-468 (Page 13) to explicitly define these validation criteria, ensuring that the transition from stimulus development to experimental application is transparent and based on these objective behavioral markers.

Results - Second Stage

Response: We thank the reviewer for this suggestion to improve the accessibility of our data. We have updated Lines 400–401 to include the score range of the Montreal Cognitive Assessment (MoCA), which ranges from 0 to 30 points. We have also specified that a score of 26 or higher is generally considered the threshold for normal cognitive function, while scores below this cutoff may indicate mild cognitive impairment (Page 14, Lines 475-478). This addition provides the necessary context for interpreting the cognitive profile of our study population.

Comment 36, line 406: Although the Stroop scores were not associated with the MoCA scores, please report the p-value.

Response: Dear reviewer, thanks for your comment. We inserted this information (Page 14, Lines 480, 481, 491, and 492).

Results - Third Stage

Response: We apologize for the lack of clarity in the phrasing 'distribution of participants.' To clarify, we were referring to the frequency distribution of participants categorized by their error rates within each group. We have rephrased this sentence to explicitly define these categories and the nature of the distribution (Page 16, Lines 531-538).

Response: We thank the reviewer for identifying this ambiguity and the inconsistency between the text and Table 10. We have corrected the definition to ensure mathematical precision and consistency across the manuscript.

The threshold used for the categorical analysis was indeed < 3 errors (lower group) and > 3 errors (upper group). We have replaced the descriptive text in Lines 531-532 (Page 16) with mathematical symbols as suggested (< 3 and > 3) to eliminate any ambiguity. We have also audited Table 9 to ensure the data presented strictly follows this classification. We agree that this change improves the clarity and technical accuracy of the results.

Comment 39, Table 8: Please, keep the number of decimal places constant.

Response: Dear reviewer, thanks for your comment. We corrected it (Page 16, Table 9).

Response: We have updated the caption for Table 10 and reformatted the p-value column to clarify that the statistical significance refers to the comparison of the overall distribution (Study vs. Control Group) across both error categories (< 3 and > 3) simultaneously.

Response: We agree with the reviewer that the low error rates in the AV-Stroop task can indicate a ceiling effect. We have added a paragraph to the Discussion section (Page 22, Lines 784-790) addressing this. We explain that the simplicity of the task was intentional to ensure the test remained accessible to clinical populations with varying levels of distress. We argue that the statistical significance achieved—despite this ceiling effect—actually emphasizes the strength of the interference caused by those specific acoustic stimuli.

Response: We apologize for the lack of clarity regarding the within-group comparison. We have rephrased this paragraph to explicitly state that we compared the number of errors across the four experimental conditions within the Study Group (Page 17, Lines 553-558, and caption of the Table 11).

Response: We apologize that the presentation of Table 11 (previous Table 10) was not sufficiently clear. The 'Tinnitus Pitch track' refers to the additional experimental condition where the acoustic stimulus was customized to match each participant's specific tinnitus perception, as determined during the pitch-matching assessment described in the Methods. To resolve the reviewer's concerns, we have:

Revised the Caption (Page 17, Lines 560-563): It now explicitly states that this is a within-group comparison of the Study Group across four conditions (WN, NB, PT, and the Tinnitus Pitch match).
Clarified the p-values: We have specified that the p-values refer to the Friedman Test used to compare error rates across all tracks.
Standardized the Effect Size: We have clarified that Kendall’s W is reported as the effect size for the overall comparison.

Fixed Formatting: We have corrected the spacing issues in the Results column.
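
As an illustration of how the Friedman statistic and Kendall's W named above relate to each other, here is a minimal pure-Python sketch. It assumes one row of error counts per participant and one column per track, applies the usual tie correction, and uses the relation W = Q / (n(k - 1)). The function names and the sample data are hypothetical and do not reproduce the authors' analysis or data; the p-value is omitted because it needs a chi-square distribution function.

```python
from collections import Counter


def average_ranks(values):
    """Rank values from 1..k, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman_kendall_w(rows):
    """Return (tie-corrected Friedman Q, Kendall's W) for an n x k table."""
    n, k = len(rows), len(rows[0])
    rank_sums = [0.0] * k
    tie_term = 0
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        tie_term += sum(c**3 - c for c in Counter(row).values())
    q = 12 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3 * n * (k + 1)
    correction = 1 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("every participant has identical values in all tracks")
    q /= correction
    return q, q / (n * (k - 1))


# Hypothetical error counts: 3 participants x 4 tracks (WN, NB, PT, TP).
q_stat, kendall_w = friedman_kendall_w([[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3]])
print(q_stat, kendall_w)
```

In this toy table every participant orders the four tracks identically, so W reaches its maximum of 1; with real, heavily tied error counts it would be much lower.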

Response: We have followed the reviewer’s recommendation and replaced 'x' with 'vs.' throughout Table 12 (previously titled Table 11) (Pages 17 and 18) to clarify the pairwise comparisons. Regarding the data reported, the differences between conditions are reflected in the provided statistical values. We have opted to report the effect size (Rank-Biserial Correlation) only for comparisons that reached statistical significance, as this is a standard practice to maintain focus on the most relevant findings.

Finally, we have reviewed the Legend. We believe the information provided is essential for the table to be interpreted independently of the main text, ensuring clarity for a multidisciplinary audience.

Response: We agree that p-values alone do not fully describe the importance of the findings. This is precisely why we reported the effect size (Rank-Biserial Correlation) for all significant results in Table 12; the effect size serves as the objective measure of the magnitude of the differences, independent of p-values.

Regarding the 'trends' and the 'caution' required due to low error rates (possible ceiling effects), we have already integrated a detailed analysis of these limitations into the Discussion section (Page 22, Lines 777-792) as per the reviewer's previous suggestions. We believe that by providing the effect sizes in Table 12 and the critical interpretation in the Discussion, the results are now presented with the necessary statistical and clinical context.
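
To illustrate the distinction at stake in Comments 44 and 45, the sketch below reports the size of a paired difference alongside an effect size. It assumes paired error counts from the same participants in two tracks and uses the matched-pairs rank-biserial correlation (the form that accompanies a Wilcoxon signed-rank test); the source does not state which pairwise test was used, so this is an assumption, and the function names and numbers are hypothetical.

```python
from statistics import median


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def paired_summary(a, b):
    """Return (median of a - b, matched-pairs rank-biserial correlation)."""
    diffs = [x - y for x, y in zip(a, b)]
    nonzero = [d for d in diffs if d != 0]  # zero differences carry no sign
    if not nonzero:
        return median(diffs), 0.0
    ranks = average_ranks([abs(d) for d in nonzero])
    w_plus = sum(r for r, d in zip(ranks, nonzero) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, nonzero) if d < 0)
    return median(diffs), (w_plus - w_minus) / (w_plus + w_minus)


# Hypothetical error counts for four participants in two tracks.
wn_errors = [2, 3, 1, 4]
nb_errors = [1, 1, 1, 2]
median_diff, rank_biserial = paired_summary(wn_errors, nb_errors)
print(median_diff, rank_biserial)
```

The two numbers answer different questions: the median difference is expressed in errors, while the rank-biserial correlation only describes how consistently one track exceeds the other.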

Discussion

Response: We apologize for any perceived oversight and appreciate the reviewer’s precision regarding the nature of the learning process.

To resolve this, we have performed a final audit of the manuscript (specifically Page 18, Lines 605-606) to ensure that all references to 'conditioning' have been removed and replaced with 'familiarization' or 'habituation.' Furthermore, we have explicitly clarified that this phase serves to familiarize the participant with the pairing of sound and stimulus type (e.g., sound-circle associations), as the reviewer correctly noted. We believe this shift in terminology aligns the text with the reviewer's observation that this stage establishes a procedural baseline rather than behavioral conditioning.

Response: We appreciate the reviewer’s request for objective support regarding the First Stage. We have removed the claim of 'automatic responses' and replaced it with a data-driven report of participant performance during the Protocol Observational Validation (n=5).

We have updated the Discussion (Page 18, Lines 610–614) to include these specific performance markers, providing the 'data-driven' evidence that the brief training period was sufficient for participants to master the task requirements before entering the experimental tracks.

Response: We appreciate the reviewer’s attention to the bibliography. We would like to clarify that the previous Reference 22 was excluded and that we maintained the reference to the study by Araneda et al. (2018) (in this version, Reference 16), titled 'A key role of the prefrontal cortex in the maintenance of chronic tinnitus: An fMRI study using a Stroop task' (NeuroImage: Clinical).

Unlike the author's previous 2015 work, this specific 2018 publication explicitly utilizes a Stroop paradigm to investigate inhibitory control in tinnitus patients. Therefore, we believe the citation is technically accurate and directly relevant to the current discussion of Stroop interference. We have verified that the citation in the manuscript correctly points to the 2018 fMRI study to avoid any confusion with the author's other publications.

Comment 49, line 651: In this line the authors wrote: "only the Study group made more than three errors per test." As I pointed out in my previous comment no. 76, participants in both groups made errors in the WN, NB, and TP tasks. The statement is clearly wrong and must be corrected! This is one more example of the authors stating in the cover letter that they had modified the manuscript in accordance with my comment when they had not.

Response: We apologize for the confusion, but we must clarify that our statement is factually accurate based on the raw data presented in Table 9. While we agree with the reviewer that both groups made errors, the statement specifically refers to the threshold of >3 errors. As shown in Table 9, the maximum number of errors made by any participant in the Control Group across all tracks was 3, whereas participants in the Study Group reached up to 10 errors. Therefore, it remains a mathematical fact that only the Study Group contained individuals who exceeded the 3-error threshold. To prevent any further misunderstanding, we have revised Lines 745-746 (Page 21) to be more explicit: 'While both groups committed errors, only participants in the Study Group exceeded the threshold of three errors per test track.' We believe this clarifies that we are acknowledging errors in both groups while highlighting the higher error density found exclusively in the Study Group.

We appreciate your guidance. We believe these revisions have significantly strengthened the technical precision of the manuscript.

It is an honor to submit the revised version of our manuscript to Brain Sciences.

Thank you for considering our revised manuscript.

Sincerely,

Authors

Final Recommendation (Second Round Review)
