Open Access Data Descriptor

Peer-Review Record

SparrKULee: A Speech-Evoked Auditory Response Repository from KU Leuven, Containing the EEG of 85 Participants

Data 2024, 9(8), 94; https://doi.org/10.3390/data9080094

by Bernd Accou^1,2,*,†, Lies Bollens^1,2,*,†, Marlies Gillis^1, Wendy Verheijen^1, Hugo Van hamme^2 and Tom Francart^1,*

Reviewer 1: Malcolm Slaney

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 10 June 2024 / Revised: 5 July 2024 / Accepted: 16 July 2024 / Published: 26 July 2024

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

I'm glad for this paper and the data that it describes. Thank you.

Mostly minor presentation comments follow.

Please use active voice as much as possible. The first sentence of the paper would be much easier to read if it said: In order to study... studies (or experimentalists) present natural... while recording the EEG...

Last sentence of section 1. If you are going to mention this negative (challenges), what are you doing about it? Perhaps you can argue that your new, larger database will make it less necessary to combine studies to get enough data?

Lines 43-44. I'm glad you mention the validation studies/citations. Are these only using the public data? The paper should be clear about what can be replicated based on just this paper's data.

I'm glad for the precise definitions of trial, session.

Line 101: just to be super clear, I'd say All audio stimuli..

and in Line 107, I'd say full-spectrum white noise (since high-frequency noise, i.e. above 20 kHz, doesn't matter).

For the EEG processing, what LPF frequencies were used? This is important!

Line 246: Are these integration windows the same as the time lags needed in line 241? I think this is the case, but you should use the same terminology to not confuse novice readers.

Lines 293 and elsewhere: Please don't use a reference number as the object of a sentence. I don't know who 10 is. If it is important enough to be part of the sentence then specify it in some way that helps readers who have familiarity in the area (by name, lab or title).

Line 325: Can you compare the results in figure 5 and 9? Since you bring up both, you should compare them, somehow. Otherwise, just show one type of analysis. But I'd prefer the analysis (and perhaps a citation for the more complete comparison.)

Line 306: Remove extra (

Line 391: Can you be more explicit about how and where to request permission? Specific email or address. If not in the paper, state where in the online material (README) it can be found.

Line 393: Can you be more specific about which part is restricted?

Thank you for providing this data to the community!

Comments on the Quality of English Language

Perfect.

Author Response

(Replies to the comments are marked in italic)

I'm glad for this paper and the data that it describes. Thank you.

Mostly minor presentation comments follow.

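Please use active voice as much as possible. The first sentence of the paper would be much easier to read if it said: In order to study... studies (or experimentalists) present natural... while recording the EEG...

Thank you for this suggestion. We have modified the first sentence and changed to active voice wherever appropriate.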

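Last sentence of section 1. If you are going to mention this negative (challenges), what are you doing about it? Perhaps you can argue that your new, larger database will make it less necessary to combine studies to get enough data?

This is indeed a very good point. We have added your suggestion to the last line of the first section as follows: “SparrKULee provides a large homogenous dataset to train and evaluate models effectively, alleviating the need for combining multiple heterogeneous datasets”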

Lines 43-44. I'm glad you mention the validation studies/citations. Are these only using the public data? The paper should be clear about what can be replicated based on just this paper's data.

The cited studies refer to the techniques (i.e. models) used to validate SparrKULee. We have added a note for clarity: “(note that the cited studies used different, smaller datasets)”

I'm glad for the precise definitions of trial, session.

Line 101: just to be super clear, I'd say All audio stimuli..

We agree that this is more clear and have adapted the manuscript accordingly.

and in Line 107, I'd say full-spectrum white noise (since high-frequency noise, i.e. above 20 kHz, doesn't matter).

We appreciate the comment, but we filtered the white noise to more closely resemble the speech spectrum. Therefore, we believe the term speech-weighted noise is warranted.
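The reply above does not say how the noise was filtered. As a minimal sketch of one common approach, assuming the noise is shaped in the frequency domain by the magnitude spectrum of a reference speech signal, white noise can be turned into speech-weighted noise as follows. The function name `speech_weighted_noise` and the unit-RMS normalisation are illustrative choices, not taken from the manuscript.

```python
import numpy as np

def speech_weighted_noise(speech, n_samples, seed=0):
    """Shape white noise so its magnitude spectrum follows that of `speech`.

    Hypothetical helper: the manuscript's actual filtering procedure is not
    described in this review record.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n_samples)
    # Magnitude spectrum of the speech on the FFT grid of the noise
    # (rfft with n=n_samples zero-pads or truncates the speech as needed).
    speech_mag = np.abs(np.fft.rfft(speech, n=n_samples))
    # Keep the random phase of the noise, impose the speech magnitude.
    shaped = np.fft.irfft(np.fft.rfft(noise) * speech_mag, n=n_samples)
    # Normalise to unit RMS so the level can be set separately.
    return shaped / np.sqrt(np.mean(shaped ** 2))
```

Because only the magnitude is imposed and the phase stays random, the result keeps the long-term spectral balance of speech without any of its temporal structure.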

For the EEG processing, what LPF frequencies were used? This is important!

Thank you for your comment; we agree that this is important information. We applied no low-pass filtering other than the anti-aliasing filter that scipy.signal.resample_poly applies before downsampling, so we added the following to the manuscript: “The bandwidth of the resulting signal is therefore 0.5-32Hz, as no low-pass filtering was performed (excluding anti-aliasing filtering as done by scipy.signal.resample_poly)”

Line 246: Are these integration windows the same as the time lags needed in line 241? I think this is the case, but you should use the same terminology to not confuse novice readers.

Thank you for the recommendation, it is indeed more clear. We have changed the manuscript accordingly.
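For readers new to the terminology discussed in this comment, the integration window of a linear backward model is the set of EEG time lags used to reconstruct each stimulus sample. The following is a minimal sketch of such a model using ridge regression; the helper names, the ridge parameter and the zero-padding at the end of the recording are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def lag_matrix(eeg, n_lags):
    """Stack each EEG sample with the n_lags - 1 samples that follow it.

    eeg: (n_samples, n_channels) -> (n_samples, n_channels * n_lags).
    The end of the recording is zero-padded (an illustrative choice).
    """
    n_samples, n_channels = eeg.shape
    padded = np.vstack([eeg, np.zeros((n_lags - 1, n_channels))])
    return np.hstack([padded[lag:lag + n_samples] for lag in range(n_lags)])

def fit_backward_model(eeg, envelope, n_lags, ridge=1.0):
    """Ridge-regularised least squares from lagged EEG to the envelope."""
    X = lag_matrix(eeg, n_lags)
    XtX = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(XtX, X.T @ envelope)

def reconstruct(eeg, weights, n_lags):
    """Apply a fitted backward model to (new) EEG."""
    return lag_matrix(eeg, n_lags) @ weights

def pearson(a, b):
    """Pearson correlation between two 1-D signals."""
    return float(np.corrcoef(a, b)[0, 1])
```

Here `n_lags` samples at the EEG sampling rate define the integration window; for example, 16 lags at 64 Hz would span 250 ms.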

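Lines 293 and elsewhere: Please don't use a reference number as the object of a sentence. I don't know who 10 is. If it is important enough to be part of the sentence then specify it in some way that helps readers who have familiarity in the area (by name, lab or title).

Thank you for the suggestion; we have adapted the manuscript accordingly.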

Line 325: Can you compare the results in figure 5 and 9? Since you bring up both, you should compare them, somehow. Otherwise, just show one type of analysis. But I'd prefer the analysis (and perhaps a citation for the more complete comparison.)

Thank you for the recommendation. Both types of analysis (regression and match/mismatch) are currently widely used in the research community. Comparing both is still an open research question which (as far as we know) has not been solved.
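To make the distinction between the two analyses concrete: a regression analysis scores a reconstruction by its correlation with the true envelope, whereas a match/mismatch analysis asks which of two candidate stimulus segments belongs to an EEG segment. The sketch below is a deliberately simple correlation-based variant and is not the model used in the manuscript; the function name, the segment length and the shift used to pick the mismatched segment are all assumptions.

```python
import numpy as np

def match_mismatch_accuracy(reconstructed, envelope, seg_len, shift):
    """Fraction of segments in which the time-aligned envelope segment
    correlates better with the reconstruction than a segment taken
    `shift` samples later (the assumed 'mismatched' candidate)."""
    correct, total = 0, 0
    last_start = len(envelope) - seg_len - shift
    for start in range(0, last_start + 1, seg_len):
        rec = reconstructed[start:start + seg_len]
        matched = envelope[start:start + seg_len]
        mismatched = envelope[start + shift:start + shift + seg_len]
        r_match = np.corrcoef(rec, matched)[0, 1]
        r_mismatch = np.corrcoef(rec, mismatched)[0, 1]
        correct += int(r_match > r_mismatch)
        total += 1
    return correct / total
```

Chance level for this two-alternative decision is 50%, which gives the accuracy a more direct interpretation than a raw correlation value, although the two measures are not straightforward to compare, as the reply notes.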

Line 306: Remove extra (

We have changed the manuscript accordingly.

Line 391: Can you be more explicit about how and where to request permission? Specific email or address. If not in the paper, state where in the online material (README) it can be found.

Thank you for the suggestion. We have clarified the following part: “Per the request of 11 participants, access to their data is restricted (107 EEG recordings in total). Readers can request access by mailing the corresponding authors at sparrkulee@kuleuven.be, stating what they want to use the data for. Access will be granted to non-commercial users, complying with the CC-BY-NC-4.0 license.”

Line 393: Can you be more specific about which part is restricted?

We have added the following to the manuscript: “Per the request of 11 participants, access to their data is restricted (107 EEG recordings in total).”

Thank you for providing this data to the community!

Reviewer 2 Report

Comments and Suggestions for Authors

The authors present an extensive EEG dataset of a large number of participants listening to different types of speech and video stimuli in quiet as well as in background noise. The main purpose of the paper is to describe this publicly available dataset and to provide some analysis regarding speech tracking that can be used by other researchers as a benchmark. To this end, the authors analyze the neural tracking of the speech envelope in different frequency bands as well as how it can be used to solve a match-mismatch task.

The dataset significantly adds to other, smaller speech-EEG datasets that are already currently available. It will be useful to researchers that seek to develop finer models of speech processing and decoding. The dataset is well described and the subsequent analysis is clearly presented. I only have a few comments that I would like the authors to address.

1. The uninterrupted trials were rather long, 15 minutes. I could easily imagine that participants' focus occasionally drifts away. Can the authors comment on this issue? Related to that, it would be important to see the evaluation of the comprehension questions; please show that.

2. Some of the stimuli are podcasts with videos. I have looked into some of these videos, and they contain parts where the speaker's face is well visible. It is known that lip movements are related to the speech envelope and are tracked by neural activity as well. The authors should comment on whether they believe that this effect will significantly impact the corresponding data or not (and if not, why).

3. The authors write that the EEG recordings were sampled at 8192 Hz. Please also describe the hardware filters that were applied in the EEG amplifier before digitization.

4. Is the pre-processed EEG data (line 367) already time-aligned to the speech stimuli?

5. Figure 8 shows an asymmetric pattern for the delta-band tracking, with high correlation values in the right temporal and the left occipital areas. To the best of my knowledge, such an asymmetric distribution has not been reported before; it might stem from the large amount of data that the authors had available. Can you comment on this issue as well as on the possible origins of this asymmetry?

Comments on the Quality of English Language

There is occasionally sloppy writing, e.g.

line 183 "to the Biosemi" should be "to the EEG amplifier"

line 185 "is inserted" should be "was inserted"

Author Response

(Replies to the comments are marked in italic)

The authors present an extensive EEG dataset of a large number of participants listening to different types of speech and video stimuli in quiet as well as in background noise. The main purpose of the paper is to describe this publicly available dataset and to provide some analysis regarding speech tracking that can be used by other researchers as a benchmark. To this end, the authors analyze the neural tracking of the speech envelope in different frequency bands as well as how it can be used to solve a match-mismatch task.

The dataset significantly adds to other, smaller speech-EEG datasets that are already currently available. It will be useful to researchers that seek to develop finer models of speech processing and decoding. The dataset is well described and the subsequent analysis is clearly presented. I only have a few comments that I would like the authors to address.

The uninterrupted trials were rather long, 15 minutes. I could easily imagine that participants' focus occasionally drifts away. Can the authors comment on this issue? Related to that, it would be important to see the evaluation of the comprehension questions; please show that.

Thank you for the feedback. As the primary goal for SparrKULee was to obtain a large homogenous public dataset to train and evaluate models on, we chose to include longer trials of natural speech. You are correct that the participants’ focus might have drifted away occasionally, but we did not control for that (and we are not currently aware of a real-time absolute measure of auditory attention that is reliable).

We opted to not analyze the comprehension questions, as they were not calibrated and/or validated and merely served as external motivation for the participant to pay attention to the stimulus. We have adjusted the manuscript as follows: “As the questions were not calibrated, they merely motivated the participant to pay attention to the stimulus and no further analysis of the answers is provided”

Some of the stimuli are podcasts with videos. I have looked into some of these videos, and they contain parts where the speaker's face is well visible. It is known that lip movements are related to the speech envelope and are tracked by neural activity as well. The authors should comment on whether they believe that this effect will significantly impact the corresponding data or not (and if not, why).

This is indeed an important and valid remark. We have added the following analysis and excerpt to the manuscript:

“Additionally, the backward model performance for the podcast with and without video is compared for the delta-band in Figure 8. No significant difference was found between the podcast presented with or without video. This is in contrast to previous studies that saw better neural tracking when the face of the speaker is presented [47, 48]. A possible explanation for not finding an effect is that the face of the speaker is not continually present during the whole recording, removing the added benefit for at least some portion of the recording. Alternatively, the sample size contained within SparrKULee might be too small to find an effect across participants.”

The authors write that the EEG recordings were sampled at 8192 Hz. Please also describe the hardware filters that were applied in the EEG amplifier before digitization.

Thank you, that is indeed more clear. We have added the following to the manuscript: “Before digitization, the activeTwo system applies a 5th order cascaded integrator-comb (CIC) digital filter with cut-off frequency of 1600 Hz”
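As background to the quoted sentence, the normalised magnitude response of an N-stage CIC decimator with decimation ratio R and unit differential delay is |sin(πRf/f_in) / (R·sin(πf/f_in))|^N, where f_in is the input rate. The sketch below evaluates this for a 5th-order filter and locates its -3 dB point numerically. The decimation ratio of 64 is a placeholder, since the internal rates of the amplifier are not given in this record.

```python
import numpy as np

def cic_magnitude(freq_hz, fs_out, order=5, decimation=64):
    """Normalised magnitude response of an N-stage CIC decimator.

    `decimation` is an ASSUMED placeholder; fs_in = fs_out * decimation.
    """
    fs_in = fs_out * decimation
    x = np.pi * np.atleast_1d(np.asarray(freq_hz, dtype=float)) / fs_in
    num = np.sin(decimation * x)
    den = decimation * np.sin(x)
    # At f = 0 the ratio tends to 1, so pre-fill the output with ones.
    ratio = np.divide(num, den, out=np.ones_like(x), where=den != 0)
    return np.abs(ratio) ** order

def minus_3db_frequency(fs_out, order=5, decimation=64):
    """First frequency (on a dense grid) where the gain drops below -3 dB."""
    freqs = np.linspace(0.0, fs_out / 2, 20001)
    mag = cic_magnitude(freqs, fs_out, order, decimation)
    return float(freqs[np.argmax(mag < 1 / np.sqrt(2))])
```

For large decimation ratios the response approaches a sinc raised to the fifth power, so the -3 dB point sits at roughly one fifth of the output sampling rate whatever the exact ratio is. For an output rate of 8192 Hz this is in the region of the 1600 Hz cut-off quoted above.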

Is the pre-processed EEG data (line 367) already time-aligned to the speech stimuli?

This is indeed important: the pre-processed EEG is already time-aligned to the speech stimuli. We have adjusted the manuscript to reflect this.

Figure 8 shows an asymmetric pattern for the delta-band tracking, with high correlation values in the right temporal and the left occipital areas. To the best of my knowledge, such an asymmetric distribution has not been reported before; it might stem from the large amount of data that the authors had available. Can you comment on this issue as well as on the possible origins of this asymmetry?

While we agree that there seems to (visually) be some asymmetry in the delta-band tracking, the differences in symmetry are quite small (~0.005 Pearson correlation value), especially given the large inter-subject variability. Therefore, it seems inappropriate to draw conclusions from this.

Comments on the Quality of English Language:

There is occasionally sloppy writing, e.g.

line 183 "to the Biosemi" should be "to the EEG amplifier"

We have adjusted the manuscript accordingly.

line 185 "is inserted" should be "was inserted"

We have adjusted the manuscript accordingly.

Reviewer 3 Report

Comments and Suggestions for Authors

The paper presents the SparrKULee dataset, which is currently the largest auditory EEG dataset as far as I know. The dataset contains data from 85 young, normal-hearing subjects whose native language is Dutch. The article provides a detailed description of the experimental paradigm used in collecting the SparrKULee dataset, and provides two benchmark algorithms for the envelope reconstruction and match-mismatch tasks. I am sure that this database makes a significant contribution to the study of neural mechanisms (encoding and decoding) underlying speech perception. I think this paper should be accepted when the following issues are addressed by the authors.

1. The contents of Table 1 and Table 2 are inconsistent. When introducing SparrKULee in Table 1, it includes a speech-in-noise condition, but this condition is not presented in Table 2 or elsewhere in the manuscript. Additionally, even for the 85 normal-hearing participants in the SparrKULee dataset, the effective EEG recording time is not fixed at 110 minutes, which is confusing. Please change "time per participant" to "average time per participant" in Table 1.

2. The results shown in Figure 5 probably need some explanation and discussion. The correlation values seem higher than those commonly reported in previous studies that applied a simple linear model. In addition, a significant difference is reported between the audiobook and podcast stimuli (0.184 vs. 0.133 median Pearson correlation); why would audiobook stimuli lead to higher correlations? As this paper mainly introduces a dataset, the authors do not need to discuss these issues in depth. However, a brief discussion, or some references that offer an explanation, would be beneficial to readers and may inspire further work on this dataset.

3. As far as we know, the dataset has been used in several competitions, and several papers have already utilized it. Mentioning and citing these papers and competitions may further improve the impact of this dataset and this paper.

4. Some minor typos need to be corrected, such as:

line 18-19, " to either decode features from the speech stimulus from the EEG", change to " to either decode features of the speech stimulus from the EEG ";

line 348, "the task wasrestingState", insert a blank after "was"

Comments on the Quality of English Language

Generally, the manuscript is well written in English, except the few minor errors shown in the previous comments

Author Response

(Replies to the comments are marked in italic)

The contents of Table 1 and Table 2 are inconsistent. When introducing SparrKULee in Table 1, it includes a speech-in-noise condition, but this condition is not presented in Table 2 or elsewhere in the manuscript. Additionally, even for the 85 normal-hearing participants in the SparrKULee dataset, the effective EEG recording time is not fixed at 110 minutes, which is confusing. Please change "time per participant" to "average time per participant" in Table 1.

Thank you for the comment; it was indeed formulated in a confusing manner. We have deleted Table 2 and changed ‘time per participant’ to ‘Average time per participant’ in Table 1.

The results shown in Figure 5 probably need some explanation and discussion. The correlation values seem higher than those commonly reported in previous studies that applied a simple linear model. In addition, a significant difference is reported between the audiobook and podcast stimuli (0.184 vs. 0.133 median Pearson correlation); why would audiobook stimuli lead to higher correlations? As this paper mainly introduces a dataset, the authors do not need to discuss these issues in depth. However, a brief discussion, or some references that offer an explanation, would be beneficial to readers and may inspire further work on this dataset.

Thank you for the feedback. We agree that it would aid clarity to elaborate on the difference in correlation scores relative to previous studies and between audiobooks and podcasts. Therefore, we have added the following statements to the manuscript:

“A similar observation was made for all the proposed architectures of the 2023 Auditory EEG decoding challenge (Monesi et al., 2024), where the authors compare the performance of the models on stimulus type (audiobook vs. podcast) and observe that models perform significantly better on audiobooks than podcasts.

A possible explanation for this might be that the podcasts are recorded using compression techniques, which are employed to generate an evenly loud signal. It might be that the models fail to generalize to different compression techniques rather than to different unseen subjects.”

As far as we know, the dataset has been used in several competitions, and several papers have already utilized it. Mentioning and citing these papers and competitions may further improve the impact of this dataset and this paper.

We agree that mentioning and citing these papers would be beneficial. We have adjusted the manuscript accordingly.

Some minor typos need to be corrected, such as:

line 18-19, " to either decode features from the speech stimulus from the EEG", change to " to either decode features of the speech stimulus from the EEG ";

We have adjusted the manuscript accordingly.

line 348, "the task wasrestingState", insert a blank after "was"

We have adjusted the manuscript accordingly.
