Method for Resonant Frequency Attenuation in Dynamic Audio Equalizer
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
In this paper, a method is proposed to attenuate the resonant frequency across the entire frequency spectrum of an audio signal. By applying a fast Fourier transform to the frequency spectrum, one-third octave bands and the Equivalent Rectangular Bandwidth scale are derived. Simultaneously, a threshold curve array, which can detect the resonant frequency, is generated using the standard deviation. An IIR peak filter is then designed to process the input audio signal. The paper presents a clear logical structure, comprehensive work, and demonstrates a degree of innovation and engineering practicality. However, there are still several issues that require clarification or further attention from the authors.
- In line 251 on page 7, it is proposed that a judgment criterion of -15 dB should be used to determine whether to adopt the ERB distribution. However, the data lacks corresponding theoretical support. It is recommended that the author include relevant data or simulation experiments to substantiate this judgment criterion.
- The third section of the paper includes a detailed analysis of the experiment. However, it does not provide the specific resonant frequency of the corresponding audio data. This omission undermines the ability of the experimental results to support the conclusions drawn. It is recommended that the author enhance the third section by including the specific resonant frequency of the relevant audio data.
- In certain experiments, the images and analyses do not align well, as illustrated in Figure 11. It is recommended that the author enhance and optimize some of the experimental diagrams.
- The second section lacks specific algorithm details, particularly regarding the processing of arrays, such as threshold curves mentioned in Section 2.4. It is recommended that the author supplement and enhance this information.
- The dataset utilized in this paper comprises 20 audio files, and it is recommended that the author consider expanding the dataset.
- The font size of the coordinate axes in some images is too small, making it difficult to read clearly, as illustrated in Figures 5 and 6. It is recommended that the author make adjustments to improve readability.
Author Response
Comments 1: In this paper, a method is proposed to attenuate the resonant frequency across the entire frequency spectrum of an audio signal. By applying a fast Fourier transform to the frequency spectrum, One-third octave bands and Equivalent Rectangular Bandwidth scale are derived. Simultaneously, a threshold curve array, which can detect the resonant frequency, is generated using the standard deviation. An IIR peak filter is then designed to process the input audio signal. The paper presents a clear logical structure, comprehensive work, and demonstrates a degree of innovation and engineering practicality. However, there are still several issues that require clarification or further attention from the authors.
Response 1: We have improved the paper in accordance with the reviewer's comments.
Comments 2: 1. In line 251 on page 7, it is proposed that a judgment criterion of -15 dB should be used to determine whether to adopt the ERB distribution. However, the data lacks corresponding theoretical support. It is recommended that the author include relevant data or simulation experiments to substantiate this judgment criterion.
Response 2: Agree. We have added an explanation of how the threshold value was obtained on page 7, paragraph 6, line 255, providing details of the experimentation shown in the next paragraph. In addition, we have added Figure 4, showing the data obtained to calculate the threshold. “A threshold was determined for the choice between the third octave distribution and the ERB scale. The threshold was calculated by dividing all the audio files in the dataset into blocks of 4096 samples and obtaining the spectral flatness coefficient for each block. Finally, the average of all blocks was calculated, and the threshold was determined. The coefficient value for each block and the average value considered as threshold are shown in Figure 4. The resultant threshold was -15dB. In scenarios where the input signal exhibits an equivalent power concentration across the spectrum, i.e., where the Spectral Flatness coefficient exceeds the threshold, the distribution by thirds of octave is employed, as it exhibits a coherent bandwidth throughout the spectrum. In contrast, when the coefficient falls below the threshold, the ERB distribution is used, as it allows the bandwidth to vary throughout the spectrum.”
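For illustration, the threshold procedure quoted above can be sketched in Python. This is a minimal sketch and not the authors' MATLAB implementation: it assumes the common definition of spectral flatness (geometric mean over arithmetic mean of the power spectrum, expressed in dB), and the function names, the small `eps` guard and the pure-Python FFT are all hypothetical choices made here.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def spectral_flatness_db(block, eps=1e-12):
    """Flatness of one block in dB (assumed definition):
    10*log10(geometric mean / arithmetic mean) of the power spectrum,
    using the positive-frequency bins without DC."""
    spectrum = fft(block)
    power = [abs(c) ** 2 + eps for c in spectrum[1 : len(block) // 2]]
    log_geo_mean = sum(math.log(p) for p in power) / len(power)
    arith_mean = sum(power) / len(power)
    return 10.0 * (log_geo_mean / math.log(10.0) - math.log10(arith_mean))


def mean_flatness_db(signal, block_size=4096):
    """Average the per-block flatness over all complete blocks."""
    n_blocks = len(signal) // block_size
    if n_blocks == 0:
        raise ValueError("signal is shorter than one block")
    values = [
        spectral_flatness_db(signal[i * block_size : (i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return sum(values) / len(values)


def choose_distribution(block, threshold_db=-15.0):
    """Above the threshold: third-octave bands; otherwise: ERB scale."""
    if spectral_flatness_db(block) > threshold_db:
        return "third_octave"
    return "erb"
```

Under these assumptions, a block whose energy is spread evenly over the spectrum yields a coefficient near 0 dB and selects the third-octave distribution, whereas a single sinusoid yields a strongly negative coefficient and selects the ERB scale.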
Comments 3: 2. The third section of the paper includes a detailed analysis of the experiment. However, it does not provide the specific resonant frequency of the corresponding audio data. This omission undermines the ability of the experimental results to support the conclusions drawn. It is recommended that the author enhance the third section by including the specific resonant frequency of the relevant audio data.
Response 3: We appreciate your comment about including specific resonant frequencies in section 3. However, the approach of the present study is based on a statistical analysis by frequency bands and groups of audio files, rather than evaluating individual resonant frequencies. This method allows more general conclusions to be drawn. To clarify this method and reinforce the validity of the approach, we have added an explanation of the analysis in section 3, page 15, paragraph 4, line 418.
“The present study employs a statistical approach based on octave bands and by groups of audio files instead of analysing specific resonant frequencies in each audio. This method allows more general conclusions to be drawn about the effectiveness of the system in the proposed groups while avoiding biases caused by individual cases.”
In addition, the following paragraphs were added in section 4, on page 19, paragraph 4, line 503, and on page 20, paragraph 5, line 531, to support the validity of this analysis.
“The utilisation of octave band-based analysis enables the estimation of resonance attenuation without the necessity of identifying each specific resonant frequency. Matching the detection zones with commercial tools such as Soothe2 confirms that the algorithm adequately captures the resonance patterns.”
“Frequency band-based analysis proves to be a valid strategy to evaluate the effectiveness of the system without the need to identify individual resonant frequencies. Since energy build-up in certain bands is a reliable indicator of problematic resonances, this approach allows conclusions to be drawn on groups of audios in a general way.”
Comments 4: 3. In certain experiments, the images and analyses do not align well, as illustrated in Figure 11. It is recommended that the author enhance and optimize some of the experimental diagrams.
Response 4: Agree. We have corrected the paragraph on page 17, paragraph 3, line 448, so that the explanation aligns with Figure 11 (now Figure 14). The description has been adapted to correspond to the figures that represent the results.
“All three plug-ins exhibit attenuation exceeding -0.1 dB at frequencies of 500 Hz, 4 kHz and 8 kHz. However, it should be noted that the proposed plug-in generates -3.5 dB more than commercial plug-ins at the 8 kHz frequency.”
A sentence was also added to improve the analysis of Figures 16 and 17 on page 18, paragraph 2, line 466.
“although the proposed plug-in generates higher attenuation”
Comments 5: 4. The second section lacks specific algorithm details, particularly regarding the processing of arrays, such as threshold curves mentioned in Section 2.4. It is recommended that the author supplement and enhance this information.
Response 5: We welcome the comment. The following paragraphs were added to complement and enhance the information on the arrays in section 2.
The following paragraph is added on page 5, paragraph 3, line 195: “The analysis buffer still contains values between -1 to 1 in floating point as it is a representation of the input signal.”
The following paragraph is added on page 5, paragraph 5, line 207: “The Fast Fourier Transform (FFT) was computed using MATLAB's FFT function. The function accepts an input x array with double, single or int format values and delivers the frequency domain representation returned as a vector. In this case, the function returns an array of complex numbers representing each complex exponential by discrete frequencies.”
The following paragraph is added on page 10, paragraph 2, line 310 and line 317: “The two arrays are vectors of dimension 30, corresponding to one value for each band. Each vector contains the relevant spectrum information. The first vector has the maximum value per band in decibels, with negative float values. The second vector is the threshold curve vector, which has 30 floating point values in decibels less than 0.” “The final attenuation vector is converted from decibels to linear scale, resulting in a vector of dimension 30 with values between 0 and 1 in floating point, thereby representing the attenuation for each frequency band.”
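The decibel-to-linear conversion described in the quoted paragraph can be illustrated with a short Python sketch. It assumes the amplitude convention (gain = 10^(dB/20)); the record does not state which convention the paper uses, and the function name is hypothetical.

```python
def db_to_linear(gains_db):
    """Convert per-band attenuation values from dB to linear amplitude gain.

    Assumes the amplitude convention: gain = 10 ** (dB / 20).
    Non-positive dB values map to gains in the interval (0, 1].
    """
    return [10.0 ** (g / 20.0) for g in gains_db]
```

With this convention, 0 dB maps to a gain of 1 and -20 dB maps to a gain of 0.1, so a 30-element vector of non-positive decibel values becomes a 30-element vector of gains no greater than 1.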
The following paragraph is added on page 12, paragraph 3, line 352: “In this study, the MATLAB DSP System Toolbox object dsp.SOSFilter was utilized for signal filtering, implementing an IIR filter structure with second-order sections. The default filtering structure is Direct form II transposed and using the object the signal is filtered, giving as arguments the input signal in matrix form (mono or stereo) and the coefficient matrix b corresponding to a 30x3 matrix and the coefficient matrix a corresponding to a 30x3 matrix”
Comments 6: 5. The dataset utilized in this paper comprises 20 audio files, and it is recommended that the author consider expanding the dataset.
Response 6: We welcome the comment and the suggestion regarding the potential extension of the dataset. However, it should be noted that, for the application under consideration in this project, we consider the current dataset to be adequate for demonstrating the effectiveness of our proposal. The dataset consists of 20 audio files, which have been carefully selected to encompass a range of instruments and conditions that are representative of real-life scenarios in commercial music applications. Specifically, the selection encompasses a wide range of instruments in various conditions, including percussion with reverb, guitars in different scenarios such as acoustic, clean electric, distorted and effects, vocals in different conditions, various wind instruments, and other instruments commonly found in commercial music such as organ, piano and electric bass. This diversity ensures that the dataset captures a broad spectrum of acoustic characteristics, allowing us to comprehensively assess the robustness and effectiveness of our approach. While a more extensive dataset could potentially offer supplementary information, it is believed that the current dataset is sufficient to validate the approach and demonstrate its practical applicability. However, the potential benefits of expanding the dataset in future work to further generalise the results are recognised. Furthermore, a paragraph has been added on page 13, paragraph 3, line 382, providing a detailed explanation of the present situation and the diversity of the collected data.
“The dataset consists of twenty audio files that have been carefully selected to include a variety of instruments and conditions that are representative of real-life scenarios in commercial music applications. In particular, the selection includes versatile instruments in different conditions such as percussion with reverb, guitars in different scenarios such as acoustic, clean electric, distorted and effects, vocals in different conditions, various wind instruments and other instruments commonly used in commercial music such as organ, piano and electric bass. This diversity ensures that the dataset captures a wide range of acoustic characteristics.”
Comments 7: 6. The font size of the coordinate axes in some images is too small, making it difficult to read clearly, as illustrated in Figures 5 and 6. It is recommended that the author make adjustments to improve readability.
Response 7: Agree. The font size of the coordinate axes in Figures 3, 6, 7, 9, 10, 12, 13, 14, 15, 16 and 17 was made larger.
4. Response to Comments on the Quality of English Language
Quality of English Language
(x) The English is fine and does not require any improvement.
( ) The English could be improved to more clearly express the research.
Response: No comments
Author Response File: Author Response.pdf
Reviewer 2 Report
Comments and Suggestions for Authors
This paper presents a method for resonant frequency attenuation in a dynamic audio equalizer. The study introduces a novel approach using FFT for frequency domain analysis, a standard deviation-based threshold curve for resonance detection, and IIR-Peak filters for attenuation. The proposed method is evaluated through a comparison with commercial plugins, including Soothe2 and RESO, using RMS and LUFS metrics. Furthermore, a subjective evaluation with 40 annotators is conducted to assess the perceptual effectiveness of the proposed approach. The results indicate that the developed plugin achieves greater attenuation, particularly in vocal, percussion, and guitar signals.
The paper demonstrates a well-structured and technically sound approach to addressing the problem of resonant frequency attenuation.
However, there are several concerns that should be addressed to strengthen the study. First, the dataset used for evaluation consists of only 20 audio files. While the selection of different audio sources is commendable, a larger and more diverse dataset would enhance the statistical robustness of the findings. Including real-world recordings with various levels of noise and reverberation could provide a more comprehensive validation. Additionally, while the paper provides a detailed mathematical foundation for the proposed method, some equations lack intuitive explanations. Simplifying key mathematical concepts or supplementing them with illustrative diagrams could improve readability for a broader audience. Furthermore, the subjective evaluation study, in which 74% of participants confirmed attenuation at resonant frequencies, would benefit from additional details regarding the expertise levels of the participants, the evaluation setup, and the criteria used for assessment. Discussing potential biases in subjective evaluations would also strengthen the credibility of the analysis.
Comments for author File: Comments.pdf
Author Response
Comments 1: This paper presents a method for resonant frequencies attenuation in a dynamic audio equalizer. The study introduces a novel approach using FFT for frequency domain analysis, a standard deviation-based threshold curve for resonance detection, and IIR-Peak filters for attenuation. The proposed method is evaluated through a comparison with commercial plugins, including Soothe2 and RESO, using RMS and LUFS metrics. Furthermore, a subjective evaluation with 40 annotators is conducted to assess the perceptual effectiveness of the proposed approach. The results indicate that the developed plugin achieves greater attenuation, particularly in vocal, percussion, and guitar signals. The paper demonstrates a well-structured and technically sound approach to addressing the problem of resonant frequency attenuation.
Response 1: No comments.
However, there are several concerns that should be addressed to strengthen the study.
Comments 2: First, the dataset used for evaluation consists of only 20 audio files. While the selection of different audio sources is commendable, a larger and more diverse dataset would enhance the statistical robustness of the findings. Including real-world recordings with various levels of noise and reverberation could provide a more comprehensive validation.
Response 2: We welcome your comments and suggestions regarding the potential extension of the dataset. However, we would like to take this opportunity to explain that, for the application under consideration in this project, we consider the current dataset to be sufficient to demonstrate the effectiveness of our proposal. We decided to use this repository because, to the best of our knowledge, there are few audio sets of real-world recordings that include such a wide variety of instruments in different conditions. The chosen dataset allowed us to cover a wide range of instruments in a variety of settings, such as percussion with reverb, guitars in different contexts (acoustic, clean electric, distorted and effects), vocals in different conditions, various wind instruments and other instruments common in commercial music, such as organ, piano and electric bass. This diversity ensures that the dataset captures a broad spectrum of acoustic characteristics, allowing us to comprehensively assess the robustness and effectiveness of our approach. While a more extensive dataset could offer supplementary information and enhance the statistical robustness of the findings, we are of the opinion that the current dataset is sufficient to validate our approach and demonstrate its practical applicability in the specific context of this study. However, we acknowledge the potential benefits of incorporating additional recordings in future work, which would facilitate a more comprehensive validation. Furthermore, we have included the following paragraph on page 13, paragraph 3, line 382, which explains this situation and the diversity of the data collected.
“The dataset consists of twenty audio files that have been carefully selected to include a variety of instruments and conditions that are representative of real-life scenarios in commercial music applications. In particular, the selection includes versatile instruments in different conditions such as percussion with reverb, guitars in different scenarios such as acoustic, clean electric, distorted and effects, vocals in different conditions, various wind instruments and other instruments commonly used in commercial music such as organ, piano and electric bass. This diversity ensures that the dataset captures a wide range of acoustic characteristics.”
Comments 3: Additionally, while the paper provides a detailed mathematical foundation for the proposed method, some equations lack intuitive explanations. Simplifying key mathematical concepts or supplementing them with illustrative diagrams could improve readability for a broader audience.
Response 3: We agree with your comment. The following paragraphs clarifying the mathematical equations were added to increase clarity and accessibility for a wider audience. In addition, Figures 6 and 8 were incorporated to provide supplementary visual aids to the equations.
The following paragraph was added on page 6, paragraph 1, line 222. “The Spectral Flatness coefficient is a metric employed to characterise an audio spectrum. It provides a quantitative means to ascertain the extent to which a sound resembles a pure tone, as opposed to being noise-like.”
The following paragraph was added on page 9, paragraph 1, line 285. “As demonstrated in Figure 6, the spectrum is divided into octave bands, and the threshold values obtained with equation 8 are presented. This equation facilitates the calculation of a threshold value that considers the amplitudes of the corresponding band, while disregarding the prominent peaks.” Figure 6 is added to illustrate the process of obtaining threshold values per octave.
The following paragraph was added on page 9, paragraph 2, line 294. “Figure 7 shows the threshold curve generated by interpolating through the 30 points of the centre frequency of the selected distribution.”
The following paragraph was added on page 11, paragraph 3, line 339. “The magnitude response of a Peak filter is illustrated in Figure 8 when the Q (Bandwidth) factor is 1, the G value corresponding to the attenuation is -10dBFS, and the centre frequency is 1,000 Hz.” Figure 8 is added to illustrate the magnitude response of a Peak filter.
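The magnitude response described for Figure 8 can be reproduced approximately with a short Python sketch. The record does not give the paper's filter design equations, so this sketch assumes the widely used Audio EQ Cookbook peaking biquad and a sample rate of 48 kHz; both, along with the function names, are assumptions made for illustration only.

```python
import cmath
import math


def peaking_biquad(fc_hz, gain_db, q, fs_hz):
    """Peaking-EQ biquad coefficients (Audio EQ Cookbook form, assumed here).

    Returns (b, a), each a list of three coefficients, normalised so a[0] == 1.
    """
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * fc_hz / fs_hz
    alpha = math.sin(w0) / (2.0 * q)
    b = [1.0 + alpha * a_lin, -2.0 * math.cos(w0), 1.0 - alpha * a_lin]
    a = [1.0 + alpha / a_lin, -2.0 * math.cos(w0), 1.0 - alpha / a_lin]
    return [c / a[0] for c in b], [c / a[0] for c in a]


def magnitude_db(b, a, f_hz, fs_hz):
    """Magnitude of H(z) = B(z)/A(z) in dB, evaluated on the unit circle."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * f_hz / fs_hz)  # z^-1
    num = b[0] + b[1] * z1 + b[2] * z1 * z1
    den = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(num / den))


# Parameters quoted for Figure 8: Q = 1, G = -10 dB, centre frequency 1,000 Hz.
FS_HZ = 48000.0  # assumed sample rate, not stated in this record
b, a = peaking_biquad(1000.0, -10.0, 1.0, FS_HZ)
```

Evaluating `magnitude_db(b, a, 1000.0, FS_HZ)` gives the full -10 dB cut at the centre frequency, while frequencies far from 1,000 Hz are left close to 0 dB. A 30-band cascade such as the one described for `dsp.SOSFilter` would apply one such section per band.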
Comments 4: Furthermore, the subjective evaluation study, in which 74% of participants confirmed attenuation at resonant frequencies, would benefit from additional details regarding the expertise levels of the participants, the evaluation setup, and the criteria used for assessment. Discussing potential biases in subjective evaluations would also strengthen the credibility of the analysis.
Response 4: We have added the following paragraph on page 14, paragraph 2, line 392, which provides further detail on the methodology employed for the inter-annotator agreement metric. This paragraph describes the experience of the annotators, the questions that were posed, and the criteria that were taken into consideration.
“Finally, the inter-annotator agreement metric was implemented with 40 individuals from the audio field involved in order to ascertain whether the attenuation generated was at resonant frequencies. The level of experience of the annotators ranged from 2 to 10 years or more in different areas of audio and music. Audio engineers, producers, musicians, audiophiles and music-related people were considered in the evaluation. The experience of the annotators was carefully considered in order to ensure the transparency of the results for the project. In the course of the experiment, participants were requested to listen to the original audio files and to the audio files processed by the proposed plug-in, using headphones. Following the listening of each audio file, the participants were asked the following questions:
- Do you hear a difference between the output audio file and the input audio file?
- Do you consider the difference you hear in the output file to be attenuation of the resonant frequencies of the input audio file?
- Do you consider that the attenuation in the low/mid frequencies is about the resonant frequencies of the input file?
- Do you consider that the attenuation at the mid/high frequencies is about the resonant frequencies of the input file?
It should be noted that all four questions had only binary (yes/no) answers. From the responses, it was possible to average for each question how many people answered yes and how many answered no, thus analysing the responses across the spectrum and in low and high frequency sections.”
4. Response to Comments on the Quality of English Language
Quality of English Language
(x) The English is fine and does not require any improvement.
( ) The English could be improved to more clearly express the research.
Response: No comments
Author Response File: Author Response.pdf