Peer-Review Record

MiniatureVQNet: A Light-Weight Deep Neural Network for Non-Intrusive Evaluation of VoIP Speech Quality

Appl. Sci. 2023, 13(4), 2455; https://doi.org/10.3390/app13042455

by Elhard James Kumalija * and Yukikazu Nakamoto

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Submission received: 3 January 2023 / Revised: 6 February 2023 / Accepted: 10 February 2023 / Published: 14 February 2023

(This article belongs to the Special Issue Deep Learning for Speech Processing)

Round 1

Reviewer 1 Report

In this paper, the authors propose MiniatureVQNet, a light-weight deep neural network (DNN) model for single-ended speech quality evaluation in VoIP audio applications. The proposed model can predict audio quality independently of the source of degradation, whether noise or network, and is light enough to run on embedded systems. Two variations of the model were evaluated: the first, trained on a dataset containing environmental noise only, is referred to as MiniatureVQNet-Noise; the second, trained on both noise and network distortions, is referred to as MiniatureVQNet-Noise-Network. The proposed model outperforms the traditional P.563 method in accuracy on all tested network conditions and environmental noise parameters. The mean squared error (MSE) relative to the PESQ score is 2.19 for ITU-T P.563, 0.34 for MiniatureVQNet-Noise, and 0.21 for MiniatureVQNet-Noise-Network. The performance of both MiniatureVQNet-Noise-Network and MiniatureVQNet-Noise depends on the noise type for SNR greater than 0 dB and less than 10 dB. Training on the noise-network distorted speech dataset improves prediction accuracy under all VoIP environment distortions compared to training on the noise-only dataset. However, you should address the following:

1) Why haven't you used transformer-based language models in this article?

2) You can follow the approach of this article (https://link.springer.com/article/10.1007/s00521-022-07745-w) to integrate transformers into CNN/LSTM deep learning architectures.

3) There are many mistakes and typos; please have a native speaker correct the manuscript before publishing.
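
The MSE figures quoted in this report compare each method's predicted score against the PESQ score for the same speech sample. As a minimal sketch of how such a comparison could be computed, the following Python example evaluates the MSE and the Pearson correlation coefficient for a small set of scores. The score lists are hypothetical values chosen for illustration; they are not data from the manuscript, and the function names are assumptions rather than the authors' code.

```python
import math


def mean_squared_error(predicted, reference):
    """Average squared difference between predicted and reference scores."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("score lists must be non-empty and of equal length")
    return sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(predicted)


def pearson_correlation(predicted, reference):
    """Pearson correlation coefficient between two equally long score lists."""
    if len(predicted) != len(reference) or len(predicted) < 2:
        raise ValueError("need at least two paired scores")
    n = len(predicted)
    mean_p = sum(predicted) / n
    mean_r = sum(reference) / n
    covariance = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, reference))
    spread_p = sum((p - mean_p) ** 2 for p in predicted)
    spread_r = sum((r - mean_r) ** 2 for r in reference)
    return covariance / math.sqrt(spread_p * spread_r)


# Hypothetical scores on a MOS-like scale (illustration only).
pesq_reference = [1.5, 2.0, 2.5, 3.0, 3.5]
model_predicted = [1.7, 1.9, 2.8, 3.1, 3.2]

mse = mean_squared_error(model_predicted, pesq_reference)
pcc = pearson_correlation(model_predicted, pesq_reference)
print(f"MSE = {mse:.3f}, Pearson correlation = {pcc:.3f}")
```

A lower MSE means the predictions lie closer to PESQ on average, while the correlation coefficient measures how well the predictions track the ordering and trend of the PESQ scores, so the two metrics are usually read together.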

Author Response

1) Why haven't you used transformer-based language models in this article?

2) You can follow the approach of this article (https://link.springer.com/article/10.1007/s00521-022-07745-w) to integrate transformers into CNN/LSTM deep learning architectures.

Thank you for your advice. I acknowledge that transformer networks are among the newest and most powerful models in natural language processing, mainly in text processing. However, our focus was to design a lightweight model. Therefore, we preferred GRU over transformers.

3) There are many mistakes and typos; please have a native speaker correct the manuscript before publishing.

Thank you once again for pointing out the typos and mistakes. The manuscript has improved considerably after proofreading. Please find the updated manuscript attached.

Author Response File: Author Response.pdf

Reviewer 2 Report

The paper is well described and has significant content for publication. Some minor observations follow:

1. The dataset should be described in more detail. The authors cite some work for the dataset details; however, a table of speaker information, speech details, speech features, etc. would increase the readability of the paper.

2. Although the methodology is well described, something still seems to be missing. The authors could include a block diagram to help readers understand the proposed method more clearly.

3. The authors report the correlation coefficient and MSE in Table 2. They are encouraged to analyze this result in more detail. For example, Table 2 shows that 'MiniatureVQNet-Noise-Network' and 'Float16 MiniatureVQNet-Noise-Network' have almost the same results. Is there any justification for this? What significant conclusion can be drawn from these results?

4. Authors are encouraged to analyze the "why" of every result.

Author Response

The dataset should be described in more detail. The authors cite some work for the dataset details; however, a table of speaker information, speech details, speech features, etc. would increase the readability of the paper.

Thank you for the observation. The dataset I used is an aggregation of many publicly available datasets. I did not create a table summarizing all the attributes because the number of subsets is large. However, I have added more information about the datasets.

Although the methodology is well described, something still seems to be missing. The authors could include a block diagram to help readers understand the proposed method more clearly.

I have added a diagram of the proposed model's network architecture; I hope it makes the method easier for readers to understand.

The authors report the correlation coefficient and MSE in Table 2. They are encouraged to analyze this result in more detail. For example, Table 2 shows that 'MiniatureVQNet-Noise-Network' and 'Float16 MiniatureVQNet-Noise-Network' have almost the same results. Is there any justification for this? What significant conclusion can be drawn from these results?
The authors are encouraged to analyze the "why" of every result.

I have added more description of the experimental results and included the significance of the observations. Please find the changes in the attached revised manuscript.
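
The reviewer's question about the near-identical float32 and Float16 results can be illustrated with a small numeric sketch. One plausible reading, which this record does not confirm as the authors' explanation, is that rounding a weight to IEEE 754 half precision changes it by a relative amount of at most 2^-11 (about 0.05%) for values in the normal float16 range, so a weighted sum changes very little. The weights and inputs below are hypothetical, only the weights are rounded, and the arithmetic itself runs in ordinary Python floats rather than in a real inference engine.

```python
import struct


def to_float16(value):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]


# Hypothetical layer weights and inputs (illustration only).
weights = [0.731, -0.152, 0.044, 1.207, -0.589]
inputs = [0.25, 1.10, -0.70, 0.05, 0.90]

weights_fp16 = [to_float16(w) for w in weights]

# Largest relative change any single weight suffers from the rounding.
max_relative_error = max(
    abs(q - w) / abs(w) for w, q in zip(weights, weights_fp16)
)

# The same weighted sum computed with the original and the rounded weights.
output_full = sum(w * x for w, x in zip(weights, inputs))
output_fp16 = sum(q * x for q, x in zip(weights_fp16, inputs))
output_difference = abs(output_full - output_fp16)

print(f"max relative weight error: {max_relative_error:.2e}")
print(f"difference in weighted sum: {output_difference:.2e}")
```

Whether such small perturbations stay negligible through a full network depends on its depth, activations, and value ranges, so a sketch like this motivates the observation in Table 2 but does not replace the measured comparison.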

Author Response File: Author Response.pdf

Round 2

Reviewer 2 Report

The suggestions have been carefully followed and incorporated into the manuscript. The paper is generally well written and structured.
