Toward an Automatic Quality Assessment of Voice-Based Telemedicine Consultations: A Deep Learning Approach
Abstract
1. Introduction
- We develop a model to automate the quality prediction of voice-based medical consultations. The contribution at this stage lies at the feature engineering and model development levels: the model combines spectral features extracted from the audio signals with text-based features extracted from the transcripts, which are then used to train different structures of deep neural networks, including convolutional architectures (a minimal illustrative sketch follows this list).
- We reinforce the advantages of artificial intelligence in telemedicine. An automatic quality assessment model reduces the effort and time the operations team spends evaluating consultations manually. Moreover, in pandemic situations such as the emergent COVID-19 pandemic, such an approach can enhance the quality of the service and better serve the callers (patients).
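To make the hybrid design concrete, the following is a minimal sketch of a two-branch model built with TensorFlow/Keras (cited in the references). It is not the authors' exact architecture: the layer sizes, input shapes, vocabulary size, sequence length, and embedding dimension are illustrative assumptions.

```python
# Minimal two-branch sketch of a hybrid (signal + transcript) classifier.
# All sizes below are illustrative assumptions, not the authors' settings.
import tensorflow as tf
from tensorflow.keras import layers, Model

N_MFCC = 39        # e.g., 13 MFCCs plus delta and delta-delta features
N_FRAMES = 400     # assumed number of frames per consultation
VOCAB_SIZE = 9000  # assumed transcript vocabulary size
SEQ_LEN = 200      # assumed transcript length in tokens
EMB_DIM = 100      # assumed word-embedding dimension

# Signal branch: 1D convolution over the sequence of MFCC frames.
sig_in = layers.Input(shape=(N_FRAMES, N_MFCC), name="mfcc")
x = layers.Conv1D(64, 5, activation="relu")(sig_in)
x = layers.GlobalMaxPooling1D()(x)

# Transcript branch: word embedding + 1D convolution over token ids.
txt_in = layers.Input(shape=(SEQ_LEN,), name="tokens")
y = layers.Embedding(VOCAB_SIZE, EMB_DIM)(txt_in)
y = layers.Conv1D(64, 5, activation="relu")(y)
y = layers.GlobalMaxPooling1D()(y)

# Fuse both branches and predict an assumed binary quality label.
z = layers.concatenate([x, y])
z = layers.Dense(64, activation="relu")(z)
out = layers.Dense(1, activation="sigmoid")(z)

model = Model(inputs=[sig_in, txt_in], outputs=out)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
```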
2. Related Works
3. Problem Definition
4. Background
4.1. MFCCs and Mel Spectrogram
- Pre-emphasizing the input signal, i.e., applying a high-pass filter that amplifies the high-frequency components to balance the spectrum and improve the signal-to-noise ratio.
- Framing and windowing the signal. The objective is to divide the signal into a sequence of short overlapping frames that can be assumed stationary, where a stationary signal reflects the true statistical and temporal characteristics. The windowing is typically performed with a tapered window such as the Hamming window, which reduces the spectral distortion introduced at the frame boundaries by smoothing the segments toward zero at their edges.
- Applying the Fourier transform to each frame to convert it from the time domain to the frequency domain, representing the signal in terms of its spectral content.
- Applying filter banks (“Mel filters”) to map the power spectrum of each frame onto the Mel scale.
- Computing the logarithm of the magnitude of the powers resulting from the Mel filters.
- Calculating the cepstrum of the results produced in the previous step by applying the discrete cosine transform (DCT), which yields the cepstral coefficients as represented by Equation (1):

$$c_n = \sum_{m=1}^{M} S_m \cos\!\left[\frac{\pi n}{M}\left(m - \frac{1}{2}\right)\right], \qquad n \in \{0, 1, \ldots, C-1\},$$

where $S_m$ is the log Mel-filter energy from the previous step, $M$ is the number of Mel filters, $n$ indexes the cepstral coefficients, and $C$ is the number of MFCCs.

Conventionally, 8 to 13 MFCCs are used; however, these coefficients capture only the static characteristics of the respective frames. Additional temporal features are generated by computing the first and second derivatives of the cepstral coefficients, known as the delta and delta-delta features. Accordingly, the MFCCs are extended from 13 to 39 coefficients.
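These steps can be reproduced with standard tooling; the following is a minimal sketch using librosa (cited in the references), where the audio file path, sampling rate, and number of Mel bands are placeholder assumptions rather than the authors' exact settings.

```python
# Sketch of MFCC (with deltas) and Mel-spectrogram extraction via librosa.
# The file path and parameter values are illustrative assumptions.
import numpy as np
import librosa

signal, sr = librosa.load("consultation.wav", sr=16000)  # hypothetical file

# 13 static MFCCs per frame (librosa performs the framing, windowing,
# FFT, Mel filter banks, log compression, and DCT internally).
mfcc = librosa.feature.mfcc(y=signal, sr=sr, n_mfcc=13)

# First and second derivatives (delta and delta-delta) -> 39 coefficients.
delta = librosa.feature.delta(mfcc)
delta2 = librosa.feature.delta(mfcc, order=2)
features = np.concatenate([mfcc, delta, delta2], axis=0)  # (39, n_frames)

# Log-scaled Mel spectrogram, an alternative 2D input for convolutional models.
mel = librosa.feature.melspectrogram(y=signal, sr=sr, n_mels=128)
log_mel = librosa.power_to_db(mel, ref=np.max)
```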
4.2. Deep Neural Networks (Convnet)
5. Methodology
5.1. Data Description
5.2. Signal-Based Approach
5.2.1. Feature Extraction
5.2.2. Model Structure
5.3. Transcript-Based Approach
5.3.1. Feature Extraction
5.3.2. Model Structure
5.4. Hybrid Approach Combining Spectral Features and Transcript Features
5.5. Experimental Settings
5.6. Evaluation Criteria
6. Results
6.1. Signal-Based Results
6.2. Transcript-Based Results
6.3. Hybrid-Based Results
7. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Conflicts of Interest
References
- Mosadeghrad, A.M. Factors affecting medical service quality. Iran. J. Public Health 2014, 43, 210–220.
- McConnochie, K.M. Webside manner: A key to high-quality primary care telemedicine for all. Telemed. E-Health 2019, 25, 1007–1011.
- Roy, T.; Marwala, T.; Chakraverty, S. A survey of classification techniques in speech emotion recognition. In Mathematical Methods in Interdisciplinary Sciences; John Wiley & Sons Inc.: Hoboken, NJ, USA, 2020; pp. 33–48.
- Sharma, G.; Umapathy, K.; Krishnan, S. Trends in audio signal feature extraction methods. Appl. Acoust. 2020, 158, 107020.
- Glowacz, A. Fault diagnostics of acoustic signals of loaded synchronous motor using SMOFS-25-EXPANDED and selected classifiers. Teh. Vjesn. 2016, 23, 1365–1372.
- Ranjan, J.; Patra, K.; Szalay, T.; Mia, M.; Gupta, M.K.; Song, Q.; Krolczyk, G.; Chudy, R.; Pashnyov, V.A.; Pimenov, D.Y. Artificial intelligence-based hole quality prediction in micro-drilling using multiple sensors. Sensors 2020, 20, 885.
- Omari, T.; Al-Zubaidy, H. Call center performance evaluation. In Proceedings of the Canadian Conference on Electrical and Computer Engineering, Saskatoon, SK, Canada, 1–4 May 2005; pp. 1805–1808.
- Popovic, I.; Culibrk, D.; Mirkovic, M.; Vukmirovic, S. Automatic Speech Recognition and Natural Language Understanding for Emotion Detection in Multi-party Conversations. In Proceedings of the 1st International Workshop on Multimodal Conversational AI, Seattle, WA, USA, 12–16 October 2020; pp. 31–38.
- de Pinto, M.G.; Polignano, M.; Lops, P.; Semeraro, G. Emotions understanding model from spoken language using deep neural networks and mel-frequency cepstral coefficients. In Proceedings of the 2020 IEEE Conference on Evolving and Adaptive Intelligent Systems (EAIS), Bari, Italy, 27–29 May 2020; pp. 1–5.
- Yang, K.; Xu, H.; Gao, K. CM-BERT: Cross-Modal BERT for Text-Audio Sentiment Analysis. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 521–528.
- Bae, S.M.; Ha, S.H.; Park, S.C. A web-based system for analyzing the voices of call center customers in the service industry. Expert Syst. Appl. 2005, 28, 29–41.
- Takeuchi, H.; Subramaniam, L.V.; Nasukawa, T.; Roy, S. Automatic identification of important segments and expressions for mining of business-oriented conversations at contact centers. In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL), Prague, Czech Republic, 28–30 June 2007; pp. 458–467.
- Garnier-Rizet, M.; Adda, G.; Cailliau, F.; Gauvain, J.L.; Guillemin-Lanne, S.; Lamel, L.; Vanni, S.; Waast-Richard, C. CallSurf: Automatic Transcription, Indexing and Structuration of Call Center Conversational Speech for Knowledge Extraction and Query by Content. In Proceedings of the LREC 2008, Marrakech, Morocco, 26 May–1 June 2008.
- Pandharipande, M.A.; Kopparapu, S.K. A novel approach to identify problematic call center conversations. In Proceedings of the 2012 Ninth International Conference on Computer Science and Software Engineering (JCSSE), Bangkok, Thailand, 30 May–1 June 2012; pp. 1–5.
- Pallotta, V.; Delmonte, R.; Vrieling, L.; Walker, D. Interaction Mining: The New Frontier of Call Center Analytics. In Proceedings of the DART@AI*IA, Palermo, Italy, 17 September 2011.
- Kopparapu, S.K. Non-Linguistic Analysis of Call Center Conversations; Springer: Cham, Switzerland, 2015.
- Karakus, B.; Aydin, G. Call center performance evaluation using big data analytics. In Proceedings of the 2016 International Symposium on Networks, Computers and Communications (ISNCC), Yasmine Hammamet, Tunisia, 11–13 May 2016; pp. 1–6.
- Chen, L.; Tao, J.; Ghaffarzadegan, S.; Qian, Y. End-to-end neural network based automated speech scoring. In Proceedings of the 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, 15–20 April 2018; pp. 6234–6238.
- Perera, K.; Priyadarshana, Y.; Gunathunga, K.; Ranathunga, L.; Karunarathne, P.; Thanthriwatta, T. Automatic Evaluation Software for Contact Centre Agents' Voice Handling Performance. Int. J. Sci. Res. Publ. 2019, 5, 1–8.
- Ahmed, A.; Shaalan, K.; Toral, S.; Hifny, Y. A Multimodal Approach to Improve Performance Evaluation of Call Center Agent. Sensors 2021, 21, 2720.
- Vergin, R.; O'Shaughnessy, D.; Farhat, A. Generalized mel frequency cepstral coefficients for large-vocabulary speaker-independent continuous-speech recognition. IEEE Trans. Speech Audio Process. 1999, 7, 525–532.
- Tsai, W.C.; Shih, Y.J.; Huang, N.T. Hardware-Accelerated, Short-Term Processing Voice and Nonvoice Sound Recognitions for Electric Equipment Control. Electronics 2019, 8, 924.
- Rao, K.S.; Manjunath, K. Speech Recognition Using Articulatory and Excitation Source Features; Springer: Cham, Switzerland, 2017.
- Bansal, V.; Pahwa, G.; Kannan, N. Cough Classification for COVID-19 based on audio MFCC features using Convolutional Neural Networks. In Proceedings of the 2020 IEEE International Conference on Computing, Power and Communication Technologies (GUCON), Greater Noida, India, 2–4 October 2020; pp. 604–608.
- Chabot, P.; Bouserhal, R.E.; Cardinal, P.; Voix, J. Detection and classification of human-produced nonverbal audio events. Appl. Acoust. 2021, 171, 107643.
- Sandi, C.; Riadi, A.O.P.; Khobir, F.; Laksono, A. Frequency Cepstral Coefficient and Learning Vector Quantization Method for Optimization of Human Voice Recognition System. Solid State Technol. 2020, 63, 3415–3423.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Liu, W.; Wang, Z.; Liu, X.; Zeng, N.; Liu, Y.; Alsaadi, F.E. A survey of deep neural network architectures and their applications. Neurocomputing 2017, 234, 11–26.
- Jia, Y.; Wang, M.; Wang, Y. Network intrusion detection algorithm based on deep neural network. IET Inf. Secur. 2018, 13, 48–53.
- Wu, P.; Guo, H. LuNET: A deep neural network for network intrusion detection. In Proceedings of the 2019 IEEE Symposium Series on Computational Intelligence (SSCI), Xiamen, China, 6–9 December 2019; pp. 617–624.
- Fredes, J.; Novoa, J.; King, S.; Stern, R.M.; Yoma, N.B. Locally normalized filter banks applied to deep neural-network-based robust speech recognition. IEEE Signal Process. Lett. 2017, 24, 377–381.
- Seki, H.; Yamamoto, K.; Nakagawa, S. A deep neural network integrated with filterbank learning for speech recognition. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 5480–5484.
- Jain, N.; Kumar, S.; Kumar, A.; Shamsolmoali, P.; Zareapoor, M. Hybrid deep neural networks for face emotion recognition. Pattern Recognit. Lett. 2018, 115, 101–106.
- Bechtel, M.G.; McEllhiney, E.; Kim, M.; Yun, H. DeepPicar: A low-cost deep neural network-based autonomous car. In Proceedings of the 2018 IEEE 24th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), Hakodate, Japan, 28–31 August 2018; pp. 11–21.
- Tian, Y.; Pei, K.; Jana, S.; Ray, B. DeepTest: Automated testing of deep-neural-network-driven autonomous cars. In Proceedings of the 40th International Conference on Software Engineering, Gothenburg, Sweden, 27 May–3 June 2018; pp. 303–314.
- Roy, A.; Sun, J.; Mahoney, R.; Alonzi, L.; Adams, S.; Beling, P. Deep learning detecting fraud in credit card transactions. In Proceedings of the 2018 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA, 27 April 2018; pp. 129–134.
- Yuan, S.; Wu, X.; Li, J.; Lu, A. Spectrum-based deep neural networks for fraud detection. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 2419–2422.
- Kollias, D.; Tagaris, A.; Stafylopatis, A.; Kollias, S.; Tagaris, G. Deep neural architectures for prediction in healthcare. Complex Intell. Syst. 2018, 4, 119–131.
- Soliman, A.B.; Eissa, K.; El-Beltagy, S.R. AraVec: A set of Arabic word embedding models for use in Arabic NLP. Procedia Comput. Sci. 2017, 117, 256–265.
- Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-scale machine learning on heterogeneous systems. arXiv 2015, arXiv:1603.04467.
- McFee, B.; Raffel, C.; Liang, D.; Ellis, D.P.; McVicar, M.; Battenberg, E.; Nieto, O. librosa: Audio and music signal analysis in Python. In Proceedings of the 14th Python in Science Conference, Austin, TX, USA, 6–12 July 2015; Volume 8, pp. 18–25.
- Řehůřek, R.; Sojka, P. Software Framework for Topic Modelling with Large Corpora. In Proceedings of the LREC 2010 Workshop on New Challenges for NLP Frameworks; ELRA: Valletta, Malta, 2010; pp. 45–50.
Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Accuracy | Loss | L.R. | Model
---|---|---|---|---|---|---|---|---|---
0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 6.116 | 1 × 10 | Stacked DNN
0.463 | 0.551 | 0.438 | 0.550 | 0.449 | 0.551 | 0.573 | 0.688 | 5 × 10 |
0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 6.116 | 1 × 10 |
0.520 | 0.588 | 0.398 | 0.577 | 0.451 | 0.577 | 0.614 | 0.690 | 5 × 10 |
0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 9.221 | 1 × 10 |
0.401 | 0.502 | 0.367 | 0.502 | 0.384 | 0.502 | 0.530 | 0.692 | 5 × 10 |
Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Accuracy | Loss | L.R. | Model
---|---|---|---|---|---|---|---|---|---
0.398 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 9.221 | 1 × 10 | Stacked DNN
0.455 | 0.557 | 0.586 | 0.560 | 0.512 | 0.551 | 0.555 | 0.695 | 5 × 10 |
0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 6.116 | 1 × 10 |
0.485 | 0.589 | 0.625 | 0.592 | 0.546 | 0.582 | 0.586 | 0.691 | 5 × 10 |
0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 9.221 | 1 × 10 |
0.448 | 0.589 | 0.813 | 0.575 | 0.578 | 0.519 | 0.526 | 0.694 | 5 × 10 |
Embedding Model | Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Accuracy | Loss | E.D. | L.R.
---|---|---|---|---|---|---|---|---|---|---
AraVec-Twitter-CBOW | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 71.637 | 100 | 1 × 10
AraVec-Twitter-CBOW | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 6.494 | 100 | 5 × 10
AraVec-Twitter-CBOW | 0.440 | 0.546 | 0.096 | 0.514 | 0.158 | 0.463 | 0.636 | 4.999 | 100 | 5 × 10
AraVec-Twitter-CBOW | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 82.690 | 300 | 1 × 10
AraVec-Twitter-CBOW | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 4.419 | 300 | 5 × 10
AraVec-Twitter-CBOW | 0.356 | 0.501 | 0.544 | 0.501 | 0.431 | 0.484 | 0.489 | 3.297 | 300 | 5 × 10
AraVec-Twitter-SG | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 78.438 | 100 | 1 × 10
AraVec-Twitter-SG | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 6.406 | 100 | 5 × 10
AraVec-Twitter-SG | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 5.240 | 100 | 5 × 10
AraVec-Twitter-SG | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 84.065 | 300 | 1 × 10
AraVec-Twitter-SG | 0.356 | 0.678 | 1.000 | 0.502 | 0.525 | 0.267 | 0.358 | 4.355 | 300 | 5 × 10
AraVec-Twitter-SG | 0.295 | 0.465 | 0.114 | 0.482 | 0.165 | 0.446 | 0.589 | 3.526 | 300 | 5 × 10
AraVec-Wiki-SG | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 1.420 | 100 | 1 × 10
AraVec-Wiki-SG | 0.000 | 0.321 | 0.000 | 0.495 | 0.000 | 0.390 | 0.639 | 5.856 | 100 | 5 × 10
AraVec-Wiki-SG | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 4.662 | 100 | 5 × 10
AraVec-Wiki-CBOW | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 79.096 | 100 | 1 × 10
AraVec-Wiki-CBOW | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 6.471 | 100 | 5 × 10
AraVec-Wiki-CBOW | 0.378 | 0.518 | 0.395 | 0.519 | 0.386 | 0.518 | 0.555 | 5.008 | 100 | 5 × 10
Vocab. Size | Embedding Model | Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Acc. | Loss | E.D.
---|---|---|---|---|---|---|---|---|---|---
9000 | AraVec-Wiki-CBOW | 0.355 | 0.511 | 0.991 | 0.500 | 0.523 | 0.271 | 0.358 | 5.834 | 100
18,000 | AraVec-Wiki-CBOW | 0.261 | 0.449 | 0.053 | 0.485 | 0.088 | 0.420 | 0.611 | 6.895 | 100
27,000 | AraVec-Wiki-CBOW | 0.365 | 0.558 | 0.939 | 0.520 | 0.526 | 0.352 | 0.399 | 5.139 | 100
9000 | AraVec-Twitter-CBOW | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 6.211 | 100
18,000 | AraVec-Twitter-CBOW | 0.361 | 0.510 | 0.728 | 0.509 | 0.483 | 0.443 | 0.445 | 6.244 | 100
27,000 | AraVec-Twitter-CBOW | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 6.414 | 100
9000 | AraVec-Twitter-SG | 0.362 | 0.626 | 0.991 | 0.515 | 0.531 | 0.302 | 0.377 | 4.556 | 300
18,000 | AraVec-Twitter-SG | 0.276 | 0.456 | 0.070 | 0.484 | 0.112 | 0.429 | 0.604 | 5.237 | 300
27,000 | AraVec-Twitter-SG | 0.356 | 0.504 | 0.868 | 0.502 | 0.505 | 0.365 | 0.396 | 4.713 | 300
9000 | AraVec-Twitter-SG | 0.355 | 0.178 | 1.000 | 0.500 | 0.524 | 0.262 | 0.355 | 6.036 | 100
18,000 | AraVec-Twitter-SG | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 6.096 | 100
27,000 | AraVec-Twitter-SG | 0.000 | 0.322 | 0.000 | 0.500 | 0.000 | 0.392 | 0.645 | 5.870 | 100
Vocab Size | E.M. | Precision (P.C.) | Precision (Mc. Avg.) | Recall (P.C.) | Recall (Mc. Avg.) | F1-score (P.C.) | F1-score (Mc. Avg.) | Acc. | Loss | Epochs | E.W. | E.D. | L.R. | B.S.
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
9000 | SG | 0.000 | 0.299 | 0.000 | 0.495 | 0.000 | 0.373 | 0.595 | 2.085 | 30 | Non | 300 | 5 × 10 | 128
9000 | SG | 0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 11.683 | 30 | Non | 300 | 9 × 10 | 128
9000 | SG | 0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 2.299 | 30 | Non | 300 | 5 × 10 | 128
All | SG | 0.399 | 0.199 | 1.000 | 0.500 | 0.570 | 0.285 | 0.399 | 2.020 | 30 | Non | 300 | 5 × 10 | 128
All | SG | 0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 11.685 | 30 | Non | 300 | 9 × 10 | 128
All | SG | 0.394 | 0.322 | 0.977 | 0.491 | 0.562 | 0.286 | 0.393 | 2.162 | 30 | Non | 300 | 5 × 10 | 128
All | CBOW | 0.400 | 0.501 | 0.250 | 0.501 | 0.308 | 0.488 | 0.551 | 3.315 | 30 | Non | 100 | 5 × 10 | 128
All | CBOW | 0.000 | 0.301 | 0.000 | 0.500 | 0.000 | 0.375 | 0.601 | 11.616 | 30 | Non | 100 | 9 × 10 | 128
All | CBOW | 0.383 | 0.387 | 0.891 | 0.469 | 0.535 | 0.309 | 0.383 | 2.452 | 30 | Non | 100 | 5 × 10 | 128
All | CBOW | 0.377 | 0.483 | 0.336 | 0.484 | 0.355 | 0.483 | 0.514 | 1.908 | 30 | Non | 100 | 5 × 10 | 64
All | CBOW | 0.408 | 0.510 | 0.586 | 0.511 | 0.481 | 0.495 | 0.495 | 11.623 | 30 | Non | 100 | 9 × 10 | 64
All | CBOW | 0.411 | 0.512 | 0.523 | 0.513 | 0.460 | 0.507 | 0.511 | 1.524 | 30 | Non | 100 | 5 × 10 | 64
All | CBOW | 0.397 | 0.489 | 0.898 | 0.496 | 0.550 | 0.355 | 0.414 | 1.256 | 30 | Non | 100 | 5 × 10 | 32
All | CBOW | 0.404 | 0.513 | 0.820 | 0.509 | 0.541 | 0.42 | 0.445 | 11.636 | 30 | Non | 100 | 9 × 10 | 32
All | CBOW | 0.399 | 0.499 | 0.844 | 0.500 | 0.541 | 0.394 | 0.430 | 1.205 | 30 | Non | 100 | 5 × 10 | 32
All | CBOW | 0.368 | 0.465 | 0.523 | 0.464 | 0.432 | 0.451 | 0.452 | 2.818 | 30 | Trainable | 100 | 5 × 10 | 128
All | CBOW | 0.368 | 0.477 | 0.305 | 0.479 | 0.333 | 0.475 | 0.514 | 0.932 | 50 | Trainable | 100 | 5 × 10 | 128
All | CBOW | 0.333 | 0.449 | 0.297 | 0.452 | 0.314 | 0.450 | 0.483 | 1.447 | 100 | Trainable | 100 | 5 × 10 | 128
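The AraVec variants compared in the tables above (Twitter and Wikipedia corpora, CBOW and skip-gram, 100- or 300-dimensional) are distributed as gensim word2vec models. As an illustration only, loading one and building an embedding matrix for a fixed vocabulary might look like the sketch below; the model filename and the toy vocabulary are placeholders, not the authors' exact setup.

```python
# Sketch: load a pre-trained AraVec word2vec model with gensim and build
# an embedding matrix for a tokenizer vocabulary. The file path and the
# vocabulary dict are placeholder assumptions.
import numpy as np
from gensim.models import Word2Vec

w2v = Word2Vec.load("aravec_twitter_cbow_100.mdl")  # hypothetical file
emb_dim = w2v.wv.vector_size  # 100 or 300 depending on the variant

vocab = {"word_a": 1, "word_b": 2}  # stand-in for a fitted tokenizer index
embedding_matrix = np.zeros((len(vocab) + 1, emb_dim))
for word, idx in vocab.items():
    if word in w2v.wv:  # out-of-vocabulary words keep the zero vector
        embedding_matrix[idx] = w2v.wv[word]
```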
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations. |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Habib, M.; Faris, M.; Qaddoura, R.; Alomari, M.; Alomari, A.; Faris, H. Toward an Automatic Quality Assessment of Voice-Based Telemedicine Consultations: A Deep Learning Approach. Sensors 2021, 21, 3279. https://doi.org/10.3390/s21093279