Pathological Voice Detection Using Joint Subspace Transfer Learning
Abstract
1. Introduction
- A new cross-corpus pathological voice recognition framework, joint subspace transfer learning (JSTL), is proposed in this paper. The method reduces the distribution discrepancy between the source domain and the target domain, thereby improving the recognition rate in cross-corpus pathological voice experiments.
- JSTL seeks a projection matrix that maps source-domain and target-domain samples into a common subspace. The maximum mean discrepancy (MMD) quantifies the distance between the source and target domains, while graph embedding and the intra-class and inter-class distances serve as regularization terms. The experimental results show that, under these regularization constraints, the mapped features are more separable.
- Unlike methods that ignore differences in the conditional distributions, pseudo labels of the target data are used to construct the MMD matrix and the graph-embedding matrix.
2. Methods
2.1. Maximum Mean Discrepancy
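As a point of reference for how MMD enters the objective, the following NumPy sketch builds the marginal MMD coefficient matrix and evaluates the empirical (linear-kernel) MMD between two feature sets. The variable names (Xs, Xt) and the linear-kernel form are illustrative assumptions, not necessarily the exact formulation used in the paper.

```python
import numpy as np

def mmd_matrix(n_s, n_t):
    """Coefficient matrix M for the marginal MMD term tr(A^T X M X^T A),
    where X = [Xs, Xt] stacks source and target samples as columns."""
    e = np.concatenate([np.full(n_s, 1.0 / n_s), np.full(n_t, -1.0 / n_t)])
    return np.outer(e, e)

def empirical_mmd(Xs, Xt):
    """Squared MMD with a linear kernel: ||mean(Xs) - mean(Xt)||^2
    (rows of Xs and Xt are samples)."""
    diff = Xs.mean(axis=0) - Xt.mean(axis=0)
    return float(diff @ diff)
```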
2.2. Intra-Class Distance and Inter-Class Distance
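A minimal sketch, assuming the conventional LDA-style definitions, of the within-class and between-class scatter matrices that realize the intra-class and inter-class distance terms; the paper's exact divergence matrices may be weighted differently.

```python
import numpy as np

def scatter_matrices(X, y):
    """Within-class (Sw) and between-class (Sb) scatter matrices of X (n x d):
    tr(A^T Sw A) is minimized and tr(A^T Sb A) maximized to improve separability."""
    d = X.shape[1]
    overall_mean = X.mean(axis=0)
    Sw = np.zeros((d, d))
    Sb = np.zeros((d, d))
    for c in np.unique(y):
        Xc = X[y == c]
        class_mean = Xc.mean(axis=0)
        Sw += (Xc - class_mean).T @ (Xc - class_mean)   # intra-class divergence
        diff = (class_mean - overall_mean)[:, None]
        Sb += len(Xc) * (diff @ diff.T)                 # inter-class divergence
    return Sw, Sb
```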
2.3. Graph Embedding
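A minimal sketch of the graph side of the regularizer: a k-nearest-neighbour affinity graph and its unnormalized Laplacian L = D − W, on which graph-embedding terms of the form tr(AᵀXLXᵀA) operate. Binary connectivity weights and symmetrization are assumptions; in JSTL the graph is additionally built from true and pseudo labels (see Algorithm 1).

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def knn_laplacian(X, k=5):
    """Unnormalized graph Laplacian L = D - W from a k-nearest-neighbour
    affinity graph over the samples in X (rows are samples)."""
    W = kneighbors_graph(X, n_neighbors=k, mode='connectivity').toarray()
    W = np.maximum(W, W.T)              # symmetrize: undirected graph
    D = np.diag(W.sum(axis=1))          # degree matrix
    return D - W
```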
Algorithm 1: The proposed JSTL algorithm
1: Input: source and target data X; source domain labels; regularization parameters
2: Output: target domain labels
3: Initialization
4: repeat until convergence
5:    construct the graph matrix from the true and pseudo labels
6:    solve the eigendecomposition and take the k smallest eigenvectors as the adaptation matrix A
7:    update the pseudo target labels using a standard classifier f trained on the projected source data
8:    update the MMD matrix and the Laplacian matrix L
9: end repeat
10: return the target domain labels determined by classifier f
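To make the iteration in Algorithm 1 concrete, the sketch below strings together the helper functions from the sketches above: a generalized eigendecomposition yields the adaptation matrix A, a 1-NN classifier stands in for the "standard classifier f", and pseudo target labels are refreshed on each pass. The objective and constraint matrices are assumed stand-ins, and only the marginal MMD term is included (the class-conditional terms built from pseudo labels are omitted), so this is illustrative rather than the paper's exact formulation.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.neighbors import KNeighborsClassifier

def jstl_sketch(Xs, ys, Xt, dim=30, lam=1.0, mu=1.0, n_iter=10, k=5):
    """Illustrative loop following the structure of Algorithm 1.
    Reuses mmd_matrix, scatter_matrices and knn_laplacian defined above."""
    X = np.vstack([Xs, Xt]).T                       # d x n, columns = samples
    n_s = len(Xs)
    clf = KNeighborsClassifier(n_neighbors=1)       # "standard classifier f"
    yt_pseudo = None
    for _ in range(n_iter):
        M = mmd_matrix(n_s, len(Xt))                # marginal MMD term
        L = knn_laplacian(X.T, k=k)                 # graph-embedding term
        Sw, Sb = scatter_matrices(Xs, ys)           # intra/inter-class terms
        obj = X @ (M + mu * L) @ X.T + lam * Sw     # terms to minimize
        cons = X @ X.T + lam * Sb + 1e-3 * np.eye(X.shape[0])
        _, vecs = eigh(obj, cons)                   # generalized eigenproblem
        A = vecs[:, :dim]                           # dim smallest eigenvectors
        Zs, Zt = (A.T @ X[:, :n_s]).T, (A.T @ X[:, n_s:]).T
        clf.fit(Zs, ys)
        yt_pseudo = clf.predict(Zt)                 # refresh pseudo labels
    return yt_pseudo
```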
3. Experimental Methodology
3.1. Database
Case | M-S | M-H | S-M | S-H | H-M | H-S
---|---|---|---|---|---|---
Training set | MEEI | MEEI | SVD | SVD | HUPA | HUPA
Testing set | SVD | HUPA | MEEI | HUPA | MEEI | SVD
3.2. Feature Set
3.3. Feature Selection
4. Results and Discussion
4.1. Cross-Database Experiments
4.2. Parameter Sensitivity Analysis
4.3. Feature Visualization
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
Parameters | Description
---|---
 | Labeled source / unlabeled target features
 | MMD matrix
 | Laplacian matrix
 | Source / target domain
 | Intra-class / inter-class divergence matrix
c | The c-th pathological voice category
 | Regularization parameter of the intra-class/inter-class distance term
 | Regularization parameter of graph embedding (GE)
k | Number of nearest neighbors
Database | Healthy | Neuromuscular | Structural |
---|---|---|---|
MEEI | 53 | 67 | 82 |
SVD | 200 | 64 | 130 |
HUPA | 239 | 31 | 128 |
LLD | Number | Functionals (12)
---|---|---
ZCR | 1 | mean
ΔZCR | 1 | standard deviation
RMS energy | 1 | kurtosis, skewness
ΔRMS energy | 1 | extremes: value, rel. position, range
F0 | 1 | linear regression: offset, slope, MSE
ΔF0 | 1 |
HNR | 1 |
ΔHNR | 1 |
MFCC 1–12 | 12 |
ΔMFCC 1–12 | 12 |
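If the repeated rows above denote each LLD and its first-order delta, the set amounts to 32 contours × 12 functionals = 384 features per recording, in line with a standard openSMILE configuration. As an illustration of what the 12 functionals compute, here is a minimal NumPy/SciPy sketch applied to a single frame-level contour; the exact openSMILE definitions (e.g., normalization of the relative positions or of the regression error) may differ.

```python
import numpy as np
from scipy.stats import kurtosis, skew

def functionals(contour):
    """Apply the 12 functionals from the table to one frame-level LLD contour."""
    x = np.asarray(contour, dtype=float)
    t = np.arange(len(x))
    slope, offset = np.polyfit(t, x, 1)             # linear regression fit
    mse = np.mean((offset + slope * t - x) ** 2)    # regression error
    return {
        "mean": x.mean(),
        "std": x.std(),
        "kurtosis": kurtosis(x),
        "skewness": skew(x),
        "max": x.max(),                             # extremes: values
        "min": x.min(),
        "rel_pos_max": x.argmax() / len(x),         # extremes: rel. positions
        "rel_pos_min": x.argmin() / len(x),
        "range": x.max() - x.min(),                 # extremes: range
        "lin_reg_offset": offset,
        "lin_reg_slope": slope,
        "lin_reg_mse": mse,
    }
```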
Cases | NN | PCA | TCA | JDA | GFK | SJDA | DSTL | JSTL |
---|---|---|---|---|---|---|---|---|
M-S | 40.10 | 35.28 | 48.48 | 46.70 | 38.07 | 49.75 | 45.18 | 52.03 |
M-H | 42.71 | 38.44 | 51.26 | 54.02 | 41.21 | 54.52 | 51.51 | 65.83 |
S-M | 42.57 | 35.64 | 45.05 | 45.05 | 48.02 | 45.05 | 50.00 | 52.97 |
S-H | 52.51 | 51.51 | 46.23 | 45.48 | 54.77 | 47.49 | 48.49 | 65.08 |
H-M | 43.56 | 43.07 | 44.06 | 42.57 | 44.06 | 45.54 | 47.37 | 62.38 |
H-S | 58.88 | 53.55 | 58.12 | 62.17 | 57.61 | 61.17 | 58.12 | 64.72 |
Category | NN | PCA | TCA | JDA | GFK | SJDA | DSTL | JSTL |
---|---|---|---|---|---|---|---|---|
Healthy | 63.33 | 58.96 | 68.09 | 70.08 | 64.36 | 70.39 | 68.14 | 75.37 |
Neuromuscular | 39.10 | 33.00 | 43.90 | 47.54 | 40.60 | 45.23 | 43.59 | 37.96 |
Structural | 40.59 | 36.66 | 36.43 | 35.25 | 40.22 | 37.97 | 38.55 | 62.96 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).