A Piezoelectric Micromachined Ultrasonic Transducer-Based Bone Conduction Microphone System for Enhancing Speech Recognition Accuracy
Abstract
1. Introduction
2. Design of Bone Conduction Microphone System
3. Speech Capture and Visualization
4. Speech Recognition Accuracy Enhancement
4.1. Speech Enhancement Model
4.2. Dataset and Setup
4.3. Results and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ma, Z.; Liu, Y.; Liu, X.; Ma, J.; Li, F. Privacy-preserving outsourced speech recognition for smart IoT devices. IEEE Internet Things J. 2019, 6, 8406–8420.
- Gaikwad, S.K.; Gawali, B.W.; Yannawar, P. A review on speech recognition technique. Int. J. Comput. Appl. 2010, 10, 16–24.
- Farahani, B.; Tabibian, S.; Ebrahimi, H. Toward a personalized clustered federated learning: A speech recognition case study. IEEE Internet Things J. 2023, 10, 18553–18562.
- O’Shaughnessy, D. Speech enhancement—A review of modern methods. IEEE Trans. Human Mach. Syst. 2024, 54, 110–120.
- Putta, V.S.; Priyadharson, A.S.M. Regional language speech recognition from bone conducted speech signals through CCWT algorithm. Circuits Syst. Signal Process. 2024, 43, 6553–6570.
- Dong, K.; Peng, H.; Che, J. Dynamic-static cross attentional feature fusion method for speech emotion recognition. In Proceedings of the MultiMedia Modeling, MMM 2023, Bergen, Norway, 9–12 January 2023; Lecture Notes in Computer Science. Springer: Cham, Switzerland, 2023; Volume 13834, pp. 350–361.
- Zhang, X.; Tang, J.; Cao, H.; Wang, C.; Shen, C.; Liu, J. Cascaded speech separation denoising and dereverberation using attention and TCN-WPE networks for speech devices. IEEE Internet Things J. 2024, 11, 18047–18058.
- Tan, K.; Wang, D. Learning complex spectral mapping with gated convolutional recurrent networks for monaural speech enhancement. IEEE ACM Trans. Audio Speech Lang. Process. 2020, 28, 380–390.
- Wang, M.; Chen, J.; Zhang, X.; Rahardja, S. End-to-end multi-modal speech recognition on an air and bone conducted speech corpus. IEEE ACM Trans. Audio Speech Lang. Process. 2023, 31, 513–524.
- Shin, H.S.; Kang, H.; Fingscheidt, T. Survey of speech enhancement supported by a bone conduction microphone. In Proceedings of the Speech Communication; 10. ITG Symposium, Braunschweig, Germany, 26–28 September 2012; pp. 1–4.
- Hansen, C.H. Fundamentals of acoustics. Am. J. Phys. 1951, 19, 254–255.
- Zhou, Y.; Wang, H.; Chu, Y.; Liu, H. A robust dual-microphone generalized sidelobe canceller using a bone-conduction sensor for speech enhancement. Sensors 2021, 21, 1878.
- Zhou, Y.; Chen, Y.; Ma, Y.; Liu, H. A real-time dual-microphone speech enhancement algorithm assisted by bone conduction sensor. Sensors 2020, 20, 5050.
- Lee, C.; Rao, B.D.; Garudadri, H. Bone-conduction sensor assisted noise estimation for improved speech enhancement. In Proceedings of the Interspeech, Hyderabad, India, 2–6 September 2018; pp. 1180–1184.
- Hussain, T.; Tsao, Y.; Siniscalchi, S.M.; Wang, J.; Wang, H.; Liao, W. Bone-conducted speech enhancement using hierarchical extreme learning machine. In Increasing Naturalness and Flexibility in Spoken Dialogue Interaction; Lecture Notes in Electrical Engineering; Springer: Singapore, 2021; Volume 714, pp. 153–162.
- Rahman, M.S.; Shimamura, T. Amplitude variation of bone-conducted speech compared with air-conducted speech. Acoust. Sci. Technol. 2019, 40, 293–301.
- Nishimura, T.; Miyamae, R.; Hosoi, H.; Saito, O.; Shimokura, R.; Yamanaka, T.; Kitahara, T. Frequency characteristics and speech recognition in cartilage conduction. Auris Nasus Larynx 2019, 46, 709–715.
- Toya, T.; Birkholz, P.; Unoki, M. Measurements of transmission characteristics related to bone-conducted speech using excitation signals in the oral cavity. J. Speech Lang. Hear. Res. 2020, 63, 4252–4264.
- Ishikawa, H.; Otsuka, S.; Nakagawa, S. Threshold and frequency- and temporal resolutions of distantly presented bone-conducted sound in the audible-frequency range. Jpn. J. Appl. Phys. 2022, 61, 1065.
- Zhu, M.; Ji, H.; Luo, F.; Chen, W. A robust speech enhancement scheme on the basis of bone-conductive microphones. In Proceedings of the 2007 3rd International Workshop on Signal Design and Its Applications in Communications, Chengdu, China, 23–27 September 2007; pp. 353–355.
- Rahman, M.S.; Shimamura, T. Pitch characteristics of bone conducted speech. In Proceedings of the 2010 18th European Signal Processing Conference, Aalborg, Denmark, 23–27 August 2010; pp. 795–799.
- Shimamura, T.; Tamiya, T. A reconstruction filter for bone-conducted speech. In Proceedings of the 48th Midwest Symposium on Circuits and Systems, 2005, Covington, KY, USA, 7–10 August 2005; pp. 1847–1850.
- Bouserhal, R.E.; Falk, T.H.; Voix, J. In-ear microphone speech quality enhancement via adaptive filtering and artificial bandwidth extension. J. Acoust. Soc. Am. 2017, 141, 1321–1331.
- Trung, P.N.; Unoki, M.; Akagi, M. A study on restoration of bone-conducted speech in noisy environments with LP-based model and Gaussian mixture model. J. Signal Process. 2012, 16, 409–417.
- Huang, B.; Gong, Y.; Sun, J.; Shen, Y. A wearable bone-conducted speech enhancement system for strong background noises. In Proceedings of the 2017 18th International Conference on Electronic Packaging Technology (ICEPT), Harbin, China, 16–19 August 2017; pp. 1682–1684.
- Singh, P.; Mukul, M.K.; Prasad, R. Bone conducted speech signal enhancement using LPC and MFCC. In Proceedings of the Intelligent Human Computer Interaction, IHCI 2018, Allahabad, India, 7–9 December 2018; pp. 148–158.
- Zheng, C.; Yang, J.; Zhang, X.; Cao, T.; Sun, M. Bandwidth extension WaveNet for bone-conducted speech enhancement. In Proceedings of the 7th Conference on Sound and Music Technology (CSMT); Lecture Notes in Electrical Engineering. Springer: Singapore, 2020; pp. 3–14.
- Zheng, C.; Cao, T.; Yang, J.; Zhang, X.; Sun, M. Spectra restoration of bone-conducted speech via attention-based contextual information and spectrotemporal structure constraint. Trans. Fund. Electron. Commun. Comput. Sci. 2019, E102.A, 2001–2007.
- Nguyen, H.Q.; Unoki, M. Improvement in bone conducted speech restoration using linear prediction and long short-term memory model. J. Signal Process. 2020, 24, 175–178.
- Tsuge, S.; Koizumi, D.; Fukumi, M.; Kuroiwa, S. Speaker verification method using bone-conduction and air-conduction speech. In Proceedings of the 2009 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Kanazawa, Japan, 7–9 January 2009; pp. 449–452.
- Wang, H.; Zhang, X.; Wang, D. Fusing bone-conduction and air-conduction sensors for complex-domain speech enhancement. IEEE ACM Trans. Audio Speech Lang. Process. 2022, 30, 3134–3143.
- Wang, H.; Zhang, X.; Wang, D. Attention-based fusion for bone-conducted and air-conducted speech enhancement in the complex domain. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 23–27 May 2022; pp. 7757–7761.
- Wang, M.; Chen, J.; Zhang, X.; Huang, Z.; Rahardja, S. Multi-modal speech enhancement with bone-conducted speech in time domain. Appl. Acoust. 2022, 200, 109058.
- Yu, C.; Hung, K.; Wang, S.; Tsao, Y.; Hung, J. Time-domain multi-modal bone/air conducted speech enhancement. IEEE Signal Process. Lett. 2020, 27, 1035–1039.
- You, B.C.; Lo, S.C.; Chan, C.K.; Li, C.S.; Ho, H.L.; Chiu, S.C.; Hsieh, G.H.; Fang, W. Design and implementation of dual pressure variation chambers for bone conduction microphone. J. Micromech. Microeng. 2020, 30, 125009.
- Dorize, C.; Guerrier, S.; Awwad, E.; Renaudier, J. Capturing acoustic speech signals with coherent MIMO phase-OTDR. In Proceedings of the 2020 European Conference on Optical Communications (ECOC), Brussels, Belgium, 6–10 December 2020; pp. 1–4.
- Gritsenko, T.V.; Orlova, M.V.; Zhirnov, A.A.; Konstantinov, Y.A.; Turov, A.T.; Barkov, F.L.; Khan, R.I.; Koshelev, K.I.; Svelto, C.; Pnev, A.B. Detection and recognition of voice commands by a distributed acoustic sensor based on phase-sensitive OTDR in the smart home concept. Sensors 2024, 24, 2281.
- Jia, L.; Shi, L.; Liu, C.; Xu, J.; Gao, Y.; Sun, C.; Liu, S.; Wu, G. Piezoelectric micromachined ultrasonic transducer array-based electronic stethoscope for Internet of Medical Things. IEEE Internet Things J. 2022, 9, 9766–9774.
- Dai, Y.; Gieseke, F.; Oehmcke, S.; Wu, Y.; Barnard, K. Attentional feature fusion. In Proceedings of the 2021 IEEE Winter Conference on Applications of Computer Vision (WACV), Waikoloa, HI, USA, 3–8 January 2021; pp. 3559–3568.
- Hu, Y.; Liu, Y.; Lv, S.; Xing, M.; Zhang, S.; Fu, Y.; Wu, J.; Zhang, B.; Xie, L. DCCRN: Deep complex convolution recurrent network for phase-aware speech enhancement. In Proceedings of the Interspeech 2020, Shanghai, China, 25–29 October 2020; pp. 2472–2476.
- Liu, C.; Wang, X.; Xie, Y.; Wu, G. Bone conduction pickup based on piezoelectric micromachined ultrasonic transducers. In Proceedings of the 2023 IEEE 36th International Conference on Micro Electro Mechanical Systems (MEMS), Munich, Germany, 15–19 January 2023; pp. 949–952.
- Li, Y.; Wang, Y.; Liu, X.; Shi, Y.; Patel, S.; Shih, S. Enabling real-time on-chip audio super resolution for bone-conduction microphones. Sensors 2023, 23, 35.
- Liu, C.; Jia, L.; Shi, L.; Sun, C.; Cheam, D.D.; Wang, P.; Wu, G. Theoretical modeling of piezoelectric micromachined ultrasonic transducers with honeycomb structure. J. Microelectromech. Syst. 2022, 31, 984–993.
- Povey, D.; Ghoshal, A.; Boulianne, G.; Burget, L.; Glembek, O.; Goel, N.; Hannemann, M.; Motlicek, P.; Qian, Y.; Schwarz, P.; et al. The Kaldi speech recognition toolkit. In Proceedings of the IEEE Workshop on Automatic Speech Recognition and Understanding, Waikoloa, HI, USA, 11–15 December 2011.
- Bu, H.; Du, J.; Na, X.; Wu, B.; Zheng, H. AISHELL-1: An open-source Mandarin speech corpus and a speech recognition baseline. In Proceedings of the 2017 20th Conference of the Oriental Chapter of the International Coordinating Committee on Speech Databases and Speech I/O Systems and Assessment (O-COCOSDA), Seoul, Republic of Korea, 1–3 November 2017; pp. 1–5.
- Fendji, J.L.K.E.; Tala, D.C.M.; Yenke, B.O.; Atemkeng, M. Automatic speech recognition using limited vocabulary: A survey. Appl. Artif. Intell. 2022, 36, 2095039.
- Wang, P.; Sun, R.; Zhao, H.; Yu, K. A new word language model evaluation metric for character based languages. In Chinese Computational Linguistics and Natural Language Processing Based on Naturally Annotated Big Data; Sun, M., Zhang, M., Lin, D., Wang, H., Eds.; Springer: Berlin, Germany, 2013; pp. 315–324.
- Wang, D.; Zhang, X. THCHS-30: A free Chinese speech corpus. arXiv 2015, arXiv:1512.01882.
| Noise Environment | AC | BC |
|---|---|---|
| Quiet (∼40 dB) | 150 | 70 |
| Noisy (∼60 dB) | 20 | 60 |
| Noisy (∼68 dB) | 10 | 55 |
| Dataset | Speech Source |
|---|---|
| AW | Captured clean AC speech |
| BW | Captured BC speech |
| NW | Noise from THCHS-30 [48] |
| ANW | Noisy AC speech obtained by mixing the AW and NW datasets |
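
The dataset table states only that ANW is obtained by mixing AW and NW. As a rough illustration of one common way to do such mixing, the sketch below scales a noise clip so that the mixture reaches a chosen signal-to-noise ratio. The function names `mean_power` and `mix_at_snr`, the 5 dB target, and the synthetic stand-in signals are all assumptions made for this example and are not taken from the paper.

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a signal."""
    return sum(v * v for v in samples) / len(samples)


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech so the mixture has the requested SNR in dB.

    Hypothetical helper: the source does not specify the mixing procedure.
    """
    if not clean or not noise:
        raise ValueError("clean and noise must be non-empty")
    # Loop the noise clip (or cut it short) to match the utterance length.
    noise = [noise[i % len(noise)] for i in range(len(clean))]
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_clean == 0 or p_noise == 0:
        raise ValueError("clean and noise must both be non-silent")
    # Gain that brings the noise power to p_clean / 10^(snr_db / 10).
    gain = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + gain * n for c, n in zip(clean, noise)]


# Synthetic stand-ins: a 220 Hz tone for an AW utterance, white noise for NW.
random.seed(0)
aw = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
nw = [random.gauss(0.0, 1.0) for _ in range(4000)]
anw = mix_at_snr(aw, nw, snr_db=5.0)  # assumed target SNR
```

Because the gain is computed from the power of the length-matched noise, subtracting `aw` from `anw` recovers a residual whose power gives back the requested SNR. With real recordings, the stand-in lists would be replaced by samples loaded from the AW and NW audio files.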
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, C.; Wang, X.; Xiao, J.; Zhou, J.; Wu, G. A Piezoelectric Micromachined Ultrasonic Transducer-Based Bone Conduction Microphone System for Enhancing Speech Recognition Accuracy. Micromachines 2025, 16, 613. https://doi.org/10.3390/mi16060613