Enhancing Bone Conduction Sensor Signals via Self-Supervised Acoustic Priors and Key-Value Memory
Abstract
1. Introduction
- Cross-Sensor Knowledge Transfer Framework: To the best of our knowledge, this is the first work to integrate large-scale SSL models into BC speech enhancement. By transferring high-fidelity acoustic priors from SSL embeddings, our method mitigates the hardware-imposed bandwidth limitation of BC sensors, significantly enhancing the fidelity and intelligibility of the sensor output and restoring fine-grained high-frequency content. Audio samples are publicly available at https://echoaimaomao.github.io/LeverageWav2Vec/ (accessed on 1 February 2026).
- Reference-free Retrieval via Key-Value Memory: We introduce a Key-Value Memory module to bridge the gap between BC and AC sensor signal domains. By mapping BC queries to robust acoustic priors, this mechanism retrieves high-fidelity restoration prompts at inference time from the BC input alone (see the sketch after this list). This architecture also eliminates the need to execute the large SSL models during deployment, effectively reducing computational overhead for resource-constrained devices.
- Flexible Plug-and-Play Adaptor Design: We employ a lightweight Gated Attention Projection and cross-attention mechanism to dynamically align and fuse the retrieved priors with the backbone features. This design decouples the restoration network from the knowledge source, establishing a framework where both the backbone and the pre-trained model can be flexibly replaced or upgraded. It is worth noting that the computational overhead of the large-scale pre-trained model is strictly confined to the training phase, ensuring that the inference model remains lightweight and efficient for deployment on resource-constrained devices.
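To make the retrieval mechanism concrete, below is a minimal PyTorch-style sketch of soft key addressing and value recall; the module name, memory size, and feature dimensions are illustrative assumptions rather than the exact published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class KeyValueMemory(nn.Module):
    """Illustrative key-value memory: BC-derived queries address a key memory
    and recall high-fidelity acoustic priors from a paired value memory."""

    def __init__(self, num_slots: int = 256, key_dim: int = 64, value_dim: int = 768):
        super().__init__()
        # Learnable memories: keys live in the BC feature space,
        # values store the corresponding SSL (AC) acoustic priors.
        self.keys = nn.Parameter(torch.randn(num_slots, key_dim))
        self.values = nn.Parameter(torch.randn(num_slots, value_dim))

    def forward(self, bc_query: torch.Tensor) -> torch.Tensor:
        # bc_query: (batch, frames, key_dim) features from the BC branch.
        attn = torch.einsum("btd,sd->bts", bc_query, self.keys)      # similarity to each slot
        attn = F.softmax(attn / self.keys.shape[-1] ** 0.5, dim=-1)  # soft addressing
        recalled = torch.einsum("bts,sv->btv", attn, self.values)    # weighted recall of priors
        return recalled  # (batch, frames, value_dim): retrieved restoration prompt

# At inference only the BC signal is needed; the SSL model never has to run.
memory = KeyValueMemory()
prior = memory(torch.randn(2, 100, 64))  # -> torch.Size([2, 100, 768])
```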
2. Background and Related Work
2.1. Signal Model
2.2. Bone Conduction Speech Enhancement
2.3. Self-Supervised Learning in Speech Processing
2.4. Key-Value Memory Networks for Cross-Modal Retrieval
3. Methodology
- Mainstream Module that serves as the backbone for feature encoding and waveform reconstruction;
- Embedding Extraction Module that utilizes a large-scale pre-trained SSL model to extract embeddings encapsulating high-fidelity acoustic priors;
- Dimension Adaptor Module designed to reconcile the dimensional mismatch between bottleneck features and external embeddings via Up- and Down-Projection operations (a minimal sketch follows this list);
- Key-Value Memory Module that bridges the modality gap, enabling the associative retrieval of these idealized priors using BC features as queries.
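To illustrate how the Dimension Adaptor and the cross-attention fusion fit together, the following is a minimal PyTorch-style sketch; the layer sizes, gating form, and module names are illustrative assumptions, not the exact published design.

```python
import torch
import torch.nn as nn

class DimensionAdaptor(nn.Module):
    """Illustrative adaptor: aligns backbone bottleneck features with the
    retrieved SSL priors via Down-/Up-Projection, then injects them with
    gated cross-attention."""

    def __init__(self, backbone_dim: int = 200, prior_dim: int = 768, hidden_dim: int = 64):
        super().__init__()
        self.down_proj = nn.Linear(prior_dim, hidden_dim)    # SSL prior -> compact space
        self.up_proj = nn.Linear(hidden_dim, backbone_dim)   # back to backbone width
        self.attn = nn.MultiheadAttention(backbone_dim, num_heads=8, batch_first=True)
        self.gate = nn.Sequential(nn.Linear(backbone_dim, backbone_dim), nn.Sigmoid())

    def forward(self, bottleneck: torch.Tensor, prior: torch.Tensor) -> torch.Tensor:
        # bottleneck: (batch, frames, backbone_dim) from the Mainstream encoder
        # prior:      (batch, frames, prior_dim) recalled from the key-value memory
        aligned = self.up_proj(torch.relu(self.down_proj(prior)))
        fused, _ = self.attn(query=bottleneck, key=aligned, value=aligned)
        return bottleneck + self.gate(fused) * fused  # gated residual injection

adaptor = DimensionAdaptor()
out = adaptor(torch.randn(2, 64, 200), torch.randn(2, 64, 768))  # -> (2, 64, 200)
```

Because the adaptor only assumes a bottleneck tensor and a prior tensor of fixed widths, either side (backbone or pre-trained model) can in principle be swapped without retraining the other from scratch.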

3.1. The Mainstream Module
3.2. Embedding Extraction Module
3.3. Dimension Adaptor Module
3.4. Key-Value Memory Module
3.4.1. Storing and Addressing Representative Features
3.4.2. Bridging the Two Memories
3.4.3. Recalling the Target Embeddings
3.5. The Objective Function
4. Experimental Setup
4.1. Datasets and Metrics
4.1.1. Datasets
4.1.2. Evaluation Metrics
- (1) PESQ (Wide-band) [35]: Based on the ITU-T P.862.2 standard [36], this metric evaluates perceptual speech quality on a scale from −0.5 to 4.5. It is particularly appropriate for this task because it assesses the restoration of high-frequency components (up to 7 kHz), which are typically severely attenuated in BC signals.
- (2) STOI [37]: This metric measures speech intelligibility by computing the correlation of short-time temporal envelopes between the clean reference and the enhanced signal (range: 0 to 1). It is essential for verifying that the bandwidth extension process preserves the underlying linguistic content without introducing destructive artifacts (a short computation sketch for both metrics follows this list).
- (3) Composite Metrics [38]: To approximate subjective Mean Opinion Scores (MOSs), we report CSIG (signal distortion), CBAK (background intrusiveness), and COVL (overall quality). These metrics (range: 1 to 5) provide a holistic view of restoration performance, distinguishing noise-suppression capability from the naturalness of the reconstructed speech signal.
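For reproducibility, the first two metrics can be computed with the widely used third-party pesq and pystoi Python packages; the file names below are placeholders, and the package choice is our assumption since the paper does not specify its tooling.

```python
import soundfile as sf   # assumed I/O library
from pesq import pesq    # ITU-T P.862 / P.862.2 implementation (pip install pesq)
from pystoi import stoi  # STOI implementation (pip install pystoi)

# Clean AC reference and enhanced BC output, assumed to be 16 kHz mono files.
ref, fs = sf.read("clean_ac.wav")
deg, _ = sf.read("enhanced_bc.wav")

# Wide-band PESQ (P.862.2), range roughly -0.5 to 4.5.
pesq_wb = pesq(fs, ref, deg, "wb")

# STOI, range 0 to 1, based on short-time temporal envelope correlation.
stoi_score = stoi(ref, deg, fs, extended=False)

print(f"PESQ (WB): {pesq_wb:.3f}, STOI: {stoi_score:.3f}")
```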
4.2. Training Details
4.2.1. Data Preprocessing
4.2.2. Training Configuration
4.2.3. Loss Function and Hyperparameters
5. Results and Analysis
5.1. Validation of the Proposed Framework
5.1.1. Impact of Hyperparameters
5.1.2. Effectiveness of Different SSL Configurations
- (1) Impact of Linguistic Consistency: Within the Wav2Vec 2.0 comparisons using Encoder Features, performance follows the order Mandarin-Mainland > Mandarin-Taiwan > English. This suggests that, while acoustic structures possess some universality, aligning the language of the pre-trained model with the target speech domain yields more precise guidance.
- (2) Benefit of Contextual Information: For the Mandarin-Mainland configuration, utilizing Context Features yields superior performance compared with Encoder Features. This indicates that the high-level semantic and contextual representations learned by the Transformer layers provide more robust cues for restoration than local acoustic features. Regarding HuBERT, extracting embeddings from the last Transformer layer yields better results than averaging all layers (see the extraction sketch after this list). We hypothesize that averaging across layers inevitably incorporates corrupted low-level representations, which dilutes the semantic distinctiveness of the retrieval keys. In contrast, the highest semantic layer provides abstract, phoneme-level representations that serve as stable, sensor-invariant anchors, enabling the memory network to look up high-fidelity textures accurately. Consequently, this configuration is adopted as our optimal method.
- (3) Scalability with Model Capability: Most notably, the HuBERT-based configuration achieves the best overall performance, outperforming its Wav2Vec 2.0 counterpart and demonstrating that our flexible plug-and-play adaptor design allows the restoration framework to scale effectively with stronger upstream SSL models.
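As a concrete illustration of point (2), the sketch below extracts the last-layer and layer-averaged Context Features (and the CNN Encoder Features) from a HuBERT checkpoint via the HuggingFace transformers library; the English base checkpoint and variable names are stand-ins, since the paper's Mandarin checkpoint and extraction code are not reproduced here.

```python
import torch
from transformers import Wav2Vec2FeatureExtractor, HubertModel

# Illustrative stand-in checkpoint; the paper uses a Mandarin-Mainland HuBERT model.
ckpt = "facebook/hubert-base-ls960"
extractor = Wav2Vec2FeatureExtractor.from_pretrained(ckpt)
model = HubertModel.from_pretrained(ckpt).eval()

wav = torch.randn(16000)  # 1 s of 16 kHz speech (the clean AC reference during training)
inputs = extractor(wav.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)
    # Pre-Transformer CNN output, analogous to "Encoder Feat.": (1, 512, frames)
    encoder_feat = model.feature_extractor(inputs["input_values"])

context_last = out.hidden_states[-1]                      # "Context Feat. Last": (1, frames, 768)
context_avg = torch.stack(out.hidden_states).mean(dim=0)  # "Context Feat. Avg" over all returned layers
print(context_last.shape, context_avg.shape)
```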
5.1.3. Compatibility of the Proposed Framework
5.1.4. Visualization of Memory Mechanism
5.2. Comparisons with Other Baselines
5.2.1. Baseline Methods
5.2.2. The Results of Objective Metrics
5.2.3. Performance Analysis Across Different Genders
5.2.4. Visualization of the Envelopes and Spectrograms
5.2.5. Subjective Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Dekens, T.; Verhelst, W. Body conducted speech enhancement by equalization and signal fusion. IEEE Trans. Audio Speech Lang. Process. 2013, 21, 2481–2492.
- Wang, M.; Chen, J.; Zhang, X.; Huang, Z.; Rahardja, S. Multi-modal speech enhancement with bone-conducted speech in time domain. Appl. Acoust. 2022, 200, 109058.
- Wang, H.; Zhang, X.; Wang, D. Fusing bone-conduction and air-conduction sensors for complex-domain speech enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. 2022, 30, 3134–3143.
- Kuang, K.; Yang, F.; Yang, J. A lightweight speech enhancement network fusing bone- and air-conducted speech. J. Acoust. Soc. Am. 2024, 156, 1355–1366.
- Vu, T.T.; Seide, G.; Unoki, M.; Akagi, M. Method of LP-based blind restoration for improving intelligibility of bone-conducted speech. In Proceedings of the INTERSPEECH, Antwerp, Belgium, 27–31 August 2007; pp. 966–969.
- Turan, M.T.; Erzin, E. Source and filter estimation for throat-microphone speech enhancement. IEEE/ACM Trans. Audio Speech Lang. Process. 2015, 24, 265–275.
- Trung, P.N.; Unoki, M.; Akagi, M. A study on restoration of bone-conducted speech in noisy environments with LP-based model and Gaussian mixture model. J. Signal Process. 2012, 16, 409–417.
- Liu, H.P.; Tsao, Y.; Fuh, C.S. Bone-conducted speech enhancement using deep denoising autoencoder. Speech Commun. 2018, 104, 106–112.
- Zheng, C.; Cao, T.; Yang, J.; Zhang, X.; Sun, M. Spectra restoration of bone-conducted speech via attention-based contextual information and spectro-temporal structure constraint. IEICE Trans. Fundam. Electron. Commun. Comput. Sci. 2019, 102, 2001–2007.
- Edraki, A.; Chan, W.Y.; Jensen, J.; Fogerty, D. Speaker adaptation for enhancement of bone-conducted speech. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; IEEE: New York, NY, USA, 2024; pp. 10456–10460.
- Li, C.; Yang, F.; Yang, J. A two-stage approach to quality restoration of bone-conducted speech. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 32, 818–829.
- Li, Y.; Wang, Y.; Liu, X.; Shi, Y.; Patel, S.; Shih, S.F. Enabling real-time on-chip audio super resolution for bone-conduction microphones. Sensors 2022, 23, 35.
- Cheng, L.; Dou, Y.; Zhou, J.; Wang, H.; Tao, L. Speaker-independent spectral enhancement for bone-conducted speech. Algorithms 2023, 16, 153.
- Yu, C.; Hung, K.H.; Wang, S.S.; Tsao, Y.; Hung, J.W. Time-domain multi-modal bone/air conducted speech enhancement. IEEE Signal Process. Lett. 2020, 27, 1035–1039.
- Zheng, C.; Xu, L.; Fan, X.; Yang, J.; Fan, J.; Huang, X. Dual-path transformer-based network with equalization-generation components prediction for flexible vibrational sensor speech enhancement in the time domain. J. Acoust. Soc. Am. 2022, 151, 2814–2825.
- Hauret, J.; Joubaud, T.; Zimpfer, V.; Bavu, É. Configurable EBEN: Extreme Bandwidth Extension Network to enhance body-conducted speech capture. IEEE/ACM Trans. Audio Speech Lang. Process. 2023, 31, 3499–3512.
- Li, C.; Yang, F.; Yang, J. Restoration of bone-conducted speech with U-Net-like model and energy distance loss. IEEE Signal Process. Lett. 2023, 31, 166–170.
- Baevski, A.; Zhou, Y.; Mohamed, A.; Auli, M. wav2vec 2.0: A framework for self-supervised learning of speech representations. Adv. Neural Inf. Process. Syst. 2020, 33, 12449–12460.
- Hsu, W.N.; Bolte, B.; Tsai, Y.H.H.; Lakhotia, K.; Salakhutdinov, R.; Mohamed, A. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 3451–3460.
- Chen, S.; Wang, C.; Chen, Z.; Wu, Y.; Liu, S.; Chen, Z.; Li, J.; Kanda, N.; Yoshioka, T.; Xiao, X.; et al. WavLM: Large-scale self-supervised pre-training for full stack speech processing. IEEE J. Sel. Top. Signal Process. 2022, 16, 1505–1518.
- Stenfelt, S.; Goode, R.L. Bone-conducted sound: Physiological and clinical aspects. Otol. Neurotol. 2005, 26, 1245–1261.
- Shimamura, T.; Tamiya, T. A reconstruction filter for bone-conducted speech. In Proceedings of the 48th Midwest Symposium on Circuits and Systems, Cincinnati, OH, USA, 7–10 August 2005; IEEE: New York, NY, USA, 2005; pp. 1847–1850.
- McBride, M.; Tran, P.; Letowski, T.; Patrick, R. The effect of bone conduction microphone locations on speech intelligibility and sound quality. Appl. Ergon. 2011, 42, 495–502.
- Kondo, K.; Fujita, T.; Nakagawa, K. On equalization of bone conducted speech for improved speech quality. In Proceedings of the 2006 IEEE International Symposium on Signal Processing and Information Technology, Vancouver, BC, Canada, 27–30 August 2006; IEEE: New York, NY, USA, 2006; pp. 426–431.
- Nilsson, M.; Gustafsson, H.; Andersen, S.V.; Kleijn, W.B. Gaussian mixture model based mutual information estimation between frequency bands in speech. In Proceedings of the 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing, Orlando, FL, USA, 13–17 May 2002; IEEE: New York, NY, USA, 2002; Volume 1, pp. 525–528.
- Vu, T.T.; Unoki, M.; Akagi, M. A blind restoration model for bone-conducted speech based on a linear prediction scheme. IEICE Proc. Ser. 2007, 41, 449–452.
- Mohamed, A.; Lee, H.-y.; Borgholt, L.; Havtorn, J.D.; Edin, J.; Igel, C.; Kirchhoff, K.; Li, S.W.; Livescu, K.; Maaløe, L.; et al. Self-supervised speech representation learning: A review. IEEE J. Sel. Top. Signal Process. 2022, 16, 1179–1210.
- Wang, Y.; Li, J.; Wang, H.; Qian, Y.; Wang, C.; Wu, Y. Wav2vec-Switch: Contrastive learning from original-noisy speech pairs for robust speech recognition. In Proceedings of the ICASSP 2022—2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtual, 7–13 May 2022; IEEE: New York, NY, USA, 2022; pp. 7097–7101.
- Chen, L.W.; Rudnicky, A. Exploring wav2vec 2.0 fine-tuning for improved speech emotion recognition. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
- Jayashankar, T.; Wu, J.; Sari, L.; Kant, D.; Manohar, V.; He, Q. Self-supervised representations for singing voice conversion. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
- Kim, M.; Hong, J.; Park, S.J.; Ro, Y.M. CroMM-VSR: Cross-modal memory augmented visual speech recognition. IEEE Trans. Multimed. 2021, 24, 4342–4355.
- Kim, M.; Yeo, J.H.; Ro, Y.M. Distinguishing homophenes using multi-head visual-audio memory for lip reading. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 1174–1182.
- Yeo, J.H.; Kim, M.; Ro, Y.M. Multi-temporal lip-audio memory for visual speech recognition. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; IEEE: New York, NY, USA, 2023; pp. 1–5.
- Stoller, D.; Ewert, S.; Dixon, S. Wave-U-Net: A multi-scale neural network for end-to-end audio source separation. arXiv 2018, arXiv:1806.03185.
- Rix, A.W.; Beerends, J.G.; Hollier, M.P.; Hekstra, A.P. Perceptual evaluation of speech quality (PESQ)—A new method for speech quality assessment of telephone networks and codecs. In Proceedings of the 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing, Proceedings (Cat. No. 01CH37221), Salt Lake City, UT, USA, 7–11 May 2001; IEEE: New York, NY, USA, 2001; Volume 2, pp. 749–752.
- International Telecommunication Union. Recommendation P.862.2: Wideband Extension to Recommendation P.862 for the Assessment of Wideband Telephone Networks and Speech Codecs; International Telecommunication Union: Geneva, Switzerland, 2005.
- Taal, C.H.; Hendriks, R.C.; Heusdens, R.; Jensen, J. An algorithm for intelligibility prediction of time–frequency weighted noisy speech. IEEE Trans. Audio Speech Lang. Process. 2011, 19, 2125–2136.
- Hu, Y.; Loizou, P.C. A comparative intelligibility study of single-microphone noise reduction algorithms. J. Acoust. Soc. Am. 2007, 122, 1777–1786.
- Panayotov, V.; Chen, G.; Povey, D.; Khudanpur, S. Librispeech: An ASR corpus based on public domain audio books. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, QLD, Australia, 19–24 April 2015; IEEE: New York, NY, USA, 2015; pp. 5206–5210.
- Clifton, A.; Reddy, S.; Yu, Y.; Pappu, A.; Rezapour, R.; Bonab, H.; Eskevich, M.; Jones, G.; Karlgren, J.; Carterette, B.; et al. 100,000 podcasts: A spoken English document corpus. In Proceedings of the 28th International Conference on Computational Linguistics, Virtual, 8–13 December 2020; pp. 5903–5917.
- Ardila, R.; Branson, M.; Davis, K.; Henretty, M.; Kohler, M.; Meyer, J.; Morais, R.; Saunders, L.; Tyers, F.M.; Weber, G. Common Voice: A massively-multilingual speech corpus. arXiv 2019, arXiv:1912.06670.
- Du, J.; Na, X.; Liu, X.; Bu, H. AISHELL-2: Transforming Mandarin ASR research into industrial scale. arXiv 2018, arXiv:1808.10583.
- Sui, Y.; Zhao, M.; Xia, J.; Jiang, X.; Xia, S. TRAMBA: A hybrid transformer and mamba architecture for practical audio and bone conduction speech super resolution and enhancement on mobile and wearable platforms. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2024, 8, 205.
| Block | Operation | Input Shape (1 s) | Input Shape (2 s) |
|---|---|---|---|
| Input | - | (16,384, 1) | (32,768, 1) |
| Encoder () | Conv1D () → Decimation | (64, 200) | (128, 200) |
| Fusion Layer | 8-head Cross-Attention | (64, 200) | (128, 200) |
| Decoder () | Upsample → Concat (Skip Conn.) → Conv1D () | (16,384, 25) | (32,768, 25) |
| Output | Concat (Raw Input) → Conv1D (1) | (16,384, 26) → (16,384, 1) | (32,768, 26) → (32,768, 1) |
| Model & Configuration | PESQ | STOI | CSIG | CBAK | COVL |
|---|---|---|---|---|---|
| Wav2Vec 2.0 | | | | | |
| English (Encoder Feat.) | 2.075 | 0.841 | 3.369 | 2.477 | 2.681 |
| Mandarin-Taiwan (Encoder Feat.) | 2.089 | 0.843 | 3.376 | 2.498 | 2.707 |
| Mandarin-Mainland (Encoder Feat.) | 2.096 | 0.845 | 3.387 | 2.502 | 2.715 |
| Mandarin-Mainland (Context Feat. Last) | 2.128 | 0.855 | 3.439 | 2.531 | 2.763 |
| HuBERT | | | | | |
| Mandarin-Mainland (Encoder Feat.) | 2.146 | 0.855 | 3.433 | 2.533 | 2.762 |
| Mandarin-Mainland (Context Feat. Avg) | 2.154 | 0.856 | 3.449 | 2.539 | 2.773 |
| Mandarin-Mainland (Context Feat. Last) | 2.157 | 0.857 | 3.452 | 2.545 | 2.784 |
| Configuration | PESQ | STOI | CSIG | CBAK | COVL |
|---|---|---|---|---|---|
| Proposed | 2.157 | 0.857 | 3.452 | 2.545 | 2.784 |
| - w/o SSL model | 2.013 | 0.835 | 3.285 | 2.421 | 2.624 |
| (Percentage Drop) | (−6.68%) | (−2.57%) | (−4.84%) | (−4.87%) | (−5.75%) |
| Proposed (U-Net-Like) | 2.024 | 0.846 | 3.259 | 2.047 | 2.573 |
| - w/o SSL model | 1.924 | 0.827 | 3.124 | 1.964 | 2.452 |
| (Percentage Drop) | (−4.94%) | (−2.25%) | (−4.14%) | (−4.05%) | (−4.70%) |
| Model | PESQ | STOI | CSIG | CBAK | COVL |
|---|---|---|---|---|---|
| BC Speech | 1.425 | 0.691 | 2.087 | 1.560 | 1.677 |
| FCN-BC | 1.677 | 0.660 | 2.323 | 2.081 | 2.085 |
| FCN-BC* | 1.697 | 0.671 | 2.353 | 2.112 | 2.121 |
| DPT-EGNet | 1.799 | 0.789 | 3.001 | 2.357 | 2.376 |
| DPT-EGNet* | 1.953 | 0.831 | 3.083 | 2.461 | 2.499 |
| EBEN | 1.833 | 0.793 | 3.221 | 2.475 | 2.485 |
| EBEN* | 1.874 | 0.796 | 3.336 | 2.492 | 2.516 |
| TRAMBA | 1.985 | 0.819 | 3.095 | 2.169 | 2.511 |
| U-Net-Like | 1.924 | 0.827 | 3.124 | 1.964 | 2.452 |
| Proposed | 2.157 | 0.857 | 3.452 | 2.545 | 2.784 |
| Model | PESQ | STOI | CSIG | CBAK | COVL |
|---|---|---|---|---|---|
| BC Speech | 1.024 | 0.420 | 1.017 | 1.004 | 1.003 |
| FCN-BC | 1.095 | 0.441 | 1.023 | 1.016 | 1.013 |
| FCN-BC* | 1.105 | 0.445 | 1.033 | 1.021 | 1.033 |
| DPT-EGNet | 1.635 | 0.645 | 3.171 | 2.039 | 2.413 |
| DPT-EGNet* | 1.658 | 0.674 | 3.260 | 2.051 | 2.522 |
| EBEN | 1.309 | 0.612 | 2.761 | 1.856 | 2.003 |
| EBEN* | 1.383 | 0.628 | 2.799 | 1.863 | 2.039 |
| TRAMBA | 1.329 | 0.569 | 2.965 | 1.747 | 2.104 |
| U-Net-Like | 1.618 | 0.683 | 3.212 | 2.019 | 2.399 |
| Proposed | 1.771 | 0.695 | 3.366 | 2.233 | 2.573 |
| Model | Params (M) | Model Size (MB) | MACs (G), ABCS (1 s) | MACs (G), ESMB (2 s) |
|---|---|---|---|---|
| FCN-BC | 0.01 | 0.04 | 0.14 | 0.28 |
| FCN-BC* | 3.91 | 15.64 | 3.89 | 7.78 |
| DPT-EGNet | 0.52 | 2.08 | 3.85 | 7.70 |
| DPT-EGNet* | 3.96 | 15.84 | 26.25 | 52.50 |
| EBEN | 1.98/29.70 a | 7.92/118.80 a | 1.02 | 2.04 |
| EBEN* | 5.38/33.10 a | 21.52/132.40 a | 3.10 | 6.20 |
| TRAMBA | 5.20 | 20.80 | 0.57 | 1.14 |
| U-Net-Like | 9.10 | 36.40 | 3.32 | 6.64 |
| Proposed | 3.87 | 15.48 | 2.43 | 4.74 |