Mathematics
  • Article
  • Open Access

31 October 2025

Enhancing the MUSE Speech Enhancement Framework with Mamba-Based Architecture and Extended Loss Functions

Department of Electrical Engineering, National Chi Nan University, No. 301, University Rd., Puli Township, Nantou County 54561, Taiwan
This article belongs to the Special Issue Advances in Artificial Intelligence, Machine Learning and Optimization

Abstract

We propose MUSE++, an advanced and lightweight speech enhancement (SE) framework that builds upon the original MUSE architecture by introducing three key improvements: a Mamba-based state space model, dynamic SNR-driven data augmentation, and an augmented multi-objective loss function. First, we replace the original multi-path enhanced Taylor (MET) transformer block with the Mamba architecture, enabling substantial reductions in model complexity and parameter count while maintaining robust enhancement capability. Second, we adopt a dynamic training strategy that varies the signal-to-noise ratios (SNRs) across diverse speech samples, promoting improved generalization to real-world acoustic scenarios. Third, we expand the model’s loss framework with additional objective measures, allowing the model to be empirically tuned towards both perceptual and objective SE metrics. Comprehensive experiments conducted on the VoiceBank-DEMAND dataset demonstrate that MUSE++ delivers consistently superior performance across standard evaluation metrics, including PESQ, CSIG, CBAK, COVL, SSNR, and STOI, while reducing the number of model parameters by over 65% compared to the baseline. These results highlight MUSE++ as a highly efficient and effective solution for speech enhancement, particularly in resource-constrained and real-time deployment scenarios.

1. Introduction

Speech enhancement (SE) is a foundational technology aimed at improving the intelligibility and perceptual quality of speech signals degraded by noise. It underpins numerous applications, including mobile telephony, hearing aids, smart assistants, remote conferencing, and automatic speech recognition [1,2,3]. Over the past decades, SE methodologies have undergone significant evolution, transitioning from statistical signal processing approaches [4,5] and classical frequency-domain techniques [2,6] to complex deep-learning models capable of real-time deployment under various acoustic conditions [7,8,9,10,11].
Early SE work mainly relied on frequency-domain statistical models, such as spectral subtraction [4], Wiener, and MMSE filters [5]. These approaches primarily focused on magnitude estimation, with noisy phase components directly used for waveform synthesis [6]. While satisfactory for stationary environments, their performance rapidly degrades in dynamic, nonstationary noise, and their neglect of phase information poses fundamental limits to intelligibility [1,2,6].
The advent of deep neural networks revolutionized SE: deep neural network (DNN), convolutional neural network (CNN), long short-term memory (LSTM), and U-Net models enabled powerful data-driven learning, facilitating adaptive mapping from noisy to clean speech spectrograms [2,12,13,14]. Despite the popularity of magnitude-only approaches, where networks estimate the clean magnitude and retain the noisy phase [1,9,12], attention has increasingly turned to enhancing phase information directly. Recent milestone works, including PHASEN [14], dual-path [12], and parallel magnitude-phase models [15,16], demonstrate that phase-aware modeling yields substantial gains in perceived quality, especially under low signal-to-noise ratio (SNR) and adverse acoustic conditions [6,17].
In parallel, there is a growing emphasis on lightweight and resource-efficient SE models—particularly for mobile, edge, and real-time scenarios where memory, computation, or power are constrained. Recent surveys [3,8,10] and benchmark challenges have identified architectural strategies, such as compact convolutional blocks, grouped operations, efficient attention, and neural architecture search, that enable SE algorithms to operate with drastically fewer parameters and floating point operations (FLOPs), often without significant performance sacrifice. Representative models include GTCRN [8], FSPEN [18], LiSenNet [19], CTSE-Net [10], and LDSTransformer [11]. These leverage encoder–decoder pruning, bottleneck compression, sub-band processing, and novel gating/attention strategies for high efficiency.
The multi-path enhanced Taylor transformer-based U-Net for speech enhancement (MUSE) framework [7], for example, integrates flexible receptive fields and multi-path attention mechanisms on a U-Net backbone, achieving strong denoising performance at minimal computational cost. Similarly, UL-UNAS [8] employs neural architecture search to optimize U-Net variants for microdevices, combining tailored activation functions and time-frequency attention. Lightweight models such as those for drone noise suppression [11], real-time speech [20], and multi-microphone post-filtering [9,10] exemplify the field’s direction.
The ongoing shift toward multi-modal and cross-domain SE is particularly noteworthy, with approaches leveraging visual cues (e.g., lip movement) [21,22], bone-conduction signals [23], and end-to-end fusion strategies [17,24] to bolster robustness under challenging conditions. Recent studies have further emphasized the capabilities of audio-driven models in areas such as motion talking head generation [25] and wearable device sensing [26], illustrating the broad applicability of acoustic cues beyond conventional SE frameworks. These advancements are frequently paired with lightweight architectures to facilitate practical deployment in wearable devices, Internet of Things (IoT) systems, and embedded platforms.
In summary, modern SE research is characterized by the following:
  • A shift from classical magnitude-centric, frequency-domain methods to sophisticated time-frequency and time-domain deep models [2,12,17].
  • Recognition of phase modeling as critical, driving complex spectral and parallel magnitude-phase innovations [6,14,15,16].
  • The rise of lightweight, low-resource models through efficient architecture, pruning, and specialized attention/gating mechanisms [7,8,9,10,11].
  • Expansion towards multi-modal SE with audio–visual fusion, bone conduction, and cross-modal attention for robust real-world deployment [21,22,23].
These advancements enable enhanced perceptual speech quality, real-time processing, and scalable implementation for increasingly diverse and challenging application scenarios.
Building on the MUSE architecture [7] as our foundational framework, we introduce MUSE++, which incorporates a series of targeted enhancements to achieve a more lightweight and computationally efficient SE model—while maintaining, and in some cases improving, overall performance. Central to these improvements is the replacement of the original multi-path enhanced Taylor (MET) transformer module with the Mamba architecture [27,28], a recent innovation that delivers comparable SE results while significantly reducing both computational demands and model size. This modification optimizes the system for deployment in resource-limited environments, yet preserves high-quality enhancement capabilities.
Recent advances in SE have frequently utilized self-attention architectures, such as transformers, given their strong sequence modeling capabilities. However, self-attention mechanisms generally incur quadratic computational complexity with respect to sequence length, which limits their practicality for real-time or embedded speech applications. To address this limitation, lightweight transformer variants such as Performer [29] and Linformer [30] have been proposed to improve efficiency in long sequence modeling. Performer achieves linear scalability by approximating attention with kernel methods, while Linformer reduces memory consumption through low-rank approximation of the attention matrix, both resulting in faster and more resource-efficient processing.
In contrast, state-space models (SSMs)—and especially Mamba—present distinct advantages. SSMs achieve true linear scaling in both computation and memory, making them ideal for processing long speech sequences efficiently. Whereas Performer and Linformer enhance scalability by modifying attention mechanisms, Mamba takes a fundamentally different approach, using a structured state-space framework to capture long-range dependencies without relying on attention.
Mamba’s selective state-space mechanism enables the model to efficiently aggregate information from both local and distant temporal contexts, providing robust modeling capabilities with significantly reduced computational cost and parameter size. This design is particularly well-suited for real-time and resource-constrained environments, where compact models and rapid inference are essential. After reviewing various alternatives, we selected Mamba for this work because it represents a good balance of efficiency, modeling power, and practical suitability for SE in low-resource scenarios.
Furthermore, recent studies [31,32] have directly investigated the use of Mamba as a substitute for transformer blocks in SE architectures. These works demonstrate that Mamba-based models can achieve performance comparable to, or better than, traditional transformer networks, while offering significant reductions in model complexity and computational demands. Our work builds on these findings and empirically validates the practical benefits of adopting the Mamba block for SE. We acknowledge, however, that our focus is on empirical performance and system integration; rigorous mathematical analysis of the expressive capacity and convergence properties of Mamba, specifically for SE, is left as an important direction for future study.
To further support generalization and practical robustness, we employ a dynamic training regime that systematically varies signal-to-noise ratios (SNRs) across diverse speech samples, ensuring the model encounters a wide range of acoustic scenarios during learning. This approach fosters stronger feature representation for both clean and noisy speech. We also introduce an expanded set of loss functions—utilizing several objective measures—to guide the training process more effectively. The combined use of these losses drives improved outcomes across different perceptual and objective SE benchmarks. Rigorous evaluations on the VoiceBank-DEMAND dataset [33,34] revealed that MUSE++ not only achieves marked reductions in model size and complexity versus the original MUSE but also consistently surpasses it on key SE performance metrics.
Although the core components employed in this study—including the Mamba sequence model, dynamic SNR training, and multi-resolution loss functions—are based on recent innovations, our main contribution lies in their thoughtful adaptation, systematic integration, and thorough experimentation within the MUSE framework. The coordinated combination and practical engineering of these elements address non-trivial challenges in building a lightweight, resource-efficient SE system suitable for real-world applications. Our study shows that this deliberate integration yields substantial gains in efficiency and robust enhancement performance, as evidenced by comprehensive evaluation and ablation studies.
The main contributions of this study are summarized as follows:
  • We propose a novel SE architecture that systematically integrates the Mamba state-space model in place of the transformer block, achieving substantial reductions in model complexity while maintaining high denoising performance, as supported by recent literature.
  • We adopt a dynamic data augmentation strategy based on varying signal-to-noise ratios, which strengthens the model’s robustness and generalization to diverse acoustic conditions.
  • We introduce a multi-objective training paradigm, integrating additional time, frequency, and consistency loss components to promote improved perceptual and objective SE outcomes.
Collectively, these innovations offer an effective solution for deep-learning-based SE, as evidenced by the results on widely recognized benchmark datasets.
The structure of this paper is as follows: Section 2 presents a concise overview of the backbone model, MUSE. Section 3 details our three proposed enhancements to MUSE. The experimental setup, comparative results, and related discussion are described in Section 4 and Section 5. Finally, Section 6 offers concluding remarks and a summary of the work.

2. Introduction to the Backbone MUSE Method

MUSE (multi-path enhanced Taylor-transformer-based U-Net for speech enhancement) [7] is a lightweight yet high-performing architecture for SE, comprising merely 0.51 M parameters. It effectively models both global and local time-frequency dependencies in noisy speech, circumventing the substantial memory and computational demands typically associated with deeper U-Net- and Transformer-based solutions. An overview of the MUSE framework is provided in Figure 1. For brevity, detailed module explanations are omitted here; interested readers are referred to [7] for further details.
Figure 1. An overview of the MUSE framework (drawn according to [15]).

2.1. Input Representation and Data Flow

Given a time-domain input signal $x$, the short-time Fourier transform (STFT) produces a magnitude spectrogram $M_x \in \mathbb{R}^{T \times F}$ and a phase spectrogram $\theta_x \in \mathbb{R}^{T \times F}$ for $T$ time frames and $F$ frequency bins:
$$(M_x, \theta_x) = \mathrm{STFT}(x).$$
To balance the dynamic range and facilitate learning, the magnitude is compressed using a power-law transform:
$$M_x^{(c)} = M_x^{\gamma},$$
with typical $\gamma = 0.3$.
The input tensor to the U-Net is then formed by concatenating the compressed magnitude and raw phase:
$$X_{\mathrm{in}}^{(c)} = [\,M_x^{(c)};\ \theta_x\,] \in \mathbb{R}^{T \times F \times 2}.$$
This tensor passes through the encoder–decoder skip-connected U-Net. At the output, enhanced magnitude $\hat{M}_y$ and phase $\hat{\theta}_y$ features are produced and then recombined for waveform synthesis by inverse STFT (ISTFT).
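To make this data flow concrete, the following is a minimal PyTorch sketch of the front end and back end described above: STFT analysis, power-law compression of the magnitude with $\gamma = 0.3$, stacking with the raw phase, and ISTFT resynthesis. The STFT settings (FFT/window length 510, hop 100) are taken from the experimental setup in Section 4; the function names and the Hann window choice are illustrative assumptions rather than the authors' exact implementation.

```python
import torch

def stft_features(x, n_fft=510, hop=100, gamma=0.3):
    """Sketch of the MUSE input pipeline: STFT -> power-law magnitude
    compression -> concatenation with the raw phase.
    x: (batch, samples) time-domain waveform."""
    window = torch.hann_window(n_fft, device=x.device)
    spec = torch.stft(x, n_fft=n_fft, hop_length=hop, win_length=n_fft,
                      window=window, return_complex=True)      # (B, F, T)
    mag, phase = spec.abs(), spec.angle()
    mag_c = mag.pow(gamma)                                      # M_x^(c) = M_x^gamma
    # Stack compressed magnitude and phase into a (B, T, F, 2) tensor.
    feats = torch.stack([mag_c, phase], dim=-1).permute(0, 2, 1, 3)
    return feats, mag, phase

def synthesize(mag_hat_c, phase_hat, n_fft=510, hop=100, gamma=0.3):
    """Recombine enhanced (compressed) magnitude and phase, then invert the STFT."""
    mag_hat = mag_hat_c.pow(1.0 / gamma)                        # undo power-law compression
    spec_hat = torch.polar(mag_hat, phase_hat)                  # complex spectrogram
    window = torch.hann_window(n_fft, device=mag_hat.device)
    return torch.istft(spec_hat, n_fft=n_fft, hop_length=hop,
                       win_length=n_fft, window=window)
```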

2.2. MET Transformer Block

The MET Transformer block mainly consists of two modules: deformable embedding and MET transformer, which are briefly explained as follows:
  • Deformable embedding module:
    Naive convolutional kernels struggle with adapting to complex spectrogram structures common in speech. MUSE employs a deformable embedding (DE) module based on depthwise separable and deformable convolutions (DSDCN):
    $$F_{\mathrm{DE}}(X) = \mathrm{DSDCN}(X;\ R_f),$$
    where $R_f$ are learnable receptive field offsets. This design enables multiscale, adaptive aggregation of features with minimal parameter count and incorporates Hardswish activation to extract richer nonlinear patterns.
  • MET transformer: The principal encoder/decoder block is the multi-path enhanced Taylor (MET) transformer, comprising three parallel branches:
    Taylor Multi-head Self-Attention (T-MSA) (Branch 1):
    This approximates the softmax nonlinearity using a Taylor expansion, reducing computational cost (a simplified code sketch is given after this list):
    $$\hat{V}_i = \frac{\sum_{j=1}^{N}\left(1 + \frac{Q_i^{\top} K_j}{D}\right) V_j}{\sum_{j=1}^{N}\left(1 + \frac{Q_i^{\top} K_j}{D}\right)},$$
    where $Q, K, V$ are the query, key, and value matrices, respectively; $i$ indexes the output position; $N$ is the sequence length; and $D$ is the feature dimension.
    Channel and Spatial Attention (CSA) (Branches 2 and 3):
    This complements attention modeling via two branches. Channel attention uses global average pooling and 1 × 1 convolution:
    $$A_c = \mathrm{Conv}_{1 \times 1}\big(\mathrm{Pool}_{avg}(X)\big).$$
    Spatial attention stacks 1 × 1 and 3 × 3 depthwise convolutions with GELU activation:
    $$A_s = \mathrm{GELU}\big(\mathrm{Conv}_{3 \times 3}(\mathrm{Conv}_{1 \times 1}(X))\big).$$
    The outputs are fused element-wise with T-MSA and residual connections, followed by normalization and feed-forward layers.
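Below is a minimal, single-head sketch of the first-order Taylor approximation of softmax attention used in the T-MSA branch above. It is written for clarity rather than efficiency: this naive form materializes the full $N \times N$ score matrix, whereas the actual T-MSA reorders the computation (accumulating key-value products first) to reach linear complexity; the multi-path structure, multi-head splitting, and normalization of the MET block are omitted.

```python
import torch

def taylor_attention(Q, K, V):
    """First-order Taylor approximation of softmax attention (single head).
    Q, K, V: (batch, N, D). The softmax scores are approximated by
    (1 + Q K^T / D) and renormalized, avoiding the exponential."""
    D = Q.shape[-1]
    scores = 1.0 + torch.matmul(Q, K.transpose(-2, -1)) / D   # (B, N, N)
    out = torch.matmul(scores, V)              # sum_j (1 + q_i.k_j / D) v_j
    denom = scores.sum(dim=-1, keepdim=True)   # sum_j (1 + q_i.k_j / D)
    return out / denom

if __name__ == "__main__":
    B, N, D = 2, 128, 64
    Q, K, V = (torch.randn(B, N, D) for _ in range(3))
    print(taylor_attention(Q, K, V).shape)     # torch.Size([2, 128, 64])
```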

2.3. Dense Convolution Codec

The input and output encoders adopt a dilated Dense-Net architecture based on MP-SENet, using dilation rates $d = \{1, 2, 4, 8\}$. This extends the receptive field efficiently:
$$F_{\mathrm{Codec}}(X) = \mathrm{DenseConv}(X, d).$$
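As an illustration of this codec style, the sketch below implements a dilated, densely connected convolution stack in PyTorch. Only the dilation pattern {1, 2, 4, 8} and the dense connectivity follow the description above; the channel count, kernel size, normalization, and activation are illustrative assumptions.

```python
import torch
import torch.nn as nn

class DilatedDenseBlock(nn.Module):
    """Dense stack of 2-D convolutions with dilation rates 1, 2, 4, 8 along the
    time axis; each layer receives the concatenation of all previous outputs."""
    def __init__(self, channels=16, depth=4, kernel=(3, 3)):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(depth):
            dil = 2 ** i                                   # 1, 2, 4, 8
            pad = (dil * (kernel[0] - 1) // 2, (kernel[1] - 1) // 2)
            self.layers.append(nn.Sequential(
                nn.Conv2d(channels * (i + 1), channels, kernel,
                          dilation=(dil, 1), padding=pad),
                nn.InstanceNorm2d(channels, affine=True),
                nn.PReLU(channels),
            ))

    def forward(self, x):                                  # x: (B, C, T, F)
        skip = x
        for layer in self.layers:
            out = layer(skip)
            skip = torch.cat([skip, out], dim=1)           # dense connectivity
        return out
```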

2.4. Model Training and Output

The MUSE model is trained using a composite loss function combining multiple objectives [7]:
$$\mathcal{L}_{\mathrm{MUSE}} = \gamma_1 \mathcal{L}_{\mathrm{metric}} + \gamma_2 \mathcal{L}_{\mathrm{mag}} + \gamma_3 \mathcal{L}_{\mathrm{phase}} + \gamma_4 \mathcal{L}_{\mathrm{com}},$$
where $\gamma_1$ to $\gamma_4$ are weights; and $\mathcal{L}_{\mathrm{metric}}$, $\mathcal{L}_{\mathrm{mag}}$, $\mathcal{L}_{\mathrm{phase}}$, and $\mathcal{L}_{\mathrm{com}}$ are the metric, magnitude, phase, and complex losses, respectively.
The metric loss employs a generative-adversarial-network (GAN)-based discriminator to predict perceptual evaluation of speech quality (PESQ) scores between clean and enhanced magnitude spectra, guiding the generator to achieve high perceptual quality. For details of each loss term, see [15,16].
Enhanced magnitude and phase features are recombined via the inverse STFT (ISTFT) to generate the final denoised waveform:
$$\hat{x} = \mathrm{ISTFT}(\hat{X}_m, \hat{X}_p).$$

3. Proposed Framework: MUSE++

While the original MUSE framework achieves strong performance, its broader applicability might be limited by certain architectural and evaluation constraints. Specifically, although the MET transformer backbone is more efficient than traditional self-attention mechanisms, it still imposes considerable computational overhead, restricting model compactness and complicating deployment on resource-constrained devices. Moreover, reliance on static data augmentation and a fixed loss design can hinder the model’s robustness and generalization in practical scenarios.
To overcome these challenges and further improve both efficiency and generalization, we present MUSE++, which is characterized by three core methodological innovations:
  • Replacement of the MET transformer block with a 1D Mamba state-space model, simplifying the architecture and reducing complexity while retaining the ability to model long-range dependencies.
  • Implementation of dynamic SNR-based data mixing, permitting the model to train under a broader variety of noise conditions and thus enhancing generalization to real-world acoustic environments.
  • Adoption of an expanded multi-objective loss framework, which provides richer, multidimensional supervision and facilitates more comprehensive enhancement.
The overall MUSE++ framework is depicted in Figure 2, and the following subsections describe each aspect in detail.
Figure 2. Diagram of the proposed MUSE++ framework architecture, highlighting the flow of compressed magnitude and phase features through a multi-level U-Net. Key components include dense convolutional encoders and Mamba-2 blocks for sequence modeling, along with the integration of dynamic SNR augmentation and multi-objective loss functions.

3.1. Method 1: Introducing 1D Mamba

Standard transformer architectures, including the Taylor-based transformer module in the original MUSE, scale quadratically with the input sequence length, placing a heavy burden on computational and memory resources. Mamba [27,28] is a novel neural sequence modeling architecture based on structured state-space models (SSMs), in which both inference time and memory consumption increase linearly with sequence length. This allows for practical long-term dependency modeling and efficient implementation.
A general state-space model for sequences is defined as follows:
$$h_{t+1} = A(x_t)\, h_t + B(x_t)\, x_t,$$
$$y_t = C(x_t)\, h_{t+1} + D(x_t)\, x_t,$$
where $x_t$ is the input at time $t$, $h_t$ is the hidden state, $y_t$ is the output, and $A, B, C, D$ are learned functions or matrices that can depend on the input $x_t$.
For efficient processing, Mamba updates and outputs the state as follows [27]:
$$h_t = h_{t-1} \odot \exp\!\big(\alpha_t \odot \exp(A)\big) + (\alpha_t \odot v_t)\, k_t,$$
$$y_t = h_t\, q_t + d \odot v_t,$$
where all vectors ($\alpha_t$, $v_t$, $k_t$, $q_t$) are projections of $x_t$, and $\odot$ denotes element-wise multiplication.
A further streamlined variant, Mamba-2 [28], achieves even greater efficiency via the following:
$$h_t = \gamma_t\, h_{t-1} + v_t\, k_t,$$
$$y_t = h_t\, q_t.$$
Here, $\gamma_t$ is a decay (forgetting) factor computed from the input.
Mamba-2 operates on input tensors of shape ( B , L , D ) (batch size B, sequence length L, channel dimension D). The input is projected, split into SSM and gating branches, processed as described, and then re-merged. This design supports deep sequence modeling with high efficiency and flexibility, combining global context with dynamic gating.
The operational diagram of the Mamba-2 module is shown in Figure 3.
Figure 3. Block diagram of the Mamba-2 module as used in MUSE++. The figure illustrates the main stages—input projection, state update with input-dependent parameters, dynamic gating, and output computation—demonstrating efficient capture of long-range dependencies with low computational cost.
In the presented network, we replace the MET transformer block in MUSE with Mamba-2 and use a residual connection:
$$x = \mathrm{Mamba2}(x) + x.$$
This preserves a broad receptive field and sequence modeling strength while dramatically reducing parameter count and computational burden, making MUSE++ suitable for real-time or edge applications.
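The following is a deliberately naive PyTorch sketch of the simplified Mamba-2 recurrence above ($h_t = \gamma_t h_{t-1} + v_t k_t$, $y_t = h_t q_t$), combined with dynamic gating and the residual connection. The projection layers, the sigmoid parameterization of $\gamma_t$, and the sequential Python loop are illustrative assumptions; practical implementations, such as the open-source mamba_ssm package, replace the loop with a hardware-efficient parallel scan.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyMamba2Block(nn.Module):
    """Naive reference sketch of the simplified Mamba-2 recurrence
        h_t = gamma_t * h_{t-1} + v_t k_t^T,   y_t = h_t q_t,
    followed by dynamic gating and the residual connection x = block(x) + x."""
    def __init__(self, d_model=16, d_state=16):
        super().__init__()
        self.d_state = d_state
        # Input-dependent projections: value v, key k, query q, decay gamma, gate z.
        self.proj_v = nn.Linear(d_model, d_model)
        self.proj_k = nn.Linear(d_model, d_state)
        self.proj_q = nn.Linear(d_model, d_state)
        self.proj_g = nn.Linear(d_model, 1)          # scalar decay per step
        self.proj_z = nn.Linear(d_model, d_model)    # gating branch
        self.out = nn.Linear(d_model, d_model)

    def forward(self, x):                            # x: (B, L, D)
        B, L, D = x.shape
        v, k, q = self.proj_v(x), self.proj_k(x), self.proj_q(x)
        gamma = torch.sigmoid(self.proj_g(x))        # (B, L, 1), forgetting factor in (0, 1)
        h = x.new_zeros(B, D, self.d_state)          # state h_t
        ys = []
        for t in range(L):
            # h_t = gamma_t * h_{t-1} + v_t k_t^T  (outer-product update)
            h = gamma[:, t].unsqueeze(-1) * h + v[:, t].unsqueeze(-1) * k[:, t].unsqueeze(1)
            ys.append(torch.einsum("bdn,bn->bd", h, q[:, t]))   # y_t = h_t q_t
        y = torch.stack(ys, dim=1) * F.silu(self.proj_z(x))     # dynamic gating
        return self.out(y) + x                                  # residual connection
```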
It should be noted that our adoption of the 1D Mamba block as a replacement for the transformer module in SE is primarily motivated by recent empirical studies [31,32], which have demonstrated that Mamba-based architectures can maintain or improve performance with reduced model complexity. While our current work does not include new theoretical analyses regarding convergence or representational capacity, we highlight this as a valuable direction for future investigation.
To better highlight the reduction in parameter count achieved by replacing the MET transformer block in the original MUSE architecture with the Mamba-2 block, we directly compare the actual model configurations used in our evaluation experiments (as detailed in the subsequent experimental section). Figure 4 displays the architectures and parameter counts of the MET transformer block and the Mamba-2 block at encoder level 2—the layer with the greatest number of parameters among all encoder and decoder levels—as well as their respective submodules. As shown, the MET transformer block contains approximately 23.9 k parameters, whereas the Mamba-2 block has only 10 k, marking a clear reduction. Therefore, this substitution is expected to result in a significantly lower overall parameter count for MUSE++ compared to the original MUSE.
Figure 4. Parameter count comparison for the MET transformer (MUSE) and Mamba-2 (MUSE++) blocks at encoder level 2—the layer with the largest number of parameters among all encoder and decoder levels. The numbers with underlines show the parameter counts of major components.
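For reference, per-module parameter counts such as the 23.9 k versus 10 k figures above can be obtained with the standard PyTorch idiom below; the module names in the usage comment are placeholders.

```python
import torch.nn as nn

def count_params(module: nn.Module) -> int:
    """Number of trainable parameters in a module."""
    return sum(p.numel() for p in module.parameters() if p.requires_grad)

# Hypothetical usage, e.g. comparing an encoder-level block before and after the swap:
# print(count_params(met_transformer_block), count_params(mamba2_block))
```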

3.2. Method 2: Dynamic SNR Data Augmentation

While the original MUSE relies on pre-generated noisy–clean utterance pairs with fixed SNRs, potentially limiting robustness to diverse real-world conditions, we adopt a dynamic, on-the-fly mixing strategy to enhance training diversity. For each clean utterance $x_{\mathrm{clean}}$ and a randomly sampled noise segment $n$, a target SNR value $\mathrm{SNR_{dB}}$ is uniformly drawn from the interval $[-5, 20]$ dB. The noise signal is then appropriately scaled and combined with the clean speech to synthesize the noisy input as
$$\mathrm{SNR_{dB}} \sim U(-5, 20),$$
$$\alpha = \sqrt{\frac{P_s}{P_n \times 10^{\mathrm{SNR_{dB}}/10}}},$$
$$x_{\mathrm{noisy}} = x_{\mathrm{clean}} + \alpha\, n,$$
where $U(a, b)$ denotes the continuous uniform distribution over the interval $[a, b]$, and $P_s$ and $P_n$ denote the power of the clean speech and noise waveforms, respectively. This augmentation scheme effectively transforms a finite dataset into an unbounded training resource, compelling the model to learn under a wide spectrum of noise conditions and thereby enhancing its generalization capability.
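A minimal NumPy sketch of this mixing step is given below. It assumes the noise segment has already been cropped or tiled to the utterance length; the small epsilon guard is a numerical-safety addition and not part of the formulation above.

```python
import numpy as np

def mix_at_random_snr(clean: np.ndarray, noise: np.ndarray,
                      snr_range=(-5.0, 20.0), rng=None, eps=1e-12) -> np.ndarray:
    """Scale `noise` so that the clean/noise power ratio matches a target SNR
    drawn uniformly from `snr_range` (in dB), then add it to `clean`."""
    rng = np.random.default_rng() if rng is None else rng
    snr_db = rng.uniform(*snr_range)                 # SNR_dB ~ U(-5, 20)
    p_s = np.mean(clean ** 2)                        # clean-speech power
    p_n = np.mean(noise ** 2) + eps                  # noise power
    alpha = np.sqrt(p_s / (p_n * 10.0 ** (snr_db / 10.0)))
    return clean + alpha * noise                     # x_noisy = x_clean + alpha * n
```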

3.3. Method 3: Augmented Multi-Objective Loss

While MUSE adopts several loss terms, further performance gains can be realized by broadening the loss design. Let $y$ and $\hat{y}$ denote the clean and enhanced signals, and $Y$ and $\hat{Y}$ their STFT spectrograms. We introduce three additional loss components:
  • STFT Consistency Loss ($\mathcal{L}_{\mathrm{con}}$) [15]: The STFT consistency loss measures how close the enhanced complex spectrogram $\hat{Y}$ is to being consistent, which means that $\hat{Y}$ should equal the STFT of its inverse STFT. This loss is defined as
    $$\mathcal{L}_{\mathrm{con}}(\hat{Y}) = \mathbb{E}\Big( \big\| \hat{Y} - \mathrm{STFT}\big(\mathrm{iSTFT}(\hat{Y})\big) \big\|_2^2 \Big),$$
    where $\|\cdot\|_2^2$ denotes the squared $L_2$ norm over all time-frequency bins, and $\mathbb{E}(\cdot)$ is the expectation operator over all utterances in the training set. Minimizing this loss encourages the spectrogram to correspond to a valid time-domain signal.
  • Time-domain Loss ($\mathcal{L}_{\mathrm{time}}$) [15]: This loss directly evaluates auditory waveform closeness between generated and clean utterances:
    $$\mathcal{L}_{\mathrm{time}} = \mathbb{E}\big( \| y - \hat{y} \|_1 \big).$$
  • Multi-resolution STFT Loss ($\mathcal{L}_{\mathrm{mr}}$) [18]: This loss averages the magnitude and complex STFT losses across multiple time-frequency resolutions:
    $$\mathcal{L}_{\mathrm{mr}} = \frac{1}{S} \sum_{s=1}^{S} \mathcal{L}_{\mathrm{STFT}}^{s}(y, \hat{y}),$$
    where each single-resolution loss at resolution $s$ (determined by the fast Fourier transform (FFT) size, window length, and hop size) is defined as follows:
    $$\mathcal{L}_{\mathrm{STFT}}^{s}(y, \hat{y}) = \alpha_1\, \mathbb{E}\Big( \big\| |Y_s|^{0.3} - |\hat{Y}_s|^{0.3} \big\| \Big) + \alpha_2\, \mathbb{E}\Big( \big\| |Y_s|^{0.3} e^{j\theta(Y_s)} - |\hat{Y}_s|^{0.3} e^{j\theta(\hat{Y}_s)} \big\|_2 \Big),$$
    where $Y_s$ and $\hat{Y}_s$ denote the STFTs of $y$ and $\hat{y}$, respectively, computed using the parameters for resolution $s$; and $\alpha_1$, $\alpha_2$ are hyperparameter weights.
Accordingly, the loss function formulated for optimizing the MUSE++ framework is expressed as
$$\mathcal{L}_{\mathrm{MUSE++}} = \mathcal{L}_{\mathrm{MUSE}} + \gamma_5\, \mathcal{L}_{\mathrm{con}} + \gamma_6\, \mathcal{L}_{\mathrm{time}} + \gamma_7\, \mathcal{L}_{\mathrm{mr}},$$
where $\mathcal{L}_{\mathrm{MUSE}}$ corresponds to the baseline MUSE objective defined in Equation (9). The auxiliary terms $\mathcal{L}_{\mathrm{con}}$, $\mathcal{L}_{\mathrm{time}}$, and $\mathcal{L}_{\mathrm{mr}}$, presented in Equations (21), (22), and (23), respectively, serve to enhance the performance of the model. The contributions of these auxiliary losses are modulated by the weighting coefficients $\gamma_5$, $\gamma_6$, and $\gamma_7$.
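The sketch below illustrates how these auxiliary losses and the combined objective can be implemented in PyTorch. The STFT resolutions and the weights ($\gamma_5 = 0.1$, $\gamma_6 = 0.2$, $\gamma_7 = 1$, $\alpha_1 = 0.9$, $\alpha_2 = 0.1$) follow the settings reported in Section 4; the Hann window, the use of per-bin means in place of norms, and the helper names are illustrative assumptions, and the baseline MUSE loss enters as a precomputed term.

```python
import torch

def _stft(x, n_fft, hop):
    win = torch.hann_window(n_fft, device=x.device)
    return torch.stft(x, n_fft=n_fft, hop_length=hop, win_length=n_fft,
                      window=win, return_complex=True)

def consistency_loss(Y_hat, n_fft=510, hop=100):
    """L_con: distance between Y_hat and STFT(iSTFT(Y_hat))."""
    win = torch.hann_window(n_fft, device=Y_hat.device)
    y = torch.istft(Y_hat, n_fft=n_fft, hop_length=hop, win_length=n_fft, window=win)
    return (Y_hat - _stft(y, n_fft, hop)).abs().pow(2).mean()

def time_loss(y, y_hat):
    """L_time: L1 distance between clean and enhanced waveforms."""
    return (y - y_hat).abs().mean()

def multires_stft_loss(y, y_hat, resolutions=((510, 100), (800, 200), (320, 80)),
                       a1=0.9, a2=0.1, c=0.3):
    """L_mr: compressed-magnitude plus compressed-complex STFT loss, averaged over resolutions."""
    total = 0.0
    for n_fft, hop in resolutions:
        Y, Yh = _stft(y, n_fft, hop), _stft(y_hat, n_fft, hop)
        mag = (Y.abs().pow(c) - Yh.abs().pow(c)).abs().mean()
        cplx = (Y.abs().pow(c) * torch.exp(1j * Y.angle())
                - Yh.abs().pow(c) * torch.exp(1j * Yh.angle())).abs().pow(2).mean()
        total = total + a1 * mag + a2 * cplx
    return total / len(resolutions)

def muse_pp_loss(l_muse, y, y_hat, Y_hat, g5=0.1, g6=0.2, g7=1.0):
    """L_MUSE++ = L_MUSE + g5 * L_con + g6 * L_time + g7 * L_mr."""
    return (l_muse + g5 * consistency_loss(Y_hat)
            + g6 * time_loss(y, y_hat) + g7 * multires_stft_loss(y, y_hat))
```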
These three advances—Mamba sequence modeling, dynamic data mixing, and enhanced loss design—jointly form the basis of the proposed MUSE++ framework, enabling efficient, robust, and high-quality SE.

4. Experimental Setup

To evaluate the effectiveness of the proposed MUSE++ framework, we conducted experiments on the VoiceBank-DEMAND corpus [33,34], a widely used benchmark that combines clean speech samples from VoiceBank with a diverse set of noise types from the DEMAND database. The training set comprises 11,572 utterances from 28 distinct speakers, while the test set contains 824 utterances from two unseen speakers; approximately 200 examples are reserved for validation.
During standard training, clean speech is mixed with ten categories of DEMAND noise at four fixed SNRs: 0, 5, 10, and 15 dB. By contrast, our dynamic SNR data augmentation strategy (Section 3.2) mixes clean utterances with noise at randomly selected SNRs sampled uniformly from [−5 dB, 20 dB]. This broader SNR range is partially motivated by practices in the DNS Challenge dataset [35], where including a wider SNR range during training has been shown to substantially benefit real-world SE performance. For testing, five noise types from DEMAND were used with SNRs set at 2.5, 7.5, 12.5, and 17.5 dB. Key experimental arrangements were as follows:
  • Data Preprocessing and Training Protocol: All speech waveforms were normalized to zero mean and unit variance prior to model input. The waveforms were uniformly segmented into 30,700 samples, with an FFT size of 510, a window length of 510, a hop length of 100, and a sample rate of 16 kHz. Training was performed over 100 epochs using the AdamW optimizer with an initial learning rate of 0.0005, a learning rate decay of 0.99, a weight decay of $1 \times 10^{-4}$, and a batch size of 2. Early stopping was applied if the validation loss did not improve for 10 consecutive epochs (a configuration sketch of this protocol is given after this list).
  • Model Architecture Configuration: For MUSE++, the architecture is structured as a three-level U-Net encoder–decoder with the dense channel dimension set to 16. The channel dimension of the U-Net doubles at each downsampling layer. After spectral decomposition, output feature tensors are rearranged into one-dimensional sequences for input to the Mamba-2 blocks. The Mamba-2 modules were configured with state size $d_{\mathrm{state}} = 16$, local convolution width $d_{\mathrm{conv}} = 4$, and block expansion parameter $\mathrm{expand} = 2$.
  • Loss Function Configuration: For the multi-resolution STFT loss in Equation (23), three spectrogram configurations were employed: [FFT size, window length, hop size] = [510, 510, 100], [800, 800, 200], [320, 320, 80]. The loss function weights in Equations (9) and (25) were set as $\gamma_1 = 0.05$, $\gamma_2 = 0.9$, $\gamma_3 = 0.3$, $\gamma_4 = 0.1$, $\gamma_5 = 0.1$, $\gamma_6 = 0.2$, and $\gamma_7 = 1$. The single-resolution STFT loss weights in Equation (24) were set as $\alpha_1 = 0.9$ and $\alpha_2 = 0.1$. These parameter choices are consistent with previous works, specifically MUSE [7] and MP-SENet [15,16], with MP-SENet serving as the main baseline for the development and comparison of MUSE.
  • Implementation Details: The front-end dense encoder and back-end mask and phase decoders followed the MP-SENet design, employing dilated convolutions with dilation rates of 1, 2, 4, and 8 together with dense skip connections. The magnitude mask was estimated with a learnable Sigmoid activation, with the initial β parameter set to 2.0. Our implementation was built upon the official MUSE codebase, with modifications introduced specifically for the Mamba-2 integration, dynamic SNR augmentation, and multi-objective loss functions.
  • Hardware and Efficiency Measurement: All experiments were conducted on an NVIDIA RTX 3060 GPU with 12 GB memory. Model parameter counts were calculated using PyTorch 2.8’s built-in parameter counting utilities. Inference latency was measured by processing single utterances (batch size 1) and averaging the runtime across the entire test set. Memory consumption during inference was monitored using NVIDIA’s System Management Interface (nvidia-smi).
  • Reproducibility: To ensure full reproducibility, our modified codebase, including all architectural changes and training configurations, will be made publicly available upon manuscript acceptance. The released code will include detailed setup instructions and pre-trained model checkpoints.
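As referenced in the first item above, the following is a minimal PyTorch sketch of the training protocol: AdamW with a learning rate of 5 × 10⁻⁴ and weight decay of 10⁻⁴, a batch size of 2, 100 epochs, and early stopping with a patience of 10. The model, dataset, and loss objects are placeholders, and the interpretation of the 0.99 decay as an exponential per-epoch schedule is an assumption.

```python
import torch
from torch.utils.data import DataLoader

@torch.no_grad()
def validate(model, val_set, compute_loss):
    """Average validation loss over the validation split."""
    model.eval()
    losses = [compute_loss(model(noisy), clean).item()
              for noisy, clean in DataLoader(val_set, batch_size=2)]
    return sum(losses) / max(len(losses), 1)

def train(model, train_set, val_set, compute_loss, epochs=100, patience=10):
    """Training-protocol sketch: AdamW (lr 5e-4, weight decay 1e-4),
    per-epoch learning-rate decay of 0.99, batch size 2, early stopping."""
    opt = torch.optim.AdamW(model.parameters(), lr=5e-4, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.99)
    loader = DataLoader(train_set, batch_size=2, shuffle=True)
    best, wait = float("inf"), 0
    for epoch in range(epochs):
        model.train()
        for noisy, clean in loader:
            opt.zero_grad()
            loss = compute_loss(model(noisy), clean)
            loss.backward()
            opt.step()
        sched.step()
        val = validate(model, val_set, compute_loss)
        if val < best:
            best, wait = val, 0
        else:
            wait += 1
            if wait >= patience:            # early stopping
                break
```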
To benchmark SE performance, we employed six widely adopted objective metrics, each assessing different aspects of the enhanced speech signal:
  • Perceptual Evaluation of Speech Quality (PESQ) [36]: The range is from −0.5 to 4.5, with higher values indicating better quality. It objectively predicts listeners’ mean opinion scores (MOS).
  • Short-Time Objective Intelligibility (STOI) [37]: Scores range from 0 to 1, with higher values indicating greater intelligibility.
  • Segmental Signal-to-Noise Ratio (SSNR) [38]: This quantifies the segmental SNR over an utterance, with higher values reflecting greater noise reduction.
  • Composite Overall Quality (COVL) [38]: The range is from 0 to 5, and this is an objective MOS predictor for perceived overall speech quality.
  • Composite Signal Distortion (CSIG) [38]: This is an MOS scale for signal distortion, ranging from 0 to 5, with higher values denoting less distortion.
  • Composite Background Noise (CBAK) [38]: This is an MOS for background noise intrusiveness, ranging from 0 to 5, with higher values indicating better noise suppression.
These standardized metrics enable objective comparison with the MUSE baseline and other state-of-the-art enhancement models.
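For illustration, PESQ and STOI can be computed per utterance with the third-party pesq and pystoi packages, as sketched below (the package choice and wide-band mode are tooling assumptions rather than part of the paper); the composite measures CSIG, CBAK, and COVL follow Hu and Loizou [38] and are available in separate toolkits.

```python
import numpy as np
from pesq import pesq          # third-party package: pip install pesq
from pystoi import stoi        # third-party package: pip install pystoi

def evaluate_pair(clean: np.ndarray, enhanced: np.ndarray, fs: int = 16000):
    """Wide-band PESQ and STOI for one clean/enhanced waveform pair (higher is better)."""
    return {
        "pesq": pesq(fs, clean, enhanced, "wb"),
        "stoi": stoi(clean, enhanced, fs, extended=False),
    }
```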

5. Results and Discussions

5.1. Overall Performance Evaluation

Table 1 presents the mean (average) scores for the SE evaluation metrics, comparing the original MUSE baseline (as reported and as reproduced using official code) with the proposed MUSE++ across several standard metrics. The following points elaborate on the observed results and their significance:
Table 1. The mean (average) of various SE performance scores over the entire test set for the MUSE baseline (as reported in [7] and as reproduced using official code) and for the proposed MUSE++.
  • Reproducibility of MUSE: The MUSE (reported) and MUSE (reproduced) results closely align, confirming that the official implementation is reliable. Minor differences might have arisen due to randomness, experimental environment, or minor changes in code or settings.
  • Metric-wise Improvements:
    • PESQ: MUSE++ achieves a slightly higher score than the baselines, indicating improved perceptual speech quality.
    • CSIG: The enhanced CSIG score in MUSE++ reflects reduced signal distortion and better preservation of the clean speech signal.
    • CBAK: A modest increase signifies more effective suppression of background noise, resulting in a less intrusive listening experience.
    • COVL: The overall quality (COVL) is highest for MUSE++, representing the most favorable subjective impression of the enhanced audio.
    • SSNR: MUSE++ attains a substantial increase in segmental SNR, providing a direct measure of improved noise reduction effectiveness.
    • STOI: The STOI score remains highest or comparable in MUSE++, highlighting that speech intelligibility is fully preserved even with aggressive model compression.
Table 2 reports the standard deviation (i.e., variation) for each metric. Here, MUSE++ generally exhibits lower or similar variability compared to reproduced MUSE, suggesting more stable outputs overall. For example, the standard deviation for PESQ is slightly reduced (0.6476 for MUSE++ vs. 0.6865 for MUSE), and for CSIG, it is lower as well (0.4525 vs. 0.5093). Other metrics, including COVL and STOI, show similar trends, confirming that MUSE++ not only improves effectiveness but also reliability across different test samples.
Table 2. The standard deviation of various SE performance scores over the entire test set for the MUSE baseline (as reproduced using official code) and for the proposed MUSE++.
Table 3 presents a comparison of model efficiency and computational cost between the reproduced MUSE baseline and MUSE++. Four key metrics are shown: the number of parameters (#Para. in millions), real-time factor (RTF), inference time for the test set (IFT, in seconds), and throughput (THP, in utterances per second). The results highlight clear advantages for MUSE++. It utilizes only 0.17 million parameters, a significant reduction from MUSE’s 0.51 million—about three times fewer parameters—making MUSE++ much more lightweight and efficient. The real-time factor (RTF) is also lower for MUSE++ (0.0116 vs. 0.1038), meaning it processes audio nearly nine times faster relative to input duration. Inference time (IFT) for the entire test set is drastically reduced from 218 s (MUSE) to 27 s (MUSE++), indicating a major speed improvement for large-scale or real-time applications. Finally, throughput (THP) rises dramatically, with MUSE++ processing 85.98 utterances per second compared to only 9.64 for the baseline.
Table 3. The number of parameters (#Para. in M; fewer is better), the real-time factor (RTF, the ratio of the processing time to the original audio duration; lower is better), the inference time for the whole test set (IFT in second; lower is better), and the throughput (THP, amount processed per second; higher is better) of the MUSE baseline (as reproduced using official code) and for the proposed MUSE++.
Accordingly, MUSE++ achieves substantial gains in model compactness and computational efficiency, with faster inference and greater processing throughput. These improvements make MUSE++ highly practical for deployment in real-time or resource-constrained environments while retaining the strong performance established in the previous tables.
In conclusion, the advances embodied in MUSE++ result from both architectural innovation and improved training strategies. Our proposed method achieves substantial gains not only in model efficiency and parameter reduction but also in SE quality, with clear improvements observed in most objective metrics. This demonstrates that MUSE++ delivers a solution that is both highly efficient and broadly more effective than previous state-of-the-art models.

5.2. SNR-Wise Evaluation of Enhancement Results

As previously discussed, MUSE++ demonstrates superior overall average performance compared to MUSE across all test utterances, including those with varying SNR conditions. To further investigate the consistency and robustness of these improvements, Table 4 provides a detailed breakdown of performance for each standard SE metric across individual test subsets corresponding to four distinct SNR levels (2.5, 7.5, 12.5, and 17.5 dB).
Table 4. The mean (average) of various SE performance scores for MUSE and MUSE++ across various SNRs.
From these results, it is evident that MUSE++ consistently outperforms or matches the baseline MUSE across almost all metrics and SNR categories. Importantly, the advantages of MUSE++ are not restricted to only high or low SNR settings; rather, they persist through the full range of noisy environments evaluated. This underscores the strong generalization and adaptability of the proposed architecture, indicating its reliability and effectiveness for both adverse and favorable acoustic scenarios.

5.3. Ablation Study of MUSE++

Table 5 provides a comprehensive comparison of SE performance among the reproduced MUSE baseline, MUSE++ (combining all three improvements), and variants where each technique is applied individually. The main observations are as follows:
Table 5. SE performance comparison among the reproduced MUSE baseline, the proposed MUSE++ (integrating Mamba-2, dynamic SNR, and augmented loss), and ablation variants with single improvements.
  • MUSE++: Best overall performance and efficiency. By integrating Mamba, dynamic SNR training, and augmented loss, MUSE++ delivers consistently strong results across all evaluation metrics while reducing the parameter count to just 0.17 M. This demonstrates that these techniques, when combined, can achieve state-of-the-art enhancement with minimal model complexity.
  • Mamba-2 only: Maximum compression, moderate performance. The use of Mamba-2 alone shrinks the parameter count from 0.51 M to 0.17 M, offering significant efficiency gains. However, the individual performance on PESQ, CSIG, CBAK, and COVL is somewhat reduced, indicating that architectural improvements should ideally be paired with other techniques for optimal results.
  • Dynamic SNR only: Excellent subjective and intelligibility gains. Employing only dynamic SNR during training leads to the best PESQ, CSIG, CBAK, COVL, and STOI values. This highlights the effectiveness of varied SNR data in training for boosting the model’s robustness and perceptual quality. The trade-off, however, is that its parameter count remains high ( 0.51 M).
  • Augmented loss only: Outstanding noise suppression. Using only the augmented loss yields the highest SSNR ( 10.33 ), together with strong results in COVL, CSIG, CBAK, and STOI, showing that a composite loss design can greatly benefit overall enhancement quality and noise reduction.
  • Synergy in MUSE++: The integrated approach in MUSE++ leverages complementary strengths of all three enhancements, maintaining robust or superior scores in every metric and offering the lowest model size. While single techniques each provide specific benefits, their combination achieves optimal balance between performance, efficiency, and practical applicability.
In summary, individual enhancements each contribute significantly to model performance, but MUSE++ distinguishes itself by providing the best trade-off between enhancement quality, computational efficiency, and suitability for deployment.

5.4. Qualitative Evaluation Using Spectrograms

Beyond objective SE scores, we also employed (magnitude) spectrogram analysis to qualitatively compare MUSE and MUSE++. Figure 5 shows speech signals subjected to moderate (12.5 dB SNR) and severe (2.5 dB SNR) cafe noise, displayed as clean, noisy, and enhanced spectrograms. This visual comparison provides intuitive evidence of each model’s ability to suppress noise and restore speech structure, complementing the quantitative metrics for a more comprehensive evaluation.
Figure 5. Spectrogram comparisons of speech signals corrupted by cafe noise at (top) 12.5 dB and (bottom) 2.5 dB SNR. Each panel displays (a) clean speech, (b) noisy mixture, (c) enhancement by the baseline MUSE model, and (d) enhancement by the proposed MUSE++ model.
Each panel consists of four subfigures: (a) clean reference speech, (b) noisy input, (c) output enhanced by MUSE, and (d) output enhanced by MUSE++. At 12.5 dB SNR, the noisy input exhibits substantial background artifacts and spectral blurring. MUSE suppresses much of the noise and partially restores speech harmonics, although remnants of blurring remain. By contrast, MUSE++ produces cleaner backgrounds and crisper harmonic contours, more closely matching the clean reference.
At the lower 2.5 dB SNR, where noise severely impairs the signal, both models are challenged. MUSE recovers primary speech bands but leaves notable residual noise, particularly in unvoiced regions. MUSE++ achieves even greater noise suppression, yielding a spectrogram that more faithfully reflects the temporal and spectral patterns of the clean signal—even under these adverse conditions.
Given the comparable SE metric scores for MUSE and MUSE++, the spectrograms do not always reveal stark visual differences between the two approaches. Nevertheless, MUSE++ typically demonstrates superior background noise suppression. However, this comes at a cost: occasional over-suppression leads to the unintended removal of low-energy speech components. Addressing this tendency, and improving the retention of weak speech features, remains a key direction for future model refinement.

5.5. Comparison with Some State-of-the-Art SE Methods

Table 6 presents the PESQ and STOI scores, as well as additional perceptual metrics, for the proposed MUSE++ framework and several state-of-the-art (SOTA) lightweight SE models, including the backbone model MUSE [7], TSTNN [39], DB-AIAT [40], DPT-FSNet [41], MetricGAN-OKDv2 [42], and MANNER-S-5.3GF [43]. Notably, the results for these comparative methods are primarily compiled from [7].
Table 6. The SE performance scores (rounded to the nearest two decimal places) of the presented MUSE++ variants and several SOTA lightweight SE frameworks, including TSTNN [39], DB-AIAT [40], DPT-FSNet [41], MetricGAN-OKDv2 [42], MANNER-S-5.3GF [43], and MUSE [7] (reported and reproduced).
From this table, several key observations can be drawn:
  • The proposed MUSE++ framework consistently delivers competitive or superior performance across all evaluated metrics (PESQ, CSIG, CBAK, COVL, STOI) while maintaining a markedly lower parameter count ( 0.17 M) than do all the other baseline models.
  • Specifically, MUSE++ achieves the highest scores for CSIG ( 4.66 ), CBAK ( 3.86 ), and COVL ( 4.12 ), outperforming the results of MUSE, as well as more parameter-intensive models such as DB-AIAT ( 2.81 M), DPT-FSNet ( 0.88 M), and MetricGAN-OKDv2 ( 0.82 M).
  • MUSE++ also yields highly competitive PESQ ( 3.36 ) and STOI ( 0.95 ) scores, closely matching or equaling the best-performing methods, including MUSE and DPT-FSNet, but with significantly reduced computational demands and memory usage.
  • In comparison with the backbone MUSE model ( 0.51 M), MUSE++ reduces the model size by approximately 66 % while attaining comparable or improved perceptual and intelligibility scores. This result underscores the efficiency and effectiveness of the architectural improvements integrated into MUSE++.
  • Other SOTA lightweight models such as TSTNN, DB-AIAT, DPT-FSNet, MetricGAN-OKDv2, and MANNER-S-5.3GF, despite their substantially larger parameter counts, do not consistently surpass MUSE++ in either qualitative or quantitative performance metrics.
Collectively, these results demonstrate that MUSE++ achieves an excellent trade-off between compactness and enhancement quality, rendering it particularly suitable for resource-constrained and real-time SE applications.

6. Conclusions and Future Work

In this work, we propose MUSE++: an enhanced speech enhancement framework that builds upon and advances the original MUSE architecture through extended loss functions and the integration of a Mamba-based state-space model. Our method introduces a combination of dynamic SNR-based data augmentation and a multi-objective loss formulation, which together enable robust feature extraction and improved generalization to diverse acoustic conditions. Comprehensive experiments on the VoiceBank-DEMAND dataset demonstrate that MUSE++ achieves competitive or superior enhancement performance across key metrics—including PESQ, CSIG, CBAK, COVL, and STOI—while dramatically reducing model complexity and parameter count. Compared with the original MUSE and other state-of-the-art lightweight models, MUSE++ attains an optimal balance between efficiency, compactness, and SE quality.
A primary goal of this work is to enable practical, real-world deployment. The lightweight design, reduced computational requirements, and consistently strong performance make MUSE++ particularly suitable for use on mobile devices, hearing aids, smart assistants, Internet of Things (IoT) platforms, and other resource-constrained or real-time applications. Such deployability ensures that high-quality SE is accessible in everyday scenarios, even under challenging acoustic environments.
Despite these advances, several open challenges remain. First, future studies are needed to investigate more sophisticated data augmentation techniques, particularly those that address non-stationary real-world noise and multi-microphone scenarios. Second, extending our approach to multi-modal settings—such as audio–visual or bone-conduction fusion—may further enhance robustness in extreme environments. Third, a deeper exploration of diverse loss functions and unsupervised or self-supervised learning frameworks could drive further improvements in model generalization and perceptual quality. Additionally, while our dynamic SNR augmentation strategy matches the VoiceBank-DEMAND training protocol, we did not perform ablation studies on different SNR ranges or noise categories in this work due to resource limitations. In the future, we aim to systematically analyze these factors and evaluate the proposed method on larger SE corpora and more challenging acoustic environments.
In summary, MUSE++ establishes a robust foundation for lightweight and effective SE, with a clear focus on real-world deployment. Future research will expand its applicability to broader acoustic and deployment scenarios, integrate additional modalities, and further refine the training strategies to improve both performance and practical utility.

Author Contributions

Conceptualization, T.-J.L. and J.-W.H.; methodology, T.-J.L. and J.-W.H.; software, T.-J.L.; validation, T.-J.L. and J.-W.H.; formal analysis, J.-W.H. and T.-J.L.; investigation, J.-W.H.; resources, J.-W.H.; data curation, J.-W.H. and T.-J.L.; writing—original draft preparation, J.-W.H.; writing—review and editing, J.-W.H.; visualization, J.-W.H. and T.-J.L.; supervision, J.-W.H.; project administration, J.-W.H.; funding acquisition, J.-W.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Leglaive, S.; Fraticelli, M.; ElGhazaly, H.; Borne, L.; Sadeghi, M.; Wisdom, S.; Pariente, M.; Hershey, J.R.; Pressnitzer, D.; Barker, J.P. Objective and subjective evaluation of speech enhancement methods in the UDASE task of the 7th CHiME challenge. Comput. Speech Lang. 2025, 89, 101685.
  2. Zheng, S.Y.; Xia, Z.; Zhang, Y.; Shi, D.; Zhang, Y.; Wang, D.; Wang, Y.; Xiao, X.; Weng, C. Sixty Years of Frequency-Domain Monaural Speech Enhancement. Sensors 2023, 23, 9386.
  3. Natarajan, S. Deep neural networks for speech enhancement and recognition: A systematic review. J. Speech Lang. Technol. 2025, 27, 13–30.
  4. Boll, S.F. Suppression of acoustic noise in speech using spectral subtraction. IEEE Trans. Acoust. Speech Signal Process. 1979, 27, 113–120.
  5. Ephraim, Y.; Malah, D. Speech enhancement using a minimum mean-square error log-spectral amplitude estimator. IEEE Trans. Acoust. Speech Signal Process. 1984, 33, 443–445.
  6. Paliwal, K.K.; Wojcicki, K.; Rao, B.P. The importance of phase in speech enhancement. Speech Commun. 2010, 53, 465–494.
  7. Lin, Z.; Chen, X.; Wang, J. MUSE: Flexible Voiceprint Receptive Fields and Multi-Path Fusion Enhanced Taylor Transformer for U-Net-based Speech Enhancement. arXiv 2025, arXiv:2406.04589.
  8. Rong, X.; Wang, D.; Hu, Y.; Zhu, C.; Chen, K.; Lu, J. UL-UNAS: Ultra-Lightweight U-Nets for Real-Time Speech Enhancement via Network Architecture Search. arXiv 2024, arXiv:2503.00340.
  9. Wahab, F.E.; Saleem, N.; Dhahbi, S. Compact deep neural networks for real-time speech enhancement: A review. J. Signal Process. 2024, 110, 105264.
  10. Saleem, N.; Bourouis, S.; Elmannai, H.; Algarni, A.D. CTSE-Net: Resource-efficient convolutional and TF-transformer network for speech enhancement. Knowl.-Based Syst. 2024, 290, 110597.
  11. Zhang, Z.; Zhuang, X.; Qian, Y.; Wang, M. Lightweight Dynamic Sparse Transformer for Monaural Speech Enhancement. In Proceedings of the Interspeech 2024, Kos Island, Greece, 1–5 September 2024; pp. 3816–3820.
  12. Mattursun, D.; Wu, Y.; Wang, S.; Qian, Y. Magnitude-Phase Dual-Path Speech Enhancement Network. arXiv 2025, arXiv:2503.21571.
  13. Xue, W.; Sun, M.; Xia, Y.; Wang, W. Supervised-learning-based neural Kalman filter for speech enhancement. arXiv 2020, arXiv:2007.13962.
  14. Yin, D.; Huang, J.; Wu, Y.; Zou, Y.; Xue, W.; Jin, Z.Y.; Zhang, S.; Wu, J.; Yu, D. PHASEN: A self-supervised phase-and-harmonics-aware speech enhancement network. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 9458–9465.
  15. Lu, Y.X.; Ai, Y.; Ling, Z.H. MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra. arXiv 2023, arXiv:2305.13686.
  16. Lu, Y.X.; Ai, Y.; Ling, Z.H. Explicit estimation of magnitude and phase spectra in speech enhancement. Neural Netw. 2025, 189, 107562.
  17. Lin, Z.; Chen, X.; Wang, J. Mixed T-domain and TF-domain Magnitude and Phase Methods for Robust Speech Enhancement. Sci. Rep. 2024, 14, 68708.
  18. Yang, L.; Liu, W.; Meng, R.; Lee, G.; Baek, S.; Moon, H.G. FSPEN: An Ultra-Lightweight Network for Real Time Speech Enhancement. In Proceedings of the ICASSP 2024—2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 10671–10675.
  19. Yan, Y.; Li, J.; Wang, S. LiSenNet: Lightweight Speech Enhancement Network with Sub-Band Feature Capture. arXiv 2024, arXiv:2405.13051.
  20. Dhahbi, S.; Saleem, N.; Gunawan, T.S.; Bourouis, S.; Ali, I.; Trigui, A.; Algarni, A.D. Lightweight Real-Time Recurrent Models for Speech Enhancement and Automatic Speech Recognition. Int. J. Interact. Multimed. Artif. Intell. 2024, 8, 74–85.
  21. Michelsanti, D.; Tan, Z.H.; Xu, Y.; Richter, S.R.; Ma, M.; Sørensen, J.; Jensen, J.; Gerkmann, T.; Jensen, S.; Virtanen, T.; et al. An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 1368–1396.
  22. Li, Y.; Zhang, X. Lip landmark-based audio-visual speech enhancement with multimodal feature fusion network. Neurocomputing 2023, 549, 126432.
  23. Wang, W.; Chen, L. A lightweight speech enhancement network fusing bone-conduction and air-conduction features. J. Acoust. Soc. Am. 2024, 156, 1355–1368.
  24. Wahab, F.E.; Dhahbi, S.; Bourouis, S. MA-Net: Resource-efficient multi-attentional network for end-to-end speech enhancement. Neurocomputing 2024, 557, 129150.
  25. Tang, P.; Zhao, H.; Meng, W.; Wang, Y. One-shot motion talking head generation with audio-driven model. Expert Syst. Appl. 2025, 297, 129344.
  26. Hu, J.; Jiang, H.; Liu, D.; Xiao, Z.; Zhang, Q.; Liu, J.; Dustdar, S. Combining IMU With Acoustics for Head Motion Tracking Leveraging Wireless Earphone. IEEE Trans. Mob. Comput. 2024, 23, 6835–6847.
  27. Gu, A.; Dao, T. Mamba: Linear-Time Sequence Modeling with Selective State Spaces. arXiv 2023, arXiv:2312.00752.
  28. Dao, T.; Gu, A. Transformers are SSMs: Generalized models and efficient algorithms through structured state space duality. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024; ICML’24.
  29. Choromanski, K.; Likhosherstov, V.; Dohan, D.; Song, X.; Gane, A.; Sarlos, T.; Hawkins, P.; Davis, J.; Mohiuddin, A.; Kaiser, L.; et al. Rethinking Attention with Performers. arXiv 2020, arXiv:2009.14794.
  30. Wang, S.; Li, B.Z.; Khabsa, M.; Fang, H.; Ma, H. Linformer: Self-Attention with Linear Complexity. arXiv 2020, arXiv:2006.04768.
  31. Zhang, X.; Zhang, Q.; Liu, H.; Xiao, T.; Qian, X.; Ahmed, B.; Ambikairajah, E.; Li, H.; Epps, J. Mamba in Speech: Towards an Alternative to Self-Attention. arXiv 2024, arXiv:2405.12609.
  32. Kim, S.H.; Kim, T.G.; Chun, C.J. Mamba-based Hybrid Model for Speech Enhancement. In Proceedings of the Interspeech, Rotterdam, The Netherlands, 17–21 August 2025.
  33. Valentini-Botinhao, C.; Wang, X.; Takaki, S.; Yamagishi, J. Investigating RNN-based speech enhancement methods for noise-robust Text-to-Speech. In Proceedings of the 9th ISCA Speech Synthesis Workshop (SSW 9), Sunnyvale, CA, USA, 13–15 September 2016; pp. 146–152.
  34. Thiemann, J.; Ito, N.; Vincent, E. The Diverse Environments Multi-channel Acoustic Noise Database (DEMAND): A database of multichannel environmental noise recordings. In Proceedings of the 21st International Congress on Acoustics, Montreal, QC, Canada, 2–7 June 2013; pp. 1–6.
  35. Reddy, C.K.; Gopal, V.; Cutler, R.; Beyrami, E.; Cheng, R.; Dubey, H.; Matusevych, S.; Aichner, R.; Aazami, A.; Braun, S.; et al. The INTERSPEECH 2020 Deep Noise Suppression Challenge: Datasets, Subjective Testing Framework, and Challenge Results. arXiv 2020, arXiv:2005.13981.
  36. ITU-T. Perceptual Evaluation of Speech Quality (PESQ), an Objective Method for End-to-End Speech Quality Assessment of Narrowband Telephone Networks and Speech Codecs; Technical Report P.862; International Telecommunication Union: Geneva, Switzerland, 2001.
  37. Taal, C.H.; Hendriks, R.C.; Heusdens, R.; Jensen, J. An Algorithm for Intelligibility Prediction of Time–Frequency Weighted Noisy Speech. IEEE Trans. Audio Speech Lang. Process. 2011, 19, 2125–2136.
  38. Hu, Y.; Loizou, P.C. Evaluation of Objective Quality Measures for Speech Enhancement. IEEE Trans. Audio Speech Lang. Process. 2008, 16, 229–238.
  39. Wang, K.; He, B.; Zhu, W.P. TSTNN: Two-stage Transformer Based Neural Network for Speech Enhancement in the Time Domain. In Proceedings of the ICASSP 2021—2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toronto, ON, Canada, 6–11 June 2021; pp. 7098–7102.
  40. Yu, G.; Li, A.; Zheng, C.; Guo, Y.; Wang, Y.; Wang, H. Dual-Branch Attention-In-Attention Transformer for Single-Channel Speech Enhancement. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 7847–7851.
  41. Dang, F.; Chen, H.; Zhang, P. DPT-FSNet: Dual-Path Transformer Based Full-Band and Sub-Band Fusion Network for Speech Enhancement. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, 22–27 May 2022; pp. 6857–6861.
  42. Shin, W.; Lee, B.H.; Kim, J.S.; Park, H.J.; Han, S.W. MetricGAN-OKD: Multi-Metric Optimization of MetricGAN via Online Knowledge Distillation for Speech Enhancement. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 31521–31538.
  43. Shin, W.; Park, H.J.; Kim, J.S.; Lee, B.H.; Han, S.W. Multi-View Attention Transfer for Efficient Speech Enhancement. arXiv 2022, arXiv:2208.10367.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
