Open Access Article

A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity

Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Faculty of Engineering & Information Sciences, University of Wollongong, Wollongong NSW2522, Australia
Author to whom correspondence should be addressed.
Academic Editor: Vesa Valimaki
Appl. Sci. 2017, 7(12), 1301
Received: 29 October 2017 / Revised: 3 December 2017 / Accepted: 12 December 2017 / Published: 14 December 2017
(This article belongs to the Special Issue Sound and Music Computing)
Rendering spatial sound scenes via audio objects has become popular in recent years because it offers more flexibility across auditory scenarios such as 3D movies, spatial audio communication and virtual classrooms. To facilitate high-quality, bitrate-efficient distribution of spatial audio objects, this paper proposes an encoding scheme based on intra-object sparsity (the approximate k-sparsity of the audio object itself). A statistical analysis is presented to validate the notion that audio objects are sparser in the Modified Discrete Cosine Transform (MDCT) domain than in the Short Time Fourier Transform (STFT) domain. By exploiting intra-object sparsity in the MDCT domain, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. To ensure balanced perceptual quality across audio objects, a psychoacoustic-based time-frequency instant sorting algorithm and an energy-equalized Number of Preserved Time-Frequency Bins (NPTF) allocation strategy are proposed and employed in the underlying compression framework. The downmix signal can be further encoded with the Scalar Quantized Vector Huffman Coding (SQVH) technique at a desired bitrate, and the side information is transmitted losslessly. Both objective and subjective evaluations show that the proposed encoding scheme outperforms the Sparsity Analysis (SPA) approach and Spatial Audio Object Coding (SAOC) in cases where eight objects were jointly encoded.
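The general idea of a sparsity-based downmix can be illustrated with a minimal sketch. It is not the authors' algorithm: it omits the psychoacoustic sorting, the NPTF allocation and the SQVH coding. The function names (`sparsify`, `downmix`, `decode`), the fixed per-object budget `k` and the "largest magnitude wins" collision rule are all assumptions made for illustration. Each object is given as one frame of transform coefficients (for example MDCT coefficients).

```python
def sparsify(coeffs, k):
    """Keep the k largest-magnitude coefficients of one frame, zero the rest."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


def downmix(objects, k):
    """Merge sparsified objects into a mono frame plus per-bin owner indices.

    Assumption: if two objects preserve the same bin, the one with the
    larger magnitude wins and the other is dropped.
    """
    sparse = [sparsify(obj, k) for obj in objects]
    n_bins = len(objects[0])
    mono = [0.0] * n_bins
    owner = [-1] * n_bins  # side information; -1 marks an empty bin
    for b in range(n_bins):
        best = max(range(len(sparse)), key=lambda j: abs(sparse[j][b]))
        if sparse[best][b] != 0.0:
            mono[b] = sparse[best][b]
            owner[b] = best
    return mono, owner


def decode(mono, owner, n_objects):
    """Rebuild each object by routing every downmix bin back to its owner."""
    return [
        [m if o == j else 0.0 for m, o in zip(mono, owner)]
        for j in range(n_objects)
    ]


# Two toy objects, four bins each, two preserved bins per object.
objects = [[0.9, 0.0, 0.1, -0.7], [0.05, 0.8, -0.6, 0.1]]
mono, owner = downmix(objects, k=2)
decoded = decode(mono, owner, n_objects=2)
```

In this toy case the preserved bins of the two objects do not overlap, so each decoded object equals its sparsified version. With more objects, collisions become more frequent, which is where a perceptually motivated ordering of bins and a balanced bin budget per object would matter.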
Keywords: audio object coding; sparsity; psychoacoustic model; multi-channel audio coding

MDPI and ACS Style

Jia, M.; Zhang, J.; Bao, C.; Zheng, X. A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity. Appl. Sci. 2017, 7, 1301.

