Open Access Article
Appl. Sci. 2017, 7(12), 1301

A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity

Beijing Key Laboratory of Computational Intelligence and Intelligent System, Faculty of Information Technology, Beijing University of Technology, Beijing 100124, China
Faculty of Engineering & Information Sciences, University of Wollongong, Wollongong NSW2522, Australia
Author to whom correspondence should be addressed.
Academic Editor: Vesa Valimaki
Received: 29 October 2017 / Revised: 3 December 2017 / Accepted: 12 December 2017 / Published: 14 December 2017
(This article belongs to the Special Issue Sound and Music Computing)


Rendering spatial sound scenes via audio objects has become popular in recent years because it provides more flexibility for different auditory scenarios, such as 3D movies, spatial audio communication and virtual classrooms. To facilitate high-quality, bitrate-efficient distribution of spatial audio objects, this paper proposes an encoding scheme based on intra-object sparsity (the approximate k-sparsity of the audio object itself). A statistical analysis is presented to validate the notion that an audio object is sparser in the Modified Discrete Cosine Transform (MDCT) domain than in the Short Time Fourier Transform (STFT) domain. By exploiting intra-object sparsity in the MDCT domain, multiple simultaneously occurring audio objects are compressed into a mono downmix signal with side information. To ensure a balanced perceptual quality across audio objects, a psychoacoustic-based time-frequency instants sorting algorithm and an energy-equalized Number of Preserved Time-Frequency Bins (NPTF) allocation strategy are proposed and employed in the underlying compression framework. The downmix signal can be further encoded via the Scalar Quantized Vector Huffman Coding (SQVH) technique at a desirable bitrate, and the side information is transmitted in a lossless manner. Both objective and subjective evaluations show that the proposed encoding scheme outperforms the Sparsity Analysis (SPA) approach and Spatial Audio Object Coding (SAOC) in cases where eight objects are jointly encoded.
Keywords: audio object coding; sparsity; psychoacoustic model; multi-channel audio coding
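
The idea of compressing several approximately k-sparse objects into one mono downmix plus side information can be illustrated with a minimal sketch. This is not the authors' encoder. It assumes each object is already available as one frame of transform coefficients (standing in for MDCT coefficients), keeps the same number of largest-magnitude bins for every object (a hypothetical equal allocation, not the energy-equalized NPTF strategy), and omits the psychoacoustic sorting and the SQVH quantization stage. The names `encode_frame`, `decode_frame` and `bins_per_object` are invented for this illustration.

```python
import numpy as np


def encode_frame(coeffs, bins_per_object):
    """Pack the largest-magnitude bins of each object into one downmix frame.

    coeffs: array of shape (num_objects, frame_len), one row of transform
        coefficients per object (assumed here to be MDCT coefficients).
    bins_per_object: number of bins kept per object (hypothetical equal
        allocation; num_objects * bins_per_object must fit in one frame).
    Returns (downmix, side_info); side_info[i] is the (object, bin) pair
    that downmix slot i came from.
    """
    num_objects, frame_len = coeffs.shape
    if bins_per_object < 1 or num_objects * bins_per_object > frame_len:
        raise ValueError("preserved bins must fit in one downmix frame")
    downmix = np.zeros(frame_len)
    side_info = []
    for obj in range(num_objects):
        # k-sparse approximation: indices of the k largest magnitudes,
        # listed in ascending bin order.
        order = np.argsort(np.abs(coeffs[obj]))
        keep = np.sort(order[-bins_per_object:])
        for b in keep:
            downmix[len(side_info)] = coeffs[obj, b]  # next free slot
            side_info.append((obj, int(b)))
    return downmix, side_info


def decode_frame(downmix, side_info, num_objects):
    """Route each downmix slot back to its object and bin; other bins stay 0."""
    decoded = np.zeros((num_objects, downmix.shape[0]))
    for slot, (obj, b) in enumerate(side_info):
        decoded[obj, b] = downmix[slot]
    return decoded


# Usage: four synthetic objects with heavy-tailed (roughly sparse) coefficients.
rng = np.random.default_rng(0)
objects = rng.laplace(size=(4, 64)) ** 3
mix, info = encode_frame(objects, bins_per_object=16)
restored = decode_frame(mix, info, num_objects=4)
kept_energy = np.sum(restored ** 2) / np.sum(objects ** 2)
print(f"fraction of energy preserved: {kept_energy:.3f}")
```

In this sketch the side information grows with the number of preserved bins, which is why a real scheme would need to code it compactly; the sparser the objects are in the chosen transform domain, the more energy survives for a fixed number of kept bins.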

Figure 1

This is an open access article distributed under the Creative Commons Attribution License which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited (CC BY 4.0).


MDPI and ACS Style

Jia, M.; Zhang, J.; Bao, C.; Zheng, X. A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity. Appl. Sci. 2017, 7, 1301.


Note that from the first issue of 2016, MDPI journals use article numbers instead of page numbers.

Appl. Sci. EISSN 2076-3417. Published by MDPI AG, Basel, Switzerland.