# A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity


## Abstract


## 1. Introduction

S³AC [5,6,7], proposed as an alternative to the original "downmix plus spatial parameters" model, exploits the spatial direction of the virtual sound source and maps the soundfield from 360° into 60°. At the receiver, the decoded signals are obtained by inversely mapping the 60° stereo soundfield back into 360°.

## 2. Proposed Compression Framework

#### 2.1. MDCT and Active Object Detection

In the $n^{th}$ frame, an input audio object $\mathbf{s}_n = [s_n(1), s_n(2), \dots, s_n(M)]$ is transformed into the MDCT domain, denoted by $S(n, l)$, where $n$ ($1 \le n \le N$) and $l$ ($1 \le l \le L$) are the frame number and the frequency index, respectively, and $M = 1024$ is the frame length. A 2048-point MDCT with 50% overlap is applied [22]. The overlap smooths out discontinuities at block boundaries without increasing the number of transform coefficients. The MDCT of the original signal $\mathbf{s}_n$ can then be formulated as:

where the transform spans the $n^{th}$ frame and the $(n + 1)^{th}$ frame, ${\varphi}_{l}^{1}(m)=\omega (m)\cdot \mathrm{cos}\left[\frac{\pi}{M}\cdot \left(m+\frac{M+1}{2}\right)\cdot \left(l-\frac{1}{2}\right)\right]$, ${\varphi}_{l}^{2}(m)=\omega \left(m+M\right)\cdot \mathrm{cos}\left[\frac{\pi}{M}\cdot \left(m+\frac{3M+1}{2}\right)\cdot \left(l-\frac{1}{2}\right)\right]$, and $(\cdot)^{T}$ is the transpose operation. In addition, a Kaiser–Bessel-derived (KBD) window, slid along the time axis with 50% overlap between frames, is used as the window function $\omega(m)$.
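The transform can be illustrated with a small plain-Python sketch. It is not the authors' implementation: it assumes the standard MDCT formulation built from the basis functions above (one 2M-sample block covering frames n and n + 1), a KBD window with an assumed shape parameter `alpha = 4`, and an inverse transform scaled by 2/M so that windowed overlap-add cancels the time-domain aliasing. All function names are hypothetical. The paper uses M = 1024; a small M keeps the O(M²) loops fast.

```python
import math


def bessel_i0(x: float, terms: int = 60) -> float:
    """Zeroth-order modified Bessel function of the first kind (power series)."""
    total, term = 1.0, 1.0
    for k in range(1, terms):
        term *= (x / (2.0 * k)) ** 2
        total += term
    return total


def kbd_window(M: int, alpha: float = 4.0) -> list[float]:
    """Kaiser-Bessel-derived window of length 2M (alpha is an assumed value)."""
    kernel = [bessel_i0(math.pi * alpha * math.sqrt(1.0 - (2.0 * j / M - 1.0) ** 2))
              for j in range(M + 1)]
    total = sum(kernel)
    half, acc = [], 0.0
    for m in range(M):
        acc += kernel[m]
        half.append(math.sqrt(acc / total))
    return half + half[::-1]  # symmetric: w(2M - 1 - m) = w(m)


def mdct_basis(m: int, l: int, M: int) -> float:
    """cos[pi/M * (m + (M+1)/2) * (l - 1/2)] for m = 0..2M-1, l = 1..M.

    For m < M this is phi^1_l(m) without the window; for m >= M it equals
    phi^2_l(m - M), because (m - M) + (3M+1)/2 = m + (M+1)/2.
    """
    return math.cos(math.pi / M * (m + (M + 1) / 2.0) * (l - 0.5))


def mdct(block: list[float], window: list[float]) -> list[float]:
    """Map one 2M-sample block (frames n and n+1) to M coefficients S(n, l)."""
    M = len(block) // 2
    return [sum(window[m] * block[m] * mdct_basis(m, l, M) for m in range(2 * M))
            for l in range(1, M + 1)]


def imdct(coeffs: list[float], window: list[float]) -> list[float]:
    """Windowed inverse MDCT; the 2/M scale pairs with a Princen-Bradley window."""
    M = len(coeffs)
    return [window[m] * (2.0 / M) *
            sum(coeffs[l - 1] * mdct_basis(m, l, M) for l in range(1, M + 1))
            for m in range(2 * M)]


def analysis_synthesis(x: list[float], M: int) -> list[float]:
    """Hop-M MDCT analysis, then IMDCT overlap-add; returns len(x) samples."""
    w = kbd_window(M)
    padded = [0.0] * M + list(x) + [0.0] * M
    padded += [0.0] * (-len(padded) % M)  # pad to a multiple of M
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - M, M):
        y = imdct(mdct(padded[start:start + 2 * M], w), w)
        for i, v in enumerate(y):
            out[start + i] += v
    return out[M:M + len(x)]
```

Because the KBD window satisfies w(m)² + w(m + M)² = 1, the aliasing introduced by each block is cancelled by its neighbour, which is why 50% overlap does not increase the number of coefficients per frame.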

#### 2.2. Psychoacoustic-Based TF Instants Sorting

$T_{mdct}(l)$ (dB expression), where $l = 1, 2, \dots, L$. Then, an $L$-dimensional Absolute Auditory Masking Threshold (AAMT) vector $\mathbf{T} \equiv [T_{mdct}(1), T_{mdct}(2), \dots, T_{mdct}(L)]$ is generated for subsequent computation. From psychoacoustic theory, if there exists a TF bin $(n_0, l_0)$ for which the difference between $S_{dB}(n_0, l_0)$ (the dB expression of $S(n_0, l_0)$) and $T_{mdct}(l_0)$ is larger than that of the other TF bins, then $S(n_0, l_0)$ can be perceived more easily than the other TF components, but not vice versa. Specifically, any signal below this threshold curve (i.e., $S_{dB}(n_0, l_0) - T_{mdct}(l_0) < 0$) is imperceptible, because $T_{mdct}(l)$ is the lowest audible limit of the HAS. Relying on this phenomenon, the AAMT vector $\mathbf{T}$ is used to extract the perceptually dominant TF instants efficiently.
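How $T_{mdct}(l)$ is derived is not shown in this section. A common choice in perceptual coding is Terhardt's approximation of the threshold in quiet, $T(f) = 3.64(f/1000)^{-0.8} - 6.5\,e^{-0.6(f/1000 - 3.3)^2} + 10^{-3}(f/1000)^4$ dB SPL. The sketch below, offered only as an assumption, samples it at MDCT bin centres $f_l = (l - 1/2)\,f_s/(2L)$ with an assumed sampling rate of 44.1 kHz; the function names are hypothetical, and any level calibration between dB SPL and the coefficient dB scale is left out.

```python
import math


def absolute_threshold_db(f_hz: float) -> float:
    """Terhardt's approximation of the threshold in quiet, in dB SPL."""
    khz = f_hz / 1000.0
    return (3.64 * khz ** -0.8
            - 6.5 * math.exp(-0.6 * (khz - 3.3) ** 2)
            + 1e-3 * khz ** 4)


def aamt_vector(L: int, fs: float = 44100.0) -> list[float]:
    """Hypothetical AAMT vector T = [T_mdct(1), ..., T_mdct(L)].

    Assumes bin l (1-based) is centred at (l - 1/2) * fs / (2L) Hz and that
    no offset is needed between dB SPL and the MDCT dB scale.
    """
    return [absolute_threshold_db((l - 0.5) * fs / (2.0 * L)) for l in range(1, L + 1)]
```

The curve is highest at very low and very high frequencies and reaches its minimum near 3.3 kHz, so bins in that region need the least energy to be audible.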

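The $q^{th}$ ($1 \le q \le Q$) audio object is denoted by $S_q(n, l)$, and its dB expression is written as $S_{q\_dB}(n, l)$. Collecting $S_{q\_dB}(n, l)$ over all frequency indices of the $n^{th}$ frame yields the aggregated vector $\mathbf{S}_{q\_dB} \equiv [S_{q\_dB}(n, 1), S_{q\_dB}(n, 2), \dots, S_{q\_dB}(n, L)]$. Subsequently, a perceptual detection vector is designed as $\mathbf{P}_q \equiv [P_q(n,1), P_q(n,2), \dots, P_q(n,L)]$, where $P_q(n,l) = S_{q\_dB}(n,l) - T_{mdct}(l)$. Sorting the elements of $\mathbf{P}_q$ in descending order yields a new vector $[P_q(n,l_1^q), P_q(n,l_2^q), \dots, P_q(n,l_L^q)]$ with $P_q(n,l_1^q) \ge P_q(n,l_2^q) \ge \dots \ge P_q(n,l_L^q)$, where $l_i^q$ is the original frequency index of the $i^{th}$ largest element.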
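The perceptual detection and sorting step can be sketched as follows. The dB expression is assumed to be $20\log_{10}|S_q(n,l)|$, with a small hypothetical `eps` guarding against $\log(0)$; the function name is also hypothetical. The returned indices are 1-based so that they match $l_1^q, \dots, l_L^q$.

```python
import math


def perceptual_sort(S_q: list[float], T: list[float],
                    eps: float = 1e-12) -> tuple[list[float], list[int]]:
    """Return P_q and the 1-based indices l_1^q..l_L^q sorted by P_q, descending.

    Assumes S_q_dB(n, l) = 20 * log10(|S_q(n, l)|); eps avoids log10(0).
    """
    P = [20.0 * math.log10(abs(s) + eps) - t for s, t in zip(S_q, T)]
    order = sorted(range(len(P)), key=lambda i: P[i], reverse=True)
    return P, [i + 1 for i in order]
```

Ranking by $P_q$ rather than by raw magnitude means a loud coefficient in a region of low sensitivity can rank below a quieter one where the threshold is low.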

#### 2.3. NPTF Allocation Strategy

In the $n^{th}$ frame, we assume that the $q^{th}$ object is assigned $k_q$ NPTF, i.e., $k_q$ TF instants will be extracted for coding. An Individual Object Energy Retention ratio (IOER) function for the $q^{th}$ object is defined by:

Thus, each object is allocated a $k_q$ that yields approximately the same IOER. Under the criterion of minimum mean-square error, for all $q\in \{1,2,\dots ,Q\}$, $k_q$ can be obtained via the following constrained optimization:

The resulting $k_1, k_2, \dots, k_Q$ are the desired $\mathrm{NPTF}_1, \mathrm{NPTF}_2, \dots, \mathrm{NPTF}_Q$, which can be searched for by the proposed method elaborated in Algorithm 1.

Algorithm 1: NPTF allocation strategy based on the bisection method

Input: $Q$ ► number of audio objects

Input: ${\left\{{S}_{q}\left(n,l\right)\right\}}_{q=1}^{Q}$ ► MDCT coefficients of each audio object

Input: ${\left\{{l}_{i}^{q}\right\}}_{i=1}^{L}$ ► frequency indices reordered by the psychoacoustic model

Input: BPA ► lower limit used in the bisection

Input: BPB ► upper limit used in the bisection

Input: BPM ► midpoint used in the bisection

Output: $\mathit{K}$ ► desired NPTF allocation result

1. Set $\mathit{K} = \varnothing$.

2. for $q = 1$ to $Q$ do

3. for $k = 1$ to $L$ do

4. Calculate the IOER function $f_{IOER}(k, q)$ using ${\left\{{S}_{q}\left(n,l\right)\right\}}_{q=1}^{Q}$ and ${\left\{{l}_{i}^{q}\right\}}_{i=1}^{L}$ in Formula (12).

5. end for

6. end for

7. Initialize BPA = 0, BPB = 1, BPM = 0.5·(BPA + BPB), and STOP = 0.01 (chosen based on a series of informal experiments).

8. while (BPB − BPA > STOP) do

9. For each $q$, find the index $k_q$ whose IOER value corresponds to BPM (i.e., $f_{IOER}(k_q, q) \approx$ BPM).

10. if $\sum_{q=1}^{Q} k_q > L$ then

11. BPB = BPM,

12. BPM = 0.5·(BPA + BPB).

13. else

14. BPA = BPM,

15. BPM = 0.5·(BPA + BPB).

16. end if

17. end while

18. $\mathit{K}={\left\{{k}_{q}\right\}}_{q=1}^{Q}$

19. return $\mathit{K}$
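Algorithm 1 can be sketched in Python as below. Formula (12) is not reproduced in this text, so the sketch assumes that $f_{IOER}(k, q)$ is the share of the object's frame energy captured by its $k$ top-ranked bins, and reads "$f_{IOER}(k_q, q) \approx$ BPM" as "the smallest $k$ whose IOER reaches BPM". One deliberate deviation: it returns the allocation at the last feasible target BPA, which never exceeds $L$ (assuming $Q \le L$), whereas the pseudocode returns the $k_q$ of the final iteration. All names are hypothetical.

```python
def ioer_curves(S: list[list[float]], orders: list[list[int]]) -> list[list[float]]:
    """f_IOER(k, q) for k = 1..L, ASSUMED to be the cumulative energy share of
    the k perceptually highest-ranked coefficients of object q."""
    curves = []
    for S_q, order in zip(S, orders):
        total = sum(s * s for s in S_q)
        if total == 0.0:                  # silent object: a single bin suffices
            curves.append([1.0] * len(S_q))
            continue
        acc, curve = 0.0, []
        for l in order:                   # l is 1-based, like l_i^q
            acc += S_q[l - 1] ** 2
            curve.append(acc / total)
        curves.append(curve)
    return curves


def k_for_target(curve: list[float], target: float) -> int:
    """Smallest k whose IOER reaches target (our reading of f_IOER approximately BPM)."""
    return next((k for k, v in enumerate(curve, start=1) if v >= target), len(curve))


def allocate_nptf(S: list[list[float]], orders: list[list[int]],
                  stop: float = 0.01) -> list[int]:
    """Bisection on a common IOER target BPM so that sum_q k_q stays within L."""
    L = len(S[0])
    curves = ioer_curves(S, orders)
    bpa, bpb = 0.0, 1.0
    bpm = 0.5 * (bpa + bpb)
    while bpb - bpa > stop:
        K = [k_for_target(c, bpm) for c in curves]
        if sum(K) > L:
            bpb = bpm                     # too many bins: lower the IOER target
        else:
            bpa = bpm                     # budget respected: try a higher target
        bpm = 0.5 * (bpa + bpb)
    # Allocation at the last feasible target (deviation noted in the lead-in).
    return [k_for_target(c, bpa) for c in curves]
```

In the usage below, the first object concentrates almost all of its energy in one bin, so it needs only one TF instant, while the flat second object receives the remaining seven of the $L = 8$ bins.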

The first $\mathrm{NPTF}_q$ ($k_q$) elements of the sorted vector are selected to form a new vector ${\tilde{\mathit{p}}}_{q}\equiv [P_q(n,l_1^q), \dots, P_q(n,l_{\mathrm{NPTF}_q}^q)]$. Note that $l_1^q, l_2^q, \dots, l_{\mathrm{NPTF}_q}^q$ are the original frequency indices of $S_q(n,l_1^q), S_q(n,l_2^q), \dots, S_q(n,l_{\mathrm{NPTF}_q}^q)$, respectively. We group them into a vector ${\mathit{I}}_{q}\equiv [l_1^q, l_2^q, \dots, l_{\mathrm{NPTF}_q}^q]$; in the meantime, a new vector containing all extracted TF instants, ${\widehat{\mathit{S}}}_{q}\equiv [S_q(n,l_1^q), S_q(n,l_2^q), \dots, S_q(n,l_{\mathrm{NPTF}_q}^q)]$, is generated. Finally, both ${\mathit{I}}_{q}$ and ${\widehat{\mathit{S}}}_{q}$ are stored locally and sent to the Downmix Processing module.
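The extraction step, together with the rearrangement of $\widehat{\mathit{S}}_q$ back into its original positions (the $k$-sparse approximation used in the downmix stage), can be sketched as follows. The function name is hypothetical, and `order` is assumed to hold the 1-based perceptually sorted indices $l_1^q, \dots, l_L^q$.

```python
def extract_instants(S_q: list[float], order: list[int], k_q: int):
    """Keep the first k_q perceptually sorted bins of one object.

    Returns I_q (1-based original indices), S_hat_q (their MDCT values) and
    the k_q-sparse approximation S_tilde_q with every other bin set to zero.
    """
    I_q = order[:k_q]
    S_hat_q = [S_q[l - 1] for l in I_q]
    S_tilde_q = [0.0] * len(S_q)
    for l, value in zip(I_q, S_hat_q):
        S_tilde_q[l - 1] = value          # put each kept value back at its bin
    return I_q, S_hat_q, S_tilde_q
```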

#### 2.4. Downmix Processing

The $k$-sparse ($k = k_q$) approximation signal $\tilde{S}_q(n, l)$ of $S_q(n, l)$ can be obtained by placing the elements of ${\widehat{\mathit{S}}}_{q}$ back at their original positions, expressed as:

where $(\cdot)^{T}$ is the transpose operation. The resulting matrix $\mathbf{D}_n$ is a sparse matrix containing $Q \times L$ entries. By scanning $\mathbf{D}_n$ column by column and placing its nonzero entries sequentially along the frequency axis in scanning order, the mono downmix signal and the side information are obtained via Algorithm 2.

The downmix signal $\mathbf{d}_n$ can then be encoded by the SQVH technique. Meanwhile, the side information is compressed via Run Length Coding (RLC) and Golomb–Rice coding [19] at about 90 kbps.
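The text does not spell out how RLC and Golomb–Rice coding are combined, so the following is only a hypothetical layout. The binary side-information matrix $SI_n$ (a 1 marks a bin that an object occupies) is scanned column by column, the same order as Algorithm 2, and turned into zero-run lengths. Each run is then written as a Rice codeword: the quotient in unary, followed by a $k$-bit remainder, with an assumed parameter $k = 3$. All names are illustrative.

```python
def zero_runs(bits: list[int]) -> list[int]:
    """Zero-run lengths before each 1, plus the length of the trailing zero run."""
    runs, run = [], 0
    for b in bits:
        if b:
            runs.append(run)
            run = 0
        else:
            run += 1
    runs.append(run)
    return runs


def rice_encode(n: int, k: int) -> str:
    """Golomb-Rice codeword: unary quotient (ones, then a zero) + k-bit remainder."""
    q, r = n >> k, n & ((1 << k) - 1)
    return "1" * q + "0" + (format(r, f"0{k}b") if k else "")


def rice_decode_all(code: str, k: int) -> list[int]:
    """Decode a concatenation of Rice codewords back into integers."""
    out, i = [], 0
    while i < len(code):
        q = 0
        while code[i] == "1":
            q += 1
            i += 1
        i += 1                            # skip the terminating zero
        r = int(code[i:i + k], 2) if k else 0
        i += k
        out.append((q << k) | r)
    return out


def encode_side_info(SI: list[list[int]], k: int = 3) -> str:
    """Column-wise scan of the Q x L side-information matrix, then RLC + Rice."""
    Q, L = len(SI), len(SI[0])
    bits = [SI[q][l] for l in range(L) for q in range(Q)]
    return "".join(rice_encode(n, k) for n in zero_runs(bits))
```

Rice codes suit this data because short zero runs (dense regions) get short codewords, while long runs grow only linearly in their quotient.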

#### 2.5. Downmix Signal Compression by SQVH

In the $n^{th}$ frame, the downmix signal $\mathbf{d}_n$ obtained in Algorithm 2 can be expressed as:

$\mathbf{d}_n$ is divided into 51 sub-bands, each containing 20 TF instants (the last 4 instants are not considered). The sub-band power (spectral energy) is determined for each of the 51 regions and is defined as the root-mean-square (rms) value of its 20 contiguous MDCT coefficients, computed as:

The values $2^{(i/2+1)}$, where $i$ is an integer in the range $[-8, 31]$, are used as the rms quantization values. $R_{rms}(0)$, the rms value of the lowest-frequency region, is quantized with 5 bits and transmitted directly in the channel. The quantization indices of the remaining 50 regions are differentially coded against the last highest-numbered region and then Huffman coded with variable-length codes. In each sub-band, the Quantized Index (QI) value is given by:

where $q_{stepsize}$ is the quantization step size, $b$ is an offset value that depends on the category, $\lfloor \cdot \rfloor$ denotes the floor (round-down) operation, MAX is the maximum quantization index allowed for that **category**, and $l$ represents the $l^{th}$ vector in region $r$. Several **categories** are designed in SQVH coding. The **category** assigned to a region defines its quantization and coding parameters, such as the quantization step size, offset, vector dimension $v_d$, and expected total number of bits. The coding parameters for the different categories are given in Table 1.
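The region power and scalar quantization can be sketched with the Table 1 parameters. The exact rms and QI formulas are not reproduced in this text, so the sketch assumes G.722.1-style rules. The rms index is the $i \in [-8, 31]$ whose value $2^{(i/2+1)}$ is closest in the log domain. The quantized index is $\mathrm{QI} = \min(\lfloor |x| / (\hat{R}_{rms} \cdot q_{stepsize}) + b \rfloor, \mathrm{MAX})$, where $\hat{R}_{rms}$ is the quantized rms, and signs are handled separately. Function names are hypothetical.

```python
import math

# Table 1: category -> (q_stepsize, b, MAX, v_d)
CATEGORIES = {
    0: (2 ** -1.5, 0.30, 13, 2),
    1: (2 ** -1.0, 0.33, 9, 2),
    2: (2 ** -0.5, 0.36, 6, 2),
    3: (2 ** 0.0, 0.39, 4, 4),
}
REGION_SIZE, NUM_REGIONS = 20, 51


def region_rms(d: list[float]) -> list[float]:
    """rms of each 20-coefficient region; values beyond 51 * 20 are ignored."""
    return [math.sqrt(sum(x * x for x in d[r * REGION_SIZE:(r + 1) * REGION_SIZE])
                      / REGION_SIZE)
            for r in range(NUM_REGIONS)]


def quantize_rms(rms: float) -> int:
    """Index i in [-8, 31] whose value 2^(i/2 + 1) is closest in the log domain."""
    if rms <= 0.0:
        return -8
    i = round(2.0 * math.log2(rms) - 2.0)
    return max(-8, min(31, i))


def quantize_region(coeffs: list[float], rms_index: int, category: int) -> list[int]:
    """ASSUMED rule QI = min(floor(|x| / (rms_q * q_stepsize) + b), MAX)."""
    step, b, max_qi, _ = CATEGORIES[category]
    rms_q = 2.0 ** (rms_index / 2.0 + 1.0)
    return [min(math.floor(abs(x) / (rms_q * step) + b), max_qi) for x in coeffs]
```

A finer category (smaller step size) spends more bits on a region, which is how the three bitrates in Table 2 trade quality against rate.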

Algorithm 2: Downmix processing compression algorithm

Input: $Q$ ► number of audio objects

Input: $L$ ► number of frequency indices

Input: λ ► downmix signal index

Input: ${\tilde{\mathit{S}}}_{q}$ ► k-sparse approximation signal of $S_q$

Output: $SI_n$ ► side information matrix

Output: $\mathbf{d}_n$ ► downmix signal

1. Initialize λ = 1.

2. Set $SI_n = 0$, $\mathbf{d}_n = 0$.

3. for $l = 1$ to $L$ do

4. for $q = 1$ to $Q$ do

5. if ${\tilde{S}}_{q}(n,l) \ne 0$ then

6. ${\mathrm{d}}_{n}(\lambda )={\tilde{S}}_{q}(n,l)$.

7. $SI_n(q, l) = 1$.

8. Increment λ.

9. end if

10. end for

11. end for

12. return $\mathbf{d}_n$ and $SI_n$
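Algorithm 2 translates directly into the Python sketch below. The list `d_n` grows as λ is incremented, and it is zero-padded to length $L$ to mirror the "Set $\mathbf{d}_n = 0$" initialization. The `demix` function is a hypothetical illustration of how a decoder could use $SI_n$ to return each downmix value to its $(q, l)$ bin, since the decoding process is not detailed here.

```python
def downmix(S_tilde: list[list[float]]) -> tuple[list[float], list[list[int]]]:
    """Algorithm 2: scan l = 1..L (outer), q = 1..Q (inner); append nonzero bins."""
    Q, L = len(S_tilde), len(S_tilde[0])
    d_n: list[float] = []
    SI_n = [[0] * L for _ in range(Q)]
    for l in range(L):
        for q in range(Q):
            if S_tilde[q][l] != 0.0:
                d_n.append(S_tilde[q][l])  # d_n(lambda), then lambda += 1
                SI_n[q][l] = 1
    d_n += [0.0] * (L - len(d_n))          # zero fill, as in "Set d_n = 0"
    return d_n, SI_n


def demix(d_n: list[float], SI_n: list[list[int]]) -> list[list[float]]:
    """Hypothetical inverse: same scan order, consuming d_n where SI_n is 1."""
    Q, L = len(SI_n), len(SI_n[0])
    out = [[0.0] * L for _ in range(Q)]
    values = iter(d_n)
    for l in range(L):
        for q in range(Q):
            if SI_n[q][l]:
                out[q][l] = next(values)
    return out
```

Because the scan order is fixed, the side information alone identifies which object each downmix value belongs to, so no per-value object label is transmitted.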

Each vector $\mathrm{QI}_r(l)$ is identified by a unique index as follows:

where $l$ denotes the $l^{th}$ vector in region $r$ and $j$ is the index of the $j^{th}$ value of $\mathrm{QI}_r(l)$ within a given vector. All vector indices of a region are then Huffman coded with variable bit-length codes. The proposed method provides three bit-stream configurations, whose performance is evaluated in the next section.
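The vector-index formula itself is not shown here. A common construction, used in G.722.1-style SQVH, treats the $v_d$ values of a vector as digits in radix MAX + 1, so $\mathrm{index} = \sum_j \mathrm{QI}_j \,(\mathrm{MAX}+1)^{v_d-1-j}$. The sketch below assumes that construction and uses hypothetical names. A 20-coefficient region splits evenly into vectors for every $v_d$ in Table 1, so the zero-padding branch is only a safeguard.

```python
def vector_indices(qi: list[int], max_qi: int, v_d: int) -> list[int]:
    """Group QI values into v_d-dimensional vectors; map each to a unique
    integer in radix (MAX + 1). A short final vector is zero-padded."""
    radix = max_qi + 1
    padded = qi + [0] * (-len(qi) % v_d)
    indices = []
    for start in range(0, len(padded), v_d):
        index = 0
        for value in padded[start:start + v_d]:
            index = index * radix + value  # Horner form of sum_j QI_j * radix^(v_d-1-j)
        indices.append(index)
    return indices


def vector_from_index(index: int, max_qi: int, v_d: int) -> list[int]:
    """Inverse mapping, showing that each index identifies exactly one vector."""
    radix = max_qi + 1
    digits = []
    for _ in range(v_d):
        index, value = divmod(index, radix)
        digits.append(value)
    return digits[::-1]
```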

#### 2.6. Decoding Process

## 3. Performance Evaluation

#### 3.1. Test Conditions

#### 3.2. Objective Evaluations

$q_{stepsize}$, whose allocation for the three bitrates is given in Table 2.

#### 3.3. Subjective Evaluation

## 4. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## Appendix A. Sparsity Analysis of Audio Signal in the MDCT Domain

#### Appendix A.1. Measuring the Sparsity of Audio Signal

$s_n(m)$ in the $n^{th}$ frame can be defined by:

$n^{th}$ frame, and ${{\mathsf{\theta}}^{\prime}}_{n}\equiv \left[S\prime \left(n,1\right),\cdots ,S\prime \left(n,L\right)\right]$ is the sparse approximation vector of ${\mathsf{\theta}}_{n}$. Then, the Frame Energy Preservation Ratio (FEPR) is given by:

$\ell_p$-norm.
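FEPR and NPTF can be sketched as follows, assuming $p = 2$ (so FEPR is $\|\boldsymbol{\theta}'_n\|_2^2 / \|\boldsymbol{\theta}_n\|_2^2$) and assuming the sparse approximation keeps the $k$ largest-magnitude coefficients. Under these assumptions, the NPTF for a target FEPR is the smallest such $k$ that reaches the target. The function names are hypothetical.

```python
def k_sparse(theta: list[float], k: int) -> list[float]:
    """Keep the k largest-magnitude coefficients and zero the rest."""
    keep = set(sorted(range(len(theta)), key=lambda i: abs(theta[i]), reverse=True)[:k])
    return [x if i in keep else 0.0 for i, x in enumerate(theta)]


def fepr(theta: list[float], theta_sparse: list[float]) -> float:
    """Frame energy preservation ratio ||theta'||_2^2 / ||theta||_2^2 (p = 2 assumed)."""
    total = sum(x * x for x in theta)
    return sum(x * x for x in theta_sparse) / total if total else 1.0


def nptf(theta: list[float], target: float) -> int:
    """Smallest k whose k-sparse approximation preserves at least `target` energy."""
    energies = sorted((x * x for x in theta), reverse=True)
    total = sum(energies)
    if target <= 0.0 or total == 0.0:
        return 0
    acc = 0.0
    for k, e in enumerate(energies, start=1):
        acc += e
        if acc / total >= target:
            return k
    return len(energies)
```

A sparser transform reaches a given FEPR with a smaller NPTF, which is the basis of the STFT versus MDCT comparison in Appendix A.2.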

#### Appendix A.2. Statistical Analysis Results

**Figure A1.** NPTF (Number of Preserved Time-Frequency Bins) results calculated from eight types of audio signals at various FEPRs (Frame Energy Preservation Ratios).

($r_{FEPR}$ denotes the FEPR):

where $k(r_{FEPR})_{STFT}$ and $k(r_{FEPR})_{MDCT}$ are the averaged NPTFs of an audio signal in the STFT and MDCT domains, respectively, at a given FEPR, and NRDR quantifies the difference between them. A larger NRDR means that fewer NPTFs are needed in the MDCT domain. A statistical bar graph reflecting the relationship between NRDR and FEPR is then presented.

$r_{FEPR}$ = 98%~80%. It can be observed that the NRDRs of all tested audio signals are non-negative, which means that the averaged NPTF in the MDCT domain is no higher than that in the STFT domain. This result shows that the MDCT yields a sparser representation than the STFT for all 8 tested items.

$r_{FEPR}$ uniformly decreases from 98% to 88%. When 80% ≤ $r_{FEPR}$ ≤ 88%, the NRDR stays at the same level or grows slightly. That is, as the FEPR decreases, the superiority of the MDCT becomes increasingly evident.

$r_{FEPR}$ = 80%, whilst the other instruments only achieve roughly 45%~55%. In addition, the selected speech signals are less sparse than all of the instruments in the MDCT domain, but they follow the same overall trend.

**Figure A2.** NRDR (Normalized Relative Difference Ratio) of eight types of audio signals under the STFT (Short Time Fourier Transform) and MDCT (Modified Discrete Cosine Transform) at various FEPRs.

## References

- International Telecommunication Union. BS.775: Multichannel Stereophonic Sound System with and without Accompanying Picture; International Telecommunication Union: Geneva, Switzerland, 2006.
- Bosi, M.; Brandenburg, K.; Quackenbush, S.; Fielder, L.; Akagiri, K.; Fuchs, H.; Dietz, M. ISO/IEC MPEG-2 advanced audio coding. J. Audio Eng. Soc. **1997**, 45, 789–814.
- Breebaart, J.; Disch, S.; Faller, C.; Herre, J.; Hotho, G.; Kjörling, K.; Myburg, F.; Neusinger, M.; Oomen, W.; Purnhagen, H.; et al. MPEG spatial audio coding/MPEG surround: Overview and current status. In Proceedings of the Audio Engineering Society Convention 119, New York, NY, USA, 7–10 October 2005.
- Quackenbush, S.; Herre, J. MPEG surround. IEEE MultiMedia **2005**, 12, 18–23.
- Cheng, B.; Ritz, C.; Burnett, I. Principles and analysis of the squeezing approach to low bit rate spatial audio coding. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Honolulu, HI, USA, 16–20 April 2007; pp. I-13–I-16.
- Cheng, B.; Ritz, C.; Burnett, I. A spatial squeezing approach to ambisonic audio compression. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Las Vegas, NV, USA, 31 March–4 April 2008; pp. 369–372.
- Cheng, B.; Ritz, C.; Burnett, I.; Zheng, X. A general compression approach to multi-channel three-dimensional audio. IEEE Trans. Audio Speech Lang. Process. **2013**, 21, 1676–1688.
- Bleidt, R.; Borsum, A.; Fuchs, H.; Weiss, S.M. Object-based audio: Opportunities for improved listening experience and increased listener involvement. In Proceedings of the SMPTE 2014 Annual Technical Conference & Exhibition, Hollywood, CA, USA, 20–23 October 2014.
- Pulkki, V. Virtual sound source positioning using vector base amplitude panning. J. Audio Eng. Soc. **1997**, 45, 456–466.
- Dolby Laboratories. Dolby ATMOS Cinema Specifications. 2014. Available online: http://www.dolby.com/us/en/technologies/dolbyatmos/dolby-atmos-specifications.pdf (accessed on 25 October 2017).
- Breebaart, J.; Engdegard, J.; Falch, C.; Hellmuth, O.; Hilpert, J.; Holzer, A.; Koppens, J.; Oomen, W.; Resch, B.; Schuijers, E.; et al. Spatial Audio Object Coding (SAOC)—The upcoming MPEG standard on parametric object based audio coding. In Proceedings of the Audio Engineering Society Convention 124, Amsterdam, The Netherlands, 17–20 May 2008.
- Herre, J.; Purnhagen, H.; Koppens, J.; Hellmuth, O.; Engdegard, J.; Hilpert, J.; Villemoes, L.; Terentiv, L.; Falch, C.; Holzer, A.; et al. MPEG Spatial Audio Object Coding—The ISO/MPEG standard for efficient coding of interactive audio scenes. J. Audio Eng. Soc. **2012**, 60, 655–673.
- Pulkki, V. Directional audio coding in spatial sound reproduction and stereo upmixing. In Proceedings of the Audio Engineering Society Conference: 28th International Conference: The Future of Audio Technology—Surround and Beyond, Piteå, Sweden, 30 June–2 July 2006.
- Faller, C.; Pulkki, V. Directional audio coding: Filterbank and STFT-based design. In Proceedings of the Audio Engineering Society Convention 120, Paris, France, 20–23 May 2006.
- Herre, J.; Hilpert, J.; Kuntz, A.; Plogsties, J. MPEG-H 3D audio—The new standard for coding of immersive spatial audio. IEEE J. Sel. Top. Signal Process. **2015**, 9, 770–779.
- Zheng, X.; Ritz, C.; Xi, J. Encoding navigable speech sources: An analysis by synthesis approach. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Kyoto, Japan, 25–30 March 2012; pp. 405–408.
- Zheng, X.; Ritz, C.; Xi, J. Encoding navigable speech sources: A psychoacoustic-based analysis-by-synthesis approach. IEEE Trans. Audio Speech Lang. Process. **2013**, 21, 29–38.
- Yilmaz, O.; Rickard, S. Blind separation of speech mixtures via time-frequency masking. IEEE Trans. Signal Process. **2004**, 52, 1830–1847.
- Jia, M.; Yang, Z.; Bao, C.; Zheng, X.; Ritz, C. Encoding multiple audio objects using intra-object sparsity. IEEE Trans. Audio Speech Lang. Process. **2015**, 23, 1082–1095.
- Yang, Z.; Jia, M.; Bao, C.; Wang, W. An analysis-by-synthesis encoding approach for multiple audio objects. In Proceedings of the IEEE Signal and Information Processing Association Annual Summit and Conference (APSIPA), Hong Kong, China, 16–19 December 2015; pp. 59–62.
- Yang, Z.; Jia, M.; Wang, W.; Zhang, J. Multi-Stage Encoding Scheme for Multiple Audio Objects Using Compressed Sensing. Cybern. Inf. Technol. **2015**, 15, 135–146.
- Wang, Y.; Vilermo, M. Modified discrete cosine transform: Its implications for audio coding and error concealment. J. Audio Eng. Soc. **2003**, 51, 52–61.
- Enqing, D.; Guizhong, L.; Yatong, Z.; Yu, C. Voice activity detection based on short-time energy and noise spectrum adaptation. In Proceedings of the IEEE International Conference on Signal Processing (ICSP), Beijing, China, 26–30 August 2002; pp. 464–467.
- Painter, T.; Spanias, A. Perceptual coding of digital audio. Proc. IEEE **2000**, 88, 451–515.
- Spanias, A.; Painter, T.; Atti, V. Audio Signal Processing and Coding; John Wiley & Sons: Hoboken, NJ, USA, 2006; pp. 114 & 274. ISBN 9780470041970.
- Jia, M.; Bao, C.; Liu, X. An embedded speech and audio coding method based on bit-plane coding and SQVH. In Proceedings of the IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Ajman, UAE, 11–16 December 2009; pp. 43–48.
- Xie, M.; Lindbergh, D.; Chu, P. ITU-T G.722.1 Annex C: A new low-complexity 14 kHz audio coding standard. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Toulouse, France, 14–19 May 2006; pp. 173–176.
- Xie, M.; Lindbergh, D.; Chu, P. From ITU-T G.722.1 to ITU-T G.722.1 Annex C: A New Low-Complexity 14kHz Bandwidth Audio Coding Standard. J. Multimed. **2007**, 2, 65–76.
- QUASI Database—A Musical Audio Signal Database for Source Separation. Available online: http://www.tsi.telecomparistech.fr/aao/en/2012/03/12/quasi/ (accessed on 25 October 2017).
- International Telecommunication Union. BS.1534: Method for the Subjective Assessment of Intermediate Quality Levels of Coding Systems; International Telecommunication Union: Geneva, Switzerland, 1997.
- Gardner, B.; Martin, K. HRTF Measurements of a KEMAR Dummy-Head Microphone. Available online: http://sound.media.mit.edu/resources/KEMAR.html (accessed on 25 October 2017).
- Candes, E.J.; Wakin, M.B. An introduction to compressive sampling. IEEE Signal Process. Mag. **2008**, 25, 21–30.
- Candes, E.J.; Romberg, J.K.; Tao, T. Stable signal recovery from incomplete and inaccurate measurements. Commun. Pure Appl. Math. **2006**, 59, 1207–1223.
- Dhas, M.D.K.; Sheeba, P.M. Analysis of audio signal using integer MDCT with Kaiser Bessel Derived window. In Proceedings of the IEEE International Conference on Advanced Computing and Communication Systems (ICACCS), Coimbatore, India, 6–7 January 2017; pp. 1–6.
- Bosi, M.; Goldberg, R.E. Introduction to Digital Audio Coding and Standards; Springer: Berlin, Germany, 2003.
- Oppenheim, A.V.; Schafer, R.W. Discrete-Time Signal Processing, 3rd ed.; Publishing House of Electronics Industry: Beijing, China, 2011; pp. 673–683. ISBN 9787121122026.
- University of Iowa Music Instrument Samples. Available online: http://theremin.music.uiowa.edu/MIS.html (accessed on 25 October 2017).

**Figure 1.**The block diagram for the proposed compression framework. (MDCT, Modified Discrete Cosine Transform; IMDCT, Inverse Modified Discrete Cosine Transform; NPTF, Number of Preserved Time-Frequency Bins; SQVH, Scalar Quantized Vector Huffman Coding; TF, Time-Frequency).

**Figure 2.**Example of TF (Time-Frequency) instants extraction and de-mixing procedure with eight unique simultaneously occurring sources.

**Figure 3.** ODG (Objective Difference Grade) scores for the proposed audio object encoding approach and the SPA (Sparsity Analysis) framework (in both the STFT (Short Time Fourier Transform) and MDCT domains). (**a**–**d**) represent the results for the 4 multi-track audio files.

**Figure 5.** The ODG scores of four multi-track audio files, each encoded at three bitrates. (**a**–**d**) represent the results for the 4 multi-track audio files.

**Figure 6.**MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) test results for the SPA framework and the proposed framework with 95% confidence intervals.

**Figure 7.**MUSHRA test results for the SPA method encoding at 128 kbps and the proposed approach at 105.14 kbps with 95% confidence intervals.

**Figure 8.**MUSHRA test results for separate AAC (Advanced Audio Coding) encoding at 30 kbps, SAOC (Spatial Audio Object Coding) and our proposed approach at 105 kbps with 95% confidence intervals.

**Figure 9.**MUSHRA test results with 95% confidence intervals for the soundfield rendering using separate AAC encoding at 30 kbps, SAOC, SPA and our proposed approach at 105 kbps.

| Category | $q_{stepsize}$ | $b$ | MAX | $v_d$ | Bit Count |
|---|---|---|---|---|---|
| 0 | $2^{-1.5}$ | 0.3 | 13 | 2 | 52 |
| 1 | $2^{-1.0}$ | 0.33 | 9 | 2 | 47 |
| 2 | $2^{-0.5}$ | 0.36 | 6 | 2 | 43 |
| 3 | $2^{0.0}$ | 0.39 | 4 | 4 | 37 |

| Bitrate | Sub-band $r$ = 1~13 | Sub-band $r$ = 14~26 | Sub-band $r$ = 27~39 | Sub-band $r$ = 40~51 |
|---|---|---|---|---|
| 105.14 kbps | $2^{-1.5}$ | $2^{-1.0}$ | $2^{-0.5}$ | $2^{0.0}$ |
| 112.53 kbps | $2^{-1.5}$ | $2^{-1.0}$ | $2^{-1.0}$ | $2^{-1.0}$ |
| 120.7 kbps | $2^{-1.5}$ | $2^{-1.5}$ | $2^{-1.5}$ | $2^{-1.5}$ |

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Jia, M.; Zhang, J.; Bao, C.; Zheng, X. A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity. *Appl. Sci.* **2017**, *7*, 1301.
https://doi.org/10.3390/app7121301

**AMA Style**

Jia M, Zhang J, Bao C, Zheng X. A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity. *Applied Sciences*. 2017; 7(12):1301.
https://doi.org/10.3390/app7121301

**Chicago/Turabian Style**

Jia, Maoshen, Jiaming Zhang, Changchun Bao, and Xiguang Zheng. 2017. "A Psychoacoustic-Based Multiple Audio Object Coding Approach via Intra-Object Sparsity" *Applied Sciences* 7, no. 12: 1301.
https://doi.org/10.3390/app7121301