# Populating the Mix Space: Parametric Methods for Generating Multitrack Audio Mixtures


## Abstract


## Featured Application

**The numerical methods described in this paper can be used in the automatic creation of artificial datasets of audio mixes, as real-world mixes are both scarce and costly to produce. Such datasets can be used for a variety of applications: as material for signal analysis, as audio stimuli in psychoacoustic testing, or as a population of solutions to be optimised, thus forming the basis of an automatic mixing system. Within this paper, the application of interest is testing the robustness of tempo estimation to re-mixing.**


## 1. Introduction

## 2. Theoretical Framework

### 2.1. Track Gains

**Definition 1.**

**Definition 2.**

### 2.2. Generating Gain Vectors by Sampling the Mix-Space

The `SphericalDistributionsRand` code (https://github.com/yuhuichen1015/SphericalDistributionsRand), based on the work of [14], was used to generate points according to a vMF distribution. In the context of audio mixes, $\mu $ (where $\left|\mu \right|=1$) represents the mix about which others are distributed, akin to the mean in a normal distribution. The $\kappa $ term represents the diversity of the mixes generated, analogous (but inversely proportional) to variance. An example is shown in Figure 5, where three distributions are drawn on a 2-sphere.
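For readers without access to the MATLAB code above, the sampling step can be sketched in plain numpy using the rejection scheme of Wood (1994), a standard method for drawing from a vMF distribution on the unit hypersphere. The function below is an illustration, not the authors' implementation; the name `sample_vmf` and its interface are choices made here.

```python
import numpy as np

def sample_vmf(mu, kappa, n, rng=None):
    """Draw n samples from a von Mises-Fisher distribution on the unit
    hypersphere, centred on mean direction mu with concentration kappa.
    Uses the rejection scheme of Wood (1994)."""
    rng = np.random.default_rng(rng)
    mu = np.asarray(mu, dtype=float)
    mu = mu / np.linalg.norm(mu)
    d = mu.size
    # Envelope parameters for the marginal of the component along mu.
    b = (d - 1) / (2.0 * kappa + np.sqrt(4.0 * kappa**2 + (d - 1) ** 2))
    x0 = (1.0 - b) / (1.0 + b)
    c = kappa * x0 + (d - 1) * np.log(1.0 - x0**2)
    samples = np.empty((n, d))
    for i in range(n):
        while True:  # rejection-sample w, the cosine of the angle to mu
            z = rng.beta(0.5 * (d - 1), 0.5 * (d - 1))
            u = rng.uniform()
            w = (1.0 - (1.0 + b) * z) / (1.0 - (1.0 - b) * z)
            if kappa * w + (d - 1) * np.log(1.0 - x0 * w) - c >= np.log(u):
                break
        # Uniform direction in the tangent plane, combined with w.
        v = rng.standard_normal(d - 1)
        v /= np.linalg.norm(v)
        x = np.concatenate(([w], np.sqrt(1.0 - w**2) * v))
        # Householder reflection mapping the north pole e1 onto mu.
        e1 = np.zeros(d)
        e1[0] = 1.0
        h = e1 - mu
        norm = np.linalg.norm(h)
        if norm > 1e-12:
            h /= norm
            x = x - 2.0 * np.dot(h, x) * x[0] / x[0] * np.dot(h, x) * 0 - 2.0 * np.dot(h, x) * h + 2.0 * np.dot(h, x) * h - 2.0 * np.dot(h, x) * h
        samples[i] = x
    return samples
```

Each returned row is a unit vector, i.e., a point on the sphere of mixes: $\mu $ plays the role of the reference mix, and a larger $\kappa $ yields a tighter cluster of mixes, matching the behaviour shown in Figure 5.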

#### 2.2.1. Simple Mixing Model

#### 2.2.2. Perceptual Mixing Model

### 2.3. Track Panning

#### 2.3.1. Method 1—Separate Left and Right Gain Vectors

#### 2.3.2. Method 2—Separate Gain and Panning

### 2.4. Track Equalisation

## 3. Applications

### 3.1. Testing the Robustness of Tempo Estimation Algorithms to Changes in the Mix

Tempo was estimated using the `mirtempo` function in the MIRtoolbox. In short, the classic tempo estimation algorithm performs onset detection based on the amplitude envelope of the audio. Periodicities in the detected onsets are determined by finding peaks in the autocorrelation function. The metre method additionally takes into account the metrical hierarchy of the audio, allowing for more consistent tempo tracking. Whichever tempo-estimation method is used, the resultant tempo is the mean value over the 30-second audio segment. Panning and equalisation were not considered here, as tempo was estimated from a mono signal.
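As a rough illustration of the envelope-plus-autocorrelation approach described above (not a reimplementation of `mirtempo`), a minimal estimator might look like the following; the function name, hop size, and BPM range are choices made here.

```python
import numpy as np

def estimate_tempo(x, sr, bpm_range=(40, 200)):
    """Toy tempo estimator: onset strength from the amplitude envelope,
    then the strongest peak of its autocorrelation within a plausible
    BPM range. A simplified stand-in for the 'classic' method, not a
    reimplementation of MIRtoolbox's mirtempo."""
    hop = 512
    # Amplitude envelope: RMS over non-overlapping hops.
    n = len(x) // hop
    env = np.sqrt(np.mean(x[:n * hop].reshape(n, hop) ** 2, axis=1))
    # Onset strength: half-wave rectified envelope difference.
    onset = np.maximum(np.diff(env), 0.0)
    onset -= onset.mean()
    # Autocorrelation of the onset curve (non-negative lags only).
    ac = np.correlate(onset, onset, mode="full")[len(onset) - 1:]
    fps = sr / hop  # envelope frames per second
    lo = int(fps * 60.0 / bpm_range[1])  # shortest period considered
    hi = int(fps * 60.0 / bpm_range[0])  # longest period considered
    lag = lo + np.argmax(ac[lo:hi])
    return 60.0 * fps / lag
```

On a synthetic click track (one click every 0.5 s), this recovers 120 bpm; on real mixes the autocorrelation peak can shift between metrical levels, which is exactly the kind of mix-dependence examined in this section.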

### 3.2. Estimation of Spectral Centroid in Sets of Mixes

## 4. Discussion

### 4.1. Artificial Datasets for Testing of Processes

### 4.2. Signal Analysis of Audio Mixing Practices

## 5. Conclusions

## Author Contributions

## Conflicts of Interest

## References

- Gonzalez, E.; Reiss, J. Improved control for selective minimization of masking using Inter-Channel dependancy effects. In Proceedings of the 11th International Conference on Digital Audio Effects (DAFx-08), Espoo, Finland, 1–4 September 2008.
- Tsilfidis, A.; Papadakos, C.; Mourjopoulos, J. Hierarchical perceptual mixing. In Proceedings of the 126th AES Convention, Munich, Germany, 7–10 May 2009.
- Reiss, J.D. Intelligent systems for mixing multichannel audio. In Proceedings of the IEEE 17th International Conference on Digital Signal Processing, Corfu, Greece, 6–8 July 2011.
- Cartwright, M.; Pardo, B.; Reiss, J. Mixploration: Rethinking the audio mixer interface. In Proceedings of the ACM 19th International Conference on Intelligent User Interfaces, Haifa, Israel, 24–27 February 2014.
- Terrell, M.; Simpson, A.; Sandler, M. The mathematics of mixing. J. Audio Eng. Soc. **2014**, 62, 4–13.
- Jillings, N.; Stables, R. A semantically powered digital audio workstation in the browser. In Proceedings of the Audio Engineering Society International Conference on Semantic Audio, Erlangen, Germany, 22–24 June 2017.
- Wilson, A.; Fazenda, B.M. Navigating the Mix-Space: Theoretical and practical level-balancing technique in multitrack music mixtures. In Proceedings of the 12th Sound and Music Computing Conference, Maynooth, Ireland, 24–26 October 2015.
- Wilson, A.; Fazenda, B. Variation in Multitrack Mixes: Analysis of Low-level Audio Signal Features. J. Audio Eng. Soc. **2016**, 64, 466–473.
- Wilson, A.; Fazenda, B. An evolutionary computation approach to intelligent music production, informed by experimentally gathered domain knowledge. In Proceedings of the 2nd AES Workshop on Intelligent Music Production, London, UK, 13 September 2016.
- Wilson, A. Perceptually-motivated generation of electric guitar timbres using an interactive genetic algorithm. In Proceedings of the 3rd Workshop on Intelligent Music Production, Salford, UK, 14 September 2017.
- Blumenson, L.E. A Derivation of n-Dimensional Spherical Coordinates. Am. Math. Mon. **1960**, 67, 63–66.
- Fisher, N.I. Statistical Analysis of Circular Data; Cambridge University Press: Cambridge, UK, 1995.
- Mardia, K.V.; Jupp, P.E. Directional Statistics; John Wiley & Sons: Hoboken, NJ, USA, 2009; Volume 494.
- Chen, Y.H.; Wei, D.; Newstadt, G.; DeGraef, M.; Simmons, J.; Hero, A. Statistical estimation and clustering of group-invariant orientation parameters. In Proceedings of the IEEE 18th International Conference on Information Fusion, Washington, DC, USA, 6–9 July 2015.
- Wilson, A. Evaluation and Modelling of Perceived Audio Quality in Popular Music, towards Intelligent Music Production. Ph.D. Thesis, University of Salford, Salford, UK, 2017.
- Pestana, P. Automatic Mixing Systems Using Adaptive Audio Effects. Ph.D. Thesis, Universidade Catolica Portuguesa, Lisbon, Portugal, 2013.
- De Man, B. Towards a Better Understanding of Mix Engineering. Ph.D. Thesis, Queen Mary, University of London, London, UK, 2017.
- Lee, H.; Rumsey, F. Level and time panning of phantom images for musical sources. J. Audio Eng. Soc. **2013**, 61, 978–988.
- Lartillot, O.; Toiviainen, P. A Matlab toolbox for musical feature extraction from audio. In Proceedings of the 10th International Conference on Digital Audio Effects (DAFx-07), Bordeaux, France, 10–15 September 2007.
- Pestana, P.D.; Reiss, J.D.; Barbosa, A. Loudness measurement of multitrack audio content using modifications of ITU-R BS.1770. In Proceedings of the 134th AES Convention, Rome, Italy, 4 May 2013.
- Lartillot, O.; Cereghetti, D.; Eliard, K.; Trost, W.J.; Rappaz, M.A.; Grandjean, D. Estimating Tempo and metrical features by tracking the whole metrical hierarchy. In Proceedings of the 3rd International Conference on Music & Emotion (ICME3), Jyväskylä, Finland, 11–15 June 2013.
- Von Bismarck, G. Timbre of steady sounds: A factorial investigation of its verbal attributes. Acta Acust. United Acust. **1974**, 30, 146–159.
- Grey, J.M.; Gordon, J.W. Perceptual effects of spectral modifications on musical timbres. J. Acoust. Soc. Am. **1978**, 63, 1493–1500.
- McAdams, S.; Winsberg, S.; Donnadieu, S.; De Soete, G.; Krimphoff, J. Perceptual scaling of synthesized musical timbres: Common dimensions, specificities, and latent subject classes. Psychol. Res. **1995**, 58, 177–192.
- De Man, B.; Leonard, B.; King, R.; Reiss, J. An analysis and evaluation of audio features for multitrack music mixtures. In Proceedings of the 15th International Society for Music Information Retrieval Conference, Taipei, Taiwan, 27–31 October 2014.
- Wilson, A.; Fazenda, B.M. 101 Mixes: A statistical analysis of mix-variation in a dataset of multitrack music mixes. In Proceedings of the 139th AES Convention, New York, NY, USA, 29 October–1 November 2015.
- Shirley, B.G.; Meadows, M.; Malak, F.; Woodcock, J.S.; Tidball, A. Personalized object-based audio for hearing impaired TV viewers. J. Audio Eng. Soc. **2017**, 65, 293–303.
- Lartillot, O.; Eerola, T.; Toiviainen, P.; Fornari, J. Multi-feature modeling of pulse clarity: Design, validation and optimization. In Proceedings of the 9th International Society for Music Information Retrieval Conference, Philadelphia, PA, USA, 14–18 September 2008.

**Figure 1.** Points p, ${p}^{\prime}$ and r, in 2-track gain space. Note that the audio output at points p and ${p}^{\prime}$ is the same ‘mix’.

**Figure 2.** Graphical representation of three mixes in mix-space. While shown for three tracks, this is generalisable to any number of tracks n, using hyperspherical coordinates. (**a**) Mix at a point in 3-track gain space. Note that the audio output at points p and ${p}^{\prime}$ is the same ‘mix’, despite the vectors having different lengths in this space; (**b**) For a 3-track mixture, while the cube (${\mathbb{R}}^{3}$) represents all outputs of a summing mixer, the surface of the sphere (${\mathbb{S}}^{2}$) represents all possible mixes.

**Figure 3.** Schematic representation of a four-track mixing task, with track gains ${g}_{1},{g}_{2},{g}_{3},{g}_{4}$, and the semantic description of the three $\varphi $ terms, when adjusted from 0 to $\pi /2$. Figure taken from [7].

**Figure 4.** A time-varying mix can be considered as a path in the mix-space. Here, a random time-varying mix is generated by means of a random walk. (**a**) Random walk in mix-space; Brownian motion, halted after 30 s; (**b**) Random walk from Figure 4a converted to gain-space; (**c**) Time series of gain values for each of the three tracks.

**Figure 5.** Three sets of mixes, drawn from the mix-space. This shows the effect of varying the concentration parameter $\kappa $: a larger value results in less diversity.

**Figure 7.** Panning of two tracks, represented as a 1-sphere. The panning mix is determined by the angle $\theta $ with ${r}_{\mathrm{pan}}$ acting as a scaling variable, adjusting the overall width of the mix. For example, ${C}^{\prime}$ is a wider version of C.
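As an illustration of how a single angle can encode a pan position, a constant-power pan law is sketched below. This particular law is a standard audio convention and is assumed here for illustration; the paper's exact parameterisation of the panning space may differ.

```python
import numpy as np

def pan_gains(theta):
    """Constant-power pan law: left/right gains from an angle theta in
    [0, pi/2]. theta = pi/4 is centre, and gL^2 + gR^2 = 1 at every
    position, so perceived level stays roughly constant while panning.
    A standard convention, assumed here for illustration."""
    return np.cos(theta), np.sin(theta)

def pan_position(gl, gr):
    """Recover a pan position in [-1, 1] (-1 = hard left, +1 = hard
    right) from a pair of channel gains."""
    return (np.arctan2(gr, gl) - np.pi / 4) / (np.pi / 4)
```

Under this convention the overall width scaling described in the caption corresponds to multiplying both gains by ${r}_{\mathrm{pan}}$, which moves the point radially without changing $\theta $.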

**Figure 8.** Panning method 1—separate vMF distributions for ${\mathrm{gain}}_{L}$ and ${\mathrm{gain}}_{R}$, both using Equation (7). (**a**) Boxplot of track gains for left channel, using Equation (7); (**b**) Boxplot of track gains for right channel, using Equation (7); (**c**) Boxplot of pan positions for each track; (**d**) Probability density of pan positions for each track.

**Figure 9.** Panning method 1b—separate vMF distributions for left and right channels but using unique $\mu $ vectors, shown in Equations (10) and (11). (**a**) Boxplot of track gains for left channel, using Equation (10); (**b**) Boxplot of track gains for right channel, using Equation (11); (**c**) Boxplot of pan positions for each track. Where $\left|P\right|>1$, this is caused by negative track gains; (**d**) Probability density of pan positions for each track.

**Figure 10.** Panning method 2—generating vMF distributions in panning space. As expected, increasing $\kappa $ (concentration parameter) results in a narrower range of pan positions for each track, around the target vector of Equation (13).

**Figure 12.** Five randomly-chosen examples of 3-band equalisation, chosen from the tone-space. As ${\psi}_{2}\to 0$, the gain of the high band decreases. As ${\psi}_{1}\to 0$, the gain of the low band increases at the expense of the other two bands; their balance is determined by ${\psi}_{2}$.
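One mapping from two tone-space angles to three band gains that reproduces the limiting behaviour described in the caption is the usual spherical-coordinate construction. The exact convention used in the paper is not reproduced here, so treat this as an assumed illustration.

```python
import numpy as np

def band_gains(psi1, psi2):
    """Map tone-space angles (psi1, psi2) to unit-norm gains for the
    (low, mid, high) bands via spherical coordinates. The limits match
    the behaviour described in Figure 12 (psi2 -> 0 silences the high
    band; psi1 -> 0 favours the low band at the expense of the others),
    though the paper's exact coordinate convention is assumed here."""
    low = np.cos(psi1)
    mid = np.sin(psi1) * np.cos(psi2)
    high = np.sin(psi1) * np.sin(psi2)
    return low, mid, high
```

Because the three gains always have unit norm, sampling the two angles (for instance with a vMF distribution, as in Section 2.2) explores tonal balance without changing the overall level of the equalised track.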

**Figure 13.** Estimated tempo for three songs, 500 mixes each using Equation (7). In each histogram, the data is split into 100 bins. Overall, performance is better for the metre-based method, as it demonstrates greater accuracy and improved robustness to changes in the mix. (**a**) “Burning Bridges”—The correct tempo is ≈100 bpm; (**b**) “I’m Alright”—The correct tempo is ≈96 bpm; (**c**) “What I Want”—The correct tempo is ≈99 bpm.

**Figure 14.** Estimated tempo for three songs, 500 mixes each using Equation (8). In each histogram, the data is split into 100 bins. Overall, performance is better for the metre-based method, as it demonstrates greater accuracy and improved robustness to changes in the mix. (**a**) “Burning Bridges”—The correct tempo is ≈100 bpm; (**b**) “I’m Alright”—The correct tempo is ≈96 bpm; (**c**) “What I Want”—The correct tempo is ≈99 bpm.

**Figure 15.** Probability distribution of spectral centroid as a function of mix-space parameters. (**a**) “Burning Bridges”; (**b**) “I’m Alright”; (**c**) “What I Want”.

**Table 1.** Summary of tempo estimation accuracy results. Shown is the mean squared error (MSE) in each set of 500 mixes.

© 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

Wilson, A.; Fazenda, B.M. Populating the Mix Space: Parametric Methods for Generating Multitrack Audio Mixtures. *Appl. Sci.* **2017**, *7*, 1329. https://doi.org/10.3390/app7121329