# Stay True to the Sound of History: Philology, Phylogenetics and Information Engineering in Musicology


## Abstract


## 1. Introduction

- recension (Latin *recensio*): constructing the stemma codicum, starting from the set of sources (all the different witnesses of that musical work);
- selection (Latin *selectio*): determining the original source by examining the variants and selecting the best ones [9].

## 2. Related Works

## 3. Problem Description

## 4. Algorithms

#### 4.1. Audio Pre-Processing

#### 4.2. Leader Tape Detection

- From two sets of keypoints $({\mathcal{K}}_{i},{\mathcal{K}}_{j})$, find a subset of matched pairs by comparing the related descriptors. Given the matched pairs $\left(({u}_{k},{v}_{k}),({u}_{h}^{\prime},{v}_{h}^{\prime})\right)$, estimate the optimum geometric transform mapping ${P}_{i}$ onto ${P}_{j}$ with the RANSAC algorithm [36]. If a leader-tape is present, the set of inlier points returned by the algorithm will converge to a subset of keypoints belonging to only one of the two portions of the spectrogram separated by the leader (Figure 3).
- Define a function ${g}_{i}\left(v\right)$ counting the number of keypoints detected in ${P}_{i}(u,v)$ for each image column $v$ (to avoid strong oscillations, ${g}_{i}\left(v\right)$ is smoothed with a moving-average low-pass filter). Then, define ${g}_{i}^{\prime}\left(v\right)$ as the number of inlier points left on ${P}_{i}(u,v)$ in column $v$ after the RANSAC algorithm. In the presence of a leader insertion, the distance $|{g}_{i}\left(v\right)-{g}_{i}^{\prime}\left(v\right)|$ shows an evident step that can be detected by looking for gradient peaks.
- Let ${v}_{l}$ be the coordinate associated with the detected step. Define the following sets:$$\begin{array}{c}{\mathcal{K}}_{i}^{\left(L\right)}=\left\{({u}_{k},{v}_{k})\in {\mathcal{K}}_{i}|{v}_{k}<{v}_{l}\right\},\\ {\mathcal{K}}_{i}^{\left(R\right)}=\left\{({u}_{k},{v}_{k})\in {\mathcal{K}}_{i}|{v}_{k}>{v}_{l}\right\},\end{array}$$
- Perform a new geometric transform estimation, on the left and right portion of the images separately, according to the subdivision defined in (2). The estimated models come in the form of $3\times 3$ homography matrices, ${H}^{\left(L\right)}$ and ${H}^{\left(R\right)}$, from which it is possible to extract the translation components along the v direction, ${t}^{\left(L\right)}$ and ${t}^{\left(R\right)}$. The length of the candidate leader is then given by:$${w}_{l}=|{t}^{\left(L\right)}-{t}^{\left(R\right)}|.$$
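The step detection and the leader-length formula above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the window size, and the choice to smooth both ${g}_{i}$ and ${g}_{i}^{\prime}$ are assumptions, and the per-column keypoint counts and the translations ${t}^{\left(L\right)}$, ${t}^{\left(R\right)}$ are taken as already computed by a keypoint detector and a RANSAC-based estimator.

```python
def moving_average(values, window=5):
    """Centered moving average; near the edges, only available samples are used."""
    half = window // 2
    smoothed = []
    for k in range(len(values)):
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def detect_leader_column(g, g_inliers, window=5):
    """Return the column index where |g(v) - g'(v)| shows its steepest step.

    g         -- keypoints per column before RANSAC
    g_inliers -- inlier keypoints per column after RANSAC
    window    -- smoothing window (hypothetical default, not from the paper)
    """
    g_smooth = moving_average(g, window)
    g_in_smooth = moving_average(g_inliers, window)
    distance = [abs(a - b) for a, b in zip(g_smooth, g_in_smooth)]
    # Discrete gradient magnitude; its largest value marks the candidate step.
    gradient = [abs(distance[k + 1] - distance[k]) for k in range(len(distance) - 1)]
    return max(range(len(gradient)), key=gradient.__getitem__)


def leader_length(t_left, t_right):
    """w_l = |t^(L) - t^(R)|, from the two translations along the v direction."""
    return abs(t_left - t_right)
```

With a smoothing window larger than one, the step is spread over a few columns, so the returned index should be read as an estimate of ${v}_{l}$ rather than an exact position.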

#### 4.3. Spectrogram Registration

- If a leader-tape has been detected in ${P}_{j}$, compensate for it on ${P}_{i}$ by adding a band of black pixels centered at ${v}_{l}$ and of width ${w}_{l}$.
- Estimate the global geometric transform H by running RANSAC on all keypoints.
- Warp ${P}_{i}$ towards ${P}_{j}$ according to H, obtaining ${P}_{i}^{\prime}$.
- Compute the dissimilarity value ${d}_{i,j}$ as the MSE of ${P}_{i}^{\prime}$ and ${P}_{j}$:$${d}_{i,j}=\frac{1}{U\cdot V}\sum _{u,v}{|{P}_{j}(u,v)-{P}_{i}^{\prime}(u,v)|}^{2},$$ where $U$ and $V$ are the numbers of rows and columns of the spectrogram images.
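As a small worked sketch of the first and last steps above, the following code treats spectrograms as lists of rows of pixel values. The function names are hypothetical, and reading "adding a band of black pixels" as inserting zero-valued columns is an assumption; the warping step is taken as already done by an external routine.

```python
def compensate_leader(image, v_l, w_l):
    """Insert w_l black (zero) columns into every row, roughly centered at v_l.

    Assumption: 'adding a band' is read here as column insertion, which
    widens the image by w_l columns.
    """
    start = max(0, v_l - w_l // 2)
    return [row[:start] + [0.0] * w_l + row[start:] for row in image]


def mse_dissimilarity(p_i_warped, p_j):
    """d_ij = (1 / (U * V)) * sum over (u, v) of |P_j(u, v) - P'_i(u, v)|^2."""
    n_rows, n_cols = len(p_j), len(p_j[0])
    if len(p_i_warped) != n_rows or any(len(r) != n_cols for r in p_i_warped):
        raise ValueError("spectrograms must have the same U x V size")
    total = sum(
        (a - b) ** 2
        for row_i, row_j in zip(p_i_warped, p_j)
        for a, b in zip(row_i, row_j)
    )
    return total / (n_rows * n_cols)
```

For example, two $2\times 2$ images that differ by 2 in a single pixel give a dissimilarity of $4/4=1$.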

#### 4.4. Overdub Detection

- Compute the residual spectrogram as the pixel-wise absolute difference of ${P}_{i}^{\prime}$ and ${P}_{j}$ (Figure 4a).$${P}_{r}(u,v)=|{P}_{i}^{\prime}(u,v)-{P}_{j}(u,v)|$$
- Define the function $e\left(v\right)$ representing the energy content of the residual spectrogram over time.$$e\left(v\right)=\sum _{u}{P}_{r}(u,v),\phantom{\rule{2.em}{0ex}}v=1,\dots ,V$$
- Look for strong variations in the residual energy by computing the first derivative ${e}^{\prime}\left(v\right)$ and applying an outlier detector (three scaled MAD from the median, where MAD denotes the median absolute deviation), obtaining a set of points $\mathcal{O}=\left\{{v}_{k}\right\}$ (Figure 4b).
- Process the points ${v}_{k}\in \mathcal{O}$ in order to obtain the interval $[{v}_{1},{v}_{2}]$ corresponding to the candidate overdub. The criterion is to select the pair of points that maximizes the ratio between the average energy inside and outside those points.$$({v}_{1},{v}_{2})=\underset{({v}_{a},{v}_{b})\in {\mathcal{O}}^{2}}{\arg\max}\frac{\mathbb{E}{\left[e\left(v\right)\right]}_{{v}_{a}<v<{v}_{b}}}{\mathbb{E}{\left[e\left(v\right)\right]}_{v<{v}_{a}\vee v>{v}_{b}}}$$

Given a detected overdub spanning from ${v}_{1}$ to ${v}_{2}$, the algorithm tries to infer the phylogenetic relation. Again, we compare energy statistics inside and outside the overdub region, but in this case, we consider ${P}_{i}^{\prime}$ and ${P}_{j}$, instead of ${P}_{r}$.

- Scan through the spectrogram rows $u=1,\dots ,U$. For each $u$, compute:$$\begin{array}{c}{c}_{i}\left(u\right)=\left|\mathbb{E}{\left[{P}_{i}^{\prime}(u,v)\right]}_{{v}_{1}<v<{v}_{2}}-\mathbb{E}{\left[{P}_{i}^{\prime}(u,v)\right]}_{v<{v}_{1}\vee v>{v}_{2}}\right|\\ {c}_{j}\left(u\right)=\left|\mathbb{E}{\left[{P}_{j}(u,v)\right]}_{{v}_{1}<v<{v}_{2}}-\mathbb{E}{\left[{P}_{j}(u,v)\right]}_{v<{v}_{1}\vee v>{v}_{2}}\right|\end{array}$$
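The overdub steps above can be illustrated with a short standard-library sketch. It is not the authors' code: the function names are hypothetical, the derivative is approximated by a forward difference, and "scaled MAD" is taken to mean $1.4826\cdot \mathrm{MAD}$, the usual consistency scaling for normally distributed data.

```python
from itertools import combinations
from statistics import mean, median


def residual_energy(p_i_warped, p_j):
    """e(v) = sum over rows u of |P'_i(u, v) - P_j(u, v)|."""
    n_cols = len(p_j[0])
    return [
        sum(abs(row_i[v] - row_j[v]) for row_i, row_j in zip(p_i_warped, p_j))
        for v in range(n_cols)
    ]


def mad_outliers(values, n_mads=3.0):
    """Indices of values farther than n_mads scaled MADs from the median."""
    med = median(values)
    scaled_mad = 1.4826 * median([abs(x - med) for x in values])
    return [k for k, x in enumerate(values) if abs(x - med) > n_mads * scaled_mad]


def select_overdub_interval(energy, candidates):
    """Pick (v_1, v_2) maximizing mean energy inside over mean energy outside."""
    best, best_ratio = None, float("-inf")
    for v_a, v_b in combinations(sorted(candidates), 2):
        inside = energy[v_a + 1:v_b]
        outside = energy[:v_a] + energy[v_b + 1:]
        if not inside or not outside:
            continue  # the ratio is undefined for an empty region
        denom = mean(outside)
        ratio = mean(inside) / denom if denom > 0 else float("inf")
        if ratio > best_ratio:
            best, best_ratio = (v_a, v_b), ratio
    return best


def row_contrast(image, v_1, v_2):
    """c(u) = |mean inside (v_1, v_2) - mean outside| for every row u."""
    return [
        abs(mean(row[v_1 + 1:v_2]) - mean(row[:v_1] + row[v_2 + 1:]))
        for row in image
    ]


def detect_overdub(energy):
    """Outliers of the forward difference e'(v), then interval selection."""
    derivative = [energy[k + 1] - energy[k] for k in range(len(energy) - 1)]
    return select_overdub_interval(energy, mad_outliers(derivative))
```

On a toy energy profile that is 1 everywhere except for a block of 10s, `detect_overdub` returns the two columns at which the energy jumps up and back down.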

#### 4.5. Tree Estimation

- Starting from the matrix D, build an undirected graph $\mathcal{G}=\{\mathcal{V},\mathcal{E}\}$ with N nodes, where the i-th node is associated with the audio track ${x}_{i}\left(n\right)$ and each edge $(i,j)$ exists if and only if ${d}_{i,j}<+\infty $ and ${d}_{j,i}<+\infty $.
- Run a maximal clique algorithm on $\mathcal{G}$, obtaining ${\mathcal{C}}_{1},\dots ,{\mathcal{C}}_{K}\subseteq \mathcal{V}$.
- Compute the $K\times K$ clique-dissimilarity matrix ${D}_{\mathcal{C}}$ as:$${D}_{\mathcal{C}}(p,q)=\frac{1}{|{\mathcal{C}}_{p}||{\mathcal{C}}_{q}|}\sum _{i\in {\mathcal{C}}_{p},j\in {\mathcal{C}}_{q}}{d}_{i,j}$$
- Starting from the matrix ${D}_{\mathcal{C}}$, build a complete directed graph ${\mathcal{G}}_{\mathcal{C}}=\{{\mathcal{V}}_{\mathcal{C}},{\mathcal{E}}_{\mathcal{C}}\}$ with K nodes, where every node is a clique of the undirected graph $\mathcal{G}$ and each edge $(p,q)$ has a weight equal to ${D}_{\mathcal{C}}(p,q)$, i.e., the average dissimilarity between the audio tracks belonging to the p-th and the q-th cliques.
- Compute the phylogenetic tree as the minimum spanning arborescence ${\widehat{\mathcal{G}}}_{\mathcal{C}}=\{{\mathcal{V}}_{\mathcal{C}},{\widehat{\mathcal{E}}}_{\mathcal{C}}\}$, i.e., the directed rooted spanning tree with minimum weight.$${\widehat{\mathcal{E}}}_{\mathcal{C}}=\underset{{\mathcal{E}}^{s}\subset {\mathcal{E}}_{\mathcal{C}}}{\arg\min}\sum _{(p,q)\in {\mathcal{E}}^{s}}{D}_{\mathcal{C}}(p,q)$$
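The five steps above can be traced end to end on a toy dissimilarity matrix with the following sketch. Several choices here are assumptions made for illustration: the function names are hypothetical, maximal cliques are enumerated with the basic Bron–Kerbosch recursion, an edge $(p,q)$ is read as "clique $p$ is the parent of clique $q$", and the minimum arborescence is found by exhaustive search, which is only feasible for a handful of cliques and merely stands in for an optimum-branching algorithm.

```python
import math
from itertools import product


def build_graph(D):
    """Undirected graph: edge (i, j) iff both d_ij and d_ji are finite."""
    n = len(D)
    return {
        i: {j for j in range(n) if j != i and D[i][j] < math.inf and D[j][i] < math.inf}
        for i in range(n)
    }


def maximal_cliques(adj):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(sorted(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return sorted(cliques)


def clique_dissimilarity(D, cliques):
    """D_C(p, q): average of d_ij over i in clique p and j in clique q."""
    return [
        [sum(D[i][j] for i in cp for j in cq) / (len(cp) * len(cq)) for cq in cliques]
        for cp in cliques
    ]


def _reaches_root(q, parent, root):
    """Follow parent links from q; True if the root is reached (no cycle)."""
    for _ in range(len(parent) + 1):
        if q == root:
            return True
        q = parent[q]
    return False


def minimum_arborescence(W):
    """Exhaustive search for the minimum-weight rooted directed spanning tree."""
    K = len(W)
    best_edges, best_weight = None, math.inf
    for root in range(K):
        others = [q for q in range(K) if q != root]
        for parents in product(range(K), repeat=len(others)):
            parent = dict(zip(others, parents))
            if any(p == q for q, p in parent.items()):
                continue  # no self-loops
            if not all(_reaches_root(q, parent, root) for q in others):
                continue  # not a spanning tree rooted at `root`
            weight = sum(W[p][q] for q, p in parent.items())
            if weight < best_weight:
                best_edges = sorted((p, q) for q, p in parent.items())
                best_weight = weight
    return best_edges, best_weight
```

For three tracks where tracks 0 and 1 are mutually comparable and track 2 can only be a descendant, the cliques are `[0, 1]` and `[2]`, and the resulting tree has a single edge from the first clique to the second.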

## 5. Dataset

- addition of a leader-tape within the tape;
- overdub with silence or with another track;
- addition of a splice within the tape.

## 6. Results and Discussion

- In 50% of the cases, the estimated tree perfectly reproduces the ground-truth. Specifically, all the tracks sharing the same tape modifications (leader-tape and/or overdub) are collected in the same clique, and the resulting cliques are correctly ordered in the phylogenetic sense.
- In 40% of the cases, the estimated tree is not identical to the ground-truth, but still makes sense in phylogenetic terms. For instance, some cliques are over-clustered: tracks that should belong to the same meta-node are split into several nodes, which can be siblings or in a parent-child relationship. However, the relative depths in the tree structure are maintained, and the overall phylogenetic sense is preserved. Figure 5 reports two examples of this scenario.
- In 10% of the cases, the estimated tree shows some wrong phylogenetic relations (ancestor-descendant swaps) with respect to the ground-truth.

## 7. Conclusions

## Acknowledgments

## Author Contributions

## Conflicts of Interest

## References

1. Pousseur, H. Ecrits Théoriques, 1954–1967; Editions Pierre Mardaga: Sprimont, Belgium, 2004.
2. Canazza, S. The digital curation of ethnic music audio archives: From preservation to restoration. Int. J. Digit. Libr. **2012**, 12, 121–135.
3. Bressan, F.; Canazza, S. A Systemic Approach to the Preservation of Audio Documents: Methodology and Software Tools. J. Electr. Comput. Eng. **2013**, 2013, 21.
4. Bressan, F.; Rodà, A.; Canazza, S.; Fontana, F.; Bertani, R. The Safeguard of Audio Collections: A Computer Science Based Approach to Quality Control—The Case of the Sound Archive of the Arena di Verona. Adv. Multimedia **2013**, 2013, 14.
5. Bressan, F.; Canazza, S.; Rodà, A.; Bertani, R.; Fontana, F. Pavarotti Sings Again: A Multidisciplinary Approach to the Active Preservation of the Audio Collection at the Arena di Verona. J. New Music Res. **2013**, 42, 364–380.
6. Canazza, S.; Fantozzi, C.; Pretto, N. Accessing tape music documents on mobile devices. ACM Trans. Multimedia Comput. Commun. Appl. **2015**, 12, 20.
7. Fantozzi, C.; Bressan, F.; Pretto, N.; Canazza, S. Tape music archives: From preservation to access. Int. J. Digit. Libr. **2017**, 18, 233–249.
8. Van Huis, E. What makes a good archive? IASA J. **2009**, 24, 25–28.
9. Timpanaro, S. The Genesis of Lachmann’s Method; University of Chicago Press: Chicago, IL, USA, 2005.
10. Milani, S.; Fontana, M.; Bestagini, P.; Tubaro, S. Phylogenetic analysis of near-duplicate images using processing age metrics. In Proceedings of the 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Shanghai, China, 20–25 March 2016.
11. Milani, S.; Bestagini, P.; Tubaro, S. Video phylogeny tree reconstruction using aging measures. In Proceedings of the 2017 European Signal Processing Conference (EUSIPCO 2017), Kos, Greece, 28 August–2 September 2017.
12. Nucci, M.; Tagliasacchi, M.; Tubaro, S. A phylogenetic analysis of near-duplicate audio tracks. In Proceedings of the 2013 IEEE 15th International Workshop on Multimedia Signal Processing (MMSP), Pula, Italy, 30 September–2 October 2013; pp. 99–104.
13. Kennedy, L.; Chang, S.F. Internet image archaeology: Automatically tracing the manipulation history of photographs on the web. In Proceedings of the ACM International Conference on Multimedia (ACM-MM), Vancouver, BC, Canada, 26–31 October 2008.
14. de O. Costa, F.; Oikawa, M.A.; Dias, Z.; Goldenstein, S.; de Rocha, A.R. Image Phylogeny Forests Reconstruction. IEEE Trans. Inf. Forensics Sec. **2014**, 9, 1533–1546.
15. Dias, Z.; Goldenstein, S.; Rocha, A. Exploring heuristic and optimum branching algorithms for image phylogeny. J. Vis. Commun. Image Represent. **2013**, 24, 1124–1134.
16. Melloni, A.; Bestagini, P.; Milani, S.; Tagliasacchi, M.; Rocha, A.; Tubaro, S. Image phylogeny through dissimilarity metrics fusion. In Proceedings of the European Workshop on Visual Information Processing (EUVIP), Paris, France, 10–12 December 2014.
17. Verde, S.; Milani, S.; Bestagini, P.; Tubaro, S. Audio phylogenetic analysis using geometric transforms. In Proceedings of the 2017 IEEE International Workshop on Information Forensics and Security (WIFS), Rennes, France, 4–7 December 2017.
18. Zattra, L. The Assembling of Stria by John Chowning: A Philological Investigation. Comput. Music J. **2007**, 31, 38–64.
19. Sallis, F.; Bertolani, V.; Burle, J.; Zattra, L. Live-Electronic Music. Composition, Performance and Study; Routledge: London, UK, 2017.
20. Orio, N.; Snidaro, L.; Canazza, S.; Foresti, G.L. Methodologies and tools for audio digital archives. Int. J. Digit. Libr. **2009**, 10, 201–220.
21. Canazza, S.; Orio, N. Digital preservation and access of audio heritage: A case study for phonographic discs. In Proceedings of the 13th Conference on Digital Libraries, Corfu, Greece, 27 September–2 October 2009; pp. 451–454.
22. Orio, N.; Zattra, L. ACAME—Analyse Comparative Automatique de la Musique Electroacoustique; Musimediane: Paris, France, 2009.
23. Joly, A.; Buisson, O.; Frelicot, C. Content-Based Copy Retrieval Using Distortion-Based Probabilistic Similarity Search. IEEE Trans. Multimedia **2007**, 9, 293–306.
24. De Benedictis, A.I. Scrittura e supporti nel Novecento: Alcune riflessioni e un esempio (Ausstrahlung di Bruno Maderna). In La Scrittura Come Rappresentazione del Pensiero Musicale; Borio, G., Ed.; ETS: Pisa, Italy, 2004; pp. 237–291.
25. Dwyer, T. Composing With Tape Recorders: Musique Concrete for Beginners; Oxford University Press: London, UK, 1971.
26. AES. AES Recommended Practice for Audio Preservation and Restoration—Storage and Handling—Storage of Polyester-Base Magnetic Tape; AES: New York, NY, USA, 1997; (r2012).
27. Eilers, D.A. Splicing Tapes and Their Proper Application. J. Audio Eng. Soc. **1968**, 16, 472–476.
28. Bradley, K. IASA TC-04 Guidelines in the Production and Preservation of Digital Audio Objects: Standards, Recommended Practices, and Strategies, 2nd ed.; International Association of Sound and Audio Visual Archives: Aarhus, Denmark, 2009.
29. Mallinson, J.C. Tutorial review of magnetic recording. Proc. IEEE **1976**, 64, 196–208.
30. Camras, M. Magnetic Recording Handbook; Van Nostrand Reinhold Co.: New York, NY, USA, 1988.
31. National Association of Broadcasters. Magnetic Tape Recording and Reproducing (Reel-to-Reel); National Association of Broadcasters: Washington, DC, USA, 1965.
32. International Electrotechnical Commission. BS EN 60094-1:1994 BS 6288-1: 1994 IEC 94-1:1981—Magnetic Tape Sound Recording and Reproducing Systems—Part 1: Specification for General Conditions and Requirements; International Electrotechnical Commission: Geneva, Switzerland, 1981.
33. Zanoni, M.; Lusardi, S.; Bestagini, P.; Canclini, A.; Sarti, A.; Tubaro, S. Robust music identification approach based on local spectrogram image descriptors. In Proceedings of the 142nd AES Convention, Berlin, Germany, 20–23 May 2017; p. 9763.
34. Williams, D.; Pooransingh, A.; Saitoo, J. Efficient music identification using ORB descriptors of the spectrogram image. EURASIP J. Audio Speech Music Proc. **2017**, 2017, 17.
35. Bay, H.; Ess, A.; Tuytelaars, T.; Gool, L.V. Speeded-up robust features. Comput. Vis. Image Underst. **2008**, 110, 346–359.
36. Fischler, M.A.; Bolles, R.C. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Commun. ACM **1981**, 24, 381–395.
37. Chu, Y.J.; Liu, T.H. On the shortest arborescence of a directed graph. Sci. Sin. **1965**, 14, 1396–1400.
38. Edmonds, J. Optimum branchings. J. Res. Natl. Bur. Stand. **1967**, 71B, 233–240.
39. Studer. Studer A810—Operating and Service Instruction; Studer: Zurich, Switzerland, 2018.
40. Micheloni, E.; Pretto, N.; Canazza, S. A step toward AI tools for quality control and musicological analysis of digitized analogue recordings: Recognition of audio tape equalizations. In Proceedings of the 11th International Workshop on Artificial Intelligence for Cultural Heritage Co-Located with the 16th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2017), Bari, Italy, 14–17 November 2017.

**Figure 1.** Example of near-duplicates (witnesses). A piece of leader-tape has been added in the middle of the tape (**a**), obtaining the modified version (**b**). The difference between the two versions can be clearly observed by comparing the corresponding spectrograms (**c**, **d**).

**Figure 2.** Block diagram of the proposed algorithm. The input consists of the digitized audio tracks ${x}_{i}$, $i=1,\dots ,N$, and the output is the estimated audio phylogeny tree (APT).

**Figure 3.** Spectrogram image ${P}_{i}(u,v)$ of an audio track ${x}_{i}\left(n\right)$, with green asterisks representing the detected SURF keypoints. Subfigures show the SURF keypoints (**a**) and inlier keypoints after RANSAC (**b**). Note that the remaining inlier points are located to the right of the leader-tape.

**Figure 4.** Residual spectrogram and related energy-over-time associated with a track pair $(i,j)$ containing an overdub, which appears in (**a**) as a bright region with clean edges. The red circles in (**b**) represent the detected outliers ${v}_{k}\in \mathcal{O}$, and the two points marked with green asterisks are the selected edges $({v}_{1},{v}_{2})$.

**Figure 5.** Examples of tree reconstruction with over-clustering errors. Datasets consist of seven audio tracks, $\{\mathbf{a},\mathbf{b},\dots ,\mathbf{g}\}$. In (**a**), cluster $\{\mathbf{b},\mathbf{e},\mathbf{g}\}$ is split into the parent-child pair $\left(\left\{\mathbf{e}\right\},\{\mathbf{b},\mathbf{g}\}\right)$; in (**b**), cluster $\{\mathbf{d},\mathbf{e},\mathbf{f},\mathbf{g}\}$ is split into the sibling pair $\left(\{\mathbf{d},\mathbf{e},\mathbf{g}\},\left\{\mathbf{f}\right\}\right)$.

**Table 1.** Equalization standards supported by the Studer A810, described by their time constants (in µs). Source: [39].

30 ips | 15 ips | 7.5 ips | 3.75 ips |
---|---|---|---|
AES: 17.5/∞ | CCIR: 35/∞ | 70/∞ | 90/3180 |
AES: 17.5/∞ | NAB: 50/3180 | 50/3180 | 90/3180 |

**Table 2.** Samples of electroacoustic music recorded on experimental tapes with the related recording parameters (speed, equalization, DBX).

# | Composer | Title | Year(s) | Speed (ips) | Equalization | DBX |
---|---|---|---|---|---|---|
1 | Luciano Berio | Differences | 1958–1959 | 7.5 | CCIR | yes |
2 | Pierre Boulez | Dialogue de l’ombre double | 1985 | 7.5 | CCIR | yes |
3 | Brian Ferneyhough | Mnemosyne | 1986 | 7.5 | CCIR | no |
4 | Brian Ferneyhough | Mnemosyne | 1986 | 15 | CCIR | yes |
5 | Bruno Maderna | Continuo | 1958 | 15 | CCIR | no |
6 | Bruno Maderna | Dimensioni II—invenzione su una voce | 1960 | 7.5 | NAB | yes |
7 | Bruno Maderna | Notturno | 1956 | 7.5 | NAB | no |
8 | Luigi Nono | ...sofferte onde serene... | 1976 | 15 | NAB | yes |
9 | Gruppo NPS | Interferenze II | 1965–1968 | 15 | NAB | yes |
10 | Gruppo NPS | Ricerca 4 | 1965–1968 | 15 | NAB | no |

Leader: $p(L\mid L)$ | Leader: $p(L\mid \neg L)$ | Overdub: $p(O\mid O)$ | Overdub: $p(O\mid \neg O)$ |
---|---|---|---|
90.0% | 0.0% | 75.0% | 3.3% |

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## Share and Cite

**MDPI and ACS Style**

Verde, S.; Pretto, N.; Milani, S.; Canazza, S.
Stay True to the Sound of History: Philology, Phylogenetics and Information Engineering in Musicology. *Appl. Sci.* **2018**, *8*, 226.
https://doi.org/10.3390/app8020226
