Design and Application of the BiVib Audio-Tactile Piano Sample Library

Abstract: A library of piano samples composed of binaural recordings and keyboard vibrations has been built, with the aim of sharing accurate data that in recent years have successfully advanced the knowledge on several aspects of the musical keyboard and its multimodal feedback to the performer. All samples were recorded using calibrated measurement equipment on two Yamaha Disklavier pianos, one grand and one upright model. This paper documents the sample acquisition procedure, with related calibration data. Then, for sound and vibration analysis, it is shown how physical quantities such as sound intensity and vibration acceleration can be inferred from the recorded samples. Finally, the paper describes how the samples can be used to correctly reproduce binaural sound and keyboard vibrations. The library has potential to support experimental research about the psycho-physical, cognitive and experiential effects caused by the keyboard’s multimodal feedback in musicians and other users, or, outside the laboratory, to enable an immersive personal piano performance.


Introduction
During instrumental performance, musicians are exposed to auditory, visual and also somatosensory cues. This multisensory experience has long been studied [1][2][3][4][5]; however, the specific interaction between sound and vibrations has been the object of systematic research since the 1980s [6][7][8][9][10][11][12], when tactile and force feedback cues started to be recognized as having a prominent role in the complex perception-action mechanisms occurring during musical instrument playing [13]. More recently, research on the somatosensory perception of musical instruments has been consolidated, as testified by the emerging "musical haptics" topic [14].
This increased interest is partly due to the increased availability of accurate yet affordable sensors and actuators, capable of recording touch gestures and rendering vibrations and force, respectively. Using these devices, complex experimental settings can be realized to measure and deliver multisensory information in a musical instrument, often as a result of a real-time analysis and synthesis process taking place during the performance [15][16][17]. Once integrated with traditional audio microphone and loudspeaker systems, touch sensors and actuators can be employed first to investigate the joint role of the auditory and somatosensory modality in the perception of a musical instrument, and then to realize novel musical interfaces and instruments building on the lessons of the previous investigation. Through this process, richer or even unconventional feedback cues can be conveyed to the performer, with the aim of increasing engagement, and hence the initial acceptability and subsequent playability of the new instrument [18][19][20][21].
In this scenario, the availability of multimodal databases combining and synchronizing different streams of information (audio, video, kinematic data of the instrument and performer in action, physiological signals, interactions among musicians etc.) is increasingly recognized as an essential asset for studying music performance. Recent examples include the "multimodal string quartet performance dataset" (QUARTET) [22], the "University of Rochester multimodal music performance dataset" (URMP) [23], the "database for emotion analysis using physiological signals" (DEAP) [24], the "TU-Note violin sample library" [25]. Furthermore, initiatives aiming at systematizing the creation of these databases have recently appeared, such as RepoVizz [26], a framework for storing, browsing, and visualizing synchronous multimodal data. Overall, this scenario suggests an increasing attention to the design of musical instrument sound databases, motivated by the concrete possibility for their content to be reproduced in instrumental settings including not only loudspeakers and headphones, but also haptic and robotic devices.
The piano represents an especially relevant case study not only for its importance in the history of Western musical tradition, but also for its potential in the musical instruments market due to the universality of the keyboard interface, a feature that has traditionally induced novel musical instrument makers to propose conservative instances of the standard piano keyboard whenever this interface made it possible to control even revolutionary sound synthesis methods [27]. It is no accident that, also in recent years, sales of digital pianos and keyboard synthesizers have shown a growing trend as opposed to other instrument sales (https://www.namm.org/membership/global-report). For the same reason, researchers on new musical instruments have steadily elected the piano keyboard as the platform of choice for designing expansions of the traditional paradigm, which allows a performer to accurately play two local selections of 88 available tones with the desired amplitude and temporal development [28,29].
When playing an acoustic piano, the performer is exposed to a variety of auditory, visual, somatosensory, and vibrotactile cues that combine and integrate to shape the pianist's perception-action loop. The present authors are involved in a long-term research collaboration around this topic, with a focus on two main aspects. The first one is the tactile feedback produced by keyboard vibrations that reach the pianist's fingers after keystrokes and remain active until key release. The second one is the auditory spatial information in the sound field radiated by the instrument at the performer's head position. Binaural piano tones are offered by a few audio plugin developers (e.g., Modartt Pianoteq (https://www.pianoteq.com/)) and digital piano manufacturers (e.g., Yamaha Clavinova (https://europe.yamaha.com/en/products/musical_instruments/pianos/clavinova/)), but they have limited flexibility of use. Free binaural piano samples can be found, too, such as the "binaural upright piano" library (https://www.michaelpichermusic.com/binaural-upright-piano), which however offers only three dynamic levels (as opposed to the ten levels provided by the present dataset). More generally, concerning the presentation of audio-tactile piano tones, the existing literature is scarce and provides mixed if not contradictory results about the actual perceptibility and possible relevance of this multisensory information [8]. Specific discussions of these aspects have been provided in previously published studies, regarding both sound localization [30] and vibration perception [12] on the acoustic piano. As a notable result of these studies, a digital piano prototype was developed that reproduces various types of vibrations [20], including those generated by acoustic pianos as an unavoidable by-product of their sound production.
Across the years, this research has produced an extensive amount of experimental data, most of which result from highly accurate measurements with calibrated devices. In an effort to provide public access to such data, the authors of this paper present a dataset of audio-tactile piano samples organized as libraries of synchronized binaural sound and vibration signals.
The dataset contains samples relative to all 88 tones, played at different dynamics on two instruments: a grand and an upright piano. A preliminary version was presented at a recent conference [31], and has now been updated by a new release containing additional upright piano sounds along with the binaural impulse responses of the room in which the same piano was recorded. In order to use this dataset, it is necessary to take into account the recording conditions and, on the user side, to take control of the rendering system in an effort to match the reproduction to the characteristics of the original recorded signals. For this reason, Section 2 describes the hardware/software recording setup and the organization of samples into libraries for the free version of a popular music software sampler. An explanation about how to reproduce the database is provided in Section 3. Section 4 suggests some applications of this library, based on the authors' past design experiences with multimodal piano samples, and on more experiences that are foreseen for future research.
Among past experiences, the most prominent have consisted of studies on the role of haptic feedback during the performance, both vibratory and somatosensory: the former concerns the perception of keyboard vibrations, their accurate reproduction and their effects on the performance [12,20]; the latter concerns the influence of actively playing on the keyboard in the auditory localization process of piano tones [30,32]. Such experiences led to the decision to share the dataset with the scientific community, with the goal of fostering research on the role of vibrations and tone localization in the pianist's perceived instrument quality (both not completely understood yet), as well as adding knowledge about the importance, at the cognitive level, of multisensory feedback for its use in the design of novel keyboard interfaces.

Creation of the BiVib library
The BiVib (binaural and vibratory) sample library is a collection of high-resolution audio files (.wav format, 24-bit @ 96 kHz) containing binaural piano sounds and keyboard vibrations, coming along with documentation and project files for its reproduction through a free music software sampler. The dataset, whose core structure is illustrated in Table 1, is made available through an open-access data repository (https://zenodo.org/record/2573232) and released under a Creative Commons (CC BY-NC-SA 4.0) license.

Recording Hardware
The samples were recorded from two Yamaha Disklavier pianos: a grand model DC3 M4 located in Padova (PD), Italy, and an upright model DU1A with control unit DKC-850 located in Zurich (ZH), Switzerland. Disklaviers are Musical Instrument Digital Interface (MIDI from here on)-compliant acoustic pianos equipped with sensors for recording keystrokes and pedaling, and with electromechanical motors for playback. The grand piano is located in a large laboratory space (approximately 6 × 4 m), while the upright piano is in an acoustically treated small room (approximately 4 × 2 m).
Binaural audio recordings made use of dummy heads for acoustic measurements, with slightly different setups in PD and ZH: the grand piano was recorded with a KEMAR 45BM (GRAS Sound & Vibration A/S, Holte, Denmark), whereas the upright piano was recorded with a Neumann KU 100 (Georg Neumann GmbH, Berlin, Germany). Both mannequins were placed in front of the pianos, approximately where the pianist's head is located on average (see Figure 1). The dummy heads were connected to the microphone inputs of two professional audio interfaces, an RME Fireface 800 (PD, gain set to +40 dB) (Audio AG, Haimhausen, Germany) and an RME UCX (ZH, gain set to +20 dB) (Audio AG, Haimhausen, Germany). The pair of condenser microphones inside the dummy heads were respectively driven by two 26CB preamplifiers supplied by a 12AL power module (PD), and by 48V phantom power provided by the audio interface (ZH).
Three configurations of the keyboard lid were selected for each piano. The grand piano (PD) was measured with the lid closed, fully open, and removed (i.e., detached from the instrument). The upright piano was recorded with the lid closed, in semi-open position (see Figure 1), and fully open. Different lid configurations in fact add insight on the role of the mechanical noise coming from the moving keys in the creation of transient cues of lateralization in the sound field reaching the performer's ears [30]. As a result, three sets of binaural samples were recorded for both pianos, one set for each lid position.
Vibration recordings were acquired with a Wilcoxon Research 736 (Wilcoxon Sensing Technologies Inc., Amphenol, MD, USA) piezoelectric accelerometer connected to a Wilcoxon Research iT100M (Wilcoxon Sensing Technologies Inc., Amphenol, MD, USA) intelligent transmitter, whose AC-coupled output fed one line input of the audio interface. The accelerometer was manually attached with double-sided adhesive tape to each key in sequence, as shown in Figure 2. Its center was positioned 2 cm from the key edge, where most vibration modes are radiated efficiently, and where pianists typically put their fingertip.

Recording Software
Ten values of dynamics were chosen between MIDI key velocity 12 and 111 by evenly splitting this range into nine intervals of 11 velocity units each. This choice was motivated by a previous study by the present authors, which reported that both Disklaviers produced inconsistent dynamics outside this velocity range [12]. In general, the servo-mechanics of computer-controlled pianos fall short (to an extent that depends on the model) of providing a reliable response at extreme dynamic values [33]. Under this assumption, two different software setups were used respectively for sampling sounds and vibrations.
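As a quick check of the arithmetic, ten evenly spaced values spanning the velocity range 12 to 111 (both ends included) are separated by a constant step of 11. The following Python sketch, whose function name is purely illustrative, lists them:

```python
# Ten evenly spaced MIDI key velocities between 12 and 111 (inclusive).
V_MIN, V_MAX, N_LEVELS = 12, 111, 10

def velocity_levels(v_min=V_MIN, v_max=V_MAX, n=N_LEVELS):
    """Return n evenly spaced integer velocities from v_min to v_max."""
    span = v_max - v_min
    if span % (n - 1) != 0:
        raise ValueError("range does not split evenly into n - 1 steps")
    step = span // (n - 1)  # 99 // 9 = 11 for the default values
    return [v_min + i * step for i in range(n)]

print(velocity_levels())
# → [12, 23, 34, 45, 56, 67, 78, 89, 100, 111]
```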
Binaural samples were recorded via an automatic procedure programmed in SuperCollider (a programming environment for sound processing and algorithmic composition: http://supercollider.github.io/). The recording sessions took place at night time, thus minimizing unwanted noise coming from human activity in the building. On the grand piano, note lengths were determined algorithmically depending on their dynamics and pitch, ranging from 30 s when A0 was played at key velocity 111, to 10 s when C8 was played at key velocity 12. In fact, notes of increasing pitch and/or decreasing dynamics have shorter decay times. These durations allow each note to fade out completely, while minimizing silent recordings and the overall duration of the recording sessions. Still, each session lasted approximately six hours.
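The exact rule used to compute note lengths is not reported here. Purely as an illustration, the Python sketch below assumes a linear interpolation with equal weights for pitch and dynamics, which reproduces the two stated extremes (30 s for A0 at velocity 111, 10 s for C8 at velocity 12). The equal weighting and the function name are assumptions, not the authors' procedure.

```python
A0_MIDI, C8_MIDI = 21, 108   # MIDI note numbers of the lowest and highest keys
V_MIN, V_MAX = 12, 111       # velocity range used for the recordings
T_MIN, T_MAX = 10.0, 30.0    # shortest and longest note lengths, in seconds

def note_length(midi_note, velocity):
    """Hypothetical note length (s): longer for low pitch and high velocity."""
    pitch_term = (C8_MIDI - midi_note) / (C8_MIDI - A0_MIDI)  # 1 at A0, 0 at C8
    vel_term = (velocity - V_MIN) / (V_MAX - V_MIN)           # 0 at 12, 1 at 111
    weight = 0.5 * (pitch_term + vel_term)                    # assumed equal weights
    return T_MIN + weight * (T_MAX - T_MIN)

print(note_length(A0_MIDI, 111), note_length(C8_MIDI, 12))
```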
By contrast, an undocumented protection mechanism on the upright piano prevents its electromechanical system from holding down the keys for more than about 17 s, thus making a complete decay impossible for some notes, especially at low pitches and high dynamics. In this case, for the sake of simplicity, all tones were recorded for as long as possible.
Vibration samples were recorded through a less sophisticated procedure. A digital audio workstation (DAW) software was used to play back all notes in sequence at the same ten MIDI velocity values as those used for the binaural audio recordings, using a constant duration of 16 s. This duration in fact is certainly greater than the time taken by any key vibration to decay below sensitivity thresholds [12,34].

Sample Pre-Processing
Because of the mechanics of piano keyboards and the intrinsic limitations of computer-controlled electro-mechanical actuation, a systematic delay is introduced while reproducing MIDI note ON messages, which mainly varies with key dynamics. For this reason, all recorded samples started with silence of varying durations, which had to be removed in view of their use in a sampler (see Section 2.4). Given the number of files that had to be pre-processed (880 for each set), an automated procedure was implemented in SuperCollider to cut the initial silence of each audio sample. The procedure analyzed the amplitude envelope, detected the position of the largest peak, and finally applied a short fade-in starting a few milliseconds before the peak.
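The silence-removal steps can be sketched as follows. This is a minimal Python illustration, not the SuperCollider code used by the authors: the envelope is approximated by the absolute sample value, and the parameter values (5 ms pre-peak margin, 2 ms linear fade-in) are assumptions chosen only for the example.

```python
def trim_leading_silence(samples, sample_rate, pre_peak_ms=5.0, fade_ms=2.0):
    """Cut the silence before the largest envelope peak and apply a fade-in."""
    if not samples:
        return []
    # Crude amplitude envelope: absolute value of each sample.
    envelope = [abs(s) for s in samples]
    # Index of the largest peak (first occurrence if there are ties).
    peak_index = max(range(len(envelope)), key=envelope.__getitem__)
    # Start a few milliseconds before the peak, clamped to the file start.
    start = max(0, peak_index - int(pre_peak_ms * sample_rate / 1000))
    trimmed = list(samples[start:])
    # Linear fade-in over the first fade_ms milliseconds of the trimmed signal.
    fade_len = min(len(trimmed), int(fade_ms * sample_rate / 1000))
    for i in range(fade_len):
        trimmed[i] *= i / fade_len
    return trimmed

# Toy signal at 1 kHz: 100 silent samples followed by a decaying onset.
signal = [0.0] * 100 + [1.0, 0.5, 0.25]
print(len(trim_leading_silence(signal, 1000)))
```

With real 96 kHz files the samples would first be read from the .wav file; that step is omitted here to keep the sketch self-contained.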
Additionally, vibration signals presented abrupt onsets in the first 200-250 ms right after the starting silence, as a consequence of the initial fly of the key and of the following impact against the piano keybed (see Figure 3, from [12]). These onsets are not related to keyboard vibrations, and therefore they had to be removed.
As such onset profiles showed large variability, in spite of several tests made in MATLAB, no reliable automated procedure could be realized for editing the vibration samples. A manual approach was employed instead: files were imported in a sound editor, their waveform was zoomed in and played back, and the onset was cut off.

Sample Library Organization
The sample library was organized for playback with the 'Kontakt Player' software, a free version of Native Instruments' Kontakt sampler (https://www.native-instruments.com/en/products/komplete/samplers/kontakt-5-player/), available for Windows and Mac OS systems. The full version of Kontakt 5 was instead used to develop Kontakt project files as described below. The resulting library is organized into several folders, named 'Instruments', 'Multis', 'Resources', and 'Samples'.
The 'Samples' folder, whose total size amounts to about 65 GB, holds separate subfolders for the binaural and vibration samples, which in turn contain further subfolders for each sample set (see Table 1). Following the terminology used in Kontakt, each instrument reproduces a sample set (e.g., binaural recording of the grand piano with lid open), while each multi combines two instruments respectively reproducing one binaural and one vibration sample set belonging to the same piano. The two instruments in each multi are configured so as to receive MIDI input data on channel 1, thus playing back at once, while their respective outputs are routed to different virtual channels in Kontakt: binaural samples are routed to a stereo pair of channels (numbered 1-2), while vibration samples are played through a mono channel (numbered 3). In this way, when using audio interfaces offering more than two physical outputs, it is possible to render synchronized binaural and vibrotactile cues at the same time by routing the audio signal to headphones and, in parallel, the vibration signal to an amplifier driving one or more actuators.
In each instrument, sample mapping was implemented relying on the 'auto-map' feature found in the full version of Kontakt: this parses file names and uses the recognized tokens for assigning samples to e.g., a pitch and velocity range. The chosen file naming template made it straightforward to batch-import the samples.
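To illustrate how tokens in a file name can drive sample mapping, the Python sketch below parses a hypothetical naming template of the form `<note><octave>_v<velocity>.wav` (e.g., `A0_v012.wav`). This template is an assumption made for the example; it is neither the one used in BiVib nor Kontakt's own parser. Note names are converted to MIDI note numbers with the common convention in which A0 is 21 and C8 is 108.

```python
import re

# Semitone offsets within an octave (sharps only, for simplicity).
SEMITONES = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
             "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}

# Hypothetical template: note name, octave digit, "_v", three-digit velocity.
NAME_PATTERN = re.compile(r"^([A-G]#?)(\d)_v(\d{3})\.wav$")

def parse_sample_name(file_name):
    """Return (midi_note, velocity) parsed from a hypothetical sample name."""
    match = NAME_PATTERN.match(file_name)
    if match is None or match.group(1) not in SEMITONES:
        raise ValueError(f"unrecognized sample name: {file_name}")
    note, octave, velocity = match.groups()
    midi_note = 12 * (int(octave) + 1) + SEMITONES[note]
    return midi_note, int(velocity)

print(parse_sample_name("A0_v012.wav"))   # → (21, 12)
print(parse_sample_name("C8_v111.wav"))   # → (108, 111)
```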
The amplitude of the recorded signals was not altered, that is, no dynamic processing or amplitude normalization was applied. The volume of all Kontakt instruments was set to 0 dB. Because of this setting and of the adopted velocity mapping strategy, sample playback was kept as transparent as possible for simplifying the setup of acoustic and vibratory analysis procedures, experiments and interactive applications (see Sections 3 and 4).

Application of the BiVib Library
Binaural piano tones such as those offered by Yamaha digital pianos or the Modartt Pianoteq software synthesizer are not fully suitable for research purposes, because their acquisition procedures and/or the post-processing of the sound signals are undocumented and hence not reproducible. Moreover, the present authors have no evidence of public samples including piano keyboard vibrations. Thus, BiVib fills two gaps found in the datasets currently available for the reproduction of piano feedback.

Sample Reproduction
Experiments and applications requiring the use of calibrated data need exact reconstruction of the measured signals at the reproduction side: acoustic pressure for binaural sounds, and acceleration for keyboard vibrations. Obviously the reproduction must take place on a set-up in which neither autonomous sounds nor vibrations are present. For instance, the reproduction of vibrations could take place on a weighted MIDI keyboard (such as those found in digital pianos), while binaural sounds may be rendered through headphones. Figure 4 (left) shows one such setting, in which a commercial digital piano is augmented through the setup schematized as in Figure 4 (right). Note that the bottom of the digital piano has been reinforced by substituting the keybed with a thicker wooden panel, to form what we will call a haptic digital piano from here.
Once the original measurements have been reconstructed, the physical quantities are, in principle, ready for presentation as sounds through headphones and as vibrations through tactile actuators, subject to the issues and limitations listed in the next section.

Sample Equalization and Limitations
BiVib users should hear the same sounds as the binaural pressure signals measured by the dummy head microphones, and should receive vibrations at the fingers which are identical to the acceleration signals measured by the accelerometer.
The audio reproduction condition can be satisfied by using headphones or earphones that bypass the outer ear and provide a frequency response that is as flat as possible. In this way, the performer would listen to stereo sounds that contain the binaural information created by the dummy head. Note that these sounds also contain the contribution of room reverberation, as the pianos could not be located inside anechoic rooms during the recording sessions: in particular, the laboratory hosting the grand piano was moderately reverberant, and no reverberation data could be collected for this room at the time of the acquisition of samples. Conversely, the upright piano (ZH) benefited from a silent studio room whose impulse responses have been measured and recently included in BiVib (folder 'IR'). More precisely, responses from two source points generated by a pair of Genelec 8040A loudspeakers were measured at the two ear canal entrances of the Neumann KU 100 dummy head with both outer ears removed. As shown in Figure 5, the loudspeakers were placed symmetrically at both sides of the upright piano, 80 cm above the floor, pointing toward the dummy head (at an angle of 47° with the vertical plane parallel to the piano), which was positioned 55 cm from the upright panel of the piano. Logarithmic sweeps were synthesized in the Audacity audio editor using the Aurora modules [35], then reproduced using the loudspeakers, and finally deconvolved again with Aurora to find four room transfer functions forming the source-ear transfer matrix. Such transfer functions were included in the dataset and can be used to remove the echoes of the recording room through inversion of the transfer matrix, using standard deconvolution techniques [36].
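As a simplified illustration of transfer-matrix inversion, the Python sketch below works on a single frequency bin. It assumes that the two ear signals Y are related to two source signals S by Y = H S, where H[i][j] is the complex transfer function from source j to ear i at that frequency. The function names and numeric values are illustrative only; a practical implementation would repeat the step for every bin of the measured responses and would typically regularize the inversion where the determinant is small.

```python
def invert_2x2(h, eps=1e-12):
    """Invert a 2x2 complex matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = h
    det = a * d - b * c
    if abs(det) < eps:
        raise ZeroDivisionError("transfer matrix is (nearly) singular at this bin")
    return [[d / det, -b / det], [-c / det, a / det]]

def apply_2x2(m, v):
    """Multiply a 2x2 matrix by a length-2 vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

# Illustrative values for one frequency bin (not measured data).
H = [[1.0 + 0.5j, 0.2 - 0.1j],
     [0.3j, 0.8 + 0.0j]]
sources = [1.0 - 1.0j, 0.5 + 2.0j]

ears = apply_2x2(H, sources)                # simulated ear signals, Y = H S
recovered = apply_2x2(invert_2x2(H), ears)  # S recovered as inverse(H) Y
print(all(abs(r - s) < 1e-9 for r, s in zip(recovered, sources)))
```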
Because the echoes of the recording room have low energy, the binaural recordings were intentionally left unprocessed, thus allowing musicians and performers to use them as they are, while leaving any decision on their possible manipulation to advanced users.
The vibration reproduction condition may be ideally satisfied by implementing a weighted MIDI keyboard (such as those found on digital pianos) where each key is provided with an actuator, similar to the prototype technology described in [37]. Even in the unlikely case that such actuators offered a flat frequency response over the range of interest for tactile sensation [38], vibration reproduction would be altered by the shape, material and construction of the keyboard. This calls for a compensation procedure, similar to that described above for removing room acoustic components, which would enable the accurate reproduction of the vibrations recorded by the accelerometer on each key. Unfortunately, on such a keyboard there would be 88 touch points requiring deconvolution. Alternatively, two arrays of 88 transfer characteristics can be formed, each by measuring vibrations on the same touch points after exciting the digital piano body with a corresponding pair of tactile transducers. Figure 6 (left) [12] shows a detail of the haptic digital piano that the authors realized according to the latter solution, where two transducers were mounted below the keybed, one midway along the keyboard length and another at approximately one quarter of the same length (see also Figure 4 (right)). By playing an impulse simultaneously on both of them, advanced BiVib users can accurately measure the corresponding impulse response on each key, and then (with Aurora or similar tools) design an equalizer which on average flattens the "coloured" response affecting the vibrations due to their path from the audio interface to the keyboard.
Since such measurements depend on the user's musical keyboard, they could not be included in BiVib, unlike the binaural room transfer functions.
However, in the aforementioned digital piano customization it was experimentally observed by the authors that the tactile transducers shown in Figure 6 (left) were mostly responsible for the inclusion of distinct spectral peaks affecting all the key responses. Hence, the average equalization curve shown in Figure 6 (right) was estimated, and proved to be effective for flattening the peaks measured on an evenly-spaced subset of the keys. A similar curve can be approximated by BiVib users by designing an equalizer that flattens the frequency response of the tactile transducers equipping the setup. This response is normally reported in a good quality transducer's data sheet.
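A flattening equalizer of this kind can be approximated from a transducer's magnitude response read off a data sheet. The Python sketch below assumes the response is available as levels in dB at a few frequencies and computes per-band correction gains toward the mean level, limited to a maximum boost and cut. The response values, the limits and the function name are hypothetical and are not taken from the authors' setup.

```python
def flattening_gains_db(response_db, max_boost_db=12.0, max_cut_db=12.0):
    """Per-band gains (dB) that bring response_db toward its mean level."""
    target = sum(response_db.values()) / len(response_db)
    gains = {}
    for freq_hz, level_db in response_db.items():
        gain = target - level_db  # boost dips, attenuate peaks
        gains[freq_hz] = max(-max_cut_db, min(max_boost_db, gain))
    return gains

# Hypothetical transducer response: frequency (Hz) -> level (dB).
response = {50: -6.0, 100: 0.0, 200: 6.0}
print(flattening_gains_db(response))
# → {50: 6.0, 100: 0.0, 200: -6.0}
```

The resulting gains could then be entered in any parametric or graphic equalizer placed on the vibration output channel.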

Experiments, Applications and Future Work
BiVib was originally created to support multisensory experiments in which precise control had to be maintained over the simultaneous auditory and vibrotactile stimuli reaching a performing pianist, particularly when judging the perceived quality of an instrument. In a recent paper, the authors were able to confirm subjective vibrotactile frequency thresholds of active touch [34] by conducting tests with pianists who were asked to detect vibrations at the piano keyboard [12]. A comparison between such thresholds and the spectrum of a lower fortissimo A0 tone is shown in Figure 7. This demonstrates a previously unreported result: during active playing, pianists are able to integrate tactile sensations that would be imperceptible in passive touch conditions [38].
Figure 7. The two horizontal dashed lines represent the minimum and maximum thresholds recently measured under active touch conditions [34]. Picture adapted from [12].
A precise manipulation of the intensity relations between piano sound and vibrations may be used to investigate the existence of cross-modal effects occurring during piano playing. Such effects have been discovered as part of a more general multisensory integration mechanism [39] that under certain conditions can increase the perceived intensity of auditory signals [40], or conversely enhance touch perception [41]. Aiming to understand whether piano keyboard vibrations impact the perceived quality of the instrument and, as a secondary effect, the quality of a performance, the authors have first observed significant differences in the perceived quality of vibrating vs. silently playing Disklavier pianos. This observation marks a point in favor of the former, especially since the pianists involved in the experiment reported being unaware of the existence of vibratory feedback [12]. In other words, while most pianists preferred the vibrating instrument, they did not consciously realize that their decision was caused by the vibrations produced (or not produced) by the instrument during the performance. Based on this result, a further experiment was designed for the haptic digital piano using diverse types of tactile feedback, synthesized by manipulating the BiVib samples. The test aimed at investigating possible consequences of the vibrotactile feedback on the pianist's playing experience (qualitative effect) and on the performance in terms of timing and dynamics accuracy (quantitative effect). Cross-modal effects resulting from varying the tactile feedback of the keyboard were observed; still, these preliminary results are far from giving a systematic view of the impact of the different sensory channels on the pianist's playing experience, and especially on the accuracy of execution [20].
Another potential use of BiVib is in the investigation of binaural spatial cues for the acoustic piano. Using the recommendations given in the previous section, single tones of the upright piano can in fact be accurately cleared of the room echoes, and then be reproduced in the position of the ear entrance. The existence of localization cues in piano sounds has not been completely understood yet. Even in pianos where these cues are reported to be audible by listeners, their exact acoustic origin is still an open question [42]. Moreover, visual cues of self-moving keys (a condition possible on Disklavier pianos) producing the corresponding tones, as well as somatosensory cues occurring during active piano playing, may have an influence on localization judgments [30,32].
One further direction which may take advantage of BiVib deals with research in cognitive neuroscience: recently, pianists and their instrument served as key subjects for understanding diverse aspects of brain and motor development [43,44]. In this context, the contribution of the auditory, visual and tactile sensory modalities to this development has not been ascertained yet. Such knowledge could be of help not only to capture more general aspects of the development of human senses, but also to guide a perceptually and cognitively informed design of novel keyboard interfaces.
In summary, future research that can be conducted in the laboratory using BiVib includes tests aiming at conclusively understanding whether (i) vibrations affect the performance on the keyboard, and whether (ii) auditory lateralization is able to guide piano tone localization. If so, such multisensory cues may substantially contribute to the sense of engagement and, hence, improve the quality of the performance and make the learning curve of a keyboard interface more acceptable.
Outside the laboratory, the library can reward musicians who simply wish to use its sounds. On the one hand, the grand piano recordings contain echoes that bring distinct cues of the room where they have been recorded. In this sense they are ready for use, although labeled by a precise acoustic footprint. On the other hand, the upright piano recordings are much more anechoic and, hence, far less difficult to spatialize than piano recordings taken outside an acoustically controlled room [45]. Consequently, they can be easily imported and conveniently personalized by users through artificial reverberation.

Conclusions
BiVib provides a unique set of multimodal piano data recorded using high-quality equipment in controlled conditions through reproducible computer-controlled procedures. Since its original release, the library has been enriched with binaural responses of the room where the upright piano was recorded. We hope that a successful use of the BiVib dataset, in conjunction with this documentation and through publicly available projects for the Kontakt software sampler, will facilitate further research in piano acoustics, performance, and new musical interface design also for educational purposes.
Funding: This research was partially funded by Swiss National Science Foundation grants number 150107 and 178972. donated the wooden panel used to build the haptic digital piano. The authors would like to thank several students and collaborators who contributed to the development of this work along the years, in chronological order: Francesco Zanini, Valerio Zanini, Andrea Ghirotto, Devid Bianco, Lorenzo Malavolta, Debora Scappin, Mattia Bernardi, Francesca Minchio, Martin Fröhlich.

Conflicts of Interest: The authors declare no conflict of interest.