Microphone and Loudspeaker Array Signal Processing Steps towards a “Radiation Keyboard” for Authentic Samplers

Abstract: To date electric pianos and samplers tend to concentrate on authenticity in terms of temporal and spectral aspects of sound. However, they barely recreate the original sound radiation characteristics, which contribute to the perception of width and depth, vividness and voice separation, especially for instrumentalists, who are located near the instrument. To achieve this, a number of sound field measurement and synthesis techniques need to be applied and adequately combined. In this paper we present the theoretic foundation to combine so far isolated and fragmented sound field analysis and synthesis methods to realize a radiation keyboard, an electric harpsichord that approximates the sound of a real harpsichord precisely in time, frequency, and space domain. Potential applications for such a radiation keyboard are conservation of historic musical instruments, music performance, and psychoacoustic measurements for instrument and synthesizer building and for studies of music perception, cognition, and embodiment.


Introduction
Synthesizers tend to focus on timbral aspects of sound, which comprise temporal and spectral features [1,2]. This is even true for modern synthesizers that imitate musical instruments by means of physical modeling [3,4]. Many samplers and electric pianos on the market use stereo recordings or pseudostereo techniques [5,6] to create some perceived spaciousness in terms of apparent source width or perceived source extent, so that the sound appears more natural and vivid. However, such techniques do not capture the sound radiation characteristics of musical instruments, which may be essential for an authentic experience in music listening and musician-instrument interaction.
Most sound field synthesis approaches synthesize virtual monopole sources or plane waves by means of loudspeaker arrays [7,8]. Methods to incorporate the sound radiation characteristics of musical instruments are based on sparse recordings of the sound radiation characteristics [5], like far field recordings from circular [9,10] or spherical [11] microphone arrays with 24 to 128 microphones. In these studies, a nearfield mono recording is extrapolated from a virtual source point. However, instead of a monopole point source, the measured radiation characteristic is included in the extrapolation function, yielding a so-called complex point source [9,12,13]. Complex point sources are a drastic simplification of the actual physics of musical instruments. However, complex point sources were demonstrated to create plausible physical and perceptual fields [5,14]. These sound natural in terms of source localization, perceived source extent and timbre, especially when listeners and/or sources move during the performance [5,15-20].
To date, sound field synthesis methods to reconstruct the sound radiation characteristics of musical instruments do not incorporate exact nearfield microphone array measurements of musical instruments, as described in [21-25]. This is most likely because the measurement setup and the digital signal processing for high-precision microphone array measurements are very complex on their own. The methods include optimization algorithms and solutions to inverse problems. The same is true for sound field synthesis approaches that incorporate complex source radiation patterns.
In this paper we introduce the theoretic concept of a radiation keyboard. We describe on a theoretical basis, and with some practical considerations, which sound field measurement and synthesis methods should be combined, and how to combine them utilizing their individual strengths. All presented results are preparatory for the realization. In contrast to conventional samplers, electric pianos, etc., a radiation keyboard recreates not only the temporal and spectral aspects of the original instrument, but also its spatial attributes. The final radiation keyboard is basically a MIDI keyboard whose keys trigger different driving signals of a loudspeaker array in real-time. When playing the radiation keyboard, the superposition of the propagated loudspeaker driving signals should create the same sound field as the original harpsichord would do. Thus, the radiation keyboard will create a more realistic sound impression than conventional, stereophonic samplers. This is especially true for musical performance, where instrumentalists move their heads. The radiation keyboard can serve, for example,
• as a means to produce authentic sounding replicas of historic musical instruments in the context of cultural heritage preservation [26,27],
• as an authentic and immersive alternative to physical modeling synthesizers, conventional samplers, electrical pianos (or harpsichords, respectively) [3,4,28],
• as a research tool for instrument building [29], interactive psychoacoustics [30], and embodied music interaction [31].
The remainder of the paper is organized as follows. Section 2 describes all the steps that are carried out to measure and synthesize the sound radiation characteristics of a harpsichord. In Section 2.1, we describe the setup to measure impulse responses of the harpsichord and the radiation keyboard loudspeakers. These are needed to calculate impulse responses that serve as raw loudspeaker driving signals. For three different frequency regions, f1 to f3, different methods are ideal to calculate these impulse responses. In Sections 2.2-2.4, we describe how to derive the loudspeaker impulse responses for frequency regions f1, f2 and f3. In Section 3, we describe how to combine the three frequency regions, and how to create loudspeaker driving signals that synthesize the original harpsichord sound field during music performance. After a summary and conclusion in Section 4, we discuss potential applications of the radiation keyboard in the outlook Section 5.

Method
The concept and design of the proposed radiation keyboard are illustrated in Figure 1. The sound field radiated by a harpsichord is analyzed by technical means. Then, this sound field is synthesized by the radiation keyboard. The radiation keyboard consists of a loudspeaker array whose driving signals are triggered by a MIDI keyboard. The superposition of the propagated loudspeaker driving signals creates the same sound field as a real harpsichord. To date, no sound field synthesis method is able to radiate all frequencies in the exact same way the harpsichord does. Therefore, we combine different sound field analysis and synthesis methods. This combination offers an optimal compromise: low frequencies f1 < 1500 Hz are synthesized with high precision in the complete half-space above the sound board, mid frequencies 1500 Hz ≤ f2 ≤ 4000 Hz are synthesized with high precision within an extended listening region, and higher frequencies f3 > 4000 Hz are synthesized with high precision at discrete listening points within the listening region.
Figure 1. Design and concept of the radiation keyboard (right). A MIDI keyboard triggers individual signals for 128 loudspeakers, which are arranged like a harpsichord. Replacing the real harpsichord (left) by the radiation keyboard creates only subtle audible differences. Unfortunately, the radiation keyboard cannot synthesize the harpsichord sound in the complete space. Low frequencies f1 are synthesized with high precision in the complete half-space above the loudspeaker array (light blue zone). Mid frequencies are synthesized with high precision in the listening region (green zone) in which the instrumentalist is located. The sound field of very high frequencies is synthesized with high precision at discrete listening points within the listening region (red dots).
To implement a radiation keyboard, four main steps are carried out. Figure 2 shows a flow diagram of the main steps: firstly, the sound radiation characteristics of the harpsichord are measured by means of microphone arrays. Secondly, an optimal constellation of loudspeaker placement and sound field sampling is derived from these measurements. As the third step, the impulse responses for the loudspeaker array are calculated. These serve as raw loudspeaker driving signals. Finally, loudspeaker driving signals are calculated by a convolution of harpsichord source signals with the array impulse responses. These driving signals are triggered by a MIDI keyboard and play in real-time. The superposition of the propagated driving signals synthesizes the complex harpsichord sound field.
To synthesize the sound field, it is meaningful to divide the harpsichord signal into three frequency regions: frequency region f1 lies below 1.5 kHz, the Nyquist frequency of the proposed loudspeaker array. Frequency region f2 ranges from 1.5 kHz to 4 kHz, the Nyquist frequency of the microphone array. Frequency region f3 lies above these Nyquist frequencies. Different sound field measurement and synthesis methods are optimal for each region. They are treated separately in the following sections.
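The division into three frequency regions can be written down as a small lookup. The following is only a minimal sketch: the function name `frequency_region` and the `METHODS` table are illustrative assumptions, while the limits of 1.5 kHz and 4 kHz and the assignment of the boundaries to f2 are taken from the text.

```python
def frequency_region(freq_hz, f1_limit=1500.0, f2_limit=4000.0):
    """Return the region label ('f1', 'f2' or 'f3') for a frequency in Hz."""
    if freq_hz < f1_limit:
        return "f1"          # below the loudspeaker array Nyquist frequency
    if freq_hz <= f2_limit:
        return "f2"          # up to the microphone array Nyquist frequency
    return "f3"              # above both Nyquist frequencies


# Where each region is synthesized with high precision (labels are illustrative).
METHODS = {
    "f1": "back propagation, valid in the half-space above the sound board",
    "f2": "radiation method, valid within the listening region",
    "f3": "radiation method, valid at discrete listening points",
}

print(frequency_region(440.0), "->", METHODS[frequency_region(440.0)])
```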

Setup
The setup for the impulse response measurements is illustrated in Figure 3 for a piano under construction. In the presented approach the piano is replaced by a harpsichord. An acoustic vibrator excites the instrument at the termination point between the bridge and a string for each key. Successive microphone array recordings are carried out in the near field to sample the sound field at M = 1500 points parallel to the sound board. In addition to the near field recordings, the microphone array samples the listening region. The head of the instrumentalist will be located in this region during the performance (ear canal distance to keyboard y ≈ 0.37 m, ear canal distance to ground z ≈ 1.31 m for an adult). The location is indicated by black dots in Figure 4. A lightweight piezoelectric accelerometer measures the vertical polarization of the transverse string acceleration $h(\kappa_a, t)$ at the intersection point between string and bridge for each of the A = 62 keys. This is not illustrated in Figure 3 but indicated as brown dots in Figure 5. The acceleration measured by the sensor is proportional to the force acting on the bridge. Details on the setup can be found in [32,33]. Alternatively, $h(\kappa_a, t)$ can be recorded optically, using a high speed camera and the setup described in [34,35], or it can be synthesized from a physical plectrum-string model [36]. The string recording represents the source signal that excites the harpsichord.
Figure 5. Procedure to derive the impulse response $R_{f1}(\Theta_l, \kappa_a, t)$ for each loudspeaker and pressed key in the radiation keyboard. The brown curve represents the bridge, the brown dots depict exemplary excitation points. The black dots represent microphones near the soundboard, the light gray dots represent equivalent sources on the soundboard. The gray dots represent a regular subset of equivalent sources, which are replaced by loudspeakers (circles) in the radiation keyboard.
Measuring string vibrations separately from the impulse response measurements of the sound board adds a lot of flexibility to the radiation keyboard. We derive impulse responses for the loudspeaker array of the radiation keyboard, such that the array radiates all frequencies the same way the harpsichord would. Consequently, any arbitrary source signal can serve as an input for the radiation keyboard. In addition to the measured harpsichord string acceleration $h(\kappa_a, t)$, the radiation keyboard can load any sound sample, such as alternative harpsichord tunings, alternative instrument recordings, or arbitrary test signals. Figure 4 illustrates the radiation keyboard. A rigid board in the shape of the harpsichord sound board serves as a loudspeaker chassis. A regular grid of loudspeakers is arranged on this chassis. The radiated sound field created by each single loudspeaker is recorded in the listening region.

Low Frequency Region f1
The procedure to calculate the loudspeaker impulse responses for frequency region f1 is illustrated in Figure 5. Firstly, impulse responses of the harpsichord are recorded in the near field. Next, the recorded sound field is propagated back to M = 1500 points on the harpsichord sound board. Then, an optimal subset of these points is identified. This subset determines the loudspeaker distribution of the radiation keyboard.

Nearfield Recording
The setup for the near field recordings is illustrated in Figure 3. A microphone array $X_{\mathrm{near},m}$ with equidistant microphone spacing of 40 mm is installed at a distance of 5 cm parallel to the harpsichord soundboard surface. The index m = 1, ..., 1500 describes a microphone position above the harpsichord.
An acoustic vibrator excites the instrument at the string termination point $\kappa_a$, i.e., the intersection point of string and bridge [32]. Here, the index a = 1, ..., A describes the pressed key. For a harpsichord with 5 octaves, A = 62 keys exist.
To obtain impulse responses from the recorded data the so-called exponential sine sweep (ESS) technique is utilized [37]. The method was originally proposed for measurements of weakly non-linear systems in room acoustics (e.g., loudspeaker excitation in a concert hall) but can also be adapted to structure-borne sound [38]. For the excitation, an exponential sine sweep $s(t)$ is used, as defined in Equations (1) and (2). Here, $\omega_1 = 2\pi$ rad s$^{-1}$ is the starting frequency, $\omega_2 = 2\pi \times 24{,}000$ rad s$^{-1}$ is the maximum frequency and T = 25 s is the signal duration. The vibrator excites the sound board with this signal. Figure 6 shows the spectrogram of an exemplary microphone recording $p(X_1, t)$. Since the frequency axis has logarithmic scaling, the sweep appears as a straight line. Due to non-linearity in the shaker excitation the recording shows harmonic distortions parallel to the sinusoidal sweep. A deconvolution process eliminates these distortions. The deconvolution is realized by a linear convolution of the measured output $p(X_m, t)$ with the function defined in Equation (3). Here, $s^{-1}(t)$ is the temporal reverse of the excitation sweep signal (2) and $b(t)$ is an amplitude modulation that compensates the energy generated per frequency, reducing the level by 6 dB/octave, starting with 0 dB at t = 0 s and ending with $-6 \log_2(\omega_2/\omega_1)$ dB at t = T, as expressed in Equation (4). This linear deconvolution delays $s(t)$ by an amount of time that varies with frequency. The delay is proportional to the logarithm of frequency. Therefore, $s^{-1}(t)$ stretches the signal with a constant slope, and compresses the linear part to a time delay corresponding to the filter length. The harmonic distortions have the same slope as the linear part and are, therefore, also packed to very precise times. If T is large enough, the linear part of an impulse response is clearly separated in time from the non-linear pseudo-IRs.
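As an illustration of the excitation and the amplitude modulation described above, the following sketch generates an exponential sine sweep and its time-reversed, level-compensated counterpart. It uses the standard ESS form from the literature on the technique [37], which is assumed here and may be written differently in the equations of this paper. All function names are hypothetical. The envelope is chosen so that its level falls linearly in dB from 0 dB at t = 0 to about $-6 \log_2(\omega_2/\omega_1)$ dB at t = T.

```python
import math

F_START, F_STOP, DURATION = 1.0, 24000.0, 25.0  # Hz, Hz, s (values from the text)


def ess_sample(t, f_start=F_START, f_stop=F_STOP, duration=DURATION):
    """Exponential sine sweep s(t) in its standard form (assumed)."""
    rate = math.log(f_stop / f_start)  # ln(w2 / w1)
    phase = 2.0 * math.pi * f_start * duration / rate * (
        math.exp(t / duration * rate) - 1.0)
    return math.sin(phase)


def instantaneous_frequency(t, f_start=F_START, f_stop=F_STOP, duration=DURATION):
    """Frequency of the sweep in Hz at time t (derivative of the phase / 2 pi)."""
    return f_start * (f_stop / f_start) ** (t / duration)


def envelope(t, f_start=F_START, f_stop=F_STOP, duration=DURATION):
    """Amplitude modulation b(t): 1 at t = 0, falling by about 6 dB per octave."""
    return (f_stop / f_start) ** (-t / duration)


def envelope_db(t, f_start=F_START, f_stop=F_STOP, duration=DURATION):
    """Level of b(t) in dB."""
    return 20.0 * math.log10(envelope(t, f_start, f_stop, duration))


def inverse_filter(fs, f_start=F_START, f_stop=F_STOP, duration=DURATION):
    """Time-reversed sweep multiplied by b(t), sampled at fs Hz."""
    n = int(round(fs * duration))
    return [
        ess_sample((n - 1 - i) / fs, f_start, f_stop, duration)
        * envelope(i / fs, f_start, f_stop, duration)
        for i in range(n)
    ]
```

Convolving a recording with the list returned by `inverse_filter` would then give the deconvolved signal. For the full 25 s sweep at audio sample rates an FFT-based convolution is the practical choice.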
This deconvolution process yields one signal $q'(X_m, t)$, Equation (5). This signal $q'(X_m, t)$ consists of the linear impulse response $q(X_m, t)$ preceded by the nonlinear distortion products, i.e., the pseudo-IRs. An example of $q'(X_1, t)$ and $q(X_1, t)$ is illustrated in Figure 7. The linear impulse response part can be obtained by a peak detection searching for the last peak in the time series. In the figure the final, linear impulse response $q(X_1, t)$ is highlighted in red.
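The peak detection mentioned above can be sketched as follows. This is a minimal illustration under assumptions that are not stated in the text: a peak counts only if it reaches a relative threshold of the global maximum, and a few samples before the last such peak are kept as pre-roll. The name `extract_linear_ir` and both parameters are hypothetical.

```python
def extract_linear_ir(signal, rel_threshold=0.5, pre_roll=2):
    """Cut the deconvolved signal at its last prominent peak.

    The pseudo-IRs of the harmonic distortions precede the linear impulse
    response, so the last local maximum above the threshold marks its onset.
    """
    peak = max(abs(x) for x in signal)
    last = None
    for i in range(1, len(signal) - 1):
        a = abs(signal[i])
        is_local_max = a >= abs(signal[i - 1]) and a > abs(signal[i + 1])
        if is_local_max and a >= rel_threshold * peak:
            last = i
    if last is None:
        raise ValueError("no peak above the threshold found")
    start = max(last - pre_roll, 0)
    return signal[start:]


# Synthetic example: two distortion peaks followed by the linear part.
example = [0.0] * 80
example[10], example[30] = 0.3, 0.6          # pseudo-IRs
example[60], example[61], example[62] = 1.0, 0.4, 0.1  # linear impulse response
linear_part = extract_linear_ir(example)
```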
The driving signal and the convolution are reproducible. Repeated measurements are carried out to sample the radiated sound field. To cover each key and sample point, this yields M × A = 93,000 recordings.
Figure 7. Recording from Figure 6 after deconvolution. The harmonic distortions are separated in time and precede the linear part $q(X_m, t)$ (red), which starts at t ≈ 3.5 s.

Back Propagation
The harpsichord soundboard is a continuous radiator of sound, but can be simplified as a discrete distribution of N = 1500 radiating points $Y_n$, referred to as equivalent sources [23]. These equivalent sources sample the vibrating sound board. The validity of this simplification is restricted by the Nyquist-Shannon theorem, i.e., at least two equivalent sources per wavelength are necessary. The following steps are frequency-dependent. Therefore, we transform functions of time into the frequency domain using the discrete Fourier transform. Terms in the frequency domain are indicated by capital letters and the ω in the argument. For example, Q(ω) represents the frequency spectrum of q(t).
The relationship between the radiating soundboard $Q(Y_n, \omega)$ and the spectra of the aligned, linearized microphone recordings $Q(X_m, \omega)$ is described by the linear equation system $Q(X_m, \omega) = G(r, \omega) \times Q(Y_n, \omega)$ (6), where $G(r, \omega)$, Equation (7), is the free-field Green's function. It is a complex transfer function that describes the sound propagation of the equivalent sources as monopole radiators. Here, the term $r = \lVert X_m - Y_n \rVert_2$ is the Euclidean distance between equivalent sources and microphones, k is the wave number and i the imaginary unit. Equation (6) is closely related to the Rayleigh integral, which is applied in acoustical holography and sound field synthesis approaches, like wave field synthesis and ambisonics [39]. One problem with Equation (6) is that the linear equation system is ill-posed. The radiated sound $Q(X_m, \omega)$ is recorded but the source sound $Q(Y_n, \omega)$, which created the recorded sound pressure distribution, is sought. When solving the linear equation system, e.g., by means of Gaussian elimination or an inverse matrix of $G(r, \omega)$ [40], the resulting sound pressure levels tend to be huge due to small numerical errors, measurement and equipment noise. This can be explained by the propagation matrix being ill-conditioned when microphone positions are close to one another compared to the considered wavelength. In this case the propagation matrix condition number is high. A regularization method relaxes the matrix and yields lower amplitudes. An overview of regularization methods can be found in [21,23,40]. For musical instruments, the Minimum Energy Method (MEM) [23,41] is very powerful. The MEM is an iterative approach, gradually reshaping the radiation characteristic of $G(r, \omega)$ from a monopole at Ω = 0 to a ray at Ω = ∞ using the formulation of Equation (8), where $\Psi(\alpha, \omega, \Omega)$ is multiplied by $G(r, \omega)$ in Equation (6) to reshape the complex transfer function, i.e., $Q(X_m, \omega) = G(r, \omega) \times \Psi(\alpha, \omega, \Omega) \times Q(Y_n, \omega)$ (9).
In Equations (8) and (9), α describes the angle between equivalent sources $Y_n$ and microphones $X_m$ as the inner product of both position vectors, Equation (10). The angle α is given by the constellation of source and receiver positions and is 1 in the normal direction n of the considered equivalent source position and 0 in the orthogonal direction. The ideal value $\Omega_{\mathrm{opt}}$ minimizes the reconstruction energy E, Equations (11) and (12). The energy E is proportional to the sum of the squared pressure amplitudes on the considered structure. In a first step, the linear equation system is solved for integers from Ω = 0 to Ω = 10 and the reconstruction energy is plotted over Ω. Around the local minimum, the linear equation system is solved again, this time in steps of 0.1. Typically, the iteration is truncated after the first decimal place. An example of reconstruction energy over Ω is illustrated in Figure 8 together with the condition number of G × Ψ in Equation (9). Near $\Omega_{\mathrm{opt}}$, both the signal energy and the matrix condition number tend to be low. Alternatively, the parameter Ω can be tuned manually to find the best reconstruction visually; the correct solution tends to create the sharpest edges at the instrument boundaries, with pressure amplitudes near 0. This is a typical result of the truncation effect: the finite extent of the source causes an acoustic short-circuit. At the boundary, even strong elongations of the sound board create hardly any pressure fluctuations, since air flows around the sound board. The effect can be observed in Figure 9.
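The two-stage search for Ω described above (integers from 0 to 10, then steps of 0.1 around the minimum) can be sketched as follows. The callable `energy` is a placeholder assumption: in practice it would solve the regularized linear equation system for the given Ω and return the reconstruction energy. Here a simple test function stands in for it.

```python
def find_omega_opt(energy, coarse_max=10, fine_step=0.1):
    """Coarse-to-fine search for the Omega that minimizes energy(Omega)."""
    # Stage 1: integer values Omega = 0, 1, ..., coarse_max.
    coarse_best = min(range(coarse_max + 1), key=energy)
    # Stage 2: steps of fine_step in the interval around the coarse minimum.
    lower = max(coarse_best - 1, 0)
    upper = coarse_best + 1
    n_steps = int(round((upper - lower) / fine_step))
    fine = [round(lower + i * fine_step, 1) for i in range(n_steps + 1)]
    return min(fine, key=energy)


def toy_energy(omega):
    """Stand-in for the reconstruction energy, with its minimum at 3.7."""
    return (omega - 3.7) ** 2 + 1.0


omega_opt = find_omega_opt(toy_energy)
print(omega_opt)
```

The result is truncated to one decimal place, in line with the description above. The search assumes a single dominant minimum between 0 and 10.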
The result of the MEM is one source term $Q(Y_n, \omega)$ for each equivalent source on the harpsichord sound board. Below the Nyquist frequency, these 1500 equivalent sources approximate the sound field of a real harpsichord. This is true for the complete half-space above the sound board.
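Once the source terms are known, the sound pressure at any point in the half-space follows from a forward propagation, i.e., the superposition of all equivalent sources. The sketch below assumes monopole radiators with a free-field Green's function of the form $e^{-ikr}/r$. The sign convention and the omitted constant factor are assumptions, and the MEM weighting Ψ is left out. The function names are hypothetical.

```python
import cmath
import math


def green(r, freq_hz, speed_of_sound=343.0):
    """Free-field monopole transfer function (assumed form exp(-ikr) / r)."""
    k = 2.0 * math.pi * freq_hz / speed_of_sound  # wave number
    return cmath.exp(-1j * k * r) / r


def propagate(source_positions, source_terms, receiver_positions, freq_hz):
    """Complex pressure at each receiver as the sum over all equivalent sources."""
    return [
        sum(q * green(math.dist(x, y), freq_hz)
            for y, q in zip(source_positions, source_terms))
        for x in receiver_positions
    ]


# One source at the origin, one receiver 2 m above it.
pressure = propagate([(0.0, 0.0, 0.0)], [1.0], [(0.0, 0.0, 2.0)], 1000.0)
print(abs(pressure[0]))
```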

Optimal Loudspeaker Placement
The back-propagation method described in Section 2.2.2 yields one source term for each of the N = 1500 equivalent sources and A = 62 keys. Together, the equivalent sources sample the harpsichord sound field in the region of the sound board. Forward propagation of the source terms approximates the harpsichord sound field in the whole half-space above the sound board. This has been demonstrated, e.g., in [47]. Replacing each equivalent source by one loudspeaker is referred to as acoustic curtain, which is the origin of wave field synthesis [39,48]. In physical terms, this situation is a spatially truncated discrete Rayleigh integral, which is the mathematical core of wave field synthesis [7,8,39,49]. A prerequisite is that all equivalent sources are homogeneous radiators in the half-space above the soundboard. This is the case for the proposed radiation keyboard. For low frequencies, loudspeakers without a cabinet approximate dipoles fairly well. Naturally, single loudspeakers with a diameter in the order of 10 cm are inefficient radiators of low frequencies [50]. However, this situation improves when a dense array of loudspeakers is moving in phase. This is typically happening in the given scenario; when excited with low frequencies, the sound board vibrates as a whole [51], so the loudspeaker signals for the wave field synthesis will be in phase. While truncation creates artifacts in most wave field synthesis setups, referred to as truncation error [8,39,48], no artifacts are expected in the described setup due to natural tapering: at the boundaries of the loudspeaker array, an acoustic short-circuit will occur. However, the acoustic short-circuit also occurs in real musical instruments, as demonstrated in Figure 9. This is because compressed air in the front flows around the sound board towards the rear, instead of propagating as a wave. The acoustic short-circuit of the outermost loudspeakers acts like a natural tapering window. 
In wave field synthesis installations, artificial tapering is applied to compensate for the truncation error.
The MEM describes the sound board vibration by N = 1500 equivalent source terms. Replacing each equivalent source by an individual loudspeaker is not ideal, because the spacing is too dense for broadband loudspeakers, and it is challenging to synchronize 1500 channels for real-time audio processing. Audio interfaces including D/A converters for L = 128 synchronized channels in audio CD quality are commercially available, using, for example, the MADI or Dante protocol. In wave field synthesis systems, regular loudspeaker distributions have been reported to deliver the best synthesis results [52]. Covering the complete soundboard of a harpsichord with a regular grid consisting of 128 grid points yields about one loudspeaker every 12 cm. This is a typical loudspeaker density in wave field synthesis systems and yields a Nyquist frequency of about 1.5 kHz for waves in air [7].
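The relation between transducer spacing and spatial Nyquist frequency can be checked with a short calculation. The sketch assumes two sample points per wavelength and a speed of sound of 343 m/s, a value the text does not state. The function name is hypothetical.

```python
def spatial_nyquist(spacing_m, speed_of_sound=343.0):
    """Highest frequency in Hz sampled with two points per wavelength."""
    return speed_of_sound / (2.0 * spacing_m)


print(round(spatial_nyquist(0.12)))  # loudspeaker grid, 12 cm spacing
print(round(spatial_nyquist(0.04)))  # microphone grid, 4 cm spacing
```

With these assumptions the 12 cm loudspeaker grid gives roughly 1.4 kHz and the 4 cm microphone grid roughly 4.3 kHz, close to the rounded limits of 1.5 kHz and 4 kHz used in the text.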
Three exemplary loudspeaker arrays are illustrated in Figure 10. Every third equivalent source can be replaced by a loudspeaker with little effect on the sound field synthesis precision below 1.5 kHz. This yields W = 11 possible loudspeaker grid positions $\Theta_{w,l}$. At the ideal location of the loudspeaker grid, all loudspeakers lie near antinodes of all frequencies of all keys. In contrast to regions near the nodes, sound field calculations near the antinodes do not suffer from equipment noise, numerical noise, and small microphone misplacements. Instead, all loudspeakers contribute efficiently to the wave field synthesis. Therefore, the optimal grid location $\Theta_l$ has the largest signal energy, Equation (13). Equation (13) is solved for each of the w = 1, ..., 11 possible loudspeaker grid positions. The grid with the maximum signal energy is replaced by loudspeakers as indicated in Figure 5. This ideal grid is the optimal loudspeaker distribution $\Theta_l$.
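The selection of the loudspeaker grid with the largest signal energy can be sketched as below. The sketch is simplified to one key and one frequency, whereas the criterion in the text covers all frequencies of all keys, so the energy would additionally be summed over those. The data layout (a flat list of complex source terms and lists of indices per candidate grid) and the function names are assumptions.

```python
def grid_energy(source_terms, grid_indices):
    """Sum of squared magnitudes of the source terms at the grid points."""
    return sum(abs(source_terms[i]) ** 2 for i in grid_indices)


def best_grid(source_terms, candidate_grids):
    """Index w of the candidate grid with the maximum signal energy."""
    return max(
        range(len(candidate_grids)),
        key=lambda w: grid_energy(source_terms, candidate_grids[w]),
    )


# Six equivalent sources, three candidate grids of every third source.
terms = [0.1, 1.0, 0.2, 0.1, 0.9 + 0.1j, 0.3]
grids = [[0, 3], [1, 4], [2, 5]]
print(best_grid(terms, grids))
```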
The Nyquist frequency of the loudspeaker array lies around 1.5 kHz. For reproduction of higher frequencies, other methods are necessary, as described in the following sections.

Higher Frequency Region f2
The procedure to calculate the loudspeaker impulse responses for frequency region f2 is illustrated in Figure 11. First, impulse responses of the harpsichord are recorded in the listening region. Next, impulse responses of the final loudspeaker grid $\Theta_l$ are recorded in the listening region. These are transformed such that the loudspeaker array creates the harpsichord sound field in the listening region. Then, the optimal position of listening points is determined. These listening points are a subset of the microphone locations that sample the listening region.
Figure 11. Procedure to derive the impulse response $R_{f2}(\Theta_l, \kappa_a, v, t)$ for each loudspeaker $\Theta_l$ and pressed key $\kappa_a$ in the radiation keyboard. The wooden plate represents the harpsichord sound board. The white plate represents the loudspeaker grid. The black dots represent microphones in the listening region. At first, harpsichord and loudspeaker array create different sound fields in the listening region. Then, the loudspeaker signals are modified so as to synthesize the harpsichord sound field at a regular subset v of listening points that sample the listening region.

Far Field Recording
In addition to the near field recordings, the radiated sound is also recorded with a microphone array $X_{\mathrm{far},v,j}$ that samples the region in which the instrumentalist's head may be located during playing. We refer to this region as the listening region and to the discrete sample points as listening points. The distance between equivalent sources on the sound board and the listening points lies in the order of decimeters to meters. For frequencies above 1.5 kHz, this means that the listening region lies in the far field.
In the near field measurement, Section 2.2.1, one microphone array samples a planar region parallel to the sound board. In the far field measurement the microphone array samples a rectangular cuboid. The setup for the far field recordings is illustrated in Figures 4 and 11. As described in Sections 2.1 and 2.2.1, the sound board is excited with an exponential sweep. An array of J = 128 microphones samples the listening region. The microphones are arranged as a regular grid with a spacing of 4 cm. The array samples the complete sound field in the listening region for all wave lengths above 0.08 m, i.e., frequencies below 4 kHz. About V = 11 repeated measurements are carried out with a slightly shifted microphone array. Equations (1)-(5) describe how to excite the sound board and derive impulse responses for the A × J × V = 87,296 different source-receiver constellations.
These far field impulse responses provide a sample of the desired sound field $Q_{\mathrm{des}}(X, \kappa_a, \omega)$ in the region in which the instrumentalist is moving their head. In the frequency domain, it can be described as the relationship between source signal $S(\omega)$, complex transfer function $G(r, \omega)$ and microphone array recordings $P_{\mathrm{des}}(X, \kappa_a, \omega)$, i.e., $S(\omega) \times G(r, \omega) = P_{\mathrm{des}}(X_{v,j}, \kappa_a, \omega)$ (14), where the recording of the sweep is aligned to obtain the impulse response $Q_{\mathrm{des}}(X_{v,j}, \kappa_a, \omega)$, Equation (15). The terms $S(\omega)$ and $G(r, \omega)$ in Equation (14) are known. So instead of microphone array measurements, $P_{\mathrm{des}}(X, \kappa_a, \omega)$ can be calculated by this forward-propagation formula. In [47] it was demonstrated that the forward-propagation equals the measurements.

Radiation Method
To synthesize the desired sound field $Q_{\mathrm{des}}(X, \kappa_a, \omega)$ with the given loudspeaker array $\Theta_l$, L = 128 loudspeaker signals $R_{f2}(\Theta_l, \omega)$ need to be calculated. This is done in two steps. First, the swept sine, Equation (1), is played through each individual loudspeaker $\Theta_l$ and recorded in the listening region. Here, Equations (16) and (17) describe the relationship between the source signal, the raw microphone recordings and the final impulse response. The unknown propagation term Ϙ(α, ω) is the ratio of the source signal and the recordings. In contrast to a real harpsichord source signal $H(\kappa_a, \omega)$, we know that $S(\omega) \neq 0$ for all audible frequencies. Thus, the complex transfer function Ϙ(α, ω) between each loudspeaker of the array $\Theta_l$ and each listening point $X_{\mathrm{far},v,j}$ is determined by simply recording the propagated swept sine, Equation (16), of each loudspeaker at each listening point, followed by the deconvolution, Equation (17).
This complex transfer function is neither the idealized monopole source radiation $G(r, \omega)$, nor the energy-optimized radiation function $\Psi(\alpha, \omega) G(r, \omega)$. Instead, Ϙ(α, ω) is the actual transfer function as measured physically. It includes the frequency and phase response of the loudspeakers, the amplitude decay and the phase-shift from each loudspeaker to each receiver. It can thus be considered the true transfer function. It includes the sound radiation characteristics of the loudspeakers, which tend to deviate from G and Ψ. Solving the linear equation system, Equation (18), for all V = 11 microphone array positions yields the impulse response for the loudspeakers $R_{f2}(\Theta_l, X_{v,j}, \kappa_a, \omega)$. This procedure is referred to as the radiation method, as it synthesizes a desired sound field by including the measured sound radiation characteristics of the loudspeakers. Accounting for the actual transfer function from each loudspeaker to each listening point has the advantage that the rows in the linear equation system described by Equation (18) tend to differ more strongly in reality than for idealized monopole radiators. This has been demonstrated in [20,49]. The radiation method is a robust regularization method that has been demonstrated to relax the linear equation system [9,20,39,49]. It leads to (a) low amplitudes and (b) solutions that vary only slightly when the source-receiver constellation or the source signal is varied slightly. The method synthesizes a desired sound field as long as (a) the sound field lies in the far field and (b) at least two listening points per wavelength exist. The method is only ideal for frequency region f2, as it does not account for nearfield effects and spatial aliasing [9,20,49].
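For a single frequency bin, the radiation method amounts to solving a square linear system: with J = 128 listening points and L = 128 loudspeakers per array position, the measured transfer functions form a 128 × 128 matrix. The sketch below assumes the system has the form "measured transfer matrix times loudspeaker spectra equals desired spectra", which is an assumption about the structure of the system and not taken from the text. It solves a 2 × 2 toy case with plain Gaussian elimination. The function name and the toy values are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs for a square complex system.

    Gaussian elimination with partial pivoting; fine for small examples.
    """
    n = len(rhs)
    aug = [list(row) + [value] for row, value in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0j] * n
    for r in reversed(range(n)):
        acc = aug[r][n] - sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / aug[r][r]
    return x


# transfer[j][l]: measured transfer function from loudspeaker l to point j.
transfer = [[1.0 + 1.0j, 0.5], [0.2j, 2.0]]
desired = [1.0, 1.0j]            # desired spectra at the two listening points
speaker_spectra = solve_linear_system(transfer, desired)
```

Repeating this for every frequency bin, key and array position would give the loudspeaker spectra. For the full-size problem a numerical library with a tested solver is the better choice.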

Optimal Listening Points
So far, Equation (18) delivers V = 11 sets of L = 128 impulse responses for each key. The solutions only vary slightly, due to microphone misplacements, equipment and background noise, numerical errors and the spatial variations of the loudspeaker sound radiation characteristics. The solutions are valid inside the listening region. Outside the listening region synthesis errors occur, because loudspeaker signals interfere in an arbitrary manner. Outside an anechoic chamber, this will lead to unnatural reflections. Consequently, the ideal impulse response minimizes synthesis errors outside the listening region. This is achieved by selecting the impulse responses with the lowest signal energy, Equation (19), where the signal energy is given by Equation (20). The signal energy is the sum over all L = 128 loudspeaker impulse responses. In Equation (20) the signal energy of each key $\kappa_a$ is calculated for each of the v = 1, ..., 11 microphone array positions. As defined in Equation (19), the final impulse response $R_{f2}(\Theta, \kappa_a, \omega)$ for each specific key is the one where v creates minimal signal energy. This solution exhibits the most constructive interference inside the listening region. The solution is valid for the considered frequency region f2, i.e., between 1.5 and 4 kHz. For higher frequencies, a third method is ideal, as discussed in the following section.
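The minimum-energy selection among the array positions can be sketched as follows for one key. The energy is taken here as the sum of squared samples over all loudspeaker impulse responses. This time-domain definition, the nested-list data layout and the function names are assumptions.

```python
def signal_energy(impulse_responses):
    """Sum of squared samples over all loudspeaker impulse responses."""
    return sum(x * x for ir in impulse_responses for x in ir)


def select_array_position(candidates):
    """Index v of the candidate set with the lowest signal energy.

    candidates[v][l] is the impulse response of loudspeaker l obtained
    from microphone array position v.
    """
    return min(range(len(candidates)), key=lambda v: signal_energy(candidates[v]))


# Three array positions, two loudspeakers, two samples each.
candidates = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.5, 0.0], [0.0, 0.5]],
    [[2.0, 0.0], [0.0, 0.0]],
]
print(select_array_position(candidates))
```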

Highest Frequency Region
In principle, the method to calculate the impulse response for frequency region f3 equals the method described in Section 2.3. For frequencies above 4 kHz the radiation method only synthesizes the desired sound field at the discrete listening points, but not in between [49]. Therefore, it is adequate to approximate the desired amplitudes and phases at discrete listening points by solving Equation (18). The difference between frequency regions f2 and f3 is the selection of optimal listening points. Instead of choosing the impulse response with minimum signal energy, Equation (20), the ideal impulse response for f3 is the shortest one, because it exhibits the highest impulse fidelity. When convolved with a short impulse response, the frequencies of source signals stay in phase. In contrast, long impulse responses indicate out-of-phase relationships. Phase is mostly audible during transients. Consequently, the shortest impulse response is ideal, because it maintains the characteristic, steep attack transient of harpsichord notes.
Impulse responses for one loudspeaker and three different microphone array locations v are illustrated in Figure 12. The shortest impulse response can be identified visually.
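The visual identification of the shortest impulse response could also be automated. The following minimal sketch assumes a particular definition of length that is not given in the text: the number of samples between the first and the last sample whose magnitude reaches a fraction of the peak. The names `effective_length` and `select_shortest` as well as the threshold value are hypothetical.

```python
def effective_length(ir, rel_threshold=0.001):
    """Samples spanned by all values of at least rel_threshold * peak.

    Assumption: this span serves as the 'length' of a non-empty
    impulse response; the threshold of 0.001 (about -60 dB) is arbitrary.
    """
    peak = max(abs(x) for x in ir)
    if peak == 0.0:
        return 0
    limit = rel_threshold * peak
    active = [n for n, x in enumerate(ir) if abs(x) >= limit]
    return active[-1] - active[0] + 1


def select_shortest(candidates, rel_threshold=0.001):
    """Return (index of the shortest candidate, list of all lengths)."""
    lengths = [effective_length(ir, rel_threshold) for ir in candidates]
    best = min(range(len(candidates)), key=lengths.__getitem__)
    return best, lengths


# Toy data: one loudspeaker, three microphone array positions v.
candidates = [
    [0.0, 1.0, 0.5, 0.2, 0.1, 0.05, 0.0],
    [0.0, 0.0, 1.0, 0.3, 0.0, 0.0, 0.0],
    [0.2, 0.0, 0.0, 1.0, 0.0, 0.0, 0.4],
]
best_v, lengths = select_shortest(candidates)
```

Here the second candidate spans only two samples and would be selected. Other length measures, for instance the time span containing most of the signal energy, are equally conceivable.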

Calculation of Loudspeaker Driving Signals
Each method described above results in one truncated impulse response per loudspeaker and pressed key R f (Θ l , κ a , t). Here, each impulse response is truncated in frequency. To combine them, the three truncated impulse responses per loudspeaker and pressed key are simply added, i.e.,
$$R(\Theta_l, \kappa_a, t) = R_{f1}(\Theta_l, \kappa_a, t) + R_{f2}(\Theta_l, \kappa_a, t) + R_{f3}(\Theta_l, \kappa_a, t). \qquad (21)$$
The result of Equation (21) is a broadband impulse response R(Θ l , κ a , t) that covers the complete audible frequency range. An example is illustrated in Figure 13.

Figure 13. Impulse responses R f truncated to frequency regions f1 (blue), f2 (green) and f3 (red). Adding up the time series yields a broadband impulse response R(Θ l , κ a , t) (black). Here, t is the time in seconds and u is a normalized voltage to drive the loudspeaker.
Summing the truncated impulse responses yields one broadband impulse response for each loudspeaker and pressed key, i.e., B = L × A = 7936 broadband impulse responses R(Θ l , κ a , t). These impulse responses describe how any frequency is radiated by the loudspeaker array to approximate the sound field that the harpsichord would create, if excited by a broadband impulse.
Naturally, a played harpsichord is not excited by an impulse. Instead, pressing a key creates a driving signal that travels through the string and transfers to the soundboard via the bridge. In Section 2.1 we provide literature that suggests three different ways to record or model this driving signal h(κ a , t). In order to finally use the radiation keyboard as a harpsichord sampler, the loudspeaker driving signals d(Θ l , κ a , t) are calculated by a convolution of the impulse response with the source signal, i.e., d(Θ l , κ a , t) = h(κ a , t) * R(Θ l , κ a , t).
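The two steps above, summing the band-limited impulse responses and convolving the result with the source signal, can be sketched for a single loudspeaker and key as follows. This is a minimal illustration with toy numbers, not the authors' implementation; the function names `add_responses` and `convolve` are hypothetical, and all three impulse responses are assumed to share the same length and sampling rate.

```python
def add_responses(r_f1, r_f2, r_f3):
    """Sample-wise sum of three equally long band-limited impulse responses."""
    if not (len(r_f1) == len(r_f2) == len(r_f3)):
        raise ValueError("impulse responses must have equal length")
    return [a + b + c for a, b, c in zip(r_f1, r_f2, r_f3)]


def convolve(h, r):
    """Full linear convolution of source signal h with impulse response r."""
    out = [0.0] * (len(h) + len(r) - 1)
    for i, h_i in enumerate(h):
        for j, r_j in enumerate(r):
            out[i + j] += h_i * r_j
    return out


# Toy band-limited responses for one loudspeaker and one key.
r_broadband = add_responses([1.0, 0.0, 0.0], [0.0, 0.5, 0.0], [0.0, 0.0, 0.25])

# Toy source signal h; the result is the loudspeaker driving signal d.
driving_signal = convolve([1.0, 2.0], r_broadband)
```

The direct double loop is chosen for readability only. For recordings of realistic length, an FFT-based convolution would be considerably faster.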
This yields B = 7936 sound files. These are imported into multiple instances of a software sampler in a digital audio workstation (DAW). Typically, one instance of a software sampler can address between 16 and 64 output channels. Consequently, between 2 and 8 sampler instances need to be initialized to address all 128 loudspeakers. Technologies like VST and DirectX are able to handle this parallelism, and several multi-channel DAWs (like Steinberg Cubase, Ableton Live and Magix Samplitude) can handle the high number of output channels. Finally, the original keyboard of the harpsichord is replaced by a MIDI keyboard, whose note-on command triggers the 128 samples for the corresponding note.
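The bookkeeping behind these numbers can be sketched as follows. The file naming scheme and the function names are hypothetical and only serve as illustration; A = 62 keys follows from B = L × A = 7936 with L = 128.

```python
import math


def sampler_instances(num_loudspeakers, channels_per_instance):
    """Number of sampler instances needed to address every loudspeaker."""
    return math.ceil(num_loudspeakers / channels_per_instance)


def route(loudspeaker, channels_per_instance):
    """Map a 0-based loudspeaker index to (instance, output channel)."""
    return divmod(loudspeaker, channels_per_instance)


def samples_for_note(key, num_loudspeakers):
    """Sound files triggered by one note-on (hypothetical naming scheme)."""
    return [f"key{key:02d}_ls{ls:03d}.wav" for ls in range(num_loudspeakers)]


L, A = 128, 62                         # loudspeakers and keys
total_files = L * A                    # B = 7936 sound files
fewest = sampler_instances(L, 64)      # 64 output channels per instance
most = sampler_instances(L, 16)        # 16 output channels per instance
files = samples_for_note(5, L)         # all samples for one pressed key
```

With 16 output channels per instance, loudspeaker index 70 would, for example, be served by instance 4 on output channel 6 (both counted from zero).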
As the effect of key velocity on the created level and timbre is negligible, the harpsichord is the ideal instrument to start with; only one sample per note and loudspeaker is necessary. For more expressive instruments, such as the piano, the attack velocity affects the produced level and timbre. Here, several samples per note, or one attack-velocity controlled filter, would have to be applied. This implies the need for much higher data rates and specific signal processing, which is beyond the scope of this paper.

Conclusions
In this paper the theoretic foundation of a radiation keyboard has been presented. It includes the complete chain from recording the source sound and the radiated sound of a harpsichord to synthesizing its temporal, spectral and spatial sound within an extended listening region, controlled in real-time. To achieve this, we choose the optimal method for each frequency region and inverse problem, and describe a way to combine so far isolated and fragmented sound field analysis and synthesis approaches.
For the low frequency region f1, a combination of nearfield recordings, the minimum energy method, an energy-efficient loudspeaker grid selection, and wave field synthesis is ideal. This combination synthesizes the desired sound field in the complete half-space above the soundboard.
For the higher frequency region f2, a combination of far field recordings, the radiation method, and energy-efficient listening point selection is ideal. This combination synthesizes the desired sound field in the listening region with high precision.
For the highest frequency region f3, far field recordings and an in-phase impulse response creation are ideal. This combination approximates the correct signal amplitudes in the listening region, while supporting the transient behavior of the source sound. The initial outcome of such a radiation keyboard is a sampler that mimics not only the temporal and spectral aspects of the original musical instrument, but also its spatial aspects.

Outlook
This paper presented the theoretic framework of our current research project. The effort to implement a radiation keyboard is very high and a number of sound field measurement and synthesis methods need to be combined, leveraging their individual strengths. We have not implemented the radiation keyboard yet; this paper rather describes the necessary means to realize it.
The implemented radiation keyboard is supposed to serve as a research tool to carry out interactive listening experiments that are more ecological than passive listening tests with artificial sounds in a laboratory environment. Note that the radiation keyboard is not restricted to harpsichord sounds. In principle, any arbitrary sound file can act as source signal and be radiated like a harpsichord. This enables us to manipulate the temporal and spectral aspects of the sound, while keeping the sound radiation constant. Loading different source sounds while keeping the sound radiation fixed could reveal which temporal and spectral parameters affect the perception of source extent and naturalness in the direct sound of musical instruments. The radiation keyboard could answer the question of whether a saxophone sound with the radiation characteristics of a harpsichord sounds larger than a real saxophone. Using the radiation keyboard, we can investigate apparent source width and immersion of direct sound both in the presence and absence of room acoustics. To date, physical predictors of apparent source width originate in room acoustical investigations [15,16,39]. Findings disagree on which frequency region is of major importance for these listening impressions. Different predictors and the discourse are examined in [5].
The strength of a real-time capable radiation keyboard is the interactivity: musicians can actively play the instrument instead of carrying out passive listening tests. Interactivity creates a dynamic sound and allows for a natural interaction in an authentic musical performance scenario. This is a necessity in the field of performance, gesture, and human-machine-interaction studies and a prerequisite for ecological psychoacoustics [30,31].
Funding: This research received no external funding.