Headphone-To-Ear Transfer Function Estimation Using Measured Acoustic Parameters

Abstract: This paper proposes an optimal five-microphone array method to measure the headphone acoustic reflectance and equivalent sound sources needed in the estimation of headphone-to-ear transfer functions (HpTFs). The performance of this method is theoretically analyzed and experimentally investigated. With the measured acoustic parameters, HpTFs for different headphones and ear canal area functions are estimated based on a computational acoustic model. The estimation results show that HpTFs vary considerably with headphones and ear canals, which suggests that individualized compensations for HpTFs are necessary for headphones to reproduce desired sounds for different listeners.


Introduction
The headphone-to-ear transfer function (HpTF) is defined as the electroacoustic transfer function from the input of a headphone to the sound pressure at the eardrum [1]. In general, HpTFs can be measured by using standard ear simulators on dummy heads. However, current standard ear simulators with fixed acoustic structures cannot simulate average human ears above 10 kHz, nor can they represent individual differences in ear canal geometry and eardrum impedance [2]. Thus, HpTFs measured with standard ear simulators may not be satisfactory if individual HpTFs over the audible frequency range are needed. It has been shown that HpTFs vary considerably with headphones and listeners [3,4]. In the time domain, the impulse response of the HpTF involves the reflections between the inner surface of the headphone and the eardrum. If binaural signals are reproduced through headphones, these reflections interfere with the sound localization cues formed by direction-dependent pinna reflections in the binaural signals, and may cause front-back confusion in sound localization [5,6]. In the frequency domain, HpTFs introduce timbre distortions [7,8]. Therefore, to faithfully reproduce binaural signals to different listeners through headphones, HpTFs need to be characterized and compensated.
Theoretically, HpTFs should be measured at a point in the ear canal where the binaural signal is recorded [9]. However, direct measurements of individual HpTFs inside human ear canals are difficult and risky. Recently, a method of estimating the HpTF given the headphone acoustic reflectance and equivalent sound source to the ear canal, the ear canal area function and the eardrum impedance has been developed [10]. This means that HpTFs for different headphones and listeners can be estimated through computations based on the parameters of headphones and external ears. In previous studies, there have been some results of the eardrum impedance and reflectance [11] and the ear canal cross-sectional area functions [12,13] measured on human ears. Furthermore, a method for estimating eardrum reflectance and ear canal cross-sectional area functions from ear canal input impedance has also been presented [14]. Thus, in order to estimate the HpTFs for different headphones, the headphone acoustic reflectance and equivalent sound source to the ear canal need to be measured.
The acoustic reflectance of a headphone to the ear canal can be measured by using a single microphone at five measurement positions in an impedance tube sequentially [10]. However, it may be time-consuming to perform sequential measurements if a large number of headphones need to be measured, especially for a wide frequency range with a linear frequency step. To solve this problem, five microphones are simultaneously used to measure the sound pressure signals in the impedance tube [15]. With respect to the determination of the headphone equivalent sound source to the ear canal, some researchers utilize known acoustic impedance as reference loads [16,17]. However, acoustic loads made of long tubes may result in some operation difficulties in the experiments, and the measurement accuracy may be degraded by using the probe microphones as well.
In this paper, an optimal five-microphone array method for measuring the headphone acoustic reflectance and equivalent sound sources needed in the estimation of HpTFs is presented. In contrast to the previous work [15], a compensation function is introduced to compensate the mismatch between microphone sensitivities, and the measurement accuracy of the proposed measurement method is further improved by using a two-stage searching algorithm. The performance of the measurement method is theoretically evaluated and experimentally investigated. With the measured headphone acoustic reflectance and equivalent sound sources, HpTFs for different headphones and ear canal area functions are then estimated through computations based on an acoustic model.

Theory
For the transfer function method [18,19], two measurement positions are used to measure the reflectance of an acoustic load, and the measurement frequency range is determined by $0.05c/s < f < 0.45c/s$, where $c$ is the speed of sound, and $s$ is the distance between the measurement positions. Because this range is fixed by a single spacing $s$, two measurement positions cannot achieve satisfactory measurement accuracy over a wide frequency range. Assume $N$ microphones located at positions $x_1$ to $x_N$ are simultaneously used in an impedance tube with a pinna simulator, as shown in Figure 1.
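The frequency range stated above can be evaluated directly for a given spacing. The following is a minimal sketch; the function name and the default speed of sound of 343 m/s are assumptions made for illustration and are not taken from the paper.

```python
def transfer_function_band(spacing_m, c=343.0):
    """Return (f_low, f_high) in Hz from 0.05*c/s < f < 0.45*c/s.

    spacing_m : distance s between the two measurement positions, in metres
    c         : speed of sound in m/s (assumed value)
    """
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return 0.05 * c / spacing_m, 0.45 * c / spacing_m


# Usable band for a few example spacings (illustrative values only).
for s_cm in (1.0, 5.0, 10.0):
    f_low, f_high = transfer_function_band(s_cm / 100.0)
    print(f"s = {s_cm:4.1f} cm: {f_low:7.1f} Hz to {f_high:8.1f} Hz")
```

A small spacing raises the upper limit but also raises the lower limit, which is why a single pair of positions covers only part of the audible range.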
Let $p_i(f)$ and $p_r(f)$ denote the frequency responses of the incident and reflected sound pressures at $x = 0$, respectively. Then, with the incident wave taken to propagate in the positive $x$ direction, the output of the $n$th microphone $V_n(f)$ at position $x_n$ can be expressed as
$$V_n(f) = M_n(f)\left[p_i(f)e^{-jkx_n} + p_r(f)e^{jkx_n}\right] \tag{1}$$
where $M_n(f)$ is the frequency response of the sensitivity of the $n$th microphone, $f$ is the frequency, $k$ is the wave number, and $j = \sqrt{-1}$. If tube attenuation is considered, the wave number is determined as in [20] (Equation (2)), where it depends on the diameter $D_0$ of the tube.
Considering that different microphones have different sensitivities, this paper introduces a complex compensation function $J_n(f)$ to compensate the mismatch between microphone sensitivities. With the frequency response of the sensitivity of the 1st microphone $M_1(f)$ as reference, the compensation function $J_n(f)$ is defined as
$$J_n(f) = \frac{M_1(f)}{M_n(f)} \tag{3}$$
As described in the measurement section, the compensation functions can be obtained by calculating the ratio between the frequency response of the 1st microphone and that of the $n$th microphone measured at the same position in the impedance tube.
Based on Equations (1) and (3), the outputs of $N$ microphones measured at positions $x_1$ to $x_N$ can be written as
$$J_n(f)V_n(f) = M_1(f)\left[p_i(f)e^{-jkx_n} + p_r(f)e^{jkx_n}\right], \quad n = 1, \ldots, N \tag{4}$$
Or, in matrix form
$$J V_{mic} = M_1 A P \tag{5}$$
where $V_{mic} = [V_1, \ldots, V_N]^T$ is the output vector of the microphone array, $J = \mathrm{diag}(J_1, \ldots, J_N)$ is the matrix of compensation functions, $A$ is the $N \times 2$ propagation matrix determined by the measurement positions, whose $n$th row is $[e^{-jkx_n}, \; e^{jkx_n}]$, and $P = [p_i, p_r]^T$ is the vector containing the unknown incident and reflected sound pressures in the impedance tube. Then, for $N \geq 3$, the least-squares solution of Equation (5) can be obtained via pseudo-inversion
$$\tilde{P} = \left[(M_1 A)^H (M_1 A)\right]^{-1} (M_1 A)^H J V_{mic} \tag{6}$$
where $A^H$ is the Hermitian transpose matrix of $A$, and $(\sim)$ represents the estimated value. With some algebraic manipulations, Equation (6) can be rewritten as
$$\tilde{P} = \frac{1}{M_1}\left(A^H A\right)^{-1} A^H J V_{mic} \tag{7}$$
The acoustic reflectance can be determined by
$$r_0(f) = \frac{\tilde{p}_r(f)}{\tilde{p}_i(f)} \tag{8}$$
It can be seen from Equation (7) that although $M_1(f)$ is unknown, this term is a common factor that cancels in the ratio between $\tilde{p}_r(f)$ and $\tilde{p}_i(f)$. Thus, the acoustic reflectance can be calculated from Equation (7) with $M_1(f) = 1$.
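The least-squares step described above can be sketched numerically as follows. This is an illustration under stated assumptions, not the authors' implementation: the sign convention of the plane-wave terms (incident wave as $e^{-jkx}$), the function name, the microphone positions, the sensitivities, and the test reflectance are all hypothetical choices, and the reference sensitivity $M_1$ is set to 1 because it cancels in the ratio.

```python
import numpy as np


def estimate_reflectance(v_mic, positions, k, comp=None):
    """Least-squares estimate of (p_i, p_r, r0) at a single frequency.

    v_mic     : complex microphone outputs V_n, shape (N,)
    positions : microphone positions x_n in metres, shape (N,)
    k         : wave number in rad/m (may be complex if attenuation is modelled)
    comp      : compensation functions J_n; defaults to 1 (matched microphones)
    """
    v_mic = np.asarray(v_mic, dtype=complex)
    x = np.asarray(positions, dtype=float)
    if x.size < 3:
        raise ValueError("need N >= 3 microphones")
    j_n = np.ones_like(v_mic) if comp is None else np.asarray(comp, dtype=complex)
    # Propagation matrix A: one column per travelling wave (assumed convention).
    a = np.column_stack((np.exp(-1j * k * x), np.exp(1j * k * x)))
    # Pseudo-inverse solution with M_1 = 1; M_1 cancels in p_r / p_i.
    p_est, *_ = np.linalg.lstsq(a, j_n * v_mic, rcond=None)
    p_i, p_r = p_est
    return p_i, p_r, p_r / p_i


# Synthetic check with hypothetical positions, sensitivities and reflectance.
c = 343.0
k = 2.0 * np.pi * 2000.0 / c
x = 0.02 + np.concatenate(([0.0], np.cumsum([0.02, 0.09, 0.05, 0.01])))
r_true = 0.6 * np.exp(0.4j)
sens = np.array([1.0, 1.1 * np.exp(0.05j), 0.9, 1.05 * np.exp(-0.03j), 0.95])
v_mic = sens * (np.exp(-1j * k * x) + r_true * np.exp(1j * k * x))
comp = sens[0] / sens  # J_n = M_1 / M_n
_, _, r_est = estimate_reflectance(v_mic, x, k, comp)
print("reflectance error:", abs(r_est - r_true))
```

`numpy.linalg.lstsq` is used instead of forming $(A^H A)^{-1}A^H$ explicitly; both give the same least-squares solution when $A$ has full column rank, and the former is numerically more robust.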

Error Analysis
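In practice, the compensated microphone output signals may contain errors due to measurement noise and inaccurate compensation functions. Let the compensated microphone output vector $V_{com}$ be the sum of the true values $V$ and the errors $\Delta V$
$$V_{com} = V + \Delta V \tag{9}$$
Then the estimated sound pressure vector $\tilde{P}$ can be obtained from Equation (7) with $M_1(f) = 1$
$$\tilde{P} = \left(A^H A\right)^{-1} A^H V_{com} = P + \left(A^H A\right)^{-1} A^H \Delta V \tag{10}$$
Assuming that the elements of $\Delta V$ are independently distributed with zero mean and equal variance $\sigma^2$, the expected value of the squared norm of the estimation error can be formulated as
$$E\left[\lVert \tilde{P} - P \rVert^2\right] = \sigma^2\,\mathrm{Tr}\left[\left(A^H A\right)^{-1}\right] = \sigma^2\,\mathrm{Tr}\left(\Lambda^{-2}\right) \tag{11}$$
where $E[\cdot]$ is the expectation value of a matrix, $\mathrm{Tr}(\cdot)$ is the trace value of a matrix, and $\Lambda$ is the diagonal matrix containing the singular values of $A$. The error amplification in Equation (11) depends only on the singular values of $A$ and is characterized by the singularity factor (SF) defined in [21] (Equation (12)). Thus, the sensitivity of the microphone array method to errors caused by measurement noise and inaccurate compensation functions can be evaluated in terms of the SF.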
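For readers who want the intermediate steps, the expected squared error of the least-squares estimate can be worked out as below. This is a standard derivation given as a sketch, assuming $M_1(f) = 1$, noise-free outputs $V = AP$, and $E[\Delta V \Delta V^H] = \sigma^2 I$; the symbols $A^{+}$ (pseudo-inverse) and $\lambda_i$ (singular values of $A$) are notation introduced here for convenience.

```latex
\begin{align}
\tilde{P} - P &= A^{+}\,\Delta V, \qquad A^{+} = (A^{H}A)^{-1}A^{H}, \\
E\!\left[\lVert \tilde{P} - P \rVert^{2}\right]
  &= E\!\left[\mathrm{Tr}\!\left(A^{+}\,\Delta V\,\Delta V^{H}A^{+H}\right)\right]
   = \sigma^{2}\,\mathrm{Tr}\!\left(A^{+}A^{+H}\right) \\
  &= \sigma^{2}\,\mathrm{Tr}\!\left[(A^{H}A)^{-1}\right]
   = \sigma^{2}\sum_{i=1}^{2}\frac{1}{\lambda_{i}^{2}} .
\end{align}
```

The last line shows why small singular values of $A$, which occur when the microphone positions make the two plane-wave columns nearly indistinguishable, amplify measurement errors.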
In general, the measurement accuracy of the microphone array method can be improved by increasing the number of microphone positions [22]. However, the improvement in measurement accuracy becomes marginal when the number of microphone positions is greater than seven [21]. Therefore, to obtain accurate measurement results with a reasonable number of microphones, five microphones are chosen in this paper. For the single microphone method [10], where the microphone spacing is $d_{pre} = [1.25, 6.45, 1.0, 2.3]$ cm, the corresponding SF from 100 Hz to 16 kHz is plotted using the mauve line in Figure 2. In comparison, the SF determined by five microphone positions with a uniform spacing of 1.0 cm is plotted using the blue line. As can be seen from Figure 2, the above non-uniform microphone array can achieve a wider effective measurement frequency range than the uniform-spacing configuration, but some fluctuations exist.
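A comparison of this kind can be sketched numerically. The code below uses $\mathrm{Tr}[(A^H A)^{-1}]$ as a stand-in for the SF; the exact definition and normalization of the SF in [21] are not reproduced here, so the numbers are only indicative. A lossless tube (real wave number), a speed of sound of 343 m/s, and the function names are assumptions for illustration.

```python
import numpy as np


def error_amplification(positions, freqs, c=343.0):
    """Tr[(A^H A)^{-1}] per frequency for a lossless tube (real wave number)."""
    x = np.asarray(positions, dtype=float)
    out = np.empty(len(freqs))
    for i, f in enumerate(freqs):
        k = 2.0 * np.pi * f / c
        a = np.column_stack((np.exp(-1j * k * x), np.exp(1j * k * x)))
        s = np.linalg.svd(a, compute_uv=False)  # singular values of A
        out[i] = np.sum(1.0 / s**2)
    return out


def positions_from_spacing(spacing_m, x1=0.0):
    """Turn four spacings into five positions, starting at x1 (assumed origin)."""
    return x1 + np.concatenate(([0.0], np.cumsum(spacing_m)))


freqs = np.linspace(100.0, 16000.0, 300)
uniform = positions_from_spacing([0.010, 0.010, 0.010, 0.010])
previous = positions_from_spacing([0.0125, 0.0645, 0.010, 0.023])
for name, pos in (("uniform 1.0 cm", uniform), ("non-uniform [10]", previous)):
    amp = error_amplification(pos, freqs)
    print(f"{name:18s} max = {amp.max():.3g}, mean = {amp.mean():.3g}")
```

Only the differences between positions matter for this metric, so the choice of origin `x1` does not change the result.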
The performance of the non-uniform array can be further improved if its SF can be reduced. To do so, a two-stage searching algorithm is presented to find the optimal microphone spacing over the frequency range $[f_{min}, f_{max}]$. Firstly, define the collection $D_0$ of the candidate microphone spacings $d_0$ whose SF peak stays below a threshold, i.e.,
$$D_0 = \left\{ d_0 = [s_1, \ldots, s_4] \; : \; SF(f_{peak}) < \gamma \right\} \tag{13}$$
where $s_i$, $i = 1, \ldots, 4$ is the microphone spacing, $f_{peak}$ is the frequency of the SF peak within the frequency range, and $\gamma$ is the predefined threshold. Then, the optimal microphone spacing $d_{opt}$ can be determined as follows
$$d_{opt} = \arg\min_{d_0 \in D_0} \overline{SF} \tag{14}$$
where $\overline{SF} = \frac{1}{K}\sum_{q=1}^{K} SF(f_q)$, and $K$ is the number of frequency bins.
To improve the measurement accuracy by optimizing the microphone spacing, the searching algorithm is implemented between two adjacent measurement positions exhaustively at a 1 mm step over $[s_{min}, s_{max}]$, where $f_{min} = 100$ Hz, $f_{max} = 16$ kHz, $s_{min} = 0.45c/16000$, and $s_{max} = 0.05c/100$. The threshold $\gamma$ is chosen as 0.85, and the number of frequency bins is $K = 300$. Among all the configurations, the selected optimal microphone spacing is $d_{opt} = [2.0, 9.0, 5.0, 1.0]$ cm. The corresponding SF is shown in Figure 2. As can be seen, the optimal microphone positions lead to an SF with smaller peaks than the previous non-uniform positions. Compared to the above uniform and non-uniform positions, the optimal microphone positions have the smallest averaged SF. It is noted that the selected $d_{opt}$ is dependent on the threshold $\gamma$.
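The two-stage search over microphone spacings can be sketched as follows. This is a simplified illustration, not the authors' code: it uses $\mathrm{Tr}[(A^H A)^{-1}]$ for a lossless tube as a stand-in for the SF of [21], so the threshold is left as a parameter rather than fixed at 0.85, and it searches a hypothetical coarse 2 cm grid instead of the exhaustive 1 mm grid to keep the run time short. All function names are assumptions.

```python
import itertools

import numpy as np


def amplification(spacing, freqs, c=343.0):
    """Closed form of Tr[(A^H A)^{-1}] = 2N / (N^2 - |sum_n e^{j2kx_n}|^2)."""
    x = np.concatenate(([0.0], np.cumsum(spacing)))
    k = 2.0 * np.pi * np.asarray(freqs, dtype=float)[:, None] / c
    g = np.abs(np.exp(2j * k * x[None, :]).sum(axis=1))
    n = x.size
    return 2.0 * n / (n**2 - g**2)


def two_stage_search(candidates, freqs, gamma):
    """Stage 1: keep spacings whose peak metric is below gamma.
    Stage 2: among those, return the spacing with the smallest mean metric."""
    best, best_mean = None, np.inf
    for d in candidates:
        with np.errstate(divide="ignore"):
            sf = amplification(d, freqs)
        if sf.max() >= gamma:  # stage 1: reject configurations with large peaks
            continue
        mean = sf.mean()       # stage 2: average over the K frequency bins
        if mean < best_mean:
            best, best_mean = tuple(d), mean
    return best, best_mean


freqs = np.linspace(100.0, 16000.0, 300)      # K = 300 bins, as in the text
grid_m = [0.01, 0.03, 0.05, 0.07, 0.09]       # hypothetical coarse grid
candidates = itertools.product(grid_m, repeat=4)
best, best_mean = two_stage_search(candidates, freqs, gamma=np.inf)
print("best spacing (m):", best, "mean metric:", best_mean)
```

With `gamma=np.inf` the first stage only rejects singular configurations; a finite threshold would need to be chosen on the scale of whichever SF definition is actually used.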

Measurements of Headphone Acoustic Reflectance
The proposed five-microphone array method is used here to measure the acoustic reflectance for different headphones in a customized impedance tube, which is made of stainless steel with an inner diameter of 8 mm, an outer diameter of 20 mm, and a total length of 380 mm. One end of this tube is connected to a standard small right pinna simulator DB 61 removed from the KEMAR dummy head, and the other end is connected to a sound source (headphone). At the measurement positions, five miniature microphones (Sonion 8002, Suzhou, China) are flush mounted on the tube.
To compensate the microphone mismatch, the frequency responses of all microphones are sequentially measured at position $x_1$ in the impedance tube. The B&K PULSE 3560C (Naerum, Denmark) drives the sound source to produce a linear sweep signal from 20 Hz to 20 kHz. Then the compensation functions in terms of Equation (3) can be determined as
$$J_n(f) = \frac{U_1(f)}{U_n(f)} \tag{15}$$
where $U_n(f)$ denotes the frequency response of the $n$th microphone measured at position $x_1$. The compensation functions obtained by Equation (15) are shown in Figure 3. After the compensation functions are obtained, the five microphones are mounted at the positions $x_1$ to $x_N$, and the sound source plays the same sweep signal. With the measured compensation functions and sound pressure signals, the headphone acoustic reflectance $r_0(f)$ can be determined by Equations (7) and (8).
In the experiments, three different types of headphones (insert, semi-open, and closed-back) are measured. As shown in Figure 4, the headphones chosen as representatives of each type are the Huawei AM12 (insert), Beyerdynamic DT880 (semi-open circumaural), and AKG K550 (closed-back circumaural). Figure 5a,b shows the magnitude and phase responses of the acoustic reflectance for the above three types of headphones measured using the proposed five-microphone array method. To validate the proposed method, the magnitude and phase responses of the headphone acoustic reflectance obtained with the single microphone method, in which the microphone is placed sequentially at each measurement position, are also presented (blue lines) in Figure 5a,b. It can be seen that the results measured by the proposed method are in good agreement with those measured by the single microphone method, suggesting that the proposed method is reliable for accurate measurements of the headphone acoustic reflectance over a wide frequency range.
It should be mentioned that the measurement results without the compensation function are close to those obtained with the compensation function. This is because the magnitudes of the compensation functions are less than 3 dB, which shows good consistency among all microphones.
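The compensation functions and their magnitudes in dB can be computed from the responses measured at the common position as sketched below. The function names and the synthetic responses are hypothetical and only illustrate the ratio of the 1st microphone's response to that of the $n$th microphone.

```python
import numpy as np


def compensation_functions(u):
    """J_n(f) = U_1(f) / U_n(f) for responses u of shape (N, K)."""
    u = np.asarray(u, dtype=complex)
    return u[0] / u  # row 0 is the reference microphone


def magnitude_db(j):
    """Magnitude of the compensation functions in dB."""
    return 20.0 * np.log10(np.abs(j))


# Synthetic responses: a common tube response times hypothetical sensitivities.
freqs = np.linspace(20.0, 20000.0, 8)
common = np.exp(-1j * 2.0 * np.pi * freqs * 1e-4)
sens = np.array([1.0, 1.12, 0.91, 1.05 * np.exp(0.02j), 0.97])
u = sens[:, None] * common[None, :]
j = compensation_functions(u)
print("largest |J_n| in dB:", np.max(np.abs(magnitude_db(j))))
```

Multiplying each measured output by its $J_n$ refers all microphones to the sensitivity of the 1st microphone, which is the only property the reflectance estimate requires.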

Measurements of Headphone Equivalent Sound Sources
In general, the sound source generated by a headphone to an ear canal can be modeled as the Thevenin equivalent pressure source or the Norton equivalent volume velocity source [16,23]. However, to measure the Thevenin pressure source, the entrance of the ear canal must be physically blocked, which is not feasible for insert headphones. Thus, the Norton equivalent volume velocity source (NEVVS) is used here to characterize the equivalent sound sources of headphones. By using the proposed five-microphone array method, the NEVVSs of different headphones can be measured with the same impedance tube as used in the measurements of the headphone acoustic reflectance; the detailed measurement steps are described in [10]. The NEVVS responses of the AM12, DT880, and K550 measured by the proposed five-microphone array method are plotted in Figure 5c.

Estimation of HpTFs
The human ear canal is about 27 mm long and about 8 mm in diameter, and can be seen as a slightly bent tube with varying cross-sectional areas. In this paper, for simplicity, the acoustic model of the ear canal is approximated as an $M$-sectional tube with each section having the same length $L$ and variable cross-sectional area $S_m$, $m = 1, \ldots, M$, and the eardrum impedance $Z_{ed}(f)$ is at the end of the $M$th section. Given the measured headphone acoustic reflectance $r_0(f)$ and Norton equivalent volume velocity source $U_0(f)$, the HpTF $H(f)$ can be estimated with the multi-section expression given in [10] (Equation (16)), which is built from the junction reflectances $r_m$, the round-trip propagation factors $e^{-jk_m 2L}$ of the sections, and the eardrum term $r_{ed}(f)e^{-jk_M 2L}$. Here $k_m$ is the wave number in the $m$th section, $r_m = (S_{m+1} - S_m)/(S_{m+1} + S_m)$, and $r_{ed}(f)$ is the reflectance of the eardrum, $r_{ed}(f) = (\rho c/S_M - Z_{ed}(f))/(\rho c/S_M + Z_{ed}(f))$. To simulate the real human ear, the effective eardrum impedance model [11] and ear canal area function [12] measured on human ears are adopted here for estimating the HpTFs of the Huawei AM12, Beyerdynamic DT880, and AKG K550. The above ear canal area function, with a fixed ear canal length of 27 mm, is described by the mid locations $x_k$, referenced to the innermost corner of the ear canal, and the corresponding radii $r_k$.
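The two reflectance definitions above translate directly into code. The sketch below only evaluates $r_m$ and $r_{ed}$ as defined in the text; it does not implement Equation (16). The air density, speed of sound, section radii, and function names are assumed values for illustration.

```python
import numpy as np

RHO = 1.204  # air density in kg/m^3 (assumed, roughly 20 degrees C)
C = 343.0    # speed of sound in m/s (assumed)


def junction_reflectances(areas):
    """r_m = (S_{m+1} - S_m) / (S_{m+1} + S_m) for m = 1, ..., M-1."""
    s = np.asarray(areas, dtype=float)
    return (s[1:] - s[:-1]) / (s[1:] + s[:-1])


def eardrum_reflectance(z_ed, area_last, rho=RHO, c=C):
    """r_ed = (rho*c/S_M - Z_ed) / (rho*c/S_M + Z_ed)."""
    z0 = rho * c / area_last  # characteristic impedance of the last section
    return (z0 - z_ed) / (z0 + z_ed)


# Hypothetical 4-section canal that narrows toward the eardrum.
radii_m = np.array([4.0e-3, 3.8e-3, 3.5e-3, 3.0e-3])
areas = np.pi * radii_m**2
r_m = junction_reflectances(areas)
r_ed = eardrum_reflectance(z_ed=3.0 * RHO * C / areas[-1], area_last=areas[-1])
print("r_m :", r_m)
print("r_ed:", r_ed)
```

Because $r_m$ depends only on area ratios, scaling all radii by a common factor leaves every $r_m$ unchanged, while $r_{ed}$ changes through $\rho c/S_M$.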
Three different ear canal area functions with fixed $r_k$ and ear canal lengths of 22 mm, 27 mm, and 32 mm, where the corresponding mid locations are $22x_k/27$, $x_k$, and $32x_k/27$, are chosen to study the effects of various ear canal lengths on HpTFs, and the estimated HpTFs for the AM12, DT880, and K550 are illustrated in Figure 6a. As can be seen, HpTFs vary considerably with headphones and ear canal lengths. Even for the same headphone, HpTFs show large variations between different ear canal lengths. In particular, for the AM12, the peak at 6 kHz is caused by the self-resonance of the headphone and does not depend on the length of the ear canal. The difference in HpTFs could be even greater if individual eardrum impedances are taken into account.
In the same way, to study the effects of various ear canal cross-sectional areas on HpTFs, three different ear canal area functions with a fixed ear canal length of 27 mm and ear canal radii of $0.8r_k$, $r_k$, and $1.2r_k$ are chosen, and the estimated HpTFs for the AM12, DT880, and K550 are depicted in Figure 6b. According to the ear canal acoustic model presented in Equation (16), the effects of the ear canal radii $r_k$ on HpTFs can be equivalent to the effects of the eardrum reflectance $r_{ed}$ on HpTFs, because a common scaling of all radii leaves every $r_m$ unchanged and changes $r_{ed}$ through the term $\rho c/S_M$. As can be seen from Figure 6b, variations of the ear canal radii $r_k$ have a considerable impact on the overall gains of the HpTFs, while the structures of the HpTFs show only slight variations. This is because the resonance frequency of the ear canal is mainly dependent on the length of the ear canal rather than the eardrum impedance.
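The dependence of the resonance frequency on canal length can be illustrated with a textbook idealization: a uniform tube that is open at the entrance and rigidly closed at the far end resonates at odd multiples of $c/(4L)$. This is not the model used in the paper, which includes the headphone load, the area function, and the eardrum impedance, so the values below only indicate the trend. The function name and the speed of sound are assumptions.

```python
def quarter_wave_resonances(length_m, count=3, c=343.0):
    """Resonances (Hz) of a uniform tube open at one end, rigidly closed at the other."""
    if length_m <= 0:
        raise ValueError("length must be positive")
    return [(2 * n - 1) * c / (4.0 * length_m) for n in range(1, count + 1)]


# The three canal lengths considered in the text.
for length_mm in (22, 27, 32):
    res = quarter_wave_resonances(length_mm / 1000.0)
    print(f"{length_mm} mm:", ", ".join(f"{f:.0f} Hz" for f in res))
```

In this idealization the cross-sectional area does not appear at all, which is consistent with the observation that changing the radii mainly affects the gain rather than the resonance structure.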
On the whole, HpTFs show large variations between different headphones and ear canals. In binaural reproduction, individual HpTFs should be compensated in order to faithfully reproduce binaural signals at the eardrums. Compensation using a non-individual HpTF obtained from another person may cause considerable perceptual degradation. The method of estimating HpTFs makes it possible to establish a database for various headphones and listeners by adjusting the parameters of headphones and ears. Actual individual eardrum impedances and ear canal area functions are expected to be used in future research.

Conclusions
An optimal five-microphone array method is developed for the measurements of the headphone acoustic reflectance and equivalent sound sources needed in the estimation of HpTFs. The optimal microphone positions are selected based on a two-stage searching algorithm, and compensation of microphones is implemented by introducing a compensation function. Experimental results show the effectiveness of the measurement method. This paper proposes that, given the parameters of headphones and ears, HpTFs can be estimated based on a computational acoustic model. The estimation results demonstrate that the parameters of headphones and ear canals have a considerable impact on the HpTFs, and compensations using individual HpTFs are essential for headphone reproduction. Our further research will focus on the effects of individualized compensation using model-based HpTFs on binaural reproduction.