Advancement of Individualized Head-Related Transfer Functions (HRTFs) in Perceiving the Spatialization Cues: Case Study for an Integrated HRTF Individualization Method

: Head-related transfer function (HRTF), which varies across individuals at the same direction, has grabbed widespread attention in the ﬁeld of acoustics and been used in many scenarios. In order to in-depth investigate the performance of individualized HRTFs on perceiving the spatialization cues, this study presents an integrated algorithm to obtain individualized HRTFs, and explores the advancement of such individualized HRTFs in perceiving the spatialization cues through two di ﬀ erent binaural experiments. An integrated method for HRTF individualization on the use of Principle Component Analysis (PCA), Multiple Linear Regression (MLR) and Partial Least Square Regression (PLSR) was presented ﬁrst. The objective evaluation was then made to verify the algorithmic e ﬀ ectiveness of that method. Next, two subjective experiments were conducted to explore the advancement of individualized HRTFs in perceiving the spatialization cues. One was auditory directional discrimination degree based on semantic di ﬀ erential method, in which the azimuth information of sound sources was told to the listeners before listening. The other was auditory localization, in which the azimuth information was not told to the listeners before listening. The corresponding statistical analyses for the subjective experimental results were made. All the experimental results support that individualized HRTFs obtained from the presented method achieve a preferable performance in perceiving the spatialization cues.


Introduction
Auditory sensing is essential for human beings to perceive the three-dimensional (3D) environment. The human hearing system can localize sound sources by using of the spatialization cues of the sources, for which head-related transfer functions (HRTFs) and the corresponding impulse responses (head-related impulse responses, HRIRs) play a vital role. HRTF is the transfer function of a linear system describing the filtering effect of the pinna, head and torso of a listener as a sound propagates from the source to the ear drum in free space [1]. Consequently, HRTFs are highly individualized, as the aforementioned anthropometric features are listener-dependent. HRTFs are defined as the ratio of the Fourier transform of the signal at the listener's eardrum to that at the center of the listener's head with the listener absent [1]. Obtaining accurate binaural HRTFs is of great significance in generating 3D sound. However, experimental measurement for HRTFs is difficult and time-consuming, which can not sufficiently support the practical application. Therefore, fast and efficiently individualizing HRTFs have drawn more and more attention from researchers.
Theoretically, the estimation of HRTFs is to decompose the parts that reflect anthropometric features from sound waves. It is the solution to a wave equation under a certain boundary condition. The most straightforward way to estimate the HRTF is to solve Rayleigh scattering of rigid sphere on plane wave, in which head and ears are simplified as a rigid sphere and two symmetrical points on the surface of the horizontal diameter respectively. However, the simplified head model fails to catch the diversity of individuals. Algazi et al. [2] used multipole reexpansion and boundary element methods to compute the HRTF of the models in both the frequency domain and the time domain. Katz [3] used boundary element method to calculate a portion the HRTF of an individual based on precise geometrical data. The calculation is time-consuming; special devices are needed to obtain the grid sampling data of human body surface. Meshram et al. [4] implemented the numerical simulation of acoustic propagation with Adaptive Rectangular Decomposition, which saved about 20 min when calculating HRTFs on a computer with eight cores and 3.4 GHz. Unfortunately, this is still time-consuming and difficult to implement in auralization [5].
Given the limitations of the above methods, many researchers tried to select a pair of HRTFs from an HRTF database as the individualized HRTFs of a new subject. According to this idea, the selection criterion is the similarity of anthropometric features. The more similar the anthropometric features, the closer the filtering effect on sound waves, and HRTFs can be shared among subjects with similar anthropometric features [6]. Based on the similarity and relativity of anthropometric structures, Zeng et al. [7] proposed a hybrid HRTF individualization algorithm, which has combined the method of principal component analysis (PCA), multiple linear regression (MLR) and database matching. Andreopoulou and Roginska [8] proposed a method of database matching by using sparsely measured HRTF datasets. It minimized data collection durations by applying a selective measurement procedure without weakening spatialization accuracy, which provides users the best-fitting densely measured HRTFs from an existing repository. Even though such methods can obtain individualized HRTFs fast, there are always differences in the same anthropometric features among different subjects. Thus, HRTFs are also different. Such methods still need improvement.
Considering the filtering effect of anthropometric features on sound waves, many researchers tried to construct a functional relationship between anthropometric features and HRTFs with the help of signal processing. By reducing the dimensions of HRTFs to five basic vectors, Nishino et al. [9] established a mapping system model from anthropometric features to basic vectors to obtain individualized HRTFs. By reducing the dimensions of HRIRs through non-negative matrix factorization, Tang et al. [10] reduced the HRIR dimensions of different subjects on a same direction to extract the feature vectors closely related to the anthropometric features. Then, eight anthropometric features were selected with correlation analysis to conduct the mapping relationship from the former to the latter by support vector regression. Next, the individualized HRIR basis vectors could be inferred by the trained nonlinear regression model, after which the individualized HRIR can be obtained by multiplying the feature vectors and basis vectors. Hu et al. [11] conducted a nonlinear mapping from eight independent anthropometric features to the twelve weight vectors of the principal components by using back-propagation neural network to obtain individualized HRTFs. Under the assumption that the linear weighting relationship between HRTF magnitudes are the same for different subjects and anthropometric features, Bilinski et al. [12] proposed the sparse efficient vectors, which can be used to represent anthropometric features of target subject with that of available subjects in the database. The individualized HRTF magnitudes could be obtained by multiplying the sparse vectors to the HRTF magnitudes in the database. Tashev et al. [13] supposed that the interaural time difference and HRTF magnitude possessed the same sparse relationship with the anthropometric features; they elicited a sparse vector from the anthropometric features, which could be used to recover the phase and magnitude of HRTFs. He et al. [14] studied the accuracy of different preprocessing methods on the sparse reconstruction of HRTFs. Zhu et al. [15] proposed another sparse based HRTF individualization method. Considering that the anthropometric features were not equally important for HRTFs [16], a weight set was assigned to the anthropometric features using a partially on-off strategy. And a sparse representation of the weighted anthropometric features was found. The individualized HRTFs could be synthesized through the weighted sparse representation of anthropometric features. Hugeng and Dadang [17] proposed a simple and efficient method for HRTF individualization by model individual HRTF magnitudes with a linear combination of ten orthonormal basis obtained from principle component analysis. Such methods mainly rely on the algorithm improvements to meet the demand of synthetic HRTF precision.
Obtaining HRTFs fast as well as precisely is strategic for the application in auralization. Although the above-mentioned studies have conducted in-depth research on HRTF individualization, more sufficient methods are still needed to obtain individualized HRTFs that contain accurate spatial information. To synthesize HRTFs for any listener, we present an integrated algorithm combing the method of correlation analysis, principle component analysis (PCA), multiple linear regression (MLR) and partial least square regression (PLSR). First, PCA was used to decompose HRTFs and extract the characteristic parameters. Then, correlation analysis was conducted to make a premier selection for reference features. Next, the retained anthropometric features were put into the MLR model to make a final selection for reference features. Finally, PLSR was used to synthesize the individualized HRTFs of a new subject. The advancement of synthesized HRTFs were then validated by two subjective listening tests using auralized sounds produced by convolving an original signal with the synthesized HRTFs. In our current research, only the horizontal plane data were used for HRTF individualization and listening tests.

Database and Characteristics
Since the database used in our current research has never been published before, this section includes a brief introduction and an analysis of the characteristics of HRTFs in the database.

Database
The database used in this paper was co-measured by Avic Huadong Photoelectric Co., LTD and our research group in Northwestern Polytechnical University. The measurement was conducted in a semi-anechoic room, with a size of 4.4 m × 4 m × 3.2 m. The cut-off frequency was 80 Hz, and the background noise was 10 dB. During the measurement, the loudspeaker was fixed at the arc slide track at a distance of 1.2 m to the head center. A pseudo-random signal was adopted as the measuring signal. The database contains 56 subjects aged between 25 and 45 with 50 anthropometric features of each subject and Head-Related Impulse Responses (HRIRs, the denotation of HRTFs in time domain) at 723 directions (see details in Table 1). All the HRIRs were in 800-sample length with a sampling frequency of 44.1 kHz.

Characteristics Analysis
Interaural Time Difference (ITD) and Interaural Level Difference (ILD) are two important features that can be used to test the accuracy of HRTFs. Figure 1 shows the average value of ITD at different

Characteristics Analysis
Interaural Time Difference (ITD) and Interaural Level Difference (ILD) are two important features that can be used to test the accuracy of HRTFs. Figure 1 shows the average value of ITD at different directions calculated from the HRTFs of 56 subjects in the database. Figure 2 shows the broadband average ILD (0 ~ 22.5 kHz) at different directions calculated from the HRTFs of 56 subjects in the database. ITDs were calculated through the method of rising edge with 20% [18]. For a certain elevation, the ITDs at Additionally, the elevation at horizontal plane has the largest ITD range, the ITD range decreases gradually as ϕ deviates from the horizontal plane. These results are consistent with existing literature [19]. As shown in Figure 2, ILDs are obviously asymmetric. The ILD generally has the largest variable range on the horizontal plane, which declines when the elevation increases. These are also consistent with the existing literature [18].
Furthermore, seven subjects were selected randomly to calculate their ITDs in the direction of θ = 0°, ϕ = 0° as shown in Figure 3. Although the ITDs of all the seven subjects perform the same tendency, there are still some apparent differences, such as the accurate location and value of the peak and valley, which also suggest the necessity of individualization for HRTFs.  ITDs were calculated through the method of rising edge with 20% [18]. For a certain elevation, the ITDs at 0 • ≤ θ ≤ 180 • are symmetric to those at 180 • ≤ θ ≤ 360 • . As sound moves laterally, the absolute value of ITD increases and reaches a maximum value near θ = 90 • and θ = 270 • . Additionally, the elevation at horizontal plane has the largest ITD range, the ITD range decreases gradually as φ deviates from the horizontal plane. These results are consistent with existing literature [19].
As shown in Figure 2, ILDs are obviously asymmetric. The ILD generally has the largest variable range on the horizontal plane, which declines when the elevation increases. These are also consistent with the existing literature [18].
Furthermore, seven subjects were selected randomly to calculate their ITDs in the direction of θ = 0 • , φ = 0 • as shown in Figure 3. Although the ITDs of all the seven subjects perform the same Appl. Sci. 2019, 9, 1867 5 of 19 tendency, there are still some apparent differences, such as the accurate location and value of the peak and valley, which also suggest the necessity of individualization for HRTFs. As shown in Figure 2, ILDs are obviously asymmetric. The ILD generally has the largest variable range on the horizontal plane, which declines when the elevation increases. These are also consistent with the existing literature [18].
Furthermore, seven subjects were selected randomly to calculate their ITDs in the direction of θ = 0°, ϕ = 0° as shown in Figure 3. Although the ITDs of all the seven subjects perform the same tendency, there are still some apparent differences, such as the accurate location and value of the peak and valley, which also suggest the necessity of individualization for HRTFs.

Methodology
Since not all the anthropometric features in the database are related to human hearing, it is necessary to find out those closely related to HRTFs as reference features. Once they are found, the main focus in application is to measure them and use them to synthesize individualized HRTFs. The procedure for this algorithm is shown in Figure 4. First, PCA was used to extract the characteristic

Methodology
Since not all the anthropometric features in the database are related to human hearing, it is necessary to find out those closely related to HRTFs as reference features. Once they are found, the main focus in application is to measure them and use them to synthesize individualized HRTFs. The procedure for this algorithm is shown in Figure 4. First, PCA was used to extract the characteristic parameters of HRTFs. Then, MLR was conducted to find the relationship between HRTF characteristic parameters and anthropometric features. Next, PLSR was used to synthesize the phase and magnitude of HRTFs. Finally, the individualized HRTFs of a new subject were obtained.

Principle Component Analysis for HRTF Characteristics Parameters
Principle component analysis is a statistical technique for analyzing and simplifying data set. It is often used to reduce the dimensionality of a data set while retaining the features that contribute the most to the variance of the data set. In previous research [20][21][22], PCA had been proved of effective in reconstruction of HRTF logarithmic magnitude. In current research, we applied PCA on HRTF logarithmic magnitude to extract the principle components as the characteristic parameters.
The logarithmic magnitudes of HRTF at M directions on the horizontal plane can be denoted as:

Principle Component Analysis for HRTF Characteristics Parameters
Principle component analysis is a statistical technique for analyzing and simplifying data set. It is often used to reduce the dimensionality of a data set while retaining the features that contribute the most to the variance of the data set. In previous research [20][21][22], PCA had been proved of effective in reconstruction of HRTF logarithmic magnitude. In current research, we applied PCA on HRTF logarithmic magnitude to extract the principle components as the characteristic parameters. The logarithmic magnitudes of HRTF at M directions on the horizontal plane can be denoted as: where f = 1, 2, . . ., N denotes the frequency bin of HRTF. Then, the covariance matrix could be calculated as: where [R] is an N by N Hermite matrix, and its eigenvalue is real. + denotes the generalized inverse of matrix [H log ]. Its eigenvector is extracted and arranged as the eigenvalue reduced-order u 1 , u 2 , . . . , u Q . Then the first Q eigenvectors are taken out as the base-vectors and to build a matrix: Because u 1 , u 2 , . . . , u Q are orthogonal, [H log ] N×M can be decomposed on the use of these base-vectors, then the weight matrix can be achieved accordingly: herein, [W] Q×M is the characteristic parameter of HRTFs. Figure 5 gives the result of PCA on a left ear's HRTF logarithmic magnitudes. As can be seen, more than 95% accuracy of the HRTF logarithmic magnitudes can be reconstructed from six principle components.

Multiple Linear Regression for Anthropometric Feature Selection
In this research, multiple linear regression was used for selecting the anthropometric features closely relating to the HRTFs. Suppose the relationship between principle components and the corresponding anthropometric features in the spatial direction θ is: where θ w represents the weight vector of HRTF logarithmic magnitude at direction θ , θ B represents the regression coefficients matrix, X denotes the anthropometric feature matrix, θ E represents the estimation error matrix, respectively. The regression coefficients matrix θ B can be calculated through the anthropometric features: T represents the transpose of a matrix. There were a number of anthropometric features in the database. For the reason that there were different correlations between the features and HRTFs, it was unnecessary to introduce all of them into the MLR models. By using unnecessary features, some useful information might be concealed, which led to a worse regression model. Meanwhile, getting rid of the unnecessary features alleviated the complexity of the system, which reduced the workload of the individualization procedure. In order to select the anthropometric features used in the regression model, correlation analysis was first conducted on different features. For the two features with large linear correlation coefficient, the one which had more significant influence was reserved. Then, correlation analysis was applied on the reserved features and HRTF logarithmic magnitudes to delete those features with smaller

Multiple Linear Regression for Anthropometric Feature Selection
In this research, multiple linear regression was used for selecting the anthropometric features closely relating to the HRTFs. Suppose the relationship between principle components and the corresponding anthropometric features in the spatial direction θ is: where w θ represents the weight vector of HRTF logarithmic magnitude at direction θ, B θ represents the regression coefficients matrix, X denotes the anthropometric feature matrix, E θ represents the estimation error matrix, respectively. The regression coefficients matrix B θ can be calculated through the anthropometric features: T represents the transpose of a matrix. There were a number of anthropometric features in the database. For the reason that there were different correlations between the features and HRTFs, it was unnecessary to introduce all of them into the MLR models. By using unnecessary features, some useful information might be concealed, which led to a worse regression model. Meanwhile, getting rid of the unnecessary features alleviated the complexity of the system, which reduced the workload of the individualization procedure. In order to select the anthropometric features used in the regression model, correlation analysis was first conducted on different features. For the two features with large linear correlation coefficient, the one which had more significant influence was reserved. Then, correlation analysis was applied on the reserved features and HRTF logarithmic magnitudes to delete those features with smaller correlation coefficients. Next, the selected anthropometric features were introduced to the MLR model, in which F-test and backward selection at significant level α = 0.05 would be used to delete those insignificant features.
Considering the linear regression model: where X is an n by p full rank matrix of known constant, Y is an n-vector of response, β is a p-vector of unknown parameter, and ε is an n by 1 unobservable error with a normal distribution N(0, σ 2 I n ).
Assume that a hypothesis in model (7) is given by H 0 : Aβ = c, where A is a known q by p matrix with rank q and c is a q by 1 vector. The usual test for hypothesis H 0 is F-test, which was equivalent to the likelihood ratio test. The test statistic F-statistic was given by: is an unbiased estimator of σ 2 in model (7), where P x = X(X T X) −1 X T . When H 0 holds the F-statistic Equation (8) distributes as an F distribution with degrees of freedom q and (n-p).
After the F-statistic has been calculated, the backward selection is used to select the significant features. First, all the features retained after correlation analysis were put into the regression model, then F-statistic of each feature was calculated and compared to the one at the significant level α = 0.05. After deleting the most insignificant feature, the rest ones were put into the model again and the insignificant one among them was deleted. This process was repeated until all the rest features were significant at the significant level α = 0.05. Table 2 gives the results of anthropometric feature selection for each step. And the anthropometric features are shown in detail as Figure 6. Table 2. Anthropometric features retained after each processing procedure.

Partial Least Square Regression for HRTF Synthesis
Partial least square regression is a statistical method that is used to search the basic functional relationship between two matrices [23]. It is often used in mapping the linear relationship from multiple independent variables to multiple dependent variables.
Assuming that the optimized anthropometric features of S subjects and the logarithmic magnitudes of HRTFs at M directions are presented with the matrices: where F, S, M and N represent the number of anthropometric features used in the model, training subjects, training azimuths and the frequency bins of HRTF, respectively. The first step of PLSR is to normalize the dependent and independent matrices. The normalization process is conducted as the equation below: where a * ij stands for the normalized value of a ij , M ij and m ij stand for the maximum and minimum value of a ij , respectively. The normalized matrices are X 0 and H 0 .
The second step is to construct the internal and external relationships between dependent variables and independent variables, and implement the regression of X 0 and H 0 on the first component: where t 1 represents the first component extracted from X 0 , p 1 and r 1 represent the regression coefficients, H 1 and X 1 represent the residual of X 0 and H 0 , respectively. Next, X 0 and H 0 are replaced with X 1 and H 1 , and the second step is repeated for l times until the accuracy of regression equation met the requirement, where l is determined by the cross validation seeing [22]. Then, the estimation of H 0 can be denoted as: Finally, according to the inverse process of normalization, the regression equation can be restored. Figure 7 shows an example data of measured and synthesized HRTF on the left ear of a KEMAR dummy head at θ = 0 • on the horizontal plane.

Objective Experiments
To evaluate the performance of the presented HRTF individualization method, the leave-oneout cross validation approach was used [24]. Each subject in the database was taken out successively as the testing subject with the remaining 55 ones being used for training.
The spectral distortion was used as the objective evaluation metric. The results of the presented HRTF individualization method was compared with other two existed HRTF individualization methods, which had been verified to be effective previously. One was database-matching approach proposed in [7], the other was general regression neural network based method presented in [25].

Evaluation Criteria
The error metric of spectral distortion widely used was employed to evaluate the synthesis precision of the individualized HRTFs: is the synthesized HRTF at the same direction, and k is the number of the frequency bin (K=400).
Then, the root-mean-square error (RMSE) was used to compare the two sets of HRTFs at all the 72 directions on the horizontal plane:

Objective Experiments
To evaluate the performance of the presented HRTF individualization method, the leave-one-out cross validation approach was used [24]. Each subject in the database was taken out successively as the testing subject with the remaining 55 ones being used for training.
The spectral distortion was used as the objective evaluation metric. The results of the presented HRTF individualization method was compared with other two existed HRTF individualization methods, which had been verified to be effective previously. One was database-matching approach proposed in [7], the other was general regression neural network based method presented in [25].

Evaluation Criteria
The error metric of spectral distortion widely used was employed to evaluate the synthesis precision of the individualized HRTFs: where H (d) k is the measured HRTF at d-th direction, k is the synthesized HRTF at the same direction, and k is the number of the frequency bin (K = 400). Then, the root-mean-square error (RMSE) was used to compare the two sets of HRTFs at all the 72 directions on the horizontal plane: where D stand for the direction of HRTF, D = 72.

Results
The results of the objective evaluation experiments are presented in Table 3, which validates the feasibility of the presented HRTF individualization method. The presented method achieved small synthesis error when compared with both the database matching one and general regression neural network (GRNN) base method. It is also notable that the proposed method achieved balanced synthesis errors for left and right ears, while the other two methods achieved relatively larger synthesis errors for left ear and smaller errors for right ear. Consequently, the proposed method was proved to be algorithmically effective and embodies the superiority in the balanced synthesis error of left and right ear.

Experimental Protocols
To investigate the effectiveness and superiority of the individualized HRTFs in perceiving the spatialization cues, a subjective experiment was conducted to compare the directional discrimination degree of sound sources. This experiment tested two HRTF species: individualized HRTFs (obtained from the presented method) and non-individualized ones (measured from KEMAR dummy head). Eight Chinese adults (four males and four females) aged between 23 and 27 were employed. They were postgraduates and doctoral student majoring in acoustics with self-reported normal hearing and former subjective experiment experience. All the participants were remunerated for their participation. Prior to the experiment, the reference features selected in Section 3.2 of each participant were measured to synthesize their individualized HRTFs. For body features of x 1 , x 2 , x 3 , x 4 , x 5 , x 6 , x 7 , x 8 and x 14 , they were measured directly by a ruler, for head and ear features of x 9 , x 10 , x 11 , x 12 , x 13 , they were measured from digital photographs as shown in Figure 8. On the basis of Inverse Fast Fourier Transform, HRTFs were transferred into HRIRs. Then testing stimuli were produced by convolving the HRIRs with an original double-channel notify signal whose waveform in time domain is shown in Figure 9. The testing stimuli were presented to the participants through Hi-Fi headphone [Sennheiser HD560].
Appl. Sci. 2019, 9, x; doi: FOR PEER REVIEW www.mdpi.com/journal/applsci x8 and x14, they were measured directly by a ruler, for head and ear features of x9, x10, x11, x12, x13, they were measured from digital photographs as shown in Figure 8. On the basis of Inverse Fast Fourier Transform, HRTFs were transferred into HRIRs. Then testing stimuli were produced by convolving the HRIRs with an original double-channel notify signal whose waveform in time domain is shown in Figure 9. The testing stimuli were presented to the participants through Hi-Fi headphone [Sennheiser HD560].

Procedure
In this experiment, the semantic differential method [26][27][28] was adopted to evaluate the directional discrimination degree of sound sources. The order scale was used for evaluation, in which directional discrimination degrees are shown as Table 4. Before the experiment, the instructions and the questionnaires were handed out to the participants. On the questionnaire, the azimuth information of each testing stimulus was listed. During the experiment, each stimulus was played five times in succession, and then there would be a 5 s interval for the participants to select a class (listed in Table 4) according to their judgments. Then, the next stimulus would be presented in the same way until all the stimuli are played. There were totally 24 testing azimuths as shown in Figure  10. As a result, there were 24 (testing azimuth) × 2 (HRTF species) × 8 (participants) = 384 testing stimuli in total, and there were 24 × 2 = 48 testing stimuli for each participant.

Procedure
In this experiment, the semantic differential method [26][27][28] was adopted to evaluate the directional discrimination degree of sound sources. The order scale was used for evaluation, in which directional discrimination degrees are shown as Table 4. Before the experiment, the instructions and the questionnaires were handed out to the participants. On the questionnaire, the azimuth information of each testing stimulus was listed. During the experiment, each stimulus was played five times in succession, and then there would be a 5 s interval for the participants to select a class (listed in Table 4) according to their judgments. Then, the next stimulus would be presented in the same way until all the stimuli are played. There were totally 24 testing azimuths as shown in Figure 10. As a result, there were 24 (testing azimuth) × 2 (HRTF species) × 8 (participants) = 384 testing stimuli in total, and there were 24 × 2 = 48 testing stimuli for each participant.

Procedure
In this experiment, the semantic differential method [26][27][28] was adopted to evaluate the directional discrimination degree of sound sources. The order scale was used for evaluation, in which directional discrimination degrees are shown as Table 4. Before the experiment, the instructions and the questionnaires were handed out to the participants. On the questionnaire, the azimuth information of each testing stimulus was listed. During the experiment, each stimulus was played five times in succession, and then there would be a 5 s interval for the participants to select a class (listed in Table 4) according to their judgments. Then, the next stimulus would be presented in the same way until all the stimuli are played. There were totally 24 testing azimuths as shown in Figure  10. As a result, there were 24 (testing azimuth) × 2 (HRTF species) × 8 (participants) = 384 testing stimuli in total, and there were 24 × 2 = 48 testing stimuli for each participant.  In the experiment, the testing stimuli were divided into two groups (stimuli with individualized HRTFs and stimuli with non-individualized ones) based on the HRTF species, but the participants were unaware of this. The presentation order of testing stimuli in each group was randomized by the Latin square scheme to minimize the deviation of the experimental results that might be caused by  In the experiment, the testing stimuli were divided into two groups (stimuli with individualized HRTFs and stimuli with non-individualized ones) based on the HRTF species, but the participants were unaware of this. The presentation order of testing stimuli in each group was randomized by the Latin square scheme to minimize the deviation of the experimental results that might be caused by the test sequence. This psychoacoustic experiment was performed only once, and the subjects were required to make only one judgment for each stimulus. The experiment took about 15 min for each participant. Figure 11 gives the results of the subjective experiment for discrimination degree of given azimuths, in which an upper and shorter box implies a better evaluation results. As shown in Figure 11, stimuli with individualized HRTFs are evidently easy to discriminate the azimuth information. Despite that the stimuli at directions of 90 • and 270 • are easy to discriminate with non-individualized HRTFs, the individualized ones even improve the discrimination degree. At azimuths of 30 • and 45 • , the performance of the individualized HRTF species is obviously superior to the non-individualized species in the 25th percentiles, the median and the 75th percentiles. Moreover, the stimuli with azimuths in front of the listening subject (left part of Figure 11a and right part of Figure 11b) are noted easy to discriminate than the ones at the back of the listener (right part of Figure 11a and left part of Figure 11a). This is caused by the human auditory nature of being sensitive to the front directions. the test sequence. This psychoacoustic experiment was performed only once, and the subjects were required to make only one judgment for each stimulus. The experiment took about 15 min for each participant. Figure 11 gives the results of the subjective experiment for discrimination degree of given azimuths, in which an upper and shorter box implies a better evaluation results. As shown in Figure  11, stimuli with individualized HRTFs are evidently easy to discriminate the azimuth information. Despite that the stimuli at directions of 90° and 270° are easy to discriminate with non-individualized HRTFs, the individualized ones even improve the discrimination degree. At azimuths of 30° and 45°, the performance of the individualized HRTF species is obviously superior to the non-individualized species in the 25 th percentiles, the median and the 75 th percentiles. Moreover, the stimuli with azimuths in front of the listening subject (left part of Figure 11a and right part of Figure 11b) are noted easy to discriminate than the ones at the back of the listener (right part of Figure 11a and left part of Figure 11a). This is caused by the human auditory nature of being sensitive to the front directions. Figure 11. Box-plot for results of the subjective experiment for discrimination degree of given azimuths. (a) and (b) represent the azimuths in right and left of the subject, respectively. X-axis and Y-axis denote the testing azimuths and the discrimination degree showed in Table 4, respectively. Red and blue boxes denote the HRTF species of the presented individualization method and KEMAR dummy head. Box edges mark the 25 th and 75 th percentiles, black dots in a circle mark the median. Outliers are indicated by +.

Results and Analysis
To determine whether the individualized HRTFs perform better than the non-individualized ones in this subjective experiment, statistical analysis was made. First of all, a two-way repeated variance analysis (ANOVA) was made under the significance level of α = 0.05 to verify the validity of the experimental results. It was hypothesized that both the testing HRTF species and azimuths had  Figure 11. Box-plot for results of the subjective experiment for discrimination degree of given azimuths. (a) and (b) represent the azimuths in right and left of the subject, respectively. X-axis and Y-axis denote the testing azimuths and the discrimination degree showed in Table 4, respectively. Red and blue boxes denote the HRTF species of the presented individualization method and KEMAR dummy head. Box edges mark the 25th and 75th percentiles, black dots in a circle mark the median. Outliers are indicated by +.
To determine whether the individualized HRTFs perform better than the non-individualized ones in this subjective experiment, statistical analysis was made. First of all, a two-way repeated variance analysis (ANOVA) was made under the significance level of α = 0.05 to verify the validity of the experimental results. It was hypothesized that both the testing HRTF species and azimuths had no significant effect on the subjects (H 0 ). The hypothesis was kept if p was larger than α (p stands for the probability of making mistakes regarding to rejecting H 0 ). The hypothesis was rejected if p was less than α, the factor was thus considered to have significant effect on the subjects. The result of ANOVA was shown in Table 5. As can be seen from Table 5, both the HRTF species and the testing azimuth had significant effect on the participants, while the interaction of HRTF species and azimuth had no significant effect on the participants. Consequently, the psychoacoustic experiment was considered to be valid. Next, the Least Significant Difference (LSD) for independent samples was employed as the post-hoc of ANOVA to determine whether the individualized HRTFs performed better than the non-individualized ones in this experiment. It was hypothesized that there was no significant difference between the evaluation results of individualized HRTFs and non-individualized ones (H 0 ). p stands for the probability of making mistakes regarding to rejecting H 0 . H 0 was kept when p was larger than α, while H 0 was rejected when p was less than α. Under the significant level of α = 0.05, the result of LSD was given in Table 6 and Figure 12. As can be seen from Table 6, the evaluation results were significantly different between individualized HRTFs and the non-individualized ones. The mean class of the individualized HRTFs was larger than that of the non-individualized ones. The individualized HRTFs were consequently considered performing better than the non-individualized ones in this psychoacoustic experiment.
Appl. Sci. 2019, 9, x FOR PEER REVIEW 14 of 20 Appl. Sci. 2019, 9, x; doi: FOR PEER REVIEW www.mdpi.com/journal/applsci no significant effect on the subjects (H0). The hypothesis was kept if p was larger than α (p stands for the probability of making mistakes regarding to rejecting H0). The hypothesis was rejected if p was less than α, the factor was thus considered to have significant effect on the subjects. The result of ANOVA was shown in Table 5. As can be seen from Table 5, both the HRTF species and the testing azimuth had significant effect on the participants, while the interaction of HRTF species and azimuth had no significant effect on the participants. Consequently, the psychoacoustic experiment was considered to be valid. Next, the Least Significant Difference (LSD) for independent samples was employed as the posthoc of ANOVA to determine whether the individualized HRTFs performed better than the nonindividualized ones in this experiment. It was hypothesized that there was no significant difference between the evaluation results of individualized HRTFs and non-individualized ones (H0). p stands for the probability of making mistakes regarding to rejecting H0. H0 was kept when p was larger than α, while H0 was rejected when p was less than α. Under the significant level of α = 0.05, the result of LSD was given in Table 6 and Figure 12. As can be seen from Table 6, the evaluation results were significantly different between individualized HRTFs and the non-individualized ones. The mean class of the individualized HRTFs was larger than that of the non-individualized ones. The individualized HRTFs were consequently considered performing better than the non-individualized ones in this psychoacoustic experiment.   The results of this subjective experiment indicate that, comparing with the non-individualized HRTFs, the individualized ones obtained from the presented integrated method involved more accurate directional information. Meanwhile, the individual HRTFs performed better in discriminating the directional information. This is of great significance in the practical application.

Experimental Protocols
The experiment in Section 4.2 indicated that the directional information of sound sources with individualized HRTFs were easier to be detected when the azimuth had been known before. To further verify the superiority of the individualized HRTFs in perceiving the spatialization cues when the directional information was unknown for the listeners, the subjective localization experiment was also carried in this section. The participants employed in this experiment were the same as Section 4.2. The localization experiment also tested two HRTF species: individualized HRTFs (obtained from the presented method) and non-individualized HRTFs (measured from KEMAR dummy head). There were totally 22 testing azimuths on the horizontal plane. The testing stimuli were produced by convolving the signal shown in Figure 8

Procedure
During the experiment, the testing stimuli were presented to the participants directly through headphones [Sennheiser HD 560]. Before the test, the instructions and questionnaires for the experiment were handed out to all the participants. On the questionnaire, all the testing azimuths were listed for each testing stimulus. The participants were asked to select the most accurate azimuth according to their own judgments after hearing the stimulus. During the test, each stimulus was presented five times in succession. Then, there would be a 5 s interval, in which the participants made their judgments to the presented stimulus on the questionnaire. After that, the next stimulus would be presented in the same way, until all the testing stimuli of the two testing HRTF species were finished. The test continued until this process was repeated three times. To avoid the possible bias of the localization results caused by auditory fatigue, there was a 20 min rest between repetitions.
The presentation order of the stimuli was randomized using an altered Latin square scheme. Thus, the possible bias effects caused by the order effect and sequential dependencies could be minimized. The auditory localization experiment lasted about 1.5 h for each participant. Figure 13 shows the localization results of all the eight subjects by using the presented individualized HRTFs and the non-individualized ones of KEMAR dummy head. X-axis and Y-axis stand for the real azimuth and the perceived azimuth, respectively. Perfect correlation between target azimuth and response judgment corresponds points on line A on these graphs. That means the subjects' judgments were exactly the same as the real directions. On the other hand, points out of line B and line C correspond to localization errors larger than 45 • . Besides, points on the coordinate line of 180 • correspond to front-back confusions. It can be observed that the individualized HRTFs achieve slightly better localization performance, both in accurate localization and smaller localization error.  As a supplementary to the data shown in Figure 13, the localization accuracy rate, the average error angle and mean of the standard deviation of the error were calculated, and the results are listed in Table 7. It can be found from Table 7 that the localization accuracy rates of the individualized HRTFs were larger than those of non-individualized ones, except subject 7. The average localization errors of the individualized HRTFs were smaller than the non-individualized ones for all the participants, even though the difference was not obvious for subject 8. Moreover, the standard deviations of the individualized HRTFs were smaller than the non-individualized ones for participants, except subject 8. The individualized HRTFs seemed to be preferable in auditory localization.

Results and Analysis
To determine whether the experimental results were valid, two-way repeated variance analysis (ANOVA) was made for the localization errors of each subject under the significant level of α = 0.05. The two factors were HRTF species and testing azimuth. It was hypothesized that both the HRTF species and testing azimuth had no significant effect on the localization errors (H 0 ). p denotes the probability of making mistakes regarding to rejecting H 0 . H 0 was kept when the probability p was larger than α, while H 0 was rejected when p was less than α. The result of ANOVA was listed in Table 8. As can be seen from Table 8, the HRTF species, testing azimuth and their interaction had significant effect on the localization errors. Consequently, the localization experiment was proved valid. Next, the Least Significant Difference (LSD) for independent samples was employed as the post-hoc of ANOVA to determine whether the individualized HRTFs performed better than the non-individualized ones in the binaural localization experiment. The localization errors of the testing HRTF species were hypothesized to be of no significant difference (H 0 ). p stands for the probability of making mistakes regarding to rejecting H 0 . H 0 was kept when p was larger than α while H 0 was rejected when p was less than α. Under the significance level of α = 0.05, the result of the post-hoc was shown in Table 9 and Figure 14.   It can be found from Table 9 that under the significant level of α = 0.05, the localization errors of the individualized HRTFs and the non-individualized ones are significantly different. For the reason that the mean localization error of the individualized HRTFs was less than that of the nonindividualized ones, the individualized HRTFs were considered performing better than the nonindividualized ones in the binaural localization experiment.

Conclusion
The main contributions of this research lie in: 1) an integrated algorithm for HRTF individualization combining correlation analysis, principle component analysis, multiple linear regressions and partial least square regression was presented; 2) two psychoacoustic experiments were conducted to explore the superiority of the individualized HRTFs in perceiving the spatialization cues of sound sources. All the results support that the individualized HRTFs obtained through the presented integrated method achieve preferable performance in perceiving the spatialization cues than the general non-individualized ones.
Given the conclusions above, individualized HRTFs should be more necessary in practical applications, such as virtual hearing, tel-conference, etc. However, the practical applications involve not only spatialization cues, but also other attributes such as timbre. The use of HRTFs or even individualized HRTFs may possibly cause some other good or bad effects, which were not considered in this research. Thus, future work will need to investigate whether HRTFs and individualized HRTFs leads to some potential effects on binaural auditory in practical applications. It can be found from Table 9 that under the significant level of α = 0.05, the localization errors of the individualized HRTFs and the non-individualized ones are significantly different. For the reason that the mean localization error of the individualized HRTFs was less than that of the non-individualized ones, the individualized HRTFs were considered performing better than the non-individualized ones in the binaural localization experiment.

Conclusions
The main contributions of this research lie in: (1) an integrated algorithm for HRTF individualization combining correlation analysis, principle component analysis, multiple linear regressions and partial least square regression was presented; (2) two psychoacoustic experiments were conducted to explore the superiority of the individualized HRTFs in perceiving the spatialization cues of sound sources. All the results support that the individualized HRTFs obtained through the presented integrated method achieve preferable performance in perceiving the spatialization cues than the general non-individualized ones.
Given the conclusions above, individualized HRTFs should be more necessary in practical applications, such as virtual hearing, tel-conference, etc. However, the practical applications involve not only spatialization cues, but also other attributes such as timbre. The use of HRTFs or even individualized HRTFs may possibly cause some other good or bad effects, which were not considered in this research. Thus, future work will need to investigate whether HRTFs and individualized HRTFs leads to some potential effects on binaural auditory in practical applications.