A Hybrid Algorithm for Predicting Median-Plane Head-Related Transfer Functions from Anthropometric Measurements

: Since head-related transfer functions (HRTFs) represent the interactions between sounds and physiological structures of listeners, anthropometric parameters represent a straightforward way to customize (or predict) individualized HRTFs. This paper proposes a hybrid algorithm for predicting median-plane individualized HRTFs using anthropometric parameters. The proposed hybrid algorithm consists of three parts: decomposition of HRTFs; selection of key anthropometric parameters; and establishing a prediction formula. Firstly, an independent component analysis (ICA) is applied to median-plane HRTFs from multiple subjects to obtain independent components and subject-dependent weight coe ﬃ cients. Then, a factor analysis is used to select key anthropometric parameters relevant to HRTFs. Finally, a regression formula that connects ICA weight coe ﬃ cients to key anthropometric parameters is established by a multiple linear regression. Further, the e ﬀ ectiveness of the proposed hybrid algorithm is veriﬁed by an objective evaluation via spectral distortion and a subjective localization experiment. The results show that, when compared with generic Knowles Electronics Manikin for Acoustic Research (KEMAR) HRTFs, the spectral characteristics of the predicted HRTFs are closer to those of the individualized HRTFs. Moreover, the predicted HRTFs can alleviate front–back and up–down confusion and improve the accuracy of localization for most subjects.


Introduction
Three-dimensional spatial sound localization includes directional (i.e., azimuth and elevation) and distance localization. Lateral localization mainly depends on interaural differences, such as the interaural time difference (ITD) and the interaural sound level difference (ILD), caused by different transmission paths from sound sources to the left/right ear. Vertical localization, on the other hand, mainly relies on the change in the magnitude spectrum when sound waves propagate to the ears, that is, the spectral cue. Head-related transfer functions (HRTFs), defined as the system transfer function of the sound filter from a sound source to human eardrums in the frequency domain, include the aforementioned ITD, ILD, and spectral characteristics. One major application of binaural HRTFs is the creation of a virtual auditory display (VAD). A VAD can not only be used as an experimental platform in research of binaural hearing, but also has various applications in such fields as virtual reality, mobile communication, and multimedia [1][2][3]. From a physical point of view, HRTFs are closely related to human physiological structures. For example, different head sizes result in different levels of ITD, and different shapes and sizes of pinnae lead to different spectral characteristics. Therefore, HRTFs are individual-dependent, and each specific individual owns a specific set of HRTFs (individualized HRTFs). Previous literatures have reported that the use of non-individualized HRTFs in VADs, especially in the case of a static situation, often leads to a series of problems, such as increased front-back confusion and inside-of-head localization [4]. Therefore, to guarantee the authenticity of VADs, it is necessary to use individualized HRTFs for signal processing.
Acoustic measurement and numerical simulation are two primary approaches to obtain individualized HRTFs accurately [5][6][7][8][9][10][11]. However, acoustic measurement is time-consuming and requires a special apparatus; the calculation suffers from high requirements for computer performance and the scanning of physiological structures. Therefore, these two methods cannot support practical applications. In order to solve conflicts between the need for and acquisition of individualized HRTFs, researchers have proposed a method called the customization of individualized HRTFs. Customization (i.e., approximate acquisition) refers to simplifying the process of acquisition of individualized HRTFs as much as possible, while preserving individual auditory characteristics. As VADs have wide application, the customization of individualized HRTFs has attracted widespread attention.
Because HRTFs are closely related to individual physiological structures, HRTF customization based on anthropometric parameters is natural. Zotkin et al. [12] suggested that the similarity of anthropometric parameters between individuals corresponded to the similarity of their HRTFs. So, it is possible to select best-matching HRTFs from an HRTF baseline database according to the deviation in pinnae parameters. Then, subsequent studies on the optimization of matching parameters were carried out [13][14][15][16][17][18][19][20][21]. Zeng et al. [13] explored the statistical relationships between HRTF characteristic vectors and anthropometric parameters. Liu and Zhong [14] studied the similarity between anthropometric parameters, and proposed an improved database-matching method using four pinna-related anthropometric parameters. Iida et al. [15] first estimated the frequencies of the two lowest spectral notches (N1 and N2) by a listener's pinnae anthropometry, and then selected the best-matching HRTFs, of which N1 and N2 were the closest to the estimates. Middlebrooks pointed out that the differences in anthropometric dimensions between individuals would cause the overall translation of HRTFs along with the frequency axis, and thus proposed a frequency-scaling method for customizing individualized HRTFs [16,17]. In the method, the optimal frequency scale factor inferred from the anthropometric parameters was applied to the known HRTFs to obtain the approximation of the individualized HRTFs. Generally, the current mainstream method is to use correlation analysis and multiple linear regression analysis to find anthropometric parameter sets that are closely related to HRTFs [18][19][20][21].
Data analysis methods have also been employed in the individualization of HRTFs. He et al. synthesized HRTFs using a sparse representation with different pre-processes and post-processes trained from anthropometric parameters [22]. Furthermore, the radial basis neural network has been utilized in HRTF estimation based on anthropometric parameters and achieved promising performance [23]. Qi et al. proposed a sparsity-constrained weight mapping to individualize HRTFs, which obtained optimal weights to combine HRTFs based on the relationships of anthropometric parameters between the trained subjects and the target subject [24]. Schönstein et al. used a reversive feature elimination support vector machine (RFE-SVM) to select anthropometric parameters [25].
Although the abovementioned studies have conducted beneficial research on HRTF customization, more sufficient methods are needed to customize HRTFs according to the essence of HRTF characteristics. In this work, a hybrid algorithm based on anthropometric parameters is proposed. In the method, decomposition of HRTFs is first performed using independent component analysis (ICA) to explore the independent characteristics of HRTFs. Then, the selection of key anthropometric parameters is implemented using a factor analysis to detect the inner structure among the anthropometric parameters. Finally, the establishment of a prediction formula is accomplished using multiple linear regression. In this study, median-plane HRTF prediction in the CIPIC database (Center for Image Processing and Integrated Computing, University of California) is used as an illustrative example. The rest of the paper is organized as follows. Section 2 discusses in detail the principle and effect of the proposed hybrid algorithm based on anthropometric parameters; Section 3 validates the effectiveness of the proposed algorithm using objective spectral distortion criteria and a subjective listening experiment. Section 4 summarizes the whole work.

Methodology
HRTFs, or their time-domain counterparts, head-related impulse responses (HRIRs), describe the transmission process of sound waves scattering and reflecting through the head, torso, and pinnae in the free field [1]. Usually, an HRTF is defined as here P L and P R are the complex sound pressure produced by the sound source in the left and right ears of the listener, respectively; P 0 is the complex sound pressure at the center of the head when the human head is absent; f is the sound frequency; and a represents HRTF-related anthropometric parameters. An interaural polar coordinate system is adopted in this paper. Sound localization is determined by azimuth θ (−90 • ≤ θ ≤ 90 • ), elevation ϕ (−90 • < ϕ ≤ 270 • ), and distance r.
Equation (1) shows that an HRTF is a multivariate function that includes spatial sound localization, frequency, etc. Therefore, it is necessary to employ an appropriate decomposition method to reduce the dimensions and extract characteristics. ICA is an effective method for decomposing large amounts of data, and has been widely used in brain/speech signal processing, wireless communication, feature extraction, data mining, fault diagnosis, and other fields [26]. ICA is based on a higher-order statistical analysis (such as kurtosis), and thus yields independent base-vectors. In the structural model of HRTFs, the effects of anatomical structures were classified into three independent sub-models: effects of the head model, effects of the torso model, and pinna models [27]. Moreover, Chen performed a division analysis on HRIRs, revealing head and pinna effects, torso reflection, and knee reflection [28]. The theory of the structure models and anatomical filters/effects provides the theoretical foundations for utilizing ICA for HRTF decomposition.
As shown in Figure 1, a variety of statistical methods are combined to establish a hybrid algorithm for the customization of individualized HRTFs based on anthropometric parameters. On the one hand, ICA decomposition is applied to the magnitude of HRTFs to obtain the independent components and individual-dependent weight coefficients. The individual-dependent weight coefficient is used as the dependent variable to establish the subsequent prediction formula. On the other hand, factor analysis is applied to all anthropometric parameters to remove redundancy between these parameters. As a result, key anthropometric parameters are selected as independent variables in the subsequent prediction formula. Finally, the prediction formula relating weight coefficients to key anthropometric parameters is established by a multiple linear regression analysis. In application scenarios, the individualized coefficient weights for a specific listener are determined by substituting the measured listener's individualized anthropometric parameters into the proposed prediction formula. Then, the listener's customized HRTF can be obtained by substituting the predicted weight coefficients into an ICA-based HRTF formula. This is the basic idea of the proposed hybrid algorithm for customization of individualized HRTFs from anthropometric measurements.

Independent Component Analysis of HRTFs
In the first step of the hybrid algorithm, ICA is applied to magnitudes of HRTFs to obtain independent components and weight coefficients. There are several algorithms to implement ICA. In order to achieve fast convergence, a fast fixed-point algorithm based on negative entropy maximization is adopted here (FastICA) [29]. The procedure is as follows: Step 1: HRTF data of n subjects are denoted as H n,m , where m represents the sum of the frequency and directions of HRTF. It should be pointed out that, in order to fully extract subject-relevant individualized characteristics, all subjects constitute a vector direction of the matrix, that is: (2) Step 2: Centre (i.e., de-average) on H n,m to obtain a new matrix H n,m Step 3: Whiten (i.e., sphere) on H n,m ; here, we use principal component analysis (PCA) to obtain the direct result as where Λ and U represent the eigenvector matrix and the eigenvalue matrix of the covariance matrix C = E HH T , respectively. That is, a whitened linear transformation V is computed to obtain the whitened matrix Z = VH.
Step 4: Select the number d of independent components needed to be estimated.
Step 5: Initialize all random vectors W i , i = 1, 2, . . . , d. Each of W i has a unit norm and is chosen randomly. Then, perform orthogonal standardization to Step 6: For each i = 1, 2, . . . , d, update W i using an approximate Newton iteration algorithm, where g = tanh W T i Z , g is the first derivative of the original g.
Step 7: Perform orthogonal normalization processing on the updated matrix W according to the formula in Step 5.
Step 8: If it has not converged, return to Step 6 to repeat the process until the final convergence, and obtain the weight coefficient matrix A = W −1 and independent component matrix IC = WH n,m .
Step 9: Next, the FastICA algorithm is performed 10 times, and the run with the independent components that have the highest kurtosis (non-gaussianity) is selected.
Step 10: Finally, reconstruct HRTFs from the selected independent component matrix and the weight coefficient matrix If d = n, Equation (8) means complete reconstruction, that is, the reconstructed H is as same as the original H; if d < n, H is an approximation of the original H.
This work used the CIPIC HRTF database [7]. The database was established in 2001 by the UC Davis CIPIC Interface Laboratory. It contains HRTF data (44.1 kHz sampling frequency, 200 points sampling length) for 45 subjects, including the Knowles Electronics Manikin for Acoustic Research (KEMAR) with large and small ears, with 1250 directions per person (25 azimuths × 50 elevations). Meanwhile, it also includes 27 anthropometric parameters of 35 subjects (17 head and torso anthropometric parameters and 10 pinna anthropometric parameters).
As an illustrative example of the proposed algorithm, we selected data from 35 subjects with 27 complete anthropometric parameters, and divided them into a validation group (containing five subjects, numbers 003, 010, 018, 020, and 027) and the computational group (the remaining 30 subjects). The role of the computational group (also known as the baseline) is to implement the flow algorithm in Figure 1 to obtain an HRTF ICA decomposition and multiple linear regression formula between weight coefficients and anthropometric parameters. On the other hand, the role of the validation group is to input the anthropometric parameters of the subjects into the constructed HRTF ICA decomposition and multiple linear regression formula of weights and anthropometric parameters to predict their individualized HRTFs. Finally, we compare it with the measured individualized HRTFs to show the performance of the proposed algorithm.
In the process of HRTF decomposition using ICA, we calculated HRTF data below 20 kHz (232 frequency points) from 30 subjects at 25 elevation directions (an interval of 5.625 • ) in the front of the median plane. The directions and the frequency points (25 directions × 232 frequency points, m = 5800) were combined, and the HRTF data from the left and right ears (30 subjects × 2, n = 60) were combined, so the final variable is a two-dimensional matrix H 60,5800 . Our previous study has demonstrated the effectiveness of ICA decomposition and reconstruction of the vertical KEMAR HRTF [30]. The results indicated that the combination of the preceding six independent components can effectively characterize the main spectral characteristics of the median plane without introducing perceptible auditory defects. Considering that more individual-dependent variations are contained in this work, the preceding 10 independent components d = 10 were selected in the HRTF ICA decomposition for 30 subjects. Thus, the ICA decomposition of the median-plane HRTFs is obtained aŝ H 60,5800 = A 60,10 IC 10,5800 .
In Equation (9), A 60,10 is the individual-dependent weight coefficients, which are used in the subsequent linear regression analysis; IC 10,5800 is the direction and frequency-dependent base vector, which is used for the subsequent reconstruction of the individualized HRTF. During implementation, PCA is implemented before applying FastICA to reduce the HRTFs to 10 dimensions. The decorrelation approach is a symmetric method that estimates all the independent components in parallel. The nonlinearity used in the fixed-point algorithm is g(u) = tanh(u).
We plot one example of the input HRTF and the output independent components in Figure 2. It can be observed that the original data and the independent components share the same spectral characteristics, such as pinna-related spectral notches and torso-related spectral cues. Spectral distortion (SD) is used to evaluate the deviation between the original and reconstructed HRTFs. For a specific subject, SD is defined as [31]: where N = 232 represents the number of calculated frequency points. For different spatial directions, the average SD is defined as: where M = 25 is the number of involved spatial directions. The closer the values of SD and the SD value are to zero, the lower the deviation between the original and reconstructed HRTFs. Figure 3 shows the SD value of the original and reconstructed HRTFs from 30 subjects in the baseline. It indicates that the SD value is substantially around 4 dB in the 25 directions, implying that the HRTF decomposition using the preceding 10 independent components is stable and effective for reconstruction at all directions in the median plane.

Factor Analysis of Anthropometric Parameters
In order to obtain a concise formula between weight coefficients and anthropometric parameters, we used a factor analysis to select the key anthropometric parameters related to HRTF before formula derivation. There are 27 anthropometric parameters of the subjects in the CIPIC HRTF database (see Table 1), where x 1~x8 are the anthropometric parameters of the head and neck, x 9~x17 are the anthropometric parameters of the torso, and d 1~d8 , θ 1 , and θ 2 are the anthropometric parameters of the pinna. Factor analysis can be written in a matrix form [32]: where . . d 7 θ 1 θ 2 ] are the 27 anthropometric parameters of 30 subjects, µ is the mean value of X across subjects, α is the factor loading matrix, F is the common factor vector, and ε is the special factor vector. The steps in factor analysis are as follows: Step 1: Perform normalization on X i,j (i = 1, 2, . . . , 30; j = 1, 2, . . . , 27), eliminating differences in dimensions and the order of magnitude.
where x j is the mean value of X i,j , and S j is the standard deviation of X i,j .
Step 2: Estimate the factor load matrix α and special variance matrix D using maximum likelihood estimation [33].
Step 3: Calculate the variance contribution rate and the cumulative variance contribution rate C. According to the cumulative variance contribution rate, we have selected 16 common factors with a cumulative variance contribution rate of about 90%.
Step 4: The maximum variance orthogonal rotation method is used to rotate the factor load matrix α to have greater practical meaning [33]. The purpose of the maximum variance rotation method is to differentiate the absolute values of the elements on each column of the factor load matrix α, that is, the absolute values of a few elements are as large as possible, whereas the other elements as small as possible to zero.
Step 5: Finally, the Barlett factor score is estimated by the weighted least squares method [33]. The above factor analysis was conducted through the function Factoran in the MATLAB software. By inputting anthropometric parameters and the desired number of factors, the Factoran function in MATLAB will output the maximum likelihood estimates of the factor loadings α, the maximum likelihood estimates of the specific variances D, the rotation matrix T, and predictions of the common factors, also known as factor scores F. According to Equation (14), the variance contribution rate was calculated and the above 16 factors were selected with a rate of about 90%. The variance contribution rate of the 16 factors from the left and right ears are shown in Figure 4, and the cumulative variance contribution rate is about 90%. Therefore, we selected these 16 common factors to represent the 27 anthropometric parameters of the subjects. According to the factor load matrix α, anthropometric parameters with a larger load were chosen to be a specific factor. For example, Table 2 shows the selection of anthropometric parameters for Factor 1. x 1 , x 3 , x 6 , x 8 , x 9 , x 12 , x 16 , and x 17 have relatively large loads on the common Factor 1 (the bolded numbers in the table), so it is classified as this factor.  Finally, we conducted a comprehensive analysis that combined the physical meaning of the anthropometric parameters and the results of the factor analysis. For the left ear, Factor No. 1 contained eight anthropometric parameters and, finally, x 1 , x 16 were selected because their factor load values are largely a better representation of the head structure. For Factor No. 2, x 2 and x 10 were reserved for the same reason. For the right ear, x 1 and x 16 remained in Factor No. 1, while x 2 was reserved in Factor 2. In summary, the anthropometric parameters that remained and were introduced into the subsequent linear regression analysis were x 1 , x 2 , x 4 , x 5 , x 7 , x 10 , x 11 , x 14 , x 15 , x 16 , d 1 , d 2 , d 3 , d 4 , d 5 , d 6 , d 7 , d 8 , θ 1 , and θ 2 (shown in bold in Table 3). Table 3. Anthropometric parameters in the 16 common factors.

Factors Anthropometric Parameters (Left Ear) Anthropometric Parameters (Right Ear)
x 11 As seen in Figure 4, the variance contribution rate of Factor No. 1 is significantly larger than that of the other 15 factors. As shown in Table 3, the anthropometric parameters contained in Factor No. 1 and No. 2 can be used to characterize the outline of the head and neck, and the anthropometric parameters in Factor No. 3 and No. 4 can roughly describe the morphological features of pinna and concha. This demonstrates that the factor load matrix after factor analysis is easy to interpret. The corresponding special variance value is also very small, indicating that the factor-loading matrix obtained here is a good fit.

Multiple Linear Regression
The multiple linear regression formula represents a linear quantitative relationship between a dependent variable and multiple independent variables. It not only reveals the influence of the independent variables on the dependent variable, but also predicts the dependent variable via the regression formula. Its matrix form is [34]: where A represents the individual-dependent weight coefficient from ICA decomposition of HRTFs; Q represents the anthropometric parameters; β represents the regression coefficient; and σ is the constant term. The multivariate regression parameter β is estimated by the least squares method. If the estimatedβ of the regression coefficient vector has been obtained, thenÂ = Qβ. The residual ε i = A i −Â i ( i = 1, 2, . . . 10), representing the preceding 10 independent components of ICA decomposition, can be calculated. In this work, the stepwise selection method was used to select the anthropometric parameters involved in the multiple regression formula [34]. Figure 5 shows part of the graphical user interface (GUI) of the stepwise function in MATLAB. The individual-dependent weight coefficient A 60,10 from ICA decomposition of HRTFs is analyzed by the regression analysis with the anthropometric parameter Q 60,20 from the factor analysis. The left x 1~x20 in Figure 5 represent the 20 anthropometric parameters, while the right side in the figure uses the t-test to eliminate parameters that have no significant effect on the weight coefficients, and the F-test for linear regression yields the corresponding F and p values. According to the flow of the proposed HRTF customization through anthropometric parameters in Figure 1, the regression formulas obtained are shown in Table 4, and the results of the corresponding significance analysis are shown in Table 5. It should be noted that there are no relationships between x 5 , x 16 , and A 10,1 , so 18 anthropometric parameters are incorporated into the formula. So far, the entire procedure of implementing a hybrid algorithm based on anthropometric parameters for predicting individualized median-plane HRTFs is as follows: (1) Measure the dimensions of 18 anthropometric parameters of the specific subject, namely x 1 , x 2 , (2) Substitute the measured anthropometric parameters into each of the regression formulas in Table 4, predicting the subject-specific weight coefficient A; (3) Substitute the weight coefficient A to the ICA decomposition formula of HRTFs (i.e., Equation (9), reconstructing the individualized HRTFs of the specific subject. (4) Perform the above steps for the left and right ears, respectively. Table 4. Regression formula of our HRTF customization approach using anthropometric parameters.

Evaluation
According to the above steps for implementing the proposed hybrid prediction algorithm, the anthropometric parameters of the five subjects in the validation group were input into the regression formulas in Table 4. Then, the reconstructed HRTFs of the five subjects were obtained. In this section, the combination of the objective standard deviation and the subjective localization experiment are used to verify the effectiveness of the hybrid algorithm based on anthropometric parameters for predicting individualized median-plane HRTFs. Figure 6 shows the HRTF magnitude spectrum of a typical subject (No. 1) at the 0 • elevation in the median plane, in which the low frequencies (below 400 Hz) are eliminated due to measurement error. It can be seen that the predicted HRTFs and the original measured HRTFs are similar with a good consistency at the pinna-related spectral notches. Considering that the localization cues from pinnae are mainly concentrated in the frequency band from 5 kHz to 12 kHz, the predicted HRTFs using the proposed algorithm may not suffer from the deterioration of localization accuracy. Substituting the predicted and corresponding measured HRTFs into Equations (10) and (11), the SD as well as SD of each subject in the validation group can be obtained. Figure 7 shows the variation of SD with elevations for each subject in the validation group. As shown, the predicted HRTFs of Subject No. 1, No. 2, and No. 5 have a smaller SD value than that of the KEMAR HRTF at most elevation directions, suggesting that the proposed algorithm is advantageous in the median plane. The predicted HRTFs of Subject No. 3 and Subject No. 4 have a larger SD value than that of the KEMAR HRTF at some elevation directions. This may be caused by the fact that Figure 7 only demonstrates the prediction performance of an illustrative example of the proposed algorithm (see Section 2.1); the limited number of subjects in the illustrative example (i.e., 30 subjects in the calculation group) may not be sufficient to achieve the best reconstruction performance. Moreover, the anthropometric parameters of the CIPIC HRTF database used in the illustrative example are mainly measured via a photographic method; therefore, they are likely to contain measurement errors to certain extent. Furthermore, the SD that is averaged across 25 elevations is calculated according to Equation (11). Similarly, the SD values between KEMAR and the predicted HRTFs for the five subjects in the validation group are compared as shown in Figure 8. For most subjects, SD of the predicted (customized) HRTFs is smaller than that of the KEMAR HRTF. In particular, the SD values of Subject No. 1, No. 2, and No. 5 decrease by 1.2 dB, 1.1 dB, and 1.3 dB, respectively. In summary, for most subjects, the spectral characteristics of the predicted HRTFs are consistent with the original (i.e., measured) HRTFs; the SD value of the predicted HRTFs is reduced as compared to the KEMAR HRTFs. Therefore, the proposed HRTF customization hybrid algorithm based on ICA decomposition and anthropometric parameter measurement is effective with the customized HRTFs prior to KEMAR HRTFs with respect to spectral characteristics.

Subjective Localization Experiments
A localization experiment was further undertaken to verify the effectiveness of the proposed hybrid algorithm subjectively. Six graduate students (from 22 to 27 years old) participated in the experiment. They have experience in sound localization experiments and been screened by the Orbiter medical audiometer with a hearing loss of less than 10 dBHL below 8 kHz and less than 15 dBHL from 8 kHz to 12.5 kHz. The anthropometric parameters of the six subjects were extracted using the anthropometric parameter extraction method based on the three-dimensional scanned head model [35]. Dimensions of the measured anthropometric parameters of these six subjects are shown in Table 6. The predicted HRTFs were reconstructed by substituting the extracted anthropometric parameters into the regression formulas in Table 4.
In terms of experimental signals, the predicted HRIR and KEMAR HRIR were separately convoluted with a white noise to obtain the binaural signals for rendering. Specifically, the white noise is 10 s with the sampling resolution of 44,100 Hz and the quantization resolution of 16 bits.
Note that the predicted HRIR and KEMAR HRIR signals used here are the minimum phase functions, which are determined by the magnitudes of the predicted and KEMAR HRTFs, respectively. The target sound direction is selected from the median plane of six elevations: −45 • , 0 • , 22.5 • , 45 • , 67.5 • , and 90 • . Each subject needed to make 72 judgments (six source directions × two HRTF types × six repetitions). For each experimental condition, there were 36 judgments (six subjects × six repetitions). All the signal segments were presented randomly to each subject. The experimental hardware platform was a Windows 7 platform-based computer, the sound card was ESI ® maya22usb, and the headset was a pair of Sennheiser IE 80.
As our work focuses on the vertical localization in the median plane, the median-plane coordinates were arranged with elevation angles labelled at intervals of 5 • . Before the experiment, the subject's head was aligned to the center of the median-plane coordinates. In the experiment, the subject held a laser pen, and pointed to the perceived direction of sound source by straightening his/her arm. Then, the elevation angle of the perceived sound source was read and recorded by the experimenter. Playback was allowed until the subject determined the perceived directions and no feedback was provided to the subject.  Table 7 shows the experimental results of the six subjects. The absolute angle error is defined as the absolute value of the difference between the direction judged by the subject and the preset target direction. The table shows the average angle error from 36 judgments. Note that the angle errors were calculated after correcting for front-back and up-down confusions [2]. As to the absolute angle error, the HRTFs predicted by the proposed algorithm demonstrate a better localization effect than the KEMAR HRTFs in all of the six test directions. A further t-test shows that there is a significant difference in localization performance at elevations of 0 • and 22.5 • (p < 0.01) and elevations of 45 • , 67.5 • , and 90 • (p < 0.05), whereas no significant difference is observed at the elevation of −45 • (p > 0.05). For the front-back and up-down confusion rates, the predicted HRTF of the proposed algorithm is 5.5% and 3.3% lower than those of the KEMAR HRTFs, respectively.
In summary, the proposed HRTF customization hybrid algorithm based on ICA decomposition and anthropometric parameter measurement is effective compared with KEMAR HRTFs with respect to both objective and subjective evaluations.

Discussion
In the proposed hybrid algorithm, independent component analysis is used to decompose the magnitude of HRTFs, factor analysis is used to select significant anthropometric parameters, and finally multiple linear regression is used to establish the prediction formula. With the proposed hybrid algorithm, individual HRTFs can be customized by measuring the anthropometric parameters. Furthermore, the performance is evaluated by the objective SD value and a subjective localization experiment.
In order to further validate our proposed algorithm, we compare our method with other customization studies. Firstly, most studies extracted anthropometric parameters from photos, and some large-sized anthropometric parameters, such as head circumference and head height, were measured by a ruler. However, our work utilizes the anthropometric parameter extraction method based on a three-dimensional scanned head model, guaranteeing that the anthropometric parameter measurement had high accuracy. This is an important preliminary step for individual HRTF estimation based on anthropometric parameters. Secondly, our method outperforms the other customization studies regarding subjective localization performance. Our proposed method has the average angle error of 14.5 • across six elevations, while Iida's method achieved the average angle error of 20.7 • [15] and Hwang et al. achieved the average angle error of 24.5 • [36] for various elevations in the median plane. Therefore, our method improves the median-plane localization performance by around 6.2 • as compared to the existing method.

Conclusions
This paper proposes an individualized HRTF customization hybrid algorithm based on ICA decomposition of HRTFs and measurements of anthropometric parameters. The HRTF data of 30 subjects are first decomposed by ICA, and the spatial base vectors as well as weight coefficients are obtained. In parallel, a factor analysis is performed on 27 anthropometric parameters to eliminate redundancy. Then, the weight coefficients from ICA decomposition and anthropometric parameters selected by the factor analysis are fitted by a linear regression analysis to establish an individualized HRTF prediction formula containing anthropometric parameters. According to the objective evaluation of SD, the mean SD values of three of the five subjects in the validation group decrease by 1.2 dB, 1.1 dB, and 1.3 dB, respectively. The subjective localization experiment has also verified that the predicted HRTFs have a better localization effect than the generic KEMAR HRTFs in terms of absolute angle error and front-back and up-down confusion rates.
This work will facilitate customization of individualized HRTF data quickly and effectively in practical applications of virtual auditory display. It should be noted that, because the spectral characteristics of HRTFs are particularly essential in elevation localization, the proposed customization of individualized HRTF focuses on the elevation directions in the median plane without the presence of binaural localization cues. In principle, the proposed algorithm can be generalized to other spatial directions. Future works can continue to verify the applicability of this algorithm in three-dimensional space.
Author Contributions: X.Z. contributed to the research idea and framework of this study, and X.L. performed the experimental work and data analysis. X.L., H.S., and X.Z. discussed and wrote the contents of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.