Gaussian Process Regression for Single-Channel Sound Source Localization System Based on Homomorphic Deconvolution

To extract phase information from multiple receivers, a conventional sound source localization system involves substantial complexity in software and hardware. Along with the algorithm complexity, the dedicated communication channels and individual analog-to-digital conversions limit the system's scalability. A previous study proposed and verified a single-channel sound source localization system, which aggregates the receivers on a single analog network feeding a single digital converter. This paper proposes an improved algorithm for the single-channel sound source localization system based on Gaussian process regression with a novel feature extraction method. The proposed system consists of three computational stages in cascade: homomorphic deconvolution, feature extraction, and Gaussian process regression. The individual stages perform time delay extraction, data arrangement, and machine prediction, respectively. The optimal receiver configuration for the three-receiver structure is derived from a novel similarity matrix analysis based on time delay pattern diversity. The simulations and experiments present precise predictions with proper model order and ensemble average length. The nonparametric method, with the rational quadratic kernel, shows consistent performance on trained angles. The Steiglitz–McBride model with the exponential kernel delivers the best predictions for trained and untrained angles with low bias and low variance in statistics.


Introduction
The signal delivered over space contains spatial information due to the limited propagation speed. Walls and/or obstacles in the field may imprint multipath patterns on the received signal. In an isotropic and open environment, the multiple receivers observe numerous arrivals of the propagated signal. The propagated signal can be analyzed by a sound source localization (SSL) system to estimate the angle of arrival (AoA) of the signal source. The first engineering approaches to the SSL system utilized a unidirectional sound collection structure with steering for localizing. To avoid physical movement, extensive processing over multichannel signals is required to extract the spatial information. The mathematical modeling of the propagation and arrival of the sound is important for processing in terms of sound directions.
The conventional method for SSL is beamforming [1,2], which uses the phase information from multiple receivers for spatial filtering. The accuracy of beamforming localization is proportional to the number of receivers along with the processing power; therefore, significant computing power as well as data rate are involved in precise sound localization. Certain natural creatures, including humans, can accurately localize sound sources in three-dimensional (3D) space by using binaural correlation and the structure profile [3]. The reflections and diffractions on the receiver structure provide additional features to derive the AoA for the sound sources. Human hearing thus achieves localization with a low computation/communication burden. This paper continues the author's research activities on the SSL system over the past decades. Novel underwater SSL systems based on beamforming were presented for rapid and scalable systems [31][32][33][34][35][36]. A sparsely distributed wireless sensor network localized targets by capturing the acoustic energy from a moving vehicle [37]. Unique binaural [38] and monaural [39][40][41] SSL systems were proposed as feasible systems over airborne sound propagation.
Humans aggressively exploit the binaural and monaural methods for better localization by using the magnitude, phase, and frequency variations over the two ears as well as the pinna/head structure [3]. The binaural and monaural SSL systems are possible because of the extensive use of the structure to overcome the receiver number limitation. In general, conventional sensing modules are required to be installed on the target system without modifying the overall structure and configuration. Additionally, the computation and communication complexity should be maintained at a low level for system feasibility. The single-channel, multiple-receiver SSL was suggested in the previous papers [30,42] to challenge the binaural/monaural method without the external structure and to include the beamforming performance with a reduced computation/communication burden. That work established promising performance with a simple machine learning algorithm, linear regression. This paper extends the single-channel, multiple-receiver SSL method with an advanced machine learning algorithm called GPR and novel feature extraction. Additionally, the simulation and experiment are diversified by including a further hold-out dataset for extensive angles. Therefore, this paper investigates the prediction performance along with the complex inference of angles not shown in the learning process. Note that the identical anechoic chamber [43] has been operated for the acoustic experiments and evaluations to maintain consistency. This paper is organized as follows: Section 2 introduces HD and explains how the time delays are extracted and represented. Section 3 presents the feature extraction methods for nonparametric/parametric HDs and the GPR for machine learning. Section 4 presents the computer simulations for the optimal receiver position and prominent localization parameters. The similarity matrix is employed for evaluating the various receiver positions with reduced computation.
The simulation also demonstrates the performance of localization for the numerous parameters and models. The prediction performances on the training and hold-out datasets are illustrated and analyzed for extensive directions. Section 5 describes the performance of the actual dataset produced by the anechoic chamber experiments. The best parameters and model benefits are observed and arranged for the conclusion.

Nonparametric and Parametric Homomorphic Deconvolution
This section summarizes the previous papers [30,42] to give the reader access to the necessary background while maintaining concept integrity. For further information, readers are referred to those articles for detailed explanations of the nonparametric and parametric HD. Homomorphic filtering is a comprehensive method for signal processing, comprising a nonlinear transformation to a counterpart domain in which linear filters are applied, followed by an inverse transformation back to the original domain [44][45][46]. The HD is one application of homomorphic filtering that separates convolved signals by using the logarithm and exponential operations. The forward homomorphic system based on the logarithm converts multiplicative functions into an additive format. A simple window operation performs the separation based on a signal property. The backward homomorphic system with exponentials reverts to the original time domain. The HD uses two homomorphic systems in cascade to derive the propagation function, which indicates the ToFs between the receivers.
The nonparametric HD produces the conventional raw distribution of time delay information based on multiple discrete Fourier transforms (DFT) or fast Fourier transforms (FFT). The propagation function model realizes the parametric ToF estimation in the last HD stage by employing the Yule-Walker [47,48], Prony [48,49], and Steiglitz-McBride [48,50] methods. Note that the parametric methods compute model coefficients while the nonparametric technique delivers the numerical distribution. Figure 2 illustrates the complete computational procedure for the nonparametric and parametric HD algorithms. The ensemble average in stage 3 and the conjugate in stage 7 are inserted to conduct the signal-to-noise ratio (SNR) improvement and the parametric modeling, respectively. The choice of the last computation stage in Figure 2 is divided between the nonparametric and parametric HD. The Yule-Walker, Prony, and Steiglitz-McBride methods are devised to provide the parametric HD. The following equations show the conventional regressive signal models in the time and z domains.

y[n] = −a_1 y[n − 1] − ⋯ − a_N y[n − N] + b_0 x[n] + b_1 x[n − 1] + ⋯ + b_M x[n − M]    (1)

H(z) = (b_0 + b_1 z^(−1) + ⋯ + b_M z^(−M)) / (1 + a_1 z^(−1) + ⋯ + a_N z^(−N))    (2)
The model parameters below are estimated by the Yule-Walker algorithm to describe the propagation function h[n] of the given time signal. The Yule-Walker algorithm is classified as an autoregressive (AR) model; hence, the algorithm derives the coefficients of the denominator in Equation (2):

a_1, …, a_N ∈ ℂ and b_0 = 1, b_1 = b_2 = ⋯ = b_M = 0

The model coefficients below are computed by the Prony and Steiglitz-McBride algorithms to represent h[n]. The Prony and Steiglitz-McBride algorithms belong to the autoregressive moving average (ARMA) model; therefore, the algorithms are required to compute both the numerator and denominator coefficients in Equation (2):

a_1, …, a_N, b_0, …, b_M ∈ ℂ

Note that the complex-number input for the Yule-Walker, Prony, and Steiglitz-McBride algorithms provides complex-number coefficients, which disengages the pole/zero location constraints. Without the constraints, further estimation accuracy is obtained from the numerical expansion [30].
In Figure 2, the system receives the single-channel signal, which contains the impulse response (or propagation function) of the receiver configuration through the convolution sum. Note that the star operation in x[n] * h[n] in Figure 2 indicates the convolution sum. The FFT (stage 1) and inverse FFT (stage 4) pair with the absolute logarithm (stage 2), which corresponds to the real cepstrum of the forward homomorphic system. The previous paper [42] showed that the ensemble average (stage 3) after the logarithm (stage 2) reduces the noise variance in inverse proportion to the average length. Therefore, a longer ensemble average length yields a higher SNR due to the lower noise power. The window function w[n] in the frequency-invariant filtering (stage 5) executes the separation of the impulse response h[n]; w[n] passes samples beyond the specified time sample, which controls the minimum time delay in h[n]. The other FFT pair (stages 6 and 8) with the exponential function (stage 7) reconstitutes and estimates the nonparametric propagation impulse response h[n]. The FFT in the last stage essentially performs an inverse FFT because the conjugate input to the FFT provides the conjugated inverse-FFT outcome according to the DFT/FFT property.
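As a concrete illustration, the stages above can be sketched in a few lines of NumPy. This is a minimal sketch, not the paper's implementation; the function name, the frame handling, and the small guard constant inside the logarithm are assumptions for illustration.

```python
import numpy as np

def nonparametric_hd(x, L=1024, W=25, n_frames=200):
    """Sketch of the nonparametric HD stages of Figure 2 (names are
    illustrative).  x is the received single-channel signal x[n] * h[n]."""
    # Stages 1-2: FFT of each frame, then the absolute logarithm.
    frames = x[: n_frames * L].reshape(n_frames, L)
    log_mag = np.log(np.abs(np.fft.fft(frames, axis=1)) + 1e-12)
    # Stage 3: ensemble average over the frames to raise the SNR.
    avg = log_mag.mean(axis=0)
    # Stage 4: inverse FFT gives the (averaged) real cepstrum.
    cepstrum = np.fft.ifft(avg).real
    # Stage 5: frequency-invariant filtering -- w[n] keeps only the
    # quefrencies beyond the cutoff W to isolate h[n].
    w = np.zeros(L)
    w[W:L - W] = 1.0
    lifted = cepstrum * w
    # Stages 6-8: FFT, exponential with conjugate, and a final FFT that
    # acts as the inverse FFT, reverting to the time-domain h[n].
    h = np.fft.fft(np.conj(np.exp(np.fft.fft(lifted)))).real / L
    return h / np.max(np.abs(h))
```

For a noise excitation convolved with a two-path response, the largest sample of the returned h[n] beyond the cutoff W indicates the inter-receiver delay.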
h[n] = (1/L) Σ_{k=0}^{L−1} H(e^{j2πk/L}) e^{j2πkn/L},  n = 0, 1, …, L − 1    (3)

In Equation (3), the estimated time delays are properly indicated by the derived coefficients from the parametric methods. The H(z) is the rational function shown in Equation (2), and L is the consistent FFT length in the HD algorithm. Observe that the samples n corresponding to the maxima of the propagation function h[n] specify the time delays. Figure 3 illustrates the normalized h[n] for 50, 100, and 150 samples in time delay for the nonparametric and parametric HD algorithms. The window length for all HDs is 25 samples, with a 200 ensemble average length. The order for the parametric HDs is held constant at 9. The data are generated from the audio signal recorded in the anechoic chamber [43] with postprocessing (convolution sum) for dedicated time delays. The HDs demonstrate the statistical performance based on the SNR, ensemble average length, and order (only for parametric HDs). The nonparametric HD and Steiglitz-McBride methods provide consistent estimation with low variance and bias in general. The Yule-Walker and Prony methods are prone to performance degradation in low SNR and short ensemble average lengths overall. Further statistical performances were analyzed for time delay estimation in the previous paper [30] based on the various conditions.
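The parametric route of Equation (3), which evaluates the estimated rational model H(z) on the unit circle with an L-point FFT and locates the maxima of h[n], can be sketched as follows. The helper name and the peak-picking convention are illustrative assumptions.

```python
import numpy as np

def delays_from_model(b, a, L=1024, n_peaks=2):
    """Evaluate the rational model of Equation (2) on the unit circle and
    recover h[n] as in Equation (3); the samples n with the largest |h[n]|
    indicate the estimated time delays (sketch, illustrative names)."""
    # H(e^{j2*pi*k/L}) = B/A via L-point FFTs of the coefficient vectors.
    H = np.fft.fft(b, L) / np.fft.fft(a, L)
    h = np.abs(np.fft.ifft(H))
    # The n_peaks largest samples give the delay estimates.
    return np.sort(np.argsort(h)[-n_peaks:])
```

For example, a pure moving-average model with taps at samples 0 and 100 yields exactly those two delay estimates.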

Methodology
The previous section introduced the methods to derive the time delay information, which is induced by the receiver structure. The process is known as deconvolution for reversing the convolution between the original sound and the produced delays. Based on the delay distribution, the AoA is estimated by the machine learning algorithm, the GPR, with a proper information format. The GPR predicts the AoA based on the supervised learning process, which computes the optimal function from a statistical procedure. The nonparametric and parametric HD algorithms produce distinctive numerical distributions for the time delay; therefore, appropriate feature extraction is necessary for the GPR. The following subsections demonstrate the feature extraction and GPR methods for accurate predictions.

Feature Extraction
Machine learning algorithms classified as supervised require labeled examples with feature input. The regression problem predicts a real-valued label from the feature learning process for a given unlabeled feature input. The GPR is a supervised regression algorithm; therefore, a representative feature dataset as well as accurate labels are necessary for the learning process. The features and labels are arranged as follows:

{(x_i, y_i)},  i = 1, 2, …, L    (4)

The lower case presents a scalar and the bold lower case denotes a vector for the dataset of length L in the above equation. The subscript indicates the i-th dataset. The label y_i is produced by the feature vector x_i. The individual vector x_i is organized as follows:

x_i = [x_i^(1), x_i^(2), …, x_i^(M)]    (5)

The n-th feature in the vector is described by the superscript number n in parentheses in the above equation. Note that there are M features in the vector. The choice of x_i is distinctive for the nonparametric and parametric HD algorithms in terms of domain as well as length.
The nonparametric HD utilizes the direct method for feature extraction. The raw distribution of nonparametric HD output immediately illustrates the time delay information with likelihood. The higher value in the sample location shows the stronger possibility of time delay in the sample location. The range of the distribution follows the time delay information from the given receiver configuration. The lower bound W is specified by the window function w[n] in Figure 2 stage 5 since the window function prunes the time delay below the cutoff W to separate the ToF from the received signal.
x_i^(l) = h[W + l],  l = 1, 2, …, n_max − W    (6)

Equation (6) shows the feature extraction from the nonparametric HD output h[n]. The n_max is the maximum time delay possibly derived from the receiver configuration. The time locations from W + 1 to n_max of the nonparametric HD output are organized sequentially into the features x_i^(l). Note that the nonparametric HD output h[n] is normalized for consistency.
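The direct method of Equation (6) amounts to normalizing h[n] and slicing the samples between the window cutoff and the maximum feasible delay. A minimal sketch (the function name is illustrative):

```python
import numpy as np

def direct_feature(h, W=25, n_max=150):
    """Equation (6)-style direct feature extraction: take the normalized
    nonparametric HD output h[n] from sample W + 1 up to n_max."""
    h = np.asarray(h, dtype=float)
    h = h / np.max(np.abs(h))          # normalize for consistency
    return h[W + 1 : n_max + 1]        # features x^(1) ... x^(n_max - W)
```

The resulting feature vector has length n_max − W, so a wide receiver aperture directly enlarges the GPR input dimension.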
The parametric HDs employ the parameter-based extraction method known as the pole angle method. The parametric HDs can produce the same raw distribution as the nonparametric HD output by using Equation (3); however, the benefits of the parametric HDs are lost with the extra computation. All parametric HDs establish the rational function shown in Equation (2), and the poles of the function are the dominant components that create the peaky response in the time delay distribution. In particular, the phases of the poles correspond to the time delays in the HD distribution [30]. The denominator of the rational function in Equation (2) is shown below, and the solutions of the polynomial provide the poles of the given regressive model.

1 + a_1 z^(−1) + a_2 z^(−2) + ⋯ + a_N z^(−N) = 0    (7)
The polynomial coefficients a_i in the above equation are the parameters estimated by the Yule-Walker, Prony, and Steiglitz-McBride methods with the given order N. The a_i are complex numbers, and the solutions µ_i of the polynomial are also complex numbers. The phase of each individual pole µ_i is assigned to the feature vector, as shown below.
x_i^(l) = ∠µ_l,  l = 1, 2, …, N    (8)

The ∠ operator computes the angle of the complex number µ_l by the simple trigonometric calculation in radians. The direct method is the simplest method to extract the feature from the HD output: it simply extracts a portion of the output and casts it onto the feature vector. The length of the feature vector from the direct method is likely to be extensive due to the maximum time delay collected by the receiver configuration. The pole angle method requires further computation to calculate the poles and phases. However, the length of the feature vector from the pole angle method is expected to be concise because of the parametric model order. Observe that the parameter order is considerably smaller than the HD length, and the model parameters represent the time delay based on the dedicated computation introduced in Section 2.
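The pole angle method reduces to rooting the denominator polynomial of Equation (7) and taking the phases. A minimal sketch (the function name and the sorting of the phases are illustrative choices):

```python
import numpy as np

def pole_angle_feature(a):
    """Pole-angle feature extraction (sketch).  a = [1, a_1, ..., a_N] are
    the (possibly complex) denominator coefficients estimated by the
    Yule-Walker, Prony, or Steiglitz-McBride method; the phases of the
    poles encode the time delays."""
    poles = np.roots(a)               # solutions of the denominator polynomial
    return np.sort(np.angle(poles))   # N pole phases in radians
```

The feature length equals the model order N, which is far shorter than the direct-method vector.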

Gaussian Process Regression
The Gaussian process regression combines two concepts: the Gaussian process and regression. By definition, a Gaussian process is a collection of random variables, any finite number of which have a joint Gaussian distribution. Therefore, the Gaussian process generalizes the Gaussian distribution to an uncountably infinite set of possible dimensions. Regression is the problem of predicting a real-valued output given a feature vector by establishing a model from a learning algorithm. The GPR assumes that the regression outputs are latent function variables f(x_i) and that any finite collection of the function variables obeys a multivariate Gaussian distribution. The Bayesian modeling of functions allows us to train the model and predict the target, especially in applications where data is limited.
The GPR is an extensive research area for the machine learning algorithm utilized for various purposes [51][52][53]. Diverse approaches have explained the GPR in numerous papers [54,55] and web documentation [56,57] in mathematical and technical terms, respectively. Note that the following equations and descriptions present the GPR for the given application/structure in a concise and deliverable style. The function f(x) below is specified by the Gaussian process with zero mean and input variance. Observe that the function variance is computed from the input variance [58,59].

f(x) ∼ GP(0, k(x, x′))    (9)
The explicit basis function is a method to denote a nonzero mean over the function. The equation below represents the zero-order mean basis function 1 and coefficient β with the given mean and variance for the coefficient. A higher order mean function can be realized by the mean function matrix with the vector β.

g(x) = f(x) + 1 · β,  β ∼ N(µ_β, σ_β²)    (10)
The GPR output g(x) in the equation below shows the Gaussian process with the new mean and variance from the linear combination in Equation (10). The scalar output of Equation (11) is extended to the vector output in the equation below; the bold font in the function name presents the vector output. The h corresponds to the basis function, the all-ones vector for the zero order.
The matrix/vector formats and dimensions are demonstrated in the equation below. Note that the x_i in the equation below are the feature vectors specified in Equation (5).
The joint distribution between the observed y and predicted y_* function values derives the conditional distribution with mean g(X_*) and covariance cov(g(X_*)), as shown below. I_L is the L × L identity matrix.

g(X_*) = H_* β + K(X_*, X)[K(X, X) + σ_n² I_L]^(−1) (y − Hβ)    (14)

cov(g(X_*)) = K(X_*, X_*) − K(X_*, X)[K(X, X) + σ_n² I_L]^(−1) K(X, X_*)    (15)

The dimensions of the predicted output and basis function are shown in the equation below. The actual prediction of the GPR is the mean of the conditional distribution, y_*, equivalently g(X_*). L_* is the length of the test dataset.
The computation of the prediction y_* requires the hyperparameters: the covariance matrix, β, and σ_n². Note that σ_n² is the noise variance from the data measurement. The covariance matrix consists of covariance values k(x_i, x_j | θ) defined by the covariance function, or kernel. The covariance matrix K(X, X | θ) for the training set is presented in the equation below with the parameter vector θ. The covariance matrix K(X, X_* | θ) between the training and test sets is illustrated in the equation below as well.
The kernel (or covariance function) denotes the similarity between two latent function variables g(x_i) and g(x_j), equivalently between data points x_i and x_j, by a kernel trick [59]. The kernel develops the similarity model with parameters for the signal standard deviation σ_f and the characteristic length scale l. The two parameters control the vertical and horizontal relationships between the points, respectively. Table 1 provides the conventional kernel functions with corresponding parameters.

Table 1. Conventional kernel functions k(x_i, x_j | θ) and corresponding parameters [53], with r = ‖x_i − x_j‖:
  Squared exponential:  σ_f² exp(−r² / (2l²))
  Exponential:          σ_f² exp(−r / l)
  Matern 3/2:           σ_f² (1 + √3 r/l) exp(−√3 r/l)
  Matern 5/2:           σ_f² (1 + √5 r/l + 5r²/(3l²)) exp(−√5 r/l)
  Rational quadratic:   σ_f² (1 + r²/(2αl²))^(−α), with the extra shape parameter α

After establishing the covariance matrix with the kernel function, the parameters β, θ, and σ_n² must be estimated from the given training dataset to complete Equation (14). The maximum a posteriori estimate of the parameters occurs when the conditional probability distribution of the parameters is at its highest. With insufficient prior knowledge about the parameters, the highest point corresponds to the maximizing point of the log marginal likelihood shown in the equation below.

log p(y | X, β, θ, σ_n²) = −(1/2)(y − Hβ)ᵀ[K(X, X | θ) + σ_n² I_L]^(−1)(y − Hβ) − (1/2) log|K(X, X | θ) + σ_n² I_L| − (L/2) log(2π)    (19)
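The conventional kernels above translate directly into code. The sketch below implements three of them and assembles the covariance matrices of Equations (17) and (18); the function names are illustrative, and the rational quadratic carries the extra shape parameter α as an assumption of the standard form.

```python
import numpy as np

def sq_exponential(xi, xj, sigma_f, ell):
    """Squared exponential kernel k(x_i, x_j | theta)."""
    r = np.linalg.norm(np.asarray(xi) - np.asarray(xj))
    return sigma_f**2 * np.exp(-r**2 / (2 * ell**2))

def exponential(xi, xj, sigma_f, ell):
    """Exponential kernel: non-smooth, heavier-tailed similarity."""
    r = np.linalg.norm(np.asarray(xi) - np.asarray(xj))
    return sigma_f**2 * np.exp(-r / ell)

def rational_quadratic(xi, xj, sigma_f, ell, alpha=1.0):
    """Rational quadratic kernel: a scale mixture of squared exponentials."""
    r = np.linalg.norm(np.asarray(xi) - np.asarray(xj))
    return sigma_f**2 * (1 + r**2 / (2 * alpha * ell**2)) ** (-alpha)

def cov_matrix(X1, X2, kernel, *theta):
    """K(X1, X2 | theta): pairwise kernel evaluations between two sets."""
    return np.array([[kernel(x1, x2, *theta) for x2 in X2] for x1 in X1])
```

Note that k(x, x | θ) = σ_f² for every kernel, which fixes the diagonal of K(X, X | θ).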
The multivariate optimization algorithm [60][61][62] over Equation (19) executes the following maximization to find the respective parameters numerically.

(β̂, θ̂, σ̂_n²) = argmax over (β, θ, σ_n²) of log p(y | X, β, θ, σ_n²)    (20)
Once the kernel is selected, the parameters are estimated by the training process, and the predictions are performed by Equation (14) with those parameters. The prediction for a single point can be realized with the equation below.

y_* = h(x_*) β + K(x_*, X)[K(X, X) + σ_n² I_L]^(−1) (y − Hβ)    (21)
The GPR training and prediction require intensive computation, O(N³), due to the inversion of the covariance matrix. To relieve the complexity, numerous methods are possible, such as subset-of-data approximation [53,63], subset-of-regressors approximation [53,64], etc. This paper applied the GPR without any approximation on training and prediction to remove any uncertainties caused by the information reduction. However, optimized matrix inversion algorithms are used, such as LU decomposition [65] and Cholesky decomposition [66].
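The prediction of Equations (14) and (15) with a Cholesky-based solve can be sketched as below. This is a minimal illustration with a zero-order basis (H = 1, scalar β) and illustrative names, not the paper's implementation.

```python
import numpy as np

def gpr_predict(X, y, Xs, kernel, theta, sigma_n, beta=0.0):
    """Minimal GPR prediction in the style of Equations (14)-(15):
    zero-order basis, Cholesky factorization instead of a direct inverse."""
    K = np.array([[kernel(a, b, *theta) for b in X] for a in X])
    Ks = np.array([[kernel(a, b, *theta) for b in X] for a in Xs])
    Kss = np.array([[kernel(a, b, *theta) for b in Xs] for a in Xs])
    # Factor K + sigma_n^2 I once; reuse the factor for mean and covariance.
    L = np.linalg.cholesky(K + sigma_n**2 * np.eye(len(X)))
    resid = y - beta                                   # y - H beta with H = 1
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, resid))
    mean = beta + Ks @ alpha                           # Equation (14) analogue
    v = np.linalg.solve(L, Ks.T)
    cov = Kss - v.T @ v                                # Equation (15) analogue
    return mean, cov
```

With a small noise variance, the predicted mean nearly interpolates the training labels, which is the expected GPR behavior.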
The explained GPR is utilized for the predictions of the 3D sinc function in Figure 4. The original 3D sinc plot is depicted by the one million x and y grid points in Figure 4a. The GPR is trained by the randomly chosen 300 data points (* in Figure 4a), and the measurement is observed in the ideal situation of noise-free input data. The GPR with the exponential, squared exponential, and Matern 3/2 kernels predicts the rest of the figure based on the estimated parameters in Figure 4b, Figure 4c, and Figure 4d, respectively. The GPR prediction performance suggests the significance of the kernel selection. If possible, prior knowledge of the data model leads to a better choice of the kernel function.

Simulations
In the previous section, the single-channel sound source localization (SCSSL) algorithm was implemented by the HD, feature extraction, and GPR in consecutive order. This section realizes and performs the SCSSL algorithm over computer-generated data. The performances are evaluated over various parameters, such as kernels, orders, and ensemble lengths, to determine the optimal choices for the actual experiments. Prior to the discussion of the SCSSL system simulation, note that the performance of the SCSSL system depends on the receiver configuration due to the time delay patterns derived from the receiver locations. The pattern diversity contributes to the prediction accuracy, and the massive number of choices for the receiver configuration leads to a complex decision. This section initiates the simulation with the receiver configuration.

Receiver Locations
The HD algorithms compute the time delay between the receivers; hence, the receiver locations cause performance fluctuation based on the temporal information quality. At least three receivers are required to resolve the angles over the π (180°) range. Higher numbers of receivers produce better performance in accuracy; however, the prediction range is still limited to π. The ambiguity originates from the HD algorithms, which introduce the range constraint. Observe that the time delays from the HDs are arranged as positive values and cannot be distinguished between θ and θ + π in the given algorithm structure. Therefore, the receiver configuration comprises three receivers for the π range observation. Figure 5 shows an arbitrary receiver configuration and the corresponding time delay distribution. The receivers are placed on particular grids (tiny blank circles in Figure 5a), which are located over the plane circle with a 32-cm radius. The distance between adjacent grids is 4 cm in the horizontal and vertical directions, for a total of 197 grids. Note that the discrete grids are applied to limit the number of combinations. The three receivers are located at the p_1, p_2, and p_3 positions, and the resultant vectors are shown in Figure 5a. The receiver positions p_i and vectors v_i are represented by complex numbers in the Cartesian complex plane. Using the conventional angle notation, arrivals from the east, north, and west directions correspond to θ of 0, π/2, and π, respectively. For the individual vector v_i, the actual time delay τ_i in samples can be calculated by the following equation.

τ_i = round( Re( v_i e^(−jθ) ) / (c T_s) )    (22)
The θ is the arrival angle in radians, c is the sound speed (34,613 cm/s), T_s is the sampling period (1/48,000 s), j is the imaginary number, and Re() is the operator for extracting the real part from the complex number. Note that the round() in Equation (22) represents the round to the nearest integer function.
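The delay computation can be sketched as below, assuming the delay is the projection of the receiver vector onto the arrival direction e^(jθ); the constants are taken from the text, while the exact sign convention of the projection is an assumption for illustration.

```python
import numpy as np

C_SOUND = 34_613.0        # sound speed in cm/s (from the text)
TS = 1.0 / 48_000.0       # sampling period in s (from the text)

def time_delay_samples(v, theta):
    """Time delay (in samples) for a receiver vector v (complex, in cm)
    and arrival angle theta, assuming the projection form of Equation (22).
    The sign convention is an illustrative assumption."""
    return int(round(np.real(v * np.exp(-1j * theta)) / (C_SOUND * TS)))
```

A receiver on the east edge of the 32-cm circle and an arrival from the east (θ = 0) give a delay of about 44 samples at 48 kHz.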

The time delay pattern for each discrete angle θ_j is arranged into the vector t_j below.

t_j[n] = Σ_{i=1}^{V} δ[n − τ_i],  n = 0, 1, …, n_max,  j = 1, 2, …, A    (23)

The δ[n] is the Kronecker delta function with time n in sample units, A is the number of discrete angles, V is the number of time delay vectors, and n_max is the maximum time delay in samples over all configurations. Figure 5b casts the n on the vertical and θ_j on the horizontal axis.
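The Kronecker-delta pattern for one angle can be sketched as below; the function name is illustrative, and the pattern is assumed to be a simple indicator train over the V delay samples.

```python
import numpy as np

def delay_pattern(taus, n_max):
    """Time delay pattern t for one angle (Equation (23)-style sketch):
    a Kronecker-delta train with ones at the V delay samples taus."""
    t = np.zeros(n_max + 1)
    for tau in taus:
        t[tau] = 1.0          # delta[n - tau]
    return t
```

Stacking one such vector per discrete angle yields the pattern set whose diversity the similarity matrix evaluates.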
The optimal receiver configuration can be obtained by employing the brute force method which systematically enumerates all feasible configurations for the least prediction error. The extensive computations are necessary for searching for the best receiver positions since each configuration realizes the entire SCSSL procedure for possible angles. By understanding the machine learning characteristics, the feature discrimination significantly mitigates the computational burden for evaluating the receiver configurations. The optimal receiver configuration should provide distinctive time delays on every assessing angle for information diversity. Observe that the GPR requires the unique feature for individual angles.
The diversity pattern can be analyzed by creating the similarity matrix S in the equation below. The multiplication in the equation below is the inner product between vectors.
The t vector is from Equation (23), and the S matrix is illustrated in Figure 6a. According to the Cauchy–Schwarz inequality [67], the inner product of a pattern with itself produces the highest values. Figure 6a demonstrates the highest values on the diagonal axis, which indicate the auto-inner products. The off-diagonal elements should yield lower values than the diagonal elements; otherwise, the GPR delivers a confused decision. The S matrix is established for one receiver configuration. The L 1,1 norm of the S matrix shown below presents the overall similarity of the given configuration.
The L 1,1 norm is the simple sum of all absolute elements. The lower value in S 1,1 denotes the higher diversity and, equivalently, the lower similarity in the time delay pattern since the off-diagonal elements are decreased overall. Note that the diagonal elements are predetermined by the auto-inner products as constants. Figure 6b illustrates the sorted L 1,1 norm of the S matrix for all receiver configurations. The number of combinations of three receiver positions among the 197 grid points is 1,254,890, which describes the x axis of Figure 6b.
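As a concrete sketch of this selection criterion, the snippet below builds a toy similarity matrix from Kronecker-delta delay patterns and sums its absolute entries. The function names and the exact layout of the pattern vector are assumptions, since Equation (23) is not reproduced here.

```python
import numpy as np

def delay_pattern(delays, n_max):
    """One Kronecker-delta indicator row per receiver-pair delay,
    concatenated into a single pattern vector (assumed layout)."""
    t = np.zeros((len(delays), n_max + 1))
    for row, n in enumerate(delays):
        t[row, abs(n)] = 1.0    # HD delays are arranged as positive values
    return t.ravel()

def similarity_and_norm(patterns):
    """S[i, j] = <t_i, t_j>; the L1,1 norm is the sum of |S| entries."""
    T = np.stack(patterns)
    S = T @ T.T                 # Cauchy-Schwarz: the diagonal holds the maxima
    return S, float(np.abs(S).sum())
```

Two angles sharing a delay pattern raise the off-diagonal entries and hence the L 1,1 norm; the configuration search keeps the candidate with the smallest norm.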
The optimal receiver configuration is chosen from the least L 1,1 norm and is shown in Figure 7a. Multiple configurations provide identical minimum L 1,1 norm values due to the geometric symmetry of rotated structures. The receiver positions are located at the edge of the circle so that they can deliver the widest range of time delays in the pattern. Additionally, the asymmetricity is observed in the receiver positions to discriminate the east and west directions. Note that the symmetric receiver configuration provides the prediction of up to the π/2 range. The corresponding similarity matrix is depicted in Figure 7b. In contrast to Figure 6a, the optimal similarity matrix presents a substantially low distribution in Figure 7b. Fewer highest values are detected in the off-diagonal axis area in Figure 7b; therefore, improved accuracy is expected from the GPR prediction in the Figure 7a receiver configuration.
The angle ambiguity is expected to be reduced for the wider receiver aperture with an increased receiver number. Figure 8a presents the time delay distribution for the optimal receiver configuration shown in Figure 7a. The dynamic range and the diversity pattern of the delay are further expanded and improved than in the example case, respectively. Corresponding to the similarity matrix in Figure 7b, minor pattern overlaps are observed in the distribution in Figure 8a. Figure 8b illustrates the estimated time delay distribution by using the nonparametric HD algorithm on the simulated data. The data are generated from the receiver's perspective in the time domain with a 20 dB SNR. The estimated time delay precisely visualizes the theoretical time delay with trivial shadows caused by the DFT length limitation and noise interference [30]. Additionally, note that the minimum detectable time delay is determined by the window function in stage 5 of the HD algorithms shown in Figure 2. Figure 8b initiates the time delay axis (y axis) from six samples due to the five samples in the window length. The optimal receiver configuration is justified by the similarity matrix and its corresponding norm. This process significantly reduces the computation required to evaluate the receiver configurations. The previous paper [42] performed the 680-combination evaluation (the prediction error analysis) on 14 computers (Intel i7-7700/32 GB memory) with a 25-h execution time. This paper assesses the 1,254,890 combinations on a single computer (Intel i7-10700K/32 GB memory) in less than half an hour.
The maximum diversity on the time delay pattern is provided with the receiver configuration and is interpreted properly with feature extraction and GPR.

Localization Performance
The SCSSL system sequentially executes HD, feature extraction, and GPR for prediction based on the provided receiver configuration. The performance of the SCSSL depends on the prediction accuracy and the independent variables are listed as ensemble length, parametric order, kernel functions, and incoming angles. The length of ensemble average controls the depth of average for enhancing the SNR in HD. The parametric order determines the pole and/or zero number to adjust peaks and valleys of the time delays to follow the distribution. The kernel functions define the relationship between the feature vectors to establish the prediction model. The incoming angle selects the training and evaluating angles for the performance analysis. This subsection analyzes the performance in terms of prediction accuracy with described independent parameters.
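As a hedged illustration of the ensemble-length parameter, the snippet below averages a stack of noisy frames: a generic sketch of the averaging step, in which uncorrelated zero-mean noise power falls roughly as 1/L, not the paper's HD implementation.

```python
import numpy as np

def ensemble_average(frames):
    """Depth-L average over an (L, frame_size) stack of per-frame
    estimates; uncorrelated zero-mean noise power drops by about 1/L."""
    return np.mean(np.asarray(frames), axis=0)

# illustration: a fixed 1024-sample frame buried in unit-variance noise
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * np.arange(1024) / 64)
frames = [clean + rng.normal(0.0, 1.0, 1024) for _ in range(200)]
residual = ensemble_average(frames) - clean   # leftover noise after averaging
```

With 200 frames, the residual noise standard deviation shrinks to roughly 1/√200 of the per-frame level, which is the mechanism by which a longer ensemble length improves the effective SNR.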
The GPR produces the continuous output of the angle prediction because of the regression property; however, training and analyzing the system require discrete angles. The SCSSL system is trained by the seen angles from 0 to π − π/18 with π/18 resolution. Note that the opposite direction angles should be avoided to remove the confusion due to the π ambiguity in the HD algorithms. The angles between the adjacent seen angles are organized as the unseen angles to characterize the inference performance of the SCSSL system. The vocabulary 'seen' and 'unseen' is adopted to distinguish the included and excluded angles for learning, respectively, and both types of angles are shown in Figure 9a. The SCSSL system performance is numerated by root mean square error (RMSE), as shown below.
The y i is the real output (label) and ŷ i is the predicted output. For the given kernel function, the RMSE for the parametric HD method is illustrated by the colormap surface plot (shown in Figure 9b), with parametric order and an ensemble length parameter.
The RMSE of the nonparametric HD is depicted by the conventional plot with ensemble length only.
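Since the RMSE equation itself did not survive extraction, the following sketch restates the standard definition; the decibel conversion as 20·log10 is an assumption consistent with the negative dB values reported later (the RMSE on radian-valued angles is below one).

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error between labels y and predictions y_hat."""
    y, y_hat = np.asarray(y, float), np.asarray(y_hat, float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def rmse_db(y, y_hat):
    """RMSE in decibels (20*log10 amplitude convention, assumed)."""
    return 20.0 * np.log10(rmse(y, y_hat))
```

Under this convention, an RMSE of 0.1 rad corresponds to −20 dB, and the −50 dB region reported for the best methods corresponds to errors on the order of milliradians.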
The simulation parameters and corresponding values are arranged in Table 2. For one prediction, the SCSSL algorithm requires the ensemble length frames with overlapping between frames. Each frame consists of 1024 samples; however, the frame progresses only for the nonoverlapped samples, 256. The number of seen angles is equal to the number of unseen angles, and the 0–π angles are called the angle set. The SCSSL algorithm is trained by the 111 angle sets with 1998 iterations. The validation process is performed by two angle sets of the training as 222 angle sets and 3996 iterations. Note that one iteration is one angle training, and the training/validation are operated on separate datasets. The nonparametric HD/GPR shows the consistent RMSE values for the training set in Figure 10a.
The exponential kernel delivers the least RMSE distribution in the training set; however, the rational quadratic kernel is narrowly located on the left of the other kernels in the validation for seen angles, shown in Figure 10b. The unseen angles produce nearly identical distributions for all kernels. The Yule-Walker HD/GPR provides the distinguished empirical CDFs for individual kernels in the training set in Figure 10c, and the rational quadratic kernel exhibits the optimal performance. According to the CDF slopes, the majority of the RMSEs are located at the edges of the performance graph. The validation in Figure 10d provides approximately duplicated performance for all kernels in seen and unseen angles. Unlike the training CDFs, the RMSEs are gathered at the center of the RMSE distribution by analyzing the slopes.

In Figure 10e, the Prony HD/GPR demonstrates the comparable performance to the Yule-Walker HD/GPR outcome for training. The steeper slopes are measured on low RMSE values for all kernels, and the exponential kernel denotes the best performance in training. The validation in Figure 10f remains close to Figure 10d for all kernels and seen/unseen angles; however, the Prony HD/GPR initiates the RMSE distribution slightly earlier. The Steiglitz-McBride HD/GPR delivers the prominent performance in the training set, as shown in Figure 10g. The exponential kernel quickly increases the empirical CDF in the low RMSE range for optimal performance, and the other kernels consistently distribute the RMSE values based on the slope analysis. The validation in Figure 10h indicates the least RMSE distributions among the parametric HDs. The unseen angle performance is outstanding against all HD/GPR algorithms. Based on the investigation described above, the nonparametric HD/GPR, Yule-Walker HD/GPR, Prony HD/GPR, and Steiglitz-McBride HD/GPR chose the rational quadratic, rational quadratic, exponential, and exponential kernel, respectively. Figure 11 provides the RMSE distribution for the ensemble length and/or parametric order based on the selected kernel function from Figure 10. Each row of the figure matrix shows the individual HD/GPR, and each column specifies the training, validation for seen angles, and validation for unseen angles. The red circle in each figure corresponds to the lowest RMSE position to indicate the best parameters. The nonparametric HD/GPR is a function of the ensemble length only; therefore, the first row of Figure 11 is a conventional 2D plot. The parametric HD/GPR RMSEs are determined by the ensemble length as well as the parametric order; hence, the figures except the first row are described by the 3D surface plots with color bars.
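For reference, the two kernels selected above can be written out explicitly. These are the textbook forms of the exponential and rational quadratic covariance functions with hypothetical hyperparameters, together with a minimal zero-mean GP posterior mean; this is a sketch of the mechanism, not the exact GPR configuration used in the paper.

```python
import numpy as np

def exponential_kernel(r, length=1.0, sigma=1.0):
    """k(r) = sigma^2 * exp(-r / length), with r = |x - x'|."""
    return sigma**2 * np.exp(-r / length)

def rational_quadratic_kernel(r, length=1.0, sigma=1.0, alpha=1.0):
    """k(r) = sigma^2 * (1 + r^2 / (2*alpha*length^2))^(-alpha)."""
    return sigma**2 * (1.0 + r**2 / (2.0 * alpha * length**2)) ** (-alpha)

def gp_posterior_mean(x_train, y_train, x_test, kernel, noise=1e-6):
    """Zero-mean GP posterior mean: mu* = K*^T (K + noise*I)^{-1} y,
    for 1-D inputs and a stationary kernel of the distance r."""
    x_train, x_test = np.asarray(x_train, float), np.asarray(x_test, float)
    dist = lambda a, b: np.abs(a[:, None] - b[None, :])
    K = kernel(dist(x_train, x_train)) + noise * np.eye(len(x_train))
    return kernel(dist(x_train, x_test)).T @ np.linalg.solve(K, np.asarray(y_train, float))
```

The rational quadratic kernel mixes length scales through α, while the exponential kernel decays non-smoothly with distance, which is one plausible reading of why it tolerates the sharper feature changes between adjacent angles.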
Note that the surface plots are generated by interpolation, and the grids crossed by the white lines represent the original simulated and noninterpolated positions. The RMSE depth in decibels is equalized with the Figure 10 simulation. Figure 11a-c collects the RMSEs for the non-parametric HD/GPR. Relatively consistent distributions were observed in all plots, and the validation for seen angles in Figure 11b provides the least RMSE at the 200 ensemble length, which indicates the best parameter. Note that the best parameters should be evaluated during the validation process based on the hold-out dataset. Figure 11d-f arranges the RMSEs for the Yule-Walker HD/GPR. The training RMSE distribution in Figure 11d illustrates the performance fluctuation in column-wise patterns. The high orders and the region around order 6 designate the low RMSE contributions. The validation RMSE for seen angles in Figure 11e shows the RMSE valley at order 4 and long ensemble length, which are recognized as the best parameters. Figure 11f for unseen angles includes the overall high RMSE distribution. Figure 11g-i organizes the RMSEs for the Prony HD/GPR. The overall distributions for training and validation resemble the Yule-Walker HD/GPR counterparts with a lower bias. Observe that the blue area in Figure 11g and the colorbar in Figure 11h,i correspond to the blue area in Figure 11d and the colorbar in Figure 11e,f, respectively. The best parameters for the Prony HD/GPR are order 4 and ensemble length 200 as well. Figure 11j-l consolidates the RMSEs for the Steiglitz-McBride HD/GPR. The training RMSE distribution in Figure 11j provides a low RMSE distribution for wide parameter combinations except for minor high glitches at certain spots. The validation RMSEs for seen and unseen angles in Figure 11k,l yield the RMSE valley at the order 3 column for the higher ensemble length. The optimal parameters for the Steiglitz-McBride HD/GPR are order 3 and ensemble length 200.
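The red-circle selection described above amounts to a hold-out grid search over the two parameters. A minimal sketch (hypothetical helper, not the paper's code) is:

```python
import numpy as np

def best_parameters(val_rmse, orders, lengths):
    """Return (order, ensemble length, RMSE) at the minimum of a
    validation RMSE grid shaped (len(orders), len(lengths))."""
    grid = np.asarray(val_rmse, float)
    i, j = np.unravel_index(int(np.argmin(grid)), grid.shape)
    return orders[i], lengths[j], float(grid[i, j])
```

As the text notes, the selection must use the validation (hold-out) RMSE rather than the training RMSE, since the training surface rewards overfitting.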
Table 3 summarizes the RMSE performance for the selected kernel functions. The first column shows the SCSSL methods with the kernel function in parentheses. The second column presents the best parameter values and the corresponding RMSE in decibels for validation on seen angles. The third and fourth columns collect the training and validation (unseen angles) RMSE on the determined best parameters, respectively. Note that the arrows in the column titles indicate the source of the parameter. The RMSEs in square brackets represent the best RMSE values obtained within that column's simulation. Figure 12 demonstrates the predictions from the HD/GPR algorithms for training and validation. The rows of the figure matrix specify the individual HD/GPR algorithms and the columns divide the training and validation. Figure 12 realizes the individual coordinates by the x axis with real angles and the y axis with predicted angles. The solid orange diagonal line in the figures indicates the perfect prediction. The 18 angular positions are isolated by π/18 radian distance with equiprobable possibility for all iterations in training. The 36 angles are evaluated on validation for seen and unseen angles with π/36 radian away from the adjacent angles. Note that the unseen angles (even sequence in validation angles) did not participate in the training process and the seen angles are identified by the green vertical lines in the validation plots. The single points in the plot are the cumulated predictions with very low prediction variance.
The non-parametric HD/GPR shows the coherent predictions for training and validation in Figure 12a,b, respectively. The calculated RMSEs for training and validation (seen angles) are −52.43 dB and −45.52 dB, individually, as shown in Table 3 with the given optimal parameter. Therefore, the non-parametric HD/GPR predicts the seen angles with high accuracy and low variance. However, the unseen angle predictions in Figure 12b present the high bias and low variance in statistics. The predictions are located at approximately π/2 for all unseen angles with the inferior RMSE performance of −1.74 dB. The Yule-Walker HD/GPR provides the wide range of the predictions for training and validation in Figure 12c,d, respectively. The computed RMSEs for training and validation (seen angles) are −15.90 dB and −9.77 dB, individually, as shown in Table 3 with the given best parameters. The Yule-Walker HD/GPR predicts the seen angles with low accuracy even for the training process because of the Yule-Walker estimator property [30]. The validation for seen angles denotes the wider prediction distribution than the training counterparts. The unseen angle predictions produce the RMSE of −3.99 dB, which lends the better performance than the non-parametric HD/GPR since the prediction mean follows the true angles in Figure 12d.
The Prony HD/GPR derives the mixed prediction characteristics for training and validation in Figure 12e,f, correspondingly. The evaluated RMSEs for training and validation (seen angles) are −71.69 dB and −10.79 dB, respectively, as shown in Table 3 with the given best parameters. The Prony HD/GPR predicts the seen angles with high accuracy and low variance for the training in Figure 12e. The seen angle predictions in the validation in Figure 12f generate the performance degradation with low accuracy and high variance, which is comparable to the Yule-Walker HD/GPR equivalent. The unseen angle predictions provide the RMSE of −4.71 dB, which offers the similar performance to the Yule-Walker HD/GPR because the prediction averages pursue the real angles as well in Figure 12f. The Steiglitz-McBride HD/GPR introduces the neat predictions for training and validation in Figure 12g,h, respectively. The calculated RMSEs for training and validation (seen angles) are −63.48 dB and −39.91 dB, individually, as shown in Table 3 with the given best parameters. The Steiglitz-McBride HD/GPR estimates the seen angles with high accuracy and low variance for the training in Figure 12g.
Additionally, the seen angle predictions in the validation in Figure 12h deliver the consistent high precision arranged near the perfect prediction line. A notable property of the prediction is that the unseen angles are distinctly estimated for the true angles with low variation and −10.73 dB RMSE. The unseen angles at approximately 11π/36 and 3π/4 demonstrate the biased estimation due to the duplication in the time delay pattern. The similarity matrix in Figure 7b validates the duplication by inspecting the off-diagonal high values near the 11π/36 and 3π/4 angles. The Steiglitz-McBride HD/GPR provides the best performance for all angle situations with accurate inference.
This simulation section investigates the optimal receiver configuration and localization performance analysis based on the computer-generated dataset. The similarity matrix significantly reduces the computational complexity for finding the best configuration from numerous combinations. Based on the optimal receiver configuration, the localization performance is evaluated for kernel functions, ensemble length, parametric order, and directional angles. The Steiglitz-McBride HD/GPR with the exponential kernel demonstrates the best performance for seen and unseen angles with various datasets. The hold-out dataset for unseen angles can be predicted with −10.73 dB RMSE. The minor prediction glitches are observed due to the receiver configuration limitation.

Results
The acoustic experiments are implemented and evaluated in an anechoic chamber, which is proved to show limited conformance with ISO 3745 [68] for the 1 kHz-16 kHz 1/3 octave band in a hemi-free-field mode and for the 250 Hz-16 kHz 1/3 octave band in a free-field mode [43]. The SCSSL methods are analyzed in the free-field mode which requires completely shielded surfaces for all directions with acoustic wedges. The acoustic experimentation demands three microphones with predetermined locations (shown in Figure 7a) and single speaker with far-field provision. The directional angle between receivers and speaker are guided by the line laser (GLL 3-80 P, Bosch, Gerlingen, Germany) located above the speaker. The far-field provision is preserved by maintaining a one-meter minimum distance for all possible signal propagations.
The microphones are positioned at the receiver frame composed of lumbers with plastic supports. The structure retains the receiver shape and microphone height for circular movements. Note that the speaker and receiver level should be identical for unimpeded horizontal propagation. The circular movements for incoming angles involve the pair of saw-toothed wheels with engraved and intagliated shapes for π/18 (10°) rotations. As photographed in Figure 13, the three receivers are placed in the radial pattern with concentric circles; hence, the circle center provides the angular movement center which is designed for the saw-toothed wheel pair. The general experiment arrangement is presented in Figure 13a and the receiver center is depicted in Figure 13b for the circular movement engager. The plastic accessories are realized by the 3D printer (Replicator 2, MakerBot, Brooklyn, NY, USA) from the polylactic acid (PLA) filament. The analog mixer (MX-1020, MPA Tech, Seoul, Republic of Korea) combines the microphones (C-2, Behringer, Tortola, British Virgin Islands) for establishing the single-channel signal. The computer-connected audio interface (Quad-Capture, Roland, Hamamatsu, Japan) quantizes the single-channel signal for the SCSSL algorithms. The speaker (HS80M, Yamaha, Hamamatsu, Japan) is also connected to the audio interface to produce the wideband acoustic signal.
The generation and reception of real-time audio are managed by the MATLAB system object with the audio stream input/output (ASIO) driver. The audio is recorded for 5 min at every appointed angle. The first and last one second of data are excluded to decrease the interruptions by transition conditions. Consequently, the experiment for individual angles utilizes approximately 5 min of recorded data. Each data frame is built by 1024 samples and the frame progresses to the further time by the nonoverlapped samples 256 for information consistency. The angular movement engager (shown in Figure 13b) is designed to rotate the receiver structure for every π/18 radian (10°) with the given 3D printer resource. Hence, the experiment angles are reduced by half for the seen and unseen angle datasets. The experiment parameters are organized in Table 4. Figure 14 shows the experimented RMSE distributions with the selected kernel function which is derived from the simulation. The validation is performed with 9 seen angles for 999 training iterations and 999 validation iterations.
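The 1024-sample frames advancing by the 256 nonoverlapped samples (75% overlap) can be sketched as follows; the helper name is hypothetical.

```python
import numpy as np

def split_frames(x, frame_len=1024, hop=256):
    """Slice x into frame_len-sample frames that advance by hop samples,
    so consecutive frames share frame_len - hop samples."""
    x = np.asarray(x)
    count = 1 + (len(x) - frame_len) // hop
    return np.stack([x[i * hop : i * hop + frame_len] for i in range(count)])
```

For a 2048-sample recording this yields five overlapping frames, illustrating how the hop of 256 samples keeps the information consistent between adjacent frames.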
The range of the RMSE is equalized to the previous RMSE distribution plots, such as in Figures 10 and 11. The red circle on each figure corresponds to the lowest RMSE position to indicate the best parameters. The non-parametric HD/GPR demonstrates the gradually decreasing distribution from −35 dB to −48 dB. The improved SNR from the longer ensemble length contributes to the lower RMSE. The Yule-Walker HD/GPR indicates the enhanced RMSE in the area of longer ensemble length and low parametric order. The low RMSE area is wide and clustered. The Prony HD/GPR introduces the analogous RMSE distribution to the Yule-Walker HD/GPR counterpart; however, the low RMSE area is narrower, toward the lower parametric orders. The Steiglitz-McBride HD/GPR delivers the low RMSE area on the small region with the fragmented pattern. The orders 2, 3, and 4 with certain ensemble lengths produce very low RMSE values.
Table 5 organizes the best parameter values and the corresponding RMSE from the validation process. The longer ensemble length achieves the lowest RMSE values for all HD/GPR algorithms. Observe that the optimal parametric orders are increased by one from the order chosen from the experimental RMSE distribution (seen angles) in Figure 14. The shifted orders are selected to balance the seen and unseen angle performance. Additionally, the parameter values are equal to the values from the simulation results shown in Table 3. The nonparametric HD/GPR produces the leading prediction accuracy for the seen angles according to the RMSE values. However, the unseen angle performance is significantly decreased, as expected from the prediction simulation shown in Figure 12b.
The Steiglitz-McBride HD/GPR provides the lowest RMSE value for the unseen angles and the second RMSE performance for seen angles with least parametric order. The Yule-Walker and Prony HD/GPR methods show the overall performance deterioration for seen and unseen angles. Along with the previous simulations and experiments, this paper arrives at the final phase of investigating the actual prediction distribution based on the selected kernel function, ensemble length, and parametric order. Figure 15 shows the predictions from the individual HD/GPR algorithms for seen and unseen angles. The number of predictions is described in Table 4 as 999/999 iterations for seen/unseen angles. Note that the seen angles are indicated by the green vertical lines in the plots. In Figure 15a, the nonparametric HD/GPR represents the accurate predictions on seen angles with −47.54 dB RMSE; however, the predictions on unseen angles are consistent but biased with −1.95 dB RMSE. In Figure 15b, the Yule-Walker HD/GPR presents a wide variance on all predictions with −15.75 dB/−3.04 dB RMSE for seen/unseen angles. The means of the distribution seem to be following the perfect prediction line except for unseen angles around west end arrivals. In Figure 15c, the Prony HD/GPR provides a similar performance pattern to the Yule-Walker HD/GPR with −15.50 dB/−0.24 dB RMSE for seen/unseen angles. The performance glitches on the unseen angles are observed around west end and north-east arrivals. In Figure 15d, the Steiglitz-McBride HD/GPR demonstrates the overall best performance in terms of bias and variance with −37.56 dB/−9.11 dB RMSE for seen/unseen angles. The unseen angles around north-east (3π/4) arrivals provide the biased estimations due to the duplication in the time delay pattern shown in Figure 7b.
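The RMSE values above are quoted in decibels. Below is a minimal sketch of one plausible conversion, assuming the angular error is wrapped to [−π, π) and the RMSE is normalized by the full 2π range before conversion to dB (the paper's exact normalization is not restated here, so this convention is an assumption):

```python
import numpy as np

def rmse_db(pred, true, full_scale=2 * np.pi):
    """Angular RMSE in dB, normalized by the full angular range (assumed convention)."""
    # wrap the prediction error to [-pi, pi) so 0 and 2*pi count as the same angle
    err = np.angle(np.exp(1j * (np.asarray(pred) - np.asarray(true))))
    rmse = np.sqrt(np.mean(err ** 2))
    return 20 * np.log10(rmse / full_scale)
```

Under this convention, a constant error of 0.01 rad (about 0.6°) maps to roughly −56 dB, which is the scale of the best seen-angle figures reported above.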

This results section performs the actual experiments with the optimal receiver configuration over the audio recorded in the anechoic chamber. With a wideband sound source, the three receivers are combined by the analog audio mixer and wired to the audio interface to collect data for the individual angles. Overall, the HD/GPR algorithms show better performance with longer ensemble lengths and lower parametric orders. The minimum RMSE value for each HD/GPR algorithm outlines performance comparable to its simulation counterpart, with minor discrepancies. The non-parametric HD/GPR and the Steiglitz-McBride HD/GPR show good performance on seen angles, and the Steiglitz-McBride HD/GPR additionally provides high prediction accuracy at unseen angles. The other HD/GPR algorithms generate marginal prediction performance for seen or unseen angles, with high variance and/or high bias. Therefore, the Steiglitz-McBride homomorphic deconvolution, the pole angle method (feature extraction), and Gaussian process regression (exponential kernel) are combined as the SCSSL algorithm for optimal prediction over extensive incoming angles.

Conclusions
This paper provides a novel method to localize the angle of arrival based on homomorphic deconvolution with Gaussian process regression. The forward and backward homomorphic systems in cascade realize the deconvolution to extract the propagation function, which contains the time of flight between receivers. The nonparametric method is represented by the raw distribution of the homomorphic deconvolution. The Yule-Walker, Prony, and Steiglitz-McBride methods from spectral estimation are employed in the last stage of homomorphic deconvolution for model- and parameter-based propagation representation as the parametric method. The Gaussian process regression with a kernel (covariance) function calculates the mean and covariance of the updated function to predict the output based on the given training data; once the learning process is completed, it predicts the angle from the given feature input. For feature extraction, the direct method and the pole angle method are used for the nonparametric and parametric homomorphic deconvolutions, respectively. The optimal receiver configuration for a three-receiver structure is derived from the proposed similarity matrix analysis based on the time delay pattern diversity. The extensive simulations show that a specific model order with a longer ensemble length produces the least root mean square error. The Steiglitz-McBride model with an exponential kernel demonstrates the best performance for trained and untrained angles across various datasets. The experiments in the anechoic chamber present precise predictions with the proper model order and ensemble length. The nonparametric method with a rational quadratic kernel shows good performance on trained angles. The Steiglitz-McBride model with an exponential kernel delivers the best predictions for trained and untrained angles based on the reduced model order.
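The time-delay extraction in the first stage can be illustrated with a generic real-cepstrum sketch; this is not the paper's exact forward/backward homomorphic chain, and the function name, echo gain, and delay are illustrative. An echo at lag d appears as a cepstral peak at quefrency d, and ensemble averaging of the log spectra, as in the paper, suppresses the contribution of the wideband source:

```python
import numpy as np

def echo_delay_cepstrum(receiver_frames):
    """Estimate a dominant echo lag from the ensemble-averaged real cepstrum."""
    log_mag = np.mean([np.log(np.abs(np.fft.fft(x)) + 1e-12)
                       for x in receiver_frames], axis=0)  # ensemble average
    cep = np.fft.ifft(log_mag).real                        # real cepstrum
    n = len(cep)
    q = np.arange(5, n // 2)          # skip the low-quefrency source envelope
    return int(q[np.argmax(cep[q])])

# synthetic data: white-noise source plus a 0.5-gain echo delayed by 40 samples
rng = np.random.default_rng(0)
delay, n_frames, n = 40, 64, 1024
frames_list = []
for _ in range(n_frames):
    s = rng.standard_normal(n)
    x = s.copy()
    x[delay:] += 0.5 * s[:-delay]
    frames_list.append(x)
```

In this synthetic setup, `echo_delay_cepstrum(frames_list)` recovers the 40-sample lag; averaging 64 frames makes the echo's cepstral peak stand well above the residual source noise.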
This paper broadens the single-channel sound source localization system of the previous article to improve the comprehensive prediction performance by using Gaussian process regression with novel feature extraction. The foundation of the single-channel sound source localization system, time delay estimation using nonparametric and parametric homomorphic deconvolutions, was constructed in a prior independent article. The advantages of the system include reduced system complexity, because a single analog receiver network with a single analog-to-digital converter serves multiple receivers. The second benefit is the moderate modeling effort required to derive the estimation algorithm: without a complex propagation model, the training process of Gaussian process regression with the selected kernel function provides proper functioning for sound source localization. In contrast to the linear regression of the previous paper, this paper shows that Gaussian process regression with novel feature extraction achieves the highest prediction accuracy over a wide range of angles, even for untrained directions. Another enhancement of this system is the algorithm complexity being consistent with respect to the number of receivers: with fixed parameter values, the computational requirement is invariant to the receiver count, whereas conventional sound source localization systems exhibit complexity proportional to the number of receivers. The statistical analysis in simulation and experimentation demonstrates the feasibility of sound localization based on homomorphic deconvolution and Gaussian process regression. This paper contributes to the realization of a deployable and feasible sound source localization system by improving scalability and complexity.
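As a concrete illustration of the regression stage, the sketch below uses scikit-learn, where the exponential kernel is available as a Matérn kernel with ν = 0.5 and the rational quadratic kernel as `RationalQuadratic`. The one-dimensional feature here is a hypothetical stand-in for the pole-angle features of the actual system:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern, RationalQuadratic

rng = np.random.default_rng(1)
seen = np.deg2rad(np.arange(0.0, 360.0, 40.0))   # 9 trained arrival angles
# hypothetical feature: monotone in the true angle, with slight measurement noise
X_train = (0.2 * seen + 0.005 * rng.standard_normal(len(seen)))[:, None]
y_train = seen

# Matern with nu = 0.5 is the exponential (Ornstein-Uhlenbeck) kernel
gpr_exp = GaussianProcessRegressor(kernel=Matern(nu=0.5), alpha=1e-4,
                                   normalize_y=True).fit(X_train, y_train)
# rational quadratic kernel, as used by the nonparametric branch
gpr_rq = GaussianProcessRegressor(kernel=RationalQuadratic(), alpha=1e-4,
                                  normalize_y=True).fit(X_train, y_train)

y_pred, y_std = gpr_exp.predict(X_train, return_std=True)
```

Both fits interpolate the trained angles closely, and the predictive standard deviation `y_std` quantifies the model's confidence between trained angles, which is the property exploited for unseen-angle prediction.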
Currently, machine and deep learning techniques are progressing rapidly toward further accuracy in predictions and decisions with sparse or no datasets, and the proposed system is expected to be improved by the use of such advanced learning algorithms. Future work may aim to enhance the proposed system in complex situations, such as receiver failure and indirect propagation. In addition, a future paper will explore the localization performance in the field using data collected by a mobile robot in a conventional environment. Due to the flexibility and expandability of deconvolution and machine learning, the use of the proposed method is not limited to sound source localization: from any sequential signals, the phase information over multiple receivers can be extracted by the proposed algorithm. Future work will subsequently investigate applications beyond sound source localization.