Article

Fast Computation of LSP Frequencies Using the Bairstow Method

1 Smart Sensing R&D Center, Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China
2 School of Electronic, Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Electronics 2020, 9(3), 387; https://doi.org/10.3390/electronics9030387
Submission received: 20 January 2020 / Revised: 23 February 2020 / Accepted: 25 February 2020 / Published: 26 February 2020
(This article belongs to the Section Circuit and Signal Processing)

Abstract

Linear prediction is a core technology in speech processing. It has been widely applied in speech recognition, synthesis, and coding, and can represent the speech frequency spectrum efficiently and accurately with only a few parameters. Line Spectrum Pairs (LSPs) frequencies, as an alternative representation of Linear Predictive Coding (LPC) coefficients, have the advantages of good quantization accuracy and low spectral sensitivity. However, computing the LSPs frequencies is time-consuming. To address this issue, this paper proposes a fast algorithm, based on the Bairstow method, for computing LSPs frequencies from linear prediction coefficients. The algorithm first transforms the symmetric and antisymmetric polynomials into general-form polynomials, then extracts the polynomial roots. Exploiting the short-term stationary property of the speech signal, an adaptive initial method is applied, which reduces the average iteration numbers by 26% compared with the statistic initial method, with a Perceptual Evaluation of Speech Quality (PESQ) score reaching 3.46. Experimental results show that the proposed method extracts the polynomial roots efficiently and accurately with significantly reduced computational complexity. Compared to previous works, the proposed method is 17 times faster than the Tschirnhaus Transform, and achieves a 22% PESQ improvement over the Birge-Vieta method with a comparable computation time.

1. Introduction

Linear predictive analysis is one of the most powerful speech analysis techniques. It extracts the short-time spectral envelope of speech signals efficiently, and is widely used in speech representation for low-bit-rate transmission or storage, and in automatic speech and speaker recognition [1,2,3,4,5]. The predominant linear predictive analysis method is Linear Prediction Coding (LPC), which offers good fault tolerance when transmitting spectral envelope information. As an alternative representation of LPC, Line Spectrum Pairs (LSPs) were first introduced by Itakura [6], and have the advantages of good quantization accuracy and low spectral sensitivity.
During the computation of LSPs, the LPC polynomial A(z) is decomposed into a pair of symmetric and anti-symmetric polynomials. These two polynomials are called LPC polynomials. It has been proven that the roots of the LPC polynomials, namely the LSPs frequencies, are interleaved on the unit circle [6]. Many methods have been proposed to solve these LPC polynomials. Soong and Juang [7,8] adopted the Discrete Cosine Transform (DCT) to evaluate the cosine functions, based on the bisection method with a fine grid. They employed a closed formula to extract the roots of the LPC polynomial. Kang and Fransen [9] first proposed transforming the LPC polynomial into a sum of cosine functions to avoid complex computation in later processing. They employed the autocorrelation method and the all-pass filter to extract the roots. Although these methods solve the LPC polynomials correctly, they suffer from the high computational complexity of trigonometric functions and the high memory usage of cosine tables. Kabal and Ramachandran [10] employed the bisection method and the interpolation property to locate the zero-crossings with a fine grid. While this requires neither prior storage nor the calculation of trigonometric functions, the number of bisections cannot be decreased without compromising the search for zero-crossings. Wu and Chen [11] first proposed transforming the LPC polynomial into a pair of general-form polynomials, then used closed-form formulas and the modified Newton-Raphson method to extract the polynomial roots. This avoids both the computation of trigonometric functions and the fine grid. However, the mathematical operations with complex numbers are still time-consuming. Chen and Ruan [12] proposed the modified complex-free Ferrari formula to reduce the computation. While this algorithm avoids complex-number operations, it still needs modulus operations to obtain the roots of the polynomials.
Chen and Chang [13] employed the Tschirnhaus transform to reduce the degree of the polynomials, then used the Descartes–Euler method to extract the roots. While this accelerates the computation, it still needs to compute trigonometric functions. Chang et al. [14] proposed a method to solve the LPC polynomials that is suitable for hardware implementation.
Hence, an efficient method to solve LPC polynomials should have the following characteristics:
  • Avoiding evaluation of trigonometric functions,
  • Avoiding using complex operations and a fine grid,
  • Extracting the roots rapidly and accurately.
Considering the above requirements, we propose a fast LPC computation method, based on the Bairstow method, which can extract the roots of LPC polynomials without using complex operations, a fine grid, or trigonometric functions. Luk [15] and Hsiao [16] used the Bairstow method to solve the polynomial, but gave no specific algorithm implementation or performance evaluation. Additionally, their initialization was fixed, which limited the convergence speed. Thus, an adaptive initial method, which exploits the short-term stationary property of the speech signal, is proposed in this paper to guarantee convergence. We then apply our method to a speech compression codec system. Experiments show that, compared with other methods, including those of Soong and Juang [8], Kabal and Ramachandran [10], the Chen and Wu method [11], the modified Ferrari's formula [12], the Tschirnhaus transform [13], the Birge-Vieta method [14] and the previous Bairstow-based LPC analysis methods [15,16], the proposed method can estimate the LSPs frequencies accurately with the lowest computational complexity, together with high speech quality.
The rest of this paper is organized as follows: Section 2 briefly reviews the LSPs frequencies. Section 3 gives a detailed description of the Bairstow method, followed by the adaptive initial method that boosts the estimation accuracy and the convergence speed. Section 4 provides a guideline for estimating the 10-order LSPs, and shows the experimental results of the proposed method in comparison with previous methods. Finally, Section 5 provides the conclusion.

2. Line Spectral Pairs (LSPs) Frequencies

2.1. LPC Polynomial

This section briefly introduces the background of the LSPs frequencies and some of their relevant properties. Given a p-order LPC, the minimum-phase LPC polynomial is expressed as follows, according to Reference [14]:
$$ A(z) = 1 - \sum_{k=1}^{p} a_k z^{-k} = 1 - a_1 z^{-1} - a_2 z^{-2} - \cdots - a_p z^{-p}, \tag{1} $$
where $a_1, a_2, \ldots, a_p$ are the direct-form LPC coefficients. The LPC polynomial can be decomposed into a symmetric polynomial $P(z)$ and an anti-symmetric polynomial $Q(z)$, which are given by
$$ P(z) = A(z) + z^{-(p+1)} A(z^{-1}) \tag{2} $$
$$ Q(z) = A(z) - z^{-(p+1)} A(z^{-1}). \tag{3} $$
The roots of $P(z)$ and $Q(z)$ determine the values of the LSPs frequencies. According to Reference [14], these two auxiliary polynomials have the important property that their roots lie on the unit circle and interleave with each other. Further, $P(z)$ has a root at $z = -1$ ($w = \pi$), and $Q(z)$ has a root at $z = 1$ ($w = 0$). These roots can be eliminated by polynomial division, which gives the reduced polynomials
$$ R_p(z) = \frac{P(z)}{z + 1}, \quad R_q(z) = \frac{Q(z)}{z - 1}, \quad P: \text{even} \tag{4} $$
$$ R_p(z) = P(z), \quad R_q(z) = \frac{Q(z)}{z^2 - 1}, \quad P: \text{odd}. \tag{5} $$
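As an illustration of Equations (2)–(4), the following is a minimal Python sketch, not the authors' implementation. It assumes the sign convention of Equation (1), stores coefficients in ascending powers of $z^{-1}$, and covers the even-order case only. The helper names (`lsp_polynomials`, `divide_by_linear`, `reduced_polynomials`) are hypothetical.

```python
def lsp_polynomials(a):
    """Coefficients of P(z) and Q(z) in ascending powers of z^-1.

    `a` holds a_1..a_p with A(z) = 1 - sum_k a_k z^-k, as in Equation (1).
    """
    # A(z) padded to degree p + 1 so it lines up with z^-(p+1) * A(z^-1).
    A = [1.0] + [-ak for ak in a] + [0.0]
    A_mirror = A[::-1]  # coefficients of z^-(p+1) * A(z^-1)
    P = [x + y for x, y in zip(A, A_mirror)]  # Equation (2)
    Q = [x - y for x, y in zip(A, A_mirror)]  # Equation (3)
    return P, Q


def divide_by_linear(coeffs, s):
    """Divide a polynomial in u = z^-1 by (1 + s*u).

    Returns (quotient, residual); the residual is zero when the
    division is exact.
    """
    quotient = [coeffs[0]]
    for ck in coeffs[1:-1]:
        quotient.append(ck - s * quotient[-1])
    residual = coeffs[-1] - s * quotient[-1]
    return quotient, residual


def reduced_polynomials(a):
    """R_p and R_q of Equation (4) for even order p."""
    if len(a) % 2 != 0:
        raise ValueError("this sketch covers even order only")
    P, Q = lsp_polynomials(a)
    Rp, res_p = divide_by_linear(P, 1.0)    # remove the root z = -1
    Rq, res_q = divide_by_linear(Q, -1.0)   # remove the root z = 1
    return Rp, Rq, res_p, res_q
```

For any even-length coefficient list, both residuals should vanish up to rounding, which reflects the fixed roots at $z = -1$ and $z = 1$ stated above.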
Furthermore, because the coefficients of both $R_p(z)$ and $R_q(z)$ are real, the roots of each polynomial occur in conjugate pairs on the unit circle, and the polynomials can be written as follows:
$$ R_p(z) = \prod_{i = 1, 3, \ldots, 2M_1 - 1} \left( 1 - 2 q_i z^{-1} + z^{-2} \right) \tag{6} $$
$$ R_q(z) = \prod_{j = 2, 4, \ldots, 2M_2} \left( 1 - 2 \theta_j z^{-1} + z^{-2} \right) \tag{7} $$
$$ M_1 = \frac{P}{2}, \quad M_2 = \frac{P}{2}, \quad P: \text{even} \tag{8} $$
$$ M_1 = \frac{P + 1}{2}, \quad M_2 = \frac{P - 1}{2}, \quad P: \text{odd}, \tag{9} $$
where $q_i = \cos(w_i)$, $\theta_j = \cos(w_j)$, and $M_1 + M_2 = P$. The quantities $w_i$, $w_j$ are the line spectral frequencies (LSF) [16], and they satisfy the following ordering property:
$$ 0 < w_1 < \theta_1 < w_2 < \theta_2 < \cdots < w_{M_1} < \theta_{M_2} < \pi, \quad P: \text{even} \tag{10} $$
$$ 0 < w_1 < \theta_1 < w_2 < \theta_2 < \cdots < w_{M_2} = w_{M_1 - 1} < \theta_{M_2} < w_{M_1} < \pi, \quad P: \text{odd}. \tag{11} $$
The ordering property of LSF can be illustrated in Figure 1.
Substituting $z = e^{jw}$, Equations (6) and (7) can be written as follows:
$$ R_p(e^{jw}) = 2 e^{-jwM_1} \left( p(0) \cos(M_1 w) + p(1) \cos((M_1 - 1) w) + \cdots + p(M_1 - 1) \cos w + p(M_1) \right) \tag{12} $$
$$ R_q(e^{jw}) = 2 e^{-jwM_2} \left( q(0) \cos(M_2 w) + q(1) \cos((M_2 - 1) w) + \cdots + q(M_2 - 1) \cos w + q(M_2) \right). \tag{13} $$

2.2. General-Form Polynomial Transformation for the Computation of LSPs Frequencies

The zero crossings of $R_p(z)$ and $R_q(z)$ are the values of the LSPs frequencies. To find the LSPs frequencies, we need to solve $R_p(e^{jw}) = 0$ and $R_q(e^{jw}) = 0$.
This paper considers the commonly used order $P = 10$, for which Equations (12) and (13) can both be expressed in the common form
$$ R(e^{jw}) = f(0) \cos(M w) + f(1) \cos((M - 1) w) + \cdots + f(M - 1) \cos w + f(M). \tag{14} $$
The above formula can be expanded into powers of $\cos w$ as follows:
$$ R(w) = 32 \cos^5 w + 16 f(1) \cos^4 w + 8 (f(2) - 5) \cos^3 w + 4 (f(3) - 4 f(1)) \cos^2 w + 2 (f(4) - 3 f(2) + 5) \cos w + (f(5) - 2 f(3) + 2 f(1)), \tag{15} $$
where $f(n)$ are the polynomial coefficients and $M = P/2$. We define $x = \cos(w)$. As $w$ is in the range $(0, \pi)$, $x$ is in the range $(-1, +1)$. Therefore, the above two polynomials can be written as
$$ R(w) = 32 x^5 + 16 f(1) x^4 + 8 (f(2) - 5) x^3 + 4 (f(3) - 4 f(1)) x^2 + 2 (f(4) - 3 f(2) + 5) x + (f(5) - 2 f(3) + 2 f(1)). \tag{16} $$
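The expansion into powers of $x = \cos w$ can be checked with the standard multiple-angle identities. The sketch below assumes $f(0) = 1$ and the normalization in which every cosine term except the constant $f(5)$ carries a factor of 2; both assumptions are inferred from the coefficients of Equation (15) rather than stated explicitly in the text.

```latex
\begin{aligned}
\cos 2w &= 2x^2 - 1, \qquad
\cos 3w = 4x^3 - 3x, \\
\cos 4w &= 8x^4 - 8x^2 + 1, \qquad
\cos 5w = 16x^5 - 20x^3 + 5x, \\[4pt]
2\cos 5w &+ 2f(1)\cos 4w + 2f(2)\cos 3w + 2f(3)\cos 2w + 2f(4)\cos w + f(5) \\
&= 32x^5 + 16f(1)x^4 + \bigl(8f(2) - 40\bigr)x^3 + \bigl(4f(3) - 16f(1)\bigr)x^2 \\
&\quad + \bigl(2f(4) - 6f(2) + 10\bigr)x + \bigl(f(5) - 2f(3) + 2f(1)\bigr).
\end{aligned}
```

Collecting the common factors 8, 4 and 2 in the last expression gives the coefficients of Equation (16).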
According to Reference [17], the values of $f(n)$ can be found by the recursive relations:
$$ f_p(i + 1) = a_{i+1} + a_{10-i} - f_p(i), \quad i = 0, \ldots, 4 \tag{17} $$
$$ f_q(i + 1) = a_{i+1} - a_{10-i} + f_q(i), \quad i = 0, \ldots, 4, \tag{18} $$
where $f_p(n)$ and $f_q(n)$ represent the coefficients of $R_p(w)$ and $R_q(w)$, respectively. According to Reference [18], we compare Equation (16) with the general-form polynomial
$$ f(x) = x^5 + \alpha_1 x^4 + \alpha_2 x^3 + \alpha_3 x^2 + \alpha_4 x + \alpha_5, \tag{19} $$
where $f(x)$ is a general-form polynomial of the same order as $R(w)$. Comparing Equation (19) with (16), the relationship between the coefficients of $R(w)$ and $f(x)$ can be written as
$$ \begin{cases} \alpha_1 = f(1)/2, \quad \alpha_2 = (f(2) - 5)/4, \quad \alpha_3 = (f(3) - 4 f(1))/8, \\ \alpha_4 = (f(4) - 3 f(2) + 5)/16, \quad \alpha_5 = (f(5) - 2 f(3) + 2 f(1))/32. \end{cases} \tag{20} $$
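The mapping from LPC coefficients to the general-form coefficients in Equations (17), (18) and (20) can be sketched as below. This is an illustrative Python version, not the authors' C code. It assumes $f_p(0) = f_q(0) = 1$, and that `a[k-1]` is the coefficient of $z^{-k}$ in $A(z)$, the convention of Reference [17]; under the sign convention of Equation (1) the inputs would need to be negated first. The function names are hypothetical.

```python
def general_form_coefficients(a):
    """Return (alpha_p, alpha_q), each [alpha_1, ..., alpha_5].

    Implements the recursions (17), (18) and the mapping (20) for a
    10th-order LPC filter. Assumes f_p(0) = f_q(0) = 1.
    """
    if len(a) != 10:
        raise ValueError("expected 10 LPC coefficients")
    fp, fq = [1.0], [1.0]
    for i in range(5):
        fp.append(a[i] + a[9 - i] - fp[i])  # Equation (17)
        fq.append(a[i] - a[9 - i] + fq[i])  # Equation (18)

    def to_alpha(f):
        # Equation (20): divide the coefficients of (16) by 32.
        return [
            f[1] / 2.0,
            (f[2] - 5.0) / 4.0,
            (f[3] - 4.0 * f[1]) / 8.0,
            (f[4] - 3.0 * f[2] + 5.0) / 16.0,
            (f[5] - 2.0 * f[3] + 2.0 * f[1]) / 32.0,
        ]

    return to_alpha(fp), to_alpha(fq)


def evaluate_monic(alpha, x):
    """Evaluate x^5 + alpha_1 x^4 + ... + alpha_5 with Horner's rule."""
    value = 1.0
    for coeff in alpha:
        value = value * x + coeff
    return value
```

A simple sanity check is the all-zero coefficient vector, for which $A(z) = 1$ and the line spectral frequencies are evenly spaced: the roots of the two quintics should then be $\cos((2k+1)\pi/11)$ and $\cos(2k\pi/11)$, respectively.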
In summary, after a series of steps, LPC polynomials are turned into a general-form polynomial and the relationship between LPC coefficients and general-form polynomial coefficients is found. Therefore, Section 3 introduces the method to solve the general-form polynomial.

3. Polynomial Solution Based on Bairstow Method

As mentioned in Section 2, the LPC polynomials can be transformed into a general-form polynomial. Furthermore, the relationship between the LPC coefficients and coefficients of the general-form polynomial is presented by Equation (20). To solve the general-form polynomial, this paper employs the Bairstow method.
The Bairstow method [18,19,20] is a rapid root-finding algorithm for general-form polynomial. Compared to other conventional methods, it can efficiently reduce the number of addition and multiplication operations, and avoids the evaluation of trigonometric and other complex computations. Thus, the proposed method can converge rapidly and reduce the computation time significantly.
The Bairstow method follows from the observation that the roots of a real quadratic polynomial
$$ x^2 + r x + q \tag{21} $$
are roots of a given $N$-order real polynomial
$$ f(x) = \alpha_0 x^N + \alpha_1 x^{N-1} + \cdots + \alpha_{N-1} x + \alpha_N, \quad \alpha_0 \neq 0, \tag{22} $$
only if $f(x)$ can be divided by Equation (21) without remainder. In general, the division can be represented as
$$ f(x) = f_1(x) (x^2 + r x + q) + A x + B, \quad \alpha_0 \neq 0, \tag{23} $$
where the degree of $f_1(x)$ is $N - 2$ and the remainder is $A x + B$. The coefficients of the remainder depend upon $r$ and $q$, that is
$$ A = A(r, q), \quad B = B(r, q), \tag{24} $$
and the remainder vanishes when $r$ and $q$ satisfy the condition
$$ A(r, q) = 0, \quad B(r, q) = 0. \tag{25} $$
According to this scheme, if $r_0$ and $q_0$ are the initial approximations for $r$ and $q$, then the iterative process can be written as
$$ r_{i+1} = r_i + \Delta r, \quad q_{i+1} = q_i + \Delta q, \quad \Delta r = -\frac{A B_q - B A_q}{A_r B_q - A_q B_r}, \quad \Delta q = -\frac{A_r B - B_r A}{A_r B_q - A_q B_r}, \quad i = 0, 1, 2, \ldots, \tag{26} $$
where $A_r$, $A_q$, $B_r$, $B_q$ are the partial derivatives of $A$ and $B$ with respect to $r$ and $q$, evaluated at $r_i$ and $q_i$.
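The values $A$ and $B$ can be found by means of a Horner-type [21] scheme. Comparing Equation (22) with (23), we obtain the following recursive formula:
$$ b_{-1} = 0, \quad b_0 = \alpha_0, \quad b_1 = \alpha_1 - r b_0, \quad b_2 = \alpha_2 - r b_1 - q b_0, \quad b_i = \alpha_i - r b_{i-1} - q b_{i-2}, \quad i = 2, 3, \ldots, N - 2, $$
$$ A = \alpha_{N-1} - r b_{N-2} - q b_{N-3}, \quad B = \alpha_N - q b_{N-2}. \tag{27} $$
If the recursion for $b_i$ is extended to $i = N - 1$ and $i = N$, then $A = b_{N-1}$ and $B = b_N + r b_{N-1}$. Define $c_{i-1} = -\partial b_i / \partial r$, $i = 1, 2, \ldots, N$. Combining this with Equation (27), we obtain the following recursive formula:
$$ c_{-1} = 0, \quad c_0 = b_0, \quad c_i = b_i - r c_{i-1} - q c_{i-2}, \quad i = 1, 2, \ldots, N, \tag{28} $$
and the values of $\Delta r$ and $\Delta q$:
$$ \Delta r = -\frac{b_N c_{N-3} - b_{N-1} c_{N-2}}{c_{N-2}^2 - c_{N-3} (c_{N-1} - b_{N-1})}, \quad \Delta q = -\frac{b_{N-1} (c_{N-1} - b_{N-1}) - b_N c_{N-2}}{c_{N-2}^2 - c_{N-3} (c_{N-1} - b_{N-1})}. \tag{29} $$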
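For readers who want to see where the correction terms come from, the following sketch writes out the $2 \times 2$ Newton system behind them. It assumes the quadratic divisor is written as $x^2 + r x + q$ and that $b_i$ and $c_i$ follow the recursions $b_i = \alpha_i - r b_{i-1} - q b_{i-2}$ and $c_i = b_i - r c_{i-1} - q c_{i-2}$, so that $\partial b_i / \partial r = -c_{i-1}$ and $\partial b_i / \partial q = -c_{i-2}$. It is a derivation under these stated assumptions, which can be compared with Equation (29).

```latex
\begin{aligned}
c_{N-2}\,\Delta r + c_{N-3}\,\Delta q &= b_{N-1}, \\
\bigl(c_{N-1} - b_{N-1}\bigr)\,\Delta r + c_{N-2}\,\Delta q &= b_N, \\[4pt]
D &= c_{N-2}^2 - c_{N-3}\bigl(c_{N-1} - b_{N-1}\bigr), \\
\Delta r &= \frac{b_{N-1}\,c_{N-2} - b_N\,c_{N-3}}{D}, \qquad
\Delta q = \frac{b_N\,c_{N-2} - b_{N-1}\bigl(c_{N-1} - b_{N-1}\bigr)}{D}.
\end{aligned}
```

As a one-step check, take $f(x) = x^2 + 3x + 2$ with $r = q = 0$: then $b_0 = 1$, $b_1 = 3$, $b_2 = 2$, $c_{-1} = 0$, $c_0 = 1$, $c_1 = 3$, so $D = 1$, $\Delta r = 3$ and $\Delta q = 2$, which recovers the exact factor in a single iteration.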
The iterative procedure is terminated when the convergence criterion
$$ |\Delta r| < \varepsilon \quad \text{and} \quad |\Delta q| < \varepsilon \tag{30} $$
is satisfied for a specified value $\varepsilon$. The predefined value $\varepsilon$ determines the accuracy of the roots. The roots of the quadratic factor, assumed to be two distinct roots $x_0$ and $x_1$, are then
$$ x_0 = \frac{-r + \sqrt{r^2 - 4q}}{2}, \quad x_1 = \frac{-r - \sqrt{r^2 - 4q}}{2}. \tag{31} $$
To obtain the next pair of roots, we further divide $f_1(x)$ by a quadratic $(x^2 + r_1 x + q_1)$, which gives the deflated polynomial:
$$ f_1(x) = f_2(x) (x^2 + r_1 x + q_1) + A_1 x + B_1, \tag{32} $$
where the degree of $f_2(x)$ is $N - 4$. To find all the roots, we divide $f_i(x)$ by $(x^2 + r_i x + q_i)$, where $i = 1, 2, \ldots, N/2$, until the deflation results in a quadratic polynomial.
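The iteration and the deflation described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' C implementation. It assumes the divisor is written as $x^2 + r x + q$ (so the roots are $(-r \pm \sqrt{r^2 - 4q})/2$), that all roots are real as is the case for the general-form LSP polynomials, and that the caller supplies one initial pair $(r, q)$ per deflation stage. The function names and the `max_iter` guard are hypothetical additions.

```python
import math


def bairstow_factor(alpha, r, q, eps=1e-10, max_iter=100):
    """Find one factor x^2 + r*x + q of alpha[0]*x^N + ... + alpha[N].

    Returns (r, q, quotient_coefficients, iterations_used).
    """
    n = len(alpha) - 1
    if n < 2:
        raise ValueError("degree must be at least 2")
    for iteration in range(1, max_iter + 1):
        # Two leading zeros so that b[i + 2] stores b_i (b_-1 = b_-2 = 0).
        b = [0.0, 0.0]
        c = [0.0, 0.0]
        for a_i in alpha:
            b.append(a_i - r * b[-1] - q * b[-2])
            c.append(b[-1] - r * c[-1] - q * c[-2])
        b_n, b_n1 = b[n + 2], b[n + 1]
        c_n1, c_n2, c_n3 = c[n + 1], c[n], c[n - 1]
        c_bar = c_n1 - b_n1
        det = c_n2 * c_n2 - c_n3 * c_bar
        if det == 0.0:
            raise ZeroDivisionError("singular Newton system")
        dr = (b_n1 * c_n2 - b_n * c_n3) / det
        dq = (b_n * c_n2 - b_n1 * c_bar) / det
        r += dr
        q += dq
        if abs(dr) < eps and abs(dq) < eps:
            break
    else:
        raise RuntimeError("Bairstow iteration did not converge")
    # Recompute the quotient b_0..b_{N-2} at the converged (r, q).
    quotient = [0.0, 0.0]
    for a_i in alpha[:n - 1]:
        quotient.append(a_i - r * quotient[-1] - q * quotient[-2])
    return r, q, quotient[2:], iteration


def quadratic_roots(r, q):
    """Real roots of x^2 + r*x + q; tiny negative discriminants are clamped."""
    s = math.sqrt(max(r * r - 4.0 * q, 0.0))
    return [(-r + s) / 2.0, (-r - s) / 2.0]


def real_roots(alpha, guesses):
    """Extract all roots by repeated deflation, one guess per stage."""
    roots, poly = [], list(alpha)
    stage_guesses = iter(guesses)
    while len(poly) - 1 > 2:
        r0, q0 = next(stage_guesses)
        r, q, poly, _ = bairstow_factor(poly, r0, q0)
        roots.extend(quadratic_roots(r, q))
    if len(poly) == 3:
        roots.extend(quadratic_roots(poly[1] / poly[0], poly[2] / poly[0]))
    elif len(poly) == 2:
        roots.append(-poly[1] / poly[0])
    return roots
```

For a fifth-order polynomial this runs two Bairstow stages (on the quintic and on the deflated cubic) and reads the last root from the remaining linear factor, which mirrors the two-stage process described for the 10-order case in Section 4.3.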
Table 1 shows the pseudocode of the proposed method according to Equations (26)–(31), where $\hat{r}$ and $\hat{q}$ are the initial values of $r$ and $q$, respectively. According to Reference [18], this algorithm is sensitive to the initial values of $r$ and $q$. It converges rapidly if the initial values of $r$ and $q$ are sufficiently close to their true values. In Section 4, we discuss in detail how to choose the initial values of $r$ and $q$.
After the roots are calculated with the proposed method, they are sorted in descending order with the insertion sort algorithm [22], so that the corresponding frequencies follow the ordering in Equation (10) or (11).
Denoting the roots of the general-form polynomials by $x_i$, the LSPs frequencies are given by $w_i = \arccos(x_i)$, $i = 1, 2, \ldots, P$. The values of $w_i$ are ranked as follows:
$$ 0 < w_1 < w_2 < \cdots < w_P, \quad P \text{ is even}. \tag{33} $$
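The final conversion from roots to frequencies can be sketched as follows. This is an illustrative Python version that sorts the roots in descending order with insertion sort and then applies $\arccos$, so the resulting frequencies are ascending. The clamp to $[-1, 1]$ is an added safeguard against rounding error and is an assumption of this sketch, not part of the described method.

```python
import math


def roots_to_lsf(roots):
    """Sort roots x_i in descending order, then map to w_i = arccos(x_i)."""
    xs = list(roots)
    # Insertion sort, descending in x (so that w = arccos(x) is ascending).
    for i in range(1, len(xs)):
        key = xs[i]
        j = i - 1
        while j >= 0 and xs[j] < key:
            xs[j + 1] = xs[j]
            j -= 1
        xs[j + 1] = key
    # Clamp guards against values marginally outside [-1, 1].
    return [math.acos(min(1.0, max(-1.0, x))) for x in xs]
```

Insertion sort suits this step because only ten values are sorted per frame and the input is often nearly ordered already.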

4. Experiment and Results

4.1. Experimental Environment

As mentioned earlier in Section 2, the LSPs are the representative parameters of the LPC filter. To calculate the LSPs, we use the Aurora data [23] as the benchmark. The Aurora database is a professional speech corpus created by the Aurora project (www.eld.org). This paper uses 1001 clean and 1001 noisy speech files from the Test-A set, spoken by 50 male and 50 female speakers.
The test speech signals are sampled at 8 kHz with a resolution of 16 bits per sample. According to Reference [1], the speech signal has a short-term stationary property, so we take every 80 samples (10 ms) as a frame. To evaluate the performance of the proposed method in terms of computing time, convergence speed, and impact on speech quality, we choose ITU-T G.729, a data compression algorithm using conjugate-structure algebraic-code-excited linear prediction, as the verification platform. The proposed method is implemented in C and inserted into the speech encoding module of the ITU-T G.729 platform. Finally, 1,754,330 sets of LSPs of 10-order LPC are evaluated on a personal computer (Intel i5-9300H).

4.2. Selection and Update of Initial Values

As introduced in Section 3, the proposed method is sensitive to the initial values of $r$ and $q$, which greatly influence the convergence speed of the root finding. The following experiment shows how to select appropriate initial values.
Three strategies are considered for selecting the initial values of $r$ and $q$. The first, called the fixed initial method, has been employed in previous work [15] and provides fixed initial values for the root finding. The second, called the statistic initial method, calculates the statistical mean values of $r$ and $q$ on an analytical set, then uses these mean values as the initial values for the rest of the evaluation. The third, called the adaptive initial method, exploits the short-term stationary property of the speech signal and uses the final result of the current frame as the initial values of the next frame.
The Aurora corpus is used as the benchmark set. For the fixed initial method, 1001 speech files from testa_ns_snr15 are evaluated with the initial values being 0, 4 or 8. For the statistic initial method, in order to guarantee robustness, we use 200 speech files from testa_clean1 as the analytical set and 1001 speech files from testa_n1_snr15 as the evaluating set. After the analytical set is processed to satisfy the convergence criterion, the statistical mean values $r_{ave}$ and $q_{ave}$ are calculated. These values are then used to initialize each frame for all speech samples from the evaluating set. For the adaptive initial method, which requires no precalculated initialization, all 1001 speech files from testa_n1_snr15 are used as the evaluating set. The initial values of the first frame are randomly generated, and the initial values of each subsequent frame are taken directly from the previous frame.
Table 2 shows part of the experimental results. $r$ and $q$ are the final converged values of each frame. $\Delta r$ and $\Delta q$ represent the differences between the initial values and the final values of $r$ and $q$ for each frame, respectively. Table 2 shows that $|\Delta r|$ and $|\Delta q|$ are mostly below 0.03, and Table 3 shows the statistical mean values of $|\Delta r|$ and $|\Delta q|$ over the 1001 test files. From Table 3, the average values of $|\Delta r|$ and $|\Delta q|$ are 0.0419 and 0.0305, respectively, and the standard deviations are small. This statistical analysis shows that the values of the parameters fluctuate only slightly between adjacent frames.
$$ \begin{cases} |\Delta r| = |r_{initial} - r_{true}| = |r_{final} - r_{initial}| \\ |\Delta q| = |q_{initial} - q_{true}| = |q_{final} - q_{initial}|. \end{cases} \tag{34} $$
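The adaptive initial method amounts to carrying the converged $(r, q)$ pair from one frame to the next. The sketch below shows only that bookkeeping in Python. The `solve` argument is a hypothetical callback standing in for the root finder, and `run_adaptive` is an illustrative name, not a function from the authors' code.

```python
def run_adaptive(frames, solve, first_guess):
    """Solve frames in order, seeding each with the previous result.

    `solve(frame, guess)` is a hypothetical callback that returns the
    converged (r, q) for `frame` when started from `guess`.
    Returns a list of (initial, final) pairs, one per frame.
    """
    guess = first_guess
    log = []
    for frame in frames:
        final = solve(frame, guess)
        log.append((guess, final))
        guess = final  # the next frame starts from this frame's result
    return log


def abs_updates(log):
    """|delta r| and |delta q| per frame, in the sense of Equation (34)."""
    return [
        (abs(final[0] - initial[0]), abs(final[1] - initial[1]))
        for initial, final in log
    ]
```

Feeding the converged values of the first frames of Table 2 through this loop reproduces the table's structure: each frame's initial values equal the previous frame's final values, and the per-frame updates match the listed $|\Delta r|$ and $|\Delta q|$ up to rounding.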
To investigate the convergence speed of the three initial selection strategies, we perform a statistical analysis of the experimental data. Table 4 and Figure 2 illustrate the average iteration numbers of the fixed, statistic and adaptive initial methods. Here, n1 and n2 are the iteration numbers of the iterative loops for the first fifth-order polynomial and the first third-order polynomial, respectively; n3 and n4 are the iteration numbers of the iterative loops for the second fifth-order polynomial and the second third-order polynomial, respectively; Total is the iteration number for calculating all 10 roots of the two LPC polynomials. The second column of Table 4 gives the initial values of $r$ and $q$: the first row applies $r_{initial} = 0$ and $q_{initial} = 0$, the second row applies $r_{initial} = 4$ and $q_{initial} = 4$, and the third row applies $r_{initial} = 8$ and $q_{initial} = 8$ for all test speech frames. In Figure 2, the two vertical lines in each subplot mark the average convergence iterations of the adaptive initial method (dashed line) and the statistic initial method (dotted line).
As Table 4 and Figure 2 show, the setting of the initial values directly affects the number of iterations needed for convergence. The fixed initial method performs worse than the statistic initial method, and the adaptive initial method converges fastest, with the average iterations of n1, n2 and n4 at about 3.7 and the average iterations of n3 at about 5.2. Further, compared with the fixed initial method ($r_{initial} = 8$ and $q_{initial} = 8$) and the statistic initial method, the total average iteration numbers are reduced by 68% and 26%, respectively, with the same computation accuracy. This also agrees with the observation from Table 2: the values of $r$ and $q$ fluctuate only slightly between adjacent frames due to the short-term stationary property of the speech signal, so using the final $r$ and $q$ of the current frame as the initial values for the next frame clearly reduces the average number of iterations. Therefore, the adaptive initial method is effective, and we adopt it as the initialization strategy.

4.3. Solve the 10-Order LSP Using the Proposed Method

We randomly select a speech frame to evaluate the computation accuracy of the proposed method. The initial values are selected according to the strategy described above. We solve the following F(x) polynomial and present the experimental results in Table 5, Table 6 and Table 7, where $x_j$ denotes the jth root of the polynomial and $|\Delta|$ denotes the threshold value for terminating the iteration.
Table 5 shows the detailed steps for solving the F(x) polynomial using the proposed method, where k denotes the kth iterative loop. The process includes two stages, in which a fifth-order polynomial and a deflated third-order polynomial are solved sequentially, each in four iterations. In the first stage, the fifth-order polynomial is deflated, where $r_i$ and $q_i$ are the values of $r$ and $q$ at the start of each iteration, and $r_{i+1}$ and $q_{i+1}$ are the updated values calculated by the proposed method. In the second stage, the deflated third-order polynomial is solved, where $r_{2i}$ and $q_{2i}$ are the values at the start of each iteration for the deflated polynomial, and $r_{2i+1}$ and $q_{2i+1}$ are the results calculated by the proposed method from $r_{2i}$ and $q_{2i}$.
Table 6 presents the difference between the approximate and accurate values. The third column is the absolute difference between the accurate roots and the approximate roots, where $|\Delta| = |accurate - approximate|$. The experimental results show that the computation accuracy is 0.00001, which means that the approximate values are very close to the accurate ones.
After the first and second stages, we obtain the values of $x_j$, then transform each $x_j$ into the LSF $w_j$ using the formula $w_j = \arccos(x_j)$. Taken together, Table 5, Table 6 and Table 7 show that the iteration converges extremely fast and that the LSF values are accurate.

4.4. Performance of the Proposed Method

This section evaluates the performance of the proposed method, including the computation time and the speech quality, by comparing it with other methods, including the Birge-Vieta method [14], the Tschirnhaus Transform [13], the original/Modified Ferrari's method [12], the Chen and Wu method [11], Soong and Juang [8], and the Full Search.
Because Reference [14] does not report the algorithm's performance parameters, we built a model of the Birge-Vieta method strictly according to the reference paper. Table 8 shows the Perceptual Evaluation of Speech Quality (PESQ) [24] comparison between the proposed method and the Birge-Vieta method. Table 9 shows the comparison of the average computation time between the proposed method and the previous works, under the same computation accuracy ($\alpha = |x_{i+1} - x_i| = 0.00001$).
The second column of Table 9 is the average computation time of converting 1,754,330 sets of LPC coefficients to LSF frequencies. Taken together, Table 8 and Table 9 show that the PESQ of the proposed method is 22% higher than that of the Birge-Vieta method on testa_clean1, while its computation time is comparable. After normalizing the CPU frequency of the system environment to 3.7 GHz and with the same computation accuracy, the proposed method is 17 times faster than the Tschirnhaus transform method, 23 times faster than the Modified Ferrari's method, 47 times faster than the Original Ferrari's method, 47.4 times faster than the Chen and Wu method, 152 times faster than the Soong and Juang method, and 201 times faster than the Full Search. The proposed method is therefore the fastest of the compared methods at the same computation accuracy.

5. Conclusions

This paper proposed an efficient method to calculate the roots of the LPC polynomials, which can transform LPC coefficients to LSPs accurately and rapidly. Exploiting the short-term stationary property of the speech signal, an adaptive initial method is applied, which reduces the average iteration numbers by 26% compared with the statistic initial method, with the PESQ score reaching 3.35. Experimental results show that the proposed method is 17 times faster than the Tschirnhaus Transform, and has a 22% PESQ improvement in comparison to the Birge-Vieta method, with almost the same computation time. Compared to other root-finding methods, the proposed method is the fastest one with comparable speech quality.

Author Contributions

Theory and conceptualization, Y.X.; methodology, Y.X.; software design and development, Y.X.; validation, Y.X.; formal analysis, Y.X. and Z.Z.; investigation, Y.X.; writing–original draft preparation, Y.X.; writing–review and editing, X.F., Z.Z., J.J., Y.Z., Z.Y. and S.Q. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

We would like to thank the Smart Sensing R&D Center, Institute of Microelectronics of Chinese Academy of Sciences, for supporting this work, and Zhejiang University for assistance with the Aurora corpus evaluation.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rabiner, L.R.; Schafer, R.W. Theory and Applications of Digital Speech Processing; Pearson Education: London, UK, 2011; p. 473.
  2. Makhoul, J. Linear Prediction: A Tutorial Review. Proc. IEEE 1975, 63, 561–580.
  3. Chowdhury, A.; Ross, A. Fusing MFCC and LPC Features Using 1D Triplet CNN for Speaker Recognition in Severely Degraded Audio Signals. IEEE Trans. Inf. Forensics Secur. 2019, 15, 1616–1629.
  4. Alku, P.; Saeidi, R. The Linear Predictive Modeling of Speech From Higher-Lag Autocorrelation Coefficients Applied to Noise-Robust Speaker Recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2017, 25, 1606–1617.
  5. Ramalho, L.; Fonseca, M.N.; Klautau, A.; Lu, C.; Berg, M.; Trojer, E.; Höst, S. An LPC-Based Fronthaul Compression Scheme. IEEE Commun. Lett. 2017, 21, 318–321.
  6. Itakura, F. Line Spectrum Representation of Linear Predictive Coefficients of Speech Signals. J. Acoust. Soc. Am. 1975, 57, 535.
  7. Soong, F.K.; Juang, B.H. Optimal quantization of LSP parameters. IEEE Trans. Speech Audio Process. 1993, 1, 15–24.
  8. Soong, F.K.; Juang, B.H. Line Spectrum Pair (LSP) and Speech Data Compression. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Diego, CA, USA, 19–21 March 1984.
  9. Kang, G.; Fransen, L. Application of line-spectrum pairs to low-bit-rate speech encoders. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Tampa, FL, USA, 26–29 April 1985; p. 8857.
  10. Kabal, P.; Ramachandran, R.P. The Computation of Line Spectral Frequencies Using Chebyshev Polynomials. IEEE Trans. Acoust. Speech Signal Process. 1986, 34, 1419–1426.
  11. Wu, C.H.; Chen, J.H. A Novel Two-Level Method for the Computation of the LSP Frequencies Using a Decimation-in-Degree Algorithm. IEEE Trans. Speech Audio Process. 1997, 5, 106–115.
  12. Chen, S.H.; Chang, Y.; Ruan, J.C. An Efficient Computation of LSP Frequencies Using Modified Complex-Free Ferrari Formula. Signal Process. Syst. 2008, 52, 153–163.
  13. Chen, S.H.; Chang, Y.; Syuan, C.J.Y. The Computation of Line Spectrum Pair Frequencies Using Tschirnhaus Transform. In Proceedings of the International Symposium on Circuits and Systems, Taipei, Taiwan, 24–27 May 2009; pp. 288–291.
  14. Chang, C.H.; Chen, B.W.; Chen, S.H.; Wang, J.F.; Chiu, Y.H. Low-Complexity Hardware Design for Fast Solving LSPs With Coordinated Polynomial Solution. IEEE Trans. Large Scale Integr. Syst. 2015, 23, 230–243.
  15. Luk, W.S. Finding roots of real polynomial simultaneously by means of Bairstow’s method. Bit Numer. Math. 1996, 36, 302–308.
  16. Hsiao, C.C.; Brodersen, R. A multi-rate root LPC speech synthesizer. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), San Diego, CA, USA, 19–21 March 1984; pp. 41–44.
  17. Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP); International Telecommunication Union: Geneva, Switzerland, 1996.
  18. Stoer, J.; Bulirsch, R. Introduction to Numerical Analysis, 2nd ed.; Springer Science & Business Media: Berlin, Germany, 2013; pp. 333–335.
  19. O’Donnell, J. A System for very low data rate speech communication. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Atlanta, GA, USA, 30 March–1 April 1981; pp. 8–11.
  20. Rothweiler, J. On polynomial reduction in the computation of LSP frequencies. IEEE Trans. Speech Audio Process. 1999, 7, 592–594.
  21. Hildebrand, F.B. Introduction to Numerical Analysis, 2nd ed.; McGraw-Hill: New York, NY, USA, 1974; pp. 613–618.
  22. Wikipedia. Available online: https://en.wikipedia.org/wiki/Insertion_sort (accessed on 15 January 2020).
  23. Aurora Corpus. Available online: http://portal.elda.org/en/catalogues/free-resources/free-lrs-set-1/ (accessed on 15 January 2020).
  24. International Telecommunication Union. Perceptual Evaluation of Speech Quality (PESQ): An Objective Method for End-to-End Speech Quality Assessment of Narrow-Band Telephone Networks and Speech Codecs; International Telecommunication Union: Geneva, Switzerland, 2001.
Figure 1. Illustration of the root location P = 10 (even).
Figure 2. Histograms of the iteration numbers required to converge, with the adaptive initial method and the static initial method, respectively: (a) n1, (b) n2, (c) n3, and (d) n4.
Table 1. Pseudocode of the proposed method.
set $\hat{r}$ and $\hat{q}$ as the initial values of $r$ and $q$
initialize $r = \hat{r}$, $q = \hat{q}$, $\Delta r = \infty$, $\Delta q = \infty$
while $(|\Delta r| > \varepsilon) \,||\, (|\Delta q| > \varepsilon)$ do
  $b_0 = a_0$
  $b_1 = a_1 - r b_0$
  $c_0 = b_0$
  $c_1 = b_1 - r c_0$
  for $(i = 2;\ i \le N;\ i{+}{+})$ do
    $b_i = a_i - r b_{i-1} - q b_{i-2}$
    $c_i = b_i - r c_{i-1} - q c_{i-2}$
  end
  $\Delta r = (b_{N-1} c_{N-2} - b_N c_{N-3}) / (c_{N-2}^2 - c_{N-3}(c_{N-1} - b_{N-1}))$
  $\Delta q = (b_N c_{N-2} - b_{N-1}(c_{N-1} - b_{N-1})) / (c_{N-2}^2 - c_{N-3}(c_{N-1} - b_{N-1}))$
  $r = r + \Delta r$
  $q = q + \Delta q$
end
$x_1 = (-r + \sqrt{r^2 - 4q}) / 2$
$x_2 = (-r - \sqrt{r^2 - 4q}) / 2$
return $x_1$, $x_2$
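The iteration in Table 1 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the names `bairstow_quadratic` and `quadratic_roots`, the `max_iter` safeguard, and the zero-determinant check are assumptions added for illustration. The coefficients of F(x) and the starting values of r and q are those listed for the first stage in Table 5.

```python
import math


def bairstow_quadratic(a, r, q, eps=1e-5, max_iter=100):
    """Refine (r, q) so that x^2 + r*x + q divides the polynomial
    a[0]*x^N + a[1]*x^(N-1) + ... + a[N] (Newton iteration of Table 1)."""
    n = len(a) - 1  # polynomial degree N
    if n < 3:
        raise ValueError("this sketch assumes degree N >= 3")
    for _ in range(max_iter):  # max_iter is an added safeguard (assumption)
        b = [0.0] * (n + 1)
        c = [0.0] * (n + 1)
        b[0] = a[0]
        b[1] = a[1] - r * b[0]
        c[0] = b[0]
        c[1] = b[1] - r * c[0]
        for i in range(2, n + 1):
            b[i] = a[i] - r * b[i - 1] - q * b[i - 2]
            c[i] = b[i] - r * c[i - 1] - q * c[i - 2]
        c_bar = c[n - 1] - b[n - 1]
        det = c[n - 2] ** 2 - c[n - 3] * c_bar
        if det == 0.0:
            raise ZeroDivisionError("singular Newton system")
        dr = (b[n - 1] * c[n - 2] - b[n] * c[n - 3]) / det
        dq = (b[n] * c[n - 2] - b[n - 1] * c_bar) / det
        r += dr
        q += dq
        if abs(dr) <= eps and abs(dq) <= eps:
            return r, q
    raise RuntimeError("Bairstow iteration did not converge")


def quadratic_roots(r, q):
    """Real roots of x^2 + r*x + q (raises if the roots are complex)."""
    disc = r * r - 4.0 * q
    if disc < 0.0:
        raise ValueError("complex roots")
    s = math.sqrt(disc)
    return (-r + s) / 2.0, (-r - s) / 2.0


# F(x) and the first-stage starting point, as read from Table 5.
F = [1.0, -0.539786, -0.760529, 0.293614, 0.088310, -0.014837]
r_final, q_final = bairstow_quadratic(F, r=-0.702538, q=0.088680)
```

With these inputs the first update moves r by about 0.0362 and q by about −0.0175, in line with the first row of Table 5.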
Table 2. The adaptive initial method’s updating principle of r and q.
Frame   r_initial    r_final      |Δr|       q_initial    q_final      |Δq|
1       −0.244103    −0.255179    0.011077   −0.311074    −0.301109    0.009965
2       −0.255179    −0.241979    0.013201   −0.301109    −0.279433    0.021676
3       −0.241979    −0.238239    0.003739   −0.279433    −0.276074    0.003359
4       −0.238239    −0.258420    0.020180   −0.276074    −0.299796    0.023722
5       −0.258420    −0.320360    0.061940   −0.299796    −0.267679    0.032117
6       −0.320360    −0.350183    0.029824   −0.267679    −0.246816    0.020863
Table 3. The mean and standard deviation of |Δr| and |Δq|.
Statistic            |Δr|      |Δq|
Mean                 0.0419    0.0305
Standard deviation   0.0078    0.0038
Table 4. The average iteration numbers of the fixed, statistic, and adaptive methods for defining the initial values, at the same computation accuracy.
                 Average Iteration Numbers
Initial Method   n1      n2      n3      n4      Total
Fixed (0)        4.9     5.6     10.2    6.9     27.6
Fixed (4)        11.5    10.6    11.3    9.3     42.7
Fixed (8)        14.1    11.5    14.0    10.3    49.9
Statistic        4.7     5.4     5.2     6.4     21.7
Adaptive         3.7     3.4     5.2     3.7     16
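As a quick check on the totals in Table 4, the relative saving of the adaptive initial method over the statistic initial method follows directly from the two "Total" entries. The symbols $N_{\text{stat}}$ and $N_{\text{adapt}}$ are introduced here only for illustration.

```latex
\[
\frac{N_{\text{stat}} - N_{\text{adapt}}}{N_{\text{stat}}}
  = \frac{21.7 - 16}{21.7} \approx 0.263
\]
```

This corresponds to the reduction of roughly 26% in average iteration numbers reported in the abstract.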
Table 5. The detailed steps to extract the roots of F(x) using the proposed method.
$F(x) = x^5 - 0.539786x^4 - 0.760529x^3 + 0.293614x^2 + 0.088310x - 0.014837$
First stage
k   r_i         r_{i+1}     q_i         q_{i+1}     |Δr|       |Δq|
1   −0.702538   −0.666299   0.088680    0.071163    0.036240   0.017517
2   −0.666299   −0.664132   0.071163    0.070034    0.002167   0.001129
3   −0.664132   −0.664117   0.070034    0.070034    0.000015   0.000000
4   −0.664117   −0.664117   0.070034    0.070034    0.000000   0.000000
Second stage
k   r2_i        r2_{i+1}    q2_i        q2_{i+1}    |Δr1|      |Δq1|
1   −0.652314   −0.623734   −0.256786   −0.282222   0.028580   0.025436
2   −0.623734   −0.621934   −0.282222   −0.283870   0.001801   0.001648
3   −0.621934   −0.621919   −0.283870   −0.283885   0.000015   0.000015
4   −0.621919   −0.621919   −0.283885   −0.283885   0.000000   0.000000
Table 6. Accurate and approximate values determined by the proposed method (α = 0.00001).
$F(x) = x^5 - 0.539786x^4 - 0.760529x^3 + 0.293614x^2 + 0.088310x - 0.014837$
j   Accurate    Approximate   |Δ|
1   −0.792042   −0.792041     0.000001
2   0.928270    0.928265      0.000005
3   0.641682    0.641686      0.000004
4   −0.395398   −0.395388     0.000001
5   0.175723    0.175723      0.000000
Table 7. The results of LSP frequencies of F(x) = 0.
$F(x) = x^5 - 0.539786x^4 - 0.760529x^3 + 0.293614x^2 + 0.088310x - 0.014837$
r_i     −0.702538   N/A
q_i     0.088680    N/A
r2_i    N/A         −0.652314
q2_i    N/A         −0.256786

i       1          2          3          4           5
x_i     0.928265   0.641686   0.175723   −0.395388   −0.792041
w_i     0.381076   0.874102   1.394156   1.977287    2.484941
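The last two rows of Table 7 are consistent with the mapping w_i = arccos(x_i), with the frequencies listed in ascending order. A minimal sketch of that final step, assuming this mapping and using the hypothetical helper name `lsp_frequencies`:

```python
import math


def lsp_frequencies(roots):
    """Map polynomial roots x = cos(w) to LSP frequencies w in ascending order."""
    return sorted(math.acos(x) for x in roots)


# Approximate roots x_i as listed in Table 7.
x = [0.928265, 0.641686, 0.175723, -0.395388, -0.792041]
w = lsp_frequencies(x)
```

Because arccos is decreasing on [−1, 1], sorting the frequencies in ascending order is equivalent to ordering the roots from largest to smallest.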
Table 8. PESQ comparison of the proposed method and the previous methods.
Method                Test Speech      PESQ
The proposed method   testa_clean1     3.35
                      testa_n1_snr15   3.46
Birge-Vieta method    testa_clean1     2.74
                      testa_n1_snr15   3.36
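The relative PESQ gains implied by Table 8 can be worked out per test utterance. The subscripts "prop" and "BV" (proposed method and Birge-Vieta method) are labels introduced here for illustration.

```latex
\[
\text{testa\_clean1:}\quad
\frac{\mathrm{PESQ}_{\text{prop}} - \mathrm{PESQ}_{\text{BV}}}{\mathrm{PESQ}_{\text{BV}}}
  = \frac{3.35 - 2.74}{2.74} \approx 0.223
\]
\[
\text{testa\_n1\_snr15:}\quad
\frac{3.46 - 3.36}{3.36} \approx 0.030
\]
```

The clean-speech case gives the improvement of about 22% quoted in the abstract; on the noisy utterance the gain is about 3%.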
Table 9. Comparison of the average computation time with the previous works (α = 0.00001).
Method                 Time (ms)   Normalized Clock Numbers   Environment
The proposed method    0.0016      5920                       3.7 GHz
Birge-Vieta method     0.0018      6660                       3.7 GHz
Tschirnhus Transform   0.0614      104380                     1.7 GHz
Modified Ferrari’s     0.0802      136340                     1.7 GHz
Original Ferrari’s     0.1640      278800                     1.7 GHz
Chen and Wu            0.1651      280670                     1.7 GHz
Soong and Juang        0.5316      903720                     1.7 GHz
Full Search            700.32      1190540                    1.7 GHz
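The normalized clock numbers in Table 9 appear to be the measured time multiplied by the clock frequency of the test environment, which makes results from the 3.7 GHz and 1.7 GHz machines comparable. A small sketch under that assumption, shown for a subset of rows, with the hypothetical helper name `normalized_clocks`:

```python
def normalized_clocks(time_ms, clock_ghz):
    """Clock cycles = time in seconds * clock frequency in Hz."""
    return time_ms * 1e-3 * clock_ghz * 1e9


# (time in ms, clock in GHz) for a subset of rows of Table 9.
rows = {
    "The proposed method": (0.0016, 3.7),
    "Birge-Vieta method": (0.0018, 3.7),
    "Tschirnhus Transform": (0.0614, 1.7),
    "Soong and Juang": (0.5316, 1.7),
}
clocks = {name: normalized_clocks(t, f) for name, (t, f) in rows.items()}

# Ratio behind the "17 times faster" comparison in the abstract.
speedup = clocks["Tschirnhus Transform"] / clocks["The proposed method"]
```

Comparing clock counts rather than raw milliseconds avoids crediting the proposed method for the faster processor it was measured on.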

Share and Cite

MDPI and ACS Style

Xue, Y.; Zhu, Z.; Jiang, J.; Zhan, Y.; Yu, Z.; Fan, X.; Qiao, S. Fast Computation of LSP Frequencies Using the Bairstow Method. Electronics 2020, 9, 387. https://doi.org/10.3390/electronics9030387
