An Efficient VQ Codebook Search Algorithm Applied to AMR-WB Speech Coding

Symmetry 2017, 9, 54

The adaptive multi-rate wideband (AMR-WB) speech codec is widely used in modern mobile communication systems for high speech quality in handheld devices. Nonetheless, a major disadvantage is that vector quantization (VQ) of immittance spectral frequency (ISF) coefficients takes a considerable computational load in the AMR-WB coding. Accordingly, a binary search space-structured VQ (BSS-VQ) algorithm is adopted to efficiently reduce the complexity of ISF quantization in AMR-WB. This search algorithm is done through a fast locating technique combined with lookup tables, such that an input vector is efficiently assigned to a subspace where relatively few codeword searches are required to be executed. In terms of overall search performance, this work is experimentally validated as a superior search algorithm relative to a multiple triangular inequality elimination (MTIE), a TIE with dynamic and intersection mechanisms (DI-TIE), and an equal-average equal-variance equal-norm nearest neighbor search (EEENNS) approach. With a full search algorithm as a benchmark for overall search load comparison, this work provides an 87% search load reduction at a threshold of quantization accuracy of 0.96, a figure far beyond 55% in the MTIE, 76% in the EEENNS approach, and 83% in the DI-TIE approach.


Introduction
With a 16 kHz sampling rate, the adaptive multi-rate wideband (AMR-WB) speech codec [1][2][3][4] is one of the speech codecs applied to modern mobile communication systems as a way to remarkably improve the speech quality on handheld devices. AMR-WB is a speech codec developed on the basis of an algebraic code-excited linear-prediction (ACELP) coding technique [4,5], and provides nine coding modes with bitrates of 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, and 6.6 kbps. The ACELP-based technique is an excellent speech coding technique, having the double advantage of low bit rates and high speech quality, but the price paid is the high computational complexity required in an AMR-WB codec. Using an AMR-WB speech codec, the speech quality of a smartphone can be improved, but at the cost of high battery power consumption.
As in [11], a TIE algorithm is proposed to address the computational load issue in VQ-based image coding. Improved versions of the TIE approach are presented in [12,13] to reduce the scope of the search space, giving rise to a further reduction in the computational load. TIE-based VQ encoding benefits from a high correlation between the coefficients of neighboring frames, that is, from ISF coefficients that evolve smoothly over successive frames, and a considerable computational load reduction is demonstrated under this condition. In AMR-WB, however, a moving average (MA) filter is applied to the ISF coefficients in advance of VQ encoding. This means that the high correlation feature is gone, resulting in poor performance in computational load reduction. Recently, a TIE algorithm equipped with a dynamic and an intersection mechanism, named DI-TIE, was proposed to effectively reduce the search load, and this algorithm is validated as the best candidate among the TIE-based approaches so far. On the other hand, an EEENNS algorithm was derived from the equal-average nearest neighbor search (ENNS) and equal-average equal-variance nearest neighbor search (EENNS) approaches [15][16][17][18][19]. In contrast to TIE-based approaches, the EEENNS algorithm uses three significant features of a vector, i.e., mean, variance, and norm, as a three-level elimination criterion to reject impossible codewords.
Furthermore, a binary search space-structured VQ (BSS-VQ) is presented in [20] as a simple and efficient way to quantize line spectral frequency (LSF) coefficients in the ITU-T G.729 speech codec [5]. That algorithm demonstrated that a significant computational load reduction is achieved with well-maintained speech quality. In view of this, this paper applies the BSS-VQ search algorithm to the quantization of ISF coefficients in AMR-WB. This work aims to verify whether the performance superiority of the BSS-VQ algorithm remains, given that the VQ structure in AMR-WB is different from that in G.729. Another major motivation is to meet the energy saving requirement on handheld devices, e.g., smartphones, for an extended operation time period.
The rest of this paper is outlined as follows. Section 2 describes the quantization of ISF coefficients in AMR-WB. The BSS-VQ algorithm for ISF quantization is presented in Section 3. Section 4 presents experimental results and discussions. This work is summarized at the end of this paper.

ISF Coefficients Quantization in AMR-WB
In the quantization process of AMR-WB [1], a speech frame of 20 ms is first used to evaluate linear predictive coefficients (LPCs), which are then converted into ISF coefficients. Subsequently, quantized ISF coefficients are obtained following a VQ encoding process, which is detailed below.

Linear Prediction Analysis
In linear prediction analysis, the Levinson-Durbin algorithm is used to compute the 16th-order LPCs, $a_i$, of a linear prediction filter [1], defined as:

$$\frac{1}{A(z)} = \frac{1}{1 + \sum_{i=1}^{16} a_i z^{-i}} \tag{1}$$

Subsequently, the LPC parameters are converted into the immittance spectral pair (ISP) coefficients for the purposes of parametric quantization and interpolation. The ISP coefficients are defined as the roots of the following two polynomials:

$$F_1'(z) = A(z) + z^{-16} A(z^{-1}) \tag{2}$$

$$F_2'(z) = A(z) - z^{-16} A(z^{-1}) \tag{3}$$

$F_1'(z)$ and $F_2'(z)$ are symmetric and antisymmetric polynomials, respectively. It can be proven that all the roots of these two polynomials lie on the unit circle in the z-domain and alternate successively. Additionally, $F_2'(z)$ has two roots at z = 1 (ω = 0) and z = −1 (ω = π). These two roots are eliminated by introducing the following polynomials, with eight and seven conjugate roots respectively on the unit circle, expressed as:

$$F_1(z) = F_1'(z) = (1 + a[16]) \prod_{i=0,2,\ldots,14} \left(1 - 2 q_i z^{-1} + z^{-2}\right) \tag{4}$$

$$F_2(z) = \frac{F_2'(z)}{1 - z^{-2}} = (1 - a[16]) \prod_{i=1,3,\ldots,13} \left(1 - 2 q_i z^{-1} + z^{-2}\right) \tag{5}$$

where the coefficients $q_i$ are referred to as the ISPs in the cosine domain, and a[16] is the last predictor coefficient. Chebyshev polynomials are used to solve Equations (4) and (5). Finally, the 16th-order ISF coefficients $\omega_i$ are obtained through the transformation $\omega_i = \arccos(q_i)$.

Quantization of ISF Coefficients
Before quantization, mean removal and first-order MA prediction are applied to the ISF coefficients to obtain a residual ISF vector [1], that is:

$$r(n) = z(n) - p(n) \tag{6}$$

where z(n) and p(n) respectively denote the mean-removed ISF vector and the ISF vector predicted at frame n by a first-order MA prediction, defined as:

$$p(n) = \frac{1}{3}\,\hat{r}(n-1) \tag{7}$$

where $\hat{r}(n-1)$ is the quantized residual vector at the previous frame. Subsequently, split-multistage vector quantization (S-MSVQ) is performed on r(n). As presented in Tables 1 and 2, S-MSVQ is categorized into two types in terms of the bit rate of the coding modes. In Stage 1, r(n) is split into two subvectors, namely, a 9-dimensional subvector $r_1(n)$ associated with codebook CB1 and a 7-dimensional subvector $r_2(n)$ associated with codebook CB2, for VQ encoding. As a preliminary step of Stage 2, the quantization error vectors are split into three subvectors for the 6.60 kbps mode or five for the modes with bitrates between 8.85 and 23.85 kbps, each symbolized as $r^{(2)}$ with a subscript giving the source subvector and the component range. For instance, $r^{(2)}_{1,1\text{-}3}$ in Table 1 represents the subvector split from the 1st to the 3rd components of $r_1$, and VQ encoding is then performed on it over codebook CB11 in Stage 2. Likewise, $r^{(2)}_{2,4\text{-}7}$ stands for the subvector split from the 4th to the 7th components of $r_2$, after which VQ encoding is performed over codebook CB22 in Stage 2. Finally, a squared error ISF distortion measure, that is, Euclidean distance, is used in all quantization processes.
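As a rough illustration of the residual computation and the Stage 1 split described above, the following Python sketch may help. It assumes the first-order MA prediction factor of 1/3 specified for AMR-WB; the function names are hypothetical, and the mean vector must be supplied by the caller because the codec's actual mean table is not reproduced here.

```python
import numpy as np

MA_COEFF = 1.0 / 3.0  # assumed first-order MA prediction factor


def isf_residual(isf, isf_mean, prev_quantized_residual):
    """Return r(n) = z(n) - p(n) for one 16-dimensional ISF vector."""
    z = isf - isf_mean                      # mean-removed ISF vector z(n)
    p = MA_COEFF * prev_quantized_residual  # MA-predicted vector p(n)
    return z - p


def split_stage1(r):
    """Split r(n) into the 9-dim part (for CB1) and the 7-dim part (for CB2)."""
    return r[:9], r[9:]
```

A caller would keep the quantized residual of the previous frame and pass it in for the next frame, since the prediction depends only on that single past vector.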

BSS-VQ Algorithm for ISF Quantization
The basis of the BSS-VQ algorithm is that an input vector is efficiently assigned to a subspace where a small number of codeword searches is carried out using a combination of a fast locating technique and lookup tables, as a prerequisite of a VQ codebook search. In this manner, a significant computational load reduction can be achieved.
At the start of this algorithm, each dimension is dichotomized into two subspaces, and an input vector is then assigned to a corresponding subspace according to the entries of the input vector. This idea is illustrated in the following example. There are $2^9 = 512$ subspaces for a 9-dimensional subvector $r_1(n)$ associated with codebook CB1, and an input vector can then be assigned to one of the 512 subspaces by means of a dichotomy according to each entry of the input vector. Finally, VQ encoding is performed using a prebuilt lookup table containing the statistical information on sought codewords.
In this proposal, the lookup table in each subspace is prebuilt through a training process that requires a large amount of data. The training and encoding procedures of BSS-VQ are illustrated with the example of the 9-dimensional codebook CB1 with 256 entries in AMR-WB.

BSS Generation with Dichotomy Splitting
As a preliminary step of the training procedure, each dimension is dichotomized into two subspaces, and the dichotomy position is defined as the mean of all the codewords contained in a codebook, formulated as:

$$dp(j) = \frac{1}{CSize} \sum_{i=1}^{CSize} c_i(j), \quad 0 \le j < Dim \tag{8}$$

where $c_i(j)$ represents the jth component of the ith codeword $c_i$, and dp(j) the mean value of all the jth components. Taking the codebook CB1 as an instance, CSize = 256 and Dim = 9. As listed in Table 3, all the dp(j) values are saved in tabular form. Subsequently, for vector quantization of the nth input vector $x_n$, a quantity $\nu_n(j)$ is defined as:

$$\nu_n(j) = \begin{cases} 2^j, & x_n(j) \ge dp(j) \\ 0, & x_n(j) < dp(j) \end{cases}, \quad 0 \le j < Dim \tag{9}$$

where $x_n(j)$ denotes the jth component of $x_n$. Then $x_n$ is assigned to subspace k ($bss_k$), with k given as the sum of $\nu_n(j)$ over all dimensions, formulated as:

$$k = \sum_{j=0}^{Dim-1} \nu_n(j) \tag{10}$$

In this study, 0 ≤ k < BSize, and BSize = $2^9$ = 512 represents the total number of subspaces. Taking an input vector $x_n$ = {20.0, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6, 20.7, 20.8} as an instance, $\nu_n(j)$ = {$2^0$, $2^1$, $2^2$, 0, 0, 0, 0, 0, $2^8$} for 0 ≤ j ≤ 8, and k = 263 is obtained by Equations (9) and (10), respectively. Thus, the input vector $x_n$ is assigned to the subspace $bss_k$ with k = 263.
By means of Equations (9) and (10), it is noted that this algorithm requires only a small number of basic operations, i.e., comparison, shift, and addition, such that an input vector is assigned to a subspace in a highly efficient manner.
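The locating step can be sketched in a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, the convention that a component equal to its dichotomy position sets the bit is an assumption, and the thresholds used in the usage note are made-up values rather than the entries of Table 3.

```python
import numpy as np


def dichotomy_positions(codebook):
    """Mean of every codeword component: one dp(j) per dimension."""
    return np.asarray(codebook, dtype=float).mean(axis=0)


def subspace_index(x, dp):
    """Map an input vector to its subspace index k using compare, shift, add."""
    k = 0
    for j in range(len(dp)):
        if x[j] >= dp[j]:   # assumed tie convention: equality sets the bit
            k |= 1 << j     # contributes 2**j to k
    return k
```

With hypothetical thresholds `dp = [19, 19, 19, 25, 25, 25, 25, 25, 19]`, the input vector {20.0, 20.1, ..., 20.8} sets bits 0, 1, 2, and 8, so `subspace_index` returns 1 + 2 + 4 + 256 = 263, matching the index in the example above.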

Training Procedure of BSS-VQ
Following the determination of the dichotomy position for each dimension, a training procedure is performed to build a lookup table in each subspace. The lookup tables give the probability that each codeword serves as the best-matched codeword in each subspace, referred to as the hit probability of a codeword in a subspace for short.
The training procedure is stated below as Algorithm 1. A large speech database, covering a diversity of contents and multiple speakers, is employed as the training data. It takes up more than 1.56 GB of memory, has a duration longer than 876 min, and contains a total of 2,630,045 speech frames.

Algorithm 1: Training procedure of BSS-VQ
Step 1. Initial setting: assign each codeword to all the subspaces, and then set the probability that the codeword $c_i$ corresponds to the best-matched codeword in $bss_k$ to $P_{hit}(c_i|bss_k) = 0$.
Step 2. Referencing Table 3 and through Equations (9) and (10), an input vector can be efficiently assigned to $bss_k$.
Step 3. A full search is conducted on all the codewords according to the Euclidean distance, given as:

$$dist(x_n, c_i) = \sqrt{\sum_{j=0}^{Dim-1} \left(x_n(j) - c_i(j)\right)^2} \tag{11}$$

and an optimal codeword $c_{opt}$ satisfies:

$$c_{opt} = \arg\min_{c_i} \, dist(x_n, c_i) \tag{12}$$

Step 4. Update the statistics on the optimal codeword, that is, $P_{hit}(c_{opt}|bss_k)$.
Step 5. Repeat Steps 2-4 until the training is performed on all the input vectors.
A lookup table is built for each subspace following the completion of the training procedure. The lookup table gives the hit probability of each codeword in a subspace. For sorting purposes, a quantity $P_{hit}(m|bss_k)$, 1 ≤ m ≤ CSize, is defined as the mth ranked probability that a codeword hits the best-matched codeword in subspace $bss_k$. Taking m = 1 as an instance, $P_{hit}(m = 1|bss_k)$ represents the highest hit probability in $bss_k$. As a result, the lookup table in each subspace gives the ranked hit probabilities in descending order together with the corresponding codewords.
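The training steps above can be sketched as follows. This is a minimal sketch rather than the authors' implementation: the function name and return layout are hypothetical, hit counts are normalized per subspace to obtain probabilities, and a component equal to its dichotomy position is assumed to set the corresponding bit.

```python
import numpy as np


def train_bss_vq(codebook, training_vectors):
    """Build per-subspace lookup tables of ranked hit probabilities."""
    codebook = np.asarray(codebook, dtype=float)
    csize, dim = codebook.shape
    dp = codebook.mean(axis=0)                # dichotomy positions
    weights = 1 << np.arange(dim)             # 2**j for each dimension
    hits = np.zeros((1 << dim, csize), dtype=np.int64)  # Step 1: all zero

    for x in np.asarray(training_vectors, dtype=float):
        k = int(np.sum(weights[x >= dp]))     # Step 2: locate subspace
        dists = np.sum((codebook - x) ** 2, axis=1)
        hits[k, np.argmin(dists)] += 1        # Steps 3-4: full search, update

    totals = hits.sum(axis=1, keepdims=True)
    p_hit = np.divide(hits, totals,
                      out=np.zeros(hits.shape), where=totals > 0)
    # Rank codewords in each subspace by descending hit probability.
    order = np.argsort(-p_hit, axis=1, kind="stable")
    ranked = np.take_along_axis(p_hit, order, axis=1)
    return dp, order, ranked
```

Here `order[k]` lists codeword indices for subspace k from most to least likely, and `ranked[k]` holds the matching probabilities. Subspaces that receive no training vector keep all-zero rows, which a real system would need to handle, for example by falling back to a full search.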

Encoding Procedure of BSS-VQ
In the encoding procedure of BSS-VQ, the cumulative probability $P_{cum}(M|bss_k)$ is first defined as the sum of the top M values of $P_{hit}(m|bss_k)$ in $bss_k$, that is:

$$P_{cum}(M|bss_k) = \sum_{m=1}^{M} P_{hit}(m|bss_k), \quad 1 \le M \le CSize \tag{13}$$

Subsequently, given a threshold of quantization accuracy (TQA), a quantity $M_k(TQA)$ represents the minimum value of M that satisfies the condition $P_{cum}(M|bss_k) \ge TQA$ in $bss_k$, that is:

$$M_k(TQA) = \min \left\{ M : P_{cum}(M|bss_k) \ge TQA \right\}, \quad 0 \le k < BSize \tag{14}$$

For a given TQA, a total of 512 values of $M_k(TQA)$ are evaluated by Equation (14) for all the subspaces, and the mean value is then given as:

$$\bar{M}(TQA) = \frac{1}{BSize} \sum_{k=0}^{BSize-1} M_k(TQA) \tag{15}$$

Illustrated in Figure 1 is a plot of the average number of searches $\bar{M}(TQA)$ corresponding to values of TQA ranging between 0.90 and 0.99. Given TQA = 0.95 as an instance, an average of merely 14.58 codeword searches is required to reach a search accuracy as high as 95%. In simple terms, the search load can be significantly reduced at the cost of a small drop in search accuracy. Furthermore, a trade-off can be made directly between the quantization accuracy and the search load according to Figure 1. Hence, a BSS-VQ encoding procedure is described below as Algorithm 2.

Algorithm 2: Encoding procedure of BSS-VQ
Step 1. Given a TQA, $M_k(TQA)$ satisfying Equation (14) is found directly in the lookup table in $bss_k$.
Step 2. Referencing Table 3 and by means of Equations (9) and (10), an input vector is assigned to a subspace $bss_k$ in an efficient manner.
Step 3. A full search for the best-matched codeword is performed on the top $M_k(TQA)$ sorted codewords in $bss_k$, and then the output is the index of the found codeword.
Step 4. Repeat Steps 2 and 3 until all the input vectors are encoded.
The BSS-VQ algorithm is briefly summarized as follows. Table 3 is the outcome of Equation (8) and is saved as the first lookup table. Subsequently, the second lookup table, containing $P_{hit}(m|bss_k)$ and the corresponding codeword, is built for each subspace according to the training procedure. Accordingly, the VQ encoding can be performed using Algorithm 2.
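To make the encoding procedure concrete, the sketch below shows one possible way to derive the number of candidates for a given TQA and to run the restricted search. The function names, the small floating-point tolerance, the tie convention in the locating step, and the fallback to a full search for subspaces without training statistics are all assumptions made for illustration.

```python
import numpy as np


def searches_needed(ranked_p_hit, tqa, eps=1e-12):
    """Smallest M whose cumulative ranked hit probability reaches TQA."""
    cum = np.cumsum(ranked_p_hit)
    # First position where cum >= tqa (eps absorbs rounding in the sum).
    m = int(np.searchsorted(cum, tqa - eps)) + 1
    return min(m, len(ranked_p_hit))  # untrained subspace: full search


def encode(x, codebook, dp, order, m_table):
    """Return the index of the best codeword among the top candidates."""
    # Locate the subspace (equality assumed to set the bit).
    k = sum(1 << j for j in range(len(dp)) if x[j] >= dp[j])
    candidates = np.asarray(order[k])[: m_table[k]]
    # Search only the top-ranked codewords of that subspace.
    diffs = np.asarray(codebook, dtype=float)[candidates] - np.asarray(x)
    dists = np.sum(diffs ** 2, axis=1)
    return int(candidates[np.argmin(dists)])
```

In use, `m_table[k]` would be precomputed once per TQA as `searches_needed(ranked[k], tqa)` for every subspace, so the per-vector cost is limited to the locating step and the reduced search.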

Experimental Results
Three experiments are conducted in this work. The first is a search load comparison among various search approaches. The second is a quantization accuracy (QA) comparison between a full search and other search approaches. The third is a performance comparison among various approaches in terms of ITU-T P.862 perceptual evaluation of speech quality (PESQ) [21] as an objective measure of speech quality. A speech database, completely different from all the training data, is employed for outside testing purposes. With one male and one female speaker, the speech database in total takes up more than 221 MB of memory, lasts more than 120 min, and covers 363,281 speech frames.
Firstly, Table 4 lists a comparison of the average number of searches among full search, multiple TIE (MTIE) [13], DI-TIE, and EEENNS, while Table 5 gives the search load corresponding to TQA values of the BSS-VQ algorithm. Moreover, with the search load required in the full search algorithm as a benchmark, Tables 6 and 7 present comparisons of the load reduction (LR) with respect to Tables 4 and 5. A high value of LR reflects a high search load reduction. Table 6 indicates that DI-TIE provides a higher value of LR than the MTIE and EEENNS search approaches for all the codebooks. A comparison of Tables 6 and 7 also shows that most LR values of BSS-VQ are higher than those of the DI-TIE approach. For example, the LR values of BSS-VQ are higher than those of DI-TIE when the TQA is equal to or smaller than 0.99, 0.98, 0.96, and 0.99 in codebooks CB1, CB2, CB21, and CB22, respectively. Accordingly, a remarkable search load reduction is reached by the BSS-VQ search algorithm. In the QA aspect, a 100% QA is obtained by the MTIE, DI-TIE, and EEENNS algorithms as compared with a full search approach. Thus, only the QA experiment of BSS-VQ is conducted. The QA corresponding to TQA values of the BSS-VQ algorithm is given in Table 8. It reveals that QA closely approximates TQA in both inside and outside testing cases. Moreover, this algorithm provides an LR between 77.78% and 93.98% at TQA = 0.90 as well as an LR between 67.23% and 88.39% at TQA = 0.99, depending on the codebooks. In other words, a trade-off can be made between the quantization accuracy and the search load. Furthermore, an overall LR is evaluated to observe the total search load of an entire VQ encoding procedure of an input vector. The overall LR is based on the total search load, defined as the sum over all codebooks of the average number of searches multiplied by the vector dimension. Thus, an overall LR comparison with the full search as a benchmark is presented as a bar graph in Figure 2.
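The load reduction figures can be reproduced from average search counts along the following lines. This sketch assumes that LR is the relative saving with respect to the full search, expressed as a percentage; the function names are hypothetical, and no values from Tables 4-7 are reproduced here.

```python
def load_reduction(avg_searches, full_searches):
    """LR in percent for one codebook, relative to the full search."""
    return 100.0 * (1.0 - avg_searches / full_searches)


def overall_load(avg_searches, dims):
    """Total search load: sum of average searches times vector dimension."""
    return sum(avg_searches[name] * dims[name] for name in dims)


def overall_load_reduction(method_searches, full_searches, dims):
    """Overall LR in percent across all codebooks of one coding mode."""
    method = overall_load(method_searches, dims)
    full = overall_load(full_searches, dims)
    return 100.0 * (1.0 - method / full)
```

For instance, with two placeholder codebooks of dimensions 9 and 7, a full search of 256 codewords each, and average search counts of 64 and 128, the overall load drops from 4096 to 1472, an overall LR of about 64.06%. Weighting by dimension reflects that each codeword comparison costs more in a higher-dimensional codebook.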
As clearly indicated in Figure 2, the overall LR of BSS-VQ is higher than that of the MTIE, DI-TIE, and EEENNS approaches, while the QA remains as high as 0.98. Moreover, Table 9 gives a PESQ comparison, including the mean and the standard deviation (STD), among various approaches. Since MTIE, DI-TIE, and EEENNS provide a 100% QA, they all share the same PESQ as a full search, meaning that there is no deterioration in the speech quality. A close observation reveals little difference between the PESQ values obtained with a full search and with this search algorithm, that is, the speech quality is well maintained in BSS-VQ at TQA not less than 0.90. This BSS-VQ search algorithm is experimentally validated as a superior candidate relative to its counterparts.

Conclusions
This paper presents a BSS-VQ codebook search algorithm for ISF vector quantization in the AMR-WB speech codec. Using a combination of a fast locating technique and lookup tables, an input vector is efficiently assigned to a search subspace where a small number of codeword searches is carried out, and a remarkable search load reduction is reached as a consequence. In particular, a trade-off can be made between the quantization accuracy and the search load to meet a user's need when performing VQ encoding. This BSS-VQ search algorithm, providing a considerable search load reduction as well as nearly lossless speech quality, is experimentally validated as superior to the MTIE, DI-TIE, and EEENNS approaches. Furthermore, this improved AMR-WB speech codec can be adopted to upgrade the VoIP performance on a smartphone. As a consequence, the energy efficiency requirement is met for an extended operation time period due to the computational load reduction.

Figure 1. A plot of the average number of searches versus TQA.

Figure 2. Comparison of overall search load reduction among various approaches.

Table 1. Structure of S-MSVQ in AMR-WB in the 8.85-23.85 kbps coding modes.

Table 3. Dichotomy position for each dimension in the codebook CB1.

Table 4. Average number of searches among various algorithms in the 8.85-23.85 kbps modes.

Table 5. Search load of the BSS-VQ algorithm versus TQA values in the 8.85-23.85 kbps modes.

Table 8. Comparison of QA percentage of the BSS-VQ algorithm versus TQA values in the 8.85-23.85 kbps modes among codebooks.

Table 9. Comparison of mean opinion score (MOS) values using the PESQ algorithm among various methods.