Article

An Efficient Codebook Search Algorithm for Line Spectrum Frequency (LSF) Vector Quantization in Speech Codec

1 Smart Sensing R&D Center, Institute of Microelectronics of Chinese Academy of Sciences, Beijing 100029, China
2 Department of Microelectronics, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Electronics 2021, 10(4), 380; https://doi.org/10.3390/electronics10040380
Submission received: 3 December 2020 / Revised: 21 January 2021 / Accepted: 2 February 2021 / Published: 4 February 2021
(This article belongs to the Section Circuit and Signal Processing)

Abstract

A high-performance vector quantization (VQ) codebook search algorithm is proposed in this paper. VQ is an important data compression technique that has been widely applied to speech, image, and video compression. However, the codebook search demands a high computational load. To solve this issue, a novel algorithm consisting of a training procedure and an encoding procedure is proposed. In the training procedure, a training speech dataset is used to build a squared-error distortion look-up table for each subspace. In the encoding procedure, an input vector is first quickly assigned to a search subspace. Second, the candidate code word group is obtained by employing the triangular inequality elimination (TIE) equation. Finally, a partial distortion elimination (PDE) technique is employed to reduce the number of multiplications. The proposed method reduces the number of searches and the computational load significantly, especially when the input vectors are uncorrelated. The experimental results show that the proposed algorithm provides a computational saving (CS) of up to 85% relative to the full search algorithm, up to 76% relative to the TIE algorithm, and up to 63% relative to the iterative TIE algorithm. Further, the proposed method provides a CS of 29–33% and a load reduction of 67–69% over the BSS-ITIE algorithm.

1. Introduction

Vector quantization (VQ) is a high-performance technique for data compression. Due to its simple coding and high compression ratio, it has been successfully applied to speech, image, audio, and video compression [1,2,3,4,5,6]. It is also a key part of the G.729 Recommendation. A major drawback is the substantial computational load required for the VQ of the line spectrum frequency (LSF) coefficients in the speech codec [7,8,9,10,11,12]. Thus, it is necessary to reduce the computational load of VQ.
Conventionally, a full search algorithm is employed to find the code word that is best matched with an arbitrary input vector. However, the full search algorithm demands a large computation load. To reduce the encoding search complexity, many approaches have been proposed [13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28]. These approaches can be classified into four main types, according to their base techniques. The first type is the tree-structured VQ (TSVQ) technique [13,14,15,16,17], the second type is the triangular inequality elimination (TIE)-based approach [18,19,20,21], the third type is the equal-average equal-variance equal-norm nearest neighbor search (EEENNS) method [22,23,24] based on the statistical characteristic values of the input signal, and the last type is the binary search space-structured VQ (BSS-VQ) [25,26,27,28] method.
In the TSVQ approaches [13,14,15,16,17], the search complexity can be reduced significantly. However, the reconstructed speech quality is poor because the selected code word is not necessarily the best match to the input vector. In contrast, the TIE-based methods [18,19,20,21] achieve an enormous computational load reduction without loss of speech quality. By exploiting the high correlation between adjacent frames, a TIE method rejects the impossible candidate code words. However, the computational load reduction is weakened when the input vectors are uncorrelated. The EEENNS technique employs the statistical characteristics of an input vector, such as the mean, the variance, and the norm, to reject impossible code words. To further reduce the computational load, the BSS-VQ method [25] was proposed, in which the number of candidate code words is closely related to the distribution of the hit probability, and a distribution sharpened at specific code words yields an enormous computational load reduction. However, the probability distribution is not uniform; some subspaces are concentrated, and some are flattened. Thus, its performance is sometimes poor.
To solve the above issues, an efficient VQ codebook search algorithm is proposed. Even though the input vectors are uncorrelated, it can still reduce the computational load significantly while maintaining the quantization accuracy. Compared with previous works on the full search algorithm [6], TIE [18], ITIE [21], and the BSS-ITIE [28], the experimental results show that the proposed algorithm can quantize the input vector with the lowest computational load.
The rest of this paper is organized as follows. Section 2 describes the encoding procedure of the LSF coefficients in G.729. Section 3 presents the theory of the proposed algorithm in detail. Section 4 shows the experimental results and performance comparison between the proposed algorithm and previous works. Finally, Section 5 concludes this work.

2. LSF Coefficients Quantization in G.729

The ITU-T G.729 [29] speech codec was selected as the platform to verify the performance of the proposed algorithm. Thus, before introducing the theory of the proposed method, the principle of the LSF quantization in G.729 is introduced here.
The LSF coefficients are obtained by the equation $\omega_i = \arccos(q_i)$, where $q_i$ is the $i$-th LSF coefficient in the cosine domain and $\omega_i$ is the corresponding LSF coefficient in the frequency domain.
The LSF quantization procedure is organized as follows: a switched fourth-order moving-average (MA) predictor is used to predict the LSF coefficients of the current frame. The difference between the computed and predicted coefficients is quantized using a two-stage vector quantizer. The first stage is a 10-dimensional VQ using a codebook L1 with 128 entries (7 bits). In the second stage, the quantization error vector of the first stage is split into two sub-vectors, which are then quantized by a split VQ using two codebooks, L2 and L3, each containing 32 entries (5 bits). Figure 1 illustrates the structure of the two-stage VQ for the LSF coefficients.
To explain the quantization process, it is convenient to describe the decoding process first. Each quantized value is obtained from the sum of the two code words, as follows:
\hat{l}_i = \begin{cases} L1_i(l_1) + L2_i(l_2), & i = 1, \ldots, 5 \\ L1_i(l_1) + L3_{i-5}(l_3), & i = 6, \ldots, 10, \end{cases}
where $l_1$, $l_2$, and $l_3$ are the codebook indices. To guarantee that the reconstructed filters are stable, the vector $\hat{l}_i$ is rearranged such that adjacent elements are separated by a minimum distance of $d_{\min}$. This rearrangement process is done twice. The quantized LSF coefficients, $\hat{\omega}_i^{(m)}$, for the current frame, $m$, are obtained from the weighted sum of the previous quantizer outputs, $\hat{l}_i^{(m-k)}$, and the current quantizer output, $\hat{l}_i^{(m)}$:
\hat{\omega}_i^{(m)} = \left(1 - \sum_{k=1}^{4} \hat{p}_{i,k}\right) \hat{l}_i^{(m)} + \sum_{k=1}^{4} \hat{p}_{i,k}\, \hat{l}_i^{(m-k)}, \quad i = 1, \ldots, 10,
where $\hat{p}_{i,k}$ are the coefficients of the switched MA predictor, selected by the parameter $p_0$. For each of the two MA predictors, the best approximation to the current LSF coefficients has to be found. The best approximation is defined as the one that minimizes the weighted mean-squared error (MSE):
E_{lsf} = \sum_{i=1}^{10} W_i (\omega_i - \hat{\omega}_i)^2.
The weights emphasize the relative importance of each LSF coefficient. The weights, W i , are made adaptive as a function of the unquantized LSF parameters.
W_1 = \begin{cases} 1.0, & \text{if } \omega_2 - 0.04\pi - 1 > 0 \\ 10(\omega_2 - 0.04\pi - 1)^2 + 1, & \text{otherwise,} \end{cases}

W_i = \begin{cases} 1.0, & \text{if } \omega_{i+1} - \omega_{i-1} - 1 > 0 \\ 10(\omega_{i+1} - \omega_{i-1} - 1)^2 + 1, & \text{otherwise,} \end{cases} \quad 2 \le i \le 9,

W_{10} = \begin{cases} 1.0, & \text{if } -\omega_9 + 0.92\pi - 1 > 0 \\ 10(-\omega_9 + 0.92\pi - 1)^2 + 1, & \text{otherwise.} \end{cases}
Then, the weights $W_5$ and $W_6$ are each multiplied by 1.2. The vector to be quantized for the current frame, $m$, is obtained from Equation (7):
l_i^{(m)} = \left[\omega_i^{(m)} - \sum_{k=1}^{4} \hat{p}_{i,k}\, \hat{l}_i^{(m-k)}\right] \bigg/ \left(1 - \sum_{k=1}^{4} \hat{p}_{i,k}\right), \quad i = 1, \ldots, 10.
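As an illustration, the adaptive weighting of Equations (4)–(6) and the weighted MSE of Equation (3) can be sketched in Python; the function names and the 0-based indexing are ours, not part of the G.729 specification.

```python
import numpy as np

def lsf_weights(omega):
    """Adaptive weights W_i computed from the unquantized LSF vector
    omega (10 frequency-domain values), per Equations (4)-(6)."""
    w = np.ones(10)
    conds = [omega[1] - 0.04 * np.pi - 1.0]                            # W_1
    conds += [omega[i + 1] - omega[i - 1] - 1.0 for i in range(1, 9)]  # W_2..W_9
    conds += [-omega[8] + 0.92 * np.pi - 1.0]                          # W_10
    for i, g in enumerate(conds):
        if g <= 0:
            w[i] = 10.0 * g * g + 1.0
    # the text states that W_5 and W_6 are then each multiplied by 1.2
    w[4] *= 1.2
    w[5] *= 1.2
    return w

def weighted_mse(omega, omega_hat, w):
    """Weighted mean-squared error E_lsf of Equation (3)."""
    d = omega - omega_hat
    return float(np.sum(w * d * d))
```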
The first codebook, L1, is searched, and the entry, $l_1$, that minimizes the unweighted MSE is selected. Then the second codebook, L2, is searched by computing the weighted MSE, and the entry, $l_2$, that results in the lowest error is selected. After selecting the first-stage vector, $l_1$, and the lower part of the second stage, $l_2$, the higher part of the second stage is searched in the codebook L3. The vector, $l_3$, that minimizes the weighted MSE is selected. The resulting vector, $\hat{l}_i$, $i = 1, \ldots, 10$, is rearranged twice using the above procedure. This procedure is done for each of the two MA predictors defined by $p_0$, and the MA predictor that produces the lowest weighted MSE is selected. Table 1 shows the two groups of coefficients of the MA predictor. When the first group of coefficients is selected, the value of $p_0$ is 0; otherwise, it is 1. Table 2 shows the bit allocation of the LSF quantizer.

3. Proposed Search Algorithm

In this paper, a fast codebook search algorithm for vector quantization is proposed to reduce the computational load of the LSF coefficient quantization in the G.729 speech codec. Even when the input vectors are uncorrelated, it can still reduce the computational load significantly while maintaining the quantization accuracy.
In the BSS-VQ method [25], the number of candidate code words is strongly related to the distribution of the hit probability in each subspace, and a distribution sharpened at specific code words yields an enormous computational load reduction while maintaining the quantization accuracy. However, the probability distribution is not uniform: some subspaces are concentrated, and some are flattened. Thus, its performance is sometimes poor. The TIE algorithm [18] uses the strong correlation between adjacent frames to narrow the search range. However, its performance is poor when the inputs are uncorrelated.
Considering the drawbacks of the above methods, a novel algorithm is proposed to further reduce the computational load, especially when the inputs are uncorrelated. After a statistical analysis, the code word with the highest hit probability in each subspace is selected as the reference code word, and a squared-error distortion look-up table is built for each subspace. Adjacent code words in this sorted table are highly correlated, since the squared-error distortion changes only slightly between neighboring entries. The reference code word, corresponding to the highest hit probability, is the most likely best-matched code word. Thus, the smaller the squared-error distortion between a candidate code word and the reference code word, the more likely that candidate is to be the best match. Therefore, the TIE technique can be employed to reject impossible code words. The proposed algorithm consists of a training procedure and an encoding procedure. The structure of the proposed algorithm is illustrated in Figure 2, and its theory is presented in detail as follows.

3.1. Training Procedure

In the three-stage training procedure, the squared-error look-up table for each subspace is prebuilt. At stage 1, each dimension is dichotomized into two subspaces. Then, an input vector is assigned to a corresponding subspace according to its entries [25]. For instance, when the input is a 10-dimensional vector, there are $2^{10} = 1024$ subspaces in G.729. Before the encoding procedure, a look-up table containing the hit probability statistics of the code words is prebuilt.
The dichotomy position for each dimension is defined as the mean of all the code words from the codebook, presented as:
mean(k) = \frac{1}{CSize} \sum_{i=1}^{CSize} c_i(k), \quad 0 \le k < Dim,
where $c_i(k)$ is the $k$-th component of the $i$-th code word, $c_i$, and $mean(k)$ is the average of the $k$-th components of all the code words. For instance, in the first stage of the LSF coefficient quantization in G.729, $CSize = 128$ and $Dim = 10$ for the codebook L1. A parameter $v_n(k)$ is defined for the vector quantization of the $n$-th input vector, $x_n$, symbolized as:
v_n(k) = \begin{cases} 2^k, & x_n(k) \ge mean(k) \\ 0, & x_n(k) < mean(k), \end{cases} \quad 0 \le k < Dim,
where x n ( k ) is the k -th component of x n . Then, x n is assigned to a subspace j ( b s s j ) , where j is the sum of v n ( k ) over all the dimensions, presented as:
x_n \in bss_j, \quad j = \sum_{k=0}^{Dim-1} v_n(k).
For instance, given an input $x_n = \{0.44, 0.08, 0.51, 0.87, 1.26, 1.40, 2.10, 2.18, 2.30, 2.42\}$, $v_n(k) = \{1, 0, 0, 8, 16, 0, 64, 128, 0, 0\}$ and $j = 217$ are obtained by (9) and (10), respectively. The input vector $x_n$ is then assigned to the subspace $bss_j$ with $j = 217$. This means that an input vector can be assigned to its subspace quickly, with only a few basic operations.
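The subspace assignment of Equations (8)–(10) can be sketched as follows; the `means` values used in the test are illustrative, not the actual per-dimension means of the G.729 L1 codebook.

```python
import numpy as np

def dichotomy_means(codebook):
    """mean(k) of Equation (8): per-dimension mean over all code words."""
    return codebook.mean(axis=0)

def subspace_index(x, means):
    """Assign input vector x to subspace j via Equations (9) and (10):
    bit k of j is set when x(k) >= mean(k), i.e. v_n(k) = 2^k."""
    j = 0
    for k in range(len(x)):
        if x[k] >= means[k]:
            j += 1 << k
    return j
```

With dichotomy thresholds chosen so that the comparison pattern matches the paper's example, the vector above lands in subspace 217, as in the text.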
At stage 2, the hit probability table for each subspace is prebuilt through a training mechanism; it records the probability of each code word being the best-matched code word in each subspace, defined as $P_{hit}(c_i | bss_j)$, where $1 \le i \le CSize$, $0 \le j < Snum$, and $Snum = 2^{Dim}$ is the number of subspaces. Subsequently, the table is sorted in descending order of the hit probability. For example, after sorting, $P_{hit}(c_1 | bss_j) = \max_{c_i} \{P_{hit}(c_i | bss_j)\}$ is the highest hit probability in $bss_j$, with $c_1$ the corresponding code word. The sum of all the hit probabilities in each subspace is 1.0. The hit probability $P_{hit}(c_i | bss_j)$ is computed by Equation (11), and the cumulative probability of the top $N$ code words is given by Equation (12):
P_{hit}(c_i | bss_j) = \frac{\text{number of times that code word } c_i \text{ is the best match in subspace } bss_j}{\text{total number of times that any candidate code word is the best match in subspace } bss_j},
P_{cmu}(N | bss_j) = \sum_{n=1}^{N} P_{hit}(c_n | bss_j), \quad 1 \le N \le CSize,
Further, to control the trade-off between quantization accuracy and computational load, a variable named the threshold of quantization accuracy (TQA) is defined. The quantity $N_j(TQA)$, the minimum number $N$ that satisfies $P_{cmu}(N | bss_j) \ge TQA$ in subspace $bss_j$, represents the number of possible candidate code words:
N_j(TQA) = \arg\min_N \{N : P_{cmu}(N | bss_j) \ge TQA\}, \quad 1 \le N \le CSize, \; 0 \le j < Snum.
At stage 3, the code word with the highest hit probability in each subspace is selected as the reference code word, $c_r$; then the squared-error distortion between the reference code word and every other code word, $c_i$, in the codebook is calculated by (14):
d(c_r, c_i) = \|c_r - c_i\|^2 = \sum_{k=0}^{Dim-1} \left(c_r(k) - c_i(k)\right)^2.
Subsequently, the squared-error distortion look-up table for each subspace is built and then is sorted in ascending order. Algorithm 1 shows the pseudo-code of the training procedure.
Algorithm 1. Training procedure of the proposed algorithm
Step 1. Give a training speech dataset and a TQA value.
Step 2. The input vector is assigned to a corresponding subspace, b s s j , by (9) and (10).
Step 3. Repeat Step 2 until all the input vectors are encoded.
Step 4. The hit probability table is obtained by (11) and (12) and then sorted in descending order.
Step 5. Select the code word with the highest hit probability as the reference code word; the squared-error distortion between the reference code word and the other code words, c_i, in the codebook is calculated by (14).
Step 6. The squared-error distortion look-up table is prebuilt for each subspace and is sorted in ascending order.
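Steps 4–6 of the training procedure can be sketched per subspace as follows. Here `hit_counts` (how often each code word won the full search for training vectors falling in this subspace) is assumed to have been gathered in Steps 2–3, and the function and variable names are ours.

```python
import numpy as np

def build_subspace_table(hit_counts, codebook, tqa):
    """For one subspace: hit probabilities (Eq. 11), cumulative
    probability (Eq. 12), candidate count N(TQA) (Eq. 13), and the
    squared-error table against the reference code word (Eq. 14)."""
    p_hit = hit_counts / hit_counts.sum()          # Eq. (11)
    order = np.argsort(-p_hit)                     # descending hit probability
    p_cmu = np.cumsum(p_hit[order])                # Eq. (12)
    n_tqa = int(np.searchsorted(p_cmu, tqa) + 1)   # smallest N with P_cmu >= TQA
    ref_idx = order[0]                             # reference code word c_r
    d = np.sum((codebook - codebook[ref_idx]) ** 2, axis=1)  # Eq. (14)
    table = np.argsort(d)                          # ascending squared error
    return ref_idx, n_tqa, table, d
```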

3.2. Encoding Procedure

Given a testing speech set and a TQA value, in step 1, an input vector is assigned to a search subspace by (9) and (10). In step 2, the triangular inequality elimination (TIE) formulation is used to reject the impossible candidate code words [18]: if $d(c_r, c_i) > 4 d(c_r, x)$, then $d(c_i, x) > d(c_r, x)$, so the computation of $d(c_i, x)$ can be eliminated, where $x$ is the input vector and $c_r$ is the reference code word. The group of all code words, $c_i$, that satisfy the above condition, defined as the candidate search group (CSG), is built, and the number of code words in the CSG is denoted $N(c_r)$:
\text{TIE:} \quad CSG(c_r) = \{c_i \mid d(c_r, c_i) < 4\, d(c_r, x)\}, \quad 1 \le i \le CSize.
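A minimal sketch of this TIE screening, with $d(\cdot,\cdot)$ as the squared Euclidean distortion (since $\|c_r - c_i\|^2 \ge 4\|c_r - x\|^2$ means $c_i$ is at least twice as far from $c_r$ as $x$ is, and hence cannot beat $c_r$); names are illustrative:

```python
import numpy as np

def candidate_search_group(codebook, ref_idx, x):
    """CSG of Equation (15): keep the indices of code words whose squared
    distance to the reference is below 4*d(c_r, x)."""
    ref = codebook[ref_idx]
    d_ref_x = np.sum((ref - x) ** 2)
    d_ref_c = np.sum((codebook - ref) ** 2, axis=1)
    return np.flatnonzero(d_ref_c < 4.0 * d_ref_x)
```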
In step 3, after $CSG(c_r)$ is obtained, the squared-error distortion between the input vector and each candidate code word is computed. The best-matched code word is the one that minimizes the squared-error distortion between the input vector, $x_n$, and the candidate code word, $c_i$.
As described in [30], a minimum squared-error distortion computation method called partial distortion elimination (PDE) is employed to reduce the number of multiplication operations. This method can decide whether the current code word can still be the best match before the whole squared-error distortion is calculated. The pseudo-code of the PDE method is shown in Algorithm 2, where CSize, Dim, dmin, and C(j, i) are the codebook size, the dimension of the input vector, the minimum distortion value found so far, and the i-th component of the j-th code word, respectively. The encoding procedure of the proposed algorithm is summarized in Algorithm 3.
Algorithm 2. Pseudo-code of the PDE algorithm
j = 0, dmin = 2^8
while (j < CSize)
    dist = 0
    i = 0
    while ((i < Dim) && (dist < dmin))
        tmp = x(i) − C(j, i)
        dist = dist + tmp * tmp
        i = i + 1
    end
    if ((i == Dim) && (dist < dmin))
        dmin = dist
        index = j
    end
    j = j + 1
end
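The PDE idea above can be written as a runnable sketch; we initialize the minimum distortion to infinity rather than a fixed constant, and the early exit from the inner loop is what saves multiplications.

```python
def pde_search(x, codebook):
    """Partial distortion elimination: abandon a code word as soon as its
    running (partial) squared error reaches the best distortion so far."""
    d_min = float("inf")
    index = -1
    for j, c in enumerate(codebook):
        dist = 0.0
        for xi, ci in zip(x, c):
            t = xi - ci
            dist += t * t
            if dist >= d_min:  # partial distortion already too large
                break
        else:  # all dimensions accumulated with dist < d_min
            d_min = dist
            index = j
    return index, d_min
```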
Algorithm 3. Encoding procedure of the proposed algorithm.
Step 1. Given a testing speech set and a value for T Q A .
Step 2. The input vector is quickly assigned to a corresponding subspace, b s s j , by (9) and (10).
Step 3. The number of candidate code words, N_j(TQA), is found directly from the prebuilt squared-error distortion look-up table for each subspace.
Step 4. The code word corresponding to the highest hit probability is selected as a reference, c r , compute d ( c r , x ) , and then the C S G ( c r ) and N ( c r ) are obtained by (15).
Step 5. Starting at k = 1, the d ( c r , c k ) is obtained directly from the squared-error distortion look-up table.
Step 6. If d(c_r, c_k) < 4 d(c_r, x), then compute d(c_k, x) by Algorithm 2; set k = k + 1 and repeat Steps 5 and 6 until k = N(c_r). Then the index of the best-matched code word is obtained.
Step 7. Output index of the best-matched code word.
Step 8. Repeat Steps 2–6 until all the input vectors are encoded.

4. Experiment and Results

4.1. Experimental Environment

Here, the first-stage quantization procedure of the LSF coefficients in G.729 was selected as the platform to illustrate the performance of the proposed algorithm. The first-stage codebook included 128 code words, each a 10-dimensional vector, so there were 1024 subspaces. The Aurora speech dataset [31] was used as the training and testing speech data: 1001 clean speech files and 1001 noisy speech files from the Test-A set, spoken by 50 male and 50 female speakers, were used in this paper. The testing speech signals were sampled at 8 kHz with a resolution of 16 bits per sample.

4.2. Selection of the Training Dataset

As introduced in Section 3, the proposed method included a training procedure and an encoding procedure. The training procedure provided a squared-error look-up table for the encoding procedure. The selection of the training speech dataset directly affected the application scope of the squared-error look-up table. Further, it affected the computation loads and quantization accuracy of the subsequent encoding process. Therefore, we discuss the influence of the selection of the training dataset on the application scope of the squared-error look-up table.
The Aurora Test-A set included 1001 clean files and 1001 noisy speech files, which were used to train and test the robustness of the proposed algorithm. Here, we compare the performance of the proposed method in three experimental environments. To evaluate the quantization accuracy of the proposed algorithm, a parameter defined as the error rate (ER) is used, ER = (number of incorrectly quantized frames) / (total number of input frames). The average search number is denoted ASN, and the reduction of the ASN is presented as the computational saving (CS), CS = (ASN_1 − ASN_2) / ASN_1, which evaluates the reduction of the computational load. The following three experimental results were all obtained with TQA = 0.99.
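These two metrics are simple ratios; as a sketch, and assuming the full search baseline visits all 128 first-stage code words (ASN_1 = 128, an assumption consistent with the reported ~85% CS):

```python
def error_rate(incorrect_frames, total_frames):
    """ER: fraction of frames whose quantized index differs from full search."""
    return incorrect_frames / total_frames

def computational_saving(asn_baseline, asn_proposed):
    """CS = (ASN_1 - ASN_2) / ASN_1."""
    return (asn_baseline - asn_proposed) / asn_baseline
```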
For the first experimental condition, the 1001 clean files, which included 350,866 speech frames, were all selected as the training set, and a squared-error look-up table, denoted table A, was obtained after the training procedure. Table A was then employed in the encoding process, with clean files and noisy speech files as the testing datasets, respectively. When 500 clean files (174,630 speech frames) were used as the testing dataset, there were 1412 incorrectly quantized frames, thus ER = 0.81% and ASN = 18.82. When 201 noisy speech files (70,170 frames) were used as the testing dataset, there were 2579 incorrectly quantized frames, thus ER = 3.7% and ASN = 18.9.
For the second experimental condition, the training dataset included 500 clean speech files and 500 noisy speech files, totaling 348,724 frames. When the training dataset was also used as the testing dataset, there were 1062 incorrectly quantized frames, thus ER = 0.7% and ASN = 18.8. When the other 201 clean speech files (70,170 frames) were used as testing data, there were 2038 incorrectly quantized frames, thus ER = 2.9% and ASN = 19.3. When the other 201 noisy speech files (70,170 frames) were used as the testing set, there were 1975 incorrectly quantized frames, thus ER = 2.8% and ASN = 18.9.
For the third experimental condition, the training dataset included the 1001 clean files and 1001 noisy speech files, totaling 701,508 frames. When 201 clean speech files were used as the testing set, there were 556 incorrectly quantized frames, thus ER = 0.79% and ASN = 20. When 201 noisy speech files (70,170 frames) were used as the testing set, there were 605 incorrectly quantized frames, thus ER = 0.8% and ASN = 19.9.
Details of the above three experiments and the corresponding results can be found in Table 3. Even under the worst case, condition 1, where the training set was clean speech and the testing set was noisy speech, the experimental results show ER = 3.7% and ASN = 18.9. Under the best case, condition 3, the results show ER = 0.8% and ASN = 20. Comparing the three conditions, the ER value ranges from 0.7% to 3.7%, and the ASN value ranges from 18.82 to 20. It can be concluded that when the training set is large enough, the selection of the training set has no significant influence on the encoding process: the prebuilt squared-error look-up table is robust, and the variation ranges of ER and ASN are within acceptable limits.

4.3. Performance of the Proposed Method

Here we choose experiment 1 to illustrate the performance of the proposed method. The generation of the squared-error look-up table is a very important process of the proposed algorithm. Thus, we extracted some intermediate experimental data as examples to illustrate the created procedure. Figure 3 illustrates the design procedure of the squared-error look-up table of the proposed algorithm. It shows that the generation of the squared-error look-up table can reduce the number of candidate code words significantly and reduce the search range.
The computational load and the quantization accuracy of the proposed algorithm are compared with the TIE [18], ITIE [21], and BSS-ITIE [28] approaches. With the performance of the full search algorithm as the benchmark, Table 4 compares the ER, ASN, and CS of the proposed algorithm with those of the TIE [18], ITIE [21], and BSS-ITIE [28] approaches. The experimental results show that the proposed algorithm provided a CS of up to 92% when TQA = 0.90; when TQA = 0.99, it still reduced the computational load by 85%. Compared to the TIE and ITIE methods, the proposed method provided a CS of up to 76% and 63%, respectively, with almost the same quantization accuracy.
To further evaluate the reduction of the computational load, the average numbers of basic operations, including additions, multiplications, and comparisons, are compared in Table 5 and illustrated as a bar graph in Figure 4. The multiplication operation is the dominant computation, with the highest computational complexity. The reduction in the number of multiplications is the load reduction (LR), symbolized as LR = (MulN_1 − MulN_2) / MulN_1, where MulN is the number of multiplications. The proposed algorithm provided an LR of up to 90% relative to the full search algorithm, with almost the same quantization accuracy.
Table 6 and Figure 5 give the comparison results with the BSS-ITIE [28] as the benchmark. When TQA = 0.99, the ASN of the BSS-ITIE algorithm was 18.47 with ER = 4.66%. In comparison, the ASN of the proposed algorithm was 18.82 with ER = 0.81% and LR = 57%. This indicates that the proposed algorithm can obtain better speech quality than the BSS-ITIE algorithm with a great reduction in the number of multiplications. On the other hand, when TQA = 0.95, the ASN of the proposed algorithm was 13.10 with ER = 4.08%; when TQA = 0.94, the ASN was 12.33 with ER = 4.86%. This indicates that the proposed algorithm provided a CS of about 29–33% and an LR of up to 67–69% over the BSS-ITIE algorithm with almost the same ER.
Figure 6 shows that when the ASN was about 19, the ER of the proposed method was 0.81%, while that of the BSS-ITIE was 4.66%. When the ASN was about 13, the ER of the proposed method was lower than that of the BSS-ITIE method by about 5%. When the ER was about 4%, the ASN of the proposed method was about 5.5 lower than that of the BSS-ITIE method. Thus, the proposed algorithm performed significantly better than the BSS-ITIE method.
In addition, to better measure the quantization error, the average vector quantization error (AVQR) is defined as the average absolute error between the quantized code word and the best-matched code word, computed by Equation (16):
AVQR = \frac{1}{L} \sum_{i=1}^{L} \left| \hat{c}_i - c_i \right|,
where $\hat{c}_i$ is the quantized code word, $c_i$ is the best-matched code word for the input vector found by the full search algorithm, and $L$ is the total number of input speech frames. The AVQR values of the BSS-ITIE [28] and the proposed algorithm were computed for TQA ranging from 0.90 to 0.99. Table 7 shows the AVQR comparison between the BSS-ITIE method and the proposed method. The experimental results show that all the AVQR values of the proposed method were lower than 0.1, while the AVQR of the BSS-ITIE method ranged from 0.0695 to 0.2324. Further, the maximum AVQR of the proposed method was 0.0974 at TQA = 0.90, which is about equal to the AVQR of the BSS-ITIE method at TQA = 0.98. Thus, the proposed method achieves a much lower quantization error than the BSS-ITIE method.
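As a sketch of Equation (16), interpreting the per-frame error as the elementwise absolute difference between the two code words (an assumption, since the equation does not specify how the vector difference is collapsed to a scalar):

```python
import numpy as np

def avqr(quantized, best):
    """AVQR: mean absolute difference between the quantized code words and
    the full-search best matches, averaged over all frames and dimensions."""
    q = np.asarray(quantized)
    b = np.asarray(best)
    return float(np.mean(np.abs(q - b)))
```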

5. Conclusions

In this paper, an efficient codebook search algorithm for the VQ of the LSF coefficients is proposed to reduce the computational load. A squared-error look-up table is prebuilt in the training procedure, and the encoding procedure then begins: an input vector is quickly assigned to a search subspace, the CSG is obtained by employing the TIE equation, and a PDE technique is subsequently employed to reduce the number of multiplications. The experimental results show that the proposed algorithm provided a CS of up to 85% relative to the full search algorithm, up to 76% relative to the TIE algorithm, and up to 63% relative to the iterative TIE (ITIE) algorithm when TQA = 0.99. Compared to the BSS-ITIE algorithm, the proposed method provided a CS of 29–33% and an LR of 67–69% with almost the same quantization accuracy. Further, a trade-off between the computational load and the quantization accuracy can easily be made to meet a user's requirement when performing VQ encoding. This work would be beneficial for meeting energy-saving requirements when implemented in the speech codec of mobile devices, and the reduction of the computational load is helpful for the application of the G.729 Recommendation in real-time speech signal processing systems.

Author Contributions

Y.X. developed the idea and wrote this article. Y.W., Z.Y., J.J., Y.Z., X.F. and S.Q. put forward some constructive suggestions for revision. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

I would like to thank the Smart Sensing R&D Center, Institute of Microelectronics of the Chinese Academy of Sciences, which supported us in this work.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

VQ	vector quantization
LPC	linear prediction coding
LSF	line spectrum frequency
BSS-VQ	binary search space-structured vector quantization
CSG	candidate search group
TQA	threshold of quantization accuracy
TIE	triangular inequality elimination
ITIE	iterative triangular inequality elimination
PDE	partial distortion elimination
ASN	average search number
CS	computational saving
LR	load reduction
CS-ACELP	conjugate-structure algebraic-code-excited linear prediction
EEENNS	equal-average equal-variance equal-norm nearest neighbor search
AVQR	average vector quantization error

References

  1. Cheng, Z.; Yang, F.; Gao, B. Partial Discharge Pattern Recognition of XLPE Cable Based on Vector Quantization. IEEE Trans. Magn. 2019, 55, 1–4.
  2. Lu, X.; Wang, H.; Dong, W. Learning a Deep Vector Quantization Network for Image Compression. IEEE Access 2019, 7, 118815–118825.
  3. Wu, Z.-B.; Yu, J. Vector quantization: A review. Front. Inf. Technol. Electron. Eng. 2019, 20, 507–524.
  4. Gray, R.M. Vector quantization. IEEE ASSP Mag. 1984, 1, 4–29.
  5. Makhoul, J.; Roucos, S.; Gish, H. Vector Quantization in Speech Coding. Proc. IEEE 1985, 73, 1551–1588.
  6. Linde, Y.; Buzo, A.; Gray, R.M. An Algorithm for Vector Quantizer Design. IEEE Trans. Commun. 1980, 28, 84–95.
  7. Mobini, N.; Vahdat, B.; Radfar, M.H. An FPGA based implementation of G.729. In Proceedings of the IEEE International Symposium on Circuits and Systems, Kobe, Japan, 23–26 May 2005.
  8. Hwang, S.-H. Computational improvement for G.729 standard. Electron. Lett. 2000, 36, 1163–1164.
  9. Sheikh, N.M.; Siddiqui, K.I. Real-time implementation and optimization of ITU-T's G.729 speech codec running at 8 kbits/sec using CS-ACELP on TM-1000 VLIW DSP CPU. In Proceedings of the IEEE International Multi Topic Conference, 2001. IEEE INMIC 2001. Technology for the 21st Century, Lahore, Pakistan, 30 December 2001.
  10. Chen, C.; Fu, X. G.729/A Speech Coder: Multichannel TMS320C62x Implementation. 2000. Available online: https://www.ti.com/lit/an/spra564b/spra564b.pdf?ts=1612342823282&ref_url=https%253A%252F%252Fwww.google.com.hk%252F (accessed on 3 February 2021).
  11. Swee, L.H. Implementation of G.729 on the TMS320C54x. 2000. Available online: https://www.ti.com/lit/an/spra656/spra656.pdf (accessed on 3 February 2021).
  12. Zhang, Z.; Wang, G.; Zhang, Y. Implementation of G.729 algorithm based on ARM9. In Proceedings of the 2012 2nd International Conference on Consumer Electronics, Communications and Networks (CECNet), Yichang, China, 21–23 April 2012.
  13. Chang, R.F.; Chen, W.T.; Wang, J.S. Image sequence coding using adaptive tree-structured vector quantisation with multipath searching. IEEE Proc. I Commun. Speech Vis. 1992, 139, 9–14.
  14. Chou, P.A.; Lookabaugh, T.; Gray, R.M. Optimal pruning with applications to tree-structured source coding and modeling. IEEE Trans. Inf. Theory 1989, 35, 299–315.
  15. Balakrishnan, M.; Pearlman, W.A.; Lu, L. Variable-rate tree-structured vector quantizers. IEEE Trans. Inf. Theory 1995, 41, 917–930.
  16. Yang, S.-B. Variable-branch tree-structured vector quantization. IEEE Trans. Image Process. 2004, 13, 1275–1285.
  17. Chu, C.-P.; Hwang, S.-H.; Chang, S.-C. An improvement to tree-structured vector quantization. Appl. Mech. Mater. 2013, 284–287, 2926–2929.
  18. Huang, S.-H.; Chen, S.-H. Fast encoding algorithm for VQ-based image coding. Electron. Lett. 1990, 26, 1618–1619.
  19. Hsieh, C.-H.; Liu, Y.-J. Fast search algorithms for vector quantization of images using multiple triangle inequalities and wavelet transform. IEEE Trans. Image Process. 2000, 9, 321–328.
  20. Huang, C.-J.; Yeh, C.-Y.; Hwang, S.-H. An improvement of the Triangular Inequality Elimination Algorithm for Vector Quantization. Appl. Math. Inf. Sci. 2015, 9, 229–235.
  21. Yeh, C.-Y. Iterative Triangular Inequality Elimination Algorithm for Codebook Search of Vector Quantization. IEEJ Trans. Electr. Electron. Eng. 2018, 13, 1528–1529.
  22. Ra, S.-W.; Kim, J.-K. A fast mean-distance-ordered partial codebook search algorithm for image vector quantization. IEEE Trans. Circuits Syst. II Analog Digit. Signal Process. 1993, 40, 576–579.
  23. Lu, Z.M.; Sun, S.H. Equal-average equal-variance equal-norm nearest neighbor search algorithm for vector quantization. IEICE Trans. Inf. Syst. 2003, 86, 660–663.
  24. Chen, S.X.; Li, F.W. Fast encoding method for vector quantisation of image using subvector characteristics and Hadamard transform. IET Image Process. 2011, 5, 18–24.
  25. Lin, T.-H.; Yeh, C.-Y.; Hwang, S.-H.; Chang, S.-C. Efficient binary search space-structured VQ encoder applied to a line spectral frequency quantisation in G.729 standard. IET Commun. 2016, 10, 1183–1188.
  26. Yeh, C.-Y. An Efficient VQ Codebook Search Algorithm Applied to AMR-WB Speech Coding. Symmetry 2017, 9, 54.
  27. Yeh, C.-Y. An efficient search algorithm for ISF vector quantization in AMR-WB speech codec. IEEJ Trans. Electr. Electron. Eng. 2016, 11, 829–831.
  28. Yeh, C.-Y.; Huang, H.-H. An Upgraded Version of the Binary Search Space-Structured VQ Search Algorithm for AMR-WB Codec. Symmetry 2019, 11, 283.
  29. ITU-T Recommendation. G.729: Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear Prediction (CS-ACELP); International Telecommunication Union: Geneva, Switzerland, 1996.
  30. Bei, C.-D.; Gray, R.M. An improvement of the minimum distortion encoding algorithm for vector quantization. IEEE Trans. Commun. 1985, 33, 1132–1133.
  31. Aurora Corpus. Available online: http://portal.elda.org/en/catalogues/free-resources/free-lrs-set-1/ (accessed on 15 November 2020).
Figure 1. Structure of the two-stage vector quantization (VQ) for line spectrum frequency (LSF) coefficients.
Figure 2. Structure of the proposed search algorithm.
Figure 3. The squared-error distortion look-up table is designed for each subspace.
Figure 4. Bar graph representation of Table 4.
Figure 5. Bar graph representation of Table 6.
Figure 6. Comparison of the ER and ASN between the BSS-ITIE [28] method and the proposed algorithm.
Table 1. The value of the MA predictor coefficients.

| p0 | k | MA predictor coefficients |
| --- | --- | --- |
| 0 | 1 | 0.2570, 0.2780, 0.2800, 0.2736, 0.2757, 0.2764, 0.2675, 0.2678, 0.2779, 0.2647 |
| | 2 | 0.2142, 0.2194, 0.2331, 0.2230, 0.2272, 0.2252, 0.2148, 0.2123, 0.2115, 0.2096 |
| | 3 | 0.1670, 0.1523, 0.1567, 0.1580, 0.1601, 0.1569, 0.1589, 0.1555, 0.1474, 0.1571 |
| | 4 | 0.1238, 0.0925, 0.0798, 0.0923, 0.0890, 0.0828, 0.1010, 0.0988, 0.0872, 0.1060 |
| 1 | 1 | 0.2360, 0.2405, 0.2499, 0.2495, 0.2517, 0.2591, 0.2636, 0.2625, 0.2551, 0.2310 |
| | 2 | 0.1285, 0.0925, 0.0779, 0.1060, 0.1183, 0.1176, 0.1277, 0.1268, 0.1193, 0.1211 |
| | 3 | 0.0981, 0.0589, 0.0401, 0.0654, 0.0761, 0.0728, 0.0841, 0.0826, 0.0776, 0.0891 |
| | 4 | 0.0923, 0.0486, 0.0287, 0.0498, 0.0526, 0.0482, 0.0621, 0.0636, 0.0584, 0.0794 |
Table 2. Bit allocation of the LSF quantizer.

| Procedure | Order | Code index | Bits |
| --- | --- | --- | --- |
| First stage | 10th order | l1 | 7 |
| Second stage | 5th low order | l2 | 5 |
| | 5th high order | l3 | 5 |
| Selection of predictive filter | | p0 | 1 |
| Total | | | 18 |
Table 3. The selection of the training dataset.

| Experimental conditions | Training set | Testing set | ER | ASN |
| --- | --- | --- | --- | --- |
| Conditions 1 | 1001 clean files | 500 clean files | 0.81% | 18.82 |
| | | 201 files with noise | 3.7% | 18.9 |
| Conditions 2 | 500 clean and 500 files with noise | Training set | 0.7% | 18.8 |
| | | 201 clean files | 2.9% | 19.3 |
| | | 201 files with noise | 2.8% | 18.9 |
| Conditions 3 | 1001 clean and 1001 files with noise | 201 clean files | 0.79% | 20 |
| | | 201 files with noise | 0.8% | 19.9 |
Table 4. Comparison of the computational load and ER of various methods.

| Methods | TQA | ER | ASN | CS |
| --- | --- | --- | --- | --- |
| Full search algorithm | - | 0 | 128 | Benchmark |
| TIE | - | 0 | 78.37 | 38.77% |
| ITIE | - | 0 | 51.21 | 60% |
| BSS-ITIE | 0.90 | 13.85% | 10.39 | 91.88% |
| | 0.91 | 13.10% | 10.84 | 91.53% |
| | 0.92 | 12.20% | 11.35 | 91.13% |
| | 0.93 | 11.32% | 11.88 | 90.72% |
| | 0.94 | 10.38% | 12.54 | 90.20% |
| | 0.95 | 9.28% | 13.28 | 89.63% |
| | 0.96 | 8.28% | 14.11 | 88.98% |
| | 0.97 | 7.02% | 15.23 | 88.10% |
| | 0.98 | 5.97% | 16.53 | 87.09% |
| | 0.99 | 4.66% | 18.47 | 85.57% |
| Proposed | 0.90 | 7.89% | 10.16 | 92.06% |
| | 0.91 | 7.15% | 10.60 | 91.72% |
| | 0.92 | 6.37% | 11.10 | 91.33% |
| | 0.93 | 5.65% | 11.65 | 90.90% |
| | 0.94 | 4.86% | 12.33 | 90.37% |
| | 0.95 | 4.08% | 13.10 | 89.77% |
| | 0.96 | 3.35% | 13.98 | 89.08% |
| | 0.97 | 2.47% | 15.17 | 88.15% |
| | 0.98 | 1.67% | 16.61 | 87.02% |
| | 0.99 | 0.81% | 18.82 | 85.30% |
Table 5. Comparison of the basic operation numbers between the proposed method and other methods.

| Methods | TQA | Additions | MulN | Comparisons | LR |
| --- | --- | --- | --- | --- | --- |
| Full search algorithm | - | 2560 | 1280 | 128 | Benchmark |
| TIE | - | 1654 | 794 | 206 | 38% |
| ITIE | - | 978 | 628 | 369 | 51% |
| BSS-ITIE | 0.99 | 379 | 291 | 106 | 77% |
| Proposed | 0.99 | 518 | 125 | 50 | 90% |
| | 0.95 | 387 | 95 | 37 | 92.5% |
| | 0.94 | 369 | 91 | 36 | 92.8% |
Table 6. Computational savings comparison between the BSS-ITIE [28] method and the proposed algorithm.

| Methods | TQA | ER | ASN | MulN | LR | CS |
| --- | --- | --- | --- | --- | --- | --- |
| BSS-ITIE | 0.99 | 4.66% | 18.47 | 291 | Benchmark | Benchmark |
| Proposed | 0.99 | 0.81% | 18.82 | 125 | 57% | 0 |
| | 0.95 | 4.08% | 13.10 | 95 | 67% | 29% |
| | 0.94 | 4.86% | 12.33 | 91 | 69% | 33% |
Table 7. Average vector quantization error comparison between the BSS-ITIE [28] method and the proposed method.

| TQA | BSS-ITIE | Proposed |
| --- | --- | --- |
| 0.90 | 0.2324 | 0.0974 |
| 0.91 | 0.2166 | 0.0872 |
| 0.92 | 0.1993 | 0.0764 |
| 0.93 | 0.1821 | 0.0673 |
| 0.94 | 0.1648 | 0.0572 |
| 0.95 | 0.1450 | 0.0475 |
| 0.96 | 0.1273 | 0.0386 |
| 0.97 | 0.1074 | 0.0278 |
| 0.98 | 0.0905 | 0.0185 |
| 0.99 | 0.0695 | 0.0089 |