1. Introduction
With a 16 kHz sampling rate, the adaptive multi-rate wideband (AMR-WB) speech codec [1,2,3,4] is one of the speech codecs applied to modern mobile communication systems as a way to remarkably improve the speech quality on handheld devices. AMR-WB is developed on the basis of the algebraic code-excited linear-prediction (ACELP) coding technique [4,5], and provides nine coding modes with bitrates of 23.85, 23.05, 19.85, 18.25, 15.85, 14.25, 12.65, 8.85, and 6.6 kbps. The ACELP-based technique is an excellent speech coding technique with the dual advantages of low bit rates and high speech quality, but the price paid is the high computational complexity required by an AMR-WB codec. Using an AMR-WB speech codec, the speech quality of a smartphone can be improved, but at the cost of high battery power consumption.
In an AMR-WB encoder, the vector quantization (VQ) of immittance spectral frequency (ISF) coefficients [6,7,8,9,10] accounts for a significant computational load in the various coding modes. The VQ structure in AMR-WB adopts a combination of the split VQ (SVQ) and multi-stage VQ (MSVQ) techniques, referred to as split-multistage VQ (S-MSVQ), to quantize the 16-order ISF coefficients [1]. Conventionally, VQ uses a full search to obtain the codeword best matched to an arbitrary input vector, but the full search requires an enormous computational load. Therefore, many studies [11,12,13,14,15,16,17,18,19] have made efforts to simplify the search complexity of the encoding process. These approaches are classified into two types in terms of the manner in which the complexity is reduced: triangular inequality elimination (TIE)-based approaches [11,12,13,14] and equal-average equal-variance equal-norm nearest neighbor search (EEENNS)-based algorithms [15,16,17,18,19].
In [11], a TIE algorithm is proposed to address the computational load issue in VQ-based image coding. Improved versions of the TIE approach are presented in [12,13] to reduce the scope of the search space, giving rise to a further reduction in the computational load. There exists a high correlation between the ISF coefficients of neighboring frames in AMR-WB; that is, the ISF coefficients evolve smoothly over successive frames. This feature benefits TIE-based VQ encoding, for which a considerable computational load reduction has been demonstrated. However, in AMR-WB a moving average (MA) filter is employed to smooth the data in advance of the VQ encoding of ISF coefficients, meaning that the high-correlation feature is removed and the performance in computational load reduction deteriorates. Recently, a TIE algorithm equipped with a dynamic and an intersection mechanism, named DI-TIE, was proposed to effectively simplify the search load, and this algorithm has been validated as the best candidate among the TIE-based approaches so far. On the other hand, the EEENNS algorithm was derived from the equal-average nearest neighbor search (ENNS) and equal-average equal-variance nearest neighbor search (EENNS) approaches [15,16,17,18,19]. In contrast to TIE-based approaches, the EEENNS algorithm uses three significant features of a vector, i.e., mean, variance, and norm, as a three-level elimination criterion to reject impossible codewords.
Furthermore, a binary search space-structured VQ (BSS-VQ) is presented in [20] as a simple as well as efficient way to quantize the line spectral frequency (LSF) coefficients in the ITU-T G.729 speech codec [5]. That work demonstrated that a significant computational load reduction can be achieved while the speech quality is well maintained. In view of this, this paper applies the BSS-VQ search algorithm to the ISF coefficient quantization in AMR-WB. One aim of this work is to verify whether the performance superiority of the BSS-VQ algorithm remains, since the VQ structure in AMR-WB differs from that in G.729. Another major motivation is to meet the energy saving requirement of handheld devices, e.g., smartphones, for an extended operation time.
The rest of this paper is outlined as follows. Section 2 gives the description of the ISF coefficient quantization in AMR-WB. The BSS-VQ algorithm for ISF quantization is presented in Section 3. Section 4 demonstrates the experimental results and discussion. This work is summarized at the end of this paper.
3. BSS-VQ Algorithm for ISF Quantization
The basis of the BSS-VQ algorithm is that, as a prerequisite of a VQ codebook search, an input vector is efficiently assigned to a subspace, where a small number of codeword searches is carried out, using a combination of a fast locating technique and lookup tables. In this manner, a significant computational load reduction can be achieved.
At the start of this algorithm, each dimension is dichotomized into two subspaces, and an input vector is then assigned to a corresponding subspace according to its entries. This idea is illustrated in the following example. There are 2^{9} = 512 subspaces for the 9-dimensional subvector r_{1}(n) associated with codebook CB1, and an input vector can then be assigned to one of the 512 subspaces by means of a dichotomy on each of its entries. Finally, VQ encoding is performed using a pre-built lookup table containing statistical information on the sought codewords.
In this proposal, the lookup table in each subspace is pre-built, which requires a large amount of training data. The training and encoding procedures of BSS-VQ are illustrated below with the example of the 9-dimensional, 256-entry codebook CB1 in AMR-WB.
3.1. BSS Generation with Dichotomy Splitting
As a preliminary step of the training procedure, each dimension is dichotomized into two subspaces, and the dichotomy position is defined as the mean of all the codewords contained in a codebook, formulated as:

$dp(j)=\frac{1}{CSize}{\sum }_{i=1}^{CSize}{c}_{i}(j),\quad 0\le j<Dim$(8)

where ${c}_{i}(j)$ represents the jth component of the ith codeword c_{i}, and dp(j) the mean value of all the jth components. Taking the codebook CB1 as an instance, CSize = 256 and Dim = 9. As listed in Table 3, all the dp(j) values are saved and then presented in a tabular form.
Subsequently, for vector quantization of the nth input vector x_{n}, a quantity ν_{n}(j) is defined as:

${\nu }_{n}(j)=\left\{\begin{array}{ll}{2}^{j},& \text{if }{x}_{n}(j)>dp(j)\\ 0,& \text{otherwise}\end{array}\right.$(9)

where x_{n}(j) denotes the jth component of x_{n}. Then x_{n} is assigned to subspace k (bss_{k}), with k given as the sum of ν_{n}(j) over the entire dimensions, formulated as:

$k={\sum }_{j=0}^{Dim-1}{\nu }_{n}(j)$(10)
In this study, 0 ≤ k < BSize and BSize = 2^{9} = 512 represents the total number of subspaces. Taking an input vector x_{n} = {20.0, 20.1, 20.2, 20.3, 20.4, 20.5, 20.6, 20.7, 20.8} as an instance, ν_{n}(j) = {2^{0}, 2^{1}, 2^{2}, 0, 0, 0, 0, 0, 2^{8}} for each j, 0 ≤ j ≤ 8, and k = 263 can be obtained by Equations (9) and (10) respectively. Thus, the input vector x_{n} is assigned to the subspace bss_{k} with k = 263.
By means of Equations (9) and (10), it is noted that this algorithm requires only a small number of basic operations, i.e., comparison, shift, and addition, such that an input vector is assigned to a subspace in a highly efficient manner.
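The shift-and-add subspace assignment described above can be sketched in Python as follows. This is a minimal illustration with a toy random codebook standing in for CB1, not the AMR-WB reference implementation, and the comparison direction x_{n}(j) > dp(j) in Equation (9) is an assumption of this sketch:

```python
import numpy as np

def dichotomy_positions(codebook):
    """Equation (8): dp(j) is the mean of the j-th components
    of all CSize codewords in the codebook."""
    return codebook.mean(axis=0)

def subspace_index(x, dp):
    """Equations (9)-(10): nu_n(j) = 2**j when x(j) lies above the
    dichotomy position dp(j), else 0; the subspace index k is the sum."""
    k = 0
    for j in range(len(x)):
        if x[j] > dp[j]:      # one comparison per dimension ...
            k += 1 << j       # ... then a shift-and-add
    return k

# Toy 9-dimensional codebook with 256 entries, standing in for CB1
rng = np.random.default_rng(7)
cb = rng.normal(size=(256, 9))
dp = dichotomy_positions(cb)
k = subspace_index(rng.normal(size=9), dp)   # 0 <= k < 2**9 = 512
```

Once dp(j) is computed and tabulated, each assignment costs only Dim comparisons, shifts, and additions, matching the efficiency claim above.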
3.2. Training Procedure of BSS-VQ
Following the determination of the dichotomy position for each dimension, a training procedure is performed to build a lookup table in each subspace. The lookup tables give the probability that each codeword serves as the best-matched codeword in each subspace, referred to for short as the hit probability of a codeword in a subspace.
The training procedure is stated below as Algorithm 1. A large speech database, covering a diversity of contents and multiple speakers, is employed as the training data; it takes up more than 1.56 GB of memory, spans more than 876 min, and comprises a total of 2,630,045 speech frames.
Algorithm 1: Training procedure of BSS-VQ

Step 1. Initial setting: assign each codeword to all the subspaces, and then set the probability that the codeword c_{i} corresponds to the best-matched codeword in bss_{k} to zero, i.e., ${P}_{hit}({c}_{i}|bs{s}_{k})=0$, 1 ≤ i ≤ CSize, 0 ≤ k < BSize.

Step 2. Referencing Table 3 and through Equations (9) and (10), an input vector is efficiently assigned to bss_{k}.

Step 3. A full search is conducted over all the codewords according to the Euclidean distance, given as:

$D({x}_{n},{c}_{i})={\sum }_{j=0}^{Dim-1}{({x}_{n}(j)-{c}_{i}(j))}^{2}$(11)

and the optimal codeword c_{opt} satisfies:

${c}_{opt}=\underset{{c}_{i}}{\mathrm{argmin}}\text{ }D({x}_{n},{c}_{i})$(12)

Step 4. Update the statistics of the optimal codeword, that is, ${P}_{hit}({c}_{opt}|bs{s}_{k})$.

Step 5. Repeat Steps 2–4 until the training has been performed on all the input vectors.
A lookup table is built for each subspace following the completion of the training procedure. The lookup table gives the hit probability of each codeword in a subspace. For sorting purposes, a quantity ${P}_{hit}(m|bs{s}_{k})$, 1 ≤ m ≤ CSize, is defined as the mth-ranked probability that a codeword hits the best-matched codeword in subspace bss_{k}. Taking m = 1 as an instance, ${P}_{hit}(m|bs{s}_{k}){|}_{m=1}=\underset{{c}_{i}}{\mathrm{max}}\{{P}_{hit}({c}_{i}|bs{s}_{k})\}$, namely, the highest hit probability in bss_{k}. As it turns out, the lookup table in each subspace gives the hit probabilities ranked in descending order together with the corresponding codewords.
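A compact sketch of this training pass, including the per-subspace descending sort, might look like the following Python. All function and variable names here are illustrative; each training vector's full-search winner is counted, and the counts are normalized into a sorted hit-probability table:

```python
import numpy as np

def train_lookup_tables(codebook, train_vectors):
    """Sketch of Algorithm 1: for each training vector, locate its
    subspace, run a full search, and count the winning codeword;
    then sort each subspace's hit probabilities in descending order."""
    dim = codebook.shape[1]
    bsize = 2 ** dim                              # number of subspaces
    dp = codebook.mean(axis=0)                    # dichotomy positions, Equation (8)
    counts = np.zeros((bsize, len(codebook)))
    for x in train_vectors:
        k = sum(1 << j for j in range(dim) if x[j] > dp[j])  # Eqs. (9)-(10)
        d = ((codebook - x) ** 2).sum(axis=1)     # distances to all codewords
        counts[k, int(d.argmin())] += 1           # hit for the best-matched codeword
    tables = []
    for k in range(bsize):
        total = counts[k].sum()
        p = counts[k] / total if total else counts[k]
        order = np.argsort(-p)                    # descending hit probability
        tables.append((order, p[order]))          # (codeword indices, P_hit(m|bss_k))
    return dp, tables
```

In practice the training corpus is the large speech database described above; the toy codebook here is only for shape checking.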
3.3. Encoding Procedure of BSS-VQ
In the encoding procedure of BSS-VQ, the cumulative probability ${P}_{cum}(M|bs{s}_{k})$ is first defined as the sum of the top M ${P}_{hit}(m|bs{s}_{k})$ in bss_{k}, that is:

${P}_{cum}(M|bs{s}_{k})={\sum }_{m=1}^{M}{P}_{hit}(m|bs{s}_{k})$(13)

Subsequently, given a threshold of quantization accuracy (TQA), the quantity M_{k}(TQA) represents the minimum value of M that satisfies the condition ${P}_{cum}(M|bs{s}_{k})\ge TQA$ in bss_{k}, that is:

${M}_{k}(TQA)=\underset{M}{\mathrm{min}}\{M\text{ }|\text{ }{P}_{cum}(M|bs{s}_{k})\ge TQA\}$(14)

For a given TQA, a total of 512 M_{k}(TQA)s are evaluated by Equation (14) over all the subspaces, and the mean value is then given as:

$\overline{M}(TQA)=\frac{1}{BSize}{\sum }_{k=0}^{BSize-1}{M}_{k}(TQA)$(15)
Illustrated in Figure 1 is a plot of the average number of searches $\overline{M}(TQA)$ for values of TQA ranging between 0.90 and 0.99. Taking TQA = 0.95 as an instance, a mere average of 14.58 codeword searches is required to reach a search accuracy as high as 95%. In simple terms, the search performance can be significantly improved at the cost of a small drop in search accuracy. Furthermore, a tradeoff can be made instantly between the quantization accuracy and the search load according to Figure 1. The BSS-VQ encoding procedure is described below as Algorithm 2.
Algorithm 2: Encoding procedure of BSS-VQ

Step 1. Given a TQA, M_{k}(TQA) satisfying Equation (14) is found directly in the lookup table of bss_{k}.

Step 2. Referencing Table 3 and by means of Equations (9) and (10), an input vector is assigned to a subspace bss_{k} in an efficient manner.

Step 3. A full search for the best-matched codeword is performed over the top M_{k}(TQA) sorted codewords in bss_{k}, and the index of the found codeword is output.

Step 4. Repeat Steps 2 and 3 until all the input vectors are encoded.
The BSS-VQ algorithm is briefly summarized as follows. Table 3 is the outcome of performing Equation (8) and is saved as the first lookup table. Subsequently, the second lookup table, containing ${P}_{hit}(m|bs{s}_{k})$ and the corresponding codewords, is built for each subspace according to the training procedure. Accordingly, the VQ encoding can be performed using Algorithm 2.
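Putting the pieces together, the encoding pass of Algorithm 2 might be sketched as follows. Here dp, order, and m_of_k stand for the first lookup table, the per-subspace codeword ranking, and pre-fetched M_{k}(TQA) values respectively; all names are illustrative rather than taken from the AMR-WB reference code:

```python
import numpy as np

def bssvq_encode(x, codebook, dp, order, m_of_k):
    """Sketch of Algorithm 2: assign x to subspace k via the dichotomy
    of Equations (9)-(10), then run a full search over only the top
    M_k(TQA) ranked codewords of that subspace."""
    k = sum(1 << j for j in range(len(x)) if x[j] > dp[j])
    candidates = order[k][: m_of_k[k]]            # top-M_k codeword indices
    d = ((codebook[candidates] - x) ** 2).sum(axis=1)
    return int(candidates[int(d.argmin())])       # index of the best match

# Toy setup: 2-dimensional codebook, every subspace ranks all codewords
cb = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
dp = cb.mean(axis=0)                              # dichotomy positions [1.5, 1.5]
order = [np.arange(len(cb))] * 4                  # 2**2 = 4 subspaces
m_of_k = [len(cb)] * 4                            # search all, i.e., TQA = 1.0
idx = bssvq_encode(np.array([2.1, 2.1]), cb, dp, order, m_of_k)
```

With m_of_k entries far smaller than CSize, the distance computation shrinks from CSize codewords to M_{k}(TQA) codewords per subvector, which is the source of the search load reduction reported in Section 4.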
4. Experimental Results
There are three experiments conducted in this work. The first is a search load comparison among various search approaches. The second is a quantization accuracy (QA) comparison among a full search and other search approaches. The third is a performance comparison among various approaches in terms of ITUT P.862 perceptual evaluation of speech quality (PESQ) [
21] as an objective measure of speech quality. A speech database, completely different from all the training data, is employed for outside testing purposes. With one male and one female speaker, the speech database in total takes up more than 221 MB of memory, occupies more than 120 min, and covers 363,281 speech frames.
Firstly, Table 4 lists a comparison of the average number of searches among the full search, multiple TIE (MTIE) [13], DI-TIE, and EEENNS approaches, while Table 5 gives the search load of the BSS-VQ algorithm for various TQA values. Moreover, with the search load required by the full search algorithm as a benchmark, Table 6 and Table 7 present comparisons of the load reduction (LR) with respect to Table 4 and Table 5. A high value of LR reflects a high search load reduction. Table 6 indicates that DI-TIE provides a higher LR than the MTIE and EEENNS search approaches over all the codebooks. It is also found, by an inspection of Table 6 and Table 7, that most LR values of BSS-VQ are higher than those of the DI-TIE approach. For example, the LR values of BSS-VQ are higher than those of DI-TIE when the TQA is equal to or smaller than 0.99, 0.98, 0.96, and 0.99 in codebooks CB1, CB2, CB21, and CB22, respectively. Accordingly, a remarkable search load reduction is achieved by the BSS-VQ search algorithm.
In the QA aspect, 100% QA is obtained by the MTIE, DI-TIE, and EEENNS algorithms as compared with the full search approach. Thus, only the QA experiment of BSS-VQ is conducted. The QA of the BSS-VQ algorithm for various TQA values is given in Table 8. It reveals that the QA approximates the TQA in both the inside and outside testing cases. Moreover, this algorithm provides an LR between 77.78% and 93.98% at TQA = 0.90, and an LR between 67.23% and 88.39% at TQA = 0.99, depending on the codebook. In other words, a tradeoff can be made between the quantization accuracy and the search load.
Furthermore, an overall LR is evaluated to observe the total search load of the entire VQ encoding procedure for an input vector. The overall LR refers to the total search load, defined as the sum over all codebooks of the average number of searches multiplied by the vector dimension of each codebook. An overall LR comparison with the full search as a benchmark is presented as a bar graph in Figure 2. As clearly indicated in Figure 2, the overall LR of BSS-VQ is higher than those of the MTIE, DI-TIE, and EEENNS approaches, while at the same time the QA remains as high as 0.98. Moreover, Table 9 gives a PESQ comparison, including the mean and the standard deviation (STD), among the various approaches. Since MTIE, DI-TIE, and EEENNS provide 100% QA, they all share the same PESQ as the full search, meaning that there is no deterioration in the speech quality. A close observation reveals little difference between the PESQ values obtained with the full search and with this search algorithm; that is, the speech quality is well maintained by BSS-VQ at TQA not less than 0.90. The BSS-VQ search algorithm is thus experimentally validated as a superior candidate relative to its counterparts.
5. Conclusions
This paper presents a BSS-VQ codebook search algorithm for the ISF vector quantization in the AMR-WB speech codec. Using a combination of a fast locating technique and lookup tables, an input vector is efficiently assigned to a search subspace, where a small number of codeword searches is carried out, and the aim of a remarkable search load reduction is reached as a consequence. In particular, a tradeoff can be made between the quantization accuracy and the search load to meet a user's needs when performing VQ encoding. The BSS-VQ search algorithm, providing a considerable search load reduction as well as nearly lossless speech quality, is experimentally validated as superior to the MTIE, DI-TIE, and EEENNS approaches. Furthermore, the improved AMR-WB speech codec can be adopted to upgrade the VoIP performance of a smartphone. As a consequence, the energy efficiency required for an extended operation time is achieved due to the computational load reduction.