Article

Pyramid Product Quantization for Approximate Nearest Neighbor Search

Department of Information Science, Xi’an University of Technology, Xi’an 710048, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(2), 853; https://doi.org/10.3390/app16020853
Submission received: 11 December 2025 / Revised: 12 January 2026 / Accepted: 12 January 2026 / Published: 14 January 2026
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Abstract

Product quantization (PQ) is a widely adopted technique for efficient approximate nearest neighbor (ANN) search in high-dimensional spaces, offering a favorable balance between accuracy and memory efficiency. However, standard PQ incurs a high online computational cost when the number of subspaces is large. To address this limitation, we propose Pyramid Product Quantization (PPQ), a novel adaptive quantization framework that dynamically selects the most suitable number of subspaces for different segments of each data vector. This reduces the number of addition operations required during approximate distance computation, significantly accelerating online search. Experimental results demonstrate that the proposed PPQ method effectively lowers the computational complexity of product quantization and its variants without compromising retrieval accuracy.

1. Introduction

Product quantization (PQ) [1] has become a cornerstone of large-scale approximate nearest neighbor (ANN) search and is widely used in real-world applications, including recommendation systems [2], image retrieval [3], and object recognition [4], thanks to its ability to compress high-dimensional vectors into compact codes while enabling efficient distance approximation via asymmetric distance computation (ADC). In standard PQ, the vector space is partitioned into non-overlapping subspaces of equal dimension, each independently quantized using a dedicated sub-codebook. The ADC distance between a query and a database vector is then approximated as the sum of per-subspace distances.
However, this rigid partitioning assumes that all vector segments are equally amenable to quantization—a premise that often fails in practice. Real-world data exhibit local heterogeneity: some segments are highly structured and easily compressed, while others are noisy or complex, requiring finer representation. Forcing a uniform fixed subspace decomposition leads to either over-quantization (wasting bits on simple segments) or under-quantization (losing fidelity on complex ones). More critically, during online search, every subspace contributes an addition operation in ADC, regardless of its informativeness—introducing unnecessary computational overhead.
To address this limitation, we propose Pyramid Product Quantization (PPQ), an adaptive encoding framework that dynamically merges adjacent subspaces on a per-vector basis. For each database vector, PPQ evaluates multiple quantization configurations—ranging from fine-grained to coarse-grained—and selects, for each segment, the granularity that minimizes local reconstruction error.
We emphasize that PPQ operates entirely within the standard PQ pipeline: it requires no changes to indexing structures (e.g., IVF) and is compatible with advanced variants like OPQ and MRPQ. Experimental results on SIFT1M and GIST1M show that PPQ achieves a faster query time than conventional PQ at identical recall, with negligible memory overhead. Moreover, PPQ can also be combined with other PQ-based ANN methods to achieve faster retrieval.
The rest of the paper is organized as follows. Section 2 briefly introduces the preliminaries of basic product quantization. In Section 3, our proposed Pyramid Product Quantization method is described in detail. Section 4 presents the experimental results. Finally, conclusions are drawn in Section 5.

2. Related Works

2.1. Vector Quantization (VQ)

Vector quantization (VQ) [5] is a well-established technique for data compression. Within the VQ framework, several specialized approaches have been developed, such as linear-combination VQ [6,7,8] and Cartesian-product VQ [9,10,11]. The core idea of VQ is to partition the data space $\mathbb{R}^D$ into a set of Voronoi cells $S = \{S_j \mid j \in [1, K]\}$, each indexed by $j$. Every database vector is then approximated by the centroid of the cell it belongs to. These centroids are referred to as codewords, and the collection of all codewords forms a codebook $C$. The quantization of a vector $x$ is denoted as $Q(x)$, which maps $x$ to its corresponding codeword in the codebook:
$$Q(x) = c_j, \quad x \in S_j,\ c_j \in C. \qquad (1)$$
A vector quantizer comprises an encoder and a decoder. As shown in Equation (1), the encoder performs the encoding step by assigning each vector in the data space $\mathbb{R}^D$ to a corresponding entry in the codebook. Typically, the codebook contains $K$ codewords, where $K \ll N$ and $N$ denotes the total number of vectors in the database.
Instead of storing the full high-dimensional vectors, only the indexes of their corresponding codewords are retained, which leads to substantial memory savings. Consequently, each database vector requires just $\log_2 K$ bits for storage.
The decoding process can be performed using the codeword index together with the codebook $C$. The performance of the quantizer is typically evaluated by its quantization distortion, commonly defined as the mean squared error (MSE) $\delta_{VQ}$:
$$\delta_{VQ} = \frac{1}{N} \sum_{i=1}^{N} \left\| x_i - Q(x_i) \right\|_2^2 \qquad (2)$$
where $N$ is the number of database vectors.
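As a concrete illustration, the encoder, decoder, and the distortion of Equation (2) can be sketched in a few lines of NumPy. The random codebook and the dimensions below are placeholders of our choosing; in practice the codebook would be learned (e.g., with k-means):

```python
import numpy as np

# Minimal VQ sketch (illustrative only; codebook would normally be trained).
rng = np.random.default_rng(0)
D, K, N = 8, 4, 100                    # dimensionality, codebook size, database size
codebook = rng.normal(size=(K, D))     # K codewords c_j forming the codebook C
database = rng.normal(size=(N, D))

def quantize(x, codebook):
    """Encoder: map x to the index of its nearest codeword (its Voronoi cell)."""
    dists = np.sum((codebook - x) ** 2, axis=1)
    return int(np.argmin(dists))

# Encode every database vector; only log2(K) = 2 bits per vector need storing.
codes = np.array([quantize(x, codebook) for x in database])

# Decoder + distortion: mean squared error between vectors and their codewords.
reconstructions = codebook[codes]
mse = np.mean(np.sum((database - reconstructions) ** 2, axis=1))
```

With a larger, trained codebook the distortion drops, at the cost of a more expensive encoding step — the tension that product quantization addresses next.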

2.2. Product Quantization (PQ)

When applied to high-dimensional and large-scale data, conventional vector quantization (VQ) suffers from high encoding complexity. Achieving low quantization distortion typically demands a large codebook containing $K$ centroids. For practical storage and indexing, $K$ is commonly set to a power of two, enabling each vector to be encoded as a binary code of length $\log_2 K$.
A widely adopted strategy to alleviate this complexity is to partition the original data space into multiple low-dimensional subspaces, allowing each sub-vector to be quantized independently using its own sub-codebook. Specifically, product quantization (PQ) decomposes the $D$-dimensional space into a Cartesian product of $M$ subspaces $(C^1, C^2, \ldots, C^M)$, each of dimensionality $d = D/M$. This yields $M$ sub-codebooks, each of size $k$, and the full codebook $C$ for the original space is constructed as the Cartesian product of these sub-codebooks:
$$C = C^1 \times C^2 \times \cdots \times C^M \qquad (3)$$
PQ’s quantization performance is typically assessed using the following equation:
$$\delta_{PQ} = \frac{1}{N} \sum_{i=1}^{N} \sum_{m=1}^{M} \left\| x_i^m - Q(x_i^m) \right\|_2^2 \qquad (4)$$
Assuming the total code length for a database vector remains $\log_2 K$, as in standard vector quantization, each sub-vector is assigned $(\log_2 K)/M = \log_2 k$ bits for its index. Here, $K = k^M$, so the full index of a vector is formed by the Cartesian product of $M$ sub-indexes. Consequently, the entire codebook only requires storing $k \cdot d \cdot M = k \cdot D$ floating-point values (since $d = D/M$). As $M$ increases, $k$ becomes significantly smaller than $K$, leading to a substantial reduction in encoding complexity [11].
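The decomposition above can be sketched as follows. The random sub-codebooks and the small dimensions are placeholders (a real system trains each sub-codebook with k-means per subspace):

```python
import numpy as np

# Product-quantization encoding sketch (illustrative shapes and codebooks).
rng = np.random.default_rng(1)
D, M, k = 16, 4, 8                    # dimension, subspaces, sub-codebook size
d = D // M                            # per-subspace dimensionality
sub_codebooks = rng.normal(size=(M, k, d))   # M sub-codebooks of k codewords each

def pq_encode(x):
    """Encode x as M sub-indexes; the implicit full codebook is the
    Cartesian product C = C^1 x ... x C^M with k**M effective codewords."""
    code = np.empty(M, dtype=np.int64)
    for m in range(M):
        sub = x[m * d:(m + 1) * d]
        code[m] = np.argmin(np.sum((sub_codebooks[m] - sub) ** 2, axis=1))
    return code

x = rng.normal(size=D)
code = pq_encode(x)
# Storage: M * log2(k) = 4 * 3 = 12 bits per vector, yet the effective
# codebook holds k**M = 4096 codewords -- the source of PQ's efficiency.
```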

3. The Proposed Pyramid Product Quantization (PPQ)

3.1. Motivation

In PQ-based approximate nearest neighbor (ANN) search, the online computational cost primarily consists of two components: (1) computing a look-up table that stores the squared distances between $q$ and all codewords in the PQ codebook, and (2) computing the asymmetric distance $d(q, Q(x_j))$ between $q$ and each database vector $x_j$.
When a query vector $q$ is received online, the squared distance between its sub-vector $u^i(q)$ and each codeword in the $i$-th subspace is calculated online and stored in $M$ look-up tables $\{T^1, T^2, \ldots, T^M\}$. Therefore, when computing the squared distance $d(u^i(q), u^i(x_j))$ between $q$ and a database vector $x_j$ in the $i$-th subspace, one only needs to use the index of $u^i(x_j)$ to look up a precomputed distance stored in table $T^i$. Consequently, the total quantized distance $d(q, Q(x_j))$ is obtained simply by summing the $M$ distances retrieved from the look-up tables across all subspaces. Table 1 shows the computational requirements for these two stages.
As shown in Table 1, the complexity of the first component depends on k (the size of each sub-codebook), while that of the second scales with both M (the number of subspaces) and N (the database size). In large-scale retrieval scenarios, where N k , the second component dominates the overall computational cost. Therefore, if the encoding structure can be optimized—while keeping quantization error unchanged—to reduce the computational burden of the second component, the overall complexity can be significantly lowered, leading to faster retrieval.
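The two online stages described above can be sketched as follows (the random codebooks and pre-stored codes are placeholders; the point is that stage 1 scales with $k$ and stage 2 with $M \cdot N$):

```python
import numpy as np

# Two-stage ADC sketch with illustrative shapes and random pre-stored codes.
rng = np.random.default_rng(2)
D, M, k, N = 16, 4, 8, 50
d = D // M
sub_codebooks = rng.normal(size=(M, k, d))
codes = rng.integers(0, k, size=(N, M))   # PQ codes of the database, stored offline
q = rng.normal(size=D)

# Stage 1: one k-entry look-up table per subspace (cost grows with k, not N).
tables = np.stack([
    np.sum((sub_codebooks[m] - q[m * d:(m + 1) * d]) ** 2, axis=1)
    for m in range(M)
])                                        # shape (M, k)

# Stage 2: for each database vector, M table look-ups and M-1 additions.
adc = tables[np.arange(M), codes].sum(axis=1)   # shape (N,)
```

With $N \gg k$, stage 2 dominates, which is why reducing the number of summed subspaces, as PPQ does, pays off.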
Inspired by this insight, we introduce a new product quantization method that leverages a pyramid-style quantization structure, named Pyramid Product Quantization (PPQ). Our method strategically optimizes the allocation of subspaces during encoding, using fewer subspaces in certain parts of the quantization structure. This design reduces the number of addition operations required when computing asymmetric distances between $q$ and each database vector $x$, thereby effectively accelerating the search process.

3.2. Pyramid Encoding Structure

In the pyramid encoding structure, $p$ product quantizers with different numbers of subspaces are employed. The numbers of subspaces of the product quantizers are $M, M/2, M/4, \ldots, M/2^{p-1}$, and their sub-codebook sizes are $k_1, k_2, \ldots, k_p$, respectively.
Take the first product quantizer, which has $M$ subspaces, for example. In the preprocessing, we decompose a $D$-dimensional database vector $x$ into $M$ sub-vectors: $x = (u^1(x), u^2(x), \ldots, u^M(x))$, as shown in Equations (5) and (6).
$$
\begin{aligned}
u^1(x) &= (x_1, x_2, \ldots, x_d) \\
u^2(x) &= (x_{d+1}, x_{d+2}, \ldots, x_{2d}) \\
&\;\vdots \\
u^M(x) &= (x_{(M-1)d+1}, x_{(M-1)d+2}, \ldots, x_{Md})
\end{aligned} \qquad (5)
$$
$$d = \frac{D}{M} \qquad (6)$$
where $d$ denotes the dimensionality of each sub-vector and $M$ is the number of subspaces.
In order to use fewer subspaces in the PQ-based ANN schemes, as shown in Figure 1, we first apply PQ encoding offline with multiple configurations of subspaces to the dataset, which we refer to as a pyramid encoding structure.

3.3. Database Encoding Strategy

After applying product quantization p times, each data sample x is represented by p combinations of different numbers of codewords. To illustrate how the pyramid structure optimizes product quantization, we take as an example two PQ configurations with M and M/2 subspaces, respectively.
As shown in Figure 2, after two quantizations, the sample vector $x$ can be reconstructed either as $x'$ using $M$ codewords of dimension $d = D/M$, or as $x''$ using $M/2$ codewords of dimension $2d = 2D/M$. In each subspace, quantization errors exist between $x$ and its reconstructions $x'$ and $x''$.
Which quantization scheme in Figure 2 performs better? This can be assessed by comparing their quantization errors. Specifically, a $2d$-dimensional segment of the vector $x$, denoted as $x_s = (x_{2(i-1)d+1}, x_{2(i-1)d+2}, \ldots, x_{2id})$, is quantized in two different configurations:
(1)
In the product quantizer with $M$ subspaces, $x_s$ is split between the $(2i-1)$-th and $2i$-th subspaces. The squared quantization error over these two subspaces is given as follows:
$$E_1^2 = \sum_{j=1}^{2d} \left( x_{2(i-1)d+j} - x'_{2(i-1)d+j} \right)^2 \qquad (7)$$
(2)
In the product quantizer with $M/2$ subspaces, $x_s$ resides entirely within the $i$-th subspace. The squared quantization error in this subspace is given as follows:
$$E_2^2 = \sum_{j=1}^{2d} \left( x_{2(i-1)d+j} - x''_{2(i-1)d+j} \right)^2 \qquad (8)$$
If the squared quantization error $E_1^2 < E_2^2$, it indicates that the first configuration reconstructs $x_s$ more accurately. In this case, we adopt the first configuration for these $2d$ dimensions. Conversely, if $E_1^2 > E_2^2$, we choose the second configuration.
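This per-segment comparison can be sketched as follows. The codebooks are random placeholders, and the `nearest` helper is our own illustrative name; the logic mirrors Equations (7) and (8):

```python
import numpy as np

# Compare quantizing one 2d-dimensional segment x_s as two d-dimensional
# halves (fine quantizer, M subspaces) vs. as a whole (coarse quantizer,
# M/2 subspaces), and keep whichever reconstruction is closer.
rng = np.random.default_rng(3)
d, k_fine, k_coarse = 4, 8, 64
fine_cb = rng.normal(size=(2, k_fine, d))       # two adjacent fine sub-codebooks
coarse_cb = rng.normal(size=(k_coarse, 2 * d))  # one merged coarse sub-codebook
x_s = rng.normal(size=2 * d)

def nearest(cb, v):
    """Return (index, squared error) of the codeword in cb closest to v."""
    errs = np.sum((cb - v) ** 2, axis=1)
    j = int(np.argmin(errs))
    return j, errs[j]

_, e_a = nearest(fine_cb[0], x_s[:d])
_, e_b = nearest(fine_cb[1], x_s[d:])
E1_sq = e_a + e_b                       # error of the M-subspace configuration
_, E2_sq = nearest(coarse_cb, x_s)      # error of the M/2-subspace configuration
use_coarse = E2_sq < E1_sq              # pick the better mode for this segment
```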
Similarly, for other numbers of subspaces, we perform pairwise comparisons of quantization errors and select the most suitable product quantizer for encoding each segment of a database vector. During the offline preprocessing phase, we apply this encoding strategy to the entire dataset, such that each sample is represented as a combination of outputs from multiple product quantizers. Moreover, each sample's encoding pattern $pt$, an $M$-dimensional vector indicating, for each subspace, which of the $p$ product quantizers was used for quantization, is stored together with the corresponding codeword indexes.
Algorithm 1 in the table below presents the encoding procedure for the dataset.
Algorithm 1 Encoding the database vectors
Input:
  • $D$-dimensional database $X$; $p$ PQ codebooks with different numbers of subspaces $\{C^1, C^2, \ldots, C^p\}$, where $C^i = C_1^i \times C_2^i \times \cdots \times C_{M/2^{i-1}}^i$, $i = 1, 2, \ldots, p$.
Output:
  • Indexes for the database vectors $I_x = \{I_{x_1}, I_{x_2}, \ldots, I_{x_N}\}$, where $I_{x_n} = \{I_{x_n}^1, I_{x_n}^2, \ldots, I_{x_n}^p\}$ and $I_{x_n}^j = (I_{n,1}^j, I_{n,2}^j, \ldots, I_{n,M/2^{j-1}}^j)$; the encoding pattern vectors $pt$.
1. for  $n = 1;\ n \le N;\ n{+}{+}$  do
2.   Initialize the encoding pattern $pt_n$ as an $M$-dimensional zero vector;
3.   for  $P = 1;\ P \le p;\ P{+}{+}$  do
4.     Decompose $x_n$ into $M/2^{P-1}$ sub-vectors $u^1(x_n), u^2(x_n), \ldots, u^{M/2^{P-1}}(x_n)$;
5.     for  $Q = 1;\ Q \le M/2^{P-1};\ Q{+}{+}$  do
6.       Quantize $u^Q(x_n)$ with $C_Q^P$ to obtain the PQ index $I_{x_n,Q}^P$ and the squared quantization error $E_{x_n,Q}^P$;
7.     end for
8.   end for
9.   Following the pairwise comparison strategy of Section 3.3, compare the accumulated squared errors of the candidate configurations over each segment of $x_n$, from the coarsest granularity to the finest, and record in $pt_n$, for every subspace, the index of the quantizer selected for it;
10. end for
11. Return the indexes $I_x$ and the encoding pattern vectors $pt = \{pt_1, pt_2, \ldots, pt_N\}$.
Since the general formulation is cumbersome and difficult to interpret, we present a concrete example of the encoding process for M = 8, which includes three encoding modes, as illustrated in Figure 3.
Figure 3 illustrates the encoding process for a dataset vector $x$ when M = 8. The vector is divided into 8 sub-vectors, each corresponding to one of the 8 subspaces. The encoding pattern $pt$ specifies which product quantizer (with its number of subspaces) was used for each subspace. Sub-vectors are quantized according to this pattern, and their respective codeword indexes are retrieved from the appropriate codebooks.
For example, the encoding pattern $pt = \{2,2,1,1,3,3,3,3\}$ in Figure 3 specifies that the first two subspaces (the light blue ones) of $x$ are quantized by Mode 1, which uses the second product quantizer with $M/2$ subspaces. The next two subspaces (the dark blue ones) are quantized by Mode 0, which uses the full-resolution product quantizer with $M$ subspaces. Finally, subspaces five through eight (the red ones) are assigned to Mode 2, which uses a coarser product quantizer with $M/4$ subspaces. Consequently, the first two subspaces of $x$ are merged and quantized by the first sub-quantizer of the second product quantizer, producing a single index that corresponds to a codeword in the sub-codebook $C_1^2$ of that quantizer (Mode 1). After encoding, the sample vector $x$ is thus represented by three product quantizers, each applied to a specific segment of the vector, resulting in a total of 4 codeword indexes. In PPQ, the following components need to be precomputed or trained and stored: $p$ codebooks, $N$ encoding pattern vectors, and the indexes resulting from encoding with these $p$ codebooks.
It should also be noted that, in PPQ, each database vector may be encoded using a different quantization mode. Consequently, the resulting bit-rate is variable across vectors, and compared to standard PQ with a fixed code length, PPQ typically achieves a lower effective bit-rate for the majority of samples. A detailed discussion, supported by experimental results, will be provided in Section 4.2.

3.4. Encoding and Searching Process

When a query vector q is received, we do not quantize it. Instead, we split q into M sub-vectors in the same way as during database encoding.
In our Pyramid Product Quantization (PPQ) framework, we adopt asymmetric distance computation (ADC) for nearest neighbor search. Specifically, for each of the $p$ product quantizers, we compute online a small look-up table, yielding $p$ tables $\{T^1, T^2, \ldots, T^p\}$ that store the squared distances between the query's sub-vectors and all codewords in the corresponding sub-codebooks. Compared to symmetric distance computation (SDC), which precomputes $p$ large $k_p \times k_p$ look-up tables storing the squared distances between all pairs of codewords from the $p$ product quantizers, ADC achieves higher accuracy, since the unquantized query is used during distance approximation, and a smaller memory footprint, since no pairwise codeword-to-codeword tables are stored.
To compute the approximate distance between $q$ and a database vector $x$, we first retrieve the stored $M$-dimensional encoding pattern $pt$ of $x$. This pattern indicates, for each original subspace position, which quantizer configuration was used to encode $x$.
Guided by this pattern, we then select the corresponding look-up table for each encoded segment and retrieve its precomputed squared distance. This yields an approximate squared distance between $q$ and $x$ in that segment. Summing these values across all segments gives the final approximate squared distance. Algorithm 2 presents the process of ANN search when a query is received.
Algorithm 2 Process of ANN search
Input:
  • $N$-sized database $X = \{x_1, x_2, \ldots, x_N\}$; query vector $q$; $p$ PQ codebooks with different numbers of subspaces $\{C^1, C^2, \ldots, C^p\}$, where $C^i = C_1^i \times C_2^i \times \cdots \times C_{M/2^{i-1}}^i$, $i = 1, 2, \ldots, p$; indexes $I$ and pattern vectors $pt$ for the database vectors.
Output:
  Top $R$ nearest neighbors of $q$ in the database.
1. for  $P = 1;\ P \le p;\ P{+}{+}$  do
2.   Decompose $q$ into $M/2^{P-1}$ sub-vectors $u^1(q), u^2(q), \ldots, u^{M/2^{P-1}}(q)$;
3.   for  $Q = 1;\ Q \le M/2^{P-1};\ Q{+}{+}$  do
4.     Calculate the distances between $u^Q(q)$ and the codewords in $C_Q^P$;
5.   end for
6.   Build the look-up table $T^P$ from these distances;
7. end for
8. for  $n = 1;\ n \le N;\ n{+}{+}$  do
9.   Read the $n$-th pattern vector $pt_n$;
10.  Initialize the squared distance between $q$ and $x_n$: $d_{n,q} = 0$;
11.  for  $m = 1;\ m \le M$, advancing $m$ by the segment width $2^{pt_n(m)-1}$,  do
12.    Read the squared distance for the segment starting at subspace $m$ from the look-up table $T^{pt_n(m)}$, using the stored index of $x_n$ for that segment;
13.    Add the retrieved distance to $d_{n,q}$;
14.  end for
15. end for
16. Rank the database vectors in ascending order of $d_{n,q}$;
17. Return the $R$ nearest neighbors of $q$.
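For concreteness, the pattern-guided search can be sketched in NumPy for the special case of $p = 2$ quantizers ($M$ and $M/2$ subspaces). All codebooks, codes, and patterns below are random placeholders of our own choosing; only the look-up logic reflects the procedure above:

```python
import numpy as np

# Simplified PPQ search sketch for p = 2 quantizers (illustrative data).
rng = np.random.default_rng(4)
D, M, k1, k2, N = 16, 4, 8, 16, 30
d = D // M
cb_fine = rng.normal(size=(M, k1, d))             # quantizer 1: M subspaces
cb_coarse = rng.normal(size=(M // 2, k2, 2 * d))  # quantizer 2: M/2 subspaces
q = rng.normal(size=D)

# Per-vector mode per pair of subspaces: 1 = fine, 2 = coarse (placeholder).
patterns = rng.choice([1, 2], size=(N, M // 2))
codes_fine = rng.integers(0, k1, size=(N, M))
codes_coarse = rng.integers(0, k2, size=(N, M // 2))

# Build one look-up table per quantizer from the (unquantized) query.
T1 = np.stack([np.sum((cb_fine[m] - q[m*d:(m+1)*d])**2, axis=1)
               for m in range(M)])
T2 = np.stack([np.sum((cb_coarse[m] - q[2*m*d:2*(m+1)*d])**2, axis=1)
               for m in range(M // 2)])

dists = np.zeros(N)
for n in range(N):
    for pair in range(M // 2):
        if patterns[n, pair] == 1:    # fine mode: two look-ups for this pair
            dists[n] += T1[2*pair, codes_fine[n, 2*pair]]
            dists[n] += T1[2*pair+1, codes_fine[n, 2*pair+1]]
        else:                         # coarse mode: one look-up (one addition saved)
            dists[n] += T2[pair, codes_coarse[n, pair]]

ranking = np.argsort(dists)           # ascending approximate distances
```

Each coarse-mode pair replaces two table look-ups and additions with one, which is exactly where PPQ's online speed-up comes from.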

4. Experiments

In this section, we first introduce the experimental setup. Then, the performance of our proposed Pyramid Product Quantization (PPQ) framework is evaluated through comparative experiments with PQ under exhaustive search. Finally, we analyze the time complexity of PQ-based ANN methods integrated with PPQ.

4.1. Setup

We evaluate our method on two standard benchmarks: SIFT1M [12] and GIST1M [13]. Both datasets include a learning set, a database set, and a query set. Their key statistics are provided in Table 2.
These datasets are publicly available at http://corpus-texmex.irisa.fr/ (accessed on 1 July 2010).
Before query vectors were processed, a series of offline preprocessing steps was performed. These include pre-training a group of $p$ product quantization codebooks $\{C^1, C^2, \ldots, C^p\}$ for the database vectors, with codebook sizes $(k_1, k_2, \ldots, k_p)$. The dataset is also pre-quantized using the $p$ product quantizers, and the corresponding quantization indexes $I_x = \{I_{x_1}, I_{x_2}, \ldots, I_{x_N}\}$ and pattern vectors $pt$ are stored.
When the total bit-rate is fixed, reducing the number of subspaces generally causes the sub-codebook size to grow exponentially. For example, in standard PQ with a 64-bit code and 8 subspaces, each subspace is allocated 8 bits, resulting in a sub-codebook size of $2^8 = 256$. If the number of subspaces is reduced to 4, each subspace receives 16 bits, leading to a sub-codebook size of $2^{16} = 65{,}536$. Further reducing the number of subspaces to 2 increases the per-subspace allocation to 32 bits, yielding a sub-codebook size of $2^{32} = 4{,}294{,}967{,}296$. In such cases, the computational complexity of online quantization, particularly for assigning a query vector $q$ to its nearest codeword in each large sub-codebook, becomes prohibitively high.
Fortunately, when the number of subspaces is high, the overwhelming majority of Voronoi cells in product quantization remain empty. Moreover, given sufficiently many bits per subspace, PQ with fewer subspaces can achieve encoding accuracy comparable to or better than PQ with a higher number of subspaces. This observation allows us to simultaneously reduce both the number of subspaces and the size of the corresponding sub-codebooks, while leaving the overall quantization quality largely unaffected.
Table 3 shows the proportion of data that can be encoded using a product quantizer with 4 subspaces—instead of the baseline with 8 subspaces—under various sub-codebook sizes, while preserving the same average quantization error across the entire dataset as that of the 8-subspace PQ.
As shown in Table 3 and Table 4, when M = 4, the sub-codebook size does not need to be as large as 65,536; instead, a much smaller size, such as 2048 or 4096, is sufficient to achieve a high replacement ratio while maintaining the overall quantization error at the same level as the baseline (8-subspace PQ). In contrast, when M = 2, the significant reduction in the number of subspaces makes it difficult to attain satisfactory encoding quality, even with substantially larger codebooks. Consequently, the replacement ratio remains low, and the approach incurs the additional cost of performing quantization with a very large codebook.
Therefore, in our subsequent experiments, we adopt the standard PQ configuration as the baseline: M = 8 subspaces with a sub-codebook size of k = 256. We only apply vector replacement using the M = 4 configuration.
Recall@R is a standard metric for evaluating the accuracy of approximate nearest neighbor (ANN) search. For a given query q, the search is considered successful if its true nearest neighbor appears within the top R positions of the retrieved ranking; in this case, the result is scored as 1, otherwise as 0. The final Recall@R value is computed as the average success rate over all queries in the query set.
All experiments were carried out on a desktop machine with an Intel i7-8700 CPU running at 3.20 GHz and 32 GB of RAM.

4.2. Experimental Results

The experimental results are shown in Table 5 and Table 6; results are averaged over all queries in the datasets.
As shown in Table 5 and Table 6, the pyramid-structured quantization incorporates a constraint that the overall quantization error does not increase compared to the baseline (PQ with M = 8 and k = 256). Consequently, the retrieval accuracy remains virtually unchanged relative to the baseline.
We now analyze the online computational complexity. As indicated in Table 1, the cost of a single online ANN search consists of two main components: (1) building the ADC look-up table costs $(2D/M - 1)kM$ additions and $kD$ multiplications, and (2) computing the ADC (asymmetric distance computation) distances between $q$ and each database vector $x_i$ costs $(M-1)N$ additions.
Since the data dimensionality D and the database size N are fixed, the first component is primarily determined by the size of each sub-codebook, while the second component depends on the number of subspaces. The actual running times for these two components in different databases are reported in Table 7 and Table 8, respectively.
As shown in Table 7 and Table 8, as the sub-codebook size $k_p$ increases, a higher proportion of vectors can be represented using fewer subspaces, thereby reducing the computational cost of ADC distance computation. However, this comes at the expense of a sharp increase in the time required to encode the query vector $q$.
Consequently, a larger $k_p$ is not always better; instead, an optimal trade-off must be found to minimize the total query time. On the SIFT1M dataset, the minimum total time is achieved when $k_p = 2048$, yielding a 17.8% reduction in total time compared to the baseline. On the GIST1M dataset, the best performance is obtained at $k_p = 1024$, resulting in a 6.7% reduction in total time relative to the baseline.
We now analyze the bit-rate of PPQ. As shown in Table 3 and Table 4, competitive retrieval performance is achieved using only two encoding modes. We therefore examine the effective bit-rate of PPQ under these two modes.
Mode 0: Encode each $D/M$-dimensional subspace independently using a $k_{p_0} = 256$ codebook (8 bits per subspace).
Mode 1: Encode a pair of adjacent $D/M$-dimensional subspaces jointly using a $k_{p_1} = 4096$ codebook (12 bits per pair).
Suppose $x$ is split into 8 subspaces. If the first four subspaces use Mode 0 ($8 \times 4 = 32$ bits) and the last four are grouped into two pairs using Mode 1 ($12 \times 2 = 24$ bits), the encoding pattern can be represented by "0011", requiring only 4 bits to indicate which mode applies to each pair. The total storage is 4 (pattern) + 32 + 24 = 60 bits, which is less than the 64 bits required by standard PQ with 8 subspaces ($8 \times 8 = 64$ bits).
During decoding, the system first reads the 4-bit pattern. For each “0”, it reads two 8-bit indexes (total 16 bits for the pair); for each “1”, it reads one 12-bit index. In the above example, the decoder reads 16 + 16 + 12 + 12 = 56 bits of codeword data plus 4 bits of pattern, totaling 60 bits, and reconstructs the vector accordingly.
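The bit accounting in this example can be checked with a few lines of Python (the mode bit-widths follow the example above; the variable names are ours):

```python
# Bit-rate accounting for the two-mode example above (M = 8 subspaces).
M = 8
pattern_bits = M // 2              # one bit per pair of subspaces -> "0011"
mode0_bits = 8                     # 8 bits per subspace   (k = 256)
mode1_bits = 12                    # 12 bits per joint pair (k = 4096)
pattern = [0, 0, 1, 1]             # first two pairs fine, last two pairs joint

# A "0" pair stores two 8-bit indexes; a "1" pair stores one 12-bit index.
code_bits = sum(2 * mode0_bits if b == 0 else mode1_bits for b in pattern)
total_bits = pattern_bits + code_bits
baseline_bits = M * mode0_bits     # standard PQ: 8 * 8 = 64 bits
# total_bits == 60, i.e., 4 bits below the 64-bit PQ baseline.
```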
Only in the worst case—when all segments must fall back to fine-grained original PQ encoding with 8 subspaces—does PPQ incur a slight overhead: 64 bits for codewords + 4 bits for the pattern = 68 bits, which is marginally higher than standard PQ. However, as shown in Table 3, this scenario is rare: 73.61% of segments on SIFT1M can be encoded more efficiently using the coarser joint codebook, leading to an average bit-rate reduction across the dataset.

4.3. Discussions

In this section, we discuss the integration of PPQ with other PQ-based methods.
As a quantization encoding framework, Pyramid Product Quantization (PPQ) can be readily integrated into most subspace-count-fixed, product-quantization-based ANN search algorithms—such as OPQ [14] and MRPQ [15]. These methods primarily enhance retrieval accuracy through decorrelation techniques (e.g., rotation or mean removal), but their encoding time complexity remains structurally similar to that described in Table 1, consisting of two main components: (1) encoding the query vector q, and (2) computing the approximate distances between q and all database vectors.
Taking MRPQ as an example, each data vector is decomposed into a global mean component and a residual component.
$$x = r + u \cdot I \qquad (9)$$
$$u = \frac{1}{D} \sum_{i=1}^{D} x_i \qquad (10)$$
$$I = (1, 1, \ldots, 1)^T \qquad (11)$$
where $u$ is the mean of the elements of $x$, and $I$ is an all-ones vector of dimensionality $D$. Standard PQ is then applied to the mean-removed residual vectors $r$. In this context, our proposed PPQ can be directly employed to encode the residual vectors $r$, replacing the conventional fixed-subspace PQ. By adaptively selecting fewer subspaces where appropriate, PPQ can reduce the online distance computation cost and accelerate the overall encoding and search process. Table 9 and Table 10 present the recall and time cost of different methods before and after integrating PPQ.
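The mean-removal step of Equations (9)–(11) amounts to the following minimal sketch (random test vector; the residual `r` would then be handed to the pyramid product quantizer):

```python
import numpy as np

# MRPQ-style mean removal: x = r + u * I, with u the scalar mean of x.
rng = np.random.default_rng(5)
D = 16
x = rng.normal(size=D)

u = x.mean()                 # u = (1/D) * sum_i x_i
I = np.ones(D)               # I = (1, 1, ..., 1)^T
r = x - u * I                # zero-mean residual passed to the quantizer
```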
In Table 9 and Table 10, the “Other Operations” column shows the additional time overhead incurred by OPQ and MRPQ compared to standard PQ (e.g., rotation in OPQ or mean removal in MRPQ). As can be seen from Table 9 and Table 10, PPQ can be effectively integrated with PQ-based ANN methods, accelerating search by reducing the computational cost of ADC distance calculations while maintaining search accuracy comparable to that of standard methods.

5. Conclusions

In this work, we proposed a novel encoding framework named Pyramid Product Quantization (PPQ), which adaptively combines multiple product quantizers with varying numbers of subspaces to accelerate the online ANN search. By leveraging the observation that fewer subspaces often yield lower quantization error when allocated sufficient bits, PPQ selectively encodes different segments of each data vector using the most suitable quantizer configuration. This strategy preserves the overall quantization errors while significantly reducing the number of distance look-ups during ANN search.
Extensive experiments on standard datasets demonstrate that PPQ maintains retrieval accuracy comparable to standard PQ and its variants, while achieving up to an 18.6% reduction in total query time. Moreover, PPQ is orthogonal to existing PQ-based methods and can be seamlessly integrated into their pipelines to accelerate the online stages without compromising accuracy.

Author Contributions

Conceptualization, Y.W.; Methodology, Y.W., L.Y., J.Z. and Q.Z.; Validation, Y.W. and L.Y.; Formal analysis, Y.W., J.Z. and Q.Z.; Investigation, Y.W.; Resources, Y.W.; Data curation, L.Y., J.Z. and Q.Z.; Writing—original draft preparation, Y.W. and L.Y.; Writing—review and editing, Y.W., J.Z. and Q.Z.; Supervision, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Stand-up Fund of Xi’an University of Technology (Grant No. 108-451121002) and the Stand-up Fund of Xi’an University of Technology (Grant No. 108-451122002).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:
VQ	Vector quantization
PQ	Product quantization
ANN	Approximate nearest neighbor
ADC	Asymmetric distance computation

References

  1. Jégou, H.; Douze, M.; Schmid, C. Product quantization for nearest neighbor search. IEEE Trans. Pattern Anal. Mach. Intell. 2011, 33, 117–128.
  2. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, 1–5 May 2001.
  3. Ning, Q.; Zhu, J.; Zhong, Z.; Hoi, S.C.; Chen, C. Scalable image retrieval by sparse product quantization. IEEE Trans. Multimed. 2017, 19, 586–597.
  4. Nister, D.; Stewenius, H. Scalable recognition with a vocabulary tree. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, New York, NY, USA, 17–22 June 2006.
  5. Wu, Z.; Yu, J. Vector quantization: A review. Front. Inf. Technol. Electron. Eng. 2019, 20, 507–524.
  6. Babenko, A.; Lempitsky, V. Additive quantization for extreme vector compression. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014.
  7. Wang, J.; Zhang, T. Composite quantization. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 1308–1322.
  8. Zhang, T.; Qi, G.-J.; Tang, J.; Wang, J. Sparse composite quantization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
  9. Brandt, J. Transform coding for fast approximate nearest neighbor search in high dimensions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, San Francisco, CA, USA, 13–18 June 2010.
  10. Heo, J.-P.; Lin, Z.; Yoon, S.-E. Distance encoded product quantization for approximate k-nearest neighbor search in high-dimensional space. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 2084–2097.
  11. Matsui, Y.; Uchida, Y.; Jégou, H.; Satoh, S. A survey of product quantization. ITE Trans. Media Technol. Appl. 2018, 6, 2–10.
  12. Lowe, D.G. Distinctive image features from scale-invariant keypoints. Int. J. Comput. Vis. 2004, 60, 91–110.
  13. Oliva, A.; Torralba, A. Modeling the shape of the scene: A holistic representation of the spatial envelope. Int. J. Comput. Vis. 2001, 42, 145–175.
  14. Ge, T.; He, K.; Ke, Q.; Sun, J. Optimized product quantization. IEEE Trans. Pattern Anal. Mach. Intell. 2014, 36, 744–755.
  15. Yang, J.; Chen, B.; Xia, S.-T. Mean-removed product quantization for large-scale image retrieval. Neurocomputing 2020, 406, 77–88.
Figure 1. Pyramid encoding structure.
Figure 2. Schematic illustration of the encoding selection strategy.
Figure 3. Schematic of encoding process using encoding pattern for M = 8.
Table 1. Online computational requirements of PQ-based ANN search.

| Operation | Additions | Multiplications |
|---|---|---|
| Build squared-distance look-up tables | (2D/M − 1)·k·M | k·D |
| Compute ADC (asymmetric distance computation) distances | (M − 1)·N | 0 |
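The counts in Table 1 follow from the standard cost model: each of the k·M table entries needs D/M subtractions, D/M multiplications, and D/M − 1 additions (counting subtractions as additions), and summing M table entries per database vector needs M − 1 additions. A small sketch under these assumptions (the function name `online_op_counts` is ours):

```python
def online_op_counts(D, M, k, N):
    """Per-query operation counts for PQ's online stage.

    D: vector dimensionality, M: number of subspaces,
    k: sub-codebook size, N: number of database vectors.
    """
    d = D // M                      # subspace dimensionality
    lut_add = (2 * d - 1) * k * M   # additions to build all look-up tables
    lut_mul = d * k * M             # multiplications, equal to k * D
    adc_add = (M - 1) * N           # additions for all ADC distances
    return lut_add, lut_mul, adc_add

# SIFT1M baseline (D = 128, M = 8, k = 256, N = 1,000,000)
print(online_op_counts(128, 8, 256, 1_000_000))  # → (63488, 32768, 7000000)
```

Because the (M − 1)·N term dwarfs the table-building terms at database scale, lowering the effective M, as PPQ does, is where the online savings come from.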
Table 2. Summary of the SIFT1M and GIST1M datasets.

| Dataset | SIFT1M | GIST1M |
|---|---|---|
| Descriptor dimensionality D | 128 | 960 |
| Learning set size | 100,000 | 500,000 |
| Database set size | 1,000,000 | 1,000,000 |
| Query set size | 10,000 | 1,000 |
Table 3. Replacement ratio of data encoded using 4-subspace PQ.

SIFT1M:
| Sub-codebook size k_p | 1024 | 2048 | 4096 | 8192 | 16,384 |
|---|---|---|---|---|---|
| Replacement ratio (%) | 35.24 | 55.90 | 73.61 | 88.51 | 98.99 |

GIST1M:
| Sub-codebook size k_p | 512 | 1024 | 2048 | 4096 | 8192 |
|---|---|---|---|---|---|
| Replacement ratio (%) | 53.73 | 68.01 | 75.68 | 88.88 | 94.14 |
Table 4. Replacement ratio of data encoded using 2-subspace PQ.

SIFT1M:
| Sub-codebook size k_p | 2048 | 4096 | 8192 | 16,384 | 32,768 |
|---|---|---|---|---|---|
| Replacement ratio (%) | 0.19 | 0.45 | 0.98 | 1.49 | 2.32 |

GIST1M:
| Sub-codebook size k_p | 1024 | 2048 | 4096 | 8192 | 16,384 |
|---|---|---|---|---|---|
| Replacement ratio (%) | 0.10 | 0.35 | 0.88 | 1.62 | 2.40 |
Table 5. The Recall@R of data encoded using 4-subspace PQ in the SIFT database.

| Sub-codebook size k_p | 1024 | 2048 | 4096 | 8192 | 16,384 | PQ baseline (M = 8, k = 256) |
|---|---|---|---|---|---|---|
| Replacement ratio (%) | 35.24 | 55.90 | 73.61 | 88.51 | 98.99 | 0 |
| Recall@1 | 26.30 | 26.71 | 26.49 | 26.68 | 26.61 | 26.56 |
| Recall@10 | 62.41 | 62.36 | 62.43 | 62.30 | 62.19 | 62.28 |
| Recall@100 | 92.49 | 92.70 | 92.45 | 92.45 | 92.53 | 92.50 |
Table 6. The Recall@R of data encoded using 4-subspace PQ in the GIST database.

| Sub-codebook size k_p | 512 | 1024 | 2048 | 4096 | 8192 | PQ baseline (M = 8, k = 256) |
|---|---|---|---|---|---|---|
| Replacement ratio (%) | 53.73 | 68.01 | 75.68 | 88.88 | 94.14 | 0 |
| Recall@1 | 9.18 | 9.33 | 9.26 | 9.29 | 9.24 | 9.23 |
| Recall@10 | 18.83 | 18.97 | 18.89 | 18.75 | 18.93 | 18.89 |
| Recall@100 | 43.68 | 43.67 | 43.86 | 43.80 | 43.85 | 43.91 |
Table 7. Search time for different sub-codebook sizes in the SIFT1M database.

| Sub-codebook size k_p | 1024 | 2048 | 4096 | 8192 | 16,384 | PQ baseline (M = 8, k = 256) |
|---|---|---|---|---|---|---|
| Replacement ratio (%) | 35.24 | 55.90 | 73.61 | 88.51 | 98.99 | N/A |
| Build ADC look-up tables (ms) | 757 | 1239 | 2109 | 3674 | 6924 | 241 |
| Compute ADC distances (ms) | 12,524 | 11,426 | 10,701 | 10,562 | 10,596 | 15,125 |
| Total time (ms) | 13,281 | 12,665 | 12,810 | 14,236 | 17,520 | 15,366 |
Table 9. Experimental results of different PQ-based methods combined with PPQ in the SIFT1M database.

| Method | Recall@100 | Build Look-Up Table (ms) | ADC Distance Computation (ms) | Other Operations (ms) | Total Time (ms) |
|---|---|---|---|---|---|
| PQ (M = 8, k = 256) | 92.50 | 241 | 15,125 | 0 | 15,366 |
| OPQ (M = 8, k = 256) | 94.01 | 249 | 15,817 | 2505 | 18,571 |
| MRPQ (M = 8, k = 256) | 96.51 | 247 | 15,698 | 6043 | 21,988 |
| PPQ-PQ (k1 = 256, k2 = 2048) | 92.70 | 1239 | 11,426 | 0 | 12,665 |
| PPQ-OPQ (k1 = 256, k2 = 2048) | 93.87 | 1257 | 11,378 | 3989 | 16,624 |
| PPQ-MRPQ (k1 = 256, k2 = 2048) | 96.49 | 1233 | 11,616 | 6945 | 19,794 |
Table 10. Experimental results of different PQ-based methods combined with PPQ in the GIST1M database.

| Method | Recall@100 | Build Look-Up Table (ms) | ADC Distance Computation (ms) | Other Operations (ms) | Total Time (ms) |
|---|---|---|---|---|---|
| PQ (M = 8, k = 256) | 43.91 | 1008 | 15,069 | 0 | 16,077 |
| OPQ (M = 8, k = 256) | 45.99 | 1071 | 14,898 | 4650 | 20,619 |
| MRPQ (M = 8, k = 256) | 47.06 | 1116 | 15,133 | 7722 | 23,971 |
| PPQ-PQ (k1 = 256, k2 = 1024) | 43.68 | 4068 | 10,942 | 0 | 15,010 |
| PPQ-OPQ (k1 = 256, k2 = 1024) | 46.12 | 4135 | 10,570 | 5569 | 20,274 |
| PPQ-MRPQ (k1 = 256, k2 = 1024) | 47.01 | 4084 | 10,845 | 8061 | 22,990 |
Table 8. Search time for different sub-codebook sizes in the GIST1M database.

| Sub-codebook size k_p | 512 | 1024 | 2048 | 4096 | 8192 | PQ baseline (M = 8, k = 256) |
|---|---|---|---|---|---|---|
| Replacement ratio (%) | 53.73 | 68.01 | 75.68 | 88.88 | 94.14 | N/A |
| Build ADC look-up tables (ms) | 2609 | 4068 | 6944 | 12,844 | 24,421 | 1008 |
| Compute ADC distances (ms) | 12,534 | 10,942 | 10,339 | 9345 | 8845 | 15,069 |
| Total time (ms) | 15,143 | 15,010 | 17,283 | 22,189 | 33,266 | 16,077 |

Share and Cite

MDPI and ACS Style

Wang, Y.; Yu, L.; Zhang, J.; Zhang, Q. Pyramid Product Quantization for Approximate Nearest Neighbor Search. Appl. Sci. 2026, 16, 853. https://doi.org/10.3390/app16020853
