This paper is an extended version of the author's papers presented at the Eighth Workshop on Information Theoretic Methods in Science and Engineering, Copenhagen, Denmark, 24–26 June 2015, and at the IEEE International Symposium on Information Theory, Aachen, Germany, 25–30 June 2017.

Kernel methods have been used for turning linear learning algorithms into nonlinear ones. These nonlinear algorithms measure the distance between data points as the distance in the kernel-induced feature space. In lossy data compression, the optimal tradeoff between the number of quantized points and the incurred distortion is characterized by the rate-distortion function. However, the rate-distortion functions associated with distortion measures involving kernel feature mapping have yet to be analyzed. We consider two reconstruction schemes, reconstruction in input space and reconstruction in feature space, and provide bounds on the rate-distortion functions for these schemes. We also compare the derived bounds with the performance of a quantizer designed by the kernel k-means method.

Kernel methods have been widely used for nonlinear learning problems in combination with linear learning algorithms such as the support vector machine and principal component analysis.

In this paper, we derive bounds on the rate-distortion functions for kernel-based distortion measures. We consider two schemes for reconstructing inputs in lossy coding. One is to obtain a reconstruction in the original input space. Since kernel methods usually express the results of learning as linear combinations of vectors in feature space, an additional step, such as preimaging, is needed to obtain a reconstruction in input space.

Furthermore, we design a vector quantizer using the kernel k-means method.

Let

If the conditional distributions

The parameter

From the properties of the rate-distortion function

In kernel-based learning methods, data points in input space

The inner product in feature space is evaluated directly by a nonlinear function in input space, the kernel function; this is known as the kernel trick.
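As a concrete illustration (the Gaussian kernel and the bandwidth below are example choices, not the paper's notation), the kernel trick lets us compute squared feature-space distances from kernel evaluations alone, without ever forming the feature map:

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel, evaluated directly in input space."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2)))

def feature_distance_sq(x, y, kernel):
    """Squared distance in the kernel-induced feature space:
    ||phi(x) - phi(y)||^2 = k(x, x) + k(y, y) - 2 k(x, y)."""
    return kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)

x = np.array([0.0, 1.0])
y = np.array([1.0, 0.0])
d = feature_distance_sq(x, y, gaussian_kernel)  # 2 - 2*exp(-1) ≈ 1.2642
```

Every distortion measure considered in this paper is of this form, so a quantizer can be trained and evaluated using only kernel matrix entries.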

We identify the feature space

If we restrict ourselves to the reconstruction in input space, that is, the reconstruction

Note that the reconstruction

This is a difference distortion measure if and only if the kernel function is translation invariant, that is,
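Concretely, if $k(x,\hat{x}) = \kappa(x-\hat{x})$ for some symmetric function $\kappa$, the feature-space distortion depends on $x$ and $\hat{x}$ only through their difference:

```latex
d(x,\hat{x}) = \|\varphi(x)-\varphi(\hat{x})\|^2
             = k(x,x) - 2k(x,\hat{x}) + k(\hat{x},\hat{x})
             = 2\kappa(0) - 2\kappa(x-\hat{x}).
```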

Suppose we have a sample of length

The rate-distortion function (distortion-rate function, resp.) for this distortion measure is denoted by

The following theorem claims that

The proof is given in

Since the rate-distortion problem (

Although the Shannon lower bound to

In the case of the distortion measure in Equation (

If we further assume that the kernel function is radial, that is,

If

The following theorem is derived from the facts that the spherical Gaussian distribution maximizes the entropy under the constraint that
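As a reminder of the standard maximum-entropy fact presumably invoked here (stated for one common form of the constraint, since the exact condition is not shown above): among random vectors $X$ in $\mathbb{R}^n$ with $\mathrm{E}\|X\|^2 \le n\sigma^2$, the spherical Gaussian $\mathcal{N}(0,\sigma^2 I_n)$ attains the maximum differential entropy,

```latex
h(X) \le \frac{n}{2}\log\bigl(2\pi e\,\sigma^2\bigr),
```

with equality if and only if $X \sim \mathcal{N}(0,\sigma^2 I_n)$.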

In this section, we evaluate the rate-distortion dimension [

To examine the limit

Since

From Theorems 2 and 3 and Equation (

This theorem shows that the rate-distortion dimension is dependent only on the dimensionality of the input space and independent of the dimensionality of the feature space. In the case of the linear kernel,

We construct an upper bound to the rate-distortion function

Further upper-bounding the differential entropy

The proof is given in

We numerically evaluate the rate-distortion bounds obtained in the previous section. We also design a quantizer by the kernel k-means method and compare its performance with the bounds.

We focus on the case of the Gaussian kernel,
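Assuming the common parametrization (the exact form used in the paper is not shown above), $k(x,\hat{x}) = \exp\!\bigl(-\|x-\hat{x}\|^2/(2\sigma^2)\bigr)$, we have $k(x,x)=1$, so the feature-space distortion becomes

```latex
d(x,\hat{x}) = \|\varphi(x)-\varphi(\hat{x})\|^2
             = 2 - 2\exp\!\left(-\frac{\|x-\hat{x}\|^2}{2\sigma^2}\right),
```

which is bounded above by 2 for all $x$ and $\hat{x}$.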

As a source, we first assumed the uniform distribution on the union of the two regions,

We used the trapezoidal rule to compute the
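The quantity integrated in our evaluation is truncated above; as a hypothetical stand-in, the snippet below applies the trapezoidal rule to a one-dimensional differential entropy (the density and the grid are assumptions made for illustration):

```python
import numpy as np

# Hypothetical example: differential entropy of the standard normal
# density, h = -\int p(x) log p(x) dx, computed on a finite grid.
x = np.linspace(-8.0, 8.0, 4001)
p = np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
integrand = -p * np.log(p)
# Trapezoidal rule: average of adjacent integrand values times the step.
h = np.sum((integrand[1:] + integrand[:-1]) * np.diff(x)) / 2.0
# h is close to 0.5 * log(2*pi*e) ≈ 1.4189
```

The truncation of the integration range and the grid spacing both contribute to the numerical error, so the range must be wide enough for the density's tails to be negligible.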

Using the same data set of size 4000 as training data, we ran the kernel k-means algorithm.
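A minimal sketch of kernel k-means on a precomputed kernel matrix follows (the paper's exact training setup, initialization, and stopping rule are not reproduced; the `init` labels are an assumption). Only kernel evaluations are needed, because the squared feature-space distance to a cluster mean expands into kernel sums:

```python
import numpy as np

def kernel_kmeans(K, n_clusters, init, n_iter=100):
    """Kernel k-means on a precomputed kernel matrix K (a sketch).

    With m_c the feature-space mean of cluster c,
    ||phi(x_i) - m_c||^2 = K_ii - (2/|c|) sum_{j in c} K_ij
                         + (1/|c|^2) sum_{j,l in c} K_jl.
    """
    n = K.shape[0]
    labels = np.asarray(init).copy()
    for _ in range(n_iter):
        dist = np.full((n, n_clusters), np.inf)
        for c in range(n_clusters):
            idx = np.flatnonzero(labels == c)
            if idx.size == 0:
                continue  # empty cluster: leave its distances at infinity
            dist[:, c] = (np.diag(K)
                          - 2.0 * K[:, idx].mean(axis=1)
                          + K[np.ix_(idx, idx)].mean())
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stable: converged
        labels = new_labels
    return labels
```

The quantized points are the resulting cluster means in feature space; fresh points can be assigned with the same distance formula, again using only kernel values.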

After the training, we computed the distortion and rate for the test data set by assigning each of the 20,000 test vectors generated from the same source to the nearest quantized point in feature space.

For each quantized point, we obtained its preimage. That is, if the

We used the mean shift procedure for the maximization, although it is guaranteed to converge only to a local maximum.
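For the Gaussian kernel, the preimage of a feature-space mean can be sought by the standard fixed-point (mean shift) iteration sketched below; the bandwidth and the initialization at the input-space mean are assumptions, and, as noted above, the iteration may stop at a local maximum:

```python
import numpy as np

def preimage_mean_shift(X, sigma=1.0, n_iter=100, tol=1e-8):
    """Approximate preimage of the feature-space mean of the rows of X
    under the Gaussian kernel, via the fixed-point iteration
    z <- sum_i w_i x_i / sum_i w_i,  w_i = exp(-||z - x_i||^2 / (2 sigma^2)).
    """
    z = X.mean(axis=0)  # initialize at the input-space mean (an assumption)
    for _ in range(n_iter):
        w = np.exp(-((X - z) ** 2).sum(axis=1) / (2.0 * sigma ** 2))
        z_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(z_new - z) < tol:
            break  # fixed point reached (a local maximum in general)
        z = z_new
    return z
```

Each step re-weights the cluster members by their kernel similarity to the current iterate, so points near the iterate dominate the update.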

The obtained bounds and the quantizer performances are displayed in

In both dimensions, the upper bound

We see that the quantizer performances for

At low distortion levels, each source output should be reconstructed within a small neighborhood in the feature space where we can find another point

In the 10-dimensional case (

To examine the asymptotic behavior of

The rate-distortion bounds,

We carried out a similar evaluation of the rate-distortion bounds and quantizer performances for a grayscale image data set extracted from the

Each dimension was normalized to have mean 0 and variance 1. Hence,

The upper bounds and quantizer performances are presented in

In this paper, we have shown upper and lower bounds for the rate-distortion functions associated with kernel feature mapping. As suggested in

The author would like to thank the anonymous reviewers for their helpful comments and suggestions. This work was supported in part by the Japan Society for the Promotion of Science (JSPS) grants 25120014, 15K16050, and 16H02825.

The author declares no conflict of interest.

Let

Since the input space

Let

Thus, from Equation (

The mean and covariance matrix of the random vector

Thus, the maximum entropy principle of the Gaussian distribution implies that the differential entropy

Solving Equation (

Rate-distortion bounds and quantizer performances for (

The ratios between the rate-distortion bounds and

Upper bounds of the rate-distortion functions and quantizer performance for image data.