## 1. Introduction

Fitting statistical models to empirical data is one of the most fundamental techniques in signal processing. Usually, one distinguishes between parametric and non-parametric models. Parametric models require, by definition, information about the type of the underlying statistical distribution. Non-parametric methods are more data-driven. Two important classes of non-parametric techniques are histograms and kernel density estimators. Histograms are computationally very efficient, but specifying the bins can be very difficult. Kernel density estimators, on the other hand, require only the definition of the kernel. One drawback of a straightforward implementation of a kernel density estimator is that a new data point added to the analysis can interact with a large number of already analysed data points.

In cases where the underlying domain has a group-theoretical structure, methods from harmonic analysis can be used to separate the contributions of the kernel and of the data. In the following this will be illustrated for the case where the data are defined on the unit disk and the symmetry group of the data is the group SU(1,1) (introduced later). In the harmonic analysis framework the simplest groups are the commutative groups, which lead to ordinary Fourier transforms. Next come the non-commutative but compact groups, like the rotation groups, which are related to finite-dimensional invariant subspaces and corresponding series expansions (like the spherical harmonics). The group SU(1,1) is the simplest example of a non-commutative and non-compact group, and the corresponding Fourier transform is an integral transform, the Mehler-Fock transform, described in the following sections.

The group SU(1,1) is well known in mathematics and theoretical physics, but there are only a few applications in signal processing. One reason for this is certainly that the use of the unit disk as a domain is not as obvious as that of other domains. The natural domain for time sequences is the real line, with the shift operators as symmetry group. The plane and three-dimensional space are domains for spatial processes, with shifts of the origin as group elements. The rotation group is related to unit spheres and orientations. In the next section it will be shown that stochastic processes with only positive values have a natural conical structure with SU(1,1) as the natural symmetry group. Then the main results regarding the Mehler-Fock transform will be presented, and a few properties of the transform will be illustrated with an application from color signal processing.

## 2. Positive Signals

The raw output values of many measurement processes are non-negative numbers. An extreme example is binary measurements indicating the absence or presence of a response, which is typical for biological and binary technical systems. Other common examples are size, weight, age/duration or the photon counts of a camera. For simplicity only the case of strictly positive measurements will be considered. This is natural in many cases. In imaging science, for example, measurements often represent counts, such as the number of interactions of photons with the sensor in a given time interval. Sensors are never perfect, and cases where zero counts are registered should be very rare. The measurement values are therefore almost always positive numbers.

In the following, functions (signals) that are outcomes of stochastic processes are considered. It is assumed that they are elements of a vector (or Hilbert) space and that they take positive values only. In this space one can use principal component analysis to introduce a basis in which the first eigenvector has only positive components. This follows from the Perron-Frobenius (Krein-Rutman) theorem. Only processes in which the first eigenvector is (approximately) proportional to the mean vector will be considered. It is also assumed that all relevant signals lie in a conical subset which is the product of the positive half-axis and the n-dimensional solid unit ball. In the following only the case ${\mathbb{R}}^{+}\times \mathcal{D}$, where $\mathcal{D}$ denotes the unit disk, is considered. Generalizations to higher dimensions are straightforward.

This is formulated in the following definition:

**Definition** **1.** A positive stochastic process is a function $f(\omega ,\mathbf{x})>0$, where ω is the stochastic variable and $\mathbf{x}$ is an element of a set $\mathbf{X}$ on which the function is defined. For every ω the function $f(\omega ,\mathbf{x})$ is an element of a Hilbert space with scalar product $\langle\cdot,\cdot\rangle$.

Typical examples of $\mathbf{X}$ are:

- a set of points ${\mathbf{x}}_{k}$ on the plane which describe the location of the $k$-th sensor element (pixel);
- a set of points ${\mathbf{x}}_{k}$ in a time interval denoting the times at which the measurements were obtained;
- a set of indices ${\mathbf{x}}_{k}$ describing the spectral sensitivity of color sensor number $k$.

For a positive stochastic process $f$ the expectation and the correlation function are defined as usual:

**Definition** **2.** The expectation is defined as ${E}_{f}(\mathbf{x})=\int f(\omega ,\mathbf{x})\,d\omega $.

The correlation function is defined as ${C}_{f}({\mathbf{x}}_{1},{\mathbf{x}}_{2})=\int f(\omega ,{\mathbf{x}}_{1})\,f(\omega ,{\mathbf{x}}_{2})\,d\omega $.

The correlation function ${C}_{f}({\mathbf{x}}_{1},{\mathbf{x}}_{2})$ defines an integral operator ${I}_{f}$:

$$({I}_{f}h)({\mathbf{x}}_{1})=\int {C}_{f}({\mathbf{x}}_{1},{\mathbf{x}}_{2})\,h({\mathbf{x}}_{2})\,d{\mathbf{x}}_{2}.$$

The correlation function is symmetric, i.e., ${C}_{f}({\mathbf{x}}_{1},{\mathbf{x}}_{2})={C}_{f}({\mathbf{x}}_{2},{\mathbf{x}}_{1})$, and for all positive stochastic processes the expectation and the correlation function are positive everywhere. For a finite set $\mathbf{X}$ the expectation is a vector and the correlation function is a symmetric positive matrix (having only positive elements), which is also positive semi-definite.

The operator ${I}_{f}$ is positive (it maps positive functions to positive functions) and positive semi-definite, and the Krein-Rutman theorem (the Perron-Frobenius theorem in the case of matrices) shows that its first eigenfunction is a positive function. All eigenvalues are non-negative. Denote the eigenfunctions of ${I}_{f}$ (sorted by decreasing eigenvalues) as ${b}_{0},{b}_{1},\dots $. Then $\langle f,{b}_{0}\rangle>0$.
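The positivity argument can be checked numerically for a finite set $\mathbf{X}$. The sketch below is a minimal illustration, assuming synthetic, strictly positive three-channel signals (the uniform sampling is our choice, not data from the paper): it forms the discrete correlation matrix $C_f$, sorts its eigenvectors by decreasing eigenvalue and checks that $b_0$ has only positive entries and that $\langle f,b_0\rangle>0$ for every sample.

```python
import numpy as np


def sorted_eigensystem(samples: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Eigenvalues (descending) and eigenvectors (columns) of C = E[f f^T].

    samples has shape (n_samples, n_points); every row is one positive signal f.
    """
    corr = samples.T @ samples / samples.shape[0]  # discrete version of C_f(x1, x2)
    eigvals, eigvecs = np.linalg.eigh(corr)        # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if eigvecs[:, 0].sum() < 0:                    # eigenvectors are defined up to sign
        eigvecs[:, 0] = -eigvecs[:, 0]
    return eigvals, eigvecs


rng = np.random.default_rng(0)
signals = rng.uniform(0.1, 1.0, size=(500, 3))     # synthetic positive "RGB" signals
vals, vecs = sorted_eigensystem(signals)
b0 = vecs[:, 0]
mean = signals.mean(axis=0)
cos_to_mean = b0 @ mean / np.linalg.norm(mean)     # b0 already has unit norm
print(vals, b0, cos_to_mean)
```

In this synthetic example $b_0$ is also nearly parallel to the mean vector, which is the situation assumed above.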

The following example, a model of an RGB camera, should give an idea of a situation in which the unit disk model arises in signal processing. A more general and detailed application is discussed later, when the properties of the Mehler-Fock transform are illustrated. As a first approximation it is assumed that such a camera acts as a linear system and that the measured RGB values are linearly related to the incoming light. In practice this implies using the raw data from the camera sensor, excluding processing steps like white balancing and gamma mapping. Also excluded, in this first step, are pixels outside the linear response region of the camera, which avoids critical effects such as saturation of the sensor. This will be discussed later in connection with the experiments. Under these conditions, doubling the exposure time multiplies the RGB vector by a factor of two.

Under this condition the group ${\mathbb{R}}^{+}$ of positive real numbers acts on an RGB vector $f(\mathbf{x})$ as $(c,f(\mathbf{x}))\mapsto cf(\mathbf{x})$. The elements of the group ${\mathbb{R}}^{+}$ are intensity transformations, and ${\|f(\mathbf{x})\|}_{1}$ is defined as the intensity of the RGB vector $f(\mathbf{x})$. The result $z$ of the projection $f(\mathbf{x})\mapsto z=f(\mathbf{x})/{\|f(\mathbf{x})\|}_{1}$ defines the chroma of the RGB vector $f(\mathbf{x})$.

The main observation now is that chroma vectors are located on a disk with a finite radius: since they have positive components and unit $L_1$ norm, they lie in a bounded triangle in a two-dimensional plane. After a scaling operation it can therefore be assumed that all chroma vectors are located on the unit disk. An investigation of the general case of multispectral color distributions can be found in [1,2] and will be studied in Section 5.
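A minimal sketch of the intensity/chroma split, assuming raw linear RGB values stored row-wise in a NumPy array. The helper `chroma_to_disk` is hypothetical: it expresses the $L_1$-normalized vectors in an orthonormal basis of the plane $r+g+b=1$ centred at the grey point and rescales them so that the largest radius is 0.97, one possible version of the scaling operation mentioned above.

```python
import numpy as np


def chroma_l1(rgb: np.ndarray) -> np.ndarray:
    """z = f / ||f||_1 for strictly positive RGB rows (||f||_1 is the row sum)."""
    return rgb / rgb.sum(axis=-1, keepdims=True)


def chroma_to_disk(chroma: np.ndarray, max_radius: float = 0.97) -> np.ndarray:
    """Hypothetical helper: 2-D coordinates in the plane r + g + b = 1, scaled into the disk."""
    e1 = np.array([1.0, -1.0, 0.0]) / np.sqrt(2.0)     # orthonormal basis of the plane,
    e2 = np.array([1.0, 1.0, -2.0]) / np.sqrt(6.0)     # both orthogonal to (1, 1, 1)
    centred = chroma - 1.0 / 3.0                       # grey point moves to the origin
    z = centred @ e1 + 1j * (centred @ e2)
    return z * (max_radius / np.abs(z).max())          # largest radius becomes max_radius


rng = np.random.default_rng(1)
rgb = rng.uniform(0.05, 1.0, size=(1000, 3))          # synthetic linear RGB values
z = chroma_to_disk(chroma_l1(rgb))
z_brighter = chroma_to_disk(chroma_l1(4.0 * rgb))     # intensity transformation c = 4
print(np.allclose(z, z_brighter), np.abs(z).max())
```

Because the chroma is computed after dividing by the intensity, the intensity transformation $c=4$ leaves every point on the disk unchanged.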

## 3. The Unit Disk and the Group SU(1,1)

In the following, points on the unit disk are described by complex variables $z$. The group $\mathrm{SU}(1,1)$ is defined as follows:

**Definition** **3.** The group $\mathrm{SU}(1,1)$ consists of all $2\times 2$ matrices with complex elements of the form

$$\mathbf{M}=\begin{pmatrix} a & b\\ \bar{b} & \bar{a}\end{pmatrix},\qquad |a|^2-|b|^2=1.$$

The group operation is the usual matrix multiplication.

The group acts as a transformation group on the open unit disk $\mathcal{D}$ (consisting of all points $z\in \mathbb{C}$ with $|z|<1$) by the Möbius transforms

$$\mathbf{M}\langle z\rangle=\frac{az+b}{\bar{b}z+\bar{a}},$$

with $({\mathbf{M}}_{1}{\mathbf{M}}_{2})\langle z\rangle={\mathbf{M}}_{1}\langle{\mathbf{M}}_{2}\langle z\rangle\rangle$ for all matrices and all points. In the following the notation $\mathbf{M}$ is used when the group elements are the matrices. In situations where they represent elements of the abstract group, or where the parametrization of the Möbius transforms is essential, the symbol $\mathit{g}$ is used. This will become clearer when the general transform is decomposed into a product of three special transforms. This corresponds to the well-known property of the group of ordinary three-dimensional rotations that every rotation can be written as a product of three rotations around coordinate axes. A similar decomposition holds for the group $\mathrm{SU}(1,1)$. In this case the three parameters are denoted by $\phi ,\tau ,\psi $ and the decomposition is

$$\mathit{g}(\phi,\tau,\psi)=\mathit{g}(\phi,0,0)\,\mathit{g}(0,\tau,0)\,\mathit{g}(0,0,\psi).$$

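This motivates the introduction of two special types of transformations/matrices, defined as:

$$K=\left\{\begin{pmatrix} e^{i\phi/2} & 0\\ 0 & e^{-i\phi/2}\end{pmatrix}:\ \phi\in\mathbb{R}\right\},\qquad A=\left\{\begin{pmatrix}\cosh(\tau/2) & \sinh(\tau/2)\\ \sinh(\tau/2) & \cosh(\tau/2)\end{pmatrix}:\ \tau\in\mathbb{R}\right\}.$$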

This shows that $\mathit{g}(\phi ,0,0),\mathit{g}(0,0,\psi )\in K$ and $\mathit{g}(0,\tau ,0)\in A$. This decomposition is known as Cartan's KAK decomposition of the group, and $\phi ,\tau ,\psi $ are known as the KAK coordinates. The formula for the inverse, $\mathit{g}{(\phi ,\tau ,\psi )}^{-1}=\mathit{g}(-\psi ,-\tau ,-\phi )$, follows directly from the product decomposition and the observation that the inverse of a transformation in $K$ or $A$ is given by the transformation with the negated parameter value.
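The group action can be checked numerically. This sketch assumes the parametrization $a=\cosh t\,e^{i\alpha}$, $b=\sinh t\,e^{i\beta}$ for elements of $\mathrm{SU}(1,1)$ (which satisfies $|a|^2-|b|^2=1$) and verifies the determinant condition, the composition rule $(\mathbf{M}_1\mathbf{M}_2)\langle z\rangle=\mathbf{M}_1\langle\mathbf{M}_2\langle z\rangle\rangle$ and that the open unit disk is mapped into itself. The helper names are ours.

```python
import numpy as np


def su11(t: float, alpha: float, beta: float) -> np.ndarray:
    """Element [[a, b], [conj(b), conj(a)]] with a = cosh(t) e^{i alpha}, b = sinh(t) e^{i beta}."""
    a = np.cosh(t) * np.exp(1j * alpha)
    b = np.sinh(t) * np.exp(1j * beta)
    return np.array([[a, b], [np.conj(b), np.conj(a)]])


def mobius(m: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Möbius action M<z> = (m00 z + m01) / (m10 z + m11)."""
    return (m[0, 0] * z + m[0, 1]) / (m[1, 0] * z + m[1, 1])


rng = np.random.default_rng(2)
n = 200
z = 0.9 * np.sqrt(rng.uniform(size=n)) * np.exp(2j * np.pi * rng.uniform(size=n))
m1, m2 = su11(0.7, 0.3, -1.1), su11(1.2, 2.0, 0.4)

det_ok = abs(np.linalg.det(m1) - 1.0) < 1e-12
composition_ok = np.allclose(mobius(m1 @ m2, z), mobius(m1, mobius(m2, z)))
stays_in_disk = np.abs(mobius(m1, z)).max() < 1.0
print(det_ok, composition_ok, stays_in_disk)
```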

From the definition it follows that the matrices in $K$ are rotations that leave the origin fixed: $0=\mathit{g}(\phi ,0,0)\langle 0\rangle=\mathit{g}(0,0,\psi )\langle 0\rangle$. For a general element of $\mathrm{SU}(1,1)$ the following parametrization of the unit disk follows directly from the KAK decomposition:

$$\mathit{g}(\phi,\tau,\psi)\langle 0\rangle=e^{i\phi}\tanh(\tau/2).$$

Two transformations are defined as equivalent if their first two parameters are identical. From the previous calculations it can be seen that all equivalent transformations map the origin to the same point on the unit disk, which defines a correspondence between the points on the unit disk and the equivalence classes of transformations. This is expressed as $\mathcal{D}=\mathrm{SU}(1,1)/K$. It also follows that functions on the unit disk are functions on the group that are independent of the last argument of the KAK decomposition.
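The KAK coordinates can be illustrated in the same way. The sketch below assumes a common half-angle convention, $\mathit{g}(\phi,0,0)=\mathrm{diag}(e^{i\phi/2},e^{-i\phi/2})$ and $\mathit{g}(0,\tau,0)$ with entries $\cosh(\tau/2)$ and $\sinh(\tau/2)$; with a different convention the factor $1/2$ changes. Under this assumption $\mathit{g}(\phi,\tau,\psi)\langle 0\rangle=e^{i\phi}\tanh(\tau/2)$ for every $\psi$, and the inverse formula $\mathit{g}(\phi,\tau,\psi)^{-1}=\mathit{g}(-\psi,-\tau,-\phi)$ can be checked directly.

```python
import numpy as np


# Assumed half-angle convention for the K and A factors (a hypothetical choice for illustration).
def k_mat(phi: float) -> np.ndarray:
    return np.array([[np.exp(0.5j * phi), 0.0], [0.0, np.exp(-0.5j * phi)]])


def a_mat(tau: float) -> np.ndarray:
    ch, sh = np.cosh(tau / 2), np.sinh(tau / 2)
    return np.array([[ch, sh], [sh, ch]], dtype=complex)


def g(phi: float, tau: float, psi: float) -> np.ndarray:
    """KAK product g(phi, tau, psi) = g(phi, 0, 0) g(0, tau, 0) g(0, 0, psi)."""
    return k_mat(phi) @ a_mat(tau) @ k_mat(psi)


def image_of_origin(m: np.ndarray) -> complex:
    """M<0> = m01 / m11 for the Möbius action (m00 z + m01) / (m10 z + m11)."""
    return m[0, 1] / m[1, 1]


phi, tau = 0.8, 1.5
points = np.array([image_of_origin(g(phi, tau, psi)) for psi in (0.0, 1.0, -2.5)])
inverse_ok = np.allclose(np.linalg.inv(g(phi, tau, 0.3)), g(-0.3, -tau, -phi))
print(points, np.exp(1j * phi) * np.tanh(tau / 2), inverse_ok)
```

All three values of $\psi$ give the same point, which is the numerical counterpart of $\mathcal{D}=\mathrm{SU}(1,1)/K$.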

## 4. Kernel Density Estimators and Mehler-Fock Transform

Both interpretations are useful in motivating the basic idea behind kernel density estimators on the unit disk. Consider first two points $w,z\in \mathcal{D}$. Here $z$ denotes the measurement, the actual outcome of a stochastic process. Since the process is stochastic, the outcome could also have been a neighboring point $w$. This leads to the definition of the kernel function $\mathrm{k}(w,z)$ as the probability of obtaining a measurement $w$ given that the actual measurement resulted in the point $z$. In cases where no extra knowledge is available it is natural to treat all point pairs $(w,z)$ on an equal footing. This leads to the requirement $\mathrm{k}(\mathbf{M}\langle w\rangle,\mathbf{M}\langle z\rangle)=\mathrm{k}(w,z)$ for all $\mathbf{M}\in \mathrm{SU}(1,1)$. In cases where the data can be expected to be concentrated in a few separate clusters, other methods such as mixture models (see [3] for an introduction and [4] for an example where the data points are covariance matrices) might be considered.

The Möbius transforms define an invariant metric, and the (hyperbolic) distance between points $z$ and $w$ in this metric is denoted by $d(w,z)$. In the following only kernel functions of the form $\mathrm{k}(d(w,z))$ will be used. It is therefore sufficient to consider the special case where one point is the origin and the other is located on the real axis ($w=0$ and $z=\tanh(\tau)$), since every pair $(w,z)$ can be transformed to this form by applying a Möbius transform to both points. For these point configurations the distance is given by $d(0,\tanh\tau )=2\tau $. This shows that the border of the unit disk had to be excluded, since its points are infinitely far away from the origin. The general formula for the hyperbolic distance between two points is:

$$d(w,z)=2\,\mathrm{artanh}\left|\frac{z-w}{1-\bar{w}z}\right|.$$
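A short sketch, assuming the standard Poincaré-disk distance $d(w,z)=2\,\mathrm{artanh}|(z-w)/(1-\bar{w}z)|$, that checks numerically the properties used in the argument: $d(0,\tanh\tau)=2\tau$, invariance of $d$ under a Möbius transform applied to both points, and unbounded growth near the border. The coefficients `a`, `b` are arbitrary illustrative values with $|a|^2-|b|^2=1$.

```python
import numpy as np


def hyperbolic_distance(w, z):
    """d(w, z) = 2 artanh |(z - w) / (1 - conj(w) z)| on the open unit disk."""
    return 2.0 * np.arctanh(np.abs((z - w) / (1.0 - np.conj(w) * z)))


def mobius(a, b, z):
    """Möbius transform of the SU(1,1) matrix [[a, b], [conj(b), conj(a)]]."""
    return (a * z + b) / (np.conj(b) * z + np.conj(a))


tau = 0.9
d_axis = hyperbolic_distance(0.0, np.tanh(tau))          # should equal 2 * tau

a = np.cosh(0.6) * np.exp(0.4j)                          # |a|^2 - |b|^2 = 1
b = np.sinh(0.6) * np.exp(-1.3j)
w, z = 0.3 - 0.5j, -0.2 + 0.1j
d_before = hyperbolic_distance(w, z)
d_after = hyperbolic_distance(mobius(a, b, w), mobius(a, b, z))
d_edge = hyperbolic_distance(0.0, 1.0 - 1e-12)           # grows without bound as |z| -> 1
print(d_axis, d_before, d_after, d_edge)
```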

In the following this construction will be generalized to general group elements. The measured point $z$ is replaced by the group element ${\mathit{g}}_{l}$ and the neighboring point $w$ by a general group element $\mathit{g}$. The fact that all pairs of group elements are treated equally motivates the following condition on the kernels:

$$\mathrm{k}({\mathit{g}}_{0}\mathit{g},{\mathit{g}}_{0}{\mathit{g}}_{l})=\mathrm{k}(\mathit{g},{\mathit{g}}_{l})\quad\text{for all } {\mathit{g}}_{0}\in\mathrm{SU}(1,1).$$

Such functions are known as (left-)isotropic kernels. This leads to the following construction of the kernel density estimator (KDE):

**Definition** **4.** Given a stochastic process defined on the group SU(1,1) with observed group elements ${\mathit{g}}_{l}$, $l=1,\dots ,L$, the kernel density estimator (KDE) of the probability density function $p$ is given by:

$$\hat{p}(\mathit{g})=\frac{1}{L}\sum_{l=1}^{L}\mathrm{k}(\mathit{g},{\mathit{g}}_{l}).$$

Only isotropic kernels will be used in the following, and it follows that $\mathrm{k}(\mathit{g},{\mathit{g}}_{l})=\mathrm{k}({\mathit{g}}_{l}^{-1}\mathit{g},\mathit{e})=\mathrm{k}({\mathit{g}}_{l}^{-1}\mathit{g})$, where $\mathit{e}$ is the identity element. Using KAK coordinates for two points $\mathit{g}=\mathit{g}({\phi}_{0},{\tau}_{0},0)$ and ${\mathit{g}}_{l}=\mathit{g}({\phi}_{l},{\tau}_{l},0)$ on the unit disk gives the following relation between them and the difference ${\mathit{g}}_{l}^{-1}\mathit{g}$ with its own decomposition ${\mathit{g}}_{l}^{-1}\mathit{g}=\mathit{g}(\phi ,\tau ,0)$:

$${\mathit{g}}_{l}^{-1}\mathit{g}=\mathit{g}(0,-{\tau}_{l},-{\phi}_{l})\,\mathit{g}({\phi}_{0},{\tau}_{0},0).$$

In ([5], Vol. 1, page 271), the relation between the parameters of the three group elements can be found as follows:

$$\cosh\tau=\cosh{\tau}_{0}\cosh{\tau}_{l}-\sinh{\tau}_{0}\sinh{\tau}_{l}\cos({\phi}_{0}-{\phi}_{l}).\tag{7}$$

As before in the case of the hyperbolic distance, only kernel functions of the form

$$\mathrm{k}(\mathit{g},{\mathit{g}}_{l})=\mathrm{k}(\cosh\tau)\tag{8}$$

are considered in the following.
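The estimator of Definition 4 can be sketched directly on the disk. This is a minimal illustration under stated assumptions: points are parametrized as $e^{i\phi}\tanh(\tau/2)$ (a half-angle KAK convention, so that $\tau$ is the hyperbolic distance from the origin), the kernel is $\mathrm{k}(\cosh\tau)=(\cosh\tau)^{-C}$ with a freely chosen $C$, and the estimate $\hat p(z)=\frac{1}{L}\sum_l \mathrm{k}(\cosh d(z,z_l))$ is left unnormalized. `cosh_distance` uses the standard identity $\cosh d(w,z)=1+2|z-w|^2/((1-|z|^2)(1-|w|^2))$, and the script checks it against the hyperbolic law of cosines $\cosh\tau=\cosh\tau_0\cosh\tau_l-\sinh\tau_0\sinh\tau_l\cos(\phi_0-\phi_l)$.

```python
import numpy as np


def cosh_distance(w, z):
    """cosh of the hyperbolic distance on the unit disk (curvature -1)."""
    num = 2.0 * np.abs(z - w) ** 2
    return 1.0 + num / ((1.0 - np.abs(w) ** 2) * (1.0 - np.abs(z) ** 2))


def kde_disk(z_eval, z_data, C=4.0):
    """Unnormalised isotropic KDE: mean over l of (cosh d(z, z_l)) ** (-C)."""
    x = cosh_distance(z_eval[:, None], z_data[None, :])
    return np.mean(x ** (-C), axis=1)


rng = np.random.default_rng(3)
tau_l = rng.uniform(0.2, 2.0, size=50)
phi_l = rng.uniform(0.0, 2 * np.pi, size=50)
data = np.tanh(tau_l / 2) * np.exp(1j * phi_l)   # assumed convention: g<0> = e^{i phi} tanh(tau/2)

tau0, phi0 = 1.1, 0.4
z0 = np.tanh(tau0 / 2) * np.exp(1j * phi0)
lhs = cosh_distance(z0, data)
rhs = np.cosh(tau0) * np.cosh(tau_l) - np.sinh(tau0) * np.sinh(tau_l) * np.cos(phi0 - phi_l)
print(np.allclose(lhs, rhs))
p = kde_disk(np.array([0.0, z0]), data)
print(p)
```

Every term requires the distance to each stored point, which is exactly the cost that the harmonic-analysis approach below aims to avoid.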

In general one has to compute the function values $\mathrm{k}(\mathit{g},{\mathit{g}}_{l})$ for every pair $(\mathit{g},{\mathit{g}}_{l})$. Of special interest are functions which separate these factors. These are the associated Legendre functions (ALFs), also known as zonal or Mehler functions ([5], page 324), of order $m$ and degree $\alpha =-1/2+i\kappa $, defined as (see Equation (9)):

For these functions the following addition formula ([5], page 327) holds:

Using ${\mathfrak{P}}_{-1/2+i\kappa}(\cosh\tau)$ in the previous Equations (7) and (8) gives:

separating the influence of the data-dependent terms, originating in ${\mathit{g}}_{l}$ with parameters ${\tau}_{l},{\phi}_{l}$, from the general part depending on $\mathit{g}$ with ${\tau}_{0},{\phi}_{0}$.

The choice $\mathrm{k}(\cosh\tau)={\mathfrak{P}}_{-1/2+i\kappa}(\cosh\tau)$ can be generalized by the Mehler-Fock transform, which shows that a large class of functions are combinations of ALFs:

**Theorem** **1** (Mehler-Fock Transform; MFT).

For a function $\mathrm{k}$ defined on the interval $[1,\infty )$ define its transform $c$ as:

$$c(\kappa)=\int_{1}^{\infty}\mathrm{k}(x)\,{\mathfrak{P}}_{-1/2+i\kappa}(x)\,dx.\tag{12}$$

Then $\mathrm{k}$ can be recovered by the inverse transform:

$$\mathrm{k}(x)=\int_{0}^{\infty}\kappa\tanh(\pi\kappa)\,{\mathfrak{P}}_{-1/2+i\kappa}(x)\,c(\kappa)\,d\kappa.$$

The MFT also preserves the scalar product (Parseval relation): if ${c}_{n}(\kappa )={\int}_{1}^{\infty}{f}_{n}(x)\,{\mathfrak{P}}_{-1/2+i\kappa}(x)\,dx$ for $n=1,2$, then

$$\int_{1}^{\infty}f_1(x)\,\overline{f_2(x)}\,dx=\int_{0}^{\infty}\kappa\tanh(\pi\kappa)\,c_1(\kappa)\,\overline{c_2(\kappa)}\,d\kappa.$$

This, and many details about the transform, special cases and its applications, can be found in ([6], Section 7.6) and [5,7,8,9]. The description of the Parseval relation can be found in ([6], Section 7.6.16) and ([6], Section 7.7.1).
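Because the kernels used later depend on $\cosh\tau$, it is convenient to write the forward transform of Theorem 1 as an integral over the hyperbolic angle. A short derivation, assuming $\mathrm{k}$ is such that the integral converges: substituting $x=\cosh\tau$, $dx=\sinh\tau\,d\tau$ gives

$$c(\kappa)=\int_{1}^{\infty}\mathrm{k}(x)\,\mathfrak{P}_{-1/2+i\kappa}(x)\,dx=\int_{0}^{\infty}\mathrm{k}(\cosh\tau)\,\mathfrak{P}_{-1/2+i\kappa}(\cosh\tau)\,\sinh\tau\,d\tau .$$

Laplace's integral $\mathfrak{P}_{-1/2+i\kappa}(\cosh\tau)=\frac{1}{\pi}\int_0^\pi(\cosh\tau+\sinh\tau\cos\theta)^{-1/2+i\kappa}\,d\theta$ implies $|\mathfrak{P}_{-1/2+i\kappa}(\cosh\tau)|\le\mathfrak{P}_{-1/2}(\cosh\tau)=O(\tau e^{-\tau/2})$. For a kernel $\mathrm{k}(\cosh\tau)=(\cosh\tau)^{-C}$ the integrand is therefore $O(\tau e^{(1/2-C)\tau})$, so the transform exists for $C>1/2$, and in practice a truncated $\tau$-interval suffices.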

The integral in Equation (12) is essentially the scalar product of the function $\mathrm{k}$ and the ALF, defined by the invariant Haar integral on the group. Applying the integral transform to the shifted kernel, and using the group invariance of the transform and the addition formula for the ALFs, results in the following equations:

Here the coefficients ${\gamma}_{\kappa ml}$ are computed from the measurements, and the weights ${w}_{\kappa m}^{(\mathrm{k})}$ are independent of the data. Equation (16) shows how harmonic analysis separates the effects of the data, contained in $\gamma $, from those of the kernel, contained in $w$.

## 5. Illustration and Implementation

In the following the use of the MFT will be illustrated with a computational model of a color camera. The Mehler-Fock transform is an integral transform in which the key components are integrals over the positive half-axis. This means that the implementation has to take into account problems such as numerical accuracy and quantization. The main goal of this paper is a first feasibility study of these tools, and we will therefore comment on the relevant decisions in connection with the particular application under investigation.

The illustration studied in the following is similar to the motivating example introduced earlier, but here a multispectral approach based on the following equation is used:

where ${f}_{l}(\mathbf{x})$ is the value measured by channel $l$ at position $\mathbf{x}$, $\lambda $ denotes the wavelength variable in the relevant interval, here bounded by 380 nm and 780 nm, $l(\lambda ,T)$ is the spectral distribution of the black body radiator at temperature $T$, $r(\lambda ,\mathbf{x})$ is the reflection spectrum at object point $\mathbf{x}$, $s(\lambda ,l)$ describes the spectral characteristics of sensor channel $l$, and ${\gamma}_{l}$ are constants used for white point correction. More information about spectral models of color imaging and black body radiators can be found in [10].

The reflectance spectra used are from a database of 2782 measurements from the combined Munsell (1269 chips, see [11]) and NCS (1513 chips, measured by NCS Colour AB, http://ncscolour.com) color atlases. The spectra are measured in the spectral range from 380 nm to 780 nm in 5 nm steps. The spectral characteristics of the camera channels are described in [12], and the numerical values were downloaded from https://spectralestimation.wordpress.com/data. From the reflection spectra the CIELAB values were computed first (using the constant spectrum as the white point reference), and these CIELAB values were then converted to RGB using the lab2rgb function in Matlab. The normalization factors ${\gamma}_{l}$ were computed in two steps: first, three constants were computed such that the RGB vectors obtained from the CIELAB conversion and the RGB vectors computed from the camera model had the same mean vectors. Then a global normalization factor was computed such that the median value of the green channel was mapped to 0.5 by the tanh mapping.

The non-linear tanh map is more realistic than the usual unrestricted use of the integrated values, since all realizable sensors have a saturation limit related to the maximum capacity of the sensor. Geometrically this has a profound influence on the space of the RGB vectors. It was shown earlier that the positivity of the functions involved results in a conical structure of the measurement space. When no saturation limit is enforced, increasing the intensity of the illumination source traces out a half-line in measurement space. With the limitation (for example in the form of the tanh mapping), all curves generated by intensity changes originate in the black point, as in the unrestricted case, but the saturation limit makes all such curves terminate in the white point. Instead of an open-ended cone, saturation-limited systems therefore lead to a double-cone structure ending in a black point and a white point.
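The global normalization and the saturation behaviour can be sketched as follows. The sketch assumes that the tanh mapping is applied channel-wise to gain-scaled integrated values (placing the gain inside the tanh is our assumption), and synthetic gamma-distributed responses stand in for the simulated camera data. Since tanh is monotone, the gain that maps the median green value to 0.5 is $s=\mathrm{artanh}(0.5)/\mathrm{median}$.

```python
import numpy as np


def global_gain(raw_green: np.ndarray, target: float = 0.5) -> float:
    """Gain s with tanh(s * median(raw_green)) = target, i.e. s = artanh(target) / median."""
    return float(np.arctanh(target) / np.median(raw_green))


def saturate(raw_rgb: np.ndarray, gain: float) -> np.ndarray:
    """Assumed sensor model: channel-wise tanh of the gain-scaled integrated values."""
    return np.tanh(gain * raw_rgb)


rng = np.random.default_rng(5)
raw = rng.gamma(2.0, 1.0, size=(2782, 3))      # synthetic linear responses, one row per chip
s = global_gain(raw[:, 1])
rgb = saturate(raw, s)
print(np.median(rgb[:, 1]))                     # close to 0.5 (exact only for odd sample sizes)

# Scaling the illumination moves one chip from the black point towards the white point (1, 1, 1)
intensities = np.array([1e-6, 1.0, 1e6])
curve = saturate(np.outer(intensities, raw[0]), s)
print(curve)
```

The first and last rows of `curve` sit essentially at the black point and at the white point, the two apexes of the double cone.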

Using a constant illumination spectrum ($l(\lambda )=1$), the three eigenvectors ${b}_{0},{b}_{1},{b}_{2}$ were computed from the simulated RGB camera responses. The simulated RGB vector computed from color chip number $m$ under the constant (white) illumination spectrum is denoted by $f(m,w)$. It was first mapped to the PCA coordinates ${c}_{k}(m,w)=\langle f(m,w),{b}_{k}\rangle$, and then the corresponding 'chroma' value $z(m,w)$ was computed as

where $\sigma $ is a global constant ensuring that all chroma values are located on the unit disk. The value of $\sigma $ has to be chosen so that the maximum length of the projected vectors $z$ is less than one. Here we selected a scaling such that the maximum length is slightly lower than one (for example 0.97). This implies that there is a known upper bound for the integrals involved.
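A sketch of the PCA-based chroma coordinates under an explicit assumption: the projective form $z=(c_1+ic_2)/(\sigma c_0)$ used below is a hypothetical choice for illustration, since the text above specifies only the role of $\sigma$. Because $b_0$ is positive, $c_0>0$ for positive RGB vectors, the ratio is invariant under intensity changes, and $\sigma$ is picked so that the largest $|z|$ equals 0.97. The uniform random data are a synthetic stand-in for $f(m,w)$.

```python
import numpy as np


def pca_basis(rgb: np.ndarray) -> np.ndarray:
    """Eigenvectors b_0, b_1, b_2 (columns) of the uncentred correlation matrix, b_0 positive."""
    vals, vecs = np.linalg.eigh(rgb.T @ rgb / len(rgb))
    vecs = vecs[:, np.argsort(vals)[::-1]]
    if vecs[:, 0].sum() < 0:
        vecs[:, 0] *= -1.0
    return vecs


def pca_chroma(rgb, basis, sigma=None, max_radius=0.97):
    """Hypothetical projective chroma z = (c_1 + i c_2) / (sigma * c_0), c_k = <f, b_k>."""
    c = rgb @ basis
    u = (c[:, 1] + 1j * c[:, 2]) / c[:, 0]      # c_0 > 0 because f and b_0 are positive
    if sigma is None:
        sigma = np.abs(u).max() / max_radius    # largest |z| becomes max_radius
    return u / sigma, sigma


rng = np.random.default_rng(6)
rgb_white = rng.uniform(0.05, 1.0, size=(2782, 3))   # synthetic stand-in for f(m, w)
basis = pca_basis(rgb_white)
z, sigma = pca_chroma(rgb_white, basis)
z_bright, _ = pca_chroma(3.0 * rgb_white, basis, sigma)
print(sigma, np.abs(z).max(), np.allclose(z, z_bright))
```

Passing a previously computed `sigma` keeps one global constant for all illuminants, as in the text; in that case points from other illuminants are not guaranteed to stay below 0.97 unless $\sigma$ is chosen over the full data set.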

Next, ten illumination spectra with temperatures between 2000 K and 9000 K were used, sampled at equal distances on the inverse temperature (mired) scale, which is more closely related to properties of human color perception. Together with the constant spectrum this resulted in the $11\times 2782$ chroma values $z(k,T)$, $k=1,\dots ,2782$, $T=2000,\dots ,9000,w$. The images in Figure 1 illustrate the differences between the second image (2190 K) in the black body series and the image generated with a flat illumination spectrum.
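The illumination series can be generated as follows. Sampling ten values equally spaced in mired ($10^6/T$) between 2000 K and 9000 K gives $T_k=162000/(81-7k)$ K, $k=0,\dots,9$, so the second temperature is $162000/74\approx 2189$ K, consistent with the 2190 K quoted above. The relative spectra use Planck's law on the 380 nm to 780 nm grid in 5 nm steps; normalizing each spectrum to 1 at 560 nm is our own choice.

```python
import numpy as np

H_PLANCK = 6.62607015e-34   # J s
C_LIGHT = 2.99792458e8      # m / s
K_BOLTZMANN = 1.380649e-23  # J / K

# Ten colour temperatures equally spaced on the mired scale (1e6 / T) between 2000 K and 9000 K
mired = np.linspace(1e6 / 2000.0, 1e6 / 9000.0, 10)
temperatures = 1e6 / mired


def planck_relative(lam_nm: np.ndarray, temp: float, ref_nm: float = 560.0) -> np.ndarray:
    """Black body spectral radiance (Planck's law), normalised to 1 at ref_nm."""
    def radiance(lam_m):
        return 1.0 / (lam_m ** 5 * np.expm1(H_PLANCK * C_LIGHT / (lam_m * K_BOLTZMANN * temp)))
    return radiance(lam_nm * 1e-9) / radiance(ref_nm * 1e-9)


wavelengths = np.arange(380, 781, 5)            # 380 nm to 780 nm in 5 nm steps
spectra = np.stack([planck_relative(wavelengths, t) for t in temperatures])
print(np.round(temperatures, 1))
```

The constant factor $2hc^2$ of Planck's law is omitted because it cancels in the normalization.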

The corresponding projections $z(m,w)$ onto the unit disk (computed from the two images shown in Figure 1) are shown in Figure 2. They show that the points related to the coordinate system defining the PCA are located near the center of the disk, whereas shifting the illumination towards red moves the corresponding points towards the border of the disk.

The next figures show how the type of the statistical distributions depends on the coordinate system used. Here the 'saturation' data from the illuminations with temperature 2000 K and constant spectrum ($l(\lambda )=1$, denoted by $w$ earlier) are selected. Geometrically they are given by the radial position of the chroma points. The figures show the histograms of the two distributions and the fitted parametric distributions. In one case the radius $\rho $ in the unit disk is used, and in the other the group-theoretically motivated hyperbolic angle $\tau $. Three different types of parametric distributions are used: normal distributions, generalized extreme value (GEV) distributions and the Beta distribution.

Figure 3a shows the fitted distributions for the variables computed from the 2000 K black body radiator. The data are almost normally distributed, with a slight upper tail that results in a better fit by the GEV.

Figure 3b shows that the white illumination source produced a distribution mainly concentrated around the origin.

The general picture is the same for the data parametrized by the radius. This is illustrated in Figure 4. Again the 2000 K radiator generates a symmetrical distribution around the mean, with good fits of the normal and the GEV distributions. The white illumination has the same characteristic as in the hyperbolic parametrization. The difference here is that, by definition, the output values are located in the unit interval. For such data the Beta distribution is a popular statistical model, and it gives a good fit in both cases.
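The fits in Figures 3 and 4 can be reproduced in outline with `scipy.stats`. The sketch below uses synthetic Beta-distributed radii in place of the paper's data, and scipy is a stand-in for whatever fitting software was actually used. It fits a normal, a GEV and a Beta distribution (the latter with support fixed to the unit interval via `floc=0, fscale=1`) and compares the log-likelihoods.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
rho = rng.beta(2.0, 5.0, size=5000)             # synthetic radii in (0, 1), not the paper's data

normal_params = stats.norm.fit(rho)                        # (loc, scale)
gev_params = stats.genextreme.fit(rho)                     # (shape, loc, scale)
a, b, loc, scale = stats.beta.fit(rho, floc=0, fscale=1)   # support fixed to [0, 1]

loglik = {
    "normal": stats.norm.logpdf(rho, *normal_params).sum(),
    "gev": stats.genextreme.logpdf(rho, *gev_params).sum(),
    "beta": stats.beta.logpdf(rho, a, b, loc, scale).sum(),
}
print(round(a, 2), round(b, 2), loglik)
```

For the radius $\rho$, fixing the Beta support to $[0,1]$ reflects the fact that the values lie in the unit interval by construction; for the unbounded hyperbolic angle $\tau$ only the normal and GEV fits apply directly.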

From Equation (9) it can be seen that in the case where the angle $\theta $ has the value zero (both points have the same angular value) the ALFs are zero for $m\ne 0$. This case will be illustrated next. From Equation (16) it follows that ${c}_{\kappa l}={\gamma}_{\kappa 0l}{w}_{\kappa 0}^{(\mathrm{k})}$. The value of ${\gamma}_{\kappa 0l}$ is given by the ALF, and the values ${w}_{\kappa 0}^{(\mathrm{k})}$ are given by the MFT of the kernel. The definition of the ALFs given in Equation (9) is not suitable for numerical computations. Instead, the connection between the conical functions and the Gauss hypergeometric function is often used (see [13] and the references mentioned there). In the following experiments the implementations of special functions in Mathematica were used and the numerical results exported to Matlab. In this process two problems have to be solved: (a) the integrals related to the coefficients ${w}_{\kappa m}^{(\mathrm{k})}$ from Equation (16) have to be calculated, and (b) the discrete values in the $\kappa $-domain have to be selected.
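As an alternative to the hypergeometric representation mentioned above, here is a minimal sketch that evaluates the conical function $\mathfrak{P}_{-1/2+i\kappa}(\cosh\tau)$ (order $m=0$) from Laplace's integral $\frac{1}{\pi}\int_0^\pi(\cosh\tau+\sinh\tau\cos\theta)^{-1/2+i\kappa}\,d\theta$ with Gauss-Legendre quadrature. This is a swapped-in technique, not the Mathematica implementation used in the experiments. For $\kappa=0$ it can be checked against the closed form $\mathfrak{P}_{-1/2}(\cosh\tau)=\frac{2}{\pi\cosh(\tau/2)}K(\tanh(\tau/2))$ with the complete elliptic integral $K$ (scipy's `ellipk` takes the parameter $m=k^2$).

```python
import numpy as np
from scipy.special import ellipk


def conical_p(kappa, tau, n_nodes=200):
    """Conical function P_{-1/2+i kappa}(cosh tau), order m = 0, via Laplace's integral.

    P_nu(cosh tau) = (1/pi) * int_0^pi (cosh tau + sinh tau cos theta)^nu d theta,
    nu = -1/2 + i kappa. For real kappa the result is real, so only the real part
    base^(-1/2) * cos(kappa * log(base)) of the integrand is kept.
    """
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    theta = 0.5 * np.pi * (nodes + 1.0)        # Gauss-Legendre nodes mapped to [0, pi]
    tau = np.asarray(tau, dtype=float)[..., None]
    log_base = np.log(np.cosh(tau) + np.sinh(tau) * np.cos(theta))
    integrand = np.exp(-0.5 * log_base) * np.cos(kappa * log_base)
    return (integrand @ (0.5 * np.pi * weights)) / np.pi


taus = np.array([0.5, 1.0, 2.0, 3.0])
p_kappa0 = conical_p(0.0, taus)
closed_form = 2.0 / (np.pi * np.cosh(taus / 2)) * ellipk(np.tanh(taus / 2) ** 2)
print(np.abs(p_kappa0 - closed_form).max())
print(conical_p(5.0, taus))                    # oscillating values for kappa = 5
```

The node count of 200 is a pragmatic choice for moderate $\tau$; close to the border of the disk (large $\tau$) the integrand becomes sharply peaked near $\theta=\pi$ and more nodes are needed.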

For a few kernel functions $\mathrm{k}$ the MFT is known analytically, but in general the integrals have to be computed numerically. In the experiments described here we used a two-stage procedure. First, we selected a few relevant values in the $\kappa $-domain and, for a given kernel $\mathrm{k}$, computed the numerical values of the transform with the NIntegrate function in Mathematica using standard parameter settings. This is rather time-consuming, and in the second stage we therefore selected a grid on the relevant interval of the $\tau $-axis and estimated the integral values by simple summation in Matlab. The grid points were chosen such that the Matlab and Mathematica results were comparable. The result is a matrix whose entries are the values ${w}_{\kappa m}^{(\mathrm{k})}$. Note that this matrix only has to be computed once. An investigation of the numerical computation of these values (comparable to the work reported in [14]) lies outside the scope of this report. Also, the $\kappa $-grid points were selected in an ad hoc fashion and their importance was judged afterwards.

Figure 5 shows some of the ALFs in the relevant interval.

The values of the coefficients ${w}_{\kappa 0}^{(\mathrm{k})}$ are given by the MFT of the kernel. For every kernel they have to be computed only once. This can be done by some type of numerical integration (which has to take into account the oscillatory nature of the ALFs), or one can use a kernel for which a closed expression of the MFT is available. In the following illustrations kernels of the form $\mathrm{k}(\cosh\tau)={(\cosh\tau)}^{-C}$ are used.
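The two-stage procedure described above can be mimicked in a few lines. This sketch rests on stated assumptions: it evaluates the conical functions by Laplace's integral (not by Mathematica), takes ${w}_{\kappa 0}$ to be the plain forward transform $\int_0^\infty(\cosh\tau)^{-C}\mathfrak{P}_{-1/2+i\kappa}(\cosh\tau)\sinh\tau\,d\tau$ without further normalization, approximates it by the trapezoidal rule on a truncated $\tau$-grid (the "simple summation" stage), and compares two values with adaptive quadrature from `scipy.integrate.quad` standing in for NIntegrate. The choices $C=4$, $\tau_{\max}=12$ and the grid sizes are illustrative.

```python
import numpy as np
from scipy.integrate import quad, trapezoid


def conical_p(kappa, tau, n_nodes=200):
    """P_{-1/2+i kappa}(cosh tau) from Laplace's integral (Gauss-Legendre in theta)."""
    nodes, weights = np.polynomial.legendre.leggauss(n_nodes)
    theta = 0.5 * np.pi * (nodes + 1.0)
    tau = np.asarray(tau, dtype=float)[..., None]
    log_base = np.log(np.cosh(tau) + np.sinh(tau) * np.cos(theta))
    integrand = np.exp(-0.5 * log_base) * np.cos(kappa * log_base)
    return integrand @ weights / 2.0           # (pi / 2) * sum(...) / pi


def mft_cosh_kernel(kappas, C=4.0, tau_max=12.0, n_tau=4001):
    """Trapezoidal approximation of int_0^tau_max cosh(t)^-C P_{-1/2+i kappa}(cosh t) sinh(t) dt."""
    tau = np.linspace(0.0, tau_max, n_tau)
    weight = np.cosh(tau) ** (-C) * np.sinh(tau)
    return np.array([trapezoid(weight * conical_p(k, tau), tau) for k in kappas])


kappas = np.linspace(0.05, 12.5, 50)           # kappa range of Figure 7; 50 points is our choice
w = mft_cosh_kernel(kappas)                    # computed once per kernel


def integrand(t, kappa, C=4.0):
    return np.cosh(t) ** (-C) * np.sinh(t) * float(conical_p(kappa, t))


check_kappas = [0.5, 3.0]
ref = np.array([quad(integrand, 0.0, 12.0, args=(k,), limit=200)[0] for k in check_kappas])
fast = mft_cosh_kernel(check_kappas)
print(np.abs(fast - ref).max())
```

Truncating at $\tau_{\max}=12$ is harmless here because the integrand decays like $\tau e^{-3.5\tau}$ for $C=4$.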

The width of the kernel acts as a smoothing parameter. The effect of varying the exponent of the cosh kernel is illustrated in Figure 6. The data used are the radial values (in the $\tau $ parametrization) computed from the simulated images under the 2000 K illuminant. The figure shows that broad kernels capture the general characteristics of the distribution, whereas for very narrow kernels the overall estimate becomes sensitive to small local variations.
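The smoothing effect of the exponent $C$ can be quantified with a short calculation, a sketch not taken from the paper: the kernel $(\cosh\tau)^{-C}$ drops to one half at $\tau_{1/2}=\mathrm{arccosh}(2^{1/C})$, and since $\cosh\tau\approx 1+\tau^2/2$ it behaves for large $C$ like a Gaussian in $\tau$ with standard deviation $1/\sqrt{C}$, giving $\tau_{1/2}\approx\sqrt{2\ln 2/C}$. Larger exponents therefore mean narrower kernels.

```python
import numpy as np


def half_width(C):
    """Hyperbolic angle tau at which the kernel cosh(tau) ** (-C) equals 1/2."""
    return np.arccosh(2.0 ** (1.0 / np.asarray(C, dtype=float)))


exponents = np.array([1.0, 2.0, 4.0, 8.0, 16.0, 32.0])
widths = half_width(exponents)
gaussian_approx = np.sqrt(2.0 * np.log(2.0) / exponents)   # from cosh(tau)^-C ~ exp(-C tau^2 / 2)
for C, hw, ga in zip(exponents, widths, gaussian_approx):
    print(f"C = {C:4.0f}: half-width {hw:.3f}, Gaussian approximation {ga:.3f}")
```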

Figure 7 shows a section of the Mehler-Fock transform ($0.05\le \kappa \le 12.5$) for three selected illuminants. This shows how the general shift towards the border of the unit disk (for the red-shifted illuminants of lower temperature) changes the form of the Mehler-Fock transform.