Pose Estimation for Visible Light Systems Using a Quadrature Angular Diversity Aperture Receiver

The quadrature angular diversity aperture (QADA) receiver, consisting of a quadrant photodiode (QPD) and an aperture placed above the QPD, has been investigated for pose estimation for visible light systems. Current work on pose estimation for the QADA receiver uses classical camera sensor algorithms well known in computer vision. To this end, however, the light spot center first has to be obtained based on the RSS. However, this is less straightforward than for camera sensors, as in contrast to such sensors where the relationships are linear, the RSS output from the QADA is a non-linear function of the light spot position. When applying closed form solutions or iterative methods for cameras on a QADA, the non-linearity will degrade their performance. Furthermore, since in practice the aperture is not always perfectly aligned with the QPD, a procedure to calibrate the receiver is needed. Current work on calibration requires additional sophisticated equipment to measure the pose during calibration, which increases the difficulty of implementation. In this paper, we target the above problems for pose estimation and calibration of the QADA receiver. To this end, we first study the effect of the strategy of differencing and normalization on the probability density function (PDF), a commonly applied strategy for the QPD’s robustness against RSS variation, and it is shown that the applied strategy results in a complex PDF, which makes an effective and efficient estimation hard to achieve. Therefore, we derive an approximated PDF in a simple closed-form, based on which the calibration and the pose estimation algorithms using the least squares principle are proposed. The proposed calibration does not require any information about the pose of the receiver and is robust to variation of the received power and imperfect knowledge of the radiation pattern of the LED, making it easy to implement. We also derive the corresponding Cramér-Rao lower bound on the misalignment to benchmark the performance of the misalignment and to serve as an indicator to determine the required signal-to-noise ratio (SNR) or number of LEDs to obtain a desired accuracy. The calibration and pose estimation are evaluated by means of a Monte Carlo simulation. Computer simulations show that this theoretical bound is close to the RMSE of the proposed estimator and that the proposed pose estimator outperforms the PnP algorithm.


Introduction
White LEDs can be modulated up to several MHz. Therefore, they can be used for wireless communication and positioning. As white LEDs are becoming the primary source of light, visible light positioning (VLP) has the potential to achieve positioning with low cost, low power consumption, and long lifetime [1]. Due to the dependency of the RSS values on the pose (position and orientation) of the receiver, it is expected that imperfect knowledge of the pose may strongly affect the performance of visible light communication (VLC) systems [2][3][4][5][6][7]. Recently, a compact receiver based on a quadrant PD (QPD) combined with a single aperture, i.e., the quadrature angular diversity aperture receiver (QADA), was proposed for VLP [8]. The QPD is a segmented photodiode consisting of four matched PDs on a single chip separated by very thin gaps. Combined with an aperture, the resulting receiver achieves angular diversity with a reasonably compact size. Furthermore, as only one aperture needs to be placed above the QPD, the receiver will only have a limited number of construction parameters, making it more robust to construction errors.
The light from an LED reaches the QPD through the aperture only and introduces a light spot of the same size and shape as the aperture. As the position of this light spot is determined by the position of the LED and the pose of the receiver, the overlap area between the light spot and the four quadrants contains information about the pose of the receiver. The RSS values in the four constituent PDs of the QPD are proportional to this overlap area, indicating that a comparison of these RSS values allows us to estimate the angle-of-arrival (AOA) for the different LEDs as well as the position and/or orientation (pose) of the receiver. The current work has been initiated to achieve this goal. Experiments in [8] show that high angular resolution in estimated AOA can be achieved. The positioning performance of the QADA receiver is investigated in several works [8][9][10][11]. The paper [12] considers 3D position estimation with 1D orientation estimation, i.e., it is able to determine the azimuth angle for a receiver that is pointed upwards. The paper [13] that focuses more on the communication aspects considers the effect of a tilt angle, which restricts the orientation to two degrees of freedom. Recently, the work [14] studied the joint 3D position and 3D orientation estimation, namely pose estimation, based on a classic noniterative perspective-n-point (PnP) algorithm, a well known algorithm for camera sensors in computer vision. In their work, to employ the PnP, first the light spot center is obtained based on the RSS. However, this is less straightforward than for camera sensors, as in such sensors the relationships are linear, whereas the RSS output from the QADA is a non-linear function of the light spot position. Moreover, the strategy of differencing and normalization is commonly applied to the RSS, claimed by most works to improve the robustness against RSS variation, which further increases the non-linearity. These effects will degrade the performance of the PnP algorithm when applied to the QADA receiver; the PnP algorithm is sensitive to noise and as a result it lacks precision [15].
Furthermore, since in practice the aperture is not always perfectly aligned with the QPD, a procedure to calibrate the receiver is needed. The current work starts to consider the presence of misalignment between the aperture and the QPD. In [8], the authors manually adjusted the position of the aperture as a pre-step in the experiment, and the need for calibration is further emphasized in [9]. However, such a hardware calibration is a burdensome, time-consuming procedure. A better approach is to analyze the effect of the aperture misalignment and compensate for it through signal processing. For example, in [12], the authors propose a calibration method to compensate for the mismatch using signal processing. As the authors assume in this calibration phase that the relative distance between the LED and the receiver is known, additional equipment such as Optitrack is required to determine the ground truth on this distance. This limits the usability of the considered method.
In this paper, we target the above problems for pose estimation and calibration of the QADA receiver. On the one hand, we propose a calibration procedure using signal processing that does not require the knowledge of the pose of the receiver; thus no additional hardware is needed to obtain the ground truth on this pose. On the other hand, a pose estimation algorithm is proposed that directly estimates the pose from the (normalized) RSS values, achieving better performance compared to the method that estimates the light spot center and then applies the camera algorithm. To this end, we first investigate the strategy of differencing and normalization to the RSS, i.e., where we only look at the relative differences between the normalized RSS values to obtain an observation that is robust against variation in received power and offsets common to all quadrants [12]. We investigate the distribution of the observation obtained from the noisy RSS values, based on which we then propose the algorithm to estimate the pose of the receiver that is more robust to noise.
The main contributions of this paper are as follows: 1. We first model the RSS vector of the QADA at an arbitrary pose using the perspective projection model. With the help of this model, we derive an explicit expression relating the RSS with the intrinsic parameters, i.e., the aperture height and the misalignment of the aperture, and the extrinsic parameters, i.e., the position and orientation of the receiver. 2. We use the strategy of the normalized differences of the RSS values to make the estimator robust against variations in the transmitted power and radiation pattern.
To be able to derive a simple estimation algorithm, we replace the complex true PDF of the resulting observations by an approximated Gaussian PDF based on the first order Taylor series approximation of the observation. 3. Using this approximated PDF, we propose a calibration algorithm based on the least squares (LS) principle. The algorithm jointly estimates the intrinsic and extrinsic parameters from which the intrinsic parameters are extracted. This estimation is performed in an iterative way, where the principle of optimization on manifolds is used. After calibration, a simplified version of the estimation algorithm is used to extract the pose, as now we use the calibrated intrinsic parameters as prior knowledge for the misalignment. 4. To evaluate the optimality of the proposed algorithms, we derive the misspecified Cramér-Rao bound (MCRB) to take into account the effect of the approximated PDF. We compare the MCRB with the Cramér-Rao bound (CRB) for the detected RSS values to quantify the performance loss due to using the normalized differences of the RSS values to make the estimator robust against imperfect knowledge of the transmitted power and radiation pattern. The designed algorithms are verified by Monte Carlo simulations.
The rest of the paper is organized as follows. The channel link model and the expressions for the observation and output vectors are presented in Section 2. The approximated PDF and the calibration system model are provided in this section as well. The calibration and the pose estimation algorithms using the LS principle are proposed in Section 3. Subsequently, the theoretical bound is derived in Section 4. The Monte Carlo numerical comparison is given in Section 5. Finally, some concluding remarks are given in Appendix A.
Notation: Scalars are denoted in italic, e.g., x. Lowercase boldface indicates a column vector, e.g., x. Uppercase boldface denotes a matrix or a set, e.g., X, with I N representing an N × N identity matrix, and 0 N×M representing an N × M zero matrix. The vector e i is a unit basis vector with its ith element being 1, and the operator· converts the Cartesian coordinates x into the homogeneous coordinatesx = [x T , 1] T . Matrix transpose and inverse are indicated by superscript T, and −1, respectively. The Euclidean norm is denoted by · , and the expectation operator is denoted by E{·}. ∇ x = ∂/∂x denotes the Del operator, while ∆ z x = ∇ z ∇ T x denotes the Hessian operator. The rectangular function Π(·) is defined as The set of all real numbers is denoted by R. The group of all rotation matrices, i.e., the special orthogonal group, is denoted by SO (3), and the associated Lie algebra is denoted by so(3). The special Euclidean group is denoted by SE(3). The operator · ∧ converts a 6 × 1 vector into a member of se(3) -the Lie algebra of SE(3) -which is, while the operator · × converts a 3 × 1 vector into so (3), which is, The operators · ∨ and · ⊗ are, respectively, the inverse operators of · ∧ and · × . Additionally, exp(·) and log(·) are the matrix exponential and matrix logarithm functions, respectively.

System Description
In this section, we describe a visible light system that is able to detect signals transmitted by different LEDs from which the pose of the receiver can be determined. To avoid interference between the reference signals for pose estimation, we assume the system adopts multicarrier multiplexing based on (DC-biased) sinusoidal waves [16,17], where a portion of the subcarriers is assigned to the LED anchors to be used as reference signals for pose estimation, while the rest of the subcarriers can be used for communication. Each anchor is assigned a different subcarrier, implying that due to the orthogonality of the subcarriers, the reference signals from the different LEDs can be separated in a simple way. We assume the receiver has knowledge of the used subcarrier and position of each LED, e.g., because this information is contained in the reference signal or is available in a database accessible by the receiver.
The rest of this section is organized as follows. We first model the channel link between a single LED and the QADA and give the expressions for the output vector and observation. We further derive the PDF of the observation. As this PDF is too complex for practical estimation purposes, we also derive a simple closed-form approximation. Finally, we design a system model for calibration and discuss the goals of the pose estimation problem.

Receiver Structure
The considered visible light system uses a QPD consisting of four matched photodiodes deposited on a single chip separated by very thin gaps. To convert the incident light of an LED into a light spot on the QPD, we place an opaque screen with a circular aperture at a specific height above and parallel to the surface of the QPD [8] as shown in Figure 1 (a square aperture has the disadvantage of being sensitive to rotational offsets, which means that the rotational offset must be calibrated and compensated for, while a circular aperture is circularly symmetric, implying it is inherently robust to rotational offsets. Therefore, in this paper, we focus on the circular aperture). The light spot on the QPD's surface, as shown in Figure 2, leads the QPD to produce a unique quad-tuple of RSS values proportional to the overlap areas, i.e., the RSS output is uniquely determined by the position of the light spot. Because of this, the position of the light spot can be estimated from the RSS values, which in turn, e.g., is useful in detecting an object's angle coordinates [8].  Light Spot The aperture is chosen so that its diameter is very large compared to the wavelength of the light. Further, as the distance between the LED and the QPD is very large compared to the wavelength, the receiver is in the far-field region of the LED, implying the incident waves can be considered as plane waves. As a result [18], the only light that reaches the QPD is the light that passes through the aperture, and the incident light will introduce a circular light spot of the same size as the aperture on the plane of the QPD. The position of the light spot in the QPD plane not only depends on the position of the light source, but also on the pose of the receiver, implying the RSS outputs in the constituent PDs of the QPD are functions of the position of the light source and the pose of the receiver. We assume that the light spot overlaps with all quadrants and completely falls within the active area of the QPD (in case the light spot does not overlap with all quadrants, one or more of the RSS values will be zero. This will obstruct the strategy of differencing and normalization of the RSS values, which is used to make the estimation algorithm robust against variations in the transmitted optical power. In case the light spot partially falls off the QPD, the relationship between the RSS values and the pose of the receiver becomes very complex, implying the resulting estimation algorithm will have high complexity). To ensure that the first condition is fulfilled, we assume the receiver considers only LEDs that lead all quadrants to produce a sufficiently high RSS output. The second condition implies that the aperture diameter is upper limited by (roughly) the dimension of a quadrant, i.e., half the diameter of a circular QPD or half the side of a square QPD. This guarantees a simple unique relationship between the quad-tuple of RSS values generated by the QPD and the pose of the receiver, and by combining the information from different light sources satisfying the above conditions, the pose of the receiver can easily be determined.
In the following, we will describe the relationship between the pose of the QADA and the position of the light spot, and then we will derive the relationship between the light spot and the RSS output.

Channel Link Model
To obtain the quad-tuple of RSS values as a function of the pose, we note that the dependency of the RSS values on the pose is twofold. Firstly, the channel gain is determined by the radiation pattern of the LED and the irradiance attenuation, implying the incident power at the QPD (and thus the optical power in the light spot) depends on the relative position of the light spot with respect to the LED. Secondly, the RSS values in the quadrants are proportional to the overlap areas of the light spot with the quadrants of the QPD, which is determined by the position of this light spot with respect to the center of the QPD. To express the RSS values as function of the pose of the receiver, we define the following three coordinate systems: (1) The first coordinate system, i.e., the system frame, corresponds to the inertia frame and is used to describe the pose of the receiver with respect to the LEDs. Furthermore, we define two coordinate systems that are attached to the receiver and will be used to describe the position of the light spot and misalignment due to construction errors; (2) In the first receiver attached coordinate system, i.e., the receiver frame, the x-y plane is the plane of the aperture, and the z-axis is the normal to that plane. The origin O r of this receiver frame coincides with the center of the aperture. We will use an extra subscript r to indicate coordinates represented in the receiver frame, i.e., the coordinates x r in the receiver frame correspond to their counterpart x in the system frame. This coordinate system is used to find the distance vector of the light spot to the LED as a function of the pose; (3) The second receiver attached coordinate system is the QPD frame, where the x-y plane is the surface of the QPD, the origin O q is placed in the center of the QPD, and the x and y axes are aligned with the gaps between the quadrants. We will use an extra subscript q to indicate coordinates represented in the QPD frame, i.e., the coordinates x q in the QPD frame correspond to their counterpart x in the system frame. Although this coordinate system is closely related to the receiver frame, its benefit lies in the simplification of the computations, as it enables us to convert the 3D receiver frame coordinates to the 2D coordinates in the QPD frame (we assume that the aperture plane is parallel to the plane of the QPD, and the xand y-axes of the QPD and receiver frames are aligned. This implies that the transformation of vectors in the aperture frame into vectors in the QPD frame is a simple translation). This approach is commonly used in the perspective projection model, which we employ to determine the position of the light spot in the QPD plane, and from which we compute the overlap areas of the light spot and the QPD. In the following, we will determine the coordinates of the light spot generated by an LED.
Assume that in the system frame, the pose of the receiver is described by its orientation R ∈ SO(3), i.e., R belongs to the Special Orthogonal group and position r ∈ R 3×1 , where the position of the receiver is defined as the position of the center of the aperture. We further assume the LED has position r L and normal vector n L in the system frame. To calculate the position of the light spot in the QPD frame, we notice that the line going through the position r L of the LED and the centers r of the aperture and r S ∈ R 3×1 of the light spot can be seen as a ray in the perspective projection model of a center of projection located at the center of the aperture and having focal length equal to the aperture height |h|, i.e., the distance between the planes of the aperture and the QPD (see Figure 3). The perspective projection model allows us to relate the 3D coordinates of the LED-the 'object'-in the system frame to the 2D coordinates of the light spot -the 'image'-in the QPD frame. Note that the position of the light spot in the QPD frame is specified by the (x, y) coordinates only. To make the paper self-contained, we include the background on the perspective projection model. Taking this into account and using the projection principle, the position r S,q ∈ R 2×1 of the center of the light spot in the QPD frame is given by [19]: The matrix T ∈ SE(3) is defined through its inverse Furthermore, the misalignment vector k m,q expresses the errors in the alignment of the aperture with respect to the QPD. Due to construction errors, a small horizontal offset between the origins of the QPD frame and the receiver frame may occur. This is illustrated in Figure 3, where the brown dot does not coincide with the center O q of the QPD. The horizontal misalignment vector, which takes into account this offset in the transformation of the light spot coordinates in the receiver frame to the coordinates in the QPD frame, is defined as k m,q . = [u m , v m ] T , with u m and v m the misalignment in the x and y direction, respectively. Taking into account (4), we see that the projection principle consists of three steps: • Step A: This step converts the LED coordinatesr L in the system frame to the coordinatesr L,r = Tr L in the receiver frame using the matrix T.

•
Step Equation (4) can be rewritten in the following compact form: The matrix K, which is used in the perspective projection model, is called the intrinsic parameter matrix and describes the internal properties of a receiver, i.e., that are independent of the pose. It only depends on the misalignment vector k m,q and the aperture height h: On the other hand, the matrix T contains the pose information of the receiver and is independent of the internal properties of the receiver. Therefore, it is called the extrinsic parameter matrix. Similarly, taking into account (4), we can write the position r S,r of the center of the light spot in the receiver frame (in homogeneous coordinates) in the following compact form:r and the positionr S of the light spot in the system frame (in homogeneous coordinates) is obtained through:r To determine the RSS values in the quadrants of the QPD, we first need to compute the overlap area between the light spot and the quadrants of the QPD. Let us assume the aperture has radius l. Taking into account we already assumed that the light spot completely overlaps with the QPD, i.e., no part of the light spot falls outside the QPD, and that all quadrants overlap with the light spot, the overlap areas can easily be computed based on the position r S,q of the center of the light spot in the frame of the QPD. Defining the overlap area A i > 0 of the ith quadrant, i = 1, . . . , 4 and the angles α j ∈ [0, π] with [r S,q ] j = l cos α j , j = 1, 2, and the operator [r S,q ] j as the jth element of r S,q (see Figure 2), we obtain the overlap areas by solving the following set of equations: • The light spot has area πl 2 , implying Taking into account that the area of a circular segment with central angle 2α, The overlap area A 1 with the first quadrant is a combination of a circle sector with central angle 3π 2 − (α 1 + α 2 ) having an area 3π 2 − (α 1 + α 2 ) l 2 2 , a rectangle with area [r S,q ] 1 [r S,q ] 2 = l 2 cos α 1 cos α 2 and two triangles with areas l 2 2 cos α j sin α j , j ∈ {1, 2}. Note that, depending whether cos Now that we obtained the overlap areas of the light spot with the PDs of the QPD and related them to the pose of the receiver, the RSS values in the different PDs can be determined. To this end, we define the distance vector between the transmitter and the light spot as v = r S − r L . With this definition, we can write the distance v between the LED and the light spot, and the incidence angle θ, i.e., the angle between the vector −v and the normal n Q = Re 3 of the QPD as: where the last equation holds due to the equalities n T Q , 0 T = T −1 e 3 , obtained from (5), and n T Q v = n T Q , 0 v. The channel gain h i for the ith quadrant of the QPD is given by [20] where Γ(n L , v) is the radiation pattern of the LED pointing n L and is evaluated at direction v, Π θ = Π(φ/φ FOV ). The factor Π θ in (13) implies that the QPD can detect the light only when the LED is within its FOV, i.e., 0 ≤ φ ≤ φ FOV . Taking into account (11)-(13), the channel gain vector is given by and is a function of the intrinsic parameter vector k . = [u m , v m , h] T and the pose T. From this channel gain, we obtain the RSS output vector y = [y 1 , . . . , y 4 ] T in the four PDs: where g = R p P t h, with R p the responsivity of the QPD, and P t the transmitted power. Due to coupling between the quadrants, the noise components are correlated Gaussian distributed random variables η ∼ N (0, σ 2 η C), where C is the correlation matrix with C i,i = 1 and C i,j = ρ if i = j, ρ is the correlation between the noise of different quadrants, which can be determined by comparing the noise values in the different quadrants during an experiment and calculate the resulting correlation, and σ 2 η is the noise variance. In the derivation of g in (15), it is assumed that the transmitted optical power is perfectly known, and the LED is a perfect Lambertian radiator. However, directly estimating the intrinsic and/or extrinsic parameters from the RSS output vector will result in inaccurate estimates. To make the estimator more robust to changes in the incident power, the normalized differences between the RSS values along the two axes can be used, as The approach (16) will make the observations insensitive to variations in the transmitted optical power and radiation pattern provided that the RSS values are noise-free or affected by a common offset so that the normalized differences of (16) are noiseless observations. However, the noise induced by ambient light in visible light systems is non-negligible, and due to the randomness of the shot and thermal noise, the noise present in the different quadrants is in general not equal. The presence of this non-identical random noise will have an impact on the distribution of t x and t y . As the estimation of the intrinsic and extrinsic parameters relies on proper knowledge of this distribution, we need to evaluate the effect of the noise on the distribution to see if the observations (16) are still insensitive to variations in the transmitted power when noise is present. Therefore, in the following section, we will analyze the distribution of the normalized differences t x and t y .

Approximation to the PDF of the Observation
To find the distribution of the normalized differences, we first rewrite the numerator and denominator of t x and t y in a more compact form by introducing m .
implying (16) can be rewritten as t = m 1 +w 1 m 3 +w 3 , m 2 +w 2 m 3 +w 3 T . Taking into account that the components of w are linear combinations of Gaussian distributed random variables, w has a multivariate normal distribution, i.e., w ∼ N (0, Σ w ) with Note that the numerator (m 3 + w 3 ) = ∑ i y i corresponds to the sum of the RSS values in the different quadrants. Hence, m 3 and σ 2 w3 . = 4(1 + 3ρ)σ 2 η are the average and variance of the total received signal strength, which implies we can define ξ . = m 2 3 /σ 2 w3 as the received signal-to-noise ratio (SNR) at the receiver. Furthermore, we define i.e., the ratio of the averages of the denominator and numerator of t. Obviously, in the absence of noise, the observation equals t = µ. We will show that: • µ is independent of the channel gain parameters and the transmitted power. • At high SNR, the distribution of t is given by t∼ N (µ, Σ) with Let us first take a closer look at the dependency of µ on the channel gain and the transmitted power. Defining h(k, T) = λA, and taking into account that the factor λ is independent of the considered quadrant, it follows that µ is independent of the channel gain parameters and the transmitted power contained in the factor λ, but only depends on the overlap area vector A. Substituting the solution of M a A = s in (19), we find after some straightforward computations that Hence, the vector µ solely depends on the angles α i that are determined by the coordinates of the center of the light spot in the QPD frame. These coordinates, and therefore also µ, only depend on the intrinsic and extrinsic parameters through (6) but not on the channel gain parameters nor the transmitted power.
To obtain the distribution of t at high SNR, we first notice that t x = m 1 +w 1 m 3 +w 3 and t y = m 2 +w 2 m 3 +w 3 are both a ratio of two independent normal variables with non-zero means, and as they share the same denominator, they are correlated. The distribution of a single ratio of normal variables has been investigated in the literature, e.g., [21][22][23]. In these works, it is shown that a closed-form expression for the PDF in the general case for such a ratio is very complex as its shape differs significantly with its parameters, i.e., in some cases it resembles the Cauchy distribution, in other cases a normal distribution or a bimodal distribution. Several closed-form approximations were discussed in the literature, e.g., the approximate normal distribution. In this paper, we need the distribution of two correlated ratios of normal variables. As this is an even more complicated situation compared to the single ratio case, it is clear that the closed-form expression for the PDF will be even more complex. To simplify the analysis, we first notice that at high SNR, w i m 3 1. Expanding t with respect to w using the Taylor series, keeping up to the linear terms in w, we obtain: This first-order approximation directly leads to the distribution t∼ N (µ, Σ) with Σ = M w Σ w M T w . Substituting M w and Σ w (18) in Σ, and using the definition of the SNR ξ, it follows that the covariance matrix Σ reduces to (20).
Following the above analysis, at high SNR we can design an estimator based on µ to achieve robustness against variations in the transmitted optical power and radiation pattern to estimate parameters from t.

Calibration and Pose Estimation
Using the approximate PDF of the normalized differences derived in the Section 2, we can now estimate the pose of the receiver, taking into account the relations (19) and (21). When the intrinsic parameters are known, the estimation of the pose, i.e., the extrinsic parameters, is straightforward. However, in practice, the intrinsic parameters are not prior known and need to be estimated because inaccurate knowledge of the misalignment parameters may result in significant biases in the pose estimates. As these intrinsic parameters can be considered fixed once the receiver is assembled, the intrinsic parameters can be determined once during a calibration process. The standard approach is to determine the intrinsic parameters assuming the pose of the receiver is known. However, this approach requires accurate knowledge of the pose of the receiver. Although it is possible to accurately determine the pose of the receiver, this requires costly equipment, in particular to determine the orientation of the receiver, and a laborious procedure to determine the pose of the receiver before each measurement. Therefore, we propose in this section a method to estimate the intrinsic parameters not requiring the knowledge of the pose, making it simpler and less costly than the standard calibration procedure. Note that our approach differs from the classical algorithm commonly used in camera-based systems, e.g., the non-iterative PnP algorithm and its iterative version [24], as in our method we directly estimate the pose from the normalized differences, while in the classical algorithm used for cameras, first the position of the light spot is determined, after which the pose is extracted. Although the algorithm used for cameras can be adapted to solve the problem at hand, we expect that this algorithm will perform worse than the algorithm proposed in this paper, as compared to the high resolution pixel measurement from the camera sensor, the RSS output from the QADA typically has a lower SNR. Furthermore, because of the non-linear relationship between the center of the light spot and the RSS values, the algorithm will be suboptimal for the considered system [15,25]. As the proposed method directly estimates the pose from the RSS values, it also differs from classical techniques used to determine the position, i.e., where first the AOA or distances to the LEDs are estimated before determining the position using , e.g., trilateration or triangulation.

Calibration Procedure
The calibration procedure proposed in this paper makes use of the setup shown in Figure 4, where N L LEDs are installed on a plane in the system frame xO s y, and the positions of these LEDs are fixed and known. The receiver observes these LEDs at N T different poses. Hence, the receiver observes for each LED N T quad-tuples of RSS values. The resulting observations are arranged in the 4N T N L × 1 vectory ∈ R 4N T N L ×1 : where the vectorη ∼ N 0,Σ η collects the noise, andΣ η = σ 2 η I N T N L ⊗ C (see Section 2.2 for the definition of the noise correlation matrix C). The vectorg can be rewritten as g = R p P t,1 g T 1,1 , P t,2 g T 1,2 , . . . , P t,N L g T N T ,N L T , where g i,j ∈ R 4×1 is the channel gain between the jth LED and the receiver at the ith pose T −1 i , and P t,j is the transmitted power of the jth LED. For each LED and each pose, we compute the normalized differences and arrange them in the 2N T N L × 1 vectort ∈ R 2N T N L ×1 . Taking into account that the noise in the different observations is statistically independent, we can write the Gaussian approximation of the PDF oft as: whereμ = µ T 1,1 , µ T 1,2 , . . . , µ T N T ,N L T . The components µ i,j correspond to the jth LED and the ith pose and are defined through (19), while the covariance matrixΣ is the block diagonal matrix given by Σ = diag(Σ 1,1 , Σ 1,2 , . . . , Σ 1,N L , Σ 2,1 , . . . , Σ N T ,N L ) (25) In this covariance matrix, Σ i,j is the covariance matrix for the noise of t i,j (see (20)). These covariance matrices Σ i,j are a function of µ i,j , which in turn depend on the parameters to be estimated. As a consequence, maximum likelihood estimation will be complex. Therefore, we will use the least squares (LS) method to estimate the intrinsic and extrinsic parameters. Let us define the parameter set Θ = {k, T 1 , . . . , T N T }, then the LS estimate is given by: The constraints R T i R i = I and det(R i ) = +1 imply that R i , which defines the orientation of the receiver at pose i and is enclosed in the transform matrix T −1 i ∈ SE(3) (see (5)), is a rotation matrix that is a member of SO(3). Unfortunately, (26) has no closed-form solution, implying we need to resort to an iterative algorithm to estimate Θ. As the solution of the above (iterative) constrained optimization problem is complex and cumbersome in Euclidean space, we first notice that SE(3) is a manifold and convert the above optimization problem into an unconstrained optimization problem on manifolds [26]. In what follows, we use the Gauss-Newton method on manifold SE(3) to solve (26), similarly as in [27]. At each iteration, the update direction is calculated by (27) where (·) † is the Moore-Penrose pseudoinverse, and ∇ Θμ = [∇ Θ µ 1,1 , ∇ Θ µ 1,2 , . . . , the gradient ofμ with respect to Θ. The derivation of the components ∇ Θ µ i,j can be performed in a similar way as in [27], and the result is given in Appendix A. As the update directions (27) depend on k and all T i , the intrinsic parameters and the poses need to be estimated jointly: where τ k and τ T control the incremental step size for k and T i , respectively.  The Gauss-Newton method needs an initialization Θ 0 to start the iterative estimation process. To obtain this initial estimate, we consider the direct linear transformation method [24]. Assuming the positions of at least four LEDs, and the positions of the light spots for these LEDs are known, the direct linear transformation method gives a closedform coarse estimate for Θ 0 . Although in our problem, the positions of the light spots are not known, we can estimate them from the normalized differences t i,j , which is a noisy version of µ i,j (22). This µ i,j is a function of the angles α , = 1, 2 (21), which in turn are related to the position r S,q,i,j of the light spot, i.e., we can write µ i,j as a function of r S,q,i,j : µ i,j (r S,q,i,j ). In many situations, the noise is relatively small, implying we can neglect the presence of the noise, i.e., t i,j ≈ µ i,j . Inverting the function µ i,j (r S,q,i,j ), we obtain the coarse estimater S,q,i,j ∈ R 2×1 of the light spot for the jth LED and the ith pose: where (·) (−1) is the inversion operator. Unfortunately, due to the non-linearity of µ as a function of its argument r S,q , no closed-form expression can be found for this inverse. However, as µ monotonically increases with r S,q , we can precompute the function µ for a set of values r S,q and save them in a look-up table. Based on this table, the inverse can be determined by interpolating between the values available in the look-up table.

Theoretical Lower Bound
In the Section 3.1, we considered the LS estimation of the intrinsic parameters based on the approximated PDF oft in the presence of the (unwanted) extrinsic parameters T i . To gain insight into the optimality of the designed calibration algorithm and to investigate how the SNR, N L or N T would effect the accuracy of the estimator, in this section we derive the theoretical lower bound for the mean squared errors of the intrinsic parameters h and k m,q . Since the estimation includes the unwanted parameter T i and since the estimator is designed based on an approximated PDF, we will derive the misspecified Cramér-Rao bound (MCRB) [28] for the whole parameter set Θ and only keep the left upper 3 × 3 submatrix corresponding to the intrinsic parameters.
The MCRB for the whole parameter set Θ is given by In (30),Θ is the parameter set to which the estimation converges due to the PDF approximation assumed by the estimator, given bẙ where D p(t)||p m (t|Θ) is the Kullback-Leibler divergence (KLD) between the true (p(t)) and the approximated (p m (t|Θ)) PDFs. The matrices M 1 , M 2 , and Bias(Θ, Θ) are, respectively, given by [28] where (Θ, Θ) is the error vector betweenΘ and Θ. The error vector is defined as After a few rearrangements, it can be seen that the expectations in (32) and (33) depend on p(t) through its mean and covariance only. Due to the complexity of the true PDF p(t), this mean and average must be calculated numerically via Monte Carlo integration.
To analyze the effect of N T , N L and SNR, we take in account that since the approximated PDF p m (t|Θ) is Gaussian, the minimum of the KLD (31) is obtained when the first two moments of the true and the approximated PDFs are matched [29]. When the SNR is high, the first two moments of the approximated PDF approach those of the true PDF. As a result, the estimator is asymptotically unbiased, i.e., Bias(Θ, Θ) ≈ 0 for SNR 1, so that It can be seen that when N T or N L increases, the number of terms in (35) increases. As the matrices being summed in (35) are symmetric and positive-semidefinite, this implies that M 2 is non-decreasing in the sense of the Loewner order. Furthermore, as Σ −1 i,j is proportional to the SNR, increasing the SNR also enlarges the Loewner order of these matrices being summed and thus leads the partial order of M 2 to enhance. As a result, the MCRB is a non-increasing function of N T , N L , or SNR. While an increase of N T , N L , or SNR improves the robustness of the estimator against noise, these parameters do not have an impact on the feasibility of the estimation. The number of observations required to make proper estimation feasible follows from the analysis from [24] dealing with intrinsic parameter estimation for cameras. Translating the results of that paper to the problem at hand, it follows that observations at two different poses provide sufficient constraints to avoid depth ambiguity, and at each pose, at least four LEDs should be observed to have enough information for the calibration.
In this paper, we estimated the intrinsic parameters from the normalized differences instead of the quad-tuple of RSS values, where the PDF of the normalized differences was approximated by a Gaussian distribution. To evaluate the effect of both the approximation and normalized differencing, we compare the misspecified Cramér-Rao bound MCRB(Θ) (30) with the Cramér-Rao bound for the quad-tuple of RSS valuesy observed by the QPD, i.e., CRB(Θ), corresponding to the true PDF p(y). Asy is Gaussian distributed, the derivation of the Fisher information matrix J(Θ) and thus CRB(Θ) = J −1 (Θ) [30] is straightforward and yields where ∇ Θg = ∇ Θ g 1,1 , ∇ Θ g 1,2 , . . . , ∇ Θ g N T ,N L T ∈ R (4N T N L )×(6N T +3) denotes the gradient ofg with respect to Θ.

Pose Estimation
Once the intrinsic parameters are estimated during the calibration process, the estimated parameters can be used for accurate pose estimation of the receiver. To estimate the pose, a similar procedure can be used for the calibration by means of the Gauss-Newton method, except that now it is assumed that the intrinsic parameters k are known. At pose i, the observation t i only depends on the pose T i and is independent of other poses. Therefore, we can estimate the poses independently, in contrast to the calibration procedure that needed to consider the observations of multiple poses due to the presence of the unknown intrinsic parameters. Again, the estimation is obtained with an iterative procedure of which the update step is the same as in (28): Similar to the calibration procedure, we can obtain a coarse estimate by means of the direct linear transformation method together with (29) to initialize the Gauss-Newton method for pose estimation.
To evaluate the pose estimates, the position error is expressed by the Euclidean distance between the position vector r and its estimater, i.e., r e =r − r, while the orientation error is expressed by the axis-angle vector between the orientation matrix R and its estimatê R, i.e., u e = (log(RR T )) ⊗ [27]. Similar to the pose estimator derived from the calibration algorithm, we can derive the theoretical lower bounds for the pose estimate, i.e., MCRB(T i ) and CRB(T i ), from (35) and (36), respectively, by restricting the parameters Θ to T i . Then the left upper and right bottom 3 × 3 submatrices of MCRB(T i ) are bounds on position and orientation, respectively. The same result applies to CRB(T i ).

Numerical Assessment
In this section, we first verify the performance of the proposed calibration algorithm and compare the mean squared error with the theoretical bound based on simulations. Then, the calibration procedure is used to determine the intrinsic parameters in a simulation setup, after which the pose is estimated. To obtain a comparative study, this section gives a comparison between our proposed pose estimation method and the camera's PnP and iterative methods.

Calibration
As the number of parameters to be estimated in the calibration phase is large, to have good performance, the number and quality of observations must be sufficiently high. To ensure that these concerns are met, we consider a dedicated calibration setup, where we place the LEDs in a plane at vertical distance sufficiently close to the receiver, i.e., z = 0.65 m, to obtain a sufficiently high SNR. Furthermore, to guarantee the receiver has sufficient LEDs within its FOV, we assume N L = N 2 L,r LEDs are placed in a square grid with an area of 400 cm 2 , where the LED in the ith row and jth column has position . . , N L,r }. All LEDs are assumed to have a transmitted power P t = 1 W, and a Lambertian pattern with order γ = 1. We set the origin of the system frame as the center of the LED plane. The receiver consists of a circular QPD with radius r p = 5 mm, resulting in an active area A p = πr 2 p , and responsivity R p = 0.4 A/W. Above the QPD, we place at a height |h| = 3.0 mm a circular aperture with radius l = 2.5 mm. In our simulations, we set the misalignment between the centers of the aperture and the QPD equal to k m,q = [0.5, −0.3] mm. The noise in the different quadrants of the QPD is assumed to have a correlation coefficient ρ = 0.7. The orientation R of the receiver is defined with the ZXZ Euler angles [θ α , θ β , θ γ ] [31]. We assume the azimuth and roll angles, i.e., θ α and θ γ , are uniformly distributed over [0, 2π), and we set the elevation angle θ β as a variable. In this way, the receiver has a random orientation but a controlled tilt angle of θ β . After fixing the orientation, we still need to specify the position of the receiver. To ensure that most LEDs are observed, we select the position of the receiver so that it is pointing toward the center of the LEDs plane. This corresponds to the position r = zRe 3 e T 3 Re 3 .
In the simulations, we use (15) to generate the observations. We assume the shot noise has power spectral density N 0 = 2.10 × 10 −22 A 2 /Hz, which corresponds to a background spectral irradiance p n = 5.8 × 10 −6 W/(cm 2 · nm) and a visible light bandwidth ∆λ = 360 nm [32]. Supposing the electrical bandwidth equals B = 1 MHz [8], we can compute the noise variance σ 2 w3 with σ 2 w3 = N 0 B. Simulation results will be given in terms of the effective SNR = (γ+1)A a R t P t 2πσ w3 2 with A a = πl 2 .(As the received SNR depends on the receiver pose, a fixed received SNR will limit the parameter space of the receiver pose. As this will complicate the simulation, we chose to fix the effective SNR) Given the optical transmitted power P t = 1 W, we obtain the effective SNR = 44.73 dB. In our simulations, we will use the range SNR ∈ [25,65] dB to take into account variations of the system parameters.
First, we evaluate the performance of the designed calibration estimator as a function of SNR. As a baseline method, we consider the calibration algorithm for cameras [33]. Because the algorithm of [33] takes the image of an object as input, which in this case corresponds to the observation of the light spot's position, we use (29) prior to this algorithm to convert the observation t to the estimated center of the light spot. The number of LEDs in the LED plane is set to N L = 25, and the receiver observes the plane from N T = 4 randomly generated poses with θ β = π/9 rad. We vary the SNR from 25 to 65 dB and plot in Figure 5a the resulting root mean square error (RMSE) (denoted by CAL) versusξ, i.e., the received SNR averaged over the four poses. The root of the MCRB and CRB (denoted by rMCRB and rCRB) from Section 3.2 and the performance of the baseline method (denoted by BAS) are also plotted for comparison. As we can see from Figure 5a, at highξ, the two calibration methods perform close to optimal, while at lowξ, the proposed calibration algorithm outperforms the baseline method. This can be explained as the bias in (29) for the baseline method is larger than for the proposed calibration method whenξ is low. The gap between MCRB and CRB reflects the net performance cost of the followed strategy, where we use the normalized differences instead of the RSS values and the approximated PDF. The gap shows that the robustness to imperfect knowledge of the transmitted power and radiation pattern of the LEDs, obtained by this strategy, is achieved at the cost of accuracy. The performance loss for the estimation of h is larger than for the estimation of k m . Next, we evaluate the effect of the number N L of LEDs and the number N T of poses. We set SNR = 45 dB and θ β = π/9 rad. Both variables influence the number of observations, so we expect that increasing either N L or N T will improve the performance of the calibration. In Figure 5b, we show the performance for N L,r = √ N L = {2, 3, . . . , 9} with N T = 3, and in Figure 5c for N T = {2, 4, . . . , 16} with N L = 25. The figures show that the lower bound is reached for both the proposed calibration method and the baseline method when N L or N T is sufficiently large. In that case, the performance improves logarithmically with N L and N T . When N L or N T are small, i.e., when the number of observations is small, the proposed method outperforms the baseline method.
Finally, we evaluate the effect of the orientation θ β of the receiver with respect to the LED plane. We set SNR= 45 dB, N L = 25, and N T = 4. The calibration performance for θ β varying from 0 rad to 5π/18 rad is shown in Figure 5d. When θ β is small, the performance first improves by increasing θ β . This can be explained as for small θ β , the randomly generated poses are very close to each other implying the solution set is tightly coupled and thus more susceptible to noise. For large θ β , the performance starts to degrade, as in our assumptions we assumed that the receiver is pointing to the center of the LEDs plane, implying large θ β corresponds to placing the receiver far from the center of the LEDs plane. As a consequence, the RSS values will experience lower SNR values, causing the slight degradation of the performance. The optimal performance will therefore occur at intermediate values of θ β .

Pose Estimation
In the pose estimation phase, we assume the intrinsic parameters are known, i.e., in practice, we use the intrinsic values obtained in the calibration phase. As in this case, we only need to estimate the six parameters of the pose; the number of LEDs can be lower than for the calibration phase. Furthermore, we assume the VLP system is combined with the illumination system, so the number and placement of the LEDs must satisfy the illumination standards. To this end, we consider in the evaluation of the pose estimator a 5 m × 5 m × 3 m area, where N L = 25 LEDs with optical power P t = 5 W are mounted at the ceiling of the area, as shown in Figure 6, unless specified otherwise (assuming the optical power of the LEDs equals 5 W, an average horizontal illuminance of 500 lm (corresponding to an office area) in a plane at z = 2 m below the LEDs requires roughly 25 LEDs). We define the boundary vector b = [5,5,3] T , the number of LED columns in the X direction N L,X = 5 and Y direction N L,Y = 5, and the positions of the LEDs are given by We assume the algorithm only uses the LEDs that result in all quadrants to produce a sufficiently large RSS output to ensure that the information in the RSS values is reliable. To take into account the effect of calibration and the possible uncertainty in the estimated parameters, we assume that the receiver has been calibrated and that the estimated intrinsic parameter vectork ∼ N (k, Σk), with Σk = diag([σ 2 uv , σ 2 uv , σ 2 h ]), and σ uv and σ h are set to the RMSE of the calibration given in the Section 4.1. To evaluate the performance of the proposed estimators, we consider a path with a circular pattern in the XY plane and a sinusoidal pattern in the Z direction as shown in Figure 6. The radius of the circle in the XY plane is 1.5 m, and the amplitude of the sinus is 0.2 m. The circle is centered at [2.5, 2.5, 1.2] T m. Starting at the coordinates [4.0, 2.5, 1.2] T m, the path oscillates sinusoidally in the Z direction and completes the path in three periods.
In Figure 7, we evaluate the proposed estimator with respect to the transmitted power. In this simulation, we set Σk equal to the corresponding RMSE values from the calibration when SNR = 55 dB, i.e., σ uv = 1.17 × 10 −5 m and σ h = 2.60 × 10 −5 m. The RMSE of the pose estimate, averaged over the path, is first evaluated in terms of the transmitted power P t = (1 + 2m) × 10 n W, where the variables m ∈ {0, . . . , 4} and n ∈ {0, 1} specify the selected power values in the figure. Figure 7 shows this averaged RMSE of the designed estimator based on the least squares method (LSM) (denoted by LSM) and the root of the MCRB (denoted by rMCRB) as a function of the mean received SNRξ. In comparison, we also add the performance of the proposed estimator when perfect knowledge of the intrinsic parameters is given (denoted by LSM-K) and when the intrinsic parameters are not calibrated and assumed to be the design values |h| = 2.8 mm and k m,q = 0 mm (denoted by LSM-X), while the true values are randomly selected from a Gaussian distribution with as averages the design values and variances given above. The averaged RMSE of the baseline method [33] (denoted by BAS) and the root of the CRB (denoted by rCRB) are also plotted for comparison. As a second baseline method, we also consider the classical PnP method (denoted by PnP). As expected, the performance of the estimator with perfect knowledge of the intrinsic parameters reaches the theoretical lower bound MRCB and improves with the SNR. On the other hand, the performance of the estimators without perfect knowledge of the intrinsic parameters, i.e., calibrated or not, shows an error floor when the received SNR increases. This error floor is caused by the uncertainty in the estimated intrinsic parameters, which becomes dominant when the received SNR is sufficiently large. Obviously, the error floor is much higher in the absence of calibration. Our algorithm strongly outperforms classical PnP, as expected, while the its performance is slightly better than the baseline method BAS. The gap between MCRB and CRB implies that the robustness to imperfect knowledge of the transmitted power and the radiation pattern of the LEDs, obtained by the strategy of normalizing differences, is achieved at the cost of accuracy. To illustrate the robustness of the proposed algorithm to fluctuations in the transmitted power, we consider the situation where the transmitted power of an LED follows a Gaussian distribution [34], i.e., P t = κP t,m , with κ ∼ N (1, σ 2 κ ). Experiments in [34] show that the standard deviation σ κ in general is smaller than σ κ < 0.0667. In Figure 8, we plot the RMSE averaged over the entire path as a function of the standard deviation σ κ ∈ [0, 0.0667], assuming P t,m = 5 W, σ uv = 1.17 × 10 −5 m, and σ h = 2.60 × 10 −5 m. We observe that for the given range of σ κ , the RMSE is (essentially) independent of σ κ , implying the approach of the normalized differences is robust to fluctuations in the transmitted power. Next, we show the RMSE of the pose estimates of the proposed estimator as a function of the uncertainty ink for P t = 5 W and compare the resulting RMSE with that of the baseline method and with the theoretical lower bounds. Figure 9 shows that, as expected, the performance of the estimator with the perfect knowledge of the intrinsic parameters reaches the theoretical lower bound and is independent of the uncertainty. The figure also illustrates that a large uncertainty degrades the performance of the estimators significantly. When the uncertainty becomes smaller, the performance of the estimators of LSM and BAS gradually reach and are bounded by the theoretical lower bound. This is expected, as due to the uncertainty, the estimators are in the best case as good as the ones with perfect knowledge of the intrinsic parameters. (Close to) optimal performance is achieved as long as the uncertainty is sufficiently small. Finally, we evaluate the performance of the proposed receiver over the whole considered area. For the LEDs, we consider the setup of Figure 6, where the N L = 25 LEDs are attached in a square grid to the ceiling of the 5 m × 5 m area. We assume the receiver is placed at the positions belonging to a horizontal grid with a horizontal spacing of 0.2 m, at a vertical distance z d ∈ {1, 1.5, 2, 2.5, 3} m from the LEDs. Taking into account that the receiver in practice is pointing roughly upwards, we consider for each sample point a random orientation with an elevation angle θ β that is uniformly distributed within the interval 0, θ β,m rad. We consider two scenarios, i.e., θ β,m = 0 rad for the case where the receiver is always pointing upwards, and θ β,m = π/3 rad for the tilted scenario. Figure 10 shows the cumulative distribution (CDF) of the pose error when z d = 2 m. We observe that the proposed algorithm outperforms the baseline methods BAS and PnP. The figure further reveals that our algorithm converges approximately for 95% of the sample positions when θ β,m = 0 rad, i.e., when the receiver points upwards, while this reduces to 75% when a random tilted angle is considered, i.e., when θ β,m = π/3 rad. Hence, as expected, the coverage degrades if the receiver has a random tilt. To evaluate the effect of the vertical distance z d on the performance, we compute the probability that the error is smaller than 1 m for the position and 0.2 rad for the orientation. Figure 11 shows that the performance of our estimator improves when the distance increases, as well as for the baseline method BAS. Despite the larger distance, and thus lower SNR, increasing the distance will bring more LEDs into the field-of-view of the receiver, implying more information is available to estimate the pose. On the other hand, for the PnP method, the performance degrades when the distance increases. This is explained as the PnP method is more sensitive to a reduction of the SNR, resulting in a performance degradation despite the larger number of LEDs available within the field-of-view.

Conclusions
In this paper, we investigate the pose estimation and calibration problem for the QADA receiver. To this end, the channel link is first modeled in terms of the receiver's pose and misalignment using the perspective projection model. Then, we study the effect of the strategy of differencing and normalization to the probability density function (PDF), and it is shown that the applied strategy results in a complex PDF. Therefore, we derive an approximated PDF in a simple closed-form, based on which the calibration and the pose estimation algorithms using the least squares principle are proposed. The proposed calibration does not require any information about the pose of the receiver and is robust to the received power variation and imperfect knowledge of the LEDs' radiation patterns, while the proposed pose estimator is able to directly estimate the pose from the (normalized) RSS values. Computer simulations confirm that the proposed calibration algorithm is effective and that the proposed pose estimator outperforms the PnP algorithm.