Mind the Exit Pupil Gap: Revisiting the Intrinsics of a Standard Plenoptic Camera

Among the common applications of plenoptic cameras are depth reconstruction and post-shot refocusing. These require a calibration relating the camera-side light field to that of the scene. Numerous methods with this goal have been developed based on thin lens models for the plenoptic camera’s main lens and microlenses. Our work addresses the often-overlooked role of the main lens exit pupil in these models, specifically in the decoding process of standard plenoptic camera (SPC) images. We formally deduce the connection between the refocusing distance and the resampling parameter for the decoded light field and provide an analysis of the errors that arise when the exit pupil is not considered. In addition, previous work is revisited with respect to the exit pupil’s role, and all theoretical results are validated through a ray tracing-based simulation. With the public release of the evaluated SPC designs alongside our simulation and experimental data, we aim to contribute to a more accurate and nuanced understanding of plenoptic camera optics.


Introduction
Plenoptic cameras as initially described by Lippmann [1] and Ives [2] combine a traditional camera with an additional microlens array (MLA) located between the main lens and the sensor. Over the years, two primary designs have been extensively studied and brought to market: the standard plenoptic camera (SPC) [3][4] and the focused plenoptic camera (FPC) [5][6], which mainly differ in the microlens focus distance. Due to its earlier commercialization, larger angular resolution and simpler decoding process, the SPC remains popular, even though it has certain disadvantages in terms of spatial resolution and depth of field when compared to the multi-focus variant of the FPC [6]. Classical applications for SPCs include depth reconstruction [3] and post-capture refocusing from single shots [4]. As a first step to achieve these, the raw 2D image of a plenoptic camera is usually de-multiplexed and resampled into a 4D light field [7] as shown in figure 1. For this reparametrization procedure, knowledge of the exact position of each microlens image center (MIC) is crucial, as any inaccuracies in their locations can result in computational errors affecting the quality of the refocused images [8]. Furthermore, a formal connection between the MICs and the plenoptic camera's optical setup is required in order to relate the light field within the camera to the optical reality outside the camera, e.g. for finding the correct refocusing parameters for the desired object distance [7][8].
Over the past two decades, a number of studies have delved into the topic of processing plenoptic camera images, but often considered the MICs to be determined by the main lens center or its principal planes, a consequence of reducing the main lens to a simple thin lens. However, this assumption oversimplifies the actual optics involved. A more accurate representation acknowledges the role of the exit pupil in determining these image centers, as observed in studies by Hahne et al. [8][10]. Despite these advancements, the exit pupil is still often ignored in studies relating the light field within the camera to the 3D scene in front of the camera.

Figure 1. Exemplary pipeline for SPC post-shot refocusing (panels: capturing setup, raw image, decoded 4D light field, refocused images). A scene is captured by a virtual SPC shown without housing. The resulting raw image consists of a large number of microlens images and is subsequently decoded into a 4D light field representation, which is visualized by a subset of the sub-aperture images [7]. By resampling the light field, a refocused image can be created [4]. The correctly focused images were created based on parameters considering the exit pupil as described in section 2, while the slightly defocused images result from the directly calculated parameters without exit pupil consideration based on [9].
In this context, our work aims to highlight the importance of the exit pupil again. To this end, a paraxial model of the SPC under consideration of the exit pupil is described first, directly relating the refocusing shift [4] to the object distance. The expected errors of the models ignoring the exit pupil are formally analyzed and later verified through a ray tracing-based simulation of various plenoptic cameras in Blender [11] using real lens data. Subsequently, multiple works in the domain of plenoptic camera calibration are revisited and examined with respect to the need for a more complex lens model. More specifically, the popular work of Dansereau et al. [7] is revisited first, as well as the works of Zhang et al. [12] and Monteiro et al. [13], which build upon Dansereau's ideas. In these cases, one can conclude that the parameters of the respective calibration models are sufficiently general to permit the simplicity of a main lens model without considering the exit pupil. However, this only holds because these works do not require a specific interpretation of the model parameters. For the work of Pertuz et al. [9], on the other hand, which also employs the decoding from [7] for metric distance measurement, it is shown that the oversimplified main lens model leads to an incorrect interpretation of their metric refocusing model parameters. In summary, our contributions are:

• A formal deduction of the connection between object distance and sub-aperture image shift considering the exit pupil.

• A model for the errors resulting from ignoring the exit pupil in this relation.

• Publicly available SPC designs and a camera simulation framework based on Blender [11] supporting a large database of lens designs and enabling a quick generation of new plenoptic camera setups.
Of these works, especially Hahne et al. [10] consider the exit pupil and its connection to the microlens image geometry in a similar fashion to this work. However, their work neither places this into the direct context of pre-existing calibration methods by a comparative evaluation, nor does it provide an analysis of the expected errors resulting from oversimplified lens models. Nevertheless, to further validate our model, which establishes the refocusing distance in terms of the two-plane parametrization, its equivalence to the chief ray intersection model in [10] is formally proven in section 4.1.
FPC calibration: Although this work focuses on SPCs as explained above, there is also relevant work in the related field of FPC calibration. Johannsen et al. [24] describe a metric reprojection model for FPCs incorporating a radial distortion model. This is enhanced by Heinze et al. [25] to also include the tilt and shift of the main lens as well as multi-focus MLAs. Further improvements to the distortion model are presented by Zeller et al. [26]. All these approaches are based on the reconstruction of the virtual scene between MLA and main lens and associate these virtual 3D points with the known scene points. In contrast, Noury et al. [27] propose an approach working directly on the microlens images, i.e. associating the scene points with their projections on the sensor without the intermediate step of calculating virtual depths. This method, however, is limited to single-focus FPCs and models the microlenses as simple pinholes. Nousias et al. [28], on the other hand, feature a more complete microlens model and directly include the estimation of multiple microlens focal lengths in their approach. Wang et al. [29] present a two-step model including a forward projection from the scene into the camera and a second projection from the virtual image to the sensor. More recently, Labussiere et al. [30] proposed a simultaneous calibration of the different microlens types in a multi-focus plenoptic camera by incorporating defocus blur into the features used for the parameter optimization. None of the listed methods for FPC calibration directly considers the exit pupil. Most of these works that require the identification of MICs incorporate a scaling between the grid of microlens centers and the grid of MICs, but this scaling is usually a result of projecting the main lens center, i.e. the center of the camera-side principal plane, through the microlens centers. However, as observed by Hahne et al. [8][10] for SPCs and confirmed in section 5, the MICs result from a projection of the exit pupil's center instead. Thus, using the distance between the simplified main lens plane and the MLA for the image formation model as well as for the calculation of MICs could inadvertently reduce the degrees of freedom of the model. While this might be desired in terms of an increased stability during the parameter optimization, this reduction should also be analyzed for FPC models. However, for reasons of clarity and comprehensibility, we decided against including the topic of FPC calibration in this work and leave it for future work.

Lens models and simulation: In the domain of ray tracing-based camera simulation, realistic main lens models which consider all lens components and their respective properties have been used for over two decades, either explicitly by direct modeling as in Kolb et al. [31] and Wu et al. [32] or implicitly via learned black-box lens models as proposed in Zheng et al. [33]. Regarding plenoptic cameras, most previous work uses oversimplified models for rendering, such as pinhole cameras or multi-camera arrays modeling the MLA, but without a model for the main lens [34][35][36]. More recently, Nürnberg et al. [37] as well as our group [38] provided simulations of plenoptic cameras without oversimplifying the main lens. Due to familiarity, we extended our previous work for the synthetic experiments.

Organization
In section 2, the general lens model and the two-plane parametrization are explained first, before the refocusing model is deduced under consideration of the exit pupil. Subsequently, section 3 provides a formal analysis of the expected errors when dismissing the exit pupil. In section 4, previous works are revisited with a focus on the need for more complex lens models. Finally, our deductions are validated with synthetic experiments in section 5.

Preliminaries - Lens Models
The thin lens model describes a lens by assuming it to be infinitely thin and only refracting light at a single lens plane. The relation between the real scene and the lens image in this model is described by the equation
$$\frac{1}{f_M} = \frac{1}{o} + \frac{1}{i}, \tag{1}$$
where f_M is the focal length of the lens, o is the object distance, and i is the image distance, both measured from the refraction plane. This concept can be extended to a thick lens model by expanding the refraction plane into two principal planes, H_scene and H_cam, between which a traced light ray is considered to run parallel to the optical axis [39][40]. Furthermore, a combination of thick lenses, such as the main lens of a plenoptic camera, can again be represented as a single thick lens [40]. As visualized in figure 2, the object and image distances, o and i, are then measured based on the positions of the principal planes and equation (1) remains valid.
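As a minimal sketch of how the thin lens relation is used throughout the later sections, the following Python helpers convert between object and image distance. The function names are illustrative choices and not part of the original work; distances are assumed to be measured from the respective principal planes, in consistent units.

```python
import math


def image_distance(o: float, f_M: float) -> float:
    """Image distance i for object distance o, from 1/f_M = 1/o + 1/i."""
    if math.isinf(o):
        return f_M  # an object at infinity is imaged in the focal plane
    if o == f_M:
        return math.inf  # an object in the focal plane is imaged at infinity
    return o * f_M / (o - f_M)


def object_distance(i: float, f_M: float) -> float:
    """Object distance o for image distance i (the same relation, inverted)."""
    if math.isinf(i):
        return f_M
    if i == f_M:
        return math.inf
    return i * f_M / (i - f_M)


# Example: a 100 mm lens focused at 2 m needs the sensor about 105.26 mm behind H_cam.
d = image_distance(2000.0, 100.0)
```

The same two helpers also give the MLA placement d for a chosen focus distance o_f, which is how d is obtained in the sketches accompanying the later sections.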
Figure 2. Plenoptic camera modeled by a thick main lens combined with a thin lens MLA. The microlens pitch is described by d_ML and the distance between neighboring microlens image centers (MICs) is denoted as d_MLI. Furthermore, X describes the distance between the exit pupil and the camera-side principal plane and d is the distance between H_cam and the MLA. A complete notation overview is given in appendix A.
In addition to this model, one can consider the exit pupil, i.e. the image of the aperture stop viewed towards the image plane. It defines the size and location of the virtual aperture in the optical system [40] and, as pointed out by Hahne et al. [8][10], determines the positions of the microlens image centers (MICs) on the sensor. As empirically shown in section 3, the exit pupil and H_cam rarely coincide. Accordingly, a systematic error could be introduced when a plenoptic camera image is de-multiplexed based on MICs that were incorrectly estimated under the premise that the main lens follows the thin lens model without considering the exit pupil.

Preliminaries - Light Field Parametrization
Despite being a standard tool when working with light field data, the two-plane parametrization as described by Levoy and Hanrahan [41] and used in various popular works, including the work of Dansereau et al. [7] and Ng et al. [4], is reiterated in this section for two reasons. First, the previous descriptions do not consider the exit pupil, and second, the literature is not consistent in terms of the underlying data representation. While [7] bases the description on the raw camera image, indexed by integer pixel coordinates, [4] assumes known metric coordinates for every pixel. We follow the approach of [7] to facilitate the reproduction of our results.
Given an SPC following the thick lens model with an exit pupil as visualized in figure 2, the light field inside this camera can be parameterized using two planes: the MLA, which serves as virtual sensor plane, and the exit pupil plane, which can be interpreted as virtual lens plane. By following the decoding process of Dansereau et al. [7], the 4D light field can be parametrized as L_F(i, j, k, l) with integer indices (k, l) for the uniformly sampled sub-aperture image and (i, j) for the pixel coordinates in that image.
The corresponding metric parametrization LF(s, t, u, v) describes the intensity of light captured at the MLA plane point (s, t, d) coming from the exit pupil plane point (u, v, X). In accordance with Ng et al. [4][42], the distance between these two parametrization planes is denoted as F := d − X. To calculate the metric parametrization from a given integer parametrization, note that the pixel pitch ∆_ST of the virtual sensor, i.e. the step size in the ST-plane, corresponds to the microlens pitch, i.e. ∆_ST = d_ML. As shown in figure 3, the step size in the virtual lens plane, i.e. the UV-plane, can be calculated by means of the triangle equality as ∆_UV = s_px · F / f_m, where s_px and f_m denote the pixel size and the microlens focal length. With these step sizes, the light field parametrized in metric coordinates (s, t, u, v) is given by
$$LF(s, t, u, v) = L_F\left(\frac{s}{\Delta_{ST}}, \frac{t}{\Delta_{ST}}, \frac{u}{\Delta_{UV}}, \frac{v}{\Delta_{UV}}\right). \tag{2}$$
Note that the metric coordinates (s, t, u, v) might not be integer multiples of their respective step sizes and accordingly, querying the corresponding values from the integer parametrization L_F could require additional interpolation steps. With the described light field parametrization, one can now reproduce the resampling steps necessary to refocus the image by moving the virtual sensor plane under consideration of the exit pupil position.
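The step sizes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the integer indices are assumed to be centered on the optical axis, and the example camera values are made up rather than taken from the evaluated setups.

```python
def step_sizes(d_ML: float, s_px: float, f_m: float, d: float, X: float):
    """Return (delta_ST, delta_UV) for the two-plane parametrization.

    delta_ST equals the microlens pitch; delta_UV follows from projecting one
    sensor pixel through a microlens onto the exit pupil plane at distance F.
    """
    F = d - X                      # distance between exit pupil and MLA
    delta_ST = d_ML
    delta_UV = s_px * F / f_m
    return delta_ST, delta_UV


def metric_to_index(s, t, u, v, delta_ST, delta_UV):
    """Map metric coordinates to (possibly fractional) integer-grid indices.

    Assumes indices centered on the optical axis; fractional results would
    require interpolation in the decoded light field.
    """
    return s / delta_ST, t / delta_ST, u / delta_UV, v / delta_UV


# Illustrative values in millimeters (assumed, not from the paper's setups).
delta_ST, delta_UV = step_sizes(d_ML=0.02, s_px=0.005, f_m=0.04, d=105.0, X=25.0)
```

Note that setting X = 0 in `step_sizes` reproduces the step size of a model without exit pupil, which is the starting point of the error analysis.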

Light Field Refocusing with Exit Pupil
In order to refocus the virtual sensor image onto an object at distance o, measured from H_scene, this virtual sensor now needs to be placed at the distance i, measured from H_cam, according to the thin lens equation (1). This corresponds to a distance F′ := i − X between the UV-plane (exit pupil) and the virtual sensor as visualized in figure 4. By defining the refocusing parameter α = F′/F as in [4], the thin lens equation can be applied to deduce
$$\alpha = \frac{F'}{F} = \frac{i - X}{d - X} = \frac{\frac{o\, f_M}{o - f_M} - X}{d - X}. \tag{3}$$
Given an integer 4D light field L_F(i, j, k, l) based on sub-aperture images as in [7], the relationship between the virtual sensor movement specified by α and the resulting disparity at the original ST-plane is described in the following. While the general deduction is similar to [4], the following calculations are based on integer indexing for the purpose of reproducibility.
Figure 4. Light field refocusing via shift of the virtual sensor, i.e. the ST-plane is moved to the image distance i. A ray (s′, u) can be associated with a ray (s, u) by means of the triangle equality, i.e. $s = u + (s' - u)/\alpha$.
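As shown in figure 4, the metric light field value LF′(s′, t′, u, v) for the modified sensor plane placed at the distance F′ from the exit pupil can be calculated as
$$LF'(s', t', u, v) = LF\left(u + \frac{s' - u}{\alpha},\; v + \frac{t' - v}{\alpha},\; u,\; v\right).$$
Ignoring the image magnification introduced by the virtual sensor movement, i.e. setting the step size for the S′T′-plane to α · ∆_ST, and defining ∆ := ∆_UV / ∆_ST, one can deduce that the integer parametrization L_F′(i, j, k, l) for the modified sensor plane corresponds to
$$L_{F'}(i, j, k, l) = L_F\left(i + k\,\Delta\left(1 - \tfrac{1}{\alpha}\right),\; j + l\,\Delta\left(1 - \tfrac{1}{\alpha}\right),\; k,\; l\right).$$
At this point, the pixel shift S between neighboring sub-aperture images required to refocus onto the desired distance o can be calculated given the value α as
$$S = \Delta\left(1 - \frac{1}{\alpha}\right) \tag{5}$$
and plugging equation (3) into equation (5) yields the direct relation between the object distance o and the disparity S as
$$S(o) = \Delta\,\frac{i - d}{i - X} = \Delta\,\frac{o\, f_M - d\,(o - f_M)}{o\, f_M - X\,(o - f_M)}. \tag{6}$$
This model can easily be inverted to calculate the object or refocusing distance based on a given sub-aperture image shift via
$$o(S) = \frac{f_M\,(d\,\Delta - X\,S)}{(d - f_M)\,\Delta + (f_M - X)\,S}. \tag{7}$$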
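The relation between object distance and sub-aperture image shift can be sketched in Python as follows. The formulas used here are derived from the definitions in this section (F = d − X, α = F′/F, ∆ = ∆_UV/∆_ST) and should be read as a hedged reconstruction rather than the authors' reference implementation; the function names and the example values are assumptions.

```python
def shift_from_distance(o, f_M, d, X, delta):
    """Sub-aperture image shift S needed to refocus onto object distance o."""
    i = o * f_M / (o - f_M)          # thin lens equation: image distance
    alpha = (i - X) / (d - X)        # refocusing parameter F'/F
    return delta * (1.0 - 1.0 / alpha)


def distance_from_shift(S, f_M, d, X, delta):
    """Object distance o that is in focus for a given shift S (inverse model)."""
    i = (d * delta - X * S) / (delta - S)   # image distance implied by S
    return i * f_M / (i - f_M)


# Illustrative camera: f_M = 100 mm, focused at o_f = 2000 mm, X = 30 mm.
f_M, o_f, X, delta = 100.0, 2000.0, 30.0, 470.0
d = o_f * f_M / (o_f - f_M)
S_far = shift_from_distance(3000.0, f_M, d, X, delta)   # negative: beyond focus
o_back = distance_from_shift(S_far, f_M, d, X, delta)   # recovers about 3000 mm
```

Objects at the focus distance yield α = 1 and therefore S = 0, while objects beyond the focus distance yield α < 1 and a negative shift, in line with the sign convention used in the text.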

Error Analysis
In the following, the error that can be expected by ignoring the exit pupil, i.e. setting X = 0, is analyzed. First, we define the scaling between the ST and UV plane under this assumption as
$$\tilde{\Delta} = \frac{s_{px}\, d}{f_m\, d_{ML}}$$
and calculate the pixel disparity based on equation (6) as
$$\tilde{S}(o) = \tilde{\Delta}\,\frac{o\, f_M - d\,(o - f_M)}{o\, f_M} \tag{9}$$
as well as the object distance
$$\tilde{o}(S) = \frac{f_M\, d\, \tilde{\Delta}}{(d - f_M)\,\tilde{\Delta} + f_M\, S}. \tag{10}$$
The relative error of the shift, ERR_S, is then calculated from S̃ and S and, by describing o as a multiple of the focus distance, o = λ · o_f, expressed as a function of λ. For the error of the object or refocusing distance, two cases are analyzed. First, it is assumed that the correct shift S corresponding to the ground truth o is given, but in a second step the oversimplified model of equation (10) is used to calculate the object distance. This error can be found in applications measuring the correct shift, e.g. by repeatedly refocusing an image, and subsequently using the incorrect object distance calculation in order to estimate the associated metric distances in the scene. Using o = λ · o_f again, this error can likewise be formulated in terms of λ. The second case assumes an incorrectly calculated shift S̃ based on equation (9), which is subsequently used to refocus an image with the refocusing algorithm complying with the correct object distance estimation in equation (7). After substituting o = λ · o_f, this type of error can also be formulated as a function of λ. Now, assuming a camera with a focal length of f_M = 100 mm focused at a finite distance, figure 5 shows exemplary error values for different values of X relative to the focal length. The visualization shows that all errors diverge for λ → 0 with a rate depending on the positional relationship between the exit pupil and the principal plane H_cam. Beyond the focus distance at λ = 1, the object errors again diverge while the shift error ERR_S converges for large λ. Note that these graphs present an ideal refocusing case free of aliasing artifacts and limiting optical properties such as the depth of field. Hence, later experiments only verify a section of these results within the respective physical and image processing limits.
In summary, these examples show a large deviation between the estimated refocusing distances in models with and without consideration of the exit pupil whenever there is a non-zero distance X between the exit pupil and H_cam. This leads to the question of how prevalent a significant X ≠ 0 is in off-the-shelf main lenses. To answer this question, the data of 866 DSLR lenses listed by Claff [45] was collected and X as well as f_M were calculated via paraxial ray tracing for each lens. The resulting data in figure 6 shows a nearly linear connection between the focal length of a lens and the distance X with a Pearson correlation coefficient of 0.8994. Fitting a linear model to this data results in the non-zero function X(f_M) = 0.7108 · f_M − 56.5546 with a coefficient of determination R² = 0.8089. Further examination shows that only a small subset of 62 of the 866 lenses exhibits values for X below 5% of the focal length, i.e. |X| < 0.05 f_M. On the other hand, for 627 lenses the deviation exceeds 25% of the focal length, i.e. |X| > 0.25 f_M, and 444 lenses even have values |X| > 0.5 f_M. Overall, this data shows that the assumption of X ≈ 0 is usually not met in practice. Therefore, the exit pupil should be considered when relating the camera-side light field to the scene's light field.
Figure 6. Distances X between exit pupil and principal plane H_cam for 866 lenses [45] sorted by focal length. The black line represents the linear model fitted to the data and the colors of the data points indicate the relationship X/f_M. Note that the horizontal axis uses a logarithmic scaling due to the large number of lenses with a focal length below 100 mm.
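The three error types discussed in this section can be explored numerically with the sketch below. It is not the authors' formulation: the errors are assumed here to be signed relative errors with respect to the exit-pupil-aware quantities, the helper names are hypothetical, and the microlens and pixel values are illustrative defaults.

```python
def _image_distance(o, f_M):
    return o * f_M / (o - f_M)


def _shift(o, f_M, d, X, s_px, f_m, d_ML):
    delta = s_px * (d - X) / (f_m * d_ML)
    i = _image_distance(o, f_M)
    return delta * (i - d) / (i - X)


def _distance(S, f_M, d, X, s_px, f_m, d_ML):
    delta = s_px * (d - X) / (f_m * d_ML)
    i = (d * delta - X * S) / (delta - S)
    return i * f_M / (i - f_M)


def errors(lam, o_f, f_M, X, s_px=0.005, f_m=0.04, d_ML=0.02):
    """Assumed relative errors at object distance o = lam * o_f (lam != 1).

    Returns (err_S, err_o_measured, err_o_refocus):
      err_S          shift computed with X = 0 versus the correct shift
      err_o_measured correct shift, distance computed with X = 0
      err_o_refocus  shift computed with X = 0, distance via the correct model
    """
    cam = (s_px, f_m, d_ML)
    d = _image_distance(o_f, f_M)
    o = lam * o_f
    S = _shift(o, f_M, d, X, *cam)          # model with exit pupil
    S_tilde = _shift(o, f_M, d, 0.0, *cam)  # model ignoring the exit pupil
    err_S = (S_tilde - S) / S
    err_o_measured = (_distance(S, f_M, d, 0.0, *cam) - o) / o
    err_o_refocus = (_distance(S_tilde, f_M, d, X, *cam) - o) / o
    return err_S, err_o_measured, err_o_refocus


# Example: f_M = 100 mm, o_f = 2 m, exit pupil 50 mm from H_cam, target at 2 * o_f.
example = errors(lam=2.0, o_f=2000.0, f_M=100.0, X=50.0)
```

Under these assumed definitions, all three errors vanish for X = 0, and the shift error reduces to X(i − d)/((d − X) i), which tends to a constant as the image distance i approaches f_M for distant objects.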

Revisiting SPC Methods
In this section several previous works are examined with respect to the exit pupil's role for the respective model deductions.

Equivalent Ray Model
First, the equivalence between the refocusing model in equation (7) and the ray intersection model presented by Hahne et al. [10] is proven. Instead of basing the model on the decoding scheme of Dansereau et al. [7], an approach building upon the intersection of chief rays is presented in [10] in order to calculate the refocusing distance for a resampling of the raw plenoptic camera image. A comprehensive notation transfer into our setup is given in appendix B.1.
Figure 7. Image formation (light blue) for an object point located at distance o from H_scene. The image of this point, located at distance i from H_cam, is seen by multiple microlenses and its projections onto the sensor have a metric disparity of Ŝ. In order to determine the object distance o, Hahne et al. [10] propose to intersect ray functions (red) from two of the images and transfer the resulting image distance i to the scene via the thin lens equation.
The basic idea of [10], as depicted in figure 7, is the selection of two pixels on the sensor which show scene points from the desired focus plane, and tracing rays from these through the respective microlens centers. The resulting camera-side intersection determines the distance of the virtual image inside the camera from the main lens and accordingly, the thin lens equation can be applied in order to calculate the corresponding object or refocusing distance. Without loss of generality, the following calculations will assume an MLA with one microlens center located on the main lens' optical axis.
Given a sub-aperture image shift S in pixels as in section 2.3, this translates to a metric pixel disparity Ŝ on the sensor by
$$\hat{S} = -\frac{s_{px}}{S}, \tag{17}$$
with the sign flip resulting from the differing conventions used throughout this work and [10]. Under the premise of a well configured plenoptic camera with a regular microlens grid, any two pixels from neighboring microlenses with a disparity of Ŝ can be chosen to calculate the image distance. To simplify the calculations, the first ray is chosen to run along the optical axis as shown in figure 7 and the second ray is based on the pixel at sensor position d_MLI + Ŝ going through the neighboring microlens center. According to [10] (compare appendix B.1), this leads to two ray functions
$$f_1(z) = 0, \qquad f_2(z) = d_{ML} - \frac{d_{MLI} + \hat{S} - d_{ML}}{f_m}\, z,$$
with z measured from the MLA towards the main lens, which intersect at
$$z_i = \frac{f_m\, d_{ML}}{d_{MLI} + \hat{S} - d_{ML}}.$$
The crucial part that sets [10] apart from the works reviewed in the following sections is the correct calculation of the microlens image center distance d_MLI based on the exit pupil (compare the calculation of u_{c,j} in table A2) via
$$d_{MLI} = d_{ML}\,\frac{d - X + f_m}{d - X},$$
which yields
$$z_i = \frac{f_m\, d_{ML}\,(d - X)}{f_m\, d_{ML} + \hat{S}\,(d - X)}.$$
This intersection results in the image distance i = d − z_i measured from H_cam and can be used to calculate the object distance via the thin lens equation by
$$o = \frac{f_M\, i}{i - f_M} = \frac{f_M\,(d - z_i)}{d - z_i - f_M} = \frac{f_M\,(d\,\Delta - X\,S)}{(d - f_M)\,\Delta + (f_M - X)\,S},$$
where equation (17) and the definition of ∆ from section 2 are used in the last step. This equation equals the previously deduced equation (7) and thus proves the equivalence of both models.
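The claimed equivalence can also be checked numerically. The following sketch compares the two-plane model with a chief ray intersection computed from the geometry described above. It assumes the relation Ŝ = −s_px / S between pixel shift and metric sensor disparity, which follows from the conventions of this section; all names and camera values are illustrative.

```python
def distance_two_plane(S, f_M, d, X, s_px, f_m, d_ML):
    """Object distance from the shift S via the two-plane refocusing model."""
    delta = s_px * (d - X) / (f_m * d_ML)
    i = (d * delta - X * S) / (delta - S)
    return i * f_M / (i - f_M)


def distance_ray_intersection(S, f_M, d, X, s_px, f_m, d_ML):
    """Object distance from intersecting two chief rays behind the main lens."""
    S_hat = -s_px / S                            # metric disparity on the sensor
    d_MLI = d_ML * (d - X + f_m) / (d - X)       # MIC spacing from the exit pupil
    z_i = f_m * d_ML / (d_MLI + S_hat - d_ML)    # intersection, measured from MLA
    i = d - z_i                                  # image distance from H_cam
    return i * f_M / (i - f_M)


# Illustrative setup in millimeters (assumed values).
params = dict(f_M=100.0, d=2000.0 * 100.0 / 1900.0, X=30.0,
              s_px=0.005, f_m=0.04, d_ML=0.02)
o_a = distance_two_plane(-2.0, **params)
o_b = distance_ray_intersection(-2.0, **params)   # agrees with o_a
```

The case S = 0 is excluded in this sketch because the two rays are then parallel to the MIC spacing and Ŝ is undefined; it corresponds to the focus distance itself.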

Light Field Decoding and SPC Calibration
This section revisits the popular decoding and calibration scheme presented by Dansereau et al. [7]. In that work, the raw plenoptic camera image is first de-multiplexed into an integer-indexed two-plane parametrization L(i, j, k, l). These indices are then transformed into metric rays and propagated through the main lens. The combination of these steps yields an intrinsics matrix associating the integer indices directly with metric coordinates (s, t, u, v) for the scene-side light field. Note that these do not correspond to the equally named coordinates in section 2, which describe the camera-side light field coordinates before propagating them through the main lens.
The relevant step with respect to the gap between the exit pupil and the principal plane in this process is the division of the integer indices by the respective spatial frequencies of the pixels and microlenses via the matrix H_θ^abs. As explained in section 2.1, the grid of MICs corresponds to the scaled grid of microlens centers. Accordingly, the sampling rate for the microlens plane has to be scaled down, or equivalently the pixel sampling rate has to be scaled up by the inverse factor. Dansereau et al. [7] acknowledge this fact and choose the second option by introducing a scaling factor, which in our notation (compare appendix B.2) corresponds to
$$M_{proj} = \left(1 + \frac{f_m}{d}\right)^{-1}.$$
This scaling, however, assumes a projection center at the main lens principal plane. Using the exit pupil instead, the correct rescaling is given by
$$M_{proj} = \left(1 + \frac{f_m}{d - X}\right)^{-1}$$
as visualized in figure 8.
Figure 8. The central sub-aperture image consists of the MICs, i.e. the image of the aperture center viewed across all microlenses. These MICs originate from the center of the UV plane, i.e. the exit pupil. Accordingly, the distance between neighboring MICs is given by the triangle equality via $d_{MLI} = d_{ML}\,(d - X + f_m)/(d - X)$.
Fortunately, due to the overall formulation of the intrinsics as an end-to-end ray transformation, this slight change is not relevant for the calibration results, since neither f_m nor d is directly estimated. Instead, the factor M_proj contributes to the intrinsic variables H_{1,1} to H_{4,4} and repeating the deduction of H with the correct scaling leads to the same general form for the intrinsics matrix as in equation (23).
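To get a feeling for the size of this effect, the MIC spacing and the corresponding grid scaling can be computed for both projection centers. The sketch below assumes that the scaling factor has the form (1 + f_m/p)^(-1) for a projection center at distance p in front of the MLA; the function names and numbers are illustrative only.

```python
def mic_pitch(d_ML: float, f_m: float, p: float) -> float:
    """Spacing of microlens image centers for a projection center at distance p."""
    return d_ML * (p + f_m) / p


def projection_scaling(f_m: float, p: float) -> float:
    """Ratio between microlens pitch and MIC pitch for projection distance p."""
    return 1.0 / (1.0 + f_m / p)


d_ML, f_m = 0.02, 0.04          # microlens pitch and focal length in mm (assumed)
d, X = 105.0, 30.0              # MLA distance from H_cam and exit pupil offset in mm

pitch_principal = mic_pitch(d_ML, f_m, d)        # projection from the principal plane
pitch_exit_pupil = mic_pitch(d_ML, f_m, d - X)   # projection from the exit pupil
drift_per_100 = 100 * (pitch_exit_pupil - pitch_principal)  # mm after 100 microlenses
```

Although the per-microlens difference is tiny, it accumulates linearly across the grid, which is why it matters for locating MICs far from the optical axis.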
Similar cases of general parameters compensating for the model inaccuracies are presented by Monteiro et al. [13] and Zhang et al. [12]. Both make use of the same decoding process as [7] and build upon the idea of directly relating the camera-side and scene light field. Monteiro et al. [13] slightly reduce the intrinsics matrix shown above and subsequently use it in order to create an equivalent array of cameras for the scene-side light field. Zhang et al. [12] follow a similar approach by first relating the de-multiplexed light field in the form of sub-aperture images with the scene-side, metric light field, on which they base all further calculations. In both works, the interpretation of the intrinsic matrix parameters is not relevant, nor are these parameters used to directly reconstruct the main lens properties. Accordingly, the presented methods do not require a re-formulation using a more complex main lens model.
Despite this fact, it is important to point out the inaccuracy in [7], since several related works only make use of the proposed decoding process and assume that it yields a two-plane parametrization with a plane distance of d instead of d − X, as exemplarily shown in the following section.

Depth Reconstruction
One such case is presented by Pertuz et al. [9] and repeated in the follow-up work by Van Duong et al. [14]. In this work, a model relating the sub-aperture image shift to the object distance is deduced, which translates into our notation (compare appendix B.3) as equation (26) with system-dependent parameters a_0 and a_1 and a shift parameter ρ. There are two problems in the deduction of this model. First, the shift parameter ρ is not correctly deduced under the premise of light field data decoded by the method of Dansereau et al. [7], and second, the exit pupil is ignored. In the following, a corrected version is presented, which also explains how these two problems nearly neutralize each other and lead to the same general model, albeit with different parameter interpretations.
First, the incorrect shift parameter ρ is examined. In [9], this parameter is described as the pixel disparity between neighboring sub-aperture images gained by the decoding process of Dansereau et al. [7] and should therefore correspond to our shift parameter S̃, which also ignores the exit pupil. However, due to different conventions, ρ is positive when focusing to a distance larger than the focus distance, whereas S̃ is negative in that case (compare equation (5) for α < 1). Accordingly, ρ should equal −S̃, but transforming these parameters into a common notation shows that they differ. The reason for this discrepancy is the implicitly used incorrect assumption in [9] that the grid of microlens image centers equals the grid of microlens centers, i.e. d_ML = d_MLI. While a light field in general could be reparametrized with this step size in the ST plane, this requires exact knowledge of the camera and MLA geometry, which is usually unknown and not considered in the decoding process of [7]. Using the correct shift parameter for light field data following [7] instead and rearranging the corresponding equation (10) yields a corrected version of equation (26). This correction still ignores the exit pupil just as [9] does, but is considerably simpler than the original model since only a single system-dependent parameter is present compared to the two parameters a_0 and a_1 in equation (26). Finally, by introducing a non-zero distance X and thereby using o and S instead of õ and S̃, the full model can be deduced from equation (7), with the sign of the shift flipped to align this model with the inverted shift direction of [9]. This model has the same general form as equation (26) as proposed by Pertuz et al. [9], which explains the reasonable experimental results in that work. Nevertheless, the interpretation of the system-dependent parameters a_0 and a_1 as given in [9] and repeated in [14] is incorrect, which is also verified in experiment (V) in the following section. This different interpretation could lead to problems when the model needs to be fitted to data and the initial parameter values are based on the incorrect direct calculation.
Overall, the results of Pertuz et al. [9] are not entirely incorrect, but the light field representation used, which is based on [7], simply does not match the implicit assumptions used for the model deduction. More specifically, the main problem of [9] is the definition of the shift parameter ρ for light field data decoded similarly to [7], but based on known microlens centers instead of MICs. While the light field could theoretically be reparametrized by using the parallel projections of the microlens centers onto the sensor, this would require exact knowledge of the SPC intrinsics, namely the MLA parameters as well as its placement relative to the main lens and sensor.

Simulation Environment
Since SPCs with exchangeable lenses are not commercially available at the time of writing and custom-built solutions are costly as well as prone to misalignment of the optical components, we resort to synthetic experiments via an extension of the ray tracing solution we provided in [38]. Our publicly available, updated version of the Blender [11] add-on expands the original simulation by the following aspects:
• Simulation of aspherical lenses and zoom lenses
• Configurable MLA pose, thickness and IOR
• Automatic focusing with lens group movement based on paraxial approximations
• Integration of Claff's lens collection [45] and a collection of sensor presets
• Assisted plenoptic camera (SPC and FPC) configuration based on the ideas of [46]
This simulation facilitates a quick generation of a broad range of plenoptic cameras, such as the one exemplarily shown in figure 9, and is used in the following to validate the formal analysis of sections 2.3, 3 and 4.3.

Experiments
For the validation, the five lenses listed in table 1 were selected from the database [45]. While the first lens presents the ideal case of X ≈ 0 mm, i.e. the exit pupil coinciding with the camera-side principal plane, the remaining four lenses present interesting cases with varying relationships between X and the lenses' focal lengths f_M. Each of these lenses is used in two SPC configurations, one with a finite focus distance o_f < ∞ and another one focused at infinity, o_f = ∞. To this end, the MLA placement with respect to the main lens, i.e. the distance d between the MLA and the camera-side principal plane of the main lens, can be calculated by the thin lens equation (1) for the finite case and simply be set to d = f_M for o_f = ∞. The remaining microlens parameters are automatically initialized using the main lens and sensor properties to fulfill the following two constraints. First, the microlens f-number needs to match that of the main lens in order to optimally cover the sensor area [4], and second, a predefined number of 129 × 129 microlens images should be visible on the sensor in order to guarantee this resolution for the sub-aperture images. The resulting parameters are then fine-tuned by hand to accommodate the approximating nature of the f-number constraint and to guarantee that the MICs coincide with the centers of sensor pixels. This optimal MIC positioning has two effects: First, it renders the resampling during the decoding process of [7] unnecessary.
Table 1. Lenses selected for the experiments; recoverable column headers: Lens Model, f_M (mm), X (mm), Finite Focus, Infinite Focus.
After normalizing the raw images with white images to get rid of vignetting effects [47], the sub-aperture images in our setups can directly be extracted by combining the same relative sensor pixels from each microlens image [4]. Second, the evaluation can concentrate on validating the refocusing itself instead of additionally dealing with the compensation of interpolation artifacts from the decoding process. The full setups can be found in appendix D. With each of these ten setups, five experiments are performed:

(I) MICs and Exit Pupil: The exit pupil as origin of the MICs is verified by first tracing ray bundles from the main lens aperture center through the main lens and MLA onto the sensor. This results in a set of sensor hits for every microlens. Due to the small variance within such a set, the mean is considered to represent the ground truth position of the microlens image center. In a second step, rays are traced from these sensor positions through the corresponding microlens centers and the convergence location of the resulting ray bundle is calculated in two ways: first, by performing a line search along the optical axis for the minimum blur spot position of the ray bundle, and second, based on the rays' intersections with the optical axis. For these intersections, the mean and variance are calculated and presented alongside the minimum blur spot position. This whole process is also visualized in figure 10.
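The second step of experiment (I) can be sketched in a simplified 2D (meridional) form. The sketch assumes the ray parametrization used in the appendix, with the MLA at z = 0, the sensor at z = −f_s and the optical axis at height 0. The function names, the peak-to-peak blur measure and all numbers are illustrative assumptions and not the implementation of the Blender add-on.

```python
import numpy as np


def axis_intersections(mic_y, ml_y, f_s):
    """Return z where each MIC-to-microlens-center ray crosses the optical axis.

    Assumed geometry: MLA at z = 0, sensor at z = -f_s, optical axis at y = 0.
    Rays that run along the axis (slope 0) are skipped.
    """
    slope = (ml_y - mic_y) / f_s          # m = (s_j - u) / f_s
    valid = slope != 0
    return -ml_y[valid] / slope[valid]    # solve m * z + s_j = 0


def blur_spot_size(z, mic_y, ml_y, f_s):
    """Peak-to-peak extent of the ray bundle in the plane at position z."""
    heights = (ml_y - mic_y) / f_s * z + ml_y
    return heights.max() - heights.min()


def min_blur_position(mic_y, ml_y, f_s, z_candidates):
    """Line search along the optical axis for the smallest blur spot."""
    sizes = [blur_spot_size(z, mic_y, ml_y, f_s) for z in z_candidates]
    return z_candidates[int(np.argmin(sizes))]


# Synthetic example: MICs generated by a point source (an ideal exit pupil)
# located 80 mm in front of the MLA. All numbers are made up.
f_s, pupil_z = 0.05, 80.0
ml_y = np.linspace(-10.0, 10.0, 21)       # microlens centers (mm)
mic_y = ml_y * (1.0 + f_s / pupil_z)      # central projection onto the sensor

hits = axis_intersections(mic_y, ml_y, f_s)
z_blur = min_blur_position(mic_y, ml_y, f_s, np.linspace(60.0, 100.0, 401))
print(hits.mean(), hits.var(), z_blur)
```

In this idealized case both estimates coincide with the assumed pupil position; with a real lens, the spread of the intersections and the shape of the blur curve indicate how well a single exit pupil plane describes the MIC grid.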
(II) o and S: A calibration pattern, more specifically a Siemens star with four spokes, is placed at various distances in front of the camera. After demultiplexing the plenoptic camera image, the sub-aperture image shift S required to focus onto the given target distance o is measured. This is done via line search, i.e. by repeatedly refocusing the image with a simple shift-and-sum algorithm [4] and calculating the sharpness of the refocused image. Here, the variance of the Laplacian [48] is used as the metric for image sharpness and the shift value with the highest score is considered the optimum. This procedure results in tuples of ground truth distances o and measured shifts S, which are then used to verify the connection S(o) as formally described in equation (6).
To validate the inverted connection o(S) in equation (7), all images are refocused for a given set of shift values. For each of these shift values, the object distance associated with the best focused image is considered the measured object distance for the respective shift value. The resulting tuples of preset shifts and measured distances are then used to verify o(S).
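The line search of experiment (II) can be illustrated with a minimal sketch. It assumes a decoded light field stored as an array of shape (U, V, H, W) and, as a simplification, restricts itself to integer shifts implemented with `np.roll`, whereas the experiments described above also use non-integer shifts that require interpolation. All function names and the synthetic test scene are hypothetical.

```python
import numpy as np


def variance_of_laplacian(image):
    """Sharpness score: variance of a 4-neighbor discrete Laplacian."""
    lap = (image[:-2, 1:-1] + image[2:, 1:-1] + image[1:-1, :-2]
           + image[1:-1, 2:] - 4.0 * image[1:-1, 1:-1])
    return lap.var()


def shift_and_sum(light_field, shift):
    """Refocus a light field of shape (U, V, H, W) with an integer shift.

    Each sub-aperture image is displaced proportionally to its angular
    offset from the central view, then all views are averaged.
    """
    n_u, n_v = light_field.shape[:2]
    c_u, c_v = n_u // 2, n_v // 2
    acc = np.zeros(light_field.shape[2:], dtype=float)
    for u in range(n_u):
        for v in range(n_v):
            offset = (shift * (u - c_u), shift * (v - c_v))
            acc += np.roll(light_field[u, v], offset, axis=(0, 1))
    return acc / (n_u * n_v)


def best_shift(light_field, candidate_shifts):
    """Line search: return the shift with the sharpest refocused image."""
    scores = [variance_of_laplacian(shift_and_sum(light_field, s))
              for s in candidate_shifts]
    return candidate_shifts[int(np.argmax(scores))]


def make_test_light_field(disparity, n_views=3, size=64, seed=0):
    """Synthetic light field of a textured plane with a known disparity."""
    texture = np.random.default_rng(seed).random((size, size))
    c = n_views // 2
    return np.array([[np.roll(texture, (-disparity * (u - c),
                                        -disparity * (v - c)), axis=(0, 1))
                      for v in range(n_views)] for u in range(n_views)])


light_field = make_test_light_field(disparity=2)
print(best_shift(light_field, list(range(-3, 4))))
```

For the inverse experiment, the same sharpness score can be evaluated for a fixed shift across the images captured at different target distances, taking the distance with the highest score as the measurement.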
(III) $\mathrm{ERR}_{\tilde{o}}$ Validation: The data of experiment (II) is further used in order to verify the error model $\mathrm{ERR}_{\tilde{o}}$. In detail, the measured shift S for a known target distance o is used with equation (10) to approximate $\tilde{o}(S)$ and calculate the measured relative error according to equation (13). This error is then compared to the expected error obtained by directly calculating S based on the camera's properties instead of measuring it.
(IV) $\mathrm{ERR}_{\tilde{S}}$ Validation: Moreover, the images of (II) are also used to verify the error model $\mathrm{ERR}_{\tilde{S}}$ presented in section 3. First, for every target distance o the incorrect shift $\tilde{S}(o)$ is calculated based on the assumption X = 0 as in equation (9), and the images of the patterns at different positions are all refocused with this parameter. The target distance corresponding to the sharpest of the refocused images approximates $o(\tilde{S})$ and is then used to measure $\mathrm{ERR}_{\tilde{S}}$ as in equation (15). Again, the measured values are compared to the errors obtained by directly calculating $o(\tilde{S})$. Note that instead of verifying $\mathrm{ERR}_{\tilde{o}}$ and $\mathrm{ERR}_{\tilde{S}}$, equations (9) and (10) as well as the shift error model in equation (11) could also be validated directly with the data measured and calculated in experiments (III) and (IV). However, these equations do not include a comparison between the incorrect estimations and the ground truth refocusing distances. Therefore, the indirect validation of those models by means of the resulting refocusing errors was preferred.

(V) Validation of section 4.3: As analyzed in that section, the overall formulation of the model presented by Pertuz et al. [9] is correct, but the model parameters have a different interpretation under the assumption of light field data decoded by the method of [7]. To verify the corrected model, the parameters $a_0$ and $a_1$ are first calculated based on the formula of Pertuz et al. [9], then according to our model from section 4.3, and finally fitted to the data of (II), i.e. the set of shift-distance pairs with measured shifts and ground truth target distances. For this parameter fitting, a grid search for the best parameters by means of the RMSE was performed with a grid explicitly containing both directly calculated parameter sets.
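The parameter fitting of experiment (V) can be sketched as a plain grid search. The two-parameter function `placeholder_model` below is a purely hypothetical stand-in used to make the example runnable; it is neither the model of [9] nor the corrected model of section 4.3. The `include` argument is likewise an assumed way of guaranteeing that the directly calculated parameter sets are part of the grid.

```python
import numpy as np


def rmse(predicted, truth):
    """Root mean squared error between two equally sized sequences."""
    diff = np.asarray(predicted, float) - np.asarray(truth, float)
    return float(np.sqrt(np.mean(diff ** 2)))


def placeholder_model(shift, a0, a1):
    """Hypothetical two-parameter model, NOT the model of [9] or section 4.3."""
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 / (a0 + a1 * np.asarray(shift, dtype=float))


def grid_search(model, shifts, distances, a0_grid, a1_grid, include=()):
    """Exhaustive search for the (a0, a1) pair with the lowest RMSE.

    `include` lists parameter pairs that must be part of the grid, e.g. the
    directly calculated parameter sets.
    """
    a0_values = np.union1d(a0_grid, [p[0] for p in include])
    a1_values = np.union1d(a1_grid, [p[1] for p in include])
    best = (None, None, np.inf)
    for a0 in a0_values:
        for a1 in a1_values:
            error = rmse(model(shifts, a0, a1), distances)
            if np.isfinite(error) and error < best[2]:
                best = (float(a0), float(a1), error)
    return best


# Synthetic "measurements" generated from known parameters (made-up values).
shifts = np.linspace(-1.0, 1.0, 11)
distances = placeholder_model(shifts, 0.002, 0.001)
a0, a1, err = grid_search(placeholder_model, shifts, distances,
                          np.linspace(0.001, 0.003, 5),
                          np.linspace(0.0005, 0.0015, 5),
                          include=[(0.0021, 0.0011)])
print(a0, a1, err)
```

Including the directly calculated sets in the grid means the fitted result can never score worse than those sets, which makes the RMSE values of the calculated and fitted parameters directly comparable.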

Results and Discussion
(I) As shown in figure 11, the ray bundles consisting of rays running from the calculated ground truth MICs through the microlens centers in general converge towards the exit pupil in all setups. The closer the ray origins are to the optical axis (compare e.g. the inner 25 % of rays shown in figure 11), the closer the minimum blur spot is located to the exit pupil.
The deviations for larger sets of rays including the outermost MICs, as especially visible for the Canon setups, can be explained by the dependency of the exit pupil's location on the viewing angle. Similar to the curved focal plane in lenses with a significant non-zero Petzval field curvature, the exit pupil cannot be well approximated by a plane in some setups and thereby affects the MIC grid on the sensor in a non-linear fashion [40]. This is a clear limitation of our model, which is built upon paraxial approximations. Nevertheless, considering that even in extreme cases like the outermost microlenses in the Canon setups, the origin of the MICs is located close to the exit pupil plane, these results again confirm the observation of Hahne et al. [8] and thus justify the recommendation to examine the necessity of including the exit pupil into the lens model.

(II) The mean of absolute differences between the measured and directly calculated ground truth shift values across all 10 setups (compare figure 12) is 0.008 px with a variance of 0.0026 px. The worst single setup is the finitely focused Olympus setup with a mean of absolute differences of 0.04 px and a variance of 0.025 px.
These values are in the range of expected inaccuracies resulting from the image processing methods involved. Especially the interpolation steps required by the shift-and-sum refocusing [4], but also the rather simple (de)focus measure, which is prone to interference errors, are limiting factors that prevent a higher accuracy. A similar situation is observable in the inverse case, i.e. for the model o(S) from equation (7), as presented in figure 13. Due to the wide range of target distances, in this case the relative absolute differences are calculated to quantify the results from figure 13. For the finite setups, the mean of these relative absolute differences is 0.13 % with a variance of $2 \cdot 10^{-5}$ %, while in the infinite cases, the overall mean is 0.72 % with a variance of 0.01 %. Here, the infinitely focused Zeiss setup has the worst performance with a relative absolute difference mean of 0.9 % and a variance of 0.004 %. The worse performance of the infinitely focused setups is a consequence of the smaller range of shift values representing a larger refocusing range (compare figure 13), resulting in a greater susceptibility to small shift changes. Nevertheless, the overall performance confirms the model o(S), again within the constraints posed by the involved image processing steps.
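The summary statistics quoted above can be computed along the following lines. This is a minimal sketch that assumes the population variance (NumPy's default) and returns relative differences as fractions rather than percentages; the text does not state which variance estimator was used, and the sample values are made up.

```python
import numpy as np


def absolute_difference_stats(measured, expected):
    """Mean and (population) variance of |measured - expected|."""
    diff = np.abs(np.asarray(measured, float) - np.asarray(expected, float))
    return diff.mean(), diff.var()


def relative_absolute_difference_stats(measured, expected):
    """Same statistics for |measured - expected| / |expected| (as fractions)."""
    measured = np.asarray(measured, float)
    expected = np.asarray(expected, float)
    rel = np.abs(measured - expected) / np.abs(expected)
    return rel.mean(), rel.var()


# Made-up numbers, purely for illustration.
expected_mm = [500.0, 1000.0, 2000.0]
measured_mm = [501.0, 998.0, 2004.0]
print(absolute_difference_stats(measured_mm, expected_mm))
print(relative_absolute_difference_stats(measured_mm, expected_mm))
```

The relative variant is the natural choice when the target distances span a wide range, since a fixed absolute deviation weighs very differently at close and far distances.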

(III) The results of the $\mathrm{ERR}_{\tilde{o}}$ verification experiment are shown in figure 14. These graphs again show the error based on measurements compared to the directly calculated ground truth error based on the error models presented in section 3. The mean of absolute differences between the measured and calculated error values is 0.003 with a variance of $10^{-5}$, which validates our models, limited only by the accuracy of the image processing methods and the optical properties of the chosen lenses. More specifically, the small deviations from the expected values, which are even present in the baseline case for the Rodenstock lens, can be explained as in experiment (II) by the interpolation operations required by the shift-and-sum refocusing algorithm [4] for non-integer shift values. The larger deviation visible at close range for the Olympus lens with finite focus distance is a result of the optical limits of this lens, which can be explained as follows. A single sub-aperture image contains one pixel per microlens. Hence, such an image can be considered sharp if the calibration target is imaged onto the MLA with all blur spot sizes smaller than the microlens diameter $d_{ML}$. For the Olympus lens, however, these blur spot sizes increase more drastically in the close range than those of the other lenses, leading to severe defocus blur even in the sub-aperture images. This in turn can produce interference artifacts during the refocusing and thereby lead to fluctuating contrast measurements. Since the refocus distance o is determined by these measurements, this affects the error calculated as in equation (13).

(IV) While experiment (III) validated the error model in the case of a correctly estimated shift combined with an oversimplified distance estimator, figure 15 shows that the inverse problem of an incorrectly calculated shift used with the shift-and-sum refocusing is also modeled correctly. The mean of absolute differences between the measured and directly calculated error values is 0.004 with a variance of $2.47 \cdot 10^{-5}$. Overall, experiments (III) and (IV) confirm the formal deductions of section 3 and thus again justify the warning to mind the exit pupil when modeling a standard plenoptic camera. However, the results also hint at further minor optical or algorithmic aspects not being accounted for. In all cases, the mean of the absolute differences between measured values and ground truth model is up to two orders of magnitude larger than the respective variance. This is the result of a nearly constant over- or underestimation and could indicate a systematic error, which can e.g. be caused by the refocusing model being limited to paraxial calculations.

(V) Regarding the correction of the model from [9] in section 4.3, the results presented in table 2 show that the corrected model appropriately describes the connection between the refocus distance and the sub-aperture image shift. In all setups, the RMSE of our directly calculated model is within 1.75 mm of the RMSE of the fitted model. In addition, the fitted parameters are well approximated by the direct calculation with our model. On the other hand, the model of [9] can be regarded as incorrect for this light field parametrization with one exception: the parameter $a_1$ in the setups focused at infinity is in general correct, which can be explained as follows. Neither the original model of [9] nor our correction directly considers the case $o_f = \infty$; instead, a large focus distance of $o_f = 10^6$ m is used as approximation in these cases. For such a focus distance, the distance d between the main lens and the MLA is close to the focal length of the main lens, i.e. $d = f_M + \epsilon$ for some small value $\epsilon > 0$. With this formulation, the correctness of $a_1$ in the infinite cases can formally be explained by

Conclusion and Limitations
Overall, this work shows that the exit pupil can play a crucial role when modeling the relation between the camera-side and scene-side light fields. The connection between sub-aperture image shift and refocusing distance is derived analogously to previous work, but under consideration of an exit pupil not coinciding with the principal plane of the main lens. Based on this deduction, two error models for the relative refocus distance are created and validated. The subsequent review of previous work shows that a sufficiently general formulation of the SPC calibration model absorbs these errors in most methods, albeit leading to an incorrect interpretation of the model parameters. As an example, a correction of the work of Pertuz et al. [9] is presented and validated.
Nevertheless, despite the good evaluation results, there are several limitations to this work. First of all, the experiments are performed on simulated data. While the ray tracing-based lens simulation has been verified to exhibit the optical properties stated in the respective lens patents, i.e. aberration and distortion measurement results from the patents could be reproduced, there still is a gap between simulation and reality. On the one hand, the specified lens parameters could differ from the final production lens due to manufacturing inaccuracies or even deliberate parameter obfuscation by the lens manufacturer to hide specific lens details. On the other hand, the used framework given by Blender [11] does not include wave optics effects such as diffraction [49] [50]. Without these effects, the simulated optics are not diffraction-limited and therefore might produce images sharper than their real counterparts.

Further limitations are concerned with the formal lens model used for the error deduction. First, the microlenses in our model are still formally regarded as thin lenses. While Hahne et al. [10] use a thick lens model with explicit microlens principal planes, it was decided to leave this aspect out of the theoretical discussion for reasons of clarity and comprehensibility. However, the microlens thickness was factored in while performing the experiments, and appendix C shows that an extended model does not change the equations deduced in section 2 and section 3.
Another limitation, which does affect the validity of these equations, is the restriction to paraxial models, more specifically the repeated use of the thin lens equation (1) in various calculations as well as fixed positions for the principal planes and exit pupils. The thin lens equation describes the relation between the object distance, the image distance and a lens' focal length and is usually only valid along the optical axis. With growing distance from this axis, third-order aberrations like the Petzval field curvature, i.e. a curved focus surface, might affect the refocusing distance [40]. Furthermore, as seen for the evaluated Canon lens in experiment (I), the position of the exit pupil might also vary depending on the viewing angle and thereby reduce the applicability of the deduced models.
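As a reminder of how the paraxial relation enters the setup construction, the following sketch solves the thin lens equation for the image-side distance, i.e. the MLA placement d for a given focus distance. It assumes the common form 1/f = 1/o + 1/d with positive distances; the function name and the numeric values are illustrative only.

```python
import math


def image_distance(focal_length, object_distance):
    """Thin lens equation 1/f = 1/o + 1/d solved for the image distance d."""
    if math.isinf(object_distance):
        return focal_length
    if object_distance <= focal_length:
        raise ValueError("object must lie beyond the focal length for a real image")
    return 1.0 / (1.0 / focal_length - 1.0 / object_distance)


print(image_distance(50.0, 1000.0))      # finite focus, lengths in mm
print(image_distance(50.0, math.inf))    # focused at infinity: d = f
print(image_distance(50.0, 1e9))         # very distant focus: d slightly above f
```

The last call mirrors the approximation of an infinite focus distance by a very large finite one, for which d exceeds the focal length only by a small positive amount.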
Further work on the listed limitations is not expected to significantly alter the presented results, as these are already within the expected accuracy bounds set by the involved image processing steps. Instead, future work could target the second type of plenoptic cameras, namely the FPC, for which a multitude of different calibration methods exists that also differ in the assumed lens models and could benefit from minding the gap between principal plane and exit pupil.

A ray originating in a pixel at the sensor position $u_{c+i,j}$ and running through the corresponding microlens center $s_j$ can be described via

\[ f_{c+i,j}(z) = m_{c+i,j} \cdot z + s_j = \frac{s_j - u_{c+i,j}}{f_s} \cdot z + s_j = \frac{s_j - (u_{c,j} + i \cdot \Delta u)}{f_s} \cdot z + s_j \]

By assuming an MLA with a microlens center located at the main lens optical axis, the ray originating from the center of the central microlens image going through the microlens center $s_0 = 0$ can be described by

Given a second pixel position at distance $d_{MLI} + \hat{S}$ from the central pixel, this pixel's index in the microlens image $j = 1$ is given by $i = \hat{S} / s_{px}$. Accordingly, the ray originating from this pixel running through the microlens center $s_1 = d_{ML}$ is given by $f(z) := f_{c+i,1}(z) =$

Apart from the intrinsic parameters listed in table A3, many other aspects of the notation in Dansereau et al. [7] are dedicated to the light field outside the camera, which is not directly modeled in our work. Using the notation as given in table A4, the deduction steps for the model in [9] can be translated as stated in table A5. Note, however, that this table presents a direct notation transfer, especially regarding equations (5) and (8) of [9]. As shown in section 4, the shift parameter ρ is already based on an incorrect assumption and therefore does not directly correspond to S.
measured from the main lens aperture position and have a positive sign for these planes being located on the sensor side of the aperture. On the other hand, $H_{scene}$ is measured from the aperture with a positive sign if located on the scene side of the aperture. X is measured from the position of $H_{cam}$ as shown in figure 2 and all remaining properties are assumed to be unsigned distance values. All mentioned properties were measured via a 2D ray tracer, which we also integrated into the Blender add-on to facilitate fast automatic plenoptic camera reconfigurations.

Figure 3. Integer (red) and metric (black) two-plane parametrization of the light field. Here, $s_{px}$ describes the size of a sensor pixel and $f_m$ the focal length of a microlens, which for an SPC coincides with the distance between MLA and sensor.

Figure 5. Left: Relative shift error based on $\lambda = o / o_f$. Mid/right: The two cases of relative object distance errors for the assumed camera focused at a finite distance, which is met for a relative distance of $o / o_f = 1$. Negative error values indicate an underestimation of the ground truth value while positive errors represent an overestimation.

Figure 9. Cross-section and rendering of an exemplary evaluation setup: A fully modeled lens is combined with a two-plane MLA model and a sensor in order to simulate a plenoptic camera via ray tracing. Calibration patterns are placed at different distances in front of this setup to verify the analytical models. Note that the housing of camera and lens was removed for the purpose of visualization and the ISO 12233 pattern⁴ is used with permission of Cornell University.

Figure 10. Experiment (I): The two steps of the MIC/exit pupil verification visualized for the Zeiss Batis 1.8-85. Rays (light gray) are traced from the main lens aperture center through the main lens and MLA onto the sensor. The resulting means of the sensor hits per microlens represent the MICs. The exit pupil as the approximate source of these points is verified by backward tracing rays (blue/green) from the MICs through the respective microlenses and calculating the minimum blur spot position as well as the mean and variance of intersections along the optical axis.

Figure 11. Results of experiment (I). The orange markers indicate the exit pupils' locations on the optical axis (horizontal, dotted line) with respect to the respective principal planes $H_{cam}$. The black markers show the mean and variance of the intersection points between the optical axis and the rays traced back from the MICs through the microlens centers. The colored functions show the blur spot sizes for different subsets of these ray bundles close to the exit pupil (compare figure 10). These subsets contain the respective portion of rays from the bundle which are closest to the optical axis.

Figure 12. Results of experiment (II): Verification of the model S(o) according to equation (6). The data points represent measured shift values for the respective object distances and the underlying lines represent the expected, directly calculated values. Left: Error for the setups with finite focus based on the relative object distance $o / o_f$. Right: Error for setups focused at infinity.

Figure 13. Results of experiment (II): Verification of the model o(S) according to equation (7). The data points represent measured focus distance values for the respective shifts and the underlying lines represent the expected, directly calculated values.

Figure 14. Results of experiment (III): Object distance error resulting from a correctly estimated shift, but incorrect object distance estimation based on the assumption X = 0. The thick lines indicate the predicted ground truth error based on equation (13) and equation (14) and the points indicate the measurements. Left: Error for the setups with finite focus. For a comparative visualization of the different target distance ranges, the results are shown based on the relative object distance $\lambda = o / o_f$. Right: Error for setups focused at infinity.

Figure 15. Results of experiment (IV): Relative object distance error resulting from an incorrectly calculated shift ignoring the exit pupil, combined with a correct object distance estimator. The thick lines indicate the predicted ground truth error based on equation (15) and equation (16) and the points indicate the measurements. Left: Error for the setups with finite focus. For a comparative visualization, the results are shown based on the relative object distance $o / o_f$, i.e. the finite focus distance for the respective setups is met at $o / o_f = 1$. Right: Error for setups focused at infinity.

Table 1. Overview of the simulated main lenses and their properties in the finite and infinite focus setup. Note that the focal length of a lens can vary in different setups due to lens group movements involved in refocus or zoom operations.

Table 2. Results of experiment (V): Model parameters and the resulting RMSE in mm for the 10 SPC setups. The parameters for our method and Pertuz et al. [9] are directly calculated based on the known optical properties. The fitted parameter sets were acquired via grid search and subsequent local optimization to fit the model to the data from experiment (II).

Table A1. General notation in Hahne et al. [10] and our work.
… ′ · $f_s + s_j$
Position of i-th neighbor pixel of MIC: $u_{c+i,j} = u_{c,j} + i \cdot \Delta u$
Slope of ray from i-th neighbor through ML center: $m_{c+i,j} = \frac{s_j - u_{c+i,j}}{f_s}$

Table A7. SPC properties for the evaluation setups with finite focus distance.

Table A8. SPC properties for the evaluation setups with infinite focus distance.