Design and Analysis of a Single-Camera Omnistereo Sensor for Quadrotor Micro Aerial Vehicles (MAVs)

We describe the design and 3D sensing performance of an omnidirectional stereo (omnistereo) vision system applied to Micro Aerial Vehicles (MAVs). The proposed omnistereo sensor employs a monocular camera that is co-axially aligned with a pair of hyperboloidal mirrors (a vertically-folded catadioptric configuration). We show that this arrangement provides a compact solution for omnidirectional 3D perception when mounted on top of propeller-based MAVs, which cannot carry large payloads. The theoretical single viewpoint (SVP) constraint helps us derive analytical solutions for the sensor's projective geometry and generate SVP-compliant panoramic images to compute 3D information from stereo correspondences (in a truly synchronous fashion). We perform an extensive analysis of various system characteristics such as its size, catadioptric spatial resolution, and field-of-view. In addition, we pose a probabilistic model for estimating the uncertainty of 3D information obtained from triangulation of back-projected rays. We validate the projection error of the design using both synthetic and real-life images against ground-truth data. Qualitatively, we show 3D point clouds (dense and sparse) obtained from a single image captured in a real-life experiment. We expect our sensor to be reproducible, as its model parameters can be optimized to satisfy other catadioptric-based omnistereo vision requirements under different circumstances.


Introduction
Micro aerial vehicles (MAVs), such as quadrotor helicopters, are popular platforms for unmanned aerial vehicle (UAV) research due to their structural simplicity, small form factor, vertical take-off and landing (VTOL) capability, and high omnidirectional maneuverability. In general, UAVs have many military and civilian applications, such as target localization and tracking, 3-dimensional (3D) mapping, terrain and infrastructural inspection, disaster monitoring, environmental and traffic surveillance, search and rescue, deployment of instrumentation, and cinematography, among other uses. However, MAVs have size, payload, and on-board computation limitations, which call for compact and lightweight sensors. The most commonly used perception sensors on MAVs are laser scanners and cameras in various configurations such as monocular, stereo, or omnidirectional.
We present a vision-based omnidirectional stereo (omnistereo) sensor motivated by several aspects of MAV robotics.

Sensor Motivation
We justify the need for the proposed omnistereo sensor after observing two basic differences in the sensor requirements between MAVs and ground vehicles:

1. Size and payload: In MAV applications, the sensor's physical dimensions and weight are always a great concern due to payload constraints. Generally, MAVs require fewer and lighter sensors that are compactly designed, while larger robots (including high-payload UAVs) have greater freedom of sensor choice.

2. Field-of-view (FOV): Due to their omnidirectional motion model, MAVs require a simultaneous observation of the 3D surroundings. Conversely, most ground robots can safely rely upon narrow vision as their motion control on the plane is more stable.

Existing Range Sensors for MAVs
In addition to specifying our sensor requirements, it is important to note the range sensors most prevalent on MAVs today and their limitations. For example, lightweight 2.5D laser scanners can accurately measure distances at fast rates; however, their instantaneous sensing is limited to plane sweeps, which in turn requires the quadrotor to move vertically in order to generate 3D maps or to foresee obstacles and free space during navigation. More recently, 3D laser rangefinders and LiDARs have been developed, such as the sensor presented in [1], but that sensor is not compact enough for MAVs. Another disadvantage of laser-based technologies is their active sensing nature: they require more power to operate, and their measurements are more vulnerable to detection and corruption (e.g., due to dark/reflective surfaces) than those of vision-based solutions. Time-of-flight (ToF) cameras as well as red, green, blue plus depth (RGB-D) sensors like the Microsoft Kinect® are also very popular for robot navigation. They have been adopted for low-sunlight conditions and mainly indoor navigation of MAVs [2] due to their structured infrared light projection and short-range sensing (under 5 m) [3]. Hence, a lightweight imaging system capable of instantly providing a large field of view (FOV) with acceptable resolution is essential for MAV applications in 3D space. The pitfalls of these state-of-the-art sensors motivate the design and analysis of our omnistereo sensor.

Related Work
Approaches that use omnidirectional images and motion alone, like those taken in [4,5], have been proposed to map and localize a robot. Omnidirectional vision using a single mirror for the flight of large UAVs was first attempted in [6]. In [7], Hrabar proposed the use of traditional horizontal stereo-based obstacle avoidance and path planning for UAVs, but these techniques were only tested in a scaled-down air vehicle simulator (AVS). Omnidirectional catadioptric cameras can be aided by structured light, such as the prototypes presented in [8] and the more flexible configurations demonstrated in [9]. Alternatively, stereo cameras can provide passive, instantaneous 3D information for robot mapping and navigation (including UAVs [10]). Intuitively, omnidirectional stereo (omnistereo) can be achieved through circular arrangements of multiple perspective cameras with overlapping views. Higher resolution panoramas can be achieved by rotating a linear camera as presented in [11], but this approach suffers from motion blur in dynamic environments. We point the reader to [12] for a detailed study of multiple view geometry, and [13] for a compendium of geometric computer vision concepts. Instead, our solution to omnistereo vision consists of a 'catadioptric' system that employs cameras and mirrors [14].
Over the years, various omnistereo catadioptric configurations have been applied to ground mobile robots [15][16][17][18][19][20]. Unfortunately, these systems are not compact since they use separate camera-mirror pairs, which are known to experience synchronization issues.
In [21], Yi and Ahuja described a configuration using a mirror and a concave lens for omnistereo, but it rendered a very short baseline in comparison to the two-mirror configurations. Originally, Nayar and Peri [22] studied 9 possible folded-catadioptric configurations for a single-camera omnistereo imaging system. Eventually, a catadioptric system using two hyperbolic mirrors in a vertical configuration was implemented by He et al. [23]. Their omnistereo sensor provides a lengthy baseline at the expense of a very tall system. In the past [24], we developed a novel omnistereo catadioptric rig consisting of a perspective camera coaxially-aligned with two spherical mirrors of distinct radii (in a "folded" configuration). One caveat of spherical mirrors is their non-centrality; they do not satisfy the single effective viewpoint (SVP) constraint (discussed in Section 2.2) but rather a locus of viewpoints is obtained [25].

Proposed Sensor
We design an SVP-compliant omnistereo system based on the folded, catadioptric configuration with hyperboloidal mirrors. Our approach resembles the work of Jang, Kim, and Kweon [26], who first implemented an omnistereo system using a pair of hyperbolic mirrors and a single camera. However, their sensor's characteristics were not studied in order to justify its design parameters and capabilities, which we do in our case.
It is true that an omnidirectional catadioptric system sacrifices spatial resolution on the imaging sensor (analyzed in Section 3.4). However, our sensor offers practical advantages such as reduced cost, acceptable weight, and truly-instantaneous pixel-disparity correspondences: since the same single camera-lens operates for both views, mis-synchronization issues do not exist. In fact, we believe we are the first to present a single-camera catadioptric omnistereo solution for MAVs. The initial geometry of our model was proposed in [27]. Now, we perform an extensive analysis of our model's parameters (Section 2) and its geometric projection (Section 3). The parameter values are obtained as the solution to a constrained numerical optimization that targets the sensor's real-life application to passive range sensing on MAVs (Section 4). We also show how the panoramic images are obtained, where we find correspondences and triangulate 3D points for which an uncertainty model is introduced (Section 5). Finally, we present our experimental results and evaluation for 3D sensing with the proposed omnistereo sensor (Section 6), and we discuss the future direction of our work in Section 7. Figure 1 shows the single-camera catadioptric omnistereo vision system that we specifically designed to be mounted on top of our micro quadrotors (manufactured by Ascending Technologies [28]). It consists of (1) one hyperboloid-planar mirror at the top; (2) one hyperboloidal mirror at the bottom; and (3) a high-resolution USB camera also at the bottom (inside the bottom mirror and looking up). The components are housed and supported by (4) a transparent tube or plastic standoffs (for the real-life prototype shown in Figure 13).
The choice of hyperboloidal reflectors owes to three reasons: the hyperboloid is one of the four non-degenerate conic shapes satisfying the SVP constraint [29]; it allows a wider vertical FOV than elliptical and planar mirrors; and it does not require a telescopic (orthographic) lens for imaging as paraboloidal mirrors do (so our system can be downsized). In addition, the planar part of mirror 1 works as a reflex mirror, which in part reduces distortion caused by dual conic reflections. Based on the SVP property, the system obtains two radial images of the omnidirectional views in the form of an inner and an outer ring, as illustrated in Figure 2a,b. Nevertheless, the unique set of parameters describing the entire system categorizes it as a "global camera model" in the sense of [13]: changing the value of any parameter in the model affects the overall projection function of visible light rays in the scene as well as other computational imaging factors such as depth resolution and overlapping field of view, which we attempt to optimize in the following design subsections. Please refer to Appendix A for clarification on our symbolic notation.

Model Parameters
In the configuration of Figure 3, mirror 1's real or primary focus is $F_1$, which is separated by a distance $c_1$ from its virtual or secondary focus, $F_1'$, at the bottom. Without loss of generality, we make both the camera's pinhole and $F_1'$ coincide with the origin of the camera's coordinate system, $O_C$. This way, the position of the primary focus, $F_1$, can be referenced by the vector ${}^{[C]}\mathbf{f}_1 = [0, 0, c_1]^T$ in Cartesian coordinates with respect to the camera frame, $[C]$. Similarly, the distance between the foci of mirror 2, $F_2$ and $F_2'$, is measured by $c_2$. Here, we use the planar (reflex) mirror of radius $r_{ref}$ and unit normal vector ${}^{[C]}\hat{\mathbf{n}}_{ref}$ (Equation (1)). The profile of each hyperboloid is determined by the independent parameters $k_1$ and $k_2$, respectively. Their reflective vertical fields of view (vFOV) are indicated by angles $\alpha_1$ and $\alpha_2$. They play an important role when designing the total vFOV of the system, $\alpha_{sys}$, formally defined by Equation (54) and illustrated in Figure 5. Also, while performing stereo vision, it is important to consider angle $\alpha_{SROI}$, which measures the common (overlapping) vFOV of the omnistereo system. The camera's nominal field of view $\alpha_{cam}$ and its opening radius $r_{cam}$ also determine the physical areas of the mirrors that can be fully imaged. Theoretically, the mirrors' vertical axis of symmetry (coaxial configuration) produces two image points that are radially collinear. This property is advantageous for the correspondence search during stereo sensing (Section 5), with a baseline measured as $b = c_1 + c_2 - d$ (3), where $d$ is the distance between $O_C$ and the virtual camera focus $F_{2v}$. Among the design parameters, we also include the total height of the system, $h_{sys}$, and its weight, $m_{sys}$, both formulated in Section 2.3.
To summarize, the model has 6 primary design parameters given as the vector $\boldsymbol{\theta} = [c_1, c_2, k_1, k_2, d, r_{sys}]$ (4), in addition to by-product parameters such as $b$, $h_{sys}$, $r_{ref}$, $r_{cam}$, $m_{sys}$, $\alpha_1$, $\alpha_2$, $\alpha_{sys}$, $\alpha_{SROI}$, and $\alpha_{cam}$. In Section 4, we perform a numerical optimization of the parameters in $\boldsymbol{\theta}$ with the goal of maximizing the baseline, $b$, required for life-size navigational stereopsis. At the same time, we restrict the overall size of the rig (Section 2.3) without sacrificing sensing performance characteristics such as vertical field of view, spatial resolution, and depth resolution. In the upcoming subsections, we first derive the analytical solutions for the forward projection problem in our coaxial stereo configuration as a whole. In Section 3.2, we derive the back-projection equations for lifting 2D image points into 3D space.
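As a small illustration of how the design vector relates to the stereo baseline, the sketch below stores the six primary parameters and derives the focus heights along the Z-axis. It assumes the coaxial layout described above ($F_1$ at height $c_1$, the virtual camera at height $d$, and $F_2$ a distance $c_2$ below it); the class name, function names, and numeric values are hypothetical and are not the optimized design.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DesignParams:
    """Primary design vector theta = (c1, c2, k1, k2, d, r_sys); lengths in mm."""
    c1: float
    c2: float
    k1: float
    k2: float
    d: float
    r_sys: float


def focus_heights(p: DesignParams) -> dict:
    """Z-coordinates of the foci in the camera frame [C] (assumed layout)."""
    return {
        "F1": p.c1,          # primary focus of mirror 1 (top viewpoint)
        "F2_virtual": p.d,   # virtual camera pinhole created by the reflex mirror
        "F2": p.d - p.c2,    # primary focus of mirror 2 (bottom viewpoint)
    }


def baseline(p: DesignParams) -> float:
    """Vertical distance between the two effective viewpoints F1 and F2."""
    z = focus_heights(p)
    return z["F1"] - z["F2"]


# Hypothetical values chosen only for illustration.
theta = DesignParams(c1=120.0, c2=240.0, k1=6.0, k2=10.0, d=230.0, r_sys=37.0)
print(baseline(theta))  # 130.0
```

With $c_2 > d$, the focus $F_2$ falls below the camera origin, so the baseline exceeds $c_1$.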

Single Viewpoint (SVP) Configuration for OmniStereo
Because the sensor is a central catadioptric system, its projection geometry must obey the existence of the so-called single effective viewpoint (SVP). While the SVP guarantees that true perspective geometry can always be recovered from the original image, it limits the selection of mirror profiles to a set of conics. Generally, a circular hyperboloid of revolution (about its axis of symmetry) conforms to the SVP constraint, as demonstrated by Baker and Nayar in [30]. Since a hyperboloidal mirror has two foci, the effective viewpoint is the primary focus $F$ inside the physical mirror, and the secondary (outer) focus $F'$ is where the centre (pinhole) of the perspective camera should be placed for depicting a scene obeying the SVP configuration discussed in this section.
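First of all, a hyperboloid $i$ can be described by the following parametric equation:
$(z_i - z_{0_i})^2 - r_i^2 \left( \frac{k_i}{2} - 1 \right) = \frac{c_i^2}{4} \left( \frac{k_i - 2}{k_i} \right)$ (5)
where $z_{0_i} = c_i/2$ is the offset (shift) position of the focus along the Z-axis from the origin $O_C$, and $r_i$ is the orthogonal distance to the axis of revolution/symmetry (i.e., the Z-axis) from a point $P_i$ on its surface.
In fact, the position of a valid point $P_i$ is constrained within the mirror's physical surface of reflection, which is radially limited by $r_{i,min}$ and $r_{i,max}$, such that:
$r_{i,min} \le r_i \le r_{i,max}$ (6)
and $r_{1,min} = r_{ref}$, $r_{1,max} = r_{sys}$, $r_{2,min} = r_{cam}$, $r_{2,max} = r_{sys}$. Observe that the radius of the system is the upper bound for both mirrors (Figure 3). In addition, the hyperboloids profiled by Equation (5) must obey the conical constraints $c_i > 0$ and $k_i > 2$, where $k_i$ is a constant (unit-less) parameter inversely related to the mirror's curvature or, more precisely, to the eccentricity $\varepsilon_c$ of the conic. In fact, $\varepsilon_c > 1$ for hyperbolas, yet a plane is produced when $\varepsilon_c \to \infty$ or $k = 2$. We devise $M_i$ (Equation (8)) as the set of all the reflection points $P_i$ with coordinates $(x_i, y_i, z_i)$ lying on the surface of the respective mirror $i$ within bounds. In our model, we describe both hyperboloidal mirrors, 1 and 2, with respect to the camera frame $[C]$, which acts as the common origin of the coordinate system. Therefore, the offsets become $z_{0_1} = c_1/2$ and $z_{0_2} = d - c_2/2$, and expanding Equation (5) with the respective index terms gives
$\left(z_1 - \frac{c_1}{2}\right)^2 - r_1^2 \left( \frac{k_1}{2} - 1 \right) = \frac{c_1^2}{4} \left( \frac{k_1 - 2}{k_1} \right)$ (11)
$\left(z_2 - \left(d - \frac{c_2}{2}\right)\right)^2 - r_2^2 \left( \frac{k_2}{2} - 1 \right) = \frac{c_2^2}{4} \left( \frac{k_2 - 2}{k_2} \right)$ (12)
Additionally, we define the function $f_{z_i}: r \to z_i$ to find the corresponding $z_i$ component from a given $r$ value as
$f_{z_i}(r) = z_{0_i} \pm \sqrt{r^2 \left( \frac{k_i}{2} - 1 \right) + \frac{c_i^2}{4} \left( \frac{k_i - 2}{k_i} \right)}$ if $r_{i,min} \le r \le r_{i,max}$, and None otherwise (13)
taking the positive root for mirror 1 (top) and the negative root for mirror 2 (bottom). The inverse relation $f_{r_i}: z \to r_i$ is
$f_{r_i}(z) = \pm \sqrt{\frac{(z - z_{0_i})^2 - \frac{c_i^2}{4} \left( \frac{k_i - 2}{k_i} \right)}{\frac{k_i}{2} - 1}}$ for heights $z$ on the mirror, and None otherwise (14)
so a valid input $z$ can be associated with both positive and negative solutions $r_i$.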
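The profile functions $f_{z_i}$ and $f_{r_i}$ can be sketched in a few lines. This is a minimal sketch that assumes the Baker and Nayar form of the hyperboloid, $(z - z_0)^2 - r^2(k/2 - 1) = \frac{c^2}{4}\frac{k-2}{k}$; the function names and the numeric values for mirror 1 are hypothetical. The last lines check the defining focal property of a hyperbola (a constant difference of distances to the two foci) as a sanity test of the assumed form.

```python
import math


def semi_axis_sq(c: float, k: float) -> float:
    """a^2 = c^2 (k - 2) / (4 k): squared vertex offset from the conic centre."""
    return c * c * (k - 2.0) / (4.0 * k)


def f_z(r, c, k, z0, r_min, r_max, upper=True):
    """Height z of the mirror at radius r, or None outside the radial bounds."""
    if not (r_min <= r <= r_max):
        return None
    root = math.sqrt(r * r * (k / 2.0 - 1.0) + semi_axis_sq(c, k))
    return z0 + root if upper else z0 - root


def f_r(z, c, k, z0):
    """Non-negative radius at height z, or None if z misses the hyperboloid."""
    num = (z - z0) ** 2 - semi_axis_sq(c, k)
    if num < 0.0:
        return None
    return math.sqrt(num / (k / 2.0 - 1.0))


# Hypothetical mirror-1 values (mm); its foci sit at z = 0 and z = c1.
c1, k1, r_ref, r_sys = 120.0, 6.0, 10.0, 37.0
z_rim = f_z(r_sys, c1, k1, z0=c1 / 2.0, r_min=r_ref, r_max=r_sys)
r_back = f_r(z_rim, c1, k1, z0=c1 / 2.0)

# Focal property: |P O_C| - |P F1| = 2a for every point P on this sheet.
two_a = math.hypot(r_sys, z_rim) - math.hypot(r_sys, z_rim - c1)
```

Returning `None` outside the bounds mirrors the "None otherwise" branch of the piecewise definitions.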

Rig Size
In the attempt to evaluate the overall system size, we consider the height and weight variables that result from the primary design parameters, $\boldsymbol{\theta}$.
First, the height of the system, $h_{sys}$, can be estimated from the functional relationships $f_{z_1}$ and $f_{z_2}$ defined in Equation (13), which provide the respective z-component values at the outermost point on each mirror's surface. More specifically, knowing $r_{sys}$, we get
$h_{sys} = z_{max} - z_{min}$ (15)
where $z_{max} = f_{z_1}(r_{sys})$ and $z_{min} = f_{z_2}(r_{sys})$. The rig's weight can be indicated by the total resulting mass of the main "tangible" components:
$m_{sys} = m_{cam} + m_{tub} + m_{mir}$
where the mass of the camera-lens combination is $m_{cam}$; the mass of the support tube, $m_{tub}$, can be estimated from its cylindrical volume $V_{tub}$ and material density $\rho_{tub}$; and the mass due to the mirrors is $m_{mir} = V_{mir} \rho_{mir}$. For computing the volume of the hyperboloidal shell, $V_i$ for mirror $i$, we apply a "ring method" of volume integration. By assuming all mirror material has the same wall thickness $\tau_m$, we acquire $V_i$ by integrating the horizontal cross-section area along the Z-axis. Each ring area depends on its outer and inner circumferences, which vary according to the radius $r|_z$ for a given height $z$. Equation (14) establishes the functional relation $r_i^+ = f_{r_i}(z)$, from which we only need the positive solution. We let $A$ be the function that computes the ring area of constant thickness $\tau_m$ for a variable outer radius $r_i$:
$A(r_i) = \pi \left[ r_i^2 - (r_i - \tau_m)^2 \right]$
We consider the definite integral of $A$ (Equation (18)) evaluated in the $z$ interval bounded by the mirror's height limits, which are correlated with its radial limits (Equation (6)) and can be obtained by evaluating the $f_{z_i}$ defined in Equation (13) at $r_{i,min}$ and $r_{i,max}$. Integrating Equation (18) then yields the shell volume of each hyperboloidal mirror. Finally, since the reflex mirror piece is just a solid cylinder of thickness $\tau_m$, its volume is simply $V_{ref} = \pi r_{ref}^2 \tau_m$.
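The ring method of volume integration can be checked numerically. The sketch below is an illustration under stated assumptions rather than the closed-form result: it takes the ring area to be an annulus whose wall thickness $\tau_m$ is measured horizontally, integrates it along $z$ with a midpoint rule, and uses the Baker and Nayar hyperboloid form for the radius. All names, the density, and the dimensions are hypothetical placeholders.

```python
import math


def ring_area(r_outer, tau):
    """Annulus area for outer radius r_outer and wall thickness tau."""
    r_inner = max(r_outer - tau, 0.0)
    return math.pi * (r_outer ** 2 - r_inner ** 2)


def shell_volume(r_of_z, z_lo, z_hi, tau, n=2000):
    """Midpoint-rule integral of ring_area(r(z)) over [z_lo, z_hi]."""
    dz = (z_hi - z_lo) / n
    return dz * sum(ring_area(r_of_z(z_lo + (j + 0.5) * dz), tau) for j in range(n))


def disc_volume(r_ref, tau):
    """Reflex mirror modelled as a solid cylinder of radius r_ref, height tau."""
    return math.pi * r_ref ** 2 * tau


# Hypothetical mirror-1 numbers in mm (not the optimized design).
c1, k1, r_ref, r_sys, tau_m = 120.0, 6.0, 10.0, 37.0, 2.0
a_sq = c1 ** 2 * (k1 - 2.0) / (4.0 * k1)


def z_of_r(r):
    """Assumed upper-sheet profile with conic centre at c1 / 2."""
    return c1 / 2.0 + math.sqrt(r * r * (k1 / 2.0 - 1.0) + a_sq)


def r_of_z(z):
    """Positive solution of the inverse relation r(z)."""
    return math.sqrt(max((z - c1 / 2.0) ** 2 - a_sq, 0.0) / (k1 / 2.0 - 1.0))


v_mirror1 = shell_volume(r_of_z, z_of_r(r_ref), z_of_r(r_sys), tau_m)  # mm^3
v_reflex = disc_volume(r_ref, tau_m)                                    # mm^3
rho_mir = 1.2e-3  # g/mm^3, a placeholder density
m_mir1 = (v_mirror1 + v_reflex) * rho_mir                               # grams
```

The midpoint rule is exact for ring areas that vary linearly with $z$ (a cone), which makes it easy to validate before applying it to the hyperboloid.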

Analytical Solutions to Projection (Forward)
Assuming a central catadioptric configuration for the mirrors and camera system (Section 2.2), we derive the closed-form solution to the imaging process (forward projection) for an observable point $P_w$, positioned in three-dimensional Euclidean space, $\mathbb{R}^3$, with respect to the reference frame, $[C]$, as ${}^{[C]}\mathbf{p}_w = [x_w, y_w, z_w]^T$. In addition, we assume all reference frames such as $[F_1]$ and $[F_2]$ have the same orientation as $[C]$.
For mathematical stability, we require that all projecting world points lie outside the mirror's volume. This condition, Equation (22), is expressed with $f_{r_i}$, defined by Equation (14), and $\rho_w$, which measures the horizontal range to $P_w$. $P_w$ is imaged at pixel position ${}^{[I]}\mathbf{m}_1$ after its reflection as point $P_1$ on the hyperboloidal surface of mirror 1 (Figure 4). On the other hand, the second image point's position, ${}^{[C]}\mathbf{m}_2$, due to reflection point $P_2$ on mirror 2, is obtained indirectly after an additional point $P_r$ is reflected at ${}^{[C]}\mathbf{p}_{ref}$ on the reflex mirror, represented via Equation (32).
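First, for $P_w$'s reflection point via mirror 1 at position vector ${}^{[C]}\mathbf{p}_1$, we use $\lambda_1$ as the parametrization term for the line passing through $F_1$ toward $P_w$ with direction ${}^{[F_1]}\mathbf{d}_1 = {}^{[C]}\mathbf{p}_w - {}^{[C]}\mathbf{f}_1 = [x_w, y_w, z_w - c_1]^T$. The position of any point $P_1$ on this line is given by:
${}^{[C]}\mathbf{p}_1 = {}^{[C]}\mathbf{f}_1 + \lambda_1 \, {}^{[F_1]}\mathbf{d}_1$ (23)
Substituting Equation (23) into Equation (11), we obtain a quadratic in $\lambda_1$, whose solution in the direction of $P_w$ turns out to be
$\lambda_1 = \frac{c_1}{\|\mathbf{d}_1\| \sqrt{k_1 (k_1 - 2)} - k_1 (z_w - c_1)}$
where $\|\mathbf{d}_1\|$ is the Euclidean norm between $P_w$ and mirror 1's focus, $F_1$. In practice, we represent the reflection point's position ${}^{[C]}\mathbf{p}_1$ as a matrix-vector multiplication between a $3 \times 4$ transformation matrix $\mathbf{T}$ and $P_w$ in homogeneous coordinates:
${}^{[C]}\mathbf{p}_1 = \mathbf{T} \, {}^{[C]}\mathbf{p}_{w,h}$, with $\mathbf{T} = \left[ \lambda_1 \mathbf{I}_3 \mid (1 - \lambda_1) \, {}^{[C]}\mathbf{f}_1 \right]$
Note that ${}^{[C]}\mathbf{p}_1$'s elevation angle, $\theta_1$, must be bounded as $\theta_{1,min} \le \theta_1 \le \theta_{1,max}$, where $\theta_{1,min}$ and $\theta_{1,max}$ are the angular elevation limits for the real reflective area of the hyperboloid.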
Finally, the reflection point P 1 with position [C] p 1 can now be perspectively projected as a pixel point located at [I] T on the image. In fact, the entire imaging process of P w via mirror 1 can be expressed in homogeneous coordinates as: where the scalar is the perspective normalizer that maps the principal ray passing through p 1 onto a point [C] q 1 = [x q 1 , y q 1 , 1] T on the normalized projection planeπ img 1 . The traditional 3 × 3 intrinsic matrix of the camera's pinhole model is  Figure 4 illustrates the projection point f [C] q 1 on the respective image plane π img 1 . Similarly, we provide the analytical solution for the forward projection of P w via mirror 2 by first considering the position of reflection point P 2 : where with direction vector's norm For completeness, note that the physical projection via mirror 2 is incident to the reflex mirror at where λ re f = d 2(d−z 2 ) according to Equation (2) in the theoretical model. Ultimately, ignoring any astigmatism and chromatic aberrations introduced by the reflex mirror, and because the same (and only) real camera with K c is used for imaging, we obtain the projected pixel position [I] where is the perspective normalizer to find [C] q 2 on the normalized projection plane,π img 2 . Due to planar mirroring via the reflex mirror, C [C] K re f is used to change the coordinates of P 2 from [C] onto the virtual camera frame, C , located at [C] f 2v . Hence, where the 3 × 1 unit normal vector of the reflex mirror plane, [C]n re f given in Equation (1), is mapped into its corresponding 3 × 3 diagonal matrix Dn re f , via the relationship: It is convenient to define the forward projection functions f ϕ 1 ( [C] p) and f ϕ 2 ( [C] p) for a 3D point P whose position vector is known with respect to [C] and which is situated within the vertical field of view α i of mirror i (for i ∈ {1, 2}) indicated in Figure 5. 
Each function returns the pixel position ${}^{[I]}\mathbf{m}_i$ when the conditions of Equations (37) and (22) hold for ${}^{[C]}\mathbf{p}$, and None otherwise (Equation (36)). In fact, ${}^{[I]}\mathbf{m}_i$ is considered valid if it is located within the imaged radial bounds (Equation (37)), where the frame of reference $[I_{C_i}]$ implies that its origin is the image center (Figure 7). Therefore, the magnitude (norm) of any position ${}^{[I_{C_i}]}\mathbf{m}$ in pixel space $[I_{C_i}]$ can be measured as its Euclidean distance from that center. In particular, $\|{}^{[I_{C_i}]}\mathbf{m}_{r_{i,lim}}\|$ is the image radius obtained from the projection corresponding to a particular point coincident with the line of sight of the radial limit $r_{i,lim}$, it being either $r_{sys}$, $r_{ref}$, or $r_{cam}$ as indicated by Equation (6).
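The forward projection through mirror 1 can be sketched as follows. This is a sketch under assumptions, not the authors' implementation: it assumes the Baker and Nayar hyperboloid with foci at $O_C$ and $F_1 = (0, 0, c_1)$, intersects the ray from $F_1$ toward $P_w$ with the mirror in closed form, and then applies a standard pinhole model. The function names, intrinsic values (`fx`, `fy`, `s`, `uc`, `vc`), and mirror parameters are hypothetical, and the radial and elevation bounds are not enforced here.

```python
import math


def reflect_on_mirror1(p_w, c1, k1):
    """Reflection point P1 on mirror 1 for world point p_w (camera frame [C])."""
    dx, dy, dz = p_w[0], p_w[1], p_w[2] - c1   # direction from F1 toward Pw
    norm = math.sqrt(dx * dx + dy * dy + dz * dz)
    denom = norm * math.sqrt(k1 * (k1 - 2.0)) - k1 * dz
    if denom <= 0.0:
        return None                             # ray never meets this sheet
    lam = c1 / denom                            # line parameter lambda_1
    return (lam * dx, lam * dy, c1 + lam * dz)


def project_pinhole(p, fx, fy, s, uc, vc):
    """Perspective projection of p = (x, y, z) with an upper-triangular K."""
    xq, yq = p[0] / p[2], p[1] / p[2]           # normalized plane z = 1
    return (fx * xq + s * yq + uc, fy * yq + vc)


# Hypothetical parameters (mm and pixels), for illustration only.
c1, k1 = 120.0, 6.0
fx, fy, s, uc, vc = 800.0, 800.0, 0.0, 640.0, 360.0

p_w = (1000.0, 0.0, c1)                         # a point level with F1
p_1 = reflect_on_mirror1(p_w, c1, k1)
m_1 = project_pinhole(p_1, fx, fy, s, uc, vc)
```

For a world point level with $F_1$, the reflection lands at radius $c_1 / \sqrt{k_1 (k_1 - 2)}$ and height $c_1$, which gives a quick analytic check of the closed form.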

Analytical Solutions to Back Projection
The back projection procedure establishes the relationship between the 2D position of a pixel and its corresponding 3D projective direction vector v i toward the observed point P w in the world. Initially, the pixel point [I] m 1 (imaged via mirror 1) is mapped as Q 1 onto the normalized projection T by applying the inverse transformation of the camera intrinsic matrix Equation (28) as follows: For simplicity, we assume no distortion parameters exist, so we can proceed with the lifting step along the principal ray that passes through three points: the camera's pinhole O C , point Q 1 on the projection plane, and the reflection point P 1 (Figure 4). The vector form of this line equation can be written as: By substituting Equation (40) into Equation (11), we solve for the parameter t 1 , to get for [C] p 1,h as the homogeneous form of Equation (40). In fact, [F1] v 1 provides the back-projected angles (elevation θ 1 , azimuth ψ 1 ) from focus F 1 toward [C] P w : T uses the same intrinsic matrix K c , we can safely back-project pixel [I] m 2 to Q 2v on the normalized projection planeπ img 2 as follows: where the inverse transformation of the camera intrinsic matrix K −1 c is given by Equation (28). Since the reflection matrix K re f defined in Equation (34) is bidirectional due to the symmetric position of the reflex mirror about [C] and C , we can find the desired position of [C] q 2v with respect to [C]: which is equivalent to [C] In Figure 4, we can see the principal ray that passes through the virtual camera's pinhole O C and the reflection point P 2 , so this line equation can be written as: Solving for t 2 from Equations (47) and (12), we get where [C] q 2 = x 2 q 2 + y 2 q 2 + 1 is the distance between the normalized projection point Q 2 and the camera O C while considering Equation (46). Beware that the newly found location of P 2 is given with respect to the real camera frame, [C].
Again, we obtain the back-projection ray in order to indicate the direction leaving from the primary focus F 2 toward P w through P 2 . Here, the corresponding elevation and azimuth angles are respectively given by is the magnitude of the direction vector from its reflection point P 2 .
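The lifting step for mirror 1 can be sketched in the same spirit. This minimal sketch assumes a zero-skew pinhole camera without distortion and the Baker and Nayar hyperboloid with foci at $O_C$ and $F_1 = (0, 0, c_1)$; it intersects the principal ray through the normalized point with the mirror and returns the elevation and azimuth of the ray leaving $F_1$. The function name and all numeric values are hypothetical, and the radial bounds on the pixel are not checked.

```python
import math


def lift_pixel_mirror1(m, fx, fy, uc, vc, c1, k1):
    """Return (elevation, azimuth) in radians of the ray F1 -> Pw for pixel m."""
    # Inverse of a zero-skew intrinsic matrix: pixel -> normalized plane z = 1.
    xq, yq = (m[0] - uc) / fx, (m[1] - vc) / fy
    q_norm = math.sqrt(xq * xq + yq * yq + 1.0)
    denom = k1 - q_norm * math.sqrt(k1 * (k1 - 2.0))
    if denom <= 0.0:
        return None                        # principal ray misses the mirror sheet
    t = c1 / denom                         # reflection point is P1 = t * q
    vx, vy, vz = t * xq, t * yq, t - c1    # direction from F1 through P1
    elevation = math.atan2(vz, math.hypot(vx, vy))
    azimuth = math.atan2(vy, vx) % (2.0 * math.pi)
    return elevation, azimuth


# Hypothetical parameters (mm and pixels), for illustration only.
c1, k1 = 120.0, 6.0
fx, fy, uc, vc = 800.0, 800.0, 640.0, 360.0

# Pixel whose ray leaves F1 horizontally (elevation 0) along +X (azimuth 0).
m_level = (uc + fx / math.sqrt(k1 * (k1 - 2.0)), vc)
angles_level = lift_pixel_mirror1(m_level, fx, fy, uc, vc, c1, k1)
```

The azimuth is wrapped into $[0, 2\pi)$ so that it matches the interval used for the horizontal field of view.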
Like done for the (forward) projection, it is convenient to define the back-projection functions $f_{\beta_1}$ and $f_{\beta_2}$ for lifting a 2D pixel point ${}^{[I]}\mathbf{m}$, within radial bounds validated by Equation (37), to its angular components of Equations (50) and (51), such that $f_{\beta_i}: \mathbb{R}^2 \to \mathbb{R}^2$ (Equation (52)).

Field-of-View
The horizontal FOV is clearly 360° for both mirrors. In other words, azimuths $\psi$ can be measured in the interval $[0, 2\pi)$ rad. As discussed previously, there exists a positive correlation between the vertical field of view (vFOV) angle $\alpha_i$ of mirror $i$ and its profile parameter $k_i$, such that $\alpha_i \to 180°$ as $k_i \to \infty$ (see Figure 9). As demonstrated in Figure 5, $\alpha_i$ is physically bounded by its corresponding elevation angles, $\theta_{i,max}$ and $\theta_{i,min}$. Both vFOV angles, $\alpha_1$ and $\alpha_2$, are computed from their elevation limits as $\alpha_i = \theta_{i,max} - \theta_{i,min}$. The overall vFOV of the system, $\alpha_{sys}$, is also given from these elevation limits (Equation (54)).
Figure 6. A cross section of the SROI (shaded area) formed by the intersection of view rays for the limiting elevations $\theta_{1/2,min/max}$. The nearest stereo (ns) points are labeled $P_{ns_{high}}$, $P_{ns_{mid}}$ and $P_{ns_{low}}$ since they are the vertices of the hull that near-bounds the set of usable points for depth computation from triangulation (Section 5.2). See Table 3 for the proposed sensor's values.
Figure 6 highlights the so-called common vFOV angle, $\alpha_{SROI}$, for the stereo region of interest (SROI) where the same point can be seen from both mirrors so point correspondences can be found (Section 5). In our model, $\alpha_{SROI}$ can be decided from the value of the three prevailing elevation angles ($\theta_{1,max}$, $\theta_{1,min}$, and $\theta_{2,min}$). The shaded area in Figure 6 illustrates the SROI that is far-bounded by the set of triangulated points found at the maximum range due to minimum disparity $\Delta m_{12} = 1$ px in the discrete case (refer to Figure 17). These far-bound points are computed with the functions $f_{\beta_i}$ and $f_\Delta$, provided in Equations (52) and (89).
The SROI is near-bounded (to the Z-axis of radial symmetry) by its vertices $P_{ns_{high}}$, $P_{ns_{mid}}$ and $P_{ns_{low}}$, which result from ray-intersection cases computed with the intersection function $f_\Delta$, implemented for direction rays (or angles) as defined in the Triangulation Section 5.2.
By assuming radial symmetry, the camera's field of view $\alpha_{cam}$ should allow for a complete view of the mirror surface at its outermost diameter of $2 r_{sys}$ according to Equation (6). As depicted in Figure 6, $\alpha_{cam}$ is upper-bounded by the camera hole radius $r_{cam}$ selected according to Equation (78). An inequality constraint on $\alpha_{cam}$ emerges, in which the respective functions $f_{z_i}$ are defined in Equation (13). In addition, we indicate the corresponding radial heights $h_{I_1}$ and $h_{I_2}$ of the SROI, so we can determine the imaging ratio $\chi_{I_{1:2}}$ between them. For the optimal parameter values listed in Table 1, we find that $\chi_{I_{1:2}} \approx 2$.
Our specific viewing requirements when mounting the omnidirectional sensor along the central axis of the quadrotor ensure that objects located at 15 cm under the rig's base and at 1 meter away (from the central axis) can be viewed. Thus, angles θ 1,min and θ 2,min should only be large enough as to avoid occlusions from the MAV's propellers ( Figure 5) and to produce inner and outer ring images at a useful ratio (Figure 7).
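The relation between elevation limits and the field-of-view angles can be illustrated with simple interval arithmetic. The sketch below is a far-field simplification that treats both viewpoints as if they coincided, so the common vFOV is the overlap of the two elevation intervals and the system vFOV is their overall span; this is an assumption made for illustration and may differ from the exact definitions in the numbered equations. The function names and the elevation limits (in degrees) are hypothetical.

```python
def vfov(limits):
    """vFOV of one mirror from its (theta_min, theta_max) elevation limits."""
    theta_min, theta_max = limits
    return theta_max - theta_min


def system_vfov(limits_1, limits_2):
    """Overall span covered by either mirror (far-field approximation)."""
    return max(limits_1[1], limits_2[1]) - min(limits_1[0], limits_2[0])


def common_vfov(limits_1, limits_2):
    """Overlap of both elevation intervals, i.e. the stereo region of interest."""
    low = max(limits_1[0], limits_2[0])
    high = min(limits_1[1], limits_2[1])
    return max(high - low, 0.0)


# Hypothetical elevation limits in degrees: (theta_min, theta_max).
mirror_1 = (-20.0, 15.0)
mirror_2 = (-10.0, 30.0)

alpha_1 = vfov(mirror_1)                        # 35.0
alpha_2 = vfov(mirror_2)                        # 40.0
alpha_sys = system_vfov(mirror_1, mirror_2)     # 50.0
alpha_sroi = common_vfov(mirror_1, mirror_2)    # 25.0
```

In this example, the overlap is bounded above by $\theta_{1,max}$ and below by the larger of the two minimum elevations, which matches the three prevailing angles named in the text.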

Spatial Resolution
The resolution of the images acquired by our system is not space invariant. In fact, an omnidirectional camera producing spatial resolution-invariant images can only be obtained through a non-analytical function of the mirror profile, as shown in [31]. In this section, we study the effect our design has on its spatial resolution, which depends on position parameters like $d$ and $c_i$ introduced in Section 2.1 as well as directly on the characteristics (e.g., focal length $f$) of the camera obtaining the image. Let $\eta_{cam}$ be the spatial resolution for a conventional perspective camera as defined by Baker and Nayar in [25,29]. It relates the infinitesimal solid angle $d\omega_i$ (usually measured in steradians) that is directed toward a point $P_i$ at an angle $\theta_{i,pix}$ (formed with the optical axis $Z_C$) to the infinitesimal element of image area $dA_{pix}$ that $d\omega_i$ subtends (as shown in Figure 8). Accordingly, we have:
$\eta_{cam} = \frac{dA_{pix}}{d\omega_i} = \frac{f^2}{\cos^3 \theta_{i,pix}}$ (59)
whose value decreases as $\theta_{pix} \to 0$, so the resolution on the sensor plane increases continuously with the distance from the optical center imaged at ${}^{[I]}\mathbf{m}_c$. For ease of visualization, we plot only the $u$ pixel coordinates corresponding to the 2D spatial resolution $\eta_{2D}$, which is obtained by projecting the solid angle $\Omega$ onto a planar angle $\theta_\Omega$ (the apex angle in 2D of the solid cone of view). This yields $\theta_\Omega = 2 \arccos(1 - \Omega / 2\pi)$, and we reduce the image area into its circular diameter with $2\sqrt{dA/\pi}$. Generally, our conversion from 3D spatial resolution $\eta$ in m²/sr units to 2D follows Equation (60), where $\theta_{\Omega = 1\,\mathrm{sr}} \approx 1.14390752211$ rad.
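The two reductions used for the 2D visualization are easy to verify numerically. The sketch below converts a solid angle into the apex angle of the equivalent circular cone and an area into the diameter of the equivalent circle, then reproduces the quoted value for a solid angle of 1 sr. The function names are hypothetical.

```python
import math


def planar_angle_from_solid_angle(omega_sr: float) -> float:
    """Apex angle (rad) of a circular cone that subtends omega_sr steradians."""
    return 2.0 * math.acos(1.0 - omega_sr / (2.0 * math.pi))


def diameter_from_area(area: float) -> float:
    """Diameter of the circle whose area equals `area`."""
    return 2.0 * math.sqrt(area / math.pi)


theta_1sr = planar_angle_from_solid_angle(1.0)            # about 1.1439 rad
theta_hemi = planar_angle_from_solid_angle(2.0 * math.pi)  # hemisphere: pi rad
```

The hemisphere case ($\Omega = 2\pi$ sr) returns an apex angle of $\pi$ rad, a convenient sanity check on the formula.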
More specifically, Equation (59) is manipulated to provide $\eta_{i,cam}$ (Equation (61)) as the indicator of spatial resolution toward any specific point on the mirror, ${}^{[C]}P_i \in M_i$ according to Equation (8). It depends on the radial length $r_i$ defined in Equation (6) and its associated $z_i$ coordinate, on the camera's focal length $f$, and on the design parameters $d$ and $c_i$, which relate to the position of the mirror focal points $F_i$ with respect to the camera frame $[C]$.
Thus, for a conventional perspective camera, η i,cam grows as θ i,pix → π/2 due to the foreshortening effect that stretches the image representation around the sensor plane's periphery where spatial information gets collected onto a larger number of pixels. Therefore, image areas farther from the optical axis are considered to have higher spatial resolutions.
Baker and Nayar also defined the resolution, $\eta_i$, of a catadioptric sensor in order to quantify the view of the world or $d\nu_i$, an infinitesimal element of the solid angle subtended by the mirror's effective viewpoint $F_i$, which is consequently imaged onto a pixel area $dA_{pix}$. Here, we again provide the resolution according to our model for our mirror-perspective camera configuration, where $O_C$ is the origin of coordinates as shown in Figure 8 and $\eta_{i,cam}$ is given in Equation (61). As demonstrated by the plot of Figure 12 in Section 4.2.2, $\eta_i$ grows towards the periphery of each mirror (the equatorial region). This aspect of our sensor design is very important because it indicates that the common field of view, $\alpha_{SROI}$, where stereo vision is employed (Section 5), is imaged at a relatively higher resolution than the unused polar regions closer to the optical axis (the $Z_C$ axis).
If we modify $\eta_i$ by substituting $r_i$ with its equivalent $f_{r_i}(z_i)$ function defined in Equation (14), using mirror 1 for example, the resulting expression indicates how the resolution $\eta_i$ for a reflection point $P_i$ increases as $k_i \to \infty$ (Figure 11). Conversely, the smaller the $k_i$ parameter gets (related to eccentricity as discussed in Section 2.2), the flatter the mirror becomes, so its resolution more closely resembles that of the perspective camera alone. Mathematically, $\lim_{k_i \to 2} \eta_i = \eta_{i,cam}$. As shown in Figure 9, a smaller $k_i$ would require a wider radius $r_{sys}$ in order to achieve the same omnidirectional vertical field of view, $\alpha_{sys}$. Even worse, in order to image such a wider reflector, either the camera's field of view, $\alpha_{cam}$, would have to increase (by decreasing the focal length $f$ and perhaps requiring a larger camera hole $r_{cam}$ and sensor size), or the distance $c_i$ between the effective pinhole and the viewpoint would have to increase accordingly. Another consequence is the effect on the baseline $b$, which must change in order to maintain the same vertical field of view (Figure 10). As a result, the depth resolution of the stereo system would suffer as well.
Figure 9. The effect that parameter $k_i$ (showing mirror 1 only) has over the system radius $r_{sys}$ for various values of the vertical field of view angle $\alpha_1$. In order to maintain a vertical field of view $\alpha_i$ that is bounded by $z_{max}|_{r_{sys}}$, the value of $r_{sys}$ must change accordingly. Inherently, the system's height, $h_{sys}$, and its mass, $m_{sys}$, are also affected by $k_i$ (see Section 2.3).
Figure 10. The effect that parameter $k_1$ has over the omnistereo system's baseline $b$ for several common FOV angles ($\alpha_{SROI}$) and a fixed camera with $\alpha_{cam}$. An inverse relationship exists between $k$ and $b$ as plotted here (using a logarithmic scale for the vertical axis).
Intuitively, the flatter the mirror gets (k → 2), the farther F 1 must be translated in order to fit within the camera's view, α SROI , causing b to increase.

Parameter Optimization and Prototyping
The nonlinear nature of this system makes it very difficult to balance its desirable performance aspects. The optimal vector of design parameters, $\boldsymbol{\theta}^*$, can be found by posing a constrained maximization problem for the objective function $f_b$, which measures the baseline according to Equation (3). The optimization problem is subject to the set of constraints $C$, which we enumerate in Section 4.1. Formally,

$$\boldsymbol{\theta}^* = \underset{\boldsymbol{\theta} \in \Theta}{\arg\max}\; f_b(\boldsymbol{\theta}) \quad \text{subject to } C,$$

where $\Theta \subseteq \mathbb{R}^6$ is the 6-dimensional solution space for $\boldsymbol{\theta} \in \mathbb{R}^6$ given in Equation (4) as $\boldsymbol{\theta} = [c_1, c_2, k_1, k_2, d, r_{sys}]$.

Optimization Constraints
We now discuss the constraints that the proposed omnistereo sensor is subject to. Overall, we mainly take the following into account: (a) geometrical constraints, including the SVP and reflex constraints described by Equations (11), (12) and (2); (b) physical constraints, namely the rig's dimensions, which include the mirrors' radii as well as by-product parameters such as the system height $h_{sys}$ and mass $m_{sys}$; and (c) performance constraints, namely the spatial resolution and range from triangulation determined by parameters $k_1$, $k_2$, and $c_1$, and the desired viewing angles for an optimal SROI field of view, $\alpha_{SROI}$. Following the design model described throughout Section 2, we now list the pertinent linear and nonlinear constraints that compose the set $C$. We partition the constraints into a linear subset $C_L$ and a non-linear subset $C_{NL}$, so $C = C_L \cup C_{NL}$. Within each subset, we generalize equality constraints as functions $h: \mathbb{R}^6 \to \mathbb{R}$ that obey

$$h(\boldsymbol{\theta}) = 0 \quad (66)$$

whereas inequality functions $g: \mathbb{R}^6 \to \mathbb{R}$ satisfy

$$g(\boldsymbol{\theta}) \leq 0 \quad (67)$$
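As a minimal sketch of how a candidate parameter vector could be checked against constraints written in this $h(\boldsymbol{\theta}) = 0$ and $g(\boldsymbol{\theta}) \leq 0$ form, consider the Python fragment below. The helper names (`is_feasible`, `g_focus_below_camera`), the tolerance, and the numeric values are hypothetical and chosen only for illustration; they are not the optimal values of Table 1. The single example constraint encodes the requirement, stated among the linear constraints, that $c_2$ must be larger than $d$.

```python
from typing import Callable, Sequence

# Parameter vector layout assumed from Equation (4): (c1, c2, k1, k2, d, r_sys)
Theta = Sequence[float]
Constraint = Callable[[Theta], float]


def is_feasible(theta: Theta,
                equalities: Sequence[Constraint],
                inequalities: Sequence[Constraint],
                tol: float = 1e-9) -> bool:
    """Return True if every h(theta) = 0 and every g(theta) <= 0 holds within tol."""
    equalities_hold = all(abs(h(theta)) <= tol for h in equalities)
    inequalities_hold = all(g(theta) <= tol for g in inequalities)
    return equalities_hold and inequalities_hold


def g_focus_below_camera(theta: Theta) -> float:
    """Hypothetical encoding of 'c2 must be larger than d' as g(theta) = d - c2 <= 0."""
    c1, c2, k1, k2, d, r_sys = theta
    return d - c2


# Illustrative values only (millimetres for lengths); not taken from the paper's tables.
theta_ok = (120.0, 110.0, 6.0, 9.0, 100.0, 37.0)
theta_bad = (120.0, 90.0, 6.0, 9.0, 100.0, 37.0)

print(is_feasible(theta_ok, [], [g_focus_below_camera]))
print(is_feasible(theta_bad, [], [g_focus_below_camera]))
```

Note that the non-strict form $g \leq 0$ admits the boundary case $c_2 = d$; a strict requirement would need a small positive margin subtracted inside the constraint function.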

Linear Constraints
We have only set up linear inequalities for the constraints in $C_L$. Specifically, we require the following. First, in order to set the position of $F_2$ below the origin $O_C$ of the pinhole camera frame [C], the focal distance $c_2$ of mirror 2 must be larger than $d$ (the distance between $O_C$ and $F_{2v}$). A second inequality pertains to our rig dimensions: it assigns a greater curvature to mirror 2's profile (located at the bottom), so its view is directed toward the equatorial region rather than up. Complementarily, this constraint flattens mirror 1's profile, so it can possess a greater view of the ground. This curvature inequality allows the SROI to be bounded by a wider vertical field of view when the sensor must be mounted above the MAV's propellers, as depicted in Figure 5.

Non-Linear Constraints
For the non-linear design constraints, we establish the following inequalities. The AscTec Pelican quadrotor has a maximum payload of 650 g (according to the manufacturer specifications [28]). Therefore, the system mass computed via Equation (16) must respect this limit. The system height is bounded in a similar way; for example, we set $h_{sys,max} = 150$ mm for the 37 mm-radius rig.

$g_6$: The origin of coordinates for the camera frame is set at its viewpoint, $O_C$. In order to fit the camera enclosure under mirror 2, it is realistic to position the focus $F_2$ on the vertical transverse axis more than 5 mm away from $O_C$, where $z_{0_2}$ is defined in Equation (10), and $a_2$ pertains to Equation (5).

Next, we determine the bounds for the limiting angles that partake in the computation of the system's vertical field of view $\alpha_{sys}$, which is based on Equation (54). Our application has specific viewing requirements that can be achieved with the following conditions. Let $\Lambda_{1,max} = 14^\circ$ be an acceptable upper bound for angle $\theta_{1,max}$.

$g_8$: Because we desire a larger view towards the ground from mirror 1, we empirically set $\Lambda_{1,min} = -25^\circ$ as a lower bound for the minimum elevation $\theta_{1,min}$.

$g_9$: In order to avoid occlusions with the MAV's propellers while being capable of imaging objects located about 5 cm under the rig's base and 20 cm away (horizontally) from the central axis, we limit mirror 2's lowest angle by a lower bound $\Lambda_{2,min} = -14^\circ$.

Finally, we restrict the radius of the system, $r_{sys}$, to be identical for both hyperboloids by satisfying an equality condition. With functions $f_{r_1}$ and $f_{r_2}$ defined in Equation (14), we equate the two radii at the respective maximum heights, where $z_{i,max} \leftarrow f_{z_i}(r_{sys})$ is implied using Equation (13). Thus, the entire function composition for this equality becomes

$$f_{r_1}\big(f_{z_1}(r_{sys})\big) = f_{r_2}\big(f_{z_2}(r_{sys})\big) \quad (77)$$

Optimal Results
Applying the aforementioned constraints (Section 4.1) and using an iterative nonlinear optimization method, such as one of those surveyed in [32], a bounded solution vector $\boldsymbol{\theta}^*$ converges to the values shown in Table 1 for two rig sizes. Table 2 contains the by-product parameters corresponding to the dimensions listed in Table 1.

As Figure 3 illustrates, a realistic dimension for the radius of the camera hole, $r_{cam}$, must consider the maximum value between a physical micro-lens radius ($r_{lens}$) and the radius $r_{\alpha_{cam}}|_{r_{sys}}$ required for an unoccluded field of view of the camera, $\alpha_{cam}$, imaging the complete surface of mirror 1. Practically,

$$r_{cam} = \max\left(r_{lens},\; r_{\alpha_{cam}}|_{r_{sys}}\right) \quad (78)$$

For both rigs, the expected vertical fields of view are $\alpha_{sys} = 75^\circ - (-21^\circ) \approx 96^\circ$ according to Equation (54), and $\alpha_{SROI} = 14^\circ - (-14^\circ) \approx 28^\circ$ using Equation (55). Note that $\theta_{2,max}$ may actually be limited by the camera hole radius, so in reality $\theta_{cam} \approx 59^\circ$ and $\alpha_{sys} \approx 80^\circ$. For the big rig, Table 3 shows the nearest vertices of the SROI that result from these angles (Figure 6).

Finally, we study the effect that parameter $k_i$ has on the system radius $r_{sys}$ (Figure 9), the omnistereo baseline $b$ (Figure 10), and the spatial resolution (Figures 11 and 12). Figure 9 addresses the relation between $k_i$ and radius $r_{sys}$ (recall the rig size specified in Section 2.3). In Figure 11, it can be seen that for the same $r_{sys}$, realistic values for $k_1$ fall in the range $3 < k_1 < 13$, and the vertical field of view $\alpha_1 \to 0$ as $k \to 2$, which is expected according to the SVP property specified in Section 2.2. In fact, the left part of Figure 11 also demonstrates the necessary $r_{sys}$ to maintain $\alpha_{SROI} \approx 28^\circ$ for various values of $k_i$.

Figure 11. Comparison of $k_i$ values and their effect on spatial resolution $\eta_i$ for $i = \{1, 2\}$. For the big rig, the optimal focal dimensions $c_1$ and $c_2$ (from Table 1) were used, as well as the angular span of the common vertical FOV, $\alpha_{SROI} \approx 28^\circ$. Although the resolution $\eta_i^{(Opt.)}$ for the optimal values of $k_i$ could be improved by employing smaller $k$ values (lower curvature profiles indicated on the left plot of the figure), this would in turn increase the system radius, $r_{sys}$, in order to maintain $\alpha_i$ (Figure 9). As expected, the plot on the right helps us appreciate how the spatial resolutions, $\eta_i$, increase towards the equatorial regions ($\theta_1 \to \theta_{SROI,max}$ and $\theta_2 \to \theta_{SROI,min}$).

Figure 10 shows the inverse relationship between values of $k_1$ and the baseline, $b$, as we attempt to fit the view of a wider or narrower mirror profile (due to $k_1$) within the constant camera field of view, $\alpha_{cam}$. In order to make a fair comparison, for each candidate $k_1$ we find its new focal length $c_1$ while solving for the new $r_{sys}$ and $z_{max}$. Provided with a function such that $c_1 \leftarrow f_{c_1}(k_1)$, we perform the analysis for the given $\alpha_{SROI}$ and $\alpha_{cam}$ shown in Figure 10. Given the baseline function $f_b$ defined in Equation (64), the inverse relationship plotted in Figure 10 follows. Notice that $k_2$, $c_2$ and $d$ are kept constant throughout this last analysis, and we ignore possible occlusions from the reflex mirror fixed at $d/2$.
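The field-of-view figures quoted in this section are simple differences of limiting elevation angles. The short Python sketch below reproduces that arithmetic; the function name `vertical_fov_deg` is hypothetical, and the angle values are the ones stated in the text.

```python
def vertical_fov_deg(theta_max_deg: float, theta_min_deg: float) -> float:
    """Vertical field of view as the span between two limiting elevation angles (degrees)."""
    if theta_max_deg < theta_min_deg:
        raise ValueError("theta_max_deg must not be smaller than theta_min_deg")
    return theta_max_deg - theta_min_deg


# Limiting angles quoted in the text for both rigs.
alpha_sys = vertical_fov_deg(75.0, -21.0)        # whole system, ideal case
alpha_sroi = vertical_fov_deg(14.0, -14.0)       # common stereo region (SROI)
alpha_sys_real = vertical_fov_deg(59.0, -21.0)   # upper angle limited by the camera hole

print(alpha_sys, alpha_sroi, alpha_sys_real)
```

The third value shows how the camera hole, by capping the upper elevation angle at roughly 59 degrees, reduces the usable system field of view to about 80 degrees.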

Spatial Resolution Optimality
In this section, we compare the sensor's spatial resolution, $\eta_i$, defined in Section 3.4 for the optimal parameters listed in Table 1 (for the big rig only). In Figure 12, we verify how both resolutions $\eta_1$ and $\eta_2$ increase towards the equatorial region according to the spatial resolution theory presented in [29]. Indeed, the increase in spatial resolution within the SROI that covers the equatorial region (as indicated in Figure 6) justifies our model's coaxial configuration intended for omnistereo applications.

Figure 12. Using Equation (60), we plot the 2D version of the spatial resolution of our proposed omnistereo catadioptric sensor (37 mm-radius rig). Both resolutions $\eta_1$ and $\eta_2$ increase towards the equatorial region where they are physically limited by $r_{sys}$. This verifies the spatial resolution theory given in [29], and it justifies our coaxial configuration useful for omnistereo sensing within the SROI indicated in Figure 6.
In Figure 11, we compare the effect on $\eta_i$ for various mirror profiles, which depend directly on $k_i$. We illustrate the change in curvature due to parameters $k_1$ and $k_2$ and also show (in the legend) the respective $r_{sys}$ achieving a common vFOV of $\alpha_{SROI} \approx 28^\circ$, as for the optimal parameters of the big rig. From this plot, we appreciate the compromise achieved by the optimal parameters, $k_1^{(Opt.)} = 5.7$ and $k_2^{(Opt.)} = 9.7$, between a realistic system size (due to $r_{sys}$) and a suitable range of spatial resolutions, $\eta_i$, within the SROI.

Prototypes
We validate our design with both synthetic and real-life models.

Synthetic Prototype (Simulation)
After converging to an optimal solution $\boldsymbol{\theta}^*$, we employ these parameters (Table 1) to describe synthetic models using POV-Ray, an open-source ray-tracer. We render 3D scenes via the camera of the synthetic omnistereo sensor, like the example shown in Figure 2b. The simulation stage plays two important roles in our investigation: (1) to acquire ground-truth 3D-scene information in order to evaluate the range computed by the omnistereo system (as explained in Section 5); and (2) to provide a nearly exact geometrical representation of the model by discounting some real-life computer vision artifacts such as assembly misalignments, glare from the support tube (motivating the use of standoffs on the real prototype), as well as the camera's shallow depth-of-field. All of these artifacts can affect the quality of the real-life results shown in Section 6.

Real-Life Prototypes
Figure 13. (a) Omnistereo rig using 37 mm-radius mirrors mounted on an AscTec Pelican quadrotor. (b) Omnidirectional image captured by the real-life prototype in Figure 13a.

We have also produced two physical prototypes that can be installed on the Pelican quadrotor (made by Ascending Technologies [28]). Figure 13a shows the rig constructed with hyperboloidal mirrors of $r_{sys} \approx 37$ mm and a Logitech® HD Pro Webcam C910 camera capable of capturing 2592 × 1944 pixel images at 15 to 20 FPS. We decided to skip the use of the acrylic glass tube to separate the mirrors at the specified $h_{sys}$ distance, and instead we constructed a lighter 3-standoff mount in order to avoid glare and cross-reflections. This support was designed in 3D-CAD and printed for assembly. The three areas of occlusion due to the 3 mm-wide standoffs are non-invasive for the purpose of omnidirectional sensing and can be ignored with simple masks during image processing. In fact, we stamped fiducial markers on the vertical standoffs to aid with panorama generation (Section 5.1) and future calibration methods. To image the entire surface of mirror 1, we require a camera with a (minimum) field of view of $\alpha_{cam} > 31^\circ$, which is achieved by $r_{\alpha_{cam}} > 1.4$ mm. In practice, as noted by Equation (78), microlenses measure around $r_{lens} \approx 7$ mm. Therefore, we set $r_{cam} > 7$ mm as a safe specification to fit a standard microlens through the opening of mirror 2, as shown in Figure 3.
Recall that $m_{sys}$ is limited by the maximum 650 g payload that the AscTec Pelican quadrotor is capable of flying with (according to the manufacturer specifications [28]). The camera with lens weighs approximately 25 g. A cylindrical tube made of acrylic has an average density $\rho_{tub} \approx 1.18$ g·cm⁻³, whereas the mirrors machined out of brass have a density $\rho_{mir} \approx 8.5$ g·cm⁻³. Empirically, we verify a close estimate of the entire system's mass, such that $m_{sys} \approx 550$ g for the big rig, and $m_{sys} \approx 150$ g for the small rig.

3D Sensing from Omnistereo Images
Stereo vision from point correspondences on images at distinct locations is a popular method for obtaining 3D range information via triangulation. Techniques for image point matching are generally divided between dense (area-based scanning [32]) and sparse (feature description [33]) approaches. Due to parallax, the disparity in point positions for objects close to the vision system must be larger than for objects that are farther away. As illustrated in Figure 6, the nearsightedness of the sensor is determined mainly by the common observable space (a.k.a. SROI) acquired by the limiting elevation angles of the mirrors (Section 3.3). In addition, we will see next (Section 5.2) that the baseline $b$ also plays a major role in range computation. Due to our model's coaxial configuration, we could scan for pixel correspondences radially between a given pair of warped images ($[I_1]$, $[I_2]$), as in the approach taken by similar works such as [34]. However, it seems more convenient to work on a rectified image space, such as with panoramic images, where the search for correspondences can be performed using any of the various existing methods for perspective stereo views. Hence, we first demonstrate how these rectified panoramic images are produced (Section 5.1) and used for establishing point correspondences. Then, we proceed to study our triangulation method for the range computation from a given set of point correspondences (Section 5.2). Last, we show preliminary 3D point clouds as the outcome of this procedure.

Figure 14. [...] (showing only the masked region of interest on the back of image plane $\pi_{img_1}$). Any particular ray $\mathbf{v}_1$, indicated by its elevation and azimuth such as ${}^{[F_1]}(\psi_1, \theta_1)$, that is directed towards the focus $F_1$ must traverse the projection cylinder $S_{cyl_1}$ at point $P_{cyl_1}$. More abstractly, the figure also shows how a pixel position ${}^{[\Xi_1]}\mathbf{m}_\alpha$ on the panoramic pixel space gets mapped from its corresponding pixel position ${}^{[I_1]}\mathbf{m}_\alpha$ via function $h_{\Xi_1}$ defined in Equation (85). Although the figure is not drawn to scale, it is crucial to notice the relative orientation between $S_{cyl_1}$ and the back of the projection plane $\pi_{img_1}$ where the omnidirectional image $[I_1]$ is found.

Panoramic Images
More thoroughly, for i = {1, 2}, S cyl i is the set of all valid 3D points P cyl i that lie on an imaginary unit cylinder centered along the Z-axis and positioned with respect to the mirror's primary focus F i . Recall that the radius of a unit cylinder is r cyl = 1, so its circumference becomes w cyl = 2πr cyl = 2π.
Notice that the imaging ratio, $\chi_{I_{1:2}} = h_{I_1} / h_{I_2}$, illustrated in Figure 7 provides a way of inferring the scale between pairs of point correspondences. However, we achieve conforming scales between both panoramic representations by simply setting both cylinders to an equal height $h_{cyl}$, which is determined from the system's elevation limits, $(\theta_{sys,min}, \theta_{sys,max})$, since they partake in the measurement of the system's vertical field of view given by Equation (54). Hence, we obtain

$$h_{cyl} = z_{cyl,max} - z_{cyl,min}, \quad \text{where} \quad \begin{cases} z_{cyl,max} = \tan\theta_{sys,max} \\ z_{cyl,min} = \tan\theta_{sys,min} \end{cases} \quad (80)$$

Consequently, to achieve panoramic images $[\Xi_i]$ of the same dimensions while maintaining a true aspect ratio $w_\Xi : h_\Xi$, it suffices to indicate either the width (number of columns) $w_\Xi$ or the height (number of rows) $h_\Xi$ in pixels. Here, we propose a custom method for resolving the panoramic image dimensions by setting the equality for the length $l_{px}$ of an individual "square" pixel in the cylinder (behaving like a panoramic camera sensor):

$$l_{px} = \frac{w_{cyl}}{w_\Xi} = \frac{h_{cyl}}{h_\Xi} \quad (81)$$

For instance, if the width $w_\Xi$ is given, then the height is simply $h_\Xi = w_\Xi h_{cyl} / w_{cyl}$.

To increase the processing speed for each panoramic image $[\Xi_i]$, we fill up its corresponding look-up table $LUT_{\Xi_i}$ of size $w_\Xi \times h_\Xi$ that encodes the mapping for each pair of panoramic pixel coordinates (Figure 4). Thus, the ray ${}^{[F_i]}\mathbf{v}_i$ of a particular 3D point directed about ${}^{[F_i]}(\psi_i, \theta_i)$ must pass through $P_{cyl_i}$ in order to get imaged as pixel ${}^{[I]}\mathbf{m}_i$. Since the circumference of the cylinder, $w_{cyl}$, is discretized with respect to the number of pixel columns or width $w_\Xi$, we use the pixel length $l_{px}$ as the factor to obtain the arc length $l_{\psi_i}$ spanned by the azimuth ${}^{[F_i]}\psi_i$ out of a given ${}^{[\Xi_i]}u$ coordinate on the panoramic image. Generally,

$${}^{[F_i]}\psi_i = 2\pi - \frac{{}^{[\Xi_i]}u \; l_{px}}{r_{cyl}} \quad (82)$$

or simply ${}^{[F_i]}\psi_i = 2\pi - {}^{[\Xi_i]}u \; l_{px}$ for the unit cylinder case.
An order reversal in the columns of the panorama is performed by Equation (82) because we account for the relative position between S cyl i and the projection plane π img . For [Ξ 1 ], Figure 14 depicts the unrolling of the cylindrical panoramic image onto a planar panoramic image. However, note that π img is shown from above (or its back) in Figure 14, so the panorama visualization places the viewer inside the cylinder at F 1 .
Similarly, the elevation angle ${}^{[F_i]}\theta_i$ is inferred from the row, or ${}^{[\Xi_i]}v$ coordinate, which is scaled to its cylindrical representation by $l_{px}$. Recall that both cylinders have the same height, $h_{cyl}$, computed by Equation (80). The elevation is obtained by taking into account any row offset from the maximum height position, ${}^{[F_i]}z_{cyl,max}$, of the cylinder. Given these angles and assuming coaxial alignment, we evaluate the position vector ${}^{[C]}\mathbf{p}_{cyl_i}$ for a point on the panoramic cylinder with respect to the camera frame [C], where $r_{cyl}$ cancels out for a unit cylinder. The direction equations Equations (82)
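The panorama sizing and the pixel-to-direction mapping described in this subsection can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, row 0 is assumed to sit at the top of the cylinder ($z_{cyl,max}$), pixel-centre offsets are ignored, and the elevation is taken as $\arctan(z / r_{cyl})$ with $z = z_{cyl,max} - v \, l_{px}$, which is our reading of the "row offset" description rather than a transcription of the paper's equation.

```python
import math


def panorama_size(width_px: int, theta_min: float, theta_max: float,
                  r_cyl: float = 1.0):
    """Resolve panorama height (rows), pixel length, and top height of the cylinder.

    Angles are in radians. Uses h_cyl = z_max - z_min with z = r_cyl * tan(theta),
    and a square pixel of length l_px = w_cyl / width_px.
    """
    z_max = r_cyl * math.tan(theta_max)
    z_min = r_cyl * math.tan(theta_min)
    h_cyl = z_max - z_min
    w_cyl = 2.0 * math.pi * r_cyl
    l_px = w_cyl / width_px
    height_px = round(h_cyl / l_px)
    return height_px, l_px, z_max


def pixel_to_angles(u: float, v: float, l_px: float, z_max: float,
                    r_cyl: float = 1.0):
    """Map panoramic pixel (u, v) to (azimuth, elevation) in radians.

    Azimuth uses the column-order reversal psi = 2*pi - u * l_px / r_cyl.
    Elevation assumes row 0 lies at z_max (an assumption of this sketch).
    """
    psi = 2.0 * math.pi - u * l_px / r_cyl
    z = z_max - v * l_px
    theta = math.atan2(z, r_cyl)
    return psi, theta


# Example with the system elevation limits quoted in the text (75 and -21 degrees).
rows, l_px, z_max = panorama_size(1000, math.radians(-21.0), math.radians(75.0))
psi, theta = pixel_to_angles(250, 0, l_px, z_max)
print(rows, math.degrees(psi), math.degrees(theta))
```

In a practical implementation these angles would be precomputed once per pixel and stored in the look-up table, so that each new frame only requires a remapping step.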

Stereo Matching on Panoramas
We understand that the algorithm chosen for finding matches is crucial to attain correct pixel disparity results. We refer the reader to [35] for a detailed survey of stereo correspondence methods. After comparing various block matching algorithms, we were able to obtain acceptable disparity maps with the semi-global block matching (SGBM) method introduced by [36], which can find subpixel matches in real time. As a result of this stereo block matcher among the pair of panoramic images ([Ξ 1 ], [Ξ 2 ]), we get the dense disparity map Ξ ∆m 12 visualized as an image in Figures 15 and 21a. Note that valid disparity values must be positive ( ∆m 12 |[ Ξ1] m 1 > 0) and they are given with respect to the reference image, in this case, [Ξ 1 ]. In addition, recall that no stereo matching algorithm (as far as we are aware) is totally immune to mismatches due to several well-known reasons in the literature such as ambiguity of cyclic patterns. An advantage of the block (window) search for correspondences is that it can be narrowed along epipolar lines. Unlike the traditional horizontal stereo configuration, our system captures panoramic images whose views differ in a vertical fashion. As shown in [14], the unwrapped panoramas contain vertical, parallel epipolar lines that facilitate the pixel correlation search. Thus, given a pixel position [Ξ1]

Range from Triangulation
Recall the duality that states a point $P_w$ as the intersection of a pair of lines. Regardless of the correspondence search technique employed, such as block stereo matching between panoramas $[\Xi_i]$ (Section 5.1.1) or feature detection directly on [I], we can resolve for ${}^{[I]}(\mathbf{m}_1, \mathbf{m}_2)$. From Equations (42) and (49), we obtain the respective pair of back-projected rays $({}^{[F_1]}\mathbf{v}_1, {}^{[F_2]}\mathbf{v}_2)$, emanating from their respective physical viewpoints, $F_1$ and $F_2$, which are separated by baseline $b$. We can compute elevation angles $\theta_1$ and $\theta_2$ using Equations (43) and (50). Then, we can triangulate the back-projected rays in order to calculate the horizontal range $\rho_w$ defined in Equation (22) and, finally, the 3D position of $P_w$, where $\psi_{12}$ is the common azimuthal angle (on the XY-plane) for coplanar rays, so it can be determined either by Equation (44) or Equation (51). Functionally, we define the "naive" intersection function $f_\Delta$ (Equation (89)) that implements Equations (87) and (88), where $\boldsymbol{\theta}$ is the model parameters vector defined in Equation (4) and can be omitted when calling this function because the model parameters should not change (ideally).

Figure 16. In the more realistic case of skew back-projected rays $(\mathbf{v}_1, \mathbf{v}_2)$, the triangulated point $P_w$ is approximated by the midpoint $P_{w_G}$ of the common perpendicular line segment $\overline{G_1 G_2}$: $\lambda_{1\perp2}\hat{\mathbf{v}}_{1\perp2}$. Note that the visualized skew rays were formed from a pixel correspondence pair ${}^{[I]}(\mathbf{m}_1, \mathbf{m}_2)$ and by offsetting the coordinate $u_2$ by 15 pixels.
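For coplanar rays, the triangulation step reduces to plane trigonometry. The Python sketch below is not a transcription of Equations (87) and (88); it is a generic derivation under explicit assumptions: both viewpoints lie on a common vertical axis, $F_1$ sits a distance $b$ above $F_2$, elevation angles are measured from the horizontal plane through each viewpoint, and the returned position is expressed relative to $F_1$ rather than the camera frame [C]. The function name `triangulate_coaxial` is hypothetical.

```python
import math


def triangulate_coaxial(theta1: float, theta2: float, psi: float, baseline: float):
    """Triangulate a point seen from two coaxial viewpoints (angles in radians).

    Assumes F1 is `baseline` above F2 on the Z-axis, so that
    tan(theta2) - tan(theta1) = baseline / rho for a point at horizontal range rho.
    Returns (rho, (x, y, z)) with the position given relative to F1.
    """
    denom = math.tan(theta2) - math.tan(theta1)
    if denom <= 0.0:
        raise ValueError("rays do not converge in front of the sensor")
    rho = baseline / denom
    z = rho * math.tan(theta1)  # height relative to F1
    return rho, (rho * math.cos(psi), rho * math.sin(psi), z)


# Synthetic check: a point 2 m away and 0.5 m above F1, with a 0.1 m baseline.
b = 0.1
theta_1 = math.atan2(0.5, 2.0)      # elevation seen from F1
theta_2 = math.atan2(0.5 + b, 2.0)  # elevation seen from F2 (lower viewpoint)
rho_w, p_w = triangulate_coaxial(theta_1, theta_2, 0.0, b)
print(rho_w, p_w)
```

Because the denominator is a difference of two nearly equal tangents for distant points, small angular errors translate into large range errors, which is consistent with the range variation analysis later in this section.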

Common Perpendicular Midpoint Triangulation Method
Because the coplanarity of these rays cannot be guaranteed (the skew rays case), a better triangulation approximation that accounts for coaxial misalignments is to find the midpoint of their common perpendicular line segment (as attempted in [23]). As illustrated in Figure 16, we define the common perpendicular line segment $\overline{G_1 G_2}$ as the parametrized vector $\mathbf{v}_{1\perp2} = \lambda_{1\perp2}\hat{\mathbf{v}}_{1\perp2}$, where $\hat{\mathbf{v}}_{1\perp2}$ is the unit vector normal to the back-projected rays, $\mathbf{v}_1$ and $\mathbf{v}_2$. If the rays are not parallel ($\mathbf{v}_1 \times \mathbf{v}_2 \neq \mathbf{0}$), we can compute the "exact" solution of the well-determined linear matrix equation. The location of the midpoint $P_{w_G}$ on the common perpendicular $\mathbf{v}_{1\perp2}$ with respect to the common frame [C] follows from this solution.
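The midpoint construction can be illustrated with a short, dependency-free Python sketch. Instead of solving the linear matrix equation mentioned above, it uses the standard closed-form expressions for the closest points between two lines, which yield the same segment $\overline{G_1 G_2}$; this substitution, the function names, and the example rays are choices made for this illustration only.

```python
def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def midpoint_of_common_perpendicular(p1, v1, p2, v2, eps=1e-12):
    """Midpoint of the shortest segment between lines p1 + t1*v1 and p2 + t2*v2.

    p1, p2 are ray origins (e.g., the two viewpoints) and v1, v2 their directions,
    all expressed in one common frame. Raises ValueError for (near-)parallel rays.
    """
    n = _cross(v1, v2)          # direction of the common perpendicular
    n_sq = _dot(n, n)
    if n_sq < eps:
        raise ValueError("rays are parallel; the common perpendicular is not unique")
    w = _sub(p2, p1)
    t1 = _dot(_cross(w, v2), n) / n_sq   # parameter of the closest point G1 on ray 1
    t2 = _dot(_cross(w, v1), n) / n_sq   # parameter of the closest point G2 on ray 2
    g1 = tuple(p1[i] + t1 * v1[i] for i in range(3))
    g2 = tuple(p2[i] + t2 * v2[i] for i in range(3))
    return tuple(0.5 * (g1[i] + g2[i]) for i in range(3))


# Two skew rays: one along X through the origin, one along Y through (0, 1, 1).
mid = midpoint_of_common_perpendicular((0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                                       (0.0, 1.0, 1.0), (0.0, 1.0, 0.0))
print(mid)
```

When the two rays actually intersect, $G_1$ and $G_2$ coincide and the midpoint reduces to the intersection point, so the same routine covers the coplanar case.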

Range Variation
Before we introduce an uncertainty model for triangulation (Section 5.3), we briefly analyze how range varies according to the possible combinations of pixel correspondences, ${}^{[I]}(\mathbf{m}_1, \mathbf{m}_2)$, on the image [I]. Here, we demonstrate how a radial variation of discretized pixel disparities, $\Delta m_{12}$, affects the 3D position of a point obtained from triangulation (Section 5.2). Figure 17 demonstrates the nonlinear characteristics of the variation in horizontal range, $\Delta\rho_w$, from the discrete relation between pixel positions ${}^{[I]}\mathbf{m}_i$ and their respective back-projected (direction) rays obtained from $f_{\beta_i}$ and triangulated via function $f_\Delta$ defined in Equation (89). It can be observed that the horizontal range variation, $\Delta\rho_w$, increases quadratically as $\Delta m_{12} \to 1$ px, the minimum discrete pixel disparity, which yields a maximum horizontal range $\rho_{w,max} \approx [18, 28]$ m (computed analytically). The main plot of Figure 17 shows the small disparity values in the interval $\Delta m_{12} = [1, 20]$ px, whereas the subplot is a zoomed-in extension of the large disparity cases in the interval $\Delta m_{12} = [20, 100]$ px. The current analysis indicates that triangulation error (e.g., due to false pixel correspondences) may have a severe effect on range accuracy, which degrades quadratically with distance, as can be appreciated from the 8 m variation over the disparity interval $\Delta m_{12} = [1, 2]$ px. Also, observe the example of Figure 20 for a reconstructed point cloud, where this range sensing characteristic is

Experiment Results
In this section, we demonstrate the capabilities of the omnistereo sensor to provide 3D information either as dense point clouds or for the registration of sparse 2D features and 3D points. We also evaluate the precision of both projection and triangulation of a few detected corners from a chessboard whose various 3D poses are given as ground truth.

Dense 3D Point Clouds
By implementing the process described in Section 5, we begin by visualizing the dense point cloud obtained from the omnidirectional synthetic image given in Figure 2b, whose actual size is 1280 × 960 pixels. The associated panoramic images, $[\Xi_i]$, were obtained using function $h_{\Xi_i}$ defined in Equation (85) and are shown in Figure [...].

We also present results from a real experiment using the prototype described in Section 4.3.2 and shown in Figure 13a. The panoramic images and dense point cloud shown in Figure 21 are obtained by implementing the pertinent functions described throughout this manuscript and by holding the SVP assumption of an ideal configuration. We provide these qualitative results as a preliminary proof of concept for the proposed sensor after employing a calibration procedure based on the generalized unified model proposed in [37].

Sparse 3D Points from Features
Using the SURF feature detector and descriptors [38], Figure 22 demonstrates 44 correct matches that are triangulated with Equation (93). Sparse 3D points can be useful for applications of visual odometry where the sensor changes poses and those registered point features can be matched against new images. Please refer to [39] for a tutorial on visual odometry.

Evaluation of Synthetic Rig
Due to the unstructured nature of the dense point clouds previously discussed, we proceed to triangulate sets of sparse 3D points whose positions with respect to the omnistereo sensor camera frame, [C], are known in advance. We synthesize a calibration chessboard pattern [G] containing $m \times n$ square cells for various predetermined poses ${}^{[C]}_{[G]}\mathbf{T}_h$. Since the sensor is assumed to be rotationally symmetric, it suffices to experiment with groups of $L = 4$ chessboard patterns situated at a given horizontal range. A total of $Lmn$ 3D points are available for each range group. Each corner point's position ${}^{[C]}\mathbf{p}_j$ is found with respect to [C] via the frame transformation ${}^{[C]}\mathbf{p}_{j,g} = {}^{[C]}_{[G_g]}\mathbf{T}_h \, {}^{[G_g]}\mathbf{p}_j$ for all indices $j \in \{1, \ldots, mn\}$, $g \in \{1, \ldots, L\}$. Figure 23 shows the set of detected corner points on the image from the group of patterns set to a range of ${}^{[C]}\rho_G = 2$ m. We adjust the pattern's cell sizes accordingly so its points can be safely discerned by an automated corner detector [35]. We systematically establish correspondences of pattern points on the omnidirectional image, and proceed to triangulate with Equation (93). For each range group of points, we compute the root-mean-square of the 3D position errors (RMSE) between the observed (triangulated) points ${}^{[C]}\hat{\mathbf{p}}_j \leftarrow f_{P_w}(\mathbf{m}_1, \mathbf{m}_2)$ and the true (known) points ${}^{[C]}\mathbf{p}_j$ that were used to describe the ray-traced image. Table 4 compiles the RMSE results and the standard deviation (SD) for several groups of patterns whose frames, $[G_g]$, are located at specified horizontal ranges ${}^{[C]}\rho_G \in [0.25, 8.0]$ m away from [C].

We notice that for all the 3D points in the synthetic patterns, we obtained an average error of 0.1 px with a standard deviation of 0.05 px for the subpixel detection of corners on the image versus their theoretical values obtained from $f_{\varphi_i}$ defined in Equation (36). This last experiment helps us validate the pessimistic choice of $\sigma_{px} = 1$ px for the discrete pixel space in the triangulation uncertainty model proposed in Section 5.3.

Figure 23. Example of sparse point correspondences detected with subpixel precision from corners on the chessboard patterns around the omnistereo sensor. The size of the rendered images for this experiment is 1280 × 960 pixels. For this example's patterns, the square cell size is 140 mm. The RMSE for this set of points at ${}^{[C]}\rho_G = 2$ m is approximately 15 mm (Table 4).
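The error statistics used in this evaluation can be computed as in the following Python sketch. It assumes that the per-point error is the Euclidean distance between a triangulated point and its ground-truth position, and that the reported SD is the population standard deviation of those distances; the paper does not spell out the latter choice, so it is an assumption here. The function name and the sample coordinates are hypothetical.

```python
import math
import statistics


def position_error_stats(estimated, truth):
    """Return (RMSE, SD) of Euclidean 3D position errors between paired point lists."""
    if len(estimated) != len(truth) or not estimated:
        raise ValueError("point lists must be non-empty and of equal length")
    errors = [math.dist(p, q) for p, q in zip(estimated, truth)]
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    sd = statistics.pstdev(errors)  # population SD of the per-point errors (assumed)
    return rmse, sd


# Made-up coordinates in metres, for illustration only (not data from Table 4).
triangulated = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ground_truth = [(0.0, 0.0, 3.0), (1.0, 4.0, 0.0)]
rmse, sd = position_error_stats(triangulated, ground_truth)
print(rmse, sd)
```

Reporting both numbers is useful because the RMSE is dominated by the largest errors, while the SD indicates how consistent the errors are across the points of a range group.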

Evaluation of Real-Life Rig
The following experiment uses $L = 5$ different poses of a real chessboard pattern with 5 × 8 corner points where the square cell size is 24 mm. As done in Section 6.3.1, the evaluated error is the Euclidean norm of the difference between the triangulated points and the ground-truth positions of the chessboard poses captured via a motion capture system. The RMSE for all projected points in this set of chessboard patterns is 2.5 pixels with a standard deviation of 1.5 pixels. The RMSE for all triangulated points in this set is 3.5 mm with a standard deviation of 1.4 mm. Figure 24 visually confirms the proximity of the triangulated chessboard poses to the ground-truth pose information.

Discussion and Future Work
The portable aspect of the proposed omnistereo sensor is one of its greatest advantages, as discussed in the introduction section. The total weight of the big rig using 37 mm-radius mirrors is about 550 g, so it can be carried by the AscTec Pelican quadrotor under its payload limitations of 650 g. The mirror profiles maximize the stereo baseline while obeying the various design constraints such as size and field of view. Currently, the mirrors are custom-manufactured out of brass using CNC machining. However, it is possible to reduce the system's weight dramatically by employing lighter materials.
In reality, it is almost impossible to assemble a perfect imaging system that fulfills the SVP assumption and avoids the triangulation uncertainty studied in Section 5.3 on top of the error already introduced by any feature matching technique. The coaxial misalignment of the folded mirrors-camera system, the defocus blur of the lens, and the inauspicious glare from the support tube are all practical caveats we need to overcome for better 3D sensing. As described in the text for the real-life rig, we have avoided the traditional use of a support cylinder in order to work around the cross-reflection and glare issues. Possible vibrations caused by the robot dynamics are reduced by vibration pads placed on the sensor-body interface. Details about our tentative calibration method for vertically-folded omnistereo systems have not been included in the current study since we would like the reader's attention to be devoted to the sensor characteristics defended by this analysis.
Our ongoing research is also focusing on the development of efficient software algorithms for real-time 3D pose estimation from point clouds. Bear in mind that all the experimental results demonstrated in this manuscript rely upon a single camera snapshot. We understand that the narrow vertical field-of-view where stereo vision operates is a limiting factor for dense scene reconstruction from a single image, so we have also considered non-optimal geometries for the quadrotor's view. In fact, increasing the region of interest for stereo (SROI) while maintaining the wide baseline implies an enlargement of each mirror's radius. We believe that our omnidirectional system is more advantageous than forward-looking sensors because it can provide a robust pose estimation by extracting 3D point features from all around the scene at once. As in our past work [24], fusing multiple modalities (e.g., stereo and optical-flow) is a possibility in order to resolve the scale-factor problem inherent while performing structure from motion over the non-stereo regions of each mirror (near the poles).
In this work, we performed an extensive study of the proposed omnistereo sensor's properties, such as its spatial resolution and triangulation uncertainty. We validated the projection accuracy of the synthetic model (the ideal case) where 3D points in the world are given exactly. In order to validate the precision of the real sensor, we require a perfectly constructed and assembled device so point projections can be accepted as the ultimate truth. This is hard to achieve at a low-cost prototyping stage. Although we acquired ground-truth 3D points via a position capture system alone, we deem this insufficient to validate the imaging accuracy of the real sensor because the precision of the calibration method is truly what is being accounted for. For reproducibility purposes, source code is available for the implementation of the theoretical omnistereo model, optimization, plots and figures presented in this analysis [40].