Article

Design and Analysis of a Single-Camera Omnistereo Sensor for Quadrotor Micro Aerial Vehicles (MAVs) †

1 Department of Computer Science, The Graduate Center, The City University of New York (CUNY), 365 Fifth Avenue, New York, NY 10016, USA
2 Electrical Engineering Department, The City College, City University of New York (CUNY City College), Convent Ave & 140th Street, New York, NY 10031, USA
3 Automation Department, Nanjing University of Science and Technology (NUST), Nanjing 210094, China
* Author to whom correspondence should be addressed.
† This paper is an extended version of our paper published in the Proceedings of the IEEE Conference on Industrial Electronics and Applications (ICIEA), Melbourne, Australia, 19–21 June 2013: Jaramillo, C.; Guo, L.; Xiao, J. A Single-Camera Omni-Stereo Vision System for 3D Perception of Micro Aerial Vehicles (MAVs).
Sensors 2016, 16(2), 217; https://doi.org/10.3390/s16020217
Submission received: 24 November 2015 / Accepted: 29 January 2016 / Published: 6 February 2016
(This article belongs to the Special Issue Sensors for Robots)

Abstract

We describe the design and 3D sensing performance of an omnidirectional stereo (omnistereo) vision system applied to Micro Aerial Vehicles (MAVs). The proposed omnistereo sensor employs a monocular camera that is co-axially aligned with a pair of hyperboloidal mirrors (a vertically-folded catadioptric configuration). We show that this arrangement provides a compact solution for omnidirectional 3D perception while mounted on top of propeller-based MAVs, which are not capable of carrying large payloads. The theoretical single viewpoint (SVP) constraint helps us derive analytical solutions for the sensor's projective geometry and generate SVP-compliant panoramic images, from which 3D information is computed via stereo correspondences (in a truly synchronous fashion). We perform an extensive analysis of various system characteristics such as its size, catadioptric spatial resolution, and field-of-view. In addition, we pose a probabilistic model for the uncertainty estimation of 3D information obtained by triangulating back-projected rays. We validate the projection error of the design using both synthetic and real-life images against ground-truth data. Qualitatively, we show 3D point clouds (dense and sparse) resulting from a single image captured in a real-life experiment. We expect our sensor to be reproducible, as its model parameters can be optimized to satisfy other catadioptric-based omnistereo vision applications under different circumstances.


1. Introduction

Micro aerial vehicles (MAVs), such as quadrotor helicopters, are popular platforms for unmanned aerial vehicle (UAV) research due to their structural simplicity, small form factor, vertical take-off and landing (VTOL) capability, and high omnidirectional maneuverability. In general, UAVs have plenty of military and civilian applications, such as target localization and tracking, 3-dimensional (3D) mapping, terrain and infrastructure inspection, disaster monitoring, environmental and traffic surveillance, search and rescue, deployment of instrumentation, and cinematography, among other uses. However, MAVs have size, payload, and on-board computation limitations, which call for compact and lightweight sensors. The most commonly used perception sensors on MAVs are laser scanners and cameras in various configurations such as monocular, stereo, or omnidirectional. We present a vision-based omnidirectional stereo (omnistereo) sensor motivated by several aspects of MAV robotics.

1.1. Sensor Motivation

We justify the need for the proposed omnistereo sensor after observing two basic differences in the sensor requirements between MAVs and ground vehicles:
  • Size and payload—In MAV applications, the sensor’s physical dimensions and weight are always a great concern due to payload constraints. Generally, MAVs require fewer and lighter sensors that are compactly designed, while larger robots (including high-payload UAVs) have greater freedom of sensor choice.
  • Field-of-view (FOV)—Due to their omnidirectional motion model, MAVs require a simultaneous observation of the 3D surroundings. Conversely, most ground robots can safely rely upon narrow vision as their motion control on the plane is more stable.

1.2. Existing Range Sensors for MAVs

In addition to specifying our sensor requirements, it is important to note the most prevalent robot range sensors used today by MAVs and their limitations. For example, lightweight 2.5D laser scanners can accurately measure distances at fast rates; however, their instantaneous sensing is limited to plane sweeps, which in turn requires the quadrotor to move vertically in order to generate 3D maps or to foresee obstacles and free space during navigation. More recently, 3D laser rangefinders and LiDARs are being developed, such as the sensor presented in [1], but this one is not compact enough for MAVs. Another disadvantage of laser-based technologies is their active sensing nature, which requires more power to operate, and their measurements are more vulnerable to detection and to corruption (e.g., due to dark/reflective surfaces) than vision-based solutions. Time-of-flight (ToF) cameras as well as red, green, blue plus depth (RGB-D) sensors like the Microsoft Kinect® are also very popular for robot navigation. They have been adopted for low-sunlight conditions and mainly indoor navigation of MAVs [2] due to their structured infrared light projection and short range sensing (under 5 m) [3]. Hence, a lightweight imaging system capable of instantly providing a large field of view (FOV) with acceptable resolutions is essential for MAV applications in 3D space. These state-of-the-art sensors' pitfalls motivate the design and analysis of our omnistereo sensor.

1.3. Related Work

Approaches that rely on omnidirectional images and motion alone, like those taken in [4,5], have been proposed to map and localize a robot. Omnidirectional vision using a single mirror for the flight of large UAVs was first attempted in [6]. In [7], Hrabar proposed the use of traditional horizontal stereo-based obstacle avoidance and path planning for UAVs, but these techniques were only tested in a scaled-down air vehicle simulator (AVS). Omnidirectional catadioptric cameras can be aided by structured light, such as the prototypes presented in [8] and the more flexible configurations demonstrated in [9]. Alternatively, stereo cameras can provide passive, instantaneous 3D information for robot mapping and navigation (including UAVs [10]). Intuitively, omnidirectional stereo (omnistereo) can be achieved through circular arrangements of multiple perspective cameras with overlapping views. Higher resolution panoramas can be achieved by rotating a linear camera as presented in [11], but this approach suffers from motion blur in dynamic environments. We point the reader to [12] for a detailed study of multiple view geometry, and [13] for a compendium of geometric computer vision concepts. Instead, our solution to omnistereo vision consists of a "catadioptric" system that employs cameras and mirrors [14].
Over the years, various omnistereo catadioptric configurations have been applied to ground mobile robots [15,16,17,18,19,20]. Unfortunately, these systems are not compact since they use separate camera-mirror pairs, which are known to experience synchronization issues. In [21], Yi and Ahuja described a configuration using a mirror and a concave lens for omnistereo, but it rendered a very short baseline in comparison to two-mirror configurations. Originally, Nayar and Peri [22] studied 9 possible folded-catadioptric configurations for a single-camera omnistereo imaging system. Eventually, a catadioptric system using two hyperbolic mirrors in a vertical configuration was implemented by He et al. [23]. Their omnistereo sensor provides a lengthy baseline at the expense of a very tall system. In the past [24], we developed a novel omnistereo catadioptric rig consisting of a perspective camera coaxially aligned with two spherical mirrors of distinct radii (in a "folded" configuration). One caveat of spherical mirrors is their non-centrality; they do not satisfy the single effective viewpoint (SVP) constraint (discussed in Section 2.2) but rather produce a locus of viewpoints [25].

1.4. Proposed Sensor

We design an SVP-compliant omnistereo system based on the folded catadioptric configuration with hyperboloidal mirrors. Our approach resembles the work of Jang, Kim, and Kweon [26], who first implemented an omnistereo system using a pair of hyperbolic mirrors and a single camera. However, their sensor's characteristics were not analyzed in order to justify its design parameters and capabilities, which we do in our case.
It is true that an omnidirectional catadioptric system sacrifices spatial resolution on the imaging sensor (analyzed in Section 3.4). However, our sensor offers practical advantages such as reduced cost, acceptable weight, and truly instantaneous pixel-disparity correspondences, since the same single camera-lens operates for both views, so mis-synchronization issues do not exist. In fact, we believe we are the first to present a single-camera catadioptric omnistereo solution for MAVs. The initial geometry of our model was proposed in [27]. Now, we perform an extensive analysis of our model's parameters (Section 2) and of its geometric projection (Section 3); the parameters are obtained as the solution to a constrained numerical optimization devised for the sensor's real-life application to passive range sensing on MAVs (Section 4). We also show how the panoramic images are obtained, on which we find correspondences and triangulate 3D points, and for which an uncertainty model is introduced (Section 5). Finally, we present our experimental results and evaluation for 3D sensing with the proposed omnistereo sensor (Section 6), and we discuss the future direction of our work in Section 7.

2. Sensor Design

Figure 1 shows the single-camera catadioptric omnistereo vision system that we specifically design to be mounted on top of our micro quadrotors (manufactured by Ascending Technologies [28]). It consists of (1) one hyperboloid-planar mirror at the top; (2) one hyperboloidal mirror at the bottom; and (3) a high-resolution USB camera, also at the bottom (inside the bottom mirror and looking up). The components are housed and supported by (4) a transparent tube or plastic standoffs (for the real-life prototype shown in Figure 13). The choice of hyperboloidal reflectors owes to three reasons: it is one of the four non-degenerate conic shapes satisfying the SVP constraint [29]; it allows a wider vertical FOV than elliptical and planar mirrors; and it does not require a telescopic (orthographic) lens for imaging as paraboloidal mirrors do (so our system can be downsized). In addition, the planar part of mirror 1 works as a reflex mirror, which in part reduces distortion caused by dual conic reflections. Based on the SVP property, the system obtains two radial images of the omnidirectional views in the form of an inner and an outer ring, as illustrated in Figure 2a,b. Nevertheless, the unique set of parameters describing the entire system categorizes it as a "global camera model" as defined in [13], because changing the value of any parameter in the model affects the overall projection function of visible light rays in the scene as well as other computational imaging factors such as depth resolution and overlapping field of view, which we attempt to optimize in the following design subsections. Please refer to Appendix A for clarification on our symbolic notation.

2.1. Model Parameters

In the configuration of Figure 3, mirror 1's real or primary focus is $F_1$, which is separated by a distance $c_1$ from its virtual or secondary focus, $F_1'$, at the bottom. Without loss of generality, we make both the camera's pinhole and $F_1'$ coincide with the origin of the camera's coordinate system, $O_C$. This way, the position of the primary focus, $F_1$, can be referenced by vector $^{C}\mathbf{f}_1 = [0, 0, c_1]^T$ in Cartesian coordinates with respect to the camera frame, $C$. Similarly, the distance between the foci of mirror 2, $F_2$ and $F_2'$, is measured by $c_2$. Here, we use the planar (reflex) mirror of radius $r_{ref}$ and unit normal vector
$$^{C}\hat{\mathbf{n}}_{ref} = [0, 0, 1]^T$$
in order to project the real camera's pinhole located at $O_C$ as a virtual camera $O_{C'}$ coinciding with the virtual focal point $F_2'$ positioned at $^{C}\mathbf{f}_{2v} = [0, 0, d]^T$. We achieve this by setting $d/2$ as the symmetrical distance from the reflex mirror to $O_C$ and from the reflex mirror to $O_{C'}$. With respect to $C$, mirror 2's primary focus, $F_2$, results in position $^{C}\mathbf{f}_2 = [0, 0, d - c_2]^T$. This yields the following expression for the reflective plane:
$$^{C}\hat{\mathbf{n}}_{ref}^{\,T}\; {}^{C}\mathbf{x} = d/2$$
The profile of each hyperboloid is determined by independent parameters $k_1$ and $k_2$, respectively. Their reflective vertical fields of view (vFOV) are indicated by angles $\alpha_1$ and $\alpha_2$. They play an important role when designing the total vFOV of the system, $\alpha_{sys}$, formally defined by Equation (54) and illustrated in Figure 5. Also, while performing stereo vision, it is important to consider the angle $\alpha_{SROI}$, which measures the common (overlapping) vFOV of the omnistereo system. The camera's nominal field of view $\alpha_{cam}$ and its opening radius $r_{cam}$ also determine the physical areas of the mirrors that can be fully imaged. Theoretically, the mirrors' vertical axis of symmetry (coaxial configuration) produces two image points that are radially collinear. This property is advantageous for the correspondence search during stereo sensing (Section 5), with a baseline measured as
$$b = c_1 + c_2 - d$$
Among design parameters, we also include the total height of the system, h s y s , and weight m s y s , both being formulated in Section 2.3.
To summarize, the model has 6 primary design parameters given as a vector
$$\boldsymbol{\theta} = \left[c_1,\; c_2,\; k_1,\; k_2,\; d,\; r_{sys}\right]^T$$
in addition to by-product parameters such as
$$b,\; h_{sys},\; r_{ref},\; r_{cam},\; m_{sys},\; \alpha_1,\; \alpha_2,\; \alpha_{sys},\; \alpha_{SROI},\; \alpha_{cam}$$
In Section 4, we perform a numerical optimization of the parameters in θ with the goal to maximize the baseline, b, required for life-size navigational stereopsis. At the same time, we restrict the overall size of the rig (Section 2.3) without sacrificing sensing performance characteristics such as vertical field of view, spatial resolution, and depth resolution. In the upcoming subsections, we first derive the analytical solutions for the forward projection problem in our coaxial stereo configuration as a whole. In Section 3.2, we derive the back-projection equations for lifting 2D image points into 3D space.
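For illustration only, the primary design parameters and the by-product baseline of Equation (3) can be grouped in a small container, as sketched below in Python; the field names and numeric values (other than the profile parameters k1 = 5.7 and k2 = 9.7 quoted later in Section 4.2.2) are placeholders rather than values from Table 1.

```python
from dataclasses import dataclass

@dataclass
class OmnistereoParams:
    """Primary design parameters (vector theta), lengths in mm."""
    c1: float     # focal separation of mirror 1
    c2: float     # focal separation of mirror 2
    k1: float     # profile parameter of mirror 1 (k > 2)
    k2: float     # profile parameter of mirror 2 (k > 2)
    d: float      # distance between the real and virtual camera pinholes
    r_sys: float  # common outer radius of both mirrors

    @property
    def baseline(self) -> float:
        # b = c1 + c2 - d, the vertical separation between F1 and F2 (Equation (3))
        return self.c1 + self.c2 - self.d

# Example: big-rig profile parameters from Section 4.2.2; the remaining numbers are made up.
theta = OmnistereoParams(c1=100.0, c2=120.0, k1=5.7, k2=9.7, d=80.0, r_sys=37.0)
print(theta.baseline)  # 140.0 mm for these placeholder values
```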

2.2. Single Viewpoint (SVP) Configuration for OmniStereo

As a central catadioptric system, its projection geometry must obey the existence of the so-called single effective viewpoint (SVP). While the SVP guarantees that true perspective geometry can always be recovered from the original image, it limits the selection of mirror profiles to a set of conics. Generally, a circular hyperboloid of revolution (about its axis of symmetry) conforms to the SVP constraint, as demonstrated by Baker and Nayar in [30]. Since a hyperboloidal mirror has two foci, the effective viewpoint is the primary focus $F$ inside the physical mirror, and the secondary (outer) focus $F'$ is where the centre (pinhole) of the perspective camera should be placed for depicting a scene obeying the SVP configuration discussed in this section.
First of all, a hyperboloid i can be described by the following parametric equation:
$$\frac{\left(z_i - z_{0_i}\right)^2}{a_i^2} - \frac{r_i^2}{b_i^2} = 1, \quad\text{with}\quad a_i = \frac{c_i}{2}\sqrt{\frac{k_i - 2}{k_i}}, \qquad b_i = \frac{c_i}{2}\sqrt{\frac{2}{k_i}}$$
where $z_{0_i}$ is the offset (shift) position of the hyperboloid along the Z-axis from the origin $O_C$ (given in Equations (9) and (10)), and $r_i$ is the orthogonal distance from a point $P_i$ on its surface to the axis of revolution/symmetry (i.e., the Z-axis).
In fact, the position of a valid point P i is constrained within the mirror’s physical surface of reflection, which is radially limited by r i , m i n and r i , m a x , such that:
$$r_i = \sqrt{x_i^2 + y_i^2}, \quad\text{for}\quad r_{i,min} \le r_i \le r_{i,max}, \quad i \in \{1, 2\}$$
and $r_{1,min} = r_{ref}$, $r_{1,max} = r_{sys}$, $r_{2,min} = r_{cam}$, $r_{2,max} = r_{sys}$. Observe that the radius of the system is the upper bound for both mirrors (Figure 3). In addition, the hyperboloids profiled by Equation (5) must obey the following conical constraints:
$$c_i > 0 \;\wedge\; k_i > 2, \quad \forall\, i \in \{1, 2\}$$
$k$ is a constant, unit-less parameter that is inversely related to the eccentricity $\varepsilon_c$ of the conic and thus controls the mirror's curvature. In fact, $\varepsilon_c > 1$ for hyperbolas, and a plane is produced in the limit $\varepsilon_c \to \infty$, i.e., $k = 2$.
We devise $\mathcal{M}_i$ as the set of all the reflection points $P_i$ with coordinates $(x_i, y_i, z_i)$ lying on the surface of the respective mirror $i$ within bounds. Formally,
$$\mathcal{M}_i := \left\{ P_i \in \mathbb{R}^3 \;\middle|\; \frac{\left(z_i - z_{0_i}\right)^2}{a_i^2} - \frac{r_i^2}{b_i^2} = 1 \;\wedge\; \text{Equation (6)} \;\wedge\; \text{Equation (7)} \right\}$$
In our model, we describe both hyperboloidal mirrors, 1 and 2, with respect to the camera frame C , which acts as the common origin of the coordinate system. Therefore,
$$z_{0_1} = \frac{c_1}{2}$$
$$z_{0_2} = d - \frac{c_2}{2}$$
By expanding Equation (5) with their respective index terms, it becomes
$$\left(z_1 - \frac{c_1}{2}\right)^2 - r_1^2\left(\frac{k_1}{2} - 1\right) = \frac{c_1^2}{4}\,\frac{k_1 - 2}{k_1}$$
$$\left(z_2 - d + \frac{c_2}{2}\right)^2 - r_2^2\left(\frac{k_2}{2} - 1\right) = \frac{c_2^2}{4}\,\frac{k_2 - 2}{k_2}$$
Additionally, we define the function $f_{z_i}: r \mapsto z_i$ to find the corresponding $z_i$ component from a given $r$ value as
$$f_{z_i}(r) := \begin{cases} z_{0_i} + \gamma_i & \text{if } i = 1 \,\wedge\, \text{Equation (6)}\\ z_{0_i} - \gamma_i & \text{if } i = 2 \,\wedge\, \text{Equation (6)}\\ \text{None} & \text{otherwise}\end{cases}$$
where $\gamma_i = \dfrac{a_i}{b_i}\sqrt{b_i^2 + r^2}$.
The inverse relation $f_{r_i}: z \mapsto \left\{+r_i, -r_i\right\}$ can also be implemented as
$$f_{r_i}(z) := \begin{cases} \pm\, b_i\, \Gamma_i & \text{if } i \in \{1, 2\} \,\wedge\, \text{Equation (6)}\\ \text{None} & \text{otherwise}\end{cases}$$
where $\Gamma_i = \sqrt{\dfrac{\left(z - z_{0_i}\right)^2}{a_i^2} - 1}$, so a valid input $z$ can be associated with both positive and negative solutions $\pm r_i$.
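For illustration, the profile functions of Equations (13) and (14) translate directly into code. The following minimal Python sketch assumes the coefficient expressions $a_i$, $b_i$ of Equation (5) as written above; it is not the authors' implementation.

```python
import math

def mirror_coeffs(c: float, k: float):
    """Hyperboloid coefficients a, b of Equation (5) for one mirror."""
    a = (c / 2.0) * math.sqrt((k - 2.0) / k)
    b = (c / 2.0) * math.sqrt(2.0 / k)
    return a, b

def f_z(r: float, c: float, k: float, z0: float, mirror: int,
        r_min: float, r_max: float):
    """z-coordinate on mirror `mirror` at radial distance r (Equation (13))."""
    if not (r_min <= r <= r_max):
        return None                      # outside the reflective bounds of Eq. (6)
    a, b = mirror_coeffs(c, k)
    gamma = (a / b) * math.sqrt(b * b + r * r)
    return z0 + gamma if mirror == 1 else z0 - gamma

def f_r(z: float, c: float, k: float, z0: float):
    """Positive radial coordinate on the mirror at height z (Equation (14))."""
    a, b = mirror_coeffs(c, k)
    arg = (z - z0) ** 2 / (a * a) - 1.0
    if arg < 0.0:
        return None                      # z lies between the two sheets, no surface
    return b * math.sqrt(arg)            # only the positive solution is returned here
```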

2.3. Rig Size

To evaluate the overall system size, we consider the height and weight that result from the primary design parameters, θ.
First, the height of the system, $h_{sys}$, can be estimated from the functional relationships $f_{z_1}$ and $f_{z_2}$ defined in Equation (13), which provide the respective $z$ component values at the outermost point on each mirror's surface. More specifically, knowing $r_{sys}$, we get
$$h_{sys} = z_{max} - z_{min}$$
where $z_{max} = f_{z_1}\left(r_{sys}\right)$ and $z_{min} = f_{z_2}\left(r_{sys}\right)$.
The rig's weight can be indicated by the total resulting mass of the main "tangible" components:
$$m_{sys} = m_{cam} + m_{tub} + m_{mir}$$
where the mass of the camera-lens combination is $m_{cam}$; the mass of the support tube, $m_{tub}$, can be estimated from its cylindrical volume $V_{tub}$ and material density $\rho_{tub}$; and the mass due to the mirrors is
$$m_{mir} = V_{mir}\,\rho_{mir} = \left(V_1 + V_{ref} + V_2\right)\rho_{mir}$$
For computing the volume of the hyperboloidal shell, $V_i$ for mirror $i$, we apply a "ring method" of volume integration. By assuming all mirror material has the same wall thickness $\tau_m$, we acquire $V_i$ by integrating the horizontal cross-section areas along the $Z$-axis. Each ring area depends on its outer and inner circumferences, which vary according to the radius $r_z$ for a given height $z$. Equation (14) establishes the functional relation $r_i^{+} = f_{r_i}(z)$, from which we only need the positive answer. We let $A$ be the function that computes the ring area of constant thickness $\tau_m$ for a variable outer radius $r_i$:
$$A\left(r_i\right) = \pi r_i^2 - \pi\left(r_i - \tau_m\right)^2 = \pi \tau_m\left(2 r_i - \tau_m\right)$$
We consider the definite integral evaluated over the $z$ interval bounded by the height limits, which are correlated with the radial limits of Equation (6) and can be obtained via $f_{z_i}$ defined in Equation (13), such that
$$z_{i,min} = f_{z_i}\left(r_{i,min}\right) \quad\text{and}\quad z_{i,max} = f_{z_i}\left(r_{i,max}\right)$$
Then, we proceed to integrate Equation (18), so the shell volume for each hyperboloidal mirror is defined as
$$V_i = \int_{z_{i,min}}^{z_{i,max}} A\left(r_i\right)\, dz$$
Finally, since the reflex mirror piece is just a solid cylinder of thickness τ m , its volume is simply
$$V_{ref} = \tau_m\, \pi\, r_{ref}^2$$
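As an illustrative numerical check of Equations (16)–(21), the shell volume can be integrated with an off-the-shelf quadrature routine; the sketch below reuses the hypothetical f_r helper from the previous sketch and assumes SciPy is available.

```python
import math
from scipy.integrate import quad

def ring_area(r: float, tau_m: float) -> float:
    """Cross-section ring area A(r) of Equation (18)."""
    return math.pi * tau_m * (2.0 * r - tau_m)

def shell_volume(c: float, k: float, z0: float, tau_m: float,
                 z_min: float, z_max: float) -> float:
    """Shell volume V_i of Equation (20), integrating A(f_r(z)) over z."""
    integrand = lambda z: ring_area(f_r(z, c, k, z0), tau_m)
    volume, _ = quad(integrand, z_min, z_max)
    return volume

def system_mass(m_cam, v_tub, rho_tub, v_mirrors, rho_mir):
    """Total rig mass m_sys of Equations (16) and (17)."""
    return m_cam + v_tub * rho_tub + v_mirrors * rho_mir
```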

3. Projective Geometry

3.1. Analytical Solutions to Projection (Forward)

Assuming a central catadioptric configuration for the mirrors and camera system (Section 2.2), we derive the closed-form solution to the imaging process (forward projection) for an observable point P w , positioned in three-dimensional Euclidean space, R 3 , with respect to the reference frame, C , as vector C p w = [ x w , y w , z w ] T . In addition, we assume all reference frames such as F 1 and F 2 have the same orientation as C .
For mathematical stability, we must constrain all projecting world points to lie outside the mirror's volume:
$$f_{r_i}\left(z_w\right) < \rho_w, \quad\text{where}\quad \rho_w = \sqrt{x_w^2 + y_w^2}$$
where f r i is defined by Equation (14) and ρ w measures the horizontal range to P w .
$P_w$ is imaged at pixel position $^{I}\mathbf{m}_1$ after its reflection as point $P_1$ on the hyperboloidal surface of mirror 1 (Figure 4). On the other hand, the second image point's position, $^{I}\mathbf{m}_2$, due to reflection point $P_2$ on mirror 2, is obtained indirectly after an additional point $P_r$ is reflected at $^{C}\mathbf{p}_{ref}$ on the reflex mirror, as represented by Equation (32).
First, for $P_w$'s reflection point via mirror 1 at position vector $^{C}\mathbf{p}_1$, we use $\lambda_1$ as the parametrization term for the line passing through $F_1$ toward $P_w$ with direction $^{F_1}\mathbf{d}_1 = {}^{C}\mathbf{p}_w - {}^{C}\mathbf{f}_1$. The position of any point $P_1$ on this line is given by:
$$^{C}\mathbf{p}_1 = {}^{C}\mathbf{f}_1 + \lambda_1\, {}^{F_1}\mathbf{d}_1$$
Substituting Equation (23) into Equation (11), we obtain:
$$\left(\lambda_1\left(z_w - c_1\right) + \frac{c_1}{2}\right)^2 - \left(\lambda_1^2 x_w^2 + \lambda_1^2 y_w^2\right)\left(\frac{k_1}{2} - 1\right) - \frac{c_1^2}{4}\,\frac{k_1 - 2}{k_1} = 0$$
in order to solve for λ 1 , which turns out to be
$$\lambda_1 = \frac{c_1}{\left\lVert{}^{F_1}\mathbf{d}_1\right\rVert \sqrt{k_1\left(k_1 - 2\right)} - k_1\left(z_w - c_1\right)}$$
where $\left\lVert{}^{F_1}\mathbf{d}_1\right\rVert = \sqrt{x_w^2 + y_w^2 + \left(z_w - c_1\right)^2}$ is the Euclidean distance between $P_w$ and mirror 1's focus, $F_1$.
In practice, we represent the reflection point's position $^{C}\mathbf{p}_1$ as a matrix-vector multiplication between the $3\times4$ transformation matrix $K_1 = \left[\lambda_1 I_{(3)},\; \left(1 - \lambda_1\right){}^{C}\mathbf{f}_1\right]$ and the point's position vector $^{C}\mathbf{p}_{w,h} = [x_w, y_w, z_w, 1]^T$ in homogeneous coordinates:
$$^{C}\mathbf{p}_1 = K_1\, {}^{C}\mathbf{p}_{w,h}$$
Note that C p 1 ’s elevation angle, θ 1 , must be bounded as
$$\theta_{1,min} \le \theta_1 \le \theta_{1,max}$$
where θ 1 , m i n and θ 1 , m a x are the angular elevation limits for the real reflective area of the hyperboloid.
Finally, the reflection point $P_1$ with position $^{C}\mathbf{p}_1$ can now be perspectively projected as a pixel point located at $^{I}\mathbf{m}_1 = [u_1, v_1]^T$ on the image. In fact, the entire imaging process of $P_w$ via mirror 1 can be expressed in homogeneous coordinates as:
$$^{I}\mathbf{m}_{1,h} = \zeta_1\, K_c\, K_1\, {}^{C}\mathbf{p}_{w,h}$$
where the scalar $\zeta_1 = 1/z_1 = 1/\left(c_1 + \lambda_1\left(z_w - c_1\right)\right)$ is the perspective normalizer that maps the principal ray passing through $P_1$ onto a point $^{C}\mathbf{q}_1 = [x_{q_1}, y_{q_1}, 1]^T$ on the normalized projection plane $\hat{\pi}_{img_1}$. The traditional $3\times3$ intrinsic matrix of the camera's pinhole model is
$$K_c = \begin{bmatrix} f_u & s & u_c\\ 0 & f_v & v_c\\ 0 & 0 & 1 \end{bmatrix}$$
in which $f_u = f/h_x$ and $f_v = f/h_y$ are based on the focal length $f$ and the pixel dimensions $(h_x, h_y)$, $s$ is the skew parameter, and $^{I}\mathbf{m}_c = [u_c, v_c]^T$ is the optical center position on the image $I$. Figure 4 illustrates the projection point $^{C}\mathbf{q}_1$ on the respective image plane $\pi_{img_1}$.
Similarly, we provide the analytical solution for the forward projection of P w via mirror 2 by first considering the position of reflection point P 2 :
$$^{C}\mathbf{p}_2 = K_2\, {}^{C}\mathbf{p}_{w,h}$$
where $K_2 = \left[\lambda_2 I_{(3)},\; \left(1 - \lambda_2\right){}^{C}\mathbf{f}_2\right]$ is analogous to the transformation matrix $K_1$, but it now uses $^{C}\mathbf{f}_2$ and
$$\lambda_2 = \frac{c_2}{\left\lVert{}^{F_2}\mathbf{d}_2\right\rVert \sqrt{k_2\left(k_2 - 2\right)} + k_2\left(z_w - \left(d - c_2\right)\right)}$$
with direction vector's norm
$$\left\lVert{}^{F_2}\mathbf{d}_2\right\rVert = \left\lVert{}^{C}\mathbf{p}_w - {}^{C}\mathbf{f}_2\right\rVert = \sqrt{x_w^2 + y_w^2 + \left(z_w - \left(d - c_2\right)\right)^2}$$
For completeness, note that the physical projection via mirror 2 is incident to the reflex mirror at
$$^{C}\mathbf{p}_{ref} = {}^{C}\mathbf{f}_{2v} + \lambda_{ref}\left({}^{C}\mathbf{p}_2 - {}^{C}\mathbf{f}_{2v}\right)$$
where $\lambda_{ref} = \dfrac{d}{2\left(d - z_2\right)}$ according to Equation (2) in the theoretical model. Ultimately, ignoring any astigmatism and chromatic aberrations introduced by the reflex mirror, and because the same (and only) real camera with $K_c$ is used for imaging, we obtain the projected pixel position $^{I}\mathbf{m}_{2,h} = [u_2, v_2, 1]^T$:
$$^{I}\mathbf{m}_{2,h} = \zeta_2\, K_c\, K_{ref}\, K_2\, {}^{C}\mathbf{p}_{w,h}$$
where $\zeta_2 = 1/\left(d - z_2\right)$ is the perspective normalizer used to find $^{C'}\mathbf{q}_2$ on the normalized projection plane, $\hat{\pi}_{img_2}$.
Due to planar mirroring via the reflex mirror, $^{C'}_{C}K_{ref}$ is used to change the coordinates of $P_2$ from $C$ onto the virtual camera frame, $C'$, located at $^{C}\mathbf{f}_{2v}$. Hence,
$$^{C'}_{C}K_{ref} = \left[\, -I_{(3)} + 2\, D_{\hat{n}_{ref}}\;,\;\; {}^{C}\mathbf{f}_{2v} \,\right]$$
where the $3\times1$ unit normal vector of the reflex mirror plane, $^{C}\hat{\mathbf{n}}_{ref}$, given in Equation (1), is mapped into its corresponding $3\times3$ diagonal matrix $D_{\hat{n}_{ref}}$ via the relationship:
$$D_{\hat{n}_{ref}} \equiv I_{(3)} - \mathrm{diag}\left({}^{C}\hat{\mathbf{n}}_{ref}\right)$$
It is convenient to define the forward projection functions $f_{\varphi_1}({}^{C}\mathbf{p})$ and $f_{\varphi_2}({}^{C}\mathbf{p})$ for a 3D point $P$ whose position vector is known with respect to $C$ and which is situated within the vertical field of view $\alpha_i$ of mirror $i$ (for $i \in \{1, 2\}$) indicated in Figure 5. Function $f_{\varphi_i}({}^{C}\mathbf{p})$ maps $^{C}\mathbf{p}$ to image point $^{I}\mathbf{m}_i$ on frame $I$, such that $f_{\varphi_i}: \mathbb{R}^3 \to \mathbb{R}^2$,
$$f_{\varphi_i}\left({}^{C}\mathbf{p}\right) := \begin{cases} {}^{C}\mathbf{p} \overset{\text{Eq. (27)}}{\longmapsto} {}^{I}\mathbf{m}_1 & \text{if } i = 1 \,\wedge\, \text{Equations (37) and (22)}\\[1ex] {}^{C}\mathbf{p} \overset{\text{Eq. (33)}}{\longmapsto} {}^{I}\mathbf{m}_2 & \text{if } i = 2 \,\wedge\, \text{Equations (37) and (22)}\\[1ex] \text{None} & \text{otherwise}\end{cases}$$
In fact, I m i is considered valid if it is located within the imaged radial bounds, such that:
$$\left\lVert {}^{I_{C_i}}\mathbf{m}_{r_{i,min}} \right\rVert \;\le\; \left\lVert {}^{I_{C_i}}\mathbf{m}_i \right\rVert \;\le\; \left\lVert {}^{I_{C_i}}\mathbf{m}_{r_{i,max}} \right\rVert$$
where the frame of reference $I_{C_i}$ implies that its origin is the image center $^{I_i}\mathbf{m}_c = [u_{c_i}, v_{c_i}]^T$ of the masked image $I_i$ (Figure 7). Therefore, the magnitude (norm) of any position $^{I_{C_i}}\mathbf{m}$ in pixel space can be measured as
$$\left\lVert {}^{I_{C_i}}\mathbf{m} \right\rVert := \left\lVert {}^{I_i}\mathbf{m} - {}^{I_i}\mathbf{m}_c \right\rVert = \sqrt{\left(u - u_c\right)^2 + \left(v - v_c\right)^2}$$
In particular, $\left\lVert{}^{I_{C_i}}\mathbf{m}_{r_{i,lim}}\right\rVert$ is the image radius obtained from the projection $^{I}\mathbf{m}_{r_{i,lim}} \leftarrow f_{\varphi_i}\left({}^{C}\mathbf{p}_{i,lim}\right)$ corresponding to a particular point coincident with the line of sight of the radial limit $r_{i,lim}$, it being either $r_{sys}$, $r_{ref}$, or $r_{cam}$ as indicated by Equation (6).
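For illustration, the forward projection via mirror 1 (Equations (23)–(27)) amounts to a few vector operations; the Python sketch below assumes an ideal, distortion-free pinhole with made-up intrinsic values and omits the validity checks of Equations (22) and (37).

```python
import numpy as np

def project_via_mirror1(p_w: np.ndarray, c1: float, k1: float,
                        K_c: np.ndarray) -> np.ndarray:
    """Pixel of world point p_w (3,) seen through mirror 1 (Equation (27))."""
    f1 = np.array([0.0, 0.0, c1])             # primary focus F1
    d1 = p_w - f1                             # ray direction F1 -> P_w
    norm_d1 = np.linalg.norm(d1)
    # lambda_1 of Equation (25)
    lam1 = c1 / (norm_d1 * np.sqrt(k1 * (k1 - 2.0)) - k1 * (p_w[2] - c1))
    p1 = f1 + lam1 * d1                       # reflection point on mirror 1
    m1_h = K_c @ p1 / p1[2]                   # perspective projection, zeta_1 = 1/z_1
    return m1_h[:2]                           # pixel coordinates [u1, v1]

# Usage with an assumed intrinsic matrix (focal length and center are made up):
K_c = np.array([[800.0,   0.0, 1296.0],
                [  0.0, 800.0,  972.0],
                [  0.0,   0.0,    1.0]])
u1, v1 = project_via_mirror1(np.array([500.0, 200.0, 50.0]), c1=100.0, k1=5.7, K_c=K_c)
```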

3.2. Analytical Solutions to Back Projection

The back projection procedure establishes the relationship between the 2D position of a pixel point I m i = [ u , v ] T on the image I i and its corresponding 3D projective direction vector v i toward the observed point P w in the world.
Initially, the pixel point $^{I}\mathbf{m}_1$ (imaged via mirror 1) is mapped as $Q_1$ onto the normalized projection plane $\hat{\pi}_{img_1}$ with coordinates $^{C}\mathbf{q}_1 = [x_{q_1}, y_{q_1}, 1]^T$ by applying the inverse of the camera intrinsic matrix, Equation (28), as follows:
$$^{C}\mathbf{q}_1 = {}^{C}_{I}K_c^{-1}\, {}^{I}\mathbf{m}_{1,h} = \begin{bmatrix} \dfrac{1}{f_u} & -\dfrac{s}{f_u f_v} & \dfrac{s\, v_c - f_v u_c}{f_u f_v}\\[1.5ex] 0 & \dfrac{1}{f_v} & -\dfrac{v_c}{f_v}\\[1.5ex] 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u_1\\ v_1\\ 1 \end{bmatrix}$$
For simplicity, we assume no distortion parameters exist, so we can proceed with the lifting step along the principal ray that passes through three points: the camera’s pinhole O C , point Q 1 on the projection plane, and the reflection point P 1 (Figure 4). The vector form of this line equation can be written as:
$$^{C}\mathbf{p}_1 = {}^{C}\mathbf{o}_c + t_1\left({}^{C}\mathbf{q}_1 - {}^{C}\mathbf{o}_c\right) = t_1\, {}^{C}\mathbf{q}_1$$
By substituting Equation (40) into Equation (11), we solve for the parameter $t_1$ to get
$$t_1 = \frac{c_1}{k_1 - \left\lVert{}^{C}\mathbf{q}_1\right\rVert \sqrt{k_1\left(k_1 - 2\right)}}$$
where $\left\lVert{}^{C}\mathbf{q}_1\right\rVert = \sqrt{x_{q_1}^2 + y_{q_1}^2 + 1}$ is the distance between $Q_1$ and $O_C$.
Let $^{F_1}\mathbf{v}_1$ be the direction vector leaving focal point $F_1$ toward the world point $P_w$. Through the frame transformation $^{F_1}_{C}T_1\, {}^{C}\mathbf{p}_{1,h}$, we get
$$^{F_1}\mathbf{v}_1 = {}^{F_1}_{C}T_1\, {}^{C}\mathbf{p}_{1,h}, \quad\text{where}\quad {}^{F_1}_{C}T_{1\,(3\times4)} = \left[\, I_{(3)}\;,\; -{}^{C}\mathbf{f}_1 \,\right]$$
for $^{C}\mathbf{p}_{1,h}$ as the homogeneous form of Equation (40). In fact, $^{F_1}\mathbf{v}_1$ provides the back-projected angles (elevation $\theta_1$, azimuth $\psi_1$) from focus $F_1$ toward $P_w$:
$$^{F_1}\theta_1 = \arcsin\left(\frac{z_{v_1}}{\left\lVert{}^{F_1}\mathbf{v}_1\right\rVert}\right) = \arcsin\left(\frac{z_1 - c_1}{\left\lVert{}^{F_1}\mathbf{v}_1\right\rVert}\right)$$
$$^{F_1}\psi_1 = \arctan\left(\frac{y_{v_1}}{x_{v_1}}\right) = \arctan\left(\frac{y_1}{x_1}\right)$$
where $\left\lVert{}^{F_1}\mathbf{v}_1\right\rVert$ is the norm of the back-projection vector up to the mirror surface.
Using the same approach, we lift a pixel point $^{I}\mathbf{m}_2$ imaged via mirror 2. Because the virtual camera $O_{C'}$ located at $^{C}\mathbf{f}_{2v} = [0, 0, d]^T$ uses the same intrinsic matrix $K_c$, we can safely back-project pixel $^{I}\mathbf{m}_2$ to $Q_{2v}$ on the normalized projection plane $\hat{\pi}_{img_2}$ as follows:
$$^{C'}\mathbf{q}_{2v} = {}^{C'}\mathbf{q}_2 = K_c^{-1}\, {}^{I}\mathbf{m}_{2,h}$$
where the inverse of the camera intrinsic matrix, $K_c^{-1}$, is given by Equation (28). Since the reflection matrix $K_{ref}$ defined in Equation (34) is bidirectional due to the symmetric position of the reflex mirror about $C$ and $C'$, we can find the desired position $^{C}\mathbf{q}_{2v}$ with respect to $C$:
$$^{C}\mathbf{q}_{2v} = {}^{C'}_{C}K_{ref}\, {}^{C'}\mathbf{q}_{2v,h}$$
which is equivalent to $^{C}\mathbf{q}_{2v} = \left[x_{q_{2v}}, y_{q_{2v}}, d - 1\right]^T$.
In Figure 4, we can see the principal ray that passes through the virtual camera's pinhole $O_{C'}$ and the reflection point $P_2$, so this line equation can be written as:
$$^{C}\mathbf{p}_2 = {}^{C}\mathbf{f}_{2v} + t_2\left({}^{C}\mathbf{q}_{2v} - {}^{C}\mathbf{f}_{2v}\right)$$
Solving for $t_2$ with Equations (47) and (12), we get
$$t_2 = \frac{c_2}{k_2 - \left\lVert{}^{C'}\mathbf{q}_2\right\rVert \sqrt{k_2\left(k_2 - 2\right)}}$$
where $\left\lVert{}^{C'}\mathbf{q}_2\right\rVert = \sqrt{x_{q_2}^2 + y_{q_2}^2 + 1}$ is the distance between the normalized projection point $Q_2$ and the virtual camera $O_{C'}$ while considering Equation (46). Beware that the newly found location of $P_2$ is given with respect to the real camera frame, $C$.
Again, we obtain the back-projection ray
$$^{F_2}\mathbf{v}_2 = {}^{F_2}_{C}T_2\, {}^{C}\mathbf{p}_{2,h}, \quad\text{where}\quad {}^{F_2}_{C}T_{2\,(3\times4)} = \left[\, I_{(3)}\;,\; -{}^{C}\mathbf{f}_2 \,\right]$$
in order to indicate the direction leaving the primary focus $F_2$ toward $P_w$ through $P_2$. Here, the corresponding elevation and azimuth angles are respectively given by
$$^{F_2}\theta_2 = \arcsin\left(\frac{z_{v_2}}{\left\lVert{}^{F_2}\mathbf{v}_2\right\rVert}\right) = \arcsin\left(\frac{c_2 - t_2}{\left\lVert{}^{F_2}\mathbf{v}_2\right\rVert}\right)$$
$$^{F_2}\psi_2 = \arctan\left(\frac{y_{v_2}}{x_{v_2}}\right) = \arctan\left(\frac{y_2}{x_2}\right)$$
where $\left\lVert{}^{F_2}\mathbf{v}_2\right\rVert = \sqrt{x_2^2 + y_2^2 + \left(c_2 - t_2\right)^2}$ is the magnitude of the direction vector from its reflection point $P_2$.
As done for the (forward) projection, it is convenient to define the back-projection functions $f_{\beta_1}$ and $f_{\beta_2}$ for lifting a 2D pixel point $^{I}\mathbf{m}$ within the radial bounds validated by Equation (37) to its angular components $^{F_i}\left(\theta_i, \psi_i\right)$ with respect to the respective focus frame $F_i$ (oriented like $C$), as indicated by Equations (43), (44), (50) and (51), such that $f_{\beta_i}: \mathbb{R}^2 \to \mathbb{R}^2$,
$$f_{\beta_i}\left({}^{I}\mathbf{m}\right) := \begin{cases} \left({}^{I}\mathbf{m} \overset{\text{Eq. (43)}}{\longmapsto} {}^{F_1}\theta_1,\;\; {}^{I}\mathbf{m} \overset{\text{Eq. (44)}}{\longmapsto} {}^{F_1}\psi_1\right) & \text{if } i = 1\\[1.5ex] \left({}^{I}\mathbf{m} \overset{\text{Eq. (50)}}{\longmapsto} {}^{F_2}\theta_2,\;\; {}^{I}\mathbf{m} \overset{\text{Eq. (51)}}{\longmapsto} {}^{F_2}\psi_2\right) & \text{if } i = 2\\[1.5ex] \text{None} & \text{if } \neg\,\text{Equation (37)}\end{cases}$$
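As a counterpart, the lifting of a pixel imaged via mirror 1 (Equations (39)–(44)) can be sketched as follows; the intrinsic matrix K_c is again a hypothetical stand-in, and atan2 is used here for a full-range azimuth.

```python
import numpy as np

def back_project_mirror1(m1: np.ndarray, c1: float, k1: float,
                         K_c: np.ndarray):
    """Elevation/azimuth (theta_1, psi_1) of pixel m1 = [u, v] via mirror 1."""
    q1 = np.linalg.solve(K_c, np.array([m1[0], m1[1], 1.0]))   # Eq. (39)
    # t_1 of Equation (41)
    t1 = c1 / (k1 - np.linalg.norm(q1) * np.sqrt(k1 * (k1 - 2.0)))
    p1 = t1 * q1                                               # reflection point, Eq. (40)
    v1 = p1 - np.array([0.0, 0.0, c1])                         # ray from focus F1, Eq. (42)
    theta1 = np.arcsin(v1[2] / np.linalg.norm(v1))             # elevation, Eq. (43)
    psi1 = np.arctan2(v1[1], v1[0])                            # azimuth, Eq. (44)
    return theta1, psi1
```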

3.3. Field-of-View

The horizontal FOV is clearly 360° for both mirrors. In other words, azimuths $\psi$ can be measured in the interval $[0, 2\pi]$ rad. As discussed previously, there exists a positive correlation between the vertical field of view (vFOV) angle $\alpha_i$ of mirror $i$ and its profile parameter $k_i$, such that $\alpha_i \to 180°$ as $k_i \to \infty$ (see Figure 9). As demonstrated in Figure 5, $\alpha_i$ is physically bounded by its corresponding elevation angles, $\theta_{i,max}$ and $\theta_{i,min}$. Both vFOV angles, $\alpha_1$ and $\alpha_2$, are computed from their elevation limits as follows:
$$\alpha_1 = \theta_{1,max} - \theta_{1,min}$$
$$\alpha_2 = \theta_{2,max} - \theta_{2,min}$$
The overall vFOV of the system is also given from these elevation limits:
$$\alpha_{sys} = \max\left(\theta_{1,max},\, \theta_{2,max}\right) - \min\left(\theta_{1,min},\, \theta_{2,min}\right)$$
Figure 6 highlights the so-called common vFOV angle, $\alpha_{SROI}$, for the stereo region of interest (SROI), where the same point can be seen from both mirrors so that point correspondences can be found (Section 5). In our model, $\alpha_{SROI}$ can be determined from the three prevailing elevation angles ($\theta_{1,max}$, $\theta_{1,min}$, and $\theta_{2,min}$), such that:
$$\alpha_{SROI} = \theta_{SROI,max} - \theta_{SROI,min}$$
where generally,
$$\theta_{SROI,min} = \max\left(\theta_{1,min},\, \theta_{2,min}\right)$$
$$\theta_{SROI,max} = \min\left(\theta_{1,max},\, \theta_{2,max}\right)$$
The shaded area in Figure 6 illustrates the SROI, which is far-bounded by the set of triangulated points found at the maximum range due to the minimum disparity $\Delta m_{12} = 1\ \mathrm{px}$ in the discrete case (refer to Figure 17), such that
$$\mathbb{P}_{fs} = \left\{ P_w \leftarrow f_\Delta\left(\left(\theta_1, \psi_1\right), \left(\theta_2, \psi_2\right)\right) \;\middle|\; \left(\theta_1, \psi_1\right) \leftarrow f_{\beta_1}\left(\mathbf{m}_1\right) \,\wedge\, \left(\theta_2, \psi_2\right) \leftarrow f_{\beta_2}\left(\mathbf{m}_2\right) \,\wedge\, \Delta m_{12} = 1\ \mathrm{px} \right\}$$
where the functions $f_{\beta_i}$ and $f_\Delta$ are provided in Equations (52) and (89).
The SROI is near-bounded (toward the $Z$-axis of radial symmetry) by its vertices $P_{ns}^{high}$, $P_{ns}^{mid}$ and $P_{ns}^{low}$, which result from the following ray-intersection cases:
(a)
$$P_{ns}^{high} \leftarrow f_\Delta\left(\left(\theta_{1,max}, \psi_1\right), \left(\theta_{2,max}, \psi_2\right)\right)$$
(b)
$$P_{ns}^{mid} \leftarrow f_\Delta\left(\left(\theta_{1,min}, \psi_1\right), \left(\theta_{2,max}, \psi_2\right)\right)$$
(c)
$$P_{ns}^{low} \leftarrow f_\Delta\left(\left(\theta_{1,min}, \psi_1\right), \left(\theta_{2,min}, \psi_2\right)\right)$$
where the intersection function f Δ is implemented for direction rays (or angles) as defined in the Triangulation Section 5.2.
By assuming radial symmetry of the camera's field of view $\alpha_{cam}$, it should allow for a complete view of the mirror surface at its outermost diameter of $2 r_{sys}$ according to Equation (6). Additionally, as depicted in Figure 6, $\alpha_{cam}$ is upper-bounded by the camera hole radius $r_{cam}$ selected according to Equation (78). The following inequality constraint emerges:
$$2\arctan\left(\frac{r_{sys}}{f_{z_1}\left(r_{sys}\right)}\right) \;\le\; \alpha_{cam} \;\le\; 2\arctan\left(\frac{r_{cam}}{f_{z_2}\left(r_{cam}\right)}\right)$$
where the respective functions f z i are defined in Equation (13).
Our specific viewing requirements when mounting the omnidirectional sensor along the central axis of the quadrotor ensure that objects located 15 cm under the rig's base and 1 m away from the central axis can be viewed. Thus, angles $\theta_{1,min}$ and $\theta_{2,min}$ need only be large enough to avoid occlusions from the MAV's propellers (Figure 5) and to produce inner and outer ring images at a useful ratio (Figure 7).

3.4. Spatial Resolution

The resolution of the images acquired by our system is not space-invariant. In fact, an omnidirectional camera producing spatially resolution-invariant images can only be obtained through a non-analytical function of the mirror profile, as shown in [31]. In this section, we study the effect our design has on its spatial resolution, which depends on position parameters like $d$ and $c_i$ introduced in Section 2.1 as well as directly on the characteristics (e.g., focal length $f$) of the camera obtaining the image.
Let $\eta_{cam}$ be the spatial resolution of a conventional perspective camera as defined by Baker and Nayar in [25,29]. It measures the ratio between the infinitesimal element of image area $dA_{pix}$ and the infinitesimal solid angle $d\omega_i$ (usually measured in steradians) that it subtends, where $d\omega_i$ is directed toward a point $P_i$ at an angle $\theta_{i,pix}$ formed with the optical axis $Z_C$ (as shown in Figure 8). Accordingly, we have:
$$\eta_{cam} = \frac{dA_{pix}}{d\omega_i} = \frac{f^2}{\cos^3\theta_{i,pix}}$$
whose value tends to decrease as $\theta_{pix} \to 0$, so higher resolution areas on the sensor plane continuously increase the farther away they get from the optical center imaged at $^{I}\mathbf{m}_c$. For ease of visualization, we plot only the $u$ pixel coordinates corresponding to the 2D spatial resolution $\eta_{2D}$, which is obtained by projecting the solid angle $\Omega$ onto a planar angle $\theta_\Omega$ (the apex angle in 2D of the solid cone of view). This yields $\theta_\Omega = 2\arccos\left(1 - \Omega/(2\pi)\right)$, and we reduce the image area into its circular diameter with $2\sqrt{A/\pi}$. Generally, our conversion from the 3D spatial resolution $\eta$ in $\mathrm{m}^2/\mathrm{sr}$ units to 2D proceeds as follows:
$$\eta_{2D} = \left.\frac{2\sqrt{\eta/\pi}}{\theta_\Omega}\right|_{\Omega = 1\ \mathrm{sr}}$$
where $\theta_\Omega\big|_{\Omega = 1\ \mathrm{sr}} \approx 1.14390752211\ \mathrm{rad}$. More specifically, Equation (59) is manipulated to provide $\eta_{i,cam}$ as the indicator of spatial resolution toward any specific point on the mirror, $^{C}P_i \in \mathcal{M}_i$ according to Equation (8), as follows:
$$\eta_{i,cam} = \begin{cases} f^2\,\dfrac{\left(r_1^2 + z_1^2\right)^{3/2}}{z_1^{\,3}} & \text{if } i = 1\\[2.5ex] f^2\,\dfrac{\left(r_2^2 + \left(d - z_2\right)^2\right)^{3/2}}{\left(d - z_2\right)^{3}} & \text{if } i = 2\end{cases}$$
where $r_i$ is the radial length defined in Equation (6) with its associated $z_i$ coordinate, $f$ is the camera's focal length, and the design parameters $d$ and $c_i$ relate to the position of the mirror focal points $F_i$ with respect to the camera frame $C$.
Thus, for a conventional perspective camera, $\eta_{i,cam}$ grows as $\theta_{i,pix} \to \pi/2$ due to the foreshortening effect that stretches the image representation around the sensor plane's periphery, where spatial information gets collected onto a larger number of pixels. Therefore, image areas farther from the optical axis are considered to have higher spatial resolutions.
Baker and Nayar also defined the resolution, η i , of a catadioptric sensor in order to quantify the view of the world or d ν i , an infinitesimal element of the solid angle subtended by the mirror’s effective viewpoint F i , which is consequently imaged onto a pixel area d A pix . Again, here we provide the resolution according to our model:
$$\eta_1 = \frac{dA_{pix}}{d\nu_1} = \frac{r_1^2 + \left(c_1 - z_1\right)^2}{r_1^2 + z_1^2}\;\eta_{1,cam}$$
$$\eta_2 = \frac{dA_{pix}}{d\nu_2} = \frac{r_2^2 + \left(c_2 - d + z_2\right)^2}{r_2^2 + \left(d - z_2\right)^2}\;\eta_{2,cam}$$
for our mirror-perspective camera configuration, where O C is the origin of coordinates as shown in Figure 8 and η i , c a m is given in Equation (61).
As demonstrated by the plot of Figure 12 in Section 4.2.2, η i grows accordingly towards the periphery of each mirror (the equatorial region). This aspect of our sensor design is very important because it indicates that the common field of view, α S R O I , where stereo vision is employed (Section 5), is imaged at a relatively higher resolution than the unused polar regions closer to the optical axis (the Z C axis).
If we modify η i by substituting r i with its equivalent f r i ( z i ) function defined in Equation (14), using mirror 1 for example, we get:
$$\eta_1 = \frac{f_{r_1}^2\left(z_1\right) + \left(c_1 - z_1\right)^2}{f_{r_1}^2\left(z_1\right) + z_1^2}\;\eta_{1,cam} = f^2\,\frac{\sqrt{f_{r_1}^2\left(z_1\right) + z_1^2}\;\left(f_{r_1}^2\left(z_1\right) + \left(c_1 - z_1\right)^2\right)}{z_1^{\,3}}$$
which indicates how the resolution $\eta_i$ for a reflection point $P_i$ increases with $k_i$ (Figure 11). Conversely, the smaller the $k_i$ parameter gets (related to eccentricity as discussed in Section 2.2), the flatter the mirror becomes, so its resolution resembles more that of the perspective camera alone. Mathematically, $\lim_{k_i \to 2} \eta_i = \eta_{i,cam}$.
As shown in Figure 9, a smaller k i would require a wider radius r s y s in order to achieve the same omnidirectional vertical field of view, α s y s . Even worse, in order to image such a wider reflector, either the camera’s field of view, α c a m , would have to increase (by decreasing the focal length f and perhaps requiring a larger camera hole r c a m and sensor size), or the distance c i between the effective pinhole and the viewpoint would have to increase accordingly. Another consequence is the effect on the baseline b, which must change in order to maintain the same vertical field of view (Figure 10). As a result, the depth resolution of the stereo system would suffer as well.
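For a quick numerical look at Equations (61) and (62), the sketch below samples η1 along mirror 1's profile; it reuses the hypothetical f_z helper from the Section 2.2 sketch, and every numeric value here is a placeholder rather than a design value from Table 1.

```python
import numpy as np

def eta_cam_mirror1(r1: float, z1: float, f: float) -> float:
    """Perspective-camera resolution toward a point on mirror 1 (Equation (61))."""
    return f * f * (r1 * r1 + z1 * z1) ** 1.5 / z1 ** 3

def eta_mirror1(r1: float, z1: float, c1: float, f: float) -> float:
    """Catadioptric resolution eta_1 of Equation (62)."""
    ratio = (r1 * r1 + (c1 - z1) ** 2) / (r1 * r1 + z1 * z1)
    return ratio * eta_cam_mirror1(r1, z1, f)

# Sample the profile between two made-up radial bounds (z0_1 = c1/2 as in Equation (9)).
c1, k1, f = 100.0, 5.7, 800.0
z0_1 = c1 / 2.0
for r in np.linspace(20.0, 37.0, 5):
    z = f_z(r, c1, k1, z0_1, mirror=1, r_min=20.0, r_max=37.0)
    print(f"r = {r:5.1f} mm  ->  eta_1 = {eta_mirror1(r, z, c1, f):12.1f}")
```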

4. Parameter Optimization and Prototyping

The nonlinear nature of this system makes it very difficult to balance its desirable performance aspects. The optimal vector of design parameters, $\boldsymbol{\theta}^*$, can be found by posing a constrained maximization problem for the objective function
$$f_b\left(\boldsymbol{\theta}\right) = c_1 + c_2 - d$$
which measures the baseline according to Equation (3). Indeed, the optimization problem is subject to the set of constraints C, which we enumerate in Section 4.1. Formally,
$$\boldsymbol{\theta}^* = \arg\max_{\boldsymbol{\theta} \in \Theta} f_b\left(\boldsymbol{\theta}\right) \quad\text{subject to}\quad C$$
where $\Theta \subset \mathbb{R}^6$ is the 6-dimensional solution space for $\boldsymbol{\theta} \in \mathbb{R}^6$ given in Equation (4) as $\boldsymbol{\theta} = \left[c_1, c_2, k_1, k_2, d, r_{sys}\right]^T$.

4.1. Optimization Constraints

We discuss the constraints that the proposed omnistereo sensor is subject to. Overall, we mainly take the following into account:
(a)
geometrical constraints — including SVP and reflex constraints described by Equations (11), (12) and (2);
(b)
physical constraints — the rig’s dimensions, which include the mirrors radii as well as by-product parameters such as system height h s y s and mass m s y s ;
(c)
performance constraints — the spatial resolution and range from triangulation determined by parameters k 1 , k 2 , and c 1 ; the desired viewing angles for an optimal SROI field of view, α S R O I .
Following the design model described throughout Section 2, we now list the pertinent linear and nonlinear constraints that compose the set C. We split C into a subset of linear constraints, $C_L$, and a subset of non-linear constraints, $C_{NL}$, so $C = C_L \cup C_{NL}$. Within each subset, we generalize equality constraints as functions $h: \mathbb{R}^6 \to \mathbb{R}$ that obey
$$h\left(\boldsymbol{\theta}\right) = 0$$
whereas inequality functions $g: \mathbb{R}^6 \to \mathbb{R}$ satisfy
$$g\left(\boldsymbol{\theta}\right) \le 0$$

4.1.1. Linear Constraints

We have only set up linear inequalities for the constraints in $C_L$. Specifically, we require the following:
g1:
In order to set the position of F 2 below the origin O C of the pinhole camera frame C , the focal distance c 2 of mirror 2 must be larger than d (distance between O C and F 2 v ),
$$d \le c_2$$
g2:
Because the hyperboloidal mirror should reflect light towards its effective viewpoint F 1 without being occluded by the reflex mirror, mirror 1’s focal distance, c 1 , needs to exceed the placement of the reflex mirror,
$$d/2 \le c_1$$
g3:
The empirical constraint
$$\frac{5}{3} \le \frac{k_2}{k_1}$$
pertains to our rig dimensions in order to assign a greater curvature to mirror 2's profile (located at the bottom), so its view is directed toward the equatorial region rather than up. Complementarily, this constraint flattens mirror 1's profile, so it can possess a greater view of the ground. This curvature inequality allows the SROI to be bounded by a wider vertical field of view when the sensor must be mounted above the MAV's propellers, as depicted in Figure 5.

4.1.2. Non-Linear Constraints

For the non-linear design constraints, we establish the following inequalities:
g4:
The AscTec Pelican quadrotor has a maximum payload of 650 g (according to the manufacturer specifications [28]). Therefore, we must bound the system mass computed via Equation (16), such that
$$m_{sys} \le 650\ \mathrm{g}$$
g5:
Similarly, we limit the system’s height obtained with Equation (15) by a height limit h s y s , m a x ,
$$h_{sys} \le h_{sys,max}$$
For example, we set $h_{sys,max} = 150\ \mathrm{mm}$ for the 37 mm-radius rig.
g6:
The origin of coordinates for the camera frame is set at its viewpoint, O C . In order to fit the camera enclosure under mirror 2, it is realistic to position the focus F 2 on the vertical transverse axis at more than 5 mm away from O C :
$$5 \le z_{0_2} - a_2$$
where z 0 2 is defined in Equation (10), and a 2 pertains to Equation (5).
Next, we determine the bounds for the limiting angles that partake in the computation of the system's vertical field of view, $\alpha_{sys}$, which is based on Equation (54). Our application has specific viewing requirements that can be achieved with the following conditions:
g7:
Let $\Lambda_{1,max} = 14°$ be an acceptable upper bound for angle $\theta_{1,max}$, such that
$$\theta_{1,max} \le \Lambda_{1,max}$$
g8:
Because we desire a larger view towards the ground from mirror 1, we empirically set $\Lambda_{1,min} = -25°$ as a lower bound for the minimum elevation $\theta_{1,min}$:
$$\Lambda_{1,min} \le \theta_{1,min}$$
g9:
In order to avoid occlusions with the MAV's propellers while still being able to image objects located about 5 cm under the rig's base and 20 cm away (horizontally) from the central axis, we limit mirror 2's lowest angle by a lower bound $\Lambda_{2,min} = -14°$:
$$\Lambda_{2,min} \le \theta_{2,min}$$
Finally, we restrict the radius of the system, $r_{sys}$, to be identical for both hyperboloids by satisfying the following equality condition:
h1:
With functions f r 1 and f r 2 defined in Equation (14), we set
$$r_{sys} = r_{i,max} = f_{r_i}\left(z_{i,max}\right), \quad \forall\, i \in \{1, 2\}$$
where we imply that $z_{i,max} \leftarrow f_{z_i}\left(r_{sys}\right)$ using Equation (13). Thus, the entire function composition for this equality becomes
$$f_{r_1}\left(f_{z_1}\left(r_{sys}\right)\right) = f_{r_2}\left(f_{z_2}\left(r_{sys}\right)\right)$$
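For illustration, the constrained maximization of Equation (65) maps naturally onto a generic nonlinear solver. The sketch below uses SciPy's SLSQP with only the linear constraints g1–g3 spelled out and loose, made-up bounds; it is a schematic of the procedure, not the optimization setup actually used by the authors.

```python
import numpy as np
from scipy.optimize import minimize

def neg_baseline(theta):
    """Negative of the objective f_b(theta) = c1 + c2 - d (Equation (64))."""
    c1, c2, k1, k2, d, r_sys = theta
    return -(c1 + c2 - d)

constraints = [
    # g1: d <= c2        ->  c2 - d >= 0
    {"type": "ineq", "fun": lambda t: t[1] - t[4]},
    # g2: d/2 <= c1      ->  c1 - d/2 >= 0
    {"type": "ineq", "fun": lambda t: t[0] - t[4] / 2.0},
    # g3: 5/3 <= k2/k1   ->  3*k2 - 5*k1 >= 0
    {"type": "ineq", "fun": lambda t: 3.0 * t[3] - 5.0 * t[2]},
    # The nonlinear mass, height and field-of-view constraints (g4-g9, h1)
    # would be added here in the same dictionary form.
]

# theta = [c1, c2, k1, k2, d, r_sys]; bounds and the initial guess are arbitrary.
bounds = [(10, 200), (10, 200), (2.1, 20), (2.1, 20), (5, 150), (20, 40)]
theta0 = np.array([60.0, 80.0, 5.0, 9.0, 40.0, 37.0])
result = minimize(neg_baseline, theta0, method="SLSQP",
                  bounds=bounds, constraints=constraints)
print(result.x, -result.fun)  # optimized parameters and the achieved baseline
```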

4.2. Optimal Results

Applying the aforementioned constraints (Section 4.1) and using an iterative nonlinear optimization method, such as one of those surveyed in [32], a bounded solution vector $\boldsymbol{\theta}^*$ converges to the values shown in Table 1 for two rig sizes. Table 2 contains the by-product parameters corresponding to the dimensions listed in Table 1.
As Figure 3 illustrates, a realistic dimension for the radius of the camera hole, $r_{cam}$, must consider the maximum between the radius of a physical micro-lens ($r_{lens}$) and the radius $r_{\alpha_{cam}}$ required for an unoccluded camera field of view $\alpha_{cam}$ imaging the complete surface of mirror 1 out to $r_{sys}$. Practically,
$$r_{cam} = \max\left(r_{lens},\; r_{\alpha_{cam}}\right)$$
For both rigs, the expected vertical fields of view are $\alpha_{sys} = 75° - (-21°) = 96°$ according to Equation (54), and $\alpha_{SROI} = 14° - (-14°) = 28°$ using Equation (55). Note that $\theta_{2,max}$ may actually be limited by the camera hole radius, so in reality $\theta_{cam} \approx 59°$ and $\alpha_{sys} \approx 80°$. For the big rig, Table 3 shows the nearest vertices of the SROI that result from these angles (Figure 6).

4.2.1. Optimality of Parameters k 1 and k 2

Finally, we study the effect parameter $k_i$ has on the system radius $r_{sys}$ (Figure 9), the omnistereo baseline $b$ (Figure 10), and the spatial resolution (Figure 11 and Figure 12). Figure 9 addresses the relation between $k_i$ and radius $r_{sys}$ (recall the rig size specified in Section 2.3). In Figure 11, it can be seen that, for the same $r_{sys}$, realistic values for $k_1$ fall in the range $3 < k_1 < 13$, and that the vertical field of view $\alpha_1 \to 0$ as $k_1 \to 2$, which is expected according to the SVP property specified in Section 2.2. In fact, the left part of Figure 11 also demonstrates the necessary $r_{sys}$ to maintain $\alpha_{SROI} \approx 28°$ for various values of $k_i$.
Figure 10 shows the inverse relationship between values of $k_1$ and the baseline, $b$, as we attempt to fit the view of a wider/narrower mirror profile (due to $k_1$) within the constant camera field of view, $\alpha_{cam}$. In order to make a fair comparison, let
$$k_1' = k_1 + \varepsilon_k, \quad \forall\, k_1 > 2,\; \varepsilon_k > 0$$
for which we find its new focal distance $c_1'$ while solving for the new $r_{sys}$ and $z_{max}$. Provided with a function such that $c_1' \leftarrow f_{c_1}\left(k_1'\right)$, we perform the analysis for a given $\alpha_{SROI}$ and $\alpha_{cam}$ shown in Figure 10. Given the baseline function $f_b$ defined in Equation (64), the following implication holds true:
$$f_b\left(c_1 \leftarrow f_{c_1}\left(k_1\right)\right) > f_b\left(c_1' \leftarrow f_{c_1}\left(k_1 + \varepsilon_k\right)\right), \quad \forall\, k_1 > 2,\; \varepsilon_k > 0$$
Notice that $k_2$, $c_2$ and $d$ are kept constant throughout this last analysis, and we ignore possible occlusions from the reflex mirror fixed at $d/2$.

4.2.2. Spatial Resolution Optimality

In this section, we compare the sensor’s spatial resolution, η i , defined in Section 3.4 for the optimal parameters listed in Table 1 (for the big rig, only). In Figure 12, we verify how both resolutions η 1 and η 2 increase towards the equatorial region according to the spatial resolution theory presented in [29]. Indeed, the increase in spatial resolution within the SROI that covers the equatorial region (as indicated in Figure 6) justifies our model’s coaxial configuration intended for omnistereo applications.
In Figure 11, we compare the effect on $\eta_i$ of various mirror profiles, which depend directly on $k_i$. We illustrate the change in curvature due to parameters $k_1$ and $k_2$ and also show (in the legend) the respective $r_{sys}$ achieving a common vFOV of $\alpha_{SROI} \approx 28°$, as for the optimal parameters of the big rig. From this plot, we appreciate the compromise that the optimal parameters, $k_1^{(Opt.)} = 5.7$ and $k_2^{(Opt.)} = 9.7$, strike between a realistic system size due to $r_{sys}$ and a suitable range of spatial resolutions, $\eta_i$, within the SROI.

4.3. Prototypes

We validate our design with both synthetic and real-life models.

4.3.1. Synthetic Prototype (Simulation)

After converging to an optimal solution θ * , we employ these parameters (Table 1) to describe synthetic models using POV-Ray, an open-source ray-tracer. We render 3D scenes via the camera of the synthetic omnistereo sensor like the example shown in Figure 2b. The simulation stage plays two important roles in our investigation:
(1)
to acquire ground-truth 3D-scene information in order to evaluate the computed range by the omnistereo system (as explained in Section 5); and
(2)
to provide an almost accurate geometrical representation of the model by discounting some real-life computer vision artifacts such as assembly misalignments, glare from the support tube (motivating the use of standoffs on the real prototype), as well as the camera’s shallow depth-of-field. All of these artifacts can affect the quality of the real-life results shown in Section 6.

4.3.2. Real-Life Prototypes

We have also produced two physical prototypes that can be installed on the Pelican quadrotor (made by Ascending Technologies [28]). Figure 13a shows the rig constructed with hyperboloidal mirrors of $r_{sys} \approx 37\ \mathrm{mm}$ and a Logitech® HD Pro Webcam C910 camera capable of (2592 × 1944) pixel images at 15∼20 FPS. We decided to skip the use of the acrylic glass tube to separate the mirrors at the specified $h_{sys}$ distance, and instead we constructed a lighter 3-standoff mount in order to avoid glare and cross-reflections. This support was designed in 3D-CAD and printed for assembly. The three areas of occlusion due to the 3 mm-wide standoffs are non-invasive for the purpose of omnidirectional sensing and can be ignored with simple masks during image processing. In fact, we stamped fiducial markers on the vertical standoffs to aid with the panorama generation (Section 5.1) and future calibration methods. To image the entire surface of mirror 1, we require a camera with a (minimum) field of view of $\alpha_{cam} > 31°$, which is achieved by $r_{\alpha_{cam}} > 1.4\ \mathrm{mm}$. In practice, as noted by Equation (78), microlenses measure around $r_{lens} \approx 7\ \mathrm{mm}$. Therefore, we set $r_{cam} > 7\ \mathrm{mm}$ as a safe specification to fit a standard microlens through the opening of mirror 2, as shown in Figure 3.
Recall that $m_{sys}$ is limited by the maximum 650 g payload that the AscTec Pelican quadrotor is capable of flying with (according to the manufacturer specifications [28]). The camera with lens weighs approximately 25 g. A cylindrical tube made of acrylic has an average density $\rho_{tub} \approx 1.18\ \mathrm{g\cdot cm^{-3}}$, whereas the mirrors machined out of brass have a density $\rho_{mir} \approx 8.5\ \mathrm{g\cdot cm^{-3}}$. Empirically, we verify a close estimate of the entire system's mass, such that $m_{sys} \approx 550\ \mathrm{g}$ for the big rig, and $m_{sys} \approx 150\ \mathrm{g}$ for the small rig.

5. 3D Sensing from Omnistereo Images

Stereo vision from point correspondences on images at distinct locations is a popular method for obtaining 3D range information via triangulation. Techniques for image point matching are generally divided between dense (area-based scanning [32]) and sparse (feature description [33]) approaches. Due to parallax, the disparity in point positions for objects close to the vision system must be larger than for objects that are farther away. As illustrated in Figure 6, the nearsightedness of the sensor is determined mainly by the common observable space (a.k.a. SROI) acquired by the limiting elevation angles of the mirrors (Section 3.3). In addition, we will see next (Section 5.2) that the baseline b also plays a major role in range computation.
Due to our model's coaxial configuration, we could scan for pixel correspondences radially between a given pair of warped images $(I_1, I_2)$, as in the approach taken by similar works such as [34]. However, it seems more convenient to work in a rectified image space, such as with panoramic images, where the search for correspondences can be performed using any of the various existing methods for perspective stereo views. Hence, we first demonstrate how these rectified panoramic images are produced (Section 5.1) and used for establishing point correspondences. Then, we proceed to study our triangulation method for the range computation from a given set of point correspondences (Section 5.2). Last, we show preliminary 3D point clouds as the outcome of this procedure.

5.1. Panoramic Images

Figure 14 illustrates how we form the respective panoramic image Ξ 1 out of its warped omnidirectional image I 1 . As illustrated in Figure 7, I i is simply the region of interest out of the full image I where projection occurs via mirror i. However, we can safely refer to I because it will never be the case that projections via different mirrors overlap on the same pixel position I m . In a few words, we obtain a panorama Ξ i by reverse-mapping each discretized 3D point P c y l i S c y l i to its projected pixel coordinates I m on I according to Section 3.2.
More thoroughly, for $i \in \{1, 2\}$, $S_{cyl_i}$ is the set of all valid 3D points $P_{cyl_i}$ that lie on an imaginary unit cylinder centered along the Z-axis and positioned with respect to the mirror's primary focus $F_i$. Recall that the radius of a unit cylinder is $r_{cyl} = 1$, so its circumference becomes $w_{cyl} = 2\pi r_{cyl} = 2\pi$. Notice that the imaging ratio, $\chi_{I_{1:2}} = h_{I_1}/h_{I_2}$, illustrated in Figure 7, provides a way of inferring the scale between pairs of point correspondences. However, we achieve conforming scales among both panoramic representations by simply setting both cylinders to an equal height $h_{cyl}$, which is determined from the system's elevation limits, $(\theta_{sys,min}, \theta_{sys,max})$, since they partake in the measurement of the system's vertical field of view given by Equation (54). Hence, we obtain
$$h_{cyl} = z_{cyl,max} - z_{cyl,min}, \quad\text{where}\quad z_{cyl,max} = \tan\left(\theta_{sys,max}\right), \;\; z_{cyl,min} = \tan\left(\theta_{sys,min}\right)$$
Consequently, to achieve panoramic images $\Xi_i$ of the same dimensions while maintaining a true aspect ratio $w_\Xi : h_\Xi$, it suffices to indicate either the width (number of columns) $w_\Xi$ or the height (number of rows) $h_\Xi$ in pixels. Here, we propose a custom method for resolving the panoramic image dimensions by setting the equality for the length $l_{px}$ of an individual "square" pixel on the cylinder (which behaves like a panoramic camera sensor):
$$l_{px} = \frac{w_{cyl}}{w_\Xi} = \frac{h_{cyl}}{h_\Xi}$$
For instance, if the width $w_\Xi$ is given, then the height is simply $h_\Xi = w_\Xi\, h_{cyl} / w_{cyl}$.
To increase the processing speed for each panoramic image $\Xi_i$, we fill its corresponding look-up table $\mathrm{LUT}_{\Xi_i}$ of size $w_\Xi \times h_\Xi$, which encodes the mapping from each panoramic pixel coordinate $^{\Xi_i}\mathbf{m} = {}^{\Xi_i}[u, v]^T$ to its respective projection $^{I_i}\mathbf{m} = {}^{I_i}[u, v]^T$ on the distorted image $I_i$. Each pixel $^{\Xi_i}\mathbf{m}$ gets associated with its cylinder's 3D point positioned at $^{F_i}\mathbf{p}_{cyl_i}$, which can inherently be indicated by its elevation $^{F_i}\theta_i$ and azimuth $^{F_i}\psi_i$ (relative to the mirror's primary focus $F_i$), as illustrated in Figure 4. Thus, the ray $^{F_i}\mathbf{v}_i$ of a particular 3D point directed by $^{F_i}(\psi_i, \theta_i)$ must pass through $P_{cyl_i}$ in order to get imaged as pixel $^{I}\mathbf{m}_i$.
Since the circumference of the cylinder, $w_{cyl}$, is discretized with respect to the number of pixel columns (the width $w_\Xi$), we use the pixel length $l_{px}$ as the factor to obtain the arc length $l_{\psi_i}$ spanned by the azimuth $^{F_i}\psi_i$ from a given $^{\Xi_i}u$ coordinate on the panoramic image. Generally,
$$^{F_i}\psi_i = \frac{l_{\psi_i}}{r_{cyl}} = \frac{w_{cyl} - {}^{\Xi_i}u\; l_{px}}{r_{cyl}}$$
or simply $^{F_i}\psi_i = 2\pi - {}^{\Xi_i}u\; l_{px}$ for the unit cylinder case.
An order reversal in the columns of the panorama is performed by Equation (82) because we account for the relative position between S c y l i and the projection plane π i m g . For Ξ 1 , Figure 14 depicts the unrolling of the cylindrical panoramic image onto a planar panoramic image. However, note that π i m g is shown from above (or its back) in Figure 14, so the panorama visualization places the viewer inside the cylinder at F 1 .
Similarly, the elevation angle $^{F_i}\theta_i$ is inferred from the row coordinate, $^{\Xi_i}v$, which is scaled to its cylindrical representation by $l_{px}$. Recall that both cylinders have the same height, $h_{cyl}$, computed by Equation (80). By taking into account any row offset from the maximum height position, $^{F_i}z_{cyl,max}$, of the cylinder, we get
$$^{F_i}\theta_i = \arctan\left({}^{F_i}z_{cyl,max} - {}^{\Xi_i}v\; l_{px}\right)$$
Given these angles and assuming coaxial alignment, we evaluate the position vector $^{C}\mathbf{p}_{cyl_i}$ for a point on the panoramic cylinder with respect to the camera frame $C$:
$$^{C}\mathbf{p}_{cyl_i} = r_{cyl}\begin{bmatrix} \cos\left({}^{F_i}\psi_i\right)\\ \sin\left({}^{F_i}\psi_i\right)\\ \tan\left({}^{F_i}\theta_i\right) \end{bmatrix} + {}^{C}\mathbf{f}_i$$
where $r_{cyl} = 1$ for the unit cylinder. Equations (82) and (83), followed by Equation (84), define the process $^{\Xi_i}\mathbf{m} \mapsto {}^{F_i}\left(\psi_i, \theta_i\right) \mapsto {}^{C}\mathbf{p}_{cyl_i}$, whose result is eventually used as the input argument to Equation (36) in order to determine pixel $^{I_i}\mathbf{m}$ via the mapping function $h_{\Xi_i}: \mathbb{R}^2 \to \mathbb{R}^2$,
$$^{I_i}\mathbf{m} \leftarrow h_{\Xi_i}\left({}^{\Xi_i}\mathbf{m}\right) := f_{\varphi_i}\left({}^{C}\mathbf{p}_{cyl_i} \leftarrow {}^{\Xi_i}\mathbf{m}\right)$$
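For illustration, the look-up-table construction reduces to two nested loops over the panorama grid, following Equations (80)–(85); in the sketch below, forward_project_i is a stand-in for the projection function f_φi of Equation (36).

```python
import numpy as np

def build_panorama_lut(w_pano: int, theta_min: float, theta_max: float,
                       f_i: np.ndarray, forward_project_i):
    """LUT mapping panorama pixels to omnidirectional-image pixels (one mirror).

    forward_project_i(p) must return the image pixel [u, v] of a 3D point p
    expressed in the camera frame C (a stand-in for f_phi_i, Equation (36)).
    """
    z_max, z_min = np.tan(theta_max), np.tan(theta_min)       # Eq. (80), unit cylinder
    w_cyl = 2.0 * np.pi                                       # circumference, r_cyl = 1
    l_px = w_cyl / w_pano                                     # square-pixel length, Eq. (81)
    h_pano = int(round((z_max - z_min) / l_px))               # rows preserving aspect ratio
    lut = np.zeros((h_pano, w_pano, 2), dtype=np.float32)
    for v in range(h_pano):
        theta = np.arctan(z_max - v * l_px)                   # elevation, Eq. (83)
        for u in range(w_pano):
            psi = w_cyl - u * l_px                            # azimuth with column reversal, Eq. (82)
            p_cyl = np.array([np.cos(psi), np.sin(psi), np.tan(theta)]) + f_i   # Eq. (84)
            lut[v, u] = forward_project_i(p_cyl)              # Eq. (85)
    return lut
```

The panorama itself can then be rendered by sampling the warped image at the stored coordinates (e.g., with cv2.remap).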

Stereo Matching on Panoramas

We understand that the algorithm chosen for finding matches is crucial to attain correct pixel disparity results. We refer the reader to [35] for a detailed survey of stereo correspondence methods. After comparing various block matching algorithms, we were able to obtain acceptable disparity maps with the semi-global block matching (SGBM) method introduced by [36], which can find subpixel matches in real time. As a result of this stereo block matcher among the pair of panoramic images Ξ 1 , Ξ 2 , we get the dense disparity map Ξ Δ m 12 visualized as an image in Figure 15 and Figure 21a. Note that valid disparity values must be positive ( Δ m 12 Ξ i m 1 > 0 ) and they are given with respect to the reference image, in this case, Ξ 1 . In addition, recall that no stereo matching algorithm (as far as we are aware) is totally immune to mismatches due to several well-known reasons in the literature such as ambiguity of cyclic patterns.
An advantage of the block (window) search for correspondences is that it can be confined to epipolar lines. Unlike the traditional horizontal stereo configuration, our system captures panoramic images whose views differ vertically. As shown in [14], the unwrapped panoramas contain vertical, parallel epipolar lines that facilitate the pixel correlation search. Thus, given a pixel position Ξ 1 m 1 on the reference panorama Ξ 1 and its disparity value Δ m 12 ( Ξ 1 m 1 ) , we can resolve the corresponding pixel coordinate Ξ 2 m 2 on the target image Ξ 2 by simply offsetting the v-coordinate with the disparity value:
\[
{}^{\Xi_2}\mathbf{m}_2 =
\begin{bmatrix}
u_1 \\
v_1 + \Delta m_{12}\!\left({}^{\Xi_1}\mathbf{m}_1\right)
\end{bmatrix}
\tag{86}
\]
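As a sketch of how this search can be carried out in practice with an off-the-shelf matcher (not necessarily the configuration that produced Figure 15), note that OpenCV's SGBM implementation searches along image rows, whereas our epipolar lines run along columns; transposing the panoramas before matching is a simple workaround. The numeric parameters below are illustrative, and depending on which panorama acts as the reference view, the two inputs (or the sign of minDisparity) may need to be swapped so that valid disparities come out positive.

```python
import cv2
import numpy as np

def panorama_disparity(pan1, pan2, num_disp=64, block_size=7):
    """Dense disparity map between two 8-bit grayscale panoramas Xi_1 and Xi_2.

    StereoSGBM matches along rows, so both panoramas are transposed to turn the
    vertical epipolar lines into horizontal ones, and the result is transposed back.
    """
    sgbm = cv2.StereoSGBM_create(
        minDisparity=0,
        numDisparities=num_disp,        # must be a multiple of 16
        blockSize=block_size,
        P1=8 * block_size ** 2,         # smoothness penalties (single-channel input)
        P2=32 * block_size ** 2,
        uniquenessRatio=10,
        speckleWindowSize=100,
        speckleRange=2)
    disp = sgbm.compute(cv2.transpose(pan1), cv2.transpose(pan2))
    return cv2.transpose(disp).astype(np.float32) / 16.0   # fixed-point -> subpixel values

def correspondence(m1, disparity_map):
    """Equation (86): offset the v-coordinate of the reference pixel by its disparity."""
    u1, v1 = m1
    return (u1, v1 + disparity_map[v1, u1])
```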

5.2. Range from Triangulation

Recall the duality by which a point P w can be defined as the intersection of a pair of lines. Regardless of the correspondence search technique employed, such as block stereo matching between the panoramas Ξ i (Section 5.1.1) or feature detection directly on I , we can resolve the correspondence I ( m 1 , m 2 ) . From Equations (42) and (49), we obtain the respective pair of back-projected rays F 1 v 1 , F 2 v 2 emanating from their respective physical viewpoints, F 1 and F 2 , which are separated by the baseline b. We can compute the elevation angles θ 1 and θ 2 using Equations (43) and (50). Then, we triangulate the back-projected rays in order to calculate the horizontal range ρ w defined in Equation (22):
\[
\rho_w = \frac{b \, \cos\theta_1 \cos\theta_2}{\sin\left(\theta_1 - \theta_2\right)}
\tag{87}
\]
Finally, we obtain the 3D position of P w :
\[
{}^{C}\mathbf{p}_w =
\begin{bmatrix}
\rho_w \cos\psi_{12} \\
\rho_w \sin\psi_{12} \\
c_1 + \rho_w \tan\theta_1
\end{bmatrix}
\tag{88}
\]
where ψ 12 is the common azimuthal angle (on the XY-plane) of the coplanar rays, which can be determined from either Equation (44) or Equation (51). Functionally, we define the “naive” intersection function that implements Equations (87) and (88) such that
\[
{}^{C}\mathbf{p}_w \leftarrow f_{\Delta}\!\left(\left(\theta_1, \psi_1\right), \left(\theta_2, \psi_2\right), \boldsymbol{\theta}\right)
\tag{89}
\]
where θ is the vector of model parameters defined in Equation (4); it can be omitted when calling this function because the model parameters should not change (ideally).
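A direct transcription of Equations (87)–(89) reads as follows. It is a sketch under the ideal coplanar-ray assumption: the baseline b and the focus height c 1 are taken from the model parameters (passed explicitly here instead of through the parameter vector θ), and the common azimuth is simply averaged from the two measured azimuths.

```python
import numpy as np

def triangulate_naive(theta1, psi1, theta2, psi2, b, c1):
    """Naive omnistereo triangulation f_Delta of Equation (89).

    theta1, theta2 : elevations of the back-projected rays from F_1 and F_2 [rad]
    psi1, psi2     : azimuths of the two rays [rad] (identical for truly coplanar rays)
    b              : baseline between the mirror foci F_1 and F_2
    c1             : height of focus F_1 along the z-axis of the camera frame [C]
    """
    # Equation (87): horizontal range from the two elevation angles
    rho_w = b * np.cos(theta1) * np.cos(theta2) / np.sin(theta1 - theta2)

    # Common azimuth; averaging is a simple stand-in for Equation (44) or (51)
    psi_12 = 0.5 * (psi1 + psi2)

    # Equation (88): 3D position of P_w expressed in the camera frame [C]
    return np.array([rho_w * np.cos(psi_12),
                     rho_w * np.sin(psi_12),
                     c1 + rho_w * np.tan(theta1)])
```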

5.2.1. Common Perpendicular Midpoint Triangulation Method

Because the coplanarity of these rays cannot be guaranteed (the skew-rays case), a better triangulation approximation that accounts for coaxial misalignment is to find the midpoint of their common perpendicular line segment (as attempted in [23]). As illustrated in Figure 16, we define the common perpendicular line segment G 1 G 2 ¯ as the parametrized vector v 1 ⊥ 2 = λ 1 ⊥ 2 v ^ 1 ⊥ 2 , where v ^ 1 ⊥ 2 is the unit vector normal to the back-projected rays v 1 and v 2 :
\[
\hat{\mathbf{v}}_{1\perp 2} = \frac{\mathbf{v}_1 \times \mathbf{v}_2}{\left\lVert \mathbf{v}_1 \times \mathbf{v}_2 \right\rVert}
\tag{90}
\]
If the rays are not parallel ( || v 1 × v 2 || ≠ 0 ), we can compute the “exact” solution, λ = [ λ G 1 , λ G 2 , λ 1 ⊥ 2 ] T , of the well-determined linear matrix equation
\[
\mathbf{V} \boldsymbol{\lambda} = \mathbf{b}, \quad \text{where} \quad
\mathbf{V} = \left[\mathbf{v}_1, \; -\mathbf{v}_2, \; \hat{\mathbf{v}}_{1\perp 2}\right]
\quad \text{and} \quad
\mathbf{b} = {}^{C}\mathbf{f}_2 - {}^{C}\mathbf{f}_1
\tag{91}
\]
It follows that the location of the midpoint P w G on the common perpendicular v 1 ⊥ 2 with respect to the common frame C is
\[
{}^{C}\mathbf{p}_{wG} = {}^{C}\mathbf{f}_1 + \lambda_{G_1} \, {}^{F_1}\mathbf{v}_1 + \tfrac{1}{2} \lambda_{1\perp 2} \, \hat{\mathbf{v}}_{1\perp 2}
\tag{92}
\]
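A minimal sketch of the midpoint method of Equations (90)–(92), solving the 3 × 3 system of Equation (91) with NumPy; the foci positions and ray directions are assumed to be already expressed in the common frame C.

```python
import numpy as np

def triangulate_midpoint(f1, v1, f2, v2):
    """Common-perpendicular midpoint triangulation (Equations (90)-(92)).

    f1, f2 : 3-vectors, positions of the foci F_1 and F_2 in the common frame [C]
    v1, v2 : 3-vectors, back-projected ray directions emanating from F_1 and F_2
    Returns the midpoint P_wG of the common perpendicular segment G1-G2.
    """
    f1, v1, f2, v2 = (np.asarray(x, dtype=float) for x in (f1, v1, f2, v2))
    cross = np.cross(v1, v2)
    norm = np.linalg.norm(cross)
    if norm < 1e-12:
        raise ValueError("Back-projected rays are (nearly) parallel; no unique midpoint.")

    v_perp = cross / norm                      # Equation (90): unit common perpendicular
    V = np.column_stack((v1, -v2, v_perp))     # Equation (91): V * lambda = f2 - f1
    lam = np.linalg.solve(V, f2 - f1)          # lam = [lambda_G1, lambda_G2, lambda_perp]

    # Equation (92): walk to G1 along ray 1, then half-way along the common perpendicular
    return f1 + lam[0] * v1 + 0.5 * lam[2] * v_perp
```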

5.2.2. Range Variation

Before we introduce an uncertainty model for triangulation (Section 5.3), we briefly analyze how range varies over the possible combinations of pixel correspondences I ( m 1 , m 2 ) on the image I . Here, we demonstrate how a radial variation of the discretized pixel disparity Δ m 12 affects the 3D position of a point obtained from triangulation (Section 5.2). Figure 17 shows the nonlinear behavior of the variation in horizontal range, Δ ρ w , arising from the discrete relation between pixel positions I m i and their respective back-projected (direction) rays obtained from f β i and triangulated via the function f Δ defined in Equation (89). It can be observed that the horizontal range variation Δ ρ w grows quadratically as Δ m 12 → 1 px , the minimum discrete pixel disparity, which yields a maximum horizontal range of ρ w , m a x ≈ 18.28 m (computed analytically). The main plot of Figure 17 covers the small disparity values in the interval Δ m 12 = [ 1 , 20 ] px , whereas the subplot zooms in on the large-disparity cases in the interval Δ m 12 = [ 20 , 100 ] px .
This analysis indicates that triangulation error (e.g., due to false pixel correspondences) may have a severe effect on range accuracy, and this effect increases quadratically with distance, as evidenced by the ≈ 8 m range variation over the disparity interval Δ m 12 = [ 1 , 2 ] px . Also, observe the reconstructed point cloud in Figure 20, where this range-sensing characteristic is most noticeable for faraway points. In fact, the following uncertainty model provides a probabilistic framework for the triangulation error (uncertainty) that agrees with these numerical observations.
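To make the inverse, nonlinear disparity-to-range behavior tangible, the toy sweep below pushes integer disparities through Equation (87). The baseline mimics the big rig, but the fixed elevation θ 1 and the per-pixel elevation step are invented placeholders, so the printed ranges are purely illustrative and are not the values behind Figure 17.

```python
import numpy as np

def horizontal_range(theta1, theta2, b):
    """Equation (87): horizontal range from the two ray elevations."""
    return b * np.cos(theta1) * np.cos(theta2) / np.sin(theta1 - theta2)

# Hypothetical numbers, for illustration only:
b = 0.1316                        # baseline comparable to the big rig [m]
theta1 = np.radians(5.0)          # assumed elevation of the ray from F_1
dtheta_px = np.radians(0.2)       # assumed elevation change per pixel of disparity

disparities = np.arange(1, 11)    # Delta m_12 in pixels
theta2 = theta1 - disparities * dtheta_px
rho = horizontal_range(theta1, theta2, b)

# The range drop between consecutive disparities shrinks rapidly as disparity grows,
# which is the "nonlinear & inverse" relation discussed above.
for d, r, drop in zip(disparities[:-1], rho[:-1], -np.diff(rho)):
    print(f"disparity {d:2d} px -> rho_w = {r:6.2f} m, drop to next pixel = {drop:5.2f} m")
```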

5.3. Triangulation Uncertainty Model

Let f P w be the vector-valued function that computes the 3D coordinates of point P w G with respect to C as the common perpendicular midpoint defined in Equation (92). We express this triangulation function component-wise as follows:
\[
{}^{C}\mathbf{p}_{wG} \leftarrow f_{P_w}\!\left(\mathbf{m}_{12}\right) :=
\begin{bmatrix}
f_{x_w}\!\left(\mathbf{m}_{12}\right) \\
f_{y_w}\!\left(\mathbf{m}_{12}\right) \\
f_{z_w}\!\left(\mathbf{m}_{12}\right)
\end{bmatrix}
\tag{93}
\]
where m 12 = [ u 1 , v 1 , u 2 , v 2 ] T is composed of the pixel coordinates of the correspondence I ( m 1 , m 2 ) upon which the triangulation is based (Section 5.2).
Without loss of generality, we assume a multivariate Gaussian uncertainty for the triangulation, so that the position vector C p w G of any world point is centered at its mean C μ f P w with a 3 × 3 covariance matrix Σ f P w :
\[
{}^{C}\boldsymbol{\mu}_{f_{P_w}} =
\begin{bmatrix} x_w \\ y_w \\ z_w \end{bmatrix}, \qquad
\Sigma_{f_{P_w}} =
\begin{bmatrix}
\sigma_{f_{x_w}}^2 & \sigma_{f_{x_w}}\sigma_{f_{y_w}} & \sigma_{f_{x_w}}\sigma_{f_{z_w}} \\
\sigma_{f_{x_w}}\sigma_{f_{y_w}} & \sigma_{f_{y_w}}^2 & \sigma_{f_{y_w}}\sigma_{f_{z_w}} \\
\sigma_{f_{x_w}}\sigma_{f_{z_w}} & \sigma_{f_{y_w}}\sigma_{f_{z_w}} & \sigma_{f_{z_w}}^2
\end{bmatrix}
\tag{94}
\]
However, since f P w is a non-linear vector-valued function, we linearize it with a first-order Taylor expansion and use its Jacobian matrix to propagate the uncertainty (covariance) as in the linear case:
\[
\Sigma_{f_{P_w}} = \mathbf{J}_{f_{P_w}} \, \Omega_{\mathbf{m}_{12}} \, \mathbf{J}_{f_{P_w}}^{T}
\tag{95}
\]
where the 3 × 4 Jacobian matrix for the triangulation function is
\[
\mathbf{J}_{f_{P_w}} =
\begin{bmatrix}
\dfrac{\partial f_{x_w}}{\partial u_1} & \dfrac{\partial f_{x_w}}{\partial v_1} & \dfrac{\partial f_{x_w}}{\partial u_2} & \dfrac{\partial f_{x_w}}{\partial v_2} \\[2ex]
\dfrac{\partial f_{y_w}}{\partial u_1} & \dfrac{\partial f_{y_w}}{\partial v_1} & \dfrac{\partial f_{y_w}}{\partial u_2} & \dfrac{\partial f_{y_w}}{\partial v_2} \\[2ex]
\dfrac{\partial f_{z_w}}{\partial u_1} & \dfrac{\partial f_{z_w}}{\partial v_1} & \dfrac{\partial f_{z_w}}{\partial u_2} & \dfrac{\partial f_{z_w}}{\partial v_2}
\end{bmatrix}
\tag{96}
\]
and the 4 × 4 covariance matrix of the pixel arguments being
\[
\Omega_{\mathbf{m}_{12}} = \sigma_{px}^2 \, \mathbf{I}_4
\tag{97}
\]
where we assume σ p x = 1 px as the standard deviation of each pixel coordinate in the discretized pixel space. The complete symbolic solution of Σ f P w is too involved to reproduce in this manuscript. However, Figure 18 shows the top view of the covariance ellipsoid drawn at the three- σ f P w level for a point triangulated at approximately ρ w ≈ 100 mm . Figure 19 visualizes uncertainty ellipsoids drawn at the one- σ f P w level for several triangulation ranges. We refer the reader to the end of Section 6.3, where we validate through experimental results with subpixel precision that this 1 pixel deviation assumption is conservative.
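In practice, the Jacobian of Equation (96) does not need to be derived symbolically; a forward-difference approximation is usually adequate. The sketch below propagates the pixel covariance of Equation (97) through an arbitrary triangulation callable, assumed to take the 4-vector m 12 = [ u 1 , v 1 , u 2 , v 2 ] and return the 3D point, as in Equation (93).

```python
import numpy as np

def triangulation_covariance(f_pw, m12, sigma_px=1.0, eps=1e-3):
    """Propagate pixel uncertainty through a triangulation function.

    f_pw     : callable implementing Equation (93); maps [u1, v1, u2, v2] to a 3D point
    m12      : 4-vector with the pixel correspondence coordinates
    sigma_px : standard deviation of each pixel coordinate (Equation (97))
    Returns the 3x3 covariance matrix Sigma_fPw of Equation (95).
    """
    m12 = np.asarray(m12, dtype=float)
    p0 = np.asarray(f_pw(m12), dtype=float)

    # Numerical 3x4 Jacobian (Equation (96)) via forward differences
    J = np.zeros((3, 4))
    for k in range(4):
        m_pert = m12.copy()
        m_pert[k] += eps
        J[:, k] = (np.asarray(f_pw(m_pert), dtype=float) - p0) / eps

    omega = sigma_px ** 2 * np.eye(4)   # Equation (97): isotropic pixel covariance
    return J @ omega @ J.T              # Equation (95): first-order propagation
```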

6. Experimental Results

In this section, we demonstrate the capabilities of the omnistereo sensor to provide 3D information either as dense point clouds or as registrations of sparse 2D features with 3D points. We also evaluate the precision of both projection and triangulation for a set of detected corners on a chessboard whose various 3D poses are given as ground truth.

6.1. Dense 3D Point Clouds

By implementing the process described in Section 5, we begin by visualizing the dense point cloud obtained from the synthetic omnidirectional image given in Figure 2b, whose actual size is 1280 × 960 pixels. The associated panoramic images Ξ i were obtained using the function h Ξ i defined in Equation (85) and are shown in Figure 15. Pixel correspondences ( Ξ 1 m 1 , Ξ 2 m 2 ) on the panoramic representations are mapped via h Ξ i into their respective image positions I ( m 1 , m 2 ) . Then, these are triangulated with f P w given in Equation (93), resulting in the set (cloud) of colored 3D points P Δ visualized in Figure 20. Here, the synthetic scene (Figure 2a) is a room 5.0 m wide (along its X -axis), 8.0 m long (along its Y -axis), and 2.5 m high (along its Z -axis). With respect to the scene center of coordinates, S , the catadioptric omnistereo sensor, C , is positioned at C S t = [ 1.60 , 2.85 , 0.16 ] T in meters.
We also present results from a real experiment using the prototype described in Section 4.3.2 and shown in Figure 13a. The panoramic images and dense point cloud shown in Figure 21 are obtained by implementing the pertinent functions described throughout this manuscript and by holding the SVP assumption of an ideal configuration. We provide these qualitative results as preliminary proof of concept for the proposed sensor after employing a calibration procedure based on the generalized unified model proposed in [37].

6.2. Sparse 3D Points from Features

Using the SURF feature detector and descriptor [38], Figure 22 shows 44 correct matches that are triangulated with Equation (93). Sparse 3D points can be useful for visual odometry applications, where the sensor changes pose and the registered point features can be matched against new images. Please refer to [39] for a tutorial on visual odometry.
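For completeness, a generic detection-and-matching sketch in the spirit of this experiment is shown below; it is not the exact pipeline that produced the 44 matches of Figure 22. SURF lives in the opencv-contrib package (cv2.xfeatures2d) and is patent-encumbered, so a free detector such as ORB can be substituted with minimal changes; the matched pixel pairs would then be mapped back to I and triangulated as before.

```python
import cv2
import numpy as np

def sparse_matches(pan1, pan2, hessian_threshold=400, ratio=0.75):
    """Detect and match SURF features between the two panoramas (8-bit images).

    Returns two (N, 2) arrays of matched pixel coordinates in pan1 and pan2.
    """
    surf = cv2.xfeatures2d.SURF_create(hessianThreshold=hessian_threshold)
    kp1, des1 = surf.detectAndCompute(pan1, None)
    kp2, des2 = surf.detectAndCompute(pan2, None)

    matcher = cv2.BFMatcher(cv2.NORM_L2)
    knn = matcher.knnMatch(des1, des2, k=2)

    # Lowe's ratio test keeps only distinctive matches
    good = [m for m, n in knn if m.distance < ratio * n.distance]
    pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
    pts2 = np.float32([kp2[m.trainIdx].pt for m in good])
    return pts1, pts2
```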

6.3. Triangulation Evaluation

6.3.1. Evaluation of Synthetic Rig

Due to the unstructured nature of the dense point clouds previously discussed, we proceed to triangulate sets of sparse 3D points whose positions with respect to the omnistereo sensor camera frame, C , are known in advance. We synthesize a calibration chessboard pattern G containing m × n square cells for various predetermined poses C G T h . Since the sensor is assumed to be rotationally symmetric, it suffices to experiment with groups of L = 4 chessboard patterns situated at a given horizontal range, so a total of L m n 3D points are available for each range group. Each corner point's position is found with respect to C via the frame transformation C p j , g = C G g T h G g p j for all indices j ∈ { 1 , … , m n } , g ∈ { 1 , … , L } .
Figure 23 shows the set of detected corner points on the image for the group of patterns set at a range of C ρ G = 2 m . We adjust the pattern's cell sizes accordingly so that its points can be safely discerned by an automated corner detector [35]. We systematically establish correspondences of pattern points on the omnidirectional image and proceed to triangulate with Equation (93). For each range group of points, we compute the root-mean-square error (RMSE) of the 3D positions between the observed (triangulated) points C p ˜ j ← f P w ( m ˜ 1 , m ˜ 2 ) and the true (known) points C p j that were used to render the ray-traced image. Table 4 compiles the RMSE and standard deviation (SD) results for groups of patterns whose frames G g are located at specified horizontal ranges C ρ G ∈ [ 0.25 , 8.0 ] m away from C .
We notice that for all the 3D points in the synthetic patterns, we obtained an average error of 0.1 px with a standard deviation σ ˜ p x = 0.05 px for the subpixel detection of corners on the image versus their theoretical values obtained from f φ i defined in Equation (36). This last experiment helps us validate the pessimistic choice of σ p x = 1 px for the discrete pixel space in the triangulation uncertainty model proposed in Section 5.3.
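The RMSE and SD figures reported in Table 4 reduce to comparing each triangulated corner against its known position; a minimal sketch of that bookkeeping, assuming the triangulated and ground-truth points are already paired and expressed in frame C, is:

```python
import numpy as np

def position_error_stats(p_triangulated, p_true):
    """RMSE and standard deviation of the 3D position errors.

    p_triangulated, p_true : (N, 3) arrays of paired points expressed in frame [C].
    """
    errors = np.linalg.norm(p_triangulated - p_true, axis=1)   # Euclidean error per corner
    rmse = np.sqrt(np.mean(errors ** 2))
    return rmse, errors.std()
```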

6.3.2. Evaluation of Real-Life Rig

The following experiment uses L = 5 different poses of a real chessboard pattern with 5 × 8 corner points and a square cell size of 24 mm . As in Section 6.3.1, the evaluated error is the Euclidean norm between the triangulated points and the ground-truth positions of the chessboard poses captured via a motion capture system. The RMSE over all projected points in this set of chessboard patterns is 2.5 pixels with a standard deviation of 1.5 pixels. The RMSE over all triangulated points in this set is 3.5 mm with a standard deviation of 1.4 mm . Figure 24 visually confirms the proximity of the triangulated chessboard poses to the ground-truth pose information.

7. Discussion and Future Work

The portability of the proposed omnistereo sensor is one of its greatest advantages, as discussed in the introduction. The total weight of the big rig using the 37 mm -radius mirrors is about 550 g , so it can be carried by the AscTec Pelican quadrotor within its payload limit of 650 g . The mirror profiles maximize the stereo baseline while obeying the various design constraints such as size and field of view. Currently, the mirrors are custom-manufactured out of brass using CNC machining; however, the system's weight could be reduced dramatically by employing lighter materials.
In reality, it is almost impossible to assemble a perfect imaging system that fulfills the SVP assumption and avoids the triangulation uncertainty studied in Section 5.3, on top of the error already introduced by any feature matching technique. The coaxial misalignment of the folded mirrors-camera system, the defocus blur of the lens, and the undesirable glare from the support tube are all practical caveats we need to overcome for better 3D sensing. As described for the real-life rig, we have avoided the traditional support cylinder in order to work around the cross-reflection and glare issues. Possible vibrations caused by the robot dynamics are reduced by vibration pads placed at the sensor-body interface. Details of our tentative calibration method for vertically-folded omnistereo systems have not been included in the current study so that the reader's attention remains on the sensor characteristics defended by this analysis.
Our ongoing research also focuses on the development of efficient software algorithms for real-time 3D pose estimation from point clouds. Bear in mind that all the experimental results demonstrated in this manuscript rely upon a single camera snapshot. We understand that the narrow vertical field-of-view where stereo vision operates is a limiting factor for dense scene reconstruction from a single image, so we have also considered non-optimal geometries for the quadrotor's view. In fact, increasing the region of interest for stereo (SROI) while maintaining the wide baseline implies an enlargement of each mirror's radius. We believe that our omnidirectional system is more advantageous than forward-looking sensors because it can provide robust pose estimation by extracting 3D point features from all around the scene at once. As in our past work [24], fusing multiple modalities (e.g., stereo and optical flow) is a possibility for resolving the scale-factor problem inherent in performing structure from motion over the non-stereo regions of each mirror (near the poles).
In this work, we performed an extensive study of the proposed omnistereo sensor's properties, such as its spatial resolution and triangulation uncertainty. We validated the projection accuracy of the synthetic model (the ideal case), where 3D points in the world are known exactly. Validating the precision of the real sensor would require a perfectly constructed and assembled device so that point projections could be accepted as the ultimate truth, which is hard to achieve at a low-cost prototyping stage. Although we acquired ground-truth 3D points via a position capture system alone, we deem this insufficient to validate the imaging accuracy of the real sensor, because what is truly being measured is the precision of the calibration method. For reproducibility, source code is available for the implementation of the theoretical omnistereo model, the optimization, and the plots and figures presented in this analysis [40].

Acknowledgments

This work was supported in part by U.S. Army Research Office grant No. W911NF-09-1-0565, U.S. National Science Foundation grant No. IIS-0644127, and a Ford Foundation Pre-doctoral Fellowship awarded to Carlos Jaramillo.

Author Contributions

The work presented in this paper is a collaborative development by all of the authors. C. Jaramillo wrote this manuscript, carried out all the experiments and conceived the extensive analysis of the omnistereo sensor studied here. R.G. Valenti contributed with the analytical derivation of various equations and manuscript revisions. L. Guo established the geometrical model and rules for the constrained optimization of design parameters. J. Xiao funded and guided this entire study and helped with revisions.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Symbolic Notation

P i : a point in R 3 , where the post-subscript i is a unique identifier.
A : a reference frame or image space with origin O A .
A p i : the position vector of P i with respect to reference frame A .
A p i , h : the same position vector in homogeneous coordinates.
I m i : a 2D point or pixel position on the image frame I .
p i : the magnitude (Euclidean norm) of p i .
q ^ : a unit vector, so || q ^ || = 1 .
M i : a 3 × 3 matrix, or M i , h in homogeneous coordinates.
f s : a scalar-valued function that outputs some scalar s.
f v : a vector-valued function for the computation of the vector v .
All coordinate systems obey the right-hand rule unless otherwise indicated.

References

  1. Marani, R.; Renò, V.; Nitti, M.; D’Orazio, T.; Stella, E. A Compact 3D Omnidirectional Range Sensor of High Resolution for Robust Reconstruction of Environments. Sensors 2015, 15, 2283–2308. [Google Scholar] [CrossRef]
  2. Valenti, R.G.; Dryanovski, I.; Jaramillo, C.; Strom, D.P.; Xiao, J. Autonomous quadrotor flight using onboard RGB-D visual odometry. In Proceedings of the International Conference on Robotics and Automation (ICRA 2014), Hong Kong, China, 31 May–7 June 2014; pp. 5233–5238.
  3. Khoshelham, K.; Elberink, S.O. Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors 2012, 12, 1437–1454. [Google Scholar] [CrossRef] [PubMed]
  4. Payá, L.; Fernández, L.; Gil, A.; Reinoso, O. Map building and monte carlo localization using global appearance of omnidirectional images. Sensors 2010, 10, 11468–11497. [Google Scholar] [CrossRef] [PubMed]
  5. Berenguer, Y.; Payá, L.; Ballesta, M.; Reinoso, O. Position Estimation and Local Mapping Using Omnidirectional Images and Global Appearance Descriptors. Sensors 2015, 15, 26368–26395. [Google Scholar] [CrossRef] [PubMed]
  6. Hrabar, S.; Sukhatme, G. Omnidirectional vision for an autonomous helicopter. In Proceedings of the International Conference on Robotics and Automation (ICRA), Taipei, Taiwan, 14–19 September 2003; pp. 3602–3609.
  7. Hrabar, S. 3D path planning and stereo-based obstacle avoidance for rotorcraft UAVs. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Nice, France, 22–26 September 2008; pp. 807–814.
  8. Orghidan, R.; Mouaddib, E.M.; Salvi, J. Omnidirectional depth computation from a single image. In Proceedings of the IEEE International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 1222–1227.
  9. Paniagua, C.; Puig, L.; Guerrero, J.J. Omnidirectional structured light in a flexible configuration. Sensors 2013, 13, 13903–13916. [Google Scholar] [CrossRef] [PubMed]
  10. Byrne, J.; Cosgrove, M.; Mehra, R. Stereo based obstacle detection for an unmanned air vehicle. In Proceedings of the International Conference on Robotics and Automation, Orlando, FL, USA, 15–19 May 2006.
  11. Smadja, L.; Benosman, R.; Devars, J. Hybrid stereo configurations through a cylindrical sensor calibration. Mach. Vis. Appl. 2006, 17, 251–264. [Google Scholar] [CrossRef]
  12. Hartley, R.; Zisserman, A. Multiple View Geometry in Computer Vision, 2nd ed.; Cambridge University Press: Cambridge, UK, 2004. [Google Scholar]
  13. Sturm, P.; Ramalingam, S.; Tardif, J.P.; Gasparini, S.; Barreto, J.P. Camera models and fundamental concepts used in geometric computer vision. Found. Trends®Comput. Graph. Vis. 2010, 6, 1–183. [Google Scholar] [CrossRef]
  14. Gluckman, J.; Nayar, S.K.; Thoresz, K.J. Real-Time Omnidirectional and Panoramic Stereo. Comput. Vis. Image Underst. 1998. [Google Scholar]
  15. Koyasu, H.; Miura, J.; Shirai, Y. Realtime omnidirectional stereo for obstacle detection and tracking in dynamic environments. In Proceedings of the International Conference on Intelligent Robots and Systems (IROS), Maui, HI, USA, 29 October–3 November 2001; Volume 1, pp. 31–36.
  16. Bajcsy, R.; Lin, S.S. High resolution catadioptric omni-directional stereo sensor for robot vision. In Proceedings of the 2003 IEEE International Conference on Robotics and Automation, Taipei, Taiwan, 14–19 September 2003; pp. 1694–1699.
  17. Cabral, E.E.; de Souza, J.C.J.; Hunold, M.C. Omnidirectional stereo vision with a hyperbolic double lobed mirror. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), Cambridge, UK, 23–26 August 2004; pp. 0–3.
  18. Su, L.; Zhu, F. Design of a novel stereo vision navigation system for mobile robots. In Proceedings of the IEEE Robotics and Biomimetics (ROBIO), Hong Kong, China, 5–9 July 2005; pp. 611–614.
  19. Mouaddib, E.M.; Sagawa, R. Stereovision with a single camera and multiple mirrors. In Proceedings of the International Conference on Robotics and Automation, Barcelona, Spain, 18–22 April 2005; pp. 800–805.
  20. Schönbein, M.; Kitt, B.; Lauer, M. Environmental Perception for Intelligent Vehicles Using Catadioptric Stereo Vision Systems. In Proceedings of the European Conference on Mobile Robots (ECMR), Örebro, Sweden, 7–9 September 2011; pp. 1–6.
  21. Yi, S.; Ahuja, N. An Omnidirectional Stereo Vision System Using a Single Camera. In Proceedings of the 18th International Conference on Pattern Recognition (ICPR’06), Hong Kong, China, 20–24 August 2006; pp. 861–865.
  22. Nayar, S.K.; Peri, V. Folded catadioptric cameras. In Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Fort Collins, CO, USA, 23–25 June 1999; pp. 217–223.
  23. He, L.; Luo, C.; Zhu, F.; Hao, Y. Stereo Matching and 3D Reconstruction via an Omnidirectional Stereo Sensor. In Motion Planning; Number 60575024; In-Tech Education and Publishing: Vienna, Austria, 2008; pp. 123–142. [Google Scholar]
  24. Labutov, I.; Jaramillo, C.; Xiao, J. Generating near-spherical range panoramas by fusing optical flow and stereo from a single-camera folded catadioptric rig. Mach. Vis. Appl. 2011, 24, 1–12. [Google Scholar] [CrossRef]
  25. Swaminathan, R.; Grossberg, M.D.; Nayar, S.K. Caustics of catadioptric cameras. In Proceedings of the Eighth IEEE International Conference on Computer Vision (ICCV 2001), Vancouver, BC, Canada, 7–14 July 2001; Volume 2, pp. 2–9.
  26. Jang, G.; Kim, S.; Kweon, I. Single camera catadioptric stereo system. In Proceedings of the Workshop on Omnidirectional Vision, Camera Networks and Nonclassical Cameras (OMNIVIS2005), Beijing, China, 21 October 2005.
  27. Jaramillo, C.; Guo, L.; Xiao, J. A Single-Camera Omni-Stereo Vision System for 3D Perception of Micro Aerial Vehicles (MAVs). In Proceedings of the IEEE Conference on Industrial Electronics and Applications (ICIEA), Melbourne, Australia, 19–21 June 2013.
  28. Ascending Technologies (AscTec). Available online: http://www.asctec.de/en/uav-uas-drones-rpas-roav/ (accessed on 23 May 2014).
  29. Baker, S.; Nayar, S.K. A theory of single-viewpoint catadioptric image formation. Int. J. Comput. Vis. 1999, 35, 175–196. [Google Scholar] [CrossRef]
  30. Nayar, S.K.; Baker, S. Catadioptric Image Formation. In Proceedings of the 1997 DARPA Image Understanding Workshop, New Orleans, LA, USA, May 1997; pp. 1431–1437.
  31. Gaspar, J.; Deccó, C.; Okamoto, J.J.; Santos-Victor, J. Constant resolution omnidirectional cameras. In Proceedings of the OMNIVIS'02 Workshop on Omni-directional Vision, Copenhagen, Denmark, 2 June 2002.
  32. Forsgren, A.; Gill, P.; Wright, M. Interior Methods for Nonlinear Optimization. Soc. Ind. Appl. Math. (SIAM Rev.) 2002, 44, 525–597. [Google Scholar] [CrossRef]
  33. Tuytelaars, T.T.; Mikolajczyk, K. Local Invariant Feature Detectors: A Survey. Found. Trends® Comput. Graph. Vis. 2008, 3, 177–280. [Google Scholar] [CrossRef]
  34. Spacek, L. Coaxial Omnidirectional Stereopsis. In Computer Vision - ECCV 2004; Springer: Berlin/Heidelberg, Germany, 2004; pp. 354–365. [Google Scholar]
  35. Bradski, G.; Kaehler, A. Learning OpenCV: Computer Vision with the OpenCV Library; O'Reilly Media, Inc.: Sebastopol, CA, USA, 2008. [Google Scholar]
  36. Hirschmüller, H. Stereo processing by semiglobal matching and mutual information. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 328–341. [Google Scholar] [CrossRef] [PubMed]
  37. Xiang, Z.; Dai, X.; Gong, X. Noncentral catadioptric camera calibration using a generalized unified model. Opt. Lett. 2013, 38, 1367–1369. [Google Scholar] [CrossRef]
  38. Bay, H.; Ess, A.; Tuytelaars, T.; Vangool, L. Speeded-Up Robust Features (SURF). Comput. Vis. Image Underst. 2008, 110, 346–359. [Google Scholar] [CrossRef]
  39. Scaramuzza, D.; Fraundorfer, F. Visual Odometry Part 1: The First 30 Years and Fundamentals. IEEE Robot. Autom. Mag. 2011, 18, 80–92. [Google Scholar] [CrossRef]
  40. Source Code Repository. Available online: https://github.com/ubuntuslave/omnistereo_sensor_design (accessed on 5 February 2016).
Figure 1. Synthetic and real prototypes for the catadioptric single-camera omnistereo system.
Figure 2. Photo-realistic synthetic scene: (a) Side-view of the quadrotor with the omnistereo rig in an office environment; (b) the image captured by the system’s camera using this pose.
Figure 3. Geometric model and observable design parameters.
Figure 4. Omnistereo projection of a 3D point P w to obtain image points I m 1 and I m 2 .
Figure 5. Vertical Field of View (vFOV) angles: α 1 and α 2 are the individual angles of the mirrors formed by their respective elevation limits θ 1 / 2 , m i n / m a x ; α s y s is the overall vFOV angle of the system; and α S R O I measures the overlapping region conceived between α 1 and α 2 .
Figure 6. A cross section of the SROI (shaded area) formed by the intersection of view rays for the limiting elevations θ 1 / 2 , m i n / m a x . The nearest stereo ( n s ) points are labeled P n s h i g h , P n s m i d and P n s l o w since they are the vertices of the hull that near-bounds the set of usable points for depth computation from triangulation (Section 5.2). See Table 3 for the proposed sensor’s values.
Figure 7. The omnidirectional image I shown in Figure 2b is now annotated with the separate regions of interest in I 1 and I 2 . In addition, we indicate the corresponding radial heights h I 1 and h I 2 of the SROI, so we can determine the imaging ratio χ I 1 : 2 = h I 1 / h I 2 . For the optimal parameter values listed in Table 1, we find that χ I 1 : 2 ≈ 2 .
Figure 8. The spatial resolution for a central catadioptric sensor is the ratio between an infinitesimal image area dA and its corresponding solid angle d ν 1 that views a point P w . (Note: infinitesimal elements are exaggerated in the figure for better visualization.)
Figure 9. The effect that parameter k i (showing mirror 1 only) has over the system radius r s y s for various values of the vertical field of view angle α 1 . In order to maintain a vertical field of view α i that is bounded by z m a x r s y s , the value of r s y s must change accordingly. Inherently, the system’s height, h s y s , and its mass, m s y s , are also affected by k i (see Section 2.3).
Figure 10. The effect that parameter k 1 has over the omnistereo system's baseline b for several common FOV angles ( α S R O I ) and a fixed camera with α c a m . An inverse relationship exists between k and b, as plotted here (using a logarithmic scale for the vertical axis). Intuitively, the flatter the mirror gets ( k → 2 ), the farther F 1 must be translated in order to fit within the camera's view, α S R O I , causing b to increase.
Figure 11. Comparison of k i values and their effect on the spatial resolution η i for i = { 1 , 2 } . For the big rig, the optimal focal dimensions c 1 and c 2 (from Table 1) were used, as well as the angular span of the common vertical FOV, α S R O I ≈ 28 ° . Although the resolution η i ( O p t . ) for the optimal values of k i could be improved by employing smaller k values (the lower-curvature profiles indicated in the left plot of the figure), this would in turn increase the system radius, r s y s , so as to maintain α i (Figure 9). As expected, the plot on the right helps us appreciate how the spatial resolutions, η i , increase towards the equatorial regions ( θ 1 → θ S R O I , m a x and θ 2 → θ S R O I , m i n ).
Figure 12. Using the formula given in Equation (60), we plot the 2D version of the spatial resolution of our proposed omnistereo catadioptric sensor (37 mm -radius rig). Both resolutions η 1 and η 2 increase towards the equatorial region where they are physically limited by r s y s . This verifies the spatial resolution theory given in [29], and it justifies our coaxial configuration useful for omnistereo sensing within the SROI indicated in Figure 6.
Figure 13. Real-life prototype of the omnistereo sensor.
Figure 14. An example of the formation of the panoramic image Ξ 1 out of the omnidirectional image I 1 (showing only the masked region of interest on the back of image plane π i m g 1 ). Any particular ray v 1 , indicated by its elevation and azimuth ( F 1 ψ 1 , θ 1 ) and directed towards the focus F 1 , must traverse the projection cylinder S c y l 1 at point P c y l 1 . More abstractly, the figure also shows how a pixel position Ξ 1 m α on the panoramic pixel space gets mapped from its corresponding pixel position I 1 m α via the function h Ξ 1 defined in Equation (85). Although not drawn to scale, it is crucial to notice the relative orientation between S c y l 1 and the back of the projection plane π i m g 1 where the omnidirectional image I 1 is found.
Figure 15. For the synthetic omnidirectional image I shown in Figure 2b, we generate its pair of panoramic images Ξ 1 , Ξ 2 using the procedure explained in Section 5.1. Note that we only work on the SROI (shown here) to perform a semi-global block match between the panoramas as indicated in Section 5.1.1. The resulting disparity map, Ξ Δ m 12 , is visualized at the bottom as a gray-scale panoramic image normalized about its 256 intensity levels, where brighter colors imply larger disparity values. To distinguish the relative vertical view of both panoramas, we have annotated the row position of the zero-elevation.
Figure 16. The more realistic case of skew back-projection rays ( v 1 , v 2 ) approximates the triangulated point P w by getting the midpoint P w G on the common perpendicular line segment G 1 G 2 ¯ : λ 1 2 v ^ 1 2 . Note that the visualized skew rays were formed from a pixel correspondence pair I ( m 1 , m 2 ) and by offsetting the coordinate u 2 by 15 pixels.
Figure 17. Variation of horizontal range, Δ ρ w , due to change in pixel disparity Δ m 12 on the omnidirectional image, I . There exists a “nonlinear & inverse” relation between the change in depth from triangulation ( Δ ρ w ) and the number of disparity pixels ( Δ m 12 ) available from the omnistereo image pair I 1 , I 2 , which are exclusive subspaces of I .
Figure 18. Top-view of the three-sigma level ellipsoid for the triangulation uncertainty of a pixel pair I ( m 1 , m 2 ) with an assumed standard deviation σ p x = 1 px .
Figure 19. Uncertainty ellipsoids for triangulated points at ranges ρ w ∈ { 0.3 , 0.5 , 1.0 } m .
Figure 20. A 3-D dense point cloud computed out of the synthetic model that rendered the omnidirectional image shown in Figure 2b. Pixel correspondences are established via the panoramic depth map visualized in Figure 15. The 3D point triangulation implements the common perpendicular midpoint method indicated in Section 5.2.1. The position of the omnistereo sensor mounted on the quadrotor is annotated as frame C with respect to the scene’s coordinates frame S . (a) 3D visualization of the point cloud (the quadrotor with the omnistereo rig has been added for visualization only); (b) Orthographic projection of the point cloud to the XY -plane of the visualization grid.
Figure 21. Real-life experiment using the 37 mm -radius prototype and a single 2592 × 1944 -pixel image, where the rig was positioned in the middle of the room observed in Figure 13a. Some landmarks of the scene are annotated as follows: Ⓐ appliances, Ⓑ monitors and shelf, Ⓒ back wall, Ⓓ chair, Ⓔ monitors and shelf, Ⓕ book, Ⓖ monitors, Ⓗ person, Ⓘ hallway, Ⓙ supplies. For the point cloud, the grid size is 0.50 m in all directions and points are thickened for clarity.
Figure 22. Sparse point correspondences for the real-life image from Figure 13b. Point correspondences are identifiable by random colors that persist in both the panoramic image and the respective triangulated 3D points (scaled-up for visualization).
Figure 23. Example of sparse point correspondences detected with subpixel precision from corners on the chessboard patterns around the omnistereo sensor. The size of the rendered images for this experiment is 1280 × 960 pixels. For this example’s patterns, the square cell size is 140 mm . The RMSE for this set of points at C ρ G = 2 m is approximately 15 mm (Table 4).
Figure 24. Visualization of estimated 3D poses for some chessboard patterns using the real-life omnistereo rig. Color annotations: ground-truth poses (green), estimated triangulated poses (red).
Table 1. Optimal System Design Parameters.
Parameter | Big Rig | Small Rig
b = max f b ( θ * ) | 131.61 | 108.92
r s y s [ mm ] | 37.0 | 28.0
c 1 [ mm ] | 123.49 | 104.59
c 2 [ mm ] | 241.80 | 204.34
d [ mm ] | 233.68 | 200.00
k 1 | 5.73 | 6.88
k 2 | 9.74 | 11.47
Table 2. By-product Length Parameters.
Parameter | Big Rig | Small Rig
r r e f [ mm ] | 17.23 | 11.74
r c a m [ mm ] | 7 | 7
h s y s [ mm ] | 150.00 | 120.00
Table 3. Near Vertices of the SROI for the Big Rig.
Vertex | C ρ w [ mm ] | C z w [ mm ]
P n s h i g h | 93.5 | 144.4
P n s m i d | 65.2 | 98.4
P n s l o w | 763.4 | −170.3
Table 4. Results of RMSE from Synthetic Triangulation Experiment.
C ρ G [ m ] | RMSE [ mm ] | SD [ mm ]
0.25 | 0.46 | 0.31
0.50 | 1.20 | 0.71
1.0 | 4.62 | 2.55
2.0 | 14.85 | 9.06
4.0 | 57.67 | 31.34
8.0 | 219.09 | 129.92
