# A Riemannian Geometry Theory of Three-Dimensional Binocular Visual Perception

## Abstract

## 1. Introduction

[Listing] reduced the eye model to a single refracting surface, the vertex of which corresponds to the principal plane and the nodal point of which lies at the centre of curvature. The justification for this model is that the two principal points that lie midway in the anterior chamber are separated only by a fraction of a millimetre and hardly shift during accommodation. Similarly, the two nodal points lie equally close together and remain fixed near the posterior surface of the lens. In the reduced model the two principal points and the two nodal points are combined into a single principal point and a single nodal point. Retinal image sizes may be determined very easily because the nodal point is at the centre of curvature of this single refracting surface. A ray from the tip of an object directed toward the nodal point will go straight to the retina without bending, so object and image subtend the same angle. The retinal image size is found by multiplying the distance from the nodal point to the retina (17.2 mm) by the angle in radians subtended by the object [42] (see Figure 18).
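The image-size rule quoted above can be illustrated with a short Python sketch. The 17.2 mm nodal-point-to-retina distance is taken from the text; the function names, the assumption that the object is centred on the line of sight, and the assumption that its distance is measured from the nodal point are illustrative choices made here, not part of the reduced-eye model itself.

```python
import math

# Distance from the nodal point to the retina quoted in the text (mm).
NODAL_TO_RETINA_MM = 17.2


def visual_angle_rad(object_size_mm: float, distance_mm: float) -> float:
    """Angle (radians) subtended at the nodal point by an object.

    Assumes (hypothetically) that the object is centred on the line of
    sight and that distance_mm is measured from the nodal point.
    """
    return 2.0 * math.atan(object_size_mm / (2.0 * distance_mm))


def retinal_image_size_mm(object_size_mm: float, distance_mm: float) -> float:
    """Retinal image size = nodal-to-retina distance x subtended angle."""
    return NODAL_TO_RETINA_MM * visual_angle_rad(object_size_mm, distance_mm)


if __name__ == "__main__":
    # A 1 m tall object viewed from 10 m.
    size = retinal_image_size_mm(1000.0, 10000.0)
    print(f"retinal image size: {size:.3f} mm")
```

For small angles the subtended angle is close to object size divided by distance, so halving the viewing distance roughly doubles the retinal image size.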

## 2. Preliminaries

#### 2.1. Retinal Coordinates

#### 2.2. Hyperfields

#### 2.3. Retinotopic Connections between Hyperfields and Hypercolumns

#### 2.4. Hypercolumns

#### 2.5. Visual Features Extracted by Cortical Columns

#### 2.6. Gaze and Focus Control

#### 2.7. Singular Value Decomposition as a Model for Visual Feature Extraction

${\mathbf{V}}_{L}$ over all the left ocular dominance minicolumns in the hypercolumns of V1. Similarly, the image-point vectors ${\Sigma}_{R}$ from all the corresponding right hyperfields form a 30-dimensional vector field ${\mathbf{V}}_{R}$ over all the right ocular dominance minicolumns in the hypercolumns of V1. Due to the retinotopic projections between retinal hyperfields and cortical hypercolumns, the vector fields ${\mathbf{V}}_{L}$ and ${\mathbf{V}}_{R}$ can also be thought of as vector fields over the left and right retinal hyperfields, respectively.

${\mathbf{V}}_{L}$ and ${\mathbf{V}}_{R}$ over hypercolumns and over retinal hyperfields in this way facilitates a mathematical framework appropriate for development of a Riemannian geometry theory of binocular vision. But this requires a mechanism for quantifying the depth of objects perceived. These depth measures then provide a coordinate system for the Riemannian manifold on which the above vector fields are defined.

#### 2.8. Depth Perception

#### 2.9. Cyclopean Gaze Coordinates

${r}_{E}$ and the angle $\alpha $ are anatomical parameters that change with growth of the head and eye. Since these parameters influence the geometrical optics of images projected on to the retinas it does not seem unreasonable to suggest that the nervous system is able to model them adaptively through experience, for example, by modelling the relationship between the depth of an object and the size of its image on the retina, and by sensing the change in place of the head required to match the image on one retina with the memorized image on the other. The place and orientation of the head in the environment are encoded by neural activity in the hippocampus and parahippocampus so, referring to Figure 1, the angle ${\theta}_{H}$ of the head relative to the translated external coordinates (${\mathrm{X}}^{\prime}$, ${\mathrm{Y}}^{\prime}$) is known, and the angles of rotation ${\theta}_{L}$ and ${\theta}_{R}$ of the left and right eye within the head are sensed proprioceptively. Using the geometry of Figure 1, it can be shown that these known variables ${\theta}_{H}$, ${\theta}_{L}$, ${\theta}_{R}$, $d$, ${r}_{E}$ and $\alpha $ completely determine the Euclidean distance and angle from each eye to the gaze point as well as the length and direction $\left(r,\theta \right)$ of the cyclopean gaze vector OQ. This can be demonstrated by basic trigonometry (sine rule and cosine rule) of the three triangles ${N}_{L}L{C}_{L}$, ${N}_{R}L{C}_{R}$, and $LQR$. Importantly, this is not to say that the nervous system ‘does’ trigonometry in the same way we do. It is simply to establish that the information available to it is sufficient to determine uniquely the length and direction of the cyclopean gaze vector.
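The sufficiency argument can be illustrated for the triangle LQR alone with a deliberately simplified Python sketch. It is not the full construction of Figure 1: it assumes (hypothetically) that each eye is a single point, so the offsets governed by ${r}_{E}$ and $\alpha $ are neglected, that the eyes are separated by $d$ with the cyclopean origin O midway between them, and that ${\theta}_{L}$ and ${\theta}_{R}$ are measured in the head frame from straight ahead, positive toward the right eye, so ${\theta}_{H}$ does not appear. The function name `cyclopean_gaze` is introduced here for illustration only.

```python
import math


def cyclopean_gaze(theta_l: float, theta_r: float, d: float):
    """Solve triangle LQR for the gaze point Q (simplified geometry).

    Assumptions (not from the paper): eyes are points at (-d/2, 0) and
    (+d/2, 0) in the head frame, the forward direction is +y, and angles
    are in radians, measured from forward and positive toward +x.

    Returns (r, theta, dist_l, dist_r): length and direction of the
    cyclopean vector OQ and the distances from each eye to Q.
    """
    vergence = theta_l - theta_r  # interior angle of triangle LQR at Q
    if vergence <= 0.0:
        raise ValueError("lines of sight do not converge in front of the head")

    # Sine rule: side / sin(opposite angle) is the same for all three sides.
    # Interior angles at L and R are (pi/2 - theta_l) and (pi/2 + theta_r).
    dist_l = d * math.cos(theta_r) / math.sin(vergence)  # |LQ|
    dist_r = d * math.cos(theta_l) / math.sin(vergence)  # |RQ|

    # Cosine rule in triangle L-O-Q, where |LO| = d/2 and the angle at L
    # is (pi/2 - theta_l), whose cosine is sin(theta_l).
    r = math.sqrt(dist_l**2 + (d / 2.0) ** 2 - dist_l * d * math.sin(theta_l))

    # Direction of OQ from the coordinates of Q.
    qx = -d / 2.0 + dist_l * math.sin(theta_l)
    qy = dist_l * math.cos(theta_l)
    theta = math.atan2(qx, qy)
    return r, theta, dist_l, dist_r


if __name__ == "__main__":
    d = 0.065  # illustrative interocular separation in metres
    # Both eyes fixate a point 0.3 m to the right and 0.8 m ahead of O.
    theta_l = math.atan2(0.3 + d / 2.0, 0.8)
    theta_r = math.atan2(0.3 - d / 2.0, 0.8)
    print(cyclopean_gaze(theta_l, theta_r, d))
```

In this reduced setting the two eye angles and the separation $d$ fix the gaze point uniquely whenever the lines of sight converge, which is the sense in which the sensed variables are sufficient to determine $\left(r,\theta \right)$.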

#### 2.10. Cyclopean Coordinates of Peripheral Image Points

## 3. The Three-Dimensional Perceived Visual Space

#### 3.1. Gaze-Based Visuospatial Memory

${\mathbf{V}}_{L}$ and ${\mathbf{V}}_{R}$ of image-point vectors ${\Sigma}_{L}\left({r}_{{a}_{Li}},{\theta}_{{a}_{Li}},{\phi}_{{a}_{Li}}\right)$ and ${\Sigma}_{R}\left({r}_{{a}_{Ri}},{\theta}_{{a}_{Ri}},{\phi}_{{a}_{Ri}}\right)$ over the hypercolumns described in Section 2.10 are replaced by new image-point vectors and by new vector fields associated with the next gaze point in the scanning sequence. To build a visuospatial memory of an environment through scanning we argue that the information encoded by the vector fields ${\mathbf{V}}_{L}$ and ${\mathbf{V}}_{R}$ during a current interval of fixed gaze must be stored before the gaze is shifted and the information lost. Such memory is accumulated over time and scanning of an environment from a fixed place does not have to occur in one continuous sequence. Images associated with different gaze points from a fixed place can be acquired (and if necessary overwritten) in a piecemeal fashion every time the person passes through that given place.

#### 3.2. A Riemannian Metric for the G-Memory

## 4. Quantifying the Geometry of the Perceived Visual Manifold

#### 4.1. The Relationship Between Perceived Depth and Euclidean Distance

#### 4.2. The Geodesic Spray Field

#### 4.3. Covariant Derivatives

#### 4.4. Christoffel Symbols

#### 4.5. The Riemann Curvature Tensor

## 5. Geodesics of the Perceived Visual Manifold

#### 5.1. Simulations

#### 5.2. Initial Planes $\Pi$ Passing through the Egocentre

#### 5.3. Initial Planes $\Pi$ Normal to the Radial Line from the Egocentre to the Initial Point

#### 5.4. Initial Planes $\Pi$ Not Normal to the Radial Line from the Egocentre to the Initial Point and Not Passing Through the Egocentre

#### 5.5. Interpreting Geodesic Simulations

#### 5.6. Euclidean Coordinates versus Perceptual Coordinates

## 6. Binocular Perception of the Size and Shape of Objects

#### 6.1. Seeing the Size of an Object

#### 6.2. Seeing the Outline of an Object

#### 6.3. Seeing the Shape of an Object

## 7. A Geometric Representation of Visuospatial Memory

#### 7.1. The Geometric Structure of G-Memory for a Fixed Place

${\mathbf{V}}_{L}$ and ${\mathbf{V}}_{R}$ in $\Gamma E$ (consisting of all the left and right image-point vectors ${\Sigma}_{L}\left(r,\theta ,\phi \right)$ and ${\Sigma}_{R}\left(r,\theta ,\phi \right)$ over all the image-points $q=\left(r,\theta ,\phi \right)$ in $\left(G,g\right)$ accumulated within a single vector bundle $\pi :\mathrm{E}\to G$ through visual scanning) thus encode a visual image of the entire 3D environment as seen from the given fixed place. However, while we describe vector fields ${\mathbf{V}}_{L}$ and ${\mathbf{V}}_{R}$ as being defined over all image points $q=\left(r,\theta ,\phi \right)$ in $\left(G,g\right)$, it must be kept in mind from Section 6 that the image-point vectors ${\Sigma}_{L}\left(r,\theta ,\phi \right)$ and ${\Sigma}_{R}\left(r,\theta ,\phi \right)$ are only non-zero at those points $q$ that are located on the surfaces of objects. We also note again here that if the visual environment includes reflections then the orientation of the reflected images is reversed and the vector bundle is said to be twisted [114].

${\mathbf{V}}_{L}\left(U\right)$ and ${\mathbf{V}}_{R}\left(U\right)$ confined to open subsets $U$ in $\left(G,g\right)$ can be defined and all the fibres within these can be parallel processed as a unit. Indeed, it is this point processing (i.e., within fibre) nature of computations in Riemannian geometry that makes this geometry so well suited for describing parallel processing in the nervous system.

${\mathbf{V}}_{L}$ and ${\mathbf{V}}_{R}$ in $\Gamma E$ can be regarded as fused into a single binocular vector field $\mathbf{V}$ over $\left(G,g\right)$. For simplicity of description, in subsequent sections we assume that a sufficient number of gaze points have been accumulated through visual scanning for the vector fields ${\mathbf{V}}_{L}$ and ${\mathbf{V}}_{R}$ in $\Gamma E$ to be fused into a single binocular vector field $\mathbf{V}$ over $\left(G,g\right)$ in the vector bundle $\pi :\mathrm{E}\to G$.

#### 7.2. The Geometric Structure of Visuospatial Memory with Place Encoding

#### 7.3. Fibre Bundles and Vector-Bundle Morphisms

#### 7.4. Removing Occlusions

#### 7.5. A Geometric Description of Vector-Bundle Morphisms

## 8. Discussion

#### 8.1. Size Perception

#### 8.2. Shape Perception

#### 8.3. Warped Geometry

#### 8.4. Illusions

#### 8.5. Measuring the Geometry of Perceived Visual Space

#### 8.6. 2D versus 3D Representations

#### 8.7. Visuospatial Memory

#### 8.8. Visuospatial Representation as a Philosophical Issue

## 9. Future Directions

## Author Contributions

## Funding

## Conflicts of Interest

## Appendix A. Extraction of Non-Linear Orthogonal Visual Image Features Using Singular Value Decomposition (SVD)

#### A1. Extraction of Linear Orthogonal SVD Image Features

#### A2. Extraction of Non-Linear Orthogonal SVD Image Features

## Appendix B. Computing Curvatures

#### B1. Gaussian Curvatures and Sectional Curvatures

#### B2. Principal Curvatures, Principal Directions and Perceived Curvatures

