Metric Rectification of Spherical Images

Abstract: This paper describes a method for metric recording based on spherical images, which are rectified to document planar surfaces. The proposed method is a multistep workflow in which multiple rectilinear images are (i) extracted from a single spherical projection and (ii) used to recover metric properties. The workflow is suitable for documenting buildings with small and narrow rooms, i.e., documentation projects where the acquisition of 360° images is faster than the traditional acquisition of several photographs. Two different rectification procedures were integrated into the current implementation: (i) an analytical method based on control points and (ii) a geometric procedure based on two sets of parallel lines. Constraints based on line parallelism can be coupled with the focal length of the rectified image to estimate the rectifying transformation. The calculation of the focal length does not require specific calibration projects. It can be derived from the spherical image used during the documentation project, obtaining a rectified image with just an overall scale ambiguity. Examples and accuracy evaluation are illustrated and discussed to show the pros and cons of the proposed method.


Introduction
Digital metric documentation with images and laser scans is fundamental in projects requiring accurate metric deliverables. Traditional metric deliverables are plans, sections, orthophotos, digital maps, or different types of 3D models such as BIM, surface or solid models, surface meshes, and finite elements models, among others [1]. Nowadays, the use of digital photogrammetry and laser scanning is one of the most common digital surveying methods able to generate accurate geometric records in different formats.
Several papers and textbooks describe both theoretical concepts and practical issues in digital documentation projects of three-dimensional objects carried out using images [2] and laser scans [3]. The reader is referred to [4][5][6][7][8] for some examples related to digital workflows using a combination of instruments and measuring techniques. This paper aims to show a relatively fast alternative solution that can be used with the low-cost 360° cameras available on the commercial market. Low-cost sensors are becoming more popular within the framework of digital photogrammetry because a digital camera can be turned into a measuring tool [9], obtaining 3D models from blocks and sequences of digital images.
The proposed method is not based on traditional frame (also called central perspective or pinhole) cameras. Here, 360° images (also called spherical or equirectangular projections) are used, and the considered objects are planar surfaces, i.e., two-dimensional objects.
Metric documentation of 2D objects is typical in several projects, such as the digital documentation of building facades, mosaics, and paintings, among others. Here, rectified photographs are still a popular solution in several practical applications. The spherical camera model is already available in commercial software for image-based 3D modeling, such as Agisoft Metashape or Pix4Dmapper. However, the reconstruction workflow is mainly developed to deal with three-dimensional objects. In fact, commercial software requires multiple spherical images acquired from different locations, which are then processed to generate 3D textured models or additional outputs (orthophotos, digital elevation models, etc.).
Other examples, including both metric and non-metric projects with spherical images, were discussed by different authors. For instance, heritage documentation is a field in which spherical images allow rapid mapping of buildings and monuments [28][29][30][31], including sites at risk [32]. In [33], spherical images are used as a base for virtual reality applications, whereas an immersive tool is described by [34]. The use of such images and deep learning is discussed in [35]. The work illustrated in [36] uses a low-cost 360° sensor for crime scene documentation. Ref. [37] uses spherical images for biomass estimation in forestry. Integrated use of drones and spherical images is presented in [38].
Spherical images are also available on the web. Different sharing services (e.g., 360Cities.net, Mapillary, Facebook, Kuula, Roundme, and Theta360) offer the opportunity to upload this type of image. Some of them allow data sharing and the creation of virtual tours.
The method presented in this manuscript differs from traditional projects based on spherical photogrammetry. As mentioned, only planar objects are considered. Although 2D metric documentation can be carried out with the traditional photogrammetric workflow (planar objects are sub-cases of 3D projects), there is a lack of software for metric rectification of spherical images.
Metric rectification [39] is a well-known digital recording solution that can be carried out using just a photograph, removing the perspective deformation using external constraints such as control points or information about the geometry (e.g., sets of parallel lines and known aspect ratio). Digital orthophotos created with 3D photogrammetry can compensate for the lack of flat geometry. However, this would require a photogrammetric project with several images featuring good overlap.
This paper tries to avoid the multi-step processing workflow (image matching, bundle adjustment, generation of a dense point cloud, extraction of a mesh or a DSM, and orthophoto production) available in commercial software. The idea is to use a single spherical image and a direct metric rectification approach.
In the case of metric rectification projects carried out with images based on the pinhole camera model, lens distortion is removed beforehand using distortion parameters derived from a specific calibration project. Different commercial packages offer predefined calibration coefficient sets for several camera bodies and lens configurations.
Images acquired with a spherical camera (also called 360° images or equirectangular projections) feature a 360° × 180° field of view, capturing the entire scene around the camera. Low-cost and professional cameras are available on the commercial market and allow the photographer to capture a spherical image, usually generated from a set of pinhole or fisheye lenses mounted on the 360° camera. Although the acquired data are not 360° images, stitching is automatically carried out to provide the final image in real time.
This paper extends what was briefly proposed by [40], in which different procedures for condition mapping in restoration projects were illustrated and discussed. This paper focuses on the concept of metric rectification from 360° images, describing the algorithms in more detail and adding some metric evaluation experiments in order to provide information about the achievable metric accuracy.
The paper is structured as follows. Section 2 illustrates the general workflow of the proposed method for metric rectification of spherical images. The following Sections 3-5 describe the different steps of the implemented solution. Experiments were carried out with low-cost cameras available on the commercial market at a cost of about EUR 500.

The Proposed Method for Metric Rectification of Spherical Images
The procedure for metric rectification is based on a multi-step workflow in which different products are generated from a single spherical image. The workflow starts with the acquisition of a spherical image, which can be captured with low-cost or professional 360° cameras, or with a rotating camera and stitching software capable of mosaicking multiple shots [41,42]. In the case of a rotating camera, the rotation point must be the perspective center to avoid parallax errors during the stitching phase. Figure 1 illustrates the flowchart of the proposed workflow for the extraction of the metric images. An example is proposed here to clarify the implemented solution and the different outputs that can be produced. Figure 2 shows the results achievable with the proposed metric rectification approach using a spherical image retrieved from Google Maps. The image was acquired in a room of the Alcázar of Seville (Spain). Image resolution is 13,312 × 6656 pixels and the covered field of view is 360° × 180°. The following sections will show that a complete 360° × 180° image (i.e., no cropping to limit the field of view) provides all the information required to calculate the internal orientation parameters, namely the center of the sphere in the camera reference system and the focal length f (in pixels).
Six metrically rectified images were generated from the spherical image: four for the lateral walls, one for the ceiling, and one for the floor. As can be seen in Figures 3 and 4, a single spherical image acquired inside a room allows the user to record all the different flat surfaces. It also provides an immersive visualization that allows the user to understand the relative position of the different walls. A recording specialist can therefore avoid the notes that would be needed to locate each photograph in a more traditional rectification project based on standard frame images.
Recovering metric properties means generating a new (metrically) rectified image that provides angles and ratios of distances. The correct scale can be recovered only if some (metric) information is available, such as a known distance measured with a measuring tape or a set of control points collected with a total station. In other words, if there is no external information about the considered surface, the rectified image is affected only by an overall scale ambiguity. This is another advantage of the proposed approach, which exploits the known focal length: a more traditional geometric rectification would instead require a known width-height ratio, i.e., two distances.
A spherical image can be synthetically rotated to change the viewing direction towards specific areas. The user is probably familiar with the traditional "bubble" visualization, in which the viewing direction can be interactively changed by dragging the point of view (POV). The idea is replicated here, pointing the view towards the planar surface requiring rectification.
We can define the following three angles for a leveled 360° image:
• h = heading, rotation around the vertical direction;
• r = roll, rotation around the line of sight;
• p = pitch, rotation around the transverse axis.
Changing the heading h without altering pitch and roll (r = 0°, p = 0°) has the same effect as a user looking left or right. This is the most common way to point the viewing direction towards the center of the vertical wall to be rectified. A variation of the pitch instead rotates the camera up or down, allowing the user to also capture the floor and ceiling.
The user must be aware that the order of rotations (in this case h → r → p) is fundamental. A positive pitch variation corresponds to pointing the POV up if h = 0°. The opposite, i.e., the POV pointing down, is obtained for h = 180°. Roll variations are helpful in refining the captured area, mainly when the spherical image was not acquired in the center of the room.
The image used to rectify the first vertical wall (top-left in Figure 3) was extracted by setting the values POV = (h, r, p) = (0°, −10°, 0°). The extraction is carried out using a gnomonic projection, which projects the points of the spherical image onto a plane tangent to the sphere. The projection center is the center of the sphere.
The user has to define the field of view (FOV) of the new image based on the pinhole camera model (also called central perspective or rectilinear projection). The FOV can be interactively enlarged or reduced to capture the entire wall; for this wall, it was set to 100° × 100°.
The rectilinear image for the ceiling (bottom-middle) has parameters POV = (90°, −90°, −90°) and FOV = 120° × 125°. The floor is the flat surface requiring the largest field of view because the camera is relatively close to the subject. Parameters were set as POV = (0°, 85°, −2°) and FOV = 135° × 135°. More details about the extraction of the rectilinear images are discussed in Section 3. The implementation relies on the Panorama Tools files, which is the library used to develop the Hugin software. Files can be downloaded from https://sourceforge.net/projects/panotools/files/ (last accessed on 10 March 2022).
The last step is metric rectification, which is carried out on the different rectilinear images. Two different approaches are illustrated and discussed in the manuscript. The first method is the typical solution for accurate metric rectification projects to record flat surfaces. Control points are measured with an external instrument, usually a total station. The same points are measured in the rectilinear images, obtaining pixel coordinates and calculating the parameters of the homographic transformation. The details of such an approach and an accuracy evaluation are discussed in Section 4.
The second approach, which was used to generate the metrically rectified images of Figure 4, relies on images with no external information. Section 5 will show that a homography able to rectify a rectilinear image can be estimated from the vanishing line of the image and the focal length of the camera used.
Flat surfaces with at least two sets of parallel lines allow calculating the vanishing line. In contrast, the focal length of the rectilinear image can be derived from the pixel size of the spherical image.
The six surfaces of the considered room were therefore rectified without taking extra on-site measurements, obtaining six images with an overall scale ambiguity (Figure 4). As adjacent surfaces share common edges, they can be scaled relative to each other, which reduces the number of known distances to just one for the entire room.
The bottom-left facade in Figure 4 clearly shows that only deformations on the chosen rectification plane can be compensated. The wall has an internal niche that cannot be rectified with the proposed approach, which only operates on the chosen object plane.

Extraction of Rectilinear Images from a Spherical Projection
This section describes the procedure to extract a rectilinear image from a spherical projection, which can be used during the metric rectification step.
We define a spherical (equirectangular) projection as an $L_x \times L_y$ image obtained by mapping longitude and latitude (spherical) coordinates $(\lambda, \varphi)$ linearly onto the image plane $(u, v)$ (Figure 5). Equirectangular projections cover 360° horizontally and 180° vertically, so the aspect ratio is 2:1 ($L_x = 2L_y$) and one pixel subtends an angle of $2\pi/L_x$ radians in both directions. Let us consider a sphere with center (0, 0, 0) and radius $r = L_x/(2\pi)$. The plane tangent to the sphere at point (r, 0, 0) represents the new rectilinear image. The tangency point has coordinates $(\lambda, \varphi) = (0, 0)$ and lies on the equator. Mapping is based on a gnomonic projection, i.e., a projection from the center of the sphere onto a plane tangent to the sphere.
The horizontal and vertical half-angles $\Delta\lambda$ and $\Delta\varphi$ define the field of view and the size (in pixels) $w \times h$ of the new rectilinear image:

$$w = 2r\tan\Delta\lambda, \qquad h = 2r\tan\Delta\varphi$$

The horizontal and vertical fields of view (FOVs) are therefore $(2\Delta\lambda, 2\Delta\varphi)$. An essential property of the rectilinear image is the relationship between its focal length and the radius of the sphere. Indeed, the focal length of the rectilinear image is equal to r if the mapping is carried out preserving the maximum image resolution. Moreover, no barrel or pincushion distortion needs to be corrected, and the rectilinear image can be considered a novel distortion-free image. Figure 6 shows rectilinear images extracted with different fields of view: the angles were first set to $2\Delta\lambda = 2\Delta\varphi = 15°$ and then progressively increased (together with the image size) to 30°, 50°, 80°, 110°, and 150°. It is recommended to keep the angles under 110°-120° to reduce deformations at the edges. Very large fields of view result in stretched elements close to the image edges, making the resulting rectilinear projection unusable for metric applications.
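As a rough numerical illustration of the link between sphere radius, focal length, and image size, the following Python sketch assumes the tangent-plane geometry described above, so that half of the image width equals r·tan(Δλ). The function name and the rounding to whole pixels are illustrative choices and are not taken from the original implementation.

```python
import math

def rectilinear_size(L_x, fov_h_deg, fov_v_deg):
    """Focal length and size (pixels) of a full-resolution rectilinear image.

    L_x is the width of the equirectangular image; the FOV angles are the
    full angles 2*delta_lambda and 2*delta_phi, in degrees.
    """
    r = L_x / (2.0 * math.pi)  # sphere radius = focal length, in pixels
    w = 2.0 * r * math.tan(math.radians(fov_h_deg) / 2.0)
    h = 2.0 * r * math.tan(math.radians(fov_v_deg) / 2.0)
    return r, round(w), round(h)

# Example: a 6080-pixel-wide spherical image and a 100 x 100 degree view.
f, w, h = rectilinear_size(6080, 100.0, 100.0)
print(f"focal length = {f:.2f} px, image size = {w} x {h} px")
```

Because the size grows with the tangent of the half-angle, doubling the FOV more than doubles the number of pixels per side, which is one way to see why very wide views become stretched at the edges.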
The size (in pixels) of the final image also grows with the chosen FOV because the original level of detail encapsulated in the spherical projection is preserved in the current implementation. This means that each image in Figure 6 can be considered as a part extracted from the next one. It is up to the user to find a balance between the FOV and the area to be rectified. The extraction of multiple rectilinear images could be based on selecting alternative tangent planes to the sphere at a generic point $(\lambda, \varphi)$. Moving the plane extends the previous case, and the projection equations must be modified accordingly. However, the alternative solution implemented in this work is the synthetic rotation of the original equirectangular projection using heading, roll, and pitch in sequential order. Changing the point of view makes it possible to orient the sphere so that the tangency point becomes $(\lambda, \varphi) = (0, 0)$, without requiring a modification of the current implementation.
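The extraction step can be sketched in a few lines of Python with NumPy. This is a minimal illustration and not the Panorama Tools code used in the paper. It assumes that the image center corresponds to (λ, φ) = (0, 0), that longitude increases to the right and latitude upwards, and that the optional matrix `R` (a hypothetical parameter) already encodes the heading, roll, and pitch rotation. Nearest-neighbour sampling is used for brevity.

```python
import numpy as np

def extract_rectilinear(equi, fov_h_deg, fov_v_deg, R=np.eye(3)):
    """Gnomonic extraction of a rectilinear view from an equirectangular image."""
    L_y, L_x = equi.shape[:2]
    r = L_x / (2.0 * np.pi)  # sphere radius = focal length (pixels)
    w = int(round(2 * r * np.tan(np.radians(fov_h_deg) / 2)))
    h = int(round(2 * r * np.tan(np.radians(fov_v_deg) / 2)))
    # Pixel grid on the tangent plane x = r, centred on the tangency point.
    ys, zs = np.meshgrid(np.arange(w) - (w - 1) / 2.0,
                         (h - 1) / 2.0 - np.arange(h))
    rays = np.stack([np.full_like(ys, r), ys, zs], axis=-1)
    rays = rays @ R.T  # rotate the viewing direction (assumed convention)
    # Back to spherical coordinates of each ray.
    lon = np.arctan2(rays[..., 1], rays[..., 0])
    lat = np.arctan2(rays[..., 2], np.hypot(rays[..., 0], rays[..., 1]))
    # Linear equirectangular mapping from angles to pixel indices.
    u = ((lon / (2 * np.pi) + 0.5) * L_x).astype(int) % L_x
    v = np.clip(((0.5 - lat / np.pi) * L_y).astype(int), 0, L_y - 1)
    return equi[v, u]

# Synthetic 360 x 180 image in which each pixel stores its column index.
equi = np.tile(np.arange(360), (180, 1))
view = extract_rectilinear(equi, 90.0, 90.0)
print(view.shape, view[view.shape[0] // 2, view.shape[1] // 2])
```

A production implementation would interpolate between pixels rather than picking the nearest one, but the geometry of the mapping is the same.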

Spherical Image Rectification with Control Points
Metric rectification using control points is the typical solution for the accurate digital documentation of planar surfaces. Control points are usually measured with instruments able to provide object coordinates in a metric system. The typical case is based on control points measured with a reflector-less total station. Then, the user must select the corresponding image points and calculate transformation parameters from image to object space. The next section introduces a possible solution for the estimation problem.

Homography Estimation Using Image-to-Object Correspondences
Homography is the transformation between a planar object and the corresponding rectilinear image extracted from the spherical projection. Parameter estimation can be carried out using projective geometry.
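A point in the Euclidean 2-space has inhomogeneous coordinates $\mathbf{x} = (x, y)^T$. Adding an extra coordinate to the pair provides a new triplet $\mathbf{x} = (\lambda x, \lambda y, \lambda)^T$. We say that this 3-vector is the same point in homogeneous coordinates (for any non-zero $\lambda$).
A planar homography (also called projective transformation) is represented by a 3 × 3 matrix H with 8 degrees of freedom:

$$\begin{pmatrix} X' \\ Y' \\ 1 \end{pmatrix} \simeq H \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} h_1 & h_2 & h_3 \\ h_4 & h_5 & h_6 \\ h_7 & h_8 & h_9 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

which can be cast in the form:

$$X' = \frac{h_1 x + h_2 y + h_3}{h_7 x + h_8 y + h_9}, \qquad Y' = \frac{h_4 x + h_5 y + h_6}{h_7 x + h_8 y + h_9}$$

Multiplying both sides by the denominator gives two equations that are linear in the parameters:

$$(h_7 x + h_8 y + h_9)\,X' = h_1 x + h_2 y + h_3, \qquad (h_7 x + h_8 y + h_9)\,Y' = h_4 x + h_5 y + h_6$$

The values of the parameters $h_i$ can be determined from $n \geq 4$ corresponding points $(X', Y') \leftrightarrow (x, y)$. The last element of H can be set to $h_9 = 1$ because H is defined only up to scale.
With $h_9 = 1$, a system of equations can be written as:

$$\begin{pmatrix} x_1 & y_1 & 1 & 0 & 0 & 0 & -x_1 X'_1 & -y_1 X'_1 \\ 0 & 0 & 0 & x_1 & y_1 & 1 & -x_1 Y'_1 & -y_1 Y'_1 \\ \vdots & & & & & & & \vdots \\ x_n & y_n & 1 & 0 & 0 & 0 & -x_n X'_n & -y_n X'_n \\ 0 & 0 & 0 & x_n & y_n & 1 & -x_n Y'_n & -y_n Y'_n \end{pmatrix} \begin{pmatrix} h_1 \\ h_2 \\ \vdots \\ h_8 \end{pmatrix} = \begin{pmatrix} X'_1 \\ Y'_1 \\ \vdots \\ X'_n \\ Y'_n \end{pmatrix}$$

or with the more compact notation $A\mathbf{h} = \mathbf{b}$. If more than four point correspondences are given (over-determined set of equations), the solution is not exact. The least squares solution is:

$$\hat{\mathbf{h}} = (A^T A)^{-1} A^T \mathbf{b}$$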
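The least squares estimation with h9 = 1 can be sketched as follows. This is a minimal example assuming NumPy. The function name `homography_lstsq` and the synthetic point coordinates are illustrative and do not come from the paper. `numpy.linalg.lstsq` is used instead of forming the normal equations explicitly, which gives the same solution with better numerical behaviour.

```python
import numpy as np

def homography_lstsq(img_pts, obj_pts):
    """Estimate H (image -> object plane) from n >= 4 correspondences, h9 = 1."""
    n = len(img_pts)
    A = np.zeros((2 * n, 8))
    b = np.zeros(2 * n)
    for i, ((x, y), (X, Y)) in enumerate(zip(img_pts, obj_pts)):
        A[2 * i] = [x, y, 1, 0, 0, 0, -x * X, -y * X]      # equation for X'
        A[2 * i + 1] = [0, 0, 0, x, y, 1, -x * Y, -y * Y]  # equation for Y'
        b[2 * i], b[2 * i + 1] = X, Y
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pts):
    """Map 2D points with H and return inhomogeneous coordinates."""
    p = np.c_[np.asarray(pts, float), np.ones(len(pts))] @ H.T
    return p[:, :2] / p[:, 2:3]

# Synthetic check: points generated with a known homography are recovered.
H_true = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [1e-3, 2e-3, 1.0]])
img_pts = [(0, 0), (100, 0), (100, 80), (0, 80), (50, 40)]
obj_pts = apply_homography(H_true, img_pts)
H_est = homography_lstsq(img_pts, obj_pts)
```

Fixing h9 = 1 fails only in the rare case in which the true h9 is zero, which is one motivation for the homogeneous formulation in the next section.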

Alternative Solution via Singular Value Decomposition
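An alternative solution for the computation of the rectifying homography is based on the use of homogeneous coordinates. The equation $\mathbf{X}' = H\mathbf{x}$, with $\mathbf{X}' = (X', Y', W')^T$, can be cast in a more convenient form using the cross product $\mathbf{X}' \times H\mathbf{x} = \mathbf{0}$. The explicit form is given by:

$$\mathbf{X}' \times H\mathbf{x} = \begin{pmatrix} Y'\,\mathbf{h}^{3T}\mathbf{x} - W'\,\mathbf{h}^{2T}\mathbf{x} \\ W'\,\mathbf{h}^{1T}\mathbf{x} - X'\,\mathbf{h}^{3T}\mathbf{x} \\ X'\,\mathbf{h}^{2T}\mathbf{x} - Y'\,\mathbf{h}^{1T}\mathbf{x} \end{pmatrix} = \mathbf{0}$$

where $\mathbf{h}^{1T}$, $\mathbf{h}^{2T}$, and $\mathbf{h}^{3T}$ are the rows of the matrix H. This yields two equations per correspondence (the third one is not linearly independent):

$$\begin{pmatrix} \mathbf{0}^T & -W'\mathbf{x}^T & Y'\mathbf{x}^T \\ W'\mathbf{x}^T & \mathbf{0}^T & -X'\mathbf{x}^T \end{pmatrix} \begin{pmatrix} \mathbf{h}^1 \\ \mathbf{h}^2 \\ \mathbf{h}^3 \end{pmatrix} = \mathbf{0}$$

and the final system has the form $A\mathbf{h} = \mathbf{0}$ (linear in the unknown $\mathbf{h}$). The trivial solution $\mathbf{h} = \mathbf{0}$ can be avoided using the constraint $\|\mathbf{h}\| = 1$. One way to solve this system is to perform singular value decomposition (SVD) on the matrix A. SVD factors the matrix A into a diagonal matrix D and two orthogonal matrices U and V as follows:

$$A = U D V^T$$

The solution is given by the last column of V (when the singular values in D are sorted in descending order).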
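A minimal sketch of the homogeneous solution is given below, assuming NumPy and object points with W' = 1. The function name and the test data are illustrative. `numpy.linalg.svd` returns the transpose of V with singular values in descending order, so the solution is its last row.

```python
import numpy as np

def homography_svd(img_pts, obj_pts):
    """Estimate H (up to scale) by solving A h = 0 under ||h|| = 1."""
    rows = []
    for (x, y), (X, Y) in zip(img_pts, obj_pts):
        p = [x, y, 1.0]
        # Row (0^T, -W' x^T, Y' x^T) with W' = 1.
        rows.append([0.0, 0.0, 0.0] + [-c for c in p] + [Y * c for c in p])
        # Row (W' x^T, 0^T, -X' x^T) with W' = 1.
        rows.append(p + [0.0, 0.0, 0.0] + [-X * c for c in p])
    A = np.asarray(rows)
    _, _, Vt = np.linalg.svd(A)
    return Vt[-1].reshape(3, 3)  # last column of V = last row of V^T

# Synthetic check against a known homography.
H_true = np.array([[1.2, 0.1, 5.0], [-0.05, 0.9, -3.0], [1e-3, 2e-3, 1.0]])
img_pts = [(0, 0), (100, 0), (100, 80), (0, 80), (50, 40)]
proj = np.c_[np.asarray(img_pts, float), np.ones(len(img_pts))] @ H_true.T
obj_pts = proj[:, :2] / proj[:, 2:3]
H_est = homography_svd(img_pts, obj_pts)
H_est = H_est / H_est[2, 2]  # remove the arbitrary scale for comparison
```

With pixel coordinates in the thousands, normalizing the points before building A (for example by centering and scaling them) is commonly recommended to improve conditioning; that step is omitted here for brevity.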

Evaluation of Metric Quality
Three reference objects featuring planar geometry were created by installing 31 targets on three perpendicular planes: two vertical walls W1 and W2 and the floor F. Reference coordinates were measured with a Leica TS30 total station, obtaining three-dimensional coordinates (X, Y, Z) with a precision of about ±1 mm.
A single 360° image was acquired with an Insta 360 One R, which has two front- and rear-facing fisheye lenses [43]. The final image (after stitching) has a resolution of 6080 × 3040 pixels. An image with the locations of the three planes is shown in Figure 7.
The extraction of the three rectilinear images was carried out varying the values of heading, pitch, and roll to move the center of the equirectangular projection close to the center of each wall. The chosen angles were (0°, 0°, 0°), (90°, 0°, 0°), and (0°, 0°, 90°). The field of view of the rectilinear images was set to 100° × 100°. After creating the rectilinear images, image coordinates (x, y) were manually measured. The set of object coordinates (X, Y, Z) was instead split into three new groups using a rotation that places the reference system in the plane of each surface, obtaining a set of ground truth coordinates (X', Y'), in which Z' = 0.
Three different systems of linear equations were written and solved, obtaining metric rectification parameters. After calculating the solution vector $\mathbf{h}$, images can be metrically rectified. Twelve points were used for each of the two vertical walls W1 and W2, whereas only seven targets were placed on the floor F. Figure 8 shows the three rectilinear images extracted with a square field of view (top). The metrically rectified images are shown in the bottom part of the figure. As can be seen, the reference system for object coordinates is not directed along the walls. The image of the floor is rotated depending on the horizontal orientation of the total station project. The residual vector of least squares can be calculated as $\mathbf{v} = A\mathbf{h} - \mathbf{b}$, and the posterior variance is $\sigma_0^2 = \mathbf{v}^T\mathbf{v}/(2n - 8)$, where $2n$ is the number of equations for $n$ points and 8 is the number of unknowns. The formulation based on ordinary least squares (described in Section 4.1) was used.
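The residual vector and the posterior standard deviation can be computed directly from the design matrix. The sketch below assumes NumPy and the usual redundancy-based estimate (sum of squared residuals divided by the number of equations minus the number of unknowns). The small line-fitting system is a made-up example that only serves to exercise the function.

```python
import numpy as np

def posterior_sigma(A, h, b):
    """Return the residuals v = A h - b and the posterior sigma_0."""
    v = A @ h - b
    redundancy = A.shape[0] - A.shape[1]  # equations minus unknowns
    return v, np.sqrt(v @ v / redundancy)

# Toy over-determined system: straight line through four noisy samples.
A = np.array([[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([0.1, 0.9, 2.1, 2.9])
h, *_ = np.linalg.lstsq(A, b, rcond=None)
v, sigma0 = posterior_sigma(A, h, b)
print(v, sigma0)
```

For a homography estimated from twelve targets, A would have 24 rows and 8 columns, giving a redundancy of 16.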
Graphs with the computed residuals for the different walls are shown in Figures 9 and 10. Wall W1 has smaller residuals than W2, even though the camera-object distance is similar, leading to a ground sampling distance (GSD) of about 0.0015 m for both walls. The values for $\sigma_0$ were ±0.002 mm and ±0.011 mm, respectively.
The better results achieved for W1 can be explained considering the chosen rotation angles (0°, 0°, 0°), indicating that the fisheye sensor was pointing directly at the center of the wall. In the case of W2, the spherical image is the result of stitching parts of both the front- and rear-facing images. Similar considerations can be extended to the image of the floor, which shows even larger residuals.

Description of the Implemented Method
The focal length of the rectilinear projection coupled with two sets of parallel lines (not necessarily orthogonal) allows retrieval of metric properties up to a scale factor, which can then be recovered with a known distance.
As the focal length of the rectilinear image extracted from the spherical projection corresponds to the radius $f = r = L_x/(2\pi)$, geometric rectification based on sets of parallel lines can be carried out without a known aspect ratio, which is required by the more traditional approach to geometric rectification.
We consider the calibration matrix K of the rectilinear image:

$$K = \begin{pmatrix} f & 0 & x_0 \\ 0 & f & y_0 \\ 0 & 0 & 1 \end{pmatrix}$$

where $(x_0, y_0)$ is the principal point, i.e., the center of the rectilinear image. The identification of the rectifying homography requires the vanishing line of the plane. A generic line in the plane is represented by the equation $ax + by + c = 0$. A line can be identified by a vector $\mathbf{l} = (a, b, c)^T$. A point $\mathbf{x} = (x_1, x_2, x_3)^T$ lies on the line $\mathbf{l} = (a, b, c)^T$ if and only if $\mathbf{x}^T\mathbf{l} = 0$.
The vanishing line $\mathbf{l}^* = (l^*_1, l^*_2, l^*_3)^T$ can be computed from two sets of parallel lines [44]. First, a vanishing point $\mathbf{v}$ can be estimated from the intersection of two imaged parallel lines $\mathbf{l}$ and $\mathbf{l}'$ using the cross product $\mathbf{v} = \mathbf{l} \times \mathbf{l}'$, as illustrated in Figure 11. The two sets of lines provide two vanishing points $\mathbf{v}_1$ and $\mathbf{v}_2$, and the vanishing line is the line joining them, $\mathbf{l}^* = \mathbf{v}_1 \times \mathbf{v}_2$. Then, the calibration matrix and the vanishing line provide the orientation of the object plane with respect to the camera. The normal $\mathbf{n}$ to the plane is $\mathbf{n} = K^T\mathbf{l}^*$. The image can be synthetically rotated to generate a new rectified image with a homography $H = KRK^{-1}$, in which the unit vector $\mathbf{u}_n = \mathbf{n}/\|\mathbf{n}\|$ must be rotated onto the camera optical axis: $R\mathbf{u}_n = (0, 0, 1)^T$.
The matrix R is made up of three vectors that form an orthonormal set, $R = (\mathbf{u}_p, \mathbf{u}_m, \mathbf{u}_n)^T$.
The rectified image has an extra ambiguity due to a rotation because there exists an infinite number of vectors perpendicular to $\mathbf{n}$, resulting in an under-determined system of equations. Constraints must be applied to calculate the second vector $\mathbf{u}_m$. Finally, the last unit vector $\mathbf{u}_p$ can be estimated with a cross product.
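The chain from image lines to rectifying homography can be sketched as follows, assuming NumPy. Two choices in the code are assumptions made only for illustration and are not stated in the paper: the sign of the normal is fixed so that it points along the optical axis, and the free vector u_m is obtained by projecting the image y axis onto the plane perpendicular to u_n. All function names are hypothetical.

```python
import numpy as np

def line_through(p, q):
    """Homogeneous line joining two image points."""
    return np.cross([p[0], p[1], 1.0], [q[0], q[1], 1.0])

def vanishing_line(set1, set2):
    """Vanishing line from two sets of imaged parallel lines.

    Each set holds two lines, and each line is a pair of image points.
    """
    v1 = np.cross(line_through(*set1[0]), line_through(*set1[1]))
    v2 = np.cross(line_through(*set2[0]), line_through(*set2[1]))
    return np.cross(v1, v2)

def rectifying_homography(l_star, f, x0, y0):
    """H = K R K^-1 rotating the plane normal onto the optical axis."""
    K = np.array([[f, 0.0, x0], [0.0, f, y0], [0.0, 0.0, 1.0]])
    n = K.T @ l_star
    u_n = n / np.linalg.norm(n)
    if u_n[2] < 0:                    # assumption: normal along the optical axis
        u_n = -u_n
    m = np.array([0.0, 1.0, 0.0])     # assumption: keep image y as "up"
    m -= (m @ u_n) * u_n              # remove the component along u_n
    u_m = m / np.linalg.norm(m)
    u_p = np.cross(u_m, u_n)          # completes the right-handed triad
    R = np.vstack([u_p, u_m, u_n])
    return K @ R @ np.linalg.inv(K)
```

Since the rectified image is defined up to a rotation and a scale, a different constraint for u_m (for example, aligning one set of lines with the horizontal direction) would give an equally valid result. The projection used here becomes unstable only when the plane normal is nearly parallel to the image y axis.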

Evaluation of Metric Accuracy
Evaluation of metric accuracy of the implemented procedure was carried out using the same spherical image used in the previous section. The walls' vertical and horizontal lines were first inspected with the total station to verify their reciprocal orthogonality. Then, image rectification was carried out measuring the lines in the images and calculating two vanishing points and the vanishing line.
The experiment was performed using only the two vertical walls because the inspection with the total station revealed that the room's plan has a trapezoidal shape. The two vertical walls are sufficiently rectangular, with discrepancies of about ±2-3 mm.
Two metrically rectified images were generated for the two walls. Evaluation of metric accuracy was performed, estimating a similarity transformation between the pixel coordinates of the rectified image x and the corresponding object coordinates X measured with a total station after setting the reference system parallel to the considered wall.
The use of an extra similarity transformation applied to the metrically rectified image allows a direct comparison with the set of total station control points. The procedure for metric rectification based on the focal length and vanishing line has a scale ambiguity as well as an extra rotation ambiguity due to the choice made for $\mathbf{u}_m$. An additional similarity transformation can recover the alignment of the two reference systems without altering the shape of the rectified images.
The transformation can be written as:

$$\begin{pmatrix} X' \\ Y' \end{pmatrix} = s \begin{pmatrix} \cos\alpha & -\sin\alpha \\ \sin\alpha & \cos\alpha \end{pmatrix} \begin{pmatrix} x' \\ -y' \end{pmatrix} + \begin{pmatrix} t_x \\ t_y \end{pmatrix}$$

where $\alpha$ is a rotation angle, $s$ a scale factor, and $(t_x, t_y)$ a translation vector. The component $y'$ requires the negative sign to change the direction of the axis. Indeed, image coordinates are measured in a system with origin in the top-left corner of the image and the y axis pointing downwards. In contrast, reference coordinates $Y'$ measured with the total station are in a system with the axis pointing up. The previous equations can be cast in a linear form:

$$X' = a x' + b y' + t_x, \qquad Y' = b x' - a y' + t_y$$

with the substitution $a = s\cos\alpha$ and $b = s\sin\alpha$. Given a set of correspondences $(X', Y') \leftrightarrow (x', y')$, we may write a linear system of equations that can be solved via least squares:

$$\begin{pmatrix} x'_1 & y'_1 & 1 & 0 \\ -y'_1 & x'_1 & 0 & 1 \\ \vdots & & & \vdots \\ x'_n & y'_n & 1 & 0 \\ -y'_n & x'_n & 0 & 1 \end{pmatrix} \begin{pmatrix} a \\ b \\ t_x \\ t_y \end{pmatrix} = \begin{pmatrix} X'_1 \\ Y'_1 \\ \vdots \\ X'_n \\ Y'_n \end{pmatrix}$$

The residuals are shown in Figure 12 and confirm that better results were again achieved for W1, as in the previous section. The discrepancy between the two walls is considerably larger than in the case of control points.
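The similarity fit used for the comparison can be sketched as follows, assuming NumPy and the linear model X' = a x' + b y' + t_x, Y' = b x' − a y' + t_y, i.e., a rotation and scale applied to (x', −y'). The function name and the synthetic values (scale 0.002 m per pixel, rotation 0.1 rad) are illustrative only.

```python
import numpy as np

def fit_similarity(img_pts, obj_pts):
    """Least squares similarity from rectified pixels (y down) to object (Y up)."""
    rows, rhs = [], []
    for (x, y), (X, Y) in zip(img_pts, obj_pts):
        rows.append([x, y, 1.0, 0.0])   # X' = a x + b y + t_x
        rhs.append(X)
        rows.append([-y, x, 0.0, 1.0])  # Y' = b x - a y + t_y
        rhs.append(Y)
    A, b_vec = np.array(rows), np.array(rhs)
    p, *_ = np.linalg.lstsq(A, b_vec, rcond=None)
    a, b, tx, ty = p
    scale, alpha = np.hypot(a, b), np.arctan2(b, a)
    residuals = A @ p - b_vec
    return scale, alpha, (tx, ty), residuals

# Synthetic check with known parameters.
s_true, alpha_true = 0.002, 0.1
a0, b0 = s_true * np.cos(alpha_true), s_true * np.sin(alpha_true)
img_pts = [(0, 0), (100, 0), (100, 50), (0, 50)]
obj_pts = [(a0 * x + b0 * y + 1.0, b0 * x - a0 * y + 2.0) for x, y in img_pts]
scale, alpha, t, res = fit_similarity(img_pts, obj_pts)
```

The residuals returned by the function are expressed in object units and play the same role as the discrepancies plotted for the two walls.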

Considerations and Conclusions
The paper described a method for image rectification using spherical images. Recovering metric properties of planar objects is a common requirement in several technical disciplines requiring accurate digital documentation.
The difference between traditional metric rectification and the proposed method is the use of spherical images. Such images can be acquired with low-cost 360° cameras and allow rapid documentation because the camera can be pointed in any direction, capturing the whole scene around the photographer.
Because low-cost 360° cameras with better resolution are becoming more common among users, the use of their images for metric applications opens new opportunities for different specialists requiring metric documentation.
Photogrammetric packages that generate 3D models using sequences of spherical images are already available on the commercial market. However, there is a lack of a solution for 2D metric rectification based on the spherical camera model.
The 360° images used for metric applications feature several pros and cons compared to the traditional photogrammetric workflow with frame-based cameras. As mentioned in the introduction, different authors have already investigated the topic of 3D modeling with such images. Instead, the case of metric rectification with a single 360° image is new, and practical applications are not available in the scientific literature.
The proposed method should be integrated into traditional rectification approaches for rapid documentation of flat surfaces in narrow spaces, especially for the interior of buildings with several rooms. The achievable image resolution (i.e., the ground sampling distance of the rectified image) and its metric accuracy do not match the results of a more traditional rectification approach. Although 360° images feature a high resolution (e.g., more than 18 or 20 megapixels), spreading those pixels over a field of view of 360° × 180° results in a GSD roughly 5-6 times worse than that of the same rectification carried out with a frame camera.
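The order of magnitude of the GSD can be checked with a small calculation. The sketch below assumes a surface facing the camera at the center of the rectilinear image, where the GSD equals the camera-object distance divided by the focal length f = L_x/(2π). The 1.5 m distance is a hypothetical value chosen for illustration and is not a measurement from the experiments.

```python
import math

def gsd_at_center(L_x, distance_m):
    """Approximate GSD (metres per pixel) at the center of a rectilinear view."""
    focal_px = L_x / (2.0 * math.pi)  # focal length in pixels
    return distance_m / focal_px

# 6080-pixel-wide spherical image, surface at a hypothetical 1.5 m.
gsd = gsd_at_center(6080, 1.5)
print(f"GSD = {gsd * 1000:.2f} mm/pixel")
```

Away from the image center the footprint of a pixel on the surface grows, so this value is a lower bound for the GSD over the rectified area.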
However, it is the author's opinion that the rectification with 360° images can become suitable for those metric applications in which several internal surfaces of buildings have to be rapidly documented with a metric accuracy of ±2-4 cm, limiting the number of images collected, and providing an overall context using the 360° images themselves. As a 360° image provides an immersive (bubble) visualization of the entire space, it is simple for the user to recognize the room in which the image was acquired and the location of the walls in the room.
The implemented workflow relies on the preliminary conversion of the spherical image into a rectilinear image using a gnomonic mapping. The user's point of view can be oriented towards the planar surface requiring metric rectification. The field of view can also be interactively resized depending on the size of the area.
The rectilinear image can then be metrically rectified using two solutions: (i) the use of control points to estimate homography parameters or (ii) a geometrical approach based on the vanishing line coupled with the focal length of the rectilinear image, which can be derived from the radius of the sphere. This second method allows users to recover metric properties up to an overall scale ambiguity, which can be removed by measuring a single distance. The vanishing line of the plane can be instead calculated using at least two sets of parallel lines.
Metric evaluation experiments demonstrated that control points provide precision up to the ground sampling distance of the rectilinear image extracted from the spherical projection. The use of the geometrical method instead is simple and effective when the objects feature sets of parallel lines. In this second case, the user must know that the achieved metric image is affected by an overall scale ambiguity, which can be removed with a known distance.
Finally, an important consideration deserves to be mentioned. The spherical images used for the metric evaluation in this paper were acquired with a low-cost Insta 360 One R camera. Differing results were obtained for similar surfaces in similar project configurations, depending on the position of the surface with respect to the front- and rear-facing fisheye lenses. Results for walls captured using only the front-facing lens were better than those for surfaces that also involve the rear-facing lens, indicating that stitching of the two images introduces some deformations in the spherical image.