Multi-Camera Imaging System for UAV Photogrammetry

In recent years, there has been a considerable increase in the use of unmanned aerial vehicles (UAVs) equipped with compact digital cameras for environment mapping. The next stage in the development of low-altitude photogrammetry was the processing of oblique UAV images, in which imagery is acquired from side-facing directions. As in professional photogrammetric systems, it is possible to record the footprints of tree crowns and other forms of the natural environment. A multi-camera system significantly reduces one of the main limitations of UAV photogrammetry (especially for multirotor UAVs): the small ground coverage area, which otherwise increases the number of images and flight lines and reduces the surface imaged during one flight. The approach proposed in this paper is based on using several cameras on one head to enhance the imaging geometry during a single UAV mapping flight. As part of the research work, a multi-camera system consisting of several cameras was designed to increase the total Field of View (FOV). This makes it possible to enlarge the ground coverage area and to acquire image data effectively. The acquired images are mosaicked in order to limit the total number of images for the mapped area. As part of the research, the set of cameras was calibrated to determine the interior orientation parameters (IOPs). Next, a method of image alignment using feature-based image matching algorithms is presented. In the proposed approach, the images are combined in such a way that the final image has a joint centre of projection of the component images. The experimental results showed that the proposed solution is reliable and accurate for mapping purposes. The paper also evaluates the effectiveness of existing transformation models for images with large coverage that were subjected to an initial geometric correction to remove the influence of distortion.


Introduction
Multi-camera systems, and the nadir and oblique images acquired with them, are of increasing importance in professional aerial photogrammetry. In comparison with classical photogrammetry, nadir and oblique imaging technology allows the registration of building footprints and facades. Thanks to this, it is possible to simplify the identification and interpretation of objects that are difficult to recognize from a single perspective view [1,2]. Oblique images can be used to fill the existing gap between aerial images and terrestrial images [3]. Professional photogrammetric cameras are capable of mapping large areas; multi-head imaging systems generating virtual images, such as the Microsoft Vexcel UltraCam [4] and the ZI Imaging DMC [5], have been developed. However, these solutions are very expensive and adapted to traditional aerial photogrammetry. Recently, the use of unmanned aerial vehicles (UAVs) equipped with compact digital cameras has accelerated significantly; for example, a multi-camera UAV system was processed with real-time bundle adjustment by treating its images as two stereo pairs [23]. Another solution was proposed previously [20]: the authors presented the concept of generating virtual images from six vertical cameras mounted on board the UAV. According to this approach, it is important to mount the cameras in a vertical position in such a way as to create an integrated structure. For each of the cameras included in the system, the interior orientation parameters (IOPs) are important, as are, for the cameras mounted off-nadir, the relative orientation parameters (ROPs) with respect to the nadir camera [24][25][26]. The interior orientation elements of each camera included in the system should be determined in an independent calibration procedure. In the case of the relative orientation, it should be assumed that its elements are fixed for the individual cameras. The ROPs can be determined using two methods.
In the first method, the elements of the relative orientation are determined as differences between the known exterior orientation parameters (EOPs) of each camera, which can be determined from observations of the GNSS/IMU (Global Navigation Satellite System/Inertial Measurement Unit) sensors installed on the UAV. This method is simple, but the accuracy of the ROPs depends directly on the accuracy of the measured EOP elements. It also depends on the misalignment between the GNSS/IMU unit on board the UAV and the cameras, as well as on the number and distribution of Ground Control Points (GCPs). The second method of relative orientation is based on the determination of corresponding points between the nadir camera (Master camera) and an off-nadir camera, followed by the determination of the rotation matrix and translation vector or by bundle adjustment. The second method is used in many studies [15,27]. Its complexity increases with the number of cameras included in the system and their locations relative to each other. The main assumption of the presented method is that the physical relations between the cameras are unchanged during the flight. In order to increase the accuracy of the determined ROPs in low-cost multi-camera systems, it is necessary to ensure adequate mutual coverage between the images from the individual cameras. Such coverage provides a sufficient number of tie points to estimate the unknown ROPs using bundle adjustment. Calibration and generation of virtual images with a multi-camera system mounted on a UAV can also be difficult due to the low stability of this type of platform. Wind gusts, a heavy UAV payload, and a relatively low flight altitude can cause the whole platform to vibrate [28,29]. Therefore, one solution to this problem, used in this article, is a UAV platform with a dedicated 2-axis stabilized head.
This solution compensates for relative movements between the cameras and mechanically improves their stability. This technique can be considered a bridge between classical aerial and terrestrial image acquisition [3], and its usage in civil applications has been increasingly documented [30]. In some situations, such as the inspection of power lines [31][32][33], flexible data collection and high-resolution images are required, and manned aircraft platforms cannot meet these needs.
The contents of the paper are organized as follows. Section 1 gives the introduction and a review of related works. In Section 2, the methodology is presented. A new approach was proposed in the initial geometric correction of images obtained from the multi-camera system, and the method of the matching technique for fish-eye images is presented. Section 3 presents the research. Section 4 presents the results of research from individual stages of image processing. In Section 5, the accuracy of the proposed geometric adjustment and matching method was evaluated. Finally, Section 6 discusses the results in the context of experiments carried out by other researchers. Section 7 contains conclusions from the research and plans for further scientific research.

Methodology
This section describes the UAV platform and the set of cameras with fish-eye lenses that were used to obtain the image sequences. The following subsections also present the methodology of the subsequent stages of image processing carried out to obtain the mosaicked images.

Description of UAV Multi-Camera Imaging System
This section presents the description of the multi-camera UAV imaging system. The image data from low flying heights was obtained using the Novelty Ogar mk II platform (NoveltyRPAS, Gliwice, Poland), which can be classified in the mini multirotor category (see Figure 1). The UAV Ogar mk II can perform air missions beyond visual line of sight (BVLOS). The maximum takeoff weight (MTOW) of this UAV platform is 4.5 kg. Its flight time (endurance) is 40 min. The maximum speed of the platform is up to 20 m/s. The system may be operated at wind speeds of up to 14 m/s and in weather conditions no worse than light precipitation. The UAV Ogar mk II can acquire imagery for mapping purposes in two modes: nadir and oblique. Imaging in the nadir and oblique modes allows the acquisition of images for the development of orthophoto maps. The multirotor ensures a completely autonomous flight at a given altitude and with the given transverse and longitudinal coverage, among others thanks to the mounted GNSS/IMU receiver. The system equipment includes a flight controller that allows real-time flight management. The Ogar mk II can automatically control take-off, flight, and landing. The multirotor is equipped with a stabilized gimbal. The video image sequences are acquired in a continuous mode. For the GoPro camera set, the BLh position and the Yaw, Pitch, and Roll angle values of the head are recorded. Flight safety is controlled automatically, but operator intervention is possible through emergency safety procedures.


Camera Specifications
In the research, five GoPro Hero 4 Black cameras (GoPro Inc., San Mateo, CA, USA) equipped with a wide-angle lens and a rolling shutter were used (see Figure 2). The complementary metal-oxide-semiconductor (CMOS) sensor reads images row by row. The GoPro Hero 4 can work in still-camera and video modes. The system also allows different resolutions and recording speeds of the video sequence with different FOV (Field of View) settings. It records video in 4K/30 fps mode with an ultra-wide FOV of up to 170°, as well as in 2.7K/50 fps and Full HD/120 fps modes. The camera also has a fast burst mode which enables taking up to 30 pictures (12 megapixels) per second [33]. Table 1 shows the technical specification. At present, such video sensors have no mechanical shutter and instead use electronic rolling shutters. For camera synchronization, the GoPro Smart Remote and the UAV Mission Planner software were used.

Imaging Geometry for UAV Oblique Photogrammetry
For each GoPro action camera, the FOV determines the area on the surface of the Earth observed by a single sensor. The area is determined based on the Ground Sampling Distance (GSD). The GSD for nadir imaging was calculated using Equation (1):

GSD = p · H / c_k (1)

where:
p - the CMOS sensor pixel size
c_k - focal length derived from the camera calibration
H - altitude (AGL).

Table 2 shows theoretical nadir GSD values as a function of height and image acquisition parameters. The flight height for image data obtained from a low flying height would usually vary from 50 to 200 m. For this study, a resolution of 2704 × 1520 pixels (2.7K mode) was chosen (the central part of the image, to reduce the negative impact of lens distortion) with a 2.70 mm focal length. For the oblique cameras, individual GSD values were not determined because, depending on the viewing angle and the time of frame capture, the scale and the GSD differ in different parts of each image frame. Figure 3 shows the acquisition geometry of the UAV oblique multi-camera photogrammetry system for 3D modeling and orthophotomap generation. The imaging geometry is presented for the roll angle.
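As a quick numerical check of the nadir GSD relation (pixel size times altitude over focal length), the sketch below evaluates it for several flight heights. The pixel size is an assumption (about 1.57 µm, back-calculated from the 2.70 mm focal length and the 0.029 m GSD reported at 50 m AGL), not a value taken from Table 1.

```python
# Nadir GSD = p * H / c_k (Equation (1) in the text).

def gsd_nadir(p_m: float, h_m: float, ck_m: float) -> float:
    """Ground Sampling Distance [m] for a nadir-pointing camera."""
    return p_m * h_m / ck_m

if __name__ == "__main__":
    p = 1.57e-6   # CMOS pixel size [m] (assumed, not from Table 1)
    ck = 2.70e-3  # calibrated focal length [m]
    for h in (50, 100, 150, 200):
        print(f"H = {h:3d} m  ->  GSD = {gsd_nadir(p, h, ck) * 100:.1f} cm")
```

With these assumed values, the 50 m result reproduces the ~0.029 m GSD quoted for the test flight.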
The camera system has been designed so that the nominal overlap of the cameras is at least 70% across the flight direction [11,34]. For such a system, the pitch angles of the Cam2 and Cam4 cameras from the nadir are 13.2°. For the Cam1 and Cam5 cameras, the pitch angles from the nadir are 26.4°. The terrain extent of the image frames from the GoPro cameras, as a function of the tilt from the nadir, can be expressed as:

footprint_n = H · [tan(α_n + HFOV/2) − tan(α_n − HFOV/2)]

where:
α_n - gimbal angle for each GoPro camera, n = 1 to 5 [deg]
footprint_n - height of the photo footprint for each GoPro camera, n = 1 to 5
H - altitude
HFOV - vertical angle of view [deg].
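A minimal sketch of the footprint relation for the tilted cameras, assuming the geometry described above; the 55° HFOV used below is an arbitrary placeholder, not the actual GoPro 2.7K value:

```python
from math import tan, radians

def footprint_height(alpha_deg: float, hfov_deg: float, h_m: float) -> float:
    """Along-tilt ground extent [m] of a camera tilted alpha_deg off nadir."""
    return h_m * (tan(radians(alpha_deg + hfov_deg / 2.0))
                  - tan(radians(alpha_deg - hfov_deg / 2.0)))

if __name__ == "__main__":
    H = 50.0     # flight height AGL [m]
    HFOV = 55.0  # per-camera vertical angle of view [deg] (assumed value)
    for cam, alpha in [("Cam3", 0.0), ("Cam2/Cam4", 13.2), ("Cam1/Cam5", 26.4)]:
        print(f"{cam:10s} alpha = {alpha:5.1f} deg -> "
              f"footprint = {footprint_height(alpha, HFOV, H):.1f} m")
```

As expected, the footprint grows with the off-nadir angle, which is why the oblique frames have no single GSD value.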

Camera Calibration
Non-metric camera calibration allows the determination of the interior orientation elements needed for accurate 3D metric information extraction [35]: the calibrated focal length (c_k), the coordinates of the principal point (x_p, y_p), the radial lens distortion coefficients (k1, k2, k3) [36], and the tangential distortion coefficients (p1, p2). Therefore, it is recommended to pre-calibrate action cameras in order to obtain reliable interior orientation elements that allow precise photogrammetric reconstructions. The calibration of cameras and the evaluation of the reliability of the determined interior orientation elements are still an active research issue in photogrammetry [37], including UAV photogrammetry. Unknown internal geometry is a main problem for sensors equipped with wide-angle lenses [38,39]. A full review of camera calibration methods and models is given in many publications [37,40,41]. The results presented in the aforementioned articles summarize the experience of using digital cameras for photogrammetric measurements: they discuss different camera configurations, parameters, and analysis techniques associated with calibration, as well as well-known photogrammetric systems, implemented camera calibration models, and algorithms that increase 3D accuracy through self-calibrating bundle adjustment. Camera calibration has also become a current research topic in the field of Computer Vision (CV). Research focuses on the full automation of the calibration process [42] on the basis of linear approaches with simplified imaging models [43]. The first work beyond these methods concerned the pinhole camera model and included modeling radial distortion [43][44][45].

Camera Calibration-A Mathematical Model
Camera calibration is intended to reproduce the geometry of rays entering the camera through the projection center at the moment of exposure. The calibration parameters of the camera are: calibrated focal length-c k ; the projection centers in relation to the pictures, determined by x 0 and y 0 -image coordinates of the principal point; lens distortion: radial (k 1 , k 2 , k 3 ) and decentering (p 1 and p 2 ) lens distortion coefficients.
In the case of action cameras, there is a single large FOV in the wide-angle viewing mode, so the calibration process plays a very important role in modeling the lens distortion. The interior orientation model used in the research is the one implemented in OpenCV, based on a modified Brown calibration model [46].
In the case of large distortion, as with a wide-angle lens, the radial distortion model is extended by additional coefficients, giving the rational term (1 + k1r² + k2r⁴ + k3r⁶)/(1 + k4r² + k5r⁴ + k6r⁶). The radial distance has the form:

r² = x'² + y'²

where: r is the radial distance; x', y' are the measured image coordinates referred to the principal point.
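For illustration, the rational radial model (plus the tangential terms p1, p2) can be applied to a normalized image point as follows; this mirrors the standard OpenCV distortion model rather than any project-specific code, and the coefficient values used in any call are purely illustrative:

```python
def distort_point(x, y, k1, k2, k3, k4, k5, k6, p1, p2):
    """Apply the rational radial + tangential model to normalized coords
    (x, y) referred to the principal point; returns distorted coords."""
    r2 = x * x + y * y                      # r^2 = x'^2 + y'^2
    kr = ((1 + k1 * r2 + k2 * r2**2 + k3 * r2**3)
          / (1 + k4 * r2 + k5 * r2**2 + k6 * r2**3))
    xd = x * kr + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * kr + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return xd, yd
```

With all coefficients set to zero the mapping reduces to the identity, which is a convenient sanity check when wiring calibration results into a pipeline.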
Taking the distortion into account, the image coordinates take the form:

x_d = x' · (1 + k1r² + k2r⁴ + k3r⁶)/(1 + k4r² + k5r⁴ + k6r⁶) + 2p1x'y' + p2(r² + 2x'²)
y_d = y' · (1 + k1r² + k2r⁴ + k3r⁶)/(1 + k4r² + k5r⁴ + k6r⁶) + p1(r² + 2y'²) + 2p2x'y'

In the case of camera calibration with a fish-eye lens, the calibration model in the OpenCV library is expressed using the coordinate vector of a point P in the camera reference frame:

X_c = R · X + t

where: R is a rotation matrix; X are the 3D coordinates of the point P. The pinhole projection coordinates of P are (a, b)^T, where a = x/z, b = y/z, r² = a² + b², and θ = atan(r). The equation describing the fisheye distortion takes the form:

θ_d = θ · (1 + k1θ² + k2θ⁴ + k3θ⁶ + k4θ⁸)

The distorted point coordinates are (x' y')^T, where:

x' = (θ_d/r) · a, y' = (θ_d/r) · b

Finally, the conversion into pixel coordinates gives the final pixel coordinate vector (u v)^T, where:

u = f_x · x' + c_x, v = f_y · y' + c_y

At present, camera calibration algorithms are available as ready-made solutions in open-source libraries such as OpenCV. These algorithms are based on detecting a substantial number of points on a flat 'chessboard'-type test field [47,48]. The use of flat objects for camera calibration does not provide as high an accuracy as 3D test fields; however, in most applications, 2D 'chessboard'-type test fields are acceptable [49,50]. For photogrammetric purposes, both methods are acceptable. The proper design of measurements, correct photography of the calibration test, image measurement, and bundle adjustment allow accurate and correct calibration of the majority of compact digital cameras.


Relative Orientation
The problem of the relative orientation of the cameras is to determine the 3D rotation and translation between the various cameras included in the set. Jhan [16,51] proposed that the elements of relative orientation for each camera be calculated in such a way that the nadir camera is marked as Master, while the other, oblique cameras are marked as Slave. In this case, the angular elements of the relative orientation (ΔωPitch, ΔφYaw, ΔκRoll) and the spatial offset vector (Vx, Vy, Vz) for all five cameras can be determined from Equations (12) and (13):

R^CM_CS = R^CM_L · R^L_CS (12)
r^CM_CS = R^CM_L · (r^L_CS − r^L_CM) (13)

where:
R^CM_CS - rotation matrix between the two cameras;
r^CM_CS - the position vector between the two cameras' perspective centres.
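Under the convention that each rotation matrix maps camera coordinates to the local mapping frame L, the two relations above can be sketched as follows; this is a hypothetical helper for illustration, not the authors' implementation:

```python
import numpy as np

def relative_orientation(R_m, C_m, R_s, C_s):
    """ROPs of a slave camera with respect to the master.

    R_m, R_s: 3x3 camera-to-local (frame L) rotation matrices.
    C_m, C_s: perspective-centre positions in frame L.
    Returns the slave-to-master rotation and the offset vector
    (Vx, Vy, Vz) expressed in the master camera frame.
    """
    R_rel = R_m.T @ R_s                               # relative rotation
    v = R_m.T @ (np.asarray(C_s) - np.asarray(C_m))   # offset vector
    return R_rel, v
```

For a master at the identity orientation and a slave pitched 13.2° about the x-axis, the recovered relative rotation is exactly that 13.2° pitch, and the offset is the baseline between the two perspective centres.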
For the above equations, the relative orientation angles are calculated from R^CM_CS. This means that the rotation matrix between the two cameras is given in a coordinate system under the local mapping frame L, where C_M and C_S represent the Master and Slave cameras. The offset vector (Vx, Vy, Vz) is derived from r^CM_CS, which denotes the position vector between the two cameras' perspective centres [16]. In the proposed approach, the elements of the relative orientation between the Master and Slave cameras were determined based on the OpenCV library and epipolar geometry [52] (Figure 4). The main goal is to determine the rotation matrix R and the translation vector t. In the first stage of relative orientation, homologous points are found using the SIFT descriptor [55] and the FLANN-based matcher [52]. On the basis of the homologous points in a pair of images, it is possible to recover the fundamental matrices of the slave cameras. For each common point, the epipolar condition must be met [53,54]:

p1^T · F · p0 = 0, with F = K2^(−T) · St · R · K1^(−1)

where:
St - the skew-symmetric matrix of the translation t;
K1, K2 - the calibration matrices;
R - the rotation of the slave camera;
t - the translation of the slave camera;
p0, p1 - image points (normalized image coordinates);
P - the projected 3D point.
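The epipolar condition can be verified numerically on a synthetic master/slave pair; the rotation angle and baseline below are arbitrary illustrative values, chosen only to exercise the constraint:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x so that skew(t) @ v == cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

# Synthetic slave pose (R, t) w.r.t. the master: x_slave = R @ x_master + t.
th = np.radians(13.2)
R = np.array([[np.cos(th), 0, np.sin(th)],
              [0, 1, 0],
              [-np.sin(th), 0, np.cos(th)]])
t = np.array([0.15, 0.0, 0.02])

E = skew(t) @ R                       # essential matrix

# Project one 3-D point into both cameras (normalized image coordinates).
X = np.array([2.0, -1.0, 10.0])
Xs = R @ X + t
p0 = X / X[2]
p1 = Xs / Xs[2]

residual = p1 @ E @ p0                # epipolar constraint p1^T E p0
print(residual)                       # ~0 up to floating-point noise
```

The residual vanishes (to machine precision) for any 3-D point, which is exactly the condition exploited when estimating the relative pose from matched points.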
Then the fundamental matrix F is determined using RANSAC and the 8-point algorithm [54,56], which defines the set of epipolar lines. The fundamental matrix can be expressed in terms of the two camera matrices (the relative orientation matrix R and the translation t). The fundamental matrix has rank 2 and det(F) = 0 [53].
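The rank-2 and zero-determinant properties of F can be checked numerically on a synthetic configuration; the calibration matrix, rotation, and baseline below are toy values, not the GoPro calibration:

```python
import numpy as np

def skew(t):
    """Skew-symmetric matrix [t]_x of a 3-vector t."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

th = np.radians(13.2)
R = np.array([[np.cos(th), 0, np.sin(th)],     # toy relative rotation
              [0, 1, 0],
              [-np.sin(th), 0, np.cos(th)]])
t = np.array([1.0, 0.2, 0.1])                  # toy baseline

E = skew(t) @ R                                # essential matrix (rank 2)

K1 = np.array([[800, 0, 1352], [0, 800, 760], [0, 0, 1.0]])  # toy intrinsics
K2 = K1.copy()
F = np.linalg.inv(K2).T @ E @ np.linalg.inv(K1)

print(np.linalg.det(F), np.linalg.matrix_rank(F, tol=1e-12))
```

Since [t]_x is singular, E and hence F inherit rank 2 regardless of the calibration matrices, which is the property the 8-point algorithm must enforce.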


Rectify Action Camera Images
For the geometric correction (rectification) of inclined images, the projective transformation is often used. It is an eight-parameter transformation that contains information about the interior and exterior orientation. In order to determine the eight coefficients of this transformation, a minimum of four tie points is needed, of which no three may lie on one straight line. For homogeneous coordinates, the projective transform can be expressed as [57]:

x = (L1·X + L2·Y + L3)/(L7·X + L8·Y + 1)
y = (L4·X + L5·Y + L6)/(L7·X + L8·Y + 1)

where x, y stand for the image coordinates, X, Y for the image coordinates of the reference camera (master camera), and L1, ..., L8 for the projective transformation parameters [58,59]. On this basis, the equations can be written in a linear form:

L1·X + L2·Y + L3 − L7·x·X − L8·x·Y = x
L4·X + L5·Y + L6 − L7·y·X − L8·y·Y = y

These equations are the basis for the rectification of oblique images [58,59]. A characteristic feature of the projective transform is that the homography transformation has eight Degrees of Freedom (DOF). The homogeneous coordinates of the adjustment points can be tied by the homography matrix H, a 3 × 3 matrix with an ambiguous scale, in such a way that for a pair of corresponding points p = (x, y, 1)^T and q = (u, v, 1)^T:

q ~ H · p

Because there are eight DOFs, the minimum number of point correspondences required to solve the homography is four [60].

Then the Random Sample Consensus (RANSAC) algorithm was used, which applies a distance tolerance to the correspondences between the two point sets in order to determine the transformation function. If the tolerance is too low, the process may reject correct correspondences; if the tolerance is too high, some of the accepted correspondences may be inaccurate or incorrect. The selection of an appropriate tolerance value plays an essential role in the stability of RANSAC and is relevant to the quality of the classified core correspondences.
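The eight projective parameters can be recovered from exactly four correspondences by stacking the linear form of the equations into an 8 × 8 system; this is a generic DLT-style sketch, not the software used in the study:

```python
import numpy as np

def solve_projective(src, dst):
    """Solve L1..L8 of the eight-parameter projective transform from
    four (X, Y) -> (x, y) correspondences, no three collinear."""
    A, b = [], []
    for (X, Y), (x, y) in zip(src, dst):
        # L1*X + L2*Y + L3 - L7*x*X - L8*x*Y = x
        A.append([X, Y, 1, 0, 0, 0, -x * X, -x * Y]); b.append(x)
        # L4*X + L5*Y + L6 - L7*y*X - L8*y*Y = y
        A.append([0, 0, 0, X, Y, 1, -y * X, -y * Y]); b.append(y)
    return np.linalg.solve(np.array(A, float), np.array(b, float))

def apply_projective(L, X, Y):
    """Map reference coordinates (X, Y) through parameters L1..L8."""
    w = L[6] * X + L[7] * Y + 1.0
    return ((L[0] * X + L[1] * Y + L[2]) / w,
            (L[3] * X + L[4] * Y + L[5]) / w)
```

Mapping four corner points through a known set of parameters and solving the system recovers those parameters exactly, which makes the pair of functions easy to unit-test.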
The advantage of the algorithm is its simplicity and relatively high resistance to outliers, even with a large number of observations. Its limitation is that with too much noise it requires many iterations, and its computational complexity can then be very high [61].
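A minimal RANSAC sketch (fitting a line rather than a homography, for brevity) illustrates the roles of the sampling loop and the distance tolerance discussed above; the data and threshold are synthetic:

```python
import random

def ransac_line(points, n_iter=200, tol=0.1, seed=0):
    """Minimal RANSAC: fit y = a*x + b, keeping the model with most inliers."""
    rng = random.Random(seed)
    best_count, best_model = 0, None
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample
        if x1 == x2:
            continue                                  # degenerate sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = [(x, y) for x, y in points
                   if abs(y - (a * x + b)) <= tol]    # distance tolerance
        if len(inliers) > best_count:
            best_count, best_model = len(inliers), (a, b)
    return best_model, best_count

# 20 exact inliers on y = 2x + 1 plus 5 gross outliers.
pts = [(x, 2 * x + 1) for x in range(20)] + \
      [(3, 40), (7, -30), (11, 80), (15, -50), (18, 99)]
model, n_in = ransac_line(pts)
print(model, n_in)
```

A tighter tolerance would start rejecting noisy-but-correct points; a looser one would start admitting the gross outliers, which is exactly the trade-off described above.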


Study Site and Data Set
Images obtained from low flying heights with the camera set were acquired over the test area located in the vicinity of Gliwice (Poland) (50°17′32″ N, 18°40′03″ E). The area was flat, partly wooded, and single buildings appeared on its surface. Image data from low flying heights in good weather and lighting conditions were obtained. Low grassy and shrubby vegetation covered the observed area. The test data consisted of five sets of fisheye video frames acquired with the Novelty Ogar mk II platform over the test area. A total of 100 video frames were selected for the tests. Image data was obtained from a 50 m height with GSD equal to 0.029 m in the central part of the image.

Proposed Approach
The approach proposed in this article involves the generation of one large virtual image based on images acquired from five cameras mounted in a horizontal row. In this configuration, the central camera (Cam3) is the nadir-oriented master camera, while the other cameras are tilted towards the central camera by 13.2° (Cam2 and Cam4) and 26.4° (Cam1 and Cam5), respectively (Figure 5).

[Figure 5: layout of the camera head: Cam1, Cam2, Cam3 (master), Cam4, Cam5.]

Figure 6 shows the scheme of geometric correction and mosaicking of images obtained from a low flying height. First, the calibration of each camera is performed to determine the interior orientation parameters (IOPs) and distortion coefficients. In the next stage, the negative effect of lens distortion is removed from each image. During the relative orientation of the images, tie points are found on the basis of the SIFT descriptor, and the matches determined using the FLANN algorithm are optimized. Next, a homography matrix is determined with the RANSAC algorithm, for which the cut-off threshold was set empirically at 0.7. Subsequently, the projective matrix is calculated and the perspective transformation is performed. Finally, the geometrically corrected images are combined into a single mosaic (virtual image).

Results of Camera Calibration
Video sequences were registered under different angles: from the front, from the right, from the left, from above, and from the ground, all taken from the same distance. During the image acquisition, the conditions were preserved so that the optical axis of each camera passed through the centre of the calibration test field. In this research, video mode was used to record the chessboard field at different view angles and positions. The video frames were converted to single images at a rate of one image per second. For each action camera, five measuring series were carried out, with a minimum of five frames acquired at different camera positions in each series. During the video sequencing, similar measuring conditions were ensured for all acquired samples to obtain the most accurate results. The interior orientation results for the five cameras were determined for the 2.7K mode; twenty calibration images were used for this purpose.
Within the framework of the research, five action cameras were calibrated in video mode. The results obtained in both calibration variants for the 2.7K mode (the central part of the image) are comparable. The determined calibrated focal length values differ, on average, by about 0.3 mm from the value given by the producer. However, the calibrated focal length and principal point coordinates are comparable with other test results [62]. The last column in Table 3 presents the reprojection errors for each of the calibrated cameras. The obtained error values for individual cameras range from 0.16 to 0.34 pixels; the largest value, 0.34 pixels, was calculated for the Cam4 camera. The obtained calibration results for the fisheye-lens cameras are comparable with those obtained by Scaramuzza et al. [63], who reported an average reprojection error of less than 0.30 pixels.

Figure 7 shows the distribution of the distortion functions for an example camera of the head (for the other cameras, the distortion functions are very similar).

Undistorted Fisheye Video Sequence
Based on the calibration performed with an OpenCV script, the camera matrix and distortion coefficients were determined. Using the cv2.undistort function, the individual images were then rectified (the negative effect of lens distortion was removed). The undistortion changes the positions of the extreme pixels of the image and shifts them closer to the image centre. Sometimes pixels are placed on the edges of the image, which distorts it; the implementation from the OpenCV library allowed this phenomenon to be minimized.

Visual Evaluation of the Undistortion Method
The figure below (Figure 8) shows the original image before the correction of distortion and the image after the correction of distortion.
The proposed method of initial geometric correction noticeably reduces geometric distortions in the image before the process of relative orientation (Figure 8). Moreover, the proposed method maintains angles and proportions in the scene. An additional advantage of the proposed initial geometric correction is the lack of distortion within the corrected image: the post-correction scene retains its original resolution.

Relative Orientation-Feature Image Matching
Relative orientation is performed based on corresponding points generated in every image. The corresponding points are generated with the SIFT algorithm and the FLANN matcher implemented in the OpenCV library in the Python programming environment. In the SIFT algorithm, features are generated in the common area of the reference images. Each feature is matched by comparisons based on the Euclidean distance of the feature vectors [63]. For matching using SIFT, the ratio-test threshold was set to 0.70.
The average number of tie points (see Table 4) generated for particular camera pairs (stereograms) ranged from 2497 to 4319. The average number of points for the camera set was 3365, with a standard deviation of 648 points. After relative orientation and image matching, the next step was to calculate the geometric transformation parameters (projective transform). In the next stage, the raw matches were used to estimate the fundamental matrix. For this purpose, it was necessary to reject outliers and select inliers for the correct determination of the geometric transform. After every matrix estimation, a matrix validation test is performed to avoid a distorted transformation based on incorrect matches:
(a) The perspective coefficients of the homography (H31, H32) cannot be too significant; their absolute value is usually less than 0.002. (b) An excessive shift between the images is not allowed when combining them; the homography is rejected if it shifts the x and y coordinates too strongly between the images.
For an unrelated combination, it should be decided whether the two images match or not. The number of estimated matches (inliers) can be one of the criteria. However, high-resolution images often contain outliers (see Table 5); a sufficient number of RANSAC iterations should eliminate this problem. It should also be noted that incorrect matches are often randomly placed in the image, so an additional geometric criterion can further improve the accuracy of image matching [64,65]. As can be seen from the analysis of Figure 9, the density of tie points has been significantly reduced after optimization. On the basis of the RANSAC algorithm, only the points (inliers) meeting the cut-off criterion were used to transform the images.
Figure 9. The distribution of matching points (using the best correspondences) for images acquired by every camera.

Table 6 presents the results of the relative matching of images after transformation. Relative orientation in the proposed approach was performed to the pixel level. For each pair of images, the root mean square error (RMSE) values were determined. Based on the analysis of the obtained results, it can be seen that the highest stability characterized the pair of images acquired with Cam1 and Cam2; in this case, the RMSExy value was only ±2.13 pix. In the case of the stereogram acquired from the Cam2 and Cam3 cameras, the accuracy of image matching was almost 4 pixels (RMSExy = ±3.80 pix). The largest error value for this stereogram was probably caused by the presence of a moving object (a moving person) in the photographed scene. The average RMSE value for matching all images was ±3.18 pix. Figure 10 shows the mosaics for two example stereograms.

At the initial stage of the study, the adverse influence of distortion was removed; then the relative orientation and geometric correction of the images were performed. The presented mosaic is an introduction to the final form of the images presented in Figure 11. The average distance between the calculated and the actual location of a point was slightly over 3 pixels. The most substantial image distortion was recorded at the edges of the mosaic. A mosaic consisting of one nadir image and four off-nadir images will still be geometrically distorted relative to an orthoimage.
Therefore, the mosaicked images must be subjected to the classical orthorectification process, taking into account the influence of the relief (which was not the subject of this study), so that they can finally be presented in the nadir view. The proposed process of developing one large image can effectively increase the coverage area. A minor limitation of the proposed method is the adverse ghosting of moving objects in the mosaicked images.
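The RMSExy measure reported above can be computed from the residuals between the tie points measured in the master image and the points mapped by the estimated transformation. A short sketch (with illustrative residual values, not the paper's data):

```python
import numpy as np

def rmse_xy(measured, transformed):
    """Combined root mean square error between measured tie points in the
    master image and slave points mapped by the estimated homography."""
    r = np.asarray(measured, float) - np.asarray(transformed, float)
    rmse_x = np.sqrt(np.mean(r[:, 0] ** 2))
    rmse_y = np.sqrt(np.mean(r[:, 1] ** 2))
    return np.sqrt(rmse_x ** 2 + rmse_y ** 2)   # RMSExy

# illustrative residuals of a few tie points (pixels)
measured    = np.array([[100.0, 200.0], [400.0, 250.0], [260.0, 330.0]])
transformed = np.array([[101.5, 198.9], [398.8, 251.2], [261.0, 331.4]])
```

Note that combining the x and y components this way equals the root of the mean squared point-to-point distance, which is why RMSExy is a natural per-stereogram quality figure.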

Discussion
As part of the performed research, a method for the acquisition, calibration, geometric correction, and combination of images obtained from a low flying height was proposed. The presented method allows the integration of images obtained from the multi-camera system installed on board a UAV. The proposed method of registering multiple images with a multi-camera system allows for the efficient and timely registration of remote sensing data, which can be successfully applied in environmental mapping and change detection. An increase in the efficiency of obtaining oblique images from UAVs was also observed in research work carried out previously [2]. The authors proposed using the SIFT descriptor and feature matching to orient oblique images; they stated that combining a tiling strategy with existing workflows can provide an efficient and reliable solution. In a previous paper [30], the orientation process of oblique aerial images was presented based on the Binary Robust Independent Elementary Features (BRIEF) descriptor. An effective method of geometric correction of remote sensing images using the SIFT and Affine-Scale Invariant Feature Transform (ASIFT) algorithms is also presented in previous research [66], where the authors obtained geometric correction errors in the range from 0.63 to 3.74 pixels for image tilt angles from 30° to 70°. In other studies [67], similar results were obtained by mosaicking UAV images based on the SIFT descriptor and RANSAC to remove wrong matching points; the experimental results showed that the proposed solution could effectively reduce the impact of accumulated error and improve the precision of the mosaic while reducing the mosaicking time by up to 60%. The accuracy of the geometric correction and multi-camera mosaicking of images was also studied in a previous paper [20].
The authors used a six-camera system in their experiments and achieved a mosaicking and geometric correction accuracy of 3 pixels. A similar result was achieved in the experiments presented in this article, where the average RMSE value was 3.18 pixels.
The main limitation of the proposed method is that it works effectively only for images acquired by cameras characterized by mutual stability; they should be mounted in a single rigid frame. In addition, the method is also sensitive to the tonal heterogeneity of the acquired images. However, a similar objection can be made about other systems acquiring image data from a low flying height.
Another limitation is that there should be no objects in motion in the photographed area (as in the example of the Cam2-Cam3 stereogram, in which a moving object was photographed and the accuracy of image matching was the lowest). In such cases, erroneous estimation of the homography is possible for oblique images, which leads to inaccurate geometric correction of the images.

Conclusions
Until now, the possibility of acquiring images from a multi-camera imaging system installed on a UAV multi-copter has not been taken into account in environmental mapping. The basic methods of geometric correction are insufficient to accurately correct images acquired with fisheye-lens cameras, which are additionally mounted obliquely. On the basis of the above premises, a method for the geometric correction of images and their combination into one virtual image was developed. The proposed method takes into account the correction of distortions. This approach allows the effective binding of the tie points and also improves the accuracy of the geometric transformation. The proposed method of image integration can increase the area imaged by a UAV multirotor. Based on the above, it is evident that the multi-camera system has a higher dynamic range in relation to individual cameras equipped with normal or wide-angle lenses. The performed tests are particularly important in the context of the geometric correction of remote sensing images and in environment mapping. Future research will focus on taking into account the tonal differences in the component images and on fully automating the processing of large sets of images. In addition, it is planned to implement the proposed system on a fixed-wing UAV; thanks to this, it will be possible to increase the imaged area even more effectively. Additionally, future research work will consider using the geometric properties of the cameras for effective 3D modeling of buildings.