UCalib: Cameras autocalibration on coastal video monitoring systems

Following the path set out by the Argus project, video monitoring stations have become a very popular low-cost tool to continuously monitor beaches around the world. For these stations to be able to offer quantitative results, the cameras must be calibrated. Cameras are usually calibrated when installed and, at best, extrinsic calibrations are performed from time to time. However, intra-day variations of the camera calibration parameters, due to thermal factors or other kinds of uncontrolled movements, have been shown to introduce significant errors when transforming pixels to real-world coordinates. Building on well-known feature detection and matching algorithms from computer vision, this paper presents a methodology to automatically calibrate cameras, at the intra-day time scale, from a small number of manually calibrated images. For the three cameras analyzed here, the proposed methodology allows for the automatic calibration of > 90% of the images under favourable conditions (images with many fixed features) and ∼ 40% for the worst-conditioned camera (almost featureless images). Results can be improved by increasing the number of manually calibrated images. Further, the procedure provides the user with two values that allow the expected quality of each automatic calibration to be assessed. The proposed methodology, here applied to Argus-like stations, is also applicable, e.g., at CoastSnap sites, where each image corresponds to a different camera.

nearly invariant. As a counterpoint, there is no need to impose any constraint on either the intrinsic calibration parameters of the camera (lens distortion, pixel size and decentering) or on its rotation. The automatic camera calibration is applied to three video monitoring stations. Two of them operate on beaches of the city of Barcelona (Spain), where there are many fixed and permanent features, and the third one on the beach of Castelldefels, located southwest of Barcelona, where the number of fixed points is very limited.

The main aim of this paper is to present a methodology to automatically calibrate images, departing from a small set of manually calibrated images and without the need to prescribe reference objects, and to evaluate its feasibility. Section 2 presents the basics of mapping pixels corresponding to arbitrary objects between images, and the methodology to process points in pairs of images in order to obtain the calibration of an image automatically. Section 3 presents the results, which are discussed in Section 4. Section 5 draws the main conclusions of this work.
where k_1 stands for the radial distortion, s for the pixel size (the pixel is assumed square), o_c and o_r are the pixel coordinates of the principal point (considered herein at the center of the image), d^2 = u^2 + v^2, and u and v are the undistorted coordinates in the image plane.

Figure: Real-world (x, y, z) to pixel (c, r) transformation: camera position (x_c, y_c, z_c) and Eulerian angles (φ, σ and τ).
Equation (1) represents a reasonable simplification of more complex distortion models: the radial distortion has been assumed parabolic and the tangential distortion has been neglected. This simplified model is adequate for the cameras considered in this work. The eight free parameters of the model are the camera position (x_c, y_c, z_c), the Eulerian angles (φ, σ, τ), the radial distortion k_1 and the pixel size s. The calibration minimizes the ground-point error ε_G, the root mean square of the distances between the clicked pixel coordinates of the ground control points (GCPs) and the values c_n and r_n obtained from (x_n, y_n, z_n) using the model equations (1) and (2) with the proposed parameters. Whenever the horizon line can be detected, it can also be introduced in the optimization process by minimizing ε_T = ε_G + ε_H, where ε_H is the horizon line error: the root mean square of the distances between the pixels detected on the horizon and the horizon line as predicted by the calibration parameters (see, e.g., [42]). Hereafter we will refer to the error ε_T whether or not the horizon line is detected, assuming that ε_H = 0 if the horizon line is not available.
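The simplified model of Equation (1) is easy to sketch in code. The following is a minimal illustration, assuming the convention that the parabolic factor distorts the undistorted image-plane coordinates and that a second step maps them to pixel columns and rows; the function names and the sign of the row axis are hypothetical, not taken from the paper.

```python
def distort(u, v, k1):
    # Parabolic radial distortion (cf. Equation (1)); tangential terms neglected.
    d2 = u**2 + v**2
    factor = 1.0 + k1 * d2
    return factor * u, factor * v

def to_pixel(ud, vd, s, oc, or_):
    # Distorted image-plane coordinates to pixel (c, r) for square pixels of
    # size s and principal point (oc, or_) at the center of the image.
    # The orientation of the row axis is an assumption of this sketch.
    return oc + ud / s, or_ - vd / s
```

With k_1 = 0 the model reduces to an undistorted pin-hole camera, which provides a quick sanity check.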

where H_ij corresponds to the i-th row and j-th column of the 3 × 3 homography matrix H. The rows of the rotation matrices R_B and R_A are the unit vectors e_u, e_v and e_f of images B and A, respectively.

Manual calibration of a set of images

The automatic calibration of an image is performed in 4 steps, which are schematized in Figure 3.
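As a sketch of the above relation, and assuming that the homography linking two images taken from the same camera position is H = R_B R_A^T (an assumption consistent with the rows of each rotation matrix being the unit vectors e_u, e_v and e_f), the mapping of undistorted coordinates can be written as:

```python
import numpy as np

def homography(R_A, R_B):
    # Homography relating the undistorted coordinates of images A and B that
    # share the camera position: H = R_B R_A^T (assumed form, see lead-in).
    return R_B @ R_A.T

def map_coordinates(H, u, v):
    # Projective application of H to the undistorted coordinates (u, v) of
    # image A, yielding the corresponding coordinates in image B.
    p = H @ np.array([u, v, 1.0])
    return p[0] / p[2], p[1] / p[2]
```

When both images share the same rotation, H reduces to the identity and the coordinates are unchanged.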

First, a set of (manually) calibrated images is generated, which will be referred to as the basis (step 1).

139
Next, common features between the image to be calibrated and each of the basis images are identified (step 2). These pairs are then purged to remove erroneously matched features, and from the remaining set of features a selection is made (step 3). Finally, this selection is used to calibrate the image (step 4).
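The purging of erroneous matches (step 3) can be illustrated with a Lowe-style ratio test, a standard purge in feature matching. This is a generic sketch over descriptor arrays, not necessarily the exact criterion used in the paper, and the function name is hypothetical.

```python
import numpy as np

def purge_matches(desc_a, desc_b, ratio=0.75):
    # Keep a pair only when the best candidate in desc_b is clearly better
    # (by the given ratio) than the second best; ambiguous matches are dropped.
    pairs = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)
        j, k = np.argsort(dists)[:2]
        if dists[j] < ratio * dists[k]:
            pairs.append((i, int(j)))
    return pairs
```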

The details of each of the steps are described below. For simplicity, the automatic calibration procedure is described for case 0, i.e., when the camera position x_c = (x_c, y_c, z_c), k_1 and s are the same for all images. Case 1 will be discussed, briefly, afterwards. The basis is chosen so that the percentage of images of the pool that have n_p cells with pairs with the basis is above 90%. The number of required pairs, n_p, is based on the minimum number of pairs needed to perform the calibration of an image (i.e., n_p = 2 for case 0). As not all features will be useful, a higher number should be taken. Note that the higher the n_p, the more images the basis will contain.

Once the basis has been established, the manual calibration of this set of images is carried out.

The camera position x_c = (x_c, y_c, z_c), k_1 and s are set. In addition, as the angles of the cameras are known for these images, their rotation matrices R and the homographies H between basis images are also known.

The matched features provide pairs of undistorted coordinates in the image to calibrate, (u_k, v_k), and in the first basis image, (u_1k, v_1k). These pairs of undistorted coordinates must be related through a homography (equation (4)) that involves both the rotation matrix R of the image to calibrate, which depends on the unknown angles (φ, σ, τ), and R_1, which is known.

From the final subset of K features, the rotation matrix R = R(φ, σ, τ) is found which minimizes a reprojection-like error function, hereafter the "homography error" f, where (u_1k, v_1k) are the undistorted coordinates of the k-th feature in the first image of the basis. The last two values, f and K, will prove helpful in assessing the quality of the automatic calibration. If case 1 applies, the above algorithm remains the same except that, in order to transform (c_k, r_k) to (u_k, v_k) through equation (1), the values of k_1 and s must also be obtained for each image.
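The minimization of the homography error over the three angles can be sketched as follows. The Euler convention ("zyx"), the direction of the mapping and the optimizer choice are assumptions of this illustration, not necessarily those of the paper.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.spatial.transform import Rotation

def homography_error(angles, R1, pairs):
    # RMS distance between the features (u_k, v_k) of the image to calibrate
    # and the features (u_1k, v_1k) of the first basis image mapped through
    # H = R R_1^T; the "zyx" Euler convention is an assumption of this sketch.
    R = Rotation.from_euler("zyx", angles).as_matrix()
    H = R @ R1.T
    sq = []
    for (u1, v1), (u, v) in pairs:
        p = H @ np.array([u1, v1, 1.0])
        sq.append((p[0] / p[2] - u) ** 2 + (p[1] / p[2] - v) ** 2)
    return float(np.sqrt(np.mean(sq)))

def calibrate_angles(R1, pairs, x0=(0.0, 0.0, 0.0)):
    # Returns the optimal angles and the attained homography error f.
    res = minimize(homography_error, x0, args=(R1, pairs), method="Nelder-Mead")
    return res.x, res.fun
```

With synthetic, noise-free pairs generated from a known rotation, the recovered error f should be close to zero.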

Code availability
The source code to calibrate a set of images from a basis of manually calibrated images is freely available on GitHub (https://github.com/Ulises-ICM-UPC/UCalib). The code is accompanied by descriptive documentation and, as an example, the script and the corresponding images. This code performs the calibration of the basis images (step 1) and the calibration of the images (steps 2 to 4).

distributions, which is an indication of the robustness of the results. However, the most relevant issue is that the errors ε_T for all three cases are very similar (the same holds for all the bases and stations, Table 3).

To analyze the performance of the procedure proposed in Section 2.3, and to understand how the output parameters f and K can be used to assess the quality of the automatic calibration, the procedure has been applied to sets of "control" images: images with known GCPs (and, in some cases, the horizon line). For the case in which the basis has 8 images, the remaining 59 control images were automatically calibrated: Figure 9 shows the percentage of images calibrated with f ≤ f_C and K ≥ K_C for different values of f_C and K_C.

The more demanding the conditions (smaller allowed f_C and larger required number of pairs K_C), the smaller the percentage of the 59 images satisfying both conditions ("successful" images). As shown in Figure 9A, for the proposed values, this percentage ranges from ∼ 10%, for the most demanding condition, up to ∼ 65% for the most relaxed one.
Figure 9B shows the 95th percentile of the errors ε_G and ε_H as computed from the GCPs and horizon line of the control images. The values f_C = 5 pixels and K_C = 4 seem to be a good compromise between the percentage of calibratable images and the quality of these automatic calibrations. Table 4 shows the percentages and the 95th percentile of the errors ε_G and ε_H of the successful control images for f_C = 5 pixels and K_C = 4 and for the different stations. From Table 4, the higher n_p, i.e., the more basis images, the more control images are successfully calibrated.

Table 4. Percentage of success and 95th percentile of the errors ε_G and ε_H for the successful control images for different stations and n_p (for f ≤ f_C = 5 pixels and K ≥ K_C = 4). In parentheses, values when the horizon error is not considered in the manual calibration of the basis.
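The acceptance criterion behind these percentages can be expressed as a one-line filter. The function name is hypothetical; the default thresholds are the compromise values discussed in the text.

```python
def accept_calibration(f, K, f_C=5.0, K_C=4):
    # An automatic calibration is kept when the homography error f is at most
    # f_C pixels and at least K_C feature pairs survived the purge.
    return f <= f_C and K >= K_C
```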

Automatic calibration of several years
Several years of images have been automatically calibrated for all three stations (see Table 5).

274
Using the critical values proposed above (f_C = 5 pixels and K_C = 4), Table 6 shows the percentage of automatically calibrated images. As in Table 4, the same trends are observed, namely: 1) the percentage increases with n_p and 2) the worst station is CFA1 and the best one is BCN2. Table 6 also shows the results for more restrictive values (f_C = 2 pixels, K_C = 5): the percentages are smaller for these more restrictive conditions, particularly for BCN1 and CFA1.

For illustration purposes, Figure 10 shows the time evolution of the Eulerian angles for BCN2 and n_p = 4 for f_C = 5 pixels and K_C = 4 (87% of the total images according to Table 6). In this figure, the black dots also satisfy the more demanding conditions f_C = 2 pixels and K_C = 5 (82% according to Table 6).

Most of the outliers in Figure 10, mainly observable in the roll σ, correspond to red dots, i.e., those not satisfying the more demanding conditions. The signal also shows a noise which is related to intra-day oscillations (see below). This noise has, in the tilt τ, a seasonal behavior, with larger amplitudes in summer than in winter. Several permanent jumps are also observed in the azimuth φ, the most significant at the beginning of year 2019. These jumps correspond to uncontrolled movements of the camera (due, e.g., to a gust of wind) and are not always easily detected by visual inspection of the images.

Table 5. Years analyzed and amount of images available for all three stations.

Table 6. Percentage of automatically calibrated images satisfying f ≤ f_C and K ≥ K_C for different stations and bases and for the years in Table 5.

Figure 10. Time evolution of the Eulerian angles obtained through automatic calibration for BCN2 with n_p = 4, f_C = 5 pixels and K_C = 4. Black dots further satisfy f_C = 2 pixels and K_C = 5.
Following procedures similar to those in Section 2, which allow pixel coordinates to be transformed between two calibrated images, all images can be represented as in, e.g., the first image of the series, i.e., the images are stabilized or registered. Time-averaging the resulting images and comparing the result with the time average of the raw images is a usual way to verify that the stabilization (here, the automatic calibration) is working properly [e.g. 33]. Figure 11 shows the results for the same conditions as for Figure 10 (n_p = 4, f_C = 5 pixels and K_C = 4). The blurring observed in Figure 11A is very much reduced in Figure 11B (stabilized).
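The stabilization and the time exposure (timex) can be sketched as follows, with a nearest-neighbour inverse warp standing in for the actual image registration; production code would typically use, e.g., OpenCV's warpPerspective instead, and the function names here are hypothetical.

```python
import numpy as np

def stabilize(image, H):
    # Nearest-neighbour inverse warp of `image` onto the frame of a reference
    # image, given a homography H mapping reference pixels to image pixels.
    rows, cols = image.shape[:2]
    out = np.zeros_like(image)
    rr, cc = np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij")
    pts = np.stack([cc.ravel(), rr.ravel(), np.ones(rows * cols)])
    src = H @ pts
    c_s = np.round(src[0] / src[2]).astype(int)
    r_s = np.round(src[1] / src[2]).astype(int)
    ok = (0 <= c_s) & (c_s < cols) & (0 <= r_s) & (r_s < rows)
    out[rr.ravel()[ok], cc.ravel()[ok]] = image[r_s[ok], c_s[ok]]
    return out

def timex(images):
    # Time-exposure image: pixel-wise mean of a stack of (stabilized) images.
    return np.mean(np.stack(images), axis=0)
```

With the identity homography the image is returned unchanged, which is a quick sanity check of the warp.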

While obtaining the timex of the stabilized images is a common way to show that the automatic calibration is working properly, it does not allow for a quantification of the errors before and after the process. To this end, a fixed feature has been manually tracked in the images of Figure 11B. The estimated error when manually tracking the feature is around 2 pixels. Figure 12A shows the distribution of the clicked pixel coordinates.

These results can, alternatively, be expressed in meters (Figures 12C and D). For this purpose, it is taken into account that the feature is at z = 4 m. If all the clicked pixel coordinates (Figure 12A) are projected onto the xy-plane using a constant calibration (here, the first one), the resulting distribution is the one shown in Figure 12C. If, instead, the corresponding automatic calibrations are used for each pixel, the distribution is the one in Figure 12D, whose RMS of the distances to the center of mass is 3.0 m.
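The quoted RMS is simply the root mean square of the distances from the projected feature positions to their centre of mass; a minimal helper (hypothetical name) makes the definition explicit:

```python
import numpy as np

def rms_to_centroid(xy):
    # RMS of the distances from a set of 2D points to their centre of mass;
    # used here as a proxy for the georeferencing noise.
    xy = np.asarray(xy, dtype=float)
    d = xy - xy.mean(axis=0)
    return float(np.sqrt(np.mean(np.sum(d**2, axis=1))))
```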

This RMS noise is due, in part, to the manual tracking procedure, but also to possible errors in the automatic calibrations, reasonably assuming that the center of mass of the distribution in Figure 12D corresponds to the actual position of the feature.

The proposed process for georeferencing images using homographies is based on the assumption that the cameras remain nearly immobile. This hypothesis may seem to be contradicted by the results of this study, since the full manual calibration (case 2) of the basis images shows a movement of the cameras of several metres in the 3 spatial directions. These movements must be taken with caution.

The calibration errors resulting from these two cases are such that the difference of the mean values of the errors is less than half of the statistical deviations of the errors (see Table 3). We understand that in the full calibration the apparent movement of the cameras was actually compensated by other parameters of the calibration (see Figure 8), mainly through the intrinsic parameters (radial distortion and pixel size). Calibrations forcing common values of the intrinsic parameters (case 0) have errors that are again equivalent to those of the other two cases. We conclude therefore that the apparent camera movements and their internal deformations can be fully absorbed by the changes in camera orientation.

Furthermore, since the complete calibration of the camera results in unrealistic displacements, we consider that it is more appropriate to allow only changes in the camera orientation and thus avoid spurious compensations between calibration parameters.

The results show that the method described here can be used to automatically calibrate images from Argus-type stations from a basis of manually calibrated images. In contrast to other studies, no reference objects need to be prescribed in the images. The values of f_C and K_C can be arbitrarily chosen by the user. A low f_C and a high K_C, i.e., more restrictive conditions, will reduce the percentage of automatic calibrations, which should in turn be more trustworthy. Figure 10 shows the results for f_C = 2 pixels and K_C = 5 (black dots), showing that most of the outliers are avoided. In order to reduce the outliers in Figure 10 for f_C = 5 pixels and K_C = 4 (all dots, red and black), one could alternatively try a time filtering, taking into account that the characteristic filtering time window has to be small enough not to filter out the intra-day oscillations of the signal. In addition, from the results for all three stations, the most relevant factors in obtaining a large percentage of good automatic calibrations seem to be: 1) the amount of fixed features observable in the images (BCN1 and BCN2 give better results than CFA1) and, to a lesser extent, 2) the image size (BCN2 works better than BCN1).

On the origin of the camera movements
One main result from the manual calibration of the basis is, as mentioned, that the camera position can be considered constant in time (Figure 8). As an illustrative case, we consider the time evolution of τ (tilt) for the five cameras in station CFA1 (Figure 13); so far, only the results for camera D in Figure 13 have been shown for CFA1. Figure 14 shows the time evolution of the demeaned angle, ∆τ, for the 5 cameras of CFA1 during 7 days in summer 2013.

As the figure shows, the tilt behavior changes from camera to camera. Focusing on the outer cameras (A and E in Figure 13), e.g., while for camera A the tilt τ tends to increase during the daylight hours, the trend is just the opposite for camera E, suggesting that the whole concrete structure undergoes a (small) deflection which is captured by the cameras.

In this paper, an automatic calibration procedure has been proposed to stabilize images from video monitoring stations. The proposed methodology is based on well-known feature detection and matching algorithms and allows for massive automatic calibrations of an Argus camera, provided a set, or basis, of calibrated images. From the computer vision point of view, the single hypothesis supporting the approach is that the camera position can be regarded as nearly constant.

In the cases considered here (Argus-like stations), it has been proven that the intrinsic parameters and the camera position can actually be considered constant (case 0). However, the procedure proposed here is able to manage the case in which the intrinsic calibration parameters change in time, which makes the approach valid for CoastSnap stations.

The number of images of the basis can be chosen arbitrarily (here through the required pairs, n_p) and, the higher it is, the more images can be properly calibrated. All the automatic calibrations are performed directly through the basis of images, i.e., second- or higher-order generations of automatic calibrations are avoided.